We have worked on robot perception challenges, especially on using vision to understand 3D scenes through problems such as object detection and 6D object pose estimation. We have also addressed other perception challenges, such as bearing-only navigation and localization, as well as localization based on wireless signal strength.
6D Pose Estimation
Planning the motion of a robotic arm in a cluttered environment requires detecting the objects in the robot's vicinity and estimating their 6D pose (i.e., position and orientation). The goal of our work is to build the capability to compute accurate pose estimates for objects in cluttered scenes. In particular, we have been working on 1) developing intelligent techniques to autonomously generate labeled datasets for training object recognition pipelines, and 2) developing search-based algorithms for scene estimation given RGB-D data and 3D CAD models of objects.
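For illustration only, the short sketch below shows one way a search-based scene-estimation step can score a candidate 6D pose: points sampled from an object's 3D CAD model are transformed by the hypothesized pose and compared against the observed RGB-D point cloud. The function name, inlier threshold, and synthetic data are assumptions made for this example and do not reproduce the exact pipeline of our papers.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation

def score_pose_hypothesis(model_points, observed_points, rotation_xyzw, translation,
                          inlier_thresh=0.005):
    """Score how well a candidate 6D pose explains an observed point cloud.

    model_points:    (N, 3) points sampled from the object's CAD model (meters).
    observed_points: (M, 3) points segmented from the RGB-D depth image (meters).
    rotation_xyzw:   quaternion of the hypothesized orientation (x, y, z, w).
    translation:     (3,) hypothesized position.
    """
    R = Rotation.from_quat(rotation_xyzw).as_matrix()
    transformed = model_points @ R.T + translation   # model placed at the hypothesized pose
    tree = cKDTree(observed_points)
    dists, _ = tree.query(transformed)               # nearest observed point per model point
    return np.mean(dists < inlier_thresh)            # fraction of model points explained by data

if __name__ == "__main__":
    # Synthetic example: a fake 10 cm object observed at a known pose.
    rng = np.random.default_rng(0)
    model = rng.uniform(-0.05, 0.05, size=(500, 3))
    true_R = Rotation.from_euler("z", 30, degrees=True)
    observed = model @ true_R.as_matrix().T + np.array([0.4, 0.0, 0.6])
    good = score_pose_hypothesis(model, observed, true_R.as_quat(), np.array([0.4, 0.0, 0.6]))
    bad = score_pose_hypothesis(model, observed, true_R.as_quat(), np.array([0.5, 0.1, 0.6]))
    print(f"good hypothesis score: {good:.2f}, perturbed hypothesis score: {bad:.2f}")
```

A full scene-level search would generate many such hypotheses per object and select a physically consistent combination, but the scoring idea is the same.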
Navigation without a Map using Bearing-Only Sensors
Robot navigation schemes often rely on accurate distance information from laser range scanners or sonar. This work focuses on navigation using only bearing information rather than distance information. The robot can accurately determine the relative bearing of landmarks in its environment using a panoramic camera. Using this bearing information, the robot can execute a long and complex trajectory to complete a desired task and then return to its original position with a high degree of accuracy. The work focuses on the theoretical guarantees provided under an ideal model and proves navigability in two-dimensional workspaces under this model.
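As a toy illustration of the sensing model (not the actual system), the sketch below shows how a landmark's pixel column in a 360-degree panoramic image maps to a relative bearing, and how an ideal bearing sensor can be simulated from known robot and landmark positions; the image geometry and the heading_offset calibration term are assumptions for this example.

```python
import numpy as np

def bearing_from_panorama(pixel_column, image_width, heading_offset=0.0):
    """Map a landmark's pixel column in a 360-degree panorama to a relative bearing.

    Assumes the panorama spans [-pi, pi) around the robot's forward direction and that
    heading_offset is a (hypothetical) calibration term for the camera mounting.
    """
    bearing = -np.pi + 2.0 * np.pi * (pixel_column / image_width) + heading_offset
    return (bearing + np.pi) % (2.0 * np.pi) - np.pi   # wrap to [-pi, pi)

def relative_bearing(robot_xy, robot_heading, landmark_xy):
    """Ground-truth relative bearing of a landmark, useful for simulating an ideal sensor."""
    dx, dy = landmark_xy[0] - robot_xy[0], landmark_xy[1] - robot_xy[1]
    return (np.arctan2(dy, dx) - robot_heading + np.pi) % (2.0 * np.pi) - np.pi

if __name__ == "__main__":
    # A landmark directly to the robot's left should have a bearing of +pi/2.
    print(relative_bearing((0.0, 0.0), 0.0, (0.0, 1.0)))               # ~1.5708
    print(bearing_from_panorama(pixel_column=1536, image_width=2048))  # 3/4 across -> +pi/2
```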
SLAM with Bearing-Only Sensors
This work studies the problem of bearing-only Simultaneous Localization and Mapping (SLAM), in which a robot must localize itself and map its environment using only bearing measurements. It provides a broad study of different approaches to the problem, investigating methods such as the Extended Kalman Filter (EKF), Expectation Maximization (EM), and Particle Filtering. The work shows that particle filters perform particularly well, especially when extra steps are taken to improve their robustness to outliers.
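To make the particle-filtering idea concrete, the following is a minimal bearing-only particle filter sketch. For brevity it localizes the robot against known landmarks, whereas full bearing-only SLAM would also estimate the landmark positions; the noise parameters, function name, and resampling scheme are illustrative assumptions rather than the configuration used in the study.

```python
import numpy as np

def particle_filter_bearing_only(bearings, controls, landmarks, n_particles=1000,
                                 motion_noise=(0.05, 0.02), bearing_noise=0.05, seed=0):
    """Minimal bearing-only particle filter for robot localization with known landmarks.

    bearings:  list of arrays, one per time step, with the measured bearing to each landmark.
    controls:  list of (forward_distance, heading_change) odometry commands.
    landmarks: (L, 2) known landmark positions (full bearing-only SLAM would estimate these too).
    Returns the mean (x, y, theta) particle estimate at each step.
    """
    rng = np.random.default_rng(seed)
    particles = np.zeros((n_particles, 3))            # x, y, theta
    weights = np.full(n_particles, 1.0 / n_particles)
    estimates = []

    for z, (d, dtheta) in zip(bearings, controls):
        # Motion update: propagate particles through a noisy unicycle model.
        noisy_d = d + rng.normal(0.0, motion_noise[0], n_particles)
        noisy_dt = dtheta + rng.normal(0.0, motion_noise[1], n_particles)
        particles[:, 2] += noisy_dt
        particles[:, 0] += noisy_d * np.cos(particles[:, 2])
        particles[:, 1] += noisy_d * np.sin(particles[:, 2])

        # Measurement update: weight particles by how well predicted bearings match measurements.
        for j, (lx, ly) in enumerate(landmarks):
            predicted = np.arctan2(ly - particles[:, 1], lx - particles[:, 0]) - particles[:, 2]
            err = (z[j] - predicted + np.pi) % (2.0 * np.pi) - np.pi   # wrapped bearing error
            weights *= np.exp(-0.5 * (err / bearing_noise) ** 2)
        weights += 1e-300                               # avoid degeneracy from all-zero weights
        weights /= weights.sum()

        # Systematic resampling.
        positions = (rng.random() + np.arange(n_particles)) / n_particles
        idx = np.searchsorted(np.cumsum(weights), positions)
        particles = particles[idx]
        weights = np.full(n_particles, 1.0 / n_particles)

        # Naive mean (heading averaging ignores wraparound for simplicity).
        estimates.append(particles.mean(axis=0))
    return np.array(estimates)
```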
Publications:
2025
Ramesh, D; Keskar, S; Sivaramakrishnan, A; Bekris, K; Yu, J; Boularias, A. PROBE: Proprioceptive Obstacle Detection and Estimation while Navigating in Clutter. IEEE International Conference on Robotics and Automation (ICRA), 2025. URL: https://dhruvmetha.github.io/legged-probe/
Abstract: In critical applications, including search-and-rescue in degraded environments, blockages can be prevalent and prevent the effective deployment of certain sensing modalities, particularly vision, due to occlusion and the constrained range of view of onboard camera sensors. To enable robots to tackle these challenges, we propose a new approach, Proprioceptive Obstacle Detection and Estimation while navigating in clutter (PROBE), which instead relies only on the robot's proprioception to infer the presence or the absence of occluded rectangular obstacles while predicting their dimensions and poses in SE(2). The approach is a Transformer neural network that receives as input a history of applied torques and sensed whole-body movements of the robot and returns a parameterized representation of the obstacles in the environment. The effectiveness of PROBE is evaluated on simulated environments in Isaac Gym and with a real Unitree Go1 quadruped robot.
2023
Lu, S; Chang, H; Jing, E; Boularias, A; Bekris, K. OVIR-3D: Open-Vocabulary 3D Instance Retrieval without Training on 3D Data. Conference on Robot Learning (CoRL), Atlanta, GA, 2023. URL: https://proceedings.mlr.press/v229/lu23a/lu23a.pdf
Abstract: This work presents OVIR-3D, a straightforward yet effective method for open-vocabulary 3D object instance retrieval without using any 3D data for training. Given a language query, the proposed method is able to return a ranked set of 3D object instance segments based on the feature similarity of the instance and the text query. This is achieved by a multi-view fusion of text-aligned 2D region proposals into 3D space, where the 2D region proposal network could leverage 2D datasets, which are more accessible and typically larger than 3D datasets. The proposed fusion process is efficient as it can be performed in real-time for most indoor 3D scenes and does not require additional training in 3D space. Experiments on public datasets and a real robot show the effectiveness of the method and its potential for applications in robot navigation and manipulation.
Chang, H; Boyalakuntla, K; Lu, S; Cai, S; Jing, E; Keskar, S; Geng, S; Abbas, A; Zhou, L; Bekris, K; Boularias, A. Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs. Conference on Robot Learning (CoRL), Atlanta, GA, 2023. URL: https://arxiv.org/abs/2309.15940
Abstract: We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as "pick up a cup on a kitchen table" or "navigate to a sofa on which someone is sitting". In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments.
Lu, S; Deng, Y; Boularias, A; Bekris, K. Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos. IEEE International Conference on Robotics and Automation (ICRA), London, UK, 2023. URL: https://arxiv.org/abs/2304.04325
Abstract: This work proposes a self-supervised learning system for segmenting rigid objects in RGB images. The proposed pipeline is trained on unlabeled RGB-D videos of static objects, which can be captured with a camera carried by a mobile robot. A key feature of the self-supervised training process is a graph-matching algorithm that operates on the over-segmentation output of the point cloud that is reconstructed from each video. The graph matching, along with point cloud registration, is able to find reoccurring object patterns across videos and combine them into 3D object pseudo labels, even under occlusions or different viewing angles. Projected 2D object masks from 3D pseudo labels are used to train a pixel-wise feature extractor through contrastive learning. During online inference, a clustering method uses the learned features to cluster foreground pixels into object segments. Experiments highlight the method's effectiveness on both real and synthetic video datasets, which include cluttered scenes of tabletop objects. The proposed method outperforms existing unsupervised methods for object segmentation by a large margin.
Nakhimovich, D; Miao, Y; Bekris, K. Resolution Complete In-Place Object Retrieval Given Known Object Models. IEEE International Conference on Robotics and Automation (ICRA), London, UK, 2023. URL: https://arxiv.org/abs/2303.14562
Abstract: This work proposes a robot task planning framework for retrieving a target object in a confined workspace among multiple stacked objects that obstruct the target. The robot can use prehensile picking and in-workspace placing actions. The method assumes access to 3D models for the visible objects in the scene. The key contribution is in achieving desirable properties, i.e., to provide (a) safety, by avoiding collisions with sensed obstacles, objects, and occluded regions, and (b) resolution completeness (RC) - or probabilistic completeness (PC), depending on implementation - which indicates a solution will eventually be found (if it exists) as the resolution of algorithmic parameters increases. A heuristic variant of the basic RC algorithm is also proposed to solve the task more efficiently while retaining the desirable properties. Simulation results compare using random picking and placing operations against the basic RC algorithm that reasons about object dependency as well as its heuristic variant. The success rate is higher for the RC approaches given the same amount of time. The heuristic variant is able to solve the problem even more efficiently than the basic approach. The integration of the RC algorithm with perception, where an RGB-D sensor detects the objects as they are being moved, enables real robot demonstrations of safely retrieving target objects from a cluttered shelf.
2022
Wen, B; Lian, W; Bekris, K; Schaal, S. You Only Demonstrate Once: Category-Level Manipulation from Single Visual Demonstration. Robotics: Science and Systems (RSS), 2022. (Nomination for Best Paper Award.) URL: https://www.roboticsproceedings.org/rss18/p044.pdf
Abstract: Promising results have been achieved recently in category-level manipulation that generalizes across object instances. Nevertheless, it often requires expensive real-world data collection and manual specification of semantic keypoints for each object category and task. Additionally, coarse keypoint predictions and ignoring intermediate action sequences hinder adoption in complex manipulation tasks beyond pick-and-place. This work proposes a novel, category-level manipulation framework that leverages an object-centric, category-level representation and model-free 6 DoF motion tracking. The canonical object representation is learned solely in simulation and then used to parse a category-level task trajectory from a single demonstration video. The demonstration is reprojected to a target trajectory tailored to a novel object via the canonical representation. During execution, the manipulation horizon is decomposed into long-range, collision-free motion and last-inch manipulation. For the latter part, a category-level behavior cloning (CatBC) method leverages motion tracking to perform closed-loop control. CatBC follows the target trajectory, projected from the demonstration and anchored to a dynamically selected category-level coordinate frame. The frame is automatically selected along the manipulation horizon by a local attention mechanism. This framework makes it possible to teach different manipulation strategies by providing only a single demonstration, without complicated manual programming. Extensive experiments demonstrate its efficacy in a range of challenging industrial tasks in high-precision assembly, which involve learning complex, long-horizon policies. The process exhibits robustness against uncertainty due to dynamics as well as generalization across object instances and scene configurations.
Lu, S; Wang, R; Miao, Y; Mitash, C; Bekris, K. Online Object Model Reconstruction and Reuse for Lifelong Improvement of Robot Manipulation. IEEE International Conference on Robotics and Automation (ICRA), 2022. (Nomination for Best Paper Award in Manipulation.) URL: https://arxiv.org/abs/2109.13910
Abstract: This work proposes a robotic pipeline for picking and constrained placement of objects without geometric shape priors. Compared to recent efforts developed for similar tasks, where every object was assumed to be novel, the proposed system recognizes previously manipulated objects and performs online model reconstruction and reuse. Over a lifelong manipulation process, the system keeps learning features of objects it has interacted with and updates their reconstructed models. Whenever an instance of a previously manipulated object reappears, the system aims to first recognize it and then register its previously reconstructed model given the current observation. This step greatly reduces object shape uncertainty allowing the system to even reason for parts of objects, which are currently not observable. This also results in better manipulation efficiency as it reduces the need for active perception of the target object during manipulation. To get a reusable reconstructed model, the proposed pipeline adopts: i) TSDF for object representation, and ii) a variant of the standard particle filter algorithm for pose estimation and tracking of the partial object model. Furthermore, an effective way to construct and maintain a dataset of manipulated objects is presented. A sequence of real-world manipulation experiments is performed. They show how future manipulation tasks become more effective and efficient by reusing reconstructed models of previously manipulated objects, which were generated during their prior manipulation, instead of treating objects as novel every time.
Mitash, C; Boularias, A; Bekris, K. Physics-Based Scene-Level Reasoning for Object Pose Estimation in Clutter. International Journal of Robotics Research (IJRR), 2022. URL: https://arxiv.org/pdf/1806.10457.pdf
Abstract: This paper focuses on vision-based pose estimation for multiple rigid objects placed in clutter, especially in cases involving occlusions and objects resting on each other. Progress has been achieved recently in object recognition given advancements in deep learning. Nevertheless, such tools typically require a large amount of training data and significant manual effort to label objects. This limits their applicability in robotics, where solutions must scale to a large number of objects and variety of conditions. Moreover, the combinatorial nature of the scenes that could arise from the placement of multiple objects is hard to capture in the training dataset. Thus, the learned models might not produce the desired level of precision required for tasks, such as robotic manipulation. This work proposes an autonomous process for pose estimation that spans from data generation, to scene-level reasoning and self-learning. In particular, the proposed framework first generates a labeled dataset for training a Convolutional Neural Network (CNN) for object detection in clutter. These detections are used to guide a scene-level optimization process, which considers the interactions between the different objects present in the clutter to output pose estimates of high precision. Furthermore, confident estimates are used to label online real images from multiple views and re-train the process in a self-learning pipeline. Experimental results indicate that this process is quickly able to identify in cluttered scenes physically-consistent object poses that are more precise than the ones found by reasoning over individual instances of objects. Furthermore, the quality of pose estimates increases over time given the self-learning process.
Wen, B; Lian, W; Bekris, K; Schaal, S. Catgrasp: Learning Category-Level Task-Relevant Grasping in Clutter from Simulation. IEEE International Conference on Robotics and Automation (ICRA), 2022. URL: https://arxiv.org/abs/2109.09163
Abstract: Task-relevant grasping is critical for industrial assembly, where downstream manipulation tasks constrain the set of valid grasps. Learning how to perform this task, however, is challenging, since task-relevant grasp labels are hard to define and annotate. There is also no consensus yet on proper representations for modeling or off-the-shelf tools for performing task-relevant grasps. This work proposes a framework to learn task-relevant grasping for industrial objects without the need for time-consuming real-world data collection or manual annotation. To achieve this, the entire framework is trained solely in simulation, including supervised training with synthetic label generation and self-supervised, hand-object interaction. In the context of this framework, this paper proposes a novel, object-centric canonical representation at the category level, which allows establishing dense correspondence across object instances and transferring task-relevant grasps to novel instances. Extensive experiments on task-relevant grasping of densely-cluttered industrial objects are conducted in both simulation and real-world setups, demonstrating the effectiveness of the proposed framework. Code and data are released at https://sites.google.com/view/catgrasp.
2021
Wen, B; Bekris, K. BundleTrack: 6D Pose Tracking for Novel Objects without Instance or Category-Level 3D Models. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021. URL: https://arxiv.org/abs/2108.00516
Abstract: Tracking the 6D pose of objects in video sequences is important for robot manipulation. Prior efforts, however, often assume that the target object's CAD model, at least at a category level, is available for offline training or during online template matching. This work proposes BundleTrack, a general framework for 6D pose tracking of novel objects, which does not depend upon instance- or category-level 3D models. It leverages the complementary attributes of recent advances in deep learning for segmentation and robust feature extraction, as well as memory-augmented pose-graph optimization for achieving spatiotemporal consistency. This enables long-term, low-drift tracking under various challenging scenarios, including significant occlusions and object motions. Comprehensive experiments given two public benchmarks demonstrate that the proposed approach significantly outperforms state-of-the-art category-level 6D tracking or dynamic-SLAM methods. When compared against state-of-the-art methods that rely on an object instance CAD model, comparable performance is achieved, despite the proposed method's reduced information requirements. An efficient implementation in CUDA provides a real-time performance of 10 Hz for the entire framework.
Morgan, A; Wen, B; Junchi, L; Boularias, A; Dollar, A; Bekris, K. Vision-Driven Compliant Manipulation for Reliable, High-Precision Assembly Tasks. Robotics: Science and Systems (RSS), 2021.
Abstract: Highly constrained manipulation tasks continue to be challenging for autonomous robots as they require high levels of precision, typically less than 1mm, which is often incompatible with what can be achieved by traditional perception systems. This paper demonstrates that the combination of state-of-the-art object tracking with passively adaptive mechanical hardware can be leveraged to complete precision manipulation tasks with tight, industrially-relevant tolerances (0.25mm). The proposed control method closes the loop through vision by tracking the relative 6D pose of objects in the relevant workspace. It adjusts the control reference of both the compliant manipulator and the hand to complete object insertion tasks via within-hand manipulation. Contrary to previous efforts for insertion, our method does not require expensive force sensors, precision manipulators, or time-consuming, online learning, which is data hungry. Instead, this effort leverages mechanical compliance and utilizes an object-agnostic manipulation model of the hand learned offline, off-the-shelf motion planning, and an RGBD-based object tracker trained solely with synthetic data. These features allow the proposed system to easily generalize and transfer to new tasks and environments. This paper describes in detail the system components and showcases its efficacy with extensive experiments involving tight tolerance peg-in-hole insertion tasks of various geometries as well as open-world constrained placement tasks.
2020
Wen, B; Mitash, C; Ren, B; Bekris, K. se(3)-TrackNet: Data-Driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, 2020. URL: http://arxiv.org/abs/2007.13866
Abstract: Tracking the 6D pose of objects in video sequences is important for robot manipulation. This task, however, introduces multiple challenges: (i) robot manipulation involves significant occlusions; (ii) data and annotations are troublesome and difficult to collect for 6D poses, which complicates machine learning solutions; and (iii) incremental error drift often accumulates in long-term tracking to necessitate re-initialization of the object's pose. This work proposes a data-driven optimization approach for long-term, 6D pose tracking. It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model. The key contribution in this context is a novel neural network architecture, which appropriately disentangles the feature encoding to help reduce domain shift, and an effective 3D orientation representation via Lie algebra. Consequently, even when the network is trained only with synthetic data, it can work effectively over real images. Comprehensive experiments over benchmarks - existing ones as well as a new dataset with significant occlusions related to object manipulation - show that the proposed approach achieves consistently robust estimates and outperforms alternatives, even though they have been trained with real images. The approach is also the most computationally efficient among the alternatives and achieves a tracking frequency of 90.9 Hz.
Mitash, C; Shome, R; Wen, B; Boularias, A; Bekris, K. Task-Driven Perception and Manipulation for Constrained Placement of Unknown Objects. IEEE Robotics and Automation Letters (RA-L) (also appearing at IEEE/RSJ IROS 2020), 2020. URL: https://arxiv.org/abs/2006.15503
Abstract: Recent progress in robotic manipulation has dealt with the case of no prior object models in the context of relatively simple tasks, such as bin-picking. Existing methods for more constrained problems, however, such as deliberate placement in a tight region, depend more critically on shape information to achieve safe execution. This work introduces a possibilistic object representation for solving constrained placement tasks without shape priors. A perception method is proposed to track and update the object representation during motion execution, which respects physical and geometric constraints. The method operates directly over sensor data, modeling the seen and unseen parts of the object given observations. It results in a dynamically updated conservative representation, which can be used to plan safe manipulation actions. This task-driven perception process is integrated with a manipulation task planning architecture for a dual-arm manipulator to discover efficient solutions for the constrained placement task with minimal sensing. The planning process can make use of handoff operations when necessary for safe placement given the conservative representation. The pipeline is evaluated with data from over 240 real-world experiments involving constrained placement of various unknown objects using a dual-arm manipulator. While straightforward pick-sense-and-place architectures frequently fail to solve these problems, the proposed integrated pipeline achieves more than 95% success and faster execution times.
Mitash, C. Scalable, Physics-Aware 6D Pose Estimation for Robot Manipulation. PhD Thesis, Rutgers University, 2020. URL: https://rucore.libraries.rutgers.edu/rutgers-lib/64961/
Abstract: Robot manipulation often depends on some form of pose estimation to represent the state of the world and allow decision making both at the task level and for motion or grasp planning. Recent progress in deep learning gives hope for a pose estimation solution that could generalize over textured and texture-less objects, objects with or without distinctive shape properties, and under different lighting conditions and clutter scenarios. Nevertheless, it gives rise to a new set of challenges, such as the painful task of acquiring large-scale labeled training datasets and of dealing with their stochastic output over unforeseen scenarios that are not captured by the training. This restricts the scalability of such pose estimation solutions in robot manipulation tasks that often deal with a variety of objects and changing environments. The thesis first describes an automatic data generation and learning framework to address the scalability challenge. Learning is bootstrapped by generating labeled data via physics simulation and rendering. Then it self-improves over time by acquiring and labeling real-world images via a search-based pose estimation process. The thesis proposes algorithms to generate and validate object poses online based on the objects' geometry and on the physical consistency of their scene-level interactions. These algorithms provide robustness even when there exists a domain gap between the synthetic training and the real test scenarios. Finally, the thesis proposes a manipulation planning framework that goes beyond model-based pose estimation. By utilizing a dynamic object representation, this integrated perception and manipulation framework can efficiently solve the task of picking unknown objects and placing them in a constrained space. The algorithms are evaluated over real-world robot manipulation experiments and over large-scale public datasets. The results indicate the usefulness of physical constraints in both the training and the online estimation phases. Moreover, the proposed framework, while only utilizing simulated data, can obtain robust estimation in challenging scenarios such as densely-packed bins and clutter, where other approaches suffer as a result of large occlusion and ambiguities due to similar-looking texture-less surfaces.
Wang, R; Mitash, C; Lu, S; Boehm, D; Bekris, K. Safe and Effective Picking Paths in Clutter Given Discrete Distributions of Object Poses. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, 2020. URL: https://arxiv.org/abs/2008.04465
Abstract: Picking an item in the presence of other objects can be challenging as it involves occlusions and partial views. Given object models, one approach is to perform object pose estimation and use the most likely candidate pose per object to pick the target without collisions. This approach, however, ignores the uncertainty of the perception process both regarding the target's and the surrounding objects' poses. This work proposes first a perception process for 6D pose estimation, which returns a discrete distribution of object poses in a scene. Then, an open-loop planning pipeline is proposed to return safe and effective solutions for moving a robotic arm to pick, which (a) minimizes the probability of collision with the obstructing objects; and (b) maximizes the probability of reaching the target item. The planning framework models the challenge as a stochastic variant of the Minimum Constraint Removal (MCR) problem. The effectiveness of the methodology is verified given both simulated and real data in different scenarios. The experiments demonstrate the importance of considering the uncertainty of the perception process in terms of safe execution. The results also show that the methodology is more effective than conservative MCR approaches, which avoid all possible object poses regardless of the reported uncertainty.
Wen, B; Mitash, C; Soorian, S; Kimmel, A; Sintov, A; Bekris, K. Robust, Occlusion-Aware Pose Estimation for Objects Grasped by Adaptive Hands. IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 2020. URL: https://arxiv.org/abs/2003.03518
Abstract: Many manipulation tasks, such as placement or within-hand manipulation, require the object's pose relative to a robot hand. The task is difficult when the hand significantly occludes the object. It is especially hard for adaptive hands, for which it is not easy to detect the finger's configuration. In addition, RGB-only approaches face issues with texture-less objects or when the hand and the object look similar. This paper presents a depth-based framework, which aims for robust pose estimation and short response times. The approach detects the adaptive hand's state via efficient parallel search given the highest overlap between the hand's model and the point cloud. The hand's point cloud is pruned and robust global registration is performed to generate object pose hypotheses, which are clustered. False hypotheses are pruned via physical reasoning. The remaining poses' quality is evaluated given agreement with observed data. Extensive evaluation on synthetic and real data demonstrates the accuracy and computational efficiency of the framework when applied on challenging, highly-occluded scenarios for different object types. An ablation study identifies how the framework's components help in performance. This work also provides a dataset for in-hand 6D object pose estimation. Code and dataset are available at: https://github.com/wenbowen123/icra20-hand-object-pose
2019 |
Mitash, C; Wen, B; Bekris, K; Boularias, A: Scene-Level Pose Estimation for Multiple Instances of Densely Packed Objects. Conference on Robot Learning (CoRL), Osaka, Japan, 2019. URL: https://arxiv.org/pdf/1910.04953.pdf
Abstract: This paper introduces key machine learning operations that allow the realization of robust, joint 6D pose estimation of multiple instances of objects either densely packed or in unstructured piles from RGB-D data. The first objective is to learn semantic and instance-boundary detectors without manual labeling. An adversarial training framework in conjunction with physics-based simulation is used to achieve detectors that behave similarly in synthetic and real data. Given the stochastic output of such detectors, candidates for object poses are sampled. The second objective is to automatically learn a single score for each pose candidate that represents its quality in terms of explaining the entire scene via a gradient boosted tree. The proposed method uses features derived from surface and boundary alignment between the observed scene and the object model placed at hypothesized poses. Scene-level, multi-instance pose estimation is then achieved by an integer linear programming process that selects hypotheses that maximize the sum of the learned individual scores, while respecting constraints, such as avoiding collisions. To evaluate this method, a dataset of densely packed objects with challenging setups for state-of-the-art approaches is collected. Experiments on this dataset and a public one show that the method significantly outperforms alternatives in terms of 6D pose accuracy while trained only with synthetic datasets. |
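To make the selection step above concrete, here is a small, hedged sketch: an exhaustive search over hypothesis subsets stands in for the paper's integer linear program, collides is a hypothetical pairwise constraint check, and the one-hypothesis-per-instance constraint is omitted for brevity:

```python
# Pick a scene-level set of pose hypotheses that maximizes the sum of learned
# per-hypothesis scores while respecting a pairwise "no collision" constraint.
# Exhaustive search is only viable for small candidate sets; the paper instead
# formulates this as an integer linear program.
from itertools import combinations

def select_scene(hypotheses, scores, collides):
    """hypotheses: list of pose candidates; scores: parallel list of floats;
    collides(h1, h2) -> bool is a hypothetical pairwise collision check."""
    best_subset, best_value = (), float("-inf")
    indices = range(len(hypotheses))
    for r in range(1, len(hypotheses) + 1):
        for subset in combinations(indices, r):
            if any(collides(hypotheses[i], hypotheses[j])
                   for i, j in combinations(subset, 2)):
                continue                      # violates a scene-level constraint
            value = sum(scores[i] for i in subset)
            if value > best_value:
                best_subset, best_value = subset, value
    return [hypotheses[i] for i in best_subset], best_value
```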
Shome, R; Tang, W; Song, C; Mitash, C; Kourtev, C; Yu, J; Boularias, A; Bekris, K: Towards Robust Product Packing with a Minimalistic End-Effector. IEEE International Conference on Robotics and Automation (ICRA), 2019. (Nomination for Best Paper Award in Automation.) URL: http://rl.cs.rutgers.edu/publications/ICRA-2019-Packing.pdf
Abstract: Advances in sensor technologies, object detection algorithms, planning frameworks and hardware designs have motivated the deployment of robots in warehouse automation. A variety of such applications, like order fulfillment or packing tasks, require picking objects from unstructured piles and carefully arranging them in bins or containers. Desirable solutions need to be low-cost, easily deployable and controllable, making minimalistic hardware choices desirable. The challenge in designing an effective solution to this problem relates to appropriately integrating multiple components, so as to achieve a robust pipeline that minimizes failure conditions. The current work proposes a complete pipeline for solving such packing tasks, given access only to RGB-D data and a single robot arm with a minimalistic, vacuum-based end-effector. To achieve the desired level of robustness, three key manipulation primitives are identified, which take advantage of the environment and simple operations to successfully pack multiple cubic objects. The overall approach is demonstrated to be robust to execution and perception errors. The impact of each manipulation primitive is evaluated by considering different versions of the proposed pipeline that incrementally introduce reasoning about object poses and corrective manipulation actions. |
2018 |
Mitash, C; Boularias, A; Bekris, K: Robust 6D Pose Estimation with Stochastic Congruent Sets. British Machine Vision Conference (BMVC), Newcastle, UK, 2018. URL: https://arxiv.org/abs/1805.06324
Abstract: Object pose estimation is frequently achieved by first segmenting an RGB image and then, given depth data, registering the corresponding point cloud segment against the object's 3D model. Despite the progress due to CNNs, semantic segmentation output can be noisy, especially when the CNN is only trained on synthetic data. This causes registration methods to fail in estimating a good object pose. This work proposes a novel stochastic optimization process that treats the segmentation output of CNNs as a confidence probability. The algorithm, called Stochastic Congruent Sets (StoCS), samples pointsets from the point cloud according to the soft segmentation distribution, so as to agree with the object's known geometry. The pointsets are then matched to congruent sets on the 3D object model to generate pose estimates. StoCS is shown to be robust on an APC dataset, despite the fact that the CNN is trained only on synthetic data. On the YCB dataset, StoCS outperforms a recent network for 6D pose estimation and alternative pointset matching techniques. |
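A loose, assumption-heavy sketch of the sampling idea (probability-weighted point selection constrained by known object geometry); this is a stand-in for illustration, not the StoCS algorithm itself:

```python
# Sample a base of points with probability proportional to the CNN's soft
# segmentation confidence, keeping only bases whose pairwise distances fit
# within the object's known diameter (a crude geometric plausibility check).
import numpy as np

def sample_weighted_base(points, seg_confidence, object_diameter,
                         base_size=4, max_tries=100, rng=None):
    """points: (N, 3) array; seg_confidence: (N,) nonnegative weights."""
    rng = rng or np.random.default_rng()
    probs = seg_confidence / seg_confidence.sum()
    for _ in range(max_tries):
        idx = rng.choice(len(points), size=base_size, replace=False, p=probs)
        base = points[idx]
        pairwise = np.linalg.norm(base[:, None] - base[None, :], axis=-1)
        if pairwise.max() <= object_diameter:   # geometrically plausible base
            return base
    return None  # no plausible base found within the try budget
```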
Mitash, C; Boularias, A; Bekris, K: Improving 6D Pose Estimation of Objects in Clutter via Physics-Aware Monte Carlo Tree Search. IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 2018. URL: https://arxiv.org/pdf/1710.08577
Abstract: This work proposes a process for efficiently searching over combinations of individual object 6D pose hypotheses in cluttered scenes, especially in cases involving occlusions and objects resting on each other. The initial set of candidate object poses is generated from state-of-the-art object detection and global point cloud registration techniques. The best-scored pose per object according to these techniques may not be accurate due to overlaps and occlusions. Nevertheless, experimental indications provided in this work show that object poses with lower ranks may be closer to the real poses than ones with high ranks according to registration techniques. This motivates a global optimization process for improving these poses by taking into account scene-level physical interactions between objects. It also implies that the Cartesian product of candidate poses for interacting objects must be searched so as to identify the best scene-level hypothesis. To perform the search efficiently, the candidate poses for each object are clustered so as to reduce their number while keeping sufficient diversity. Then, searching over the combinations of candidate object poses is performed through a Monte Carlo Tree Search (MCTS) process that uses the similarity between the observed depth image of the scene and a rendering of the scene given the hypothesized poses as a score that guides the search procedure. MCTS handles in a principled way the tradeoff between fine-tuning the most promising poses and exploring new ones, by using the Upper Confidence Bound (UCB) technique. Experimental results indicate that this process is able to quickly identify in cluttered scenes physically-consistent object poses that are significantly closer to ground truth compared to poses found by point cloud registration methods. |
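The UCB rule mentioned above is standard and easy to sketch; the node fields and exploration constant below are assumptions for illustration, not the paper's exact implementation:

```python
# UCB selection for balancing refinement of promising scene hypotheses against
# exploration of new combinations of candidate object poses during tree search.
# child.total_score would accumulate render-and-compare depth scores; `c` is
# the exploration constant.
import math

def ucb_select(children, c=1.4):
    """children: nodes with .total_score and .visits attributes; the parent
    visit count is taken as the sum of child visits in this simplified sketch."""
    parent_visits = sum(child.visits for child in children) or 1
    def ucb(child):
        if child.visits == 0:
            return float("inf")             # always try unvisited children first
        exploit = child.total_score / child.visits
        explore = c * math.sqrt(math.log(parent_visits) / child.visits)
        return exploit + explore
    return max(children, key=ucb)
```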
Hodan, T; Kouskouridas, R; Kim, T; Tombari, F; Bekris, K; Drost, B; Groueix, T; Walas, K; Lepetit, V; Leonardis, A; Steger, C; Michel, F; Sahin, C; Rother, C; Matas, J: A Summary of the 4th International Workshop on Recovering 6D Object Pose. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018. URL: https://arxiv.org/abs/1810.03758
Abstract: This document summarizes the 4th International Workshop on Recovering 6D Object Pose, which was organized in conjunction with ECCV 2018 in Munich. The workshop featured four invited talks, oral and poster presentations of accepted workshop papers, and an introduction of the BOP benchmark for 6D object pose estimation. The workshop was attended by 100+ people working on relevant topics in both academia and industry, who shared up-to-date advances and discussed open problems. |
2017 |
Mitash, C; Bekris, K; Boularias, A: A Self-Supervised Learning System for Object Detection Using Physics Simulation and Multi-View Pose Estimation. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, Canada, 2017. URL: https://www.cs.rutgers.edu/~kb572/pubs/physics_object_detection.pdf
Abstract: Impressive progress has been achieved in object detection with the use of deep learning. Nevertheless, such tools typically require a large amount of training data and significant manual effort for labeling objects. This limits their applicability in robotics, where it is necessary to scale solutions to a large number of objects and a variety of conditions. The present work proposes a fully autonomous process to train a Convolutional Neural Network (CNN) for object detection and pose estimation in robotic setups. The application involves detection of objects placed in clutter and in tight environments, such as a shelf. In particular, given access to 3D object models, several aspects of the environment are simulated and the models are placed in physically realistic poses with respect to their environment to generate a labeled synthetic dataset. To further improve object detection, the network self-trains over real images that are labeled using a robust multi-view pose estimation process. The proposed training process is evaluated on several existing datasets and on a dataset that we collected with a Motoman robotic manipulator. Results show that the proposed process outperforms popular training processes relying on synthetic data generation and manual annotation. |
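The self-training loop described above can be summarized at a high level; the callables below (train, detect, multi_view_pose_label) are hypothetical placeholders, so this is a schematic sketch rather than the actual pipeline:

```python
# Schematic self-training loop: bootstrap a detector from physics-simulated,
# labeled synthetic data, then grow the training set with real images that are
# labeled automatically via multi-view pose estimation, and retrain.
def self_train_detector(train, detect, multi_view_pose_label,
                        synthetic_data, real_image_batches, rounds=3):
    """train(dataset) -> detector; detect(detector, images) -> detections;
    multi_view_pose_label(detections) -> automatically labeled examples."""
    detector = train(synthetic_data)            # bootstrap from simulation only
    labeled = list(synthetic_data)
    for _ in range(rounds):
        for images in real_image_batches:
            detections = detect(detector, images)
            # Multi-view consistency turns detections into labels
            # without any manual annotation.
            labeled.extend(multi_view_pose_label(detections))
        detector = train(labeled)               # retrain on the grown dataset
    return detector
```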
2016 |
Rennie, C; Shome, R; Bekris, K; Souza, A: A Dataset for Improved RGBD-Based Object Detection and Pose Estimation for Warehouse Pick-and-Place. IEEE Robotics and Automation Letters (RA-L), 1, pp. 1179–1185, 2016. [Also accepted to appear at the 2016 IEEE International Conference on Robotics and Automation (ICRA).] URL: http://www.cs.rutgers.edu/~kb572/pubs/icra16_pose_estimation.pdf
Abstract: An important logistics application of robotics involves manipulators that pick-and-place objects placed in warehouse shelves. A critical aspect of this task corresponds to detecting the pose of a known object in the shelf using visual data. Solving this problem can be assisted by the use of an RGB-D sensor, which also provides depth information beyond visual data. Nevertheless, it remains a challenging problem since multiple issues need to be addressed, such as low illumination inside shelves, clutter, texture-less and reflective objects as well as the limitations of depth sensors. This paper provides a new rich data set for advancing the state-of-the-art in RGBD-based 3D object pose estimation, which is focused on the challenges that arise when solving warehouse pick-and-place tasks. The publicly available data set includes thousands of images and corresponding ground truth data for the objects used during the first Amazon Picking Challenge at different poses and clutter conditions. Each image is accompanied by ground truth information to assist in the evaluation of algorithms for object detection. To show the utility of the data set, a recent algorithm for RGBD-based pose estimation is evaluated in this paper. Based on the measured performance of the algorithm on the data set, various modifications and improvements are applied to increase the accuracy of detection. These steps can be easily applied to a variety of different methodologies for object pose detection and improve performance in the domain of warehouse pick-and-place. |
2012 |
Fallah, N; Apostolopoulos, I; Bekris, K; Folmer, E: The User as a Sensor: Navigating Users with Visual Impairments in Indoor Spaces Using Tactile Landmarks. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), Austin, TX, 2012. URL: http://www.cs.rutgers.edu/~kb572/pubs/userasasensor.pdf
Abstract: Indoor navigation systems for users who are visually impaired typically rely upon expensive physical augmentation of the environment or expensive sensing equipment; consequently few systems have been implemented. We present an indoor navigation system called Navatar that allows for localization and navigation by exploiting the physical characteristics of indoor environments, taking advantage of the unique sensing abilities of users with visual impairments, and minimalistic sensing achievable with low-cost accelerometers available in smartphones. Particle filters are used to estimate the user's location based on the accelerometer data as well as the user confirming the presence of anticipated tactile landmarks along the provided path. Navatar has a high possibility of large-scale deployment, as it only requires an annotated virtual representation of an indoor environment. A user study with six blind users determines the accuracy of the approach, collects qualitative experiences and identifies areas for improvement. |
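A toy, one-dimensional particle-filter sketch in the spirit of the description above (hypothetical corridor model and parameters; not the Navatar implementation):

```python
# Particles are advanced with noisy step estimates from the accelerometer and
# reweighted when the user confirms reaching an anticipated tactile landmark.
import random

def predict(particles, step_length, noise=0.1):
    return [p + random.gauss(step_length, noise) for p in particles]

def confirm_landmark(particles, landmark_pos, tolerance=1.0):
    # Keep particles consistent with the confirmed landmark, then resample.
    weights = [1.0 if abs(p - landmark_pos) <= tolerance else 1e-6
               for p in particles]
    return random.choices(particles, weights=weights, k=len(particles))

particles = [random.uniform(0.0, 30.0) for _ in range(500)]   # unknown start
for _ in range(12):                                           # ~12 steps walked
    particles = predict(particles, step_length=0.7)
particles = confirm_landmark(particles, landmark_pos=8.5)     # "found the door"
estimate = sum(particles) / len(particles)
print(f"estimated position along corridor: {estimate:.2f} m")
```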
Navkar, N V; Deng, Z; Shah, D; Bekris, K; Tsekos, N: Visual and Force-Feedback Guidance for Robot-Assisted Interventions in the Beating Heart with Real-Time MRI. IEEE International Conference on Robotics and Automation (ICRA), St. Paul, Minnesota, USA, May 14-18, 2012, pp. 689–694. DOI: 10.1109/ICRA.2012.6224582
Abstract: Robot-assisted surgical procedures are perpetually evolving due to potential improvements in patient treatment and healthcare cost reduction. Integrating an imaging modality intraoperatively further strengthens these procedures by incorporating information pertaining to the area of intervention. Such information needs to be effectively rendered to the operator as a human-in-the-loop requirement. In this work, we propose a guidance approach that uses real-time MRI to assist the operator in performing a robot-assisted procedure in a beating heart. Specifically, this approach provides both real-time visualization and force-feedback-based guidance for maneuvering an interventional tool safely inside the dynamic environment of a heart's left ventricle. The functionality of this approach was evaluated on a simulated scenario of transapical aortic valve replacement and demonstrated improvement in control and manipulation by providing effective and accurate assistance to the operator in real time. |
2010 |
Apostolopoulos, I; Fallah, N; Folmer, E; Bekris, K: Feasibility of Interactive Localization and Navigation of People with Visual Impairments. IEEE Intelligent Autonomous Systems Conference (IAS), Ottawa, Canada, 2010. URL: http://www.cs.rutgers.edu/~kb572/pubs/navatar_feasibility.pdf
Abstract: Indoor localization and navigation systems for individuals who are visually impaired (VI) typically rely upon expensive physical augmentation of the environment or expensive sensing equipment, which is why only a few such systems have been implemented. This work conducts a feasibility study of whether it is possible to localize and guide the navigation of people with VI using inexpensive sensors, such as compasses and pedometers, which are already widely available in portable devices such as smartphones. The proposed approach takes advantage of active interaction between the autonomous intelligent system and the human user and employs the map of the world as a prior. Experiments are employed to study what kind of instructions are most successful in assisting human users to reach their destination. These experiments also show that the application of Bayesian localization tools can provide sufficient localization accuracy, while achieving real-time operation, despite the minimalistic, noisy nature of the available information and the limited computational resources available on smartphones. This line of research opens the door to many exciting new applications for methods from robotics in the area of human-centered intelligent systems. |
2006 |
Bekris, K; Glick, M; Kavraki, L: Evaluation of Algorithms for Bearing-Only SLAM. IEEE International Conference on Robotics and Automation (ICRA), Orlando, FL, 2006. URL: http://www.cs.rutgers.edu/~kb572/pubs/bearing_only_slam.pdf
Abstract: An important milestone for building affordable robots that can become widely popular is to robustly address the Simultaneous Localization and Mapping (SLAM) problem with inexpensive, off-the-shelf sensors, such as monocular cameras. These sensors, however, impose significant challenges on SLAM procedures because they provide only bearing data related to environmental landmarks. This paper starts by providing an extensive comparison of different techniques for bearing-only SLAM in terms of robustness under different noise models, landmark densities and robot paths. We have experimented in a simulated environment with a variety of existing online algorithms, including Rao-Blackwellized Particle Filters (RBPFs). Our experiments suggest that RBPFs are more robust compared to other existing methods and run considerably faster. Nevertheless, their performance suffers in the presence of outliers. In order to overcome this limitation, we propose an augmentation of RBPFs with: (a) Gaussian Sum Filters for landmark initialization and (b) an online, unsupervised outlier rejection policy. This framework exhibits impressive robustness and efficiency even in the presence of outliers. |
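As a rough illustration of an online outlier gate of this flavor (a simplification, not the paper's policy), a bearing measurement can be rejected when it is implausible under the current particle set:

```python
# Gate a bearing measurement by its average likelihood over the particle set.
import math

def bearing_likelihood(particle, landmark, measured_bearing, sigma=0.05):
    px, py, ptheta = particle                      # particle pose (x, y, heading)
    expected = math.atan2(landmark[1] - py, landmark[0] - px) - ptheta
    err = math.atan2(math.sin(measured_bearing - expected),
                     math.cos(measured_bearing - expected))  # wrap to [-pi, pi]
    return math.exp(-0.5 * (err / sigma) ** 2)

def is_outlier(particles, landmark, measured_bearing, threshold=1e-3):
    avg = sum(bearing_likelihood(p, landmark, measured_bearing)
              for p in particles) / len(particles)
    return avg < threshold   # too unlikely under every particle: reject it
```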
Bekris, K; Argyros, A; Kavraki, L: Exploiting Panoramic Vision for Angle-Based Robot Homing. Book chapter in Lecture Notes in Computer Science, vol. 33, Springer, 2006. URL: http://www.cs.rutgers.edu/~kb572/pubs/omnidirectional_homing.pdf
Abstract: Omni-directional vision allows for the development of techniques for mobile robot navigation that have minimum perceptual requirements. In this work, we focus on robot navigation algorithms that do not require range information or metric maps of the environment. More specifically, we present a homing strategy that enables a robot to return to its home position after executing a long path. The proposed strategy relies on measuring the angle between pairs of features extracted from panoramic images, which can be achieved accurately and robustly. At the heart of the proposed homing strategy lies a novel, local control law that enables a robot to reach any position on the plane by exploiting the bearings of at least three landmarks of unknown position, without making assumptions regarding the robot's orientation and without making use of a compass. This control law is the result of the unification of two other local control laws which guide the robot by monitoring the bearing of landmarks and which are able to reach complementary sets of goal positions on the plane. Long-range homing is then realized through the systematic application of the unified control law between automatically extracted milestone positions connecting the robot's current position to the home position. Experimental results, conducted both in a simulated environment and on a robotic platform equipped with a panoramic camera, validate the employed local control laws as well as the overall homing strategy. Moreover, they show that panoramic vision can assist in simplifying the perceptual processes required to support robust and accurate homing behaviors. |
2005 |
Argyros, A; Bekris, K; Orphanoudakis, S; Kavraki, L: Robot Homing by Exploiting Panoramic Vision. Autonomous Robots, 19 (1), pp. 7–25, 2005. URL: http://www.cs.rutgers.edu/~kb572/pubs/robot_homing_panoramic.pdf
Abstract: We propose a novel, vision-based method for robot homing, the problem of computing a route so that a robot can return to its initial "home" position after the execution of an arbitrary "prior" path. The method assumes that the robot tracks visual features in panoramic views of the environment that it acquires as it moves. By exploiting only angular information regarding the tracked features, a local control strategy moves the robot between two positions, provided that there are at least three features that can be matched in the panoramas acquired at these positions. The strategy is successful when certain geometric constraints on the configuration of the two positions relative to the features are fulfilled. In order to achieve long-range homing, the features' trajectories are organized in a visual memory during the execution of the "prior" path. When homing is initiated, the robot selects Milestone Positions (MPs) on the "prior" path by exploiting information in its visual memory. The MP selection process aims at picking positions that guarantee the success of the local control strategy between two consecutive MPs. The sequence of successive MPs successfully guides the robot even if the visual context in the "home" position is radically different from the visual context at the position where homing was initiated. Experimental results from a prototype implementation of the method demonstrate that homing can be achieved with high accuracy, independent of the distance traveled by the robot. The contribution of this work is that it shows how a complex navigational task such as homing can be accomplished efficiently, robustly and in real-time by exploiting primitive visual cues. Such cues carry implicit information regarding the 3D structure of the environment. Thus, the computation of explicit range information and the existence of a geometric map are not required. |
2004 |
Bekris, K; Argyros, A; Kavraki, L: Angle-Based Methods for Mobile Robot Navigation: Reaching the Entire Plane. IEEE International Conference on Robotics and Automation (ICRA), New Orleans, LA, 2004. URL: http://www.cs.rutgers.edu/~kb572/pubs/angle_based_navigation.pdf
Abstract: Popular approaches for mobile robot navigation involve range information and metric maps of the workspace. For many sensors, however, such as cameras and wireless hardware, the angle between two extracted features or beacons is easier to measure. With these sensors' features in mind, this paper initially presents a control law, which allows a robot equipped with an omni-directional sensor to reach a subset of the plane by monitoring the angles of only three landmarks. By analyzing the properties of this law, a second law has been developed that reaches the complementary set of points. The two methods are then combined in a path planning framework that reaches any possible goal configuration in a planar obstacle-free workspace with three landmarks. The proposed framework could be combined with other techniques, such as obstacle avoidance and topological maps, to improve the efficiency of autonomous navigation. Experiments conducted on a robotic platform equipped with a panoramic camera exhibit the effectiveness and accuracy of the proposed techniques. This work provides evidence that navigational tasks can be performed using only a small number of primitive sensor cues and without the explicit computation of range information. |
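For intuition only, here is a naive bearing-only homing sketch: at each step it numerically picks the small motion that best reduces the mismatch between the current and goal landmark bearings. Unlike the control laws analyzed in the paper, this greedy scheme carries no convergence guarantee, and landmark positions are used only to simulate the bearing measurements:

```python
import math

def bearings(pos, landmarks):
    return [math.atan2(ly - pos[1], lx - pos[0]) for lx, ly in landmarks]

def bearing_error(current, goal):
    # Sum of squared, angle-wrapped differences between two bearing sets.
    return sum(math.atan2(math.sin(c - g), math.cos(c - g)) ** 2
               for c, g in zip(current, goal))

def homing_step(pos, goal_bearings, landmarks, step=0.05):
    # Greedily move along whichever of 16 candidate headings most reduces
    # the bearing mismatch (a crude numerical descent, which can stall).
    candidates = [(pos[0] + step * math.cos(a), pos[1] + step * math.sin(a))
                  for a in (k * math.pi / 8 for k in range(16))]
    return min(candidates,
               key=lambda p: bearing_error(bearings(p, landmarks), goal_bearings))

landmarks = [(0.0, 5.0), (4.0, -1.0), (-3.0, 2.0)]   # three visible landmarks
goal_bearings = bearings((1.0, 1.0), landmarks)      # bearings recorded at home
pos = (4.0, 4.0)                                     # where homing is triggered
for _ in range(200):
    pos = homing_step(pos, goal_bearings, landmarks)
print(pos)   # inspect how close the final position is to home (1.0, 1.0)
```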
2001 |
Argyros, A; Bekris, K; Orphanoudakis, S: Robot Homing Based on Corner Tracking in a Sequence of Panoramic Images. Computer Vision and Pattern Recognition Conference (CVPR), Hawaii, USA, 2001. URL: http://www.cs.rutgers.edu/~kb572/pubs/homing_corner_tracking.pdf
Abstract: In robotics, homing can be defined as that behavior which enables a robot to return to its initial (home) position, after traveling a certain distance along an arbitrary path. Odometry has traditionally been used for the implementation of such a behavior, but it has been shown to be an unreliable source of information. In this work, a novel method for visual homing is proposed, based on a panoramic camera. As the robot departs from its initial position, it tracks characteristic features of the environment (corners). As soon as homing is activated, the robot selects intermediate target positions on the original path. These intermediate positions (IPs) are then visited sequentially, until the home position is reached. For the robot to move between two consecutive IPs, it is only required to establish correspondence among at least three corners. This correspondence is obtained through a feature tracking mechanism. The proposed homing scheme is based on the extraction of very low-level sensory information, namely the bearing angles of corners, and has been implemented on a robotic platform. Experimental results show that the proposed scheme achieves homing with a remarkable accuracy, which is not affected by the distance traveled by the robot. |