2024

Bekris, K; Doerr, J; Meng, P; Tangirala, S: "The State of Robot Motion Generation". In: International Symposium of Robotics Research (ISRR), Long Beach, California, 2024. Links: https://arxiv.org/abs/2410.12172 | https://pracsys.cs.rutgers.edu/papers/the-state-of-robot-motion-generation/ Tags: Dynamics, Learning, Planning
Abstract: This paper first reviews the broad spectrum of methods for generating robot motion proposed over 50 years of robotics research, culminating in recent developments. It crosses the boundaries of methodologies that are typically not surveyed together, from those that operate over explicit models to those that learn implicit ones. The paper concludes with a discussion of the current state of the art and the properties of the varying methodologies, highlighting opportunities for integration.
2023

Lu, S; Deng, Y; Boularias, A; Bekris, K: "Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos". In: IEEE International Conference on Robotics and Automation (ICRA), London, UK, 2023. Link: https://arxiv.org/abs/2304.04325 Tags: Learning, Perception
Abstract: This work proposes a self-supervised learning system for segmenting rigid objects in RGB images. The proposed pipeline is trained on unlabeled RGB-D videos of static objects, which can be captured with a camera carried by a mobile robot. A key feature of the self-supervised training process is a graph-matching algorithm that operates on the over-segmentation output of the point cloud reconstructed from each video. The graph matching, along with point cloud registration, finds recurring object patterns across videos and combines them into 3D object pseudo-labels, even under occlusions or different viewing angles. Projected 2D object masks from the 3D pseudo-labels are used to train a pixel-wise feature extractor through contrastive learning. During online inference, a clustering method uses the learned features to cluster foreground pixels into object segments. Experiments highlight the method's effectiveness on both real and synthetic video datasets, which include cluttered scenes of tabletop objects. The proposed method outperforms existing unsupervised methods for object segmentation by a large margin.
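The online inference step described in this abstract (clustering learned per-pixel features into object segments) can be illustrated with a minimal sketch. The feature array, foreground mask, and DBSCAN parameters below are placeholder assumptions for illustration, not the paper's actual components.

```python
# Minimal sketch of clustering learned per-pixel features into object segments.
# The features and mask here are random stand-ins (assumptions), not the paper's outputs.
import numpy as np
from sklearn.cluster import DBSCAN

def segment_objects(features, foreground_mask, eps=0.5, min_samples=50):
    """Cluster per-pixel embeddings (H x W x D) of foreground pixels into segment labels."""
    h, w, d = features.shape
    fg = foreground_mask.reshape(-1)                  # (H*W,) boolean foreground mask
    embeddings = features.reshape(-1, d)[fg]          # embeddings of foreground pixels only
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embeddings)
    segmentation = np.full(h * w, -1, dtype=np.int32) # -1 marks background / noise
    segmentation[fg] = labels
    return segmentation.reshape(h, w)

# Toy usage with random embeddings standing in for the learned features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 64, 16)).astype(np.float32)
mask = np.zeros((64, 64), dtype=bool)
mask[10:30, 10:30] = True
print(np.unique(segment_objects(feats, mask)))
```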
2022

McMahon, T; Sivaramakrishnan, A; Kedia, K; Granados, E; Bekris, K: "Terrain-Aware Learned Controllers for Sampling-Based Kinodynamic Planning Over Physically Simulated Terrains". In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022. Link: https://ieeexplore.ieee.org/document/9982136 Tags: Dynamics, Learning, Planning
Abstract: This paper explores learning an effective controller for improving the efficiency of kinodynamic planning for vehicular systems navigating uneven terrains. It describes the pipeline for training the corresponding controller and using it for motion planning purposes. The training process uses a soft actor-critic approach with hindsight experience replay to train a model parameterized by the incline of the robot's local terrain. This trained model is then used during the expansion process of an asymptotically optimal kinodynamic planner to generate controls that allow the robot to reach desired local states. It is also used to define a heuristic cost-to-go function for the planner via a wavefront operation that estimates the cost of reaching the global goal. The cost-to-go function is used both for selecting nodes for expansion and for generating local goals for the controller to expand towards. The accompanying experimental section applies the integrated planning solution to models of all-terrain robots in a variety of physically simulated terrains. It shows that the proposed terrain-aware controller and the wavefront-based cost-to-go function enable motion planners to find solutions in less time and with lower cost than alternatives. An ablation study emphasizes the benefits of a learned controller parameterized by the incline of the robot's local terrain, as well as of an incremental training process for the controller.
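The wavefront cost-to-go mentioned in this abstract can be sketched, in simplified form, as a breadth-first sweep from the goal over a 2D occupancy grid. The grid, 4-connectivity, and uniform step cost are assumptions of this sketch rather than the paper's exact formulation, which operates over physically simulated terrain.

```python
# A minimal wavefront-style cost-to-go over a 2D occupancy grid (assumed setup,
# not the paper's implementation). Infinity marks occupied or unreachable cells.
from collections import deque
import numpy as np

def wavefront_cost_to_go(occupancy, goal):
    """Breadth-first wavefront expanding outward from the goal cell."""
    cost = np.full(occupancy.shape, np.inf)
    cost[goal] = 0.0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < occupancy.shape[0] and 0 <= nc < occupancy.shape[1]
                    and not occupancy[nr, nc] and cost[nr, nc] == np.inf):
                cost[nr, nc] = cost[r, c] + 1.0   # uniform step cost
                queue.append((nr, nc))
    return cost

grid = np.zeros((8, 8), dtype=bool)
grid[3, 1:6] = True                                # a wall the wavefront must flow around
print(wavefront_cost_to_go(grid, goal=(7, 7))[0, 0])   # cost-to-go from a start cell
```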
Wen, B; Lian, W; Bekris, K; Schaal, S: "You Only Demonstrate Once: Category-Level Manipulation from Single Visual Demonstration". In: Robotics: Science and Systems (RSS), 2022. (Nominated for Best Paper Award.) Link: https://www.roboticsproceedings.org/rss18/p044.pdf Tags: Learning, Manipulation, Perception
Abstract: Promising results have been achieved recently in category-level manipulation that generalizes across object instances. Nevertheless, it often requires expensive real-world data collection and manual specification of semantic keypoints for each object category and task. Additionally, coarse keypoint predictions and ignoring intermediate action sequences hinder adoption in complex manipulation tasks beyond pick-and-place. This work proposes a novel category-level manipulation framework that leverages an object-centric, category-level representation and model-free 6 DoF motion tracking. The canonical object representation is learned solely in simulation and then used to parse a category-level task trajectory from a single demonstration video. The demonstration is reprojected to a target trajectory tailored to a novel object via the canonical representation. During execution, the manipulation horizon is decomposed into long-range, collision-free motion and last-inch manipulation. For the latter part, a category-level behavior cloning (CatBC) method leverages motion tracking to perform closed-loop control. CatBC follows the target trajectory, projected from the demonstration and anchored to a dynamically selected category-level coordinate frame. The frame is automatically selected along the manipulation horizon by a local attention mechanism. This framework allows teaching different manipulation strategies by solely providing a single demonstration, without complicated manual programming. Extensive experiments demonstrate its efficacy in a range of challenging industrial tasks in high-precision assembly, which involve learning complex, long-horizon policies. The process exhibits robustness against uncertainty due to dynamics as well as generalization across object instances and scene configurations.
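The reprojection of a demonstrated trajectory onto a novel object, as described above, can be illustrated with a minimal transform-composition sketch. Representing poses as 4x4 homogeneous matrices and ignoring the learned canonical representation and 6 DoF tracking are simplifying assumptions of this sketch.

```python
# Minimal sketch of re-anchoring demonstrated waypoints to a novel object instance:
# poses recorded relative to a canonical object frame are re-expressed in the new
# object's frame. The trajectory and pose values are illustrative placeholders.
import numpy as np

def retarget_trajectory(demo_poses_in_canonical, T_world_new_object):
    """demo_poses_in_canonical: list of 4x4 waypoint poses in the canonical object frame."""
    return [T_world_new_object @ T for T in demo_poses_in_canonical]

T_new = np.eye(4)
T_new[:3, 3] = [0.4, 0.1, 0.2]                   # novel object's pose in the world frame
demo = [np.eye(4) for _ in range(3)]             # placeholder demonstration waypoints
print(retarget_trajectory(demo, T_new)[0][:3, 3])
```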
2021

Wang, K; Aanjaneya, M; Bekris, K: "Sim2Sim Evaluation of a Novel Data-Efficient Differentiable Physics Engine for Tensegrity Robots". In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021. Link: https://arxiv.org/abs/2011.04929 Tags: Dynamics, Learning, Soft-Robots
Abstract: Learning policies in simulation is promising for reducing the human effort of training robot controllers. This is especially true for soft robots, which are more adaptive and safe but also more difficult to accurately model and control. The sim2real gap is the main barrier to successfully transferring policies from simulation to a real robot. System identification can be applied to reduce this gap, but traditional identification methods require a lot of manual tuning. Data-driven alternatives can tune dynamical models directly from data but are often data hungry, which again requires human effort for data collection. This work proposes a data-driven, end-to-end differentiable simulator focused on the exciting but challenging domain of tensegrity robots. To the best of the authors' knowledge, this is the first differentiable physics engine for tensegrity robots that supports cable, contact, and actuation modeling. The aim is to develop a reasonably simplified, data-driven simulation that can learn approximate dynamics with limited ground-truth data. The dynamics must be accurate enough to generate policies that can be transferred back to the ground-truth system. As a first step in this direction, the current work demonstrates sim2sim transfer, where the unknown physical model of MuJoCo acts as the ground-truth system. Two different tensegrity robots are used for evaluation and learning of locomotion policies, a 6-bar and a 3-bar tensegrity. The results indicate that, when the differentiable engine is used for training, only 0.25% of the ground-truth data are needed to obtain a policy that works on the ground-truth system, compared to training the policy directly on that system.
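A toy illustration of the differentiable-simulation idea in this abstract: a single tension-only cable model whose stiffness and rest length are identified by gradient descent through the force computation. The cable model, synthetic data, and optimizer settings are assumptions for illustration; the actual engine also models contact and actuation.

```python
# Toy sketch of identifying cable parameters by backpropagating through a
# differentiable force model. Values and model form are illustrative assumptions.
import torch

def cable_tension(length, stiffness, rest_length):
    """Tension-only spring force along a cable (damping omitted for brevity)."""
    return stiffness * torch.clamp(length - rest_length, min=0.0)

# Synthetic "ground truth" observations generated from hidden parameters.
true_k, true_l0 = 120.0, 0.5
lengths = torch.linspace(0.4, 0.9, 50)
observed = cable_tension(lengths, true_k, true_l0)

# Learnable parameters, fit by gradient descent through the force model.
k = torch.tensor(50.0, requires_grad=True)
l0 = torch.tensor(0.3, requires_grad=True)
optimizer = torch.optim.Adam([k, l0], lr=0.05)
for _ in range(3000):
    optimizer.zero_grad()
    loss = torch.mean((cable_tension(lengths, k, l0) - observed) ** 2)
    loss.backward()
    optimizer.step()
print(float(k), float(l0))   # expected to approach 120.0 and 0.5
```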
2020

Wen, B; Mitash, C; Ren, B; Bekris, K: "se(3)-TrackNet: Data-Driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains". In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, 2020. Link: http://arxiv.org/abs/2007.13866 Tags: Learning, Perception
Abstract: Tracking the 6D pose of objects in video sequences is important for robot manipulation. This task, however, introduces multiple challenges: (i) robot manipulation involves significant occlusions; (ii) data and annotations for 6D poses are troublesome and difficult to collect, which complicates machine learning solutions; and (iii) incremental error drift often accumulates in long-term tracking, necessitating re-initialization of the object's pose. This work proposes a data-driven optimization approach for long-term, 6D pose tracking. It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model. The key contributions in this context are a novel neural network architecture, which appropriately disentangles the feature encoding to help reduce domain shift, and an effective 3D orientation representation via Lie algebra. Consequently, even when the network is trained only with synthetic data, it works effectively over real images. Comprehensive experiments over benchmarks - existing ones as well as a new dataset with significant occlusions related to object manipulation - show that the proposed approach achieves consistently robust estimates and outperforms alternatives, even though they have been trained with real images. The approach is also the most computationally efficient among the alternatives and achieves a tracking frequency of 90.9 Hz.
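The Lie-algebra orientation representation mentioned in this abstract can be illustrated with a minimal pose-update sketch: a small relative motion, assumed here to be predicted as an axis-angle rotation plus a translation, is composed onto the previous pose estimate. Handling rotation and translation separately is a simplification of the full se(3) exponential map, and the values below are placeholders.

```python
# Minimal sketch of composing a predicted relative motion onto the previous pose
# estimate. The "predicted" rotation vector and translation are illustrative only.
import numpy as np
from scipy.spatial.transform import Rotation as R

def apply_relative_pose(T_prev, rotvec, translation):
    """Compose a small relative motion (axis-angle + translation) onto a 4x4 pose."""
    dT = np.eye(4)
    dT[:3, :3] = R.from_rotvec(rotvec).as_matrix()   # exponential map of so(3)
    dT[:3, 3] = translation
    return T_prev @ dT

T0 = np.eye(4)                                        # previous best pose estimate
T1 = apply_relative_pose(T0, rotvec=[0.0, 0.0, 0.05], translation=[0.001, 0.0, 0.002])
print(np.round(T1, 4))
```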
2019

Mitash, C; Wen, B; Bekris, K; Boularias, A: "Scene-Level Pose Estimation for Multiple Instances of Densely Packed Objects". In: Conference on Robot Learning (CoRL), Osaka, Japan, 2019. Link: https://arxiv.org/pdf/1910.04953.pdf Tags: Learning, Perception
Abstract: This paper introduces key machine learning operations that allow the realization of robust, joint 6D pose estimation of multiple instances of objects either densely packed or in unstructured piles from RGB-D data. The first objective is to learn semantic and instance-boundary detectors without manual labeling. An adversarial training framework in conjunction with physics-based simulation is used to achieve detectors that behave similarly in synthetic and real data. Given the stochastic output of such detectors, candidates for object poses are sampled. The second objective is to automatically learn a single score for each pose candidate that represents its quality in terms of explaining the entire scene via a gradient boosted tree. The proposed method uses features derived from surface and boundary alignment between the observed scene and the object model placed at hypothesized poses. Scene-level, multi-instance pose estimation is then achieved by an integer linear programming process that selects hypotheses that maximize the sum of the learned individual scores, while respecting constraints, such as avoiding collisions. To evaluate this method, a dataset of densely packed objects with challenging setups for state-of-the-art approaches is collected. Experiments on this dataset and a public one show that the method significantly outperforms alternatives in terms of 6D pose accuracy while trained only with synthetic datasets.
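The integer linear programming selection step described in this abstract can be sketched as follows: binary variables pick pose hypotheses that maximize the sum of learned scores subject to pairwise collision constraints. The scores, collision pairs, and use of scipy's MILP solver are illustrative assumptions, not the paper's implementation.

```python
# Small sketch of hypothesis selection as an integer linear program: maximize total
# learned score while forbidding mutually colliding hypothesis pairs. All numbers
# below are made-up placeholders.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

scores = np.array([0.9, 0.8, 0.6, 0.4])          # learned per-hypothesis scores
colliding_pairs = [(0, 1), (2, 3)]               # hypothesis pairs that overlap in space

n = len(scores)
A = np.zeros((len(colliding_pairs), n))
for row, (i, j) in enumerate(colliding_pairs):
    A[row, i] = A[row, j] = 1.0                  # encode x_i + x_j <= 1

result = milp(
    c=-scores,                                    # milp minimizes, so negate the scores
    constraints=LinearConstraint(A, ub=np.ones(len(colliding_pairs))),
    integrality=np.ones(n),                       # binary selection variables
    bounds=Bounds(0, 1),
)
print(np.round(result.x))                         # selects hypotheses 0 and 2 here
```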