Vision-driven and Learned Manipulation

Publications:


2025

Li, S; Keipour, A; Zhao, S; Rajagopalan, S; Swan, C; Bekris, K

Learning to Optimize Package Picking for Large-Scale, Real-World Robot Induction Conference

19th International Symposium on Experimental Robotics (ISER), 2025.


Wang, C; Vanbaar, J; Mitash, C; Li, S; Randle, D; Keipour, A; Hussein, M; Bekris, K; Katyal, K

Demonstrating Item Picking at Scale via Effective Learning of Multimodal Representations Conference

Proceedings of Robotics: Science and Systems (RSS), 2025.


@conference{wang2025Demonstrating,
title = {Demonstrating Item Picking at Scale via Effective Learning of Multimodal Representations},
author = {C Wang and J Vanbaar and C Mitash and S Li and D Randle and A Keipour and M Hussein and K Bekris and K Katyal},
url = {https://arxiv.org/abs/2506.10359},
year = {2025},
date = {2025-06-21},
booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
abstract = {This work demonstrates how autonomously learning aspects of robotic operation from sparsely-labeled, real-world data of deployed, engineered solutions at industrial scale can provide solutions that achieve improved performance. Specifically, it focuses on multi-suction robot picking and performs a comprehensive study on the application of multimodal visual encoders for predicting the success of candidate robotic picks. Picking diverse items from unstructured piles is an important and challenging task for robot manipulation in real-world settings, such as warehouses. Methods for picking from clutter must work for an open set of items while simultaneously meeting latency constraints to achieve high throughput. The demonstrated approach utilizes multiple input modalities, such as RGB, depth and semantic segmentation, to estimate the quality of candidate multi-suction picks. The strategy is trained from real-world item picking data, with a combination of multimodal pretraining and finetuning. The manuscript provides comprehensive experimental evaluation performed over a large item-picking dataset, an item-picking dataset targeted to include partial occlusions, and a package-picking dataset, which focuses on containers, such as boxes and envelopes, instead of unpackaged items. The evaluation measures performance for different item configurations, pick scenes, and object types. Ablations help to understand the effects of in-domain pretraining, the impact of different modalities and the importance of finetuning. These ablations reveal both the importance of training over multiple modalities and the ability of models to learn during pretraining the relationship between modalities so that during finetuning and inference, only a subset of them can be used as input.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}

Marougkas, I; Ramesh, D; Doerr, J; Granados, E; Sivaramakrishnan, A; Boularias, A; Bekris, K

Integrating Model-based Control and RL for Sim2Real Transfer of Tight Insertion Policies Conference

IEEE International Conference on Robotics and Automation (ICRA), 2025.


@conference{marougkas2025integration,
title = {Integrating Model-based Control and RL for Sim2Real Transfer of Tight Insertion Policies},
author = {I Marougkas and D Ramesh and J Doerr and E Granados and A Sivaramakrishnan and A Boularias and K Bekris},
url = {https://arxiv.org/abs/2505.11858},
year = {2025},
date = {2025-05-01},
booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
abstract = {Object insertion under tight tolerances (<1 mm) is an important but challenging assembly task as even slight errors can result in undesirable contacts. Recent efforts have focused on using Reinforcement Learning (RL) and often depend on careful definition of dense reward functions. This work proposes an effective strategy for such tasks that integrates traditional model-based control with RL to achieve improved accuracy given training of the policy exclusively in simulation and zero-shot transfer to the real system. It employs a potential field-based controller to acquire a model-based policy for inserting a plug into a socket given full observability in simulation. This policy is then integrated with a residual RL one, which is trained in simulation given only a sparse, goal-reaching reward. A curriculum scheme over observation noise and action magnitude is used for training the residual RL policy. Both policy components use as input the SE(3) poses of both the plug and the socket and return the plug's SE(3) pose transform, which is executed by a robotic arm using a controller. The integrated policy is deployed on the real system without further training or fine-tuning, given a visual SE(3) object tracker. The proposed solution and alternatives are evaluated across a variety of objects and conditions in simulation and reality. The proposed approach outperforms recent RL methods in this domain and prior efforts for hybrid policies. Ablations highlight the impact of each component of the approach.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}

2023

Vieira, E; Gao, K; Nakhimovich, D; Bekris, K; Yu, J

Persistent Homology Guided Monte-Carlo Tree Search for Effective Non-Prehensile Manipulation Inproceedings

International Symposium on Experimental Robotics (ISER), 2023.


Li, S; Keipour, A; Jamieson, K; Hudson, N; Swan, C; Bekris, K

Demonstrating Large-Scale Package Manipulation Via Learned Metrics of Pick Success Inproceedings

Robotics: Science and Systems (RSS), Daegu, Korea, 2023.


@inproceedings{Li:2023aa,
title = {Demonstrating Large-Scale Package Manipulation Via Learned Metrics of Pick Success},
author = {S Li and A Keipour and K Jamieson and N Hudson and C Swan and K Bekris},
year = {2023},
date = {2023-07-01},
booktitle = {Robotics: Science and Systems (RSS)},
address = {Daegu, Korea},
abstract = {Automating warehouse operations can reduce logistics overhead costs, ultimately driving down the final price for consumers, increasing the speed of delivery, and enhancing the resiliency to workforce fluctuations. The past few years have seen increased interest in automating such repeated tasks but mostly in controlled settings. Tasks such as picking objects from unstructured, cluttered piles have only recently become robust enough for large-scale deployment with minimal human intervention.
This paper demonstrates large-scale package manipulation from unstructured piles in Amazon Robotics' Robot Induction (Robin) fleet, which utilizes a pick success predictor trained on real production data. Specifically, the system was trained on over 394K picks. It is used for singulating up to 5~million packages per day and has manipulated over 200~million packages during this paper's evaluation period.
The developed learned pick quality measure ranks various pick alternatives in real-time and prioritizes the most promising ones for execution. The pick success predictor aims to estimate from prior experience the success probability of a desired pick by the deployed industrial robotic arms in cluttered scenes containing deformable and rigid objects with partially known properties. It is a shallow machine learning model, which allows us to evaluate which features are most important for the prediction. An online pick ranker leverages the learned success predictor to prioritize the most promising picks for the robotic arm, which are then assessed for collision avoidance. This learned ranking process is demonstrated to overcome the limitations of, and outperform, manually engineered and heuristic alternatives.
To the best of the authors' knowledge, this paper presents the first large-scale deployment of learned pick quality estimation methods in a real production system.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

Nakhimovich, D; Miao, Y; Bekris, K

Resolution Complete In-Place Object Retrieval Given Known Object Models Inproceedings

IEEE International Conference on Robotics and Automation (ICRA), London, UK, 2023.


2022

Wen, B; Lian, W; Bekris, K; Schaal, S

You Only Demonstrate Once: Category-Level Manipulation from Single Visual Demonstration Inproceedings

Robotics: Science and Systems (RSS), 2022, (Nomination for Best Paper Award).


@inproceedings{Wen:2022ab,
title = {You Only Demonstrate Once: Category-Level Manipulation from Single Visual Demonstration},
author = {B Wen and W Lian and K Bekris and S Schaal},
url = {https://www.roboticsproceedings.org/rss18/p044.pdf},
year = {2022},
date = {2022-06-01},
booktitle = {Robotics: Science and Systems (RSS)},
abstract = {Promising results have been achieved recently in category-level manipulation that generalizes across object instances. Nevertheless, it often requires expensive real-world data collection and manual specification of semantic keypoints for each object category and task. Additionally, coarse keypoint predictions and ignoring intermediate action sequences hinder adoption in complex manipulation tasks beyond pick-and-place. This work proposes a novel, category-level manipulation framework that leverages an object-centric, category-level representation and model-free 6 DoF motion tracking. The canonical object representation is learned solely in simulation and then used to parse a category-level, task trajectory from a single demonstration video. The demonstration is reprojected to a target trajectory tailored to a novel object via the canonical representation. During execution, the manipulation horizon is decomposed into long range, collision-free motion and last-inch manipulation. For the latter part, a category-level behavior cloning (CatBC) method leverages motion tracking to perform closed-loop control. CatBC follows the target trajectory, projected from the demonstration and anchored to a dynamically selected category-level coordinate frame. The frame is automatically selected along the manipulation horizon by a local attention mechanism. This framework allows teaching different manipulation strategies by solely providing a single demonstration, without complicated manual programming. Extensive experiments demonstrate its efficacy in a range of challenging industrial tasks in high precision assembly, which involve learning complex, long-horizon policies. The process exhibits robustness against uncertainty due to dynamics as well as generalization across object instances and scene configurations.},
note = {Nomination for Best Paper Award},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

Wang, R; Gao, K; Yu, J; Bekris, K

Lazy Rearrangement Planning in Confined Spaces Inproceedings

International Conference on Automated Planning and Scheduling (ICAPS), 2022.


@inproceedings{Wang:2022ac,
title = {Lazy Rearrangement Planning in Confined Spaces},
author = {R Wang and K Gao and J Yu and K Bekris},
url = {https://arxiv.org/abs/2203.10379},
year = {2022},
date = {2022-06-01},
booktitle = {International Conference on Automated Planning and Scheduling (ICAPS)},
abstract = {Object rearrangement is important for many applications but remains challenging, especially in confined spaces, such as shelves, where objects cannot be accessed from above and they block reachability to each other. Such constraints require many motion planning and collision checking calls, which are computationally expensive. In addition, the arrangement space grows exponentially with the number of objects. To address these issues, this work introduces a lazy evaluation framework with a local monotone solver and a global planner. Monotone instances are those that can be solved by moving each object at most once. A key insight is that reachability constraints at the grasps for objects' starts and goals can quickly reveal dependencies between objects without having to execute expensive motion planning queries. Given that, the local solver builds lazily a search tree that respects these reachability constraints without verifying that the arm paths are collision free. It only collision checks when a promising solution is found. If a monotone solution is not found, the non-monotone planner loads the lazy search tree and explores ways to move objects to intermediate locations from where monotone solutions to the goal can be found. Results show that the proposed framework can solve difficult instances in confined spaces with up to 16 objects, which state-of-the-art methods fail to solve. It also solves problems faster than alternatives, when the alternatives find a solution. It also achieves high-quality solutions, i.e., only 1.8 additional actions on average are needed for non-monotone instances.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

Lu, S; Wang, R; Miao, Y; Mitash, C; Bekris, K

Online Object Model Reconstruction and Reuse for Lifelong Improvement of Robot Manipulation Inproceedings

IEEE International Conference on Robotics and Automation (ICRA), 2022, (Nomination for Best Paper Award in Manipulation).


@inproceedings{Lu:2022ab,
title = {Online Object Model Reconstruction and Reuse for Lifelong Improvement of Robot Manipulation},
author = {S Lu and R Wang and Y Miao and C Mitash and K Bekris},
url = {https://arxiv.org/abs/2109.13910},
year = {2022},
date = {2022-05-01},
booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
abstract = {This work proposes a robotic pipeline for picking and constrained placement of objects without geometric shape priors. Compared to recent efforts developed for similar tasks, where every object was assumed to be novel, the proposed system recognizes previously manipulated objects and performs online model reconstruction and reuse. Over a lifelong manipulation process, the system keeps learning features of objects it has interacted with and updates their reconstructed models. Whenever an instance of a previously manipulated object reappears, the system aims to first recognize it and then register its previously reconstructed model given the current observation. This step greatly reduces object shape uncertainty allowing the system to even reason for parts of objects, which are currently not observable. This also results in better manipulation efficiency as it reduces the need for active perception of the target object during manipulation. To get a reusable reconstructed model, the proposed pipeline adopts: i) TSDF for object representation, and ii) a variant of the standard particle filter algorithm for pose estimation and tracking of the partial object model. Furthermore, an effective way to construct and maintain a dataset of manipulated objects is presented. A sequence of real-world manipulation experiments is performed. They show how future manipulation tasks become more effective and efficient by reusing reconstructed models of previously manipulated objects, which were generated during their prior manipulation, instead of treating objects as novel every time.},
note = {Nomination for Best Paper Award in Manipulation},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

Vieira, E; Nakhimovich, D; Gao, K; Wang, R; Yu, J; Bekris, K

Persistent Homology for Effective Non-Prehensile Manipulation Inproceedings