Object Finding in Cluttered Scenes Using Interactive Perception
Object finding in clutter is a skill that requires perceiving the environment and, in many cases, physically interacting with it. In robotics, interactive perception denotes a class of algorithms that leverage actions to improve perception of the environment and, conversely, use perception to guide the next action. Because scene interactions are difficult to model, most current systems rely on predefined heuristics, which limits their ability to search efficiently for the target object in a complex environment. To remove heuristics and the need for explicit models of the interactions, in this work we propose a reinforcement-learning-based active and interactive perception system for scene exploration and object search. We evaluate our work in both simulated and real-world experiments using a robotic manipulator equipped with an RGB and a depth camera, and compare our system to two baselines. The results indicate that our approach, trained in simulation only, transfers smoothly to reality and can solve the object-finding task efficiently and with a success rate of more than 88%.
Comment: IEEE International Conference on Robotics and Automation (ICRA), 202
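The perception-action loop described here can be sketched compactly: a learned policy looks at the current RGB-D observation and picks the next exploratory or interactive action, and acting in turn changes what the camera sees. Below is a minimal sketch under stated assumptions; the SearchPolicy network, the action set, and the env interface are hypothetical stand-ins for illustration, not the authors' system.

```python
# A hypothetical sketch of an RL-driven interactive search loop.
# The environment object and the action set are illustrative assumptions.
import torch
import torch.nn as nn

class SearchPolicy(nn.Module):
    """Scores discrete actions (e.g. move the camera, push an occluder)
    from a fused RGB-D observation."""
    def __init__(self, n_actions: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=5, stride=2), nn.ReLU(),  # 4 channels: RGB + depth
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_actions),
        )

    def forward(self, rgbd: torch.Tensor) -> torch.Tensor:
        return self.net(rgbd)  # (batch, n_actions) action scores

def find_object(env, policy: SearchPolicy, max_steps: int = 50) -> bool:
    """Closed perception-action loop: act to see more, see to act better."""
    obs = env.reset()  # RGB-D tensor of shape (1, 4, H, W)
    for _ in range(max_steps):
        with torch.no_grad():
            action = policy(obs).argmax(dim=-1).item()
        obs, target_found = env.step(action)  # interaction changes the scene
        if target_found:
            return True
    return False
```

The key property this illustrates is the closed loop: each action is chosen to improve perception, and each new observation informs the next action.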
Multimodal Sensor Fusion with Differentiable Filters
Leveraging multimodal information with recursive Bayesian filters improves
performance and robustness of state estimation, as recursive filters can
combine different modalities according to their uncertainties. Prior work has
studied how to optimally fuse different sensor modalities with analytical state
estimation algorithms. However, deriving the dynamics and measurement models along with their noise profiles can be difficult or lead to intractable models.
Differentiable filters provide a way to learn these models end-to-end while
retaining the algorithmic structure of recursive filters. This can be
especially helpful when working with sensor modalities that are high-dimensional and have very different characteristics. In contact-rich
manipulation, we want to combine visual sensing (which gives us global
information) with tactile sensing (which gives us local information). In this
paper, we study new differentiable filtering architectures to fuse
heterogeneous sensor information. As case studies, we evaluate three tasks: two
in planar pushing (simulated and real) and one in manipulating a kinematically
constrained door (simulated). In extensive evaluations, we find that differentiable filters that leverage crossmodal sensor information reach accuracies comparable to those of unstructured LSTM models, while offering interpretability benefits that may be important for safety-critical systems. We also release an open-source library for creating and training differentiable Bayesian filters in PyTorch, which can be found on our project website:
https://sites.google.com/view/multimodalfilter
Comment: Published in IROS 2020. Updated sponsors, fixed Kalman gain typo.
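To make the uncertainty-weighted fusion concrete, here is a minimal differentiable Kalman filter in PyTorch that fuses two measurement streams (say, vision and touch) through learned noise parameters. The class, the shared linear observation model, and the sequential per-modality update are simplifying assumptions for illustration; this is not the paper's architecture or the API of the released library.

```python
# A minimal differentiable Kalman filter sketch (illustrative, not the
# paper's architecture or the released library's API). Two modalities
# share one linear observation model but carry separate learned noise,
# so the update weights each measurement by its uncertainty.
import torch
import torch.nn as nn

class DifferentiableKF(nn.Module):
    def __init__(self, state_dim: int = 4, obs_dim: int = 2):
        super().__init__()
        self.A = nn.Parameter(torch.eye(state_dim))        # learned dynamics
        self.log_q = nn.Parameter(torch.zeros(state_dim))  # process noise (diag Q)
        self.H = nn.Parameter(0.1 * torch.randn(obs_dim, state_dim))  # observation model
        self.log_r_vision = nn.Parameter(torch.zeros(obs_dim))   # vision noise (diag R)
        self.log_r_tactile = nn.Parameter(torch.zeros(obs_dim))  # tactile noise (diag R)

    def step(self, mu, Sigma, z_vision, z_tactile):
        # Predict through the learned dynamics.
        mu = self.A @ mu
        Sigma = self.A @ Sigma @ self.A.T + torch.diag(self.log_q.exp())
        # One Kalman update per modality: a measurement with larger
        # learned noise R receives a smaller gain K, i.e. less weight.
        for z, log_r in ((z_vision, self.log_r_vision),
                         (z_tactile, self.log_r_tactile)):
            S = self.H @ Sigma @ self.H.T + torch.diag(log_r.exp())  # innovation cov.
            K = Sigma @ self.H.T @ torch.linalg.inv(S)               # Kalman gain
            mu = mu + K @ (z - self.H @ mu)
            Sigma = (torch.eye(mu.shape[0]) - K @ self.H) @ Sigma
        return mu, Sigma
```

Because every operation is a differentiable torch op, a state-estimation loss can be backpropagated through many step calls, so the dynamics, observation model, and per-modality noise levels are learned end-to-end while the filter retains its recursive Bayesian structure.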