Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots
Improving the generalization capabilities of general-purpose robotic agents
has long been a significant challenge actively pursued by research communities.
Existing approaches often rely on collecting large-scale real-world robotic
data, such as the RT-1 dataset. However, these approaches typically suffer
from low efficiency, which limits their capability in open-domain scenarios
with new objects and diverse backgrounds. In this paper, we propose a novel
paradigm
that effectively leverages language-grounded segmentation masks generated by
state-of-the-art foundation models to address a wide range of pick-and-place
robot manipulation tasks in everyday scenarios. By integrating the precise
semantics and geometry conveyed by these masks into our multi-view policy
model, our approach perceives accurate object poses and enables
sample-efficient learning. Moreover, this design facilitates effective
generalization to grasping new objects whose shapes resemble those observed
during training. Our approach
consists of two distinct steps. First, we introduce a series of foundation
models to accurately ground natural language demands across multiple tasks.
Second, we develop a Multi-modal Multi-view Policy Model that incorporates
inputs such as RGB images, semantic masks, and robot proprioception states to
jointly predict precise and executable robot actions. Extensive real-world
experiments conducted on a Franka Emika robot arm validate the effectiveness of
our proposed paradigm. Real-world demos are available on YouTube
(https://www.youtube.com/watch?v=1m9wNzfp_4E) and Bilibili
(https://www.bilibili.com/video/BV178411Z7H2/).
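
The abstract does not detail the policy architecture, so the following is a
minimal, hypothetical PyTorch sketch of the kind of fusion it describes: a
shared CNN encodes each camera view with the language-grounded mask stacked
onto the RGB channels, and an MLP head maps the concatenated view features
plus proprioception to an action. All layer sizes and names here are
illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class MultiViewPolicy(nn.Module):
        # Hypothetical sketch: a shared CNN encodes each camera view, where
        # the RGB image is stacked with its language-grounded mask as a 4th
        # channel; an MLP fuses all view features with proprioception.
        def __init__(self, num_views=2, proprio_dim=7, action_dim=7, feat_dim=128):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(4, 32, 5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            )
            self.head = nn.Sequential(
                nn.Linear(num_views * feat_dim + proprio_dim, 256), nn.ReLU(),
                nn.Linear(256, action_dim),
            )

        def forward(self, views, proprio):
            # views: (B, V, 4, H, W); proprio: (B, proprio_dim)
            b, v = views.shape[:2]
            feats = self.encoder(views.flatten(0, 1)).view(b, v, -1).flatten(1)
            return self.head(torch.cat([feats, proprio], dim=-1))

    policy = MultiViewPolicy()
    action = policy(torch.randn(1, 2, 4, 128, 128), torch.randn(1, 7))
    print(action.shape)  # torch.Size([1, 7])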
Interactron: Embodied Adaptive Object Detection
Over the years, various methods have been proposed for the problem of object
detection. Recently, we have witnessed great strides in this domain owing to
the emergence of powerful deep neural networks. However, there are typically
two main assumptions common among these approaches. First, the model is trained
on a fixed training set and is evaluated on a pre-recorded test set. Second,
the model is kept frozen after the training phase, with no further updates
performed thereafter. These two assumptions limit the
applicability of these methods to real-world settings. In this paper, we
propose Interactron, a method for adaptive object detection in an interactive
setting, where the goal is to perform object detection in images observed by an
embodied agent navigating in different environments. Our idea is to continue
training during inference, adapting the model at test time through
interaction with the environment, without any explicit supervision. Our adaptive object detection
model provides an 11.8-point improvement in AP (and 19.1 points in AP50) over
DETR, a recent, high-performance object detector. Moreover, we show that our
object detection model adapts to environments with completely different
appearance characteristics, and its performance is on par with a model trained
with full supervision for those environments. The code is available at:
https://github.com/allenai/interactron. Comment: CVPR 2022
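
The paper's actual adaptation objective is learned rather than hand-crafted,
so the loop below is only a generic illustration of test-time adaptation for
detection: take a few gradient steps on a label-free loss computed over
frames the agent gathers by moving, then detect on the latest observation.
The adaptation_loss hook and toy consistency objective are hypothetical
stand-ins, not the method from the paper.

    import torch

    def adapt_and_detect(detector, adaptation_loss, frames, lr=1e-4, steps=5):
        # Take a few gradient steps on a label-free objective computed over
        # the frames the agent gathered, then detect on the latest one.
        opt = torch.optim.SGD(detector.parameters(), lr=lr)
        detector.train()
        for _ in range(steps):
            opt.zero_grad()
            loss = adaptation_loss(detector, frames)  # no ground-truth boxes
            loss.backward()
            opt.step()
        detector.eval()
        with torch.no_grad():
            return detector(frames[-1])

    # Toy usage with a stand-in "detector" (a linear map) and a prediction-
    # consistency objective across frames; real use would plug in a detector
    # such as DETR and a learned adaptation loss.
    det = torch.nn.Linear(8, 4)
    frames = [torch.randn(1, 8) for _ in range(3)]
    consistency = lambda d, fs: sum((d(fs[i]) - d(fs[i + 1])).pow(2).mean()
                                    for i in range(len(fs) - 1))
    print(adapt_and_detect(det, consistency, frames))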
Learning object, grasping and manipulation activities using hierarchical HMMs
This article presents a probabilistic algorithm for representing and learning complex manipulation activities performed by humans in everyday life. The work builds on the multi-level Hierarchical Hidden Markov Model (HHMM) framework, which allows longer-term complex manipulation activities to be decomposed into layers of abstraction whose building blocks are simpler action modules called action primitives. In this way, human task knowledge can be synthesised into a compact, effective representation suitable, for instance, for subsequent transfer to a robot for imitation. The main contribution is the use of a robust framework capable of dealing with the uncertainty and incomplete data inherent to these activities, and the ability to represent behaviours at multiple levels of abstraction for enhanced task generalisation. Activity data from 3D video sequencing of human manipulation of different objects handled in everyday life is used for evaluation. A comparison with a mixed generative-discriminative hybrid model, HHMM/SVM (support vector machine), is also presented to highlight the benefits of the proposed approach against comparable state-of-the-art techniques.
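
As a concrete anchor for the bottom layer of such a hierarchy, here is the
standard log-space HMM forward pass over action primitives; a full HHMM would
add an activity-level transition layer that switches between primitive
models. The primitive names and probabilities in the toy example are invented
for illustration, not taken from the article.

    import numpy as np

    def forward_loglik(log_pi, log_T, log_E, obs):
        # Standard HMM forward recursion in log space: returns log p(obs).
        # In an HHMM, each high-level activity would own one such
        # primitive-level HMM, with another layer switching activities.
        alpha = log_pi + log_E[:, obs[0]]
        for o in obs[1:]:
            alpha = log_E[:, o] + np.logaddexp.reduce(alpha[:, None] + log_T,
                                                      axis=0)
        return np.logaddexp.reduce(alpha)

    # Toy primitive-level model: 3 invented primitives (reach, grasp, lift)
    # emitting 2 discrete observation symbols.
    T = np.array([[0.7, 0.3, 0.0], [0.0, 0.6, 0.4], [0.1, 0.0, 0.9]])
    E = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
    pi = np.array([1.0, 0.0, 0.0])
    with np.errstate(divide="ignore"):  # log(0) -> -inf is fine here
        print(forward_loglik(np.log(pi), np.log(T), np.log(E),
                             np.array([0, 1, 1])))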
Graph learning in robotics: a survey
Deep neural networks for graphs have emerged as a powerful tool for learning
on complex non-Euclidean data, which is becoming increasingly common for a
variety of applications. Yet, although their potential has been widely
recognised in the machine learning community, graph learning remains largely
unexplored for downstream tasks such as robotics applications. Hence, to fully
unlock this potential, we propose a review of graph neural architectures from
a robotics perspective. The paper covers the fundamentals of graph-based
models, including their architecture, training procedures, and applications. It
also discusses recent advancements and challenges that arise in applied
settings, related for example to the integration of perception,
decision-making, and control. Finally, the paper provides an extensive review
of various robotic applications that benefit from learning on graph structures,
such as body and contact modelling, robotic manipulation, action
recognition, fleet motion planning, and many more. This survey aims to provide
readers with a thorough understanding of the capabilities and limitations of
graph neural architectures in robotics, and to highlight potential avenues for
future research.
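
To make the graph-learning fundamentals concrete, here is a minimal
graph-convolution layer in the style of Kipf and Welling, written with NumPy
only; the kinematic-chain toy graph is an invented example, not taken from
the survey.

    import numpy as np

    def gcn_layer(A, X, W):
        # One graph-convolution step: add self-loops, symmetrically
        # normalise the adjacency, propagate and transform node features,
        # then apply a ReLU.
        A_hat = A + np.eye(A.shape[0])
        d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
        A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
        return np.maximum(A_norm @ X @ W, 0.0)

    # Invented toy graph: 3 robot links in a chain, 4-dim node features,
    # 2 output channels per node.
    A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    X = np.random.randn(3, 4)
    W = np.random.randn(4, 2)
    print(gcn_layer(A, X, W).shape)  # (3, 2)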