A Deep Probabilistic Framework for Heterogeneous Self-Supervised Learning of Affordances
The perception of affordances provides an action-centered parametric representation of the environment. By perceiving an object's visual features in terms of the actions they afford, behavior opportunities can be inferred for previously unseen objects. In this paper, a flexible deep probabilistic framework is proposed that allows an explorative agent to learn tool-object affordances in continuous space. To this end, we use a deep variational auto-encoder with heterogeneous probability distributions to infer the most probable action that achieves a desired effect, or to predict a parametric probability distribution over action consequences, i.e., effects. Our experiments show that the method generalizes to unseen objects and tools, and we analyze the influence of different design choices. Our framework goes beyond other proposals by incorporating probability distributions tailored to each individual modality and by eliminating the need for any pre-processing of the data.
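The core idea of pairing a shared latent space with per-modality output distributions can be sketched as follows. This is a minimal numpy illustration assuming a Gaussian head for a continuous effect modality and a categorical head for a discrete action modality; the single-linear-layer "networks" and all dimensions are invented for illustration, not the paper's architecture.

```python
# Minimal sketch of a VAE with heterogeneous output distributions:
# Gaussian parameters for continuous effects, categorical probabilities
# for discrete actions. All layer sizes and the linear "networks" are
# illustrative assumptions, not the architecture from the paper.
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    return x @ w + b

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class HeterogeneousVAE:
    def __init__(self, obs_dim=8, latent_dim=2, effect_dim=3, n_actions=4):
        d = latent_dim
        # Encoder: observation -> Gaussian posterior over the latent code.
        self.enc_mu = (rng.normal(size=(obs_dim, d)) * 0.1, np.zeros(d))
        self.enc_logvar = (rng.normal(size=(obs_dim, d)) * 0.1, np.zeros(d))
        # Decoder heads: one distribution family per modality.
        self.dec_effect_mu = (rng.normal(size=(d, effect_dim)) * 0.1, np.zeros(effect_dim))
        self.dec_effect_logvar = (rng.normal(size=(d, effect_dim)) * 0.1, np.zeros(effect_dim))
        self.dec_action = (rng.normal(size=(d, n_actions)) * 0.1, np.zeros(n_actions))

    def encode(self, x):
        return linear(x, *self.enc_mu), linear(x, *self.enc_logvar)

    def reparameterize(self, mu, logvar):
        # Sample z = mu + sigma * eps so the sampling step stays differentiable.
        eps = rng.normal(size=mu.shape)
        return mu + np.exp(0.5 * logvar) * eps

    def decode(self, z):
        effect_mu = linear(z, *self.dec_effect_mu)        # Gaussian mean
        effect_logvar = linear(z, *self.dec_effect_logvar)  # Gaussian log-variance
        action_probs = softmax(linear(z, *self.dec_action))  # categorical
        return effect_mu, effect_logvar, action_probs

vae = HeterogeneousVAE()
x = rng.normal(size=(5, 8))          # batch of 5 sensorimotor observations
mu, logvar = vae.encode(x)
z = vae.reparameterize(mu, logvar)
effect_mu, effect_logvar, action_probs = vae.decode(z)
print(action_probs.shape)            # (5, 4)
```

Because each head parameterizes a full distribution rather than a point estimate, the same decoder can either be sampled for the most probable action or queried for the predicted distribution over effects.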
SERKET: An Architecture for Connecting Stochastic Models to Realize a Large-Scale Cognitive Model
To realize human-like robot intelligence, a large-scale cognitive architecture is required for robots to understand the environment through the variety of sensors with which they are equipped. In this paper, we propose a novel framework named Serket that enables a large-scale generative model to be constructed, and its inference performed, by connecting sub-modules, allowing robots to acquire various capabilities through interaction with their environments and with others. We consider that large-scale cognitive models can be constructed by connecting smaller fundamental models hierarchically while maintaining their programmatic independence. However, connected modules depend on each other, and their parameters must be optimized as a whole. Conventionally, the equations for parameter estimation have to be derived and implemented for each model, and doing so becomes harder as the model grows. To solve these problems, we propose a method for parameter estimation that communicates only minimal parameters between modules while maintaining their programmatic independence. Serket thus makes it easy to construct large-scale models and estimate their parameters by connecting modules. Experimental results demonstrate that models can be constructed by connecting modules, that their parameters can be optimized as a whole, and that the results are comparable with those of the original models we have proposed.
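The "minimal parameters between independent modules" idea can be sketched in a few lines. In this toy, two modules each keep their own likelihoods private and exchange only a categorical message over shared latent classes; the alternating update rule is an illustrative assumption, not Serket's actual estimation algorithm.

```python
# Minimal sketch of Serket-style module connection: each module keeps its
# parameters private and exchanges only a small message (here, a
# categorical posterior over shared latent classes). The toy update rule
# is an illustrative assumption, not Serket's actual algorithm.
import numpy as np

class Module:
    """A probabilistic sub-module that combines its own evidence with the
    message received from its neighbour."""
    def __init__(self, likelihoods):
        self.likelihoods = np.asarray(likelihoods, dtype=float)  # private

    def update(self, incoming_msg):
        # Combine private likelihood with the neighbour's message (the only
        # quantity communicated between modules), then normalize.
        posterior = self.likelihoods * incoming_msg
        return posterior / posterior.sum()  # outgoing minimal message

# Two modules over 3 shared latent classes (names are hypothetical).
vision = Module([0.6, 0.3, 0.1])   # e.g., image-based evidence
speech = Module([0.2, 0.5, 0.3])   # e.g., word-based evidence

msg = np.ones(3) / 3               # uninformative initial message
for _ in range(5):                 # alternate message passing
    msg = vision.update(msg)
    msg = speech.update(msg)

print(msg.round(3))
```

Neither module ever sees the other's internal parameters; the joint estimate emerges purely from the exchanged message, which is what keeps the modules programmatically independent.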
Learning grasp affordance reasoning through semantic relations
Reasoning about object affordances allows an autonomous agent to perform
generalised manipulation tasks among object instances. While current approaches
to grasp affordance estimation are effective, they are limited to a single
hypothesis. We present an approach for detection and extraction of multiple
grasp affordances on an object via visual input. We define semantics as a
combination of multiple attributes, which yields benefits in terms of
generalisation for grasp affordance prediction. We use Markov Logic Networks to
build a knowledge base graph representation to obtain a probability
distribution of grasp affordances for an object. To harvest the knowledge base,
we collect and make available a novel dataset that relates different semantic
attributes. We achieve reliable mappings of the predicted grasp affordances on
the object by learning prototypical grasping patches from several examples. We
show our method's generalisation capabilities on grasp affordance prediction
for novel instances and compare with similar methods in the literature.
Moreover, using a robotic platform, on simulated and real scenarios, we
evaluate the success of the grasping task when conditioned on the grasp
affordance prediction.
Comment: Accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 201
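The idea of turning weighted rules over semantic attributes into a probability distribution of grasp affordances can be sketched as follows. This is only loosely in the spirit of a Markov Logic Network: each rule links an attribute to an affordance with a weight, and affordance scores are the exponentiated sums of satisfied rule weights. The rules, weights, and attribute names are invented for illustration; real MLN inference is considerably more involved.

```python
# Toy scoring of grasp affordances from semantic attributes via weighted
# rules. Rules, weights, and names are hypothetical, and this softmax over
# accumulated weights is a stand-in for full MLN inference.
import math

RULES = [
    # (attribute, affordance, weight) -- all hypothetical
    ("has_handle", "handle-grasp", 2.0),
    ("is_hollow",  "rim-grasp",    1.5),
    ("is_flat",    "pinch-grasp",  1.0),
    ("has_handle", "pinch-grasp", -0.5),
]

def affordance_distribution(attributes):
    # Accumulate the weights of all rules whose attribute is present.
    scores = {}
    for attr, aff, w in RULES:
        if attr in attributes:
            scores[aff] = scores.get(aff, 0.0) + w
    # Softmax over the accumulated rule weights -> probability distribution.
    z = sum(math.exp(s) for s in scores.values())
    return {aff: math.exp(s) / z for aff, s in scores.items()}

# A mug-like object: has a handle and is hollow.
dist = affordance_distribution({"has_handle", "is_hollow"})
print(dist)
```

Because the output is a full distribution rather than a single hypothesis, a planner can rank several grasp affordances for the same object instead of committing to one.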
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists in the concurrent
construction of a model of the environment (the map), and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de-facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and tutorial to those who are users of SLAM. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues, that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? and
Is SLAM solved?
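The de-facto standard formulation mentioned above is maximum-a-posteriori estimation over a factor graph, which under Gaussian noise reduces to (nonlinear) least squares. A 1-D pose graph keeps the problem linear, so ordinary least squares suffices here; the measurements below are made up for illustration.

```python
# Minimal sketch of the standard SLAM formulation as factor-graph MAP
# estimation. Each row of A encodes one factor (prior, odometry, or loop
# closure); the MAP estimate is the least-squares solution of A x ~= b.
# Measurement values are invented for illustration.
import numpy as np

# Three 1-D poses x0, x1, x2; x0 anchored at the origin by a prior factor.
A = np.array([
    [ 1.0,  0.0, 0.0],   # prior:        x0      = 0.0
    [-1.0,  1.0, 0.0],   # odometry:     x1 - x0 = 1.0
    [ 0.0, -1.0, 1.0],   # odometry:     x2 - x1 = 1.1
    [-1.0,  0.0, 1.0],   # loop closure: x2 - x0 = 2.0
])
b = np.array([0.0, 1.0, 1.1, 2.0])

# The loop closure disagrees slightly with the accumulated odometry
# (1.0 + 1.1 = 2.1 vs 2.0); least squares spreads the error over the graph.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x.round(3))
```

Real SLAM back-ends (e.g., pose graphs over SE(2)/SE(3)) are nonlinear and are solved by iterating exactly this linear step around the current estimate (Gauss-Newton or Levenberg-Marquardt), with measurement covariances weighting each factor.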
Learning Deep Features for Robotic Inference from Physical Interactions
In order to effectively handle multiple tasks that are not pre-defined, a robotic agent needs to automatically map its high-dimensional sensory inputs into useful features. As a solution, feature learning has empirically shown substantial improvements in obtaining representations that are generalizable to different tasks, compared to feature engineering approaches, but it requires a large amount of data and computational capacity. These challenges are specifically relevant in robotics due to the low signal-to-noise ratios inherent to robotic data, and to the cost typically associated with collecting this type of input. In this paper, we propose a deep probabilistic method based on Convolutional Variational Auto-Encoders (CVAEs) to learn visual features suitable for interaction and recognition tasks. We run our experiments on a self-supervised robotic sensorimotor dataset. Our data was acquired with the iCub humanoid and is based on a standard object collection, thus being readily extensible. We evaluated the learned features in terms of usability for 1) object recognition, 2) capturing the statistics of the effects, and 3) planning. In addition, where applicable, we compared the performance of the proposed architecture with other state-of-the-art models. These experiments demonstrate that our model is capable of capturing the functional statistics of action and perception (i.e., images), performing better than existing baselines without requiring millions of samples or any hand-engineered features.
A Framework for Fast, Autonomous, and Reliable Tool Incorporation on iCub
One of the main advantages of building robots with size and motor capabilities close to those of humans, such as the iCub, lies in the fact that they can potentially take advantage of a world populated with tools and devices designed by and for humans. However, in order to make proper use of the tools around them, robots need to be able to incorporate these tools, that is, to build a representation of the tool's geometry, reach, and pose with respect to the robot. The present paper addresses this challenge by presenting a repository that implements a series of interconnected methods enabling autonomous, fast, and reliable tool incorporation on the iCub platform.
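The kinematic side of "incorporating" a tool can be sketched in a few lines: once the tool's pose relative to the robot hand is estimated, the tool-tip pose in the robot frame is a composition of homogeneous transforms. The 2-D transforms and numbers below are made up for illustration, not iCub calibration data.

```python
# Minimal sketch of extending a robot's kinematics with a tool: compose the
# hand pose with the estimated tool-tip pose to get the tip in the robot's
# root frame. 2-D homogeneous transforms keep the example short; the
# particular poses are invented, not iCub calibration data.
import numpy as np

def transform(rotation_deg, translation):
    """2-D homogeneous transform: rotation followed by translation."""
    t = np.radians(rotation_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, translation[0]],
                     [s,  c, translation[1]],
                     [0,  0, 1.0]])

T_root_hand = transform(90.0, [0.3, 0.1])   # hand pose in the root frame
T_hand_tip  = transform(0.0,  [0.0, 0.15])  # estimated tip pose in the hand frame

# Tool tip in the root frame: compose the two transforms.
T_root_tip = T_root_hand @ T_hand_tip
tip_xy = T_root_tip[:2, 2]
print(tip_xy.round(3))
```

With `T_hand_tip` known, any controller that can place the hand can place the tool tip, which is what makes the tool usable as an extension of the body.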
End-to-end Autonomous Driving: Challenges and Frontiers
The autonomous driving community has witnessed a rapid growth in approaches
that embrace an end-to-end algorithm framework, utilizing raw sensor input to
generate vehicle motion plans, instead of concentrating on individual tasks
such as detection and motion prediction. End-to-end systems, in comparison to
modular pipelines, benefit from joint feature optimization for perception and
planning. This field has flourished due to the availability of large-scale
datasets, closed-loop evaluation, and the increasing need for autonomous
driving algorithms to perform effectively in challenging scenarios. In this
survey, we provide a comprehensive analysis of more than 250 papers, covering
the motivation, roadmap, methodology, challenges, and future trends in
end-to-end autonomous driving. We delve into several critical challenges,
including multi-modality, interpretability, causal confusion, robustness, and
world models, amongst others. Additionally, we discuss current advancements in
foundation models and visual pre-training, as well as how to incorporate these
techniques within the end-to-end driving framework. To facilitate future
research, we maintain an active repository that contains up-to-date links to
relevant literature and open-source projects at
https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving