
    A Deep Probabilistic Framework for Heterogeneous Self-Supervised Learning of Affordances

    The perception of affordances provides an action-centered, parametric representation of the environment. By perceiving an object's visual features in terms of the actions they afford, novel behavior opportunities can be inferred for previously unseen objects. In this paper, a flexible deep probabilistic framework is proposed that allows an explorative agent to learn tool-object affordances in continuous space. To this end, we use a deep variational auto-encoder with heterogeneous probability distributions to infer the most probable action that achieves a desired effect, or to predict a parametric probability distribution over action consequences, i.e., effects. Our experiments show that the method generalizes to unseen objects and tools, and we analyze the influence of different design choices. Our framework goes beyond other proposals by incorporating probability distributions tailored to each individual modality and by eliminating the need for any pre-processing of the data.
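
    To make the idea of per-modality likelihoods concrete, here is a minimal PyTorch sketch (illustrative names and dimensions, not the authors' implementation): a single latent code is decoded through modality-specific heads, a Gaussian for a continuous effect vector and a categorical for a discrete action label.

```python
# Minimal sketch (not the paper's code): a VAE decoder with per-modality
# likelihoods -- Gaussian for continuous effects, categorical for actions.
import torch
import torch.nn as nn

class HeterogeneousDecoder(nn.Module):
    def __init__(self, latent_dim=8, effect_dim=3, n_actions=4):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU())
        self.effect_mu = nn.Linear(64, effect_dim)      # Gaussian mean
        self.effect_logvar = nn.Linear(64, effect_dim)  # Gaussian log-variance
        self.action_logits = nn.Linear(64, n_actions)   # categorical logits

    def forward(self, z):
        h = self.backbone(z)
        return self.effect_mu(h), self.effect_logvar(h), self.action_logits(h)

def nll(mu, logvar, logits, effect, action):
    # Sum of modality-specific negative log-likelihoods; each modality
    # contributes the term matching its own distribution family.
    gauss = 0.5 * (logvar + (effect - mu) ** 2 / logvar.exp()).sum(-1)
    cat = nn.functional.cross_entropy(logits, action, reduction="none")
    return (gauss + cat).mean()
```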

    SERKET: An Architecture for Connecting Stochastic Models to Realize a Large-Scale Cognitive Model

    To realize human-like robot intelligence, a large-scale cognitive architecture is required for robots to understand the environment through the variety of sensors with which they are equipped. In this paper, we propose a novel framework named Serket that enables the construction of a large-scale generative model, and inference over it, simply by connecting sub-modules, allowing robots to acquire various capabilities through interaction with their environments and with others. We consider that large-scale cognitive models can be constructed by connecting smaller fundamental models hierarchically while maintaining their programmatic independence. However, connected modules depend on each other, and their parameters must be optimized as a whole. Conventionally, the equations for parameter estimation have to be derived and implemented separately for each model, which becomes harder as the model grows. To solve these problems, we propose a method for parameter estimation that communicates only minimal parameters between modules while maintaining their programmatic independence. Serket therefore makes it easy to construct large-scale models and estimate their parameters via the connection of modules. Experimental results demonstrate that such a model can be constructed by connecting modules, that its parameters can be optimized as a whole, and that the results are comparable with those of the original models.
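
    The composition idea can be sketched as a module interface (a hypothetical illustration, not the Serket API): each module re-estimates only its local parameters and passes a small message, such as probabilities over a shared latent variable, to its neighbors, and the composite model is optimized by iterating these local updates.

```python
# Hypothetical interface sketch: modules stay programmatically independent
# and exchange only minimal messages; the whole model is optimized by
# sweeping local updates until the exchanged messages stabilize.
class Module:
    def update(self, incoming):
        """Re-estimate local parameters given the neighbor's message and
        return the message to pass to the next module in the chain."""
        raise NotImplementedError

def optimize(modules, message=None, iterations=10):
    for _ in range(iterations):
        for m in modules:          # forward sweep through the module chain
            message = m.update(message)
    return message
```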

    Learning grasp affordance reasoning through semantic relations

    Reasoning about object affordances allows an autonomous agent to perform generalised manipulation tasks across object instances. While current approaches to grasp affordance estimation are effective, they are limited to a single hypothesis. We present an approach for the detection and extraction of multiple grasp affordances on an object via visual input. We define semantics as a combination of multiple attributes, which yields benefits in terms of generalisation for grasp affordance prediction. We use Markov Logic Networks to build a knowledge-base graph representation from which we obtain a probability distribution of grasp affordances for an object. To harvest the knowledge base, we collect and make available a novel dataset that relates different semantic attributes. We achieve reliable mappings of the predicted grasp affordances onto the object by learning prototypical grasping patches from several examples. We show our method's generalisation capabilities on grasp affordance prediction for novel instances and compare with similar methods in the literature. Moreover, using a robotic platform, in simulated and real scenarios, we evaluate the success of the grasping task when conditioned on the grasp affordance prediction. (Accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2019.)
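
    As a toy illustration of the core idea (not the authors' Markov Logic Network; rules and weights below are invented for the example), semantic attributes can be combined through a weighted rule base and normalized into a probability distribution over grasp affordances.

```python
# Toy sketch: weighted (attribute, affordance) rules scored and normalized
# with a softmax to yield a distribution over grasp affordances.
import math
from collections import defaultdict

# Hypothetical weighted rules: (attribute, affordance) -> weight.
RULES = {("handle", "handle-grasp"): 2.0,
         ("rigid", "power-grasp"): 1.0,
         ("small", "pinch-grasp"): 1.5}

def affordance_distribution(attributes, affordances):
    scores = defaultdict(float)
    for a in affordances:
        for attr in attributes:
            scores[a] += RULES.get((attr, a), 0.0)
    z = sum(math.exp(s) for s in scores.values())  # softmax normalization
    return {a: math.exp(s) / z for a, s in scores.items()}

print(affordance_distribution({"handle", "rigid"},
                              ["handle-grasp", "power-grasp", "pinch-grasp"]))
```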

    Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

    Simultaneous Localization and Mapping (SLAM) consists in the concurrent construction of a model of the environment (the map) and the estimation of the state of the robot moving within it. The SLAM community has made astonishing progress over the last 30 years, enabling large-scale real-world applications and witnessing a steady transition of this technology to industry. We survey the current state of SLAM. We start by presenting what is now the de-facto standard formulation for SLAM. We then review related work, covering a broad set of topics including robustness and scalability in long-term mapping, metric and semantic representations for mapping, theoretical performance guarantees, active SLAM and exploration, and other new frontiers. This paper serves simultaneously as a position paper and as a tutorial for users of SLAM. By looking at the published research with a critical eye, we delineate open challenges and new research issues that still deserve careful scientific investigation. The paper also contains the authors' take on two questions that often animate discussions at robotics conferences: Do robots need SLAM? And is SLAM solved?
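
    For reference, the de-facto standard formulation the survey refers to is maximum a posteriori estimation over a factor graph: with variables X (robot poses and map) and measurements Z = {z_k}, and assuming Gaussian measurement models z_k = h_k(X_k) + noise, the MAP estimate reduces to nonlinear least squares.

```latex
% MAP estimation over the factor graph:
\mathcal{X}^{\star}
  = \operatorname*{arg\,max}_{\mathcal{X}} \; p(\mathcal{X} \mid \mathcal{Z})
  = \operatorname*{arg\,max}_{\mathcal{X}} \; \prod_{k} p(z_k \mid \mathcal{X}_k)
% Under Gaussian measurement models, this becomes nonlinear least squares
% with information matrices \Omega_k:
\mathcal{X}^{\star}
  = \operatorname*{arg\,min}_{\mathcal{X}} \;
    \sum_{k} \lVert h_k(\mathcal{X}_k) - z_k \rVert^{2}_{\Omega_k}
```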

    Learning Deep Features for Robotic Inference from Physical Interactions

    In order to effectively handle multiple tasks that are not pre-defined, a robotic agent needs to automatically map its high-dimensional sensory inputs into useful features. Feature learning has empirically shown substantial improvements over feature engineering in obtaining representations that generalize across tasks, but it requires a large amount of data and computational capacity. These challenges are especially relevant in robotics due to the low signal-to-noise ratios inherent in robotic data and the cost typically associated with collecting this type of input. In this paper, we propose a deep probabilistic method based on Convolutional Variational Auto-Encoders (CVAEs) to learn visual features suitable for interaction and recognition tasks. We run our experiments on a self-supervised robotic sensorimotor dataset. Our data was acquired with the iCub humanoid and is based on a standard object collection, thus being readily extensible. We evaluate the learned features in terms of usability for 1) object recognition, 2) capturing the statistics of the effects, and 3) planning. In addition, where applicable, we compare the performance of the proposed architecture with other state-of-the-art models. These experiments demonstrate that our model captures the functional statistics of action and perception (i.e., images) better than existing baselines, without requiring millions of samples or any hand-engineered features.
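
    A minimal sketch of the ingredient involved (illustrative layer sizes, not the paper's architecture): a small convolutional VAE encoder whose latent mean can serve directly as the learned feature vector for downstream recognition or planning.

```python
# Illustrative convolutional VAE encoder: at training time the
# reparameterized sample z feeds a decoder; at test time mu(x) is
# used directly as the learned feature.
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten())
        self.mu = nn.Linear(32 * 16 * 16, latent_dim)
        self.logvar = nn.Linear(32 * 16 * 16, latent_dim)

    def forward(self, x):
        h = self.conv(x)
        return self.mu(h), self.logvar(h)

enc = ConvEncoder()
mu, logvar = enc(torch.randn(1, 3, 64, 64))
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
```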

    A Framework for Fast, Autonomous, and Reliable Tool Incorporation on iCub

    One of the main advantages of building robots with size and motor capabilities close to those of humans, such as the iCub, lies in the fact that they can potentially take advantage of a world populated with tools and devices designed by and for humans. However, in order to make proper use of the tools around them, robots need to be able to incorporate these tools, that is, to build a representation of the tool's geometry, reach, and pose with respect to the robot. The present paper tackles this problem by presenting a repository that implements a series of interconnected methods enabling autonomous, fast, and reliable tool incorporation on the iCub platform.
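
    The geometric core of tool incorporation can be illustrated with a toy example (hypothetical frames and values, not the repository's API): once the tool's pose in the hand frame has been estimated, the tool tip is expressed in the robot's root frame by composing homogeneous transforms, which also extends the effective reach of the arm.

```python
# Toy sketch: incorporate a tool by chaining transforms root <- hand <- tip.
import numpy as np

def transform(R, t):
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

T_root_hand = transform(np.eye(3), [0.3, 0.1, 0.4])   # from forward kinematics
T_hand_tip  = transform(np.eye(3), [0.0, 0.0, 0.18])  # estimated tool geometry

T_root_tip = T_root_hand @ T_hand_tip  # tool tip in the robot's root frame
print(T_root_tip[:3, 3])               # effective end-effector position
```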

    End-to-end Autonomous Driving: Challenges and Frontiers

    The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 250 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework. To facilitate future research, we maintain an active repository that contains up-to-date links to relevant literature and open-source projects at https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving
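
    The contrast with modular pipelines can be made concrete with a minimal sketch (illustrative only, not any surveyed system): a single network maps a raw camera frame directly to a short motion plan, so perception features are optimized jointly with planning through one loss.

```python
# Illustrative end-to-end model: raw image in, future waypoints out.
import torch
import torch.nn as nn

class EndToEndPlanner(nn.Module):
    def __init__(self, horizon=4):
        super().__init__()
        self.horizon = horizon
        self.backbone = nn.Sequential(            # shared perception features
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.plan_head = nn.Linear(32, horizon * 2)  # (x, y) per waypoint

    def forward(self, image):
        return self.plan_head(self.backbone(image)).view(-1, self.horizon, 2)

waypoints = EndToEndPlanner()(torch.randn(1, 3, 128, 128))  # (1, 4, 2)
```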