17,934 research outputs found

    Deep Predictive Policy Training using Reinforcement Learning

    Full text link
    Skilled robot task learning is best implemented by predictive action policies due to the inherent latency of sensorimotor processes. However, training such predictive policies is challenging as it involves finding a trajectory of motor activations for the full duration of the action. We propose a data-efficient deep predictive policy training (DPPT) framework with a deep neural network policy architecture which maps an image observation to a sequence of motor activations. The architecture consists of three sub-networks referred to as the perception, policy and behavior super-layers. The perception and behavior super-layers force an abstraction of visual and motor data trained with synthetic and simulated training samples, respectively. The policy super-layer is a small sub-network with fewer parameters that maps data in-between the abstracted manifolds. It is trained for each task using methods for policy search reinforcement learning. We demonstrate the suitability of the proposed architecture and learning framework by training predictive policies for skilled object grasping and ball throwing on a PR2 robot. The effectiveness of the method is illustrated by the fact that these tasks are trained using only about 180 real robot attempts with qualitative terminal rewards.Comment: This work is submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems 2017 (IROS2017

    Conceptual spatial representations for indoor mobile robots

    Get PDF
    We present an approach for creating conceptual representations of human-made indoor environments using mobile robots. The concepts refer to spatial and functional properties of typical indoor environments. Following findings in cognitive psychology, our model is composed of layers representing maps at different levels of abstraction. The complete system is integrated in a mobile robot endowed with laser and vision sensors for place and object recognition. The system also incorporates a linguistic framework that actively supports the map acquisition process, and which is used for situated dialogue. Finally, we discuss the capabilities of the integrated system

    Exploiting Points and Lines in Regression Forests for RGB-D Camera Relocalization

    Full text link
    Camera relocalization plays a vital role in many robotics and computer vision tasks, such as global localization, recovery from tracking failure and loop closure detection. Recent random forests based methods exploit randomly sampled pixel comparison features to predict 3D world locations for 2D image locations to guide the camera pose optimization. However, these image features are only sampled randomly in the images, without considering the spatial structures or geometric information, leading to large errors or failure cases with the existence of poorly textured areas or in motion blur. Line segment features are more robust in these environments. In this work, we propose to jointly exploit points and lines within the framework of uncertainty driven regression forests. The proposed approach is thoroughly evaluated on three publicly available datasets against several strong state-of-the-art baselines in terms of several different error metrics. Experimental results prove the efficacy of our method, showing superior or on-par state-of-the-art performance.Comment: published as a conference paper at 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS

    Time-Contrastive Networks: Self-Supervised Learning from Video

    Full text link
    We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a metric learning loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. In other words, the model simultaneously learns to recognize what is common between different-looking images, and what is different between similar-looking images. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitat

    Action recognition based on efficient deep feature learning in the spatio-temporal domain

    Get PDF
    © 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Hand-crafted feature functions are usually designed based on the domain knowledge of a presumably controlled environment and often fail to generalize, as the statistics of real-world data cannot always be modeled correctly. Data-driven feature learning methods, on the other hand, have emerged as an alternative that often generalize better in uncontrolled environments. We present a simple, yet robust, 2D convolutional neural network extended to a concatenated 3D network that learns to extract features from the spatio-temporal domain of raw video data. The resulting network model is used for content-based recognition of videos. Relying on a 2D convolutional neural network allows us to exploit a pretrained network as a descriptor that yielded the best results on the largest and challenging ILSVRC-2014 dataset. Experimental results on commonly used benchmarking video datasets demonstrate that our results are state-of-the-art in terms of accuracy and computational time without requiring any preprocessing (e.g., optic flow) or a priori knowledge on data capture (e.g., camera motion estimation), which makes it more general and flexible than other approaches. Our implementation is made available.Peer ReviewedPostprint (author's final draft

    Learning to Singulate Objects using a Push Proposal Network

    Full text link
    Learning to act in unstructured environments, such as cluttered piles of objects, poses a substantial challenge for manipulation robots. We present a novel neural network-based approach that separates unknown objects in clutter by selecting favourable push actions. Our network is trained from data collected through autonomous interaction of a PR2 robot with randomly organized tabletop scenes. The model is designed to propose meaningful push actions based on over-segmented RGB-D images. We evaluate our approach by singulating up to 8 unknown objects in clutter. We demonstrate that our method enables the robot to perform the task with a high success rate and a low number of required push actions. Our results based on real-world experiments show that our network is able to generalize to novel objects of various sizes and shapes, as well as to arbitrary object configurations. Videos of our experiments can be viewed at http://robotpush.cs.uni-freiburg.deComment: International Symposium on Robotics Research (ISRR) 2017, videos: http://robotpush.cs.uni-freiburg.d

    Interactive semantic mapping: Experimental evaluation

    Get PDF
    Robots that are launched in the consumer market need to provide more effective human robot interaction, and, in particular, spoken language interfaces. However, in order to support the execution of high level commands as they are specified in natural language, a semantic map is required. Such a map is a representation that enables the robot to ground the commands into the actual places and objects located in the environment. In this paper, we present the experimental evaluation of a system specifically designed to build semantically rich maps, through the interaction with the user. The results of the experiments not only provide the basis for a discussion of the features of the proposed approach, but also highlight the manifold issues that arise in the evaluation of semantic mapping

    Learning Articulated Motions From Visual Demonstration

    Full text link
    Many functional elements of human homes and workplaces consist of rigid components which are connected through one or more sliding or rotating linkages. Examples include doors and drawers of cabinets and appliances; laptops; and swivel office chairs. A robotic mobile manipulator would benefit from the ability to acquire kinematic models of such objects from observation. This paper describes a method by which a robot can acquire an object model by capturing depth imagery of the object as a human moves it through its range of motion. We envision that in future, a machine newly introduced to an environment could be shown by its human user the articulated objects particular to that environment, inferring from these "visual demonstrations" enough information to actuate each object independently of the user. Our method employs sparse (markerless) feature tracking, motion segmentation, component pose estimation, and articulation learning; it does not require prior object models. Using the method, a robot can observe an object being exercised, infer a kinematic model incorporating rigid, prismatic and revolute joints, then use the model to predict the object's motion from a novel vantage point. We evaluate the method's performance, and compare it to that of a previously published technique, for a variety of household objects.Comment: Published in Robotics: Science and Systems X, Berkeley, CA. ISBN: 978-0-9923747-0-
    corecore