709 research outputs found

    Robot eye-hand coordination learning by watching human demonstrations: a task function approximation approach

    We present a robot eye-hand coordination learning method that can directly learn a visual task specification by watching human demonstrations. The task specification is represented as a task function, which is learned using inverse reinforcement learning (IRL) by inferring differential rewards between state changes. The learned task function is then used as continuous feedback in an uncalibrated visual servoing (UVS) controller designed for the execution phase. Our method learns directly from raw videos, removing the need for hand-engineered task specification, and provides task interpretability by directly approximating the task function. Moreover, because it builds on a traditional UVS controller, the training process is efficient and the learned policy is independent of any particular robot platform. Various experiments show that, for a task with a given number of degrees of freedom (DOF), our method adapts to task and environment variations in target positions, backgrounds, illumination, and occlusion without retraining. Comment: Accepted in ICRA 201
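The abstract names the controller family but not its update rule; a common uncalibrated visual servoing scheme estimates the image Jacobian online with a rank-one Broyden update and servos on the error signal (here, the learned task function's output). A minimal NumPy sketch under those assumptions; the function names and the toy linear "camera" map are hypothetical:

```python
import numpy as np

def broyden_update(J, dq, de, alpha=0.1):
    """Rank-one Broyden update of the estimated image Jacobian J,
    from an observed joint change dq and visual-error change de."""
    denom = dq @ dq
    if denom > 1e-9:
        J = J + alpha * np.outer(de - J @ dq, dq) / denom
    return J

def uvs_step(J, e, gain=0.5):
    """One servoing step: a joint-velocity command that descends the
    task error using the pseudo-inverse of the estimated Jacobian."""
    return -gain * np.linalg.pinv(J) @ e

# Toy loop: drive a 2-D task error to zero with a 3-DOF arm.
A = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.5]])  # unknown true map
J = np.random.randn(2, 3) * 0.1                   # rough initial guess
q, target = np.zeros(3), np.array([0.3, -0.2])
for _ in range(50):
    e = A @ q - target          # stand-in for the task-function error
    dq = uvs_step(J, e)
    J = broyden_update(J, dq, A @ (q + dq) - A @ q)
    q = q + dq
print(np.round(A @ q, 3))       # approaches the target
```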

    Improving sample efficiency in deep reinforcement learning

    Deep reinforcement learning (DRL) has made great progress on complex control problems in various test scenarios, such as playing video games, playing board games, and dexterous robotic manipulation, with the promise of critical real-world applications such as controlling plasmas for nuclear fusion. However, DRL requires large numbers of interactions with an environment to find an optimal policy, limiting its application to real-world problems. In this thesis, we focus on two aspects of improving sample efficiency in DRL: 1) solving sparse-reward tasks and 2) improving general exploration strategies. First, we analyse agents trained with and without domain randomisation (DR), a technique that can reduce the reality gap between a simulator and real-world scenarios. By evaluating their robustness to previously unseen environments and applying both qualitative and quantitative interpretability methods, we provide insight into the behaviour of trained agents, and we offer suggestions to researchers who intend to adopt interpretability methods to analyse DRL agents. Second, we propose two methods to overcome exploration difficulties and improve learning efficiency in goal-oriented RL under the sparse-reward setting, where an agent rarely receives positive feedback. In the first method, to provide sufficient positive samples for training, hindsight goal relabelling replaces the goals in original samples with intermediate goals, and these augmented positive samples are used to accelerate training via a self-imitation learning paradigm; an additional selection module removes undesirable modified samples and stabilises training. In the second method, to alleviate the inefficiency of hindsight experience replay (HER) caused by its uniform sampling strategy, a diversity-based sampling method selects valuable and diverse experiences for efficient training. Furthermore, diversity-augmented intrinsic motivation encourages the agent to explore novel states in environments with sparse or delayed rewards: during training, the diversity of adjacent state sequences is measured under the framework of determinantal point processes (DPPs), and this measurement is used as an auxiliary reward to facilitate exploration, improving final performance. Open Access
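The first method relies on hindsight goal relabelling, which the thesis combines with self-imitation and a selection module. A minimal sketch of the relabelling step alone, assuming an episode of (state, action, achieved_goal, goal) tuples and a binary sparse reward; the "future" sampling strategy and tolerances are illustrative:

```python
import numpy as np

def relabel_with_hindsight(episode, k=4, rng=np.random.default_rng(0)):
    """Hindsight relabelling sketch: for each transition, sample k
    achieved goals from later in the same episode as substitute goals,
    so even a failed rollout yields positive sparse-reward samples.
    Returns augmented (state, action, goal, reward) tuples."""
    out = []
    for t, (s, a, ag, g) in enumerate(episode):
        out.append((s, a, g, float(np.allclose(ag, g))))
        for f in rng.integers(t, len(episode), size=k):
            new_g = episode[f][2]          # an achieved goal as new target
            out.append((s, a, new_g, float(np.allclose(ag, new_g))))
    return out

# Toy episode in a 1-D goal space that never reaches the true goal 1.0.
ep = [(s, 0, np.array([s * 0.1]), np.array([1.0])) for s in range(5)]
aug = relabel_with_hindsight(ep)
print(sum(r for *_, r in aug), "positive samples out of", len(aug))
```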

    A Hierarchical Bayesian model for Inverse RL in Partially-Controlled Environments

    Robots learning from observations in the real world using inverse reinforcement learning (IRL) may encounter objects or agents in the environment, other than the expert, that cause nuisance observations during the demonstration. These confounding elements are typically removed in fully controlled environments such as virtual simulations or lab settings. When complete removal is impossible, the nuisance observations must be filtered out, yet identifying the source of each observation is difficult when large numbers of observations are made. To address this, we present a hierarchical Bayesian model that incorporates both the expert's and the confounding elements' observations, thereby explicitly modeling the diverse observations a robot may receive. We extend an existing IRL algorithm, originally designed to work under partial occlusion of the expert, to handle these diverse observations. In a simulated robotic sorting domain containing both occlusion and confounding elements, we demonstrate the model's effectiveness: our technique outperforms several comparative methods, second only to having perfect knowledge of the subject's trajectory. Comment: 8 pages, 10 figures
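The full hierarchical model is not reproduced in the abstract; its core idea of attributing each observation to the expert or to a confounding element can be sketched as a two-source mixture with posterior responsibilities. A toy stand-in under that assumption (the likelihood functions and prior are hypothetical):

```python
import numpy as np

def source_responsibilities(obs, expert_lik, confounder_lik, prior=0.5):
    """Posterior probability that each observation was generated by the
    expert rather than a confounder, given per-source likelihoods."""
    pe = prior * np.array([expert_lik(o) for o in obs])
    pc = (1.0 - prior) * np.array([confounder_lik(o) for o in obs])
    return pe / (pe + pc)

# Toy 1-D example: expert observations near 0, confounder near 3.
def gauss(mu, s):
    return lambda x: np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

r = source_responsibilities(np.array([0.1, 2.9, -0.2]), gauss(0, 1), gauss(3, 1))
print(np.round(r, 3))   # high where the expert model explains the data
```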

    Time-Contrastive Networks: Self-Supervised Learning from Video

    We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a metric learning loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. In other words, the model simultaneously learns to recognize what is common between different-looking images, and what is different between similar-looking images. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitat
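The metric-learning loss described here is a multi-view triplet objective: simultaneous frames from two viewpoints form the anchor-positive pair, and a temporal neighbor from the anchor's own viewpoint is the negative. A minimal NumPy sketch over precomputed embeddings (the margin value and shapes are illustrative):

```python
import numpy as np

def time_contrastive_triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull embeddings of co-occurring frames from different viewpoints
    (anchor, positive) together while pushing a temporally nearby frame
    from the anchor's own viewpoint (negative) at least `margin` away.
    Inputs: batches of embedding vectors, shape (B, D)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))

rng = np.random.default_rng(0)
a, p, n = (rng.normal(size=(8, 32)) for _ in range(3))
print(time_contrastive_triplet_loss(a, p, n))
```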

    Data-driven robotic manipulation of cloth-like deformable objects : the present, challenges and future prospects

    Manipulating cloth-like deformable objects (CDOs) is a long-standing problem in the robotics community. CDOs are flexible (non-rigid) objects that show no detectable compression strength when two points on the object are pushed towards each other; they include ropes (1D), fabrics (2D) and bags (3D). In general, the many degrees of freedom (DoF) of CDOs introduce severe self-occlusion and complex state–action dynamics, which are significant obstacles for perception and manipulation systems. These challenges exacerbate existing issues with modern robotic control methods such as imitation learning (IL) and reinforcement learning (RL). This review focuses on how data-driven control methods have been applied to four major task families in this domain: cloth shaping, knot tying/untying, dressing and bag manipulation. Furthermore, we identify specific inductive biases in these four domains that present challenges for more general IL and RL algorithms. Publisher PDF. Peer reviewed

    Human Motion Trajectory Prediction: A Survey

    With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important. Specifically, predicting the future positions of dynamic agents, and planning with such predictions in mind, are key tasks for self-driving vehicles, service robots and advanced surveillance systems. This paper provides a survey of human motion trajectory prediction. We review, analyze and structure a large selection of work from different communities and propose a taxonomy that categorizes existing methods based on the motion modeling approach and the level of contextual information used. We provide an overview of existing datasets and performance metrics, discuss limitations of the state of the art, and outline directions for further research. Comment: Submitted to the International Journal of Robotics Research (IJRR), 37 pages
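As a concrete anchor for the taxonomy's simplest motion-modeling class, the standard constant-velocity baseline extrapolates an agent's last observed velocity; the function name and sampling interval below are illustrative:

```python
import numpy as np

def constant_velocity_predict(track, horizon, dt=0.4):
    """Constant-velocity baseline: extrapolate the last observed velocity.
    track: (T, 2) observed x-y positions sampled every dt seconds.
    Returns (horizon, 2) predicted future positions."""
    v = (track[-1] - track[-2]) / dt
    steps = np.arange(1, horizon + 1)[:, None] * dt
    return track[-1] + steps * v

obs = np.array([[0.0, 0.0], [0.4, 0.1], [0.8, 0.2]])
print(constant_velocity_predict(obs, horizon=3))
```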

    Bayesian Nonparametric Learning of Cloth Models for Real-time State Estimation

    Robotic solutions to clothing assistance can significantly improve quality of life for the elderly and disabled. Real-time estimation of the human-cloth relationship is crucial for efficient learning of motor skills for robotic clothing assistance, and the major challenge is cloth-state estimation in the face of inherent non-rigidity and occlusion. In this study, we present a novel framework for real-time estimation of the cloth state using a low-cost depth sensor, making the approach practical for deployment in society. The framework relies on the hypothesis that clothing articles are constrained to a low-dimensional latent manifold during clothing tasks. We propose the use of manifold relevance determination (MRD) to learn an offline cloth model that can then perform informed cloth-state estimation in real time. The cloth model is trained using observations from a motion capture system and a depth sensor; MRD provides a principled probabilistic framework for inferring the accurate motion-capture state when only the noisy depth-sensor feature state is available at run time. Experimental results demonstrate that our framework can learn consistent task-specific latent features from few data samples and can generalize to unseen environmental settings. We further present several factors that affect the predictive performance of the learned cloth-state model.
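MRD itself is a Gaussian-process latent variable model; a much simpler linear stand-in still illustrates the shared-latent idea: fit one low-dimensional subspace over concatenated (mocap, depth) training features, then at run time infer the latent from the depth block alone and decode the mocap block. A toy sketch under that simplification, with synthetic data:

```python
import numpy as np

def fit_shared_latent(mocap, depth, d=3):
    """Fit a shared d-dimensional latent space over concatenated
    (mocap, depth) features via PCA -- a crude linear stand-in for MRD."""
    X = np.hstack([mocap, depth])
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:d], mocap.shape[1]

def infer_mocap_from_depth(depth_obs, mu, W, Dm):
    """Estimate the latent from the depth block by least squares, then
    decode the mocap block -- the real-time use case, in toy form."""
    z = np.linalg.lstsq(W[:, Dm:].T, depth_obs - mu[Dm:], rcond=None)[0]
    return mu[:Dm] + W[:, :Dm].T @ z

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3))                     # shared latent states
mocap = Z @ rng.normal(size=(3, 10))              # clean mocap features
depth = Z @ rng.normal(size=(3, 6)) + 0.01 * rng.normal(size=(200, 6))
mu, W, Dm = fit_shared_latent(mocap, depth)
est = infer_mocap_from_depth(depth[0], mu, W, Dm)
print(np.round(np.abs(est - mocap[0]).max(), 3))  # small reconstruction error
```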

    Sample efficiency, transfer learning and interpretability for deep reinforcement learning

    Deep learning has revolutionised artificial intelligence: applying increased compute to train neural networks on large datasets has yielded improvements in real-world applications such as object detection, text-to-speech synthesis and machine translation. Deep reinforcement learning (DRL) has similarly shown impressive results in board and video games, but less so in real-world applications such as robotic control. To address this, I have investigated three factors limiting the further deployment of DRL: sample efficiency, transfer learning, and interpretability. To decrease the amount of data needed to train DRL systems, I explored various storage strategies and exploration policies for episodic control (EC) algorithms, applying online clustering to improve the memory efficiency of EC algorithms and the maximum-entropy mellowmax policy to improve their sample efficiency and final performance. To improve performance during transfer learning, I showed that a multi-headed neural network architecture trained using hierarchical reinforcement learning can retain the benefits of positive transfer between tasks while mitigating the interference effects of negative transfer. I additionally investigated the use of multi-headed architectures to reduce catastrophic forgetting in the continual learning setting; while multiple heads worked well in a simple environment, they were of limited use in a more complex domain, indicating that this strategy does not scale well. Finally, I applied a wide range of quantitative and qualitative techniques to better interpret trained DRL agents. In particular, I compared the effects of training DRL agents with and without visual domain randomisation (DR), a popular technique for achieving simulation-to-real transfer, providing a series of tests that can be applied before real-world deployment. A major finding is that DR produces more entangled representations within trained agents, indicating quantitatively that they are invariant to nuisance factors associated with the DR process. Additionally, while my environment allowed agents trained without DR to succeed without complex recurrent processing, all agents trained with DR appear to integrate information over time, as evidenced by ablations on the recurrent state. Open Access
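The mellowmax operator mentioned for episodic control is simple to state: it is a log-mean-exp over action values that interpolates between their mean and their max. A minimal sketch, with a plain softmax policy as a stand-in for the thesis's maximum-entropy mellowmax policy (which additionally solves for the inverse temperature by root finding; omitted here):

```python
import numpy as np

def mellowmax(q, omega=5.0):
    """Mellowmax: (1/omega) * log(mean(exp(omega * q))), computed stably.
    Tends to the mean as omega -> 0 and to the max as omega -> inf."""
    q = np.asarray(q, dtype=float)
    m = q.max()
    return m + np.log(np.mean(np.exp(omega * (q - m)))) / omega

def softmax_policy(q, beta=5.0):
    """Stand-in action distribution over Q-values; the thesis's policy
    derives beta from the mellowmax operator itself."""
    z = beta * (np.asarray(q, dtype=float) - np.max(q))
    p = np.exp(z)
    return p / p.sum()

q = [1.0, 2.0, 0.5]
print(round(mellowmax(q), 3), np.round(softmax_policy(q), 3))
```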