
    Visual Imitation Learning with Recurrent Siamese Networks

    It would be desirable for a reinforcement learning (RL) based agent to learn behaviour by merely watching a demonstration. However, defining rewards that facilitate this goal within the RL paradigm remains a challenge. Here we address this problem with Siamese networks, trained to compute distances between observed behaviours and the agent's behaviours. Given a desired motion, such Siamese networks can be used to provide a reward signal to an RL agent via the distance between the desired motion and the agent's motion. We experiment with an RNN-based comparator model that can compute distances in space and time between motion clips while training an RL policy to minimize this distance. Through experimentation, we have also found that the inclusion of multi-task data and an additional image encoding loss helps enforce temporal consistency. These two components appear to balance the reward for matching a specific instance of a behaviour against the reward for matching that behaviour in general. Furthermore, we focus here on a particularly challenging form of this problem where only a single demonstration is provided for a given task: the one-shot learning setting. We demonstrate our approach on humanoid agents in both 2D with 10 degrees of freedom (DoF) and 3D with 38 DoF.
    Comment: Preprint
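    The reward construction described above can be illustrated with a small sketch, assuming pure Python and untrained random weights. A single recurrent encoder with shared weights (the "Siamese" part) embeds both the demonstration clip and the agent's clip, and the Euclidean distance between the two final hidden states is mapped to a reward via exp(-distance). The names `TinyRNNEncoder`, `siamese_distance`, `imitation_reward` and `contrastive_loss`, the exponential reward mapping and the contrastive training objective are illustrative assumptions, not details taken from the abstract. The sketch also compares only final embeddings, rather than computing distances in both space and time as the described comparator does.

```python
import math
import random
from typing import List, Sequence

Clip = Sequence[Sequence[float]]  # motion clip: one feature vector per frame


class TinyRNNEncoder:
    """Hypothetical Elman-style RNN encoder: motion clip -> embedding vector.

    Weights are random and untrained; a real comparator would learn them.
    """

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(hidden_dim)
        self.w_in = [[rng.uniform(-scale, scale) for _ in range(input_dim)]
                     for _ in range(hidden_dim)]
        self.w_rec = [[rng.uniform(-scale, scale) for _ in range(hidden_dim)]
                      for _ in range(hidden_dim)]

    def encode(self, clip: Clip) -> List[float]:
        h = [0.0] * len(self.w_rec)
        for frame in clip:
            # h_t = tanh(W_in x_t + W_rec h_{t-1}), built from the previous h
            h = [math.tanh(sum(w * x for w, x in zip(row_in, frame))
                           + sum(w * hp for w, hp in zip(row_rec, h)))
                 for row_in, row_rec in zip(self.w_in, self.w_rec)]
        return h


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def siamese_distance(encoder: TinyRNNEncoder, clip_a: Clip, clip_b: Clip) -> float:
    # "Siamese": one encoder with shared weights embeds both clips.
    return euclidean(encoder.encode(clip_a), encoder.encode(clip_b))


def imitation_reward(encoder: TinyRNNEncoder, demo: Clip, agent: Clip,
                     scale: float = 1.0) -> float:
    # Assumed mapping from distance to reward: identical clips give 1.0,
    # and the reward decays towards 0 as the clips diverge.
    return math.exp(-scale * siamese_distance(encoder, demo, agent))


def contrastive_loss(distance: float, same_behaviour: bool,
                     margin: float = 1.0) -> float:
    # Assumed training objective: pull matching pairs together and push
    # non-matching pairs at least `margin` apart.
    if same_behaviour:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


if __name__ == "__main__":
    enc = TinyRNNEncoder(input_dim=2, hidden_dim=8)
    demo = [[math.sin(0.1 * t), math.cos(0.1 * t)] for t in range(20)]
    agent = [[x + 0.05, y] for x, y in demo]  # slightly offset imitation
    print(imitation_reward(enc, demo, demo))   # 1.0
    print(imitation_reward(enc, demo, agent))  # a value in (0, 1]
```

    Because both clips pass through the same encoder, identical clips always receive a distance of zero and the maximum reward of 1.0, whatever the weights; training with a loss such as `contrastive_loss` is what would make the distance meaningful across different instances of a behaviour.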

    Learning Feedback Terms for Reactive Planning and Control

    With the advancement of robotics, machine learning, and machine perception, an increasing number of robots will enter human environments to assist with daily tasks. However, dynamically changing human environments require reactive motion plans. Reactivity can be accomplished through replanning, e.g. model-predictive control, or through a reactive feedback policy that modifies ongoing behavior in response to sensory events. In this paper, we investigate how to use machine learning to add reactivity to a previously learned nominal skilled behavior. We approach this by learning a reactive modification term for movement plans represented by nonlinear differential equations. In particular, we use dynamic movement primitives (DMPs) to represent a skill and a neural network to learn a reactive policy from human demonstrations. We use the well-explored domain of obstacle avoidance for robot manipulation as a test bed. Our approach demonstrates how a neural network can be combined with physical insights to ensure robust behavior across different obstacle settings and movement durations. Evaluations on an anthropomorphic robotic system demonstrate the effectiveness of our work.
    Comment: 8 pages, accepted for publication at the ICRA 2017 conference
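    The idea of adding a learned reactive term to a DMP can be sketched as follows. This is a minimal 1-D version, assuming the standard discrete DMP formulation (a spring-damper transformation system, a decaying phase variable x and a basis-function forcing term) with the reactive term C_t added to the acceleration; the abstract does not spell out these equations. `Coupling` and `rollout_dmp` are hypothetical names, the coupling callable stands in for the neural network learned from human demonstrations, and the gains (alpha_z = 25, beta_z = alpha_z / 4, alpha_x = 4) are common defaults rather than values from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical placeholder for the learned reactive term C_t. In the paper a
# neural network produces it; here it is any function of (phase, y, z).
Coupling = Callable[[float, float, float], float]


def no_coupling(x: float, y: float, z: float) -> float:
    return 0.0


def rollout_dmp(y0: float, goal: float, weights: Sequence[float],
                coupling: Coupling = no_coupling, tau: float = 1.0,
                duration: float = 2.0, dt: float = 0.001,
                alpha_z: float = 25.0, alpha_x: float = 4.0
                ) -> List[Tuple[float, float]]:
    """Euler-integrate a 1-D discrete DMP with an additive coupling term:

        tau * dz/dt = alpha_z * (beta_z * (goal - y) - z) + f(x) + C_t
        tau * dy/dt = z
        tau * dx/dt = -alpha_x * x

    Returns (time, position) samples.
    """
    beta_z = alpha_z / 4.0  # critically damped spring-damper
    n = len(weights)
    # Gaussian basis functions placed along the decaying phase variable x.
    centres = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    if n > 1:
        widths = [1.0 / (centres[i + 1] - centres[i]) ** 2 for i in range(n - 1)]
        widths.append(widths[-1])
    else:
        widths = [1.0] * n

    def forcing(x: float) -> float:
        # Learned shape term f(x); it vanishes as x -> 0, so y settles at goal.
        if n == 0:
            return 0.0
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centres)]
        total = sum(psi)
        if total == 0.0:
            return 0.0
        return sum(p * w for p, w in zip(psi, weights)) / total * x * (goal - y0)

    x, y, z, t = 1.0, y0, 0.0, 0.0
    samples = [(t, y)]
    for _ in range(int(round(duration / dt))):
        zdot = (alpha_z * (beta_z * (goal - y) - z) + forcing(x)
                + coupling(x, y, z)) / tau
        ydot = z / tau
        xdot = -alpha_x * x / tau
        # Explicit Euler: all derivatives use the state from the previous step.
        x, y, z, t = x + xdot * dt, y + ydot * dt, z + zdot * dt, t + dt
        samples.append((t, y))
    return samples


if __name__ == "__main__":
    plain = rollout_dmp(0.0, 1.0, weights=[])
    pushed = rollout_dmp(0.0, 1.0, weights=[], coupling=lambda x, y, z: 1.0)
    print(round(plain[-1][1], 3))   # 1.0
    print(round(pushed[-1][1], 3))  # 1.006
```

    A constant coupling of 1.0 shifts the rest position by 1.0 / (alpha_z * beta_z) = 1 / 156.25 = 0.0064, which illustrates how C_t modifies the nominal plan without relearning the forcing weights; a learned obstacle-avoidance term would instead vary with the sensed obstacle.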

    Managing technological transitions: prospects, places, publics and policy

    Transition management (TM) approaches have generated considerable interest in academic and policy circles in recent years (Kemp and Loorbach, 2005; Rotmans and Kemp, 2003). As a loose definition, a ‘transition can be defined as a gradual, continuous process of structural change within a society or culture’ (Rotmans et al, 2001, p.2). The development of TM, much of which has occurred within the context of the Netherlands, may be seen as a response to the complexities, uncertainties and problems that confront many western societies in organising various aspects of energy, agricultural, water, transport and health systems of production and consumption ‘sustainably’. Problems such as pollution, congestion and the vulnerability of energy or water supplies are seen as systemic, entwined or embedded in a series of social, economic, political, cultural and technological relationships. The systemic nature of many of these problems highlights the involvement of multiple actors or ‘stakeholders’, across different local, national and international scales of activity, in the functioning of a particular system and in any subsequent transition. With this in mind, such problems become difficult to ‘solve’, and ‘solutions’ are seen to require systemic innovation rather than individual or episodic responses. The point is that ‘these problems are system inherent and the solution lies in creating different systems or transforming existing ones’ (Kemp and Loorbach, 2005, p.125). In this paper we critically engage with and build upon transitions approaches to address their ‘applicability’ in the context of the UK. In doing so, the paper addresses the prospective potential of transitions approaches, but also their relative neglect of places and publics. Through developing an argument which addresses the strengths and ‘gaps’ of transitions approaches, we also analyse the resonances and dissonances between three themes (cities and regions, public participation and national hydrogen strategy) in the transitions literature and the UK policy context.

    Learning Sensor Feedback Models from Demonstrations via Phase-Modulated Neural Networks

    To execute a task robustly under environmental uncertainty, a robot needs to be able to adapt reactively to changes arising in its environment. Environmental changes are usually reflected in deviations from expected sensory traces. These deviations can be used to drive motion adaptation, which requires a feedback model that maps deviations in sensory traces to adaptations of the motion plan. In this paper, we develop a general data-driven framework for learning a feedback model from demonstrations. We utilize a variant of a radial basis function network structure, with movement phases as kernel centers, which can generally be applied to represent any feedback model for movement primitives. To demonstrate the effectiveness of our framework, we test it on the task of scraping on a tilt board. In this task, we learn a reactive policy in the form of orientation adaptation, based on deviations of tactile sensor traces. As a proof of concept of our method, we provide evaluations on an anthropomorphic robot. A video demonstrating our approach and its results can be seen at https://youtu.be/7Dx5imy1Kcw
    Comment: 8 pages, accepted for publication at the International Conference on Robotics and Automation (ICRA) 201
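    One plausible reading of "a radial basis function network with movement phases as kernel centers" is sketched below, assuming pure Python: normalised Gaussian kernels placed along the movement phase each gate their own linear map from the sensory deviation to the motion adaptation, and the weights are fitted by stochastic gradient descent on a squared error. The class `PhaseRBFFeedbackModel`, the kernel widths, the learning rate and the toy target (an adaptation equal to twice the deviation) are hypothetical; the network described in the abstract may use a different or deeper structure.

```python
import math
import random
from typing import List, Sequence


class PhaseRBFFeedbackModel:
    """Hypothetical feedback model: Gaussian RBF kernels centred on movement
    phase, each gating its own linear map from sensory deviation to motion
    adaptation:

        adaptation(phase, ds) = sum_i psi_hat_i(phase) * (W_i @ ds)
    """

    def __init__(self, num_kernels: int, in_dim: int, out_dim: int) -> None:
        if num_kernels < 2:
            raise ValueError("need at least two phase kernels")
        # Kernel centres evenly spaced over phase in [0, 1].
        self.centres = [i / (num_kernels - 1) for i in range(num_kernels)]
        # Width chosen so neighbouring kernels overlap at exp(-1).
        self.width = float((num_kernels - 1) ** 2)
        self.weights = [[[0.0] * in_dim for _ in range(out_dim)]
                        for _ in range(num_kernels)]

    def kernels(self, phase: float) -> List[float]:
        psi = [math.exp(-self.width * (phase - c) ** 2) for c in self.centres]
        total = sum(psi)
        return [p / total for p in psi]  # normalised: sums to 1

    def predict(self, phase: float, deviation: Sequence[float]) -> List[float]:
        out = [0.0] * len(self.weights[0])
        for k, w in zip(self.kernels(phase), self.weights):
            for r, row in enumerate(w):
                out[r] += k * sum(wi * di for wi, di in zip(row, deviation))
        return out

    def sgd_step(self, phase: float, deviation: Sequence[float],
                 target: Sequence[float], lr: float = 0.5) -> None:
        """One gradient step on 0.5 * ||predict - target||^2."""
        error = [p - t for p, t in zip(self.predict(phase, deviation), target)]
        for k, w in zip(self.kernels(phase), self.weights):
            for r, e in enumerate(error):
                for c, d in enumerate(deviation):
                    w[r][c] -= lr * k * e * d


if __name__ == "__main__":
    rng = random.Random(0)
    model = PhaseRBFFeedbackModel(num_kernels=5, in_dim=1, out_dim=1)
    # Toy demonstrations (hypothetical): desired adaptation is 2x the deviation.
    for _ in range(20000):
        dev = rng.uniform(-1.0, 1.0)
        model.sgd_step(rng.random(), [dev], [2.0 * dev])
    print(model.predict(0.5, [0.7]))  # approximately [1.4]
```

    Centering the kernels on phase rather than on the sensory input lets the same deviation produce different adaptations at different points of the movement, while a zero deviation always yields zero adaptation.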