Visual Imitation Learning with Recurrent Siamese Networks
It would be desirable for a reinforcement learning (RL) based agent to learn
behaviour by merely watching a demonstration. However, defining rewards that
facilitate this goal within the RL paradigm remains a challenge. Here we
address this problem with Siamese networks, trained to compute distances
between observed behaviours and the agent's behaviours. Given a desired motion
such Siamese networks can be used to provide a reward signal to an RL agent via
the distance between the desired motion and the agent's motion. We experiment
with an RNN-based comparator model that can compute distances in space and time
between motion clips while training an RL policy to minimize this distance.
Through experimentation, we have also found that the inclusion of
multi-task data and an additional image encoding loss helps enforce the
temporal consistency. These two components appear to balance reward for
matching a specific instance of behaviour versus that behaviour in general.
Furthermore, we focus here on a particularly challenging form of this problem
where only a single demonstration is provided for a given task -- the one-shot
learning setting. We demonstrate our approach on humanoid agents in both 2D
with degrees of freedom (DoF) and 3D with DoF.
Comment: Preprint
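The distance-based reward described in this abstract can be illustrated with a minimal sketch. It assumes the learned comparator has already mapped each demonstration frame and each agent frame to an embedding vector; the function names, the Euclidean metric, and the toy embeddings are illustrative assumptions, not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_rewards(demo_embeddings, agent_embeddings):
    """Per-step reward: the negative distance between the embedded
    demonstration frame and the embedded agent frame at the same step.

    Assumption: the embeddings come from some trained encoder, which is
    not modelled here.
    """
    return [-euclidean(d, a) for d, a in zip(demo_embeddings, agent_embeddings)]


# Toy embeddings (hypothetical values) for a three-step clip.
demo = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
agent = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
rewards = distance_rewards(demo, agent)
```

An RL policy trained to maximize these rewards is pushed to keep its embedded motion close to the demonstration, which matches the "minimize this distance" objective stated above.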
Learning Feedback Terms for Reactive Planning and Control
With the advancement of robotics, machine learning, and machine perception,
more and more robots will enter human environments to assist with daily
tasks. However, dynamically changing human environments require reactive
motion plans. Reactivity can be accomplished through replanning, e.g.
model-predictive control, or through a reactive feedback policy that modifies
on-going behavior in response to sensory events. In this paper, we investigate
how to use machine learning to add reactivity to a previously learned nominal
skilled behavior. We approach this by learning a reactive modification term for
movement plans represented by nonlinear differential equations. In particular,
we use dynamic movement primitives (DMPs) to represent a skill and a neural
network to learn a reactive policy from human demonstrations. We use the
well-explored domain of obstacle avoidance for robot manipulation as a test bed. Our
approach demonstrates how a neural network can be combined with physical
insights to ensure robust behavior across different obstacle settings and
movement durations. Evaluations on an anthropomorphic robotic system
demonstrate the effectiveness of our work.
Comment: 8 pages, accepted to be published at ICRA 2017 conference
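The idea of adding a reactive term to a movement plan expressed as a differential equation can be sketched as follows. This assumes a commonly used one-dimensional DMP transformation system with the learned forcing term set to zero, integrated with explicit Euler steps; the gains, the function name `rollout_dmp`, and the constant coupling value are illustrative assumptions. In the paper the coupling term is produced by a neural network, for which a hand-written function stands in here.

```python
def rollout_dmp(y0, goal, coupling=None, tau=1.0, dt=0.001, duration=2.0,
                alpha=25.0, beta=6.25):
    """Integrate tau*z' = alpha*(beta*(goal - y) - z) + c and tau*y' = z.

    `coupling` is a callable (t, y) -> c that modifies the nominal motion;
    None means no reactive term. The forcing term is omitted for brevity.
    """
    y, z = y0, 0.0
    trajectory = [y]
    for step in range(int(round(duration / dt))):
        t = step * dt
        c = coupling(t, y) if coupling is not None else 0.0
        z_dot = (alpha * (beta * (goal - y) - z) + c) / tau
        y_dot = z / tau
        z += z_dot * dt
        y += y_dot * dt
        trajectory.append(y)
    return trajectory


# Nominal motion converges to the goal.
plain = rollout_dmp(y0=0.0, goal=1.0)
# A constant coupling term shifts the resting point by c / (alpha * beta).
pushed = rollout_dmp(y0=0.0, goal=1.0, coupling=lambda t, y: 15.625)
```

With these gains the nominal rollout settles at the goal, while the constant coupling of 15.625 displaces the end point by 15.625 / (25 * 6.25) = 0.1, which shows how an added term reshapes the motion without changing the underlying primitive.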
Managing technological transitions: prospects, places, publics and policy
Transition management (TM) approaches have generated considerable interest in
academic and policy circles in recent years (Kemp and Loorbach, 2005; Rotmans and
Kemp, 2003). In terms of a loose definition, a 'transition can be defined as a gradual,
continuous process of structural change within a society or culture' (Rotmans et al, 2001,
p.2). The development of TM, much of which has occurred within the context of the
Netherlands, may be seen as a response to the complexities, uncertainties and problems
which confront many western societies, in organising 'sustainably' various aspects of
energy, agricultural, water, transport and health systems of production and consumption.
Problems such as pollution, congestion, the vulnerability of energy or water supplies and
so on are seen as systemic and entwined or embedded in a series of social, economic,
political, cultural and technological relationships.
The systemic nature of many of these problems highlights the involvement - in the
functioning of a particular system and any subsequent transition - of multiple actors or
'stakeholders' across different local, national and international scales of activity. With this
in mind, such problems become difficult to 'solve' and 'solutions' are seen to require
systemic innovation rather than individual or episodic responses. The point is that
'these problems are system inherent and ... the solution lies in creating different systems or
transforming existing ones' (Kemp and Loorbach, 2005, p.125).
In this paper we critically engage with and build upon transitions approaches to address
their 'applicability' in the context of the UK. In doing this the paper addresses the
prospective potential of transitions approaches, but also their relative neglect of places and
publics. Through developing an argument which addresses the strengths and 'gaps' of
transitions approaches we also analyse the resonances and dissonances between three
themes - cities and regions, public participation and national hydrogen strategy - in the
transitions literature and the UK policy context.
Learning Sensor Feedback Models from Demonstrations via Phase-Modulated Neural Networks
In order to robustly execute a task under environmental uncertainty, a robot
needs to be able to reactively adapt to changes arising in its environment. Such
changes are usually reflected in deviations from expected sensory
traces. These deviations in sensory traces can be used to drive the motion
adaptation, and for this purpose, a feedback model is required. The feedback
model maps the deviations in sensory traces to the motion plan adaptation. In
this paper, we develop a general data-driven framework for learning a feedback
model from demonstrations. We utilize a variant of a radial basis function
network structure, with movement phases as kernel centers, which can
generally be applied to represent any feedback model for movement primitives.
To demonstrate the effectiveness of our framework, we test it on the task of
scraping on a tilt board. In this task, we learn a reactive policy in
the form of orientation adaptation, based on deviations of tactile sensor
traces. As a proof of concept of our method, we provide evaluations on an
anthropomorphic robot. A video demonstrating our approach and its results can
be seen at https://youtu.be/7Dx5imy1Kcw
Comment: 8 pages, accepted to be published at the International Conference on
Robotics and Automation (ICRA) 201
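A feedback model with movement phases as kernel centers can be sketched roughly as below. This assumes normalized Gaussian kernels over a phase variable in [0, 1], each gating its own linear map from sensor-trace deviations to a scalar adaptation; the function names, kernel width, and weights are hypothetical and do not reproduce the network variant used in the paper.

```python
import math


def phase_kernels(phase, centers, width):
    """Normalized Gaussian activations of the phase around each kernel center."""
    raw = [math.exp(-((phase - c) ** 2) / (2.0 * width ** 2)) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


def feedback_term(phase, sensor_deviation, centers, width, weights):
    """Phase-weighted mixture of per-kernel linear maps of the deviation.

    weights[i] is the (hypothetical) weight vector attached to kernel i.
    """
    activations = phase_kernels(phase, centers, width)
    return sum(
        a * sum(w * d for w, d in zip(w_row, sensor_deviation))
        for a, w_row in zip(activations, weights)
    )


centers = [0.0, 0.5, 1.0]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # illustrative values only
deviation = [0.2, -0.4]  # observed minus expected sensor trace

early = feedback_term(0.0, deviation, centers, 0.1, weights)
middle = feedback_term(0.5, deviation, centers, 0.1, weights)
```

Because the kernels are normalized, the same sensor deviation produces a different adaptation depending on where the movement is in its phase: near phase 0.0 the first weight vector dominates, near phase 0.5 the second.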