5,162 research outputs found

    Time-Contrastive Networks: Self-Supervised Learning from Video

    Full text link
    We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a metric learning loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. In other words, the model simultaneously learns to recognize what is common between different-looking images, and what is different between similar-looking images. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitat

    Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation

    Full text link
    Imitation learning is an effective approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. However, standard imitation learning methods assume that the agent receives examples of observation-action tuples that could be provided, for instance, to a supervised learning algorithm. This stands in contrast to how humans and animals imitate: we observe another person performing some behavior and then figure out which actions will realize that behavior, compensating for changes in viewpoint, surroundings, object positions and types, and other factors. We term this kind of imitation learning "imitation-from-observation," and propose an imitation learning method based on video prediction with context translation and deep reinforcement learning. This lifts the assumption in imitation learning that the demonstration should consist of observations in the same environment configuration, and enables a variety of interesting applications, including learning robotic skills that involve tool use simply by observing videos of human tool use. Our experimental results show the effectiveness of our approach in learning a wide range of real-world robotic tasks modeled after common household chores from videos of a human demonstrator, including sweeping, ladling almonds, pushing objects as well as a number of tasks in simulation.Comment: Accepted at ICRA 2018, Brisbane. YuXuan Liu and Abhishek Gupta had equal contributio

    A quantitative evaluation of the AVITEWRITE model of handwriting learning

    Full text link
    Much sensory-motor behavior develops through imitation, as during the learning of handwriting by children. Such complex sequential acts are broken down into distinct motor control synergies, or muscle groups, whose activities overlap in time to generate continuous, curved movements that obey an intense relation between curvature and speed. The Adaptive Vector Integration to Endpoint (AVITEWRITE) model of Grossberg and Paine (2000) proposed how such complex movements may be learned through attentive imitation. The model suggest how frontal, parietal, and motor cortical mechanisms, such as difference vector encoding, under volitional control from the basal ganglia, interact with adaptively-timed, predictive cerebellar learning during movement imitation and predictive performance. Key psycophysical and neural data about learning to make curved movements were simulated, including a decrease in writing time as learning progresses; generation of unimodal, bell-shaped velocity profiles for each movement synergy; size scaling with isochrony, and speed scaling with preservation of the letter shape and the shapes of the velocity profiles; an inverse relation between curvature and tangential velocity; and a Two-Thirds Power Law relation between angular velocity and curvature. However, the model learned from letter trajectories of only one subject, and only qualitative kinematic comparisons were made with previously published human data. The present work describes a quantitative test of AVITEWRITE through direct comparison of a corpus of human handwriting data with the model's performance when it learns by tracing human trajectories. The results show that model performance was variable across subjects, with an average correlation between the model and human data of 89+/-10%. The present data from simulations using the AVITEWRITE model highlight some of its strengths while focusing attention on areas, such as novel shape learning in children, where all models of handwriting and learning of other complex sensory-motor skills would benefit from further research.Defense Advanced Research Projects Agency and the Office of Naval Research (N00014-95-1-0409); National Institutes of Health (1-R29-DC02952-01); Office of Naval Research (N00014-92-J-1309, N00014-01-1-0624); Air Force Office of Scientific Research (F49620-01-1-0397); National Institute of Neurological Disorders and Stroke (NS 33173

    Adversarial Imitation Learning from Incomplete Demonstrations

    Full text link
    Imitation learning targets deriving a mapping from states to actions, a.k.a. policy, from expert demonstrations. Existing methods for imitation learning typically require any actions in the demonstrations to be fully available, which is hard to ensure in real applications. Though algorithms for learning with unobservable actions have been proposed, they focus solely on state information and overlook the fact that the action sequence could still be partially available and provide useful information for policy deriving. In this paper, we propose a novel algorithm called Action-Guided Adversarial Imitation Learning (AGAIL) that learns a policy from demonstrations with incomplete action sequences, i.e., incomplete demonstrations. The core idea of AGAIL is to separate demonstrations into state and action trajectories, and train a policy with state trajectories while using actions as auxiliary information to guide the training whenever applicable. Built upon the Generative Adversarial Imitation Learning, AGAIL has three components: a generator, a discriminator, and a guide. The generator learns a policy with rewards provided by the discriminator, which tries to distinguish state distributions between demonstrations and samples generated by the policy. The guide provides additional rewards to the generator when demonstrated actions for specific states are available. We compare AGAIL to other methods on benchmark tasks and show that AGAIL consistently delivers comparable performance to the state-of-the-art methods even when the action sequence in demonstrations is only partially available.Comment: Accepted to International Joint Conference on Artificial Intelligence (IJCAI-19
    • …
    corecore