Search CORE

2,655 research outputs found

Time-Contrastive Networks: Self-Supervised Learning from Video

Author: Chebotar Yevgen
Hsu Jasmine
Jang Eric
Levine Sergey
Lynch Corey
Schaal Stefan
Sermanet Pierre
Publication venue
Publication date: 19/03/2018
Field of study

We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a metric learning loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. In other words, the model simultaneously learns to recognize what is common between different-looking images, and what is different between similar-looking images. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitat

arXiv.org e-Print Archive

Crossref

Motor control and strategy discovery for physically simulated characters

Author: Yin Zhiqi
Publication venue
Publication date: 27/06/2021
Field of study

In physics-based character animation, motions are realized through control of simulated characters along with their interactions with the virtual environment. In this thesis, we study the problem of character control on two levels: joint-level motor control which transforms control signals to joint torques, and high-level motion control which outputs joint-level control signals given the current state of the character and the environment and the task objective. We propose a Modified Articulated-Body Algorithm (MABA) which achieves stable proportional-derivative (PD) low-level motor control with superior theoretical time complexity, practical efficiency and stability than prior implementations. We further propose a high-level motion control framework based on deep reinforcement learning (DRL) which enables the discovery of appropriate motion strategies without human demonstrations to complete a task objective. To facilitate the learning of realistic human motions, we propose a Pose Variational Autoencoder (P-VAE) to constrain the DRL actions to a subspace of natural poses. Our learning framework can be further combined with a sample-efficient Bayesian Diversity Search (BDS) algorithm and novel policy seeking to discover diverse strategies for tasks with multiple modes, such as various athletic jumping tasks

Simon Fraser University Institutional Repository

ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes

Author: Ai Wensi
Gao Xiaofeng
Geng Haoran
Gong Ran
Huang Jiangyong
Huang Siyuan
Jia Baoxiong
Terzopoulos Demetri
Wu Qingyang
Zhao Yizhou
Zhou Ziheng
Zhu Song-Chun
Publication venue
Publication date: 09/04/2023
Field of study

Understanding the continuous states of objects is essential for task learning and planning in the real world. However, most existing task learning benchmarks assume discrete(e.g., binary) object goal states, which poses challenges for the learning of complex tasks and transferring learned policy from simulated environments to the real world. Furthermore, state discretization limits a robot's ability to follow human instructions based on the grounding of actions and states. To tackle these challenges, we present ARNOLD, a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes. ARNOLD is comprised of 8 language-conditioned tasks that involve understanding object states and learning policies for continuous goals. To promote language-instructed learning, we provide expert demonstrations with template-generated language descriptions. We assess task performance by utilizing the latest language-conditioned policy learning models. Our results indicate that current models for language-conditioned manipulations continue to experience significant challenges in novel goal-state generalizations, scene generalizations, and object generalizations. These findings highlight the need to develop new algorithms that address this gap and underscore the potential for further research in this area. See our project page at: https://arnold-benchmark.github.ioComment: The first two authors contributed equally; 20 pages; 17 figures; project availalbe: https://arnold-benchmark.github.io

arXiv.org e-Print Archive

Motion Synthesis and Control for Autonomous Agents using Generative Models and Reinforcement Learning

Author: Xu Pei
Publication venue: Clemson University Libraries
Publication date: 01/08/2023
Field of study

Imitating and predicting human motions have wide applications in both graphics and robotics, from developing realistic models of human movement and behavior in immersive virtual worlds and games to improving autonomous navigation for service agents deployed in the real world. Traditional approaches for motion imitation and prediction typically rely on pre-defined rules to model agent behaviors or use reinforcement learning with manually designed reward functions. Despite impressive results, such approaches cannot effectively capture the diversity of motor behaviors and the decision making capabilities of human beings. Furthermore, manually designing a model or reward function to explicitly describe human motion characteristics often involves laborious fine-tuning and repeated experiments, and may suffer from generalization issues. In this thesis, we explore data-driven approaches using generative models and reinforcement learning to study and simulate human motions. Specifically, we begin with motion synthesis and control of physically simulated agents imitating a wide range of human motor skills, and then focus on improving the local navigation decisions of autonomous agents in multi-agent interaction settings. For physics-based agent control, we introduce an imitation learning framework built upon generative adversarial networks and reinforcement learning that enables humanoid agents to learn motor skills from a few examples of human reference motion data. Our approach generates high-fidelity motions and robust controllers without needing to manually design and finetune a reward function, allowing at the same time interactive switching between different controllers based on user input. Based on this framework, we further propose a multi-objective learning scheme for composite and task-driven control of humanoid agents. Our multi-objective learning scheme balances the simultaneous learning of disparate motions from multiple reference sources and multiple goal-directed control objectives in an adaptive way, enabling the training of efficient composite motion controllers. Additionally, we present a general framework for fast and robust learning of motor control skills. Our framework exploits particle filtering to dynamically explore and discretize the high-dimensional action space involved in continuous control tasks, and provides a multi-modal policy as a substitute for the commonly used Gaussian policies. For navigation learning, we leverage human crowd data to train a human-inspired collision avoidance policy by combining knowledge distillation and reinforcement learning. Our approach enables autonomous agents to take human-like actions during goal-directed steering in fully decentralized, multi-agent environments. To inform better control in such environments, we propose SocialVAE, a variational autoencoder based architecture that uses timewise latent variables with socially-aware conditions and a backward posterior approximation to perform agent trajectory prediction. Our approach improves current state-of-the-art performance on trajectory prediction tasks in daily human interaction scenarios and more complex scenes involving interactions between NBA players. We further extend SocialVAE by exploiting semantic maps as context conditions to generate map-compliant trajectory prediction. Our approach processes context conditions and social conditions occurring during agent-agent interactions in an integrated manner through the use of a dual-attention mechanism. We demonstrate the real-time performance of our approach and its ability to provide high-fidelity, multi-modal predictions on various large-scale vehicle trajectory prediction tasks

Clemson University: TigerPrints

Object-Independent Human-to-Robot Handovers using Real Time Robotic Vision

Author: Corke Peter
Cosgun Akansel
Grafinger Manfred
Kwan Jun
Newbury Rhys
Ortenzi Valerio
Rosenberger Patrick
Publication venue
Publication date: 21/09/2020
Field of study

We present an approach for safe and object-independent human-to-robot handovers using real time robotic vision and manipulation. We aim for general applicability with a generic object detector, a fast grasp selection algorithm and by using a single gripper-mounted RGB-D camera, hence not relying on external sensors. The robot is controlled via visual servoing towards the object of interest. Putting a high emphasis on safety, we use two perception modules: human body part segmentation and hand/finger segmentation. Pixels that are deemed to belong to the human are filtered out from candidate grasp poses, hence ensuring that the robot safely picks the object without colliding with the human partner. The grasp selection and perception modules run concurrently in real-time, which allows monitoring of the progress. In experiments with 13 objects, the robot was able to successfully take the object from the human in 81.9% of the trials.Comment: IEEE Robotics and Automation Letters (RA-L). Preprint Version. Accepted September, 2020. The code and videos can be found at https://patrosat.github.io/h2r_handovers

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive