A physics-based Juggling Simulation using Reinforcement Learning
Master's thesis, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2019. Lee, Jehee. Juggling is a physical skill that consists in keeping one or several objects in continuous motion in the air by tossing and catching them. Jugglers need great dexterity to control their throws and catches, which require speed, accuracy and synchronization. Moreover, the more balls we juggle, the stronger those qualities must be. This thesis follows a previous project by Lee et al. [1], in which juggling was performed to demonstrate their method. In this work, we want to generalize the juggling skill and create a real-time simulation using machine learning. One reason to choose this skill is that studying the ability to toss and catch balls and rings provides insight into human coordination, robotics and mathematics, as noted in the article Science of Juggling [2]. Juggling is therefore a good challenge for realistic physics-based simulation, both to improve our knowledge in these fields and to help jugglers evaluate the feasibility of their tricks. To do so, we have to understand the different notations used in juggling and apply the mathematical theory of juggling to reproduce it.
In this thesis, we present an approach to learning juggling. We first remove the need to synchronize both hands by dividing our character in two. We then divide juggling into two subtasks, catching and throwing a ball, and present a deep reinforcement learning method for each. Finally, we run these tasks sequentially on both sides of the body to recreate the whole juggling process.
As a result, our character learns to catch balls randomly thrown at it and to throw them at the desired velocity. After combining both subtasks, our juggler reacts accurately, with enough speed and power to juggle up to 6 balls, even with external forces applied to it.

I. Introduction
II. Juggling theory
  2.1 Notation and Parameters
  2.2 Juggling patterns
III. Approach to learn juggling
  3.1 Juggling sequence
  3.2 Reinforcement learning
    3.2.1 Definition
    3.2.2 Advantages
  3.3 Rewards for Juggling
    3.3.1 Catching
    3.3.2 Throwing
IV. Experiments and Results
  4.1 Experiments
    4.1.1 States
    4.1.2 Actions
    4.1.3 Environment of our Simulation
  4.2 Subtasks results
    4.2.1 Throwing
    4.2.2 Catching
  4.3 Performing juggling
    4.3.1 Results
    4.3.2 Add new ball while juggling
V. Toward a 3D juggling
  5.1 Catching
  5.2 Throwing
  5.3 Results
VI. Discussion and Conclusion
References
Acknowledgements
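The catch/throw subtask rewards named in the outline (3.3.1 Catching, 3.3.2 Throwing) can be sketched as shaped rewards. The terms below are only plausible stand-ins, not the thesis's actual reward functions: catching rewards closing the hand-ball gap, and throwing rewards matching the release velocity demanded by the juggling pattern.

```python
import math

def catch_reward(hand_pos, ball_pos, scale=2.0):
    """Exponentially shaped reward, maximal (1.0) when the hand reaches the ball.

    hand_pos/ball_pos are coordinate tuples; scale is an assumed shaping constant.
    """
    dist = math.dist(hand_pos, ball_pos)
    return math.exp(-scale * dist)

def throw_reward(release_vel, target_vel, scale=1.0):
    """Penalizes deviation of the released ball's velocity from the target velocity."""
    err = math.dist(release_vel, target_vel)
    return math.exp(-scale * err)
```

Exponential shaping keeps both rewards bounded in (0, 1] and dense, which is a common choice when training the two policies independently before chaining them.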
Using humanoid robots to study human behavior
Our understanding of human behavior advances as our humanoid robotics work progresses, and vice versa. This team's work focuses on trajectory formation and planning, learning from demonstration, oculomotor control and interactive behaviors. They are programming robotic behavior based on how we humans "program" behavior in, or train, each other.
DribbleBot: Dynamic Legged Manipulation in the Wild
DribbleBot (Dexterous Ball Manipulation with a Legged Robot) is a legged
robotic system that can dribble a soccer ball under the same real-world
conditions as humans (i.e., in-the-wild). We adopt the paradigm of training
policies in simulation using reinforcement learning and transferring them into
the real world. We overcome critical challenges of accounting for variable ball
motion dynamics on different terrains and perceiving the ball using
body-mounted cameras under the constraints of onboard computing. Our results
provide evidence that current quadruped platforms are well-suited for studying
dynamic whole-body control problems involving simultaneous locomotion and
manipulation directly from sensory observations.
Comment: To appear at the IEEE Conference on Robotics and Automation (ICRA), 2023. Video is available at https://gmargo11.github.io/dribblebot
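The sim-to-real recipe above hinges on randomizing the simulator so that one policy survives the variable ball dynamics of real terrains. A minimal sketch of per-episode physics randomization follows; the parameter names and ranges are illustrative assumptions, not DribbleBot's actual configuration.

```python
import random

def sample_episode_physics(rng: random.Random) -> dict:
    """Draw one random physics configuration for a training episode.

    Each range is a guessed span of real-world variation the policy
    must tolerate (e.g. grass vs. pavement rolling friction).
    """
    return {
        "ball_mass": rng.uniform(0.3, 0.6),           # kg
        "rolling_friction": rng.uniform(0.01, 0.3),   # grass vs. pavement
        "ground_restitution": rng.uniform(0.1, 0.9),  # bounciness of terrain
        "camera_latency_s": rng.uniform(0.0, 0.08),   # onboard perception delay
    }
```

Sampling a fresh configuration at every episode reset forces the policy to infer the current dynamics from its observation history rather than overfit to one simulator setting.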
Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes
I argue that data becomes temporarily interesting by itself to some
self-improving, but computationally limited, subjective observer once he learns
to predict or compress the data in a better way, thus making it subjectively
simpler and more beautiful. Curiosity is the desire to create or discover more
non-random, non-arbitrary, regular data that is novel and surprising not in the
traditional sense of Boltzmann and Shannon but in the sense that it allows for
compression progress because its regularity was not yet known. This drive
maximizes interestingness, the first derivative of subjective beauty or
compressibility, that is, the steepness of the learning curve. It motivates
exploring infants, pure mathematicians, composers, artists, dancers, comedians,
yourself, and (since 1990) artificial systems.
Comment: 35 pages, 3 figures, based on KES 2008 keynote and ALT 2007 / DS 2007 joint invited lecture
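The "first derivative of compressibility" idea above can be made concrete with a toy stream. The sketch below uses zlib as a crude, fixed stand-in for the observer's adaptive compressor (the paper's observer actually improves its compressor over time): the curiosity reward for each new chunk of history is the increase in compressibility of the whole history.

```python
import zlib

def compressibility(data: bytes) -> float:
    """Fraction of the raw length saved by the compressor (zlib, level 9)."""
    if not data:
        return 0.0
    return 1.0 - len(zlib.compress(data, 9)) / len(data)

def curiosity_rewards(chunks):
    """Reward per chunk = first derivative of compressibility over the history.

    A chunk is 'interesting' when it makes the accumulated history more
    compressible, i.e. when it carries learnable regularity.
    """
    history = b""
    prev = 0.0
    rewards = []
    for chunk in chunks:
        history += chunk
        cur = compressibility(history)
        rewards.append(cur - prev)
        prev = cur
    return rewards
```

Feeding this a stream that starts as incompressible noise and then turns regular produces near-zero rewards at first and a spike when the regularity appears, matching the abstract's claim that reward tracks the steepness of the learning curve rather than raw novelty.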
Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization
Deep reinforcement learning with domain randomization learns a control policy
in various simulations with randomized physical and sensor model parameters to
become transferable to the real world in a zero-shot setting. However, a huge
number of samples are often required to learn an effective policy when the
range of randomized parameters is extensive due to the instability of policy
updates. To alleviate this problem, we propose a sample-efficient method named
cyclic policy distillation (CPD). CPD divides the range of randomized
parameters into several small sub-domains and assigns a local policy to each
one. Then local policies are learned while cyclically transitioning to
sub-domains. CPD accelerates learning through knowledge transfer based on
expected performance improvements. Finally, all of the learned local policies
are distilled into a global policy for sim-to-real transfers. CPD's
effectiveness and sample efficiency are demonstrated through simulations with
four tasks (Pendulum from OpenAIGym and Pusher, Swimmer, and HalfCheetah from
Mujoco), and a real-robot, ball-dispersal task. We published code and videos
from our experiments at
https://github.com/yuki-kadokawa/cyclic-policy-distillation
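The CPD loop just described has a simple structure: split the randomized-parameter range, cycle over sub-domains while warm-starting each local policy from its neighbor, then distill. The sketch below shows that schedule with toy scalar "policies" in place of neural networks; the names and the midpoint "optimum" are illustrative assumptions, not the paper's code.

```python
def split_range(lo, hi, m):
    """Divide the randomized-parameter range [lo, hi] into m equal sub-domains."""
    step = (hi - lo) / m
    return [(lo + i * step, lo + (i + 1) * step) for i in range(m)]

def local_optimum(sub):
    """Toy stand-in for RL: the best gain for a sub-domain is its midpoint."""
    return (sub[0] + sub[1]) / 2

def cpd(lo, hi, m=4, cycles=20, lr=0.5):
    subs = split_range(lo, hi, m)
    local = [None] * m
    for _ in range(cycles):                     # cyclic transitions over sub-domains
        for i in range(m):
            if local[i] is None:                # knowledge transfer: warm-start the
                prev = local[i - 1]             # policy from the neighboring sub-domain
                local[i] = prev if prev is not None else 0.0
            # "learning": move the local policy toward its sub-domain optimum
            local[i] += lr * (local_optimum(subs[i]) - local[i])
    global_policy = sum(local) / m              # distill all locals into one policy
    return local, global_policy
```

The narrow sub-domains are what keep each local update stable; the distillation step (here a plain average, in the paper a supervised fit) is what recovers a single policy for zero-shot transfer.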
Motion Synthesis and Control for Autonomous Agents using Generative Models and Reinforcement Learning
Imitating and predicting human motions have wide applications in both graphics and robotics, from developing realistic models of human movement and behavior in immersive virtual worlds and games to improving autonomous navigation for service agents deployed in the real world. Traditional approaches for motion imitation and prediction typically rely on pre-defined rules to model agent behaviors or use reinforcement learning with manually designed reward functions. Despite impressive results, such approaches cannot effectively capture the diversity of motor behaviors and the decision-making capabilities of human beings. Furthermore, manually designing a model or reward function to explicitly describe human motion characteristics often involves laborious fine-tuning and repeated experiments, and may suffer from generalization issues.

In this thesis, we explore data-driven approaches using generative models and reinforcement learning to study and simulate human motions. Specifically, we begin with motion synthesis and control of physically simulated agents imitating a wide range of human motor skills, and then focus on improving the local navigation decisions of autonomous agents in multi-agent interaction settings.

For physics-based agent control, we introduce an imitation learning framework built upon generative adversarial networks and reinforcement learning that enables humanoid agents to learn motor skills from a few examples of human reference motion data. Our approach generates high-fidelity motions and robust controllers without needing to manually design and fine-tune a reward function, while allowing interactive switching between different controllers based on user input. Based on this framework, we further propose a multi-objective learning scheme for composite and task-driven control of humanoid agents. Our multi-objective learning scheme balances the simultaneous learning of disparate motions from multiple reference sources and multiple goal-directed control objectives in an adaptive way, enabling the training of efficient composite motion controllers. Additionally, we present a general framework for fast and robust learning of motor control skills. Our framework exploits particle filtering to dynamically explore and discretize the high-dimensional action space involved in continuous control tasks, and provides a multi-modal policy as a substitute for the commonly used Gaussian policies.

For navigation learning, we leverage human crowd data to train a human-inspired collision avoidance policy by combining knowledge distillation and reinforcement learning. Our approach enables autonomous agents to take human-like actions during goal-directed steering in fully decentralized, multi-agent environments. To inform better control in such environments, we propose SocialVAE, a variational autoencoder-based architecture that uses timewise latent variables with socially-aware conditions and a backward posterior approximation to perform agent trajectory prediction. Our approach improves current state-of-the-art performance on trajectory prediction tasks in daily human interaction scenarios and more complex scenes involving interactions between NBA players. We further extend SocialVAE by exploiting semantic maps as context conditions to generate map-compliant trajectory predictions. Our approach processes context conditions and social conditions occurring during agent-agent interactions in an integrated manner through the use of a dual-attention mechanism. We demonstrate the real-time performance of our approach and its ability to provide high-fidelity, multi-modal predictions on various large-scale vehicle trajectory prediction tasks.
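The adversarial imitation signal described for physics-based control can be sketched with a toy discriminator: D is trained to separate "expert" transitions from "policy" transitions, and the policy would then be rewarded with -log(1 - D(x)). Everything below (1-D features, the TinyDiscriminator class, the training schedule) is an illustrative assumption in the spirit of GAN-style imitation, not the thesis's implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyDiscriminator:
    """Logistic discriminator over scalar features (a neural net in practice)."""

    def __init__(self):
        self.w, self.b = 0.0, 0.0

    def prob_expert(self, x):
        return sigmoid(self.w * x + self.b)

    def train(self, expert_xs, policy_xs, lr=0.1, steps=500):
        # Maximize log D(expert) + log(1 - D(policy)) by batch gradient ascent.
        n = len(expert_xs) + len(policy_xs)
        for _ in range(steps):
            gw = gb = 0.0
            for x in expert_xs:
                err = 1.0 - self.prob_expert(x)   # push D(expert) toward 1
                gw += err * x; gb += err
            for x in policy_xs:
                err = -self.prob_expert(x)        # push D(policy) toward 0
                gw += err * x; gb += err
            self.w += lr * gw / n
            self.b += lr * gb / n

def imitation_reward(disc, x):
    """RL reward: higher when the discriminator thinks x looks expert-like."""
    return -math.log(max(1e-8, 1.0 - disc.prob_expert(x)))
```

In the full method, this reward replaces a hand-designed imitation objective: the policy improves by fooling the discriminator, which is retrained as the policy's state-action distribution shifts.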