A physics-based Juggling Simulation using Reinforcement Learning
Master's thesis, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2019. Lee, Jehee. Juggling is a physical skill that consists in keeping one or several objects in continuous motion in the air by tossing and catching them. Jugglers need great dexterity to control their throws and catches, which require speed, accuracy and synchronization. Moreover, the more balls we juggle, the stronger those qualities must be. This thesis follows a previous project by Lee et al. [1], in which juggling was performed to demonstrate their method. In this work, we want to generalize the juggling skill and create a real-time simulation using machine learning. One reason to choose this skill is that studying the ability to toss and catch balls and rings provides insight into human coordination, robotics and mathematics, as noted in the article Science of Juggling [2]. Juggling is therefore a good challenge for realistic physics-based simulation, both to improve our knowledge in these fields and to help jugglers evaluate the feasibility of their tricks. To do so, we have to understand the different notations used in juggling and apply the mathematical theory of juggling to reproduce it.
In this thesis, we present an approach to learning juggling. We first remove the need to synchronize both hands by dividing our character in two. We then divide juggling into two subtasks, catching and throwing a ball, and present a deep reinforcement learning method for each. Finally, we run these tasks sequentially on both sides of the body to recreate the whole juggling process.
As a result, our character learns to catch balls randomly thrown at it and to throw them at the desired velocity. After combining both subtasks, our juggler reacts accurately, with enough speed and power to juggle up to 6 balls, even with external forces applied to it.

I. Introduction
II. Juggling theory
  2.1 Notation and Parameters
  2.2 Juggling patterns
III. Approach to learn juggling
  3.1 Juggling sequence
  3.2 Reinforcement learning
    3.2.1 Definition
    3.2.2 Advantages
  3.3 Rewards for Juggling
    3.3.1 Catching
    3.3.2 Throwing
IV. Experiments and Results
  4.1 Experiments
    4.1.1 States
    4.1.2 Actions
    4.1.3 Environment of our Simulation
  4.2 Subtasks results
    4.2.1 Throwing
    4.2.2 Catching
  4.3 Performing juggling
    4.3.1 Results
    4.3.2 Add new ball while juggling
V. Toward a 3D juggling
  5.1 Catching
  5.2 Throwing
  5.3 Results
VI. Discussion and Conclusion
References
Acknowledgements
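The catch/throw subtask rewards named in the outline (3.3.1 Catching, 3.3.2 Throwing) can be sketched as shaped rewards. The terms below are only plausible stand-ins, not the thesis's actual reward functions: catching rewards closing the hand-ball gap, and throwing rewards matching the release velocity demanded by the juggling pattern.

```python
import math

def catch_reward(hand_pos, ball_pos, scale=2.0):
    """Exponentially shaped reward, maximal (1.0) when the hand reaches the ball.

    hand_pos/ball_pos are coordinate tuples; scale is an assumed shaping constant.
    """
    dist = math.dist(hand_pos, ball_pos)
    return math.exp(-scale * dist)

def throw_reward(release_vel, target_vel, scale=1.0):
    """Penalizes deviation of the released ball's velocity from the target velocity."""
    err = math.dist(release_vel, target_vel)
    return math.exp(-scale * err)
```

Exponential shaping keeps both rewards bounded in (0, 1] and dense, which is a common choice when training the two policies independently before chaining them.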
Using humanoid robots to study human behavior
Our understanding of human behavior advances as our humanoid robotics work progresses, and vice versa. This team's work focuses on trajectory formation and planning, learning from demonstration, oculomotor control and interactive behaviors. They are programming robotic behavior based on how we humans "program" behavior in, or train, each other.
DribbleBot: Dynamic Legged Manipulation in the Wild
DribbleBot (Dexterous Ball Manipulation with a Legged Robot) is a legged
robotic system that can dribble a soccer ball under the same real-world
conditions as humans (i.e., in-the-wild). We adopt the paradigm of training
policies in simulation using reinforcement learning and transferring them into
the real world. We overcome critical challenges of accounting for variable ball
motion dynamics on different terrains and perceiving the ball using
body-mounted cameras under the constraints of onboard computing. Our results
provide evidence that current quadruped platforms are well-suited for studying
dynamic whole-body control problems involving simultaneous locomotion and
manipulation directly from sensory observations.
Comment: To appear at the IEEE Conference on Robotics and Automation (ICRA), 2023. Video is available at https://gmargo11.github.io/dribblebot
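The sim-to-real recipe above hinges on randomizing the simulator so that one policy survives the variable ball dynamics of real terrains. A minimal sketch of per-episode physics randomization follows; the parameter names and ranges are illustrative assumptions, not DribbleBot's actual configuration.

```python
import random

def sample_episode_physics(rng: random.Random) -> dict:
    """Draw one random physics configuration for a training episode.

    Each range is a guessed span of real-world variation the policy
    must tolerate (e.g. grass vs. pavement rolling friction).
    """
    return {
        "ball_mass": rng.uniform(0.3, 0.6),           # kg
        "rolling_friction": rng.uniform(0.01, 0.3),   # grass vs. pavement
        "ground_restitution": rng.uniform(0.1, 0.9),  # bounciness of terrain
        "camera_latency_s": rng.uniform(0.0, 0.08),   # onboard perception delay
    }
```

Sampling a fresh configuration at every episode reset forces the policy to infer the current dynamics from its observation history rather than overfit to one simulator setting.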
Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes
I argue that data becomes temporarily interesting by itself to some
self-improving, but computationally limited, subjective observer once he learns
to predict or compress the data in a better way, thus making it subjectively
simpler and more beautiful. Curiosity is the desire to create or discover more
non-random, non-arbitrary, regular data that is novel and surprising not in the
traditional sense of Boltzmann and Shannon but in the sense that it allows for
compression progress because its regularity was not yet known. This drive
maximizes interestingness, the first derivative of subjective beauty or
compressibility, that is, the steepness of the learning curve. It motivates
exploring infants, pure mathematicians, composers, artists, dancers, comedians,
yourself, and (since 1990) artificial systems.
Comment: 35 pages, 3 figures, based on KES 2008 keynote and ALT 2007 / DS 2007 joint invited lecture
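The "first derivative of compressibility" idea above can be made concrete with a toy stream. The sketch below uses zlib as a crude, fixed stand-in for the observer's adaptive compressor (the paper's observer actually improves its compressor over time): the curiosity reward for each new chunk of history is the increase in compressibility of the whole history.

```python
import zlib

def compressibility(data: bytes) -> float:
    """Fraction of the raw length saved by the compressor (zlib, level 9)."""
    if not data:
        return 0.0
    return 1.0 - len(zlib.compress(data, 9)) / len(data)

def curiosity_rewards(chunks):
    """Reward per chunk = first derivative of compressibility over the history.

    A chunk is 'interesting' when it makes the accumulated history more
    compressible, i.e. when it carries learnable regularity.
    """
    history = b""
    prev = 0.0
    rewards = []
    for chunk in chunks:
        history += chunk
        cur = compressibility(history)
        rewards.append(cur - prev)
        prev = cur
    return rewards
```

Feeding this a stream that starts as incompressible noise and then turns regular produces near-zero rewards at first and a spike when the regularity appears, matching the abstract's claim that reward tracks the steepness of the learning curve rather than raw novelty.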
Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization
Deep reinforcement learning with domain randomization learns a control policy
in various simulations with randomized physical and sensor model parameters to
become transferable to the real world in a zero-shot setting. However, a huge
number of samples are often required to learn an effective policy when the
range of randomized parameters is extensive due to the instability of policy
updates. To alleviate this problem, we propose a sample-efficient method named
cyclic policy distillation (CPD). CPD divides the range of randomized
parameters into several small sub-domains and assigns a local policy to each
one. Then local policies are learned while cyclically transitioning to
sub-domains. CPD accelerates learning through knowledge transfer based on
expected performance improvements. Finally, all of the learned local policies
are distilled into a global policy for sim-to-real transfers. CPD's
effectiveness and sample efficiency are demonstrated through simulations with
four tasks (Pendulum from OpenAIGym and Pusher, Swimmer, and HalfCheetah from
Mujoco), and a real-robot, ball-dispersal task. We published code and videos
from our experiments at
https://github.com/yuki-kadokawa/cyclic-policy-distillation
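The CPD loop just described has a simple structure: split the randomized-parameter range, cycle over sub-domains while warm-starting each local policy from its neighbor, then distill. The sketch below shows that schedule with toy scalar "policies" in place of neural networks; the names and the midpoint "optimum" are illustrative assumptions, not the paper's code.

```python
def split_range(lo, hi, m):
    """Divide the randomized-parameter range [lo, hi] into m equal sub-domains."""
    step = (hi - lo) / m
    return [(lo + i * step, lo + (i + 1) * step) for i in range(m)]

def local_optimum(sub):
    """Toy stand-in for RL: the best gain for a sub-domain is its midpoint."""
    return (sub[0] + sub[1]) / 2

def cpd(lo, hi, m=4, cycles=20, lr=0.5):
    subs = split_range(lo, hi, m)
    local = [None] * m
    for _ in range(cycles):                     # cyclic transitions over sub-domains
        for i in range(m):
            if local[i] is None:                # knowledge transfer: warm-start the
                prev = local[i - 1]             # policy from the neighboring sub-domain
                local[i] = prev if prev is not None else 0.0
            # "learning": move the local policy toward its sub-domain optimum
            local[i] += lr * (local_optimum(subs[i]) - local[i])
    global_policy = sum(local) / m              # distill all locals into one policy
    return local, global_policy
```

The narrow sub-domains are what keep each local update stable; the distillation step (here a plain average, in the paper a supervised fit) is what recovers a single policy for zero-shot transfer.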
Motion Synthesis and Control for Autonomous Agents using Generative Models and Reinforcement Learning
Imitating and predicting human motions have wide applications in both graphics and robotics, from developing realistic models of human movement and behavior in immersive virtual worlds and games to improving autonomous navigation for service agents deployed in the real world. Traditional approaches for motion imitation and prediction typically rely on pre-defined rules to model agent behaviors or use reinforcement learning with manually designed reward functions. Despite impressive results, such approaches cannot effectively capture the diversity of motor behaviors and the decision-making capabilities of human beings. Furthermore, manually designing a model or reward function to explicitly describe human motion characteristics often involves laborious fine-tuning and repeated experiments, and may suffer from generalization issues.

In this thesis, we explore data-driven approaches using generative models and reinforcement learning to study and simulate human motions. Specifically, we begin with motion synthesis and control of physically simulated agents imitating a wide range of human motor skills, and then focus on improving the local navigation decisions of autonomous agents in multi-agent interaction settings.

For physics-based agent control, we introduce an imitation learning framework built upon generative adversarial networks and reinforcement learning that enables humanoid agents to learn motor skills from a few examples of human reference motion data. Our approach generates high-fidelity motions and robust controllers without needing to manually design and fine-tune a reward function, while allowing interactive switching between different controllers based on user input. Based on this framework, we further propose a multi-objective learning scheme for composite and task-driven control of humanoid agents. Our multi-objective learning scheme balances the simultaneous learning of disparate motions from multiple reference sources and multiple goal-directed control objectives in an adaptive way, enabling the training of efficient composite motion controllers. Additionally, we present a general framework for fast and robust learning of motor control skills. Our framework exploits particle filtering to dynamically explore and discretize the high-dimensional action space involved in continuous control tasks, and provides a multi-modal policy as a substitute for the commonly used Gaussian policies.

For navigation learning, we leverage human crowd data to train a human-inspired collision avoidance policy by combining knowledge distillation and reinforcement learning. Our approach enables autonomous agents to take human-like actions during goal-directed steering in fully decentralized, multi-agent environments. To inform better control in such environments, we propose SocialVAE, a variational autoencoder-based architecture that uses timewise latent variables with socially-aware conditions and a backward posterior approximation to perform agent trajectory prediction. Our approach improves current state-of-the-art performance on trajectory prediction tasks in daily human interaction scenarios and more complex scenes involving interactions between NBA players. We further extend SocialVAE by exploiting semantic maps as context conditions to generate map-compliant trajectory predictions. Our approach processes context conditions and social conditions occurring during agent-agent interactions in an integrated manner through the use of a dual-attention mechanism. We demonstrate the real-time performance of our approach and its ability to provide high-fidelity, multi-modal predictions on various large-scale vehicle trajectory prediction tasks.
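The adversarial imitation signal described for physics-based control can be sketched with a toy discriminator: D is trained to separate "expert" transitions from "policy" transitions, and the policy would then be rewarded with -log(1 - D(x)). Everything below (1-D features, the TinyDiscriminator class, the training schedule) is an illustrative assumption in the spirit of GAN-style imitation, not the thesis's implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyDiscriminator:
    """Logistic discriminator over scalar features (a neural net in practice)."""

    def __init__(self):
        self.w, self.b = 0.0, 0.0

    def prob_expert(self, x):
        return sigmoid(self.w * x + self.b)

    def train(self, expert_xs, policy_xs, lr=0.1, steps=500):
        # Maximize log D(expert) + log(1 - D(policy)) by batch gradient ascent.
        n = len(expert_xs) + len(policy_xs)
        for _ in range(steps):
            gw = gb = 0.0
            for x in expert_xs:
                err = 1.0 - self.prob_expert(x)   # push D(expert) toward 1
                gw += err * x; gb += err
            for x in policy_xs:
                err = -self.prob_expert(x)        # push D(policy) toward 0
                gw += err * x; gb += err
            self.w += lr * gw / n
            self.b += lr * gb / n

def imitation_reward(disc, x):
    """RL reward: higher when the discriminator thinks x looks expert-like."""
    return -math.log(max(1e-8, 1.0 - disc.prob_expert(x)))
```

In the full method, this reward replaces a hand-designed imitation objective: the policy improves by fooling the discriminator, which is retrained as the policy's state-action distribution shifts.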