259 research outputs found
Real-time biped character stepping
PhD Thesis. A rudimentary biped activity that is essential in interactive virtual worlds, such as video games and training simulations, is stepping. Stepping is fundamental to everyday terrestrial activities, including walking and balance recovery. Therefore, an effective 3D stepping control algorithm that is computationally fast and easy to implement is extremely valuable to character animation research. This thesis focuses on generating controllable stepping motions on-the-fly, in real time and without key-framed data, that are responsive and robust (e.g., the character can remain upright and balanced under a variety of conditions, such as pushes and dynamically changing terrain). In our approach, we control the character's direction and speed by varying the step position and duration. Our lightweight stepping model is used to create coordinated full-body motions, which produce directable steps that guide the character toward specific goals (e.g., following a particular path while placing feet at viable locations). We also create protective steps in response to random disturbances (e.g., pushes), whereby the system automatically calculates where and when to place the foot to remedy the disruption. In conclusion, the inverted pendulum has a number of limitations that we address and resolve to produce an improved lightweight technique that provides better control and stability using approximate feature enhancements, for instance, ankle torque and an elongated body.
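The inverted-pendulum model underlying this kind of stepping control has a well-known closed form for where to step after a push. As an illustrative sketch (not the thesis's actual controller), the instantaneous capture point of a linear inverted pendulum gives the foot-placement offset that approximately brings the character to rest:

```python
import math

def capture_point(com_vel, com_height, gravity=9.81):
    """Instantaneous capture point offset for a linear inverted pendulum.

    Placing the foot at this horizontal offset from the center of mass
    (approximately) brings the pendulum to rest -- a common basis for
    deciding where to place a protective step after a push.
    """
    omega = math.sqrt(gravity / com_height)  # LIP natural frequency (1/s)
    return com_vel / omega

# A character pushed to 0.5 m/s with a 1.0 m COM height should step
# roughly 0.16 m in the direction of travel.
step_offset = capture_point(0.5, 1.0)
```

The step duration can then be chosen independently, which is how varying step position and duration yields control over direction and speed.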
Real-Time Character Rise Motions
This paper presents an uncomplicated dynamic controller for generating
physically-plausible three-dimensional full-body biped character rise motions
on-the-fly at run-time. Our low-dimensional controller uses fundamental
reference information (e.g., center-of-mass, hands, and feet locations) to
produce balanced biped get-up poses by means of a real-time physically-based
simulation. The key idea is to use a simple approximate model (i.e., similar to
the inverted-pendulum stepping model) to create continuous reference
trajectories that can be seamlessly tracked by an articulated biped character
to create balanced rise-motions. Our approach does not use any key-framed data
or any computationally expensive processing (e.g., offline-optimization or
search algorithms). We demonstrate the effectiveness and ease of our technique
through examples (i.e., a biped character picking itself up from different
lying positions).
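Tracking a continuous reference trajectory with an articulated character is typically done with joint-space servo control. A minimal sketch, assuming a single joint with unit inertia and hypothetical gains (not the paper's actual controller or values):

```python
def pd_torque(q, qdot, q_ref, kp=300.0, kd=30.0):
    # Spring-damper servo toward the reference angle; the damping term
    # suppresses oscillation so the tracked pose settles smoothly.
    return kp * (q_ref - q) - kd * qdot

# Forward-simulate one joint (unit inertia, semi-implicit Euler)
# tracking a fixed 1.2 rad reference.
q, qdot, dt = 0.0, 0.0, 0.001
for _ in range(5000):
    qdot += pd_torque(q, qdot, q_ref=1.2) * dt
    q += qdot * dt
# q has settled close to the 1.2 rad reference.
```

In a full rise-motion controller, q_ref would come from the continuous reference trajectories (center-of-mass, hand, and foot targets) produced by the approximate model, updated every simulation step.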
Framework for embedding physical systems into virtual experiences
We present an immersive Virtual Reality (VR) experience through a combination of technologies, including a physical rig, a game-like experience, and a refined physics model with control. At its heart, the core technology introduces the concept of physics-based communication that allows force-driven interaction to be shared between the player and game entities in the virtual world. Because the framework is generic and extendable, the application supports a myriad of interaction modes, constrained only by the limitations of the physical rig (see Figure 1). To showcase the technology, we demonstrate a locomoting robot placed in an immersive game-like setting.
Physics-Based Reconstruction and Analysis of Human Motions in Video
Thesis (PhD) -- Seoul National University Graduate School: Department of Computer Science and Engineering, College of Engineering, February 2021. Advisor: Jehee Lee. In computer graphics, simulating and analyzing human movement have been interesting research topics since the 1960s. Still, simulating realistic human movements in a 3D virtual world is a challenging task. In general, motion capture techniques have been used to obtain motion data. Although motion capture guarantees realistic results and high-quality data, a lot of equipment is required and the process is complicated. Recently, techniques for estimating 3D human pose from 2D video have developed remarkably. Building on them, researchers in computer graphics and computer vision have attempted to reconstruct various human motions from video data. However, existing methods cannot robustly estimate dynamic actions and do not work on videos filmed with a moving camera.
In this thesis, we propose methods to reconstruct dynamic human motions from in-the-wild videos and to control the motions. First, we developed a framework to reconstruct motion from videos using prior physics knowledge. For dynamic motions such as a backspin, the poses estimated by a state-of-the-art method are incomplete, with unreliable root trajectories or missing intermediate poses. We designed a reward function for a deep reinforcement learning controller using poses and hints extracted from videos, and learned a policy that simultaneously reconstructs motion and controls a virtual character. Second, we simulated figure skating movements from video. Skating sequences consist of fast and dynamic movements on ice, hindering the acquisition of motion data. Thus, we extracted 3D key poses from a video and successfully replicated several figure skating movements using trajectory optimization and a deep reinforcement learning controller. Third, we devised an algorithm for gait analysis from videos of patients with movement disorders. After acquiring the patients' joint positions from 2D video processed by a deep learning network, the 3D absolute coordinates were estimated, and gait parameters such as gait velocity, cadence, and step length were calculated. Additionally, we analyzed the optimization criteria of human walking using a 3D musculoskeletal humanoid model and physics-based simulation. For two criteria, namely the minimization of muscle activation and of joint torque, we compared simulation data with real human data.
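The gait parameters mentioned for the third topic follow directly from heel-strike events in the estimated 3D trajectories. A minimal sketch, assuming heel strikes have already been detected (the function name and input data here are hypothetical, not the thesis's pipeline):

```python
def gait_parameters(heel_strike_times, heel_positions):
    """Gait velocity, cadence (steps/min), and mean step length from
    alternating heel-strike events.

    heel_strike_times: event times in seconds.
    heel_positions: heel position (m) along the walking direction
                    at each event, alternating left/right.
    """
    n_steps = len(heel_strike_times) - 1
    duration = heel_strike_times[-1] - heel_strike_times[0]
    step_lengths = [abs(heel_positions[i + 1] - heel_positions[i])
                    for i in range(n_steps)]
    distance = sum(step_lengths)
    return {
        "velocity_m_s": distance / duration,
        "cadence_steps_min": 60.0 * n_steps / duration,
        "step_length_m": distance / n_steps,
    }

# Four steps of 0.6 m, one every 0.5 s.
params = gait_parameters([0.0, 0.5, 1.0, 1.5, 2.0],
                         [0.0, 0.6, 1.2, 1.8, 2.4])
```
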
To demonstrate the effectiveness of the first two research topics, we verified the reconstruction of dynamic human motions from 2D videos using physics-based simulations. For the last two research topics, we evaluated our results with real human data.
1 Introduction 1
2 Background 9
2.1 Pose Estimation from 2D Video . . . . . . . . . . . . . . . . . . . . 9
2.2 Motion Reconstruction from Monocular Video . . . . . . . . . . . . 10
2.3 Physics-Based Character Simulation and Control . . . . . . . . . . . 12
2.4 Motion Reconstruction Leveraging Physics . . . . . . . . . . . . . . 13
2.5 Human Motion Control . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5.1 Figure Skating Simulation . . . . . . . . . . . . . . . . . . . 16
2.6 Objective Gait Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.7 Optimization for Human Movement Simulation . . . . . . . . . . . . 17
2.7.1 Stability Criteria . . . . . . . . . . . . . . . . . . . . . . . . 18
3 Human Dynamics from Monocular Video with Dynamic Camera Movements 19
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Pose and Contact Estimation . . . . . . . . . . . . . . . . . . . . . . 21
3.4 Learning Human Dynamics . . . . . . . . . . . . . . . . . . . . . . . 24
3.4.1 Policy Learning . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4.2 Network Training . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4.3 Scene Estimator . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5.1 Video Clips . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5.2 Comparison of Contact Estimators . . . . . . . . . . . . . . . 33
3.5.3 Ablation Study . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5.4 Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4 Figure Skating Simulation from Video 42
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.3 Skating Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3.1 Non-holonomic Constraints . . . . . . . . . . . . . . . . . . 46
4.3.2 Relaxation of Non-holonomic Constraints . . . . . . . . . . . 47
4.4 Data Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.5 Trajectory Optimization and Control . . . . . . . . . . . . . . . . . . 50
4.5.1 Trajectory Optimization . . . . . . . . . . . . . . . . . . . . 50
4.5.2 Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5 Gait Analysis Using Pose Estimation Algorithm with 2D-video of Patients 61
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2.1 Patients and video recording . . . . . . . . . . . . . . . . . . 63
5.2.2 Standard protocol approvals, registrations, and patient consents 66
5.2.3 3D Pose estimation from 2D video . . . . . . . . . . . . . . . 66
5.2.4 Gait parameter estimation . . . . . . . . . . . . . . . . . . . 67
5.2.5 Statistical analysis . . . . . . . . . . . . . . . . . . . . . . . 68
5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.3.1 Validation of video-based analysis of the gait . . . . . . . . . 68
5.3.2 Gait analysis . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.4.1 Validation with the conventional sensor-based method . . . . 75
5.4.2 Analysis of gait and turning in TUG . . . . . . . . . . . . . . 75
5.4.3 Correlation with clinical parameters . . . . . . . . . . . . . . 76
5.4.4 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.5 Supplementary Material . . . . . . . . . . . . . . . . . . . . . . . . . 77
6 Control Optimization of Human Walking 80
6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.2.1 Musculoskeletal model . . . . . . . . . . . . . . . . . . . . . 82
6.2.2 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.2.3 Control co-activation level . . . . . . . . . . . . . . . . . . . 83
6.2.4 Push-recovery experiment . . . . . . . . . . . . . . . . . . . 84
6.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7 Conclusion 90
7.1 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
A survey on human performance capture and animation
With the rapid development of computing technology, three-dimensional (3D) human body models and their dynamic motions are widely used in the digital entertainment industry. Human performance mainly involves human body shapes and motions. Key research problems include how to capture and analyze the static geometric appearance and dynamic movement of human bodies, and how to simulate human body motions with physical effects. In this survey, we summarize recent advances in the main research directions of human performance capture and animation, namely human body surface reconstruction, motion capture and synthesis, and physics-based motion simulation, and we further discuss future research problems and directions. We hope this will help readers gain a comprehensive understanding of human performance capture and animation.
Neural Categorical Priors for Physics-Based Character Control
Recent advances in learning reusable motion priors have demonstrated their
effectiveness in generating naturalistic behaviors. In this paper, we propose a
new learning framework in this paradigm for controlling physics-based
characters with significantly improved motion quality and diversity over
existing state-of-the-art methods. The proposed method uses reinforcement
learning (RL) to initially track and imitate life-like movements from
unstructured motion clips using the discrete information bottleneck, as adopted
in the Vector Quantized Variational AutoEncoder (VQ-VAE). This structure
compresses the most relevant information from the motion clips into a compact
yet informative latent space, i.e., a discrete space over vector quantized
codes. By sampling codes in the space from a trained categorical prior
distribution, high-quality life-like behaviors can be generated, similar to the
usage of VQ-VAE in computer vision. Although this prior distribution can be
trained with the supervision of the encoder's output, it follows the original
motion clip distribution in the dataset and could lead to imbalanced behaviors
in our setting. To address the issue, we further propose a technique named
prior shifting to adjust the prior distribution using curiosity-driven RL. The
outcome distribution is demonstrated to offer sufficient behavioral diversity
and significantly facilitates upper-level policy learning for downstream tasks.
We conduct comprehensive experiments using humanoid characters on two
challenging downstream tasks, sword-and-shield striking and a two-player
boxing game. Our results demonstrate that the proposed framework is capable
of controlling the character to perform movements of considerably high
quality in terms of behavioral strategies, diversity, and realism. Videos, code, and data are
available at https://tencent-roboticsx.github.io/NCP/
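The discrete bottleneck described above reduces, at inference time, to a nearest-neighbor lookup in the learned codebook. A minimal NumPy sketch of the vector-quantization step (codebook size and dimensions are illustrative, not the paper's):

```python
import numpy as np

def quantize(latents, codebook):
    """Map encoder outputs to their nearest codebook vectors -- the
    discrete information bottleneck at the heart of a VQ-VAE."""
    # (N, 1, D) - (1, K, D) broadcasts to (N, K, D) pairwise differences.
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = d2.argmin(axis=1)          # index of nearest code per latent
    return codes, codebook[codes]      # discrete codes and quantized vectors

rng = np.random.default_rng(0)
codebook = rng.standard_normal((64, 8))   # K=64 codes of dimension 8
latents = rng.standard_normal((10, 8))
codes, quantized = quantize(latents, codebook)
```

Sampling `codes` from a categorical prior over the 64 indices, instead of from the encoder, is what lets the trained prior generate behaviors on its own; the prior-shifting step then rebalances that categorical distribution.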
Skill learning based catching motion control
Ankara : The Department of Computer Engineering and The Graduate School of Engineering and Science of Bilkent University, 2014. Thesis (Master's) -- Bilkent University, 2014. Includes bibliographical references, leaves 55-59. In the real world, it is crucial to learn biomechanical strategies that prepare the body, in kinematic and kinetic terms, during interception tasks such as kicking, throwing, and catching. Based on this, we present a real-time physics-based approach that generates natural and physically plausible motions for a highly complex task: ball catching. We show that ball-catching behavior, like many other complex tasks, can be achieved with the proper combination of rather simple motor skills, such as standing, walking, and reaching. Since learned biomechanical strategies can improve conscious motor control, we consider several issues that need to be planned. Among them, we focus intensively on the concept of timing. The character learns policies for how and when to react, using reinforcement learning, in order to use time accurately. We demonstrate the effectiveness of our method by presenting catching animation results executed with different catching strategies. In each simulation, the balls were projected randomly, but within an interval of limits, in order to obtain different arrival flight times and height conditions. Çimen, Gökçen. M.S.
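Timing a catch against a randomly projected ball reduces, under a simple ballistic assumption with no drag, to solving a quadratic for the arrival time. An illustrative sketch (not the thesis's learned timing policy):

```python
import math

def arrival_time(z0, vz0, z_catch, g=9.81):
    """Time until a ballistic ball reaches catch height z_catch (m),
    from initial height z0 (m) with vertical speed vz0 (m/s).

    Solves z0 + vz0*t - 0.5*g*t**2 = z_catch for the later root,
    i.e., the ball passing the catch height on its way down.
    """
    a, b, c = -0.5 * g, vz0, z0 - z_catch
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # ball never reaches the catch height
    return (-b - math.sqrt(disc)) / (2 * a)

# Thrown upward at 5 m/s from 1 m: reaches a 1.5 m catch height
# (descending) after roughly 0.91 s.
t_catch = arrival_time(1.0, 5.0, 1.5)
```

A timing policy can then be judged by whether the hand arrives within a small window around this predicted time, which is one way to frame the reward for when-to-react decisions.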
Learning-based methods for planning and control of humanoid robots
Nowadays, humans and robots are more and more likely to coexist as time goes by. The anthropomorphic nature of humanoid robots facilitates physical human-robot interaction, and makes social human-robot interaction more natural. Moreover, it makes humanoids ideal candidates for many applications related to tasks and environments designed for humans.
No matter the application, a ubiquitous requirement for a humanoid is to possess proper locomotion skills. Despite long-standing research, humanoid locomotion is still far from a trivial task. A common approach to humanoid locomotion decomposes its complexity by means of a model-based hierarchical control architecture. To cope with computational constraints, simplified models of the humanoid are employed in some of the architectural layers. At the same time, the redundancy of the humanoid with respect to the locomotion task, as well as the closeness of such a task to human locomotion, suggests a data-driven approach that learns it directly from experience.
This thesis investigates the application of learning-based techniques to planning and control of humanoid locomotion. In particular, both deep reinforcement learning and deep supervised learning are considered to address humanoid locomotion tasks in a crescendo of complexity.
First, we employ deep reinforcement learning to study the spontaneous emergence of balancing and push recovery strategies for the humanoid, which represent essential prerequisites for more complex locomotion tasks.
Then, by making use of motion capture data collected from human subjects, we employ deep supervised learning to shape the robot walking trajectories towards an improved human-likeness.
The proposed approaches are validated on real and simulated humanoid robots; specifically, on two versions of the iCub humanoid: iCub v2.7 and iCub v3.
Effects of Saltatory Rewards and Generalized Advantage Estimation on Reference-Based Deep Reinforcement Learning of Humanlike Motions
In learning physics-based character skills, deep reinforcement learning (DRL) can suffer from slow convergence and local-optimum solutions during the training of a reinforcement learning (RL) agent. In an environment with reward saltation, we can magnify those saltatory rewards, from the perspective of sample usage, to increase an agent's experience pool during training. In our work, we propose two modified algorithms. The first adds a parameter-based reward optimization process that magnifies the saltatory rewards and thus increases an agent's utilization of previous experiences; we combine this parameter-based reward optimization with the proximal policy optimization (PPO) algorithm. The second introduces generalized advantage estimation into the advantage estimate of the advantage actor-critic (A2C) algorithm, which results in faster convergence toward globally optimal solutions. We conducted all our experiments in a custom reinforcement learning environment built using the PyBullet physics engine. In that environment, the RL agent has a humanoid body that learns humanlike motions, e.g., walk, run, spin, cartwheel, spinkick, and backflip, by imitating example reference motions. Our experiments show significant improvements in the performance and convergence speed of DRL for learning humanlike motions using the modified versions of PPO and A2C compared with their vanilla versions.
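Generalized advantage estimation, as used in the second proposed algorithm, is an exponentially weighted sum of one-step TD residuals, with lambda trading bias against variance. A minimal sketch with illustrative inputs:

```python
def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.

    rewards, dones: per-step lists; values has one extra entry
    (the bootstrap value of the state after the last step).
    Each advantage is A_t = sum_l (gamma*lam)^l * delta_{t+l},
    where delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
    return advantages

# Illustrative 3-step rollout (values and rewards are made up).
advantages = gae([1.0, 0.0, 2.0], [0.5, 0.4, 0.3, 0.0],
                 [False, False, True])
```

With `lam=0` this collapses to the one-step TD advantage; with `lam=1` it becomes the Monte Carlo return minus the value baseline, which is the high-variance end of the trade-off.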
- โฆ