Benchmarking Deep Reinforcement Learning for Continuous Control
Recently, researchers have made significant progress combining the advances
in deep learning for learning feature representations with reinforcement
learning. Some notable examples include training agents to play Atari games
based on raw pixel data and to acquire advanced manipulation skills using raw
sensory inputs. However, it has been difficult to quantify progress in the
domain of continuous control due to the lack of a commonly adopted benchmark.
In this work, we present a benchmark suite of continuous control tasks,
including classic tasks like cart-pole swing-up, tasks with very high state and
action dimensionality such as 3D humanoid locomotion, tasks with partial
observations, and tasks with hierarchical structure. We report novel findings
based on the systematic evaluation of a range of implemented reinforcement
learning algorithms. Both the benchmark and reference implementations are
released at https://github.com/rllab/rllab in order to facilitate experimental
reproducibility and to encourage adoption by other researchers. Comment: 14 pages, ICML 201
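The abstract mentions a systematic evaluation of several algorithms but does not state the metric. The sketch below assumes one common summary statistic, which is not necessarily the one rllab uses. The undiscounted return of each trajectory is averaged within each training iteration. Those averages are then averaged over all iterations, giving an area-under-the-learning-curve score. Finally, the scores are reported as mean and standard deviation across random seeds. All function names are hypothetical and are not part of the rllab API.

```python
import statistics
from typing import Sequence, Tuple

# Hypothetical helpers for summarizing RL benchmark runs; not part of rllab.


def undiscounted_return(rewards: Sequence[float]) -> float:
    """Sum of the rewards collected along one trajectory."""
    return float(sum(rewards))


def learning_curve_score(iterations: Sequence[Sequence[Sequence[float]]]) -> float:
    """Average return over all training iterations.

    iterations[i][n] is the reward sequence of trajectory n sampled at iteration i.
    """
    per_iteration = [
        statistics.fmean([undiscounted_return(traj) for traj in trajectories])
        for trajectories in iterations
    ]
    return statistics.fmean(per_iteration)


def summarize_over_seeds(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the per-seed scores."""
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


if __name__ == "__main__":
    # Two iterations, two trajectories each: per-iteration means 2.5 and 5.0.
    run = [[[1.0, 1.0], [1.0, 1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0, 2.0]]]
    print(learning_curve_score(run))  # 3.75
```

Averaging over all iterations rewards algorithms that learn quickly as well as those that end high, which a final-iteration score alone would not capture.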
Physics-Based Reconstruction and Analysis of Human Motion in Videos
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2021. Jehee Lee.
In computer graphics, simulating and analyzing human movement have been active research topics since the 1960s. Still, simulating realistic human movement in a 3D virtual world remains a challenging task. Motion capture has generally been used to obtain human motion data. Although motion capture guarantees realistic, high-quality data, it requires a great deal of equipment and a complicated process. Recently, techniques for estimating 3D human pose from 2D video have advanced remarkably, and researchers in computer graphics and computer vision have attempted to reconstruct various human motions from video data. However, existing methods cannot robustly estimate fast, dynamic actions and do not work on videos filmed with a moving camera.
In this thesis, we propose methods to reconstruct dynamic human motions from in-the-wild videos and to control those motions. First, we developed a framework that reconstructs motion from video using prior physics knowledge. For dynamic motions such as a backspin, the poses estimated by a state-of-the-art method are incomplete: the root trajectory is unreliable or intermediate poses are missing. We designed a reward function for a deep reinforcement learning controller using poses and hints extracted from the video, and learned a policy that simultaneously reconstructs the motion and controls a virtual character. Second, we simulated figure skating movements from video. Skating sequences consist of fast, dynamic movements on ice, which makes motion data difficult to acquire. We therefore extracted 3D key poses from video and successfully replicated several figure skating movements using trajectory optimization and a deep reinforcement learning controller. Third, we devised an algorithm for video-based gait analysis of patients with movement disorders caused by diseases such as Parkinson's disease or cerebral palsy. After obtaining the patients' joint positions from 2D video with a deep learning network, we estimated their 3D absolute coordinates and calculated gait parameters such as gait velocity, cadence, and step length. Finally, we analyzed the optimization criteria of human walking using a 3D musculoskeletal humanoid model and physics-based simulation. For two criteria, minimization of muscle activation and minimization of joint torque, we compared the simulation data with real human data.
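The abstract does not give the reward's formula. The sketch below is a minimal version under stated assumptions. It combines an exponentiated pose-tracking term, of the kind widely used in motion-imitation controllers, with a contact-matching term. The contact term is motivated by the pose and contact estimation listed in the table of contents. The weights, the scale, and all names are hypothetical, not the thesis's actual reward.

```python
import math
from typing import Sequence

# Hypothetical reward terms for video-guided imitation; weights and scale are illustrative only.


def pose_reward(sim_pose: Sequence[float], ref_pose: Sequence[float], scale: float = 2.0) -> float:
    """exp(-scale * squared joint-angle error): 1.0 for a perfect match, decaying toward 0."""
    err = sum((s - r) ** 2 for s, r in zip(sim_pose, ref_pose))
    return math.exp(-scale * err)


def contact_reward(sim_contacts: Sequence[bool], hint_contacts: Sequence[bool]) -> float:
    """Fraction of end effectors whose simulated contact state agrees with the video hint."""
    matches = sum(1 for s, h in zip(sim_contacts, hint_contacts) if s == h)
    return matches / len(hint_contacts)


def step_reward(sim_pose: Sequence[float], ref_pose: Sequence[float],
                sim_contacts: Sequence[bool], hint_contacts: Sequence[bool],
                w_pose: float = 0.7, w_contact: float = 0.3) -> float:
    """Weighted sum of the tracking terms, evaluated once per control step."""
    return (w_pose * pose_reward(sim_pose, ref_pose)
            + w_contact * contact_reward(sim_contacts, hint_contacts))
```

The exponential keeps each term in [0, 1]. Because of that bound, a few badly estimated video frames cannot dominate the return the way an unbounded squared-error penalty could.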
To demonstrate the effectiveness of the first two research topics, we verified the reconstruction of dynamic human motions from 2D videos using physics-based simulations. For the last two research topics, we evaluated our results against real human data.
1 Introduction
2 Background
2.1 Pose Estimation from 2D Video
2.2 Motion Reconstruction from Monocular Video
2.3 Physics-Based Character Simulation and Control
2.4 Motion Reconstruction Leveraging Physics
2.5 Human Motion Control
2.5.1 Figure Skating Simulation
2.6 Objective Gait Analysis
2.7 Optimization for Human Movement Simulation
2.7.1 Stability Criteria
3 Human Dynamics from Monocular Video with Dynamic Camera Movements
3.1 Introduction
3.2 Overview
3.3 Pose and Contact Estimation
3.4 Learning Human Dynamics
3.4.1 Policy Learning
3.4.2 Network Training
3.4.3 Scene Estimator
3.5 Results
3.5.1 Video Clips
3.5.2 Comparison of Contact Estimators
3.5.3 Ablation Study
3.5.4 Robustness
3.6 Discussion
4 Figure Skating Simulation from Video
4.1 Introduction
4.2 System Overview
4.3 Skating Simulation
4.3.1 Non-holonomic Constraints
4.3.2 Relaxation of Non-holonomic Constraints
4.4 Data Acquisition
4.5 Trajectory Optimization and Control
4.5.1 Trajectory Optimization
4.5.2 Control
4.6 Experimental Results
4.7 Discussion
5 Gait Analysis Using Pose Estimation Algorithm with 2D-video of Patients
5.1 Introduction
5.2 Method
5.2.1 Patients and video recording
5.2.2 Standard protocol approvals, registrations, and patient consents
5.2.3 3D Pose estimation from 2D video
5.2.4 Gait parameter estimation
5.2.5 Statistical analysis
5.3 Results
5.3.1 Validation of video-based analysis of the gait
5.3.2 Gait analysis
5.4 Discussion
5.4.1 Validation with the conventional sensor-based method
5.4.2 Analysis of gait and turning in TUG
5.4.3 Correlation with clinical parameters
5.4.4 Limitations
5.5 Supplementary Material
6 Control Optimization of Human Walking
6.1 Overview
6.2 Methods
6.2.1 Musculoskeletal model
6.2.2 Optimization
6.2.3 Control co-activation level
6.2.4 Push-recovery experiment
6.3 Results
6.4 Discussion
7 Conclusion
7.1 Future work
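The gait parameters named in the abstract (gait velocity, cadence, step length) follow from heel-strike events once 3D joint positions are available. The sketch below makes several assumptions. Heel strikes have already been detected and arrive in chronological order, alternating between feet. Each strike is a (time in seconds, heel position on the ground plane in meters) pair. The function name is hypothetical, and the thesis may define the parameters differently, for example per gait cycle or per foot.

```python
import math
from typing import Dict, Sequence, Tuple

# (time [s], (x, y) heel position on the ground plane [m]); an assumed input format.
HeelStrike = Tuple[float, Tuple[float, float]]


def gait_parameters(strikes: Sequence[HeelStrike]) -> Dict[str, float]:
    """Gait velocity, cadence, and mean step length from alternating heel strikes."""
    if len(strikes) < 2:
        raise ValueError("need at least two heel strikes")
    # One step = distance between consecutive heel strikes of opposite feet.
    step_lengths = [math.dist(p0, p1) for (_, p0), (_, p1) in zip(strikes, strikes[1:])]
    duration = strikes[-1][0] - strikes[0][0]
    n_steps = len(step_lengths)
    return {
        "velocity_m_per_s": sum(step_lengths) / duration,   # distance walked per second
        "cadence_steps_per_min": 60.0 * n_steps / duration,  # steps per minute
        "step_length_m": sum(step_lengths) / n_steps,        # mean step length
    }
```

For four evenly spaced strikes 0.5 s and 0.6 m apart, this gives a cadence of 120 steps/min, a step length of about 0.6 m, and a velocity of about 1.2 m/s.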
Push recovery with stepping strategy based on time-projection control
In this paper, we present a simple control framework for online push recovery
with dynamic stepping properties. Because our robot's legs are relatively heavy,
we need to take swing dynamics into account, so we use a linear model called
3LP, composed of three pendulums, to simulate swing and torso dynamics. Based on
the 3LP equations, we formulate discrete LQR controllers and use a particular
time-projection method to continuously adjust the next footstep location online
during the motion. This adjustment, which is computed from both pelvis and
swing-foot tracking errors, naturally takes the swing dynamics into account. The
suggested adjustments are added to the Cartesian 3LP gaits and converted to
joint-space trajectories through inverse kinematics. Fixed and adaptive
foot-lift strategies also ensure enough ground clearance in perturbed walking
conditions. The proposed structure is robust, yet uses very simple state
estimation and basic position tracking. We rely on the physical series elastic
actuators to absorb impacts, while introducing simple laws to compensate for
their tracking bias. Extensive experiments demonstrate the functionality of the
different control blocks and the effectiveness of time projection in extreme
push-recovery scenarios. We also show self-produced, emergent walking gaits when
the robot is subjected to continuous dragging forces. These gaits exhibit
dynamic walking robustness owing to relatively soft springs in the ankles and
the absence of any Zero Moment Point (ZMP) control in our proposed architecture.
Comment: 20 pages, journal paper
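The abstract builds discrete LQR controllers on the 3LP model, whose equations are not reproduced here. The sketch below therefore swaps in a much simpler model: a forward-Euler discretized linear inverted pendulum (LIP) in the sagittal plane, with the center-of-pressure offset as the input. It only shows how an infinite-horizon discrete LQR gain is obtained by iterating the Riccati recursion, and how that gain rejects a velocity disturbance such as a push. It does not implement 3LP or time projection. The model parameters (g, z0, dt, Q, R) are assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def axpy(a: Matrix, b: Matrix, sign: float = 1.0) -> Matrix:
    """Element-wise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lqr_gain(A: Matrix, B: Matrix, P: Matrix, R: float) -> Matrix:
    """K = (R + B^T P B)^-1 B^T P A for a single-input system (B is n x 1, R is a scalar)."""
    BtP = matmul(transpose(B), P)
    s = R + matmul(BtP, B)[0][0]
    return [[v / s for v in matmul(BtP, A)[0]]]


def dlqr(A: Matrix, B: Matrix, Q: Matrix, R: float,
         max_iter: int = 10_000, tol: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Infinite-horizon discrete LQR by iterating the Riccati recursion to a fixed point."""
    P = [row[:] for row in Q]
    for _ in range(max_iter):
        K = lqr_gain(A, B, P, R)
        AtP = matmul(transpose(A), P)
        # P <- Q + A^T P A - A^T P B K
        P_next = axpy(axpy(Q, matmul(AtP, A)), matmul(matmul(AtP, B), K), sign=-1.0)
        delta = max(abs(x - y) for rn, rp in zip(P_next, P) for x, y in zip(rn, rp))
        P = P_next
        if delta < tol:
            break
    return lqr_gain(A, B, P, R), P


# Assumed LIP model: state [CoM offset x, CoM velocity v], input = CoP offset p.
# Continuous dynamics x'' = (g / z0) * (x - p), discretized with forward Euler.
G, Z0, DT = 9.81, 1.0, 0.05
W2 = G / Z0
A = [[1.0, DT], [W2 * DT, 1.0]]
B = [[0.0], [-W2 * DT]]
Q = [[1.0, 0.0], [0.0, 1.0]]
R = 1.0

K, P = dlqr(A, B, Q, R)


def simulate(x0: List[float], steps: int = 400) -> List[float]:
    """Closed loop x[k+1] = A x[k] + B u[k] with u = -K x, starting from a perturbed state."""
    x = [[x0[0]], [x0[1]]]
    for _ in range(steps):
        u = -matmul(K, x)[0][0]
        x = axpy(matmul(A, x), [[B[0][0] * u], [B[1][0] * u]])
    return [x[0][0], x[1][0]]


if __name__ == "__main__":
    # A push modeled as an instantaneous 0.5 m/s CoM velocity error.
    print(K, simulate([0.0, 0.5]))
```

The pure-Python 2x2 arithmetic keeps the sketch dependency-free. With SciPy available, one would normally solve the discrete algebraic Riccati equation directly, for example with `scipy.linalg.solve_discrete_are`.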
Learning When to Switch: Composing Controllers to Traverse a Sequence of Terrain Artifacts
Legged robots often use separate, highly engineered control policies for
traversing difficult terrain such as stairs, gaps, and steps, where switching
between policies is only possible when the robot is in a region that is common
to adjacent controllers. Deep Reinforcement Learning (DRL) is a promising
alternative to hand-crafted control design, though it typically requires the
full set of test conditions to be known before training. DRL policies can
result in complex (often unrealistic) behaviours that have few or no
overlapping regions between adjacent policies, making it difficult to switch
behaviours. In this work, we develop multiple DRL policies with Curriculum
Learning (CL), each of which can traverse a single respective terrain
condition, while ensuring an overlap between policies. We then train a network
for each destination policy that estimates the likelihood of successfully
switching from any other policy. We evaluate our switching method on a
previously unseen combination of terrain artifacts and show that it performs
better than heuristic methods. While our method is trained on individual
terrain types, it performs comparably to a Deep Q Network trained on the full
set of terrain conditions. This approach allows the development of separate
policies in constrained conditions with embedded prior knowledge about each
behaviour, is scalable to any number of behaviours, and prepares DRL methods
for applications in the real world.
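The switching decision described above can be pictured as a per-destination success estimator followed by a threshold test. The sketch below makes two assumptions. A logistic model over a few state features stands in for each trained network, and the robot switches only when the estimated success probability clears a fixed threshold. `SwitchEstimator`, `choose_policy`, the features, and the 0.8 threshold are all hypothetical, not the paper's architecture or settings.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class SwitchEstimator:
    """Stand-in for a learned per-destination network: a logistic model over state features."""
    weights: Sequence[float]
    bias: float

    def success_probability(self, features: Sequence[float]) -> float:
        z = self.bias + sum(w * f for w, f in zip(self.weights, features))
        return _sigmoid(z)


def choose_policy(current: str, desired: str, features: Sequence[float],
                  estimators: Dict[str, SwitchEstimator], threshold: float = 0.8) -> str:
    """Keep running `current` until the estimator for `desired` predicts a likely successful switch."""
    if desired == current:
        return current
    p = estimators[desired].success_probability(features)
    return desired if p >= threshold else current
```

Keeping one estimator per destination policy means a new behaviour can be added by training its policy and its estimator, without retraining the existing ones.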
FC Portugal 3D Simulation Team: Team Description Paper 2020
The FC Portugal 3D team is built upon the structure of our previous
Simulation League 2D/3D teams and our Standard Platform League team. Our
research on the robots' low-level skills focuses on developing behaviors that
can be applied to real robots with minimal adaptation, using model-based
approaches. Our research on high-level soccer coordination methodologies and
team play mainly focuses on adapting methodologies previously developed for our
2D soccer teams to the 3D humanoid environment, and on creating new
coordination methodologies based on them. This research-oriented development
has made the team one of the most competitive over the years (World Champion in
2000 and Coach Champion in 2002; European Champion in 2000 and 2001; Coach 2nd
place in 2003 and 2004; European Champion in Rescue Simulation and Simulation
3D in 2006; World Champion in Simulation 3D in Bremen 2006; and European
Champion in 2007, 2012, 2013, 2014, and 2015). This paper describes some of the
main innovations of our 3D simulation league team in recent years. We have also
developed a new generic framework for reinforcement learning tasks. Current
research focuses on improving this framework by developing new learning
algorithms to optimize low-level skills such as running and sprinting. We are
also trying to increase contact with students by providing reinforcement
learning assignments to be completed with the new framework, which exposes a
simple interface without sharing low-level implementation details.
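The abstract says only that the new reinforcement learning framework exposes a simple interface and hides low-level implementation details. The sketch below illustrates that idea with an interface modeled on the widely used Gym-style `reset`/`step` convention. `SkillEnv`, `ToySprintEnv`, and `run_episode` are hypothetical names, not the FC Portugal API, and the toy 1D dynamics stand in for the real 3D simulator.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Tuple

Observation = List[float]
Action = List[float]


class SkillEnv(ABC):
    """Hypothetical student-facing interface: observations in, actions out, no simulator internals."""

    @abstractmethod
    def reset(self) -> Observation:
        """Start a new episode and return the first observation."""

    @abstractmethod
    def step(self, action: Action) -> Tuple[Observation, float, bool]:
        """Apply one action; return (observation, reward, done)."""


class ToySprintEnv(SkillEnv):
    """Toy 1D stand-in for a low-level skill such as sprinting; reward is distance gained per step."""

    def __init__(self, max_steps: int = 100, dt: float = 0.1) -> None:
        self.max_steps, self.dt = max_steps, dt
        self.reset()

    def reset(self) -> Observation:
        self._x, self._v, self._t = 0.0, 0.0, 0
        return [self._x, self._v]

    def step(self, action: Action) -> Tuple[Observation, float, bool]:
        accel = max(-1.0, min(1.0, action[0]))  # clip to the allowed acceleration range
        self._v += accel * self.dt
        dx = self._v * self.dt
        self._x += dx
        self._t += 1
        return [self._x, self._v], dx, self._t >= self.max_steps


def run_episode(env: SkillEnv, policy: Callable[[Observation], Action]) -> float:
    """Roll out one episode and return its undiscounted return."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total
```

A student assignment would then only need to supply `policy`. The simulator, joint control, and communication details stay behind `step`.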