288 research outputs found

    Benchmarking Deep Reinforcement Learning for Continuous Control

    Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers. Comment: 14 pages, ICML 2016.
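    The released code is meant to be driven programmatically; as a rough illustration of the intended workflow, the sketch below follows the rllab quickstart pattern for training TRPO on the cart-pole task. The module paths, class names, and hyperparameters are reproduced from memory of the public repository and may differ between versions, so treat them as assumptions to verify against the project README.

```python
# Sketch of running one continuous-control benchmark task with rllab.
# Module paths and arguments follow the project's quickstart from memory
# and may need adjustment for the exact repository version.
from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

env = normalize(CartpoleEnv())                 # normalize observations/actions
policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))
baseline = LinearFeatureBaseline(env_spec=env.spec)

algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,        # environment steps per iteration
    max_path_length=100,    # episode horizon
    n_itr=40,               # number of training iterations
    discount=0.99,
    step_size=0.01,         # KL step-size constraint for TRPO
)
algo.train()
```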

    ๋™์˜์ƒ ์† ์‚ฌ๋žŒ ๋™์ž‘์˜ ๋ฌผ๋ฆฌ ๊ธฐ๋ฐ˜ ์žฌ๊ตฌ์„ฑ ๋ฐ ๋ถ„์„

    Thesis (Ph.D.) -- Graduate School of Seoul National University: College of Engineering, Department of Computer Science and Engineering, February 2021. Advisor: Jehee Lee.
    In computer graphics, simulating and analyzing human movement have been active research topics since the 1960s. Even so, simulating realistic human movements in a 3D virtual world remains a challenging task. Traditionally, motion capture techniques have been used to obtain movement data. Although motion capture guarantees realistic, high-quality results, it requires a great deal of equipment and the capture process is complicated. Recently, 3D human pose estimation from 2D video has developed remarkably, and researchers in computer graphics and computer vision have attempted to reconstruct various human motions from video data. However, existing methods cannot robustly estimate dynamic actions and do not work on videos filmed with a moving camera. In this thesis, we propose methods to reconstruct dynamic human motions from in-the-wild videos and to control those motions. First, we developed a framework that reconstructs motion from video using prior physics knowledge. For dynamic motions such as a backspin, the poses estimated by a state-of-the-art method are incomplete: they include an unreliable root trajectory or lack intermediate poses. We designed a reward function for a deep reinforcement learning controller using poses and hints extracted from the video, and learned a policy that simultaneously reconstructs the motion and controls a virtual character. Second, we simulated figure skating movements from video. Skating sequences consist of fast, dynamic movements on ice, which hinders the acquisition of motion data. We therefore extracted 3D key poses from video and successfully replicated several figure skating movements using trajectory optimization and a deep reinforcement learning controller. Third, we devised an algorithm for gait analysis from videos of patients with movement disorders such as Parkinson's disease or cerebral palsy. After acquiring the patients' joint positions from 2D video with a deep learning network, we estimated 3D absolute coordinates and computed gait parameters such as gait velocity, cadence, and step length. Additionally, we analyzed the optimization criteria of human walking using a 3D musculoskeletal humanoid model and physics-based simulation. For two criteria, minimization of muscle activation and minimization of joint torque, we compared simulation results with real human data. To demonstrate the effectiveness of the first two research topics, we verified the reconstruction of dynamic human motions from 2D video using physics-based simulation. For the last two research topics, we evaluated our results against real human data.
    Contents:
    1 Introduction
    2 Background
      2.1 Pose Estimation from 2D Video
      2.2 Motion Reconstruction from Monocular Video
      2.3 Physics-Based Character Simulation and Control
      2.4 Motion Reconstruction Leveraging Physics
      2.5 Human Motion Control
        2.5.1 Figure Skating Simulation
      2.6 Objective Gait Analysis
      2.7 Optimization for Human Movement Simulation
        2.7.1 Stability Criteria
    3 Human Dynamics from Monocular Video with Dynamic Camera Movements
      3.1 Introduction
      3.2 Overview
      3.3 Pose and Contact Estimation
      3.4 Learning Human Dynamics
        3.4.1 Policy Learning
        3.4.2 Network Training
        3.4.3 Scene Estimator
      3.5 Results
        3.5.1 Video Clips
        3.5.2 Comparison of Contact Estimators
        3.5.3 Ablation Study
        3.5.4 Robustness
      3.6 Discussion
    4 Figure Skating Simulation from Video
      4.1 Introduction
      4.2 System Overview
      4.3 Skating Simulation
        4.3.1 Non-holonomic Constraints
        4.3.2 Relaxation of Non-holonomic Constraints
      4.4 Data Acquisition
      4.5 Trajectory Optimization and Control
        4.5.1 Trajectory Optimization
        4.5.2 Control
      4.6 Experimental Results
      4.7 Discussion
    5 Gait Analysis Using Pose Estimation Algorithm with 2D-video of Patients
      5.1 Introduction
      5.2 Method
        5.2.1 Patients and video recording
        5.2.2 Standard protocol approvals, registrations, and patient consents
        5.2.3 3D pose estimation from 2D video
        5.2.4 Gait parameter estimation
        5.2.5 Statistical analysis
      5.3 Results
        5.3.1 Validation of video-based analysis of the gait
        5.3.2 Gait analysis
      5.4 Discussion
        5.4.1 Validation with the conventional sensor-based method
        5.4.2 Analysis of gait and turning in TUG
        5.4.3 Correlation with clinical parameters
        5.4.4 Limitations
      5.5 Supplementary Material
    6 Control Optimization of Human Walking
      6.1 Overview
      6.2 Methods
        6.2.1 Musculoskeletal model
        6.2.2 Optimization
        6.2.3 Control co-activation level
        6.2.4 Push-recovery experiment
      6.3 Results
      6.4 Discussion
    7 Conclusion
      7.1 Future work
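    The gait-analysis part of the thesis (Chapter 5) derives spatiotemporal gait parameters from estimated 3D joint positions. Below is a minimal sketch of how step length, cadence, and gait velocity could be computed from heel trajectories, assuming a y-up coordinate frame and a simple speed-threshold contact detector; the function names, thresholds, and array layout are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def contact_starts(heel, fps, speed_thresh=0.05):
    """Frames where a foot contact begins: the heel speed drops below a
    threshold (illustrative heuristic, not the thesis' contact detector)."""
    speed = np.linalg.norm(np.diff(heel, axis=0), axis=1) * fps
    in_contact = speed < speed_thresh
    return np.where(in_contact[1:] & ~in_contact[:-1])[0] + 1

def gait_parameters(heel_left, heel_right, fps=30.0):
    """heel_left/heel_right: [frames, 3] world-space heel positions in metres."""
    starts_l = contact_starts(heel_left, fps)
    starts_r = contact_starts(heel_right, fps)
    duration_s = len(heel_left) / fps

    # Step length: horizontal (x, z) distance between the heels at each
    # detected left-foot contact, averaged over all contacts.
    step_lengths = [np.linalg.norm((heel_left[f] - heel_right[f])[[0, 2]])
                    for f in starts_l]
    step_length = float(np.mean(step_lengths)) if step_lengths else 0.0

    # Cadence: detected steps of either foot per minute.
    cadence = 60.0 * (len(starts_l) + len(starts_r)) / duration_s

    # Gait velocity: horizontal displacement of the heel midpoint over time.
    mid = 0.5 * (heel_left + heel_right)
    gait_velocity = np.linalg.norm((mid[-1] - mid[0])[[0, 2]]) / duration_s
    return step_length, cadence, gait_velocity
```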

    Push recovery with stepping strategy based on time-projection control

    In this paper, we present a simple control framework for on-line push recovery with dynamic stepping properties. Due to the relatively heavy legs of our robot, we need to take swing dynamics into account, and we thus use a linear model called 3LP, composed of three pendulums, to simulate swing and torso dynamics. Based on the 3LP equations, we formulate discrete LQR controllers and use a particular time-projection method to continuously adjust the next footstep location on-line during the motion. This adjustment, which is computed from both pelvis and swing-foot tracking errors, naturally takes the swing dynamics into account. The suggested adjustments are added to the Cartesian 3LP gaits and converted to joint-space trajectories through inverse kinematics. Fixed and adaptive foot-lift strategies also ensure enough ground clearance in perturbed walking conditions. The proposed structure is robust, yet uses very simple state estimation and basic position tracking. We rely on the physical series elastic actuators to absorb impacts while introducing simple laws to compensate for their tracking bias. Extensive experiments demonstrate the functionality of the different control blocks and prove the effectiveness of time-projection in extreme push recovery scenarios. We also show self-produced, emergent walking gaits when the robot is subject to continuous dragging forces. These gaits remain dynamically robust thanks to the relatively soft springs in the ankles and the absence of any Zero Moment Point (ZMP) control in our proposed architecture. Comment: 20 pages, journal paper.
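    To make the control idea concrete, the sketch below computes a discrete LQR gain by iterating the Riccati recursion and then applies a schematic footstep correction in which the current tracking error is projected to the next discrete event before the gain is applied. The 2-state system matrices, error vector, and projection step are placeholder assumptions; the paper's controller uses the 3LP model and its own time-projection formulation.

```python
import numpy as np

def dlqr_gain(A, B, Q, R, iters=500):
    """Discrete-time LQR gain K from the Riccati recursion
    P = Q + A'PA - A'PB (R + B'PB)^(-1) B'PA."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Placeholder 2-state, 1-input step-to-step model (NOT the 3LP matrices).
A = np.array([[1.0, 0.1],
              [0.3, 1.0]])
B = np.array([[0.0],
              [0.1]])
K = dlqr_gain(A, B, Q=np.eye(2), R=np.array([[1.0]]))

# Schematic time-projection: project the current pelvis/swing-foot tracking
# error forward to the upcoming footstep event, then apply the LQR correction
# as an adjustment to the nominal footstep location.
error_now = np.array([0.05, -0.20])
error_at_step = A @ error_now
footstep_adjustment = -(K @ error_at_step)
print(footstep_adjustment)
```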

    Learning When to Switch: Composing Controllers to Traverse a Sequence of Terrain Artifacts

    Legged robots often use separate control policies that are highly engineered for traversing difficult terrain such as stairs, gaps, and steps, where switching between policies is only possible when the robot is in a region that is common to adjacent controllers. Deep Reinforcement Learning (DRL) is a promising alternative to hand-crafted control design, though it typically requires the full set of test conditions to be known before training. DRL policies can result in complex (often unrealistic) behaviours that have few or no overlapping regions between adjacent policies, making it difficult to switch behaviours. In this work we develop multiple DRL policies with Curriculum Learning (CL), each of which can traverse a single respective terrain condition, while ensuring an overlap between policies. We then train a network for each destination policy that estimates the likelihood of successfully switching from any other policy. We evaluate our switching method on a previously unseen combination of terrain artifacts and show that it performs better than heuristic methods. While our method is trained on individual terrain types, it performs comparably to a Deep Q Network trained on the full set of terrain conditions. This approach allows the development of separate policies in constrained conditions with embedded prior knowledge about each behaviour, is scalable to any number of behaviours, and prepares DRL methods for applications in the real world.
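    As a concrete reading of the switching mechanism, the sketch below pairs each destination policy with a small network that maps the robot state to an estimated probability that switching to that policy will succeed, and switches only when the best estimate clears a threshold. Network sizes, the threshold, and the selection rule are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class SwitchEstimator(nn.Module):
    """Per-destination-policy estimator of the probability that switching
    to this policy from the current state will succeed (illustrative)."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, state):
        return self.net(state)

def choose_policy(state, estimators, current_idx, threshold=0.9):
    """Switch to the most promising destination policy only when its
    estimated success probability clears the threshold."""
    with torch.no_grad():
        probs = [est(state).item() for est in estimators]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else current_idx

# Usage sketch: one estimator per destination policy, trained as a binary
# classifier on recorded switch attempts (success/failure labels).
estimators = [SwitchEstimator(state_dim=32) for _ in range(4)]
state = torch.zeros(32)
next_policy = choose_policy(state, estimators, current_idx=0)
```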

    FC Portugal 3D Simulation Team: Team Description Paper 2020

    The FC Portugal 3D team is built upon the structure of our previous Simulation League 2D/3D teams and our Standard Platform League team. Our research on low-level robot skills focuses on developing behaviors that can be applied to real robots with minimal adaptation, using model-based approaches. Our research on high-level soccer coordination methodologies and team play focuses mainly on adapting methodologies previously developed for our 2D soccer teams to the 3D humanoid environment and on creating new coordination methodologies based on them. The research-oriented development of our team has kept it among the most competitive over the years (World champion in 2000 and Coach champion in 2002, European champion in 2000 and 2001, Coach 2nd place in 2003 and 2004, European champion in Rescue Simulation and Simulation 3D in 2006, World champion in Simulation 3D in Bremen 2006, and European champion in 2007, 2012, 2013, 2014 and 2015). This paper describes some of the main innovations of our 3D simulation league team over recent years. A new generic framework for reinforcement learning tasks has also been developed. Current research focuses on improving this framework by developing new learning algorithms to optimize low-level skills, such as running and sprinting. We are also trying to increase student involvement by providing reinforcement learning assignments to be completed using the new framework, which exposes a simple interface without sharing low-level implementation details.
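    Since the abstract mentions a generic reinforcement learning framework that exposes a simple interface while hiding low-level details, the sketch below shows one plausible shape for such a task interface and a rollout helper. The class and method names are hypothetical illustrations, not the team's actual API.

```python
from abc import ABC, abstractmethod
import numpy as np

class SkillTask(ABC):
    """Hypothetical minimal task interface for optimizing a low-level skill
    (e.g. running or sprinting) without exposing simulator internals."""

    @abstractmethod
    def reset(self) -> np.ndarray:
        """Reset the episode and return the initial observation."""

    @abstractmethod
    def step(self, action: np.ndarray):
        """Apply one action; return (observation, reward, done)."""

def rollout(task: SkillTask, policy, max_steps=1000):
    """Return the episode return obtained by a policy callable obs -> action."""
    obs, total = task.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done = task.step(policy(obs))
        total += reward
        if done:
            break
    return total
```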
    • โ€ฆ
    corecore