1,947 research outputs found

    SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos

    We present SLoMo: a first-of-its-kind framework for transferring skilled motions from casually captured "in the wild" video footage of humans and animals to legged robots. SLoMo works in three stages: 1) synthesize a physically plausible reconstructed key-point trajectory from monocular videos; 2) optimize, offline, a dynamically feasible reference trajectory for the robot that closely tracks the key points and includes body and foot motion as well as contact sequences; 3) track the reference trajectory online using a general-purpose model-predictive controller on robot hardware. Traditional motion imitation for legged motor skills often requires expert animators, collaborative demonstrations, and/or expensive motion capture equipment, all of which limit scalability. Instead, SLoMo relies only on easy-to-obtain monocular video footage, readily available in online repositories such as YouTube. It converts videos into motion primitives that can be executed reliably by real-world robots. We demonstrate our approach by transferring the motions of cats, dogs, and humans to example robots including a quadruped (on hardware) and a humanoid (in simulation). To the best of the authors' knowledge, this is the first attempt at a general-purpose motion transfer framework that imitates animal and human motions on legged robots directly from casual videos without artificial markers or labels. Comment: accepted at RA-L 2023, with ICRA 2024 optio

    A framework for realistic 3D tele-immersion

    Meeting, socializing and conversing online with a group of people using teleconferencing systems is still quite different from the experience of meeting face to face. We are abruptly aware that we are online and that the people we are engaging with are not in close proximity, much as talking on the telephone does not replicate the experience of talking in person. Several causes for these differences have been identified, and we propose inspiring and innovative solutions to these hurdles in an attempt to provide a more realistic, believable and engaging online conversational experience. We present the distributed and scalable framework REVERIE, which provides a balanced mix of these solutions. Applications built on top of the REVERIE framework will be able to provide interactive, immersive, photo-realistic experiences to a multitude of users, experiences that will feel much more similar to face-to-face meetings than those offered by conventional teleconferencing systems.

    Optical Non-Line-of-Sight Physics-based 3D Human Pose Estimation

    We describe a method for 3D human pose estimation from transient images (i.e., a 3D spatio-temporal histogram of photons) acquired by an optical non-line-of-sight (NLOS) imaging system. Our method can perceive 3D human pose by "looking around corners" through the use of light indirectly reflected by the environment. We bring together a diverse set of technologies from NLOS imaging, human pose estimation and deep reinforcement learning to construct an end-to-end data processing pipeline that converts a raw stream of photon measurements into a full 3D human pose sequence estimate. Our contributions are the design of a data representation process that includes (1) a learnable inverse point spread function (PSF) to convert raw transient images into a deep feature vector; (2) a neural humanoid control policy conditioned on the transient image feature and learned from interactions with a physics simulator; and (3) a data synthesis and augmentation strategy based on depth data that can be transferred to a real-world NLOS imaging system. Our preliminary experiments suggest that our method is able to generalize to real-world NLOS measurements to estimate physically valid 3D human poses. Comment: CVPR 2020. Video: https://youtu.be/4HFulrdmLE8. Project page: https://marikoisogawa.github.io/project/nlos_pos

    Physics-Based Reconstruction and Analysis of Human Motion in Video

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2021. Jehee Lee. In computer graphics, simulating and analyzing human movement have been research topics of interest since the 1960s. Still, simulating realistic human movements in a 3D virtual world remains a challenging task. In general, motion capture techniques have been used. Although motion capture guarantees realistic results and high-quality data, it requires a large amount of equipment and the process is complicated. Recently, techniques for estimating 3D human pose from 2D video have advanced remarkably. Researchers in computer graphics and computer vision have attempted to reconstruct various human motions from video data. However, existing methods cannot robustly estimate dynamic actions and do not work on videos filmed with a moving camera. In this thesis, we propose methods to reconstruct dynamic human motions from in-the-wild videos and to control the motions. First, we developed a framework to reconstruct motion from videos using prior physics knowledge. For dynamic motions such as backspin, the poses estimated by a state-of-the-art method are incomplete: they include an unreliable root trajectory or lack intermediate poses. We designed a reward function for a deep reinforcement learning controller using poses and hints extracted from videos, and learned a policy that simultaneously reconstructs motion and controls a virtual character. Second, we simulated figure skating movements from video. Skating sequences consist of fast and dynamic movements on ice, which hinders the acquisition of motion data. We therefore extracted 3D key poses from a video and successfully replicated several figure skating movements using trajectory optimization and a deep reinforcement learning controller. Third, we devised an algorithm for gait analysis from video of patients with movement disorders. 
After acquiring the patients' joint positions from 2D video processed by a deep learning network, the 3D absolute coordinates were estimated, and gait parameters such as gait velocity, cadence, and step length were calculated. Additionally, we analyzed the optimization criteria of human walking by using a 3D musculoskeletal humanoid model and physics-based simulation. For two criteria, namely the minimization of muscle activation and the minimization of joint torque, we compared simulation data with real human data. To demonstrate the effectiveness of the first two research topics, we verified the reconstruction of dynamic human motions from 2D videos using physics-based simulations. For the last two research topics, we evaluated our results against real human data.
    1 Introduction
    2 Background
        2.1 Pose Estimation from 2D Video
        2.2 Motion Reconstruction from Monocular Video
        2.3 Physics-Based Character Simulation and Control
        2.4 Motion Reconstruction Leveraging Physics
        2.5 Human Motion Control
            2.5.1 Figure Skating Simulation
        2.6 Objective Gait Analysis
        2.7 Optimization for Human Movement Simulation
            2.7.1 Stability Criteria
    3 Human Dynamics from Monocular Video with Dynamic Camera Movements
        3.1 Introduction
        3.2 Overview
        3.3 Pose and Contact Estimation
        3.4 Learning Human Dynamics
            3.4.1 Policy Learning
            3.4.2 Network Training
            3.4.3 Scene Estimator
        3.5 Results
            3.5.1 Video Clips
            3.5.2 Comparison of Contact Estimators
            3.5.3 Ablation Study
            3.5.4 Robustness
        3.6 Discussion
    4 Figure Skating Simulation from Video
        4.1 Introduction
        4.2 System Overview
        4.3 Skating Simulation
            4.3.1 Non-holonomic Constraints
            4.3.2 Relaxation of Non-holonomic Constraints
        4.4 Data Acquisition
        4.5 Trajectory Optimization and Control
            4.5.1 Trajectory Optimization
            4.5.2 Control
        4.6 Experimental Results
        4.7 Discussion
    5 Gait Analysis Using Pose Estimation Algorithm with 2D-video of Patients
        5.1 Introduction
        5.2 Method
            5.2.1 Patients and video recording
            5.2.2 Standard protocol approvals, registrations, and patient consents
            5.2.3 3D Pose estimation from 2D video
            5.2.4 Gait parameter estimation
            5.2.5 Statistical analysis
        5.3 Results
            5.3.1 Validation of video-based analysis of the gait
            5.3.2 Gait analysis
        5.4 Discussion
            5.4.1 Validation with the conventional sensor-based method
            5.4.2 Analysis of gait and turning in TUG
            5.4.3 Correlation with clinical parameters
            5.4.4 Limitations
        5.5 Supplementary Material
    6 Control Optimization of Human Walking
        6.1 Overview
        6.2 Methods
            6.2.1 Musculoskeletal model
            6.2.2 Optimization
            6.2.3 Control co-activation level
            6.2.4 Push-recovery experiment
        6.3 Results
        6.4 Discussion
    7 Conclusion
        7.1 Future work
    Docto
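
    The thesis abstract above mentions computing gait velocity, cadence, and step length from estimated 3D joint positions, but does not say how. The following is a minimal sketch using the common textbook definitions, not the author's method. It assumes that heel-strike events have already been detected and are given as time-stamped ground-plane positions; the `HeelStrike` type and the `gait_parameters` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class HeelStrike:
    """One foot-contact event (hypothetical input format, an assumption)."""
    time_s: float  # time of the contact, in seconds
    x_m: float     # ground-plane position of the heel, in metres
    z_m: float


def gait_parameters(strikes):
    """Return (velocity in m/s, cadence in steps/min, mean step length in m).

    `strikes` must hold at least two events, sorted by time and
    alternating between the left and right foot.
    """
    if len(strikes) < 2:
        raise ValueError("need at least two heel strikes")
    steps = len(strikes) - 1
    duration = strikes[-1].time_s - strikes[0].time_s
    if duration <= 0:
        raise ValueError("heel strikes must be sorted by increasing time")

    # Step length: distance between consecutive (opposite-foot) heel strikes.
    lengths = [
        hypot(b.x_m - a.x_m, b.z_m - a.z_m)
        for a, b in zip(strikes, strikes[1:])
    ]
    step_length = sum(lengths) / steps

    # Cadence: number of steps per minute.
    cadence = 60.0 * steps / duration

    # Velocity: distance covered along the stepping path per second.
    velocity = sum(lengths) / duration
    return velocity, cadence, step_length


if __name__ == "__main__":
    # Synthetic walk: one 0.6 m step every 0.5 s in a straight line.
    walk = [HeelStrike(0.5 * i, 0.6 * i, 0.0) for i in range(5)]
    v, c, s = gait_parameters(walk)
    print(f"velocity={v:.2f} m/s  cadence={c:.0f} steps/min  step={s:.2f} m")
```

    Velocity here is measured along the stepping path; for a curved walk (for example, the turn in a TUG test) a straight-line displacement would give a different value, so the choice of definition matters when comparing against sensor-based measurements.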