
    Deep Autoencoder for Combined Human Pose Estimation and Body Model Upscaling

    We present a method for simultaneously estimating 3D human pose and body shape from a sparse set of wide-baseline camera views. We train a symmetric convolutional autoencoder with a dual loss that enforces learning of a latent representation that encodes skeletal joint positions, and at the same time learns a deep representation of volumetric body shape. We harness the latter to up-scale input volumetric data by a factor of 4×, whilst recovering a 3D estimate of joint positions with equal or greater accuracy than the state of the art. Inference runs in real-time (25 fps) and has the potential for passive human behaviour monitoring where there is a requirement for high fidelity estimation of human body shape and pose.
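
    Below is a minimal sketch (not the authors' implementation) of the dual-loss autoencoder idea described above: an encoder maps a coarse occupancy volume to a latent code, one head regresses 3D joint positions from that code, and a decoder reconstructs the volume at four times the input resolution. The joint count, volume sizes and layer widths are assumptions for illustration only.

    import torch
    import torch.nn as nn

    class DualLossAutoencoder(nn.Module):
        def __init__(self, num_joints=17, latent_dim=256):   # assumed sizes
            super().__init__()
            self.num_joints = num_joints
            # Encoder: 32^3 occupancy volume -> latent vector
            self.encoder = nn.Sequential(
                nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # -> 16^3
                nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # -> 8^3
                nn.Flatten(),
                nn.Linear(32 * 8 * 8 * 8, latent_dim),
            )
            # Pose head: latent -> 3D joint positions
            self.pose_head = nn.Linear(latent_dim, num_joints * 3)
            # Decoder: latent -> 128^3 volume, i.e. 4x the input resolution
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 32 * 8 * 8 * 8),
                nn.Unflatten(1, (32, 8, 8, 8)),
                nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # -> 16^3
                nn.ConvTranspose3d(16, 8, 4, stride=2, padding=1), nn.ReLU(),   # -> 32^3
                nn.ConvTranspose3d(8, 4, 4, stride=2, padding=1), nn.ReLU(),    # -> 64^3
                nn.ConvTranspose3d(4, 1, 4, stride=2, padding=1), nn.Sigmoid(), # -> 128^3
            )

        def forward(self, volume):
            z = self.encoder(volume)
            joints = self.pose_head(z).view(-1, self.num_joints, 3)
            upscaled = self.decoder(z)
            return joints, upscaled

    # Dual loss: joint-position error plus high-resolution reconstruction error.
    model = DualLossAutoencoder()
    coarse = torch.rand(2, 1, 32, 32, 32)        # low-resolution input volumes
    gt_joints = torch.rand(2, 17, 3)             # ground-truth joint positions
    gt_fine = torch.rand(2, 1, 128, 128, 128)    # high-resolution target volumes
    pred_joints, pred_fine = model(coarse)
    loss = (nn.functional.mse_loss(pred_joints, gt_joints)
            + nn.functional.binary_cross_entropy(pred_fine, gt_fine))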

    Improved Calibration Procedure for Wireless Inertial Measurement Units without Precision Equipment

    Inertial measurement units (IMUs) are used in medical applications for many different purposes. However, an IMU's measurement accuracy can degrade over time, entailing re-calibration. In their 2014 paper, Tedaldi et al. presented an IMU calibration method that does not require external precision equipment or complex procedures. This allows end-users, or personnel without expert knowledge of inertial measurement, to re-calibrate the sensors by placing them in several suitable but not precisely defined orientations. In this work, we present several improvements to Tedaldi's method, on both the algorithmic level and the calibration procedure: adaptations for low-noise accelerometers, a calibration helper object, and packet loss compensation for wireless calibration. We applied the modified calibration procedure to our custom-built IMU platform and verified the consistency of results across multiple calibration runs. In order to minimize the time needed for re-calibration, we analyzed how the calibration accuracy degrades when fewer calibration orientations are used. We found that N=12 different orientations are sufficient to achieve a very good calibration, and that more orientations yielded only marginal improvements. This is a significant improvement compared to the 37 to 50 orientations recommended by Tedaldi. Thus, we reduced the time required to calibrate a single IMU from ca. 5 minutes to less than 2 minutes without sacrificing any meaningful calibration accuracy.
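
    As a minimal sketch of the multi-position calibration principle behind such methods (a simplified model with scale factors and biases only, omitting the axis-misalignment terms a full calibration estimates), the accelerometer parameters can be found by requiring that the magnitude of the calibrated reading in every static orientation equals gravity. All variable names and the synthetic data are for illustration only.

    import numpy as np
    from scipy.optimize import least_squares

    G = 9.80665  # local gravity magnitude (m/s^2)

    def residuals(params, static_means):
        """params = [sx, sy, sz, bx, by, bz]; static_means is an (N, 3) array of
        averaged raw accelerometer readings, one per static orientation."""
        scale, bias = params[:3], params[3:]
        calibrated = (static_means - bias) * scale
        return np.linalg.norm(calibrated, axis=1) - G

    # Synthetic example with N = 12 orientations, as found sufficient above.
    rng = np.random.default_rng(0)
    true_scale = np.array([1.02, 0.98, 1.01])
    true_bias = np.array([0.05, -0.03, 0.10])
    directions = rng.normal(size=(12, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    raw = (directions * G) / true_scale + true_bias      # simulated raw readings
    result = least_squares(residuals, x0=[1, 1, 1, 0, 0, 0], args=(raw,))
    print("estimated scale:", result.x[:3], "estimated bias:", result.x[3:])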

    Musculoskeletal Estimation Using Inertial Measurement Units and Single Video Image

    We address the problem of estimating the physical burden on a human body. This translates to monitoring and estimating the muscle tension and joint reaction forces of a musculoskeletal model in real time. The system should minimize the discomfort generated by any sensors that need to be fixed on the user. Our system combines 3D pose estimation from vision with IMU sensors. We aim to minimize the number of IMUs fixed to the subject while compensating for the remaining lack of information with vision.
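
    As a minimal sketch (a generic illustration, not the authors' estimator) of the vision/IMU combination idea, a per-segment orientation can be taken from an IMU where one is worn and filled in from the vision-based pose estimate otherwise; axis-angle vectors stand in as the orientation representation, and all segment names are hypothetical.

    import numpy as np

    def fuse_segment_orientations(vision_rots, imu_rots):
        """vision_rots: dict segment -> axis-angle from the camera pose estimate;
        imu_rots: dict segment -> axis-angle for segments that carry an IMU.
        Returns one orientation per segment, preferring the IMU when available."""
        fused = {}
        for segment, vision_rot in vision_rots.items():
            if segment in imu_rots:
                fused[segment] = np.asarray(imu_rots[segment])   # IMU where worn
            else:
                fused[segment] = np.asarray(vision_rot)          # vision fills the gap
        return fused

    # Example: an IMU only on the left forearm; vision provides the rest.
    vision = {"pelvis": [0.0, 0.0, 0.1], "left_forearm": [0.2, 0.0, 0.0]}
    imus = {"left_forearm": [0.25, 0.0, 0.0]}
    print(fuse_segment_orientations(vision, imus))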

    Real-time Full-Body Motion Capture from Video and IMUs

    A real-time full-body motion capture system is presented which uses input from a sparse set of inertial measurement units (IMUs) along with images from two or more standard video cameras and requires no optical markers or specialized infra-red cameras. A real-time optimization-based framework is proposed which incorporates constraints from the IMUs, cameras and a prior pose model. The combination of video and IMU data allows the full 6-DOF motion to be recovered including axial rotation of limbs and drift-free global position. The approach was tested using both indoor and outdoor captured data. The results demonstrate the effectiveness of the approach for tracking a wide range of human motion in real time in unconstrained indoor/outdoor scenes
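
    A minimal sketch (an illustration of the general idea only, not the authors' solver) of the kind of weighted least-squares objective such a framework minimizes per frame is shown below: 2D reprojection terms from two cameras, a limb-direction term from an IMU, and a prior pulling the solution towards a reference pose. The camera matrices, weights and observations are made-up toy values.

    import numpy as np
    from scipy.optimize import minimize

    def project(P, X):
        """Project 3D point X with a 3x4 camera matrix P to pixel coordinates."""
        x = P @ np.append(X, 1.0)
        return x[:2] / x[2]

    def cost(theta, cams, obs_2d, imu_dir, prior, w_cam=1.0, w_imu=0.5, w_prior=0.1):
        """theta holds two joint positions (e.g. elbow and wrist), flattened."""
        elbow, wrist = theta[:3], theta[3:]
        e = 0.0
        for P, (uv_elbow, uv_wrist) in zip(cams, obs_2d):          # camera terms
            e += w_cam * np.sum((project(P, elbow) - uv_elbow) ** 2)
            e += w_cam * np.sum((project(P, wrist) - uv_wrist) ** 2)
        bone = (wrist - elbow) / (np.linalg.norm(wrist - elbow) + 1e-9)
        e += w_imu * np.sum((bone - imu_dir) ** 2)                 # IMU limb direction
        e += w_prior * np.sum((theta - prior) ** 2)                # pose prior
        return e

    # Toy setup: two cameras 0.2 m apart, slightly noisy 2D detections, a forearm IMU.
    cams = [np.hstack([np.eye(3), np.zeros((3, 1))]),
            np.hstack([np.eye(3), np.array([[0.2], [0.0], [0.0]])])]
    obs_2d = [((0.100, 0.201), (0.166, 0.299)), ((0.168, 0.199), (0.234, 0.301))]
    imu_dir = np.array([0.55, 0.83, 0.0])
    prior = np.array([0.3, 0.6, 3.0, 0.5, 0.9, 3.0])
    res = minimize(cost, x0=prior, args=(cams, obs_2d, imu_dir, prior))
    print("optimized elbow/wrist positions:", res.x.round(3))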

    Combining haptics and inertial motion capture to enhance remote control of a dual-arm robot

    High dexterity is required in tasks in which there is contact between objects, such as surface conditioning (wiping, polishing, scuffing, sanding, etc.), especially when the location of the objects involved is unknown or highly inaccurate because they are moving, like a car body on an automotive production line. These applications require human adaptability and robot accuracy. However, sharing the same workspace is not possible in most cases due to safety issues. Hence, a multi-modal teleoperation system combining haptics and an inertial motion capture system is introduced in this work. The human operator gets the sense of touch thanks to haptic feedback, whereas the motion capture device allows more naturalistic movements. Visual feedback assistance is also introduced to enhance immersion. A Baxter dual-arm robot is used to offer more flexibility and manoeuvrability, allowing two independent operations to be performed simultaneously. Several tests have been carried out to assess the proposed system. As shown by the experimental results, the task duration is reduced and the overall performance improves thanks to the proposed teleoperation method.
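    As a minimal sketch (hypothetical names, gains and limits; not the authors' control scheme) of the basic bilateral mapping in such a setup, the operator's wrist position from the inertial motion capture can drive the robot end-effector target while the sensed contact force is scaled back to the haptic device:

    import numpy as np

    MOTION_SCALE = 0.8   # operator workspace -> robot workspace scaling (assumed)
    FORCE_SCALE = 0.3    # robot contact force -> haptic feedback scaling (assumed)
    FORCE_LIMIT = 4.0    # haptic device force limit in N (assumed)

    def teleop_step(operator_wrist_pos, operator_home, robot_home, contact_force):
        """One control cycle: returns the robot target position and the
        force command to render on the haptic device."""
        # Relative operator motion, scaled into the robot frame.
        target = robot_home + MOTION_SCALE * (operator_wrist_pos - operator_home)
        # Scale and clamp the sensed contact force for the haptic device.
        feedback = np.clip(FORCE_SCALE * contact_force, -FORCE_LIMIT, FORCE_LIMIT)
        return target, feedback

    # Example cycle: operator moved 5 cm forward, robot presses 10 N against a surface.
    target, feedback = teleop_step(
        operator_wrist_pos=np.array([0.05, 0.0, 0.0]),
        operator_home=np.zeros(3),
        robot_home=np.array([0.6, 0.2, 0.3]),
        contact_force=np.array([0.0, 0.0, -10.0]),
    )
    print("robot target:", target, "haptic force:", feedback)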

    Real-time 3D human body pose estimation from monocular RGB input

    Human motion capture finds extensive application in movies, games, sports and biomechanical analysis. However, existing motion capture solutions require cumbersome external and/or on-body instrumentation, or use active sensors with limits on the possible capture volume dictated by power consumption. The ubiquity and ease of deployment of RGB cameras makes monocular RGB based human motion capture an extremely useful problem to solve, which would lower the barrier to entry for content creators to employ motion capture tools, and enable newer applications of human motion capture. This thesis demonstrates the first real-time monocular RGB based motion-capture solutions that work in general scene settings. They are based on developing neural network based approaches to address the ill-posed problem of estimating 3D human pose from a single RGB image, in combination with model-based fitting. In particular, the contributions of this work make advances towards three key aspects of real-time monocular RGB based motion capture, namely speed, accuracy, and the ability to work for general scenes. New training datasets are proposed, for single-person and multi-person scenarios, which, together with the proposed transfer learning based training pipeline, allow learning based approaches to be appearance invariant. The training datasets are accompanied by evaluation benchmarks with multiple avenues of fine-grained evaluation. The evaluation benchmarks differ visually from the training datasets, so as to promote efforts towards solutions that generalize to in-the-wild scenes. The proposed task formulations for the single-person and multi-person case allow higher accuracy, and incorporate additional qualities such as occlusion robustness, that are helpful in the context of a full motion capture solution. The multi-person formulations are designed to have a nearly constant inference time regardless of the number of subjects in the scene, and combined with contributions towards fast neural network inference, enable real-time 3D pose estimation for multiple subjects. Combining the proposed learning-based approaches with a model-based kinematic skeleton fitting step provides temporally stable joint angle estimates, which can be readily employed for driving virtual characters.
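
    As a minimal sketch (illustrative only, and far simpler than the pipeline in the thesis) of the final model-based fitting step mentioned above, per-frame joint-position predictions from a network can be turned into temporally stable joint angles by fitting a kinematic chain with a smoothness term. A planar two-link arm with assumed bone lengths stands in for the full body skeleton.

    import numpy as np
    from scipy.optimize import minimize

    LEN_UPPER, LEN_FORE = 0.30, 0.25   # assumed bone lengths (m)

    def forward_kinematics(angles):
        """angles = (shoulder, elbow) -> 2D positions of the elbow and wrist."""
        a1, a2 = angles
        elbow = np.array([LEN_UPPER * np.cos(a1), LEN_UPPER * np.sin(a1)])
        wrist = elbow + np.array([LEN_FORE * np.cos(a1 + a2), LEN_FORE * np.sin(a1 + a2)])
        return elbow, wrist

    def fit_frame(pred_elbow, pred_wrist, previous_angles, w_smooth=0.1):
        """Fit joint angles to the network's joint predictions for one frame,
        with a smoothness term tying them to the previous frame's solution."""
        def cost(angles):
            elbow, wrist = forward_kinematics(angles)
            data_term = (np.sum((elbow - pred_elbow) ** 2)
                         + np.sum((wrist - pred_wrist) ** 2))
            smooth_term = np.sum((angles - previous_angles) ** 2)
            return data_term + w_smooth * smooth_term
        return minimize(cost, x0=previous_angles).x

    # Example: slightly noisy predictions for the next frame.
    angles = np.array([0.40, 0.30])                      # previous frame's angles
    elbow_pred, wrist_pred = forward_kinematics([0.45, 0.35])
    angles = fit_frame(elbow_pred + 0.01, wrist_pred - 0.01, angles)
    print("fitted joint angles:", angles.round(3))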

    Whole-Body Motion Capture and Beyond: From Model-Based Inference to Learning-Based Regression

    Though effective and successful, traditional marker-less Motion Capture (MoCap) methods suffer from several limitations: 1) they presume a character-specific body model, thus they do not permit a fully automatic pipeline and generalization over diverse body shapes; 2) no objects humans interact with are tracked, while in reality interaction between humans and objects is ubiquitous; 3) they heavily rely on a sophisticated optimization process, which needs a good initialization and strong priors. This process can be slow. We address all the aforementioned issues in this thesis, as described below. Firstly, we propose a fully automatic method to accurately reconstruct a 3D human body from multi-view RGB videos, the typical setup for MoCap systems. We pre-process all RGB videos to obtain 2D keypoints and silhouettes. Then we fit the SMPL body model to the 2D measurements in two successive stages. In the first stage, the shape and pose parameters of SMPL are estimated frame-wise sequentially. In the second stage, a batch of frames is refined jointly with an extra DCT prior. Our method can naturally handle different body shapes and challenging poses without human intervention. Then we extend this system to support tracking of rigid objects the subjects interact with. Our setup consists of 6 Azure Kinect cameras. Firstly we pre-process all the videos by segmenting humans and objects and detecting 2D body joints. We adopt the SMPL-X model here to capture body and hand pose. The model is fitted to 2D keypoints and point clouds. Then the body poses and object poses are jointly updated with contact and interpenetration constraints. With this approach, we capture a novel human-object interaction dataset with natural RGB images and plausible body and object motion information. Lastly, we present the first practical and lightweight MoCap system that needs only 6 IMUs. Our approach is based on bi-directional RNNs. The network can make use of temporal information by jointly reasoning about past and future IMU measurements. To handle the data scarcity issue, we create synthetic data from archival MoCap data. Overall, our system runs ten times faster than traditional optimization-based methods, and is numerically more accurate. We also show it is feasible to estimate which activity the subject is doing by only observing the IMU measurement from a smartwatch worn by the subject. This not only can be useful for a high-level semantic understanding of human behavior, but also alerts the public to potential privacy concerns. In summary, we advance marker-less MoCap by contributing the first automatic yet accurate system, extending MoCap methods to support rigid object tracking, and proposing a practical and lightweight algorithm using 6 IMUs. We believe our work makes marker-less and IMU-based MoCap cheaper and more practical, and thus closer to end-users for daily usage.
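
    The 6-IMU system described above lends itself to a short sketch. Below is a minimal, hedged example (assumed input sizes and pose parameterization, not the authors' network) of a bidirectional LSTM that maps per-frame measurements from 6 IMUs (a 3x3 orientation matrix plus 3D acceleration per sensor, i.e. 6 x 12 = 72 values) to per-frame full-body pose parameters, reasoning jointly over past and future frames.

    import torch
    import torch.nn as nn

    class BiRNNPoseEstimator(nn.Module):
        def __init__(self, imu_dim=6 * 12, hidden=256, pose_dim=24 * 3):
            super().__init__()
            self.rnn = nn.LSTM(imu_dim, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, pose_dim)   # 2x: forward + backward

        def forward(self, imu_seq):
            # imu_seq: (batch, frames, imu_dim) -> (batch, frames, pose_dim)
            features, _ = self.rnn(imu_seq)
            return self.head(features)

    # Example: a 2-second window of synthetic IMU data at 60 Hz.
    model = BiRNNPoseEstimator()
    imu_window = torch.randn(1, 120, 72)
    pose_sequence = model(imu_window)      # per-frame pose parameters
    print(pose_sequence.shape)             # torch.Size([1, 120, 72])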