156 research outputs found

    Real-Time Human Motion Capture with Multiple Depth Cameras

    Full text link
    Commonly used human motion capture systems require intrusive attachment of markers that are visually tracked with multiple cameras. In this work we present an efficient and inexpensive solution to markerless motion capture using only a few Kinect sensors. Unlike the previous work on 3d pose estimation using a single depth camera, we relax constraints on the camera location and do not assume a co-operative user. We apply recent image segmentation techniques to depth images and use curriculum learning to train our system on purely synthetic data. Our method accurately localizes body parts without requiring an explicit shape model. The body joint locations are then recovered by combining evidence from multiple views in real-time. We also introduce a dataset of ~6 million synthetic depth frames for pose estimation from multiple cameras and exceed state-of-the-art results on the Berkeley MHAD dataset.Comment: Accepted to computer robot vision 201

    Multiview 3D markerless human pose estimation from OpenPose skeletons

    Get PDF
    Despite the fact that marker-based systems for human motion estimation provide very accurate tracking of the human body joints (at mm precision), these systems are often intrusive or even impossible to use depending on the circumstances, e.g.~markers cannot be put on an athlete during competition. Instrumenting an athlete with the appropriate number of markers requires a lot of time and these markers may fall off during the analysis, which leads to incomplete data and requires new data capturing sessions and hence a waste of time and effort. Therefore, we present a novel multiview video-based markerless system that uses 2D joint detections per view (from OpenPose) to estimate their corresponding 3D positions while tackling the people association problem in the process to allow the tracking of multiple persons at the same time. Our proposed system can perform the tracking in real-time at 20-25 fps. Our results show a standard deviation between 9.6 and 23.7 mm for the lower body joints based on the raw measurements only. After filtering the data, the standard deviation drops to a range between 6.6 and 21.3 mm. Our proposed solution can be applied to a large number of applications, ranging from sports analysis to virtual classrooms where submillimeter precision is not necessarily required, but where the use of markers is impractical

    AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs in the Wild

    Full text link
    Animals are capable of extreme agility, yet understanding their complex dynamics, which have ecological, biomechanical and evolutionary implications, remains challenging. Being able to study this incredible agility will be critical for the development of next-generation autonomous legged robots. In particular, the cheetah (acinonyx jubatus) is supremely fast and maneuverable, yet quantifying its whole-body 3D kinematic data during locomotion in the wild remains a challenge, even with new deep learning-based methods. In this work we present an extensive dataset of free-running cheetahs in the wild, called AcinoSet, that contains 119,490 frames of multi-view synchronized high-speed video footage, camera calibration files and 7,588 human-annotated frames. We utilize markerless animal pose estimation to provide 2D keypoints. Then, we use three methods that serve as strong baselines for 3D pose estimation tool development: traditional sparse bundle adjustment, an Extended Kalman Filter, and a trajectory optimization-based method we call Full Trajectory Estimation. The resulting 3D trajectories, human-checked 3D ground truth, and an interactive tool to inspect the data is also provided. We believe this dataset will be useful for a diverse range of fields such as ecology, neuroscience, robotics, biomechanics as well as computer vision.Comment: Code and data can be found at: https://github.com/African-Robotics-Unit/AcinoSe

    Human Model Adaptation for Multiview Markerless Motion Capture

    Get PDF
    An approach to automatic modeling of individual human bodies using complex shape and pose information. The aim is to address the need for human shape and pose model generation for markerless motion capture. With multi-view markerless motion capture, three-dimensional morphable models are learned from an existing database of registered body scans in different shapes and poses. We estimate the body skeleton and pose parameters from the visual hull mesh reconstructed from multiple human silhouettes. Pose variation of body shapes is implemented by the defined underlying skeleton. The shape parameters are estimated by fitting the morphable model to the silhouettes. It is done relying on extracted silhouettes only. An error function is defined to measure how well the human model fits the input data, and minimize it to get the good estimate result. Further, experiments on some data show the robustness of the method, where the body shape and the initial pose can be obtained automatically

    MONOCULAR POSE ESTIMATION AND SHAPE RECONSTRUCTION OF QUASI-ARTICULATED OBJECTS WITH CONSUMER DEPTH CAMERA

    Get PDF
    Quasi-articulated objects, such as human beings, are among the most commonly seen objects in our daily lives. Extensive research have been dedicated to 3D shape reconstruction and motion analysis for this type of objects for decades. A major motivation is their wide applications, such as in entertainment, surveillance and health care. Most of existing studies relied on one or more regular video cameras. In recent years, commodity depth sensors have become more and more widely available. The geometric measurements delivered by the depth sensors provide significantly valuable information for these tasks. In this dissertation, we propose three algorithms for monocular pose estimation and shape reconstruction of quasi-articulated objects using a single commodity depth sensor. These three algorithms achieve shape reconstruction with increasing levels of granularity and personalization. We then further develop a method for highly detailed shape reconstruction based on our pose estimation techniques. Our first algorithm takes advantage of a motion database acquired with an active marker-based motion capture system. This method combines pose detection through nearest neighbor search with pose refinement via non-rigid point cloud registration. It is capable of accommodating different body sizes and achieves more than twice higher accuracy compared to a previous state of the art on a publicly available dataset. The above algorithm performs frame by frame estimation and therefore is less prone to tracking failure. Nonetheless, it does not guarantee temporal consistent of the both the skeletal structure and the shape and could be problematic for some applications. To address this problem, we develop a real-time model-based approach for quasi-articulated pose and 3D shape estimation based on Iterative Closest Point (ICP) principal with several novel constraints that are critical for monocular scenario. In this algorithm, we further propose a novel method for automatic body size estimation that enables its capability to accommodate different subjects. Due to the local search nature, the ICP-based method could be trapped to local minima in the case of some complex and fast motions. To address this issue, we explore the potential of using statistical model for soft point correspondences association. Towards this end, we propose a unified framework based on Gaussian Mixture Model for joint pose and shape estimation of quasi-articulated objects. This method achieves state-of-the-art performance on various publicly available datasets. Based on our pose estimation techniques, we then develop a novel framework that achieves highly detailed shape reconstruction by only requiring the user to move naturally in front of a single depth sensor. Our experiments demonstrate reconstructed shapes with rich geometric details for various subjects with different apparels. Last but not the least, we explore the applicability of our method on two real-world applications. First of all, we combine our ICP-base method with cloth simulation techniques for Virtual Try-on. Our system delivers the first promising 3D-based virtual clothing system. Secondly, we explore the possibility to extend our pose estimation algorithms to assist physical therapist to identify their patients’ movement dysfunctions that are related to injuries. Our preliminary experiments have demonstrated promising results by comparison with the gold standard active marker-based commercial system. Throughout the dissertation, we develop various state-of-the-art algorithms for pose estimation and shape reconstruction of quasi-articulated objects by leveraging the geometric information from depth sensors. We also demonstrate their great potentials for different real-world applications

    Inferring Human Pose and Motion from Images

    No full text
    As optical gesture recognition technology advances, touchless human computer interfaces of the future will soon become a reality. One particular technology, markerless motion capture, has gained a large amount of attention, with widespread application in diverse disciplines, including medical science, sports analysis, advanced user interfaces, and virtual arts. However, the complexity of human anatomy makes markerless motion capture a non-trivial problem: I) parameterised pose configuration exhibits high dimensionality, and II) there is considerable ambiguity in surjective inverse mapping from observation to pose configuration spaces with a limited number of camera views. These factors together lead to multimodality in high dimensional space, making markerless motion capture an ill-posed problem. This study challenges these difficulties by introducing a new framework. It begins with automatically modelling specific subject template models and calibrating posture at the initial stage. Subsequent tracking is accomplished by embedding naturally-inspired global optimisation into the sequential Bayesian filtering framework. Tracking is enhanced by several robust evaluation improvements. Sparsity of images is managed by compressive evaluation, further accelerating computational efficiency in high dimensional space

    Human Motion Analysis for Efficient Action Recognition

    Get PDF
    Automatic understanding of human actions is at the core of several application domains, such as content-based indexing, human-computer interaction, surveillance, and sports video analysis. The recent advances in digital platforms and the exponential growth of video and image data have brought an urgent quest for intelligent frameworks to automatically analyze human motion and predict their corresponding action based on visual data and sensor signals. This thesis presents a collection of methods that targets human action recognition using different action modalities. The first method uses the appearance modality and classifies human actions based on heterogeneous global- and local-based features of scene and humanbody appearances. The second method harnesses 2D and 3D articulated human poses and analyizes the body motion using a discriminative combination of the parts’ velocities, locations, and correlations histograms for action recognition. The third method presents an optimal scheme for combining the probabilistic predictions from different action modalities by solving a constrained quadratic optimization problem. In addition to the action classification task, we present a study that compares the utility of different pose variants in motion analysis for human action recognition. In particular, we compare the recognition performance when 2D and 3D poses are used. Finally, we demonstrate the efficiency of our pose-based method for action recognition in spotting and segmenting motion gestures in real time from a continuous stream of an input video for the recognition of the Italian sign gesture language
    • …
    corecore