5 research outputs found

    Pictorial Human Spaces : A Computational Study on the Human Perception of 3D Articulated Poses

    No full text
    Human motion analysis in images and video, with its deeply inter-related 2D and 3D inference components, is a central computer vision problem. Yet there are no studies that reveal how humans perceive other people in images or how accurate they are. In this paper we aim to unveil some of the processing, as well as the levels of accuracy, involved in the 3D perception of people from images by assessing human performance. Moreover, we reveal the quantitative and qualitative differences between human and computer performance when presented with the same visual stimuli, and show that metrics incorporating human perception can produce more meaningful results when integrated into automatic pose prediction algorithms. Our contributions are: (1) the construction of an experimental apparatus that relates perception and measurement, in particular the visual and kinematic performance with respect to 3D ground truth when the human subject is presented with an image of a person in a given pose; (2) the creation of a dataset containing images, articulated 2D and 3D pose ground truth, and synchronized eye movement recordings of human subjects shown a variety of human body configurations, both easy and difficult, together with their ‘re-enacted’ 3D poses; (3) quantitative analysis revealing the human performance in 3D pose re-enactment tasks, the degree of stability in the visual fixation patterns of human subjects, and the way these correlate with different poses; (4) extensive analysis of the differences between human re-enactments and poses produced by an automatic system when presented with the same visual stimuli; (5) an approach to learning perceptual metrics that, when integrated into visual sensing systems, produces more stable and meaningful results.
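    The learned perceptual metrics in contribution (5) are derived from the paper's eye-movement data; as a minimal illustration of the underlying idea only, the sketch below computes a per-joint weighted 3D pose error in which the uniform weights could be replaced by perceptually derived joint importances. All names and shapes are assumptions for illustration, not the paper's implementation.

    ```python
    import numpy as np

    def weighted_pose_error(pred, gt, weights=None):
        """Weighted mean per-joint position error between two 3D poses.

        pred, gt : (J, 3) arrays of 3D joint positions (e.g., in mm).
        weights  : optional (J,) per-joint importances; a learned perceptual
                   metric would supply these (uniform weights used here).
        """
        pred = np.asarray(pred, dtype=float)
        gt = np.asarray(gt, dtype=float)
        if weights is None:
            weights = np.ones(len(gt))  # unweighted case: plain per-joint error
        per_joint = np.linalg.norm(pred - gt, axis=1)  # Euclidean error per joint
        return float(np.average(per_joint, weights=weights))
    ```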

    Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments.

    No full text
    We introduce a new dataset, Human3.6M, of 3.6 million 3D human poses, acquired by recording the performance of 11 subjects under 4 different viewpoints, for training realistic human sensing systems and for evaluating the next generation of human pose estimation models. Besides increasing the size of current state-of-the-art datasets by several orders of magnitude, we aim to complement such datasets with a diverse set of poses encountered in typical human activities (taking photos, posing, greeting, eating, etc.), with synchronized image, motion capture and depth data, and with accurate 3D body scans of all subjects involved. We also provide mixed reality videos where 3D human models are animated using motion capture data and inserted with correct 3D geometry into complex real environments, viewed with moving cameras, and under occlusion. Finally, we provide large scale statistical models and detailed evaluation baselines for the dataset, illustrating its diversity and the scope for improvement by future work in the research community. The dataset and code for the associated large-scale learning models, features, visualization tools, as well as the evaluation server, are available online at http://vision.imar.ro/human3.6m
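    The abstract does not name a specific error measure, but mean per-joint position error (MPJPE) is the metric most commonly reported on Human3.6M; the sketch below computes it over a batch of predictions, with optional root-relative centring, a frequent protocol choice. Array shapes, the root-joint index, and function names are assumptions, not the official evaluation-server code.

    ```python
    import numpy as np

    ROOT = 0  # assumed index of the pelvis/root joint; depends on joint ordering

    def mpjpe(pred, gt, root_relative=True):
        """Mean per-joint position error over a batch, typically in millimetres.

        pred, gt : (N, J, 3) arrays of predicted / ground-truth 3D poses.
        """
        pred = np.asarray(pred, dtype=float)
        gt = np.asarray(gt, dtype=float)
        if root_relative:
            # Centre both skeletons on the root joint before comparing.
            pred = pred - pred[:, ROOT:ROOT + 1, :]
            gt = gt - gt[:, ROOT:ROOT + 1, :]
        # Euclidean distance per joint, averaged over joints and frames.
        return float(np.linalg.norm(pred - gt, axis=-1).mean())
    ```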

    Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments (IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014)

    No full text
    We introduce a new dataset, Human3.6M, of 3.6 million accurate 3D human poses, acquired by recording the performance of 5 female and 6 male subjects under 4 different viewpoints, for training realistic human sensing systems and for evaluating the next generation of human pose estimation models and algorithms. Besides increasing the size of the datasets in the current state of the art by several orders of magnitude, we also aim to complement such datasets with a diverse set of motions and poses encountered as part of typical human activities (taking photos, talking on the phone, posing, greeting, eating, etc.), with additional synchronized image, human motion capture and time-of-flight (depth) data, and with accurate 3D body scans of all the subject actors involved. We also provide controlled mixed reality evaluation scenarios where 3D human models are animated using motion capture and inserted with correct 3D geometry into complex real environments, viewed with moving cameras, and under occlusion. Finally, we provide a set of large scale statistical models and detailed evaluation baselines for the dataset, illustrating its diversity and the scope for improvement by future work in the research community. Our experiments show that our best large scale model can leverage our full training set to obtain a 20% improvement in performance compared to a training set of the scale of the largest existing public dataset for this problem. Yet the potential for improvement by leveraging higher-capacity, more complex models with our large dataset is substantially greater and should stimulate future research. The dataset, together with code for the associated large-scale learning models, features, visualization tools, as well as the evaluation server, is available online at http://vision.imar.ro/human3.6m