50 research outputs found

    Markerless Motion Capture in the Crowd

    Full text link
    This work uses crowdsourcing to obtain motion capture data from video recordings. The data is obtained by information workers who click repeatedly to indicate body configurations in the frames of a video, resulting in a model of 2D structure over time. We discuss techniques to optimize the tracking task and strategies for maximizing accuracy and efficiency. We show visualizations of a variety of motions captured with our pipeline then apply reconstruction techniques to derive 3D structure.Comment: Presented at Collective Intelligence conference, 2012 (arXiv:1204.2991

    Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation

    Full text link
    This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques

    MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation

    Full text link
    In this work, we propose a novel and efficient method for articulated human pose estimation in videos using a convolutional network architecture, which incorporates both color and motion features. We propose a new human body pose dataset, FLIC-motion, that extends the FLIC dataset with additional motion features. We apply our architecture to this dataset and report significantly better performance than current state-of-the-art pose detection systems

    Learning Human Pose Estimation Features with Convolutional Networks

    Full text link
    This paper introduces a new architecture for human pose estimation using a multi- layer convolutional network architecture and a modified learning technique that learns low-level features and higher-level weak spatial models. Unconstrained human pose estimation is one of the hardest problems in computer vision, and our new architecture and learning schema shows significant improvement over the current state-of-the-art results. The main contribution of this paper is showing, for the first time, that a specific variation of deep learning is able to outperform all existing traditional architectures on this task. The paper also discusses several lessons learned while researching alternatives, most notably, that it is possible to learn strong low-level feature detectors on features that might even just cover a few pixels in the image. Higher-level spatial models improve somewhat the overall result, but to a much lesser extent then expected. Many researchers previously argued that the kinematic structure and top-down information is crucial for this domain, but with our purely bottom up, and weak spatial model, we could improve other more complicated architectures that currently produce the best results. This mirrors what many other researchers, like those in the speech recognition, object recognition, and other domains have experienced

    Probabilistic Recognition of Human Actions

    No full text
    ion Current speech recognition systems are prime examples where multiple levels of abstraction are integrated successfully. In the lip-reading project, we used a "hybrid system" [5] that combines the various levels in a probabilistic way. The lowest level were features obtained from Eigen-Images. The mid-level where categories similar to phonemes. We developed a set of smallest visual units called "visemes" and composed higher-level word models using such a coding. Because of the small database size this decomposition boosted the generalization performance of our system. We believe that the analog coding of primitive actions and complex actions will have the same advantages. 5.4 Bottom-Up and Top-Down in a Probabilistic Framework In addition to the earlier studies cited above, these techniques were also important in the lip-reading project. In order to find the lips we investigated an iterative technique that incorporates the position of other facial parts in a probabilistic way [10]..

    Tracking People with Twists and Exponential Maps

    No full text
    This paper demonstrates a new visual motion estimation technique that is able to recover high degree-of-freedom articulated human body configurations in complex video sequences. We introduce the use of a novel mathematical technique, the product of exponential maps and twist motions, and its integration into a differential motion estimation. This results in solving simple linear systems, and enables us to recover robustly the kinematic degrees-offreedom in noise and complex self occluded configurations. We demonstrate this on several image sequences of people doing articulated full body movements, and visualize the results in re-animating an artificial 3D human model. We are also able to recover and re-animate the famous movements of Eadweard Muybridge's motion studies from the last century. To the best of our knowledge, this is the first computer vision based system that is able to process such challenging footage and recover complex motions with such high accuracy
    corecore