Markerless Motion Capture in the Crowd
This work uses crowdsourcing to obtain motion capture data from video
recordings. The data is obtained by information workers who click repeatedly to
indicate body configurations in the frames of a video, resulting in a model of
2D structure over time. We discuss techniques to optimize the tracking task and
strategies for maximizing accuracy and efficiency. We show visualizations of a
variety of motions captured with our pipeline, then apply reconstruction
techniques to derive 3D structure. Comment: Presented at Collective Intelligence conference, 2012
(arXiv:1204.2991)
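The abstract does not say how repeated clicks are turned into a 2D track. As a minimal sketch, assuming several workers label the same joint in the same frame, their clicks can be fused per coordinate with a median. The data layout and the `aggregate_clicks` name are hypothetical and are not the authors' pipeline.

```python
from statistics import median

# Hypothetical data layout: clicks[frame][joint] holds one (x, y) pixel
# position per worker who labeled that joint in that frame.
Point = tuple[float, float]


def aggregate_clicks(
    clicks: dict[int, dict[str, list[Point]]],
) -> dict[int, dict[str, Point]]:
    """Fuse redundant worker clicks into one 2D point per joint per frame.

    The per-coordinate median is an assumed robust choice: one careless
    click cannot pull the estimate arbitrarily far away.
    """
    track: dict[int, dict[str, Point]] = {}
    for frame in sorted(clicks):
        track[frame] = {
            joint: (median(x for x, _ in pts), median(y for _, y in pts))
            for joint, pts in clicks[frame].items()
            if pts  # skip joints that no worker labeled in this frame
        }
    return track


# Three workers click the left wrist in frame 0; the third click is an outlier.
clicks = {0: {"left_wrist": [(100, 200), (102, 198), (300, 50)]}}
print(aggregate_clicks(clicks))  # {0: {'left_wrist': (102, 198)}}
```

The median rather than a mean is the assumed design choice here: with three or more workers, a single stray click such as `(300, 50)` leaves the fused point untouched.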
Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
This paper proposes a new hybrid architecture that consists of a deep
Convolutional Network and a Markov Random Field. We show how this architecture
is successfully applied to the challenging problem of articulated human pose
estimation in monocular images. The architecture can exploit structural domain
constraints such as geometric relationships between body joint locations. We
show that joint training of these two model paradigms improves performance and
allows us to significantly outperform existing state-of-the-art techniques.
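How the Markov Random Field is expressed inside the network is not spelled out in the abstract. Purely as an illustration of the idea, assume the ConvNet outputs one heatmap per joint and that a pairwise prior over parent-to-child offsets is given; one sum-product step of a tree-structured MRF then convolves the parent's heatmap with that prior and multiplies the result into the child's heatmap. The functions, the offset-dictionary prior, and the toy numbers are hypothetical, and joint training (learning the prior together with the detectors) is not shown.

```python
# Heatmaps are H x W nested lists of non-negative detector scores.
Grid = list[list[float]]
Offset = tuple[int, int]  # (dy, dx) from parent joint to child joint


def message(parent: Grid, prior: dict[Offset, float]) -> Grid:
    """Sum-product message: out[p] = sum over q of parent[q] * prior[p - q].

    This is a 2D convolution of the parent heatmap with the offset prior,
    written as explicit loops for clarity rather than speed.
    """
    h, w = len(parent), len(parent[0])
    out = [[0.0] * w for _ in range(h)]
    for qy in range(h):
        for qx in range(w):
            score = parent[qy][qx]
            if score == 0.0:
                continue
            for (dy, dx), weight in prior.items():
                py, px = qy + dy, qx + dx
                if 0 <= py < h and 0 <= px < w:
                    out[py][px] += score * weight
    return out


def refine(child: Grid, parent: Grid, prior: dict[Offset, float]) -> Grid:
    """Reweight the child heatmap by the parent's message (pointwise product)."""
    msg = message(parent, prior)
    return [[c * m for c, m in zip(crow, mrow)] for crow, mrow in zip(child, msg)]


def argmax2d(grid: Grid) -> tuple[int, int]:
    """Return the (row, col) of the highest score."""
    cells = ((y, x) for y in range(len(grid)) for x in range(len(grid[0])))
    return max(cells, key=lambda p: grid[p[0]][p[1]])


# Toy example: the wrist detector fires at two places and the wrong one is
# slightly stronger; the elbow is confidently at (2, 1).
wrist = [[0.0] * 5 for _ in range(5)]
wrist[1][1], wrist[3][3] = 0.90, 0.95
elbow = [[0.0] * 5 for _ in range(5)]
elbow[2][1] = 1.0
# Assumed prior: the wrist usually sits one row above the elbow.
prior = {(-1, 0): 0.8, (-1, -1): 0.1, (-1, 1): 0.1}

print(argmax2d(wrist))                        # (3, 3): detector alone
print(argmax2d(refine(wrist, elbow, prior)))  # (1, 1): with the pairwise term
```

In the toy case the geometric prior overrides a slightly stronger but implausible wrist detection, which is the kind of structural domain constraint the abstract refers to.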
MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation
In this work, we propose a novel and efficient method for articulated human
pose estimation in videos using a convolutional network architecture, which
incorporates both color and motion features. We propose a new human body pose
dataset, FLIC-motion, that extends the FLIC dataset with additional motion
features. We apply our architecture to this dataset and report significantly
better performance than current state-of-the-art pose detection systems.
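The abstract states that FLIC-motion extends FLIC with motion features but not which ones. Assuming the simplest choice, an absolute grayscale difference between the current frame and a nearby frame, a color-plus-motion network input could be assembled as below. The channel layout and function names are hypothetical, and frame differencing is a stand-in rather than the paper's stated feature.

```python
# A frame is a list of channels; each channel is an H x W nested list.
Channel = list[list[float]]
Frame = list[Channel]


def to_gray(rgb: Frame) -> Channel:
    """Luma from R, G, B using the ITU-R BT.601 weights."""
    r, g, b = rgb
    return [
        [0.299 * rv + 0.587 * gv + 0.114 * bv for rv, gv, bv in zip(rr, gr, br)]
        for rr, gr, br in zip(r, g, b)
    ]


def color_and_motion(current: Frame, other: Frame) -> Frame:
    """Stack R, G, B of `current` with one motion channel.

    The motion channel is the absolute grayscale difference between
    `current` and a nearby frame `other` (an assumed, simple motion feature).
    """
    g_cur, g_oth = to_gray(current), to_gray(other)
    motion = [[abs(a - b) for a, b in zip(ra, rb)] for ra, rb in zip(g_cur, g_oth)]
    return [*current, motion]


current = [[[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 0.0]]]  # 1x2 image, left pixel white
previous = [[[0.0, 0.0]] for _ in range(3)]            # 1x2 image, all black
stacked = color_and_motion(current, previous)
print(len(stacked))  # 4 channels: R, G, B, motion
```

Keeping motion as an extra input channel means the same convolutional architecture can consume color-only or color-plus-motion input by changing only the first layer's channel count.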
Learning Human Pose Estimation Features with Convolutional Networks
This paper introduces a new architecture for human pose estimation using a
multi-layer convolutional network architecture and a modified learning
technique that learns low-level features and higher-level weak spatial models.
Unconstrained human pose estimation is one of the hardest problems in computer
vision, and our new architecture and learning scheme show significant
improvement over the current state-of-the-art results. The main contribution of
this paper is showing, for the first time, that a specific variation of deep
learning is able to outperform all existing traditional architectures on this
task. The paper also discusses several lessons learned while researching
alternatives, most notably that it is possible to learn strong low-level
feature detectors on features that might cover only a few pixels in the
image. Higher-level spatial models improve the overall result somewhat, but to
a much lesser extent than expected. Many researchers previously argued that
kinematic structure and top-down information are crucial for this domain, but
with our purely bottom-up, weak spatial model we could outperform other, more
complicated architectures that currently produce the best results. This mirrors
what researchers in other domains, such as speech recognition and object
recognition, have experienced.
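The paper's detectors are learned multi-layer convolutional features, which are not reproduced here. To illustrate only what "purely bottom-up" means, the sketch assumes one hand-set kernel per joint, scores every window, and places each joint at its own peak with no kinematic model tying the joints together. `detect_joints` and the toy kernel are hypothetical.

```python
Image = list[list[float]]


def response_map(image: Image, kernel: Image) -> Image:
    """Valid-mode cross-correlation: score every kernel-sized window."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [
            sum(kernel[i][j] * image[y + i][x + j] for i in range(kh) for j in range(kw))
            for x in range(w - kw + 1)
        ]
        for y in range(h - kh + 1)
    ]


def detect_joints(image: Image, detectors: dict[str, Image]) -> dict[str, tuple[int, int]]:
    """Place each joint at its own detector's peak; no spatial model links joints."""
    joints = {}
    for name, kernel in detectors.items():
        resp = response_map(image, kernel)
        cells = ((y, x) for y in range(len(resp)) for x in range(len(resp[0])))
        y, x = max(cells, key=lambda p: resp[p[0]][p[1]])
        # Convert the window's top-left corner to the window centre.
        joints[name] = (y + len(kernel) // 2, x + len(kernel[0]) // 2)
    return joints


image = [[0.0] * 5 for _ in range(5)]
image[3][2] = 1.0  # a single bright "feature" pixel
spot = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(detect_joints(image, {"head": spot}))  # {'head': (3, 2)}
```

A weak spatial model would be layered on top of these independent peaks; the abstract's finding is that such a layer adds less than expected once the low-level detectors are strong.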
Probabilistic Recognition of Human Actions
… Current speech recognition systems are prime examples where multiple levels of abstraction are integrated successfully. In the lip-reading project, we used a "hybrid system" [5] that combines the various levels in a probabilistic way. The lowest level consisted of features obtained from Eigen-Images. The mid-level consisted of categories similar to phonemes: we developed a set of smallest visual units called "visemes" and composed higher-level word models from this coding. Because of the small database size, this decomposition boosted the generalization performance of our system. We believe that an analogous coding of primitive and complex actions will have the same advantages.
5.4 Bottom-Up and Top-Down in a Probabilistic Framework
In addition to the earlier studies cited above, these techniques were also important in the lip-reading project. To find the lips, we investigated an iterative technique that incorporates the position of other facial parts in a probabilistic way [10].
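The excerpt does not detail the hybrid system [5]. As a simplified sketch of the viseme decomposition described above, assume the mid-level already yields one viseme distribution per aligned segment and that segments are scored independently; each word model is then just a sequence of shared viseme labels. The lexicon, probabilities, and function names are hypothetical.

```python
import math

# Hypothetical lexicon: each word is a sequence of shared viseme units
# (e.g. "p" for the lip shape shared by /p/, /b/ and /m/).
LEXICON = {
    "bee": ["p", "i"],
    "fee": ["f", "i"],
    "foo": ["f", "u"],
}


def word_log_likelihood(visemes: list[str], segment_probs: list[dict[str, float]]) -> float:
    """Sum of log P(viseme | segment), assuming segments are already aligned
    one-to-one with the word's visemes and scored independently."""
    if len(visemes) != len(segment_probs):
        return -math.inf
    total = 0.0
    for v, probs in zip(visemes, segment_probs):
        p = probs.get(v, 0.0)
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total


def recognize(segment_probs: list[dict[str, float]]) -> str:
    """Pick the word whose viseme sequence best explains the observations."""
    return max(LEXICON, key=lambda w: word_log_likelihood(LEXICON[w], segment_probs))


# Made-up mid-level output for two lip-image segments.
observed = [{"p": 0.2, "f": 0.8}, {"i": 0.7, "u": 0.3}]
print(recognize(observed))  # fee
```

Because every word reuses the same few viseme units, each unit can be estimated from examples of many words, which is one plausible reading of why the decomposition helped with a small database.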
Tracking People with Twists and Exponential Maps
This paper demonstrates a new visual motion estimation technique that is able to recover high degree-of-freedom articulated human body configurations in complex video sequences. We introduce the use of a novel mathematical technique, the product of exponential maps and twist motions, and its integration into a differential motion estimation. This results in solving simple linear systems, and enables us to robustly recover the kinematic degrees of freedom in the presence of noise and in complex self-occluded configurations. We demonstrate this on several image sequences of people performing articulated full-body movements, and visualize the results by re-animating an artificial 3D human model. We are also able to recover and re-animate the famous movements of Eadweard Muybridge's motion studies from the last century. To the best of our knowledge, this is the first computer-vision-based system that is able to process such challenging footage and recover complex motions with such high accuracy.
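The abstract names the product of exponential maps and twist motions without giving formulas. The sketch below uses the standard screw-theory closed form for a revolute joint (unit axis w, joint angle theta, rotation by Rodrigues' formula) and chains joints by multiplying their exponentials. It covers only this forward map, not the paper's differential motion estimation or its linear systems, and the function names and two-link example are hypothetical.

```python
import math

Mat = list[list[float]]
Vec = list[float]


def hat(w: Vec) -> Mat:
    """Skew-symmetric matrix such that hat(w) applied to x equals cross(w, x)."""
    wx, wy, wz = w
    return [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def cross(a: Vec, b: Vec) -> Vec:
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def twist_exp(v: Vec, w: Vec, theta: float) -> Mat:
    """4x4 rigid motion exp(xi_hat * theta) for a twist xi = (v, w) with |w| = 1."""
    W = hat(w)
    W2 = matmul(W, W)
    s, c = math.sin(theta), math.cos(theta)
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    # Rodrigues' formula: R = I + sin(theta) W + (1 - cos(theta)) W^2
    R = [[eye[i][j] + s * W[i][j] + (1 - c) * W2[i][j] for j in range(3)] for i in range(3)]
    # Translation: (I - R)(w x v) + w (w . v) theta
    wxv = cross(w, v)
    w_dot_v = sum(wi * vi for wi, vi in zip(w, v))
    p = [sum((eye[i][j] - R[i][j]) * wxv[j] for j in range(3)) + w[i] * w_dot_v * theta
         for i in range(3)]
    return [R[0] + [p[0]], R[1] + [p[1]], R[2] + [p[2]], [0.0, 0.0, 0.0, 1.0]]


def revolute_twist(axis: Vec, point: Vec) -> tuple[Vec, Vec]:
    """Twist (v, w) of a pure rotation about `axis` through `point`: v = -(w x q)."""
    return [-x for x in cross(axis, point)], list(axis)


def forward_kinematics(twists, angles, g0: Mat) -> Mat:
    """Product of exponentials: g = exp(xi_1 th_1) ... exp(xi_n th_n) g0."""
    g = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for (v, w), theta in zip(twists, angles):
        g = matmul(g, twist_exp(v, w, theta))
    return matmul(g, g0)


# Hypothetical two-link planar "arm": shoulder at the origin, elbow at (1, 0, 0),
# both rotating about z; the end point starts at (2, 0, 0).
z = [0.0, 0.0, 1.0]
chain = [revolute_twist(z, [0.0, 0.0, 0.0]), revolute_twist(z, [1.0, 0.0, 0.0])]
g0 = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
g = forward_kinematics(chain, [0.0, math.pi / 2], g0)
print([round(g[r][3], 6) for r in range(3)])  # elbow bent 90 degrees: [1.0, 1.0, 0.0]
```

Each joint contributes one exponential parameterized by a single angle, so a body pose is a short list of angles; this compact parameterization is what makes a linearized, differential update over those angles natural.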