Estimating Human Pose with Flowing Puppets
We address the problem of upper-body human pose estimation in uncontrolled monocular video sequences, without manual initialization. Most current methods focus on isolated video frames and often fail to correctly localize arms and hands. Inferring pose over a video sequence is advantageous because poses of people in adjacent frames exhibit properties of smooth variation due to the nature of human and camera motion. To exploit this, previous methods have used prior knowledge about distinctive actions or generic temporal priors combined with static image likelihoods to track people in motion. Here we take a different approach based on a simple observation: Information about how a person moves from frame to frame is present in the optical flow field. We develop an approach for tracking articulated motions that "links" articulated shape models of people in adjacent frames through the dense optical flow. Key to this approach is a 2D shape model of the body that we use to compute how the body moves over time. The resulting "flowing puppets" provide a way of integrating image evidence across frames to improve pose inference. We apply our method on a challenging dataset of TV video sequences and show state-of-the-art performance.
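The core idea of "linking" body models across frames via dense flow can be sketched very simply: sample the flow field at each 2D body point and displace the point by that vector, giving a prediction of the puppet in the next frame. The following is a minimal illustrative sketch, not the paper's implementation; `propagate_points` and its bilinear sampling are assumptions made for the example.

```python
import numpy as np

def propagate_points(points, flow):
    """Move 2D points (x, y) into the next frame by sampling the dense
    flow field (H x W x 2) with bilinear interpolation."""
    h, w = flow.shape[:2]
    moved = np.empty_like(points, dtype=float)
    for i, (x, y) in enumerate(points):
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
        ax, ay = x - x0, y - y0
        # bilinear blend of the four neighbouring flow vectors
        f = ((1 - ax) * (1 - ay) * flow[y0, x0]
             + ax * (1 - ay) * flow[y0, x1]
             + (1 - ax) * ay * flow[y1, x0]
             + ax * ay * flow[y1, x1])
        moved[i] = (x + f[0], y + f[1])
    return moved

# constant flow of (+2, 0): every point shifts 2 px to the right
flow = np.zeros((4, 4, 2))
flow[..., 0] = 2.0
print(propagate_points(np.array([[1.0, 1.0]]), flow))  # [[3. 1.]]
```

In the paper this flow-propagated puppet hypothesis is then refined against image evidence in the target frame; the sketch only shows the propagation step.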
Learning Human Pose Estimation Features with Convolutional Networks
This paper introduces a new architecture for human pose estimation using a multi-layer convolutional network and a modified learning technique that learns low-level features and higher-level weak spatial models. Unconstrained human pose estimation is one of the hardest problems in computer vision, and our new architecture and learning scheme show significant improvement over the current state-of-the-art results. The main contribution of this paper is showing, for the first time, that a specific variation of deep learning is able to outperform all existing traditional architectures on this task. The paper also discusses several lessons learned while researching alternatives, most notably that it is possible to learn strong low-level feature detectors on features that may cover only a few pixels in the image. Higher-level spatial models improve the overall result somewhat, but to a much lesser extent than expected. Many researchers previously argued that kinematic structure and top-down information are crucial for this domain, but with our purely bottom-up approach and weak spatial model we could outperform more complicated architectures that currently produce the best results. This mirrors what many researchers in speech recognition, object recognition, and other domains have experienced.
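The "low-level feature detector covering a few pixels" amounts to a small convolution kernel slid over the image. A minimal sketch of that core operation, with a hand-picked 3x3 edge kernel standing in for a learned one (the function name and kernel are assumptions for illustration, not the paper's network):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation: the core op of the
    low-level feature-detection stage of a convolutional network."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# a 3x3 vertical-edge kernel: a feature detector covering only a few pixels
edge = np.array([[-1.0, 0.0, 1.0]] * 3)
image = np.zeros((5, 6))
image[:, 3:] = 1.0  # step edge at column 3
response = conv2d_valid(image, edge)
print(response.max())  # 3.0 — strongest response straddling the edge
```

In a trained network such kernels are learned from data rather than designed, and many are stacked with nonlinearities; the sketch only shows why a few-pixel receptive field already yields a discriminative response.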
Learning Human Optical Flow
The optical flow of humans is well known to be useful for the analysis of
human action. Given this, we devise an optical flow algorithm specifically for
human motion and show that it is superior to generic flow methods. Designing a
method by hand is impractical, so we develop a new training database of image
sequences with ground truth optical flow. For this we use a 3D model of the
human body and motion capture data to synthesize realistic flow fields. We then
train a convolutional neural network to estimate human flow fields from pairs
of images. Since many applications in human motion analysis depend on speed,
and we anticipate mobile applications, we base our method on SPyNet with
several modifications. We demonstrate that our trained network is more accurate
than a wide range of top methods on held-out test data and that it generalizes
well to real image sequences. When combined with a person detector/tracker, the
approach provides a full solution to the problem of 2D human flow estimation.
Both the code and the dataset are available for research.
Comment: British Machine Vision Conference 2018 (Oral)
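Claims that one flow network is "more accurate" than another are conventionally measured with the average endpoint error (AEPE): the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal sketch of that metric (the function name is an assumption for the example; the source does not show its evaluation code):

```python
import numpy as np

def endpoint_error(flow_pred, flow_gt):
    """Average endpoint error: mean Euclidean distance between
    predicted and ground-truth flow vectors (H x W x 2 arrays)."""
    return float(np.mean(np.linalg.norm(flow_pred - flow_gt, axis=-1)))

gt = np.zeros((2, 2, 2))
gt[..., 0] = 1.0    # ground truth: everything moves 1 px right
pred = np.zeros((2, 2, 2))
pred[..., 1] = 1.0  # prediction: everything moves 1 px down instead
print(endpoint_error(pred, gt))  # sqrt(2) ≈ 1.414
```

Lower is better; a perfect prediction gives an AEPE of zero.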