Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
This paper proposes a new hybrid architecture that consists of a deep
Convolutional Network and a Markov Random Field. We show how this architecture
is successfully applied to the challenging problem of articulated human pose
estimation in monocular images. The architecture can exploit structural domain
constraints such as geometric relationships between body joint locations. We
show that joint training of these two model paradigms improves performance and
allows us to significantly outperform existing state-of-the-art techniques.
Flowing ConvNets for Human Pose Estimation in Videos
The objective of this work is human pose estimation in videos, where multiple
frames are available. We investigate a ConvNet architecture that is able to
benefit from temporal context by combining information across the multiple
frames using optical flow.
To this end we propose a network architecture with the following novelties:
(i) a deeper network than previously investigated for regressing heatmaps; (ii)
spatial fusion layers that learn an implicit spatial model; (iii) optical flow
is used to align heatmap predictions from neighbouring frames; and (iv) a final
parametric pooling layer which learns to combine the aligned heatmaps into a
pooled confidence map.
We show that this architecture outperforms a number of others, including one
that uses optical flow solely at the input layers, one that regresses joint
coordinates directly, and one that predicts heatmaps without spatial fusion.
The new architecture outperforms the state of the art by a large margin on
three video pose estimation datasets, including the very challenging Poses in
the Wild dataset, and outperforms other deep methods that don't use a graphical
model on the single-image FLIC benchmark (and also Chen & Yuille and Tompson et
al. in the high precision region).
Comment: ICCV'1
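Steps (iii) and (iv) of the abstract above can be illustrated with a minimal sketch. It assumes integer-valued flow, nearest-pixel backward warping, and fixed per-frame weights in place of the learned parametric pooling layer; the function names and toy data are hypothetical and are not taken from the paper.

```python
def warp_heatmap(heatmap, flow):
    """Backward-warp: out[y][x] = heatmap[y + dy][x + dx], nearest pixel.

    flow[y][x] is an assumed (dx, dy) displacement from the current frame
    to the neighbouring frame; samples that fall outside the map become 0.
    """
    h, w = len(heatmap), len(heatmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = x + int(round(dx)), y + int(round(dy))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = heatmap[sy][sx]
    return out


def pool_heatmaps(aligned, weights):
    """Weighted sum over frames (fixed weights stand in for learned ones)."""
    h, w = len(aligned[0]), len(aligned[0][0])
    return [
        [sum(wt * hm[y][x] for hm, wt in zip(aligned, weights)) for x in range(w)]
        for y in range(h)
    ]


def argmax_2d(heatmap):
    """Return (x, y) of the highest-confidence pixel."""
    best_x, best_y = 0, 0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > heatmap[best_y][best_x]:
                best_x, best_y = x, y
    return best_x, best_y


# Toy data: the joint sits at (1, 1) in the current frame and at (2, 1)
# in the neighbouring frame, so the flow is one pixel to the right.
current = [[0.0, 0.0, 0.0], [0.0, 0.6, 0.0], [0.0, 0.0, 0.0]]
neighbour = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.9], [0.0, 0.0, 0.0]]
flow = [[(1, 0)] * 3 for _ in range(3)]

warped = warp_heatmap(neighbour, flow)
pooled = pool_heatmaps([current, warped], [0.5, 0.5])
joint_xy = argmax_2d(pooled)
```

After warping, the neighbouring frame's peak lines up with the current frame's peak, so pooling reinforces the same location rather than blurring two offset ones.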
Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation
We propose a new learning-based method for estimating 2D human pose from a
single image, using Dual-Source Deep Convolutional Neural Networks (DS-CNN).
Recently, many methods have been developed to estimate human pose by using pose
priors that are estimated from physiologically inspired graphical models or
learned from a holistic perspective. In this paper, we propose to integrate
both the local (body) part appearance and the holistic view of each local part
for more accurate human pose estimation. Specifically, the proposed DS-CNN
takes a set of image patches (category-independent object proposals for
training and multi-scale sliding windows for testing) as the input and then
learns the appearance of each local part by considering their holistic views in
the full body. Using DS-CNN, we achieve both joint detection, which determines
whether an image patch contains a body joint, and joint localization, which
finds the exact location of the joint in the image patch. Finally, we develop
an algorithm to combine these joint detection/localization results from all the
image patches for estimating the human pose. The experimental results show the
effectiveness of the proposed method by comparing to the state-of-the-art
human-pose estimation methods based on pose priors that are estimated from
physiologically inspired graphical models or learned from a holistic
perspective.
Comment: CVPR 201
Multi-Person Pose Estimation with Local Joint-to-Person Associations
Despite the recent success of neural networks for human pose estimation,
current approaches are limited to pose estimation of a single person and cannot
handle humans in groups or crowds. In this work, we propose a method that
estimates the poses of multiple persons in an image in which a person can be
occluded by another person or might be truncated. To this end, we consider
multi-person pose estimation as a joint-to-person association problem. We
construct a fully connected graph from a set of detected joint candidates in an
image and resolve the joint-to-person association and outlier detection using
integer linear programming. Since solving joint-to-person association jointly
for all persons in an image is an NP-hard problem and even approximations are
expensive, we solve the problem locally for each person. On the challenging
MPII Human Pose Dataset for multiple persons, our approach achieves the
accuracy of a state-of-the-art method, but it is 6,000 to 19,000 times faster.
Comment: Accepted to European Conference on Computer Vision (ECCV) Workshops,
Crowd Understanding, 201
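The idea of solving the association locally for one person can be sketched as follows. This is an illustration only: it swaps the integer linear program for exhaustive search over a tiny candidate set, and the cost terms (one minus the detection score, a fixed penalty for a missing joint, and relative deviation from an expected limb length) are assumptions rather than the paper's formulation.

```python
from itertools import product
from math import dist


def associate_joints(candidates, expected_length, miss_penalty=1.0):
    """Pick at most one candidate per joint type for a single person.

    candidates: {joint_type: [(x, y, score), ...]}
    expected_length: {(type_a, type_b): limb length in pixels}
    Returns ({joint_type: candidate index or None}, total cost).
    Candidates left unselected are treated as outliers (for example,
    joints that belong to another person).
    """
    types = sorted(candidates)
    options = [[None] + list(range(len(candidates[t]))) for t in types]
    best_cost, best = float("inf"), None
    for choice in product(*options):
        picked = dict(zip(types, choice))
        cost = 0.0
        # Unary terms: low-scoring or missing joints are expensive.
        for t, idx in picked.items():
            cost += miss_penalty if idx is None else 1.0 - candidates[t][idx][2]
        # Pairwise terms: penalise implausible limb lengths.
        for (a, b), length in expected_length.items():
            ia, ib = picked.get(a), picked.get(b)
            if ia is not None and ib is not None:
                pa, pb = candidates[a][ia][:2], candidates[b][ib][:2]
                cost += abs(dist(pa, pb) - length) / length
        if cost < best_cost:
            best_cost, best = cost, picked
    return best, best_cost


# Toy data: one head and two neck candidates; the second neck is far away
# and plausibly belongs to a different person.
candidates = {
    "head": [(10, 10, 0.9)],
    "neck": [(10, 20, 0.8), (40, 22, 0.85)],
}
assignment, cost = associate_joints(candidates, {("head", "neck"): 10})
```

Exhaustive search grows exponentially with the number of joint types, which is why a real system would hand the same objective to an ILP solver; restricting the problem to one person at a time keeps each instance small.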
Evaluation of Deep Learning based Pose Estimation for Sign Language Recognition
Human body pose estimation and hand detection are two important tasks for
systems that perform computer vision-based sign language recognition (SLR).
However, both tasks are challenging, especially when the input is color videos,
with no depth information. Many algorithms have been proposed in the literature
for these tasks, and some of the most successful recent algorithms are based on
deep learning. In this paper, we introduce a dataset for human pose estimation
for the SLR domain. We evaluate the performance of two deep learning based pose
estimation methods, by performing user-independent experiments on our dataset.
We also perform transfer learning, and we obtain results that demonstrate that
transfer learning can improve pose estimation accuracy. The dataset and results
from these methods can create a useful baseline for future work.
Self Adversarial Training for Human Pose Estimation
This paper presents a deep learning based approach to the problem of human
pose estimation. We employ generative adversarial networks as our learning
paradigm in which we set up two stacked hourglass networks with the same
architecture, one as the generator and the other as the discriminator. The
generator is used as a human pose estimator after the training is done. The
discriminator distinguishes ground-truth heatmaps from generated ones, and
back-propagates the adversarial loss to the generator. This process enables the
generator to learn plausible human body configurations and is shown to be
useful for improving the prediction accuracy.
Comment: CVPR 2017 Workshop on Visual Understanding of Humans in Crowd Scene
and the 1st Look Into Person (LIP) Challeng
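The generator and discriminator roles described above can be made concrete with a generic adversarial objective. This sketch assumes a textbook binary cross-entropy GAN loss combined with a mean-squared heatmap loss; the abstract does not state the loss used, so the formulas, the `adv_weight` parameter, and the function names are all illustrative assumptions.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for one probability in (0, 1)."""
    eps = 1e-12  # guards against log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


def mse(pred, truth):
    """Mean squared error between two flattened heatmaps."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)


def discriminator_loss(d_real, d_fake):
    """Push ground-truth heatmaps towards 1 and generated ones towards 0."""
    return bce(d_real, 1) + bce(d_fake, 0)


def generator_loss(pred, truth, d_fake, adv_weight=0.01):
    """Heatmap regression loss plus a term rewarding fooling the discriminator."""
    return mse(pred, truth) + adv_weight * bce(d_fake, 1)


# Toy values: an undecided discriminator outputs 0.5 for both inputs.
d_loss = discriminator_loss(0.5, 0.5)
g_loss = generator_loss([0.0, 1.0], [0.0, 1.0], d_fake=0.5, adv_weight=1.0)
```

The adversarial term is what gives the generator a signal about overall pose plausibility, beyond the per-pixel regression error.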