12 research outputs found
Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
This paper proposes a new hybrid architecture that consists of a deep
Convolutional Network and a Markov Random Field. We show how this architecture
is successfully applied to the challenging problem of articulated human pose
estimation in monocular images. The architecture can exploit structural domain
constraints such as geometric relationships between body joint locations. We
show that joint training of these two model paradigms improves performance and
allows us to significantly outperform existing state-of-the-art techniques.
MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation
In this work, we propose a novel and efficient method for articulated human
pose estimation in videos using a convolutional network architecture, which
incorporates both color and motion features. We propose a new human body pose
dataset, FLIC-motion, that extends the FLIC dataset with additional motion
features. We apply our architecture to this dataset and report significantly
better performance than current state-of-the-art pose detection systems.
Do Convnets Learn Correspondence?
Convolutional neural nets (convnets) trained from massive labeled datasets
have substantially improved the state-of-the-art in image classification and
object detection. However, visual understanding requires establishing
correspondence on a finer level than object category. Given their large pooling
regions and training from whole-image labels, it is not clear that convnets
derive their success from an accurate correspondence model which could be used
for precise localization. In this paper, we study the effectiveness of convnet
activation features for tasks requiring correspondence. We present evidence
that convnet features localize at a much finer scale than their receptive field
sizes, that they can be used to perform intraclass alignment as well as
conventional hand-engineered features, and that they outperform conventional
features in keypoint prediction on objects from PASCAL VOC 2011.
DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model
The goal of this paper is to advance the state-of-the-art of articulated pose
estimation in scenes with multiple people. To that end we contribute on three
fronts. We propose (1) improved body part detectors that generate effective
bottom-up proposals for body parts; (2) novel image-conditioned pairwise terms
that allow to assemble the proposals into a variable number of consistent body
part configurations; and (3) an incremental optimization strategy that explores
the search space more efficiently thus leading both to better performance and
significant speed-up factors. Evaluation is done on two single-person and two
multi-person pose estimation benchmarks. The proposed approach significantly
outperforms best known multi-person pose estimation results while demonstrating
competitive performance on the task of single person pose estimation. Models
and code available at http://pose.mpi-inf.mpg.de
Comment: ECCV'16. High-res version at
https://www.d2.mpi-inf.mpg.de/sites/default/files/insafutdinov16arxiv.pd
MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network
In this paper, we present MultiPoseNet, a novel bottom-up multi-person pose
estimation architecture that combines a multi-task model with a novel
assignment method. MultiPoseNet can jointly handle person detection, keypoint
detection, person segmentation and pose estimation problems. The novel
assignment method is implemented by the Pose Residual Network (PRN) which
receives keypoint and person detections, and produces accurate poses by
assigning keypoints to person instances. On the COCO keypoints dataset, our
pose estimation method outperforms all previous bottom-up methods both in
accuracy (+4-point mAP over previous best result) and speed; it also performs
on par with the best top-down methods while being at least 4x faster. Our
method is the fastest real time system with 23 frames/sec. Source code is
available at: https://github.com/mkocabas/pose-residual-network
Comment: to appear in ECCV 201
A graph-based approach can improve keypoint detection of complex poses: a proof-of-concept on injury occurrences in alpine ski racing
For most applications, 2D keypoint detection works well and offers a simple and fast tool to analyse human movements. However, there remain many situations where even the best state-of-the-art algorithms reach their limits and fail to detect human keypoints correctly. Such situations occur especially when individual body parts are occluded or twisted, or when the whole person is flipped. Twisted and rotated body positions of this kind occur frequently when analysing injuries in alpine ski racing. To improve the detection of keypoints for this application, we developed a novel method that refines keypoint estimates by rotating the input videos. We select the best rotation for every frame with a graph-based global solver. Thereby, we improve keypoint detection of an arbitrary pose estimation algorithm, in particular for 'hard' keypoints. In the current proof-of-concept study, we show that our approach outperforms standard keypoint detection results in all categories and in all metrics, and by a large margin in injury-related out-of-balance and fall situations. It also outperforms previous methods in performance and robustness. The Injury Ski II dataset was made publicly available, aiming to facilitate the investigation of sports accidents based on computer vision in the future.
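The per-frame rotation selection described in the abstract above can be pictured as a shortest-path problem over a lattice of frames and candidate rotations. The following is a minimal sketch under stated assumptions, not the authors' solver: the function name `select_rotations`, the use of mean keypoint confidence as the node score, the absolute angle difference as the transition penalty (ignoring wrap-around at 360 degrees), and the `smoothness` weight are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def select_rotations(conf: Sequence[Sequence[float]],
                     angles: Sequence[float],
                     smoothness: float = 0.01) -> List[int]:
    """Pick one rotation index per frame with a Viterbi-style dynamic program.

    conf[t][r] is an assumed mean keypoint confidence for frame t when the
    input is rotated by angles[r] degrees. Higher confidence is better.
    """
    n_frames, n_rot = len(conf), len(angles)

    def step(p: int, r: int) -> float:
        # Assumed transition penalty: proportional to the change in angle.
        return smoothness * abs(angles[p] - angles[r])

    # cost[r]: lowest total cost of any path ending at rotation r so far.
    cost = [-conf[0][r] for r in range(n_rot)]
    back = []  # back[t-1][r]: best predecessor of rotation r at frame t
    for t in range(1, n_frames):
        new_cost, ptr = [], []
        for r in range(n_rot):
            best_prev = min(range(n_rot), key=lambda p: cost[p] + step(p, r))
            ptr.append(best_prev)
            new_cost.append(cost[best_prev] + step(best_prev, r) - conf[t][r])
        cost = new_cost
        back.append(ptr)

    # Trace the cheapest path back from the last frame to the first.
    r = min(range(n_rot), key=lambda i: cost[i])
    path = [r]
    for ptr in reversed(back):
        r = ptr[r]
        path.append(r)
    path.reverse()
    return path
```

With a nonzero `smoothness`, the solver prefers a temporally consistent rotation over a slightly better score in a single frame. With `smoothness=0` it reduces to picking the highest-scoring rotation in each frame independently.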
Articulated Pose Estimation using Discriminative Armlet Classifiers
solely on strong contours and edges will fail to detect the upper and lower parts of the arms. We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.
Graph-based human pose estimation using neural networks
This thesis investigates the problem of human pose estimation (HPE) from unconstrained single two-dimensional (2D) images using Convolutional Neural Networks (CNNs). Recent approaches propose to solve the HPE problem using various forms of CNN models. Some of these methods focus on training deeper and more computationally expensive CNN structures to classify images of people without any prior knowledge of their poses. Other approaches incorporate existing prior knowledge of human anatomy and train the CNNs to construct graph representations of the human pose. These approaches are generally characterised as having lower computational and data requirements. This thesis investigates HPE methods based on the latter approach. In the search for the most accurate and computationally efficient HPE, it explores and compares three types of graph-based pose representations: tree-based, non-tree-based, and a hybrid approach combining both representations. The thesis contributions are three-fold. Firstly, the effect of different CNN structures on the HPE was analysed. New, more efficient network configurations were proposed and tested against the benchmark methods. The proposed configurations offered computational simplicity while maintaining relatively high performance. Secondly, new data-driven tree-based models were proposed as a modified form of the Chow-Liu Recursive Grouping (CLRG) algorithm. These models were applied within the CNN-based HPE framework, showing higher performance compared to the traditional anatomy-based tree-based models. Experiments with different numbers and configurations of tree nodes allowed the determination of a very efficient tree-based configuration consisting of 50 nodes. This configuration achieved higher HPE accuracy compared to the previously proposed 26-node tree. Apart from tree-based models of human pose, efficient non-tree-based models with iterative (looping) connections between nodes were also investigated.
The third contribution of this thesis is a novel hybrid HPE framework that combines both tree-based and non-tree-based human pose representations. Experimental results have shown that the hybrid approach leads to higher accuracy compared to either tree-based or non-tree-based structures individually.