2,105 research outputs found

    OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

    Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. This bottom-up system achieves high accuracy and realtime performance, regardless of the number of people in the image. In previous work, PAFs and body part location estimation were refined simultaneously across training stages. We demonstrate that a PAF-only refinement rather than both PAF and body part location refinement results in a substantial increase in both runtime performance and accuracy. We also present the first combined body and foot keypoint detector, based on an internal annotated foot dataset that we have publicly released. We show that the combined detector not only reduces the inference time compared to running them sequentially, but also maintains the accuracy of each component individually. This work has culminated in the release of OpenPose, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints.
    Comment: Journal version of arXiv:1611.08050, with better accuracy and faster speed, and the release of a new foot keypoint dataset: https://cmu-perceptual-computing-lab.github.io/foot_keypoint_dataset
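
    The abstract above does not spell out how a PAF is used to associate body parts. As a minimal sketch, assuming the commonly described scheme in which a candidate limb between two detected keypoints is scored by averaging the dot product of the field with the limb direction at points sampled along the segment, the idea can be illustrated as follows. The function name, the nested-list field layout (`paf_x[y][x]`, `paf_y[y][x]`) and the sample count are illustrative assumptions, not the OpenPose API.

```python
import math


def paf_association_score(paf_x, paf_y, p1, p2, num_samples=10):
    """Score a candidate limb p1 -> p2 against a 2D vector field.

    paf_x / paf_y are nested lists indexed as [row][col] (hypothetical layout).
    p1 and p2 are (x, y) pixel coordinates of two candidate keypoints.
    Returns the mean dot product between the field and the unit limb vector.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return 0.0  # degenerate limb: both keypoints coincide
    ux, uy = dx / norm, dy / norm

    total = 0.0
    for i in range(num_samples):
        # Sample evenly along the segment, endpoints included.
        t = i / (num_samples - 1) if num_samples > 1 else 0.0
        x = int(round(p1[0] + t * dx))
        y = int(round(p1[1] + t * dy))
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / num_samples


# Toy field that points in the +x direction everywhere.
field_x = [[1.0] * 5 for _ in range(5)]
field_y = [[0.0] * 5 for _ in range(5)]
aligned = paf_association_score(field_x, field_y, (0, 2), (4, 2))
perpendicular = paf_association_score(field_x, field_y, (2, 0), (2, 4))
```

    In this toy field, a limb aligned with the field scores higher than a perpendicular one, which is the property a bottom-up parser can use to decide which keypoints belong to the same person.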

    Improving Multi-Person Pose Estimation using Label Correction

    Significant attention has recently been paid to multi-person pose estimation methods, as there has been rapid progress in the field owing to convolutional neural networks. In particular, a recent method that exploits part confidence maps and Part Affinity Fields (PAFs) has achieved accurate real-time prediction of multi-person keypoints. However, human-annotated labels are sometimes inappropriate for learning models. For example, if a limb extends outside an image, a keypoint for the limb may not be annotated because it lies outside the image, and thus the labels for the limb cannot be generated. If a model is trained with data including such missing labels, the output of the model for that location, even though it is correct, is penalized as a false positive, which is likely to harm the performance of the model. In this paper, we point out the existence of some patterns of inappropriate labels, and propose a novel method for correcting such labels with a teacher model trained on such incomplete data. Experiments on the COCO dataset show that training with the corrected labels improves the performance of the model and also speeds up training.

    Efficient Online Multi-Person 2D Pose Tracking with Recurrent Spatio-Temporal Affinity Fields

    We present an online approach to efficiently and simultaneously detect and track the 2D pose of multiple people in a video sequence. We build upon the Part Affinity Field (PAF) representation designed for static images, and propose an architecture that can encode and predict Spatio-Temporal Affinity Fields (STAF) across a video sequence. In particular, we propose a novel temporal topology cross-linked across limbs which can consistently handle body motions of a wide range of magnitudes. Additionally, we make the overall approach recurrent in nature, where the network ingests STAF heatmaps from previous frames and estimates those for the current frame. Our approach uses only online inference and tracking, and is currently the fastest and the most accurate bottom-up approach that is runtime invariant to the number of people in the scene and accuracy invariant to input frame rate of camera. Running at ~30 fps on a single GPU at single scale, it achieves highly competitive results on the PoseTrack benchmarks.

    Out of the Box: A combined approach for handling occlusion in Human Pose Estimation

    Human pose estimation is a challenging problem, especially in the case of 3D pose estimation from 2D images, due to many different factors such as occlusion, depth ambiguities, intertwining of people, and crowds in general. 2D multi-person human pose estimation in the wild also suffers from the same problems: occlusion, ambiguities, and disentanglement of people's body parts. Being a fundamental problem with many applications, including but not limited to surveillance, economical motion capture for video games and movies, and physiotherapy, it is an interesting problem to solve from both a practical and an intellectual perspective. Although there are cases where no pose estimator can ever predict with 100% accuracy (cases where even humans would fail), there are several algorithms that have brought new state-of-the-art performance in human pose estimation in the wild. We look at a few algorithms with different approaches and also formulate our own approach to tackle a persistently troublesome problem, i.e. occlusions.
    Comment: 11 pages, 12 figures

    Looking at Hands in Autonomous Vehicles: A ConvNet Approach using Part Affinity Fields

    In the context of autonomous driving, where humans may need to take over when the computer issues a takeover request, a key step towards driving safety is the monitoring of the hands to ensure the driver is ready for such a request. This work focuses on the first step of this process, which is to locate the hands. Such a system must work in real-time and under varying harsh lighting conditions. This paper introduces a fast ConvNet approach, based on the original OpenPose work for full body joint estimation. The network is modified with fewer parameters and retrained using our own day-time naturalistic autonomous driving dataset to estimate joint and affinity heatmaps for the driver's and passenger's wrists and elbows, for a total of 8 joint classes and part affinity fields between each wrist-elbow pair. The approach runs real-time on real-world data at 40 fps on multiple drivers and passengers. The system is extensively evaluated both quantitatively and qualitatively, showing at least 95% detection performance on joint localization and arm-angle estimation.
    Comment: 11 pages, 8 figures, 1 table. Submitted to "IEEE Transactions on Intelligent Vehicles" (under review)

    4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras

    This paper contributes a novel realtime multi-person motion capture algorithm using multiview video inputs. Due to the heavy occlusions in each view, joint optimization on the multiview images and multiple temporal frames is indispensable, which brings up the essential challenge of realtime efficiency. To this end, for the first time, we unify per-view parsing, cross-view matching, and temporal tracking into a single optimization framework, i.e., a 4D association graph in which each dimension (image space, viewpoint and time) can be treated equally and simultaneously. To solve the 4D association graph efficiently, we further contribute the idea of 4D limb bundle parsing based on heuristic searching, followed by limb bundle assembling using a proposed bundle Kruskal's algorithm. Our method enables a realtime online motion capture system running at 30fps using 5 cameras on a 5-person scene. Benefiting from the unified parsing, matching and tracking constraints, our method is robust to noisy detection, and achieves high-quality online pose reconstruction. The proposed method outperforms the state-of-the-art method quantitatively without using high-level appearance information. We also contribute a multiview video dataset synchronized with a marker-based motion capture system for scientific evaluation.
    Comment: Accepted to CVPR 202
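
    The "bundle Kruskal's algorithm" named above is the paper's own variant and is not described in this abstract. For orientation only, here is a minimal sketch of the classic Kruskal procedure it takes its name from, adapted to keep the highest-scoring edges: edges are visited in descending score order and accepted unless they would close a cycle. The function name and the `(score, u, v)` edge format are illustrative assumptions.

```python
def kruskal_max_spanning_forest(num_nodes, edges):
    """Classic Kruskal selection, maximizing total edge score.

    edges: iterable of (score, u, v) tuples with 0 <= u, v < num_nodes.
    Returns the accepted edges in the order they were chosen.
    This is plain Kruskal with union-find, not the paper's bundle variant.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Find the component root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for score, u, v in sorted(edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # accepting this edge does not create a cycle
            parent[root_u] = root_v
            chosen.append((score, u, v))
    return chosen


candidate_edges = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 0, 2), (0.2, 2, 3)]
selected = kruskal_max_spanning_forest(4, candidate_edges)
```

    Here the edge `(0.7, 0, 2)` is rejected because nodes 0 and 2 are already connected through node 1 by higher-scoring edges.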

    FoxNet: A Multi-face Alignment Method

    Multi-face alignment aims to identify geometry structures of multiple faces in an image, and its performance is essential for many practical tasks, such as face recognition, face tracking, and face animation. In this work, we present a fast bottom-up multi-face alignment approach, which can simultaneously localize multi-person facial landmarks with high precision. In more detail, our bottom-up architecture maps the landmarks to a high-dimensional space in which the landmarks of all faces are represented. By clustering the features belonging to the same face, our approach can align the multi-person facial landmarks synchronously. Extensive experiments show that our method can achieve high performance in the multi-face landmark alignment task while our model is extremely fast. Moreover, we propose a new multi-face dataset to compare the speed and precision of bottom-up face alignment methods with top-down methods. Our dataset is publicly available at https://github.com/AISAResearch/FoxNet
    Comment: Accepted by the 26th IEEE International Conference on Image Processing (ICIP2019)

    Dual Path Networks for Multi-Person Human Pose Estimation

    The task of multi-person human pose estimation in natural scenes is quite challenging. Existing methods include both top-down and bottom-up approaches. The main advantage of bottom-up methods is their excellent tradeoff between estimation accuracy and computational cost. We follow this path and aim to design smaller, faster, and more accurate neural networks for the regression of keypoints and limb association vectors. These two regression tasks are naturally dependent on each other. In this work, we propose a dual-path network specially designed for multi-person human pose estimation, and compare our performance with the OpenPose network in terms of model size, forward speed, and estimation accuracy.
    Comment: ICCV 2017 Workshop on PoseTrack Challenge. Challenge results available at: https://posetrack.net/workshops/iccv2017/posetrack-challenge-results.htm

    Cascade Feature Aggregation for Human Pose Estimation

    Human pose estimation plays an important role in many computer vision tasks and has been studied for many decades. However, due to complex appearance variations from poses, illuminations, occlusions and low resolutions, it still remains a challenging problem. Taking advantage of high-level semantic information from deep convolutional neural networks is an effective way to improve the accuracy of human pose estimation. In this paper, we propose a novel Cascade Feature Aggregation (CFA) method, which cascades several hourglass networks for robust human pose estimation. Features from different stages are aggregated to obtain abundant contextual information, leading to robustness to poses, partial occlusions and low resolution. Moreover, results from different stages are fused to further improve the localization accuracy. Extensive experiments on the MPII and LIP datasets demonstrate that our proposed CFA outperforms the state-of-the-art and achieves the best performance on the state-of-the-art benchmark MPII.

    Pose estimator and tracker using temporal flow maps for limbs

    For human pose estimation in videos, how temporal information between frames is used is important. In this paper, we propose temporal flow maps for limbs (TML) and a multi-stride method to estimate and track human poses. The proposed temporal flow maps are unit vectors describing the limbs' movements. We constructed a network to learn both spatial information and temporal information end-to-end. Spatial information such as joint heatmaps and part affinity fields is regressed in the spatial network part, and the TML is regressed in the temporal network part. We also propose a data augmentation method to learn various types of TML better. The proposed multi-stride method expands the data by randomly selecting two frames within a defined range. We demonstrate that the proposed method efficiently estimates and tracks human poses on the PoseTrack 2017 and 2018 datasets.
    Comment: Won the Honorable Mention Award in the 18'ECCV PoseTrack challenge. Accepted in the 19'IJCNN conference
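
    Two ideas in the abstract above lend themselves to a small illustration: picking two frames at a random stride within a defined range, and describing a movement as a unit vector. The following is a minimal sketch under stated assumptions; the function names, the `max_stride` parameter and the per-point (rather than per-limb-pixel) flow are hypothetical simplifications, not the authors' implementation.

```python
import math
import random


def sample_frame_pair(num_frames, max_stride, rng=random):
    """Pick two frame indices whose gap is a random stride in [1, max_stride].

    Assumes num_frames >= 2; the stride is clamped so both indices are valid.
    """
    stride = rng.randint(1, min(max_stride, num_frames - 1))
    first = rng.randint(0, num_frames - 1 - stride)
    return first, first + stride


def unit_flow(prev_xy, curr_xy):
    """Unit vector pointing from a point's previous position to its current one.

    Returns (0.0, 0.0) when the point did not move.
    """
    dx, dy = curr_xy[0] - prev_xy[0], curr_xy[1] - prev_xy[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return (0.0, 0.0)
    return (dx / norm, dy / norm)


rng = random.Random(0)  # seeded only so the example is repeatable
first, second = sample_frame_pair(num_frames=30, max_stride=5, rng=rng)
direction = unit_flow((10.0, 10.0), (13.0, 14.0))
```

    Sampling pairs at varying strides exposes a model to both small and large displacements from the same video, while the unit-vector encoding keeps only the direction of motion.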