Learning to Refine Human Pose Estimation
Multi-person pose estimation in images and videos is an important yet
challenging task with many applications. Despite the large improvements in
human pose estimation enabled by the development of convolutional neural
networks, many difficult cases remain where even the
state-of-the-art models fail to correctly localize all body joints. This
motivates the need for an additional refinement step that addresses these
challenging cases and can be easily applied on top of any existing method. In
this work, we introduce a pose refinement network (PoseRefiner) which takes as
input both the image and a given pose estimate and learns to directly predict a
refined pose by jointly reasoning about the input-output space. In order for
the network to learn to refine incorrect body joint predictions, we employ a
novel data augmentation scheme for training, where we model "hard" human pose
cases. We evaluate our approach on four popular large-scale pose estimation
benchmarks (MPII Single-Person and Multi-Person Pose Estimation, PoseTrack
Pose Estimation, and PoseTrack Pose Tracking) and report systematic improvement
over the state of the art.
Comment: To appear in CVPRW (2018). Workshop: Visual Understanding of Humans
in Crowd Scene and the 2nd Look Into Person Challenge (VUHCS-LIP
Self-supervised Keypoint Correspondences for Multi-Person Pose Estimation and Tracking in Videos
Video annotation is expensive and time-consuming. Consequently, datasets for
multi-person pose estimation and tracking are less diverse and more sparsely
annotated than large-scale image datasets for human pose estimation.
This makes it challenging to train deep-learning-based models for associating
keypoints across frames that are robust to nuisance factors such as motion blur
and occlusions for the task of multi-person pose tracking. To address this
issue, we propose an approach that relies on keypoint correspondences for
associating persons in videos. Instead of training the network for estimating
keypoint correspondences on video data, we train it on large-scale image
datasets for human pose estimation using self-supervision. Combined with a
top-down framework for human pose estimation, we use keypoint correspondences
to (i) recover missed pose detections and (ii) associate pose detections across
video frames. Our approach achieves state-of-the-art results for multi-frame
pose estimation and multi-person pose tracking on the PoseTrack and
PoseTrack data sets.
Comment: Submitted to ECCV 202
A Unified Framework for Mutual Improvement of SLAM and Semantic Segmentation
This paper presents a novel framework for simultaneously implementing
localization and segmentation, which are two of the most important vision-based
tasks for robotics. While the goals and techniques used for them were
previously considered to be different, we show that by making use of the
intermediate results of the two modules, their performance can be enhanced at
the same time. Our framework is able to handle both the instantaneous motion
and long-term changes of instances in localization with the help of the
segmentation result, which also benefits from the refined 3D pose information.
We conduct experiments on various datasets and show that our framework
effectively improves the precision and robustness of the two tasks and
outperforms existing localization and segmentation algorithms.
Comment: 7 pages, 5 figures. This work has been accepted by ICRA 2019. The demo
video can be found at https://youtu.be/Bkt53dAehj