Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation
In this paper we present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images. We propose to use high-resolution volumetric heatmaps to model joint locations, devising a simple and effective compression method to drastically reduce the size of this representation. At the core of the proposed method lies our Volumetric Heatmap Autoencoder, a fully-convolutional network tasked with compressing ground-truth heatmaps into a dense intermediate representation. A second model, the Code Predictor, is then trained to predict these codes, which can be decompressed at test time to re-obtain the original representation. Our experimental evaluation shows that our method performs favorably compared to the state of the art on both multi-person and single-person 3D human pose estimation datasets and, thanks to our novel compression strategy, can process full-HD images at a constant runtime of 8 fps regardless of the number of subjects in the scene.
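The core idea of compressing a volumetric heatmap into a small code and recovering the joint location after decompression can be illustrated with plain NumPy. Here block-mean pooling and nearest-neighbour upsampling are crude stand-ins for the learned encoder and decoder (the paper's Volumetric Heatmap Autoencoder is a fully-convolutional network, and the shapes below are illustrative):

```python
import numpy as np

def make_heatmap(shape, joint, sigma=2.0):
    """Gaussian volumetric heatmap centred on a 3D joint location (z, y, x)."""
    zz, yy, xx = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    d2 = (zz - joint[0]) ** 2 + (yy - joint[1]) ** 2 + (xx - joint[2]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def compress(vol, factor=4):
    """Stand-in for the encoder: block-mean pooling to a dense code."""
    d, h, w = vol.shape
    blocks = vol.reshape(d // factor, factor, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3, 5))

def decompress(code, factor=4):
    """Stand-in for the decoder: nearest-neighbour upsampling."""
    return code.repeat(factor, 0).repeat(factor, 1).repeat(factor, 2)

joint = (16, 20, 24)
vol = make_heatmap((32, 32, 32), joint)
code = compress(vol)            # 64x fewer values than the full volume
rec = decompress(code)
# The joint location survives the round trip (up to the pooling resolution).
print(np.unravel_index(rec.argmax(), rec.shape))
```

The point of the sketch is only the storage argument: the dense code is 64 times smaller than the volume, yet the argmax of the decompressed heatmap still lands on (or near) the original joint.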
Representation learning of vertex heatmaps for 3D human mesh reconstruction from multi-view images
This study addresses the problem of 3D human mesh reconstruction from
multi-view images. Recently, approaches that directly estimate the skinned
multi-person linear model (SMPL)-based human mesh vertices based on volumetric
heatmap representation from input images have shown good performance. We show
that representation learning of vertex heatmaps using an autoencoder helps
improve the performance of such approaches. Vertex heatmap autoencoder (VHA)
learns the manifold of plausible human meshes in the form of latent codes using
AMASS, which is a large-scale motion capture dataset. Body code predictor (BCP)
utilizes the learned body prior from VHA for human mesh reconstruction from
multi-view images through latent code-based supervision and transfer of
pretrained weights. According to experiments on Human3.6M and LightStage
datasets, the proposed method outperforms previous methods and achieves
state-of-the-art human mesh reconstruction performance. Comment: ICIP 202
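The latent code-based supervision can be sketched as a toy regression in NumPy: a predictor is trained to match codes produced by a frozen encoder. The linear model, the dimensions, and the synthetic data below are illustrative stand-ins, not the paper's VHA/BCP architectures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: image features x -> latent body code c (the frozen encoder's output).
n, feat_dim, code_dim = 256, 32, 8
x = rng.normal(size=(n, feat_dim))
true_w = rng.normal(size=(feat_dim, code_dim))
c = x @ true_w                       # "ground-truth" codes from the frozen encoder

# Code predictor as a linear map, trained with latent-code MSE supervision.
w = np.zeros((feat_dim, code_dim))
lr = 0.1
for _ in range(500):
    grad = x.T @ (x @ w - c) / n     # gradient of the mean squared code error
    w -= lr * grad

loss = np.mean((x @ w - c) ** 2)
print(f"latent-code MSE: {loss:.2e}")
```

Supervising in code space rather than vertex space is what lets the body prior learned from AMASS constrain the reconstruction.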
Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection
While the voxel-based methods have achieved promising results for
multi-person 3D pose estimation from multi-cameras, they suffer from heavy
computation burdens, especially for large scenes. We present Faster VoxelPose
to address the challenge by re-projecting the feature volume to the three
two-dimensional coordinate planes and estimating X, Y, Z coordinates from them
separately. To that end, we first localize each person by a 3D bounding box by
estimating a 2D box and its height based on the volume features projected to
the xy-plane and z-axis, respectively. Then for each person, we estimate
partial joint coordinates from the three coordinate planes separately which are
then fused to obtain the final 3D pose. The method is free from costly 3D-CNNs,
improves the speed of VoxelPose tenfold, and achieves accuracy competitive with
state-of-the-art methods, proving its potential in real-time
applications. Comment: 22 pages, 7 figures, submitted to ECCV 202
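The projection geometry behind the speed-up can be shown with NumPy: instead of running a 3D-CNN over the full feature volume, the volume is collapsed onto the xy-plane and the z-axis, and coordinates are read off each projection separately. The Gaussian volume below is a stand-in for the learned feature volume, and the argmax replaces the paper's 2D/1D estimation networks:

```python
import numpy as np

def gaussian_volume(shape, center, sigma=2.0):
    """Synthetic (z, y, x) feature volume peaked at a joint location."""
    zz, yy, xx = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    d2 = (xx - center[0]) ** 2 + (yy - center[1]) ** 2 + (zz - center[2]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

vol = gaussian_volume((24, 32, 32), center=(10, 20, 7))  # joint at x=10, y=20, z=7

# Orthographic projections: collapse the volume instead of convolving over it.
xy_plane = vol.max(axis=0)        # (y, x) bird's-eye view
z_axis = vol.max(axis=(1, 2))     # 1D profile along z

y, x = np.unravel_index(xy_plane.argmax(), xy_plane.shape)
z = z_axis.argmax()
print(x, y, z)   # → 10 20 7
```

Estimating (x, y) and z from a 2D map and a 1D profile costs far less than any operation over the full 3D grid, which is the source of the reported tenfold speed-up.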
Integrated In-vehicle Monitoring System Using 3D Human Pose Estimation and Seat Belt Segmentation
Recently, along with interest in autonomous vehicles, the importance of
monitoring systems for both drivers and passengers inside vehicles has been
increasing. This paper proposes a novel in-vehicle monitoring system that
combines 3D pose estimation, seat-belt segmentation, and seat-belt status
classification networks. Our system outputs various information necessary for
monitoring by accurately considering the data characteristics of the in-vehicle
environment. Specifically, the proposed 3D pose estimation directly estimates
the absolute coordinates of keypoints for a driver and passengers, and the
proposed seat-belt segmentation is implemented by applying a structure based on
the feature pyramid. In addition, we propose a classification task to
distinguish between normal and abnormal states of wearing a seat belt using
results that combine 3D pose estimation with seat-belt segmentation. These
tasks can be learned simultaneously and operate in real-time. Our method was
evaluated on a private dataset we newly created and annotated. The experimental
results show that our method achieves high performance and can be applied
directly to real in-vehicle monitoring systems. Comment: AAAI 2022 workshop AI for Transportation, accepted
Deep Learning-Based Human Pose Estimation: A Survey
Human pose estimation aims to locate the human body parts and build human
body representation (e.g., body skeleton) from input data such as images and
videos. It has drawn increasing attention during the past decade and has been
utilized in a wide range of applications including human-computer interaction,
motion analysis, augmented reality, and virtual reality. Although the recently
developed deep learning-based solutions have achieved high performance in human
pose estimation, there still remain challenges due to insufficient training
data, depth ambiguities, and occlusion. The goal of this survey paper is to
provide a comprehensive review of recent deep learning-based solutions for both
2D and 3D pose estimation via a systematic analysis and comparison of these
solutions based on their input data and inference procedures. More than 240
research papers since 2014 are covered in this survey. Furthermore, 2D and 3D
human pose estimation datasets and evaluation metrics are included.
Quantitative performance comparisons of the reviewed methods on popular
datasets are summarized and discussed. Finally, the challenges involved,
applications, and future research directions are concluded. We also provide a
regularly updated project page: \url{https://github.com/zczcwh/DL-HPE}
Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation, Tracking, and Forecasting on a Video Snippet
Multi-person pose understanding from RGB videos involves three complex tasks:
pose estimation, tracking and motion forecasting. Intuitively, accurate
multi-person pose estimation facilitates robust tracking, and robust tracking
builds crucial history for correct motion forecasting. Most existing works
either focus on a single task or employ multi-stage approaches that solve
multiple tasks separately, which tend to make sub-optimal decisions at each
stage and fail to exploit correlations among the three tasks. In this
paper, we propose Snipper, a unified framework to perform multi-person 3D pose
estimation, tracking, and motion forecasting simultaneously in a single stage.
We propose an efficient yet powerful deformable attention mechanism to
aggregate spatiotemporal information from the video snippet. Building upon this
deformable attention, a video transformer is learned to encode the
spatiotemporal features from the multi-frame snippet and to decode informative
pose features for multi-person pose queries. Finally, these pose queries are
regressed to predict multi-person pose trajectories and future motions in a
single shot. In the experiments, we show the effectiveness of Snipper on three
challenging public datasets where our generic model rivals specialized
state-of-the-art baselines for pose estimation, tracking, and forecasting.
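The deformable attention at the heart of the approach can be sketched in NumPy: each pose query samples the feature map at a handful of offset locations around a reference point and takes an attention-weighted sum, rather than attending to every spatial position. The map size, the number of sampling points K, and the random offsets and weights below are illustrative stand-ins for learned quantities:

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C = 16, 16, 8
feat = rng.normal(size=(H, W, C))   # one frame of the snippet's feature map

def deformable_attend(ref, offsets, weights):
    """Sample K offset locations around ref and aggregate with softmax weights."""
    ys = np.clip(ref[0] + offsets[:, 0], 0, H - 1)
    xs = np.clip(ref[1] + offsets[:, 1], 0, W - 1)
    samples = feat[ys, xs]                          # (K, C) sampled features
    w = np.exp(weights) / np.exp(weights).sum()     # softmax attention weights
    return w @ samples                              # (C,) aggregated query feature

ref = np.array([8, 8])                      # a pose query's reference point
offsets = rng.integers(-2, 3, size=(4, 2))  # K = 4 sampling offsets ("learned")
weights = rng.normal(size=4)
out = deformable_attend(ref, offsets, weights)
print(out.shape)   # → (8,)
```

Because each query touches only K locations instead of all H*W, the cost stays low enough to aggregate spatiotemporal features across every frame of the snippet.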