Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB
We propose a new single-shot method for multi-person 3D pose estimation in
general scenes from a monocular RGB camera. Our approach uses novel
occlusion-robust pose-maps (ORPM) which enable full body pose inference even
under strong partial occlusions by other people and objects in the scene. ORPM
outputs a fixed number of maps which encode the 3D joint locations of all
people in the scene. Body part associations allow us to infer 3D pose for an
arbitrary number of people without explicit bounding box prediction. To train
our approach, we introduce MuCo-3DHP, the first large-scale training dataset
showing real images of sophisticated multi-person interactions and occlusions.
We synthesize a large corpus of multi-person images by compositing images of
individual people (with ground truth from multi-view performance capture). We
evaluate our method on our new challenging 3D annotated multi-person test set
MuPoTs-3D where we achieve state-of-the-art performance. To further stimulate
research in multi-person 3D pose estimation, we will make our new datasets and
associated code publicly available for research purposes. Comment: International Conference on 3D Vision (3DV), 2018
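The abstract does not spell out how the fixed set of maps is read out, so here is a minimal illustrative sketch of the general idea: per-joint location maps are sampled at detected 2D joint positions, falling back to a visible read-out location when a joint is occluded. All names, shapes, and the map layout below are assumptions for illustration, not the paper's actual formulation.

import numpy as np

NUM_JOINTS = 17  # illustrative joint count, not taken from the paper

def read_pose_from_maps(orpm, joints_2d):
    """Sketch: decode one person's 3D pose from fixed-count pose maps.

    orpm      : (H, W, NUM_JOINTS, 3) array; per-joint maps storing a
                3D joint coordinate at every pixel (a simplification).
    joints_2d : (NUM_JOINTS, 2) detected 2D pixel locations for this
                person, with NaN entries for occluded joints.
    """
    pose_3d = np.zeros((NUM_JOINTS, 3))
    for j in range(NUM_JOINTS):
        u, v = joints_2d[j]
        if np.isnan(u):
            # Occluded joint: read at the mean of the visible joints
            # instead, mimicking the occlusion-robust read-out idea.
            u, v = np.nanmean(joints_2d, axis=0)
        pose_3d[j] = orpm[int(v), int(u), j]
    return pose_3d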
Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
We propose a real-time RGB-based pipeline for object detection and 6D pose
estimation. Our novel 3D orientation estimation is based on a variant of the
Denoising Autoencoder that is trained on simulated views of a 3D model using
Domain Randomization. This so-called Augmented Autoencoder has several
advantages over existing methods: It does not require real, pose-annotated
training data, generalizes to various test sensors and inherently handles
object and view symmetries. Instead of learning an explicit mapping from input
images to object poses, it provides an implicit representation of object
orientations defined by samples in a latent space. Our pipeline achieves
state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D
domain. We also evaluate on the LineMOD dataset where we can compete with other
synthetically trained approaches. We further increase performance by correcting
3D orientation estimates to account for perspective errors when the object
deviates from the image center and show extended results. Comment: Code available at: https://github.com/DLR-RM/AugmentedAutoencoder
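A minimal sketch of the implicit orientation representation described above: latent codes for densely sampled rotations are precomputed into a codebook, and at test time the rotation whose code is most cosine-similar to the encoded image crop is returned. The encoder and all names here are placeholders standing in for the trained Augmented Autoencoder, not the released code.

import numpy as np

def build_codebook(encoder, rendered_views, rotations):
    """Encode synthetic views of the 3D model into a unit-norm codebook.

    encoder        : callable mapping an image batch to latent vectors
                     (stands in for the trained Augmented Autoencoder).
    rendered_views : (N, H, W, 3) views rendered from sampled rotations.
    rotations      : length-N list of the rotation matrices used.
    """
    codes = np.asarray(encoder(rendered_views))           # (N, D)
    codes /= np.linalg.norm(codes, axis=1, keepdims=True)
    return codes, rotations

def estimate_orientation(encoder, crop, codes, rotations):
    """Return the rotation whose latent code best matches the crop's."""
    z = np.asarray(encoder(crop[None]))[0]
    z /= np.linalg.norm(z)
    best = int(np.argmax(codes @ z))  # cosine similarity of unit vectors
    return rotations[best]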
Going Further with Point Pair Features
Point Pair Features are a widely used method for detecting 3D objects in point
clouds; however, they are prone to failure in the presence of sensor noise and
background clutter. We introduce novel sampling and voting schemes that
significantly reduce the influence of clutter and sensor noise. Our
experiments show that, with our improvements, PPFs become competitive with
state-of-the-art methods, outperforming them on several objects from
challenging benchmarks at a low computational cost. Comment: Corrected post-print of manuscript accepted to the European
Conference on Computer Vision (ECCV) 2016;
https://link.springer.com/chapter/10.1007/978-3-319-46487-9_5
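For background, the point pair feature that this method builds on (from Drost et al.) describes an oriented point pair by one distance and three angles; a direct implementation is below. The paper's improved sampling and voting schemes are not reproduced here.

import numpy as np

def angle(a, b):
    """Angle in [0, pi] between two 3D vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def point_pair_feature(m1, n1, m2, n2):
    """F(m1, m2) = (||d||, angle(n1, d), angle(n2, d), angle(n1, n2)),
    where d = m2 - m1, m* are 3D points and n* their surface normals."""
    d = m2 - m1
    return np.array([np.linalg.norm(d),
                     angle(n1, d),
                     angle(n2, d),
                     angle(n1, n2)])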
XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera
We present a real-time approach for multi-person 3D motion capture at over 30
fps using a single RGB camera. It operates successfully in generic scenes that
may contain occlusions by objects and by other people. Our method proceeds in
successive stages. The first stage is a convolutional neural network (CNN) that
estimates 2D and 3D pose features along with identity assignments for all
visible joints of all individuals. We contribute a new architecture for this
CNN, called SelecSLS Net, that uses novel selective long and short range skip
connections to improve the information flow allowing for a drastically faster
network without compromising accuracy. In the second stage, a fully connected
neural network turns the possibly partial (on account of occlusion) 2D pose and
3D pose features for each subject into a complete 3D pose estimate per
individual. The third stage applies space-time skeletal model fitting to the
predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose,
and enforce temporal coherence. Our method returns the full skeletal pose in
joint angles for each subject. This is a further key distinction from previous
work, which does not produce joint angle results of a coherent skeleton in real
time for multi-person scenes. The proposed system runs on consumer hardware at
a previously unseen speed of more than 30 fps given 512x320 images as input
while achieving state-of-the-art accuracy, which we demonstrate on a range
of challenging real-world scenes. Comment: To appear in ACM Transactions on Graphics (SIGGRAPH) 2020
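A purely structural sketch of the three stages described above; every component here is a placeholder callable, not the actual SelecSLS Net, lifting network, or model-fitting code.

def capture_frame(image, cnn, lifting_net, skeleton_fitter, prev_state):
    """One frame of a three-stage multi-person capture pipeline (sketch).

    Stage 1: the CNN predicts 2D/3D pose features and identities for
             all visible joints of all people.
    Stage 2: a fully connected network completes each subject's possibly
             partial features into a full 3D pose.
    Stage 3: space-time skeletal fitting reconciles the 2D and 3D poses
             and enforces temporal coherence, yielding joint angles.
    """
    per_person_features = cnn(image)                            # stage 1
    poses_3d = [lifting_net(f) for f in per_person_features]    # stage 2
    joint_angles, state = skeleton_fitter(poses_3d,
                                          per_person_features,
                                          prev_state)           # stage 3
    return joint_angles, state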