Geometry-Aware Network for Non-Rigid Shape Prediction from a Single View
We propose a method for predicting the 3D shape of a deformable surface from
a single view. By contrast with previous approaches, we do not need a
pre-registered template of the surface, and our method is robust to the lack of
texture and partial occlusions. At the core of our approach is a
geometry-aware deep architecture that tackles the problem as usually done in
analytic solutions: first perform 2D detection of the mesh and then estimate a
3D shape that is geometrically consistent with the image. We train this
architecture in an end-to-end manner using a large dataset of synthetic
renderings of shapes under different levels of deformation, material
properties, textures and lighting conditions. We evaluate our approach on a
test split of this dataset and available real benchmarks, consistently
improving state-of-the-art solutions with a significantly lower computational
time.
Comment: Accepted at CVPR 201
Recovering 6D Object Pose: A Review and Multi-modal Analysis
A large number of studies analyse object detection and pose estimation at
visual level in 2D, discussing the effects of challenges such as occlusion,
clutter, texture, etc., on the performances of the methods, which work in the
context of RGB modality. Interpreting the depth data, the study in this paper
presents thorough multi-modal analyses. It discusses the above-mentioned
challenges for full 6D object pose estimation in RGB-D images comparing the
performances of several 6D detectors in order to answer the following
questions: What is the current position of the computer vision community for
maintaining "automation" in robotic manipulation? What next steps should the
community take for improving "autonomy" in robotics while handling objects? Our
findings include: (i) Reasonably accurate results are obtained on textured
objects at varying viewpoints with cluttered backgrounds. (ii) Heavy occlusion
and clutter severely affect the detectors, and similar-looking distractors are
the biggest challenge in recovering instances' 6D poses. (iii) Template-based
methods and random forest-based learning algorithms underlie object detection
and 6D pose estimation. The recent paradigm is to learn deep discriminative
feature representations and to adopt CNNs taking RGB images as input.
(iv) Depending on the availability of large-scale 6D annotated depth datasets,
feature representations can be learnt on these datasets, and then the learnt
representations can be customized for the 6D problem.
Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB
We propose a new single-shot method for multi-person 3D pose estimation in
general scenes from a monocular RGB camera. Our approach uses novel
occlusion-robust pose-maps (ORPM) which enable full body pose inference even
under strong partial occlusions by other people and objects in the scene. ORPM
outputs a fixed number of maps which encode the 3D joint locations of all
people in the scene. Body part associations allow us to infer 3D pose for an
arbitrary number of people without explicit bounding box prediction. To train
our approach we introduce MuCo-3DHP, the first large scale training data set
showing real images of sophisticated multi-person interactions and occlusions.
We synthesize a large corpus of multi-person images by compositing images of
individual people (with ground truth from multi-view performance capture). We
evaluate our method on our new challenging 3D annotated multi-person test set
MuPoTs-3D where we achieve state-of-the-art performance. To further stimulate
research in multi-person 3D pose estimation, we will make our new datasets and
associated code publicly available for research purposes.
Comment: International Conference on 3D Vision (3DV), 201
V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
Most of the existing deep learning-based methods for 3D hand and human pose
estimation from a single depth map are based on a common framework that takes a
2D depth map and directly regresses the 3D coordinates of keypoints, such as
hand or human body joints, via 2D convolutional neural networks (CNNs). The
first weakness of this approach is the presence of perspective distortion in
the 2D depth map. While the depth map is intrinsically 3D data, many previous
methods treat depth maps as 2D images that can distort the shape of the actual
object through projection from 3D to 2D space. This compels the network to
perform perspective distortion-invariant estimation. The second weakness of the
conventional approach is that directly regressing 3D coordinates from a 2D
image is a highly non-linear mapping, which causes difficulty in the learning
procedure. To overcome these weaknesses, we firstly cast the 3D hand and human
pose estimation problem from a single depth map into a voxel-to-voxel
prediction that uses a 3D voxelized grid and estimates the per-voxel likelihood
for each keypoint. We design our model as a 3D CNN that provides accurate
estimates while running in real-time. Our system outperforms previous methods
in almost all publicly available 3D hand and human pose estimation datasets and
placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge.
The code is available at https://github.com/mks0601/V2V-PoseNet_RELEASE.
Comment: HANDS 2017 Challenge Frame-based 3D Hand Pose Estimation Winner (ICCV
2017), Published at CVPR 201
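The voxel-based formulation in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the grid parameters, and the use of a plain argmax over a given per-voxel likelihood are all assumptions made for illustration, and no 3D CNN is involved. The sketch only shows how a depth map might be back-projected to 3D points, binned into an occupancy grid, and how a keypoint could be read off a per-voxel likelihood.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def back_project(depth: List[List[float]], fx: float, fy: float,
                 cx: float, cy: float) -> List[Point]:
    """Convert a depth map (0 means no reading) to 3D points.

    Assumes a pinhole camera model; the intrinsics are hypothetical inputs.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def voxelize(points: List[Point], origin: Point, voxel_size: float,
             grid_dim: int) -> Set[Voxel]:
    """Return the set of occupied voxel indices in a cubic grid.

    Points that fall outside the grid are dropped.
    """
    occupied = set()
    for p in points:
        idx = tuple(int((p[i] - origin[i]) // voxel_size) for i in range(3))
        if all(0 <= k < grid_dim for k in idx):
            occupied.add(idx)
    return occupied


def keypoint_from_likelihood(likelihood: Dict[Voxel, float], origin: Point,
                             voxel_size: float) -> Point:
    """Pick the most likely voxel and return its centre in 3D coordinates.

    The likelihood values would come from a trained network; here they are
    simply supplied by the caller.
    """
    best = max(likelihood, key=likelihood.get)
    return tuple(origin[i] + (best[i] + 0.5) * voxel_size for i in range(3))
```

Working in the voxel grid keeps both the input and the output in 3D, which is the property the abstract contrasts with regressing coordinates directly from a 2D depth image.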
MoSculp: Interactive Visualization of Shape and Time
We present a system that allows users to visualize complex human motion via
3D motion sculptures---a representation that conveys the 3D structure swept by
a human body as it moves through space. Given an input video, our system
computes the motion sculptures and provides a user interface for rendering them
in different styles, including the options to insert the sculpture back into
the original video, render it in a synthetic scene or physically print it.
To provide this end-to-end workflow, we introduce an algorithm that estimates
the human's 3D geometry over time from a set of 2D images and develop a
3D-aware image-based rendering approach that embeds the sculpture back into the
scene. By automating the process, our system takes motion sculpture creation
out of the realm of professional artists, and makes it applicable to a wide
range of existing video material.
By providing viewers with 3D information, motion sculptures reveal space-time
motion information that is difficult to perceive with the naked eye, and allow
viewers to interpret how different parts of the object interact over time. We
validate the effectiveness of this approach with user studies, finding that our
motion sculpture visualizations are significantly more informative about motion
than existing stroboscopic and space-time visualization methods.
Comment: UIST 2018. Project page: http://mosculp.csail.mit.edu