SurfNet: Generating 3D shape surfaces using deep residual networks
3D shape models are naturally parameterized using vertices and faces, i.e.,
composed of polygons forming a surface. However, current 3D learning paradigms
for predictive and generative tasks using convolutional neural networks focus
on a voxelized representation of the object. Lifting convolution operators from
the traditional 2D to 3D results in high computational overhead with little
additional benefit as most of the geometry information is contained on the
surface boundary. Here we study the problem of directly generating the 3D shape
surface of rigid and non-rigid shapes using deep convolutional neural networks.
We develop a procedure to create consistent `geometry images' representing the
shape surface of a category of 3D objects. We then use this consistent
representation for category-specific shape surface generation from a parametric
representation or an image by developing novel extensions of deep residual
networks for the task of geometry image generation. Our experiments indicate
that our network learns a meaningful representation of shape surfaces allowing
it to interpolate between shape orientations and poses, invent new shape
surfaces and reconstruct 3D shape surfaces from previously unseen images. Comment: CVPR 2017 paper
3D Face Reconstruction from Light Field Images: A Model-free Approach
Reconstructing 3D facial geometry from a single RGB image has recently
instigated wide research interest. However, it is still an ill-posed problem
and most methods rely on prior models, hence undermining the accuracy of the
recovered 3D faces. In this paper, we exploit the Epipolar Plane Images (EPI)
obtained from light field cameras and learn CNN models that recover horizontal
and vertical 3D facial curves from the respective horizontal and vertical EPIs.
Our 3D face reconstruction network (FaceLFnet) comprises a densely connected
architecture to learn accurate 3D facial curves from low resolution EPIs. To
train the proposed FaceLFnets from scratch, we synthesize photo-realistic light
field images from 3D facial scans. The curve-by-curve 3D face estimation
approach allows the networks to learn from only 14K images of 80 identities,
which still comprises over 11 million EPIs/curves. The estimated facial curves
are merged into a single point cloud to which a surface is fitted to get the
final 3D face. Our method is model-free, requires only a few training samples
to learn FaceLFnet and can reconstruct 3D faces with high accuracy from single
light field images under varying poses, expressions and lighting conditions.
Comparisons on the BU-3DFE and BU-4DFE datasets show that our method reduces
reconstruction errors by over 20% compared to recent state of the art
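The abstract above describes horizontal and vertical EPIs without giving their construction. As a rough illustration only, the sketch below slices both kinds of EPI from a toy 4D light field stored as nested lists. The indexing convention `light_field[v][u][t][s]` and the function names are assumptions made for this example, not details taken from the paper.

```python
def horizontal_epi(light_field, v, t):
    """Horizontal EPI for a fixed vertical view index v and image row t.

    Assumed layout: light_field[v][u][t][s] is the intensity seen from
    view (u, v) at pixel (s, t). The result has one row per horizontal
    view u and one column per pixel column s.
    """
    return [list(light_field[v][u][t]) for u in range(len(light_field[v]))]


def vertical_epi(light_field, u, s):
    """Vertical EPI for a fixed horizontal view index u and image column s.

    The result has one row per vertical view v and one column per pixel row t.
    """
    return [[light_field[v][u][t][s] for t in range(len(light_field[v][u]))]
            for v in range(len(light_field))]


# Toy 3x3 grid of views, each 2 rows by 4 columns; every value encodes
# its own (v, u, t, s) index so the slices are easy to inspect.
V, U, T, S = 3, 3, 2, 4
lf = [[[[1000 * v + 100 * u + 10 * t + s for s in range(S)]
        for t in range(T)] for u in range(U)] for v in range(V)]

h = horizontal_epi(lf, v=1, t=0)   # U rows, S columns
w = vertical_epi(lf, u=2, s=3)     # V rows, T columns
```

Each EPI is a 2D slice of the 4D light field, so one light field image yields many EPIs, which is consistent with the abstract's point that a modest number of images still provides a large number of training curves.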
Real-Time Human Motion Capture with Multiple Depth Cameras
Commonly used human motion capture systems require intrusive attachment of
markers that are visually tracked with multiple cameras. In this work we
present an efficient and inexpensive solution to markerless motion capture
using only a few Kinect sensors. Unlike the previous work on 3D pose estimation
using a single depth camera, we relax constraints on the camera location and do
not assume a co-operative user. We apply recent image segmentation techniques
to depth images and use curriculum learning to train our system on purely
synthetic data. Our method accurately localizes body parts without requiring an
explicit shape model. The body joint locations are then recovered by combining
evidence from multiple views in real-time. We also introduce a dataset of ~6
million synthetic depth frames for pose estimation from multiple cameras and
exceed state-of-the-art results on the Berkeley MHAD dataset. Comment: Accepted to computer robot vision 201
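The abstract above says joint locations are recovered by combining evidence from multiple views, but it does not specify how. One simple possibility, shown here purely as an assumed sketch, is to map each camera's estimate into a shared world frame and take a confidence-weighted mean. The names `to_world` and `fuse_joint`, the per-camera confidence values, and the rigid-transform inputs are all hypothetical.

```python
def to_world(point, rotation, translation):
    """Apply a rigid transform p_world = R * p_cam + t using plain lists."""
    return [sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
            for i in range(3)]


def fuse_joint(observations):
    """Confidence-weighted mean of one joint seen by several cameras.

    observations: list of (point_cam, confidence, rotation, translation),
    with non-negative confidences (an assumption of this sketch).
    Returns None when no camera reports a positive confidence.
    """
    total = sum(conf for _, conf, _, _ in observations)
    if total <= 0:
        return None
    fused = [0.0, 0.0, 0.0]
    for point, conf, rotation, translation in observations:
        world = to_world(point, rotation, translation)
        for i in range(3):
            fused[i] += conf * world[i] / total
    return fused


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
joint = fuse_joint([
    ([1, 0, 2], 3.0, IDENTITY, [0, 0, 0]),   # confident camera
    ([0, 0, 0], 1.0, IDENTITY, [1, 4, 2]),   # less confident, offset camera
])
```

A weighted mean is cheap enough for real-time use, but it is only one of many fusion rules; the paper's actual procedure may differ.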
Weakly supervised 3D Reconstruction with Adversarial Constraint
Supervised 3D reconstruction has witnessed significant progress through the
use of deep neural networks. However, this increase in performance requires
large scale annotations of 2D/3D data. In this paper, we explore inexpensive 2D
supervision as an alternative for expensive 3D CAD annotation. Specifically, we
use foreground masks as weak supervision through a raytrace pooling layer that
enables perspective projection and backpropagation. Additionally, since the 3D
reconstruction from masks is an ill-posed problem, we propose to constrain the
3D reconstruction to the manifold of unlabeled realistic 3D shapes that match
mask observations. We demonstrate that learning a log-barrier solution to this
constrained optimization problem resembles the GAN objective, enabling the use
of existing tools for training GANs. We evaluate and analyze the manifold
constrained reconstruction on various datasets for single and multi-view
reconstruction of both synthetic and real images.
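To make the idea of mask supervision concrete, the sketch below projects a voxel occupancy grid to a silhouette and compares it with an observed foreground mask. It is a deliberate simplification: it max-pools along one axis (an orthographic projection), whereas the abstract describes a raytrace pooling layer with perspective projection. The function names and the error measure are assumptions for illustration.

```python
def project_silhouette(voxels):
    """Max-pool occupancy along the depth axis: voxels[z][y][x] -> mask[y][x].

    Orthographic stand-in for ray-based pooling: a pixel is foreground
    if any voxel along its (axis-aligned) ray is occupied.
    """
    depth, height, width = len(voxels), len(voxels[0]), len(voxels[0][0])
    return [[max(voxels[z][y][x] for z in range(depth)) for x in range(width)]
            for y in range(height)]


def mask_error(predicted, observed):
    """Mean absolute difference between a projected and an observed mask."""
    cells = [(p, o) for prow, orow in zip(predicted, observed)
             for p, o in zip(prow, orow)]
    return sum(abs(p - o) for p, o in cells) / len(cells)


# Toy 2x2x2 occupancy grid indexed as voxels[z][y][x].
voxels = [
    [[0, 1], [0, 0]],   # z = 0
    [[0, 0], [1, 0]],   # z = 1
]
silhouette = project_silhouette(voxels)
error = mask_error(silhouette, [[0, 1], [1, 1]])
```

Many different occupancy grids produce the same silhouette, which illustrates why the abstract calls reconstruction from masks ill-posed and adds a constraint toward realistic shapes.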
Vision-based hand gesture interaction using particle filter, principle component analysis and transition network
Vision-based human-computer interaction is becoming increasingly important. It offers natural interaction with computers and frees users from mechanical interaction devices, which is especially favourable for wearable computers. This paper presents a human-computer interaction system based on a conventional webcam and hand gesture recognition. The system works in real time and enables users to control a computer cursor with hand motions and gestures instead of a mouse. Five hand gestures are designed to represent five mouse operations: moving, left click, left double-click, right click and no-action. An algorithm based on a particle filter is used for tracking the hand position. PCA-based feature selection is used for recognizing the hand gestures. A transition network is also employed to improve the accuracy and reliability of the interaction system. The system shows good performance in the recognition and interaction test