1,102 research outputs found
3D Face Reconstruction from Light Field Images: A Model-free Approach
Reconstructing 3D facial geometry from a single RGB image has recently
instigated wide research interest. However, it is still an ill-posed problem
and most methods rely on prior models, hence undermining the accuracy of the
recovered 3D faces. In this paper, we exploit the Epipolar Plane Images (EPI)
obtained from light field cameras and learn CNN models that recover horizontal
and vertical 3D facial curves from the respective horizontal and vertical EPIs.
Our 3D face reconstruction network (FaceLFnet) comprises a densely connected
architecture to learn accurate 3D facial curves from low resolution EPIs. To
train the proposed FaceLFnets from scratch, we synthesize photo-realistic light
field images from 3D facial scans. The curve by curve 3D face estimation
approach allows the networks to learn from only 14K images of 80 identities,
which still comprises over 11 million EPIs/curves. The estimated facial curves
are merged into a single point cloud to which a surface is fitted to get the
final 3D face. Our method is model-free, requires only a few training samples
to learn FaceLFnet and can reconstruct 3D faces with high accuracy from single
light field images under varying poses, expressions and lighting conditions.
Comparisons on the BU-3DFE and BU-4DFE datasets show that our method reduces
reconstruction errors by over 20% compared to recent state of the art
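
To make the EPI idea in the abstract above concrete, the following is a minimal sketch of how horizontal and vertical Epipolar Plane Images can be sliced out of a 4D light field. The array layout `[v, u, y, x]` (vertical view, horizontal view, image row, image column) and the function names are assumptions chosen for illustration; they are not taken from the FaceLFnet paper.

```python
import numpy as np

def horizontal_epi(light_field, v, y):
    """Return the horizontal EPI for vertical view v and image row y.

    Assumes light_field is indexed as [v, u, y, x]. The result stacks the
    same image row across all horizontal views, giving shape (U, X).
    """
    return light_field[v, :, y, :]

def vertical_epi(light_field, u, x):
    """Return the vertical EPI for horizontal view u and image column x.

    The result stacks the same image column across all vertical views,
    giving shape (V, Y).
    """
    return light_field[:, u, :, x]

# Toy light field: 3 x 5 views, each view 4 x 6 pixels (hypothetical sizes).
V, U, Y, X = 3, 5, 4, 6
lf = np.arange(V * U * Y * X, dtype=np.float32).reshape(V, U, Y, X)

h_epi = horizontal_epi(lf, v=1, y=2)
v_epi = vertical_epi(lf, u=2, x=3)
print(h_epi.shape, v_epi.shape)
```

Each image row yields one horizontal EPI and each image column one vertical EPI, so a single light field image contributes many training samples; this is consistent with the abstract's point that a modest number of images still produces millions of EPIs.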
Intelligent visual media processing: when graphics meets vision
The computer graphics and computer vision communities have been working closely together in recent
years, and a variety of algorithms and applications have been developed to analyze and manipulate the visual media
around us. There are three major driving forces behind this phenomenon: i) the availability of big data from the
Internet has created a demand for dealing with the ever increasing, vast amount of resources; ii) powerful processing
tools, such as deep neural networks, provide effective ways for learning how to deal with heterogeneous visual data;
iii) new data capture devices, such as the Kinect, bridge between algorithms for 2D image understanding and
3D model analysis. These driving forces have emerged only recently, and we believe that the computer graphics
and computer vision communities are still at the beginning of their honeymoon phase. In this work we survey
recent research on how computer vision techniques benefit computer graphics techniques and vice versa, and cover
research on analysis, manipulation, synthesis, and interaction. We also discuss existing problems and suggest
possible further research directions.
Multimodal Three Dimensional Scene Reconstruction, The Gaussian Fields Framework
The focus of this research is on building 3D representations of real-world scenes and objects using different imaging sensors: primarily range acquisition devices (such as laser scanners and stereo systems) that allow the recovery of 3D geometry, and multi-spectral image sequences, including visual and thermal IR images, that provide additional scene characteristics. The crucial technical challenge that we addressed is the automatic point-set registration task. In this context our main contribution is the development of an optimization-based method at the core of which lies a unified criterion that solves simultaneously for the dense point correspondence and transformation recovery problems. The new criterion has a straightforward expression in terms of the datasets and the alignment parameters and was used primarily for 3D rigid registration of point-sets. However, it also proved useful for feature-based multimodal image alignment. We derived our method from simple Boolean matching principles by approximation and relaxation. One of the main advantages of the proposed approach, as compared to the widely used class of Iterative Closest Point (ICP) algorithms, is convexity in the neighborhood of the registration parameters and continuous differentiability, allowing for the use of standard gradient-based optimization techniques. Physically, the criterion is interpreted in terms of a Gaussian Force Field exerted by one point-set on the other. Such a formulation proved useful for controlling and increasing the region of convergence, and hence allowing for more autonomy in correspondence tasks. Furthermore, the criterion can be computed with linear complexity using recently developed Fast Gauss Transform numerical techniques. In addition, we introduced a new local feature descriptor that was derived from visual saliency principles and that significantly enhanced the performance of the registration algorithm.
The resulting technique was subjected to a thorough experimental analysis that highlighted its strengths and showed its limitations. Our current applications are in the field of 3D modeling for inspection, surveillance, and biometrics. However, since this matching framework can be applied to any type of data that can be represented as N-dimensional point-sets, the scope of the method is shown to reach many more pattern analysis applications.
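
As a rough illustration of the kind of smooth, differentiable registration criterion the abstract above describes, the sketch below sums a Gaussian of the squared distance over all point pairs after a rigid transform. This is an assumed, simplified form: it uses only point positions (no feature descriptor term), evaluates all pairs directly in O(NM) rather than with the Fast Gauss Transform, and the function name, the parameter `sigma`, and the normalization are illustrative choices, not the paper's exact definition.

```python
import numpy as np

def gaussian_field_energy(moving, fixed, rotation, translation, sigma):
    """Sum of Gaussians of pairwise distances after a rigid transform.

    moving: (N, 3) points to be aligned; fixed: (M, 3) reference points.
    rotation: (3, 3) rotation matrix; translation: (3,) vector.
    Larger values mean more point pairs lie close together.
    Naive O(N * M) evaluation, for illustration only.
    """
    transformed = moving @ rotation.T + translation
    diff = transformed[:, None, :] - fixed[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return float(np.sum(np.exp(-sq_dist / sigma ** 2)))

# Toy data: the moving set is an exact copy of the fixed set.
fixed = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
moving = fixed.copy()
identity = np.eye(3)

aligned = gaussian_field_energy(moving, fixed, identity, np.zeros(3), sigma=0.5)
shifted = gaussian_field_energy(moving, fixed, identity, np.full(3, 5.0), sigma=0.5)
print(aligned > shifted)
```

Because the energy is a smooth function of the transform parameters, it can in principle be maximized with standard gradient-based optimizers; widening `sigma` spreads each Gaussian, which matches the abstract's remark about controlling the region of convergence.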
STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits
We present a novel classifier network called STEP, to classify perceived
human emotion from gaits, based on a Spatial Temporal Graph Convolutional
Network (ST-GCN) architecture. Given an RGB video of an individual walking, our
formulation implicitly exploits the gait features to classify the emotional
state of the human into one of four emotions: happy, sad, angry, or neutral. We
use hundreds of annotated real-world gait videos and augment them with
thousands of annotated synthetic gaits generated using a novel generative
network called STEP-Gen, built on an ST-GCN based Conditional Variational
Autoencoder (CVAE). We incorporate a novel push-pull regularization loss in the
CVAE formulation of STEP-Gen to generate realistic gaits and improve the
classification accuracy of STEP. We also release a novel dataset (E-Gait),
which consists of human gaits annotated with perceived emotions along
with thousands of synthetic gaits. In practice, STEP can learn the affective
features and exhibits classification accuracy of 89% on E-Gait, which is 14-30%
more accurate than prior methods.
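
For readers unfamiliar with graph convolutions on skeleton data, the sketch below shows one simplified spatial graph convolution step over the joints of a gait sequence. It is an illustrative assumption, not the STEP or ST-GCN implementation: it uses a single symmetrically normalized adjacency matrix with self-loops, omits the temporal convolution and any joint partitioning, and the toy skeleton, sizes, and function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def spatial_graph_conv(x, a_norm, weight):
    """Mix features across neighbouring joints, then project channels.

    x: (T, J, C_in) joint features over T frames and J joints.
    a_norm: (J, J) normalized adjacency; weight: (C_in, C_out).
    Returns an array of shape (T, J, C_out).
    """
    return np.einsum("ij,tjc,cd->tid", a_norm, x, weight)

# Toy skeleton with 5 joints in a chain (hypothetical layout).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
a_norm = normalized_adjacency(edges, num_joints=5)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 3))      # 4 frames, 5 joints, 3D coordinates
weight = rng.standard_normal((3, 8))    # project to 8 feature channels
out = spatial_graph_conv(x, a_norm, weight)
print(out.shape)
```

A full spatial temporal network would interleave steps like this with convolutions along the frame axis and finish with a classifier over the four emotion labels.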