Unconstrained Face Recognition
Although face recognition has been actively studied over the past
decade, the state-of-the-art recognition systems yield
satisfactory performance only under controlled scenarios and
recognition accuracy degrades significantly when confronted with
unconstrained situations due to variations such as illumination,
pose, etc. In this dissertation, we propose novel approaches that
are able to recognize human faces under unconstrained situations.
Part I presents algorithms for face recognition under
illumination/pose variations. For face recognition across
illuminations, we present a generalized photometric stereo
approach by modeling all face appearances belonging to all humans
under all lighting conditions. Using a linear generalization, we
achieve a factorization of the observation matrix consisting of
face appearances of different individuals, each under a different
illumination. We resolve ambiguities in factorization using
surface integrability and symmetry constraints. In addition, an
illumination-invariant identity descriptor is provided to perform
face recognition across illuminations. We further extend the
generalized photometric stereo approach to an illuminating light
field approach, which is able to recognize faces under pose and
illumination variations.
Face appearance lies in a high-dimensional nonlinear manifold. In
Part II, we introduce machine learning approaches based on
reproducing kernel Hilbert space (RKHS) to capture higher-order
statistical characteristics of the nonlinear appearance manifold.
In particular, we analyze principal components of the RKHS in a
probabilistic manner and compute distances such as the Chernoff
distance and the Kullback-Leibler divergence between two Gaussian
densities in RKHS.
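As a point of reference for the Kullback-Leibler divergence mentioned above, the sketch below evaluates its closed form for two Gaussians with diagonal covariances. It works in the ordinary input space, not in RKHS, so it is only an illustration of the quantity and not the dissertation's computation; the function name `kl_diag_gaussians` is hypothetical.

```python
import math


def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """Return KL(p || q) for two Gaussians with diagonal covariances.

    Each argument is a list with one entry per dimension; var_* hold variances.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        # Per-dimension term of the closed-form Gaussian KL divergence.
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total
```

The divergence is not symmetric: swapping the two densities generally gives a different value, which is one reason a symmetric measure such as the Chernoff distance is also of interest.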
Part III is on face tracking and recognition from video. We first
present an enhanced tracking algorithm that models online
appearance changes in a video sequence using a mixture model and
produces good tracking results in various challenging scenarios.
For video-based face recognition, while conventional approaches
treat tracking and recognition separately, we present a
simultaneous tracking-and-recognition approach. This simultaneous
approach, solved using the sequential importance sampling
algorithm, improves accuracy in both tracking and recognition.
Finally, we propose a unifying framework called probabilistic
identity characterization able to perform face recognition under
registration/illumination/pose variation and from a still image,
a group of still images, or a video sequence.
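The idea of simultaneous tracking and recognition with sequential importance sampling can be sketched as a particle filter over a joint (position, identity) state. The toy below is a sketch under strong assumptions and not the dissertation's algorithm: position is one-dimensional, each gallery identity is reduced to a single scalar appearance value, the likelihoods are Gaussian, and a resampling step is added to limit weight degeneracy. All names (`track_and_recognize`, `templates`, the noise levels) are hypothetical.

```python
import math
import random


def gaussian_likelihood(x, mean, std):
    """Unnormalized Gaussian likelihood of x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2)


def track_and_recognize(observations, templates, num_particles=500, seed=0):
    """Sequential importance sampling over a joint (position, identity) state.

    observations: list of (measured_position, measured_appearance) pairs.
    templates: one scalar appearance value per gallery identity (toy stand-in).
    Returns one (position estimate, identity posterior) pair per frame.
    """
    rng = random.Random(seed)
    # Each particle is [position, identity index]; identity is constant in time.
    particles = [[rng.uniform(-5.0, 5.0), rng.randrange(len(templates))]
                 for _ in range(num_particles)]
    results = []
    for measured_pos, measured_app in observations:
        weights = []
        for particle in particles:
            # Propagate the motion state with a random-walk proposal.
            particle[0] += rng.gauss(0.0, 0.5)
            # Weight by how well position and identity explain the frame.
            weights.append(
                gaussian_likelihood(measured_pos, particle[0], 1.0)
                * gaussian_likelihood(measured_app, templates[particle[1]], 0.5))
        total = sum(weights)
        weights = [w / total for w in weights]
        # Tracking output: weighted mean position.
        position = sum(w * p[0] for w, p in zip(weights, particles))
        # Recognition output: marginal posterior over identities.
        posterior = [0.0] * len(templates)
        for w, p in zip(weights, particles):
            posterior[p[1]] += w
        results.append((position, posterior))
        # Resample (copying each particle so later updates stay independent).
        particles = [list(p) for p in
                     rng.choices(particles, weights=weights, k=num_particles)]
    return results
```

Because both outputs are read from the same weighted particle set, evidence about identity sharpens the motion estimate and vice versa, which is the intuition behind treating the two tasks jointly.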
Object Referring in Videos with Language and Human Gaze
We investigate the problem of object referring (OR), i.e., localizing a target
object in a visual scene given a language description. Humans perceive
the world more as continued video snippets than as static images, and describe
objects not only by their appearance, but also by their spatio-temporal context
and motion features. Humans also gaze at the object when they issue a referring
expression. Existing works for OR mostly focus on static images only, which
fall short in providing many such cues. This paper addresses OR in videos with
language and human gaze. To that end, we present a new video dataset for OR,
with 30,000 objects over 5,000 stereo video sequences annotated for their
descriptions and gaze. We further propose a novel network model for OR in
videos, by integrating appearance, motion, gaze, and spatio-temporal context
into one network. Experimental results show that our method effectively
utilizes motion cues, human gaze, and spatio-temporal context. Our method
outperforms previous OR methods. For dataset and code, please refer to
https://people.ee.ethz.ch/~arunv/ORGaze.html.
Comment: Accepted to CVPR 2018, 10 pages, 6 figures
MonoPerfCap: Human Performance Capture from Monocular Video
We present the first marker-less approach for temporally coherent 3D
performance capture of a human with general clothing from monocular video. Our
approach reconstructs articulated human skeleton motion as well as medium-scale
non-rigid surface deformations in general scenes. Human performance capture is
a challenging problem due to the large range of articulation, potentially fast
motion, and considerable non-rigid deformations, even from multi-view data.
Reconstruction from monocular video alone is drastically more challenging,
since strong occlusions and the inherent depth ambiguity lead to a highly
ill-posed reconstruction problem. We tackle these challenges by a novel
approach that employs sparse 2D and 3D human pose detections from a
convolutional neural network using a batch-based pose estimation strategy.
Joint recovery of per-batch motion allows us to resolve the ambiguities of the
monocular reconstruction problem based on a low dimensional trajectory
subspace. In addition, we propose refinement of the surface geometry based on
fully automatically extracted silhouettes to enable medium-scale non-rigid
alignment. We demonstrate state-of-the-art performance capture results that
enable exciting applications such as video editing and free viewpoint video,
previously infeasible from monocular video. Our qualitative and quantitative
evaluation demonstrates that our approach significantly outperforms previous
monocular methods in terms of accuracy, robustness and scene complexity that
can be handled.
Comment: Accepted to ACM TOG 2018, to be presented on SIGGRAPH 201
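To illustrate what a low dimensional trajectory subspace buys, the sketch below projects a per-frame coordinate sequence of one joint onto the first few vectors of an orthonormal DCT basis. The choice of a DCT basis is an assumption made for illustration (the abstract does not name the basis), and `dct_basis` and `project_trajectory` are hypothetical helper names.

```python
import math


def dct_basis(num_frames, num_bases):
    """Return the first num_bases orthonormal DCT-II basis vectors."""
    basis = []
    for k in range(num_bases):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_frames)
        basis.append([scale * math.cos(math.pi * (t + 0.5) * k / num_frames)
                      for t in range(num_frames)])
    return basis


def project_trajectory(trajectory, num_bases):
    """Project a 1D trajectory onto the span of the first num_bases vectors."""
    basis = dct_basis(len(trajectory), num_bases)
    # The basis is orthonormal, so coefficients are plain dot products.
    coeffs = [sum(b * x for b, x in zip(vec, trajectory)) for vec in basis]
    return [sum(c * vec[t] for c, vec in zip(coeffs, basis))
            for t in range(len(trajectory))]
```

A batch of, say, 50 frames is then described by a handful of coefficients per coordinate instead of 50 free values, which constrains the otherwise ill-posed per-frame estimates.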
Finite Element Based Tracking of Deforming Surfaces
We present an approach to robustly track the geometry of an object that
deforms over time from a set of input point clouds captured from a single
viewpoint. The deformations we consider are caused by applying forces to known
locations on the object's surface. Our method combines the use of prior
information on the geometry of the object modeled by a smooth template and the
use of a linear finite element method to predict the deformation. This allows
the accurate reconstruction of both the observed and the unobserved sides of
the object. We present tracking results for noisy low-quality point clouds
acquired by either a stereo camera or a depth camera, and simulations with
point clouds corrupted by different error terms. We show that our method is
also applicable to large non-linear deformations.
Comment: additional experiment
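The role of a linear finite element model in predicting deformation from known forces can be shown on the simplest possible case: a 1D bar clamped at one end, assembled from two-node elements and solved as K u = f. This is a minimal sketch, not the paper's model (which concerns deforming surfaces); the function names and the uniform element stiffness are assumptions.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def bar_displacements(num_elements, element_stiffness, forces):
    """Linear FEM for a 1D bar clamped at node 0.

    forces: dict mapping node index (1..num_elements) to applied force.
    Returns the displacement of every node, including the clamped one.
    """
    n_nodes = num_elements + 1
    stiffness = [[0.0] * n_nodes for _ in range(n_nodes)]
    for e in range(num_elements):
        # Each two-node element contributes k * [[1, -1], [-1, 1]].
        stiffness[e][e] += element_stiffness
        stiffness[e][e + 1] -= element_stiffness
        stiffness[e + 1][e] -= element_stiffness
        stiffness[e + 1][e + 1] += element_stiffness
    load = [0.0] * n_nodes
    for node, force in forces.items():
        load[node] = force
    # Apply the clamp by dropping row and column 0, then solve K u = f.
    reduced = [row[1:] for row in stiffness[1:]]
    return [0.0] + solve_linear_system(reduced, load[1:])
```

With three elements and a force on node 2 only, the solver also returns the displacement of node 3, which carries no load and moves rigidly with node 2. In the same spirit, a physical model can supply motion for parts of an object that the sensor does not observe.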
Pedestrian detection in uncontrolled environments using stereo and biometric information
A method for pedestrian detection from challenging real-world outdoor scenes is presented in this paper. This technique is able to extract multiple pedestrians, of varying orientations and appearances, from a scene even when faced with large and multiple occlusions. The technique is also robust to changing background lighting conditions and effects, such as shadows. The technique applies an enhanced method from which reliable disparity information can be obtained even from untextured homogeneous areas within a scene. This is used in conjunction with ground plane estimation and biometric information to obtain reliable pedestrian regions. These regions are robust to erroneous areas of disparity data and also to severe pedestrian occlusion, which often occurs in unconstrained scenarios.
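How disparity and a biometric prior can work together is sketched below: disparity is converted to depth with the standard rectified-stereo relation Z = f * B / d, the pixel height of a candidate region is converted to metres, and the region is kept only if that height is plausible for a person. The height limits, parameter names, and function names are assumptions for illustration; the paper's actual biometric model and ground plane estimation are not reproduced here.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


def plausible_pedestrian(region_height_px, disparity_px, focal_length_px,
                         baseline_m, min_height_m=1.0, max_height_m=2.1):
    """Check whether a region's metric height fits an assumed human range."""
    depth_m = depth_from_disparity(disparity_px, focal_length_px, baseline_m)
    # Pinhole model: metric height = pixel height * depth / focal length.
    height_m = region_height_px * depth_m / focal_length_px
    return min_height_m <= height_m <= max_height_m
```

For example, with a focal length of 800 px and a 0.1 m baseline, a disparity of 16 px corresponds to a depth of 5 m, so a 280 px tall region is about 1.75 m high and passes the check, while an 80 px region (about 0.5 m) does not.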