    Unconstrained Face Recognition

    Although face recognition has been actively studied over the past decade, state-of-the-art recognition systems yield satisfactory performance only under controlled scenarios, and recognition accuracy degrades significantly in unconstrained situations due to variations such as illumination, pose, etc. In this dissertation, we propose novel approaches that are able to recognize human faces under unconstrained situations. Part I presents algorithms for face recognition under illumination/pose variations. For face recognition across illuminations, we present a generalized photometric stereo approach by modeling all face appearances belonging to all humans under all lighting conditions. Using a linear generalization, we achieve a factorization of the observation matrix consisting of face appearances of different individuals, each under a different illumination. We resolve ambiguities in factorization using surface integrability and symmetry constraints. In addition, an illumination-invariant identity descriptor is provided to perform face recognition across illuminations. We further extend the generalized photometric stereo approach to an illuminating light field approach, which is able to recognize faces under pose and illumination variations. Face appearance lies on a high-dimensional nonlinear manifold. In Part II, we introduce machine learning approaches based on reproducing kernel Hilbert space (RKHS) to capture higher-order statistical characteristics of the nonlinear appearance manifold. In particular, we analyze principal components of the RKHS in a probabilistic manner and compute distances such as the Chernoff distance and the Kullback-Leibler divergence between two Gaussian densities in RKHS. Part III addresses face tracking and recognition from video. We first present an enhanced tracking algorithm that models online appearance changes in a video sequence using a mixture model and produces good tracking results in various challenging scenarios. For video-based face recognition, while conventional approaches treat tracking and recognition separately, we present a simultaneous tracking-and-recognition approach. This simultaneous approach, solved using the sequential importance sampling algorithm, improves accuracy in both tracking and recognition. Finally, we propose a unifying framework called probabilistic identity characterization, which is able to perform face recognition under registration/illumination/pose variation and from a still image, a group of still images, or a video sequence.
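
The abstract above mentions the Chernoff distance and the Kullback-Leibler divergence between two Gaussian densities. As a rough illustration only, the sketch below evaluates the standard closed forms for univariate Gaussians in the ordinary input space. It does not reproduce the dissertation's RKHS computation, and the function names and the `alpha` parameter are hypothetical choices made for this example.

```python
import math


def kl_divergence_gaussian(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for univariate Gaussians p = N(mean_p, var_p), q = N(mean_q, var_q)."""
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mean_p - mean_q) ** 2) / var_q
        - 1.0
    )


def chernoff_distance_gaussian(mean_p, var_p, mean_q, var_q, alpha=0.5):
    """Chernoff distance between two univariate Gaussians.

    alpha in (0, 1) weights the two variances; alpha = 0.5 gives the
    Bhattacharyya distance as a special case.
    """
    mixed_var = alpha * var_p + (1.0 - alpha) * var_q
    quadratic = 0.5 * alpha * (1.0 - alpha) * (mean_q - mean_p) ** 2 / mixed_var
    log_term = 0.5 * math.log(mixed_var / (var_p ** alpha * var_q ** (1.0 - alpha)))
    return quadratic + log_term


if __name__ == "__main__":
    # Two unit-variance Gaussians whose means differ by one standard deviation.
    print(kl_divergence_gaussian(0.0, 1.0, 1.0, 1.0))
    print(chernoff_distance_gaussian(0.0, 1.0, 1.0, 1.0))
```

Both quantities are zero when the two densities coincide. The KL divergence is asymmetric in its arguments, whereas the Chernoff distance with `alpha = 0.5` is symmetric.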

    Object Referring in Videos with Language and Human Gaze

    We investigate the problem of object referring (OR), i.e., localizing a target object in a visual scene given a language description. Humans perceive the world more as continued video snippets than as static images, and describe objects not only by their appearance, but also by their spatio-temporal context and motion features. Humans also gaze at the object when they issue a referring expression. Existing work on OR mostly focuses on static images, which fall short of providing many such cues. This paper addresses OR in videos with language and human gaze. To that end, we present a new video dataset for OR, with 30,000 objects over 5,000 stereo video sequences annotated for their descriptions and gaze. We further propose a novel network model for OR in videos, integrating appearance, motion, gaze, and spatio-temporal context into one network. Experimental results show that our method effectively utilizes motion cues, human gaze, and spatio-temporal context. Our method outperforms previous OR methods. For the dataset and code, please refer to https://people.ee.ethz.ch/~arunv/ORGaze.html. Comment: Accepted to CVPR 2018, 10 pages, 6 figures

    MonoPerfCap: Human Performance Capture from Monocular Video

    We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges with a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows us to resolve the ambiguities of the monocular reconstruction problem based on a low-dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free-viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness, and the scene complexity that can be handled. Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 201
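
The abstract above relies on a low-dimensional trajectory subspace to constrain per-batch motion. The sketch below shows one common way such a subspace can be built, assuming an orthonormal DCT-II basis over the frames of a batch. The abstract does not name the basis, so the DCT choice and all function names here are assumptions for illustration only.

```python
import math


def dct_basis(num_frames, num_coeffs):
    """First num_coeffs rows of the orthonormal DCT-II basis over num_frames samples."""
    basis = []
    for k in range(num_coeffs):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_frames)
        basis.append([
            scale * math.cos(math.pi * (2 * t + 1) * k / (2 * num_frames))
            for t in range(num_frames)
        ])
    return basis


def project_trajectory(trajectory, num_coeffs):
    """Coefficients of a 1D joint trajectory in the truncated basis."""
    basis = dct_basis(len(trajectory), num_coeffs)
    return [sum(b * x for b, x in zip(row, trajectory)) for row in basis]


def reconstruct_trajectory(coeffs, num_frames):
    """Smooth trajectory spanned by the given low-frequency coefficients."""
    basis = dct_basis(num_frames, len(coeffs))
    return [
        sum(c * row[t] for c, row in zip(coeffs, basis))
        for t in range(num_frames)
    ]


if __name__ == "__main__":
    # Hypothetical per-frame coordinate of one joint over a batch of 6 frames.
    noisy = [0.0, 0.12, 0.38, 0.93, 1.58, 2.51]
    coeffs = project_trajectory(noisy, num_coeffs=3)
    print(reconstruct_trajectory(coeffs, num_frames=len(noisy)))
```

Keeping only a few low-frequency coefficients per joint coordinate reduces the number of unknowns in a batch and favors temporally smooth motion, which is the general idea behind resolving monocular ambiguities with a trajectory subspace.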

    Finite Element Based Tracking of Deforming Surfaces

    We present an approach to robustly track the geometry of an object that deforms over time from a set of input point clouds captured from a single viewpoint. The deformations we consider are caused by applying forces to known locations on the object's surface. Our method combines prior information on the geometry of the object, modeled by a smooth template, with a linear finite element method to predict the deformation. This allows the accurate reconstruction of both the observed and the unobserved sides of the object. We present tracking results for noisy low-quality point clouds acquired by either a stereo camera or a depth camera, and simulations with point clouds corrupted by different error terms. We show that our method is also applicable to large non-linear deformations. Comment: additional experiment
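
The abstract above predicts deformation with a linear finite element method driven by forces at known locations. As a minimal sketch of that idea, and not of the paper's surface model, the example below assembles the stiffness matrix of a 1D elastic bar clamped at node 0 and solves K u = f for the nodal displacements. The bar setup, parameter names, and helper functions are hypothetical.

```python
def assemble_bar_stiffness(n_elements, length, axial_stiffness):
    """Global stiffness matrix of a uniform 1D bar (axial_stiffness = E * A)."""
    k = axial_stiffness / (length / n_elements)  # stiffness of one element
    n_nodes = n_elements + 1
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for e in range(n_elements):
        K[e][e] += k
        K[e][e + 1] -= k
        K[e + 1][e] -= k
        K[e + 1][e + 1] += k
    return K


def solve_linear_system(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def predict_displacements(n_elements, length, axial_stiffness, forces):
    """Nodal displacements for forces given as {node_index: force}; node 0 is clamped."""
    K = assemble_bar_stiffness(n_elements, length, axial_stiffness)
    f = [0.0] * (n_elements + 1)
    for node, value in forces.items():
        f[node] += value
    # Apply the clamp by dropping row and column 0 (a force on node 0 is absorbed by the support).
    K_free = [row[1:] for row in K[1:]]
    return [0.0] + solve_linear_system(K_free, f[1:])


if __name__ == "__main__":
    print(predict_displacements(4, 1.0, 2.0, {4: 1.0}))
```

For a unit force at the free end, the computed tip displacement matches the analytical value F L / (E A), which is a convenient sanity check for the assembly and the solver.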

    Pedestrian detection in uncontrolled environments using stereo and biometric information

    A method for pedestrian detection in challenging real-world outdoor scenes is presented in this paper. This technique is able to extract multiple pedestrians, of varying orientations and appearances, from a scene even when faced with large and multiple occlusions. The technique is also robust to changing background lighting conditions and effects such as shadows. The technique applies an enhanced method from which reliable disparity information can be obtained even from untextured homogeneous areas within a scene. This is used in conjunction with ground plane estimation and biometric information to obtain reliable pedestrian regions. These regions are robust to erroneous areas of disparity data and also to severe pedestrian occlusion, which often occurs in unconstrained scenarios.
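
The abstract above combines stereo disparity with biometric information to obtain pedestrian regions. The sketch below illustrates the general principle under simple assumptions: a rectified stereo pair, where depth follows from Z = f B / d, and a pinhole model to convert a region's pixel height into metres. The function names and the height thresholds are hypothetical and are not taken from the paper.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def metric_height(pixel_height, depth_m, focal_px):
    """Approximate real-world height of a region under a pinhole camera model."""
    return pixel_height * depth_m / focal_px


def is_plausible_pedestrian(pixel_height, disparity_px, focal_px, baseline_m,
                            min_height_m=1.0, max_height_m=2.2):
    """Keep a candidate region only if its metric height is person-sized.

    The height bounds are illustrative assumptions, not values from the paper.
    """
    depth = depth_from_disparity(disparity_px, focal_px, baseline_m)
    height = metric_height(pixel_height, depth, focal_px)
    return min_height_m <= height <= max_height_m


if __name__ == "__main__":
    # A 90-pixel-tall region at 10 px disparity, with f = 500 px and B = 0.2 m.
    print(is_plausible_pedestrian(90, 10.0, 500.0, 0.2))
```

A size check of this kind depends on the disparity being reliable, which is why dense disparity in untextured areas matters for the approach described in the abstract.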