388 research outputs found
Colour invariants for machine face recognition
Illumination invariance remains the most researched, yet the most challenging aspect of automatic face recognition. In this paper we investigate the discriminative power of colour-based invariants in the presence of large illumination changes between training and test data, when appearance changes due to cast shadows and non-Lambertian effects are significant. Specifically, there are three main contributions: (i) we employ a more sophisticated photometric model of the camera and show how its parameters can be estimated, (ii) we derive several novel colour-based face invariants, and (iii) on a large database of video sequences we examine and evaluate the largest number of colour-based representations in the literature. Our results suggest that colour invariants do have a substantial discriminative power which may increase the robustness and accuracy of recognition from low resolution images
Robust 3D face capture using example-based photometric stereo
We show that using example-based photometric stereo, it is possible to achieve realistic reconstructions of the human face. The method can handle non-Lambertian reflectance and attached shadows after a simple calibration step. We use spherical harmonics to model and de-noise the illumination functions from images of a reference object with known shape, and a fast grid technique to invert those functions and recover the surface normal for each point of the target object. The depth coordinate is obtained by weighted multi-scale integration of these normals, using an integration weight mask obtained automatically from the images themselves. We have applied these techniques to improve the PHOTOFACE system of Hansen et al. (2010). © 2013 Elsevier B.V. All rights reserved
Consensus Message Passing for Layered Graphical Models
Generative models provide a powerful framework for probabilistic reasoning.
However, in many domains their use has been hampered by the practical
difficulties of inference. This is particularly the case in computer vision,
where models of the imaging process tend to be large, loopy and layered. For
this reason bottom-up conditional models have traditionally dominated in such
domains. We find that widely-used, general-purpose message passing inference
algorithms such as Expectation Propagation (EP) and Variational Message Passing
(VMP) fail on the simplest of vision models. With these models in mind, we
introduce a modification to message passing that learns to exploit their
layered structure by passing 'consensus' messages that guide inference towards
good solutions. Experiments on a variety of problems show that the proposed
technique leads to significantly more accurate inference results, not only when
compared to standard EP and VMP, but also when compared to competitive
bottom-up conditional models.Comment: Appearing in Proceedings of the 18th International Conference on
Artificial Intelligence and Statistics (AISTATS) 201
Moving cast shadows detection methods for video surveillance applications
Moving cast shadows are a major concern in today’s performance from broad range of many vision-based surveillance applications because they highly difficult the object classification task. Several shadow detection methods have been reported in the literature during the last years. They are mainly divided into two domains. One usually works with static images, whereas the second one uses image sequences, namely video content. In spite of the fact that both cases can be analogously analyzed, there is a difference in the application field. The first case, shadow detection methods can be exploited in order to obtain additional geometric and semantic cues about shape and position of its casting object (’shape from shadows’) as well as the localization of the light source. While in the second one, the main purpose is usually change detection, scene matching or surveillance (usually in a background subtraction context). Shadows can in fact modify in a negative way the shape and color of the target object and therefore affect the performance of scene analysis and interpretation in many applications. This chapter wills mainly reviews shadow detection methods as well as their taxonomies related with the second case, thus aiming at those shadows which are associated with moving objects (moving shadows).Peer Reviewe
Overcoming the Challenges Associated with Image-based Mapping of Small Bodies in Preparation for the OSIRIS-REx Mission to (101955) Bennu
The OSIRIS-REx Asteroid Sample Return Mission is the third mission in NASA's
New Frontiers Program and is the first U.S. mission to return samples from an
asteroid to Earth. The most important decision ahead of the OSIRIS-REx team is
the selection of a prime sample-site on the surface of asteroid (101955) Bennu.
Mission success hinges on identifying a site that is safe and has regolith that
can readily be ingested by the spacecraft's sampling mechanism. To inform this
mission-critical decision, the surface of Bennu is mapped using the OSIRIS-REx
Camera Suite and the images are used to develop several foundational data
products. Acquiring the necessary inputs to these data products requires
observational strategies that are defined specifically to overcome the
challenges associated with mapping a small irregular body. We present these
strategies in the context of assessing candidate sample-sites at Bennu
according to a framework of decisions regarding the relative safety,
sampleability, and scientific value across the asteroid's surface. To create
data products that aid these assessments, we describe the best practices
developed by the OSIRIS-REx team for image-based mapping of irregular small
bodies. We emphasize the importance of using 3D shape models and the ability to
work in body-fixed rectangular coordinates when dealing with planetary surfaces
that cannot be uniquely addressed by body-fixed latitude and longitude.Comment: 31 pages, 10 figures, 2 table
Subspace Representations for Robust Face and Facial Expression Recognition
Analyzing human faces and modeling their variations have always been of interest to the computer vision community. Face analysis based on 2D intensity images is a challenging problem, complicated by variations in pose, lighting, blur, and non-rigid facial deformations due to facial expressions. Among the different sources of variation, facial expressions are of interest as important channels of non-verbal communication. Facial expression analysis is also affected by changes in view-point and inter-subject variations in performing different expressions. This dissertation makes an attempt to address some of the challenges involved in developing robust algorithms for face and facial expression recognition by exploiting the idea of proper subspace representations for data.
Variations in the visual appearance of an object mostly arise due to changes in illumination and pose. So we first present a video-based sequential algorithm for estimating the face albedo as an illumination-insensitive signature for face recognition. We show that by knowing/estimating the pose of the face at each frame of a sequence, the albedo can be efficiently estimated using a Kalman filter. Then we extend this to the case of unknown pose by simultaneously tracking the pose as well as updating the albedo through an efficient Bayesian inference method performed using a Rao-Blackwellized particle filter.
Since understanding the effects of blur, especially motion blur, is an important problem in unconstrained visual analysis, we then propose a blur-robust recognition algorithm for faces with spatially varying blur. We model a blurred face as a weighted average of geometrically transformed instances of its clean face. We then build a matrix, for each gallery face, whose column space spans the space of all the motion blurred images obtained from the clean face. This matrix representation is then used to define a proper objective function and perform blur-robust face recognition.
To develop robust and generalizable models for expression analysis one needs to break the dependence of the models on the choice of the coordinate frame of the camera. To this end, we build models for expressions on the affine shape-space (Grassmann manifold), as an approximation to the projective shape-space, by using a Riemannian interpretation of deformations that facial expressions cause on different parts of the face. This representation enables us to perform various expression analysis and recognition algorithms without the need for pose normalization as a preprocessing step.
There is a large degree of inter-subject variations in performing various expressions. This poses an important challenge on developing robust facial expression recognition algorithms. To address this challenge, we propose a dictionary-based approach for facial expression analysis by decomposing expressions in terms of action units (AUs). First, we construct an AU-dictionary using domain experts' knowledge of AUs. To incorporate the high-level knowledge regarding expression decomposition and AUs, we then perform structure-preserving sparse coding by imposing two layers of grouping over AU-dictionary atoms as well as over the test image matrix columns. We use the computed sparse code matrix for each expressive face to perform expression decomposition and recognition.
Most of the existing methods for the recognition of faces and expressions consider either the expression-invariant face recognition problem or the identity-independent facial expression recognition problem. We propose joint face and facial expression recognition using a dictionary-based component separation algorithm (DCS). In this approach, the given expressive face is viewed as a superposition of a neutral face component with a facial expression component, which is sparse with respect to the whole image. This assumption leads to a dictionary-based component separation algorithm, which benefits from the idea of sparsity and morphological diversity. The DCS algorithm uses the data-driven dictionaries to decompose an expressive test face into its constituent components. The sparse codes we obtain as a result of this decomposition are then used for joint face and expression recognition
Statistical Approaches to Inferring Object Shape from Single Images
Depth inference is a fundamental problem of computer vision with a broad range of potential applications. Monocular depth inference techniques, particularly shape from shading dates back to as early as the 40's when it was first used to study the shape of the lunar surface. Since then there has been ample research to develop depth inference algorithms using monocular cues. Most of these are based on physical models of image formation and rely on a number of simplifying assumptions that do not hold for real world and natural imagery. Very few make use of the rich statistical information contained in real world images and their 3D information. There have been a few notable exceptions though. The study of statistics of natural scenes has been concentrated on outdoor scenes which are cluttered. Statistics of scenes of single objects has been less studied, but is an essential part of daily human interaction with the environment. Inferring shape of single objects is a very important computer vision problem which has captured the interest of many researchers over the past few decades and has applications in object recognition, robotic grasping, fault detection and Content Based Image Retrieval (CBIR). This thesis focuses on studying the statistical properties of single objects and their range images which can benefit shape inference techniques. I acquired two databases: Single Object Range and HDR (SORH) and the Eton Myers Database of single objects, including laser-acquired depth, binocular stereo, photometric stereo and High Dynamic Range (HDR) photography. I took a data driven approach and studied the statistics of color and range images of real scenes of single objects along with whole 3D objects and uncovered some interesting trends in the data. The fractal structure of natural images was previously well known, and thought to be a universal property. However, my research showed that the fractal structure of single objects and surfaces is governed by a wholly different set of rules. Classical computer vision problems of binocular and multi-view stereo, photometric stereo, shape from shading, structure from motion, and others, all rely on accurate and complete models of which 3D shapes and textures are plausible in nature, to avoid producing unlikely outputs. Bayesian approaches are common for these problems, and hopefully the findings on the statistics of the shape of single objects from this work and others will both inform new and more accurate Bayesian priors on shape, and also enable more efficient probabilistic inference procedures
Lunar navigation study, sections 1 through 7 Final report, Jun. 1964 - May 1965
Lunar navigation analysis using passive nongyro, inertial navigation, and radio frequency technolog
- …