    Phenomenological modeling of image irradiance for non-Lambertian surfaces under natural illumination.

    Various vision tasks are confronted by appearance variations due to changes in illumination. In face recognition, for instance, it has been shown that the variability in facial appearance owes more to changes in lighting conditions than to changes in a person's identity. In theory, because the lighting function is arbitrary, the space of all possible images of a fixed-pose object under all possible illumination conditions is infinite-dimensional. Nonetheless, it has been proven that the set of images of a convex Lambertian surface under distant illumination lies near a low-dimensional linear subspace, a result later extended to non-Lambertian objects with non-convex geometry. Vision applications concerned with recovering illumination, reflectance, or surface geometry from images would therefore benefit from a low-dimensional generative model that captures appearance variations with respect to illumination conditions and surface reflectance properties, since such a model lets these inverse problems be formulated as parameter estimation.

    Typically, subspace construction boils down to performing a dimensionality-reduction scheme, e.g. Principal Component Analysis (PCA), on a large set of (real or synthesized) images of the object(s) of interest under fixed pose but different illumination conditions. This approach has two major problems. First, the acquired or rendered image ensemble must be statistically significant, capturing the full behavior of the sources of variation of interest, in particular illumination and reflectance. Second, the curse of dimensionality hinders numerical methods such as Singular Value Decomposition (SVD), which become intractable with a large number of large-sized realizations in the image ensemble. One way to bypass the need for a large image ensemble is to construct appearance subspaces using phenomenological models, which capture appearance variations through a mathematical abstraction of the reflection process. In particular, the harmonic expansion of the image irradiance equation, formulated in a convolution framework, can be used to derive an analytic subspace representing images under fixed pose but different illumination conditions. Owing to their low-frequency nature, irradiance signals can be represented using low-order basis functions, among which Spherical Harmonics (SH) have been extensively adopted.

    An ideal solution to the image irradiance (appearance) modeling problem should incorporate complex illumination, cast shadows, and realistic surface reflectance properties, moving away from the simplifying assumptions of Lambertian reflectance and single-source distant illumination. By handling arbitrarily complex illumination and non-Lambertian reflectance, the appearance model proposed in this dissertation moves the state of the art closer to that ideal. This work primarily addresses the geometrical compliance of the hemispherical basis for representing surface reflectance, presenting a compact yet accurate representation for arbitrary materials. To maintain the plausibility of the resulting appearance, the proposed basis is constructed so as to satisfy the Helmholtz reciprocity property while avoiding high computational complexity. Representing illumination in the spherical domain and surface reflectance in the hemispherical domain, while complying with the physical properties of surface reflectance, is expected to approximate image irradiance more accurately than a representation confined to the spherical domain.

    Discounting subsurface scattering and surface emittance, this work proposes a surface reflectance basis, based on hemispherical harmonics (HSH), defined on the Cartesian product of the incoming and outgoing local hemispheres (i.e., with respect to surface points). This basis obeys the physical properties of surface reflectance, namely reciprocity and energy conservation. The basis functions are validated on analytical reflectance models as well as on scattered reflectance measurements; measurements that violate Helmholtz reciprocity are effectively filtered by projecting them onto the subspace spanned by the proposed basis, where reciprocity is preserved in the least-squares sense. The image formation process for isotropic surfaces under arbitrary distant illumination is also formulated in frequency space, where the orthogonality relation between the illumination and reflectance bases is encoded in what are termed irradiance harmonics. These harmonics decouple the effects of illumination and reflectance from the underlying pose and geometry. Further, a bilinear approach to analytically constructing the irradiance subspace is proposed to tackle the inherent small-sample-size problem and the curse of dimensionality: finding the analytic subspace is posed as establishing a relation between its principal components and those of the irradiance harmonics basis functions. It is also shown how to incorporate prior information about natural illumination and real-world surface reflectance characteristics in order to capture the full behavior of complex illumination and non-Lambertian reflectance.

    The presented theoretical framework is then used to develop practical shape-recovery algorithms in which the hitherto-standard Lambertian assumption is relaxed. From a single image under unknown general illumination, the underlying geometric structure can be recovered while accounting explicitly for the object's reflectance characteristics (e.g., human skin types for facial images and tooth reflectance for human jaw reconstruction) as well as for complex illumination conditions. Experiments on synthetic and real images illustrate the robustness of the proposed appearance model with respect to illumination variation. Keywords: computer vision, computer graphics, shading, illumination modeling, reflectance representation, image irradiance, frequency space representations, (hemi)spherical harmonics, analytic bilinear PCA, model-based bilinear PCA, 3D shape reconstruction, statistical shape from shading
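    For concreteness, here is a minimal sketch of the analytic-subspace idea in its simplest, Lambertian special case (an illustration, not the dissertation's code): the first nine real spherical harmonics span the order-2 irradiance subspace, and lighting coefficients can be recovered from per-pixel intensities and normals by least squares. The function names and fitting setup are assumptions made for this example.

```python
import numpy as np

def sh_basis_order2(normals):
    """Evaluate the first 9 real spherical harmonics at unit normals.

    normals: (N, 3) array of unit vectors. Returns an (N, 9) basis matrix.
    Constants follow the standard real-SH normalization used in the
    irradiance literature."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),   # Y_0^0
        0.488603 * y,                 # Y_1^-1
        0.488603 * z,                 # Y_1^0
        0.488603 * x,                 # Y_1^1
        1.092548 * x * y,             # Y_2^-2
        1.092548 * y * z,             # Y_2^-1
        0.315392 * (3 * z**2 - 1),    # Y_2^0
        1.092548 * x * z,             # Y_2^1
        0.546274 * (x**2 - y**2),     # Y_2^2
    ], axis=1)

def fit_lighting(intensities, normals):
    """Least-squares projection of observed irradiance onto the 9D
    analytic subspace; returns the SH lighting coefficients."""
    B = sh_basis_order2(normals)
    coeffs, *_ = np.linalg.lstsq(B, intensities, rcond=None)
    return coeffs
```

    The dissertation's contribution goes beyond this special case, replacing the spherical reflectance basis with hemispherical harmonics on the product of incoming and outgoing hemispheres to handle non-Lambertian materials.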

    BRDF Estimation for Faces from a Sparse Dataset Using a Neural Network

    Subspace Representations for Robust Face and Facial Expression Recognition

    Analyzing human faces and modeling their variations have long been of interest to the computer vision community. Face analysis based on 2D intensity images is a challenging problem, complicated by variations in pose, lighting, and blur, and by non-rigid facial deformations due to facial expressions. Among these sources of variation, facial expressions are of particular interest as an important channel of non-verbal communication; their analysis is further complicated by changes in viewpoint and by inter-subject variations in how expressions are performed. This dissertation addresses some of the challenges in developing robust algorithms for face and facial expression recognition by exploiting proper subspace representations of the data.

    Variations in the visual appearance of an object arise mostly from changes in illumination and pose. We first present a video-based sequential algorithm for estimating the face albedo as an illumination-insensitive signature for face recognition. We show that, given the pose of the face at each frame of a sequence, the albedo can be efficiently estimated using a Kalman filter. We then extend this to the case of unknown pose by simultaneously tracking the pose and updating the albedo through an efficient Bayesian inference method implemented with a Rao-Blackwellized particle filter.

    Since understanding the effects of blur, especially motion blur, is an important problem in unconstrained visual analysis, we then propose a blur-robust recognition algorithm for faces with spatially varying blur. We model a blurred face as a weighted average of geometrically transformed instances of its clean face, and build, for each gallery face, a matrix whose column space spans the space of all motion-blurred images obtainable from the clean face. This matrix representation is then used to define a suitable objective function and perform blur-robust face recognition.

    To develop robust and generalizable models for expression analysis, one needs to break the models' dependence on the choice of the camera's coordinate frame. To this end, we build models for expressions on the affine shape-space (a Grassmann manifold), as an approximation to the projective shape-space, using a Riemannian interpretation of the deformations that facial expressions cause on different parts of the face. This representation enables various expression analysis and recognition algorithms without pose normalization as a preprocessing step.

    The large inter-subject variation in performing expressions poses an important challenge to developing robust facial expression recognition algorithms. To address it, we propose a dictionary-based approach that decomposes expressions in terms of action units (AUs). First, we construct an AU dictionary using domain experts' knowledge of AUs. To incorporate high-level knowledge about expression decomposition and AUs, we then perform structure-preserving sparse coding by imposing two layers of grouping, over the AU-dictionary atoms and over the columns of the test image matrix. The computed sparse code matrix for each expressive face is used to perform expression decomposition and recognition.

    Most existing methods consider either expression-invariant face recognition or identity-independent facial expression recognition. We instead propose joint face and facial expression recognition using a dictionary-based component separation (DCS) algorithm. In this approach, a given expressive face is viewed as a superposition of a neutral face component and a facial expression component that is sparse with respect to the whole image. This assumption leads to a component separation algorithm that benefits from the ideas of sparsity and morphological diversity: the DCS algorithm uses data-driven dictionaries to decompose an expressive test face into its constituent components, and the resulting sparse codes are used for joint face and expression recognition.
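    As a hedged illustration of the blur-robust matching idea (a minimal sketch under simplifying assumptions, not the authors' implementation), the snippet below builds, for one gallery face, a matrix whose columns are small translations of the clean image, so that a motion-blurred probe lies approximately in its column space; the probe is scored by the residual of a non-negative least-squares fit of the blur-kernel weights. The shift set, the use of circular shifts, and the NNLS scoring are assumptions for this example.

```python
import numpy as np
from scipy.optimize import nnls

def blur_matrix(gallery_img, max_shift=3):
    """Columns are the vectorized gallery image translated (circularly,
    for simplicity) by small integer offsets; their non-negative
    combinations approximate motion-blurred versions of the clean face."""
    cols = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cols.append(np.roll(gallery_img, (dy, dx), axis=(0, 1)).ravel())
    return np.stack(cols, axis=1)

def match_score(probe_img, gallery_img, max_shift=3):
    """Smaller is better: residual of fitting the blurred probe as a
    non-negative combination of shifted gallery images (the weights
    play the role of a discrete motion-blur kernel)."""
    A = blur_matrix(gallery_img, max_shift)
    weights, residual = nnls(A, probe_img.ravel())
    return residual
```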

    Reconstructing Geometry from Its Latent Structures

    Our world is full of objects with complex shapes and structures. Through extensive experience, humans quickly develop an intuition about how objects are shaped and what their material properties are simply by analyzing their appearance, and we engage this intuitive understanding of geometry in nearly everything we do. It is not surprising, then, that a careful treatment of geometry stands to give machines a powerful advantage in the many tasks of visual perception. To that end, this thesis focuses on geometry recovery in a wide range of real-world problems.

    First, we describe a new approach to image registration. We observe that the structure of the imaged subject becomes embedded in the image intensities; by minimizing the change in shape of these intensity structures, we ensure a physically realizable deformation. Second, we describe a method for reassembling fragmented, thin-shelled objects from range images of their fragments using only the geometric and photometric structure embedded in the boundary of each fragment. Third, we describe a method for recovering and representing the shape of a geometric texture (such as bark or sandpaper) by studying the characteristic properties of texture: self-similarity and scale variability. Finally, we describe two methods for recovering the 3D geometry and reflectance properties of an object from images taken under natural illumination, noting that the structure of the surrounding environment, modulated by the reflectance, becomes embedded in the appearance of the object and gives strong clues about the object's shape.

    Though these domains are quite diverse, an essential premise, namely that observations of objects contain within them salient clues about the object's structure, enables new and powerful approaches. For each problem we begin by investigating what these clues are, then derive models and methods to canonically represent them and enable their full exploitation. The wide-ranging success of each method shows the importance of our carefully formulated observations about geometry, and the fundamental role geometry plays in visual perception. Ph.D., Computer Science -- Drexel University, 201
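    The registration idea can be made concrete with a generic sketch (an assumption-laden stand-in: the thesis minimizes shape change of intensity structures, whereas this example uses a standard sum-of-squared-differences data term with a smoothness penalty on the displacement field as one common way to keep the deformation physically plausible):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def register_step(I0, I1, u, v, lam=0.1, step=0.25):
    """One gradient-descent step on a generic deformable-registration
    energy: SSD between the fixed image I0 and the warped moving image
    I1, plus a smoothness penalty on the displacement field (u, v)."""
    H, W = I0.shape
    yy, xx = np.mgrid[0:H, 0:W].astype(float)
    # Warp the moving image by the current displacement field.
    I1w = map_coordinates(I1, [yy + v, xx + u], order=1, mode='nearest')
    r = I1w - I0                    # residual of the data term
    gy, gx = np.gradient(I1w)       # gradient of the warped image
    # Discrete Laplacian: gradient of the smoothness penalty is -lam * lap.
    lap = lambda f: (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                     np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4 * f)
    u -= step * (r * gx - lam * lap(u))
    v -= step * (r * gy - lam * lap(v))
    return u, v
```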

    Recognizing Human Faces: Physical Modeling and Pattern Classification

    Although significant work has been done in the field of face recognition, the performance of state-of-the-art face recognition algorithms is not yet good enough for operational systems. Most algorithms work well on controlled images but are quite susceptible to changes in illumination, pose, and similar factors. In this dissertation, we propose methods that address these issues in order to recognize faces in more realistic scenarios. The developed approaches show the importance of physical modeling, contextual constraints, and pattern classification for this task.

    For still-image-based face recognition, we develop an algorithm that recognizes faces illuminated by arbitrarily placed, multiple light sources, given just a single image. Though the problem is ill-posed in its generality, linear approximations to the subspace of Lambertian images, combined with rank constraints on the unknown facial shape and albedo, make it tractable. In addition, we develop a purely geometric, illumination-invariant matching algorithm that exploits the bilateral symmetry of human faces. In particular, we prove that the set of images of bilaterally symmetric objects can be partitioned into equivalence classes such that two objects belonging to different equivalence classes can always be distinguished using just one image per object.

    For recognizing faces in videos, the challenge lies in suitably characterizing faces using the information available in the video. We propose a method that models a face as a linear dynamical system whose appearance changes with pose. Though this method performs very well on the available datasets, it does not explicitly take 3D structure or illumination conditions into account. To address these issues, we propose an algorithm for 3D facial pose tracking in videos that combines the structural advantages of geometric modeling with the statistical advantages of particle-filter-based inference to recover the 3D configuration of facial features in each frame; the recovered configuration parameters are then used to recognize faces in videos.

    From a pattern classification point of view, automatic face recognition presents a unique challenge due to the presence of just one (or a few) sample(s) per identity. To address this, we develop a cohort-based framework that exploits the large number of non-match samples present in the database to improve verification and identification performance.
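    To make the Lambertian-subspace observation concrete (a minimal sketch, not the dissertation's algorithm): in the shadow-free Lambertian model, images of one face under varying distant point sources stack into a matrix of rank at most 3, so a truncated SVD factors it into pseudo-lighting and pseudo-shape up to a 3x3 linear ambiguity.

```python
import numpy as np

def lambertian_rank3_factor(images):
    """images: (num_lights, num_pixels) matrix, one vectorized image per
    row, all of the same face under different distant point sources.
    Shadow-free Lambertian model: images = L @ S, with L the scaled
    light directions (num_lights, 3) and S the albedo-scaled normals
    (3, num_pixels), so rank(images) <= 3. Recover both factors up to
    an invertible 3x3 ambiguity by truncating the SVD."""
    U, s, Vt = np.linalg.svd(images, full_matrices=False)
    L = U[:, :3] * np.sqrt(s[:3])            # pseudo-lighting
    S = np.sqrt(s[:3])[:, None] * Vt[:3]     # pseudo-shape
    return L, S
```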

    Analysis of 3D Face Reconstruction

    This thesis investigates the long-standing problem of 3D reconstruction from a single 2D face image. Face reconstruction from a single 2D face image is an ill-posed problem involving estimation of the intrinsic and extrinsic camera parameters, light parameters, shape parameters, and texture parameters. The proposed approach has many potential applications in law enforcement, surveillance, medicine, computer games, and the entertainment industries. The problem is addressed in an analysis-by-synthesis framework by reconstructing a 3D face model from identity photographs, a widely used medium for face identification found on identity cards and passports.

    The novel contribution of this thesis is a new technique for creating 3D face models from a single 2D face image. The proposed method uses an improved dense 3D correspondence obtained with rigid and non-rigid registration techniques, whereas existing reconstruction methods establish 3D correspondence using optical flow. The resulting 3D face database is used to create a statistical shape model. Existing reconstruction algorithms recover shape by optimizing over all parameters simultaneously; the proposed algorithm instead simplifies the reconstruction problem with a stepwise approach, reducing the dimension of the parameter space and simplifying the optimization. In the alignment step, a generic 3D face is aligned with the given 2D face image using anatomical landmarks. The texture is then warped onto the 3D model using the spatial alignment obtained previously, and the 3D shape is finally recovered by optimizing over the shape parameters while matching the texture-mapped model to the target image.

    This approach has a number of advantages. First, it simplifies the optimization requirements and makes the optimization more robust. Second, there is no need to accurately recover the illumination parameters. Third, there is no need to recover texture parameters via texture synthesis. Fourth, quantitative analysis is used to improve the quality of reconstruction by improving the cost function, whereas previous methods relied on qualitative measures such as visual analysis and face recognition rates to evaluate reconstruction accuracy. The improvement in the cost function's performance results from improving the feature space comprising the landmark and intensity features; previously, the feature space had not been evaluated with respect to reconstruction accuracy, leading to inaccurate assumptions about its behaviour. The proposed approach further simplifies the reconstruction problem by using only identity images, rather than expending effort on overcoming pose, illumination, and expression (PIE) variations. This is reasonable, as frontal face images under standard illumination conditions are widely available and can be utilized for accurate reconstruction; the reconstructed, textured 3D models can then be used to overcome the PIE variations.
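    As an illustration of the alignment step only (a hedged sketch; the 2D treatment of landmarks and all names are assumptions, not the thesis's formulation), the code below fits a similarity transform mapping the projected landmarks of a generic 3D face onto landmarks detected in the image, in the least-squares (Umeyama) sense.

```python
import numpy as np

def fit_similarity_2d(src, dst):
    """Least-squares similarity transform (scale s, rotation R,
    translation t) taking src landmarks onto dst landmarks; both are
    (N, 2) arrays. Returns (s, R, t) with dst ~= s * src @ R.T + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    sign = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    S = np.diag([1.0, sign])        # guard against reflections
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t
```

    In the stepwise pipeline the abstract describes, such an alignment would precede texture warping, with the shape-parameter optimization of the statistical model running last.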

    Challenges in 3D scanning: Focusing on Ears and Multiple View Stereopsis

    Methods for Structure from Motion

    Computer vision in the space of light rays: plenoptic videogeometry and polydioptric camera design

    Most of the cameras used in computer vision, computer graphics, and image processing applications are designed to capture images similar to those we see with our eyes, which makes the visual information easy for a human observer to interpret. Nowadays, though, more and more processing of visual information is done by computers, so it is worth asking whether these human-inspired "eyes" are the optimal choice for processing visual information with a machine.

    In this thesis I describe how one can study problems in computer vision without reference to a specific camera model by studying the geometry and statistics of the space of light rays that surrounds us. Studying the geometry allows us to determine all the constraints that exist in the visual input and could be utilized if we had a perfect sensor. Since no perfect sensor exists, we use signal processing techniques to examine how well the constraints between different sets of light rays can be exploited under a specific camera model. A camera is modeled as a spatio-temporal filter in the space of light rays, which lets us express the image formation process in a function-approximation framework; this framework in turn relates the geometry of the imaging camera to the performance of the vision system on a given task.

    I apply this framework to the problem of camera motion estimation and show how, by choosing the right camera design, we can solve for the camera motion using linear, scene-independent constraints that allow for robust solutions. This is compared to motion estimation using conventional cameras. In addition, we show how to extract spatio-temporal models from multiple video sequences using multi-resolution subdivision surfaces.
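    To make "working in the space of light rays" concrete (an illustrative sketch under assumptions, not the thesis's formulation): a ray can be written in Plücker coordinates (d, m), with direction d and moment m = p x d for any point p on the ray, and a rigid camera motion (R, t) acts on every ray by the same linear map, independent of scene depth. Structure of this kind is what yields linear, scene-independent motion constraints for a polydioptric camera.

```python
import numpy as np

def ray_plucker(p, d):
    """Plücker coordinates (direction, moment) of the ray through
    point p with direction d."""
    d = d / np.linalg.norm(d)
    return d, np.cross(p, d)

def transform_ray(d, m, R, t):
    """Action of the rigid motion x -> R x + t on a Plücker ray.
    The map (d, m) -> (R d, R m + t x R d) is linear in (d, m) and
    involves no scene geometry."""
    d_new = R @ d
    m_new = R @ m + np.cross(t, d_new)
    return d_new, m_new
```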