    3D face structure extraction from images at arbitrary poses and under arbitrary illumination conditions

    In the wake of 9/11, face detection and recognition have become important tools for homeland security, helping to track and identify suspects who might be planning terrorist attacks. The technology has also proven useful to law enforcement agencies, helping to identify or narrow down possible suspects from surveillance footage of a crime scene, or to find a suspect quickly from witness descriptions. In this thesis we introduce several improvements to morphable-model-based algorithms and use the 3D face structures extracted from multiple images to conduct illumination analysis and face recognition experiments. We present an enhanced Active Appearance Model (AAM) whose several sub-models are updated independently, giving the model more flexibility and better feature localization. Most appearance-based models suffer from the unpredictability of the facial background, which can lead to poor boundary extraction. To overcome this problem we propose local projection models that accurately locate face boundary landmarks. We also introduce a novel, unbiased cost function that casts face alignment as an optimization problem in which shape constraints obtained from direct motion estimation are incorporated to achieve a much higher convergence rate and more accurate alignment. Viewing angles are roughly categorized into four poses, and customized view-based AAMs align face images within each pose category. We also attempt to obtain individual 3D face structures by morphing a generic 3D face model to fit individual faces. The face contour is generated dynamically so that the morphed face looks realistic. To overcome the correspondence problem between facial feature points on the generic and the individual face, we use an approach based on distance maps. 
With the extracted 3D face structure, we study the effects of illumination on appearance using spherical harmonic illumination analysis. By normalizing the illumination conditions across different facial images, we extract a global illumination-invariant texture map which, together with the extracted 3D face structure in the form of cubic morphing parameters, completely encodes an individual face and allows the generation of images at arbitrary pose and under arbitrary illumination. Face recognition is conducted based on the face shape matching error, the texture error, and the illumination-normalized texture error. Experiments show that a higher face recognition rate is achieved by compensating for illumination effects. Furthermore, the fusion of shape and texture information results in better performance than using either shape or texture information alone. Ph.D., Electrical Engineering -- Drexel University, 200
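
The illumination normalization mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a second-order (nine-term) real spherical harmonic lighting model and a Lambertian surface; the function names, the coefficient layout, and the convention that `light_coeffs` already absorbs the Lambertian attenuation factors are assumptions made for illustration, not details taken from the thesis.

```python
def sh_basis(normal):
    """Nine real spherical harmonic basis values (orders 0 to 2) for a unit normal."""
    x, y, z = normal
    return [
        0.282095,                        # Y_0,0 (constant term)
        0.488603 * y,                    # Y_1,-1
        0.488603 * z,                    # Y_1,0
        0.488603 * x,                    # Y_1,1
        1.092548 * x * y,                # Y_2,-2
        1.092548 * y * z,                # Y_2,-1
        0.315392 * (3.0 * z * z - 1.0),  # Y_2,0
        1.092548 * x * z,                # Y_2,1
        0.546274 * (x * x - y * y),      # Y_2,2
    ]


def shading(normal, light_coeffs):
    """Shading predicted by nine lighting coefficients (hypothetical layout)."""
    return sum(c * b for c, b in zip(light_coeffs, sh_basis(normal)))


def normalize_texture(intensity, normal, light_coeffs, eps=1e-6):
    """Divide out the predicted shading to estimate an illumination-free texture value.

    Returns None where the predicted shading is too small to divide by safely.
    """
    s = shading(normal, light_coeffs)
    if s < eps:
        return None
    return intensity / s
```

Applied per pixel with normals taken from a fitted 3D face model, such a division would yield one texture value per surface point; how the thesis estimates the lighting coefficients is not specified in the abstract.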


    This dissertation addresses the problem of inferring scene depth information from a collection of calibrated images taken from different viewpoints via stereo matching. Although it has been heavily investigated for decades, depth from stereo remains a long-standing challenge and a popular research topic for several reasons. First, to be of practical use in real-time applications such as autonomous driving, depth estimation must be both accurate and fast, which is one of the core challenges in stereo. Second, for applications such as 3D reconstruction and view synthesis, high-quality depth estimation is crucial to achieving photorealistic results. However, due to matching ambiguities, accurate dense depth estimates are difficult to achieve. Last but not least, most stereo algorithms rely on identifying corresponding points among images and only work effectively when scenes are Lambertian. For non-Lambertian surfaces, the brightness constancy assumption is no longer valid. This dissertation contributes three novel stereo algorithms motivated by the specific requirements and limitations imposed by different applications. In addressing high-speed depth estimation from images, we present a stereo algorithm that achieves high-quality results while maintaining real-time performance. We introduce an adaptive aggregation step in a dynamic-programming framework. Matching costs are aggregated in the vertical direction using a computationally expensive weighting scheme based on color and distance proximity. We utilize the vector processing capability and parallelism of commodity graphics hardware to speed up this process by more than two orders of magnitude. In addressing high-accuracy depth estimation, we present a stereo model that makes use of constraints from points with known depths, referred to in the stereo literature as Ground Control Points (GCPs). 
Our formulation explicitly models the influence of GCPs in a Markov Random Field. A novel regularization prior is integrated into a global inference framework in a principled way using Bayes' rule. Our probabilistic framework allows GCPs to be obtained from various modalities and provides a natural way to integrate information from various sensors. In addressing non-Lambertian reflectance, we introduce a new invariant for stereo correspondence which allows completely arbitrary scene reflectance (bidirectional reflectance distribution functions, BRDFs). This invariant can be used to formulate a rank constraint on stereo matching when the scene is observed under several lighting configurations in which only the lighting intensity varies.
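
The vertical cost aggregation described above can be sketched as follows. This is a minimal, CPU-only illustration under stated assumptions: the exponential weight, the parameters `gamma_c` and `gamma_d`, and the use of a grayscale image in place of color are all hypothetical choices, since the abstract only says that weights depend on color and distance proximity.

```python
import math


def aggregate_vertical(cost, image, radius=2, gamma_c=10.0, gamma_d=5.0):
    """Aggregate matching costs along each column with adaptive weights.

    cost[y][x]  : matching cost of pixel (x, y) for one disparity hypothesis
    image[y][x] : grayscale intensity (stand-in for color in this sketch)
    A neighbor contributes more when it is similar in intensity to the
    center pixel and close to it in the vertical direction.
    """
    h, w = len(cost), len(cost[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = 0.0
            den = 0.0
            for dy in range(-radius, radius + 1):
                yy = y + dy
                if 0 <= yy < h:
                    dc = abs(image[yy][x] - image[y][x])
                    weight = math.exp(-dc / gamma_c - abs(dy) / gamma_d)
                    num += weight * cost[yy][x]
                    den += weight
            out[y][x] = num / den  # den >= 1 because the center weight is 1
    return out
```

Because every output pixel is computed independently, a loop of this shape maps naturally onto parallel hardware, which is consistent with the graphics-hardware speed-up the abstract reports.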

    This document is the final report for the View Generated Database (VGD) project, NAS7-1066. It documents the work done on the project up to the point at which all project work was terminated due to lack of project funds. The VGD was to provide the capability to accurately represent any real-world object or scene as a computer model. Such models include an accurate spatial/geometric representation of the surfaces of the object or scene as well as any surface detail present on the object. Applications of such models are numerous, including acquisition and maintenance of work models for tele-autonomous systems, generation of accurate 3-D geometric/photometric models for various 3-D vision systems, and graphical models for realistic rendering of 3-D scenes via computer graphics.

    Image-based 3D reconstruction of surfaces with highly complex reflectance properties

    The camera-based acquisition of the environment has become an ordinary task in today’s society, in science as much as in everyday life. Smartphone cameras are employed in interactive video games and augmented reality, just as industrial quality inspection, remote sensing, robotics and autonomous vehicles rely on camera sensors to analyze the outside world. One crucial aspect of the automated analysis is the retrieval of the 3D structure of unknown objects in the scene (be it for collision prevention, grabbing, or comparison to a CAD model) from the acquired image data. Reflectance-based surface reconstruction methods form a valuable part of the set of camera-based algorithms. Stereo cameras exploit geometrical optics to triangulate the 3D position of a scene point, while photometric procedures require only one camera and estimate a surface gradient field based on the shading of an object. To achieve this, the reflectance properties of the object have to be known, which results in a chicken-and-egg problem on unknown objects: the surface shape has to be available to approximate the reflectance properties, and the reflectance properties have to be known to estimate the surface shape. This situation is circumvented on Lambertian surfaces, yet the surfaces of interest in real-world applications exhibit much more complex reflectance properties, for which the problem remains. The challenge of estimating the unknown spatially varying bidirectional reflectance distribution function (BRDF) parameters of an object of approximately known shape is approached from a Bayesian perspective, employing reversible jump Markov chain Monte Carlo methods to infer both the reflectance parameters and the surface regions that show similar reflectance properties by sampling the posterior distributions of the data. 
A significant advantage compared to non-linear least squares estimates is the availability of statistical information that can directly be used to evaluate the accuracy of the inferred patches and parameters. In the evaluation of the method, the derived patches accurately separate a synthetic and a laboratory dataset into meaningful segments. The reflectance of the synthetic dataset is almost perfectly reproduced and misestimated BRDF parameters underline the necessity for a large dataset to apply statistical inference. The real-world dataset reveals the inherent problems of BRDF estimation in the presence of cast shadows and interreflections. Furthermore, a procedure that is suitable to calibrate a two-camera photometric stereo acquisition setup is examined. The calibration is based on multiple images of a diffuse spherical object that is located in corresponding images. Although the calibration object is supposed to be perfectly diffuse by design, considering a specular Phong component in addition to the Lambertian BRDF model increases the accuracy of the rendered images. The light source positions are initialized based on stereo geometry and optimized by minimizing the intensity error between measured and rendered images of the calibration object. Ultimately, this dissertation tackles the task of image-based surface reconstruction with the contribution of two novel algorithms. The first one computes an initial approximation of the 3D shape based on the diffuse component of the reflectance and iteratively refines this rough guess with gradient fields calculated from photometric stereo assuming a combination of the BRDF models of Lambert and Blinn. The second method computes the surface gradient fields for both views of a stereo camera setup and updates the estimated depth subject to Horn’s integrability constraint and a new regularization term that accounts for the disparity offset between the two matching gradient fields. 
Both procedures are evaluated on objects that exhibit complex reflectance properties and challenging shapes. A fringe projection 3D scanner is used for reference data and error assessment. Small details that are not visible in the coarse initial 3D data supplied to the first algorithm are recovered based on the high-quality gradient data obtained from photometric stereo. The error of the test data with respect to the reference scanner is less than 0.3 mm. In contrast to the first method, which computes shape information, the stereo camera algorithm yields absolute 3D data and produces very good reconstruction results on all datasets. The proposed method even surpasses the reconstruction accuracy of the 3D scanner on a metallic dataset. This is a notable contribution, as most existing camera-based surface reconstruction methods exclusively handle diffusely reflecting objects, and those that focus on non-Lambertian objects still struggle with highly specular metallic surfaces.
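
As a point of reference for the photometric stereo step discussed above, the following minimal sketch recovers the albedo, the surface normal and the gradient at one pixel from three images under known light directions. It assumes a purely Lambertian surface, so it covers only the diffuse part of the Lambert and Blinn combination used in the dissertation; the function names and the use of Cramer's rule are illustrative assumptions.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def lambertian_normal(lights, intensities):
    """Solve L g = I for g = albedo * normal at a single pixel.

    lights      : three unit light directions (rows of L)
    intensities : the three measured intensities at this pixel
    Returns (albedo, unit normal, (p, q)) where p = dz/dx and q = dz/dy.
    """
    d = det3(lights)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be linearly independent")
    g = []
    for k in range(3):
        # Cramer's rule: replace column k of L with the intensity vector.
        m = [row[:] for row in lights]
        for r in range(3):
            m[r][k] = intensities[r]
        g.append(det3(m) / d)
    albedo = math.sqrt(sum(v * v for v in g))
    if albedo < 1e-12:
        raise ValueError("pixel is too dark to estimate a normal")
    normal = [v / albedo for v in g]
    if abs(normal[2]) < 1e-12:
        raise ValueError("normal is perpendicular to the viewing direction")
    # For a surface z(x, y) the normal is proportional to (-p, -q, 1).
    p = -normal[0] / normal[2]
    q = -normal[1] / normal[2]
    return albedo, normal, (p, q)
```

A gradient field (p, q) obtained this way is integrable only if the derivative of p with respect to y equals the derivative of q with respect to x, which is the condition that Horn's integrability constraint enforces.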

    Data driven analysis of faces from images

    This thesis proposes three new data-driven approaches to detect, analyze, or modify faces in images. All presented contributions are inspired by the use of prior knowledge, and they derive information about facial appearance from pre-collected databases of images or 3D face models. First, we contribute an approach that extends a widely used monocular face detector with an additional classifier that evaluates disparity maps from a passive stereo camera. The algorithm runs in real time and significantly reduces the number of false positives compared to the monocular approach. Next, with a many-core implementation of the detector, we train view-dependent face detectors on tailored, synthetically generated views which guarantee that the statistical variability is fully covered. These detectors are superior to the state of the art on a challenging dataset and can be trained in an automated procedure. Finally, we contribute a model describing the relation between facial appearance and makeup. The approach extracts makeup from before/after images of faces and allows faces in images to be modified. Applications such as machine-suggested makeup can improve perceived attractiveness, as shown in a perceptual study. In summary, the presented methods help improve the outcome of face detection algorithms and ease and automate both their training procedures and the modification of faces in images. Moreover, their data-driven nature enables new and powerful applications arising from the use of prior knowledge and statistical analyses.

    Shadow segmentation and tracking in real-world conditions

    Visual information, in the form of images and video, comes from the interaction of light with objects. Illumination is a fundamental element of visual information. Detecting and interpreting illumination effects is part of our everyday visual experience. Shading, for instance, allows us to perceive the three-dimensional nature of objects. Shadows are particularly salient cues for inferring depth information. However, we do not make any conscious or unconscious effort to avoid them as if they were an obstacle when we walk around. Moreover, when humans are asked to describe a picture, they generally omit illumination effects such as shadows, shading, and highlights, and instead give a list of objects and their relative positions in the scene. Processing visual information in a way that is close to what the human visual system does, and thus being aware of illumination effects, is a challenging task for computer vision systems. Illumination phenomena in fact interfere with fundamental tasks in image analysis and interpretation applications, such as object extraction and description. On the other hand, illumination conditions are an important element to consider when creating new and richer visual content that combines objects from different sources, both natural and synthetic. When taken into account, illumination effects can play an important role in achieving realism. Among illumination effects, shadows are often an integral part of natural scenes and one of the elements contributing to the naturalness of synthetic scenes. In this thesis, the problem of extracting shadows from digital images is discussed. A new analysis method for the segmentation of cast shadows in still and moving images without the need for human supervision is proposed. 
The problem of separating moving cast shadows from moving objects in image sequences is particularly relevant for an ever wider range of applications, from video analysis to video coding, and from video manipulation to interactive environments. Therefore, particular attention has been dedicated to the segmentation of shadows in video. The validity of the proposed approach is, however, also demonstrated through its application to the detection of cast shadows in still color images. Shadows are a difficult phenomenon to model. Their appearance changes with the appearance of the surface they are cast upon. It is therefore important to exploit multiple constraints derived from the analysis of the spectral, geometric and temporal properties of shadows to develop effective techniques for their extraction. The proposed method combines an analysis of color information and of photometric invariant features with a spatio-temporal verification process. With regard to the use of color information for shadow analysis, a complete picture of the existing solutions is provided, which points out the fundamental assumptions, the adopted color models and the link with research problems such as computational color constancy and color invariance. The proposed spatial verification makes no assumptions about scene geometry or object shape. The temporal analysis is based on a novel shadow tracking technique. On the basis of the tracking results, a temporal reliability estimation of shadows is proposed, which allows shadows that lack temporal coherence to be discarded. The proposed approach is general and can be applied to a wide class of applications and input data. The proposed cast shadow segmentation method has been evaluated on a number of different video sequences representing indoor and outdoor real-world environments. 
The obtained results have confirmed the validity of the approach, in particular its ability to deal with different types of content and its robustness to different physically important independent variables, and have demonstrated the improvement with respect to the state of the art. Examples of the application of the proposed shadow segmentation tool to the enhancement of video object segmentation, tracking and description operations, and to video composition, have demonstrated the advantages of shadow-aware video processing.
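
The combination of color information and photometric invariants described above can be illustrated with a simple per-pixel test. This is a generic sketch and not the method of the thesis: it assumes a known background color for each pixel and treats normalized rgb chromaticity as the invariant feature, and all names and thresholds are hypothetical.

```python
def is_shadow_candidate(pixel, background, min_ratio=0.3, max_ratio=0.9,
                        chroma_tol=0.05):
    """Flag a pixel as a cast-shadow candidate.

    pixel, background : (R, G, B) tuples for the current frame and a reference
    A candidate must be darker than the background by a bounded factor while
    keeping roughly the same normalized rgb chromaticity.
    """
    sum_p = sum(pixel)
    sum_b = sum(background)
    if sum_p == 0 or sum_b == 0:
        return False  # chromaticity is undefined for black pixels
    ratio = sum_p / sum_b
    if not (min_ratio <= ratio <= max_ratio):
        return False  # not darkened, or darkened too much to be a shadow
    chroma_p = [c / sum_p for c in pixel]
    chroma_b = [c / sum_b for c in background]
    return all(abs(a - b) <= chroma_tol for a, b in zip(chroma_p, chroma_b))
```

Candidates produced by a test of this kind would still need the spatial and temporal verification that the abstract describes before being accepted as shadows.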

    Augmented reality for non-rigid surfaces

    Augmented Reality (AR) is the process of integrating virtual elements into reality, often by mixing computer graphics into a live video stream of a real scene. It requires registration of the target object with respect to the cameras. To this end, some approaches rely on dedicated hardware, such as magnetic trackers or infra-red cameras, but these are too expensive and cumbersome to reach a large public. Others are based on specifically designed markers, which usually look like bar-codes. However, they alter the look of the objects to be augmented, thereby hindering their use in applications for which visual design matters. Recent advances in Computer Vision have made it possible to track and detect objects by relying on natural features. However, no such method is commonly used in the AR community, because the available packages are not yet mature enough. As far as deformable surfaces are concerned, the choice is even more limited, mainly because initialization is so difficult. Our main contribution is therefore a new AR framework that can properly augment deforming surfaces in real time. Its target platform is a standard PC and a single webcam. It does not require any complex calibration procedure, making it well suited to novice end-users. To satisfy the most demanding application designers, our framework does not require any scene engineering, renders virtual objects illuminated by real light, and lets real elements occlude virtual ones. To meet this challenge, we developed several innovative techniques. Our approach to real-time registration of a deforming surface is based on wide-baseline feature matching. However, traditional outlier elimination techniques such as RANSAC are unable to handle the large number of degrees of freedom of a non-rigid surface. We therefore propose a new robust estimation scheme that allows both 2-D and 3-D non-rigid surface registration. 
Another issue of critical importance for realism in AR is illumination handling, for which existing techniques often require setup procedures or devices such as reflective spheres. By contrast, our framework includes methods to estimate illumination for rendering purposes without sacrificing ease of use. Finally, several existing approaches to handling occlusions in AR rely on multiple cameras or can only deal with occluding objects modeled beforehand. Ours requires only one camera and models occluding objects at runtime. We incorporated these components into a consistent and flexible framework. We used it to augment many different objects, such as a deforming T-shirt or a sheet of paper, under challenging conditions, in real time, and with correct handling of illumination and occlusions. We also used our non-rigid surface registration technique to measure the shape of deformed sails. We validated the ease of deployment of our framework by distributing a software package and letting an artist use it to create two AR applications.
