2,234 research outputs found

    A statistical multiresolution approach for face recognition using structural hidden Markov models

    Get PDF
    This paper introduces a novel methodology that combines the multiresolution feature of the discrete wavelet transform (DWT) with the local interactions of the facial structures expressed through the structural hidden Markov model (SHMM). A range of wavelet filters such as Haar, biorthogonal 9/7, and Coiflet, as well as Gabor, have been implemented in order to search for the best performance. SHMMs perform a thorough probabilistic analysis of any sequential pattern by revealing both its inner and outer structures simultaneously. Unlike traditional HMMs, the SHMMs do not perform the state conditional independence of the visible observation sequence assumption. This is achieved via the concept of local structures introduced by the SHMMs. Therefore, the long-range dependency problem inherent to traditional HMMs has been drastically reduced. SHMMs have not previously been applied to the problem of face identification. The results reported in this application have shown that SHMM outperforms the traditional hidden Markov model with a 73% increase in accuracy

    MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction

    Get PDF
    In this work we propose a novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image. To this end, we combine a convolutional encoder network with an expert-designed generative model that serves as decoder. The core innovation is our new differentiable parametric decoder that encapsulates image formation analytically based on a generative model. Our decoder takes as input a code vector with exactly defined semantic meaning that encodes detailed face pose, shape, expression, skin reflectance and scene illumination. Due to this new way of combining CNN-based with model-based face reconstruction, the CNN-based encoder learns to extract semantically meaningful parameters from a single monocular input image. For the first time, a CNN encoder and an expert-designed generative model can be trained end-to-end in an unsupervised manner, which renders training on very large (unlabeled) real world data feasible. The obtained reconstructions compare favorably to current state-of-the-art approaches in terms of quality and richness of representation.Comment: International Conference on Computer Vision (ICCV) 2017 (Oral), 13 page

    Category-Specific Object Reconstruction from a Single Image

    Full text link
    Object reconstruction from a single image -- in the wild -- is a problem where we can make progress and get meaningful results today. This is the main message of this paper, which introduces an automated pipeline with pixels as inputs and 3D surfaces of various rigid categories as outputs in images of realistic scenes. At the core of our approach are deformable 3D models that can be learned from 2D annotations available in existing object detection datasets, that can be driven by noisy automatic object segmentations and which we complement with a bottom-up module for recovering high-frequency shape details. We perform a comprehensive quantitative analysis and ablation study of our approach using the recently introduced PASCAL 3D+ dataset and show very encouraging automatic reconstructions on PASCAL VOC.Comment: First two authors contributed equally. To appear at CVPR 201

    SIFT Flow: Dense Correspondence across Scenes and its Applications

    Get PDF
    While image alignment has been studied in different areas of computer vision for decades, aligning images depicting different scenes remains a challenging problem. Analogous to optical flow where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its nearest neighbors in a large image corpus containing a variety of scenes. The SIFT flow algorithm consists of matching densely sampled, pixel-wise SIFT features between two images, while preserving spatial discontinuities. The SIFT features allow robust matching across different scene/object appearances, whereas the discontinuity-preserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach robustly aligns complex scene pairs containing significant spatial differences. Based on SIFT flow, we propose an alignment-based large database framework for image analysis and synthesis, where image information is transferred from the nearest neighbors to a query image according to the dense scene correspondence. This framework is demonstrated through concrete applications, such as motion field prediction from a single image, motion synthesis via object transfer, satellite image registration and face recognition

    FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image

    Full text link
    In this work, we introduce the novel problem of identifying dense canonical 3D coordinate frames from a single RGB image. We observe that each pixel in an image corresponds to a surface in the underlying 3D geometry, where a canonical frame can be identified as represented by three orthogonal axes, one along its normal direction and two in its tangent plane. We propose an algorithm to predict these axes from RGB. Our first insight is that canonical frames computed automatically with recently introduced direction field synthesis methods can provide training data for the task. Our second insight is that networks designed for surface normal prediction provide better results when trained jointly to predict canonical frames, and even better when trained to also predict 2D projections of canonical frames. We conjecture this is because projections of canonical tangent directions often align with local gradients in images, and because those directions are tightly linked to 3D canonical frames through projective geometry and orthogonality constraints. In our experiments, we find that our method predicts 3D canonical frames that can be used in applications ranging from surface normal estimation, feature matching, and augmented reality

    Learning to Navigate the Energy Landscape

    Full text link
    In this paper, we present a novel and efficient architecture for addressing computer vision problems that use `Analysis by Synthesis'. Analysis by synthesis involves the minimization of the reconstruction error which is typically a non-convex function of the latent target variables. State-of-the-art methods adopt a hybrid scheme where discriminatively trained predictors like Random Forests or Convolutional Neural Networks are used to initialize local search algorithms. While these methods have been shown to produce promising results, they often get stuck in local optima. Our method goes beyond the conventional hybrid architecture by not only proposing multiple accurate initial solutions but by also defining a navigational structure over the solution space that can be used for extremely efficient gradient-free local search. We demonstrate the efficacy of our approach on the challenging problem of RGB Camera Relocalization. To make the RGB camera relocalization problem particularly challenging, we introduce a new dataset of 3D environments which are significantly larger than those found in other publicly-available datasets. Our experiments reveal that the proposed method is able to achieve state-of-the-art camera relocalization results. We also demonstrate the generalizability of our approach on Hand Pose Estimation and Image Retrieval tasks

    Variational methods for shape and image registrations.

    Get PDF
    Estimating and analysis of deformation, either rigid or non-rigid, is an active area of research in various medical imaging and computer vision applications. Its importance stems from the inherent inter- and intra-variability in biological and biomedical object shapes and from the dynamic nature of the scenes usually dealt with in computer vision research. For instance, quantifying the growth of a tumor, recognizing a person\u27s face, tracking a facial expression, or retrieving an object inside a data base require the estimation of some sort of motion or deformation undergone by the object of interest. To solve these problems, and other similar problems, registration comes into play. This is the process of bringing into correspondences two or more data sets. Depending on the application at hand, these data sets can be for instance gray scale/color images or objects\u27 outlines. In the latter case, one talks about shape registration while in the former case, one talks about image/volume registration. In some situations, the combinations of different types of data can be used complementarily to establish point correspondences. One of most important image analysis tools that greatly benefits from the process of registration, and which will be addressed in this dissertation, is the image segmentation. This process consists of localizing objects in images. Several challenges are encountered in image segmentation, including noise, gray scale inhomogeneities, and occlusions. To cope with such issues, the shape information is often incorporated as a statistical model into the segmentation process. Building such statistical models requires a good and accurate shape alignment approach. In addition, segmenting anatomical structures can be accurately solved through the registration of the input data set with a predefined anatomical atlas. Variational approaches for shape/image registration and segmentation have received huge interest in the past few years. Unlike traditional discrete approaches, the variational methods are based on continuous modelling of the input data through the use of Partial Differential Equations (PDE). This brings into benefit the extensive literature on theory and numerical methods proposed to solve PDEs. This dissertation addresses the registration problem from a variational point of view, with more focus on shape registration. First, a novel variational framework for global-to-local shape registration is proposed. The input shapes are implicitly represented through their signed distance maps. A new Sumof- Squared-Differences (SSD) criterion which measures the disparity between the implicit representations of the input shapes, is introduced to recover the global alignment parameters. This new criteria has the advantages over some existing ones in accurately handling scale variations. In addition, the proposed alignment model is less expensive computationally. Complementary to the global registration field, the local deformation field is explicitly established between the two globally aligned shapes, by minimizing a new energy functional. This functional incrementally and simultaneously updates the displacement field while keeping the corresponding implicit representation of the globally warped source shape as close to a signed distance function as possible. This is done under some regularization constraints that enforce the smoothness of the recovered deformations. The overall process leads to a set of coupled set of equations that are simultaneously solved through a gradient descent scheme. Several applications, where the developed tools play a major role, are addressed throughout this dissertation. For instance, some insight is given as to how one can solve the challenging problem of three dimensional face recognition in the presence of facial expressions. Statistical modelling of shapes will be presented as a way of benefiting from the proposed shape registration framework. Second, this dissertation will visit th
    corecore