
    Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz

    The reconstruction of dense 3D models of face geometry and appearance from a single image is highly challenging and ill-posed. To constrain the problem, many approaches rely on strong priors, such as parametric face models learned from limited 3D scan data. However, prior models restrict generalization to the true diversity in facial geometry, skin reflectance and illumination. To alleviate this problem, we present the first approach that jointly learns 1) a regressor for face shape, expression, reflectance and illumination on the basis of 2) a concurrently learned parametric face model. Our multi-level face model combines the advantage of 3D Morphable Models for regularization with the out-of-space generalization of a learned corrective space. We train end-to-end on in-the-wild images without dense annotations by fusing a convolutional encoder with a differentiable expert-designed renderer and a self-supervised training loss, both defined at multiple detail levels. Our approach compares favorably to the state-of-the-art in terms of reconstruction quality, better generalizes to real world faces, and runs at over 250 Hz. Comment: CVPR 2018 (Oral). Project webpage: https://gvv.mpi-inf.mpg.de/projects/FML

    A review of content-based video retrieval techniques for person identification

    The rise of technology has spurred advances in the surveillance field. Many commercial spaces have reduced guard patrols in favor of Closed-Circuit Television (CCTV) installations, and some countries already use surveillance drones, which have greater mobility. In recent years, CCTV footage has also been used by law enforcement for crime investigation, such as in the 2013 Boston bombing. However, this has produced huge, unmanageable footage collections, a common issue of the Big Data era. While there is more information with which to identify a potential suspect, going over such a massive amount of data manually is very laborious. Therefore, some researchers have proposed using Content-Based Video Retrieval (CBVR) methods to query a specific feature of an object or a human. Due to limitations such as the visibility and quality of video footage, only certain features are selected for recognition, based on Chicago Police Department guidelines. This paper presents a comprehensive review of CBVR techniques used for clothing, gender and ethnic recognition of a person of interest and how they can be applied in crime investigation. The findings indicate that the three recognition types can be combined to create a Content-Based Video Retrieval system for person identification.

    On Detecting Faces And Classifying Facial Races With Partial Occlusions And Pose Variations

    In this dissertation, we present our contributions in face detection and facial race classification. Face detection in unconstrained images is a traditional problem in the computer vision community. Challenges still remain. In particular, the detection of partially occluded faces with pose variations has not been well addressed. In the first part of this dissertation, our contributions are three-fold. First, we introduce our four image datasets, consisting of a large-scale labeled face dataset, a noisy large-scale labeled non-face dataset, the CrowdFaces dataset, and the CrowdNonFaces dataset, intended to be used for face detection training. Second, we improve Viola-Jones (VJ) face detection results by first training a Convolutional Neural Network (CNN) model on our noisy datasets. We show our improvement over the VJ face detector on the AFW face detection benchmark dataset. However, existing partially occluded face detection methods require training several models, computing hand-crafted features, or both. Hence, third, we propose our Large-Scale Deep Learning (LSDL), a method that does not require training several CNN models or computing hand-crafted features to detect faces. Our LSDL face detector is trained on a single CNN model to detect unconstrained multi-view partially occluded and non-partially occluded faces. The model is trained with a large number of face training examples that cover most partially occluded and non-partially occluded facial appearances. LSDL detects faces by selecting the detection windows with the highest confidence scores using a threshold. Our evaluation results show that our LSDL method achieves the best performance on the AFW dataset and a comparable performance on the FDDB dataset among state-of-the-art face detection methods without manually extending or adjusting the square detection bounding boxes. Many biometrics and security systems use facial information to obtain an individual identification and recognition. Classifying a race from a face image can provide a strong hint to search for facial identity and criminal identification. Current facial race classification methods are confined only to constrained non-partially occluded frontal faces. Challenges remain under unconstrained environments such as partial occlusions and pose variations, low illumination, and small scales. In the second part of the dissertation, we propose a CNN model to classify facial races with partial occlusions and pose variations. The proposed model is trained using a broad and racially balanced face image dataset. The model is trained on four major human races, Caucasian, Indian, Mongolian, and Negroid. Our model is evaluated against the state-of-the-art methods on a constrained face test dataset. Also, an evaluation of the proposed model and human performance is conducted and compared on our new unconstrained facial race benchmark (CIMN) dataset. Our results show that our model achieves 95.1% race classification accuracy in the constrained environment. Furthermore, the model achieves race classification accuracy comparable to human performance on the current challenges in the unconstrained environment.

    Improving Landmark Localization with Semi-Supervised Learning

    We present two techniques to improve landmark localization in images from partially annotated datasets. Our primary goal is to leverage the common situation where precise landmark locations are only provided for a small data subset, but where class labels for classification or regression tasks related to the landmarks are more abundantly available. First, we propose the framework of sequential multitasking and explore it here through an architecture for landmark localization where training with class labels acts as an auxiliary signal to guide the landmark localization on unlabeled data. A key aspect of our approach is that errors can be backpropagated through a complete landmark localization model. Second, we propose and explore an unsupervised learning technique for landmark localization based on having a model predict equivariant landmarks with respect to transformations applied to the image. We show that these techniques improve landmark prediction considerably and can learn effective detectors even when only a small fraction of the dataset has landmark labels. We present results on two toy datasets and four real datasets, with hands and faces, and report new state-of-the-art on two datasets in the wild, e.g. with only 5% of labeled images we outperform previous state-of-the-art trained on the AFLW dataset. Comment: Published as a conference paper in CVPR 201
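
The equivariance idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy transformation (a wrap-around pixel shift) and a hypothetical stand-in predictor, `brightest_pixel`, in place of a trained model. All function names here are illustrative. The loss compares landmarks predicted on the transformed image with the transformed landmarks predicted on the original image, so no landmark labels are needed.

```python
import numpy as np

def shift_image(image, dx, dy):
    """Translate an image by (dx, dy) pixels with wrap-around (toy transformation)."""
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

def shift_landmarks(landmarks, dx, dy, width, height):
    """Apply the same wrap-around translation to (x, y) landmark coordinates."""
    moved = landmarks + np.array([dx, dy])
    return moved % np.array([width, height])

def brightest_pixel(image):
    """Hypothetical stand-in for a landmark model: one (x, y) landmark at the brightest pixel."""
    y, x = np.unravel_index(np.argmax(image), image.shape)
    return np.array([[x, y]], dtype=float)

def equivariance_loss(predict, image, dx, dy):
    """Mean squared gap between predict(T(image)) and T(predict(image))."""
    height, width = image.shape
    on_transformed = predict(shift_image(image, dx, dy))
    transformed_pred = shift_landmarks(predict(image), dx, dy, width, height)
    return float(np.mean((on_transformed - transformed_pred) ** 2))

# Toy usage: a single bright pixel at x=3, y=2 in an 8x8 image.
image = np.zeros((8, 8))
image[2, 3] = 1.0
loss_equivariant = equivariance_loss(brightest_pixel, image, dx=2, dy=1)
loss_constant = equivariance_loss(lambda img: np.array([[0.0, 0.0]]), image, dx=2, dy=1)
```

A predictor that tracks image content gives a zero loss here, while a predictor that ignores the image is penalized. In a trainable setting, a loss of this kind would be minimized on unlabeled images alongside any supervised terms.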