8,911 research outputs found

    Towards Interpretable Face Recognition

    Full text link
    Deep CNNs have been pushing the frontier of visual recognition over past years. Besides recognition accuracy, strong demands in understanding deep CNNs in the research community motivate developments of tools to dissect pre-trained models to visualize how they make predictions. Recent works further push the interpretability in the network learning stage to learn more meaningful representations. In this work, focusing on a specific area of visual recognition, we report our efforts towards interpretable face recognition. We propose a spatial activation diversity loss to learn more structured face representations. By leveraging the structure, we further design a feature activation diversity loss to push the interpretable representations to be discriminative and robust to occlusions. We demonstrate on three face recognition benchmarks that our proposed method is able to improve face recognition accuracy with easily interpretable face representations.Comment: 10 pages, 9 figures, 6 tables, To appear in ICCV 2019 as an oral pape

    Occlusion Coherence: Detecting and Localizing Occluded Faces

    Full text link
    The presence of occluders significantly impacts object recognition accuracy. However, occlusion is typically treated as an unstructured source of noise and explicit models for occluders have lagged behind those for object appearance and shape. In this paper we describe a hierarchical deformable part model for face detection and landmark localization that explicitly models part occlusion. The proposed model structure makes it possible to augment positive training data with large numbers of synthetically occluded instances. This allows us to easily incorporate the statistics of occlusion patterns in a discriminatively trained model. We test the model on several benchmarks for landmark localization and detection including challenging new data sets featuring significant occlusion. We find that the addition of an explicit occlusion model yields a detection system that outperforms existing approaches for occluded instances while maintaining competitive accuracy in detection and landmark localization for unoccluded instances

    Learning Disentangling and Fusing Networks for Face Completion Under Structured Occlusions

    Full text link
    Face completion aims to generate semantically new pixels for missing facial components. It is a challenging generative task due to large variations of face appearance. This paper studies generative face completion under structured occlusions. We treat the face completion and corruption as disentangling and fusing processes of clean faces and occlusions, and propose a jointly disentangling and fusing Generative Adversarial Network (DF-GAN). First, three domains are constructed, corresponding to the distributions of occluded faces, clean faces and structured occlusions. The disentangling and fusing processes are formulated as the transformations between the three domains. Then the disentangling and fusing networks are built to learn the transformations from unpaired data, where the encoder-decoder structure is adopted and allows DF-GAN to simulate structure occlusions by modifying the latent representations. Finally, the disentangling and fusing processes are unified into a dual learning framework along with an adversarial strategy. The proposed method is evaluated on Meshface verification problem. Experimental results on four Meshface databases demonstrate the effectiveness of our proposed method for the face completion under structured occlusions.Comment: Submitted to CVPR 201

    Recognizing Partial Biometric Patterns

    Full text link
    Biometric recognition on partial captured targets is challenging, where only several partial observations of objects are available for matching. In this area, deep learning based methods are widely applied to match these partial captured objects caused by occlusions, variations of postures or just partial out of view in person re-identification and partial face recognition. However, most current methods are not able to identify an individual in case that some parts of the object are not obtainable, while the rest are specialized to certain constrained scenarios. To this end, we propose a robust general framework for arbitrary biometric matching scenarios without the limitations of alignment as well as the size of inputs. We introduce a feature post-processing step to handle the feature maps from FCN and a dictionary learning based Spatial Feature Reconstruction (SFR) to match different sized feature maps in this work. Moreover, the batch hard triplet loss function is applied to optimize the model. The applicability and effectiveness of the proposed method are demonstrated by the results from experiments on three person re-identification datasets (Market1501, CUHK03, DukeMTMC-reID), two partial person datasets (Partial REID and Partial iLIDS) and two partial face datasets (CASIA-NIR-Distance and Partial LFW), on which state-of-the-art performance is ensured in comparison with several state-of-the-art approaches. The code is released online and can be found on the website: https://github.com/lingxiao-he/Partial-Person-ReID.Comment: 13 pages, 11 figure

    Learning Locality-Constrained Collaborative Representation for Face Recognition

    Full text link
    The model of low-dimensional manifold and sparse representation are two well-known concise models that suggest each data can be described by a few characteristics. Manifold learning is usually investigated for dimension reduction by preserving some expected local geometric structures from the original space to a low-dimensional one. The structures are generally determined by using pairwise distance, e.g., Euclidean distance. Alternatively, sparse representation denotes a data point as a linear combination of the points from the same subspace. In practical applications, however, the nearby points in terms of pairwise distance may not belong to the same subspace, and vice versa. Consequently, it is interesting and important to explore how to get a better representation by integrating these two models together. To this end, this paper proposes a novel coding algorithm, called Locality-Constrained Collaborative Representation (LCCR), which improves the robustness and discrimination of data representation by introducing a kind of local consistency. The locality term derives from a biologic observation that the similar inputs have similar code. The objective function of LCCR has an analytical solution, and it does not involve local minima. The empirical studies based on four public facial databases, ORL, AR, Extended Yale B, and Multiple PIE, show that LCCR is promising in recognizing human faces from frontal views with varying expression and illumination, as well as various corruptions and occlusions.Comment: 16 pages, v

    Structured Occlusion Coding for Robust Face Recognition

    Full text link
    Occlusion in face recognition is a common yet challenging problem. While sparse representation based classification (SRC) has been shown promising performance in laboratory conditions (i.e. noiseless or random pixel corrupted), it performs much worse in practical scenarios. In this paper, we consider the practical face recognition problem, where the occlusions are predictable and available for sampling. We propose the structured occlusion coding (SOC) to address occlusion problems. The structured coding here lies in two folds. On one hand, we employ a structured dictionary for recognition. On the other hand, we propose to use the structured sparsity in this formulation. Specifically, SOC simultaneously separates the occlusion and classifies the image. In this way, the problem of recognizing an occluded image is turned into seeking a structured sparse solution on occlusion-appended dictionary. In order to construct a well-performing occlusion dictionary, we propose an occlusion mask estimating technique via locality constrained dictionary (LCD), showing striking improvement in occlusion sample. On a category-specific occlusion dictionary, we replace norm sparsity with the structured sparsity which is shown more robust, further enhancing the robustness of our approach. Moreover, SOC achieves significant improvement in handling large occlusion in real world. Extensive experiments are conducted on public data sets to validate the superiority of the proposed algorithm

    Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition

    Full text link
    Occlusion and pose variations, which can change facial appearance significantly, are two major obstacles for automatic Facial Expression Recognition (FER). Though automatic FER has made substantial progresses in the past few decades, occlusion-robust and pose-invariant issues of FER have received relatively less attention, especially in real-world scenarios. This paper addresses the real-world pose and occlusion robust FER problem with three-fold contributions. First, to stimulate the research of FER under real-world occlusions and variant poses, we build several in-the-wild facial expression datasets with manual annotations for the community. Second, we propose a novel Region Attention Network (RAN), to adaptively capture the importance of facial regions for occlusion and pose variant FER. The RAN aggregates and embeds varied number of region features produced by a backbone convolutional neural network into a compact fixed-length representation. Last, inspired by the fact that facial expressions are mainly defined by facial action units, we propose a region biased loss to encourage high attention weights for the most important regions. We validate our RAN and region biased loss on both our built test datasets and four popular datasets: FERPlus, AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and region biased loss largely improve the performance of FER with occlusion and variant pose. Our method also achieves state-of-the-art results on FERPlus, AffectNet, RAF-DB, and SFEW. Code and the collected test data will be publicly available.Comment: The test set and the code of this paper will be available at https://github.com/kaiwang960112/Challenge-condition-FER-datase

    A Survey of the Trends in Facial and Expression Recognition Databases and Methods

    Full text link
    Automated facial identification and facial expression recognition have been topics of active research over the past few decades. Facial and expression recognition find applications in human-computer interfaces, subject tracking, real-time security surveillance systems and social networking. Several holistic and geometric methods have been developed to identify faces and expressions using public and local facial image databases. In this work we present the evolution in facial image data sets and the methodologies for facial identification and recognition of expressions such as anger, sadness, happiness, disgust, fear and surprise. We observe that most of the earlier methods for facial and expression recognition aimed at improving the recognition rates for facial feature-based methods using static images. However, the recent methodologies have shifted focus towards robust implementation of facial/expression recognition from large image databases that vary with space (gathered from the internet) and time (video recordings). The evolution trends in databases and methodologies for facial and expression recognition can be useful for assessing the next-generation topics that may have applications in security systems or personal identification systems that involve "Quantitative face" assessments.Comment: 16 pages, 4 figures, 3 tables, International Journal of Computer Science and Engineering Survey, October, 201

    Deep Multi-Center Learning for Face Alignment

    Full text link
    Facial landmarks are highly correlated with each other since a certain landmark can be estimated by its neighboring landmarks. Most of the existing deep learning methods only use one fully-connected layer called shape prediction layer to estimate the locations of facial landmarks. In this paper, we propose a novel deep learning framework named Multi-Center Learning with multiple shape prediction layers for face alignment. In particular, each shape prediction layer emphasizes on the detection of a certain cluster of semantically relevant landmarks respectively. Challenging landmarks are focused firstly, and each cluster of landmarks is further optimized respectively. Moreover, to reduce the model complexity, we propose a model assembling method to integrate multiple shape prediction layers into one shape prediction layer. Extensive experiments demonstrate that our method is effective for handling complex occlusions and appearance variations with real-time performance. The code for our method is available at https://github.com/ZhiwenShao/MCNet-Extension.Comment: This paper has been accepted by Neurocomputin

    Facial Landmark Detection: a Literature Survey

    Full text link
    The locations of the fiducial facial landmark points around facial components and facial contour capture the rigid and non-rigid facial deformations due to head movements and facial expressions. They are hence important for various facial analysis tasks. Many facial landmark detection algorithms have been developed to automatically detect those key points over the years, and in this paper, we perform an extensive review of them. We classify the facial landmark detection algorithms into three major categories: holistic methods, Constrained Local Model (CLM) methods, and the regression-based methods. They differ in the ways to utilize the facial appearance and shape information. The holistic methods explicitly build models to represent the global facial appearance and shape information. The CLMs explicitly leverage the global shape model but build the local appearance models. The regression-based methods implicitly capture facial shape and appearance information. For algorithms within each category, we discuss their underlying theories as well as their differences. We also compare their performances on both controlled and in the wild benchmark datasets, under varying facial expressions, head poses, and occlusion. Based on the evaluations, we point out their respective strengths and weaknesses. There is also a separate section to review the latest deep learning-based algorithms. The survey also includes a listing of the benchmark databases and existing software. Finally, we identify future research directions, including combining methods in different categories to leverage their respective strengths to solve landmark detection "in-the-wild"
    • …
    corecore