145,183 research outputs found

    FACE RECOGNITION AND VERIFICATION IN UNCONSTRAINED ENVIRIONMENTS

    Get PDF
    Face recognition has been a long standing problem in computer vision. General face recognition is challenging because of large appearance variability due to factors including pose, ambient lighting, expression, size of the face, age, and distance from the camera, etc. There are very accurate techniques to perform face recognition in controlled environments, especially when large numbers of samples are available for each face (individual). However, face identification under uncontrolled( unconstrained) environments or with limited training data is still an unsolved problem. There are two face recognition tasks: face identification (who is who in a probe face set, given a gallery face set) and face verification (same or not, given two faces). In this work, we study both face identification and verification in unconstrained environments. Firstly, we propose a face verification framework that combines Partial Least Squares (PLS) and the One-Shot similarity model[1]. The idea is to describe a face with a large feature set combining shape, texture and color information. PLS regression is applied to perform multi-channel feature weighting on this large feature set. Finally the PLS regression is used to compute the similarity score of an image pair by One-Shot learning (using a fixed negative set). Secondly, we study face identification with image sets, where the gallery and probe are sets of face images of an individual. We model a face set by its covariance matrix (COV) which is a natural 2nd-order statistic of a sample set.By exploring an efficient metric for the SPD matrices, i.e., Log-Euclidean Distance (LED), we derive a kernel function that explicitly maps the covariance matrix from the Riemannian manifold to Euclidean space. Then, discriminative learning is performed on the COV manifold: the learning aims to maximize the between-class COV distance and minimize the within-class COV distance. Sparse representation and dictionary learning have been widely used in face recognition, especially when large numbers of samples are available for each face (individual). Sparse coding is promising since it provides a more stable and discriminative face representation. In the last part of our work, we explore sparse coding and dictionary learning for face verification application. More specifically, in one approach, we apply sparse representations to face verification in two ways via a fix reference set as dictionary. In the other approach, we propose a dictionary learning framework with explicit pairwise constraints, which unifies the discriminative dictionary learning for pair matching (face verification) and classification (face recognition) problems

    Probabilistic models for multi-view semi-supervised learning and coding

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.Cataloged from PDF version of thesis.Includes bibliographical references (p. 146-160).This thesis investigates the problem of classification from multiple noisy sensors or modalities. Examples include speech and gesture interfaces and multi-camera distributed sensor networks. Reliable recognition in such settings hinges upon the ability to learn accurate classification models in the face of limited supervision and to cope with the relatively large amount of potentially redundant information transmitted by each sensor or modality (i.e., view). We investigate and develop novel multi view learning algorithms capable of learning from semi-supervised noisy sensor data, for automatically adapting to new users and working conditions, and for performing distributed feature selection on bandwidth limited sensor networks. We propose probabilistic models built upon multi-view Gaussian Processes (GPs) for solving this class of problems, and demonstrate our approaches for solving audio-visual speech and gesture, and multi-view object classification problems. Multi-modal tasks are good candidates for multi-view learning, since each modality provides a potentially redundant view to the learning algorithm. On audio-visual speech unit classification, and user agreement recognition using spoken utterances and head gestures, we demonstrate that multi-modal co-training can be used to learn from only a few labeled examples in one or both of the audio-visual modalities. We also propose a co-adaptation algorithm, which adapts existing audio-visual classifiers to a particular user or noise condition by leveraging the redundancy in the unlabeled data. Existing methods typically assume constant per-channel noise models.(cont.) In contrast we develop co-training algorithms that are able to learn from noisy sensor data corrupted by complex per-sample noise processes, e.g., occlusion common to multi sensor classification problems. We propose a probabilistic heteroscedastic approach to co-training that simultaneously discovers the amount of noise on a per-sample basis, while solving the classification task. This results in accurate performance in the presence of occlusion or other complex noise processes. We also investigate an extension of this idea for supervised multi-view learning where we develop a Bayesian multiple kernel learning algorithm that can learn a local weighting over each view of the input space. We additionally consider the problem of distributed object recognition or indexing from multiple cameras, where the computational power available at each camera sensor is limited and communication between cameras is prohibitively expensive. In this scenario, it is desirable to avoid sending redundant visual features from multiple views. Traditional supervised feature selection approaches are inapplicable as the class label is unknown at each camera. In this thesis, we propose an unsupervised multi-view feature selection algorithm based on a distributed coding approach. With our method, a Gaussian Process model of the joint view statistics is used at the receiver to obtain a joint encoding of the views without directly sharing information across encoders. We demonstrate our approach on recognition and indexing tasks with multi-view image databases and show that our method compares favorably to an independent encoding of the features from each camera.by C. Mario Christoudias.Ph.D

    Facial Action Unit Detection Using Attention and Relation Learning

    Full text link
    Attention mechanism has recently attracted increasing attentions in the field of facial action unit (AU) detection. By finding the region of interest of each AU with the attention mechanism, AU-related local features can be captured. Most of the existing attention based AU detection works use prior knowledge to predefine fixed attentions or refine the predefined attentions within a small range, which limits their capacity to model various AUs. In this paper, we propose an end-to-end deep learning based attention and relation learning framework for AU detection with only AU labels, which has not been explored before. In particular, multi-scale features shared by each AU are learned firstly, and then both channel-wise and spatial attentions are adaptively learned to select and extract AU-related local features. Moreover, pixel-level relations for AUs are further captured to refine spatial attentions so as to extract more relevant local features. Without changing the network architecture, our framework can be easily extended for AU intensity estimation. Extensive experiments show that our framework (i) soundly outperforms the state-of-the-art methods for both AU detection and AU intensity estimation on the challenging BP4D, DISFA, FERA 2015 and BP4D+ benchmarks, (ii) can adaptively capture the correlated regions of each AU, and (iii) also works well under severe occlusions and large poses.Comment: This paper is accepted by IEEE Transactions on Affective Computin

    CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection

    Full text link
    Robust face detection in the wild is one of the ultimate components to support various facial related problems, i.e. unconstrained face recognition, facial periocular recognition, facial landmarking and pose estimation, facial expression recognition, 3D facial model construction, etc. Although the face detection problem has been intensely studied for decades with various commercial applications, it still meets problems in some real-world scenarios due to numerous challenges, e.g. heavy facial occlusions, extremely low resolutions, strong illumination, exceptionally pose variations, image or video compression artifacts, etc. In this paper, we present a face detection approach named Contextual Multi-Scale Region-based Convolution Neural Network (CMS-RCNN) to robustly solve the problems mentioned above. Similar to the region-based CNNs, our proposed network consists of the region proposal component and the region-of-interest (RoI) detection component. However, far apart of that network, there are two main contributions in our proposed network that play a significant role to achieve the state-of-the-art performance in face detection. Firstly, the multi-scale information is grouped both in region proposal and RoI detection to deal with tiny face regions. Secondly, our proposed network allows explicit body contextual reasoning in the network inspired from the intuition of human vision system. The proposed approach is benchmarked on two recent challenging face detection databases, i.e. the WIDER FACE Dataset which contains high degree of variability, as well as the Face Detection Dataset and Benchmark (FDDB). The experimental results show that our proposed approach trained on WIDER FACE Dataset outperforms strong baselines on WIDER FACE Dataset by a large margin, and consistently achieves competitive results on FDDB against the recent state-of-the-art face detection methods
    • …
    corecore