4,894 research outputs found

    On Robust Face Recognition via Sparse Encoding: the Good, the Bad, and the Ugly

    Get PDF
    In the field of face recognition, Sparse Representation (SR) has received considerable attention during the past few years. Most of the relevant literature focuses on holistic descriptors in closed-set identification applications. The underlying assumption in SR-based methods is that each class in the gallery has sufficient samples and the query lies on the subspace spanned by the gallery of the same class. Unfortunately, such assumption is easily violated in the more challenging face verification scenario, where an algorithm is required to determine if two faces (where one or both have not been seen before) belong to the same person. In this paper, we first discuss why previous attempts with SR might not be applicable to verification problems. We then propose an alternative approach to face verification via SR. Specifically, we propose to use explicit SR encoding on local image patches rather than the entire face. The obtained sparse signals are pooled via averaging to form multiple region descriptors, which are then concatenated to form an overall face descriptor. Due to the deliberate loss spatial relations within each region (caused by averaging), the resulting descriptor is robust to misalignment & various image deformations. Within the proposed framework, we evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder Neural Network (SANN), and an implicit probabilistic technique based on Gaussian Mixture Models. Thorough experiments on AR, FERET, exYaleB, BANCA and ChokePoint datasets show that the proposed local SR approach obtains considerably better and more robust performance than several previous state-of-the-art holistic SR methods, in both verification and closed-set identification problems. The experiments also show that l1-minimisation based encoding has a considerably higher computational than the other techniques, but leads to higher recognition rates

    Attentive monitoring of multiple video streams driven by a Bayesian foraging strategy

    Full text link
    In this paper we shall consider the problem of deploying attention to subsets of the video streams for collating the most relevant data and information of interest related to a given task. We formalize this monitoring problem as a foraging problem. We propose a probabilistic framework to model observer's attentive behavior as the behavior of a forager. The forager, moment to moment, focuses its attention on the most informative stream/camera, detects interesting objects or activities, or switches to a more profitable stream. The approach proposed here is suitable to be exploited for multi-stream video summarization. Meanwhile, it can serve as a preliminary step for more sophisticated video surveillance, e.g. activity and behavior analysis. Experimental results achieved on the UCR Videoweb Activities Dataset, a publicly available dataset, are presented to illustrate the utility of the proposed technique.Comment: Accepted to IEEE Transactions on Image Processin

    Evaluation and Understandability of Face Image Quality Assessment

    Get PDF
    Face image quality assessment (FIQA) has been an area of interest to researchers as a way to improve the face recognition accuracy. By filtering out the low quality images we can reduce various difficulties faced in unconstrained face recognition, such as, failure in face or facial landmark detection or low presence of useful facial information. In last decade or so, researchers have proposed different methods to assess the face image quality, spanning from fusion of quality measures to using learning based methods. Different approaches have their own strength and weaknesses. But, it is hard to perform a comparative assessment of these methods without a database containing wide variety of face quality, a suitable training protocol that can efficiently utilize this large-scale dataset. In this thesis we focus on developing an evaluation platfrom using a large scale face database containing wide ranging face image quality and try to deconstruct the reason behind the predicted scores of learning based face image quality assessment methods. Contributions of this thesis is two-fold. Firstly, (i) a carefully crafted large scale database dedicated entirely to face image quality assessment has been proposed; (ii) a learning to rank based large-scale training protocol is devel- oped. Finally, (iii) a comprehensive study of 15 face image quality assessment methods using 12 different feature types, and relative ranking based label generation schemes, is performed. Evalua- tion results show various insights about the assessment methods which indicate the significance of the proposed database and the training protocol. Secondly, we have seen that in last few years, researchers have tried various learning based approaches to assess the face image quality. Most of these methods offer either a quality bin or a score summary as a measure of the biometric quality of the face image. But, to the best of our knowledge, so far there has not been any investigation on what are the explainable reasons behind the predicted scores. In this thesis, we propose a method to provide a clear and concise understanding of the predicted quality score of a learning based face image quality assessment. It is believed that this approach can be integrated into the FBI’s understandable template and can help in improving the image acquisition process by providing information on what quality factors need to be addressed

    Content-Adaptive Sketch Portrait Generation by Decompositional Representation Learning

    Get PDF
    Sketch portrait generation benefits a wide range of applications such as digital entertainment and law enforcement. Although plenty of efforts have been dedicated to this task, several issues still remain unsolved for generating vivid and detail-preserving personal sketch portraits. For example, quite a few artifacts may exist in synthesizing hairpins and glasses, and textural details may be lost in the regions of hair or mustache. Moreover, the generalization ability of current systems is somewhat limited since they usually require elaborately collecting a dictionary of examples or carefully tuning features/components. In this paper, we present a novel representation learning framework that generates an end-to-end photo-sketch mapping through structure and texture decomposition. In the training stage, we first decompose the input face photo into different components according to their representational contents (i.e., structural and textural parts) by using a pre-trained Convolutional Neural Network (CNN). Then, we utilize a Branched Fully Convolutional Neural Network (BFCN) for learning structural and textural representations, respectively. In addition, we design a Sorted Matching Mean Square Error (SM-MSE) metric to measure texture patterns in the loss function. In the stage of sketch rendering, our approach automatically generates structural and textural representations for the input photo and produces the final result via a probabilistic fusion scheme. Extensive experiments on several challenging benchmarks suggest that our approach outperforms example-based synthesis algorithms in terms of both perceptual and objective metrics. In addition, the proposed method also has better generalization ability across dataset without additional training.Comment: Published in TIP 201
    • …
    corecore