On Robust Face Recognition via Sparse Encoding: the Good, the Bad, and the Ugly
In the field of face recognition, Sparse Representation (SR) has received
considerable attention during the past few years. Most of the relevant
literature focuses on holistic descriptors in closed-set identification
applications. The underlying assumption in SR-based methods is that each class
in the gallery has sufficient samples and the query lies on the subspace
spanned by the gallery of the same class. Unfortunately, such an assumption is
easily violated in the more challenging face verification scenario, where an
algorithm is required to determine if two faces (where one or both have not
been seen before) belong to the same person. In this paper, we first discuss
why previous attempts with SR might not be applicable to verification problems.
We then propose an alternative approach to face verification via SR.
Specifically, we propose to use explicit SR encoding on local image patches
rather than the entire face. The obtained sparse signals are pooled via
averaging to form multiple region descriptors, which are then concatenated to
form an overall face descriptor. Due to the deliberate loss of spatial relations
within each region (caused by averaging), the resulting descriptor is robust to
misalignment and various image deformations. Within the proposed framework, we
evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder
Neural Network (SANN), and an implicit probabilistic technique based on
Gaussian Mixture Models. Thorough experiments on AR, FERET, exYaleB, BANCA and
ChokePoint datasets show that the proposed local SR approach obtains
considerably better and more robust performance than several previous
state-of-the-art holistic SR methods, in both verification and closed-set
identification problems. The experiments also show that l1-minimisation based
encoding has a considerably higher computational cost than the other techniques,
but leads to higher recognition rates.
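The local pooling scheme described in this abstract (encode patches, average the codes within each region, concatenate the region averages) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the soft-thresholded projection is a simple stand-in for the encoders the abstract names, and the function names, random dictionary, patch size, step and 2x2 region grid are all hypothetical.

```python
import numpy as np

def encode_patches(patches, dictionary, threshold=0.1):
    """Stand-in sparse encoder (an assumption, not l1-minimisation proper):
    project each patch onto the dictionary atoms, then soft-threshold."""
    proj = patches @ dictionary.T  # shape: (n_patches, n_atoms)
    return np.sign(proj) * np.maximum(np.abs(proj) - threshold, 0.0)

def face_descriptor(image, dictionary, patch=8, step=4, grid=(2, 2)):
    """Average sparse codes within each region, then concatenate regions."""
    h, w = image.shape
    rh, rw = h // grid[0], w // grid[1]
    region_descriptors = []
    for gy in range(grid[0]):
        for gx in range(grid[1]):
            region = image[gy * rh:(gy + 1) * rh, gx * rw:(gx + 1) * rw]
            # Overlapping patches, flattened to vectors.
            patches = np.asarray([
                region[y:y + patch, x:x + patch].ravel()
                for y in range(0, rh - patch + 1, step)
                for x in range(0, rw - patch + 1, step)
            ])
            codes = encode_patches(patches, dictionary)
            # Averaging discards where in the region each patch came from.
            region_descriptors.append(codes.mean(axis=0))
    return np.concatenate(region_descriptors)

rng = np.random.default_rng(0)
dictionary = rng.standard_normal((32, 64))  # 32 hypothetical atoms for 8x8 patches
image = rng.random((32, 32))                # placeholder for a face image
descriptor = face_descriptor(image, dictionary)
print(descriptor.shape)  # (128,) = 4 regions x 32 atoms
```

Because each region descriptor is a mean over patch codes, the order of patches within a region does not affect the result, which is the source of the tolerance to misalignment that the abstract mentions.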
Attentive monitoring of multiple video streams driven by a Bayesian foraging strategy
In this paper we shall consider the problem of deploying attention to subsets
of the video streams for collating the most relevant data and information of
interest related to a given task. We formalize this monitoring problem as a
foraging problem. We propose a probabilistic framework to model the observer's
attentive behavior as the behavior of a forager. The forager, moment to moment,
focuses its attention on the most informative stream/camera, detects
interesting objects or activities, or switches to a more profitable stream. The
approach proposed here is suitable to be exploited for multi-stream video
summarization. Meanwhile, it can serve as a preliminary step for more
sophisticated video surveillance, e.g. activity and behavior analysis.
Experimental results achieved on the UCR Videoweb Activities Dataset, a
publicly available dataset, are presented to illustrate the utility of the
proposed technique. Comment: Accepted to IEEE Transactions on Image Processing
Evaluation and Understandability of Face Image Quality Assessment
Face image quality assessment (FIQA) has been an area of interest to researchers as a way to improve face recognition accuracy. By filtering out low-quality images, we can reduce various difficulties faced in unconstrained face recognition, such as failure in face or facial landmark detection, or a low presence of useful facial information. In the last decade or so, researchers have proposed different methods to assess face image quality, ranging from fusion of quality measures to learning-based methods. Different approaches have their own strengths and weaknesses. However, it is hard to perform a comparative assessment of these methods without a database containing a wide variety of face quality and a suitable training protocol that can efficiently utilize such a large-scale dataset. In this thesis we focus on developing an evaluation platform using a large-scale face database containing wide-ranging face image quality, and try to deconstruct the reasons behind the predicted scores of learning-based face image quality assessment methods. The contributions of this thesis are two-fold. First, (i) a carefully crafted large-scale database dedicated entirely to face image quality assessment is proposed; (ii) a learning-to-rank based large-scale training protocol is developed; and (iii) a comprehensive study of 15 face image quality assessment methods, using 12 different feature types and relative-ranking based label generation schemes, is performed. The evaluation results offer various insights about the assessment methods, which indicate the significance of the proposed database and the training protocol. Second, in the last few years researchers have tried various learning-based approaches to assess face image quality. Most of these methods offer either a quality bin or a score summary as a measure of the biometric quality of the face image.
However, to the best of our knowledge, there has so far been no investigation into the explainable reasons behind the predicted scores. In this thesis, we propose a method to provide a clear and concise understanding of the predicted quality score of a learning-based face image quality assessment method. It is believed that this approach can be integrated into the FBI’s understandable template and can help improve the image acquisition process by providing information on which quality factors need to be addressed.
Content-Adaptive Sketch Portrait Generation by Decompositional Representation Learning
Sketch portrait generation benefits a wide range of applications such as
digital entertainment and law enforcement. Although plenty of efforts have been
dedicated to this task, several issues still remain unsolved for generating
vivid and detail-preserving personal sketch portraits. For example, quite a few
artifacts may exist in synthesizing hairpins and glasses, and textural details
may be lost in the regions of hair or mustache. Moreover, the generalization
ability of current systems is somewhat limited since they usually require
elaborately collecting a dictionary of examples or carefully tuning
features/components. In this paper, we present a novel representation learning
framework that generates an end-to-end photo-sketch mapping through structure
and texture decomposition. In the training stage, we first decompose the input
face photo into different components according to their representational
contents (i.e., structural and textural parts) by using a pre-trained
Convolutional Neural Network (CNN). Then, we utilize a Branched Fully
Convolutional Neural Network (BFCN) for learning structural and textural
representations, respectively. In addition, we design a Sorted Matching Mean
Square Error (SM-MSE) metric to measure texture patterns in the loss function.
In the stage of sketch rendering, our approach automatically generates
structural and textural representations for the input photo and produces the
final result via a probabilistic fusion scheme. Extensive experiments on
several challenging benchmarks suggest that our approach outperforms
example-based synthesis algorithms in terms of both perceptual and objective
metrics. In addition, the proposed method also has better generalization
ability across datasets without additional training. Comment: Published in TIP 201