20,676 research outputs found
On Robust Face Recognition via Sparse Encoding: the Good, the Bad, and the Ugly
In the field of face recognition, Sparse Representation (SR) has received
considerable attention during the past few years. Most of the relevant
literature focuses on holistic descriptors in closed-set identification
applications. The underlying assumption in SR-based methods is that each class
in the gallery has sufficient samples and the query lies on the subspace
spanned by the gallery of the same class. Unfortunately, such assumption is
easily violated in the more challenging face verification scenario, where an
algorithm is required to determine if two faces (where one or both have not
been seen before) belong to the same person. In this paper, we first discuss
why previous attempts with SR might not be applicable to verification problems.
We then propose an alternative approach to face verification via SR.
Specifically, we propose to use explicit SR encoding on local image patches
rather than the entire face. The obtained sparse signals are pooled via
averaging to form multiple region descriptors, which are then concatenated to
form an overall face descriptor. Due to the deliberate loss spatial relations
within each region (caused by averaging), the resulting descriptor is robust to
misalignment & various image deformations. Within the proposed framework, we
evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder
Neural Network (SANN), and an implicit probabilistic technique based on
Gaussian Mixture Models. Thorough experiments on AR, FERET, exYaleB, BANCA and
ChokePoint datasets show that the proposed local SR approach obtains
considerably better and more robust performance than several previous
state-of-the-art holistic SR methods, in both verification and closed-set
identification problems. The experiments also show that l1-minimisation based
encoding has a considerably higher computational than the other techniques, but
leads to higher recognition rates
Reference face graph for face recognition
Face recognition has been studied extensively; however, real-world face recognition still remains a challenging task. The demand for unconstrained practical face recognition is rising with the explosion of online multimedia such as social networks, and video surveillance footage where face analysis is of significant importance. In this paper, we approach face recognition in the context of graph theory. We recognize an unknown face using an external reference face graph (RFG). An RFG is generated and recognition of a given face is achieved by comparing it to the faces in the constructed RFG. Centrality measures are utilized to identify distinctive faces in the reference face graph. The proposed RFG-based face recognition algorithm is robust to the changes in pose and it is also alignment free. The RFG recognition is used in conjunction with DCT locality sensitive hashing for efficient retrieval to ensure scalability. Experiments are conducted on several publicly available databases and the results show that the proposed approach outperforms the state-of-the-art methods without any preprocessing necessities such as face alignment. Due to the richness in the reference set construction, the proposed method can also handle illumination and expression variation
Multimodal Visual Concept Learning with Weakly Supervised Techniques
Despite the availability of a huge amount of video data accompanied by
descriptive texts, it is not always easy to exploit the information contained
in natural language in order to automatically recognize video concepts. Towards
this goal, in this paper we use textual cues as means of supervision,
introducing two weakly supervised techniques that extend the Multiple Instance
Learning (MIL) framework: the Fuzzy Sets Multiple Instance Learning (FSMIL) and
the Probabilistic Labels Multiple Instance Learning (PLMIL). The former encodes
the spatio-temporal imprecision of the linguistic descriptions with Fuzzy Sets,
while the latter models different interpretations of each description's
semantics with Probabilistic Labels, both formulated through a convex
optimization algorithm. In addition, we provide a novel technique to extract
weak labels in the presence of complex semantics, that consists of semantic
similarity computations. We evaluate our methods on two distinct problems,
namely face and action recognition, in the challenging and realistic setting of
movies accompanied by their screenplays, contained in the COGNIMUSE database.
We show that, on both tasks, our method considerably outperforms a
state-of-the-art weakly supervised approach, as well as other baselines.Comment: CVPR 201
Side-View Face Recognition
Side-view face recognition is a challenging problem with many applications. Especially in real-life scenarios where the environment is uncontrolled, coping with pose variations up to side-view positions is an important task for face recognition. In this paper we discuss the use of side view face recognition techniques to be used in house safety applications. Our aim is to recognize people as they pass through a door, and estimate their location in the house. Here, we compare available databases appropriate for this task, and review current methods for profile face recognition
Non-Parametric Probabilistic Image Segmentation
We propose a simple probabilistic generative model for
image segmentation. Like other probabilistic algorithms
(such as EM on a Mixture of Gaussians) the proposed model
is principled, provides both hard and probabilistic cluster
assignments, as well as the ability to naturally incorporate
prior knowledge. While previous probabilistic approaches
are restricted to parametric models of clusters (e.g., Gaussians)
we eliminate this limitation. The suggested approach
does not make heavy assumptions on the shape of the clusters
and can thus handle complex structures. Our experiments
show that the suggested approach outperforms previous
work on a variety of image segmentation tasks
Learning Object Categories From Internet Image Searches
In this paper, we describe a simple approach to learning models of visual object categories from images gathered from Internet image search engines. The images for a given keyword are typically highly variable, with a large fraction being unrelated to the query term, and thus pose a challenging environment from which to learn. By training our models directly from Internet images, we remove the need to laboriously compile training data sets, required by most other recognition approaches-this opens up the possibility of learning object category models āon-the-fly.ā We describe two simple approaches, derived from the probabilistic latent semantic analysis (pLSA) technique for text document analysis, that can be used to automatically learn object models from these data. We show two applications of the learned model: first, to rerank the images returned by the search engine, thus improving the quality of the search engine; and second, to recognize objects in other image data sets
- ā¦