A Survey on Ear Biometrics
Recognizing people by their ears has recently received significant attention in the literature. Several reasons account for this trend: first, ear recognition does not suffer from some problems associated with other non-contact biometrics, such as face recognition; second, it is the most promising candidate for combination with the face in the context of multi-pose face recognition; and third, the ear can be used for human recognition in surveillance videos where the face may be partially or completely occluded. Further, the ear appears to change little with age. Although current ear detection and recognition systems have reached a certain level of maturity, their success remains limited to controlled indoor conditions. In addition to variation in illumination, open research problems include hair occlusion, earprint forensics, ear symmetry, ear classification, and ear individuality. This paper provides a detailed survey of research conducted in ear detection and recognition. It offers an up-to-date review of the existing literature, revealing the current state of the art both for those working in this area and for those who might wish to exploit this new approach. Furthermore, it discusses some unsolved ear recognition problems as well as the ear databases available to researchers.
Online Mutual Foreground Segmentation for Multispectral Stereo Videos
The segmentation of video sequences into foreground and background regions is
a low-level process commonly used in video content analysis and smart
surveillance applications. Using a multispectral camera setup can improve this
process by providing more diverse data to help identify objects despite adverse
imaging conditions. The registration of several data sources is, however, not
trivial if the appearance of the objects produced by each sensor differs
substantially. The problem is further complicated by parallax effects, which
cannot be ignored when using close-range stereo pairs. In this work, we present a new
method to simultaneously tackle multispectral segmentation and stereo
registration. Using an iterative procedure, we estimate the labeling result for
one problem using the provisional result of the other. Our approach is based on
the alternating minimization of two energy functions that are linked through
the use of dynamic priors. We rely on the integration of shape and appearance
cues to find proper multispectral correspondences, and to properly segment
objects in low contrast regions. We also formulate our model as a frame
processing pipeline using higher order terms to improve the temporal coherence
of our results. Our method is evaluated under different configurations on
multiple multispectral datasets, and our implementation is available online.
Comment: Preprint accepted for publication in IJCV (December 2018).
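The alternating scheme is easier to see in miniature. Below is a minimal, hypothetical Python/numpy sketch, not the authors' implementation (which minimizes two linked energy functions with shape and appearance cues and higher-order temporal terms): it alternates a toy 1-D stereo registration step with a segmentation step, each reusing the other's provisional result as a dynamic prior. The synthetic images, the candidate shift range, and the 0.5 fusion threshold are all invented for illustration.

```python
import numpy as np

def alternate_minimization(rgb, lwir, n_iters=5):
    """Toy alternating scheme: each step minimizes a simple cost for one
    problem (registration or segmentation) using the other's provisional
    result as a dynamic prior. Inputs are intensity images in [0, 1] where
    foreground pixels are assumed bright in both spectra."""
    mask = rgb > rgb.mean()   # provisional foreground mask from appearance alone
    shift = 0                 # provisional stereo registration (1-D pixel shift)
    for _ in range(n_iters):
        # Registration step: choose the horizontal shift that best aligns the
        # LWIR image with the current foreground mask (shape cue as prior).
        shift = min(range(-3, 4),
                    key=lambda s: np.abs(np.roll(lwir, s, axis=1)[mask] - 1.0).sum())
        # Segmentation step: re-segment by fusing RGB appearance with the
        # registered LWIR evidence (appearance cue plus dynamic prior).
        prior = np.roll(lwir, shift, axis=1)
        mask = (0.5 * rgb + 0.5 * prior) > 0.5
    return mask, shift

# A tiny synthetic stereo pair: the same object appears one pixel to the
# right in the LWIR view due to parallax.
rgb  = np.zeros((8, 8)); rgb[2:6, 2:6]  = 1.0
lwir = np.zeros((8, 8)); lwir[2:6, 3:7] = 1.0
mask, shift = alternate_minimization(rgb, lwir)
print(shift)  # -1: shifting the LWIR view left by one pixel aligns the views
```

The paper's energies are far richer, but the control flow, two minimizations alternating and exchanging provisional results as priors, is the same idea.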
EmoNets: Multimodal deep learning approaches for emotion recognition in video
The task in the Emotion Recognition in the Wild (EmotiW) Challenge is to
assign one of seven emotions to short video clips extracted from Hollywood
style movies. The videos depict acted-out emotions under realistic conditions
with a large degree of variation in attributes such as pose and illumination,
making it worthwhile to explore approaches which consider combinations of
features from multiple modalities for label assignment. In this paper we
present our approach to learning several specialist models using deep learning
techniques, each focusing on one modality. Among these are a convolutional
neural network, which captures visual information in detected faces; a deep
belief net, which models the audio stream; a K-Means-based "bag-of-mouths"
model, which extracts visual features around the mouth region; and a
relational autoencoder, which addresses spatio-temporal aspects of the videos.
We explore multiple methods for combining cues from these modalities into one
common classifier. This fusion achieves considerably greater accuracy than our
strongest single-modality classifier alone. Our method was the winning
submission in the 2013 EmotiW challenge and achieved a test-set accuracy of
47.67% on the 2014 dataset.
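As a rough sketch of what combining per-modality cues can look like, the following implements one simple strategy, weighted late fusion of class probabilities. The modality outputs, the weights, and the `late_fusion` helper are hypothetical, and the winning entry explored several different combination methods.

```python
import numpy as np

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def late_fusion(modality_probs, weights=None):
    """Weighted average of per-modality class-probability vectors. In practice
    the weights would be tuned on a validation set."""
    probs = np.asarray(modality_probs)            # (n_modalities, n_classes)
    if weights is None:
        weights = np.full(len(probs), 1.0 / len(probs))
    fused = np.average(probs, axis=0, weights=weights)
    return fused / fused.sum()                    # renormalize to a distribution

# Hypothetical softmax outputs from three specialist models for one clip.
face_cnn   = np.array([0.05, 0.02, 0.08, 0.60, 0.10, 0.10, 0.05])
audio_dbn  = np.array([0.10, 0.05, 0.10, 0.40, 0.15, 0.10, 0.10])
bag_mouths = np.array([0.08, 0.04, 0.08, 0.50, 0.10, 0.12, 0.08])

fused = late_fusion([face_cnn, audio_dbn, bag_mouths], weights=[0.5, 0.3, 0.2])
print(EMOTIONS[int(np.argmax(fused))])  # -> happy
```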
Infrared face recognition: a comprehensive review of methodologies and databases
Automatic face recognition is an area with immense practical potential which
includes a wide range of commercial and law enforcement applications. Hence it
is unsurprising that it continues to be one of the most active research areas
of computer vision. Even after over three decades of intense research, the
state-of-the-art in face recognition continues to improve, benefitting from
advances in a range of different research fields such as image processing,
pattern recognition, computer graphics, and physiology. Systems based on
visible spectrum images, the most researched face recognition modality, have
reached a significant level of maturity with some practical success. However,
they continue to face challenges in the presence of illumination, pose and
expression changes, as well as facial disguises, all of which can significantly
decrease recognition accuracy. Amongst various approaches which have been
proposed in an attempt to overcome these limitations, the use of infrared (IR)
imaging has emerged as a particularly promising research direction. This paper
presents a comprehensive and timely review of the literature on this subject.
Our key contributions are: (i) a summary of the inherent properties of infrared
imaging that make this modality promising in the context of face recognition,
(ii) a systematic review of the most influential approaches, with a focus on
emerging common trends as well as key differences between alternative
methodologies, (iii) a description of the main databases of infrared facial
images available to the researcher, and lastly (iv) a discussion of the most
promising avenues for future research.
Comment: Pattern Recognition, 2014.
Neural correlates of the processing of co-speech gestures
In communicative situations, speech is often accompanied by gestures. For example, speakers tend to illustrate certain contents of speech by means of iconic gestures, hand movements that bear a formal relationship to the content of the speech. The meaning of an iconic gesture is determined both by its form and by the speech context in which it is performed; gesture and speech thus interact in comprehension. Using fMRI, the present study investigated which brain areas are involved in this interaction process. Participants watched videos in which sentences containing an ambiguous word (e.g. "She touched the mouse") were accompanied by either a meaningless grooming movement, a gesture supporting the more frequent dominant meaning (e.g. the animal), or a gesture supporting the less frequent subordinate meaning (e.g. the computer device). We hypothesized that brain areas involved in the interaction of gesture and speech would show greater activation to gesture-supported sentences than to sentences accompanied by a meaningless grooming movement. The main result is that, when contrasted with grooming, both types of gesture (dominant and subordinate) activated an array of brain regions consisting of the left posterior superior temporal sulcus (STS), the inferior parietal lobule bilaterally, and the ventral precentral sulcus bilaterally. Given the crucial role of the STS in audiovisual integration, this activation might reflect the interaction between the meaning of the gesture and the ambiguous sentence. The activations in inferior frontal and inferior parietal regions may reflect a mechanism for determining the goal of co-speech hand movements through an observation-execution matching process.