Seeing Voices and Hearing Faces: Cross-modal biometric matching
We introduce a seemingly impossible task: given only an audio clip of someone
speaking, decide which of two face images is the speaker. In this paper we
study this, and a number of related cross-modal tasks, aimed at answering the
question: how much can we infer from the voice about the face and vice versa?
We study this task "in the wild", employing the datasets that are now publicly
available for face recognition from static images (VGGFace) and speaker
identification from audio (VoxCeleb). These provide training and testing
scenarios for both static and dynamic testing of cross-modal matching. We make
the following contributions: (i) we introduce CNN architectures for both binary
and multi-way cross-modal face and audio matching, (ii) we compare dynamic
testing (where video information is available, but the audio is not from the
same video) with static testing (where only a single still image is available),
and (iii) we use human testing as a baseline to calibrate the difficulty of the
task. We show that a CNN can indeed be trained to solve this task in both the
static and dynamic scenarios, and is even well above chance on 10-way
classification of the face given the voice. The CNN matches human performance
on easy examples (e.g. different gender across faces) but exceeds human
performance on more challenging examples (e.g. faces with the same gender, age
and nationality).
Comment: To appear in: IEEE Computer Vision and Pattern Recognition (CVPR), 2018
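As a rough illustration of the binary matching setup described above (not the authors' released code), the sketch below assumes precomputed voice and face embeddings and a small fusion network that decides which of two faces belongs to the speaker; all module names, dimensions, and the embedding sources are assumptions.

```python
# Minimal sketch (assumptions, not the paper's architecture): given one voice
# embedding and two face embeddings, predict which face belongs to the speaker.
import torch
import torch.nn as nn

class CrossModalMatcher(nn.Module):
    def __init__(self, voice_dim=512, face_dim=512, hidden=256):
        super().__init__()
        # Projections mapping each modality into a shared space.
        self.voice_proj = nn.Sequential(nn.Linear(voice_dim, hidden), nn.ReLU())
        self.face_proj = nn.Sequential(nn.Linear(face_dim, hidden), nn.ReLU())
        # Classifier over the concatenated (voice, face A, face B) embeddings.
        self.classifier = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 2)
        )

    def forward(self, voice_feat, face_a_feat, face_b_feat):
        v = self.voice_proj(voice_feat)
        fa = self.face_proj(face_a_feat)
        fb = self.face_proj(face_b_feat)
        # Logits over {face A is the speaker, face B is the speaker}.
        return self.classifier(torch.cat([v, fa, fb], dim=1))

# Usage with random stand-ins for precomputed voice/face embeddings.
model = CrossModalMatcher()
voice = torch.randn(4, 512)
face_a, face_b = torch.randn(4, 512), torch.randn(4, 512)
logits = model(voice, face_a, face_b)   # shape (4, 2)
pred = logits.argmax(dim=1)             # 0 -> face A, 1 -> face B
```

The same fusion head can be widened to N face branches for the multi-way (e.g. 10-way) variant mentioned in the abstract; the two-branch form is shown only for concreteness.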
Evaluation of face recognition algorithms under noise
One of the major applications of computer vision and image processing is face recognition,
where a computerized algorithm automatically identifies a person’s face from
a large image dataset or even from a live video. This thesis addresses facial recognition,
a topic that has been widely studied due to its importance in many applications
in both civilian and military domains. The application of face recognition systems
has expanded from security purposes to social networking sites, managing fraud, and
improving user experience. Numerous algorithms have been designed to perform face
recognition with good accuracy. This problem is challenging due to the dynamic nature
of the human face and the different poses that it can take. Regardless of the
algorithm, facial recognition accuracy can be heavily affected by the presence of noise.
This thesis presents a comparison of traditional and deep learning face recognition
algorithms in the presence of noise. For this purpose, Gaussian and salt-and-pepper
noise are applied to face images drawn from the ORL dataset. Face recognition is
performed using each of the following eight algorithms: principal component analysis
(PCA), two-dimensional PCA (2D-PCA), linear discriminant analysis (LDA), independent
component analysis (ICA), discrete cosine transform (DCT), support vector machine
(SVM), convolutional neural network (CNN), and AlexNet. The ORL dataset is used in
the experiments to compute the evaluation accuracy of each investigated algorithm.
Each algorithm is evaluated in two experiments: in the first, only one image per
person is used for training, whereas in the second, five images per person are used
for training. The investigated traditional algorithms are implemented in MATLAB and
the deep learning approaches in Python. The results show that among the traditional
algorithms the best performance was obtained with the DCT algorithm using 92% of the
dominant eigenvalues, reaching 95.25% accuracy, whereas for deep learning the best
performance was obtained with a CNN at 97.95% accuracy, making it the best choice
under noisy conditions.
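A minimal sketch of the evaluation protocol this abstract describes, under assumptions of my own rather than the thesis code: Gaussian and salt-and-pepper noise applied to ORL-style faces, with an eigenfaces (PCA + nearest-neighbour) baseline scored on clean and noisy test sets. The 0.92 retained-variance setting loosely echoes the reported "92% of dominant eigenvalues" but, like the split and noise levels, is only illustrative.

```python
# Illustrative sketch (assumed settings, not the thesis implementation):
# add noise to ORL faces and score a PCA + 1-NN (eigenfaces) baseline.
import numpy as np
from sklearn.datasets import fetch_olivetti_faces   # the ORL/AT&T face set
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def add_gaussian_noise(x, sigma=0.1, rng=np.random.default_rng(0)):
    return np.clip(x + rng.normal(0.0, sigma, x.shape), 0.0, 1.0)

def add_salt_and_pepper(x, amount=0.05, rng=np.random.default_rng(0)):
    noisy = x.copy()
    mask = rng.random(x.shape)
    noisy[mask < amount / 2] = 0.0          # pepper
    noisy[mask > 1 - amount / 2] = 1.0      # salt
    return noisy

faces = fetch_olivetti_faces()
X, y = faces.data, faces.target             # 400 images, 40 subjects
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# Train once on clean images, then evaluate on clean and noisy test images.
pca = PCA(n_components=0.92, whiten=True).fit(X_train)   # keep 92% variance
clf = KNeighborsClassifier(n_neighbors=1).fit(pca.transform(X_train), y_train)
for name, X_eval in [("clean", X_test),
                     ("gaussian", add_gaussian_noise(X_test)),
                     ("salt-and-pepper", add_salt_and_pepper(X_test))]:
    acc = clf.score(pca.transform(X_eval), y_test)
    print(f"{name:16s} accuracy: {acc:.3f}")
```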
The role of facial movements in emotion recognition
Most past research on emotion recognition has used photographs of posed expressions intended to depict the apex of the emotional display. Although these studies have provided important insights into how emotions are perceived in the face, they necessarily leave out any role of dynamic information. In this Review, we synthesize evidence from vision science, affective science and neuroscience to ask when, how and why dynamic information contributes to emotion recognition, beyond the information conveyed in static images. Dynamic displays offer distinctive temporal information such as the direction, quality and speed of movement, which recruit higher-level cognitive processes and support social and emotional inferences that enhance judgements of facial affect. The positive influence of dynamic information on emotion recognition is most evident in suboptimal conditions when observers are impaired and/or facial expressions are degraded or subtle. Dynamic displays further recruit early attentional and motivational resources in the perceiver, facilitating the prompt detection and prediction of others’ emotional states, with benefits for social interaction. Finally, because emotions can be expressed in various modalities, we examine the multimodal integration of dynamic and static cues across different channels, and conclude with suggestions for future research
The Privacy Leakage of IP Camera Systems
For in-home security, intelligent operations such as recognizing key individuals and minimizing losses due to home break-ins, emergencies, and fraud are keys to success. This application integrates closed-circuit television (CCTV) cameras with the deep learning algorithms used to process their images. Its main features are automated intrusion detection alerts, real-time fire alerts, smart checkout, and flagging potentially fraudulent point-of-sale (POS) transactions. Dynamic pricing with machine learning is a software component in which the price of certain products changes over time through an algorithm that considers a variety of pricing variables. The face locator is the part of the algorithm that locates faces and detects motion using the image search function. The system collects all available product locations from live video from multiple cameras, which is helpful for finding misplaced products and detecting POS user fraud. The intrusion detection system (IDS) records POS transaction details as an overlay on the video images to reduce home break-ins. To improve the ease and speed of transaction searches, the faces of individuals are used to look up disputed cases. The Smart Checkout System (SCS) uses a self-service kiosk where users can generate bills by showing products to the linked camera; SCS uses Google vision technology to identify products. Motion and queue detection identify long queues at the checkout counter in real time and open new lanes to speed up transactions, improve the experience, and reduce the number of abandoned purchases. Face recognition and alerts can also be provided as premium features.
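The abstract gives no implementation details; as a hedged sketch of the motion-detection step behind the automated intrusion alerts, the snippet below uses OpenCV background subtraction over camera frames and raises an alert when a sufficiently large moving region appears. The area threshold, camera source, and alert action are all assumptions.

```python
# Rough sketch (my assumptions, not the described product) of motion-based
# intrusion alerting: background subtraction over CCTV frames, alert on a
# large foreground blob.
import cv2

MIN_AREA = 2000                 # assumed pixel-area threshold for "motion"
cap = cv2.VideoCapture(0)       # or an RTSP URL for an IP camera
backsub = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=25)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = backsub.apply(frame)
    # Clean up the foreground mask, then look for sufficiently large blobs.
    mask = cv2.morphologyEx(
        mask, cv2.MORPH_OPEN,
        cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if any(cv2.contourArea(c) > MIN_AREA for c in contours):
        print("motion detected: trigger intrusion alert / start recording")
cap.release()
```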
Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories
In this paper, we propose a new approach for facial expression recognition
using deep covariance descriptors. The solution is based on the idea of
encoding local and global Deep Convolutional Neural Network (DCNN) features
extracted from still images, in compact local and global covariance
descriptors. The space geometry of the covariance matrices is that of Symmetric
Positive Definite (SPD) matrices. By conducting the classification of static
facial expressions using Support Vector Machine (SVM) with a valid Gaussian
kernel on the SPD manifold, we show that deep covariance descriptors are more
effective than the standard classification with fully connected layers and
softmax. In addition, we propose a new solution that models the temporal dynamics
of facial expressions as deep trajectories on the SPD
manifold. As an extension of the classification pipeline of covariance
descriptors, we apply SVM with valid positive definite kernels derived from
global alignment for deep covariance trajectories classification. By performing
extensive experiments on the Oulu-CASIA, CK+, and SFEW datasets, we show that
both the proposed static and dynamic approaches achieve state-of-the-art
performance for facial expression recognition, outperforming many recent approaches.
Comment: A preliminary version of this work appeared in "Otberdout N, Kacem A,
Daoudi M, Ballihi L, Berretti S. Deep Covariance Descriptors for Facial
Expression Recognition, in British Machine Vision Conference 2018, BMVC 2018,
Northumbria University, Newcastle, UK, September 3-6, 2018. ; 2018 :159."
arXiv admin note: substantial text overlap with arXiv:1805.0386
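Two of the building blocks named in this abstract can be sketched compactly: forming a covariance (SPD) descriptor from DCNN feature maps, and a log-Euclidean Gaussian kernel, which is one valid positive-definite kernel on the SPD manifold that can feed a kernel SVM. This is an illustrative sketch under my own assumptions, not the authors' pipeline (which additionally uses trajectory kernels derived from global alignment).

```python
# Illustrative sketch (not the authors' implementation): covariance descriptor
# from DCNN activations + log-Euclidean Gaussian kernel for a kernel SVM.
import numpy as np
from scipy.linalg import logm
from sklearn.svm import SVC

def covariance_descriptor(feature_map, eps=1e-5):
    """feature_map: (C, H, W) DCNN activations -> (C, C) SPD covariance."""
    c, h, w = feature_map.shape
    f = feature_map.reshape(c, h * w)
    cov = np.cov(f)                          # (C, C) sample covariance
    return cov + eps * np.eye(c)             # regularize to keep it SPD

def log_euclidean_gaussian_kernel(covs_a, covs_b, gamma=1e-3):
    """Gram matrix K[i, j] = exp(-gamma * ||log(A_i) - log(B_j)||_F^2)."""
    logs_a = [logm(a).real for a in covs_a]
    logs_b = [logm(b).real for b in covs_b]
    K = np.zeros((len(logs_a), len(logs_b)))
    for i, la in enumerate(logs_a):
        for j, lb in enumerate(logs_b):
            K[i, j] = np.exp(-gamma * np.linalg.norm(la - lb, "fro") ** 2)
    return K

# Toy usage with random feature maps standing in for DCNN activations.
rng = np.random.default_rng(0)
covs = [covariance_descriptor(rng.standard_normal((16, 7, 7))) for _ in range(20)]
labels = rng.integers(0, 2, size=20)
K_train = log_euclidean_gaussian_kernel(covs, covs)
svm = SVC(kernel="precomputed").fit(K_train, labels)
print(svm.predict(K_train)[:5])
```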