232,434 research outputs found

    Seeing Voices and Hearing Faces: Cross-modal biometric matching

    Full text link
    We introduce a seemingly impossible task: given only an audio clip of someone speaking, decide which of two face images is the speaker. In this paper we study this, and a number of related cross-modal tasks, aimed at answering the question: how much can we infer from the voice about the face and vice versa? We study this task "in the wild", employing the datasets that are now publicly available for face recognition from static images (VGGFace) and speaker identification from audio (VoxCeleb). These provide training and testing scenarios for both static and dynamic testing of cross-modal matching. We make the following contributions: (i) we introduce CNN architectures for both binary and multi-way cross-modal face and audio matching, (ii) we compare dynamic testing (where video information is available, but the audio is not from the same video) with static testing (where only a single still image is available), and (iii) we use human testing as a baseline to calibrate the difficulty of the task. We show that a CNN can indeed be trained to solve this task in both the static and dynamic scenarios, and is even well above chance on 10-way classification of the face given the voice. The CNN matches human performance on easy examples (e.g. different gender across faces) but exceeds human performance on more challenging examples (e.g. faces with the same gender, age and nationality).Comment: To appear in: IEEE Computer Vision and Pattern Recognition (CVPR), 201

    Evaluation of face recognition algorithms under noise

    Get PDF
    One of the major applications of computer vision and image processing is face recognition, where a computerized algorithm automatically identifies a person’s face from a large image dataset or even from a live video. This thesis addresses facial recognition, a topic that has been widely studied due to its importance in many applications in both civilian and military domains. The application of face recognition systems has expanded from security purposes to social networking sites, managing fraud, and improving user experience. Numerous algorithms have been designed to perform face recognition with good accuracy. This problem is challenging due to the dynamic nature of the human face and the different poses that it can take. Regardless of the algorithm, facial recognition accuracy can be heavily affected by the presence of noise. This thesis presents a comparison of traditional and deep learning face recognition algorithms under the presence of noise. For this purpose, Gaussian and salt-andpepper noises are applied to the face images drawn from the ORL Dataset. The image recognition is performed using each of the following eight algorithms: principal component analysis (PCA), two-dimensional PCA (2D-PCA), linear discriminant analysis (LDA), independent component analysis (ICA), discrete cosine transform (DCT), support vector machine (SVM), convolution neural network (CNN) and Alex Net. The ORL dataset was used in the experiments to calculate the evaluation accuracy for each of the investigated algorithms. Each algorithm is evaluated with two experiments; in the first experiment only one image per person is used for training, whereas in the second experiment, five images per person are used for training. The investigated traditional algorithms are implemented with MATLAB and the deep learning algorithms approaches are implemented with Python. The results show that the best performance was obtained using the DCT algorithm with 92% dominant eigenvalues and 95.25 % accuracy, whereas for deep learning, the best performance was using a CNN with accuracy of 97.95%, which makes it the best choice under noisy conditions

    The role of facial movements in emotion recognition

    Get PDF
    Most past research on emotion recognition has used photographs of posed expressions intended to depict the apex of the emotional display. Although these studies have provided important insights into how emotions are perceived in the face, they necessarily leave out any role of dynamic information. In this Review, we synthesize evidence from vision science, affective science and neuroscience to ask when, how and why dynamic information contributes to emotion recognition, beyond the information conveyed in static images. Dynamic displays offer distinctive temporal information such as the direction, quality and speed of movement, which recruit higher-level cognitive processes and support social and emotional inferences that enhance judgements of facial affect. The positive influence of dynamic information on emotion recognition is most evident in suboptimal conditions when observers are impaired and/or facial expressions are degraded or subtle. Dynamic displays further recruit early attentional and motivational resources in the perceiver, facilitating the prompt detection and prediction of others’ emotional states, with benefits for social interaction. Finally, because emotions can be expressed in various modalities, we examine the multimodal integration of dynamic and static cues across different channels, and conclude with suggestions for future research

    The Privacy Leakage of IP Camera Systems

    Get PDF
    For in-home security, intelligent operations like top individual recognition and minimizing losses due to home break-ins, emergencies, and fraud are keys to success. This application integrates the closed-circuit television (CCTV) camera and the deep learning algorithms used to process these images. Automated intrusion detection alerts, real-time fire alerts, smart checkout, and potentially fraudulent point of sale (POS) transactions are its main features. Dynamic intrusion with machine learning is a software program in which the price of certain products changes over time through an algorithm that considers a variety of pricing variables. The face locator is a part of the algorithm that locates and detects motion by using the image search function. The system collects all available product locations from the live videos from multiple cameras. This is a helpful feature for finding misplaced products and detecting POS user fraud. This intrusion detection system (IDS) records POS transaction details on the screen as an overlay on video images to reduce home break-ins. To improve the ease and speed of transaction searches, the faces of individuals are used to search for disputed cases. Smart Checkout System (SCS) utilizes a self-service kiosk where users can generate bills by showing products to the linked camera. SCS uses Google vision technology to identify products. Motion detector and queue detection will detect long queues at the checkout counter in real-time and open new lanes to speed up the transaction, improve the experience, and reduce the number of abandoned purchases. Face recognition premium and alerts can also be provided

    Facial Expression Recognition

    Get PDF

    Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories

    Get PDF
    In this paper, we propose a new approach for facial expression recognition using deep covariance descriptors. The solution is based on the idea of encoding local and global Deep Convolutional Neural Network (DCNN) features extracted from still images, in compact local and global covariance descriptors. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matrices. By conducting the classification of static facial expressions using Support Vector Machine (SVM) with a valid Gaussian kernel on the SPD manifold, we show that deep covariance descriptors are more effective than the standard classification with fully connected layers and softmax. Besides, we propose a completely new and original solution to model the temporal dynamic of facial expressions as deep trajectories on the SPD manifold. As an extension of the classification pipeline of covariance descriptors, we apply SVM with valid positive definite kernels derived from global alignment for deep covariance trajectories classification. By performing extensive experiments on the Oulu-CASIA, CK+, and SFEW datasets, we show that both the proposed static and dynamic approaches achieve state-of-the-art performance for facial expression recognition outperforming many recent approaches.Comment: A preliminary version of this work appeared in "Otberdout N, Kacem A, Daoudi M, Ballihi L, Berretti S. Deep Covariance Descriptors for Facial Expression Recognition, in British Machine Vision Conference 2018, BMVC 2018, Northumbria University, Newcastle, UK, September 3-6, 2018. ; 2018 :159." arXiv admin note: substantial text overlap with arXiv:1805.0386
    • …
    corecore