
    Deep Multimodal Speaker Naming

    Automatic speaker naming is the problem of localizing as well as identifying each speaking character in a TV, movie, or live show video. The problem is challenging mainly because of its multimodal nature: face cues alone are insufficient to achieve good performance. Previous multimodal approaches usually process the data of different modalities individually and merge them using handcrafted heuristics. Such approaches work well for simple scenes, but fail to achieve high performance for speakers with large appearance variations. In this paper, we propose a novel convolutional neural network (CNN) based learning framework to automatically learn the fusion function of both face and audio cues. We show that, without using face tracking, facial landmark localization, or subtitles/transcripts, our system with robust multimodal feature extraction achieves state-of-the-art speaker naming performance on two diverse TV series. The dataset and implementation of our algorithm are publicly available online.

    Smile detection in the wild based on transfer learning

    Smile detection from unconstrained facial images is a specialized and challenging problem. As one of the most informative expressions, smiles convey basic underlying emotions, such as happiness and satisfaction, which leads to multiple applications, e.g., human behavior analysis and interactive control. Compared to the size of databases for face recognition, far less labeled data is available for training smile detection systems. To leverage the large amount of labeled data from face recognition datasets and to alleviate overfitting on smile detection, an efficient transfer learning-based smile detection approach is proposed in this paper. Unlike previous works, which either use hand-engineered features or train deep convolutional networks from scratch, a well-trained deep face recognition model is explored and fine-tuned for smile detection in the wild. Three different models are built by fine-tuning the face recognition model with different inputs: aligned, unaligned, and grayscale images generated from the GENKI-4K dataset. Experiments show that the proposed approach achieves improved state-of-the-art performance. The robustness of the model to noise and blur artifacts is also evaluated in this paper.

    Static and Dynamic Facial Emotion Recognition Using Neural Network Models

    Emotion recognition is the process of identifying human emotions. It is made possible by processing various modalities, including facial expressions, speech signals, biometric signals, etc. With the advancements in computing technologies, Facial Emotion Recognition (FER) became important for several applications in which the user's emotional state is required, such as emotional training for autistic children. Recent years witnessed a major leap in Artificial Intelligence (AI), especially neural networks for computer vision applications. In this thesis, we investigate the application of AI algorithms for FER from static and dynamic data. Our experiments address the limitations and challenges of previous works, such as limited generalizability due to the datasets. We compare the performance of machine learning classifiers and convolutional neural networks (CNNs) for FER from static data (images). Moreover, we study the performance of the proposed CNN for dynamic FER (videos), in addition to Long Short-Term Memory (LSTM) in a CNN-LSTM hybrid approach that utilizes the temporal information in the videos. The proposed CNN architecture outperformed the other classifiers with an accuracy of 86.5%. It also outperformed the hybrid approach for dynamic FER, which achieved an accuracy of 74.6

    Time-Efficient Hybrid Approach for Facial Expression Recognition

    Facial expression recognition is an emerging research area for improving human-computer interaction. This research plays a significant role in social communication, commercial enterprise, law enforcement, and other computer interactions. In this paper, we propose a time-efficient hybrid design for facial expression recognition that combines image pre-processing steps with different Convolutional Neural Network (CNN) structures, providing better accuracy and greatly reduced training time. We predict seven basic emotions of human faces: sadness, happiness, disgust, anger, fear, surprise, and neutral. The model performs well on challenging cases in which the expressed emotion could be one of several with quite similar facial characteristics, such as anger, disgust, and sadness. The model was tested across multiple databases and different facial orientations and, to the best of our knowledge, provided an accuracy of about 89.58% for the KDEF dataset, 100% for the JAFFE dataset, and 71.975% for the combined (KDEF + JAFFE + SFEW) dataset across these different scenarios. Performance was evaluated with cross-validation techniques to avoid bias towards a specific set of images from a database.

    Detection of Driver Drowsiness and Distraction Using Computer Vision and Machine Learning Approaches

    Drowsiness and distracted driving are leading factors in most car crashes and near-crashes. This research study investigates the application of both conventional computer vision and deep learning approaches to the detection of drowsiness and distraction in drivers. In the first part of this MPhil research study, conventional computer vision approaches were studied to develop a robust drowsiness and distraction detection system based on yawning detection, head pose detection, and eye blinking detection. These algorithms were implemented using existing human-crafted features. Detection and classification experiments were performed on small image datasets to measure the performance of the system. It was observed that the use of human-crafted features together with a robust classifier such as an SVM gives better performance than previous approaches. Although the results were satisfactory, conventional computer vision approaches have many drawbacks and challenges, such as the definition and extraction of human-crafted features, which make these algorithms subjective in nature and less adaptive in practice. In contrast, deep learning approaches automate the feature selection process and can be trained to learn the most discriminative features without human input. In the second half of this research study, the use of deep learning approaches for the detection of distracted driving was investigated. It was observed that one advantage of the applied methodology for distraction detection is the contribution of CNN enhancement to better pattern recognition accuracy and its ability to learn features from various regions of a human body simultaneously.
We compared the performance of four convolutional deep net architectures (AlexNet, ResNet, MobileNet, and NASNet), investigated triplet training, and explored the impact of combining a support vector classifier (SVC) with a trained deep net. The images used in our experiments with the deep nets are from the State Farm Distracted Driver Detection dataset hosted on Kaggle, each of which captures the entire body of a driver. The best results were obtained with the NASNet trained using triplet loss and combined with an SVC. It was observed that one of the advantages of deep learning approaches is their ability to learn discriminative features from various regions of a human body simultaneously. This ability has enabled deep learning approaches to reach human-level accuracy.
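
The triplet training mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the distance function or the margin, so the Euclidean distance, the `margin=0.2` default, and the toy two-dimensional embeddings below are all assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss (margin value is an assumption).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`; otherwise it is positive.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (hypothetical values, not taken from the study).
anchor = [0.0, 0.0]
same_class = [0.0, 1.0]   # distance 1 from the anchor
other_class = [3.0, 4.0]  # distance 5 from the anchor

# Well-separated triplet: the constraint is already satisfied.
easy = triplet_loss(anchor, same_class, other_class)
# Swapped roles: the "positive" is far away, so the loss is large.
hard = triplet_loss(anchor, other_class, same_class)
```

Some formulations use squared distances instead; either way, the embeddings produced by a network trained with such a loss could then be passed to a separate classifier, which is the role the abstract assigns to the SVC.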

    Improving Facial Emotion Recognition with Image processing and Deep Learning

    Humans often use facial expressions along with words in order to communicate effectively. There has been extensive study of how facial emotion can be classified with computer vision methodologies. These have had varying levels of success given the challenges and limitations of databases, such as static data or facial capture in non-real environments. Given this, we believe that new preprocessing techniques are required to improve the accuracy of facial detection models. In this paper, we propose a new yet simple method for facial expression recognition that enhances accuracy. We conducted our experiments on the FER-2013 dataset, which contains static facial images. We utilized Unsharp Mask and Histogram equalization to emphasize the texture and details of the images. We implemented Convolutional Neural Networks (CNNs) to classify the images into 7 different facial expressions, yielding an accuracy of 69.46% on the test set. We also employed pre-trained models such as ResNet-50, SENet-50, VGG16, and FaceNet, and applied transfer learning to achieve an accuracy of 76.01% using an ensemble of seven models.
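
    The two preprocessing steps named in this abstract can be sketched in plain Python on a small grayscale image stored as a list of rows. This is only an illustration under stated assumptions: the abstract gives no parameters, so the 3x3 box blur (a Gaussian blur is more common in practice), the `amount` value, and the helper names are hypothetical, and the order in which the authors applied the two steps is not specified.

```python
def equalize_histogram(img, levels=256):
    """Spread intensities using the cumulative histogram of the image."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to spread
        return [row[:] for row in img]
    scale = (levels - 1) / (n - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in img]


def box_blur(img):
    """3x3 mean filter with edge replication (assumed blur kernel)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
            row.append(acc / 9)
        out.append(row)
    return out


def unsharp_mask(img, amount=1.0):
    """Sharpen by adding back the difference between image and blur."""
    blurred = box_blur(img)
    return [
        [min(255, max(0, round(p + amount * (p - b)))) for p, b in zip(row, brow)]
        for row, brow in zip(img, blurred)
    ]


# Tiny synthetic examples (not FER-2013 data).
low_contrast = [[50, 50], [100, 150]]
equalized = equalize_histogram(low_contrast)

step_edge = [[0, 0, 100, 100] for _ in range(3)]
sharpened = unsharp_mask(step_edge)
```

    Equalization stretches the narrow 50 to 150 range over the full 0 to 255 range, and the unsharp mask overshoots on the bright side of the edge, which is the contrast boost that makes texture and detail more prominent.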

    Improving Human Face Recognition Using Deep Learning Based Image Registration And Multi-Classifier Approaches

    Face detection, registration, and recognition have become a fascinating field for researchers. The motivation behind the enormous interest in the topic is the need to improve the accuracy of many real-time applications. Countless methodologies have been presented in past years. The visual complexity of the human face and the significant changes it undergoes under different conditions make it challenging to design and implement a powerful computational system for object recognition in general and human face recognition in particular. Supervised learning often requires extensive training, which results in high execution times. Applying strong preprocessing approaches such as face registration is an essential step in face recognition for achieving a high recognition accuracy rate. Although existing approaches perform both detection and recognition, we believe the absence of a complete end-to-end system capable of performing recognition from an arbitrary scene is in large part due to the difficulty of alignment. Often, face registration is ignored, with the assumption that the detector will perform a rough alignment, leading to suboptimal recognition performance. In this research, we present an enhanced approach to improve human face recognition using a back-propagation neural network (BPNN) and feature extraction based on the correlation between the training images. A key contribution of this paper is the generation of a new set, called the T-Dataset, from the original training data set, which is used to train the BPNN. We generated the T-Dataset using the correlation between the training images without using a common technique of image density. The correlated T-Dataset provides a high distinction layer between the training images, which helps the BPNN to converge faster and achieve better accuracy.
Data and feature reduction is essential in the face recognition process, and researchers have recently focused on modern neural networks. Therefore, we used classical Principal Component Analysis (PCA) and Local Binary Patterns (LBP) to show that there is potential for improvement even with traditional methods. We applied five distance measurement algorithms and then combined them to obtain the T-Dataset, which we fed into the BPNN. By using reduced image features, we achieved higher face recognition accuracy with less computational cost compared with the current approach. We tested the proposed framework on two small data sets, the YALE and AT&T data sets, as the ground truth. We achieved tremendous accuracy. Furthermore, we evaluated our method on one of the state-of-the-art benchmark data sets, Labeled Faces in the Wild (LFW), where we produced competitive face recognition performance. In addition, we present an enhanced framework to improve face registration using a deep learning model. We used deep architectures such as VGG16 and VGG19 to train our method. We trained our model to learn the transformation parameters (rotation, scaling, and shifting). By learning the transformation parameters, we are able to transform the image back to the frontal domain. We used the LFW dataset to evaluate our method, and we achieved high accuracy.

    Using Bezier Curve analysis in context of Expression Analysis

    Affective computing is an area of research under increasing demand in the field of computer vision. Expression analysis, in particular, is a topic that has been researched for many years. In this paper, an algorithm for expression determination and analysis is presented for the detection of seven expressions: sadness, anger, happiness, neutral, fear, disgust, and surprise. First, the 68 landmarks of the face are detected and the face is realigned and warped to obtain a new image. Next, feature extraction is performed using LPQ. We then use a dimensionality reduction algorithm followed by a dual RBF-SVM and AdaBoost classification algorithm to find the interest points in the extracted features. We then plot Bezier curves on the regions of interest obtained. The curves are then used as the input to a CNN, which determines the facial expression. The results showed the algorithm to be extremely successful.
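
    To make the Bezier curve step concrete, the sketch below evaluates a curve from a few control points using De Casteljau's algorithm. It is an illustration only: the abstract does not say how the curves are computed or which interest points serve as control points, so the function names and the three hypothetical "mouth" points are assumptions.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] with De Casteljau's algorithm.

    Repeatedly interpolates between neighbouring points until a single
    point remains; works for any number of control points.
    """
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


def sample_curve(control_points, n=5):
    """Return n evenly spaced samples along the curve (n >= 2)."""
    return [bezier_point(control_points, i / (n - 1)) for i in range(n)]


# Hypothetical control points: two mouth corners and a raised midpoint.
mouth = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
curve = sample_curve(mouth, n=5)
```

    The sampled points start and end at the outer control points and bend towards the middle one, so a curve like this could be rasterized into an image before being passed to a classifier such as the CNN described above.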