
    LOMo: Latent Ordinal Model for Facial Analysis in Videos

    We study the problem of facial analysis in videos. We propose a novel weakly supervised learning method that models the video event (expression, pain, etc.) as a sequence of automatically mined, discriminative sub-events (e.g., onset and offset phase for smile, brow lower and cheek raise for pain). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF; it extends such frameworks to approximately model the ordinal or temporal aspect of the videos. We obtain consistent improvements over relevant competitive baselines on four challenging, publicly available video-based facial analysis datasets for prediction of expression, clinical pain, and intent in dyadic conversations. In combination with complementary features, we report state-of-the-art results on these datasets. Comment: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    Discriminatively Trained Latent Ordinal Model for Video Classification

    We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (e.g., onset and offset phase for "smile", running and jumping for "highjump"). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF; it extends such frameworks to approximately model the ordinal aspect of the videos. We obtain consistent improvements over relevant competitive baselines on four challenging, publicly available video-based facial analysis datasets for prediction of expression, clinical pain, and intent in dyadic conversations, and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method. Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150

    Temporal Exemplar-based Bayesian Networks for facial expression recognition

    Proceedings of the International Conference on Machine Learning and Applications, 2008, p. 16-22. We present Temporal Exemplar-based Bayesian Networks (TEBNs) for facial expression recognition. The proposed Bayesian Networks (BNs) consist of three layers: an Observation layer, an Exemplars layer, and a Prior Knowledge layer. In the Exemplars layer, an exemplar-based model is integrated with BNs to improve the accuracy of probability estimation. In the Prior Knowledge layer, static BNs are extended to Temporal BNs by considering historical observations to model the temporal behavior of facial expressions. An experiment on the CMU expression database illustrates that the proposed TEBNs are very efficient in modeling the evolution of facial deformation. © 2008 IEEE.

    An Experimental Investigation about the Integration of Facial Dynamics in Video-Based Face Recognition

    Recent psychological and neural studies indicate that when people talk, their changing facial expressions and head movements provide a dynamic cue for recognition. Therefore, both fixed facial features and dynamic personal characteristics are used in the human visual system (HVS) to recognize faces. However, most automatic recognition systems use only the static information, as it is unclear how the dynamic cue can be integrated and exploited. The few works attempting to combine facial structure and its dynamics do not consider the relative importance of these two cues; rather, they combine the two cues in an ad hoc manner. But what is the relative importance of these two cues separately? Does combining them systematically enhance recognition performance? To date, no work has extensively studied these issues. In this article, we investigate these issues by analyzing the effects of incorporating dynamic information in video-based automatic face recognition. We consider two factors (face sequence length and image quality) and study their effects on the performance of video-based systems that attempt to use a spatio-temporal representation instead of one based on a still image. We experiment with two different databases and consider HMM (the temporal hidden Markov model) and ARMA (the auto-regressive and moving average model) as baseline methods for the spatio-temporal representation, and PCA and LDA for the image-based one. The extensive experimental results show that motion information also enhances automatic recognition, but not in a systematic way as in the HVS.

    Facial Emotion Expressions in Human-Robot Interaction: A Survey

    Facial expressions are an ideal means of communicating one's emotions or intentions to others. This overview will focus on human facial expression recognition as well as robotic facial expression generation. In the case of human facial expression recognition, both facial expression recognition on predefined datasets and in real time will be covered. For robotic facial expression generation, both hand-coded and automated methods will be covered, i.e., methods in which the facial expressions of a robot are generated by moving the features (eyes, mouth) of the robot either by hand-coding or automatically using machine learning techniques. There are already plenty of studies that achieve high accuracy for emotion expression recognition on predefined datasets, but the accuracy for facial expression recognition in real time is comparatively lower. In the case of expression generation in robots, while most robots are capable of making basic facial expressions, there are not many studies that enable robots to do so automatically. In this overview, state-of-the-art research in facial emotion expressions during human-robot interaction is discussed, leading to several possible directions for future research.

    Recognition of human activities and expressions in video sequences using shape context descriptor

    The recognition of objects and classes of objects is of importance in the field of computer vision due to its applicability in areas such as video surveillance, medical imaging, and retrieval of images and videos from large databases on the Internet. Effective recognition of object classes is still a challenge in vision; hence, there is much interest in improving the rate of recognition in order to keep up with the rising demands of the fields where these techniques are being applied. This thesis investigates the recognition of activities and expressions in video sequences using a new descriptor called the spatiotemporal shape context. The shape context is a well-known algorithm that describes the shape of an object based upon the mutual distribution of points in the contour of the object; however, it falls short when the distinctive property of an object is not just its shape but also its movement across frames in a video sequence. Since actions and expressions tend to have a motion component that enhances the capability of distinguishing them, the shape-based information from the shape context proves insufficient. This thesis proposes new 3D and 4D spatiotemporal shape context descriptors that incorporate changes in motion across frames into the original shape context. Results of classification of actions and expressions demonstrate that the spatiotemporal shape context is better than the original shape context at enhancing recognition of classes in the activity and expression domains.

    Facial Expression Recognition in the Presence of Head Motion
