10,938 research outputs found

    Temporal Exemplar-based Bayesian Networks for facial expression recognition

    Get PDF
    Proceedings of the International Conference on Machine Learning and Applications, 2008, p. 16-22.
    We present Temporal Exemplar-based Bayesian Networks (TEBNs) for facial expression recognition. The proposed Bayesian Network (BN) consists of three layers: an Observation layer, an Exemplars layer and a Prior Knowledge layer. In the Exemplars layer, an exemplar-based model is integrated with the BN to improve the accuracy of probability estimation. In the Prior Knowledge layer, the static BN is extended to a Temporal BN that models the temporal behavior of facial expression by taking historical observations into account. Experiments on the CMU expression database illustrate that the proposed TEBNs are effective in modeling the evolution of facial deformation. © 2008 IEEE.
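
    The paper's three-layer construction is not reproduced here, but the idea the abstract describes (an exemplar-based likelihood fused with a temporal prior over past observations) can be read as a recursive Bayesian filter. Below is a minimal sketch of that reading, assuming Gaussian-kernel likelihoods over stored exemplars and a first-order transition prior; all names and parameters are illustrative, not the authors' implementation.

```python
import numpy as np

def exemplar_likelihood(obs, exemplars, bandwidth=1.0):
    """Kernel-density estimate of p(observation | expression class);
    exemplars is one (N_c, D) array of stored feature vectors per class."""
    likes = np.empty(len(exemplars))
    for c, ex in enumerate(exemplars):
        d2 = np.sum((ex - obs) ** 2, axis=1)        # squared distance to each exemplar
        likes[c] = np.mean(np.exp(-d2 / (2.0 * bandwidth ** 2)))
    return likes

def temporal_filter(observations, exemplars, transition, prior):
    """Recursive Bayesian update: the previous posterior, pushed through a
    first-order expression-transition prior, is fused with the exemplar
    likelihood of the current observation."""
    belief = np.asarray(prior, dtype=float)
    posteriors = []
    for obs in observations:
        predicted = transition.T @ belief           # temporal prior from history
        belief = exemplar_likelihood(obs, exemplars) * predicted
        belief /= belief.sum()                      # normalise to a posterior
        posteriors.append(belief.copy())
    return posteriors
```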

    Taking the bite out of automated naming of characters in TV video

    No full text
    We investigate the problem of automatically labelling appearances of characters in TV or film material with their names. This is tremendously challenging due to the huge variation in imaged appearance of each character and the weakness and ambiguity of available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time-stamped character annotation by aligning subtitles and transcripts; (ii) strengthening the supervisory information by identifying when characters are speaking. In addition, we incorporate complementary cues of face matching and clothing matching to propose common annotations for face tracks, and consider choices of classifier which can potentially correct errors made in the automatic extraction of training data from the weak textual annotation. Results are presented on episodes of the TV series “Buffy the Vampire Slayer”.
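
    Novelty (i) hinges on aligning two word streams: subtitles carry timestamps but no speaker names, while transcripts carry names but no timestamps. A toy sketch of that alignment using a generic sequence matcher; the data formats are assumptions, and a production pipeline would need alignment tolerant to transcription noise.

```python
import difflib

def align_names_to_times(subtitles, transcript):
    """subtitles: list of (start, end, text) -- timed but unnamed;
    transcript: list of (speaker, text) -- named but untimed.
    Matching the two word streams transfers subtitle times to names."""
    sub_words, sub_times = [], []
    for start, end, text in subtitles:
        for w in text.lower().split():
            sub_words.append(w)
            sub_times.append((start, end))
    tr_words, tr_speakers = [], []
    for speaker, text in transcript:
        for w in text.lower().split():
            tr_words.append(w)
            tr_speakers.append(speaker)
    matcher = difflib.SequenceMatcher(a=sub_words, b=tr_words, autojunk=False)
    annotations = []
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):                       # matched word k of this block
            start, end = sub_times[a + k]
            annotations.append((start, end, tr_speakers[b + k]))
    return annotations
```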

    LOMo: Latent Ordinal Model for Facial Analysis in Videos

    Full text link
    We study the problem of facial analysis in videos. We propose a novel weakly supervised learning method that models the video event (expression, pain, etc.) as a sequence of automatically mined, discriminative sub-events (e.g. onset and offset phase for smile, brow lower and cheek raise for pain). The proposed model is inspired by the recent works on Multiple Instance Learning and latent SVM/HCRF -- it extends such frameworks to model the ordinal or temporal aspect in the videos, approximately. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video-based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations. In combination with complementary features, we report state-of-the-art results on these datasets. Comment: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
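
    The ordinal constraint can be made concrete with a small dynamic program: given per-frame responses of K sub-event templates, inference picks frames t_1 < ... < t_K that maximise the summed response, so sub-events are forced to occur in order. A minimal sketch of that inference step under this reading (the scoring matrix is assumed precomputed; this is not the authors' code):

```python
import numpy as np

def best_ordered_score(scores):
    """scores: (T, K) array, scores[t, k] = response of sub-event template k
    at frame t. Returns the best total score over frame assignments
    t_1 < t_2 < ... < t_K, i.e. sub-events kept in their ordinal order."""
    T, K = scores.shape
    dp = np.full((T, K), -np.inf)
    dp[:, 0] = scores[:, 0]
    for k in range(1, K):
        best_prev = -np.inf
        for t in range(1, T):
            # running max over dp[0..t-1, k-1]: template k must fire
            # strictly after wherever template k-1 fired
            best_prev = max(best_prev, dp[t - 1, k - 1])
            dp[t, k] = best_prev + scores[t, k]
    return dp[:, -1].max()
```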

    Familiarization through Ambient Images Alone

    Get PDF
    The term “ambient images” has begun to show up in much of the current literature on facial recognition. Ambient images are naturally occurring views of a face that capture the idiosyncratic ways in which a target face may vary (Ritchie & Burton, 2017). Much of the literature on ambient images has concluded that exposing people to ambient images of a target face can lead to improved facial recognition for that face. Some studies have even suggested that familiarity is the result of increased exposure to ambient images of a target face (Burton, Kramer, Ritchie, & Jenkins, 2016). The current study extended the literature on ambient images. Using the face sorting paradigm from Jenkins, White, Van Montfort, and Burton (2011), the current study served three purposes. First, it tested whether there was an incremental benefit to showing ambient images; specifically, we observed whether performance improved as participants were shown a low, medium, or high number of ambient images. Next, it attempted to provide a manipulation strong enough that participants would be able to perform the face sorting task perfectly after being exposed to a high number (45 total) of ambient images. Lastly, it introduced time data as a measure of face familiarity. The results fully supported one aim of the study and partially supported another. Time data were found to be an effective quantitative measure of familiarity. There was also some evidence of an incremental benefit of ambient images, but that benefit disappeared after viewing around 15 unique exemplar presentations of a novel identity’s face. Lastly, exposing participants to 45 ambient images alone did not cause them to reach perfect performance. The paper concludes with a discussion of the need to extend past ambient images to understand how best to mimic natural familiarity in a lab setting.

    Discriminatively Trained Latent Ordinal Model for Video Classification

    Full text link
    We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (e.g. onset and offset phase for "smile", running and jumping for "highjump"). The proposed model is inspired by the recent works on Multiple Instance Learning and latent SVM/HCRF -- it extends such frameworks to model the ordinal aspect in the videos, approximately. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video-based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method. Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150
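
    Training such a model typically alternates between inferring the latent ordered frames and updating the templates, in the latent-SVM spirit the abstract cites. A hedged sketch of that alternation (the dynamic program from the previous entry is repeated with backtracking so the block stands alone; the features, learning rate and plain subgradient step are all assumptions, not the published training procedure):

```python
import numpy as np

def ordered_argmax(S):
    """Frames t_1 < ... < t_K maximising summed template responses S (T, K);
    the DP from the inference sketch above, with backtracking added."""
    T, K = S.shape
    dp = np.full((T, K), -np.inf)
    arg = np.zeros((T, K), dtype=int)
    dp[:, 0] = S[:, 0]
    for k in range(1, K):
        best, best_t = -np.inf, 0
        for t in range(1, T):
            if dp[t - 1, k - 1] > best:             # running argmax over earlier frames
                best, best_t = dp[t - 1, k - 1], t - 1
            dp[t, k] = best + S[t, k]
            arg[t, k] = best_t
    frames = [int(np.argmax(dp[:, -1]))]
    for k in range(K - 1, 0, -1):                   # walk the argmax table backwards
        frames.append(arg[frames[-1], k])
    return frames[::-1]

def train_latent_ordinal(videos, labels, K, dim, epochs=10, lr=0.01, reg=1e-4):
    """Latent-SVM-style alternation: with templates fixed, pick the best
    ordered frames (latent step); with frames fixed, take a hinge-loss
    subgradient step on the sub-event templates."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((K, dim)) * 0.01        # one template per sub-event
    for _ in range(epochs):
        for X, y in zip(videos, labels):            # X: (T, dim) frame features, y: +/-1
            S = X @ W.T                             # per-frame template responses
            frames = ordered_argmax(S)              # latent ordered sub-events
            score = sum(S[t, k] for k, t in enumerate(frames))
            W *= 1.0 - lr * reg                     # weight decay
            if y * score < 1:                       # hinge loss is active
                for k, t in enumerate(frames):
                    W[k] += lr * y * X[t]
    return W
```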

    Mean value coordinates–based caricature and expression synthesis

    Get PDF
    We present a novel method for caricature synthesis based on mean value coordinates (MVC). Our method can be applied to any single frontal face image, learning from a specified caricature face pair to perform frontal and 3D caricature synthesis. The technique requires only one or a small number of exemplar pairs and a training set of natural frontal face images, yet the system can transfer the style of the exemplar pair across individuals. Further exaggeration can be performed in a controllable way. Our method also applies to facial expression transfer, interpolation, and exaggeration, which are applications of expression editing. Additionally, we have extended the approach to 3D caricature synthesis based on the 3D version of MVC. Experiments demonstrate that the transferred expressions are credible and that the resulting caricatures can be characterized and recognized.
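
    Mean value coordinates themselves are a standard construction (Floater's 2D formula), so the interpolation layer the method builds on can be sketched directly; the caricature-specific learning and exaggeration are not reproduced, and the names and polygon convention below are illustrative.

```python
import numpy as np

def mean_value_coordinates(p, poly):
    """Mean value coordinates of point p w.r.t. a closed 2D polygon (n, 2).
    Floater's formula: w_i = (tan(a_{i-1}/2) + tan(a_i/2)) / |v_i - p|,
    where a_i is the angle subtended at p by edge (v_i, v_{i+1}).
    Assumes p lies strictly inside the polygon, away from its vertices."""
    d = poly - p                                    # vectors from p to vertices
    r = np.linalg.norm(d, axis=1)                   # distances |v_i - p|
    n = len(poly)
    angles = np.empty(n)
    for i in range(n):
        j = (i + 1) % n
        cos_a = np.dot(d[i], d[j]) / (r[i] * r[j])
        angles[i] = np.arccos(np.clip(cos_a, -1.0, 1.0))
    w = (np.tan(angles[np.arange(n) - 1] / 2) + np.tan(angles / 2)) / r
    return w / w.sum()                              # partition of unity
```

    Given the coordinates `lam` of a point inside the source landmark polygon, `lam @ deformed_poly` yields its position under a deformed (e.g. exaggerated) polygon, which is how a landmark-level style transfer propagates smoothly to interior points.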

    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Full text link
    Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not yet been thoroughly evaluated is deformable face tracking "in-the-wild". Until now, performance has mainly been assessed qualitatively, by visual inspection of a deformable face tracking technology's results on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures, focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation; (b) generic model-free tracking plus generic facial landmark localisation; and (c) hybrid approaches using state-of-the-art face detection, model-free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic. Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.
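
    The three compared strategies compose naturally as pipelines. A schematic sketch under assumed interfaces; `detect_face`, `init_tracker`, `localise_landmarks` and `confident` are placeholders standing in for whatever detector, tracker and landmark fitter a pipeline uses, not any particular library's API.

```python
def detect_then_localise(frames, detect_face, localise_landmarks):
    """Strategy (a): run a generic face detector on every frame, then fit
    landmarks inside the detected box. No temporal information is used."""
    return [localise_landmarks(f, detect_face(f)) for f in frames]

def track_then_localise(frames, detect_face, init_tracker, localise_landmarks):
    """Strategy (b): detect once, then propagate the region with a generic
    model-free tracker and fit landmarks inside the tracked box."""
    tracker = init_tracker(frames[0], detect_face(frames[0]))
    return [localise_landmarks(f, tracker.update(f)) for f in frames]

def hybrid_track(frames, detect_face, init_tracker, localise_landmarks, confident):
    """Strategy (c): model-free tracking with detection-based recovery when
    the landmark fit looks unreliable (the confidence test is a placeholder)."""
    tracker = init_tracker(frames[0], detect_face(frames[0]))
    shapes = []
    for f in frames:
        shape = localise_landmarks(f, tracker.update(f))
        if not confident(shape):                    # failure check
            box = detect_face(f)                    # re-detect and re-initialise
            tracker = init_tracker(f, box)
            shape = localise_landmarks(f, box)
        shapes.append(shape)
    return shapes
```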
