
    Taking the bite out of automated naming of characters in TV video

    We investigate the problem of automatically labelling appearances of characters in TV or film material with their names. This is tremendously challenging due to the huge variation in imaged appearance of each character and the weakness and ambiguity of available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time-stamped character annotation by aligning subtitles and transcripts; (ii) strengthening the supervisory information by identifying when characters are speaking. In addition, we incorporate complementary cues of face matching and clothing matching to propose common annotations for face tracks, and consider choices of classifier which can potentially correct errors made in the automatic extraction of training data from the weak textual annotation. Results are presented on episodes of the TV series "Buffy the Vampire Slayer".
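The subtitle–transcript alignment described above can be sketched minimally: subtitles carry timestamps but no speaker names, while transcripts carry names but no timestamps, so matching near-identical text lines transfers each name onto a time span. The data structures, the similarity measure (`difflib`), and the threshold below are illustrative assumptions, not the paper's implementation.

```python
from difflib import SequenceMatcher

# Hypothetical inputs: subtitles as (start_sec, end_sec, text);
# transcript lines as (speaker_name, text). Names and lines are invented.
subtitles = [
    (10.5, 12.0, "we have to stop the ascension"),
    (13.0, 15.5, "the mayor will become a demon"),
]
transcript = [
    ("BUFFY", "we have to stop the ascension"),
    ("GILES", "the mayor will become a demon"),
]

def align(subtitles, transcript, threshold=0.8):
    """Match each subtitle to its most similar transcript line and,
    if the texts agree closely enough, transfer the speaker name
    onto the subtitle's time span."""
    labelled = []
    for start, end, sub_text in subtitles:
        name, line = max(
            transcript,
            key=lambda t: SequenceMatcher(None, sub_text, t[1]).ratio(),
        )
        if SequenceMatcher(None, sub_text, line).ratio() >= threshold:
            labelled.append((start, end, name))
    return labelled
```

Running `align(subtitles, transcript)` yields time-stamped name annotations such as `(10.5, 12.0, "BUFFY")`, which is the weak supervision the paper then refines with speaker detection.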

    Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues

    We explore the task of recognizing people's identities in photo albums in an unconstrained setting. To facilitate this, we introduce the new People In Photo Albums (PIPA) dataset, consisting of over 60,000 instances of 2,000 individuals collected from public Flickr photo albums. With only about half of the person images containing a frontal face, the recognition task is very challenging due to the large variations in pose, clothing, camera viewpoint, image resolution and illumination. We propose the Pose Invariant PErson Recognition (PIPER) method, which accumulates the cues of poselet-level person recognizers trained by deep convolutional networks to discount for the pose variations, combined with a face recognizer and a global recognizer. Experiments on three different settings confirm that in our unconstrained setup PIPER significantly improves on the performance of DeepFace, which is one of the best face recognizers as measured on the LFW dataset.
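The cue-accumulation idea can be illustrated with a toy score-fusion sketch: each available cue (face, global body, individual poselet recognizers) scores every identity, and the scores of whichever cues fired are summed. The function name, the uniform weights, and the example cues are assumptions for illustration; PIPER's actual combination is learned, not a fixed sum.

```python
def fuse_cues(cue_scores, weights=None):
    """cue_scores: dict mapping cue name -> list of per-identity scores.
    Cues missing for an instance (e.g. no frontal face detected) are
    simply absent from the dict; the remaining cues are accumulated."""
    names = list(cue_scores)
    if weights is None:
        weights = {n: 1.0 for n in names}  # uniform weights, illustrative only
    n_ids = len(next(iter(cue_scores.values())))
    total = [sum(weights[c] * cue_scores[c][i] for c in names) for i in range(n_ids)]
    return total.index(max(total))  # index of the predicted identity
```

The point of the sketch is robustness to missing cues: when the face cue is absent, the poselet and global cues still vote, which is why the method degrades gracefully on non-frontal images.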

    Improved Face Tracking Thanks to Local Features Correspondence

    In this paper, we propose a technique to enhance the quality of detected face tracks in videos. In particular, we present a tracking algorithm that improves the temporal localization of the tracks, remedying the unavoidable failures of face detection algorithms. Local features are extracted and tracked to "fill the gaps" left by missed detections. The principal aim of this work is to provide robust and well-localized face tracks to a system of Interactive Movietelling, but the approach can be applied wherever a specific face must be localized, even in environments where face detection is, for any reason, difficult. We test the effectiveness of the proposed algorithm in terms of face localization both in space and time, first assessing the performance in an ad-hoc simulation scenario and then showing output examples on real-world video sequences.
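The gap-filling idea can be shown with a simplified stand-in: below, missing bounding boxes between two detections are filled by linear interpolation of box coordinates. The paper instead guides the completion with tracked local features, but the track-densification step it performs has this shape.

```python
def fill_gaps(track):
    """track: dict mapping frame_index -> (x, y, w, h) for frames where
    the face detector fired. Returns a dense track in which the frames
    between consecutive detections are linearly interpolated."""
    frames = sorted(track)
    dense = dict(track)
    for a, b in zip(frames, frames[1:]):
        span = b - a
        for f in range(a + 1, b):
            t = (f - a) / span  # interpolation weight within the gap
            dense[f] = tuple(
                (1 - t) * pa + t * pb for pa, pb in zip(track[a], track[b])
            )
    return dense
```

For example, a track detected only at frames 0 and 4 becomes dense over frames 0 through 4, with the intermediate boxes placed along the straight-line motion between the two detections.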

    Person Recognition in Personal Photo Collections

    Recognising persons in everyday photos presents major challenges (occluded faces, different clothing, locations, etc.) for machine vision. We propose a convnet-based person recognition system, on which we provide an in-depth analysis of the informativeness of different body cues, the impact of training data, and the common failure modes of the system. In addition, we discuss the limitations of existing benchmarks and propose more challenging ones. Our method is simple and is built on open source and open data, yet it improves state-of-the-art results on a large dataset of social media photos (PIPA). (Accepted to ICCV 2015.)

    Unsupervised discovery of character dictionaries in animation movies

    Automatic content analysis of animation movies can enable an objective understanding of character (actor) representations and their portrayals. It can also help illuminate potential markers of unconscious biases and their impact. However, multimedia analysis of movie content has predominantly focused on live-action features. The dearth of multimedia research in this area stems from the complexity and heterogeneity in the design of animated characters, an extremely challenging problem to generalize with a single method or model. In this paper, we address the problem of automatically discovering characters in animation movies as a first step toward automatic character labeling in these media. Movie-specific character dictionaries can act as a powerful first step for subsequent content analysis at scale. We propose an unsupervised approach which requires no prior information about the characters in a movie. We first use a deep neural network-based object detector that is trained on natural images to identify a set of initial character candidates. These candidates are further pruned using saliency constraints and visual object tracking. A character dictionary per movie is then generated from exemplars obtained by clustering these candidates. We are able to identify both anthropomorphic and non-anthropomorphic characters in a dataset of 46 animation movies with varying composition and character design. Our results indicate high precision and recall of the automatically detected characters compared to human-annotated ground truth, demonstrating the generalizability of our approach.
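The final step, clustering pruned candidates into per-movie exemplars, can be sketched with a simple greedy scheme over appearance features: a candidate joins the most similar existing cluster if it is similar enough, otherwise it seeds a new one, and the cluster representatives form the character dictionary. The greedy procedure, the cosine similarity, and the threshold are illustrative assumptions standing in for whatever clustering the paper actually uses.

```python
import numpy as np

def build_dictionary(features, sim_threshold=0.9):
    """features: array of per-candidate appearance vectors (rows).
    Greedily groups candidates by cosine similarity; each cluster's
    first member serves as its exemplar, and the exemplars together
    form the movie's character dictionary."""
    exemplars, assignments = [], []
    for f in features:
        f = f / np.linalg.norm(f)  # unit-normalise so dot product = cosine
        sims = [float(f @ e) for e in exemplars]
        if sims and max(sims) >= sim_threshold:
            assignments.append(int(np.argmax(sims)))  # join closest cluster
        else:
            exemplars.append(f)                       # seed a new character
            assignments.append(len(exemplars) - 1)
    return np.array(exemplars), assignments
```

Because no identity labels or character counts are supplied, the number of discovered characters emerges from the data, which matches the unsupervised, no-prior-information setting the abstract describes.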