171 research outputs found

    Taking the bite out of automated naming of characters in TV video

    No full text
    We investigate the problem of automatically labelling appearances of characters in TV or film material with their names. This is tremendously challenging due to the huge variation in imaged appearance of each character and the weakness and ambiguity of available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time stamped character annotation by aligning subtitles and transcripts; (ii) strengthening the supervisory information by identifying when characters are speaking. In addition, we incorporate complementary cues of face matching and clothing matching to propose common annotations for face tracks, and consider choices of classifier which can potentially correct errors made in the automatic extraction of training data from the weak textual annotation. Results are presented on episodes of the TV series ‘‘Buffy the Vampire Slayer”

    "'Who are you?' - Learning person specific classifiers from video"

    Get PDF
    We investigate the problem of automatically labelling faces of characters in TV or movie material with their names, using only weak supervision from automaticallyaligned subtitle and script text. Our previous work (Everingham et al. [8]) demonstrated promising results on the task, but the coverage of the method (proportion of video labelled) and generalization was limited by a restriction to frontal faces and nearest neighbour classification. In this paper we build on that method, extending the coverage greatly by the detection and recognition of characters in profile views. In addition, we make the following contributions: (i) seamless tracking, integration and recognition of profile and frontal detections, and (ii) a character specific multiple kernel classifier which is able to learn the features best able to discriminate between the characters. We report results on seven episodes of the TV series “Buffy the Vampire Slayer”, demonstrating significantly increased coverage and performance with respect to previous methods on this material

    Methods for control over learning individual trajectory

    Get PDF
    The article discusses models, methods and algorithms of determining student's optimal individual educational trajectory. A new method of controlling the learning trajectory has been developed as a dynamic model of learning trajectory control, which uses score assessment to construct a sequence of studied subjects

    A framework for automatic semantic video annotation

    Get PDF
    The rapidly increasing quantity of publicly available videos has driven research into developing automatic tools for indexing, rating, searching and retrieval. Textual semantic representations, such as tagging, labelling and annotation, are often important factors in the process of indexing any video, because of their user-friendly way of representing the semantics appropriate for search and retrieval. Ideally, this annotation should be inspired by the human cognitive way of perceiving and of describing videos. The difference between the low-level visual contents and the corresponding human perception is referred to as the ‘semantic gap’. Tackling this gap is even harder in the case of unconstrained videos, mainly due to the lack of any previous information about the analyzed video on the one hand, and the huge amount of generic knowledge required on the other. This paper introduces a framework for the Automatic Semantic Annotation of unconstrained videos. The proposed framework utilizes two non-domain-specific layers: low-level visual similarity matching, and an annotation analysis that employs commonsense knowledgebases. Commonsense ontology is created by incorporating multiple-structured semantic relationships. Experiments and black-box tests are carried out on standard video databases for action recognition and video information retrieval. White-box tests examine the performance of the individual intermediate layers of the framework, and the evaluation of the results and the statistical analysis show that integrating visual similarity matching with commonsense semantic relationships provides an effective approach to automated video annotation

    Finding people frequently appearing in news

    Get PDF
    We propose a graph based method to improve the performance of person queries in large news video collections. The method benefits from the multi-modal structure of videos and integrates text and face information. Using the idea that a person appears more frequently when his/her name is mentioned, we first use the speech transcript text to limit our search space for a query name. Then, we construct a similarity graph with nodes corresponding to all of the faces in the search space, and the edges corresponding to similarity of the faces. With the assumption that the images of the query name will be more similar to each other than to other images, the problem is then transformed into finding the densest component in the graph corresponding to the images of the query name. The same graph algorithm is applied for detecting and removing the faces of the anchorpeople in an unsupervised way. The experiments are conducted on 229 news videos provided by NIST for TRECVID 2004. The results show that proposed method outperforms the text only based methods and provides cues for recognition of faces on the large scale. © Springer-Verlag Berlin Heidelberg 2006

    Video Mining with Frequent Itemset Configurations

    Get PDF
    International audienceWe present a method for mining frequently occurring objects and scenes from videos. Object candidates are detected by finding recurring spatial arrangements of affine covariant regions. Our mining method is based on the class of frequent itemset mining algorithms, which have proven their efficiency in other domains, but have not been applied to video mining before. In this work we show how to express vector-quantized features and their spatial relations as itemsets. Furthermore, a fast motion segmentation method is introduced as an attention filter for the mining algorithm. Results are shown on real world data consisting of music video clips

    Content-Based Retrieval in Endomicroscopy: Toward an Efficient Smart Atlas for Clinical Diagnosis

    Get PDF
    International audienceIn this paper we present the first Content-Based Image Retrieval (CBIR) framework in the field of in vivo endomicroscopy, with applications ranging from training support to diagnosis support. We propose to adjust the standard Bag-of-Visual-Words method for the retrieval of endomicroscopic videos. Retrieval performance is evaluated both indirectly from a classification point-of-view, and directly with respect to a perceived similarity ground truth. The proposed method significantly outperforms, on two different endomicroscopy databases, several state-of-the-art methods in CBIR. With the aim of building a self-training simulator, we use retrieval results to estimate the interpretation difficulty experienced by the endoscopists. Finally, by incorporating clinical knowledge about perceived similarity and endomicroscopy semantics, we are able: 1) to learn an adequate visual similarity distance and 2) to build visual-word-based semantic signatures that extract, from low-level visual features, a higher-level clinical knowledge expressed in the endoscopist own language

    Joint Inference in Weakly-Annotated Image Datasets via Dense Correspondence

    Get PDF
    We present a principled framework for inferring pixel labels in weakly-annotated image datasets. Most previous, example-based approaches to computer vision rely on a large corpus of densely labeled images. However, for large, modern image datasets, such labels are expensive to obtain and are often unavailable. We establish a large-scale graphical model spanning all labeled and unlabeled images, then solve it to infer pixel labels jointly for all images in the dataset while enforcing consistent annotations over similar visual patterns. This model requires significantly less labeled data and assists in resolving ambiguities by propagating inferred annotations from images with stronger local visual evidences to images with weaker local evidences. We apply our proposed framework to two computer vision problems, namely image annotation with semantic segmentation, and object discovery and co-segmentation (segmenting multiple images containing a common object). Extensive numerical evaluations and comparisons show that our method consistently outperforms the state-of-the-art in automatic annotation and semantic labeling, while requiring significantly less labeled data. In contrast to previous co-segmentation techniques, our method manages to discover and segment objects well even in the presence of substantial amounts of noise images (images not containing the common object), as typical for datasets collected from Internet search
    • 

    corecore