    Biometric responses to music-rich segments in films: the CDVPlex

    Summarising or generating trailers for films or movies involves finding the highlights within those films, those segments where we become most afraid, happy, sad, annoyed, excited, etc. In this paper we explore three questions related to automatic detection of film highlights by measuring the physiological responses of viewers of those films: first, whether emotional highlights can be detected through viewer biometrics; second, whether individuals watching a film in a group experience similar emotional reactions to others in the group; and third, whether the presence of music in a film correlates with the occurrence of emotional highlights. We analyse the results of an experiment known as the CDVPlex, where we monitored and recorded physiological reactions from people as they viewed films in a controlled cinema-like environment. A selection of films was manually annotated for the locations of their emotive content. We then studied the physiological peaks identified among participants while viewing the same film and how these correlated with emotion tags and with music. We conclude that these are highly correlated and that music-rich segments of a film do act as a catalyst in stimulating viewer response, though we do not know exactly which emotions the viewers were experiencing. The results of this work could affect the way in which we index movie content on PVRs, for example by giving special weight to movie segments which are most likely to be highlights

    An affect-based video retrieval system with open vocabulary querying

    Content-based video retrieval (CBVR) systems are creating new search and browse capabilities using metadata describing significant features of the data. An often overlooked aspect of human interpretation of multimedia data is the affective dimension. Incorporating affective information into multimedia metadata can potentially enable search using this alternative interpretation of multimedia content. Recent work has described methods to automatically assign affective labels to multimedia data using various approaches. However, the subjective and imprecise nature of affective labels makes it difficult to bridge the semantic gap between system-detected labels and user expression of information requirements in multimedia retrieval. We present a novel affect-based video retrieval system incorporating an open-vocabulary query stage based on WordNet, enabling search using an unrestricted query vocabulary. The system performs automatic annotation of video data with labels drawn from well-defined affective terms. At retrieval time, annotated documents are ranked against open-vocabulary text queries using the standard Okapi retrieval model. We present experimental results examining the behaviour of the system for retrieval of a collection of automatically annotated feature films of different genres. Our results indicate that affective annotation can potentially provide useful augmentation to more traditional objective content description in multimedia retrieval
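
The ranking step mentioned in this abstract can be illustrated with a minimal sketch of Okapi BM25 scoring. The abstract does not give the exact variant or parameter values used, so the function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, the smoothed IDF form, and the toy affective labels are all assumptions made for illustration. The WordNet query-expansion stage is not modelled here.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of label tokens) against the query terms.

    Hypothetical helper: k1, b and the IDF smoothing are common defaults,
    not values taken from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy "documents": affective labels assigned to three film segments.
segments = [
    ["fear", "tension", "fear"],
    ["joy", "calm"],
    ["sadness", "fear"],
]
ranking = bm25_scores(["fear"], segments)
```

In this toy collection the first segment ranks above the third because the query label occurs in it more often, and the second segment scores zero because it lacks the label.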

    Mind the Gap: Another look at the problem of the semantic gap in image retrieval

    This paper attempts to review and characterise the problem of the semantic gap in image retrieval and the attempts being made to bridge it. In particular, we draw from our own experience in user queries, automatic annotation and ontological techniques. The first section of the paper describes a characterisation of the semantic gap as a hierarchy between the raw media and full semantic understanding of the media's content. The second section discusses real users' queries with respect to the semantic gap. The final sections of the paper describe our own experience in attempting to bridge the semantic gap. In particular we discuss our work on auto-annotation and semantic-space models of image retrieval in order to bridge the gap from the bottom up, and the use of ontologies, which capture more semantics than keyword object labels alone, as a technique for bridging the gap from the top down

    Finding people frequently appearing in news

    We propose a graph-based method to improve the performance of person queries in large news video collections. The method benefits from the multi-modal structure of videos and integrates text and face information. Using the idea that a person appears more frequently when his/her name is mentioned, we first use the speech transcript text to limit our search space for a query name. Then, we construct a similarity graph with nodes corresponding to all of the faces in the search space, and edges corresponding to the similarity of the faces. With the assumption that the images of the query name will be more similar to each other than to other images, the problem is then transformed into finding the densest component in the graph, which corresponds to the images of the query name. The same graph algorithm is applied for detecting and removing the faces of the anchorpeople in an unsupervised way. The experiments are conducted on 229 news videos provided by NIST for TRECVID 2004. The results show that the proposed method outperforms text-only methods and provides cues for recognition of faces on a large scale. © Springer-Verlag Berlin Heidelberg 2006
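
The "densest component" step in this abstract can be sketched with greedy peeling, a well-known approximation for the densest-subgraph problem. The abstract does not say which algorithm the authors use, so this is an illustrative stand-in. The function name `densest_subgraph`, the density measure (edges divided by nodes), and the assumption that face similarities have already been thresholded into unweighted edges are all hypothetical.

```python
def densest_subgraph(edges):
    """Greedy peeling: return (node_set, density), density = edges / nodes.

    Repeatedly removes the lowest-degree node and remembers the densest
    intermediate subgraph seen along the way.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_nodes = set(adj)
    best_density = n_edges / len(adj) if adj else 0.0
    while len(adj) > 1:
        # Remove the node with the fewest remaining neighbours.
        u = min(adj, key=lambda x: len(adj[x]))
        n_edges -= len(adj[u])
        for v in adj.pop(u):
            adj[v].discard(u)
        density = n_edges / len(adj)
        if density > best_density:
            best_nodes, best_density = set(adj), density
    return best_nodes, best_density


# Toy graph: faces a-d are mutually similar; e and f are loosely attached.
face_edges = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"), ("e", "f"),
]
nodes, density = densest_subgraph(face_edges)
```

Here the four mutually similar faces form the densest group, which is the set one would take as images of the queried person under the abstract's assumption.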

    Measuring concept similarities in multimedia ontologies: analysis and evaluations

    The recent development of large-scale multimedia concept ontologies has provided a new momentum for research in the semantic analysis of multimedia repositories. Different methods for generic concept detection have been extensively studied, but the question of how to exploit the structure of a multimedia ontology and existing inter-concept relations has not received similar attention. In this paper, we present a clustering-based method for modeling semantic concepts on low-level feature spaces and study the evaluation of the quality of such models with entropy-based methods. We cover a variety of methods for assessing the similarity of different concepts in a multimedia ontology. We study three ontologies and apply the proposed techniques in experiments involving the visual and semantic similarities, manual annotation of video, and concept detection. The results show that modeling inter-concept relations can provide a promising resource for many different application areas in semantic multimedia processing

    Keypics: free–hand drawn iconic keywords

    We propose an iconic indexing of images to be exposed on the Web. This should be accomplished by “Keypics”, i.e. auxiliary, simplified pictures referring to the geometrical and/or the semantic content of the indexed image. Keypics should not be rigidly standardized; they should be left free to evolve, to express nuances and to stress details. A mathematical tool for dealing with such freedom, in the retrieval task, already exists: Size Functions. An experiment on 494 Keypics with Size Functions based on three measuring functions (distances, projections and jumps) and their combination is presented

    Legal knowledge acquisition and multimedia applications

    Search, retrieval, and management of multimedia contents are challenging tasks for users and researchers alike. The aim of the e-sentencias project is to develop a software-hardware system for the global management of the multimedia contents produced by the Spanish Civil Courts. We apply technologies such as the Semantic Web, ontologies, NLP techniques, audio-video segmentation and IR. The ultimate goal is to obtain an automatic classification of images and segments of the audiovisual records that, coupled with textual semantics, allows efficient navigation and retrieval of judicial documents and additional legal sources

    A Model For e-Government Digital Document

    A large amount of information is typical of bureaucratic processes, such as those of public and private administrations. This information is often recorded on paper or in different digital formats, and its management is very expensive, both in terms of the space used for storing documents and in terms of the time spent searching for the documents of interest. Furthermore, the manual management of these documents is far from error-free. To efficiently access the information contained in very large document repositories, such as public administration archives, techniques for syntactic and semantic document management are required, so as to ensure a large and intense process of document dematerialization and to eliminate, or at least reduce, the quantity of paper documents. In this work we present a novel RDF model of digital documents for improving the effectiveness of dematerialization, which constitutes the starting point of an information system able to manage document streams in the most efficient way. The model takes into account an important requirement of several e-Government applications: providing different representations of the same multimedia content depending on the authority, the end user, or the time

    Movie/Script: Alignment and Parsing of Video and Text Transcription

    Movies and TV are a rich source of diverse and complex video of people, objects, actions and locales “in the wild”. Harvesting automatically labeled sequences of actions from video would enable creation of large-scale and highly-varied datasets. To enable such collection, we focus on the task of recovering scene structure in movies and TV series for object tracking and action retrieval. We present a weakly supervised algorithm that uses the screenplay and closed captions to parse a movie into a hierarchy of shots and scenes. Scene boundaries in the movie are aligned with screenplay scene labels and shots are reordered into a sequence of long continuous tracks or threads which allow for more accurate tracking of people, actions and objects. Scene segmentation, alignment, and shot threading are formulated as inference in a unified generative model and a novel hierarchical dynamic programming algorithm that can handle alignment and jump-limited reorderings in linear time is presented. We present quantitative and qualitative results on movie alignment and parsing, and use the recovered structure to improve character naming and retrieval of common actions in several episodes of popular TV series
