7,673 research outputs found

    On the surplus value of semantic video analysis beyond the key frame

    Get PDF
    Typical semantic video analysis methods aim for classification of camera shots based on extracted features from a single key frame only. In this paper, we sketch a video analysis scenario and evaluate the benefit of analysis beyond the key frame for semantic concept detection performance. We developed detectors for a lexicon of 26 concepts, and evaluated their performance on 120 hours of video data. Results show that, on average, detection performance can increase with almost 40 % when the analysis method takes more visual content into account. 1

    Evaluation campaigns and TRECVid

    Get PDF
    The TREC Video Retrieval Evaluation (TRECVid) is an international benchmarking activity to encourage research in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video corpus, automatic detection of a variety of semantic and low-level video features, shot boundary detection and the detection of story boundaries in broadcast TV news. This paper will give an introduction to information retrieval (IR) evaluation from both a user and a system perspective, highlighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation benchmarking campaign and this allows us to discuss whether such campaigns are a good thing or a bad thing. There are arguments for and against these campaigns and we present some of them in the paper concluding that on balance they have had a very positive impact on research progress

    Symbiosis between the TRECVid benchmark and video libraries at the Netherlands Institute for Sound and Vision

    Get PDF
    Audiovisual archives are investing in large-scale digitisation efforts of their analogue holdings and, in parallel, ingesting an ever-increasing amount of born- digital files in their digital storage facilities. Digitisation opens up new access paradigms and boosted re-use of audiovisual content. Query-log analyses show the shortcomings of manual annotation, therefore archives are complementing these annotations by developing novel search engines that automatically extract information from both audio and the visual tracks. Over the past few years, the TRECVid benchmark has developed a novel relationship with the Netherlands Institute of Sound and Vision (NISV) which goes beyond the NISV just providing data and use cases to TRECVid. Prototype and demonstrator systems developed as part of TRECVid are set to become a key driver in improving the quality of search engines at the NISV and will ultimately help other audiovisual archives to offer more efficient and more fine-grained access to their collections. This paper reports the experiences of NISV in leveraging the activities of the TRECVid benchmark

    Speeding up the detection of non-iconic and iconic gestures (SPUDNIG): A toolkit for the automatic detection of hand movements and gestures in video data

    No full text
    In human face-to-face communication, speech is frequently accompanied by visual signals, especially communicative hand gestures. Analyzing these visual signals requires detailed manual annotation of video data, which is often a labor-intensive and time-consuming process. To facilitate this process, we here present SPUDNIG (SPeeding Up the Detection of Non-iconic and Iconic Gestures), a tool to automatize the detection and annotation of hand movements in video data. We provide a detailed description of how SPUDNIG detects hand movement initiation and termination, as well as open-source code and a short tutorial on an easy-to-use graphical user interface (GUI) of our tool. We then provide a proof-of-principle and validation of our method by comparing SPUDNIG’s output to manual annotations of gestures by a human coder. While the tool does not entirely eliminate the need of a human coder (e.g., for false positives detection), our results demonstrate that SPUDNIG can detect both iconic and non-iconic gestures with very high accuracy, and could successfully detect all iconic gestures in our validation dataset. Importantly, SPUDNIG’s output can directly be imported into commonly used annotation tools such as ELAN and ANVIL. We therefore believe that SPUDNIG will be highly relevant for researchers studying multimodal communication due to its annotations significantly accelerating the analysis of large video corpora

    An aesthetics of touch: investigating the language of design relating to form

    Get PDF
    How well can designers communicate qualities of touch? This paper presents evidence that they have some capability to do so, much of which appears to have been learned, but at present make limited use of such language. Interviews with graduate designer-makers suggest that they are aware of and value the importance of touch and materiality in their work, but lack a vocabulary to fully relate to their detailed explanations of other aspects such as their intent or selection of materials. We believe that more attention should be paid to the verbal dialogue that happens in the design process, particularly as other researchers show that even making-based learning also has a strong verbal element to it. However, verbal language alone does not appear to be adequate for a comprehensive language of touch. Graduate designers-makers’ descriptive practices combined non-verbal manipulation within verbal accounts. We thus argue that haptic vocabularies do not simply describe material qualities, but rather are situated competences that physically demonstrate the presence of haptic qualities. Such competencies are more important than groups of verbal vocabularies in isolation. Design support for developing and extending haptic competences must take this wide range of considerations into account to comprehensively improve designers’ capabilities

    Towards a theory of experimental music theatre: 'showing doing', 'non-matrixed performance' and 'metaxis'

    Get PDF
    Although recent years have seen the emergence of sustained research on experimental music theater, most of this is of a largely descriptive nature. To address the shortcomings of such approaches, this essay outlines a theory of experimental music theater based on a clear definition and a number of constitutive features. A number of theoretical terms from the fields of performance theory and theater practice are introduced, namely “showing doing” (Richard Schechner), “non-matrixed performance” and “non-matrixed representation” (Michael Kirby), and “metaxis” (Augusto Boal). The analytical effectiveness of this theoretical framework is demonstrated by discussion of case studies drawn both from the “classics” of experimental music theater (John Cage, Mauricio Kagel) and from recent work (Christopher Fox, David Bithell, Trond Reinholdtsen)

    Evaluating Multimedia Features and Fusion for Example-Based Event Detection

    Get PDF
    Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos. To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia Events. SESAME includes multiple bag-of-words event classifiers based on single data types: low-level visual, motion, and audio features; high-level semantic visual concepts; and automatic speech recognition. Event detection performance was evaluated for each event classifier. The performance of low-level visual and motion features was improved by the use of difference coding. The accuracy of the visual concepts was nearly as strong as that of the low-level visual features. Experiments with a number of fusion methods for combining the event detection scores from these classifiers revealed that simple fusion methods, such as arithmetic mean, perform as well as or better than other, more complex fusion methods. SESAME’s performance in the 2012 TRECVID MED evaluation was one of the best reported

    Architecture of participation : the realization of the Semantic Web, and Internet OS

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, System Design and Management Program, February 2008.Includes bibliographical references (p. 65-68).The Internet and World Wide Web (WWW) is becoming an integral part of our daily life and touching every part of the society around the world including both well-developed and developing countries. The simple technology and genuine intention of the original WWW, which is to help researchers share and exchange information and data across incompatible platforms and systems, have evolved into something larger and beyond what one could conceive. While WWW has reached the critical mass, many limitations are uncovered. To address the limitations, the development of its extension, the Semantic Web, has been underway for more than five years by the inventor of WWW, Tim Berners-Lee, and the technical community. Yet, no significant impact has been made. Its awareness by the public is surprisingly and unfortunately low. This thesis will review the development effort of the Semantic Web, examine its progress which appears lagging compared to WWW, and propose a promising business model to accelerate its adoption path.by Shelley Lau.S.M
    corecore