57 research outputs found

    Dutch speech recognition in multimedia information retrieval

    As data storage capacities grow nearly without limit thanks to ongoing hardware and software improvements, an increasing amount of information is being stored in multimedia and spoken-word collections. Assuming that the intention of storing data is to use (portions of) it at some later time, these collections must also be searchable in one way or another.
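    The searchability requirement described above is commonly met by running speech recognition over the spoken-word material and indexing the resulting transcripts. A minimal sketch of such a transcript index (purely illustrative; the recording identifiers and transcript texts are invented for this example):

    ```python
    from collections import defaultdict

    def build_index(transcripts):
        """Map each word to the set of recordings whose transcript contains it."""
        index = defaultdict(set)
        for rec_id, text in transcripts.items():
            for word in text.lower().split():
                index[word].add(rec_id)
        return index

    def search(index, query):
        """Return recordings whose transcripts contain every query word."""
        sets = [index.get(word, set()) for word in query.lower().split()]
        return set.intersection(*sets) if sets else set()

    # Hypothetical ASR output for three recordings
    transcripts = {
        "rec1": "the minister discussed the budget",
        "rec2": "a concert recording from the archive",
        "rec3": "the budget debate continued today",
    }
    index = build_index(transcripts)
    print(sorted(search(index, "budget")))  # ['rec1', 'rec3']
    ```

    A real spoken-document retrieval system would of course add ranking and cope with recognition errors, but the core idea of searching transcripts rather than audio is the same.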

    Audiovisual Archive Exploitation in the Networked Information Society

    Safeguarding the massive body of audiovisual content, including rich music collections, in audiovisual archives and enabling access for various types of user groups is a prerequisite for unlocking the socio-economic value of these collections. Data quantities and the need for specific content descriptors, however, force archives to re-evaluate their annotation strategies and access models, and to incorporate technology in the archival workflow. It is argued that this can only be done successfully provided that user requirements are studied well, that new approaches are introduced in a well-balanced manner fitting in with traditional archival perspectives, and that the archivist is brought into the technology loop by means of education and by deploying hybrid workflows for technology-aided annotation.

    Creating a data collection for evaluating rich speech retrieval

    We describe the development of a test collection for the investigation of speech retrieval beyond the identification of relevant content. This collection focuses on satisfying user information needs for queries associated with specific types of speech acts. The collection is based on an archive of Internet video from the video sharing platform blip.tv, and was provided by the MediaEval benchmarking initiative. A crowdsourcing approach was used to identify segments in the video data which contain speech acts, to create a description of the video containing the act, and to generate search queries designed to re-find this speech act. We describe and reflect on our experiences with crowdsourcing this test collection using the Amazon Mechanical Turk platform. We highlight the challenges of constructing this dataset, including the selection of the data source, the design of the crowdsourcing task, and the specification of queries and relevant items.

    Jupyter Notebooks for Generous Archive Interfaces


    Overview of MediaEval 2011 rich speech retrieval task and genre tagging task

    The MediaEval 2011 Rich Speech Retrieval Task and Genre Tagging Task are two new tasks offered in MediaEval 2011 that are designed to explore the development of techniques for semi-professional user generated (SPUG) content. They both use the same data set: the MediaEval 2010 Wild Wild Web Tagging Task data set (ME10WWW). The ME10WWW data set contains Creative Commons licensed video collected from blip.tv in 2009. It was created by the PetaMedia Network of Excellence (http://www.petamedia.eu) in order to test retrieval algorithms for video content as it occurs 'in the wild' on the Internet and, in particular, for user contributed multimedia that is embedded within a social network. In this overview paper, we repeat the essential characteristics of the data set, describe the tasks and specify how they are evaluated.

    Search and hyperlinking task at MediaEval 2012

    The Search and Hyperlinking Task was one of the Brave New Tasks at MediaEval 2012. The Task consisted of two subtasks which focused on search and linking in retrieval from a collection of semi-professional video content. These tasks followed up on research carried out within the MediaEval 2011 Rich Speech Retrieval (RSR) Task and the VideoCLEF 2009 Linking Task.

    The search and hyperlinking task at MediaEval 2013

    The Search and Hyperlinking Task formed part of the MediaEval 2013 evaluation workshop. The Task consisted of two sub-tasks: (1) answering known-item queries from a collection of roughly 1200 hours of broadcast TV material, and (2) linking anchors within the known item to other parts of the video collection. We provide an overview of the task and the data sets used.
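    Known-item sub-tasks like (1) are commonly scored with mean reciprocal rank (MRR), since each query has exactly one correct item. A small illustrative computation (not the task's official evaluation script; the video identifiers are invented):

    ```python
    def mean_reciprocal_rank(runs):
        """runs: list of (ranked_result_list, known_item_id) pairs."""
        total = 0.0
        for ranking, known_item in runs:
            if known_item in ranking:
                total += 1.0 / (ranking.index(known_item) + 1)  # ranks are 1-based
        return total / len(runs)

    # Hypothetical rankings for three known-item queries
    runs = [
        (["v7", "v2", "v9"], "v2"),   # found at rank 2 -> 1/2
        (["v4", "v1"], "v4"),         # found at rank 1 -> 1
        (["v3", "v8"], "v5"),         # not found      -> 0
    ]
    print(mean_reciprocal_rank(runs))  # 0.5
    ```

    Per-query reciprocal rank rewards systems that place the single correct item near the top, which matches the known-item setting where recall beyond the first hit is irrelevant.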