Dutch speech recognition in multimedia information retrieval
As data storage capacities grow to nearly unlimited sizes thanks to ongoing hardware and software improvements, an increasing amount of information is being stored in multimedia and spoken-word collections. Assuming that the intention of storing data is to use (portions of) it at some later time, these collections must also be searchable in one way or another.
Audiovisual Archive Exploitation in the Networked Information Society
Safeguarding the massive body of audiovisual content, including rich music collections, in audiovisual archives and enabling access for various types of user groups is a prerequisite for unlocking the socio-economic value of these collections. Data quantities and the need for specific content descriptors, however, force archives to re-evaluate their annotation strategies and access models, and to incorporate technology into the archival workflow. It is argued that this can only be done successfully provided that user requirements are studied well and that new approaches are introduced in a well-balanced manner, fitting in with traditional archival perspectives, and by bringing the archivist into the technology loop by means of education and by deploying hybrid workflows for technology-aided annotation.
Creating a data collection for evaluating rich speech retrieval
We describe the development of a test collection for the investigation of speech retrieval beyond identification of relevant content. This collection focuses on satisfying user information needs for queries associated with specific types of speech acts. The collection is based on an archive of Internet video from the video sharing platform blip.tv, and was provided by the MediaEval benchmarking initiative. A crowdsourcing approach was used to identify segments in the video data that contain speech acts, to create a description of the video containing the act, and to generate search queries designed to re-find this speech act. We describe and reflect on our experiences with crowdsourcing this test collection using the Amazon Mechanical Turk platform. We highlight the challenges of constructing this dataset, including the selection of the data source, the design of the crowdsourcing task, and the specification of queries and relevant items.
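To make the shape of such a collection concrete, the sketch below shows what a single crowdsourced entry could look like. The schema and field names are our own illustration, not the actual MediaEval collection format.

```python
from dataclasses import dataclass

@dataclass
class RichSpeechRetrievalItem:
    """One crowdsourced test-collection entry (illustrative schema only;
    the actual format is defined by the MediaEval task organisers)."""
    video_id: str        # identifier of the blip.tv video
    start_sec: float     # start of the segment containing the speech act
    end_sec: float       # end of the segment
    speech_act: str      # e.g. "apology", "promise", "opinion"
    description: str     # worker's description of the video segment
    query: str           # query designed to re-find this speech act

# A toy example of what a single entry might look like:
item = RichSpeechRetrievalItem(
    video_id="blip_000123",
    start_sec=64.2,
    end_sec=92.7,
    speech_act="promise",
    description="Host promises to publish the interview transcript.",
    query="host promises transcript interview",
)
```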
Overview of MediaEval 2011 rich speech retrieval task and genre tagging task
The MediaEval 2011 Rich Speech Retrieval Task and Genre Tagging Task are two new tasks offered in MediaEval 2011 that are designed to explore the development of techniques for semi-professional user generated content (SPUG). They both use the same data set: the MediaEval 2010 Wild Wild Web Tagging Task (ME10WWW) data set. The ME10WWW data set contains Creative Commons licensed video collected from blip.tv in 2009. It was created by the PetaMedia Network of Excellence (http://www.petamedia.eu) in order to test retrieval algorithms for video content as it occurs 'in the wild' on the Internet and, in particular, for user contributed multimedia that is embedded within a social network. In this overview paper, we recapitulate the essential characteristics of the data set, describe the tasks, and specify how they are evaluated.
Search and hyperlinking task at MediaEval 2012
The Search and Hyperlinking Task was one of the Brave New Tasks at MediaEval 2012. The task consisted of two subtasks focusing on search and hyperlinking in retrieval from a collection of semi-professional video content. These subtasks followed up on research carried out within the MediaEval 2011 Rich Speech Retrieval (RSR) Task and the VideoCLEF 2009 Linking Task.
Dealing with Phrase Level Co-Articulation (PLC) in speech recognition: A first approach
Whereas within-word co-articulation effects are nowadays usually dealt with sufficiently in automatic speech recognition, this is not always the case with phrase level co-articulation (PLC) effects. This paper describes a first approach to dealing with phrase level co-articulation by applying PLC rules to the reference transcripts used for training our recogniser and by adding a set of temporary PLC phones that are later mapped back onto the original phones. In effect, we temporarily break down acoustic context into a general context and a PLC context. With this method, more robust models could be trained because phones that are confused due to PLC effects, such as /v/-/f/ and /z/-/s/, receive their own models. A first attempt to apply this method is described.
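A minimal sketch of this relabelling step is given below. The rule set (devoicing of word-final /v/ and /z/ before a voiceless onset) and the temporary phone names `v_plc` and `z_plc` are hypothetical illustrations, not the actual rules or phone inventory used for the recogniser described above.

```python
# Sketch of phrase-level co-articulation (PLC) handling in training
# transcripts: phones affected by cross-word co-articulation are
# relabelled with temporary PLC phones so they receive their own
# acoustic models; after training, the PLC phones are mapped back.
# The rule set below is a hypothetical illustration.

# Word-final voiced fricatives that devoice before a voiceless onset.
PLC_RULES = {"v": "v_plc", "z": "z_plc"}
VOICELESS_ONSETS = {"p", "t", "k", "f", "s", "x"}

# Mapping the temporary PLC phones back onto the original phone set.
PLC_TO_BASE = {plc: base for base, plc in PLC_RULES.items()}


def apply_plc(words):
    """Relabel phones at word boundaries where a PLC rule fires.

    `words` is a list of phone lists, one per word, e.g.
    [["h", "a", "v"], ["t", "e"]] -> [["h", "a", "v_plc"], ["t", "e"]].
    """
    out = [list(w) for w in words]
    for cur, nxt in zip(out, out[1:]):
        if cur and nxt and cur[-1] in PLC_RULES and nxt[0] in VOICELESS_ONSETS:
            cur[-1] = PLC_RULES[cur[-1]]
    return out


def map_back(phones):
    """Collapse temporary PLC phones onto their base phones."""
    return [PLC_TO_BASE.get(p, p) for p in phones]


if __name__ == "__main__":
    transcript = [["h", "a", "v"], ["t", "e"]]    # toy example
    plc = apply_plc(transcript)
    print(plc)                                    # [['h', 'a', 'v_plc'], ['t', 'e']]
    print(map_back([p for w in plc for p in w]))  # ['h', 'a', 'v', 't', 'e']
```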