366 research outputs found

    TRECVID: evaluating the effectiveness of information retrieval tasks on digital video

    TRECVID is an annual exercise that encourages research in information retrieval from digital video by providing a large video test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVID benchmarking covers both interactive and manual searching by end users, as well as the benchmarking of some supporting technologies, including shot boundary detection, extraction of some semantic features, and the automatic segmentation of TV news broadcasts into non-overlapping news stories. TRECVID has a broad range of over 40 participating groups from across the world, and as it is now (2004) in its 4th annual cycle, it is opportune to stand back and look at the lessons we have learned from the cumulative activity. In this paper we present a brief, high-level overview of the TRECVID activity covering the data, the benchmarked tasks, the overall results obtained by groups to date, and an overview of the approaches taken by selected groups in some tasks. While progress from one year to the next cannot be measured directly because of the changing nature of the video data we have been using, we present a summary of the lessons we have learned from TRECVID and include some pointers on what we feel are the most important of these lessons.

    A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments

    Most speech and language technologies are trained with massive amounts of speech and text information. However, most of the world's languages do not have such resources or a stable orthography. Systems constructed under these almost zero-resource conditions are promising not only for speech technology but also for computational language documentation. The goal of computational language documentation is to help field linguists (semi-)automatically analyze and annotate audio recordings of endangered and unwritten languages. Example tasks are automatic phoneme discovery or lexicon discovery from the speech signal. This paper presents a speech corpus collected during a realistic language documentation process. It is made up of 5k speech utterances in Mboshi (Bantu C25) aligned to French text translations. Speech transcriptions are also made available: they correspond to a non-standard graphemic form close to the language phonology. We present how the data was collected, cleaned and processed, and we illustrate its use through a zero-resource task: spoken term discovery. The dataset is made available to the community for reproducible computational language documentation experiments and their evaluation. Comment: accepted to LREC 201

    Using Spoken Dialogue Systems to Access Information in Different Domains (Utilización de los sistemas de diálogo hablado para el acceso a la información en diferentes dominios)

    Proceedings of the Second International Conference on the Digital Divide and Social Inclusion, held 28-30 October 2009 at the Universidad Carlos III de Madrid. Conversation is the most natural way for people to carry out a great many everyday activities. For this reason, a long-standing interest in the field of Speech Technologies has been to use these technologies in real applications, especially ones that let a person use their voice to obtain information through direct interaction with a machine, or to control a given system. The goal is to have systems that make human-machine communication as natural as possible, that is, through conversation. This paper summarizes the results of applying these technologies to develop several dialogue systems in which the interaction between the user and the system takes place through spontaneous speech in Spanish. Their implementation favored the use of various free software tools for automatic speech recognition, natural language understanding, dialogue management, and text-to-speech synthesis. The main aim of the paper is therefore to present the principal advantages that dialogue systems offer for facilitating access to different services within restricted semantic domains, the possibilities that free software tools offer for implementing them, and their evaluation in several concrete application cases.