23,801 research outputs found

    Access to recorded interviews: A research agenda

    Get PDF
    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed

    Automated speech and audio analysis for semantic access to multimedia

    Get PDF
    The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content, and as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques will be presented, including the alignment of speech and text resources, large vocabulary speech recognition, key word spotting and speaker classification. The applicability of techniques will be discussed from a media crossing perspective. The added value of the techniques and their potential contribution to the content value chain will be illustrated by the description of two (complementary) demonstrators for browsing broadcast news archives

    Multimedia search without visual analysis: the value of linguistic and contextual information

    Get PDF
    This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content, and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features

    Balancing SoNaR: IPR versus Processing Issues in a 500-Million-Word Written Dutch Reference Corpus

    Get PDF
    In The Low Countries, a major reference corpus for written Dutch is beingbuilt. We discuss the interplay between data acquisition and data processingduring the creation of the SoNaR Corpus. Based on developments in traditionalcorpus compiling and new web harvesting approaches, SoNaR is designed tocontain 500 million words, balanced over 36 text types including bothtraditional and new media texts. Beside its balanced design, every text sampleincluded in SoNaR will have its IPR issues settled to the largest extentpossible. This data collection task presents many challenges because everydecision taken on the level of text acquisition has ramifications for the levelof processing and the general usability of the corpus. As far as thetraditional text types are concerned, each text brings its own processingrequirements and issues. For new media texts - SMS, chat - the problem is evenmore complex, issues such as anonimity, recognizability and citation right, allpresent problems that have to be tackled. The solutions actually lead to thecreation of two corpora: a gigaword SoNaR, IPR-cleared for research purposes,and the smaller - of commissioned size - more privacy compliant SoNaR,IPR-cleared for commercial purposes as well

    Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both ?

    Get PDF
    International audiencePersons identification in video from TV broadcast is a valuable tool for indexing them. However, the use of biometric mod- els is not a very sustainable option without a priori knowledge of people present in the videos. The pronounced names (PN) or written names (WN) on the screen can provide hypotheses names for speakers. We propose an experimental comparison of the potential of these two modalities (names pronounced or written) to extract the true names of the speakers. The names pronounced offer many instances of citation but transcription and named-entity detection errors halved the potential of this modality. On the contrary, the written names detection benefits of the video quality improvement and is nowadays rather robust and efficient to name speakers. Oracle experiments presented for the mapping between written names and speakers also show the complementarity of both PN and WN modalities

    Towards Affordable Disclosure of Spoken Word Archives

    Get PDF
    This paper presents and discusses ongoing work aiming at affordable disclosure of real-world spoken word archives in general, and in particular of a collection of recorded interviews with Dutch survivors of World War II concentration camp Buchenwald. Given such collections, the least we want to be able to provide is search at different levels and a flexible way of presenting results. Strategies for automatic annotation based on speech recognition – supporting e.g., within-document search– are outlined and discussed with respect to the Buchenwald interview collection. In addition, usability aspects of the spoken word search are discussed on the basis of our experiences with the online Buchenwald web portal. It is concluded that, although user feedback is generally fairly positive, automatic annotation performance is still far from satisfactory, and requires additional research

    Recovering capitalization and punctuation marks for automatic speech recognition: case study for Portuguese broadcast news

    Get PDF
    The following material presents a study about recovering punctuation marks, and capitalization information from European Portuguese broadcast news speech transcriptions. Different approaches were tested for capitalization, both generative and discriminative, using: finite state transducers automatically built from language models; and maximum entropy models. Several resources were used, including lexica, written newspaper corpora and speech transcriptions. Finite state transducers produced the best results for written newspaper corpora, but the maximum entropy approach also proved to be a good choice, suitable for the capitalization of speech transcriptions, and allowing straightforward on-the-fly capitalization. Evaluation results are presented both for written newspaper corpora and for broadcast news speech transcriptions. The frequency of each punctuation mark in BN speech transcriptions was analyzed for three different languages: English, Spanish and Portuguese. The punctuation task was performed using a maximum entropy modeling approach, which combines different types of information both lexical and acoustic. The contribution of each feature was analyzed individually and separated results for each focus condition are given, making it possible to analyze the performance differences between planned and spontaneous speech. All results were evaluated on speech transcriptions of a Portuguese broadcast news corpus. The benefits of enriching speech recognition with punctuation and capitalization are shown in an example, illustrating the effects of described experiments into spoken texts.info:eu-repo/semantics/acceptedVersio

    Lessons Learned from EVALITA 2020 and Thirteen Years of Evaluation of Italian Language Technology

    Get PDF
    This paper provides a summary of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA2020) which was held online on December 17th, due to the 2020 COVID-19 pandemic. The 2020 edition of Evalita included 14 different tasks belonging to five research areas, namely: (i) Affect, Hate, and Stance, (ii) Creativity and Style, (iii) New Challenges in Long-standing Tasks, (iv) Semantics and Multimodality, (v) Time and Diachrony. This paper provides a description of the tasks and the key findings from the analysis of participant outcomes. Moreover, it provides a detailed analysis of the participants and task organizers which demonstrates the growing interest with respect to this campaign. Finally, a detailed analysis of the evaluation of tasks across the past seven editions is provided; this allows to assess how the research carried out by the Italian community dealing with Computational Linguistics has evolved in terms of popular tasks and paradigms during the last 13 years
    corecore