
    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Towards Affordable Disclosure of Spoken Word Archives

    This paper presents and discusses ongoing work aiming at affordable disclosure of real-world spoken word archives in general, and in particular of a collection of recorded interviews with Dutch survivors of the World War II concentration camp Buchenwald. Given such collections, the least we want to be able to provide is search at different levels and a flexible way of presenting results. Strategies for automatic annotation based on speech recognition (supporting, e.g., within-document search) are outlined and discussed with respect to the Buchenwald interview collection. In addition, usability aspects of the spoken word search are discussed on the basis of our experiences with the online Buchenwald web portal. It is concluded that, although user feedback is generally fairly positive, automatic annotation performance is still far from satisfactory and requires additional research.

    Discrimination of Different Serbian Pronunciations from Shtokavian Dialect

    This paper proposes a new methodology for discriminating between different pronunciations in the Shtokavian dialect of the Serbian language. First, the written language (Unicode text) is converted into codes according to the energy status of each character in the text line. The resulting set of codes is treated as a grayscale image. Then, the local structures of the image are explored by local binary operators. This produces a set of vectors that differentiates the various pronunciations of the Serbian language. The experiment is performed on fifty documents written in Serbian. A comparison between the proposed method and the n-gram method shows the clear advantage of the proposed method.
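
    The pipeline described in this abstract (characters to codes, codes viewed as an image, local binary operators, feature vector) can be illustrated with a minimal sketch. The abstract does not give the coding table or the exact operator, so everything concrete here is an assumption: the letter-height classes in `char_code` are a hypothetical stand-in for "energy status", and `lbp_1d` is a simple one-dimensional local binary pattern with radius 1, not necessarily the operator used in the paper.

```python
from collections import Counter

# Hypothetical height classes standing in for the paper's "energy status";
# the real coding table is not given in the abstract.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def char_code(ch: str) -> int:
    """Map one letter to a small integer code (assumed scheme)."""
    if ch in DESCENDERS:
        return 2
    if ch in ASCENDERS or ch.isupper():
        return 1
    return 0


def encode_line(text: str) -> list[int]:
    """Turn a text line into a 1-by-N 'image' of codes, skipping non-letters."""
    return [char_code(ch) for ch in text if ch.isalpha()]


def lbp_1d(codes: list[int]) -> list[int]:
    """Radius-1 local binary pattern: compare each code with both neighbours."""
    patterns = []
    for i in range(1, len(codes) - 1):
        left = 1 if codes[i - 1] >= codes[i] else 0
        right = 1 if codes[i + 1] >= codes[i] else 0
        patterns.append(2 * left + right)  # pattern value in 0..3
    return patterns


def lbp_histogram(text: str) -> list[float]:
    """Normalised histogram of the four possible patterns, used as a feature vector."""
    patterns = lbp_1d(encode_line(text))
    counts = Counter(patterns)
    total = len(patterns) or 1  # avoid division by zero on very short input
    return [counts[p] / total for p in range(4)]
```

    Feature vectors from documents in different pronunciations could then be compared with any standard classifier or distance measure; which one the paper uses is not stated in the abstract.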

    Two-Staged Acoustic Modeling Adaption for Robust Speech Recognition by the Example of German Oral History Interviews

    In automatic speech recognition, often little training data is available for specific challenging tasks, but training of state-of-the-art automatic speech recognition systems requires large amounts of annotated speech. To address this issue, we propose a two-staged approach to acoustic modeling that combines noise and reverberation data augmentation with transfer learning to robustly address challenges such as difficult acoustic recording conditions, spontaneous speech, and speech of elderly people. We evaluate our approach using the example of German oral history interviews, where a relative average reduction of the word error rate by 19.3% is achieved. Comment: Accepted for IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, July 201
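
    For readers unfamiliar with the metric, a relative reduction of the word error rate (WER) is the drop in WER divided by the baseline WER. The sketch below shows that arithmetic only; the function name and the example values (30.0 and 24.21) are made up for illustration and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical numbers, chosen only to illustrate the formula:
# a drop from 30.0% to 24.21% WER is a relative reduction of about 19.3%.
example = relative_wer_reduction(30.0, 24.21)
```

    Note that a relative reduction of 19.3% is not the same as a drop of 19.3 percentage points in absolute WER.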

    Immersion Within Call

    The purpose of this research study was to explore the idea of immersion and what constitutes immersion in Computer Assisted Language Learning (CALL). CALL has become increasingly important in the field of Second Language Acquisition (SLA) and continues to grow in usage each year. As a graduate instructor of a basic-level French course, I focused my research on the immersion factor of CALL programs. This research was designed to obtain and analyze first-year French students' opinions of a CD-ROM CALL program by asking the following questions: (1) Did the participants feel immersed in the French language using the CD-ROM? (2) Had the participants visited a French-speaking country, or did they plan on studying in a French-speaking country in the future? (3) Did the participants enjoy using the CD-ROM to learn French? (4) What did the participants like most and least about using the CD-ROM CALL program? The most substantial finding of the study was that a majority of the participants did feel immersed in the French language while using the CALL program. A secondary finding was that many of the likes and dislikes mentioned specifically by the participants coincide with the main advantages and disadvantages of CALL.