10 research outputs found

    Automated speech and audio analysis for semantic access to multimedia

    The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual access tools. This paper gives an overview of the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques are presented, including the alignment of speech and text resources, large-vocabulary speech recognition, keyword spotting, and speaker classification. The applicability of the techniques is discussed from a cross-media perspective. The added value of the techniques and their potential contribution to the content value chain are illustrated by the description of two complementary demonstrators for browsing broadcast news archives.

    Multimedia search without visual analysis: the value of linguistic and contextual information

    This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content, and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features

    Towards Affordable Disclosure of Spoken Word Archives

    This paper presents and discusses ongoing work aimed at the affordable disclosure of real-world spoken word archives in general, and in particular of a collection of recorded interviews with Dutch survivors of the World War II concentration camp Buchenwald. For such collections, the minimum requirement is search at different levels and a flexible way of presenting results. Strategies for automatic annotation based on speech recognition (supporting, e.g., within-document search) are outlined and discussed with respect to the Buchenwald interview collection. In addition, usability aspects of spoken word search are discussed on the basis of our experiences with the online Buchenwald web portal. It is concluded that, although user feedback is generally fairly positive, automatic annotation performance is still far from satisfactory and requires additional research.

    The Effectiveness of BIGVU as a Medium in Journalism Learning for Improving the Newscaster Skills of PBSI Students

    Digital technology opens up space for students to carry out journalistic activities. Journalism and language studies, which relate to public speaking, encourage PBSI students to test their language skills. The aim of this study is to determine the effectiveness of BIGVU in the journalism course as a medium for improving the newscaster skills of PBSI students. This is a quantitative study using a pre-experimental design in the form of a one-group pretest-posttest design. Data were collected through project assignments in the form of videos. The findings are: (1) BIGVU improves newscaster skills, as shown by the increase in learning outcomes, with a posttest score of 82.8 compared with a pretest score of only 69.9; (2) the use of BIGVU as a journalism learning medium for improving newscaster skills is fairly effective, as shown by the calculated t-value (9.6) exceeding the tabulated t-value (1.6736).

    Switching Partners: Dancing with the Ontological Engineers

    Ontologies are today being applied in almost every field to support the alignment and retrieval of data of distributed provenance. Here we focus on new ontological work on dance and on related cultural phenomena belonging to what UNESCO calls the “intangible heritage.” Currently data and information about dance, including video data, are stored in an uncontrolled variety of ad hoc ways. This serves not only to prevent retrieval, comparison and analysis of the data, but may also impinge on our ability to preserve the data that already exists. Here we explore recent technological developments that are designed to counteract such problems by allowing information to be retrieved across disciplinary, cultural, linguistic and technological boundaries. Software applications such as the ones envisaged here will enable speedier recovery of data and facilitate its analysis in ways that will assist both archiving of and research on dance

    Weakly-supervised text-to-speech alignment confidence measure

    This work proposes a new confidence measure for evaluating the output of text-to-speech alignment systems, which are a key component of many applications, such as semi-automatic corpus anonymization, lip syncing, film dubbing, corpus preparation for speech synthesis, and acoustic model training for speech recognition. This confidence measure exploits deep neural networks that are trained on large corpora without direct supervision. It is evaluated on an open-source spontaneous speech corpus and outperforms a confidence score derived from a state-of-the-art text-to-speech aligner. We further show that this confidence measure can be used to fine-tune the output of this aligner and improve the quality of the resulting alignment.

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR
