58 research outputs found

    Phonemic Segmentation and Labelling using the MAUS Technique

    We describe the pronunciation model of the automatic segmentation technique MAUS, which is based on a data-driven Markov process, and a new evaluation measure for phonemic transcripts, relative symmetric accuracy. Results are given for MAUS segmentation and labelling of German dialog speech. MAUS is currently distributed as a freeware package by the Bavarian Archive for Speech Signals and will also be implemented as a web service in the near future.

    Three New Corpora at the Bavarian Archive for Speech Signals - and a First Step Towards Distributed Web-Based Recording

    The Bavarian Archive for Speech Signals (BAS) has released three new speech corpora for both industrial and academic use: a) Hempels Sofa contains recordings of up to 60 seconds of non-scripted telephone speech, b) ZipTel is a corpus of telephone speech covering postal addresses and telephone numbers from a real-world application, and c) RVG-J is an extension of the original Regional Variants of German corpus with juvenile speakers. All three corpora were transcribed orthographically according to the SpeechDat annotation guidelines using the WWWTranscribe annotation software. Recently, BAS has begun to investigate large-scale audio recording via the web, and RVG-J has become the testbed for this type of recording.

    The Validation of Speech Corpora


    On the Convergence Rate of Gaussianization with Random Rotations

    Gaussianization is a simple generative model that can be trained without backpropagation. It has shown compelling performance on low-dimensional data. As the dimension increases, however, it has been observed that the convergence speed slows down. We show analytically that the number of required layers scales linearly with the dimension for Gaussian input. We argue that this is because the model is unable to capture dependencies between dimensions. Empirically, we find the same linear increase in cost for arbitrary input p(x), but observe favorable scaling for some distributions. We explore potential speed-ups and formulate challenges for further research.

    Perceived Prominence Reflected by Imitations of Words with and without F0 Continuity

    Mixdorff H, Hönemann A, Niebuhr O, Draxler C. Perceived Prominence Reflected by Imitations of Words with and without F0 Continuity. In: Speech Prosody 2014. 2014

    Investigating the communicative function of breathing and non-breathing "silent" pauses

    Cwiek A, Neueder S, Wagner P. Investigating the communicative function of breathing and non-breathing "silent" pauses. In: Draxler C, Kleber F, eds. Tagungsband der 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum. München, Deutschland: Ludwig-Maximilians-Universität München; 2016: 27-29

    Deriving a strategy for synthesizing lengthening disfluencies based on spontaneous conversational speech data

    Betz S, Wagner P, Voße J. Deriving a strategy for synthesizing lengthening disfluencies based on spontaneous conversational speech data. In: Draxler C, Kleber F, eds. Tagungsband der 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum. München: LMU; 2016: 19-22

    A CLARIN Transcription Portal for Interview Data

    In this paper we present a first version of a transcription portal for audio files based on automatic speech recognition (ASR) in various languages. The portal is implemented in the CLARIN resources research network and intended for use by non-technical scholars. We explain the background and interdisciplinary nature of interview data, the perks and quirks of using ASR for transcribing the audio in a research context, the dos and don’ts for optimal use of the portal, and the future developments foreseen. The portal is being promoted in a range of workshops, but a number of challenges remain to be met. These concern privacy issues, ASR quality, and cost, among others.