Phonemic Segmentation and Labelling using the MAUS Technique
We describe the pronunciation model of the automatic segmentation technique MAUS, based on a data-driven Markov process, and a new evaluation measure for phonemic transcripts, relative symmetric accuracy. Results are given for MAUS segmentation and labelling on German dialog speech. MAUS is currently distributed as a freeware package by the Bavarian Archive for Speech Signals and will also be implemented as a web service in the near future.
Three New Corpora at the Bavarian Archive for Speech Signals - and a First Step Towards Distributed Web-Based Recording
The Bavarian Archive for Speech Signals has released three new speech corpora for both industrial and academic use: a) Hempels Sofa contains recordings of up to 60 seconds of non-scripted telephone speech, b) ZipTel is a corpus of telephone speech covering postal addresses and telephone numbers from a real-world application, and c) RVG-J is an extension of the original Regional Variants of German corpus with juvenile speakers. All three corpora were transcribed orthographically according to the SpeechDat annotation guidelines using the WWWTranscribe annotation software. Recently, BAS has begun to investigate performing large-scale audio recordings via the web, and RVG-J has become the testbed for this type of recording.
On the Convergence Rate of Gaussianization with Random Rotations
Gaussianization is a simple generative model that can be trained without
backpropagation. It has shown compelling performance on low dimensional data.
As the dimension increases, however, it has been observed that the convergence
speed slows down. We show analytically that the number of required layers
scales linearly with the dimension for Gaussian input. We argue that this is
because the model is unable to capture dependencies between dimensions.
Empirically, we find the same linear increase in cost for arbitrary input,
but observe favorable scaling for some distributions. We explore
potential speed-ups and formulate challenges for further research.
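
The abstract does not spell out the layer structure, so the following is only a minimal sketch of one common formulation of Gaussianization, assumed here rather than taken from the paper: each layer applies a random orthogonal transform and then maps every coordinate to a standard normal through its empirical ranks. All function names are hypothetical, and the sketch only transforms the training samples (it stores no inverse). It does show why no backpropagation is needed: every layer is fitted from ranks and a random matrix alone.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for its inverse CDF


def random_rotation(dim, rng):
    """Random orthogonal matrix via Gram-Schmidt on Gaussian vectors.

    Assumption: any orthogonal matrix is accepted, so the result may
    include a reflection (determinant -1) rather than a pure rotation.
    """
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip numerically degenerate draws
            rows.append([a / norm for a in v])
    return rows


def rotate(points, rot):
    """Apply the matrix `rot` to every point."""
    return [[sum(r[j] * p[j] for j in range(len(p))) for r in rot]
            for p in points]


def gaussianize_marginals(points):
    """Map each coordinate to a standard normal via its empirical rank."""
    n, dim = len(points), len(points[0])
    out = [[0.0] * dim for _ in range(n)]
    for j in range(dim):
        order = sorted(range(n), key=lambda i: points[i][j])
        for rank, i in enumerate(order):
            # (rank + 0.5) / n lies strictly inside (0, 1)
            out[i][j] = _STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return out


def gaussianize(points, num_layers, seed=0):
    """Stack `num_layers` layers of rotation + marginal Gaussianization."""
    rng = random.Random(seed)
    dim = len(points[0])
    for _ in range(num_layers):
        rot = random_rotation(dim, rng)
        points = gaussianize_marginals(rotate(points, rot))
    return points
```

Because each layer only Gaussianizes one-dimensional marginals, dependencies between dimensions are removed indirectly, through the rotations. This is consistent with the abstract's argument for why the number of layers grows with the dimension.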
Perceived Prominence Reflected by Imitations of Words with and without F0 Continuity
Mixdorff H, Hönemann A, Niebuhr O, Draxler C. Perceived Prominence Reflected by Imitations of Words with and without F0 Continuity. In: Speech Prosody 2014. 2014
Investigating the communicative function of breathing and non-breathing "silent" pauses
Cwiek A, Neueder S, Wagner P. Investigating the communicative function of breathing and non-breathing "silent" pauses. In: Draxler C, Kleber F, eds. Tagungsband der 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum. München, Deutschland: Ludwig-Maximilians-Universität München; 2016: 27-29
Deriving a strategy for synthesizing lengthening disfluencies based on spontaneous conversational speech data
Betz S, Wagner P, Voße J. Deriving a strategy for synthesizing lengthening disfluencies based on spontaneous conversational speech data. In: Draxler C, Kleber F, eds. Tagungsband der 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum. München: LMU; 2016: 19-22
A CLARIN Transcription Portal for Interview Data
In this paper we present a first version of a transcription portal for audio files based on automatic speech recognition (ASR) in various languages. The portal is implemented in the CLARIN resources research network and intended for use by non-technical scholars. We explain the background and interdisciplinary nature of interview data, the perks and quirks of using ASR for transcribing the audio in a research context, the dos and don’ts for optimal use of the portal, and the future developments foreseen. The portal is promoted in a range of workshops, but a number of challenges remain to be met. These challenges concern privacy issues, ASR quality, and cost, among others.