13 research outputs found

Technically enabled explaining of voice characteristics

Author: Glarner Thomas
Häb-Umbach Reinhold
Rautenberg Frederik
Wagner Petra
Wiechmann Jana
Publication venue: Phonetik und Phonologie im deutschsprachigen Raum
Publication date: 05/10/2022

BieColl - Bielefeld eCollections


Autonomous Learning of Representations

Author: Hammer Barbara
Häb-Umbach Reinhold
Mokbel Bassam
Paaßen Benjamin
Walter Oliver
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Walter O, Häb-Umbach R, Mokbel B, Paaßen B, Hammer B. Autonomous Learning of Representations. KI - Künstliche Intelligenz. 2015;29(4):339–351.

Besides the core learning algorithm itself, one major question in machine learning is how best to encode given training data so that the learning technology can learn efficiently from it and generalize to novel data. While classical approaches often rely on a hand-coded data representation, autonomous representation or feature learning plays a major role in modern learning architectures. The goal of this contribution is to give an overview of different principles of autonomous feature learning and to illustrate two of them with recent examples: autonomous metric learning for sequences, and autonomous learning of a deep representation for spoken language.

Publications at Bielefeld University

Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision

Author: Ebbers Janek
Häb-Umbach Reinhold
Publication venue: New York University
Publication date: 01/01/2019

In this paper we present our audio tagging system for the DCASE 2019 Challenge Task 2. We propose a model consisting of a convolutional front end that takes log-mel energies as input features, a recurrent neural network sequence encoder, and a fully connected classifier network that outputs an activity probability for each of the 80 considered event classes. Because the recurrent neural network encodes a whole sequence into a single vector, our model can process sequences of varying length. The model is trained with only a small amount of manually labeled training data and a larger amount of automatically labeled web data, which therefore suffers from label noise. To train the model efficiently on the provided data, we use various data augmentation techniques that prevent overfitting and improve generalization. Our best submitted system achieves a label-weighted label-ranking average precision (lwlrap) of 75.5% on the private test set, an absolute improvement of 21.7 percentage points over the baseline. This system placed second in the team ranking of the DCASE 2019 Challenge Task 2 and fifth in the Kaggle competition "Freesound Audio Tagging 2019", which had more than 400 participants. After the challenge ended, we further improved performance to 76.5% lwlrap, setting a new state of the art on this dataset.

Crossref

New York University Faculty Digital Archive
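The audio tagging abstract above reports its results in lwlrap. As a rough illustration of what that metric computes, here is a minimal NumPy sketch, assuming the usual definition used in the Freesound Audio Tagging 2019 competition: for every positive (clip, label) pair, take the precision at that label's rank among the clip's scores, then average over all positive pairs, so that each label instance (not each clip) carries equal weight. The function name `lwlrap` is our own, tied scores are simply broken by class index, and this is not the authors' evaluation code.

```python
import numpy as np


def lwlrap(truth, scores):
    """Label-weighted label-ranking average precision (illustrative sketch).

    truth:  (n_clips, n_classes) binary ground-truth label matrix
    scores: (n_clips, n_classes) real-valued classifier outputs
    Ties in scores are broken by class index (stable sort), an assumption.
    """
    truth = np.asarray(truth, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    precisions = []
    for y, s in zip(truth, scores):
        if not y.any():
            continue  # clips without positive labels contribute nothing
        order = np.argsort(-s, kind="stable")        # classes by descending score
        ranks = np.empty_like(order)
        ranks[order] = np.arange(1, len(s) + 1)      # 1-based rank of each class
        hits = np.cumsum(y[order])                   # positives at or above each rank
        for c in np.flatnonzero(y):
            r = ranks[c]
            precisions.append(hits[r - 1] / r)       # precision at this label's rank
    if not precisions:
        raise ValueError("no positive labels in truth")
    # Averaging over all positive pairs (not per clip) gives the label weighting.
    return float(np.mean(precisions))


print(lwlrap([[0, 1, 0], [1, 1, 0]], [[0.9, 0.5, 0.1], [0.2, 0.8, 0.5]]))
```

For these two clips the sketch returns 13/18 (about 0.722). Averaging per clip first, as plain label-ranking average precision does, would give 2/3 instead, because the second clip's two positive labels would then count only once between them.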

Disentangling the Dimensions of Phonetic Variation: First Steps towards an Explanatory and Exploratory Research Tool in Phonetics

Author: Heldner Mattias
Häb-Umbach Reinhold
Wagner Petra
Publication venue: Department of Linguistics, Stockholm
Publication date: 01/01/2019

Wagner P, Häb-Umbach R. Disentangling the Dimensions of Phonetic Variation: First Steps towards an Explanatory and Exploratory Research Tool in Phonetics. PERILUS: Phonetic experimental research at the Institute of Linguistics, University of Stockholm. 2019;XXVII:79-83.

In this paper, we present first evidence that novel speech technology methods can serve as a valuable tool for basic phonetics research. We describe a research program that aims to identify the complex phonetic realizations underlying various dimensions of phonetic variation, using recent approaches in unsupervised voice conversion and waveform generation. Concretely, we present a model that disentangles speakers' voice qualities from their linguistic-phonetic content and can then be used to perform voice conversion across different dimensions of phonetic variation. The resulting signals are "audible versions" of the phonetic dimensions of interest and lend themselves to straightforward phonetic interpretation.

Publications at Bielefeld University

Explaining voice characteristics to novice voice practitioners - How successful is it?

Author: Häb-Umbach Reinhold
Rautenberg Frederik
Wagner Petra
Wiechmann Jana
Publication venue: 20th International Congress of the Phonetic Sciences (ICPhS), Prague
Publication date: 01/01/2023

Wiechmann J, Rautenberg F, Wagner P, Häb-Umbach R. Explaining voice characteristics to novice voice practitioners - How successful is it? Presented at the 20th International Congress of the Phonetic Sciences (ICPhS), Prague, Czech Republic.

Human voices are notoriously difficult to characterize. A suitable and consistent description of voice characteristics is crucial in many applied disciplines such as speech therapy or forensics. The present study examines the ability of novice voice practitioners (students of clinical linguistics) to characterize voices before and after an expert explanation of laryngeal, supralaryngeal and prosodic voice features. Results show that even short expert explanations lead to higher agreement between the expert and the novices. However, voice characteristics related to laryngeal and supralaryngeal settings in particular remain a major challenge to identify. We suggest that voice conversion technology may be employed in the future to assist the explanation of voice characteristics.

Publications at Bielefeld University

Technically enabled explaining of voice characteristics

Author: Bruggemann Anna
Glarner Thomas
Häb-Umbach Reinhold
Ludusan Bogdan
Rautenberg Frederik
Wagner Petra
Wiechmann Jana
Publication venue: Universitätsbibliothek Bielefeld
Publication date: 01/01/2022

Wiechmann J, Glarner T, Rautenberg F, Wagner P, Häb-Umbach R. Technically enabled explaining of voice characteristics. In: Bruggemann A, Ludusan B, eds. P & P 18. Bielefeld: Universitätsbibliothek Bielefeld; 2022

Publications at Bielefeld University

Acoustic modeling of hoarseness

Author: Häb-Umbach Reinhold
Leemann Adrian
Pistor Tillmann Rainer
Rautenberg Frederik
Steiner Carina
Tomascheck Fabian
Wagner Petra
Wiechmann Jana
Publication venue: Bern Open Publishing
Publication date: 01/01/2023

Wiechmann J, Rautenberg F, Wagner P, Häb-Umbach R. Acoustic modeling of hoarseness. In: Pistor TR, Steiner C, Tomascheck F, Leemann A, eds. Book of Abstracts der 19. Tagung Phonetik und Phonologie im deutschsprachigen Raum. Bern: Bern Open Publishing; 2023

Publications at Bielefeld University