Autonomous Learning of Representations
Walter O, Häb-Umbach R, Mokbel B, Paaßen B, Hammer B. Autonomous Learning of Representations. KI - Künstliche Intelligenz. 2015;29(4):339–351. Besides the core learning algorithm itself, one major question in machine learning is how best to encode given training data so that the learning technology can learn from it efficiently and generalize to novel data. While classical approaches often rely on a hand-coded data representation, autonomous representation or feature learning plays a major role in modern learning architectures. The goal of this contribution is to give an overview of different principles of autonomous feature learning, and to illustrate two of these principles with two recent examples: autonomous metric learning for sequences, and autonomous learning of a deep representation for spoken language.
Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision
In this paper we present our audio tagging system for the DCASE 2019 Challenge Task 2. We propose a model consisting of a convolutional front end using log-mel energies as input features, a recurrent neural network sequence encoder, and a fully connected classifier network that outputs an activity probability for each of the 80 considered event classes. Because the recurrent neural network encodes a whole sequence into a single vector, our model is able to process sequences of varying lengths. The model is trained with only a small amount of manually labeled training data and a larger amount of automatically labeled web data, which suffers from label noise. To train the model efficiently with the provided data, we use various data augmentation techniques to prevent overfitting and improve generalization. Our best submitted system achieves a label-weighted label-ranking average precision (lwlrap) of 75.5% on the private test set, which is an absolute improvement of 21.7% over the baseline. This system took second place in the team ranking of the DCASE 2019 Challenge Task 2 and fifth place in the Kaggle competition "Freesound Audio Tagging 2019", which had more than 400 participants. After the challenge ended, we further improved performance to 76.5% lwlrap, setting a new state of the art on this dataset.
Disentangling the Dimensions of Phonetic Variation: First Steps towards an Explanatory and Exploratory Research Tool in Phonetics
Wagner P, Häb-Umbach R. Disentangling the Dimensions of Phonetic Variation: First Steps towards an Explanatory and Exploratory Research Tool in Phonetics. PERILUS: Phonetic experimental research at the Institute of Linguistics, University of Stockholm. 2019;XXVII:79-83. In this paper, we present first evidence for a potential application of novel speech technological methods as a valuable tool for basic phonetics research. We describe a research program aiming at identifying the complex phonetic realizations underlying various dimensions of phonetic variation. This will be addressed with the help of recent approaches in unsupervised voice conversion and waveform generation. Concretely, we present a model for disentangling speakers' voice qualities and their linguistic-phonetic content, which can then be used to perform voice conversion across different dimensions of phonetic variation. The resulting signals are then "audible versions" of the phonetic dimensions of interest, and lend themselves to straightforward phonetic interpretation.
Explaining voice characteristics to novice voice practitioners - How successful is it?
Wiechmann J, Rautenberg F, Wagner P, Häb-Umbach R. Explaining voice characteristics to novice voice practitioners - How successful is it? Presented at the 20th International Congress of the Phonetic Sciences (ICPhS), Prague, Czech Republic. Human voices are notoriously difficult to characterize. A suitable and consistent description of voice characteristics is crucial in many applied disciplines such as speech therapy or forensics. The present study examines the ability of novice voice practitioners (students of clinical linguistics) to characterize voices before and after an expert explanation of laryngeal, supralaryngeal and prosodic voice features. Results show that even short expert explanations lead to higher agreement between expert and novices. Voice characteristics related to laryngeal and supralaryngeal settings in particular remain a major challenge to identify. We suggest that voice conversion technology may be employed in the future to assist the explanation of voice characteristics.
Technically enabled explaining of voice characteristics
Wiechmann J, Glarner T, Rautenberg F, Wagner P, Häb-Umbach R. Technically enabled explaining of voice characteristics. In: Bruggemann A, Ludusan B, eds. P & P 18. Bielefeld: Universitätsbibliothek Bielefeld; 2022
Acoustic modeling of hoarseness
Wiechmann J, Rautenberg F, Wagner P, Häb-Umbach R. Acoustic modeling of hoarseness. In: Pistor TR, Steiner C, Tomascheck F, Leemann A, eds. Book of Abstracts der 19. Tagung Phonetik und Phonologie im deutschsprachigen Raum. Bern: Bern Open Publishing; 2023