    Technically enabled explaining of voice characteristics

    Autonomous Learning of Representations

    Walter O, Häb-Umbach R, Mokbel B, Paaßen B, Hammer B. Autonomous Learning of Representations. KI - Künstliche Intelligenz. 2015;29(4):339–351.
    Besides the core learning algorithm itself, one major question in machine learning is how best to encode the given training data so that the learning technology can learn efficiently from it and generalize to novel data. Classical approaches often rely on a hand-coded data representation. In modern learning architectures, however, autonomous representation or feature learning plays a major role. The goal of this contribution is to give an overview of different principles of autonomous feature learning. It exemplifies two of these principles with two recent examples: autonomous metric learning for sequences, and autonomous learning of a deep representation for spoken language.

    Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision

    In this paper we present our audio tagging system for the DCASE 2019 Challenge Task 2. The proposed model has three parts: a convolutional front end that takes log-mel energies as input features, a recurrent neural network (RNN) sequence encoder, and a fully connected classifier network that outputs an activity probability for each of the 80 considered event classes. Because the RNN encodes a whole sequence into a single vector, the model can process sequences of varying length. The model is trained with only a small amount of manually labeled data and a larger amount of automatically labeled web data, which consequently suffers from label noise. To train the model efficiently with the provided data, we use various data augmentation techniques that prevent overfitting and improve generalization. Our best submitted system achieves a label-weighted label-ranking average precision (lwlrap) of 75.5% on the private test set, an absolute improvement of 21.7% over the baseline. This system took second place in the team ranking of the DCASE 2019 Challenge Task 2. It also took fifth place in the Kaggle competition "Freesound Audio Tagging 2019", which had more than 400 participants. After the challenge ended, we further improved performance to 76.5% lwlrap, setting a new state of the art on this dataset.
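    The lwlrap score quoted above can be illustrated with a minimal sketch, assuming the metric's usual definition. For every positive (clip, class) pair, take the precision at the rank where that class appears in the clip's descending score ordering. Then average over all such pairs, so that each positive label carries equal weight. The function name `lwlrap` and the list-of-lists inputs are illustrative assumptions, and ties are broken by class index. This is not the challenge's official scoring code.

```python
def lwlrap(truth, scores):
    """Label-weighted label-ranking average precision (illustrative sketch).

    truth:  list of rows of 0/1 ground-truth labels, one row per clip.
    scores: list of rows of predicted scores, same shape as truth.
    """
    precisions = []  # one precision value per positive (clip, class) pair
    for labels, row in zip(truth, scores):
        # Rank classes by descending score; sorted() is stable, so ties keep class order.
        order = sorted(range(len(row)), key=lambda c: -row[c])
        hits = 0
        for rank, c in enumerate(order, start=1):
            if labels[c]:
                hits += 1
                # Precision at the rank where this positive class appears.
                precisions.append(hits / rank)
    # Averaging over all positive pairs weights each class by its label frequency.
    return sum(precisions) / len(precisions) if precisions else 0.0
```

    For example, `lwlrap([[1, 0, 0], [0, 1, 1]], [[0.9, 0.1, 0.2], [0.8, 0.7, 0.1]])` averages the precisions 1, 1/2 and 2/3, giving about 0.722. Averaging over label occurrences rather than over clips is what distinguishes lwlrap from the per-sample label-ranking average precision.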

    Disentangling the Dimensions of Phonetic Variation: First Steps towards an Explanatory and Exploratory Research Tool in Phonetics

    Wagner P, Häb-Umbach R. Disentangling the Dimensions of Phonetic Variation: First Steps towards an Explanatory and Exploratory Research Tool in Phonetics. PERILUS: Phonetic experimental research at the Institute of Linguistics, University of Stockholm. 2019;XXVII:79–83.
    In this paper, we present first evidence that novel speech technology methods can serve as a valuable tool for basic phonetics research. We describe a research program that aims to identify the complex phonetic realizations underlying various dimensions of phonetic variation. The program builds on recent approaches to unsupervised voice conversion and waveform generation. Concretely, we present a model that disentangles speakers' voice qualities from their linguistic-phonetic content. This model can then be used to perform voice conversion across different dimensions of phonetic variation. The resulting signals are "audible versions" of the phonetic dimensions of interest and lend themselves to straightforward phonetic interpretation.

    Explaining voice characteristics to novice voice practitioners - How successful is it?

    Wiechmann J, Rautenberg F, Wagner P, Häb-Umbach R. Explaining voice characteristics to novice voice practitioners - How successful is it? Presented at the 20th International Congress of the Phonetic Sciences (ICPhS), Prague, Czech Republic.
    Human voices are notoriously difficult to characterize. A suitable and consistent description of voice characteristics is crucial in many applied disciplines, such as speech therapy or forensics. The present study examines how well novice voice practitioners (students of clinical linguistics) characterize voices before and after an expert explanation of laryngeal, supralaryngeal and prosodic voice features. Results show that even short expert explanations lead to higher agreement between the expert and the novices. However, voice characteristics related to laryngeal and supralaryngeal settings remain particularly difficult to identify. We suggest that voice conversion technology may in future be employed to assist in explaining voice characteristics.

    Technically enabled explaining of voice characteristics

    Wiechmann J, Glarner T, Rautenberg F, Wagner P, Häb-Umbach R. Technically enabled explaining of voice characteristics. In: Bruggemann A, Ludusan B, eds. P & P 18. Bielefeld: Universitätsbibliothek Bielefeld; 2022.

    Acoustic modeling of hoarseness

    Wiechmann J, Rautenberg F, Wagner P, Häb-Umbach R. Acoustic modeling of hoarseness. In: Pistor TR, Steiner C, Tomascheck F, Leemann A, eds. Book of Abstracts der 19. Tagung Phonetik und Phonologie im deutschsprachigen Raum. Bern: Bern Open Publishing; 2023.