
    Enhancing Crisis-Related Tweet Classification with Entity-Masked Language Modeling and Multi-Task Learning

    Social media has become an important information source for crisis management, providing quick access to ongoing developments and critical information. However, classification models suffer from event-related biases and highly imbalanced label distributions, which still pose a challenge. To address these issues, we propose a combination of entity-masked language modeling and hierarchical multi-label classification as a multi-task learning problem. We evaluate our method on tweets from the TREC-IS dataset and show an absolute performance gain w.r.t. F1-score of up to 10% for actionable information types. Moreover, we find that entity masking reduces overfitting to in-domain events and improves cross-event generalization.
    Comment: Accepted at NLP4PI (EMNLP 2022)
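    The entity-masking idea above can be illustrated in a few lines: event-specific names are replaced by a mask token before language modeling, so the classifier cannot latch onto them. This is only a minimal sketch; the entity spans would normally come from an NER tagger, and the tweet and spans below are hypothetical examples.

    ```python
    def mask_entities(tokens, entity_spans, mask_token="[MASK]"):
        """Replace tokens inside entity spans with a mask token so the
        model cannot overfit to event-specific names."""
        masked = list(tokens)
        for start, end in entity_spans:
            for i in range(start, end):
                masked[i] = mask_token
        return masked

    tweet = "Flooding reported near Houston after Hurricane Harvey".split()
    spans = [(3, 4), (5, 7)]  # spans covering "Houston" and "Hurricane Harvey"
    print(mask_entities(tweet, spans))
    # → ['Flooding', 'reported', 'near', '[MASK]', 'after', '[MASK]', '[MASK]']
    ```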

    Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0

    Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech or tracking the effectiveness of speech therapy would require systems that can detect dysfluencies while at the same time being able to detect speech techniques acquired in therapy. This paper shows that fine-tuning wav2vec 2.0 [1] for the classification of stuttering on a sizeable English corpus containing stuttered speech, in conjunction with multi-task learning, boosts the effectiveness of the general-purpose wav2vec 2.0 features for detecting stuttering in speech, both within and across languages. We evaluate our method on FluencyBank [2] and the German therapy-centric Kassel State of Fluency (KSoF) [3] dataset by training Support Vector Machine classifiers using features extracted from the fine-tuned models for six different stuttering-related event types: blocks, prolongations, sound repetitions, word repetitions, interjections, and, specific to therapy, speech modifications. Using embeddings from the fine-tuned models leads to relative classification performance gains of up to 27% w.r.t. F1-score.
    Comment: Accepted at Interspeech 202
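    The pipeline above extracts frame-level features from a fine-tuned model and trains SVMs on them; a common intermediate step is to mean-pool the variable-length frame sequence into one fixed-size utterance embedding. The toy sketch below shows only that pooling step on synthetic "frames"; real code would use the wav2vec 2.0 model and an SVM library, which are not reproduced here.

    ```python
    def mean_pool(frames):
        """Average frame-level feature vectors into a single
        utterance-level embedding for a downstream classifier."""
        dim = len(frames[0])
        n = len(frames)
        return [sum(f[d] for f in frames) / n for d in range(dim)]

    # three toy 2-dimensional frame features standing in for model outputs
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(mean_pool(frames))  # → [3.0, 4.0]
    ```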

    Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments

    We analyze the impact of speaker adaptation in end-to-end automatic speech recognition models based on transformers and wav2vec 2.0 under different noise conditions. By including speaker embeddings obtained from x-vector and ECAPA-TDNN systems, as well as i-vectors, we achieve relative word error rate improvements of up to 16.3% on LibriSpeech and up to 14.5% on Switchboard. We show that the proven method of concatenating speaker vectors to the acoustic features and supplying them as auxiliary model inputs remains a viable option to increase the robustness of end-to-end architectures. The effect on transformer models is stronger when more noise is added to the input speech. The most substantial benefits for systems based on wav2vec 2.0 are achieved under moderate or no noise conditions. Both x-vectors and ECAPA-TDNN embeddings outperform i-vectors as speaker representations. The optimal embedding size depends on the dataset and also varies with the noise condition.
    Comment: Accepted at ASRU 202
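    The "concatenating speaker vectors to the acoustic features" step described above is mechanically simple: the same fixed utterance-level speaker embedding is appended to every acoustic frame. A minimal sketch with hypothetical toy dimensions:

    ```python
    def add_speaker_vector(acoustic_frames, speaker_vec):
        """Concatenate a fixed utterance-level speaker embedding
        (e.g. x-vector, ECAPA-TDNN embedding, or i-vector) to every
        acoustic feature frame as an auxiliary model input."""
        return [frame + speaker_vec for frame in acoustic_frames]

    frames = [[0.1, 0.2], [0.3, 0.4]]  # toy 2-dim acoustic features
    xvec = [0.9, 0.8, 0.7]             # toy 3-dim speaker embedding
    print(add_speaker_vector(frames, xvec))
    # → [[0.1, 0.2, 0.9, 0.8, 0.7], [0.3, 0.4, 0.9, 0.8, 0.7]]
    ```

    Each output frame now has acoustic-plus-speaker dimensionality, so the encoder's input layer must be sized accordingly.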

    A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem

    Most stuttering detection and classification research has viewed stuttering as a multi-class classification problem or a binary detection task for each dysfluency type; however, this does not match the nature of stuttering, in which one dysfluency seldom comes alone but rather co-occurs with others. This paper explores multi-language and cross-corpus end-to-end stuttering detection as a multi-label problem using a modified wav2vec 2.0 system with an attention-based classification head and multi-task learning. We evaluate the method using combinations of three datasets containing English and German stuttered speech, one containing speech modified by fluency shaping. The experimental results and an error analysis show that multi-label stuttering detection systems trained on cross-corpus and multi-language data achieve competitive results, but performance on samples with multiple labels stays below overall detection results.
    Comment: Accepted for presentation at Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.1598
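    The multi-label formulation above differs from multi-class classification in the output layer: each dysfluency type gets an independent sigmoid, so several labels can fire for one sample, unlike a softmax that forces a single class. A minimal sketch of that decision rule (the logits and the 0.5 threshold are illustrative, not taken from the paper):

    ```python
    import math

    def multilabel_predict(logits, threshold=0.5):
        """Independent sigmoid per dysfluency type: several labels can
        be active at once, unlike a softmax multi-class head."""
        probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
        return [p >= threshold for p in probs]

    # toy logits for [block, prolongation, sound rep., word rep.,
    #                 interjection, speech modification]
    print(multilabel_predict([2.1, -1.5, 0.3, -3.0, 1.0, -0.2]))
    # → [True, False, True, False, True, False]
    ```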

    Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?

    The detection of pathologies from speech features is usually defined as a binary classification task with one class representing a specific pathology and the other class representing healthy speech. In this work, we train neural networks, large margin classifiers, and tree boosting machines to distinguish between four different pathologies: Parkinson's disease, laryngeal cancer, cleft lip and palate, and oral squamous cell carcinoma. We demonstrate that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be effectively used to classify these types of pathological voices. We evaluate the robustness of our classifiers by adding room impulse responses to the test data and by applying them to unseen speech corpora. Our approach achieves unweighted average F1-scores between 74.1% and 96.4%, depending on the model and the noise conditions used. The systems generalize and perform well on unseen data of healthy speakers sampled from a variety of different sources.
    Comment: Submitted to ICASSP 202
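    The "unweighted average F1-score" reported above is the macro F1: the per-class F1 scores are averaged without weighting by class size, so rare pathologies count as much as frequent ones. A short sketch with made-up per-class counts:

    ```python
    def macro_f1(per_class_counts):
        """Unweighted (macro) average F1 over classes, given
        (true positives, false positives, false negatives) per class."""
        def f1(tp, fp, fn):
            return 2 * tp / (2 * tp + fp + fn)
        scores = [f1(tp, fp, fn) for tp, fp, fn in per_class_counts]
        return sum(scores) / len(scores)

    # two toy classes: (tp, fp, fn) = (8, 2, 2) → F1 0.8, (5, 5, 5) → F1 0.5
    print(macro_f1([(8, 2, 2), (5, 5, 5)]))  # → 0.65
    ```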

    A Survey of Music Generation in the Context of Interaction

    In recent years, machine learning, and in particular generative adversarial neural networks (GANs) and attention-based neural networks (transformers), have been successfully used to compose and generate music, both melodies and polyphonic pieces. Current research focuses foremost on style replication (e.g., generating a Bach-style chorale) or style transfer (e.g., classical to jazz) based on large amounts of recorded or transcribed music, which in turn also allows for fairly straightforward "performance" evaluation. However, most of these models are not suitable for human-machine co-creation through live interaction, nor is it clear how such models and the resulting creations would be evaluated. This article presents a thorough review of music representation, feature analysis, heuristic algorithms, statistical and parametric modelling, and human and automatic evaluation measures, along with a discussion of which approaches and models seem most suitable for live interaction.

    Classifying Dementia in the Presence of Depression: A Cross-Corpus Study

    Automated dementia screening enables early detection and intervention, reducing costs to healthcare systems and increasing quality of life for those affected. Depression shares symptoms with dementia, adding complexity to diagnoses. The research focus so far has been on binary classification of dementia (DEM) and healthy controls (HC) using speech from picture description tests from a single dataset. In this work, we apply established baseline systems to discriminate cognitive impairment in speech from the semantic Verbal Fluency Test and the Boston Naming Test using text, audio, and emotion embeddings in a 3-class classification problem (HC vs. MCI vs. DEM). We perform cross-corpus and mixed-corpus experiments on two independently recorded German datasets to investigate generalization to larger populations and different recording conditions. In a detailed error analysis, we look at depression as a secondary diagnosis to understand what our classifiers actually learn.
    Comment: Accepted at INTERSPEECH 202
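    The cross-corpus setting mentioned above (here and in several of the other abstracts) amounts to a leave-one-corpus-out protocol: train on all corpora except one and test on the held-out corpus, so the test recording conditions are never seen in training. A minimal sketch with hypothetical corpus names:

    ```python
    def cross_corpus_splits(corpora):
        """Yield (train_corpora, test_corpus) pairs where the test
        corpus is never seen during training."""
        for held_out in corpora:
            train = [c for c in corpora if c != held_out]
            yield train, held_out

    for train, test in cross_corpus_splits(["corpusA", "corpusB"]):
        print(train, "->", test)
    # → ['corpusB'] -> corpusA
    # → ['corpusA'] -> corpusB
    ```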

    Renal and Skeletal Anomalies in a Cohort of Individuals With Clinically Presumed Hereditary Nephropathy Analyzed by Molecular Genetic Testing

    Background: Chronic kidney disease (CKD) in childhood and adolescence occurs with a median incidence of 9 per million of the age-related population. Over 70% of CKD cases under the age of 25 years can be attributed to a hereditary kidney disease. Among these are hereditary podocytopathies, ciliopathies, and (monogenic) congenital anomalies of the kidney and urinary tract (CAKUT). These disease entities can present with a vast variety of extrarenal manifestations. So far, skeletal anomalies (SA) have been infrequently described as an extrarenal manifestation in these entities. The aim of this study was to retrospectively investigate a cohort of individuals with hereditary podocytopathies, ciliopathies, or CAKUT, in which molecular genetic testing had been performed, for the extrarenal manifestation of SA. Material and Methods: A cohort of 65 unrelated individuals with a clinically presumed hereditary podocytopathy (focal segmental glomerulosclerosis, steroid-resistant nephrotic syndrome), ciliopathy (nephronophthisis, Bardet-Biedl syndrome, autosomal recessive/dominant polycystic kidney disease), or CAKUT was screened for SA. Data were acquired using a standardized questionnaire and medical reports. 57/65 (88%) of the index cases were analyzed using exome sequencing (ES). Results: 8/65 (12%) index individuals presented with a hereditary podocytopathy, ciliopathy, or CAKUT and an additional skeletal phenotype. In 5/8 families (63%), pathogenic variants in known disease-associated genes (1x BBS1, 1x MAFB, 2x PBX1, 1x SIX2) could be identified. Conclusions: This study highlights the genetic heterogeneity and clinical variability of hereditary nephropathies with respect to skeletal anomalies as an extrarenal manifestation.

    The ACM Multimedia 2022 Computational Paralinguistics Challenge: vocalisations, stuttering, activity, & mosquitoes

    The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: in the Vocalisations and Stuttering Sub-Challenges, a classification of human non-verbal vocalisations and speech has to be made; the Activity Sub-Challenge aims at beyond-audio human activity recognition from smartwatch sensor data; and in the Mosquitoes Sub-Challenge, mosquitoes need to be detected. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the 'usual' ComParE and BoAW features, the auDeep toolkit, and deep feature extraction from pre-trained CNNs using the DeepSpectrum toolkit; in addition, we add end-to-end sequential modelling and a log-mel-128-BNN.