
    Enhancing Crisis-Related Tweet Classification with Entity-Masked Language Modeling and Multi-Task Learning

    Social media has become an important information source for crisis management, providing quick access to ongoing developments and critical information. However, classification models suffer from event-related biases and highly imbalanced label distributions, which still pose a challenge. To address these issues, we propose a combination of entity-masked language modeling and hierarchical multi-label classification as a multi-task learning problem. We evaluate our method on tweets from the TREC-IS dataset and show an absolute performance gain w.r.t. F1-score of up to 10% for actionable information types. Moreover, we find that entity masking reduces overfitting to in-domain events and improves cross-event generalization.
    Comment: Accepted at NLP4PI (EMNLP 2022)
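    The entity-masking idea above can be illustrated in a few lines: event-specific names are replaced by a mask token before language modeling, so the classifier cannot latch onto them. This is only a minimal sketch; the entity spans would normally come from an NER tagger, and the tweet and spans below are hypothetical examples.

    ```python
    def mask_entities(tokens, entity_spans, mask_token="[MASK]"):
        """Replace tokens inside entity spans with a mask token so the
        model cannot overfit to event-specific names."""
        masked = list(tokens)
        for start, end in entity_spans:
            for i in range(start, end):
                masked[i] = mask_token
        return masked

    tweet = "Flooding reported near Houston after Hurricane Harvey".split()
    spans = [(3, 4), (5, 7)]  # spans covering "Houston" and "Hurricane Harvey"
    print(mask_entities(tweet, spans))
    # → ['Flooding', 'reported', 'near', '[MASK]', 'after', '[MASK]', '[MASK]']
    ```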

    Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0

    Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech or tracking the effectiveness of speech therapy would require systems that can detect dysfluencies while at the same time being able to detect speech techniques acquired in therapy. This paper shows that fine-tuning wav2vec 2.0 [1] for the classification of stuttering on a sizeable English corpus containing stuttered speech, in conjunction with multi-task learning, boosts the effectiveness of the general-purpose wav2vec 2.0 features for detecting stuttering in speech, both within and across languages. We evaluate our method on FluencyBank [2] and the German therapy-centric Kassel State of Fluency (KSoF) [3] dataset by training Support Vector Machine classifiers using features extracted from the fine-tuned models for six different stuttering-related event types: blocks, prolongations, sound repetitions, word repetitions, interjections, and, specific to therapy, speech modifications. Using embeddings from the fine-tuned models leads to relative classification performance gains of up to 27% w.r.t. F1-score.
    Comment: Accepted at Interspeech 202
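    The pipeline above extracts frame-level features from a fine-tuned model and trains SVMs on them; a common intermediate step is to mean-pool the variable-length frame sequence into one fixed-size utterance embedding. The toy sketch below shows only that pooling step on synthetic "frames"; real code would use the wav2vec 2.0 model and an SVM library, which are not reproduced here.

    ```python
    def mean_pool(frames):
        """Average frame-level feature vectors into a single
        utterance-level embedding for a downstream classifier."""
        dim = len(frames[0])
        n = len(frames)
        return [sum(f[d] for f in frames) / n for d in range(dim)]

    # three toy 2-dimensional frame features standing in for model outputs
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(mean_pool(frames))  # → [3.0, 4.0]
    ```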

    Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments

    We analyze the impact of speaker adaptation in end-to-end automatic speech recognition models based on transformers and wav2vec 2.0 under different noise conditions. By including speaker embeddings obtained from x-vector and ECAPA-TDNN systems, as well as i-vectors, we achieve relative word error rate improvements of up to 16.3% on LibriSpeech and up to 14.5% on Switchboard. We show that the proven method of concatenating speaker vectors to the acoustic features and supplying them as auxiliary model inputs remains a viable option to increase the robustness of end-to-end architectures. The effect on transformer models is stronger when more noise is added to the input speech. The most substantial benefits for systems based on wav2vec 2.0 are achieved under moderate or no noise conditions. Both x-vectors and ECAPA-TDNN embeddings outperform i-vectors as speaker representations. The optimal embedding size depends on the dataset and also varies with the noise condition.
    Comment: Accepted at ASRU 202
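    The "concatenating speaker vectors to the acoustic features" step described above is mechanically simple: the same fixed utterance-level speaker embedding is appended to every acoustic frame. A minimal sketch with hypothetical toy dimensions:

    ```python
    def add_speaker_vector(acoustic_frames, speaker_vec):
        """Concatenate a fixed utterance-level speaker embedding
        (e.g. x-vector, ECAPA-TDNN embedding, or i-vector) to every
        acoustic feature frame as an auxiliary model input."""
        return [frame + speaker_vec for frame in acoustic_frames]

    frames = [[0.1, 0.2], [0.3, 0.4]]  # toy 2-dim acoustic features
    xvec = [0.9, 0.8, 0.7]             # toy 3-dim speaker embedding
    print(add_speaker_vector(frames, xvec))
    # → [[0.1, 0.2, 0.9, 0.8, 0.7], [0.3, 0.4, 0.9, 0.8, 0.7]]
    ```

    Each output frame now has acoustic-plus-speaker dimensionality, so the encoder's input layer must be sized accordingly.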

    A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem

    Most stuttering detection and classification research has viewed stuttering as a multi-class classification problem or a binary detection task for each dysfluency type; however, this does not match the nature of stuttering, in which one dysfluency seldom comes alone but rather co-occurs with others. This paper explores multi-language and cross-corpus end-to-end stuttering detection as a multi-label problem using a modified wav2vec 2.0 system with an attention-based classification head and multi-task learning. We evaluate the method using combinations of three datasets containing English and German stuttered speech, one containing speech modified by fluency shaping. The experimental results and an error analysis show that multi-label stuttering detection systems trained on cross-corpus and multi-language data achieve competitive results, but performance on samples with multiple labels stays below overall detection results.
    Comment: Accepted for presentation at Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.1598
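    The multi-label formulation above differs from multi-class classification in the output layer: each dysfluency type gets an independent sigmoid, so several labels can fire for one sample, unlike a softmax that forces a single class. A minimal sketch of that decision rule (the logits and the 0.5 threshold are illustrative, not taken from the paper):

    ```python
    import math

    def multilabel_predict(logits, threshold=0.5):
        """Independent sigmoid per dysfluency type: several labels can
        be active at once, unlike a softmax multi-class head."""
        probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
        return [p >= threshold for p in probs]

    # toy logits for [block, prolongation, sound rep., word rep.,
    #                 interjection, speech modification]
    print(multilabel_predict([2.1, -1.5, 0.3, -3.0, 1.0, -0.2]))
    # → [True, False, True, False, True, False]
    ```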

    Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?

    The detection of pathologies from speech features is usually defined as a binary classification task with one class representing a specific pathology and the other class representing healthy speech. In this work, we train neural networks, large margin classifiers, and tree boosting machines to distinguish between four different pathologies: Parkinson's disease, laryngeal cancer, cleft lip and palate, and oral squamous cell carcinoma. We demonstrate that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be effectively used to classify these types of pathological voices. We evaluate the robustness of our classifiers by adding room impulse responses to the test data and by applying them to unseen speech corpora. Our approach achieves unweighted average F1-scores between 74.1% and 96.4%, depending on the model and the noise conditions used. The systems generalize and perform well on unseen data of healthy speakers sampled from a variety of different sources.
    Comment: Submitted to ICASSP 202
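    The "unweighted average F1-score" reported above is the macro F1: the per-class F1 scores are averaged without weighting by class size, so rare pathologies count as much as frequent ones. A short sketch with made-up per-class counts:

    ```python
    def macro_f1(per_class_counts):
        """Unweighted (macro) average F1 over classes, given
        (true positives, false positives, false negatives) per class."""
        def f1(tp, fp, fn):
            return 2 * tp / (2 * tp + fp + fn)
        scores = [f1(tp, fp, fn) for tp, fp, fn in per_class_counts]
        return sum(scores) / len(scores)

    # two toy classes: (tp, fp, fn) = (8, 2, 2) → F1 0.8, (5, 5, 5) → F1 0.5
    print(macro_f1([(8, 2, 2), (5, 5, 5)]))  # → 0.65
    ```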

    A Survey of Music Generation in the Context of Interaction

    In recent years, machine learning, and in particular generative adversarial neural networks (GANs) and attention-based neural networks (transformers), have been successfully used to compose and generate music, both melodies and polyphonic pieces. Current research focuses foremost on style replication (e.g., generating a Bach-style chorale) or style transfer (e.g., classical to jazz) based on large amounts of recorded or transcribed music, which in turn also allows for fairly straightforward "performance" evaluation. However, most of these models are not suitable for human-machine co-creation through live interaction, nor is it clear how such models and the resulting creations would be evaluated. This article presents a thorough review of music representation, feature analysis, heuristic algorithms, statistical and parametric modelling, and human and automatic evaluation measures, along with a discussion of which approaches and models seem most suitable for live interaction.

    Classifying Dementia in the Presence of Depression: A Cross-Corpus Study

    Automated dementia screening enables early detection and intervention, reducing costs to healthcare systems and increasing quality of life for those affected. Depression shares symptoms with dementia, adding complexity to diagnoses. The research focus so far has been on binary classification of dementia (DEM) and healthy controls (HC) using speech from picture description tests from a single dataset. In this work, we apply established baseline systems to discriminate cognitive impairment in speech from the semantic Verbal Fluency Test and the Boston Naming Test using text, audio, and emotion embeddings in a 3-class classification problem (HC vs. MCI vs. DEM). We perform cross-corpus and mixed-corpus experiments on two independently recorded German datasets to investigate generalization to larger populations and different recording conditions. In a detailed error analysis, we look at depression as a secondary diagnosis to understand what our classifiers actually learn.
    Comment: Accepted at INTERSPEECH 202
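    The cross-corpus setting mentioned above (here and in several of the other abstracts) amounts to a leave-one-corpus-out protocol: train on all corpora except one and test on the held-out corpus, so the test recording conditions are never seen in training. A minimal sketch with hypothetical corpus names:

    ```python
    def cross_corpus_splits(corpora):
        """Yield (train_corpora, test_corpus) pairs where the test
        corpus is never seen during training."""
        for held_out in corpora:
            train = [c for c in corpora if c != held_out]
            yield train, held_out

    for train, test in cross_corpus_splits(["corpusA", "corpusB"]):
        print(train, "->", test)
    # → ['corpusB'] -> corpusA
    # → ['corpusA'] -> corpusB
    ```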

    Renal and Skeletal Anomalies in a Cohort of Individuals With Clinically Presumed Hereditary Nephropathy Analyzed by Molecular Genetic Testing

    Background: Chronic kidney disease (CKD) in childhood and adolescence occurs with a median incidence of 9 per million of the age-related population. Over 70% of CKD cases under the age of 25 years can be attributed to a hereditary kidney disease. Among these are hereditary podocytopathies, ciliopathies, and (monogenic) congenital anomalies of the kidney and urinary tract (CAKUT). These disease entities can present with a vast variety of extrarenal manifestations. So far, skeletal anomalies (SA) have been infrequently described as an extrarenal manifestation in these entities. The aim of this study was to retrospectively investigate a cohort of individuals with hereditary podocytopathies, ciliopathies, or CAKUT, in which molecular genetic testing had been performed, for the extrarenal manifestation of SA. Material and Methods: A cohort of 65 unrelated individuals with a clinically presumed hereditary podocytopathy (focal segmental glomerulosclerosis, steroid-resistant nephrotic syndrome), ciliopathy (nephronophthisis, Bardet-Biedl syndrome, autosomal recessive/dominant polycystic kidney disease), or CAKUT was screened for SA. Data were acquired using a standardized questionnaire and medical reports. 57/65 (88%) of the index cases were analyzed using exome sequencing (ES). Results: 8/65 (12%) index individuals presented with a hereditary podocytopathy, ciliopathy, or CAKUT and an additional skeletal phenotype. In 5/8 families (63%), pathogenic variants in known disease-associated genes (1x BBS1, 1x MAFB, 2x PBX1, 1x SIX2) could be identified. Conclusions: This study highlights the genetic heterogeneity and clinical variability of hereditary nephropathies with respect to skeletal anomalies as an extrarenal manifestation.

    The ACM Multimedia 2022 Computational Paralinguistics Challenge: vocalisations, stuttering, activity, & mosquitoes

    The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: in the Vocalisations and Stuttering Sub-Challenges, a classification of human non-verbal vocalisations and speech has to be made; the Activity Sub-Challenge aims at beyond-audio human activity recognition from smartwatch sensor data; and in the Mosquitoes Sub-Challenge, mosquitoes need to be detected. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the 'usual' ComParE and BoAW features, the auDeep toolkit, and deep feature extraction from pre-trained CNNs using the DeepSpectrum toolkit; in addition, we add end-to-end sequential modelling and a log-mel-128-BNN.