285 research outputs found

    Multi-Task Self-Supervised Learning for Disfluency Detection

    Most existing approaches to disfluency detection rely heavily on human-annotated data, which is expensive to obtain in practice. To tackle the training-data bottleneck, we investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks whose data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words in unlabeled news data, and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words, and (ii) a sentence-classification task to distinguish original sentences from grammatically incorrect ones. We then combine these two tasks to jointly train a network. The pre-trained network is subsequently fine-tuned on human-annotated disfluency detection data. Experimental results on the commonly used English Switchboard test set show that our approach achieves performance competitive with previous systems (trained on the full dataset) while using less than 1% (1,000 sentences) of the training data. Trained on the full dataset, our method significantly outperforms previous methods, reducing the error by 21% on English Switchboard.
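
    The pseudo-data construction lends itself to a short sketch. The Python below is a hedged approximation of the recipe (the noising probabilities and the copy-a-random-word insertion are illustrative assumptions, not the paper's exact settings); it builds one tagging-task example and the corresponding sentence-classification label.

    import random

    def make_pseudo_example(words, p_insert=0.1, p_delete=0.1, seed=None):
        """Corrupt a clean sentence for the self-supervised tasks.

        Returns (tokens, tags): tag 1 marks an inserted noisy word the
        tagging task must detect, tag 0 an original word. Deletions
        silently drop words, which also yields the grammatically
        incorrect sentences used by the classification task.
        """
        rng = random.Random(seed)
        tokens, tags = [], []
        for w in words:
            if rng.random() < p_delete:          # deletion: drop this word
                continue
            if rng.random() < p_insert:          # insertion: add a random word
                tokens.append(rng.choice(words))
                tags.append(1)                   # 1 = added noise
            tokens.append(w)
            tags.append(0)                       # 0 = original word
        return tokens, tags

    clean = "the cat sat on the mat".split()
    tokens, tags = make_pseudo_example(clean, seed=3)
    is_corrupted = int(tokens != clean)          # sentence-classification label
    print(list(zip(tokens, tags)), is_corrupted)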

    Alzheimer’s Dementia Recognition Through Spontaneous Speech


    A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem

    Most stuttering detection and classification research has viewed stuttering as a multi-class classification problem or as a binary detection task for each dysfluency type; however, this does not match the nature of stuttering, in which one dysfluency seldom comes alone but rather co-occurs with others. This paper explores multi-language and cross-corpus end-to-end stuttering detection as a multi-label problem, using a modified wav2vec 2.0 system with an attention-based classification head and multi-task learning. We evaluate the method on combinations of three datasets containing English and German stuttered speech, one of which contains speech modified by fluency shaping. The experimental results and an error analysis show that multi-label stuttering detection systems trained on cross-corpus and multi-language data achieve competitive results, but performance on samples with multiple labels stays below overall detection results.
    Comment: Accepted for presentation at Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.1598
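
    The architecture sketched in the abstract (a wav2vec 2.0 encoder, attention pooling, and one sigmoid output per dysfluency type) can be approximated as follows. This is a minimal PyTorch sketch under assumed details, not the authors' implementation: the checkpoint name, five-label inventory, and pooling scheme are illustrative assumptions.

    import torch
    import torch.nn as nn
    from transformers import Wav2Vec2Model

    class MultiLabelStutterNet(nn.Module):
        """wav2vec 2.0 encoder + attention pooling + multi-label head.

        A sketch of the kind of model the abstract describes; the
        checkpoint, label set, and pooling details are assumptions.
        """
        def __init__(self, n_labels=5, ckpt="facebook/wav2vec2-base"):
            super().__init__()
            self.encoder = Wav2Vec2Model.from_pretrained(ckpt)
            hidden = self.encoder.config.hidden_size
            self.attn = nn.Linear(hidden, 1)         # frame-level attention scores
            self.head = nn.Linear(hidden, n_labels)  # one logit per dysfluency type

        def forward(self, waveform):                 # waveform: (batch, samples)
            frames = self.encoder(waveform).last_hidden_state  # (B, T, H)
            weights = torch.softmax(self.attn(frames), dim=1)  # (B, T, 1)
            pooled = (weights * frames).sum(dim=1)             # attention pooling
            return self.head(pooled)                           # raw logits

    model = MultiLabelStutterNet()
    logits = model(torch.randn(2, 16000))            # two 1-second clips at 16 kHz
    # Multi-label training pairs these logits with BCEWithLogitsLoss,
    # so several dysfluency types can be active in the same sample.
    loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(2, 5))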

    Individual differences in the production of disfluency: a latent variable analysis of memory ability and verbal intelligence

    Recent work has begun to focus on the role that individual differences in executive function and intelligence play in the production of fluent speech. However, isolating the underlying causes of different types of disfluency has been difficult given the speed and complexity of language production. In this study, we focused on the role of memory abilities and verbal intelligence, and we chose a task that relied heavily on memory for successful performance. Given the task demands, we hypothesised that a substantial proportion of disfluencies would be due to memory-retrieval problems. We contrasted memory abilities with individual differences in verbal intelligence, as previous work has highlighted verbal intelligence as an important factor in disfluency production. A total of 78 participants memorised and repeated 40 syntactically complex sentences, which were recorded and coded for disfluencies. Model comparisons were carried out using hierarchical structural equation modelling. Results showed that repetitions were significantly related to verbal intelligence. Unfilled pauses and repairs, in contrast, were marginally (p < .09) related to memory abilities. The relationship was negative in all cases. The conclusions explore the link between different types of disfluency and particular problems arising in the course of production, and how individual differences inform theoretical debates in language production.
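
    For readers unfamiliar with the method, a latent-variable model of this shape can be written in a few lines with the semopy package. The sketch below uses invented indicator names and random placeholder data purely to show the model structure; it does not reproduce the study's actual specification.

    import numpy as np
    import pandas as pd
    from semopy import Model

    # Placeholder data: random noise standing in for 78 participants'
    # task scores and disfluency counts; all variable names are invented.
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(78, 8)), columns=[
        "span_task", "nback_task", "recall_task",      # memory indicators
        "vocab_test", "analogies_test",                # verbal-IQ indicators
        "repetitions", "unfilled_pauses", "repairs",   # disfluency counts
    ])

    # Two latent abilities, each predicting the disfluency types the
    # abstract links to it (negative paths would mirror the findings).
    desc = """
    memory =~ span_task + nback_task + recall_task
    verbal_iq =~ vocab_test + analogies_test
    repetitions ~ verbal_iq
    unfilled_pauses ~ memory
    repairs ~ memory
    """

    model = Model(desc)
    model.fit(df)
    print(model.inspect())   # factor loadings and structural path estimates
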
    • …