285 research outputs found
Multi-Task Self-Supervised Learning for Disfluency Detection
Most existing approaches to disfluency detection rely heavily on
human-annotated data, which is expensive to obtain in practice. To tackle the
training-data bottleneck, we investigate methods for combining multiple
self-supervised tasks, i.e., supervised tasks whose data can be collected
without manual labeling. First, we construct large-scale pseudo training data
by randomly adding or deleting words from unlabeled news data, and propose two
self-supervised pre-training tasks: (i) a tagging task to detect the added
noisy words, and (ii) a sentence-classification task to distinguish original
sentences from grammatically incorrect ones. We then combine these two tasks to
jointly train a network. The pre-trained network is subsequently fine-tuned on
human-annotated disfluency detection data. Experimental results on the
commonly used English Switchboard test set show that our approach achieves
performance competitive with previous systems (trained on the full dataset)
while using less than 1% (1,000 sentences) of the training data. Trained on the
full dataset, our method significantly outperforms previous methods, reducing
the error by 21% on English Switchboard.
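The pseudo-data construction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' released code: the tag names, corruption probabilities, and the choice to sample insertions from the sentence itself are all assumptions made for the example.

```python
import random

def corrupt_sentence(tokens, p_add=0.1, p_del=0.1, vocab=None):
    """Build a disfluent-looking sentence from a clean one by randomly
    inserting and deleting words, emitting per-token tags that mark the
    inserted noisy words. Tag scheme (assumed): 'O' = original, 'ADD' =
    inserted. The (tokens, tags) pair feeds the tagging task; the
    (corrupted, original) pair feeds the sentence-classification task.
    """
    # Assumption: insertions are drawn from the sentence itself; a real
    # pipeline would likely sample from a large news-corpus vocabulary.
    vocab = vocab or tokens
    out_tokens, tags = [], []
    for tok in tokens:
        # Randomly insert a noisy word before the current token.
        if random.random() < p_add:
            out_tokens.append(random.choice(vocab))
            tags.append("ADD")
        # Randomly delete the current token (deleted words leave no tag;
        # they only make the sentence grammatically incorrect).
        if random.random() < p_del:
            continue
        out_tokens.append(tok)
        tags.append("O")
    return out_tokens, tags

random.seed(0)
sent = "the market closed higher on friday".split()
noisy, labels = corrupt_sentence(sent)
# (noisy, labels) is one pseudo-labeled example for the tagging task;
# the pair (noisy vs. sent) is one example for sentence classification.
```

With both corruption probabilities set to zero the function is the identity, which makes the labeling behavior easy to check.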
A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem
Most stuttering detection and classification research has viewed stuttering
as a multi-class classification problem or a binary detection task for each
dysfluency type; however, this does not match the nature of stuttering, in
which one dysfluency seldom comes alone but rather co-occurs with others. This
paper explores multi-language and cross-corpus end-to-end stuttering detection
as a multi-label problem using a modified wav2vec 2.0 system with an
attention-based classification head and multi-task learning. We evaluate the
method using combinations of three datasets containing English and German
stuttered speech, one of which contains speech modified by fluency shaping. The
experimental results and an error analysis show that multi-label stuttering
detection systems trained on cross-corpus and multi-language data achieve
competitive results, but performance on samples with multiple labels stays
below overall detection results.
Comment: Accepted for presentation at Interspeech 2023. arXiv admin note:
substantial text overlap with arXiv:2210.1598
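The multi-label formulation this abstract argues for can be sketched with an independent sigmoid decision per dysfluency type, so one sample may carry several labels at once. A minimal sketch: the label inventory, threshold, and logit values are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical dysfluency label set for illustration; the paper's actual
# inventory may differ.
LABELS = ["block", "prolongation", "sound_repetition",
          "word_repetition", "interjection"]

def multilabel_decisions(logits, threshold=0.5):
    """Multi-label detection: apply an independent sigmoid to each
    per-label logit and keep every label whose probability reaches the
    threshold. Unlike a softmax multi-class head, labels do not compete,
    so co-occurring dysfluencies can all be predicted.
    """
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [lab for lab, p in zip(LABELS, probs) if p >= threshold]

# Example: two dysfluency types active in the same sample.
print(multilabel_decisions([2.1, 1.3, -0.7, -1.9, -0.2]))
# prints ['block', 'prolongation']
```

The design point is that thresholding each sigmoid independently is what lets "a stutter seldom come alone": a softmax over the same logits would force exactly one winner per sample.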
Individual differences in the production of disfluency: a latent variable analysis of memory ability and verbal intelligence
Recent work has begun to focus on the role that individual differences in executive function and intelligence play in the production of fluent speech. However, isolating the underlying causes of different types of disfluency has been difficult given the speed and complexity of language production. In this study, we focused on the role of memory abilities and verbal intelligence, and we chose a task that relied heavily on memory for successful performance. Given the task demands, we hypothesised that a substantial proportion of disfluencies would be due to memory retrieval problems. We contrasted memory abilities with individual differences in verbal intelligence, as previous work highlighted verbal intelligence as an important factor in disfluency production. A total of 78 participants memorised and repeated 40 syntactically complex sentences, which were recorded and coded for disfluencies. Model comparisons were carried out using hierarchical structural equation modelling. Results showed that repetitions were significantly related to verbal intelligence. Unfilled pauses and repairs, in contrast, were marginally (p < .09) related to memory abilities. The relationship in all cases was negative. Conclusions explore the link between different types of disfluency and particular problems arising in the course of production, and how individual differences inform theoretical debates in language production.