285 research outputs found

    Multi-Task Self-Supervised Learning for Disfluency Detection

    Most existing approaches to disfluency detection rely heavily on human-annotated data, which is expensive to obtain in practice. To tackle the training-data bottleneck, we investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks whose data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words in unlabeled news data, and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words, and (ii) a sentence-classification task to distinguish original sentences from grammatically incorrect ones. We then combine these two tasks to jointly train a network. The pre-trained network is subsequently fine-tuned on human-annotated disfluency detection data. Experimental results on the commonly used English Switchboard test set show that our approach achieves performance competitive with previous systems (trained on the full dataset) while using less than 1% (1,000 sentences) of the training data. Trained on the full dataset, our method significantly outperforms previous methods, reducing the error by 21% on English Switchboard.
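
    The pseudo-data construction lends itself to a short sketch. The Python below is a hedged approximation of the recipe (the noising probabilities and the copy-a-random-word insertion are illustrative assumptions, not the paper's exact settings); it builds one tagging-task example and the corresponding sentence-classification label.

    import random

    def make_pseudo_example(words, p_insert=0.1, p_delete=0.1, seed=None):
        """Corrupt a clean sentence for the self-supervised tasks.

        Returns (tokens, tags): tag 1 marks an inserted noisy word the
        tagging task must detect, tag 0 an original word. Deletions
        silently drop words, which also yields the grammatically
        incorrect sentences used by the classification task.
        """
        rng = random.Random(seed)
        tokens, tags = [], []
        for w in words:
            if rng.random() < p_delete:          # deletion: drop this word
                continue
            if rng.random() < p_insert:          # insertion: add a random word
                tokens.append(rng.choice(words))
                tags.append(1)                   # 1 = added noise
            tokens.append(w)
            tags.append(0)                       # 0 = original word
        return tokens, tags

    clean = "the cat sat on the mat".split()
    tokens, tags = make_pseudo_example(clean, seed=3)
    is_corrupted = int(tokens != clean)          # sentence-classification label
    print(list(zip(tokens, tags)), is_corrupted)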

    Alzheimer’s Dementia Recognition Through Spontaneous Speech


    A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem

    Most stuttering detection and classification research has viewed stuttering as a multi-class classification problem or as a binary detection task for each dysfluency type; however, this does not match the nature of stuttering, in which one dysfluency seldom comes alone but rather co-occurs with others. This paper explores multi-language and cross-corpus end-to-end stuttering detection as a multi-label problem, using a modified wav2vec 2.0 system with an attention-based classification head and multi-task learning. We evaluate the method on combinations of three datasets containing English and German stuttered speech, one of which contains speech modified by fluency shaping. The experimental results and an error analysis show that multi-label stuttering detection systems trained on cross-corpus and multi-language data achieve competitive results, but performance on samples with multiple labels stays below overall detection results.
    Comment: Accepted for presentation at Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.1598
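
    The architecture sketched in the abstract (a wav2vec 2.0 encoder, attention pooling, and one sigmoid output per dysfluency type) can be approximated as follows. This is a minimal PyTorch sketch under assumed details, not the authors' implementation: the checkpoint name, five-label inventory, and pooling scheme are illustrative assumptions.

    import torch
    import torch.nn as nn
    from transformers import Wav2Vec2Model

    class MultiLabelStutterNet(nn.Module):
        """wav2vec 2.0 encoder + attention pooling + multi-label head.

        A sketch of the kind of model the abstract describes; the
        checkpoint, label set, and pooling details are assumptions.
        """
        def __init__(self, n_labels=5, ckpt="facebook/wav2vec2-base"):
            super().__init__()
            self.encoder = Wav2Vec2Model.from_pretrained(ckpt)
            hidden = self.encoder.config.hidden_size
            self.attn = nn.Linear(hidden, 1)         # frame-level attention scores
            self.head = nn.Linear(hidden, n_labels)  # one logit per dysfluency type

        def forward(self, waveform):                 # waveform: (batch, samples)
            frames = self.encoder(waveform).last_hidden_state  # (B, T, H)
            weights = torch.softmax(self.attn(frames), dim=1)  # (B, T, 1)
            pooled = (weights * frames).sum(dim=1)             # attention pooling
            return self.head(pooled)                           # raw logits

    model = MultiLabelStutterNet()
    logits = model(torch.randn(2, 16000))            # two 1-second clips at 16 kHz
    # Multi-label training pairs these logits with BCEWithLogitsLoss,
    # so several dysfluency types can be active in the same sample.
    loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(2, 5))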

    Individual differences in the production of disfluency: a latent variable analysis of memory ability and verbal intelligence

    Recent work has begun to focus on the role that individual differences in executive function and intelligence play in the production of fluent speech. However, isolating the underlying causes of different types of disfluency has been difficult given the speed and complexity of language production. In this study, we focused on the role of memory abilities and verbal intelligence, and we chose a task that relied heavily on memory for successful performance. Given the task demands, we hypothesised that a substantial proportion of disfluencies would be due to memory-retrieval problems. We contrasted memory abilities with individual differences in verbal intelligence, as previous work has highlighted verbal intelligence as an important factor in disfluency production. A total of 78 participants memorised and repeated 40 syntactically complex sentences, which were recorded and coded for disfluencies. Model comparisons were carried out using hierarchical structural equation modelling. Results showed that repetitions were significantly related to verbal intelligence. Unfilled pauses and repairs, in contrast, were marginally (p < .09) related to memory abilities. The relationship was negative in all cases. The conclusions explore the link between different types of disfluency and particular problems arising in the course of production, and how individual differences inform theoretical debates in language production.
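
    For readers unfamiliar with the method, a latent-variable model of this shape can be written in a few lines with the semopy package. The sketch below uses invented indicator names and random placeholder data purely to show the model structure; it does not reproduce the study's actual specification.

    import numpy as np
    import pandas as pd
    from semopy import Model

    # Placeholder data: random noise standing in for 78 participants'
    # task scores and disfluency counts; all variable names are invented.
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(78, 8)), columns=[
        "span_task", "nback_task", "recall_task",      # memory indicators
        "vocab_test", "analogies_test",                # verbal-IQ indicators
        "repetitions", "unfilled_pauses", "repairs",   # disfluency counts
    ])

    # Two latent abilities, each predicting the disfluency types the
    # abstract links to it (negative paths would mirror the findings).
    desc = """
    memory =~ span_task + nback_task + recall_task
    verbal_iq =~ vocab_test + analogies_test
    repetitions ~ verbal_iq
    unfilled_pauses ~ memory
    repairs ~ memory
    """

    model = Model(desc)
    model.fit(df)
    print(model.inspect())   # factor loadings and structural path estimates
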
    • …