321 research outputs found

    Stuttering Detection Using Speaker Representations and Self-supervised Contextual Embeddings

    Full text link
    The adoption of advanced deep learning architectures for stuttering detection (SD) tasks is challenging due to the limited size of the available datasets. To this end, this work introduces the application of speech embeddings extracted from deep learning models pre-trained on large audio datasets for different tasks. In particular, we explore audio representations obtained using the emphasized channel attention, propagation, and aggregation time delay neural network (ECAPA-TDNN) and Wav2Vec2.0 models, trained on the VoxCeleb and LibriSpeech datasets respectively. After extracting the embeddings, we benchmark several traditional classifiers, such as K-nearest neighbours (KNN), Gaussian naive Bayes, and a neural network, on the SD tasks. In comparison to standard SD systems trained only on the limited SEP-28k dataset, we obtain relative improvements of 12.08%, 28.71%, and 37.9% in terms of unweighted average recall (UAR) over the baselines. Finally, we show that combining the two embeddings and concatenating multiple layers of Wav2Vec2.0 can further improve the UAR by up to 2.60% and 6.32% respectively.
    Comment: Accepted in the International Journal of Speech Technology, Springer, 2023. Substantial overlap with arXiv:2204.0156
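
    As a concrete illustration of the pipeline described above, here is a minimal sketch: an utterance-level embedding is mean-pooled from a pre-trained Wav2Vec2.0 model and fed to a traditional scikit-learn classifier. The checkpoint name, pooling strategy, and classifier settings are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of the embedding-plus-classifier pipeline; checkpoint and
# pooling are assumptions, not the paper's exact setup.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.neighbors import KNeighborsClassifier

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h").eval()

def embed(waveform_16k: np.ndarray) -> np.ndarray:
    """Mean-pool the final hidden states into one utterance-level vector."""
    inputs = extractor(waveform_16k, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, frames, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# An ECAPA-TDNN speaker embedding (e.g., SpeechBrain's
# "speechbrain/spkrec-ecapa-voxceleb") could be concatenated analogously.
# With SEP-28k clips as (waveforms, labels), classification reduces to:
#   X = np.stack([embed(w) for w in waveforms])
#   clf = KNeighborsClassifier(n_neighbors=5).fit(X, labels)
```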

    Robust Stuttering Detection via Multi-task and Adversarial Learning

    Get PDF
    By automatically detecting and identifying stuttering, speech pathologists can track the progression of disfluencies of persons who stutter (PWS). In this paper, we investigate the impact of multi-task learning (MTL) and adversarial learning (ADV) for learning robust stutter features. This is the first preliminary study in which MTL and ADV have been employed for stuttering identification (SI). We evaluate our system on the SEP-28k stuttering dataset, which consists of approximately 20 hours of data from 385 podcasts. Our methods show promising results and outperform the baseline on various disfluency classes, with improvements of up to 10%, 6.78%, and 2% on repetitions, blocks, and interjections respectively.
    Comment: Accepted in the European Signal Processing Conference (EUSIPCO) 2022
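
    The abstract does not spell out the architecture, but a common way to combine multi-task and adversarial learning is a shared encoder with per-task heads plus a gradient-reversal domain adversary. The sketch below assumes that setup; the layer sizes, the auxiliary fluency task, and the podcast-ID adversary are illustrative, not the paper's exact design.

```python
# Hedged sketch: shared encoder, main disfluency head, auxiliary MTL head,
# and a podcast-ID adversary trained through a gradient-reversal layer.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class MTLAdvStutterNet(nn.Module):
    def __init__(self, feat_dim=768, n_disfluency=5, n_podcasts=385):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.disfluency_head = nn.Linear(256, n_disfluency)  # main SI task
        self.fluency_head = nn.Linear(256, 2)                # auxiliary MTL task
        self.adversary = nn.Linear(256, n_podcasts)          # domain adversary

    def forward(self, x, lambd=1.0):
        z = self.encoder(x)
        return (self.disfluency_head(z),
                self.fluency_head(z),
                self.adversary(GradReverse.apply(z, lambd)))
```

    Training would minimize the two task losses while the reversed gradients push the encoder toward features that do not reveal which podcast a clip came from, which is what makes the learned stutter features robust.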

    End-to-End and Self-Supervised Learning for ComParE 2022 Stuttering Sub-Challenge

    Full text link
    In this paper, we present end-to-end and speech-embedding-based systems trained in a self-supervised fashion for the ACM Multimedia 2022 ComParE Challenge, specifically the stuttering sub-challenge. In particular, we exploit embeddings from the pre-trained Wav2Vec2.0 model for stuttering detection (SD) on the KSoF dataset. After embedding extraction, we benchmark several methods for SD. Our proposed self-supervised SD system achieves a UAR of 36.9% and 41.0% on the validation and test sets respectively, which is 31.32% (validation set) and 1.49% (test set) higher than the best (DeepSpectrum) challenge baseline (CBL). Moreover, we show that concatenating layer embeddings with Mel-frequency cepstral coefficient (MFCC) features further improves the UAR by 33.81% and 5.45% on the validation and test sets respectively over the CBL. Finally, we demonstrate that summing information across all the layers of Wav2Vec2.0 surpasses the CBL by a relative margin of 45.91% and 5.69% on the validation and test sets respectively. Grand challenge: Computational Paralinguistics ChallengE (ComParE).
    Comment: Accepted in the ACM MM 2022 Conference: Grand Challenges. © Owner/Author | ACM 2022. This is the author's version of the work. It is posted here for your personal use. Not for redistribution.
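
    Below is a hedged sketch of the two fusion variants mentioned above (a layer embedding concatenated with MFCCs, and summing across all layers) together with the UAR metric; the layer choice, MFCC settings, and dimensions are assumptions for illustration.

```python
# Hedged sketch of the feature-fusion variants and the UAR metric;
# layer choice and MFCC settings are illustrative assumptions.
import numpy as np
import librosa
from sklearn.metrics import recall_score

def mfcc_vector(waveform: np.ndarray, sr: int = 16000, n_mfcc: int = 40) -> np.ndarray:
    """Utterance-level MFCC vector: mean over analysis frames."""
    return librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def concat_fusion(hidden_states, waveform):
    """Variant 1: mean-pool one Wav2Vec2.0 layer, concatenate with MFCCs.
    `hidden_states` is the tuple of per-layer torch tensors (1, frames, dim)."""
    layer = hidden_states[-1].mean(dim=1).squeeze(0).numpy()
    return np.concatenate([layer, mfcc_vector(waveform)])

def sum_fusion(hidden_states):
    """Variant 2: sum information across all layers before pooling."""
    summed = sum(h for h in hidden_states)  # (1, frames, dim)
    return summed.mean(dim=1).squeeze(0).numpy()

# UAR (unweighted average recall) is simply macro-averaged recall:
# uar = recall_score(y_true, y_pred, average="macro")
```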

    Formant Structures of Vowels Produced by Stutterers in Normal and Fast Speech Rates

    Get PDF
    The aim of this study is to analyse the steady-state portion of the first two formants (F1 and F2) in the production of [CV] sequences containing the vowels [i, a, u], pronounced at two speech rates (normal and fast) by groups of untreated stutterers, treated stutterers, and control subjects. Locus equations were calculated to look for potential differences in coarticulatory strategies between the three groups. Data analyses reveal a reduction of the vowel space for untreated stutterers at a normal speaking rate. When speech rate increases, no reduction of the vowel space is noticeable for this group, contrary to treated stutterers and controls. No significant differences in coarticulatory strategies were observed between the three groups.
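
    For readers unfamiliar with the method: a locus equation is a linear regression of F2 measured at the CV onset on F2 measured at the vowel midpoint, with the slope commonly read as an index of the degree of coarticulation. A minimal sketch, with made-up formant values:

```python
# Fitting a locus equation: F2 at CV onset regressed on F2 at vowel midpoint.
# The sample values below are invented purely for illustration.
import numpy as np

f2_midpoint = np.array([2300.0, 1200.0, 800.0])  # vowels [i], [a], [u] (Hz)
f2_onset = np.array([2100.0, 1400.0, 1100.0])    # F2 at the CV boundary (Hz)

slope, intercept = np.polyfit(f2_midpoint, f2_onset, deg=1)
# A slope near 1 suggests strong CV coarticulation; near 0, weak.
print(f"F2_onset = {slope:.2f} * F2_midpoint + {intercept:.1f} Hz")
```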

    Articulatory Modeling Based on Semi-polar Coordinates and Guided PCA Technique

    Get PDF
    Research on two-dimensional static articulatory modeling has been performed using the semi-polar system and guided PCA analysis of lateral X-ray images of the vocal tract. The density of the grid lines in the semi-polar system has been increased to achieve better descriptive precision. New parameters have been introduced to describe the movements of the tongue apex, and an extra feature, the tongue root, has been extracted as one of the elementary factors in order to improve the precision of the tongue model. New methods still remain to be developed for describing the movements of the tongue apex.
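
    The abstract does not define "guided PCA"; one standard reading is that the contribution of a guiding articulatory parameter (such as jaw position) is regressed out of the vocal-tract grid coordinates before ordinary PCA extracts the remaining factors. The sketch below assumes that interpretation, with illustrative data shapes.

```python
# Hedged sketch of a guided PCA step: regress out a guiding parameter
# (here, jaw position), then run PCA on the residuals. Shapes are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
tongue = rng.normal(size=(200, 60))  # 200 frames x 60 semi-polar grid coordinates
jaw = rng.normal(size=(200, 1))      # guiding articulatory parameter

# Step 1: attribute whatever the jaw explains linearly to the jaw factor.
jaw_model = LinearRegression().fit(jaw, tongue)
residual = tongue - jaw_model.predict(jaw)

# Step 2: remaining elementary factors (tongue body, dorsum, apex, root)
# come from PCA on the jaw-free residuals.
factors = PCA(n_components=4).fit_transform(residual)
```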
