Stuttering Detection Using Speaker Representations and Self-supervised Contextual Embeddings
The adoption of advanced deep learning architectures in stuttering detection
(SD) tasks is challenging due to the limited size of the available datasets. To
this end, this work introduces the application of speech embeddings extracted
from pre-trained deep learning models trained on large audio datasets for
different tasks. In particular, we explore audio representations obtained using
emphasized channel attention, propagation, and aggregation time delay neural
network (ECAPA-TDNN) and Wav2Vec2.0 models trained on VoxCeleb and LibriSpeech
datasets respectively. After extracting the embeddings, we benchmark with
several traditional classifiers, such as the K-nearest neighbour (KNN),
Gaussian naive Bayes, and neural network, for the SD tasks. In comparison to
the standard SD systems trained only on the limited SEP-28k dataset, we obtain
a relative improvement of 12.08%, 28.71%, and 37.9% in terms of unweighted average
recall (UAR) over the baselines. Finally, we show that combining the two
embeddings and concatenating multiple layers of Wav2Vec2.0 can further improve
the UAR by up to 2.60% and 6.32% respectively.
Comment: Accepted in International Journal of Speech Technology, Springer 2023;
substantial overlap with arXiv:2204.0156
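The pipeline described in this abstract — extract a fixed utterance embedding from a pre-trained model, then benchmark traditional classifiers on it — can be sketched as follows. This is an illustrative sketch, not the authors' code: the "embeddings" are synthetic Gaussian stand-ins (192-dimensional, matching ECAPA-TDNN's usual output size), and the K-nearest-neighbour classifier is a minimal NumPy implementation.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=5):
    """Classify each test embedding by majority vote among its
    k nearest training embeddings (Euclidean distance)."""
    preds = []
    for x in test_X:
        dists = np.linalg.norm(train_X - x, axis=1)
        nearest = train_y[np.argsort(dists)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# Synthetic stand-ins for ECAPA-TDNN / Wav2Vec2.0 utterance embeddings:
# two hypothetical classes drawn from shifted Gaussians.
rng = np.random.default_rng(0)
dim = 192
train_X = np.vstack([rng.normal(0.0, 1.0, size=(50, dim)),
                     rng.normal(1.0, 1.0, size=(50, dim))])
train_y = np.array([0] * 50 + [1] * 50)

test_X = np.vstack([rng.normal(0.0, 1.0, size=(10, dim)),
                    rng.normal(1.0, 1.0, size=(10, dim))])
test_y = np.array([0] * 10 + [1] * 10)

acc = np.mean(knn_predict(train_X, train_y, test_X) == test_y)
```

In the paper's actual setup, `train_X` would come from a frozen pre-trained encoder, which is exactly what makes the approach attractive when the labelled dataset is small.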
Robust Stuttering Detection via Multi-task and Adversarial Learning
By automatic detection and identification of stuttering, speech pathologists
can track the progression of disfluencies of persons who stutter (PWS). In this
paper, we investigate the impact of multi-task (MTL) and adversarial learning
(ADV) to learn robust stutter features. This is the first-ever preliminary
study where MTL and ADV have been employed in stuttering identification (SI).
We evaluate our system on the SEP-28k stuttering dataset consisting of 20 hours
(approx) of data from 385 podcasts. Our methods show promising results and
outperform the baseline in various disfluency classes. We achieve up to 10%,
6.78%, and 2% improvement in repetitions, blocks, and interjections
respectively over the baseline.
Comment: Under review at the European Signal Processing Conference (EUSIPCO) 2022
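The abstract does not spell out the adversarial mechanism; a common realisation in such setups is a gradient reversal layer, where an auxiliary head predicts a nuisance attribute (e.g. podcast or speaker identity) and its gradient is sign-flipped before reaching the shared encoder, pushing the encoder toward nuisance-invariant stutter features. The following toy NumPy step is a hedged sketch of that idea, not the authors' implementation; all sizes are hypothetical.

```python
import numpy as np

def grl_backward(grad, lam=1.0):
    """Gradient reversal: identity in the forward pass,
    multiply the incoming gradient by -lam in the backward pass."""
    return -lam * grad

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))              # shared linear encoder (toy sizes)
x = rng.normal(size=8)                   # one input feature vector
h = W @ x                                # forward through the encoder

grad_from_adv_head = rng.normal(size=4)  # dL_adv/dh from the nuisance head
grad_to_encoder = grl_backward(grad_from_adv_head, lam=0.5)

# Because of the sign flip, this gradient step *ascends* the adversary's
# loss w.r.t. the encoder, removing nuisance information from h.
lr = 0.1
W_new = W - lr * np.outer(grad_to_encoder, x)
```

The multi-task branch of the paper would add ordinary (non-reversed) gradients from auxiliary prediction heads on top of this.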
End-to-End and Self-Supervised Learning for ComParE 2022 Stuttering Sub-Challenge
In this paper, we present end-to-end and speech embedding based systems
trained in a self-supervised fashion to participate in the ACM Multimedia 2022
ComParE Challenge, specifically the stuttering sub-challenge. In particular, we
exploit the embeddings from the pre-trained Wav2Vec2.0 model for stuttering
detection (SD) on the KSoF dataset. After embedding extraction, we benchmark
with several methods for SD. Our proposed self-supervised based SD system
achieves a UAR of 36.9% and 41.0% on validation and test sets respectively,
which is 31.32% (validation set) and 1.49% (test set) higher than the best
(DeepSpectrum) challenge baseline (CBL). Moreover, we show that concatenating
layer embeddings with Mel-frequency cepstral coefficients (MFCCs) features
further improves the UAR by 33.81% and 5.45% on validation and test sets
respectively over the CBL. Finally, we demonstrate that summing information
across all the layers of Wav2Vec2.0 surpasses the CBL by a relative margin of
45.91% and 5.69% on validation and test sets respectively. Grand challenge:
Computational Paralinguistics Challenge.
Comment: Accepted in ACM MM 2022 Conference: Grand Challenges. © {Owner/Author | ACM} 2022.
This is the author's version of the work. It is posted here for your personal
use. Not for redistribution.
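All results in these abstracts are reported as unweighted average recall (UAR), i.e. the mean of the per-class recalls, which is the standard metric when classes are as imbalanced as disfluency labels are. A minimal implementation, with toy labels chosen here purely for illustration:

```python
import numpy as np

def uar(y_true, y_pred):
    """Unweighted average recall: mean of per-class recalls.
    Unlike plain accuracy, it is insensitive to class imbalance."""
    recalls = []
    for c in np.unique(y_true):
        mask = y_true == c
        recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))

# Imbalanced toy labels: 8 "fluent" samples, 2 "block" samples.
y_true = np.array([0] * 8 + [1] * 2)
y_pred = np.array([0] * 8 + [1, 0])   # one block sample missed

# Per-class recall: class 0 -> 1.0, class 1 -> 0.5, so UAR = 0.75,
# while plain accuracy is a flattering 0.9.
score = uar(y_true, y_pred)
```

This gap between accuracy and UAR is why a classifier that ignores rare disfluency classes can look deceptively strong under accuracy.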
On the Use of the Silent Pause in Televised Political Debate: The Case of François Hollande
Formant Structures of Vowels Produced by Stutterers in Normal and Fast Speech Rates
The aim of this study is to analyse the steady-state portion of the first two formants (F1 and F2) in the production of [CV] sequences containing the vowels [i, a, u], pronounced at two speech rates (normal and fast) by groups of untreated stutterers, treated stutterers, and control subjects. Locus equations have been calculated to look for potential differences in coarticulatory strategies between the three groups. Data analyses reveal a reduction of the vowel space for stutterers at a normal speaking rate. When the speech rate increases, no reduction of the vowel space is noticeable for this group of speakers, contrary to treated stutterers and controls. No significant differences between the three groups have been observed in coarticulatory strategies.
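A locus equation is a linear regression of F2 measured at the CV onset against F2 measured at the vowel midpoint, pooled across vowel contexts; the slope indexes the degree of consonant-vowel coarticulation (a slope near 1 indicates strong coarticulation, near 0 little vowel influence on the onset). A minimal sketch of the fit, with hypothetical F2 values in Hz standing in for real measurements:

```python
import numpy as np

def locus_equation(f2_onset, f2_mid):
    """Least-squares fit of f2_onset = slope * f2_mid + intercept."""
    slope, intercept = np.polyfit(f2_mid, f2_onset, 1)
    return slope, intercept

# Hypothetical F2 midpoint values (Hz) for [i, a, u] contexts, two tokens each.
f2_mid = np.array([2200.0, 2100.0, 1300.0, 1250.0, 800.0, 850.0])
# Synthetic onsets generated to lie exactly on a line, for illustration only.
f2_onset = 0.6 * f2_mid + 500.0

slope, intercept = locus_equation(f2_onset, f2_mid)
```

Comparing fitted slopes between speaker groups is how the study tests for differences in coarticulatory strategy.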
Articulatory Modeling Based on Semi-polar Coordinates and Guided PCA Technique
Research on two-dimensional static articulatory modeling has been performed using the semi-polar system and a guided PCA analysis of lateral X-ray images of the vocal tract. The density of the grid lines in the semi-polar system has been increased to achieve better descriptive precision. New parameters have been introduced to describe the movements of the tongue apex. An extra feature, the tongue root, has been extracted as one of the elementary factors in order to improve the precision of the tongue model. New methods still remain to be developed for describing the movements of the tongue apex.
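In guided PCA, the contribution of a known, measured control parameter (such as jaw opening) is regressed out of the contour data first, and ordinary PCA is then applied to the residual to extract the remaining elementary factors. The following is a simplified one-step sketch of that idea on synthetic data — the contour dimensions, the single jaw factor, and the noise level are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n_frames, n_points = 200, 30            # vocal-tract contour samples (toy sizes)
jaw = rng.normal(size=n_frames)         # measured jaw parameter per frame
loading = rng.normal(size=n_points)     # how jaw deforms each contour point
contours = np.outer(jaw, loading) + 0.1 * rng.normal(size=(n_frames, n_points))

# Step 1: remove the jaw-explained part by per-point least squares.
coef = (jaw @ contours) / (jaw @ jaw)
residual = contours - np.outer(jaw, coef)

# Step 2: ordinary PCA on the residual (via SVD of the centered data)
# yields the remaining elementary factors, e.g. tongue body and apex.
residual -= residual.mean(axis=0)
_, s, vt = np.linalg.svd(residual, full_matrices=False)
explained = s**2 / np.sum(s**2)         # variance ratio per remaining factor
```

Guiding the decomposition this way keeps each extracted factor articulatorily interpretable, instead of the mixed components unconstrained PCA tends to produce.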