321 research outputs found

    Stuttering Detection Using Speaker Representations and Self-supervised Contextual Embeddings

    Full text link
    The adoption of advanced deep learning architectures for stuttering detection (SD) tasks is challenging due to the limited size of the available datasets. To this end, this work introduces the application of speech embeddings extracted from deep learning models pre-trained on large audio datasets for different tasks. In particular, we explore audio representations obtained using the emphasized channel attention, propagation, and aggregation time delay neural network (ECAPA-TDNN) and Wav2Vec2.0 models, trained on the VoxCeleb and LibriSpeech datasets respectively. After extracting the embeddings, we benchmark several traditional classifiers, such as K-nearest neighbours (KNN), Gaussian naive Bayes, and a neural network, on the SD tasks. In comparison to standard SD systems trained only on the limited SEP-28k dataset, we obtain relative improvements of 12.08%, 28.71%, and 37.9% in terms of unweighted average recall (UAR) over the baselines. Finally, we show that combining the two embeddings and concatenating multiple layers of Wav2Vec2.0 can further improve the UAR by up to 2.60% and 6.32% respectively.
    Comment: Accepted in the International Journal of Speech Technology, Springer, 2023. Substantial overlap with arXiv:2204.0156
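
    As a concrete illustration of the pipeline described above, here is a minimal sketch: an utterance-level embedding is mean-pooled from a pre-trained Wav2Vec2.0 model and fed to a traditional scikit-learn classifier. The checkpoint name, pooling strategy, and classifier settings are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of the embedding-plus-classifier pipeline; checkpoint and
# pooling are assumptions, not the paper's exact setup.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.neighbors import KNeighborsClassifier

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h").eval()

def embed(waveform_16k: np.ndarray) -> np.ndarray:
    """Mean-pool the final hidden states into one utterance-level vector."""
    inputs = extractor(waveform_16k, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, frames, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# An ECAPA-TDNN speaker embedding (e.g., SpeechBrain's
# "speechbrain/spkrec-ecapa-voxceleb") could be concatenated analogously.
# With SEP-28k clips as (waveforms, labels), classification reduces to:
#   X = np.stack([embed(w) for w in waveforms])
#   clf = KNeighborsClassifier(n_neighbors=5).fit(X, labels)
```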

    Robust Stuttering Detection via Multi-task and Adversarial Learning

    Get PDF
    By automatically detecting and identifying stuttering, speech pathologists can track the progression of disfluencies of persons who stutter (PWS). In this paper, we investigate the impact of multi-task learning (MTL) and adversarial learning (ADV) for learning robust stutter features. This is the first preliminary study in which MTL and ADV have been employed for stuttering identification (SI). We evaluate our system on the SEP-28k stuttering dataset, which consists of approximately 20 hours of data from 385 podcasts. Our methods show promising results and outperform the baseline on various disfluency classes, with improvements of up to 10%, 6.78%, and 2% on repetitions, blocks, and interjections respectively.
    Comment: Accepted in the European Signal Processing Conference (EUSIPCO) 2022
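
    The abstract does not spell out the architecture, but a common way to combine multi-task and adversarial learning is a shared encoder with per-task heads plus a gradient-reversal domain adversary. The sketch below assumes that setup; the layer sizes, the auxiliary fluency task, and the podcast-ID adversary are illustrative, not the paper's exact design.

```python
# Hedged sketch: shared encoder, main disfluency head, auxiliary MTL head,
# and a podcast-ID adversary trained through a gradient-reversal layer.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class MTLAdvStutterNet(nn.Module):
    def __init__(self, feat_dim=768, n_disfluency=5, n_podcasts=385):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.disfluency_head = nn.Linear(256, n_disfluency)  # main SI task
        self.fluency_head = nn.Linear(256, 2)                # auxiliary MTL task
        self.adversary = nn.Linear(256, n_podcasts)          # domain adversary

    def forward(self, x, lambd=1.0):
        z = self.encoder(x)
        return (self.disfluency_head(z),
                self.fluency_head(z),
                self.adversary(GradReverse.apply(z, lambd)))
```

    Training would minimize the two task losses while the reversed gradients push the encoder toward features that do not reveal which podcast a clip came from, which is what makes the learned stutter features robust.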

    End-to-End and Self-Supervised Learning for ComParE 2022 Stuttering Sub-Challenge

    Full text link
    In this paper, we present end-to-end and speech-embedding-based systems trained in a self-supervised fashion for the ACM Multimedia 2022 ComParE Challenge, specifically the stuttering sub-challenge. In particular, we exploit embeddings from the pre-trained Wav2Vec2.0 model for stuttering detection (SD) on the KSoF dataset. After embedding extraction, we benchmark several methods for SD. Our proposed self-supervised SD system achieves a UAR of 36.9% and 41.0% on the validation and test sets respectively, which is 31.32% (validation set) and 1.49% (test set) higher than the best (DeepSpectrum) challenge baseline (CBL). Moreover, we show that concatenating layer embeddings with Mel-frequency cepstral coefficient (MFCC) features further improves the UAR by 33.81% and 5.45% on the validation and test sets respectively over the CBL. Finally, we demonstrate that summing information across all the layers of Wav2Vec2.0 surpasses the CBL by a relative margin of 45.91% and 5.69% on the validation and test sets respectively. Grand challenge: Computational Paralinguistics ChallengE (ComParE).
    Comment: Accepted in the ACM MM 2022 Conference: Grand Challenges. © Owner/Author | ACM 2022. This is the author's version of the work. It is posted here for your personal use. Not for redistribution.
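
    Below is a hedged sketch of the two fusion variants mentioned above (a layer embedding concatenated with MFCCs, and summing across all layers) together with the UAR metric; the layer choice, MFCC settings, and dimensions are assumptions for illustration.

```python
# Hedged sketch of the feature-fusion variants and the UAR metric;
# layer choice and MFCC settings are illustrative assumptions.
import numpy as np
import librosa
from sklearn.metrics import recall_score

def mfcc_vector(waveform: np.ndarray, sr: int = 16000, n_mfcc: int = 40) -> np.ndarray:
    """Utterance-level MFCC vector: mean over analysis frames."""
    return librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def concat_fusion(hidden_states, waveform):
    """Variant 1: mean-pool one Wav2Vec2.0 layer, concatenate with MFCCs.
    `hidden_states` is the tuple of per-layer torch tensors (1, frames, dim)."""
    layer = hidden_states[-1].mean(dim=1).squeeze(0).numpy()
    return np.concatenate([layer, mfcc_vector(waveform)])

def sum_fusion(hidden_states):
    """Variant 2: sum information across all layers before pooling."""
    summed = sum(h for h in hidden_states)  # (1, frames, dim)
    return summed.mean(dim=1).squeeze(0).numpy()

# UAR (unweighted average recall) is simply macro-averaged recall:
# uar = recall_score(y_true, y_pred, average="macro")
```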

    Formant Structures of Vowels Produced by Stutterers in Normal and Fast Speech Rates

    Get PDF
    The aim of this study is to analyse the steady-state portion of the first two formants (F1 and F2) in the production of [CV] sequences containing the vowels [i, a, u], pronounced at two speech rates (normal and fast) by groups of untreated stutterers, treated stutterers, and control subjects. Locus equations were calculated to look for potential differences in coarticulatory strategies between the three groups. Data analyses reveal a reduction of the vowel space for untreated stutterers at a normal speaking rate. When speech rate increases, no reduction of the vowel space is noticeable for this group, contrary to treated stutterers and controls. No significant differences in coarticulatory strategies were observed between the three groups.
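
    For readers unfamiliar with the method: a locus equation is a linear regression of F2 measured at the CV onset on F2 measured at the vowel midpoint, with the slope commonly read as an index of the degree of coarticulation. A minimal sketch, with made-up formant values:

```python
# Fitting a locus equation: F2 at CV onset regressed on F2 at vowel midpoint.
# The sample values below are invented purely for illustration.
import numpy as np

f2_midpoint = np.array([2300.0, 1200.0, 800.0])  # vowels [i], [a], [u] (Hz)
f2_onset = np.array([2100.0, 1400.0, 1100.0])    # F2 at the CV boundary (Hz)

slope, intercept = np.polyfit(f2_midpoint, f2_onset, deg=1)
# A slope near 1 suggests strong CV coarticulation; near 0, weak.
print(f"F2_onset = {slope:.2f} * F2_midpoint + {intercept:.1f} Hz")
```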

    Articulatory Modeling Based on Semi-polar Coordinates and Guided PCA Technique

    Get PDF
    Research on two-dimensional static articulatory modeling has been performed using the semi-polar system and guided PCA analysis of lateral X-ray images of the vocal tract. The density of the grid lines in the semi-polar system has been increased to achieve better descriptive precision. New parameters have been introduced to describe the movements of the tongue apex, and an extra feature, the tongue root, has been extracted as one of the elementary factors in order to improve the precision of the tongue model. New methods still remain to be developed for describing the movements of the tongue apex.
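
    The abstract does not define "guided PCA"; one standard reading is that the contribution of a guiding articulatory parameter (such as jaw position) is regressed out of the vocal-tract grid coordinates before ordinary PCA extracts the remaining factors. The sketch below assumes that interpretation, with illustrative data shapes.

```python
# Hedged sketch of a guided PCA step: regress out a guiding parameter
# (here, jaw position), then run PCA on the residuals. Shapes are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
tongue = rng.normal(size=(200, 60))  # 200 frames x 60 semi-polar grid coordinates
jaw = rng.normal(size=(200, 1))      # guiding articulatory parameter

# Step 1: attribute whatever the jaw explains linearly to the jaw factor.
jaw_model = LinearRegression().fit(jaw, tongue)
residual = tongue - jaw_model.predict(jaw)

# Step 2: remaining elementary factors (tongue body, dorsum, apex, root)
# come from PCA on the jaw-free residuals.
factors = PCA(n_components=4).fit_transform(residual)
```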
