Leveraging native language information for improved accented speech recognition

Author: Ghorbani Shahram
Hansen John H. L.
Publication venue: 'International Speech Communication Association'
Publication date: 18/04/2019

Recognition of accented speech is a long-standing challenge for automatic speech recognition (ASR) systems, given the increasing worldwide population of bilingual speakers with English as their second language. If we consider foreign-accented speech as an interpolation of the native language (L1) and English (L2), using a model that can simultaneously address both languages would perform better at the acoustic level for accented speech. In this study, we explore how an end-to-end recurrent neural network (RNN) trained system with English and native languages (Spanish and Indian languages) could leverage data of native languages to improve performance for accented English speech. To this end, we examine pre-training with native languages, as well as multi-task learning (MTL) in which the main task is trained with native English and the secondary task is trained with Spanish or Indian languages. We show that the proposed MTL model performs better than the pre-training approach and outperforms a baseline model trained simply with English data. We suggest a new setting for MTL in which the secondary task is trained with both English and the native language, using the same output set. This proposed scenario yields better performance with +11.95% and +17.55% character error rate gains over baseline for Hispanic and Indian accents, respectively. Comment: Accepted at Interspeech 201

arXiv.org e-Print Archive

Crossref

The new accent technologies: recognition, measurement and manipulation of accented speech

Author: Huckvale M
Publication venue: Beijing: Language and Culture Press
Publication date: 01/01/2006

UCL Discovery

Robust language recognition via adaptive language factor extraction

Author: Demuynck Kris
Desplanques Brecht
Martens Jean-Pierre
Publication date: 01/01/2014

This paper presents a technique to adapt an acoustically based language classifier to the background conditions and speaker accents. This adaptation improves language classification on a broad spectrum of TV broadcasts. The core of the system consists of an iVector-based setup in which language and channel variabilities are modeled separately. The subsequent language classifier (the backend) operates on the language factors, i.e. those features in the extracted iVectors that explain the observed language variability. The proposed technique adapts the language variability model to the background conditions and to the speaker accents present in the audio. The effect of the adaptation is evaluated on a 28-hour corpus composed of documentaries and monolingual as well as multilingual broadcast news shows. Consistent improvements in the automatic identification of Flemish (Belgian Dutch), English and French are demonstrated for all broadcast types.

Ghent University Academic Bibliography

Pronunciation variation modelling using accent features

Author: Huckvale M
Tjalve M
Publication date: 01/01/2005

UCL Discovery

Acoustic model selection using limited data for accent robust speech recognition

Author: Hanani Abualsoud
Najafian Maryam
Russell Martin
Safavi Saeid
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

University of Birmingham Research Portal

Accented Speech Recognition With Accent-specific Codebooks

Author: Ganapathy Sriram
Jyothi Preethi
Prabhu Darshan
Unni Vinit
Publication date: 26/10/2023

Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems. Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR. In this work, we propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks. These learnable codebooks capture accent-specific information and are integrated within the ASR encoder layers. The model is trained on accented English speech, while the test data also contained accents which were not seen during training. On the Mozilla Common Voice multi-accented dataset, we show that our proposed approach yields significant performance gains not only on the seen English accents (up to 37% relative improvement in word error rate) but also on the unseen accents (up to 5% relative improvement in WER). Further, we illustrate benefits for a zero-shot transfer setup on the L2Artic dataset. We also compare the performance with other approaches based on accent adversarial training. Comment: Accepted to EMNLP 2023 Main Conference (Long Paper)

arXiv.org e-Print Archive

CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice

Author: Ahmed Sara
Subakan Cem
Visockas Danielius
Zuluaga-Gomez Juan
Publication date: 29/05/2023

Despite the recent advancements in Automatic Speech Recognition (ASR), the recognition of accented speech still remains a dominant problem. In order to create more inclusive ASR systems, research has shown that the integration of accent information, as part of a larger ASR framework, can lead to the mitigation of accented speech errors. We address multilingual accent classification through the ECAPA-TDNN and Wav2Vec 2.0/XLSR architectures which have been proven to perform well on a variety of speech-related downstream tasks. We introduce a simple-to-follow recipe aligned to the SpeechBrain toolkit for accent classification based on Common Voice 7.0 (English) and Common Voice 11.0 (Italian, German, and Spanish). Furthermore, we establish new state-of-the-art for English accent classification with as high as 95% accuracy. We also study the internal categorization of the Wav2Vec 2.0 embeddings through t-SNE, noting that there is a level of clustering based on phonological similarity. (Our recipe is open-source in the SpeechBrain toolkit, see: https://github.com/speechbrain/speechbrain/tree/develop/recipes) Comment: To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 202

arXiv.org e-Print Archive