Search CORE

5 research outputs found

Comparing Self-Supervised Pre-Training and Semi-Supervised Training for Speech Recognition in Languages with Weak Language Models

Author: Klejch Ondřej
Lam-Yee-Mui Léa-Marie
Yang Lucas Ondel
Publication venue
Publication date: 20/08/2023
Field of study

An Empirical Evaluation of Zero Resource Acoustic Unit Discovery

Author: Burget Lukas
Dehak Najim
Ghahremani Pegah
Kesiraju Santosh
Khudanpur Sanjeev
Liu Chunxi
Ondel Lucas
Rott Alena
Sun Ming
Yang Jinyi
Publication venue
Publication date: 04/02/2017
Field of study

Acoustic unit discovery (AUD) is a process of automatically identifying a categorical acoustic unit inventory from speech and producing corresponding acoustic unit tokenizations. AUD provides an important avenue for unsupervised acoustic model training in a zero resource setting where expert-provided linguistic knowledge and transcribed speech are unavailable. Therefore, to further facilitate zero-resource AUD process, in this paper, we demonstrate acoustic feature representations can be significantly improved by (i) performing linear discriminant analysis (LDA) in an unsupervised self-trained fashion, and (ii) leveraging resources of other languages through building a multilingual bottleneck (BN) feature extractor to give effective cross-lingual generalization. Moreover, we perform comprehensive evaluations of AUD efficacy on multiple downstream speech applications, and their correlated performance suggests that AUD evaluations are feasible using different alternative language resources when only a subset of these evaluation resources can be available in typical zero resource applications.Comment: 5 pages, 1 figure; Accepted for publication at ICASSP 201

arXiv.org e-Print Archive

Crossref

Comparing Self-Supervised Pre-Training and Semi-Supervised Training for Speech Recognition in Languages with Weak Language Models

Author: Klejch Ondřej
Lam-Yee-Mui Léa-Marie
Yang Lucas, Ondel
Publication venue: ISCA
Publication date: 20/08/2023
Field of study

This paper investigates the potential of improving a hybrid automatic speech recognition model trained on 10 hours of transcribed data with 200 hours of untranscribed data in lowresource languages. First, we compare baseline methods of cross-lingual transfer with MFCC features and features extracted with the multilingual self-supervised model XLSR-53. Subsequently, we compare two approaches that can leverage the untranscribed data: semi-supervised training with LF-MMI and continued self-supervised pre-training of XLSR-53. Our results on well-resourced English broadcast data derived from MGB show that both methods achieve 18% and 27% relative improvements compared to the baseline, respectively. On the low-resource South African Soap Opera dataset, the relative improvement with semi-supervised training is only 3% due to the inherently weak language model. However, continued pretraining achieves 8.6% relative improvement because it does not rely on any external information

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

Author: Brutti Alessio
Cappellazzo Umberto
Falavigna Daniele
Raj Desh
Wright George August
Yang Lucas Ondel
Zaiem Salah
Publication venue: HAL CCSD
Publication date: 23/09/2023
Field of study

The possibility of dynamically modifying the computational load of neural models at inference time is crucial for on-device processing, where computational power is limited and time-varying. Established approaches for neural model compression exist, but they provide architecturally static models. In this paper, we investigate the use of early-exit architectures, that rely on intermediate exit branches, applied to large-vocabulary speech recognition. This allows for the development of dynamic models that adjust their computational cost to the available resources and recognition performance. Unlike previous works, besides using pre-trained backbones we also train the model from scratch with an early-exit architecture. Experiments on public datasets show that early-exit architectures from scratch not only preserve performance levels when using fewer encoder layers, but also improve task accuracy as compared to using single-exit models or using pre-trained models. Additionally, we investigate an exit selection strategy based on posterior probabilities as an alternative to frame-based entropy

INRIA a CCSD electronic archive server

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

Author: Brutti Alessio
Cappellazzo Umberto
Falavigna Daniele
Raj Desh
Wright George August
Yang Lucas Ondel
Zaiem Salah
Publication venue: HAL CCSD
Publication date: 23/09/2023
Field of study

HAL-CentraleSupelec