
4,537 research outputs found

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

Author: Chng Eng Siong
Khassanov Yerbolat
Ma Bin
Ni Chongjia
Pham Van Tung
Xu Haihua
Zeng Zhiping
Publication venue: International Speech Communication Association
Publication date: 31/07/2019

The lack of code-switching training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR model using only monolingual data. Our method encourages the distributions of the output token embeddings of the monolingual languages to be similar and hence makes it easier for the ASR model to code-switch between languages. Specifically, we propose constraints based on Jensen-Shannon divergence and cosine distance. The former forces the output embeddings of the monolingual languages to have similar distributions, while the latter simply brings the centroids of the two distributions close to each other. Experimental results demonstrate the effectiveness of the proposed method, yielding up to 4.5% absolute mixed error rate improvement on a Mandarin-English code-switching ASR task.

Comment: 5 pages, 3 figures, accepted to INTERSPEECH 201

arXiv.org e-Print Archive

Crossref
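The cosine-distance constraint described in the abstract above (pulling the centroids of the two languages' output token embeddings together) can be illustrated in plain Python. This is a minimal sketch, assuming each language's output embeddings are given as lists of equal-length vectors; the function names, the toy data, and the `weight` in `total_loss` are hypothetical and are not taken from the paper. A real training setup would compute this term inside a deep-learning framework so that gradients flow back into the embeddings.

```python
import math

def centroid(embeddings):
    """Mean of a list of equal-length embedding vectors."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(vec[i] for vec in embeddings) / n for i in range(dim)]

def cosine_distance(u, v):
    """1 - cosine similarity; 0.0 when u and v point in the same direction.
    Assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def centroid_constraint(emb_lang_a, emb_lang_b):
    """Cosine distance between the centroids of two languages' output embeddings."""
    return cosine_distance(centroid(emb_lang_a), centroid(emb_lang_b))

def total_loss(asr_loss, emb_lang_a, emb_lang_b, weight=0.1):
    """Hypothetical objective: ASR loss plus a weighted centroid penalty.
    `weight` is an assumed hyperparameter, not a value from the paper."""
    return asr_loss + weight * centroid_constraint(emb_lang_a, emb_lang_b)

if __name__ == "__main__":
    # Toy 2-D embeddings for two output vocabularies (made-up values).
    mandarin_tokens = [[1.0, 0.0], [0.9, 0.1]]
    english_tokens = [[0.0, 1.0], [0.1, 0.9]]
    print(centroid_constraint(mandarin_tokens, english_tokens))
```

Minimizing this penalty only aligns the mean directions of the two embedding sets; matching the full distributions is what the Jensen-Shannon-based constraint in the abstract targets.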

Comparing Self-Supervised Pre-Training and Semi-Supervised Training for Speech Recognition in Languages with Weak Language Models

Author: Klejch Ondřej
Lam-Yee-Mui Léa-Marie
Yang Lucas Ondel
Publication date: 20/08/2023

Edinburgh Research Explorer

The CSTR System for Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages

Author: Bell Peter
Klejch Ondřej
Wallington Electra
Publication venue: International Speech Communication Association
Publication date: 30/08/2021

Edinburgh Research Explorer

Semisupervised Speech Data Extraction from Basque Parliament Sessions and Validation on Fully Bilingual Basque–Spanish ASR

Author: Bordel García German
Peñagarikano Badiola Mikel
Rodríguez Fuentes Luis Javier
Varona Fernández Amparo
Publication venue: MDPI
Publication date: 28/07/2023

In this paper, a semisupervised speech data extraction method is presented and applied to create a new dataset designed for the development of fully bilingual Automatic Speech Recognition (ASR) systems for Basque and Spanish. The dataset is drawn from an extensive collection of Basque Parliament plenary sessions containing frequent code-switching. Since the session minutes are not exact, only the most reliable speech segments are kept for training. To that end, we use phonetic similarity scores between nominal and recognized phone sequences. The process starts with baseline acoustic models trained on generic out-of-domain data, then iteratively updates the models with the extracted data and applies the updated models to refine the training dataset, until the improvement observed between two consecutive iterations becomes small enough. A development dataset, consisting of five plenary sessions not used for training, has been manually audited for tuning and evaluation purposes. Cross-validation experiments (with 20 random partitions) have been carried out on the development dataset, using the baseline and the iteratively updated models. On average, the Word Error Rate (WER) decreases from 16.57% (baseline) to 4.41% (first iteration) and further to 4.02% (second iteration), corresponding to relative WER reductions of 73.4% and 8.8%, respectively. When considering only Basque segments, the WER decreases on average from 16.57% (baseline) to 5.51% (first iteration) and further to 5.13% (second iteration), corresponding to relative WER reductions of 66.7% and 6.9%, respectively.
As a result of this work, a new bilingual Basque–Spanish resource has been produced from Basque Parliament sessions, including 998 h of training data (audio segments + transcriptions), a development set (17 h long) designed for tuning and evaluation under a cross-validation scheme, and a fully bilingual trigram language model.

This work was partially funded by the Spanish Ministry of Science and Innovation (OPEN-SPEECH project, PID2019-106424RB-I00) and by the Basque Government under the general support program for research groups (IT-1704-22).

Archivo Digital para la Docencia y la Investigación
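The Basque Parliament abstract above keeps only segments whose nominal and recognized phone sequences are phonetically similar, and reports relative WER reductions. Both steps can be sketched in plain Python. The abstract does not give the exact scoring formula, so this sketch assumes a similarity of 1 minus the Levenshtein distance normalized by the longer sequence, with a hypothetical threshold of 0.9 and a hypothetical `(segment_id, nominal, recognized)` tuple format. The `relative_reduction` helper only reproduces the arithmetic behind the reported percentages.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of phone symbols)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]

def phone_similarity(nominal, recognized):
    """Assumed score in [0, 1]: 1 - edit distance / length of the longer sequence."""
    if not nominal and not recognized:
        return 1.0
    return 1.0 - edit_distance(nominal, recognized) / max(len(nominal), len(recognized))

def select_reliable(segments, threshold=0.9):
    """Keep IDs of segments whose similarity reaches the (hypothetical) threshold."""
    return [seg_id for seg_id, nominal, recognized in segments
            if phone_similarity(nominal, recognized) >= threshold]

def relative_reduction(old_wer, new_wer):
    """Relative WER reduction in percent."""
    return 100.0 * (old_wer - new_wer) / old_wer

if __name__ == "__main__":
    # Figures taken from the abstract.
    print(round(relative_reduction(16.57, 4.41), 1))  # 73.4
    print(round(relative_reduction(4.41, 4.02), 1))   # 8.8
    print(round(relative_reduction(16.57, 5.51), 1))  # 66.7
    print(round(relative_reduction(5.51, 5.13), 1))   # 6.9
```

Note that each relative reduction is computed against the previous iteration's WER, which is why the second-iteration gains look small in relative terms.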