6 research outputs found

Improving bottleneck features for Vietnamese large vocabulary continuous speech recognition system using deep neural networks

Authors: Luong Mai Chi, Nguyen Bao Quoc, Vu Thang Tat
Publication venue: 'Publishing House for Science and Technology, Vietnam Academy of Science and Technology'
Publication date: 03/01/2016

In this paper, a pre-training method based on denoising auto-encoders is investigated and shown to provide a good initialization for the bottleneck networks of a Vietnamese speech recognition system, resulting in better recognition performance than the base bottleneck features reported previously. The experiments are carried out on a dataset of speech from the Voice of Vietnam (VOV) channel. The results show that DBNF extraction for Vietnamese recognition reduces the word error rate by a relative 14% and 39% compared to the base bottleneck features and the MFCC baseline, respectively.

Vietnam Academy of Science and Technology: Journals Online

Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages

Authors: Abdullah Badr M., Avgustinova Tania, Klakow Dietrich, Möbius Bernd
Publication date: 06/08/2020

State-of-the-art spoken language identification (LID) systems, which are based on end-to-end deep neural networks, have shown remarkable success not only in discriminating between distant languages but also between closely related languages or even different spoken varieties of the same language. However, it is still unclear to what extent neural LID models generalize to speech samples with different acoustic conditions due to domain shift. In this paper, we present a set of experiments to investigate the impact of domain mismatch on the performance of neural LID systems for a subset of six Slavic languages across two domains (read speech and radio broadcast) and examine two low-level signal descriptors (spectral and cepstral features) for this task. Our experiments show that (1) out-of-domain speech samples severely hinder the performance of neural LID models, and (2) while both spectral and cepstral features show comparable performance within-domain, spectral features show more robustness under domain mismatch. Moreover, we apply unsupervised domain adaptation to minimize the discrepancy between the two domains in our study. We achieve relative accuracy improvements that range from 9% to 77% depending on the diversity of acoustic conditions in the source domain.
Comment: To appear in INTERSPEECH 202

arXiv.org e-Print Archive

Crossref

Multilingual shifting deep bottleneck features for low-resource ASR

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Modelling multi-modal language learning: From sentences to words

Author: Merkx D.
Publication venue: Radboud University Nijmegen
Publication date: 01/01/2022

MPG.PuRe