109 research outputs found

    Impact of deep MLP architecture on different acoustic modeling techniques for under-resourced speech recognition

    Posterior based acoustic modeling techniques such as Kullback–Leibler divergence based HMM (KL-HMM) and Tandem are able to exploit out-of-language data through posterior features, estimated by a Multi-Layer Perceptron (MLP). In this paper, we investigate the performance of posterior based approaches in the context of under-resourced speech recognition when a standard three-layer MLP is replaced by a deeper five-layer MLP. The deeper MLP architecture yields similar gains of about 15% (relative) for Tandem, KL-HMM as well as for a hybrid HMM/MLP system that directly uses the posterior estimates as emission probabilities. The best performing system, a bilingual KL-HMM based on a deep MLP, jointly trained on Afrikaans and Dutch data, performs 13% better than a hybrid system using the same bilingual MLP and 26% better than a subspace Gaussian mixture system only trained on Afrikaans data. Index Terms — KL-HMM, Tandem, hybrid system, deep MLPs, under-resourced speech recognition.
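
The posterior-based scoring these systems rely on can be pictured as comparing an MLP posterior vector against each HMM state's categorical distribution via KL divergence. The following is a minimal illustrative sketch, not the authors' implementation; the toy distributions are invented:

```python
import math

def kl_divergence(y, p, eps=1e-12):
    """D_KL(y || p) for two categorical distributions over phone classes."""
    return sum(yi * math.log((yi + eps) / (pi + eps)) for yi, pi in zip(y, p))

def best_state(posterior, state_dists):
    """KL-HMM-style local score: pick the HMM state whose categorical
    distribution is closest (in KL divergence) to the MLP posterior."""
    scores = [kl_divergence(posterior, p) for p in state_dists]
    return min(range(len(scores)), key=scores.__getitem__), scores

# Toy example: 3-class posteriors for one frame, two HMM states.
frame_posterior = [0.7, 0.2, 0.1]
states = [[0.6, 0.3, 0.1],   # state 0: close to the frame posterior
          [0.1, 0.1, 0.8]]   # state 1: far from it
idx, scores = best_state(frame_posterior, states)
print(idx)  # state 0 wins
```

In a full KL-HMM these per-frame scores replace the usual likelihoods inside Viterbi decoding; here only the frame-level comparison is shown.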

    Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

    This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on approaches that use data from multiple languages to improve performance for those languages at several levels, such as feature extraction, acoustic modeling and language modeling. On the application side, this thesis also includes research on non-native and code-switching speech.

    Analysis of Data Augmentation Methods for Low-Resource Maltese ASR

    Recent years have seen an increased interest in the computational speech processing of Maltese, but resources remain sparse. In this paper, we consider data augmentation techniques for improving speech recognition for low-resource languages, focusing on Maltese as a test case. We consider three different types of data augmentation: unsupervised training, multilingual training and the use of synthesized speech as training data. The goal is to determine which of these techniques, or combination of them, is the most effective at improving speech recognition for languages where the starting point is a small corpus of approximately 7 hours of transcribed speech. Our results show that combining the data augmentation techniques studied here leads to an absolute WER improvement of 15% without the use of a language model.
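
For context, the word error rate (WER) figures quoted in abstracts like this one are computed from the word-level edit distance between reference and hypothesis transcripts. A minimal sketch, not tied to any particular paper's scoring toolkit:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference
    length, computed via Levenshtein distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words
```

An "absolute WER improvement of 15%" means this ratio drops by 0.15, e.g. from 0.50 to 0.35.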

    Multilingual Deep Neural Network based Acoustic Modeling For Rapid Language Adaptation

    This paper presents a study on multilingual deep neural network (DNN) based acoustic modeling and its application to new languages. We investigate the effect of phone merging on multilingual DNNs in the context of rapid language adaptation. Moreover, the combination of multilingual DNNs with Kullback–Leibler divergence based acoustic modeling (KL-HMM) is explored. Using ten different languages from the GlobalPhone database, our studies reveal that crosslingual acoustic model transfer through multilingual DNNs is superior to unsupervised RBM pre-training and greedy layer-wise supervised training. We also found that KL-HMM based decoding consistently outperforms conventional hybrid decoding, especially in low-resource scenarios. Furthermore, the experiments indicate that multilingual DNN training benefits equally from simple phoneset concatenation and from manually derived universal phonesets.
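
The "simple phoneset concatenation" baseline mentioned above can be pictured as tagging each phone with its language ID so the multilingual DNN gets one disjoint output unit per language-phone pair. A hypothetical sketch; the phone lists are invented toy examples:

```python
def concat_phonesets(phonesets):
    """Build a multilingual DNN target inventory by prefixing each phone
    with its language ID, keeping the per-language sets disjoint."""
    targets = []
    for lang, phones in sorted(phonesets.items()):
        targets.extend(f"{lang}:{ph}" for ph in phones)
    return targets

inventory = concat_phonesets({
    "de": ["a", "e", "sh"],   # toy phone lists, not real inventories
    "fr": ["a", "e", "r"],
})
print(inventory)
# ['de:a', 'de:e', 'de:sh', 'fr:a', 'fr:e', 'fr:r']
```

A universal phoneset would instead merge units like `de:a` and `fr:a` into a single shared target; the paper's finding is that both choices train comparably well.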

    Development of Bilingual ASR System for MediaParl Corpus

    The development of an Automatic Speech Recognition (ASR) system for the bilingual MediaParl corpus is challenging for several reasons: (1) reverberant recordings, (2) accented speech, and (3) no prior information about the language. In that context, we employ frequency domain linear prediction based (FDLP) features to reduce the effect of reverberation, exploit bilingual deep neural networks in Tandem and hybrid acoustic modeling approaches to significantly improve ASR for accented speech, and develop a fully bilingual ASR system using entropy-based decoding-graph selection. Our experiments indicate that the proposed bilingual ASR system performs similarly to a language-specific ASR system once approximately five seconds of speech are available.
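
One way to picture an entropy-based selection criterion of this kind: compute the entropy of the acoustic model's frame posteriors under each language and pick the language whose posteriors are, on average, the least uncertain. This is a loose sketch of the general idea under invented toy data, not the paper's exact procedure:

```python
import math

def entropy(posterior, eps=1e-12):
    """Shannon entropy (in nats) of a categorical posterior distribution."""
    return -sum(p * math.log(p + eps) for p in posterior if p > 0)

def select_language(posteriors_by_lang):
    """Pick the language whose frame posteriors have the lowest mean
    entropy, i.e. the language the model is most confident about."""
    avg = {lang: sum(entropy(p) for p in frames) / len(frames)
           for lang, frames in posteriors_by_lang.items()}
    return min(avg, key=avg.get)

# Toy frames: the French model is confident, the German one is near-uniform.
frames = {
    "fr": [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],
    "de": [[0.4, 0.3, 0.3], [0.34, 0.33, 0.33]],
}
print(select_language(frames))  # fr
```

With a few seconds of speech (a few hundred frames), such an average becomes stable, which matches the abstract's "approximately five seconds" observation.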

    A survey on Automatic Speech Recognition systems for Portuguese language and its variations

    Communication is an essential part of being human and living in society. There are many languages and variations of them: a person may speak English in one place yet be unable to communicate effectively with someone who speaks English with a different accent. Voice and speech data are important in several application areas, such as health, security, biometric analysis and education. However, most studies focus on English, Arabic or Asian languages, neglecting other relevant languages such as Portuguese, which leaves many research questions open. It is therefore crucial to understand the field: what are the most used techniques for feature extraction and classification, and so on. This paper presents a survey of automatic speech recognition components for the understudied Portuguese language and its variations. Covering a total of 101 papers from 2012 to 2018, it explains the trends in Portuguese-based automatic speech recognition and, as its main contribution, presents and discusses several possible unexplored methods in a collaborative and overall way.

    Multilingual audio information management system based on semantic knowledge in complex environments

    This paper proposes a multilingual audio information management system based on semantic knowledge in complex environments. The complex environment is defined by limited resources (financial, material, human, and audio resources); the poor quality of the audio signal taken from an internet radio channel; the multilingual context (Spanish, French, and Basque, which is in an under-resourced situation in some areas); and the regular appearance of cross-lingual elements between the three languages. In addition, the system is constrained by the requirements of the local multilingual industrial sector. We present the first evolutionary system based on a scalable architecture that is able to fulfill these specifications with automatic adaptation based on automatic semantic speech recognition, folksonomies, automatic configuration selection, machine learning, neural computing methodologies, and collaborative networks. As a result, the initial goals have been accomplished and the usability of the final application has been tested successfully, even with non-experienced users.

    This work is funded by grant TEC201677791-C4 from the Plan Nacional de I+D+i, Ministry of Economic Affairs and Competitiveness of Spain; the DomusVi Foundation (Kms para recorder); the Basque Government (ELKARTEK KK-2018/00114, GEJ IT1189-19); the Government of Gipuzkoa (DG18/14, DG17/16); UPV/EHU (GIU19/090); and COST Actions (CA18106, CA15225).