
123 research outputs found

Shared-hidden-layer Deep Neural Network for Under-resourced Language

Author: Hoesen Devin
Lestari Dessi Puji
Widyantoro Dwi Hendratmo
Publication venue: 'Universitas Ahmad Dahlan'
Publication date: 01/06/2018

Training a speech recognizer with under-resourced language data remains difficult. Indonesian is considered under-resourced because it lacks a standard speech corpus, text corpus, and dictionary. In this research, the efficacy of augmenting limited Indonesian speech training data with training data from a highly resourced language, such as English, to train an Indonesian speech recognizer was analyzed. The training was performed in the form of shared-hidden-layer deep-neural-network (SHL-DNN) training. An SHL-DNN has language-independent hidden layers and can be pre-trained and trained on multilingual training data in the same way as a monolingual deep neural network. The SHL-DNN trained on Indonesian and English speech data proved effective at decreasing the word error rate (WER) in decoding Indonesian dictated speech, achieving a 3.82% absolute decrease compared to a monolingual Indonesian hidden Markov model with Gaussian mixture model emissions (GMM-HMM). This result was confirmed when the SHL-DNN was also used to decode Indonesian spontaneous speech, achieving a 4.19% absolute WER decrease.
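The abstract reports its gains as absolute WER decreases. As a minimal sketch of what that metric means, the snippet below computes WER with the standard word-level edit distance and separates an absolute decrease from a relative one. The function names and the example sentences and WER values are hypothetical illustrations, not data or code from the paper, which does not state its baseline WER in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def wer_decrease(baseline_wer: float, new_wer: float) -> tuple[float, float]:
    """Return (absolute, relative) decrease of new_wer against baseline_wer."""
    absolute = baseline_wer - new_wer
    relative = absolute / baseline_wer
    return absolute, relative


if __name__ == "__main__":
    # Hypothetical transcripts: one reference word is dropped by the recognizer.
    print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
    # Hypothetical WERs (as fractions) for a baseline and an improved system.
    print(wer_decrease(0.20, 0.15))
```

An absolute decrease is a plain difference in percentage points, so the same absolute figure corresponds to a larger relative improvement when the baseline WER is lower.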

Journal of Education and Learning (EduLearn)

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System

Pronunciation Lexicon Development for Under-Resourced Languages Using Automatically Derived Subword Units: A Case Study on Scottish Gaelic

Author: Magimai.-Doss Mathew
Rasipuram Ramya
Razavi Marzieh
Publication date: 19/11/2015

Infoscience - École polytechnique fédérale de Lausanne

Feature analysis for discriminative confidence estimation in spoken term detection

Author: Dong Wang
Doroteo T. Toledano
Javier Tejedor
José Colás
Simon King
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

This is the author’s version of a work that was accepted for publication in Computer Speech & Language. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computer Speech & Language, 28, 5, (2014) DOI: 10.1016/j.csl.2013.09.008

Discriminative confidence based on multi-layer perceptrons (MLPs) and multiple features has shown a significant advantage over the widely used lattice-based confidence in spoken term detection (STD). Although the MLP-based framework can handle any features derived from a multitude of sources, choosing all possible features may lead to overly complex models and hence poorer generalization. In this paper, we design an extensive set of features and analyze their contribution to STD individually and as a group. The main goal is to choose a small set of features that are sufficiently informative while keeping the model simple and generalizable. We employ two established models to conduct the analysis: one is linear regression, which targets the most relevant features, and the other is logistic linear regression, which targets the most discriminative features. We find that the most informative features are those derived from diverse sources (ASR decoding, duration and lexical properties) and that the two models deliver highly consistent feature ranks. STD experiments on both English and Spanish data demonstrate significant performance gains with the proposed feature sets.

This work has been partially supported by project PriorSPEECH (TEC2009-14719-C02-01) from the Spanish Ministry of Science and Innovation and by project MAV2VICMR (S2009/TIC-1542) from the Community of Madrid.

CiteSeerX

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Edinburgh Research Explorer

Biblos-e Archivo