
Speech recognition and keyword spotting for low-resource languages: Babel project research at CUED

Authors: Gales M.J.F., Knill K.M., Ragni A., Rath S.P.
Publication venue: International Speech Communication Association (ISCA)
Publication date: 14/05/2014

Recently there has been increased interest in Automatic Speech Recognition (ASR) and Key Word Spotting (KWS) systems for low-resource languages. One of the driving forces behind this research direction is the IARPA Babel project. This paper describes some of the research funded by this project at Cambridge University, as part of the Lorelei team co-ordinated by IBM. A range of topics is discussed, including deep neural network based acoustic models, data augmentation, and zero acoustic model resource systems. Performance for all approaches is evaluated using the Limited (approximately 10 hours) and/or Full (approximately 80 hours) language packs distributed by IARPA, and both KWS and ASR performance figures are given. Though absolute performance varies with language and keyword list, the approaches described show consistent trends over the languages investigated to date. Results from comparable systems on the five Option Period 1 languages indicate a strong correlation between ASR performance and KWS performance.

Improving interpretability and regularization in deep learning

Authors: Gales M.J.F., Karanasou P., Ragni A., Sim K.C., Wu C.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

Deep learning approaches yield state-of-the-art performance in a range of tasks, including automatic speech recognition. However, the highly distributed representation in a deep neural network (DNN) or other network variants is difficult to analyze, which makes parameter interpretation and regularization challenging. This paper presents a regularization scheme, referred to as activation regularization, that acts on the activation function outputs to improve network interpretability and regularization. It encourages activation function outputs to satisfy a target pattern, and by defining appropriate target patterns, different learning concepts can be imposed on the network. The method can aid network interpretability and also has the potential to reduce overfitting. The scheme is evaluated on several continuous speech recognition tasks: the Wall Street Journal task, eight conversational telephone speech tasks from the IARPA Babel program, and a U.S. English broadcast news task. On all tasks, activation regularization achieved consistent performance gains over the standard DNN baselines.
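
The abstract does not give the exact form of the penalty. As a minimal sketch, assume a squared-error penalty between sigmoid hidden-layer outputs and a fixed target pattern, added to the cross-entropy loss with a weight. Under that assumption, activation regularization for a one-hidden-layer network could be written in NumPy as below. The function name `regularized_loss`, the weight `lam` and the `target` array are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def regularized_loss(x, labels, W1, b1, W2, b2, target, lam):
    """Cross-entropy plus an assumed penalty pulling hidden activations towards `target`.

    x: (batch, in_dim) input features; labels: (batch,) integer class ids.
    target: (hidden_dim,) or (batch, hidden_dim) hypothetical target pattern.
    lam: hypothetical regularization weight (0 gives the plain baseline loss).
    Returns (total loss, cross-entropy term, penalty term).
    """
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))  # sigmoid hidden-layer outputs
    p = softmax(h @ W2 + b2)                   # output class posteriors
    ce = -np.mean(np.log(p[np.arange(len(labels)), labels]))
    # Assumed penalty: mean over the batch of the squared distance to the target pattern.
    penalty = np.mean(np.sum((h - target) ** 2, axis=1))
    return ce + lam * penalty, ce, penalty
```

With `lam = 0` the sketch reduces to the standard cross-entropy baseline. Choosing different `target` patterns, for example sparse or grouped activations, is how this sketch would impose different learning concepts on the hidden layer.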

Discriminative classifiers with adaptive kernels for noise robust speech recognition

Authors: Flego F., Gales M.J.F.
Publication venue: Elsevier BV

Improving speech recognition and keyword search for low resource languages using web data

Authors: Cooper E., Gales M.J.F., Hirschberg J., Knill K.M., Mendels G., Ragni A., Soto V., Wang H.
Publication venue: International Speech Communication Association (ISCA)
Publication date: 06/09/2015

We describe the use of text data scraped from the web to augment language models for automatic speech recognition and keyword search for low-resource languages. We scrape text from multiple genres, including blogs, online news, translated TED talks, and subtitles. Using linearly interpolated language models, we find that blogs and movie subtitles are more relevant for language modeling of conversational telephone speech, and that they yield large reductions in out-of-vocabulary keywords. Furthermore, we show that the web data can improve term error rate by 3.8% absolute and the Maximum Term-Weighted Value of keyword search by 0.0076 to 0.1059 absolute. Much of the gain comes from the reduction in out-of-vocabulary items.
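
The abstract does not say how the interpolation weights were estimated. One common approach is expectation-maximization on held-out text, shown here as a sketch rather than the authors' procedure. Given the probability that each component language model assigns to every held-out token, each weight is repeatedly reset to that component's average posterior responsibility. For example, one component could be trained on conversational transcripts and another on scraped blogs. The function names and the precomputed `probs` array are assumptions.

```python
import numpy as np

def estimate_interpolation_weights(probs, iters=50):
    """EM estimate of linear interpolation weights for K component LMs.

    probs: (num_tokens, K) array; probs[t, k] is the probability component
    LM k assigns to held-out token t given its history (assumed precomputed).
    """
    _, k = probs.shape
    lam = np.full(k, 1.0 / k)  # start from uniform weights
    for _ in range(iters):
        weighted = probs * lam                                  # lambda_k * p_k(w_t | h_t)
        post = weighted / weighted.sum(axis=1, keepdims=True)  # E-step: component posteriors
        lam = post.mean(axis=0)                                 # M-step: average posterior
    return lam

def perplexity(probs, lam):
    # Interpolated probability of each token, then exp of mean negative log-probability.
    mix = probs @ lam
    return float(np.exp(-np.mean(np.log(mix))))
```

EM iterations never decrease held-out likelihood, so the estimated weights give a perplexity no higher than the uniform mixture they start from.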

Transcription of multi-genre media archives using out-of-domain data

Authors: Bell P.J., Gales M.J.F., Lanchantin P., Liu X., Long Y., Renals S., Swietojanski P., Woodland P.C.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. MLAN substantially reduces word error rate (WER) compared with other systems, with relative WER reductions of 15% over a PLP baseline, 9% over in-domain tandem features and 8% over the best out-of-domain tandem features.

Progress in the CU-HTK broadcast news transcription system

Authors: Mrva D., Kim D.Y., Chan H.Y., Gales M.J.F., Woodland P.C., Sinha R., Tranter S.E.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Use of contexts in language model interpolation and adaptation

Authors: Gales M.J.F., Woodland P.C., Liu X.
Publication venue: Elsevier BV

Initialization of fMLLR with Sufficient Statistics from Similar Speakers

Authors: Pražák A., Gales M.J.F.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011