
    Acoustic Adaptation to Dynamic Background Conditions with Asynchronous Transformations

    This paper proposes a framework for adapting to complex, non-stationary background conditions in Automatic Speech Recognition (ASR) by means of asynchronous Constrained Maximum Likelihood Linear Regression (aCMLLR) transforms and asynchronous Noise Adaptive Training (aNAT). The proposed method aims to apply, for every input frame, the feature transform that best compensates for the background. It is implemented with a new Hidden Markov Model (HMM) topology that expands the usual left-to-right HMM into parallel branches adapted to different background conditions and permits transitions among them. With this topology, the adaptation requires neither ground truth nor prior knowledge of the background in each frame, as it maximises the overall log-likelihood of the decoded utterance. The aCMLLR transforms can be further improved by retraining the models in an aNAT fashion and by applying speaker-based MLLR transforms in cascade, which models background and speaker effects efficiently. An initial evaluation on a modified version of the WSJCAM0 corpus incorporating 7 different background conditions provides a benchmark for the use of aCMLLR transforms. A relative reduction of 40.5% in Word Error Rate (WER) was achieved by the combined use of aCMLLR and MLLR in cascade. Finally, the same selection of techniques was applied to the transcription of multi-genre media broadcasts, where aNAT, aCMLLR transforms and MLLR transforms provided a relative improvement of 2–3%.
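
    The following is a minimal sketch (not taken from the article) of the per-frame transform-selection idea that aCMLLR builds on: each condition-specific CMLLR transform x' = A x + b is scored by its Jacobian-adjusted log-likelihood under a background GMM, and the best-scoring transform is kept for that frame. The GMM, transform set and function names are hypothetical; in the article the selection is embedded in the expanded HMM topology and resolved jointly during decoding rather than frame by frame in isolation.

```python
# Hypothetical sketch of the per-frame transform-selection criterion behind
# aCMLLR: apply every condition-specific CMLLR transform (x' = A x + b) to a
# frame and keep the one with the highest Jacobian-adjusted log-likelihood
# under a background GMM. The article resolves this choice inside an expanded
# HMM topology during decoding; this loop only illustrates the criterion.
import numpy as np

def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one frame under a diagonal-covariance GMM."""
    # per-component log N(x; mu, diag(var)), then log-sum-exp over components
    log_norm = -0.5 * (np.log(2 * np.pi * variances)
                       + (x - means) ** 2 / variances).sum(axis=1)
    log_comp = np.log(weights) + log_norm
    m = log_comp.max()
    return m + np.log(np.exp(log_comp - m).sum())

def select_transforms(frames, transforms, gmm):
    """For each frame pick the CMLLR transform (A, b) with the best
    Jacobian-adjusted log-likelihood; return chosen indices and features."""
    weights, means, variances = gmm
    chosen, adapted = [], []
    for x in frames:
        best = max(
            range(len(transforms)),
            key=lambda c: diag_gmm_loglik(transforms[c][0] @ x + transforms[c][1],
                                          weights, means, variances)
                          + np.log(abs(np.linalg.det(transforms[c][0]))),
        )
        A, b = transforms[best]
        chosen.append(best)
        adapted.append(A @ x + b)
    return np.array(chosen), np.stack(adapted)
```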

    Computer, Speech and Language - Experiment results for paper "Acoustic Adaptation to Dynamic Background Conditions with Asynchronous Transformations"

    The files in the dataset correspond to results generated for the Computer, Speech and Language article "Acoustic Adaptation to Dynamic Background Conditions with Asynchronous Transformations" (http://dx.doi.org/10.1016/j.csl.2016.06.008).

    The zip file contains three file types:
    - .ctm: the output of the automatic speech recognition system; the columns include segment information as well as the recognised transcripts.
    - .sys: the scoring of the automatic speech recognition system, including the overall word error rate and the numbers of insertions, deletions and substitutions.
    - .lur: a more detailed decomposition of the word error rate across different tags.

    The file naming convention is as follows:
    - TableX-LineY: the recognition and scoring output corresponding to Line Y of Table X in the article.
    - FigureX-BarY: the recognition and scoring output corresponding to Bar Y (counting from the left) of Figure X in the article.

    All three file types are standard outputs recognised by the automatic speech recognition community and can be opened with any text editor.
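
    As an illustration of how the .ctm outputs might be read, the sketch below assumes the standard NIST/sclite CTM column order (recording id, channel, start time, duration, word, optional confidence); the dataset description does not spell the columns out, so this layout and the example file name are assumptions.

```python
# Hypothetical reader for the .ctm outputs, assuming the standard NIST/sclite
# CTM column order: recording id, channel, start time, duration, word,
# optional confidence. Groups hypothesised words per recording id.
from collections import defaultdict

def read_ctm(path):
    """Return {recording_id: [(start, duration, word), ...]} sorted by time."""
    hyps = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(";;"):   # skip blanks and comments
                continue
            fields = line.split()
            rec, _chan, start, dur, word = fields[:5]
            hyps[rec].append((float(start), float(dur), word))
    for rec in hyps:
        hyps[rec].sort()
    return dict(hyps)

# Usage example with a hypothetical file name following the TableX-LineY
# convention described above.
if __name__ == "__main__":
    for rec, words in read_ctm("Table3-Line2.ctm").items():
        print(rec, " ".join(w for _, _, w in words))
```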