3 research outputs found

Multilingual representations for low resource speech recognition and keyword search

Authors: Audhkhasi K, Cui J, Cui X, Gales MJF, Golik P, Kingsbury B, Kislal E, Knill KM, Mangu L, Ney H, Nussbaum-Thom M, Picheny M, Ragni A, Ramabhadran B, Schluter R, Sethy A, Tüske Z, Wang H, Woodland P
Publication venue: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
Publication date: 01/01/2015

© 2015 IEEE. This paper examines the impact of multilingual (ML) acoustic representations on Automatic Speech Recognition (ASR) and keyword search (KWS) for low resource languages in the context of the OpenKWS15 evaluation of the IARPA Babel program. The task is to develop Swahili ASR and KWS systems within two weeks using as little as 3 hours of transcribed data. Multilingual acoustic representations proved to be crucial for building these systems under strict time constraints. The paper discusses several key insights on how these representations are derived and used. First, we present a data sampling strategy that can speed up the training of multilingual representations without appreciable loss in ASR performance. Second, we show that fusion of diverse multilingual representations developed at different LORELEI sites yields substantial ASR and KWS gains. Speaker adaptation and data augmentation of these representations improve both ASR and KWS performance (up to 8.7% relative). Third, incorporating untranscribed data through semi-supervised learning improves WER and KWS performance. Finally, we show that these multilingual representations significantly improve ASR and KWS performance (relative 9% for WER and 5% for MTWV) even when forty hours of transcribed audio in the target language is available. Multilingual representations significantly contributed to the LORELEI KWS systems winning the OpenKWS15 evaluation.

Publikationsserver der RWTH Aachen University

Apollo (Cambridge)

White Rose Research Online

CUED - Cambridge University Engineering Department

Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit.

Authors: A Cutler, A Hannun, A Schweitzer, B Gick, B Mandelbrot, B Widrow, C Burgess, C Cucchiarini, D Norris, Denis Arnold, DJ Broad, E Charniak, F Rosenblatt, F Tomaschek, Fabian Tomaschek, Florence Lopez, Hedderik van Rijn, J Hay, J Sueur, JC Junqua, JL McClelland, JR Saffran, K Keune, KJ Kohler, Konstantin Sering, L Lisker, L ten Bosch, LJ Raphael, M Ernestus, M Ramscar, MG Gaskell, N Gogtay, N Kitaoka, O Scharenborg, P Brockwell, P Hendrix, P Milin, PC Trimmer, R Kemps, R. Harald Baayen, RA Rescorla, RF Port, RG Mbu Nyamsi, RH Baayen, RP Lippmann, RQ Quiroga, RR Miller, S Waydo, S Zerlin, SN Wood, T Nearey, T Paus, TK Landauer, W Schultz, Z Tüske
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2017

Sound units play a pivotal role in cognitive models of auditory comprehension. The general consensus is that during perception listeners break down speech into auditory words and subsequently phones. Indeed, cognitive speech recognition is typically taken to be computationally intractable without phones. Here we present a computational model trained on 20 hours of conversational speech that recognizes word meanings within the range of human performance (model 25%, native speakers 20-44%), without making use of phone or word form representations. Our model also successfully generates predictions about the speed and accuracy of human auditory comprehension. At the heart of the model is a 'wide' yet sparse two-layer artificial neural network with some hundred thousand input units representing summaries of changes in acoustic frequency bands, and proxies for lexical meanings as output units. We believe that our model holds promise for resolving longstanding theoretical problems surrounding the notion of the phone in linguistic theory.
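Note: the two-layer, error-driven setup described in this abstract can be illustrated with a minimal sketch. It assumes a plain delta-rule update over sparse inputs; the abstract does not give the exact learning rule, features, or hyperparameters, so the function names, the learning rate, and the toy cue indices below are all hypothetical and not the authors' implementation.

```python
from collections import defaultdict


def activations(weights, active_cues, n_outcomes):
    """Sum the weights from the active input units to each output unit."""
    return [sum(weights[c][o] for c in active_cues) for o in range(n_outcomes)]


def train_delta_rule(events, n_outcomes, rate=0.1, epochs=50):
    """Train a two-layer network (inputs wired directly to outputs).

    events: iterable of (active_cues, target) pairs, where active_cues is a
    tuple of distinct input-unit indices (sparse input) and target is the
    index of the correct output unit (a stand-in for a lexical meaning).
    rate and epochs are illustrative values, not taken from the source.
    """
    # Only cues that actually occur get a weight row, which keeps a very
    # wide input layer cheap to store.
    weights = defaultdict(lambda: [0.0] * n_outcomes)
    for _ in range(epochs):
        for active_cues, target in events:
            act = activations(weights, active_cues, n_outcomes)
            for o in range(n_outcomes):
                # Error: desired output (1 for the target, else 0) minus
                # the current activation of this output unit.
                error = (1.0 if o == target else 0.0) - act[o]
                for c in active_cues:
                    weights[c][o] += rate * error
    return weights


def predict(weights, active_cues, n_outcomes):
    """Return the index of the most strongly activated output unit."""
    act = activations(weights, active_cues, n_outcomes)
    return max(range(n_outcomes), key=act.__getitem__)


# Toy usage: two "words", each signalled by two hypothetical acoustic cues.
toy_events = [((0, 1), 0), ((2, 3), 1)]
toy_weights = train_delta_rule(toy_events, n_outcomes=2)
```

Because updates touch only the cues active in a given event, the cost per event scales with the number of active cues rather than with the full width of the input layer.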

Crossref

Directory of Open Access Journals

Publikationsserver der Universität Tübingen

PubMed Central

Publikationsserver des Instituts für Deutsche Sprache