Direct Acoustics-to-Word Models for English Conversational Speech Recognition
Recent work on end-to-end automatic speech recognition (ASR) has shown that
the connectionist temporal classification (CTC) loss can be used to convert
acoustics to phone or character sequences. Such systems are typically combined
with a dictionary and a separately trained language model (LM) to produce word
sequences. However, they are not truly end-to-end in the sense of mapping
acoustics directly to words without an intermediate phone representation. In
this paper, we present the first results employing direct acoustics-to-word CTC
models on two well-known public benchmark tasks: Switchboard and CallHome.
These models do not require an LM or even a decoder at run-time and hence
recognize speech with minimal complexity. However, due to the large number of
word output units, CTC word models require orders of magnitude more data to
train reliably compared to traditional systems. We present some techniques to
mitigate this issue. Our CTC word model achieves a word error rate of
13.0%/18.8% on the Hub5-2000 Switchboard/CallHome test sets without any LM or
decoder compared with 9.6%/16.0% for phone-based CTC with a 4-gram LM. We also
present rescoring results on CTC word model lattices to quantify the
performance benefits of an LM, and contrast the performance of word and phone
CTC models.
Comment: Submitted to Interspeech-201
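As background for the abstract above, the CTC loss it builds on scores all frame-level alignments that collapse to the target word (or phone) sequence, using a forward recursion over a blank-augmented label sequence. The sketch below is an illustrative pure-Python implementation of that standard recursion, not the authors' code; the function name and argument layout are hypothetical.

```python
import math

def ctc_neg_log_likelihood(log_probs, labels, blank=0):
    """CTC loss for one utterance via the forward (alpha) recursion.

    log_probs: per-frame log-probabilities, shape [T][V]
    labels:    target label ids (no blanks), length L
    """
    # Extended label sequence: blanks interleaved, length 2L + 1.
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S = len(ext)
    NEG_INF = float("-inf")

    def logsumexp(*xs):
        m = max(xs)
        if m == NEG_INF:
            return NEG_INF
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # alpha[s]: log-prob of all path prefixes ending in ext[s] at this frame.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            a = alpha[s]
            if s > 0:
                a = logsumexp(a, alpha[s - 1])
            # Skip over a blank only between two different non-blank labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = logsumexp(a, alpha[s - 2])
            new[s] = a + log_probs[t][ext[s]]
        alpha = new
    # Valid paths end in the last label or the trailing blank.
    return -logsumexp(alpha[-1], alpha[-2] if S > 1 else NEG_INF)
```

For a word-level CTC model as in the paper, `V` is the word vocabulary and `labels` is the reference word sequence; the recursion is unchanged, only the output inventory grows, which is why so much more data is needed per output unit.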
Generalization of Extended Baum-Welch Parameter Estimation for Discriminative Training and Decoding
We demonstrate the generalizability of the Extended Baum-Welch (EBW) algorithm not only for HMM parameter estimation but for decoding as well. We show that there can exist a general function associated with the objective function under EBW that reduces to the well-known auxiliary function used in the Baum-Welch algorithm for maximum-likelihood estimation. We generalize the representation of the model-parameter updates by making use of a differentiable function (such as the arithmetic or geometric mean) of the updated and current model parameters, and describe its effect on the learning rate during HMM parameter estimation. Improvements on speech recognition tasks are also presented.
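To make the learning-rate role of the EBW damping constant concrete, here is a minimal sketch of the standard EBW re-estimation step for a single discrete probability distribution; the function name and arguments are hypothetical, and this is the textbook update rather than the generalized form the abstract describes.

```python
def ebw_update(probs, grads, D):
    """One Extended Baum-Welch update for a discrete distribution.

    probs: current parameters p_j (non-negative, summing to 1)
    grads: partial derivatives dF/dp_j of the (possibly non-concave)
           objective F at the current point
    D:     damping constant; D must be large enough that every
           numerator below is positive. Larger D -> smaller step,
           i.e. a slower effective learning rate.
    """
    # p_j' = p_j (dF/dp_j + D) / sum_k p_k (dF/dp_k + D)
    num = [p * (g + D) for p, g in zip(probs, grads)]
    Z = sum(num)
    return [n / Z for n in num]
```

As D grows, the update shrinks toward the current parameters, which is exactly the learning-rate effect the abstract attributes to the choice of interpolation between updated and current models.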
ULTRASONIC DIFFRACTION IMAGING
A new approach to diffraction tomography is presented. This approach eliminates the need for plane-wave illumination and requires only two rotational positions of the object. The theory for this new approach is based on a small-perturbation approximation to the wave equation. We have derived reconstruction algorithms for linear viscoelastic and viscous-fluid models of the object. Algorithms are also presented for lossy coupling media. Computer simulations and some very preliminary experimental results are presented to verify the theory.
Improved pretraining of deep belief networks using sparse encoding symmetric machines
Restricted Boltzmann Machines (RBMs) continue to be a popular methodology for pre-training the weights of Deep Belief Networks (DBNs). However, the RBM objective function cannot be maximized directly. It is therefore unclear what quantity to monitor when deciding to stop training, which makes managing the computational cost a challenge. The Sparse Encoding Symmetric Machine (SESM) has been suggested as an alternative method for pre-training. By placing a sparseness term on the output codebook, SESM allows the objective function to be optimized directly and reliably monitored as an indicator for stopping training. In this paper, we explore SESM for pre-training DBNs and apply it to speech recognition for the first time. First, we provide a detailed analysis comparing the behavior of SESM and RBM. Second, we compare the performance of SESM-pretrained and RBM-pretrained DBNs on TIMIT and a 50-hour English Broadcast News task. Results indicate that DBNs pre-trained using SESM and RBMs achieve comparable performance and outperform randomly initialized DBNs, with SESM providing a much easier stopping criterion than RBM. Index Terms — Deep belief network, pre-training, neural network feature extraction, sparse representation
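The key practical point in the abstract above is that a SESM-style loss, unlike the RBM likelihood, is directly computable and can therefore serve as a stopping criterion. The sketch below illustrates that property with a generic symmetric encoder/decoder objective (reconstruction plus code-prediction plus sparsity); the exact penalty terms and all names here are illustrative assumptions, not the SESM formulation from the paper.

```python
def sesm_objective(x, W, z, alpha=0.1, beta=1.0):
    """Evaluate a SESM-style objective for one sample (illustrative only).

    x: input vector; W: weight matrix as a list of rows, one per code unit;
    z: code vector; alpha, beta: sparsity and encoder-term weights.
    Every term is directly computable, so this scalar can be tracked
    across epochs to decide when to stop pre-training.
    """
    # Decoder reconstruction: x_hat = W^T z
    x_hat = [sum(W[k][i] * z[k] for k in range(len(z))) for i in range(len(x))]
    recon = sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    # Symmetric encoder term: the code z should match the feed-forward code W x
    z_pred = [sum(Wk[i] * x[i] for i in range(len(x))) for Wk in W]
    enc = sum((zk - zp) ** 2 for zk, zp in zip(z, z_pred))
    # Sparseness penalty on the output codebook
    sparse = sum(abs(zk) for zk in z)
    return recon + beta * enc + alpha * sparse
```

An RBM offers no such scalar: its log-likelihood involves an intractable partition function, so practitioners fall back on proxies like reconstruction error, which is exactly the stopping-criterion difficulty the paper analyzes.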
Restricted Boltzmann Machines (RBM) continue to be a popular methodology to pre-train weights of Deep Belief Networks (DBNs). However, the RBM objective function cannot be maximized directly. Therefore, it is not clear what function to monitor when deciding to stop the training, leading to a challenge in managing the computational costs. The Sparse Encoding Symmetric Machine (SESM) has been suggested as an alternative method for pre-training. By placing a sparseness term on the NN output codebook, SESM allows the objective function to be optimized directly and reliably be monitored as an indicator to stop the training. In this paper, we explore SESM to pre-train DBNs and apply this the first time to speech recognition. First, we provide a detailed analysis comparing the behavior of SESM and RBM. Second, we compare the performance of SESM pre-trained and RBM pre-trained DBNs on TIMIT and a 50 hour English Broadcast News task. Results indicate that pre-trained DBNs using SESM and RBMs achieve comparable performance and outperform randomly initialized DBNs with SESM providing a much easier stopping criterion relative to RBM. Index Terms — Deep belief network, pre-training, neural network feature extraction, sparse representatio