2,679 research outputs found

Context-Dependent Acoustic Modeling without Explicit Phone Clustering

Author: Beck Eugen
Ney Hermann
Raissi Tina
Schlüter Ralf
Publication date: 15/05/2020

Phoneme-based acoustic modeling for large-vocabulary automatic speech recognition takes advantage of phoneme context. The large number of context-dependent (CD) phonemes and their highly varying statistics require tying or smoothing to enable robust training. Usually, Classification and Regression Trees are used for phonetic clustering, which is standard in Hidden Markov Model (HMM)-based systems. However, this solution introduces a secondary training objective and does not allow for end-to-end training. In this work, we address direct phonetic context modeling for the hybrid Deep Neural Network (DNN)/HMM that does not build on any phone clustering algorithm for the determination of the HMM state inventory. By performing different decompositions of the joint probability of the center phoneme state and its left and right contexts, we obtain a factorized network consisting of different components, trained jointly. Moreover, the representation of the phonetic context for the network relies on phoneme embeddings. The recognition accuracy of our proposed models on the Switchboard task is comparable to, and slightly better than, that of the hybrid model using the standard state-tying decision trees.

Comment: Submitted to Interspeech 202
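As an illustration of what "different decompositions of the joint probability" can mean, the chain rule gives several equivalent factorizations of the joint posterior over the center phoneme state and its two context phonemes. The symbols below are assumptions made for this sketch and do not come from the abstract: $\sigma_c$ is the center phoneme state, $\phi_\ell$ and $\phi_r$ are the left and right context phonemes, and $x$ is the acoustic input. The abstract does not say which factorizations the authors actually use, so these are only two possible instances.

```latex
% Two chain-rule factorizations of the same joint posterior.
% Each factor could in principle be modeled by one network component.
\begin{align}
p(\phi_\ell, \sigma_c, \phi_r \mid x)
  &= p(\phi_\ell \mid \sigma_c, \phi_r, x)\; p(\sigma_c \mid \phi_r, x)\; p(\phi_r \mid x) \\
  &= p(\phi_r \mid \sigma_c, \phi_\ell, x)\; p(\sigma_c \mid \phi_\ell, x)\; p(\phi_\ell \mid x)
\end{align}
```

Both lines are exact identities; they differ only in which variable is predicted first and which variables each factor conditions on.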

arXiv.org e-Print Archive

Crossref

Leveraging native language information for improved accented speech recognition

Author: Ghorbani Shahram
Hansen John H. L.
Publication venue: 'International Speech Communication Association'
Publication date: 18/04/2019

Recognition of accented speech is a long-standing challenge for automatic speech recognition (ASR) systems, given the increasing worldwide population of bilingual speakers with English as their second language. If we consider foreign-accented speech as an interpolation of the native language (L1) and English (L2), a model that can simultaneously address both languages would perform better at the acoustic level for accented speech. In this study, we explore how an end-to-end recurrent neural network (RNN) system trained with English and native languages (Spanish and Indian languages) could leverage data from the native languages to improve performance for accented English speech. To this end, we examine pre-training with native languages, as well as multi-task learning (MTL) in which the main task is trained with native English and the secondary task is trained with Spanish or Indian languages. We show that the proposed MTL model performs better than the pre-training approach and outperforms a baseline model trained only with English data. We suggest a new setting for MTL in which the secondary task is trained with both English and the native language, using the same output set. This proposed scenario yields better performance, with +11.95% and +17.55% character error rate gains over baseline for Hispanic and Indian accents, respectively.

Comment: Accepted at Interspeech 201

arXiv.org e-Print Archive

Crossref

Linking working memory and long-term memory: A computational model of the learning of new words

Author: Baddeley A.D.
Bates E.
Brown G.D.A.
Case R.
Chi M.T.H.
Conrad R.
Croker S.
De Groot A.D.
Dunn L.M.
Freudenthal D.
Gathercole S.E.
Gobet F.
Jones G.
Lane P.C.R.
Masoura E.V.
Munson B.
Nagy W.E.
Page M.P.A.
Papagno C.
Richman H.B.
Roy P.
Ruchkin D.S.
Schneider W.
Theakston A.L.
Treiman R.
Zhang G.
Publication venue: Blackwell Publishing. The definitive version is available at www.blackwell-synergy.com
Publication date: 01/01/2007

The nonword repetition (NWR) test has been shown to be a good predictor of children’s vocabulary size. NWR performance has been explained using phonological working memory, which is seen as a critical component in the learning of new words. However, no detailed specification of the link between phonological working memory and long-term memory (LTM) has been proposed. In this paper, we present a computational model of children’s vocabulary acquisition (EPAM-VOC) that specifies how phonological working memory and LTM interact. The model learns phoneme sequences, which are stored in LTM and mediate how much information can be held in working memory. The model’s behaviour is compared with that of children in a new study of NWR, conducted in order to ensure the same nonword stimuli and methodology across ages. EPAM-VOC shows a pattern of results similar to that of children: performance is better for shorter nonwords and for wordlike nonwords, and performance improves with age. EPAM-VOC also simulates the superior performance for single consonant nonwords over clustered consonant nonwords found in previous NWR studies. EPAM-VOC provides a simple and elegant computational account of some of the key processes involved in the learning of new words: it specifies how phonological working memory and LTM interact; makes testable predictions; and suggests that developmental changes in NWR performance may reflect differences in the amount of information that has been encoded in LTM rather than developmental changes in working memory capacity.

Keywords: EPAM, working memory, long-term memory, nonword repetition, vocabulary acquisition, developmental change

Crossref

Nottingham Trent Institutional Repository (IRep)

Brunel University Research Archive

Post-training discriminative pruning for RBMs

Author: Albornoz Enrique Marcelo
Close John Goddard
Rufiner Hugo Leonardo
Sánchez Gutiérrez Máximo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2017

One of the major challenges in the area of artificial neural networks is the identification of a suitable architecture for a specific problem. Choosing an unsuitable topology can exponentially increase the training cost, and even hinder network convergence. On the other hand, recent research indicates that larger or deeper nets can map the problem features into a more appropriate space, and thereby improve the classification process, thus leading to an apparent dichotomy. In this regard, it is interesting to inquire whether independent measures, such as mutual information, could provide a clue to finding the most discriminative neurons in a network. In the present work we explore this question in the context of Restricted Boltzmann Machines, by employing different measures to realize post-training pruning. The neurons that each measure determines to be the most discriminative are combined, and a classifier is applied to the ensuing network to determine its usefulness. We find that two measures in particular seem to be good indicators of the most discriminative neurons, producing savings of generally more than 50% of the neurons, while maintaining an acceptable error rate. Further, it is borne out that starting with a larger network architecture and then pruning is more advantageous than using a smaller network to begin with. Finally, a quantitative index is introduced which can provide information on choosing a suitable pruned network.

Affiliation: Sánchez Gutiérrez, Máximo. Universidad Autónoma Metropolitana; México
Affiliation: Albornoz, Enrique Marcelo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Rufiner, Hugo Leonardo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina. Universidad Nacional de Entre Ríos; Argentina
Affiliation: Close, John Goddard. Universidad Autónoma Metropolitana; Méxic
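The idea of ranking hidden neurons by an independent measure and keeping only the most discriminative ones can be sketched in a few lines. This is a minimal illustration, assuming mutual information between a binarized hidden-unit activation and the class label as the measure; the abstract names mutual information only as an example and does not say which measures the authors found most useful. The function names, the 0.5 binarization threshold and the `keep_fraction` parameter are hypothetical choices made for this sketch.

```python
import math
from collections import Counter


def mutual_information(unit_states, labels):
    """Mutual information (in nats) between a unit's binary states and class labels."""
    n = len(labels)
    joint = Counter(zip(unit_states, labels))
    unit_counts = Counter(unit_states)
    label_counts = Counter(labels)
    mi = 0.0
    for (state, label), count in joint.items():
        p_joint = count / n
        p_indep = (unit_counts[state] / n) * (label_counts[label] / n)
        mi += p_joint * math.log(p_joint / p_indep)
    return mi


def rank_hidden_units(activations, labels, threshold=0.5):
    """Score every hidden unit and return (unit indices by descending score, scores).

    activations: list of rows, one row of hidden-unit activations per sample.
    threshold: assumed cut-off used to binarize each activation.
    """
    n_units = len(activations[0])
    scores = []
    for j in range(n_units):
        states = [1 if row[j] > threshold else 0 for row in activations]
        scores.append(mutual_information(states, labels))
    order = sorted(range(n_units), key=lambda j: scores[j], reverse=True)
    return order, scores


def keep_top_units(order, keep_fraction=0.5):
    """Return the sorted indices of the highest-ranked units to keep after pruning."""
    n_keep = max(1, int(len(order) * keep_fraction))
    return sorted(order[:n_keep])


# Toy data: unit 0 tracks the label, unit 1 is constant, unit 2 is unrelated.
toy_activations = [
    [0.9, 0.9, 0.9],
    [0.9, 0.9, 0.1],
    [0.1, 0.9, 0.9],
    [0.1, 0.9, 0.1],
]
toy_labels = [0, 0, 1, 1]
toy_order, toy_scores = rank_hidden_units(toy_activations, toy_labels)
toy_kept = keep_top_units(toy_order, keep_fraction=0.5)
```

On the toy data, unit 0 receives the highest score and is the only unit kept. A classifier would then be trained on the retained units to check whether the error rate stays acceptable, as the abstract describes.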

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital