Search CORE


Improving large vocabulary continuous speech recognition by combining GMM-based and reservoir-based acoustic modeling

Authors: Demuynck Kris
Martens Jean-Pierre
Triefenbach Fabian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

In earlier work, we showed that good phoneme recognition is possible with a so-called reservoir, a special type of recurrent neural network. This paper investigates different architectures based on Reservoir Computing (RC) for large vocabulary continuous speech recognition. Besides experiments with HMM hybrids, it shows that an RC-HMM tandem can achieve the same recognition accuracy as a classical HMM, which is a promising result for such a fairly new paradigm. It also demonstrates that a state-level combination of the scores of the tandem and the baseline HMM leads to a significant improvement over the baseline: a relative word error rate reduction of the order of 20% is possible.
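The two ideas in this abstract, a reservoir that turns acoustic frames into rich recurrent states and a state-level combination of scores from two systems, can be illustrated with a minimal sketch, assuming NumPy. It drives a randomly initialised leaky-integrator reservoir (an echo state network) with a sequence of feature frames, fits a linear readout by ridge regression, and interpolates per-state log scores from two systems with a single weight. The function names `run_reservoir`, `train_readout` and `combine_state_scores`, the reservoir size, leak rate, spectral radius, ridge constant, and the log-linear form of the combination are all hypothetical choices for illustration; the abstract does not specify the authors' architecture or combination rule.

```python
import numpy as np


def run_reservoir(inputs, n_reservoir=100, spectral_radius=0.9, leak=0.3, seed=0):
    """Drive a random leaky-integrator reservoir with a (n_frames, n_features) sequence."""
    rng = np.random.default_rng(seed)
    n_in = inputs.shape[1]
    # Fixed random input and recurrent weights; in RC only the readout is trained.
    w_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    x = np.zeros(n_reservoir)
    states = np.empty((inputs.shape[0], n_reservoir))
    for t, u in enumerate(inputs):
        # Leaky update: mix the previous state with the new nonlinear activation.
        x = (1.0 - leak) * x + leak * np.tanh(w_in @ u + w @ x)
        states[t] = x
    return states


def train_readout(states, targets, ridge=1e-4):
    """Ridge-regression readout: solve (X^T X + r I) W = X^T Y, X with a bias column."""
    x = np.hstack([states, np.ones((states.shape[0], 1))])
    a = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(a, x.T @ targets)


def combine_state_scores(log_tandem, log_baseline, weight=0.5):
    """Hypothetical log-linear interpolation of per-state acoustic log scores."""
    return weight * log_tandem + (1.0 - weight) * log_baseline


# Usage with random stand-in data (39-dim frames, 5 classes).
rng = np.random.default_rng(1)
feats = rng.normal(size=(200, 39))
labels = np.eye(5)[rng.integers(0, 5, size=200)]
states = run_reservoir(feats)
w_out = train_readout(states, labels)
frame_scores = np.hstack([states, np.ones((200, 1))]) @ w_out  # shape (200, 5)
```

For reference, a 20% relative reduction means, for example, a baseline word error rate of 10% dropping to 8%.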

Crossref

Ghent University Academic Bibliography

Subword and Crossword Units for CTC Acoustic Models

Authors: Metze Florian
Sanabria Ramon
Waibel Alex
Zenkel Thomas
Publication date: 18/06/2018

This paper proposes a novel approach to creating a unit set for CTC-based speech recognition systems. Using Byte Pair Encoding, we learn a unit set of arbitrary size from a given training text. In contrast to using characters or words as units, this allows us to find a good trade-off between the size of the unit set and the amount of available training data. We evaluate both Crossword units, which may span multiple words, and Subword units. By combining this approach with decoding methods that use a separate language model, we achieve state-of-the-art results for grapheme-based CTC systems.
Comment: Current version accepted at Interspeech 201
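The unit-learning step can be sketched in plain Python as classic Byte Pair Encoding merge learning: start from characters, repeatedly merge the most frequent adjacent symbol pair, and record each merge. This is a minimal sketch, not the authors' implementation; the `</w>` end-of-word marker and the names `learn_bpe`, `segment` and `_merge_pair` are illustrative assumptions. Because the marker keeps merges inside words, this version learns Subword units only; Crossword units would additionally require letting merges cross word boundaries, which the sketch does not do.

```python
from collections import Counter


def _merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(corpus_words, num_merges):
    """Learn up to `num_merges` BPE merge operations from a list of word tokens."""
    # Each word starts as characters plus an end-of-word marker; counts are word frequencies.
    vocab = Counter(tuple(word) + ("</w>",) for word in corpus_words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(_merge_pair(symbols, best))] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into learned units by replaying the merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = _merge_pair(symbols, pair)
    return symbols


corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges = learn_bpe(corpus, num_merges=3)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
print(segment("newest", merges))  # ['n', 'e', 'w', 'est</w>']
```

Here `num_merges` is the knob that sets the size of the unit set: each merge adds one new unit to the inventory, trading unit granularity against how often each unit is seen in the training data.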

arXiv.org e-Print Archive

Crossref

Non-Native Pronunciation Variation Modeling for Automatic Speech Recognition

Authors: Hong Kook Kim
Mina Kim
Yoo Rhee Oh
Publication venue: IntechOpen
Publication date: 16/08/2010

IntechOpen

Speech recognition for smart homes

Authors: McLoughlin Ian Vince
Sharifzadeh Hamid Reza
Publication venue: IntechOpen
Publication date: 01/11/2008

IntechOpen

Crossref

Kent Academic Repository

Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

Authors: Asaei Afsaneh
Bourlard Herve
Dighe Pranay
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 18/10/2016

Conventional deep neural networks (DNNs) for speech acoustic modeling rely on Gaussian mixture models (GMMs) and hidden Markov models (HMMs) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems correspond to context-dependent tied states, or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize that the senone probabilities obtained from a DNN trained with binary labels can provide more accurate targets for learning better acoustic models. However, DNN outputs contain inaccuracies that appear as high-dimensional unstructured noise, whereas the informative components are structured and low-dimensional. We exploit principal component analysis (PCA) and sparse coding to characterize the senone subspaces. Enhanced probabilities obtained from low-rank and sparse reconstructions are used as soft targets for DNN acoustic modeling, which also enables training with untranscribed data. Experiments on the AMI corpus show a 4.6% relative reduction in word error rate.
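The low-rank part of the soft-target enhancement can be sketched with NumPy, assuming PCA is applied to the DNN posterior vectors of frames aligned to a single senone class: the vectors are projected onto that class's top principal components, reconstructed, floored at a small positive value, and renormalised so each row sums to one. The function name `low_rank_soft_targets`, the per-class grouping, the number of components, and the flooring and renormalisation steps are assumptions for illustration; the sparse-coding reconstruction mentioned in the abstract is not shown.

```python
import numpy as np


def low_rank_soft_targets(posteriors, n_components, floor=1e-8):
    """Low-rank (PCA) reconstruction of DNN senone posteriors for one senone class.

    posteriors: (n_frames, n_senones) array; each row is a probability vector.
    Returns an array of the same shape whose rows are renormalised soft targets.
    """
    mean = posteriors.mean(axis=0)
    centered = posteriors - mean
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]  # shape (k, n_senones)
    # Keep only the structured, low-dimensional component; discard the rest as noise.
    recon = centered @ basis.T @ basis + mean
    recon = np.clip(recon, floor, None)  # the reconstruction can go slightly negative
    return recon / recon.sum(axis=1, keepdims=True)


# Usage with random stand-in posteriors: 50 frames, 10 senones.
rng = np.random.default_rng(0)
post = rng.dirichlet(np.full(10, 5.0), size=50)
soft = low_rank_soft_targets(post, n_components=3)
```

Keeping all components reproduces the input posteriors, so `n_components` directly controls how much of the presumed unstructured noise is removed.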

arXiv.org e-Print Archive

Crossref

Speech Recognition in Hindi

Authors: Chayani Satyabrata
Paul Abhisek
Publication date: 16/05/2011

This project attempts to reduce the gap between computers and the people of rural India by allowing them to use Hindi, the language most commonly spoken in rural areas. Speech recognition will play a very significant role in promoting technology in rural areas. Although many speech interfaces are already available, what is needed are speech interfaces in local Indian languages; hence, this project attempts to build a speech recognition system for Hindi. The project report briefly explains the basic model of a speech recognition engine and its different modules. It also describes the construction of the Hindi language dictionary, the training of the model for speech recognition, and finally the testing of the model for accuracy. The test results are provided, and the report ends with the derived conclusions and recommended future work.
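Recognition accuracy of this kind is commonly measured with the word error rate (WER): the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words, with word accuracy often quoted as 1 - WER. A minimal Python sketch follows; the abstract does not state which metric the project used, so the metric choice and the function name `word_error_rate` are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j]: minimum edits turning the first i reference words into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


# Usage: one substituted word out of four (Hindi words are separated by spaces).
wer = word_error_rate("मेरा नाम राम है", "मेरा नाम श्याम है")
print(wer, 1 - wer)  # 0.25 0.75
```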

ethesis@nitr