Conversion of NNLM to Back-off language model in ASR
In daily life, automatic speech recognition is widely used, for example in security systems. In converting speech to text with neural networks, the language model is one of the blocks on which the efficiency of speech recognition depends. In this paper we develop an algorithm to convert a Neural Network Language Model (NNLM) into a back-off language model for more efficient decoding. For large-vocabulary systems this conversion gives more efficient results. The efficiency of a language model is measured by perplexity and Word Error Rate (WER).
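The two evaluation metrics named in the abstract have standard definitions: perplexity is the exponentiated average negative log-probability the model assigns to held-out tokens, and WER is the word-level edit distance between a hypothesis and the reference, normalized by reference length. A minimal sketch of both (function names are illustrative, not from the paper):

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities
    assigned by a language model to a held-out sequence."""
    return math.exp(-sum(log_probs) / len(log_probs))

def word_error_rate(reference, hypothesis):
    """WER = Levenshtein distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

A model assigning uniform probability 1/4 to each of four tokens has perplexity 4; one substitution against a three-word reference gives WER 1/3.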
Two efficient lattice rescoring methods using recurrent neural network language models
An important part of the language modelling problem for automatic speech recognition (ASR) systems, and many other related applications, is to appropriately model long-distance context dependencies in natural languages. Hence, statistical language models (LMs) that can model longer span history contexts, for example, recurrent neural network language models (RNNLMs), have become increasingly popular for state-of-the-art ASR systems. As RNNLMs use a vector representation of complete history contexts, they are normally used to rescore N-best lists. Motivated by their intrinsic characteristics, two efficient lattice rescoring methods for RNNLMs are proposed in this paper. The first method uses an n-gram style clustering of history contexts. The second approach directly exploits the distance measure between recurrent hidden history vectors. Both methods produced 1-best performance comparable to a 10k-best rescoring baseline RNNLM system on two large vocabulary conversational telephone speech recognition tasks for US English and Mandarin Chinese. Consistent lattice size compression and recognition performance improvements after confusion network (CN) decoding were also obtained over the prefix tree structured N-best rescoring approach. This work was supported by EPSRC under Grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation and RATS programs. The work of X. Chen was supported by Toshiba Research Europe Ltd, Cambridge Research Lab. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/TASLP.2016.255882
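The first rescoring method in the abstract approximates an RNNLM's full-history state by clustering histories that share their most recent n-1 words, so lattice paths with the same truncated history can reuse one cached score instead of each requiring a separate RNNLM evaluation. A minimal sketch of that idea, with hypothetical names and a plain dictionary standing in for the lattice machinery:

```python
def ngram_history_key(history, n=4):
    """Cluster full word histories by their last n-1 words,
    an n-gram style approximation of the RNNLM state (sketch)."""
    return tuple(history[-(n - 1):])

# Illustrative cache: lattice paths whose truncated histories
# match share one cached score rather than one per unique path.
state_cache = {}

def lookup_or_score(history, score_fn, n=4):
    """Return a cached score for this history cluster, calling
    the (expensive) RNNLM scorer only on a cache miss."""
    key = ngram_history_key(history, n)
    if key not in state_cache:
        state_cache[key] = score_fn(history)
    return state_cache[key]
```

Two paths ending in the same three words then trigger only one call to the scorer, which is where the reported lattice-size compression and speedup over exhaustive N-best rescoring would come from.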
Efficient lattice rescoring using recurrent neural network language models
This is the accepted manuscript of a paper published in the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4-9 May 2014, written by Liu, X.; Wang, Y.; Chen, X.; Gales, M.J.F.; Woodland, P.C. Recurrent neural network language models (RNNLMs) have become an increasingly popular choice for state-of-the-art speech recognition systems due to their inherently strong generalization performance. As these models use a vector representation of complete history contexts, RNNLMs are normally used to rescore N-best lists. Motivated by their intrinsic characteristics, two novel lattice rescoring methods for RNNLMs are investigated in this paper. The first uses an n-gram style clustering of history contexts. The second approach directly exploits the distance measure between hidden history vectors. Both methods produced 1-best performance comparable with a 10k-best rescoring baseline RNNLM system on a large vocabulary conversational telephone speech recognition task. Significant lattice size compression of over 70% and consistent improvements after confusion network (CN) decoding were also obtained over the N-best rescoring approach. The research leading to these results was supported by EPSRC grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation (BOLT) and RATS programs.
Computational Analysis of the Conversational Dynamics of the United States Supreme Court
The decisions of the United States Supreme Court have far-reaching implications in American life. Using transcripts of Supreme Court oral arguments, this work looks at the conversational dynamics of Supreme Court justices and links their conversational interaction with the decisions of the Court and of individual justices. While several studies have looked at the relationship between oral arguments and case variables, to our knowledge none have looked at the relationship between conversational dynamics and case outcomes. Working from this view, we show that the conversation of Supreme Court justices is both predictable and predictive. We aim to show that conversation during Supreme Court cases is patterned, that this patterned conversation is associated with case outcomes, and that this association can be used to make predictions about case outcomes.
We present three sets of experiments to accomplish this. The first examines the order of speakers during oral arguments as a patterned sequence, showing that cohesive elements in the discourse, along with references to individuals, provide significant improvements over our "bag-of-words" baseline in identifying speakers in sequence within a transcript. The second graphically examines the association between speaker turn-taking and case outcomes. The results presented with this experiment point to interesting and complex relationships between conversational interaction and case variables, such as justices' votes. The third experiment shows that this relationship can be used in the prediction of case outcomes with accuracy ranging from 62.5% to 76.8% for varying conditions. Finally, we offer recommendations for improved tools for legal researchers interested in the relationship between conversation during oral arguments and case outcomes, and suggestions for how these tools may be applied to more general problems.
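One simple way to quantify the kind of patterned turn-taking the abstract describes is a first-order transition model over who speaks next, estimated from the speaker sequence of a transcript. The sketch below is an illustration of that idea, not the dissertation's actual method; the function name and speaker labels are made up:

```python
from collections import Counter, defaultdict

def turn_transition_probs(speaker_sequence):
    """Estimate first-order transition probabilities over
    speaker turns from an ordered list of speaker labels,
    one crude measure of patterned turn-taking (sketch)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(speaker_sequence, speaker_sequence[1:]):
        counts[prev][nxt] += 1
    return {
        speaker: {t: c / sum(nxts.values()) for t, c in nxts.items()}
        for speaker, nxts in counts.items()
    }
```

Transition probabilities like these, computed per case, could then serve as features for the kind of outcome classifier the third experiment evaluates.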