Search CORE

1,406 research outputs found

Recommended from our members

Two efficient lattice rescoring methods using recurrent neural network language models

Author: Chen X
Gales MJF
Liu X
Wang Y
Woodland PC
Publication venue: IEEE/ACM Transactions on Audio Speech and Language Processing
Publication date: 28/04/2016
Field of study

An important part of the language modelling problem for automatic speech recognition (ASR) systems, and many other related applications, is to appropriately model long-distance context dependencies in natural languages. Hence, statistical language models (LMs) that can model longer span history contexts, for example, recurrent neural network language models (RNNLMs), have become increasingly popular for state-of-the-art ASR systems. As RNNLMs use a vector representation of complete history contexts, they are normally used to rescore N-best lists. Motivated by their intrinsic characteristics, two efficient lattice rescoring methods for RNNLMs are proposed in this paper. The first method uses an

\textit{n}

-gram style clustering of history contexts. The second approach directly exploits the distance measure between recurrent hidden history vectors. Both methods produced 1-best performance comparable to a 10 k-best rescoring baseline RNNLM system on two large vocabulary conversational telephone speech recognition tasks for US English and Mandarin Chinese. Consistent lattice size compression and recognition performance improvements after confusion network (CN) decoding were also obtained over the prefix tree structured N-best rescoring approach.This work was supported by EPSRC under Grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation and RATS programs. The work of X. Chen was supported by Toshiba Research Europe Ltd, Cambridge Research Lab.This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/TASLP.2016.255882

Apollo (Cambridge)

A Personalized System for Conversational Recommendations

Author: Goker M. H.
Langley P.
Thompson C. A.
Publication venue: 'AI Access Foundation'
Publication date: 30/06/2011
Field of study

Searching for and making decisions about information is becoming increasingly difficult as the amount of information and number of choices increases. Recommendation systems help users find items of interest of a particular type, such as movies or restaurants, but are still somewhat awkward to use. Our solution is to take advantage of the complementary strengths of personalized recommendation systems and dialogue systems, creating personalized aides. We present a system -- the Adaptive Place Advisor -- that treats item selection as an interactive, conversational process, with the program inquiring about item attributes and the user responding. Individual, long-term user preferences are unobtrusively obtained in the course of normal recommendation dialogues and used to direct future conversations with the same user. We present a novel user model that influences both item search and the questions asked during a conversation. We demonstrate the effectiveness of our system in significantly reducing the time and number of interactions required to find a satisfactory item, as compared to a control group of users interacting with a non-adaptive version of the system

arXiv.org e-Print Archive

Crossref

UCSY-SC1: A Myanmar speech corpus for automatic speech recognition

Author: Mon Aye Nyein
Pa Win Pa
Thu Ye Kyaw
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/08/2019
Field of study

This paper introduces a speech corpus which is developed for Myanmar Automatic Speech Recognition (ASR) research. Automatic Speech Recognition (ASR) research has been conducted by the researchers around the world to improve their language technologies. Speech corpora are important in developing the ASR and the creation of the corpora is necessary especially for low-resourced languages. Myanmar language can be regarded as a low-resourced language because of lack of pre-created resources for speech processing research. In this work, a speech corpus named UCSY-SC1 (University of Computer Studies Yangon - Speech Corpus1) is created for Myanmar ASR research. The corpus consists of two types of domain: news and daily conversations. The total size of the speech corpus is over 42 hrs. There are 25 hrs of web news and 17 hrs of conversational recorded data.The corpus was collected from 177 females and 84 males for the news data and 42 females and 4 males for conversational domain. This corpus was used as training data for developing Myanmar ASR. Three different types of acoustic models such as Gaussian Mixture Model (GMM) - Hidden Markov Model (HMM), Deep Neural Network (DNN), and Convolutional Neural Network (CNN) models were built and compared their results. Experiments were conducted on different data sizes and evaluation is done by two test sets: TestSet1, web news and TestSet2, recorded conversational data. It showed that the performance of Myanmar ASRs using this corpus gave satisfiable results on both test sets. The Myanmar ASR using this corpus leading to word error rates of 15.61% on TestSet1 and 24.43% on TestSet2

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Paraphrastic language models

Author: Gales MJF
Liu X
Woodland PC
Publication venue: Computer Speech and Language
Publication date: 01/01/2014
Field of study

Natural languages are known for their expressive richness. Many sentences can be used to represent the same underlying meaning. Only modelling the observed surface word sequence can result in poor context coverage and generalization, for example, when using n-gram language models (LMs). This paper proposes a novel form of language model, the paraphrastic LM, that addresses these issues. A phrase level paraphrase model statistically learned from standard text data with no semantic annotation is used to generate multiple paraphrase variants. LM probabilities are then estimated by maximizing their marginal probability. Multi-level language models estimated at both the word level and the phrase level are combined. An efﬁcient weighted ﬁnite state transducer (WFST) based paraphrase generation approach is also presented. Signiﬁcant error rate reductions of 0.5–0.6% absolute were obtained over the baseline n-gram LMs on two state-of-the-art recognition tasks for English conversational telephone speech and Mandarin Chinese broadcast speech using a paraphrastic multi-level LM modelling both word and phrase sequences. When it is further combined with word and phrase level feed-forward neural network LMs, a signiﬁcant error rate reduction of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline n-gram and neural network LMs respectivelyThe research leading to these results was supported by EPSRC grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation (BOLT) program.This version is the author accepted manuscript. The final published version can be found on the publisher's website at:http://www.sciencedirect.com/science/article/pii/S088523081400028X# © 2014 Elsevier Ltd. All rights reserved

CiteSeerX

Apollo (Cambridge)