Two efficient lattice rescoring methods using recurrent neural network language models
An important part of the language modelling problem for automatic speech recognition (ASR) systems, and many other related applications, is to appropriately model long-distance context dependencies in natural languages. Hence, statistical language models (LMs) that can model longer-span history contexts, for example recurrent neural network language models (RNNLMs), have become increasingly popular for state-of-the-art ASR systems. Because RNNLMs use a vector representation of complete history contexts, they are normally used to rescore N-best lists. Motivated by their intrinsic characteristics, this paper proposes two efficient lattice rescoring methods for RNNLMs. The first method uses an n-gram style clustering of history contexts. The second approach directly exploits the distance measure between recurrent hidden history vectors. Both methods produced 1-best performance comparable to a 10k-best rescoring baseline RNNLM system on two large vocabulary conversational telephone speech recognition tasks for US English and Mandarin Chinese. Consistent lattice size compression and recognition performance improvements after confusion network (CN) decoding were also obtained over the prefix tree structured N-best rescoring approach. This work was supported by EPSRC under Grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation and RATS programs. The work of X. Chen was supported by Toshiba Research Europe Ltd, Cambridge Research Lab. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/TASLP.2016.255882
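The two history-merging ideas in this abstract can be illustrated with a minimal Python sketch. It assumes each lattice path's history is available both as a word list and as an RNNLM hidden vector; the function names, the greedy clustering, the Euclidean distance, and the threshold are illustrative assumptions, not the paper's exact criteria. Histories that map to the same key or representative would share one RNNLM state, which is what keeps the rescored lattice compact.

```python
import math
from typing import Dict, List, Sequence, Tuple


def ngram_history_key(history: Sequence[str], n: int) -> Tuple[str, ...]:
    """n-gram style clustering: keep only the most recent n-1 words of a history."""
    if n <= 1:
        return ()
    return tuple(history[-(n - 1):])


def merge_by_ngram(histories: List[Sequence[str]], n: int) -> Dict[Tuple[str, ...], List[int]]:
    """Group path indices whose histories share the same truncated n-gram key.

    Paths in one group would share a single (approximate) RNNLM state.
    """
    groups: Dict[Tuple[str, ...], List[int]] = {}
    for i, history in enumerate(histories):
        groups.setdefault(ngram_history_key(history, n), []).append(i)
    return groups


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two hidden vectors (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge_by_hidden_distance(hidden: List[Sequence[float]], threshold: float) -> List[int]:
    """Greedy clustering of hidden history vectors (hypothetical helper).

    Each vector is assigned to the first earlier representative closer than
    `threshold`; otherwise it becomes a new representative. Returns the
    representative index for every vector.
    """
    representatives: List[int] = []
    assignment: List[int] = []
    for i, vector in enumerate(hidden):
        match = next(
            (r for r in representatives if euclidean(hidden[r], vector) < threshold),
            None,
        )
        if match is None:
            representatives.append(i)
            match = i
        assignment.append(match)
    return assignment


if __name__ == "__main__":
    paths = [["<s>", "i", "want", "to"], ["<s>", "we", "want", "to"], ["<s>", "i", "need", "to"]]
    print(merge_by_ngram(paths, n=3))
    print(merge_by_hidden_distance([[0.0, 0.0], [0.05, 0.0], [1.0, 1.0]], threshold=0.1))
```

With n = 3, the first two paths collapse into one state because both end in "want to". A larger n or a smaller threshold keeps more distinct states, trading lattice size against how closely the merged states approximate full-history RNNLM scores.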
A Personalized System for Conversational Recommendations
Searching for and making decisions about information is becoming increasingly
difficult as the amount of information and the number of choices increase.
Recommendation systems help users find items of interest of a particular type,
such as movies or restaurants, but are still somewhat awkward to use. Our
solution is to take advantage of the complementary strengths of personalized
recommendation systems and dialogue systems, creating personalized aides. We
present a system -- the Adaptive Place Advisor -- that treats item selection as
an interactive, conversational process, with the program inquiring about item
attributes and the user responding. Individual, long-term user preferences are
unobtrusively obtained in the course of normal recommendation dialogues and
used to direct future conversations with the same user. We present a novel user
model that influences both item search and the questions asked during a
conversation. We demonstrate the effectiveness of our system in significantly
reducing the time and number of interactions required to find a satisfactory
item, as compared to a control group of users interacting with a non-adaptive
version of the system.
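The abstract does not specify the form of the user model, so the following Python sketch is entirely hypothetical. It counts the attribute values of items the user accepted in earlier dialogues, scores candidate items by those preferences (item search), and asks first about the attribute whose past values are least predictable (question ordering). The class name, method names, and both heuristics are illustrative assumptions and are not taken from the Adaptive Place Advisor.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Mapping, Optional, Set

Item = Mapping[str, str]  # attribute name -> value, e.g. {"cuisine": "thai"}


class UserModel:
    """Hypothetical long-term user model built from past accepted items."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def record_choice(self, item: Item) -> None:
        # Update preferences unobtrusively from the item the user accepted.
        for attr, value in item.items():
            self.counts[attr][value] += 1

    def preference(self, attr: str, value: str) -> float:
        # Relative frequency of this value among past choices (0.0 if no history).
        total = sum(self.counts[attr].values())
        return self.counts[attr][value] / total if total else 0.0

    def score(self, item: Item) -> float:
        # Item search: rank candidates by the sum of per-attribute preferences.
        return sum(self.preference(attr, value) for attr, value in item.items())

    def next_question(self, attrs: List[str], answered: Set[str]) -> Optional[str]:
        # Question ordering: ask about the open attribute whose most frequent
        # past value has the smallest share, i.e. the least predictable one.
        open_attrs = [a for a in attrs if a not in answered]
        if not open_attrs:
            return None

        def top_share(attr: str) -> float:
            total = sum(self.counts[attr].values())
            return max(self.counts[attr].values()) / total if total else 0.0

        return min(open_attrs, key=top_share)


if __name__ == "__main__":
    model = UserModel()
    model.record_choice({"cuisine": "thai", "price": "cheap"})
    model.record_choice({"cuisine": "thai", "price": "moderate"})
    candidates = [{"cuisine": "thai", "price": "cheap"}, {"cuisine": "french", "price": "cheap"}]
    print(max(candidates, key=model.score))
    print(model.next_question(["cuisine", "price"], set()))
```

In this toy history the user always chose Thai food but varied the price, so the sketch ranks the Thai item first and asks about price before cuisine.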
UCSY-SC1: A Myanmar speech corpus for automatic speech recognition
This paper introduces a speech corpus developed for Myanmar automatic speech recognition (ASR) research. Researchers around the world have conducted ASR research to improve their language technologies. Speech corpora are essential for developing ASR systems, and creating them is especially necessary for low-resourced languages. Myanmar can be regarded as a low-resourced language because it lacks existing resources for speech processing research. In this work, a speech corpus named UCSY-SC1 (University of Computer Studies Yangon - Speech Corpus1) is created for Myanmar ASR research. The corpus covers two domains: news and daily conversations. Its total size is over 42 hrs: 25 hrs of web news and 17 hrs of recorded conversational data. The news data was collected from 177 female and 84 male speakers, and the conversational data from 42 female and 4 male speakers. This corpus was used as training data for developing Myanmar ASR. Three types of acoustic models were built and their results compared: Gaussian Mixture Model (GMM) - Hidden Markov Model (HMM), Deep Neural Network (DNN), and Convolutional Neural Network (CNN) models. Experiments were conducted on different data sizes, and evaluation was performed on two test sets: TestSet1 (web news) and TestSet2 (recorded conversational data). The results show that Myanmar ASR systems trained on this corpus perform satisfactorily on both test sets, with word error rates of 15.61% on TestSet1 and 24.43% on TestSet2.
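The corpus results are reported as word error rates (WER). A minimal Python sketch of the standard computation follows: the word-level Levenshtein distance (substitutions + deletions + insertions) between a reference and a hypothesis, divided by the number of reference words. It assumes both transcripts are already segmented into words; how the Myanmar text was segmented for scoring is not stated in the abstract, and the function name is illustrative.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """WER = (S + D + I) / N via a word-level Levenshtein alignment."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    print(f"WER = {word_error_rate(ref, hyp):.2%}")
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.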
Paraphrastic language models
Natural languages are known for their expressive richness. Many sentences can be used to represent the same underlying meaning.
Modelling only the observed surface word sequence can result in poor context coverage and generalization, for example, when using
n-gram language models (LMs). This paper proposes a novel form of language model, the paraphrastic LM, that addresses these
issues. A phrase level paraphrase model statistically learned from standard text data with no semantic annotation is used to generate
multiple paraphrase variants. LM probabilities are then estimated by maximizing their marginal probability. Multi-level language
models estimated at both the word level and the phrase level are combined. An efficient weighted finite state transducer (WFST)
based paraphrase generation approach is also presented. Significant error rate reductions of 0.5–0.6% absolute were obtained over the
baseline n-gram LMs on two state-of-the-art recognition tasks for English conversational telephone speech and Mandarin Chinese
broadcast speech using a paraphrastic multi-level LM modelling both word and phrase sequences. When it is further combined with
word and phrase level feed-forward neural network LMs, significant error rate reductions of 0.9% absolute (9% relative) and 0.5%
absolute (5% relative) were obtained over the baseline n-gram and neural network LMs, respectively. The research leading to these results was supported by EPSRC grant EP/I031022/1 (Natural Speech Technology)
and DARPA under the Broad Operational Language Translation (BOLT) program. This version is the author accepted manuscript. The final published version can be found on the publisher's website at: http://www.sciencedirect.com/science/article/pii/S088523081400028X © 2014 Elsevier Ltd. All rights reserved.
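One way to approximate estimating an LM over paraphrase variants is with fractional n-gram counts, sketched below in Python. Each training sentence comes with paraphrase variants (assumed to include the original sentence) and a probability per variant; bigram counts from every variant are weighted by that probability and then normalized into relative frequencies. The data layout, the bigram order, the function names, and the absence of smoothing are simplifying assumptions rather than the paper's configuration, and the WFST-based variant generation is taken as given.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (paraphrased word sequence, probability of that variant)
Variants = List[Tuple[List[str], float]]


def expected_bigram_counts(corpus: List[Variants]) -> Dict[Tuple[str, str], float]:
    """Accumulate bigram counts from every variant, weighted by its probability."""
    counts: Dict[Tuple[str, str], float] = defaultdict(float)
    for variants in corpus:
        for words, prob in variants:
            padded = ["<s>"] + words + ["</s>"]
            for prev, curr in zip(padded, padded[1:]):
                counts[(prev, curr)] += prob
    return dict(counts)


def bigram_probs(counts: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], float]:
    """Unsmoothed relative-frequency estimate of P(curr | prev) from fractional counts."""
    context_totals: Dict[str, float] = defaultdict(float)
    for (prev, _), count in counts.items():
        context_totals[prev] += count
    return {(prev, curr): count / context_totals[prev] for (prev, curr), count in counts.items()}


if __name__ == "__main__":
    # One sentence with two hypothetical paraphrase variants.
    corpus = [[(["how", "are", "you"], 0.75), (["how", "do", "you", "do"], 0.25)]]
    probs = bigram_probs(expected_bigram_counts(corpus))
    print(probs[("how", "are")], probs[("how", "do")])
```

Here P(are | how) comes out as 0.75 because the variant containing "how are" carries weight 0.75; a surface-only bigram model trained on the original sentence alone would never assign probability to "how do".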