Paraphrastic neural network language models
Expressive richness in natural languages presents a significant challenge for statistical language models (LM). As multiple word sequences can represent the same underlying meaning, only modelling the observed surface word sequence can lead to poor context coverage. To handle this issue, paraphrastic LMs were previously proposed to improve the generalization of back-off n-gram LMs. Paraphrastic neural network LMs (NNLM) are investigated in this paper. Using a paraphrastic multi-level feedforward NNLM modelling both word and phrase sequences, significant error rate reductions of 1.3% absolute (8% relative) and 0.9% absolute (5.5% relative) were obtained over the baseline n-gram and NNLM systems respectively on a state-of-the-art conversational telephone speech recognition system trained on 2000 hours of audio and 545 million words of texts.
The research leading to these results was supported by EPSRC grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation (BOLT) program. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/ICASSP.2014.685453
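A hedged illustration of how such a model might be trained on paraphrase variants (the exact criterion is the one defined in the paper, not this sketch): each observed sentence W_r contributes its automatically generated variants \psi to the NNLM cross-entropy, weighted by the paraphrase model probability,

    \mathcal{F}(\theta) = \sum_{r}\sum_{\psi \in \Psi(W_r)} P(\psi \mid W_r)\, \log P_{\theta}(\psi)

so that word and phrase sequences that never appear on the surface, but share the meaning of observed ones, still receive training signal.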
Paraphrastic language models and combination with neural network language models
In natural languages multiple word sequences can represent the same underlying meaning. Only modelling the observed surface word sequence can result in poor context coverage, for example, when using n-gram language models (LM). To handle this issue, paraphrastic LMs were proposed in previous research and successfully applied to a US English conversational telephone speech transcription task. In order to exploit the complementary characteristics of paraphrastic LMs and neural network LMs (NNLM), the combination between the two is investigated in this paper. To investigate paraphrastic LMs' generalization ability to other languages, experiments are conducted on a Mandarin Chinese broadcast speech transcription task. Using a paraphrastic multi-level LM modelling both word and phrase sequences, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline n-gram and NNLM systems respectively, after a combination with word and phrase level NNLMs.
The research leading to these results was supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). This is the author accepted manuscript. The final version is available at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6639308
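For illustration, a generic sketch of one standard way two such language models can be combined is linear interpolation with a weight tuned on held-out data (the abstract does not spell out the exact combination scheme used):

    P(w_i \mid h_i) = \lambda\, P_{\mathrm{para}}(w_i \mid h_i) + (1 - \lambda)\, P_{\mathrm{NNLM}}(w_i \mid h_i), \qquad 0 \le \lambda \le 1

where P_para is the paraphrastic LM, P_NNLM the neural network LM, and the weight \lambda is typically chosen to minimise perplexity on a development set.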
Cross-domain paraphrasing for improving language modelling using out-of-domain data
In natural languages the variability in the underlying linguistic generation rules significantly alters the observed surface word sequences they create, and thus introduces a mismatch against other data generated via alternative realizations associated with, for example, a different domain. Hence, direct modelling of out-of-domain data can result in poor generalization to the in-domain data of interest. To handle this problem, this paper investigates the use of cross-domain paraphrastic language models to improve in-domain language modelling (LM) using out-of-domain data. Phrase level paraphrase models learnt from each domain were used to generate paraphrase variants for the data of other domains. These were used both to improve the context coverage of the in-domain data and to reduce the domain mismatch of the out-of-domain data. A significant error rate reduction of 0.6% absolute was obtained on a state-of-the-art conversational telephone speech recognition task using a cross-domain paraphrastic multi-level LM trained on a billion words of mixed conversational and broadcast news data. Consistent improvements in the context coverage of the in-domain data were also obtained.
The research leading to these results was supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation (BOLT) program. This is the accepted manuscript. The final version is available at http://www.isca-speech.org/archive/interspeech_2013/i13_3424.htm
Paraphrastic language models
Natural languages are known for their expressive richness. Many sentences can be used to represent the same underlying meaning. Only modelling the observed surface word sequence can result in poor context coverage and generalization, for example, when using n-gram language models (LMs). This paper proposes a novel form of language model, the paraphrastic LM, that addresses these issues. A phrase level paraphrase model statistically learned from standard text data with no semantic annotation is used to generate multiple paraphrase variants. LM probabilities are then estimated by maximizing their marginal probability. Multi-level language models estimated at both the word level and the phrase level are combined. An efficient weighted finite state transducer (WFST) based paraphrase generation approach is also presented. Significant error rate reductions of 0.5–0.6% absolute were obtained over the baseline n-gram LMs on two state-of-the-art recognition tasks for English conversational telephone speech and Mandarin Chinese broadcast speech using a paraphrastic multi-level LM modelling both word and phrase sequences. When this model is further combined with word and phrase level feed-forward neural network LMs, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline n-gram and neural network LMs respectively.
The research leading to these results was supported by EPSRC grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation (BOLT) program. This version is the author accepted manuscript. The final published version can be found on the publisher's website at: http://www.sciencedirect.com/science/article/pii/S088523081400028X# © 2014 Elsevier Ltd. All rights reserved.
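As a rough illustration of the marginalization described above (the exact objective and notation are given in the paper, not in this sketch), the paraphrastic LM parameters \theta can be viewed as maximizing the marginal probability of each training sentence W_r over its automatically generated paraphrase variants \psi:

    \mathcal{F}(\theta) = \sum_{r} \log \sum_{\psi \in \Psi(W_r)} P(\psi \mid W_r)\, P_{\theta}(\psi)

where \Psi(W_r) is the set of variants produced by the phrase level paraphrase model and P(\psi \mid W_r) its paraphrase probability; in practice this can be implemented by accumulating expected (fractional) counts over the WFST-encoded paraphrase lattices.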
Paraphrastic recurrent neural network language models
Recurrent neural network language models (RNNLM) have become an increasingly popular choice for state-of-the-art speech recognition systems. Linguistic factors influencing the realization of surface word sequences, for example, expressive richness, are only implicitly learned by RNNLMs. Observed sentences and their associated alternative paraphrases representing the same meaning are not explicitly related during training. In order to improve context coverage and generalization, paraphrastic RNNLMs are investigated in this paper. Multiple paraphrase variants were automatically generated and used in paraphrastic RNNLM training. Using a paraphrastic multi-level RNNLM modelling both word and phrase sequences, a significant error rate reduction of 0.6% absolute and a perplexity reduction of 10% relative were obtained over the baseline RNNLM on a large vocabulary conversational telephone speech recognition system trained on 2000 hours of audio and 545 million words of texts. The overall improvement over the baseline n-gram LM was increased from 8.4% to 11.6% relative.
The research leading to these results was supported by EPSRC grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation (BOLT) and RATS programs. The paper does not necessarily reflect the position or the policy of US Government and no official endorsement should be inferred. Xie Chen is supported by Toshiba Research Europe Ltd, Cambridge Research Lab. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/ICASSP.2015.717900
Recurrent neural network language model training with noise contrastive estimation for speech recognition
In recent years recurrent neural network language models (RNNLMs) have been successfully applied to a range of tasks including speech recognition. However, an important issue that limits the quantity of data used, and their possible application areas, is the computational cost of training. A significant part of this cost is associated with the softmax function at the output layer, as this requires a normalization term to be explicitly calculated. This impacts both the training and testing speed, especially when a large output vocabulary is used. To address this problem, noise contrastive estimation (NCE) is used in RNNLM training in this paper. It does not require this normalization during either training or testing and is insensitive to the output layer size. On a large vocabulary conversational telephone speech recognition task, a doubling in training speed and a 56-times speedup in test-time evaluation were obtained.
Xie Chen is supported by Toshiba Research Europe Ltd, Cambridge Research Lab. The research leading to these results was also supported by EPSRC grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation (BOLT) and RATS programs. The paper does not necessarily reflect the position or the policy of US Government and no official endorsement should be inferred. The authors would also like to thank Ashish Vaswani from USC for suggestions and discussion on training NNLMs with NCE. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/ICASSP.2015.717900
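For reference, the standard NCE criterion (a generic statement, not a detail specific to this paper's setup) trains the model to discriminate each observed word w_t from k noise words \hat{w}_j drawn from a noise distribution q, so the softmax normaliser never has to be computed exactly:

    J(\theta) = \sum_{t} \Big[ \log \frac{u_\theta(w_t \mid h_t)}{u_\theta(w_t \mid h_t) + k\,q(w_t)} + \sum_{j=1}^{k} \log \frac{k\,q(\hat{w}_j)}{u_\theta(\hat{w}_j \mid h_t) + k\,q(\hat{w}_j)} \Big]

where u_\theta is the unnormalised RNNLM output score. Because training drives u_\theta towards being self-normalised, it can be used directly at test time without evaluating the softmax denominator, which is the source of the reported test-time speedup.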
Improving the training and evaluation efficiency of recurrent neural network language models
Recurrent neural network language models (RNNLMs) are becoming increasingly popular for speech recognition. Previously, we have shown that RNNLMs with a full (non-classed) output layer (F-RNNLMs) can be trained efficiently using a GPU giving a large reduction in training time over conventional class-based models (C-RNNLMs) on a standard CPU. However, since test-time RNNLM evaluation is often performed entirely on a CPU, standard F-RNNLMs are inefficient since the entire output layer needs to be calculated for normalisation. In this paper, it is demonstrated that C-RNNLMs can be efficiently trained on a GPU, using our spliced sentence bunch technique which allows good CPU test-time performance (42x speedup over F-RNNLM). Furthermore, the performance of different classing approaches is investigated. We also examine the use of variance regularisation of the softmax denominator for F-RNNLMs and show that it allows F-RNNLMs to be efficiently used in test (56x speedup on CPU). Finally, the use of two GPUs for F-RNNLM training using pipelining is described and shown to give a reduction in training time over a single GPU by a factor of 1.6.
Xie Chen is supported by Toshiba Research Europe Ltd, Cambridge Research Lab. The research leading to these results was also supported by EPSRC grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation (BOLT) and RATS programs. The paper does not necessarily reflect the position or the policy of US Government and no official endorsement should be inferred. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/ICASSP.2015.717900
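The variance regularisation mentioned above is usually written as a penalty on the spread of the log softmax normaliser \log Z_t; a typical, illustrative form (constants and exact formulation may differ from the paper) is

    J(\theta) = J_{\mathrm{CE}}(\theta) + \frac{\gamma}{2}\cdot\frac{1}{T}\sum_{t=1}^{T}\big(\log Z_t - \overline{\log Z}\big)^2

where J_CE is the usual cross-entropy loss, Z_t the softmax denominator at word position t, and \gamma the penalty weight. Once the variance of \log Z_t is small, the denominator can be treated as approximately constant at test time, which is what allows efficient CPU evaluation of F-RNNLMs.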
Speaker adaptation and adaptive training for jointly optimised tandem systems
Speaker independent (SI) Tandem systems trained by joint optimisation of bottleneck (BN) deep neural networks (DNNs) and Gaussian mixture models (GMMs) have been found to produce similar word error rates (WERs) to Hybrid DNN systems. A key advantage of using GMMs is that existing speaker adaptation methods, such as maximum likelihood linear regression (MLLR), can be used to account for diverse speaker variations and improve system robustness. This paper investigates speaker adaptation and adaptive training (SAT) schemes for jointly optimised Tandem systems. Adaptation techniques investigated include constrained MLLR (CMLLR) transforms based on BN features for SAT, as well as MLLR and parameterised sigmoid functions for unsupervised test-time adaptation. Experiments using English multi-genre broadcast (MGB3) data show that CMLLR SAT yields a 4% relative WER reduction over jointly trained Tandem and Hybrid SI systems, and further reductions in WER are obtained by system combination.
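For reference, the adaptation transforms discussed above take standard affine forms, estimated per speaker s by maximum likelihood under the GMM output distributions:

    CMLLR (feature space):  \hat{x}_t = A^{(s)} x_t + b^{(s)}
    MLLR (model space, mean transform):  \hat{\mu}_m = A^{(s)} \mu_m + b^{(s)}

where x_t is the (bottleneck) feature vector at time t and \mu_m a Gaussian component mean; in SAT the GMMs are trained on CMLLR-transformed features so that speaker variability is largely factored out of the canonical model.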
CUED-RNNLM - An open-source toolkit for efficient training and evaluation of recurrent neural network language models
Two efficient lattice rescoring methods using recurrent neural network language models
An important part of the language modelling problem for automatic speech recognition (ASR) systems, and many other related applications, is to appropriately model long-distance context dependencies in natural languages. Hence, statistical language models (LMs) that can model longer span history contexts, for example, recurrent neural network language models (RNNLMs), have become increasingly popular for state-of-the-art ASR systems. As RNNLMs use a vector representation of complete history contexts, they are normally used to rescore N-best lists. Motivated by their intrinsic characteristics, two efficient lattice rescoring methods for RNNLMs are proposed in this paper. The first method uses an n-gram style clustering of history contexts. The second approach directly exploits the distance measure between recurrent hidden history vectors. Both methods produced 1-best performance comparable to a 10k-best rescoring baseline RNNLM system on two large vocabulary conversational telephone speech recognition tasks for US English and Mandarin Chinese. Consistent lattice size compression and recognition performance improvements after confusion network (CN) decoding were also obtained over the prefix tree structured N-best rescoring approach.
This work was supported by EPSRC under Grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation and RATS programs. The work of X. Chen was supported by Toshiba Research Europe Ltd, Cambridge Research Lab. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/TASLP.2016.255882
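To make the first method concrete, below is a minimal, hypothetical Python sketch of n-gram style history clustering (names and interfaces are illustrative, not taken from the paper or any toolkit): partial lattice paths whose most recent n-1 words agree share a single cached RNNLM state instead of each carrying a distinct full-history state.

    class TinyRNNLM:
        """Stand-in recurrent LM so the sketch is self-contained; a real
        RNNLM would return a learned hidden state vector from step()."""
        def initial_state(self):
            return ()
        def step(self, state, word):
            # Dummy 'hidden state': just the accumulated word history.
            return state + (word,)

    class NGramApproxRescorer:
        """Cache RNNLM states keyed by the truncated (n-1)-word history,
        approximating distinct full histories during lattice expansion."""
        def __init__(self, lm, order=4):
            self.lm = lm
            self.order = order
            self.cache = {}

        def state_for(self, history):
            # Histories agreeing in their last (order - 1) words are merged.
            key = tuple(history[-(self.order - 1):])
            if key not in self.cache:
                state = self.lm.initial_state()
                for w in history:
                    state = self.lm.step(state, w)
                self.cache[key] = state
            return self.cache[key]

    rescorer = NGramApproxRescorer(TinyRNNLM(), order=3)
    s1 = rescorer.state_for(["i", "want", "to", "go"])
    s2 = rescorer.state_for(["we", "want", "to", "go"])
    assert s1 is s2  # both histories end in "to go", so they share one cached state

The second method in the abstract replaces the exact key match with a distance test between the recurrent hidden vectors themselves, merging histories whose vectors fall within a chosen beam.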