Cold Fusion: Training Seq2Seq Models Together with Language Models
Sequence-to-sequence (Seq2Seq) models with attention have excelled at tasks
which involve generating natural language sentences such as machine
translation, image captioning and speech recognition. Performance has further
been improved by leveraging unlabeled data, often in the form of a language
model. In this work, we present the Cold Fusion method, which leverages a
pre-trained language model during training, and show its effectiveness on the
speech recognition task. We show that Seq2Seq models with Cold Fusion are able
to better utilize language information, enjoying i) faster convergence and
better generalization, and ii) almost complete transfer to a new domain while
using less than 10% of the labeled training data.
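The abstract does not spell out the fusion mechanism, so the following is only a minimal sketch of the general idea it describes: a frozen, pre-trained language model whose states are gated into a Seq2Seq decoder's output layer during training. The module names, layer sizes, and the use of a stand-in LSTM as the "pre-trained" LM are all illustrative assumptions, not the paper's exact formulation.

```python
# Sketch only: gated fusion of a frozen LM into a Seq2Seq decoder output layer.
# Sizes, names, and the stand-in LM are assumptions for illustration.
import torch
import torch.nn as nn

class GatedFusionDecoderHead(nn.Module):
    def __init__(self, vocab_size, dec_hidden=512, lm_hidden=512):
        super().__init__()
        self.lm = nn.LSTM(dec_hidden, lm_hidden, batch_first=True)  # stand-in for a pre-trained LM
        for p in self.lm.parameters():
            p.requires_grad = False          # the LM stays frozen while the Seq2Seq model trains
        self.lm_proj = nn.Linear(lm_hidden, dec_hidden)
        self.gate = nn.Linear(dec_hidden + dec_hidden, dec_hidden)
        self.out = nn.Linear(dec_hidden + dec_hidden, vocab_size)

    def forward(self, dec_state, lm_input):
        # dec_state: (batch, time, dec_hidden) states from the Seq2Seq decoder
        # lm_input:  (batch, time, dec_hidden) embedded previous tokens fed to the LM
        lm_state, _ = self.lm(lm_input)
        lm_feat = self.lm_proj(lm_state)
        g = torch.sigmoid(self.gate(torch.cat([dec_state, lm_feat], dim=-1)))
        fused = torch.cat([dec_state, g * lm_feat], dim=-1)
        return self.out(fused)               # per-step vocabulary logits
```

The gate lets the decoder learn, per time step, how much of the language model's signal to use, which is one way to exploit a pre-trained LM during training rather than only at decoding time.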
Character-Word LSTM Language Models
We present a Character-Word Long Short-Term Memory Language Model which both
reduces the perplexity with respect to a baseline word-level language model and
reduces the number of parameters of the model. Character information can reveal
structural (dis)similarities between words and can even be used when a word is
out-of-vocabulary, thus improving the modeling of infrequent and unknown words.
By concatenating word and character embeddings, we achieve up to a 2.77% relative
improvement on English compared to a baseline model with a similar number of
parameters, and 4.57% on Dutch. Moreover, we outperform baseline word-level
models with a larger number of parameters.
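As a rough illustration of the input scheme this abstract describes, the sketch below concatenates a word embedding with the embeddings of that word's characters before a word-level LSTM. The embedding sizes, the fixed number of characters per word, and the class name are assumptions; the paper's own configuration may differ.

```python
# Sketch only: an LSTM language model whose input is the concatenation of a
# word embedding and per-character embeddings. Sizes are assumed, not the paper's.
import torch
import torch.nn as nn

class CharWordLSTMLM(nn.Module):
    def __init__(self, vocab_size, char_vocab_size, word_dim=200, char_dim=25,
                 chars_per_word=8, hidden=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.char_emb = nn.Embedding(char_vocab_size, char_dim)
        self.lstm = nn.LSTM(word_dim + chars_per_word * char_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, time); char_ids: (batch, time, chars_per_word), padded/truncated
        w = self.word_emb(word_ids)
        b, t, c = char_ids.shape
        ch = self.char_emb(char_ids).view(b, t, -1)   # flatten character embeddings per word
        h, _ = self.lstm(torch.cat([w, ch], dim=-1))  # concatenated word + character input
        return self.out(h)                            # next-word logits
```

Because the character embeddings are much smaller than the word embedding, such a model can spend fewer parameters on the input layer while still seeing sub-word structure for rare or unknown words.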
Scaling Recurrent Neural Network Language Models
This paper investigates the scaling properties of Recurrent Neural Network
Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and
address the questions of how RNNLMs scale with respect to model size,
training-set size, computational costs and memory. Our analysis shows that
despite being more costly to train, RNNLMs obtain much lower perplexities on
standard benchmarks than n-gram models. We train the largest known RNNs and
present relative word error rate gains of 18% on an ASR task. We also present
the new lowest perplexities on the recently released billion-word language
modelling benchmark, a 1 BLEU point gain on machine translation, and a 17%
relative hit rate gain in word prediction.
