Language Modeling with Power Low Rank Ensembles
We present power low rank ensembles (PLRE), a flexible framework for n-gram language modeling in which ensembles of low rank matrices and tensors are used to obtain smoothed probability estimates of words in context. Our method can be understood as a generalization of n-gram modeling to non-integer n, and it includes standard techniques such as absolute discounting and Kneser-Ney smoothing as special cases. PLRE training is efficient, and our approach outperforms state-of-the-art modified Kneser-Ney baselines both in perplexity on large corpora and in BLEU score on a downstream machine translation task.
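To make the two special cases named in the abstract concrete, here is a minimal sketch of an interpolated bigram model with absolute discounting, plus the Kneser-Ney variant that swaps the unigram back-off for a continuation distribution. This is not PLRE itself. It is the textbook single-discount bigram form (modified Kneser-Ney uses three discounts), and the function name and the `d=0.75` default are illustrative assumptions.

```python
from collections import Counter, defaultdict

def interpolated_bigram(tokens, d=0.75, kneser_ney=False):
    """Return prob(w, v) ~ p(w | v) for an interpolated bigram model (assumes 0 < d <= 1).

    p(w|v) = max(c(v,w) - d, 0) / c(v) + d * N1+(v, .) / c(v) * p_lower(w)
    p_lower is the unigram distribution (absolute discounting) or the
    continuation distribution N1+(., w) / N1+(., .) (Kneser-Ney).
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])    # c(v): occurrences of v as a context
    followers = defaultdict(set)             # distinct words observed after v
    for v, w in bigrams:
        followers[v].add(w)

    if kneser_ney:
        predecessors = Counter(w for (_, w) in bigrams)  # N1+(., w): distinct left contexts
        n_types = len(bigrams)                           # N1+(., .): distinct bigram types
        lower = lambda w: predecessors[w] / n_types
    else:
        unigram = Counter(tokens)
        total = len(tokens)
        lower = lambda w: unigram[w] / total

    def prob(w, v):
        c_v = context_counts[v]
        if c_v == 0:
            return lower(w)                  # unseen context: use the lower-order model only
        discounted = max(bigrams[(v, w)] - d, 0) / c_v
        backoff_mass = d * len(followers[v]) / c_v  # probability mass freed by discounting
        return discounted + backoff_mass * lower(w)

    return prob
```

For example, with `tokens = "a b a b a c".split()` and the unigram back-off, `p(b | a) = (2 - 0.75)/3 + (0.75 * 2/3) * (2/6) = 7/12`. Because every discounted count is redistributed through the back-off term, each conditional distribution still sums to one.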
Efficient subsampling for training complex language models
We propose an efficient way to train maximum entropy language models (MELM) and neural network language models (NNLM). The advantage of the proposed method comes from a more robust and efficient subsampling technique. The original multi-class language modeling problem is transformed into a set of binary problems, where each binary classifier predicts whether or not a particular word will occur. We show that the binarized model is as powerful as the standard model and allows us to aggressively subsample negative training examples without sacrificing predictive performance. Empirical results show that we can train MELM and NNLM at 1% to 5% of the standard complexity with no loss in performance.
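The binarization and negative subsampling described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure. Each (context, next word) pair is expanded into one binary example per vocabulary word. Positives are always kept. Each negative is kept with probability `neg_rate` and, as an assumed correction that the abstract does not specify, reweighted by `1/neg_rate` so that the expected weighted loss matches the full binary objective. The function names and the logit-based `score` callback are hypothetical.

```python
import math
import random

def binarize_with_subsampling(examples, vocab, neg_rate, seed=0):
    """Expand (context, target) pairs into weighted binary examples.

    Returns tuples (context, word, label, weight): label 1 for the observed
    next word, 0 otherwise. Negatives are kept with probability neg_rate and
    given weight 1/neg_rate (an assumed importance-weighting correction).
    """
    rng = random.Random(seed)
    out = []
    for context, target in examples:
        for w in vocab:
            if w == target:
                out.append((context, w, 1, 1.0))              # positives are always kept
            elif rng.random() < neg_rate:
                out.append((context, w, 0, 1.0 / neg_rate))   # sampled negative, upweighted
    return out

def weighted_log_loss(binary_examples, score):
    """Weighted binary cross-entropy; score(context, w) returns a logit."""
    total = 0.0
    for context, w, label, weight in binary_examples:
        p = 1.0 / (1.0 + math.exp(-score(context, w)))  # per-word sigmoid classifier
        total -= weight * (math.log(p) if label else math.log(1.0 - p))
    return total
```

With a vocabulary of 100 words and `neg_rate=0.05`, each context contributes one positive and about five negatives instead of 99. This is the kind of reduction that makes aggressive subsampling cheap, although the paper's actual sampling scheme may differ from this uniform one.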