In this paper, we address the problem of estimating stochastic language models based on n-gram statistics. We present a novel approach, rational interpolation, for combining a set of competing conditional n-gram word probability predictors; it consistently outperforms the traditional linear interpolation scheme. The superiority of rational interpolation is substantiated by experimental results from language modeling, speech recognition, dialog act classification, and language identification.

1. INTRODUCTION

In this paper, we address the problem of estimating stochastic language models $P(w)$ for sentences $w = w_1 \ldots w_T$ of words $w_t$ from a finite vocabulary $V$. The joint distribution $P(w)$ can be decomposed by the well-known chain rule

$$P(w) = \prod_{t=1}^{T} P(w_t \mid w_1^{t-1}) = \prod_{t=1}^{T} P(w_t \mid w_1 \ldots w_{t-1}) \tag{1}$$

into a product of conditional word probabilities (by $w_s^t$ we denote the substring $w_s \ldots w_t$ of $w$). The latter, in turn, are usually approximate..