Hierarchical Bayesian Nonparametric Models for Power-Law Sequences
Sequence data that exhibits power-law behavior in its marginal and conditional distributions arises frequently from natural processes, with natural language text being a prominent example. We study probabilistic models for such sequences based on a hierarchical nonparametric Bayesian prior, develop inference and learning procedures that make these models practical and applicable to large, real-world data sets, and empirically demonstrate their excellent predictive performance. In particular, we consider models based on the infinite-depth variant of the hierarchical Pitman-Yor process (HPYP) language model [Teh, 2006b], known as the Sequence Memoizer, as well as Sequence Memoizer-based cache language models and hybrid models combining the HPYP with neural language models. We empirically demonstrate that these models perform well on language modelling and data compression tasks.
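The power-law behavior this abstract refers to can be illustrated with the two-parameter Chinese restaurant process, the sequential sampling scheme underlying the Pitman-Yor process. The sketch below is illustrative only (it is not the Sequence Memoizer); the parameter names `d` (discount) and `theta` (concentration) follow common convention.

```python
import random

def pitman_yor_crp(n, d=0.5, theta=1.0, seed=0):
    """Sample table sizes from a two-parameter (Pitman-Yor) Chinese
    restaurant process.

    With discount d > 0, the resulting table sizes are heavy-tailed,
    mimicking the power-law word-frequency distributions of natural
    language text.
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for i in range(n):
        k = len(tables)
        # Customer i opens a new table with probability (theta + d*k) / (theta + i)
        if rng.random() < (theta + d * k) / (theta + i):
            tables.append(1)
        else:
            # Otherwise join table j with probability proportional to (tables[j] - d)
            weights = [c - d for c in tables]
            r = rng.random() * sum(weights)
            for j, w in enumerate(weights):
                r -= w
                if r <= 0:
                    tables[j] += 1
                    break
    return tables

sizes = pitman_yor_crp(10000)
# Heavy tail: many singleton tables alongside a few very large ones.
```

With `d = 0` this reduces to the ordinary Dirichlet-process CRP, whose table sizes are not power-law distributed; the discount is what produces the heavy tail.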
Neural forecasting: Introduction and literature overview
Neural network based forecasting methods have become ubiquitous in large-scale industrial forecasting applications in recent years. As the prevalence of neural network based solutions among the best entries in the recent M4 competition shows, the popularity of neural forecasting methods is not limited to industry and has also reached academia. This article aims to provide an introduction to, and an overview of, some of the advances that have permitted the resurgence of neural networks in machine learning. Building on these foundations, the article then surveys the recent literature on neural networks for forecasting and their applications.
Comment: 66 pages, 5 figures
Lossless compression based on the Sequence Memoizer
In this work we describe a sequence compression method that combines a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of Pitman-Yor processes of unbounded depth previously proposed by Wood et al. [16] in the context of language modelling, captures long-range dependencies by allowing conditioning contexts of unbounded length. We show that incremental approximate inference can be performed in this model, thereby allowing it to be used in a text compression setting. The resulting compressor reliably outperforms several PPM variants on many types of data, and is particularly effective on data that exhibits power-law properties.
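The link between sequence modelling and compression that this abstract relies on can be sketched directly: an entropy coder (e.g. arithmetic coding) emits close to -log2 p(symbol | context) bits per symbol, so the total ideal code length measures model quality. The toy model below is a fixed-order adaptive context model with add-one smoothing, a stand-in for the unbounded-depth Pitman-Yor hierarchy in the paper, not the Sequence Memoizer itself; the function name and order parameter are illustrative choices.

```python
import math
from collections import defaultdict

def ideal_code_length(data, order=3):
    """Return the ideal code length in bits of `data` (a bytes object)
    under a fixed-order adaptive context model with add-one smoothing.

    An entropy coder achieves within a few bits of
    sum over symbols of -log2 p(symbol | preceding `order` bytes).
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    alphabet = 256
    bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]
        # Smoothed predictive probability, then update the model online.
        p = (counts[ctx][sym] + 1) / (totals[ctx] + alphabet)
        bits += -math.log2(p)
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits

text = b"abracadabra " * 50
ratio = ideal_code_length(text) / (8 * len(text))
# Repetitive data yields a compression ratio well below 1.
```

A longer, adaptively chosen context (as in the unbounded-depth model of the paper) would predict better on power-law data than this fixed order-3 model, which is precisely the gap the Sequence Memoizer exploits.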