IEEE Workshop on Signal Processing Advances in Wireless Communications, SPAWC
Abstract
The identification of useful temporal dependence
structure in discrete time series data is an important component
of algorithms applied to many tasks in statistical inference and
machine learning, and used in a wide variety of problems across
the spectrum of biological studies. Most of the early statistical
approaches were ineffective in practice, because the amount of
data required for reliable modelling grew exponentially with
memory length. On the other hand, many of the more modern
methodological approaches that make use of more flexible and
parsimonious models result in algorithms that do not scale
well and are computationally inefficient for larger data sets.
In this paper we describe a class of novel methodological tools
for effective Bayesian inference for general discrete time series,
motivated primarily by questions regarding data originating from
studies in genetics and neuroscience.
Our starting point is the development of a rich class of Bayesian hierarchical models for variable-memory Markov chains.
The particular prior structure we adopt makes it possible to
design effective, linear-time algorithms that can compute most of
the important features of the relevant posterior and predictive
distributions without resorting to Markov chain Monte Carlo
simulation. The origin of some of these algorithms can be traced
to the family of Context Tree Weighting (CTW) algorithms developed for data compression since the mid-1990s. We have used the
resulting methodological tools in numerous application-specific
tasks (including prediction, segmentation, classification, anomaly
detection, entropy estimation, and causality testing) on data from
different areas of application. The results obtained compare quite
favourably with those obtained using earlier approaches, such as
Probabilistic Suffix Trees (PST), Variable-Length Markov Chains
(VLMC), and the class of Mixture Transition Distributions (MTD)