Context Tree Selection: A Unifying View
The present paper investigates non-asymptotic properties of two popular procedures for context tree (or Variable Length Markov Chain) estimation: Rissanen's algorithm Context and the Penalized Maximum Likelihood criterion. We first show how they are related, then prove finite-horizon bounds for the probability of over- and under-estimation. For overestimation, no boundedness or loss-of-memory conditions are required: the proof relies on new deviation inequalities for empirical probabilities that are of independent interest. The underestimation bounds rely on loss-of-memory and separation conditions on the process. These results improve and generalize previously obtained bounds. Context tree models were introduced by Rissanen as a parsimonious generalization of Markov models, and have since been widely used in applied probability and statistics.
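To make the parsimony of context tree (VLMC) models concrete, the following sketch (helper and variable names are illustrative, not from the paper) predicts the next symbol by looking up the longest stored suffix of the past, so only histories that actually matter need long memory:

```python
def vlmc_predict(tree, past):
    """Return the next-symbol distribution of a variable-length Markov
    chain: use the longest suffix of `past` that appears as a context
    in `tree` (a dict mapping context strings to distributions).
    The empty context "" acts as the root/fallback."""
    for j in range(len(past) + 1):
        ctx = past[j:]          # suffixes of the past, longest first
        if ctx in tree:
            return tree[ctx]
    raise KeyError('tree must contain the empty context ""')

# a tiny (illustrative) context tree: some histories need two symbols
# of memory, others only one, the rest fall back to the default
tree = {
    "":   {"0": 0.5, "1": 0.5},   # default
    "1":  {"0": 0.9, "1": 0.1},   # a final 1 is informative on its own
    "00": {"0": 0.2, "1": 0.8},   # after 00 two symbols of memory help
}
```

A full-order Markov model of depth 2 would need a distribution for all four length-2 contexts; the context tree keeps only the ones the data supports.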
Context tree switching
This paper describes the Context Tree Switching technique, a modification of Context Tree Weighting for the prediction of binary, stationary, n-Markov sources. By modifying Context Tree Weighting's recursive weighting scheme, it is possible to mix over a strictly larger class of models without increasing the asymptotic time or space complexity of the original algorithm. We prove that this generalization preserves the desirable theoretical properties of Context Tree Weighting on stationary n-Markov sources, and show empirically that this new technique leads to consistent improvements over Context Tree Weighting as measured on the Calgary Corpus.
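For readers unfamiliar with the recursive weighting scheme being modified, here is a minimal batch sketch of standard binary Context Tree Weighting with the Krichevsky-Trofimov (KT) estimator. The names and the batch (rather than sequential) structure are my own simplifications, not the papers':

```python
def kt_prob(a, b):
    """KT block probability of a binary sequence with a zeros and b ones."""
    p, na, nb = 1.0, 0, 0
    for _ in range(a):
        p *= (na + 0.5) / (na + nb + 1)
        na += 1
    for _ in range(b):
        p *= (nb + 0.5) / (na + nb + 1)
        nb += 1
    return p

def ctw_prob(seq, depth):
    """CTW mixture probability of seq[depth:], with seq[:depth] as past.
    Recursive weighting: at an internal node,
        P_w = 1/2 * P_KT + 1/2 * P_w(0-child) * P_w(1-child),
    and P_w = P_KT at maximal depth."""
    counts = {}                                   # context -> [#zeros, #ones]
    for t in range(depth, len(seq)):
        ctx = tuple(seq[t - depth:t][::-1])       # most recent symbol first
        for d in range(depth + 1):
            counts.setdefault(ctx[:d], [0, 0])[seq[t]] += 1

    def pw(s):
        a, b = counts.get(s, (0, 0))
        pe = kt_prob(a, b)
        if len(s) == depth:
            return pe
        return 0.5 * pe + 0.5 * pw(s + (0,)) * pw(s + (1,))

    return pw(())

# ideal code length in bits is -log2(ctw_prob(seq, depth))
```

The 1/2 mixing weights at each node are what CTW pays for not knowing the best pruned tree in advance; Context Tree Switching replaces this fixed mixture with one that can switch between models over time.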
Adaptive context tree weighting
We describe an adaptive context tree weighting (ACTW) algorithm, an extension of the standard context tree weighting (CTW) algorithm. Unlike standard CTW, which weights all observations equally regardless of depth, ACTW gives increasing weight to more recent observations, aiming to improve performance when the input sequence comes from a non-stationary distribution. Data compression results show ACTW variants improving over CTW on merged files from standard compression benchmark tests, while never being significantly worse on any individual file.
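The abstract does not spell out the ACTW weighting scheme, but the flavor of "increasing weight on recent observations" can be sketched with a discounted KT estimator, in which old counts decay by a factor gamma before each update. Both the scheme and the constant are illustrative assumptions, not the paper's exact algorithm:

```python
def discounted_kt_step(counts, symbol, gamma=0.9):
    """Decay past counts by gamma, return the predictive probability
    of `symbol`, then update counts in place. gamma=1.0 recovers the
    plain (equal-weight) KT estimator."""
    a, b = counts[0] * gamma, counts[1] * gamma
    p = ((b if symbol else a) + 0.5) / (a + b + 1.0)
    counts[0], counts[1] = (a, b + 1) if symbol else (a + 1, b)
    return p

def final_prob_of_one(gamma):
    """Feed 50 zeros then 10 ones; return the predictive probability
    assigned to the last '1'. After the regime switch, a discounted
    estimator (gamma < 1) should adapt faster than plain KT."""
    c, p = [0.0, 0.0], 0.0
    for s in [0] * 50 + [1] * 10:
        p = discounted_kt_step(c, s, gamma)
    return p
```

With gamma < 1 the effective sample size is bounded by 1 / (1 - gamma), so evidence from the stale regime is forgotten geometrically.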
A Universal Parallel Two-Pass MDL Context Tree Compression Algorithm
Computing problems that handle large amounts of data necessitate the use of
lossless data compression for efficient storage and transmission. We present a
novel lossless universal data compression algorithm that uses parallel
computational units to increase the throughput. The length- input sequence
is partitioned into blocks. Processing each block independently of the
other blocks can accelerate the computation by a factor of , but degrades
the compression quality. Instead, our approach is to first estimate the minimum
description length (MDL) context tree source underlying the entire input, and
then encode each of the blocks in parallel based on the MDL source. With
this two-pass approach, the compression loss incurred by using more parallel
units is insignificant. Our algorithm is work-efficient, i.e., its
computational complexity is . Its redundancy is approximately
bits above Rissanen's lower bound on universal compression
performance, with respect to any context tree source whose maximal depth is at
most . We improve the compression by using different quantizers for
states of the context tree based on the number of symbols corresponding to
those states. Numerical results from a prototype implementation suggest that
our algorithm offers a better trade-off between compression and throughput than
competing universal data compression algorithms.

Comment: Accepted to the Journal of Selected Topics in Signal Processing special issue on Signal Processing for Big Data (expected publication date June 2015). 10 pages double column, 6 figures, and 2 tables. arXiv admin note: substantial text overlap with arXiv:1405.6322. Version Mar 2015: corrected a typo.
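The two-pass idea described above, fitting one model to the entire input and then encoding each block independently against that frozen model, can be sketched as follows. For simplicity this uses a fixed-order Markov model with Laplace smoothing rather than an MDL-pruned context tree, and it computes ideal code lengths instead of producing a bitstream; all names are illustrative:

```python
from math import log2
from collections import Counter, defaultdict

def fit_model(seq, k):
    """Pass 1: next-symbol counts for every length-k context in seq."""
    model = defaultdict(Counter)
    for t in range(k, len(seq)):
        model[seq[t - k:t]][seq[t]] += 1
    return model

def block_bits(block, past, model, k, alphabet_size):
    """Pass 2: ideal code length (bits) of `block` under the frozen
    model, with Laplace smoothing; `past` supplies the initial context."""
    data = past[-k:] + block
    bits = 0.0
    for t in range(k, len(data)):
        c = model.get(data[t - k:t], Counter())
        bits -= log2((c[data[t]] + 1) / (sum(c.values()) + alphabet_size))
    return bits

text, k, B = "abracadabra" * 40, 2, 4
model = fit_model(text, k)                 # pass 1 over the whole input
size = len(text) // B
starts = [i * size for i in range(B)]
blocks = [text[s:s + size] for s in starts]
# each call below is independent of the others, so pass 2 can be
# distributed across B parallel workers without any compression loss
bits = [block_bits(blk, text[:s], model, k, len(set(text)))
        for blk, s in zip(blocks, starts)]
```

Because every block is coded against the same frozen model (each worker only needs the last k symbols before its block as context), the parallel code lengths sum to exactly the single-pass code length; in a real codec the model itself must also be described, which is where the MDL context tree estimation of the paper comes in.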