On Prediction Using Variable Order Markov Models
This paper is concerned with algorithms for prediction of discrete sequences
over a finite alphabet, using variable order Markov models. The class of such
algorithms is large and in principle includes any lossless compression
algorithm. We focus on six prominent prediction algorithms, including Context
Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic
Suffix Trees (PSTs). We discuss the properties of these algorithms and compare
their performance using real-life sequences from three domains: proteins,
English text and music pieces. The comparison is made with respect to
prediction quality as measured by the average log-loss. We also compare
classification algorithms based on these predictors with respect to a number of
large protein classification tasks. Our results indicate that a "decomposed"
CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in
sequence prediction tasks. Somewhat surprisingly, a different algorithm, which
is a modification of the Lempel-Ziv compression algorithm, significantly
outperforms all algorithms on the protein classification problems.
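To make the evaluation metric concrete, here is a minimal sketch of average log-loss scoring for an online sequence predictor. The fixed-order Markov model with Laplace smoothing is an illustrative stand-in, not one of the six algorithms compared in the paper; variable-order methods such as CTW and PPM mix over context lengths rather than fixing one.

```python
import math
from collections import defaultdict

def avg_log_loss(sequence, alphabet, order=2):
    """Average log-loss (bits per symbol) of an online fixed-order Markov
    predictor with add-one (Laplace) smoothing."""
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol counts
    loss = 0.0
    for i, sym in enumerate(sequence):
        ctx = sequence[max(0, i - order):i]        # the last `order` symbols
        seen = counts[ctx]
        p = (seen[sym] + 1) / (sum(seen.values()) + len(alphabet))
        loss -= math.log2(p)                       # log-loss of this prediction
        seen[sym] += 1                             # update only after predicting
    return loss / len(sequence)

print(avg_log_loss("abracadabra", alphabet=set("abcdr")))
```

Lower average log-loss means the model assigns higher probability to the symbols that actually occur, which is why the same score ranks both predictors and the compressors built from them.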
Context-Tree-Based Lossy Compression and Its Application to CSI Representation
We propose novel compression algorithms for time-varying channel state
information (CSI) in wireless communications. The proposed scheme combines
(lossy) vector quantisation and (lossless) compression. First, the new vector
quantisation technique is based on a class of parametrised companders applied
on each component of the normalised CSI vector. Our algorithm chooses a
suitable compander in an intuitively simple way whenever empirical data are
available. Then, the sequences of quantisation indices are compressed using a
context-tree-based approach. Essentially, we update the estimate of the
conditional distribution of the source at each instant and encode the current
symbol with the estimated distribution. The algorithms have low complexity, are
linear-time in both the spatial dimension and time duration, and can be
implemented in an online fashion. We run simulations to demonstrate the
effectiveness of the proposed algorithms in these settings.
Comment: 12 pages, 9 figures. Accepted for publication in the IEEE Transactions on Communications.
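As a rough illustration of the coding stage described above (update the estimate of the conditional distribution at each instant, then encode the current symbol with it), the following sketch accumulates the ideal code length under a per-context Krichevsky-Trofimov estimator. The fixed context length is a simplifying assumption (a real context-tree coder weights over context lengths), and the compander-based quantisation step is omitted.

```python
import math
from collections import defaultdict

def adaptive_code_length(indices, num_levels, context_len=1):
    """Ideal code length (in bits) when each quantisation index is encoded
    with a per-context Krichevsky-Trofimov (KT) estimate that is updated
    online: -log2 of the probability assigned to the symbol that occurred."""
    counts = defaultdict(lambda: [0] * num_levels)  # context -> symbol counts
    bits = 0.0
    for t, sym in enumerate(indices):
        ctx = tuple(indices[max(0, t - context_len):t])
        n = counts[ctx]
        p = (n[sym] + 0.5) / (sum(n) + num_levels / 2)  # KT estimator
        bits -= math.log2(p)
        n[sym] += 1  # sequential update keeps encoder and decoder in sync
    return bits

# A slowly varying index sequence, as time-varying CSI tends to produce,
# codes cheaply under such a model:
print(adaptive_code_length([0, 0, 1, 1, 1, 2, 2, 1, 1, 0], num_levels=4))
```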
Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees
We propose a novel nonparametric online predictor for discrete labels
conditioned on multivariate continuous features. The predictor is based on a
feature space discretization induced by a full-fledged k-d tree with randomly
picked directions and a recursive Bayesian distribution, which allows the
predictor to automatically learn the most relevant feature scales characterizing the
conditional distribution. We prove its pointwise universality, i.e., it
achieves a normalized log loss performance asymptotically as good as the true
conditional entropy of the labels given the features. The time complexity to
process the n-th sample point is O(log n) in probability with respect to
the distribution generating the data points, whereas other exact nonparametric
methods require processing all past observations. Experiments on challenging
datasets show the computational and statistical efficiency of our algorithm in
comparison to standard and state-of-the-art methods.
Comment: Camera-ready version published in NeurIPS 2019.
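A toy sketch of the discretization idea: a k-d tree with randomly picked split axes routes each feature vector to a leaf that keeps label counts. The class name, the leaf-splitting rule, and the Laplace-smoothed leaf estimate are all illustrative assumptions; in particular, the paper's recursive Bayesian mixture over tree depths, which drives the universality guarantee, is omitted here.

```python
import random
from collections import Counter

class RandomKDTreePredictor:
    """Toy sketch: a k-d tree with randomly chosen split axes discretizes
    the feature space, each leaf keeps label counts, and the prediction is
    a Laplace-smoothed estimate at the leaf reached by x."""

    def __init__(self, dim, labels):
        self.dim, self.labels = dim, labels
        self.root = self._new_node()

    def _new_node(self):
        return {"axis": random.randrange(self.dim), "split": None,
                "counts": Counter(), "kids": {}}

    def _leaf(self, x):
        node = self.root
        while node["split"] is not None:
            side = x[node["axis"]] >= node["split"]
            node = node["kids"].setdefault(side, self._new_node())
        return node

    def predict(self, x):
        c = self._leaf(x)["counts"]
        n = sum(c.values())
        return {y: (c[y] + 1) / (n + len(self.labels)) for y in self.labels}

    def update(self, x, y):
        leaf = self._leaf(x)
        leaf["counts"][y] += 1
        if sum(leaf["counts"].values()) >= 4:  # split a leaf once it fills up
            leaf["split"] = x[leaf["axis"]]

predictor = RandomKDTreePredictor(dim=2, labels=["spam", "ham"])
for x, y in [((0.1, 0.9), "spam"), ((0.8, 0.2), "ham"), ((0.2, 0.8), "spam")]:
    print(predictor.predict(x))  # predict first, then reveal the label
    predictor.update(x, y)
```

For non-degenerate data the depth of such a tree grows roughly logarithmically in the sample size, which is what makes per-sample processing cheap compared to exact methods that revisit all past observations.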
On Statistical Data Compression
The ongoing evolution of hardware leads to a steady increase in the amount
of data that is processed, transmitted and stored. Data compression is an
essential tool to keep the amount of data manageable. In terms of empirical
performance, statistical data compression algorithms rank among the best. A
statistical data compressor processes an input text letter by letter and
compresses in two stages, modeling and coding. During modeling, a model
estimates a probability distribution on the next letter based on the past
input. During coding, an encoder translates this distribution and the next
letter into a codeword. Decoding reverts this process. The model is
exchangeable and its choice determines a statistical data compression
algorithm. All major models use a mixer to combine multiple simple
probability estimators, so-called elementary models. In statistical data
compression there is a gap between theory and practice. On the one hand,
theoreticians put emphasis on models that allow for a mathematical analysis
but neglect running time, space consumption and empirical improvements; on
the other hand, practitioners focus on the very reverse. The family of PAQ
statistical compressors demonstrated the superiority of the practitioner's
approach in terms of empirical compression. With this thesis we attempt to
bridge the aforementioned gap between theory and practice, with special
focus on PAQ. To achieve this we apply the theoretician's tools to the
practitioner's approaches: we provide a code length analysis for several
practical modeling and mixing techniques. The analysis covers modeling by
relative frequencies with frequency discount and modeling by exponential
smoothing of probabilities. For mixing we consider linearly and
geometrically weighted averaging of probabilities, with Online Gradient
Descent for weight estimation. Our results show that the models and mixers
we consider perform nearly as well as idealized competitors. Experiments
support our analysis. Moreover, our results add a theoretical basis to
modeling and mixing from PAQ and generalize methods from PAQ. Ultimately,
we propose and analyze Context Tree Mixing (CTM), a generalization of
Context Tree Weighting (CTW). We couple CTM with modeling and mixing
techniques from PAQ and obtain a theoretically sound compression algorithm
that improves over CTW, as shown in experiments.
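As an illustration of one of the analyzed techniques, here is a sketch of geometric (logistic) mixing of binary probability estimates with Online Gradient Descent on the code length. The step size, the zero weight initialization, and the two-model toy loop are illustrative assumptions; real PAQ-style mixers combine predictions from many context models.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

class GeometricMixer:
    """Geometric (logistic) mixing of binary probability estimates, with
    Online Gradient Descent on the code length (log loss) for the weights."""

    def __init__(self, num_models, lr=0.1):
        self.w = [0.0] * num_models
        self.lr = lr

    def mix(self, probs):
        # a weighted sum in logit space is a normalized geometric average
        z = sum(w * logit(p) for w, p in zip(self.w, probs))
        return 1 / (1 + math.exp(-z))

    def update(self, probs, bit):
        p = self.mix(probs)
        for i, q in enumerate(probs):
            # descent step: the log-loss gradient w.r.t. w[i] is (p - bit) * logit(q)
            self.w[i] += self.lr * (bit - p) * logit(q)

mixer = GeometricMixer(num_models=2)
for bit in [1, 1, 0, 1, 1, 1, 0, 1]:
    probs = [0.7, 0.4]    # stand-ins for two elementary model outputs
    p = mixer.mix(probs)  # mixed probability that the next bit is 1
    mixer.update(probs, bit)
print(mixer.w)  # the mixer learns to favor the better-calibrated model
```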
Asymptotics of Continuous Bayes for Non-i.i.d. Sources
Clarke and Barron analysed the relative entropy between an i.i.d. source and
a Bayesian mixture over a continuous class containing that source. In this
paper a comparable result is obtained when the source is permitted to be both
non-stationary and dependent. The main theorem shows that Bayesian methods
perform well for both compression and sequence prediction even in this most
general setting, with only mild technical assumptions.
Comment: 16 pages, 1 figure.
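For reference, the i.i.d. baseline being generalized is Clarke and Barron's asymptotic expansion of that relative entropy for a smooth d-parameter family with prior density w. The formula below is quoted from the literature, not from this abstract:

```latex
% Clarke-Barron asymptotics: redundancy of the Bayes mixture M^n
% against the n-fold i.i.d. source P_theta^n
D\!\left(P_\theta^{\,n} \,\big\|\, M^n\right)
  = \frac{d}{2}\log\frac{n}{2\pi e}
  + \frac{1}{2}\log\det I(\theta)
  + \log\frac{1}{w(\theta)} + o(1)
```

Here I(theta) is the Fisher information matrix; the paper's contribution is to relax the i.i.d. and stationarity assumptions behind this kind of expansion.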
Combined Industry, Space and Earth Science Data Compression Workshop
The sixth annual Space and Earth Science Data Compression Workshop and the third annual Data Compression Industry Workshop were held as a single combined workshop. The workshop was held April 4, 1996 in Snowbird, Utah in conjunction with the 1996 IEEE Data Compression Conference, which was held at the same location March 31 - April 3, 1996. The Space and Earth Science Data Compression sessions seek to explore opportunities for data compression to enhance the collection, analysis, and retrieval of space and earth science data. Of particular interest is data compression research that is integrated into, or has the potential to be integrated into, a particular space or earth science data information system. Preference is given to data compression research that takes into account the scientist's data requirements, and the constraints imposed by the data collection, transmission, distribution and archival systems.
Sparse adaptive Dirichlet-multinomial-like processes
Online estimation and modelling of i.i.d. data for short
sequences over large or complex "alphabets" is a ubiquitous
(sub)problem in machine learning, information theory, data
compression, statistical language processing, and document
analysis. The Dirichlet-Multinomial distribution (also called
Polya urn scheme) and extensions thereof are widely applied for
online i.i.d. estimation. However, good a priori choices for the
parameters in this regime are difficult to obtain. I
derive an optimal adaptive choice for the main parameter via
tight, data-dependent redundancy bounds for a related model. The
1-line recommendation is to set the 'total mass' = 'precision' =
'concentration' parameter to m / (2 ln((n+1)/m)), where n
is the (past) sample size and m the number of different symbols
observed (so far). The resulting estimator is simple, online,
fast, and its experimental performance is superb.
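A small sketch of that recommendation, assuming the adaptive total mass is spread uniformly over the alphabet in a standard Dirichlet-multinomial predictive rule; the paper's sparse variant treats unseen symbols more carefully, and the function name and the n = 0 guard are mine.

```python
import math
from collections import Counter

def adaptive_dm_predictor(counts, alphabet_size):
    """Dirichlet-multinomial predictive probabilities with the adaptive
    total mass beta = m / (2 ln((n+1)/m)) recommended above, where n is
    the sample size and m the number of distinct symbols seen so far."""
    n = sum(counts.values())
    m = len(counts)
    beta = m / (2 * math.log((n + 1) / m)) if n > 0 else 1.0  # guard: no data yet
    return lambda a: (counts.get(a, 0) + beta / alphabet_size) / (n + beta)

p = adaptive_dm_predictor(Counter("aababc"), alphabet_size=26)
print(p("a"), p("z"))  # frequent symbol vs. never-seen symbol
```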
- âŠ