Online Learning of k-CNF Boolean Functions
This paper revisits the problem of learning a k-CNF Boolean function from
examples in the context of online learning under the logarithmic loss. In doing
so, we give a Bayesian interpretation to one of Valiant's celebrated PAC
learning algorithms, which we then build upon to derive two efficient, online,
probabilistic, supervised learning algorithms for predicting the output of an
unknown k-CNF Boolean function. We analyze the loss of our methods, and show
that the cumulative log-loss can be upper bounded, ignoring logarithmic
factors, by a polynomial function of the size of each example.
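The elimination idea behind Valiant's PAC algorithm for k-CNF, which the paper reinterprets, can be sketched as follows (a minimal illustration with our own naming; the paper's online Bayesian algorithms refine this): start with every non-tautological clause of at most k literals and delete each clause falsified by a positive example.

```python
from itertools import combinations, product

def all_clauses(n, k):
    """All disjunctions of at most k literals over variables x_0..x_{n-1}.
    A literal is (index, polarity); a clause is a frozenset of literals."""
    lits = [(i, b) for i in range(n) for b in (True, False)]
    clauses = set()
    for size in range(1, k + 1):
        for combo in combinations(lits, size):
            # skip tautological clauses that contain both x and not-x
            if len({i for i, _ in combo}) == size:
                clauses.add(frozenset(combo))
    return clauses

def satisfied(clause, x):
    return any(x[i] == b for i, b in clause)

def learn_kcnf(examples, n, k):
    """Valiant-style elimination: keep every clause consistent with all
    positive examples; the hypothesis is their conjunction.  Negative
    examples are ignored: every clause of the target survives, so
    negatives are rejected by the hypothesis automatically."""
    h = all_clauses(n, k)
    for x, label in examples:
        if label:  # positive example: drop clauses it falsifies
            h = {c for c in h if satisfied(c, x)}
    return h

def predict(h, x):
    return all(satisfied(c, x) for c in h)
```

On a target that truly is a k-CNF, the surviving conjunction accepts all positives by construction and rejects all negatives because the target's own clauses are never eliminated.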
On Probability Estimation via Relative Frequencies and Discount
Probability estimation is an elementary building block of every statistical
data compression algorithm. In practice, probability estimation is often based
on relative letter frequencies, which are scaled down when their sum grows too
large. Such algorithms are attractive in terms of memory requirements, running
time and practical performance. However, there still is a lack of theoretical
understanding. In this work we formulate a typical probability estimation
algorithm based on relative frequencies and frequency discount, Algorithm RFD.
Our main contribution is its theoretical analysis. We show that the code length
it requires above an arbitrary piecewise stationary model with bounded and
unbounded letter probabilities is small. This theoretically confirms the
recency effect of periodic frequency discount, which has often been observed
empirically.
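The frequency-with-discount scheme can be sketched in a few lines (an illustrative toy, with our own class name, threshold, and discount factor, not the paper's exact Algorithm RFD):

```python
class RFD:
    """Relative-frequency estimator with periodic discount (sketch).
    Counts are scaled down whenever their sum exceeds a threshold, so
    recent letters dominate the estimate (the recency effect)."""

    def __init__(self, alphabet_size, threshold=64, discount=0.5):
        self.counts = [1.0] * alphabet_size   # Laplace-style initialization
        self.threshold = threshold
        self.discount = discount

    def prob(self, letter):
        """Estimated probability of `letter` from relative frequencies."""
        return self.counts[letter] / sum(self.counts)

    def update(self, letter):
        self.counts[letter] += 1.0
        if sum(self.counts) > self.threshold:
            # scale all counts down; old observations fade geometrically
            self.counts = [c * self.discount for c in self.counts]
```

After a long run of one letter, the discounted counts of the other letters shrink geometrically, so the estimate tracks the most recent segment of the input.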
On Probability Estimation by Exponential Smoothing
Probability estimation is essential for every statistical data compression
algorithm. In practice, probability estimation should be adaptive: recent
observations should receive a higher weight than older ones. We present
a probability estimation method based on exponential smoothing that satisfies
this requirement and runs in constant time per letter. Our main contribution is
a theoretical analysis in case of a binary alphabet for various smoothing rate
sequences: We show that the redundancy w.r.t. a piecewise stationary model with
s segments is O(s sqrt(n)) for any bit sequence of length n, an improvement
over the O(sqrt(s n log n)) redundancy of previous approaches with similar time
complexity.
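The estimator itself is simple; below is a minimal sketch for the binary case with a fixed smoothing rate (the constants and function names are our illustrative choices; the paper analyzes more general smoothing-rate sequences):

```python
import math

def exp_smoothing_probs(bits, alpha=0.05, p0=0.5):
    """Probability estimation by exponential smoothing (sketch): before
    seeing bit x_t, predict p = P(x_t = 1); afterwards shift p toward
    the observed bit.  Each update takes O(1) time per letter."""
    p = p0
    preds = []
    for x in bits:
        preds.append(p if x == 1 else 1.0 - p)  # prob assigned to x
        p = (1.0 - alpha) * p + alpha * x       # exponential smoothing
    return preds

def code_length(bits, **kw):
    """Ideal code length in bits under the smoothed predictions."""
    return sum(-math.log2(q) for q in exp_smoothing_probs(bits, **kw))
```

On a piecewise stationary input such as a run of ones followed by a run of zeros, the estimate decays geometrically toward the new segment's statistics, so the total code length beats the 1 bit per symbol of a static uniform predictor.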
Online Learning for Changing Environments using Coin Betting
A key challenge in online learning is that classical algorithms can be slow
to adapt to changing environments. Recent studies have proposed "meta"
algorithms that convert any online learning algorithm to one that is adaptive
to changing environments, where the adaptivity is analyzed in a quantity called
the strongly-adaptive regret. This paper describes a new meta algorithm that
has a strongly-adaptive regret bound that is a factor of sqrt(log T)
better than other algorithms with the same time complexity, where T is the
time horizon. We also extend our algorithm to achieve a first-order (i.e.,
dependent on the observed losses) strongly-adaptive regret bound for the first
time, to our knowledge. At its heart is a new parameter-free algorithm for the
learning with expert advice (LEA) problem in which experts sometimes do not
output advice for consecutive time steps (i.e., "sleeping" experts). This
algorithm is derived by a reduction from optimal algorithms for the so-called
coin betting problem. Empirical results show that our algorithm outperforms
state-of-the-art methods in both learning with expert advice and metric
learning scenarios.
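The coin betting primitive underlying the reduction can be sketched with the classical Krichevsky-Trofimov bettor (a minimal illustration with our own naming; the paper's meta algorithm builds a sleeping-experts LEA algorithm on top of such bettors):

```python
def kt_bettor(coins):
    """Krichevsky-Trofimov coin betting (sketch): a parameter-free
    gambler starts with wealth 1 and, on round t, bets the signed
    fraction beta_t = (sum of past coins) / t of its current wealth on
    the next coin outcome c_t in [-1, 1].  Wealth grows whenever the
    bets correlate with the outcomes, with no tuned learning rate."""
    wealth, s = 1.0, 0.0
    wealths = []
    for t, c in enumerate(coins, start=1):
        beta = s / t                  # KT betting fraction, |beta| < 1
        wealth *= 1.0 + beta * c      # outcome of the bet; stays > 0
        s += c
        wealths.append(wealth)
    return wealths
```

On a heavily biased coin sequence the wealth grows rapidly, which is exactly the guarantee that the reduction converts into a regret bound.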
On Statistical Data Compression
The ongoing evolution of hardware leads to a steady increase in the amount
of data that is processed, transmitted and stored. Data compression is an
essential tool to keep the amount of data manageable. In terms of empirical
performance, statistical data compression algorithms rank among the best. A
statistical data compressor processes an input text letter by letter and
compresses in two stages, modeling and coding. During modeling, a model
estimates a probability distribution on the next letter based on the past
input. During coding, an encoder translates this distribution and the next
letter into a codeword. Decoding reverts this process. The model is
exchangeable and its choice determines a statistical data compression
algorithm. All major models use a mixer to combine multiple simple
probability estimators, so-called elementary models. In statistical data
compression there is a gap between theory and practice. On the one hand,
theoreticians put emphasis on models that allow for a mathematical
analysis, but neglect running time, space considerations and empirical
improvements. On the other hand, practitioners focus on the very reverse. The
family of PAQ statistical compressors demonstrated the superiority of the
practitioner's approach in terms of empirical compression. With this thesis
we attempt to bridge the aforementioned gap between theory and practice,
with special focus on PAQ. To achieve this we apply the theoretician's tools
to the practitioner's approaches: We provide a code length analysis for
several practical modeling and mixing techniques. The analysis covers
modeling by relative frequencies with frequency discount and modeling by
exponential smoothing of probabilities. For mixing we consider linearly and
geometrically weighted averaging of probabilities with Online Gradient
Descent for weight estimation. Our results show that the models and mixers
we consider perform nearly as well as idealized competitors. Experiments
support our analysis. Moreover, our results add a theoretical basis to
modeling and mixing from PAQ and generalize methods from PAQ. Ultimately, we
propose and analyze Context Tree Mixing (CTM), a generalization of Context
Tree Weighting (CTW). We couple CTM with modeling and mixing techniques from
PAQ and obtain a theoretically sound compression algorithm that improves
over CTW, as shown in experiments.
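Linear mixing with Online Gradient Descent under log loss can be illustrated with a small sketch (the class names, learning rate, and the clip-and-renormalize projection are our own illustrative choices, not the thesis's exact formulation):

```python
import math

class KT:
    """Krichevsky-Trofimov estimator as an elementary model."""
    def __init__(self):
        self.n0 = self.n1 = 0
    def prob1(self):
        return (self.n1 + 0.5) / (self.n0 + self.n1 + 1.0)
    def update(self, x):
        if x == 1:
            self.n1 += 1
        else:
            self.n0 += 1

class Half:
    """Static elementary model that always predicts 1/2."""
    def prob1(self):
        return 0.5
    def update(self, x):
        pass

def ogd_linear_mix(bits, models, lr=0.1):
    """Linear mixing with Online Gradient Descent (sketch): combine the
    binary predictions p_i = P(next bit = 1) of the elementary models by
    a weighted average and adapt the weights to reduce code length."""
    k = len(models)
    w = [1.0 / k] * k
    total_bits = 0.0
    for x in bits:
        ps = [m.prob1() for m in models]
        q = sum(wi * pi for wi, pi in zip(w, ps))       # mixture P(x=1)
        q_x = q if x == 1 else 1.0 - q                  # prob of outcome
        total_bits += -math.log2(q_x)
        # gradient of -ln q_x w.r.t. w_i is -(p_i resp. 1-p_i) / q_x
        grad = [-(pi if x == 1 else 1.0 - pi) / q_x for pi in ps]
        w = [wi - lr * g for wi, g in zip(w, grad)]
        # project back onto the probability simplex (clip, renormalize)
        w = [max(wi, 1e-6) for wi in w]
        s = sum(w)
        w = [wi / s for wi in w]
        for m in models:
            m.update(x)
    return total_bits, w
```

Mixing an adaptive KT model with a static uniform model on a biased input, the gradient updates shift weight toward the better elementary model, and the mixture's code length beats the 1 bit per symbol of the static model alone.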