Online Learning of k-CNF Boolean Functions
This paper revisits the problem of learning a k-CNF Boolean function from
examples in the context of online learning under the logarithmic loss. In doing
so, we give a Bayesian interpretation to one of Valiant's celebrated PAC
learning algorithms, which we then build upon to derive two efficient, online,
probabilistic, supervised learning algorithms for predicting the output of an
unknown k-CNF Boolean function. We analyze the loss of our methods, and show
that the cumulative log-loss can be upper bounded, ignoring logarithmic
factors, by a polynomial function of the size of each example.
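The elimination idea behind Valiant's PAC algorithm for k-CNF, which the paper reinterprets, can be sketched as follows (a minimal illustration with our own naming; the paper's online Bayesian algorithms refine this): start with every non-tautological clause of at most k literals and delete each clause falsified by a positive example.

```python
from itertools import combinations, product

def all_clauses(n, k):
    """All disjunctions of at most k literals over variables x_0..x_{n-1}.
    A literal is (index, polarity); a clause is a frozenset of literals."""
    lits = [(i, b) for i in range(n) for b in (True, False)]
    clauses = set()
    for size in range(1, k + 1):
        for combo in combinations(lits, size):
            # skip tautological clauses that contain both x and not-x
            if len({i for i, _ in combo}) == size:
                clauses.add(frozenset(combo))
    return clauses

def satisfied(clause, x):
    return any(x[i] == b for i, b in clause)

def learn_kcnf(examples, n, k):
    """Valiant-style elimination: keep every clause consistent with all
    positive examples; the hypothesis is their conjunction.  Negative
    examples are ignored: every clause of the target survives, so
    negatives are rejected by the hypothesis automatically."""
    h = all_clauses(n, k)
    for x, label in examples:
        if label:  # positive example: drop clauses it falsifies
            h = {c for c in h if satisfied(c, x)}
    return h

def predict(h, x):
    return all(satisfied(c, x) for c in h)
```

On a target that truly is a k-CNF, the surviving conjunction accepts all positives by construction and rejects all negatives because the target's own clauses are never eliminated.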
On Probability Estimation via Relative Frequencies and Discount
Probability estimation is an elementary building block of every statistical
data compression algorithm. In practice, probability estimation is often based
on relative letter frequencies, which are scaled down when their sum grows too
large. Such algorithms are attractive in terms of memory requirements, running
time and practical performance. However, there still is a lack of theoretical
understanding. In this work we formulate a typical probability estimation
algorithm based on relative frequencies and frequency discount, Algorithm RFD.
Our main contribution is its theoretical analysis. We show that the code length
it requires above an arbitrary piecewise stationary model with bounded and
unbounded letter probabilities is small. This theoretically confirms the
recency effect of periodic frequency discount, which has often been observed
empirically.
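The frequency-with-discount scheme can be sketched in a few lines (an illustrative toy, with our own class name, threshold, and discount factor, not the paper's exact Algorithm RFD):

```python
class RFD:
    """Relative-frequency estimator with periodic discount (sketch).
    Counts are scaled down whenever their sum exceeds a threshold, so
    recent letters dominate the estimate (the recency effect)."""

    def __init__(self, alphabet_size, threshold=64, discount=0.5):
        self.counts = [1.0] * alphabet_size   # Laplace-style initialization
        self.threshold = threshold
        self.discount = discount

    def prob(self, letter):
        """Estimated probability of `letter` from relative frequencies."""
        return self.counts[letter] / sum(self.counts)

    def update(self, letter):
        self.counts[letter] += 1.0
        if sum(self.counts) > self.threshold:
            # scale all counts down; old observations fade geometrically
            self.counts = [c * self.discount for c in self.counts]
```

After a long run of one letter, the discounted counts of the other letters shrink geometrically, so the estimate tracks the most recent segment of the input.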
On Probability Estimation by Exponential Smoothing
Probability estimation is essential for every statistical data compression
algorithm. In practice, probability estimation should be adaptive: recent
observations should receive a higher weight than older ones. We present
a probability estimation method based on exponential smoothing that satisfies
this requirement and runs in constant time per letter. Our main contribution is
a theoretical analysis in case of a binary alphabet for various smoothing rate
sequences: We show that the redundancy w.r.t. a piecewise stationary model with
s segments is O(s sqrt(n)) for any bit sequence of length n, an improvement
over the O(sqrt(s n log n)) redundancy of previous approaches with similar time
complexity.
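The estimator itself is simple; below is a minimal sketch for the binary case with a fixed smoothing rate (the constants and function names are our illustrative choices; the paper analyzes more general smoothing-rate sequences):

```python
import math

def exp_smoothing_probs(bits, alpha=0.05, p0=0.5):
    """Probability estimation by exponential smoothing (sketch): before
    seeing bit x_t, predict p = P(x_t = 1); afterwards shift p toward
    the observed bit.  Each update takes O(1) time per letter."""
    p = p0
    preds = []
    for x in bits:
        preds.append(p if x == 1 else 1.0 - p)  # prob assigned to x
        p = (1.0 - alpha) * p + alpha * x       # exponential smoothing
    return preds

def code_length(bits, **kw):
    """Ideal code length in bits under the smoothed predictions."""
    return sum(-math.log2(q) for q in exp_smoothing_probs(bits, **kw))
```

On a piecewise stationary input such as a run of ones followed by a run of zeros, the estimate decays geometrically toward the new segment's statistics, so the total code length beats the 1 bit per symbol of a static uniform predictor.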
Online Learning for Changing Environments using Coin Betting
A key challenge in online learning is that classical algorithms can be slow
to adapt to changing environments. Recent studies have proposed "meta"
algorithms that convert any online learning algorithm to one that is adaptive
to changing environments, where the adaptivity is analyzed in a quantity called
the strongly-adaptive regret. This paper describes a new meta algorithm that
has a strongly-adaptive regret bound that is a factor of sqrt(log T)
better than other algorithms with the same time complexity, where T is the
time horizon. We also extend our algorithm to achieve a first-order (i.e.,
dependent on the observed losses) strongly-adaptive regret bound for the first
time, to our knowledge. At its heart is a new parameter-free algorithm for the
learning with expert advice (LEA) problem in which experts sometimes do not
output advice for consecutive time steps (i.e., "sleeping" experts). This
algorithm is derived by a reduction from optimal algorithms for the so-called
coin betting problem. Empirical results show that our algorithm outperforms
state-of-the-art methods in both learning with expert advice and metric
learning scenarios.
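The coin betting primitive underlying the reduction can be sketched with the classical Krichevsky-Trofimov bettor (a minimal illustration with our own naming; the paper's meta algorithm builds a sleeping-experts LEA algorithm on top of such bettors):

```python
def kt_bettor(coins):
    """Krichevsky-Trofimov coin betting (sketch): a parameter-free
    gambler starts with wealth 1 and, on round t, bets the signed
    fraction beta_t = (sum of past coins) / t of its current wealth on
    the next coin outcome c_t in [-1, 1].  Wealth grows whenever the
    bets correlate with the outcomes, with no tuned learning rate."""
    wealth, s = 1.0, 0.0
    wealths = []
    for t, c in enumerate(coins, start=1):
        beta = s / t                  # KT betting fraction, |beta| < 1
        wealth *= 1.0 + beta * c      # outcome of the bet; stays > 0
        s += c
        wealths.append(wealth)
    return wealths
```

On a heavily biased coin sequence the wealth grows rapidly, which is exactly the guarantee that the reduction converts into a regret bound.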
On Statistical Data Compression
The ongoing evolution of hardware leads to a steady increase in the amount
of data that is processed, transmitted and stored. Data compression is an
essential tool to keep the amount of data manageable. In terms of empirical
performance, statistical data compression algorithms rank among the best. A
statistical data compressor processes an input text letter by letter and
compresses in two stages, modeling and coding. During modeling, a model
estimates a probability distribution on the next letter based on the past
input. During coding, an encoder translates this distribution and the next
letter into a codeword. Decoding reverts this process. The model is
exchangeable and its choice determines a statistical data compression
algorithm. All major models use a mixer to combine multiple simple
probability estimators, so-called elementary models. In statistical data
compression there is a gap between theory and practice. On the one hand,
theoreticians put emphasis on models that allow for a mathematical
analysis, but neglect running time, space considerations and empirical
improvements. On the other hand, practitioners focus on the very reverse. The
family of PAQ statistical compressors demonstrated the superiority of the
practitioner's approach in terms of empirical compression. With this thesis
we attempt to bridge the aforementioned gap between theory and practice,
with special focus on PAQ. To achieve this we apply the theoretician's tools
to the practitioner's approaches: We provide a code length analysis for
several practical modeling and mixing techniques. The analysis covers
modeling by relative frequencies with frequency discount and modeling by
exponential smoothing of probabilities. For mixing we consider linearly and
geometrically weighted averaging of probabilities with Online Gradient
Descent for weight estimation. Our results show that the models and mixers
we consider perform nearly as well as idealized competitors. Experiments
support our analysis. Moreover, our results add a theoretical basis to
modeling and mixing from PAQ and generalize methods from PAQ. Ultimately, we
propose and analyze Context Tree Mixing (CTM), a generalization of Context
Tree Weighting (CTW). We couple CTM with modeling and mixing techniques from
PAQ and obtain a theoretically sound compression algorithm that improves
over CTW, as shown in experiments.
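Linear mixing with Online Gradient Descent under log loss can be illustrated with a small sketch (the class names, learning rate, and the clip-and-renormalize projection are our own illustrative choices, not the thesis's exact formulation):

```python
import math

class KT:
    """Krichevsky-Trofimov estimator as an elementary model."""
    def __init__(self):
        self.n0 = self.n1 = 0
    def prob1(self):
        return (self.n1 + 0.5) / (self.n0 + self.n1 + 1.0)
    def update(self, x):
        if x == 1:
            self.n1 += 1
        else:
            self.n0 += 1

class Half:
    """Static elementary model that always predicts 1/2."""
    def prob1(self):
        return 0.5
    def update(self, x):
        pass

def ogd_linear_mix(bits, models, lr=0.1):
    """Linear mixing with Online Gradient Descent (sketch): combine the
    binary predictions p_i = P(next bit = 1) of the elementary models by
    a weighted average and adapt the weights to reduce code length."""
    k = len(models)
    w = [1.0 / k] * k
    total_bits = 0.0
    for x in bits:
        ps = [m.prob1() for m in models]
        q = sum(wi * pi for wi, pi in zip(w, ps))       # mixture P(x=1)
        q_x = q if x == 1 else 1.0 - q                  # prob of outcome
        total_bits += -math.log2(q_x)
        # gradient of -ln q_x w.r.t. w_i is -(p_i resp. 1-p_i) / q_x
        grad = [-(pi if x == 1 else 1.0 - pi) / q_x for pi in ps]
        w = [wi - lr * g for wi, g in zip(w, grad)]
        # project back onto the probability simplex (clip, renormalize)
        w = [max(wi, 1e-6) for wi in w]
        s = sum(w)
        w = [wi / s for wi in w]
        for m in models:
            m.update(x)
    return total_bits, w
```

Mixing an adaptive KT model with a static uniform model on a biased input, the gradient updates shift weight toward the better elementary model, and the mixture's code length beats the 1 bit per symbol of the static model alone.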