6 research outputs found

Bootstrapping and learning PDFA in data streams

Author: Balle Pigem Borja de
Castro Rabal Jorge
Gavaldà Mestre Ricard
Publication date: 01/01/2012

Best Student Paper, ICGI 2012. Markovian models with hidden state are widely used formalisms for modeling sequential phenomena. Learnability of these models has been well studied when the sample is given in batch mode, and algorithms with PAC-like learning guarantees exist for specific classes of models such as Probabilistic Deterministic Finite Automata (PDFA). Here we focus on PDFA and give an algorithm for inferring models in this class under the stringent data stream scenario: unlike existing methods, our algorithm works incrementally and in one pass, uses memory sublinear in the stream length, and processes input items in amortized constant time. We provide rigorous PAC-like bounds for all of the above, as well as an evaluation on synthetic data showing that the algorithm performs well in practice. Our algorithm makes key use of several old and new sketching techniques. In particular, we develop a new sketch for implementing bootstrapping in a streaming setting, which may be of independent interest. In experiments we have observed that this sketch yields significant reductions in the number of examples required for some crucial statistical tests in our algorithm. Peer Reviewed. Award-winning. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Learning probability distributions generated by finite-state machines

Author: Castro Rabal Jorge
Gavaldà Mestre Ricard
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

We review methods for inference of probability distributions generated by probabilistic automata and related models for sequence generation. We focus on methods that can be proved to learn in the inference-in-the-limit and PAC formal models. The methods we review are state merging and state splitting methods for probabilistic deterministic automata and the recently developed spectral method for nondeterministic probabilistic automata. In both cases, we derive them from a high-level algorithm described in terms of the Hankel matrix of the distribution to be learned, given as an oracle, and then describe how to adapt that algorithm to account for the error introduced by a finite sample. Peer Reviewed. Postprint (author's final draft).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Combinatorial Kalman Filter and High Level Trigger Reconstruction for the Belle II Experiment

Author: Braun Nils Lennart
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2019

KITopen

Bootstrapping and learning PDFA in data streams

Author: Balle Pigem Borja de
Castro Rabal Jorge
Gavaldà Mestre Ricard

RECERCAT