121 research outputs found

    PAC Classification based on PAC Estimates of Label Class Distributions

    Full text link
    A standard approach in pattern classification is to estimate the distributions of the label classes, and then to apply the Bayes classifier to the estimates of the distributions in order to classify unlabeled examples. As one might expect, the better our estimates of the label class distributions, the better the resulting classifier will be. In this paper we make this observation precise by identifying risk bounds of a classifier in terms of the quality of the estimates of the label class distributions. We show how PAC learnability relates to estimates of the distributions that have a PAC guarantee on their L1 distance from the true distribution, and we bound the increase in negative log likelihood risk in terms of PAC bounds on the KL-divergence. We give an inefficient but general-purpose smoothing method for converting an estimated distribution that is good under the L1 metric into a distribution that is good under the KL-divergence. Comment: 14 pages
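
    A minimal sketch of the smoothing idea, assuming a finite domain and mixing with the uniform distribution; the mixing weight and function names are our illustrative choices, and the paper's actual construction may differ:

```python
import numpy as np

def smooth(q, alpha=0.01):
    """Mix the estimate q with the uniform distribution over the same
    finite domain. The result is bounded below by alpha / len(q), so
    log-ratios stay finite and closeness to the truth in L1 translates
    into a bound on KL-divergence. The weight alpha is an illustrative
    choice, not a value taken from the paper."""
    q = np.asarray(q, dtype=float)
    return (1.0 - alpha) * q + alpha / len(q)

def kl(p, q):
    """KL-divergence D(p || q) between finite distributions, in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```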

    Bootstrapping and learning PDFA in data streams

    Get PDF
    Best Student Paper, ICGI 2012. Markovian models with hidden state are widely used formalisms for modeling sequential phenomena. Learnability of these models has been well studied when the sample is given in batch mode, and algorithms with PAC-like learning guarantees exist for specific classes of models such as Probabilistic Deterministic Finite Automata (PDFA). Here we focus on PDFA and give an algorithm for inferring models in this class under the stringent data stream scenario: unlike existing methods, our algorithm works incrementally and in one pass, uses memory sublinear in the stream length, and processes input items in amortized constant time. We provide rigorous PAC-like bounds for all of the above, as well as an evaluation on synthetic data showing that the algorithm performs well in practice. Our algorithm makes key use of several old and new sketching techniques. In particular, we develop a new sketch for implementing bootstrapping in a streaming setting, which may be of independent interest. In experiments we have observed that this sketch yields important reductions in the number of examples required for performing some crucial statistical tests in our algorithm.
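
    For illustration, a sketch of how bootstrapping can run in one pass over a stream by giving each item a Poisson(1) weight per replica; this is the generic streaming-bootstrap idea, not necessarily the sketch the paper develops, and all names are ours:

```python
import math
import random

class StreamingBootstrap:
    """One-pass bootstrap over a data stream: every arriving item
    contributes a Poisson(1)-distributed number of copies to each of B
    replica estimators, approximating resampling with replacement
    without storing the stream."""

    def __init__(self, num_replicas=50, seed=0):
        self.rng = random.Random(seed)
        self.sums = [0.0] * num_replicas
        self.counts = [0] * num_replicas

    def update(self, x):
        for b in range(len(self.sums)):
            w = self._poisson1()
            self.sums[b] += w * x
            self.counts[b] += w

    def _poisson1(self):
        # Knuth's multiplication method for sampling Poisson(1).
        threshold, k, p = math.exp(-1.0), 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= threshold:
                return k
            k += 1

    def replica_means(self):
        # One mean per replica; their spread estimates sampling error.
        return [s / c for s, c in zip(self.sums, self.counts) if c > 0]
```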

    On the Learnability of Shuffle Ideals

    Get PDF
    PAC learning of unrestricted regular languages has long been known to be a difficult problem. The class of shuffle ideals is a very restricted subclass of regular languages, where the shuffle ideal generated by a string u is the collection of all strings containing u as a subsequence. This fundamental language family is of theoretical interest in its own right and provides the building blocks for other important language families. Despite its apparent simplicity, the class of shuffle ideals appears quite difficult to learn. In particular, just as for unrestricted regular languages, the class is not properly PAC learnable in polynomial time if RP ≠ NP, and PAC learning the class improperly in polynomial time would imply polynomial-time algorithms for certain fundamental problems in cryptography. In the positive direction, we give an efficient algorithm for properly learning shuffle ideals in the statistical query (and therefore also PAC) model under the uniform distribution.
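
    The membership predicate that defines the class is simple even though learning it is hard; a minimal check (the function name is ours):

```python
def in_shuffle_ideal(u, x):
    """Return True iff x lies in the shuffle ideal generated by u,
    i.e. u occurs in x as a (not necessarily contiguous) subsequence.
    Greedy left-to-right matching is correct and runs in O(|x|) time;
    'c in it' advances the iterator past the first match."""
    it = iter(x)
    return all(c in it for c in u)

assert in_shuffle_ideal("ace", "abcde")
assert not in_shuffle_ideal("aec", "abcde")
```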

    Should We Learn Probabilistic Models for Model Checking? A New Approach and An Empirical Study

    Get PDF
    Many automated system analysis techniques (e.g., model checking, model-based testing) rely on first obtaining a model of the system under analysis. System modeling is often done manually, which is widely seen as a hindrance to adopting model-based system analysis and development techniques. To overcome this problem, researchers have proposed to automatically "learn" models based on sample system executions and shown that the learned models can sometimes be useful. There are, however, many questions to be answered. For instance, how much should we generalize from the observed samples, and how fast would learning converge? Or, would the analysis result based on the learned model be more accurate than the estimate we could have obtained by sampling many system executions within the same amount of time? In this work, we investigate existing algorithms for learning probabilistic models for model checking, propose an evolution-based approach for better controlling the degree of generalization, and conduct an empirical study in order to answer these questions. One of our findings is that the effectiveness of learning may sometimes be limited. Comment: 15 pages, plus 2 reference pages, accepted by FASE 2017 in ETAPS
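
    As a sketch of the basic "learn a model from sample executions" step, the following estimates a discrete-time Markov chain from empirical transition frequencies; the paper's evolution-based control of generalization is not reproduced here, and all names are ours:

```python
from collections import Counter, defaultdict

def estimate_markov_chain(traces):
    """Estimate a discrete-time Markov chain from sampled executions by
    taking empirical frequencies over observed state pairs."""
    counts = defaultdict(Counter)
    for trace in traces:
        for s, t in zip(trace, trace[1:]):
            counts[s][t] += 1
    return {s: {t: n / sum(succ.values()) for t, n in succ.items()}
            for s, succ in counts.items()}

# Three sampled runs of a two-state system.
model = estimate_markov_chain([["a", "b", "a"], ["a", "a", "b"], ["b", "a"]])
# model == {"a": {"b": 2/3, "a": 1/3}, "b": {"a": 1.0}}
```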

    Pattern classification via unsupervised learners

    Get PDF
    We consider classification problems in a variant of the Probably Approximately Correct (PAC)-learning framework, in which an unsupervised learner creates a discriminant function over each class and observations are labeled by the learner returning the highest value associated with that observation. Consideration is given to whether this approach gains significant advantage over traditional discriminant techniques. It is shown that PAC-learning distributions over class labels under L1 distance or KL-divergence implies PAC classification in this framework. We give bounds on the regret associated with the resulting classifier, taking into account the possibility of variable misclassification penalties. We demonstrate the advantage of estimating the a posteriori probability distributions over class labels in the setting of Optical Character Recognition. We show that unsupervised learners can be used to learn a class of probabilistic concepts (stochastic rules denoting the probability that an observation has a positive label in a 2-class setting). This demonstrates a situation where unsupervised learners can be used even when it is hard to learn distributions over class labels - in this case the discriminant functions do not estimate the class probability densities. We use a standard state-merging technique to PAC-learn a class of probabilistic automata and show that by learning the distribution over outputs under the weaker L1 distance rather than KL-divergence we are able to learn without knowledge of the expected length of an output. It is also shown that for a restricted class of these automata, learning under L1 distance is equivalent to learning under KL-divergence.
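
    A minimal sketch of the classification scheme, with a diagonal-Gaussian density as an illustrative stand-in for the per-class unsupervised learner; the class and method names are ours, not the paper's probabilistic-automata setting:

```python
import numpy as np

class PerClassDensityClassifier:
    """Fit one unsupervised estimator per class, then label an
    observation with the class whose discriminant scores it highest."""

    def fit(self, X_by_class):
        # X_by_class maps each label to an (n, d) array of observations.
        self.params = {label: (np.mean(X, axis=0), np.std(X, axis=0) + 1e-9)
                       for label, X in X_by_class.items()}
        return self

    def predict(self, x):
        def log_score(label):
            mu, sd = self.params[label]
            # Log-density up to an additive constant shared by all classes.
            return float(np.sum(-0.5 * ((x - mu) / sd) ** 2 - np.log(sd)))
        return max(self.params, key=log_score)
```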

    Learning stochastic finite automata from experts

    Full text link

    Learning deterministic probabilistic automata from a model checking perspective

    Get PDF
    Probabilistic automata models play an important role in the formal design and analysis of hardware and software systems. In this application area, one is often interested in formal model-checking procedures for verifying critical system properties. Since adequate system models are often difficult to design manually, we are interested in learning models from observed system behaviors. To this end we adopt techniques for learning finite probabilistic automata, notably the Alergia algorithm. In this paper we show how to extend the basic algorithm to also learn automata models for both reactive and timed systems. A key question of our investigation is to what extent one can expect a learned model to be a good approximation for the kind of probabilistic properties one wants to verify by model checking. We establish theoretical convergence properties for the learning algorithm as well as for probability estimates of system properties expressed in linear time temporal logic and linear continuous stochastic logic. We empirically compare the learning algorithm with statistical model checking and demonstrate the feasibility of the approach for practical system verification.
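
    A sketch of the Hoeffding-style compatibility test at the core of Alergia's state merging; the parameter names are ours:

```python
import math

def alergia_compatible(f1, n1, f2, n2, alpha=0.05):
    """Two states may be merged only if, for every symbol, their
    empirical transition frequencies f1/n1 and f2/n2 differ by less
    than this confidence bound. n1, n2 are state visit counts; f1, f2
    are counts for one particular symbol."""
    if n1 == 0 or n2 == 0:
        return True
    bound = (math.sqrt(1.0 / n1) + math.sqrt(1.0 / n2)) \
            * math.sqrt(0.5 * math.log(2.0 / alpha))
    return abs(f1 / n1 - f2 / n2) < bound
```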

    Non-linear optimization methods for learning regular distributions

    Get PDF
    Algorithms and the Foundations of Software Technology