Predictive PAC Learning and Process Decompositions
We informally call a stochastic process learnable if it admits a
generalization error approaching zero in probability for any concept class with
finite VC-dimension (IID processes are the simplest example). A mixture of
learnable processes need not be learnable itself, and certainly its
generalization error need not decay at the same rate. In this paper, we argue
that it is natural in predictive PAC to condition not on the past observations
but on the mixture component of the sample path. This definition not only
matches what a realistic learner might demand, but also allows us to sidestep
several otherwise grave problems in learning from dependent data. In
particular, we give a novel PAC generalization bound for mixtures of learnable
processes with a generalization error that is not worse than that of each
mixture component. We also provide a characterization of mixtures of absolutely
regular (β-mixing) processes, of independent probability-theoretic
interest.
Comment: 9 pages, accepted in NIPS 201
On the positive eigenvalues and eigenvectors of a non-negative matrix
The paper develops the general theory for the items in the title, assuming
that the matrix is countable and cofinal.
Comment: Version 2 allows the matrix to have zero row(s) and rows with
infinitely many non-zero entries. In addition the introduction has been
rewritten.
Pattern Recognition for Conditionally Independent Data
In this work we consider the task of relaxing the i.i.d. assumption in pattern
recognition (or classification), aiming to make existing learning algorithms
applicable to a wider range of tasks. Pattern recognition is guessing a
discrete label of some object based on a set of given examples (pairs of
objects and labels). We consider the case of deterministically defined labels.
Traditionally, this task is studied under the assumption that examples are
independent and identically distributed. However, it turns out that many
results of pattern recognition theory carry over to a weaker assumption,
namely conditional independence and identical distribution of objects, where
the only assumption on the distribution of labels is that the rate of
occurrence of each label should be above some positive threshold.
We find a broad class of learning algorithms for which estimates of the
probability of a classification error achieved under the classical i.i.d.
assumption can be generalised to similar estimates for the case of
conditionally i.i.d. examples.
Comment: parts of results published at ALT'04 and ICML'0
MCMC Learning
The theory of learning under the uniform distribution is rich and deep, with
connections to cryptography, computational complexity, and the analysis of
Boolean functions, to name a few areas. This theory, however, is very limited
because the uniform distribution and the corresponding Fourier basis
are rarely encountered as a statistical model.
A family of distributions that vastly generalizes the uniform distribution on
the Boolean cube is that of distributions represented by Markov Random Fields
(MRF). Markov Random Fields are one of the main tools for modeling high
dimensional data in many areas of statistics and machine learning.
In this paper we initiate the investigation of extending central ideas,
methods and algorithms from the theory of learning under the uniform
distribution to the setup of learning concepts given examples from MRF
distributions. In particular, our results establish a novel connection between
properties of MCMC sampling of MRFs and learning under the MRF distribution.
Comment: 28 pages, 1 figure
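As a rough illustration of what "MCMC sampling of MRFs" can look like, the sketch below runs a Gibbs sampler on a small Ising model over a cycle of ±1 spins. It is not the algorithm of the paper: the Ising model, the cycle graph, and the names `gibbs_sweep`, `sample_ising_cycle`, `beta`, and `sweeps` are all assumptions chosen only to keep the example short.

```python
import math
import random


def gibbs_sweep(spins, beta, rng):
    """One Gibbs sweep: resample every spin from its conditional distribution.

    Assumed model (illustrative): P(x) proportional to
    exp(beta * sum of x_i * x_j over neighbouring pairs on a cycle).
    """
    n = len(spins)
    for i in range(n):
        # Sum of the two neighbours on the cycle.
        field = spins[(i - 1) % n] + spins[(i + 1) % n]
        # P(x_i = +1 | neighbours) for the assumed Ising model.
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * field))
        spins[i] = 1 if rng.random() < p_plus else -1
    return spins


def sample_ising_cycle(n, beta, sweeps, seed=0):
    """Return an approximate sample after a fixed number of Gibbs sweeps."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(sweeps):
        gibbs_sweep(spins, beta, rng)
    return spins


if __name__ == "__main__":
    print(sample_ising_cycle(n=8, beta=0.5, sweeps=100, seed=1))
```

Setting `beta = 0` makes every conditional probability 1/2, so the sampler reduces to the uniform distribution on the Boolean cube, the special case that the abstract says MRFs generalize.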
Approximate Learning of Limit-Average Automata
Limit-average automata are weighted automata on infinite words that use the average to aggregate the weights seen in infinite runs. We study approximate learning problems for limit-average automata in two settings: passive and active. In the passive learning case, we show that limit-average automata are not PAC-learnable, as samples must be of exponential size to provide (with good probability) enough details to learn an automaton. We also show that the problem of finding an automaton that fits a given sample is NP-complete. In the active learning case, we show that limit-average automata can be learned almost-exactly, i.e., we can learn in polynomial time an automaton that is consistent with the target automaton on almost all words. On the other hand, we show that the problem of learning an automaton that approximates the target automaton (with perhaps fewer states) is NP-complete. The above-mentioned results are shown for the uniform distribution on words. We briefly discuss learning over different distributions.
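To make the aggregation concrete, here is a minimal sketch of the limit average for one simple case: a run whose weight sequence is ultimately periodic, i.e. a finite prefix followed by a loop repeated forever. In that case the limit of the running averages equals the mean of the loop weights. The function names and the example weights are hypothetical and do not come from the paper.

```python
from fractions import Fraction
from itertools import chain, cycle, islice


def limit_average(prefix, loop):
    """Limit average of the weight sequence prefix . loop . loop . ...

    The finite prefix does not affect the limit; only the loop mean does.
    """
    if not loop:
        raise ValueError("loop must be non-empty")
    return Fraction(sum(loop), len(loop))


def prefix_average(prefix, loop, n):
    """Exact average of the first n weights of the same infinite sequence."""
    if n <= 0:
        raise ValueError("n must be positive")
    weights = islice(chain(prefix, cycle(loop)), n)
    return Fraction(sum(weights), n)


if __name__ == "__main__":
    prefix, loop = [10, 10], [1, 2, 3]
    print(limit_average(prefix, loop))
    for n in (5, 50, 5000):
        print(n, float(prefix_average(prefix, loop, n)))
```

The running averages printed by `prefix_average` approach the value returned by `limit_average` as `n` grows, which is why the prefix can be ignored.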