
We consider classification problems in a variant of the Probably Approximately Correct (PAC)-learning framework, in which an unsupervised learner creates a discriminant function over each class and each observation is assigned the label whose learner returns the highest value for that observation. Consideration is given to whether this approach gains significant advantage over traditional discriminant techniques.

It is shown that PAC-learning distributions over class labels under L1 distance or KL-divergence implies PAC classification in this framework. We give bounds on the regret associated with the resulting classifier, taking into account the possibility of variable misclassification penalties. We demonstrate the advantage of estimating the a posteriori probability distributions over class labels in the setting of Optical Character Recognition.

We show that unsupervised learners can be used to learn a class of probabilistic concepts (stochastic rules denoting the probability that an observation has a positive label in a 2-class setting). This demonstrates a situation where unsupervised learners can be used even when it is hard to learn distributions over class labels; in this case the discriminant functions do not estimate the class probability densities.

We use a standard state-merging technique to PAC-learn a class of probabilistic automata and show that by learning the distribution over outputs under the weaker L1 distance rather than KL-divergence we are able to learn without knowledge of the expected length of an output. It is also shown that for a restricted class of these automata learning under L1 distance is equivalent to learning under KL-divergence.
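The labeling rule and the two distance measures named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the add-one smoothing, the finite alphabet, and the toy data are all assumptions made for illustration. Each class gets its own distribution estimate built only from that class's samples, and an observation receives the label whose discriminant value (prior times estimated probability) is highest.

```python
import math
from collections import Counter

def fit_class_model(samples, alphabet, smoothing=1.0):
    """Estimate a smoothed discrete distribution from one class's samples.

    Hypothetical helper: add-one smoothing is an assumption, chosen so that
    every symbol has non-zero probability and KL-divergence stays finite.
    """
    counts = Counter(samples)
    total = len(samples) + smoothing * len(alphabet)
    return {x: (counts[x] + smoothing) / total for x in alphabet}

def classify(x, models, priors):
    """Return the label whose discriminant value prior * estimate is highest."""
    return max(models, key=lambda label: priors[label] * models[label][x])

def l1_distance(p, q):
    """L1 distance between two distributions over the same finite alphabet."""
    return sum(abs(p[x] - q[x]) for x in p)

def kl_divergence(p, q):
    """KL-divergence KL(p || q) in nats; assumes q[x] > 0 wherever p[x] > 0."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)

# Toy data (assumed): two classes over a three-symbol alphabet.
alphabet = ["a", "b", "c"]
models = {
    "pos": fit_class_model(["a", "a", "b"], alphabet),
    "neg": fit_class_model(["c", "c", "b"], alphabet),
}
priors = {"pos": 0.5, "neg": 0.5}

label = classify("a", models, priors)
l1 = l1_distance(models["pos"], models["neg"])
kl = kl_divergence(models["pos"], models["neg"])
```

In this toy setting the two measures are related by Pinsker's inequality, which states that the squared L1 distance is at most twice the KL-divergence (in nats). That is one sense in which a guarantee under L1 distance is the weaker of the two.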

Topics:
LB, QA

OAI identifier:
oai:wrap.warwick.ac.uk:2373

Provided by:
Warwick Research Archives Portal Repository

Downloaded from
http://wrap.warwick.ac.uk/2373/1/WRAP_THESIS_Palmer_2008.pdf
