Pattern classification via unsupervised learners

By Nicholas James Palmer

Abstract

We consider classification problems in a variant of the Probably Approximately Correct (PAC)-learning framework, in which an unsupervised learner creates a discriminant function over each class, and each observation is given the label of the learner that returns the highest value for that observation. Consideration is given to whether this approach gains significant advantage over traditional discriminant techniques.

It is shown that PAC-learning distributions over class labels under L1 distance or KL-divergence implies PAC classification in this framework. We give bounds on the regret associated with the resulting classifier, taking into account the possibility of variable misclassification penalties. We demonstrate the advantage of estimating the a posteriori probability distributions over class labels in the setting of Optical Character Recognition.

We show that unsupervised learners can be used to learn a class of probabilistic concepts (stochastic rules denoting the probability that an observation has a positive label in a 2-class setting). This demonstrates a situation where unsupervised learners can be used even when it is hard to learn distributions over class labels; in this case the discriminant functions do not estimate the class probability densities.

We use a standard state-merging technique to PAC-learn a class of probabilistic automata and show that by learning the distribution over outputs under the weaker L1 distance rather than KL-divergence we are able to learn without knowledge of the expected length of an output. It is also shown that for a restricted class of these automata learning under L1 distance is equivalent to learning under KL-divergence.
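
The classification scheme described in the abstract, one discriminant function per class with the label chosen by the highest value, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the 1-D Gaussian density model, the prior weighting, and the names `fit_class_models`, `discriminant`, and `classify` are all assumptions made for illustration. The point is only that each class model is fitted from that class's observations alone, and the labels are compared only at prediction time.

```python
import math
from collections import defaultdict


def fit_class_models(samples):
    """Fit one model per class, each using only that class's observations.

    Assumption: each class is modelled as a 1-D Gaussian. `samples` is a
    list of (x, label) pairs.
    """
    by_label = defaultdict(list)
    for x, label in samples:
        by_label[label].append(x)

    total = len(samples)
    models = {}
    for label, xs in by_label.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        var = max(var, 1e-6)  # guard against a degenerate zero variance
        prior = len(xs) / total  # empirical class frequency
        models[label] = (mean, var, prior)
    return models


def discriminant(model, x):
    """Discriminant value for one class: prior times estimated density at x."""
    mean, var, prior = model
    density = math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return prior * density


def classify(models, x):
    """Label x with the class whose discriminant function is largest."""
    return max(models, key=lambda label: discriminant(models[label], x))


# Toy data (hypothetical): two well-separated classes on the real line.
toy_samples = [(0.0, "a"), (0.2, "a"), (-0.1, "a"),
               (5.0, "b"), (5.3, "b"), (4.8, "b")]
toy_models = fit_class_models(toy_samples)
print(classify(toy_models, 0.1), classify(toy_models, 5.1))
```

With variable misclassification penalties, the final `max` would instead be replaced by a choice minimising expected cost under the estimated class probabilities; that variant is omitted here.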

Topics: LB, QA
OAI identifier: oai:wrap.warwick.ac.uk:2373
