
Machine learning via transitions

Abstract

This thesis presents a clear conceptual basis for the theoretical study of machine learning problems. Machine learning methods provide a means to automate the discovery of relationships in data sets. A relationship between quantities X and Y allows the prediction of one quantity given information about the other. It is these relationships that we make the central object of study; we call them transitions. A transition from a set X to a set Y is a function from X into the probability distributions on Y.

Beginning with this simple notion, the thesis proceeds as follows. Using tools from statistical decision theory, we develop an abstract language for quantifying the information present in a transition. We then attack the problem of generalized supervision: the learning of classifiers from non-ideal data, an important example of which is the learning of classifiers from noisily labelled data. We demonstrate the virtues of our abstract treatment by producing generic methods for solving these problems, together with generic upper bounds for our methods and lower bounds for any method that attempts to solve them.

As a result of our study of generalized supervision, we obtain means to construct procedures that are robust to certain forms of corruption. We explore, in detail, procedures for learning classifiers that are robust to the effects of symmetric label noise. The result is a classification algorithm that is easier to understand, implement and parallelize than standard kernel-based classification schemes, such as the support vector machine and logistic regression. Furthermore, we demonstrate the uniqueness of this method.

Finally, we show how many feature learning schemes can be understood via our language. We present well-motivated objectives for the task of learning features from unlabelled data, before showing how many standard feature learning methods (such as PCA, sparse coding, auto-encoders and so on) can be seen as minimizing surrogates to our objective functions.
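To make the central definition concrete, the following is a minimal illustrative sketch (not the thesis's formalism) of a transition as a function from a set into probability distributions on another set. It uses symmetric label noise as the example: each label in {-1, +1} is flipped with probability sigma. The function names and the dictionary representation of a distribution are assumptions made for illustration only.

```python
import random

def symmetric_noise_transition(sigma):
    """Return a transition T from {-1, +1} to distributions over {-1, +1}:
    T(y) keeps the label y with probability 1 - sigma and flips it with
    probability sigma (symmetric label noise)."""
    def T(y):
        return {y: 1.0 - sigma, -y: sigma}
    return T

def sample(dist, rng=None):
    """Draw one outcome from a distribution given as {outcome: probability}."""
    rng = rng or random.Random(0)
    outcomes, probs = zip(*dist.items())
    return rng.choices(outcomes, weights=probs, k=1)[0]

T = symmetric_noise_transition(0.2)
print(T(+1))  # {1: 0.8, -1: 0.2}
```

A deterministic relationship is recovered as the special case where each T(y) puts all its mass on a single outcome.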
