
    Robust semi-supervised learning: projections, limits & constraints

    In many domains of science and society, the amount of data being gathered is increasing rapidly. To estimate the input-output relationships that are often of interest, supervised learning techniques rely on a specific type of data: labeled examples, for which we know both the input and the outcome. The problem of semi-supervised learning is how to use the increasingly abundant unlabeled examples, whose outcomes are unknown, to improve supervised learning methods. This thesis is concerned with the question of whether and how such improvements are possible in a "robust", or safe, way: can we guarantee that these methods do not lead to worse performance than the supervised solution?

    We show that for some supervised classifiers, most notably the least squares classifier, semi-supervised adaptations can be constructed for which this non-degradation in performance can indeed be guaranteed, in terms of the surrogate loss used by the classifier. Since these guarantees are given in terms of the surrogate loss, we explore why this is a useful criterion for evaluating performance. We then prove that semi-supervised versions with strict non-degradation guarantees are not possible for a large class of commonly used supervised classifiers. Other topics covered in the thesis include optimistic learning, the peaking phenomenon and reproducibility.
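    The least squares case can be illustrated with a toy sketch (an illustrative construction, not necessarily the thesis's exact estimator): the supervised solution is projected onto the set of solutions reachable by some soft labeling of the unlabeled points. Because the all-data least squares solution lies in that set, the projection cannot increase the squared surrogate loss over labeled and unlabeled data combined, whatever the hidden labels are.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian classes, few labeled points, many unlabeled.
n_lab, n_unl, d = 8, 40, 2
X_lab = np.vstack([rng.normal(-1, 1, (n_lab // 2, d)),
                   rng.normal(+1, 1, (n_lab // 2, d))])
y_lab = np.r_[np.zeros(n_lab // 2), np.ones(n_lab // 2)]
X_unl = np.vstack([rng.normal(-1, 1, (n_unl // 2, d)),
                   rng.normal(+1, 1, (n_unl // 2, d))])
y_unl = np.r_[np.zeros(n_unl // 2), np.ones(n_unl // 2)]  # hidden truth

X = np.vstack([X_lab, X_unl])
M = X.T @ X                       # metric induced by all inputs

# Supervised least squares classifier, trained on labeled data only.
w_sup = np.linalg.lstsq(X_lab, y_lab, rcond=None)[0]

# Constraint set: solutions w(q) = pinv(X) @ [y_lab; q] reachable by
# some soft labeling q in [0, 1]^n_unl of the unlabeled points.
A = np.linalg.pinv(X)
A_u = A[:, n_lab:]

def w_of(q):
    return A @ np.r_[y_lab, q]

# Project w_sup onto this convex set in the M-norm via projected
# gradient descent over q (box constraints handled by clipping).
q = np.full(n_unl, 0.5)
H = A_u.T @ M @ A_u
step = 1.0 / np.linalg.eigvalsh(H).max()
for _ in range(20000):
    q = np.clip(q - step * (A_u.T @ M @ (w_of(q) - w_sup)), 0.0, 1.0)
w_semi = w_of(q)

def surrogate_loss(w):
    """Squared surrogate loss on all data, using the true labels."""
    return np.mean((X @ w - np.r_[y_lab, y_unl]) ** 2)

print(surrogate_loss(w_sup), surrogate_loss(w_semi))
```

    The guarantee holds because projection onto a convex set cannot move the estimate further from any point of that set, including the least squares solution on all the data with its true labels.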

    Sample Compressed PAC-Bayesian Bounds and learning algorithms

    In classification, sample-compression algorithms are algorithms that make use of the available training data to construct the set of possible predictors. If the data belong to only a small subspace of the space of all "possible" data, such algorithms have the interesting ability of considering only the predictors that distinguish examples in our area of interest. This is in contrast with non-sample-compressed algorithms, which have to fix the set of predictors before seeing the training data. The Support Vector Machine (SVM) is a very successful learning algorithm that can be viewed as a sample-compression learning algorithm. Despite its success, the SVM is currently limited by the fact that its similarity function must be a symmetric positive semi-definite kernel. This limitation by design makes the SVM hard to apply in cases where one would like to use an arbitrary similarity measure between input examples. PAC-Bayesian theory has been shown to be a good starting point for designing learning algorithms. In this thesis, we propose a PAC-Bayesian sample-compression approach to kernel methods that can accommodate any bounded similarity function. We show that the support vector classifier is actually a particular case of sample-compressed classifiers known as majority votes of sample-compressed classifiers. We propose two different groups of PAC-Bayesian risk bounds for majority votes of sample-compressed classifiers. The first group of proposed bounds depends on the KL divergence between the prior and the posterior over the set of sample-compressed classifiers. The second group has the unusual property of having no KL divergence term when the posterior is aligned with the prior in a precise way that we define in this thesis. Finally, for each bound, we provide a new learning algorithm that consists of finding the predictor that minimizes the bound. The computation times of these algorithms are comparable with those of algorithms like the SVM, and we show empirically that the proposed algorithms are very competitive with the SVM.
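    The KL-dependent family of bounds can be made concrete with the standard PAC-Bayes-kl computation (a generic sketch, not this thesis's exact bounds): given the empirical Gibbs risk, the KL divergence between posterior and prior, the sample size m and confidence delta, the risk bound is obtained by numerically inverting the binary kl divergence.

```python
import math

def kl_bin(q, p):
    """Binary KL divergence kl(q || p) between Bernoulli parameters."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def pac_bayes_kl_bound(emp_risk, kl_qp, m, delta):
    """Upper bound on the true Gibbs risk: the largest r >= emp_risk
    with kl(emp_risk || r) <= rhs, found by bisection. The right-hand
    side follows the usual PAC-Bayes-kl theorem."""
    rhs = (kl_qp + math.log(2 * math.sqrt(m) / delta)) / m
    lo, hi = emp_risk, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if kl_bin(emp_risk, mid) <= rhs:
            lo = mid       # bound lies above mid
        else:
            hi = mid       # bound lies below mid
    return hi              # return the conservative (upper) endpoint

# Example: empirical Gibbs risk 0.10, KL(Q||P) = 5, m = 1000, delta = 0.05.
print(pac_bayes_kl_bound(0.10, 5.0, 1000, 0.05))
```

    The bound tightens as the sample grows and loosens as the posterior moves away from the prior, which is what motivates the thesis's second family of bounds, where the KL term vanishes for aligned posteriors.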

    Semi-generative modelling: learning with cause and effect features

    We consider a case of covariate shift where prior causal inference or expert knowledge has identified some features as effects of the target variable, and show how this setting, when analysed from a causal perspective, gives rise to a semi-generative modelling framework: the cause features X_cau are conditioned on, while the target Y and the effect features X_eff are modelled jointly as P(Y, X_eff | X_cau).
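    A minimal sketch of what such a factorisation can look like, under illustrative assumptions that are not taken from the paper: a logistic model for the discriminative factor P(Y | X_cau) and class-conditional Gaussians for the generative factor P(X_eff | Y), combined by Bayes' rule at prediction time.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x_cau = rng.normal(size=n)                       # cause feature
p_y = 1 / (1 + np.exp(-2 * x_cau))               # Y depends on its cause
y = rng.random(n) < p_y
x_eff = rng.normal(loc=np.where(y, 1.0, -1.0))   # effect generated by Y

# Factor 1, P(Y | x_cau): minimal logistic regression by gradient descent.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(w * x_cau + b)))
    g = p - y
    w -= 0.1 * np.mean(g * x_cau)
    b -= 0.1 * np.mean(g)

# Factor 2, P(x_eff | Y): class-conditional Gaussian densities.
mu = np.array([x_eff[~y].mean(), x_eff[y].mean()])
sd = np.array([x_eff[~y].std(), x_eff[y].std()])

def predict_proba(xc, xe):
    """P(Y | x_cau, x_eff) proportional to P(Y | x_cau) * P(x_eff | Y)."""
    prior1 = 1 / (1 + np.exp(-(w * xc + b)))
    lik = np.exp(-0.5 * ((xe[:, None] - mu) / sd) ** 2) / sd
    joint = np.stack([(1 - prior1) * lik[:, 0], prior1 * lik[:, 1]], axis=1)
    return joint / joint.sum(axis=1, keepdims=True)

probs = predict_proba(x_cau, x_eff)
print("training accuracy:", np.mean((probs[:, 1] > 0.5) == y))
```

    The point of the split is that under covariate shift only the distribution of the cause features changes, so the two fitted factors remain valid and only the conditioning input moves.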