Search CORE

35,022 research outputs found

The Sample Complexity of Learning Linear Predictors with the Squared Loss

Author: Ohad Shamir
Publication venue
Publication date: 01/01/2015
Field of study

Abstract We provide a tight sample complexity bound for learning bounded-norm linear predictors with respect to the squared loss. Our focus is on an agnostic PAC-style setting, where no assumptions are made on the data distribution beyond boundedness. This contrasts with existing results in the literature, which rely on other distributional assumptions, refer to specific parameter settings, or use other performance measures

CiteSeerX

Learning with Clustering Structure

Author: Bach Francis
d'Aspremont Alexandre
Fogel Fajwel
Roulet Vincent
Publication venue
Publication date: 19/09/2016
Field of study

We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation. The problem of clustering features arises naturally in text classification for instance, to reduce dimensionality by grouping words together and identify synonyms. The sample clustering problem on the other hand, applies to multiclass problems where we are allowed to make multiple predictions and the performance of the best answer is recorded. We derive a unified optimization formulation highlighting the common structure of these problems and produce algorithms whose core iteration complexity amounts to a k-means clustering step, which can be approximated efficiently. We extend these results to combine sparsity and clustering constraints, and develop a new projection algorithm on the set of clustered sparse vectors. We prove convergence of our algorithms on random instances, based on a union of subspaces interpretation of the clustering structure. Finally, we test the robustness of our methods on artificial data sets as well as real data extracted from movie reviews.Comment: Completely rewritten. New convergence proofs in the clustered and sparse clustered case. New projection algorithm on sparse clustered vector

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Reconciling modern machine learning practice and the bias-variance trade-off

Author: Belkin Mikhail
Hsu Daniel
Ma Siyuan
Mandal Soumik
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 10/09/2019
Field of study

Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in the modern machine learning practice. The bias-variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express underlying structure in data, simple enough to avoid fitting spurious patterns. However, in the modern practice, very rich models such as neural networks are trained to exactly fit (i.e., interpolate) the data. Classically, such models would be considered over-fit, and yet they often obtain high accuracy on test data. This apparent contradiction has raised questions about the mathematical foundations of machine learning and their relevance to practitioners. In this paper, we reconcile the classical understanding and the modern practice within a unified performance curve. This "double descent" curve subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance. We provide evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets, and we posit a mechanism for its emergence. This connection between the performance and the structure of machine learning models delineates the limits of classical analyses, and has implications for both the theory and practice of machine learning

arXiv.org e-Print Archive

Supervised Learning with Similarity Functions

Author: Jain Prateek
Kar Purushottam
Publication venue
Publication date: 22/10/2012
Field of study

We address the problem of general supervised learning when data can only be accessed through an (indefinite) similarity function between data points. Existing work on learning with indefinite kernels has concentrated solely on binary/multi-class classification problems. We propose a model that is generic enough to handle any supervised learning task and also subsumes the model previously proposed for classification. We give a "goodness" criterion for similarity functions w.r.t. a given supervised learning task and then adapt a well-known landmarking technique to provide efficient algorithms for supervised learning using "good" similarity functions. We demonstrate the effectiveness of our model on three important super-vised learning problems: a) real-valued regression, b) ordinal regression and c) ranking where we show that our method guarantees bounded generalization error. Furthermore, for the case of real-valued regression, we give a natural goodness definition that, when used in conjunction with a recent result in sparse vector recovery, guarantees a sparse predictor with bounded generalization error. Finally, we report results of our learning algorithms on regression and ordinal regression tasks using non-PSD similarity functions and demonstrate the effectiveness of our algorithms, especially that of the sparse landmark selection algorithm that achieves significantly higher accuracies than the baseline methods while offering reduced computational costs.Comment: To appear in the proceedings of NIPS 2012, 30 page

arXiv.org e-Print Archive

CiteSeerX

A PAC-Bayesian bound for Lifelong Learning

Author: Jebara Tony
Lampert Christoph
Pentina Anastasia
Xing Eric
Publication venue
Publication date: 01/01/2014
Field of study

Transfer learning has received a lot of attention in the machine learning community over the last years, and several effective algorithms have been developed. However, relatively little is known about their theoretical properties, especially in the setting of lifelong learning, where the goal is to transfer information to tasks for which no data have been observed so far. In this work we study lifelong learning from a theoretical perspective. Our main result is a PAC-Bayesian generalization bound that offers a unified view on existing paradigms for transfer learning, such as the transfer of parameters or the transfer of low-dimensional representations. We also use the bound to derive two principled lifelong learning algorithms, and we show that these yield results comparable with existing methods.Comment: to appear at ICML 201

arXiv.org e-Print Archive

CiteSeerX

IST Austria: PubRep (Institute of Science and Technology)