704 research outputs found
Learning with Spectral Kernels and Heavy-Tailed Data
Two ubiquitous aspects of large-scale data analysis are that the data often
have heavy-tailed properties and that diffusion-based or spectral-based methods
are often used to identify and extract structure of interest. Perhaps
surprisingly, popular distribution-independent methods such as those based on
the VC dimension fail to provide nontrivial results for even simple learning
problems such as binary classification in these two settings. In this paper, we
develop distribution-dependent learning methods that can be used to provide
dimension-independent sample complexity bounds for the binary classification
problem in these two popular settings. In particular, we provide bounds on the
sample complexity of maximum margin classifiers when the magnitude of the
entries in the feature vector decays according to a power law and also when
learning is performed with the so-called Diffusion Maps kernel. Both of these
results rely on bounding the annealed entropy of gap-tolerant classifiers in a
Hilbert space. We provide such a bound, and we demonstrate that our proof
technique generalizes to the case when the margin is measured with respect to
more general Banach space norms. The latter result is of potential interest in
cases where modeling the relationship between data elements as a dot product in
a Hilbert space is too restrictive.
Comment: 21 pages. Substantially revised and extended relative to the first
version.
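For readers who have not encountered the Diffusion Maps kernel mentioned in this abstract, the sketch below shows one common construction (Gaussian affinities, graph normalization, a diffusion time t); the function name and the parameter choices `epsilon`, `t`, and `n_components` are illustrative assumptions, not prescriptions from the paper.

```python
import numpy as np

def diffusion_maps_kernel(X, epsilon=1.0, t=2, n_components=10):
    """A common Diffusion Maps construction (sketch only, not the paper's exact recipe).

    Gaussian affinities are normalized into a Markov transition matrix; its leading
    eigenvectors, scaled by eigenvalues**t, give a diffusion embedding whose Gram
    matrix is returned as the kernel. t should be a positive integer.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # pairwise squared distances
    W = np.exp(-sq / epsilon)                                    # Gaussian affinity matrix
    deg = W.sum(axis=1)                                          # node degrees
    A = W / np.sqrt(np.outer(deg, deg))                          # symmetric normalization
    vals, vecs = np.linalg.eigh(A)                               # spectrum of the normalized matrix
    idx = np.argsort(vals)[::-1][:n_components]                  # keep the leading eigenpairs
    lam, V = vals[idx], vecs[:, idx]
    psi = V / np.sqrt(deg)[:, None]                              # right eigenvectors of the Markov matrix
    embedding = psi * (lam ** t)                                 # diffusion map at time t
    return embedding @ embedding.T                               # kernel = Gram matrix of embeddings

# Example: a 100 x 100 kernel matrix on random points in R^5
K = diffusion_maps_kernel(np.random.randn(100, 5), epsilon=2.0, t=3)
```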
Convex Risk Minimization and Conditional Probability Estimation
This paper proves, in very general settings, that convex risk minimization is
a procedure to select a unique conditional probability model determined by the
classification problem. Unlike most previous work, we give results that are
general enough to include cases in which no minimum exists, as occurs
typically, for instance, with standard boosting algorithms. Concretely, we
first show that any sequence of predictors minimizing convex risk over the
source distribution will converge to this unique model when the class of
predictors is linear (but potentially of infinite dimension). Secondly, we show
the same result holds for \emph{empirical} risk minimization whenever this
class of predictors is finite dimensional, where the essential technical
contribution is a norm-free generalization bound.
Comment: To appear, COLT 201
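As a concrete instance of the connection between convex risk minimization and conditional probability estimation (a textbook fact included here only for orientation, not the paper's general result), the pointwise minimizer of the logistic risk recovers the conditional class probability through the logit link:

```latex
% Pointwise logistic-risk minimization with \eta(x) = P(Y = +1 \mid X = x):
\begin{aligned}
f^\star(x) &= \arg\min_{z \in \mathbb{R}} \;
  \eta(x)\,\ln\!\big(1 + e^{-z}\big) + \big(1 - \eta(x)\big)\,\ln\!\big(1 + e^{z}\big)
  \;=\; \ln\frac{\eta(x)}{1 - \eta(x)}, \\
\eta(x) &= \frac{1}{1 + e^{-f^\star(x)}} .
\end{aligned}
```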
Bayesian Inference with Posterior Regularization and applications to Infinite Latent SVMs
Existing Bayesian models, especially nonparametric Bayesian methods, rely on
specially conceived priors to incorporate domain knowledge for discovering
improved latent representations. While priors can affect posterior
distributions through Bayes' rule, imposing posterior regularization is
arguably more direct and in some cases more natural and general. In this paper,
we present regularized Bayesian inference (RegBayes), a novel computational
framework that performs posterior inference with a regularization term on the
desired post-data posterior distribution under an information-theoretic
formulation. RegBayes is more flexible than the procedure that elicits expert
knowledge via priors, and it covers both directed Bayesian networks and
undirected Markov networks whose Bayesian formulation results in hybrid chain
graph models. When the regularization is induced from a linear operator on the
posterior distributions, such as the expectation operator, we present a general
convex-analysis theorem to characterize the solution of RegBayes. Furthermore,
we present two concrete examples of RegBayes, infinite latent support vector
machines (iLSVM) and multi-task infinite latent support vector machines
(MT-iLSVM), which explore the large-margin idea in combination with a
nonparametric Bayesian model for discovering predictive latent features for
classification and multi-task learning, respectively. We present efficient
inference methods and report empirical studies on several benchmark datasets,
which appear to demonstrate the merits inherited from both large-margin
learning and Bayesian nonparametrics. Such results were not available until
now, and they contribute to pushing forward the interface between these two important
subfields, which have largely been treated as isolated in the community.
Comment: 49 pages, 11 figures
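Schematically, and with details that vary by model, the RegBayes objective can be read as a variational problem over post-data distributions: stay close (in KL) to the usual Bayesian posterior while paying a regularization cost, e.g. large-margin slack, for violating posterior constraints. The display below is a generic sketch of that shape, with notation chosen here rather than quoted from the paper.

```latex
% Schematic RegBayes problem: q(M) ranges over distributions on the model /
% latent variables M, D is the data, p(M | D) is the ordinary Bayesian
% posterior, \xi are slack variables penalized by U, and P_post(\xi) is the
% constrained set of feasible post-data posteriors.
\inf_{q(M),\, \xi} \;\;
\mathrm{KL}\big(q(M)\,\big\|\,p(M \mid \mathcal{D})\big) \;+\; U(\xi)
\qquad \text{s.t.} \qquad q(M) \in \mathcal{P}_{\mathrm{post}}(\xi).
```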
Multi-task Learning in Vector-valued Reproducing Kernel Banach Spaces with the $\ell^1$ Norm
Targeting sparse multi-task learning, we consider regularization models
with an $\ell^1$ penalty on the coefficients of kernel functions. In order to
provide a kernel method for this model, we construct a class of vector-valued
reproducing kernel Banach spaces with the $\ell^1$ norm. The notion of
multi-task admissible kernels is proposed so that the constructed spaces could
have desirable properties including the crucial linear representer theorem.
Such kernels are related to the boundedness of Lebesgue constants for an associated kernel
interpolation problem. We study the Lebesgue constant of multi-task kernels
and provide examples of admissible kernels. Furthermore, we present numerical
experiments for both synthetic data and real-world benchmark data to
demonstrate the advantages of the proposed construction and regularization
models.
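For orientation, the linear representer theorem referred to above lets the learned vector-valued function be expanded over kernel sections at the training inputs, so the $\ell^1$ penalty acts directly on the expansion coefficients; the display is a generic sketch in our own notation, not the paper's exact statement.

```latex
% K is a (multi-task) matrix-valued kernel, x_1, ..., x_n the training inputs,
% c_1, ..., c_n coefficient vectors, L a loss, and \lambda > 0 a regularization weight.
f(\cdot) \;=\; \sum_{j=1}^{n} K(\cdot, x_j)\, c_j ,
\qquad
\min_{c_1, \dots, c_n} \;\; \sum_{i=1}^{n} L\big(f(x_i), y_i\big)
\;+\; \lambda \sum_{j=1}^{n} \lVert c_j \rVert_1 .
```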
Solving $\ell^p$-norm regularization with tensor kernels
In this paper, we discuss how a suitable family of tensor kernels can be used
to efficiently solve nonparametric extensions of $\ell^p$-regularized learning
methods. Our main contribution is proposing a fast dual algorithm and showing
that it allows the problem to be solved efficiently. Our results contrast with recent
findings suggesting that kernel methods cannot be extended beyond the Hilbert setting.
Numerical experiments confirm the effectiveness of the method.
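For context, the generic $\ell^p$-regularized learning problem has the shape below (primal form only; the paper's contribution concerns a fast dual algorithm and the tensor-kernel machinery that makes the nonparametric version tractable, which is not reproduced here).

```latex
% Generic \ell^p-regularized problem over a feature map \Phi; \ell is a convex
% loss and values of p near 1 promote sparsity of w.
\min_{w} \;\; \frac{1}{n} \sum_{i=1}^{n}
\ell\big(\langle w, \Phi(x_i) \rangle,\, y_i\big)
\;+\; \frac{\lambda}{p}\, \lVert w \rVert_p^{p},
\qquad p \in (1, 2].
```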
Online Learning via Sequential Complexities
We consider the problem of sequential prediction and provide tools to study
the minimax value of the associated game. Classical statistical learning theory
provides several useful complexity measures to study learning with i.i.d. data.
Our proposed sequential complexities can be seen as extensions of these
measures to the sequential setting. The developed theory is shown to yield
precise learning guarantees for the problem of sequential prediction. In
particular, we show necessary and sufficient conditions for online learnability
in the setting of supervised learning. Several examples show the utility of our
framework: we can establish learnability without having to exhibit an explicit
online learning algorithm.
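One central example of such a sequential complexity is the sequential Rademacher complexity, which replaces i.i.d. samples with binary trees of inputs; a standard form of the definition follows.

```latex
% Sequential Rademacher complexity of a class F. The outer supremum is over
% X-valued binary trees x of depth n, \epsilon_1, ..., \epsilon_n are i.i.d.
% Rademacher signs, and x_t(\epsilon) is the node reached by the path \epsilon_{1:t-1}.
\mathfrak{R}_n(\mathcal{F})
\;=\;
\sup_{\mathbf{x}} \;
\mathbb{E}_{\epsilon}
\left[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{t=1}^{n}
\epsilon_t \, f\big(\mathbf{x}_t(\epsilon)\big) \right].
```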
Empirical margin distributions and bounding the generalization error of combined classifiers
We prove new probabilistic upper bounds on generalization error of complex
classifiers that are combinations of simple classifiers. Such combinations
could be implemented by neural networks or by voting methods of combining the
classifiers, such as boosting and bagging. The bounds are in terms of the
empirical distribution of the margin of the combined classifier. They are based
on the methods of the theory of Gaussian and empirical processes (comparison
inequalities, symmetrization method, concentration inequalities) and they
improve previous results of Bartlett (1998) on bounding the generalization
error of neural networks in terms of l_1-norms of the weights of neurons and of
Schapire, Freund, Bartlett and Lee (1998) on bounding the generalization error
of boosting. We also obtain rates of convergence in the Lévy distance of the empirical
margin distribution to the true margin distribution, uniformly over the classes
of classifiers, and prove the optimality of these rates.
Comment: 35 pages, 1 figure
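The flavor of such results can be conveyed by a schematic margin bound (a generic form for orientation, not the paper's sharpest statement): the true error of the combined classifier is controlled by the fraction of training points with margin at most δ plus a complexity term that shrinks with δ and the sample size.

```latex
% Schematic margin bound: f is the combined classifier, n the sample size,
% \delta > 0 a margin parameter, R_n(F) a Rademacher-type complexity of the
% class, and the bound holds with probability at least 1 - \alpha.
P\big( y f(x) \le 0 \big)
\;\le\;
\frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\, y_i f(x_i) \le \delta \,\}
\;+\; \frac{C \, R_n(\mathcal{F})}{\delta}
\;+\; \sqrt{\frac{\log(1/\alpha)}{2n}} .
```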
Non-asymptotic Analysis of $\ell^1$-norm Support Vector Machines
Support Vector Machines (SVMs) with an $\ell^1$ penalty have become a standard tool in the
analysis of high-dimensional classification problems with sparsity constraints
in many applications, including bioinformatics and signal processing. Although
SVMs have been studied intensively in the literature, this paper provides, to our
knowledge, the first non-asymptotic results on the performance of the $\ell^1$-SVM in the
identification of sparse classifiers. We show that a $d$-dimensional $s$-sparse
classification vector can be (with high probability) well approximated from
only $O(s \log d)$ Gaussian trials. The methods used in the proof include
concentration of measure and probability in Banach spaces.
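A toy experiment in the spirit of this setting (sparse true classifier, Gaussian design) can be run with an off-the-shelf $\ell^1$-penalized linear SVM; the scikit-learn call below and all parameter values (n, d, s, C) are illustrative choices of ours, not the paper's experimental protocol.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, d, s = 400, 1000, 5                         # Gaussian trials, ambient dimension, sparsity

w_true = np.zeros(d)                           # s-sparse, unit-norm classification vector
w_true[rng.choice(d, size=s, replace=False)] = 1.0 / np.sqrt(s)

X = rng.standard_normal((n, d))                # Gaussian design
y = np.sign(X @ w_true)                        # noiseless labels in {-1, +1}

# l1-penalized linear SVM (squared hinge, solved in the primal) gives a sparse estimate
clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.5, max_iter=5000)
clf.fit(X, y)

w_hat = clf.coef_.ravel()
w_hat = w_hat / (np.linalg.norm(w_hat) + 1e-12)   # compare directions only
print("direction error:", np.linalg.norm(w_hat - w_true))
print("nonzeros in w_hat:", np.count_nonzero(w_hat))
```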
On the Sample Complexity of Predictive Sparse Coding
The goal of predictive sparse coding is to learn a representation of examples
as sparse linear combinations of elements from a dictionary, such that a
learned hypothesis linear in the new representation performs well on a
predictive task. Predictive sparse coding algorithms recently have demonstrated
impressive performance on a variety of supervised tasks, but their
generalization properties have not been studied. We establish the first
generalization error bounds for predictive sparse coding, covering two
settings: 1) the overcomplete setting, where the number of features k exceeds
the original dimensionality d; and 2) the high or infinite-dimensional setting,
where only dimension-free bounds are useful. Both learning bounds intimately
depend on stability properties of the learned sparse encoder, as measured on
the training sample. Consequently, we first present a fundamental stability
result for the LASSO, a result characterizing the stability of the sparse codes
with respect to perturbations to the dictionary. In the overcomplete setting,
we present an estimation error bound that decays as $\tilde{O}(\sqrt{dk/m})$ with
respect to d and k. In the high or infinite-dimensional setting, we show a
dimension-free bound that is $\tilde{O}(\sqrt{k^2 s/m})$ with respect to k and
s, where s is an upper bound on the number of non-zeros in the sparse code for
any training data point.
Comment: Sparse Coding Stability Theorem from version 1 has been relaxed
considerably using a new notion of coding margin. Old Sparse Coding Stability
Theorem still in new version, now as Theorem 2. Presentation of all proofs
simplified/improved considerably. Paper reorganized. Empirical analysis
showing the new coding margin is non-trivial on a real dataset.
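As a rough illustration of the pipeline analyzed here (dictionary, LASSO-based sparse encoder, linear hypothesis on the codes), the sketch below uses scikit-learn components; note that it learns the dictionary and the predictor separately, whereas predictive sparse coding as studied in the paper couples them, and all hyperparameters are illustrative.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
m, d, k = 300, 20, 40                  # examples, input dimension, dictionary size (overcomplete: k > d)

X = rng.standard_normal((m, d))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # toy labels, for illustration only

# Learn a dictionary and LASSO-based sparse codes for each example
dico = DictionaryLearning(n_components=k, alpha=0.5,
                          transform_algorithm="lasso_lars",
                          transform_alpha=0.5, random_state=0)
codes = dico.fit_transform(X)                      # shape (m, k), sparse rows

# Linear hypothesis learned on the sparse representation
clf = LogisticRegression(max_iter=1000).fit(codes, y)
print("training accuracy on codes:", clf.score(codes, y))
print("average nonzeros per code:", np.count_nonzero(codes, axis=1).mean())
```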
A Bayes consistent 1-NN classifier
We show that a simple modification of the 1-nearest neighbor classifier
yields a strongly Bayes consistent learner. Prior to this work, the only
strongly Bayes consistent proximity-based method was the k-nearest neighbor
classifier, for k growing appropriately with sample size. We will argue that a
margin-regularized 1-NN enjoys considerable statistical and algorithmic
advantages over the k-NN classifier. These include user-friendly finite-sample
error bounds, as well as time- and memory-efficient learning and test-point
evaluation algorithms with a principled speed-accuracy tradeoff. Encouraging
empirical results are reported.
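The toy sketch below illustrates the general idea of margin-regularized nearest-neighbor prediction, pruning training points that lie within a margin gamma of an opposite-label point before fitting an ordinary 1-NN; it is an illustration of the idea only, not the compression-based algorithm or the guarantees of the paper, and gamma here is chosen by hand.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import KNeighborsClassifier

def margin_pruned_1nn(X, y, gamma):
    """Toy margin-regularized 1-NN: drop points within distance gamma of an
    opposite-label point, then fit 1-NN on the survivors.
    (Illustration only; in practice gamma would be tuned, and one should guard
    against pruning away the entire sample.)"""
    D = pairwise_distances(X)
    opposite = y[None, :] != y[:, None]          # mask of cross-label pairs
    d_opp = np.where(opposite, D, np.inf)        # distance to nearest opposite-label point
    keep = d_opp.min(axis=1) > gamma             # points with margin larger than gamma
    clf = KNeighborsClassifier(n_neighbors=1)
    clf.fit(X[keep], y[keep])
    return clf

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 2)), rng.normal(+1.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
clf = margin_pruned_1nn(X, y, gamma=0.5)
print("training accuracy:", clf.score(X, y))
```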