On the Sample Complexity of Subspace Learning
A large number of algorithms in machine learning, from principal component
analysis (PCA), and its non-linear (kernel) extensions, to more recent spectral
embedding and support estimation methods, rely on estimating a linear subspace
from samples. In this paper we introduce a general formulation of this problem
and derive novel learning error estimates. Our results rely on natural
assumptions on the spectral properties of the covariance operator associated to
the data distribution, and hold for a wide class of metrics between
subspaces. As special cases, we discuss sharp error estimates for the
reconstruction properties of PCA and spectral support estimation. Key to our
analysis is an operator theoretic approach that has broad applicability to
spectral learning methods.
Comment: Extended version of conference paper.
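As a purely illustrative companion to this setting (not the authors' method), the minimal sketch below estimates a k-dimensional principal subspace from samples and evaluates its reconstruction error on held-out data; the toy data model, the dimensions n, d, k, and the noise level are assumptions made only for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed toy model: samples lie near a k-dimensional subspace, plus small isotropic noise.
    n, d, k = 500, 20, 3
    basis = np.linalg.qr(rng.standard_normal((d, k)))[0]          # true orthonormal basis
    X_train = rng.standard_normal((n, k)) @ basis.T + 0.1 * rng.standard_normal((n, d))
    X_test  = rng.standard_normal((n, k)) @ basis.T + 0.1 * rng.standard_normal((n, d))

    # PCA: estimate the subspace spanned by the top-k eigenvectors of the empirical covariance.
    cov = X_train.T @ X_train / n
    _, eigvecs = np.linalg.eigh(cov)
    P_hat = eigvecs[:, -k:]                                        # estimated orthonormal basis

    # Held-out reconstruction error, E ||x - P P^T x||^2, one of the quantities such bounds control.
    resid = X_test - X_test @ P_hat @ P_hat.T
    print("held-out reconstruction error:", np.mean(np.sum(resid ** 2, axis=1)))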
Estimates of the Approximation Error Using Rademacher Complexity: Learning Vector-Valued Functions
For certain families of multivariable vector-valued functions to be approximated, the accuracy of approximation schemes made up of linear combinations of computational units containing adjustable parameters is investigated. Upper bounds on the approximation error are derived that depend on the Rademacher complexities of the families. The estimates exploit possible relationships among the components of the multivariable vector-valued functions. All such components are approximated simultaneously, in such a way as to use, for a desired approximation accuracy, fewer computational units than are required by componentwise approximation. An application to -stage optimization problems is discussed.
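For reference, the bounds above are stated in terms of Rademacher complexity. The standard empirical definition for a class \(\mathcal{F}\) of real-valued functions on a fixed sample \(x_1, \dots, x_m\) (the paper works with a vector-valued generalization, whose exact form may differ) is

    \[
      \hat{\mathcal{R}}_m(\mathcal{F})
        = \mathbb{E}_{\sigma}\Big[\,\sup_{f \in \mathcal{F}} \frac{1}{m}\sum_{i=1}^{m} \sigma_i\, f(x_i)\Big],
      \qquad \sigma_1, \dots, \sigma_m \ \text{i.i.d. uniform on } \{-1, +1\}.
    \]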
On information plus noise kernel random matrices
Kernel random matrices have attracted a lot of interest in recent years, from
both practical and theoretical standpoints. Most of the theoretical work so far
has focused on the case where the data is sampled from a low-dimensional
structure. Very recently, the first results concerning kernel random matrices
with high-dimensional input data were obtained, in a setting where the data was
sampled from a genuinely high-dimensional structure---similar to standard
assumptions in random matrix theory. In this paper, we consider the case where
the data is of the type "information plus noise." In other words, each
observation is the sum of two independent elements: one sampled from a
"low-dimensional" structure, the signal part of the data, the other being
high-dimensional noise, normalized to not overwhelm but still affect the
signal. We consider two types of noise, spherical and elliptical. In the
spherical setting, we show that the spectral properties of kernel random
matrices can be understood from a new kernel matrix, computed only from the
signal part of the data, but using (in general) a slightly different kernel.
The Gaussian kernel has some special properties in this setting. The elliptical
setting, which is important from a robustness standpoint, is less prone to easy
interpretation.
Comment: Published at http://dx.doi.org/10.1214/10-AOS801 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
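To make the "information plus noise" model concrete, the sketch below (a toy illustration only; the dimensions, the spherical-noise scaling, the Gaussian kernel, and its bandwidth are assumptions for the example) builds the kernel matrix of signal-plus-noise observations and compares its leading eigenvalues with those of a kernel matrix computed from the signal part alone; the paper's result says the former behaves like a signal-only kernel matrix with, in general, a slightly different kernel.

    import numpy as np
    from scipy.spatial.distance import cdist

    rng = np.random.default_rng(1)

    n, d = 300, 400                                           # noise dimension comparable to sample size
    signal = rng.standard_normal((n, 2))                      # low-dimensional signal part
    noise = rng.standard_normal((n, d)) / np.sqrt(d)          # spherical noise with O(1) norm per row
    X = np.hstack([signal, np.zeros((n, d - 2))]) + noise     # observation = signal + noise

    def gaussian_kernel(A, B, bandwidth=1.0):
        return np.exp(-cdist(A, B, "sqeuclidean") / (2 * bandwidth ** 2))

    K_obs = gaussian_kernel(X, X)            # kernel matrix of the noisy observations
    K_sig = gaussian_kernel(signal, signal)  # kernel matrix of the signal part only

    # Compare the top eigenvalues; the theory predicts agreement up to a modified effective kernel.
    top5 = lambda K: np.sort(np.linalg.eigvalsh(K))[-5:][::-1]
    print("top eigenvalues, signal + noise:", top5(K_obs))
    print("top eigenvalues, signal only   :", top5(K_sig))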
The Sample Complexity of Dictionary Learning
A large set of signals can sometimes be described sparsely using a
dictionary, that is, every element can be represented as a linear combination
of few elements from the dictionary. Algorithms for various signal processing
applications, including classification, denoising and signal separation, learn
a dictionary from a set of signals to be represented. Can we expect that the
representation found by such a dictionary for a previously unseen example from
the same source will have L_2 error of the same magnitude as that for the
given examples? We assume signals are generated from a fixed distribution, and
study this question from a statistical learning theory perspective.
We develop generalization bounds on the quality of the learned dictionary for
two types of constraints on the coefficient selection, as measured by the
expected L_2 error in representation when the dictionary is used. For the case
of l_1 regularized coefficient selection we provide a generalization bound of
the order of O(sqrt(np log(m lambda)/m)), where n is the dimension, p is the
number of elements in the dictionary, lambda is a bound on the l_1 norm of the
coefficient vector and m is the number of samples, which complements existing
results. For the case of representing a new signal as a combination of at most
k dictionary elements, we provide a bound of the order O(sqrt(np log(m k)/m))
under an assumption on the level of orthogonality of the dictionary (low Babel
function). We further show that this assumption holds for most dictionaries in
high dimensions in a strong probabilistic sense. Our results further yield fast
rates of order 1/m as opposed to 1/sqrt(m) using localized Rademacher
complexity. We provide similar results in a general setting using kernels with
weak smoothness requirements.
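A minimal sketch of the quantity these bounds control, using scikit-learn for the l_1-regularized coefficient selection (the toy sparse data model, the dimensions, and the regularization value alpha are assumptions for illustration, not the paper's setup): learn a dictionary on a training sample, then compare the mean squared L_2 representation error on the training signals with that on fresh signals from the same distribution.

    import numpy as np
    from sklearn.decomposition import DictionaryLearning, sparse_encode

    rng = np.random.default_rng(2)

    # Assumed toy source: each signal is a sparse combination of hidden atoms plus small noise.
    n_dim, p_atoms, m_train, m_test, sparsity = 20, 30, 400, 400, 3
    D_true = rng.standard_normal((p_atoms, n_dim))
    D_true /= np.linalg.norm(D_true, axis=1, keepdims=True)

    def sample(m):
        codes = np.zeros((m, p_atoms))
        for row in codes:
            row[rng.choice(p_atoms, sparsity, replace=False)] = rng.standard_normal(sparsity)
        return codes @ D_true + 0.01 * rng.standard_normal((m, n_dim))

    X_train, X_test = sample(m_train), sample(m_test)

    # Learn a dictionary with l_1-regularized (lasso) coefficient selection.
    dl = DictionaryLearning(n_components=p_atoms, alpha=0.1,
                            transform_algorithm="lasso_lars", transform_alpha=0.1,
                            random_state=0)
    codes_train = dl.fit_transform(X_train)
    codes_test = sparse_encode(X_test, dl.components_, algorithm="lasso_lars", alpha=0.1)

    # Mean squared L_2 representation error on seen vs. unseen signals.
    err = lambda X, C: np.mean(np.linalg.norm(X - C @ dl.components_, axis=1) ** 2)
    print("train representation error:", err(X_train, codes_train))
    print("test  representation error:", err(X_test, codes_test))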