47 research outputs found

    Statistical properties of kernel principal component analysis

    On the Sample Complexity of Subspace Learning

    A large number of algorithms in machine learning, from principal component analysis (PCA) and its non-linear (kernel) extensions to more recent spectral embedding and support estimation methods, rely on estimating a linear subspace from samples. In this paper we introduce a general formulation of this problem and derive novel learning error estimates. Our results rely on natural assumptions on the spectral properties of the covariance operator associated with the data distribution, and hold for a wide class of metrics between subspaces. As special cases, we discuss sharp error estimates for the reconstruction properties of PCA and spectral support estimation. Key to our analysis is an operator theoretic approach that has broad applicability to spectral learning methods. Comment: Extended version of conference paper
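
The subspace-estimation setting in this abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes zero-mean Gaussian data with a known diagonal population covariance, a one-dimensional target subspace, and plain power iteration. The helper names (`sample_covariance`, `top_direction`, `reconstruction_error`, `excess_error`) and the constants `SCALES` and `POPULATION_COV` are hypothetical. The sketch estimates the top principal direction from samples and measures how much reconstruction error it loses against the best possible direction.

```python
import random

# Assumed toy population: independent coordinates with standard deviations 2, 1, 0.5.
SCALES = [2.0, 1.0, 0.5]
POPULATION_COV = [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.25]]


def sample_covariance(samples):
    """Empirical second-moment matrix (data assumed zero-mean, so no centering)."""
    n, d = len(samples), len(samples[0])
    cov = [[0.0] * d for _ in range(d)]
    for x in samples:
        for a in range(d):
            for b in range(d):
                cov[a][b] += x[a] * x[b] / n
    return cov


def top_direction(cov, iters=200):
    """Leading eigenvector of a symmetric PSD matrix via power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(t * t for t in w) ** 0.5
        v = [t / norm for t in w]
    return v


def reconstruction_error(cov, v):
    """Expected squared error of projecting onto span(v): trace(C) - v^T C v."""
    d = len(cov)
    trace = sum(cov[a][a] for a in range(d))
    captured = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return trace - captured


def excess_error(n_samples, seed=0):
    """Population error of the estimated direction minus the optimal error."""
    rng = random.Random(seed)
    samples = [[s * rng.gauss(0.0, 1.0) for s in SCALES] for _ in range(n_samples)]
    v_hat = top_direction(sample_covariance(samples))
    best = reconstruction_error(POPULATION_COV, [1.0, 0.0, 0.0])
    return reconstruction_error(POPULATION_COV, v_hat) - best


if __name__ == "__main__":
    for n in (20, 200, 2000):
        print(n, excess_error(n))
```

Running the script for increasing sample sizes gives a rough feel for how the excess reconstruction error shrinks with more data, which is the quantity that learning error estimates of this kind aim to bound.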

    Estimates of the Approximation Error Using Rademacher Complexity: Learning Vector-Valued Functions

    For certain families of multivariable vector-valued functions to be approximated, the accuracy of approximation schemes made up of linear combinations of computational units containing adjustable parameters is investigated. Upper bounds on the approximation error are derived that depend on the Rademacher complexities of the families. The estimates exploit possible relationships among the components of the multivariable vector-valued functions. All such components are approximated simultaneously in such a way as to use, for a desired approximation accuracy, fewer computational units than those required by componentwise approximation. An application to -stage optimization problems is discussed.
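
The bounds in this abstract are stated in terms of Rademacher complexity. As a rough illustration only, the sketch below gives a Monte Carlo estimate of the empirical Rademacher complexity of a finite, scalar-valued function class. The vector-valued setting of the paper is not reproduced, and the function name `empirical_rademacher` and its parameters are hypothetical.

```python
import random


def empirical_rademacher(function_values, trials=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ max_f (1/n) * sum_i sigma_i * f(x_i) ].

    function_values[f][i] holds the value of the f-th function at the i-th
    sample point; sigma_i are independent uniform +/-1 signs.
    """
    rng = random.Random(seed)
    n = len(function_values[0])
    total = 0.0
    for _ in range(trials):
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        total += max(
            sum(s * v for s, v in zip(signs, values)) / n
            for values in function_values
        )
    return total / trials


if __name__ == "__main__":
    ones = [1.0] * 50
    minus_ones = [-1.0] * 50
    print(empirical_rademacher([ones]))              # single function
    print(empirical_rademacher([ones, minus_ones]))  # class closed under negation
```

A class containing a single function cannot correlate with random signs, so its estimate stays near zero, while adding the negated function lets the class fit the signs and raises the estimate.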

    On information plus noise kernel random matrices

    Kernel random matrices have attracted a lot of interest in recent years, from both practical and theoretical standpoints. Most of the theoretical work so far has focused on the case where the data is sampled from a low-dimensional structure. Very recently, the first results concerning kernel random matrices with high-dimensional input data were obtained, in a setting where the data was sampled from a genuinely high-dimensional structure, similar to standard assumptions in random matrix theory. In this paper, we consider the case where the data is of the type "information + noise." In other words, each observation is the sum of two independent elements: one sampled from a "low-dimensional" structure, the signal part of the data, the other being high-dimensional noise, normalized to not overwhelm but still affect the signal. We consider two types of noise, spherical and elliptical. In the spherical setting, we show that the spectral properties of kernel random matrices can be understood from a new kernel matrix, computed only from the signal part of the data, but using (in general) a slightly different kernel. The Gaussian kernel has some special properties in this setting. The elliptical setting, which is important from a robustness standpoint, is less prone to easy interpretation. Comment: Published at http://dx.doi.org/10.1214/10-AOS801 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
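
The "information + noise" setting with spherical noise can be explored numerically. The sketch below is a heuristic illustration and is not taken from the paper. It assumes a two-dimensional signal embedded in p dimensions, isotropic Gaussian noise with total variance sigma^2 per observation, and a Gaussian kernel with bandwidth s. Under these assumptions, concentration of the noise norm suggests that off-diagonal kernel entries of the noisy data are close to exp(-sigma^2 / s^2) times the entries computed from the signal alone. All function names and parameter values are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, s=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 s^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * s * s))


def make_data(n, d, p, sigma, seed=0):
    """Return (signal, noisy): d-dim signal padded to p dims, plus spherical noise.

    Each noise coordinate has variance sigma^2 / p, so the total noise
    variance per observation is sigma^2 regardless of p.
    """
    rng = random.Random(seed)
    signal = [[rng.gauss(0.0, 0.5) for _ in range(d)] + [0.0] * (p - d)
              for _ in range(n)]
    noise_std = sigma / math.sqrt(p)
    noisy = [[v + rng.gauss(0.0, noise_std) for v in y] for y in signal]
    return signal, noisy


def mean_offdiag_ratio(signal, noisy, s=1.0):
    """Average of K_noisy[i][j] / K_signal[i][j] over all pairs i < j."""
    ratios = []
    for i in range(len(signal)):
        for j in range(i + 1, len(signal)):
            ratios.append(gaussian_kernel(noisy[i], noisy[j], s)
                          / gaussian_kernel(signal[i], signal[j], s))
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    signal, noisy = make_data(n=15, d=2, p=2000, sigma=1.0)
    print("observed mean ratio:", mean_offdiag_ratio(signal, noisy))
    print("heuristic prediction:", math.exp(-1.0))
```

The diagonal entries are excluded because they equal 1 for both matrices, so the rescaling applies only off the diagonal.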

    The Sample Complexity of Dictionary Learning

    A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a set of signals to be represented. Can we expect that the representation found by such a dictionary for a previously unseen example from the same source will have L_2 error of the same magnitude as those for the given examples? We assume signals are generated from a fixed distribution, and study this question from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected L_2 error in representation when the dictionary is used. For the case of l_1 regularized coefficient selection we provide a generalization bound of the order of O(sqrt(np log(m lambda)/m)), where n is the dimension, p is the number of elements in the dictionary, lambda is a bound on the l_1 norm of the coefficient vector and m is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most k dictionary elements, we provide a bound of the order O(sqrt(np log(m k)/m)) under an assumption on the level of orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results further yield fast rates of order 1/m as opposed to 1/sqrt(m) using localized Rademacher complexity. We provide similar results in a general setting using kernels with weak smoothness requirements.
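
Two quantities named in this abstract are easy to compute directly. The sketch below evaluates the stated order sqrt(np log(m lambda)/m) with all constants dropped (so it gives only the scaling, not an actual bound), and computes the Babel function of a dictionary using the commonly used definition: the largest sum of absolute inner products between one atom and any k other atoms. That definition is assumed here and may differ in detail from the paper's. The function names `l1_bound_order` and `babel` are hypothetical.

```python
import math


def l1_bound_order(n, p, m, lam):
    """Scaling sqrt(n * p * log(m * lam) / m); constants are omitted."""
    return math.sqrt(n * p * math.log(m * lam) / m)


def babel(dictionary, k):
    """Babel function mu_1(k) for a list of unit-norm atoms (assumed definition).

    For each atom, sum its k largest absolute inner products with the other
    atoms, then take the maximum over atoms.
    """
    worst = 0.0
    for i, atom in enumerate(dictionary):
        correlations = sorted(
            (abs(sum(a * b for a, b in zip(atom, other)))
             for j, other in enumerate(dictionary) if j != i),
            reverse=True,
        )
        worst = max(worst, sum(correlations[:k]))
    return worst


if __name__ == "__main__":
    r = 1.0 / math.sqrt(2.0)
    orthonormal = [[1.0, 0.0], [0.0, 1.0]]
    overcomplete = [[1.0, 0.0], [0.0, 1.0], [r, r]]
    print(babel(orthonormal, 1))   # orthogonal atoms do not overlap
    print(babel(overcomplete, 2))  # the diagonal atom overlaps both axes
    for m in (10_000, 40_000):
        print(m, l1_bound_order(n=10, p=20, m=m, lam=1.0))
```

A low Babel function means the atoms are close to orthogonal, which is the condition under which the abstract's bound for k-sparse representations is stated.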