96 research outputs found

    Learnability of Gaussians with flexible variances

    Copyright © 2007 Yiming Ying and Ding-Xuan Zhou. Gaussian kernels with flexible variances provide a rich family of Mercer kernels for learning algorithms. We show that the union of the unit balls of the reproducing kernel Hilbert spaces generated by Gaussian kernels with flexible variances is a uniform Glivenko-Cantelli (uGC) class. This result confirms a conjecture concerning the learnability of Gaussian kernels and verifies the uniform convergence of many learning algorithms involving Gaussians with changing variances. Rademacher averages and empirical covering numbers are used to estimate sample errors of multi-kernel regularization schemes associated with general loss functions. It is then shown that the regularization error associated with the least square loss and Gaussian kernels can be greatly improved when flexible variances are allowed. Finally, for regularization schemes generated by Gaussian kernels with flexible variances we present explicit learning rates for regression with the least square loss and classification with the hinge loss.
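
    A minimal sketch of how a regularization scheme with flexible Gaussian variances can look in practice: kernel ridge (least square loss) regressors are fitted for several candidate variances, and the one with the smallest validation error is kept. The bandwidth grid, the regularization parameter lam, and the helper names are illustrative assumptions, not the scheme analyzed in the paper.

        import numpy as np

        def gaussian_kernel(X, Z, sigma):
            """Gaussian (RBF) kernel matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2))."""
            d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
            return np.exp(-d2 / (2.0 * sigma ** 2))

        def fit_krr(X, y, sigma, lam):
            """Kernel ridge regression: solve (K + lam * m * I) c = y for the coefficients c."""
            m = len(X)
            K = gaussian_kernel(X, X, sigma)
            return np.linalg.solve(K + lam * m * np.eye(m), y)

        def predict_krr(X_train, coef, X_new, sigma):
            return gaussian_kernel(X_new, X_train, sigma) @ coef

        # Flexible variances: try several Gaussian widths and keep the best on held-out data.
        rng = np.random.default_rng(0)
        X = rng.uniform(-1, 1, size=(200, 1))
        y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=200)
        X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

        lam = 1e-3
        best = min(
            (np.mean((predict_krr(X_tr, fit_krr(X_tr, y_tr, s, lam), X_va, s) - y_va) ** 2), s)
            for s in [0.05, 0.1, 0.3, 1.0, 3.0]
        )
        print("validation MSE %.4f at sigma=%.2f" % best)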

    Nonlinear Approximation Using Gaussian Kernels

    It is well known that nonlinear approximation has an advantage over linear schemes in the sense that it provides comparable approximation rates to those of the linear schemes, but for a larger class of approximands. This was established for spline approximations and for wavelet approximations, and more recently by DeVore and Ron for homogeneous radial basis function (surface spline) approximations. However, no such results are known for the Gaussian function, the preferred kernel in machine learning and several engineering problems. We introduce and analyze in this paper a new algorithm for approximating functions using translates of Gaussian functions with varying tension parameters. At heart it employs the nonlinear approximation strategy of DeVore and Ron, but it selects kernels by a method that is not straightforward. The crux of the difficulty lies in the necessity to vary the tension parameter of the Gaussian spatially, according to local information about the approximand: error analyses of Gaussian approximation schemes with varying tension are, by and large, an elusive target for approximators. We show that our algorithm is suitably optimal in the sense that it provides approximation rates similar to those of other established nonlinear methodologies such as spline and wavelet approximations. As expected and desired, the approximation rates can be as high as needed and are essentially saturated only by the smoothness of the approximand. Comment: 15 pages; corrected typos; to appear in J. Funct. Anal.
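
    As a rough illustration of nonlinear approximation with spatially varying tension, the toy sketch below runs a greedy (matching-pursuit style) selection over a dictionary of translated Gaussians with several widths, so narrow atoms tend to be picked where the target oscillates and wide atoms where it is smooth. The dictionary, widths, and target are assumptions for illustration; this is not the algorithm introduced in the paper.

        import numpy as np

        def gaussian_atom(t, center, width):
            """Unit-norm Gaussian bump of given center and width, sampled on the grid t."""
            g = np.exp(-((t - center) ** 2) / (2.0 * width ** 2))
            return g / np.linalg.norm(g)

        def greedy_gaussian_approx(t, target, centers, widths, n_terms=20):
            """Greedy selection of Gaussian atoms whose width varies from term to term."""
            residual = target.copy()
            approx = np.zeros_like(target)
            for _ in range(n_terms):
                best = None
                for c in centers:
                    for w in widths:
                        atom = gaussian_atom(t, c, w)
                        coef = residual @ atom            # inner product with a unit-norm atom
                        if best is None or abs(coef) > abs(best[0]):
                            best = (coef, atom)
                coef, atom = best
                approx += coef * atom
                residual -= coef * atom
            return approx

        t = np.linspace(0, 1, 400)
        target = np.where(t < 0.5, np.sin(20 * np.pi * t), np.sin(2 * np.pi * t))   # mixed scales
        centers = np.linspace(0, 1, 40)
        widths = [0.005, 0.02, 0.08]     # narrow atoms for the rough part, wide for the smooth part
        approx = greedy_gaussian_approx(t, target, centers, widths)
        print("relative L2 error:", np.linalg.norm(target - approx) / np.linalg.norm(target))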

    Sketching for Large-Scale Learning of Mixture Models

    Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. We propose a "compressive learning" framework where we estimate model parameters from a sketch of the training data. This sketch is a collection of generalized moments of the underlying probability distribution of the data. It can be computed in a single pass on the training set, and is easily computable on streams or distributed datasets. The proposed framework shares similarities with compressive sensing, which aims at drastically reducing the dimension of high-dimensional signals while preserving the ability to reconstruct them. To perform the estimation task, we derive an iterative algorithm analogous to sparse reconstruction algorithms in the context of linear inverse problems. We exemplify our framework with the compressive estimation of a Gaussian Mixture Model (GMM), providing heuristics on the choice of the sketching procedure and theoretical guarantees of reconstruction. We experimentally show on synthetic data that the proposed algorithm yields results comparable to the classical Expectation-Maximization (EM) technique while requiring significantly less memory and fewer computations when the number of database elements is large. We further demonstrate the potential of the approach on real large-scale data (over 10^8 training samples) for the task of model-based speaker verification. Finally, we draw some connections between the proposed framework and approximate Hilbert space embedding of probability distributions using random features. We show that the proposed sketching operator can be seen as an innovative method to design translation-invariant kernels adapted to the analysis of GMMs. We also use this theoretical framework to derive information preservation guarantees, in the spirit of infinite-dimensional compressive sensing.
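
    A minimal sketch of the kind of sketching operator described above, under assumed choices (isotropic Gaussian frequencies, sketch size m = 64): each data point contributes complex exponentials at a fixed set of random frequencies, i.e. samples of the empirical characteristic function, and the sketch is their average, accumulated in a single pass over batches or a stream.

        import numpy as np

        def draw_frequencies(dim, m, scale=1.0, seed=0):
            """Random frequency vectors; drawn here from an isotropic Gaussian (an assumed choice)."""
            rng = np.random.default_rng(seed)
            return rng.normal(scale=scale, size=(m, dim))

        def sketch_stream(batches, Omega):
            """Average of exp(i <omega_j, x>) over all points, accumulated batch by batch."""
            acc = np.zeros(Omega.shape[0], dtype=complex)
            n = 0
            for X in batches:                              # X has shape (batch_size, dim)
                acc += np.exp(1j * X @ Omega.T).sum(axis=0)
                n += len(X)
            return acc / n

        # Example: sketch 10^5 points from a 2-component GMM without holding them all in memory.
        rng = np.random.default_rng(1)

        def gmm_batches(n_batches=100, batch_size=1000):
            for _ in range(n_batches):
                comp = rng.integers(0, 2, size=batch_size)
                means = np.array([[-2.0, 0.0], [2.0, 0.0]])[comp]
                yield means + rng.normal(size=(batch_size, 2))

        Omega = draw_frequencies(dim=2, m=64)
        z = sketch_stream(gmm_batches(), Omega)
        print("sketch size:", z.shape, "versus 10^5 x 2 data points")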

    A super-polynomial lower bound for learning nonparametric mixtures

    We study the problem of learning nonparametric distributions in a finite mixture, and establish a super-polynomial lower bound on the sample complexity of learning the component distributions in such models. Namely, we are given i.i.d. samples from $f$, where $f=\sum_{i=1}^k w_i f_i$, $\sum_{i=1}^k w_i=1$, $w_i>0$, and we are interested in learning each component $f_i$. Without any assumptions on $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\mathrm{supp}(\nu_i)\cap \mathrm{supp}(\nu_j)=\emptyset$. Our main result shows that $\Omega\bigl((\frac{1}{\varepsilon})^{C\log\log \frac{1}{\varepsilon}}\bigr)$ samples are required for estimating each $f_i$. The proof relies on a fast rate for approximation with Gaussians, which may be of independent interest. This result has important implications for the hardness of learning more general nonparametric latent variable models that arise in machine learning applications.
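
    For intuition about the model class, the toy sampler below draws from a one-dimensional instance in which each $\nu_i$ is replaced by a uniform density on a compact interval (intervals chosen disjoint), and the convolution with a Gaussian is realized by adding Gaussian noise to the latent draw. The weights, intervals, and noise level are assumptions for illustration only.

        import numpy as np

        def sample_nonparametric_mixture(n, weights, intervals, sigma=0.5, seed=0):
            """Draw n samples from f = sum_i w_i f_i with f_i = (uniform on a compact interval) * Gaussian.

            The uniform densities stand in for the compactly supported nu_i; the
            intervals are chosen disjoint so the components are identifiable.
            """
            rng = np.random.default_rng(seed)
            comp = rng.choice(len(weights), size=n, p=weights)
            lo = np.array([a for a, _ in intervals])[comp]
            hi = np.array([b for _, b in intervals])[comp]
            latent = rng.uniform(lo, hi)                  # draw from nu_{comp}
            return latent + rng.normal(0.0, sigma, n)     # convolve with N(0, sigma^2)

        # Two components with disjoint supports [-3, -1] and [1, 3].
        x = sample_nonparametric_mixture(10_000, weights=[0.4, 0.6], intervals=[(-3, -1), (1, 3)])
        print("sample mean %.2f, sample std %.2f" % (x.mean(), x.std()))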