11,460 research outputs found

    Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness

    Full text link
    We investigate the learning rate of multiple kernel learning (MKL) with β„“1\ell_1 and elastic-net regularizations. The elastic-net regularization is a composition of an β„“1\ell_1-regularizer for inducing the sparsity and an β„“2\ell_2-regularizer for controlling the smoothness. We focus on a sparse setting where the total number of kernels is large, but the number of nonzero components of the ground truth is relatively small, and show sharper convergence rates than the learning rates have ever shown for both β„“1\ell_1 and elastic-net regularizations. Our analysis reveals some relations between the choice of a regularization function and the performance. If the ground truth is smooth, we show a faster convergence rate for the elastic-net regularization with less conditions than β„“1\ell_1-regularization; otherwise, a faster convergence rate for the β„“1\ell_1-regularization is shown.Comment: Published in at http://dx.doi.org/10.1214/13-AOS1095 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). arXiv admin note: text overlap with arXiv:1103.043

    Fast Convergence Rate of Multiple Kernel Learning with Elastic-net Regularization

    Full text link
    We investigate the learning rate of multiple kernel leaning (MKL) with elastic-net regularization, which consists of an β„“1\ell_1-regularizer for inducing the sparsity and an β„“2\ell_2-regularizer for controlling the smoothness. We focus on a sparse setting where the total number of kernels is large but the number of non-zero components of the ground truth is relatively small, and prove that elastic-net MKL achieves the minimax learning rate on the β„“2\ell_2-mixed-norm ball. Our bound is sharper than the convergence rates ever shown, and has a property that the smoother the truth is, the faster the convergence rate is.Comment: 21 pages, 0 figur

    Does generalization performance of lql^q regularization learning depend on qq? A negative example

    Full text link
    lql^q-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling. It attempts to improve the generalization (prediction) capability of a machine (model) through appropriately shrinking its coefficients. The shape of a lql^q estimator differs in varying choices of the regularization order qq. In particular, l1l^1 leads to the LASSO estimate, while l2l^{2} corresponds to the smooth ridge regression. This makes the order qq a potential tuning parameter in applications. To facilitate the use of lql^{q}-regularization, we intend to seek for a modeling strategy where an elaborative selection on qq is avoidable. In this spirit, we place our investigation within a general framework of lql^{q}-regularized kernel learning under a sample dependent hypothesis space (SDHS). For a designated class of kernel functions, we show that all lql^{q} estimators for 0<q<∞0< q < \infty attain similar generalization error bounds. These estimated bounds are almost optimal in the sense that up to a logarithmic factor, the upper and lower bounds are asymptotically identical. This finding tentatively reveals that, in some modeling contexts, the choice of qq might not have a strong impact in terms of the generalization capability. From this perspective, qq can be arbitrarily specified, or specified merely by other no generalization criteria like smoothness, computational complexity, sparsity, etc..Comment: 35 pages, 3 figure

    Inverse Density as an Inverse Problem: The Fredholm Equation Approach

    Full text link
    In this paper we address the problem of estimating the ratio qp\frac{q}{p} where pp is a density function and qq is another density, or, more generally an arbitrary function. Knowing or approximating this ratio is needed in various problems of inference and integration, in particular, when one needs to average a function with respect to one probability distribution, given a sample from another. It is often referred as {\it importance sampling} in statistical inference and is also closely related to the problem of {\it covariate shift} in transfer learning as well as to various MCMC methods. It may also be useful for separating the underlying geometry of a space, say a manifold, from the density function defined on it. Our approach is based on reformulating the problem of estimating qp\frac{q}{p} as an inverse problem in terms of an integral operator corresponding to a kernel, and thus reducing it to an integral equation, known as the Fredholm problem of the first kind. This formulation, combined with the techniques of regularization and kernel methods, leads to a principled kernel-based framework for constructing algorithms and for analyzing them theoretically. The resulting family of algorithms (FIRE, for Fredholm Inverse Regularized Estimator) is flexible, simple and easy to implement. We provide detailed theoretical analysis including concentration bounds and convergence rates for the Gaussian kernel in the case of densities defined on Rd\R^d, compact domains in Rd\R^d and smooth dd-dimensional sub-manifolds of the Euclidean space. We also show experimental results including applications to classification and semi-supervised learning within the covariate shift framework and demonstrate some encouraging experimental comparisons. We also show how the parameters of our algorithms can be chosen in a completely unsupervised manner.Comment: Fixing a few typos in last versio
    • …
    corecore