6,182 research outputs found

    Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness

    Full text link
    We investigate the learning rate of multiple kernel learning (MKL) with β„“1\ell_1 and elastic-net regularizations. The elastic-net regularization is a composition of an β„“1\ell_1-regularizer for inducing the sparsity and an β„“2\ell_2-regularizer for controlling the smoothness. We focus on a sparse setting where the total number of kernels is large, but the number of nonzero components of the ground truth is relatively small, and show sharper convergence rates than the learning rates have ever shown for both β„“1\ell_1 and elastic-net regularizations. Our analysis reveals some relations between the choice of a regularization function and the performance. If the ground truth is smooth, we show a faster convergence rate for the elastic-net regularization with less conditions than β„“1\ell_1-regularization; otherwise, a faster convergence rate for the β„“1\ell_1-regularization is shown.Comment: Published in at http://dx.doi.org/10.1214/13-AOS1095 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). arXiv admin note: text overlap with arXiv:1103.043

    Sparse Learning over Infinite Subgraph Features

    Full text link
    We present a supervised-learning algorithm from graph data (a set of graphs) for arbitrary twice-differentiable loss functions and sparse linear models over all possible subgraph features. To date, it has been shown that under all possible subgraph features, several types of sparse learning, such as Adaboost, LPBoost, LARS/LASSO, and sparse PLS regression, can be performed. Particularly emphasis is placed on simultaneous learning of relevant features from an infinite set of candidates. We first generalize techniques used in all these preceding studies to derive an unifying bounding technique for arbitrary separable functions. We then carefully use this bounding to make block coordinate gradient descent feasible over infinite subgraph features, resulting in a fast converging algorithm that can solve a wider class of sparse learning problems over graph data. We also empirically study the differences from the existing approaches in convergence property, selected subgraph features, and search-space sizes. We further discuss several unnoticed issues in sparse learning over all possible subgraph features.Comment: 42 pages, 24 figures, 4 table

    Understanding and Comparing Scalable Gaussian Process Regression for Big Data

    Full text link
    As a non-parametric Bayesian model which produces informative predictive distribution, Gaussian process (GP) has been widely used in various fields, like regression, classification and optimization. The cubic complexity of standard GP however leads to poor scalability, which poses challenges in the era of big data. Hence, various scalable GPs have been developed in the literature in order to improve the scalability while retaining desirable prediction accuracy. This paper devotes to investigating the methodological characteristics and performance of representative global and local scalable GPs including sparse approximations and local aggregations from four main perspectives: scalability, capability, controllability and robustness. The numerical experiments on two toy examples and five real-world datasets with up to 250K points offer the following findings. In terms of scalability, most of the scalable GPs own a time complexity that is linear to the training size. In terms of capability, the sparse approximations capture the long-term spatial correlations, the local aggregations capture the local patterns but suffer from over-fitting in some scenarios. In terms of controllability, we could improve the performance of sparse approximations by simply increasing the inducing size. But this is not the case for local aggregations. In terms of robustness, local aggregations are robust to various initializations of hyperparameters due to the local attention mechanism. Finally, we highlight that the proper hybrid of global and local scalable GPs may be a promising way to improve both the model capability and scalability for big data.Comment: 25 pages, 15 figures, preprint submitted to KB
    • …
    corecore