75,417 research outputs found

Learning with sample dependent hypothesis spaces

Author: Wu Qiang
Zhou Ding-Xuan
Publication venue: Elsevier Ltd.
Publication date: 31/12/2008

Abstract: Many learning algorithms use hypothesis spaces which are trained from samples, but little theoretical work has been devoted to the study of these algorithms. In this paper we show that mathematical analysis for these algorithms is essentially different from that for algorithms with hypothesis spaces independent of the sample or depending only on the sample size. The difficulty lies in the lack of a proper characterization of approximation error. To overcome this difficulty, we propose an idea of using a larger function class (not necessarily a linear space) containing the union of all possible hypothesis spaces (varying with the sample) to measure the approximation ability of the algorithm. We show how this idea provides error analysis for two particular classes of learning algorithms in kernel methods: learning the kernel via regularization and coefficient-based regularization. We demonstrate the power of this approach by its wide applicability.

Does generalization performance of $l^q$ regularization learning depend on $q$? A negative example

Author: Fang Jian
Lin Shaobo
Xu Chen
Zeng Jingshan
Publication date: 24/07/2013

$l^q$-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling. It attempts to improve the generalization (prediction) capability of a machine (model) through appropriately shrinking its coefficients. The shape of an $l^q$ estimator differs in varying choices of the regularization order $q$. In particular, $l^1$ leads to the LASSO estimate, while $l^2$ corresponds to the smooth ridge regression. This makes the order $q$ a potential tuning parameter in applications. To facilitate the use of $l^q$-regularization, we seek a modeling strategy in which an elaborate selection of $q$ is avoidable. In this spirit, we place our investigation within a general framework of $l^q$-regularized kernel learning under a sample dependent hypothesis space (SDHS). For a designated class of kernel functions, we show that all $l^q$ estimators for $0 < q < \infty$ attain similar generalization error bounds. These estimated bounds are almost optimal in the sense that, up to a logarithmic factor, the upper and lower bounds are asymptotically identical. This finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact on the generalization capability. From this perspective, $q$ can be arbitrarily specified, or specified merely by other non-generalization criteria such as smoothness, computational complexity, and sparsity. Comment: 35 pages, 3 figures

arXiv.org e-Print Archive
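
As an illustration of the $l^q$-regularized kernel learning setup described in the abstract above, the following is a minimal sketch of a coefficient-based objective in which the hypothesis is a kernel expansion over the sample points. The Gaussian kernel, the squared loss, the scalar inputs, and all function names (`gaussian_kernel`, `predict`, `lq_penalty`, `lq_objective`) are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math

def gaussian_kernel(s, t, sigma=1.0):
    """Gaussian kernel on scalars (an assumed kernel choice)."""
    return math.exp(-((s - t) ** 2) / (2.0 * sigma ** 2))

def predict(coeffs, xs, t, kernel=gaussian_kernel):
    """Evaluate f(t) = sum_i a_i * K(x_i, t); f depends on the sample xs."""
    return sum(a * kernel(x, t) for a, x in zip(coeffs, xs))

def lq_penalty(coeffs, q):
    """l^q penalty sum_i |a_i|^q for any order q > 0."""
    return sum(abs(a) ** q for a in coeffs)

def lq_objective(coeffs, xs, ys, lam, q, kernel=gaussian_kernel):
    """Mean squared error on the sample plus lam times the l^q penalty."""
    m = len(xs)
    fit = sum((predict(coeffs, xs, x, kernel) - y) ** 2
              for x, y in zip(xs, ys)) / m
    return fit + lam * lq_penalty(coeffs, q)
```

Changing `q` only changes the penalty term: `q = 1` gives a LASSO-style penalty and `q = 2` a ridge-style penalty on the expansion coefficients.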

Discussion of: Brownian distance covariance

Author: Fukumizu Kenji
Gretton Arthur
Sriperumbudur Bharath K.
Publication venue: Institute of Mathematical Statistics
Publication date: 05/10/2010

Discussion on "Brownian distance covariance" by Gábor J. Székely and Maria L. Rizzo [arXiv:1010.0297]. Comment: Published at http://dx.doi.org/10.1214/09-AOAS312E in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

A low variance consistent test of relative dependency

Author: Blaschko Matthew
Bounliphone Wacha
Gretton Arthur
Tenenhaus Arthur
Publication date: 27/05/2015

We describe a novel non-parametric statistical hypothesis test of relative dependence between a source variable and two candidate target variables. Such a test enables us to determine whether one source variable is significantly more dependent on a first target variable or a second. Dependence is measured via the Hilbert-Schmidt Independence Criterion (HSIC), resulting in a pair of empirical dependence measures (source-target 1, source-target 2). We test whether the first dependence measure is significantly larger than the second. Modeling the covariance between these HSIC statistics leads to a provably more powerful test than the construction of independent HSIC statistics by sub-sampling. The resulting test is consistent and unbiased, and (being based on U-statistics) has favorable convergence properties. The test can be computed in quadratic time, matching the computational complexity of standard empirical HSIC estimators. The effectiveness of the test is demonstrated on several real-world problems: we identify language groups from a multilingual corpus, and we prove that tumor location is more dependent on gene expression than on chromosomal imbalances. Source code is available for download at https://github.com/wbounliphone/reldep. Comment: International Conference on Machine Learning, Jul 2015, Lille, France

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1
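
To make the HSIC dependence measure in the relative-dependency abstract above concrete, here is a minimal sketch of the simple biased empirical estimator, $\frac{1}{n^2}\operatorname{tr}(KHLH)$, computed in quadratic time. Note that this is a swapped-in, simpler estimator: the abstract's test is built on U-statistics and models the covariance between two HSIC statistics, which this sketch does not do. The Gaussian kernel, the scalar samples, and the function names are assumptions for illustration and are not taken from the linked repository.

```python
import math

def gaussian_gram(xs, sigma=1.0):
    """Gram matrix with entries exp(-(x_i - x_j)^2 / (2 sigma^2))."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in xs]
            for a in xs]

def center(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]

def hsic_biased(xs, ys, sigma=1.0):
    """Biased empirical HSIC, (1/n^2) * trace(H K H L), in O(n^2) time."""
    n = len(xs)
    Kc = center(gaussian_gram(xs, sigma))
    L = gaussian_gram(ys, sigma)
    # L is symmetric, so trace(Kc L) equals the elementwise product sum.
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / (n * n)
```

A naive relative-dependence comparison would contrast `hsic_biased(source, target1)` with `hsic_biased(source, target2)`; deciding whether the difference is statistically significant is the part the described test addresses.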

Local Rademacher Complexity-based Learning Guarantees for Multi-Task Learning

Author: Anagnostopoulos Georgios
Kloft Marius
Lei Yunwen
Mollaghasemi Mansooreh
Yousefi Niloofar
Publication date: 09/02/2017

We show a Talagrand-type concentration inequality for Multi-Task Learning (MTL), using which we establish sharp excess risk bounds for MTL in terms of distribution- and data-dependent versions of the Local Rademacher Complexity (LRC). We also give a new bound on the LRC for norm-regularized as well as strongly convex hypothesis classes, which applies not only to MTL but also to the standard i.i.d. setting. Combining both results, one can now easily derive fast-rate bounds on the excess risk for many prominent MTL methods, including, as we demonstrate, Schatten-norm, group-norm, and graph-regularized MTL. The derived bounds reflect a relationship akin to a conservation law of asymptotic convergence rates. This very relationship allows for trading off slower rates with respect to the number of tasks for faster rates with respect to the number of available samples per task, when compared to the rates obtained via a traditional, global Rademacher analysis. Comment: In this version, some arguments and results (of the previous version) have been corrected or modified

arXiv.org e-Print Archive
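
For readers unfamiliar with the quantity that the multi-task learning abstract above builds on, the following is a minimal sketch of the plain (global) empirical Rademacher complexity of a finite function class, computed exactly by enumerating all sign vectors. It is not the local or multi-task version analysed in the paper; the finite class, the exhaustive enumeration, and the function name `empirical_rademacher` are assumptions chosen to keep the example small and deterministic.

```python
from itertools import product

def empirical_rademacher(function_values):
    """Exact empirical Rademacher complexity of a finite function class.

    function_values[k][i] is the value of the k-th function at the i-th
    sample point. All 2^n sign vectors are enumerated, so this is only
    practical for small n.
    """
    n = len(function_values[0])
    total = 0.0
    for signs in product((-1.0, 1.0), repeat=n):
        # Supremum over the class of the sign-weighted empirical average.
        total += max(
            sum(s * v for s, v in zip(signs, values)) / n
            for values in function_values
        )
    return total / (2 ** n)
```

A class containing a single function has complexity zero, while richer classes can correlate with more sign patterns and score higher.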