Learning with sample dependent hypothesis spaces
Many learning algorithms use hypothesis spaces which are trained from samples, but little theoretical work has been devoted to the study of these algorithms. In this paper we show that mathematical analysis for these algorithms is essentially different from that for algorithms with hypothesis spaces independent of the sample or depending only on the sample size. The difficulty lies in the lack of a proper characterization of approximation error. To overcome this difficulty, we propose an idea of using a larger function class (not necessarily a linear space) containing the union of all possible hypothesis spaces (varying with the sample) to measure the approximation ability of the algorithm. We show how this idea provides error analysis for two particular classes of learning algorithms in kernel methods: learning the kernel via regularization and coefficient based regularization. We demonstrate the power of this approach by its wide applicability.
Does generalization performance of $l^q$ regularization learning depend on $q$? A negative example
$l^q$-regularization has been demonstrated to be an attractive technique in
machine learning and statistical modeling. It attempts to improve the
generalization (prediction) capability of a machine (model) through
appropriately shrinking its coefficients. The shape of an $l^q$ estimator
differs with the choice of the regularization order $q$. In particular,
$l^1$ leads to the LASSO estimate, while $l^2$ corresponds to the smooth
ridge regression. This makes the order $q$ a potential tuning parameter in
applications. To facilitate the use of $l^q$-regularization, we
seek a modeling strategy in which an elaborate selection of $q$ is
avoidable. In this spirit, we place our investigation within a general
framework of $l^q$-regularized kernel learning under a sample dependent
hypothesis space (SDHS). For a designated class of kernel functions, we show
that all $l^q$ estimators for $0 < q < \infty$ attain similar generalization
error bounds. These estimated bounds are almost optimal in the sense that up to
a logarithmic factor, the upper and lower bounds are asymptotically identical.
This finding tentatively reveals that, in some modeling contexts, the choice of
$q$ might not have a strong impact in terms of the generalization capability.
From this perspective, $q$ can be arbitrarily specified, or specified merely by
other non-generalization criteria like smoothness, computational complexity,
sparsity, etc.
Comment: 35 pages, 3 figures
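For orientation, coefficient-based $l^q$ regularization over a sample dependent hypothesis space is commonly written as below. This is a sketch of the usual form, not a formula quoted from the abstract: the symbols $m$ (sample size), $(x_i, y_i)$ (samples), $K$ (kernel), $\lambda$ (regularization parameter) and $a$ (coefficient vector) are assumptions introduced here for illustration.

```latex
f_{\mathbf{z},\lambda,q} = \sum_{i=1}^{m} a_i^{*}\, K(x_i, \cdot),
\qquad
a^{*} \in \arg\min_{a \in \mathbb{R}^{m}}
\left\{
  \frac{1}{m} \sum_{j=1}^{m} \Big( \sum_{i=1}^{m} a_i K(x_i, x_j) - y_j \Big)^{2}
  + \lambda \sum_{i=1}^{m} |a_i|^{q}
\right\}
```

The hypothesis space here is the span of $K(x_1, \cdot), \dots, K(x_m, \cdot)$, so it changes with the sample. Under this form, $q = 1$ gives a LASSO-type penalty on the coefficients and $q = 2$ a ridge-type penalty.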
Discussion of: Brownian distance covariance
Discussion of "Brownian distance covariance" by Gábor J. Székely and
Maria L. Rizzo [arXiv:1010.0297].
Comment: Published at http://dx.doi.org/10.1214/09-AOAS312E in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
A low variance consistent test of relative dependency
We describe a novel non-parametric statistical hypothesis test of relative
dependence between a source variable and two candidate target variables. Such a
test enables us to determine whether one source variable is significantly more
dependent on a first target variable or a second. Dependence is measured via
the Hilbert-Schmidt Independence Criterion (HSIC), resulting in a pair of
empirical dependence measures (source-target 1, source-target 2). We test
whether the first dependence measure is significantly larger than the second.
Modeling the covariance between these HSIC statistics leads to a provably more
powerful test than the construction of independent HSIC statistics by
sub-sampling. The resulting test is consistent and unbiased, and (being based
on U-statistics) has favorable convergence properties. The test can be computed
in quadratic time, matching the computational complexity of standard empirical
HSIC estimators. The effectiveness of the test is demonstrated on several
real-world problems: we identify language groups from a multilingual corpus,
and we prove that tumor location is more dependent on gene expression than
on chromosomal imbalances. Source code is available for download at
https://github.com/wbounliphone/reldep.
Comment: International Conference on Machine Learning, Jul 2015, Lille, France
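To make the two dependence measures concrete, here is a minimal sketch that computes an empirical HSIC for a source-target pair and compares the two point estimates. It is not the authors' test. It uses the simpler biased (V-statistic) estimator, trace(KHLH)/m^2, rather than the U-statistic the abstract refers to, and it does not model the covariance between the two statistics, so it yields no p-value. The Gaussian kernel, the bandwidth `sigma`, the function names and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def gaussian_gram(xs, sigma=1.0):
    """Gram matrix of a Gaussian kernel on scalar samples (assumed kernel)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in xs] for a in xs]


def center(K):
    """Double-center a symmetric Gram matrix: H K H with H = I - (1/m) 1 1^T."""
    m = len(K)
    row_mean = [sum(row) / m for row in K]
    grand_mean = sum(row_mean) / m
    return [
        [K[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(m)]
        for i in range(m)
    ]


def hsic_biased(xs, ys, sigma=1.0):
    """Biased empirical HSIC, trace(K H L H) / m^2 (not the U-statistic version)."""
    m = len(xs)
    Kc = center(gaussian_gram(xs, sigma))
    L = gaussian_gram(ys, sigma)
    # trace(H K H L) equals the sum of elementwise products for symmetric matrices.
    return sum(Kc[i][j] * L[i][j] for i in range(m) for j in range(m)) / m ** 2


# Synthetic data (hypothetical): target y depends on the source x, target z does not.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(100)]
y = [xi + 0.1 * rng.gauss(0.0, 1.0) for xi in x]
z = [rng.gauss(0.0, 1.0) for _ in range(100)]

hsic_xy = hsic_biased(x, y)  # source-target 1
hsic_xz = hsic_biased(x, z)  # source-target 2
print(hsic_xy, hsic_xz)
```

Each call builds two m-by-m Gram matrices, which is where the quadratic cost comes from. Deciding whether the gap between the two values is statistically significant is the part the paper addresses and this sketch leaves out.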
Local Rademacher Complexity-based Learning Guarantees for Multi-Task Learning
We show a Talagrand-type concentration inequality for Multi-Task Learning
(MTL), using which we establish sharp excess risk bounds for MTL in terms of
distribution- and data-dependent versions of the Local Rademacher Complexity
(LRC). We also give a new bound on the LRC for norm regularized as well as
strongly convex hypothesis classes, which applies not only to MTL but also to
the standard i.i.d. setting. Combining both results, one can now easily derive
fast-rate bounds on the excess risk for many prominent MTL methods,
including---as we demonstrate---Schatten-norm, group-norm, and
graph-regularized MTL. The derived bounds reflect a relationship akin to a
conservation law of asymptotic convergence rates. This very relationship allows
for trading off slower rates w.r.t. the number of tasks for faster rates with
respect to the number of available samples per task, when compared to the rates
obtained via a traditional, global Rademacher analysis.
Comment: In this version, some arguments and results (of the previous version)
have been corrected, or modified