11,965 research outputs found
Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness
We investigate the learning rate of multiple kernel learning (MKL) with
$\ell_1$ and elastic-net regularizations. The elastic-net regularization is a
composition of an $\ell_1$-regularizer for inducing the sparsity and an
$\ell_2$-regularizer for controlling the smoothness. We focus on a sparse
setting where the total number of kernels is large, but the number of nonzero
components of the ground truth is relatively small, and show convergence rates
sharper than any previously shown for both $\ell_1$ and elastic-net
regularizations. Our analysis reveals some relations between the choice of a
regularization function and the performance. If the ground truth is smooth, we
show a faster convergence rate for the elastic-net regularization with fewer
conditions than for $\ell_1$-regularization; otherwise, a faster convergence
rate is shown for the $\ell_1$-regularization.
Comment: Published in at http://dx.doi.org/10.1214/13-AOS1095 the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org). arXiv admin note: text overlap with
arXiv:1103.043
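As an illustrative sketch (our notation, not taken verbatim from the paper), the elastic-net MKL estimator combines an $\ell_1$-type and an $\ell_2$-type penalty over the per-kernel components $f_m \in \mathcal{H}_m$ of $f = \sum_{m=1}^{M} f_m$:
\[
\hat f = \operatorname*{arg\,min}_{f_1,\dots,f_M}\; \frac{1}{n}\sum_{i=1}^{n}\Bigl(y_i - \sum_{m=1}^{M} f_m(x_i)\Bigr)^{2}
 + \lambda_1 \sum_{m=1}^{M}\|f_m\|_{\mathcal{H}_m}
 + \lambda_2 \sum_{m=1}^{M}\|f_m\|_{\mathcal{H}_m}^{2},
\]
where $\lambda_1$ induces sparsity across kernels and $\lambda_2$ controls smoothness; setting $\lambda_2 = 0$ recovers $\ell_1$-MKL. The exact loss and normalization used in the paper may differ.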
Fast Convergence Rate of Multiple Kernel Learning with Elastic-net Regularization
We investigate the learning rate of multiple kernel learning (MKL) with
elastic-net regularization, which consists of an $\ell_1$-regularizer for
inducing the sparsity and an $\ell_2$-regularizer for controlling the
smoothness. We focus on a sparse setting where the total number of kernels is
large but the number of non-zero components of the ground truth is relatively
small, and prove that elastic-net MKL achieves the minimax learning rate on the
$\ell_2$-mixed-norm ball. Our bound is sharper than the convergence rates shown
previously, and has the property that the smoother the truth is, the faster the
convergence rate is.
Comment: 21 pages, 0 figures
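For orientation, the $\ell_2$-mixed-norm ball mentioned above can be read (under the usual decomposition $f = \sum_{m=1}^{M} f_m$ with $f_m \in \mathcal{H}_m$) as a set of the form
\[
\Bigl\{ f = \sum_{m=1}^{M} f_m \;:\; \Bigl(\sum_{m=1}^{M}\|f_m\|_{\mathcal{H}_m}^{2}\Bigr)^{1/2} \le R \Bigr\};
\]
the radius $R$ and the precise normalization are illustrative assumptions, not details stated in the abstract.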
Does generalization performance of $\ell^q$ regularization learning depend on $q$? A negative example
$\ell^q$-regularization has been demonstrated to be an attractive technique in
machine learning and statistical modeling. It attempts to improve the
generalization (prediction) capability of a machine (model) by appropriately
shrinking its coefficients. The shape of an $\ell^q$ estimator differs with the
choice of the regularization order $q$. In particular, $q = 1$ leads to the
LASSO estimate, while $q = 2$ corresponds to smooth ridge regression. This
makes the order $q$ a potential tuning parameter in applications. To facilitate
the use of $\ell^q$-regularization, we seek a modeling strategy in which an
elaborate selection of $q$ is avoidable. In this spirit, we place our
investigation within a general framework of $\ell^q$-regularized kernel
learning under a sample-dependent hypothesis space (SDHS). For a designated
class of kernel functions, we show that all $\ell^q$ estimators for
$0 < q < \infty$ attain similar generalization error bounds. These bounds are
almost optimal in the sense that, up to a logarithmic factor, the upper and
lower bounds are asymptotically identical. This finding tentatively reveals
that, in some modeling contexts, the choice of $q$ might not have a strong
impact on the generalization capability. From this perspective, $q$ can be
specified arbitrarily, or chosen according to other, non-generalization
criteria such as smoothness, computational complexity, or sparsity.
Comment: 35 pages, 3 figures
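To make the setting concrete, $\ell^q$-regularized kernel learning in a sample-dependent hypothesis space is typically posed as coefficient regularization over the span of the kernel sections at the sample points; the following squared-loss form is an illustrative sketch, and the exact loss and scaling in the paper may differ:
\[
\hat f_q = \sum_{j=1}^{n}\hat\alpha_j K(\cdot, x_j),\qquad
\hat\alpha = \operatorname*{arg\,min}_{\alpha\in\mathbb{R}^{n}}\; \frac{1}{n}\sum_{i=1}^{n}\Bigl(y_i-\sum_{j=1}^{n}\alpha_j K(x_i,x_j)\Bigr)^{2} + \lambda \sum_{j=1}^{n}|\alpha_j|^{q}.
\]
Here $q = 1$ gives the LASSO-type estimator and $q = 2$ the ridge-type one discussed above.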
Inverse Density as an Inverse Problem: The Fredholm Equation Approach
In this paper we address the problem of estimating the ratio $q/p$, where $p$
is a density function and $q$ is another density, or, more generally, an
arbitrary function. Knowing or approximating this ratio is needed in various
problems of inference and integration, in particular, when one needs to average
a function with respect to one probability distribution, given a sample from
another. It is often referred to as {\it importance sampling} in statistical
inference and is also closely related to the problem of {\it covariate shift}
in transfer learning as well as to various MCMC methods. It may also be useful
for separating the underlying geometry of a space, say a manifold, from the
density function defined on it.
Our approach is based on reformulating the problem of estimating the ratio
$q/p$ as an inverse problem in terms of an integral operator
corresponding to a kernel, and thus reducing it to an integral equation, known
as the Fredholm problem of the first kind. This formulation, combined with the
techniques of regularization and kernel methods, leads to a principled
kernel-based framework for constructing algorithms and for analyzing them
theoretically.
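The reformulation rests on a simple identity (the operator notation below is ours, chosen to mirror the description above): writing $\mathcal{K}_p f(x) = \int k(x,y)\,f(y)\,p(y)\,dy$ for the integral operator associated with a kernel $k$ and the density $p$, the ratio $q/p$ solves the Fredholm equation of the first kind
\[
\mathcal{K}_p\!\left(\tfrac{q}{p}\right)(x) = \int k(x,y)\,\frac{q(y)}{p(y)}\,p(y)\,dy = \int k(x,y)\,q(y)\,dy,
\]
and both sides of this equation can be estimated from samples drawn from $p$ and $q$, respectively.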
The resulting family of algorithms (FIRE, for Fredholm Inverse Regularized
Estimator) is flexible, simple and easy to implement.
We provide detailed theoretical analysis including concentration bounds and
convergence rates for the Gaussian kernel in the case of densities defined on
$\mathbb{R}^d$, compact domains in $\mathbb{R}^d$ and smooth $d$-dimensional
sub-manifolds of the Euclidean space.
We also show experimental results including applications to classification
and semi-supervised learning within the covariate shift framework and
demonstrate some encouraging experimental comparisons. We also show how the
parameters of our algorithms can be chosen in a completely unsupervised manner.
Comment: Fixing a few typos in last version
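As a rough illustration of how a Fredholm-based ratio estimator can be implemented, the sketch below discretizes the Fredholm equation on the sample from $p$ and applies a Tikhonov-regularized linear solve. It is a simplified stand-in rather than the exact FIRE formulation: the Gaussian bandwidth, the regularization parameter, and the plain matrix-level regularization are assumptions made for illustration only.

import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    # k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)), returned as a matrix.
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def fredholm_ratio_estimate(X_p, X_q, bandwidth=0.5, lam=1e-2):
    # Estimate q/p at the points X_p (sample from p), given X_q (sample from q).
    # Discretized Fredholm equation: (1/n) K_pp v ~ (1/m) K_pq 1,
    # solved with a Tikhonov-style regularizer lam * I.
    n, m = len(X_p), len(X_q)
    K_pp = gaussian_kernel(X_p, X_p, bandwidth)   # empirical version of the operator K_p
    K_pq = gaussian_kernel(X_p, X_q, bandwidth)   # empirical version of K_q applied to 1
    rhs = K_pq.sum(axis=1) / m
    A = K_pp / n + lam * np.eye(n)
    return np.linalg.solve(A, rhs)                # entry i approximates q(x_i) / p(x_i)

# Toy covariate-shift check: p = N(0, 1), q = N(1, 1); the true ratio grows with x.
rng = np.random.default_rng(0)
X_p = rng.normal(0.0, 1.0, size=(300, 1))
X_q = rng.normal(1.0, 1.0, size=(300, 1))
ratio = fredholm_ratio_estimate(X_p, X_q)
order = np.argsort(X_p[:, 0])
print("ratio near smallest x:", np.round(ratio[order[:3]], 3))
print("ratio near largest x: ", np.round(ratio[order[-3:]], 3))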