6 research outputs found
New Generalization Bounds for Learning Kernels
This paper presents several novel generalization bounds for the problem of
learning kernels based on the analysis of the Rademacher complexity of the
corresponding hypothesis sets. Our bound for learning kernels with a convex
combination of p base kernels has only a log(p) dependency on the number of
kernels, p, which is considerably more favorable than the previous best bound
given for the same problem. We also give a novel bound for learning with a
linear combination of p base kernels with an L_2 regularization whose
dependency on p is only p^{1/4}.
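As an illustration of the hypothesis set the abstract refers to, the following minimal sketch builds a Gram matrix from a convex combination of base kernels. The two base kernels, the sample points, and the weights mu are assumptions chosen for the example; the paper's bounds concern the complexity of such combinations, not this particular code.

```python
import math

def linear_kernel(x, y):
    """Plain inner product between two points."""
    return sum(a * b for a, b in zip(x, y))

def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def combined_kernel(x, y, base_kernels, mu):
    """Convex combination sum_k mu_k * K_k(x, y), with mu on the simplex."""
    if any(m < 0 for m in mu) or abs(sum(mu) - 1.0) > 1e-9:
        raise ValueError("mu must be nonnegative and sum to 1")
    return sum(m * k(x, y) for m, k in zip(mu, base_kernels))

def gram_matrix(points, base_kernels, mu):
    """Gram matrix of the combined kernel on the given points."""
    return [[combined_kernel(x, y, base_kernels, mu) for y in points]
            for x in points]

# Hypothetical data: p = 2 base kernels, three points in the plane.
points = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
mu = [0.25, 0.75]
K = gram_matrix(points, [linear_kernel, gaussian_kernel], mu)
```

Because each base kernel is positive definite symmetric and the weights are nonnegative, the combined Gram matrix K is symmetric and positive semidefinite as well.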
Multiple Kernel Learning from Noisy Labels by Stochastic Programming
We study the problem of multiple kernel learning from noisy labels. This is
in contrast to most of the previous studies on multiple kernel learning that
mainly focus on developing efficient algorithms and assume perfectly labeled
training examples. Directly applying the existing multiple kernel learning
algorithms to noisily labeled examples often leads to suboptimal performance
due to the incorrect class assignments. We address this challenge by casting
multiple kernel learning from noisy labels into a stochastic programming
problem, and presenting a minimax formulation. We develop an efficient
algorithm for solving the related convex-concave optimization problem with a
fast convergence rate of O(1/T), where T is the number of iterations.
Empirical studies on UCI data sets verify both the effectiveness of the
proposed framework and the efficiency of the proposed optimization algorithm. Comment: ICML201
Structured Sparsity and Generalization
We present a data dependent generalization bound for a large class of
regularized algorithms which implement structured sparsity constraints. The
bound can be applied to standard squared-norm regularization, the Lasso, the
group Lasso, some versions of the group Lasso with overlapping groups, multiple
kernel learning and other regularization schemes. In all these cases
competitive results are obtained. A novel feature of our bound is that it can
be applied in an infinite dimensional setting such as the Lasso in a separable
Hilbert space or multiple kernel learning with a countable number of kernels.
Voted Kernel Regularization
This paper presents an algorithm, Voted Kernel Regularization, that provides
the flexibility of using potentially very complex kernel functions such as
predictors based on much higher-degree polynomial kernels, while benefitting
from strong learning guarantees. The success of our algorithm arises from
derived bounds that suggest a new regularization penalty in terms of the
Rademacher complexities of the corresponding families of kernel maps. In a
series of experiments we demonstrate the improved performance of our algorithm
as compared to baselines. Furthermore, the algorithm enjoys several favorable
properties. The optimization problem is convex, it allows for learning with
non-PDS kernels, and the solutions are highly sparse, resulting in improved
classification speed and memory requirements. Comment: 16 pages
Guaranteed Classification via Regularized Similarity Learning
Learning an appropriate (dis)similarity function from the available data is a
central problem in machine learning, since the success of many machine learning
algorithms critically depends on the choice of a similarity function to compare
examples. Although many approaches for similarity metric learning have been
proposed, there is little theoretical study on the links between similarity
metric learning and the classification performance of the resulting classifier.
In this paper, we propose a regularized similarity learning formulation
associated with general matrix-norms, and establish their generalization
bounds. We show that the generalization error of the resulting linear separator
can be bounded by the derived generalization bound of similarity learning. This
shows that a good generalization of the learnt similarity function guarantees
a good classification of the resulting linear classifier. Our results extend
and improve those obtained by Bellet et al. [3]. Because their techniques
depend on the notion of uniform stability [6], the bound obtained there
holds true only for the Frobenius matrix-norm regularization. Our techniques
using the Rademacher complexity [5] and its related Khinchin-type inequality
enable us to establish bounds for regularized similarity learning formulations
associated with general matrix-norms including sparse L_1-norm and mixed
(2,1)-norm.
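For readers unfamiliar with the regularizers named in this abstract, here is a small sketch of the three matrix norms on a toy matrix. The matrix A is made up for illustration, and the (2,1)-norm is taken here as the sum of the Euclidean norms of the rows; some authors sum over columns instead, so this convention is an assumption rather than the paper's stated definition.

```python
import math

def frobenius_norm(A):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in A for v in row))

def l1_norm(A):
    """Entrywise L_1-norm: sum of absolute values of all entries."""
    return sum(abs(v) for row in A for v in row)

def mixed_21_norm(A):
    """Mixed (2,1)-norm, assumed row-wise: sum of Euclidean norms of rows."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in A)

# Hypothetical matrix with one all-zero row.
A = [[3.0, 4.0],
     [0.0, 0.0],
     [0.0, -5.0]]

norms = {
    "frobenius": frobenius_norm(A),
    "l1": l1_norm(A),
    "mixed_21": mixed_21_norm(A),
}
```

The L_1-norm penalizes every nonzero entry individually, while the (2,1)-norm charges nothing for a row that is entirely zero, which is why it is associated with row-level sparsity.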
Ensembles of Kernel Predictors
This paper examines the problem of learning with a finite and possibly large
set of p base kernels. It presents a theoretical and empirical analysis of an
approach addressing this problem based on ensembles of kernel predictors. This
includes novel theoretical guarantees based on the Rademacher complexity of the
corresponding hypothesis sets, the introduction and analysis of a learning
algorithm based on these hypothesis sets, and a series of experiments using
ensembles of kernel predictors with several data sets. Both convex combinations
of kernel-based hypotheses and more general Lq-regularized nonnegative
combinations are analyzed. These theoretical, algorithmic, and empirical
results are compared with those achieved by using learning kernel techniques,
which can be viewed as another approach for solving the same problem.
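The idea of combining kernel-based hypotheses, rather than combining the kernels themselves, can be sketched as follows. Each base hypothesis has the form h_k(x) = sum_i alpha_i K_k(x_i, x), and the ensemble is a nonnegative combination whose weight vector satisfies an Lq-norm constraint. The Gaussian bandwidths, the coefficients alpha, the weights mu, and the radius are all hypothetical values picked for the example; no training is performed, and this is not the paper's algorithm.

```python
import math

def gaussian_kernel(gamma):
    """Return a Gaussian kernel function with the given bandwidth parameter."""
    def k(x, y):
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k

def kernel_hypothesis(kernel, support, alpha):
    """Kernel-based hypothesis h(x) = sum_i alpha_i * K(x_i, x)."""
    def h(x):
        return sum(a * kernel(xi, x) for a, xi in zip(alpha, support))
    return h

def lq_norm(mu, q):
    """Lq-norm of a weight vector."""
    return sum(abs(m) ** q for m in mu) ** (1.0 / q)

def ensemble(hypotheses, mu, q=2.0, radius=1.0):
    """Nonnegative combination sum_k mu_k * h_k with ||mu||_q <= radius."""
    if any(m < 0 for m in mu):
        raise ValueError("weights must be nonnegative")
    if lq_norm(mu, q) > radius + 1e-12:
        raise ValueError("weights violate the Lq-norm constraint")
    def predict(x):
        return sum(m * h(x) for m, h in zip(mu, hypotheses))
    return predict

# Hypothetical setup: one hypothesis per base kernel, fixed coefficients.
support = [(0.0,), (1.0,)]
alpha = [1.0, -1.0]
hyps = [kernel_hypothesis(gaussian_kernel(g), support, alpha)
        for g in (0.1, 1.0, 10.0)]
f = ensemble(hyps, mu=[0.6, 0.0, 0.8], q=2.0, radius=1.0)
```

With q = 1 and radius 1 the constraint reduces to the convex-combination case mentioned in the abstract; larger q permits denser weight vectors.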