Kernels for linear time invariant system identification
In this paper, we study the problem of identifying the impulse response of a
linear time invariant (LTI) dynamical system from the knowledge of the input
signal and a finite set of noisy output observations. We adopt an approach
based on regularization in a Reproducing Kernel Hilbert Space (RKHS) that takes
into account both continuous and discrete time systems. The focus of the paper
is on designing spaces that are well suited for temporal impulse response
modeling. To this end, we construct and characterize general families of
kernels that incorporate system properties such as stability, relative degree,
absence of oscillatory behavior, smoothness, or delay. In addition, we discuss
the possibility of automatically searching over these classes by means of
kernel learning techniques, so as to capture different modes of the system to
be identified.
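A minimal sketch of this kind of kernel-regularized estimator follows, assuming a
discrete-time FIR model, an exponentially decaying kernel K[i, j] = lam^max(i, j) as a
simple stand-in for the stability-encoding kernel families discussed above, and
illustrative values for the decay rate lam and regularization weight gamma; this is an
example of the general approach, not the paper's construction.

```python
import numpy as np

def stability_kernel(n, lam=0.8):
    """Exponentially decaying kernel K[i, j] = lam**max(i, j), encoding the
    prior that the impulse response decays (a simple stability prior)."""
    idx = np.arange(n)
    return lam ** np.maximum.outer(idx, idx)

def identify_fir(u, y, n_taps=50, lam=0.8, gamma=1e-2):
    """Regularized LS estimate of an FIR impulse response g of length n_taps:
    minimize ||y - Phi g||^2 + gamma * g' K^{-1} g."""
    N = len(y)
    Phi = np.zeros((N, n_taps))          # regressor: y[t] ~ sum_k g[k] u[t-k]
    for k in range(n_taps):
        Phi[k:, k] = u[:N - k]
    K = stability_kernel(n_taps, lam)
    A = Phi @ K @ Phi.T + gamma * np.eye(N)
    return K @ Phi.T @ np.linalg.solve(A, y)   # g = K Phi'(Phi K Phi' + gamma I)^-1 y

# Toy example: first-order system y[t] = 0.9 y[t-1] + u[t-1] + noise
rng = np.random.default_rng(0)
u = rng.standard_normal(200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.9 * y[t - 1] + u[t - 1]
y += 0.05 * rng.standard_normal(200)
print(identify_fir(u, y)[:5])   # should roughly follow 0, 1, 0.9, 0.81, ...
```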
Learning 2D Gabor Filters by Infinite Kernel Learning Regression
Gabor functions have widespread applications in image processing and
computer vision. In this paper, we prove that 2D Gabor functions are
translation-invariant positive-definite kernels and propose a novel formulation
for the problem of image representation with Gabor functions based on infinite
kernel learning regression. Using this formulation, we obtain a support vector
expansion of an image based on a mixture of Gabor functions. The problem with
this representation is that all Gabor functions are present at all support
vector pixels. Applying LASSO to this support vector expansion, we obtain a
sparse representation in which each Gabor function is positioned at a very
small set of pixels. As an application, we introduce a method for learning a
dataset-specific set of Gabor filters that can be used subsequently for feature
extraction. Our experiments show that use of the learned Gabor filters improves
the recognition accuracy of a recently introduced face recognition algorithm.
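As a hedged illustration of the first claim, the snippet below evaluates a real 2D Gabor
function at pixel displacements and uses it as a translation-invariant kernel
k(p, q) = g(p - q); the envelope width, frequency, and orientation are arbitrary example
values, not parameters from the paper.

```python
import numpy as np

def gabor2d(dx, dy, sigma=2.0, freq=0.25, theta=0.0):
    """Real 2D Gabor function at displacement (dx, dy): a Gaussian envelope
    modulated by a cosine plane wave with the given frequency and orientation."""
    xr = dx * np.cos(theta) + dy * np.sin(theta)        # displacement along the wave
    envelope = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * freq * xr)

def gabor_kernel(p, q, **params):
    """Translation-invariant kernel between pixel locations: k(p, q) = g(p - q)."""
    return gabor2d(p[0] - q[0], p[1] - q[1], **params)

# Kernel matrix over a small grid of pixel coordinates
coords = [(i, j) for i in range(8) for j in range(8)]
K = np.array([[gabor_kernel(p, q) for q in coords] for p in coords])
print(K.shape)   # (64, 64), symmetric by construction
```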
Alignment Based Kernel Learning with a Continuous Set of Base Kernels
The success of kernel-based learning methods depends on the choice of kernel.
Recently, kernel learning methods have been proposed that use data to select
the most appropriate kernel, usually by combining a set of base kernels. We
introduce a new algorithm for kernel learning that combines a continuous set of
base kernels, without the common step of discretizing the space of base
kernels. We demonstrate that our new method achieves state-of-the-art
performance across a variety of real-world datasets. Furthermore, we explicitly
demonstrate the importance of combining the right dictionary of kernels, which
is problematic for methods based on a finite set of base kernels chosen a
priori. Our method is not the first approach to work with continuously
parameterized kernels. However, we show that our method requires substantially
less computation than previous such approaches, and so is more amenable to
multi-dimensional parameterizations of base kernels, which we demonstrate.
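A simplified illustration of working with a continuously parameterized kernel family
(not the paper's algorithm): maximize kernel-target alignment over a continuous Gaussian
bandwidth with a 1-D optimizer, instead of choosing from a discretized set of base
kernels. The toy data, the use of scipy.optimize, and the bandwidth range are
assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.spatial.distance import cdist

def alignment(K, y):
    """Kernel-target alignment <K, yy'>_F / (||K||_F * ||yy'||_F)."""
    return (y @ K @ y) / (np.linalg.norm(K) * (y @ y))

def gaussian_kernel(X, gamma):
    return np.exp(-gamma * cdist(X, X, "sqeuclidean"))

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 5))
y = np.sign(X[:, 0])                      # toy labels in {-1, +1}

# Search the continuous bandwidth parameter (log scale) for maximal alignment
res = minimize_scalar(lambda t: -alignment(gaussian_kernel(X, np.exp(t)), y),
                      bounds=(-5.0, 5.0), method="bounded")
print("selected gamma:", np.exp(res.x))
```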
A Metric-learning based framework for Support Vector Machines and Multiple Kernel Learning
Most metric learning algorithms, as well as Fisher's Discriminant Analysis
(FDA), optimize some cost function of different measures of within- and
between-class distances. On the other hand, Support Vector Machines (SVMs) and
several Multiple Kernel Learning (MKL) algorithms are based on the SVM large
margin theory. Recently, SVMs have been analyzed from a metric learning
perspective, motivating efforts to develop new algorithms that build on the
strengths of each. Inspired by
the metric learning interpretation of SVM, we develop here a new
metric-learning based SVM framework in which we incorporate metric learning
concepts within SVM. We extend the optimization problem of SVM to include some
measure of the within-class distance and along the way we develop a new
within-class distance measure which is appropriate for SVM. In addition, we
adopt the same approach for MKL and show that it can also be formulated as a
Mahalanobis metric learning problem. Our end result is a number of SVM/MKL
algorithms that incorporate metric learning concepts. We experiment with them
on a set of benchmark datasets and observe important predictive performance
improvements.
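To make the idea of augmenting the SVM objective with a within-class term concrete, here
is a rough sketch that adds a generic within-class scatter penalty to a linear
hinge-loss objective and optimizes it by subgradient descent; the penalty, its weight
beta, and all other constants are illustrative assumptions, not the distance measure
derived in the paper.

```python
import numpy as np

def svm_with_within_class(X, y, C=1.0, beta=0.1, lr=0.01, epochs=200):
    """Subgradient descent on a hinge-loss objective augmented with a generic
    within-class scatter penalty:
        0.5 ||w||^2 + beta * w' S_w w + C * sum_i max(0, 1 - y_i (w'x_i + b))."""
    classes = np.unique(y)
    S_w = sum(np.cov(X[y == c].T, bias=True) * (y == c).sum() for c in classes)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1                      # margin-violating points
        grad_w = w + 2 * beta * (S_w @ w) - C * (y[active, None] * X[active]).sum(0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy two-class problem
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 2)) + [2, 0],
               rng.normal(0, 1, (40, 2)) - [2, 0]])
y = np.hstack([np.ones(40), -np.ones(40)])
w, b = svm_with_within_class(X, y)
print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```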
A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning
We consider the problem of simultaneously learning to linearly combine a very
large number of kernels and learn a good predictor based on the learnt kernel.
When the number of kernels p to be combined is very large, multiple kernel
learning methods whose computational cost scales linearly in p are
intractable. We propose a randomized version of the mirror descent algorithm to
overcome this issue, under the objective of minimizing the group-norm
penalized empirical risk. The key to achieve the required exponential speed-up
is the computationally efficient construction of low-variance estimates of the
gradient. We propose importance sampling based estimates, and find that the
ideal distribution samples a coordinate with a probability proportional to the
magnitude of the corresponding gradient. We show the surprising result that in
the case of learning the coefficients of a polynomial kernel, the combinatorial
structure of the base kernels to be combined allows the implementation of
sampling from this distribution to run in O(log p) time, making the total
computational cost of the method to achieve an ε-optimal solution
O(log(p)/ε^2), thereby allowing our method to operate for very
large values of p. Experiments with simulated and real data confirm that the
new algorithm is computationally more efficient than its state-of-the-art
alternatives.
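A minimal sketch of the core estimator (not the paper's full method, which avoids
computing the whole gradient by exploiting the combinatorial structure of polynomial
base kernels): exponentiated-gradient mirror descent on the simplex of kernel weights,
with a single-coordinate gradient estimate importance-sampled in proportion to the
gradient magnitudes. The toy objective, step size, and iteration count are assumptions.

```python
import numpy as np

def randomized_md(grad_fn, p, steps=300, eta=0.1, rng=None):
    """Exponentiated-gradient mirror descent on the simplex of kernel weights,
    using a one-coordinate importance-sampled gradient estimate. grad_fn returns
    the full gradient here only to define the sampling distribution; the paper
    avoids this O(p) cost for polynomial base kernels."""
    rng = rng or np.random.default_rng(0)
    mu = np.full(p, 1.0 / p)
    for _ in range(steps):
        g = grad_fn(mu)
        probs = np.abs(g) / np.abs(g).sum()     # sample prop. to gradient magnitude
        i = rng.choice(p, p=probs)
        g_est = np.zeros(p)
        g_est[i] = g[i] / probs[i]              # unbiased single-coordinate estimate
        mu = mu * np.exp(-eta * g_est)          # multiplicative (mirror) update
        mu /= mu.sum()                          # stay on the simplex
    return mu

# Toy quadratic objective with one clearly dominant coordinate
rng = np.random.default_rng(1)
b = 0.1 * rng.random(50)
b[7] = 5.0
mu_hat = randomized_md(lambda mu: 2 * mu - b, p=50, rng=rng)
print("weight on dominant kernel:", round(mu_hat[7], 3))
```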
Algorithms for Learning Kernels Based on Centered Alignment
This paper presents new and effective algorithms for learning kernels. In
particular, as shown by our empirical results, these algorithms consistently
outperform the so-called uniform combination solution that has proven to be
difficult to improve upon in the past, as well as other algorithms for learning
kernels based on convex combinations of base kernels in both classification and
regression. Our algorithms are based on the notion of centered alignment which
is used as a similarity measure between kernels or kernel matrices. We present
a number of novel algorithmic, theoretical, and empirical results for learning
kernels based on our notion of centered alignment. In particular, we describe
efficient algorithms for learning a maximum alignment kernel by showing that
the problem can be reduced to a simple QP and discuss a one-stage algorithm for
learning both a kernel and a hypothesis based on that kernel using an
alignment-based regularization. Our theoretical results include a novel
concentration bound for centered alignment between kernel matrices, the proof
of the existence of effective predictors for kernels with high alignment, both
for classification and for regression, and the proof of stability-based
generalization bounds for a broad family of algorithms for learning kernels
based on centered alignment. We also report the results of experiments with our
centered alignment-based algorithms in both classification and regression.
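For reference, centered alignment between two kernel matrices is the Frobenius inner
product of their centered versions, normalized by their Frobenius norms; the sketch
below computes it, with the toy data and bandwidth chosen arbitrarily.

```python
import numpy as np

def center(K):
    """Center a kernel matrix: Kc = H K H with H = I - (1/m) 11'."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ K @ H

def centered_alignment(K1, K2):
    """rho(K1, K2) = <K1c, K2c>_F / (||K1c||_F * ||K2c||_F)."""
    K1c, K2c = center(K1), center(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

# Toy check: alignment of a Gaussian kernel with the ideal target kernel yy'
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3))
y = np.sign(X @ np.array([1.0, 0.0, 0.0]))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * D2)
print(centered_alignment(K, np.outer(y, y)))
```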
Ensembles of Kernel Predictors
This paper examines the problem of learning with a finite and possibly large
set of p base kernels. It presents a theoretical and empirical analysis of an
approach addressing this problem based on ensembles of kernel predictors. This
includes novel theoretical guarantees based on the Rademacher complexity of the
corresponding hypothesis sets, the introduction and analysis of a learning
algorithm based on these hypothesis sets, and a series of experiments using
ensembles of kernel predictors with several data sets. Both convex combinations
of kernel-based hypotheses and more general Lq-regularized nonnegative
combinations are analyzed. These theoretical, algorithmic, and empirical
results are compared with those achieved by using learning kernel techniques,
which can be viewed as another approach for solving the same problem.
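A hedged sketch of the ensemble idea: fit one kernel ridge regressor per base kernel and
combine their predictions with a convex combination (uniform weights here; the kernels,
regularization constant, and weighting are illustrative assumptions rather than the
configurations studied in the paper).

```python
import numpy as np

def krr_fit(K, y, lam=1.0):
    """Kernel ridge regression coefficients: alpha = (K + lam I)^{-1} y."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def ensemble_predict(K_list_test, alphas, weights):
    """Convex combination of the p kernel predictors' outputs."""
    preds = [Kt @ a for Kt, a in zip(K_list_test, alphas)]
    return sum(w * pr for w, pr in zip(weights, preds))

def gauss(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy data and p = 3 Gaussian base kernels of different widths
rng = np.random.default_rng(0)
X, Xte = rng.standard_normal((80, 4)), rng.standard_normal((20, 4))
y = np.sin(X[:, 0])
gammas = [0.1, 1.0, 10.0]
alphas = [krr_fit(gauss(X, X, g), y) for g in gammas]
K_test = [gauss(Xte, X, g) for g in gammas]
weights = np.full(len(gammas), 1.0 / len(gammas))   # uniform convex combination
print(ensemble_predict(K_test, alphas, weights)[:5])
```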
L2 Regularization for Learning Kernels
The choice of the kernel is critical to the success of many learning
algorithms but it is typically left to the user. Instead, the training data can
be used to learn the kernel by selecting it out of a given family, such as that
of non-negative linear combinations of p base kernels, constrained by a trace
or L1 regularization. This paper studies the problem of learning kernels with
the same family of kernels but with an L2 regularization instead, and for
regression problems. We analyze the problem of learning kernels with ridge
regression. We derive the form of the solution of the optimization problem and
give an efficient iterative algorithm for computing that solution. We present a
novel theoretical analysis of the problem based on stability and give learning
bounds for orthogonal kernels that contain only an additive term O(√p/m) when
compared to the standard kernel ridge regression stability bound. We also
report the results of experiments indicating that L1 regularization can lead to
modest improvements for a small number of kernels, but to performance
degradations in larger-scale cases. In contrast, L2 regularization never
degrades performance and in fact achieves significant improvements with a large
number of kernels. Comment: Appears in Proceedings of the Twenty-Fifth
Conference on Uncertainty in Artificial Intelligence (UAI 2009).
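The sketch below is a simplified projected-gradient stand-in for the kind of iterative
algorithm described, not the paper's derived solution: alternate a kernel ridge
regression solve with an update of the nonnegative kernel weights, kept inside an L2
ball. The constants lam, Lambda, eta, and the toy kernels are assumptions.

```python
import numpy as np

def learn_kernel_l2(K_list, y, lam=1.0, Lambda=1.0, eta=0.05, iters=100):
    """Projected-gradient sketch for L2-constrained kernel learning with ridge
    regression: weights mu >= 0 with ||mu||_2 <= Lambda."""
    p, m = len(K_list), len(y)
    mu = np.full(p, Lambda / np.sqrt(p))
    for _ in range(iters):
        K = sum(w * Kk for w, Kk in zip(mu, K_list))
        alpha = np.linalg.solve(K + lam * np.eye(m), y)     # KRR solution for fixed mu
        grad = np.array([alpha @ Kk @ alpha for Kk in K_list])
        # Descent on the learned-kernel objective pushes weight toward kernels
        # with large alpha' K_k alpha; then enforce nonnegativity and the L2 ball.
        mu = np.maximum(mu + eta * grad, 0.0)
        norm = np.linalg.norm(mu)
        if norm > Lambda:
            mu *= Lambda / norm
    return mu, alpha

# Usage with a few toy Gaussian kernels
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(60)
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_list = [np.exp(-g * D2) for g in (0.1, 1.0, 10.0)]
mu, alpha = learn_kernel_l2(K_list, y)
print("learned kernel weights:", np.round(mu, 3))
```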
Kernel machines with two layers and multiple kernel learning
In this paper, the framework of kernel machines with two layers is
introduced, generalizing classical kernel methods. The new learning methodology
provides a formal connection between computational architectures with multiple
layers and the theme of kernel learning in standard regularization methods.
First, a representer theorem for two-layer networks is presented, showing that
finite linear combinations of kernels on each layer are optimal architectures
whenever the corresponding functions solve suitable variational problems in
reproducing kernel Hilbert spaces (RKHS). The input-output map expressed by
these architectures turns out to be equivalent to a suitable single-layer
kernel machine in which the kernel function is also learned from the data.
Recently, the so-called multiple kernel learning methods have attracted
considerable attention in the machine learning literature. In this paper,
multiple kernel learning methods are shown to be specific cases of kernel
machines with two layers in which the second layer is linear. Finally, a simple
and effective multiple kernel learning method called RLS2 (regularized least
squares with two layers) is introduced, and its performance on several
learning problems is extensively analyzed. An open source MATLAB toolbox to
train and validate RLS2 models with a Graphical User Interface is available.
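As a rough illustration of a two-layer architecture with a linear second layer (a
simplified stand-in, not the RLS2 algorithm or its toolbox): the first layer fits one
regularized least-squares predictor per kernel, and the second layer ridge-regresses
the target on the first-layer outputs. All kernels and constants are example choices.

```python
import numpy as np

def rls_layer1(K_list, y, lam=1.0):
    """First layer: one regularized least-squares predictor per kernel."""
    m = len(y)
    return [np.linalg.solve(K + lam * np.eye(m), y) for K in K_list]

def linear_layer2(K_list, alphas, y, lam2=1e-3):
    """Second (linear) layer: ridge-regress y on the first-layer outputs."""
    Z = np.column_stack([K @ a for K, a in zip(K_list, alphas)])   # m x p outputs
    return np.linalg.solve(Z.T @ Z + lam2 * np.eye(Z.shape[1]), Z.T @ y)

# Toy data with three Gaussian base kernels of different widths
rng = np.random.default_rng(0)
X = rng.standard_normal((70, 3))
y = np.sin(2 * X[:, 0])
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_list = [np.exp(-g * D2) for g in (0.2, 1.0, 5.0)]

alphas = rls_layer1(K_list, y)
w = linear_layer2(K_list, alphas, y)
print("second-layer weights:", np.round(w, 3))
```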
Optimality Implies Kernel Sum Classifiers are Statistically Efficient
We propose a novel combination of optimization tools with learning theory
bounds in order to analyze the sample complexity of optimal kernel sum
classifiers. This contrasts with the typical learning-theoretic results, which hold
for all (potentially suboptimal) classifiers. Our work also justifies
assumptions made in prior work on multiple kernel learning. As a byproduct of
our analysis, we also provide a new form of Rademacher complexity for
hypothesis classes containing only optimal classifiers.