Online Learning of Noisy Data with Kernels
We study online learning when individual instances are corrupted by
adversarially chosen random noise. We assume the noise distribution is unknown,
and may change over time with no restriction other than having zero mean and
bounded variance. Our technique relies on a family of unbiased estimators for
non-linear functions, which may be of independent interest. We show that a
variant of online gradient descent can learn functions in any dot-product
(e.g., polynomial) or Gaussian kernel space with any analytic convex loss
function. Our variant uses randomized estimates that need to query a random
number of noisy copies of each instance, where with high probability this
number is upper bounded by a constant. Allowing such multiple queries cannot be
avoided: Indeed, we show that online learning is in general impossible when
only one noisy copy of each instance can be accessed.
Comment: This is a full version of the paper appearing in the 23rd
International Conference on Learning Theory (COLT 2010)
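The role of multiple noisy copies can be illustrated with a toy case. The sketch below is not the paper's estimator family; it only shows, for the single non-linear function x^2 and an assumed Gaussian noise model, why one noisy copy gives a biased estimate while the product of two independent copies is unbiased. All function names are hypothetical.

```python
import random


def noisy_copy(x, sigma, rng):
    """Return x corrupted by zero-mean Gaussian noise (an assumed noise model)."""
    return x + rng.gauss(0.0, sigma)


def square_from_one_copy(x, sigma, rng):
    # Squaring a single noisy copy is biased: E[(x + e)^2] = x^2 + sigma^2.
    c = noisy_copy(x, sigma, rng)
    return c * c


def square_from_two_copies(x, sigma, rng):
    # The product of two independent copies is unbiased:
    # E[(x + e1)(x + e2)] = x^2, since e1 and e2 are independent with zero mean.
    return noisy_copy(x, sigma, rng) * noisy_copy(x, sigma, rng)


def average(estimator, x, sigma, trials, seed=0):
    """Monte Carlo average of an estimator over independent trials."""
    rng = random.Random(seed)
    return sum(estimator(x, sigma, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    x, sigma = 2.0, 1.0
    print("one copy  :", average(square_from_one_copy, x, sigma, 200_000))
    print("two copies:", average(square_from_two_copies, x, sigma, 200_000))
```

With x = 2 and unit noise variance, the one-copy average should settle near x^2 + sigma^2 = 5 and the two-copy average near x^2 = 4.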
Online Learning with Multiple Operator-valued Kernels
We consider the problem of learning a vector-valued function f in an online
learning setting. The function f is assumed to lie in a reproducing Hilbert
space of operator-valued kernels. We describe two online algorithms for
learning f while taking into account the output structure. A first contribution
is an algorithm, ONORMA, that extends the standard kernel-based online learning
algorithm NORMA from scalar-valued to operator-valued setting. We report a
cumulative error bound that holds both for classification and regression. We
then define a second algorithm, MONORMA, which addresses the limitation of
pre-defining the output structure in ONORMA by learning sequentially a linear
combination of operator-valued kernels. Our experiments show that the proposed
algorithms achieve good performance at low computational cost.
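For orientation, the scalar-valued NORMA baseline that ONORMA extends can be sketched as follows. This is a minimal sketch and not the operator-valued algorithm from the abstract: it assumes a scalar Gaussian kernel, the squared loss 0.5 * (f(x) - y)^2, and fixed step size and regularization. The class name and parameter values are hypothetical.

```python
import math


def gaussian_kernel(a, b, width=1.0):
    """Scalar Gaussian kernel; k(x, x) = 1 for every x."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


class ScalarNorma:
    """Kernel expansion f(x) = sum_i coeffs[i] * k(centers[i], x), updated online."""

    def __init__(self, eta=0.5, lam=0.01, kernel=gaussian_kernel):
        self.eta = eta        # step size (assumed constant)
        self.lam = lam        # regularization strength
        self.kernel = kernel
        self.centers = []
        self.coeffs = []

    def predict(self, x):
        return sum(a * self.kernel(c, x) for a, c in zip(self.coeffs, self.centers))

    def update(self, x, y):
        # Derivative of the squared loss 0.5 * (f(x) - y)^2 with respect to f(x).
        residual = self.predict(x) - y
        # Regularization shrinks every existing coefficient by (1 - eta * lam).
        shrink = 1.0 - self.eta * self.lam
        self.coeffs = [shrink * a for a in self.coeffs]
        # The new example enters the expansion with coefficient -eta * residual.
        self.centers.append(x)
        self.coeffs.append(-self.eta * residual)
        return residual
```

The expansion grows by one term per round, so practical variants usually truncate old terms; the shrinkage factor makes their contribution decay geometrically.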
Analyzing sparse dictionaries for online learning with kernels
Many signal processing and machine learning methods share essentially the
same linear-in-the-parameter model, with as many parameters as available
samples as in kernel-based machines. Sparse approximation is essential in many
disciplines, with new challenges emerging in online learning with kernels. To
this end, several sparsity measures have been proposed in the literature to
quantify sparse dictionaries and construct relevant ones, the most prolific
ones being the distance, the approximation, the coherence and the Babel
measures. In this paper, we analyze sparse dictionaries based on these
measures. By conducting an eigenvalue analysis, we show that these sparsity
measures share many properties, including the linear independence condition and
inducing a well-posed optimization problem. Furthermore, we prove that there
exists a quasi-isometry between the parameter (i.e., dual) space and the
dictionary's induced feature space.
Comment: 10 pages
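Two of the measures named above, the coherence and the Babel measure, can be computed directly from kernel evaluations. The sketch below assumes a unit-norm Gaussian kernel on scalar inputs and uses the usual definitions (largest off-diagonal Gram entry, and largest off-diagonal row sum, in absolute value). The function names and the admission rule are illustrative, not taken from the paper.

```python
import math


def unit_gaussian_kernel(a, b, width=1.0):
    """Gaussian kernel with k(x, x) = 1 (unit-norm atoms in feature space)."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def coherence(dictionary, kernel=unit_gaussian_kernel):
    """Largest absolute kernel value between two distinct atoms."""
    return max(
        (abs(kernel(a, b))
         for i, a in enumerate(dictionary)
         for j, b in enumerate(dictionary) if i != j),
        default=0.0,
    )


def babel(dictionary, kernel=unit_gaussian_kernel):
    """Largest sum of absolute kernel values between one atom and all others."""
    return max(
        (sum(abs(kernel(a, b)) for j, b in enumerate(dictionary) if j != i)
         for i, a in enumerate(dictionary)),
        default=0.0,
    )


def admit_by_coherence(dictionary, candidate, threshold, kernel=unit_gaussian_kernel):
    """Accept a candidate atom only if it keeps the coherence below the threshold."""
    return all(abs(kernel(candidate, a)) <= threshold for a in dictionary)


if __name__ == "__main__":
    atoms = [0.0, 1.0, 2.0]
    print("coherence:", coherence(atoms))
    print("babel    :", babel(atoms))
    print("admit 5.0:", admit_by_coherence(atoms, 5.0, threshold=0.5))
```

For a unit-norm kernel, a Babel value below 1 makes the Gram matrix strictly diagonally dominant and hence nonsingular, which is one way to see the link between such measures and linear independence of the atoms.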
Sequential Gaussian Processes for Online Learning of Nonstationary Functions
Many machine learning problems can be framed in the context of estimating
functions, and often these are time-dependent functions that are estimated in
real-time as observations arrive. Gaussian processes (GPs) are an attractive
choice for modeling real-valued nonlinear functions due to their flexibility
and uncertainty quantification. However, the typical GP regression model
suffers from several drawbacks: i) Conventional GP inference scales as $O(N^3)$
with respect to the number of observations; ii) updating a GP model
sequentially is not trivial; and iii) covariance kernels often enforce
stationarity constraints on the function, while GPs with non-stationary
covariance kernels are often intractable to use in practice. To overcome these
issues, we propose an online sequential Monte Carlo algorithm to fit mixtures
of GPs that capture non-stationary behavior while allowing for fast,
distributed inference. By formulating hyperparameter optimization as a
multi-armed bandit problem, we accelerate mixing for real time inference. Our
approach empirically improves performance over state-of-the-art methods for
online GP estimation in the context of prediction for simulated non-stationary
data and hospital time series data.
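The cost of conventional GP inference comes from factorizing the N x N kernel matrix. The sketch below is a plain batch GP posterior mean, not the sequential Monte Carlo mixture proposed in the abstract; it assumes a stationary squared-exponential kernel, scalar inputs, and a small fixed noise term. All names are hypothetical.

```python
import math


def rbf(a, b, length=1.0):
    """Stationary squared-exponential covariance."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def cholesky(A):
    """Lower-triangular L with L L^T = A; the triple loop is the cubic-cost step."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior_mean(xs, ys, x_new, noise=1e-6, length=1.0):
    """Posterior mean k(x_new, X) (K + noise * I)^{-1} y of a zero-mean GP."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve_cholesky(cholesky(K), ys)
    return sum(a * rbf(x, x_new, length) for a, x in zip(alpha, xs))
```

Every new observation changes the kernel matrix, so a naive sequential version would repeat the factorization at each step, which is the difficulty that online approaches try to avoid.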
Extension of Wirtinger's Calculus to Reproducing Kernel Hilbert Spaces and the Complex Kernel LMS
Over the last decade, kernel methods for nonlinear processing have
successfully been used in the machine learning community. The primary
mathematical tool employed in these methods is the notion of the Reproducing
Kernel Hilbert Space. However, so far, the emphasis has been on batch
techniques. Only recently have online techniques been considered in
the context of adaptive signal processing tasks. Moreover, these efforts have
focused only on real-valued data sequences. To the best of our knowledge,
no adaptive kernel-based strategy has yet been developed for complex-valued
signals. Furthermore, although real reproducing kernels are used in
an increasing number of machine learning problems, complex kernels have not
yet been used, in spite of their potential interest in applications that deal
with complex signals, with communications being a typical example. In this
paper, we present a general framework to attack the problem of adaptive
filtering of complex signals, using either real reproducing kernels, taking
advantage of a technique called complexification of real RKHSs, or
complex reproducing kernels, highlighting the use of the complex Gaussian
kernel. In order to derive gradients of operators that need to be defined on
the associated complex RKHSs, we employ the powerful tool of Wirtinger's
Calculus, which has recently attracted attention in the signal processing
community. To this end, we extend Wirtinger's calculus, for the first time,
to complex RKHSs and use it to derive
several realizations of the Complex Kernel Least-Mean-Square (CKLMS) algorithm.
Experiments verify that the CKLMS offers significant performance improvements
over several linear and nonlinear algorithms, when dealing with nonlinearities.
Comment: 15 pages (double column), preprint of article accepted in IEEE Trans.
Sig. Pro
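A kernel LMS filter for complex-valued signals can be sketched in the spirit of the real-kernel route mentioned above. This is an illustrative sketch only, not one of the paper's CKLMS realizations: it assumes a real Gaussian kernel evaluated on the real and imaginary parts of the input, complex expansion coefficients, and a fixed step size. The class and parameter names are hypothetical.

```python
import math


def real_gaussian_kernel(z, w, width=1.0):
    """Real Gaussian kernel on complex inputs.

    |z - w|^2 equals the squared Euclidean distance between the points
    (Re z, Im z) and (Re w, Im w) in the plane.
    """
    return math.exp(-(abs(z - w) ** 2) / (2.0 * width ** 2))


class ComplexKernelLMS:
    """Filter output f(z) = sum_k coeffs[k] * kernel(centers[k], z), complex coeffs."""

    def __init__(self, step=0.5, kernel=real_gaussian_kernel):
        self.step = step      # LMS step size (assumed constant)
        self.kernel = kernel
        self.centers = []
        self.coeffs = []

    def predict(self, z):
        return sum((a * self.kernel(c, z)
                    for a, c in zip(self.coeffs, self.centers)), 0j)

    def update(self, z, d):
        # A priori error between the desired response d and the current output.
        error = d - self.predict(z)
        # LMS-style step: the new input becomes a center weighted by the error.
        self.centers.append(z)
        self.coeffs.append(self.step * error)
        return error
```

Because the kernel here is real, the real and imaginary parts of the output are adapted by the same rule with a shared set of centers.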