303 research outputs found
Analyzing sparse dictionaries for online learning with kernels
Many signal processing and machine learning methods share essentially the same linear-in-the-parameters model, with as many parameters as available samples, as in kernel-based machines. Sparse approximation is essential in many disciplines, and new challenges are emerging in online learning with kernels. To this end, several sparsity measures have been proposed in the literature to quantify sparse dictionaries and to construct relevant ones, the most prominent being the distance, the approximation, the coherence and the Babel measures. In this paper, we analyze sparse dictionaries based on these measures. By conducting an eigenvalue analysis, we show that these sparsity measures share many properties, including guaranteeing the linear independence of the atoms and inducing a well-posed optimization problem. Furthermore, we prove that there exists a quasi-isometry between the parameter (i.e., dual) space and the dictionary's induced feature space.
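As a rough illustration of two of these measures, the sketch below evaluates the coherence and a cumulative Babel-style similarity of a kernel dictionary in the induced feature space. The Gaussian kernel, the bandwidth sigma, and all function names are illustrative assumptions, not the paper's notation.

import numpy as np

def k_gauss(X, sigma=1.0):
    """Gaussian kernel matrix of the dictionary atoms (rows of X)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def similarities(X, sigma=1.0):
    """Absolute cosine similarities between distinct atoms in the
    kernel-induced feature space."""
    G = k_gauss(X, sigma)
    d = np.sqrt(np.diag(G))
    C = np.abs(G) / np.outer(d, d)
    np.fill_diagonal(C, 0.0)   # exclude self-similarity
    return C

def coherence(X, sigma=1.0):
    """Largest pairwise similarity between distinct atoms."""
    return similarities(X, sigma).max()

def babel(X, sigma=1.0):
    """Worst-case total similarity of one atom to all the others
    (a cumulative, Babel-style quantity; for two atoms it reduces
    to the coherence)."""
    return similarities(X, sigma).sum(axis=1).max()

A dictionary built by an online sparsification rule would typically be accepted only while, say, coherence(X) stays below a chosen threshold.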
Sparse Online Learning with Kernels using Random Features for Estimating Nonlinear Dynamic Graphs
Online topology estimation of graph-connected time series is challenging in practice, particularly because the dependencies between the time series in many real-world scenarios are nonlinear. To address this challenge, we introduce a novel kernel-based algorithm for online graph topology estimation. Our proposed algorithm also performs a Fourier-based random feature approximation to tackle the curse of dimensionality associated with kernel representations. Exploiting the fact that real-world networks often exhibit sparse topologies, we propose a group-Lasso based optimization framework, which is solved using an iterative composite objective mirror descent method, yielding an online algorithm with fixed computational complexity per iteration. We provide theoretical guarantees for our algorithm and prove that it can achieve sublinear dynamic regret under certain reasonable assumptions. In experiments conducted on both real and synthetic data, our method outperforms existing state-of-the-art competitors.
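The sketch below illustrates the two ingredients the abstract names, random Fourier features and a composite (COMID-style) update with group-lasso shrinkage, for scalar time series. The dimensions, step size eta, penalty lam, and all function names are illustrative assumptions rather than the paper's exact formulation.

import numpy as np

rng = np.random.default_rng(0)

def rff(x, W, b):
    """Random Fourier features approximating a Gaussian kernel."""
    return np.sqrt(2.0 / W.shape[0]) * np.cos(W @ x + b)

def group_shrink(theta_i, tau):
    """Group-lasso proximal step: shrink each per-node block as a whole."""
    norms = np.linalg.norm(theta_i, axis=1, keepdims=True)
    return np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12)) * theta_i

# Hypothetical setup: N scalar time series, D random features per node.
N, D, eta, lam = 5, 50, 0.1, 0.01
W = rng.normal(size=(D, 1))                 # frequencies (unit bandwidth)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)   # phases
theta = np.zeros((N, N, D))                 # theta[i, j]: effect of j on i

def comid_step(x_prev, x_now):
    """One COMID-style iteration: a gradient step on the squared
    one-step prediction error, then group-lasso shrinkage."""
    Z = np.stack([rff(x_prev[j:j + 1], W, b) for j in range(N)])  # (N, D)
    for i in range(N):
        err = np.sum(theta[i] * Z) - x_now[i]   # prediction residual
        theta[i] = group_shrink(theta[i] - eta * err * Z, eta * lam)

After streaming consecutive samples through comid_step, an estimated edge from node j to node i corresponds to a block theta[i, j] whose norm remains nonzero.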
Efficient online learning with kernels for adversarial large scale problems
We are interested in a framework of online learning with kernels for low-dimensional, but large-scale and potentially adversarial, datasets. We study the computational and theoretical performance of online variations of kernel Ridge regression. Despite its simplicity, the algorithm we study is the first to achieve the optimal regret for a wide range of kernels with a per-round complexity of order $n^\alpha$ with $\alpha < 2$. The algorithm we consider is based on approximating the kernel with the linear span of basis functions. Our contribution is two-fold: 1) For the Gaussian kernel, we propose to build the basis beforehand (independently of the data) through Taylor expansion. For $d$-dimensional inputs, we provide a (close to) optimal regret of order $O((\log n)^{d+1})$ with per-round time and space complexity $O((\log n)^{2d})$. This makes the algorithm a suitable choice as soon as $n \gg e^d$, which is likely to happen in scenarios with small-dimensional and large-scale datasets; 2) For general kernels with low effective dimension, the basis functions are updated sequentially in a data-adaptive fashion by sampling Nyström points. In this case, our algorithm improves the computational trade-off known for online kernel regression.
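A minimal sketch of the second ingredient, online ridge regression restricted to the span of Nyström features, is given below. It fixes the Nyström points up front rather than sampling them adaptively as the paper does, and the class and parameter names are illustrative.

import numpy as np

def k_gauss(X, Y, sigma=1.0):
    """Gaussian kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

class OnlineNystromRidge:
    """Online ridge regression in the span of Nystrom features.
    The Nystrom points S are fixed here for brevity; the paper
    updates the basis sequentially in a data-adaptive fashion."""
    def __init__(self, S, lam=1.0, sigma=1.0):
        self.S, self.sigma = S, sigma
        Kss = k_gauss(S, S, sigma) + 1e-10 * np.eye(len(S))
        evals, evecs = np.linalg.eigh(Kss)
        self.T = (evecs / np.sqrt(evals)) @ evecs.T   # Kss^{-1/2}
        self.A = lam * np.eye(len(S))                 # regularized Gram
        self.b = np.zeros(len(S))

    def features(self, x):
        # Approximate feature map: Kss^{-1/2} k(S, x).
        return self.T @ k_gauss(self.S, x[None, :], self.sigma)[:, 0]

    def predict(self, x):
        return self.features(x) @ np.linalg.solve(self.A, self.b)

    def update(self, x, y):
        z = self.features(x)
        self.A += np.outer(z, z)
        self.b += y * z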
Entropy of Overcomplete Kernel Dictionaries
In signal analysis and synthesis, linear approximation theory considers a linear decomposition of any given signal over a set of atoms, collected into a so-called dictionary. Relevant sparse representations are obtained by relaxing the orthogonality condition of the atoms, yielding overcomplete dictionaries with an extended number of atoms. More generally than the linear decomposition, overcomplete kernel dictionaries provide an elegant nonlinear extension by defining the atoms through a mapping kernel function (e.g., the Gaussian kernel). Models based on such kernel dictionaries are used in neural networks, Gaussian processes and online learning with kernels.
The quality of an overcomplete dictionary is typically evaluated with a diversity measure, such as the distance, the approximation, the coherence or the Babel measures. In this paper, we develop a framework to examine overcomplete kernel dictionaries with the entropy from information theory. Indeed, a higher value of the entropy is associated with a more uniform spread of the atoms over the space. For each of the aforementioned diversity measures, we derive lower bounds on the entropy. Several definitions of the entropy are examined, with an extensive analysis in both the input space and the mapped feature space.
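As a concrete instance of one such definition, the sketch below evaluates a Parzen-window estimate of the quadratic Rényi entropy of the atoms. The Gaussian window and the omitted normalization constant (which only shifts the estimate by an additive term) are simplifying assumptions.

import numpy as np

def quadratic_renyi_entropy(X, sigma=1.0):
    """Parzen-window estimate of the quadratic Renyi entropy of the
    atoms x_1..x_m:  H2 = -log( (1/m^2) sum_ij k(x_i, x_j) ).
    The Parzen normalization constant is dropped, shifting the value
    by an additive constant only.  Larger values correspond to atoms
    spread more uniformly over the space."""
    m = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian Parzen kernel
    return -np.log(K.sum() / m ** 2)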
Online Learning with Multiple Operator-valued Kernels
We consider the problem of learning a vector-valued function f in an online learning setting. The function f is assumed to lie in a reproducing kernel Hilbert space associated with an operator-valued kernel. We describe two online algorithms for learning f while taking into account the output structure. A first contribution is an algorithm, ONORMA, that extends the standard kernel-based online learning algorithm NORMA from the scalar-valued to the operator-valued setting. We report a cumulative error bound that holds both for classification and regression. We then define a second algorithm, MONORMA, which addresses the limitation of pre-defining the output structure in ONORMA by sequentially learning a linear combination of operator-valued kernels. Our experiments show that the proposed algorithms achieve good performance results with low computational cost.
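A minimal sketch of a NORMA-style update extended to vector-valued outputs is given below. It assumes the squared loss and the separable kernel K(x, x') = k(x, x') I, a special case chosen for brevity (ONORMA handles general operator-valued kernels), and the names are illustrative rather than the authors' implementation.

import numpy as np

def k_gauss(x, y, sigma=1.0):
    """Scalar Gaussian kernel between two input vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

class VectorNORMA:
    """NORMA-style online updates for vector-valued outputs, using the
    separable kernel K(x, x') = k(x, x') I and the squared loss."""
    def __init__(self, eta=0.1, lam=0.01, sigma=1.0):
        self.eta, self.lam, self.sigma = eta, lam, sigma
        self.X, self.C = [], []   # support points and vector coefficients

    def predict(self, x, dim):
        f = np.zeros(dim)
        for xs, c in zip(self.X, self.C):
            f += k_gauss(x, xs, self.sigma) * c
        return f

    def update(self, x, y):
        # Functional gradient of the squared loss at (x, y); the factor
        # (1 - eta * lam) implements the regularization-driven decay.
        residual = self.predict(x, len(y)) - y
        self.C = [(1.0 - self.eta * self.lam) * c for c in self.C]
        self.X.append(np.asarray(x))
        self.C.append(-self.eta * residual)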