Analyzing sparse dictionaries for online learning with kernels
Many signal processing and machine learning methods share essentially the same linear-in-the-parameters model, with as many parameters as available samples, as in kernel-based machines. Sparse approximation is essential in many disciplines, with new challenges emerging in online learning with kernels. To this end, several sparsity measures have been proposed in the literature to quantify sparse dictionaries and to construct relevant ones, the most prominent being the distance, the approximation, the coherence and the Babel measures. In this paper, we analyze sparse dictionaries based on these measures. By conducting an eigenvalue analysis, we show that these sparsity measures share many properties, including linear independence of the atoms and well-posedness of the induced optimization problem. Furthermore, we prove that there exists a quasi-isometry between the parameter (i.e., dual) space and the dictionary's induced feature space.
Comment: 10 pages
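For concreteness, here is a minimal numerical sketch (not taken from the paper) of two of these measures: the coherence, i.e., the largest off-diagonal entry of the dictionary's Gram matrix in magnitude, and the Babel measure of order p, its cumulative counterpart. It assumes a Gaussian kernel, whose atoms are unit-norm in feature space; the function names and the toy dictionary are illustrative.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel kappa(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * bandwidth ** 2))

def coherence(dictionary, kernel=gaussian_kernel):
    """Largest off-diagonal Gram entry in magnitude: mu = max_{i != j} |kappa(x_i, x_j)|."""
    gram = np.array([[kernel(xi, xj) for xj in dictionary] for xi in dictionary])
    off_diag = np.abs(gram - np.diag(np.diag(gram)))
    return off_diag.max()

def babel(dictionary, p, kernel=gaussian_kernel):
    """Babel measure of order p: the maximum over atoms of the sum of its
    p largest coherences with the other atoms."""
    gram = np.abs(np.array([[kernel(xi, xj) for xj in dictionary] for xi in dictionary]))
    np.fill_diagonal(gram, 0.0)
    # For each atom, sum its p largest correlations with the other atoms.
    return np.sort(gram, axis=1)[:, -p:].sum(axis=1).max()

# Toy dictionary of one-dimensional samples
atoms = [np.array([0.0]), np.array([1.5]), np.array([3.0])]
print(coherence(atoms), babel(atoms, p=2))
```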
Improving Sparsity in Kernel Adaptive Filters Using a Unit-Norm Dictionary
Kernel adaptive filters, a class of adaptive nonlinear time-series models, are known for their ability to learn expressive autoregressive patterns from sequential data. However, even for trivial monotonic signals, they struggle to make accurate predictions while keeping computational complexity within desired bounds. This is because new observations are incorporated into the dictionary whenever they are far from what the algorithm has seen in the past. We propose a novel approach to kernel adaptive filtering that compares new observations against dictionary samples in terms of their unit-norm (normalised) versions, so that new observations that look like previous samples but have a different magnitude are not added to the dictionary. We achieve this by proposing the unit-norm Gaussian kernel and by defining a sparsification criterion for this novel kernel. The new methodology is validated on two real-world datasets against standard kernel adaptive filters in terms of the normalised mean square error and the dictionary size.
Comment: Accepted at the IEEE Digital Signal Processing conference 201
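A rough sketch of the idea follows (the paper's exact kernel definition and criterion may differ): evaluate a Gaussian kernel on the normalised inputs, so that samples differing only in magnitude look identical, and admit a sample to the dictionary only if it is sufficiently novel under that kernel. The function names and the threshold value are illustrative.

```python
import numpy as np

def unit_norm_gaussian_kernel(x, y, bandwidth=1.0):
    # Hypothetical form: a Gaussian kernel evaluated on the unit-norm
    # versions of the inputs, so that samples differing only in
    # magnitude appear identical to the kernel.
    xn = x / (np.linalg.norm(x) + 1e-12)
    yn = y / (np.linalg.norm(y) + 1e-12)
    return np.exp(-np.sum((xn - yn) ** 2) / (2.0 * bandwidth ** 2))

def admit_to_dictionary(x, dictionary, threshold=0.95,
                        kernel=unit_norm_gaussian_kernel):
    """Coherence-style criterion: admit x only if its unit-norm version is
    sufficiently dissimilar from every stored sample."""
    return all(kernel(x, d) < threshold for d in dictionary)

dictionary = [np.array([1.0, 2.0])]
x_new = np.array([2.0, 4.0])          # same direction, larger magnitude
if admit_to_dictionary(x_new, dictionary):
    dictionary.append(x_new)
print(len(dictionary))                 # stays at 1: x_new is not added
```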
Stochastic Behavior Analysis of the Gaussian Kernel Least-Mean-Square Algorithm
The kernel least-mean-square (KLMS) algorithm is a popular algorithm in nonlinear adaptive filtering due to its
simplicity and robustness. In kernel adaptive filters, the statistics of the input to the linear filter depends on the parameters of the kernel employed. Moreover, practical implementations require a finite nonlinearity model order. A Gaussian KLMS has two design parameters, the step size and the Gaussian kernel bandwidth. Thus, its design requires analytical models for the algorithm behavior as a function of these two parameters. This paper studies the steady-state behavior and the transient behavior of the
Gaussian KLMS algorithm for Gaussian inputs and a finite order nonlinearity model. In particular, we derive recursive expressions for the mean-weight-error vector and the mean-square-error. The model predictions show excellent agreement with Monte Carlo simulations in transient and steady state. This allows the explicit analytical determination of stability limits, and gives opportunity
to choose the algorithm parameters a priori in order to achieve prescribed convergence speed and quality of the estimate. Design examples are presented which validate the theoretical analysis and illustrates its application
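For reference, here is a minimal sketch of the standard Gaussian KLMS recursion that such an analysis targets, with its two design parameters, the step size and the kernel bandwidth. This naive version lets the kernel expansion grow by one term per sample (no sparsification); the test signal and parameter values are placeholders.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * bandwidth ** 2))

def klms(inputs, desired, step_size=0.1, bandwidth=0.5):
    """Gaussian kernel least-mean-square: the filter output is a kernel
    expansion over past inputs, and each new error adds one term with
    coefficient step_size * error."""
    centers, coeffs, errors = [], [], []
    for x, d in zip(inputs, desired):
        y = sum(a * gaussian_kernel(x, c, bandwidth)
                for a, c in zip(coeffs, centers))
        e = d - y
        centers.append(x)
        coeffs.append(step_size * e)
        errors.append(e)
    return np.array(errors)

# Example: one-step prediction of a noisy nonlinear time series
rng = np.random.default_rng(0)
t = np.arange(400)
s = np.sin(0.05 * t) ** 3 + 0.01 * rng.standard_normal(len(t))
X = np.array([s[i:i + 5] for i in range(len(s) - 5)])   # length-5 regressors
d = s[5:]
err = klms(X, d)
print("steady-state MSE ~", np.mean(err[-100:] ** 2))
```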
Adaptation and learning over networks for nonlinear system modeling
In this chapter, we analyze nonlinear filtering problems in distributed
environments, e.g., sensor networks or peer-to-peer protocols. In these
scenarios, the agents in the environment receive measurements in a streaming
fashion, and they are required to estimate a common (nonlinear) model by
alternating local computations and communications with their neighbors. We
focus on the important distinction between single-task problems, where the
underlying model is common to all agents, and multitask problems, where each
agent might converge to a different model due to, e.g., spatial dependencies or
other factors. Currently, most of the literature on distributed learning in the
nonlinear case has focused on the single-task case, which may be a strong
limitation in real-world scenarios. After introducing the problem and reviewing
the existing approaches, we describe a simple kernel-based algorithm tailored
for the multitask case. We evaluate the proposal on a simulated benchmark task,
and we conclude by detailing currently open problems and lines of research.
Comment: To be published as a chapter in 'Adaptive Learning Methods for Nonlinear System Modeling', Elsevier Publishing, Eds. D. Comminiello and J.C. Principe (2018).
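As a rough single-task illustration of the adapt-then-combine diffusion pattern such algorithms build on (not the chapter's multitask algorithm), here is a sketch in which every agent runs a kernel LMS step on a fixed, shared dictionary D and then averages its coefficients with its neighbours' through a combination matrix A. All names, and the fixed-dictionary simplification, are our assumptions.

```python
import numpy as np

def gram(X, D, bw=0.5):
    """Gaussian kernel evaluations of the rows of X against a fixed dictionary D."""
    d2 = ((X[:, None, :] - D[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bw ** 2))

def diffusion_klms(streams, targets, D, A, mu=0.2, bw=0.5):
    """Adapt-then-combine diffusion: at each time step, every agent k takes
    a kernel LMS step on its own (x, d) pair, then averages its coefficient
    vector with its neighbours' through the combination matrix A."""
    n_agents = len(streams)
    W = np.zeros((n_agents, len(D)))              # one coefficient vector per agent
    for t in range(len(streams[0])):
        psi = np.empty_like(W)
        for k in range(n_agents):
            h = gram(streams[k][t][None, :], D, bw)[0]
            e = targets[k][t] - W[k] @ h          # local prediction error
            psi[k] = W[k] + mu * e * h            # adaptation step
        W = A @ psi                               # combination step
    return W

# Toy network: 3 fully connected agents estimating the same nonlinear map
rng = np.random.default_rng(0)
D = rng.uniform(-1, 1, size=(20, 2))              # shared dictionary of 20 atoms
A = np.full((3, 3), 0.25) + 0.25 * np.eye(3)      # doubly stochastic combiner
streams = [rng.uniform(-1, 1, size=(300, 2)) for _ in range(3)]
targets = [np.sin(s[:, 0] * s[:, 1]) for s in streams]
W = diffusion_klms(streams, targets, D, A)
```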
Entropy of Overcomplete Kernel Dictionaries
In signal analysis and synthesis, linear approximation theory considers a linear decomposition of any given signal over a set of atoms, collected into a so-called dictionary. Relevant sparse representations are obtained by relaxing the orthogonality condition on the atoms, yielding overcomplete dictionaries with an extended number of atoms. More generally than the linear decomposition, overcomplete kernel dictionaries provide an elegant nonlinear extension by defining the atoms through a kernel mapping (e.g., the Gaussian kernel). Models based on such kernel dictionaries are used in neural networks, Gaussian processes and online learning with kernels.
The quality of an overcomplete dictionary is evaluated with a diversity measure, such as the distance, the approximation, the coherence and the Babel measures. In this paper, we develop a framework to examine overcomplete kernel dictionaries with the entropy from information theory. Indeed, a higher value of the entropy is associated with a more uniform spread of the atoms over the space. For each of the aforementioned diversity measures, we derive lower bounds on the entropy. Several definitions of the entropy are examined, with an extensive analysis in both the input space and the mapped feature space.
Comment: 10 pages
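The paper's entropy definitions are not reproduced here; as one concrete instance of the intuition, the following sketch uses the Parzen-window estimate of the quadratic Renyi entropy, which grows as the atoms spread out. The Gaussian kernel, its bandwidth, and the toy data are assumptions.

```python
import numpy as np

def quadratic_renyi_entropy(atoms, bandwidth=1.0):
    """Parzen-window estimate of the quadratic Renyi entropy of the atoms:
    H_2 = -log( (1/n^2) * sum_{i,j} kappa(x_i, x_j) ).
    Larger values correspond to atoms spread more uniformly over the space."""
    X = np.asarray(atoms)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return -np.log(K.mean())

tight = np.random.default_rng(0).normal(0.0, 0.1, size=(50, 2))
spread = np.random.default_rng(0).uniform(-3.0, 3.0, size=(50, 2))
print(quadratic_renyi_entropy(tight) < quadratic_renyi_entropy(spread))  # True
```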
Approximation errors of online sparsification criteria
Many machine learning frameworks, such as resource-allocating networks, kernel-based methods, Gaussian processes, and radial-basis-function networks, require a sparsification scheme in order to address the online learning paradigm. For this purpose, several online sparsification criteria have been proposed to restrict the model definition to a subset of samples. The best-known criterion is the (linear) approximation criterion, which discards any sample that can be well represented by the already contributing samples, an operation with excessive computational complexity. Several computationally efficient sparsification criteria have been introduced in the literature, such as the distance, the coherence and the Babel criteria. In this paper, we provide a framework that connects these sparsification criteria to the issue of approximating samples, by deriving theoretical bounds on the approximation errors. Moreover, we investigate the error of approximating any feature, by proposing upper bounds on the approximation error for each of the aforementioned sparsification criteria. Two classes of features are described in detail, the empirical mean and the principal axes in kernel principal component analysis.
Comment: 10 pages
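For reference, the (linear) approximation criterion mentioned above amounts to thresholding the residual of the best feature-space approximation of a new sample, which has a closed form in terms of the dictionary's Gram matrix. A minimal sketch, assuming a Gaussian kernel and adding a small ridge term for numerical stability:

```python
import numpy as np

def gaussian_gram(X, Y, bw=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bw ** 2))

def approximation_error(x, dictionary, bw=1.0, ridge=1e-10):
    """Squared residual of the best linear approximation of phi(x) by the
    dictionary atoms in feature space:
        min_beta ||phi(x) - sum_i beta_i phi(x_i)||^2
      = kappa(x, x) - k^T (K + ridge*I)^(-1) k,
    with K the dictionary Gram matrix and k the kernel vector of x."""
    D = np.asarray(dictionary)
    K = gaussian_gram(D, D, bw) + ridge * np.eye(len(D))
    k = gaussian_gram(np.asarray(x)[None, :], D, bw)[0]
    return 1.0 - k @ np.linalg.solve(K, k)  # kappa(x, x) = 1 for the Gaussian kernel

D = [np.array([0.0]), np.array([2.0])]
print(approximation_error(np.array([0.1]), D))  # small: x is near an atom
print(approximation_error(np.array([5.0]), D))  # close to 1: x is far away
```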
Performance Analysis of Kernel Adaptive Filters based on LMS Algorithm
The design of adaptive nonlinear filters has sparked great interest in the machine learning community. This paper presents some recent developments in nonlinear adaptive filtering. It provides an in-depth analysis of the performance and complexity of a class of kernel filters based on the least-mean-squares algorithm. A key feature underlying kernel algorithms is that they map the data into a high-dimensional feature space where linear filtering is performed. The arithmetic operations are carried out in the initial space via evaluation of inner products between pairs of input patterns, known as kernels. The SNR improvement and the convergence speed of kernel-based least-mean-squares filters are evaluated on two types of applications: time-series prediction and the extraction of cardiac artifacts from magnetoencephalographic data.
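As a toy illustration (not the paper's experiments) of how such an SNR improvement might be measured, the following sketch compares a kernel LMS filter against a linear LMS baseline on a synthetic time-series prediction task; the signal, parameter values, and metric are placeholders.

```python
import numpy as np

def lms(X, d, mu=0.05):
    """Linear LMS baseline: w <- w + mu * e * x."""
    w, err = np.zeros(X.shape[1]), []
    for x, dt in zip(X, d):
        e = dt - w @ x
        w += mu * e * x
        err.append(e)
    return np.array(err)

def klms(X, d, mu=0.2, bw=0.5):
    """Gaussian kernel LMS: kernel expansion grown by one term per sample."""
    C, a, err = [], [], []
    for x, dt in zip(X, d):
        k = np.array([np.exp(-np.sum((x - c) ** 2) / (2 * bw ** 2)) for c in C])
        e = dt - (np.dot(a, k) if C else 0.0)
        C.append(x)
        a.append(mu * e)
        err.append(e)
    return np.array(err)

rng = np.random.default_rng(1)
s = np.sin(0.1 * np.arange(600)) ** 3 + 0.02 * rng.standard_normal(600)
X = np.array([s[i:i + 4] for i in range(len(s) - 4)])  # length-4 regressors
d = s[4:]
for name, e in [("LMS", lms(X, d)), ("KLMS", klms(X, d))]:
    mse = np.mean(e[-200:] ** 2)  # steady-state error estimate
    print(name, "SNR improvement:", 10 * np.log10(np.var(d) / mse), "dB")
```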