Analyzing sparse dictionaries for online learning with kernels
Many signal processing and machine learning methods share essentially the
same linear-in-the-parameters model, with as many parameters as available
samples, as in kernel-based machines. Sparse approximation is essential in many
disciplines, with new challenges emerging in online learning with kernels. To
this end, several sparsity measures have been proposed in the literature for
quantifying sparse dictionaries and constructing relevant ones, the most
prominent being the distance, the approximation, the coherence, and the Babel
measures. In this paper, we analyze sparse dictionaries based on these
measures. By conducting an eigenvalue analysis, we show that these sparsity
measures share many properties, including the linear-independence condition and
the induction of a well-posed optimization problem. Furthermore, we prove that
there exists a quasi-isometry between the parameter (i.e., dual) space and the
dictionary's induced feature space.
Comment: 10 pages
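The coherence and Babel measures named in this abstract can be sketched directly from their standard definitions: coherence is the largest kernel value between two distinct atoms, and the Babel measure is the largest cumulative kernel value from one atom to all others. This is a minimal illustration assuming a Gaussian kernel (whose atoms are unit-norm in feature space); function names are illustrative, not the paper's code.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # kappa(x, y) = exp(-||x - y||^2 / (2 sigma^2)); atoms are unit-norm in feature space
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def coherence(dictionary, sigma=1.0):
    # Largest absolute kernel value between two distinct atoms
    n = len(dictionary)
    return max(abs(gaussian_kernel(dictionary[i], dictionary[j], sigma))
               for i in range(n) for j in range(n) if i != j)

def babel(dictionary, sigma=1.0):
    # Babel measure: max over atoms of the cumulative coherence to all other atoms
    n = len(dictionary)
    return max(sum(abs(gaussian_kernel(dictionary[i], dictionary[j], sigma))
                   for j in range(n) if j != i)
               for i in range(n))
```

By construction the Babel measure always dominates the coherence, which is why bounds stated for one often transfer to the other.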
Improving Sparsity in Kernel Adaptive Filters Using a Unit-Norm Dictionary
Kernel adaptive filters, a class of adaptive nonlinear time-series models,
are known for their ability to learn expressive autoregressive patterns from
sequential data. However, for trivial monotonic signals, they struggle to
produce accurate predictions while keeping computational complexity
within desired bounds. This is because new observations are incorporated into
the dictionary whenever they are far from what the algorithm has seen in the past.
We propose a novel approach to kernel adaptive filtering that compares new
observations against dictionary samples in terms of their unit-norm
(normalised) versions, so that new observations that look like previous
samples but have a different magnitude are not added to the dictionary. We
achieve this by proposing the unit-norm Gaussian kernel and defining a
sparsification criterion for this novel kernel. This new methodology is
validated on two real-world datasets against standard kernel adaptive filters
in terms of the normalised mean square error and the dictionary size.
Comment: Accepted at the IEEE Digital Signal Processing conference 201
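The idea described above, comparing normalised versions of samples so that rescaled observations are not redundantly stored, can be sketched as a coherence-style admission rule. This is a hedged reconstruction from the abstract alone: the kernel form, the threshold value, and the function names are assumptions, not the paper's exact criterion.

```python
import numpy as np

def unit_norm_gaussian(x, y, sigma=1.0):
    # Evaluate a Gaussian kernel on the unit-norm versions of x and y,
    # so samples that differ only in magnitude look identical.
    u = x / np.linalg.norm(x)
    v = y / np.linalg.norm(y)
    return np.exp(-np.linalg.norm(u - v) ** 2 / (2 * sigma ** 2))

def maybe_add(dictionary, x, threshold=0.95, sigma=1.0):
    # Coherence-style sparsification: add x only if it is dissimilar, under
    # the unit-norm kernel, to every stored atom (threshold is illustrative).
    if all(unit_norm_gaussian(x, d, sigma) < threshold for d in dictionary):
        dictionary.append(x)
    return dictionary
```

Under this rule, an observation that is a scalar multiple of a stored sample evaluates to a kernel value of 1 and is rejected, which is exactly the magnitude-invariance the abstract motivates.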
Entropy of Overcomplete Kernel Dictionaries
In signal analysis and synthesis, linear approximation theory considers a
linear decomposition of any given signal in a set of atoms, collected into a
so-called dictionary. Relevant sparse representations are obtained by relaxing
the orthogonality condition of the atoms, yielding overcomplete dictionaries
with an extended number of atoms. More generally than the linear decomposition,
overcomplete kernel dictionaries provide an elegant nonlinear extension by
defining the atoms through a mapping kernel function (e.g., the Gaussian
kernel). Models based on such kernel dictionaries are used in neural networks,
Gaussian processes, and online learning with kernels.
The quality of an overcomplete dictionary is evaluated with a diversity
measure, such as the distance, the approximation, the coherence, and the Babel
measures. In this paper, we develop a framework to examine overcomplete kernel
dictionaries with the entropy from information theory. Indeed, a higher value
of the entropy is associated with a more uniform spread of the atoms over the
space. For each of the aforementioned diversity measures, we derive lower
bounds on the entropy. Several definitions of the entropy are examined, with an
extensive analysis in both the input space and the mapped feature space.
Comment: 10 pages
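The link between entropy and the spread of atoms can be illustrated with a standard Parzen-window estimate of the quadratic Rényi entropy; this is a generic sketch of that connection, not the paper's own estimator, and the bandwidth choice is an assumption.

```python
import numpy as np

def quadratic_renyi_entropy(atoms, sigma=1.0):
    # Parzen estimate of the quadratic Renyi entropy:
    #   H2 = -log( (1/n^2) * sum_ij kappa(x_i, x_j) )
    # with a Gaussian kernel of bandwidth sigma*sqrt(2). Higher values
    # indicate atoms spread more uniformly over the space.
    atoms = np.asarray(atoms, dtype=float)
    n = len(atoms)
    sq = np.sum((atoms[:, None, :] - atoms[None, :, :]) ** 2, axis=-1)
    gram = np.exp(-sq / (4 * sigma ** 2))
    return -np.log(gram.sum() / n ** 2)
```

A tightly clustered dictionary makes every Gram entry close to 1, driving the entropy toward 0, while well-separated atoms shrink the off-diagonal entries and raise the entropy, matching the abstract's intuition.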
Approximation errors of online sparsification criteria
Many machine learning frameworks, such as resource-allocating networks,
kernel-based methods, Gaussian processes, and radial-basis-function networks,
require a sparsification scheme in order to address the online learning
paradigm. For this purpose, several online sparsification criteria have been
proposed to restrict the model definition to a subset of samples. The
best-known criterion is the (linear) approximation criterion, which discards any
sample that can be well represented by the already contributing samples, an
operation with excessive computational complexity. Several computationally
efficient sparsification criteria have been introduced in the literature, such
as the distance, the coherence and the Babel criteria. In this paper, we
provide a framework that connects these sparsification criteria to the issue of
approximating samples, by deriving theoretical bounds on the approximation
errors. Moreover, we investigate the error of approximating any feature by
proposing upper bounds on the approximation error for each of the
aforementioned sparsification criteria. Two classes of features are described
in detail: the empirical mean and the principal axes in kernel principal
component analysis.
Comment: 10 pages
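The (linear) approximation criterion discussed in this abstract reduces to a closed-form residual in feature space: the squared distance between a mapped sample and its best linear approximation by the dictionary atoms is kappa(x,x) - k^T K^{-1} k, where K is the dictionary Gram matrix and k the kernel vector of the new sample. A minimal sketch, assuming a Gaussian kernel (for which kappa(x,x) = 1) and a well-conditioned Gram matrix:

```python
import numpy as np

def gaussian_gram(X, Y, sigma=1.0):
    # Pairwise Gaussian kernel matrix between rows of X and rows of Y
    sq = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def approximation_error(dictionary, x, sigma=1.0):
    # Squared feature-space residual of projecting phi(x) onto span{phi(d_j)}:
    #   ||phi(x) - sum_j alpha_j phi(d_j)||^2 = kappa(x, x) - k^T K^{-1} k
    D = np.asarray(dictionary, dtype=float)
    x = np.atleast_2d(x).astype(float)
    K = gaussian_gram(D, D, sigma)
    k = gaussian_gram(D, x, sigma).ravel()
    alpha = np.linalg.solve(K, k)
    return 1.0 - k @ alpha  # kappa(x, x) = 1 for the Gaussian kernel
```

The error is 0 when the sample already lies in the dictionary's feature-space span and approaches 1 as the sample becomes orthogonal to it, which is the quantity the cheaper distance, coherence, and Babel criteria bound.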
A kernel-based embedding framework for high-dimensional data analysis
The world is essentially multidimensional, e.g., neurons, computer networks, Internet traffic, and financial markets. The challenge is to discover and extract information that lies hidden in these high-dimensional datasets to support classification, regression, clustering, and visualization tasks. To this end, dimensionality reduction aims to provide a faithful representation of data in a low-dimensional space. This removes noise and redundant features, which is useful for understanding and visualizing the structure of complex datasets. The focus of this work is the analysis of high-dimensional data to support regression tasks and exploratory data analysis in real-world scenarios. Firstly, we propose an online framework to predict the long-term future behavior of time series. Secondly, we propose a new dimensionality reduction method to preserve the significant structure of high-dimensional data in a low-dimensional space. Lastly, we propose a sparsification strategy based on dimensionality reduction to avoid overfitting and reduce computational complexity in online applications.
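The dimensionality-reduction step described in this abstract can be grounded with the simplest linear baseline, projection onto principal axes; the thesis proposes its own method, so this PCA sketch is only an assumed stand-in for what "a faithful representation in a low-dimensional space" means operationally.

```python
import numpy as np

def pca_embed(X, n_components=2):
    # Center the data and project onto the top principal axes via SVD.
    # A linear baseline for dimensionality reduction; names are illustrative.
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

For data lying near a low-dimensional subspace, the discarded components carry mostly noise, which is the "removes noise and redundant features" effect the abstract refers to.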