42 research outputs found

    Analyzing sparse dictionaries for online learning with kernels

    Full text link
    Many signal processing and machine learning methods share essentially the same linear-in-the-parameter model, with as many parameters as available samples as in kernel-based machines. Sparse approximation is essential in many disciplines, with new challenges emerging in online learning with kernels. To this end, several sparsity measures have been proposed in the literature to quantify sparse dictionaries and constructing relevant ones, the most prolific ones being the distance, the approximation, the coherence and the Babel measures. In this paper, we analyze sparse dictionaries based on these measures. By conducting an eigenvalue analysis, we show that these sparsity measures share many properties, including the linear independence condition and inducing a well-posed optimization problem. Furthermore, we prove that there exists a quasi-isometry between the parameter (i.e., dual) space and the dictionary's induced feature space.Comment: 10 page

    Improving Sparsity in Kernel Adaptive Filters Using a Unit-Norm Dictionary

    Full text link
    Kernel adaptive filters, a class of adaptive nonlinear time-series models, are known by their ability to learn expressive autoregressive patterns from sequential data. However, for trivial monotonic signals, they struggle to perform accurate predictions and at the same time keep computational complexity within desired boundaries. This is because new observations are incorporated to the dictionary when they are far from what the algorithm has seen in the past. We propose a novel approach to kernel adaptive filtering that compares new observations against dictionary samples in terms of their unit-norm (normalised) versions, meaning that new observations that look like previous samples but have a different magnitude are not added to the dictionary. We achieve this by proposing the unit-norm Gaussian kernel and define a sparsification criterion for this novel kernel. This new methodology is validated on two real-world datasets against standard KAF in terms of the normalised mean square error and the dictionary size.Comment: Accepted at the IEEE Digital Signal Processing conference 201

    Stochastic Behavior Analysis of the Gaussian Kernel Least-Mean-Square Algorithm

    Get PDF
    The kernel least-mean-square (KLMS) algorithm is a popular algorithm in nonlinear adaptive filtering due to its simplicity and robustness. In kernel adaptive filters, the statistics of the input to the linear filter depends on the parameters of the kernel employed. Moreover, practical implementations require a finite nonlinearity model order. A Gaussian KLMS has two design parameters, the step size and the Gaussian kernel bandwidth. Thus, its design requires analytical models for the algorithm behavior as a function of these two parameters. This paper studies the steady-state behavior and the transient behavior of the Gaussian KLMS algorithm for Gaussian inputs and a finite order nonlinearity model. In particular, we derive recursive expressions for the mean-weight-error vector and the mean-square-error. The model predictions show excellent agreement with Monte Carlo simulations in transient and steady state. This allows the explicit analytical determination of stability limits, and gives opportunity to choose the algorithm parameters a priori in order to achieve prescribed convergence speed and quality of the estimate. Design examples are presented which validate the theoretical analysis and illustrates its application

    Adaptation and learning over networks for nonlinear system modeling

    Full text link
    In this chapter, we analyze nonlinear filtering problems in distributed environments, e.g., sensor networks or peer-to-peer protocols. In these scenarios, the agents in the environment receive measurements in a streaming fashion, and they are required to estimate a common (nonlinear) model by alternating local computations and communications with their neighbors. We focus on the important distinction between single-task problems, where the underlying model is common to all agents, and multitask problems, where each agent might converge to a different model due to, e.g., spatial dependencies or other factors. Currently, most of the literature on distributed learning in the nonlinear case has focused on the single-task case, which may be a strong limitation in real-world scenarios. After introducing the problem and reviewing the existing approaches, we describe a simple kernel-based algorithm tailored for the multitask case. We evaluate the proposal on a simulated benchmark task, and we conclude by detailing currently open problems and lines of research.Comment: To be published as a chapter in `Adaptive Learning Methods for Nonlinear System Modeling', Elsevier Publishing, Eds. D. Comminiello and J.C. Principe (2018

    Entropy of Overcomplete Kernel Dictionaries

    Full text link
    In signal analysis and synthesis, linear approximation theory considers a linear decomposition of any given signal in a set of atoms, collected into a so-called dictionary. Relevant sparse representations are obtained by relaxing the orthogonality condition of the atoms, yielding overcomplete dictionaries with an extended number of atoms. More generally than the linear decomposition, overcomplete kernel dictionaries provide an elegant nonlinear extension by defining the atoms through a mapping kernel function (e.g., the gaussian kernel). Models based on such kernel dictionaries are used in neural networks, gaussian processes and online learning with kernels. The quality of an overcomplete dictionary is evaluated with a diversity measure the distance, the approximation, the coherence and the Babel measures. In this paper, we develop a framework to examine overcomplete kernel dictionaries with the entropy from information theory. Indeed, a higher value of the entropy is associated to a further uniform spread of the atoms over the space. For each of the aforementioned diversity measures, we derive lower bounds on the entropy. Several definitions of the entropy are examined, with an extensive analysis in both the input space and the mapped feature space.Comment: 10 page

    Approximation errors of online sparsification criteria

    Full text link
    Many machine learning frameworks, such as resource-allocating networks, kernel-based methods, Gaussian processes, and radial-basis-function networks, require a sparsification scheme in order to address the online learning paradigm. For this purpose, several online sparsification criteria have been proposed to restrict the model definition on a subset of samples. The most known criterion is the (linear) approximation criterion, which discards any sample that can be well represented by the already contributing samples, an operation with excessive computational complexity. Several computationally efficient sparsification criteria have been introduced in the literature, such as the distance, the coherence and the Babel criteria. In this paper, we provide a framework that connects these sparsification criteria to the issue of approximating samples, by deriving theoretical bounds on the approximation errors. Moreover, we investigate the error of approximating any feature, by proposing upper-bounds on the approximation error for each of the aforementioned sparsification criteria. Two classes of features are described in detail, the empirical mean and the principal axes in the kernel principal component analysis.Comment: 10 page

    Performance Analysis of Kernel Adaptive Filters based on LMS Algorithm

    Get PDF
    AbstractThe design of adaptive nonlinear filters has sparked a great interest in the machine learning community. The present paper aims to present some recent developments in nonlinear adaptive filtering. It provides an in-depth analysis of the performance and complexity of a class of kernel filters based on the least-mean-squares algorithm. A key feature that underlies kernel algorithms is that they map the data in a high-dimensional feature space where linear filtering is performed. The arithmetic operations are carried out in the initial space via evaluation of inner products between pairs of input patterns called kernels. The SNR improvement and the convergence speed of kernel-based least-mean-squares filters are evaluated on two types of applications: time series prediction and cardiac artifacts extraction from magnetoencephalographic data
    corecore