5 research outputs found

    Analyzing sparse dictionaries for online learning with kernels

    Many signal processing and machine learning methods share essentially the same linear-in-the-parameters model, with as many parameters as available samples, as in kernel-based machines. Sparse approximation is essential in many disciplines, and new challenges emerge in online learning with kernels. To this end, several sparsity measures have been proposed in the literature to quantify sparse dictionaries and to construct relevant ones, the most prominent being the distance, approximation, coherence, and Babel measures. In this paper, we analyze sparse dictionaries based on these measures. By conducting an eigenvalue analysis, we show that these sparsity measures share many properties, including guaranteeing the linear independence of the dictionary atoms and inducing a well-posed optimization problem. Furthermore, we prove that there exists a quasi-isometry between the parameter (i.e., dual) space and the dictionary's induced feature space.
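    The eigenvalue argument is straightforward to illustrate numerically. The following is a minimal sketch (not the paper's code), assuming a unit-norm Gaussian kernel: the coherence is read off the Gram matrix of a toy dictionary, and a Gershgorin-type interval brackets the Gram eigenvalues, so a strictly positive lower bound certifies linear independence of the atoms and hence a well-posed problem.

```python
import numpy as np

def gaussian_gram(X, bandwidth=1.0):
    """Gram matrix of the (unit-norm) Gaussian kernel."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def coherence(K):
    """Coherence: largest off-diagonal entry (in magnitude) of the Gram matrix."""
    off = K - np.diag(np.diag(K))
    return np.abs(off).max()

rng = np.random.default_rng(0)
D = rng.normal(size=(20, 2))          # a toy dictionary of 20 atoms
K = gaussian_gram(D, bandwidth=0.5)

mu = coherence(K)
m = K.shape[0]
eigvals = np.linalg.eigvalsh(K)
# Gershgorin-style bounds: every eigenvalue of K lies within
# [1 - (m-1)*mu, 1 + (m-1)*mu]; a positive lower bound certifies that
# the dictionary atoms are linearly independent in feature space.
print(f"coherence mu = {mu:.3f}")
print(f"eigenvalues in [{eigvals.min():.3f}, {eigvals.max():.3f}]; "
      f"Gershgorin interval [{1 - (m - 1) * mu:.3f}, {1 + (m - 1) * mu:.3f}]")
```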

    Approximation errors of online sparsification criteria

    Many machine learning frameworks, such as resource-allocating networks, kernel-based methods, Gaussian processes, and radial-basis-function networks, require a sparsification scheme in order to address the online learning paradigm. For this purpose, several online sparsification criteria have been proposed to restrict the model definition to a subset of samples. The best-known criterion is the (linear) approximation criterion, which discards any sample that can be well represented by the already contributing samples, an operation with excessive computational complexity. Several computationally efficient sparsification criteria have been introduced in the literature, such as the distance, coherence, and Babel criteria. In this paper, we provide a framework that connects these sparsification criteria to the problem of approximating samples, by deriving theoretical bounds on the approximation errors. Moreover, we investigate the error of approximating any feature, by proposing upper bounds on the approximation error for each of the aforementioned sparsification criteria. Two classes of features are described in detail: the empirical mean and the principal axes in kernel principal component analysis.
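    For concreteness, here is a sketch of the cheapest of these rules, the coherence criterion; the Gaussian kernel and the threshold mu0 = 0.5 are illustrative choices, not values prescribed by the paper. Each incoming sample is tested against the current dictionary with one kernel evaluation per atom, whereas the approximation criterion would require solving a least-squares problem per sample.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * bandwidth ** 2))

def coherence_sparsify(stream, mu0=0.5, bandwidth=1.0):
    """Online coherence criterion: a new sample joins the dictionary only
    if its kernel similarity to every current atom stays at or below mu0."""
    dictionary = []
    for x in stream:
        if all(gaussian_kernel(x, d, bandwidth) <= mu0 for d in dictionary):
            dictionary.append(x)
    return dictionary

rng = np.random.default_rng(1)
stream = rng.normal(size=(500, 2))    # a toy sample stream
D = coherence_sparsify(stream, mu0=0.5, bandwidth=0.7)
print(f"{len(D)} atoms retained out of {len(stream)} samples")
```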

    Variable Weight Kernel Density Estimation

    Nonparametric density estimation is a common and important task in many machine learning problems. It consists in estimating a density function from available observations without making parametric assumptions about the generating distribution. Kernel means are nonparametric estimators composed of the average of simple functions, called kernels, centered at each data point. This work studies relatives of these kernel means that share their structure but assign different weights to each kernel unit in order to attain certain desired characteristics. In particular, we present a sparse kernel mean estimator and a consistent kernel density estimator with a fixed bandwidth parameter.

    First, regarding kernel means, we study the kernel density estimator (KDE) and the kernel mean embedding. These are frequently used to represent probability distributions; unfortunately, they face scalability issues. A single point evaluation of the kernel density estimator, for example, requires computation time linear in the training sample size. To address this challenge, we present a method to efficiently construct a sparse approximation of a kernel mean. We do so by first establishing an incoherence-based bound on the approximation error. We then observe that, for any kernel with constant norm (which includes all translation-invariant kernels), the bound can be efficiently minimized by solving the k-center problem. The outcome is a linear-time construction of a sparse kernel mean, which also lends itself naturally to an automatic sparsity selection scheme. We demonstrate the computational gains of our method on several benchmark data sets, as well as on three applications involving kernel means: Euclidean embedding of distributions, class proportion estimation, and clustering using the mean-shift algorithm.

    Second, we address the bandwidth selection problem in kernel density estimation. Consistency of the KDE requires that the kernel bandwidth tend to zero as the sample size grows. In this work, we investigate whether consistency is still possible when the bandwidth is fixed, if we consider a more general class of weighted KDEs. To answer this question in the affirmative, we introduce the fixed-bandwidth KDE (fbKDE), obtained by solving a quadratic program, which consistently estimates any continuous square-integrable density. Rates of convergence are also established for the fbKDE for radial kernels and the box kernel under appropriate smoothness assumptions. Furthermore, in a simulation study we demonstrate that the fbKDE compares favorably to the standard KDE and the previously proposed variable-bandwidth KDE.

    PhD thesis, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/138533/1/encc_1.pd
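    The k-center construction can be sketched as follows; this is a toy reading of the approach, with Gonzalez's greedy farthest-point heuristic standing in for the k-center solver and uniform within-cluster weights as an assumed weighting scheme (the thesis's exact weights may differ). Evaluating the resulting sparse kernel mean costs O(k) per query instead of O(n).

```python
import numpy as np

def greedy_k_center(X, k, seed=0):
    """Gonzalez's farthest-point heuristic, a 2-approximation to k-center."""
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]
    dist = np.linalg.norm(X - X[centers[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))    # point farthest from chosen centers
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(centers)

def sparse_kernel_mean(X, k, bandwidth=1.0):
    """Approximate the kernel mean (1/n) sum_i k(., x_i) with k terms,
    weighting each center by the fraction of samples it covers."""
    C = X[greedy_k_center(X, k)]
    assign = np.argmin(
        np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), axis=1)
    weights = np.bincount(assign, minlength=k) / len(X)
    return C, weights

def evaluate(query, C, weights, bandwidth=1.0):
    """Evaluate the sparse kernel mean at a query point in O(k) time."""
    sq = np.sum((C - query) ** 2, axis=1)
    return float(weights @ np.exp(-sq / (2 * bandwidth ** 2)))

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 2))
C, w = sparse_kernel_mean(X, k=30, bandwidth=0.5)
print(evaluate(np.zeros(2), C, w, bandwidth=0.5))
```

    The greedy pass costs O(nk) overall, linear in the sample size for each selected center, which is in line with the linear-time construction claimed in the abstract.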

    One-class machines based on the coherence criterion

    The one-class classification problem is often addressed by solving a constrained quadratic optimization problem, in the same spirit as support vector machines. In this paper, we derive a novel one-class classification approach by investigating an original sparsification criterion. This criterion, known as the coherence criterion, is based on a fundamental quantity that describes the behavior of dictionaries in sparse approximation problems. The proposed framework allows us to derive new theoretical results. We associate the coherence criterion with a one-class classification algorithm by solving a least-squares optimization problem. We also provide an adaptive updating scheme. Experiments conducted on real datasets and time series illustrate the relevance of our approach compared to existing methods, in both accuracy and computational efficiency. Index Terms — support vector machines, machine learning, kernel methods, one-class classification
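    A hedged sketch of what a coherence-based least-squares one-class machine can look like is given below; the paper's exact objective may differ. Here the dictionary is selected by the coherence test, the coefficients solve the least-squares problem of projecting the empirical kernel mean onto the dictionary span, and the score measures similarity to that projected center (threshold it to flag outliers).

```python
import numpy as np

def gram(A, B, bw=1.0):
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * bw ** 2))

def fit_one_class(X, mu0=0.5, bw=1.0, reg=1e-8):
    """Coherence-sparsified least-squares one-class sketch: select a
    low-coherence dictionary, then solve for the dictionary expansion
    closest (in RKHS norm) to the empirical kernel mean of the data."""
    D = [X[0]]
    for x in X[1:]:
        if gram(x[None, :], np.array(D), bw).max() <= mu0:
            D.append(x)
    D = np.array(D)
    K = gram(D, D, bw) + reg * np.eye(len(D))   # regularized Gram matrix
    kbar = gram(D, X, bw).mean(axis=1)          # mean similarity to the data
    alpha = np.linalg.solve(K, kbar)            # normal equations
    return D, alpha

def score(x, D, alpha, bw=1.0):
    """Higher score = closer to the training mass."""
    return float(alpha @ gram(D, x[None, :], bw).ravel())

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))
D, alpha = fit_one_class(X, mu0=0.6, bw=0.8)
print(score(np.zeros(2), D, alpha), score(np.full(2, 5.0), D, alpha))
```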

    Online one-class machines based on the coherence criterion

    In this paper, we investigate a novel online one-class classification method. We consider a least-squares optimization problem in which the model complexity is controlled by the coherence criterion as a sparsification rule. This criterion is coupled with a simple updating rule for online learning, which yields an algorithm with low computational demands. Experiments conducted on time series illustrate the relevance of our approach compared to existing methods. Index Terms — support vector machines, kernel methods, one-class classification, online learning, coherence parameter
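    To illustrate the flavor of such an online scheme (a sketch under assumptions: an LMS-style gradient update toward a constant target of 1, which is one simple choice rather than the paper's stated rule), each sample either triggers a coefficient update or, when the coherence test passes, is admitted as a new atom.

```python
import numpy as np

def gk(D, x, bw=1.0):
    """Gaussian kernel between each row of D and the vector x."""
    return np.exp(-np.sum((D - x) ** 2, axis=-1) / (2 * bw ** 2))

class OnlineOneClass:
    """LMS-style sketch: the dictionary grows only when the coherence test
    passes, and the coefficients are nudged so the model output tracks a
    constant target of 1 on normal samples (an illustrative objective)."""

    def __init__(self, mu0=0.5, step=0.1, bw=1.0):
        self.mu0, self.step, self.bw = mu0, step, bw
        self.D = None
        self.alpha = np.zeros(0)

    def partial_fit(self, x):
        if self.alpha.size == 0:
            self.D = x[None, :]
            self.alpha = np.array([self.step])
            return 1.0
        k = gk(self.D, x, self.bw)
        err = 1.0 - float(self.alpha @ k)        # deviation from target 1
        if k.max() <= self.mu0:                  # coherence test
            self.D = np.vstack([self.D, x])      # admit a new atom...
            self.alpha = np.append(self.alpha, self.step * err)
        else:
            self.alpha += self.step * err * k    # ...or a gradient step only
        return err

model = OnlineOneClass(mu0=0.6, step=0.2, bw=0.8)
rng = np.random.default_rng(4)
for x in rng.normal(size=(1000, 2)):
    model.partial_fit(x)
print(f"dictionary size: {len(model.alpha)}")
```

    Each update touches only the current dictionary, so the per-sample cost stays proportional to the dictionary size rather than the number of samples seen so far, which is the source of the low computational demand.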