5,963 research outputs found

    Learning additive models online with fast evaluating kernels

    Get PDF
    We develop three new techniques to build on the recent advances in online learning with kernels. First, we show that an exponential speed-up in prediction time pertrial is possible for such algorithms as the Kernel-Adatron, the Kernel-Perceptron, and ROMMA for specific additive models. Second, we show that the techniques of the recent algorithms developed for online linear prediction when the best predictor changes over time may be implemented for kernel-based learners at no additional asymptotic cost. Finally, we introduce a new online kernel-based learning algorithm for which we give worst-case loss bounds for the ϵ-insensitive square loss

    High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso

    Full text link
    The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this paper, we consider a feature-wise kernelized Lasso for capturing non-linear input-output dependency. We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures. We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems. The effectiveness of the proposed method is demonstrated through feature selection experiments with thousands of features.Comment: 18 page

    Forecasting of commercial sales with large scale Gaussian Processes

    Full text link
    This paper argues that there has not been enough discussion in the field of applications of Gaussian Process for the fast moving consumer goods industry. Yet, this technique can be important as it e.g., can provide automatic feature relevance determination and the posterior mean can unlock insights on the data. Significant challenges are the large size and high dimensionality of commercial data at a point of sale. The study reviews approaches in the Gaussian Processes modeling for large data sets, evaluates their performance on commercial sales and shows value of this type of models as a decision-making tool for management.Comment: 1o pages, 5 figure

    Sketching for Large-Scale Learning of Mixture Models

    Get PDF
    Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. We propose a "compressive learning" framework where we estimate model parameters from a sketch of the training data. This sketch is a collection of generalized moments of the underlying probability distribution of the data. It can be computed in a single pass on the training set, and is easily computable on streams or distributed datasets. The proposed framework shares similarities with compressive sensing, which aims at drastically reducing the dimension of high-dimensional signals while preserving the ability to reconstruct them. To perform the estimation task, we derive an iterative algorithm analogous to sparse reconstruction algorithms in the context of linear inverse problems. We exemplify our framework with the compressive estimation of a Gaussian Mixture Model (GMM), providing heuristics on the choice of the sketching procedure and theoretical guarantees of reconstruction. We experimentally show on synthetic data that the proposed algorithm yields results comparable to the classical Expectation-Maximization (EM) technique while requiring significantly less memory and fewer computations when the number of database elements is large. We further demonstrate the potential of the approach on real large-scale data (over 10 8 training samples) for the task of model-based speaker verification. Finally, we draw some connections between the proposed framework and approximate Hilbert space embedding of probability distributions using random features. We show that the proposed sketching operator can be seen as an innovative method to design translation-invariant kernels adapted to the analysis of GMMs. We also use this theoretical framework to derive information preservation guarantees, in the spirit of infinite-dimensional compressive sensing
    • …
    corecore