5,861 research outputs found

    Low Computational Cost Machine Learning: Random Projections and Polynomial Kernels

    Get PDF
    According to recent reports, over the course of 2018 the volume of data generated, captured and replicated globally was 33 Zettabytes (ZB), and it is expected to reach 175 ZB by the year 2025. Managing this impressive increase in the volume and variety of data represents a great challenge, but it also provides organizations with a valuable opportunity to support their decision-making processes with insights and knowledge extracted from massive collections of data, and to automate tasks, leading to important savings. In this context, the field of machine learning has attracted a notable level of attention, and recent breakthroughs in the area have enabled the creation of predictive models of unprecedented accuracy. However, with the emergence of new computational paradigms, the field is now faced with the challenge of creating more efficient models, capable of running in environments with low computational power while maintaining a high level of accuracy. This thesis focuses on the design and evaluation of new algorithms for the generation of useful data representations, with special attention to the scalability and efficiency of the proposed solutions. In particular, the proposed methods make intensive use of randomization to map data samples to the feature spaces of polynomial kernels and then condense the useful information present in those feature spaces into a more compact representation. The resulting algorithmic designs are easy to implement and require little computational power to run. As a consequence, they are perfectly suited for applications in environments where computational resources are scarce and data needs to be analyzed with little delay. The two major contributions of this thesis are: (1) we present and evaluate efficient and data-independent algorithms that perform Random Projections from the feature spaces of polynomial kernels of different degrees, and (2) we demonstrate how these techniques can be used to accelerate machine learning tasks where polynomial interaction features are used, focusing particularly on bilinear models in deep learning.
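
    As a rough illustration of the kind of building block the thesis relies on (not its actual algorithms), the following numpy sketch shows a classical data-independent Random Projection, which approximately preserves pairwise Euclidean distances; all sizes and names are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        n, d, k = 500, 1000, 64                         # samples, input dim, projected dim (illustrative)
        X = rng.standard_normal((n, d))                 # toy data

        R = rng.standard_normal((d, k)) / np.sqrt(k)    # data-independent random projection matrix
        Z = X @ R                                       # compact representation in R^k

        # Pairwise distances in the projected space approximate the original ones.
        print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Z[0] - Z[1]))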

    The inverse moment problem for convex polytopes

    Full text link
    The goal of this paper is to present a general and novel approach for the reconstruction of any convex d-dimensional polytope P from knowledge of its moments. In particular, we show that the vertices of an N-vertex polytope in R^d can be reconstructed from the knowledge of O(DN) axial moments (with respect to an unknown polynomial measure of degree D) in d+1 distinct generic directions. Our approach is based on the collection of moment formulas due to Brion, Lawrence, Khovanskii-Pukhlikov, and Barvinok that arise in the discrete geometry of polytopes, and on what is variously known as Prony's method, or the Vandermonde factorization of finite-rank Hankel matrices. Comment: LaTeX2e, 24 pages including 1 appendix
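
    For intuition, the sketch below works through the simplest one-dimensional instance of the Prony/Hankel machinery the abstract refers to: recovering the nodes and weights of a discrete measure from its power moments. It is an illustrative toy example under assumed values, not the paper's reconstruction algorithm for polytopes.

        import numpy as np

        x_true = np.array([-0.7, 0.2, 0.9])     # unknown nodes (illustrative)
        c_true = np.array([1.0, 2.0, 0.5])      # unknown weights
        r = len(x_true)

        # Power moments m_k = sum_j c_j x_j^k, the data of the inverse moment problem.
        m = np.array([np.sum(c_true * x_true**k) for k in range(2 * r)])

        # Hankel matrices built from the moments; the nodes are the generalized
        # eigenvalues of the shifted pencil (H1, H0), i.e., Prony's method.
        H0 = np.array([[m[i + j] for j in range(r)] for i in range(r)])
        H1 = np.array([[m[i + j + 1] for j in range(r)] for i in range(r)])
        nodes = np.sort(np.linalg.eigvals(np.linalg.solve(H0, H1)).real)

        # Weights follow from a Vandermonde least-squares solve.
        V = np.vander(nodes, 2 * r, increasing=True).T
        weights = np.linalg.lstsq(V, m, rcond=None)[0]
        print(nodes, weights)                   # should recover x_true and c_true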

    Data-independent Random Projections from the feature-map of the homogeneous polynomial kernel of degree two

    Get PDF
    This paper presents a novel non-linear extension of the Random Projection method based on the degree-2 homogeneous polynomial kernel. Our algorithm is able to implicitly map data points to the high-dimensional feature space of that kernel and from there perform a Random Projection to a Euclidean space of the desired dimensionality. Pairwise distances between data points in the kernel feature space are approximately preserved in the resulting representation. As opposed to previous kernelized Random Projection versions, our method is data-independent and preserves much of the computational simplicity of the original algorithm. This is achieved by focusing on a specific kernel function, which allowed us to analyze the effect of its associated feature mapping on the distribution of the Random Projection hyperplanes. Finally, we present empirical evidence that the proposed method outperforms alternative approaches in terms of pairwise distance preservation, while being significantly more efficient. We also show how our method can be used to approximate the accuracy of non-linear classifiers with efficient linear classifiers on some datasets.
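
    For reference, the following sketch takes the explicit route that the paper's implicit construction avoids: it maps points through the degree-2 homogeneous polynomial feature map and then applies an ordinary Gaussian Random Projection. Dimensions and names are illustrative, and this is not the paper's algorithm.

        import numpy as np

        rng = np.random.default_rng(0)
        n, d, k = 200, 20, 50                                   # illustrative sizes
        X = rng.standard_normal((n, d))

        # Explicit feature map of the degree-2 homogeneous kernel: phi(x) = vec(x x^T),
        # so that phi(x) . phi(y) = (x . y)^2.
        Phi = np.einsum('ni,nj->nij', X, X).reshape(n, d * d)

        R = rng.standard_normal((d * d, k)) / np.sqrt(k)        # data-independent Gaussian projection
        Z = Phi @ R                                             # compact representation in R^k

        # Inner products (and hence distances) in the projected space approximate
        # their counterparts in the kernel feature space.
        print((X[0] @ X[1]) ** 2, Phi[0] @ Phi[1], Z[0] @ Z[1])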

    Random Feature Maps for Dot Product Kernels

    Full text link
    Approximating non-linear kernels using feature maps has gained a lot of interest in recent years due to applications in reducing the training and testing times of SVM classifiers and other kernel-based learning algorithms. We extend this line of work and present low-distortion embeddings of dot product kernels into linear Euclidean spaces. We base our results on a classical result in harmonic analysis characterizing all dot product kernels, and use it to define randomized feature maps into explicit low-dimensional Euclidean spaces in which the native dot product provides an approximation to the dot product kernel with high confidence. Comment: To appear in the proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS 2012). This version corrects a minor error with Lemma 10. Acknowledgements: Devanshu Bhimwa
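
    As a rough illustration of the randomized feature-map idea, restricted to a single polynomial term rather than the paper's general construction for arbitrary dot product kernels, the sketch below builds random features for k(x, y) = (x.y)^2 from pairs of Rademacher projections; the expected inner product of the feature vectors equals the kernel value, and the Monte Carlo estimate concentrates as the number of features grows. All names and sizes are assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        d, D = 5, 5000                                   # input dim, number of random features
        x = rng.standard_normal(d)
        y = rng.standard_normal(d)

        W1 = rng.choice([-1.0, 1.0], size=(D, d))        # Rademacher projection vectors
        W2 = rng.choice([-1.0, 1.0], size=(D, d))

        # Each feature multiplies two independent Rademacher projections; since
        # E[(w.x)(w.y)] = x.y, the features estimate (x.y)^2 in expectation.
        z = lambda v: (W1 @ v) * (W2 @ v) / np.sqrt(D)
        print((x @ y) ** 2, z(x) @ z(y))                 # kernel value vs. Monte Carlo estimate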

    The Geometry of Nonlinear Embeddings in Kernel Discriminant Analysis

    Full text link
    Fisher's linear discriminant analysis is a classical method for classification, yet it is limited to capturing linear features only. Kernel discriminant analysis, as an extension, is known to successfully alleviate this limitation through a nonlinear feature mapping. We study the geometry of nonlinear embeddings in discriminant analysis with polynomial kernels and the Gaussian kernel by identifying the population-level discriminant function, which depends on the data distribution and the kernel. To obtain the discriminant function, we solve a generalized eigenvalue problem with between-class and within-class covariance operators. The polynomial discriminants are shown to capture the class difference through the population moments explicitly. To approximate the Gaussian discriminant, we use a particular representation of the Gaussian kernel that utilizes the exponential generating function for Hermite polynomials. We also show that the Gaussian discriminant can be approximated using randomized projections of the data. Our results illuminate how the data distribution and the kernel interact in determining the nonlinear embedding for discrimination, and provide a guideline for the choice of the kernel and its parameters.
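
    To make the generalized-eigenvalue formulation concrete, here is a compact two-class kernel Fisher discriminant sketch in its usual dual (sample-level) form with a degree-2 polynomial kernel; this is illustrative code under assumed data and sizes, not the paper's population-level analysis.

        import numpy as np

        rng = np.random.default_rng(0)
        n_per, d = 100, 2
        X = np.vstack([0.5 * rng.standard_normal((n_per, d)),       # class 0
                       rng.standard_normal((n_per, d)) + 2.0])      # class 1 (shifted)
        y = np.repeat([0, 1], n_per)

        kernel = lambda A, B: (A @ B.T + 1.0) ** 2                  # degree-2 polynomial kernel
        K = kernel(X, X)

        # Dual (coefficient-space) class means and within-class scatter.
        m = [K[:, y == c].mean(axis=1) for c in (0, 1)]
        N = sum(K[:, y == c] @ (np.eye(n_per) - 1.0 / n_per) @ K[:, y == c].T
                for c in (0, 1)) + 1e-6 * np.eye(len(y))            # regularized within-class scatter

        # For two classes, the leading generalized eigenvector of the between-class /
        # within-class problem has the closed form alpha = N^{-1} (m0 - m1).
        alpha = np.linalg.solve(N, m[0] - m[1])

        project = lambda Xnew: kernel(Xnew, X) @ alpha              # nonlinear discriminant scores
        print(project(X[:3]), project(X[-3:]))                      # the two classes get well-separated scores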

    The Teaching Dimension of Kernel Perceptron

    Full text link
    Algorithmic machine teaching has been studied under the linear setting, where exact teaching is possible. However, little is known about teaching nonlinear learners. Here, we establish the sample complexity of teaching, a.k.a. the teaching dimension, for kernelized perceptrons for different families of feature maps. As a warm-up, we show that the teaching complexity is Θ(d) for the exact teaching of linear perceptrons in R^d, and Θ(d^k) for the kernel perceptron with a polynomial kernel of order k. Furthermore, under certain smoothness assumptions on the data distribution, we establish a rigorous bound on the complexity of approximately teaching a Gaussian kernel perceptron. We provide numerical examples of the optimal (approximate) teaching set under several canonical settings for linear, polynomial and Gaussian kernel perceptrons. Comment: AISTATS 202
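
    For concreteness, the sketch below implements the kind of learner being taught, a kernel perceptron with a degree-2 (inhomogeneous) polynomial kernel, and trains it on a tiny hand-picked set of labeled points; the teaching set and all names are illustrative assumptions, and the paper's construction of provably optimal teaching sets is not shown.

        import numpy as np

        def kernel_perceptron(X, y, degree=2, epochs=100):
            """Mistake-driven kernel perceptron with kernel (x.z + 1)^degree; y in {-1, +1}."""
            K = (X @ X.T + 1.0) ** degree
            alpha = np.zeros(len(y))
            for _ in range(epochs):
                for i in range(len(y)):
                    if y[i] * (K[i] @ (alpha * y)) <= 0:            # misclassified: store the example
                        alpha[i] += 1.0
            return lambda Z: np.sign(((Z @ X.T + 1.0) ** degree) @ (alpha * y))

        # A tiny, hand-picked teaching set (illustrative only, not the paper's optimal
        # construction) labeled by the quadratic concept sign(x1^2 + x2^2 - 1).
        X = np.array([[0.1, 0.1], [0.5, -0.2], [1.5, 0.0], [0.0, 1.4], [-1.2, -1.0]])
        y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 1.0)
        predict = kernel_perceptron(X, y)
        print(y, predict(X))                                        # the learner reproduces the labels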