Low Computational Cost Machine Learning: Random Projections and Polynomial Kernels
According to recent reports, over the course of 2018 the volume of data generated, captured and replicated globally was 33 zettabytes (ZB), and it is expected to reach 175 ZB by the year 2025. Managing this impressive increase in the volume and variety of data represents a great challenge, but it also provides organizations with a valuable opportunity: to support their decision-making processes with insights and knowledge extracted from massive collections of data, and to automate tasks, leading to important savings. In this context, the field of machine learning has attracted notable attention, and recent breakthroughs in the area have enabled the creation of predictive models of unprecedented accuracy. However, with the emergence of new computational paradigms, the field now faces the challenge of creating more efficient models, capable of running in low-computational-power environments while maintaining a high level of accuracy. This thesis focuses on the design and evaluation of new algorithms for the generation of useful data representations, with special attention to the scalability and efficiency of the proposed solutions. In particular, the proposed methods make intensive use of randomization to map data samples to the feature spaces of polynomial kernels and then condense the useful information present in those feature spaces into a more compact representation. The resulting algorithmic designs are easy to implement and require little computational power to run. As a consequence, they are well suited to environments where computational resources are scarce and data needs to be analyzed with little delay. The two major contributions of this thesis are: (1) we present and evaluate efficient, data-independent algorithms that perform Random Projections from the feature spaces of polynomial kernels of different degrees, and (2) we demonstrate how these techniques can be used to accelerate machine learning tasks in which polynomial interaction features are used, focusing particularly on bilinear models in deep learning.
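Contribution (2) can be made concrete with a short sketch: the bilinear interaction <vec(u v^T), vec(u' v'^T)> = <u, u'><v, v'> can be approximated with D random features instead of materializing the d1*d2-dimensional outer product. The construction below (products of Rademacher projections) is a minimal illustration of the general idea, not the thesis's exact algorithm; all names and dimensions are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def bilinear_random_features(U, V, D=512):
    """Approximate the bilinear kernel <vec(u v^T), vec(u' v'^T)> = <u,u'><v,v'>
    with D random features, avoiding the explicit d1*d2 outer product."""
    d1, d2 = U.shape[1], V.shape[1]
    W = rng.choice([-1.0, 1.0], size=(D, d1))   # Rademacher projection vectors
    S = rng.choice([-1.0, 1.0], size=(D, d2))
    # z_j(u, v) = (w_j . u)(s_j . v) / sqrt(D); in expectation,
    # z(u, v) . z(u', v') = <u, u'> <v, v'>.
    return (U @ W.T) * (V @ S.T) / np.sqrt(D)

# Sanity check against the exact bilinear Gram matrix.
U = rng.standard_normal((5, 32))
V = rng.standard_normal((5, 48))
Z = bilinear_random_features(U, V)
exact = (U @ U.T) * (V @ V.T)
print(np.abs(Z @ Z.T - exact).max() / np.abs(exact).max())  # ~O(1/sqrt(D))
```

The D-dimensional features can then feed a linear layer, replacing the quadratic-size bilinear form with two matrix products and an element-wise multiplication.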
The inverse moment problem for convex polytopes
The goal of this paper is to present a general and novel approach for the
reconstruction of any convex d-dimensional polytope P from knowledge of its
moments. In particular, we show that the vertices of an N-vertex polytope in
R^d can be reconstructed from the knowledge of O(DN) axial moments (w.r.t. an
unknown polynomial measure of degree D) in d+1 distinct generic directions.
Our approach is based on the collection of moment formulas due to Brion,
Lawrence, Khovanskii-Pukhlikov, and Barvinok that arise in the discrete
geometry of polytopes, and on what is variously known as Prony's method, or the
Vandermonde factorization of finite-rank Hankel matrices.

Comment: LaTeX2e, 24 pages including 1 appendix
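In the univariate case, the Prony/Hankel step referred to above is easy to state: given moments m_k = sum_i w_i t_i^k of a discrete measure, the nodes t_i are the generalized eigenvalues of a shifted Hankel pencil, and the weights follow from a Vandermonde solve. A minimal sketch of that simplified setting (not the paper's full polytope reconstruction):

```python
import numpy as np
from scipy.linalg import eig

def prony(moments, n):
    """Recover n nodes and weights of a discrete measure from its first
    2n moments m_k = sum_i w_i * t_i**k, via the Hankel pencil H1 v = t H0 v."""
    m = np.asarray(moments, dtype=float)
    H0 = np.array([[m[i + j] for j in range(n)] for i in range(n)])
    H1 = np.array([[m[i + j + 1] for j in range(n)] for i in range(n)])
    nodes = np.sort(np.real(eig(H1, H0, right=False)))  # generalized eigenvalues
    V = np.vander(nodes, N=2 * n, increasing=True).T    # V[k, i] = t_i**k
    weights = np.linalg.lstsq(V, m, rcond=None)[0]      # fit w to the moments
    return nodes, weights

# Example: nodes {1, 2, 4} with weights {0.5, 1.0, 0.25}.
t = np.array([1.0, 2.0, 4.0])
w = np.array([0.5, 1.0, 0.25])
moments = [np.sum(w * t**k) for k in range(6)]
print(prony(moments, 3))   # recovers the nodes and weights
```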
Data-independent Random Projections from the feature-map of the homogeneous polynomial kernel of degree two
This paper presents a novel non-linear extension of the Random Projection method based on the degree-2 homogeneous polynomial kernel. Our algorithm is able to implicitly map data points to the high-dimensional feature space of that kernel and from there perform a Random Projection to a Euclidean space of the desired dimensionality. Pairwise distances between data points in the kernel feature space are approximately preserved in the resulting representation. Unlike previous kernelized Random Projection versions, our method is data-independent and preserves much of the computational simplicity of the original algorithm. This is achieved by focusing on a specific kernel function, which allowed us to analyze the effect of its associated feature mapping on the distribution of the Random Projection hyperplanes. Finally, we present empirical evidence that the proposed method outperforms alternative approaches in terms of pairwise distance preservation, while being significantly more efficient. We also show how our method can be used to approximate the accuracy of non-linear classifiers with efficient linear classifiers on some datasets.
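One standard data-independent construction with the properties described is the product of two independent Rademacher projections: z_j(x) = (w_j . x)(s_j . x)/sqrt(k) satisfies E[z(x) . z(y)] = <x, y>^2, the degree-2 homogeneous polynomial kernel, so kernel-space distances are preserved in expectation. The sketch below illustrates this idea and is not necessarily the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

def poly2_random_projection(X, k=1024):
    """Implicit Random Projection from the feature space of K(x, y) = <x, y>**2:
    z_j(x) = (w_j . x)(s_j . x) / sqrt(k) with independent Rademacher w_j, s_j,
    so that E[z(x) . z(y)] = <x, y>**2."""
    d = X.shape[1]
    W = rng.choice([-1.0, 1.0], size=(k, d))
    S = rng.choice([-1.0, 1.0], size=(k, d))
    return (X @ W.T) * (X @ S.T) / np.sqrt(k)

X = rng.standard_normal((8, 16))
Z = poly2_random_projection(X)
K = (X @ X.T) ** 2                     # exact degree-2 homogeneous kernel
print(np.abs(Z @ Z.T - K).max() / np.abs(K).max())  # relative error ~ 1/sqrt(k)
```

Note that the map never materializes the d^2-dimensional kernel feature space: each output coordinate costs two inner products.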
Random Feature Maps for Dot Product Kernels
Approximating non-linear kernels using feature maps has gained a lot of
interest in recent years due to applications in reducing training and testing
times of SVM classifiers and other kernel based learning algorithms. We extend
this line of work and present low distortion embeddings for dot product kernels
into linear Euclidean spaces. We base our results on a classical result in
harmonic analysis characterizing all dot product kernels and use it to define
randomized feature maps into explicit low dimensional Euclidean spaces in which
the native dot product provides an approximation to the dot product kernel with
high confidence.

Comment: To appear in the proceedings of the 15th International Conference on
Artificial Intelligence and Statistics (AISTATS 2012). This version corrects
a minor error in Lemma 10.
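A minimal sketch in the spirit of the paper's Random Maclaurin construction: expand k(<x, y>) = sum_n a_n <x, y>^n, sample a degree n with probability 2^-(n+1) for each feature, and emit sqrt(a_n 2^(n+1)) prod_i <w_i, x> with Rademacher w_i. The instantiation below uses the exponential dot product kernel exp(<x, y>), where a_n = 1/n!; the function names are ours.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(2)

def random_maclaurin_features(X, D=2000):
    """Random features for k(x, y) = exp(<x, y>) = sum_n <x, y>**n / n!.
    Each feature samples a degree n with P(n) = 2**-(n+1), then computes
    sqrt(a_n * 2**(n+1)) * prod_i <w_i, x> with Rademacher w_i."""
    m, d = X.shape
    Z = np.empty((m, D))
    for j in range(D):
        n = rng.geometric(0.5) - 1          # P(n) = 2**-(n+1), n = 0, 1, 2, ...
        W = rng.choice([-1.0, 1.0], size=(n, d))
        prod = np.prod(W @ X.T, axis=0) if n > 0 else np.ones(m)
        a_n = 1.0 / factorial(n)            # Maclaurin coefficient of exp
        Z[:, j] = np.sqrt(a_n * 2.0 ** (n + 1)) * prod
    return Z / np.sqrt(D)

X = 0.3 * rng.standard_normal((6, 10))      # small norms keep the variance modest
Z = random_maclaurin_features(X)
K = np.exp(X @ X.T)                          # exact exponential dot product kernel
print(np.abs(Z @ Z.T - K).max())             # error shrinks as D grows
```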
The Geometry of Nonlinear Embeddings in Kernel Discriminant Analysis
Fisher's linear discriminant analysis is a classical method for
classification, yet it is limited to capturing linear features only. Kernel
discriminant analysis, as an extension, is known to successfully alleviate this
limitation through a nonlinear feature mapping. We study the geometry of
nonlinear embeddings in discriminant analysis with polynomial kernels and the
Gaussian kernel by identifying the population-level discriminant function that
depends on the data distribution and the kernel. In order to obtain the
discriminant function, we solve a generalized eigenvalue problem with
between-class and within-class covariance operators. The polynomial
discriminants are shown to capture the class difference through the population
moments explicitly. For approximation of the Gaussian discriminant, we use a
particular representation of the Gaussian kernel by utilizing the exponential
generating function for Hermite polynomials. We also show that the Gaussian
discriminant can be approximated using randomized projections of the data. Our
results illuminate how the data distribution and the kernel interact in
determining the nonlinear embedding for discrimination, and provide a
guideline for the choice of kernel and its parameters.
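At the sample level, and for two classes, the generalized eigenvalue problem described above reduces to kernel Fisher discriminant analysis: because the between-class scatter has rank one, the leading generalized eigenvector can be found with a single linear solve. A minimal sketch, assuming a degree-2 polynomial kernel and a ridge term for numerical stability (our simplification, not the paper's population-level analysis):

```python
import numpy as np

def kernel_fda_direction(K, y, reg=1e-3):
    """Two-class kernel Fisher discriminant: solve N alpha = m1 - m0, where
    m_c is the mean kernel column of class c and N is the within-class
    scatter of the kernel columns (plus a ridge term for stability)."""
    n = K.shape[0]
    m, N = {}, np.zeros((n, n))
    for c in (0, 1):
        Kc = K[:, y == c]                          # kernel columns of class c
        nc = Kc.shape[1]
        m[c] = Kc.mean(axis=1)
        N += Kc @ (np.eye(nc) - 1.0 / nc) @ Kc.T   # centered class scatter
    alpha = np.linalg.solve(N + reg * np.eye(n), m[1] - m[0])
    return alpha   # discriminant: f(x) = sum_i alpha_i k(x_i, x)

rng = np.random.default_rng(3)
X = rng.standard_normal((40, 2))
y = (np.sum(X**2, axis=1) > 2.0).astype(int)   # classes split by a circle
K = (X @ X.T + 1.0) ** 2                        # degree-2 polynomial kernel
alpha = kernel_fda_direction(K, y)
scores = K @ alpha
print(scores[y == 1].mean() > scores[y == 0].mean())  # classes separate along f
```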
The Teaching Dimension of Kernel Perceptron
Algorithmic machine teaching has been studied under the linear setting where
exact teaching is possible. However, little is known about teaching nonlinear
learners. Here, we establish the sample complexity of teaching, also known as
the teaching dimension, for kernelized perceptrons under different families of
feature maps. As a warm-up, we show that the teaching complexity is Θ(d) for
the exact teaching of linear perceptrons in R^d, and Θ(d^k) for the kernel
perceptron with a polynomial kernel of order k. Furthermore, under certain
smoothness assumptions on the data distribution, we establish a rigorous bound on
the complexity for approximately teaching a Gaussian kernel perceptron. We
provide numerical examples of the optimal (approximate) teaching set under
several canonical settings for linear, polynomial and Gaussian kernel
perceptrons.

Comment: AISTATS 2021
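To make the setting concrete, the sketch below trains a perceptron with a degree-2 polynomial kernel on a handful of examples for the concept sign(x1^2 - x2^2); it illustrates teaching a kernelized learner from a small set, not the paper's optimal teaching-set construction:

```python
import numpy as np

def kernel(X1, X2):
    return (X1 @ X2.T) ** 2            # homogeneous polynomial kernel of order 2

def kernel_perceptron(X, y, epochs=50):
    """Classic kernel perceptron: on each mistake, add the example to the
    dual expansion f(x) = sum_i alpha_i y_i k(x_i, x)."""
    alpha = np.zeros(len(y))
    K = kernel(X, X)
    for _ in range(epochs):
        mistakes = 0
        for i in range(len(y)):
            if y[i] * ((alpha * y) @ K[:, i]) <= 0:   # wrong side (or tie)
                alpha[i] += 1.0
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

# A small candidate teaching set for sign(x1**2 - x2**2) in R^2
# (illustrative only; the paper derives optimal teaching sets analytically).
X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0],
              [1.0, 0.5], [0.5, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0, -1.0])
alpha = kernel_perceptron(X, y)

X_test = np.array([[2.0, 0.5], [0.3, 1.5]])       # held-out points
print(np.sign((alpha * y) @ kernel(X, X_test)))   # expect [ 1. -1.]
```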