5,140 research outputs found
Data-independent Random Projections from the feature-map of the homogeneous polynomial kernel of degree two
This paper presents a novel non-linear extension of the Random Projection method based on the degree-2 homogeneous polynomial kernel. Our algorithm implicitly maps data points to the high-dimensional feature space of that kernel and from there performs a Random Projection to a Euclidean space of the desired dimensionality. Pairwise distances between data points in the kernel feature space are approximately preserved in the resulting representation. As opposed to previous kernelized Random Projection versions, our method is data-independent and preserves much of the computational simplicity of the original algorithm. This is achieved by focusing on a specific kernel function, which allowed us to analyze the effect of its associated feature mapping on the distribution of the Random Projection hyperplanes. Finally, we present empirical evidence that the proposed method outperforms alternative approaches in terms of pairwise distance preservation while being significantly more efficient. We also show how our method can be used to approximate the accuracy of non-linear classifiers with efficient linear classifiers on some datasets.
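The identity underlying this approach is that the degree-2 homogeneous polynomial kernel equals an inner product under an explicit outer-product feature map. The sketch below is only a naive illustration under assumed toy dimensions: it materializes the d^2-dimensional feature map explicitly and then applies an ordinary Gaussian Random Projection, whereas the paper's contribution is precisely to avoid that explicit step.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    # Explicit feature map of the degree-2 homogeneous polynomial kernel:
    # phi(x) = vec(x x^T), so that <phi(x), phi(y)> = (x . y)^2.
    return np.outer(x, x).ravel()

d, k = 10, 2000                      # input dim, target dim (illustrative)
x, y = rng.normal(size=d), rng.normal(size=d)

# Kernel identity: (x . y)^2 equals the inner product in feature space.
assert np.isclose((x @ y) ** 2, phi(x) @ phi(y))

# Naive route: materialize phi (d^2 coordinates), then apply an ordinary
# Gaussian Random Projection down to k dimensions.
R = rng.normal(size=(k, d * d)) / np.sqrt(k)
px, py = R @ phi(x), R @ phi(y)

# The pairwise distance in the kernel feature space is roughly preserved.
true_dist = np.linalg.norm(phi(x) - phi(y))
proj_dist = np.linalg.norm(px - py)
print(proj_dist / true_dist)         # close to 1
```

The naive route costs O(d^2) memory and O(k d^2) time per point, which is exactly the overhead a data-independent implicit construction aims to remove.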
Low Computational Cost Machine Learning: Random Projections and Polynomial Kernels
According to recent reports, over the course of 2018, the volume of data generated, captured and replicated globally was 33 Zettabytes (ZB), and it is expected to reach 175 ZB by the year 2025. Managing this impressive increase in the volume and variety of data represents a great challenge, but also provides organizations with a precious opportunity to support their decision-making processes with insights and knowledge extracted from massive collections of data and to automate tasks, leading to important savings. In this context, the field of machine learning has attracted a notable level of attention, and recent breakthroughs in the area have enabled the creation of predictive models of unprecedented accuracy. However, with the emergence of new computational paradigms, the field is now faced with the challenge of creating more efficient models, capable of running on low computational power environments while maintaining a high level of accuracy. This thesis focuses on the design and evaluation of new algorithms for the generation of useful data representations, with special attention to the scalability and efficiency of the proposed solutions. In particular, the proposed methods make intensive use of randomization in order to map data samples to the feature spaces of polynomial kernels and then condense the useful information present in those feature spaces into a more compact representation. The resulting algorithmic designs are easy to implement and require little computational power to run. As a consequence, they are perfectly suited for applications in environments where computational resources are scarce and data needs to be analyzed with little delay.
The two major contributions of this thesis are: (1) we present and evaluate efficient and data-independent algorithms that perform Random Projections from the feature spaces of polynomial kernels of different degrees and (2) we demonstrate how these techniques can be used to accelerate machine learning tasks where polynomial interaction features are used, focusing particularly on bilinear models in deep learning.
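To make the second contribution concrete: bilinear models form all d*d pairwise products of two feature vectors, which quickly becomes expensive. The sketch below is a hedged baseline, not the thesis's algorithm; it uses a plain Gaussian random projection (with illustrative dimensions) to condense the interaction features while approximately preserving inner products.

```python
import numpy as np

rng = np.random.default_rng(1)

d, m = 16, 4000        # per-input feature dim, sketch size (illustrative)

def bilinear(x, y):
    # Full bilinear interaction features: all d*d pairwise products.
    return np.outer(x, y).ravel()

x1, y1 = rng.normal(size=d), rng.normal(size=d)
x2, y2 = rng.normal(size=d), rng.normal(size=d)

# Inner products of bilinear features factorize:
# <vec(x1 y1^T), vec(x2 y2^T)> = (x1 . x2) * (y1 . y2)
full = bilinear(x1, y1) @ bilinear(x2, y2)
assert np.isclose(full, (x1 @ x2) * (y1 @ y2))

# A Gaussian random projection condenses the d^2 interaction features into
# m coordinates while approximately preserving inner products.
R = rng.normal(size=(m, d * d)) / np.sqrt(m)
approx = (R @ bilinear(x1, y1)) @ (R @ bilinear(x2, y2))
scale = np.linalg.norm(bilinear(x1, y1)) * np.linalg.norm(bilinear(x2, y2))
print(abs(approx - full) / scale)    # small relative to the feature norms
```

A dense projection matrix of size m-by-d^2 is itself costly to store and apply, which is why structured, data-independent constructions are preferable in practice.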
The Geometry of Nonlinear Embeddings in Kernel Discriminant Analysis
Fisher's linear discriminant analysis is a classical method for
classification, yet it is limited to capturing linear features only. Kernel
discriminant analysis as an extension is known to successfully alleviate the
limitation through a nonlinear feature mapping. We study the geometry of
nonlinear embeddings in discriminant analysis with polynomial kernels and
the Gaussian kernel by identifying the population-level discriminant function that
depends on the data distribution and the kernel. In order to obtain the
discriminant function, we solve a generalized eigenvalue problem with
between-class and within-class covariance operators. The polynomial
discriminants are shown to capture the class difference through the population
moments explicitly. For approximation of the Gaussian discriminant, we use a
particular representation of the Gaussian kernel by utilizing the exponential
generating function for Hermite polynomials. We also show that the Gaussian
discriminant can be approximated using randomized projections of the data. Our
results illuminate how the data distribution and the kernel interact in
determining the nonlinear embedding for discrimination, and provide a
guideline for the choice of the kernel and its parameters.
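A minimal sketch of the discriminant-analysis setup described above, under assumed synthetic data: two classes with equal means but different second moments, where a linear discriminant fails and an explicit degree-2 polynomial embedding combined with Fisher's criterion separates them. This is a finite-sample toy version of the paper's population-level analysis, not its method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes with (nearly) equal means but different second moments:
# a linear discriminant fails, a degree-2 embedding separates by radius.
n = 300
X0 = rng.normal(scale=0.5, size=(n, 2))                    # inner blob
theta = rng.uniform(0.0, 2 * np.pi, size=n)
X1 = 2 * np.c_[np.cos(theta), np.sin(theta)] \
     + rng.normal(scale=0.1, size=(n, 2))                  # ring of radius 2

def quad_features(X):
    # Explicit embedding for a degree-2 polynomial kernel.
    u, v = X[:, 0], X[:, 1]
    return np.c_[u, v, u ** 2, v ** 2, u * v]

def fisher_accuracy(A, B):
    # Fisher discriminant: solve the within-class scatter against the
    # mean difference, with a small ridge term for numerical stability.
    mA, mB = A.mean(0), B.mean(0)
    Sw = np.cov(A.T) * (len(A) - 1) + np.cov(B.T) * (len(B) - 1)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(Sw.shape[0]), mB - mA)
    thr = (w @ mA + w @ mB) / 2
    correct = np.sum(A @ w < thr) + np.sum(B @ w > thr)
    return correct / (len(A) + len(B))

print("linear   :", fisher_accuracy(X0, X1))               # near chance
print("quadratic:", fisher_accuracy(quad_features(X0), quad_features(X1)))
```

The quadratic discriminant succeeds here because the class difference lives entirely in the second moments, which is exactly the moment information the paper shows polynomial discriminants capture.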
Kernel computations from large-scale random features obtained by Optical Processing Units
Approximating kernel functions with random features (RFs) has been a
successful application of random projections for nonparametric estimation.
However, performing random projections presents computational challenges for
large-scale problems. Recently, a new optical hardware called Optical
Processing Unit (OPU) has been developed for fast and energy-efficient
computation of large-scale RFs in the analog domain. More specifically, the OPU
performs the multiplication of input vectors by a large random matrix with
complex-valued i.i.d. Gaussian entries, followed by the application of an
element-wise squared absolute value operation - this last nonlinearity being
intrinsic to the sensing process. In this paper, we show that this operation
results in a dot-product kernel that has connections to the polynomial kernel,
and we extend this computation to arbitrary powers of the feature map.
Experiments demonstrate that the OPU kernel and its RF approximation achieve
competitive performance in applications using kernel ridge regression and
transfer learning for image classification. Crucially, thanks to the use of the
OPU, these results are obtained with time and energy savings.
Comment: 5 pages, 3 figures, submitted to ICASSP 202
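The OPU operation described above (complex Gaussian matrix multiplication followed by a squared modulus) can be simulated numerically. The closed form used below follows from a standard Gaussian-moment calculation for real inputs and unit-variance complex Gaussian entries; it is an illustrative sketch of the dot-product-kernel connection, with assumed toy dimensions, not the paper's hardware pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

d, D = 6, 200_000      # input dim, number of simulated random features
x, y = rng.normal(size=d), rng.normal(size=d)

# Simulated OPU: multiply by a complex i.i.d. Gaussian matrix, then take the
# element-wise squared modulus (the nonlinearity intrinsic to the sensor).
W = (rng.normal(size=(D, d)) + 1j * rng.normal(size=(D, d))) / np.sqrt(2)
feat_x = np.abs(W @ x) ** 2
feat_y = np.abs(W @ y) ** 2

# Empirical kernel estimate versus the closed form for CN(0, 1) entries:
# E[|w.x|^2 |w.y|^2] = ||x||^2 ||y||^2 + (x . y)^2 for real inputs.
estimate = feat_x @ feat_y / D
closed_form = (x @ x) * (y @ y) + (x @ y) ** 2
print(estimate, closed_form)      # should agree closely for large D
```

Note the (x . y)^2 term: this is where the connection to the degree-2 polynomial kernel appears.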
The Teaching Dimension of Kernel Perceptron
Algorithmic machine teaching has been studied under the linear setting where
exact teaching is possible. However, little is known for teaching nonlinear
learners. Here, we establish the sample complexity of teaching, a.k.a. the
teaching dimension, for kernelized perceptrons for different families of
feature maps. As a warm-up, we show that the teaching complexity is Θ(d) for
the exact teaching of linear perceptrons in ℝ^d, and Θ(d^k) for kernel
perceptron with a polynomial kernel of order k. Furthermore, under certain
smoothness assumptions on the data distribution, we establish a rigorous bound on
the complexity for approximately teaching a Gaussian kernel perceptron. We
provide numerical examples of the optimal (approximate) teaching set under
several canonical settings for linear, polynomial and Gaussian kernel
perceptrons.
Comment: AISTATS 202
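For intuition on why polynomial-kernel learners are harder to teach than linear ones, recall the standard count of the polynomial kernel's feature-space dimension, which grows like d^k for fixed order k. This is a generic combinatorial fact, not a result from the paper:

```python
from math import comb

def poly_feature_dim(d, k, homogeneous=False):
    # Number of monomials of degree exactly k (homogeneous kernel) or at
    # most k (inhomogeneous) in d variables: the feature-space dimension.
    return comb(d + k - 1, k) if homogeneous else comb(d + k, k)

print(poly_feature_dim(2, 2, homogeneous=True))  # x^2, x*y, y^2 -> 3
print(poly_feature_dim(2, 2))                    # adds 1, x, y  -> 6
print(poly_feature_dim(100, 3))                  # grows like d^k
```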
The bottleneck degree of algebraic varieties
A bottleneck of a smooth algebraic variety X ⊆ ℝ^n is a pair (x, y)
of distinct points on X such that the Euclidean normal spaces at x
and y contain the line spanned by x and y. The narrowness of bottlenecks
is a fundamental complexity measure in the algebraic geometry of data. In this
paper we study the number of bottlenecks of affine and projective varieties,
which we call the bottleneck degree. The bottleneck degree is a measure of the
complexity of computing all bottlenecks of an algebraic variety, using for
example numerical homotopy methods. We show that the bottleneck degree is a
function of classical invariants such as Chern classes and polar classes. We
give the formula explicitly in low dimension and provide an algorithm to
compute it in the general case.
Comment: Major revision. New introduction. Added some new illustrative lemmas
and figures. Added pseudocode for the algorithm to compute bottleneck degree.
Fixed some typos.
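The bottleneck condition can be illustrated numerically on the simplest example, an ellipse, whose bottlenecks are the endpoint pairs of its two axes. The crude grid search below is only a toy illustration of the definition, not the paper's numerical homotopy or invariant-theoretic approach.

```python
import numpy as np

# Toy example: bottlenecks of the ellipse x^2/a^2 + y^2/b^2 = 1.
a, b = 2.0, 1.0
pt = lambda t: np.array([a * np.cos(t), b * np.sin(t)])
normal = lambda t: np.array([np.cos(t) / a, np.sin(t) / b])  # gradient direction
cross2 = lambda u, v: u[0] * v[1] - u[1] * v[0]

def bottleneck_residual(s, t):
    # (p, q) is a bottleneck when the chord q - p lies in the normal
    # line at p AND at q, i.e. both 2D cross products vanish.
    p, q = pt(s), pt(t)
    c = q - p
    return abs(cross2(c, normal(s))) + abs(cross2(c, normal(t)))

# Crude grid search over parameter pairs (excluding near-identical points).
ts = np.linspace(0.0, 2 * np.pi, 181)
pairs = ((s, t) for s in ts for t in ts
         if 0.5 < abs(s - t) < 2 * np.pi - 0.5)
best = min(pairs, key=lambda st: bottleneck_residual(*st))
print([pt(u).round(3) for u in best])   # endpoints of an axis of the ellipse
```

For the ellipse this finds an axis pair; for general varieties, counting and certifying all such pairs is exactly what the bottleneck degree measures.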