Learning Random Fourier Features by Hybrid Constrained Optimization
The kernel embedding algorithm is an important component for adapting kernel
methods to large datasets. Since this algorithm accounts for a major share of
the computational cost in the testing phase, we propose a novel teacher-learner
framework for learning computation-efficient kernel embeddings from specific
data. In this framework, high-precision embeddings (the teacher) transfer the
data information to computation-efficient kernel embeddings (the learner). We
jointly select informative embedding functions and pursue an orthogonal
transformation between the two embeddings. We propose a novel approach of
constrained variational expectation maximization (CVEM), in which the
alternating direction method of multipliers (ADMM) is applied over a nonconvex
domain in the maximization step. We also propose two specific formulations
based on the prevalent Random Fourier Features (RFF), the masked and blocked
versions of Computation-Efficient RFF (CERF), obtained by imposing a random
binary mask or a block structure on the transformation matrix. Through
empirical studies of several applications on different real-world datasets, we
demonstrate that CERF significantly improves the performance of kernel methods
over RFF under fixed arithmetic-operation budgets, and is well suited to the
structured matrix multiplication used in Fastfood-type algorithms.
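The abstract includes no code; as a rough illustration of the masked-CERF idea only, here is a minimal NumPy sketch in which both the orthogonal transform and the binary mask are random placeholders (in CERF they are learned jointly via CVEM), and all names and sizes are illustrative assumptions:

```python
import numpy as np

def rff_features(X, W, b):
    """Standard random Fourier features for the Gaussian kernel:
    z(x) = sqrt(2/D) * cos(W x + b)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

rng = np.random.default_rng(0)
n, d, D = 500, 10, 256          # samples, input dim, number of features (illustrative)
sigma = 1.0                     # Gaussian kernel bandwidth (illustrative)
X = rng.standard_normal((n, d))

# High-precision "teacher" embedding: plain RFF.
W = rng.standard_normal((D, d)) / sigma
b = rng.uniform(0, 2 * np.pi, D)
Z_teacher = rff_features(X, W, b)

# Masked "learner": an orthogonal transform with a sparse binary mask on it.
# NOTE: both are random stand-ins here; the paper learns them via CVEM/ADMM.
Q, _ = np.linalg.qr(rng.standard_normal((D, D)))  # orthogonal transform
mask = rng.random((D, D)) < 0.1                   # random binary mask (10% density)
Z_learner = Z_teacher @ (Q * mask)

# Compare the kernels implied by the two embeddings, K ~ Z Z^T.
K_teacher = Z_teacher @ Z_teacher.T
K_learner = Z_learner @ Z_learner.T
print(np.linalg.norm(K_teacher - K_learner) / np.linalg.norm(K_teacher))
```

With a learned transform and mask, the sparse multiplication by `Q * mask` needs far fewer arithmetic operations than a dense transform, which is the budget the abstract refers to.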
TripleSpin - a generic compact paradigm for fast machine learning computations
We present a generic compact computational framework relying on structured
random matrices that can be applied to speed up several machine learning
algorithms with almost no loss of accuracy. The applications include new fast
LSH-based algorithms, efficient kernel computations via random feature maps,
convex optimization algorithms, quantization techniques, and many more. Certain
models within the presented paradigm are even more compressible, since they use
only bit matrices. This makes them suitable for deployment on mobile devices.
All our findings come with strong theoretical guarantees. In particular, as a
byproduct of the presented techniques, and by using a relatively new
Berry-Esseen-type CLT for random vectors, we give the first theoretical
guarantees for one of the most efficient existing LSH algorithms, based on the
structured matrix of "Practical and Optimal LSH for Angular Distance". These
guarantees, as well as the theoretical results for the other aforementioned
applications, follow from the same general theoretical principle that we
present in the paper. Our structured family contains as special cases all
previously considered structured schemes, including the recently introduced
-model. Experimental evaluation confirms the accuracy and efficiency of
TripleSpin matrices.
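As a minimal sketch of the kind of structured construction TripleSpin generalizes (not the paper's implementation), the following NumPy/SciPy code applies three Hadamard-diagonal (HD) blocks, the pseudo-random rotation underlying the cross-polytope LSH of "Practical and Optimal LSH for Angular Distance". A production version would use a fast Walsh-Hadamard transform instead of an explicit Hadamard matrix, and all sizes here are illustrative:

```python
import numpy as np
from scipy.linalg import hadamard

def hd_blocks(x, diagonals):
    """Apply HD_3 HD_2 HD_1 to x: each block multiplies by a random
    diagonal of +/-1 signs, then by an orthonormal Hadamard matrix."""
    n = x.shape[0]
    H = hadamard(n) / np.sqrt(n)   # orthonormal; n must be a power of 2
    for d in diagonals:
        x = H @ (d * x)
    return x

rng = np.random.default_rng(0)
n = 64                                        # dimension (power of 2, illustrative)
x = rng.standard_normal(n)
diagonals = [rng.choice([-1.0, 1.0], n) for _ in range(3)]

y = hd_blocks(x, diagonals)
# Each HD block is orthogonal, so the composition preserves norms:
print(np.linalg.norm(x), np.linalg.norm(y))
# Cross-polytope LSH would hash x to its largest coordinate after rotation:
print(np.argmax(np.abs(y)))
```

Because each block stores only n random signs and the Hadamard transform needs no stored matrix, the whole projection uses O(n) memory instead of the O(n^2) of an unstructured Gaussian matrix, which is the compactness the abstract claims.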
Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond
Random features are among the most popular techniques for speeding up kernel
methods in large-scale problems. Related works have been recognized by the
NeurIPS Test-of-Time Award in 2017 and as an ICML Best Paper Finalist in 2019.
The body of work on random features has grown rapidly, and hence it is
desirable to have a comprehensive overview of this topic explaining the
connections among various algorithms and theoretical results. In this survey,
we systematically review the work on random features from the past ten years.
First, the motivations, characteristics and contributions of representative
random features based algorithms are summarized according to their sampling
schemes, learning procedures, variance reduction properties and how they
exploit training data. Second, we review theoretical results that center around
the following key question: how many random features are needed to ensure high
approximation quality, or no loss in the empirical/expected risk of the
learned estimator? Third, we provide a comprehensive evaluation of popular
random features based algorithms on several large-scale benchmark datasets and
discuss their approximation quality and prediction performance for
classification. Last, we discuss the relationship between random features and
modern over-parameterized deep neural networks (DNNs), including the use of
high-dimensional random features in the analysis of DNNs, as well as the gaps
between current theoretical and empirical results. This survey may serve as a
gentle introduction to this topic, and as a users' guide for practitioners
interested in applying the representative algorithms and understanding
theoretical results under various technical assumptions. We hope that this
survey will facilitate discussion on the open problems in this topic, and more
importantly, shed light on future research directions.

Comment: A short version will be published in IEEE TPAMI.
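To make the survey's central question concrete, here is a minimal NumPy sketch (not from the survey) that measures how the RFF approximation of a Gaussian kernel improves as the number of random features D grows; the sample size, dimension, and bandwidth are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 300, 5, 1.0
X = rng.standard_normal((n, d))

# Exact Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-sq / (2 * sigma**2))

# RFF approximation K ~ Z Z^T; the Monte Carlo error shrinks
# roughly as 1/sqrt(D) as the number of features D grows.
for D in (16, 64, 256, 1024):
    W = rng.standard_normal((D, d)) / sigma
    b = rng.uniform(0, 2 * np.pi, D)
    Z = np.sqrt(2.0 / D) * np.cos(X @ W.T + b)
    err = np.abs(Z @ Z.T - K_exact).max()
    print(f"D={D:5d}  max |K_rff - K_exact| = {err:.3f}")
```

The theoretical results the survey reviews sharpen this picture, giving the number of features needed for a target approximation error or for no loss in the downstream risk.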