On Data-Dependent Random Features for Improved Generalization in Supervised Learning
The randomized-feature approach has been successfully employed in large-scale
kernel approximation and supervised learning. The distribution from which the
random features are drawn impacts the number of features required to
efficiently perform a learning task. Recently, it has been shown that employing
data-dependent randomization improves the performance in terms of the required
number of random features. In this paper, we study the randomized-feature
approach in supervised learning with a focus on good generalization.
We propose the Energy-based Exploration of Random Features (EERF) algorithm
based on a data-dependent score function that explores the set of possible
features and exploits the promising regions. We prove that the proposed score
function with high probability recovers the spectrum of the best fit within the
model class. Our empirical results on several benchmark datasets further verify
that our method requires a smaller number of random features to achieve a given
generalization error than state-of-the-art methods, while introducing
negligible pre-processing overhead. EERF can be implemented in a few lines of
code and requires no additional tuning parameters. Comment: 12 pages (pages 1-8 to appear in Proc. of the AAAI Conference on
Artificial Intelligence (AAAI), 201
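The abstract describes EERF only at a high level. The following is a minimal sketch of the general idea, not the authors' algorithm. It assumes the standard Gaussian-kernel random Fourier feature map as the data-independent base distribution. Exploration is a large candidate pool drawn from that distribution. Exploitation keeps the candidates with the highest value of the hypothetical score |(1/n) Σ_i y_i cos(wᵀx_i + b)|. The score, the deterministic top-k selection, the ridge-regression learner, and all function names are illustrative assumptions. The paper's actual score function and sampling scheme may differ.

```python
import numpy as np


def random_fourier_features(X, W, b):
    """Random Fourier feature map z(x) = sqrt(2/D) * cos(W^T x + b).

    If the columns of W are drawn from N(0, I / sigma^2) and b ~ U[0, 2*pi],
    then z(x) . z(y) approximates the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)).
    """
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)


def score_select_features(X, y, n_pool, n_keep, sigma=1.0, seed=0):
    """Hypothetical data-dependent selection in the spirit of EERF (an assumption).

    Explore: draw n_pool candidate features (w, b) from the data-independent
    Gaussian-kernel distribution. Exploit: score each candidate by
    |(1/n) * sum_i y_i cos(w^T x_i + b)| and keep the n_keep highest scores.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_pool))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_pool)
    scores = np.abs(np.mean(y[:, None] * np.cos(X @ W + b), axis=0))
    top = np.argsort(scores)[::-1][:n_keep]  # indices in descending score order
    return W[:, top], b[top], scores[top]


def ridge_fit(Phi, y, lam=1e-3):
    """Closed-form ridge regression on the feature matrix Phi of shape (n, D)."""
    D = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = np.sign(np.sin(X[:, 0]) + 0.5 * X[:, 1])  # toy labels in {-1, +1}
    W, b, _ = score_select_features(X, y, n_pool=5000, n_keep=100)
    Phi = random_fourier_features(X, W, b)
    beta = ridge_fit(Phi, y)
    print("training accuracy:", np.mean(np.sign(Phi @ beta) == y))
```

After selection, the kept features no longer give an unbiased estimate of the Gaussian kernel. They are used only as a fixed feature map for the downstream linear model. Pool size and `n_keep` are the knobs that trade pre-processing cost against the number of features used at training time.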
Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
Fast linear transforms are ubiquitous in machine learning, including the
discrete Fourier transform, discrete cosine transform, and other structured
transformations such as convolutions. All of these transforms can be
represented by dense matrix-vector multiplication, yet each has a specialized
and highly efficient (subquadratic) algorithm. We ask to what extent
hand-crafting these algorithms and implementations is necessary, what
structural priors they encode, and how much knowledge is required to
automatically learn a fast algorithm for a provided structured transform.
Motivated by a characterization of fast matrix-vector multiplication as
products of sparse matrices, we introduce a parameterization of
divide-and-conquer methods that is capable of representing a large class of
transforms. This generic formulation can automatically learn an efficient
algorithm for many important transforms; for example, it recovers the Cooley-Tukey FFT algorithm to machine precision, for dimensions up to
. Furthermore, our method can be incorporated as a lightweight
replacement of generic matrices in machine learning pipelines to learn
efficient and compressible transformations. On a standard task of compressing a
single-hidden-layer network, our method exceeds the classification accuracy of
unconstrained matrices on CIFAR-10 by 3.9 points (the first time a structured
approach has done so), with 4X faster inference and 40X fewer parameters.
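The abstract characterizes fast matrix-vector multiplication as a product of sparse matrices. The following is a minimal sketch of that idea for the FFT case it mentions. The radix-2 Cooley-Tukey recursion F_N = B_N (I_2 ⊗ F_{N/2}) P_N is unrolled into a product of sparse butterfly factors and even-odd permutations that reproduces the N-point DFT matrix. The function names are hypothetical. The twiddle factors are fixed here rather than learned. The paper learns the entries of the factors, and its exact parameterization (including how it handles the permutations) may differ from this sketch.

```python
from functools import reduce

import numpy as np


def butterfly_factor(n):
    """B_n = [[I, Omega], [I, -Omega]] with Omega = diag(w^k), w = exp(-2*pi*i/n)."""
    half = n // 2
    omega = np.diag(np.exp(-2j * np.pi * np.arange(half) / n))
    eye = np.eye(half)
    return np.block([[eye, omega], [eye, -omega]])


def even_odd_permutation(n):
    """P_n with P_n @ x = [x_0, x_2, ..., x_{n-2}, x_1, x_3, ..., x_{n-1}]."""
    order = np.concatenate([np.arange(0, n, 2), np.arange(1, n, 2)])
    return np.eye(n)[order]


def dft_as_butterfly_product(n):
    """Sparse factors whose left-to-right product equals the n-point DFT matrix.

    Uses the radix-2 Cooley-Tukey recursion F_n = B_n (I_2 kron F_{n/2}) P_n.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if n == 1:
        return [np.eye(1, dtype=complex)]
    factors = [butterfly_factor(n)]
    # I_2 kron (A B C) = (I_2 kron A)(I_2 kron B)(I_2 kron C), so the unrolled
    # recursion stays a product of sparse block-diagonal factors.
    factors += [np.kron(np.eye(2), f) for f in dft_as_butterfly_product(n // 2)]
    factors.append(even_odd_permutation(n))
    return factors


def multiply_factors(factors):
    """Dense product of the factors (for checking only; this forgoes the speedup)."""
    return reduce(np.matmul, factors)


if __name__ == "__main__":
    n = 16
    factors = dft_as_butterfly_product(n)
    x = np.random.default_rng(0).normal(size=n)
    # Apply the sparse factors right-to-left, as a fast transform would.
    y = x.astype(complex)
    for f in reversed(factors):
        y = f @ y
    print("matches np.fft.fft:", np.allclose(y, np.fft.fft(x)))
```

Each butterfly factor has two nonzeros per row, so the log2 N butterfly levels together hold 2N log2 N nonzeros. For N = 1024 that is 20,480 entries, compared with 1,048,576 entries in a dense N × N matrix. This sparsity is the source of the subquadratic cost.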