Kernel methods are among the most powerful tools in machine learning for
tackling problems expressed in terms of function values and derivatives, owing
to their capacity to represent and model complex relations. While these
methods are highly versatile, they are computationally intensive and scale
poorly to large datasets, as they require operations on Gram matrices. To
mitigate this serious computational limitation, randomized constructions have
recently been proposed in the literature, which allow the application of fast
linear algorithms. Random Fourier features (RFF) are among the most
popular and widely applied constructions: they provide an easily computable,
low-dimensional feature representation for shift-invariant kernels. Despite the
popularity of RFFs, very little is understood theoretically about their
approximation quality. In this paper, we provide a detailed finite-sample
theoretical analysis of the approximation quality of RFFs by (i) establishing
performance guarantees in uniform norm that are optimal in terms of the RFF
dimension and the growing set size, and (ii) presenting guarantees in L^r
(1 ≤ r < ∞) norms. We also propose an RFF approximation to derivatives of
a kernel, together with a theoretical study of its approximation quality.
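To make the construction concrete, below is a minimal sketch (an illustration under stated assumptions, not the paper's code) of the classical cosine feature map of Rahimi and Recht for the Gaussian kernel; the helper name rff_features and all parameter choices are assumptions. The feature inner products approximate the kernel, and differentiating the features in x gives a plug-in approximation to kernel derivatives, in the spirit of the derivative approximation mentioned above.

```python
import numpy as np

def rff_features(X, D, sigma=1.0, rng=None):
    """Map data X (n x d) to D random Fourier features approximating the
    Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Frequencies drawn from the kernel's spectral measure (Gaussian here).
    W = rng.normal(scale=1.0 / sigma, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    # Cosine features: z(x) = sqrt(2/D) cos(W^T x + b), so z(x)^T z(y) ~ k(x, y).
    # Derivatives: grad_x z(x) = -sqrt(2/D) sin(W^T x + b) * W, whose inner
    # products with z(y) give a plug-in approximation to grad_x k(x, y).
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Quick check: feature inner products approximate the exact Gram matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = rff_features(X, D=2000, sigma=1.0, rng=0)
K_approx = Z @ Z.T
K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
print(np.abs(K_approx - K_exact).max())  # small for large D
```

The maximum absolute error printed above corresponds to the uniform-norm approximation error analyzed in the paper; it shrinks as the feature dimension D grows.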