Gaussian Sketching yields a J-L Lemma in RKHS
The main contribution of the paper is to show that Gaussian sketching of a
kernel-Gram matrix $K$ yields an operator whose counterpart in an
RKHS $\mathcal{H}$ is a \emph{random projection} operator---in the spirit of
the Johnson-Lindenstrauss (J-L) lemma. To be precise, given a random matrix
$Z$ with i.i.d. Gaussian entries, we show that the sketch $ZK$
corresponds to a particular random operator in (infinite-dimensional) Hilbert
space $\mathcal{H}$ that maps functions $f \in \mathcal{H}$ to a low-dimensional
space $\mathbb{R}^d$, while preserving a weighted RKHS inner-product of the form
$\langle f, g \rangle_{\Sigma} \doteq \langle f, \Sigma g \rangle_{\mathcal{H}}$,
where $\Sigma$ is the \emph{covariance} operator induced by the data
distribution.
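For concreteness, the following is a minimal NumPy sketch of this
construction (our own illustration, not code from the paper): with an RBF
kernel and the scaling $S = ZK/\sqrt{dn}$ chosen here, the sketched inner
products concentrate around $\frac{1}{n}(K^2)_{ij}$, the finite-sample
counterpart of the $\Sigma$-weighted inner product above. The kernel choice,
dimensions, and variable names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

n, d, dim = 500, 50, 5           # sample size, sketch dimension, input dimension
X = rng.normal(size=(n, dim))    # data defining the kernel-Gram matrix
K = rbf_kernel(X, X)             # n x n Gram matrix

Z = rng.normal(size=(d, n))      # i.i.d. Gaussian sketching matrix
S = (Z @ K) / np.sqrt(d * n)     # sketch ZK; column j embeds K(., X_j) in R^d

# Sketched inner products <S_i, S_j> approximate the Sigma-weighted RKHS
# inner product <K(., X_i), Sigma_n K(., X_j)> = (1/n) (K @ K)_{ij}, where
# Sigma_n is the empirical covariance operator.
target = (K @ K) / n
approx = S.T @ S
print(np.abs(approx - target).max() / np.abs(target).max())
```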
In particular, under similar assumptions as in kernel PCA (KPCA)
or kernel $k$-means (K-$k$-means), well-separated subsets of feature-space
remain well-separated after such an operation,
which suggests similar benefits as in KPCA and/or K-$k$-means, albeit at the
much cheaper cost of a random projection.
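To illustrate the separation claim, here is a small self-contained
experiment (again our own construction; the cluster geometry, bandwidth, and
sketch dimension are arbitrary choices) that sketches two well-separated
Gaussian clusters and compares within- versus between-cluster distances in
the sketched space:

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf_kernel(A, B, gamma=0.5):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Two well-separated clusters in input space.
m, dim, d = 100, 5, 20
X = np.vstack([rng.normal(0.0, 0.3, (m, dim)),
               rng.normal(3.0, 0.3, (m, dim))])
labels = np.repeat([0, 1], m)

K = rbf_kernel(X, X)
Z = rng.normal(size=(d, 2 * m))
S = Z @ K / np.sqrt(d * 2 * m)   # d-dimensional sketched features, one column per point

# Separation survives the sketch: between-cluster distances remain much
# larger than within-cluster distances.
D = np.linalg.norm(S[:, :, None] - S[:, None, :], axis=0)
same = labels[:, None] == labels[None, :]
print(f"within={D[same].mean():.3f}  between={D[~same].mean():.3f}")
```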
In particular, our convergence rates
suggest that, given a large dataset $\{X_i\}_{i=1}^N$ of size $N$, we can build
the Gram matrix $K$ on a much smaller subsample of size $n \ll N$,
so that the sketch $ZK$ is very cheap to obtain and subsequently
apply as a projection operator on the original data $\{X_i\}_{i=1}^N$.
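A minimal sketch of this subsampling scheme, under the same illustrative
conventions as above: the Gram matrix is built only on $n$ subsampled points,
and each of the $N$ original points is then embedded through its kernel
evaluations against the subsample, so the projection step costs roughly
$O(dnN)$ rather than anything quadratic in $N$.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf_kernel(A, B, gamma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

N, n, d, dim = 5000, 200, 30, 5  # full data N, subsample n << N, sketch dim d
X = rng.normal(size=(N, dim))
sub = X[rng.choice(N, size=n, replace=False)]  # subsample defining the Gram matrix

Z = rng.normal(size=(d, n))                    # Gaussian sketch on the small Gram matrix
P = Z @ rbf_kernel(sub, X) / np.sqrt(d * n)    # d x N: all N points projected

# P[:, j] embeds the feature map K(., X_j); downstream methods such as
# k-means can now run on cheap d-dimensional vectors instead of an
# N x N Gram matrix.
```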
We verify these insights empirically on synthetic data, and on real-world
clustering applications.