The GaussianSketch for Almost Relative Error Kernel Distance
We introduce two versions of a new sketch for approximately embedding the Gaussian kernel into Euclidean inner product space. These work by truncating infinite expansions of the Gaussian kernel and carefully invoking the RecursiveTensorSketch [Ahle et al. SODA 2020]. After establishing concentration and approximation properties of these sketches, we use them to approximate the kernel distance between point sets. These sketches yield almost (1+ε)-relative error, but with a small additive error term. In the first variant, the dependence on 1/ε is poly-logarithmic, but the polynomial dependence on the original dimension d is of higher degree. In the second variant, the dependence on 1/ε is still poly-logarithmic, but the dependence on d is linear.
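To make the truncation idea concrete, here is a minimal NumPy sketch (an illustration under my own naming, not the paper's construction): it materializes the truncated Taylor feature map of the Gaussian kernel and uses it to estimate the kernel distance between point sets. The explicit tensor powers have dimension summing d^m over m ≤ M, which is only feasible for tiny d and truncation degree M; the paper instead compresses exactly these tensor powers with RecursiveTensorSketch. The truncation error decays factorially in M, which is why a degree roughly logarithmic in 1/ε suffices.

```python
import numpy as np

def truncated_gaussian_features(x, sigma=1.0, M=8):
    """Feature map phi with <phi(x), phi(y)> ~ exp(-||x - y||^2 / (2 sigma^2)).

    Writes the kernel as e^{-||x||^2/2s^2} e^{-||y||^2/2s^2} e^{<x,y>/s^2} and
    truncates the Taylor series of the last factor at degree M; the degree-m
    term contributes the flattened tensor power (x/s)^{tensor m} / sqrt(m!).
    Output dimension is sum_{m <= M} d^m, so this only works for tiny d and M.
    """
    z = np.asarray(x, dtype=float) / sigma
    scale = np.exp(-z @ z / 2.0)
    blocks, t, fact = [], np.ones(1), 1.0
    for m in range(M + 1):
        blocks.append(scale * t / np.sqrt(fact))
        t = np.outer(t, z).ravel()   # next tensor power of z
        fact *= m + 1
    return np.concatenate(blocks)

def sketched_kernel_distance(P, Q, sigma=1.0, M=8):
    """Kernel distance D(P,Q)^2 = k(P,P) + k(Q,Q) - 2 k(P,Q) becomes a plain
    Euclidean norm after embedding: D(P,Q) ~ ||Phi(P) - Phi(Q)||, where
    Phi(S) is the sum of phi(s) over s in S."""
    Phi = lambda S: sum(truncated_gaussian_features(s, sigma, M) for s in S)
    return np.linalg.norm(Phi(P) - Phi(Q))

# Sanity check against the exact Gaussian kernel distance.
rng = np.random.default_rng(0)
P, Q = rng.normal(size=(5, 3)), rng.normal(size=(4, 3))
K = lambda a, b: np.exp(-np.sum((a - b) ** 2) / 2.0)
kap = lambda A, B: sum(K(a, b) for a in A for b in B)
print(np.sqrt(kap(P, P) + kap(Q, Q) - 2 * kap(P, Q)))
print(sketched_kernel_distance(P, Q))
```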
Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation
In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic k-median and k-means problems, there is no known deterministic dimensionality reduction procedure or coreset construction that avoids an exponential dependency on the input dimension d, the precision parameter 1/ε, or k. Furthermore, there is no coreset construction that succeeds with probability 1 and whose size does not depend on the number of input points n. This has led researchers in the area to ask what is the power of randomness for clustering sketches [Feldman, WIREs Data Mining Knowl. Discov'20]. Similarly, the best approximation ratios achievable deterministically, without a complexity exponential in the dimension, are constants bounded away from 1 for both k-median and k-means, even when allowing a complexity FPT in the number of clusters k. This stands in sharp contrast with the (1+ε)-approximation achievable in that case when allowing randomization.
In this paper, we provide deterministic sketch constructions for clustering whose size bounds are close to the best-known randomized ones. We also construct a deterministic algorithm for computing a (1+ε)-approximation to k-median and k-means in high-dimensional Euclidean spaces, in time close to the best randomized complexity. Furthermore, our new insights on sketches also yield a randomized coreset construction that uses uniform sampling, which immediately improves over the recent results of [Braverman et al. FOCS '22] by a factor of k. (FOCS 2023.)
Randomized Sketches of Convex Programs with Sharp Guarantees
Random projection (RP) is a classical technique for reducing storage and
computational costs. We analyze RP-based approximations of convex programs, in
which the original optimization problem is approximated by the solution of a
lower-dimensional problem. Such dimensionality reduction is essential in
computation-limited settings, since the complexity of general convex
programming can be quite high (e.g., cubic for quadratic programs, and
substantially higher for semidefinite programs). In addition to computational
savings, random projection is also useful for reducing memory usage, and has
useful properties for privacy-sensitive optimization. We prove that the
approximation ratio of this procedure can be bounded in terms of the geometry
of the constraint set. For a broad class of random projections, including those
based on various sub-Gaussian distributions as well as randomized Hadamard and
Fourier transforms, the data matrix defining the cost function can be projected
down to the statistical dimension of the tangent cone of the constraints at the
original solution, which is often substantially smaller than the original
dimension. We illustrate consequences of our theory for various cases, including unconstrained and ℓ1-constrained least squares, support vector machines, and low-rank matrix estimation, and we discuss implications for privacy-sensitive optimization as well as connections with de-noising and compressed sensing.
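For the simplest instance of this pipeline, unconstrained least squares, the recipe is: draw a random sketch S with m rows and solve the m-row problem min_x ||S(Ax - b)||^2 instead of the full n-row one. The NumPy sketch below uses a dense Gaussian S for concreteness; the paper also covers sub-Gaussian and randomized Hadamard/Fourier sketches, and ties the required m to the statistical dimension of the tangent cone at the optimum rather than to n.

```python
import numpy as np

def sketched_least_squares(A, b, m, seed=0):
    """Approximate argmin_x ||Ax - b||^2 by solving the projected program
    min_x ||S(Ax - b)||^2, where S is an m x n Gaussian sketch with i.i.d.
    N(0, 1/m) entries. The solve cost drops from O(n d^2) to O(m d^2);
    applying a dense Gaussian S costs O(m n d), an overhead that the
    randomized Hadamard/Fourier sketches are designed to avoid."""
    rng = np.random.default_rng(seed)
    S = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, A.shape[0]))
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

rng = np.random.default_rng(2)
A = rng.normal(size=(50000, 20))
b = A @ rng.normal(size=20) + 0.1 * rng.normal(size=50000)
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sk = sketched_least_squares(A, b, m=400)
print(np.linalg.norm(A @ x_full - b), np.linalg.norm(A @ x_sk - b))
```

A constrained program keeps its constraint set and sketches only the data-dependent cost, e.g. min_{||x||_1 <= R} ||S(Ax - b)||^2 in the ℓ1-constrained case, matching the setup above in which only the data matrix defining the cost is projected.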