
    The GaussianSketch for Almost Relative Error Kernel Distance

    We introduce two versions of a new sketch for approximately embedding the Gaussian kernel into Euclidean inner product space. These work by truncating infinite expansions of the Gaussian kernel, and carefully invoking the RecursiveTensorSketch [Ahle et al. SODA 2020]. After providing concentration and approximation properties of these sketches, we use them to approximate the kernel distance between point sets. These sketches yield almost (1+ε)-relative error, but with a small additive α term. In the first variant, the dependence on 1/α is poly-logarithmic, but there is a higher-degree polynomial dependence on the original dimension d. In the second variant, the dependence on 1/α is still poly-logarithmic, but the dependence on d is linear.
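    To make the truncation idea concrete, here is a minimal Python sketch of the underlying embedding: it truncates the expansion exp(⟨x,y⟩) = Σ_q ⟨x,y⟩^q / q! after degree Q and materializes the tensor-power features explicitly, so the kernel distance between two point sets becomes a Euclidean distance between mean feature vectors. It deliberately omits the paper's RecursiveTensorSketch compression of each degree-q block (which is the paper's contribution); the truncation degree Q and the helper names are illustrative assumptions, not from the paper.

        import math
        import numpy as np

        def tensor_power(X, q):
            # Flattened q-fold tensor power of each row, so that
            # <tensor_power(x), tensor_power(y)> = <x, y>^q.
            n = X.shape[0]
            T = np.ones((n, 1))
            for _ in range(q):
                T = (T[:, :, None] * X[:, None, :]).reshape(n, -1)
            return T

        def truncated_gauss_features(X, Q=6):
            # Explicit features phi(x) with phi(x).phi(y) ~= exp(-||x-y||^2 / 2), using
            # exp(-||x-y||^2/2) = e^{-||x||^2/2} e^{-||y||^2/2} sum_q <x,y>^q / q!,
            # truncated after degree Q (the paper sketches each block instead).
            scale = np.exp(-0.5 * (X**2).sum(axis=1, keepdims=True))
            blocks = [tensor_power(X, q) / math.sqrt(math.factorial(q))
                      for q in range(Q + 1)]
            return scale * np.hstack(blocks)

        def kernel_distance(P, Q_pts, Q=6):
            # Kernel distance between point sets = distance of mean feature vectors.
            return np.linalg.norm(truncated_gauss_features(P, Q).mean(axis=0)
                                  - truncated_gauss_features(Q_pts, Q).mean(axis=0))

        rng = np.random.default_rng(0)
        P, Q_pts = rng.normal(size=(40, 3)), rng.normal(size=(50, 3)) + 0.5
        print(kernel_distance(P, Q_pts))

    The explicit degree-q block has dimension d^q, which is why this toy version only scales to small d and Q; the sketches in the paper compress each block to avoid exactly that blow-up.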

    Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation

    In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic k-median and k-means problems, there is no known deterministic dimensionality reduction procedure or coreset construction that avoids an exponential dependency on the input dimension d, the precision parameter ε^{-1}, or k. Furthermore, there is no coreset construction that succeeds with probability 1 - 1/n and whose size does not depend on the number of input points, n. This has led researchers in the area to ask what the power of randomness is for clustering sketches [Feldman, WIREs Data Mining Knowl. Discov. '20]. Similarly, the best approximation ratio achievable deterministically without a complexity exponential in the dimension is Ω(1) for both k-median and k-means, even when allowing a complexity FPT in the number of clusters k. This stands in sharp contrast with the (1+ε)-approximation achievable in that case when allowing randomization. In this paper, we provide deterministic sketch constructions for clustering whose size bounds are close to the best known randomized ones. We also construct a deterministic algorithm for computing a (1+ε)-approximation to k-median and k-means in high-dimensional Euclidean spaces in time 2^{k^2/ε^{O(1)}} · poly(nd), close to the best randomized complexity. Furthermore, our new insights on sketches also yield a randomized coreset construction that uses uniform sampling, which immediately improves over the recent results of [Braverman et al. FOCS '22] by a factor of k. (FOCS 2023.)
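    As a hedged illustration of the uniform-sampling idea mentioned at the end: a uniform sample of m points, each reweighted by n/m, gives an unbiased estimate of the k-means cost of any one fixed set of centers. A genuine coreset must be accurate simultaneously for all candidate center sets, which plain uniform sampling alone does not guarantee; the Python below only demonstrates the fixed-centers estimate, and all sizes and data are illustrative.

        import numpy as np

        def kmeans_cost(points, centers, weights=None):
            # Sum of (weighted) squared distances from each point to its nearest center.
            d2 = ((points[:, None, :] - centers[None, :, :])**2).sum(-1).min(axis=1)
            return d2.sum() if weights is None else (weights * d2).sum()

        rng = np.random.default_rng(0)
        n, d, k, m = 20000, 10, 5, 500
        X = rng.normal(size=(n, d)) + 3 * rng.integers(0, 3, size=(n, 1))  # loose clusters
        centers = rng.normal(size=(k, d))

        idx = rng.choice(n, size=m, replace=False)   # uniform sample of m points
        coreset, w = X[idx], np.full(m, n / m)       # each sample stands in for n/m points

        print(kmeans_cost(X, centers))                   # true cost
        print(kmeans_cost(coreset, centers, weights=w))  # estimate, close in expectation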

    Randomized Sketches of Convex Programs with Sharp Guarantees

    Random projection (RP) is a classical technique for reducing storage and computational costs. We analyze RP-based approximations of convex programs, in which the original optimization problem is approximated by the solution of a lower-dimensional problem. Such dimensionality reduction is essential in computation-limited settings, since the complexity of general convex programming can be quite high (e.g., cubic for quadratic programs, and substantially higher for semidefinite programs). In addition to computational savings, random projection is also useful for reducing memory usage, and has useful properties for privacy-sensitive optimization. We prove that the approximation ratio of this procedure can be bounded in terms of the geometry of the constraint set. For a broad class of random projections, including those based on various sub-Gaussian distributions as well as randomized Hadamard and Fourier transforms, the data matrix defining the cost function can be projected down to the statistical dimension of the tangent cone of the constraints at the original solution, which is often substantially smaller than the original dimension. We illustrate consequences of our theory for various cases, including unconstrained and ℓ1-constrained least squares, support vector machines, low-rank matrix estimation, and discuss implications on privacy-sensitive optimization and some connections with de-noising and compressed sensing.
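    A minimal Python sketch of the scheme for the simplest case the abstract lists, unconstrained least squares: project the data matrix and target down to m ≪ n rows and solve the smaller problem. In the unconstrained case the tangent cone is all of R^d, so its statistical dimension is d and taking m modestly larger than d already yields a good approximation; a plain Gaussian sketch stands in here for the sub-Gaussian and randomized Hadamard/Fourier projections analyzed in the paper, and all sizes are illustrative.

        import numpy as np

        rng = np.random.default_rng(1)
        n, d, m = 5000, 50, 400
        A = rng.normal(size=(n, d))
        b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

        # Original program: min_x ||Ax - b||^2, solved exactly for reference.
        x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

        # Sketched program: min_x ||S A x - S b||^2 with a random projection S.
        S = rng.normal(size=(m, n)) / np.sqrt(m)   # Gaussian sketch, m << n rows
        x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

        cost = lambda x: np.linalg.norm(A @ x - b) ** 2
        print(cost(x_ls), cost(x_sk))  # sketched cost is within a small factor of optimal

    For constrained programs the same recipe applies with the sketched objective minimized over the original constraint set; only the required m changes, governed by the tangent cone's statistical dimension.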