
    The GaussianSketch for Almost Relative Error Kernel Distance

    We introduce two versions of a new sketch for approximately embedding the Gaussian kernel into Euclidean inner product space. These work by truncating infinite expansions of the Gaussian kernel, and carefully invoking the RecursiveTensorSketch [Ahle et al. SODA 2020]. After providing concentration and approximation properties of these sketches, we use them to approximate the kernel distance between point sets. These sketches yield almost (1+ε)-relative error, but with a small additive α term. In the first variant, the dependence on 1/α is poly-logarithmic, but there is a higher-degree polynomial dependence on the original dimension d. In the second variant, the dependence on 1/α is still poly-logarithmic, but the dependence on d is linear.
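    To make the truncation idea concrete, here is a minimal Python sketch of the underlying embedding: it truncates the expansion exp(⟨x,y⟩) = Σ_q ⟨x,y⟩^q / q! after degree Q and materializes the tensor-power features explicitly, so the kernel distance between two point sets becomes a Euclidean distance between mean feature vectors. It deliberately omits the paper's RecursiveTensorSketch compression of each degree-q block (which is the paper's contribution); the truncation degree Q and the helper names are illustrative assumptions, not from the paper.

        import math
        import numpy as np

        def tensor_power(X, q):
            # Flattened q-fold tensor power of each row, so that
            # <tensor_power(x), tensor_power(y)> = <x, y>^q.
            n = X.shape[0]
            T = np.ones((n, 1))
            for _ in range(q):
                T = (T[:, :, None] * X[:, None, :]).reshape(n, -1)
            return T

        def truncated_gauss_features(X, Q=6):
            # Explicit features phi(x) with phi(x).phi(y) ~= exp(-||x-y||^2 / 2), using
            # exp(-||x-y||^2/2) = e^{-||x||^2/2} e^{-||y||^2/2} sum_q <x,y>^q / q!,
            # truncated after degree Q (the paper sketches each block instead).
            scale = np.exp(-0.5 * (X**2).sum(axis=1, keepdims=True))
            blocks = [tensor_power(X, q) / math.sqrt(math.factorial(q))
                      for q in range(Q + 1)]
            return scale * np.hstack(blocks)

        def kernel_distance(P, Q_pts, Q=6):
            # Kernel distance between point sets = distance of mean feature vectors.
            return np.linalg.norm(truncated_gauss_features(P, Q).mean(axis=0)
                                  - truncated_gauss_features(Q_pts, Q).mean(axis=0))

        rng = np.random.default_rng(0)
        P, Q_pts = rng.normal(size=(40, 3)), rng.normal(size=(50, 3)) + 0.5
        print(kernel_distance(P, Q_pts))

    The explicit degree-q block has dimension d^q, which is why this toy version only scales to small d and Q; the sketches in the paper compress each block to avoid exactly that blow-up.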

    Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation

    In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic k-median and k-means problems, there is no known deterministic dimensionality reduction procedure or coreset construction that avoids an exponential dependency on the input dimension d, the precision parameter ε^{-1}, or k. Furthermore, there is no coreset construction that succeeds with probability 1 - 1/n and whose size does not depend on the number of input points, n. This has led researchers in the area to ask what the power of randomness is for clustering sketches [Feldman, WIREs Data Mining Knowl. Discov. '20]. Similarly, the best approximation ratio achievable deterministically without a complexity exponential in the dimension is Ω(1) for both k-median and k-means, even when allowing a complexity FPT in the number of clusters k. This stands in sharp contrast with the (1+ε)-approximation achievable in that case when allowing randomization. In this paper, we provide deterministic sketch constructions for clustering whose size bounds are close to the best known randomized ones. We also construct a deterministic algorithm for computing a (1+ε)-approximation to k-median and k-means in high-dimensional Euclidean spaces in time 2^{k^2/ε^{O(1)}} · poly(nd), close to the best randomized complexity. Furthermore, our new insights on sketches also yield a randomized coreset construction that uses uniform sampling, which immediately improves over the recent results of [Braverman et al. FOCS '22] by a factor of k. (FOCS 2023.)
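    As a hedged illustration of the uniform-sampling idea mentioned at the end: a uniform sample of m points, each reweighted by n/m, gives an unbiased estimate of the k-means cost of any one fixed set of centers. A genuine coreset must be accurate simultaneously for all candidate center sets, which plain uniform sampling alone does not guarantee; the Python below only demonstrates the fixed-centers estimate, and all sizes and data are illustrative.

        import numpy as np

        def kmeans_cost(points, centers, weights=None):
            # Sum of (weighted) squared distances from each point to its nearest center.
            d2 = ((points[:, None, :] - centers[None, :, :])**2).sum(-1).min(axis=1)
            return d2.sum() if weights is None else (weights * d2).sum()

        rng = np.random.default_rng(0)
        n, d, k, m = 20000, 10, 5, 500
        X = rng.normal(size=(n, d)) + 3 * rng.integers(0, 3, size=(n, 1))  # loose clusters
        centers = rng.normal(size=(k, d))

        idx = rng.choice(n, size=m, replace=False)   # uniform sample of m points
        coreset, w = X[idx], np.full(m, n / m)       # each sample stands in for n/m points

        print(kmeans_cost(X, centers))                   # true cost
        print(kmeans_cost(coreset, centers, weights=w))  # estimate, close in expectation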

    Randomized Sketches of Convex Programs with Sharp Guarantees

    Random projection (RP) is a classical technique for reducing storage and computational costs. We analyze RP-based approximations of convex programs, in which the original optimization problem is approximated by the solution of a lower-dimensional problem. Such dimensionality reduction is essential in computation-limited settings, since the complexity of general convex programming can be quite high (e.g., cubic for quadratic programs, and substantially higher for semidefinite programs). In addition to computational savings, random projection is also useful for reducing memory usage, and has useful properties for privacy-sensitive optimization. We prove that the approximation ratio of this procedure can be bounded in terms of the geometry of the constraint set. For a broad class of random projections, including those based on various sub-Gaussian distributions as well as randomized Hadamard and Fourier transforms, the data matrix defining the cost function can be projected down to the statistical dimension of the tangent cone of the constraints at the original solution, which is often substantially smaller than the original dimension. We illustrate consequences of our theory for various cases, including unconstrained and ℓ1-constrained least squares, support vector machines, low-rank matrix estimation, and discuss implications on privacy-sensitive optimization and some connections with de-noising and compressed sensing.
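    A minimal Python sketch of the scheme for the simplest case the abstract lists, unconstrained least squares: project the data matrix and target down to m ≪ n rows and solve the smaller problem. In the unconstrained case the tangent cone is all of R^d, so its statistical dimension is d and taking m modestly larger than d already yields a good approximation; a plain Gaussian sketch stands in here for the sub-Gaussian and randomized Hadamard/Fourier projections analyzed in the paper, and all sizes are illustrative.

        import numpy as np

        rng = np.random.default_rng(1)
        n, d, m = 5000, 50, 400
        A = rng.normal(size=(n, d))
        b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

        # Original program: min_x ||Ax - b||^2, solved exactly for reference.
        x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

        # Sketched program: min_x ||S A x - S b||^2 with a random projection S.
        S = rng.normal(size=(m, n)) / np.sqrt(m)   # Gaussian sketch, m << n rows
        x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

        cost = lambda x: np.linalg.norm(A @ x - b) ** 2
        print(cost(x_ls), cost(x_sk))  # sketched cost is within a small factor of optimal

    For constrained programs the same recipe applies with the sketched objective minimized over the original constraint set; only the required m changes, governed by the tangent cone's statistical dimension.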