
    Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

    We present a technical survey of state-of-the-art approaches to data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching, and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.
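
    To make the random-sampling technique mentioned above concrete, here is a minimal sketch (our own illustration, not code from the survey) of importance sampling in the spirit of sensitivity-based and lightweight coresets for k-means: each point is sampled with probability mixing a uniform term and its share of the distance to the data mean, and reweighted so the weighted sample's cost is an unbiased estimate of the full cost for any fixed centers. The function names and constants are hypothetical.

```python
# Illustrative sketch of sampling-based data reduction (lightweight-coreset style).
# Assumption: sampling probabilities mix uniform mass with squared distance to the
# mean, a standard upper bound on k-means sensitivities; names are hypothetical.
import numpy as np

def sample_coreset(X, m, rng=None):
    """Return m weighted points whose weighted k-means cost is, for any fixed
    set of centers, an unbiased estimate of the cost on the full data X."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    d2 = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    s = d2.sum()
    p = np.full(n, 1.0 / n) if s == 0 else 0.5 / n + 0.5 * d2 / s
    idx = rng.choice(n, size=m, replace=True, p=p)
    return X[idx], 1.0 / (m * p[idx])          # points and importance weights

def kmeans_cost(P, w, C):
    """Weighted cost of points P (weights w) against centers C."""
    d2 = ((P[:, None, :] - C[None, :, :]) ** 2).sum(-1).min(1)
    return float((w * d2).sum())

rng = np.random.default_rng(0)
X = rng.standard_normal((100_000, 5))
S, w = sample_coreset(X, 1_000, rng=1)
C = rng.standard_normal((3, 5))
print(kmeans_cost(X, np.ones(len(X)), C), kmeans_cost(S, w, C))
```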

    Optimality of the Johnson-Lindenstrauss Lemma

    For any integers $d, n \geq 2$ and $1/(\min\{n,d\})^{0.4999} < \varepsilon < 1$, we show the existence of a set of $n$ vectors $X \subset \mathbb{R}^d$ such that any embedding $f : X \rightarrow \mathbb{R}^m$ satisfying $\forall x, y \in X,\ (1-\varepsilon)\|x-y\|_2^2 \le \|f(x)-f(y)\|_2^2 \le (1+\varepsilon)\|x-y\|_2^2$ must have $m = \Omega(\varepsilon^{-2} \lg n)$. This lower bound matches the upper bound given by the Johnson-Lindenstrauss lemma [JL84]. Furthermore, our lower bound holds for nearly the full range of $\varepsilon$ of interest, since there is always an isometric embedding into dimension $\min\{d, n\}$ (either the identity map, or projection onto $\operatorname{span}(X)$). Previously such a lower bound was only known to hold against linear maps $f$, and not for such a wide range of parameters $\varepsilon, n, d$ [LN16]. The best previously known lower bound for general $f$ was $m = \Omega(\varepsilon^{-2} \lg n / \lg(1/\varepsilon))$ [Wel74, Lev83, Alo03], which is suboptimal for any $\varepsilon = o(1)$. Comment: v2: simplified proof, also added reference to [Lev83].
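
    The lower bound concerns worst-case point sets; the matching upper bound holds for any set. As a quick sanity check of that upper-bound side, here is a small self-contained experiment (our own, with an arbitrarily chosen constant C) that applies a random Gaussian linear map into $m = O(\varepsilon^{-2} \lg n)$ dimensions and measures the pairwise squared-distance distortion.

```python
# Empirical check of the JL upper bound: a random Gaussian map into
# m = C * log(n) / eps^2 dimensions (C = 10 is a heuristic choice, not from
# the paper) should preserve all pairwise squared distances within 1 +/- eps.
import numpy as np

def pairwise_sq_dists(Z):
    """All pairwise squared Euclidean distances, via the Gram-matrix trick."""
    sq = (Z ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    iu = np.triu_indices(len(Z), k=1)
    return np.maximum(D[iu], 0.0)

rng = np.random.default_rng(0)
n, d, eps, C = 200, 1000, 0.25, 10.0
m = int(C * np.log(n) / eps**2)

X = rng.standard_normal((n, d))
Y = X @ (rng.standard_normal((d, m)) / np.sqrt(m))   # random linear JL map

ratio = pairwise_sq_dists(Y) / pairwise_sq_dists(X)
print(f"m = {m}; squared distances preserved within "
      f"[{ratio.min():.3f}, {ratio.max():.3f}] (target 1 +/- {eps})")
```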

    Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

    We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde A}$ that can be used to solve a general class of constrained $k$-rank approximation problems to within $(1+\epsilon)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For $k$-means dimensionality reduction, we provide $(1+\epsilon)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only 'cover' a good subspace for $\mathbf{A}$, but can be used directly to compute this subspace. Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(\log k/\epsilon^2)$ dimensions. This gives the first result that leverages the specific structure of $k$-means to achieve dimension independent of input size and sublinear in $k$.
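
    The last result suggests a simple recipe: JL-project the points to $O(\log k/\epsilon^2)$ dimensions, cluster in the sketch, then evaluate the induced partition back in the original space. Below is a hedged sketch of that recipe (our own illustration, not the paper's code); the constant C, the synthetic data, and the use of scikit-learn's KMeans are all our assumptions.

```python
# Sketch of the (9+eps)-style recipe: cluster a JL projection of the data,
# then score the resulting partition in the original space. C = 4 is a
# heuristic constant; scikit-learn is assumed available.
import numpy as np
from sklearn.cluster import KMeans

def partition_cost(X, labels, k):
    """k-means cost of a partition: squared distances to cluster centroids."""
    return sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
               for j in range(k) if np.any(labels == j))

rng = np.random.default_rng(1)
n, d, k, eps, C = 2000, 500, 10, 0.5, 4.0
means = 5 * rng.standard_normal((k, d))
X = means[rng.integers(k, size=n)] + rng.standard_normal((n, d))

m = int(C * np.log(k) / eps**2)        # target dimension: independent of n and d
Y = X @ (rng.standard_normal((d, m)) / np.sqrt(m))

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Y).labels_
print(f"reduced to m={m} dims; induced cost in original space: "
      f"{partition_cost(X, labels, k):.1f}")
```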

    Tracking the l_2 Norm with Constant Update Time

    The $\ell_2$ tracking problem is the task of obtaining a streaming algorithm that, given access to a stream of items $a_1, a_2, a_3, \ldots$ from a universe $[n]$, outputs at each time $t$ an estimate of the $\ell_2$ norm of the frequency vector $f^{(t)} \in \mathbb{R}^n$ (where $f^{(t)}_i$ is the number of occurrences of item $i$ in the stream up to time $t$). Previous work [Braverman-Chestnut-Ivkin-Nelson-Wang-Woodruff, PODS 2017] gave a streaming algorithm using the optimal $O(\epsilon^{-2}\log(1/\delta))$ words of space and $O(\epsilon^{-2}\log(1/\delta))$ update time to obtain an $\epsilon$-accurate estimate with probability at least $1-\delta$. We give the first algorithm that achieves update time $O(\log(1/\delta))$, independent of the accuracy parameter $\epsilon$, together with the nearly optimal space of $O(\epsilon^{-2}\log(1/\delta))$ words. Our algorithm is obtained using the Count Sketch of [Charikar-Chen-Farach-Colton, ICALP 2002].
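
    For orientation, here is a minimal Count-Sketch-style $\ell_2$ estimator (a textbook construction, not the paper's constant-update-time tracking algorithm or its analysis): each of $r$ rows hashes items into $b$ signed buckets, each row's squared norm is an unbiased estimate of $\|f\|_2^2$, and the median over rows drives the failure probability down to $\delta$. The class name, constants, and memoized hashing are our own simplifications.

```python
# Illustrative Count-Sketch-style l_2 estimator; constants are chosen loosely.
# A real implementation would use 4-wise independent hash functions rather
# than memoizing random values per item (which needs O(universe) memory).
import numpy as np

class L2Sketch:
    def __init__(self, eps, delta, seed=0):
        self._rng = np.random.default_rng(seed)
        self.b = int(np.ceil(8 / eps**2))              # buckets per row
        self.r = int(np.ceil(8 * np.log(1 / delta)))   # rows (median trick)
        self.table = np.zeros((self.r, self.b))
        self._bucket, self._sign = {}, {}

    def _hashes(self, item):
        if item not in self._bucket:
            self._bucket[item] = self._rng.integers(self.b, size=self.r)
            self._sign[item] = self._rng.choice([-1.0, 1.0], size=self.r)
        return self._bucket[item], self._sign[item]

    def update(self, item):
        buckets, signs = self._hashes(item)
        self.table[np.arange(self.r), buckets] += signs

    def estimate(self):
        """Median over rows of the row's squared norm, then a square root."""
        return float(np.sqrt(np.median((self.table ** 2).sum(axis=1))))

sk = L2Sketch(eps=0.1, delta=0.01)
stream = np.random.default_rng(2).integers(1000, size=20_000)
freq = np.zeros(1000)
for a in stream:
    sk.update(int(a))
    freq[a] += 1           # calling sk.estimate() here would track the norm
print(sk.estimate(), np.linalg.norm(freq))
```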

    Entropy Matters: Understanding Performance of Sparse Random Embeddings
