
    Optimality of the Johnson-Lindenstrauss Lemma

    For any integers $d, n \geq 2$ and $1/(\min\{n,d\})^{0.4999} < \varepsilon < 1$, we show the existence of a set of $n$ vectors $X \subset \mathbb{R}^d$ such that any embedding $f : X \rightarrow \mathbb{R}^m$ satisfying $\forall x,y \in X,\ (1-\varepsilon)\|x-y\|_2^2 \le \|f(x)-f(y)\|_2^2 \le (1+\varepsilon)\|x-y\|_2^2$ must have $m = \Omega(\varepsilon^{-2} \lg n)$. This lower bound matches the upper bound given by the Johnson-Lindenstrauss lemma [JL84]. Furthermore, our lower bound holds for nearly the full range of $\varepsilon$ of interest, since there is always an isometric embedding into dimension $\min\{d, n\}$ (either the identity map, or projection onto $\mathrm{span}(X)$). Previously such a lower bound was only known to hold against linear maps $f$, and not for such a wide range of parameters $\varepsilon, n, d$ [LN16]. The best previously known lower bound for general $f$ was $m = \Omega(\varepsilon^{-2} \lg n / \lg(1/\varepsilon))$ [Wel74, Lev83, Alo03], which is suboptimal for any $\varepsilon = o(1)$.
    Comment: v2: simplified proof, also added reference to Lev83
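    Below is a minimal sketch of the Johnson-Lindenstrauss upper bound that this lower bound matches: project $n$ points in $\mathbb{R}^d$ to $m = O(\varepsilon^{-2} \ln n)$ dimensions with a random Gaussian map and report the worst observed distortion of squared pairwise distances. The constant 8 in the target dimension and the synthetic point set are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n, d, eps = 200, 2000, 0.25
m = int(np.ceil(8 * np.log(n) / eps**2))      # assumed constant factor, not the paper's

X = rng.standard_normal((n, d))               # arbitrary point set
G = rng.standard_normal((m, d)) / np.sqrt(m)  # Gaussian entries scaled to N(0, 1/m)
Y = X @ G.T                                   # linear JL embedding f(x) = Gx

# ratio ||f(x)-f(y)||^2 / ||x-y||^2 over all pairs; with high probability
# it should stay close to the interval [1-eps, 1+eps]
ratios = pdist(Y, "sqeuclidean") / pdist(X, "sqeuclidean")
print(f"m = {m}; squared-distance ratios in "
      f"[{ratios.min():.3f}, {ratios.max():.3f}] (target: 1 +/- {eps})")
```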

    Random projections for linear programming

    Random projections are random linear maps, sampled from appropriate distributions, that approximately preserve certain geometrical invariants so that the approximation improves as the dimension of the space grows. The well-known Johnson-Lindenstrauss lemma states that there are random matrices with surprisingly few rows that approximately preserve pairwise Euclidean distances among a set of points. This is commonly used to speed up algorithms based on Euclidean distances. We prove that these matrices also preserve other quantities, such as the distance to a cone. We exploit this result to devise a probabilistic algorithm to solve linear programs approximately. We show that this algorithm can approximately solve very large randomly generated LP instances. We also showcase its application to an error correction coding problem.
    Comment: 26 pages, 1 figure
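    A hedged numerical sketch of the idea follows: project the rows of an equality-constrained LP $\min c^\top x$ s.t. $Ax = b,\ x \geq 0$ with a random Gaussian matrix $T$ and solve the much smaller LP $(TA)x = Tb,\ x \geq 0$. The projected problem is a relaxation (every feasible $x$ stays feasible), and the paper's results concern how close its value stays to the original one; the problem sizes and projected dimension $k$ below are arbitrary illustrative choices, not the paper's parameters or guarantees.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n, k = 1500, 2000, 200                 # original rows, variables, projected rows

A = rng.standard_normal((m, n))
b = A @ rng.random(n)                     # b chosen so the LP is feasible
c = rng.random(n)                         # c >= 0 keeps both LPs bounded below

T = rng.standard_normal((k, m)) / np.sqrt(k)   # random projection of the constraints

full = linprog(c, A_eq=A, b_eq=b, bounds=(0, None), method="highs")
proj = linprog(c, A_eq=T @ A, b_eq=T @ b, bounds=(0, None), method="highs")

print(f"original optimum:  {full.fun:.4f}")
print(f"projected optimum: {proj.fun:.4f}  (lower bound, since the projection relaxes the constraints)")
```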

    Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

    We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde A}$ that can be used to solve a general class of constrained $k$-rank approximation problems to within $(1+\epsilon)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For $k$-means dimensionality reduction, we provide $(1+\epsilon)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only `cover' a good subspace for $\mathbf{A}$, but can be used directly to compute this subspace. Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(\log k/\epsilon^2)$ dimensions. This gives the first result that leverages the specific structure of $k$-means to achieve dimension independent of input size and sublinear in $k$.
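    A minimal sketch of the final step mentioned above: JL-project the data points to $O(\log k / \epsilon^2)$ dimensions, cluster the low-dimensional points, and evaluate the induced clustering in the original space. The constant in the target dimension, the synthetic data, and the use of scikit-learn are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.random_projection import GaussianRandomProjection

k, eps = 20, 0.5
X, _ = make_blobs(n_samples=5000, n_features=500, centers=k, random_state=0)
m = int(np.ceil(8 * np.log(k) / eps**2))        # assumed constant factor

# project to m = O(log k / eps^2) dimensions before clustering
Y = GaussianRandomProjection(n_components=m, random_state=0).fit_transform(X)

def kmeans_cost(X, labels):
    """Sum of squared distances to the centroid of each assigned cluster."""
    cost = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return cost

labels_full = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
labels_proj = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)

print("cost of clustering found in full space:      ", kmeans_cost(X, labels_full))
print("cost (in full space) of projected clustering:", kmeans_cost(X, labels_proj))
```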

    Randomized Dimensionality Reduction for k-means Clustering

    We study the topic of dimensionality reduction for $k$-means clustering. Dimensionality reduction encompasses the union of two approaches: \emph{feature selection} and \emph{feature extraction}. A feature selection based algorithm for $k$-means clustering selects a small subset of the input features and then applies $k$-means clustering on the selected features. A feature extraction based algorithm for $k$-means clustering constructs a small set of new artificial features and then applies $k$-means clustering on the constructed features. Despite the significance of $k$-means clustering as well as the wealth of heuristic methods addressing it, provably accurate feature selection methods for $k$-means clustering are not known. On the other hand, two provably accurate feature extraction methods for $k$-means clustering are known in the literature; one is based on random projections and the other is based on the singular value decomposition (SVD). This paper makes further progress towards a better understanding of dimensionality reduction for $k$-means clustering. Namely, we present the first provably accurate feature selection method for $k$-means clustering and, in addition, we present two feature extraction methods. The first feature extraction method is based on random projections and it improves upon the existing results in terms of time complexity and number of features needed to be extracted. The second feature extraction method is based on fast approximate SVD factorizations and it also improves upon the existing results in terms of time complexity. The proposed algorithms are randomized and provide constant-factor approximation guarantees with respect to the optimal $k$-means objective value.
    Comment: IEEE Transactions on Information Theory, to appear
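    To complement the random-projection sketch above, here is a hedged sketch of the SVD-style feature extraction route: build a small set of artificial features from an approximate rank-$k$ factorization and run $k$-means on them. The choice of exactly $k$ components and the use of scikit-learn's randomized SVD solver are illustrative assumptions, not the paper's precise algorithm or analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import TruncatedSVD

k = 20
X, _ = make_blobs(n_samples=5000, n_features=500, centers=k, random_state=0)

# Feature extraction: k new artificial features from a fast approximate SVD.
Z = TruncatedSVD(n_components=k, algorithm="randomized",
                 random_state=0).fit_transform(X)

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z)
print("cluster sizes:", np.bincount(labels))
```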