
    Optimality of the Johnson-Lindenstrauss Lemma

    For any integers $d, n \geq 2$ and $1/(\min\{n,d\})^{0.4999} < \varepsilon < 1$, we show the existence of a set of $n$ vectors $X \subset \mathbb{R}^d$ such that any embedding $f : X \rightarrow \mathbb{R}^m$ satisfying $\forall x,y \in X,\ (1-\varepsilon)\|x-y\|_2^2 \le \|f(x)-f(y)\|_2^2 \le (1+\varepsilon)\|x-y\|_2^2$ must have $m = \Omega(\varepsilon^{-2} \lg n)$. This lower bound matches the upper bound given by the Johnson-Lindenstrauss lemma [JL84]. Furthermore, our lower bound holds for nearly the full range of $\varepsilon$ of interest, since there is always an isometric embedding into dimension $\min\{d, n\}$ (either the identity map, or projection onto $\mathrm{span}(X)$). Previously such a lower bound was only known to hold against linear maps $f$, and not for such a wide range of parameters $\varepsilon, n, d$ [LN16]. The best previously known lower bound for general $f$ was $m = \Omega(\varepsilon^{-2} \lg n / \lg(1/\varepsilon))$ [Wel74, Lev83, Alo03], which is suboptimal for any $\varepsilon = o(1)$.
    Comment: v2: simplified proof, also added reference to Lev83
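    Below is a minimal sketch of the Johnson-Lindenstrauss upper bound that this lower bound matches: project $n$ points in $\mathbb{R}^d$ to $m = O(\varepsilon^{-2} \ln n)$ dimensions with a random Gaussian map and report the worst observed distortion of squared pairwise distances. The constant 8 in the target dimension and the synthetic point set are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n, d, eps = 200, 2000, 0.25
m = int(np.ceil(8 * np.log(n) / eps**2))      # assumed constant factor, not the paper's

X = rng.standard_normal((n, d))               # arbitrary point set
G = rng.standard_normal((m, d)) / np.sqrt(m)  # Gaussian entries scaled to N(0, 1/m)
Y = X @ G.T                                   # linear JL embedding f(x) = Gx

# ratio ||f(x)-f(y)||^2 / ||x-y||^2 over all pairs; with high probability
# it should stay close to the interval [1-eps, 1+eps]
ratios = pdist(Y, "sqeuclidean") / pdist(X, "sqeuclidean")
print(f"m = {m}; squared-distance ratios in "
      f"[{ratios.min():.3f}, {ratios.max():.3f}] (target: 1 +/- {eps})")
```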

    Random projections for linear programming

    Random projections are random linear maps, sampled from appropriate distributions, that approximately preserve certain geometrical invariants so that the approximation improves as the dimension of the space grows. The well-known Johnson-Lindenstrauss lemma states that there are random matrices with surprisingly few rows that approximately preserve pairwise Euclidean distances among a set of points. This is commonly used to speed up algorithms based on Euclidean distances. We prove that these matrices also preserve other quantities, such as the distance to a cone. We exploit this result to devise a probabilistic algorithm to solve linear programs approximately. We show that this algorithm can approximately solve very large randomly generated LP instances. We also showcase its application to an error correction coding problem.
    Comment: 26 pages, 1 figure
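    A hedged numerical sketch of the idea follows: project the rows of an equality-constrained LP $\min c^\top x$ s.t. $Ax = b,\ x \geq 0$ with a random Gaussian matrix $T$ and solve the much smaller LP $(TA)x = Tb,\ x \geq 0$. The projected problem is a relaxation (every feasible $x$ stays feasible), and the paper's results concern how close its value stays to the original one; the problem sizes and projected dimension $k$ below are arbitrary illustrative choices, not the paper's parameters or guarantees.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n, k = 1500, 2000, 200                 # original rows, variables, projected rows

A = rng.standard_normal((m, n))
b = A @ rng.random(n)                     # b chosen so the LP is feasible
c = rng.random(n)                         # c >= 0 keeps both LPs bounded below

T = rng.standard_normal((k, m)) / np.sqrt(k)   # random projection of the constraints

full = linprog(c, A_eq=A, b_eq=b, bounds=(0, None), method="highs")
proj = linprog(c, A_eq=T @ A, b_eq=T @ b, bounds=(0, None), method="highs")

print(f"original optimum:  {full.fun:.4f}")
print(f"projected optimum: {proj.fun:.4f}  (lower bound, since the projection relaxes the constraints)")
```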

    Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

    We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde A}$ that can be used to solve a general class of constrained $k$-rank approximation problems to within $(1+\epsilon)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For $k$-means dimensionality reduction, we provide $(1+\epsilon)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only `cover' a good subspace for $\mathbf{A}$, but can be used directly to compute this subspace. Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(\log k/\epsilon^2)$ dimensions. This gives the first result that leverages the specific structure of $k$-means to achieve dimension independent of input size and sublinear in $k$.
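    A minimal sketch of the final step mentioned above: JL-project the data points to $O(\log k / \epsilon^2)$ dimensions, cluster the low-dimensional points, and evaluate the induced clustering in the original space. The constant in the target dimension, the synthetic data, and the use of scikit-learn are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.random_projection import GaussianRandomProjection

k, eps = 20, 0.5
X, _ = make_blobs(n_samples=5000, n_features=500, centers=k, random_state=0)
m = int(np.ceil(8 * np.log(k) / eps**2))        # assumed constant factor

# project to m = O(log k / eps^2) dimensions before clustering
Y = GaussianRandomProjection(n_components=m, random_state=0).fit_transform(X)

def kmeans_cost(X, labels):
    """Sum of squared distances to the centroid of each assigned cluster."""
    cost = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return cost

labels_full = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
labels_proj = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)

print("cost of clustering found in full space:      ", kmeans_cost(X, labels_full))
print("cost (in full space) of projected clustering:", kmeans_cost(X, labels_proj))
```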

    Randomized Dimensionality Reduction for k-means Clustering

    We study the topic of dimensionality reduction for $k$-means clustering. Dimensionality reduction encompasses the union of two approaches: \emph{feature selection} and \emph{feature extraction}. A feature selection based algorithm for $k$-means clustering selects a small subset of the input features and then applies $k$-means clustering on the selected features. A feature extraction based algorithm for $k$-means clustering constructs a small set of new artificial features and then applies $k$-means clustering on the constructed features. Despite the significance of $k$-means clustering as well as the wealth of heuristic methods addressing it, provably accurate feature selection methods for $k$-means clustering are not known. On the other hand, two provably accurate feature extraction methods for $k$-means clustering are known in the literature; one is based on random projections and the other is based on the singular value decomposition (SVD). This paper makes further progress towards a better understanding of dimensionality reduction for $k$-means clustering. Namely, we present the first provably accurate feature selection method for $k$-means clustering and, in addition, we present two feature extraction methods. The first feature extraction method is based on random projections and it improves upon the existing results in terms of time complexity and number of features needed to be extracted. The second feature extraction method is based on fast approximate SVD factorizations and it also improves upon the existing results in terms of time complexity. The proposed algorithms are randomized and provide constant-factor approximation guarantees with respect to the optimal $k$-means objective value.
    Comment: IEEE Transactions on Information Theory, to appear
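    To complement the random-projection sketch above, here is a hedged sketch of the SVD-style feature extraction route: build a small set of artificial features from an approximate rank-$k$ factorization and run $k$-means on them. The choice of exactly $k$ components and the use of scikit-learn's randomized SVD solver are illustrative assumptions, not the paper's precise algorithm or analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import TruncatedSVD

k = 20
X, _ = make_blobs(n_samples=5000, n_features=500, centers=k, random_state=0)

# Feature extraction: k new artificial features from a fast approximate SVD.
Z = TruncatedSVD(n_components=k, algorithm="randomized",
                 random_state=0).fit_transform(X)

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z)
print("cluster sizes:", np.bincount(labels))
```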