8 research outputs found

    Approximate Matrix Multiplication with Application to Linear Embeddings

    In this paper, we study the problem of approximately computing the product of two real matrices. In particular, we analyze a dimensionality-reduction-based approximation algorithm due to Sarlos [1], introducing the notion of nuclear rank as the ratio of the nuclear norm to the spectral norm. The presented bound has improved dependence on the approximation error compared to previous approaches, while the subspace onto which we project the input matrices has dimension proportional to the maximum of their nuclear ranks and is independent of the input dimensions. In addition, we provide an application of this result to linear low-dimensional embeddings: we show that any Euclidean point set with bounded nuclear rank is amenable to projection onto a number of dimensions that is independent of the input dimensionality, while achieving additive error guarantees. (Comment: 8 pages, International Symposium on Information Theory)
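    As a rough illustration of the two ingredients above, here is a minimal numpy sketch (not the paper's exact algorithm or constants; all function names are ours): the nuclear rank is read off the singular values, and the product A^T B is approximated by sketching both factors with a shared Gaussian projection, in the spirit of Sarlos-style dimensionality reduction.

```python
import numpy as np

def nuclear_rank(M):
    """Nuclear rank as defined in the abstract: nuclear norm / spectral norm."""
    s = np.linalg.svd(M, compute_uv=False)
    return s.sum() / s[0]

def approx_matmul(A, B, t, rng=None):
    """Approximate A.T @ B by projecting both factors onto a random t-dim subspace.

    A is (n, p), B is (n, q); the sketch S is a (t, n) Gaussian matrix scaled so
    that E[(S @ A).T @ (S @ B)] = A.T @ B.
    """
    rng = np.random.default_rng(rng)
    S = rng.standard_normal((t, A.shape[0])) / np.sqrt(t)
    return (S @ A).T @ (S @ B)

# Tiny demo: inputs with small nuclear rank are approximated well even for small t.
rng = np.random.default_rng(0)
n, p, q = 2000, 50, 40
A = rng.standard_normal((n, 5)) @ rng.standard_normal((5, p))  # rank-5 factor
B = rng.standard_normal((n, 5)) @ rng.standard_normal((5, q))
exact = A.T @ B
approx = approx_matmul(A, B, t=200, rng=1)
rel_err = np.linalg.norm(exact - approx) / (np.linalg.norm(A) * np.linalg.norm(B))
print(f"nuclear rank of A: {nuclear_rank(A):.1f}, relative error: {rel_err:.3f}")
```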

    Approximation and Streaming Algorithms for Projective Clustering via Random Projections

    Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective clustering problem, given $k$, $q$ and a norm $\rho \in [1,\infty]$, we have to compute a set $\mathcal{F}$ of $k$ $q$-dimensional flats such that $(\sum_{p\in P} d(p,\mathcal{F})^\rho)^{1/\rho}$ is minimized; here $d(p,\mathcal{F})$ represents the (Euclidean) distance of $p$ to the closest flat in $\mathcal{F}$. We let $f_k^q(P,\rho)$ denote the minimal value and interpret $f_k^q(P,\infty)$ to be $\max_{r\in P} d(r,\mathcal{F})$. When $\rho = 1, 2$ and $\infty$ and $q = 0$, the problem corresponds to the $k$-median, $k$-means and the $k$-center clustering problems respectively. For every $0 < \epsilon < 1$, $S \subset P$ and $\rho \ge 1$, we show that the orthogonal projection of $P$ onto a randomly chosen flat of dimension $O(((q+1)^2 \log(1/\epsilon)/\epsilon^3) \log n)$ will $\epsilon$-approximate $f_1^q(S,\rho)$. This result combines the concepts of geometric coresets and subspace embeddings based on the Johnson-Lindenstrauss Lemma. As a consequence, an orthogonal projection of $P$ onto an $O(((q+1)^2 \log((q+1)/\epsilon)/\epsilon^3) \log n)$-dimensional randomly chosen subspace $\epsilon$-approximates projective clusterings for every $k$ and $\rho$ simultaneously. Note that the dimension of this subspace is independent of the number of clusters $k$. Using this dimension reduction result, we obtain new approximation and streaming algorithms for projective clustering problems. For example, given a stream of $n$ points, we show how to compute an $\epsilon$-approximate projective clustering for every $k$ and $\rho$ simultaneously using only $O((n+d)((q+1)^2 \log((q+1)/\epsilon))/\epsilon^3 \log n)$ space. Compared to standard streaming algorithms with $\Omega(kd)$ space requirement, our approach is a significant improvement when the number of input points and their dimensions are of the same order of magnitude. (Comment: Canadian Conference on Computational Geometry, CCCG 2015)
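    To make the dimension-reduction step concrete, here is a minimal numpy sketch under our own naming: it computes an illustrative target dimension following the bound quoted above (with the unspecified constant set to 1) and orthogonally projects the point set onto a uniformly random subspace of that dimension; any projective-clustering routine can then be run on the projected points.

```python
import numpy as np

def target_dim(q, eps, n):
    """Illustrative reduced dimension: O(((q+1)^2 log((q+1)/eps)/eps^3) log n).

    The leading constant (here 1) is for illustration; the paper's analysis
    fixes it. For small inputs this can exceed d, in which case no reduction
    is needed.
    """
    return int(np.ceil(((q + 1) ** 2 * np.log((q + 1) / eps) / eps ** 3) * np.log(n)))

def random_orthogonal_projection(P, m, rng=None):
    """Orthogonally project the rows of P (an n x d array) onto a random m-dim subspace."""
    rng = np.random.default_rng(rng)
    G = rng.standard_normal((P.shape[1], m))
    Q, _ = np.linalg.qr(G)   # orthonormal basis of a uniformly random m-dim subspace
    return P @ Q             # coordinates of the projected points

# Per the abstract, the projected point set epsilon-approximates f_k^q(P, rho)
# for every k and rho simultaneously, so clustering can proceed in m dimensions.
```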

    Dimensionality reduction with subgaussian matrices: a unified theory

    We present a theory for Euclidean dimensionality reduction with subgaussian matrices which unifies several restricted isometry property and Johnson-Lindenstrauss-type results obtained earlier for specific data sets. In particular, we recover and, in several cases, improve results for sets of sparse and structured sparse vectors, low-rank matrices and tensors, and smooth manifolds. In addition, we establish a new Johnson-Lindenstrauss embedding for data sets taking the form of an infinite union of subspaces of a Hilbert space.
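    A minimal numpy sketch of the setting, with our own function names: draw a subgaussian matrix from either the Gaussian or the Rademacher ensemble, scale it so squared norms are preserved in expectation, and check empirically that the norms of a few sparse vectors are nearly preserved, as a Johnson-Lindenstrauss-type guarantee predicts.

```python
import numpy as np

def subgaussian_map(d, m, kind="rademacher", rng=None):
    """An m x d subgaussian embedding matrix, scaled so E||Ax||^2 = ||x||^2."""
    rng = np.random.default_rng(rng)
    if kind == "gaussian":
        A = rng.standard_normal((m, d))
    elif kind == "rademacher":
        A = rng.choice([-1.0, 1.0], size=(m, d))
    else:
        raise ValueError(kind)
    return A / np.sqrt(m)

# Empirical check on a small set of sparse vectors: norms are nearly preserved.
rng = np.random.default_rng(0)
d, m, s = 10_000, 300, 10                       # ambient dim, target dim, sparsity
X = np.zeros((20, d))
for row in X:                                   # rows are views; edits are in place
    row[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
A = subgaussian_map(d, m, "rademacher", rng=1)
ratios = np.linalg.norm(X @ A.T, axis=1) / np.linalg.norm(X, axis=1)
print(f"norm ratios in [{ratios.min():.3f}, {ratios.max():.3f}]")  # close to 1
```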

    Embeddings of surfaces, curves, and moving points in Euclidean space

    In this paper we show that dimensionality reduction (i.e., the Johnson-Lindenstrauss lemma) preserves not only the distances between static points, but also those between moving points, and more generally between low-dimensional flats, polynomial curves, curves with low winding degree, and polynomial surfaces. We also show that surfaces with bounded doubling dimension can be embedded into low dimension with small additive error. Finally, we show that for points with polynomial motion, the radius of the smallest enclosing ball can be preserved under dimensionality reduction.
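    The case of polynomial motion admits a short empirical illustration: since a linear map commutes with polynomial parameterizations, projecting the coefficient vectors of each trajectory projects the whole moving point. The numpy sketch below (our own construction, not the paper's proof) applies one random projection to a few polynomially moving points and measures the worst pairwise-distance distortion over a grid of times.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, deg, n_pts = 5_000, 400, 3, 8

# Each moving point is p_i(t) = sum_j C[i, j] * t**j with coefficients in R^d.
C = rng.standard_normal((n_pts, deg + 1, d))
A = rng.standard_normal((m, d)) / np.sqrt(m)    # one JL map reused for all times

# Projecting the coefficients projects the whole trajectory, since the map is
# linear: (A p_i)(t) = sum_j (A C[i, j]) * t**j.
C_proj = C @ A.T

def pos(coeffs, t):
    """Positions of all points at time t, given per-point polynomial coefficients."""
    powers = t ** np.arange(coeffs.shape[1])
    return np.einsum("ijd,j->id", coeffs, powers)

worst = 0.0
for t in np.linspace(-1.0, 1.0, 21):
    P, Pp = pos(C, t), pos(C_proj, t)
    D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)    # original distances
    Dp = np.linalg.norm(Pp[:, None] - Pp[None, :], axis=-1)  # projected distances
    mask = D > 0
    worst = max(worst, np.abs(Dp[mask] / D[mask] - 1).max())
print(f"worst relative distance distortion over the time grid: {worst:.3f}")
```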

    Random observations on random observations: Sparse signal acquisition and processing

    In recent years, signal processing has come under mounting pressure to accommodate the increasingly high-dimensional raw data generated by modern sensing systems. Despite extraordinary advances in computational power, processing the signals produced in application areas such as imaging, video, remote surveillance, spectroscopy, and genomic data analysis continues to pose a tremendous challenge. Fortunately, in many cases these high-dimensional signals contain relatively little information compared to their ambient dimensionality. For example, signals can often be well-approximated as a sparse linear combination of elements from a known basis or dictionary. Traditionally, sparse models have been exploited only after acquisition, typically for tasks such as compression. Recently, however, the applications of sparsity have greatly expanded with the emergence of compressive sensing, a new approach to data acquisition that directly exploits sparsity in order to acquire analog signals more efficiently via a small set of more general, often randomized, linear measurements. If properly chosen, the number of measurements can be much smaller than the number of Nyquist-rate samples. A common theme in this research is the use of randomness in signal acquisition, inspiring the design of hardware systems that directly implement random measurement protocols. This thesis builds on the field of compressive sensing and illustrates how sparsity can be exploited to design efficient signal processing algorithms at all stages of the information processing pipeline, with a particular focus on the manner in which randomness can be exploited to design new kinds of acquisition systems for sparse signals. Our key contributions include: (i) exploration and analysis of the appropriate properties for a sparse signal acquisition system; (ii) insight into the useful properties of random measurement schemes; (iii) analysis of an important family of algorithms for recovering sparse signals from random measurements; (iv) exploration of the impact of noise, both structured and unstructured, in the context of random measurements; and (v) algorithms that process random measurements to directly extract higher-level information or solve inference problems without resorting to full-scale signal recovery, reducing both the cost of signal acquisition and the complexity of the post-acquisition processing.
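    A minimal end-to-end numpy sketch of the acquisition-and-recovery pipeline described here, with all names ours: a k-sparse signal is acquired through far fewer random Gaussian measurements than Nyquist-rate samples, then recovered with a small orthogonal matching pursuit routine (one standard algorithm from the recovery family analyzed in this line of work).

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit: recover a k-sparse x from y = Phi @ x."""
    residual, support = y.copy(), []
    for _ in range(k):
        # Greedily pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit the signal on the chosen support via least squares.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
n, m, k = 1_000, 100, 5                          # ambient dim, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian measurement matrix
y = Phi @ x                                      # m << n linear measurements
x_hat = omp(Phi, y, k)
print(f"recovery error: {np.linalg.norm(x_hat - x):.2e}")
```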