3,159 research outputs found

    Approximation and Streaming Algorithms for Projective Clustering via Random Projections

    Full text link
    Let PP be a set of nn points in Rd\mathbb{R}^d. In the projective clustering problem, given k,qk, q and norm ρ[1,]\rho \in [1,\infty], we have to compute a set F\mathcal{F} of kk qq-dimensional flats such that (pPd(p,F)ρ)1/ρ(\sum_{p\in P}d(p, \mathcal{F})^\rho)^{1/\rho} is minimized; here d(p,F)d(p, \mathcal{F}) represents the (Euclidean) distance of pp to the closest flat in F\mathcal{F}. We let fkq(P,ρ)f_k^q(P,\rho) denote the minimal value and interpret fkq(P,)f_k^q(P,\infty) to be maxrPd(r,F)\max_{r\in P}d(r, \mathcal{F}). When ρ=1,2\rho=1,2 and \infty and q=0q=0, the problem corresponds to the kk-median, kk-mean and the kk-center clustering problems respectively. For every 0<ϵ<10 < \epsilon < 1, SPS\subset P and ρ1\rho \ge 1, we show that the orthogonal projection of PP onto a randomly chosen flat of dimension O(((q+1)2log(1/ϵ)/ϵ3)logn)O(((q+1)^2\log(1/\epsilon)/\epsilon^3) \log n) will ϵ\epsilon-approximate f1q(S,ρ)f_1^q(S,\rho). This result combines the concepts of geometric coresets and subspace embeddings based on the Johnson-Lindenstrauss Lemma. As a consequence, an orthogonal projection of PP to an O(((q+1)2log((q+1)/ϵ)/ϵ3)logn)O(((q+1)^2 \log ((q+1)/\epsilon)/\epsilon^3) \log n) dimensional randomly chosen subspace ϵ\epsilon-approximates projective clusterings for every kk and ρ\rho simultaneously. Note that the dimension of this subspace is independent of the number of clusters~kk. Using this dimension reduction result, we obtain new approximation and streaming algorithms for projective clustering problems. For example, given a stream of nn points, we show how to compute an ϵ\epsilon-approximate projective clustering for every kk and ρ\rho simultaneously using only O((n+d)((q+1)2log((q+1)/ϵ))/ϵ3logn)O((n+d)((q+1)^2\log ((q+1)/\epsilon))/\epsilon^3 \log n) space. Compared to standard streaming algorithms with Ω(kd)\Omega(kd) space requirement, our approach is a significant improvement when the number of input points and their dimensions are of the same order of magnitude.Comment: Canadian Conference on Computational Geometry (CCCG 2015

    A Dimension Reduction Scheme for the Computation of Optimal Unions of Subspaces

    Get PDF
    Given a set of points \F in a high dimensional space, the problem of finding a union of subspaces \cup_i V_i\subset \R^N that best explains the data \F increases dramatically with the dimension of \R^N. In this article, we study a class of transformations that map the problem into another one in lower dimension. We use the best model in the low dimensional space to approximate the best solution in the original high dimensional space. We then estimate the error produced between this solution and the optimal solution in the high dimensional space.Comment: 15 pages. Some corrections were added, in particular the title was changed. It will appear in "Sampling Theory in Signal and Image Processing
    corecore