Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective
clustering problem, given $k$, $q$ and a norm $\rho \in [1,\infty]$, we have to
compute a set $F$ of $k$ $q$-dimensional flats such that
$\left(\sum_{p \in P} d(p,F)^\rho\right)^{1/\rho}$ is minimized; here $d(p,F)$
denotes the (Euclidean) distance of $p$ to the closest flat in
$F$. We let $f_k^q(P,\rho)$ denote the minimal value and interpret
$f_k^q(P,\infty)$ to be $\max_{r \in P} d(r,F)$. When $\rho = 1, 2$ and
$\infty$ and $q = 0$, the problem corresponds to the $k$-median, $k$-means and the
$k$-center clustering problems, respectively.
For every $0 < \epsilon < 1$, $S \subseteq P$ and $\rho \geq 1$, we show that the
orthogonal projection of $P$ onto a randomly chosen flat of dimension
$O(((q+1)^2 \log(1/\epsilon)/\epsilon^3) \log n)$ will $\epsilon$-approximate
$f_1^q(S,\rho)$. This result combines the concepts of geometric coresets and
subspace embeddings based on the Johnson--Lindenstrauss lemma. As a consequence,
an orthogonal projection of $P$ onto a randomly chosen subspace of dimension
$O(((q+1)^2 \log((q+1)/\epsilon)/\epsilon^3) \log n)$
$\epsilon$-approximates projective clusterings for every $k$ and $\rho$
simultaneously. Note that the dimension of this subspace is independent of the
number of clusters~$k$.
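As an illustration of the kind of projection involved, the following is a minimal NumPy sketch of projecting a point set onto a uniformly random $m$-dimensional subspace. The function name and the concrete target dimension are illustrative assumptions, not from the paper; the theorem above prescribes $m = O(((q+1)^2 \log(1/\epsilon)/\epsilon^3) \log n)$.

```python
import numpy as np

def random_orthogonal_projection(P, m, seed=None):
    """Project the rows of P (n points in R^d) onto a randomly chosen
    m-dimensional subspace of R^d, returning their m coordinates.

    A random subspace is obtained by orthonormalizing a Gaussian matrix
    via QR decomposition (illustrative construction).
    """
    rng = np.random.default_rng(seed)
    n, d = P.shape
    G = rng.standard_normal((d, m))
    Q, _ = np.linalg.qr(G)   # d x m matrix with orthonormal columns
    return P @ Q             # n x m: coordinates of the projected points

# Example: 1000 points in R^500 projected down to m = 40 dimensions
# (m chosen small purely for illustration).
P = np.random.default_rng(0).standard_normal((1000, 500))
PQ = random_orthogonal_projection(P, 40, seed=1)
print(PQ.shape)
```

Any clustering objective of the form above can then be evaluated on the projected points `PQ` instead of the original high-dimensional data.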
Using this dimension reduction result, we obtain new approximation and
streaming algorithms for projective clustering problems. For example, given a
stream of $n$ points, we show how to compute an $\epsilon$-approximate
projective clustering for every $k$ and $\rho$ simultaneously using only
$O((n+d)((q+1)^2 \log((q+1)/\epsilon))/\epsilon^3 \log n)$ space. Compared to
standard streaming algorithms with $\Omega(nd)$ space requirement, our approach
is a significant improvement when the number of input points and their
dimension are of the same order of magnitude.

Comment: Canadian Conference on Computational Geometry (CCCG 2015)
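The space saving in the streaming setting can be illustrated with a minimal sketch (an assumption-laden illustration, not the paper's exact algorithm; the class name `ProjectedStream` is ours): fix one random orthogonal projection up front, then store each arriving point only in its $m$-dimensional projected form, so only the $d \times m$ projection matrix and $n$ projected points are kept, i.e. $O((n+d)m)$ numbers rather than the $O(nd)$ needed to retain raw points.

```python
import numpy as np

class ProjectedStream:
    """Illustrative streaming sketch: project each arriving point onto a
    fixed random m-dimensional subspace and store only the projection.

    Space: a d x m projection matrix plus n points of m coordinates each,
    i.e. O((n + d) m) numbers instead of O(n d) for the raw stream.
    """
    def __init__(self, d, m, seed=0):
        rng = np.random.default_rng(seed)
        G = rng.standard_normal((d, m))
        self.Q, _ = np.linalg.qr(G)  # orthonormal basis of a random subspace
        self.sketch = []             # projected points, m numbers each

    def insert(self, p):
        self.sketch.append(p @ self.Q)

# Feed 100 points from R^300 into the sketch, keeping m = 20 coordinates each.
stream = ProjectedStream(d=300, m=20)
rng = np.random.default_rng(1)
for _ in range(100):
    stream.insert(rng.standard_normal(300))
print(len(stream.sketch), stream.sketch[0].shape)
```

After the stream ends, an offline projective clustering algorithm can be run on `stream.sketch` for any desired $k$ and $\rho$, which is what makes the "every $k$ and $\rho$ simultaneously" guarantee possible.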