Approximation and Streaming Algorithms for Projective Clustering via Random Projections
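Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective
clustering problem, given $k$, $q$ and norm $\rho \in [1,\infty]$, we have to
compute a set $\mathcal{F}$ of $k$ $q$-dimensional flats such that
$(\sum_{p\in P} d(p,\mathcal{F})^\rho)^{1/\rho}$ is minimized; here
$d(p,\mathcal{F})$ represents the (Euclidean) distance of $p$ to the closest flat in
$\mathcal{F}$. We let $f_k^q(P,\rho)$ denote the minimal value and interpret
$f_k^q(P,\infty)$ to be $\max_{r\in P} d(r,\mathcal{F})$. When $\rho = 1, 2$
and $\infty$ and $q = 0$, the problem corresponds to the $k$-median, $k$-means and the
$k$-center clustering problems respectively.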
For every $0 < \epsilon < 1$, $S \subset P$ and $\rho \ge 1$, we show that the
orthogonal projection of $P$ onto a randomly chosen flat of dimension
$O(((q+1)^2\log(1/\epsilon)/\epsilon^3)\log n)$ will $\epsilon$-approximate
$f_1^q(S,\rho)$. This result combines the concepts of geometric coresets and
subspace embeddings based on the Johnson-Lindenstrauss Lemma. As a consequence,
an orthogonal projection of $P$ to an
$O(((q+1)^2\log((q+1)/\epsilon)/\epsilon^3)\log n)$ dimensional randomly chosen subspace
$\epsilon$-approximates projective clusterings for every $k$ and $\rho$
simultaneously. Note that the dimension of this subspace is independent of the
number of clusters $k$.
Using this dimension reduction result, we obtain new approximation and
streaming algorithms for projective clustering problems. For example, given a
stream of $n$ points, we show how to compute an $\epsilon$-approximate
projective clustering for every $k$ and $\rho$ simultaneously using only
$O((n+d)((q+1)^2\log((q+1)/\epsilon))/\epsilon^3 \log n)$ space. Compared to
standard streaming algorithms with $\Omega(kd)$ space requirement, our approach
is a significant improvement when the number of input points and their
dimensions are of the same order of magnitude.
Comment: Canadian Conference on Computational Geometry (CCCG 2015)
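The dimension-reduction idea in this abstract can be illustrated with a minimal sketch. It is not the authors' construction: instead of an orthogonal projection onto a random flat, it uses a scaled Gaussian matrix, a common Johnson-Lindenstrauss variant, and it only checks the simplest case (one cluster, point-shaped centers, Euclidean sum of squares). The function names, the point counts and the target dimension of 100 are all assumptions chosen for illustration.

```python
import math
import random


def one_mean_cost(points):
    """Square root of the summed squared distances to the centroid."""
    n, dim = len(points), len(points[0])
    centroid = [sum(p[j] for p in points) / n for j in range(dim)]
    total = sum((p[j] - centroid[j]) ** 2 for p in points for j in range(dim))
    return math.sqrt(total)


def gaussian_projection(points, target_dim, rng):
    """Map points to target_dim coordinates with a scaled Gaussian matrix.

    Assumption: this stands in for the random orthogonal projection
    described in the abstract; it is a different but related map.
    """
    dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
              for _ in range(target_dim)]
    return [[sum(row[j] * p[j] for j in range(dim)) for row in matrix]
            for p in points]


rng = random.Random(0)
# 30 hypothetical points in 200 dimensions, reduced to 100 dimensions.
points = [[rng.uniform(-1.0, 1.0) for _ in range(200)] for _ in range(30)]
projected = gaussian_projection(points, 100, rng)
ratio = one_mean_cost(projected) / one_mean_cost(points)
print(f"cost ratio after projection: {ratio:.3f}")
```

Because the map is linear, the centroid of the projected points is the projection of the original centroid, so the two costs are directly comparable; a ratio close to 1 is what a Johnson-Lindenstrauss style guarantee would lead one to expect.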
Analysis of Incomplete Data and an Intrinsic-Dimension Helly Theorem
The analysis of incomplete data is a long-standing challenge in practical statistics. When, as is typical, data objects are represented by points in R^d, incomplete data objects correspond to affine subspaces (lines or Δ-flats). With this motivation we study the problem of finding the minimum intersection radius r(L) of a set of lines or Δ-flats L: the least r such that there is a ball of radius r intersecting every flat in L. Known algorithms for finding the minimum enclosing ball for a point set (or clustering by several balls) do not easily extend to higher dimensional flats, primarily because “distances” between flats do not satisfy the triangle inequality. In this paper we show how to restore geometry (i.e., a substitute for the triangle inequality) to the problem, through a new analog of Helly’s theorem. This “intrinsic-dimension” Helly theorem states: for any family L of Δ-dimensional convex sets in a Hilbert space, there exist Δ + 2 sets L' ⊆ L such that r(L) ≤ 2r(L'). Based upon this we present an algorithm that computes a (1+ε)-core set L' ⊆ L, |L'| = O(Δ^4/ε), such that the ball centered at a point c with radius (1+ε)r(L') intersects every element of L. The running time of the algorithm is O(n^(Δ+1) d poly(Δ/ε)). For the case of lines or line segments (Δ = 1), the (expected) running time of the algorithm can be improved to O(n d poly(1/ε)). We note that the size of the core set depends only on the dimension of the input objects and is independent of the input size n and the dimension d of the ambient space.
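The remark that “distances” between flats violate the triangle inequality can be checked on a small example. The sketch below is an illustration only, not part of the paper's algorithm: it computes the Euclidean distance between lines in R^3 and evaluates it on three hypothetical lines A, B and C, where B meets both A and C but A and C are parallel and one unit apart. All helper names are assumptions.

```python
import math


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def line_distance(p1, d1, p2, d2):
    """Distance between two lines in R^3, each given as (point, direction)."""
    normal = cross(d1, d2)
    offset = sub(p2, p1)
    normal_len = math.sqrt(dot(normal, normal))
    if normal_len < 1e-12:
        # Parallel lines: distance from p2 to the first line.
        c = cross(offset, d1)
        return math.sqrt(dot(c, c)) / math.sqrt(dot(d1, d1))
    # Skew or intersecting lines: project the offset onto the common normal.
    return abs(dot(offset, normal)) / normal_len


# Hypothetical lines: A is the x-axis, B is the z-axis,
# C is parallel to the x-axis at height z = 1.
A = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
B = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
C = ((0.0, 0.0, 1.0), (1.0, 0.0, 0.0))

d_ab = line_distance(*A, *B)
d_bc = line_distance(*B, *C)
d_ac = line_distance(*A, *C)
print(d_ab, d_bc, d_ac)
```

Here d(A, B) and d(B, C) are both zero because B intersects each of the other two lines, while d(A, C) is positive, so d(A, C) ≤ d(A, B) + d(B, C) fails.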