Search CORE

71 research outputs found

Approximation and Streaming Algorithms for Projective Clustering via Random Projections

Author: Kerber Michael
Raghvendra Sharath
Publication venue
Publication date: 08/07/2014
Field of study

Let

P

be a set of

n

points in

\mathbb{R}^d

. In the projective clustering problem, given

k, q

and norm

\rho \in [1,\infty]

, we have to compute a set

\mathcal{F}

k

q

-dimensional flats such that

(\sum_{p\in P}d(p, \mathcal{F})^\rho)^{1/\rho}

is minimized; here

d(p, \mathcal{F})

represents the (Euclidean) distance of

p

to the closest flat in

\mathcal{F}

. We let

f_k^q(P,\rho)

denote the minimal value and interpret

f_k^q(P,\infty)

to be

\max_{r\in P}d(r, \mathcal{F})

. When

\rho=1,2

and

\infty

and

q=0

, the problem corresponds to the

k

-median,

k

-mean and the

k

-center clustering problems respectively. For every

0 < \epsilon < 1

S\subset P

and

\rho \ge 1

, we show that the orthogonal projection of

P

onto a randomly chosen flat of dimension

O(((q+1)^2\log(1/\epsilon)/\epsilon^3) \log n)

will

\epsilon

-approximate

f_1^q(S,\rho)

. This result combines the concepts of geometric coresets and subspace embeddings based on the Johnson-Lindenstrauss Lemma. As a consequence, an orthogonal projection of

P

to an

O(((q+1)^2 \log ((q+1)/\epsilon)/\epsilon^3) \log n)

dimensional randomly chosen subspace

\epsilon

-approximates projective clusterings for every

k

and

\rho

simultaneously. Note that the dimension of this subspace is independent of the number of clusters~

k

. Using this dimension reduction result, we obtain new approximation and streaming algorithms for projective clustering problems. For example, given a stream of

n

points, we show how to compute an

\epsilon

-approximate projective clustering for every

k

and

\rho

simultaneously using only

O((n+d)((q+1)^2\log ((q+1)/\epsilon))/\epsilon^3 \log n)

space. Compared to standard streaming algorithms with

\Omega(kd)

space requirement, our approach is a significant improvement when the number of input points and their dimensions are of the same order of magnitude.Comment: Canadian Conference on Computational Geometry (CCCG 2015

arXiv.org e-Print Archive

CiteSeerX

MPG.PuRe

Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction

Author: Ding Hu
Wang Zixiu
Yu Haikuo
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual European Symposium on Algorithms (ESA 2019)
Publication date: 01/01/2019
Field of study

We study the problem of k-center clustering with outliers in arbitrary metrics and Euclidean space. Though a number of methods have been developed in the past decades, it is still quite challenging to design quality guaranteed algorithm with low complexity for this problem. Our idea is inspired by the greedy method, Gonzalez\u27s algorithm, for solving the problem of ordinary k-center clustering. Based on some novel observations, we show that this greedy strategy actually can handle k-center clustering with outliers efficiently, in terms of clustering quality and time complexity. We further show that the greedy approach yields small coreset for the problem in doubling metrics, so as to reduce the time complexity significantly. Our algorithms are easy to implement in practice. We test our method on both synthetic and real datasets. The experimental results suggest that our algorithms can achieve near optimal solutions and yield lower running times comparing with existing methods

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server