5 research outputs found

    Improved analysis of D2-sampling based PTAS for k-means and other Clustering problems

    We give an improved analysis of the simple $D^2$-sampling based PTAS for the $k$-means clustering problem given by Jaiswal, Kumar, and Sen (Algorithmica, 2013). The improvement in the running time is from $O\left(nd \cdot 2^{\tilde{O}(k^2/\epsilon)}\right)$ to $O\left(nd \cdot 2^{\tilde{O}(k/\epsilon)}\right)$. Comment: arXiv admin note: substantial text overlap with arXiv:1201.420
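The $D^2$-sampling step at the heart of this PTAS is the k-means++ seeding rule: the first center is picked uniformly, and each subsequent center is drawn with probability proportional to its squared distance to the nearest center chosen so far. A minimal sketch in Python (function names and the use of plain tuples are illustrative, not taken from the paper):

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def d2_sample_centers(points, k, rng=random.Random(0)):
    """Pick k centers by D^2-sampling: the first center is uniform,
    each subsequent one is drawn with probability proportional to its
    squared distance to the nearest center chosen so far."""
    centers = [rng.choice(points)]
    for _ in range(k - 1):
        d2 = [min(sq_dist(x, c) for c in centers) for x in points]
        total = sum(d2)
        if total == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
            continue
        # inverse-transform sampling over the d2 weights
        r = rng.random() * total
        acc = 0.0
        for x, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(x)
                break
    return centers
```

The PTAS analysed in the paper repeats this sampling to build a list of candidate center sets; the sketch above shows only one draw of $k$ centers.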

    Faster Balanced Clusterings in High Dimension

    The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems, where the size of each cluster is constrained by given lower and upper bounds. These problems are motivated by applications in processing large-scale, high-dimensional data. Existing methods often need to compute complicated matchings (or min-cost flows) to satisfy the balance constraint, and thus suffer from high complexity, especially in high dimension. To address this issue, we develop an effective framework for the three balanced clustering problems, based on a novel spatial partition idea from geometry. For balanced $k$-center clustering, we provide a $4$-approximation algorithm that improves on the existing approximation factors; for balanced $k$-median and $k$-means clustering, our algorithms yield constant and $(1+\epsilon)$-approximation factors for any $\epsilon > 0$. More importantly, our algorithms achieve linear or nearly linear running times when $k$ is a constant, significantly improving on the existing ones. Our results can be easily extended to metric balanced clusterings, and the running times are sub-linear in terms of the complexity of the $n$-point metric
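For comparison, the classic baseline for the unconstrained version of $k$-center is Gonzalez's farthest-first traversal, a 2-approximation. A short sketch (illustrative only; unlike the paper's 4-approximation, it does not enforce the lower/upper bounds on cluster sizes):

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def gonzalez_k_center(points, k):
    """Farthest-first traversal: a classic 2-approximation for the
    unconstrained k-center problem. It repeatedly adds the point
    farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points,
                       key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(farthest)
    return centers
```

The balance constraint is exactly what breaks this greedy argument: a far-away point may belong to a cluster that is already full, which is why constrained methods traditionally fall back on matchings or min-cost flows.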

    Simple and sharp analysis of k-means||

    We present a simple analysis of k-means|| (Bahmani et al., PVLDB 2012) -- a distributed variant of the k-means++ algorithm (Arthur and Vassilvitskii, SODA 2007). Moreover, the bound on the number of rounds is improved from $O(\log n)$ to $O(\log n / \log\log n)$, which we show to be tight
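The rounds being bounded here are k-means||'s oversampling rounds: in each round, every point independently joins the candidate set $C$ with probability $\min(1, \ell \cdot d^2(x, C) / \phi_X(C))$, where $\ell$ is the oversampling factor and $\phi_X(C)$ the current total cost. A hedged sketch of that loop (the final reclustering of $C$ down to $k$ centers is omitted, and names are illustrative):

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans_parallel_oversample(points, ell, rounds, rng=random.Random(0)):
    """k-means||-style oversampling: each round, every point is added
    to the candidate set C independently with probability
    min(1, ell * d^2(x, C) / phi), where phi is the current total cost.
    The full algorithm then reclusters C down to k centers (omitted)."""
    C = [rng.choice(points)]
    for _ in range(rounds):
        d2 = [min(sq_dist(x, c) for c in C) for x in points]
        phi = sum(d2)
        if phi == 0:  # every point already coincides with a candidate
            break
        for x, w in zip(points, d2):
            if rng.random() < min(1.0, ell * w / phi):
                C.append(x)
    return C
```

In the distributed setting each round is one pass over the partitioned data, so reducing the number of rounds from $O(\log n)$ to $O(\log n / \log\log n)$ directly reduces the number of synchronisation steps.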

    Speeding Up Constrained $k$-Means Through 2-Means

    For the constrained 2-means problem, we present an $O\left(dn + d(\frac{1}{\epsilon})^{O(\frac{1}{\epsilon})}\log n\right)$ time algorithm. It generates a collection $U$ of approximate center pairs $(c_1, c_2)$ such that one of the pairs in $U$ induces a $(1+\epsilon)$-approximation for the problem. The existing approximation scheme for the constrained 2-means problem takes $O((\frac{1}{\epsilon})^{O(\frac{1}{\epsilon})}dn)$ time, and the existing approximation scheme for the constrained $k$-means problem takes $O((\frac{k}{\epsilon})^{O(\frac{k}{\epsilon})}dn)$ time. Using the method developed in this paper, we point out that every existing approximation scheme for the constrained $k$-means problem with running time $C(k, n, d, \epsilon)$ can be transformed into a new approximation scheme with time complexity $C(k, n, d, \epsilon)/k^{\Omega(\frac{1}{\epsilon})}$
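The output of such a scheme is a collection $U$ of candidate pairs, at least one of which is near-optimal; a downstream step evaluates each pair under the constraint and keeps the cheapest. A hedged sketch of that selection step, with a hypothetical `assign` callback standing in for the (unspecified) constraint-respecting assignment routine:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def best_pair(points, candidate_pairs, assign):
    """Given a collection U of candidate center pairs, return the pair
    with minimum 2-means cost under `assign`, a hypothetical callback
    mapping (points, pair) to a constraint-respecting 0/1 label list."""
    def cost(pair):
        labels = assign(points, pair)
        return sum(sq_dist(x, pair[lab]) for x, lab in zip(points, labels))
    return min(candidate_pairs, key=cost)

def nearest_assign(points, pair):
    """Unconstrained example assignment: each point to its nearer center."""
    return [0 if sq_dist(x, pair[0]) <= sq_dist(x, pair[1]) else 1
            for x in points]
```

Because `assign` is pluggable, the same selection loop works for any constraint (balance, chromatic, fault tolerance, etc.) for which a feasible assignment can be computed.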

    Streaming PTAS for Constrained k-Means

    We generalise the results of Bhattacharya et al. (Journal of Computing Systems, 62(1):93-115, 2018) for the list-$k$-means problem, defined as follows: for an (unknown) partition $X_1, ..., X_k$ of the dataset $X \subseteq \mathbb{R}^d$, find a list of $k$-center sets (each element in the list is a set of $k$ centers) such that at least one of the $k$-center sets $\{c_1, ..., c_k\}$ in the list gives a $(1+\varepsilon)$-approximation with respect to the cost function $\min_{\textrm{permutation } \pi} \left[ \sum_{i=1}^{k} \sum_{x \in X_i} ||x - c_{\pi(i)}||^2 \right]$. The list-$k$-means problem is important for the constrained $k$-means problem, since algorithms for the former can be converted into PTAS for various versions of the latter. The consequences of our generalisations are as follows:
    - Streaming algorithm: our $D^2$-sampling based algorithm, which runs in a single iteration, allows us to design a 2-pass, logspace streaming algorithm for the list-$k$-means problem. This can be converted into a 4-pass, logspace streaming PTAS for various constrained versions of the $k$-means problem.
    - Faster PTAS under stability: our generalisation is also useful in $k$-means clustering scenarios where finding good centers becomes easy once good centers for a few "bad" clusters have been chosen. One such scenario is clustering under stability, where the number of such bad clusters is a constant. Using the above idea, we significantly improve the running time of the known algorithm from $O(dn^3)(k \log{n})^{poly(\frac{1}{\beta}, \frac{1}{\varepsilon})}$ to $O\left(dn^3 k^{\tilde{O}_{\beta\varepsilon}(\frac{1}{\beta\varepsilon})}\right)$.
    Comment: Changes from previous version: (i) added discussion on coreset, and (ii) fixed few typo
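The cost function in the definition above, with its minimum over all permutations matching centers to the parts of the partition, can be computed directly for small $k$ by brute force (a sketch; function names are illustrative):

```python
from itertools import permutations

def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def list_kmeans_cost(partition, centers):
    """Evaluate min over permutations pi of
    sum_i sum_{x in X_i} ||x - c_{pi(i)}||^2.
    Brute force over k! permutations, so only sensible for small k."""
    k = len(partition)
    return min(
        sum(sq_dist(x, centers[pi[i]])
            for i in range(k) for x in partition[i])
        for pi in permutations(range(k))
    )
```

The permutation is what distinguishes list-$k$-means from plain $k$-means: since the partition is fixed in advance (by the constraint) rather than induced by nearest centers, the centers must be matched to the parts in the best possible order.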