Scalable and Accurate Density-Peaks Clustering on Fully Dynamic Data
Clustering is a primitive and important operator that analyzes a given dataset to discover its hidden patterns and features. Because datasets are usually updated dynamically (i.e., they accept continuous insertions and arbitrary deletions), analyzing such dynamic data is also an important topic; dynamic clustering supports it effectively but is a challenging problem. In this paper, we consider the problem of density-peaks clustering (DPC) on dynamic data. DPC is a density-based clustering algorithm that has attracted attention in many applications due to its effectiveness. We theoretically investigate the hardness of this problem to measure the efficiency of dynamic DPC algorithms. We prove that any exact solution is costly, and we propose an approximation algorithm that enables faster updates. We conduct experiments on real datasets, and the results confirm that our algorithm is much faster and more accurate than the state-of-the-art.

Amagata D. Scalable and Accurate Density-Peaks Clustering on Fully Dynamic Data. Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022, 445 (2022); https://doi.org/10.1109/BigData55660.2022.10020690
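To make the clustering primitive concrete, the classic static density-peaks procedure that dynamic DPC maintains under updates can be sketched as follows. This is a minimal NumPy sketch of the standard density-peaks idea (local density rho, distance delta to the nearest higher-density point, centers chosen by rho * delta), not the paper's dynamic algorithm; the cutoff `d_c`, the index-based tie-breaking, and the center-selection rule are simplifying assumptions.

```python
import numpy as np

def density_peaks(points, d_c, k):
    """Static density-peaks clustering sketch: each point gets a local
    density rho and a distance delta to its nearest higher-density
    neighbor; the k points maximizing rho * delta become centers."""
    n = len(points)
    # pairwise Euclidean distances
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # local density: number of neighbors within cutoff d_c (excluding self)
    rho = (dist < d_c).sum(axis=1) - 1
    # delta: distance to the nearest point of higher density
    # (density ties broken by index so the ordering is total)
    delta = np.full(n, dist.max())
    nearest_higher = np.full(n, -1)
    idx = np.arange(n)
    for i in range(n):
        higher = np.where((rho > rho[i]) | ((rho == rho[i]) & (idx < i)))[0]
        if len(higher) > 0:
            j = higher[np.argmin(dist[i, higher])]
            delta[i] = dist[i, j]
            nearest_higher[i] = j
    # centers: the k points with the largest rho * delta score
    centers = np.argsort(rho * delta)[-k:]
    labels = np.full(n, -1)
    for c_idx, c in enumerate(centers):
        labels[c] = c_idx
    # assign remaining points, in decreasing-density order, to the
    # cluster of their nearest higher-density neighbor
    for i in np.argsort(-rho, kind="stable"):
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers
```

On two well-separated groups of points, the two highest-scoring points (one per group) become centers and every other point inherits the label of its nearest denser neighbor.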
Fully Dynamic k-Clustering in Õ(k) Update Time
We present a O(1)-approximate fully dynamic algorithm for the k-median
and k-means problems on metric spaces with amortized update time Õ(k) and worst-case query time Õ(k^2). We complement our theoretical
analysis with the first in-depth experimental study for the dynamic k-median
problem on general metrics, focusing on comparing our dynamic algorithm to the
current state-of-the-art by Henzinger and Kale [ESA'20]. Finally, we also
provide a lower bound for dynamic k-median which shows that any
O(1)-approximate algorithm with Õ(poly(k)) query time must
have Ω̃(k) amortized update time, even in the incremental setting.

Comment: Accepted at NeurIPS 202
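To see why the update/query trade-off in this abstract is the interesting quantity, consider the trivial baseline it improves on: store the points and do all clustering work at query time. The sketch below is that deliberately naive baseline (a hypothetical illustration, not the paper's algorithm): inserts and deletes are O(1), but each query pays an exponential brute-force cost, which is exactly what a good dynamic algorithm avoids by spreading work across updates.

```python
import itertools
import math

class NaiveDynamicKMedian:
    """Toy fully dynamic k-median baseline: O(1) updates, expensive
    queries. Shown only to make the update-vs-query trade-off concrete;
    real dynamic algorithms answer queries in time polynomial in k."""

    def __init__(self, k):
        self.k = k
        self.points = set()  # points stored as hashable tuples

    def insert(self, p):
        self.points.add(p)

    def delete(self, p):
        self.points.discard(p)

    def cost(self, centers):
        # k-median objective: sum of distances to the nearest center
        return sum(min(math.dist(p, c) for c in centers) for p in self.points)

    def query(self):
        # brute force over all k-subsets of stored points:
        # exponential in k, fine only for tiny instances
        pts = list(self.points)
        return min(itertools.combinations(pts, self.k), key=self.cost)
```

On two far-apart pairs of points with k = 2, the optimal solution places one center in each pair, and the structure stays correct across deletions.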
Tight Bounds for Adversarially Robust Streams and Sliding Windows via Difference Estimators
In the adversarially robust streaming model, a stream of elements is
presented to an algorithm and is allowed to depend on the output of the
algorithm at earlier times during the stream. In the classic insertion-only
model of data streams, Ben-Eliezer et al. (PODS 2020, best paper award) show
how to convert a non-robust algorithm into a robust one with a roughly 1/ε
factor overhead. This was subsequently improved to a
1/√ε factor overhead by Hassidim et al. (NeurIPS 2020, oral
presentation), suppressing logarithmic factors. For general functions the
latter is known to be best possible, by a result of Kaplan et al. (CRYPTO
2021). We show how to bypass this impossibility result by developing data
stream algorithms for a large class of streaming problems, with no overhead in
the approximation factor. Our class of streaming problems includes the most
well-studied problems such as the L_p-heavy hitters problem, F_p-moment
estimation, as well as empirical entropy estimation. We substantially improve
upon all prior work on these problems, giving the first optimal dependence on
the approximation factor.
As in previous work, we obtain a general transformation that applies to any
non-robust streaming algorithm and depends on the so-called flip number.
However, the key technical innovation is that we apply the transformation to
what we call a difference estimator for the streaming problem, rather than an
estimator for the streaming problem itself. We then develop the first
difference estimators for a wide range of problems. Our difference estimator
methodology is not only applicable to the adversarially robust model, but also
to other streaming models where temporal properties of the data play a
central role. (Abstract shortened to meet arXiv limit.)

Comment: FOCS 202
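The flip number the transformation depends on is, roughly, the number of times the quantity being estimated changes by a (1 + ε) multiplicative factor over the stream; robust transformations pay roughly one fresh non-robust sketch per flip. The helper below is a hypothetical illustration of that definition (not code from the paper, and a simplification of the formal definition):

```python
def flip_number(estimates, eps):
    """Count how many times a sequence of running estimates leaves the
    (1 + eps) multiplicative band around the last recorded value.
    A small flip number means a robust transformation needs to
    'switch' to a fresh sketch only rarely."""
    flips = 0
    fixed = estimates[0]
    for e in estimates[1:]:
        if e > (1 + eps) * fixed or e < fixed / (1 + eps):
            fixed = e
            flips += 1
    return flips
```

For a monotonically growing quantity, such as a simple counter over an insertion-only stream, the value can double only logarithmically many times, so the flip number stays logarithmic in the stream length; adversarial deletions are what can drive it up.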