BanditPAM++: Faster $k$-medoids Clustering

Kang, Ryan; Lee, Donghyun; Piech, Chris; Shomorony, Ilan; Thrun, Sebastian; Tiwari, Mo; Zhang, Martin Jinye

Publication date: 28 October 2023
Abstract

Clustering is a fundamental task in data science with wide-ranging applications. In $k$-medoids clustering, cluster centers must be actual datapoints and arbitrary distance metrics may be used; these features allow for greater interpretability of the cluster centers and the clustering of exotic objects in $k$-medoids clustering, respectively. $k$-medoids clustering has recently grown in popularity due to the discovery of more efficient $k$-medoids algorithms. In particular, recent research has proposed BanditPAM, a randomized $k$-medoids algorithm with state-of-the-art complexity and clustering accuracy. In this paper, we present BanditPAM++, which accelerates BanditPAM via two algorithmic improvements, and is $O(k)$ faster than BanditPAM in complexity and substantially faster than BanditPAM in wall-clock runtime. First, we demonstrate that BanditPAM has a special structure that allows the reuse of clustering information \textit{within} each iteration. Second, we demonstrate that BanditPAM has additional structure that permits the reuse of information \textit{across} different iterations. These observations inspire our proposed algorithm, BanditPAM++, which returns the same clustering solutions as BanditPAM but often several times faster. For example, on the CIFAR10 dataset, BanditPAM++ returns the same results as BanditPAM but runs over $10\times$ faster. Finally, we provide a high-performance C++ implementation of BanditPAM++, callable from Python and R, that may be of interest to practitioners at https://github.com/motiwari/BanditPAM. Auxiliary code to reproduce all of our experiments via a one-line script is available at https://github.com/ThrunGroup/BanditPAM_plusplus_experiments.

Comment: NeurIPS 202
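
To make the $k$-medoids objective concrete, the following is a minimal sketch of the loss and of one exhaustive swap step, in which a current medoid is exchanged for a non-medoid datapoint if that lowers the loss. It is not BanditPAM or BanditPAM++: it evaluates every candidate exactly and reuses nothing, whereas the abstract describes a randomized algorithm that reuses information within and across iterations. The function names, the toy data, and the choice of distance are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def kmedoids_loss(points: Sequence[T], medoid_idx: Sequence[int],
                  dist: Callable[[T, T], float]) -> float:
    """Sum, over all points, of the distance to the nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoid_idx) for p in points)


def naive_swap_step(points: Sequence[T], medoid_idx: Sequence[int],
                    dist: Callable[[T, T], float]) -> Tuple[float, List[int]]:
    """Try every (medoid, non-medoid) exchange and return the best result.

    Hypothetical helper: an exhaustive search, shown only to illustrate
    the objective. Medoids are indices into `points`, so every cluster
    center is an actual datapoint.
    """
    best_loss = kmedoids_loss(points, medoid_idx, dist)
    best_medoids = list(medoid_idx)
    for slot in range(len(medoid_idx)):
        for candidate in range(len(points)):
            if candidate in medoid_idx:
                continue  # only non-medoids can be swapped in
            trial = list(medoid_idx)
            trial[slot] = candidate
            loss = kmedoids_loss(points, trial, dist)
            if loss < best_loss:
                best_loss, best_medoids = loss, trial
    return best_loss, best_medoids


# Toy 1-D data with two obvious groups; any distance function can be
# passed in, here the absolute difference (assumed for illustration).
points = [0, 1, 2, 10, 11, 12]
initial_loss = kmedoids_loss(points, [0, 1], lambda a, b: abs(a - b))
result = naive_swap_step(points, [0, 1], lambda a, b: abs(a - b))
print(initial_loss, result)
```

Because `dist` is an arbitrary callable and medoids are stored as indices, the same sketch applies to any objects for which a pairwise dissimilarity can be computed, which is the property the abstract highlights.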

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2310.18844

Last time updated on 18/01/2024