Clustering with minimum spanning trees: How good can it be?

Bartoszuk, Maciej; Brzozowski, Łukasz; Cena, Anna; Gagolewski, Marek

Publication date: 9 March 2023

Abstract

Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they can be meaningful in data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can overall be very competitive. Next, instead of proposing yet another algorithm that performs well on a limited set of examples, we review, study, extend, and generalise existing state-of-the-art MST-based partitioning schemes, which leads to a few new and interesting approaches. It turns out that the Genie method and the information-theoretic approaches often outperform non-MST algorithms such as k-means, Gaussian mixtures, spectral clustering, BIRCH, and classical hierarchical agglomerative procedures.
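
To make the idea of an MST-based partitioning scheme concrete, here is a minimal sketch of the simplest one: build the MST of the data and remove its k-1 longest edges, which yields the same partition as single linkage cut at k clusters. It is not the Genie method or any of the information-theoretic approaches mentioned above. The function name `mst_cut_clustering` and the toy data are assumptions made for illustration only, and the dense distance matrix limits the sketch to small datasets with distinct points.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_cut_clustering(X, k):
    """Split the rows of X into k clusters by cutting the k-1 longest MST edges.

    Hypothetical helper for illustration; assumes all points are distinct,
    because SciPy treats zero entries of a dense matrix as missing edges.
    """
    n = X.shape[0]
    # Dense pairwise Euclidean distances, O(n^2) memory.
    D = squareform(pdist(X))
    # The MST is returned as a sparse matrix holding n-1 edge weights.
    mst = minimum_spanning_tree(D).tocoo()
    # Keep the n-k shortest edges, i.e. drop the k-1 longest ones.
    order = np.argsort(mst.data)
    keep = order[: len(mst.data) - (k - 1)]
    forest = coo_matrix(
        (mst.data[keep], (mst.row[keep], mst.col[keep])), shape=(n, n)
    )
    # Each connected component of the remaining forest is one cluster.
    _, labels = connected_components(forest, directed=False)
    return labels


# Toy usage: two well-separated Gaussian blobs (synthetic data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(10, 0.5, (20, 2))])
labels = mst_cut_clustering(X, k=2)
```

Cutting only the longest edges is known to be sensitive to outliers, since a single distant point forms its own cluster; this is one reason why more elaborate MST-based schemes exist.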

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2303.05679

Last time updated on 24/03/2023