To achieve scientific progress in terms of building a cumulative body of
knowledge, careful attention to benchmarking is of the utmost importance. This
means that proposals of new methods of data pre-processing, new data-analytic
techniques, and new methods of output post-processing should be extensively
and carefully compared with existing alternatives, and that existing methods
should be subjected to neutral comparison studies. To date, benchmarking and
recommendations for benchmarking have frequently appeared in the context of
supervised learning. Unfortunately, guidelines for benchmarking in an
unsupervised setting, with clustering as an important subdomain, are scarce.
To address this problem, we discuss the theoretical, conceptual underpinnings
of benchmarking in cluster analysis by means of simulated as well as empirical
data. Subsequently, we deal with the practicalities of how to address
benchmarking questions in clustering and make foundational recommendations.

Boulesteix, Anne-Laure

Dangl, Rainer

Dean, Nema

Guyon, Isabelle

Hennig, Christian

Leisch, Friedrich

Steinley, Douglas

Van Mechelen, Iven

English

arXiv

Note: A revised version of this is now published. Please cite and read (it's
open access): Van Mechelen, I., Boulesteix, A.-L., Dangl, R., Dean, N., Hennig,
C., Leisch, F., Steinley, D., Warrens, M. J. (2023). A white paper on good
research practices in benchmarking: The case of cluster analysis. WIREs Data
Mining and Knowledge Discovery, e1511. https://doi.org/10.1002/widm.1511

arXiv.org e-Print Archive

Benchmarking in cluster analysis: A white paper

To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance, requiring that proposals of new methods are extensively and carefully compared with their best predecessors, and existing methods subjected to neutral comparison studies. Answers to benchmarking questions should be evidence-based, with the relevant evidence being collected through well-thought-out procedures, in reproducible and replicable ways. In the present paper, we review good research practices in benchmarking from the perspective of the area of cluster analysis. Discussion is given to the theoretical, conceptual underpinnings of benchmarking based on simulated and empirical data in this context. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made based on existing literature. This article is categorized under: Fundamental Concepts of Data and Knowledge > Data Concepts; Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining; Technologies > Structure Discovery and Clustering.

Boulesteix, Anne Laure

Warrens, Matthijs J.

ARTS repository - University of Groningen

A white paper on good research practices in benchmarking: The case of cluster analysis

Iven Van Mechelen

Anne‐Laure Boulesteix

Rainer Dangl

Nema Dean

Christian Hennig

Friedrich Leisch

Douglas Steinley

Matthijs J. Warrens

Crossref

University of Groningen

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Enlighten

https://eprints.gla.ac.uk/304763/2/304763.pdf
