Assessing the efficiency of clustering algorithms and goodness-of-fit measures using phytoplankton field data

Abstract

The investigation of patterns in beta diversity has received increased attention in recent years, particularly in light of new ecological theories such as the metapopulation paradigm and metacommunity theory. Traditionally, beta diversity patterns are described by cluster analysis (i.e. dendrograms), which enables the classification of samples. Clustering algorithms define the structure of dendrograms; consequently, assessing their performance is crucial. A common, although not always appropriate, approach for assessing algorithm suitability is the cophenetic correlation coefficient c. Alternatively, the 2-norm has recently been proposed as a more informative method for evaluating the distortion engendered by clustering algorithms. In the present work, the 2-norm is applied for the first time to field data and is compared with the cophenetic correlation coefficient using a set of 105 pairwise combinations of 7 clustering methods (e.g. UPGMA) and 15 (dis)similarity/distance indices (e.g. Jaccard index). In contrast to the 2-norm, the cophenetic correlation coefficient does not provide a clear indication of the efficiency of the clustering algorithms for all combinations. The two approaches did not always agree on the choice of the most faithful algorithm. Additionally, the 2-norm revealed that UPGMA is the most efficient clustering algorithm and Ward's the least. The present results suggest that goodness-of-fit measures such as the 2-norm should be applied prior to clustering analyses to obtain reliable beta diversity measures.
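The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that the 2-norm is taken as the Euclidean norm of the difference between the original and the cophenetic dissimilarities over all sample pairs, and it uses SciPy with randomly generated presence/absence data in place of the phytoplankton field data. The function name `evaluate_fit` and the variable names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist


def evaluate_fit(data, metric="jaccard", method="average"):
    """Return (cophenetic correlation, 2-norm) for one index/algorithm pair.

    Assumption: the 2-norm is computed as the Euclidean norm of the
    difference between original and cophenetic pairwise dissimilarities.
    """
    original = pdist(data, metric=metric)        # condensed dissimilarity vector
    tree = linkage(original, method=method)      # hierarchical clustering
    c, cophenetic = cophenet(tree, original)     # correlation and cophenetic distances
    norm2 = float(np.linalg.norm(original - cophenetic))
    return float(c), norm2


# Synthetic presence/absence matrix: 12 samples by 30 taxa (illustrative only).
rng = np.random.default_rng(0)
presence = rng.random((12, 30)) < 0.4

# "average" is SciPy's name for UPGMA.
results = {
    method: evaluate_fit(presence, metric="jaccard", method=method)
    for method in ("single", "complete", "average")
}

for method, (c, norm2) in results.items():
    print(f"{method:>8}: c = {c:.3f}, 2-norm = {norm2:.3f}")
```

Under this reading, a higher c and a lower 2-norm both indicate a dendrogram that distorts the original dissimilarities less. The two measures need not rank the algorithms in the same order.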

    Last time updated on 15/12/2013

    This paper was published in Enlighten.
