17 research outputs found

    Evaluation of Jackknife and Bootstrap for Defining Confidence Intervals for Pairwise Agreement Measures

    Several research fields frequently deal with the analysis of diverse classification results of the same entities. This should imply an objective detection of overlaps and divergences between the formed clusters. The congruence between classifications can be quantified by clustering agreement measures, including pairwise agreement measures. Several measures have been proposed, and the importance of obtaining confidence intervals for the point estimate when comparing these measures has been highlighted. A broad range of methods can be used to estimate confidence intervals. However, evidence is lacking about which methods are appropriate for calculating confidence intervals for most clustering agreement measures. Here we evaluate the resampling techniques of bootstrap and jackknife for the calculation of confidence intervals for clustering agreement measures. Contrary to what has been shown for some statistics, simulations showed that the jackknife performs better than the bootstrap at accurately estimating confidence intervals for pairwise agreement measures, especially when the agreement between partitions is low. The coverage of the jackknife confidence interval is robust to changes in cluster number and cluster size distribution.
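
To make the two resampling schemes in this abstract concrete, the sketch below computes a jackknife and a percentile-bootstrap confidence interval for one pairwise agreement measure, the Rand index. It is an illustration under assumptions, not the authors' procedure: the choice of the Rand index, the normal-approximation jackknife interval, the percentile bootstrap, and all function names and defaults (`z`, `n_boot`, `alpha`, `seed`) are choices made here.

```python
import itertools
import math
import random


def rand_index(a, b):
    """Fraction of object pairs on which two partitions agree."""
    pairs = list(itertools.combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def jackknife_ci(a, b, stat=rand_index, z=1.96):
    """Leave-one-object-out jackknife interval (normal approximation, an assumption)."""
    a, b = list(a), list(b)
    n = len(a)
    # Recompute the statistic with each object removed in turn.
    loo = [stat(a[:i] + a[i + 1:], b[:i] + b[i + 1:]) for i in range(n)]
    mean = sum(loo) / n
    # Standard jackknife variance estimate.
    var = (n - 1) / n * sum((x - mean) ** 2 for x in loo)
    est = stat(a, b)
    se = math.sqrt(var)
    return est - z * se, est + z * se


def bootstrap_ci(a, b, stat=rand_index, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling objects with replacement."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(stat([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    p = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
    q = [0, 0, 1, 1, 1, 2, 2, 2, 0, 2]
    print("Rand index:", rand_index(p, q))
    print("jackknife CI:", jackknife_ci(p, q))
    print("bootstrap CI:", bootstrap_ci(p, q))
```

Both functions resample objects rather than pairs, so each resample is itself a valid pair of partitions. Any other agreement measure with the same two-argument signature can be passed as `stat`.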

    Controlling and Visualizing the Precision-Recall Tradeoff for External Performance Indices

    In many machine learning problems, the performance of the results is measured by indices that often combine precision and recall. In this paper, we study the behavior of such indices as a function of the precision-recall tradeoff. We present a new tool for performance visualization and analysis, referred to as the tradeoff space, which plots the performance index as a function of the precision-recall tradeoff. We analyse the properties of this new space and show its advantages over the precision-recall space. Code related to this paper is available at: https://sites-google-com.ezproxy.universite-paris-saclay.fr/site/bhanczarhomepage/prerec

    Understanding Malvestuto’s normalized mutual information

    Malvestuto’s version of the normalized mutual information is a well-known information-theoretic index for quantifying agreement between two partitions. To further our understanding of what information on agreement between the clusters the index may reflect, we study components of the index that contain information on individual clusters, using mathematical analysis and numerical examples. The indices for individual clusters provide useful information on what is going on with specific clusters.

    Understanding the Rand index

    The Rand index continues to be one of the most popular indices for assessing agreement between two partitions. The Rand index combines two sources of information, object pairs put together, and object pairs assigned to different clusters, in both partitions. Via a decomposition of the Rand index into four asymmetric indices, we show that in many situations object pairs that were assigned to different clusters have considerable impact on the value of the overall Rand index.
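
The two sources of information named in this abstract can be seen by counting object pairs directly. The sketch below is a minimal illustration, not the paper's decomposition into four asymmetric indices: it only splits the pairs into the four standard pair types and shows how much of the Rand index comes from pairs separated in both partitions. The function names and the example partitions are hypothetical.

```python
from itertools import combinations


def pair_counts(p, q):
    """Count pairs: together in both, together only in p, only in q, apart in both."""
    both = only_p = only_q = neither = 0
    for i, j in combinations(range(len(p)), 2):
        same_p = p[i] == p[j]
        same_q = q[i] == q[j]
        if same_p and same_q:
            both += 1
        elif same_p:
            only_p += 1
        elif same_q:
            only_q += 1
        else:
            neither += 1
    return both, only_p, only_q, neither


def rand_from_counts(both, only_p, only_q, neither):
    """Rand index: agreeing pairs (together in both or apart in both) over all pairs."""
    return (both + neither) / (both + only_p + only_q + neither)


if __name__ == "__main__":
    # Two partitions of six objects into three clusters of size two.
    p = [0, 0, 1, 1, 2, 2]
    q = [0, 1, 1, 2, 2, 0]
    counts = pair_counts(p, q)
    print("pair counts:", counts)                     # (0, 3, 3, 9)
    print("Rand index:", rand_from_counts(*counts))   # 0.6
```

In this example no pair is grouped together in both partitions, yet the Rand index is 0.6, because 9 of the 15 pairs are separated in both. The whole value comes from the pairs assigned to different clusters.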