
    Evaluation of Jackknife and Bootstrap for Defining Confidence Intervals for Pairwise Agreement Measures

    Several research fields frequently deal with the analysis of diverse classification results for the same entities. This calls for objective detection of overlaps and divergences between the resulting clusters. The congruence between classifications can be quantified by clustering agreement measures, including pairwise agreement measures. Several measures have been proposed, and the importance of obtaining confidence intervals for the point estimate when comparing these measures has been highlighted. A broad range of methods can be used to estimate confidence intervals. However, evidence is lacking on which methods are appropriate for calculating confidence intervals for most clustering agreement measures. Here we evaluate the resampling techniques of bootstrap and jackknife for calculating confidence intervals for clustering agreement measures. Contrary to what has been shown for some statistics, simulations showed that the jackknife performs better than the bootstrap at accurately estimating confidence intervals for pairwise agreement measures, especially when the agreement between partitions is low. The coverage of the jackknife confidence interval is robust to changes in cluster number and cluster size distribution.
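
To make the jackknife idea concrete, here is a minimal sketch of a leave-one-out confidence interval for one pairwise agreement measure, the plain Rand index. The abstract does not state the authors' exact procedure, so everything here is an assumption for illustration: the function names `rand_index` and `jackknife_ci` are hypothetical, the interval uses jackknife pseudo-values, and the critical value `z=1.96` assumes a normal approximation at the 95% level.

```python
import math
from itertools import combinations


def rand_index(a, b):
    """Fraction of entity pairs on which two partitions agree.

    A pair agrees when it is grouped together in both partitions
    or separated in both partitions.
    """
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def jackknife_ci(a, b, measure=rand_index, z=1.96):
    """Leave-one-out jackknife estimate and interval (assumed normal approx.).

    a, b: lists of cluster labels for the same n entities (n >= 3).
    Returns (estimate, lower, upper).
    """
    n = len(a)
    full = measure(a, b)
    pseudo = []
    for k in range(n):
        # Recompute the measure with entity k removed from both partitions.
        a_k = a[:k] + a[k + 1:]
        b_k = b[:k] + b[k + 1:]
        pseudo.append(n * full - (n - 1) * measure(a_k, b_k))
    mean = sum(pseudo) / n
    var = sum((p - mean) ** 2 for p in pseudo) / (n - 1)
    se = math.sqrt(var / n)
    return mean, mean - z * se, mean + z * se


if __name__ == "__main__":
    typing_a = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
    typing_b = [0, 0, 1, 1, 1, 1, 2, 2, 0, 2]
    print(jackknife_ci(typing_a, typing_b))
```

A bootstrap interval would instead resample entities with replacement and take percentiles of the recomputed measure; the abstract reports that the jackknife gave better coverage in the authors' simulations, particularly at low agreement.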

    Ranked Adjusted Rand: integrating distance and partition information in a measure of clustering agreement

    BACKGROUND: Biological information is commonly used to cluster or classify entities of interest such as genes, conditions, species or samples. However, different sources of data can be used to classify the same set of entities, and methods that compare the performance of two data sources, or determine how well a given classification agrees with another, are frequently needed, especially in the absence of a universally accepted "gold standard" classification. RESULTS: Here, we describe a novel measure, the Ranked Adjusted Rand (RAR) index. RAR differs from existing methods by evaluating the extent of agreement between any two groupings while taking into account the intercluster distances. This characteristic is relevant when evaluating pairs of entities grouped in the same cluster by one method and separated by another. The latter method may assign them to close neighbour clusters or, on the contrary, to clusters that are far apart from each other. RAR is applicable even when intercluster distance information is absent for both or one of the groupings. When it is absent for both, RAR is equal to its predecessor, the Adjusted Rand (HA) index. Artificially designed clusterings were used to demonstrate situations in which only RAR was able to detect differences in the grouping patterns. A study with larger simulated clusterings confirmed that, in realistic conditions, RAR effectively integrates distance and partition information. The new method was applied to biological examples to compare 1) two microbial typing methods, 2) two gene regulatory network distances and 3) microarray gene expression data with pathway information. In the first application, one of the methods does not provide intercluster distances while the other produced a hierarchical clustering. RAR proved to be more sensitive than HA in choosing the threshold for defining clusters in the hierarchical method that maximizes agreement between the results of both methods.
CONCLUSION: The major advantage of RAR is that it combines cluster distance and partition information, while the previously available methods used only the latter. RAR should be used in the research problems where HA was previously used, because in the absence of intercluster distance effects it is an equally effective measure, and in the presence of distance effects it is a more complete one.
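
For reference, the baseline that RAR extends, the Adjusted Rand (HA) index of Hubert and Arabie, can be sketched from the contingency table of two partitions. This is not the RAR index: the abstract gives no formula for the distance weighting, so none is attempted here. The function name `adjusted_rand` is hypothetical, and returning 1.0 when both partitions are trivial is a convention chosen for this sketch.

```python
from collections import Counter
from math import comb


def adjusted_rand(a, b):
    """Adjusted Rand index between two label lists over the same entities."""
    n = len(a)
    # Pairs co-clustered in both partitions (contingency-table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    # Pairs co-clustered within each partition separately (table margins).
    sum_rows = sum(comb(c, 2) for c in Counter(a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(b).values())
    # Expected co-clustered pairs under random labelling with fixed margins.
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case: both partitions are all-in-one or all singletons.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2]))
```

Because this index only counts whether pairs are grouped together or apart, it scores a pair split across neighbouring clusters the same as a pair split across distant ones, which is the gap the abstract says RAR addresses.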

    Machine Learning Methodologies to Analyse Antibiotic and Biocide Susceptibility in Staphylococcus aureus

    Background: The rise of antibiotic resistance in pathogenic bacteria is a significant problem for the treatment of infectious diseases. Resistance is usually selected by the antibiotic itself; however, biocides might also co-select for resistance to antibiotics. Although resistance to biocides is poorly defined, different in vitro studies have shown that mutants presenting low susceptibility to biocides also have reduced susceptibility to antibiotics. However, studies with natural bacterial isolates are more limited and there are no clear conclusions as to whether the use of biocides results in the development of multidrug-resistant bacteria. Methods: The main goal is to perform an unbiased, blind evaluation of the relationship between reduced susceptibility to antibiotics and to biocides in natural isolates of Staphylococcus aureus. One of the largest data sets ever studied, comprising 1632 human clinical isolates of S. aureus originating from around the world, was analysed. Phenotypic characterization of susceptibility to 13 antibiotics and 4 biocides was performed for all the strains. Complex links between reduced susceptibility to biocides and antibiotics are difficult to elucidate in phenotypic data using standard statistical approaches. Therefore, machine learning techniques were applied to explore the data. Results: In this pioneering study, we demonstrated that reduced susceptibility to two common biocides, chlorhexidine and benzalkonium chloride, which belong to different structural families, is associated with multidrug resistance. We have consistently found that a minimum inhibitory concentration greater than 2 mg/L for both biocides is related to antibiotic non-susceptibility in S. aureus. Conclusions: Two important results emerged from our work, one methodological and the other relevant to the field of antibiotic resistance. We could not conclude whether the use of antibiotics selects for biocide resistance or vice versa.
However, the observed association between multiple resistance and two commonly used biocides may be of concern for the treatment of infectious diseases in the future.