17 research outputs found

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196

    Bioavailable iron in the Southern Ocean: the significance of the iceberg conveyor belt

    Productivity in the Southern Ocean is iron-limited, and the supply of iron dissolved from aeolian dust is believed to be the main source from outside the marine reservoir. Glacial sediment sources of iron have rarely been considered, as the iron has been assumed to be inert and non-bioavailable. This study demonstrates the presence of potentially bioavailable Fe as ferrihydrite and goethite in nanoparticulate clusters, in sediments collected from icebergs in the Southern Ocean and glaciers on the Antarctic landmass. Nanoparticles in ice can be transported by icebergs away from coastal regions in the Southern Ocean, enabling melting to release bioavailable Fe to the open ocean. The abundance of nanoparticulate iron has been measured by an ascorbate extraction. These data indicate that the fluxes of bioavailable iron supplied to the Southern Ocean from aeolian dust (0.01–0.13 Tg yr⁻¹) and icebergs (0.06–0.12 Tg yr⁻¹) are comparable. Increases in iceberg production thus have the capacity to increase productivity, and this newly identified negative feedback may help to mitigate fossil fuel emissions.

    Drug-drug interactions and QT prolongation as a commonly assessed cardiac effect - comprehensive overview of clinical trials

    Specific Variants in the MLH1 Gene Region May Drive DNA Methylation, Loss of Protein Expression, and MSI-H Colorectal Cancer

    Background: We previously identified an association between a promoter SNP (rs1800734) in the mismatch repair gene MLH1 and microsatellite unstable (MSI-H) colorectal cancers (CRCs) in two samples. The current study expanded on this finding as we explored the genetic basis of DNA methylation in this region of chromosome 3. We hypothesized that specific polymorphisms in the MLH1 gene region predispose it to DNA methylation, resulting in the loss of MLH1 gene expression and mismatch-repair function, and consequently in genome-wide microsatellite instability. Methodology/Principal Findings: We first tested our hypothesis in one sample from Ontario (901 cases, 1,097 controls) and replicated major findings in two additional samples from Newfoundland and Labrador (479 cases, 336 controls) and from Seattle (591 cases, 629 controls). Logistic regression was used to test for association between SNPs in the region of MLH1 and CRC, MSI-H CRC, MLH1 gene expression in CRC, and DNA methylation in CRC. The association between rs1800734 and MSI-H CRCs, previously reported in Ontario and Newfoundland, was replicated in the Seattle sample. Two additional SNPs, in strong linkage disequilibrium with rs1800734, showed strong associations with MLH1 promoter methylation, loss of MLH1 protein, and MSI-H CRC in all three samples. The logistic regression model of MSI-H CRC that included MLH1 promoter methylation status and MLH1 immunohistochemistry (IHC) status fit most parsimoniously in all three samples combined. When rs1800734 was added to this model, its effect was not statistically significant (P-value = 0.72 vs. 2.3×10⁻⁴ when the SNP was examined alone). Conclusions/Significance: The observed association of rs1800734 with MSI-H CRC occurs through its effect on MLH1 promoter methylation, MLH1 IHC deficiency, or both.