Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
Over the past five decades, k-means has become the clustering algorithm of
choice in many application domains primarily due to its simplicity, time/space
efficiency, and invariance to the ordering of the data points. Unfortunately,
the algorithm's sensitivity to the initial selection of the cluster centers
remains its most serious drawback. Numerous initialization methods have
been proposed to address this drawback. Many of these methods, however, have
time complexity superlinear in the number of data points, which makes them
impractical for large data sets. On the other hand, linear methods are often
random and/or sensitive to the ordering of the data points. These methods are
generally unreliable in that the quality of their results is unpredictable.
Therefore, it is common practice to perform multiple runs of such methods and
take the output of the run that produces the best results. Such a practice,
however, greatly increases the computational requirements of the otherwise
highly efficient k-means algorithm. In this chapter, we investigate the
empirical performance of six linear, deterministic (non-random), and
order-invariant k-means initialization methods on a large and diverse
collection of data sets from the UCI Machine Learning Repository. The results
demonstrate that two relatively unknown hierarchical initialization methods due
to Su and Dy outperform the remaining four methods with respect to two
objective effectiveness criteria. In addition, a recent method due to Erisoglu
et al. performs surprisingly poorly.
Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms
(Springer, 2014). arXiv admin note: substantial text overlap with
arXiv:1304.7465, arXiv:1209.196
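The abstract contrasts deterministic, order-invariant initializers with random, order-sensitive ones. The sketch below illustrates the idea. It assumes the commonly cited description of one of Su and Dy's hierarchical methods, variance partitioning ("Var-Part"):

1. Start with all points in one cluster.
2. Repeatedly split the cluster with the largest sum of squared errors (SSE) through its mean, perpendicular to the coordinate axis of greatest variance.
3. Stop when there are k clusters, and use their centroids as the initial centres.

The function `var_part_init` and its helpers are hypothetical names. This is not the chapter's code, and details such as tie-breaking may differ from the original method.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _mean(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def _sse(points: List[Point], centre: List[float]) -> float:
    """Sum of squared Euclidean distances from each point to `centre`."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, centre)) for p in points)


def var_part_init(data: List[Point], k: int) -> List[List[float]]:
    """Hypothetical sketch of Var-Part-style variance-based binary splitting.

    No random numbers are drawn, and every split depends only on coordinate
    values, so the centres do not depend on the order of `data` (up to
    floating-point summation order).
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(data)")
    # Each entry is (points, centroid, SSE), so SSE is computed once per cluster.
    first = list(data)
    c0 = _mean(first)
    clusters: List[Tuple[List[Point], List[float], float]] = [(first, c0, _sse(first, c0))]
    while len(clusters) < k:
        # Pick the cluster with the largest within-cluster SSE.
        idx = max(range(len(clusters)), key=lambda i: clusters[i][2])
        points, centre, _ = clusters.pop(idx)
        # Axis of greatest spread (dividing by n would not change the argmax).
        d = len(centre)
        spread = [sum((p[j] - centre[j]) ** 2 for p in points) for j in range(d)]
        axis = max(range(d), key=lambda j: spread[j])
        # Split through the mean along that axis.
        left = [p for p in points if p[axis] <= centre[axis]]
        right = [p for p in points if p[axis] > centre[axis]]
        if not left or not right:
            raise ValueError("cannot split: fewer than k distinct points")
        for part in (left, right):
            c = _mean(part)
            clusters.append((part, c, _sse(part, c)))
    # Centroids of the final clusters serve as the initial k-means centres.
    return [c for _, c, _ in clusters]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(sorted(var_part_init(data, 2)))
```

Each split scans only the points of one cluster. The total cost is therefore O(nkd) for n points in d dimensions, which is linear in n. Reversing `data` in the example yields the same centres. In practice, the returned centres would then seed a standard k-means run.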
Bioavailable iron in the Southern Ocean: the significance of the iceberg conveyor belt
Productivity in the Southern Ocean is iron-limited, and dissolved iron supplied by aeolian dust is believed to be the main source from outside the marine reservoir. Glacial sediment sources of iron have rarely been considered because this iron has been assumed to be inert and not bioavailable. This study demonstrates the presence of potentially bioavailable Fe, as ferrihydrite and goethite in nanoparticulate clusters, in sediments collected from icebergs in the Southern Ocean and from glaciers on the Antarctic landmass. Icebergs can transport these nanoparticles away from coastal regions of the Southern Ocean, so that melting releases bioavailable Fe to the open ocean. The abundance of nanoparticulate iron was measured by ascorbate extraction. These data indicate that the fluxes of bioavailable iron supplied to the Southern Ocean from aeolian dust (0.01–0.13 Tg yr⁻¹) and from icebergs (0.06–0.12 Tg yr⁻¹) are comparable. Increases in iceberg production therefore have the capacity to increase productivity, and this newly identified negative feedback may help to mitigate fossil fuel emissions.
Specific Variants in the MLH1 Gene Region May Drive DNA Methylation, Loss of Protein Expression, and MSI-H Colorectal Cancer
Background: We previously identified an association between a promoter SNP (rs1800734) in the mismatch repair gene MLH1
and microsatellite-unstable (MSI-H) colorectal cancers (CRCs) in two samples. The current study expanded on this finding by
exploring the genetic basis of DNA methylation in this region of chromosome 3. We hypothesized that specific
polymorphisms in the MLH1 gene region predispose it to DNA methylation, resulting in loss of MLH1 gene expression and
mismatch-repair function and, consequently, in genome-wide microsatellite instability.
Methodology/Principal Findings: We first tested our hypothesis in one sample from Ontario (901 cases, 1,097 controls) and
replicated major findings in two additional samples from Newfoundland and Labrador (479 cases, 336 controls) and from
Seattle (591 cases, 629 controls). Logistic regression was used to test for association between SNPs in the region of MLH1
and CRC, MSI-H CRC, MLH1 gene expression in CRC, and DNA methylation in CRC. The association between rs1800734 and
MSI-H CRCs, previously reported in Ontario and Newfoundland, was replicated in the Seattle sample. Two additional SNPs, in
strong linkage disequilibrium with rs1800734, showed strong associations with MLH1 promoter methylation, loss of MLH1
protein, and MSI-H CRC in all three samples. The logistic regression model of MSI-H CRC that included MLH1 promoter methylation
status and MLH1 immunohistochemistry (IHC) status fit most parsimoniously in all three samples combined. When
rs1800734 was added to this model, its effect was not statistically significant (P-value = 0.72 vs. 2.3 × 10⁻⁴ when the SNP was
examined alone).
Conclusions/Significance: The observed association of rs1800734 with MSI-H CRC occurs through its effect on MLH1
promoter methylation, MLH1 IHC deficiency, or both.