
    Summary statistics of the three microarray experimental datasets used in testing our algorithm and the other three variants of the k-means algorithm.

    The second and third columns indicate the total number of genes covered in each experiment and the number of points (at equal intervals) at which the genes' transcriptional expression is measured.

    Hubert-Arabie Adjusted Rand Index (ARI_HA) Cluster Quality Computation Result for Biological and Non-Biological Data.

    For each dataset (Bozdech et al. 3D7 and HB3 strains [26], and Le Roch et al. [27]), we used two values of k to demonstrate the effect of changing k on the cluster quality of the clustering algorithms. We treated the structure produced by the traditional k-means as the known structure and compared the clusters of the MM, Enhanced and Overlapped k-means with it. In a separate (last) column, we also compare the structure of the Enhanced k-means with that of the Overlapped k-means.
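
    A minimal sketch of the ARI_HA comparison described above, using scikit-learn's adjusted_rand_score, which implements the Hubert-Arabie adjusted Rand index. The label vectors are hypothetical stand-ins for the cluster assignments produced by two of the k-means runs; they are not values from the paper.

from sklearn.metrics import adjusted_rand_score

# Hypothetical cluster labels for the same set of genes under two algorithms.
labels_traditional = [0, 0, 1, 1, 2, 2, 2, 0]
labels_mm          = [0, 0, 1, 1, 2, 2, 1, 0]

# ARI_HA = 1.0 for identical partitions, ~0 for chance-level agreement.
ari = adjusted_rand_score(labels_traditional, labels_mm)
print(f"ARI_HA(traditional, MM) = {ari:.3f}")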

    Non-biological data used for testing our algorithm and the other three variants of the k-means algorithm.

    The Abalone dataset, described by 8 attributes, represents physical measurements of abalone (a sea organism). The Wind dataset, described by 12 attributes, represents measurements of wind from 1/1/1961 to 31/12/1978. The Letter dataset represents images of English capital letters, described by 16 primitive numerical attributes (statistical moments and edge counts).

    Performance comparison of all the k-means variants considered, on very large data sets.

    This constitutes a simulation of three large data sets of dimensions 10,000×50, 30,000×50 and 50,000×50. The range of K used is 10 ≤ K ≤ 40 for the four algorithms.
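
    As a rough illustration of this set-up (not the authors' code), the sketch below times scikit-learn's KMeans, standing in for the four variants, on synthetic matrices of the stated sizes over a few values of K sampled from the 10 to 40 range.

import time
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
for n_points in (10_000, 30_000, 50_000):
    X = rng.normal(size=(n_points, 50))            # synthetic n x 50 data matrix
    for k in range(10, 41, 10):                    # sampled from 10 <= K <= 40
        start = time.perf_counter()
        KMeans(n_clusters=k, n_init=1, random_state=0).fit(X)
        elapsed = time.perf_counter() - start
        print(f"n={n_points:>6}  K={k:>2}  time={elapsed:.2f}s")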

    Execution Time (Bozdech et al., P.f 3D7 Microarray Dataset).

    The plot shows that our MMk-means has the fastest run-time for the tested numbers of clusters, 15 ≤ k ≤ 25. Comparatively, k = 20 took the longest run-time for all four algorithms, implying that this is a function of the nature of the data under consideration.

    Pseudocode of our Compute_MM sub-program for MMk-means.

    We create a covariance matrix by computing the Pearson product-moment correlation coefficient between the k centroids of the previous and current iterations, and then deduce the k eigenvalues for each of the previous and current iterations. The difference between these eigenvalues for each cluster is computed and checked to see whether it satisfies the Ding-He interval.
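
    A hedged sketch of the kind of check this caption describes: the eigenvalues of the Pearson correlation matrix among the k centroids are compared across the previous and current iterations, and a cluster is flagged stable when the change is small. The tolerance ding_he_eps is a hypothetical stand-in for the Ding-He interval, which is not reproduced here.

import numpy as np

def compute_mm(prev_centroids, curr_centroids, ding_he_eps=1e-3):
    """Flag clusters whose centroid spectrum has stabilised (sketch only)."""
    # Pearson product-moment correlation matrix among the k centroids at
    # each iteration (np.corrcoef correlates the rows of a (k, d) array).
    prev_eig = np.linalg.eigvalsh(np.corrcoef(prev_centroids))
    curr_eig = np.linalg.eigvalsh(np.corrcoef(curr_centroids))
    # Treat a cluster as stable when its eigenvalue shift falls inside the
    # tolerance band standing in for the Ding-He interval.
    return np.abs(curr_eig - prev_eig) < ding_he_eps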

    Pseudocode of our main program for MMk-means.

    It runs similarly to the traditional k-means, except that it is equipped with a metric-matrices-based mechanism to determine when a cluster is stable (that is, when its members will not move from the cluster in subsequent iterations). This mechanism is implemented in the sub-procedure Compute_MM of Figure 1. We use the theory developed by Zha et al. [20], based on the singular values of the matrix X of the input data points, to determine when it is appropriate to execute Compute_MM during the k-means iterations. This is implemented in lines 34–40.
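
    A hedged sketch of the overall flow described here: a plain k-means loop in which clusters flagged stable by the compute_mm sketch above stop being updated. The decision of when to invoke Compute_MM, which the paper derives from Zha et al.'s singular-value analysis of X (pseudocode lines 34–40), is reduced here to a simple centroid-shift threshold; mm_kmeans, max_iter, tol and seed are illustrative names, not the paper's.

import numpy as np

def mm_kmeans(X, k, max_iter=100, tol=1e-4, seed=0):
    """Plain k-means loop with a per-cluster stability flag (sketch only)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    stable = np.zeros(k, dtype=bool)

    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        prev = centroids.copy()
        for j in range(k):
            if stable[j]:                  # stable clusters skip the update
                continue
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)

        # Placeholder trigger: run the stability test once the centroids have
        # almost stopped moving. The paper instead derives this trigger from
        # the singular values of X (Zha et al. [20]).
        if np.linalg.norm(centroids - prev) < tol * k:
            stable |= compute_mm(prev, centroids)   # sketch from the previous figure
        if stable.all():
            break
    return labels, centroids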

    Quality of Clusters (Bozdech et al., P.f 3D7 Microarray Dataset).

    The qualities of the clusters produced by the four algorithms are similar. The MSE decreases gradually as the number of clusters increases, except for k = 21, which has a higher MSE than k = 20.
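
    For reference, a curve of this kind can be traced with scikit-learn's KMeans, whose inertia_ attribute is the sum of squared distances to the nearest centroid; dividing by the number of points gives a within-cluster MSE. The random matrix X below is a placeholder, not the 3D7 expression data.

import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(500, 48))   # placeholder expression matrix

for k in range(15, 26):                                # 15 <= k <= 25
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    mse = km.inertia_ / len(X)                         # mean squared error per point
    print(f"k={k:2d}  MSE={mse:.4f}")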