Summary statistics for the three microarray experimental datasets used to test our algorithm and the other three variants of the k-means algorithm.
<p>The second and third columns give the total number of genes covered in each experiment and the number of equally spaced time points at which the genes' transcriptional expression is measured.</p>
Hubert-Arabie Adjusted Rand Index (ARI<sub>HA</sub>) cluster quality results for biological and non-biological data.
<p>For each dataset, Bozdech et al. 3D7 and HB3 strains <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0049946#pone.0049946-Bozdech1" target="_blank">[26]</a> and Le Roch et al. <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0049946#pone.0049946-LeRoch1" target="_blank">[27]</a>, we used two values of k to demonstrate the effect of changing k on the cluster quality of the clustering algorithms. We treated the structure produced by the traditional k-means as the known structure and compared the clusters of the MM, Enhanced, and Overlapped k-means with it. In a separate (last) column, we also compare the structure of the Enhanced k-means with that of the Overlapped k-means.</p>
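The Hubert-Arabie adjusted Rand index used in this comparison can be computed directly from the contingency table of two labelings. A minimal sketch (function and variable names are ours, not from the paper):

```python
from math import comb
from collections import Counter

def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two flat clusterings."""
    n = len(labels_a)
    # Contingency counts: how many points fall in each (cluster-in-A, cluster-in-B) pair.
    contingency = Counter(zip(labels_a, labels_b))
    a_sizes = Counter(labels_a)   # row sums
    b_sizes = Counter(labels_b)   # column sums

    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in a_sizes.values())
    sum_b = sum(comb(c, 2) for c in b_sizes.values())

    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:     # degenerate case, e.g. both single-cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Identical partitions (up to relabeling) score 1.0; chance-level agreement scores near 0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
```

The index is invariant to cluster relabeling, which is why it suits comparing one algorithm's structure against another's, as done in the table.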
Non-biological data used for testing our algorithm and the other three variants of the k-means algorithm.
<p>The Abalone dataset, described by 8 attributes, represents physical measurements of abalone (a sea organism). The Wind dataset, described by 12 attributes, represents wind measurements from 1/1/1961 to 31/12/1978. The Letter dataset represents images of English capital letters described by 16 primitive numerical attributes (statistical moments and edge counts).</p>
Performance comparison of all four k-means variants considered, on very large data sets.
<p>This constitutes a simulation of three large data sets of dimensions 10,000×50, 30,000×50, and 50,000×50. The range of k used is 10 ≤ k ≤ 40 for all four algorithms.</p>
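Synthetic inputs of the sizes above are straightforward to generate for a benchmark harness. A sketch (the dimensions come from the caption; the uniform distribution and fixed seed are our assumptions, since the caption does not specify how the data were simulated):

```python
import random

def make_dataset(n_rows, n_cols, seed=0):
    """Generate an n_rows x n_cols matrix of uniform random features in [0, 1)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_cols)] for _ in range(n_rows)]

# The three benchmark sizes from the caption, each with 50 features.
shapes = [(10_000, 50), (30_000, 50), (50_000, 50)]
datasets = [make_dataset(r, c) for r, c in shapes]
print([(len(d), len(d[0])) for d in datasets])  # → [(10000, 50), (30000, 50), (50000, 50)]
```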
Execution Time (Bozdech <i>et al.</i>, <i>P.f</i> 3D7 Microarray Dataset).
<p>The plot shows that our MMk-means has the fastest run-time across the tested numbers of clusters, 15 ≤ k ≤ 25. Comparatively, k = 20 took the longest run-time for all four algorithms, implying that this is a function of the nature of the data under consideration.</p>
Pseudocode of our Compute_MM Sub-program for <i>MMk-means</i>.
<p>We create a covariance matrix by computing the Pearson product-moment correlation coefficient between the k centroids of the previous and current iterations, and then derive the k eigenvalues for the previous and current iterations. The difference between these eigenvalues for each cluster is computed and checked to see whether it satisfies the <i>Ding-He</i> interval.</p>
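The per-cluster stability check described above can be sketched as follows. This is a loose interpretation: the use of `numpy.corrcoef` over the centroid rows, the eigenvalue comparison, and the fixed tolerance standing in for the Ding-He interval are our assumptions, since the caption does not give the exact bound:

```python
import numpy as np

def compute_mm(prev_centroids, curr_centroids, tol=1e-3):
    """Flag clusters whose centroid geometry has stopped changing.

    Builds the Pearson correlation matrix over the previous and over the
    current centroid sets, compares their eigenvalues per cluster, and
    marks a cluster stable when the difference falls inside a tolerance
    band (a stand-in here for the Ding-He interval).
    """
    # Correlation (normalized covariance) matrices over the k centroid rows.
    prev_corr = np.corrcoef(prev_centroids)
    curr_corr = np.corrcoef(curr_centroids)
    # Eigenvalues of the symmetric matrices, sorted ascending by eigvalsh.
    prev_eigs = np.linalg.eigvalsh(prev_corr)
    curr_eigs = np.linalg.eigvalsh(curr_corr)
    # One boolean stability flag per cluster.
    return np.abs(curr_eigs - prev_eigs) <= tol

# Unchanged centroids: every cluster is reported stable.
c = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 2.0, 0.0]])
print(compute_mm(c, c).all())  # → True
```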
Pseudocode of our main program for <i>MMk-means</i>.
<p>It runs like the traditional k-means except that it is equipped with a metric-matrices-based mechanism to determine when a cluster is stable (that is, its members will not move from this cluster in subsequent iterations). This mechanism is implemented in the sub-procedure Compute_MM of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0049946#pone-0049946-g001" target="_blank"><i>Figure</i> 1</a>. We use the theory developed by Zha <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0049946#pone.0049946-Zha1" target="_blank">[20]</a>, based on the singular values of the matrix X of the input data points, to determine when it is appropriate to execute Compute_MM during the k-means iterations. This is implemented in lines 34–40.</p>
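The overall loop can be sketched as a standard Lloyd iteration with a per-cluster stability mask, where stable clusters are frozen. This is a minimal sketch under stated assumptions: Euclidean distance, random initialization from the data points, and a simplified stability test (centroid shift below a tolerance) standing in for the full Compute_MM / singular-value machinery:

```python
import numpy as np

def mm_kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """k-means with a per-cluster stability flag: once a cluster is marked
    stable, its centroid is no longer recomputed. The stability test here
    (centroid shift below tol) is a simplification of the paper's
    metric-matrices check."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    stable = np.zeros(k, dtype=bool)

    for _ in range(max_iter):
        # Assign each point to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        new_centroids = centroids.copy()
        for j in range(k):
            if stable[j]:
                continue  # frozen: skip the update for stable clusters
            members = X[labels == j]
            if len(members):
                new_centroids[j] = members.mean(axis=0)

        # Simplified stand-in for Compute_MM: flag clusters whose
        # centroid barely moved as stable.
        shift = np.linalg.norm(new_centroids - centroids, axis=1)
        stable |= shift <= tol
        centroids = new_centroids
        if stable.all():
            break
    return labels, centroids

# Two well-separated blobs should be recovered as two pure clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels, cents = mm_kmeans(X, 2)
print(len(set(labels[:20])), len(set(labels[20:])))
```

Freezing stable clusters is what saves work relative to the traditional algorithm: their membership and centroid updates are skipped in later iterations.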
Quality of Clusters (Bozdech <i>et al.</i>, <i>P.f</i> 3D7 Microarray Dataset).
<p>The cluster qualities for the four algorithms are similar. The MSE decreases gradually as the number of clusters increases, except at k = 21, which has a higher MSE than k = 20.</p>