
24,132 research outputs found

Reducing the Time Requirement of k-Means Algorithm

Authors: Adebiyi E. F., Doumbia Seydou, Osamor V. C., Oyelade O. J.
Publication date: 01/01/2012

Traditional k-means and most of its variants remain computationally expensive for large datasets, such as microarray data, which combine many data points with a large dimension d. In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k. The problem is to determine a set of k points in R^d, called centers, that minimizes the mean squared distance from each data point to its nearest center. In this work, we develop a novel k-means algorithm that is simple yet more efficient than both traditional k-means and the recent enhanced k-means. Our new algorithm is based on the recently established relationship between principal component analysis and k-means clustering. We provide a correctness proof for this algorithm. Results obtained from testing the algorithm on three biological datasets and six non-biological datasets (three real and three simulated) also indicate that our algorithm is empirically faster than other known k-means algorithms. We assessed the quality of our algorithm's clusters against the clusters of a known structure using the Hubert-Arabie Adjusted Rand index (ARI_HA). We found that when k is close to d, the quality is good (ARI_HA > 0.8), and when k is not close to d, the quality of our new k-means algorithm is excellent (ARI_HA > 0.9). In this paper, the emphasis is on reducing the time requirement of the k-means algorithm and on its application to microarray data, motivated by the desire to create a tool for clustering and malaria research. However, the new clustering algorithm can be used for other clustering needs as long as an appropriate measure of distance between the centroids and the members is used, as demonstrated in this work on six non-biological datasets.
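The objective stated in this abstract (the mean squared distance from each data point to its nearest center) can be illustrated with a minimal Python sketch. It shows the cost function and one iteration of the standard (Lloyd) k-means update, not the PCA-based algorithm the authors propose. The function names `squared_distance`, `kmeans_cost` and `lloyd_step` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points: List[Point], centers: List[Point]) -> float:
    """Mean squared distance from each point to its nearest center."""
    total = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return total / len(points)


def lloyd_step(points: List[Point], centers: List[Point]) -> List[Tuple[float, ...]]:
    """One iteration of standard k-means: assign points, then recompute means."""
    k = len(centers)
    groups: List[List[Point]] = [[] for _ in range(k)]
    for p in points:
        nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
        groups[nearest].append(p)

    new_centers = []
    for j, group in enumerate(groups):
        if group:
            dim = len(group[0])
            mean = tuple(sum(p[i] for p in group) / len(group) for i in range(dim))
            new_centers.append(mean)
        else:
            # Assumption: an empty cluster simply keeps its previous center.
            new_centers.append(tuple(centers[j]))
    return new_centers


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers = [(0, 0), (10, 0)]
print(kmeans_cost(points, centers))                      # cost before the update
print(kmeans_cost(points, lloyd_step(points, centers)))  # cost after one update
```

Each Lloyd iteration compares all n points with all k centers in d dimensions, which is the per-iteration cost that faster variants try to reduce.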

CiteSeerX

Covenant University Repository

Directory of Open Access Journals

PubMed Central

FigShare

Study of document clustering using the k-means algorithm

Author: Gummuluru Meghna Sharma
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/2006

One of the most commonly used data mining techniques is document clustering, or unsupervised document classification, which groups documents according to a document similarity function. This thesis addresses research issues associated with categorizing documents using the k-means clustering algorithm, which partitions objects into K groups based on document representations and similarities. The hypothesis of this thesis is that unsupervised clustering of a set of documents produces results similar to those of their supervised categorization.

University of Nevada, Las Vegas Repository

A Faster $k$-means++ Algorithm

Authors: Liang Jiehao, Sarkhel Somdeb, Song Zhao, Yin Chenbo, Zhuo Danyang
Publication date: 28/11/2022

K-means++ is an important algorithm to choose initial cluster centers for the k-means clustering algorithm. In this work, we present a new algorithm that can solve the $k$-means++ problem with near optimal running time. Given $n$ data points in $\mathbb{R}^d$, the current state-of-the-art algorithm runs in $\widetilde{O}(k)$ iterations, and each iteration takes $\widetilde{O}(nd k)$ time. The overall running time is thus $\widetilde{O}(n d k^2)$. We propose a new algorithm, FastKmeans++, that only takes $\widetilde{O}(nd + nk^2)$ time in total.
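For context, the initialization that k-means++ refers to is commonly described as D² sampling: the first center is drawn uniformly at random, and each later center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The sketch below illustrates that textbook seeding only; it is not the FastKmeans++ algorithm from this abstract, and the name `kmeans_pp_seed` and its `rng` parameter are assumptions made for the example.

```python
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seed(points: List[Point], k: int,
                   rng: Optional[random.Random] = None) -> List[Point]:
    """Pick k initial centers by textbook D^2 sampling (illustrative sketch)."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]  # first center: uniform at random
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            centers.append(points[idx])
        # Refresh nearest-center distances using only the newest center.
        d2 = [min(old, squared_distance(p, centers[-1]))
              for old, p in zip(d2, points)]
    return centers


seeds = kmeans_pp_seed([(0, 0), (0, 1), (9, 9), (9, 8)], k=2,
                       rng=random.Random(0))
print(seeds)
```

Passing a seeded `random.Random` instance makes the selection reproducible, which is convenient when comparing initializations.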

arXiv.org e-Print Archive

Color image segmentation using a spatial k-means clustering algorithm

Authors: Ilea Dana E., Whelan Paul F.
Publication date: 01/01/2006

This paper details the implementation of a new adaptive technique for color-texture segmentation that is a generalization of the standard K-Means algorithm. The standard K-Means algorithm produces accurate segmentation results only when applied to images composed of regions that are homogeneous in texture and color, since no local constraints are applied to impose spatial continuity. In addition, the initialization of the K-Means algorithm is problematic, as the initial cluster centers are usually picked at random. In this paper we detail the implementation of a novel technique that selects the dominant colors from the input image using information from the color histograms. The main contribution of this work is a generalization of the K-Means algorithm that includes, in the process of pixel assignment, the primary features that describe color smoothness and texture complexity. The resulting color segmentation scheme has been applied to a large number of natural images, and the experimental data indicate the robustness of the newly developed segmentation algorithm.
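The idea of seeding K-Means from dominant colors in a histogram, rather than from random picks, can be sketched roughly as follows. This is only one plausible reading under stated assumptions: the abstract does not specify how the histogram is built or how dominant colors are chosen, so the coarse RGB quantization, the `bins_per_channel` parameter and the function name `dominant_colors` are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Color = Tuple[int, int, int]


def dominant_colors(pixels: List[Color], k: int,
                    bins_per_channel: int = 4) -> List[Color]:
    """Return the centers of the k most populated bins of a coarse RGB histogram.

    Assumption: 8-bit channels (0..255) quantized uniformly; this is an
    illustrative scheme, not the selection rule used in the paper.
    """
    step = 256 // bins_per_channel
    # Count how many pixels fall into each quantized (r, g, b) bin.
    histogram = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    # Map each of the k most frequent bins back to the color at its center.
    return [tuple(c * step + step // 2 for c in bin_index)
            for bin_index, _count in histogram.most_common(k)]


pixels = [(250, 10, 10)] * 5 + [(10, 10, 250)] * 3 + [(10, 250, 10)]
print(dominant_colors(pixels, k=2))
```

The returned colors could then serve as initial cluster centers for a K-Means run over the pixel values, making the starting point deterministic for a given image.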

Irish Universities

DCU Online Research Access Service