Systematically and efficiently improving existing $k$-means initialization algorithms by pairwise-nearest-neighbor smoothing

Author: Baldassi Carlo
Publication date: 15/09/2022

We present a meta-method for initializing (seeding) the $k$-means clustering algorithm called PNN-smoothing. It consists in splitting a given dataset into $J$ random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a meta-method in the sense that when clustering the individual subsets any seeding algorithm can be used. If the computational complexity of that seeding algorithm is linear in the size of the data $N$ and the number of clusters $k$, PNN-smoothing is also almost linear with an appropriate choice of $J$, and quite competitive in practice. We show empirically, using several existing seeding methods and testing on several synthetic and real datasets, that this procedure results in systematically better costs. Our implementation is publicly available at https://github.com/carlobaldassi/KMeansPNNSmoothing.jl.

Comment: 12 pages (+8 appendix), 2 figures, 3 tables (+14 appendix)
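The split, cluster, and merge steps described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's Julia implementation: the function names (`cluster_subset`, `pnn_merge`, `pnn_smoothing_seed`) are hypothetical, uniform random seeding followed by a single assignment step stands in for "any seeding algorithm", and the merge uses the standard PNN cost (the increase in squared error when two weighted centroids are fused). Every subset must contain at least `k` points.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def cluster_subset(subset, k, rng):
    """Assumed stand-in for 'any seeding algorithm': pick k random points
    as seeds, assign every point to its nearest seed once, and return a
    (centroid, size) pair for each non-empty group."""
    seeds = rng.sample(subset, k)
    groups = [[] for _ in range(k)]
    for p in subset:
        j = min(range(k), key=lambda i: sq_dist(p, seeds[i]))
        groups[j].append(p)
    return [(mean(g), len(g)) for g in groups if g]

def pnn_merge(clusters, k):
    """Repeatedly fuse the pair of weighted centroids whose merge increases
    the squared error the least, until only k clusters remain."""
    clusters = list(clusters)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            ca, wa = clusters[a]
            for b in range(a + 1, len(clusters)):
                cb, wb = clusters[b]
                cost = wa * wb / (wa + wb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ca, wa), (cb, wb) = clusters[a], clusters[b]
        merged = tuple((wa * x + wb * y) / (wa + wb) for x, y in zip(ca, cb))
        clusters[a] = (merged, wa + wb)
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

def pnn_smoothing_seed(points, k, J, seed=0):
    """Split the data into J random subsets, cluster each one separately,
    pool the resulting centroids, and PNN-merge the pool down to k seeds."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    subsets = [shuffled[j::J] for j in range(J)]
    pooled = []
    for s in subsets:
        pooled.extend(cluster_subset(s, k, rng))
    return [c for c, _ in pnn_merge(pooled, k)]
```

The returned centroids would then be passed to an ordinary $k$-means (Lloyd) run as its initial centers. The naive pair search in `pnn_merge` is quadratic in the number of pooled centroids per merge, which is tolerable here because the pool holds at most $J \cdot k$ entries rather than $N$ points.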

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Bocconi

An Arithmetic-Based Deterministic Centroid Initialization Method for the k-Means Clustering Algorithm

Author: Mayo Matthew Michael
Publication venue: CSU ePress
Publication date: 01/05/2016

One of the greatest challenges in k-means clustering is positioning the initial cluster centers, or centroids, as close to optimal as possible, and doing so in an amount of time deemed reasonable. Traditional k-means utilizes a randomization process for initializing these centroids, and poor initialization can lead to increased numbers of required clustering iterations to reach convergence, and a greater overall runtime. This research proposes a simple, arithmetic-based deterministic centroid initialization method which is much faster than randomized initialization. Preliminary experiments suggest that this collection of methods, referred to herein as the sharding centroid initialization algorithm family, often outperforms random initialization in terms of the required number of iterations for convergence and overall time-related metrics and is competitive or better in terms of the reported mean sum of squared errors (SSE) metric. Surprisingly, the sharding algorithms often manage to report more advantageous mean SSE values in the instances where their performance is slower than random initialization.

Columbus State University

Brain image clustering by wavelet energy and CBSSO optimization algorithm

Author: Hosseinzadeh Hasan, Sedaghat Mohammad
Publication venue: ValpoScholar
Publication date: 28/04/2019

Diagnosing brain abnormalities is important for saving social and hospital resources. Wavelet energy is known as an effective feature-detection technique that performs well across a range of applications. This paper suggests a new method based on wavelet energy to automatically classify magnetic resonance imaging (MRI) brain images into two groups (normal and abnormal), utilizing support vector machine (SVM) classification based on chaotic binary shark smell optimization (CBSSO) to optimize the SVM weights. The suggested CBSSO-based KSVM compares favorably with several other methods in terms of sensitivity and authenticity. The proposed CAD system can additionally be utilized to categorize images with various pathological conditions, types, and illness modes.

Valparaiso University