Systematically and efficiently improving existing k-means initialization algorithms by pairwise-nearest-neighbor smoothing
We present a meta-method for initializing (seeding) the k-means clustering
algorithm called PNN-smoothing. It consists in splitting a given dataset into J
random subsets, clustering each of them individually, and merging the
resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a
meta-method in the sense that when clustering the individual subsets any
seeding algorithm can be used. If the computational complexity of that seeding
algorithm is linear in the size of the data N and the number of clusters k,
PNN-smoothing is also almost linear with an appropriate choice of J, and
quite competitive in practice. We show empirically, using several existing
seeding methods and testing on several synthetic and real datasets, that this
procedure results in systematically better costs. Our implementation is
publicly available at https://github.com/carlobaldassi/KMeansPNNSmoothing.jl.
Comment: 12 pages (+8 appendix), 2 figures, 3 tables (+14 appendix)
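The split, cluster, and merge steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Julia implementation. The function names, the `n_subsets` parameter, and the uniform-sampling seeder are assumptions made for illustration. The abstract allows any seeding algorithm in that position. The merge step uses the standard PNN criterion: repeatedly merge the pair of weighted centroids whose merge increases the squared-error cost the least.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seed_and_assign(points, k, rng):
    """Placeholder seeder (assumption): sample k points uniformly, then do
    one nearest-center assignment and return (centroid, size) pairs."""
    centers = rng.sample(points, k)
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
        counts[j] += 1
        for d, x in enumerate(p):
            sums[j][d] += x
    return [([s / counts[j] for s in sums[j]], counts[j])
            for j in range(k) if counts[j] > 0]

def pnn_merge(clusters, k):
    """Merge weighted centroids pairwise until only k remain."""
    clusters = list(clusters)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ca, wa), (cb, wb) = clusters[a], clusters[b]
                # Increase in squared-error cost caused by merging a and b.
                cost = wa * wb / (wa + wb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ca, wa), (cb, wb) = clusters[a], clusters[b]
        merged = ([(wa * x + wb * y) / (wa + wb) for x, y in zip(ca, cb)],
                  wa + wb)
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return [c for c, _ in clusters]

def pnn_smoothing(points, k, n_subsets, seed=0):
    """Split into random subsets, cluster each, and merge with PNN.
    Assumes every subset holds at least k points."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    pooled = []
    for subset in subsets:
        pooled.extend(seed_and_assign(subset, k, rng))
    return pnn_merge(pooled, k)
```

The returned centroids are meant to be handed to a k-means routine as its starting configuration. The quadratic pair search in `pnn_merge` is kept for clarity only and runs over the pooled centroids, not over the raw data.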
An Arithmetic-Based Deterministic Centroid Initialization Method for the k-Means Clustering Algorithm
One of the greatest challenges in k-means clustering is positioning the initial cluster centers, or centroids, as close to optimal as possible, and doing so in a reasonable amount of time. Traditional k-means uses a randomization process to initialize these centroids, and poor initialization can increase the number of clustering iterations required to reach convergence, and with it the overall runtime. This research proposes a simple, arithmetic-based deterministic centroid initialization method which is much faster than randomized initialization. Preliminary experiments suggest that this collection of methods, referred to herein as the sharding centroid initialization algorithm family, often outperforms random initialization in terms of the required number of iterations for convergence and overall time-related metrics, and is competitive or better in terms of the reported mean sum of squared errors (SSE) metric. Surprisingly, the sharding algorithms often manage to report more advantageous mean SSE values in the instances where their performance is slower than random initialization.
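The abstract does not spell out the arithmetic involved. The Python sketch below assumes the naive variant of sharding: each row is scored by the plain sum of its attribute values, the rows are sorted by that score, the sorted rows are cut into k contiguous shards of near-equal size, and each shard's per-attribute mean becomes one initial centroid. The function name and the scoring rule are assumptions for illustration, not a reproduction of the paper's code.

```python
def naive_sharding_init(points, k):
    """Deterministic centroid initialization by sorting and sharding.
    Assumes len(points) >= k and a plain-sum composite score per row."""
    ordered = sorted(points, key=sum)  # sort rows by composite value
    n = len(ordered)
    centroids = []
    for s in range(k):
        # Contiguous shard s of the sorted rows.
        shard = ordered[s * n // k:(s + 1) * n // k]
        dim = len(shard[0])
        # Per-attribute mean of the shard is the initial centroid.
        centroids.append([sum(p[d] for p in shard) / len(shard)
                          for d in range(dim)])
    return centroids
```

No random numbers are drawn, so repeated runs on the same data give the same starting centroids. The cost is dominated by a single sort of the rows.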
Brain image clustering by wavelet energy and CBSSO optimization algorithm
The diagnosis of brain abnormalities has long been important for saving social and hospital resources. Wavelet energy is known to be an effective feature-detection technique that performs well across different applications. This paper suggests a new method based on wavelet energy to automatically classify magnetic resonance imaging (MRI) brain images into two groups (normal and abnormal), utilizing support vector machine (SVM) classification based on chaotic binary shark smell optimization (CBSSO) to optimize the SVM weights.
The results of the suggested CBSSO-based KSVM compare favorably with those of several other methods in terms of sensitivity and authenticity. The proposed CAD system can additionally be utilized to categorize images with various pathological conditions, types, and illness modes.