Evaluating Heuristics and Crowding on Center Selection in K-Means Genetic Algorithms

Author: McGarvey William
Publication venue: NSUWorks
Publication date: 01/01/2014

Data clustering involves partitioning data points into clusters so that points within the same cluster are highly similar to one another but dissimilar to points in other clusters. The k-means algorithm is among the most extensively used clustering techniques. Genetic algorithms (GA) have been successfully used to evolve successive generations of cluster centers. The primary goal of this research was to develop improved GA-based methods for center selection in k-means by using heuristic methods to improve the overall fitness of the initial population of chromosomes, along with crowding techniques to avoid premature convergence. Prior to this research, no rigorous systematic examination of the use of heuristics and crowding methods in this domain had been performed. The evaluation included computational experiments involving repeated runs of the genetic algorithm in which values that affect heuristics or crowding were systematically varied and the results analyzed. Genetic algorithm performance under the various configurations was analyzed based on (1) the fitness of the partitions produced and (2) the overall time the GA took to converge to good solutions. Two heuristic methods for initial center seeding were tested: Density and Separation. Two crowding techniques were evaluated on their ability to prevent premature convergence: Deterministic and Parent Favored Hybrid local tournament selection. Based on the experimental results, the Density method provides no significant advantage over random seeding, either in discovering quality partitions or in evolving better partitions more quickly. The Separation method appears to increase the probability that the genetic algorithm finds slightly better partitions in slightly fewer generations, and to converge more quickly to quality partitions. Both local tournament selection techniques consistently allowed the genetic algorithm to find better quality partitions than roulette-wheel sampling.
Deterministic selection consistently found better quality partitions in fewer generations than Parent Favored Hybrid. The combination of Separation center seeding and Deterministic selection performed better than any other combination, achieving the lowest mean best SSE value more than twice as often as any other combination. On all 28 benchmark problem instances, the combination identified solutions that were at least as good as any identified by extant methods.
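The abstract names deterministic crowding but does not give the thesis's chromosome encoding or genetic operators, and it does not define the Separation seeding heuristic. The following is a minimal sketch of deterministic crowding applied to k-means center selection, assuming each chromosome is a k×d array of center coordinates, fitness is SSE (lower is better), seeding is random (not Separation), and variation is uniform crossover over whole centers plus Gaussian mutation. The function names, the mutation width, and the parameters `pop_size` and `generations` are hypothetical choices for illustration.

```python
import numpy as np


def sse(data, centers):
    """Sum of squared errors: each point adds its squared distance to the nearest center."""
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())


def random_centers(data, k, rng):
    """Random seeding (assumed): k distinct data points become the initial centers."""
    return data[rng.choice(len(data), size=k, replace=False)].copy()


def crossover(a, b, rng):
    """Uniform crossover that swaps whole centers between parents (an assumed operator)."""
    mask = rng.random(len(a)) < 0.5
    return np.where(mask[:, None], a, b), np.where(mask[:, None], b, a)


def genotype_distance(a, b):
    """L1 distance between chromosomes; ignores that center order is arbitrary."""
    return float(np.abs(a - b).sum())


def deterministic_crowding(data, k, pop_size=20, generations=50, seed=0):
    rng = np.random.default_rng(seed)
    scale = 0.05 * data.std(axis=0)  # per-dimension Gaussian mutation width (assumed)
    pop = [random_centers(data, k, rng) for _ in range(pop_size)]
    fit = [sse(data, c) for c in pop]
    for _ in range(generations):
        order = rng.permutation(pop_size)  # pair parents at random
        for i in range(0, pop_size - 1, 2):
            p, q = order[i], order[i + 1]
            c1, c2 = crossover(pop[p], pop[q], rng)
            c1 = c1 + rng.normal(0.0, scale, size=c1.shape)
            c2 = c2 + rng.normal(0.0, scale, size=c2.shape)
            # Match each child with the parent it most resembles.
            straight = genotype_distance(pop[p], c1) + genotype_distance(pop[q], c2)
            swapped = genotype_distance(pop[p], c2) + genotype_distance(pop[q], c1)
            pairs = [(p, c1), (q, c2)] if straight <= swapped else [(p, c2), (q, c1)]
            # A child replaces its matched parent only if its SSE is lower.
            for parent, child in pairs:
                f = sse(data, child)
                if f < fit[parent]:
                    pop[parent], fit[parent] = child, f
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

Because a child competes only against the parent it most resembles, a single strong lineage cannot overwrite dissimilar individuals in one step, which is the motivation for crowding as a guard against premature convergence. It also means each population slot's SSE never increases from one generation to the next.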

NSU Works

Unsupervised cryo-EM data clustering through adaptively constrained K-means algorithm

Author: Mao Youdong
Wu Jiayi
Xu Yaofang
Yin Chang-Cheng
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2016

In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering algorithm is widely used for unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Because of background noise in single-particle images and uncertainty in molecular orientations, the traditional K-means clustering algorithm may assign images to the wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development of clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alternative to the traditional K-means algorithm in single-particle cryo-EM analysis. Comment: 35 pages, 14 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

FigShare

A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

Author: M. Emre Celebi
Hassan A. Kingravi
Patricio A. Vela
Publication venue: Elsevier BV
Publication date: 10/09/2012

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods. Comment: 17 pages, 1 figure, 7 tables
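The abstract does not list the eight methods compared. As a hedged illustration of how the choice of initializer feeds into k-means, this sketch contrasts two well-known seeding schemes whose cost grows linearly with the number of points: Forgy (k random data points) and k-means++ (each further center sampled with probability proportional to its squared distance from the nearest center already chosen). Each is followed by Lloyd's batch k-means. The function names and the demo data are hypothetical, and this is not the paper's experimental protocol.

```python
import numpy as np


def sq_dists(data, centers):
    """Matrix of squared Euclidean distances, shape (n_points, n_centers)."""
    return ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def forgy_init(data, k, rng):
    """Forgy: choose k distinct data points uniformly at random."""
    return data[rng.choice(len(data), size=k, replace=False)].copy()


def kmeanspp_init(data, k, rng):
    """k-means++: uniform first pick, then D^2-weighted sampling for the rest."""
    centers = [data[rng.integers(len(data))]]
    for _ in range(1, k):
        d2 = sq_dists(data, np.array(centers)).min(axis=1)
        total = d2.sum()
        # Fall back to a uniform pick if every point already coincides with a center.
        probs = d2 / total if total > 0 else None
        centers.append(data[rng.choice(len(data), p=probs)])
    return np.array(centers)


def lloyd(data, centers, max_iter=100):
    """Batch k-means: alternate assignment and mean update until centers stop moving."""
    centers = centers.astype(float).copy()
    for _ in range(max_iter):
        labels = sq_dists(data, centers).argmin(axis=1)
        new = centers.copy()
        for j in range(len(centers)):
            members = data[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                new[j] = members.mean(axis=0)
        if np.allclose(new, centers):
            break
        centers = new
    sse = float(sq_dists(data, centers).min(axis=1).sum())
    return centers, sse


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(m, 0.3, size=(100, 2))
                      for m in ((0.0, 0.0), (5.0, 0.0), (2.5, 4.0))])
    for name, init in (("Forgy", forgy_init), ("k-means++", kmeanspp_init)):
        runs = [lloyd(data, init(data, 3, rng))[1] for _ in range(20)]
        print(f"{name}: best SSE {min(runs):.2f}, mean SSE {np.mean(runs):.2f}")
```

Lloyd's iterations never increase SSE, so any difference between initializers shows up as which local minimum each run settles into; repeating runs and comparing the best and mean SSE, as the demo does, is one simple way to observe that sensitivity.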

arXiv.org e-Print Archive

Crossref