29,921 research outputs found
Evaluating Heuristics and Crowding on Center Selection in K-Means Genetic Algorithms
Data clustering involves partitioning data points into clusters where data points within the same cluster have high similarity, but are dissimilar to the data points in other clusters. The k-means algorithm is among the most extensively used clustering techniques. Genetic algorithms (GA) have been successfully used to evolve successive generations of cluster centers. The primary goal of this research was to develop improved GA-based methods for center selection in k-means by using heuristic methods to improve the overall fitness of the initial population of chromosomes along with crowding techniques to avoid premature convergence. Prior to this research, no rigorous systematic examination of the use of heuristics and crowding methods in this domain had been performed.
The evaluation included computational experiments involving repeated runs of the genetic algorithm in which values that affect heuristics or crowding were systematically varied and the results analyzed. Genetic algorithm performance under the various configurations was analyzed based upon (1) the fitness of the partitions produced, and by (2) the overall time it took the GA to converge to good solutions. Two heuristic methods for initial center seeding were tested: Density and Separation. Two crowding techniques were evaluated on their ability to prevent premature convergence: Deterministic and Parent Favored Hybrid local tournament selection.
Based on the experiment results, the Density method provides no significant advantage over random seeding either in discovering quality partitions or in more quickly evolving better partitions. The Separation method appears to result in an increased probability of the genetic algorithm finding slightly better partitions in slightly fewer generations, and to more quickly converge to quality partitions. Both local tournament selection techniques consistently allowed the genetic algorithm to find better quality partitions than roulette-wheel sampling. Deterministic selection consistently found better quality partitions in fewer generations than Parent Favored Hybrid. The combination of Separation center seeding and Deterministic selection performed better than any other combination, achieving the lowest mean best SSE value more than twice as often as any other combination. On all 28 benchmark problem instances, the combination identified solutions that were at least as good as any identified by extant methods
Recommended from our members
A niching memetic algorithm for simultaneous clustering and feature selection
Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data
- …