Search CORE

10,249 research outputs found

Finding groups in data: Cluster analysis with ants

Author: Berger
Bonabeau
Bonabeau
Brito
Brucker
Chu
Deneubourg
Deneubourg
Dorigo
Dubes
Ester
Franks
Ganti
Gibson
Guha
Halkidi
Handl
Hansen
Jain
Karypis
Kaufman
Kennedy
Lee
Lumer
MacQueen
Ng
Oprisan
Rijsbergen
Urszula Boryczka
Welch
Zait
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009
Field of study

Wepresent in this paper a modification of Lumer and Faieta’s algorithm for data clustering. This approach mimics the clustering behavior observed in real ant colonies. This algorithm discovers automatically clusters in numerical data without prior knowledge of possible number of clusters. In this paper we focus on ant-based clustering algorithms, a particular kind of a swarm intelligent system, and on the effects on the final clustering by using during the classification differentmetrics of dissimilarity: Euclidean, Cosine, and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods, such as e.g. k-means, etc. Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability and other key features that would make such algorithms mature tools for data mining. As a case study, this paper focus on the behavior of clustering procedures in those new approaches. The proposed algorithm and its modifications are evaluated in a number of well-known benchmark datasets. Empirical results clearly show that ant-based clustering algorithms performs well when compared to another techniques

Crossref

Bournemouth University Research Online

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm of choice in many application domains primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains to be its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196

arXiv.org e-Print Archive

Crossref

A link density clustering algorithm based on automatically selecting density peaks for overlapping community detection

Author: Huang Lan
Ma Qin
Pang Wei
Wang Guishen
Wang Yan
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 30/09/2016
Field of study

Peer reviewedPostprin

Aberdeen University Research

Heriot Watt Pure