478 research outputs found

Iterative Optimization and Simplification of Hierarchical Clusterings

Author: Fisher D.
Publication date: 01/01/1995

Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and the control strategy used to search the space of clusterings. Ideally, the search strategy should consistently construct clusterings of high quality, but be computationally inexpensive as well. In general, we cannot have it both ways, but we can partition the search so that a system inexpensively constructs a 'tentative' clustering for initial examination, followed by iterative optimization, which continues to search in the background for improved clusterings. Given this motivation, we evaluate an inexpensive strategy for creating initial clusterings, coupled with several control strategies for iterative optimization, each of which repeatedly modifies an initial clustering in search of a better one. One of these methods appears novel as an iterative optimization strategy in clustering contexts. Once a clustering has been constructed it is judged by analysts, often according to task-specific criteria. Several authors have abstracted these criteria and posited a generic performance task akin to pattern completion, where the error rate over completed patterns is used to 'externally' judge clustering utility. Given this performance task, we adapt resampling-based pruning strategies used by supervised learning systems to the task of simplifying hierarchical clusterings, thus promising to ease post-clustering analysis. Finally, we propose a number of objective functions, based on attribute-selection measures for decision-tree induction, that might perform well on the error rate and simplicity dimensions.

Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

Optimizing an Organized Modularity Measure for Topographic Graph Clustering: a Deterministic Annealing Approach

Author: Fabrice Rossi, Nathalie Villa-Vialaneix
Publication venue: 'Elsevier BV'
Publication date: 01/03/2010

This paper proposes an organized generalization of Newman and Girvan's modularity measure for graph clustering. Optimized via a deterministic annealing scheme, this measure produces topologically ordered graph clusterings that lead to faithful and readable graph representations based on clustering-induced graphs. Topographic graph clustering provides an alternative to more classical solutions in which a standard graph clustering method is applied to build a simpler graph that is then represented with a graph layout algorithm. A comparative study on four real-world graphs ranging from 34 to 1,133 vertices demonstrates the value of the proposed approach relative to classical solutions and to self-organizing maps for graphs.

arXiv.org e-Print Archive

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse

Autonomous clustering using rough set theory

Author: Charlotte Bean, Chandra Kambhampati
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

This paper proposes a clustering technique that minimises the need for subjective human intervention and is based on elements of rough set theory. The proposed algorithm is unified in its approach to clustering and makes use of both local and global data properties to obtain clustering solutions. It handles single-type and mixed attribute data sets with ease, and results from three data sets of single and mixed attribute types are used to illustrate the technique and establish its efficiency.

Repository@Hull - Worktribe

Crossref

Warwick Research Archives Portal Repository

A Short Survey on Data Clustering Algorithms

Author: Wong Ka-Chun
Publication date: 25/11/2015

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains, for instance bioinformatics, speech recognition, and financial analysis. Formally speaking, given a set of data instances, a clustering algorithm is expected to divide the set of data instances into subsets that maximize the intra-subset similarity and inter-subset dissimilarity, where a similarity measure is defined beforehand. In this work, the state-of-the-art clustering algorithms are reviewed from design concept to methodology. Different clustering paradigms are discussed. Advanced clustering algorithms are also discussed. After that, the existing clustering evaluation metrics are reviewed. A summary with future insights is provided at the end.
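
The formal goal stated in this abstract (high similarity within subsets, high dissimilarity between them) can be illustrated with a minimal Python sketch. It is not taken from the survey: the function name `mean_intra_inter_distance` is hypothetical, and Euclidean distance is assumed as the predefined (dis)similarity measure.

```python
from itertools import combinations
import math


def mean_intra_inter_distance(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    points: sequence of equal-length numeric tuples.
    labels: cluster label for each point, in the same order.
    Assumption: Euclidean distance stands in for the dissimilarity measure.
    """
    intra, inter = [], []
    # Visit every unordered pair of instances exactly once.
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        (intra if lp == lq else inter).append(d)

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(intra), mean(inter)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
intra, inter = mean_intra_inter_distance(points, labels)
print(intra, inter)
```

Under this reading, a good partition has a small first value and a large second value; the evaluation metrics reviewed in the survey may combine such quantities differently.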

arXiv.org e-Print Archive

Crossref

Comparison and validation of community structures in complex networks

Author: Mika Gustafsson, Michael Hörnquist, Anna Lombardi
Publication venue: 'Elsevier BV'
Publication date: 10/01/2006

The issue of partitioning a network into communities has attracted a great deal of attention recently. Most authors seem to equate this issue with that of finding the maximum value of the modularity, as defined by Newman. Since the problem formulated this way is NP-hard, most effort has gone into the construction of search algorithms, and less into the question of other measures of community structure, similarities between various partitionings, and validation with respect to external information. Here we concentrate on a class of computer-generated networks and on three well-studied real networks which constitute a benchmark for network studies: the karate club, the US college football teams, and a gene network of yeast. We utilize some standard ways of clustering data (originally not designed for finding community structures in networks) and show that these classical methods sometimes outperform the newer ones. We discuss various measures of the strength of the modular structure, and show their features and drawbacks by example. Further, we compare different partitions by applying some graph-theoretic concepts of distance, which indicate that one of the quality measures of the degree of modularity corresponds quite well with the distance from the true partition. Finally, we introduce a way to validate the partitionings with respect to external data when the nodes are classified but the network structure is unknown. This is possible here since we know everything about the computer-generated networks, as well as the historical answer to how the karate club and the football teams are partitioned in reality. The partitioning of the gene network is validated by use of the Gene Ontology database, where we show that a community in general corresponds to a biological process.

Comment: To appear in Physica A; 25 pages
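
For readers unfamiliar with the modularity that this abstract refers to, the following is a minimal Python sketch of the standard Newman-Girvan formula for an undirected, unweighted graph, Q = sum over communities c of (e_c / m - (d_c / 2m)^2), where m is the number of edges, e_c the number of edges inside c, and d_c the total degree of the nodes in c. The function name and the toy graph are illustrative assumptions, not material from the paper.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition (illustrative sketch).

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    intra = defaultdict(int)    # e_c: edges with both endpoints in community c
    deg_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        deg_sum[community[u]] += 1
        deg_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}
print(modularity(edges, split), modularity(edges, merged))
```

Splitting the toy graph at the bridge scores higher than placing all nodes in one community, which is the behaviour a search for maximum modularity exploits.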

arXiv.org e-Print Archive

Crossref

CERN Document Server