Methods of Hierarchical Clustering
We survey agglomerative hierarchical clustering algorithms and discuss
efficient implementations that are available in R and other software
environments. We look at hierarchical self-organizing maps, and mixture models.
We review grid-based clustering, focusing on hierarchical density-based
approaches. Finally we describe a recently developed very efficient (linear
time) hierarchical clustering algorithm, which can also be viewed as a
hierarchical grid-based algorithm.
Comment: 21 pages, 2 figures, 1 table, 69 references
Communities in Networks
We survey some of the concepts, methods, and applications of community
detection, which has become an increasingly important area of network science.
To help ease newcomers into the field, we provide a guide to available
methodology and open problems, and discuss why scientists from diverse
backgrounds are interested in these problems. As a running theme, we emphasize
the connections of community detection to problems in statistical physics and
computational optimization.
Comment: survey/review article on community structure in networks; published
version is available at
http://people.maths.ox.ac.uk/~porterm/papers/comnotices.pd
A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm
K-means is undoubtedly the most widely used partitional clustering algorithm.
Unfortunately, due to its gradient descent nature, this algorithm is highly
sensitive to the initial placement of the cluster centers. Numerous
initialization methods have been proposed to address this problem. In this
paper, we first present an overview of these methods with an emphasis on their
computational efficiency. We then compare eight commonly used linear time
complexity initialization methods on a large and diverse collection of data
sets using various performance criteria. Finally, we analyze the experimental
results using non-parametric statistical tests and provide recommendations for
practitioners. We demonstrate that popular initialization methods often perform
poorly and that there are in fact strong alternatives to these methods.
Comment: 17 pages, 1 figure, 7 tables
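The sensitivity to initial centers described in this abstract can be illustrated with a small sketch. This is not the paper's experimental setup: the four-point data set, the "first k points" baseline, and the choice of maximin (farthest-first) seeding as the linear-time alternative are all assumptions made here for illustration, and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def maximin_init(points, k):
    """Farthest-first seeding (illustrative choice, not taken from the abstract).

    Starts from the first point, then repeatedly adds the point whose
    distance to its nearest already-chosen center is largest.
    """
    centers = [points[0]]
    nearest = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, sq_dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return centers


def kmeans(points, centers, max_iter=100):
    """Plain batch k-means; returns final centers and sum of squared errors."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Hypothetical toy data: corners of a wide, flat rectangle, k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Baseline: take the first k points, which sit on the same short edge.
centers_naive, sse_naive = kmeans(points, points[:2])

# Alternative: maximin seeding picks opposite corners.
centers_maximin, sse_maximin = kmeans(points, maximin_init(points, 2))

print(sse_naive, sse_maximin)
```

With the baseline seeds the algorithm settles on a top/bottom split and stops there, while the maximin seeds lead to the left/right split with a much lower error. Both runs use the same update rule; only the starting centers differ.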
Clustering Algorithms For High Dimensional Data – A Survey Of Issues And Existing Approaches
Clustering is a prominent data mining technique for grouping data into clusters based on distance measures. With the growth of high dimensional data such as microarray gene expression data, clustering runs into a basic problem: similarity between objects measured in the full dimensional space is often invalid, because the data contain different types of attributes. As a result, grouping high dimensional data into clusters is inaccurate and often falls short of expectations when the dimension of the dataset is high. The problem is now attracting tremendous attention in research and development. To address the performance of clustering in high dimensional data, issues such as dimensionality reduction, redundancy elimination, subspace clustering, co-clustering and data labeling for clusters need to be analyzed and improved. In this paper, we present a brief comparison of existing algorithms that focus mainly on clustering high dimensional data.
Local, multi-resolution detection of network communities by Markovian dynamics
Complex networks are used to represent systems from many disciplines,
including biology, physics, medicine, engineering and the social sciences.
Many real-world networks are organised into densely connected communities,
whose composition gives some insight into the underlying network.
Most approaches for finding such communities do so by partitioning the
network into disjoint subsets, at the cost of requiring global information
and that nodes belong to exactly one community. In recent years, some effort
has been devoted towards the development of local methods, but these
are either limited in resolution or ignore relevant network features such as
directedness.
Here we show that introducing a dynamic process onto the network allows
us to define a community quality function, severability, which is inherently
multi-resolution, takes into account edge-weight and direction, can accommodate
overlapping communities and orphan nodes and, crucially, does not
require global knowledge. Both constructive and real-world examples
(drawn from fields as diverse as image segmentation, metabolic networks
and word association) are used to illustrate the characteristics of this approach.
We envision this approach as a starting point for the future analysis
of both evolving networks and networks too large to be readily analysed as
a whole (e.g. the World Wide Web).
Open Access