
    Cluster validity in clustering methods


    Methods of Hierarchical Clustering

    We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally, we describe a recently developed, very efficient (linear time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm. Comment: 21 pages, 2 figures, 1 table, 69 references
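To make the agglomerative idea concrete, here is a minimal sketch of bottom-up clustering with single linkage on one-dimensional points. It is a naive cubic-time illustration, not one of the efficient implementations the survey discusses; the function name `single_linkage` and the sample data are assumptions made for this example.

```python
def single_linkage(points, num_clusters):
    """Naive agglomerative clustering of 1-D points with single linkage.

    Illustrative only: each cluster is a list of indices into `points`, and
    the two closest clusters are merged until `num_clusters` remain.
    """
    clusters = [[i] for i in range(len(points))]  # start with singletons
    merges = []  # record of (cluster_a, cluster_b, linkage_distance)
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters, merges


# Hypothetical sample data: two tight pairs and one distant point.
sample = [1.0, 2.0, 6.0, 7.0, 15.0]
final_clusters, merge_history = single_linkage(sample, num_clusters=2)
print(final_clusters)
for left, right, dist in merge_history:
    print(left, "+", right, "at distance", dist)
```

The recorded merge history is the information a dendrogram would display; stopping at a different `num_clusters` corresponds to cutting that tree at a different height.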

    Communities in Networks

    We survey some of the concepts, methods, and applications of community detection, which has become an increasingly important area of network science. To help ease newcomers into the field, we provide a guide to available methodology and open problems, and discuss why scientists from diverse backgrounds are interested in these problems. As a running theme, we emphasize the connections of community detection to problems in statistical physics and computational optimization. Comment: survey/review article on community structure in networks; published version is available at http://people.maths.ox.ac.uk/~porterm/papers/comnotices.pd

    A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

    K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods. Comment: 17 pages, 1 figure, 7 tables
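The sensitivity to initial centers mentioned in this abstract can be seen in a small sketch. The code below is a plain Lloyd-style k-means written for illustration; it does not reproduce any of the eight initialization methods the paper compares. The function names, the four-point data set, and both starting configurations are assumptions chosen so that the two runs converge to different solutions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)),
                key=lambda j: squared_distance(p, centers[j]))
            for p in points]


def lloyd_kmeans(points, initial_centers, max_iter=100):
    """Alternate assignment and mean-update steps until the centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


# Hypothetical data: corners of a 4-by-1 rectangle, two clusters wanted.
data = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start with one center on each short side: converges to the left/right split.
_, good_labels, good_sse = lloyd_kmeans(data, [(0, 0), (4, 0)])

# Start with both centers on the vertical midline: stuck in a top/bottom split.
_, bad_labels, bad_sse = lloyd_kmeans(data, [(2, 0), (2, 1)])

print(good_labels, good_sse)
print(bad_labels, bad_sse)
```

Both runs are valid fixed points of the iteration, but the second has a much larger sum of squared errors, which is the kind of poor local optimum that initialization methods aim to avoid.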

    Clustering Algorithms For High Dimensional Data – A Survey Of Issues And Existing Approaches

    Clustering is the most prominent data mining technique used for grouping data into clusters based on distance measures. With the growth of high-dimensional data such as microarray gene expression data, grouping objects into clusters runs into a problem: similarity between objects measured in the full-dimensional space is often invalid because the data contain different types of attributes. As a result, clustering is inaccurate, and may fall short of expectations, when the dimensionality of the dataset is high. This problem is now attracting considerable research and development attention. To address the performance of clustering on high-dimensional data, issues such as dimensionality reduction, redundancy elimination, subspace clustering, co-clustering and data labeling for clusters need to be analyzed and improved. In this paper, we present a brief comparison of existing algorithms that focus mainly on clustering high-dimensional data.

    Local, multi-resolution detection of network communities by Markovian dynamics

    Complex networks are used to represent systems from many disciplines, including biology, physics, medicine, engineering and the social sciences. Many real-world networks are organised into densely connected communities, whose composition gives some insight into the underlying network. Most approaches for finding such communities do so by partitioning the network into disjoint subsets, at the cost of requiring global information and that nodes belong to exactly one community. In recent years, some effort has been devoted towards the development of local methods, but these are either limited in resolution or ignore relevant network features such as directedness. Here we show that introducing a dynamic process onto the network allows us to define a community quality function, severability, which is inherently multi-resolution, takes into account edge-weight and direction, can accommodate overlapping communities and orphan nodes and, crucially, does not require global knowledge. Both constructive and real-world examples (drawn from fields as diverse as image segmentation, metabolic networks and word association) are used to illustrate the characteristics of this approach. We envision this approach as a starting point for the future analysis of both evolving networks and networks too large to be readily analysed as a whole (e.g. the World Wide Web).