391 research outputs found

Cluster validity in clustering methods

Author: Zhao Qinpei
Publication venue: University of Eastern Finland

Methods of Hierarchical Clustering

Author: Contreras Pedro
Murtagh Fionn
Publication date: 01/01/2011

We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps, and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally we describe a recently developed very efficient (linear time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm. Comment: 21 pages, 2 figures, 1 table, 69 references

arXiv.org e-Print Archive

Royal Holloway Research Online

Royal Holloway - Pure

Communities in Networks

Author: Jukka-Pekka Onnela
Mason A. Porter
Peter J. Mucha
Publication date: 01/01/2009

We survey some of the concepts, methods, and applications of community detection, which has become an increasingly important area of network science. To help ease newcomers into the field, we provide a guide to available methodology and open problems, and discuss why scientists from diverse backgrounds are interested in these problems. As a running theme, we emphasize the connections of community detection to problems in statistical physics and computational optimization. Comment: survey/review article on community structure in networks; published version is available at http://people.maths.ox.ac.uk/~porterm/papers/comnotices.pd

arXiv.org e-Print Archive

CiteSeerX

Oxford University Research Archive

A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

Author: M. Emre Celebi
Hassan A. Kingravi
Patricio A. Vela
Publication venue: Elsevier BV
Publication date: 10/09/2012

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods. Comment: 17 pages, 1 figure, 7 tables

arXiv.org e-Print Archive

Crossref
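
The sensitivity to initial centers mentioned in the abstract above can be illustrated with a small sketch. This is not code from the paper: it is a plain Lloyd-style k-means written for illustration, and the four-point data set, the function names, and the two hand-picked initializations are all assumptions chosen so that the effect is easy to see.

```python
# Illustrative sketch only (not from the paper): Lloyd-style k-means on a toy
# data set, run from two hand-picked initializations. All names and data here
# are assumptions made for the example.

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd_kmeans(points, centers, max_iter=100):
    """Alternate assignment and mean-update steps until the centers stop moving.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(c)
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse

# Four corners of a 4 x 1 rectangle; the natural 2-clustering is left vs right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initial centers taken from opposite ends of the rectangle.
centers_good, sse_good = lloyd_kmeans(points, [(0, 0), (4, 1)])

# Initial centers both taken from the left edge.
centers_bad, sse_bad = lloyd_kmeans(points, [(0, 0), (0, 1)])

print(centers_good, sse_good)
print(centers_bad, sse_bad)
```

In this toy case the second run settles on a top/bottom split and stops there, because neither step of the iteration can reduce the error further from that configuration. Its final error is much higher than that of the left/right split found by the first run.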

Clustering Algorithms For High Dimensional Data – A Survey Of Issues And Existing Approaches

Author: Babu B. Hari
Chandra N. Subash
Gopal T. Venu
Publication venue: Institute for Project Management Pvt. Ltd
Publication date: 05/09/2020

Clustering is the most prominent data mining technique for grouping data into clusters based on distance measures. With the growth of high-dimensional data such as microarray gene expression data, clustering faces a problem: similarity between objects measured in the full-dimensional space is often invalid because the data contain different types of attributes. As a result, clustering is inaccurate, and may fall short of expectations, when the dimensionality of the data set is high. The problem is now attracting considerable attention in research and development. To improve the performance of clustering on high-dimensional data, issues such as dimensionality reduction, redundancy elimination, subspace clustering, co-clustering, and data labeling for clusters need to be analyzed and improved. In this paper, we present a brief comparison of existing algorithms that focus mainly on clustering high-dimensional data.

Interscience Research Network

Local, multi-resolution detection of network communities by Markovian dynamics

Author: Yu Yun William
Publication venue: Mathematics, Imperial College London
Publication date: 01/04/2014

Complex networks are used to represent systems from many disciplines, including biology, physics, medicine, engineering and the social sciences. Many real-world networks are organised into densely connected communities, whose composition gives some insight into the underlying network. Most approaches for finding such communities do so by partitioning the network into disjoint subsets, at the cost of requiring global information and that nodes belong to exactly one community. In recent years, some effort has been devoted towards the development of local methods, but these are either limited in resolution or ignore relevant network features such as directedness. Here we show that introducing a dynamic process onto the network allows us to define a community quality function, severability, which is inherently multi-resolution, takes into account edge-weight and direction, can accommodate overlapping communities and orphan nodes and, crucially, does not require global knowledge. Both constructive and real-world examples, drawn from fields as diverse as image segmentation, metabolic networks and word association, are used to illustrate the characteristics of this approach. We envision this approach as a starting point for the future analysis of both evolving networks and networks too large to be readily analysed as a whole (e.g. the World Wide Web). Open Access

Spiral - Imperial College Digital Repository

Knowledge discovery in high dimensional data

Author: Tasoulis Sotiris
Τασούλης Σωτήρης
Publication date: 01/01/2013

University of Thessaly Institutional Repository