A new hierarchical clustering algorithm to identify non-overlapping like-minded communities

Author: Adhya Hindol
Deepak Talasila Sai
Gullapalli Bhanuteja
Kejriwal Shyamal
Shannigrahi Saswata
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/02/2016
A network has a non-overlapping community structure if the nodes of the network can be partitioned into disjoint sets such that each node in a set is densely connected to other nodes inside the set and sparsely connected to the nodes outside it. There are many metrics to validate the efficacy of such a structure, such as clustering coefficient, betweenness, centrality, modularity and like-mindedness. Many methods have been proposed to optimize some of these metrics, but none of these works well on the recently introduced metric like-mindedness. To solve this problem, we propose a behavioral property based algorithm to identify communities that optimize the like-mindedness metric and compare its performance on this metric with other behavioral data based methodologies as well as community detection methods that rely only on structural data. We execute these algorithms on real-life datasets of Filmtipset and Twitter and show that our algorithm performs better than the existing algorithms with respect to the like-mindedness metric.
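To make the idea of scoring a non-overlapping partition concrete, here is a minimal sketch of one of the structural metrics named above, Newman modularity, for an undirected, unweighted graph. It does not implement the like-mindedness metric or the paper's algorithm, whose definitions are not given in the abstract; the function name, the edge-list input and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges:        list of (u, v) pairs, each undirected edge listed once
    community_of: dict mapping every node to its community label
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1

    total_degree = defaultdict(int)  # sum of node degrees per community
    for node, d in degree.items():
        total_degree[community_of[node]] += d

    # Q = sum over communities of (internal edge share - expected share)
    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

On this toy graph the two-triangle partition scores higher than putting every node in one community, which matches the "dense inside, sparse outside" definition in the abstract.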

arXiv.org e-Print Archive

Kernel Spectral Clustering and applications

In this chapter we review the main literature related to kernel spectral clustering (KSC), an approach to clustering cast within a kernel-based optimization setting. KSC represents a least-squares support vector machine based formulation of spectral clustering described by a weighted kernel PCA objective. Just as in the classifier case, the binary clustering model is expressed by a hyperplane in a high dimensional space induced by a kernel. In addition, the multi-way clustering can be obtained by combining a set of binary decision functions via an Error Correcting Output Codes (ECOC) encoding scheme. Because of its model-based nature, the KSC method encompasses three main steps: training, validation, testing. In the validation stage model selection is performed to obtain tuning parameters, like the number of clusters present in the data. This is a major advantage compared to classical spectral clustering where the determination of the clustering parameters is unclear and relies on heuristics. Once a KSC model is trained on a small subset of the entire data, it is able to generalize well to unseen test points. Beyond the basic formulation, sparse KSC algorithms based on the Incomplete Cholesky Decomposition (ICD) and $L_0$, $L_1$, $L_0 + L_1$, Group Lasso regularization are reviewed. In that respect, we show how it is possible to handle large scale data. Also, two possible ways to perform hierarchical clustering and a soft clustering method are presented. Finally, real-world applications such as image segmentation, power load time-series clustering, document clustering and big data learning are considered. Comment: chapter contribution to the book "Unsupervised Learning Algorithms"
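The step of combining binary decision functions through an ECOC encoding can be illustrated with a small sketch. It assumes the binary decision values for a point have already been computed and that a codebook of sign patterns, one per cluster, is available. The codebook, the scores and all function names here are hypothetical, and the sketch shows only nearest-codeword decoding by Hamming distance, not the KSC training procedure.

```python
def sign_pattern(scores):
    """Binarize real-valued decision function outputs into a +1/-1 codeword."""
    return tuple(1 if s >= 0 else -1 for s in scores)


def hamming(a, b):
    """Number of positions at which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_cluster(scores, codebook):
    """Return the cluster whose codeword is closest to the point's sign pattern."""
    pattern = sign_pattern(scores)
    return min(codebook, key=lambda label: hamming(pattern, codebook[label]))


# Hypothetical codebook: 3 clusters encoded by 2 binary decision functions.
codebook = {0: (1, 1), 1: (-1, 1), 2: (1, -1)}

# Hypothetical decision values for three test points.
labels = [
    assign_cluster((0.8, 0.3), codebook),
    assign_cluster((-0.5, 0.2), codebook),
    assign_cluster((0.4, -0.9), codebook),
]
```

With k clusters, k - 1 binary decision functions are enough to give each cluster a distinct codeword in this scheme, which is why two functions suffice for the three clusters above.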

arXiv.org e-Print Archive

Meta Clustering

Author: Caruana Rich
Elhawary Mohamed
Nguyen Nam
Smith Casey
Publication venue: 'SAGE Publications'
Publication date: 29/09/2006
Clustering is ill-defined. Unlike supervised learning where labels lead to crisp performance criteria such as accuracy and squared error, clustering quality depends on how the clusters will be used. Devising clustering criteria that capture what users need is difficult. Most clustering algorithms search for one optimal clustering based on a pre-specified clustering criterion. Once that clustering has been determined, no further clusterings are examined. Our approach differs in that we search for many alternate reasonable clusterings of the data, and then allow users to select the clustering(s) that best fit their needs. Any reasonable partitioning of the data is potentially useful for some purpose, regardless of whether or not it is optimal according to a specific clustering criterion. Our approach first finds a variety of reasonable clusterings. It then clusters this diverse set of clusterings so that users need only examine a small number of qualitatively different clusterings. In this paper, we present methods for automatically generating a diverse set of alternate clusterings, as well as methods for grouping clusterings into meta clusters. We evaluate meta clustering on four test problems, and then apply meta clustering to two case studies. Surprisingly, clusterings that would be of most interest to users often are not very compact clusterings.
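The "cluster the clusterings" step can be sketched as follows. This is only an illustration: it assumes a Rand-index-based distance between clusterings and a simple single-linkage grouping with a fixed threshold, neither of which is confirmed by the abstract as the authors' choice. The function names, the threshold and the toy clusterings are hypothetical.

```python
from itertools import combinations


def rand_distance(a, b):
    """Fraction of point pairs on which two clusterings disagree (1 - Rand index).

    a, b: lists of cluster labels, one label per data point, same length.
    """
    pairs = list(combinations(range(len(a)), 2))
    disagreements = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return disagreements / len(pairs)


def group_clusterings(clusterings, threshold):
    """Single-linkage grouping: link clusterings whose distance is <= threshold."""
    parent = list(range(len(clusterings)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(clusterings)), 2):
        if rand_distance(clusterings[i], clusterings[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(clusterings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Four hypothetical clusterings of the same four data points.
clusterings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first, labels swapped
    [0, 0, 1, 2],  # splits one cluster of the first
    [0, 1, 0, 1],  # qualitatively different partition
]
meta_clusters = group_clusterings(clusterings, threshold=0.2)
```

Here the first three clusterings fall into one meta cluster and the fourth stands alone, so a user would need to inspect only two representatives instead of four.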

AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number

Author: Cooper James B
Newman Aaron M
Publication venue: BioMed Central
Publication date: 01/01/2010
Background: Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry.

Results: We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four.

Conclusions: By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.

Springer - Publisher Connector

Directory of Open Access Journals