Uncovering Group Level Insights with Accordant Clustering

Authors: Margareta Ackerman, Amit Dhurandhar, Xiang Wang
Publication date: 07/04/2017

Clustering is a widely used data mining tool that aims to discover partitions of similar items in data. We introduce a new clustering paradigm, accordant clustering, which enables the discovery of (predefined) group-level insights. Unlike previous clustering paradigms, which aim to understand relationships among individual members, accordant clustering aims to uncover insights at the group level through the analysis of the groups' members. Group-level insight can often support a call to action that previous clustering techniques cannot inform. We propose the first accordant clustering algorithm and prove that it finds near-optimal solutions when the data possesses inherent cluster structure. The insights revealed by accordant clusterings enabled experts in medicine to isolate successful treatments for a neurodegenerative disease, and experts in finance to discover patterns of unnecessary spending.

Comment: accepted to SDM 2017 (oral

arXiv.org e-Print Archive

Crossref

Bad Communities with High Modularity

Authors: Athanasios Kehagias, Leonidas Pitsoulis
Publication date: 01/01/2013

In this paper we discuss some problematic aspects of Newman's modularity function QN. Given a graph G, the modularity of G can be written as QN = Qf - Q0, where Qf is the intracluster edge fraction of G and Q0 is the expected intracluster edge fraction of the null model, i.e., a randomly connected graph with the same expected degree distribution as G. It follows that the maximization of QN must accommodate two factors pulling in opposite directions: Qf favors a small number of clusters, and Q0 favors many balanced clusters (i.e., clusters with approximately equal degrees). In certain cases the Q0 term can cause overestimation of the true number of clusters; this is the opposite of the well-known underestimation effect caused by the "resolution limit" of modularity. We illustrate the overestimation effect by constructing families of graphs with a "natural" community structure which, however, does not maximize modularity. In fact, we prove that we can always find a graph G with a "natural clustering" V of G and another, balanced clustering U of G such that (i) the pair (G; U) has higher modularity than (G; V) and (ii) V and U are arbitrarily different.

Comment: Significantly improved version of the paper, with the help of L. Pitsouli
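The decomposition QN = Qf - Q0 in the abstract above can be illustrated with a minimal sketch. It assumes an unweighted, undirected graph given as an edge list and uses the standard form of the null-model term, Q0 = sum over clusters of (cluster degree sum / 2m)^2; the function name `modularity_terms` and the toy graph are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def modularity_terms(edges, cluster_of):
    """Return (Qf, Q0, QN) for an undirected, unweighted graph.

    edges      -- list of (u, v) pairs, one entry per undirected edge
    cluster_of -- dict mapping each node to its cluster label
    Assumption: Q0 uses the standard degree-based null model.
    """
    m = len(edges)
    intra_edges = 0
    degree_sum = defaultdict(int)  # total degree of the nodes in each cluster
    for u, v in edges:
        if cluster_of[u] == cluster_of[v]:
            intra_edges += 1
        degree_sum[cluster_of[u]] += 1
        degree_sum[cluster_of[v]] += 1
    q_f = intra_edges / m  # observed intracluster edge fraction
    q_0 = sum((d / (2 * m)) ** 2 for d in degree_sum.values())  # expected fraction
    return q_f, q_0, q_f - q_0


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "all" for node in range(6)}

print(modularity_terms(edges, two_clusters))
print(modularity_terms(edges, one_cluster))
```

Putting every node in one cluster maximizes Qf but also drives Q0 to 1, so QN is 0; splitting into the two triangles lowers Qf only slightly while halving Q0. This is the opposing pull between the two terms that the abstract describes.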

arXiv.org e-Print Archive

CiteSeerX

EDP Sciences OAI-PMH repository (1.2.0)

Partitioning Complex Networks via Size-constrained Clustering

Authors: B. Hendrickson, C. Chevalier, C. Walshaw, G. Karypis, I. Safro, L.F. Costa, P. Sanders, R. Diekmann, T.N. Bui
Publication date: 01/01/2014

The most commonly used method to tackle the graph partitioning problem in practice is the multilevel approach. During a coarsening phase, a multilevel graph partitioning algorithm reduces the graph size by iteratively contracting nodes and edges until the graph is small enough to be partitioned by some other algorithm. A partition of the input graph is then constructed by successively transferring the solution to the next finer graph and applying a local search algorithm to improve the current solution. In this paper, we describe a novel approach that partitions graphs effectively, especially networks with a highly irregular structure. More precisely, our algorithm coarsens the graph by iteratively contracting size-constrained clusterings that are computed using a label propagation algorithm. The same algorithm that provides the size-constrained clusterings can also be used during uncoarsening as a fast and simple local search algorithm. Depending on its configuration, the algorithm computes either partitions of very high quality that outperform all competitors, or partitions comparable in quality to the best competitor, hMetis, while being nearly an order of magnitude faster on average. The fastest configuration partitions the largest graph available to us, with 3.3 billion edges, on a single machine in about ten minutes while cutting fewer than half as many edges as the fastest competitor, kMetis.
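The size-constrained label propagation step mentioned in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the graph is unweighted, nodes are visited in sorted order, ties keep the current label, and the names `size_constrained_label_propagation`, `adj`, `max_size`, and `rounds` are hypothetical.

```python
from collections import defaultdict


def size_constrained_label_propagation(adj, max_size, rounds=5):
    """Cluster nodes by label propagation with an upper bound on cluster size.

    adj      -- dict mapping each node to a list of its neighbors
    max_size -- maximum number of nodes allowed in one cluster (assumed bound)
    rounds   -- maximum number of passes over all nodes
    """
    label = {v: v for v in adj}  # every node starts in its own cluster
    size = defaultdict(int)
    for v in adj:
        size[v] = 1

    for _ in range(rounds):
        moved = False
        for v in sorted(adj):
            # Count how many neighbors of v carry each label.
            conn = defaultdict(int)
            for u in adj[v]:
                conn[label[u]] += 1
            current = label[v]
            best, best_conn = current, conn.get(current, 0)
            for c in sorted(conn):
                if c == current:
                    continue
                if size[c] + 1 > max_size:
                    continue  # moving v would violate the size constraint
                if conn[c] > best_conn:
                    best, best_conn = c, conn[c]
            if best != current:
                size[current] -= 1
                size[best] += 1
                label[v] = best
                moved = True
        if not moved:
            break  # converged: no node changed its label in this pass
    return label


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = size_constrained_label_propagation(adj, max_size=3)
print(labels)
```

In a multilevel scheme, each resulting cluster would then be contracted into a single node of the next coarser graph. The size bound keeps any one cluster from absorbing too much of the graph, which matters for the irregular networks the abstract targets.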

arXiv.org e-Print Archive

CiteSeerX

Crossref