Uncovering Group Level Insights with Accordant Clustering
Clustering is a widely-used data mining tool, which aims to discover
partitions of similar items in data. We introduce a new clustering paradigm,
\emph{accordant clustering}, which enables the discovery of (predefined) group
level insights. Unlike previous clustering paradigms that aim to understand
relationships amongst the individual members, the goal of accordant clustering
is to uncover insights at the group level through the analysis of their
members. Group level insight can often support a call to action that cannot be
informed through previous clustering techniques. We propose the first accordant
clustering algorithm, and prove that it finds near-optimal solutions when data
possesses inherent cluster structure. The insights revealed by accordant
clusterings enabled experts in the field of medicine to isolate successful
treatments for a neurodegenerative disease, and those in finance to discover
patterns of unnecessary spending.
Comment: accepted to SDM 2017 (oral
Bad Communities with High Modularity
In this paper we discuss some problematic aspects of Newman's modularity
function QN. Given a graph G, the modularity of G can be written as QN = Qf
- Q0, where Qf is the intracluster edge fraction of G and Q0 is the expected
intracluster edge fraction of the null model, i.e., a randomly connected graph
with the same expected degree distribution as G. It follows that the maximization
of QN must accommodate two factors pulling in opposite directions: Qf favors a
small number of clusters and Q0 favors many balanced (i.e., with approximately
equal degrees) clusters. In certain cases the Q0 term can cause overestimation
of the true cluster number; this is the opposite of the well-known
underestimation effect caused by the "resolution limit" of modularity. We illustrate
the overestimation effect by constructing families of graphs with a "natural"
community structure which, however, does not maximize modularity. In fact, we
prove that we can always find a graph G with a "natural clustering" V of G and
another, balanced clustering U of G such that (i) the pair (G; U) has higher
modularity than (G; V) and (ii) V and U are arbitrarily different.
Comment: Significantly improved version of the paper, with the help of L. Pitsouli
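The decomposition QN = Qf - Q0 described in this abstract can be made concrete with a small sketch. It assumes the standard form of Newman's modularity for an undirected, unweighted graph, where Q0 is the sum over clusters of the squared fraction of total degree held by each cluster. The function name `modularity_terms` and the toy graph are illustrative and do not come from the paper.

```python
from collections import defaultdict


def modularity_terms(edges, cluster_of):
    """Return (Qf, Q0, QN) for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    cluster_of: dict mapping every node to its cluster label.
    Assumes the standard Newman form; names here are illustrative.
    """
    m = len(edges)
    degree_sum = defaultdict(int)  # total degree of the nodes in each cluster
    intra = 0                      # edges with both endpoints in one cluster
    for u, v in edges:
        degree_sum[cluster_of[u]] += 1
        degree_sum[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            intra += 1
    qf = intra / m                                              # intracluster edge fraction
    q0 = sum((d / (2 * m)) ** 2 for d in degree_sum.values())   # null-model expectation
    return qf, q0, qf - q0


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
everything = {v: "a" for v in range(6)}

print(modularity_terms(edges, by_triangle))
print(modularity_terms(edges, everything))
```

On this toy graph the split by triangle gives Qf = 6/7 and Q0 = 1/2, so QN = 5/14, while the single-cluster partition gives Qf = Q0 = 1 and QN = 0. This shows the two opposing pulls: merging clusters raises Qf, but balanced clusters lower Q0.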
Partitioning Complex Networks via Size-constrained Clustering
The most commonly used method to tackle the graph partitioning problem in
practice is the multilevel approach. During a coarsening phase, a multilevel
graph partitioning algorithm reduces the graph size by iteratively contracting
nodes and edges until the graph is small enough to be partitioned by some other
algorithm. A partition of the input graph is then constructed by successively
transferring the solution to the next finer graph and applying a local search
algorithm to improve the current solution.
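As a rough illustration of a single coarsening step, the sketch below collapses every cluster of a given clustering into one coarse node and sums the weights of the edges running between clusters. The function name `contract`, the adjacency-dictionary representation, and the unit node weights on the input graph are assumptions made for this example, not details taken from the paper.

```python
from collections import defaultdict


def contract(adj, label):
    """Collapse each cluster into a single coarse node.

    adj: dict node -> dict neighbor -> edge weight (undirected, stored symmetrically).
    label: dict node -> cluster id.
    Returns (coarse_adj, node_weight), where node_weight counts the fine
    nodes merged into each coarse node (input nodes are assumed to weigh 1).
    """
    coarse = defaultdict(lambda: defaultdict(float))
    node_weight = defaultdict(int)
    for v, neighbors in adj.items():
        cv = label[v]
        node_weight[cv] += 1
        coarse[cv]  # touch the entry so clusters without outside edges still appear
        for u, w in neighbors.items():
            cu = label[u]
            if cu != cv:               # edges inside a cluster disappear
                coarse[cv][cu] += w    # parallel edges merge by summing weights
    return {c: dict(n) for c, n in coarse.items()}, dict(node_weight)
```

Applied to two triangles joined by one unit-weight edge, with each triangle as a cluster, this yields two coarse nodes of weight 3 connected by a single edge of weight 1. A multilevel scheme would repeat such a step until the graph is small, then project the partition back level by level.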
In this paper, we describe a novel approach to partition graphs effectively,
especially if the networks have a highly irregular structure. More precisely,
our algorithm provides graph coarsening by iteratively contracting
size-constrained clusterings that are computed using a label propagation
algorithm. The same algorithm that provides the size-constrained clusterings
can also be used during uncoarsening as a fast and simple local search
algorithm.
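The idea of a size-constrained label propagation can be sketched as follows. This is a minimal sketch under simplifying assumptions (unit node weights, a fixed number of rounds, random visiting order) and is not the authors' implementation; the function name `size_constrained_label_propagation` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def size_constrained_label_propagation(adj, max_size, rounds=5, seed=0):
    """Cluster nodes by label propagation with an upper bound on cluster size.

    adj: dict node -> dict neighbor -> edge weight (undirected, stored symmetrically).
    max_size: maximum number of nodes per cluster (must be at least 1).
    Returns dict node -> cluster label.
    """
    rng = random.Random(seed)
    label = {v: v for v in adj}    # every node starts in its own cluster
    size = defaultdict(int)
    for v in adj:
        size[v] = 1
    nodes = list(adj)
    for _ in range(rounds):
        rng.shuffle(nodes)
        moved = False
        for v in nodes:
            # Total edge weight connecting v to each neighboring cluster.
            strength = defaultdict(float)
            for u, w in adj[v].items():
                strength[label[u]] += w
            current = label[v]
            best, best_w = current, strength.get(current, 0.0)
            for c, w in strength.items():
                # Move only to a strictly better cluster that still has room.
                if c != current and w > best_w and size[c] + 1 <= max_size:
                    best, best_w = c, w
            if best != current:
                size[current] -= 1
                size[best] += 1
                label[v] = best
                moved = True
        if not moved:              # stop early once no node changes its label
            break
    return label
```

Because a node joins a cluster only when that cluster has room, no cluster ever exceeds `max_size`. The same move rule, started from an existing partition instead of singleton clusters, gives a simple local search of the kind the abstract mentions for the uncoarsening phase.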
Depending on the algorithm's configuration, we are able to compute partitions
of very high quality, outperforming all competitors, or partitions that are
comparable in quality to the best competitor, hMetis, while being
nearly an order of magnitude faster on average. The fastest configuration
partitions the largest graph available to us, with 3.3 billion edges, on a
single machine in about ten minutes while cutting fewer than half as many edges
as the fastest competitor, kMetis.
- …