12,603 research outputs found
Clustering and Community Detection with Imbalanced Clusters
Spectral clustering methods which are frequently used in clustering and
community detection applications are sensitive to the specific graph
constructions particularly when imbalanced clusters are present. We show that
ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced cluster sizes since they tend to emphasize cut sizes over cut
values. We propose a graph partitioning problem that seeks minimum cut
partitions under minimum size constraints on partitions to deal with imbalanced
cluster sizes. Our approach parameterizes a family of graphs by adaptively
modulating node degrees on a fixed node set, yielding a set of parameter
dependent cuts reflecting varying levels of imbalance. The solution to our
problem is then obtained by optimizing over these parameters. We present
rigorous limit cut analysis results to justify our approach and demonstrate the
superiority of our method through experiments on synthetic and real datasets
for data clustering, semi-supervised learning and community detection.Comment: Extended version of arXiv:1309.2303 with new applications. Accepted
to IEEE TSIP
On morphological hierarchical representations for image processing and spatial data clustering
Hierarchical data representations in the context of classi cation and data
clustering were put forward during the fties. Recently, hierarchical image
representations have gained renewed interest for segmentation purposes. In this
paper, we briefly survey fundamental results on hierarchical clustering and
then detail recent paradigms developed for the hierarchical representation of
images in the framework of mathematical morphology: constrained connectivity
and ultrametric watersheds. Constrained connectivity can be viewed as a way to
constrain an initial hierarchy in such a way that a set of desired constraints
are satis ed. The framework of ultrametric watersheds provides a generic scheme
for computing any hierarchical connected clustering, in particular when such a
hierarchy is constrained. The suitability of this framework for solving
practical problems is illustrated with applications in remote sensing
Solving weighted and counting variants of connectivity problems parameterized by treewidth deterministically in single exponential time
It is well known that many local graph problems, like Vertex Cover and
Dominating Set, can be solved in 2^{O(tw)}|V|^{O(1)} time for graphs G=(V,E)
with a given tree decomposition of width tw. However, for nonlocal problems,
like the fundamental class of connectivity problems, for a long time we did not
know how to do this faster than tw^{O(tw)}|V|^{O(1)}. Recently, Cygan et al.
(FOCS 2011) presented Monte Carlo algorithms for a wide range of connectivity
problems running in time $c^{tw}|V|^{O(1)} for a small constant c, e.g., for
Hamiltonian Cycle and Steiner tree. Naturally, this raises the question whether
randomization is necessary to achieve this runtime; furthermore, it is
desirable to also solve counting and weighted versions (the latter without
incurring a pseudo-polynomial cost in terms of the weights).
We present two new approaches rooted in linear algebra, based on matrix rank
and determinants, which provide deterministic c^{tw}|V|^{O(1)} time algorithms,
also for weighted and counting versions. For example, in this time we can solve
the traveling salesman problem or count the number of Hamiltonian cycles. The
rank-based ideas provide a rather general approach for speeding up even
straightforward dynamic programming formulations by identifying "small" sets of
representative partial solutions; we focus on the case of expressing
connectivity via sets of partitions, but the essential ideas should have
further applications. The determinant-based approach uses the matrix tree
theorem for deriving closed formulas for counting versions of connectivity
problems; we show how to evaluate those formulas via dynamic programming.Comment: 36 page
RASCAL: calculation of graph similarity using maximum common edge subgraphs
A new graph similarity calculation procedure is introduced for comparing labeled graphs. Given a minimum similarity threshold, the procedure consists of an initial screening process to determine whether it is possible for the measure of similarity between the two graphs to exceed the minimum threshold, followed by a rigorous maximum common edge subgraph (MCES) detection algorithm to compute the exact degree and composition of similarity. The proposed MCES algorithm is based on a maximum clique formulation of the problem and is a significant improvement over other published algorithms. It presents new approaches to both lower and upper bounding as well as vertex selection
Compressing networks with super nodes
Community detection is a commonly used technique for identifying groups in a
network based on similarities in connectivity patterns. To facilitate community
detection in large networks, we recast the network to be partitioned into a
smaller network of 'super nodes', each super node comprising one or more nodes
in the original network. To define the seeds of our super nodes, we apply the
'CoreHD' ranking from dismantling and decycling. We test our approach through
the analysis of two common methods for community detection: modularity
maximization with the Louvain algorithm and maximum likelihood optimization for
fitting a stochastic block model. Our results highlight that applying community
detection to the compressed network of super nodes is significantly faster
while successfully producing partitions that are more aligned with the local
network connectivity, more stable across multiple (stochastic) runs within and
between community detection algorithms, and overlap well with the results
obtained using the full network
- …