Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions.
Probabilistic Multilevel Clustering via Composite Transportation Distance
We propose a novel probabilistic approach to multilevel clustering problems
based on composite transportation distance, which is a variant of
transportation distance where the underlying metric is Kullback-Leibler
divergence. Our method involves solving a joint optimization problem over
spaces of probability measures to simultaneously discover grouping structures
within groups and among groups. By exploiting the connection of our method to
the problem of finding composite transportation barycenters, we develop fast
and efficient optimization algorithms even for potentially large-scale
multilevel datasets. Finally, we present experimental results with both
synthetic and real data to demonstrate the efficiency and scalability of the
proposed approach.
Comment: 25 pages, 3 figures
Memetic Multilevel Hypergraph Partitioning
Hypergraph partitioning has a wide range of important applications such as
VLSI design or scientific computing. With focus on solution quality, we develop
the first multilevel memetic algorithm to tackle the problem. Key components of
our contribution are new effective multilevel recombination and mutation
operations that provide a large amount of diversity. We perform a wide range of
experiments on a benchmark set containing instances from application areas such
as VLSI, SAT solving, social networks, and scientific computing. Compared to the
state-of-the-art hypergraph partitioning tools hMetis, PaToH, and KaHyPar, our
new algorithm computes the best result on almost all instances.
Estimating the resolution limit of the map equation in community detection
A community detection algorithm is considered to have a resolution limit if
the scale of the smallest modules that can be resolved depends on the size of
the analyzed subnetwork. The resolution limit is known to prevent some
community detection algorithms from accurately identifying the modular
structure of a network. In fact, any global objective function for measuring
the quality of a two-level assignment of nodes into modules must have some sort
of resolution limit or an external resolution parameter. However, it is not yet
known how the resolution limit affects the so-called map equation, which is
known to be an efficient objective function for community detection. We derive
an analytical estimate and conclude that the resolution limit of the map
equation is set by the total number of links between modules instead of the
total number of links in the full network as for modularity. This mechanism
makes the resolution limit much less restrictive for the map equation than for
modularity, and in practice orders of magnitude smaller. Furthermore, we argue
that the effect of the resolution limit often results from shoehorning
multi-level modular structures into two-level descriptions. As we show, the
hierarchical map equation effectively eliminates the resolution limit for
networks with nested multi-level modular structures.
Comment: 12 pages, 7 figures
PT-Scotch: A tool for efficient parallel graph ordering
The parallel ordering of large graphs is a difficult problem: on the one hand,
minimum degree algorithms do not parallelize well, and on the other hand,
obtaining high-quality orderings with the nested dissection algorithm requires
efficient graph bipartitioning heuristics, the best sequential implementations
of which are also hard to parallelize. This paper
presents a set of algorithms, implemented in the PT-Scotch software package,
which allows one to order large graphs in parallel, yielding orderings whose
quality is only slightly worse than that of state-of-the-art
sequential algorithms. Our implementation uses the classical nested dissection
approach but relies on several novel features to solve the parallel graph
bipartitioning problem. Thanks to these improvements, PT-Scotch produces
consistently better orderings than ParMeTiS on large numbers of processors.
Relaxation-Based Coarsening for Multilevel Hypergraph Partitioning
Multilevel partitioning methods that are inspired by principles of
multiscaling are the most powerful practical hypergraph partitioning solvers.
Hypergraph partitioning has many applications in disciplines ranging from
scientific computing to data science. In this paper we introduce the concept of
algebraic distance on hypergraphs and demonstrate its use as an algorithmic
component in the coarsening stage of multilevel hypergraph partitioning
solvers. The algebraic distance is a vertex distance measure that extends
hyperedge weights to capture the local connectivity of vertices, which is
critical for hypergraph coarsening schemes. The practical effectiveness of the
proposed measure and corresponding coarsening scheme is demonstrated through
extensive computational experiments on a diverse set of problems. Finally, we
propose a benchmark of hypergraph partitioning problems to compare the quality
of other solvers.
Parallel Graph Partitioning for Complex Networks
Processing large complex networks like social networks or web graphs has
recently attracted considerable interest. In order to do this in parallel, we
need to partition them into pieces of about equal size. Unfortunately, previous
parallel graph partitioners originally developed for more regular mesh-like
networks do not work well for these networks. This paper addresses this problem
by parallelizing and adapting the label propagation technique originally
developed for graph clustering. By introducing size constraints, label
propagation becomes applicable for both the coarsening and the refinement phase
of multilevel graph partitioning. We obtain very high quality by applying a
highly parallel evolutionary algorithm to the coarsened graph. The resulting
system is both more scalable and achieves higher quality than state-of-the-art
systems like ParMetis or PT-Scotch. For large complex networks the performance
differences are very large. For example, our algorithm can partition a web graph
with 3.3 billion edges in less than sixteen seconds using 512 cores of a high
performance cluster while producing a high quality partition -- none of the
competing systems can handle this graph on our system.
Comment: Review article. Parallelization of our previous approach
arXiv:1402.328
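The size-constrained label propagation idea mentioned in the abstract above can be illustrated with a small sequential sketch. This is not the authors' parallel implementation; it assumes unit node weights, an undirected graph given as a nested dictionary, and a fixed visiting order, and the names size_constrained_label_propagation, adj, max_size, and rounds are hypothetical. Each node moves to the neighbouring cluster with the largest connecting edge weight, but only if that cluster would stay within the size bound.

```python
from collections import defaultdict


def size_constrained_label_propagation(adj, max_size, rounds=10):
    """Cluster an undirected graph by label propagation with a size bound.

    adj: dict mapping node -> dict of neighbour -> edge weight (assumed format).
    max_size: upper bound on the number of nodes per cluster (unit node weights).
    rounds: maximum number of sweeps over all nodes.
    Returns a dict mapping node -> cluster label.
    """
    label = {v: v for v in adj}      # every node starts in its own cluster
    size = defaultdict(int)
    for v in adj:
        size[v] = 1

    for _ in range(rounds):
        moved = False
        for v in sorted(adj):        # fixed order keeps the sketch deterministic
            # Total edge weight from v to each neighbouring cluster.
            weight_to = defaultdict(float)
            for u, w in adj[v].items():
                weight_to[label[u]] += w

            current = label[v]
            best, best_w = current, weight_to.get(current, 0.0)
            for c, w in sorted(weight_to.items()):
                if c == current:
                    continue
                if size[c] + 1 > max_size:
                    continue         # the move would violate the size constraint
                if w > best_w:       # strict improvement only, so ties keep v in place
                    best, best_w = c, w

            if best != current:
                size[current] -= 1
                size[best] += 1
                label[v] = best
                moved = True
        if not moved:
            break                    # converged: no node changed its cluster
    return label
```

In a multilevel scheme of the kind the abstract describes, clusters found this way would be contracted during coarsening, and the same move rule with a block-size bound could be reused for refinement; the bound is what keeps clusters from growing without limit on graphs with highly skewed degree distributions.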