114 research outputs found
Advanced Flow-Based Multilevel Hypergraph Partitioning
The balanced hypergraph partitioning problem is to partition a hypergraph into k disjoint blocks of bounded size such that the sum of the number of blocks connected by each hyperedge is minimized. We present an improvement to the flow-based refinement framework of KaHyPar-MF, the current state-of-the-art multilevel k-way hypergraph partitioning algorithm for high-quality solutions. Our improvement is based on the recently proposed HyperFlowCutter algorithm for computing bipartitions of unweighted hypergraphs by solving a sequence of incremental maximum flow problems. Since vertices and hyperedges are aggregated during the coarsening phase, refinement algorithms employed in the multilevel setting must be able to handle both weighted hyperedges and weighted vertices - even if the initial input hypergraph is unweighted. We therefore enhance HyperFlowCutter to handle weighted instances and propose a technique for computing maximum flows directly on weighted hypergraphs.
We compare the performance of two configurations of our new algorithm with KaHyPar-MF and seven other partitioning algorithms on a comprehensive benchmark set with instances from application areas such as VLSI design, scientific computing, and SAT solving. Our first configuration, KaHyPar-HFC, computes slightly better solutions than KaHyPar-MF using significantly less running time. The second configuration, KaHyPar-HFC*, computes solutions of significantly better quality and is still slightly faster than KaHyPar-MF. Furthermore, in terms of solution quality, both configurations also outperform all other competing partitioners
Fully Dynamic Matching in Bipartite Graphs
Maximum cardinality matching in bipartite graphs is an important and
well-studied problem. The fully dynamic version, in which edges are inserted
and deleted over time has also been the subject of much attention. Existing
algorithms for dynamic matching (in general graphs) seem to fall into two
groups: there are fast (mostly randomized) algorithms that do not achieve a
better than 2-approximation, and there slow algorithms with \O(\sqrt{m})
update time that achieve a better-than-2 approximation. Thus the obvious
question is whether we can design an algorithm -- deterministic or randomized
-- that achieves a tradeoff between these two: a approximation
and a better-than-2 approximation simultaneously. We answer this question in
the affirmative for bipartite graphs.
Our main result is a fully dynamic algorithm that maintains a 3/2 + \eps
approximation in worst-case update time O(m^{1/4}\eps^{/2.5}). We also give
stronger results for graphs whose arboricity is at most \al, achieving a (1+
\eps) approximation in worst-case time O(\al (\al + \log n)) for constant
\eps. When the arboricity is constant, this bound is and when the
arboricity is polylogarithmic the update time is also polylogarithmic.
The most important technical developement is the use of an intermediate graph
we call an edge degree constrained subgraph (EDCS). This graph places
constraints on the sum of the degrees of the endpoints of each edge: upper
bounds for matched edges and lower bounds for unmatched edges. The main
technical content of our paper involves showing both how to maintain an EDCS
dynamically and that and EDCS always contains a sufficiently large matching. We
also make use of graph orientations to help bound the amount of work done
during each update.Comment: Longer version of paper that appears in ICALP 201
Parallel Flow-Based Hypergraph Partitioning
We present a shared-memory parallelization of flow-based refinement, which is considered the most powerful iterative improvement technique for hypergraph partitioning at the moment. Flow-based refinement works on bipartitions, so current sequential partitioners schedule it on different block pairs to improve k-way partitions. We investigate two different sources of parallelism: a parallel scheduling scheme and a parallel maximum flow algorithm based on the well-known push-relabel algorithm. In addition to thoroughly engineered implementations, we propose several optimizations that substantially accelerate the algorithm in practice, enabling the use on extremely large hypergraphs (up to 1 billion pins). We integrate our approach in the state-of-the-art parallel multilevel framework Mt-KaHyPar and conduct extensive experiments on a benchmark set of more than 500 real-world hypergraphs, to show that the partition quality of our code is on par with the highest quality sequential code (KaHyPar), while being an order of magnitude faster with 10 threads
Quality Hypergraph Partitioning via Max-Flow-Min-Cut Computations
In dieser Arbeit wird ein Framework basierend auf Max-Flow-Min-Cut Berechnungen vorgestellt, zur Verbesserung einer balancierten k-teilige Aufteilung eines Hypergraphen. Aktuell werden Varianten des FM Algorithmus [17] in allen modernen Multilevel Hypergraph Partitionierer als lokaler Suchalgorithmus verwendet. Solche bewegungsbasierenden Heuristiken haben den Nachteil, dass sie nur lokale Informationen über die Problemstruktur in die Berechnungen miteinfließen lassen. Wenn viele Knotenbewegungen den selben Einfluss auf die Lösungsqualität haben, dann hängt das Ergebnis oft von zufälligen Entscheidungen ab, welche der Algorithmus selbst trifft [15, 31, 36]. Flussbasierende Ansätze sind nicht bewegungbasiert und finden einen globalen minimalen Schnitt, welcher zwei Knoten s und t eines Graphen trennt [18]. Unser Framework ist durch die Arbeit von Sanders und Schulz [44] inspiriert. Diese integrierten eine flussbasierende Heuristik erfolgreich in Ihren Multilevel Graph Partitionierer. Wir generalisieren viele Ihrer Ideen, sodass sie im Multilevel Hypergraph Partitionierung-Kontext anwendbar sind. Wir entwickeln mehrere Techniken, um das aktuelle Hypergraph Flussnetzwerk zu verkleinern, welche die resultierende Problemgröße im Vergleich zu der aktuellen Representation [33], um den Faktor 2 reduziert. Zusätzlich zeigen wir, wie ein Flussproblem auf einem Subhypergraphen konfiguriert werden kann, sodass das eine Max-Flow-Min-Cut Berechnung eine bessere Qualität erzielt, als die Modellierung von Sanders und Schulz. Am Ende haben wir unsere Arbeit als Verbesserungsstrategie in den n-level Hypergraph Partitionierer KaHyPar integriert [25]. Wir haben unser Framework auf 3216 verschiedenen Instanzen getestet. Im Vergleich mit 5 verschiedenen Systemen erzielt unsere neue Konfiguration, auf 73% der Instanzen, die besten Ergebnisse. Im Vergleich zu der aktuellen Variante von KaHyPar ist die Qualität der Lösungen um 2.5% gestiegen, während die Laufzeit lediglich um den Faktor 2 langsamer ist. Jedoch hat unser Algorithmus eine vergleichbare Laufzeit mit hMetis und erzielt auf 84% der Instanzen bessere Ergebnisse
Parallel and Flow-Based High Quality Hypergraph Partitioning
Balanced hypergraph partitioning is a classic NP-hard optimization problem that is a fundamental tool in such diverse disciplines as VLSI circuit design, route planning, sharding distributed databases, optimizing communication volume in parallel computing, and accelerating the simulation of quantum circuits.
Given a hypergraph and an integer , the task is to divide the vertices into disjoint blocks with bounded size, while minimizing an objective function on the hyperedges that span multiple blocks.
In this dissertation we consider the most commonly used objective, the connectivity metric, where we aim to minimize the number of different blocks connected by each hyperedge.
The most successful heuristic for balanced partitioning is the multilevel approach, which consists of three phases.
In the coarsening phase, vertex clusters are contracted to obtain a sequence of structurally similar but successively smaller hypergraphs.
Once sufficiently small, an initial partition is computed.
Lastly, the contractions are successively undone in reverse order, and an iterative improvement algorithm is employed to refine the projected partition on each level.
An important aspect in designing practical heuristics for optimization problems is the trade-off between solution quality and running time.
The appropriate trade-off depends on the specific application, the size of the data sets, and the computational resources available to solve the problem.
Existing algorithms are either slow, sequential and offer high solution quality, or are simple, fast, easy to parallelize, and offer low quality.
While this trade-off cannot be avoided entirely, our goal is to close the gaps as much as possible.
We achieve this by improving the state of the art in all non-trivial areas of the trade-off landscape with only a few techniques, but employed in two different ways.
Furthermore, most research on parallelization has focused on distributed memory, which neglects the greater flexibility of shared-memory algorithms and the wide availability of commodity multi-core machines.
In this thesis, we therefore design and revisit fundamental techniques for each phase of the multilevel approach, and develop highly efficient shared-memory parallel implementations thereof.
We consider two iterative improvement algorithms, one based on the Fiduccia-Mattheyses (FM) heuristic, and one based on label propagation.
For these, we propose a variety of techniques to improve the accuracy of gains when moving vertices in parallel, as well as low-level algorithmic improvements.
For coarsening, we present a parallel variant of greedy agglomerative clustering with a novel method to resolve cluster join conflicts on-the-fly.
Combined with a preprocessing phase for coarsening based on community detection, a portfolio of from-scratch partitioning algorithms, as well as recursive partitioning with work-stealing, we obtain our first parallel multilevel framework.
It is the fastest partitioner known, and achieves medium-high quality, beating all parallel partitioners, and is close to the highest quality sequential partitioner.
Our second contribution is a parallelization of an n-level approach, where only one vertex is contracted and uncontracted on each level.
This extreme approach aims at high solution quality via very fine-grained, localized refinement, but seems inherently sequential.
We devise an asynchronous n-level coarsening scheme based on a hierarchical decomposition of the contractions, as well as a batch-synchronous uncoarsening, and later fully asynchronous uncoarsening.
In addition, we adapt our refinement algorithms, and also use the preprocessing and portfolio.
This scheme is highly scalable, and achieves the same quality as the highest quality sequential partitioner (which is based on the same components), but is of course slower than our first framework due to fine-grained uncoarsening.
The last ingredient for high quality is an iterative improvement algorithm based on maximum flows.
In the sequential setting, we first improve an existing idea by solving incremental maximum flow problems, which leads to smaller cuts and is faster due to engineering efforts.
Subsequently, we parallelize the maximum flow algorithm and schedule refinements in parallel.
Beyond the strive for highest quality, we present a deterministically parallel partitioning framework.
We develop deterministic versions of the preprocessing, coarsening, and label propagation refinement.
Experimentally, we demonstrate that the penalties for determinism in terms of partition quality and running time are very small.
All of our claims are validated through extensive experiments, comparing our algorithms with state-of-the-art solvers on large and diverse benchmark sets.
To foster further research, we make our contributions available in our open-source framework Mt-KaHyPar.
While it seems inevitable, that with ever increasing problem sizes, we must transition to distributed memory algorithms, the study of shared-memory techniques is not in vain.
With the multilevel approach, even the inherently slow techniques have a role to play in fast systems, as they can be employed to boost quality on coarse levels at little expense.
Similarly, techniques for shared-memory parallelism are important, both as soon as a coarse graph fits into memory, and as local building blocks in the distributed algorithm
Navigating Central Path with Electrical Flows: from Flows to Matchings, and Back
We present an -time algorithm for
the maximum s-t flow and the minimum s-t cut problems in directed graphs with
unit capacities. This is the first improvement over the sparse-graph case of
the long-standing time bound due to Even and
Tarjan [EvenT75]. By well-known reductions, this also establishes an
-time algorithm for the maximum-cardinality bipartite
matching problem. That, in turn, gives an improvement over the celebrated
celebrated time bound of Hopcroft and Karp [HK73] whenever the
input graph is sufficiently sparse
Scalable High-Quality Graph and Hypergraph Partitioning
The balanced hypergraph partitioning problem (HGP) asks for a partition of the node set
of a hypergraph into blocks of roughly equal size, such that an objective function defined
on the hyperedges is minimized. In this work, we optimize the connectivity metric,
which is the most prominent objective function for HGP.
The hypergraph partitioning problem is NP-hard and there exists no constant factor approximation.
Thus, heuristic algorithms are used in practice with the multilevel scheme as
the most successful approach to solve the problem: First, the input hypergraph is coarsened to
obtain a hierarchy of successively smaller and structurally similar approximations.
The smallest hypergraph is then initially partitioned into blocks, and subsequently,
the contractions are reverted level-by-level, and, on each level, local search algorithms are used
to improve the partition (refinement phase).
In recent years, several new techniques were developed for sequential multilevel partitioning
that substantially improved solution quality at the cost of an increased running time.
These developments divide the landscape of existing partitioning algorithms into systems that either aim for
speed or high solution quality with the former often being more than an order of magnitude faster
than the latter. Due to the high running times of the best sequential algorithms, it is currently not
feasible to partition the largest real-world hypergraphs with the highest possible quality.
Thus, it becomes increasingly important to parallelize the techniques used in these algorithms.
However, existing state-of-the-art parallel partitioners currently do not achieve the same solution
quality as their sequential counterparts because they use comparatively weak components that are easier to parallelize.
Moreover, there has been a recent trend toward simpler methods for partitioning large hypergraphs
that even omit the multilevel scheme.
In contrast to this development, we present two shared-memory multilevel hypergraph partitioners
with parallel implementations of techniques used by the highest-quality sequential systems.
Our first multilevel algorithm uses a parallel clustering-based coarsening scheme,
which uses substantially fewer locking mechanisms than previous approaches.
The contraction decisions are guided by the community structure of the input hypergraph
obtained via a parallel community detection algorithm.
For initial partitioning, we implement parallel multilevel recursive bipartitioning with a
novel work-stealing approach and a portfolio of initial bipartitioning techniques to
compute an initial solution. In the refinement phase, we use three different parallel improvement
algorithms: label propagation refinement, a highly-localized direct -way
FM algorithm, and a novel parallelization of flow-based refinement.
These algorithms build on our highly-engineered partition data structure, for which we propose
several novel techniques to compute accurate gain values of node moves in the parallel setting.
Our second multilevel algorithm parallelizes the -level partitioning scheme used in
the highest-quality sequential partitioner KaHyPar. Here, only a single node
is contracted on each level, leading to a hierarchy with approximately levels where
is the number of nodes. Correspondingly, in each refinement step, only a single node is uncontracted, allowing
a highly-localized search for improvements.
We show that this approach, which seems inherently sequential, can be parallelized efficiently without compromises in solution quality.
To this end, we design a forest-based representation of contractions from which we derive a feasible parallel
schedule of the contraction operations that we apply on a novel dynamic hypergraph data structure on-the-fly.
In the uncoarsening phase, we decompose the contraction forest into batches, each containing
a fixed number of nodes. We then uncontract each batch in parallel and use highly-localized
versions of our refinement algorithms to improve the partition around the uncontracted nodes.
We further show that existing sequential partitioning algorithms considerably struggle to find balanced partitions
for weighted real-world hypergraphs. To this end, we present a technique that enables partitioners based on recursive
bipartitioning to reliably compute balanced solutions. The idea is to preassign a small portion of the
heaviest nodes to one of the two blocks of each bipartition and optimize the objective function on the
remaining nodes. We integrated the approach into the sequential hypergraph partitioner KaHyPar
and show that our new approach can compute balanced solutions for all tested instances without negatively affecting the solution
quality and running time of KaHyPar.
In our experimental evaluation, we compare our new shared-memory (hyper)graph partitioner Mt-KaHyPar
to different graph and hypergraph partitioners on over (hyper)graphs with up to two billion edges/pins.
The results indicate that already our fastest configuration outperforms almost all existing
hypergraph partitioners with regards to both solution quality and running time. Our highest-quality configuration
(-level with flow-based refinement) achieves the same solution quality as the currently best
sequential partitioner KaHyPar, while being almost an order of magnitude faster with ten threads.
In addition, we optimize our data structures for graph partitioning, which improves the running times of both multilevel partitioners by
almost a factor of two for graphs. As a result, Mt-KaHyPar also outperforms most of the existing
graph partitioning algorithms. While the shared-memory graph partitioner KaMinPar is still faster than
Mt-KaHyPar, its produced solutions are worse by in the median. The best sequential graph
partitioner KaFFPa-StrongS computes slightly better partitions than Mt-KaHyPar
(median improvement is ), but is more than an order of magnitude slower on average
- …