29,404 research outputs found
Multilevel Algebraic Approach for Performance Analysis of Parallel Algorithms
In order to solve a problem in parallel we need to undertake the fundamental step of splitting the computational tasks into parts, i.e. decomposing the problem solving. A whatever decomposition does not necessarily lead to a parallel algorithm with the highest performance. This topic is even more important when complex parallel algorithms must be developed for hybrid or heterogeneous architectures. We present an innovative approach which starts from a problem decomposition into parts (sub-problems). These parts will be regarded as elements of an algebraic structure and will be related to each other according to a suitably defined dependency relationship. The main outcome of such framework is to define a set of block matrices (dependency, decomposition, memory accesses and execution) which simply highlight fundamental characteristics of the corresponding algorithm, such as inherent parallelism and sources of overheads. We provide a mathematical formulation of this approach, and we perform a feasibility analysis for the performance of a parallel algorithm in terms of its time complexity and scalability. We compare our results with standard expressions of speed up, efficiency, overhead, and so on. Finally, we show how the multilevel structure of this framework eases the choice of the abstraction level (both for the problem decomposition and for the algorithm description) in order to determine the granularity of the tasks within the performance analysis. This feature is helpful to better understand the mapping of parallel algorithms on novel hybrid and heterogeneous architectures
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions
Parallel Graph Partitioning for Complex Networks
Processing large complex networks like social networks or web graphs has
recently attracted considerable interest. In order to do this in parallel, we
need to partition them into pieces of about equal size. Unfortunately, previous
parallel graph partitioners originally developed for more regular mesh-like
networks do not work well for these networks. This paper addresses this problem
by parallelizing and adapting the label propagation technique originally
developed for graph clustering. By introducing size constraints, label
propagation becomes applicable for both the coarsening and the refinement phase
of multilevel graph partitioning. We obtain very high quality by applying a
highly parallel evolutionary algorithm to the coarsened graph. The resulting
system is both more scalable and achieves higher quality than state-of-the-art
systems like ParMetis or PT-Scotch. For large complex networks the performance
differences are very big. For example, our algorithm can partition a web graph
with 3.3 billion edges in less than sixteen seconds using 512 cores of a high
performance cluster while producing a high quality partition -- none of the
competing systems can handle this graph on our system.Comment: Review article. Parallelization of our previous approach
arXiv:1402.328
Partitioning Complex Networks via Size-constrained Clustering
The most commonly used method to tackle the graph partitioning problem in
practice is the multilevel approach. During a coarsening phase, a multilevel
graph partitioning algorithm reduces the graph size by iteratively contracting
nodes and edges until the graph is small enough to be partitioned by some other
algorithm. A partition of the input graph is then constructed by successively
transferring the solution to the next finer graph and applying a local search
algorithm to improve the current solution.
In this paper, we describe a novel approach to partition graphs effectively
especially if the networks have a highly irregular structure. More precisely,
our algorithm provides graph coarsening by iteratively contracting
size-constrained clusterings that are computed using a label propagation
algorithm. The same algorithm that provides the size-constrained clusterings
can also be used during uncoarsening as a fast and simple local search
algorithm.
Depending on the algorithm's configuration, we are able to compute partitions
of very high quality outperforming all competitors, or partitions that are
comparable to the best competitor in terms of quality, hMetis, while being
nearly an order of magnitude faster on average. The fastest configuration
partitions the largest graph available to us with 3.3 billion edges using a
single machine in about ten minutes while cutting less than half of the edges
than the fastest competitor, kMetis
Tree-based Coarsening and Partitioning of Complex Networks
Many applications produce massive complex networks whose analysis would
benefit from parallel processing. Parallel algorithms, in turn, often require a
suitable network partition. For solving optimization tasks such as graph
partitioning on large networks, multilevel methods are preferred in practice.
Yet, complex networks pose challenges to established multilevel algorithms, in
particular to their coarsening phase.
One way to specify a (recursive) coarsening of a graph is to rate its edges
and then contract the edges as prioritized by the rating. In this paper we (i)
define weights for the edges of a network that express the edges' importance
for connectivity, (ii) compute a minimum weight spanning tree with
respect to these weights, and (iii) rate the network edges based on the
conductance values of 's fundamental cuts. To this end, we also (iv)
develop the first optimal linear-time algorithm to compute the conductance
values of \emph{all} fundamental cuts of a given spanning tree. We integrate
the new edge rating into a leading multilevel graph partitioner and equip the
latter with a new greedy postprocessing for optimizing the maximum
communication volume (MCV). Experiments on bipartitioning frequently used
benchmark networks show that the postprocessing already reduces MCV by 11.3%.
Our new edge rating further reduces MCV by 10.3% compared to the previously
best rating with the postprocessing in place for both ratings. In total, with a
modest increase in running time, our new approach reduces the MCV of complex
network partitions by 20.4%
A Multilevel Approach to Topology-Aware Collective Operations in Computational Grids
The efficient implementation of collective communiction operations has
received much attention. Initial efforts produced "optimal" trees based on
network communication models that assumed equal point-to-point latencies
between any two processes. This assumption is violated in most practical
settings, however, particularly in heterogeneous systems such as clusters of
SMPs and wide-area "computational Grids," with the result that collective
operations perform suboptimally. In response, more recent work has focused on
creating topology-aware trees for collective operations that minimize
communication across slower channels (e.g., a wide-area network). While these
efforts have significant communication benefits, they all limit their view of
the network to only two layers. We present a strategy based upon a multilayer
view of the network. By creating multilevel topology-aware trees we take
advantage of communication cost differences at every level in the network. We
used this strategy to implement topology-aware versions of several MPI
collective operations in MPICH-G2, the Globus Toolkit[tm]-enabled version of
the popular MPICH implementation of the MPI standard. Using information about
topology provided by MPICH-G2, we construct these multilevel topology-aware
trees automatically during execution. We present results demonstrating the
advantages of our multilevel approach by comparing it to the default
(topology-unaware) implementation provided by MPICH and a topology-aware
two-layer implementation.Comment: 16 pages, 8 figure
- …