67,319 research outputs found
Achieving Efficient Strong Scaling with PETSc using Hybrid MPI/OpenMP Optimisation
The increasing number of processing elements and decreas- ing memory to core
ratio in modern high-performance platforms makes efficient strong scaling a key
requirement for numerical algorithms. In order to achieve efficient scalability
on massively parallel systems scientific software must evolve across the entire
stack to exploit the multiple levels of parallelism exposed in modern
architectures. In this paper we demonstrate the use of hybrid MPI/OpenMP
parallelisation to optimise parallel sparse matrix-vector multiplication in
PETSc, a widely used scientific library for the scalable solution of partial
differential equations. Using large matrices generated by Fluidity, an open
source CFD application code which uses PETSc as its linear solver engine, we
evaluate the effect of explicit communication overlap using task-based
parallelism and show how to further improve performance by explicitly load
balancing threads within MPI processes. We demonstrate a significant speedup
over the pure-MPI mode and efficient strong scaling of sparse matrix-vector
multiplication on Fujitsu PRIMEHPC FX10 and Cray XE6 systems
GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
High-performance implementations of graph algorithms are challenging to
implement on new parallel hardware such as GPUs because of three challenges:
(1) the difficulty of coming up with graph building blocks, (2) load imbalance
on parallel hardware, and (3) graph problems having low arithmetic intensity.
To address some of these challenges, GraphBLAS is an innovative, on-going
effort by the graph analytics community to propose building blocks based on
sparse linear algebra, which will allow graph algorithms to be expressed in a
performant, succinct, composable and portable manner. In this paper, we examine
the performance challenges of a linear-algebra-based approach to building graph
frameworks and describe new design principles for overcoming these bottlenecks.
Among the new design principles is exploiting input sparsity, which allows
users to write graph algorithms without specifying push and pull direction.
Exploiting output sparsity allows users to tell the backend which values of the
output in a single vectorized computation they do not want computed.
Load-balancing is an important feature for balancing work amongst parallel
workers. We describe the important load-balancing features for handling graphs
with different characteristics. The design principles described in this paper
have been implemented in "GraphBLAST", the first high-performance linear
algebra-based graph framework on NVIDIA GPUs that is open-source. The results
show that on a single GPU, GraphBLAST has on average at least an order of
magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL,
comparable performance to the fastest GPU hardwired primitives and
shared-memory graph frameworks Ligra and Gunrock, and better performance than
any other GPU graph framework, while offering a simpler and more concise
programming model.Comment: 50 pages, 14 figures, 14 table
The DUNE-ALUGrid Module
In this paper we present the new DUNE-ALUGrid module. This module contains a
major overhaul of the sources from the ALUgrid library and the binding to the
DUNE software framework. The main changes include user defined load balancing,
parallel grid construction, and an redesign of the 2d grid which can now also
be used for parallel computations. In addition many improvements have been
introduced into the code to increase the parallel efficiency and to decrease
the memory footprint.
The original ALUGrid library is widely used within the DUNE community due to
its good parallel performance for problems requiring local adaptivity and
dynamic load balancing. Therefore, this new model will benefit a number of DUNE
users. In addition we have added features to increase the range of problems for
which the grid manager can be used, for example, introducing a 3d tetrahedral
grid using a parallel newest vertex bisection algorithm for conforming grid
refinement. In this paper we will discuss the new features, extensions to the
DUNE interface, and explain for various examples how the code is used in
parallel environments.Comment: 25 pages, 11 figure
Doubly Exponential Solution for Randomized Load Balancing Models with General Service Times
In this paper, we provide a novel and simple approach to study the
supermarket model with general service times. This approach is based on the
supplementary variable method used in analyzing stochastic models extensively.
We organize an infinite-size system of integral-differential equations by means
of the density dependent jump Markov process, and obtain a close-form solution:
doubly exponential structure, for the fixed point satisfying the system of
nonlinear equations, which is always a key in the study of supermarket models.
The fixed point is decomposited into two groups of information under a product
form: the arrival information and the service information. based on this, we
indicate two important observations: the fixed point for the supermarket model
is different from the tail of stationary queue length distribution for the
ordinary M/G/1 queue, and the doubly exponential solution to the fixed point
can extensively exist even if the service time distribution is heavy-tailed.
Furthermore, we analyze the exponential convergence of the current location of
the supermarket model to its fixed point, and study the Lipschitz condition in
the Kurtz Theorem under general service times. Based on these analysis, one can
gain a new understanding how workload probing can help in load balancing jobs
with general service times such as heavy-tailed service.Comment: 40 pages, 4 figure
- …