2,646 research outputs found
Multi-partitioning for ADI-schemes on message passing architectures
A kind of discrete-operator splitting called Alternating Direction Implicit (ADI) has been found to be useful in simulating fluid flow problems. In particular, it is being used to study the effects of hot exhaust jets from high performance aircraft on landing surfaces. Decomposition techniques that minimize load imbalance and message-passing frequency are described. Three strategies that are investigated for implementing the NAS Scalar Penta-diagonal Parallel Benchmark (SP) are transposition, pipelined Gaussian elimination, and multipartitioning. The multipartitioning strategy, which was used on Ethernet, was found to be the most efficient, although it was considered only a moderate success because of Ethernet's limited communication properties. The efficiency derived largely from the coarse granularity of the strategy, which reduced latencies and allowed overlap of communication and computation.
Causal Consistency: Beyond Memory
In distributed systems where strong consistency is costly when not
impossible, causal consistency provides a valuable abstraction to represent
program executions as partial orders. In addition to the sequential program
order of each computing entity, causal order also contains the semantic links
between the events that affect the shared objects -- message emission and
reception in a communication channel, reads and writes on a shared register.
Usual approaches based on semantic links are very difficult to adapt to other
data types such as queues or counters because they require a specific analysis
of causal dependencies for each data type. This paper presents a new approach
to define causal consistency for any abstract data type based on sequential
specifications. It explores, formalizes and studies the differences between
three variations of causal consistency and highlights them in the light of
PRAM, eventual consistency and sequential consistency: weak causal consistency,
which captures the notion of causality preservation when focusing on convergence;
causal convergence, which mixes weak causal consistency and convergence; and
causal consistency, which coincides with causal memory when applied to shared
memory. Comment: 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel
Programming, Mar 2016, Barcelona, Spain
Analyzing the effect of local rounding error propagation on the maximal attainable accuracy of the pipelined Conjugate Gradient method
Pipelined Krylov subspace methods typically offer improved strong scaling on
parallel HPC hardware compared to standard Krylov subspace methods for large
and sparse linear systems. In pipelined methods the traditional synchronization
bottleneck is mitigated by overlapping time-consuming global communications
with useful computations. However, to achieve this communication hiding
strategy, pipelined methods introduce additional recurrence relations for a
number of auxiliary variables that are required to update the approximate
solution. This paper aims at studying the influence of local rounding errors
that are introduced by the additional recurrences in the pipelined Conjugate
Gradient method. Specifically, we analyze the impact of local round-off effects
on the attainable accuracy of the pipelined CG algorithm and compare to the
traditional CG method. Furthermore, we estimate the gap between the true
residual and the recursively computed residual used in the algorithm. Based on
this estimate we suggest an automated residual replacement strategy to reduce
the loss of attainable accuracy on the final iterative solution. The resulting
pipelined CG method with residual replacement improves the maximal attainable
accuracy of pipelined CG, while maintaining the efficient parallel performance
of the pipelined method. This conclusion is substantiated by numerical results
for a variety of benchmark problems. Comment: 26 pages, 6 figures, 2 tables, 4 algorithms
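The residual gap mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard (non-pipelined) Conjugate Gradient method, not the pipelined variant the paper analyzes, and the names `cg_with_gap`, `matvec`, and `dot` are hypothetical helpers chosen for illustration. At each step it compares the recursively updated residual with the true residual `b - A x`; the norm of their difference is the kind of quantity a residual replacement strategy would monitor.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def cg_with_gap(A, b, iters):
    """Standard CG on a symmetric positive definite A, starting from x = 0.

    Returns the iterate and, per iteration, the norm of the gap between
    the true residual b - A x and the recursively updated residual r.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)          # recursive residual; equals b - A*0 at the start
    p = list(r)
    rr = dot(r, r)
    gaps = []
    for _ in range(iters):
        if rr == 0.0:    # already converged exactly
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        # Explicitly recompute the true residual (costs one extra matvec).
        true_r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        gap = [t - ri for t, ri in zip(true_r, r)]
        gaps.append(dot(gap, gap) ** 0.5)
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, gaps
```

For example, `cg_with_gap([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], 2)` returns an iterate close to the exact solution (1/11, 7/11) together with the per-iteration gap norms, which stay near machine precision for such a small, well-conditioned system.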
Iteration-fusing conjugate gradient for sparse linear systems with MPI + OmpSs
In this paper, we target the parallel solution of sparse linear systems via an iterative Krylov subspace-based method enhanced with a block-Jacobi preconditioner on a cluster of multicore processors. In order to tackle large-scale problems, we develop task-parallel implementations of the preconditioned conjugate gradient method that improve the interoperability between the message-passing interface and OmpSs programming models. Specifically, we progressively integrate several communication-reduction and iteration-fusing strategies into the initial code, obtaining more efficient versions of the method. For all these implementations, we analyze the communication patterns and perform a comparative analysis of their performance and scalability on a cluster consisting of 32 nodes with 24 cores each. The experimental analysis shows that the techniques described in the paper outperform the classical method by a margin that varies between 6 and 48%, depending on the evaluation. This research was partially supported by the H2020 EU FETHPC Project 671602 “INTERTWinE.” The researchers from Universidad Jaume I were sponsored by Project TIN2017-82972-R of the Spanish Ministerio de Economía y Competitividad. Maria Barreda was supported by the POSDOC-A/2017/11 project from the Universitat Jaume I. Peer Reviewed. Postprint (author's final draft)
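As a point of reference for the method named in this abstract, the following is a minimal serial sketch of conjugate gradient with a block-Jacobi preconditioner. It is an assumption-laden illustration, not the paper's MPI + OmpSs implementation: it uses dense lists, a fixed block size, and hypothetical helper names (`solve_dense`, `block_jacobi_apply`, `pcg`), and it contains none of the communication-reduction or iteration-fusing strategies the paper studies.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def solve_dense(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][k] * y[k] for k in range(i + 1, n))
        y[i] = s / aug[i][i]
    return y


def block_jacobi_apply(A, r, block):
    """Return z = M^{-1} r, where M keeps only the diagonal blocks of A."""
    n = len(r)
    z = [0.0] * n
    for s in range(0, n, block):
        e = min(s + block, n)
        sub = [row[s:e] for row in A[s:e]]
        z[s:e] = solve_dense(sub, r[s:e])
    return z


def pcg(A, b, block, tol=1e-12, max_iter=100):
    """Preconditioned CG for symmetric positive definite A, from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)
    z = block_jacobi_apply(A, r, block)
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = block_jacobi_apply(A, r, block)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Each diagonal block is solved independently of the others, which is one reason block-Jacobi is a natural fit for distributed settings; the inner products are the points where a parallel implementation would need global reductions.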
Parallelization of implicit finite difference schemes in computational fluid dynamics
Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed.
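The global data dependency described in this abstract can be seen in a minimal sketch of the Thomas algorithm for a scalar tridiagonal system. This is a simplification: the abstract concerns block tri-diagonal and block penta-diagonal systems, and the function name `thomas` and its argument layout are assumptions made for illustration. Both sweeps are recurrences in which step `i` needs the result of step `i - 1` (or `i + 1`), which is what makes a direct parallelization along the solve direction difficult.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] is unused), b: main diagonal,
    c: super-diagonal (c[-1] is unused), d: right-hand side.
    Assumes no pivoting is needed (e.g. a diagonally dominant matrix).
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination: each step depends on the previous one.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution: each step depends on the next one.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

In a three-dimensional implicit scheme many such independent line solves exist along each coordinate direction, so partitioning strategies typically distribute whole lines or pipeline the recurrence across processors rather than parallelizing a single solve.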