Search CORE

2,646 research outputs found

Multi-partitioning for ADI-schemes on message passing architectures

Author: Vanderwijngaart Rob F.
Publication venue
Publication date
Field of study

A kind of discrete-operator splitting called Alternating Direction Implicit (ADI) has been found to be useful in simulating fluid flow problems. In particular, it is being used to study the effects of hot exhaust jets from high performance aircraft on landing surfaces. Decomposition techniques that minimize load imbalance and message-passing frequency are described. Three strategies that are investigated for implementing the NAS Scalar Penta-diagonal Parallel Benchmark (SP) are transposition, pipelined Gaussian elimination, and multipartitioning. The multipartitioning strategy, which was used on Ethernet, was found to be the most efficient, although it was considered only a moderate success because of Ethernet's limited communication properties. The efficiency derived largely from the coarse granularity of the strategy, which reduced latencies and allowed overlap of communication and computation

NASA Technical Reports Server

Causal Consistency: Beyond Memory

Author: Jard Claude
Mostefaoui Achour
Perrin Matthieu
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/03/2016
Field of study

In distributed systems where strong consistency is costly when not impossible, causal consistency provides a valuable abstraction to represent program executions as partial orders. In addition to the sequential program order of each computing entity, causal order also contains the semantic links between the events that affect the shared objects -- messages emission and reception in a communication channel , reads and writes on a shared register. Usual approaches based on semantic links are very difficult to adapt to other data types such as queues or counters because they require a specific analysis of causal dependencies for each data type. This paper presents a new approach to define causal consistency for any abstract data type based on sequential specifications. It explores, formalizes and studies the differences between three variations of causal consistency and highlights them in the light of PRAM, eventual consistency and sequential consistency: weak causal consistency, that captures the notion of causality preservation when focusing on convergence ; causal convergence that mixes weak causal consistency and convergence; and causal consistency, that coincides with causal memory when applied to shared memory.Comment: 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Mar 2016, Barcelone, Spai

arXiv.org e-Print Archive

Crossref

Analyzing the effect of local rounding error propagation on the maximal attainable accuracy of the pipelined Conjugate Gradient method

Author: Agullo Emmanuel
Cools Siegfried
Giraud Luc
Vanroose Wim
Yetkin Emrullah Fatih
Publication venue
Publication date: 29/11/2017
Field of study

Pipelined Krylov subspace methods typically offer improved strong scaling on parallel HPC hardware compared to standard Krylov subspace methods for large and sparse linear systems. In pipelined methods the traditional synchronization bottleneck is mitigated by overlapping time-consuming global communications with useful computations. However, to achieve this communication hiding strategy, pipelined methods introduce additional recurrence relations for a number of auxiliary variables that are required to update the approximate solution. This paper aims at studying the influence of local rounding errors that are introduced by the additional recurrences in the pipelined Conjugate Gradient method. Specifically, we analyze the impact of local round-off effects on the attainable accuracy of the pipelined CG algorithm and compare to the traditional CG method. Furthermore, we estimate the gap between the true residual and the recursively computed residual used in the algorithm. Based on this estimate we suggest an automated residual replacement strategy to reduce the loss of attainable accuracy on the final iterative solution. The resulting pipelined CG method with residual replacement improves the maximal attainable accuracy of pipelined CG, while maintaining the efficient parallel performance of the pipelined method. This conclusion is substantiated by numerical results for a variety of benchmark problems.Comment: 26 pages, 6 figures, 2 tables, 4 algorithm

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Institutional Repository Universiteit Antwerpen

Iteration-fusing conjugate gradient for sparse linear systems with MPI + OmpSs

Author: Aliaga Estellés José Ignacio
Barreda Vayá Maria
Beltran Querol Vicenç
Casas Marc
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/12/2019
Field of study

In this paper, we target the parallel solution of sparse linear systems via iterative Krylov subspace-based method enhanced with a block-Jacobi preconditioner on a cluster of multicore processors. In order to tackle large-scale problems, we develop task-parallel implementations of the preconditioned conjugate gradient method that improve the interoperability between the message-passing interface and OmpSs programming models. Specifically, we progressively integrate several communication-reduction and iteration-fusing strategies into the initial code, obtaining more efficient versions of the method. For all these implementations, we analyze the communication patterns and perform a comparative analysis of their performance and scalability on a cluster consisting of 32 nodes with 24 cores each. The experimental analysis shows that the techniques described in the paper outperform the classical method by a margin that varies between 6 and 48%, depending on the evaluation

Repositori Institucional de la Universitat Jaume I

Iteration-fusing conjugate gradient for sparse linear systems with MPI + OmpSs

Author: Aliaga José I
Barreda María
Beltran Vicenç
Casas Marc
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

UPCommons. Portal del coneixement obert de la UPC

Parallelization of implicit finite difference schemes in computational fluid dynamics

Author: Decker Naomi H.
Naik Vijay K.
Nicoules Michel
Publication venue
Publication date
Field of study

Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed

NASA Technical Reports Server