2,646 research outputs found

    Multi-partitioning for ADI-schemes on message passing architectures

    A kind of discrete-operator splitting called Alternating Direction Implicit (ADI) has been found to be useful in simulating fluid flow problems. In particular, it is being used to study the effects of hot exhaust jets from high-performance aircraft on landing surfaces. Decomposition techniques that minimize load imbalance and message-passing frequency are described. Three strategies that are investigated for implementing the NAS Scalar Penta-diagonal Parallel Benchmark (SP) are transposition, pipelined Gaussian elimination, and multipartitioning. The multipartitioning strategy, which was used on Ethernet, was found to be the most efficient, although it was considered only a moderate success because of Ethernet's limited communication properties. The efficiency derived largely from the coarse granularity of the strategy, which reduced latencies and allowed overlap of communication and computation.
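
The abstract does not spell out how multipartitioning assigns work, so the following is only a sketch of one commonly described two-dimensional variant, not the authors' implementation. It assumes the domain is cut into a p-by-p grid of tiles and that tile (i, j) goes to processor (j - i) mod p; the function names are hypothetical. Under that assumption every processor owns exactly one tile in each row and each column of tiles, so a line sweep in either direction keeps all processors busy at every stage.

```python
def multipartition_owner(i, j, p):
    """Owner of tile (i, j) in a p-by-p tiling under a diagonal mapping.

    Assumption: the (j - i) mod p rule is an illustrative choice, not
    taken from the abstract above.
    """
    return (j - i) % p


def owners_grid(p):
    """Return the p-by-p table of tile owners."""
    return [[multipartition_owner(i, j, p) for j in range(p)] for i in range(p)]


def is_balanced(grid):
    """Check that each row and each column contains every processor once."""
    p = len(grid)
    everyone = set(range(p))
    rows_ok = all(set(row) == everyone for row in grid)
    cols_ok = all({grid[i][j] for i in range(p)} == everyone for j in range(p))
    return rows_ok and cols_ok
```

Calling `owners_grid(4)` and passing the result to `is_balanced` checks the load-balance property for four processors.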

    Causal Consistency: Beyond Memory

    In distributed systems where strong consistency is costly, if not impossible, causal consistency provides a valuable abstraction to represent program executions as partial orders. In addition to the sequential program order of each computing entity, causal order also contains the semantic links between the events that affect the shared objects -- message emission and reception in a communication channel, reads and writes on a shared register. Usual approaches based on semantic links are very difficult to adapt to other data types such as queues or counters because they require a specific analysis of causal dependencies for each data type. This paper presents a new approach to define causal consistency for any abstract data type based on sequential specifications. It explores, formalizes and studies the differences between three variations of causal consistency and highlights them in the light of PRAM, eventual consistency and sequential consistency: weak causal consistency, which captures the notion of causality preservation when focusing on convergence; causal convergence, which mixes weak causal consistency and convergence; and causal consistency, which coincides with causal memory when applied to shared memory. Comment: 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Mar 2016, Barcelona, Spain

    Analyzing the effect of local rounding error propagation on the maximal attainable accuracy of the pipelined Conjugate Gradient method

    Pipelined Krylov subspace methods typically offer improved strong scaling on parallel HPC hardware compared to standard Krylov subspace methods for large and sparse linear systems. In pipelined methods the traditional synchronization bottleneck is mitigated by overlapping time-consuming global communications with useful computations. However, to achieve this communication hiding strategy, pipelined methods introduce additional recurrence relations for a number of auxiliary variables that are required to update the approximate solution. This paper aims at studying the influence of local rounding errors that are introduced by the additional recurrences in the pipelined Conjugate Gradient method. Specifically, we analyze the impact of local round-off effects on the attainable accuracy of the pipelined CG algorithm and compare to the traditional CG method. Furthermore, we estimate the gap between the true residual and the recursively computed residual used in the algorithm. Based on this estimate we suggest an automated residual replacement strategy to reduce the loss of attainable accuracy on the final iterative solution. The resulting pipelined CG method with residual replacement improves the maximal attainable accuracy of pipelined CG, while maintaining the efficient parallel performance of the pipelined method. This conclusion is substantiated by numerical results for a variety of benchmark problems. Comment: 26 pages, 6 figures, 2 tables, 4 algorithms
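
To make the "gap between the true residual and the recursively computed residual" concrete, here is a minimal sketch using the traditional CG method, not the pipelined variant analyzed in the paper. It is written in plain Python for dense matrices, and the function names are hypothetical. At each step it records the largest componentwise difference between the recursively updated residual and the explicitly recomputed b - Ax. A residual replacement strategy would monitor a quantity of this kind, though the paper's actual estimate and replacement criterion are not reproduced here.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_with_residual_gap(A, b, iters):
    """Run standard CG from x = 0 and track the residual gap per iteration.

    Returns the final iterate and a list of max|(b - A x) - r| values,
    where r is the recursively updated residual.
    """
    x = [0.0] * len(b)
    r = b[:]                 # recursively updated residual
    p = r[:]                 # search direction
    rr = dot(r, r)
    gaps = []
    for _ in range(iters):
        if rr == 0.0:        # already converged exactly
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        # Explicitly recomputed ("true") residual, used only for diagnosis.
        true_r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        gaps.append(max(abs(t - ri) for t, ri in zip(true_r, r)))
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, gaps
```

For a small well-conditioned system the recorded gaps stay near machine precision; the paper's concern is that the extra recurrences in pipelined CG can make this gap grow.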

    Iteration-fusing conjugate gradient for sparse linear systems with MPI + OmpSs

    In this paper, we target the parallel solution of sparse linear systems via an iterative Krylov subspace-based method enhanced with a block-Jacobi preconditioner on a cluster of multicore processors. In order to tackle large-scale problems, we develop task-parallel implementations of the preconditioned conjugate gradient method that improve the interoperability between the message-passing interface and OmpSs programming models. Specifically, we progressively integrate several communication-reduction and iteration-fusing strategies into the initial code, obtaining more efficient versions of the method. For all these implementations, we analyze the communication patterns and perform a comparative analysis of their performance and scalability on a cluster consisting of 32 nodes with 24 cores each. The experimental analysis shows that the techniques described in the paper outperform the classical method by a margin that varies between 6 and 48%, depending on the evaluation.

    Iteration-fusing conjugate gradient for sparse linear systems with MPI + OmpSs

    In this paper, we target the parallel solution of sparse linear systems via an iterative Krylov subspace-based method enhanced with a block-Jacobi preconditioner on a cluster of multicore processors. In order to tackle large-scale problems, we develop task-parallel implementations of the preconditioned conjugate gradient method that improve the interoperability between the message-passing interface and OmpSs programming models. Specifically, we progressively integrate several communication-reduction and iteration-fusing strategies into the initial code, obtaining more efficient versions of the method. For all these implementations, we analyze the communication patterns and perform a comparative analysis of their performance and scalability on a cluster consisting of 32 nodes with 24 cores each. The experimental analysis shows that the techniques described in the paper outperform the classical method by a margin that varies between 6 and 48%, depending on the evaluation. This research was partially supported by the H2020 EU FETHPC Project 671602 “INTERTWinE.” The researchers from Universidad Jaume I were sponsored by Project TIN2017-82972-R of the Spanish Ministerio de Economía y Competitividad. Maria Barreda was supported by the POSDOC-A/2017/11 project from the Universitat Jaume I. Peer Reviewed. Postprint (author's final draft)

    Parallelization of implicit finite difference schemes in computational fluid dynamics

    Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed.
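
The global data dependencies mentioned in this abstract can be seen in a scalar tridiagonal solve, which is a simplification of the block tri-diagonal systems the paper discusses. The sketch below is the standard Thomas algorithm in plain Python, assuming a diagonally dominant system so that no pivoting is needed; the function name and argument layout are illustrative choices. Each step of the forward elimination needs the result of the previous step, and the back substitution runs the same chain in reverse, which is why a line cannot simply be split across processors.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system without pivoting.

    All four lists have length n. lower[0] and upper[n - 1] are ignored.
    Row i reads: lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1] = rhs[i].
    """
    n = len(diag)
    c = [0.0] * n   # modified super-diagonal
    d = [0.0] * n   # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination: step i depends on step i - 1 (a recurrence).
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution: step i depends on step i + 1.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

As a usage example, `thomas_solve([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0], [0.0, 0.0, 4.0])` solves a three-point second-difference system whose exact solution is 1, 2, 3.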