
    Improving pipelined time stepping algorithm for distributed memory multicomputers

    Time stepping algorithms with spatial parallelisation are commonly used to solve time dependent partial differential equations. Computation in each time step is carried out using all available processors before sequentially advancing to the next time step. In cases where few spatial components are involved and relatively many processors are available, this results in fine granularity and decreased scalability. A natural alternative is to parallelise the temporal domain. Several time parallelisation algorithms have been suggested over the past two decades. One of them pipelines iterations across time steps. In this pipelined time stepping method, however, communication between time steps is extensive during the pipelining process. This degrades performance in distributed memory environments, which often have high message latency. We present a modified pipelined time stepping algorithm based on delayed pipelining and reduced communication strategies to improve overall execution time in a distributed memory environment using MPI. Our goal is to reduce the inter-time-step communication while still providing adequate information for the next time step to converge. Numerical results confirm that the improved algorithm is faster than both the original pipelined algorithm and sequential time stepping with spatial parallelisation alone. The improved algorithm is most beneficial for fine granularity time dependent problems with limited spatial parallelisation.
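
    The sketch below illustrates the two ideas the abstract names, delayed pipelining and reduced inter-time-step communication, on a toy problem. It is an assumed reconstruction, not the authors' code: each MPI rank owns one backward-Euler step of the 1-D heat equation and relaxes it with Jacobi sweeps, forwarding its iterate downstream only after DELAY sweeps and then only every STRIDE sweeps. NX, ITERS, DELAY and STRIDE are illustrative parameters.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    NX, ITERS, DELAY, STRIDE = 64, 200, 20, 10   # illustrative parameters
    dx = 1.0 / (NX - 1)
    r = 1e-4 / dx**2                             # dt / dx^2 for backward Euler

    u_prev = np.sin(np.pi * np.linspace(0.0, 1.0, NX))  # stale guess for step rank-1
    u = u_prev.copy()                                   # iterate for step rank

    for it in range(ITERS):
        # One Jacobi sweep of (I - r*Laplacian) u = u_prev.
        un = u.copy()
        un[1:-1] = (u_prev[1:-1] + r * (u[2:] + u[:-2])) / (1.0 + 2.0 * r)
        u = un
        # Delayed pipelining and reduced communication: forward the iterate
        # only after DELAY sweeps, and then only every STRIDE sweeps.
        due = it + 1 >= DELAY and (it + 1 - DELAY) % STRIDE == 0
        if due and rank + 1 < size:
            comm.send(u, dest=rank + 1, tag=it)          # feed the next time step
        if due and rank > 0:
            u_prev = comm.recv(source=rank - 1, tag=it)  # fresher upstream iterate

    print("rank %d: ||u|| = %.4e" % (rank, np.linalg.norm(u)))

    Run with, e.g., mpiexec -n 4 python pipelined.py; raising STRIDE trades message count against the freshness of the upstream iterate each time step works from.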

    Fault-tolerant grid-based solvers: Combining concepts from sparse grids and MapReduce

    A key issue confronting petascale and exascale computing is the growth in the probability of soft and hard faults with increasing system size. A promising approach to this problem is the use of algorithms that are inherently fault tolerant. We introduce such an algorithm for the solution of partial differential equations, based on the sparse grid approach. Here, the solutions of multiple component grids are efficiently combined to achieve a solution on a full grid. The technique also lends itself to a (modified) MapReduce framework on a cluster of processors, with the map stage corresponding to allocating each component grid for solution over a subset of the processors, and the reduce stage corresponding to their combination. We describe how the sparse grid combination method can be modified to robustly solve partial differential equations in the presence of faults. This is based on a modified combination formula that can accommodate the loss of one or two component grids. We also discuss accuracy issues associated with this formula. We give details of a prototype implementation within a MapReduce framework using the dynamic process features and asynchronous message passing facilities of MPI. Results on a two-dimensional advection problem show that the errors after the loss of one or two sub-grids are within a factor of 3 of the fault-free sparse grid solution. They also indicate that the sparse grid technique with four times the resolution has approximately the same error as a full grid, while (for sufficiently high resolution) requiring much less computation and memory. We finally outline a MapReduce variant capable of responding to faults in ways other than re-scheduling of failed tasks. We discuss the likely software requirements for such a flexible MapReduce framework, the requirements it will impose on users' legacy codes, and the system's runtime behavior.
    J. W. Larson, M. Hegland, B. Harding, S. Roberts, L. Stals, A. P. Rendell, P. Strazdins, M. M. Ali, C. Kowitz, R. Nobes, J. Southern, N. Wilson, M. Li, Y. Oish
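
    As a rough illustration of the combination idea (an assumed sketch, not the paper's modified formula): in 2-D, component solutions u_{i,j} on anisotropic grids with 2^i x 2^j cells are combined as u_c = sum_{i+j=n} u_{i,j} - sum_{i+j=n-1} u_{i,j}. The Python below mimics the map stage (solving component grids, one of which "fails") and the reduce stage, with a deliberately conservative recovery: a spare diagonal is solved up front and the combination falls back to level n-1 when a level-n grid is lost. All names and the fallback policy are illustrative assumptions.

    import numpy as np

    def solve_component(i, j, failed=frozenset()):
        """Map stage: solve the PDE on a 2^i x 2^j component grid.
        A stand-in here: sample a known smooth function; None models a fault."""
        if (i, j) in failed:
            return None
        x = np.linspace(0.0, 1.0, 2**i + 1)
        y = np.linspace(0.0, 1.0, 2**j + 1)
        return np.sin(np.pi * x)[:, None] * np.sin(np.pi * y)[None, :]

    def interp1(u, m):
        # Linear interpolation of each row of u onto m points.
        xo = np.linspace(0.0, 1.0, u.shape[-1])
        xn = np.linspace(0.0, 1.0, m)
        return np.apply_along_axis(lambda row: np.interp(xn, xo, row), -1, u)

    def to_full(u, n):
        # Bilinear interpolation of a component solution onto the full 2^n grid.
        m = 2**n + 1
        return interp1(interp1(u, m).T, m).T

    def combine(level, ok):
        # Reduce stage: classical combination of the two surviving diagonals.
        full = np.zeros((2**level + 1, 2**level + 1))
        for (i, j), u in ok.items():
            if i + j == level:
                full += to_full(u, level)
            elif i + j == level - 1:
                full -= to_full(u, level)
        return full

    def diagonal_intact(q, ok):
        return all((i, q - i) in ok for i in range(1, q))

    n, failed = 5, {(2, 3)}                 # pretend grid (2,3) was lost
    grids = {}
    for q in (n, n - 1, n - 2):             # n-2 is a spare diagonal for recovery
        for i in range(1, q):
            grids[(i, q - i)] = solve_component(i, q - i, failed)
    ok = {k: u for k, u in grids.items() if u is not None}

    level = n                               # fall back until both diagonals survive
    while not (diagonal_intact(level, ok) and diagonal_intact(level - 1, ok)):
        level -= 1
    u_c = combine(level, ok)
    print("combined at level %d, grid shape %s" % (level, u_c.shape))

    The paper's recovery is finer-grained: modified combination coefficients absorb the loss of one or two grids without discarding a whole diagonal. The wholesale fallback above merely keeps the sketch short.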

    Communication-aware adaptive parareal with application to a nonlinear hyperbolic system of partial differential equations

    In the strong scaling limit, the performance of conventional spatial domain decomposition techniques for the parallel solution of PDEs saturates. When sub-domains become small, halo communication and other overheads come to dominate. A potential path beyond this scaling limit is to introduce domain decomposition in time, one popular approach being the Parareal algorithm, which has received a lot of attention due to its generality and potential scalability. However, low efficiency, particularly on convection-dominated problems, has limited the adoption of the method. In this paper we introduce a new strategy, Communication-Aware Adaptive Parareal (CAAP), to overcome some of these challenges. With CAAP, we choose time-subdomains short enough that convergence of the Parareal algorithm is quick, yet long enough that the overhead of communicating time-subdomain interfaces does not induce a new limit to parallel speed-up. Furthermore, we propose an adaptive work scheduling algorithm that overlaps consecutive Parareal cycles and decouples the number of time-subdomains from the number of active node-groups in an efficient manner, allowing comparatively high parallel efficiency. We demonstrate the viability of CAAP through the parallel-in-time integration of a hyperbolic system of PDEs in the form of the two-dimensional nonlinear shallow-water wave equation, solved using a third-order accurate WENO-RK discretization. For the computationally cheap approximate operator needed as a preconditioner in the Parareal corrections we use a lower-order Roe-type discretization. Time-parallel integration of purely hyperbolic evolution problems is traditionally considered impractical. Through large-scale numerical experiments we demonstrate that with CAAP it is possible not only to obtain time-parallel speedup on this class of evolution problems, but also to obtain parallel acceleration beyond what is possible using conventional spatial domain-decomposition techniques alone. The approach is widely applicable for parallel-in-time integration over long time domains, regardless of the class of evolution problem.
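
    For reference, the plain Parareal iteration that CAAP builds on can be stated compactly: with a fine propagator F and a cheap coarse propagator G, U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k). The sketch below runs this on the scalar ODE u' = -u as a stand-in for the shallow-water system; the F evaluations in each iteration are the part that parallelizes over time-subdomains. The adaptive scheduling and overlapped cycles of CAAP itself are not reproduced here.

    import numpy as np

    def propagate(u0, t0, t1, steps):
        """Forward-Euler integrator for u' = -u; `steps` controls accuracy."""
        u, dt = u0, (t1 - t0) / steps
        for _ in range(steps):
            u = u + dt * (-u)
        return u

    F = lambda u0, t0, t1: propagate(u0, t0, t1, steps=1000)  # fine, expensive
    G = lambda u0, t0, t1: propagate(u0, t0, t1, steps=1)     # coarse, cheap

    N, T, u0 = 10, 2.0, 1.0             # time-subdomains, horizon, initial data
    t = np.linspace(0.0, T, N + 1)

    U = np.empty(N + 1)                 # states at time-subdomain interfaces
    U[0] = u0
    for n in range(N):                  # iteration 0: pure coarse sweep
        U[n + 1] = G(U[n], t[n], t[n + 1])

    for k in range(5):
        # These fine solves are independent: one per time-subdomain, in parallel.
        Fu = [F(U[n], t[n], t[n + 1]) for n in range(N)]
        Gu = [G(U[n], t[n], t[n + 1]) for n in range(N)]
        Unew = U.copy()
        for n in range(N):              # sequential coarse correction sweep
            Unew[n + 1] = G(Unew[n], t[n], t[n + 1]) + Fu[n] - Gu[n]
        U = Unew
        print("iteration %d: error at T = %.2e" % (k + 1, abs(U[-1] - np.exp(-T))))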

    Latency tolerance through parallelization of time in scientific applications

    Distributed computing environments, such as the Grid, promise enormous raw computational power, but involve high communication overheads. It is therefore believed that they are primarily suited for "embarrassingly parallel" applications, such as Monte Carlo, and for certain applications where the loosely-coupled nature of the science involved leads to a coarse-grained computation. A typical application, however, does not naturally admit such a coarse-grained decomposition. We discuss our solution strategy, based on scalable functional decomposition, which can be used to keep the computation coarse-grained even on a large number of processors. Such a decomposition can be attempted through a variety of means; we discuss the use of time parallelization to achieve it. We demonstrate results with a model problem, and then discuss its implementation for an important problem in nanomaterials simulation. We also show that this technique can be extended to make it inherently fault-tolerant.
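
    One concrete reading of this strategy (an assumption on our part, not the authors' published scheme) is a predict-and-verify form of time parallelization: a cheap predictor, e.g. built from a related earlier run, guesses the state at each time-slice boundary; every processor then integrates its slice accurately from the guessed start, and slices are accepted as long as consecutive results agree. Only small boundary states cross the network, which keeps the computation coarse-grained and latency-tolerant. The sketch below uses u' = -u with a predictor that is assumed reliable only for t < 1.

    import numpy as np

    def fine(u0, t0, t1, steps=1000):
        """Accurate (expensive) integrator for the model ODE u' = -u."""
        u, dt = u0, (t1 - t0) / steps
        for _ in range(steps):
            u = u + dt * (-u)
        return u

    def predictor(t):
        """Cheap guess for u(t), e.g. from a related prior run; assumed
        accurate only for t < 1 to force a verification failure."""
        return np.where(t < 1.0, np.exp(-t), 0.5 * np.exp(-t))

    N, T, TOL = 8, 2.0, 1e-2
    t = np.linspace(0.0, T, N + 1)
    guess = predictor(t)                # predicted slice-boundary states

    # Each slice is independent given its predicted start state, so all N
    # fine solves can run concurrently, one per processor.
    slice_end = [fine(guess[n], t[n], t[n + 1]) for n in range(N)]

    # Verification: accept slice n only if its predicted start agrees with
    # the fine result of slice n-1; recomputation resumes at the first
    # mismatch (not shown).
    valid, u = 0, guess[0]
    for n in range(N):
        if abs(guess[n] - u) > TOL:
            break
        u = slice_end[n]
        valid = n + 1
    print("%d of %d time slices verified without recomputation" % (valid, N))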