1,928 research outputs found

    Run-time scheduling and execution of loops on message passing machines

    Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent
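The two phases described in this abstract (local numbering of each processor's indices, then precomputing what data each processor needs) can be illustrated with a minimal single-process sketch. This is an assumption-laden illustration, not the paper's preprocessor: the names `inspect`, `execute`, `owner`, `refs`, and `x_parts` are hypothetical, the loop body is assumed to be a simple sum over indirectly referenced values, and message passing is simulated by direct list reads.

```python
def inspect(owner, refs, nprocs):
    """Inspector: build per-processor local numbering and a gather schedule.

    owner[g] is the processor that owns global index g (an arbitrary partition).
    refs[g] lists the global indices read when computing iteration g.
    """
    plans = []
    for p in range(nprocs):
        owned = [g for g, q in enumerate(owner) if q == p]
        local = {g: l for l, g in enumerate(owned)}  # global -> local number
        ghosts = []  # off-processor indices, numbered after the owned ones
        for g in owned:
            for r in refs[g]:
                if r not in local:
                    local[r] = len(owned) + len(ghosts)
                    ghosts.append(r)
        # Translate every reference to local numbering once, ahead of time.
        local_refs = [[local[r] for r in refs[g]] for g in owned]
        plans.append({"owned": owned, "ghosts": ghosts, "local_refs": local_refs})
    return plans


def execute(plans, owner, x_parts):
    """Executor: gather ghost values, then run the loop on local indices only."""
    results = []
    for p, plan in enumerate(plans):
        buf = list(x_parts[p])
        # Simulated communication: fetch each ghost value from its owner.
        for g in plan["ghosts"]:
            q = owner[g]
            buf.append(x_parts[q][plans[q]["owned"].index(g)])
        results.append([sum(buf[l] for l in lr) for lr in plan["local_refs"]])
    return results
```

The inspector runs once per partition; the executor can then be reused for every sweep of the loop as long as the reference pattern in `refs` does not change, which is where the precomputation pays off.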

    A new approximate matrix factorization for implicit time integration in air pollution modeling

    Implicit time stepping typically requires the solution of one or several linear systems with a matrix I−τJ per time step, where J is the Jacobian matrix. If solving these systems is expensive, replacing I−τJ with its approximate matrix factorization (AMF) (I−τR)(I−τV), R+V=J, often leads to a good compromise between the stability and accuracy of the time integration on the one hand and its efficiency on the other. For example, in air pollution modeling, AMF has been used successfully in the framework of Rosenbrock schemes. The standard AMF approximates I−τJ with the error τ²RV, which can be significant in norm. In this paper we propose a new AMF. Under the assumption that −V is an M-matrix, the error of the new AMF can be shown to have an upper bound τ||R||, while still being asymptotically O(τ²). This new AMF, called AMF+, is equal in cost to the standard AMF and, as both analysis and numerical experiments reveal, provides better accuracy. We also report on our experience with another, cheaper AMF and with AMF-preconditioned GMRES.
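The error term quoted for the standard AMF follows directly from expanding the product, using only the splitting R+V=J stated in the abstract. The short derivation below covers the standard AMF only; the form of AMF+ is not given in the abstract and is not reconstructed here.

```latex
\begin{aligned}
(I - \tau R)(I - \tau V)
  &= I - \tau R - \tau V + \tau^{2} R V \\
  &= I - \tau (R + V) + \tau^{2} R V \\
  &= (I - \tau J) + \tau^{2} R V .
\end{aligned}
```

The factorized matrix therefore differs from I−τJ by exactly τ²RV, which is formally second order in τ but can still be large when the product RV is large in norm.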

    Batch solution of small PDEs with the OPS DSL

    In this paper we discuss the challenges and optimisation opportunities when solving a large number of small, equally sized discretised PDEs on regular grids. We present an extension of the OPS (Oxford Parallel library for Structured meshes) embedded Domain Specific Language, and show how support can be added for solving multiple systems, and how OPS makes it easy to deploy a variety of transformations and optimisations. The new capabilities in OPS allow data structure transformations and execution schedule transformations to be applied automatically, delivering high performance on a variety of hardware platforms. We evaluate our work on an industrially representative finance simulation on Intel CPUs, as well as NVIDIA GPUs.
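To make the idea of a data structure transformation for batched solves concrete, here is a minimal sketch in plain Python. It does not use the OPS API. It assumes, purely for illustration, that each small system is tridiagonal and that the batch is stored interleaved, so that entry i of system s sits at flat index `i * nbatch + s`. The function name `thomas_batched` and this layout are hypothetical choices, not taken from the paper.

```python
def thomas_batched(a, b, c, d, n, nbatch):
    """Solve nbatch tridiagonal systems of size n stored interleaved.

    Entry i of system s lives at flat index i * nbatch + s, so the inner
    loop over s walks through consecutive elements. a, b, c are the sub-,
    main and super-diagonals; d is the right-hand side. Inputs are not
    modified. No pivoting is done, so each system is assumed to be
    well conditioned (for example, diagonally dominant).
    """
    size = n * nbatch
    cp = [0.0] * size  # modified super-diagonal
    dp = [0.0] * size  # modified right-hand side
    x = [0.0] * size

    # Forward elimination, first row of every system.
    for s in range(nbatch):
        cp[s] = c[s] / b[s]
        dp[s] = d[s] / b[s]

    # Forward elimination, remaining rows; the batch index is innermost.
    for i in range(1, n):
        for s in range(nbatch):
            k = i * nbatch + s
            km = (i - 1) * nbatch + s
            denom = b[k] - a[k] * cp[km]
            cp[k] = c[k] / denom
            dp[k] = (d[k] - a[k] * dp[km]) / denom

    # Back substitution.
    for s in range(nbatch):
        k = (n - 1) * nbatch + s
        x[k] = dp[k]
    for i in range(n - 2, -1, -1):
        for s in range(nbatch):
            k = i * nbatch + s
            kp = (i + 1) * nbatch + s
            x[k] = dp[k] - cp[k] * x[kp]
    return x
```

Swapping the index expression to `s * n + i` and exchanging the two loops would give the system-major layout instead. Choosing between such layouts per hardware target is the kind of decision the abstract says a DSL can make automatically.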