139 research outputs found

    Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation

    Get PDF
    Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded jagged diagonals storage" (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme. In our test scenarios the pJDS format cuts the overall spMVM memory footprint on the GPGPU by up to 70%, and achieves 95% to 130% of the ELLPACK-R performance. Using a suitable performance model we identify performance bottlenecks on the node level that invalidate some types of matrix structures for efficient multi-GPGPU parallelization. For appropriate sparsity patterns we extend previous work on distributed-memory parallel spMVM to demonstrate a scalable hybrid MPI-GPGPU code, achieving efficient overlap of communication and computation.Comment: 10 pages, 5 figures. Added reference to other recent sparse matrix format

    Parallel sparse matrix-vector multiplication as a test case for hybrid MPI+OpenMP programming

    Full text link
    We evaluate optimized parallel sparse matrix-vector operations for two representative application areas on widespread multicore-based cluster configurations. First the single-socket baseline performance is analyzed and modeled with respect to basic architectural properties of standard multicore chips. Going beyond the single node, parallel sparse matrix-vector operations often suffer from an unfavorable communication to computation ratio. Starting from the observation that nonblocking MPI is not able to hide communication cost using standard MPI implementations, we demonstrate that explicit overlap of communication and computation can be achieved by using a dedicated communication thread, which may run on a virtual core. We compare our approach to pure MPI and the widely used "vector-like" hybrid programming strategy.Comment: 12 pages, 6 figure

    Preconditioners for state constrained optimal control problems\ud with Moreau-Yosida penalty function tube

    Get PDF
    Optimal control problems with partial differential equations play an important role in many applications. The inclusion of bound constraints for the state poses a significant challenge for optimization methods. Our focus here is on the incorporation of the constraints via the Moreau-Yosida regularization technique. This method has been studied recently and has proven to be advantageous compared to other approaches. In this paper we develop preconditioners for the efficient solution of the Newton steps associated with the fast solution of the Moreau-Yosida regularized problem. Numerical results illustrate the competitiveness of this approach. \ud \ud Copyright c 2000 John Wiley & Sons, Ltd

    Preconditioning for Allen-Cahn variational inequalities with non-local constraints

    Get PDF
    The solution of Allen-Cahn variational inequalities with mass constraints is of interest in many applications. This problem can be solved both in its scalar and vector-valued form as a PDE-constrained optimization problem by means of a primal-dual active set method. At the heart of this method lies the solution of linear systems in saddle point form. In this paper we propose the use of Krylov-subspace solvers and suitable preconditioners for the saddle point systems. Numerical results illustrate the competitiveness of this approach

    Preconditioners for state constrained optimal control problems with Moreau-Yosida penalty function

    Get PDF
    Optimal control problems with partial differential equations as constraints play an important role in many applications. The inclusion of bound constraints for the state variable poses a significant challenge for optimization methods. Our focus here is on the incorporation of the constraints via the Moreau-Yosida regularization technique. This method has been studied recently and has proven to be advantageous compared to other approaches. In this paper we develop robust preconditioners for the efficient solution of the Newton steps associated with solving the Moreau-Yosida regularized problem. Numerical results illustrate the efficiency of our approach

    Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems

    Full text link
    We evaluate optimized parallel sparse matrix-vector operations for several representative application areas on widespread multicore-based cluster configurations. First the single-socket baseline performance is analyzed and modeled with respect to basic architectural properties of standard multicore chips. Beyond the single node, the performance of parallel sparse matrix-vector operations is often limited by communication overhead. Starting from the observation that nonblocking MPI is not able to hide communication cost using standard MPI implementations, we demonstrate that explicit overlap of communication and computation can be achieved by using a dedicated communication thread, which may run on a virtual core. Moreover we identify performance benefits of hybrid MPI/OpenMP programming due to improved load balancing even without explicit communication overlap. We compare performance results for pure MPI, the widely used "vector-like" hybrid programming strategies, and explicit overlap on a modern multicore-based cluster and a Cray XE6 system.Comment: 16 pages, 10 figure

    All-at-once solution of time-dependent PDE-constrained optimization problems

    Get PDF
    Time-dependent partial differential equations (PDEs) play an important role in applied mathematics and many other areas of science. One-shot methods try to compute the solution to these problems in a single iteration that solves for all time-steps at the same time. In this paper, we look at one-shot approaches for the optimal control of time-dependent PDEs and focus on the fast solution of these problems. The use of Krylov subspace solvers together with an efficient preconditioner allows for minimal storage requirements. We solve only approximate time-evolutions for both forward and adjoint problem and compute accurate solutions of a given control problem only at convergence of the overall Krylov subspace iteration. We show that our approach can give competitive results for a variety of problem formulations

    All-at-Once Solution if Time-Dependent PDE-Constrained Optimisation Problems

    Get PDF
    Time-dependent partial differential equations (PDEs) play an important role in applied mathematics and many other areas of science. One-shot methods try to compute the solution to these problems in a single iteration that solves for all time-steps at the same time. In this paper, we look at one-shot approaches for the optimal control of time-dependent PDEs and focus on the fast solution of these problems. The use of Krylov subspace solvers together with an efficient preconditioner allows for minimal storage requirements. We solve only approximate time-evolutions for both forward and adjoint problem and compute accurate solutions of a given control problem only at convergence of the overall Krylov subspace iteration. We show that our approach can give competitive results for a variety of problem formulations
    corecore