10,368 research outputs found

    Parallel sparse matrix-vector multiplication as a test case for hybrid MPI+OpenMP programming

    Full text link
    We evaluate optimized parallel sparse matrix-vector operations for two representative application areas on widespread multicore-based cluster configurations. First the single-socket baseline performance is analyzed and modeled with respect to basic architectural properties of standard multicore chips. Going beyond the single node, parallel sparse matrix-vector operations often suffer from an unfavorable communication to computation ratio. Starting from the observation that nonblocking MPI is not able to hide communication cost using standard MPI implementations, we demonstrate that explicit overlap of communication and computation can be achieved by using a dedicated communication thread, which may run on a virtual core. We compare our approach to pure MPI and the widely used "vector-like" hybrid programming strategy.Comment: 12 pages, 6 figure

    Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems

    Full text link
    We evaluate optimized parallel sparse matrix-vector operations for several representative application areas on widespread multicore-based cluster configurations. First the single-socket baseline performance is analyzed and modeled with respect to basic architectural properties of standard multicore chips. Beyond the single node, the performance of parallel sparse matrix-vector operations is often limited by communication overhead. Starting from the observation that nonblocking MPI is not able to hide communication cost using standard MPI implementations, we demonstrate that explicit overlap of communication and computation can be achieved by using a dedicated communication thread, which may run on a virtual core. Moreover we identify performance benefits of hybrid MPI/OpenMP programming due to improved load balancing even without explicit communication overlap. We compare performance results for pure MPI, the widely used "vector-like" hybrid programming strategies, and explicit overlap on a modern multicore-based cluster and a Cray XE6 system.Comment: 16 pages, 10 figure

    A pseudospectral matrix method for time-dependent tensor fields on a spherical shell

    Full text link
    We construct a pseudospectral method for the solution of time-dependent, non-linear partial differential equations on a three-dimensional spherical shell. The problem we address is the treatment of tensor fields on the sphere. As a test case we consider the evolution of a single black hole in numerical general relativity. A natural strategy would be the expansion in tensor spherical harmonics in spherical coordinates. Instead, we consider the simpler and potentially more efficient possibility of a double Fourier expansion on the sphere for tensors in Cartesian coordinates. As usual for the double Fourier method, we employ a filter to address time-step limitations and certain stability issues. We find that a tensor filter based on spin-weighted spherical harmonics is successful, while two simplified, non-spin-weighted filters do not lead to stable evolutions. The derivatives and the filter are implemented by matrix multiplication for efficiency. A key technical point is the construction of a matrix multiplication method for the spin-weighted spherical harmonic filter. As example for the efficient parallelization of the double Fourier, spin-weighted filter method we discuss an implementation on a GPU, which achieves a speed-up of up to a factor of 20 compared to a single core CPU implementation.Comment: 33 pages, 9 figure
    corecore