145 research outputs found

    Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

    Full text link
    GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighborsearching, and we discuss the present and future challenges we see for exascale simulation - in particular a very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity.Comment: EASC 2014 conference proceedin

    GPU-Accelerated Algorithms for Compressed Signals Recovery with Application to Astronomical Imagery Deblurring

    Get PDF
    Compressive sensing promises to enable bandwidth-efficient on-board compression of astronomical data by lifting the encoding complexity from the source to the receiver. The signal is recovered off-line, exploiting GPUs parallel computation capabilities to speedup the reconstruction process. However, inherent GPU hardware constraints limit the size of the recoverable signal and the speedup practically achievable. In this work, we design parallel algorithms that exploit the properties of circulant matrices for efficient GPU-accelerated sparse signals recovery. Our approach reduces the memory requirements, allowing us to recover very large signals with limited memory. In addition, it achieves a tenfold signal recovery speedup thanks to ad-hoc parallelization of matrix-vector multiplications and matrix inversions. Finally, we practically demonstrate our algorithms in a typical application of circulant matrices: deblurring a sparse astronomical image in the compressed domain

    Development of High Performance Molecular Dynamics with Application to Multimillion-Atom Biomass Simulations

    Get PDF
    An understanding of the recalcitrance of plant biomass is important for efficient economic production of biofuel. Lignins are hydrophobic, branched polymers and form a residual barrier to effective hydrolysis of lignocellulosic biomass. Understanding lignin\u27s structure, dynamics and its interaction and binding to cellulose will help with finding more efficient ways to reduce its contribution to the recalcitrance. Molecular dynamics (MD) using the GROMACS software is employed to study these properties in atomic detail. Studying complex, realistic models of pretreated plant cell walls, requires simulations significantly larger than was possible before. The most challenging part of such large simulations is the computation of the electrostatic interaction. As a solution, the reaction-field (RF) method has been shown to give accurate results for lignocellulose systems, as well as good computational efficiency on leadership class supercomputers. The particle-mesh Ewald method has been improved by implementing 2D decomposition and thread level parallelization for molecules not accurately modeled by RF. Other scaling limiting computational components, such as the load balancing and memory requirements, were identified and addressed to allow such large scale simulations for the first time. This work was done with the help of modern software engineering principles, including code-review, continuous integration, and integrated development environments. These methods were adapted to the special requirements for scientific codes. Multiple simulations of lignocellulose were performed. The simulation presented primarily, explains the temperature-dependent structure and dynamics of individual softwood lignin polymers in aqueous solution. With decreasing temperature, the lignins are found to transition from mobile, extended to glassy, compact states. The low-temperature collapse is thermodynamically driven by the increase of the translational entropy and density fluctuations of water molecules removed from the hydration shell

    Meteorological modelling on the ICL distributed array processor and other parallel computers

    Get PDF

    Precision-Energy-Throughput Scaling Of Generic Matrix Multiplication and Convolution Kernels Via Linear Projections

    Get PDF
    Generic matrix multiplication (GEMM) and one-dimensional convolution/cross-correlation (CONV) kernels often constitute the bulk of the compute- and memory-intensive processing within image/audio recognition and matching systems. We propose a novel method to scale the energy and processing throughput of GEMM and CONV kernels for such error-tolerant multimedia applications by adjusting the precision of computation. Our technique employs linear projections to the input matrix or signal data during the top-level GEMM and CONV blocking and reordering. The GEMM and CONV kernel processing then uses the projected inputs and the results are accumulated to form the final outputs. Throughput and energy scaling takes place by changing the number of projections computed by each kernel, which in turn produces approximate results, i.e. changes the precision of the performed computation. Results derived from a voltage- and frequency-scaled ARM Cortex A15 processor running face recognition and music matching algorithms demonstrate that the proposed approach allows for 280%~440% increase of processing throughput and 75%~80% decrease of energy consumption against optimized GEMM and CONV kernels without any impact in the obtained recognition or matching accuracy. Even higher gains can be obtained if one is willing to tolerate some reduction in the accuracy of the recognition and matching applications
    • …
    corecore