20,123 research outputs found

    Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

    GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighborsearching, and we discuss the present and future challenges we see for exascale simulation - in particular a very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity. Comment: EASC 2014 conference proceedin

    Towards a more realistic sink particle algorithm for the RAMSES code

    We present a new sink particle algorithm developed for the Adaptive Mesh Refinement code RAMSES. Our main addition is the use of a clump finder to identify density peaks and their associated regions (the peak patches). This allows us to unambiguously define a discrete set of dense molecular cores as potential sites for sink particle formation. Furthermore, we develop a new scheme to decide if the gas in which a sink could potentially form is indeed gravitationally bound and rapidly collapsing. This is achieved using a general integral form of the virial theorem, where we use the curvature in the gravitational potential to correctly account for the background potential. We detail all the necessary steps to follow the evolution of sink particles in turbulent molecular cloud simulations, such as sink production, their trajectory integration, sink merging and finally the gas accretion rate onto an existing sink. We compare our new recipe for sink formation to other popular implementations. Statistical properties such as the sink mass function, the average sink mass and the sink multiplicity function are used to evaluate the impact that our new scheme has on accurately predicting fundamental quantities such as the stellar initial mass function or the stellar multiplicity function. Comment: submitted to MNRAS, 24 pages, 19 figures, 5 table

    Design and optimization of a portable LQCD Monte Carlo code using OpenACC

    The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core GPUs, exploiting aggressive data-parallelism and delivering higher performance for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; this is very relevant in scientific computing where code changes are very frequent, making it tedious and prone to error to keep different code versions aligned. In this work we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached. Comment: 26 pages, 2 png figures, preprint of an article submitted for consideration in International Journal of Modern Physics