143 research outputs found

    Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

    Full text link
    GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighborsearching, and we discuss the present and future challenges we see for exascale simulation - in particular a very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity.Comment: EASC 2014 conference proceedin

    Multiscale Universal Interface: A Concurrent Framework for Coupling Heterogeneous Solvers

    Full text link
    Concurrently coupled numerical simulations using heterogeneous solvers are powerful tools for modeling multiscale phenomena. However, major modifications to existing codes are often required to enable such simulations, posing significant difficulties in practice. In this paper we present a C++ library, i.e. the Multiscale Universal Interface (MUI), which is capable of facilitating the coupling effort for a wide range of multiscale simulations. The library adopts a header-only form with minimal external dependency and hence can be easily dropped into existing codes. A data sampler concept is introduced, combined with a hybrid dynamic/static typing mechanism, to create an easily customizable framework for solver-independent data interpretation. The library integrates MPI MPMD support and an asynchronous communication protocol to handle inter-solver information exchange irrespective of the solvers' own MPI awareness. Template metaprogramming is heavily employed to simultaneously improve runtime performance and code flexibility. We validated the library by solving three different multiscale problems, which also serve to demonstrate the flexibility of the framework in handling heterogeneous models and solvers. In the first example, a Couette flow was simulated using two concurrently coupled Smoothed Particle Hydrodynamics (SPH) simulations of different spatial resolutions. In the second example, we coupled the deterministic SPH method with the stochastic Dissipative Particle Dynamics (DPD) method to study the effect of surface grafting on the hydrodynamics properties on the surface. In the third example, we consider conjugate heat transfer between a solid domain and a fluid domain by coupling the particle-based energy-conserving DPD (eDPD) method with the Finite Element Method (FEM).Comment: The library source code is freely available under the GPLv3 license at http://www.cfm.brown.edu/repo/release/MUI

    Best bang for your buck: GPU nodes for GROMACS biomolecular simulations

    Full text link
    The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well exploited with a combination of SIMD, multi-threading, and MPI-based SPMD/MPMD parallelism, while GPUs can be used as accelerators to compute interactions offloaded from the CPU. Here we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance-to-price ratio, energy efficiency, and several other criteria. Though hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer-class GPUs this improvement equally reflects in the performance-to-price ratio. Although memory issues in consumer-class GPUs could pass unnoticed since these cards do not support ECC memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost-efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well-balanced ratio of CPU and consumer-class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime

    Trilinos I/O Support (Trios)

    Get PDF

    An Experimental Study on Relationship between Performance and Energy Consumption of Serial and Parallel Text Searching Algorithm.

    Get PDF
    The world data is growing vigorously intersecting of large ordered sets and it is a common problem in the evaluation of data queries to a search engine. Thus, text retrieval systems have become a popular way in providing support for text databases. However this becomes a major question among us like how much energy is consumed? How to reduce execution time in searching large amount of data? In this paper, text searching algorithm is using to study the relationship between performance of computer and amount of energy produced in serial and parallel text searching algorithm. The amount of energy produced should be reduced along with the execution time to increase performance in data searching. Based on data recorded from the series of experiments, Serial Text Searching Algorithm is saving energy and reducing power usage. However, their performance is reducing as a smaller processor speed is using. In contrast to Parallel Text Searching Algorithm, there are larger amount of energy consumed from this experiment. However, it is approved that the performance of parallel experiment is far better than a single node performance

    Type Oriented Parallel Programming

    Get PDF
    Context: Parallel computing is an important field within the sciences. With the emergence of multi, and soon many, core CPUs this is moving more and more into the domain of general computing. HPC programmers want performance, but at the moment this comes at a cost; parallel languages are either efficient or conceptually simple, but not both. Aim: To develop and evaluate a novel programming paradigm which will address the problem of parallel programming and allow for languages which are both conceptually simple and efficient. Method: A type-based approach, which allows the programmer to control all aspects of parallelism by the use and combination of types has been developed. As a vehicle to present and analyze this new paradigm a parallel language, Mesham, and associated compilation tools have also been created. By using types to express parallelism the programmer can exercise efficient, flexible control in a high level abstract model yet with a sufficiently rich amount of information in the source code upon which the compiler can perform static analysis and optimization. Results: A number of case studies have been implemented in Mesham. Official benchmarks have been performed which demonstrate the paradigm allows one to write code which is comparable, in terms of performance, with existing high performance solutions. Sections of the parallel simulation package, Gadget-2, have been ported into Mesham, where substantial code simplifications have been made. Conclusions: The results obtained indicate that the type-based approach does satisfy the aim of the research described in this thesis. By using this new paradigm the programmer has been able to write parallel code which is both simple and efficient

    CFD investigation of a complete floating offshore wind turbine

    Get PDF
    This chapter presents numerical computations for floating offshore wind turbines for a machine of 10-MW rated power. The rotors were computed using the Helicopter Multi-Block flow solver of the University of Glasgow that solves the Navier-Stokes equations in integral form using the arbitrary Lagrangian-Eulerian formulation for time-dependent domains with moving boundaries. Hydrodynamic loads on the support platform were computed using the Smoothed Particle Hydrodynamics method. This method is mesh-free, and represents the fluid by a set of discrete particles. The motion of the floating offshore wind turbine is computed using a Multi-Body Dynamic Model of rigid bodies and frictionless joints. Mooring cables are modelled as a set of springs and dampers. All solvers were validated separately before coupling, and the loosely coupled algorithm used is described in detail alongside the obtained results

    Demonstration of a coupled floating offshore wind turbine analysis with high-fidelity methods

    Get PDF
    This paper presents results of numerical computations for floating off-shore wind turbines using, as an example, a machine of 10-MW rated power. The aerodynamic loads on the rotor are computed using the Helicopter Multi-Block flow solver developed at the University of Liverpool. The method solves the Navier–Stokes equations in integral form using the arbitrary Lagrangian–Eulerian formulation for time-dependent domains with moving boundaries. Hydrodynamic loads on the support platform are computed using the Smoothed Particle Hydrodynamics method, which is mesh-free and represents the water and floating structures by a set of discrete elements, referred to as particles. The motion of the floating offshore wind turbine is computed using a Multi-Body Dynamic Model of rigid bodies and frictionless joints. Mooring cables are modelled as a set of springs and dampers. All solvers were validated separately before coupling, and the results are presented in this paper. The importance of coupling is assessed and the loosely coupled algorithm used is described in detail alongside the obtained results
    corecore