2,010 research outputs found

    Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

    Full text link
    GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighborsearching, and we discuss the present and future challenges we see for exascale simulation - in particular a very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity.Comment: EASC 2014 conference proceedin

    Rapid codesign of a soft vector processor and its compiler

    Get PDF
    Despite a decade of activity in the development of soft vector processors for FPGAs, high-level language support remains thin. We attribute this problem to a design method in which the high-level vector programming interface is only really considered once the processor architecture has been perfected, by which point the designer may be committed to the timeconsuming development of a complicated compiler. In this paper, we present the codesign of a soft vector processor and a lightweight compiler, which together lift the level of abstraction for the programmer while allowing a rapid compiler implementation phase.We demonstrate the effectiveness of our approach on a range of applications from digital signal processing, neuroscience, and machine learning.This work is sponsored by EPSRC grant EP/G015783/1.This is the accepted manuscript version. The final version is available at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6927425&tag=1. © IEEE 201

    Empowering parallel computing with field programmable gate arrays

    Get PDF
    After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements

    SPARROW: A low-cost hardware/software co-designed SIMD microarchitecture for AI operations in space processors

    Get PDF
    Recently there is an increasing interest in the use of artificial intelligence for on-board processing as indicated by the latest space missions, which cannot be satisfied by existing low-performance space-qualified processors. Although COTS AI accelerators can provide the required performance, they are not designed to meet space requirements. In this work, we co-design a low-cost SIMD micro-architecture integrated in a space qualified processor, which can significantly increase its performance. Our solution has no impact on the processor's 100 MHz frequency and consumes minimal area thanks to its innovative design compared to conventional vector micro-architectures. For the minimum configuration of our baseline space processor, our results indicate a performance boost of up to 9.3× for commonly used AI-related and image processing algorithms and 5.5× faster for a complex, space-relevant inference application with just 30% area increase.This work was supported by ESA through the GPU4S (GPU for Space) project, the Spanish Ministry of Economy and Competitiveness under grants PID2019- 107255GB and FJCI-2017-34095 (Spanish State Research Agency / http://dx.doi.org/10.13039/501100011033), the European Commission’s Horizon 2020 programme under the UP2DATE project (grant agreement 871465), the HiPEAC Network of Excellence and a first prize in Xilinx’s University Open Hardware Competition 2021 in the student category.Peer ReviewedPostprint (author's final draft
    corecore