2 research outputs found

    EXSCALATE: An Extreme-Scale Virtual Screening Platform for Drug Discovery Targeting Polypharmacology to Fight SARS-CoV-2

    The social and economic impact of the COVID-19 pandemic demands a reduction of the time required to find a therapeutic cure. In this paper, we describe EXSCALATE, a molecular docking platform capable of scaling to an entire modern supercomputer to support extreme-scale virtual screening campaigns. Such virtual experiments can quickly indicate which molecules to consider in the next stages of the drug discovery pipeline, a key asset in the event of a pandemic. The EXSCALATE platform is designed to benefit from heterogeneous computation nodes and to reduce scaling issues. In particular, we maximized accelerator usage, minimized communication between nodes, and aggregated I/O requests to serve them more efficiently. Moreover, we balanced the computation across nodes by designing an ad-hoc workflow based on a per-molecule execution-time prediction. We deployed the platform on two HPC supercomputers, with a combined computational power of 81 PFLOPS, to evaluate the interactions between 70 billion small molecules and 15 binding sites of 12 SARS-CoV-2 viral proteins. The experiment lasted 60 hours and performed more than one trillion ligand-pocket evaluations, setting a new record for the scale of virtual screening.
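
    As a rough illustration of the load-balancing idea described above, the sketch below greedily assigns ligands to the currently least-loaded node using a predicted per-molecule docking time (a longest-processing-time-first heuristic). The Ligand struct, the predictor field, and all names are illustrative assumptions for this sketch, not EXSCALATE's actual interfaces.

    // Hypothetical sketch (not EXSCALATE's real code): greedy assignment of
    // ligands to the least-loaded node, driven by a predicted docking time
    // per molecule (longest-processing-time-first heuristic).
    #include <algorithm>
    #include <cstdio>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    struct Ligand {
        int id;               // index into the input batch (0..n-1)
        double predicted_ms;  // predicted docking time, e.g. from atom count
    };

    // Returns, for each ligand id, the node it is assigned to.
    std::vector<int> balance(std::vector<Ligand> ligands, int num_nodes) {
        // Place the most expensive molecules first.
        std::sort(ligands.begin(), ligands.end(),
                  [](const Ligand& a, const Ligand& b) {
                      return a.predicted_ms > b.predicted_ms;
                  });
        // Min-heap of (accumulated load, node index): top() is the emptiest node.
        using Slot = std::pair<double, int>;
        std::priority_queue<Slot, std::vector<Slot>, std::greater<Slot>> nodes;
        for (int n = 0; n < num_nodes; ++n) nodes.push({0.0, n});

        std::vector<int> assignment(ligands.size());
        for (const Ligand& l : ligands) {
            auto [load, node] = nodes.top();
            nodes.pop();
            assignment[l.id] = node;
            nodes.push({load + l.predicted_ms, node});
        }
        return assignment;
    }

    int main() {
        std::vector<Ligand> batch{{0, 12.5}, {1, 3.1}, {2, 9.8}, {3, 7.7}};
        std::vector<int> where = balance(batch, 2);
        for (int i = 0; i < static_cast<int>(where.size()); ++i)
            std::printf("ligand %d -> node %d\n", i, where[i]);
        return 0;
    }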

    Efficient NAS parallel benchmark kernels with CUDA

    The NAS Parallel Benchmarks (NPB) are one of the standard benchmark suites used to evaluate parallel hardware and software. Many research efforts have provided parallel versions beyond the original OpenMP and MPI implementations; for GPU accelerators, however, only OpenCL and OpenACC versions are available in consolidated form. Our goal is to provide an efficient parallel implementation of the five NPB kernels in CUDA. Our contribution covers several aspects. First, we followed parallel programming best practices when implementing the NPB kernels in CUDA. Second, support for larger workloads (classes B and C) allows us to stress and investigate the memory of robust GPUs. Third, we show that the NPB can be made efficient and suitable for GPUs even though the benchmarks were originally designed for CPUs. In some cases we achieve double the performance of the state of the art, along with efficient memory usage. Fourth, we present new experiments comparing performance and memory usage against the state-of-the-art OpenACC and OpenCL versions on a relatively new GPU architecture. The experimental results also reveal that our version is the fastest for all the NPB kernels compared to OpenACC and OpenCL, with the greatest differences observed for the FT and EP kernels.
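
    As a rough illustration of the kind of CUDA pattern such a port relies on, the sketch below combines a grid-stride loop with a shared-memory block reduction to compute a global sum, the typical building block for tallies like those in the EP kernel. The kernel and all names are illustrative assumptions, not the authors' code; note that double-precision atomicAdd requires compute capability 6.0 or higher.

    // Hypothetical sketch (not the authors' code): a grid-stride loop feeding
    // a shared-memory block reduction, the typical CUDA building block for
    // global sums such as the tallies in the EP kernel.
    // Note: atomicAdd on double requires compute capability 6.0 or higher.
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    __global__ void sum_kernel(const double* in, double* out, int n) {
        extern __shared__ double cache[];
        double local = 0.0;
        // Grid-stride loop: each thread accumulates several elements, so one
        // launch configuration covers any workload class (A, B, C).
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += blockDim.x * gridDim.x)
            local += in[i];
        cache[threadIdx.x] = local;
        __syncthreads();
        // Tree reduction in shared memory (blockDim.x must be a power of two),
        // then a single atomic per block.
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s) cache[threadIdx.x] += cache[threadIdx.x + s];
            __syncthreads();
        }
        if (threadIdx.x == 0) atomicAdd(out, cache[0]);
    }

    int main() {
        const int n = 1 << 20;
        std::vector<double> h_in(n, 1.0);  // host data: all ones, so sum == n
        double *d_in = nullptr, *d_out = nullptr, h_out = 0.0;
        cudaMalloc(&d_in, n * sizeof(double));
        cudaMalloc(&d_out, sizeof(double));
        cudaMemcpy(d_in, h_in.data(), n * sizeof(double), cudaMemcpyHostToDevice);
        cudaMemset(d_out, 0, sizeof(double));
        const int threads = 256, blocks = 256;
        sum_kernel<<<blocks, threads, threads * sizeof(double)>>>(d_in, d_out, n);
        cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
        std::printf("sum = %.0f (expected %d)\n", h_out, n);
        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }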