
    86 PFLOPS Deep Potential Molecular Dynamics simulation of 100 million atoms with ab initio accuracy

    We present the GPU version of DeePMD-kit, which, upon training a deep neural network model using ab initio data, can drive extremely large-scale molecular dynamics (MD) simulations with ab initio accuracy. Our tests show that the GPU version is 7 times faster than the CPU version with the same power consumption. The code can scale up to the entire Summit supercomputer. For a copper system of 113,246,208 atoms, the code can perform one nanosecond of MD simulation per day, reaching a peak performance of 86 PFLOPS (43% of the peak). Such unprecedented ability to perform MD simulations with ab initio accuracy opens up the possibility of studying many important issues in materials and molecules, such as heterogeneous catalysis, electrochemical cells, irradiation damage, crack propagation, and biochemical reactions. Comment: 29 pages, 11 figures.
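    The basic pattern the abstract describes can be sketched in a few lines: a trained surrogate model supplies energies and forces inside an otherwise standard MD integrator. The toy Python example below is not DeePMD-kit; a Lennard-Jones function stands in for the neural network, and the system size, time step, and function names (lj_surrogate, velocity_verlet) are illustrative assumptions only.

```python
# Minimal sketch (not DeePMD-kit): a trained surrogate potential driving an MD loop.
# `model(positions) -> (energy, forces)` stands in for a neural-network potential
# fitted to ab initio data; here a Lennard-Jones function plays that role.
import numpy as np

def lj_surrogate(pos, eps=1.0, sigma=1.0):
    """Placeholder for a trained model: returns total energy and per-atom forces."""
    n = len(pos)
    energy = 0.0
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r = pos[i] - pos[j]
            d = np.linalg.norm(r)
            sr6 = (sigma / d) ** 6
            energy += 4 * eps * (sr6**2 - sr6)
            f = 24 * eps * (2 * sr6**2 - sr6) / d**2 * r   # force on atom i from j
            forces[i] += f
            forces[j] -= f
    return energy, forces

def velocity_verlet(model, pos, vel, mass, dt, n_steps):
    """Standard MD integrator; the model supplies the forces at each step."""
    _, forces = model(pos)
    for _ in range(n_steps):
        vel += 0.5 * dt * forces / mass
        pos += dt * vel
        _, forces = model(pos)
        vel += 0.5 * dt * forces / mass
    return pos, vel

# Small cubic cluster as a starting configuration.
pos = np.array([[i, j, k] for i in range(2) for j in range(2) for k in range(2)], float) * 1.2
vel = np.zeros_like(pos)
pos, vel = velocity_verlet(lj_surrogate, pos, vel, mass=1.0, dt=1e-3, n_steps=100)
print("final positions:\n", pos)
```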

    The role of graphics super-workstations in a supercomputing environment

    A new class of very powerful workstations has recently become available that integrates near-supercomputer computational performance with very powerful, high-quality graphics capability. These graphics super-workstations are expected to play an increasingly important role in providing an enhanced environment for supercomputer users. Their potential uses include: off-loading the supercomputer (by serving as stand-alone processors, by post-processing the output of supercomputer calculations, and by distributed or shared processing), scientific visualization (understanding and communication of results), and real-time interaction with the supercomputer (to steer an iterative computation, to abort a bad run, or to explore and develop new algorithms).

    Massive-parallel Implementation of the Resolution-of-Identity Coupled-cluster Approaches in the Numeric Atom-centered Orbital Framework for Molecular Systems

    We present a massively parallel implementation of the resolution-of-identity (RI) coupled-cluster approach that includes single, double, and perturbative triple excitations, namely RI-CCSD(T), in the FHI-aims package for molecular systems. A domain-based distributed-memory algorithm in the hybrid MPI/OpenMP framework has been designed to effectively utilize the memory bandwidth and to minimize interconnect communication, particularly for the tensor contraction in the evaluation of the particle-particle ladder term. Our implementation features rigorous avoidance of on-the-fly disk storage and excellent strong scaling up to 10,000 cores and beyond. Using a set of molecules of different sizes, we demonstrate that the parallel performance of our CCSD(T) code is competitive with the CC implementations in state-of-the-art high-performance computing (HPC) computational chemistry packages. We also demonstrate that the numerical error due to the RI approximation in our RI-CCSD(T) is negligibly small. Together with the correlation-consistent numeric atom-centered orbital (NAO) basis sets, NAO-VCC-nZ, the method is applied to produce accurate theoretical reference data for 22 bio-oriented weak interactions (S22), 11 conformational energies of gaseous cysteine conformers (CYCONF), and 32 isomerization energies (ISO32).
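    As a rough illustration of the distributed particle-particle ladder contraction mentioned above, the sketch below (not the FHI-aims implementation; the toy dimensions, array names, and data layout are assumptions) assembles slabs of the four-index integrals from RI factors on each MPI rank and contracts them away immediately, so the full virtual-virtual-virtual-virtual tensor is never stored.

```python
# Schematic sketch of a domain-distributed particle-particle ladder (ppl) contraction,
#   R[a,b,i,j] += sum_{e,f} (ab|ef) * t[i,j,e,f],
# with the four-index integrals assembled on the fly from RI factors,
#   (ab|ef) ~= sum_P B[a,e,P] * B[b,f,P].
# The first virtual index `a` is split across MPI ranks; local contractions use
# numpy.einsum as a stand-in for the on-node (OpenMP/BLAS) work of the real code.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_occ, n_virt, n_aux = 4, 16, 40                       # toy dimensions
rng = np.random.default_rng(0)

# RI factor B[v, v, P] is replicated here for brevity; a real code would distribute it too.
B = rng.standard_normal((n_virt, n_virt, n_aux))
t2 = rng.standard_normal((n_occ, n_occ, n_virt, n_virt))

# Each rank owns a contiguous slab of the first virtual index.
my_a = np.array_split(np.arange(n_virt), size)[rank]

# Build only the local slab of (ab|ef) and contract it away immediately.
W_slab = np.einsum('aeP,bfP->abef', B[my_a], B)        # (ab|ef) for a in this rank's slab
R_slab = np.einsum('abef,ijef->abij', W_slab, t2)      # ppl contribution to the residual

# Gather the residual slabs on rank 0 (a real code would keep them distributed).
R_parts = comm.gather(R_slab, root=0)
if rank == 0:
    R = np.concatenate(R_parts, axis=0)
    print('ppl residual shape:', R.shape)
```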

    Computational Cosmology and Astrophysics on Adaptive Meshes using Charm++

    Astrophysical and cosmological phenomena involve a large variety of physical processes and can encompass an enormous range of scales. To investigate these phenomena computationally, applications must support modeling them across similarly enormous ranges of scales, and must do so efficiently on high-performance computing platforms of ever-increasing parallelism and complexity. We describe Enzo-P, a petascale redesign of the ENZO adaptive mesh refinement astrophysics and cosmology application, along with Cello, the reusable and scalable adaptive mesh refinement software framework on which Enzo-P is built. Cello's scalability is enabled by the Charm++ Parallel Programming System, whose data-driven asynchronous execution model is ideal for exploiting the available but irregular parallelism in adaptive-mesh-refinement-based applications. We present scaling results on the NSF Blue Waters supercomputer and outline our plans to bring Enzo-P to the exascale era by targeting highly heterogeneous accelerator-based platforms. Comment: 5 pages, 6 figures, submitted to SC18 workshop PAW-ATM.
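    The data-driven asynchronous execution idea can be illustrated very loosely without Charm++ itself. The Python asyncio sketch below is a conceptual stand-in only: each overdecomposed mesh block starts its update as soon as its own ghost-zone messages arrive, rather than waiting at a global barrier. The block names, ring topology, and timings are invented for illustration.

```python
# Conceptual sketch only (asyncio, not Charm++): message-driven execution over an
# overdecomposed set of blocks.  A block runs when its own dependencies are satisfied,
# loosely mirroring the asynchronous scheduling Cello obtains from Charm++.
import asyncio, random

async def block_update(name, inbox, outboxes):
    # Wait only for this block's own ghost data (no global barrier).
    ghosts = [await inbox.get() for _ in range(len(outboxes))]
    await asyncio.sleep(random.uniform(0.01, 0.05))     # uneven per-block work
    for box in outboxes:                                # send fresh ghost data onward
        box.put_nowait(f'ghosts from {name}')
    print(f'{name} updated after receiving {len(ghosts)} messages')

async def main():
    names = ['B0', 'B1', 'B2', 'B3']
    queues = {n: asyncio.Queue() for n in names}
    # Simple ring of neighbours; a real AMR mesh would use the block hierarchy.
    neighbours = {n: [names[(i - 1) % 4], names[(i + 1) % 4]]
                  for i, n in enumerate(names)}
    # Seed the first exchange so every block has something to receive.
    for n in names:
        for m in neighbours[n]:
            queues[m].put_nowait(f'ghosts from {n}')
    await asyncio.gather(*(block_update(n, queues[n],
                                        [queues[m] for m in neighbours[n]])
                           for n in names))

asyncio.run(main())
```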

    Computational fluid dynamics research at the United Technologies Research Center requiring supercomputers

    An overview of research activities at the United Technologies Research Center (UTRC) in the area of Computational Fluid Dynamics (CFD) is presented. The requirements for and use of various levels of computers, including supercomputers, in these CFD activities are described. Examples of CFD directed toward applications to helicopters, turbomachinery, heat exchangers, and the National Aerospace Plane are included. Helicopter rotor codes for the prediction of rotor and fuselage flow fields and airloads were developed with emphasis on rotor wake modeling. Airflow and airload predictions and comparisons with experimental data are presented. Examples are presented of recent parabolized Navier-Stokes and full Navier-Stokes solutions for hypersonic shock-wave/boundary-layer interaction and hydrogen/air supersonic combustion. In addition, other examples of CFD efforts in turbomachinery Navier-Stokes methodology and separated-flow modeling are presented. A brief discussion of the three-tier scientific computing environment, in which the researcher has access to workstations, mid-size computers, and supercomputers, is also included.

    MolSSI and BioExcel Workflow Workshop 2018 Report

    Workflows in biomolecular science are very important, as they are intricately intertwined with scientific outcomes as well as with algorithmic and methodological innovations. The use and effectiveness of workflow tools to meet the needs of the biomolecular science community is varied. MolSSI co-organized a biomolecular workflows workshop in December 2018 with the goal of identifying specific software gaps and opportunities for improved workflow practices. This report captures the presentations and discussion from that workshop. The workshop participants were primarily tool developers, along with "neutral observers" and some biomolecular domain scientists. After contextualizing and motivating the workshop, the report covers the existing roles of workflow systems and emerging trends in how they are used. A few recurring observations are presented as recommendations for improving the use and effectiveness of workflow tools. The tools presented are discussed in Appendix B. Comment: 13 pages, Workflow Developers Workshop, Report.

    Establishing the Quantum Supremacy Frontier with a 281 Pflop/s Simulation

    Noisy Intermediate-Scale Quantum (NISQ) computers are entering an era in which they can perform computational tasks beyond the capabilities of the most powerful classical computers, thereby achieving "Quantum Supremacy", a major milestone in quantum computing. NISQ Supremacy requires comparison with a state-of-the-art classical simulator. We report HPC simulations of hard random quantum circuits (RQC), which have recently been used as a benchmark for the first experimental demonstration of Quantum Supremacy, sustaining an average performance of 281 Pflop/s (true single precision) on Summit, currently the fastest supercomputer in the world. These simulations were carried out using qFlex, a tensor-network-based classical high-performance simulator of RQCs. Our results show an advantage of many orders of magnitude in energy consumption of NISQ devices over classical supercomputers. In addition, we propose a standard benchmark for NISQ computers based on qFlex. Comment: The paper has been published in Quantum Science and Technology.
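    The tensor-network view behind simulators such as qFlex can be seen in miniature: a single output amplitude of a circuit is the full contraction of a network of small gate tensors. The toy example below (plain numpy, two qubits, a hand-picked circuit; none of it comes from qFlex) computes one amplitude as a single einsum contraction and cross-checks it against the dense matrix of the same circuit.

```python
# Toy illustration (not qFlex) of tensor-network circuit simulation: the amplitude
# <00|C|00> of a 2-qubit circuit C = (sqrt(X) x I) . CZ . (H x H) is obtained by
# contracting the gate tensors directly, without forming the full unitary.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)            # Hadamard
SX = 0.5 * np.array([[1 + 1j, 1 - 1j],                  # sqrt(X)
                     [1 - 1j, 1 + 1j]])
CZ = np.diag([1, 1, 1, -1]).reshape(2, 2, 2, 2)         # 2-qubit gate as a rank-4 tensor
zero = np.array([1.0, 0.0])                             # |0> on each wire
x0, x1 = zero, zero                                     # project onto the output <00|

# Index chains: wire 0 is a -> c -> e -> g, wire 1 is b -> d -> f.
amp = np.einsum('a,b,ca,db,efcd,ge,g,f->', zero, zero, H, H, CZ, SX, x0, x1)
print('tensor-network amplitude <00|C|00> =', amp)

# Cross-check against the dense 4x4 matrix of the same circuit.
U = np.kron(SX, np.eye(2)) @ CZ.reshape(4, 4) @ np.kron(H, H)
print('dense reference                    =', U[0, 0])
```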

    An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel Xeon Phi processor

    Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations are considered that differ in whether key data structures, the density and Fock matrices, are shared or replicated among threads. All implementations are benchmarked on a supercomputer with 3,000 Intel Xeon Phi processors. With 64 cores per processor, scaling numbers are reported for up to 192,000 cores. The hybrid MPI/OpenMP implementation reduces the memory footprint by approximately a factor of 200 compared to the legacy code. The MPI/OpenMP code was shown to run up to six times faster than the original for a range of molecular system sizes. Comment: SC17 conference paper, 12 pages, 7 figures.
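    The shared-versus-replicated trade-off described above can be sketched schematically (this is not the GAMESS code): in the first strategy all threads update one Fock matrix under a lock, in the second each thread accumulates into a private copy that is reduced at the end, trading memory for freedom from contention. Matrix sizes, the "integral batch" stand-in, and all function names below are assumptions.

```python
# Schematic sketch of two threading strategies for accumulating a Fock-like matrix
# from batches of work: (1) one shared matrix guarded by a lock, (2) one private
# copy per thread, reduced at the end.
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

n_basis, n_batches, n_threads = 32, 64, 4
rng = np.random.default_rng(1)
batches = [rng.standard_normal((n_basis, n_basis)) for _ in range(n_batches)]
batches = [0.5 * (b + b.T) for b in batches]            # symmetric "integral" batches

def build_shared():
    fock = np.zeros((n_basis, n_basis))
    lock = Lock()
    def work(batch):
        contrib = batch @ batch                         # stand-in for the real contraction
        with lock:                                      # strategy 1: shared matrix + lock
            np.add(fock, contrib, out=fock)
    with ThreadPoolExecutor(n_threads) as pool:
        list(pool.map(work, batches))
    return fock

def build_replicated():
    chunks = np.array_split(np.arange(n_batches), n_threads)
    copies = [np.zeros((n_basis, n_basis)) for _ in range(n_threads)]
    def work(tid):
        for k in chunks[tid]:                           # strategy 2: each thread owns a
            copies[tid] += batches[k] @ batches[k]      # private copy, no locking needed
    with ThreadPoolExecutor(n_threads) as pool:
        list(pool.map(work, range(n_threads)))
    return sum(copies)                                  # reduce the replicas at the end

print('same result:', np.allclose(build_shared(), build_replicated()))
```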

    Education for Computational Science and Engineering

    Computational science and engineering (CSE) has been misunderstood to advance with the construction of enormous computers. To the contrary, the historical record demonstrates that innovations in CSE come from improvements to the mathematics embodied in computer programs. Whether scientists and engineers become inventors who make these breakthroughs depends on circumstances and the interdisciplinary extent of their educations. The USA currently has the largest CSE professoriate, but the data suggest this prominence is ephemeral. Comment: 9 pages, 2 figures, 1 table.

    Parallel Transport Time-Dependent Density Functional Theory Calculations with Hybrid Functional on Summit

    Real-time time-dependent density functional theory (rt-TDDFT) with hybrid exchange-correlation functionals has wide-ranging applications in chemistry and materials science simulations. However, it can be thousands of times more expensive than a conventional ground-state DFT simulation and is hence limited to small systems. In this paper, we accelerate hybrid-functional rt-TDDFT calculations using the parallel transport gauge formalism and a GPU implementation on Summit. Our implementation scales efficiently to 786 GPUs for a large system of 1,536 silicon atoms, with a wall-clock time of only 1.5 hours per femtosecond of simulation. This unprecedented speed enables rt-TDDFT simulations of systems with more than 1,000 atoms using hybrid functionals.
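    For orientation only, the toy sketch below shows what real-time propagation means in the simplest possible setting: occupied orbitals of a small tight-binding model driven by a weak field are advanced with an exponential midpoint propagator. This is not the parallel transport gauge formalism and not a hybrid-functional Hamiltonian; every model parameter here is invented for illustration.

```python
# Toy real-time electron dynamics: propagate occupied orbitals of a 1D tight-binding
# chain under a weak driving field with psi(t+dt) = exp(-i H(t+dt/2) dt) psi(t).
import numpy as np
from scipy.linalg import expm

n_sites, n_occ, dt, n_steps = 8, 2, 0.02, 200
H0 = -np.eye(n_sites, k=1) - np.eye(n_sites, k=-1)      # 1D tight-binding chain
x = np.diag(np.arange(n_sites, dtype=float))            # position operator

def hamiltonian(t):
    return H0 + 0.1 * np.sin(0.5 * t) * x               # static part + weak driving field

# Start from the lowest n_occ eigenstates of H0.
_, vecs = np.linalg.eigh(H0)
psi = vecs[:, :n_occ].astype(complex)

dipole = []
for step in range(n_steps):
    t_mid = (step + 0.5) * dt
    U = expm(-1j * hamiltonian(t_mid) * dt)              # short-time propagator
    psi = U @ psi
    density = (abs(psi) ** 2).sum(axis=1)                # occupied-orbital density
    dipole.append(density @ np.arange(n_sites))

print('dipole at final time:', dipole[-1])
```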