86 PFLOPS Deep Potential Molecular Dynamics simulation of 100 million atoms with ab initio accuracy
We present the GPU version of DeePMD-kit, which, upon training a deep neural
network model using ab initio data, can drive extremely large-scale molecular
dynamics (MD) simulation with ab initio accuracy. Our tests show that the GPU
version is 7 times faster than the CPU version with the same power consumption.
The code can scale up to the entire Summit supercomputer. For a copper system
of 113,246,208 atoms, the code can perform one nanosecond MD simulation per
day, reaching a peak performance of 86 PFLOPS (43% of the peak). Such
unprecedented ability to perform MD simulation with ab initio accuracy opens up
the possibility of studying many important issues in materials and molecules,
such as heterogeneous catalysis, electrochemical cells, irradiation damage,
crack propagation, and biochemical reactions.
Comment: 29 pages, 11 figures
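As a back-of-the-envelope check on the figures quoted above: 86 PFLOPS at 43% efficiency implies a machine peak of about 200 PFLOPS in the precision used, and one nanosecond per day corresponds to roughly a million timesteps per day at a typical 1 fs MD timestep (the timestep is an assumption, not stated in the abstract):

```python
# Back-of-the-envelope check of the quoted performance figures.
# The 1 fs timestep is an assumption typical of atomistic MD.

sustained_pflops = 86.0
peak_fraction = 0.43
machine_peak_pflops = sustained_pflops / peak_fraction  # ~200 PFLOPS

ns_per_day = 1.0
timestep_fs = 1.0                                # assumed MD timestep
steps_per_day = ns_per_day * 1e6 / timestep_fs   # 1 ns = 1e6 fs

seconds_per_step = 24 * 3600 / steps_per_day     # ~0.086 s per MD step

print(machine_peak_pflops, steps_per_day, seconds_per_step)
```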
The role of graphics super-workstations in a supercomputing environment
A new class of very powerful workstations has recently become available, integrating near-supercomputer computational performance with very powerful, high-quality graphics capability. These graphics super-workstations are expected to play an increasingly important role in providing an enhanced environment for supercomputer users. Their potential uses include: off-loading the supercomputer (by serving as stand-alone processors, by post-processing the output of supercomputer calculations, and by distributed or shared processing); scientific visualization (understanding and communication of results); and real-time interaction with the supercomputer (to steer an iterative computation, to abort a bad run, or to explore and develop new algorithms).
Massive-parallel Implementation of the Resolution-of-Identity Coupled-cluster Approaches in the Numeric Atom-centered Orbital Framework for Molecular Systems
We present a massive-parallel implementation of the resolution-of-identity
(RI) coupled-cluster approach that includes single, double, and perturbative triple
triple excitations, namely RI-CCSD(T), in the FHI-aims package for molecular
systems. A domain-based distributed-memory algorithm in the MPI/OpenMP hybrid
framework has been designed to utilize the memory bandwidth effectively and
significantly reduce interconnect communication, particularly for the
tensor contraction in the evaluation of the particle-particle ladder term. Our
implementation features rigorous avoidance of on-the-fly disk storage and
excellent strong scaling up to 10,000 cores and beyond. Taking a set of
molecules with different sizes, we demonstrate that the parallel performance of
our CCSD(T) code is competitive with the CC implementations in state-of-the-art
high-performance computing (HPC) computational chemistry packages. We also
demonstrate that the numerical error due to the RI approximation in our
RI-CCSD(T) is negligibly small. Together with the correlation-consistent
numeric atom-centered orbital (NAO) basis sets, NAO-VCC-nZ, the method is
applied to produce accurate theoretical reference data for 22 bio-oriented weak
interactions (S22), 11 conformational energies of gaseous cysteine conformers
(CYCONF), and 32 isomerization energies (ISO32).
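The RI approximation referred to above expands four-center two-electron integrals in an auxiliary basis, so that (ij|kl) ≈ Σ_P B^P_ij B^P_kl with a three-index intermediate B; this is what keeps storage and communication manageable. A toy numpy sketch of the factorized contraction (random tensors standing in for real integrals; the dimensions are illustrative assumptions):

```python
import numpy as np

n, naux = 6, 20          # orbital and auxiliary-basis sizes (illustrative)
rng = np.random.default_rng(0)

# Three-index RI intermediate B^P_ij (random stand-in for real integrals).
B = rng.standard_normal((naux, n, n))

# RI-factorized four-index tensor: (ij|kl) ~= sum_P B^P_ij B^P_kl
eri_ri = np.einsum('pij,pkl->ijkl', B, B)

# Storage drops from O(n^4) for the four-index tensor
# to O(naux * n^2) for the three-index intermediate.
print(eri_ri.shape, B.size, n**4)
```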
Computational Cosmology and Astrophysics on Adaptive Meshes using Charm++
Astrophysical and cosmological phenomena involve a large variety of physical
processes, and can encompass an enormous range of scales. To effectively
investigate these phenomena computationally, applications must similarly
support modeling these phenomena on enormous ranges of scales; furthermore,
they must do so efficiently on high-performance computing platforms of
ever-increasing parallelism and complexity. We describe Enzo-P, a Petascale
redesign of the ENZO adaptive mesh refinement astrophysics and cosmology
application, along with Cello, a reusable and scalable adaptive mesh refinement
software framework, on which Enzo-P is based. Cello's scalability is enabled by
the Charm++ Parallel Programming System, whose data-driven asynchronous
execution model is ideal for taking advantage of the available but irregular
parallelism in adaptive mesh refinement-based applications. We present scaling
results on the NSF Blue Waters supercomputer, and outline our future plans to
bring Enzo-P to the Exascale Era by targeting highly-heterogeneous
accelerator-based platforms.
Comment: 5 pages, 6 figures, submitted to SC18 workshop: PAW-AT
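The data-driven asynchronous model mentioned above decomposes the problem into objects ("chares" in Charm++) whose work runs whenever a message for them arrives, rather than in a fixed bulk-synchronous order. A highly simplified, single-process Python sketch of message-driven scheduling (class and method names are illustrative, not the Charm++ API):

```python
from collections import deque

class Chare:
    """Toy stand-in for a Charm++ chare: work runs when a message arrives."""
    def __init__(self, name):
        self.name = name
        self.received = []

    def recv(self, payload, scheduler):
        self.received.append(payload)
        # Handling one message may generate further messages.
        if payload < 2:
            scheduler.send(self, payload + 1)

class Scheduler:
    """Drains a message queue; execution order is data-driven, not fixed."""
    def __init__(self):
        self.queue = deque()

    def send(self, chare, payload):
        self.queue.append((chare, payload))

    def run(self):
        while self.queue:
            chare, payload = self.queue.popleft()
            chare.recv(payload, self)

sched = Scheduler()
block = Chare("amr_block_0")
sched.send(block, 0)
sched.run()
print(block.received)   # [0, 1, 2] -- messages processed in arrival order
```

In a real adaptive-mesh code, the payloads would be ghost-zone or refinement messages and the chares would be mesh blocks that migrate between processors for load balance.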
Computational fluid dynamics research at the United Technologies Research Center requiring supercomputers
An overview of research activities at the United Technologies Research Center (UTRC) in the area of Computational Fluid Dynamics (CFD) is presented. The requirements for and use of various levels of computers, including supercomputers, in these CFD activities are described. Examples of CFD directed toward applications to helicopters, turbomachinery, heat exchangers, and the National Aerospace Plane are included. Helicopter rotor codes for the prediction of rotor and fuselage flow fields and airloads were developed with emphasis on rotor wake modeling. Airflow and airload predictions and comparisons with experimental data are presented. Examples are presented of recent parabolized Navier-Stokes and full Navier-Stokes solutions for hypersonic shock-wave/boundary-layer interaction and hydrogen/air supersonic combustion. In addition, other examples of CFD efforts in turbomachinery Navier-Stokes methodology and separated-flow modeling are presented. A brief discussion of the 3-tier scientific computing environment is also included, in which the researcher has access to workstations, mid-size computers, and supercomputers.
MolSSI and BioExcel Workflow Workshop 2018 Report
Workflows in biomolecular science are very important as they are intricately
intertwined with the scientific outcomes, as well as algorithmic and
methodological innovations. The use and effectiveness of workflow tools in
meeting the needs of the biomolecular science community vary. MolSSI co-organized
a biomolecular workflows workshop in December 2018 with the goal of identifying
specific software gaps and opportunities for improved workflow practices. This
report captures presentations and discussion from that workshop. The workshop
participants were primarily tool developers, along with "neutral observers" and
some biomolecular domain scientists. After contextualizing and motivating the
workshop, the report covers the existing roles and emerging trends in how
workflow systems are utilized. A few recurring observations are presented as
recommendations for improving the use and effectiveness of workflow tools. The
tools presented are discussed in Appendix B.
Comment: 13 pages, Workflow Developers, Workshop, Report
Establishing the Quantum Supremacy Frontier with a 281 Pflop/s Simulation
Noisy Intermediate-Scale Quantum (NISQ) computers are entering an era in
which they can perform computational tasks beyond the capabilities of the most
powerful classical computers, thereby achieving "Quantum Supremacy", a major
milestone in quantum computing. NISQ Supremacy requires comparison with a
state-of-the-art classical simulator. We report HPC simulations of hard random
quantum circuits (RQC), which have been recently used as a benchmark for the
first experimental demonstration of Quantum Supremacy, sustaining an average
performance of 281 Pflop/s (true single precision) on Summit, currently the
fastest supercomputer in the world. These simulations were carried out using
qFlex, a tensor-network-based classical high-performance simulator of RQCs. Our
results show an advantage of many orders of magnitude in energy consumption of
NISQ devices over classical supercomputers. In addition, we propose a standard
benchmark for NISQ computers based on qFlex.
Comment: The paper has been published in Quantum Science and Technology
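Tensor-network simulators like qFlex compute individual output amplitudes of a circuit by contracting a network of gate tensors, rather than storing the full 2^n state vector. A tiny numpy illustration of the contraction idea on two qubits (a Bell-state circuit; this is the general principle, not the qFlex algorithm itself):

```python
import numpy as np

# Gate tensors: Hadamard (2x2) and CNOT reshaped to a rank-4 tensor
# indexed (out1, out2, in1, in2).
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
CNOT = np.eye(4)[[0, 1, 3, 2]].reshape(2, 2, 2, 2)

ket0 = np.array([1.0, 0.0])

# Contract the network: CNOT applied to (H|0>) x |0> gives the Bell state.
bell = np.einsum('abcd,c,d->ab', CNOT, H @ ket0, ket0)

print(bell)   # ~= [[0.707, 0], [0, 0.707]]
```

For large random circuits the simulator's task is choosing a low-cost contraction order for a much bigger network of such tensors; the cost of that ordering, not the state-vector size, sets the memory and flop budget.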
An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel Xeon Phi processor
Modern OpenMP threading techniques are used to convert the MPI-only
Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two
separate implementations are considered that differ in whether key data
structures, the density and Fock matrices, are shared or replicated among threads. All
implementations are benchmarked on a supercomputer of 3,000 Intel Xeon Phi
processors. With 64 cores per processor, scaling numbers are reported on up to
192,000 cores. The hybrid MPI/OpenMP implementation reduces the memory
footprint by approximately 200 times compared to the legacy code. The
MPI/OpenMP code was shown to run up to six times faster than the original for a
range of molecular system sizes.
Comment: SC17 conference paper, 12 pages, 7 figures
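The shared-versus-replicated choice above is the classic tradeoff in threaded Fock builds: replicating the matrix per thread avoids synchronization but multiplies memory by the thread count, while a single shared matrix saves memory at the cost of synchronized updates. A schematic Python sketch of the two accumulation strategies (illustrative only, not GAMESS code):

```python
import threading
import numpy as np

n, nthreads = 8, 4
# Per-thread integral contributions (random stand-ins).
contribs = [np.random.default_rng(t).standard_normal((n, n))
            for t in range(nthreads)]

# Strategy 1: replicated -- each thread owns a private copy, reduced at the end.
replicas = [np.zeros((n, n)) for _ in range(nthreads)]
def work_replicated(t):
    replicas[t] += contribs[t]          # no locking; nthreads x memory

# Strategy 2: shared -- one copy, updates serialized by a lock.
fock_shared = np.zeros((n, n))
lock = threading.Lock()
def work_shared(t):
    global fock_shared
    with lock:                          # synchronization cost; 1x memory
        fock_shared += contribs[t]

for fn in (work_replicated, work_shared):
    threads = [threading.Thread(target=fn, args=(t,)) for t in range(nthreads)]
    for th in threads: th.start()
    for th in threads: th.join()

fock_replicated = sum(replicas)         # final reduction over replicas
print(np.allclose(fock_replicated, fock_shared))  # expect True
```

Both strategies produce the same matrix; the difference the abstract measures is memory footprint and scaling, not the result.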
Education for Computational Science and Engineering
Computational science and engineering (CSE) is often misunderstood as advancing
through the construction of enormous computers. To the contrary, the historical
record demonstrates that innovations in CSE come from improvements to the
mathematics embodied by computer programs. Whether scientists and engineers
become inventors who make these breakthroughs depends on circumstances and the
interdisciplinary extent of their educations. The USA currently has the largest
CSE professorate, but the data suggest this prominence is ephemeral.
Comment: 9 pages, 2 figures, 1 table
Parallel Transport Time-Dependent Density Functional Theory Calculations with Hybrid Functional on Summit
Real-time time-dependent density functional theory (rt-TDDFT) with hybrid
exchange-correlation functional has wide-ranging applications in chemistry and
materials science simulations. However, it can be thousands of times more
expensive than a conventional ground state DFT simulation, hence is limited to
small systems. In this paper, we accelerate hybrid functional rt-TDDFT
calculations using the parallel transport gauge formalism, and the GPU
implementation on Summit. Our implementation can efficiently scale to 786 GPUs
for a large system with 1536 silicon atoms, and the wall clock time is only 1.5
hours per femtosecond. This unprecedented speed enables the simulation of large
systems with more than 1,000 atoms using rt-TDDFT and hybrid functionals.
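To put the quoted rate in perspective: at 1.5 wall-clock hours per femtosecond, a 100 fs trajectory for this 1,536-atom system takes on the order of a week (the trajectory length here is an illustrative assumption). A quick check:

```python
# Wall-clock arithmetic for the quoted rt-TDDFT rate.
hours_per_fs = 1.5
trajectory_fs = 100.0            # illustrative trajectory length

wall_hours = hours_per_fs * trajectory_fs
wall_days = wall_hours / 24.0

print(wall_days)                 # 6.25 days for 100 fs
```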