Pushing Back the Limit of Ab-initio Quantum Transport Simulations on Hybrid Supercomputers
The capabilities of CP2K, a density-functional theory package, and OMEN, a
nano-device simulator, are combined to study transport phenomena from
first-principles in unprecedentedly large nanostructures. Based on the
Hamiltonian and overlap matrices generated by CP2K for a given system, OMEN
solves the Schroedinger equation with open boundary conditions (OBCs) for all
possible electron momenta and energies. To accelerate this core operation, a
robust algorithm called SplitSolve has been developed. It allows the OBCs to be
treated on CPUs and the Schroedinger equation on GPUs simultaneously,
taking advantage of hybrid nodes. Our key achievements on the Cray-XK7 Titan
are (i) a reduction in time-to-solution by more than one order of magnitude as
compared to standard methods, enabling the simulation of structures with more
than 50000 atoms, (ii) a parallel efficiency of 97% when scaling from 756 up to
18564 nodes, and (iii) a sustained performance of 15 DP-PFlop/s.
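
To make the scaling figure concrete, the following is a minimal sketch of how a parallel efficiency relates node counts to speedup. The abstract does not say whether the 97% refers to strong or weak scaling; the sketch assumes the strong-scaling definition (measured speedup divided by the ideal speedup). The function names are hypothetical and no run times from the paper are used.

```python
def strong_scaling_efficiency(nodes_base, time_base, nodes_scaled, time_scaled):
    """Measured speedup divided by ideal speedup (assumed strong-scaling definition)."""
    speedup = time_base / time_scaled
    ideal_speedup = nodes_scaled / nodes_base
    return speedup / ideal_speedup


def implied_speedup(nodes_base, nodes_scaled, efficiency):
    """Speedup implied by a given efficiency under the same assumed definition."""
    return efficiency * nodes_scaled / nodes_base


# Node counts and efficiency quoted in the abstract.
ideal = 18564 / 756
speedup = implied_speedup(756, 18564, 0.97)
print(f"ideal speedup: {ideal:.2f}, implied speedup at 97%: {speedup:.2f}")
```

Under this assumption, going from 756 to 18564 nodes is an ideal speedup of about 24.6, and 97% efficiency corresponds to a speedup of about 23.8.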
COUNTDOWN Slack: a Run-time Library to Reduce Energy Footprint in Large-scale MPI Applications
The power consumption of supercomputers is a major challenge for system
owners, users, and society. It limits the capacity of system installations, it
requires large cooling infrastructures, and it is the cause of a large carbon
footprint. Reducing power during application execution without changing the
application source code or increasing time-to-completion is highly desirable in
real-life high-performance computing scenarios. The power management run-time
frameworks proposed in the last decade are based on the assumption that the
duration of communication and application phases in an MPI application can be
predicted and used at run-time to trade off communication slack against power
consumption. In this manuscript, we first show that this assumption is too
general and leads to mispredictions, slowing down applications, thereby
jeopardizing the claimed benefits. We then propose a new approach based on (i)
the separation of communication phases and slack during MPI calls and (ii) a
timeout algorithm to cope with the hardware power management latency, which
jointly makes it possible to achieve performance-neutral power saving in MPI
applications without requiring labor-intensive and risky application source
code modifications. We validate our approach in a tier-1 production environment
with widely adopted scientific applications. Our approach has a
time-to-completion overhead lower than 1%, while it successfully exploits slack
in communication phases to achieve an average energy saving of 10%. If we focus
on large-scale application runs, the proposed approach achieves 22% energy
saving with an overhead of only 0.4%. With respect to state-of-the-art
approaches, COUNTDOWN Slack is the only one that always leads to an energy saving
with negligible overhead (<3%).
Comment: 13 pages, 4 figures, 3 tables
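
The timeout idea in the abstract above can be illustrated with a small model. This is a sketch under stated assumptions, not the COUNTDOWN Slack implementation: it assumes the core only enters a low-power state once an MPI call has lasted longer than a timeout, so short calls never pay the hardware power-management latency. The function name, the call durations, and the 500-microsecond timeout are all hypothetical. The separation of slack from communication inside a call is not modeled.

```python
from dataclasses import dataclass


@dataclass
class SlackStats:
    low_power_s: float = 0.0  # total time spent in the low-power state
    transitions: int = 0      # number of power-state switches requested


def simulate_timeout_policy(mpi_call_durations_s, timeout_s):
    """Model a timeout policy over a list of MPI call durations (in seconds)."""
    stats = SlackStats()
    for duration in mpi_call_durations_s:
        if duration > timeout_s:
            # The timer expired while still inside the call: drop to low power
            # for the remainder, then restore full power when the call returns.
            stats.low_power_s += duration - timeout_s
            stats.transitions += 2
        # Calls shorter than the timeout finish before the timer fires,
        # so no power-state change is requested for them.
    return stats


# Hypothetical trace: three short calls and two long waits.
calls = [0.0001, 0.0002, 0.050, 0.0003, 0.120]
stats = simulate_timeout_policy(calls, timeout_s=0.0005)
print(stats)
```

In this trace only the two long calls trigger a power-state change, which is the behavior the timeout is meant to provide: frequent short MPI calls are left untouched, while long waits are exploited for energy saving.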