GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems
While many of the architectural details of future exascale-class high
performance computer systems are still a matter of intense research, there
appears to be a general consensus that they will be strongly heterogeneous,
featuring "standard" as well as "accelerated" resources. Today, such resources
are available as multicore processors, graphics processing units (GPUs), and
other accelerators such as the Intel Xeon Phi. Any software infrastructure that
claims usefulness for such environments must be able to meet their inherent
challenges: massive multi-level parallelism, topology, asynchronicity, and
abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a
collection of building blocks that targets algorithms dealing with sparse
matrix representations on current and future large-scale systems. It implements
the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel
numerical kernels, intelligent resource management, and truly heterogeneous
parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We
describe the details of its design with respect to the challenges posed by
modern heterogeneous supercomputers and recent algorithmic developments.
Implementation details which are indispensable for achieving high efficiency
are pointed out and their necessity is justified by performance measurements or
predictions based on performance models. The library code and several
applications are available as open source. We also provide instructions on how
to make use of GHOST in existing software packages, together with a case study
which demonstrates the applicability and performance of GHOST as a component
within a larger software stack.
Comment: 32 pages, 11 figures
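As a rough illustration of the kind of kernel a sparse linear algebra toolkit builds on, the sketch below implements a sparse matrix-vector product in plain Python. It assumes the common compressed sparse row (CSR) layout; the abstract does not state which storage format GHOST uses, and the names `csr_spmv`, `row_ptr`, `col_idx`, and `values` are hypothetical and not part of the GHOST interface.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Return y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i + 1] is the slice of col_idx/values holding row i.
    This is an illustrative sketch, not GHOST code.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Accumulate only the stored (nonzero) entries of row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


# Example matrix (3x3), chosen only for illustration:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

print(csr_spmv(row_ptr, col_idx, values, x))  # [7.0, 6.0, 17.0]
```

The outer loop over rows is independent per row, which is why this kernel is a natural target for the multi-level parallelism the abstract describes.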
From Piz Daint to the Stars: Simulation of Stellar Mergers using High-Level Abstractions
We study the simulation of stellar mergers, which requires complex
simulations with high computational demands. We have developed Octo-Tiger, a
finite volume grid-based hydrodynamics simulation code with Adaptive Mesh
Refinement which is unique in conserving both linear and angular momentum to
machine precision. To face the challenge of increasingly complex, diverse, and
heterogeneous HPC systems, Octo-Tiger relies on high-level programming
abstractions.
We use HPX with its futurization capabilities to ensure scalability both
across and within nodes, and present first results of replacing MPI with
libfabric, achieving up to a 2.8x speedup. We extend Octo-Tiger to heterogeneous
GPU-accelerated supercomputers, demonstrating node-level performance and
portability. We show scalability up to full system runs on Piz Daint. For the
scenario's maximum resolution, the compute-critical parts (hydrodynamics and
gravity) achieve 68.1% parallel efficiency at 2048 nodes.
Comment: Accepted at SC1
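For readers unfamiliar with the metric, the sketch below shows one common way a parallel efficiency figure can be computed. It assumes a strong-scaling definition (speedup over a reference run divided by the increase in node count); the abstract does not say which definition or reference run the authors used, and the function name and the timings in the example are hypothetical, not measurements from the paper.

```python
def parallel_efficiency(t_ref, n_ref, t_n, n):
    """Strong-scaling efficiency of a run on n nodes relative to a
    reference run on n_ref nodes (assumed definition, see lead-in).

    t_ref, t_n: wall-clock times of the reference and scaled runs.
    """
    if min(t_ref, n_ref, t_n, n) <= 0:
        raise ValueError("times and node counts must be positive")
    speedup = t_ref / t_n          # how much faster the larger run is
    ideal_speedup = n / n_ref      # perfect scaling would match this
    return speedup / ideal_speedup


# Hypothetical timings: 100 s on 1 node, 12.5 s on 16 nodes.
print(parallel_efficiency(100.0, 1, 12.5, 16))  # 0.5
```

Under this definition a value of 1.0 means perfect scaling, so a reported efficiency below 1.0 quantifies how much of the added hardware is lost to communication, load imbalance, or serial work.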
Session-Based Programming for Parallel Algorithms: Expressiveness and Performance
This paper investigates session programming and typing of benchmark examples
to compare productivity, safety and performance with other communications
programming languages. Parallel algorithms are used to examine the above
aspects due to their extensive use of message passing for interaction, and
their increasing prominence in algorithmic research with the rising
availability of hardware resources such as multicore machines and clusters. We
contribute new benchmark results for SJ, an extension of Java for type-safe,
binary session programming, against MPJ Express, a Java messaging system based
on the MPI standard. In conclusion, we observe that (1) despite rich libraries
and functionality, MPI remains a low-level API, and can suffer from commonly
perceived disadvantages of explicit message passing such as deadlocks and
unexpected message types, and (2) high-level session abstraction, which
significantly shapes program structure to improve readability and
reliability, together with session type-safety, can greatly facilitate the
task of communications programming whilst retaining competitive performance
- …