Toward Performance-Portable PETSc for GPU-based Exascale Systems
The Portable, Extensible Toolkit for Scientific Computation (PETSc) library
delivers scalable solvers for nonlinear time-dependent differential and
algebraic equations and for numerical optimization. The PETSc design for
performance portability addresses fundamental GPU accelerator challenges and
stresses flexibility and extensibility by separating the programming model used
by the application from that used by the library, and it enables application
developers to use their preferred programming model, such as Kokkos, RAJA,
SYCL, HIP, CUDA, or OpenCL, on upcoming exascale systems. A blueprint for using
GPUs from PETSc-based codes is provided, and case studies emphasize the
flexibility and high performance achieved on current GPU-based systems.Comment: 15 pages, 10 figures, 2 table
Research and Education in Computational Science and Engineering
Over the past two decades the field of computational science and engineering
(CSE) has penetrated both basic and applied research in academia, industry, and
laboratories to advance discovery, optimize systems, support decision-makers,
and educate the scientific and engineering workforce. Informed by centuries of
theory and experiment, CSE performs computational experiments to answer
questions that neither theory nor experiment alone is equipped to answer. CSE
provides scientists and engineers of all persuasions with algorithmic
inventions and software systems that transcend disciplines and scales. Carried
on a wave of digital technology, CSE brings the power of parallelism to bear on
troves of data. Mathematics-based advanced computing has become a prevalent
means of discovery and innovation in essentially all areas of science,
engineering, technology, and society; and the CSE community is at the core of
this transformation. However, a combination of disruptive
developments---including the architectural complexity of extreme-scale
computing, the data revolution that engulfs the planet, and the specialization
required to follow the applications to new frontiers---is redefining the scope
and reach of the CSE endeavor. This report describes the rapid expansion of CSE
and the challenges to sustaining its bold advances. The report also presents
strategies and directions for CSE research and education for the next decade.
Comment: Major revision, to appear in SIAM Review
Improving Energy Saving of One-sided Matrix Decompositions on CPU-GPU Heterogeneous Systems
One-sided dense matrix decompositions (e.g., Cholesky, LU, and QR) are the
key components in scientific computing in many different fields. Although their
design has been highly optimized for modern processors, they still consume a
considerable amount of energy. As CPU-GPU heterogeneous systems are commonly
used for matrix decompositions, in this work, we aim to further improve the
energy saving of one-sided matrix decompositions on CPU-GPU heterogeneous
systems. We first build an Algorithm-Based Fault Tolerance-protected
overclocking technique (ABFT-OC) that enables us to exploit reliable
overclocking for key matrix decomposition operations. Then, we design an
energy-saving matrix decomposition framework, Bi-directional Slack
Reclamation (BSR), which intelligently combines the capabilities provided by
ABFT-OC and DVFS to maximize energy saving while maintaining performance and
reliability. Experiments show that BSR saves up to 11.7% more energy than the
current best energy-saving optimization approach with no performance
degradation, and achieves up to 14.1% Energy*Delay^2 reduction. BSR also
enables a Pareto-efficient performance-energy trade-off, providing up to 1.43x
performance improvement without costing extra energy.
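The slack-reclamation idea at the heart of such schemes can be sketched with a toy model (the wattages, task times, and the cubic power-frequency relation below are illustrative assumptions; the paper's actual BSR framework additionally coordinates ABFT-protected overclocking):

```python
# Toy slack-reclamation model: when the CPU and GPU portions of a
# decomposition step run concurrently, the side off the critical path can be
# slowed via DVFS to finish exactly at the makespan, saving energy.
# Assumed model: power ~ f^3, time ~ 1/f, so stretching a task of length t
# to t_crit scales its energy by (t / t_crit)^2.
def reclaim_slack(t_cpu, t_gpu, p_cpu=100.0, p_gpu=250.0):
    t_crit = max(t_cpu, t_gpu)
    e_before = p_cpu * t_cpu + p_gpu * t_gpu
    if t_cpu < t_gpu:  # CPU side has slack
        e_after = p_cpu * t_cpu * (t_cpu / t_crit) ** 2 + p_gpu * t_gpu
    else:              # GPU side has slack
        e_after = p_cpu * t_cpu + p_gpu * t_gpu * (t_gpu / t_crit) ** 2
    return e_before, e_after, t_crit

e0, e1, t = reclaim_slack(t_cpu=4.0, t_gpu=10.0)
print(f"energy {e0:.0f} J -> {e1:.0f} J at unchanged makespan {t} s")
# energy 2900 J -> 2564 J at unchanged makespan 10.0 s
```

Because the makespan is unchanged, the Energy*Delay^2 metric improves by the same factor as the energy alone in this toy setting.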
FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs
This article describes how today's large-scale scientific applications running on high-performance computing (HPC) systems generate vast data volumes, making data compression a critical technique to mitigate the storage burden and data-movement cost. The authors develop FZ-GPU, a fast and high-ratio error-bounded lossy compressor on GPUs for scientific data.
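The core step shared by error-bounded lossy compressors can be sketched as uniform quantization against a user error bound (a simplification: FZ-GPU's actual pipeline adds prediction and GPU-side bit-level encoding, which this toy omits):

```python
import numpy as np

# Error-bounded uniform quantization: each value maps to an integer code,
# and decoding is guaranteed to stay within +/- eb of the original.
def quantize(data, eb):
    return np.round(data / (2.0 * eb)).astype(np.int64)

def dequantize(codes, eb):
    return codes * (2.0 * eb)

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)
eb = 1e-2                                  # absolute error bound
recon = dequantize(quantize(data, eb), eb)
max_err = float(np.max(np.abs(recon - data)))
print(max_err <= eb)  # True
```

The compression ratio then comes from the integer codes being far more compressible (many repeated small values) than the raw floating-point data.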
Research and Education in Computational Science and Engineering
This report presents challenges, opportunities, and directions for computational science and engineering (CSE) research and education for the next decade. Over the past two decades the field of CSE has penetrated both basic and applied research in academia, industry, and laboratories to advance discovery, optimize systems, support decision-makers, and educate the scientific and engineering workforce. Informed by centuries of theory and experiment, CSE performs computational experiments to answer questions that neither theory nor experiment alone is equipped to answer. CSE provides scientists and engineers with algorithmic inventions and software systems that transcend disciplines and scales. CSE brings the power of parallelism to bear on troves of data. Mathematics-based advanced computing has become a prevalent means of discovery and innovation in essentially all areas of science, engineering, technology, and society, and the CSE community is at the core of this transformation. However, a combination of disruptive developments---including the architectural complexity of extreme-scale computing, the data revolution and increased attention to data-driven discovery, and the specialization required to follow the applications to new frontiers---is redefining the scope and reach of the CSE endeavor. With these many current and expanding opportunities for the CSE field, there is a growing demand for CSE graduates and a need to expand CSE educational offerings. This need includes CSE programs at both the undergraduate and graduate levels, as well as continuing education and professional development programs, exploiting the synergy between computational science and data science. Yet, as institutions consider new and evolving educational programs, it is essential to consider the broader research challenges and opportunities that provide the context for CSE education and workforce development
From Petascale to Exascale: Eight Focus Areas of R&D Challenges for HPC Simulation Environments
Programming models bridge the gap between the underlying hardware architecture and the supporting layers of software available to applications. Programming models are different from both programming languages and application programming interfaces (APIs). Specifically, a programming model is an abstraction of the underlying computer system that allows for the expression of both algorithms and data structures. In comparison, languages and APIs provide implementations of these abstractions and allow the algorithms and data structures to be put into practice - a programming model exists independently of the choice of both the programming language and the supporting APIs. Programming models are typically focused on achieving increased developer productivity, performance, and portability to other system designs. The rapidly changing nature of processor architectures and the complexity of designing an exascale platform provide significant challenges for these goals. Several other factors are likely to impact the design of future programming models. In particular, the representation and management of increasing levels of parallelism, concurrency and memory hierarchies, combined with the ability to maintain a progressive level of interoperability with today's applications are of significant concern. Overall the design of a programming model is inherently tied not only to the underlying hardware architecture, but also to the requirements of applications and libraries including data analysis, visualization, and uncertainty quantification. Furthermore, the successful implementation of a programming model is dependent on exposed features of the runtime software layers and features of the operating system. Successful use of a programming model also requires effective presentation to the software developer within the context of traditional and new software development tools. 
Consideration must also be given to the impact of programming models on both languages and the associated compiler infrastructure. Exascale programming models must reflect several, often competing, design goals. These design goals include desirable features such as abstraction and separation of concerns. However, some aspects are unique to large-scale computing. For example, interoperability and composability with existing implementations will prove critical. In particular, performance is the essential underlying goal for large-scale systems. A key evaluation metric for exascale models will be the extent to which they support these goals rather than merely enable them.
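The distinction drawn above between a programming model (the abstraction) and its implementations can be made concrete with a toy example (purely illustrative, not any specific exascale model): the application is written once against an abstract parallel loop, and interchangeable backends supply the execution strategy.

```python
from concurrent.futures import ThreadPoolExecutor

# The "model": apply f to every index in [0, n). Two implementations follow.
def parallel_for_serial(n, f):
    for i in range(n):
        f(i)

def parallel_for_threaded(n, f):
    with ThreadPoolExecutor() as pool:
        list(pool.map(f, range(n)))

# The application targets the abstract loop, not an implementation...
def saxpy(parallel_for, a, x, y):
    parallel_for(len(x), lambda i: y.__setitem__(i, a * x[i] + y[i]))

# ...and runs unchanged on either backend.
for backend in (parallel_for_serial, parallel_for_threaded):
    y = [1.0, 1.0, 1.0]
    saxpy(backend, 2.0, [1.0, 2.0, 3.0], y)
    print(y)  # [3.0, 5.0, 7.0]
```

Swapping the backend changes performance characteristics without touching the algorithm, which is exactly the independence the section attributes to programming models.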
Remnants of compact binary mergers and next-generation numerical relativity codes
Numerical relativity (NR) simulations are crucial for studying the coalescence of compact binaries. Based on NR data, we produce a model for the mass and spin of the remnant black hole (BH) for the coalescence of black hole-neutron star systems, discussing its crucial role in gravitational wave (GW) modeling and in the parameter estimation of the two signals GW200105 and GW200115. In the context of binary neutron star merger simulations, we perform the first systematic study comparing results obtained with various neutrino treatments, the presence of turbulent viscosity and different grid resolutions. We find that the time of BH formation after merger is heavily affected by grid resolution and turbulent viscosity. An early BH formation limits matter ejection from the accretion disc, as the BH swallows a significant portion of it. Our results indicate that more reliable kilonova light curves are obtained only if the various ejecta components are present. Moreover, robust r-process nucleosynthesis yields require inclusion of both neutrino emission and reabsorption in simulations. Advanced neutrino schemes and turbulent viscosity in simulations resolved beyond current standards appear necessary for reliable astrophysical predictions. To carry out computationally demanding simulations of growing complexity, next-generation NR codes that can efficiently leverage the latest pre-exascale many-core and heterogeneous infrastructures are required. To this end we develop GR-Athena++, a new dynamical spacetime solver built on top of Athena++, that shows high-order convergence properties and excellent parallel scalability up to O(10^5) cores in full 3D binary black hole (BBH) merger simulations. Finally we present GR-AthenaK, the first performance-portable spacetime solver, obtained by refactoring GR-Athena++ with the Kokkos programming model. We demonstrate the correctness and convergence properties of GR-AthenaK with BBH runs on GPUs.
GR-AthenaK shows a speedup of roughly 50x on one GPU compared to GR-Athena++ on a single CPU core.
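Convergence claims like those above are typically verified by estimating the observed order from errors at successive grid resolutions; a minimal sketch (the error values here are invented for illustration, not taken from the paper):

```python
import math

# Observed convergence order from errors at two resolutions differing by a
# refinement factor: p = log(e_coarse / e_fine) / log(refinement).
def observed_order(e_coarse, e_fine, refinement=2.0):
    return math.log(e_coarse / e_fine) / math.log(refinement)

# Halving h cut the (made-up) error by 16x, consistent with 4th order.
p = observed_order(e_coarse=1.6e-4, e_fine=1.0e-5)
print(round(p, 2))  # 4.0
```

Running such a check at several resolution pairs, and seeing p settle near the scheme's nominal order, is the standard evidence of correct high-order convergence.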
Region-Adaptive, Error-Controlled Scientific Data Compression using Multilevel Decomposition
The increase of computer processing speed is significantly outpacing improvements in network and storage bandwidth, leading to the big data challenge in modern science, where scientific applications can quickly generate much more data than can be transferred and stored. As a result, big scientific data must be reduced by a few orders of magnitude while the accuracy of the reduced data needs to be guaranteed for further scientific explorations. Moreover, scientists are often interested in specific spatial/temporal regions in their data, where higher accuracy is required. The locations of the regions requiring high accuracy can sometimes be prescribed based on application knowledge, while other times they must be estimated based on general spatial/temporal variation. In this paper, we develop a novel multilevel approach which allows users to impose region-wise compression error bounds. Our method utilizes the byproduct of a multilevel compressor to detect regions where details are rich, and we provide the theoretical underpinning for region-wise error control. With spatially varying precision preservation, our approach can achieve significantly higher compression ratios than single-error-bounded compression approaches and control errors in the regions of interest. We conduct evaluations on two climate use cases: one targeting small-scale, node features and the other focusing on long, areal features. For both use cases, the locations of the features were unknown ahead of the compression. By selecting approximately 16% of the data based on multi-scale spatial variations and compressing those regions with smaller error tolerances than the rest, our approach improves the accuracy of post-analysis by approximately 2x compared with single-error-bounded compression at the same compression ratio. Using the same error bound for the region of interest, our approach can achieve an increase of more than 50% in overall compression ratio.
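The selection step can be sketched as follows (an assumed simplification of the paper's multilevel detection: here local standard deviation stands in for the multilevel detail coefficients, and the block size, bounds, and fraction are illustrative):

```python
import numpy as np

# Rank fixed-size blocks by local variation and assign the most variable
# ~16% of blocks a tighter error bound than the smooth remainder.
def regionwise_bounds(data, block=64, tight=1e-3, loose=1e-1, frac=0.15625):
    blocks = data.reshape(-1, block)
    variation = blocks.std(axis=1)          # stand-in for multilevel detail
    cutoff = np.quantile(variation, 1.0 - frac)
    return np.where(variation >= cutoff, tight, loose)  # per-block bound

rng = np.random.default_rng(1)
data = np.concatenate([
    rng.normal(0.0, 5.0, 1280),    # feature-rich region (20 blocks)
    rng.normal(0.0, 0.01, 6912),   # smooth background (108 blocks)
])
ebs = regionwise_bounds(data)
print(int((ebs == 1e-3).sum()), int((ebs == 1e-1).sum()))  # 20 108
```

The tighter bound lands exactly on the high-variation blocks, so a downstream error-bounded compressor can spend its bit budget where the features are.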
Automatic Code Generation for Massively Parallel Applications in Computational Fluid Dynamics
Solving partial differential equations (PDEs) is a fundamental challenge in many application domains in industry and academia alike. With increasingly large problems, efficient and highly scalable implementations become more and more crucial. Today, facing this challenge is more difficult than ever due to the increasingly heterogeneous hardware landscape. One promising approach is developing domain-specific languages (DSLs) for a set of applications. Using code generation techniques then allows targeting a range of hardware platforms while concurrently applying domain-specific optimizations in an automated fashion. The present work aims to further the state of the art in this field. As domain, we choose PDE solvers and, in particular, those from the group of geometric multigrid methods. To avoid having a focus too broad, we restrict ourselves to methods working on structured and patch-structured grids.
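A hand-written version of the kind of solver this pipeline generates can be sketched as a geometric two-grid cycle for the 1D Poisson problem on a structured grid (the smoother and transfer operators below are standard illustrative choices, not the framework's generated code):

```python
import numpy as np

# Two-grid cycle for -u'' = f on [0,1] with homogeneous Dirichlet BCs.
def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2*u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def jacobi(u, f, h, nu, omega=2/3):
    # Weighted Jacobi smoothing: u <- u + omega * D^{-1} r, with D = 2/h^2.
    for _ in range(nu):
        u = u + omega * (h**2 / 2) * residual(u, f, h)
    return u

def restrict(r):
    # Full-weighting restriction to every other grid point.
    rc = np.zeros((r.size + 1) // 2)
    rc[1:-1] = 0.25*r[1:-2:2] + 0.5*r[2:-1:2] + 0.25*r[3::2]
    return rc

def two_grid(u, f, h):
    u = jacobi(u, f, h, nu=3)                 # pre-smoothing
    rc = restrict(residual(u, f, h))
    nc, hc = rc.size, 2*h
    # Direct solve of the coarse-grid error equation A_c e_c = r_c.
    A = (np.diag(2*np.ones(nc-2)) - np.diag(np.ones(nc-3), 1)
         - np.diag(np.ones(nc-3), -1)) / hc**2
    ec = np.zeros(nc)
    ec[1:-1] = np.linalg.solve(A, rc[1:-1])
    # Linear-interpolation prolongation of the coarse correction.
    e = np.interp(np.arange(u.size), np.arange(0, u.size, 2), ec)
    return jacobi(u + e, f, h, nu=3)          # post-smoothing

n = 65
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
f = np.pi**2 * np.sin(np.pi * x)              # exact solution: sin(pi*x)
u = np.zeros(n)
for _ in range(10):
    u = two_grid(u, f, h)
print(float(np.max(np.abs(u - np.sin(np.pi * x)))) < 1e-3)  # True
```

Recursing on the coarse solve instead of solving it directly turns this two-grid scheme into the full multigrid V-cycle that such generated solvers use at scale.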
We face the challenge of handling a domain as complex as ours, while providing different abstractions for diverse user groups, by splitting our external DSL ExaSlang into multiple layers, each specifying different aspects of the final application. Layer 1 is designed to resemble LaTeX and allows inputting continuous equations and functions. Their discretization is expressed on layer 2. It is complemented by algorithmic components which can be implemented in a Matlab-like syntax on layer 3. All information provided to this point is summarized on layer 4, enriched with particulars about data structures and the employed parallelization. Additionally, we support automated progression between the different layers. All ExaSlang input is processed by our jointly developed Scala code generation framework to ultimately emit C++ code. We particularly focus on how to generate applications parallelized with, e.g., MPI and OpenMP that are able to run on workstations and large-scale clusters alike.
We showcase the applicability of our approach by implementing simple test problems, like Poisson's equation, as well as relevant applications from the field of computational fluid dynamics (CFD). In particular, we implement scalable solvers for the Stokes, Navier-Stokes and shallow water equations (SWE) discretized using finite differences (FD) and finite volumes (FV). For the case of Navier-Stokes, we also extend our implementation towards non-uniform grids, thereby enabling static mesh refinement, and advanced effects such as the simulated fluid being non-Newtonian and non-isothermal.