Toward Performance-Portable PETSc for GPU-based Exascale Systems
The Portable, Extensible Toolkit for Scientific Computation (PETSc) library
delivers scalable solvers for nonlinear time-dependent differential and
algebraic equations and for numerical optimization. The PETSc design for
performance portability addresses fundamental GPU accelerator challenges and
stresses flexibility and extensibility by separating the programming model used
by the application from that used by the library, and it enables application
developers to use their preferred programming model, such as Kokkos, RAJA,
SYCL, HIP, CUDA, or OpenCL, on upcoming exascale systems. A blueprint for using
GPUs from PETSc-based codes is provided, and case studies emphasize the
flexibility and high performance achieved on current GPU-based systems.Comment: 15 pages, 10 figures, 2 table
Research and Education in Computational Science and Engineering
Over the past two decades the field of computational science and engineering
(CSE) has penetrated both basic and applied research in academia, industry, and
laboratories to advance discovery, optimize systems, support decision-makers,
and educate the scientific and engineering workforce. Informed by centuries of
theory and experiment, CSE performs computational experiments to answer
questions that neither theory nor experiment alone is equipped to answer. CSE
provides scientists and engineers of all persuasions with algorithmic
inventions and software systems that transcend disciplines and scales. Carried
on a wave of digital technology, CSE brings the power of parallelism to bear on
troves of data. Mathematics-based advanced computing has become a prevalent
means of discovery and innovation in essentially all areas of science,
engineering, technology, and society; and the CSE community is at the core of
this transformation. However, a combination of disruptive
developments---including the architectural complexity of extreme-scale
computing, the data revolution that engulfs the planet, and the specialization
required to follow the applications to new frontiers---is redefining the scope
and reach of the CSE endeavor. This report describes the rapid expansion of CSE
and the challenges to sustaining its bold advances. The report also presents
strategies and directions for CSE research and education for the next decade.
Comment: Major revision, to appear in SIAM Review
Improving Energy Saving of One-sided Matrix Decompositions on CPU-GPU Heterogeneous Systems
One-sided dense matrix decompositions (e.g., Cholesky, LU, and QR) are the
key components in scientific computing in many different fields. Although their
design has been highly optimized for modern processors, they still consume a
considerable amount of energy. As CPU-GPU heterogeneous systems are commonly
used for matrix decompositions, in this work, we aim to further improve the
energy saving of one-sided matrix decompositions on CPU-GPU heterogeneous
systems. We first build an Algorithm-Based Fault Tolerance-protected
overclocking technique (ABFT-OC) that enables us to exploit reliable
overclocking for key matrix decomposition operations. Then, we design an
energy-saving matrix decomposition framework, Bi-directional Slack
Reclamation (BSR), which intelligently combines the capabilities provided by
ABFT-OC and DVFS to maximize energy saving while maintaining performance and
reliability. Experiments show that BSR saves up to 11.7% more energy than the
current best energy-saving optimization approach with no performance
degradation, and achieves up to 14.1% Energy*Delay^2 reduction. BSR also
enables a Pareto-efficient performance-energy trade-off, providing up to 1.43x
performance improvement without costing extra energy.
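The slack-reclamation idea at the heart of such schemes can be sketched with a toy model (the wattages, task times, and the cubic power-frequency relation below are illustrative assumptions; the paper's actual BSR framework additionally coordinates ABFT-protected overclocking):

```python
# Toy slack-reclamation model: when the CPU and GPU portions of a
# decomposition step run concurrently, the side off the critical path can be
# slowed via DVFS to finish exactly at the makespan, saving energy.
# Assumed model: power ~ f^3, time ~ 1/f, so stretching a task of length t
# to t_crit scales its energy by (t / t_crit)^2.
def reclaim_slack(t_cpu, t_gpu, p_cpu=100.0, p_gpu=250.0):
    t_crit = max(t_cpu, t_gpu)
    e_before = p_cpu * t_cpu + p_gpu * t_gpu
    if t_cpu < t_gpu:  # CPU side has slack
        e_after = p_cpu * t_cpu * (t_cpu / t_crit) ** 2 + p_gpu * t_gpu
    else:              # GPU side has slack
        e_after = p_cpu * t_cpu + p_gpu * t_gpu * (t_gpu / t_crit) ** 2
    return e_before, e_after, t_crit

e0, e1, t = reclaim_slack(t_cpu=4.0, t_gpu=10.0)
print(f"energy {e0:.0f} J -> {e1:.0f} J at unchanged makespan {t} s")
# energy 2900 J -> 2564 J at unchanged makespan 10.0 s
```

Because the makespan is unchanged, the Energy*Delay^2 metric improves by the same factor as the energy alone in this toy setting.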
FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs
This article describes how today's large-scale scientific applications running on high-performance computing (HPC) systems generate vast data volumes, making data compression a critical technique to mitigate the storage burden and data-movement cost. The authors develop FZ-GPU, a fast and high-ratio error-bounded lossy compressor on GPUs for scientific data.
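The core step shared by error-bounded lossy compressors can be sketched as uniform quantization against a user error bound (a simplification: FZ-GPU's actual pipeline adds prediction and GPU-side bit-level encoding, which this toy omits):

```python
import numpy as np

# Error-bounded uniform quantization: each value maps to an integer code,
# and decoding is guaranteed to stay within +/- eb of the original.
def quantize(data, eb):
    return np.round(data / (2.0 * eb)).astype(np.int64)

def dequantize(codes, eb):
    return codes * (2.0 * eb)

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)
eb = 1e-2                                  # absolute error bound
recon = dequantize(quantize(data, eb), eb)
max_err = float(np.max(np.abs(recon - data)))
print(max_err <= eb)  # True
```

The compression ratio then comes from the integer codes being far more compressible (many repeated small values) than the raw floating-point data.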
Research and Education in Computational Science and Engineering
This report presents challenges, opportunities, and directions for computational science and engineering (CSE) research and education for the next decade. Over the past two decades the field of CSE has penetrated both basic and applied research in academia, industry, and laboratories to advance discovery, optimize systems, support decision-makers, and educate the scientific and engineering workforce. Informed by centuries of theory and experiment, CSE performs computational experiments to answer questions that neither theory nor experiment alone is equipped to answer. CSE provides scientists and engineers with algorithmic inventions and software systems that transcend disciplines and scales. CSE brings the power of parallelism to bear on troves of data. Mathematics-based advanced computing has become a prevalent means of discovery and innovation in essentially all areas of science, engineering, technology, and society, and the CSE community is at the core of this transformation. However, a combination of disruptive developments---including the architectural complexity of extreme-scale computing, the data revolution and increased attention to data-driven discovery, and the specialization required to follow the applications to new frontiers---is redefining the scope and reach of the CSE endeavor. With these many current and expanding opportunities for the CSE field, there is a growing demand for CSE graduates and a need to expand CSE educational offerings. This need includes CSE programs at both the undergraduate and graduate levels, as well as continuing education and professional development programs, exploiting the synergy between computational science and data science. Yet, as institutions consider new and evolving educational programs, it is essential to consider the broader research challenges and opportunities that provide the context for CSE education and workforce development
From Petascale to Exascale: Eight Focus Areas of R&D Challenges for HPC Simulation Environments
Programming models bridge the gap between the underlying hardware architecture and the supporting layers of software available to applications. Programming models are different from both programming languages and application programming interfaces (APIs). Specifically, a programming model is an abstraction of the underlying computer system that allows for the expression of both algorithms and data structures. In comparison, languages and APIs provide implementations of these abstractions and allow the algorithms and data structures to be put into practice - a programming model exists independently of the choice of both the programming language and the supporting APIs. Programming models are typically focused on achieving increased developer productivity, performance, and portability to other system designs. The rapidly changing nature of processor architectures and the complexity of designing an exascale platform provide significant challenges for these goals. Several other factors are likely to impact the design of future programming models. In particular, the representation and management of increasing levels of parallelism, concurrency and memory hierarchies, combined with the ability to maintain a progressive level of interoperability with today's applications are of significant concern. Overall the design of a programming model is inherently tied not only to the underlying hardware architecture, but also to the requirements of applications and libraries including data analysis, visualization, and uncertainty quantification. Furthermore, the successful implementation of a programming model is dependent on exposed features of the runtime software layers and features of the operating system. Successful use of a programming model also requires effective presentation to the software developer within the context of traditional and new software development tools. 
Consideration must also be given to the impact of programming models on both languages and the associated compiler infrastructure. Exascale programming models must reflect several, often competing, design goals. These design goals include desirable features such as abstraction and separation of concerns. However, some aspects are unique to large-scale computing. For example, interoperability and composability with existing implementations will prove critical. In particular, performance is the essential underlying goal for large-scale systems. A key evaluation metric for exascale models will be the extent to which they support these goals rather than merely enable them.
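The distinction drawn above between a programming model (the abstraction) and its implementations can be made concrete with a toy example (purely illustrative, not any specific exascale model): the application is written once against an abstract parallel loop, and interchangeable backends supply the execution strategy.

```python
from concurrent.futures import ThreadPoolExecutor

# The "model": apply f to every index in [0, n). Two implementations follow.
def parallel_for_serial(n, f):
    for i in range(n):
        f(i)

def parallel_for_threaded(n, f):
    with ThreadPoolExecutor() as pool:
        list(pool.map(f, range(n)))

# The application targets the abstract loop, not an implementation...
def saxpy(parallel_for, a, x, y):
    parallel_for(len(x), lambda i: y.__setitem__(i, a * x[i] + y[i]))

# ...and runs unchanged on either backend.
for backend in (parallel_for_serial, parallel_for_threaded):
    y = [1.0, 1.0, 1.0]
    saxpy(backend, 2.0, [1.0, 2.0, 3.0], y)
    print(y)  # [3.0, 5.0, 7.0]
```

Swapping the backend changes performance characteristics without touching the algorithm, which is exactly the independence the section attributes to programming models.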
Remnants of compact binary mergers and next-generation numerical relativity codes
Numerical relativity (NR) simulations are crucial for studying the coalescence of compact binaries. Based on NR data, we produce a model for the mass and spin of the remnant black hole (BH) for the coalescence of black hole-neutron star systems, discussing its crucial role in gravitational wave (GW) modeling and in the parameter estimation of the two signals GW200105 and GW200115. In the context of binary neutron star merger simulations, we perform the first systematic study comparing results obtained with various neutrino treatments, the presence of turbulent viscosity and different grid resolutions. We find that the time of BH formation after merger is heavily affected by grid resolution and turbulent viscosity. An early BH formation limits matter ejection from the accretion disc, as the BH swallows a significant portion of it. Our results indicate that more reliable kilonova light curves are obtained only if the various ejecta components are present. Moreover, robust r-process nucleosynthesis yields require inclusion of both neutrino emission and reabsorption in simulations. Advanced neutrino schemes and turbulent viscosity in simulations resolved beyond current standards appear necessary for reliable astrophysical predictions. To carry out computationally demanding simulations of growing complexity, next-generation NR codes that can efficiently leverage the latest pre-exascale many-core and heterogeneous infrastructures are required. To this end we develop GR-Athena++, a new dynamical spacetime solver built on top of Athena++, that shows high-order convergence properties and excellent parallel scalability up to O(10^5) cores in full 3D binary black hole (BBH) merger simulations. Finally we present GR-AthenaK, the first performance-portable spacetime solver, obtained by refactoring GR-Athena++ with the Kokkos programming model. We demonstrate the correctness and convergence properties of GR-AthenaK with BBH runs on GPUs.
GR-AthenaK shows a speedup of roughly 50x on one GPU compared to GR-Athena++ on a single CPU core.
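Convergence claims like those above are typically verified by estimating the observed order from errors at successive grid resolutions; a minimal sketch (the error values here are invented for illustration, not taken from the paper):

```python
import math

# Observed convergence order from errors at two resolutions differing by a
# refinement factor: p = log(e_coarse / e_fine) / log(refinement).
def observed_order(e_coarse, e_fine, refinement=2.0):
    return math.log(e_coarse / e_fine) / math.log(refinement)

# Halving h cut the (made-up) error by 16x, consistent with 4th order.
p = observed_order(e_coarse=1.6e-4, e_fine=1.0e-5)
print(round(p, 2))  # 4.0
```

Running such a check at several resolution pairs, and seeing p settle near the scheme's nominal order, is the standard evidence of correct high-order convergence.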
Region-Adaptive, Error-Controlled Scientific Data Compression using Multilevel Decomposition
The increase of computer processing speed is significantly outpacing improvements in network and storage bandwidth, leading to the big data challenge in modern science, where scientific applications can quickly generate much more data than can be transferred and stored. As a result, big scientific data must be reduced by a few orders of magnitude while the accuracy of the reduced data needs to be guaranteed for further scientific explorations. Moreover, scientists are often interested in specific spatial/temporal regions in their data, where higher accuracy is required. The locations of the regions requiring high accuracy can sometimes be prescribed based on application knowledge, while other times they must be estimated based on general spatial/temporal variation. In this paper, we develop a novel multilevel approach which allows users to impose region-wise compression error bounds. Our method utilizes the byproduct of a multilevel compressor to detect regions where details are rich, and we provide the theoretical underpinning for region-wise error control. With spatially varying precision preservation, our approach can achieve significantly higher compression ratios than single-error-bounded compression approaches and control errors in the regions of interest. We conduct evaluations on two climate use cases: one targeting small-scale, node features and the other focusing on long, areal features. For both use cases, the locations of the features were unknown ahead of the compression. By selecting approximately 16% of the data based on multi-scale spatial variations and compressing those regions with smaller error tolerances than the rest, our approach improves the accuracy of post-analysis by approximately 2x compared with single-error-bounded compression at the same compression ratio. Using the same error bound for the region of interest, our approach can achieve an increase of more than 50% in overall compression ratio.
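The selection step can be sketched as follows (an assumed simplification of the paper's multilevel detection: here local standard deviation stands in for the multilevel detail coefficients, and the block size, bounds, and fraction are illustrative):

```python
import numpy as np

# Rank fixed-size blocks by local variation and assign the most variable
# ~16% of blocks a tighter error bound than the smooth remainder.
def regionwise_bounds(data, block=64, tight=1e-3, loose=1e-1, frac=0.15625):
    blocks = data.reshape(-1, block)
    variation = blocks.std(axis=1)          # stand-in for multilevel detail
    cutoff = np.quantile(variation, 1.0 - frac)
    return np.where(variation >= cutoff, tight, loose)  # per-block bound

rng = np.random.default_rng(1)
data = np.concatenate([
    rng.normal(0.0, 5.0, 1280),    # feature-rich region (20 blocks)
    rng.normal(0.0, 0.01, 6912),   # smooth background (108 blocks)
])
ebs = regionwise_bounds(data)
print(int((ebs == 1e-3).sum()), int((ebs == 1e-1).sum()))  # 20 108
```

The tighter bound lands exactly on the high-variation blocks, so a downstream error-bounded compressor can spend its bit budget where the features are.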
Automatic Code Generation for Massively Parallel Applications in Computational Fluid Dynamics
Solving partial differential equations (PDEs) is a fundamental challenge in many application domains in industry and academia alike. With increasingly large problems, efficient and highly scalable implementations become more and more crucial. Today, facing this challenge is more difficult than ever due to the increasingly heterogeneous hardware landscape. One promising approach is developing domain-specific languages (DSLs) for a set of applications. Using code generation techniques then allows targeting a range of hardware platforms while concurrently applying domain-specific optimizations in an automated fashion. The present work aims to further the state of the art in this field. As domain, we choose PDE solvers and, in particular, those from the group of geometric multigrid methods. To avoid having a focus too broad, we restrict ourselves to methods working on structured and patch-structured grids.
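A hand-written version of the kind of solver this pipeline generates can be sketched as a geometric two-grid cycle for the 1D Poisson problem on a structured grid (the smoother and transfer operators below are standard illustrative choices, not the framework's generated code):

```python
import numpy as np

# Two-grid cycle for -u'' = f on [0,1] with homogeneous Dirichlet BCs.
def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2*u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def jacobi(u, f, h, nu, omega=2/3):
    # Weighted Jacobi smoothing: u <- u + omega * D^{-1} r, with D = 2/h^2.
    for _ in range(nu):
        u = u + omega * (h**2 / 2) * residual(u, f, h)
    return u

def restrict(r):
    # Full-weighting restriction to every other grid point.
    rc = np.zeros((r.size + 1) // 2)
    rc[1:-1] = 0.25*r[1:-2:2] + 0.5*r[2:-1:2] + 0.25*r[3::2]
    return rc

def two_grid(u, f, h):
    u = jacobi(u, f, h, nu=3)                 # pre-smoothing
    rc = restrict(residual(u, f, h))
    nc, hc = rc.size, 2*h
    # Direct solve of the coarse-grid error equation A_c e_c = r_c.
    A = (np.diag(2*np.ones(nc-2)) - np.diag(np.ones(nc-3), 1)
         - np.diag(np.ones(nc-3), -1)) / hc**2
    ec = np.zeros(nc)
    ec[1:-1] = np.linalg.solve(A, rc[1:-1])
    # Linear-interpolation prolongation of the coarse correction.
    e = np.interp(np.arange(u.size), np.arange(0, u.size, 2), ec)
    return jacobi(u + e, f, h, nu=3)          # post-smoothing

n = 65
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
f = np.pi**2 * np.sin(np.pi * x)              # exact solution: sin(pi*x)
u = np.zeros(n)
for _ in range(10):
    u = two_grid(u, f, h)
print(float(np.max(np.abs(u - np.sin(np.pi * x)))) < 1e-3)  # True
```

Recursing on the coarse solve instead of solving it directly turns this two-grid scheme into the full multigrid V-cycle that such generated solvers use at scale.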
We face the challenge of handling a domain as complex as ours, while providing different abstractions for diverse user groups, by splitting our external DSL ExaSlang into multiple layers, each specifying different aspects of the final application. Layer 1 is designed to resemble LaTeX and allows inputting continuous equations and functions. Their discretization is expressed on layer 2. It is complemented by algorithmic components which can be implemented in a Matlab-like syntax on layer 3. All information provided to this point is summarized on layer 4, enriched with particulars about data structures and the employed parallelization. Additionally, we support automated progression between the different layers. All ExaSlang input is processed by our jointly developed Scala code generation framework to ultimately emit C++ code. We particularly focus on how to generate applications parallelized with, e.g., MPI and OpenMP that are able to run on workstations and large-scale clusters alike.
We showcase the applicability of our approach by implementing simple test problems, like Poisson's equation, as well as relevant applications from the field of computational fluid dynamics (CFD). In particular, we implement scalable solvers for the Stokes, Navier-Stokes and shallow water equations (SWE) discretized using finite differences (FD) and finite volumes (FV). For the case of Navier-Stokes, we also extend our implementation towards non-uniform grids, thereby enabling static mesh refinement, and advanced effects such as the simulated fluid being non-Newtonian and non-isothermal.