Search CORE

1,057 research outputs found

Scalable Applications on Heterogeneous System Architectures: A Systematic Performance Analysis Framework

Author: Dietrich Robert
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/11/2019
Field of study

The efficient parallel execution of scientific applications is a key challenge in high-performance computing (HPC). With growing parallelism and heterogeneity of compute resources as well as increasingly complex software, performance analysis has become an indispensable tool in the development and optimization of parallel programs. This thesis presents a framework for systematic performance analysis of scalable, heterogeneous applications. Based on event traces, it automatically detects the critical path and inefficiencies that result in waiting or idle time, e.g. due to load imbalances between parallel execution streams. As a prerequisite for the analysis of heterogeneous programs, this thesis specifies inefficiency patterns for computation offloading. Furthermore, an essential contribution was made to the development of tool interfaces for OpenACC and OpenMP, which enable a portable data acquisition and a subsequent analysis for programs with offload directives. At present, these interfaces are already part of the latest OpenACC and OpenMP API specification. The aforementioned work, existing preliminary work, and established analysis methods are combined into a generic analysis process, which can be applied across programming models. Based on the detection of wait or idle states, which can propagate over several levels of parallelism, the analysis identifies wasted computing resources and their root cause as well as the critical-path share for each program region. Thus, it determines the influence of program regions on the load balancing between execution streams and the program runtime. The analysis results include a summary of the detected inefficiency patterns and a program trace, enhanced with information about wait states, their cause, and the critical path. In addition, a ranking, based on the amount of waiting time a program region caused on the critical path, highlights program regions that are relevant for program optimization. The scalability of the proposed performance analysis and its implementation is demonstrated using High-Performance Linpack (HPL), while the analysis results are validated with synthetic programs. A scientific application that uses MPI, OpenMP, and CUDA simultaneously is investigated in order to show the applicability of the analysis

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Recommended from our members

Preparing sparse solvers for exascale computing.

Author: Anzt Hartwig
Boman Erik
Curfman McInnes Lois
Falgout Rob
Ghysels Pieter
Heroux Michael
Li Xiaoye
Meier Yang Ulrike
Rajamanickam Sivasankaran
Rupp Karl
Smith Barry
Tran Mills Richard
Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020
Field of study

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

eScholarship - University of California

Towards a Mini-App for Smoothed Particle Hydrodynamics at Exascale

Author: Cabezón Rubén M.
Cavelan Aurélien
Ciorba Florina M.
Guerrera Danilo
Imbert David
Mayer Lucio
Piccinali Jean-Guillaume
Reed Darren
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian method, used in numerical simulations of fluids in astrophysics and computational fluid dynamics, among many other fields. SPH simulations with detailed physics represent computationally-demanding calculations. The parallelization of SPH codes is not trivial due to the absence of a structured grid. Additionally, the performance of the SPH codes can be, in general, adversely impacted by several factors, such as multiple time-stepping, long-range interactions, and/or boundary conditions. This work presents insights into the current performance and functionalities of three SPH codes: SPHYNX, ChaNGa, and SPH-flow. These codes are the starting point of an interdisciplinary co-design project, SPH-EXA, for the development of an Exascale-ready SPH mini-app. To gain such insights, a rotating square patch test was implemented as a common test simulation for the three SPH codes and analyzed on two modern HPC systems. Furthermore, to stress the differences with the codes stemming from the astrophysics community (SPHYNX and ChaNGa), an additional test case, the Evrard collapse, has also been carried out. This work extrapolates the common basic SPH features in the three codes for the purpose of consolidating them into a pure-SPH, Exascale-ready, optimized, mini-app. Moreover, the outcome of this serves as direct feedback to the parent codes, to improve their performance and overall scalability.Comment: 18 pages, 4 figures, 5 tables, 2018 IEEE International Conference on Cluster Computing proceedings for WRAp1

arXiv.org e-Print Archive

Crossref

edoc

ZORA

Tem_357 Harnessing the Power of Digital Transformation, Artificial Intelligence and Big Data Analytics with Parallel Computing

Author: Bong Chin Wei
Hoe Min
Jamal Ahmad Dargham
Publication venue
Publication date: 01/01/2020
Field of study

Traditionally, 2D and especially 3D forward modeling and inversion of large geophysical datasets are performed on supercomputing clusters. This was due to the fact computing time taken by using PC was too time consuming. With the introduction of parallel computing, attempts have been made to perform computationally intensive tasks on PC or clusters of personal computers where the computing power was based on Central Processing Unit (CPU). It is further enhanced with Graphical Processing Unit (GPU) as the GPU has become affordable with the launch of GPU based computing devices. Therefore this paper presents a didactic concept in learning and applying parallel computing with the use of General Purpose Graphical Processing Unit (GPGPU) was carried out and perform preliminary testing in migrating existing sequential codes for solving initially 2D forward modeling of geophysical dataset. There are many challenges in performing these tasks mainly due to lack of some necessary development software tools, but the preliminary findings are promising. Traditionally, 2D and especially 3D forward modeling and inversion of large geophysical datasets are performed on supercomputing clusters. This was due to the fact computing time taken by using PC was too time consuming. With the introduction of parallel computing, attempts have been made to perform computationally intensive tasks on PC or clusters of personal computers where the computing power was based on Central Processing Unit (CPU). It is further enhanced with Graphical Processing Unit (GPU) as the GPU has become affordable with the launch of GPU based computing devices. Therefore this paper presents a didactic concept in learning and applying parallel computing with the use of General Purpose Graphical Processing Unit (GPGPU) was carried out and perform preliminary testing in migrating existing sequential codes for solving initially 2D forward modeling of geophysical dataset. There are many challenges in performing these tasks mainly due to lack of some necessary development software tools, but the preliminary findings are promising.Traditionally, 2D and especially 3D forward modeling and inversion of large geophysical datasets are performed on supercomputing clusters. This was due to the fact computing time taken by using PC was too time consuming. With the introduction of parallel computing, attempts have been made to perform computationally intensive tasks on PC or clusters of personal computers where the computing power was based on Central Processing Unit (CPU). It is further enhanced with Graphical Processing Unit (GPU) as the GPU has become affordable with the launch of GPU based computing devices. Therefore this paper presents a didactic concept in learning and applying parallel computing with the use of General Purpose Graphical Processing Unit (GPGPU) was carried out and perform preliminary testing in migrating existing sequential codes for solving initially 2D forward modeling of geophysical dataset. There are many challenges in performing these tasks mainly due to lack of some necessary development software tools, but the preliminary findings are promising

UMS Institutional Repository

Mixing multi-core CPUs and GPUs for scientific simulation software

Author: Hawick K.A.
Leist A.
Playne D.P.
Publication venue: 'Massey University'
Publication date: 01/01/2010
Field of study

Recent technological and economic developments have led to widespread availability of multi-core CPUs and specialist accelerator processors such as graphical processing units (GPUs). The accelerated computational performance possible from these devices can be very high for some applications paradigms. Software languages and systems such as NVIDIA's CUDA and Khronos consortium's open compute language (OpenCL) support a number of individual parallel application programming paradigms. To scale up the performance of some complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and very many core GPUs for data parallelism is necessary. We describe our use of hybrid applica- tions using threading approaches and multi-core CPUs to control independent GPU devices. We present speed-up data and discuss multi-threading software issues for the applications level programmer and o er some suggested areas for language development and integration between coarse-grained and ne-grained multi-thread systems. We discuss results from three common simulation algorithmic areas including: partial di erential equations; graph cluster metric calculations and random number generation. We report on programming experiences and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs; a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and trends in multi-core programming for scienti c applications developers

Massey Research Online

Exascale machines require new programming paradigms and runtimes

Author: Astsatryan Hrachya
Da Costa Georges
Fahringer Thomas
Grasso Ivan
Hristov Atanas
Karatza Helen D.
Lastovetsky Alexey
Marozzo Fabrizio
Petcu Dana
Rico-Gallego Juan-Antonio
Stavrinides Georgios L.
Talia Domenico
Trufio Paolo
Publication venue
Publication date: 01/01/2015
Field of study

Extreme scale parallel computing systems will have tens of thousands of optionally accelerator-equiped nodes with hundreds of cores each, as well as deep memory hierarchies and complex interconnect topologies. Such Exascale systems will provide hardware parallelism at multiple levels and will be energy constrained. Their extreme scale and the rapidly deteriorating reliablity of their hardware components means that Exascale systems will exhibit low mean-time-between-failure values. Furthermore, existing programming models already require heroic programming and optimisation efforts to achieve high efficiency on current supercomputers. Invariably, these efforts are platform-specific and non-portable. In this paper we will explore the shortcomings of existing programming models and runtime systems for large scale computing systems. We then propose and discuss important features of programming paradigms and runtime system to deal with large scale computing systems with a special focus on data-intensive applications and resilience. Finally, we also discuss code sustainability issues and propose several software metrics that are of paramount importance for code development for large scale computing systems

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Open Access Repository