4 research outputs found
Experimental Verification and Analysis of Dynamic Loop Scheduling in Scientific Applications
Scientific applications are often irregular and characterized by large computationally-intensive parallel loops. Dynamic loop scheduling (DLS) techniques improve the performance of computationally-intensive scientific applications via load balancing of their execution on high-performance computing (HPC) systems. Identifying the most suitable choices of data distribution strategies, system sizes, and DLS techniques that improve the performance of a given application requires intensive assessment and a large number of exploratory native experiments (using real applications on real systems), which may not always be feasible or practical due to the associated time and costs. In such cases, simulative experiments are more appropriate for studying the performance of applications. This motivates the question of ‘How realistic are the simulations of executions of scientific applications using DLS on HPC platforms?’ In the present work, a methodology is devised to answer this question. It involves the experimental verification and analysis of the performance of DLS in scientific applications. The proposed methodology is employed for a computer vision application executing using four DLS techniques on two different HPC platforms, both via native and simulative experiments. The evaluation and analysis of the native and simulative results indicate that the accuracy of the simulative experiments is strongly influenced by the approach used to extract the computational effort of the application (FLOP- or time-based), the choice of application model representation into simulation (data or task parallel), and the available HPC subsystem models in the simulator (multi-core CPUs, memory hierarchy, and network topology). The minimum and the maximum percent errors achieved between the native and the simulative experiments are 0.95% and 8.03%, respectively.
An Approach for Realistically Simulating the Performance of Scientific Applications on High Performance Computing Systems
Scientific applications often contain large, computationally-intensive, and
irregular parallel loops or tasks that exhibit stochastic characteristics.
Applications may suffer from load imbalance during their execution on
high-performance computing (HPC) systems due to such characteristics. Dynamic
loop self-scheduling (DLS) techniques are instrumental in improving the
performance of scientific applications on HPC systems via load balancing.
Selecting a DLS technique that results in the best performance for different
problems and system sizes requires a large number of exploratory experiments. A
theoretical model that can be used to predict the scheduling technique that
yields the best performance for a given problem and system has not yet been
identified. Therefore, simulation is the most appropriate approach for
conducting such exploratory experiments with reasonable costs. This work
devises an approach to realistically simulate computationally-intensive
scientific applications that employ DLS and execute on HPC systems. Several
approaches to represent the application tasks (or loop iterations) are compared
to establish their influence on the simulative application performance. A novel
simulation strategy is introduced, which transforms a native application code
into a simulative code. The native and simulative performance of two
computationally-intensive scientific applications are compared to evaluate the
realism of the proposed simulation approach. The comparison of the performance
characteristics extracted from the native and simulative performance shows that
the proposed simulation approach fully captured most of the performance
characteristics of interest. This work shows and establishes the importance of
simulations that realistically predict the performance of DLS techniques for
different applications and system configurations.
Experimental Verification and Analysis of Dynamic Loop Scheduling in Scientific Applications
Scientific applications are often irregular and characterized by large
computationally-intensive parallel loops. Dynamic loop scheduling (DLS)
techniques improve the performance of computationally-intensive scientific
applications via load balancing of their execution on high-performance
computing (HPC) systems. Identifying the most suitable choices of data
distribution strategies, system sizes, and DLS techniques that improve the
performance of a given application requires intensive assessment and a large
number of exploratory native experiments (using real applications on real
systems), which may not always be feasible or practical due to associated time
and costs. In such cases, simulative experiments are more appropriate for
studying the performance of applications. This motivates the question of ‘How
realistic are the simulations of executions of scientific applications using
DLS on HPC platforms?’ In the present work, a methodology is devised to answer
this question. It involves the experimental verification and analysis of the
performance of DLS in scientific applications. The proposed methodology is
employed for a computer vision application executing using four DLS techniques
on two different HPC platforms, both via native and simulative experiments.
The evaluation and analysis of the native and simulative results indicate that
the accuracy of the simulative experiments is strongly influenced by the
approach used to extract the computational effort of the application (FLOP- or
time-based), the choice of application model representation into simulation
(data or task parallel), and the available HPC subsystem models in the
simulator (multi-core CPUs, memory hierarchy, and network topology). The
minimum and the maximum percent errors achieved between the native and the
simulative experiments are 0.95% and 8.03%, respectively.
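As an illustration of the percent-error figures quoted in this abstract, the following is a minimal sketch of how such an error could be computed. The abstract does not state the exact formula, so the definition used here (absolute difference relative to the native time) is an assumption, and the function name and the timings are hypothetical.

```python
def percent_error(native_time: float, simulated_time: float) -> float:
    """Absolute percent error of a simulated time relative to the native time.

    Assumption: the error is normalized by the native measurement; the
    abstract does not spell out which definition the authors used.
    """
    if native_time <= 0:
        raise ValueError("native_time must be positive")
    return abs(simulated_time - native_time) / native_time * 100.0


# Hypothetical execution times in seconds, for illustration only.
native = 120.0
simulated = 123.0
print(f"{percent_error(native, simulated):.2f}%")
```

Under this definition, an error below 1% means the simulated execution time differs from the native one by less than one hundredth of the native time.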
An Approach for Realistically Simulating the Performance of Scientific Applications on High Performance Computing Systems
Scientific applications often contain large, computationally-intensive, and irregular parallel loops or tasks that exhibit stochastic behavior leading to load imbalance. Load imbalance often manifests during the execution of parallel scientific applications on large and complex high-performance computing (HPC) systems. The extreme scale of HPC systems on the road to Exascale computing only exacerbates the loss in performance due to load imbalance. Dynamic loop self-scheduling (DLS) techniques are instrumental in improving the performance of scientific applications on HPC systems via load balancing. Selecting a DLS technique that results in the best performance for different problems and system sizes requires a large number of exploratory experiments. A theoretical model that can be used to predict the scheduling technique that yields the best performance for a given problem and system has not yet been identified. Therefore, simulation is the most appropriate approach for conducting such exploratory experiments in a reasonable amount of time. However, conducting realistic and trustworthy simulations of application performance under different configurations is challenging. This work devises an approach to realistically simulate computationally-intensive scientific applications that employ DLS and execute on HPC systems. The proposed approach minimizes the sources of uncertainty in the results of the simulative experiments by bridging the native and simulative experimental approaches. A new method is proposed to capture the variation of application performance between different native executions. Several approaches to represent the application tasks (or loop iterations) are compared to establish their influence on the simulative application performance. A novel simulation strategy is introduced that applies the proposed approach, which transforms a native application code into simulative code.
The native and simulative performance of two computationally-intensive scientific applications that employ eight task scheduling techniques (static, nonadaptive dynamic, and adaptive dynamic) are compared to evaluate the realism of the proposed simulation approach. The comparison of the performance characteristics extracted from the native and simulative performance shows that the proposed simulation approach fully captured most of the performance characteristics of interest. This work shows and establishes the importance of simulations that realistically predict the performance of DLS techniques for different applications and system configurations.
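To make the distinction between static and dynamic scheduling concrete, the sketch below computes the chunk sizes produced by three textbook techniques: static chunking, pure self-scheduling, and guided self-scheduling. The abstract does not list the eight techniques that were evaluated, so these three are chosen only as well-known examples; the function name and the technique labels are hypothetical.

```python
import math


def chunk_sizes(n_iterations: int, n_workers: int, technique: str) -> list[int]:
    """Return the sequence of chunk sizes handed out for a parallel loop.

    technique (labels are illustrative, not taken from the abstract):
      "static" - equal chunks of ceil(N / P) iterations
      "ss"     - self-scheduling, one iteration per request
      "gss"    - guided self-scheduling, ceil(remaining / P) per request
    """
    if n_iterations < 0 or n_workers <= 0:
        raise ValueError("need n_iterations >= 0 and n_workers > 0")
    remaining = n_iterations
    chunks = []
    while remaining > 0:
        if technique == "static":
            size = math.ceil(n_iterations / n_workers)
        elif technique == "ss":
            size = 1
        elif technique == "gss":
            size = math.ceil(remaining / n_workers)
        else:
            raise ValueError(f"unknown technique: {technique}")
        size = min(size, remaining)  # never hand out more than is left
        chunks.append(size)
        remaining -= size
    return chunks


# 100 loop iterations on 4 workers.
print(chunk_sizes(100, 4, "static"))
print(chunk_sizes(100, 4, "gss"))
```

Large early chunks keep scheduling overhead low, while the shrinking chunks of guided self-scheduling leave small pieces of work for the end of the loop, which helps even out load imbalance between workers.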