15 research outputs found

    Performance Reproduction and Prediction of Selected Dynamic Loop Scheduling Experiments

    Full text link
    Scientific applications are complex, large, and often exhibit irregular and stochastic behavior. The use of efficient loop scheduling techniques in computationally-intensive applications is crucial for improving their performance on high-performance computing (HPC) platforms. A number of dynamic loop scheduling (DLS) techniques have been proposed between the late 1980s and early 2000s, and efficiently used in scientific applications. In most cases, the computing systems on which they have been tested and validated are no longer available. This work is concerned with the minimization of the sources of uncertainty in the implementation of DLS techniques to avoid unnecessary influences on the performance of scientific applications. Therefore, it is important to ensure that the DLS techniques employed in scientific applications today adhere to their original design goals and specifications. The goal of this work is to attain and increase the trust in the implementation of DLS techniques in present studies. To achieve this goal, the performance of a selection of scheduling experiments from the 1992 original work that introduced factoring is reproduced and predicted via both, simulative and native experimentation. The experiments show that the simulation reproduces the performance achieved on the past computing platform and accurately predicts the performance achieved on the present computing platform. The performance reproduction and prediction confirm that the present implementation of the DLS techniques considered both, in simulation and natively, adheres to their original description. The results confirm the hypothesis that reproducing experiments of identical scheduling scenarios on past and modern hardware leads to an entirely different behavior from expected

    Examining the Reproducibility of Using Dynamic Loop Scheduling Techniques in Scientific Applications

    Get PDF
    Reproducibility of the execution of scientific applications on parallel and distributed systems is a growing concern, underlying the trustworthiness of the experiments and the conclusions derived from experiments. Dynamic loop scheduling (DLS) techniques are an effective approach towards performance improvement of scientific applications via load balancing. These techniques address algorithmic and systemic sources of load imbalance by dynamically assigning tasks to processing elements. The DLS techniques have demonstrated their effectiveness when applied in real applications. Complementing native experiments, simulation is a powerful tool for studying the behavior of parallel and distributed applications. In earlier work, the scalability [1], robustness [2], and resilience [3] of the DLS techniques were investigated using the MSG interface of the SimGrid simulation framework [4]. The present work complements the earlier work and concentrates on the verification via reproducibility of the implementation of the DLS techniques in SimGrid-MSG. This work describes the challenges of verifying the performance of using DLS techniques in earlier implementations of scientific applications. The verification is performed via reproducibility of simulations based on SimGrid-MSG. To simulate experiments selected from earlier literature, the reproduction process begins by extracting the information needed from the earlier literature and converting it into the input required by SimGrid-MSG. The reproducibility study is carried out by comparing the performance of SimGrid-MSG-based experiments with those reported in two selected publications in which the DLS techniques were originally proposed. While the reproduction was not successful for experiments from one of the selected publications, it was successful for experiments from the other. This successful reproduction implies the verification of the DLS implementation in SimGrid-MSG for the considered applications and systems, and thus, it allows well-founded future research on the DLS techniques

    SiL: An Approach for Adjusting Applications to Heterogeneous Systems Under Perturbations

    Full text link
    Scientific applications consist of large and computationally-intensive loops. Dynamic loop scheduling (DLS) techniques are used to load balance the execution of such applications. Load imbalance can be caused by variations in loop iteration execution times due to problem, algorithmic, or systemic characteristics (also, perturbations). The following question motivates this work: "Given an application, a high-performance computing (HPC) system, and both their characteristics and interplay, which DLS technique will achieve improved performance under unpredictable perturbations?" Existing work only considers perturbations caused by variations in the HPC system delivered computational speeds. However, perturbations in available network bandwidth or latency are inevitable on production HPC systems. Simulator in the loop (SiL) is introduced, herein, as a new control-theoretic inspired approach to dynamically select DLS techniques that improve the performance of applications on heterogeneous HPC systems under perturbations. The present work examines the performance of six applications on a heterogeneous system under all above system perturbations. The SiL proof of concept is evaluated using simulation. The performance results confirm the initial hypothesis that no single DLS technique can deliver best performance in all scenarios, while the SiL-based DLS selection delivered improved application performance in most experiments

    Experimental Verification and Analysis of Dynamic Loop Scheduling in Scientific Applications

    Get PDF
    Scientific applications are often irregular and characterized by large computationally-intensive parallel loops. Dynamic loop scheduling (DLS) techniques improve the performance of computationally-intensive scientific applications via load balancing of their execution on high-performance computing (HPC) systems. Identifying the most suitable choices of data distribution strategies, system sizes, and DLS techniques which improve the performance of a given application, requires intensive assessment and a large number of exploratory native experiments (using real applications on real systems), which may not always be feasible or practical due to associated time and costs. In such cases, simulative experiments are more appropriate for studying the performance of applications. This motivates the question of How realistic are the simulations of executions of scientific applications using DLS on HPC platforms? In the present work, a methodology is devised to answer this question. It involves the experimental verification and analysis of the performance of DLS in scientific applications. The proposed methodology is employed for a computer vision application executing using four DLS techniques on two different HPC plat- forms, both via native and simulative experiments. The evaluation and analysis of the native and simulative results indicate that the accuracy of the simulative experiments is strongly influenced by the approach used to extract the computational effort of the application (FLOP- or time-based), the choice of application model representation into simulation (data or task parallel), and the available HPC subsystem models in the simulator (multi-core CPUs, memory hierarchy, and network topology). The minimum and the maximum percent errors achieved between the native and the simulative experiments are 0.95% and 8.03%, respectively

    Experimental Verification and Analysis of Dynamic Loop Scheduling in Scientific Applications

    Get PDF
    Scientific applications are often irregular and characterized by large computationally-intensive parallel loops. Dynamic loop scheduling (DLS) techniques improve the performance of computationally-intensive scientific applications via load balancing of their execution on high-performance computing (HPC) systems. Identifying the most suitable choices of data distribution strategies, system sizes, and DLS techniques which improve the performance of a given application, requires intensive assessment and a large number of exploratory native experiments (using real applications on real systems), which may not always be feasible or practical due to associated time and costs. In such cases, simulative experiments are more appropriate for studying the performance of applications. This motivates the question of ‘How realistic are the simulations of executions of scientific applications using DLS on HPC platforms?’ In the present work, a methodology is devised to answer this question. It involves the experimental verification and analysis of the performance of DLS in scientific applications. The proposed methodology is employed for a computer vision application executing using four DLS techniques on two different HPC platforms, both via native and simulative experiments. The evaluation and analysis of the native and simulative results indicate that the accuracy of the simulative experiments is strongly influenced by the approach used to extract the computational effort of the application (FLOP- or time-based), the choice of application model representation into simulation (data or task parallel), and the available HPC subsystem models in the simulator (multi-core CPUs, memory hierarchy, and network topology). The minimum and the maximum percent errors achieved between the native and the simulative experiments are 0.95% and 8.03%, respectively

    An Approach for Realistically Simulating the Performance of Scientific Applications on High Performance Computing Systems

    Full text link
    Scientific applications often contain large, computationally-intensive, and irregular parallel loops or tasks that exhibit stochastic characteristics. Applications may suffer from load imbalance during their execution on high-performance computing (HPC) systems due to such characteristics. Dynamic loop self-scheduling (DLS) techniques are instrumental in improving the performance of scientific applications on HPC systems via load balancing. Selecting a DLS technique that results in the best performance for different problems and system sizes requires a large number of exploratory experiments. A theoretical model that can be used to predict the scheduling technique that yields the best performance for a given problem and system has not yet been identified. Therefore, simulation is the most appropriate approach for conducting such exploratory experiments with reasonable costs. This work devises an approach to realistically simulate computationally-intensive scientific applications that employ DLS and execute on HPC systems. Several approaches to represent the application tasks (or loop iterations) are compared to establish their influence on the simulative application performance. A novel simulation strategy is introduced, which transforms a native application code into a simulative code. The native and simulative performance of two computationally-intensive scientific applications are compared to evaluate the realism of the proposed simulation approach. The comparison of the performance characteristics extracted from the native and simulative performance shows that the proposed simulation approach fully captured most of the performance characteristics of interest. This work shows and establishes the importance of simulations that realistically predict the performance of DLS techniques for different applications and system configurations

    Evaluating the Robustness of Resource Allocations Obtained through Performance Modeling with Stochastic Process Algebra

    Get PDF
    Recent developments in the field of parallel and distributed computing has led to a proliferation of solving large and computationally intensive mathematical, science, or engineering problems, that consist of several parallelizable parts and several non-parallelizable (sequential) parts. In a parallel and distributed computing environment, the performance goal is to optimize the execution of parallelizable parts of an application on concurrent processors. This requires efficient application scheduling and resource allocation for mapping applications to a set of suitable parallel processors such that the overall performance goal is achieved. However, such computational environments are often prone to unpredictable variations in application (problem and algorithm) and system characteristics. Therefore, a robustness study is required to guarantee a desired level of performance. Given an initial workload, a mapping of applications to resources is considered to be robust if that mapping optimizes execution performance and guarantees a desired level of performance in the presence of unpredictable perturbations at runtime. In this research, a stochastic process algebra, Performance Evaluation Process Algebra (PEPA), is used for obtaining resource allocations via a numerical analysis of performance modeling of the parallel execution of applications on parallel computing resources. The PEPA performance model is translated into an underlying mathematical Markov chain model for obtaining performance measures. Further, a robustness analysis of the allocation techniques is performed for finding a robustmapping from a set of initial mapping schemes. The numerical analysis of the performance models have confirmed similarity with the simulation results of earlier research available in existing literature. When compared to direct experiments and simulations, numerical models and the corresponding analyses are easier to reproduce, do not incur any setup or installation costs, do not impose any prerequisites for learning a simulation framework, and are not limited by the complexity of the underlying infrastructure or simulation libraries

    Design of robust scheduling methodologies for high performance computing

    Get PDF
    Scientific applications are often large, complex, computationally-intensive, and irregular. Loops are often an abundant source of parallelism in scientific applications. Due to the ever-increasing computational needs of scientific applications, high performance computing (HPC) systems have become larger and more complex, offering increased parallelism at multiple hardware levels. Load imbalance, caused by irregular computational load per task and unpredictable computing system characteristics (system variability), often degrades the performance of applications. Besides, perturbations, such as reduced computing power, network latency availability, or failures, can severely impact the performance of the applications. System variability and perturbations are only expected to increase in future extreme-scale computing systems. Extrapolating the current failure rate to Exascale would result in a failure every 20 minutes. Such failure rate and perturbations would render the computing systems unusable. This doctoral thesis improves the performance of computationally-intensive scientific applications on HPC systems via robust load balancing. Robust scheduling ensures and maintains improved load balanced execution under unpredictable application and system characteristics. A number of dynamic loop self-scheduling (DLS) techniques have been introduced and successfully used in scientific applications between the 1980s and 2000s. These DLS techniques are not fault-tolerant as they were originally introduced. In this thesis, we identify three major research questions to achieve robust scheduling (1) How to ensure that the DLS techniques employed in scientific applications today adhere to their original design goals and specifications? (2) How to select a DLS technique that will achieve improved performance under perturbations? (3) How to tolerate perturbations during execution and maintain a load balanced execution on HPC systems? To answer the first question, we reproduced the original experiments that introduced the DLS techniques to verify their present implementation. Simulation is used to reproduce experiments on systems from the past. Realistic simulation induces a similar analysis and conclusions to the analysis of the native results. To this end, we devised an approach for bridging the native and simulative executions of parallel applications on HPC systems. This simulation approach is used to reproduce scheduling experiments on past and present systems to verify the implementation of DLS techniques. Given the multiple levels of parallelism offered by the present HPC systems, we analyzed the load imbalance in scientific applications, from computer vision, astrophysics, and mathematical kernels, at both thread and process levels. This analysis revealed a significant interplay between thread level and process level load balancing. We found that dynamic load balancing at the thread level propagates to the process level and vice versa. However, the best application performance is only achieved by two-level dynamic load balancing. Next, we examined the performance of applications under perturbations. We found that the most robust DLS technique does not deliver the best performance under various perturbations. The most efficient DLS technique changes by changing the application, the system, or perturbations during execution. This signifies the algorithm selection problem in the DLS. We leveraged realistic simulations to address the algorithm selection problem of scheduling under perturbations via a simulation assisted approach (SimAS), which answers the second question. SimAS dynamically selects DLS techniques that improve the performance depending on the application, system, and perturbations during the execution. To answer the third question, we introduced a robust dynamic load balancing (rDLB) approach for the robust self-scheduling of scientific applications under failures (question 3). rDLB proactively reschedules already allocated tasks and requires no detection of perturbations. rDLB tolerates up to P −1 processor failures (P is the number of processors allocated to the application) and boosts the flexibility of applications against nonfatal perturbations, such as reduced availability of resources. This thesis is the first to provide insights into the interplay between thread and process level dynamic load balancing in scientific applications. Verified DLS techniques, SimAS, and rDLB are integrated into an MPI-based dynamic load balancing library (DLS4LB), which supports thirteen DLS techniques, for robust dynamic load balancing of scientific applications on HPC systems. Using the methods devised in this thesis, we improved the performance of scientific applications by up to 21% via two-level dynamic load balancing. Under perturbations, we enhanced their performance by a factor of 7 and their flexibility by a factor of 30. This thesis opens up the horizons into understanding the interplay of load balancing between various levels of software parallelism and lays the ground for robust multilevel scheduling for the upcoming Exascale HPC systems and beyond

    HAEC News

    Get PDF
    corecore