9,140 research outputs found
rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks
Scientific applications often contain large and computationally intensive
parallel loops. Dynamic loop self scheduling (DLS) is used to achieve a
balanced load execution of such applications on high performance computing
(HPC) systems. Large HPC systems are vulnerable to processors or node failures
and perturbations in the availability of resources. Most self-scheduling
approaches do not consider fault-tolerant scheduling or depend on failure or
perturbation detection and react by rescheduling failed tasks. In this work, a
robust dynamic load balancing (rDLB) approach is proposed for the robust self
scheduling of independent tasks. The proposed approach is proactive and does
not depend on failure or perturbation detection. The theoretical analysis of
the proposed approach shows that it is linearly scalable and its cost decrease
quadratically by increasing the system size. rDLB is integrated into an MPI DLS
library to evaluate its performance experimentally with two computationally
intensive scientific applications. Results show that rDLB enables the tolerance
of up to (P minus one) processor failures, where P is the number of processors
executing an application. In the presence of perturbations, rDLB boosted the
robustness of DLS techniques up to 30 times and decreased application execution
time up to 7 times compared to their counterparts without rDLB
Evaluating the Robustness of Resource Allocations Obtained through Performance Modeling with Stochastic Process Algebra
Recent developments in the field of parallel and distributed computing has led to a proliferation of solving large and computationally intensive mathematical, science, or engineering problems, that consist of several parallelizable parts and several non-parallelizable (sequential) parts. In a parallel and distributed computing environment, the performance goal is to optimize the execution of parallelizable parts of an application on concurrent processors. This requires efficient application scheduling and resource allocation for mapping applications to a set of suitable parallel processors such that the overall performance goal is achieved. However, such computational environments are often prone to unpredictable variations in application (problem and algorithm) and system characteristics. Therefore, a robustness study is required to guarantee a desired level of performance. Given an initial workload, a mapping of applications to resources is considered to be robust if that mapping optimizes execution performance and guarantees a desired level of performance in the presence of unpredictable perturbations at runtime. In this research, a stochastic process algebra, Performance Evaluation Process Algebra (PEPA), is used for obtaining resource allocations via a numerical analysis of performance modeling of the parallel execution of applications on parallel computing resources. The PEPA performance model is translated into an underlying mathematical Markov chain model for obtaining performance measures. Further, a robustness analysis of the allocation techniques is performed for finding a robustmapping from a set of initial mapping schemes. The numerical analysis of the performance models have confirmed similarity with the simulation results of earlier research available in existing literature. When compared to direct experiments and simulations, numerical models and the corresponding analyses are easier to reproduce, do not incur any setup or installation costs, do not impose any prerequisites for learning a simulation framework, and are not limited by the complexity of the underlying infrastructure or simulation libraries
Feedback and time are essential for the optimal control of computing systems
The performance, reliability, cost, size and energy usage of computing systems can be improved by one or more orders of magnitude by the systematic use of modern control and optimization methods. Computing systems rely on the use of feedback algorithms to schedule tasks, data and resources, but the models that are used to design these algorithms are validated using open-loop metrics. By using closed-loop metrics instead, such as the gap metric developed in the control community, it should be possible to develop improved scheduling algorithms and computing systems that have not been over-engineered. Furthermore, scheduling problems are most naturally formulated as constraint satisfaction or mathematical optimization problems, but these are seldom implemented using state of the art numerical methods, nor do they explicitly take into account the fact that the scheduling problem itself takes time to solve. This paper makes the case that recent results in real-time model predictive control, where optimization problems are solved in order to control a process that evolves in time, are likely to form the basis of scheduling algorithms of the future. We therefore outline some of the research problems and opportunities that could arise by explicitly considering feedback and time when designing optimal scheduling algorithms for computing systems
Design of robust scheduling methodologies for high performance computing
Scientific applications are often large, complex, computationally-intensive, and irregular. Loops are often an abundant source of parallelism in scientific applications. Due to the ever-increasing computational needs of scientific applications, high performance computing (HPC) systems have become larger and more complex, offering increased parallelism at multiple hardware levels.
Load imbalance, caused by irregular computational load per task and unpredictable computing system characteristics (system variability), often degrades the performance of applications. Besides, perturbations, such as reduced computing power, network latency availability, or failures, can severely impact the performance of the applications. System variability and perturbations are only expected to increase in future extreme-scale computing systems. Extrapolating the current failure rate to Exascale would result in a failure every 20 minutes. Such failure rate and perturbations would render the computing systems unusable. This doctoral thesis improves the performance of computationally-intensive scientific applications on HPC systems via robust load balancing. Robust scheduling ensures and maintains improved load balanced execution under unpredictable application and system characteristics.
A number of dynamic loop self-scheduling (DLS) techniques have been introduced and successfully used in scientific applications between the 1980s and 2000s. These DLS techniques are not fault-tolerant as they were originally introduced. In this thesis, we identify three major research questions to achieve robust scheduling (1) How to ensure that the DLS techniques employed in scientific applications today adhere to their original design goals and specifications? (2) How to select a DLS technique that will achieve improved performance under perturbations? (3) How to tolerate perturbations during execution and maintain a load balanced execution on HPC systems?
To answer the first question, we reproduced the original experiments that introduced the DLS techniques to verify their present implementation. Simulation is used to reproduce experiments on systems from the past. Realistic simulation induces a similar analysis and conclusions to the analysis of the native results. To this end, we devised an approach for bridging the native and simulative executions of parallel applications on HPC systems. This simulation approach is used to reproduce scheduling experiments on past and present systems to verify the implementation of DLS techniques.
Given the multiple levels of parallelism offered by the present HPC systems, we analyzed the load imbalance in scientific applications, from computer vision, astrophysics, and mathematical kernels, at both thread and process levels. This analysis revealed a significant interplay between thread level and process level load balancing. We found that dynamic load balancing at the thread level propagates to the process level and vice versa. However, the best application performance is only achieved by two-level dynamic load balancing.
Next, we examined the performance of applications under perturbations. We found that the most robust DLS technique does not deliver the best performance under various perturbations. The most efficient DLS technique changes by changing the application, the system, or perturbations during execution. This signifies the algorithm selection problem in the DLS. We leveraged realistic simulations to address the algorithm selection problem of scheduling under perturbations via a simulation assisted approach (SimAS), which answers the second question. SimAS dynamically selects DLS techniques that improve the performance depending on the application, system, and perturbations during the execution.
To answer the third question, we introduced a robust dynamic load balancing (rDLB) approach for the robust self-scheduling of scientific applications under failures (question 3). rDLB proactively reschedules already allocated tasks and requires no detection of perturbations. rDLB tolerates up to P −1 processor failures (P is the number of processors allocated to the application) and boosts the flexibility of applications against nonfatal perturbations, such as reduced availability of resources.
This thesis is the first to provide insights into the interplay between thread and process level dynamic load balancing in scientific applications. Verified DLS techniques, SimAS, and rDLB are integrated into an MPI-based dynamic load balancing library (DLS4LB), which supports thirteen DLS techniques, for robust dynamic load balancing of scientific applications on HPC systems. Using the methods devised in this thesis, we improved the performance of scientific applications by up to 21% via two-level dynamic load balancing. Under perturbations, we enhanced their performance by a factor of 7 and their flexibility by a factor of 30. This thesis opens up the horizons into understanding the interplay of load balancing between various levels of software parallelism and lays the ground for robust multilevel scheduling for the upcoming Exascale HPC systems and beyond
Performance Reproduction and Prediction of Selected Dynamic Loop Scheduling Experiments
Scientific applications are complex, large, and often exhibit irregular and
stochastic behavior. The use of efficient loop scheduling techniques in
computationally-intensive applications is crucial for improving their
performance on high-performance computing (HPC) platforms. A number of dynamic
loop scheduling (DLS) techniques have been proposed between the late 1980s and
early 2000s, and efficiently used in scientific applications. In most cases,
the computing systems on which they have been tested and validated are no
longer available. This work is concerned with the minimization of the sources
of uncertainty in the implementation of DLS techniques to avoid unnecessary
influences on the performance of scientific applications. Therefore, it is
important to ensure that the DLS techniques employed in scientific applications
today adhere to their original design goals and specifications. The goal of
this work is to attain and increase the trust in the implementation of DLS
techniques in present studies. To achieve this goal, the performance of a
selection of scheduling experiments from the 1992 original work that introduced
factoring is reproduced and predicted via both, simulative and native
experimentation. The experiments show that the simulation reproduces the
performance achieved on the past computing platform and accurately predicts the
performance achieved on the present computing platform. The performance
reproduction and prediction confirm that the present implementation of the DLS
techniques considered both, in simulation and natively, adheres to their
original description. The results confirm the hypothesis that reproducing
experiments of identical scheduling scenarios on past and modern hardware leads
to an entirely different behavior from expected
Revisiting Actor Programming in C++
The actor model of computation has gained significant popularity over the
last decade. Its high level of abstraction makes it appealing for concurrent
applications in parallel and distributed systems. However, designing a
real-world actor framework that subsumes full scalability, strong reliability,
and high resource efficiency requires many conceptual and algorithmic additives
to the original model.
In this paper, we report on designing and building CAF, the "C++ Actor
Framework". CAF targets at providing a concurrent and distributed native
environment for scaling up to very large, high-performance applications, and
equally well down to small constrained systems. We present the key
specifications and design concepts---in particular a message-transparent
architecture, type-safe message interfaces, and pattern matching
facilities---that make native actors a viable approach for many robust,
elastic, and highly distributed developments. We demonstrate the feasibility of
CAF in three scenarios: first for elastic, upscaling environments, second for
including heterogeneous hardware like GPGPUs, and third for distributed runtime
systems. Extensive performance evaluations indicate ideal runtime behaviour for
up to 64 cores at very low memory footprint, or in the presence of GPUs. In
these tests, CAF continuously outperforms the competing actor environments
Erlang, Charm++, SalsaLite, Scala, ActorFoundry, and even the OpenMPI.Comment: 33 page
Wireless industrial monitoring and control networks: the journey so far and the road ahead
While traditional wired communication technologies have played a crucial role in industrial monitoring and control networks over the past few decades, they are increasingly proving to be inadequate to meet the highly dynamic and stringent demands of today’s industrial applications, primarily due to the very rigid nature of wired infrastructures. Wireless technology, however, through its increased pervasiveness, has the potential to revolutionize the industry, not only by mitigating the problems faced by wired solutions, but also by introducing a completely new class of applications. While present day wireless technologies made some preliminary inroads in the monitoring domain, they still have severe limitations especially when real-time, reliable distributed control operations are concerned. This article provides the reader with an overview of existing wireless technologies commonly used in the monitoring and control industry. It highlights the pros and cons of each technology and assesses the degree to which each technology is able to meet the stringent demands of industrial monitoring and control networks. Additionally, it summarizes mechanisms proposed by academia, especially serving critical applications by addressing the real-time and reliability requirements of industrial process automation. The article also describes certain key research problems from the physical layer communication for sensor networks and the wireless networking perspective that have yet to be addressed to allow the successful use of wireless technologies in industrial monitoring and control networks
- …