
    rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks

    Scientific applications often contain large and computationally intensive parallel loops. Dynamic loop self-scheduling (DLS) is used to achieve a load-balanced execution of such applications on high-performance computing (HPC) systems. Large HPC systems are vulnerable to processor or node failures and to perturbations in the availability of resources. Most self-scheduling approaches do not consider fault-tolerant scheduling, or they depend on failure or perturbation detection and react by rescheduling failed tasks. In this work, a robust dynamic load balancing (rDLB) approach is proposed for the robust self-scheduling of independent tasks. The proposed approach is proactive and does not depend on failure or perturbation detection. The theoretical analysis of the proposed approach shows that it is linearly scalable and that its cost decreases quadratically with increasing system size. rDLB is integrated into an MPI DLS library to evaluate its performance experimentally with two computationally intensive scientific applications. Results show that rDLB enables the tolerance of up to P − 1 processor failures, where P is the number of processors executing an application. In the presence of perturbations, rDLB boosted the robustness of DLS techniques by up to a factor of 30 and decreased application execution time by up to a factor of 7 compared to their counterparts without rDLB.
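
    To make the proactive idea concrete, below is a minimal Python sketch of rDLB-style duplicate execution; this is an illustration under stated assumptions, not the authors' MPI implementation, and the function and variable names are invented for the example. Once the shared task queue is empty, idle workers re-execute tasks that were assigned but never reported finished, so the work of a crashed processor is eventually completed without any failure-detection mechanism.

        from collections import deque

        def rdlb_schedule(tasks, workers, crashed):
            # Self-scheduling rounds: each worker requests one task per round.
            # Workers in 'crashed' take tasks but never report them finished,
            # modelling silent failures; no detection mechanism exists.
            queue = deque(tasks)
            assigned, finished = set(), set()
            while len(finished) < len(tasks):
                progress = False
                for w in workers:
                    if queue:
                        t = queue.popleft()          # normal self-scheduling path
                        assigned.add(t)
                    else:
                        pending = [t for t in assigned if t not in finished]
                        if not pending:
                            break
                        t = pending[0]               # proactive duplicate execution
                    if w not in crashed:
                        finished.add(t)              # stand-in for executing the task
                        progress = True
                if not progress:
                    raise RuntimeError("no live workers remain")
            return finished

        # Up to P - 1 failures are tolerated: one surviving worker finishes all.
        done = rdlb_schedule(list(range(10)), workers=[0, 1, 2, 3], crashed={1, 2, 3})
        assert len(done) == 10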

    SiL: An Approach for Adjusting Applications to Heterogeneous Systems Under Perturbations

    Scientific applications consist of large and computationally-intensive loops. Dynamic loop scheduling (DLS) techniques are used to load balance the execution of such applications. Load imbalance can be caused by variations in loop iteration execution times due to problem, algorithmic, or systemic characteristics (also called perturbations). The following question motivates this work: "Given an application, a high-performance computing (HPC) system, and both their characteristics and interplay, which DLS technique will achieve improved performance under unpredictable perturbations?" Existing work only considers perturbations caused by variations in the computational speeds delivered by the HPC system. However, perturbations in the available network bandwidth or latency are inevitable on production HPC systems. Simulator in the loop (SiL) is introduced herein as a new control-theory-inspired approach to dynamically select DLS techniques that improve the performance of applications on heterogeneous HPC systems under perturbations. The present work examines the performance of six applications on a heterogeneous system under all of the above system perturbations. The SiL proof of concept is evaluated using simulation. The performance results confirm the initial hypothesis that no single DLS technique can deliver the best performance in all scenarios, whereas the SiL-based DLS selection delivered improved application performance in most experiments.
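
    The selection step can be pictured as follows; this is a hedged sketch, not the paper's interface, and simulate() here is a toy cost model standing in for the discrete-event simulator used in SiL. At each scheduling round, the remaining iterations are evaluated under the currently observed system state for each candidate DLS technique, and the technique with the shortest predicted finish time is chosen.

        def simulate(chunk_size, remaining, state):
            # Toy cost model standing in for a discrete-event simulation:
            # compute time at the currently delivered speed, plus a per-chunk
            # scheduling overhead that penalizes fine-grained techniques.
            chunks = -(-remaining // chunk_size)          # ceil division
            return (remaining * state["iter_cost"] / state["speed"]
                    + chunks * state["overhead"])

        def sil_select(techniques, remaining, state):
            # Simulate every candidate under the observed state; pick the best.
            return min(techniques,
                       key=lambda t: simulate(techniques[t], remaining, state))

        # Candidate techniques reduced to a chunk size each (illustrative only).
        candidates = {"SS": 1, "FSC": 64, "GSS-like": 256}
        state = {"iter_cost": 1e-3, "speed": 0.7, "overhead": 5e-3}  # measured online
        best = sil_select(candidates, remaining=10_000, state=state)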

    Design of robust scheduling methodologies for high performance computing

    Scientific applications are often large, complex, computationally-intensive, and irregular. Loops are often an abundant source of parallelism in scientific applications. Due to the ever-increasing computational needs of scientific applications, high-performance computing (HPC) systems have become larger and more complex, offering increased parallelism at multiple hardware levels. Load imbalance, caused by irregular computational load per task and unpredictable computing system characteristics (system variability), often degrades the performance of applications. Moreover, perturbations, such as reduced computing power, increased network latency, reduced availability of resources, or failures, can severely impact application performance. System variability and perturbations are only expected to increase in future extreme-scale computing systems; extrapolating the current failure rate to Exascale would result in a failure every 20 minutes, a rate that would render such systems unusable.

    This doctoral thesis improves the performance of computationally-intensive scientific applications on HPC systems via robust load balancing. Robust scheduling ensures and maintains a load-balanced execution under unpredictable application and system characteristics. A number of dynamic loop self-scheduling (DLS) techniques were introduced and successfully used in scientific applications between the 1980s and 2000s; as originally introduced, these DLS techniques are not fault-tolerant. In this thesis, we identify three major research questions on the way to robust scheduling: (1) How to ensure that the DLS techniques employed in scientific applications today adhere to their original design goals and specifications? (2) How to select a DLS technique that will achieve improved performance under perturbations? (3) How to tolerate perturbations during execution and maintain a load-balanced execution on HPC systems?

    To answer the first question, we reproduced the original experiments that introduced the DLS techniques to verify their present implementations. Simulation is used to reproduce experiments on systems from the past, and realistic simulation yields analyses and conclusions similar to those drawn from native results. To this end, we devised an approach for bridging the native and simulative executions of parallel applications on HPC systems; this simulation approach is used to reproduce scheduling experiments on past and present systems to verify the implementation of DLS techniques. Given the multiple levels of parallelism offered by present HPC systems, we analyzed the load imbalance in scientific applications, from computer vision, astrophysics, and mathematical kernels, at both the thread and process levels. This analysis revealed a significant interplay between thread-level and process-level load balancing: dynamic load balancing at the thread level propagates to the process level and vice versa, yet the best application performance is only achieved by two-level dynamic load balancing. Next, we examined the performance of applications under perturbations and found that the most robust DLS technique does not deliver the best performance under all perturbations; the most efficient DLS technique changes with the application, the system, and the perturbations during execution. This signifies the algorithm selection problem in DLS.

    We leveraged realistic simulations to address this algorithm selection problem via a simulation-assisted approach (SimAS), which answers the second question. SimAS dynamically selects DLS techniques that improve performance depending on the application, the system, and the perturbations during execution. To answer the third question, we introduced a robust dynamic load balancing (rDLB) approach for the robust self-scheduling of scientific applications under failures. rDLB proactively reschedules already allocated tasks and requires no detection of perturbations. It tolerates up to P − 1 processor failures (P is the number of processors allocated to the application) and boosts the flexibility of applications against nonfatal perturbations, such as reduced availability of resources.

    This thesis is the first to provide insights into the interplay between thread- and process-level dynamic load balancing in scientific applications. The verified DLS techniques, SimAS, and rDLB are integrated into an MPI-based dynamic load balancing library (DLS4LB), which supports thirteen DLS techniques, for the robust dynamic load balancing of scientific applications on HPC systems. Using the methods devised in this thesis, we improved the performance of scientific applications by up to 21% via two-level dynamic load balancing. Under perturbations, we enhanced their performance by a factor of 7 and their flexibility by a factor of 30. This thesis opens up horizons into understanding the interplay of load balancing between the various levels of software parallelism and lays the groundwork for robust multilevel scheduling on upcoming Exascale HPC systems and beyond.
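
    For readers unfamiliar with DLS, the sketch below illustrates two classic techniques from this family, guided self-scheduling (GSS) and factoring (FAC), using their standard chunk-size formulas from the scheduling literature; it is an illustration of the technique class, not code from the DLS4LB library.

        from math import ceil

        def gss_chunks(n_iters, n_procs):
            # GSS: each requesting processor receives ceil(R / P) of the
            # R remaining iterations, so chunks shrink as the loop drains.
            remaining, chunks = n_iters, []
            while remaining > 0:
                c = ceil(remaining / n_procs)
                chunks.append(c)
                remaining -= c
            return chunks

        def fac_chunks(n_iters, n_procs):
            # FAC: iterations are scheduled in batches of P equal chunks,
            # each sized half of the remaining work divided by P.
            remaining, chunks = n_iters, []
            while remaining > 0:
                c = max(1, ceil(remaining / (2 * n_procs)))
                for _ in range(n_procs):
                    if remaining <= 0:
                        break
                    chunks.append(min(c, remaining))
                    remaining -= chunks[-1]
            return chunks

        print(gss_chunks(100, 4))  # [25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1]
        print(fac_chunks(100, 4))  # [13, 13, 13, 13, 6, 6, 6, 6, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1]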

    Parallel and Distributed Computing

    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (graphics processing units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing.

    Personal mobile grids with a honeybee inspired resource scheduler

    The overall aim of this thesis has been to introduce Personal Mobile Grids (PM-Grids) as a novel paradigm in grid computing that scales grid infrastructures to mobile devices and extends grid entities to individual personal users. In this thesis, architectural designs as well as simulation models for PM-Grids are developed. The core of any grid system is its resource scheduler. However, virtually all current conventional grid schedulers do not address the non-clairvoyant scheduling problem, where job information is not available before the end of execution. Therefore, this thesis proposes a honeybee-inspired resource scheduling heuristic for PM-Grids (HoPe), incorporating a radical approach to grid resource scheduling to tackle this problem. A detailed design and implementation of HoPe, with a decentralised self-management and adaptive policy, are presented. Among the other main contributions are a comprehensive taxonomy of grid systems as well as a detailed analysis of the honeybee colony and its nectar acquisition process (NAP) from the resource scheduling perspective, which, to the best of our knowledge, have not been presented in any previous work. The PM-Grid designs and the HoPe implementation were evaluated thoroughly through a strictly controlled empirical evaluation framework, with a well-established heuristic in high-throughput computing, the opportunistic scheduling heuristic (OSH), as a benchmark algorithm. Comparisons with optimal values and worst bounds were conducted to gain a clear insight into HoPe's behaviour, in terms of stability, throughput, turnaround time, and speedup, under different numbers of jobs and grid scales. Experimental results demonstrate the superiority of HoPe, which successfully maintained optimum stability and throughput in more than 95% of the experiments and performed three times better than the OSH under extremely heavy loads. Regarding turnaround time and speedup, HoPe achieved less than 50% of the turnaround time incurred by the OSH while doubling its speedup in more than 60% of the experiments. These results indicate the potential of both PM-Grids and HoPe in realising futuristic grid visions. Deploying PM-Grids in real-life scenarios and utilising HoPe in other parallel processing and high-throughput computing systems are therefore recommended.
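
    As a rough illustration of the honeybee metaphor (a generic sketch, not HoPe's actual policy, which the thesis details), forager recruitment via the waggle dance can be mimicked by reassigning workers to job queues with probability proportional to each queue's observed profitability:

        import random

        def waggle_reallocate(profitability, n_foragers):
            # Recruit foragers to "flower patches" (job queues) with probability
            # proportional to each patch's advertised profitability, echoing how
            # dancing bees recruit nest mates during nectar acquisition.
            patches = list(profitability)
            weights = [profitability[p] for p in patches]
            return random.choices(patches, weights=weights, k=n_foragers)

        # Twelve workers drift toward the more profitable queue over time.
        assignment = waggle_reallocate({"queueA": 5.0, "queueB": 1.0}, n_foragers=12)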

    Robust distributed scheduling via time period aggregation

    In this paper, we evaluate whether the robustness of a market mechanism that allocates complementary resources can be improved through the aggregation of the time periods in which resources are consumed. In particular, we study a multi-round combinatorial auction that is built on a general equilibrium framework. We adopt the general equilibrium framework and the particular combinatorial auction design from the literature, and we investigate the benefits and the limitations of time-period aggregation when demand-side uncertainties are introduced. Using simulation experiments on a real-life resource allocation problem from a container port, we show that, under stochastic conditions, the performance variation of the process decreases as the time-frame length increases (time frames are obtained by aggregating time periods). This is achieved without substantial deterioration in mean performance. The main driver for the increase in robustness is that longer time frames result in allocations where resources are assigned in longer contiguous time blocks. The resulting resource continuity allows bidders to shift schedules upon the realization of stochasticity. To demonstrate the generality of the notion that resource continuity increases allocation robustness, we perform further experiments on a decentralized variant of the classical job-shop scheduling problem. The experimental results demonstrate similar benefits.
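
    The aggregation step itself is simple; the following sketch (with invented demand numbers) groups consecutive periods into frames, so a winning bid occupies a longer contiguous block of periods and leaves room to shift work within a frame when realized demand deviates from the plan:

        def aggregate_frames(period_demand, frame_len):
            # Group consecutive time periods into frames; a bid now requests a
            # whole frame, so a winning allocation spans frame_len contiguous
            # periods instead of scattered single periods.
            frames = [period_demand[i:i + frame_len]
                      for i in range(0, len(period_demand), frame_len)]
            return [max(f) for f in frames]   # capacity to reserve per frame

        # Hourly demand (invented numbers) aggregated into 4-hour frames.
        hourly = [3, 1, 4, 2, 0, 5, 2, 1]
        print(aggregate_frames(hourly, 4))    # -> [4, 5]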

    Application-level runtime load management: a Bayesian approach

    Affordable parallel computing on distributed shared systems requires novel approaches to managing runtime load distribution, since current algorithms fall short of expectations. The efficient execution of irregular parallel applications on dynamically shared computing clusters exhibits unpredictable dynamic behaviour, due both to the application's requirements and to the availability of the system's resources. This thesis addresses the explicit inclusion of the uncertainty that an application-level scheduling agent has about the environment in its internal model of the world and in its decision-making mechanism. Bayesian decision networks are introduced, and a generic framework is proposed for application-level scheduling, where a probabilistic inference algorithm helps the scheduler make more effective decisions, based on stochastic predictions of the consequences of those decisions, generated from incomplete and aged measurements of the environment's state.

    An application-level performance model and associated metrics (performance, environment, and overheads) are proposed to obtain estimates of application and system behaviour, both to feed the scheduling agent's model and to support the evaluation. To verify that this novel approach improves overall application execution time and scheduling efficiency, a parallel ray tracer was developed as a message-passing, irregular, data-parallel application, and an execution-model prototype was built to run on a computing cluster of seven time-shared nodes under dynamically variable synthetic workloads. To assess the effectiveness of the load management, the stochastic scheduler was evaluated by rendering several complex scenes and compared with three reference scheduling strategies: a uniform static work distribution, a demand-driven work allocation, and a sensor-based deterministic scheduling strategy. The evaluation results show considerable performance improvements over strategies that use no information about the state of the environment, and they highlight the improvements of the decision-network-based scheduler over a sensor-based deterministic approach of equivalent complexity.
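
    The decision-making step can be illustrated with a minimal expected-utility computation (the states, probabilities, and times below are invented for illustration, not taken from the thesis): the scheduler keeps a posterior belief over each node's load given aged observations and assigns the next work packet to the node with the best expected completion time, rather than trusting a single stale sensor reading.

        posterior = {                        # P(load level | aged observations)
            "node1": {"light": 0.7, "heavy": 0.3},
            "node2": {"light": 0.4, "heavy": 0.6},
        }
        packet_time = {"light": 1.0, "heavy": 3.5}   # predicted seconds per packet

        def expected_time(node):
            # Expected completion time of the next work packet on this node.
            return sum(p * packet_time[level]
                       for level, p in posterior[node].items())

        best = min(posterior, key=expected_time)
        print(best, expected_time(best))     # node1 1.75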