252 research outputs found

    Development of Data-Driven Dispatching Heuristics for Heterogeneous HPC Systems

    Get PDF
    Nell’ambito dei sistemi High-Performance Computing, l'uso di euristiche di dispatching efficaci, per lo scheduling e l'allocazione dei jobs in arrivo, è fondamentale al fine di ottenere buoni livelli di Quality of Service. In questo elaborato ci concentreremo sul design e l’analisi di euristiche di allocazione delle risorse, che saranno progettate per sistemi HPC eterogenei, nei quali i nodi possono essere equipaggiati con diverse tipologie di unità di elaborazione. Impiegheremo poi euristiche data-driven per la predizione della durata dei jobs, e valuteremo il tutto dal punto di vista del throughput di sistema. Considereremo in particolare Eurora, un sistema HPC eterogeneo realizzato da CINECA, oltre che un workload catturato dal relativo log di sistema, contenente jobs reali inviati dagli utenti. Tutto ciò è stato possibile grazie ad AccaSim, un simulatore di sistemi HPC sviluppato nel Dipartimento di Informatica - Scienza e Ingegneria (DISI) dell’Università di Bologna, ed al quale si è contribuito in modo sostanziale. Quest’elaborato mostra che l’impatto di diverse euristiche di allocazione sul throughput di un sistema HPC eterogeneo non è trascurabile, con variazioni in grado di raggiungere picchi di un ordine di grandezza, e più pronunciate considerando brevi intervalli temporali, dell'ordine dei mesi. Abbiamo inoltre osservato che l’impiego di euristiche per la predizione della durata dei jobs è di grande beneficio al throughput su tutte le euristiche di allocazione, e specialmente su quelle che integrano in maniera più profonda tali elementi data-driven. Infine, l’analisi effettuata ha permesso di caratterizzare integralmente il sistema Eurora ed il relativo workload, permettendoci di comprendere al meglio gli effetti su di esso dei diversi metodi di dispatching, nonché di estendere le nostre considerazioni anche ad altre classi di sistemi

    Managing server energy and reducing operational cost for online service providers

    Get PDF
    The past decade has seen the energy consumption in servers and Internet Data Centers (IDCs) skyrocket. A recent survey estimated that the worldwide spending on servers and cooling have risen to above $30 billion and is likely to exceed spending on the new server hardware . The rapid rise in energy consumption has posted a serious threat to both energy resources and the environment, which makes green computing not only worthwhile but also necessary. This dissertation intends to tackle the challenges of both reducing the energy consumption of server systems and by reducing the cost for Online Service Providers (OSPs). Two distinct subsystems account for most of IDC’s power: the server system, which accounts for 56% of the total power consumption of an IDC, and the cooling and humidifcation systems, which accounts for about 30% of the total power consumption. The server system dominates the energy consumption of an IDC, and its power draw can vary drastically with data center utilization. In this dissertation, we propose three models to achieve energy effciency in web server clusters: an energy proportional model, an optimal server allocation and frequency adjustment strategy, and a constrained Markov model. The proposed models have combined Dynamic Voltage/Frequency Scaling (DV/FS) and Vary-On, Vary-off (VOVF) mechanisms that work together for more energy savings. Meanwhile, corresponding strategies are proposed to deal with the transition overheads. We further extend server energy management to the IDC’s costs management, helping the OSPs to conserve, manage their own electricity cost, and lower the carbon emissions. We have developed an optimal energy-aware load dispatching strategy that periodically maps more requests to the locations with lower electricity prices. A carbon emission limit is placed, and the volatility of the carbon offset market is also considered. Two energy effcient strategies are applied to the server system and the cooling system respectively. With the rapid development of cloud services, we also carry out research to reduce the server energy in cloud computing environments. In this work, we propose a new live virtual machine (VM) placement scheme that can effectively map VMs to Physical Machines (PMs) with substantial energy savings in a heterogeneous server cluster. A VM/PM mapping probability matrix is constructed, in which each VM request is assigned with a probability running on PMs. The VM/PM mapping probability matrix takes into account resource limitations, VM operation overheads, server reliability as well as energy effciency. The evolution of Internet Data Centers and the increasing demands of web services raise great challenges to improve the energy effciency of IDCs. We also express several potential areas for future research in each chapter

    Load Balancing in the Non-Degenerate Slowdown Regime

    Full text link
    We analyse Join-the-Shortest-Queue in a contemporary scaling regime known as the Non-Degenerate Slowdown regime. Join-the-Shortest-Queue (JSQ) is a classical load balancing policy for queueing systems with multiple parallel servers. Parallel server queueing systems are regularly analysed and dimensioned by diffusion approximations achieved in the Halfin-Whitt scaling regime. However, when jobs must be dispatched to a server upon arrival, we advocate the Non-Degenerate Slowdown regime (NDS) to compare different load-balancing rules. In this paper we identify novel diffusion approximation and timescale separation that provides insights into the performance of JSQ. We calculate the price of irrevocably dispatching jobs to servers and prove this to within 15% (in the NDS regime) of the rules that may manoeuvre jobs between servers. We also compare ours results for the JSQ policy with the NDS approximations of many modern load balancing policies such as Idle-Queue-First and Power-of-dd-choices policies which act as low information proxies for the JSQ policy. Our analysis leads us to construct new rules that have identical performance to JSQ but require less communication overhead than power-of-2-choices.Comment: Revised journal submission versio

    Constraint Programming-based Job Dispatching for Modern HPC Applications

    Get PDF
    A High-Performance Computing job dispatcher is a critical software that assigns the finite computing resources to submitted jobs. This resource assignment over time is known as the on-line job dispatching problem in HPC systems. The fact the problem is on-line means that solutions must be computed in real-time, and their required time cannot exceed some threshold to do not affect the normal system functioning. In addition, a job dispatcher must deal with a lot of uncertainty: submission times, the number of requested resources, and duration of jobs. Heuristic-based techniques have been broadly used in HPC systems, at the cost of achieving (sub-)optimal solutions in a short time. However, the scheduling and resource allocation components are separated, thus generates a decoupled decision that may cause a performance loss. Optimization-based techniques are less used for this problem, although they can significantly improve the performance of HPC systems at the expense of higher computation time. Nowadays, HPC systems are being used for modern applications, such as big data analytics and predictive model building, that employ, in general, many short jobs. However, this information is unknown at dispatching time, and job dispatchers need to process large numbers of them quickly while ensuring high Quality-of-Service (QoS) levels. Constraint Programming (CP) has been shown to be an effective approach to tackle job dispatching problems. However, state-of-the-art CP-based job dispatchers are unable to satisfy the challenges of on-line dispatching, such as generate dispatching decisions in a brief period and integrate current and past information of the housing system. Given the previous reasons, we propose CP-based dispatchers that are more suitable for HPC systems running modern applications, generating on-line dispatching decisions in a proper time and are able to make effective use of job duration predictions to improve QoS levels, especially for workloads dominated by short jobs

    Kairos: Preemptive Data Center Scheduling Without Runtime Estimates

    Get PDF
    The vast majority of data center schedulers use task runtime estimates to improve the quality of their scheduling decisions. Knowledge about runtimes allows the schedulers, among other things, to achieve better load balance and to avoid head-of-line blocking. Obtaining accurate runtime estimates is, however, far from trivial, and erroneous estimates lead to sub-optimal scheduling decisions. Techniques to mitigate the effect of inaccurate estimates have shown some success, but the fundamental problem remains. This paper presents Kairos, a novel data center scheduler that assumes no prior information on task runtimes. Kairos introduces a distributed approximation of the Least Attained Service (LAS) scheduling policy. Kairos consists of a centralized scheduler and per-node schedulers. The per-node schedulers implement LAS for tasks on their node, using preemption as necessary to avoid head-of-line blocking. The centralized scheduler distributes tasks among nodes in a manner that balances the load and imposes on each node a workload in which LAS provides favorable performance. We have implemented Kairos in YARN. We compare its performance against the YARN FIFO scheduler and Big-C, an open-source state-of-the-art YARN-based scheduler that also uses preemption. Compared to YARN FIFO, Kairos reduces the median job completion time by 73% and the 99th percentile by 30%. Compared to Big-C, the improvements are 37% for the median and 57% for the 99th percentile. We evaluate Kairos at scale by implementing it in the Eagle simulator and comparing its performance against Eagle. Kairos improves the 99th percentile of short job completion times by up to 55% for the Google trace and 85% for the Yahoo trace

    Algorithms for Scheduling Problems

    Get PDF
    This edited book presents new results in the area of algorithm development for different types of scheduling problems. In eleven chapters, algorithms for single machine problems, flow-shop and job-shop scheduling problems (including their hybrid (flexible) variants), the resource-constrained project scheduling problem, scheduling problems in complex manufacturing systems and supply chains, and workflow scheduling problems are given. The chapters address such subjects as insertion heuristics for energy-efficient scheduling, the re-scheduling of train traffic in real time, control algorithms for short-term scheduling in manufacturing systems, bi-objective optimization of tortilla production, scheduling problems with uncertain (interval) processing times, workflow scheduling for digital signal processor (DSP) clusters, and many more

    Load Balancing of Elastic Data Traffic in Heterogeneous Wireless Networks

    Get PDF
    The increasing amount of mobile data traffic has resulted in an architectural innovation in cellular networks through the introduction of heterogeneous networks. In heterogeneous networks, the deployment of macrocells is accompanied by the use of low power pico and femtocells (referred to as microcells) in hot spot areas inside the macrocell which increase the data rate per unit area. The purpose of this thesis is to study the load balancing problem of elastic data traffic in heterogeneous wireless networks. These networks consist of different types of cells with different characteristics. Individual cells are modelled as an M/G/1 - PS queueing system. This results in a multi-server queueing model consisting of a single macrocell with multiple microcells within the area. Both static and dynamic load balancing schemes are developed to balance the data flows between the macrocell and microcells so that the mean flow-level delay is minimized. Both analytical and numerical methods are used for static policies. For dynamic policies, the performance is evaluated by simulations. The results of the study indicate that all dynamic policies can significantly improve the flow-level delay performance in the system under consideration compared to the optimal static policy. The results also indicate that MJSQ and MP are best policies although MJSQ needs less state information. The performance gain of most of the dynamic polices is insensitive with respect to the flow size distribution. In addition, many interesting tests are conducted such as the effect of increasing the number of microcells and the impact of service rate difference between macrocell and microcells
    • …
    corecore