
    A Survey of Prediction and Classification Techniques in Multicore Processor Systems

    In multicore processor systems, being able to accurately predict the future provides new optimization opportunities that could not otherwise be exploited. For example, an oracle able to predict a certain application's behavior running on a smart phone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling (DVFS) modes that guarantee minimum levels of desired performance while saving energy and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continue to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction has evolved from simple forecasting to sophisticated machine-learning-based prediction and classification that learns from existing data, employs data mining, and forecasts future behavior, which novel optimization techniques can exploit across all layers of the computing stack. In this survey paper, we discuss the most popular prediction and classification techniques in the general context of computing systems, with emphasis on multicore processors. The paper is far from comprehensive, but it will help readers interested in employing prediction in the optimization of multicore processor systems.
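
    As a rough illustration of the survey's premise (not code from any surveyed paper), the sketch below trains a small decision-tree classifier on hypothetical performance-counter samples to predict the next workload phase and select a DVFS mode; the feature set, sample values, and phase-to-mode table are all illustrative assumptions.

```python
# Hedged sketch: phase prediction driving DVFS mode selection.
# Features, training data, and the phase->mode table are made up.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Each profiling sample: [instructions per cycle, cache-miss rate, memory-bound fraction]
history_X = np.array([
    [1.8, 0.02, 0.10],   # compute-bound behavior
    [0.4, 0.20, 0.75],   # memory-bound behavior
    [1.6, 0.03, 0.15],
    [0.5, 0.18, 0.70],
])
history_y = np.array(["compute", "memory", "compute", "memory"])

phase_predictor = DecisionTreeClassifier(max_depth=3).fit(history_X, history_y)

# Hypothetical policy: memory-bound phases tolerate a lower core frequency.
DVFS_MODE = {"compute": "2.8 GHz", "memory": "1.2 GHz"}

latest_counters = np.array([[0.45, 0.19, 0.72]])
phase = phase_predictor.predict(latest_counters)[0]
print(f"predicted phase: {phase} -> set frequency to {DVFS_MODE[phase]}")
```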

    Two-tiered Online Optimization of Region-wide Datacenter Resource Allocation via Deep Reinforcement Learning

    This paper addresses the need for advanced techniques for continuously allocating workloads on shared infrastructure in data centers, a problem arising from the growing popularity and scale of cloud computing. It particularly emphasizes the scarcity of research on guaranteeing reserved capacity during large-scale failures. To tackle these issues, the paper presents scalable solutions for resource management. It builds on the prior establishment of capacity reservation in cluster management systems and the two-level resource allocation problem addressed by the Resource Allowance System (RAS). Recognizing the limitations of Mixed Integer Linear Programming (MILP) for server assignment in a dynamic environment, this paper proposes the use of Deep Reinforcement Learning (DRL), which has been successful in achieving long-term optimal results for time-varying systems. Because directly applying DRL algorithms to large-scale instances with millions of decision variables is impractical, a novel two-level design utilizing a DRL-based algorithm is introduced to solve the optimal server-to-reservation assignment, taking into account fault tolerance, server movement minimization, and network affinity requirements. The paper explores the interconnection of these levels and the benefits of such an approach for achieving long-term optimal results in large-scale cloud systems. We further show in the experiments that our two-level DRL approach outperforms the MIP solver and heuristic approaches, with significantly reduced computation time compared to the MIP solver. Specifically, our two-level DRL approach performs 15% better than the MIP solver at minimizing the overall cost, and it takes only 26 seconds to execute 30 rounds of decision making, while the MIP solver needs nearly an hour.
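
    To make the two-level idea concrete, here is a minimal sketch (an assumption-laden illustration, not the paper's system): a coarse first level shortlists feasible servers, and a small stochastic policy network, trainable with a policy-gradient method such as REINFORCE, picks the assignment. The feature set and network size are invented for the example.

```python
# Hedged sketch of two-level server-to-reservation assignment.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Level 2: policy network scoring candidates from three invented features:
# [free capacity, load of the server's fault domain, estimated move cost].
score_net = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))

def assign(reservation_need, servers):
    # Level 1 (coarse): shortlist servers with enough free capacity; a real
    # system would also partition by fault domain, rack, and network affinity.
    candidates = [s for s in servers if s["free"] >= reservation_need]
    feats = torch.tensor([[s["free"], s["domain_load"], s["move_cost"]]
                          for s in candidates])
    probs = torch.softmax(score_net(feats).squeeze(-1), dim=0)
    choice = torch.multinomial(probs, 1).item()
    # The log-probability would weight a reward (fault tolerance, few moves,
    # affinity satisfied) in a policy-gradient update during training.
    return candidates[choice], torch.log(probs[choice])

servers = [
    {"id": "s1", "free": 8.0, "domain_load": 0.2, "move_cost": 0.0},
    {"id": "s2", "free": 4.0, "domain_load": 0.7, "move_cost": 1.0},
    {"id": "s3", "free": 6.0, "domain_load": 0.1, "move_cost": 0.5},
]
picked, logp = assign(2.0, servers)
print("assign reservation to", picked["id"])
```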

    A HPC Co-Scheduler with Reinforcement Learning

    Although High Performance Computing (HPC) users understand basic resource requirements such as the number of CPUs and memory limits, internal infrastructural utilization data is leveraged exclusively by cluster operators, who use it to configure batch schedulers. This task is challenging and increasingly complex due to ever-larger cluster scales and the heterogeneity of modern scientific workflows. As a result, HPC systems achieve low utilization and long job completion times (makespans). To tackle these challenges, we propose a co-scheduling algorithm based on adaptive reinforcement learning that combines application profiling with cluster monitoring. The resulting cluster scheduler matches resource utilization to application performance in a fine-grained manner (i.e., at the operating system level). Instead of relying on nominal allocations, we apply decision trees to model applications' actual resource usage; these models are used to estimate how much resource capacity from one allocation can be co-allocated to additional applications. Our algorithm learns from incorrect co-scheduling decisions, adapts to changing environment conditions, and evaluates when such changes cause resource contention that impacts quality-of-service metrics such as job slowdowns. We integrate our algorithm into an HPC resource manager that combines Slurm and Mesos for job scheduling and co-allocation, respectively. Our experimental evaluation, performed on a dedicated cluster executing a mix of four different real scientific workflows, demonstrates improvements in cluster utilization of up to 51% even in high-load scenarios, with 55% average queue makespan reductions under low loads.
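
    The decision-tree usage model can be illustrated with a minimal sketch (my reading of the idea, not the authors' code): fit a regressor to profiled resource usage, then treat the gap between the nominal allocation and the predicted actual usage as headroom available for co-scheduling. The features and numbers are invented.

```python
# Hedged sketch: estimating co-allocatable headroom from profiled usage.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Profiling samples: [input size in GB, phase id] -> CPU cores actually used.
X = np.array([[1, 0], [1, 1], [4, 0], [4, 1], [8, 0], [8, 1]])
y = np.array([2.1, 0.8, 3.9, 1.2, 6.0, 1.5])

usage_model = DecisionTreeRegressor(max_depth=3).fit(X, y)

NOMINAL_ALLOCATION = 8.0  # cores the job requested
predicted = usage_model.predict([[4, 1]])[0]
headroom = max(0.0, NOMINAL_ALLOCATION - predicted)
print(f"predicted usage {predicted:.1f} cores -> "
      f"{headroom:.1f} cores offerable to a co-scheduled job")
```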

    MOSAIC: A Multi-Objective Optimization Framework for Sustainable Datacenter Management

    In recent years, cloud service providers have been building and hosting datacenters across multiple geographical locations to provide robust services. However, the geographical distribution of datacenters puts growing pressure on both local and global environments, particularly with respect to water usage and carbon emissions. Unfortunately, efforts to reduce the environmental impact of such datacenters often increase the cost of datacenter operations. To co-optimize the energy cost, carbon emissions, and water footprint of datacenter operation from a global perspective, we propose MOSAIC, a novel framework for multi-objective sustainable datacenter management that integrates adaptive local search with a collaborative decomposition-based evolutionary algorithm to intelligently manage geographical workload distribution and datacenter operations. Our framework sustainably allocates workloads to datacenters while taking into account multiple geography- and time-based factors, including renewable energy sources, variable energy costs, power usage efficiency, carbon factors, and water intensity in energy. Our experimental results show that, compared to the best-known prior frameworks, MOSAIC achieves a 27.45x speedup and a 1.53x improvement in Pareto hypervolume while reducing the carbon footprint by up to 1.33x, the water footprint by up to 3.09x, and energy costs by up to 1.40x. In the simultaneous three-objective co-optimization scenario, MOSAIC achieves a cumulative improvement across all objectives (carbon, water, cost) of up to 4.61x compared to the state of the art.
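
    As a toy illustration of decomposition-based multi-objective optimization (not MOSAIC itself), the sketch below scalarizes the three objectives with one weight vector and picks the cheapest datacenter under it; sweeping many weight vectors, with an evolutionary search per vector, is what traces out an approximate Pareto front. All numbers and names are made up.

```python
# Hedged sketch: one scalarized subproblem of a decomposition-based
# multi-objective optimizer over (energy cost, carbon, water).
datacenters = {
    # name: (energy $/kWh, kg CO2/kWh, liters of water/kWh) -- invented values
    "us-east":  (0.09, 0.40, 1.8),
    "eu-north": (0.12, 0.05, 0.9),
    "ap-south": (0.07, 0.70, 2.5),
}

def scalarized_cost(metrics, weights):
    # A fixed weight vector turns three objectives into one scalar to minimize.
    return sum(w * m for w, m in zip(weights, metrics))

weights = (0.2, 0.5, 0.3)  # this subproblem emphasizes carbon
best = min(datacenters, key=lambda d: scalarized_cost(datacenters[d], weights))
print("route the next unit of workload to:", best)
```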

    Learning Scheduling Algorithms for Data Processing Clusters

    Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly efficient policies automatically. Our system, Decima, uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective, such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to a 2x improvement during periods of high cluster load.
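
    A minimal sketch in the spirit of Decima's DAG representation (not its actual architecture): embed each job stage from its own features together with an aggregate of its neighbors' embeddings, then sample the next stage to schedule from a softmax over the runnable stages. Feature choices, network sizes, and the toy DAG are assumptions.

```python
# Hedged sketch: scoring runnable stages of a job DAG with learned embeddings.
import torch
import torch.nn as nn

torch.manual_seed(0)
embed = nn.Linear(2, 8)   # per-stage features: [remaining work, number of tasks]
score = nn.Linear(16, 1)  # scores [own embedding || sum of downstream embeddings]

# Toy DAG: stages 1 and 2 must finish before stage 0 can start.
feats = torch.tensor([[4.0, 10.0], [2.0, 4.0], [1.0, 2.0]])
downstream = {0: [], 1: [0], 2: [0]}  # edges point toward dependent stages

h = embed(feats)  # one embedding per stage
agg = torch.stack([h[downstream[i]].sum(0) if downstream[i] else torch.zeros(8)
                   for i in range(3)])
logits = score(torch.cat([h, agg], dim=1)).squeeze(-1)

runnable = [1, 2]  # stages with no unfinished parents
probs = torch.softmax(logits[runnable], dim=0)
pick = runnable[torch.multinomial(probs, 1).item()]
print("schedule stage", pick)
```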

    Edge Intelligence Simulator: a platform for simulating intelligent edge orchestration solutions

    Abstract. To support the stringent requirements of future intelligent and interactive applications, intelligence needs to become an essential part of resource management in the edge environment. Developing intelligent orchestration solutions is a challenging and arduous task, in which evaluating and comparing proposed solutions is a focal point. Simulation is commonly used for such evaluation and comparison; however, no openly available simulators currently focus specifically on supporting research on intelligent edge orchestration methods. This thesis presents a simulation platform called Edge Intelligence Simulator (EISim), whose purpose is to facilitate research on intelligent edge orchestration solutions. In its current form, the platform supports simulating deep reinforcement learning based solutions and different orchestration control topologies in scenarios related to task offloading and resource pricing on the edge. The platform also includes additional tools for creating simulation environments, running simulations for agent training and evaluation, and plotting results. The thesis gives a comprehensive overview of the state of the art in edge and fog simulation, orchestration, offloading, and resource pricing, which provides a basis for the design of EISim. The methods and tools that form the foundation of the current EISim implementation are also presented, along with a detailed description of the EISim architecture, default implementations, use, and additional tools. Finally, EISim with its default implementations is validated and evaluated through a large-scale simulation study with 24 simulation scenarios. The results of the study verify the end-to-end performance of EISim and show its capability to produce sensible results. They also illustrate how EISim can help researchers control and monitor the training of intelligent agents, as well as evaluate solutions against different control topologies.
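
    For a flavor of what a simulator like EISim mediates, the sketch below runs a toy agent-environment loop for binary task offloading; the environment dynamics, reward, and learner (a bandit-style tabular update rather than deep RL) are all simplifying assumptions and not EISim's API.

```python
# Hedged sketch: a toy offloading agent interacting with a simulated edge.
import random

random.seed(0)

def env_step(action, edge_queue):
    # action 0 = execute locally, 1 = offload to the edge server
    if action == 1:
        delay = 0.2 + 0.1 * edge_queue   # network latency + queueing delay
        edge_queue += 1                  # the task joins the edge queue
    else:
        delay = 0.6                      # slower local execution, no queueing
    if edge_queue and random.random() < 0.7:
        edge_queue -= 1                  # edge server finishes a task
    return -delay, edge_queue            # reward is negative delay

q_values = {0: 0.0, 1: 0.0}              # single-state action values
edge_queue, epsilon, alpha = 0, 0.1, 0.05
for _ in range(1000):
    explore = random.random() < epsilon
    a = random.choice([0, 1]) if explore else max(q_values, key=q_values.get)
    reward, edge_queue = env_step(a, edge_queue)
    q_values[a] += alpha * (reward - q_values[a])

print("learned preference:", "offload" if q_values[1] > q_values[0] else "local")
```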