
    Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) Krakow, Poland

    Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015

    Time Based Traffic Policing and Shaping Algorithms on Campus Network Internet Traffic

    This paper presents the development of traffic policing and shaping algorithms for bandwidth management, providing Quality of Service (QoS) in a campus network. The campus network is connected to the internet Wide Area Network through a 16 Mbps Virtual Private Network line. Both inbound and outbound real internet traffic were captured and analyzed. A Goodness-of-Fit (GoF) test with the Anderson–Darling (AD) statistic was applied to the real traffic to identify the best-fitting distribution. The best-fitted Cumulative Distribution Function (CDF) model was used to analyze and characterize the data and its parameters. Based on the identified parameters, a new time-based policing and shaping algorithm was developed and simulated. The policing process drops burst traffic, while the shaping process delays traffic to the next transmission time. A mathematical model of the burst-control algorithm over a selected time window was derived. In the algorithms, the inbound traffic burst threshold was policed at 1200 MByte (MB), while the outbound threshold was policed at 680 MB. The algorithms were varied in relation to the identified Weibull parameters to reduce bursts. The analysis shows that a higher shape parameter corresponds to lower burstiness, so network throughput can be controlled. This research presents a new method for time-based bandwidth management and enhances network performance by identifying new traffic parameters for traffic modeling in campus networks.
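
    The policing and shaping behaviours described here reduce to simple per-window accounting, as in the minimal Python sketch below. The window semantics, example traffic figures, and function names are assumptions for illustration; only the 1200 MB inbound threshold comes from the abstract.

```python
# A minimal sketch, assuming per-window byte accounting: traffic is measured
# in fixed time windows and compared against a threshold. Window sizes and
# the example figures are invented; only the 1200 MB inbound threshold comes
# from the abstract.

def police(windows_mb, threshold_mb):
    """Policing: traffic above the per-window threshold is dropped."""
    return [min(w, threshold_mb) for w in windows_mb]

def shape(windows_mb, threshold_mb):
    """Shaping: excess traffic is delayed (carried over) to later windows."""
    sent, backlog = [], 0.0
    for w in windows_mb:
        offered = w + backlog
        s = min(offered, threshold_mb)
        backlog = offered - s
        sent.append(s)
    return sent, backlog  # backlog remaining after the trace ends

if __name__ == "__main__":
    inbound = [900, 1500, 1300, 400]      # MB per window (hypothetical trace)
    print(police(inbound, 1200))          # [900, 1200, 1200, 400]
    print(shape(inbound, 1200))           # ([900, 1200, 1200, 800], 0.0)
```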

    A Framework for Approximate Optimization of BoT Application Deployment in Hybrid Cloud Environment

    We adopt a systematic approach to investigating the efficiency of near-optimal deployment of large-scale, CPU-intensive Bag-of-Tasks (BoT) applications running on cloud resources with non-proportional cost-to-performance ratios. Our analytical solutions apply whether the running time of the given application is known or unknown, and they optimize user utility by choosing the most desirable tradeoff between makespan and total incurred expense. We propose a scheme that provides a near-optimal deployment of a BoT application with respect to user preferences: the user is presented with a set of Pareto-optimal solutions and may select one of the possible scheduling points based on her internal utility function. Our framework also copes with uncertainty in task execution times using two methods. First, an estimation method based on Monte Carlo sampling, called the AA algorithm, uses the minimum possible number of samples to predict the average task running time. Second, assuming access to code analysis, code profiling, or estimation tools, a hybrid method evaluates the accuracy of each estimation tool over certain time intervals to improve resource allocation decisions. We propose approximate deployment strategies that run on a hybrid cloud. In essence, the proposed strategies first determine either an estimated or an exact optimal schema based on information provided by the user and on environmental parameters. Then, we exploit dynamic methods to assign tasks to resources so as to approach the optimal schema as closely as possible, using two methods: a fast yet simple method based on the First Fit Decreasing algorithm, and a more complex approach based on an approximate solution of the problem transformed into a subset-sum problem. Extensive experimental results on a hybrid cloud platform confirm that our framework can deliver a near-optimal solution respecting the user's utility function.
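
    As a rough illustration of two ideas named above, the sketch below computes a Pareto front over (makespan, cost) schedule candidates and packs task running times onto instances with First Fit Decreasing. All numbers, the per-instance capacity, and the function names are hypothetical; this is not the authors' implementation.

```python
# Illustrative sketch, not the framework itself: a Pareto filter over
# (makespan, cost) candidates, and First Fit Decreasing packing of task
# running times onto instances. All values are made up.

def pareto_front(points):
    """Keep (makespan, cost) points not dominated in both objectives."""
    front = []
    for p in sorted(points):                    # sort by makespan, then cost
        if not front or p[1] < front[-1][1]:    # strictly better cost
            front.append(p)
    return front

def first_fit_decreasing(task_times, capacity):
    """Pack task running times onto instances with a per-instance budget."""
    bins = []
    for t in sorted(task_times, reverse=True):
        for b in bins:
            if sum(b) + t <= capacity:
                b.append(t)
                break
        else:
            bins.append([t])                    # open a new instance
    return bins

if __name__ == "__main__":
    candidates = [(10, 9.0), (12, 6.5), (12, 7.0), (20, 3.0)]
    print(pareto_front(candidates))             # [(10, 9.0), (12, 6.5), (20, 3.0)]
    print(first_fit_decreasing([7, 5, 4, 3, 2], capacity=10))
    # [[7, 3], [5, 4], [2]] -> three instances
```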

    Anti load-balancing for energy-aware distributed scheduling of virtual machines

    The proliferation of cloud computing has resulted in the establishment of large-scale data centers around the world, containing thousands of compute nodes. However, these data centers consume huge amounts of energy, estimated at more than 1.5% of global electricity use and expected to grow further. A problem usually studied in distributed systems is how to distribute load evenly. But when the goal is to reduce energy consumption, this type of algorithm can leave machines largely under-loaded and therefore consuming energy unnecessarily. This thesis presents novel techniques, algorithms, and software for distributed dynamic consolidation of Virtual Machines (VMs) in the cloud. Its main objective is to provide energy-aware scheduling strategies for cloud computing, using both centralized and decentralized approaches; the methodological contributions are presented along these two axes. Our approach reduces a data center's total energy consumption by controlling the overall energy consumed by cloud applications while ensuring their service level agreements. Energy consumption is reduced by dynamically deactivating and reactivating physical nodes to meet the current resource demand. The key contributions are: - First, an energy-aware cloud scheduling technique called Anti Load-Balancing, which concentrates the load on a minimum number of servers. The goal is to turn off the freed machines and thereby minimize the energy consumption of the system. - Second, a centralized algorithm that associates a credit value with each node. The credit of a node depends on its affinity to its jobs, its current workload, and its communication behavior. Energy savings are achieved by continuous consolidation of VMs according to current resource utilization, the virtual network topologies established between VMs, and the thermal state of compute nodes. Experimental results, obtained with EnerSim, a simulator extending CloudSim, show that both cloud application energy consumption and energy efficiency are improved. - Third, a decentralized dynamic scheduling approach entitled Cooperative scheduling Anti load-balancing Algorithm for cloud, which allows cooperation between different sites. To validate this algorithm, we extended the MaGateSim simulator. From an extensive experimental evaluation with a real workload dataset, we conclude that both the centralized and the decentralized algorithms can reduce the energy consumed by data centers.
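
    The anti load-balancing idea can be illustrated with a toy consolidation pass: instead of spreading VMs evenly, pack them onto as few nodes as possible and power off the nodes that empty. The sketch below is a greedy approximation under assumed capacities and loads; the thesis's algorithms additionally weigh node credits, virtual network topology, and thermal state.

```python
# A toy sketch of anti load-balancing, assuming only node capacities and VM
# loads are known. Greedy passes try to empty the least-loaded node by
# migrating its VMs onto the fullest nodes that still fit, then power off
# any node that empties. Names and numbers are hypothetical.

def consolidate(nodes, capacity):
    """nodes: {name: [vm_load, ...]}. Returns (active nodes, powered-off)."""
    active = {n: list(vms) for n, vms in nodes.items() if vms}
    powered_off = [n for n in nodes if not nodes[n]]
    changed = True
    while changed and len(active) > 1:
        changed = False
        src = min(active, key=lambda n: sum(active[n]))   # least-loaded node
        for vm in sorted(active[src], reverse=True):
            fits = [n for n in active
                    if n != src and sum(active[n]) + vm <= capacity]
            if not fits:
                break
            dst = max(fits, key=lambda n: sum(active[n])) # pack tightly
            active[src].remove(vm)
            active[dst].append(vm)
        if not active[src]:                               # node freed: turn off
            powered_off.append(src)
            del active[src]
            changed = True
    return active, powered_off

if __name__ == "__main__":
    nodes = {"A": [30, 20], "B": [25], "C": [10, 5]}
    print(consolidate(nodes, capacity=100))
    # ({'A': [30, 20, 10, 5, 25]}, ['C', 'B']) -> two nodes powered off
```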

    Straggler-Resilient Distributed Computing

    The number and scale of distributed computing systems being built have increased significantly in recent years, primarily because: i) our computing needs are increasing at a much higher rate than computers are becoming faster, so we need to use more of them to meet demand, and ii) systems that are fundamentally distributed, e.g., because the components that make them up are geographically distributed, are becoming increasingly prevalent. This paradigm shift is the source of many engineering challenges. Among them is the straggler problem, caused by latency variations in distributed systems, where faster nodes are held up by slower ones. The straggler problem can significantly impair the effectiveness of distributed systems: a single node experiencing a transient outage (e.g., due to being overloaded) can lock up an entire system. In this thesis, we consider schemes for making a range of computations resilient against such stragglers, thus allowing a distributed system to proceed in spite of some nodes failing to respond on time. The schemes we propose are tailored to particular computations: distributed matrix-vector multiplication, which is a fundamental operation in many computing applications; distributed machine learning, in the form of a straggler-resilient first-order optimization method; and distributed tracking of a time-varying process (e.g., tracking the location of a set of vehicles for a collision avoidance system). The proposed schemes rely on exploiting redundancy, either introduced as part of the scheme or existing naturally in the underlying problem, to compensate for missing results; i.e., they are a form of forward error correction for computations. Further, for one of the proposed schemes we exploit redundancy to also improve the effectiveness of multicasting, thus reducing the amount of data that needs to be communicated over the network; such inter-node communication, like the straggler problem, can significantly limit the effectiveness of distributed systems. For the schemes we propose, we show significant improvements in latency and reliability compared to previous schemes.
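
    A standard way to realise the forward-error-correction view of computation mentioned above is MDS-style coded matrix-vector multiplication: split A into k row blocks, encode them into n coded blocks, and recover A·x from any k worker results, tolerating up to n − k stragglers. The numpy sketch below shows this generic construction; it is illustrative and not the specific schemes proposed in the thesis.

```python
# Generic coded matrix-vector multiplication: A is split into k row blocks,
# mixed into n coded blocks with a real Vandermonde code (any k rows of the
# generator are invertible), and A @ x is recovered from ANY k of the n
# worker results. Not the thesis's schemes; an illustrative construction.

import numpy as np

def encode(A, n, k):
    """Split A into k row blocks and mix them into n coded blocks."""
    blocks = np.split(A, k)                       # assumes k divides A's rows
    G = np.vander(np.arange(1, n + 1), k, increasing=True)
    return [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)], G

def decode(results, G, k):
    """results: {worker_index: coded_block @ x} from the first k responders."""
    idx = sorted(results)[:k]
    coded = np.stack([results[i] for i in idx])   # k coded partial products
    parts = np.linalg.solve(G[idx, :], coded)     # recover the k true blocks
    return np.concatenate(parts)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, x = rng.standard_normal((6, 4)), rng.standard_normal(4)
    coded_blocks, G = encode(A, n=5, k=3)
    # pretend workers 1 and 3 straggle: only 0, 2, 4 respond
    results = {i: coded_blocks[i] @ x for i in (0, 2, 4)}
    print(np.allclose(decode(results, G, k=3), A @ x))   # True
```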

    An Energy-Efficient Multi-Cloud Service Broker for Green Cloud Computing Environment

    The heavy demand on cloud computing resources has led to substantial growth in the energy consumed both by the data transferred between cloud computing parties (i.e., providers, datacentres, users, and services) and by datacentre services, due to the increasing loads on these services. On the one hand, routing and transferring large amounts of data to a datacentre located far from the user's geographical location consumes more energy than processing and storing the same data on the cloud datacentre. On the other hand, when a cloud user submits a job (in the form of a set of functional and non-functional requirements) to a cloud service provider (i.e., a datacentre) via a cloud services broker, the broker becomes responsible for finding the service that best fits the user's request, based mainly on the user's requirements and Quality of Service (QoS) attributes (e.g., response time, latency). Hence, it becomes essential to locate both the lowest-energy route between the user and the designated datacentre, and the minimum possible number of most energy-efficient services that satisfy the user's request. Indeed, finding the most energy-efficient route to the datacentre and the most energy-efficient service(s) for the user are the biggest challenges in a multi-cloud broker environment. This thesis presents and evaluates a novel multi-cloud broker solution that contains three innovative models and their associated algorithms. The first finds the most energy-efficient route, among multiple possible routes, between the user and the cloud datacentre. The second finds the lowest possible number of most energy-efficient services, minimising data exchange using a bin-packing approach. The third creates an energy-aware composition plan by integrating the most energy-efficient services to fulfil the user's requirements. The results demonstrated favourable performance of these models in selecting the most energy-efficient route and reaching the least possible number of services for an optimal, energy-efficient composition.
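
    The first model's goal, finding the lowest-energy route among multiple candidates, can be framed as a shortest-path problem over per-link energy costs. The sketch below uses Dijkstra's algorithm on a hypothetical graph; the node names and energy weights are invented for illustration, and the thesis's broker additionally accounts for QoS constraints.

```python
# A minimal sketch, assuming per-link energy costs (e.g., joules per MB
# transferred) are known: Dijkstra's algorithm then yields the lowest-energy
# route between user and datacentre. Graph and weights are hypothetical.

import heapq

def lowest_energy_route(graph, src, dst):
    """Dijkstra over energy costs. graph: {node: {neighbor: cost}}."""
    pq, best, prev = [(0.0, src)], {src: 0.0}, {}
    while pq:
        cost, node = heapq.heappop(pq)
        if node == dst:                        # first pop of dst is optimal
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue                           # stale queue entry
        for nbr, w in graph[node].items():
            c = cost + w
            if c < best.get(nbr, float("inf")):
                best[nbr], prev[nbr] = c, node
                heapq.heappush(pq, (c, nbr))
    return float("inf"), []

if __name__ == "__main__":
    graph = {"user": {"r1": 2.0, "r2": 5.0},
             "r1": {"dc": 6.0}, "r2": {"dc": 1.0}, "dc": {}}
    print(lowest_energy_route(graph, "user", "dc"))
    # (6.0, ['user', 'r2', 'dc'])
```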

    Operating policies for energy efficient large scale computing

    Energy costs now dominate IT infrastructure total cost of ownership, with datacentre operators predicted to spend more on energy than on hardware infrastructure over the next five years. With Western European datacentre power consumption estimated at 56 TWh/year in 2007 and projected to double by 2020, improving the energy efficiency of IT operations is imperative. The issue is further compounded by social and political factors and by strict environmental legislation governing organisations. One example of large IT systems is high-throughput cycle-stealing distributed systems such as HTCondor and BOINC, which allow organisations to leverage spare capacity on existing infrastructure to undertake valuable computation. As a consequence of increased scrutiny of the energy impact of these systems, aggressive power management policies are often employed to reduce the energy impact of institutional clusters, but in doing so these policies severely restrict the computational resources available to high-throughput systems. These policies are often configured to quickly transition servers and end-user cluster machines into low-power states after only short idle periods, further compounding the issue of reliability. In this thesis, we evaluate operating policies for energy efficiency in large-scale computing environments by means of trace-driven discrete event simulation, leveraging real-world workload traces collected within Newcastle University. The major contributions of this thesis are: i) evaluation of novel energy-efficient management policies for a decentralised peer-to-peer (P2P) BitTorrent environment; ii) a novel simulation environment for evaluating the energy efficiency of large-scale high-throughput computing systems, together with a generalisable model of energy consumption in such systems; iii) proposal and evaluation of resource allocation strategies for reducing energy consumption in high-throughput computing systems under a real workload; iv) proposal and evaluation, for a real workload, of mechanisms to reduce wasted task execution within high-throughput computing systems and thereby reduce energy consumption; and v) evaluation of the impact of fault tolerance mechanisms on energy consumption.
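
    To give a flavour of the policies under study, the toy model below charges a machine busy, idle, or sleep power and transitions it to sleep after a configurable idle timeout, so the energy effect of aggressive versus lax timeouts can be compared on a trace. The power figures and trace are hypothetical; the thesis uses real Newcastle University workload traces in a full discrete event simulator.

```python
# A toy idle-timeout policy model, assuming made-up power draws (watts) and
# a made-up trace of non-overlapping busy intervals. The thesis evaluates
# such policies with trace-driven discrete event simulation on real traces.

def energy_kwh(busy_intervals, horizon_s, timeout_s,
               p_busy=150.0, p_idle=60.0, p_sleep=5.0):
    """Energy over [0, horizon_s] for sorted, non-overlapping (start, end)
    busy intervals in seconds, returned in kWh."""
    joules, t = 0.0, 0.0
    for start, end in sorted(busy_intervals):
        idle = start - t
        joules += min(idle, timeout_s) * p_idle          # idle until timeout
        joules += max(idle - timeout_s, 0.0) * p_sleep   # then asleep
        joules += (end - start) * p_busy
        t = end
    idle = horizon_s - t                                 # trailing idle tail
    joules += min(idle, timeout_s) * p_idle + max(idle - timeout_s, 0.0) * p_sleep
    return joules / 3.6e6                                # joules -> kWh

if __name__ == "__main__":
    day, trace = 86_400, [(0, 7_200), (36_000, 43_200)]  # two 2-hour jobs
    for timeout in (60, 600, 3_600):                     # aggressive -> lax
        print(timeout, round(energy_kwh(trace, day, timeout), 2))
```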

    Efficient Learning Machines

    Computer science