3,119 research outputs found

    Resource Management and Scheduling for Big Data Applications in Cloud Computing Environments

    This chapter presents the software architectures of big data processing platforms. It provides in-depth knowledge of the resource management techniques involved in deploying big data processing systems on cloud environments. It starts from the basics and gradually introduces the core components of resource management, which we have divided into multiple layers. It covers state-of-the-art practices and research in SLA-based resource management, with a specific focus on job scheduling mechanisms.
    Comment: 27 pages, 9 figures

    Load Balancing in Distributed Cloud Computing: A Reinforcement Learning Algorithms in Heterogeneous Environment

    Load balancing in the cloud is an important aspect that plays a vital role in sharing load between different types of resources, such as virtual machines hosted on servers, storage in the form of hard drives, and the servers themselves. Reinforcement learning approaches can be adopted in cloud computing to achieve quality-of-service goals such as minimized cost and response time, increased throughput, fault tolerance, and utilization of all available resources in the network, thus increasing system performance. Reinforcement learning based approaches achieve effective resource utilization by selecting the most suitable processor for task execution with minimum makespan. Earlier related work on load sharing includes only a limited number of reinforcement learning based approaches. This paper therefore focuses on the importance of RL-based approaches for achieving balanced load in distributed cloud computing. A reinforcement learning framework is proposed and implemented for executing tasks in heterogeneous environments, in particular Least Load Balancing (LLB) and Booster Reinforcement Controller (BRC) load balancing. With the help of reinforcement learning approaches, an optimal result is achieved for load sharing and task allocation. In this RL-based framework, processor workload is taken as the input. The results of the proposed RL-based approaches are evaluated for cost and makespan and compared with existing load balancing techniques for task execution and resource utilization.
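As a rough illustration of the kind of approach this abstract describes (not the paper's actual LLB or BRC algorithms; all class names, rewards, and hyperparameters here are invented), a tabular Q-learning agent can learn to route tasks toward lightly loaded processors:

```python
import random

# Illustrative sketch only: a stateless Q-learning balancer that assigns each
# incoming task to one of several processors. The reward penalizes the load of
# the chosen processor, so heavily loaded processors become less attractive.

class QLoadBalancer:
    def __init__(self, n_processors, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.n = n_processors
        self.q = [0.0] * n_processors      # one learned value per processor
        self.load = [0.0] * n_processors   # accumulated workload per processor
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self):
        if random.random() < self.epsilon:                       # explore
            return random.randrange(self.n)
        return max(range(self.n), key=lambda i: self.q[i])       # exploit

    def assign(self, task_cost):
        i = self.choose()
        self.load[i] += task_cost
        reward = -self.load[i]             # lighter processors yield higher reward
        best_next = max(self.q)
        self.q[i] += self.alpha * (reward + self.gamma * best_next - self.q[i])
        return i

random.seed(0)
lb = QLoadBalancer(n_processors=4)
for _ in range(200):
    lb.assign(task_cost=random.uniform(0.5, 1.5))
print([round(l, 1) for l in lb.load])  # per-processor loads after 200 tasks
```

The makespan proxy here is the maximum entry of `load`; the negative-load reward steers the argmax in `choose` away from busy processors, producing a rough balance without any central queue.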

    Cooperative scheduling and load balancing techniques in fog and edge computing

    Fog and edge computing are two models that reached maturity in the last decade. Today they are solid concepts, developed by a large body of literature. Supported by the development of technologies such as 5G, they can now be considered de facto standards for building low and ultra-low latency applications, privacy-oriented solutions, Industry 4.0, and smart city infrastructures. The common trait of fog and edge computing environments is their inherently distributed and heterogeneous nature, in which multiple (fog or edge) nodes interact with each other with the essential purpose of pre-processing data gathered by the countless sensors to which they are connected, even running significant ML models on specialized processors (TPUs). However, nodes are often placed in a geographic domain, such as a smart city, and the traffic dynamics during the day may cause some nodes to be overwhelmed by requests while others become completely idle. To achieve optimal usage of the system and to guarantee the best possible QoS for all users connected to the fog or edge nodes, load balancing and scheduling algorithms must be designed. In particular, a reasonable solution is to enable the nodes to cooperate. This capability is the main objective of this thesis: the design of fully distributed algorithms and solutions whose purpose is to balance the load across all the nodes, while also following, where possible, QoS requirements in terms of latency, or imposing constraints on power consumption when the nodes are powered by green energy sources. Unfortunately, when a central orchestrator is missing, a crucial element that makes the design of such algorithms difficult is that nodes need to know the state of the others in order to make the best possible scheduling decision.
However, the state cannot be retrieved without introducing further latency while serving a request, and the retrieved state information is always stale, so decisions always rely on imprecise data. In this thesis, the problem is circumvented in two main ways. The first considers randomised algorithms that avoid probing all of the neighbour nodes in favour of at most two nodes picked at random; this is proven to bring an exponential improvement in performance with respect to probing a single node. The second approach considers reinforcement learning as a technique for inferring the state of the other nodes from the reward received by the agents when requests are forwarded. Moreover, the thesis also examines the energy consumption of edge devices. In particular, it analyses a green edge computing scenario, in which devices are powered only by photovoltaic panels, and a mobile offloading scenario targeting ML image inference applications. Lastly, it presents a series of infrastructural studies that lay the foundations for implementing the proposed algorithms on real devices, in particular Single Board Computers (SBCs). A structural scheme of a testbed of Raspberry Pi boards is presented, together with a fully-fledged framework called ``P2PFaaS'' that allows the implementation of load balancing and scheduling algorithms based on the Function-as-a-Service (FaaS) paradigm.
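The randomised strategy mentioned above, probing only two nodes picked at random and forwarding to the less loaded one, is the classic "power of two choices" scheme. A minimal simulation (function and variable names are illustrative, not from the thesis) shows why two probes beat one:

```python
import random

def simulate(n_nodes, n_tasks, d, seed=42):
    """Place n_tasks on n_nodes; each task probes d random nodes and
    joins the least loaded of them. Returns the maximum load reached."""
    rng = random.Random(seed)
    load = [0] * n_nodes
    for _ in range(n_tasks):
        probes = [rng.randrange(n_nodes) for _ in range(d)]
        target = min(probes, key=lambda i: load[i])
        load[target] += 1
    return max(load)

one = simulate(100, 10_000, d=1)   # single random probe
two = simulate(100, 10_000, d=2)   # two random probes, pick the lighter
print(one, two)  # the d=2 maximum load is noticeably lower
```

With one probe the maximum load exceeds the mean by roughly a square-root term, while with two probes the excess drops to a doubly logarithmic term: the exponential improvement the abstract refers to, obtained without querying every neighbour.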

    Intelligent Resource Scheduling at Scale: a Machine Learning Perspective

    Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The heterogeneity of workload and server characteristics exhibited in cloud-scale or Internet-scale systems adds further complexity and new challenges to the problem. Compared with existing solutions based on ad-hoc heuristics, machine learning (ML) has the potential to further improve the efficiency of resource management in large-scale systems. In this paper we describe and discuss how ML can be used to automatically understand both workloads and environments, and to help cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing applications' QoS, and mitigating tail stragglers. We introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study on performance-centric node classification and straggler mitigation. We believe that an ML-based method will help to achieve architectural optimization and efficiency improvement.
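The idea of performance-centric node classification for straggler mitigation can be sketched very simply (this is an invented illustration, not the paper's method; the threshold, function names, and data are all hypothetical): label nodes whose historical runtimes are far above the cluster median as likely stragglers, and keep latency-sensitive work off them.

```python
import statistics

def classify_nodes(history):
    """history: {node: [task runtimes]}. A node is labelled 'slow' if its
    mean runtime exceeds the cluster-wide median of node means by 50%."""
    means = {n: statistics.mean(r) for n, r in history.items()}
    median = statistics.median(means.values())
    return {n: ("slow" if m > 1.5 * median else "fast") for n, m in means.items()}

history = {
    "node-a": [1.0, 1.1, 0.9],
    "node-b": [1.2, 1.0, 1.1],
    "node-c": [3.0, 2.8, 3.3],   # a consistent straggler
}
labels = classify_nodes(history)
fast_nodes = [n for n, label in labels.items() if label == "fast"]
print(labels)       # node-c is classified as slow
print(fast_nodes)   # candidates for latency-sensitive placement
```

A real ML-based system would replace the fixed 1.5x threshold with a learned classifier over richer node features, but the scheduling decision it feeds (avoid or speculatively duplicate work on "slow" nodes) has the same shape.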

    Task Scheduler for Heterogeneous Data Centers based on Deep Reinforcement Learning

    ABSTRACT: The computational capacity needed by both companies and research groups all over the world has increased greatly in recent years. These needs are due to the increasing amount of data to analyse, the training of machine learning models, high performance scientific applications, and many other types of jobs that are too costly for these groups' own infrastructure. This means that data centers receive more and more jobs, and thus need more and more resources to accommodate all of them. Scheduling all of these tasks on the heterogeneous resources of a data center is computationally too complex to be performed optimally, as it is an NP-complete problem. For this reason, task scheduling is still performed by heuristic algorithms. However, these algorithms are beginning to fall short of managing the great heterogeneity of both jobs and resources efficiently. This fact, together with the desire for a method that is adaptable to different objectives, such as minimizing energy consumption or job slowdown, opens the door to a machine learning approach. The objective of this work is to design and implement an intelligent agent that is able to make use of all the available information from both jobs and resources to take the best scheduling decisions depending on the selected objective.
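One aspect of the abstract, an agent adaptable to different objectives such as energy consumption or job slowdown, can be illustrated with a tiny objective-pluggable placement function. This is a hand-written heuristic sketch, not the deep reinforcement learning agent the work describes, and every name, field, and number in it is invented:

```python
# Illustrative sketch: the same scheduler targets different objectives
# simply by swapping the cost function it minimises.

def schedule(job, machines, objective):
    """Pick the feasible machine minimising the chosen objective for this job."""
    feasible = [m for m in machines if m["free_cores"] >= job["cores"]]
    return min(feasible, key=lambda m: objective(job, m)) if feasible else None

def energy_cost(job, m):
    # rough energy estimate: runtime x per-core power x cores used
    return job["runtime"] * m["watts_per_core"] * job["cores"]

def slowdown_cost(job, m):
    # slowdown = (wait + runtime) / runtime
    return (m["queue_time"] + job["runtime"]) / job["runtime"]

machines = [
    {"name": "m1", "free_cores": 8, "watts_per_core": 20, "queue_time": 5.0},
    {"name": "m2", "free_cores": 4, "watts_per_core": 8,  "queue_time": 30.0},
]
job = {"cores": 4, "runtime": 10.0}
print(schedule(job, machines, energy_cost)["name"])    # m2: lower energy
print(schedule(job, machines, slowdown_cost)["name"])  # m1: lower slowdown
```

In the learned setting, the hand-written cost functions above would be replaced by a reward signal, so that retargeting the agent to a new objective means retraining on a different reward rather than rewriting the scheduler.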