750,158 research outputs found

    A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning

    Full text link
    Automatic decision-making approaches, such as reinforcement learning (RL), have been applied to (partially) solve the resource allocation problem adaptively in the cloud computing system. However, a complete cloud resource allocation framework exhibits high dimensions in state and action spaces, which prohibit the usefulness of traditional RL techniques. In addition, high power consumption has become one of the critical concerns in design and control of cloud computing systems, which degrades system reliability and increases cooling cost. An effective dynamic power management (DPM) policy should minimize power consumption while maintaining performance degradation within an acceptable level. Thus, a joint virtual machine (VM) resource allocation and power management framework is critical to the overall cloud computing system. Moreover, novel solution framework is necessary to address the even higher dimensions in state and action spaces. In this paper, we propose a novel hierarchical framework for solving the overall resource allocation and power management problem in cloud computing systems. The proposed hierarchical framework comprises a global tier for VM resource allocation to the servers and a local tier for distributed power management of local servers. The emerging deep reinforcement learning (DRL) technique, which can deal with complicated control problems with large state space, is adopted to solve the global tier problem. Furthermore, an autoencoder and a novel weight sharing structure are adopted to handle the high-dimensional state space and accelerate the convergence speed. On the other hand, the local tier of distributed server power managements comprises an LSTM based workload predictor and a model-free RL based power manager, operating in a distributed manner.Comment: accepted by 37th IEEE International Conference on Distributed Computing (ICDCS 2017

    Autonomic Cloud Computing: Open Challenges and Architectural Elements

    Full text link
    As Clouds are complex, large-scale, and heterogeneous distributed systems, management of their resources is a challenging task. They need automated and integrated intelligent strategies for provisioning of resources to offer services that are secure, reliable, and cost-efficient. Hence, effective management of services becomes fundamental in software platforms that constitute the fabric of computing Clouds. In this direction, this paper identifies open issues in autonomic resource provisioning and presents innovative management techniques for supporting SaaS applications hosted on Clouds. We present a conceptual architecture and early results evidencing the benefits of autonomic management of Clouds.Comment: 8 pages, 6 figures, conference keynote pape

    Design and evaluation of a scalable hierarchical application component placement algorithm for cloud resource allocation

    Get PDF
    In the context of cloud systems, mapping application components to a set of physical servers and assigning resources to those components is challenging. For large-scale clouds, traditional resource allocation systems, which rely on a centralized management paradigm, become ineffective and inefficient. Therefore, there is an essential need of providing new management solutions that scale well with the size of large cloud systems. In this paper a distributed and hierarchical component placement algorithm is presented, evaluated and compared to a centralized algorithm. Each application is represented as a collection of interacting services, and multiple service types with differing placement characteristics are considered. Our evaluations show that the proposed algorithm is at least 84.65 times faster and offers better scalability compared with a central approach, while the percentage of servers used and fully placed applications remains close to that of the centralized algorithm

    Riding out of the storm: How to deal with the complexity of grid and cloud management

    Get PDF
    Over the last decade, Grid computing paved the way for a new level of large scale distributed systems. This infrastructure made it possible to securely and reliably take advantage of widely separated computational resources that are part of several different organizations. Resources can be incorporated to the Grid, building a theoretical virtual supercomputer. In time, cloud computing emerged as a new type of large scale distributed system, inheriting and expanding the expertise and knowledge that have been obtained so far. Some of the main characteristics of Grids naturally evolved into clouds, others were modified and adapted and others were simply discarded or postponed. Regardless of these technical specifics, both Grids and clouds together can be considered as one of the most important advances in large scale distributed computing of the past ten years; however, this step in distributed computing has came along with a completely new level of complexity. Grid and cloud management mechanisms play a key role, and correct analysis and understanding of the system behavior are needed. Large scale distributed systems must be able to self-manage, incorporating autonomic features capable of controlling and optimizing all resources and services. Traditional distributed computing management mechanisms analyze each resource separately and adjust specific parameters of each one of them. When trying to adapt the same procedures to Grid and cloud computing, the vast complexity of these systems can make this task extremely complicated. But large scale distributed systems complexity could only be a matter of perspective. It could be possible to understand the Grid or cloud behavior as a single entity, instead of a set of resources. This abstraction could provide a different understanding of the system, describing large scale behavior and global events that probably would not be detected analyzing each resource separately. In this work we define a theoretical framework that combines both ideas, multiple resources and single entity, to develop large scale distributed systems management techniques aimed at system performance optimization, increased dependability and Quality of Service (QoS). The resulting synergy could be the key 350 J. Montes et al. to address the most important difficulties of Grid and cloud management

    ACTiCLOUD: Enabling the Next Generation of Cloud Applications

    Get PDF
    Despite their proliferation as a dominant computing paradigm, cloud computing systems lack effective mechanisms to manage their vast amounts of resources efficiently. Resources are stranded and fragmented, ultimately limiting cloud systems' applicability to large classes of critical applications that pose non-moderate resource demands. Eliminating current technological barriers of actual fluidity and scalability of cloud resources is essential to strengthen cloud computing's role as a critical cornerstone for the digital economy. ACTiCLOUD proposes a novel cloud architecture that breaks the existing scale-up and share-nothing barriers and enables the holistic management of physical resources both at the local cloud site and at distributed levels. Specifically, it makes advancements in the cloud resource management stacks by extending state-of-the-art hypervisor technology beyond the physical server boundary and localized cloud management system to provide a holistic resource management within a rack, within a site, and across distributed cloud sites. On top of this, ACTiCLOUD will adapt and optimize system libraries and runtimes (e.g., JVM) as well as ACTiCLOUD-native applications, which are extremely demanding, and critical classes of applications that currently face severe difficulties in matching their resource requirements to state-of-the-art cloud offerings

    DIANA Scheduling Hierarchies for Optimizing Bulk Job Scheduling

    Get PDF
    The use of meta-schedulers for resource management in large-scale distributed systems often leads to a hierarchy of schedulers. In this paper, we discuss why existing meta-scheduling hierarchies are sometimes not sufficient for Grid systems due to their inability to re-organise jobs already scheduled locally. Such a job re-organisation is required to adapt to evolving loads which are common in heavily used Grid infrastructures. We propose a peer-to-peer scheduling model and evaluate it using case studies and mathematical modelling. We detail the DIANA (Data Intensive and Network Aware) scheduling algorithm and its queue management system for coping with the load distribution and for supporting bulk job scheduling. We demonstrate that such a system is beneficial for dynamic, distributed and self-organizing resource management and can assist in optimizing load or job distribution in complex Grid infrastructures.Comment: 8 pages, 9 figures. Presented at the 2nd IEEE Int Conference on eScience & Grid Computing. Amsterdam Netherlands, December 200
    corecore