A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning

Author: Li Zhe
Lin Sheng
Liu Ning
Qiu Qinru
Tang Jian
Wang Yanzhi
Xu Jielong
Xu Zhiyuan
Publication date: 11/08/2017

Automatic decision-making approaches, such as reinforcement learning (RL), have been applied to (partially) solve the resource allocation problem adaptively in cloud computing systems. However, a complete cloud resource allocation framework exhibits high-dimensional state and action spaces, which limit the usefulness of traditional RL techniques. In addition, high power consumption has become one of the critical concerns in the design and control of cloud computing systems, as it degrades system reliability and increases cooling cost. An effective dynamic power management (DPM) policy should minimize power consumption while keeping performance degradation within an acceptable level. Thus, a joint virtual machine (VM) resource allocation and power management framework is critical to the overall cloud computing system. Moreover, a novel solution framework is necessary to address the even higher dimensions of the state and action spaces. In this paper, we propose a novel hierarchical framework for solving the overall resource allocation and power management problem in cloud computing systems. The proposed hierarchical framework comprises a global tier for VM resource allocation to the servers and a local tier for distributed power management of local servers. The emerging deep reinforcement learning (DRL) technique, which can deal with complicated control problems with large state spaces, is adopted to solve the global tier problem. Furthermore, an autoencoder and a novel weight-sharing structure are adopted to handle the high-dimensional state space and accelerate convergence. On the other hand, the local tier of distributed server power management comprises an LSTM-based workload predictor and a model-free RL-based power manager, operating in a distributed manner. Comment: accepted by the 37th IEEE International Conference on Distributed Computing Systems (ICDCS 2017)

arXiv.org e-Print Archive

Crossref

Autonomic Cloud Computing: Open Challenges and Architectural Elements

Author: Buyya Rajkumar
Calheiros Rodrigo N.
Li Xiaorong
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 15/09/2012

As Clouds are complex, large-scale, and heterogeneous distributed systems, management of their resources is a challenging task. They need automated and integrated intelligent strategies for provisioning of resources to offer services that are secure, reliable, and cost-efficient. Hence, effective management of services becomes fundamental in software platforms that constitute the fabric of computing Clouds. In this direction, this paper identifies open issues in autonomic resource provisioning and presents innovative management techniques for supporting SaaS applications hosted on Clouds. We present a conceptual architecture and early results evidencing the benefits of autonomic management of Clouds. Comment: 8 pages, 6 figures, conference keynote paper

arXiv.org e-Print Archive

Crossref

Design and evaluation of a scalable hierarchical application component placement algorithm for cloud resource allocation

Author: Barshan Maryam
De Turck Filip
Moens Hendrik
Publication date: 01/01/2014

In the context of cloud systems, mapping application components to a set of physical servers and assigning resources to those components is challenging. For large-scale clouds, traditional resource allocation systems, which rely on a centralized management paradigm, become ineffective and inefficient. Therefore, there is an essential need for new management solutions that scale well with the size of large cloud systems. In this paper, a distributed and hierarchical component placement algorithm is presented, evaluated, and compared to a centralized algorithm. Each application is represented as a collection of interacting services, and multiple service types with differing placement characteristics are considered. Our evaluations show that the proposed algorithm is at least 84.65 times faster and offers better scalability than a centralized approach, while the percentage of servers used and of fully placed applications remains close to that of the centralized algorithm.

Crossref

Ghent University Academic Bibliography

Riding out of the storm: How to deal with the complexity of grid and cloud management

Author: A Avizienis
A Sánchez
A Weiss
Alberto Sánchez
B Nicolae
B Rochwerger
B Sotomayor
DH Bailey
DR Ogura
F Gagliardi
G Tsouloupas
H Stockinger
J Carroll
J Montes
Jesús Montes
JJ Valdés
K Krauter
L Rodero-Merino
LM Vaquero
M Frumkin
M Siddiqui
María S. Pérez
MD Dikaiakos
WMP Aalst van der
Y Jégou
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Over the last decade, Grid computing paved the way for a new level of large-scale distributed systems. This infrastructure made it possible to securely and reliably take advantage of widely separated computational resources that are part of several different organizations. Resources can be incorporated into the Grid, building a theoretical virtual supercomputer. In time, cloud computing emerged as a new type of large-scale distributed system, inheriting and expanding the expertise and knowledge that had been obtained so far. Some of the main characteristics of Grids naturally evolved into clouds, others were modified and adapted, and others were simply discarded or postponed. Regardless of these technical specifics, Grids and clouds together can be considered one of the most important advances in large-scale distributed computing of the past ten years; however, this step in distributed computing has come along with a completely new level of complexity. Grid and cloud management mechanisms play a key role, and correct analysis and understanding of the system behavior are needed. Large-scale distributed systems must be able to self-manage, incorporating autonomic features capable of controlling and optimizing all resources and services. Traditional distributed computing management mechanisms analyze each resource separately and adjust specific parameters of each one of them. When trying to adapt the same procedures to Grid and cloud computing, the vast complexity of these systems can make this task extremely complicated. But the complexity of large-scale distributed systems could be only a matter of perspective. It could be possible to understand the Grid or cloud behavior as a single entity, instead of a set of resources. This abstraction could provide a different understanding of the system, describing large-scale behavior and global events that probably would not be detected by analyzing each resource separately.
In this work we define a theoretical framework that combines both ideas, multiple resources and single entity, to develop large-scale distributed systems management techniques aimed at system performance optimization, increased dependability, and Quality of Service (QoS). The resulting synergy could be the key to addressing the most important difficulties of Grid and cloud management.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

ACTiCLOUD: Enabling the Next Generation of Cloud Applications

Author: Attwood A.
Elmroth E.
Flouris M.
Foutris N.
Goodacre J.
Goumas G.
Grohmann D.
Karakostas V.
Kersten M.
Kotselidis C.
Koutsourakis P.
Koziris N.
Lakew E.B.
Lee K.
Liu L.
Lujàn M.
Nikas K.
Rustad E.
Thomson J.
Tomás L.
Vesterkjaer A.
Webber J.
Zhang Y.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

Despite their proliferation as a dominant computing paradigm, cloud computing systems lack effective mechanisms to manage their vast amounts of resources efficiently. Resources are stranded and fragmented, ultimately limiting the applicability of cloud systems to large classes of critical applications that have substantial resource demands. Eliminating the current technological barriers to the fluidity and scalability of cloud resources is essential to strengthen cloud computing's role as a critical cornerstone of the digital economy. ACTiCLOUD proposes a novel cloud architecture that breaks the existing scale-up and share-nothing barriers and enables the holistic management of physical resources both at the local cloud site and at distributed levels. Specifically, it advances the cloud resource management stack by extending state-of-the-art hypervisor technology beyond the physical server boundary and extending the localized cloud management system to provide holistic resource management within a rack, within a site, and across distributed cloud sites. On top of this, ACTiCLOUD will adapt and optimize system libraries and runtimes (e.g., JVM) as well as ACTiCLOUD-native applications, which are extremely demanding and critical classes of applications that currently face severe difficulties in matching their resource requirements to state-of-the-art cloud offerings.

Crossref

The University of Manchester - Institutional Repository

International Migration, Integration and Social Cohesion online publications

DIANA Scheduling Hierarchies for Optimizing Bulk Job Scheduling

Author: Ali A.
Alvi O.
Anjum A.
Hasham K.
McClatchey R.
Sagheer M.
Stockinger H.
Thomas M.
Willers I.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/2006

The use of meta-schedulers for resource management in large-scale distributed systems often leads to a hierarchy of schedulers. In this paper, we discuss why existing meta-scheduling hierarchies are sometimes not sufficient for Grid systems due to their inability to re-organise jobs already scheduled locally. Such job re-organisation is required to adapt to the evolving loads that are common in heavily used Grid infrastructures. We propose a peer-to-peer scheduling model and evaluate it using case studies and mathematical modelling. We detail the DIANA (Data Intensive and Network Aware) scheduling algorithm and its queue management system for coping with load distribution and supporting bulk job scheduling. We demonstrate that such a system is beneficial for dynamic, distributed, and self-organizing resource management and can assist in optimizing load or job distribution in complex Grid infrastructures. Comment: 8 pages, 9 figures. Presented at the 2nd IEEE Int Conference on eScience & Grid Computing. Amsterdam Netherlands, December 200

arXiv.org e-Print Archive

Crossref

Caltech Authors

Engineering emergence for cluster configuration

Author: Anthony Richard
Publication venue: International Institute of Informatics and Cybernetics
Publication date: 01/01/2005

Distributed applications are being deployed on an ever-increasing scale and with ever-increasing functionality. Due to the accompanying increase in behavioural complexity, self-management abilities, such as self-healing, have become core requirements. A key challenge is the smooth embedding of such functionality into our systems. Natural distributed systems such as ant colonies have evolved highly efficient behaviour. These emergent systems achieve high scalability through the use of low-complexity communication strategies and are highly robust through large-scale replication of simple, anonymous entities. Ways to engineer this fundamentally non-deterministic behaviour for use in distributed applications are being explored. An emergent, dynamic cluster management scheme, which forms part of a hierarchical resource management architecture, is presented. Natural biological systems, which embed self-healing behaviour at several levels, have influenced the architecture. The resulting system is a simple, lightweight and highly robust platform on which cluster-based autonomic applications can be deployed.

Greenwich Academic Literature Archive

Directory of Open Access Journals