Search CORE

41,193 research outputs found

A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning

Author: Li Zhe
Lin Sheng
Liu Ning
Qiu Qinru
Tang Jian
Wang Yanzhi
Xu Jielong
Xu Zhiyuan
Publication venue
Publication date: 11/08/2017
Field of study

Automatic decision-making approaches, such as reinforcement learning (RL), have been applied to (partially) solve the resource allocation problem adaptively in the cloud computing system. However, a complete cloud resource allocation framework exhibits high dimensions in state and action spaces, which prohibit the usefulness of traditional RL techniques. In addition, high power consumption has become one of the critical concerns in design and control of cloud computing systems, which degrades system reliability and increases cooling cost. An effective dynamic power management (DPM) policy should minimize power consumption while maintaining performance degradation within an acceptable level. Thus, a joint virtual machine (VM) resource allocation and power management framework is critical to the overall cloud computing system. Moreover, novel solution framework is necessary to address the even higher dimensions in state and action spaces. In this paper, we propose a novel hierarchical framework for solving the overall resource allocation and power management problem in cloud computing systems. The proposed hierarchical framework comprises a global tier for VM resource allocation to the servers and a local tier for distributed power management of local servers. The emerging deep reinforcement learning (DRL) technique, which can deal with complicated control problems with large state space, is adopted to solve the global tier problem. Furthermore, an autoencoder and a novel weight sharing structure are adopted to handle the high-dimensional state space and accelerate the convergence speed. On the other hand, the local tier of distributed server power managements comprises an LSTM based workload predictor and a model-free RL based power manager, operating in a distributed manner.Comment: accepted by 37th IEEE International Conference on Distributed Computing (ICDCS 2017

arXiv.org e-Print Archive

Crossref

Towards Data-Driven Autonomics in Data Centers

Author: Babaoglu Ozalp
Sîrbu Alina
Publication venue
Publication date: 01/01/2015
Field of study

Continued reliance on human operators for managing data centers is a major impediment for them from ever reaching extreme dimensions. Large computer systems in general, and data centers in particular, will ultimately be managed using predictive computational and executable models obtained through data-science tools, and at that point, the intervention of humans will be limited to setting high-level goals and policies rather than performing low-level operations. Data-driven autonomics, where management and control are based on holistic predictive models that are built and updated using generated data, opens one possible path towards limiting the role of operators in data centers. In this paper, we present a data-science study of a public Google dataset collected in a 12K-node cluster with the goal of building and evaluating a predictive model for node failures. We use BigQuery, the big data SQL platform from the Google Cloud suite, to process massive amounts of data and generate a rich feature set characterizing machine state over time. We describe how an ensemble classifier can be built out of many Random Forest classifiers each trained on these features, to predict if machines will fail in a future 24-hour window. Our evaluation reveals that if we limit false positive rates to 5%, we can achieve true positive rates between 27% and 88% with precision varying between 50% and 72%. We discuss the practicality of including our predictive model as the central component of a data-driven autonomic manager and operating it on-line with live data streams (rather than off-line on data logs). All of the scripts used for BigQuery and classification analyses are publicly available from the authors' website.Comment: 12 pages, 6 figure

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Towards Operator-less Data Centers Through Data-Driven, Predictive, Proactive Autonomics

Author: Babaoglu Ozalp
Sîrbu Alina
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

Continued reliance on human operators for managing data centers is a major impediment for them from ever reaching extreme dimensions. Large computer systems in general, and data centers in particular, will ultimately be managed using predictive computational and executable models obtained through data-science tools, and at that point, the intervention of humans will be limited to setting high-level goals and policies rather than performing low-level operations. Data-driven autonomics, where management and control are based on holistic predictive models that are built and updated using live data, opens one possible path towards limiting the role of operators in data centers. In this paper, we present a data-science study of a public Google dataset collected in a 12K-node cluster with the goal of building and evaluating predictive models for node failures. Our results support the practicality of a data-driven approach by showing the effectiveness of predictive models based on data found in typical data center logs. We use BigQuery, the big data SQL platform from the Google Cloud suite, to process massive amounts of data and generate a rich feature set characterizing node state over time. We describe how an ensemble classifier can be built out of many Random Forest classifiers each trained on these features, to predict if nodes will fail in a future 24-hour window. Our evaluation reveals that if we limit false positive rates to 5%, we can achieve true positive rates between 27% and 88% with precision varying between 50% and 72%.This level of performance allows us to recover large fraction of jobs' executions (by redirecting them to other nodes when a failure of the present node is predicted) that would otherwise have been wasted due to failures. [...

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

6G White Paper on Machine Learning in Wireless Communication Networks

Author: Abbas Robert
Ahmad Ijaz
Ali Samad
Amuru Saidhiraj
Bhadauria Shubhangi
Bhatia Vimal
Capobianco Michele
Chang Kapseok
Chen Mingzhe
Chu Thi My Chinh
Claes Maelick
Girnyk Maksym
Huusko Jyrki
Karvonen Teemu
Malik Hassan
Mei Kai
Mitra Rangeet
Rajatheva Nandana
Saad Walid
Shao Baohua
Shiri Hamid
Sliwa Benjamin
Steinbach Daniel
Suutala Jaakko
Wietfeld Christian
Yu Guanghui
Zepernick Hans-Jürgen
Publication venue
Publication date: 28/04/2020
Field of study

The focus of this white paper is on machine learning (ML) in wireless communications. 6G wireless communication networks will be the backbone of the digital transformation of societies by providing ubiquitous, reliable, and near-instant wireless connectivity for humans and machines. Recent advances in ML research has led enable a wide range of novel technologies such as self-driving vehicles and voice assistants. Such innovation is possible as a result of the availability of advanced ML models, large datasets, and high computational power. On the other hand, the ever-increasing demand for connectivity will require a lot of innovation in 6G wireless networks, and ML tools will play a major role in solving problems in the wireless domain. In this paper, we provide an overview of the vision of how ML will impact the wireless communication systems. We first give an overview of the ML methods that have the highest potential to be used in wireless networks. Then, we discuss the problems that can be solved by using ML in various layers of the network such as the physical layer, medium access layer, and application layer. Zero-touch optimization of wireless networks using ML is another interesting aspect that is discussed in this paper. Finally, at the end of each section, important research questions that the section aims to answer are presented

arXiv.org e-Print Archive

University of Miami: Scholarship Miami