
    Adaptive prediction models for data center resources utilization estimation

    Accurate estimation of data center resource utilization is a challenging task due to multi-tenant co-hosted applications having dynamic and time-varying workloads. Accurate estimation of future resource utilization helps in better job scheduling, workload placement, capacity planning, proactive auto-scaling, and load balancing, whereas inaccurate estimation leads to either under- or over-provisioning of data center resources. Most existing estimation methods are based on a single model that often does not appropriately estimate different workload scenarios. To address these problems, we propose a novel method to adaptively and automatically identify the most appropriate model for accurately estimating data center resource utilization. The proposed approach trains a classifier on statistical features of historical resource usage to decide which prediction model to use for resource utilization observations collected during a specific time interval. We evaluated our approach on real datasets and compared the results with multiple baseline methods. The experimental evaluation shows that the proposed approach outperforms the state-of-the-art approaches and delivers 6% to 27% improved resource utilization estimation accuracy compared to the baseline methods. This work is partially supported by the European Research Council (ERC) under the EU Horizon 2020 programme (GA 639595), the Spanish Ministry of Economy, Industry and Competitiveness (TIN2015-65316-P and IJCI2016-27485), the Generalitat de Catalunya (2014-SGR-1051), NPRP grant # NPRP9-224-1-049 from the Qatar National Research Fund (a member of Qatar Foundation), and the University of the Punjab, Pakistan. Peer Reviewed. Postprint (published version).
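    As an illustration of the model-selection idea described in this abstract, the sketch below trains a classifier on statistical features of historical utilization windows and uses its prediction to pick a forecasting model. The feature set, the candidate models, and the labeling of training windows are assumptions for illustration, not the paper's exact design.

```python
# Hypothetical sketch: classify a window of utilization observations by its
# statistical features, then use the predicted label to pick a forecasting
# model. Feature set, labels, and candidate models are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression

def window_features(w):
    # Simple statistical descriptors of one observation window (numpy array).
    return [w.mean(), w.std(), w.max() - w.min(),
            np.corrcoef(w[:-1], w[1:])[0, 1]]   # lag-1 autocorrelation

def forecast_last_value(w):            # naive persistence model
    return w[-1]

def forecast_linear_trend(w):          # least-squares trend extrapolation
    t = np.arange(len(w)).reshape(-1, 1)
    return LinearRegression().fit(t, w).predict([[len(w)]])[0]

MODELS = {0: forecast_last_value, 1: forecast_linear_trend}

def train_selector(windows, best_model_labels):
    # 'windows' and 'best_model_labels' are assumed to come from historical
    # traces where the best model per window was determined offline.
    X = np.array([window_features(w) for w in windows])
    return RandomForestClassifier(n_estimators=100).fit(X, best_model_labels)

def adaptive_estimate(selector, w):
    label = selector.predict([window_features(w)])[0]
    return MODELS[label](w)
```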

    Achieving Fair Load Balancing by Invoking a Learning Automata-based Two Time Scale Separation Paradigm

    Author's accepted manuscript. © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. In this article, we consider the problem of load balancing (LB), but, unlike previously proposed approaches, we attempt to resolve the problem in a fair manner (or rather, an ε-fair manner because, although LB can probably never be totally fair, we achieve this by being "as close to fair as possible"). The solution we propose invokes a novel stochastic learning automaton (LA) scheme to attain a distribution of the load across a number of nodes such that the performance level at the different nodes is approximately equal and each user experiences approximately the same Quality of Service (QoS) irrespective of the node to which he/she is connected. Since the load varies dynamically, static resource allocation schemes are doomed to underperform. This is particularly relevant in cloud environments, where dynamic approaches are needed because the available resources are unpredictable (or rather, uncertain) by virtue of the shared nature of the resource pool. Furthermore, we prove that there is a coupling between the LA's probabilities and the dynamics of the rewards themselves, which renders the environments nonstationary. This leads to the emergence of the so-called property of "stochastic diminishing rewards." Our newly proposed LA algorithm solves the problem ε-optimally by resorting to a two-time-scale stochastic learning paradigm. As far as we know, the results presented here are of a pioneering sort, and we are unaware of any comparable results. Accepted version.
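    The following is a minimal sketch of a two-time-scale learning automaton for node selection, assuming a linear reward-inaction update on the fast scale and an exponentially smoothed reward estimate on the slow scale; it illustrates the flavour of such a scheme rather than the paper's exact algorithm, and the step sizes are assumptions.

```python
# Illustrative sketch (not the paper's exact algorithm): a linear
# reward-inaction learning automaton choosing among nodes, with a slower
# time scale for estimating each node's reward signal.
import random

class LoadBalancingLA:
    def __init__(self, n_nodes, lam=0.05, mu=0.01):
        self.p = [1.0 / n_nodes] * n_nodes      # action probabilities (fast scale)
        self.r_hat = [0.5] * n_nodes            # reward estimates (slow scale)
        self.lam, self.mu = lam, mu

    def choose_node(self):
        # Sample a node according to the current probability vector.
        return random.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, node, reward):
        # Slow time scale: smooth the observed reward (e.g. QoS feedback in [0, 1]).
        self.r_hat[node] += self.mu * (reward - self.r_hat[node])
        # Fast time scale: linear reward-inaction update driven by the estimate;
        # the probability vector stays normalized after this update.
        b = self.lam * self.r_hat[node]
        for i in range(len(self.p)):
            if i == node:
                self.p[i] += b * (1.0 - self.p[i])
            else:
                self.p[i] -= b * self.p[i]
```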

    URegM: a unified prediction model of resource consumption for refactoring software smells in open source cloud

    The low cost and rapid provisioning capabilities have made the cloud a desirable platform to launch complex scientific applications. However, resource utilization optimization is a significant challenge for cloud service providers, since earlier work has focused on optimizing resources for the applications that run on the cloud, with little emphasis on optimizing the resource utilization of the cloud's internal processes. Code refactoring has been associated with improving the maintenance and understanding of software code. However, the impact of refactoring the cloud's source code on cloud resource usage requires further analysis. In this paper, we propose a framework called Unified Regression Modelling (URegM), which predicts the impact of code smell refactoring on cloud resource usage. We run our experiments in a real-life cloud environment using a complex scientific application as a workload. Results show that URegM can accurately predict resource consumption due to code smell refactoring. This provides cloud service providers with advance knowledge of the impact of refactoring code smells on resource consumption, allowing them to plan their resource provisioning and code refactoring more effectively.
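    A minimal sketch of the regression idea behind URegM, assuming the inputs are counts of refactored code smells and the targets are observed resource-consumption changes; the actual URegM feature set, model, and data are not reproduced here, and the arrays below are toy illustrative values.

```python
# Illustrative sketch only: a multi-output regression mapping code-smell
# refactoring features to resource-consumption deltas. All values are toy data.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: counts of refactored smells, e.g. [long_method, god_class, feature_envy]
X_train = np.array([[3, 1, 0], [0, 2, 4], [5, 0, 1], [2, 2, 2]])
# Each row: observed change in resource usage, e.g. [cpu_seconds, peak_mem_mb]
y_train = np.array([[12.0, 40.0], [8.5, 65.0], [15.0, 30.0], [11.0, 55.0]])

model = LinearRegression().fit(X_train, y_train)   # multi-output regression

# Predict the resource impact of a planned refactoring before applying it.
planned_refactoring = np.array([[1, 1, 1]])
print(model.predict(planned_refactoring))
```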

    Machine Learning-Based Anomaly Detection in Cloud Virtual Machine Resource Usage

    Anomaly detection is an important activity in cloud computing systems because it aids in the identification of odd behaviours or actions that may result in software glitches, security breaches, and performance difficulties. Detecting aberrant resource utilization trends in virtual machines (VMs) is a typical application of anomaly detection in cloud computing. Currently, the most serious cyber threat is the distributed denial-of-service attack, which slows down the afflicted server's resources and internet traffic resources, such as bandwidth and buffer size, by restricting the server's capacity to serve legitimate customers. To recognize attacks and normal occurrences, machine learning techniques such as Quadratic Support Vector Machines (QSVM) and Random Forest, and neural network models such as MLPs and autoencoders, are employed. Various machine learning algorithms are applied to the optimised NSL-KDD dataset to provide an efficient and accurate predictor of network intrusions. In this research, we propose a neural network based model, experiment with various central and spiral rearrangements of the features for distinguishing between different types of attacks, and support our approach of better preserving feature structure with image representations. The results are analysed and compared to existing models and prior research. The outcomes of this study have practical implications for improving the security and performance of cloud computing systems, specifically in the area of identifying and mitigating network intrusions.
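    The sketch below shows a plain MLP baseline on NSL-KDD-style feature vectors; it does not reproduce the paper's central/spiral feature rearrangements or image representations, and the dataset loading, encoding, and label set are assumptions.

```python
# A simplified baseline sketch, not the paper's image-representation approach:
# an MLP classifier on NSL-KDD-style feature vectors distinguishing attack
# classes from normal traffic. Loading and encoding of the dataset is assumed.
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def build_intrusion_detector():
    # Scale features, then train a small multilayer perceptron.
    return make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=200),
    )

# X_train: numeric NSL-KDD features (categorical columns already encoded),
# y_train: labels such as 'normal', 'dos', 'probe', 'r2l', 'u2r' (assumed).
# clf = build_intrusion_detector().fit(X_train, y_train)
# print(clf.score(X_test, y_test))
```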

    Power consumption prediction in cloud data center using machine learning

    The flourishing development of the cloud computing paradigm provides several services in the industrial business world. Power consumption by cloud data centers is one of the crucial issues for service providers in the domain of cloud computing. Owing to the rapid technology enhancements in cloud environments and the continual expansion of data centers, power utilization in data centers is expected to grow unabated. A diverse set of numerous connected devices, engaged with the ubiquitous cloud, results in unprecedented power utilization by the data centers, accompanied by increased carbon footprints. Nearly a million physical machines (PMs) are running across data centers, along with 5 to 6 million virtual machines (VMs). In the next five years, the power needs of this domain are expected to spiral up to 5% of global power production. Reducing virtual machine power consumption helps diminish the PMs' power draw; however, data center power consumption keeps changing year by year, which makes prediction methods valuable aids for cloud vendors. Sudden fluctuations in power utilization can cause power outages in cloud data centers. This paper aims to forecast VM power consumption with the help of regressive predictive analysis, one of the Machine Learning (ML) techniques. The approach makes predictions of future values using a Multi-layer Perceptron (MLP) regressor, which provides 91% accuracy during the prediction process.
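    A minimal sketch of the regression setup described above, assuming per-VM utilization metrics as features and measured power draw as the target; the exact features, network size, and preprocessing used in the paper are assumptions here, and the 91% figure is the paper's own result, not something this sketch reproduces.

```python
# Minimal sketch: an MLP regressor predicting VM power consumption from
# utilization metrics. Feature names and the train/test split are assumptions.
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

def train_power_model(X, y):
    # X: per-VM metrics such as CPU%, memory%, disk and network I/O (assumed);
    # y: measured power draw in watts.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500))
    model.fit(X_tr, y_tr)
    return model, model.score(X_te, y_te)   # R^2 on the held-out split
```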

    Cost-Effective Scheduling and Load Balancing Algorithms in Cloud Computing Using Learning Automata

    Cloud computing is a distributed computing model in which access is based on demand. A cloud computing environment includes a wide variety of resource suppliers and consumers. Hence, efficient and effective methods for task scheduling and load balancing are required. This paper presents a new approach to task scheduling and load balancing in the cloud computing environment with an emphasis on the cost-efficiency of task execution across resources. The proposed algorithms are based on the fair distribution of jobs between machines, which prevents a disproportionate increase in the cost of one machine while other machines sit idle. The two parameters, Total Cost and Final Cost, are designed to achieve this goal; applying them creates a fair basis for job scheduling and load balancing. To implement the proposed approach, learning automata are used as an effective and efficient reinforcement learning technique. Finally, to show the effectiveness of the proposed algorithms, we conducted simulations using the CloudSim toolkit and compared the proposed algorithms with existing algorithms such as BCO, PES, CJS, PPO and MCT. The proposed algorithms can balance the Final Cost and Total Cost of machines, and they outperform the best existing algorithms in terms of efficiency and imbalance degree.
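    The paper's algorithms are based on learning automata; as a simplified stand-in, the sketch below illustrates only the fairness idea of spreading accumulated cost across machines, with a hypothetical per-machine price model and toy task lengths.

```python
# Illustrative only: a greedy rule that always assigns the next task to the
# machine with the lowest accumulated cost so far, sketching the fairness goal
# described above. The cost model and task lengths are assumptions.
def schedule_tasks(task_lengths, price_per_unit):
    total_cost = [0.0] * len(price_per_unit)     # accumulated cost per machine
    assignment = []
    for length in task_lengths:
        # Pick the machine whose accumulated cost is currently lowest, so no
        # single machine becomes expensive while the others sit idle.
        m = min(range(len(total_cost)), key=lambda i: total_cost[i])
        total_cost[m] += length * price_per_unit[m]
        assignment.append(m)
    return assignment, total_cost

tasks = [4, 2, 7, 3, 5]          # hypothetical task lengths
prices = [1.0, 1.2, 0.8]         # hypothetical price per unit time per machine
print(schedule_tasks(tasks, prices))
```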

    Review and Analysis of Failure Detection and Prevention Techniques in IT Infrastructure Monitoring

    Maintaining the health of IT infrastructure components for improved reliability and availability has been a research and innovation topic for many years. Identification and handling of failures are crucial and challenging due to the complexity of IT infrastructure. System logs are the primary source of information to diagnose and fix failures. In this work, we address three essential research dimensions about failures: the need for failure handling in IT infrastructure, the contribution of system-generated logs to failure detection, and the reactive and proactive approaches used to deal with failure situations. This study performs a comprehensive analysis of existing literature by considering three prominent aspects: log preprocessing, anomaly and failure detection, and failure prevention. With this coherent review, we (1) establish the need for IT infrastructure monitoring to avoid downtime, (2) examine the three types of approaches for anomaly and failure detection, namely rule-based, correlation-based, and classification-based methods, and (3) formulate recommendations for researchers on further research directions. To the best of the authors' knowledge, this is the first comprehensive literature review on IT infrastructure monitoring techniques. The review has been conducted with the help of meta-analysis and a comparative study of machine learning and deep learning techniques. This work aims to outline significant research gaps in the area of IT infrastructure failure detection, and it will help future researchers understand the advantages and limitations of current methods and select an adequate approach to their problem.
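    As a small illustration of the rule-based end of the spectrum surveyed here, the following sketch scans system logs for failure-indicating patterns; the patterns and log format are assumptions, and real monitoring pipelines combine such rules with the correlation- and classification-based methods discussed above.

```python
# Tiny rule-based example: scan log lines for failure-indicating patterns.
# The patterns below are illustrative assumptions, not a recommended set.
import re

FAILURE_PATTERNS = [
    re.compile(r"out of memory", re.IGNORECASE),
    re.compile(r"disk failure|I/O error", re.IGNORECASE),
    re.compile(r"connection (refused|timed out)", re.IGNORECASE),
]

def scan_log(lines):
    # Return (line_number, line) pairs that match any failure pattern.
    hits = []
    for i, line in enumerate(lines, start=1):
        if any(p.search(line) for p in FAILURE_PATTERNS):
            hits.append((i, line.rstrip()))
    return hits
```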

    Data center's telemetry reduction and prediction through modeling techniques

    Nowadays, Cloud Computing is widely used to host and deliver services over the Internet. The architecture of clouds is complex due to the heterogeneous nature of its hardware, and it is hosted in large-scale data centers. To effectively and efficiently manage such complex infrastructure, constant monitoring is needed. This monitoring generates large amounts of telemetry data streams (e.g. hardware utilization metrics) which are used for multiple purposes including problem detection, resource management, workload characterization, resource utilization prediction, capacity planning, and job scheduling. These telemetry streams require costly bandwidth and storage space, particularly over the medium to long term for large data centers. Moreover, accurate future estimation of these telemetry streams is a challenging task due to multi-tenant co-hosted applications and dynamic workloads. Inaccurate estimation leads to either under- or over-provisioning of data center resources. In this Ph.D. thesis, we propose to improve the prediction accuracy and reduce the bandwidth utilization and storage space requirements with the help of modeling and prediction methods from machine learning. Most of the existing methods are based on a single model which often does not appropriately estimate different workload scenarios. Moreover, these prediction methods use a fixed size of observation window which cannot produce accurate results because it is not adaptively adjusted to capture the local trends in the recent data. Therefore, estimation methods trained on fixed sliding windows use an irrelevantly large number of observations, which yields inaccurate estimations. In summary, we (C1) efficiently reduce bandwidth and storage for telemetry data through real-time modeling using a Markov chain model, (C2) propose a novel method to adaptively and automatically identify the most appropriate model to accurately estimate data center resource utilization, and (C3) propose a deep learning-based adaptive window size selection method which dynamically limits the sliding window size to capture the local trend in the latest resource utilization for building the estimation model. Postprint (published version).
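    A minimal sketch of contribution C1 under stated assumptions: the utilization stream is discretized into bins and summarized online as a Markov transition matrix, so that the compact model rather than every raw sample can be stored or transmitted. The bin count and the downstream use of the matrix are assumptions.

```python
# Illustrative sketch: maintain a Markov transition matrix over discretized
# utilization values online, as a compact stand-in for the raw telemetry stream.
import numpy as np

class MarkovTelemetryModel:
    def __init__(self, n_bins=10):
        self.n_bins = n_bins
        self.counts = np.ones((n_bins, n_bins))   # Laplace-smoothed transition counts
        self.prev_bin = None

    def _bin(self, utilization):                  # utilization assumed in [0, 100]
        return min(int(utilization / 100.0 * self.n_bins), self.n_bins - 1)

    def observe(self, utilization):
        # Update the transition counts with each new telemetry sample.
        b = self._bin(utilization)
        if self.prev_bin is not None:
            self.counts[self.prev_bin, b] += 1
        self.prev_bin = b

    def transition_matrix(self):
        # Row-normalized probabilities; this compact model is what gets stored
        # or transmitted instead of every raw sample.
        return self.counts / self.counts.sum(axis=1, keepdims=True)
```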

    Adaptive sliding windows for improved estimation of data center resource utilization

    Accurate prediction of data center resource utilization is required for capacity planning, job scheduling, energy saving, workload placement, and load balancing to utilize the resources efficiently. However, accurately predicting those resources is challenging due to dynamic workloads, heterogeneous infrastructures, and multi-tenant co-hosted applications. Existing prediction methods use fixed-size observation windows, which cannot produce accurate results because they are not adaptively adjusted to capture local trends in the most recent data. Therefore, those methods either train on large fixed sliding windows, using an irrelevantly large number of observations and yielding inaccurate estimations, or suffer from degraded estimations with short windows on quickly changing trends. In this paper we propose a deep learning-based adaptive window size selection method, dynamically limiting the sliding window size to capture the trend of the latest resource utilization, and then build an estimation model for each trend period. We evaluate the proposed method against multiple baseline and state-of-the-art methods, using real data-center workload data sets. The experimental evaluation shows that the proposed solution outperforms those state-of-the-art approaches and yields 16% to 54% improved prediction accuracy compared to the baseline methods. This work is partially supported by the European Research Council (ERC) under the EU Horizon 2020 programme (GA 639595), the Spanish Ministry of Economy, Industry and Competitiveness (TIN2015-65316-P and IJCI2016-27485), the Generalitat de Catalunya, Spain (2014-SGR-1051) and the University of the Punjab, Pakistan. The statements made herein are solely the responsibility of the authors. Peer Reviewed. Postprint (published version).
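    The paper selects the window size with a deep learning model; as a simplified stand-in, the sketch below trims the sliding window back to the most recent change point so that the estimator only trains on the latest trend. The change-detection heuristic, thresholds, and window bounds are assumptions.

```python
# Simplified stand-in for adaptive window size selection: shrink the sliding
# window to the most recent trend by cutting it at a detected change point.
import numpy as np

def adaptive_window(series, max_window=200, threshold=2.0):
    """Return the suffix of `series` that belongs to the current trend."""
    w = np.asarray(series[-max_window:], dtype=float)
    # Walk the candidate cut point backwards from the most recent samples,
    # keeping at least 5 observations on each side of the cut.
    for i in range(len(w) - 5, 4, -1):
        head, tail = w[:i], w[i:]
        # Cut where the recent segment's mean departs from the older segment
        # by more than `threshold` standard deviations of the older segment.
        if abs(tail.mean() - head.mean()) > threshold * (head.std() + 1e-9):
            return tail
    return w
```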