6 research outputs found

    Adaptive sliding windows for improved estimation of data center resource utilization

    Accurate prediction of data center resource utilization is required for capacity planning, job scheduling, energy saving, workload placement, and load balancing to use resources efficiently. However, accurately predicting those resources is challenging due to dynamic workloads, heterogeneous infrastructures, and multi-tenant co-hosted applications. Existing prediction methods use fixed-size observation windows, which cannot produce accurate results because they are not adaptively adjusted to capture local trends in the most recent data. As a result, those methods either train on large fixed sliding windows, using many irrelevant observations and yielding inaccurate estimations, or suffer degraded estimations with short windows on quickly changing trends. In this paper we propose a deep learning-based adaptive window size selection method that dynamically limits the sliding window size to capture the trend in the latest resource utilization, then builds an estimation model for each trend period. We evaluate the proposed method against multiple baseline and state-of-the-art methods using real data-center workload data sets. The experimental evaluation shows that the proposed solution outperforms those state-of-the-art approaches and yields 16 to 54% improved prediction accuracy compared to the baseline methods. This work is partially supported by the European Research Council (ERC) under the EU Horizon 2020 programme (GA 639595), the Spanish Ministry of Economy, Industry and Competitiveness (TIN2015-65316-P and IJCI2016-27485), the Generalitat de Catalunya, Spain (2014-SGR-1051) and the University of the Punjab, Pakistan. The statements made herein are solely the responsibility of the authors. Peer Reviewed. Postprint (published version)
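    For intuition, a minimal sketch of the general idea (not the paper's deep learning-based selector): grow the window over recent observations only while they still follow one local trend, then extrapolate from that window. All function names, thresholds, and the sample data below are illustrative assumptions.

```python
import numpy as np

def adaptive_window(series, min_size=6, max_size=96, tol=0.15):
    """Pick the largest recent window whose points still follow one roughly
    linear local trend (an illustrative stand-in for the paper's deep
    learning-based window size selection)."""
    recent = np.asarray(series[-max_size:], dtype=float)
    best = recent[-min_size:]
    for size in range(min_size, len(recent) + 1):
        window = recent[-size:]
        x = np.arange(size)
        slope, intercept = np.polyfit(x, window, 1)           # local linear trend
        residual = np.abs(window - (slope * x + intercept)).mean()
        scale = np.abs(window).mean() + 1e-9
        if residual / scale > tol:                            # trend breaks: stop growing
            break
        best = window
    return best

def estimate_next(series):
    """Extrapolate one step ahead from the adaptively chosen window."""
    window = adaptive_window(series)
    x = np.arange(len(window))
    slope, intercept = np.polyfit(x, window, 1)
    return slope * len(window) + intercept

# Hypothetical CPU utilization (%) trace with a recent upward trend
cpu = [20, 21, 19, 22, 20, 21, 30, 35, 41, 46, 52, 57]
print(round(estimate_next(cpu), 1))
```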

    Data center's telemetry reduction and prediction through modeling techniques

    Nowadays, Cloud Computing is widely used to host and deliver services over the Internet. The architecture of clouds is complex due to the heterogeneous nature of the hardware, which is hosted in large-scale data centers. To manage such complex infrastructure effectively and efficiently, constant monitoring is needed. This monitoring generates large amounts of telemetry data streams (e.g. hardware utilization metrics) which are used for multiple purposes including problem detection, resource management, workload characterization, resource utilization prediction, capacity planning, and job scheduling. These telemetry streams require costly bandwidth and storage space, particularly over the medium to long term for large data centers. Moreover, accurate future estimation of these telemetry streams is a challenging task due to multi-tenant co-hosted applications and dynamic workloads. Inaccurate estimation leads to either under- or over-provisioning of data center resources. In this Ph.D. thesis, we propose to improve prediction accuracy and reduce bandwidth utilization and storage space requirements with the help of modeling and prediction methods from machine learning. Most of the existing methods are based on a single model which often does not appropriately estimate different workload scenarios. Moreover, these prediction methods use a fixed observation window size which cannot produce accurate results because it is not adaptively adjusted to capture the local trends in the recent data. Therefore, an estimation method trained on fixed sliding windows uses an irrelevant large number of observations, which yields inaccurate estimations. In summary, we: C1) efficiently reduce bandwidth and storage for telemetry data through real-time modeling using a Markov chain model; C2) propose a novel method to adaptively and automatically identify the most appropriate model to accurately estimate data center resource utilization; and C3) propose a deep learning-based adaptive window size selection method which dynamically limits the sliding window size to capture the local trend in the latest resource utilization for building the estimation model.
    Postprint (published version)
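    As a rough illustration of contribution C1 (not the thesis's actual implementation), a telemetry stream can be discretized into states and summarized by a Markov transition matrix, so that only the compact matrix needs to be transmitted or stored. The bin count, helper names, and sample data below are assumptions.

```python
import numpy as np

N_BINS = 10  # discretize utilization (0-100%) into 10 states; illustrative choice

def to_state(util_pct):
    """Map a utilization percentage to a discrete Markov state."""
    return min(int(util_pct // (100 / N_BINS)), N_BINS - 1)

def fit_transition_matrix(samples):
    """Count state-to-state transitions and normalize rows into probabilities."""
    counts = np.zeros((N_BINS, N_BINS))
    states = [to_state(s) for s in samples]
    for a, b in zip(states, states[1:]):
        counts[a, b] += 1
    probs = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    probs[probs.sum(axis=1) == 0] = 1.0 / N_BINS   # unseen states fall back to uniform
    return probs

def synthesize(matrix, start_util, steps, seed=0):
    """Regenerate an approximate stream from the compact model."""
    rng = np.random.default_rng(seed)
    state = to_state(start_util)
    out = []
    for _ in range(steps):
        state = rng.choice(N_BINS, p=matrix[state])
        out.append(state * (100 / N_BINS) + (100 / N_BINS) / 2)   # bin midpoint
    return out

# A day of per-minute CPU samples (synthetic) collapses to a 10x10 matrix
rng = np.random.default_rng(1)
samples = np.clip(50 + 30 * np.sin(np.linspace(0, 12, 1440)) + rng.normal(0, 3, 1440), 0, 100)
P = fit_transition_matrix(samples)
approx = synthesize(P, samples[-1], steps=60)
```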

    Resource Management in Sustainable Cyber-Physical Systems Using Heterogeneous Cloud Computing

    No full text

    Extending the BASE architecture for complex and reconfigurable cyber-physical systems using Holonic principles.

    Thesis (MEng)--Stellenbosch University, 2021. ENGLISH ABSTRACT: Industry 4.0 (I4.0) represents the newest technological revolution, aimed at optimising industries using drivers such as Cyber-Physical Systems (CPSs), the Internet of Things (IoT) and many more. In the past two decades, the holonic paradigm has become a major driver of intelligent manufacturing systems, making it ideal to advance I4.0. The objective of this thesis is to extend an existing holonic reference architecture, the Biography-Attributes-Schedule-Execution (BASE) architecture, for complex and reconfigurable CPSs. In the context of this thesis, complex and reconfigurable systems are considered to be systems that comprise many diverse, autonomous and interacting entities, and whose functionality, organization or size is expected to change over time. The thesis applies the principles of holonic systems to manage complexity and enhance reconfigurability of CPS applications. The BASE architecture is extended for two reasons: to enable it to integrate many diverse entities, and to enhance its reconfigurability. With regard to research on holonic systems, this thesis aims to address two important functions for systems implemented using holonic principles, namely cooperation and cyber-physical interfacing. The most important extensions made to the architecture were to enable scalability, refine the cooperation between holons, and integrate cyber-physical interfacing services as Interface Holons. These extensions include platform management components (e.g. a service directory) and standardised plugins (e.g. cyber-physical interfacing plugins). The extended architecture was implemented on an educational sheep farm, because of the many heterogeneous resources (sheep, camps, sensors, humans, etc.) on the farm that need to be integrated into a CPS implemented with the BASE architecture. This case study implementation had to integrate data from different sensors, provide live analysis of observed data and, when required, notify the physical world of any problems in the CPS. At the end of the implementation, an evaluation was done using the requirements of a complex, reconfigurable CPS as evaluation criteria. This evaluation involved setting up quantitative and qualitative evaluation metrics for the evaluation criteria, carrying out the evaluations, and discussing what the results from the different evaluations indicate about the effectiveness and efficiency of the extensions made to the BASE architecture. The extensions made to the BASE architecture were found to improve robustness and resilience. The use of Erlang was found to play a very important role in the resulting reliability. The extensions also helped to fully address the original BASE architecture's scalability shortcomings and to increase development productivity. Lastly, the extensions show the benefits of using service orientation to enable cooperation between holons, and how extracting all cyber-physical interfacing of a system into dedicated Interface Holons reduces development time, improves reusability and enhances diagnosability of interfacing problems.
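    The thesis implements the extended architecture in Erlang; purely as a loose analogy (not the BASE architecture's actual API), the sketch below shows a toy service directory and an Interface Holon wrapping a sensor plugin. All class, method, and service names are invented for illustration.

```python
class ServiceDirectory:
    """Toy platform-management component: holons register the services they offer."""
    def __init__(self):
        self._services = {}

    def register(self, name, holon):
        self._services.setdefault(name, []).append(holon)

    def find(self, name):
        return self._services.get(name, [])


class InterfaceHolon:
    """Wraps one cyber-physical interface (e.g. a sensor) behind a standard plugin API."""
    def __init__(self, name, read_plugin):
        self.name = name
        self._read = read_plugin          # standardised plugin: () -> measurement

    def observe(self):
        return {"source": self.name, "value": self._read()}


# Illustrative wiring: a temperature sensor plugin exposed to the rest of the CPS
directory = ServiceDirectory()
camp_sensor = InterfaceHolon("camp-3-temperature", read_plugin=lambda: 21.4)
directory.register("temperature", camp_sensor)

for holon in directory.find("temperature"):
    print(holon.observe())
```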

    Energy and performance-optimized scheduling of tasks in distributed cloud and edge computing systems

    Infrastructure resources in distributed cloud data centers (CDCs) are shared by heterogeneous applications in a high-performance and cost-effective way. Edge computing has emerged as a new paradigm to provide access to computing capacities in end devices, yet it suffers from problems such as load imbalance, long scheduling times, and the limited power of its edge nodes. Therefore, intelligent task scheduling in CDCs and edge nodes is critically important for constructing energy-efficient cloud and edge computing systems. Current approaches cannot smartly minimize the total cost of CDCs, maximize their profit, and improve the quality of service (QoS) of tasks because of the aperiodic arrival and heterogeneity of tasks. This dissertation proposes a class of energy- and performance-optimized scheduling algorithms built on top of several intelligent optimization algorithms. It consists of two parts: background work (Chapters 3–6) and new contributions (Chapters 7–11). 1) Background work. Chapter 3 proposes a spatial task scheduling and resource optimization method to minimize the total cost of CDCs, where bandwidth prices of Internet service providers, power grid prices, and renewable energy all vary with location. Chapter 4 presents a geography-aware task scheduling approach that considers spatial variations in CDCs to maximize the profit of their providers by intelligently scheduling tasks. Chapter 5 presents a spatio-temporal task scheduling algorithm to minimize energy cost by scheduling heterogeneous tasks among CDCs while meeting their delay constraints. Chapter 6 gives a temporal scheduling algorithm considering temporal variations of revenue, electricity prices, green energy, and prices of public clouds. 2) Contributions. Chapter 7 proposes a multi-objective optimization method for CDCs to maximize their profit and minimize the average loss possibility of tasks by determining task allocation among Internet service providers and the task service rate of each CDC. A simulated annealing-based bi-objective differential evolution algorithm is proposed to obtain an approximate Pareto optimal set, from which a knee solution is selected to schedule tasks in a high-profit and high-quality-of-service way. Chapter 8 formulates a bi-objective constrained optimization problem and designs a novel optimization method to cope with energy cost reduction and QoS improvement; it jointly minimizes both the energy cost of CDCs and the average response time of all tasks by intelligently allocating tasks among CDCs and changing the task service rate of each CDC. Chapter 9 formulates a constrained bi-objective optimization problem for the joint optimization of revenue and energy cost of CDCs, solved with an improved multi-objective evolutionary algorithm based on decomposition; it determines a high-quality trade-off between revenue maximization and energy cost minimization by considering CDCs' spatial differences in energy cost while meeting tasks' delay constraints. Chapter 10 proposes a simulated annealing-based bees algorithm to find a close-to-optimal solution; a fine-grained spatial task scheduling algorithm is then designed to minimize the energy cost of CDCs by allocating tasks among multiple green clouds and specifying the running speeds of their servers.
    Chapter 11 proposes a profit-maximized collaborative computation offloading and resource allocation algorithm to maximize the profit of systems and guarantee that the response time limits of tasks are met in cloud-edge computing systems. The resulting single-objective constrained optimization problem is solved by a proposed simulated annealing-based migrating birds optimization. This dissertation evaluates these algorithms, models and software with real-life data and shows that they improve the scheduling precision and cost-effectiveness of distributed cloud and edge computing systems.
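    As a generic illustration of the simulated annealing-style metaheuristics mentioned above (not any specific chapter's algorithm), the sketch below anneals a task-to-CDC assignment to reduce energy cost under capacity limits; the prices, task loads, and parameters are made up.

```python
import math, random

# Hypothetical data: per-unit energy price and capacity of each CDC, and per-task load
PRICES = [0.12, 0.08, 0.15]          # cost per unit of work at CDCs 0..2
CAPACITY = [40, 30, 50]              # maximum load each CDC can absorb
TASKS = [9, 14, 6, 11, 7, 13, 5, 8]  # load of each task

def cost(assign):
    """Energy cost of an assignment, with a heavy penalty for overloading a CDC."""
    load = [0.0] * len(PRICES)
    total = 0.0
    for task, cdc in zip(TASKS, assign):
        load[cdc] += task
        total += task * PRICES[cdc]
    penalty = sum(max(0, l - c) for l, c in zip(load, CAPACITY))
    return total + 10.0 * penalty

def anneal(iters=5000, temp=5.0, cooling=0.999, seed=0):
    """Simulated annealing over task-to-CDC assignments."""
    rng = random.Random(seed)
    assign = [rng.randrange(len(PRICES)) for _ in TASKS]
    cur_cost = cost(assign)
    best, best_cost = assign[:], cur_cost
    for _ in range(iters):
        cand = assign[:]
        cand[rng.randrange(len(TASKS))] = rng.randrange(len(PRICES))  # move one task
        c = cost(cand)
        if c < cur_cost or rng.random() < math.exp((cur_cost - c) / temp):
            assign, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand[:], c
        temp *= cooling
    return best, best_cost

print(anneal())  # assignment of each task to a CDC index, and its total cost
```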