6 research outputs found
Adaptive sliding windows for improved estimation of data center resource utilization
Accurate prediction of data center resource utilization is required for capacity planning, job scheduling, energy saving, workload placement, and load balancing. However, accurately predicting these resources is challenging due to dynamic workloads, heterogeneous infrastructure, and multi-tenant co-hosted applications. Existing prediction methods use fixed-size observation windows, which cannot produce accurate results because they are not adaptively adjusted to capture local trends in the most recent data. Consequently, these methods either train on large fixed sliding windows, using many irrelevant observations and yielding inaccurate estimations, or suffer degraded estimations with short windows on quickly changing trends. In this paper we propose a deep learning-based adaptive window size selection method that dynamically limits the sliding window size to capture the trend of the latest resource utilization, then builds an estimation model for each trend period. We evaluate the proposed method against multiple baseline and state-of-the-art methods, using real data-center workload data sets. The experimental evaluation shows that the proposed solution outperforms the state-of-the-art approaches and yields 16 to 54% improved prediction accuracy compared to the baseline methods.

This work is partially supported by the European Research Council (ERC) under the EU Horizon 2020 programme (GA 639595), the Spanish Ministry of Economy, Industry and Competitiveness (TIN2015-65316-P and IJCI2016-27485), the Generalitat de Catalunya, Spain (2014-SGR-1051) and the University of the Punjab, Pakistan. The statements made herein are solely the responsibility of the authors.

Peer reviewed. Postprint (published version).
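The abstract does not specify how the window size is selected; as a purely illustrative sketch (a simple heuristic standing in for the paper's deep learning model), an adaptive window can be grown backwards from the newest observation and cut off when a trend break is detected. All names and thresholds below are hypothetical:

```python
# Illustrative sketch only: the paper uses a deep learning model to pick the
# window; here a simple change-point heuristic stands in for it.
from statistics import mean

def adaptive_window(series, max_window=50, threshold=2.0):
    """Grow the window backwards from the newest sample and stop when the
    next-older sample deviates strongly from the window seen so far,
    i.e. when a local trend change is detected."""
    window = [series[-1]]
    for x in reversed(series[:-1]):
        if len(window) >= max_window:
            break
        m = mean(window)
        spread = max(window) - min(window) or 1.0   # avoid zero spread
        if abs(x - m) > threshold * spread:         # trend break: stop growing
            break
        window.append(x)
    return len(window)

# A flat recent trend preceded by a sharp jump: the window should cover
# only the flat tail, not the older regime.
history = [90, 91, 92, 90, 10, 11, 10, 12, 11]
print(adaptive_window(history))  # -> 5 (the flat tail only)
```

An estimation model would then be trained only on the last `adaptive_window(history)` observations, rather than on a large fixed window that mixes regimes.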
Data center's telemetry reduction and prediction through modeling techniques
Nowadays, Cloud Computing is widely used to host and deliver services over the Internet. The architecture of clouds is complex due to the heterogeneous nature of the underlying hardware, and clouds are hosted in large-scale data centers. To manage such complex infrastructure effectively and efficiently, constant monitoring is needed.
This monitoring generates large amounts of telemetry data streams (e.g. hardware utilization metrics) which are used for multiple purposes, including problem detection, resource management, workload characterization, resource utilization prediction, capacity planning, and job scheduling. These telemetry streams require costly bandwidth and storage space, particularly over the medium to long term for large data centers.
Moreover, accurate future estimation of these telemetry streams is a challenging task due to multi-tenant co-hosted applications and dynamic workloads. Inaccurate estimation leads to either under- or over-provisioning of data center resources. In this Ph.D. thesis, we propose to improve prediction accuracy and reduce the bandwidth utilization and storage space requirements with the help of modeling and prediction methods from machine learning. Most existing methods are based on a single model, which often does not appropriately estimate different workload scenarios. Moreover, these prediction methods use a fixed-size observation window, which cannot produce accurate results because it is not adaptively adjusted to capture the local trends in the recent data. Therefore, estimation methods trained on fixed sliding windows use a large number of irrelevant observations, which yields inaccurate estimations.
In summary, we: C1) efficiently reduce bandwidth and storage for telemetry data through real-time modeling using a Markov chain model; C2) propose a novel method to adaptively and automatically identify the most appropriate model to accurately estimate data center resource utilization; and C3) propose a deep learning-based adaptive window size selection method which dynamically limits the sliding window size to capture the local trend in the latest resource utilization for building the estimation model.

Postprint (published version).
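Contribution C1 is only summarized here; one hypothetical way such a telemetry reduction can work is to quantize a metric stream into discrete states and transmit only state transitions, reconstructing an approximate stream at the receiver. This sketch is an assumption-laden illustration, not the thesis's actual Markov chain method:

```python
# Hypothetical sketch: reduce a telemetry stream by sending only state
# transitions of a quantized (Markov-chain style) representation.

def quantize(value, n_states=10):
    """Map a utilization percentage [0, 100] onto a discrete state."""
    return min(int(value / (100 / n_states)), n_states - 1)

def compress(stream, n_states=10):
    """Emit (timestamp_index, state) pairs only when the state changes."""
    out, prev = [], None
    for i, v in enumerate(stream):
        s = quantize(v, n_states)
        if s != prev:
            out.append((i, s))
            prev = s
    return out

def reconstruct(transitions, length, n_states=10):
    """Rebuild an approximate stream (bin midpoints) from the transitions."""
    width = 100 / n_states
    result, state = [], transitions[0][1]
    bounds = dict(transitions)
    for i in range(length):
        state = bounds.get(i, state)
        result.append(state * width + width / 2)  # bin midpoint
    return result

cpu = [12, 14, 13, 55, 57, 56, 58, 90, 91]
sent = compress(cpu)
print(sent)                          # far fewer samples than the raw stream
print(reconstruct(sent, len(cpu)))   # approximate reconstruction
```

The bandwidth/storage saving comes from transmitting only the three transitions instead of nine raw samples, at the cost of quantization error bounded by the bin width.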
Extending the BASE architecture for complex and reconfigurable cyber-physical systems using Holonic principles.
Thesis (MEng)--Stellenbosch University, 2021.

ENGLISH ABSTRACT: Industry 4.0 (I4.0) represents the newest technological revolution aimed at
optimising industries using drivers such as Cyber-Physical Systems (CPSs), the Internet of Things (IoT) and many more. In the past two decades, the holonic paradigm has become a major driver of intelligent manufacturing systems, making it ideal to advance I4.0.
The objective of this thesis is to extend an existing holonic reference architecture, the Biography-Attributes-Schedule-Execution (BASE) architecture, for complex
and reconfigurable CPSs. In the context of this thesis, complex and reconfigurable systems are considered to be systems composed of many diverse,
autonomous and interacting entities, whose functionality, organization or size is expected to change over time. The thesis applies the principles of holonic
systems to manage complexity and enhance the reconfigurability of CPS applications.
The BASE architecture is extended for two reasons: to enable it to integrate many diverse entities, and to enhance its reconfigurability. With regard to research on
holonic systems, this thesis aims to address two important functions for systems implemented using holonic principles, namely cooperation and cyber-physical interfacing.
The most important extensions made to the architecture were to enable scalability,
refine the cooperation between holons, and integrate cyber-physical interfacing
services as Interface Holons. These extensions include platform management
components (e.g. a service directory) and standardised plugins (e.g. cyber-physical
interfacing plugins). The extended architecture was implemented on an educational
sheep farm, because the many heterogeneous resources (sheep, camps, sensors,
humans, etc.) on the farm need to be integrated into a CPS implemented with the
BASE architecture. This case study implementation had to integrate data from
different sensors, provide live analysis of observed data and, when required, notify the physical world of any problems in the CPS. At the end of the implementation,
an evaluation was done using the requirements of a complex, reconfigurable CPS
as evaluation criteria. Quantitative and qualitative metrics were defined for these
criteria, the evaluations were carried out, and the results were discussed in terms
of the effectiveness and efficiency of the extensions made to the BASE architecture.
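The service directory and cooperation mechanisms are only named above, not specified; as a hypothetical illustration of the registration/discovery pattern such a platform management component enables (the thesis itself implements the architecture in Erlang), a minimal Python sketch might look like this. All class, holon, and service names are invented:

```python
# Hypothetical sketch of a service directory: holons register the services
# they offer, and other holons discover cooperation partners by service type.

class ServiceDirectory:
    def __init__(self):
        self._services = {}          # service name -> set of holon ids

    def register(self, holon_id, service):
        self._services.setdefault(service, set()).add(holon_id)

    def deregister(self, holon_id, service):
        self._services.get(service, set()).discard(holon_id)

    def discover(self, service):
        """Return the holons currently offering a service."""
        return sorted(self._services.get(service, set()))

directory = ServiceDirectory()
directory.register("sensor-holon-7", "temperature-readings")
directory.register("sensor-holon-9", "temperature-readings")
print(directory.discover("temperature-readings"))
```

Decoupling cooperation through such a directory, rather than through hard-coded references between holons, is what allows entities to be added or removed at runtime, which is the reconfigurability property the extensions target.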
The extensions made to the BASE architecture were found to improve robustness
and resilience. The use of Erlang was found to play a very important role in the
resulting reliability. The extensions also helped to fully address the original BASE
architecture’s scalability shortcomings and to increase development productivity.
Lastly, the extensions show the benefits of using service orientation to enable
cooperation between holons and how extracting all cyber-physical interfacing of a
system into dedicated Interface Holons reduces development time, improves
reusability and enhances diagnosability of interfacing problems.

Master
Energy and performance-optimized scheduling of tasks in distributed cloud and edge computing systems
Infrastructure resources in distributed cloud data centers (CDCs) are shared by heterogeneous applications in a high-performance and cost-effective way. Edge computing has emerged as a new paradigm to provide access to computing capacities in end devices. Yet it suffers from such problems as load imbalance, long scheduling times, and the limited power of its edge nodes. Therefore, intelligent task scheduling in CDCs and edge nodes is critically important for constructing energy-efficient cloud and edge computing systems. Current approaches cannot jointly minimize the total cost of CDCs, maximize their profit, and improve the quality of service (QoS) of tasks, because tasks arrive aperiodically and are heterogeneous. This dissertation proposes a class of energy- and performance-optimized scheduling algorithms built on top of several intelligent optimization algorithms. It comprises two parts: background work (Chapters 3–6) and new contributions (Chapters 7–11).
1) Background work of this dissertation.
Chapter 3 proposes a spatial task scheduling and resource optimization method to minimize the total cost of CDCs, where bandwidth prices of Internet service providers, power grid prices, and renewable energy all vary with location. Chapter 4 presents a geography-aware task scheduling approach that considers spatial variations among CDCs to maximize the profit of their providers by intelligently scheduling tasks. Chapter 5 presents a spatio-temporal task scheduling algorithm to minimize energy cost by scheduling heterogeneous tasks among CDCs while meeting their delay constraints. Chapter 6 gives a temporal scheduling algorithm considering temporal variations of revenue, electricity prices, green energy, and prices of public clouds.
2) Contributions of this dissertation.
Chapter 7 proposes a multi-objective optimization method for CDCs to maximize their profit and minimize the average loss possibility of tasks by determining task allocation among Internet service providers and the task service rates of each CDC. A simulated annealing-based bi-objective differential evolution algorithm is proposed to obtain an approximate Pareto-optimal set. A knee solution is selected to schedule tasks in a high-profit and high-QoS way. Chapter 8 formulates a bi-objective constrained optimization problem and designs a novel optimization method to cope with energy cost reduction and QoS improvement. It jointly minimizes both the energy cost of CDCs and the average response time of all tasks by intelligently allocating tasks among CDCs and changing the task service rate of each CDC. Chapter 9 formulates a constrained bi-objective optimization problem for the joint optimization of revenue and energy cost of CDCs. It is solved with an improved multi-objective evolutionary algorithm based on decomposition. It determines a high-quality trade-off between revenue maximization and energy cost minimization by considering CDCs' spatial differences in energy cost while meeting tasks' delay constraints. Chapter 10 proposes a simulated annealing-based bees algorithm to find a close-to-optimal solution. Then, a fine-grained spatial task scheduling algorithm is designed to minimize the energy cost of CDCs by allocating tasks among multiple green clouds and specifying the running speeds of their servers. Chapter 11 proposes a profit-maximized collaborative computation offloading and resource allocation algorithm to maximize the profit of systems and guarantee that response-time limits of tasks are met in cloud-edge computing systems. A single-objective constrained optimization problem is solved by a proposed simulated annealing-based migrating birds optimization algorithm.
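The chapters' algorithms are not detailed in this abstract; as a generic, hypothetical illustration of the simulated annealing core that several of them build on, the following sketch allocates tasks among CDCs to reduce an assumed energy-cost function. The price, capacity, and penalty values are invented for the example:

```python
# Hypothetical sketch of a simulated annealing core for task allocation:
# assign tasks to CDCs so that an assumed total energy cost drops.
import math
import random

random.seed(0)

PRICES = [0.30, 0.10, 0.22]   # assumed energy price per task for each CDC
CAPACITY = [4, 4, 4]          # assumed max tasks each CDC can host
N_TASKS = 8

def cost(assignment):
    """Energy cost plus a penalty for exceeding CDC capacity."""
    total = sum(PRICES[c] for c in assignment)
    for cdc, cap in enumerate(CAPACITY):
        total += 10 * max(0, assignment.count(cdc) - cap)  # overload penalty
    return total

def anneal(steps=2000, temp=1.0, cooling=0.995):
    current = [random.randrange(len(PRICES)) for _ in range(N_TASKS)]
    best = current[:]
    for _ in range(steps):
        cand = current[:]
        cand[random.randrange(N_TASKS)] = random.randrange(len(PRICES))
        delta = cost(cand) - cost(current)
        # Accept improvements always; worse moves with Boltzmann probability.
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            current = cand
            if cost(current) < cost(best):
                best = current[:]
        temp *= cooling
    return best

solution = anneal()
print(round(cost(solution), 2))
```

The dissertation's algorithms hybridize this acceptance scheme with population-based methods (differential evolution, bees, migrating birds) and handle multiple objectives and constraints, which this single-objective toy omits.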
This dissertation evaluates these algorithms, models, and software with real-life data and shows that they improve the scheduling precision and cost-effectiveness of distributed cloud and edge computing systems.