Technical Report: A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters
To improve customer experience, datacenter operators offer support for
simplifying application and resource management. For example, running workloads
of workflows on behalf of customers is desirable, but requires increasingly
sophisticated autoscaling policies, that is, policies that dynamically
provision resources for the customer. Although selecting and tuning autoscaling
policies is a challenging task for datacenter operators, relatively few
studies have so far investigated the performance of autoscaling for workloads of workflows.
Complementing previous knowledge, in this work we propose the first
comprehensive performance study in the field. Using trace-based simulation, we
compare state-of-the-art autoscaling policies across multiple application
domains, workload arrival patterns (e.g., burstiness), and system utilization
levels. We further investigate the interplay between autoscaling and regular
allocation policies, and the complexity cost of autoscaling. Our quantitative
study focuses not only on traditional performance metrics and on
state-of-the-art elasticity metrics, but also on time- and memory-related
autoscaling-complexity metrics. Our main results give strong and quantitative
evidence about previously unreported operational behavior, for example, that
autoscaling policies perform differently across application domains and by how
much they differ.

Comment: Technical Report for the CCGrid 2018 submission "A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters".
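To illustrate the kind of trace-based comparison the report performs, the sketch below replays a synthetic demand trace against two toy autoscaling policies and tallies over- and under-provisioning. The trace, the policies, and the metrics are illustrative stand-ins, not those of the study.

```python
# Sketch of a trace-driven autoscaling comparison. The demand trace and the
# two policies below are illustrative, not those evaluated in the report.

def autoscale_run(trace, policy):
    """Replay a per-step demand trace; the policy only sees the *previous*
    demand, modeling the observation lag real autoscalers face."""
    capacity, prev_demand = 1, trace[0]
    over = under = 0
    for demand in trace:
        capacity = policy(capacity, prev_demand)
        over += max(0, capacity - demand)    # wasted capacity this step
        under += max(0, demand - capacity)   # unmet demand this step
        prev_demand = demand
    return over, under

def reactive(capacity, observed):
    # Jump straight to the last observed demand.
    return max(1, observed)

def conservative(capacity, observed):
    # Move one unit at a time toward the observed demand.
    return capacity + (observed > capacity) - (observed < capacity)

trace = [1, 3, 8, 8, 2, 2, 6, 1]  # bursty synthetic arrival pattern
for name, policy in [("reactive", reactive), ("conservative", conservative)]:
    over, under = autoscale_run(trace, policy)
    print(f"{name}: over-provisioned={over}, under-provisioned={under}")
```

Even on this tiny trace the two policies trade off differently: the demand-follower wastes capacity after bursts, while the one-step policy lags badly under them, which is the kind of behavioral difference the study quantifies across domains.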
Model-based resource management for fine-grained services
The emergence of DevOps has changed the way modern distributed software systems are developed. Architectures decomposed into fine-grained services, such as microservices or function-as-a-service (FaaS), are now widespread across many organizations. From a resource management perspective, although the systems built with such architectures have many benefits, there are still research challenges that need further attention. In this study, we have focused on three such challenges, each concerning a specific system resource: compute, memory, or storage.

Firstly, we focus on scaling the capacity of microservices at runtime. Here, the challenge is to design an autoscaler that can decide between vertical and horizontal scaling options to distribute the CPU capacity. Secondly, we focus on estimating the required capacity of an on-premises FaaS platform such that the service level agreements (SLAs) for function response times are satisfied. The challenge here is to address the cold start dilemma, i.e., that a cold start delays a function response but reduces the memory consumption. Thus, we must find a limit on cold starts such that memory consumption remains in check while the SLAs are satisfied. Finally, we focus on storage management for distributed tracing targeted at microservices. The volume of such traces generated in a data center can be on the scale of tens of terabytes per day, but only a small fraction of these traces is useful for troubleshooting. The objective then is to sample only the useful traces.

The key to addressing all these challenges is first modeling the dynamics concerning the resources and subsequently leveraging the model in a resource controller. To address the first challenge, we have developed an autoscaler, ATOM, that leverages layered queueing network (LQN) models to take its scaling decisions. Our experiment with a real-life application shows that ATOM produces 30-37% better results than the baseline autoscalers. For the second challenge, we have developed COCOA, a cold-start-aware capacity planner. COCOA utilizes M/M/k setup and LQN models to assess the cold start scenario and estimate the required capacity. We show with simulation that COCOA can reduce over-provisioning by over 70% compared to availability-aware approaches. Finally, addressing the third challenge, we propose SampleHST, a trace sampler that works under a storage budget constraint. SampleHST relies on either bag-of-words or graph-based models to represent a trace and groups similar traces using online clustering to perform sampling. We have evaluated the performance of SampleHST using data from both literature and production, which shows it produces 1.2x to 19x better results than the state-of-the-art.
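COCOA's actual models (M/M/k with setup costs plus LQNs) are more elaborate than anything shown here, but the core idea of capacity planning against a response-time SLA can be sketched with a plain M/M/k queue and the Erlang-C formula: search for the smallest number of servers whose mean response time meets the target. All parameter values below are made up.

```python
import math

def erlang_c(k, a):
    """Erlang-C probability that an arriving request waits in an M/M/k queue,
    with offered load a = arrival_rate / service_rate (requires a < k)."""
    terms = sum(a**n / math.factorial(n) for n in range(k))
    tail = a**k / (math.factorial(k) * (1 - a / k))
    return tail / (terms + tail)

def mean_response_time(k, lam, mu):
    """E[T] in an M/M/k queue: one service time plus the expected wait."""
    return 1 / mu + erlang_c(k, lam / mu) / (k * mu - lam)

def smallest_capacity(lam, mu, sla):
    """Smallest number of servers whose mean response time meets the SLA."""
    k = math.floor(lam / mu) + 1   # start just above the stability bound
    while mean_response_time(k, lam, mu) > sla:
        k += 1
    return k

# Example: 8 req/s arriving, each server completes 1 req/s,
# target mean response time 1.2 s.
print(smallest_capacity(lam=8.0, mu=1.0, sla=1.2))
```

A setup-aware model such as COCOA's would additionally charge each cold start a setup delay and weigh it against idle memory, shifting where this search bottoms out.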
A case study of proactive auto-scaling for an ecommerce workload
Preliminary data obtained from a partnership between the Federal University
of Campina Grande and an ecommerce company indicates that some applications
have issues when dealing with variable demand. This happens because a delay in
scaling resources leads to performance degradation, a problem usually addressed
in the literature by improving the auto-scaling. To better understand the
current state of the art on this subject, we re-evaluate an auto-scaling
algorithm proposed in the literature, in the context of ecommerce, using a
long-term real workload. Experimental results show that our proactive approach
is able to achieve an accuracy of up to 94 percent and led the auto-scaling to
better performance than the reactive approach currently used by the ecommerce
company.
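The abstract does not specify the forecasting model, but the generic shape of a proactive loop is: predict the next demand from recent history, then provision for the prediction rather than the current load. The sketch below uses a simple linear-trend extrapolation; the window size and per-replica capacity are illustrative assumptions, not values from the study.

```python
import math

def forecast_next(history, window=3):
    """Extrapolate a linear trend over the most recent observations.
    (Illustrative; the paper's forecasting model is not reproduced here.)"""
    recent = history[-window:]
    if len(recent) < 2:
        return float(recent[-1])
    slope = (recent[-1] - recent[0]) / (len(recent) - 1)
    return max(0.0, recent[-1] + slope)

def replicas_needed(predicted_load, per_replica=100.0):
    """Provision for the predicted load rather than the current one."""
    return max(1, math.ceil(predicted_load / per_replica))

history = [120, 180, 260]          # requests/s, rising demand
predicted = forecast_next(history)
print(replicas_needed(predicted))  # scale out before the spike arrives
```

A reactive policy would still be sized for the current 260 req/s here; the proactive one provisions for the extrapolated load, which is exactly the head start that avoids the scaling delay the abstract describes.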
Automatic Scaling in Cloud Computing
This dissertation thesis deals with automatic scaling in cloud computing, mainly focusing
on the performance of interactive workloads, that is, web servers and services, running in an
elastic cloud environment. In the first part of the thesis, the possibility of forecasting the
daily curve of workload is evaluated using long-range seasonal techniques of statistical time
series analysis. The accuracy is high enough to enable either green computing or filling
the unused capacity with batch jobs, whose running time is the main reason long-range
forecasts are needed. The second part focuses on simulations of automatic scaling, which is
necessary for the interactive workload to actually free up capacity when it is not being
utilized at its peak. Cloud users are often wary of letting a machine control their servers,
which is why realistic simulations are needed. We have explored two methods: event-driven
simulation and queue-theoretic models. During work on the first, we have extended the
widely used CloudSim simulation package to be able to dynamically scale the simulated
system at run time and have corrected its engine using knowledge from queueing theory.
Our own simulator then relies solely on theoretical models, making it much more precise
and much faster than the more general CloudSim. The tools from the two parts together
constitute the theoretical foundation which, once implemented in practice, can help
leverage cloud technology to actually increase the efficiency of data center hardware.
In particular, the main contributions of the dissertation thesis are as follows:
1. New methodology for forecasting time series of web server load and its validation
2. Extension of the often-used simulator CloudSim for interactive load and increasing
the accuracy of its output
3. Design and implementation of a fast and accurate simulator of automatic scaling
using queueing theory
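As a minimal illustration of the first contribution, forecasting a daily load curve with seasonal time-series methods, a seasonal-naive baseline simply repeats the pattern observed one period earlier. The data and period below are synthetic, and the thesis uses long-range seasonal statistical techniques rather than this baseline.

```python
def seasonal_forecast(series, period, horizon):
    """Seasonal-naive forecast: each future point repeats the value observed
    one full period earlier -- a standard baseline for daily load curves."""
    return [series[-period + (h % period)] for h in range(horizon)]

# Two synthetic "days" of load samples sharing one repeating daily shape.
day_one = [10, 30, 80, 60, 20, 5]
load = day_one + [v + 2 for v in day_one]   # day two sits slightly higher
print(seasonal_forecast(load, period=6, horizon=6))
```

Any seasonal model the thesis evaluates would need to beat this repeat-yesterday baseline to justify its complexity; the baseline already captures the daily shape, just not level shifts or trends.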