Technical Report: A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters
To improve customer experience, datacenter operators offer support for
simplifying application and resource management. For example, running workloads
of workflows on behalf of customers is desirable, but requires increasingly
sophisticated autoscaling policies, that is, policies that dynamically
provision resources for the customer. Although selecting and tuning autoscaling
policies is a challenging task for datacenter operators, relatively few
studies have so far investigated the performance of autoscaling for workloads of workflows.
Complementing previous knowledge, in this work we propose the first
comprehensive performance study in the field. Using trace-based simulation, we
compare state-of-the-art autoscaling policies across multiple application
domains, workload arrival patterns (e.g., burstiness), and system utilization
levels. We further investigate the interplay between autoscaling and regular
allocation policies, and the complexity cost of autoscaling. Our quantitative
study focuses not only on traditional performance metrics and on
state-of-the-art elasticity metrics, but also on time- and memory-related
autoscaling-complexity metrics. Our main results give strong and quantitative
evidence about previously unreported operational behavior, for example, that
autoscaling policies perform differently across application domains and by how
much they differ.

Comment: Technical Report for the CCGrid 2018 submission "A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters".
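To illustrate the kind of trace-based comparison the report performs, the sketch below replays a synthetic demand trace against two toy autoscaling policies and tallies over- and under-provisioning. The trace, the policies, and the metrics are illustrative stand-ins, not those of the study.

```python
# Sketch of a trace-driven autoscaling comparison. The demand trace and the
# two policies below are illustrative, not those evaluated in the report.

def autoscale_run(trace, policy):
    """Replay a per-step demand trace; the policy only sees the *previous*
    demand, modeling the observation lag real autoscalers face."""
    capacity, prev_demand = 1, trace[0]
    over = under = 0
    for demand in trace:
        capacity = policy(capacity, prev_demand)
        over += max(0, capacity - demand)    # wasted capacity this step
        under += max(0, demand - capacity)   # unmet demand this step
        prev_demand = demand
    return over, under

def reactive(capacity, observed):
    # Jump straight to the last observed demand.
    return max(1, observed)

def conservative(capacity, observed):
    # Move one unit at a time toward the observed demand.
    return capacity + (observed > capacity) - (observed < capacity)

trace = [1, 3, 8, 8, 2, 2, 6, 1]  # bursty synthetic arrival pattern
for name, policy in [("reactive", reactive), ("conservative", conservative)]:
    over, under = autoscale_run(trace, policy)
    print(f"{name}: over-provisioned={over}, under-provisioned={under}")
```

Even on this tiny trace the two policies trade off differently: the demand-follower wastes capacity after bursts, while the one-step policy lags badly under them, which is the kind of behavioral difference the study quantifies across domains.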
Model-based resource management for fine-grained services
The emergence of DevOps has changed the way modern distributed software systems are developed. Architectures decomposed into fine-grained services, such as microservices or function-as-a-service (FaaS), are now widespread across many organizations. From a resource management perspective, although the systems built with such architectures have many benefits, there are still research challenges that need further attention. In this study, we have focused on three such challenges, each concerning a specific system resource: compute, memory, or storage.

Firstly, we focus on scaling the capacity of microservices at runtime. Here, the challenge is to design an autoscaler that can decide between vertical and horizontal scaling options to distribute the CPU capacity. Secondly, we focus on estimating the required capacity of an on-premises FaaS platform such that the service level agreements (SLAs) for function response times are satisfied. The challenge here is to address the cold start dilemma, i.e., that a cold start delays a function response but reduces the memory consumption. Thus, we must find a limit on cold starts such that memory consumption remains in check while the SLAs are satisfied. Finally, we focus on storage management for distributed tracing targeted at microservices. The volume of such traces generated in a data center can be on the scale of tens of terabytes per day, but only a small fraction of these traces is useful for troubleshooting. The objective then is to sample only the useful traces.

The key to addressing all these challenges is first modeling the dynamics concerning the resources and subsequently leveraging the model in a resource controller. To address the first challenge, we have developed an autoscaler, ATOM, that leverages layered queueing network (LQN) models to take its scaling decisions. Our experiment with a real-life application shows that ATOM produces 30-37% better results than the baseline autoscalers. For the second challenge, we have developed COCOA, a cold-start-aware capacity planner. COCOA utilizes M/M/k setup and LQN models to assess the cold start scenario and estimate the required capacity. We show with simulation that COCOA can reduce over-provisioning by over 70% compared to availability-aware approaches. Finally, addressing the third challenge, we propose SampleHST, a trace sampler that works under a storage budget constraint. SampleHST relies on either bag-of-words or graph-based models to represent a trace and groups similar traces using online clustering to perform sampling. We have evaluated the performance of SampleHST using data from both literature and production, which shows it produces 1.2x to 19x better results than the state-of-the-art.
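COCOA's actual models (M/M/k with setup costs plus LQNs) are more elaborate than anything shown here, but the core idea of capacity planning against a response-time SLA can be sketched with a plain M/M/k queue and the Erlang-C formula: search for the smallest number of servers whose mean response time meets the target. All parameter values below are made up.

```python
import math

def erlang_c(k, a):
    """Erlang-C probability that an arriving request waits in an M/M/k queue,
    with offered load a = arrival_rate / service_rate (requires a < k)."""
    terms = sum(a**n / math.factorial(n) for n in range(k))
    tail = a**k / (math.factorial(k) * (1 - a / k))
    return tail / (terms + tail)

def mean_response_time(k, lam, mu):
    """E[T] in an M/M/k queue: one service time plus the expected wait."""
    return 1 / mu + erlang_c(k, lam / mu) / (k * mu - lam)

def smallest_capacity(lam, mu, sla):
    """Smallest number of servers whose mean response time meets the SLA."""
    k = math.floor(lam / mu) + 1   # start just above the stability bound
    while mean_response_time(k, lam, mu) > sla:
        k += 1
    return k

# Example: 8 req/s arriving, each server completes 1 req/s,
# target mean response time 1.2 s.
print(smallest_capacity(lam=8.0, mu=1.0, sla=1.2))
```

A setup-aware model such as COCOA's would additionally charge each cold start a setup delay and weigh it against idle memory, shifting where this search bottoms out.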
A case study of proactive auto-scaling for an ecommerce workload
Preliminary data obtained from a partnership between the Federal University
of Campina Grande and an ecommerce company indicates that some applications
have issues when dealing with variable demand. This happens because a delay in
scaling resources leads to performance degradation, a problem usually addressed
in the literature by improving the auto-scaling. To better understand the
current state of the art on this subject, we re-evaluate an auto-scaling
algorithm proposed in the literature, in the context of ecommerce, using a
long-term real workload. Experimental results show that our proactive approach
is able to achieve an accuracy of up to 94 percent and led the auto-scaling to
better performance than the reactive approach currently used by the ecommerce
company.
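The abstract does not specify the forecasting model, but the generic shape of a proactive loop is: predict the next demand from recent history, then provision for the prediction rather than the current load. The sketch below uses a simple linear-trend extrapolation; the window size and per-replica capacity are illustrative assumptions, not values from the study.

```python
import math

def forecast_next(history, window=3):
    """Extrapolate a linear trend over the most recent observations.
    (Illustrative; the paper's forecasting model is not reproduced here.)"""
    recent = history[-window:]
    if len(recent) < 2:
        return float(recent[-1])
    slope = (recent[-1] - recent[0]) / (len(recent) - 1)
    return max(0.0, recent[-1] + slope)

def replicas_needed(predicted_load, per_replica=100.0):
    """Provision for the predicted load rather than the current one."""
    return max(1, math.ceil(predicted_load / per_replica))

history = [120, 180, 260]          # requests/s, rising demand
predicted = forecast_next(history)
print(replicas_needed(predicted))  # scale out before the spike arrives
```

A reactive policy would still be sized for the current 260 req/s here; the proactive one provisions for the extrapolated load, which is exactly the head start that avoids the scaling delay the abstract describes.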
Automatic Scaling in Cloud Computing
This dissertation thesis deals with automatic scaling in cloud computing, mainly focusing
on the performance of interactive workloads, that is, web servers and services, running in an
elastic cloud environment. In the first part of the thesis, the possibility of forecasting the
daily curve of workload is evaluated using long-range seasonal techniques of statistical time
series analysis. The accuracy is high enough to enable either green computing or filling
the unused capacity with batch jobs, whose running time is the main reason long-range
forecasts are needed. The second part focuses on simulations of automatic scaling, which is
necessary for the interactive workload to actually free up capacity when it is not being
utilized at its peak. Cloud users are often wary of letting a machine control their servers,
which is why realistic simulations are needed. We have explored two methods: event-driven
simulation and queue-theoretic models. During work on the first, we have extended the
widely used CloudSim simulation package to be able to dynamically scale the simulated
system at run time and have corrected its engine using knowledge from queueing theory.
Our own simulator then relies solely on theoretical models, making it much more precise
and much faster than the more general CloudSim. The tools from the two parts together
constitute the theoretical foundation which, once implemented in practice, can help
leverage cloud technology to actually increase the efficiency of data center hardware.
In particular, the main contributions of the dissertation thesis are as follows:
1. New methodology for forecasting time series of web server load and its validation
2. Extension of the often-used simulator CloudSim for interactive load and increasing
the accuracy of its output
3. Design and implementation of a fast and accurate simulator of automatic scaling
using queueing theory
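As a minimal illustration of the first contribution, forecasting a daily load curve with seasonal time-series methods, a seasonal-naive baseline simply repeats the pattern observed one period earlier. The data and period below are synthetic, and the thesis uses long-range seasonal statistical techniques rather than this baseline.

```python
def seasonal_forecast(series, period, horizon):
    """Seasonal-naive forecast: each future point repeats the value observed
    one full period earlier -- a standard baseline for daily load curves."""
    return [series[-period + (h % period)] for h in range(horizon)]

# Two synthetic "days" of load samples sharing one repeating daily shape.
day_one = [10, 30, 80, 60, 20, 5]
load = day_one + [v + 2 for v in day_one]   # day two sits slightly higher
print(seasonal_forecast(load, period=6, horizon=6))
```

Any seasonal model the thesis evaluates would need to beat this repeat-yesterday baseline to justify its complexity; the baseline already captures the daily shape, just not level shifts or trends.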