333 research outputs found
Performance-oriented Cloud Provisioning: Taxonomy and Survey
Cloud computing is being viewed as the technology of today and the future.
Through this paradigm, the customers gain access to shared computing resources
located in remote data centers that are hosted by cloud providers (CP). This
technology allows for provisioning of various resources such as virtual
machines (VM), physical machines, processors, memory, network, storage and
software as per the needs of customers. Application providers (AP), who are
customers of the CP, deploy applications on the cloud infrastructure and then
these applications are used by the end-users. To meet the fluctuating
application workload demands, dynamic provisioning is essential and this
article provides a detailed literature survey of dynamic provisioning within
cloud systems with focus on application performance. The well-known types of
provisioning and the associated problems are clearly and pictorially explained
and the provisioning terminology is clarified. A very detailed and general
cloud provisioning classification is presented, which views provisioning from
different perspectives, aiding in understanding the process inside-out. Cloud
dynamic provisioning is explained by considering resources, stakeholders,
techniques, technologies, algorithms, problems, goals and more.Comment: 14 pages, 3 figures, 3 table
Autonomic management of virtualized resources in cloud computing
The last five years have witnessed a rapid growth of cloud computing in business, governmental and educational IT deployment. The success of cloud services depends critically on the effective management of virtualized resources. A key requirement of cloud management is the ability to dynamically match resource allocations to actual demands, To this end, we aim to design and implement a cloud resource management mechanism that manages underlying complexity, automates resource provisioning and controls client-perceived quality of service (QoS) while still achieving resource efficiency.
The design of an automatic resource management centers on two questions: when to adjust resource allocations and how much to adjust. In a cloud, applications have different definitions on capacity and cloud dynamics makes it difficult to determine a static resource to performance relationship. In this dissertation, we have proposed a generic metric that measures application capacity, designed model-independent and adaptive approaches to manage resources and built a cloud management system scalable to a cluster of machines.
To understand web system capacity, we propose to use a metric of
productivity index (PI), which is defined as the ratio of yield to
cost, to measure the system processing capability online. PI is a generic concept that can be applied to different levels to monitor system progress in order to identify if more capacity is needed. We applied the concept of PI to the problem of overload prevention in multi-tier websites. The overload predictor built on the PI metric shows more accurate and responsive overload prevention compared to conventional approaches.
To address the issue of the lack of accurate server model, we propose a model-independent fuzzy control based approach for CPU allocation. For adaptive and stable control performance, we embed the controller with self-tuning output amplification and flexible rule selection. Finally, we build a QoS provisioning framework that supports multi-objective QoS control and service differentiation. Experiments on a virtual cluster with two service classes show the effectiveness of our approach in both performance and power control.
To address the problems of complex interplay between resources and process delays in fine-grained multi-resource allocation, we consider capacity management as a decision-making problem and employ reinforcement learning (RL) to optimize the process. The optimization depends on the trial-and-error interactions with the cloud system. In order to improve the initial management performance, we propose a model-based RL algorithm. The neural network based environment model, which is learned from previous management history, generates simulated resource allocations for the RL agent. Experiment results on heterogeneous applications show that our approach makes efficient use of limited interactions and find near optimal resource configurations within 7 steps.
Finally, we present a distributed reinforcement learning approach to the cluster-wide cloud resource management. We decompose the cluster-wide resource allocation problem into sub-problems concerning individual VM resource configurations. The cluster-wide allocation is optimized if individual VMs meet their SLA with a high resource utilization. For scalability, we develop an efficient reinforcement learning approach with continuous state space. For adaptability, we use VM low-level runtime statistics to accommodate workload dynamics. Prototyped in a iBalloon system, the distributed learning approach successfully manages 128 VMs on a 16-node close correlated cluster
DEPAS: A Decentralized Probabilistic Algorithm for Auto-Scaling
The dynamic provisioning of virtualized resources offered by cloud computing
infrastructures allows applications deployed in a cloud environment to
automatically increase and decrease the amount of used resources. This
capability is called auto-scaling and its main purpose is to automatically
adjust the scale of the system that is running the application to satisfy the
varying workload with minimum resource utilization. The need for auto-scaling
is particularly important during workload peaks, in which applications may need
to scale up to extremely large-scale systems.
Both the research community and the main cloud providers have already
developed auto-scaling solutions. However, most research solutions are
centralized and not suitable for managing large-scale systems, moreover cloud
providers' solutions are bound to the limitations of a specific provider in
terms of resource prices, availability, reliability, and connectivity.
In this paper we propose DEPAS, a decentralized probabilistic auto-scaling
algorithm integrated into a P2P architecture that is cloud provider
independent, thus allowing the auto-scaling of services over multiple cloud
infrastructures at the same time. Our simulations, which are based on real
service traces, show that our approach is capable of: (i) keeping the overall
utilization of all the instantiated cloud resources in a target range, (ii)
maintaining service response times close to the ones obtained using optimal
centralized auto-scaling approaches.Comment: Submitted to Springer Computin
Software-Defined Cloud Computing: Architectural Elements and Open Challenges
The variety of existing cloud services creates a challenge for service
providers to enforce reasonable Software Level Agreements (SLA) stating the
Quality of Service (QoS) and penalties in case QoS is not achieved. To avoid
such penalties at the same time that the infrastructure operates with minimum
energy and resource wastage, constant monitoring and adaptation of the
infrastructure is needed. We refer to Software-Defined Cloud Computing, or
simply Software-Defined Clouds (SDC), as an approach for automating the process
of optimal cloud configuration by extending virtualization concept to all
resources in a data center. An SDC enables easy reconfiguration and adaptation
of physical resources in a cloud infrastructure, to better accommodate the
demand on QoS through a software that can describe and manage various aspects
comprising the cloud environment. In this paper, we present an architecture for
SDCs on data centers with emphasis on mobile cloud applications. We present an
evaluation, showcasing the potential of SDC in two use cases-QoS-aware
bandwidth allocation and bandwidth-aware, energy-efficient VM placement-and
discuss the research challenges and opportunities in this emerging area.Comment: Keynote Paper, 3rd International Conference on Advances in Computing,
Communications and Informatics (ICACCI 2014), September 24-27, 2014, Delhi,
Indi
Effective Resource and Workload Management in Data Centers
The increasing demand for storage, computation, and business continuity has driven the growth of data centers. Managing data centers efficiently is a difficult task because of the wide variety of datacenter applications, their ever-changing intensities, and the fact that application performance targets may differ widely. Server virtualization has been a game-changing technology for IT, providing the possibility to support multiple virtual machines (VMs) simultaneously. This dissertation focuses on how virtualization technologies can be utilized to develop new tools for maintaining high resource utilization, for achieving high application performance, and for reducing the cost of data center management.;For multi-tiered applications, bursty workload traffic can significantly deteriorate performance. This dissertation proposes an admission control algorithm AWAIT, for handling overloading conditions in multi-tier web services. AWAIT places on hold requests of accepted sessions and refuses to admit new sessions when the system is in a sudden workload surge. to meet the service-level objective, AWAIT serves the requests in the blocking queue with high priority. The size of the queue is dynamically determined according to the workload burstiness.;Many admission control policies are triggered by instantaneous measurements of system resource usage, e.g., CPU utilization. This dissertation first demonstrates that directly measuring virtual machine resource utilizations with standard tools cannot always lead to accurate estimates. A directed factor graph (DFG) model is defined to model the dependencies among multiple types of resources across physical and virtual layers.;Virtualized data centers always enable sharing of resources among hosted applications for achieving high resource utilization. However, it is difficult to satisfy application SLOs on a shared infrastructure, as application workloads patterns change over time. AppRM, an automated management system not only allocates right amount of resources to applications for their performance target but also adjusts to dynamic workloads using an adaptive model.;Server consolidation is one of the key applications of server virtualization. This dissertation proposes a VM consolidation mechanism, first by extending the fair load balancing scheme for multi-dimensional vector scheduling, and then by using a queueing network model to capture the service contentions for a particular virtual machine placement
A Reliable and Cost-Efficient Auto-Scaling System for Web Applications Using Heterogeneous Spot Instances
Cloud providers sell their idle capacity on markets through an auction-like
mechanism to increase their return on investment. The instances sold in this
way are called spot instances. In spite that spot instances are usually 90%
cheaper than on-demand instances, they can be terminated by provider when their
bidding prices are lower than market prices. Thus, they are largely used to
provision fault-tolerant applications only. In this paper, we explore how to
utilize spot instances to provision web applications, which are usually
considered availability-critical. The idea is to take advantage of differences
in price among various types of spot instances to reach both high availability
and significant cost saving. We first propose a fault-tolerant model for web
applications provisioned by spot instances. Based on that, we devise novel
auto-scaling polices for hourly billed cloud markets. We implemented the
proposed model and policies both on a simulation testbed for repeatable
validation and Amazon EC2. The experiments on the simulation testbed and the
real platform against the benchmarks show that the proposed approach can
greatly reduce resource cost and still achieve satisfactory Quality of Service
(QoS) in terms of response time and availability
Snapshot Provisioning of Cloud Application Stacks to Face Traffic Surges
Traffic surges, like the Slashdot effect, occur when a web application is overloaded by a huge number of requests, potentially leading to unavailability. Unfortunately, such traffic variations are generally totally unplanned, of great amplitude, within a very short period, and a variable delay to return to a normal regime. In this report, we introduce PeakForecast as an elastic middleware solution to detect and absorb a traffic surge. In particular, PeakForecast can, from a trace of queries received in the last seconds, minutes or hours, to detect if the underlying system is facing a traffic surge or not, and then estimate the future traffic using a forecast model with an acceptable precision, thereby calculating the number of resources required to absorb the remaining traffic to come. We validate our solution by experimental results demonstrating that it can provide instantaneous elasticity of resources for traffic surges observed on the Japanese version of Wikipedia during the Fukushima Daiichi nuclear disaster in March 2011.Les pics de trafic, tels que l'effet Slashdot, apparaissent lorsqu'une application web doit faire face un nombre important de requêtes qui peut potentiellement entraîner une mise hors service de l'application. Malheureusement, de telles variations de traffic sont en général totalement imprévues et d'une grande amplitude, arrivent pendant une très courte période de temps et le retour à un régime normal prend un délai variable. Dans ce rapport, nous présentons PeakForecast qui est une solution intergicielle élastique pour détecter et absorber les pics de trafic. En particulier, PeakForecast peut, à partir des traces de requêtes reçues dans les dernières secondes, minutes ou heures, détecter si le système sous-jacent fait face ou non à un pic de trafic, estimer le trafic futur en utilisant un modèle de prédiction suffisamment précis, et calculer le nombre de ressources nécessaires à l'absorption du trafic restant à venir. Nous validons notre solution avec des résultats expérimentaux qui démontrent qu'elle fournit une élasticité instantanée des ressources pour des pics de trafic qui ont été observés sur la version japonaise de Wikipedia lors de l'accident nucléaire de Fukushima Daiichi en mars 2011
Snapshot Provisioning of Cloud Application Stacks to Face Traffic Surges
Traffic surges, like the Slashdot effect, occur when a web application is overloaded by a huge number of requests, potentially leading to unavailability. Unfortunately, such traffic variations are generally totally unplanned, of great amplitude, within a very short period, and a variable delay to return to a normal regime. In this report, we introduce PeakForecast as an elastic middleware solution to detect and absorb a traffic surge. In particular, PeakForecast can, from a trace of queries received in the last seconds, minutes or hours, to detect if the underlying system is facing a traffic surge or not, and then estimate the future traffic using a forecast model with an acceptable precision, thereby calculating the number of resources required to absorb the remaining traffic to come. We validate our solution by experimental results demonstrating that it can provide instantaneous elasticity of resources for traffic surges observed on the Japanese version of Wikipedia during the Fukushima Daiichi nuclear disaster in March 2011.Les pics de trafic, tels que l'effet Slashdot, apparaissent lorsqu'une application web doit faire face un nombre important de requêtes qui peut potentiellement entraîner une mise hors service de l'application. Malheureusement, de telles variations de traffic sont en général totalement imprévues et d'une grande amplitude, arrivent pendant une très courte période de temps et le retour à un régime normal prend un délai variable. Dans ce rapport, nous présentons PeakForecast qui est une solution intergicielle élastique pour détecter et absorber les pics de trafic. En particulier, PeakForecast peut, à partir des traces de requêtes reçues dans les dernières secondes, minutes ou heures, détecter si le système sous-jacent fait face ou non à un pic de trafic, estimer le trafic futur en utilisant un modèle de prédiction suffisamment précis, et calculer le nombre de ressources nécessaires à l'absorption du trafic restant à venir. Nous validons notre solution avec des résultats expérimentaux qui démontrent qu'elle fournit une élasticité instantanée des ressources pour des pics de trafic qui ont été observés sur la version japonaise de Wikipedia lors de l'accident nucléaire de Fukushima Daiichi en mars 2011
- …