1,149 research outputs found
Energy-Aware Cloud Management through Progressive SLA Specification
Novel energy-aware cloud management methods dynamically reallocate
computation across geographically distributed data centers to leverage regional
electricity price and temperature differences. As a result, a managed VM may
suffer occasional downtimes. Current cloud providers only offer high
availability VMs, without enough flexibility to apply such energy-aware
management. In this paper we show how to analyse past traces of dynamic cloud
management actions based on electricity prices and temperatures to estimate VM
availability and price values. We propose a novel SLA specification approach
for offering VMs with different availability and price values guaranteed over
multiple SLAs to enable flexible energy-aware cloud management. We determine
the optimal number of such SLAs as well as their availability and price
guaranteed values. We evaluate our approach in a user SLA selection simulation
using Wikipedia and Grid'5000 workloads. The results show higher customer
conversion and 39% average energy savings per VM.
Comment: 14 pages, conferenc
SLA Management in Intent-Driven Service Management Systems: A Taxonomy and Future Directions
Traditionally, network and system administrators are responsible for
designing, configuring, and resolving Internet service requests.
Human-driven system configuration and management are proving unsatisfactory due
to the recent interest in time-sensitive applications with stringent quality of
service (QoS). Aiming to transition from the traditional human-driven to
zero-touch service management in the field of networks and computing,
intent-driven service management (IDSM) has been proposed as a response to
stringent quality of service requirements. In IDSM, users express their service
requirements in a declarative manner as intents. IDSM, with the help of closed
control-loop operations, performs configurations and deployments autonomously
to meet service request requirements. The result is faster deployment of
Internet services and a reduction in configuration errors caused by manual
operations, which in turn reduces service-level agreement (SLA) violations.
In the early stages of development, IDSM systems require attention from
industry as well as academia. In an attempt to fill the gaps in current
research, we conducted a systematic literature review of SLA management in IDSM
systems. As an outcome, we have identified four IDSM intent management
activities and proposed a taxonomy for each activity. Analysis of all studies
and future research directions are presented in the conclusions.
Comment: Extended version of the preprint submitted at ACM Computing Surveys
(CSUR
Progressive introduction of network softwarization in operational telecom networks: advances at architectural, service and transport levels
Technological paradigms such as Software Defined Networking, Network Function
Virtualization and Network Slicing are altogether offering new ways of providing services.
This process is widely known as Network Softwarization, where traditional operational
networks adopt capabilities and mechanisms inherited from the computing world, such as
programmability, virtualization and multi-tenancy.
This adoption brings a number of challenges, both from the technological and operational
perspectives. On the other hand, it provides unprecedented flexibility, opening
opportunities to develop new services and new ways of exploiting and consuming telecom
networks.
This thesis first gives an overview of the implications of the progressive introduction of network
softwarization in operational networks, and then details some advances at different levels,
namely architectural, service and transport levels. It is done through specific exemplary use
cases and evolution scenarios, with the goal of illustrating both new possibilities and existing
gaps for the ongoing transition towards an advanced future mode of operation.
This is performed from the perspective of a telecom operator, paying special attention to
how to integrate all these paradigms into operational networks to assist their evolution
towards new, more sophisticated service demands.
Doctoral Programme in Telematics Engineering, Universidad Carlos III de Madrid
President: Eduardo Juan Jacob Taquet.- Secretary: Francisco Valera Pintor.- Member: Jorge López Vizcaín
START: Straggler Prediction and Mitigation for Cloud Computing Environments using Encoder LSTM Networks
A common performance problem in large-scale cloud systems is dealing with straggler tasks, slow-running instances that increase the overall response time. Such tasks impact the system's QoS and the SLA. There is a need for automatic straggler detection and mitigation mechanisms that execute jobs without violating the SLA. Prior work typically builds reactive models that focus first on detection and then on mitigation of straggler tasks, which leads to delays. Other works use prediction-based proactive mechanisms, but ignore volatile task characteristics. We propose a Straggler Prediction and Mitigation Technique (START) that is able to predict which tasks might be stragglers and dynamically adapt scheduling to achieve lower response times. START analyzes all tasks and hosts based on compute and network resource consumption using an Encoder LSTM network to predict and mitigate expected straggler tasks. This reduces the SLA violation rate and execution time without compromising QoS. Specifically, we use the CloudSim toolkit to simulate START and compare it with IGRU-SD, SGC, Dolly, GRASS, NearestFit and Wrangler in terms of QoS parameters. Experiments show that START reduces execution time, resource contention, energy and SLA violations by 13%, 11%, 16% and 19%, respectively, compared to the state-of-the-art
Scalable and Distributed Resource Management Protocols for Cloud and Big Data Clusters
Cloud data centers require an operating system to manage resources and satisfy operational requirements and management objectives. The growing popularity of cloud services has given rise to a new spectrum of services with sophisticated workload and resource management requirements. Data centers are also growing through the addition of various types of hardware to accommodate the ever-increasing requests of users. Nowadays a large percentage of cloud resources execute data-intensive applications, which exhibit continuously fluctuating workloads and need specific resource management. To this end, cluster computing frameworks are shifting towards distributed resource management for better scalability and faster decision making. Such systems benefit from the parallelization of control and are resilient to failures. Throughout this thesis we investigate algorithms, protocols and techniques to address these challenges in large-scale data centers. We introduce a distributed resource management framework which consolidates virtual machines onto as few servers as possible to reduce the energy consumption of the data center and hence decrease the cost to cloud providers. This framework can characterize the workload of virtual machines and hence efficiently handle the trade-off between energy consumption and the Service Level Agreement (SLA) of customers. The algorithm is highly scalable, requires low maintenance cost with dynamic workloads, and tries to minimize virtual machine migration costs. We also introduce a scalable and distributed probe-based scheduling algorithm for Big Data analytics frameworks. This algorithm can efficiently address the problem of job heterogeneity in workloads that has appeared after increasing the level of parallelism in jobs. The algorithm is massively scalable and can significantly reduce average job completion times in comparison with the state of the art. Finally, we propose a probabilistic fault-tolerance technique as part of the scheduling algorithm