Alioth: A Machine Learning Based Interference-Aware Performance Monitor for Multi-Tenancy Applications in Public Cloud
Multi-tenancy in public clouds may lead to co-location interference on shared
resources, which can degrade the performance of cloud applications. Cloud
providers want to know when such events happen and how serious the degradation
is, so that they can perform interference-aware migrations and alleviate the
problem. However, virtual machines (VMs) in Infrastructure-as-a-Service public
clouds are black boxes to providers, so application-level performance
information cannot be acquired. This makes performance monitoring especially
challenging, as cloud providers can only rely on low-level metrics such as CPU
usage and hardware counters.
We propose a novel machine learning framework, Alioth, to monitor the
performance degradation of cloud applications. To feed the data-hungry models,
we first design interference generators and conduct comprehensive co-location
experiments on a testbed to build the Alioth-dataset, which reflects the
complexity and dynamicity of real-world scenarios. Then we construct Alioth by
(1) augmenting features by recovering low-level metrics under no interference
using denoising auto-encoders, (2) devising a transfer learning model based on a
domain adaptation neural network so that models generalize to test cases unseen
in offline training, and (3) developing a SHAP explainer to automate feature
selection and enhance model interpretability. Experiments show that Alioth
achieves an average mean absolute error of 5.29% offline and 10.8% when testing
on applications unseen in the training stage, outperforming the baseline
methods. Alioth is also robust in signaling quality-of-service violations under
dynamic conditions. Finally, we demonstrate a possible application of Alioth's
interpretability, providing insights that benefit the decision-making of cloud
operators. The dataset and code of Alioth have been released on GitHub.
Comment: Accepted by 2023 IEEE International Parallel & Distributed Processing
Symposium (IPDPS)
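Alioth's headline results are mean absolute errors on predicted performance degradation, and the abstract also reports using it to signal QoS violations. The minimal Python sketch below computes both quantities, assuming degradation is expressed as percent slowdown relative to running without interference and that a violation is flagged whenever predicted degradation exceeds a fixed threshold. The function names, the 20% threshold, and the sample values are hypothetical and are not taken from Alioth's released code.

```python
from typing import List, Sequence


def mean_absolute_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Average absolute gap between predicted and measured degradation (percentage points)."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


def flag_qos_violations(predicted: Sequence[float], threshold: float) -> List[bool]:
    """Mark each monitoring window whose predicted degradation exceeds the QoS threshold."""
    return [p > threshold for p in predicted]


if __name__ == "__main__":
    # Hypothetical per-window degradation (% slowdown versus running without interference).
    predicted = [4.0, 12.5, 31.0, 8.0]
    measured = [5.0, 10.0, 35.0, 8.0]
    print(mean_absolute_error(predicted, measured))          # 1.875
    print(flag_qos_violations(predicted, threshold=20.0))    # [False, False, True, False]
```

A fixed threshold is the simplest possible alarm rule; a deployed monitor would more likely debounce over several windows to avoid flapping.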
SymbioCity: Smart Cities for Smarter Networks
The "Smart City" (SC) concept revolves around the idea of embodying
cutting-edge ICT solutions in the very fabric of future cities, in order to
offer new and better services to citizens while lowering the city management
costs in monetary, social, and environmental terms. In this framework,
communication technologies are perceived as subservient to the SC services,
providing the means to collect and process the data needed to make the services
function. In this paper, we propose a new vision in which technology and SC
services are designed to take advantage of each other in a symbiotic manner.
According to this new paradigm, which we call "SymbioCity", SC services can
indeed be exploited to improve the performance of the same communication
systems that provide them with data. Suggestive examples of this symbiotic
ecosystem are discussed in the paper. The argument is then substantiated in
a proof-of-concept case study, where we show how the traffic monitoring service
provided by the London Smart City initiative can be used to predict the density
of users in a certain zone and optimize the cellular service in that area.
Comment: 14 pages, submitted for publication to ETT Transactions on Emerging
Telecommunications Technologies
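As a rough illustration of the SymbioCity case study, the sketch below fits an ordinary least-squares line from a traffic-monitoring signal (vehicle counts in a zone) to the number of cellular users in that zone, then sizes cellular capacity from the forecast. The linear model, the vehicle-count feature, the per-cell capacity, and all data values are hypothetical assumptions; the paper's actual predictor may differ.

```python
import math
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y ~ a + b * x; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def cells_needed(predicted_users: float, users_per_cell: int) -> int:
    """Number of cells (at least one) whose combined capacity covers the forecast."""
    return max(1, math.ceil(predicted_users / users_per_cell))


if __name__ == "__main__":
    # Hypothetical history for one zone: vehicles counted vs. cellular users observed.
    vehicles = [100, 200, 300, 400]
    users = [250, 450, 650, 850]
    a, b = fit_line(vehicles, users)
    forecast = a + b * 250             # users expected when 250 vehicles are counted
    print(a, b, forecast)              # 50.0 2.0 550.0
    print(cells_needed(forecast, users_per_cell=200))  # 3
```

The point of the example is the direction of data flow: a city service (traffic monitoring) feeds a decision inside the communication network that originally carried its data.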
Mage: Online Interference-Aware Scheduling in Multi-Scale Heterogeneous Systems
Heterogeneity has grown in popularity at both the core and server levels as a
way to improve performance and energy efficiency. However, despite these
benefits, scheduling applications in heterogeneous machines remains
challenging. Additionally, when these heterogeneous resources accommodate
multiple applications to increase utilization, resources are prone to
contention, destructive interference, and unpredictable performance. Existing
solutions examine heterogeneity either across or within a server, leading to
missed performance and efficiency opportunities. We present Mage, a practical
interference-aware runtime that optimizes performance and efficiency in systems
with intra- and inter-server heterogeneity. Mage leverages fast, online data
mining to quickly explore the space of application placements and determine
the one that minimizes destructive interference between co-resident
applications. Mage continuously monitors the performance of active
applications, and, upon detecting QoS violations, it determines whether
alternative placements would prove more beneficial, taking into account any
overheads from migration. Across 350 application mixes on a heterogeneous CMP,
Mage improves performance by 38% on average and up to 2x compared to a greedy
scheduler. Across 160 mixes on a heterogeneous cluster, Mage improves performance
by 30% on average and up to 52% over the greedy scheduler, and by 11% over the
combination of Paragon [15] for inter- and intra-server heterogeneity.
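The placement-and-migration logic described for Mage can be illustrated with a small Python sketch. It is a brute-force stand-in, not Mage's online data-mining approach, and it ignores server heterogeneity for brevity: it assumes a known pairwise interference penalty matrix (hypothetical values), exhaustively searches capacity-respecting placements for the one with the least total interference between co-resident applications, and migrates only when the predicted gain exceeds an assumed migration overhead. Exhaustive search is only viable for tiny mixes, which is why a fast exploration method matters at the scale the abstract reports.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

Placement = Tuple[int, ...]  # placement[i] is the index of the server hosting app i


def total_interference(placement: Placement, penalty: Sequence[Sequence[float]]) -> float:
    """Sum the pairwise interference penalty over every pair of co-resident apps."""
    cost = 0.0
    for i in range(len(placement)):
        for j in range(i + 1, len(placement)):
            if placement[i] == placement[j]:
                cost += penalty[i][j]
    return cost


def best_placement(num_servers: int, slots_per_server: int,
                   penalty: Sequence[Sequence[float]]) -> Tuple[Optional[Placement], float]:
    """Try every capacity-respecting placement and keep the least-interfering one."""
    best: Optional[Placement] = None
    best_cost = float("inf")
    for placement in product(range(num_servers), repeat=len(penalty)):
        if any(placement.count(s) > slots_per_server for s in range(num_servers)):
            continue  # violates per-server capacity
        cost = total_interference(placement, penalty)
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


def should_migrate(current: Placement, candidate: Placement,
                   penalty: Sequence[Sequence[float]], migration_overhead: float) -> bool:
    """Migrate only if the interference saved outweighs the (assumed) migration cost."""
    gain = total_interference(current, penalty) - total_interference(candidate, penalty)
    return gain > migration_overhead


if __name__ == "__main__":
    # Hypothetical symmetric penalties: apps 0/1 and 2/3 contend heavily when co-located.
    penalty = [
        [0.0, 0.9, 0.1, 0.2],
        [0.9, 0.0, 0.2, 0.1],
        [0.1, 0.2, 0.0, 0.8],
        [0.2, 0.1, 0.8, 0.0],
    ]
    current = (0, 0, 1, 1)  # the contending pairs currently share servers
    candidate, cost = best_placement(num_servers=2, slots_per_server=2, penalty=penalty)
    assert candidate is not None
    print(candidate, cost)  # (0, 1, 0, 1) 0.2
    print(should_migrate(current, candidate, penalty, migration_overhead=0.5))  # True
```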
Autonomous management of cost, performance, and resource uncertainty for migration of applications to infrastructure-as-a-service (IaaS) clouds
2014 Fall. Includes bibliographical references.
Infrastructure-as-a-Service (IaaS) clouds abstract physical hardware to provide computing resources on demand as a software service. This abstraction leads to the simplistic view that computing resources are homogeneous and that infinite scaling potential exists to easily resolve all performance challenges. In practice, however, adopting cloud computing presents many resource management challenges, forcing practitioners to balance cost and performance tradeoffs to successfully migrate applications. These challenges can be broken down into three primary concerns: determining what, where, and when infrastructure should be provisioned. In this dissertation we address these challenges, including: (1) performance variance from resource heterogeneity, virtualization overhead, and the plethora of vaguely defined resource types; (2) virtual machine (VM) placement, component composition, service isolation, provisioning variation, and resource contention for multitenancy; and (3) dynamic scaling and resource elasticity to alleviate performance bottlenecks. These resource management challenges are addressed through the development and evaluation of autonomous algorithms and methodologies that result in demonstrably better performance and lower monetary costs for application deployments to both public and private IaaS clouds. This dissertation makes three primary contributions to advance cloud infrastructure management for application hosting. First, it presents the design of resource utilization models based on step-wise multiple linear regression and artificial neural networks that support prediction of better-performing component compositions. The total number of possible compositions is given by Bell's number, which grows combinatorially and yields an explosive search space.
Second, it includes algorithms that improve VM placement to mitigate resource heterogeneity and contention, using a load-aware VM placement scheduler and autonomous detection of under-performing VMs to spur replacement. Third, it describes a workload cost prediction methodology that harnesses regression models and heuristics to support the selection of infrastructure alternatives that reduce hosting costs. Our methodology achieves infrastructure predictions with an average mean absolute error of only 0.3125 VMs across multiple workloads.
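The combinatorial growth mentioned in the dissertation's first contribution can be checked directly. Assuming a composition is any grouping of an application's n components into co-located sets (one set per VM), the number of compositions is the Bell number B_n. The Python sketch below computes Bell numbers with the Bell triangle and cross-checks B_4 by enumerating the set partitions of four hypothetical component names.

```python
from typing import Iterator, List, Sequence


def bell_numbers(n_max: int) -> List[int]:
    """Return [B_0, B_1, ..., B_n_max] built row by row with the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row...
        next_row = [row[-1]]
        # ...and each further entry adds its left neighbour to the entry above that neighbour.
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
        bells.append(row[0])  # the first entry of row k is B_k
    return bells


def set_partitions(items: Sequence[str]) -> Iterator[List[List[str]]]:
    """Yield every way to split `items` into non-empty, unordered groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing group in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first, *partition[i]]] + partition[i + 1:]
        # ...or place it in a group of its own.
        yield [[first]] + partition


if __name__ == "__main__":
    print(bell_numbers(10))  # [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975]
    # Hypothetical application components.
    components = ["web", "app", "cache", "db"]
    print(len(list(set_partitions(components))))  # 15, matching B_4
```

Already at ten components there are 115,975 candidate compositions, which motivates predicting good compositions with models rather than benchmarking each one.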