Modelling email traffic workloads with RNN and LSTM models
Analysis of time series data has been a challenging research subject for decades. Email traffic has recently been modelled as a time series function using a Recurrent Neural Network (RNN), and RNNs were shown to provide higher prediction accuracy than previous probabilistic models from the literature. Given the exponential rise in email workloads that email servers must handle, in this paper we first present and discuss the literature on modelling email traffic. We then explain the advantages and limitations of the different approaches, as well as their points of agreement and disagreement. Finally, we present a comprehensive comparison between the performance of RNN and Long Short-Term Memory (LSTM) models. Our experimental results demonstrate that both approaches can achieve high accuracy over four large datasets acquired from different universities' servers, outperforming existing work, and show that the use of LSTM and RNN models is very promising for modelling email traffic.
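Both the RNN and LSTM approaches described above start by framing the traffic series as supervised learning data. A minimal sketch of that windowing step (the counts and window length are illustrative, not taken from the paper's datasets):

```python
def make_windows(series, lookback):
    """Frame a univariate series as supervised (X, y) pairs: each X row is
    `lookback` consecutive counts and y is the count that follows."""
    X = [series[i:i + lookback] for i in range(len(series) - lookback)]
    y = series[lookback:]
    return X, y

# Hypothetical hourly email counts (not from the paper's datasets)
counts = [120, 98, 87, 140, 210, 305, 280, 190]
X, y = make_windows(counts, lookback=3)
print(X[0], "->", y[0])  # [120, 98, 87] -> 140
```

The resulting pairs can then be fed to any sequence model; the choice of `lookback` is a tunable assumption.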
Recommended from our members
Scheduling, Characterization and Prediction of HPC Workloads for Distributed Computing Environments
As High Performance Computing (HPC) has grown considerably and is expected to grow even more, effective resource management for distributed computing systems is more strongly motivated than ever. As computational workloads grow in quantity, it becomes increasingly crucial to apply efficient resource management and workload scheduling, using resources efficiently while keeping computational performance reasonably good. The problem of efficiently scheduling workloads on resources while meeting performance standards is hard. Additionally, non-clairvoyance of job dimensions makes resource management even harder in real-world scenarios. Our research methodology investigates the scheduling problem for HPC and examines the challenges of deploying such scheduling in real-world scenarios using state-of-the-art machine learning and data science techniques. To this end, this Ph.D. dissertation makes the following core contributions: a) We perform a theoretical analysis of space-sharing, non-preemptive scheduling: we studied this scheduling problem and proposed scheduling algorithms with polynomial computation time. We also proved constant upper bounds on the performance of these algorithms. b) We studied the sensitivity of scheduling algorithms to runtime-estimate accuracy and devised a meta-learning approach to estimate prediction accuracy for jobs newly submitted to the HPC system. c) We studied the runtime prediction problem for HPC applications. For this purpose, we studied the distribution of available public workloads and proposed two different solutions that can predict multi-modal distributions: switching state-space models and Mixture Density Networks. d) We studied the effectiveness of recent recurrent neural network models for CPU usage trace prediction, for individual VM traces as well as aggregate CPU usage traces.
In this dissertation, we explore solutions to improve the performance of scheduling workloads on distributed systems. We begin by looking at the problem from the theoretical perspective. Modeling the problem mathematically, we first propose a scheduling algorithm that finds a constant approximation of the optimal solution in polynomial time. We prove that the performance of the algorithm (average completion time) is a constant approximation of the performance of the optimal scheduling. We next look at the problem in real-world scenarios. Considering High-Performance Computing (HPC) workload computing environments as the closest real-world equivalent of our mathematical model, we explore the problem of predicting application runtime. We propose an algorithm to handle the uncertainties present in the real world and showcase its effectiveness in terms of response time and resource utilization. After addressing the uncertainty problem, we focus on improving the accuracy of existing prediction approaches for HPC application runtime. We propose two solutions, one based on Kalman filters and one based on mixture density networks. We showcase the effectiveness of our prediction approaches by comparing them with previous approaches in terms of prediction accuracy and impact on scheduling performance. Finally, we focus on predicting resource usage for individual applications during their execution. We explore the application of recurrent neural networks to predicting the resource usage of applications deployed on individual virtual machines. To validate our proposed models and solutions, we performed extensive trace-driven simulation and measured the effectiveness of our approaches.
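One of the two prediction solutions mentioned above is based on Kalman filters. As a minimal illustration of the idea, not the dissertation's actual formulation, a scalar Kalman filter can track the evolving mean runtime of a recurring job (the noise parameters `q` and `r` are assumed values):

```python
def kalman_runtime(observations, q=1.0, r=10.0):
    """Scalar Kalman filter: track the evolving mean runtime of a recurring
    job. q = process noise, r = measurement noise (both assumed here)."""
    x, p = observations[0], 1.0   # initial state estimate and its variance
    estimates = [x]
    for z in observations[1:]:
        p += q                    # predict: variance grows by process noise
        k = p / (p + r)           # Kalman gain
        x += k * (z - x)          # correct the estimate toward the observation
        p *= (1 - k)              # shrink variance after the update
        estimates.append(x)
    return estimates

# Hypothetical runtimes (seconds) of repeated submissions of one job
est = kalman_runtime([100, 110, 95, 120, 105])
```

Each estimate blends the running prediction with the newest observed runtime, which smooths out the noise in individual runs.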
MetaNet: automated dynamic selection of scheduling policies in cloud environments
Task scheduling is a well-studied problem in the context of optimizing the Quality of Service (QoS) of cloud computing environments. In order to sustain the rapid growth of computational demands, one of the most important QoS metrics for cloud schedulers is the execution cost. In this regard, several data-driven deep neural network (DNN) based schedulers have been proposed in recent years to allow scalable and efficient resource management in dynamic workload settings. However, optimal scheduling frequently relies on sophisticated DNNs with high computational needs, implying higher execution costs. Further, even in non-stationary environments, sophisticated schedulers might not always be required, and we could briefly rely on low-cost schedulers in the interest of cost-efficiency. Therefore, this work aims to solve the non-trivial meta-problem of online dynamic selection of a scheduling policy using a surrogate model called MetaNet. Unlike traditional solutions with a fixed scheduling policy, MetaNet chooses on the fly a scheduler from a large set of DNN-based methods to optimize task scheduling and execution costs in tandem. Compared to state-of-the-art DNN schedulers, this allows for improvements in execution cost, energy consumption, response time and service level agreement violations by up to 11, 43, 8 and 13 percent, respectively.
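The core selection logic can be illustrated with a toy policy chooser: prefer the cheapest scheduler whose predicted QoS clears a threshold, falling back to the highest-QoS one otherwise. The scheduler names, costs, and the `predict_qos` interface are hypothetical stand-ins for MetaNet's learned surrogate, not the paper's implementation:

```python
def select_scheduler(schedulers, predict_qos, qos_threshold):
    """Pick the cheapest scheduler whose predicted QoS meets the threshold;
    fall back to the best-QoS scheduler if none qualifies. `predict_qos`
    stands in for the surrogate model (hypothetical interface)."""
    viable = [s for s in schedulers if predict_qos(s) >= qos_threshold]
    if viable:
        return min(viable, key=lambda s: s["cost"])
    return max(schedulers, key=predict_qos)

# Illustrative scheduler pool with relative execution costs
schedulers = [
    {"name": "heuristic", "cost": 1},
    {"name": "small-dnn", "cost": 5},
    {"name": "large-dnn", "cost": 20},
]
qos = {"heuristic": 0.70, "small-dnn": 0.85, "large-dnn": 0.95}
chosen = select_scheduler(schedulers, lambda s: qos[s["name"]], 0.80)
print(chosen["name"])  # small-dnn
```

Switching the threshold up or down models the cost/QoS trade-off the paper exploits in non-stationary workloads.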
Adaptive prediction models for data center resources utilization estimation
Accurate estimation of data center resource utilization is a challenging task due to multi-tenant co-hosted applications having dynamic and time-varying workloads. Accurate estimation of future resource utilization helps in better job scheduling, workload placement, capacity planning, proactive auto-scaling, and load balancing. Inaccurate estimation leads to either under- or over-provisioning of data center resources. Most existing estimation methods are based on a single model that often does not appropriately estimate different workload scenarios. To address these problems, we propose a novel method to adaptively and automatically identify the most appropriate model to accurately estimate data center resource utilization. The proposed approach trains a classifier based on statistical features of historical resource usage to decide the appropriate prediction model to use for given resource utilization observations collected during a specific time interval. We evaluated our approach on real datasets and compared the results with multiple baseline methods. The experimental evaluation shows that the proposed approach outperforms the state-of-the-art approaches and delivers 6% to 27% improved resource utilization estimation accuracy compared to baseline methods. This work is partially supported by the European Research Council (ERC) under the EU Horizon 2020 programme (GA 639595), the Spanish Ministry of Economy, Industry and Competitiveness (TIN2015-65316-P and IJCI2016-27485), the Generalitat de Catalunya (2014-SGR-1051), NPRP grant # NPRP9-224-1-049 from the Qatar National Research Fund (a member of Qatar Foundation), and the University of the Punjab, Pakistan.
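The dispatch idea, choosing a predictor from statistical features of the recent usage window, can be sketched as follows. The two-way split, the coefficient-of-variation feature, and the 0.3 threshold are illustrative assumptions, not the paper's trained classifier:

```python
import statistics

def pick_model(window):
    """Dispatch on a simple statistical feature of the recent usage window:
    high variability -> a model suited to bursty load, else a smooth one.
    Feature choice and threshold are illustrative, not the paper's classifier."""
    mean = statistics.fmean(window)
    cv = statistics.pstdev(window) / mean if mean else 0.0  # coefficient of variation
    return "bursty-model" if cv > 0.3 else "smooth-model"

print(pick_model([50, 52, 49, 51]))  # smooth-model
print(pick_model([10, 90, 5, 80]))   # bursty-model
```

In the paper's approach the classifier is trained on many such features of historical usage rather than a single hand-set threshold.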
Data-Driven Methods for Data Center Operations Support
During the last decade, cloud technologies have been evolving at an impressive pace, such that we are now living in a cloud-native era where developers can leverage an unprecedented landscape of (possibly managed) services for orchestration, compute, storage, load-balancing, monitoring, etc. The possibility of having on-demand access to a diverse set of configurable virtualized resources allows for building more elastic, flexible and highly resilient distributed applications. Behind the scenes, cloud providers sustain the heavy burden of maintaining the underlying infrastructures, consisting of large-scale distributed systems, partitioned and replicated among many geographically dispersed data centers to guarantee scalability, robustness to failures, high availability and low latency. The larger the scale, the more cloud providers have to deal with complex interactions among the various components, such that monitoring, diagnosing and troubleshooting issues become incredibly daunting tasks.
To keep up with these challenges, development and operations practices have undergone significant transformations, especially in terms of improving the automations that make releasing new software, and responding to unforeseen issues, faster and more sustainable at scale. The resulting paradigm is nowadays referred to as DevOps. However, while such automations can be very sophisticated, traditional DevOps practices fundamentally rely on reactive mechanisms that typically require careful manual tuning and supervision from human experts. To minimize the risk of outages, and the related costs, it is crucial to provide DevOps teams with suitable tools that enable a proactive approach to data center operations.
This work presents a comprehensive data-driven framework to address the most relevant problems that can be experienced in large-scale distributed cloud infrastructures. These environments are indeed characterized by a very large availability of diverse data, collected at each level of the stack, such as: time series (e.g., physical host measurements, virtual machine or container metrics, networking component logs, application KPIs); graphs (e.g., network topologies, fault graphs reporting dependencies among hardware and software components, performance-issue propagation networks); and text (e.g., source code, system logs, version control system history, code review feedback). Such data are also typically updated with relatively high frequency, and are subject to distribution drifts caused by continuous configuration changes to the underlying infrastructure. In such a highly dynamic scenario, traditional model-driven approaches alone may be inadequate for capturing the complexity of the interactions among system components. DevOps teams would certainly benefit from robust data-driven methods to support their decisions based on historical information. For instance, effective anomaly detection capabilities may also help in conducting more precise and efficient root-cause analysis. Likewise, leveraging accurate forecasting and intelligent control strategies would improve resource management.
Given their ability to deal with high-dimensional, complex data, Deep Learning-based methods are the most straightforward option for realizing the aforementioned support tools. On the other hand, because of their complexity, models of this kind often require huge processing power, and suitable hardware, to be operated effectively at scale. These aspects must be carefully addressed when applying such methods in the context of data center operations. Automated operations approaches must be dependable and cost-efficient, so as not to degrade the services they are built to improve.
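As a concrete flavor of the anomaly detection capability mentioned above, a minimal z-score detector over a metric trace looks like this. The metric values and the 2.5-sigma threshold are illustrative; the framework itself would use far richer models:

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Flag indices whose value deviates from the mean by more than
    `threshold` standard deviations -- a minimal stand-in for the kind of
    detector applied to host or container metrics."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # a flat trace has no outliers
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]

# Hypothetical CPU-utilization samples (%) with one spike
cpu = [31, 30, 29, 32, 31, 30, 95, 31, 30]
print(zscore_anomalies(cpu, threshold=2.5))  # [6]
```

In production such a detector would run over streaming windows and feed its flags into root-cause analysis rather than simply printing indices.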
Online Contextual System Tuning with Bayesian Optimization and Workload Forecasting
Tuning modern software systems can be tremendously challenging: the huge number of configuration parameters and their complex dependencies make the manual search for the optimal configuration tedious and time-consuming. Furthermore, the optimal configuration depends on the workload under which the system is running.
This thesis presents the work done to extend an existing performance tuner to be directly applied to a production environment exploiting the real workload perceived by the system, i.e. while it is serving its clients, hence the term Online System Tuning. This approach avoids the necessity of analyzing and replicating the workload on a replica of the system but poses new challenges.
To apply the tuner directly to production environments, two main modules were developed: a workload forecasting module, based on state-of-the-art techniques that minimize the need for manual work, and a stability finder module, used to decide when to perform tuning experiments. Together, these two modules reduce the probability of testing a new, possibly mistaken software configuration during a workload change, which would otherwise cause the system's clients to suffer Quality of Service losses.
Moreover, by directly tuning the production system the effort of running the tuner is reduced, meaning that it is easier and faster to apply to different scenarios.
The proposed solution was validated through 20 tuning experiments on two DBMS models, highlighting that the integration of forecasting techniques improves the safety of the tuning process while preserving the effectiveness of the original tuner.
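The stability finder's role, gating tuning experiments on workload steadiness, can be sketched with a coefficient-of-variation test. The CV criterion and the 0.1 bound are assumptions for illustration, not the thesis's exact check:

```python
import statistics

def stable_enough(recent_load, max_cv=0.1):
    """Stability-finder sketch: allow a tuning experiment only when the
    coefficient of variation of recent workload samples is below max_cv.
    The CV criterion is an assumption, not the thesis's exact test."""
    mean = statistics.fmean(recent_load)
    if mean == 0:
        return False  # no traffic: nothing meaningful to tune against
    return statistics.pstdev(recent_load) / mean <= max_cv

print(stable_enough([200, 205, 198, 202]))  # True: steady load
print(stable_enough([200, 350, 120, 400]))  # False: workload is shifting
```

Gating experiments this way keeps the optimizer from attributing a performance change to a configuration when the workload itself was the cause.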
Workload Analysis of Cloud Resources using Time Series and Machine Learning Prediction
© 2019 IEEE. Most businesses nowadays have started using cloud platforms to host their software applications. A cloud platform is a shared resource that provides various services, such as software as a service (SaaS), infrastructure as a service (IaaS) or anything as a service (XaaS), required to develop and deploy any business application. These cloud services are provided as virtual machines (VMs) that handle the end users' requirements. Cloud providers must ensure efficient resource-handling mechanisms across different time intervals to avoid wastage of resources. Auto-scaling mechanisms take care of using these resources appropriately while providing an excellent quality of service. Auto-scaling helps cloud service providers achieve the goal of supplying the required resources automatically: it uses methods that estimate the number of incoming requests and decide which resources to allocate or release based on the workload. The workload consists of some quantity of application programs running on the machine, usually with some number of users connected to and communicating with those applications. Researchers have used various approaches for auto-scaling, the process of predicting the workload required to handle end users' requests and provisioning the necessary resources as virtual machines without disruption. Along with providing uninterrupted service, businesses pay only for the services they use, which has increased the popularity of cloud computing. Based on the identified workload, resources are provisioned. Resource provisioning is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, applications, and services), with resources released when no longer required.
In this regard, the aim of this paper is to develop a framework that predicts the workload using deep learning and can handle the provisioning of cloud resources dynamically. This framework would handle user requests efficiently and allocate the required virtual machines. As a result, an efficient dynamic method of provisioning cloud services would be implemented, supporting both cloud providers and users.
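The provisioning step that follows the workload prediction can be sketched as a simple sizing rule. The capacity figure and headroom factor are assumed values; in the proposed framework the request rate would come from the deep-learning predictor:

```python
import math

def vms_needed(request_rate, capacity_per_vm, headroom=0.2):
    """Size the VM pool for the predicted request rate plus a safety
    headroom. capacity_per_vm and headroom are illustrative assumptions."""
    return max(1, math.ceil(request_rate * (1 + headroom) / capacity_per_vm))

print(vms_needed(950, capacity_per_vm=300))  # 4 VMs for ~1140 req/s
```

An autoscaler would re-evaluate this sizing each prediction interval, scaling the pool up or down as the forecast changes.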
Modelling the Minimum Number of Virtualization Cluster Nodes for a Private University Cloud
Cloud computing is a dynamically evolving computing paradigm. The demand for cloud applications and technologies has especially increased during the COVID-19 pandemic and martial law in Ukraine. The main purpose of using cloud applications and technologies is to free users of cloud resources from managing hardware and software. One of the challenges in designing a private university cloud is estimating the required number of virtualization cluster nodes. These nodes host the virtual machines (VMs) of users; the VMs can be used by students and teachers to complete academic assignments as well as scientific work. The second task is to optimize the placement of VMs in the computer network (CN) of the university, which makes it possible to reduce the number of CN nodes without affecting functionality. This ultimately helps to reduce the cost of deploying a private university cloud, which is especially important for Ukrainian universities under martial law. The article proposes a model for estimating the required number of virtualization cluster nodes for a private university cloud. The model is based on a combined approach that jointly solves an optimal packing problem and finds, using a genetic algorithm, the configuration of server platforms for the private university cloud.
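The packing side of the combined approach can be illustrated with first-fit decreasing, a classic greedy heuristic for placing VM demands onto the fewest nodes. It is a stand-in for, not a reproduction of, the article's optimal-packing plus genetic-algorithm combination; the vCPU figures are hypothetical:

```python
def pack_vms(vm_demands, node_capacity):
    """First-fit decreasing: place VM resource demands onto the fewest
    nodes. A greedy stand-in for the article's combined optimal-packing
    and genetic-algorithm approach."""
    nodes = []  # remaining capacity per opened node
    for demand in sorted(vm_demands, reverse=True):
        for i, free in enumerate(nodes):
            if demand <= free:      # fits on an already-opened node
                nodes[i] -= demand
                break
        else:                       # no existing node fits: open a new one
            nodes.append(node_capacity - demand)
    return len(nodes)

# Hypothetical VM vCPU demands packed onto 16-vCPU nodes
print(pack_vms([8, 6, 6, 4, 4, 2, 2], node_capacity=16))  # 2
```

The number returned is an upper estimate of the cluster size; the article's genetic algorithm additionally searches over server-platform configurations rather than assuming a fixed node capacity.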