152 research outputs found
Fatias de rede fim-a-fim : da extração de perfis de funções de rede a SLAs granulares
Orientador: Christian Rodolfo Esteve RothenbergTese (doutorado) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de ComputaçãoResumo: Nos últimos dez anos, processos de softwarização de redes vêm sendo continuamente diversi- ficados e gradativamente incorporados em produção, principalmente através dos paradigmas de Redes Definidas por Software (ex.: regras de fluxos de rede programáveis) e Virtualização de Funções de Rede (ex.: orquestração de funções virtualizadas de rede). Embasado neste processo o conceito de network slice surge como forma de definição de caminhos de rede fim- a-fim programáveis, possivelmente sobre infrastruturas compartilhadas, contendo requisitos estritos de desempenho e dedicado a um modelo particular de negócios. Esta tese investiga a hipótese de que a desagregação de métricas de desempenho de funções virtualizadas de rede impactam e compõe critérios de alocação de network slices (i.e., diversas opções de utiliza- ção de recursos), os quais quando realizados devem ter seu gerenciamento de ciclo de vida implementado de forma transparente em correspondência ao seu caso de negócios de comu- nicação fim-a-fim. A verificação de tal assertiva se dá em três aspectos: entender os graus de liberdade nos quais métricas de desempenho de funções virtualizadas de rede podem ser expressas; métodos de racionalização da alocação de recursos por network slices e seus re- spectivos critérios; e formas transparentes de rastrear e gerenciar recursos de rede fim-a-fim entre múltiplos domínios administrativos. Para atingir estes objetivos, diversas contribuições são realizadas por esta tese, dentre elas: a construção de uma plataforma para automatização de metodologias de testes de desempenho de funções virtualizadas de redes; a elaboração de uma metodologia para análises de alocações de recursos de network slices baseada em um algoritmo classificador de aprendizado de máquinas e outro algoritmo de análise multi- critério; e a construção de um protótipo utilizando blockchain para a realização de contratos inteligentes envolvendo acordos de serviços entre domínios administrativos de rede. Por meio de experimentos e análises sugerimos que: métricas de desempenho de funções virtualizadas de rede dependem da alocação de recursos, configurações internas e estímulo de tráfego de testes; network slices podem ter suas alocações de recursos coerentemente classificadas por diferentes critérios; e acordos entre domínios administrativos podem ser realizados de forma transparente e em variadas formas de granularidade por meio de contratos inteligentes uti- lizando blockchain. Ao final deste trabalho, com base em uma ampla discussão as perguntas de pesquisa associadas à hipótese são respondidas, de forma que a avaliação da hipótese proposta seja realizada perante uma ampla visão das contribuições e trabalhos futuros desta teseAbstract: In the last ten years, network softwarisation processes have been continuously diversified and gradually incorporated into production, mainly through the paradigms of Software Defined Networks (e.g., programmable network flow rules) and Network Functions Virtualization (e.g., orchestration of virtualized network functions). Based on this process, the concept of network slice emerges as a way of defining end-to-end network programmable paths, possibly over shared network infrastructures, requiring strict performance metrics associated to a par- ticular business case. This thesis investigate the hypothesis that the disaggregation of network function performance metrics impacts and composes a network slice footprint incurring in di- verse slicing feature options, which when realized should have their Service Level Agreement (SLA) life cycle management transparently implemented in correspondence to their fulfilling end-to-end communication business case. The validation of such assertive takes place in three aspects: the degrees of freedom by which performance of virtualized network functions can be expressed; the methods of rationalizing the footprint of network slices; and transparent ways to track and manage network assets among multiple administrative domains. In order to achieve such goals, a series of contributions were achieved by this thesis, among them: the construction of a platform for automating methodologies for performance testing of virtual- ized network functions; an elaboration of a methodology for the analysis of footprint features of network slices based on a machine learning classifier algorithm and a multi-criteria analysis algorithm; and the construction of a prototype using blockchain to carry out smart contracts involving service level agreements between administrative systems. Through experiments and analysis we suggest that: performance metrics of virtualized network functions depend on the allocation of resources, internal configurations and test traffic stimulus; network slices can have their resource allocations consistently analyzed/classified by different criteria; and agree- ments between administrative domains can be performed transparently and in various forms of granularity through blockchain smart contracts. At the end of his thesis, through a wide discussion we answer all the research questions associated to the investigated hypothesis in such way its evaluation is performed in face of wide view of the contributions and future work of this thesisDoutoradoEngenharia de ComputaçãoDoutor em Engenharia ElétricaFUNCAM
Dynamic service chain composition in virtualised environment
Network Function Virtualisation (NFV) has contributed to improving the flexibility of network service provisioning and reducing the time to market of new services. NFV leverages the virtualisation technology to decouple the software implementation of network appliances from the physical devices on which they run. However, with the emergence of this paradigm, providing data centre applications with an adequate network performance becomes challenging. For instance, virtualised environments cause network congestion, decrease the throughput and hurt the end user experience. Moreover, applications usually communicate through multiple sequences of virtual network functions (VNFs), aka service chains, for policy enforcement and performance and security enhancement, which increases the management complexity at to the network level.
To address this problematic situation, existing studies have proposed high-level approaches of VNFs chaining and placement that improve service chain performance. They consider the VNFs as homogenous entities regardless of their specific characteristics. They have overlooked their distinct behaviour toward the traffic load and how their underpinning implementation can intervene in defining resource usage. Our research aims at filling this gap by finding out particular patterns on production and widely used VNFs. And proposing a categorisation that helps in reducing network latency at the chains.
Based on experimental evaluation, we have classified firewalls, NAT, IDS/IPS, Flow monitors into I/O- and CPU-bound functions. The former category is mainly sensitive to the throughput, in packets per second, while the performance of the latter is primarily affected by the network bandwidth, in bits per second. By doing so, we correlate the VNF category with the traversing traffic characteristics and this will dictate how the service chains would be composed.
We propose a heuristic called Natif, for a VNF-Aware VNF insTantIation and traFfic distribution scheme, to reconcile the discrepancy in VNF requirements based on the category they belong to and to eventually reduce network latency. We have deployed Natif in an OpenStack-based environment and have compared it to a network-aware VNF composition approach. Our results show a decrease in latency by around 188% on average without sacrificing the throughput
Management And Security Of Multi-Cloud Applications
Single cloud management platform technology has reached maturity and is quite successful in information technology applications. Enterprises and application service providers are increasingly adopting a multi-cloud strategy to reduce the risk of cloud service provider lock-in and cloud blackouts and, at the same time, get the benefits like competitive pricing, the flexibility of resource provisioning and better points of presence. Another class of applications that are getting cloud service providers increasingly interested in is the carriers\u27 virtualized network services. However, virtualized carrier services require high levels of availability and performance and impose stringent requirements on cloud services. They necessitate the use of multi-cloud management and innovative techniques for placement and performance management. We consider two classes of distributed applications – the virtual network services and the next generation of healthcare – that would benefit immensely from deployment over multiple clouds. This thesis deals with the design and development of new processes and algorithms to enable these classes of applications. We have evolved a method for optimization of multi-cloud platforms that will pave the way for obtaining optimized placement for both classes of services. The approach that we have followed for placement itself is predictive cost optimized latency controlled virtual resource placement for both types of applications. To improve the availability of virtual network services, we have made innovative use of the machine and deep learning for developing a framework for fault detection and localization. Finally, to secure patient data flowing through the wide expanse of sensors, cloud hierarchy, virtualized network, and visualization domain, we have evolved hierarchical autoencoder models for data in motion between the IoT domain and the multi-cloud domain and within the multi-cloud hierarchy
Management and orchestration of virtual network functions via deep reinforcement learning
Management and orchestration (MANO) of re-sources by virtual network functions (VNFs) represents one of thekey challenges towards a fully virtualized network architectureas envisaged by 5G standards. Current threshold-based policiesinefficiently over-provision network resources and under-utilizeavailable hardware, incurring high cost for network operators,and consequently, the users. In this work, we present a MANOalgorithm for VNFs allowing a central unit (CU) to learnto autonomously re-configure resources (processing power andstorage), deploy new VNF instances, or offload them to the cloud,depending on the network conditions, available pool of resources,and the VNF requirements, with the goal of minimizing a costfunction that takes into account the economical cost as wellas latency and the quality-of-service (QoS) experienced by theusers. First, we formulate the stochastic resource optimizationproblem as a parameterized action Markov decision process(PAMDP). Then, we propose a solution based on deep reinforce-ment learning (DRL). More precisely, we present a novel RLapproach, called parameterized action twin (PAT) deterministicpolicy gradient, which leverages anactor-critic architecturetolearn to provision resources to the VNFs in an online manner.Finally, we present numerical performance results, and map themto 5G key performance indicators (KPIs). To the best of ourknowledge, this is the first work that considers DRL for MANOof VNFs’ physical resources
Analysis of scaling policies for NFV providing 5G/6G reliability levels with fallible servers
The softwarization of mobile networks enables an efficient use of resources, by dynamically scaling and re-assigning them following variations in demand. Given that the activation of additional servers is not immediate, scaling up resources should anticipate traffic demands to prevent service disruption. At the same time, the activation of more servers than strictly necessary results in a waste of resources, and thus should be avoided. Given the stringent reliability requirements of 5G applications (up to 6 nines) and the fallible nature of servers, finding the right trade-off between efficiency and service disruption is particularly critical. In this paper, we analyze a generic auto-scaling mechanism for communication services, used to de(activate) servers in a cluster, based on occupation thresholds. We model the impact of the activation delay and the finite lifetime of the servers on performance, in terms of power consumption and failure probability. Based on this model, we derive an algorithm to optimally configure the thresholds. Simulation results confirm the accuracy of the model both under synthetic and realistic traffic patterns as well as the effectiveness of the configuration algorithm. We also provide some insights on the best strategy to support an energy-efficient highly-reliable service: deploying a few powerful and reliable machines versus deploying many machines, but less powerful and reliable.The work of
Jorge Ortin was funded in part by the Spanish Ministry of Science under
Grant RTI2018-099063-B-I00, in part by the Gobierno de Aragon through
Research Group under Grant T31_20R, in part by the European Social
Fund (ESF), and in part by Centro Universitario de la Defensa under
Grant CUD-2021_11. The work of Pablo Serrano was partly funded by
the European Commission (EC) through the H2020 project Hexa-X (Grant
Agreement no. 101015956), and in part by Spanish State Research Agency
(TRUE5G project, PID2019-108713RB-C52PID2019-108713RB-C52/AEI/
10.13039/501100011033). The work of Jaime Garcia-Reinoso was partially
supported by the EC in the framework of H2020-EU.2.1.1. 5G EVE project
(Grant agreement no. 815074). The work of Albert Banchs was partially
supported by the EC in the framework of H2020-EU.2.1.1. 5G-TOURS
project (Grant agreement no. 856950) also partially supported by the Spanish
State Research Agency (TRUE5G project, PID2019-108713RB-C52PID2019-
108713RB-C52/AEI/10.13039/501100011033)
Definition and specification of connectivity and QoE/QoS management mechanisms – final report
This document summarizes the WP5 work throughout the project, describing its functional architecture and the solutions that implement the WP5 concepts on network control and orchestration. For this purpose, we defined 3 innovative controllers that embody the network slicing and multi tenancy: SDM-C, SDM-X and SDM-O. The functionalities of each block are detailed with the interfaces connecting them and validated through exemplary network processes, highlighting thus 5G NORMA innovations. All the proposed modules are designed to implement the functionality needed to provide the challenging KPIs required by future 5G networks while keeping the largest possible compatibility with the state of the art
Data-Driven Methods for Data Center Operations Support
During the last decade, cloud technologies have been evolving at
an impressive pace, such that we are now living in a cloud-native
era where developers can leverage on an unprecedented landscape
of (possibly managed) services for orchestration, compute, storage,
load-balancing, monitoring, etc. The possibility to have on-demand
access to a diverse set of configurable virtualized resources allows
for building more elastic, flexible and highly-resilient distributed
applications. Behind the scenes, cloud providers sustain the heavy
burden of maintaining the underlying infrastructures, consisting in
large-scale distributed systems, partitioned and replicated among
many geographically dislocated data centers to guarantee scalability,
robustness to failures, high availability and low latency. The larger the
scale, the more cloud providers have to deal with complex interactions
among the various components, such that monitoring, diagnosing and
troubleshooting issues become incredibly daunting tasks.
To keep up with these challenges, development and operations
practices have undergone significant transformations, especially in
terms of improving the automations that make releasing new software,
and responding to unforeseen issues, faster and sustainable at scale.
The resulting paradigm is nowadays referred to as DevOps. However,
while such automations can be very sophisticated, traditional DevOps
practices fundamentally rely on reactive mechanisms, that typically
require careful manual tuning and supervision from human experts.
To minimize the risk of outages—and the related costs—it is crucial to
provide DevOps teams with suitable tools that can enable a proactive
approach to data center operations.
This work presents a comprehensive data-driven framework to address
the most relevant problems that can be experienced in large-scale
distributed cloud infrastructures. These environments are indeed characterized
by a very large availability of diverse data, collected at each
level of the stack, such as: time-series (e.g., physical host measurements,
virtual machine or container metrics, networking components
logs, application KPIs); graphs (e.g., network topologies, fault graphs
reporting dependencies among hardware and software components,
performance issues propagation networks); and text (e.g., source code,
system logs, version control system history, code review feedbacks).
Such data are also typically updated with relatively high frequency,
and subject to distribution drifts caused by continuous configuration
changes to the underlying infrastructure. In such a highly dynamic scenario,
traditional model-driven approaches alone may be inadequate
at capturing the complexity of the interactions among system components. DevOps teams would certainly benefit from having robust
data-driven methods to support their decisions based on historical
information. For instance, effective anomaly detection capabilities may
also help in conducting more precise and efficient root-cause analysis.
Also, leveraging on accurate forecasting and intelligent control
strategies would improve resource management.
Given their ability to deal with high-dimensional, complex data,
Deep Learning-based methods are the most straightforward option for
the realization of the aforementioned support tools. On the other hand,
because of their complexity, this kind of models often requires huge
processing power, and suitable hardware, to be operated effectively
at scale. These aspects must be carefully addressed when applying
such methods in the context of data center operations. Automated
operations approaches must be dependable and cost-efficient, not to
degrade the services they are built to improve.
i
- …