103 research outputs found
A Federated Learning Approach to Routing in Challenged SDN-Enabled Edge Networks
The edge computing paradigm allows computationally intensive tasks to be offloaded from small devices to nearby, more powerful servers via an edge network. The intersection between the edge computing paradigm and Machine Learning (ML) in general, and deep learning in particular, has brought to light several advantages for network operators, from automating management tasks to gaining additional insights into their networks. Most existing approaches that use ML to drive routing and traffic control decisions are valuable but rarely focus on challenged networks, which are characterized by continually varying network conditions and the high volume of traffic generated by edge devices. In particular, recently proposed distributed ML-based architectures require either a long synchronization phase or a training phase that is unsustainable for challenged networks. In this paper, we fill this knowledge gap with Blaster, a federated architecture for routing packets within a distributed edge network, to improve application performance and allow data-intensive applications to scale. We also propose a novel path selection model that uses Long Short-Term Memory (LSTM) to predict the optimal route. Finally, we present some initial results obtained by testing our approach via simulations and with a prototype deployed over the GENI testbed. By leveraging a Federated Learning (FL) model, our approach shows that we can optimize the communication between SDN controllers, preserving bandwidth for data traffic.
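The abstract does not spell out how Blaster aggregates the models trained by each controller. As a rough illustration of why a federated scheme saves bandwidth (controllers exchange parameter vectors rather than raw traffic data), here is a minimal sketch of sample-weighted federated averaging (FedAvg). The function name, the flat parameter vectors, and the toy numbers are assumptions made for this example, not details from the paper.

```python
from typing import List, Sequence


def federated_average(local_weights: Sequence[Sequence[float]],
                      sample_counts: Sequence[int]) -> List[float]:
    """Return the sample-weighted mean of per-controller parameter vectors."""
    if not local_weights or len(local_weights) != len(sample_counts):
        raise ValueError("need one sample count per parameter vector")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(local_weights[0])
    aggregated = [0.0] * dim
    for weights, count in zip(local_weights, sample_counts):
        if len(weights) != dim:
            raise ValueError("all parameter vectors must have the same length")
        for i, value in enumerate(weights):
            # Controllers that trained on more samples contribute more.
            aggregated[i] += value * count / total
    return aggregated


# Two hypothetical controllers: the second trained on three times more samples.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the two short vectors cross the network in this toy case, which is the property the abstract relies on when it speaks of preserving bandwidth for data traffic.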
Resource Inference for Task Migration in Challenged Edge Networks with RITMO
Edge computing, combined with the proliferation of IoT devices, is generating new business model opportunities and applications. Among those applications, Unmanned Aerial Vehicles (UAVs) have been deployed in several scenarios, from surveillance and monitoring to disaster response, to precision agriculture.
To support such applications, however, edge network managers and application programmers need to overcome a few challenges, e.g., unstable network conditions, high loss rate, and node failures. Existing solutions designed to mitigate such inefficiencies by predicting future network conditions are often computationally intensive and hence less portable on constrained devices. In this paper, we propose RITMO, a distributed and adaptive task planning algorithm that aims at solving these challenges while running on a network of UAV devices.
We model our system as a network of queues, and we exploit a simple yet effective ARIMA regressor to dynamically predict the length of future UAV task queues. This prediction is then used to proactively migrate tasks in case of a failure or unbalanced loads. Our simulation results demonstrate how RITMO helps reduce the overall latency perceived by the application and anticipates node overloading by avoiding agents that are likely to exhaust their computational resources.
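To make the prediction step concrete, the sketch below fits a very small ARIMA(1,1,0)-style model by least squares: it differences the queue-length history once, estimates a single autoregressive coefficient, and extrapolates. This is an assumption-laden stand-in, not RITMO's regressor; the model order, the function names, and the `capacity` threshold are hypothetical.

```python
from typing import List, Sequence


def forecast_queue_length(history: Sequence[float], steps: int = 1) -> List[float]:
    """Forecast future queue lengths with an AR(1) model on first differences."""
    if len(history) < 3:
        raise ValueError("need at least three observations")
    diffs = [b - a for a, b in zip(history, history[1:])]
    # Least-squares estimate of phi in d_t = phi * d_{t-1} (no intercept).
    numerator = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    denominator = sum(prev * prev for prev in diffs[:-1])
    phi = numerator / denominator if denominator else 0.0
    level, delta = float(history[-1]), diffs[-1]
    forecasts = []
    for _ in range(steps):
        delta = phi * delta
        level = max(0.0, level + delta)  # a queue length cannot be negative
        forecasts.append(level)
    return forecasts


def should_migrate(history: Sequence[float], capacity: float, steps: int = 1) -> bool:
    """Flag a node whose predicted queue exceeds its (hypothetical) capacity."""
    return any(q > capacity for q in forecast_queue_length(history, steps))
```

For a steadily growing queue such as `[2, 4, 6, 8]`, the estimated coefficient is 1 and the next two forecasts continue the trend, so a node with a capacity of 9 tasks would be flagged before it actually overflows.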
A Distributed Reinforcement Learning Approach for Energy and Congestion-Aware Edge Networks
The drive toward automation has also pervaded computer networks, giving them the ability to measure, analyze, and control themselves in an automated manner by reacting to changes in the environment (e.g., demand) while exploiting existing flexibilities. Networks with these features are often referred to as "self-driving". Network virtualization and machine learning are the drivers. In this regard, the provision and orchestration of physical or virtual resources are crucial for both Quality of Service guarantees and cost management in the edge/cloud computing ecosystem. Auto-scaling mechanisms are hence essential to effectively manage the lifecycle of network resources. In this poster, we propose Relevant, a distributed reinforcement learning approach to enable distributed automation for network orchestrators. Our solution aims at solving the congestion control problem within Software-Defined Network infrastructures while being mindful of energy consumption, helping resources scale up and down as traffic demands fluctuate and energy optimization opportunities arise.
Restoring Application Traffic of Latency-Sensitive Networked Systems using Adversarial Autoencoders
The Internet of Things (IoT), coupled with the edge computing paradigm, is enabling several pervasive networked applications with stringent real-time requirements, such as telemedicine and haptic telecommunications. Recent advances in network virtualization and artificial intelligence are helping solve network latency and capacity problems by learning from several states of the network stack. Despite such advances, however, designing a network architecture that meets the demands of next-generation networked applications with stringent real-time requirements remains an open challenge. In this paper, we argue that using only network (or transport) layer information to predict traffic evolution and other network states may be insufficient, and that a more holistic approach that considers predictions of application-layer states is needed to repair the inefficiencies of the TCP/IP architecture. Based on this intuition, we present the design and implementation of Reparo. At its core, our solution detects packet loss and restores the lost data using a Hidden Markov Model (HMM) empowered with adversarial autoencoders. In our evaluation, we considered a telemedicine use case, specifically a telepathology session, in which a microscope is controlled remotely in real time to assess histological imagery. Our results confirm that the use of adversarial autoencoders enhances the accuracy of the prediction method, satisfying our telemedicine application’s requirements with a notable improvement in the throughput and latency perceived by the user.
On Control and Data Plane Programmability for Data-Driven Networking
The soaring complexity of networks has led to increasingly complex methods to efficiently manage and orchestrate the multitude of network environments. Several solutions, such as OpenFlow, NetConf, P4, and DPDK, allow network programmability at both the control and data plane levels, driving innovation in many focused high-performance networked applications. However, as critical applications impose stricter requirements, the networking architecture and its operations should also be redesigned. In particular, recent advances in machine learning have opened new opportunities for automating network management, exploiting existing advances in software-defined infrastructures. We argue that the design of effective data-driven network management solutions needs to collect, merge, and process states from both data and control planes. This paper sheds light on the benefits of such an approach to support feature extraction and data collection for network automation.
HINT: Supporting Congestion Control Decisions with P4-driven In-Band Network Telemetry
Years of research on congestion control have highlighted how end-to-end and in-network protocols might perform poorly in some contexts. Recent advances in data plane network programmability could also benefit transport protocols by enabling the mining and processing of in-network congestion signals. However, the new class of machine learning-based congestion control has only partially used data from the network, favoring a more sophisticated model design but neglecting possibly precious pieces of data. In this paper, we present HINT, an in-band network telemetry architecture designed to provide insights into network congestion to the end-host TCP algorithm during the learning process. In particular, the key idea is to adapt switches’ behavior via P4 and instruct them to insert simple device information, such as processing delay and queue occupancy, directly into transferred packets. Initial experimental results show that this approach incurs little network overhead but can improve the visibility and, consequently, the accuracy of the end-host's TCP decisions. At the same time, the programmability of both switches and hosts also enables customization of the default behavior as the user’s needs change.
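The mechanism described above, in which each switch stamps its processing delay and queue occupancy into the packet, can be emulated in a few lines. The sketch below is not HINT's P4 program: the 10-byte per-hop layout, the field widths, the `HopMetadata` name, and the choice to append metadata after the payload rather than in a header are all simplifying assumptions for illustration.

```python
import struct
from typing import List, NamedTuple

# Hypothetical per-hop record: switch id (4 bytes), processing delay in
# microseconds (4 bytes), queue occupancy in packets (2 bytes), network order.
INT_HOP = struct.Struct("!IIH")


class HopMetadata(NamedTuple):
    switch_id: int
    processing_delay_us: int
    queue_occupancy: int


def append_hop(packet: bytes, hop: HopMetadata) -> bytes:
    """Emulate a switch adding its telemetry record to a packet in flight."""
    return packet + INT_HOP.pack(*hop)


def extract_hops(packet: bytes, payload_len: int) -> List[HopMetadata]:
    """Emulate the end host reading back every record added along the path."""
    telemetry = packet[payload_len:]
    if len(telemetry) % INT_HOP.size:
        raise ValueError("truncated telemetry record")
    return [HopMetadata(*INT_HOP.unpack_from(telemetry, offset))
            for offset in range(0, len(telemetry), INT_HOP.size)]


payload = b"data"
packet = append_hop(payload, HopMetadata(1, 120, 7))
packet = append_hop(packet, HopMetadata(2, 300, 42))
hops = extract_hops(packet, len(payload))
worst_queue = max(hop.queue_occupancy for hop in hops)
```

With this layout each hop adds 10 bytes, which gives a feel for the overhead side of the trade-off; a congestion controller at the end host could then use a value such as `worst_queue` as an extra input feature.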
Sustainable Task Offloading in UAV Networks via Multi-Agent Reinforcement Learning
The recent growth of IoT devices, along with edge computing, has revealed many opportunities for novel applications. Among them, Unmanned Aerial Vehicles (UAVs), which are deployed for surveillance and environmental monitoring, are attracting increasing attention. In this context, typical solutions must deal with events that may change the state of the network while providing a service that continuously maintains a high level of performance. In this paper, we address this problem by proposing a distributed architecture that leverages a Multi-Agent Reinforcement Learning (MARL) technique to dynamically offload tasks from UAVs to the edge cloud. Nodes of the system cooperate to jointly minimize the overall latency perceived by the user and the energy usage on UAVs by continuously learning the best action from the environment. This action entails the decision of whether to offload and, if so, the best transmission technology, i.e., Wi-Fi or cellular. Results validate our distributed architecture and show the effectiveness of the approach in reaching the above targets.
An architecture for adaptive task planning in support of IoT-based machine learning applications for disaster scenarios
The proliferation of the Internet of Things (IoT) in conjunction with edge computing has recently opened up many possibilities for new applications. Typical examples are Unmanned Aerial Vehicles (UAVs) that are deployed for rapid disaster response, photogrammetry, surveillance, and environmental monitoring. To support the flourishing development of Machine Learning-assisted applications across all these networked scenarios, a common challenge is the provision of a persistent service, i.e., a service capable of consistently maintaining a high level of performance even in the face of possible failures. To address these service resilience challenges, we propose APRON, an edge solution for distributed and adaptive task planning management in a network of IoT devices, e.g., drones. Exploiting Jackson's network model, our architecture applies a novel planning strategy to better support control and monitoring operations while the states of the network evolve. To demonstrate the functionalities of our architecture, we also implemented a deep learning-based audio-recognition application using the APRON NorthBound interface to detect human voices in challenged networks. The application's logic uses Transfer Learning to improve the audio classification accuracy and the runtime of the UAV-based rescue operations.
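For readers unfamiliar with Jackson's network model, the sketch below shows the standard calculation it enables for an open network of single-server queues: solve the traffic equations for each node's total arrival rate, then derive utilization and the mean number of tasks per node. This is textbook queueing theory, not APRON's planning strategy; the two-node topology, the rates, and the function names are assumptions.

```python
from typing import List, Sequence


def solve_traffic_equations(external: Sequence[float],
                            routing: Sequence[Sequence[float]],
                            iterations: int = 1000,
                            tol: float = 1e-12) -> List[float]:
    """Solve lambda_i = gamma_i + sum_j lambda_j * p_ji by fixed-point iteration."""
    n = len(external)
    rates = list(external)
    for _ in range(iterations):
        updated = [external[i] + sum(rates[j] * routing[j][i] for j in range(n))
                   for i in range(n)]
        if max(abs(a - b) for a, b in zip(updated, rates)) < tol:
            return updated
        rates = updated
    return rates


def mean_tasks_per_node(arrival: Sequence[float],
                        service: Sequence[float]) -> List[float]:
    """Mean number of tasks at each M/M/1 node: rho / (1 - rho)."""
    result = []
    for lam, mu in zip(arrival, service):
        rho = lam / mu
        if rho >= 1.0:
            raise ValueError("node is overloaded (utilization >= 1)")
        result.append(rho / (1.0 - rho))
    return result


# Hypothetical two-drone network: tasks enter at node 0 at rate 1.0, and half
# of the tasks finished at node 0 are forwarded to node 1.
arrival_rates = solve_traffic_equations([1.0, 0.0], [[0.0, 0.5], [0.0, 0.0]])
mean_tasks = mean_tasks_per_node(arrival_rates, service=[2.0, 1.0])
```

A planner could compare such per-node estimates to decide where new tasks should be placed, and the overload check marks nodes that cannot sustain their predicted load.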
Supporting Sustainable Virtual Network Mutations with Mystique
The drive toward automation has also permeated networks, giving them the ability to measure, analyze, and control themselves in an automated manner by reacting to changes in the environment (e.g., demand).
When provided with these features, networks are often labeled as "self-driving" or "autonomous". In this regard, the provision and orchestration of physical or virtual resources are crucial for both Quality of Service (QoS) guarantees and cost management in the edge/cloud computing environment. To effectively manage the lifecycle of these resources, an auto-scaling mechanism is essential.
However, traditional threshold-based and recent Machine Learning (ML)-based policies are often unable to address the soaring complexity of networks due to their centralized approach.
By relying on multi-agent reinforcement learning, we propose Mystique, a solution that learns from the load on links to establish the minimal set of active network resources. As traffic demands ebb and flow, our adaptive and self-driving solution can scale up and down and also react to failures in a fully automated, flexible, and efficient manner.
Our results demonstrate that the presented solution can reduce network energy consumption while providing an adequate service level, outperforming other benchmark auto-scaling approaches.
Owl: Congestion Control with Partially Invisible Networks via Reinforcement Learning
Years of research on transport protocols have not solved the tussle between in-network and end-to-end congestion control. This debate is due to the variance of conditions and assumptions in different network scenarios, e.g., cellular versus data center networks. Recently, the community has proposed a few transport protocols driven by machine learning, but these are limited to end-to-end approaches.
In this paper, we present Owl, a transport protocol based on reinforcement learning, whose goal is to select the proper congestion window by learning from end-to-end features and, when available, network signals.
We show that our solution converges to a fair resource allocation after the learning overhead.
Our kernel implementation, deployed over emulated and large-scale virtual network testbeds, outperforms all benchmark solutions based on end-to-end or in-network congestion control.