Search CORE

395 research outputs found

OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs

Author: Benini Luca
Benz Thomas
Chrapek Marcin
De Sensi Daniele
Di Girolamo Salvatore
Hoefler Torsten
Khalilov Mikhail
Schneider Timo
Shen Siyuan
Vezzu Alessandro
Publication venue
Publication date: 08/09/2023
Field of study

Multi-tenancy is essential for unleashing SmartNIC's potential in datacenters. Our systematic analysis in this work shows that existing on-path SmartNICs have resource multiplexing limitations. For example, existing solutions lack multi-tenancy capabilities such as performance isolation and QoS provisioning for compute and IO resources. Compared to standard NIC data paths with a well-defined set of offloaded functions, unpredictable execution times of SmartNIC kernels make conventional approaches for multi-tenancy and QoS insufficient. We fill this gap with OSMOSIS, a SmartNICs resource manager co-design. OSMOSIS extends existing OS mechanisms to enable dynamic hardware resource multiplexing on top of the on-path packet processing data plane. We implement OSMOSIS within an open-source RISC-V-based 400Gbit/s SmartNIC. Our performance results demonstrate that OSMOSIS fully supports multi-tenancy and enables broader adoption of SmartNICs in datacenters with low overhead.Comment: 12 pages, 14 figures, 103 reference

arXiv.org e-Print Archive

Reconfigurable network systems and software-defined networking

Author: Moore AW
Rotsos C
Watts PM
Zilberman N
Publication venue: Proceedings of the IEEE
Publication date: 01/01/2015
Field of study

Modern high-speed networks have evolved from relatively static networks to highly adaptive networks facilitating dynamic reconfiguration. This evolution has influenced all levels of network design and management, introducing increased programmability and configuration flexibility. This influence has extended from the lowest level of physical hardware interfaces to the highest level of network management by software. A key representative of this evolution is the emergence of softwaredefined networking (SDN). In this paper, we review the current state of the art in reconfigurable network systems, covering hardware reconfiguration, SDN, and the interplay between them. We take a top-down approach, starting with a tutorial on software-defined networks. We then continue to discuss programming languages as the linking element between different levels of software and hardware in the network. We review electronic switching systems, highlighting programmability and reconfiguration aspects, and describe the trends in reconfigurable network elements. Finally, we describe the state of the art in the integration of photonic transceiver and switching elements with electronic technologies, and consider the implications for SDN and reconfigurable network systems.This work was jointly supported by the UKs Engineering and Physical Sciences Research Council (EPSRC) Internet Project EP/H040536/1, an EPSRC Research Fellowship grant to Philip Watts (EP/I004157/2), and DARPA and AFRL under contract FA8750-11-C-0249.This is the final version of the article. It first appeared from IEEE via http://dx.doi.org/10.1109/JPROC.2015.243573

Crossref

Apollo (Cambridge)

Lancaster E-Prints

A novel dynamic software-defined networking approach to neutralize traffic burst

Author: Balasubramanian Venki
Kamruzzaman Joarder
Sharma Aakanksha
Publication venue: Multidisciplinary Digital Publishing Institute (MDPI)
Publication date: 01/01/2023
Field of study

Software-defined networks (SDN) has a holistic view of the network. It is highly suitable for handling dynamic loads in the traditional network with a minimal update in the network infrastructure. However, the standard SDN architecture control plane has been designed for single or multiple distributed SDN controllers facing severe bottleneck issues. Our initial research created a reference model for the traditional network, using the standard SDN (referred to as SDN hereafter) in a network simulator called NetSim. Based on the network traffic, the reference models consisted of light, modest and heavy networks depending on the number of connected IoT devices. Furthermore, a priority scheduling and congestion control algorithm is proposed in the standard SDN, named extended SDN (eSDN), which minimises congestion and performs better than the standard SDN. However, the enhancement was suitable only for the small-scale network because, in a large-scale network, the eSDN does not support dynamic SDN controller mapping. Often, the same SDN controller gets overloaded, leading to a single point of failure. Our literature review shows that most proposed solutions are based on static SDN controller deployment without considering flow fluctuations and traffic bursts that lead to a lack of load balancing among the SDN controllers in real-time, eventually increasing the network latency. Therefore, to maintain the Quality of Service (QoS) in the network, it becomes imperative for the static SDN controller to neutralise the on-the-fly traffic burst. Thus, our novel dynamic controller mapping algorithm with multiple-controller placement in the SDN is critical to solving the identified issues. In dSDN, the SDN controllers are mapped dynamically with the load fluctuation. If any SDN controller reaches its maximum threshold, the rest of the traffic will be diverted to another controller, significantly reducing delay and enhancing the overall performance. Our technique considers the latency and load fluctuation in the network and manages the situations where static mapping is ineffective in dealing with the dynamic flow variation. © 2023 by the authors

Federation ResearchOnline

VEGa : a high performance vehicular Ethernet gateway on hybrid FPGA

Author: Chakraborty Samarjit
Ettner Andreas
Fahmy Suhaib A.
Lukasiewycz Martin
Mundhenk Philipp
Shreejith Shanker
Steinhorst Sebastian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/10/2017
Field of study

Modern vehicles employ a large amount of distributed computation and require the underlying communication scheme to provide high bandwidth and low latency. Existing communication protocols like Controller Area Network (CAN) and FlexRay do not provide the required bandwidth, paving the way for adoption of Ethernet as the next generation network backbone for in-vehicle systems. Ethernet would co-exist with safety-critical communication on legacy networks, providing a scalable platform for evolving vehicular systems. This requires a high-performance network gateway that can simultaneously handle high bandwidth, low latency, and isolation; features that are not achievable with traditional processor based gateway implementations. We present VEGa, a configurable vehicular Ethernet gateway architecture utilising a hybrid FPGA to closely couple software control on a processor with dedicated switching circuit on the reconfigurable fabric. The fabric implements isolated interface ports and an accelerated routing mechanism, which can be controlled and monitored from software. Further, reconfigurability enables the switching behaviour to be altered at run-time under software control, while the configurable architecture allows easy adaptation to different vehicular architectures using high-level parameter settings. We demonstrate the architecture on the Xilinx Zynq platform and evaluate the bandwidth, latency, and isolation using extensive tests in hardware

Crossref

Warwick Research Archives Portal Repository

Recommended from our members

Cross-Layer Platform for Dynamic, Energy-Efficient Optical Networks

Author: Lai Caroline Phooi-Mun
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2011
Field of study

The design of the next-generation Internet infrastructure is driven by the need to sustain the massive growth in bandwidth demands. Novel, energy-efficient, optical networking technologies and architectures are required to effectively meet the stringent performance requirements with low cost and ultrahigh energy efficiencies. In this thesis, a cross-layer communications platform is proposed to enable greater intelligence and functionality on the physical layer. Providing the optical layer with advanced networking capabilities will facilitate the dynamic management and optimization of optical switching based on performance monitoring measurements and higher-layer attributes. The cross-layer platform aims to create a new framework for networks to incorporate packet-scale measurement subsystems and techniques for monitoring the health of the optical channel. This will allow for quality-of-service- and energy-aware routing schemes, as well as an enhanced awareness of the optical data signals. This thesis first presents the design and development of an optical packet switching fabric. Leveraging a networking test-bed environment to validate networking hypotheses, advanced switching functionalities are demonstrated, including the support for quality-of-service based routing and packet multicasting. The investigated cross-layering is based on emerging optical technologies, enabling packet protection techniques and packet-rate switching fabric reconfiguration. Coupled with fast performance monitoring, the platform will achieve significant performance gains within the endeavor of all-optical switching. Allowing for a more intelligent, programmable optical layer aims to support greater flexibility with respect to bandwidth allocation and potentially a significant reduction in the network's energy consumption. The ultimate deliverable of this work is a high-performance, cross-layer enabled optical network node. The experimental demonstration of an initial prototype creates a dynamic network element with distributed control plane management, featuring fast packet-rate optical switching capabilities and embedded physical-layer performance monitoring modules. The cross-layer box enables an intelligent traffic delivery system that can dynamically manipulate optical switching on a packet-granular scale. With the goal of achieving advanced multi-layer routing and control algorithms, the network node requires an intelligent co-optimization across all the layers. The proposed cross-layer design should drive optical technologies and architectures in an innovative way, in order to fulfill the void between the design of basic photonic devices and the networking protocols that use them. The performance of the entire network -- from the optical components, to the routing algorithms and user applications -- should be optimized in concert. This contribution to the area of cross-layer network design creates an adaptable optical pipe that is extremely flexible and intelligent aware of both the physical optical signals and higher-layer requirements. The impact of this work will be seen in the realization of dynamic, energy-efficient optical communication links in future networking infrastructures

Columbia University Academic Commons

COREC: Concurrent Non-Blocking Single-Queue Receive Driver for Low Latency Networking

Author: Belocchi Giacomo
Bianchi Giuseppe
Faltelli Marco
Quaglia Francesco
Publication venue
Publication date: 23/01/2024
Field of study

Existing network stacks tackle performance and scalability aspects by relying on multiple receive queues. However, at software level, each queue is processed by a single thread, which prevents simultaneous work on the same queue and limits performance in terms of tail latency. To overcome this limitation, we introduce COREC, the first software implementation of a concurrent non-blocking single-queue receive driver. By sharing a single queue among multiple threads, workload distribution is improved, leading to a work-conserving policy for network stacks. On the technical side, instead of relying on traditional critical sections - which would sequentialize the operations by threads - COREC coordinates the threads that concurrently access the same receive queue in non-blocking manner via atomic machine instructions from the Read-Modify-Write (RMW) class. These instructions allow threads to access and update memory locations atomically, based on specific conditions, such as the matching of a target value selected by the thread. Also, they enable making any update globally visible in the memory hierarchy, bypassing interference on memory consistency caused by the CPU store buffers. Extensive evaluation results demonstrate that the possible additional reordering, which our approach may occasionally cause, is non-critical and has minimal impact on performance, even in the worst-case scenario of a single large TCP flow, with performance impairments accounting to at most 2-3 percent. Conversely, substantial latency gains are achieved when handling UDP traffic, real-world traffic mix, and multiple shorter TCP flows

arXiv.org e-Print Archive

Accurate and Resource-Efficient Monitoring for Future Networks

Author: Tangari Gioacchino
Publication venue: UCL (University College London)
Publication date: 28/10/2019
Field of study

Monitoring functionality is a key component of any network management system. It is essential for profiling network resource usage, detecting attacks, and capturing the performance of a multitude of services using the network. Traditional monitoring solutions operate on long timescales producing periodic reports, which are mostly used for manual and infrequent network management tasks. However, these practices have been recently questioned by the advent of Software Defined Networking (SDN). By empowering management applications with the right tools to perform automatic, frequent, and fine-grained network reconfigurations, SDN has made these applications more dependent than before on the accuracy and timeliness of monitoring reports. As a result, monitoring systems are required to collect considerable amounts of heterogeneous measurement data, process them in real-time, and expose the resulting knowledge in short timescales to network decision-making processes. Satisfying these requirements is extremely challenging given today’s larger network scales, massive and dynamic traffic volumes, and the stringent constraints on time availability and hardware resources. This PhD thesis tackles this important challenge by investigating how an accurate and resource-efficient monitoring function can be realised in the context of future, software-defined networks. Novel monitoring methodologies, designs, and frameworks are provided in this thesis, which scale with increasing network sizes and automatically adjust to changes in the operating conditions. These achieve the goal of efficient measurement collection and reporting, lightweight measurement- data processing, and timely monitoring knowledge delivery

UCL Discovery

A Hardware Architecture of a Dynamic Ranking Packet Scheduler for Programmable Network Devices

Author: David Jean Pierre
Elbediwy Mostafa
Pontikakis Bill
Savaria Yvon
Publication venue: IEEE
Publication date: 01/01/2023
Field of study

PolyPublie

A Framework for eBPF-Based Network Functions in an Era of Microservices

Author: Bertrone Matteo
Lu Yunsong
Miano Sebastiano
Risso Fulvio
Vasquez Bernal Mauricio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

By moving network functionality from dedicated hardware to software running on end-hosts, Network Functions Virtualization (NFV) pledges the benefits of cloud computing to packet processing. While most of the NFV frameworks today rely on kernel-bypass approaches, no attention has been given to kernel packet processing, which has always proved hard to evolve and to program. In this article, we present Polycube, a software framework whose main goal is to bring the power of NFV to in-kernel packet processing applications, enabling a level of flexibility and customization that was unthinkable before. Polycube enables the creation of arbitrary and complex network function chains, where each function can include an efficient in-kernel data plane and a flexible user-space control plane with strong characteristics of isolation, persistence, and composability. Polycube network functions, called Cubes, can be dynamically generated and injected into the kernel networking stack, without requiring custom kernels or specific kernel modules, simplifying the debugging and introspection, which are two fundamental properties in recent cloud environments. We validate the framework by showing significant improvements over existing applications, and we prove the generality of the Polycube programming model through the implementation of complex use cases such as a network provider for Kubernetes

Archivio istituzionale della ricerca - Politecnico di Milano

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Reconfigurable-Hardware Accelerated Stream Aggregation

Author: Ramakrishnan Geethakumari Prajith
Publication venue
Publication date: 01/01/2022
Field of study

High throughput and low latency stream aggregation is essential for many applications that analyze massive volumes of data in real-time. Incoming data need to be stored in a single sliding-window before processing, in cases where incremental aggregations are wasteful or not possible at all. However, storing all incoming values in a single-window puts tremendous pressure on the memory bandwidth and capacity. GPU and CPU memory management is inefficient for this task as it introduces unnecessary data movement that wastes bandwidth. FPGAs can make more efficient use of their memory but existing approaches employ only on-chip memory and therefore, can only support small problem sizes (i.e. small sliding windows and number of keys) due to the limited capacity. This thesis addresses the above limitations of stream processing systems by proposing techniques for accelerating single sliding-window stream aggregation using FPGAs to achieve line-rate processing throughput and ultra low latency. It does so first by building accelerators using FPGAs and second, by alleviating the memory pressure posed by single-window stream aggregation. The initial part of this thesis presents the accelerators for both windowing policies, namely, tuple- and time-\ua0based, using Maxeler\u27s DataFlow Engines\ua0(DFEs) which have a direct feed of incoming data from the network as well as direct access to off-chip DRAM. Compared to state-of-the-art stream processing software system, the DFEs offer 1-2 orders of magnitude higher processing throughput and 4 orders of magnitude lower latency. The later part of this thesis focuses on alleviating the memory pressure due to the various steps in single-window stream aggregation. Updating the window with new incoming values and reading it to feed the aggregation functions are the two primary steps in stream aggregation. The high on-chip SRAM bandwidth enables line-rate processing, but only for small problem sizes due to the limited capacity. The larger off-chip DRAM size supports larger problems, but falls short on performance due to lower bandwidth. In order to bridge this gap, this thesis introduces a specialized memory hierarchy for stream aggregation. It employs Multi-Level Queues (MLQs) spanning across multiple memory levels with different characteristics to offer both high bandwidth and capacity. In doing so, larger stream aggregation problems can be supported at line-rate performance, outperforming existing competing solutions. Compared to designs with only on-chip memory, our approach supports 4 orders of magnitude larger problems. Compared to designs that use only DRAM, our design achieves up to 8x higher throughput. Finally, this thesis aims to alleviate the memory pressure due to the window-aggregation step. Although window-updates can be supported efficiently using MLQs, frequent window-aggregations remain a performance bottleneck. This thesis addresses this problem by introducing StreamZip, a dataflow stream aggregation engine that is able to compress the sliding-windows. StreamZip deals with a number of data and control dependency challenges to integrate a compressor in the stream aggregation pipeline and alleviate the memory pressure posed by frequent aggregations. In doing so, StreamZip offers higher throughput as well as larger effective window capacity to support larger problems. StreamZip supports diverse compression algorithms offering both lossless and lossy compression to fixed- as well as floating- point numbers. Compared to designs using MLQs, StreamZip lossless and lossy designs achieve up to 7.5x and 22x higher throughput, while improving the effective memory capacity by up to 5x and 23x, respectively

Chalmers Research