Search CORE

34 research outputs found

On the Exploration of FPGAs and High-Level Synthesis Capabilities on Multi-Gigabit-per-Second Networks

Author: Ruiz Noguera Mario Daniel
Publication venue
Publication date: 24/01/2020
Field of study

Tesis doctoral inédita leída en la Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Fecha de lectura: 24-01-2020Traffic on computer networks has faced an exponential grown in recent years. Both links and communication equipment had to adapt in order to provide a minimum quality of service required for current needs. However, in recent years, a few factors have prevented commercial off-the-shelf hardware from being able to keep pace with this growth rate, consequently, some software tools are struggling to fulfill their tasks, especially at speeds higher than 10 Gbit/s. For this reason, Field Programmable Gate Arrays (FPGAs) have arisen as an alternative to address the most demanding tasks without the need to design an application specific integrated circuit, this is in part to their flexibility and programmability in the field. Needless to say, developing for FPGAs is well-known to be complex. Therefore, in this thesis we tackle the use of FPGAs and High-Level Synthesis (HLS) languages in the context of computer networks. We focus on the use of FPGA both in computer network monitoring application and reliable data transmission at very high-speed. On the other hand, we intend to shed light on the use of high level synthesis languages and boost FPGA applicability in the context of computer networks so as to reduce development time and design complexity. In the first part of the thesis, devoted to computer network monitoring. We take advantage of the FPGA determinism in order to implement active monitoring probes, which consist on sending a train of packets which is later used to obtain network parameters. In this case, the determinism is key to reduce the uncertainty of the measurements. The results of our experiments show that the FPGA implementations are much more accurate and more precise than the software counterpart. At the same time, the FPGA implementation is scalable in terms of network speed — 1, 10 and 100 Gbit/s. In the context of passive monitoring, we leverage the FPGA architecture to implement algorithms able to thin cyphered traffic as well as removing duplicate packets. These two algorithms straightforward in principle, but very useful to help traditional network analysis tools to cope with their task at higher network speeds. On one hand, processing cyphered traffic bring little benefits, on the other hand, processing duplicate traffic impacts negatively in the performance of the software tools. In the second part of the thesis, devoted to the TCP/IP stack. We explore the current limitations of reliable data transmission using standard software at very high-speed. Nowadays, the network is becoming an important bottleneck to fulfill current needs, in particular in data centers. What is more, in recent years the deployment of 100 Gbit/s network links has started. Consequently, there has been an increase scrutiny of how networking functionality is deployed, furthermore, a wide range of approaches are currently being explored to increase the efficiency of networks and tailor its functionality to the actual needs of the application at hand. FPGAs arise as the perfect alternative to deal with this problem. For this reason, in this thesis we develop Limago an FPGA-based open-source implementation of a TCP/IP stack operating at 100 Gbit/s for Xilinx’s FPGAs. Limago not only provides an unprecedented throughput, but also, provides a tiny latency when compared to the software implementations, at least fifteen times. Limago is a key contribution in some of the hottest topic at the moment, for instance, network-attached FPGA and in-network data processing

Biblos-e Archivo

100GBit/s RF sample offload for RFSoC using GNU Radio and PYNQ

Author: Crockett Louise H.
Goldsmith Josh
Kaladė Šarūnas
Northcote David
Šiaučiulis Marius
Publication venue
Publication date: 28/06/2023
Field of study

Modern software-defined radio systems are capable of multi-gigabit-per-second sampling rates producing unprecedented amounts of digitised RF data. In applications such as wideband spectrum sensing and machine learning algorithms for cognitive radio, prototyping, and instrumentation, it is often impractical to process the acquired data locally in real time. This motivates the need for a high-speed connection to offload data to an accelerator application running on a secondary processing resource. In this paper, we present a novel hardware and software co-design using the AMD RFSoC 4x2 platform, PYNQ and GNU Radio projects. The demonstrated system is capable of continuous 80GBit/s offload in a 100GBit/s channel, utilising GPU acceleration to rapidly process the Fast Fourier Transforms of a full 2GHz bandwidth RF signal at 60 frames per second

University of Strathclyde Institutional Repository

OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs

Author: Benini Luca
Benz Thomas
Chrapek Marcin
De Sensi Daniele
Di Girolamo Salvatore
Hoefler Torsten
Khalilov Mikhail
Schneider Timo
Shen Siyuan
Vezzu Alessandro
Publication venue
Publication date: 08/09/2023
Field of study

Multi-tenancy is essential for unleashing SmartNIC's potential in datacenters. Our systematic analysis in this work shows that existing on-path SmartNICs have resource multiplexing limitations. For example, existing solutions lack multi-tenancy capabilities such as performance isolation and QoS provisioning for compute and IO resources. Compared to standard NIC data paths with a well-defined set of offloaded functions, unpredictable execution times of SmartNIC kernels make conventional approaches for multi-tenancy and QoS insufficient. We fill this gap with OSMOSIS, a SmartNICs resource manager co-design. OSMOSIS extends existing OS mechanisms to enable dynamic hardware resource multiplexing on top of the on-path packet processing data plane. We implement OSMOSIS within an open-source RISC-V-based 400Gbit/s SmartNIC. Our performance results demonstrate that OSMOSIS fully supports multi-tenancy and enables broader adoption of SmartNICs in datacenters with low overhead.Comment: 12 pages, 14 figures, 103 reference

arXiv.org e-Print Archive

Novel Architectures for Offloading and Accelerating Computations in Artificial Intelligence and Big Data

Author: Weber Lukas Max
Publication venue
Publication date: 19/10/2023
Field of study

Due to the end of Moore's Law and Dennard Scaling, performance gains in general-purpose architectures have significantly slowed in recent years. While raising the number of cores has been a viable approach for further performance increases, Amdahl's Law and its implications on parallelization also limit further performance gains. Consequently, research has shifted towards different approaches, including domain-specific custom architectures tailored to specific workloads. This has led to a new golden age for computer architecture, as noted in the Turing Award Lecture by Hennessy and Patterson, which has spawned several new architectures and architectural advances specifically targeted at highly current workloads, including Machine Learning. This thesis introduces a hierarchy of architectural improvements ranging from minor incremental changes, such as High-Bandwidth Memory, to more complex architectural extensions that offload workloads from the general-purpose CPU towards more specialized accelerators. Finally, we introduce novel architectural paradigms, namely Near-Data or In-Network Processing, as the most complex architectural improvements. This cumulative dissertation then investigates several architectural improvements to accelerate Sum-Product Networks, a novel Machine Learning approach from the class of Probabilistic Graphical Models. Furthermore, we use these improvements as case studies to discuss the impact of novel architectures, showing that minor and major architectural changes can significantly increase performance in Machine Learning applications. In addition, this thesis presents recent works on Near-Data Processing, which introduces Smart Storage Devices as a novel architectural paradigm that is especially interesting in the context of Big Data. We discuss how Near-Data Processing can be applied to improve performance in different database settings by offloading database operations to smart storage devices. Offloading data-reductive operations, such as selections, reduces the amount of data transferred, thus improving performance and alleviating bandwidth-related bottlenecks. Using Near-Data Processing as a use-case, we also discuss how Machine Learning approaches, like Sum-Product Networks, can improve novel architectures. Specifically, we introduce an approach for offloading Cardinality Estimation using Sum-Product Networks that could enable more intelligent decision-making in smart storage devices. Overall, we show that Machine Learning can benefit from developing novel architectures while also showing that Machine Learning can be applied to improve the applications of novel architectures

TUbiblio

tuprints

Recommended from our members

HyPaFilter - A versatile hybrid FPGA packet filter

Author: Fiessler A
Hager S
Moore AW
Scheuermann B
Publication venue: ANCS 2016 - Proceedings of the 2016 Symposium on Architectures for Networking and Communications Systems
Publication date: 01/01/2016
Field of study

With network traffic rates continuously growing, security systems like firewalls are facing increasing challenges to process incoming packets at line speed without sacrificing protection. Accordingly, specialized hardware firewalls are increasingly used in high-speed environments. Hardware solutions, though, are inherently limited in terms of the complexity of the policies they can implement, often forcing users to choose between throughput and comprehensive analysis. On the contrary, complex rules typically constitute only a small fraction of the rule set. This motivates the combination of massively parallel, yet complexity-limited specialized circuitry with a slower, but semantically powerful software firewall. The key challenge in such a design arises from the dependencies between classification rules due to their relative priorities within the rule set: complex rules requiring software-based processing may be interleaved at arbitrary positions between those where hardware processing is feasible. We therefore discuss approaches for partitioning and transforming rule sets for hybrid packet processing, and propose HyPaFilter, a hybrid classification system based on tailored circuitry on an FPGA as an accelerator for a Linux netfilter firewall. Our evaluation demonstrates 30-fold performance gains in comparison to software-only processing.Horizon 2020 (Grant ID: SSICLOPS project, 644866)This is the author accepted manuscript. The final version is available from the Association for Computing Machinery via http://dx.doi.org/10.1145/2881025.288103

Apollo (Cambridge)

EMU: Rapid prototyping of networking services

Author: Bressana P
Clegg RG
Costa P
Crowcroft Jonathon
Galea S
Greaves David
Mai L
Moore Andrew
Mortier Richard
Pietzuch P
Shipton J
Soulé R
Sultana Nikolai
Wójcik M
Zilberman Noa
Publication venue: Proceedings of the 2017 USENIX Annual Technical Conference, USENIX ATC 2017
Publication date: 15/11/2016
Field of study

Due to their performance and flexibility, FPGAs are an attractive platform for the execution of network functions. It has been a challenge for a long time though to make FPGA programming accessible to a large audience of developers. An appealing solution is to compile code from a general-purpose language to hardware using high-level synthesis. Unfortunately, current approaches to implement rich network functionality are insufficient because they lack: (i) libraries with abstractions for common network operations and data structures, (ii) bindings to the underlying “substrate” on the FPGA, and (iii) debugging and profiling support. This paper describes Emu, a new standard library for an FPGA hardware compiler that enables developers to rapidly create and deploy network functionality. Emu allows for high-performance designs without being bound to particular packet processing paradigms. Furthermore, it supports running the same programs on CPUs, in Mininet, and on FPGAs, providing a better development environment that includes advanced debugging capabilities. We demonstrate that network functions implemented using Emu have only negligible resource and performance overheads compared with natively-written hardware versions

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Crossref

Springer - Publisher Connector

PubMed Central

Apollo (Cambridge)

Technische Universität Dresden: Qucosa

HyPaFilter+: Enhanced Hybrid Packet Filtering using Hardware Assisted Classification and Header Space Analysis

Author: Fiessler A
Hager S
Lorenz C
Moore AW
Scheuermann B
Publication venue: IEEE/ACM Transactions on Networking
Publication date: 01/12/2017
Field of study

Firewalls, key components for secured network in- frastructures, are faced with two different kinds of challenges: first, they must be fast enough to classify network packets at line speed, second, their packet processing capabilities should be versatile in order to support complex filtering policies. Unfortu- nately, most existing classification systems do not qualify equally well for both requirements: systems built on special-purpose hardware are fast, but limited in their filtering functionality. In contrast, software filters provide powerful matching semantics, but struggle to meet line speed. This motivates the combination of parallel, yet complexity-limited specialized circuitry with a slower, but versatile software firewall. The key challenge in such a design arises from the dependencies between classification rules due to their relative priorities within the rule set: complex rules requiring software-based processing may be interleaved at arbitrary positions between those where hardware processing is feasible. We therefore discuss approaches for partitioning and transforming rule sets for hybrid packet processing. As a result we propose HyPaFilter+, a hybrid classification system consisting of an FPGA-based hardware matcher and a Linux netfilter firewall, which provides a simple, yet effective hardware/software packet shunting algorithm. Our evaluation shows up to 30-fold throughput gains over software packet processing.We would like to acknowledge the support of the German Federal Ministry for Economic Affairs and Energy and the German Federal Ministry of Education and Research. This work was, in part, supported by the EU Horizon 2020 SSICLOPS project (grant agreement 644866)

Crossref

Apollo (Cambridge)

A Survey on Data Plane Programming with P4: Fundamentals, Advances, and Applied Research

Author: Frank Reinhard
Gurevich Vladimir
Hauser Frederik
Häberle Marco
Lindner Steffen
Menth Michael
Merling Daniel
Zeiger Florian
Publication venue
Publication date: 26/01/2021
Field of study

With traditional networking, users can configure control plane protocols to match the specific network configuration, but without the ability to fundamentally change the underlying algorithms. With SDN, the users may provide their own control plane, that can control network devices through their data plane APIs. Programmable data planes allow users to define their own data plane algorithms for network devices including appropriate data plane APIs which may be leveraged by user-defined SDN control. Thus, programmable data planes and SDN offer great flexibility for network customization, be it for specialized, commercial appliances, e.g., in 5G or data center networks, or for rapid prototyping in industrial and academic research. Programming protocol-independent packet processors (P4) has emerged as the currently most widespread abstraction, programming language, and concept for data plane programming. It is developed and standardized by an open community and it is supported by various software and hardware platforms. In this paper, we survey the literature from 2015 to 2020 on data plane programming with P4. Our survey covers 497 references of which 367 are scientific publications. We organize our work into two parts. In the first part, we give an overview of data plane programming models, the programming language, architectures, compilers, targets, and data plane APIs. We also consider research efforts to advance P4 technology. In the second part, we analyze a large body of literature considering P4-based applied research. We categorize 241 research papers into different application domains, summarize their contributions, and extract prototypes, target platforms, and source code availability.Comment: Submitted to IEEE Communications Surveys and Tutorials (COMS) on 2021-01-2

arXiv.org e-Print Archive