30 research outputs found
P4-compatible High-level Synthesis of Low Latency 100 Gb/s Streaming Packet Parsers in FPGAs
Packet parsing is a key step in SDN-aware devices. Packet parsers in SDN
networks need to be both reconfigurable and fast, to support the evolving
network protocols and the increasing multi-gigabit data rates. The combination
of packet processing languages with FPGAs seems to be the perfect match for
these requirements. In this work, we develop an open-source FPGA-based
configurable architecture for arbitrary packet parsing to be used in SDN
networks. We generate low latency and high-speed streaming packet parsers
directly from a packet processing program. Our architecture is pipelined and
entirely modeled using templated C++ classes. The pipeline layout is derived
from a parser graph that corresponds a P4 code after a series of graph
transformation rounds. The RTL code is generated from the C++ description using
Xilinx Vivado HLS and synthesized with Xilinx Vivado. Our architecture achieves
100 Gb/s data rate in a Xilinx Virtex-7 FPGA while reducing the latency by 45%
and the LUT usage by 40% compared to the state-of-the-art.Comment: Accepted for publication at the 26th ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays February 25 - 27, 2018 Monterey
Marriott Hotel, Monterey, California, 7 pages, 7 figures, 1 tabl
HP4 High-Performance Programmable Packet Parser
Now, header parsing is the main topic in the modern network systems to support many operations such as packet processing and security functions. The header parser design has a significant effect on the network devices' performances (latency, throughput, and resource utilization). However, the header parser design suffers from a lot number of difficulties, such as the incrementing in network throughput and a variety of protocols. Therefore, the programmable hardware packet parsing is the best solution to meet the dynamic reconfiguration and speed needs. Field Programmable Gate Array (FPGA) is an appropriate device for programmable high-speed packet implementation. This paper introduces a novel FPGA High-Performance Programmable Packet Parser architecture (HP4). HP4 automatically generated by the P4 (Programming protocol-independent Packet Processors) to optimize the speed, dynamic reconfiguration, and resource consumption. The HP4 shows a pipelined packet parser dynamic reconfiguration and low latency. In addition to high throughput (over 600 Gb/s), HP4 resource utilization is less than 7.5 percent of Virtex-7 870HT, and latency is about 88 ns. HP4 can use in a high-speed dynamic packet switch and network security
Fully Programming the Data Plane: A Hardware/Software Approach
Les réseaux définis par logiciel — en anglais Software-Defined Networking (SDN) — sont apparus ces dernières années comme un nouveau paradigme de réseau. SDN introduit une séparation entre les plans de gestion, de contrôle et de données, permettant à ceux-ci d’évoluer de manière indépendante, rompant ainsi avec la rigidité des réseaux traditionnels. En particulier, dans le plan de données, les avancées récentes ont porté sur la définition des langages
de traitement de paquets, tel que P4, et sur la définition d’architectures de commutateurs programmables, par exemple la Protocol Independent Switch Architecture (PISA). Dans cette thèse, nous nous intéressons a l’architecture PISA et évaluons comment exploiter les FPGA comme plateforme de traitement efficace de paquets. Cette problématique est
étudiée a trois niveaux d’abstraction : microarchitectural, programmation et architectural. Au niveau microarchitectural, nous avons proposé une architecture efficace d’un analyseur d’entêtes de paquets pour PISA. L’analyseur de paquets utilise une architecture pipelinée avec propagation en avant — en anglais feed-forward. La complexité de l’architecture est réduite par rapport à l’état de l’art grâce a l’utilisation d’optimisations algorithmiques. Finalement, l’architecture est générée par un compilateur P4 vers C++, combiné à un outil de synthèse de haut niveau. La solution proposée atteint un débit de 100 Gb/s avec une latence comparable à celle d’analyseurs d’entêtes de paquets écrits à la main. Au niveau de la programmation, nous avons proposé une nouvelle méthodologie de conception de synthèse de haut niveau visant à améliorer conjointement la qualité logicielle et matérielle. Nous exploitons les fonctionnalités du C++ moderne pour améliorer à la fois la modularité et la lisibilité du code, tout en conservant (ou améliorant) les résultats du matériel généré.
Des exemples de conception utilisant notre méthodologie, incluant pour l’analyseur d’entête de paquets, ont été rendus publics.----------ABSTRACT: Software-Defined Networking (SDN) has emerged in recent years as a new network paradigm to de-ossify communication networks. Indeed, by offering a clear separation of network concerns
between the management, control, and data planes, SDN allows each of these planes to evolve independently, breaking the rigidity of traditional networks. However, while well
spread in the control and management planes, this de-ossification has only recently reached the data plane with the advent of packet processing languages, e.g. P4, and novel programmable switch architectures, e.g. Protocol Independent Switch Architecture (PISA). In this work, we focus on leveraging the PISA architecture by mainly exploiting the FPGA capabilities for efficient packet processing. In this way, we address this issue at different
abstraction levels: i) microarchitectural; ii) programming; and, iii) architectural. At the microarchitectural level, we have proposed an efficient FPGA-based packet parser
architecture, which is a major PISA’s component. The proposed packet parser follows a feedforward
pipeline architecture in which the internal microarchitectural has been meticulously optimized for FPGA implementation. The architecture is automatically generated by a P4- to-C++ compiler after several rounds of graph optimizations. The proposed solution achieves 100 Gb/s line rate with latency comparable to hand-written packet parsers. The throughput scales from 10 Gb/s to 160 Gb/s with moderate increase in resource consumption. Both the compiler and the packet parser codebase have been open-sourced to permit reproducibility. At the programming level, we have proposed a novel High-Level Synthesis (HLS) design methodology aiming at improving software and hardware quality. We have employed this novel methodology when designing the packet parser. In our work, we have exploited features of modern C++ that improves at the same time code modularity and readability while keeping (or improving) the results of the generated hardware. Design examples using our methodology have been publicly released
Module-per-Object: a Human-Driven Methodology for C++-based High-Level Synthesis Design
High-Level Synthesis (HLS) brings FPGAs to audiences previously unfamiliar to
hardware design. However, achieving the highest Quality-of-Results (QoR) with
HLS is still unattainable for most programmers. This requires detailed
knowledge of FPGA architecture and hardware design in order to produce
FPGA-friendly codes. Moreover, these codes are normally in conflict with best
coding practices, which favor code reuse, modularity, and conciseness.
To overcome these limitations, we propose Module-per-Object (MpO), a
human-driven HLS design methodology intended for both hardware designers and
software developers with limited FPGA expertise. MpO exploits modern C++ to
raise the abstraction level while improving QoR, code readability and
modularity. To guide HLS designers, we present the five characteristics of MpO
classes. Each characteristic exploits the power of HLS-supported modern C++
features to build C++-based hardware modules. These characteristics lead to
high-quality software descriptions and efficient hardware generation. We also
present a use case of MpO, where we use C++ as the intermediate language for
FPGA-targeted code generation from P4, a packet processing domain specific
language. The MpO methodology is evaluated using three design experiments: a
packet parser, a flow-based traffic manager, and a digital up-converter. Based
on experiments, we show that MpO can be comparable to hand-written VHDL code
while keeping a high abstraction level, human-readable coding style and
modularity. Compared to traditional C-based HLS design, MpO leads to more
efficient circuit generation, both in terms of performance and resource
utilization. Also, the MpO approach notably improves software quality,
augmenting parametrization while eliminating the incidence of code duplication.Comment: 9 pages. Paper accepted for publication at The 27th IEEE
International Symposium on Field-Programmable Custom Computing Machines, San
Diego CA, April 28 - May 1, 201
Bridging the Gap: FPGAs as Programmable Switches
The emergence of P4, a domain specific language, coupled to PISA, a domain
specific architecture, is revolutionizing the networking field. P4 allows to
describe how packets are processed by a programmable data plane, spanning ASICs
and CPUs, implementing PISA. Because the processing flexibility can be limited
on ASICs, while the CPUs performance for networking tasks lag behind, recent
works have proposed to implement PISA on FPGAs. However, little effort has been
dedicated to analyze whether FPGAs are good candidates to implement PISA. In
this work, we take a step back and evaluate the micro-architecture efficiency
of various PISA blocks. We demonstrate, supported by a theoretical and
experimental analysis, that the performance of a few PISA blocks is severely
limited by the current FPGA architectures. Specifically, we show that match
tables and programmable packet schedulers represent the main performance
bottlenecks for FPGA-based programmable switches. Thus, we explore two avenues
to alleviate these shortcomings. First, we identify network applications well
tailored to current FPGAs. Second, to support a wider range of networking
applications, we propose modifications to the FPGA architectures which can also
be of interest out of the networking field.Comment: To be published in : IEEE International Conference on High
Performance Switching and Routing 202
Design Principles for Packet Deparsers on FPGAs
The P4 language has drastically changed the networking field as it allows to quickly describe and implement new networking applications. Although a large variety of applications can be described with the P4 language, current programmable switch architectures impose significant constraints on P4 programs. To address this shortcoming, FPGAs have been explored as potential targets for P4 applications. P4 applications are described using three abstractions: a packet parser, match-action tables, and a packet deparser, which reassembles the output packet with the result of the match-action tables. While implementations of packet parsers and match-action tables on FPGAs have been widely covered in the literature, no general design principles have been presented for the packet deparser. Indeed, implementing a high-speed and efficient deparser on FPGAs remains an open issue because it requires a large amount of interconnections and the architecture must be tailored to a P4 program. As a result, in several works where a P4 application is implemented on FPGAs, the deparser consumes a significant proportion of chip resources. Hence, in this paper, we address this issue by presenting design principles for efficient and high-speed deparsers on FPGAs. As an artifact, we introduce a tool that generates an efficient vendor-agnostic deparser architecture from a P4 program.Our design has been validated and simulated with a cocotb-based framework.The resulting architecture is implemented on Xilinx Ultrascale+ FPGAs and supports a throughput of more than 200 Gbps while reducing resource usage by almost 10x compared to other solutions
EMU: Rapid prototyping of networking services
Due to their performance and flexibility, FPGAs are an attractive platform for the execution of network functions.
It has been a challenge for a long time though to make FPGA programming accessible to a large audience of developers. An appealing solution is to compile code from a general-purpose language to hardware using high-level synthesis. Unfortunately, current approaches to implement rich network functionality are insufficient because they lack: (i) libraries with abstractions for common network operations and data structures, (ii) bindings
to the underlying “substrate” on the FPGA, and (iii) debugging
and profiling support.
This paper describes Emu, a new standard library for an FPGA hardware compiler that enables developers to rapidly create and deploy network functionality. Emu allows for high-performance designs without being bound to particular packet processing paradigms. Furthermore, it supports running the same programs on CPUs, in Mininet, and on FPGAs, providing a better development environment that includes advanced debugging capabilities.
We demonstrate that network functions implemented using Emu have only negligible resource and performance overheads compared with natively-written hardware versions
EMU: Rapid prototyping of networking services
Due to their performance and flexibility, FPGAs are an attractive platform for the execution of network functions.
It has been a challenge for a long time though to make FPGA programming accessible to a large audience of developers. An appealing solution is to compile code from a general-purpose language to hardware using high-level synthesis. Unfortunately, current approaches to implement rich network functionality are insufficient because they lack: (i) libraries with abstractions for common network operations and data structures, (ii) bindings
to the underlying “substrate” on the FPGA, and (iii) debugging
and profiling support.
This paper describes Emu, a new standard library for an FPGA hardware compiler that enables developers to rapidly create and deploy network functionality. Emu allows for high-performance designs without being bound to particular packet processing paradigms. Furthermore, it supports running the same programs on CPUs, in Mininet, and on FPGAs, providing a better development environment that includes advanced debugging capabilities.
We demonstrate that network functions implemented using Emu have only negligible resource and performance overheads compared with natively-written hardware versions
The P4->NetFPGA Workflow for Line-Rate Packet Processing
P4 has emerged as the de facto standard language for describing how network packets should be processed, and is becoming widely used by network owners, systems developers, researchers and in the classroom. The goal of the work presented here is to make it easier for engineers, researchers and students to learn how to program using P4, and to build prototypes running on real hardware. Our target is the NetFPGA SUME platform, a 4x10 Gb/s PCIe card designed for use in universities for teaching and research. Until now, NetFPGA users have needed to learn an HDL such as Verilog or VHDL, making it off limits to many software developers and students. Therefore, we developed the P4->NetFPGA workflow, allowing developers to describe how packets are to be processed in the high-level P4 language, then compile their P4 programs to run at line rate on the NetFPGA SUME board. The P4->NetFPGA workflow is built upon the Xilinx P4-SDNet compiler and the NetFPGA SUME open source code base. In this tutorial paper, we provide an overview of the P4 programming language and describe the P4->NetFPGA workflow. We also describe how the workflow is being used by the P4 community to build research prototypes, and to teach how network systems are built by providing students with hands-on experience working with real hardware.Leverhulme Trust
Isaac Newton Trust
TBD other