Search CORE

199 research outputs found

P4-compatible High-level Synthesis of Low Latency 100 Gb/s Streaming Packet Parsers in FPGAs

Author: Boyer François-Raymond
da Silva Jeferson Santiago
Langlois J. M. Pierre
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/11/2017
Field of study

Packet parsing is a key step in SDN-aware devices. Packet parsers in SDN networks need to be both reconfigurable and fast, to support the evolving network protocols and the increasing multi-gigabit data rates. The combination of packet processing languages with FPGAs seems to be the perfect match for these requirements. In this work, we develop an open-source FPGA-based configurable architecture for arbitrary packet parsing to be used in SDN networks. We generate low latency and high-speed streaming packet parsers directly from a packet processing program. Our architecture is pipelined and entirely modeled using templated C++ classes. The pipeline layout is derived from a parser graph that corresponds a P4 code after a series of graph transformation rounds. The RTL code is generated from the C++ description using Xilinx Vivado HLS and synthesized with Xilinx Vivado. Our architecture achieves 100 Gb/s data rate in a Xilinx Virtex-7 FPGA while reducing the latency by 45% and the LUT usage by 40% compared to the state-of-the-art.Comment: Accepted for publication at the 26th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays February 25 - 27, 2018 Monterey Marriott Hotel, Monterey, California, 7 pages, 7 figures, 1 tabl

arXiv.org e-Print Archive

PolyPublie

P4-PSFP: P4-Based Per-Stream Filtering and Policing for Time-Sensitive Networking

Author: Ihle Fabian
Lindner Steffen
Menth Michael
Publication venue
Publication date: 13/11/2023
Field of study

Time-Sensitive Networking (TSN) extends Ethernet to enable real-time communication, including the Credit-Based Shaper (CBS) for prioritized scheduling and the Time-Aware Shaper (TAS) for scheduled traffic. Generally, TSN requires streams to be explicitly admitted before being transmitted. To ensure that admitted traffic conforms with the traffic descriptors indicated for admission control, Per-Stream Filtering and Policing (PSFP) has been defined. For credit-based metering, well-known token bucket policers are applied. However, time-based metering requires time-dependent switch behavior and time synchronization with sub-microsecond precision. While TSN-capable switches support various TSN traffic shaping mechanisms, a full implementation of PSFP is still not available. To bridge this gap, we present a P4-based implementation of PSFP on a 100 Gb/s per port hardware switch. We explain the most interesting aspects of the PSFP implementation whose code is available on GitHub. We demonstrate credit-based and time-based policing and synchronization capabilities to validate the functionality and effectiveness of P4-PSFP. The implementation scales up to 35840 streams depending on the stream identification method. P4-PSFP can be used in practice as long as appropriate TSN switches lack this function. Moreover, its implementation may be helpful for other P4-based hardware implementations that require time synchronization

arXiv.org e-Print Archive

NetFPGA SUME: Toward 100 Gbps as research commodity

Author: Audzevich Y
Covington GA
Moore AW
Zilberman N
Publication venue: IEEE Micro
Publication date: 01/01/2014
Field of study

The demand-led growth of datacenter networks has meant that many constituent technologies are beyond the budget of the research community. In order to make and validate timely and relevant research contributions, the wider research community requires accessible evaluation, experimentation and demonstration environments with speciﬁcation comparable to the subsystems of the most massive datacenter networks. We present NetFPGA SUME, an FPGA-based PCIe board with I/O capabilities for 100Gb/s operation as NIC, multiport switch, ﬁrewall, or test/measurement environment. As a powerful new NetFPGA platform, SUME provides an accessible development environment that both reuses existing codebases and enables new designs.This work was jointly supported by EPSRC INTERNET Project EP/H040536/1, National Science Foundation under Grant No. CNS-0855268, and Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory (AFRL), under contract FA8750-11-C-0249.This is the author accepted manuscript. The final version is available from IEEE at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6866035&sortType%3Dasc_p_Sequence%26filter%3DAND%28p_IS_Number%3A5210076%29

Crossref

Apollo (Cambridge)

Bridging the Gap: FPGAs as Programmable Switches

Author: da Silva Jeferson Santiago
Langlois J. M. Pierre
Luinaud Thomas
Savaria Yvon
Stimpfling Thibaut
Publication venue
Publication date: 01/01/2020
Field of study

The emergence of P4, a domain specific language, coupled to PISA, a domain specific architecture, is revolutionizing the networking field. P4 allows to describe how packets are processed by a programmable data plane, spanning ASICs and CPUs, implementing PISA. Because the processing flexibility can be limited on ASICs, while the CPUs performance for networking tasks lag behind, recent works have proposed to implement PISA on FPGAs. However, little effort has been dedicated to analyze whether FPGAs are good candidates to implement PISA. In this work, we take a step back and evaluate the micro-architecture efficiency of various PISA blocks. We demonstrate, supported by a theoretical and experimental analysis, that the performance of a few PISA blocks is severely limited by the current FPGA architectures. Specifically, we show that match tables and programmable packet schedulers represent the main performance bottlenecks for FPGA-based programmable switches. Thus, we explore two avenues to alleviate these shortcomings. First, we identify network applications well tailored to current FPGAs. Second, to support a wider range of networking applications, we propose modifications to the FPGA architectures which can also be of interest out of the networking field.Comment: To be published in : IEEE International Conference on High Performance Switching and Routing 202

arXiv.org e-Print Archive

PolyPublie

Distributed hardware accelerated secure joint computation on the COPA framework

Author: Haghi Pouya
Herbord Martin
Jain Shweta
Kot Andriy
Krishnan Venkata
Patel Rushi
Varia Mayrmk
Publication venue: IEEE
Publication date: 10/04/2022
Field of study

https://arxiv.org/pdf/2204.04816.pdfFirst author draf

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

HP4 High-Performance Programmable Packet Parser

Author: Ibrahim Amr
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 16/04/2022
Field of study

Now, header parsing is the main topic in the modern network systems to support many operations such as packet processing and security functions. The header parser design has a significant effect on the network devices' performances (latency, throughput, and resource utilization). However, the header parser design suffers from a lot number of difficulties, such as the incrementing in network throughput and a variety of protocols. Therefore, the programmable hardware packet parsing is the best solution to meet the dynamic reconfiguration and speed needs. Field Programmable Gate Array (FPGA) is an appropriate device for programmable high-speed packet implementation. This paper introduces a novel FPGA High-Performance Programmable Packet Parser architecture (HP4). HP4 automatically generated by the P4 (Programming protocol-independent Packet Processors) to optimize the speed, dynamic reconfiguration, and resource consumption. The HP4 shows a pipelined packet parser dynamic reconfiguration and low latency. In addition to high throughput (over 600 Gb/s), HP4 resource utilization is less than 7.5 percent of Virtex-7 870HT, and latency is about 88 ns. HP4 can use in a high-speed dynamic packet switch and network security

International Journal of Communication Networks and Information Security (IJCNIS)

FEC killed the cut-through switch

Author: Moore AW
Sella OS
Zilberman N
Publication venue: NEAT 2018 - Proceedings of the 2018 Workshop on Networking for Emerging Applications and Technologies, Part of SIGCOMM 2018
Publication date: 01/01/2018
Field of study

Latency penalty in Ethernet links beyond 10Gb/s is due to forward error correction (FEC) blocks. In the worst case a single-hop penalty approaches the latency of an entire cutthrough switch. Latency jitter is also introduced, making latency prediction harder, with large peak to peak variance. These factors stretch the tail of latency distribution in Rackscale systems and Data Centers, which in turn degrades performance of distributed applications. We analyse the underlying mechanisms, calculate lower bounds and propose a different approach that would reduce the penalty, allow control over latency and feedback for application level optimisation.Rudin foundation, Isaac Newton trust, Leverhulme trust, Microsoft researc

Crossref

Apollo (Cambridge)

A high-speed, scalable, and programmable traffic manager architecture for flow-based networking

Author: Benacer Imad
Boyer François-Raymond
Savaria Yvon
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/01/2019
Field of study

In this paper, we present a programmable and scalable traffic manager (TM) architecture, targeting requirements of high-speed networking devices, especially in the software-defined networking context. This TM is intended to ease the deployability of new architectures through field-programmable gate array (FPGA) platforms and to make the data plane programmable and scalable. Flow-based networking allows treating traffic in terms of flows rather than as a simple aggregation of individual packets, which simplifies scheduling and bandwidth allocation for each flow. Programmability brings agility, flexibility, and rapid adaptation to changes, allowing to meet network requirements in real-time. Traffic management with fast queuing and reduced latency plays an important role to support the upcoming 5G cellular communication technology. The proposed TM architecture is coded in C++ and is synthesized with the Vivado High-Level Synthesis tool. This TM is capable of supporting links operating beyond 40 Gb/s, on the ZC706 board and XCVU440-FLGB2377-3-E FPGA device from Xilinx, while achieving 80 Gb/s and 100 Gb/s throughput, respectively. The resulting placed and routed design was tested on the ZC706 board with its embedded ARM processor controlling table updates

PolyPublie

Hierarchical Content Stores in High-speed ICN Routers: Emulation and Prototype Implementation

Author: Barcellos Marinho
Gallo Massimo
Leonardi Emilio
Mansilha Rordrigo
Perino Diego
Rossi Dario
Saino Lorenzo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/09/2015
Field of study

Recent work motivates the design of Information-centric rou-ters that make use of hierarchies of memory to jointly scale in the size and speed of content stores. The present paper advances this understanding by (i) instantiating a general purpose two-layer packet-level caching system, (ii) investigating the solution design space via emulation, and (iii) introducing a proof-of-concept prototype. The emulation-based study reveals insights about the broad design space, the expected impact of workload, and gains due to multi-threaded execution. The full-blown system prototype experimentally confirms that, by exploiting both DRAM and SSD memory technologies, ICN routers can sustain cache operations in excess of 10Gbps running on off-the-shelf hardware

INRIA a CCSD electronic archive server

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Control Plane Strategies for Elastic Optical Networks

Author: Turus Ioan
Publication venue: Technical University of Denmark
Publication date: 01/01/2015
Field of study

Online Research Database In Technology