P4-compatible High-level Synthesis of Low Latency 100 Gb/s Streaming Packet Parsers in FPGAs
Packet parsing is a key step in SDN-aware devices. Packet parsers in SDN
networks need to be both reconfigurable and fast, to support the evolving
network protocols and the increasing multi-gigabit data rates. The combination
of packet processing languages with FPGAs seems to be the perfect match for
these requirements. In this work, we develop an open-source FPGA-based
configurable architecture for arbitrary packet parsing to be used in SDN
networks. We generate low latency and high-speed streaming packet parsers
directly from a packet processing program. Our architecture is pipelined and
entirely modeled using templated C++ classes. The pipeline layout is derived
from a parser graph that corresponds to a P4 program after a series of graph
transformation rounds. The RTL code is generated from the C++ description using
Xilinx Vivado HLS and synthesized with Xilinx Vivado. Our architecture achieves
100 Gb/s data rate in a Xilinx Virtex-7 FPGA while reducing the latency by 45%
and the LUT usage by 40% compared to the state-of-the-art.
Comment: Accepted for publication at the 26th ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays, February 25 - 27, 2018, Monterey
Marriott Hotel, Monterey, California, 7 pages, 7 figures, 1 table
P4-PSFP: P4-Based Per-Stream Filtering and Policing for Time-Sensitive Networking
Time-Sensitive Networking (TSN) extends Ethernet to enable real-time
communication, including the Credit-Based Shaper (CBS) for prioritized
scheduling and the Time-Aware Shaper (TAS) for scheduled traffic. Generally,
TSN requires streams to be explicitly admitted before being transmitted. To
ensure that admitted traffic conforms with the traffic descriptors indicated
for admission control, Per-Stream Filtering and Policing (PSFP) has been
defined. For credit-based metering, well-known token bucket policers are
applied. However, time-based metering requires time-dependent switch behavior
and time synchronization with sub-microsecond precision. While TSN-capable
switches support various TSN traffic shaping mechanisms, a full implementation
of PSFP is still not available. To bridge this gap, we present a P4-based
implementation of PSFP on a 100 Gb/s per port hardware switch. We explain the
most interesting aspects of the PSFP implementation whose code is available on
GitHub. We demonstrate credit-based and time-based policing and synchronization
capabilities to validate the functionality and effectiveness of P4-PSFP. The
implementation scales up to 35840 streams depending on the stream
identification method. P4-PSFP can be used in practice as long as appropriate
TSN switches lack this function. Moreover, its implementation may be helpful
for other P4-based hardware implementations that require time synchronization
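The credit-based metering mentioned in this abstract relies on token bucket policers. A minimal sketch of a single-rate token bucket is given here for illustration only; it is not the P4-PSFP code, and the class name, parameter names, and the choice of bytes as the token unit are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TokenBucketPolicer:
    """Single-rate token bucket; one token corresponds to one byte (assumption)."""
    rate_bps: float          # refill rate in bits per second (hypothetical parameter)
    burst_bytes: float       # bucket depth in bytes (hypothetical parameter)
    tokens: float = 0.0      # current fill level in bytes
    last_ts: float = 0.0     # timestamp of the last update, in seconds

    def __post_init__(self) -> None:
        # Start with a full bucket so an initial burst is admitted.
        self.tokens = self.burst_bytes

    def conforms(self, now: float, pkt_len: int) -> bool:
        """Return True if a packet of pkt_len bytes arriving at time now may pass."""
        if now > self.last_ts:
            # Refill for the elapsed time, capped at the bucket depth.
            refill = (now - self.last_ts) * self.rate_bps / 8.0
            self.tokens = min(self.burst_bytes, self.tokens + refill)
            self.last_ts = now
        if pkt_len <= self.tokens:
            self.tokens -= pkt_len
            return True
        # Non-conforming packet: tokens are left untouched and the packet is dropped.
        return False


policer = TokenBucketPolicer(rate_bps=8000, burst_bytes=1500)  # 1000 bytes per second
print(policer.conforms(0.0, 1500))  # full bucket admits the burst
print(policer.conforms(0.0, 100))   # bucket is empty, so this packet is dropped
```

A hardware data plane would typically keep the fill level and timestamp in per-stream registers; the floating-point arithmetic here is only for readability.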
NetFPGA SUME: Toward 100 Gbps as research commodity
The demand-led growth of datacenter networks has
meant that many constituent technologies are beyond the budget
of the research community. In order to make and validate
timely and relevant research contributions, the wider research
community requires accessible evaluation, experimentation and
demonstration environments with specification comparable to
the subsystems of the most massive datacenter networks. We
present NetFPGA SUME, an FPGA-based PCIe board with I/O
capabilities for 100Gb/s operation as NIC, multiport switch,
firewall, or test/measurement environment. As a powerful new
NetFPGA platform, SUME provides an accessible development
environment that both reuses existing codebases and enables new
designs.
This work was jointly supported by EPSRC INTERNET
Project EP/H040536/1, National Science Foundation under
Grant No. CNS-0855268, and Defense Advanced Research
Projects Agency (DARPA) and Air Force Research Laboratory (AFRL), under contract FA8750-11-C-0249.
This is the author accepted manuscript. The final version is available from IEEE at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6866035&sortType%3Dasc_p_Sequence%26filter%3DAND%28p_IS_Number%3A5210076%29
Bridging the Gap: FPGAs as Programmable Switches
The emergence of P4, a domain-specific language, coupled with PISA, a
domain-specific architecture, is revolutionizing the networking field. P4 makes
it possible to describe how packets are processed by a programmable data plane,
spanning ASICs and CPUs, implementing PISA. Because the processing flexibility
can be limited on ASICs, while CPU performance for networking tasks lags behind,
recent works have proposed implementing PISA on FPGAs. However, little effort has
been dedicated to analyzing whether FPGAs are good candidates to implement PISA. In
this work, we take a step back and evaluate the micro-architecture efficiency
of various PISA blocks. We demonstrate, supported by a theoretical and
experimental analysis, that the performance of a few PISA blocks is severely
limited by the current FPGA architectures. Specifically, we show that match
tables and programmable packet schedulers represent the main performance
bottlenecks for FPGA-based programmable switches. Thus, we explore two avenues
to alleviate these shortcomings. First, we identify network applications well
tailored to current FPGAs. Second, to support a wider range of networking
applications, we propose modifications to the FPGA architectures which can also
be of interest outside the networking field.
Comment: To be published in: IEEE International Conference on High
Performance Switching and Routing 202
Distributed hardware accelerated secure joint computation on the COPA framework
https://arxiv.org/pdf/2204.04816.pdf
First author draft
HP4 High-Performance Programmable Packet Parser
Header parsing is a central topic in modern network systems because it supports many operations, such as packet processing and security functions. The header parser design has a significant effect on the performance of network devices (latency, throughput, and resource utilization). However, header parser design faces a number of difficulties, such as increasing network throughput and the variety of protocols. Therefore, programmable hardware packet parsing is the best solution to meet the needs for dynamic reconfiguration and speed. A Field Programmable Gate Array (FPGA) is an appropriate device for implementing programmable high-speed packet parsing. This paper introduces a novel FPGA-based High-Performance Programmable Packet Parser architecture (HP4). HP4 is automatically generated from P4 (Programming Protocol-independent Packet Processors) to optimize speed, dynamic reconfiguration, and resource consumption. HP4 provides a pipelined packet parser with dynamic reconfiguration and low latency. In addition to high throughput (over 600 Gb/s), HP4's resource utilization is less than 7.5 percent of a Virtex-7 870HT, and its latency is about 88 ns. HP4 can be used in high-speed dynamic packet switches and network security
FEC killed the cut-through switch
Latency penalty in Ethernet links beyond 10Gb/s is due to
forward error correction (FEC) blocks. In the worst case a
single-hop penalty approaches the latency of an entire cut-through
switch. Latency jitter is also introduced, making
latency prediction harder, with large peak to peak variance.
These factors stretch the tail of the latency distribution in Rack-scale
systems and Data Centers, which in turn degrades
performance of distributed applications. We analyse the underlying
mechanisms, calculate lower bounds and propose
a different approach that would reduce the penalty, allow
control over latency and feedback for application level optimisation.
Rudin Foundation, Isaac Newton Trust, Leverhulme Trust, Microsoft Research
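As a rough illustration of why a block FEC imposes a latency floor, the sketch below assumes that a decoder cannot start correcting until a whole codeword has arrived, so the time to receive one codeword bounds the added delay from below. The RS(528,514) code with 10-bit symbols and the two line rates are assumptions chosen for illustration; the abstract names neither a code nor a rate, and the function name is hypothetical.

```python
def codeword_accumulation_ns(codeword_bits: int, line_rate_gbps: float) -> float:
    """Time needed to receive one full FEC codeword, in nanoseconds.

    Bits divided by Gb/s gives nanoseconds directly.
    """
    return codeword_bits / line_rate_gbps


# Assumption: RS(528,514) with 10-bit symbols, i.e. 5280 bits per codeword.
RS_528_514_BITS = 528 * 10

# Assumed line rates: a single 25.78125 Gb/s lane and a 103.125 Gb/s aggregate.
for rate_gbps in (25.78125, 103.125):
    delay = codeword_accumulation_ns(RS_528_514_BITS, rate_gbps)
    print(f"{rate_gbps} Gb/s: {delay:.1f} ns")
# prints "25.78125 Gb/s: 204.8 ns" and "103.125 Gb/s: 51.2 ns"
```

This covers codeword accumulation only; decoding time and any interleaving would add to it.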
A high-speed, scalable, and programmable traffic manager architecture for flow-based networking
In this paper, we present a programmable and scalable traffic manager (TM) architecture, targeting requirements of high-speed networking devices, especially in the software-defined networking context. This TM is intended to ease the deployability of new architectures through field-programmable gate array (FPGA) platforms and to make the data plane programmable and scalable. Flow-based networking allows treating traffic in terms of flows rather than as a simple aggregation of individual packets, which simplifies scheduling and bandwidth allocation for each flow. Programmability brings agility, flexibility, and rapid adaptation to changes, making it possible to meet network requirements in real time. Traffic management with fast queuing and reduced latency plays an important role in supporting the upcoming 5G cellular communication technology. The proposed TM architecture is coded in C++ and is synthesized with the Vivado High-Level Synthesis tool. This TM is capable of supporting links operating beyond 40 Gb/s, on the ZC706 board and XCVU440-FLGB2377-3-E FPGA device from Xilinx, while achieving 80 Gb/s and 100 Gb/s throughput, respectively. The resulting placed and routed design was tested on the ZC706 board with its embedded ARM processor controlling table updates
Hierarchical Content Stores in High-speed ICN Routers: Emulation and Prototype Implementation
Recent work motivates the design of Information-centric routers that make use of hierarchies of memory to jointly scale in the size and speed of content stores. The present paper advances this understanding by (i) instantiating a general purpose two-layer packet-level caching system, (ii) investigating the solution design space via emulation, and (iii) introducing a proof-of-concept prototype. The emulation-based study reveals insights about the broad design space, the expected impact of workload, and gains due to multi-threaded execution. The full-blown system prototype experimentally confirms that, by exploiting both DRAM and SSD memory technologies, ICN routers can sustain cache operations in excess of 10Gbps running on off-the-shelf hardware
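The two-layer idea can be illustrated with a toy content store in which a small fast layer sits in front of a larger slow one. This is a sketch under stated assumptions, not the paper's design: the LRU replacement in both layers, the demote-on-eviction and promote-on-hit policies, and all names are hypothetical, and plain dictionaries stand in for DRAM and SSD.

```python
from collections import OrderedDict
from typing import Optional


class TwoLayerCache:
    """Toy two-layer content store: l1 stands in for DRAM, l2 for SSD (assumption)."""

    def __init__(self, l1_capacity: int, l2_capacity: int) -> None:
        self.l1: "OrderedDict[str, bytes]" = OrderedDict()
        self.l2: "OrderedDict[str, bytes]" = OrderedDict()
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity

    def put(self, name: str, data: bytes) -> None:
        """Insert into the fast layer, demoting the LRU entry to the slow layer."""
        self.l2.pop(name, None)      # keep at most one copy across both layers
        self.l1[name] = data
        self.l1.move_to_end(name)    # mark as most recently used
        if len(self.l1) > self.l1_capacity:
            old_name, old_data = self.l1.popitem(last=False)
            self.l2[old_name] = old_data
            if len(self.l2) > self.l2_capacity:
                self.l2.popitem(last=False)  # evict from the store entirely

    def get(self, name: str) -> Optional[bytes]:
        """Look up a content name; a hit in the slow layer promotes the entry."""
        if name in self.l1:
            self.l1.move_to_end(name)
            return self.l1[name]
        if name in self.l2:
            data = self.l2.pop(name)
            self.put(name, data)
            return data
        return None


store = TwoLayerCache(l1_capacity=1, l2_capacity=1)
store.put("/video/seg1", b"A")
store.put("/video/seg2", b"B")       # seg1 is demoted to the slow layer
print(store.get("/video/seg1"))      # hit in the slow layer, promoted back
```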