642 research outputs found
P4-compatible High-level Synthesis of Low Latency 100 Gb/s Streaming Packet Parsers in FPGAs
Packet parsing is a key step in SDN-aware devices. Packet parsers in SDN
networks need to be both reconfigurable and fast, to support the evolving
network protocols and the increasing multi-gigabit data rates. The combination
of packet processing languages with FPGAs seems to be the perfect match for
these requirements. In this work, we develop an open-source FPGA-based
configurable architecture for arbitrary packet parsing to be used in SDN
networks. We generate low latency and high-speed streaming packet parsers
directly from a packet processing program. Our architecture is pipelined and
entirely modeled using templated C++ classes. The pipeline layout is derived
from a parser graph that corresponds a P4 code after a series of graph
transformation rounds. The RTL code is generated from the C++ description using
Xilinx Vivado HLS and synthesized with Xilinx Vivado. Our architecture achieves
100 Gb/s data rate in a Xilinx Virtex-7 FPGA while reducing the latency by 45%
and the LUT usage by 40% compared to the state-of-the-art.Comment: Accepted for publication at the 26th ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays February 25 - 27, 2018 Monterey
Marriott Hotel, Monterey, California, 7 pages, 7 figures, 1 tabl
Module-per-Object: a Human-Driven Methodology for C++-based High-Level Synthesis Design
High-Level Synthesis (HLS) brings FPGAs to audiences previously unfamiliar to
hardware design. However, achieving the highest Quality-of-Results (QoR) with
HLS is still unattainable for most programmers. This requires detailed
knowledge of FPGA architecture and hardware design in order to produce
FPGA-friendly codes. Moreover, these codes are normally in conflict with best
coding practices, which favor code reuse, modularity, and conciseness.
To overcome these limitations, we propose Module-per-Object (MpO), a
human-driven HLS design methodology intended for both hardware designers and
software developers with limited FPGA expertise. MpO exploits modern C++ to
raise the abstraction level while improving QoR, code readability and
modularity. To guide HLS designers, we present the five characteristics of MpO
classes. Each characteristic exploits the power of HLS-supported modern C++
features to build C++-based hardware modules. These characteristics lead to
high-quality software descriptions and efficient hardware generation. We also
present a use case of MpO, where we use C++ as the intermediate language for
FPGA-targeted code generation from P4, a packet processing domain specific
language. The MpO methodology is evaluated using three design experiments: a
packet parser, a flow-based traffic manager, and a digital up-converter. Based
on experiments, we show that MpO can be comparable to hand-written VHDL code
while keeping a high abstraction level, human-readable coding style and
modularity. Compared to traditional C-based HLS design, MpO leads to more
efficient circuit generation, both in terms of performance and resource
utilization. Also, the MpO approach notably improves software quality,
augmenting parametrization while eliminating the incidence of code duplication.Comment: 9 pages. Paper accepted for publication at The 27th IEEE
International Symposium on Field-Programmable Custom Computing Machines, San
Diego CA, April 28 - May 1, 201
HPQS: A fast, high-capacity, hybrid priority queuing system for high-speed networking devices
In this paper, we present a fast hybrid priority queue architecture intended for scheduling and prioritizing packets in a network data plane. Due to increasing traffic and tight requirements of high-speed networking devices, a high capacity priority queue, with constant latency and guaranteed performance is needed. We aim at reducing latency to best support the upcoming 5G wireless standards. The proposed hybrid priority queuing system (HPQS) enables pipelined queue operations with almost constant time complexity in practice. The proposed architecture is implemented in C++, and is synthesized with the Vivado High-Level Synthesis (HLS) tool. Two configurations are proposed. The first one is intended for scheduling with a multi-queuing system for which implementation results of 64 up to 512 independent queues are reported. The second configuration is intended for large capacity priority queues, that are placed and routed on a ZC706 board and a XCVU440-FLGB2377-3-E Xilinx FPGA supporting a total capacity of 1/2 million packet tags. The reported results are compared across a range of priority queue depths and performance metrics with existing approaches. The proposed HPQS supports links operating at 40 Gb/s
On the application of minimum noise tracking to cancel cosine shaped residual noise
It has been shown recently that for coherence based dual microphone array speech enhancement systems, cross-spectral subtraction is an efficient technique aimed to reduce the correlated noise components. The zero-phase filtering criterion employed in these methods is derived from the standard coherence function that is modified to
incorporate the noise cross power spectrum between the two channels. However, there has been limited success at applying coherence based filters when speech processing is carried out under relatively harsh acoustic conditions (SNR below -5dB) or when the speech and noise sources are closely spaced. We propose an alternative method that is effective, and that attempts to use a phase-based filtering criterion by substituting the cross power spectrum of the noisy signals received on the two channels by its real part. Then, a variant of the running minimum noise tracking procedure is applied on the estimated speech spectrum as an adaptive
postfiltering to reduce the cosine shaped power spectrum of the remaining residual musical noise to a minimum spectral floor. Using that adaptive postfilter, a softdecision
scheme is implemented to control the amount of noise suppression. Our preliminary results based on experiments conducted on real speech signals show an improved performance of the proposed method over the coherence based
approaches. These results also show that it performs well on speech while producing less spectral distortion even in severe noisy conditions
A high-speed, scalable, and programmable traffic manager architecture for flow-based networking
In this paper, we present a programmable and scalable traffic manager (TM) architecture, targeting requirements of high-speed networking devices, especially in the software-defined networking context. This TM is intended to ease the deployability of new architectures through field-programmable gate array (FPGA) platforms and to make the data plane programmable and scalable. Flow-based networking allows treating traffic in terms of flows rather than as a simple aggregation of individual packets, which simplifies scheduling and bandwidth allocation for each flow. Programmability brings agility, flexibility, and rapid adaptation to changes, allowing to meet network requirements in real-time. Traffic management with fast queuing and reduced latency plays an important role to support the upcoming 5G cellular communication technology. The proposed TM architecture is coded in C++ and is synthesized with the Vivado High-Level Synthesis tool. This TM is capable of supporting links operating beyond 40 Gb/s, on the ZC706 board and XCVU440-FLGB2377-3-E FPGA device from Xilinx, while achieving 80 Gb/s and 100 Gb/s throughput, respectively. The resulting placed and routed design was tested on the ZC706 board with its embedded ARM processor controlling table updates
Node configuration for the Aho-Corasick algorithm in intrusion detection systems
In this paper, we analyze the performance and cost trade-off from selecting two representations of nodes when implementing the Aho-Corasick algorithm. This algorithm can be used for pattern matching in network-based intrusion detection systems such as Snort. Our analysis uses the Snort 2.9.7 rules set, which contains almost 26k patterns. Our methodology consists of code profiling and analysis, followed by the selection of a parameter to maximize a metric that combines clock cycles count and memory usage. The parameter determines which of two types of nodes is selected for each trie node. We show that it is possible to select the parameter to optimize the metric, which results in an improvement by up to 12× compared with the single node-type case
Multidifferential study of identified charged hadron distributions in -tagged jets in proton-proton collisions at 13 TeV
Jet fragmentation functions are measured for the first time in proton-proton
collisions for charged pions, kaons, and protons within jets recoiling against
a boson. The charged-hadron distributions are studied longitudinally and
transversely to the jet direction for jets with transverse momentum 20 GeV and in the pseudorapidity range . The
data sample was collected with the LHCb experiment at a center-of-mass energy
of 13 TeV, corresponding to an integrated luminosity of 1.64 fb. Triple
differential distributions as a function of the hadron longitudinal momentum
fraction, hadron transverse momentum, and jet transverse momentum are also
measured for the first time. This helps constrain transverse-momentum-dependent
fragmentation functions. Differences in the shapes and magnitudes of the
measured distributions for the different hadron species provide insights into
the hadronization process for jets predominantly initiated by light quarks.Comment: All figures and tables, along with machine-readable versions and any
supplementary material and additional information, are available at
https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-013.html (LHCb
public pages
Study of the decay
The decay is studied
in proton-proton collisions at a center-of-mass energy of TeV
using data corresponding to an integrated luminosity of 5
collected by the LHCb experiment. In the system, the
state observed at the BaBar and Belle experiments is
resolved into two narrower states, and ,
whose masses and widths are measured to be where the first uncertainties are statistical and the second
systematic. The results are consistent with a previous LHCb measurement using a
prompt sample. Evidence of a new
state is found with a local significance of , whose mass and width
are measured to be and , respectively. In addition, evidence of a new decay mode
is found with a significance of
. The relative branching fraction of with respect to the
decay is measured to be , where the first
uncertainty is statistical, the second systematic and the third originates from
the branching fractions of charm hadron decays.Comment: All figures and tables, along with any supplementary material and
additional information, are available at
https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-028.html (LHCb
public pages
- …