642 research outputs found

    P4-compatible High-level Synthesis of Low Latency 100 Gb/s Streaming Packet Parsers in FPGAs

    Full text link
    Packet parsing is a key step in SDN-aware devices. Packet parsers in SDN networks need to be both reconfigurable and fast, to support the evolving network protocols and the increasing multi-gigabit data rates. The combination of packet processing languages with FPGAs seems to be the perfect match for these requirements. In this work, we develop an open-source FPGA-based configurable architecture for arbitrary packet parsing to be used in SDN networks. We generate low latency and high-speed streaming packet parsers directly from a packet processing program. Our architecture is pipelined and entirely modeled using templated C++ classes. The pipeline layout is derived from a parser graph that corresponds a P4 code after a series of graph transformation rounds. The RTL code is generated from the C++ description using Xilinx Vivado HLS and synthesized with Xilinx Vivado. Our architecture achieves 100 Gb/s data rate in a Xilinx Virtex-7 FPGA while reducing the latency by 45% and the LUT usage by 40% compared to the state-of-the-art.Comment: Accepted for publication at the 26th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays February 25 - 27, 2018 Monterey Marriott Hotel, Monterey, California, 7 pages, 7 figures, 1 tabl

    Module-per-Object: a Human-Driven Methodology for C++-based High-Level Synthesis Design

    Full text link
    High-Level Synthesis (HLS) brings FPGAs to audiences previously unfamiliar to hardware design. However, achieving the highest Quality-of-Results (QoR) with HLS is still unattainable for most programmers. This requires detailed knowledge of FPGA architecture and hardware design in order to produce FPGA-friendly codes. Moreover, these codes are normally in conflict with best coding practices, which favor code reuse, modularity, and conciseness. To overcome these limitations, we propose Module-per-Object (MpO), a human-driven HLS design methodology intended for both hardware designers and software developers with limited FPGA expertise. MpO exploits modern C++ to raise the abstraction level while improving QoR, code readability and modularity. To guide HLS designers, we present the five characteristics of MpO classes. Each characteristic exploits the power of HLS-supported modern C++ features to build C++-based hardware modules. These characteristics lead to high-quality software descriptions and efficient hardware generation. We also present a use case of MpO, where we use C++ as the intermediate language for FPGA-targeted code generation from P4, a packet processing domain specific language. The MpO methodology is evaluated using three design experiments: a packet parser, a flow-based traffic manager, and a digital up-converter. Based on experiments, we show that MpO can be comparable to hand-written VHDL code while keeping a high abstraction level, human-readable coding style and modularity. Compared to traditional C-based HLS design, MpO leads to more efficient circuit generation, both in terms of performance and resource utilization. Also, the MpO approach notably improves software quality, augmenting parametrization while eliminating the incidence of code duplication.Comment: 9 pages. Paper accepted for publication at The 27th IEEE International Symposium on Field-Programmable Custom Computing Machines, San Diego CA, April 28 - May 1, 201

    HPQS: A fast, high-capacity, hybrid priority queuing system for high-speed networking devices

    Get PDF
    In this paper, we present a fast hybrid priority queue architecture intended for scheduling and prioritizing packets in a network data plane. Due to increasing traffic and tight requirements of high-speed networking devices, a high capacity priority queue, with constant latency and guaranteed performance is needed. We aim at reducing latency to best support the upcoming 5G wireless standards. The proposed hybrid priority queuing system (HPQS) enables pipelined queue operations with almost constant time complexity in practice. The proposed architecture is implemented in C++, and is synthesized with the Vivado High-Level Synthesis (HLS) tool. Two configurations are proposed. The first one is intended for scheduling with a multi-queuing system for which implementation results of 64 up to 512 independent queues are reported. The second configuration is intended for large capacity priority queues, that are placed and routed on a ZC706 board and a XCVU440-FLGB2377-3-E Xilinx FPGA supporting a total capacity of 1/2 million packet tags. The reported results are compared across a range of priority queue depths and performance metrics with existing approaches. The proposed HPQS supports links operating at 40 Gb/s

    On the application of minimum noise tracking to cancel cosine shaped residual noise

    Get PDF
    It has been shown recently that for coherence based dual microphone array speech enhancement systems, cross-spectral subtraction is an efficient technique aimed to reduce the correlated noise components. The zero-phase filtering criterion employed in these methods is derived from the standard coherence function that is modified to incorporate the noise cross power spectrum between the two channels. However, there has been limited success at applying coherence based filters when speech processing is carried out under relatively harsh acoustic conditions (SNR below -5dB) or when the speech and noise sources are closely spaced. We propose an alternative method that is effective, and that attempts to use a phase-based filtering criterion by substituting the cross power spectrum of the noisy signals received on the two channels by its real part. Then, a variant of the running minimum noise tracking procedure is applied on the estimated speech spectrum as an adaptive postfiltering to reduce the cosine shaped power spectrum of the remaining residual musical noise to a minimum spectral floor. Using that adaptive postfilter, a softdecision scheme is implemented to control the amount of noise suppression. Our preliminary results based on experiments conducted on real speech signals show an improved performance of the proposed method over the coherence based approaches. These results also show that it performs well on speech while producing less spectral distortion even in severe noisy conditions

    A high-speed, scalable, and programmable traffic manager architecture for flow-based networking

    Get PDF
    In this paper, we present a programmable and scalable traffic manager (TM) architecture, targeting requirements of high-speed networking devices, especially in the software-defined networking context. This TM is intended to ease the deployability of new architectures through field-programmable gate array (FPGA) platforms and to make the data plane programmable and scalable. Flow-based networking allows treating traffic in terms of flows rather than as a simple aggregation of individual packets, which simplifies scheduling and bandwidth allocation for each flow. Programmability brings agility, flexibility, and rapid adaptation to changes, allowing to meet network requirements in real-time. Traffic management with fast queuing and reduced latency plays an important role to support the upcoming 5G cellular communication technology. The proposed TM architecture is coded in C++ and is synthesized with the Vivado High-Level Synthesis tool. This TM is capable of supporting links operating beyond 40 Gb/s, on the ZC706 board and XCVU440-FLGB2377-3-E FPGA device from Xilinx, while achieving 80 Gb/s and 100 Gb/s throughput, respectively. The resulting placed and routed design was tested on the ZC706 board with its embedded ARM processor controlling table updates

    Node configuration for the Aho-Corasick algorithm in intrusion detection systems

    Get PDF
    In this paper, we analyze the performance and cost trade-off from selecting two representations of nodes when implementing the Aho-Corasick algorithm. This algorithm can be used for pattern matching in network-based intrusion detection systems such as Snort. Our analysis uses the Snort 2.9.7 rules set, which contains almost 26k patterns. Our methodology consists of code profiling and analysis, followed by the selection of a parameter to maximize a metric that combines clock cycles count and memory usage. The parameter determines which of two types of nodes is selected for each trie node. We show that it is possible to select the parameter to optimize the metric, which results in an improvement by up to 12× compared with the single node-type case

    Multidifferential study of identified charged hadron distributions in ZZ-tagged jets in proton-proton collisions at s=\sqrt{s}=13 TeV

    Full text link
    Jet fragmentation functions are measured for the first time in proton-proton collisions for charged pions, kaons, and protons within jets recoiling against a ZZ boson. The charged-hadron distributions are studied longitudinally and transversely to the jet direction for jets with transverse momentum 20 <pT<100< p_{\textrm{T}} < 100 GeV and in the pseudorapidity range 2.5<η<42.5 < \eta < 4. The data sample was collected with the LHCb experiment at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 1.64 fb1^{-1}. Triple differential distributions as a function of the hadron longitudinal momentum fraction, hadron transverse momentum, and jet transverse momentum are also measured for the first time. This helps constrain transverse-momentum-dependent fragmentation functions. Differences in the shapes and magnitudes of the measured distributions for the different hadron species provide insights into the hadronization process for jets predominantly initiated by light quarks.Comment: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-013.html (LHCb public pages

    Study of the BΛc+ΛˉcKB^{-} \to \Lambda_{c}^{+} \bar{\Lambda}_{c}^{-} K^{-} decay

    Full text link
    The decay BΛc+ΛˉcKB^{-} \to \Lambda_{c}^{+} \bar{\Lambda}_{c}^{-} K^{-} is studied in proton-proton collisions at a center-of-mass energy of s=13\sqrt{s}=13 TeV using data corresponding to an integrated luminosity of 5 fb1\mathrm{fb}^{-1} collected by the LHCb experiment. In the Λc+K\Lambda_{c}^+ K^{-} system, the Ξc(2930)0\Xi_{c}(2930)^{0} state observed at the BaBar and Belle experiments is resolved into two narrower states, Ξc(2923)0\Xi_{c}(2923)^{0} and Ξc(2939)0\Xi_{c}(2939)^{0}, whose masses and widths are measured to be m(Ξc(2923)0)=2924.5±0.4±1.1MeV,m(Ξc(2939)0)=2938.5±0.9±2.3MeV,Γ(Ξc(2923)0)=0004.8±0.9±1.5MeV,Γ(Ξc(2939)0)=0011.0±1.9±7.5MeV, m(\Xi_{c}(2923)^{0}) = 2924.5 \pm 0.4 \pm 1.1 \,\mathrm{MeV}, \\ m(\Xi_{c}(2939)^{0}) = 2938.5 \pm 0.9 \pm 2.3 \,\mathrm{MeV}, \\ \Gamma(\Xi_{c}(2923)^{0}) = \phantom{000}4.8 \pm 0.9 \pm 1.5 \,\mathrm{MeV},\\ \Gamma(\Xi_{c}(2939)^{0}) = \phantom{00}11.0 \pm 1.9 \pm 7.5 \,\mathrm{MeV}, where the first uncertainties are statistical and the second systematic. The results are consistent with a previous LHCb measurement using a prompt Λc+K\Lambda_{c}^{+} K^{-} sample. Evidence of a new Ξc(2880)0\Xi_{c}(2880)^{0} state is found with a local significance of 3.8σ3.8\,\sigma, whose mass and width are measured to be 2881.8±3.1±8.5MeV2881.8 \pm 3.1 \pm 8.5\,\mathrm{MeV} and 12.4±5.3±5.8MeV12.4 \pm 5.3 \pm 5.8 \,\mathrm{MeV}, respectively. In addition, evidence of a new decay mode Ξc(2790)0Λc+K\Xi_{c}(2790)^{0} \to \Lambda_{c}^{+} K^{-} is found with a significance of 3.7σ3.7\,\sigma. The relative branching fraction of BΛc+ΛˉcKB^{-} \to \Lambda_{c}^{+} \bar{\Lambda}_{c}^{-} K^{-} with respect to the BD+DKB^{-} \to D^{+} D^{-} K^{-} decay is measured to be 2.36±0.11±0.22±0.252.36 \pm 0.11 \pm 0.22 \pm 0.25, where the first uncertainty is statistical, the second systematic and the third originates from the branching fractions of charm hadron decays.Comment: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-028.html (LHCb public pages
    corecore