P4-compatible High-level Synthesis of Low Latency 100 Gb/s Streaming Packet Parsers in FPGAs
Packet parsing is a key step in SDN-aware devices. Packet parsers in SDN
networks need to be both reconfigurable and fast, to support the evolving
network protocols and the increasing multi-gigabit data rates. The combination
of packet processing languages with FPGAs seems to be the perfect match for
these requirements. In this work, we develop an open-source FPGA-based
configurable architecture for arbitrary packet parsing to be used in SDN
networks. We generate low latency and high-speed streaming packet parsers
directly from a packet processing program. Our architecture is pipelined and
entirely modeled using templated C++ classes. The pipeline layout is derived
from a parser graph that corresponds to the P4 code after a series of graph
transformation rounds. The RTL code is generated from the C++ description using
Xilinx Vivado HLS and synthesized with Xilinx Vivado. Our architecture achieves
100 Gb/s data rate in a Xilinx Virtex-7 FPGA while reducing the latency by 45%
and the LUT usage by 40% compared to the state-of-the-art.
Comment: Accepted for publication at the 26th ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays, February 25 - 27, 2018, Monterey
Marriott Hotel, Monterey, California. 7 pages, 7 figures, 1 table
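As a rough illustration of how a pipeline layout could be derived from a parser graph, the sketch below places each header in the stage after its deepest predecessor. The abstract does not describe the actual graph transformation rounds, so the function name `assign_stages`, the longest-path rule, and the example headers are assumptions made for illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def assign_stages(graph):
    """Map each header to a pipeline stage (hypothetical levelization rule).

    `graph` maps a header name to the list of headers that may follow it.
    A header is placed one stage after its deepest predecessor, so every
    header is parsed only after all headers that can precede it.
    """
    # Invert the successor lists into predecessor sets.
    preds = {node: set() for node in graph}
    for node, successors in graph.items():
        for succ in successors:
            preds.setdefault(succ, set()).add(node)

    stage = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        stage[node] = 1 + max((stage[p] for p in preds[node]), default=-1)
    return stage


# Illustrative parser graph (assumed, not taken from the paper).
parser_graph = {
    "ethernet": ["vlan", "ipv4", "ipv6"],
    "vlan": ["ipv4", "ipv6"],
    "ipv4": ["tcp", "udp"],
    "ipv6": ["tcp", "udp"],
    "tcp": [],
    "udp": [],
}
stages = assign_stages(parser_graph)
```

In this example `ipv4` lands in stage 2 rather than stage 1 because it can follow `vlan`, which shows why the stage must follow the longest path and not the shortest one.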
PoET-BiN: Power Efficient Tiny Binary Neurons
The success of neural networks in image classification has inspired various
hardware implementations on embedded platforms such as Field Programmable Gate
Arrays, embedded processors and Graphical Processing Units. These embedded
platforms are constrained in terms of power, which is mainly consumed by the
Multiply Accumulate operations and the memory accesses for weight fetching.
Quantization and pruning have been proposed to address this issue. Though
effective, these techniques do not take into account the underlying
architecture of the embedded hardware. In this work, we propose PoET-BiN, a
Look-Up Table based power efficient implementation on resource constrained
embedded devices. A modified Decision Tree approach forms the backbone of the
proposed implementation in the binary domain. A LUT access consumes far less
power than the equivalent Multiply Accumulate operation it replaces, and the
modified Decision Tree algorithm eliminates the need for memory accesses. We
applied the PoET-BiN architecture to implement the classification layers of
networks trained on MNIST, SVHN and CIFAR-10 datasets, with near
state-of-the-art results. The energy reduction for the classifier portion
reaches up to six orders of magnitude compared to floating-point
implementations and up to three orders of magnitude when compared to recent
binary quantized neural networks.
Comment: Accepted in MLSys 2020 conference
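To make the LUT-for-MAC substitution concrete, the sketch below tabulates a small binary neuron once and then evaluates it with a single table lookup. It is not the PoET-BiN training procedure (the modified Decision Tree algorithm is not reproduced here); the ±1 input encoding, the zero threshold, the six-input width and all function names are assumptions chosen for illustration.

```python
from itertools import product


def neuron_mac(bits, weights, bias):
    """Reference binary neuron: multiply-accumulate, then threshold at zero.

    Input bits 0/1 are interpreted as -1/+1 (an assumed encoding).
    """
    acc = bias + sum(w * (1 if b else -1) for w, b in zip(weights, bits))
    return 1 if acc >= 0 else 0


def bits_to_index(bits):
    """Pack a tuple of bits into a LUT address, bit 0 being the LSB."""
    return sum(b << i for i, b in enumerate(bits))


def build_lut(weights, bias):
    """Precompute the neuron output for every possible input pattern."""
    k = len(weights)
    lut = [0] * (1 << k)
    for bits in product((0, 1), repeat=k):
        lut[bits_to_index(bits)] = neuron_mac(bits, weights, bias)
    return lut


def neuron_lut(bits, lut):
    """Evaluate the neuron with one table access and no arithmetic."""
    return lut[bits_to_index(bits)]


# Six inputs, so the table has 2**6 = 64 one-bit entries.
weights = [1, -1, 1, 1, -1, 1]
lut = build_lut(weights, bias=0)
```

The table grows as 2^k in the number of inputs k, so this replacement only pays off when each neuron sees a small number of binary inputs.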
CARLA: A Convolution Accelerator with a Reconfigurable and Low-Energy Architecture
Convolutional Neural Networks (CNNs) have proven to be extremely accurate for
image recognition, even outperforming human recognition capability. When
deployed on battery-powered mobile devices, efficient computer architectures
are required to enable fast and energy-efficient computation of costly
convolution operations. Despite recent advances in hardware accelerator design
for CNNs, two major problems have not yet been addressed effectively,
particularly when the convolution layers have highly diverse structures: (1)
minimizing energy-hungry off-chip DRAM data movements; (2) maximizing the
utilization factor of processing resources to perform convolutions. This work
thus proposes an energy-efficient architecture equipped with several optimized
dataflows to support the structural diversity of modern CNNs. The proposed
approach is evaluated by implementing convolutional layers of VGGNet-16 and
ResNet-50. Results show that the architecture achieves a Processing Element
(PE) utilization factor of 98% for the majority of 3x3 and 1x1 convolutional
layers, while limiting latency to 396.9 ms and 92.7 ms when performing
convolutional layers of VGGNet-16 and ResNet-50, respectively. In addition, the
proposed architecture benefits from the structured sparsity in ResNet-50 to
reduce the latency to 42.5 ms when half of the channels are pruned.
Comment: 12 pages
Module-per-Object: a Human-Driven Methodology for C++-based High-Level Synthesis Design
High-Level Synthesis (HLS) brings FPGAs to audiences previously unfamiliar with
hardware design. However, achieving the highest Quality-of-Results (QoR) with
HLS is still unattainable for most programmers, because it requires detailed
knowledge of FPGA architecture and hardware design in order to produce
FPGA-friendly code. Moreover, such code normally conflicts with best
coding practices, which favor code reuse, modularity, and conciseness.
To overcome these limitations, we propose Module-per-Object (MpO), a
human-driven HLS design methodology intended for both hardware designers and
software developers with limited FPGA expertise. MpO exploits modern C++ to
raise the abstraction level while improving QoR, code readability and
modularity. To guide HLS designers, we present the five characteristics of MpO
classes. Each characteristic exploits the power of HLS-supported modern C++
features to build C++-based hardware modules. These characteristics lead to
high-quality software descriptions and efficient hardware generation. We also
present a use case of MpO, where we use C++ as the intermediate language for
FPGA-targeted code generation from P4, a packet processing domain specific
language. The MpO methodology is evaluated using three design experiments: a
packet parser, a flow-based traffic manager, and a digital up-converter. Based
on experiments, we show that MpO can be comparable to hand-written VHDL code
while keeping a high abstraction level, human-readable coding style and
modularity. Compared to traditional C-based HLS design, MpO leads to more
efficient circuit generation, both in terms of performance and resource
utilization. Also, the MpO approach notably improves software quality,
augmenting parametrization while eliminating the incidence of code duplication.
Comment: 9 pages. Paper accepted for publication at The 27th IEEE
International Symposium on Field-Programmable Custom Computing Machines, San
Diego CA, April 28 - May 1, 201
Bridging the Gap: FPGAs as Programmable Switches
The emergence of P4, a domain specific language, coupled with PISA, a domain
specific architecture, is revolutionizing the networking field. P4 makes it
possible to describe how packets are processed by a programmable data plane,
spanning ASICs and CPUs, implementing PISA. Because the processing flexibility
can be limited on ASICs, while CPU performance for networking tasks lags behind,
recent works have proposed to implement PISA on FPGAs. However, little effort
has been dedicated to analyzing whether FPGAs are good candidates to implement
PISA. In
this work, we take a step back and evaluate the micro-architecture efficiency
of various PISA blocks. We demonstrate, supported by a theoretical and
experimental analysis, that the performance of a few PISA blocks is severely
limited by the current FPGA architectures. Specifically, we show that match
tables and programmable packet schedulers represent the main performance
bottlenecks for FPGA-based programmable switches. Thus, we explore two avenues
to alleviate these shortcomings. First, we identify network applications well
tailored to current FPGAs. Second, to support a wider range of networking
applications, we propose modifications to the FPGA architectures which can also
be of interest outside the networking field.
Comment: To be published in: IEEE International Conference on High
Performance Switching and Routing 202
Models for the Brane-Bulk Interaction: Toward Understanding Braneworld Cosmological Perturbation
Using some simple toy models, we explore the nature of the brane-bulk
interaction for cosmological models with a large extra dimension. We are in
particular interested in understanding the role of the bulk gravitons, which
from the point of view of an observer on the brane will appear to generate
dissipation and nonlocality, effects which cannot be incorporated into an
effective (3+1)-dimensional Lagrangian field theoretic description. We
explicitly work out the dynamics of several discrete systems consisting of a
finite number of degrees of freedom on the boundary coupled to a
(1+1)-dimensional field theory subject to a variety of wave equations. Systems
both with and without time translation invariance are considered and moving
boundaries are discussed as well. The models considered contain all the
qualitative features of quantized linearized cosmological perturbations for a
Randall-Sundrum universe having an arbitrary expansion history, with the sole
exception of gravitational gauge invariance, which will be treated in a later
paper.
Comment: 47 pages, RevTeX (or LaTeX, etc.) with 5 eps figures
Node configuration for the Aho-Corasick algorithm in intrusion detection systems
In this paper, we analyze the trade-off between performance and cost that arises from choosing between two node representations when implementing the Aho-Corasick algorithm. This algorithm can be used for pattern matching in network-based intrusion detection systems such as Snort. Our analysis uses the Snort 2.9.7 rule set, which contains almost 26k patterns. Our methodology consists of code profiling and analysis, followed by the selection of a parameter to maximize a metric that combines clock cycle count and memory usage. The parameter determines which of the two node types is selected for each trie node. We show that it is possible to select the parameter to optimize the metric, which results in an improvement of up to 12× compared with the single node-type case.
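The idea of choosing a node type per trie node can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not say what the two node types or the parameter are, so the dense 256-entry table, the sparse dictionary, the `dense_threshold` child-count rule and all function names are assumptions.

```python
from collections import deque


def build_automaton(patterns, dense_threshold):
    """Build an Aho-Corasick automaton over byte strings.

    Nodes with at least `dense_threshold` children are stored as a dense
    256-entry list (fast lookup, more memory); all other nodes keep a
    sparse dict (less memory, slower lookup).
    """
    children, fail, out = [{}], [0], [[]]
    for pat in patterns:
        node = 0
        for b in pat:
            if b not in children[node]:
                children.append({})
                fail.append(0)
                out.append([])
                children[node][b] = len(children) - 1
            node = children[node][b]
        out[node].append(pat)

    # Breadth-first pass to compute failure links and merged outputs.
    queue = deque(children[0].values())
    while queue:
        node = queue.popleft()
        for b, child in children[node].items():
            f = fail[node]
            while f and b not in children[f]:
                f = fail[f]
            fail[child] = children[f].get(b, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)

    # Pick the representation of each node from its child count.
    table = []
    for ch in children:
        if len(ch) >= dense_threshold:
            dense = [-1] * 256
            for b, c in ch.items():
                dense[b] = c
            table.append(dense)
        else:
            table.append(ch)
    return table, fail, out


def step(table, node, b):
    """Return the child of `node` on byte `b`, or None if there is none."""
    entry = table[node]
    if isinstance(entry, list):
        return entry[b] if entry[b] >= 0 else None
    return entry.get(b)


def search(automaton, text):
    """Return (start_offset, pattern) for every match in `text`."""
    table, fail, out = automaton
    node, hits = 0, []
    for i, b in enumerate(text):
        while node and step(table, node, b) is None:
            node = fail[node]
        nxt = step(table, node, b)
        node = nxt if nxt is not None else 0
        hits.extend((i - len(pat) + 1, pat) for pat in out[node])
    return hits


patterns = [b"he", b"she", b"his", b"hers"]
hits = search(build_automaton(patterns, dense_threshold=2), b"ushers")
```

Both representations give the same matches; only lookup cost and memory footprint change. Sweeping `dense_threshold` while measuring time and memory is one way to explore the kind of trade-off the abstract describes.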
Alzheimer’s Prevention Initiative Generation Program: Development of an APOE genetic counseling and disclosure process in the context of clinical trials
Introduction: As the number of Alzheimer’s disease (AD) prevention studies grows, many individuals will need to learn their genetic and/or biomarker risk for the disease to determine trial eligibility. An alternative to traditional models of genetic counseling and disclosure is needed to provide comprehensive standardized counseling and disclosure of apolipoprotein E (APOE) results efficiently, safely, and effectively in the context of AD prevention trials.
Methods: A multidisciplinary Genetic Testing, Counseling, and Disclosure Committee was established and charged with operationalizing the Alzheimer’s Prevention Initiative (API) Genetic Counseling and Disclosure Process for use in the API Generation Program trials. The objective was to provide consistent information to research participants before and during the APOE counseling and disclosure session using standardized educational and session materials.
Results: The Genetic Testing, Counseling, and Disclosure Committee created a process consisting of eight components: requirements of APOE testing and reports, psychological readiness assessment, determination of AD risk estimates, guidance for identifying providers of disclosure, predisclosure education, APOE counseling and disclosure session materials, APOE counseling and disclosure session flow, and assessing APOE disclosure impact.
Discussion: The API Genetic Counseling and Disclosure Process provides a framework for large-scale disclosure of APOE genotype results to study participants and serves as a model for disclosure of biomarker results. The process provides education to participants about the meaning and implication(s) of their APOE results while also incorporating a comprehensive assessment of disclosure impact. Data assessing participant safety and psychological well-being before and after APOE disclosure are still being collected and will be presented in a future publication.
Highlights:
• Participants may need to learn their risk for Alzheimer’s disease to enroll in studies.
• Alternatives to traditional models of apolipoprotein E counseling and disclosure are needed.
• An alternative process was developed by the Alzheimer’s Prevention Initiative.
• This process has been implemented by the Alzheimer’s Prevention Initiative Generation Program.
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/153071/1/trc2jtrci201909013.pd