530 research outputs found
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
Efficient hardware implementations of low bit depth motion estimation algorithms
In this paper, we present efficient hardware implementation of multiplication free one-bit transform (MF1BT) based and constraint one-bit transform (C-1BT) based motion estimation (ME) algorithms, in order to provide low bit-depth representation based full search block ME hardware for real-time video encoding. We used a source pixel based linear array (SPBLA) hardware architecture for low bit depth ME for the first time in the literature. The proposed SPBLA based implementation results in a genuine data flow scheme which significantly reduces the number of data reads from the current block memory, which in turn reduces the power consumption by at least 50% compared to conventional 1BT based ME hardware architecture presented in the literature. Because of the binary nature of low bit-depth ME algorithms, their hardware architectures are more efficient than existing 8 bits/pixel representation based ME architectures
Module-per-Object: a Human-Driven Methodology for C++-based High-Level Synthesis Design
High-Level Synthesis (HLS) brings FPGAs to audiences previously unfamiliar to
hardware design. However, achieving the highest Quality-of-Results (QoR) with
HLS is still unattainable for most programmers. This requires detailed
knowledge of FPGA architecture and hardware design in order to produce
FPGA-friendly codes. Moreover, these codes are normally in conflict with best
coding practices, which favor code reuse, modularity, and conciseness.
To overcome these limitations, we propose Module-per-Object (MpO), a
human-driven HLS design methodology intended for both hardware designers and
software developers with limited FPGA expertise. MpO exploits modern C++ to
raise the abstraction level while improving QoR, code readability and
modularity. To guide HLS designers, we present the five characteristics of MpO
classes. Each characteristic exploits the power of HLS-supported modern C++
features to build C++-based hardware modules. These characteristics lead to
high-quality software descriptions and efficient hardware generation. We also
present a use case of MpO, where we use C++ as the intermediate language for
FPGA-targeted code generation from P4, a packet processing domain specific
language. The MpO methodology is evaluated using three design experiments: a
packet parser, a flow-based traffic manager, and a digital up-converter. Based
on experiments, we show that MpO can be comparable to hand-written VHDL code
while keeping a high abstraction level, human-readable coding style and
modularity. Compared to traditional C-based HLS design, MpO leads to more
efficient circuit generation, both in terms of performance and resource
utilization. Also, the MpO approach notably improves software quality,
augmenting parametrization while eliminating the incidence of code duplication.Comment: 9 pages. Paper accepted for publication at The 27th IEEE
International Symposium on Field-Programmable Custom Computing Machines, San
Diego CA, April 28 - May 1, 201
- …