13,741 research outputs found
On the AER Convolution Processors for FPGA
Image convolution operations in digital computer
systems are usually very expensive operations in terms of
resource consumption (processor resources and processing time)
for an efficient Real-Time application. In these scenarios the
visual information is divided into frames and each one has to be
completely processed before the next frame arrives in order to
warranty the real-time. A spike-based philosophy for computing
convolutions based on the neuro-inspired Address-Event-
Representation (AER) is achieving high performances. In this
paper we present two FPGA implementations of AER-based
convolution processors for relatively small Xilinx FPGAs
(Spartan-II 200 and Spartan-3 400), which process 64x64 images
with 11x11 convolution kernels. The maximum equivalent
operation rate that can be reached is 163.51 MOPS for 11x11
kernels, in a Xilinx Spartan 3 400 FPGA with a 50MHz clock.
Formulations, hardware architecture, operation examples and
performance comparison with frame-based convolution
processors are presented and discussed.Ministerio de Ciencia e Innovación TEC2006-11730-C03-02Ministerio de Ciencia e Innovación TEC2009-10639-C04-02Junta de Andalucía P06-TIC-0141
FPGA Implementations Comparison of Neuro-cortical Inspired Convolution Processors for Spiking Systems
Image convolution operations in digital computer systems are usually
very expensive operations in terms of resource consumption (processor
resources and processing time) for an efficient Real-Time application. In these
scenarios the visual information is divided in frames and each one has to be
completely processed before the next frame arrives. Recently a new method for
computing convolutions based on the neuro-inspired philosophy of spiking
systems (Address-Event-Representation systems, AER) is achieving high
performances. In this paper we present two FPGA implementations of AERbased
convolution processors that are able to work with 64x64 images and
programmable kernels of up to 11x11 elements. The main difference is the use
of RAM for integrators in one solution and the absence of integrators in the
second solution that is based on mapping operations. The maximum equivalent
operation rate is 163.51 MOPS for 11x11 kernels, in a Xilinx Spartan 3 400
FPGA with a 50MHz clock. Formulations, hardware architecture, operation
examples and performance comparison with frame-based convolution
processors are presented and discussed.Ministerio de Ciencia e Innovación TEC2006-11730-C03-02Junta de Andalucía P06-TIC-0141
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
Towards Lattice Quantum Chromodynamics on FPGA devices
In this paper we describe a single-node, double precision Field Programmable
Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the
context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we
invert numerically the Dirac-Wilson operator on a 4-dimensional grid on three
Xilinx hardware solutions: Zynq Ultrascale+ evaluation board, the Alveo U250
accelerator and the largest device available on the market, the VU13P device.
In our implementation we separate software/hardware parts in such a way that
the entire multiplication by the Dirac operator is performed in hardware, and
the rest of the algorithm runs on the host. We find out that the FPGA
implementation can offer a performance comparable with that obtained using
current CPU or Intel's many core Xeon Phi accelerators. A possible multiple
node FPGA-based system is discussed and we argue that power-efficient High
Performance Computing (HPC) systems can be implemented using FPGA devices only.Comment: 17 pages, 4 figure
A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems
Recent technological advances have greatly improved the performance and
features of embedded systems. With the number of just mobile devices now
reaching nearly equal to the population of earth, embedded systems have truly
become ubiquitous. These trends, however, have also made the task of managing
their power consumption extremely challenging. In recent years, several
techniques have been proposed to address this issue. In this paper, we survey
the techniques for managing power consumption of embedded systems. We discuss
the need of power management and provide a classification of the techniques on
several important parameters to highlight their similarities and differences.
This paper is intended to help the researchers and application-developers in
gaining insights into the working of power management techniques and designing
even more efficient high-performance embedded systems of tomorrow
ParaFPGA 2013: Harnessing Programs, Power and Performance in Parallel FPGA applications
Future computing systems will require dedicated accelerators to achieve high-performance. The mini-symposium ParaFPGA explores parallel computing with FPGAs as an interesting avenue to reduce the gap between the architecture and the application. Topics discussed are the power of functional and dataflow languages, the performance of high-level synthesis tools, the automatic creation of hardware multi-cores using C-slow retiming, dynamic power management to control the energy consumption, real-time reconfiguration of streaming image processing filters and memory optimized event image segmentation
- …