1,456 research outputs found
FPGA Implementation of Convolutional Neural Networks with Fixed-Point Calculations
Neural network-based methods for image processing are becoming widely used in
practical applications. Modern neural networks are computationally expensive
and require specialized hardware, such as graphics processing units. Since such
hardware is not always available in real life applications, there is a
compelling need for the design of neural networks for mobile devices. Mobile
neural networks typically have reduced number of parameters and require a
relatively small number of arithmetic operations. However, they usually still
are executed at the software level and use floating-point calculations. The use
of mobile networks without further optimization may not provide sufficient
performance when high processing speed is required, for example, in real-time
video processing (30 frames per second). In this study, we suggest
optimizations to speed up computations in order to efficiently use already
trained neural networks on a mobile device. Specifically, we propose an
approach for speeding up neural networks by moving computation from software to
hardware and by using fixed-point calculations instead of floating-point. We
propose a number of methods for neural network architecture design to improve
the performance with fixed-point calculations. We also show an example of how
existing datasets can be modified and adapted for the recognition task in hand.
Finally, we present the design and the implementation of a floating-point gate
array-based device to solve the practical problem of real-time handwritten
digit classification from mobile camera video feed
Design of Novel Algorithm and Architecture for Gaussian Based Color Image Enhancement System for Real Time Applications
This paper presents the development of a new algorithm for Gaussian based
color image enhancement system. The algorithm has been designed into
architecture suitable for FPGA/ASIC implementation. The color image enhancement
is achieved by first convolving an original image with a Gaussian kernel since
Gaussian distribution is a point spread function which smoothen the image.
Further, logarithm-domain processing and gain/offset corrections are employed
in order to enhance and translate pixels into the display range of 0 to 255.
The proposed algorithm not only provides better dynamic range compression and
color rendition effect but also achieves color constancy in an image. The
design exploits high degrees of pipelining and parallel processing to achieve
real time performance. The design has been realized by RTL compliant Verilog
coding and fits into a single FPGA with a gate count utilization of 321,804.
The proposed method is implemented using Xilinx Virtex-II Pro XC2VP40-7FF1148
FPGA device and is capable of processing high resolution color motion pictures
of sizes of up to 1600x1200 pixels at the real time video rate of 116 frames
per second. This shows that the proposed design would work for not only still
images but also for high resolution video sequences.Comment: 15 pages, 15 figure
A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization
A new generation of radio telescopes is achieving unprecedented levels of
sensitivity and resolution, as well as increased agility and field-of-view, by
employing high-performance digital signal processing hardware to phase and
correlate large numbers of antennas. The computational demands of these imaging
systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the
number of independent beams, and N is the number of antennas. The
specifications of many new arrays lead to demands in excess of tens of PetaOps
per second.
To meet this challenge, we have developed a general purpose correlator
architecture using standard 10-Gbit Ethernet switches to pass data between
flexible hardware modules containing Field Programmable Gate Array (FPGA)
chips. These chips are programmed using open-source signal processing libraries
we have developed to be flexible, scalable, and chip-independent. This work
reduces the time and cost of implementing a wide range of signal processing
systems, with correlators foremost among them,and facilitates upgrading to new
generations of processing technology. We present several correlator
deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes
parameter application deployed on the Precision Array for Probing the Epoch of
Reionization.Comment: Accepted to Publications of the Astronomy Society of the Pacific. 31
pages. v2: corrected typo, v3: corrected Fig. 1
On the AER Convolution Processors for FPGA
Image convolution operations in digital computer
systems are usually very expensive operations in terms of
resource consumption (processor resources and processing time)
for an efficient Real-Time application. In these scenarios the
visual information is divided into frames and each one has to be
completely processed before the next frame arrives in order to
warranty the real-time. A spike-based philosophy for computing
convolutions based on the neuro-inspired Address-Event-
Representation (AER) is achieving high performances. In this
paper we present two FPGA implementations of AER-based
convolution processors for relatively small Xilinx FPGAs
(Spartan-II 200 and Spartan-3 400), which process 64x64 images
with 11x11 convolution kernels. The maximum equivalent
operation rate that can be reached is 163.51 MOPS for 11x11
kernels, in a Xilinx Spartan 3 400 FPGA with a 50MHz clock.
Formulations, hardware architecture, operation examples and
performance comparison with frame-based convolution
processors are presented and discussed.Ministerio de Ciencia e Innovación TEC2006-11730-C03-02Ministerio de Ciencia e Innovación TEC2009-10639-C04-02Junta de Andalucía P06-TIC-0141
- …