6,132 research outputs found
Fast and efficient FPGA implementation of connected operators
International audienceThe Connected Component Tree (CCT)-based operators play a central role in the development of new algorithms related to image processing applications such as pattern recognition, video-surveillance or motion extraction. The CCT construction, being a time consuming task (about 80% of the application time), these applications remain far-off mobile embedded systems. This paper presents its efficient FPGA implementation suited for embedded systems. Three main contributions are discussed: an efficient data structure proposal adapted to representing the CCT in embedded systems, a memory organization suitable for FPGA implementation by using on-chip memory and a customizable hardware accelerator architecture for CCT-based applications
A Many-Core Overlay for High-Performance Embedded Computing on FPGAs
In this work, we propose a configurable many-core overlay for
high-performance embedded computing. The size of internal memory, supported
operations and number of ports can be configured independently for each core of
the overlay. The overlay was evaluated with matrix multiplication, LU
decomposition and Fast-Fourier Transform (FFT) on a ZYNQ-7020 FPGA platform.
The results show that using a system-level many-core overlay avoids complex
hardware design and still provides good performance results.Comment: Presented at First International Workshop on FPGAs for Software
Programmers (FSP 2014) (arXiv:1408.4423
FPGA-Based Bandwidth Selection for Kernel Density Estimation Using High Level Synthesis Approach
FPGA technology can offer significantly hi\-gher performance at much lower
power consumption than is available from CPUs and GPUs in many computational
problems. Unfortunately, programming for FPGA (using ha\-rdware description
languages, HDL) is a difficult and not-trivial task and is not intuitive for
C/C++/Java programmers. To bring the gap between programming effectiveness and
difficulty the High Level Synthesis (HLS) approach is promoting by main FPGA
vendors. Nowadays, time-intensive calculations are mainly performed on GPU/CPU
architectures, but can also be successfully performed using HLS approach. In
the paper we implement a bandwidth selection algorithm for kernel density
estimation (KDE) using HLS and show techniques which were used to optimize the
final FPGA implementation. We are also going to show that FPGA speedups,
comparing to highly optimized CPU and GPU implementations, are quite
substantial. Moreover, power consumption for FPGA devices is usually much less
than typical power consumption of the present CPUs and GPUs.Comment: 23 pages, 6 figures, extended version of initial pape
A dynamically reconfigurable pattern matcher for regular expressions on FPGA
In this article we describe how to expand a partially dynamic reconfig- urable pattern matcher for regular expressions presented in previous work by Di- vyasree and Rajashekar [2]. The resulting, extended, pattern matcher is fully dynamically reconfigurable. First, the design is adapted for use with parameterisable configurations, a method for Dynamic Circuit Specialization. Using parameteris- able configurations allows us to achieve the same area gains as the hand crafted reconfigurable design, with the benefit that parameterisable configurations can be applied automatically. This results in a design that is more easily adaptable to spe- cific applications and allows for an easier design exploration. Additionally, the pa- rameterisable configuration implementation is also generated automatically, which greatly reduces the design overhead of using dynamic reconfiguration. Secondly, we propose a number of expansions to the original design to overcome several limitations in the original design that constrain the dynamic reconfigurability of the pattern matcher. We propose two different solutions to dynamically change the character that is matched in a certain block. The resulting pattern matcher, after these changes, is fully dynamically reconfigurable, all aspects of the implemented regular expression can be changed at run-time
- …