3,187 research outputs found
A general framework for efficient FPGA implementation of matrix product
Original article can be found at: http://www.medjcn.com/ Copyright Softmotor LimitedHigh performance systems are required by the developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Filed-Programmable Gate Arrays (FPGAs) have been proposed as viable system building blocks in the construction of high performance systems at an economical price. Given the importance and the use of matrix algorithms in scientific computing applications, they seem ideal candidates to harness and exploit the advantages offered by FPGAs. In this paper, a system for matrix algorithm cores generation is described. The system provides a catalog of efficient user-customizable cores, designed for FPGA implementation, ranging in three different matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. The generated core can be either a general purpose or a specific application core. The methodology used in the design and implementation of two specific image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles while the second one is a parallel floating-point matrix multiplier designed for 3D affine transformations.Peer reviewe
FPGA-Based Bandwidth Selection for Kernel Density Estimation Using High Level Synthesis Approach
FPGA technology can offer significantly hi\-gher performance at much lower
power consumption than is available from CPUs and GPUs in many computational
problems. Unfortunately, programming for FPGA (using ha\-rdware description
languages, HDL) is a difficult and not-trivial task and is not intuitive for
C/C++/Java programmers. To bring the gap between programming effectiveness and
difficulty the High Level Synthesis (HLS) approach is promoting by main FPGA
vendors. Nowadays, time-intensive calculations are mainly performed on GPU/CPU
architectures, but can also be successfully performed using HLS approach. In
the paper we implement a bandwidth selection algorithm for kernel density
estimation (KDE) using HLS and show techniques which were used to optimize the
final FPGA implementation. We are also going to show that FPGA speedups,
comparing to highly optimized CPU and GPU implementations, are quite
substantial. Moreover, power consumption for FPGA devices is usually much less
than typical power consumption of the present CPUs and GPUs.Comment: 23 pages, 6 figures, extended version of initial pape
OPTIMAL AREA AND PERFORMANCE MAPPING OF K-LUT BASED FPGAS
FPGA circuits are increasingly used in many fields: for rapid prototyping of new products (including fast ASIC implementation), for logic emulation, for producing a small number of a device, or if a device should be reconfigurable in use (reconfigurable computing). Determining if an arbitrary, given wide, function can be implemented by a programmable logic block, unfortunately, it is generally, a very difficult problem. This problem is called the Boolean matching problem. This paper introduces a new implemented algorithm able to map, both for area and performance, combinational networks using k-LUT based FPGAs.k-LUT based FPGAs, combinational circuits, performance-driven mapping.
Evaluation of Single-Chip, Real-Time Tomographic Data Processing on FPGA - SoC Devices
A novel approach to tomographic data processing has been developed and
evaluated using the Jagiellonian PET (J-PET) scanner as an example. We propose
a system in which there is no need for powerful, local to the scanner
processing facility, capable to reconstruct images on the fly. Instead we
introduce a Field Programmable Gate Array (FPGA) System-on-Chip (SoC) platform
connected directly to data streams coming from the scanner, which can perform
event building, filtering, coincidence search and Region-Of-Response (ROR)
reconstruction by the programmable logic and visualization by the integrated
processors. The platform significantly reduces data volume converting raw data
to a list-mode representation, while generating visualization on the fly.Comment: IEEE Transactions on Medical Imaging, 17 May 201
Recommended from our members
On evolution of relatively large combinational logic circuits
Evolvable hardware (EHW) (Yao and Higuchi, 1999) is a technique introduced to automatically design circuits where the circuit configuration is carried out by evolutionary algorithms. One of the main difficulties in using EHW to solve real-world problems is the scalability. Until now, several strategies have been proposed to avoid this problem, but none of them completely tackle the issue. In this paper three different methods for evolving the most complex circuits have been tested for their scalability. These methods are bi-directional incremental evolution (SO-BIE); generalised disjunction decomposition (GD-BIE) and evolutionary strategies (ES) with dynamic mutation rate. In order to achieve the generalised conclusions the chosen approaches were tested using multipliers, traditionally used in EHW, but also logic circuits taken from MCNC (Yang, 1991) benchmark library and randomly generated circuits. The analysis of the approaches demonstrated that PLA-based ES is capable of evolving logic circuits of up to 12 inputs. The use of SO-BIE allows the generation of fully functional circuits of 14 inputs and GD-BIE is estimated to be able to evolve circuits of 21 inputs
Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code
Today, large scale parallel simulations are fundamental tools to handle complex problems. The number of processors in current computation platforms has been recently increased and therefore it is necessary to optimize the application performance and to enhance the scalability of massively-parallel systems. In addition, new heterogeneous architectures, combining conventional processors with specific hardware, like FPGAs, to accelerate the most time consuming functions are considered as a strong alternative to boost the performance.
In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code efficiency is addressed through three key activities: Optimization, parallelization and hardware acceleration. At first, a profiling analysis of the most time-consuming processes of the Reynolds Averaged Navier Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, a study of the code scalability with new partitioning algorithms are tested to show the most suitable partitioning algorithms for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented
Generalized disjunction decomposition for evolvable hardware
Evolvable hardware (EHW) refers to self-reconfiguration hardware design, where the configuration is under the control of an evolutionary algorithm (EA). One of the main difficulties in using EHW to solve real-world problems is scalability, which limits the size of the circuit that may be evolved. This paper outlines a new type of decomposition strategy for EHW, the “generalized disjunction decomposition” (GDD), which allows the evolution of large circuits. The proposed method has been extensively tested, not only with multipliers and parity bit problems traditionally used in the EHW community, but also with logic circuits taken from the Microelectronics Center of North Carolina (MCNC) benchmark library and randomly generated circuits. In order to achieve statistically relevant results, each analyzed logic circuit has been evolved 100 times, and the average of these results is presented and compared with other EHW techniques. This approach is necessary because of the probabilistic nature of EA; the same logic circuit may not be solved in the same way if tested several times. The proposed method has been examined in an extrinsic EHW system using theevolution strategy. The results obtained demonstrate that GDD significantly improves the evolution of logic circuits in terms of the number of generations, reduces computational time as it is able to reduce the required time for a single iteration of the EA, and enables the evolution of larger circuits never before evolved. In addition to the proposed method, a short overview of EHW systems together with the most recent applications in electrical circuit design is provided
- …