A general framework for efficient FPGA implementation of matrix product
Original article can be found at: http://www.medjcn.com/ Copyright Softmotor Limited. High-performance systems are required by developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Field-Programmable Gate Arrays (FPGAs) have been proposed as viable building blocks for constructing high-performance systems at an economical price. Given the importance and widespread use of matrix algorithms in scientific computing applications, they seem ideal candidates to exploit the advantages offered by FPGAs. In this paper, a system for generating matrix algorithm cores is described. The system provides a catalog of efficient user-customizable cores, designed for FPGA implementation, in three matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. The generated core can be either a general-purpose core or an application-specific core. The methodology used in the design and implementation of two specific image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles, while the second is a parallel floating-point matrix multiplier designed for 3D affine transformations. Peer reviewed
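For readers unfamiliar with distributed arithmetic, the principle named for the first core can be sketched in software. This is only an illustration under stated assumptions, not the paper's design: the names `build_lut` and `da_dot`, the unsigned 8-bit inputs, and the integer coefficients used in the example are all hypothetical. The idea is that a dot product with fixed coefficients needs no multiplier: the sums of every coefficient subset are precomputed in a lookup table, and the inputs are then processed one bit-plane at a time with shifts and adds.

```python
def build_lut(coeffs):
    """LUT[a] = sum of coeffs[k] for every bit k that is set in address a."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
        for a in range(1 << n)
    ]


def da_dot(coeffs, xs, bits=8):
    """Dot product of fixed integer coeffs with unsigned `bits`-wide inputs.

    Uses only table lookups, shifts and additions (no multiplications),
    which is the core idea of distributed arithmetic.
    """
    lut = build_lut(coeffs)
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one LUT address.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        # Weight the partial sum by 2**b and accumulate.
        acc += lut[addr] << b
    return acc


# Hypothetical example: one row of a 3x3 colour conversion matrix
# (illustrative integer weights) applied to one RGB pixel.
row = [66, 129, 25]
pixel = [200, 100, 50]
result = da_dot(row, pixel)
```

A full 3x3 colour space conversion would apply `da_dot` once per matrix row; in hardware the bit-plane loop maps naturally onto a pipeline, which is presumably why the technique suits FPGAs.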
NIKEL: Electronics and data acquisition for kilopixels kinetic inductance camera
A prototype of digital frequency-multiplexing electronics allowing the
real-time monitoring of microwave kinetic inductance detector (MKID) arrays for
mm-wave astronomy has been developed. Thanks to the frequency multiplexing, it
can simultaneously monitor 400 pixels over a 500 MHz bandwidth and requires
only two coaxial cables for instrumenting such a large array. The chosen
solution and the performance achieved are presented in this paper. Comment: 21 pages, 14 figures
High Speed and Low Latency ECC Implementation over GF(2^m) on FPGA
In this paper, a novel high-speed elliptic curve cryptography (ECC) processor implementation for point multiplication (PM) on field-programmable gate array (FPGA) is proposed. A new segmented pipelined full-precision multiplier is used to reduce the latency, and the Lopez-Dahab Montgomery PM algorithm is modified for careful scheduling to avoid data dependency, resulting in a drastic reduction in the number of clock cycles (CCs) required. The proposed ECC architecture has been implemented on Xilinx FPGAs' Virtex4, Virtex5, and Virtex7 families. To the best of our knowledge, our single- and three-multiplier-based designs show the fastest performance to date when compared with reported works individually. Our one-multiplier-based ECC processor also achieves the highest reported speed together with the best reported area-time performance on Virtex4 (5.32 μs at 210 MHz), on Virtex5 (4.91 μs at 228 MHz), and on the more advanced Virtex7 (3.18 μs at 352 MHz). Finally, the proposed three-multiplier-based ECC implementation is the first work reporting the lowest number of CCs and the fastest ECC processor design on FPGA (450 CCs to get 2.83 μs on Virtex7).
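The latency figures quoted above are tied together by the relation latency = clock cycles / clock frequency. The short sketch below applies that relation to the reported numbers. The cycle counts for the one-multiplier design and the clock frequency of the three-multiplier design are derived here by arithmetic and are not stated in the abstract, so they should be read as implied estimates only; the function names are hypothetical.

```python
def cycles_from_latency(latency_us, freq_mhz):
    """Clock cycles implied by a latency in microseconds at a clock in MHz."""
    return latency_us * freq_mhz


def freq_from_cycles(cycles, latency_us):
    """Clock frequency in MHz implied by a cycle count and a latency in us."""
    return cycles / latency_us


# One-multiplier design: (latency in us, clock in MHz) as quoted in the abstract.
reported = {
    "Virtex4": (5.32, 210),
    "Virtex5": (4.91, 228),
    "Virtex7": (3.18, 352),
}

# Derived, not reported: implied cycle count per point multiplication.
implied_cycles = {
    device: round(cycles_from_latency(t_us, f_mhz))
    for device, (t_us, f_mhz) in reported.items()
}

# Three-multiplier design: 450 cycles in 2.83 us implies this clock (MHz).
implied_freq_mhz = freq_from_cycles(450, 2.83)
```

Under these assumptions the one-multiplier design comes out at roughly 1,120 cycles on all three devices, and the three-multiplier design at a clock of about 159 MHz. That would suggest its speed-up comes from the reduced cycle count rather than from a faster clock.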
PGPG: An Automatic Generator of Pipeline Design for Programmable GRAPE Systems
We have developed PGPG (Pipeline Generator for Programmable GRAPE),
software that generates the low-level design of the pipeline processor and
communication software for FPGA-based computing engines (FBCEs). An FBCE
typically consists of one or multiple FPGA (Field-Programmable Gate Array)
chips and local memory. Here, the term "Field-Programmable" means that one can
rewrite the logic implemented on the chip after the hardware is completed, and
therefore a single FBCE can be used to calculate various functions, for
example pipeline processors for gravity, SPH interaction, or image processing.
The main problem with FBCEs is that the user needs to develop the detailed
hardware design for the processor to be implemented on the FPGA chips. In addition,
she or he has to write the control logic for the processor, the communication and
data conversion library on the host processor, and the application program that
uses the developed processor. These require detailed knowledge of hardware
design, a hardware description language such as VHDL, the operating system, and
the application, and the amount of human work is huge. A relatively simple design
would require 1 person-year or more. The PGPG software generates all necessary
design descriptions, except for the application software itself, from a
high-level design description of the pipeline processor in the PGPG language.
The PGPG language is a simple language specialized for describing
pipeline processors. Thus, designing a pipeline processor in the PGPG language is
much easier than the traditional design process. For real applications such as the
pipeline for gravitational interaction, the pipeline processor generated by
PGPG achieved performance similar to that of hand-written code. In this
paper we present a detailed description of PGPG version 1.0. Comment: 24 pages, 6 figures, accepted PASJ 2005 July 2
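To give a sense of what a gravitational-interaction pipeline computes, here is a minimal software sketch of direct-summation gravity. It is not PGPG code and does not use the PGPG language. The function name `gravity_accel`, the softening length `eps`, and the unit choice G = 1 are assumptions made for illustration. A hardware pipeline of this kind would evaluate the body of the inner loop once per clock cycle for a stream of particles j.

```python
import math


def gravity_accel(pos, mass, eps=0.0, G=1.0):
    """Acceleration on each particle from all others by direct summation.

    pos  : list of [x, y, z] positions
    mass : list of particle masses
    eps  : softening length (assumed parameter; 0 gives pure Newtonian gravity)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Separation vector from particle i to particle j.
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3
    return acc


# Two particles on the x axis, separated by 2 length units.
example = gravity_accel([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [1.0, 4.0])
```

The inner loop is a fixed sequence of subtractions, multiplications, one square root and one division. Its regularity is what makes this interaction a natural fit for a hardware pipeline.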
A model-based design floating-point accumulator. Case of study: FPGA implementation of a support vector machine kernel function
Recent research in wearable sensors has led to the development of an advanced platform capable of embedding complex algorithms such as machine learning algorithms, which are usually resource-demanding. To address the need for high computational power, one solution is to design custom hardware platforms dedicated to the specific application by exploiting, for example, a Field Programmable Gate Array (FPGA). Recently, model-based techniques and automatic code generation have been introduced in FPGA design. In this paper, a new model-based floating-point accumulation circuit is presented. The architecture is based on the state-of-the-art delayed buffering algorithm. This circuit was conceived to compute the kernel function of a support vector machine. The implementation of the proposed model was carried out in Simulink, and simulation results showed that it had better performance in terms of speed and occupied area when compared to other solutions. To better evaluate its performance, a practical case of a polynomial kernel function was considered. Simulink and VHDL post-implementation timing simulations and measurements on FPGA confirmed the good results of the stand-alone accumulator.
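The role an accumulator plays in a polynomial kernel can be seen in a few lines of software. The sketch below assumes the common textbook form K(x, y) = (gamma * <x, y> + coef0)^degree; the abstract does not state which parameterisation the paper uses, so `gamma`, `coef0`, `degree` and the function name are assumptions. The plain running sum here stands in for the hardware accumulator and is not the delayed buffering algorithm.

```python
def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=2):
    """Polynomial kernel (gamma * <x, y> + coef0) ** degree.

    The dot product is built with an explicit running sum, which is the
    step a floating-point accumulator circuit would perform in hardware.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    acc = 0.0
    for xi, yi in zip(x, y):
        acc += xi * yi  # one multiply-accumulate per vector element
    return (gamma * acc + coef0) ** degree


k = polynomial_kernel([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

In software each addition completes before the next begins. A pipelined floating-point adder has multi-cycle latency, so accumulating a stream of products at full rate needs extra buffering logic, which is the problem an accumulator design like the one described has to solve.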