Search CORE

5,064 research outputs found

BISMO: A Scalable Bit-Serial Matrix Multiplication Overlay for Reconfigurable Computing

Author: Rasnayake Lahiru
Sjalander Magnus
Umuroglu Yaman
Publication venue
Publication date: 01/01/2018
Field of study

Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lends itself well to high-performance implementations. Many matrix multiplication-dependent applications can use reduced-precision integer or fixed-point representations to increase their performance and energy efficiency while still offering adequate quality of results. However, precision requirements may vary between different application phases or depend on input data, rendering constant-precision solutions ineffective. We present BISMO, a vectorized bit-serial matrix multiplication overlay for reconfigurable computing. BISMO utilizes the excellent binary-operation performance of FPGAs to offer a matrix multiplication performance that scales with required precision and parallelism. We characterize the resource usage and performance of BISMO across a range of parameters to build a hardware cost model, and demonstrate a peak performance of 6.5 TOPS on the Xilinx PYNQ-Z1 board.Comment: To appear at FPL'1

arXiv.org e-Print Archive

Crossref

NORA - Norwegian Open Research Archives

A model for peak matrix performance on FPGAs

Author: Leong PHW
Lin CY
So HKH
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

Computations involving matrices form the kernel of a large spectrum of computationally demanding applications for which FPGAs have actively been utilized as accelerators. The performances of such matrix operations on FPGAs are related to underlying architectural parameters such as computational resources, memory and I/O bandwidth. A model that gives bounds on the peak performance of matrix-vector and matrix-matrix multiplication operations on FPGAs based on these parameters is presented. The architecture and efficiency of existing implementations are compared against the model. Future trends in matrix performance on FPGA devices are estimated based on the performance model and system parameters from the past decade. © 2011 IEEE.published_or_final_versio

HKU Scholars Hub

A Many-Core Overlay for High-Performance Embedded Computing on FPGAs

Author: Neto Horácio
Véstias Mário
Publication venue
Publication date: 21/08/2014
Field of study

In this work, we propose a configurable many-core overlay for high-performance embedded computing. The size of internal memory, supported operations and number of ports can be configured independently for each core of the overlay. The overlay was evaluated with matrix multiplication, LU decomposition and Fast-Fourier Transform (FFT) on a ZYNQ-7020 FPGA platform. The results show that using a system-level many-core overlay avoids complex hardware design and still provides good performance results.Comment: Presented at First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423

arXiv.org e-Print Archive

Repositório Científico do Instituto Politécnico de Lisboa