
    Energy Performance of Floating-Point Matrix Multiplication on FPGAs

    Floating-point matrix multiplication is a basic kernel in scientific computing. It has been shown that implementations of this kernel on FPGAs can achieve high sustained performance [1]. However, to the best of our knowledge, existing work on FPGA-based floating-point matrix multiplication considers the optimization of latency or area only. In this paper, we analyze the impact of various parameters on the energy dissipation of two floating-point matrix multiplication algorithms developed by us. Due to space limitations, the algorithms are not presented here; details of the algorithms (Algorithm 1 and Algorithm 2) can be found in [1]. We identify several parameters that affect the energy dissipation of the algorithms. These include the number of pipeline stages within the floating-point units, the block size for block matrix multiplication, and the number of configured processing elements (PEs).
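
    The abstract refers to the block size as a tunable parameter of block (tiled) matrix multiplication. As a rough illustration of what that parameter controls, the following is a minimal software sketch of blocked floating-point matrix multiplication; it is not the paper's Algorithm 1 or 2 (those appear in [1]), and the names N, BLOCK_SIZE, and block_matmul, as well as the chosen values, are illustrative assumptions.

    /* Minimal sketch (not from the paper): blocked floating-point matrix
     * multiplication. BLOCK_SIZE plays the role of the block size
     * parameter discussed in the abstract; N and all values are
     * illustrative assumptions. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 256          /* matrix dimension (assumed for illustration) */
    #define BLOCK_SIZE 32  /* tunable block size */

    static void block_matmul(const float *a, const float *b, float *c)
    {
        /* Visit C in BLOCK_SIZE x BLOCK_SIZE tiles and accumulate the
         * contribution of each corresponding tile of A and B. */
        for (int ii = 0; ii < N; ii += BLOCK_SIZE)
            for (int jj = 0; jj < N; jj += BLOCK_SIZE)
                for (int kk = 0; kk < N; kk += BLOCK_SIZE)
                    for (int i = ii; i < ii + BLOCK_SIZE; i++)
                        for (int j = jj; j < jj + BLOCK_SIZE; j++) {
                            float sum = c[i * N + j];
                            for (int k = kk; k < kk + BLOCK_SIZE; k++)
                                sum += a[i * N + k] * b[k * N + j];
                            c[i * N + j] = sum;
                        }
    }

    int main(void)
    {
        float *a = calloc(N * N, sizeof(float));
        float *b = calloc(N * N, sizeof(float));
        float *c = calloc(N * N, sizeof(float));
        if (!a || !b || !c)
            return 1;

        /* Simple test values: every entry of C should equal N * 1 * 2. */
        for (int i = 0; i < N * N; i++) {
            a[i] = 1.0f;
            b[i] = 2.0f;
        }

        block_matmul(a, b, c);
        printf("c[0][0] = %f\n", c[0]);  /* expect 512.0 for N = 256 */

        free(a); free(b); free(c);
        return 0;
    }

    On an FPGA, the block size would instead govern on-chip storage per PE and the amount of data reuse between off-chip transfers, which is why it appears here as a parameter affecting energy dissipation.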