Single-Precision and Double-Precision Merged Floating-Point Multiplication and Addition Units on FPGA

Abstract

Floating-point (FP) operations defined in IEEE 754-2008 Standard for Floating-Point Arithmetic can provide wider dynamic range and higher precision than fixed-point operations. Many scientific computations and multimedia applications adopt FP operations. Among all the FP operations, addition and multiplication are the most frequent operations. In this thesis, the single-precision (SP) and double-precision (DP) merged FP multiplier and FP adder architectures are proposed. The proposed efficient iterative FP multiplier is designed based on the Karatsuba algorithm and implemented with the pipelined architecture. It can accomplish two parallel SP multiplication operations in one iteration with a latency of 6 clock cycles or one DP multiplication operation in two iterations with a latency of 9 clock cycles. Implemented on Xilinx Virtex-5 (xc5vlx155ff1760-3) FPGA device, the proposed multiplier runs at 348 MHz using 6 DSP48E blocks, 1117 LUTs, and 1370 FFs. Compared to previous FPGA based multiple-precision FP multiplier, the proposed designs runs at 4% faster clock frequency with reduction of 33% of DSP blocks, 17% latency for SP multiplication, and 28% latency for DP multiplication. The proposed high performance FP adder is designed based one the two-path FP addition algorithm. With fully pipelined architecture, the proposed adder can accomplish one DP or two parallel SP addition/subtraction operations in 6 clock cycles. The proposed adder architecture is implemented on both Altera and Xilinx 65nm process FPGA devices. The proposed adder can run up to 336 MHz with 1694 FFs, 1420 LUTs on Xilinx Virtex-5 (xc5vlx155ff1760-3) FPGA device. Compared to the combination of one DP and two SP architecture built with Xilinx FP operator, the proposed adder has 11.3% faster clock frequency. On Altera Stratix-III (EP3SL340F1760C2) FPGA device, the maximum clock frequency of the proposed adder can reach 358 MHz and 1686 ALUTs and 1556 registers are occupied. The proposed adder is 11.6% faster than the combination of one DP and two SP architecture built with Altera FP megafunction. For the reference of other researchers, the implementation results of the proposed FP multiplier and FP adder on the latest Xilinx Virtex-7 device and Altera Arria 10 device are also provided

    Similar works