2 research outputs found

    Pipelining Of Double Precision Floating Point Division And Square Root Operations On Field-programmable Gate Arrays

    Get PDF
    Many space applications, such as vision-based systems, synthetic aperture radar, and radar altimetry rely increasingly on high data rate DSP algorithms. These algorithms use double precision floating point arithmetic operations. While most DSP applications can be executed on DSP processors, the DSP numerical requirements of these new space applications surpass by far the numerical capabilities of many current DSP processors. Since the tradition in DSP processing has been to use fixed point number representation, only recently have DSP processors begun to incorporate floating point arithmetic units, even though most of these units handle only single precision floating point addition/subtraction, multiplication, and occasionally division. While DSP processors are slowly evolving to meet the numerical requirements of newer space applications, FPGA densities have rapidly increased to parallel and surpass even the gate densities of many DSP processors and commodity CPUs. This makes them attractive platforms to implement compute-intensive DSP computations. Even in the presence of this clear advantage on the side of FPGAs, few attempts have been made to examine how wide precision floating point arithmetic, particularly division and square root operations, can perform on FPGAs to support these compute-intensive DSP applications. In this context, this thesis presents the sequential and pipelined designs of IEEE-754 compliant double floating point division and square root operations based on low radix digit recurrence algorithms. FPGA implementations of these algorithms have the advantage of being easily testable. In particular, the pipelined designs are synthesized based on careful partial and full unrolling of the iterations in the digit recurrence algorithms. In the overall, the implementations of the sequential and pipelined designs are common-denominator implementations which do not use any performance-enhancing embedded components such as multipliers and block memory. As these implementations exploit exclusively the fine-grain reconfigurable resources of Virtex FPGAs, they are easily portable to other FPGAs with similar reconfigurable fabrics without any major modifications. The pipelined designs of these two operations are evaluated in terms of area, throughput, and dynamic power consumption as a function of pipeline depth. Pipelining experiments reveal that the area overhead tends to remain constant regardless of the degree of pipelining to which the design is submitted, while the throughput increases with pipeline depth. In addition, these experiments reveal that pipelining reduces power considerably in shallow pipelines. Pipelining further these designs does not necessarily lead to significant power reduction. By partitioning these designs into deeper pipelines, these designs can reach throughputs close to the 100 MFLOPS mark by consuming a modest 1% to 8% of the reconfigurable fabric within a Virtex-II XC2VX000 (e.g., XC2V1000 or XC2V6000) FPGA

    Pipelining Of Double Precision Floating Point Division And Square Root Operations

    No full text
    Space applications rely increasingly on high data rate DSP algorithms. These algorithms use double precision floating point arithmetic operations. While most DSP applications can be compiled on DSP processors, high data rate DSP computations require novel implementation technologies to support their high throughputs. Only recently, gate densities in FPGAs have reached a level which makes them attractive platforms to implement compute-intensive DSP applications. In this context, this paper presents the sequential and pipelined designs of a double precision floating point divider and square root unit on FPGAs. Contrary to pipelined parallel implementations, the pipelining of these units is based on unrolling the iterations in low-radix digit recurrence algorithms. These units are mapped on generic FPGA reconfigurable fabric without taking advantage of any advanced architectural components available in high capacity FPGAs. The implementations of these designs show that their performances are comparable to, and sometimes higher than, the performances of non-iterative designs based of high radix numbers. The iterative divider and square root unit occupy less than 1% of an XC2V6000 FPGA chip while their pipelined counterparts can produce throughputs that reach the 100 MFLOPS mark by consuming a modest 8% of the chip area
    corecore