76 research outputs found

    An Operand-Optimized Asynchronous IEEE 754 Double-Precision Floating-Point Adder


    IEEE Compliant Double-Precision FPU and 64-bit ALU with Variable Latency Integer Divider

    Together, the arithmetic logic unit (ALU) and floating-point unit (FPU) perform all of the mathematical and logic operations of computer processors. Because they are used so prominently, they fall in the critical path of the central processing unit - often becoming the bottleneck, or limiting factor, for performance. As such, the design of a high-speed ALU and FPU is vital to creating a processor capable of performing up to the demanding standards of today's computer users. In this paper, both a 64-bit ALU and a 64-bit FPU are designed based on the reduced instruction set computer architecture. The ALU performs the four basic mathematical operations - addition, subtraction, multiplication and division - in both unsigned and two's complement format, basic logic operations and shifting. The division algorithm is a novel approach, using a comparison-multiples based SRT divider to create a variable latency integer divider. The floating-point unit performs the double-precision floating-point operations add, subtract, multiply and divide, in accordance with the IEEE 754 standard for number representation and rounding. The ALU and FPU were implemented in VHDL, simulated in ModelSim, and constrained and synthesized using Synopsys Design Compiler (2006.06). They were synthesized using TSMC 0.13 μm CMOS technology. The timing, power and area synthesis results were recorded and, where applicable, compared to those of the corresponding DesignWare components. The ALU synthesis reported an area of 122,215 gates, a power of 384 mW, and a delay of 2.89 ns - a frequency of 346 MHz. The FPU synthesis reported an area of 84,440 gates, a delay of 2.82 ns and an operating frequency of 355 MHz, with a maximum dynamic power of 153.9 mW.
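
    An SRT divider retires one signed quotient digit per cycle, and a zero digit requires no add or subtract, which is the property a variable-latency design can exploit. The Python sketch below shows a generic radix-2 SRT recurrence only; the thesis's divider is a comparison-multiples based design operating on integers, so its radix and digit-selection details differ. Operand normalization to [1/2, 1) and the digit set {-1, 0, +1} are simplifying assumptions made here.

```python
from fractions import Fraction

def srt_divide(x, d, n_digits=16):
    """Generic radix-2 SRT division of x by d, both normalized to
    [1/2, 1) with x < d. Quotient digits in {-1, 0, +1} are chosen from
    a coarse comparison of the shifted partial remainder, so digit
    selection never needs a full-width exact compare."""
    assert Fraction(1, 2) <= x < d < 1
    p = x                        # partial remainder; invariant |p| <= d
    q = Fraction(0)              # accumulated signed-digit quotient
    w = Fraction(1, 2)           # weight of the current quotient digit
    for _ in range(n_digits):
        p = 2 * p                # shift left one bit position
        if p >= Fraction(1, 2):
            p -= d; q += w       # digit +1
        elif p < Fraction(-1, 2):
            p += d; q -= w       # digit -1
        # otherwise digit 0: no adder activity this cycle
        w /= 2
    return q

print(float(srt_divide(Fraction(1, 2), Fraction(3, 4))))  # ~0.6667 = (1/2)/(3/4)
```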

    Design and implementation of an out of order execution engine of floating point arithmetic operations

    In this thesis, work is undertaken towards the design, in hardware description languages, and the FPGA implementation of an out-of-order execution engine for floating-point arithmetic operations. This thesis work is part of a project called Lagarto.

    Parallel Pipeline Implementation of 64-bit FPU on Hardware

    This project is entitled "Parallel Pipelined Implementation of 64-bit FPU on Hardware". Most modern processors have two different logic units which handle the calculations required by the computer. One of them is the arithmetic-logic unit (ALU), which operates on integer operands, while the other is the floating point unit (FPU), which operates on real operands. The aim of this project is therefore to create an FPU which complies with the IEEE 754 double precision standard (64-bit). The project also aims to study the speed improvements offered by parallel and pipelined design. The project also requires application of advanced digital design techniques by using Verilog in a real-world project. The designed FPU is targeted to be capable of performing floating point addition (FADD), subtraction (FSUB), multiplication (FMUL) and division (FDIV) operations equally fast. The FPU must also demonstrate the performance rewards of the parallel and pipelined design. It is thus implied that the project requires an initial study of FP numbers and FP arithmetic. How FP arithmetic is actually implemented in hardware must also be known in depth. The section on methodology details each step expected to be taken throughout the course of the project. The methodology serves as a general guideline for executing the project, and more details and other refinements may be made as further progress is made. The project has two main phases: the first is software RTL coding, to be completed in semester 1, while the second is hardware implementation and testing in FPGA. The results available from the project thus far are incomplete: because of time constraints, the Verilog coding is not yet finished. Once the code is done, RTL tests and simulation will need to be conducted, and only then will it be implemented on the FPGA.
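
    The add/subtract path such an FPU pipelines follows the textbook align-add-normalize sequence. Below is a bit-level Python sketch of that sequence for IEEE 754 doubles; rounding, sticky-bit handling, and the special cases (NaN, infinity, subnormals) that a compliant unit must implement are deliberately omitted, so this is only a model of the datapath stages, not of the project's Verilog design.

```python
import struct

def decode_double(x):
    """Split an IEEE 754 double into sign, biased exponent and fraction,
    the three fields an FPU datapath operates on."""
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

def fadd_sketch(a, b):
    """Textbook FADD/FSUB datapath: align, add/subtract, normalize.
    Rounding and special cases (NaN, Inf, subnormals) are omitted."""
    sa, ea, fa = decode_double(a)
    sb, eb, fb = decode_double(b)
    ma, mb = (1 << 52) | fa, (1 << 52) | fb  # restore hidden leading 1
    if ea < eb:                              # keep the larger exponent in a
        (sa, ea, ma), (sb, eb, mb) = (sb, eb, mb), (sa, ea, ma)
    mb >>= ea - eb                           # alignment shift
    m = ma + mb if sa == sb else ma - mb     # effective add or subtract
    if m < 0:                                # subtraction flipped the sign
        sa ^= 1; m = -m
    e = ea
    while m >= (1 << 53):                    # normalize right on carry-out
        m >>= 1; e += 1
    while m and m < (1 << 52):               # normalize left after cancellation
        m <<= 1; e -= 1
    return (-1.0) ** sa * (m / float(1 << 52)) * 2.0 ** (e - 1023)

print(fadd_sketch(1.5, 2.25))  # 3.75
```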

    Low Power Synchronous Design of Hardware Architecture for IEEE 754 Single Precision Floating Point Fast Fourier Transform

    Signal processing, communication systems, digital information systems and many other fields of DSP have a wide need for Fast Fourier Transform computations. A hardware architecture for computing the IEEE 754 single precision floating point FFT is proposed here, and the work is focused on power optimization of the design. The Cooley-Tukey decimation-in-frequency (DIF) butterfly algorithm is used for the design implementation. The proposed design is a synchronous architecture and proved to be efficient compared to earlier parallel architectures. The clock latency and hardware overhead of the design are productive and cost effective compared to earlier designs. The design is implemented in RTL Verilog and logically verified using Altera ModelSim. Synthesis of the design is carried out in the gscl 45 nm library at a 1.1 V process using the Synopsys Design Vision and PrimeTime tools. The power reports showed that the proposed design consumes 90% less power with 50% reduced clock latency compared to earlier designs. The frequency of the design is compromised to an extent, but can be improved using the suggested novel sub-designs of the floating point add/sub and multiply blocks. Techniques for further power optimization are also given for future implementations.
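
    The DIF butterfly referenced above maps a pair (a, b) to (a + b, (a - b) * W), where W is a twiddle factor. A recursive Python sketch of that structure follows; the proposed architecture performs the same butterflies in single-precision hardware, so this float-based model only illustrates the dataflow, not the Verilog design.

```python
import cmath

def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT. Each stage applies
    the DIF butterfly (a, b) -> (a + b, (a - b) * W_N^k); outputs of the
    two half-size transforms land in the even/odd DFT bins."""
    n = len(x)                  # n must be a power of two
    if n == 1:
        return x[:]
    half = n // 2
    top = [x[i] + x[i + half] for i in range(half)]
    bot = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
           for i in range(half)]
    out = [0] * n
    out[0::2] = fft_dif(top)    # even-indexed bins
    out[1::2] = fft_dif(bot)    # odd-indexed bins
    return out

print(fft_dif([1, 2, 3, 4]))    # [10, -2+2j, -2, -2-2j]
```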

    REAL-TIME ADAPTIVE PULSE COMPRESSION ON RECONFIGURABLE, SYSTEM-ON-CHIP (SOC) PLATFORMS

    New radar applications need to perform complex algorithms and process large quantities of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, a hardware implementation of adaptive pulse compression algorithms for real-time transceiver optimization is presented, based on a System-on-Chip architecture for reconfigurable hardware devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such as matrix multiplication and matrix inversion, which are essential to solving the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms.
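
    As a rough software analogue of the two accelerated kernels, the NumPy sketch below builds a sample covariance matrix with a matrix multiply and then solves against it, in the style of a generic covariance-based adaptive filter. It is not the specific adaptive pulse compression algorithm implemented on the SoC; the shapes, names, and the MVDR-style normalization are illustrative assumptions.

```python
import numpy as np

def adaptive_weights(snapshots, steering):
    """snapshots: N x K complex matrix (one data snapshot per column);
    steering: length-N expected-signal vector. The two lines below are
    the kernels the coprocessors accelerate: a matrix multiply for the
    covariance estimate, and a solve in place of explicit inversion."""
    K = snapshots.shape[1]
    R = (snapshots @ snapshots.conj().T) / K   # sample covariance (mat-mul)
    w = np.linalg.solve(R, steering)           # R^-1 s without forming R^-1
    return w / (steering.conj() @ w)           # MVDR-style normalization

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 64)) + 1j * rng.standard_normal((8, 64))
s = np.ones(8) / np.sqrt(8)
print(adaptive_weights(X, s).shape)            # (8,)
```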

    Comparison of logarithmic and floating-point number systems implemented on Xilinx Virtex-II field-programmable gate arrays

    The aim of this thesis is to compare the implementation of parameterisable LNS (logarithmic number system) and floating-point high dynamic range number systems on FPGA. The Virtex/Virtex-II range of FPGAs from Xilinx, which are the most popular FPGA technology, are used to implement the designs. The study focuses on using the low-level primitives of the technology in an efficient way, and so initially the design issues in implementing fixed-point operators are considered. The four basic operations of addition, multiplication, division and square root are considered. Carry-free adders, ripple-carry adders, parallel multipliers and digit-recurrence division and square root are discussed. The floating-point operators use the word format and exceptions described by IEEE Std 754. A dual-path adder implementation is described in detail, as are floating-point multiplier, divider and square root components. Results and comparisons with other works are given. The efficient implementation of function evaluation methods is considered next. An overview of current FPGA methods is given, and a new piecewise polynomial implementation using the Taylor series is presented and compared with other designs in the literature. In the next section the LNS word format, accuracy and exceptions are described, and two new LNS addition/subtraction function approximations are presented. The algorithms for performing multiplication, division and powering in the LNS domain are also described and compared with other designs in the open literature. Parameterisable conversion algorithms to convert to/from the fixed-point domain from/to the LNS and floating-point domains are described and implementation results given. In the next chapter MATLAB bit-true software models are given that have exactly the same functionality as the hardware models. The interfaces of the models are given, and a serial communication system to perform low-speed system tests is described. A comparison of the LNS and floating-point number systems in terms of area and delay is given. Different functions implemented in LNS and floating-point arithmetic are also compared and conclusions are drawn. The results show that when the LNS is implemented with a characteristic of 6 bits or fewer it is superior to floating point. However, for larger characteristic lengths the floating-point system is more efficient, due to the delay and exponential area increase of the LNS addition operator. For characteristics larger than 6 bits, the LNS is beneficial only for specialist applications that require a high proportion of division, multiplication, square root and powering operations, and few additions.
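
    In the LNS, a value is carried as its base-2 logarithm, so multiplication and division collapse to fixed-point addition and subtraction, while addition needs the nonlinear function log2(1 + 2^d) that the thesis approximates with lookup tables and piecewise polynomials. The Python illustration below evaluates that function directly rather than approximating it; the subtraction counterpart log2(1 - 2^d) blows up as d approaches 0, which is why the LNS add/subtract operator dominates area and delay.

```python
import math

def lns_mul(la, lb):
    """LNS multiplication: just add the logarithms."""
    return la + lb

def lns_add(la, lb):
    """LNS addition: log2(2^la + 2^lb) = max + log2(1 + 2^-(|la - lb|)).
    The log2(1 + 2^d) term is the function an LNS adder must
    approximate in hardware; here it is evaluated exactly."""
    hi, lo = max(la, lb), min(la, lb)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

a, b = 3.0, 5.0
la, lb = math.log2(a), math.log2(b)
print(2 ** lns_mul(la, lb))   # 15.0  (3 * 5)
print(2 ** lns_add(la, lb))   # 8.0   (3 + 5)
```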

    Posits: An Alternative to Floating Point Calculations

    Floating point arithmetic is one of several methods of performing computations in digital designs; others include integer and fixed point computations. Floating point utilizes a method comparable to scientific notation in the binary domain. In terms of computations, floating point is by far the most prevalent in today’s digital designs. Between the support offered by compilers, as well as ready-to-use IP blocks, floating point units (FPUs) are a de facto standard for most processors. Despite its prevalence in modern designs, floating point has many flaws. One of the most common is the use of not-a-numbers (NaNs). These are meant to provide a way of signaling invalid operations; however, the excessive number of bit patterns devoted to them wastes usable encodings. As an alternative to floating point, a system named Universal Numbers, or UNUMs, was developed. This system comes in three different types; of these, for hardware compatibility, Type III provides the best stand-in for floating point. This system eliminates the NaN problem by using only one bit pattern, and also provides many other inherent benefits.
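
    Type III unums are better known as posits. A posit packs, after the sign bit, a variable-length run-of-identical-bits regime, then es exponent bits, then the fraction; the single pattern 10...0 is Not-a-Real (NaR), the one-bit-pattern replacement for the many IEEE NaN encodings mentioned above. A decoding sketch in Python follows, with posit16 and es = 1 chosen as an illustrative configuration.

```python
def decode_posit(bits, n=16, es=1):
    """Decode an n-bit posit with es exponent bits into a float.
    Value = (-1)^s * 2^(2^es * k + e) * (1 + f)."""
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float('nan')              # NaR: the single reserved pattern
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & ((1 << n) - 1)  # posits negate by two's complement
    rest = bits & ((1 << (n - 1)) - 1)
    first = (rest >> (n - 2)) & 1        # regime: run of identical bits
    run, i = 0, n - 2
    while i >= 0 and ((rest >> i) & 1) == first:
        run += 1
        i -= 1
    k = run - 1 if first else -run       # regime value
    i -= 1                               # skip the regime terminator bit
    e = 0
    for _ in range(es):                  # exponent: next es bits, if present
        e <<= 1
        if i >= 0:
            e |= (rest >> i) & 1
            i -= 1
    if i >= 0:                           # fraction: whatever bits remain
        f = (rest & ((1 << (i + 1)) - 1)) / (1 << (i + 1))
    else:
        f = 0.0
    return (-1.0) ** sign * 2.0 ** ((1 << es) * k + e) * (1.0 + f)

print(decode_posit(0x4000))  # 1.0
print(decode_posit(0x5000))  # 2.0
print(decode_posit(0x0001))  # 2^-28, minpos for posit16 with es = 1
```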

    Optimized linear, quadratic and cubic interpolators for elementary function hardware implementations

    This paper presents a method for designing linear, quadratic and cubic interpolators that compute elementary functions using truncated multipliers, squarers and cubers. Initial coefficient values are obtained using a Chebyshev series approximation. A direct search algorithm is then used to optimize the quantized coefficient values to meet a user-specified error constraint. The algorithm minimizes coefficient lengths to reduce lookup table requirements, maximizes the number of truncated columns to reduce the area, delay and power of the arithmetic units, and minimizes the maximum absolute error of the interpolator output. The method can be used to design interpolators that approximate any function to a user-specified accuracy, up to and beyond 53 bits of precision (e.g., the IEEE double precision significand). Linear, quadratic and cubic interpolator designs that approximate reciprocal, square root, reciprocal square root and sine are presented and analyzed. Area, delay and power estimates are given for 16, 24 and 32-bit interpolators that compute the reciprocal function, targeting a 65 nm CMOS technology from IBM. Results indicate the proposed method uses smaller arithmetic units and has reduced lookup table sizes compared to previously proposed methods. The method can be used to optimize coefficients in other systems while accounting for coefficient quantization as well as truncation and rounding effects of multiple arithmetic units.
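
    The flow described above starts from per-interval Chebyshev coefficients and then optimizes their quantization and the arithmetic-unit truncation. The NumPy sketch below reproduces only the starting point: piecewise Chebyshev fits to 1/x selected by a table lookup on the leading bits of the input. The direct-search coefficient optimization and the truncated-multiplier error modeling, which are the paper's contribution, are not included.

```python
import numpy as np

def build_tables(f, n_intervals=64, degree=2, domain=(1.0, 2.0)):
    """Fit a degree-`degree` Chebyshev approximation on each subinterval
    and return one polynomial per interval. Coefficient quantization is
    omitted; these are the unquantized starting coefficients."""
    lo, hi = domain
    edges = np.linspace(lo, hi, n_intervals + 1)
    tables = []
    for a, b in zip(edges[:-1], edges[1:]):
        cheb = np.polynomial.chebyshev.Chebyshev.interpolate(f, degree, [a, b])
        tables.append(cheb.convert(kind=np.polynomial.Polynomial))
    return edges, tables

def interpolate(x, edges, tables):
    """Leading-bits table lookup, then polynomial evaluation."""
    i = min(int((x - edges[0]) / (edges[1] - edges[0])), len(tables) - 1)
    return tables[i](x)

edges, tables = build_tables(lambda x: 1.0 / x)
x = 1.37
print(interpolate(x, edges, tables), 1.0 / x)   # agree to roughly 1e-7
```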