76 research outputs found

    An Operand-Optimized Asynchronous IEEE 754 Double-Precision Floating-Point Adder


    IEEE Compliant Double-Precision FPU and 64-bit ALU with Variable Latency Integer Divider

    Together, the arithmetic logic unit (ALU) and floating-point unit (FPU) perform all of the mathematical and logic operations of computer processors. Because they are used so prominently, they fall in the critical path of the central processing unit - often becoming the bottleneck, or limiting factor, for performance. As such, the design of a high-speed ALU and FPU is vital to creating a processor capable of performing up to the demanding standards of today's computer users. In this paper, both a 64-bit ALU and a 64-bit FPU are designed based on the reduced instruction set computer architecture. The ALU performs the four basic mathematical operations - addition, subtraction, multiplication and division - in both unsigned and two's complement format, basic logic operations and shifting. The division algorithm is a novel approach, using a comparison-multiples based SRT divider to create a variable latency integer divider. The floating-point unit performs the double-precision floating-point operations add, subtract, multiply and divide, in accordance with the IEEE 754 standard for number representation and rounding. The ALU and FPU were implemented in VHDL, simulated in ModelSim, and constrained and synthesized using Synopsys Design Compiler (2006.06). They were synthesized using TSMC 0.13 μm CMOS technology. The timing, power and area synthesis results were recorded and, where applicable, compared to those of the corresponding DesignWare components. The ALU synthesis reported an area of 122,215 gates, a power of 384 mW, and a delay of 2.89 ns - a frequency of 346 MHz. The FPU synthesis reported an area of 84,440 gates, a delay of 2.82 ns and an operating frequency of 355 MHz, with a maximum dynamic power of 153.9 mW.
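
    An SRT divider retires one signed quotient digit per cycle, and a zero digit requires no add or subtract, which is the property a variable-latency design can exploit. The Python sketch below shows a generic radix-2 SRT recurrence only; the thesis's divider is a comparison-multiples based design operating on integers, so its radix and digit-selection details differ. Operand normalization to [1/2, 1) and the digit set {-1, 0, +1} are simplifying assumptions made here.

```python
from fractions import Fraction

def srt_divide(x, d, n_digits=16):
    """Generic radix-2 SRT division of x by d, both normalized to
    [1/2, 1) with x < d. Quotient digits in {-1, 0, +1} are chosen from
    a coarse comparison of the shifted partial remainder, so digit
    selection never needs a full-width exact compare."""
    assert Fraction(1, 2) <= x < d < 1
    p = x                        # partial remainder; invariant |p| <= d
    q = Fraction(0)              # accumulated signed-digit quotient
    w = Fraction(1, 2)           # weight of the current quotient digit
    for _ in range(n_digits):
        p = 2 * p                # shift left one bit position
        if p >= Fraction(1, 2):
            p -= d; q += w       # digit +1
        elif p < Fraction(-1, 2):
            p += d; q -= w       # digit -1
        # otherwise digit 0: no adder activity this cycle
        w /= 2
    return q

print(float(srt_divide(Fraction(1, 2), Fraction(3, 4))))  # ~0.6667 = (1/2)/(3/4)
```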

    Design and implementation of an out of order execution engine of floating point arithmetic operations

    In this thesis, work is undertaken towards the design, in hardware description languages, and the FPGA implementation of an out-of-order execution engine for floating-point arithmetic operations. This thesis work is part of a project called Lagarto.

    Parallel Pipeline Implementation of 64-bit FPU on Hardware

    This project is entitled "Parallel Pipelined Implementation of 64-bit FPU on Hardware". Most modern processors have two different logic units which handle the calculations required by the computer. One of them is the arithmetic-logic unit (ALU), which operates on integer operands, while the other is the floating point unit (FPU), which operates on real operands. The aim of this project is therefore to create an FPU which complies with the IEEE 754 double precision standard (64-bit). The project also aims to study the speed improvements offered by parallel and pipelined design. The project also requires application of advanced digital design techniques by using Verilog in a real-world project. The designed FPU is targeted to be capable of performing floating point addition (FADD), subtraction (FSUB), multiplication (FMUL) and division (FDIV) operations equally fast. The FPU must also demonstrate the performance rewards of the parallel and pipelined design. It is thus implied that the project requires an initial study of FP numbers and FP arithmetic. How FP arithmetic is actually implemented in hardware must also be known in depth. The section on methodology details each step expected to be taken throughout the course of the project. The methodology serves as a general guideline for executing the project, and more details and other refinements may be made as further progress is made. The project has two main phases: the first is software RTL coding, to be completed in semester 1, while the second is hardware implementation and testing in FPGA. The results available from the project thus far are incomplete: because of time constraints, the Verilog coding is not yet finished. Once the code is done, RTL tests and simulation will need to be conducted, and only then will it be implemented on the FPGA.
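
    The add/subtract path such an FPU pipelines follows the textbook align-add-normalize sequence. Below is a bit-level Python sketch of that sequence for IEEE 754 doubles; rounding, sticky-bit handling, and the special cases (NaN, infinity, subnormals) that a compliant unit must implement are deliberately omitted, so this is only a model of the datapath stages, not of the project's Verilog design.

```python
import struct

def decode_double(x):
    """Split an IEEE 754 double into sign, biased exponent and fraction,
    the three fields an FPU datapath operates on."""
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

def fadd_sketch(a, b):
    """Textbook FADD/FSUB datapath: align, add/subtract, normalize.
    Rounding and special cases (NaN, Inf, subnormals) are omitted."""
    sa, ea, fa = decode_double(a)
    sb, eb, fb = decode_double(b)
    ma, mb = (1 << 52) | fa, (1 << 52) | fb  # restore hidden leading 1
    if ea < eb:                              # keep the larger exponent in a
        (sa, ea, ma), (sb, eb, mb) = (sb, eb, mb), (sa, ea, ma)
    mb >>= ea - eb                           # alignment shift
    m = ma + mb if sa == sb else ma - mb     # effective add or subtract
    if m < 0:                                # subtraction flipped the sign
        sa ^= 1; m = -m
    e = ea
    while m >= (1 << 53):                    # normalize right on carry-out
        m >>= 1; e += 1
    while m and m < (1 << 52):               # normalize left after cancellation
        m <<= 1; e -= 1
    return (-1.0) ** sa * (m / float(1 << 52)) * 2.0 ** (e - 1023)

print(fadd_sketch(1.5, 2.25))  # 3.75
```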

    Low Power Synchronous Design of Hardware Architecture for IEEE 754 Single Precision Floating Point Fast Fourier Transform

    Signal processing, communication systems, digital information systems and many other fields of DSP have a wide need for Fast Fourier Transform computations. A hardware architecture for computing the IEEE 754 single precision floating point FFT is proposed here, and the work is focused on power optimization of the design. The Cooley-Tukey decimation-in-frequency (DIF) butterfly algorithm is used for the design implementation. The proposed design is a synchronous architecture and proved to be efficient compared to earlier parallel architectures. The clock latency and hardware overhead of the design are productive and cost effective compared to earlier designs. The design is implemented in RTL Verilog and logically verified using Altera ModelSim. Synthesis of the design is carried out in the gscl 45 nm library at a 1.1 V process using the Synopsys Design Vision and PrimeTime tools. The power reports showed that the proposed design consumes 90% less power with 50% reduced clock latency compared to earlier designs. The frequency of the design is compromised to an extent, but can be improved using the suggested novel sub-designs of the floating point add/sub and multiply blocks. Techniques for further power optimization are also given for future implementations.
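
    The DIF butterfly referenced above maps a pair (a, b) to (a + b, (a - b) * W), where W is a twiddle factor. A recursive Python sketch of that structure follows; the proposed architecture performs the same butterflies in single-precision hardware, so this float-based model only illustrates the dataflow, not the Verilog design.

```python
import cmath

def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT. Each stage applies
    the DIF butterfly (a, b) -> (a + b, (a - b) * W_N^k); outputs of the
    two half-size transforms land in the even/odd DFT bins."""
    n = len(x)                  # n must be a power of two
    if n == 1:
        return x[:]
    half = n // 2
    top = [x[i] + x[i + half] for i in range(half)]
    bot = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
           for i in range(half)]
    out = [0] * n
    out[0::2] = fft_dif(top)    # even-indexed bins
    out[1::2] = fft_dif(bot)    # odd-indexed bins
    return out

print(fft_dif([1, 2, 3, 4]))    # [10, -2+2j, -2, -2-2j]
```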

    REAL-TIME ADAPTIVE PULSE COMPRESSION ON RECONFIGURABLE, SYSTEM-ON-CHIP (SOC) PLATFORMS

    New radar applications need to perform complex algorithms and process large quantities of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, a hardware implementation of adaptive pulse compression algorithms for real-time transceiver optimization is presented, based on a System-on-Chip architecture for reconfigurable hardware devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such as matrix multiplication and matrix inversion, which are essential to solving the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms.
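
    As a rough software analogue of the two accelerated kernels, the NumPy sketch below builds a sample covariance matrix with a matrix multiply and then solves against it, in the style of a generic covariance-based adaptive filter. It is not the specific adaptive pulse compression algorithm implemented on the SoC; the shapes, names, and the MVDR-style normalization are illustrative assumptions.

```python
import numpy as np

def adaptive_weights(snapshots, steering):
    """snapshots: N x K complex matrix (one data snapshot per column);
    steering: length-N expected-signal vector. The two lines below are
    the kernels the coprocessors accelerate: a matrix multiply for the
    covariance estimate, and a solve in place of explicit inversion."""
    K = snapshots.shape[1]
    R = (snapshots @ snapshots.conj().T) / K   # sample covariance (mat-mul)
    w = np.linalg.solve(R, steering)           # R^-1 s without forming R^-1
    return w / (steering.conj() @ w)           # MVDR-style normalization

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 64)) + 1j * rng.standard_normal((8, 64))
s = np.ones(8) / np.sqrt(8)
print(adaptive_weights(X, s).shape)            # (8,)
```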

    Comparison of logarithmic and floating-point number systems implemented on Xilinx Virtex-II field-programmable gate arrays

    The aim of this thesis is to compare the implementation of parameterisable LNS (logarithmic number system) and floating-point high dynamic range number systems on FPGA. The Virtex/Virtex-II range of FPGAs from Xilinx, which are the most popular FPGA technology, are used to implement the designs. The study focuses on using the low-level primitives of the technology in an efficient way, and so initially the design issues in implementing fixed-point operators are considered. The four basic operations of addition, multiplication, division and square root are considered. Carry-free adders, ripple-carry adders, parallel multipliers and digit-recurrence division and square root are discussed. The floating-point operators use the word format and exceptions described by IEEE Std 754. A dual-path adder implementation is described in detail, as are floating-point multiplier, divider and square root components. Results and comparisons with other works are given. The efficient implementation of function evaluation methods is considered next. An overview of current FPGA methods is given, and a new piecewise polynomial implementation using the Taylor series is presented and compared with other designs in the literature. In the next section the LNS word format, accuracy and exceptions are described, and two new LNS addition/subtraction function approximations are presented. The algorithms for performing multiplication, division and powering in the LNS domain are also described and compared with other designs in the open literature. Parameterisable conversion algorithms to convert to/from the fixed-point domain from/to the LNS and floating-point domains are described and implementation results given. In the next chapter MATLAB bit-true software models are given that have exactly the same functionality as the hardware models. The interfaces of the models are given, and a serial communication system to perform low-speed system tests is described. A comparison of the LNS and floating-point number systems in terms of area and delay is given. Different functions implemented in LNS and floating-point arithmetic are also compared and conclusions are drawn. The results show that when the LNS is implemented with a characteristic of 6 bits or fewer it is superior to floating point. However, for larger characteristic lengths the floating-point system is more efficient, due to the delay and exponential area increase of the LNS addition operator. For characteristics larger than 6 bits, the LNS is beneficial only for specialist applications that require a high proportion of division, multiplication, square root and powering operations, and few additions.
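
    In the LNS, a value is carried as its base-2 logarithm, so multiplication and division collapse to fixed-point addition and subtraction, while addition needs the nonlinear function log2(1 + 2^d) that the thesis approximates with lookup tables and piecewise polynomials. The Python illustration below evaluates that function directly rather than approximating it; the subtraction counterpart log2(1 - 2^d) blows up as d approaches 0, which is why the LNS add/subtract operator dominates area and delay.

```python
import math

def lns_mul(la, lb):
    """LNS multiplication: just add the logarithms."""
    return la + lb

def lns_add(la, lb):
    """LNS addition: log2(2^la + 2^lb) = max + log2(1 + 2^-(|la - lb|)).
    The log2(1 + 2^d) term is the function an LNS adder must
    approximate in hardware; here it is evaluated exactly."""
    hi, lo = max(la, lb), min(la, lb)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

a, b = 3.0, 5.0
la, lb = math.log2(a), math.log2(b)
print(2 ** lns_mul(la, lb))   # 15.0  (3 * 5)
print(2 ** lns_add(la, lb))   # 8.0   (3 + 5)
```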

    Posits: An Alternative to Floating Point Calculations

    Floating point arithmetic is one of several methods of performing computations in digital designs; others include integer and fixed point computations. Floating point utilizes a method comparable to scientific notation in the binary domain. In terms of computations, floating point is by far the most prevalent in today’s digital designs. Between the support offered by compilers, as well as ready-to-use IP blocks, floating point units (FPUs) are a de facto standard for most processors. Despite its prevalence in modern designs, floating point has many flaws. One of the most common is the use of not-a-numbers (NaNs). These are meant to provide a way of signaling invalid operations; however, the excessive number of bit patterns devoted to them wastes usable encodings. As an alternative to floating point, a system named Universal Numbers, or UNUMs, was developed. This system comes in three different types; of these, for hardware compatibility, Type III provides the best stand-in for floating point. This system eliminates the NaN problem by using only one bit pattern, and also provides many other inherent benefits.
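
    Type III unums are better known as posits. A posit packs, after the sign bit, a variable-length run-of-identical-bits regime, then es exponent bits, then the fraction; the single pattern 10...0 is Not-a-Real (NaR), the one-bit-pattern replacement for the many IEEE NaN encodings mentioned above. A decoding sketch in Python follows, with posit16 and es = 1 chosen as an illustrative configuration.

```python
def decode_posit(bits, n=16, es=1):
    """Decode an n-bit posit with es exponent bits into a float.
    Value = (-1)^s * 2^(2^es * k + e) * (1 + f)."""
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float('nan')              # NaR: the single reserved pattern
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & ((1 << n) - 1)  # posits negate by two's complement
    rest = bits & ((1 << (n - 1)) - 1)
    first = (rest >> (n - 2)) & 1        # regime: run of identical bits
    run, i = 0, n - 2
    while i >= 0 and ((rest >> i) & 1) == first:
        run += 1
        i -= 1
    k = run - 1 if first else -run       # regime value
    i -= 1                               # skip the regime terminator bit
    e = 0
    for _ in range(es):                  # exponent: next es bits, if present
        e <<= 1
        if i >= 0:
            e |= (rest >> i) & 1
            i -= 1
    if i >= 0:                           # fraction: whatever bits remain
        f = (rest & ((1 << (i + 1)) - 1)) / (1 << (i + 1))
    else:
        f = 0.0
    return (-1.0) ** sign * 2.0 ** ((1 << es) * k + e) * (1.0 + f)

print(decode_posit(0x4000))  # 1.0
print(decode_posit(0x5000))  # 2.0
print(decode_posit(0x0001))  # 2^-28, minpos for posit16 with es = 1
```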

    Optimized linear, quadratic and cubic interpolators for elementary function hardware implementations

    This paper presents a method for designing linear, quadratic and cubic interpolators that compute elementary functions using truncated multipliers, squarers and cubers. Initial coefficient values are obtained using a Chebyshev series approximation. A direct search algorithm is then used to optimize the quantized coefficient values to meet a user-specified error constraint. The algorithm minimizes coefficient lengths to reduce lookup table requirements, maximizes the number of truncated columns to reduce the area, delay and power of the arithmetic units, and minimizes the maximum absolute error of the interpolator output. The method can be used to design interpolators that approximate any function to a user-specified accuracy, up to and beyond 53 bits of precision (e.g., the IEEE double precision significand). Linear, quadratic and cubic interpolator designs that approximate reciprocal, square root, reciprocal square root and sine are presented and analyzed. Area, delay and power estimates are given for 16, 24 and 32-bit interpolators that compute the reciprocal function, targeting a 65 nm CMOS technology from IBM. Results indicate the proposed method uses smaller arithmetic units and has reduced lookup table sizes compared to previously proposed methods. The method can be used to optimize coefficients in other systems while accounting for coefficient quantization as well as truncation and rounding effects of multiple arithmetic units.
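
    The flow described above starts from per-interval Chebyshev coefficients and then optimizes their quantization and the arithmetic-unit truncation. The NumPy sketch below reproduces only the starting point: piecewise Chebyshev fits to 1/x selected by a table lookup on the leading bits of the input. The direct-search coefficient optimization and the truncated-multiplier error modeling, which are the paper's contribution, are not included.

```python
import numpy as np

def build_tables(f, n_intervals=64, degree=2, domain=(1.0, 2.0)):
    """Fit a degree-`degree` Chebyshev approximation on each subinterval
    and return one polynomial per interval. Coefficient quantization is
    omitted; these are the unquantized starting coefficients."""
    lo, hi = domain
    edges = np.linspace(lo, hi, n_intervals + 1)
    tables = []
    for a, b in zip(edges[:-1], edges[1:]):
        cheb = np.polynomial.chebyshev.Chebyshev.interpolate(f, degree, [a, b])
        tables.append(cheb.convert(kind=np.polynomial.Polynomial))
    return edges, tables

def interpolate(x, edges, tables):
    """Leading-bits table lookup, then polynomial evaluation."""
    i = min(int((x - edges[0]) / (edges[1] - edges[0])), len(tables) - 1)
    return tables[i](x)

edges, tables = build_tables(lambda x: 1.0 / x)
x = 1.37
print(interpolate(x, edges, tables), 1.0 / x)   # agree to roughly 1e-7
```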