151 research outputs found

    IEEE Compliant Double-Precision FPU and 64-bit ALU with Variable Latency Integer Divider

    Together the arithmetic logic unit (ALU) and floating-point unit (FPU) perform all of the mathematical and logic operations of computer processors. Because they are used so prominently, they fall in the critical path of the central processing unit, often becoming the bottleneck, or limiting factor, for performance. As such, the design of a high-speed ALU and FPU is vital to creating a processor capable of performing up to the demanding standards of today's computer users. In this paper, both a 64-bit ALU and a 64-bit FPU are designed based on the reduced instruction set computer architecture. The ALU performs the four basic mathematical operations - addition, subtraction, multiplication and division - in both unsigned and two's complement format, as well as basic logic operations and shifting. The division algorithm is a novel approach, using a comparison-multiples-based SRT divider to create a variable-latency integer divider. The floating-point unit performs the double-precision floating-point operations add, subtract, multiply and divide in accordance with the IEEE 754 standard for number representation and rounding. The ALU and FPU were implemented in VHDL, simulated in ModelSim, and constrained and synthesized using Synopsys Design Compiler (2006.06) with TSMC 0.13 μm CMOS technology. The timing, power and area synthesis results were recorded and, where applicable, compared to those of the corresponding DesignWare components. The ALU synthesis reported an area of 122,215 gates, a power of 384 mW, and a delay of 2.89 ns - a frequency of 346 MHz. The FPU synthesis reported an area of 84,440 gates, a delay of 2.82 ns and an operating frequency of 355 MHz, with a maximum dynamic power of 153.9 mW.
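    The abstract does not spell out the comparison-multiples selection logic, but the flavor of SRT division it builds on can be sketched behaviorally. Below is a minimal radix-2 signed-digit divider in C++ in which each iteration picks a quotient digit from {-1, 0, +1} by comparing the shifted partial remainder against the divisor multiples +d and -d; the paper's actual radix, multiples, and variable-latency early-exit logic are not reproduced here.

        #include <cassert>
        #include <cstdint>
        #include <cstdio>

        // Radix-2 SRT-style division with quotient digits {-1, 0, +1}.
        // Digit selection compares the shifted partial remainder against
        // the divisor multiples +d and -d (illustrative only; the paper's
        // comparison-multiples divider uses its own radix and constants).
        void srt_divide(uint32_t n, uint32_t d, uint32_t &q, uint32_t &r) {
            assert(d != 0);
            int64_t rem = 0, quo = 0;
            for (int i = 31; i >= 0; --i) {
                rem = (rem << 1) | ((n >> i) & 1);  // shift in next dividend bit
                int digit = 0;
                if (rem >= (int64_t)d)       digit = +1;
                else if (rem <= -(int64_t)d) digit = -1;
                rem -= (int64_t)digit * d;          // subtract selected multiple
                quo = (quo << 1) + digit;           // accumulate signed digits
            }
            if (rem < 0) { rem += d; quo -= 1; }    // fold negative remainder back
            q = (uint32_t)quo;
            r = (uint32_t)rem;
        }

        int main() {
            uint32_t q, r;
            srt_divide(1000003u, 97u, q, r);
            std::printf("1000003 = %u * 97 + %u\n", q, r);  // 10309 * 97 + 30
        }

    The redundant digit set is what lets real SRT hardware select each digit from a short, truncated estimate of the remainder instead of a full-width comparison, which is where the speed advantage comes from.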

    Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 1: FTMP principles of operation

    The basic organization of the fault-tolerant multiprocessor (FTMP) is that of a general-purpose homogeneous multiprocessor. Three processors operate on a shared system (memory and I/O) bus. Replication and tight synchronization of all elements, together with hardware voting, are employed to detect and correct any single fault. Reconfiguration is then employed to repair a fault. Multiple faults may be tolerated as a sequence of single faults with repair between fault occurrences.
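    The single-fault detect-and-correct step FTMP performs in hardware is classic 2-of-3 majority voting across the replicated units. A minimal software analogue follows; the real voters are bus-level hardware circuits, and the names here are illustrative only.

        #include <cstdint>
        #include <cstdio>

        // Bitwise 2-of-3 majority voter: the software analogue of the
        // hardware voting FTMP applies to replicated processor outputs.
        // A disagreeing replica is flagged so reconfiguration can follow.
        uint32_t vote(uint32_t a, uint32_t b, uint32_t c, int &faulty) {
            uint32_t m = (a & b) | (a & c) | (b & c);  // per-bit majority
            faulty = -1;                               // -1: all replicas agree
            if (a != m) faulty = 0;
            else if (b != m) faulty = 1;
            else if (c != m) faulty = 2;
            return m;
        }

        int main() {
            int faulty;
            uint32_t r = vote(0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEA, faulty);
            std::printf("voted=%08X faulty replica=%d\n", r, faulty);  // DEADBEEF, 2
        }

    Under the single-fault assumption the voted value masks the error immediately; identifying which replica disagreed is what drives the subsequent repair-by-reconfiguration.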

    A Synthesizable single-cycle multiply-accumulator

    The multiplication and multiply-accumulate operations are expensive to implement in hardware for digital signal processing, video, and graphics applications. A standard multiply-accumulator has three inputs and a single output that is equal to the product of two of its inputs added to the third input. For some applications it is desirable for a multiply-accumulator to have two outputs: one that is the product of the first two inputs, and a second that is the multiply-accumulate result. The goal of this thesis is to investigate the algorithms and architectures used to design multipliers and multiply-accumulators, and to create a multiply-accumulator that computes both outputs in a single clock cycle. Oftentimes in high-speed designs the most time-consuming operations are pipelined to meet the system timing requirements; if the multiply-accumulate computation can be reduced to a single-cycle operation, overall processor performance can be improved for many applications. A multiply-accumulator with two outputs can be created from a combination of standard multiply, add, or multiply-accumulate components: a multiplier paired with a multiply-accumulator produces the outputs in the most time-efficient manner, while a multiplier and an adder result in a smaller design with a larger worst-case delay. Therefore, the goal is to create a multiply-accumulator that is comparable in speed to, but requires less area than, a design using an industry-standard multiplier and multiply-accumulator.
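    Behaviorally the dual-output unit is easy to pin down, even though the thesis's contribution is the circuit that produces both results in one cycle. A sketch of the input/output contract, with illustrative names:

        #include <cstdint>
        #include <cstdio>

        // Dual-output multiply-accumulator contract: product = a * b and
        // mac = a * b + c, both available in the same cycle. This model
        // fixes only the behavior, not the single-cycle datapath.
        struct MacOutputs {
            int64_t product;  // a * b
            int64_t mac;      // a * b + c
        };

        MacOutputs mac_step(int32_t a, int32_t b, int64_t c) {
            int64_t p = (int64_t)a * b;
            return { p, p + c };
        }

        int main() {
            MacOutputs o = mac_step(123, -45, 10000);
            std::printf("product=%lld mac=%lld\n",
                        (long long)o.product, (long long)o.mac);  // -5535, 4465
        }

    In hardware, the usual way to make the accumulate nearly free is to inject c into the multiplier's carry-save partial-product reduction tree so that only one carry-propagate adder sits on the critical path; whether this design takes exactly that route is not stated in the abstract.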

    Decimal Floating-point Fused Multiply Add with Redundant Number Systems

    The IEEE standard for decimal floating-point arithmetic was officially released in 2008. The decimal floating-point (DFP) format and arithmetic can be applied to remedy the conversion error caused by representing decimal numbers in binary floating-point format, and to improve the performance of decimal processing in commercial and financial applications. Many architectures and algorithms for individual DFP arithmetic functions (e.g., addition, multiplication, division, and square root) have been proposed and investigated. However, because decimal numbers are represented less efficiently on binary devices, the area consumption and performance of DFP arithmetic units are not yet comparable with those of their binary counterparts. IBM introduced a binary fused multiply-add (FMA) instruction in its POWER series of processors to improve the performance of floating-point computation and to reduce the complexity of hardware design in reduced instruction set computing (RISC) systems. Such an instruction has also proven suitable for efficiently implementing not only stand-alone addition and multiplication, but also division, square root, and other transcendental functions. Additionally, unconventional number systems, including alternative digit sets and encodings, have shown advantages in performance and area efficiency in many computer arithmetic applications. In this research, by analyzing typical binary floating-point FMA designs and the design strategies of unconventional number systems, a high-performance decimal floating-point fused multiply-add (DFMA) with redundant internal encodings was proposed. First, the fixed-point components inside the DFMA (i.e., addition and multiplication) were studied as the basis of the FMA architecture, and specific number systems were applied to improve the underlying decimal fixed-point arithmetic; the superiority of redundant number systems in stand-alone decimal fixed-point addition and multiplication is demonstrated by the synthesis results. Afterwards, a new DFMA architecture that exploits the redundant internal operands was proposed. Overall, the chosen number system improved not only the efficiency of the fixed-point addition and multiplication inside the FMA, but also the architecture and algorithms used to build the FMA itself. Division, square root, reciprocal, reciprocal square root, and many other functions that rely on Newton's method or similar iterations can benefit from the proposed DFMA architecture: with only a few on-chip memory devices (e.g., look-up tables), or even software routines alone, these functions can be implemented on top of the hardwired FMA, so the proposed DFMA can serve as the single key arithmetic component on a chip, reducing hardware cost. Additionally, this research on decimal arithmetic with unconventional number systems broadens the ways of performing other high-performance decimal arithmetic (e.g., stand-alone division and square root) on top of basic binary devices (i.e., AND gates, OR gates, and binary full adders). The proposed techniques are also expected to be helpful in other non-binary applications.
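    The division-via-FMA claim is concrete enough to illustrate. C++ has no standard decimal floating-point type, so the sketch below runs the Newton-Raphson reciprocal iteration on binary doubles with std::fma; a DFMA would execute the same recurrence in decimal. Each step needs exactly two fused operations and roughly doubles the number of correct digits:

        #include <cmath>
        #include <cstdio>

        // Newton-Raphson reciprocal built on fused multiply-add:
        //   e = 1 - d*x   (one FMA, no intermediate rounding)
        //   x = x + x*e   (one FMA), i.e. x <- x*(2 - d*x)
        double reciprocal(double d, double x0, int steps) {
            double x = x0;
            for (int i = 0; i < steps; ++i) {
                double e = std::fma(-d, x, 1.0);  // exact residual 1 - d*x
                x = std::fma(x, e, x);            // refine the estimate
            }
            return x;
        }

        int main() {
            // crude seed for 1/3, then four quadratically converging steps
            double x = reciprocal(3.0, 0.3, 4);
            std::printf("1/3 ~= %.17g\n", x);
        }

    Because the FMA computes 1 - d*x without an intermediate rounding, the residual is exact to working precision, which is why an FMA unit is the natural substrate for implementing division and square root in software or microcode.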

    Application specific serial arithmetic arrays

    High-performance systolic arrays of serial-parallel multiplier elements may be rapidly constructed for specific applications by applying hardware description language techniques to a library of full-custom CMOS building blocks. Single-clock precharged circuits, which may be quickly configured for a variety of applications, have been implemented for these arrays at clock rates in excess of 100 MHz using economical 2-micron (minimum feature size) CMOS processes. A number of application-specific arrays are presented, including a 2-D convolver for image processing, an integer polynomial solver, and a finite-field polynomial solver.
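    A serial-parallel multiplier holds one operand in parallel and streams the other in one bit per clock, LSB first, accumulating a shifted multiple each cycle. The sketch below models that cycle-by-cycle arithmetic only; the actual cells are full-custom precharged CMOS, and nothing here reflects their circuit structure.

        #include <cstdint>
        #include <cstdio>

        // Behavioral model of a serial-parallel multiplier: operand `a` is
        // resident in parallel, operand `b` arrives one bit per clock (LSB
        // first), and each cycle adds either `a` or 0, suitably shifted,
        // into the accumulator.
        uint64_t serial_parallel_multiply(uint32_t a, uint32_t b) {
            uint64_t acc = 0;
            for (int cycle = 0; cycle < 32; ++cycle) {
                uint32_t bit = (b >> cycle) & 1;       // serial bit this clock
                if (bit) acc += (uint64_t)a << cycle;  // add shifted multiple
            }
            return acc;
        }

        int main() {
            std::printf("%llu\n",
                (unsigned long long)serial_parallel_multiply(1234, 5678));  // 7006652
        }

    The appeal for systolic arrays is that each cell only ever handles one bit of the serial operand per clock, so cell-to-cell wiring stays short and the clock rate is set by a single small cell rather than a wide parallel datapath.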

    A Computer Graphics Head-Up Display for Air-To-Air and Air-To-Ground Flight Simulation

    A computer graphics simulation of an aircraft Head-Up Display was designed using an RDS-3000 Ikonas Graphics Processor and a PDP-11/34 host computer system. The software control and display modules were written in Ikonas microcode and Digital Equipment Corporation Fortran IV-PLUS. The Head-Up Display system consists of the basic flight data, which includes aerodynamic flight information, the Roll/Pitch Ladder, and the Velocity Vector or Flight Path Marker. The system was designed for flexibility in the modification and evaluation of various weapons delivery systems, which will be adapted to specific needs by research scientists and engineers at the Visual Technology Research Simulator in Orlando, Florida.
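    As an illustrative aside (the actual symbol-placement math lives in the Ikonas microcode and is not given in the abstract), a flight path marker is conventionally drawn where the aircraft's velocity, rather than its nose, points: the vertical offset from boresight is the flight-path angle, the lateral offset is the drift between ground track and heading. A sketch of that convention:

        #include <cmath>
        #include <cstdio>

        const double RAD2DEG = 180.0 / 3.14159265358979323846;

        struct HudPos { double az_deg, el_deg; };  // angular offsets from boresight

        // Place the flight path marker from world-frame velocity (north,
        // east, down) and heading. Illustrative convention, not the
        // simulator's actual code.
        HudPos flight_path_marker(double vn, double ve, double vd,
                                  double heading_deg) {
            double track = std::atan2(ve, vn) * RAD2DEG;             // ground track
            double gamma = std::atan2(-vd, std::hypot(vn, ve)) * RAD2DEG;
            double drift = track - heading_deg;                      // lateral offset
            while (drift > 180.0)  drift -= 360.0;
            while (drift < -180.0) drift += 360.0;
            return { drift, gamma };
        }

        int main() {
            HudPos p = flight_path_marker(100.0, 5.0, 3.0, 0.0);  // drifting right, sinking
            std::printf("marker at az=%.2f deg, el=%.2f deg\n", p.az_deg, p.el_deg);
        }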

    An optimization framework for fixed-point digital signal processing.

    Lam Yuet Ming. Thesis (M.Phil.) -- Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 80-86). Abstracts in English and Chinese. Contents:
    Chapter 1, Introduction: Motivation (difficulties of fixed-point design; why still fixed-point?; difficulties of converting floating-point to fixed-point; why wordlength optimization?); Objectives; Contributions; Thesis Organization.
    Chapter 2, Review: Introduction; Simulation approach to address quantization issue; Analytical approach to address quantization issue; Implementation of speech systems; Discussion; Summary.
    Chapter 3, Fixed-point arithmetic background: Introduction; Fixed-point representation; Fixed-point addition/subtraction; Fixed-point multiplication; Fixed-point division; Summary.
    Chapter 4, Fixed-point class implementation: Introduction; Fixed-point simulation using overloading; Fixed-point class implementation (fixed-point object declaration; overloading the operators; arithmetic operations; automatic monitoring of dynamic range; automatic calculation of quantization error; array support; cosine calculation); Summary. (A sketch of this chapter's idea follows the contents listing.)
    Chapter 5, Speech recognition background: Introduction; Isolated word recognition system overview; Linear predictive coding processor (the LPC model; the LPC processor); Vector quantization; Hidden Markov model; Summary.
    Chapter 6, Optimization: Introduction; Simplex method (initialization; reflection; expansion; contraction; stop); One-dimensional optimization approach (search space reduction; speeding up convergence); Summary.
    Chapter 7, Word recognition system design methodology: Introduction; Framework design (fixed-point class; fixed-point application; optimizer); Speech system implementation (model training; simulating the isolated word recognition system; hardware cost model; cost function; fraction size optimization; one-dimensional optimization); Summary.
    Chapter 8, Results: Model training; Simplex method optimization (simulation platform; system level optimization; LPC processor optimization; one-dimensional optimization); Speeding up the optimization convergence; Optimization criteria; Summary.
    Chapter 9, Conclusion: Search space reduction; Speeding up the searching; Optimization criteria; Flexibility of the framework design; Further development.
    Bibliography.
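    Chapter 4's core idea, simulating fixed-point arithmetic by overloading C++ operators while monitoring dynamic range and quantization error, can be sketched compactly. Everything below (class name, members, quantization policy) is illustrative, not the thesis's actual implementation:

        #include <algorithm>
        #include <cmath>
        #include <cstdio>

        // Toy fixed-point simulation class: every result is quantized to a
        // chosen fraction size, while a double "reference" value tracks the
        // unquantized computation so the quantization error can be reported.
        class Fixed {
        public:
            Fixed(double v, int frac_bits) : frac_(frac_bits) { set(v); }

            Fixed operator*(const Fixed &o) const {
                Fixed r(val_ * o.val_, std::min(frac_, o.frac_));
                r.ref_ = ref_ * o.ref_;   // exact (double) reference chain
                return r;
            }
            Fixed operator+(const Fixed &o) const {
                Fixed r(val_ + o.val_, std::min(frac_, o.frac_));
                r.ref_ = ref_ + o.ref_;
                return r;
            }

            double value() const { return val_; }
            double quant_error() const { return std::fabs(val_ - ref_); }

        private:
            void set(double v) {
                double scale = std::ldexp(1.0, frac_);           // 2^frac_
                val_ = std::round(v * scale) / scale;            // quantize
                max_abs_ = std::max(max_abs_, std::fabs(val_));  // monitor range
                ref_ = v;
            }
            int frac_;
            double val_ = 0.0, ref_ = 0.0;
            static double max_abs_;   // observed dynamic range across all values
        };
        double Fixed::max_abs_ = 0.0;

        int main() {
            Fixed a(0.333333, 8), b(1.5, 8);
            Fixed c = a * b;
            std::printf("value=%.6f error=%.6f\n", c.value(), c.quant_error());
        }

    An optimizer such as the thesis's simplex-based search can then treat the per-variable fraction sizes as parameters, rerun the simulation, and trade the reported quantization error against a hardware cost model.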