147 research outputs found

    IEEE Compliant Double-Precision FPU and 64-bit ALU with Variable Latency Integer Divider

    Together the arithmetic logic unit (ALU) and floating-point unit (FPU) perform all of the mathematical and logic operations of computer processors. Because they are used so prominently, they fall in the critical path of the central processing unit, often becoming the bottleneck, or limiting factor, for performance. As such, the design of a high-speed ALU and FPU is vital to creating a processor capable of performing up to the demanding standards of today's computer users. In this paper, both a 64-bit ALU and a 64-bit FPU are designed based on the reduced instruction set computer architecture. The ALU performs the four basic mathematical operations (addition, subtraction, multiplication and division) in both unsigned and two's complement format, as well as basic logic operations and shifting. The division algorithm is a novel approach, using a comparison multiples based SRT divider to create a variable latency integer divider. The floating-point unit performs the double-precision floating-point operations add, subtract, multiply and divide, in accordance with the IEEE 754 standard for number representation and rounding. The ALU and FPU were implemented in VHDL, simulated in ModelSim, and constrained and synthesized using Synopsys Design Compiler (2006.06). They were synthesized using TSMC 0.13 µm CMOS technology. The timing, power and area synthesis results were recorded, and, where applicable, compared to those of the corresponding DesignWare components. The ALU synthesis reported an area of 122,215 gates, a power of 384 mW, and a delay of 2.89 ns, corresponding to a frequency of 346 MHz. The FPU synthesis reported an area of 84,440 gates, a delay of 2.82 ns and an operating frequency of 355 MHz. It has a maximum dynamic power of 153.9 mW.
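
    As a small illustration of the IEEE 754 double-precision number representation mentioned in the abstract above, the following Python sketch splits a 64-bit double into its sign, exponent and fraction fields. It is not taken from the paper, whose design is in VHDL; the function name `decompose_double` is a hypothetical helper chosen for this example.

```python
import struct


def decompose_double(x: float) -> tuple[int, int, int]:
    """Split an IEEE 754 double into (sign, biased exponent, fraction).

    Layout: 1 sign bit, 11 exponent bits (bias 1023), 52 fraction bits.
    Hypothetical helper for illustration only.
    """
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction


if __name__ == "__main__":
    for value in (1.0, -2.0, 0.5):
        print(value, decompose_double(value))
```

    A hardware FPU unpacks operands into these same three fields before aligning, operating on and rounding the significands.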

    The Design of a Processing Element for the Systolic Array Implementation of a Kalman Filter

    The Kalman filter is an important component of optimal estimation theory. It has applications in a wide range of high performance control systems including navigational, fire control, and targeting systems. The Kalman filter, however, has not been utilized to its full potential due to the limitations of its inherent computational intensiveness which requires off-line processing or allows only low bandwidth real-time applications. The recent advances in VLSI circuit technology have created the opportunity to design algorithms and data structures for direct implementation in integrated circuits. A systolic architecture is a concept which allows the construction of massively parallel systems in integrated circuits and has been utilized as a means of achieving high data rates. A systolic system consists of a set of interconnected processing elements, each capable of performing some simple operation. The design of a processing element in an orthogonal systolic architecture will be investigated using the state of the art in VLSI technology. The goal is to create a high speed, high precision processing element which is adaptive to a highly configurable systolic architecture. In order to achieve the necessary high computational throughput, the arithmetic unit of the processing element will be implemented using the Logarithmic Number System. The Systolic architecture approach will be used in an attempt to implement a Kalman filtering system with both a high sampling rate and a small package size. The design of such a Kalman filter would enable this filtering technology to be applied to the areas of process control, computer vision, and robotics
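
    The abstract above relies on the Logarithmic Number System (LNS) for throughput. A minimal Python sketch of the general idea follows, assuming a value is stored as a sign and the base-2 logarithm of its magnitude, so that multiplication and division reduce to addition and subtraction of exponents. The function names are hypothetical and the sketch uses floating-point logarithms rather than the fixed-point format a hardware processing element would use.

```python
import math

# An LNS value is modelled here as (sign, log2 of magnitude); zero is not representable.
LnsValue = tuple[int, float]


def to_lns(x: float) -> LnsValue:
    """Convert a nonzero real number to the (sign, log2|x|) form."""
    if x == 0:
        raise ValueError("zero needs special handling in an LNS")
    return (1 if x > 0 else -1, math.log2(abs(x)))


def from_lns(v: LnsValue) -> float:
    """Convert back to an ordinary float."""
    sign, exponent = v
    return sign * 2.0 ** exponent


def lns_mul(a: LnsValue, b: LnsValue) -> LnsValue:
    """Multiplication becomes an addition of the logarithms."""
    return (a[0] * b[0], a[1] + b[1])


def lns_div(a: LnsValue, b: LnsValue) -> LnsValue:
    """Division becomes a subtraction of the logarithms."""
    return (a[0] * b[0], a[1] - b[1])


if __name__ == "__main__":
    print(from_lns(lns_mul(to_lns(3.0), to_lns(-4.0))))
    print(from_lns(lns_div(to_lns(10.0), to_lns(4.0))))
```

    Addition and subtraction are the costly operations in an LNS and are not shown here.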

    A Low Power CMOS Comparator Using Logic Shut-down Technique

    © ASEE 2009. Low power VLSI has become an active research area due to the rapid increase in energy cost and the wide application of mobile electronics. Various techniques can be used to reduce the power consumption of VLSI circuits. In this paper, a novel low-power 32-bit comparator using pass transistor logic and a logic shut-down technique is proposed. The comparator first compares the higher bits of the input patterns. Whenever a decision can be made, the comparison logic for the lower bits is shut down to save power. The lower bits are compared only when a decision cannot be made from the higher bits. In this way, unnecessary comparisons are avoided and the power savings can be maximized. Pass transistor logic is also utilized in the comparator design to reduce the transistor count, so that the power consumption is further reduced compared to CMOS logic. Other comparator designs are also evaluated for comparison. The schematic of the proposed comparator is designed in PSPICE. The netlists are extracted and fed to PSPICE for power analysis. An auxiliary power measurement circuit is introduced to measure the power consumption of the circuits. Simulation results show that the pass-transistor and logic shut-down techniques significantly reduce both the transistor count and the power consumption; furthermore, shortened signal paths are introduced to improve delay.
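
    The shut-down idea in the abstract above can be modelled in software as a comparison that starts at the most significant bit and stops as soon as a decision is possible. The Python sketch below is only a behavioural illustration under that assumption, not the paper's pass-transistor circuit; `compare_msb_first` is a hypothetical name, and the count of examined bits stands in loosely for the logic that stays active.

```python
def compare_msb_first(a: int, b: int, width: int = 32) -> tuple[int, int]:
    """Compare two unsigned integers from the most significant bit down.

    Returns (result, bits_examined): result is 1 if a > b, -1 if a < b,
    0 if equal. Hypothetical behavioural model for illustration.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    for examined, pos in enumerate(range(width - 1, -1, -1), start=1):
        bit_a = (a >> pos) & 1
        bit_b = (b >> pos) & 1
        if bit_a != bit_b:
            # Decision made: the lower bits never need to be evaluated.
            return (1 if bit_a else -1), examined
    # All bits matched, so every position had to be examined.
    return 0, width


if __name__ == "__main__":
    print(compare_msb_first(0x80000000, 0x7FFFFFFF))  # differs at the top bit
    print(compare_msb_first(5, 5))                    # equal inputs
```

    Inputs that differ in a high bit are resolved after very few positions, while equal inputs are the worst case because every bit must be checked.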

    Software floating-point computation on parallel machines

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 71). By Michael Ruogu Zhang. M.Eng.

    Comparison of Binary and Multi-Level Logic Electronics for Embedded Systems

    Embedded systems are dependent on low-power, miniaturized instrumentation. Comparator circuits are common elements in applications for digital threshold detection. A multi-level, memory-based logic approach is in development that offers potential benefits in power usage and size with respect to traditional binary logic systems. Basic 4-bit operations with CMOS gates and comparators are chosen to compare circuit implementations of binary structures and quaternary equivalents. Circuit layouts and functional operation are presented. In particular, power characteristics and transistor count are examined. The potential for improved embedded systems based on the multilevel, memory-based logic is discussed

    A polymorphic reconfigurable emulator for parallel simulation

    Microprocessor and arithmetic support chip technology was applied to the design of a reconfigurable emulator for real time flight simulation. The system developed consists of a master control system to perform all man machine interactions and to configure the hardware to emulate a given aircraft, and numerous slave compute modules (SCMs) which comprise the parallel computational units. It is shown that all parts of the state equations can be worked on simultaneously but that the algebraic equations cannot (unless they are slowly varying). Attempts to obtain algorithms that will allow parallel updates are reported. The word length and step size to be used in the SCMs are determined and the architecture of the hardware and software is described.

    The implementation and applications of multiple-valued logic

    Multiple-Valued Logic (MVL) takes two major forms. Multiple-valued circuits can implement the logic directly by using multiple-valued signals, or the logic can be implemented indirectly with binary circuits, by using more than one binary signal to represent a single multiple-valued signal. Techniques such as carry-save addition can be viewed as indirectly implemented MVL. Both direct and indirect techniques have been shown in the past to provide advantages over conventional arithmetic and logic techniques in algorithms required widely in computing for applications such as image and signal processing. It is possible to implement basic MVL building blocks at the transistor level. However, these circuits are difficult to design due to their non-binary nature. In the design stage they are more like analogue circuits than binary circuits. Current integrated circuit technologies are biased towards binary circuitry. However, in spite of this, there is potential for power and area savings from MVL circuits, especially in technologies such as BiCMOS. This thesis shows that the use of voltage mode MVL will, in general, not provide bandwidth increases on circuit buses because the buses become slower as the number of signal levels increases. Current mode MVL circuits, however, do have potential to reduce power and area requirements of arithmetic circuitry. The design of transistor level circuits is investigated in terms of a modern production technology. A novel methodology for the design of current mode MVL circuits is developed. The methodology is based upon the novel concept of the use of non-linear current encoding of signals, providing the opportunity for the efficient design of many previously unimplemented circuits in current mode MVL. This methodology is used to design a useful set of basic MVL building blocks, and fabrication results are reported. The creation of libraries of MVL circuits is also discussed.
The CORDIC algorithm for two dimensional vector rotation is examined in detail as an example for indirect MVL implementation. The algorithm is extended to a set of three dimensional vector rotators using conventional arithmetic, redundant radix four arithmetic, and Taylor's series expansions. These algorithms can be used for two dimensional vector rotations in which no scale factor corrections are needed. The new algorithms are compared in terms of basic VLSI criteria against previously reported algorithms. A pipelined version of the redundant arithmetic algorithm is floorplanned and partially laid out to give indications of wiring overheads, and layout densities. An indirectly implemented MVL algorithm such as the CORDIC algorithm described in this thesis would clearly benefit from direct implementation in MVL
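
    For readers unfamiliar with CORDIC, the following Python sketch shows the conventional rotation-mode algorithm for two dimensional vector rotation, assuming floating-point arithmetic and an explicit scale factor correction. It is the textbook form, not the thesis's redundant radix four or three dimensional variants, and the function name `cordic_rotate` is hypothetical.

```python
import math


def cordic_rotate(x: float, y: float, angle: float, iterations: int = 32) -> tuple[float, float]:
    """Rotate (x, y) by `angle` radians using shift-and-add micro-rotations.

    Converges for |angle| up to roughly 1.74 rad. Conventional CORDIC with
    an explicit scale factor correction; illustration only.
    """
    # Elementary rotation angles atan(2^-i), normally held in a small table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
    # k is the reciprocal of the accumulated gain.
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated through
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # In hardware the multiplications by 2^-i are simple shifts.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x * k, y * k


if __name__ == "__main__":
    cx, cy = cordic_rotate(1.0, 0.0, math.pi / 6)
    print(cx, cy, math.cos(math.pi / 6), math.sin(math.pi / 6))
```

    The final multiplication by `k` is the scale factor correction that, according to the abstract, the thesis's extended algorithms make unnecessary.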

    THE DESIGN OF AN IC HALF PRECISION FLOATING POINT ARITHMETIC LOGIC UNIT

    A 16 bit floating point (FP) Arithmetic Logic Unit (ALU) was designed and implemented in 0.35µm CMOS technology. Typical uses of the 16 bit FP ALU include graphics processors and embedded multimedia applications. The ALUs of modern microprocessors use a fused multiply add (FMA) design technique. An advantage of the FMA is that it removes the need for a comparator, which is required in a normal FP adder. The FMA consists of a multiplier, shifters, adders and a rounding circuit. A fast multiplier based on the Wallace tree configuration was designed. The number of partial products was greatly reduced by the use of the modified Booth encoder. The Wallace tree was chosen to reduce the number of reduction layers of partial products. The multiplier also involved the design of a pass transistor based 4:2 compressor. The average delay of the pass transistor based compressor was 55ps and was found to be 7 times faster than the full adder based 4:2 compressor. The shifters consist of separate left and right shifters using multiplexers. The shift amount is calculated using the exponents of the three operands. The addition operation is implemented using a carry skip adder (CSK). The average delay of the CSK was 1.05ns and was slower than the carry look ahead adder by about 400ps. The advantages of the CSK are reduced power, gate count and area when compared to a similarly sized carry look ahead adder. The adder computes the sum of the multiplier result and the shifted value of the addend. In most modern computers, division is performed in software, thereby eliminating the need for a separate hardware unit. Here, the FMA hardware unit was utilized to perform FP division. The FP divider uses the Newton-Raphson algorithm to solve division by iteration. The initial approximated value with five bit accuracy was assumed to be pre-stored in cache memory, and a separate clock cycle for cache read was assumed before the start of the FP division operation.
In order to significantly reduce the area of the design, only one multiplier was used. Rounding to nearest technique was implemented using an 11 bit variable CSK adder. This is the best rounding technique when compared to other rounding techniques. In both the FMA and division, rounding was performed after the computation of the final result during the last clock cycle of operation. Testability analysis is performed for the multiplier, which is the most complex and critical part of the FP ALU. The specific aim of testability was to ensure the correct operation of the multiplier and thus guarantee the correctness of the FMA circuit at the layout stage. The multiplier's output was tested by identifying the minimal number of input vectors which toggle the inputs of the 4:2 compressors of the multiplier. The test vectors were identified in a semi automated manner using the Perl scripting language. The multiplier was tested with a test set of thirty one vectors. The fault coverage of the multiplier was found to be 90.09%. The layout was implemented using the IC Station tool from Mentor Graphics and resulted in a chip area of 1.96 mm². The specifications for basic arithmetic operations were met successfully. The FP division operation was completed within six clock cycles. The other arithmetic operations (FMA, FP addition, FP subtraction and FP multiplication) were completed within three clock cycles.
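
    The Newton-Raphson division described in this abstract can be sketched in Python as a reciprocal refinement in which each step has the multiply-then-add shape an FMA unit provides. This is a floating-point illustration under stated assumptions, not the thesis's circuit: the function names, the iteration count of two and the seed value used below are all hypothetical, with the seed chosen to be accurate to about five bits.

```python
def nr_reciprocal(d: float, x0: float, iterations: int = 2) -> float:
    """Refine an initial estimate x0 of 1/d by Newton-Raphson iteration.

    Each pass roughly doubles the number of correct bits. Both updates are
    multiply-add shaped, so each could map onto one FMA operation.
    """
    x = x0
    for _ in range(iterations):
        e = 1.0 - d * x  # remaining relative error: -(d * x) + 1
        x = x + x * e    # corrected estimate: x * e + x
    return x


def nr_divide(n: float, d: float, x0: float, iterations: int = 2) -> float:
    """Compute n / d as n times the refined reciprocal of d."""
    return n * nr_reciprocal(d, x0, iterations)


if __name__ == "__main__":
    # Hypothetical seed: 0.65 approximates 1/1.5 with a relative error of 0.025,
    # which is below 2**-5.
    print(nr_divide(3.0, 1.5, 0.65))
```

    Because the relative error is squared on every pass, a seed good to five bits reaches about ten bits after one pass and about twenty after two, which exceeds the 11 bit significand of a half precision value.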

    Feasibility study for a numerical aerodynamic simulation facility. Volume 2: Hardware specifications/descriptions

    An FMP (Flow Model Processor) was designed for use in the Numerical Aerodynamic Simulation Facility (NASF). The NASF was developed to simulate fluid flow over three-dimensional bodies in wind tunnel environments and in free space. The facility is applicable to studying aerodynamic and aircraft body designs. The following general topics are discussed in this volume: (1) FMP functional computer specifications; (2) FMP instruction specification; (3) standard product system components; (4) loosely coupled network (LCN) specifications/description; and (5) three appendices: performance of trunk allocation contention elimination (trace) method, LCN channel protocol and proposed LCN unified second level protocol

    Applied high resolution digital control for universal precision systems

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2008. Includes bibliographical references (p. 223-225). This thesis describes the design and characterization of a high-resolution analog interface for dSPACE digital control systems and a high-resolution, high-speed data acquisition and control system. These designs are intended to enable higher precision digital control than currently available. The dSPACE system was previously designed within the PMC Lab and includes higher resolution A/D and D/A interfaces than natively available. Characterization on the custom A/D channel demonstrates 20.1 effective bits, or a 121 dB dynamic range, and the custom D/A channel demonstrates 15.1 effective bits, or a 91 dB dynamic range. This compares to 15.7 effective bits on the A/D dSPACE channel and 12.3 effective bits on the D/A dSPACE channel. The increased resolution is attained by higher performance hardware and by oversampling and averaging the A/D channel. The sampling rate is limited to 8 kHz. The high-resolution, high-speed data acquisition and control system can sample two A/D channels at 2.5 MHz and display/save an acquired one second burst. The A/D channel is characterized at 109 dB dynamic range with a grounded input and 96 dB dynamic range, or 0.74 nm RMS over a 50 µm range, with a fixtured capacitive probe. Acquisition at 2.5 MHz and closed-loop control at 625 kHz sampling rate is implemented on a National Instruments FPGA. The A/D circuit was designed and built on a custom printed circuit board around the commercially available AD7760 sigma-delta converter from Analog Devices and includes fully differential ±10 V inputs, a dedicated microcontroller to provide an initialization sequence, and digital galvanic isolation. LabVIEW FPGA code demonstrates arbitrary transfer function control implementation.
The digital platform is applied to a 1-DOF positioner to demonstrate 0.10 nm RMS control over a 10 µm mechanical range when filtered to the 1.5 kHz closed-loop bandwidth, which is limited by the A/D converter architecture propagation delay. By Aaron John Gawlik. S.M.
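
    The effective-bit and dynamic-range figures quoted in this abstract are consistent with the relation DR = 20·log10(2^N), about 6.02 dB per bit. The short Python sketch below checks that arithmetic; the relation is an assumption inferred from the quoted numbers rather than something the abstract states, and `bits_to_dynamic_range_db` is a hypothetical helper name.

```python
import math


def bits_to_dynamic_range_db(effective_bits: float) -> float:
    """Dynamic range in dB of an ideal N-bit span: 20 * log10(2 ** N).

    Assumed relation, chosen because it reproduces the figures in the abstract.
    """
    return 20.0 * effective_bits * math.log10(2.0)


if __name__ == "__main__":
    for bits in (20.1, 15.1):
        print(bits, "effective bits ->", round(bits_to_dynamic_range_db(bits), 1), "dB")
```

    The commonly used sine-wave SINAD relation adds a further 1.76 dB, which would not reproduce the 121 dB and 91 dB values quoted above.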