200 research outputs found

    IEEE Compliant Double-Precision FPU and 64-bit ALU with Variable Latency Integer Divider

    Get PDF
    Together the arithmetic logic unit (ALU) and floating-point unit (FPU) perform all of the mathematical and logic operations of computer processors. Because they are used so prominently, they fall in the critical path of the central processing unit - often becoming the bottleneck, or limiting factor for performance. As such, the design of a high-speed ALU and FPU is vital to creating a processor capable of performing up to the demanding standards of today\u27s computer users. In this paper, both a 64-bit ALU and a 64-bit FPU are designed based on the reduced instruction set computer architecture. The ALU performs the four basic mathematical operations - addition, subtraction, multiplication and division - in both unsigned and two\u27s complement format, basic logic operations and shifting. The division algorithm is a novel approach, using a comparison multiples based SRT divider to create a variable latency integer divider. The floating-point unit performs the double-precision floating-point operations add, subtract, multiply and divide, in accordance with the IEEE 754 standard for number representation and rounding. The ALU and FPU were implemented in VHDL, simulated in ModelSim, and constrained and synthesized using Synopsys Design Compiler (2006.06). They were synthesized using TSMC 0.1 3nm CMOS technology. The timing, power and area synthesis results were recorded, and, where applicable, compared to those of the corresponding DesignWare components.The ALU synthesis reported an area of 122,215 gates, a power of 384 mW, and a delay of 2.89 ns - a frequency of 346 MHz. The FPU synthesis reported an area 84,440 gates, a delay of 2.82 ns and an operating frequency of 355 MHz. It has a maximum dynamic power of 153.9 mW

    FPGA BASED PARALLEL IMPLEMENTATION OF STACKED ERROR DIFFUSION ALGORITHM

    Get PDF
    Digital halftoning is a crucial technique used in digital printers to convert a continuoustone image into a pattern of black and white dots. Halftoning is used since printers have a limited availability of inks and cannot reproduce all the color intensities in a continuous image. Error Diffusion is an algorithm in halftoning that iteratively quantizes pixels in a neighborhood dependent fashion. This thesis focuses on the development and design of a parallel scalable hardware architecture for high performance implementation of a high quality Stacked Error Diffusion algorithm. The algorithm is described in ‘C’ and requires a significant processing time when implemented on a conventional CPU. Thus, a new hardware processor architecture is developed to implement the algorithm and is implemented to and tested on a Xilinx Virtex 5 FPGA chip. There is an extraordinary decrease in the run time of the algorithm when run on the newly proposed parallel architecture implemented to FPGA technology compared to execution on a single CPU. The new parallel architecture is described using the Verilog Hardware Description Language. Post-synthesis and post-implementation, performance based Hardware Description Language (HDL), simulation validation of the new parallel architecture is achieved via use of the ModelSim CAD simulation tool

    Emerging Design Methodology And Its Implementation Through Rns And Qca

    Get PDF
    Digital logic technology has been changing dramatically from integrated circuits, to a Very Large Scale Integrated circuits (VLSI) and to a nanotechnology logic circuits. Research focused on increasing the speed and reducing the size of the circuit design. Residue Number System (RNS) architecture has ability to support high speed concurrent arithmetic applications. To reduce the size, Quantum-Dot Cellular Automata (QCA) has become one of the new nanotechnology research field and has received a lot of attention within the engineering community due to its small size and ultralow power. In the last decade, residue number system has received increased attention due to its ability to support high speed concurrent arithmetic applications such as Fast Fourier Transform (FFT), image processing and digital filters utilizing the efficiencies of RNS arithmetic in addition and multiplication. In spite of its effectiveness, RNS has remained more an academic challenge and has very little impact in practical applications due to the complexity involved in the conversion process, magnitude comparison, overflow detection, sign detection, parity detection, scaling and division. The advancements in very large scale integration technology and demand for parallelism computation have enabled researchers to consider RNS as an alternative approach to high speed concurrent arithmetic. Novel parallel - prefix structure binary to residue number system conversion method and RNS novel scaling method are presented in this thesis. Quantum-dot cellular automata has become one of the new nanotechnology research field and has received a lot of attention within engineering community due to its extremely small feature size and ultralow power consumption compared to COMS technology. Novel methodology for generating QCA Boolean circuits from multi-output Boolean circuits is presented. Our methodology takes as its input a Boolean circuit, generates simplified XOR-AND equivalent circuit and output an equivalent majority gate circuits. During the past decade, quantum-dot cellular automata showed the ability to implement both combinational and sequential logic devices. Unlike conventional Boolean AND-OR-NOT based circuits, the fundamental logical device in QCA Boolean networks is majority gate. With combining these QCA gates with NOT gates any combinational or sequential logical device can be constructed from QCA cells. We present an implementation of generalized pipeline cellular array using quantum-dot cellular automata cells. The proposed QCA pipeline array can perform all basic operations such as multiplication, division, squaring and square rooting. The different mode of operations are controlled by a single control line

    Multi-operand Decimal Adder Trees for FPGAs

    Get PDF
    The research and development of hardware designs for decimal arithmetic is currently going under an intense activity. For most part, the methods proposed to implement fixed and floating point operations are intended for ASIC designs. Thus, a direct mapping or adaptation of these techniques into a FPGA could be far from an optimal solution. Only a few studies have considered new methods more suitable for FPGA implementations. A basic operation that has not received enough attention in this context is multi-operand BCD addition. For example, it is of interest for low latency implementations of decimal fixed and floating point multipliers and decimal fused multiply-add units. We have explored the most representative proposals for multi-operand BCD addition and found that the resultant implementations in FPGAs are still very inefficient in terms of both area and latency when compared to their binary counterparts. In this paper we present a new method for fast and efficient implementation of multi-operand BCD addition in current FPGA devices. In particular, our proposal maps quite well into the slice structure of the Xilinx Virtex-5/Virtex-6 families and it is highly pipelineable. The synthesis results for a Virtex-6 device indicate that our implementations halve the area and latency of previous proposals, presenting area and delay figures close to those of optimal binary adder trees.La recherche sur l'implantation en matériel de l'arithmétique décimale est actuellement très active, la plupart des travaux portant sur des opérateurs pour les processeurs, en virgule fixe ou flottante. Mais les techniques développées pour un circuit intégré n'aboutissent pas forcément à une implémentation optimale dans un FPGA. Il n'y a que peu d'études ciblant explicitement les FPGA. Cet article s'intéresse dans ce contexte, à l'addition BCD multi-opérande, au cœur de multiplieurs et de multiplieurs-accumulateurs à faible latence. Nous étudions les architectures proposées pour cette opération décimale, et nous observons que, sur FPGA, leur performance (surface et latence) est très inférieure à celle des opérations binaire à précision comparable. Nous présentons donc dans cet article une nouvelle technique d'addition BCD multi-opérandes qui s'avère plus efficace que les propositions précédentes sur les FPGA actuels. Elle s'adapte particulièrement bien à la structure fine des FPGA Xilinx Virtex-5/Virtex-6, et se prête bien au pipeline. Les résultats de synthèse montrent que notre implémentation divise par deux la surface et la latence par rapport aux propositions précédentes, les ramenant à des valeurs comparables à celles des meilleurs additionneurs multi-opérandes binaires

    Analysis and implementation of decimal arithmetic hardware in nanometer CMOS technology

    Get PDF
    Scope and Method of Study: In today's society, decimal arithmetic is growing considerably in importance given its relevance in financial and commercial applications. Decimal calculations on binary hardware significantly impact performance mainly because most systems utilize software to emulate decimal calculations. The introduction of dedicated decimal hardware on the other hand promises the ability to improve performance by two or three orders of magnitude. The founding blocks of binary arithmetic are studied and applied to the development of decimal arithmetic hardware. New findings are contrasted with existent implementations and validated through extensive simulation.Findings and Conclusions: New architectures and a significant study of decimal arithmetic was developed and implemented. The architectures proposed include an IEEE-754 current revision draft compliant floating-point comparator, a study on decimal division, partial product reduction schemes using decimal compressor trees and a final implementation of a decimal multiplier using advanced techniques for partial product generation. The results of each hardware implementation in nanometer technologies are weighed against existent propositions and show improvements upon area, delay, and power

    A Review - Quaternary Signed Digit Number System by Reversible Logic Gate

    Get PDF
    A limitation is applied over the speed of latest computers while performing the arithmetic functions such as subtraction, addition & multiplication have to deal with delay in propagation. The arithmetic operations that are free of carry are attained by implementation of high level radix number system such as QSD. We suggest high speed adders constituted over QSD number system. In QSD, every digit is presented by a number in between -3 to 3. The operations on greater numbers like 64, 128 & addition that is carry free is implemented with a persistent delay & low complicacy. In this document, a reversible logic gate is implemented that is constituted over QSD. The performance of QSD adder can be improvised by invading adder based over logic gate that absorbs low power & delay

    Unified Architecture for Double/Two-Parallel Single Precision Floating Point Adder

    Get PDF
    postprin
    corecore