42 research outputs found
Design of ALU and Cache Memory for an 8 bit ALU
The design of an ALU and a cache memory for use in a high-performance processor was examined in this thesis. Advanced architectures employing increased parallelism were analyzed to minimize the number of execution cycles needed for 8-bit integer arithmetic operations. In addition to the arithmetic unit, an optimized SRAM memory cell was designed to be used as cache memory and as a fast look-up table. The ALU consists of stand-alone units for bit-parallel computation of basic integer arithmetic operations. Addition and subtraction were performed using Kogge-Stone parallel-prefix hardware operating at 330 MHz. A high-performance multiplier was built using a Radix-4 Modified Booth Encoder (MBE) and a Wallace tree summation array. The multiplier requires a single clock cycle for 8-bit integer multiplication and operates at a maximum frequency of 100 MHz. Multiplicative division hardware was built for executing both integer division and square root. The division hardware computes 8-bit division and square root in 4 clock cycles. The multiplier forms the basic building block of all these functional units, making a high level of resource sharing feasible with this architecture. The optimal operating frequency for the arithmetic unit is 70 MHz. A 6T CMOS SRAM cell measuring 90 µm² was designed using minimum-size transistors. The layout allows for horizontal overlap, resulting in an effective area of 76 µm² for an 8x8 array. By substituting the equivalent bit-line capacitance of a P4 L1 cache, the memory was simulated to have a read time of 3.27 ns. An optimized set of test vectors was identified to enable high fault coverage without the need for any additional test circuitry. Sixteen test cases were identified that would toggle all the nodes and provide all possible inputs to the sub-units of the multiplier. A correlation-based, semi-automatic method was investigated to facilitate test case identification for large multipliers.
This method of testability eliminates the performance and area overhead associated with conventional testability hardware. A bottom-up design methodology was employed. The performance and area metrics are presented along with estimated power consumption. A set of Monte Carlo analyses was carried out to ensure the dependability of the design under process variations as well as fluctuations in operating conditions. The arithmetic unit was found to require a total die area of approximately 2 mm² in a 0.35 micron process.
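The Kogge-Stone addition mentioned in this abstract can be illustrated with a small bit-level model. The following is a minimal sketch, not the thesis's hardware: the function name `kogge_stone_add` and the packing of generate/propagate signals into Python integers are assumptions made for illustration, and the model has no carry-in, so it covers addition only.

```python
from typing import Tuple


def kogge_stone_add(a: int, b: int, width: int = 8) -> Tuple[int, int]:
    """Return (sum mod 2**width, carry_out) via a Kogge-Stone prefix network.

    Hypothetical software model: bit i of each integer stands for the
    signal at bit position i of the adder.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    generate = a & b       # bit i creates a carry on its own
    propagate = a ^ b      # bit i passes an incoming carry along
    half_sum = propagate   # kept for the final sum stage

    dist = 1
    while dist < width:
        # Prefix operator applied at every bit position in parallel:
        # (g, p) o (g', p') = (g | (p & g'), p & p'),
        # where (g', p') comes from the position `dist` bits lower.
        generate, propagate = (
            generate | (propagate & (generate << dist) & mask),
            propagate & (propagate << dist) & mask,
        )
        dist *= 2          # span doubles each level: log2(width) levels

    carries = (generate << 1) & mask   # carry into bit i is the group generate of bits 0..i-1
    total = half_sum ^ carries
    carry_out = (generate >> (width - 1)) & 1
    return total, carry_out
```

For an 8-bit width the loop runs three times (distances 1, 2 and 4), which reflects the logarithmic depth that makes this structure attractive for fast adders.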
An asynchrobatic, radix-four, carry look-ahead adder
A low-power, Asynchrobatic (asynchronous, quasi-adiabatic), sixteen-bit, radix-four, parallel-prefix adder circuit is presented. The results show that it is an efficient, low-power design and that, as would be expected with an asynchronous design, its performance is determined by its operating conditions. On a 0.35 µm CMOS process, under "typical" process conditions, operating at an effective frequency of 22 MHz, an addition can be performed using 69 pW, with 48.3 pW used by the control logic and 20.7 pW by the data-path.
Parallel-prefix structures for binary and modulo {2^n - 1, 2^n, 2^n + 1} adders
Adders are among the most essential arithmetic units within digital systems. Parallel-prefix structures are efficient for adders because of their regular topology and logarithmic delay. However, how to build parallel-prefix adders is barely discussed in the literature. This work puts emphasis on how to build prefix trees and on simple algorithms for building these architectures. One particular modification of adders is for use with modulo arithmetic. The most common types of modulo adders are modulo 2^n - 1 and modulo 2^n + 1 adders, because they have a common base that is a power of 2. In order to improve their speed, parallel-prefix structures can also be employed for modulo 2^n ± 1 adders. This dissertation presents the formation of several binary and modulo prefix architectures and their modifications using Ling's algorithm. For all binary and modulo adders, both algorithmic and quantitative analyses are provided to compare the performance of different architectures. Furthermore, to see how the process impacts the design, three technologies, from deep submicron to the nanometer range, are utilized to collect the quantitative data.
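As an illustration of why modulo 2^n - 1 addition maps well onto binary adder hardware, the carry out of the n-bit sum can be fed back in as a carry-in (an end-around carry), since 2^n is congruent to 1 modulo 2^n - 1. The sketch below models only that arithmetic behaviour; the function name `add_mod_2n_minus_1` is hypothetical, and the prefix-tree structure and Ling modifications described in the dissertation are not modelled.

```python
def add_mod_2n_minus_1(a: int, b: int, n: int) -> int:
    """Add two n-bit operands modulo 2**n - 1 using an end-around carry.

    Illustrative model only: operands are assumed to lie in 0 .. 2**n - 1,
    where the all-ones pattern is treated as a second encoding of zero.
    """
    mask = (1 << n) - 1
    s = a + b
    # Fold the carry-out back into the least significant bit.
    s = (s & mask) + (s >> n)
    # The all-ones result is congruent to zero; normalise it.
    return 0 if s == mask else s
```

For example, with n = 4 the modulus is 15, and `add_mod_2n_minus_1(8, 8, 4)` folds the carry of 16 back in to give 1.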
Tradeoffs in parallel prefix adder structures
This report presents the results of research on comparing the structures and qualities of fast parallel prefix adders. The binary adder serves as a fundamental component of many digital arithmetic operations. Many modern microprocessors and ASICs that require high-speed arithmetic logic often implement parallel prefix adders. Modern parallel prefix adder structures are based on previous works, including those of Kogge-Stone, Brent-Kung, Ladner-Fischer, Knowles, et al., and the designs presented in each work have their own merits and tradeoffs that are suitable for certain applications. Previous works have described standard and systematic ways to design and construct functional parallel prefix adder structures. Although the parallel prefix adder has been studied for decades, this work explores the possibility that non-standard and more optimal structures may exist by developing and utilizing a brute-force search algorithm, based on the prefix operator rules and properties, to find all possible parallel prefix adder structures. The parallel prefix adder search algorithm design, search results and study of tradeoffs are discussed in this work. Electrical and Computer Engineering
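The "prefix operator rules and properties" that such a search relies on can be checked exhaustively on single-bit (generate, propagate) pairs. The sketch below is an illustration under assumed names (`prefix_op`, `PAIRS`), not the report's search algorithm: it only confirms that the standard carry operator is associative and idempotent, which is what allows many different tree shapes to compute the same carries, and that it is not commutative.

```python
from itertools import product


def prefix_op(left, right):
    """Combine a more significant (g, p) pair with a less significant one."""
    g_l, p_l = left
    g_r, p_r = right
    return (g_l | (p_l & g_r), p_l & p_r)


# All four possible single-bit (generate, propagate) pairs.
PAIRS = list(product((0, 1), repeat=2))

# Associativity: grouping does not matter, so the tree shape is free to vary.
associative = all(
    prefix_op(prefix_op(x, y), z) == prefix_op(x, prefix_op(y, z))
    for x, y, z in product(PAIRS, repeat=3)
)

# Idempotency: overlapping groups are harmless, as in Kogge-Stone style trees.
idempotent = all(prefix_op(x, x) == x for x in PAIRS)

# Commutativity does not hold: operand order (bit significance) matters.
commutative = all(prefix_op(x, y) == prefix_op(y, x) for x, y in product(PAIRS, repeat=2))
```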
A comparative analysis of parallel prefix adders in 32nm and 45nm static CMOS technology
Binary adders form a major part of various arithmetic and logic units, including multipliers, dividers and digital signal processors. Parallel prefix adders represent a set of efficient structures for binary addition, well suited for VLSI implementation due to their regularity and speed. This report focuses on the comparative analysis of five major types of parallel prefix adder, namely the Kogge-Stone, Knowles, Brent-Kung, Han-Carlson and Ladner-Fischer adders, implemented in Synopsys's SAED 32nm static CMOS technology operating at 1.05V for 8-bit, 16-bit and 32-bit input vectors, based on power, performance and area (PPA) metrics. The process technology is modeled with 9 metal tracks. Power, performance and area metrics based on circuit simulations are used for comparison. The metrics are compared across SAED 32nm and FreePDK 45nm technology to quantify the impact of technology on architecture. Electrical and Computer Engineering
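To give a rough feel for the structural tradeoff between two of the listed architectures at the 8-, 16- and 32-bit widths, the sketch below counts prefix cells and logic levels for the textbook Kogge-Stone and Brent-Kung networks. It is an assumption-laden illustration: the function names are hypothetical, widths are assumed to be powers of two, and the counts are purely topological. They are not the power, performance or area results of the report.

```python
def kogge_stone_cost(n: int):
    """Return (prefix cells, levels) for an n-bit Kogge-Stone network."""
    nodes, depth, dist = 0, 0, 1
    while dist < n:
        nodes += n - dist   # every bit i >= dist combines with bit i - dist
        depth += 1
        dist *= 2
    return nodes, depth


def brent_kung_cost(n: int):
    """Return (prefix cells, levels) for an n-bit Brent-Kung network."""
    nodes, depth = 0, 0
    dist = 1
    while dist < n:                   # up-sweep: binary reduction tree
        nodes += n // (2 * dist)
        depth += 1
        dist *= 2
    dist = n // 4
    while dist >= 1:                  # down-sweep: fill in remaining prefixes
        nodes += n // (2 * dist) - 1
        depth += 1
        dist //= 2
    return nodes, depth


for width in (8, 16, 32):
    print(width, kogge_stone_cost(width), brent_kung_cost(width))
```

The counts show the familiar pattern: Kogge-Stone reaches minimum depth at the cost of many more cells, while Brent-Kung uses far fewer cells but nearly twice as many levels.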
IEEE Compliant Double-Precision FPU and 64-bit ALU with Variable Latency Integer Divider
Together the arithmetic logic unit (ALU) and floating-point unit (FPU) perform all of the mathematical and logic operations of computer processors. Because they are used so prominently, they fall in the critical path of the central processing unit - often becoming the bottleneck, or limiting factor for performance. As such, the design of a high-speed ALU and FPU is vital to creating a processor capable of performing up to the demanding standards of today's computer users.
In this paper, both a 64-bit ALU and a 64-bit FPU are designed based on the reduced instruction set computer architecture. The ALU performs the four basic mathematical operations - addition, subtraction, multiplication and division - in both unsigned and two's complement format, basic logic operations and shifting. The division algorithm is a novel approach, using a comparison multiples based SRT divider to create a variable latency integer divider. The floating-point unit performs the double-precision floating-point operations add, subtract, multiply and divide, in accordance with the IEEE 754 standard for number representation and rounding.
The ALU and FPU were implemented in VHDL, simulated in ModelSim, and constrained and synthesized using Synopsys Design Compiler (2006.06). They were synthesized using TSMC 0.13 µm CMOS technology. The timing, power and area synthesis results were recorded and, where applicable, compared to those of the corresponding DesignWare components. The ALU synthesis reported an area of 122,215 gates, a power of 384 mW, and a delay of 2.89 ns, corresponding to a frequency of 346 MHz. The FPU synthesis reported an area of 84,440 gates, a delay of 2.82 ns and an operating frequency of 355 MHz. It has a maximum dynamic power of 153.9 mW.
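The reported frequencies follow from the reported delays if the critical-path delay is taken as the clock period, that is, f = 1 / t. A minimal check of that arithmetic, assuming a hypothetical helper `max_frequency_mhz`:

```python
def max_frequency_mhz(delay_ns: float) -> float:
    """Convert a critical-path delay in ns to a clock frequency in MHz.

    Assumes the clock period equals the reported delay (f = 1 / t);
    1 / 1 ns = 1000 MHz.
    """
    return 1000.0 / delay_ns


alu_mhz = max_frequency_mhz(2.89)  # ALU delay from the abstract
fpu_mhz = max_frequency_mhz(2.82)  # FPU delay from the abstract
```

Rounded to the nearest megahertz, these give 346 MHz and 355 MHz, matching the figures quoted in the abstract.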
A Novel VLSI Design On CSKA Of Binary Tree Adder With Compaq Area And High Throughput
Addition is one of the most basic operations performed in all computing units, including microprocessors and digital signal processors. It is also a basic unit utilized in various complicated algorithms for multiplication and division. Efficient implementation of an adder circuit usually revolves around reducing the cost of propagating the carry between successive bit positions. Multi-operand adders are important arithmetic design blocks, especially in the addition of partial products in hardware multipliers. Multi-operand adders (MOAs) are widely used in modern low-power and high-speed portable very-large-scale integration systems for image and signal processing applications such as digital filters, transforms, and convolutional neural network architectures. Hence, a new high-speed and area-efficient adder architecture is proposed that uses pre-computed bitwise addition followed by carry prefix computation logic to perform three-operand binary addition; it consumes substantially less area and power and drastically reduces the adder delay. The project is further enhanced with a modified carry bypass adder to further reduce density and latency. The modified carry skip adder introduces simple, low-complexity carry skip logic to reduce these parameters. In the proposed work, the designed binary tree adder (BTA) is analyzed to find possibilities for area minimization. Based on the analysis, the critical carry path is taken into the new logic implementation, and the corresponding CSKP designs are proposed for the BTA with AOI, OAI
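The idea of bitwise addition followed by carry computation for three operands can be sketched as a carry-save step: each bit position acts as a full adder, producing a sum bit and a carry bit, and the two resulting vectors are then added once. The model below is an assumption made for illustration (the name `three_operand_add` is hypothetical); Python's `+` stands in for the carry prefix stage, and the carry skip logic of the proposal is not modelled.

```python
def three_operand_add(a: int, b: int, c: int, width: int = 8) -> int:
    """Add three width-bit operands using one carry-save stage.

    Illustrative model only: the final `+` represents the two-operand
    carry-propagate (prefix) adder of a hardware implementation.
    """
    mask = (1 << width) - 1
    a, b, c = a & mask, b & mask, c & mask

    # Stage 1: independent full adder at every bit position.
    bit_sum = a ^ b ^ c                        # sum bit of each position
    bit_carry = (a & b) | (b & c) | (a & c)    # majority = carry bit

    # Stage 2: a single carry-propagating addition of the two vectors;
    # each carry bit has the weight of the next higher position.
    return bit_sum + (bit_carry << 1)
```

Because the first stage has no carry chain, its delay is independent of the operand width; only the final two-operand addition propagates carries.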