Search CORE

285 research outputs found

Design of ALU and Cache Memory for an 8 bit ALU

Author: Chandran Pravin chander
Publication venue: Clemson University Libraries
Publication date: 01/12/2007
Field of study

The design of an ALU and a Cache memory for use in a high performance processor was examined in this thesis. Advanced architectures employing increased parallelism were analyzed to minimize the number of execution cycles needed for 8 bit integer arithmetic operations. In addition to the arithmetic unit, an optimized SRAM memory cell was designed to be used as cache memory and as fast Look Up Table. The ALU consists of stand alone units for bit parallel computation of basic integer arithmetic operations. Addition and subtraction were performed using Kogge Stone parallel prefix hardware operating at 330MHz. A high performance multiplier was built using Radix 4 Modified Booth Encoder (MBE) and a Wallace Tree summation array. The multiplier requires single clock cycle for 8 bit integer multiplication and operates at a maximum frequency of 100MHz. Multiplicative division hardware was built for executing both integer division and square root. The division hardware computes 8-bit division and square root in 4 clock cycles. Multiplier forms the basic building block of all these functional units, making high level of resource sharing feasible with this architecture. The optimal operating frequency for the arithmetic unit is 70MHz. A 6T CMOS SRAM cell measuring 90 µm2 was designed using minimum size transistors. The layout allows for horizontal overlap resulting in effective area of 76 µm2 for an 8x8 array. By substituting equivalent bit line capacitance of P4 L1 Cache, the memory was simulated to have a read time of 3.27ns. An optimized set of test vectors were identified to enable high fault coverage without the need for any additional test circuitry. Sixteen test cases were identified that would toggle all the nodes and provide all possible inputs to the sub units of the multiplier. A correlation based semi automatic method was investigated to facilitate test case identification for large multipliers. This method of testability eliminates performance and area overhead associated with conventional testability hardware. Bottom up design methodology was employed for the design. The performance and area metrics are presented along with estimated power consumption. A set of Monte Carlo analysis was carried out to ensure the dependability of the design under process variations as well as fluctuations in operating conditions. The arithmetic unit was found to require a total die area of 2mm2 (approx.) in 0.35 micron process

Clemson University: TigerPrints

Energy Aware Design and Analysis for Synchronous and Asynchronous Circuits

Author: Di Jia
Publication venue: University of Central Florida
Publication date: 01/01/2004
Field of study

Power dissipation has become a major concern for IC designers. Various low power design techniques have been developed for synchronous circuits. Asynchronous circuits, however. have gained more interests recently due to their benefits in lower noise, easy timing control, etc. But few publications on energy reduction techniques for asynchronous logic are available. Power awareness indicates the ability of the system power to scale with changing conditions and quality requirements. Scalability is an important figure-of-merit since it allows the end user to implement operational policy. just like the user of mobile multimedia equipment needs to select between better quality and longer battery operation time. This dissertation discusses power/energy optimization and performs analysis on both synchronous and asynchronous logic. The major contributions of this dissertation include: 1 ) A 2-Dimensional Pipeline Gating technique for synchronous pipelined circuits to improve their power awareness has been proposed. This technique gates the corresponding clock lines connected to registers in both vertical direction (the data flow direction) and horizontal direction (registers within each pipeline stage) based on current input precision. 2) Two energy reduction techniques, Signal Bypassing & Insertion and Zero Insertion. have been developed for NCL circuits. Both techniques use Nulls to replace redundant Data 0\u27s based on current input precision in order to reduce the switching activity while Signal Bypassing & Insertion is for non-pipelined NCI, circuits and Zero Insertion is for pipelined counterparts. A dynamic active-bit detection scheme is also developed as an expansion. 3) Two energy estimation techniques, Equivalent Inverter Modeling based on Input Mapping in transistor-level and Switching Activity Modeling in gate-level, have been proposed. The former one is for CMOS gates with feedbacks and the latter one is for NCL circuits

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

A Modified Architecture of Multiplier and Accumulator Using Spurious Power Suppression Technique

Author: Mohanapriya R.
Rajesh K.
Publication venue: 'GIAP Journals'
Publication date: 27/09/2015
Field of study

High speed and low power Multiplier and Accumulator (MAC) unit is at most requirement of today’s VLSI systems and digital signal processing (DSP) applications like FFT, Finite Impulse response filters, convolution etc. In this modified architecture, Radix-4 Modified Booth Encoding (MBE) is used to produce the partial products. In this multiplication and accumulation has been combined using a hybrid type of Carry Save Adder (CSA). So the performance will be improved. A Carry Look ahead Adder is inserted in the CSA tree to reduce the number of bits in the final adder. In booth multiplication, when two numbers are multiplied some portion of the data may be zero. By neglecting those data, power has been reduced. For this purpose Spurious Power Suppression Technique (SPST) is used to remove useless portion of the data in addition process. In this modified architecture, the overall process is three stages to produce the result. The modified MAC operation is coded with Verilog and simulated using Xilinx 12.1

Gyandhara International Academic Publication (GIAP): Journals

MULTIPAC, a multiple pool processor and computer for a spacecraft central data system, phase 2 Final report

Author: Baker T. E.
Cummings G. A.
South R. L.
Publication venue
Publication date
Field of study

MULTIPAC, multiple pool processor and computer for deep space probe central data syste

NASA Technical Reports Server

The KT-Pilot Computer

Author: HAGIWARA Hiroshi
OKUGAWA Syunji
Publication venue: 京都大学工学部
Publication date: 26/03/1966
Field of study

This paper deals with the outline, design features and performance of the KT-Pilot, which is a binary, parallel, high-speed digital computer developed for research into the technical problems encountered in high-speed computers. This machine, although a small-sized experimental one, has many unique features : a microprogram control system utilizing a newly developed phototransistor fixed or read-only memory, asynchronous operations adopted to speed up the machine, a continuous-sheet Supermalloy thin film memory first put into practical use as the computer memory in this country, high-speed logic circuits employing fast mesa transistors and both mesa and gold-bonded diodes as circuit components, and so on. Two types of core memories, i.e. current-coincident type and word-organized type, constitute the main memory together with the above-mentioned thin film memory. A photo-tape reader and a Flexowriter are used respectively as the input and output devices. This machine has been operating stably at our laboratory, and its performance is very satisfactory

Kyoto University Research Information Repository

Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems

Author: Ngo Hau Trung
Publication venue: ODU Digital Commons
Publication date: 01/01/2007
Field of study

The increasing capacity and capabilities of FPGA devices in recent years provide an attractive option for performance-hungry applications in the image and video processing domain. FPGA devices are often used as implementation platforms for image and video processing algorithms for real-time applications due to their programmable structure that can exploit inherent spatial and temporal parallelism. While performance and area remain as two main design criteria, power consumption has become an important design goal especially for mobile devices. Reduction in power consumption can be achieved by reducing the supply voltage, capacitances, clock frequency and switching activities in a circuit. Switching activities can be reduced by architectural optimization of the processing cores such as adders, multipliers, multiply and accumulators (MACS), etc. This dissertation research focuses on reducing the switching activities in digital circuits by considering data dependencies in bit level, word level and block level neighborhoods in a video frame. The bit level data neighborhood dependency consideration for power reduction is illustrated in the design of pipelined array, Booth and log-based multipliers. For an array multiplier, operands of the multipliers are partitioned into higher and lower parts so that the probability of the higher order parts being zero or one increases. The gating technique for the pipelined approach deactivates part(s) of the multiplier when the above special values are detected. For the Booth multiplier, the partitioning and gating technique is integrated into the Booth recoding scheme. In addition, a delay correction strategy is developed for the Booth multiplier to reduce the switching activities of the sign extension part in the partial products. A novel architecture design for the computation of log and inverse-log functions for the reduction of power consumption in arithmetic circuits is also presented. This also utilizes the proposed partitioning and gating technique for further dynamic power reduction by reducing the switching activities. The word level and block level data dependencies for reducing the dynamic power consumption are illustrated by presenting the design of a 2-D convolution architecture. Here the similarities of the neighboring pixels in window-based operations of image and video processing algorithms are considered for reduced switching activities. A partitioning and detection mechanism is developed to deactivate the parallel architecture for window-based operations if higher order parts of the pixel values are the same. A neighborhood dependent approach (NDA) is incorporated with different window buffering schemes. Consideration of the symmetry property in filter kernels is also applied with the NDA method for further reduction of switching activities. The proposed design methodologies are implemented and evaluated in a FPGA environment. It is observed that the dynamic power consumption in FPGA-based circuit implementations is significantly reduced in bit level, data level and block level architectures when compared to state-of-the-art design techniques. A specific application for the design of a real-time video processing system incorporating the proposed design methodologies for low power consumption is also presented. An image enhancement application is considered and the proposed partitioning and gating, and NDA methods are utilized in the design of the enhancement system. Experimental results show that the proposed multi-level power aware methodology achieves considerable power reduction. Research work is progressing In utilizing the data dependencies in subsequent frames in a video stream for the reduction of circuit switching activities and thereby the dynamic power consumption

Old Dominion University

An Ultra-Low-Power 75mV 64-Bit Current-Mode Majority-Function Adder

Author: Ebrahimi Manuchehr
Publication venue: 'University of Waterloo'
Publication date: 18/05/2012
Field of study

Ultra-low-power circuits are becoming more desirable due to growing portable device markets and they are also becoming more interesting and applicable today in biomedical, pharmacy and sensor networking applications because of the nano-metric scaling and CMOS reliability improvements. In this thesis, three main achievements are presented in ultra-low-power adders. First, a new majority function algorithm for carry and the sum generation is presented. Then with this algorithm and implied new architecture, we achieved a circuit with 75mV supply voltage operation. Last but not least, a 64 bit current-mode majority-function adder based on the new architecture and algorithm is successfully tested at 75mV supply voltage. The circuit consumed 4.5nW or 3.8pJ in one of the worst conditions

University of Waterloo's Institutional Repository

ANALYSIS OF MOS CURRENT MODE LOGIC (MCML) AND IMPLEMENTATION OF MCML STANDARD CELL LIBRARY FOR LOW-NOISE DIGITAL CIRCUIT DESIGN

Author: Heim Marcus Edwin Allan
Publication venue: DigitalCommons@CalPoly
Publication date: 01/06/2015
Field of study

MOS current mode logic (MCML) offers low noise digital circuits that reduce noise that can cripple analog components in mixed-signal integrated circuits, when compared to CMOS digital circuits. An MCML standard cell library was developed for the Cadence Virtuoso Integrated Circuit (IC) design software that gives IC designers the ability to design complex, low noise digital circuits for use in mixed-signal and noise sensitive systems at a high level of abstraction, allowing them to get superior products to market faster than competitors. The MCML standard cell library developed and presented here allows for fast development of mixed signal circuits by providing quiet digital building block gates that reduce the simultaneous switching noise (SSN) by an order of magnitude over conventional CMOS based designs [3]. This thesis project developed the following digital gates in MCML as a standard cell library for general-purpose low noise and very low noise applications: inverter, buffer, NAND, AND, NOR, OR, XOR, NXOR, 2:1 MUX, CMOS to MCML, MCML to CMOS, and double edge triggered flip-flop (DETFF)

DigitalCommons@CalPoly