9 research outputs found

    Evaluating A+B=K conditions in constant time

    Get PDF
    The authors consider a type of condition that can be evaluated without requiring a complete ALU (arithmetic logic unit) operation. The circuit that is presented detects the condition A+B=K (n-bit numbers) in constant time, avoiding the carry propagation delay. This circuit can be used to detect a wide spectrum of conditions in branch instructions. It can improve the processor performance by advancing the evaluation of conditions and eliminating the pipeline delays produced by these operations.Peer ReviewedPostprint (published version

    Design of ALU and Cache Memory for an 8 bit ALU

    Get PDF
    The design of an ALU and a Cache memory for use in a high performance processor was examined in this thesis. Advanced architectures employing increased parallelism were analyzed to minimize the number of execution cycles needed for 8 bit integer arithmetic operations. In addition to the arithmetic unit, an optimized SRAM memory cell was designed to be used as cache memory and as fast Look Up Table. The ALU consists of stand alone units for bit parallel computation of basic integer arithmetic operations. Addition and subtraction were performed using Kogge Stone parallel prefix hardware operating at 330MHz. A high performance multiplier was built using Radix 4 Modified Booth Encoder (MBE) and a Wallace Tree summation array. The multiplier requires single clock cycle for 8 bit integer multiplication and operates at a maximum frequency of 100MHz. Multiplicative division hardware was built for executing both integer division and square root. The division hardware computes 8-bit division and square root in 4 clock cycles. Multiplier forms the basic building block of all these functional units, making high level of resource sharing feasible with this architecture. The optimal operating frequency for the arithmetic unit is 70MHz. A 6T CMOS SRAM cell measuring 90 µm2 was designed using minimum size transistors. The layout allows for horizontal overlap resulting in effective area of 76 µm2 for an 8x8 array. By substituting equivalent bit line capacitance of P4 L1 Cache, the memory was simulated to have a read time of 3.27ns. An optimized set of test vectors were identified to enable high fault coverage without the need for any additional test circuitry. Sixteen test cases were identified that would toggle all the nodes and provide all possible inputs to the sub units of the multiplier. A correlation based semi automatic method was investigated to facilitate test case identification for large multipliers. This method of testability eliminates performance and area overhead associated with conventional testability hardware. Bottom up design methodology was employed for the design. The performance and area metrics are presented along with estimated power consumption. A set of Monte Carlo analysis was carried out to ensure the dependability of the design under process variations as well as fluctuations in operating conditions. The arithmetic unit was found to require a total die area of 2mm2 (approx.) in 0.35 micron process

    Computer arithmetic based on the Continuous Valued Number System

    Get PDF

    An Ultra-Low-Power 75mV 64-Bit Current-Mode Majority-Function Adder

    Get PDF
    Ultra-low-power circuits are becoming more desirable due to growing portable device markets and they are also becoming more interesting and applicable today in biomedical, pharmacy and sensor networking applications because of the nano-metric scaling and CMOS reliability improvements. In this thesis, three main achievements are presented in ultra-low-power adders. First, a new majority function algorithm for carry and the sum generation is presented. Then with this algorithm and implied new architecture, we achieved a circuit with 75mV supply voltage operation. Last but not least, a 64 bit current-mode majority-function adder based on the new architecture and algorithm is successfully tested at 75mV supply voltage. The circuit consumed 4.5nW or 3.8pJ in one of the worst conditions

    High-Performance, Energy-Efficient CMOS Arithmetic Circuits

    Get PDF
    In a modern microprocessor, datapath/arithmetic circuits have always been an important building block in delivering high-performance, energy-efficient computing, because arithmetic operations such as addition and binary number comparison are two of the most commonly used computing instructions. Besides the manufacturing CMOS process, the two most critical design considerations for arithmetic circuits are the logic style and micro-architecture. In this thesis, a constant-delay (CD) logic style is proposed targeting full-custom high-speed applications. The constant delay characteristic of this logic style (regardless of the logic type) makes it suitable for implementing complicated logic expressions such as addition. CD logic exhibits a unique characteristic where the output is pre-evaluated before the inputs from the preceding stage are ready. This feature enables a performance advantage over static and dynamic domino logic styles in a single cycle, multi-stage circuit block. Several design considerations including timing window width adjustment and clock distribution are discussed. Using a 65-nm general-purpose CMOS technology, the proposed logic style demonstrates an average speedup of 94% and 56% over static and dynamic domino logic, respectively, in five different logic gates. Simulation results of 8-bit ripple carry adders conclude that CD logic is 39% and 23% faster than the static and dynamic-based adders, respectively. CD logic also demonstrates 39% speedup and 64% (22%) energy-delay product reduction from static logic at 100% (10%) data activity in 32-bit carry lookahead adders. To confirm CD logic's potential, a 148 ps, single-cycle 64-bit adder with CD logic implemented in the critical path is fabricated in a 65-nm, 1-V CMOS process. A new 64-bit Ling adder micro-architecture, which utilizes both inversion and absorption properties to minimize the number of CD logic and the number of logic stage in the critical path, is also proposed. At 1-V supply, this adder's measured worst-case power and leakage power are 135 mW and 0.22 mW, respectively. A single-cycle 64-bit binary comparator utilizing a radix-2 tree structure is also proposed. This comparator architecture is specifically designed for static logic to achieve both low-power and high-performance operation, especially in low input data activity environments. At 65-nm technology with 25% (10%) data activity, the proposed design demonstrates 2.3x (3.5x) and 3.7x (5.8x) power and energy-delay product efficiency, respectively. This comparator is also 2.7x faster at iso-energy (80 fJ) or 3.3x more energy-efficient at iso-delay (200 ps) than existing designs. An improved comparator, where CD logic is utilized in the critical path to achieve high performance without sacrificing the overall energy efficiency, is also realized in a 65-nm 1-V CMOS process. At 1-V supply, the proposed comparator's measured delay is 167 ps, and has an average power and a leakage power of 2.34 mW and 0.06 mW, respectively. At 0.3-pJ iso-energy or 250-ps iso-delay budget, the proposed comparator with CD logic is 20% faster or 17% more energy-efficient compared to a comparator implemented with just the static logic

    Frequency Synthesis in Wireless and Wireline Systems

    Get PDF
    First, a frequency synthesizer for IEEE 802.15.4 / ZigBee transceiver applications that employs dynamic True Single Phase Clocking (TSPC) circuits in its frequency dividers is presented and through the analysis and measurement results of this synthesizer, the need for low power circuit techniques in frequency dividers is discussed. Next, Differential Cascode Voltage-Switch-Logic (DCVSL) based delay cells are explored for implementing radio-frequency (RF) frequency dividers of low power frequency synthesizers. DCVSL ip- ops offer small input and clock capacitance which makes the power consumption of these circuits and their driving stages, very low. We perform a delay analysis of DCVSL circuits and propose a closed-form delay model that predicts the speed of DCVSL circuits with 8 percent worst case accuracy. The proposed delay model also demonstrates that DCVSL circuits suffer from a large low-to-high propagation delay ( PLH) which limits their speed and results in asymmetrical output waveforms. Our proposed enhanced DCVSL, which we call DCVSL-R, solves this delay bottleneck, reducing PLH and achieving faster operation. We implement two ring-oscillator-based voltage controlled oscillators (VCOs) in 0.13 mu m technology with DCVSL and DCVSL-R delay cells. In measurements, for the same oscillation frequency (2.4GHz) and same phase noise (-113dBc/Hz at 10MHz), DCVSL-R VCO consumes 30 percent less power than the DCVSL VCO. We also use the proposed DCVSL-R circuit to implement the 2.4GHz dual-modulus prescaler of a low power frequency synthesizer in 0.18 mu m technology. In measurements, the synthesizer exhibits -135dBc/Hz phase noise at 10MHz offset and 58 mu m settling time with 8.3mW power consumption, only 1.07mWof which is consumed by the dual modulus prescaler and the buffer that drives it. When compared to other dual modulus prescalers with similar division ratios and operating frequencies in literature, DCVSL-R dual modulus prescaler demonstrates the lowest power consumption. An all digital phase locked loop (ADPLL) that operates for a wide range of frequencies to serve as a multi-protocol compatible PLL for microprocessor and serial link applications, is presented. The proposed ADPLL is truly digital and is implemented in a standard complementary metal-oxide-semiconductor (CMOS) technology without any analog/RF or non-scalable components. It addresses the challenges that come along with continuous wide range of operation such as stability and phase frequency detection for a large frequency error range. A proposed multi-bit bidirectional smart shifter serves as the digitally controlled oscillator (DCO) control and tunes the DCO frequency by turning on/off inverter units in a large row/column matrix that constitute the ring oscillator. The smart shifter block is completely digital, consisting of standard cell logic gates, and is capable of tracking the row/column unit availability of the DCO and shifting multiple bits per single update cycle. This enables fast frequency acquisition times without necessitating dual loop fi lter or gear shifting mechanisms. The proposed ADPLL loop architecture does not employ costly, cumbersome DACs or binary to thermometer converters and minimizes loop filter and DCO control complexity. The wide range ADPLL is implemented in 90nm digital CMOS technology and has a 9-bit TDC, the output of which is processed by a 10-bit digital loop filter and a 5-bit smart shifter. In measurements, the synthesizer achieves 2.5GHz-7.3GHz operation while consuming 10mW/GHz power, with an active area of 0.23 mm2

    Lossy Polynomial Datapath Synthesis

    No full text
    The design of the compute elements of hardware, its datapath, plays a crucial role in determining the speed, area and power consumption of a device. The building blocks of datapath are polynomial in nature. Research into the implementation of adders and multipliers has a long history and developments in this area will continue. Despite such efficient building block implementations, correctly determining the necessary precision of each building block within a design is a challenge. It is typical that standard or uniform precisions are chosen, such as the IEEE floating point precisions. The hardware quality of the datapath is inextricably linked to the precisions of which it is composed. There is, however, another essential element that determines hardware quality, namely that of the accuracy of the components. If one were to implement each of the official IEEE rounding modes, significant differences in hardware quality would be found. But in the same fashion that standard precisions may be unnecessarily chosen, it is typical that components may be constructed to return one of these correctly rounded results, where in fact such accuracy is far from necessary. Unfortunately if a lesser accuracy is permissible then the techniques that exist to reduce hardware implementation cost by exploiting such freedom invariably produce an error with extremely difficult to determine properties. This thesis addresses the problem of how to construct hardware to efficiently implement fixed and floating-point polynomials while exploiting a global error freedom. This is a form of lossy synthesis. The fixed-point contributions include resource minimisation when implementing mutually exclusive polynomials, the construction of minimal lossy components with guaranteed worst case error and a technique for efficient composition of such components. Contributions are also made to how a floating-point polynomial can be implemented with guaranteed relative error.Open Acces

    Space Communications: Theory and Applications. Volume 3: Information Processing and Advanced Techniques. A Bibliography, 1958 - 1963

    Get PDF
    Annotated bibliography on information processing and advanced communication techniques - theory and applications of space communication
    corecore