260 research outputs found

    Power Reductions with Energy Recovery Using Resonant Topologies

    Get PDF
    The problem of power densities in system-on-chips (SoCs) and processors has become more exacerbated recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low skew clock distribution network (CDN), driving large load capacitance. This can consume as much as 70% of the total dynamic power that is lost as heat, needing elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (\u3e40% overall), on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy savings techniques like dynamic voltage and frequency scaling (DVFS). As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR based subsystem for 40% savings in clocking power with 40% driver active area reduction xii is demonstrated. This new resonant driver generates tracking pulses at each transition of clock for dual edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops show the reductions, compared to non-resonant clocking. DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set \u3e3× the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths owing to negative set-up times. Applications in data circuits are shown as well with a 90nm example. Parallel resonant and split-driver non-resonant configurations as well are derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR

    Design and automation of voltage-scaled clock networks

    Get PDF
    In this dissertation, a vital step of VLSI physical design flow, synthesis of clock distribution networks, is investigated. Clock network synthesis (CNS) involves large and complex optimization problems to achieve high performance and low power demands of current integrated circuits (ICs). Ineffectiveness of existing methodologies to provide high performance at lower voltage nodes is the main driver for this dissertation research. A design and automation flow for voltage-scaled clock networks is proposed to satisfy tight timing constraints at high frequency (for high performance) and low voltage (for low power) operation. One implementation of voltage-scaled clock networks is low (voltage) swing clocking, which is a known technique, yet its applicability remains limited to designs with low performance demands. In this dissertation, novel methodologies are introduced to i) apply low swing clocking to legacy designs as a power saving methodology, ii) develop a complete CNS flow for low swing clocking of high performance ICs. These methodologies include slew-driven approaches that are better suited to future transistor and interconnect technologies. Second implementation of voltage-scaled clock networks is multi-voltage clocking, which is another known technique, yet its applicability remains limited to clock tree topology. In this dissertation, multi-voltage clocking with a clock mesh topology is investigated in order to address a missing aspect in the current IC design flows. Practical considerations of the current IC design flows are also investigated in this dissertation to expand the applicability of the proposed CNS flow. A novel methodology is introduced to facilitate clock gating within low swing clocking. The applicability of low swing clocking to FinFET technology, which is currently the industry norm, is shown to be effective.Ph.D., Electrical Engineering -- Drexel University, 201

    Robust Circuit Design for Low-Voltage VLSI.

    Full text link
    Voltage scaling is an effective way to reduce the overall power consumption, but the major challenges in low voltage operations include performance degradation and reliability issues due to PVT variations. This dissertation discusses three key circuit components that are critical in low-voltage VLSI. Level converters must be a reliable interface between two voltage domains, but the reduced on/off-current ratio makes it extremely difficult to achieve robust conversions at low voltages. Two static designs are proposed: LC2 adopts a novel pulsed-operation and modulates its pull-up strength depending on its state. A 3-sigma robustness is guaranteed using a current margin plot; SLC inherently reduces the contention by diode-insertion. Improvements in performance, power, and robustness are measured from 130nm CMOS test chips. SRAM is a major bottleneck in voltage-scaling due to its inherent ratioed-bitcell design. The proposed 7T SRAM alleviates the area overhead incurred by 8T bitcells and provides robust operation down to 0.32V in 180nm CMOS test chips with 3.35fW/bit leakage. Auto-Shut-Off provides a 6.8x READ energy reduction, and its innate Quasi-Static READ has been demonstrated which shows a much improved READ error rate. A use of PMOS Pass-Gate improves the half-select robustness by directly modulating the device strength through bitline voltage. Clocked sequential elements, flip-flops in short, are ubiquitous in today’s digital systems. The proposed S2CFF is static, single-phase, contention-free, and has the same number of devices as in TGFF. It shows a 40% power reduction as well as robust low-voltage operations in fabricated 45nm SOI test chips. Its simple hold-time path and the 3.4x improvement in 3-sigma hold-time is presented. A new on-chip flip-flop testing harness is also proposed, and measured hold-time variations of flip-flops are presented.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/111525/1/yejoong_1.pd

    A Novel Approach For Design Of Pulse Triggered Flip-Flop To Enhance Speed And Power

    Get PDF
    In VLSI Technology, flip-flops contribute a significant portion of chip area and power consumption to overall system design. Pulse triggered flip-flops (P-FF) have single latch and hence simpler in circuit complexity. Use of Explicit type design for P-FF gives the speed advantage. This paper presents various Pulse triggered Flip-flop (P-FF) designs and various techniques to achieve a better design in terms of power consumption and speed. Introduction of simple pass transistor in latch design can be used to speed up data transition. Dual edge triggering can be adopted as it consumes less power as compared to single edge triggering. Also conditional discharge technique can be used to reduce switching activity. The work is done in tanner tool software. DOI: 10.17762/ijritcc2321-8169.15025

    Power Reduction Techniques in Clock Distribution Networks with Emphasis on LC Resonant Clocking

    Get PDF
    In this thesis we propose a set of independent techniques in the overall concept of LC resonant clocking where each technique reduces power consumption and improve system performance. Low-power design is becoming a crucial design objective due to the growing demand on portable applications and the increasing difficulties in cooling and heat removal. The clock distribution network delivers the clock signal which acts as a reference to all sequential elements in the synchronous system. The clock distribution network consumes a considerable amount of power in synchronous digital systems. Resonant clocking is an emerging promising technique to reduce the power of the clock network. The inductor used in resonant clocking enables the conversion of the electric energy stored on the clock capacitance to magnetic energy in the inductor and vice versa. In this thesis, the concept of the slack in the clock skew has been extended for an LC fully-resonant clock distribution network. This extra slack in comparison to standard clock distribution networks can be used to reduce routing complexity, achieve reduction in wire elongation, total wire length, and power consumption. Simulation results illustrate that by utilizing the proposed approach, an average reduction of 53% in the number of wire elongations and 11% reduction in total wire length can be achieved. A dual-edge clocking scheme introduced in the literature to enable the operation of the flip-flop at the rising- and falling edges of the clock has been modified. The interval by which the charging elements in the flip-flop are being switched-on was reduced causing a reduction in power consumption. Simulating the flip-flop in STMicroelectronics 90-nm technology shows correct functionality of the Sense Amplifier flip-flop with a resonant clock signal of 500 MHz and a throughput of 1 GHz under process, voltage, and temperature (PVT) variations. Modeling the resonant system with the proposed flip-flop illustrates that dual-edge compared to single-edge triggering can achieve up to 58% reduction in power consumption when the clock capacitance is the dominating factor. The application of low-swing clocking to LC resonant clock distribution network has been investigated on-chip. The proposed low-swing resonant clocking scheme operates with one voltage supply and does not require an additional supply voltage. The Differential Conditional Capturing flip-flop introduced in the literature was modified to operate with a low-swing sinusoidal clock. Low-swing resonant clocking achieved around 5.8% reduction in total power with 5.7% area overhead. Modeling the clock network with the proposed flip-flop illustrates that low-swing clocking can achieve up to 58% reduction in the power consumption of the resonant clock. An analytical approach was introduced to estimate the required driver strength in the clock generator. Using the proposed approach early in the design stage reduces area and power overhead by eliminating the need for programmable switches in the driving circuit

    Power-efficient high-speed interface circuit techniques

    Get PDF
    Inter- and intra-chip connections have become the new challenge to enable the scaling of computing systems, ranging from mobile devices to high-end servers. Demand for aggregate I/O bandwidth has been driven by applications including high-speed ethernet, backplane micro-servers, memory, graphics, chip-to-chip and network onchip. I/O circuitry is becoming the major power consumer in SoC processors and memories as the increasing bandwidth demands larger per-pin data rate or larger I/O pin count per component. The aggregate I/O bandwidth has approximately doubled every three to four years across a diverse range of standards in different applications. However, in order to keep pace with these standards enabled in part by process-technology scaling, we will require more than just device scaling in the near future. New energy-efficient circuit techniques must be proposed to enable the next generations of handheld and high-performance computers, given the thermal and system-power limits they start facing. ^ In this work, we are proposing circuit architectures that improve energy efficiency without decreasing speed performance for the most power hungry circuits in high speed interfaces. By the introduction of a new kind of logic operators in CMOS, called implication operators, we implemented a new family of high-speed frequency dividers/prescalers with reduced footprint and power consumption. New techniques and circuits for clock distribution, for pre-emphasis and for driver at the transmitter side of the I/O circuitry have been proposed and implemented. At the receiver side, new DFE architecture and CDR have been proposed and have been proven experimentally

    Robust low-power digital circuit design in nano-CMOS technologies

    Get PDF
    Device scaling has resulted in large scale integrated, high performance, low-power, and low cost systems. However the move towards sub-100 nm technology nodes has increased variability in device characteristics due to large process variations. Variability has severe implications on digital circuit design by causing timing uncertainties in combinational circuits, degrading yield and reliability of memory elements, and increasing power density due to slow scaling of supply voltage. Conventional design methods add large pessimistic safety margins to mitigate increased variability, however, they incur large power and performance loss as the combination of worst cases occurs very rarely. In-situ monitoring of timing failures provides an opportunity to dynamically tune safety margins in proportion to on-chip variability that can significantly minimize power and performance losses. We demonstrated by simulations two delay sensor designs to detect timing failures in advance that can be coupled with different compensation techniques such as voltage scaling, body biasing, or frequency scaling to avoid actual timing failures. Our simulation results using 45 nm and 32 nm technology BSIM4 models indicate significant reduction in total power consumption under temperature and statistical variations. Future work involves using dual sensing to avoid useless voltage scaling that incurs a speed loss. SRAM cache is the first victim of increased process variations that requires handcrafted design to meet area, power, and performance requirements. We have proposed novel 6 transistors (6T), 7 transistors (7T), and 8 transistors (8T)-SRAM cells that enable variability tolerant and low-power SRAM cache designs. Increased sense-amplifier offset voltage due to device mismatch arising from high variability increases delay and power consumption of SRAM design. We have proposed two novel design techniques to reduce offset voltage dependent delays providing a high speed low-power SRAM design. Increasing leakage currents in nano-CMOS technologies pose a major challenge to a low-power reliable design. We have investigated novel segmented supply voltage architecture to reduce leakage power of the SRAM caches since they occupy bulk of the total chip area and power. Future work involves developing leakage reduction methods for the combination logic designs including SRAM peripherals

    Architectural & circuit level techniques to improve energy efficiency of high speed serial links

    Get PDF
    High performance computing and communication are two key aspects of all information processing systems. With aggressive scaling of silicon technology enabling integration of a large number of transistors in a small area, managing power and thermal reliability has become very challenging. While lowering the power needed for performing computation has been the prime focus for decades, energy consumed for data transfer has recently become a major bottleneck especially in high performance applications. The focus of this thesis is on improving energy efficiency of communication links by exploring design techniques at both the architectural and circuit levels. In the first part of this work, we propose a time-based equalization scheme to implement transmit de-emphasis in voltage-mode output drivers. Using two-level pulse-width modulation, it overcomes the tradeoff between impedance matching, output swing, and de-emphasis resolution in conventional voltage-mode drivers. A prototype PWM-based 5 \,Gb/s voltage-mode transmitter was implemented in a 90 \,nm CMOS process and characterized across different channels and output swings to demonstrate the effectiveness of proposed techniques. The horizontal/vertical eye openings (BER=10−12\rm 10^{-12}) at the ends of 60 \,inch and 96 \,inch stripline channels are 78 \,mV/0.6 \,UI and 8 \,mV/0.3 \,UI, respectively. This transmitter achieves an energy efficiency of 3.1 \,mW/Gb/s while compensating for 16-28 \,dB channel loss, which compares favorably with the state-of-the-art. In the second part, techniques to improve energy efficiency of a complete transceiver are presented. The transmitter employs a novel partially segmented voltage-mode output driver to lower power consumption in pre-drivers during 2-tap FIR equalization. The receiver implements a low power half-rate clock and data recovery with the proposed ring PLL based multi-phase sampling clock generation in CDR loop and charge-based sampling and deserialization. These techniques are verified using the measured results obtained from a 14Gb/s transceiver prototype. Transmitter achieves an energy efficiency of 0.89 \,mW/Gb/s while securing a 0.36 \,UI sampling time margin with BER=10−12\rm{BER=10^{-12}} at the end of the channel with 11 \,dB loss at Nyquist frequency. The receiver recovers sampling clock with 1.8 \,psrms\rm{ps_{rms}} long term absolute jitter while recovering 14 \,Gb/s data at BER=10−12\rm{BER=10^{-12}}. The receiver achieves an energy efficiency of 1.69 \,mW/Gb/s. Transmitter and receiver share an LC PLL, which achieves 0.605 \,psrms\rm{ps_{rms}} integrated jitter at 7 \,GHz output with an energy efficiency of 0.5 \,mW/GHz. The transceiver as a whole achieves an energy efficiency of 2.8 \,mW/Gb/s

    MAS: Maximum Energy-Aware Sense Amplifier Link for Asynchronous Network on Chip

    Get PDF
    A real-time multiprocessor chip model is also called a Network-on-Chip (NoC), and deals a promising architecture for future systems-on-chips. Even though a lot of Double Tail Sense Amplifiers are used in architectural approach, the existing DTSA with transceiver exhibits a difficulty of consuming more energy than its gouged design during various traffic condition. Novel Low Power pulse Triggered Flip Flop with DTSA is designed in this research to eliminate the difficulty. The Traffic Aware Sense amplifier MAS consists of Sense amplifiers (SA’s), Traffic Generator, and Estimator. Among various SA’S suitable (DTSA and NLPTF -DTSA) SA are selected and information transferred to the receiver. The performance of both DTSA with Transceiver and NLPTF-DTSA with transceiver compared under various traffic conditions. The proposed design (NLPTF-DTSA) is observed on TSMC 90 nm technology, showing 5.92 Gb/s data rate and 0.51 W total link power
    • …
    corecore