248 research outputs found

    Power Reductions with Energy Recovery Using Resonant Topologies

    Get PDF
    The problem of power densities in system-on-chips (SoCs) and processors has become more exacerbated recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low skew clock distribution network (CDN), driving large load capacitance. This can consume as much as 70% of the total dynamic power that is lost as heat, needing elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (\u3e40% overall), on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy savings techniques like dynamic voltage and frequency scaling (DVFS). As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR based subsystem for 40% savings in clocking power with 40% driver active area reduction xii is demonstrated. This new resonant driver generates tracking pulses at each transition of clock for dual edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops show the reductions, compared to non-resonant clocking. DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set \u3e3× the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths owing to negative set-up times. Applications in data circuits are shown as well with a 90nm example. Parallel resonant and split-driver non-resonant configurations as well are derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR

    A Modified Signal Feed-Through Pulsed Flip-Flop for Low Power Applications

    Get PDF
    In this paper a modified signal feed-through pulsed flip-flop has been presented for low power applications. Signal feed-through flip-flop uses a pass transistor to feed input data directly to the output. Feed through transistor and feedback signals have been modified for delay, static and dynamic power reduction. HSPICE simulation shows 22% reduction in leakage power and 8% of dynamic power. Delay has been reduced by 14% using TSMC 90nm technology parameters. The proposed pulsed flip-flop has the lowest PDP (Power Delay Product) among other pulsed flip-flops discussed

    A novel approach to embedding memory with crosstalk logic circuits for high performance applications

    Get PDF
    Title from PDF of title page viewed January 20, 2022Thesis advisor: Mostafizur RahmanVitaIncludes bibliographical references (page 33-36)Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2021One of the essential elements of computing is the memory element. Flip-flops form an integral part of a System-on-Chip (SoC) and consume most of the area on the die. To meet the high-speed performance demands by data-intensive applications like artificial intelligence, cloud computing, and machine learning, we intend to integrate memory with the logic to get built-in memory logic circuits that operate based on the crosstalk computing logic. These circuits are called Crosstalk Built-in Memory Logic (CBML) circuits which exploit the detrimental interconnect crosstalk and astutely turn this unwanted effect into a computing principle with embedded memory. By virtue of our novel CBML circuit technique, the logic is computed by the novel computing technique called Crosstalk Computing, and the result is stored intrinsically within these complex circuits. The stored values will be retained irrespective of the change in input until the next logic evaluation cycle. This neoteric embedding of memory in logic provides high-speed operation with a reduced number of transistors. In this work, we present how Crosstalk Computing can be leveraged to embed memory in logic. We have manifested the in-built memory feature of the complex CBML circuits using 16 nanometer (nm) PTM models in HSPICE. Benchmarking is performed with the equivalent static CMOS circuits to compare the number of transistors, performance, and power. It is observed that the number of transistors consumed by the CBML logic circuits like 3-input AND, OR, is 40% less than the equivalent CMOS circuits. The cascaded CBML 4-bit Full-Adder (the key element prevalent in Arithmetic circuits, e.g., ALU, Counters, etc.,) is up to 46% less, and performance is improved by 27% than the equivalent CMOS circuits. This circuit serves as an example of a large-scale CBML circuit. Also, the performance improvement achieved by other circuits such as 3-input AND and the CARRY logic is up to 60% along with a 40% reduction in the number of transistors. The qualitative analysis with the existing flip-flop architectures shows that the transistor count reduction of the CBML circuits is around 45% less than the architectures like Semi-dynamic Single-Phase Pulsed FF (SDFF), Dual dynamic node FF (DDFF) which embed logic into their flip-flop circuits. Additionally, these existing architectures do not have capability to embed complex circuits like full adder into a single flip-flop circuit. Hence, CBML circuits have the potential to pave the way for special high-speed macros with specifically engineered structures.Introduction and Motivation -- Basic Flip-Flop Memory -- Literature survey on Existing Flip-Flop Architectures With Embedded Logic -- Overview: Basic Crosstalk Circuits -- Crosstalk Logic With Embedded Memory Feature -- Comparison and Benchmarking -- Conclusion and Future Wor

    High-Speed and Low-Energy On-Chip Communication Circuits.

    Full text link
    Continuous technology scaling sharply reduces transistor delays, while fixed-length global wire delays have increased due to less wiring pitch with higher resistance and coupling capacitance. Due to this ever growing gap, long on-chip interconnects pose well-known latency, bandwidth, and energy challenges to high-performance VLSI systems. Repeaters effectively mitigate wire RC effects but do little to improve their energy costs. Moreover, the increased complexity and high level of integration requires higher wire densities, worsening crosstalk noise and power consumption of conventionally repeated interconnects. Such increasing concerns in global on-chip wires motivate circuits to improve wire performance and energy while reducing the number of repeaters. This work presents circuit techniques and investigation for high-performance and energy-efficient on-chip communication in the aspects of encoding, data compression, self-timed current injection, signal pre-emphasis, low-swing signaling, and technology mapping. The improved bus designs also consider the constraints of robust operation and performance/energy gains across process corners and design space. Measurement results from 5mm links on 65nm and 90nm prototype chips validate 2.5-3X improvement in energy-delay product.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/75800/1/jseo_1.pd

    Low-Power Design of Digital VLSI Circuits around the Point of First Failure

    Get PDF
    As an increase of intelligent and self-powered devices is forecasted for our future everyday life, the implementation of energy-autonomous devices that can wirelessly communicate data from sensors is crucial. Even though techniques such as voltage scaling proved to effectively reduce the energy consumption of digital circuits, additional energy savings are still required for a longer battery life. One of the main limitations of essentially any low-energy technique is the potential degradation of the quality of service (QoS). Thus, a thorough understanding of how circuits behave when operated around the point of first failure (PoFF) is key for the effective application of conventional energy-efficient methods as well as for the development of future low-energy techniques. In this thesis, a variety of circuits, techniques, and tools is described to reduce the energy consumption in digital systems when operated either in the safe and conservative exact region, close to the PoFF, or even inside the inexact region. A straightforward approach to reduce the power consumed by clock distribution while safely operating in the exact region is dual-edge-triggered (DET) clocking. However, the DET approach is rarely taken, primarily due to the perceived complexity of its integration. In this thesis, a fully automated design flow is introduced for applying DET clocking to a conventional single-edge-triggered (SET) design. In addition, the first static true-single-phase-clock DET flip-flop (DET-FF) that completely avoids clock-overlap hazards of DET registers is proposed. Even though the correct timing of synchronous circuits is ensured in worst-case conditions, the critical path might not always be excited. Thus, dynamic clock adjustment (DCA) has been proposed to trim any available dynamic timing margin by changing the operating clock frequency at runtime. This thesis describes a dynamically-adjustable clock generator (DCG) capable of modifying the period of the produced clock signal on a cycle-by-cycle basis that enables the DCA technique. In addition, a timing-monitoring sequential (TMS) that detects input transitions on either one of the clock phases to enable the selection of the best timing-monitoring strategy at runtime is proposed. Energy-quality scaling techniques aimat trading lower energy consumption for a small degradation on the QoS whenever approximations can be tolerated. In this thesis, a low-power methodology for the perturbation of baseline coefficients in reconfigurable finite impulse response (FIR) filters is proposed. The baseline coefficients are optimized to reduce the switching activity of the multipliers in the FIR filter, enabling the possibility of scaling the power consumption of the filter at runtime. The area as well as the leakage power of many system-on-chips is often dominated by embedded memories. Gain-cell embedded DRAM (GC-eDRAM) is a compact, low-power and CMOS-compatible alternative to the conventional static random-access memory (SRAM) when a higher memory density is desired. However, due to GC-eDRAMs relying on many interdependent variables, the adaptation of existing memories and the design of future GCeDRAMs prove to be highly complex tasks. Thus, the first modeling tool that estimates timing, memory availability, bandwidth, and area of GC-eDRAMs for a fast exploration of their design space is proposed in this thesis

    Single event upset hardened embedded domain specific reconfigurable architecture

    Get PDF
    corecore