32 research outputs found

    Single Event Effect Hardening Designs in 65nm CMOS Bulk Technology

    Get PDF
    Radiation from terrestrial and space environments is a great danger to integrated circuits (ICs). A single particle from a radiation environment strikes semiconductor materials resulting in voltage and current perturbation, where errors are induced. This phenomenon is termed a Single Event Effect (SEE). With the shrinking of transistor size, charge sharing between adjacent devices leads to less effectiveness of current radiation hardening methods. Improving fault-tolerance of storage cells and logic gates in advanced technologies becomes urgent and important. A new Single Event Upset (SEU) tolerant latch is proposed based on a previous hardened Quatro design. Soft error analysis tools are used and results show that the critical charge of the proposed design is approximately 2 times higher than that of the reference design with negligible penalty in area, delay, and power consumption. A test chip containing the proposed flip-flop chains was designed and exposed to alpha particles as well as heavy ions. Radiation experimental results indicate that the soft error rates of the proposed design are greatly reduced when Linear Energy Transfer (LET) is lower than 4, which makes it a suitable candidate for ground-level high reliability applications. To improve radiation tolerance of combinational circuits, two combinational logic gates are proposed. One is a layout-based hardening Cascode Voltage Switch Logic (CVSL) and the other is a fault-tolerant differential dynamic logic. Results from a SEE simulation tool indicate that the proposed CVSL has a higher critical charge, less cross section, and shorter Single Event Transient (SET) pulses when compared with reference designs. Simulation results also reveal that the proposed differential dynamic logic significantly reduces the SEU rate compared to traditional dynamic logic, and has a higher critical charge and shorter SET pulses than reference hardened design

    Energy Consumption Evaluation of Flip-Flops for Dynamic Voltage Scaling Systems and Circulating-Temperature Applications

    Get PDF
    CMOS circuit technology has developed with a help of transistor scaling. In past decades, previous studies found that operating environment of digital system affects circuit performance significantly. For example, extensive research has been performed to learn temperature dependency of CMOS circuits and optimize their performance at given temperature. Adaptive voltage scaling (AVS) is one of the important techniques, responses to operating temperature and dynamically change supply voltage of digital circuits to optimize circuit performance. AVS can enable energy optimization for a system experiencing significant temperature variation, including a space satellite. However, inappropriate AVS results in functional failure in an extreme condition, which can cause substantial problem for a critical mission. This work proposes technique to evaluate recently published flip-flops considering their power-delay product (PDP) and reliability for an AVS system operating under circulating temperature. The 8 flip-flops are evaluated under different temperature and supply voltage. Their PDPs are compared assuming that temperature changes linearly. Functional reliability is quantitatively evaluated using corner and Monte-Carlo simulations, and failure mechanisms of flip-flops are discussed as well

    Robust Circuit Design for Low-Voltage VLSI.

    Full text link
    Voltage scaling is an effective way to reduce the overall power consumption, but the major challenges in low voltage operations include performance degradation and reliability issues due to PVT variations. This dissertation discusses three key circuit components that are critical in low-voltage VLSI. Level converters must be a reliable interface between two voltage domains, but the reduced on/off-current ratio makes it extremely difficult to achieve robust conversions at low voltages. Two static designs are proposed: LC2 adopts a novel pulsed-operation and modulates its pull-up strength depending on its state. A 3-sigma robustness is guaranteed using a current margin plot; SLC inherently reduces the contention by diode-insertion. Improvements in performance, power, and robustness are measured from 130nm CMOS test chips. SRAM is a major bottleneck in voltage-scaling due to its inherent ratioed-bitcell design. The proposed 7T SRAM alleviates the area overhead incurred by 8T bitcells and provides robust operation down to 0.32V in 180nm CMOS test chips with 3.35fW/bit leakage. Auto-Shut-Off provides a 6.8x READ energy reduction, and its innate Quasi-Static READ has been demonstrated which shows a much improved READ error rate. A use of PMOS Pass-Gate improves the half-select robustness by directly modulating the device strength through bitline voltage. Clocked sequential elements, flip-flops in short, are ubiquitous in today’s digital systems. The proposed S2CFF is static, single-phase, contention-free, and has the same number of devices as in TGFF. It shows a 40% power reduction as well as robust low-voltage operations in fabricated 45nm SOI test chips. Its simple hold-time path and the 3.4x improvement in 3-sigma hold-time is presented. A new on-chip flip-flop testing harness is also proposed, and measured hold-time variations of flip-flops are presented.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/111525/1/yejoong_1.pd

    A Comparative Analysis for Low-voltage, Low-power, and Low-energy Flip-flops

    Get PDF
    Recently, several flip-flops have been proposed to increase their speed while reducing their power and energy consumption. Flip-flop power is dependent on data activity and in many applications data activity is between 5-15%. In such cases, significantly large clock power and energy is wasted. This thesis explores performance of seven advanced D flip-flops in terms of power, delay, and energy using 65nm CMOS process technology. The main objective of this research is to compare and contrast recent flip-flops under different voltage and data activity conditions, and draw conclusions. Transmission gate flip-flop (TGFF) is used as a reference flip-flop, and based on comparison result TGFF has shown power-performance trade-offs. 18-T single-phase clocked static flip-flops (18TSPC, TSPC18), and Low-power at low data activity flip-flop (LLFF) are the fastest alternatives suitable for higher performance. However, LLFF consumes more power on higher data activities, where as TSPC18 and 18TSPC consume more power on lower data activities. For lower data activities Topologically compressed flip-flop (TCFF) is power efficient amongst all, but poor in performance. Furthermore, post-layout simulation result illustrates that LLFF is the most energy efficient amongst all up to 20% of data activities, and 18TSPC is energy efficient for higher data activities

    Embedding Logic and Non-volatile Devices in CMOS Digital Circuits for Improving Energy Efficiency

    Get PDF
    abstract: Static CMOS logic has remained the dominant design style of digital systems for more than four decades due to its robustness and near zero standby current. Static CMOS logic circuits consist of a network of combinational logic cells and clocked sequential elements, such as latches and flip-flops that are used for sequencing computations over time. The majority of the digital design techniques to reduce power, area, and leakage over the past four decades have focused almost entirely on optimizing the combinational logic. This work explores alternate architectures for the flip-flops for improving the overall circuit performance, power and area. It consists of three main sections. First, is the design of a multi-input configurable flip-flop structure with embedded logic. A conventional D-type flip-flop may be viewed as realizing an identity function, in which the output is simply the value of the input sampled at the clock edge. In contrast, the proposed multi-input flip-flop, named PNAND, can be configured to realize one of a family of Boolean functions called threshold functions. In essence, the PNAND is a circuit implementation of the well-known binary perceptron. Unlike other reconfigurable circuits, a PNAND can be configured by simply changing the assignment of signals to its inputs. Using a standard cell library of such gates, a technology mapping algorithm can be applied to transform a given netlist into one with an optimal mixture of conventional logic gates and threshold gates. This approach was used to fabricate a 32-bit Wallace Tree multiplier and a 32-bit booth multiplier in 65nm LP technology. Simulation and chip measurements show more than 30% improvement in dynamic power and more than 20% reduction in core area. The functional yield of the PNAND reduces with geometry and voltage scaling. The second part of this research investigates the use of two mechanisms to improve the robustness of the PNAND circuit architecture. One is the use of forward and reverse body biases to change the device threshold and the other is the use of RRAM devices for low voltage operation. The third part of this research focused on the design of flip-flops with non-volatile storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJ) are integrated with both conventional D-flipflop and the PNAND circuits to implement non-volatile logic (NVL). These non-volatile storage enhanced flip-flops are able to save the state of system locally when a power interruption occurs. However, manufacturing variations in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading to an overly pessimistic design and consequently, higher energy consumption. A detailed analysis of the design trade-offs in the driver circuitry for performing backup and restore, and a novel method to design the energy optimal driver for a given yield is presented. Efficient designs of two nonvolatile flip-flop (NVFF) circuits are presented, in which the backup time is determined on a per-chip basis, resulting in minimizing the energy wastage and satisfying the yield constraint. To achieve a yield of 98%, the conventional approach would have to expend nearly 5X more energy than the minimum required, whereas the proposed tunable approach expends only 26% more energy than the minimum. A non-volatile threshold gate architecture NV-TLFF are designed with the same backup and restore circuitry in 65nm technology. The embedded logic in NV-TLFF compensates performance overhead of NVL. This leads to the possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and- accumulate (MAC) unit is designed to demonstrate the performance benefits of the proposed architecture. Based on the results of HSPICE simulations, the MAC circuit with the proposed NV-TLFF cells is shown to consume at least 20% less power and area as compared to the circuit designed with conventional DFFs, without sacrificing any performance.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

    Circuit Techniques for Adaptive and Reliable High Performance Computing.

    Full text link
    Increasing power density with process scaling has caused stagnation in the clock speed of modern microprocessors. Accordingly, designers have adopted message passing and shared memory based multicore architectures in order to keep up with the rapidly rising demand for computing throughput. At the same time, applications are not entirely parallel and improving single-thread performance continues to remain critical. Additionally, reliability is also worsening with process scaling, and margining for failures due to process and environmental variations in modern technologies consumes an increasingly large portion of the power/performance envelope. In the wake of multicore computing, reliability of signal synchronization between the cores is also becoming increasingly critical. This forces designers to search for alternate efficient methods to improve compute performance while addressing reliability. Accordingly, this dissertation presents innovative circuit and architectural techniques for variation-tolerance, performance and reliability targeted at datapath logic, signal synchronization and memories. Firstly, a domino logic based design style for datapath logic is presented that uses Adaptive Robustness Tuning (ART) in addition to timing speculation to provide up to 71% performance gains over conventional domino logic in 32bx32b multiplier in 65nm CMOS. Margins are reduced until functionality errors are detected, that are used to guide the tuning. Secondly, for signal synchronization across clock domains, a new class of dynamic logic based synchronizers with single-cycle synchronization latency is presented, where pulses, rather than stable intermediate voltages cause metastability. Such pulses are amplified using skewed inverters to improve mean time between failures by ~1e6x over jamb latches and double flip-flops at 2GHz in 65nm CMOS. Thirdly, a reconfigurable sensing scheme for 6T SRAMs is presented that employs auto-zero calibration and pre-amplification to improve sensing reliability (by up to 1.2 standard deviations of NMOS threshold voltage in 28nm CMOS); this increased reliability is in turn traded for ~42% sensing speedup. Finally, a main memory architecture design methodology to address reliability and power in the context of Exascale computing systems is presented. Based on 3D-stacked DRAMs, the methodology co-optimizes DRAM access energy, refresh power and the increased cost of error resilience, to meet stringent power and reliability constraints.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/107238/1/bharan_1.pd

    Reliable Low-Power High Performance Spintronic Memories

    Get PDF
    Moores Gesetz folgend, ist es der Chipindustrie in den letzten fünf Jahrzehnten gelungen, ein explosionsartiges Wachstum zu erreichen. Dies hatte ebenso einen exponentiellen Anstieg der Nachfrage von Speicherkomponenten zur Folge, was wiederum zu speicherlastigen Chips in den heutigen Computersystemen führt. Allerdings stellen traditionelle on-Chip Speichertech- nologien wie Static Random Access Memories (SRAMs), Dynamic Random Access Memories (DRAMs) und Flip-Flops eine Herausforderung in Bezug auf Skalierbarkeit, Verlustleistung und Zuverlässigkeit dar. Eben jene Herausforderungen und die überwältigende Nachfrage nach höherer Performanz und Integrationsdichte des on-Chip Speichers motivieren Forscher, nach neuen nichtflüchtigen Speichertechnologien zu suchen. Aufkommende spintronische Spe- ichertechnologien wie Spin Orbit Torque (SOT) und Spin Transfer Torque (STT) erhielten in den letzten Jahren eine hohe Aufmerksamkeit, da sie eine Reihe an Vorteilen bieten. Dazu gehören Nichtflüchtigkeit, Skalierbarkeit, hohe Beständigkeit, CMOS Kompatibilität und Unan- fälligkeit gegenüber Soft-Errors. In der Spintronik repräsentiert der Spin eines Elektrons dessen Information. Das Datum wird durch die Höhe des Widerstandes gespeichert, welche sich durch das Anlegen eines polarisierten Stroms an das Speichermedium verändern lässt. Das Prob- lem der statischen Leistung gehen die Speichergeräte sowohl durch deren verlustleistungsfreie Eigenschaft, als auch durch ihr Standard- Aus/Sofort-Ein Verhalten an. Nichtsdestotrotz sind noch andere Probleme, wie die hohe Zugriffslatenz und die Energieaufnahme zu lösen, bevor sie eine verbreitete Anwendung finden können. Um diesen Problemen gerecht zu werden, sind neue Computerparadigmen, -architekturen und -entwurfsphilosophien notwendig. Die hohe Zugriffslatenz der Spintroniktechnologie ist auf eine vergleichsweise lange Schalt- dauer zurückzuführen, welche die von konventionellem SRAM übersteigt. Des Weiteren ist auf Grund des stochastischen Schaltvorgangs der Speicherzelle und des Einflusses der Prozessvari- ation ein nicht zu vernachlässigender Zeitraum dafür erforderlich. In diesem Zeitraum wird ein konstanter Schreibstrom durch die Bitzelle geleitet, um den Schaltvorgang zu gewährleisten. Dieser Vorgang verursacht eine hohe Energieaufnahme. Für die Leseoperation wird gleicher- maßen ein beachtliches Zeitfenster benötigt, ebenfalls bedingt durch den Einfluss der Prozess- variation. Dem gegenüber stehen diverse Zuverlässigkeitsprobleme. Dazu gehören unter An- derem die Leseintereferenz und andere Degenerationspobleme, wie das des Time Dependent Di- electric Breakdowns (TDDB). Diese Zuverlässigkeitsprobleme sind wiederum auf die benötigten längeren Schaltzeiten zurückzuführen, welche in der Folge auch einen über längere Zeit an- liegenden Lese- bzw. Schreibstrom implizieren. Es ist daher notwendig, sowohl die Energie, als auch die Latenz zur Steigerung der Zuverlässigkeit zu reduzieren, um daraus einen potenziellen Kandidaten für ein on-Chip Speichersystem zu machen. In dieser Dissertation werden wir Entwurfsstrategien vorstellen, welche das Ziel verfolgen, die Herausforderungen des Cache-, Register- und Flip-Flop-Entwurfs anzugehen. Dies erre- ichen wir unter Zuhilfenahme eines Cross-Layer Ansatzes. Für Caches entwickelten wir ver- schiedene Ansätze auf Schaltkreisebene, welche sowohl auf der Speicherarchitekturebene, als auch auf der Systemebene in Bezug auf Energieaufnahme, Performanzsteigerung und Zuver- lässigkeitverbesserung evaluiert werden. Wir entwickeln eine Selbstabschalttechnik, sowohl für die Lese-, als auch die Schreiboperation von Caches. Diese ist in der Lage, den Abschluss der entsprechenden Operation dynamisch zu ermitteln. Nachdem der Abschluss erkannt wurde, wird die Lese- bzw. Schreiboperation sofort gestoppt, um Energie zu sparen. Zusätzlich limitiert die Selbstabschalttechnik die Dauer des Stromflusses durch die Speicherzelle, was wiederum das Auftreten von TDDB und Leseinterferenz bei Schreib- bzw. Leseoperationen re- duziert. Zur Verbesserung der Schreiblatenz heben wir den Schreibstrom an der Bitzelle an, um den magnetischen Schaltprozess zu beschleunigen. Um registerbankspezifische Anforderungen zu berücksichtigen, haben wir zusätzlich eine Multiport-Speicherarchitektur entworfen, welche eine einzigartige Eigenschaft der SOT-Zelle ausnutzt, um simultan Lese- und Schreiboperatio- nen auszuführen. Es ist daher möglich Lese/Schreib- Konfilkte auf Bitzellen-Ebene zu lösen, was sich wiederum in einer sehr viel einfacheren Multiport- Registerbankarchitektur nieder- schlägt. Zusätzlich zu den Speicheransätzen haben wir ebenfalls zwei Flip-Flop-Architekturen vorgestellt. Die erste ist eine nichtflüchtige non-Shadow Flip-Flop-Architektur, welche die Speicherzelle als aktive Komponente nutzt. Dies ermöglicht das sofortige An- und Ausschalten der Versorgungss- pannung und ist daher besonders gut für aggressives Powergating geeignet. Alles in Allem zeigt der vorgestellte Flip-Flop-Entwurf eine ähnliche Timing-Charakteristik wie die konventioneller CMOS Flip-Flops auf. Jedoch erlaubt er zur selben Zeit eine signifikante Reduktion der statis- chen Leistungsaufnahme im Vergleich zu nichtflüchtigen Shadow- Flip-Flops. Die zweite ist eine fehlertolerante Flip-Flop-Architektur, welche sich unanfällig gegenüber diversen Defekten und Fehlern verhält. Die Leistungsfähigkeit aller vorgestellten Techniken wird durch ausführliche Simulationen auf Schaltkreisebene verdeutlicht, welche weiter durch detaillierte Evaluationen auf Systemebene untermauert werden. Im Allgemeinen konnten wir verschiedene Techniken en- twickeln, die erhebliche Verbesserungen in Bezug auf Performanz, Energie und Zuverlässigkeit von spintronischen on-Chip Speichern, wie Caches, Register und Flip-Flops erreichen

    Energy-Efficient Neural Network Architectures

    Full text link
    Emerging systems for artificial intelligence (AI) are expected to rely on deep neural networks (DNNs) to achieve high accuracy for a broad variety of applications, including computer vision, robotics, and speech recognition. Due to the rapid growth of network size and depth, however, DNNs typically result in high computational costs and introduce considerable power and performance overheads. Dedicated chip architectures that implement DNNs with high energy efficiency are essential for adding intelligence to interactive edge devices, enabling them to complete increasingly sophisticated tasks by extending battery lie. They are also vital for improving performance in cloud servers that support demanding AI computations. This dissertation focuses on architectures and circuit technologies for designing energy-efficient neural network accelerators. First, a deep-learning processor is presented for achieving ultra-low power operation. Using a heterogeneous architecture that includes a low-power always-on front-end and a selectively-enabled high-performance back-end, the processor dynamically adjusts computational resources at runtime to support conditional execution in neural networks and meet performance targets with increased energy efficiency. Featuring a reconfigurable datapath and a memory architecture optimized for energy efficiency, the processor supports multilevel dynamic activation of neural network segments, performing object detection tasks with 5.3x lower energy consumption in comparison with a static execution baseline. Fabricated in 40nm CMOS, the processor test-chip dissipates 0.23mW at 5.3 fps. It demonstrates energy scalability up to 28.6 TOPS/W and can be configured to run a variety of workloads, including severely power-constrained ones such as always-on monitoring in mobile applications. To further improve the energy efficiency of the proposed heterogeneous architecture, a new charge-recovery logic family, called zero-short-circuit current (ZSCC) logic, is proposed to decrease the power consumption of the always-on front-end. By relying on dedicated circuit topologies and a four-phase clocking scheme, ZSCC operates with significantly reduced short-circuit currents, realizing order-of-magnitude power savings at relatively low clock frequencies (in the order of a few MHz). The efficiency and applicability of ZSCC is demonstrated through an ANSI S1.11 1/3 octave filter bank chip for binaural hearing aids with two microphones per ear. Fabricated in a 65nm CMOS process, this charge-recovery chip consumes 13.8µW with a 1.75MHz clock frequency, achieving 9.7x power reduction per input in comparison with a 40nm monophonic single-input chip that represents the published state of the art. The ability of ZSCC to further increase the energy efficiency of the heterogeneous neural network architecture is demonstrated through the design and evaluation of a ZSCC-based front-end. Simulation results show 17x power reduction compared with a conventional static CMOS implementation of the same architecture.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147614/1/hsiwu_1.pd

    Circuits and Systems Advances in Near Threshold Computing

    Get PDF
    Modern society is witnessing a sea change in ubiquitous computing, in which people have embraced computing systems as an indispensable part of day-to-day existence. Computation, storage, and communication abilities of smartphones, for example, have undergone monumental changes over the past decade. However, global emphasis on creating and sustaining green environments is leading to a rapid and ongoing proliferation of edge computing systems and applications. As a broad spectrum of healthcare, home, and transport applications shift to the edge of the network, near-threshold computing (NTC) is emerging as one of the promising low-power computing platforms. An NTC device sets its supply voltage close to its threshold voltage, dramatically reducing the energy consumption. Despite showing substantial promise in terms of energy efficiency, NTC is yet to see widescale commercial adoption. This is because circuits and systems operating with NTC suffer from several problems, including increased sensitivity to process variation, reliability problems, performance degradation, and security vulnerabilities, to name a few. To realize its potential, we need designs, techniques, and solutions to overcome these challenges associated with NTC circuits and systems. The readers of this book will be able to familiarize themselves with recent advances in electronics systems, focusing on near-threshold computing

    Near-Threshold Computing: Past, Present, and Future.

    Full text link
    Transistor threshold voltages have stagnated in recent years, deviating from constant-voltage scaling theory and directly limiting supply voltage scaling. To overcome the resulting energy and power dissipation barriers, energy efficiency can be improved through aggressive voltage scaling, and there has been increased interest in operating at near-threshold computing (NTC) supply voltages. In this region sizable energy gains are achieved with moderate performance loss, some of which can be regained through parallelism. This thesis first provides a methodical definition of how near to threshold is "near threshold" and continues with an in-depth examination of NTC across past, present, and future CMOS technologies. By systematically defining near-threshold, the trends and tradeoffs are analyzed, lending insight in how best to design and optimize near-threshold systems. NTC works best for technologies that feature good circuit delay scalability, therefore technologies without strong short-channel effects. Early planar technologies (prior to 90nm or so) featured good circuit scalability (8x energy gains), but lacked area in which to add cores for parallelization. Recent planar nodes (32nm – 20nm) feature more area for cores but suffer from poor delay scalability, and so are not well-suited for NTC (4x energy gains). The switch to FinFET CMOS technology allows for a return to strong voltage scalability (8x gain), reversing trends seen in planar technologies, while dark silicon has created an opportunity to add cores for parallelization. Improved FinFET voltage scalability even allows for latency reduction of a single task, as long as the task is sufficiently parallelizable (< 10% serial code). Finally, we will look at a technique for fast voltage boosting, called Shortstop, in which a core's operating voltage is raised in 10s of cycles. Shortstop can be used to quickly respond to single-threaded performance demands of a near-threshold system by leveraging the innate parasitic inductance of a dedicated dirty supply rail, further improving energy efficiency. The technique is demonstrated in a wirebond implementation and is able to boost a core up to 1.8x faster than a header-based approach, while reducing supply droop by 2-7x. An improved flip-chip architecture is also proposed.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113600/1/npfet_1.pd
    corecore