21 research outputs found

    Design considerations of a nonvolatile accumulator-based 8-bit processor

    Get PDF
    The rise of the Internet of Things (IoT) and theconstant growth of portable electronics have leveraged the con-cern with energy consumption. Nonvolatile memory (NVM)emerged as a solution to mitigate the problem due to its abilityto retain data on sleep mode without a power supply. Non-volatile processors (NVPs) may further improve energy savingby using nonvolatile flip-flops (NVFFs) to store system state,allowing the device to be turned off when idle and resume ex-ecution instantly after power-on. In view of the potential pre-sented by NVPs, this work describes the initial steps to imple-ment a nonvolatile version of Neander, a hypothetical processorcreated for educational purposes. First, we implemented Ne-ander in Register Transfer Level (RTL), separating the com-binational logic from the sequential elements. Then, the lat-ter was replaced by circuit-level descriptions of volatile flip-flops. We then validated this implementation by employinga mixed-signal simulation over a set of benchmarks. Resultshave shown the expected behavior for the whole instructionset. Then, we implemented circuit-level descriptions of mag-netic tunnel junction (MTJ) based nonvolatile flip-flops, usingan open-source MTJ model. These elements were exhaustivelyvalidated using electrical simulations. With these results, weintend to carry on the implementation and fully equip our pro-cessor with nonvolatile features such as instant wake-up

    Reliable Low-Power High Performance Spintronic Memories

    Get PDF
    Moores Gesetz folgend, ist es der Chipindustrie in den letzten fĂŒnf Jahrzehnten gelungen, ein explosionsartiges Wachstum zu erreichen. Dies hatte ebenso einen exponentiellen Anstieg der Nachfrage von Speicherkomponenten zur Folge, was wiederum zu speicherlastigen Chips in den heutigen Computersystemen fĂŒhrt. Allerdings stellen traditionelle on-Chip Speichertech- nologien wie Static Random Access Memories (SRAMs), Dynamic Random Access Memories (DRAMs) und Flip-Flops eine Herausforderung in Bezug auf Skalierbarkeit, Verlustleistung und ZuverlĂ€ssigkeit dar. Eben jene Herausforderungen und die ĂŒberwĂ€ltigende Nachfrage nach höherer Performanz und Integrationsdichte des on-Chip Speichers motivieren Forscher, nach neuen nichtflĂŒchtigen Speichertechnologien zu suchen. Aufkommende spintronische Spe- ichertechnologien wie Spin Orbit Torque (SOT) und Spin Transfer Torque (STT) erhielten in den letzten Jahren eine hohe Aufmerksamkeit, da sie eine Reihe an Vorteilen bieten. Dazu gehören NichtflĂŒchtigkeit, Skalierbarkeit, hohe BestĂ€ndigkeit, CMOS KompatibilitĂ€t und Unan- fĂ€lligkeit gegenĂŒber Soft-Errors. In der Spintronik reprĂ€sentiert der Spin eines Elektrons dessen Information. Das Datum wird durch die Höhe des Widerstandes gespeichert, welche sich durch das Anlegen eines polarisierten Stroms an das Speichermedium verĂ€ndern lĂ€sst. Das Prob- lem der statischen Leistung gehen die SpeichergerĂ€te sowohl durch deren verlustleistungsfreie Eigenschaft, als auch durch ihr Standard- Aus/Sofort-Ein Verhalten an. Nichtsdestotrotz sind noch andere Probleme, wie die hohe Zugriffslatenz und die Energieaufnahme zu lösen, bevor sie eine verbreitete Anwendung finden können. Um diesen Problemen gerecht zu werden, sind neue Computerparadigmen, -architekturen und -entwurfsphilosophien notwendig. Die hohe Zugriffslatenz der Spintroniktechnologie ist auf eine vergleichsweise lange Schalt- dauer zurĂŒckzufĂŒhren, welche die von konventionellem SRAM ĂŒbersteigt. Des Weiteren ist auf Grund des stochastischen Schaltvorgangs der Speicherzelle und des Einflusses der Prozessvari- ation ein nicht zu vernachlĂ€ssigender Zeitraum dafĂŒr erforderlich. In diesem Zeitraum wird ein konstanter Schreibstrom durch die Bitzelle geleitet, um den Schaltvorgang zu gewĂ€hrleisten. Dieser Vorgang verursacht eine hohe Energieaufnahme. FĂŒr die Leseoperation wird gleicher- maßen ein beachtliches Zeitfenster benötigt, ebenfalls bedingt durch den Einfluss der Prozess- variation. Dem gegenĂŒber stehen diverse ZuverlĂ€ssigkeitsprobleme. Dazu gehören unter An- derem die Leseintereferenz und andere Degenerationspobleme, wie das des Time Dependent Di- electric Breakdowns (TDDB). Diese ZuverlĂ€ssigkeitsprobleme sind wiederum auf die benötigten lĂ€ngeren Schaltzeiten zurĂŒckzufĂŒhren, welche in der Folge auch einen ĂŒber lĂ€ngere Zeit an- liegenden Lese- bzw. Schreibstrom implizieren. Es ist daher notwendig, sowohl die Energie, als auch die Latenz zur Steigerung der ZuverlĂ€ssigkeit zu reduzieren, um daraus einen potenziellen Kandidaten fĂŒr ein on-Chip Speichersystem zu machen. In dieser Dissertation werden wir Entwurfsstrategien vorstellen, welche das Ziel verfolgen, die Herausforderungen des Cache-, Register- und Flip-Flop-Entwurfs anzugehen. Dies erre- ichen wir unter Zuhilfenahme eines Cross-Layer Ansatzes. FĂŒr Caches entwickelten wir ver- schiedene AnsĂ€tze auf Schaltkreisebene, welche sowohl auf der Speicherarchitekturebene, als auch auf der Systemebene in Bezug auf Energieaufnahme, Performanzsteigerung und Zuver- lĂ€ssigkeitverbesserung evaluiert werden. Wir entwickeln eine Selbstabschalttechnik, sowohl fĂŒr die Lese-, als auch die Schreiboperation von Caches. Diese ist in der Lage, den Abschluss der entsprechenden Operation dynamisch zu ermitteln. Nachdem der Abschluss erkannt wurde, wird die Lese- bzw. Schreiboperation sofort gestoppt, um Energie zu sparen. ZusĂ€tzlich limitiert die Selbstabschalttechnik die Dauer des Stromflusses durch die Speicherzelle, was wiederum das Auftreten von TDDB und Leseinterferenz bei Schreib- bzw. Leseoperationen re- duziert. Zur Verbesserung der Schreiblatenz heben wir den Schreibstrom an der Bitzelle an, um den magnetischen Schaltprozess zu beschleunigen. Um registerbankspezifische Anforderungen zu berĂŒcksichtigen, haben wir zusĂ€tzlich eine Multiport-Speicherarchitektur entworfen, welche eine einzigartige Eigenschaft der SOT-Zelle ausnutzt, um simultan Lese- und Schreiboperatio- nen auszufĂŒhren. Es ist daher möglich Lese/Schreib- Konfilkte auf Bitzellen-Ebene zu lösen, was sich wiederum in einer sehr viel einfacheren Multiport- Registerbankarchitektur nieder- schlĂ€gt. ZusĂ€tzlich zu den SpeicheransĂ€tzen haben wir ebenfalls zwei Flip-Flop-Architekturen vorgestellt. Die erste ist eine nichtflĂŒchtige non-Shadow Flip-Flop-Architektur, welche die Speicherzelle als aktive Komponente nutzt. Dies ermöglicht das sofortige An- und Ausschalten der Versorgungss- pannung und ist daher besonders gut fĂŒr aggressives Powergating geeignet. Alles in Allem zeigt der vorgestellte Flip-Flop-Entwurf eine Ă€hnliche Timing-Charakteristik wie die konventioneller CMOS Flip-Flops auf. Jedoch erlaubt er zur selben Zeit eine signifikante Reduktion der statis- chen Leistungsaufnahme im Vergleich zu nichtflĂŒchtigen Shadow- Flip-Flops. Die zweite ist eine fehlertolerante Flip-Flop-Architektur, welche sich unanfĂ€llig gegenĂŒber diversen Defekten und Fehlern verhĂ€lt. Die LeistungsfĂ€higkeit aller vorgestellten Techniken wird durch ausfĂŒhrliche Simulationen auf Schaltkreisebene verdeutlicht, welche weiter durch detaillierte Evaluationen auf Systemebene untermauert werden. Im Allgemeinen konnten wir verschiedene Techniken en- twickeln, die erhebliche Verbesserungen in Bezug auf Performanz, Energie und ZuverlĂ€ssigkeit von spintronischen on-Chip Speichern, wie Caches, Register und Flip-Flops erreichen

    Soft-Error Resilience Framework For Reliable and Energy-Efficient CMOS Logic and Spintronic Memory Architectures

    Get PDF
    The revolution in chip manufacturing processes spanning five decades has proliferated high performance and energy-efficient nano-electronic devices across all aspects of daily life. In recent years, CMOS technology scaling has realized billions of transistors within large-scale VLSI chips to elevate performance. However, these advancements have also continually augmented the impact of Single-Event Transient (SET) and Single-Event Upset (SEU) occurrences which precipitate a range of Soft-Error (SE) dependability issues. Consequently, soft-error mitigation techniques have become essential to improve systems\u27 reliability. Herein, first, we proposed optimized soft-error resilience designs to improve robustness of sub-micron computing systems. The proposed approaches were developed to deliver energy-efficiency and tolerate double/multiple errors simultaneously while incurring acceptable speed performance degradation compared to the prior work. Secondly, the impact of Process Variation (PV) at the Near-Threshold Voltage (NTV) region on redundancy-based SE-mitigation approaches for High-Performance Computing (HPC) systems was investigated to highlight the approach that can realize favorable attributes, such as reduced critical datapath delay variation and low speed degradation. Finally, recently, spin-based devices have been widely used to design Non-Volatile (NV) elements such as NV latches and flip-flops, which can be leveraged in normally-off computing architectures for Internet-of-Things (IoT) and energy-harvesting-powered applications. Thus, in the last portion of this dissertation, we design and evaluate for soft-error resilience NV-latching circuits that can achieve intriguing features, such as low energy consumption, high computing performance, and superior soft errors tolerance, i.e., concurrently able to tolerate Multiple Node Upset (MNU), to potentially become a mainstream solution for the aerospace and avionic nanoelectronics. Together, these objectives cooperate to increase energy-efficiency and soft errors mitigation resiliency of larger-scale emerging NV latching circuits within iso-energy constraints. In summary, addressing these reliability concerns is paramount to successful deployment of future reliable and energy-efficient CMOS logic and spintronic memory architectures with deeply-scaled devices operating at low-voltages

    Normally-Off Computing Design Methodology Using Spintronics: From Devices to Architectures

    Get PDF
    Energy-harvesting-powered computing offers intriguing and vast opportunities to dramatically transform the landscape of Internet of Things (IoT) devices and wireless sensor networks by utilizing ambient sources of light, thermal, kinetic, and electromagnetic energy to achieve battery-free computing. In order to operate within the restricted energy capacity and intermittency profile of battery-free operation, it is proposed to innovate Elastic Intermittent Computation (EIC) as a new duty-cycle-variable computing approach leveraging the non-volatility inherent in post-CMOS switching devices. The foundations of EIC will be advanced from the ground up by extending Spin Hall Effect Magnetic Tunnel Junction (SHE-MTJ) device models to realize SHE-MTJ-based Majority Gate (MG) and Polymorphic Gate (PG) logic approaches and libraries, that leverage intrinsic-non-volatility to realize middleware-coherent, intermittent computation without checkpointing, micro-tasking, or software bloat and energy overheads vital to IoT. Device-level EIC research concentrates on encapsulating SHE-MTJ behavior with a compact model to leverage the non-volatility of the device for intrinsic provision of intermittent computation and lifetime energy reduction. Based on this model, the circuit-level EIC contributions will entail the design, simulation, and analysis of PG-based spintronic logic which is adaptable at the gate-level to support variable duty cycle execution that is robust to brief and extended supply outages or unscheduled dropouts, and development of spin-based research synthesis and optimization routines compatible with existing commercial toolchains. These tools will be employed to design a hybrid post-CMOS processing unit utilizing pipelining and power-gating through state-holding properties within the datapath itself, thus eliminating checkpointing and data transfer operations

    FeFET Based Nonvolatile TCAM and DRAM Development

    Get PDF
    Ferroelectric Field Effect Transistor (FeFET) is a promising nonvolatile device which provides high integration density, fast programming speed, and excellent CMOS compatibility. In general, the non-volatility of FeFET is impacted by its physical structure and there is a trade-off between data retention time and device endurance. To improve the cell endurance, for example, the ferroelectric layer of FeFET needs to be programmed to a low polarization level, leading to a short retention time. In ferroelectric DRAM (FeDRAM) design, degradation in FeFET retention time and write-read disturbance requires the FeDRAM cells to be periodically refreshed in order to prevent data loss. In this work, I propose a novel adaptive refreshing and read voltage control scheme to minimize the energy overheads associated with FeDRAM refreshing while still achieve high cell access reliability. In addition to the DRAM application FeFET based TCAM memory is also studied. TCAM (ternary content addressable memory) is a special memory type that can compare input search data with stored data, and return location (sometime, the associated content) of matched data. TCAM is widely used in microprocessor designs as well as communication chip, e.g., IP-routing. Following technology advances of emerging nonvolatile memories (eNVM), applying eNVM to TCAM designs becomes attractive to achieve high density and low standby power. In this work, I examined the applications of three promising eNVM tech-nologies, i.e., magnetic tunneling junction (MTJ), memristor, and ferroelectric memory field effect transistor (FeMFET), in the design of nonvolatile TCAM cells. All these technologies can achieve close-to-zero standby power though each of them has very different pros and cons

    Embedding Logic and Non-volatile Devices in CMOS Digital Circuits for Improving Energy Efficiency

    Get PDF
    abstract: Static CMOS logic has remained the dominant design style of digital systems for more than four decades due to its robustness and near zero standby current. Static CMOS logic circuits consist of a network of combinational logic cells and clocked sequential elements, such as latches and flip-flops that are used for sequencing computations over time. The majority of the digital design techniques to reduce power, area, and leakage over the past four decades have focused almost entirely on optimizing the combinational logic. This work explores alternate architectures for the flip-flops for improving the overall circuit performance, power and area. It consists of three main sections. First, is the design of a multi-input configurable flip-flop structure with embedded logic. A conventional D-type flip-flop may be viewed as realizing an identity function, in which the output is simply the value of the input sampled at the clock edge. In contrast, the proposed multi-input flip-flop, named PNAND, can be configured to realize one of a family of Boolean functions called threshold functions. In essence, the PNAND is a circuit implementation of the well-known binary perceptron. Unlike other reconfigurable circuits, a PNAND can be configured by simply changing the assignment of signals to its inputs. Using a standard cell library of such gates, a technology mapping algorithm can be applied to transform a given netlist into one with an optimal mixture of conventional logic gates and threshold gates. This approach was used to fabricate a 32-bit Wallace Tree multiplier and a 32-bit booth multiplier in 65nm LP technology. Simulation and chip measurements show more than 30% improvement in dynamic power and more than 20% reduction in core area. The functional yield of the PNAND reduces with geometry and voltage scaling. The second part of this research investigates the use of two mechanisms to improve the robustness of the PNAND circuit architecture. One is the use of forward and reverse body biases to change the device threshold and the other is the use of RRAM devices for low voltage operation. The third part of this research focused on the design of flip-flops with non-volatile storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJ) are integrated with both conventional D-flipflop and the PNAND circuits to implement non-volatile logic (NVL). These non-volatile storage enhanced flip-flops are able to save the state of system locally when a power interruption occurs. However, manufacturing variations in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading to an overly pessimistic design and consequently, higher energy consumption. A detailed analysis of the design trade-offs in the driver circuitry for performing backup and restore, and a novel method to design the energy optimal driver for a given yield is presented. Efficient designs of two nonvolatile flip-flop (NVFF) circuits are presented, in which the backup time is determined on a per-chip basis, resulting in minimizing the energy wastage and satisfying the yield constraint. To achieve a yield of 98%, the conventional approach would have to expend nearly 5X more energy than the minimum required, whereas the proposed tunable approach expends only 26% more energy than the minimum. A non-volatile threshold gate architecture NV-TLFF are designed with the same backup and restore circuitry in 65nm technology. The embedded logic in NV-TLFF compensates performance overhead of NVL. This leads to the possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and- accumulate (MAC) unit is designed to demonstrate the performance benefits of the proposed architecture. Based on the results of HSPICE simulations, the MAC circuit with the proposed NV-TLFF cells is shown to consume at least 20% less power and area as compared to the circuit designed with conventional DFFs, without sacrificing any performance.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

    MFPA: Mixed-Signal Field Programmable Array for Energy-Aware Compressive Signal Processing

    Get PDF
    Compressive Sensing (CS) is a signal processing technique which reduces the number of samples taken per frame to decrease energy, storage, and data transmission overheads, as well as reducing time taken for data acquisition in time-critical applications. The tradeoff in such an approach is increased complexity of signal reconstruction. While several algorithms have been developed for CS signal reconstruction, hardware implementation of these algorithms is still an area of active research. Prior work has sought to utilize parallelism available in reconstruction algorithms to minimize hardware overheads; however, such approaches are limited by the underlying limitations in CMOS technology. Herein, the MFPA (Mixed-signal Field Programmable Array) approach is presented as a hybrid spin-CMOS reconfigurable fabric specifically designed for implementation of CS data sampling and signal reconstruction. The resulting fabric consists of 1) slice-organized analog blocks providing amplifiers, transistors, capacitors, and Magnetic Tunnel Junctions (MTJs) which are configurable to achieving square/square root operations required for calculating vector norms, 2) digital functional blocks which feature 6-input clockless lookup tables for computation of matrix inverse, and 3) an MRAM-based nonvolatile crossbar array for carrying out low-energy matrix-vector multiplication operations. The various functional blocks are connected via a global interconnect and spin-based analog-to-digital converters. Simulation results demonstrate significant energy and area benefits compared to equivalent CMOS digital implementations for each of the functional blocks used: this includes an 80% reduction in energy and 97% reduction in transistor count for the nonvolatile crossbar array, 80% standby power reduction and 25% reduced area footprint for the clockless lookup tables, and roughly 97% reduction in transistor count for a multiplier built using components from the analog blocks. Moreover, the proposed fabric yields 77% energy reduction compared to CMOS when used to implement CS reconstruction, in addition to latency improvements

    Heterogeneous Reconfigurable Fabrics for In-circuit Training and Evaluation of Neuromorphic Architectures

    Get PDF
    A heterogeneous device technology reconfigurable logic fabric is proposed which leverages the cooperating advantages of distinct magnetic random access memory (MRAM)-based look-up tables (LUTs) to realize sequential logic circuits, along with conventional SRAM-based LUTs to realize combinational logic paths. The resulting Hybrid Spin/Charge FPGA (HSC-FPGA) using magnetic tunnel junction (MTJ) devices within this topology demonstrates commensurate reductions in area and power consumption over fabrics having LUTs constructed with either individual technology alone. Herein, a hierarchical top-down design approach is used to develop the HSCFPGA starting from the configurable logic block (CLB) and slice structures down to LUT circuits and the corresponding device fabrication paradigms. This facilitates a novel architectural approach to reduce leakage energy, minimize communication occurrence and energy cost by eliminating unnecessary data transfer, and support auto-tuning for resilience. Furthermore, HSC-FPGA enables new advantages of technology co-design which trades off alternative mappings between emerging devices and transistors at runtime by allowing dynamic remapping to adaptively leverage the intrinsic computing features of each device technology. HSC-FPGA offers a platform for fine-grained Logic-In-Memory architectures and runtime adaptive hardware. An orthogonal dimension of fabric heterogeneity is also non-determinism enabled by either low-voltage CMOS or probabilistic emerging devices. It can be realized using probabilistic devices within a reconfigurable network to blend deterministic and probabilistic computational models. Herein, consider the probabilistic spin logic p-bit device as a fabric element comprising a crossbar-structured weighted array. The Programmability of the resistive network interconnecting p-bit devices can be achieved by modifying the resistive states of the array\u27s weighted connections. Thus, the programmable weighted array forms a CLB-scale macro co-processing element with bitstream programmability. This allows field programmability for a wide range of classification problems and recognition tasks to allow fluid mappings of probabilistic and deterministic computing approaches. In particular, a Deep Belief Network (DBN) is implemented in the field using recurrent layers of co-processing elements to form an n x m1 x m2 x ::: x mi weighted array as a configurable hardware circuit with an n-input layer followed by i ≄ 1 hidden layers. As neuromorphic architectures using post-CMOS devices increase in capability and network size, the utility and benefits of reconfigurable fabrics of neuromorphic modules can be anticipated to continue to accelerate
    corecore