
    Facilitating Emerging Non-volatile Memories in Next-Generation Memory System Design: Architecture-Level and Application-Level Perspectives

    This dissertation focuses on three types of emerging NVMs: spin-transfer torque RAM (STT-RAM), phase change memory (PCM), and metal-oxide resistive RAM (ReRAM). STT-RAM has been identified as the best replacement for SRAM in building large-scale, low-power on-chip caches and as an energy-efficient alternative to DRAM for main memory. PCM and ReRAM are considered promising technologies for building future large-scale, low-power main memory systems. This dissertation facilitates their adoption in next-generation memory system design from two perspectives, the architecture level and the application level. First, multi-level cell (MLC) STT-RAM based cache design is optimized using data encoding and data compression. Second, MLC STT-RAM is used as persistent main memory for fast and energy-efficient local checkpointing. Third, the commonly used database indexing structure, the B+tree, is redesigned to be NVM-friendly. Fourth, a novel processing-in-memory architecture built on ReRAM-based main memory is proposed to accelerate neural network applications.
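    As one concrete illustration of the NVM-friendly indexing idea, the sketch below shows an unsorted, append-only B+tree leaf in which an insert writes only one free slot and a small validity bitmap instead of shifting sorted entries. This is a generic write-reduction scheme in the spirit of the abstract, not the dissertation's actual design; all class and field names are illustrative.

```python
# Minimal sketch of an unsorted, append-only B+tree leaf intended to reduce
# NVM writes: instead of shifting sorted entries on every insert (many extra
# writes to slow, wear-limited non-volatile cells), entries are appended and a
# small bitmap marks valid slots. Names are illustrative, not the dissertation's.

class UnsortedNVMLeaf:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.keys = [None] * capacity     # each slot is written once
        self.values = [None] * capacity
        self.valid = 0                    # validity bitmap, one word-sized write

    def insert(self, key, value):
        """Append into the first free slot; only the slot and the bitmap are written."""
        for slot in range(self.capacity):
            if not (self.valid >> slot) & 1:
                self.keys[slot] = key
                self.values[slot] = value
                self.valid |= (1 << slot)   # single small write commits the entry
                return True
        return False  # leaf full: the caller would split the node

    def lookup(self, key):
        """Scan valid slots; reads are cheap on NVM compared to writes."""
        for slot in range(self.capacity):
            if (self.valid >> slot) & 1 and self.keys[slot] == key:
                return self.values[slot]
        return None

leaf = UnsortedNVMLeaf()
leaf.insert(42, "row-42")
print(leaf.lookup(42))   # -> "row-42"
```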

    Enabling Reliable, Efficient, and Secure Computing for Energy Harvesting Powered IoT Devices

    Energy harvesting is one of the most promising techniques for powering next-generation IoT devices. While energy harvesting avoids the longevity, safety, and recharging concerns of traditional batteries, its instability introduces a new challenge for embedded systems: the energy harvested from the environment is usually weak and intermittent. With traditional CMOS-based technology, whenever power is lost, computation must restart from the very beginning. Unlike existing CMOS-based memory devices, emerging non-volatile memory devices such as PCM and STT-RAM retain their data even when power is off. By checkpointing the processor's volatile state to non-volatile memory, a program can resume execution as soon as power returns instead of restarting from the beginning. However, checkpointing alone is not sufficient for energy harvesting systems. First, execution resumed from the last checkpoint may be incorrect and cause inconsistency problems in the system; this stems from inconsistency between the volatile and non-volatile system state at checkpoint time. Second, checkpointing consumes considerable energy and time because of the slow, energy-hungry write operations of non-volatile memory. Finally, connecting to the Internet exposes energy harvesting IoT devices to many security issues; traditional data encryption methods are both energy- and time-consuming and do not fit resource-constrained IoT devices, so a lightweight encryption method is urgently needed. Targeting these three challenges, this dissertation proposes three techniques to enable reliable, efficient, and secure computing in energy harvesting IoT devices. First, a consistency-aware checkpointing technique avoids the errors caused by inconsistency between volatile and non-volatile state. Second, a checkpoint-aware hybrid cache architecture guarantees reliable checkpointing while keeping the checkpointing overhead of the cache low. Finally, to ensure the security of energy harvesting IoT devices, an energy-efficient in-memory encryption implementation is proposed that quickly encrypts the data in non-volatile memory and protects the embedded system against physical and online attacks.
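    To make the consistency problem concrete, the sketch below shows one common way (not necessarily the dissertation's algorithm) to keep a checkpoint consistent under intermittent power: the volatile snapshot is copied into the inactive one of two non-volatile slots, and only a final one-word pointer flip commits it, so a power loss mid-checkpoint never exposes a half-written state on resume. All names are illustrative.

```python
# Double-buffered, commit-by-pointer checkpointing sketch for intermittently
# powered devices. "nvm" stands in for non-volatile memory; a power failure can
# interrupt execution anywhere, but restore() only ever sees a committed slot.

nvm = {"slots": [None, None], "active": 0}   # stand-in for non-volatile memory

def checkpoint(volatile_state):
    spare = 1 - nvm["active"]
    nvm["slots"][spare] = dict(volatile_state)  # full copy into the spare slot
    nvm["active"] = spare                       # single small write commits it

def restore():
    """On power-up, resume only from the last committed slot (if any)."""
    return nvm["slots"][nvm["active"]]

checkpoint({"pc": 0x1234, "acc": 7})
checkpoint({"pc": 0x1238, "acc": 9})
print(restore())   # -> the second, fully committed snapshot
```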

    Embedding Logic and Non-volatile Devices in CMOS Digital Circuits for Improving Energy Efficiency

    Static CMOS logic has remained the dominant design style of digital systems for more than four decades due to its robustness and near-zero standby current. Static CMOS logic circuits consist of a network of combinational logic cells and clocked sequential elements, such as latches and flip-flops, that are used for sequencing computations over time. The majority of digital design techniques for reducing power, area, and leakage over the past four decades have focused almost entirely on optimizing the combinational logic. This work explores alternate flip-flop architectures for improving overall circuit performance, power, and area. It consists of three main parts. The first is the design of a multi-input configurable flip-flop structure with embedded logic. A conventional D-type flip-flop may be viewed as realizing an identity function, in which the output is simply the value of the input sampled at the clock edge. In contrast, the proposed multi-input flip-flop, named PNAND, can be configured to realize one of a family of Boolean functions called threshold functions. In essence, the PNAND is a circuit implementation of the well-known binary perceptron. Unlike other reconfigurable circuits, a PNAND can be configured simply by changing the assignment of signals to its inputs. Using a standard cell library of such gates, a technology mapping algorithm can transform a given netlist into one with an optimal mixture of conventional logic gates and threshold gates. This approach was used to fabricate a 32-bit Wallace tree multiplier and a 32-bit Booth multiplier in 65nm LP technology. Simulation and chip measurements show more than 30% improvement in dynamic power and more than 20% reduction in core area. The functional yield of the PNAND degrades with geometry and voltage scaling. The second part of this research investigates two mechanisms to improve the robustness of the PNAND circuit architecture: the use of forward and reverse body biases to change the device threshold, and the use of RRAM devices for low-voltage operation. The third part of this research focuses on the design of flip-flops with non-volatile storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJs) are integrated with both the conventional D flip-flop and the PNAND circuits to implement non-volatile logic (NVL). These non-volatile storage enhanced flip-flops are able to save the state of the system locally when a power interruption occurs. However, manufacturing variations in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading to an overly pessimistic design and, consequently, higher energy consumption. A detailed analysis of the design trade-offs in the driver circuitry for performing backup and restore, and a novel method to design the energy-optimal driver for a given yield, are presented. Efficient designs of two non-volatile flip-flop (NVFF) circuits are presented, in which the backup time is determined on a per-chip basis, minimizing energy wastage while satisfying the yield constraint. To achieve a yield of 98%, the conventional approach would have to expend nearly 5X more energy than the minimum required, whereas the proposed tunable approach expends only 26% more energy than the minimum. A non-volatile threshold gate architecture, NV-TLFF, is designed with the same backup and restore circuitry in 65nm technology. The embedded logic in NV-TLFF compensates for the performance overhead of NVL, leading to the possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and-accumulate (MAC) unit is designed to demonstrate the performance benefits of the proposed architecture. Based on HSPICE simulations, the MAC circuit with the proposed NV-TLFF cells consumes at least 20% less power and area than the circuit designed with conventional DFFs, without sacrificing any performance.
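    The threshold-function idea behind the PNAND can be seen in a few lines: a threshold (perceptron) gate outputs 1 when the weighted sum of its inputs meets a threshold, and different Boolean functions are obtained purely by how signals, including constants, are assigned to its inputs. The weights and assignments below are textbook examples, not the fabricated cell's actual values.

```python
# A binary perceptron / threshold gate: output is 1 iff sum(x_i * w_i) >= T.

def threshold_gate(inputs, weights, threshold):
    return int(sum(x * w for x, w in zip(inputs, weights)) >= threshold)

# A 3-input gate with unit weights and threshold 2 realizes majority...
print(threshold_gate([1, 0, 1], [1, 1, 1], 2))   # MAJ(1, 0, 1) = 1
# ...and the same gate computes AND of two signals when the third input is tied
# to 0, or OR of two signals when the third input is tied to 1 -- configuration
# by input assignment alone, as the abstract describes for the PNAND.
print(threshold_gate([1, 1, 0], [1, 1, 1], 2))   # AND(1, 1) = 1
print(threshold_gate([0, 1, 1], [1, 1, 1], 2))   # OR(0, 1)  = 1
```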

    Improving Storage Performance with Non-Volatile Memory-based Caching Systems

    University of Minnesota Ph.D. dissertation, April 2017. Major: Computer Science. Advisor: David Du. 1 computer file (PDF); ix, 104 pages. With the rapid development of new types of non-volatile memory (NVRAM), e.g., 3D XPoint, NVDIMM, and STT-MRAM, these technologies have been or will be integrated into computer systems to work alongside traditional DRAM. Compared with DRAM, which loses data when power fails or the system crashes, NVRAM's non-volatile nature makes it a better candidate for caching. Meanwhile, storage performance needs to keep pace with the rapidly growing amounts of data generated around the world (the "big data" problem). Throughout my Ph.D. research, I have focused on building novel NVRAM-based caching systems that provide cost-effective ways to improve storage system performance. To show the benefits of such systems, I target four representative storage devices and systems: solid state drives (SSDs), hard disk drives (HDDs), disk arrays, and high-performance computing (HPC) parallel file systems (PFSs). For SSDs, to mitigate their wear-out problem and extend their lifespan, we propose two NVRAM-based buffer cache policies that can work together in different layers to maximally reduce SSD write traffic: a main memory buffer cache design named Hierarchical Adaptive Replacement Cache (H-ARC) and an internal SSD write buffer design named Write Traffic Reduction Buffer (WRB). H-ARC considers four factors (dirty, clean, recency, and frequency) to reduce write traffic and improve cache hit ratios in the host. WRB further reduces block erasures and write traffic inside the SSD by effectively exploiting temporal and spatial locality. For HDDs, to exploit their fast sequential access speed and improve I/O throughput, we propose a buffer cache policy, named I/O-Cache, that regroups and synchronizes long runs of consecutive dirty pages to take advantage of HDDs' fast sequential access speed and the non-volatility of NVRAM. In addition, the new policy can dynamically separate the cache into a dirty cache and a clean cache, according to the characteristics of the workload, to decrease storage writes. For disk arrays, although numerous cache policies have been proposed, most either target main memory buffer caches or manage NVRAM as a write buffer while separately managing DRAM as a read cache. To the best of our knowledge, cooperative hybrid volatile and non-volatile memory buffer cache policies specifically designed for storage systems using newer NVRAM technologies have not been well studied. Based on an extensive study of storage server block I/O traces, we propose a novel cooperative HybrId NVRAM and DRAM Buffer cACHe polIcy for storage arrays, named Hibachi. Hibachi treats read cache hits and write cache hits differently to maximize cache hit rates, and it judiciously adjusts the clean and dirty cache sizes to capture workload tendencies. In addition, it converts random writes to sequential writes for high disk write throughput and further exploits storage server I/O workload characteristics to improve read performance. For modern complex HPC systems (e.g., supercomputers), data generated during checkpointing is bursty and dominates HPC I/O traffic to such an extent that relying solely on PFSs would slow down the whole HPC system.
    To increase HPC checkpointing speed, we propose an NVRAM-based burst buffer coordination system for PFSs, named collaborative distributed burst buffer (CDBB). Inspired by our observations of HPC application execution patterns and experiments on HPC clusters, we design CDBB to coordinate all available burst buffers, based on their priorities and states, to help overburdened burst buffers and maximize resource utilization.
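    A core mechanism shared by I/O-Cache and Hibachi is splitting the cache into clean and dirty regions whose sizes adapt to the workload. The sketch below is a deliberately simplified illustration of that idea only, not the actual H-ARC, I/O-Cache, or Hibachi policy: two LRU lists, with the partition nudged toward whichever side is currently earning hits.

```python
# Simplified clean/dirty split cache with an adaptive partition (illustrative).
from collections import OrderedDict

class SplitCleanDirtyCache:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.clean_target = capacity // 2        # adapted on hits
        self.clean, self.dirty = OrderedDict(), OrderedDict()

    def access(self, page, is_write):
        if page in self.clean or page in self.dirty:
            side = self.clean if page in self.clean else self.dirty
            side.move_to_end(page)                # LRU update
            # nudge the partition toward the side that is producing hits
            if side is self.clean:
                self.clean_target = min(self.capacity - 1, self.clean_target + 1)
            else:
                self.clean_target = max(1, self.clean_target - 1)
            if is_write and page in self.clean:   # clean page becomes dirty
                self.clean.pop(page)
                self.dirty[page] = True
            return "hit"
        # miss: insert, then evict from whichever side exceeds its target
        (self.dirty if is_write else self.clean)[page] = True
        while len(self.clean) + len(self.dirty) > self.capacity:
            victim = self.clean if len(self.clean) > self.clean_target else self.dirty
            victim.popitem(last=False)            # evict LRU (a dirty page would be flushed)
        return "miss"

cache = SplitCleanDirtyCache(capacity=4)
for page, wr in [(1, False), (2, True), (1, False), (3, True), (2, True)]:
    print(page, cache.access(page, wr))
```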

    Studies in Exascale Computer Architecture: Interconnect, Resiliency, and Checkpointing

    Today's supercomputers are built from state-of-the-art components to extract as much performance as possible and solve the most computationally intensive problems in the world. Building the next generation of exascale supercomputers, however, would require re-architecting many of these components to extract over 50x more performance than the current fastest supercomputer in the United States. To contribute towards this goal, two aspects of the compute node architecture were examined in this thesis: the on-chip interconnect topology and the memory and storage checkpointing platforms. As a first step, a skeleton exascale system was modeled to meet 1 exaflop of performance along with 100 petabytes of main memory. The model revealed that large kilo-core processors would be necessary to meet the exaflop performance goal; existing topologies, however, would not scale to those levels. To address this challenge, we investigated and proposed asymmetric high-radix topologies that decouple local and global communication and use different-radix routers for switching network traffic at each level. The proposed topologies scale more readily to higher core counts, with better latency and energy consumption than prior designs. The vast number of components that the model showed would be needed in these exascale systems motivates better fault tolerance mechanisms. To address this challenge, we showed that local checkpoints within the compute node can be saved to a hybrid DRAM and SSD platform in order to write them faster without wearing out the SSD or consuming excessive energy. The hybrid checkpointing platform allows more frequent checkpoints to be taken without sacrificing performance. Subsequently, we proposed switching to a DIMM-based SSD in order to perform the fine-grained I/O operations that are integral to interleaving checkpointing with computation while still providing persistence guarantees. Two further techniques that consolidate and overlap checkpointing were designed to better hide the checkpointing latency to the SSD. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/137096/1/sabeyrat_1.pd
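    The claim that faster checkpoints permit more frequent checkpointing can be made quantitative with the standard Young/Daly approximation for the optimal checkpoint interval, interval ≈ sqrt(2 * C * MTBF), where C is the time to write one checkpoint. This formula is not taken from the thesis; the costs and MTBF below are illustrative assumptions used only to show the trend.

```python
# Young/Daly optimal checkpoint interval: shorter checkpoint write time C
# allows a shorter interval, i.e. more frequent checkpoints at the same cost.
import math

def optimal_interval(checkpoint_cost_s, mtbf_s):
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

mtbf = 4 * 3600.0  # assume a 4-hour system MTBF (illustrative)
for label, cost in [("SSD-only checkpoint", 300.0), ("hybrid DRAM+SSD checkpoint", 30.0)]:
    print(f"{label}: write cost {cost:5.0f}s -> checkpoint every "
          f"{optimal_interval(cost, mtbf) / 60:5.1f} min")
```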

    Applications for Packetized Memory Interfaces

    The performance of the memory subsystem has a large impact on the performance of modern computer systems. Many important applications are memory bound, and others are expected to become memory bound in the future. The importance of memory performance makes it imperative to understand and optimize the interactions between applications and the system architecture. Prototyping and exploring various memory system configurations can give important insights, but current memory interfaces are limited in the flexibility they provide. This inflexibility stems primarily from the fixed timing of the memory interface. Packetized memory interfaces abstract away the underlying timing characteristics of the memory technology and allow greater flexibility in the design of memory hierarchies. This work uses packetized interfaces to explore memory hierarchy designs and to prototype a novel network-attached memory. Since current processors do not support packetized memory interfaces, a coherent processor bus is used as the memory interface for the DiskRAM project. The Hybrid Memory Cube (HMC) packetized memory interface is also presented and used to prototype network-attached memory. The HMC interface is discussed in detail, along with the design and implementation of a Universal Verification Component (UVC) environment. The convergence of network and memory interfaces is also predicted.
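    To give a feel for what "packetized" means here, the sketch below models a memory whose requests are self-describing packets matched to out-of-order responses by tag, rather than a fixed-timing address/data bus. The packet fields and class names are invented for illustration and are not the HMC packet format.

```python
# Toy packetized memory: requests carry a tag; responses may return out of
# order and are matched back to requests by that tag, hiding device timing.
from dataclasses import dataclass
import random

@dataclass
class MemPacket:
    cmd: str          # "RD", "WR", or a response type
    addr: int
    tag: int
    data: bytes = b""

class PacketizedMemory:
    def __init__(self):
        self.cells = {}

    def submit(self, packets):
        """Service a batch of request packets; completion order is not fixed."""
        responses = []
        for p in packets:
            if p.cmd == "WR":
                self.cells[p.addr] = p.data
                responses.append(MemPacket("WR_RESP", p.addr, p.tag))
            else:
                responses.append(MemPacket("RD_RESP", p.addr, p.tag,
                                           self.cells.get(p.addr, b"\x00")))
        random.shuffle(responses)   # model out-of-order completion
        return responses

mem = PacketizedMemory()
reqs = [MemPacket("WR", 0x100, tag=1, data=b"\xab"), MemPacket("RD", 0x100, tag=2)]
for resp in mem.submit(reqs):
    print(resp.tag, resp.cmd, resp.data)
```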

    Normally-Off Computing Design Methodology Using Spintronics: From Devices to Architectures

    Energy-harvesting-powered computing offers intriguing and vast opportunities to dramatically transform the landscape of Internet of Things (IoT) devices and wireless sensor networks by utilizing ambient sources of light, thermal, kinetic, and electromagnetic energy to achieve battery-free computing. In order to operate within the restricted energy capacity and intermittency profile of battery-free operation, this work proposes Elastic Intermittent Computation (EIC), a new duty-cycle-variable computing approach that leverages the non-volatility inherent in post-CMOS switching devices. The foundations of EIC are advanced from the ground up by extending Spin Hall Effect Magnetic Tunnel Junction (SHE-MTJ) device models to realize SHE-MTJ-based Majority Gate (MG) and Polymorphic Gate (PG) logic approaches and libraries that leverage intrinsic non-volatility to realize middleware-coherent intermittent computation without the checkpointing, micro-tasking, software bloat, and energy overheads that burden IoT devices. Device-level EIC research concentrates on encapsulating SHE-MTJ behavior in a compact model to leverage the non-volatility of the device for intrinsic provision of intermittent computation and lifetime energy reduction. Based on this model, the circuit-level EIC contributions entail the design, simulation, and analysis of PG-based spintronic logic that is adaptable at the gate level to support variable duty-cycle execution robust to brief and extended supply outages or unscheduled dropouts, as well as the development of spin-based synthesis and optimization routines compatible with existing commercial toolchains. These tools are employed to design a hybrid post-CMOS processing unit that utilizes pipelining and power-gating through state-holding properties within the datapath itself, thus eliminating checkpointing and data transfer operations.
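    The two logic primitives named in the abstract have simple functional definitions, shown below. A majority gate outputs the majority value of an odd number of inputs; a polymorphic gate is the same structure whose function is selected by a control input, e.g. a 3-input majority gate acts as AND when the control is 0 and as OR when it is 1. This is the textbook behavior only, not a model of the SHE-MTJ circuit itself.

```python
# Majority gate (MG) and a polymorphic AND/OR gate (PG) built from it.

def majority(*bits):
    return int(sum(bits) > len(bits) // 2)

def polymorphic_and_or(a, b, select):
    """MAJ(a, b, select): select=0 -> a AND b, select=1 -> a OR b."""
    return majority(a, b, select)

print(majority(1, 0, 1))              # -> 1
print(polymorphic_and_or(1, 0, 0))    # AND -> 0
print(polymorphic_and_or(1, 0, 1))    # OR  -> 1
```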

    Anchor: Architecture for Secure Non-Volatile Memories

    The rapid growth of memory-intensive applications like cloud computing, deep learning, and bioinformatics has propelled the memory industry to develop scalable, high-density, low-power non-volatile memory (NVM) technologies; however, computing systems that integrate these advanced NVMs are vulnerable to several security attacks that threaten (i) data confidentiality, (ii) data availability, and (iii) data integrity. This dissertation presents ANCHOR, which integrates four low-overhead, high-performance security solutions, SECRET, COVERT, ACME, and STASH, to thwart these attacks on NVM systems. SECRET is a low-cost security solution for data confidentiality in multi-/triple-level cell (i.e., MLC/TLC) NVMs. SECRET synergistically combines (i) smart encryption, which prevents re-encryption of unmodified or zero-words during a write-back, with (ii) XOR-based energy masking, which further optimizes NVM writes by transforming a high-energy ciphertext into a low-energy ciphertext. SECRET outperforms state-of-the-art encryption solutions, with the lowest write energy and latency and the highest lifetime. COVERT and ACME complement SECRET to improve the system availability of counter mode encryption (CME). COVERT repurposes unused error correction resources to dynamically extend the time to counter overflow of fast-growing counters, thereby delaying frequent full-memory re-encryption (a system freeze). ACME performs counter write leveling (CWL) to further increase the time to counter overflow, thereby delaying full-memory re-encryption. COVERT+ACME achieves system availability of 99.999% during normal operation and 99.9% under a denial of memory service (DoMS) attack. In contrast, conventional CME achieves system availability of only 85.71% during normal operation and is rendered non-operational under a DoMS attack. Finally, STASH is a comprehensive end-to-end security architecture for state-of-the-art smart hybrid memories (SHMs) that employ a smart DRAM cache with smart NVM-based main memory. STASH integrates (i) CME for data confidentiality, (ii) page-level Merkle Tree authentication for data integrity, (iii) recovery-compatible MT updates to withstand power/system failures, and (iv) page-migration-friendly security metadata management. For security guarantees equivalent to the state of the art, STASH reduces memory overhead by 12.7x, improves system performance by 65%, and increases NVM lifetime by 5x. This dissertation thus addresses the core security challenges of next-generation NVM-based memory systems. Directions for future research include (i) exploring holistic architectures that ensure both the security and the reliability of smart memory systems, (ii) investigating applications of ANCHOR to reduce the security overhead of the Internet of Things, and (iii) extending ANCHOR to safeguard emerging non-volatile processors, especially in light of advanced attacks like Spectre and Meltdown.
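    The write-reduction logic of "smart encryption" can be sketched in a few lines. The toy below uses a hash-derived keyed pad in place of AES counter mode and a single line counter purely to show the idea attributed to SECRET: on a write-back, unmodified words keep their existing ciphertext, so only changed words pay the encryption and cell-write cost (SECRET additionally flags all-zero words so they need no ciphertext at all). Function names, the pad construction, and the single shared counter are simplifying assumptions, not the actual design.

```python
# Toy "smart encryption" write-back: re-encrypt only the modified words.
import hashlib

def pad(key: bytes, line_ctr: int, word_idx: int) -> int:
    """Per-word keystream derived from a per-line counter (toy stand-in for AES-CTR)."""
    digest = hashlib.sha256(key + line_ctr.to_bytes(8, "little") + bytes([word_idx])).digest()
    return int.from_bytes(digest[:4], "little")

def smart_writeback(old_plain, new_plain, old_cipher, key, ctr):
    """Return (cipher_line, words_written); unmodified words keep their old ciphertext.
    (A real CME design advances counters on writes; one counter is kept here for clarity.)"""
    new_cipher, written = list(old_cipher), 0
    for i, (old_w, new_w) in enumerate(zip(old_plain, new_plain)):
        if new_w == old_w:                 # unmodified word: no re-encryption, no write
            continue
        new_cipher[i] = new_w ^ pad(key, ctr, i)
        written += 1
    return new_cipher, written

key, ctr = b"demo-key", 7
old_plain = [0x11111111, 0x22222222, 0x00000000, 0x44444444]
old_cipher = [w ^ pad(key, ctr, i) for i, w in enumerate(old_plain)]
new_plain = [0x11111111, 0xAAAAAAAA, 0x00000000, 0x44444444]   # one word modified
_, written = smart_writeback(old_plain, new_plain, old_cipher, key, ctr)
print("words re-encrypted:", written)   # -> 1 of 4
```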