31 research outputs found
Towards Energy-Efficient and Reliable Computing: From Highly-Scaled CMOS Devices to Resistive Memories
The continuous increase in transistor density based on Moore\u27s Law has led us to highly scaled Complementary Metal-Oxide Semiconductor (CMOS) technologies. These transistor-based process technologies offer improved density as well as a reduction in nominal supply voltage. An analysis regarding different aspects of 45nm and 15nm technologies, such as power consumption and cell area to compare these two technologies is proposed on an IEEE 754 Single Precision Floating-Point Unit implementation. Based on the results, using the 15nm technology offers 4-times less energy and 3-fold smaller footprint. New challenges also arise, such as relative proportion of leakage power in standby mode that can be addressed by post-CMOS technologies. Spin-Transfer Torque Random Access Memory (STT-MRAM) has been explored as a post-CMOS technology for embedded and data storage applications seeking non-volatility, near-zero standby energy, and high density. Towards attaining these objectives for practical implementations, various techniques to mitigate the specific reliability challenges associated with STT-MRAM elements are surveyed, classified, and assessed herein. Cost and suitability metrics assessed include the area of nanomagmetic and CMOS components per bit, access time and complexity, Sense Margin (SM), and energy or power consumption costs versus resiliency benefits. In an attempt to further improve the Process Variation (PV) immunity of the Sense Amplifiers (SAs), a new SA has been introduced called Adaptive Sense Amplifier (ASA). ASA can benefit from low Bit Error Rate (BER) and low Energy Delay Product (EDP) by combining the properties of two of the commonly used SAs, Pre-Charge Sense Amplifier (PCSA) and Separated Pre-Charge Sense Amplifier (SPCSA). ASA can operate in either PCSA or SPCSA mode based on the requirements of the circuit such as energy efficiency or reliability. Then, ASA is utilized to propose a novel approach to actually leverage the PV in Non-Volatile Memory (NVM) arrays using Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time
Architecting Memory Systems for Emerging Technologies
The advance of traditional dynamic random access memory (DRAM) technology has slowed down, while the capacity and performance needs of memory system have continued to increase. This is a result of increasing data volume from emerging applications, such as machine learning and big data analytics. In addition to such demands, increasing energy consumption is becoming a major constraint on the capabilities of computer systems. As a result, emerging non-volatile memories, for example, Spin Torque Transfer Magnetic RAM (STT-MRAM), and new memory interfaces, for example, High Bandwidth Memory (HBM), have been developed as an alternative. Thus far, most previous studies have retained a DRAM-like memory architecture and management policy. This preserves compatibility but hides the true benefits of those new memory technologies.
In this research, we proposed the co-design of memory architectures and their management policies for emerging technologies. First, we introduced a new memory architecture for an STT-MRAM main memory. In particular, we defined a new page mode operation for efficient activation and sensing. By fully exploiting the non-destructive nature of STT- MRAM, our design achieved higher performance, lower energy consumption, and a smaller area than the traditional designs. Second, we developed a cost-effective technique to improve load balancing for HBM memory channels. We showed that the proposed technique was capable of efficiently redistributing memory requests across multiple memory channels to improve the channel utilization, resulting in improved performance.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/145988/1/bcoh_1.pd
Energy-Aware Data Movement In Non-Volatile Memory Hierarchies
While technology scaling enables increased density for memory cells, the intrinsic high leakage power of conventional CMOS technology and the demand for reduced energy consumption inspires the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM) including STT-MRAM, PCM, and RRAM. The utilization of emerging technology in Last Level Cache (LLC) designs which occupies a signifcant fraction of total die area in Chip Multi Processors (CMPs) introduces new dimensions of vulnerability, energy consumption, and performance delivery. To be specific, a part of this research focuses on eDRAM Bit Upset Vulnerability Factor (BUVF) to assess vulnerable portion of the eDRAM refresh cycle where the critical charge varies depending on the write voltage, storage and bit-line capacitance. This dissertation broaden the study on vulnerability assessment of LLC through investigating the impact of Process Variations (PV) on narrow resistive sensing margins in high-density NVM arrays, including on-chip cache and primary memory. Large-latency and power-hungry Sense Amplifers (SAs) have been adapted to combat PV in the past. Herein, a novel approach is proposed to leverage the PV in NVM arrays using Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time. On the other hand, this dissertation investigates a novel technique to prioritize the service to 1) Extensive Read Reused Accessed blocks of the LLC that are silently dropped from higher levels of cache, and 2) the portion of the working set that may exhibit distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler to effciently identify ERRA blocks through aggregating the LLC block addresses tagged with identical Most Signifcant Bits into a single entry. Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. In addition, this dissertation will broaden and apply advancements in theories of subspace recovery to pioneer computationally-aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 will be developed, validated, and re?ned both theoretically and experimentally to realize a radically different approach to post-Moore\u27s Law computing by leveraging low-rank matrices features offering data reconstruction instead of fetching data from main memory to reduce energy/latency cost per data movement. We propose LI2 enhancement to attain high performance delivery in the post-Moore\u27s Law era through equipping the contemporary micro-architecture design with a customized memory controller which orchestrates the memory request for fetching low-rank matrices to customized Fine Grain Reconfigurable Accelerator (FGRA) for reconstruction while the other memory requests are serviced as before. The goal of LI2 is to conquer the high latency/energy required to traverse main memory arrays in the case of LLC miss, by using in-situ construction of the requested data dealing with low-rank matrices. Thus, LI2 exchanges a high volume of data transfers with a novel lightweight reconstruction method under specific conditions using a cross-layer hardware/algorithm approach
Spin-Transfer-Torque (STT) Devices for On-chip Memory and Their Applications to Low-standby Power Systems
With the scaling of CMOS technology, the proportion of the leakage power to total power consumption increases. Leakage may account for almost half of total power consumption in high performance processors. In order to reduce the leakage power, there is an increasing interest in using nonvolatile storage devices for memory applications. Among various promising nonvolatile memory elements, spin-transfer torque magnetic RAM (STT-MRAM) is identified as one of the most attractive alternatives to conventional SRAM. However, several design challenges of STT-MRAM such as shared read and write current paths, single-ended sensing, and high dynamic power are major challenges to be overcome to make it suitable for on-chip memories. To mitigate such problems, we propose a domain wall coupling based spin-transfer torque (DWCSTT) device for on-chip caches. Our proposed DWCSTT bit-cell decouples the read and the write current paths by the electrically-insulating magnetic coupling layer so that we can separately optimize read operation without having an impact on write-ability. In addition, the complementary polarizer structure in the read path of the DWCSTT device allows DWCSTT to enable self-referenced differential sensing. DWCSTT bit-cells improve the write power consumption due to the low electrical resistance of the write current path. Furthermore, we also present three different bit-cell level design techniques of Spin-Orbit Torque MRAM (SOT-MRAM) for alleviating some of the inefficiencies of conventional magnetic memories while maintaining the advantages of spin-orbit torque (SOT) based novel switching mechanism such as low write current requirement and decoupled read and write current path. Our proposed SOT-MRAM with supporting dual read/write ports (1R/1W) can address the issue of high-write latency of STT-MRAM by simultaneous 1R/1W accesses. Second, we propose a new type of SOT-MRAM which uses only one access transistor along with a Schottky diode in order to mitigate the area-overhead caused by two access transistors in conventional SOT-MRAM. Finally, a new design technique of SOT-MRAM is presented to improve the integration density by utilizing a shared bit-line structure
STT-MRAM characterization and its test implications
Spin torque transfer (STT)-magnetoresistive random-access memory (MRAM) has come
a long way in research to meet the speed and power consumption requirements for future
memory applications. The state-of-the-art STT-MRAM bit-cells employ magnetic tunnel
junction (MTJ) with perpendicular magnetic anisotropy (PMA). The process repeatabil-
ity and yield stability for wafer fabrication are some of the critical issues encountered in
STT-MRAM mass production. Some of the yield improvement techniques to combat the
e ect of process variations have been previously explored. However, little research has been
done on defect oriented testing of STT-MRAM arrays. In this thesis, the author investi-
gates the parameter deviation and non-idealities encountered during the development of
a novel MTJ stack con guration. The characterization result provides motivation for the
development of the design for testability (DFT) scheme that can help test and characterize
STT-MRAM bit-cells and the CMOS peripheral circuitry e ciently.
The primary factors for wafer yield degradation are the device parameter variation and
its non-uniformity across the wafer due to the fabrication process non-idealities. There-
fore, e ective in-process testing strategies for exploring and verifying the impact of the
parameter variation on the wafer yield will be needed to achieve fabrication process opti-
mization. While yield depends on the CMOS process variability, quality of the deposited
MTJ lm, and other process non-idealities, test platform can enable parametric optimiza-
tion and veri cation using the CMOS-based DFT circuits. In this work, we develop a DFT
algorithm and implement a DFT circuit for parametric testing and prequali cation of the
critical circuits in the CMOS wafer. The DFT circuit successfully replicates the electrical
characteristics of MTJ devices and captures their spatial variation across the wafer with
an error of less than 4%. We estimate the yield of the read sensing path by implement-
ing the DFT circuit, which can replicate the resistance-area product variation up to 50%
from its nominal value. The yield data from the read sensing path at di erent wafer loca-
tions are analyzed, and a usable wafer radius has been estimated. Our DFT scheme can
provide quantitative feedback based on in-die measurement, enabling fabrication process
optimization through iterative estimation and veri cation of the calibrated parameters.
Another concern that prevents mass production of STT-MRAM arrays is the defect
formation in MTJ devices due to aging. Identifying manufacturing defects in the magnetic
tunnel junction (MTJ) device is crucial for the yield and reliability of spin-torque-transfer
(STT) magnetic random-access memory (MRAM) arrays. Several of the MTJ defects result
in parametric deviations of the device that deteriorate over time. We extend our work on
the DFT scheme by monitoring the electrical parameter deviations occurring due to the
defect formation over time. A programmable DFT scheme was implemented for a sub-arrayin 65 nm CMOS technology to evaluate the feasibility of the test scheme. The scheme utilizes the read sense path to compare the bit-cell electrical parameters against known
DFT cells characteristics. Built-in-self-test (BIST) methodology is utilized to trigger the
onset of the fault once the device parameter crosses a threshold value. We demonstrate
the operation and evaluate the accuracy of detection with the proposed scheme. The
DFT scheme can be exploited for monitoring aging defects, modeling their behavior and
optimization of the fabrication process.
DFT scheme could potentially nd numerous applications for parametric characteriza-
tion and fault monitoring of STT-MRAM bit-cell arrays during mass production. Some of
the applications include a) Fabrication process feedback to improve wafer turnaround time,
b) STT-MRAM bit-cell health monitoring, c) Decoupled characterization of the CMOS pe-
ripheral circuitry such as read-sensing path and sense ampli er characterization within the
STT-MRAM array. Additionally, the DFT scheme has potential applications for detec-
tion of fault formation that could be utilized for deploying redundancy schemes, providing
a graceful degradation in MTJ-based bit-cell array due to aging of the device, and also
providing feedback to improve the fabrication process and yield learning
High-Density Solid-State Memory Devices and Technologies
This Special Issue aims to examine high-density solid-state memory devices and technologies from various standpoints in an attempt to foster their continuous success in the future. Considering that broadening of the range of applications will likely offer different types of solid-state memories their chance in the spotlight, the Special Issue is not focused on a specific storage solution but rather embraces all the most relevant solid-state memory devices and technologies currently on stage. Even the subjects dealt with in this Special Issue are widespread, ranging from process and design issues/innovations to the experimental and theoretical analysis of the operation and from the performance and reliability of memory devices and arrays to the exploitation of solid-state memories to pursue new computing paradigms
Domain walls in spin-valve nanotracks: characterisation and applications
Magnetic systems based on the manipulation of domain walls (DWs) in nanometre-scaled tracks have been shown to store data at high density, perform complex logic operations, and even mechanically manipulate magnetic beads. The magnetic nano-track has also been an indispensable model system to study fundamental magnetic and
magneto-electronic phenomena, such as field induced DW propagation, spin-transfer
torque, and other micromagnetic properties. Its value to fundamental research and the
breath of potentially useful applications have made this class of systems the focus of
wide research in the area of nanomagnetism and spintronics.
This thesis focuses on DW manipulation and DW-based devices in spin-valve
nanotracks. The spin-valve is a metallic multi-layered spintronic structure, wherein the
electrical resistance varies greatly with the magnetisation of its layers. In comparison to
monolayer tracks, the spin-valve track enables more sensitive and versatile
measurements, as well as demonstrating electronic output of DW-based devices, an
achievement of crucial interest to technological applications. However, these multi-layered tracks introduce new, potentially disruptive magnetic interactions, as well as
fabrication challenges.
In this thesis, the DW propagation in spin-valve nanotracks of different compositions
was studied, and a system with DW propagation properties comparable to the state-of-the-art in monolayer tracks was demonstrated, down to an unprecedented lateral size
of 33nm.
Several DW logic devices of variable complexity were demonstrated and studied,
namely a turn-counting DW spiral, a DW gate, multiple DW logic NOT gates, and a DW-DW interactor. It was found that, where the comparison was possible, the overall
magnetic behaviour of these devices was analogous to that of monolayer structures,
and the device performance, as defined by the range of field wherein they function
desirably, was found to be comparable, albeit inferior, to that of their monolayer
counterparts. The interaction between DWs in adjacent tracks was studied and new
phenomena were observed and characterised, such as DW depinning induced by a
static or travelling adjacent DW.
The contribution of different physical mechanisms to electrical current induced
depinning were quantified, and it was found that the Oersted field, typically negligible
in monolayer tracks, was responsible for large variations in depinning field in SV tracks,
and that the strength of spin-transfer effect was similar in magnitude to that reported
in monolayer tracks. Finally, current induced ferromagnetic resonance was measured,
and the domain uniform resonant mode was observed, in very good agreement to
Kittel theory and simulations