22 research outputs found

    Improving Phase Change Memory (PCM) and Spin-Torque-Transfer Magnetic-RAM (STT-MRAM) as Next-Generation Memories: A Circuit Perspective

    In the memory hierarchy of computer systems, the traditional semiconductor memories Static RAM (SRAM) and Dynamic RAM (DRAM) have served for several decades as cache and main memory. With technology scaling, they face increasingly intractable challenges in power, density, reliability and scalability. As a result, they become less appealing in the multi/many-core era, with its ever-increasing size and memory intensity of working sets. Recently, there has been increasing interest in using emerging non-volatile memory technologies as replacements for SRAM and DRAM, due to advantages such as non-volatility, high device density, near-zero cell leakage and resilience to soft errors. Among several new memory technologies, Phase Change Memory (PCM) and Spin-Torque-Transfer Magnetic-RAM (STT-MRAM) are the most promising candidates for building main memory and cache, respectively. However, both have unique limitations that prevent them from being effectively adopted. In this dissertation, I present my circuit design work on tackling the limitations of PCM and STT-MRAM. At the bit level, both PCM and STT-MRAM suffer from excessive write energy, and PCM has very limited write endurance. For PCM, I implement Differential Write to remove the large number of unnecessary bit-writes that do not alter the stored data. It is then extended to STT-MRAM as Early Write Termination, with specific optimizations to eliminate the overhead of the pre-write read. At the array level, PCM enjoys high density but cannot provide competitive throughput due to its long write latency and limited number of read/write circuits. I propose a Pseudo-Multi-Port Bank design to exploit intra-bank parallelism by recycling and reusing shared peripheral circuits between accesses in a time-multiplexed manner. On the other hand, although STT-MRAM features satisfactory throughput, its conventional array architecture is constrained in density and scalability by the pitch of the per-column bitline pair. I propose a Common-Source-Line Array architecture which uses a shared source-line along the row, essentially leaving only one bitline per column. For these techniques, I provide circuit-level analyses as well as architecture/system-level and/or process/device-level discussions. In addition, relevant background and work are thoroughly surveyed and potential future research topics are discussed, offering insights into and prospects for these next-generation memories.
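
The idea behind Differential Write, as the abstract describes it, is to skip bit-writes that would not alter the stored data. A minimal bit-level sketch of that idea follows. It is a software model only, not the dissertation's circuit; the function names, the 8-bit word width and the SET/RESET mask representation are assumptions made for illustration.

```python
def differential_write(stored: int, new: int, width: int = 8) -> tuple[int, int]:
    """Return (set_mask, reset_mask): the only bits that need programming.

    Hypothetical model: the word is read first, then compared with the
    incoming data so that unchanged cells are left untouched.
    """
    mask = (1 << width) - 1
    changed = (stored ^ new) & mask   # cells whose value actually differs
    set_mask = changed & new          # 0 -> 1 transitions (SET pulses)
    reset_mask = changed & stored     # 1 -> 0 transitions (RESET pulses)
    return set_mask, reset_mask


def apply_write(stored: int, set_mask: int, reset_mask: int) -> int:
    """Apply only the selected SET/RESET pulses to the stored word."""
    return (stored | set_mask) & ~reset_mask


stored, new = 0b10110010, 0b10011010
set_mask, reset_mask = differential_write(stored, new)
bits_written = bin(set_mask | reset_mask).count("1")
result = apply_write(stored, set_mask, reset_mask)
```

In this example only two of the eight cells are programmed, which is where the savings in write energy and endurance would come from in such a model.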

    Towards Successful Application of Phase Change Memories: Addressing Challenges from Write Operations

    The emerging Phase Change Memory (PCM) technology is drawing increasing attention due to its advantages in non-volatility, byte-addressability and scalability. It is regarded as a promising candidate for future main memory. However, PCM's write operation has limitations that pose challenges to its application in memory: long write latency, high write power and limited write endurance. In this thesis, I present my effort towards successful application of PCM. My research consists of several optimization techniques at both the circuit and architecture levels. First, at the circuit level, I propose Differential Write to remove unnecessary bit changes in PCM writes. This benefits not only endurance but also the energy and latency of writes. Second, I propose two memory scheduling enhancements (AWP and RAWP) for a non-blocking bank design. These enhancements exploit the intra-bank parallelism provided by the non-blocking bank design and achieve significant throughput improvement. Third, I propose Bit Level Power Budgeting (BPB), a fine-grained power budgeting technique that leverages the information from Differential Write to achieve even higher memory throughput under the same power budget. Fourth, I propose techniques to improve the QoS tuning ability of high-priority applications when running on PCM. In summary, the techniques I propose effectively address the challenges of PCM's write operations. In addition, I present the experimental infrastructure used in this work and my vision of potential future research topics, which could be helpful to other researchers in the area.

    Parallel and Distributed Computing

    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (Graphics Processing Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. The articles included in this book thus constitute an excellent reference for engineers and researchers with particular interests in any of these topics in parallel and distributed computing.

    A Reliability Prediction Method for Phase-Change Devices Using Optimized Pulse Conditions

    Owing to the outstanding device characteristics of Phase-Change Random Access Memory (PCRAM), such as high scalability, high speed, good cycling endurance, and compatibility with conventional complementary metal-oxide-semiconductor (CMOS) processes, PCRAM has reached the point of volume production. However, due to the temperature-dependent nature of the phase-change memory device material and the high electrical and thermal stresses applied during the programming operation, the standard methods of high-temperature (Temperature > 125 °C) accelerated retention testing may not be able to accurately predict bit sensing failures or determine the slight pulse condition changes needed if the device were to be programmed at an elevated temperature several times, in an environment where the ambient temperature is between 25 and 125 °C. In this work, a new reliability prediction method, different from standard PCRAM reliability methods, is presented. This new method models and predicts the single combination of temperature and pulse conditions, for temperatures between 25 and 125 °C, that gives the lowest Bit Error Rate (BER). The prediction model was created by monitoring the cell resistance distributions collected from sections of the PCRAM 1 Gigabit (Gb) array after applying a given RESET or SET programming pulse shape at a given temperature in the range of 25 to 125 °C. This model can be used to determine the optimal pulse conditions for a given ambient temperature and to predict the BER and/or data retention loss over large arrays of devices on the Micron/Numonyx 45 nm PCRAM part.

    Adaptation in Standard CMOS Processes with Floating Gate Structures and Techniques

    We apply adaptation into ordinary circuits and systems to achieve high performance, high quality results. Mismatch in manufactured VLSI devices has been the main limiting factor in quality for many analog and mixed-signal designs. Traditional compensation methods are generally costly. A few examples include enlarging the device size, averaging signals, and trimming with laser. By applying floating gate adaptation to standard CMOS circuits, we demonstrate here that we are able to trim CMOS comparator offset to a precision of 0.7mV, reduce CMOS image sensor fixed-pattern noise power by a factor of 100, and achieve 5.8 effective number of bits (ENOB) in a 6-bit flash analog-to-digital converter (ADC) operating at 750MHz. The adaptive circuits generally exhibit special features in addition to an improved performance. These special features are generally beyond the capabilities of traditional CMOS design approaches and they open exciting opportunities in novel circuit designs. Specifically, the adaptive comparator has the ability to store an accurate arbitrary offset, the image sensor can be set up to memorize previously captured scenes like a human retina, and the ADC can be configured to adapt to the incoming analog signal distribution and perform an efficient signal conversion that minimizes distortion and maximizes output entropy

    Fully Integrated High-Performance MEMS Lumped Element Filters for Reconfigurable Radios.

    In this research, we present RF MEMS filters which address the most challenging performance requirements of modern RF front-end systems, namely multi-band processing capability, low energy consumption, and small size. These filters not only provide a wide tuning range for multiple-band selection, but also offer low loss, high power handling capability, fast tuning speed, and temperature stability. Two different technologies are considered for tunable lumped-element filters targeting the UHF range. The first technology is a tunable RF MEMS platform based on surface micromachining, enabling fabrication of continuously tuned capacitors, capacitive and ohmic switches, as well as high-Q inductors, all on a single chip. The filter is in a third-order coupled resonator configuration. Continuous electrostatic tuning is achieved using three tunable capacitor banks, each consisting of one continuously tunable capacitor and three switched capacitors with a pull-in voltage of less than 40 V. The center frequency of the filter is tuned from 1 GHz to 600 MHz while maintaining a 3 dB bandwidth of 13 to 14% and insertion loss of 2%. The filter occupies a small area (1.5 cm x 1.0 cm). This filter shows the best published performance yet in terms of insertion loss, out-of-band rejection, temperature stability, and tuning range. The second technology is based on a new tuning mechanism utilizing phase-change (PC) materials. PC technology has been investigated and adopted in the memory industry due to its fast transition time in the nanosecond range, small size, and high resistance change ratio. Although PC materials offer several benefits, they have not been considered for RF applications because of their limited power handling capability and relatively high on-resistance in their current form. In this work, germanium telluride (GeTe) is considered, as it offers a low on-resistivity and a pronounced resistance change ratio of up to 10^6. To characterize the RF properties of GeTe, different types of RF switches have been fabricated and compared. Such PC switches can be monolithically integrated with other micromachined components to implement reconfigurable front-end modules, potentially offering high tuning speed, low loss, high linearity, and small size.
    PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/98038/1/yhshim_1.pd

    High Temperature Characterization of Ge2Sb2Te5 Thin Films for Phase Change Memory Applications

    The recent proliferation of portable communication devices and data storage equipment is strongly related to the development of memory technology. Non-volatile semiconductor solid-state memories are needed for high-capacity storage media, high-speed operation and low power consumption, with stringent requirements on retention and endurance. Phase change memory (PCM) is currently seen as one of the most promising candidates for a future storage-class memory, with the potential to be close to dynamic random-access memory (DRAM) in speed but with much longer retention times, and as dense as flash memory. PCM devices utilize chalcogenide materials (most commonly Ge2Sb2Te5, or GST) that can be switched rapidly and reversibly between amorphous and crystalline phases with orders of magnitude difference in electrical resistivity. Since PCM devices operate at elevated (current-induced) temperatures and are significantly impacted by thermoelectric effects, it is very important to determine the high-temperature material properties of GST. Resistivity, carrier mobility, and carrier concentration in semiconducting materials are three key parameters indispensable for device modeling. In this work, two measurement setups for high-temperature thin film characterization were developed: a Seebeck setup and a Hall setup. The Seebeck coefficient measurement setup is fully automated, uses resistive and inductive heaters to control the temperature gradient, and can reach temperatures up to ~650 °C. The Hall measurement setup, based on the van der Pauw method for characterization of semiconducting thin films, can measure thin film samples over a wide resistivity range from room temperature to ~500 °C. The resistivity, carrier concentration, and Hall carrier mobility are calculated from I-V measurements with a constant magnetic field applied in the 'up' and 'down' directions. Measurement results on GST thin films with different thicknesses revealed interesting correlations between S-T and ρ-T characteristics and showed that GST behaves as a unipolar p-type semiconducting material from room temperature up to melting. The thermoelectric properties of the GST films were also correlated to the average grain sizes obtained from in-situ XRD measurements during crystallization. These studies show that the activation energy of carriers in mixed-phase amorphous-fcc GST is a linear function of the Peltier coefficient. From these results and the ρ-T characteristics, the expected Seebeck coefficient of single-crystal fcc GST is obtained. Using the experimental results for resistivity and Seebeck coefficient, together with a phase separation model, the temperature-dependent thermal conductivity of the mixed-phase GST is extracted.
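
For orientation, the textbook van der Pauw and Hall relations that a setup of this kind typically relies on can be written as below. This is a sketch of the standard formulas, not the thesis' own derivation. All symbols are assumptions introduced here: R_A and R_B are the two van der Pauw resistances, R_s the sheet resistance, t the film thickness, V(±B) the transverse voltage with the field up or down, I the current, q the elementary charge, and p the hole concentration (the abstract reports p-type behavior).

```latex
\begin{align}
  e^{-\pi R_A / R_s} + e^{-\pi R_B / R_s} &= 1, & \rho &= R_s\, t, \\
  V_H &= \tfrac{1}{2}\bigl[V(+B) - V(-B)\bigr], & p &= \frac{I B}{q\, t\, |V_H|}, \\
  \mu_H &= \frac{1}{q\, p\, \rho}.
\end{align}
```

Taking half the difference of the voltages measured with the field in the two directions cancels the field-independent offset voltage, which is presumably why the abstract mentions both field orientations.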

    A DATA AWARE APPROACH TO SALVAGE THE ENDURANCE OF PHASE-CHANGE MEMORY

    Phase Change Memory (PCM) is an emerging non-volatile memory technology that could either replace or augment DRAM and NAND flash, which are hindered by scalability challenges. PCM suffers from limited endurance, a problem that needs to be alleviated before PCM can be adopted into the memory stack. This thesis is based on the observation that the endurance problem and its ramifications depend on the write data. Accordingly, a data-aware approach is applied to salvage the endurance of PCM at three different stages: pre-write fault avoidance, post-write fault tolerance and post-failure recovery. The pre-write fault avoidance stage aims at reducing the endurance cost of servicing write requests. To this end, Cost Aware Flip Optimization (CAFO) is presented as an efficient technique to lessen the endurance degradation. Essentially, CAFO relies on a cost model that captures the endurance cost of programming memory cells based on their already stored values. Subsequently, the write data is encoded into a form that incurs a lower endurance cost than the original write data. Overall, CAFO is capable of reducing the endurance cost by up to 65% more than the existing schemes. Worn-out PCM cells exhibit a stuck-at fault model, which makes the manifestation of errors dependent on the values that cells are stuck at. When a write fails, the data is rewritten inverted. This dissertation shows that applying data inversion at the post-write fault tolerance stage exploits the data-dependent nature of errors, which enables ECCs to tolerate faults up to double their nominal capability. Furthermore, extensions to RDIS, an ECC designed specifically for the stuck-at fault model, are presented. At the post-failure recovery stage, Data Dependent Sparing is presented to manage bad blocks in PCM. Starting from the observation that defective blocks in the context of the stuck-at fault model still exhibit a low write failure probability due to the data-dependent nature of errors, this thesis takes the approach of reusing blocks that are defective yet better-than-bad through dynamic management of the reserve spare space. Data Dependent Sparing is capable of increasing the lifetime of PCM by up to 18%.
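
The data-dependent nature of stuck-at faults, and the "rewrite inverted on failure" step mentioned in the abstract, can be illustrated with a small model. This is a sketch under stated assumptions, not the thesis' ECC scheme: the function names, the 8-bit word, the mask-based fault description and the assumption that the inversion flag lives in a fault-free cell are all hypothetical.

```python
from typing import Optional


def write_with_inversion(data: int, stuck_mask: int, stuck_values: int,
                         width: int = 8) -> Optional[tuple[int, bool]]:
    """Try to store data as-is, then inverted; return (word, inverted) or None.

    stuck_mask marks worn-out cells; stuck_values gives the value each of
    those cells is stuck at. A write succeeds only when every stuck cell
    already holds the bit we want to store there.
    """
    mask = (1 << width) - 1
    for inverted in (False, True):
        candidate = (~data & mask) if inverted else (data & mask)
        # Stuck cells keep their value no matter what is written to them.
        readback = ((candidate & ~stuck_mask) | (stuck_values & stuck_mask)) & mask
        if readback == candidate:
            return candidate, inverted
    return None  # neither polarity fits; a stronger mechanism is needed


def read_with_inversion(word: int, inverted: bool, width: int = 8) -> int:
    """Undo the inversion recorded by the (assumed fault-free) flag bit."""
    mask = (1 << width) - 1
    return (~word & mask) if inverted else word


# Bit 0 is stuck at 0, but the data wants a 1 there: the plain write fails,
# while the inverted word happens to match the stuck value.
stored = write_with_inversion(0b10100101, stuck_mask=0b1, stuck_values=0b0)
```

In this model a word with a single stuck cell is always writable in one of the two polarities, while two stuck cells can still conflict, which is consistent with the abstract's point that failures depend on the data being written.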

    Bio-inspired learning and hardware acceleration with emerging memories

    Machine Learning has permeated many aspects of engineering, ranging from Internet of Things (IoT) applications to big data analytics. While the computing resources available to implement these algorithms have become more powerful, both in terms of the complexity of problems that can be solved and the overall computing speed, the huge energy costs involved remain a significant challenge. The human brain, which has evolved over millions of years, is widely accepted as the most efficient control and cognitive processing platform. Neuro-biological studies have established that information processing in the human brain relies on impulse-like signals, called action potentials, emitted by neurons. Motivated by these facts, Spiking Neural Networks (SNNs), which are a bio-plausible version of neural networks, have been proposed as an alternative computing paradigm in which the timing of spikes generated by artificial neurons is central to learning and inference. This dissertation demonstrates the computational power of SNNs using conventional CMOS and emerging nanoscale hardware platforms. The first half of this dissertation presents an SNN architecture which is trained using a supervised spike-based learning algorithm for the handwritten digit classification problem. This network achieves an accuracy of 98.17% on the MNIST test data-set, with about 4X fewer parameters compared to the state-of-the-art neural networks achieving over 99% accuracy. In addition, a scheme for parallelizing and speeding up the SNN simulation on a GPU platform is presented. The second half of this dissertation presents an optimal hardware design for accelerating SNN inference and training with SRAM (Static Random Access Memory) and nanoscale non-volatile memory (NVM) crossbar arrays. Three prominent NVM devices are studied for realizing hardware accelerators for SNNs: Phase Change Memory (PCM), Spin Transfer Torque RAM (STT-RAM) and Resistive RAM (RRAM). The analysis shows that a spike-based inference engine with crossbar arrays of STT-RAM bit-cells is 2X and 5X more efficient compared to PCM and RRAM memories, respectively. Furthermore, the STT-RAM design has nearly 6X higher throughput per unit Watt per unit area than an equivalent SRAM-based design. A hardware accelerator with on-chip learning on an STT-RAM memory array is also designed, requiring 16 bits of floating-point synaptic weight precision to reach the baseline SNN algorithmic performance on the MNIST dataset. The complete design with the STT-RAM crossbar array achieves nearly 20X higher throughput per unit Watt per unit mm^2 than an equivalent design with SRAM memory. In summary, this work demonstrates the potential of spike-based neuromorphic computing algorithms and their efficient realization in hardware based on conventional CMOS as well as emerging technologies. The schemes presented here can be further extended to design spike-based systems that can be ubiquitously deployed for energy- and memory-constrained edge computing applications.