75 research outputs found

    Improving Phase Change Memory (PCM) and Spin-Torque-Transfer Magnetic-RAM (STT-MRAM) as Next-Generation Memories: A Circuit Perspective

    Get PDF
    In the memory hierarchy of computer systems, the traditional semiconductor memories Static RAM (SRAM) and Dynamic RAM (DRAM) have already served for several decades as cache and main memory. With technology scaling, they face increasingly intractable challenges like power, density, reliability and scalability. As a result, they become less appealing in the multi/many-core era with ever increasing size and memory-intensity of working sets. Recently, there is an increasing interest in using emerging non-volatile memory technologies in replacement of SRAM and DRAM, due to their advantages like non-volatility, high device density, near-zero cell leakage and resilience to soft errors. Among several new memory technologies, Phase Change Memory (PCM) and Spin-Torque-Transfer Magnetic-RAM (STT-MRAM) are most promising candidates in building main memory and cache, respectively. However, both of them possess unique limitations that preventing them from being effectively adopted. In this dissertation, I present my circuit design work on tackling the limitations of PCM and STT-MRAM. At bit level, both PCM and STT-MRAM suffer from excessive write energy, and PCM has very limited write endurance. For PCM, I implement Differential Write to remove large number of unnecessary bit-writes that do not alter the stored data. It is then extended to STT-MRAM as Early Write Termination, with specific optimizations to eliminate the overhead of pre-write read. At array level, PCM enjoys high density but could not provide competitive throughput due to its long write latency and limited number of read/write circuits. I propose a Pseudo-Multi-Port Bank design to exploit intra-bank parallelism by recycling and reusing shared peripheral circuits between accesses in a time-multiplexed manner. On the other hand, although STT-MRAM features satisfactory throughput, its conventional array architecture is constrained on density and scalability by the pitch of the per-column bitline pair. I propose a Common-Source-Line Array architecture which uses a shared source-line along the row, essentially leaving only one bitline per column. For these techniques, I provide circuit level analyses as well as architecture/system level and/or process/device level discussions. In addition, relevant background and work are thoroughly surveyed and potential future research topics are discussed, offering insights and prospects of these next-generation memories

    Design techniques for dense embedded memory in advanced CMOS technologies

    Get PDF
    University of Minnesota Ph.D. dissertation. February 2012. Major: Electrical Engineering. Advisor: Chris H. Kim. 1 computer file (PDF); viii, 116 pages.On-die cache memory is a key component in advanced processors since it can boost micro-architectural level performance at a moderate power penalty. Demand for denser memories only going to increase as the number of cores in a microprocessor goes up with technology scaling. A commensurate increase in the amount of cache memory is needed to fully utilize the larger and more powerful processing units. 6T SRAMs have been the embedded memory of choice for modern microprocessors due to their logic compatibility, high speed, and refresh-free operation. However, the relatively large cell size and conflicting requirements for read and write make aggressive scaling of 6T SRAMs challenging in sub-22 nm. In this dissertation, circuit techniques and simulation methodologies are presented to demonstrate the potential of alternative options such as gain cell eDRAMs and spin-torque-transfer magnetic RAMs (STT-MRAMs) for high density embedded memories.Three unique test chip designs are presented to enhance the retention time and access speed of gain cell eDRAMs. Proposed bit-cells utilize preferential boostings, beneficial couplings, and aggregated cell leakages for expanding signal window between data `1' and `0'. The design space of power-delay product can be further enhanced with various assist schemes that harness the innate properties of gain cell eDRAMs. Experimental results from the test chips demonstrate that the proposed gain cell eDRAMs achieve overall faster system performances and lower static power dissipations than SRAMs in a generic 65 nm low-power (LP) CMOS process. A magnetic tunnel junction (MTJ) scaling scenario and an efficient HSPICE simulation methodology are proposed for exploring the scalability of STT-MRAMs under variation effects from 65 nm to 8 nm. A constant JC0*RA/VDD scaling method is adopted to achieve optimal read and write performances of STT-MRAMs and thermal stabilities for a 10 year retention are achieved by adjusting free layer thicknesses as well as projecting crystalline anisotropy improvements. Studies based on the proposed methodology show that in-plane STT-MRAM will outperform SRAM from 15 nm node, while its perpendicular counterpart requires further innovations in MTJ material properties in order to overcome the poor write performance from 22 nm node

    Low-Power, Low-Voltage SRAM Circuits Design For Nanometric CMOS Technologies

    Get PDF
    Embedded SRAM memory is a vital component in modern SoCs. More than 80% of the System-on-Chip (SoC) die area is often occupied by SRAM arrays. As such, system reliability and yield is largely governed by the SRAM's performance and robustness. The aggressive scaling trend in CMOS device minimum feature size, coupled with the growing demand in high-capacity memory integration, has imposed the use of minimal size devices to realize a memory bitcell. The smallest 6T SRAM bitcell to date occupies a 0.1um2 in silicon area. SRAM bitcells continue to benefit from an aggressive scaling trend in CMOS technologies. Unfortunately, other system components, such as interconnects, experience a slower scaling trend. This has resulted in dramatic deterioration in a cell's ability to drive a heavily-loaded interconnects. Moreover, the growing fluctuation in device properties due to Process, Voltage, and Temperature (PVT) variations has added more uncertainty to SRAM operation. Thus ensuring the ability of a miniaturized cell to drive heavily-loaded bitlines and to generate adequate voltage swing is becoming challenging. A large percentage of state-of-the-art SoC system failures are attributed to the inability of SRAM cells to generate the targeted bitline voltage swing within a given access time. The use of read-assist mechanisms and current mode sense amplifiers are the two key strategies used to surmount bitline loading effects. On the other hand, new bitcell topologies and cell supply voltage management are used to overcome fluctuations in device properties. In this research we tackled conventional 6T SRAM bitcell limited drivability by introducing new integrated voltage sensing schemes and current-mode sense amplifiers. The proposed schemes feature a read-assist mechanism. The proposed schemes' functionality and superiority over existing schemes are verified using transient and statistical SPICE simulations. Post-layout extracted views of the devices are used for realistic simulation results. Low-voltage operated SRAM reliability and yield enhancement is investigated and a wordline boost technique is proposed as a means to manage the cell's WL operating voltage. The proposed wordline driver design shows a significant improvement in reliability and yield in a 400-mV 6T SRAM cell. The proposed wordline driver design exploit the cell's Dynamic Noise Margin (DNM), therefore boost peak level and boost decay rate programmability features are added. SPICE transient and statistical simulations are used to verify the proposed design's functionality. Finally, at a bitcell-level, we proposed a new five-transistor (5T) SRAM bitcell which shows competitive performance and reliability figures of merit compared to the conventional 6T bitcell. The functionality of the proposed cell is verified by post-layout SPICE simulations. The proposed bitcell topology is designed, implemented and fabricated in a standard ST CMOS 65nm technology process. A 1.2_ 1.2 mm2 multi-design project test chip consisting of four 32-Kbit (256-row x 128-column) SRAM macros with the required peripheral and timing control units is fabricated. Two of the designed SRAM macros are dedicated for this work, namely, a 32-Kbit 5T macro and a 32-Kbit 6T macro which is used as a comparison reference. Other macros belong to other projects and are not discussed in this document

    Design of High Performance SRAM Based Memory Chip

    Get PDF
    The semiconductor memory SRAM uses bi-stable latch circuit to store the logic data 1 or 0. It differs from Dynamic RAM (DRAM) which needs periodic refreshment operation for the storage of logic data. Depending upon the frequency of operation SRAM power consumption varies i.e. it consumes very high power at higher frequencies like DRAM. The Cache memory present in the microprocessor needs high speed memory hence SRAM can be used for that purpose in microprocessors. The DRAM is normally used in the Main memory of processors, where importance is given to the density than its speed. The SRAM is also used in industrial subsystems, scientific and automotive electronics. In this thesis 16-Kb Memory is designed by using memory banking method in UMC 90nm technology ,which operates at a frequency of 1GHz.The post layout simulation for the complete design is performed and also obtained power analysis for the overall design. All peripherals like pre-charge, Row Decoder, Word line driver, Sense amplifier, Column Decoder/Mux and write driver are designed and layouts of all the above peripherals also drawn in an optimised manner such that their layout occupies minimum area. The 6T SRAM cell is designed with operating frequency of 8 GHz and stability analysis are also performed for single SRAM cell. The layout of Single SRAM cell is drawn in a symmetric manner, such that two adjacent cells can share same contact, which results reduction in the area of cell layout. The Static Noise Margin, Read noise margin and Write Noise Margin of single cell are found to be 240mV, 115mV and 425mV respectively for a supply voltage of 1V.The effect of pull-up ratio and cell ratio on the stability of SRAM cell is observed

    차세대 HBM 용 고집적, 저전력 송수신기 설계

    Get PDF
    학위논문 (박사) -- 서울대학교 대학원 : 공과대학 전기·정보공학부, 2020. 8. 정덕균.This thesis presents design techniques for high-density power-efficient transceiver for the next-generation high bandwidth memory (HBM). Unlike the other memory interfaces, HBM uses a 3D-stacked package using through-silicon via (TSV) and a silicon interposer. The transceiver for HBM should be able to solve the problems caused by the 3D-stacked package and TSV. At first, a data (DQ) receiver for HBM with a self-tracking loop that tracks a phase skew between DQ and data strobe (DQS) due to a voltage or thermal drift is proposed. The self-tracking loop achieves low power and small area by uti-lizing an analog-assisted baud-rate phase detector. The proposed pulse-to-charge (PC) phase detector (PD) converts the phase skew to a voltage differ-ence and detects the phase skew from the voltage difference. An offset calibra-tion scheme that can compensates for a mismatch of the PD is also proposed. The proposed calibration scheme operates without any additional sensing cir-cuits by taking advantage of the write training of HBM. Fabricated in 65 nm CMOS, the DQ receiver shows a power efficiency of 370 fJ/b at 4.8 Gb/s and occupies 0.0056 mm2. The experimental results show that the DQ receiver op-erates without any performance degradation under a ± 10% supply variation. In a second prototype IC, a high-density transceiver for HBM with a feed-forward-equalizer (FFE)-combined crosstalk (XT) cancellation scheme is pre-sented. To compensate for the XT, the transmitter pre-distorts the amplitude of the FFE output according to the XT. Since the proposed XT cancellation (XTC) scheme reuses the FFE implemented to equalize the channel loss, additional circuits for the XTC is minimized. Thanks to the XTC scheme, a channel pitch can be significantly reduced, allowing for the high channel density. Moreover, the 3D-staggered channel structure removes the ground layer between the verti-cally adjacent channels, which further reduces a cross-sectional area of the channel per lane. The test chip including 6 data lanes is fabricated in 65 nm CMOS technology. The 6-mm channels are implemented on chip to emulate the silicon interposer between the HBM and the processor. The operation of the XTC scheme is verified by simultaneously transmitting 4-Gb/s data to the 6 consecutive channels with 0.5-um pitch and the XTC scheme reduces the XT-induced jitter up to 78 %. The measurement result shows that the transceiver achieves the throughput of 8 Gb/s/um. The transceiver occupies 0.05 mm2 for 6 lanes and consumes 36.6 mW at 6 x 4 Gb/s.본 논문에서는 차세대 HBM을 위한 고집적 저전력 송수신기 설계 방법을 제안한다. 첫 번째로, 전압 및 온도 변화에 의한 데이터와 클럭 간 위상 차이를 보상할 수 있는 자체 추적 루프를 가진 데이터 수신기를 제안한다. 제안하는 자체 추적 루프는 데이터 전송 속도와 같은 속도로 동작하는 위상 검출기를 사용하여 전력 소모와 면적을 줄였다. 또한 메모리의 쓰기 훈련 (write training) 과정을 이용하여 효과적으로 위상 검출기의 오프셋을 보상할 수 있는 방법을 제안한다. 제안하는 데이터 수신기는 65 nm 공정으로 제작되어 4.8 Gb/s에서 370 fJ/b을 소모하였다. 또한 10 % 의 전압 변화에 대하여 안정적으로 동작하는 것을 확인하였다. 두 번째로, 피드 포워드 이퀄라이저와 결합된 크로스 토크 보상 방식을 활용한 고집적 송수신기를 제안한다. 제안하는 송신기는 크로스 토크 크기에 해당하는 만큼 송신기 출력을 왜곡하여 크로스 토크를 보상한다. 제안하는 크로스 토크 보상 방식은 채널 손실을 보상하기 위해 구현된 피드 포워드 이퀄라이저를 재활용함으로써 추가적인 회로를 최소화한다. 제안하는 송수신기는 크로스 토크가 보상 가능하기 때문에, 채널 간격을 크게 줄여 고집적 통신을 구현하였다. 또한 집적도를 더 증가시키기 위해 세로로 인접한 채널 사이의 차폐 층을 제거한 적층 채널 구조를 제안한다. 6개의 송수신기를 포함한 프로토타입 칩은 65 nm 공정으로 제작되었다. HBM과 프로세서 사이의 silicon interposer channel 을 모사하기 위한 6 mm 의 채널이 칩 위에 구현되었다. 제안하는 크로스 토크 보상 방식은 0.5 um 간격의 6개의 인접한 채널에 동시에 데이터를 전송하여 검증되었으며, 크로스 토크로 인한 지터를 최대 78 % 감소시켰다. 제안하는 송수신기는 8 Gb/s/um 의 처리량을 가지며 6 개의 송수신기가 총 36.6 mW의 전력을 소모하였다.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 4 CHAPTER 2 BACKGROUND ON HIGH-BANDWIDTH MEMORY 6 2.1 OVERVIEW 6 2.2 TRANSCEIVER ARCHITECTURE 10 2.3 READ/WRITE OPERATION 15 2.3.1 READ OPERATION 15 2.3.2 WRITE OPERATION 19 CHAPTER 3 BACKGROUNDS ON COUPLED WIRES 21 3.1 GENERALIZED MODEL 21 3.2 EFFECT OF CROSSTALK 26 CHAPTER 4 DQ RECEIVER WITH BAUD-RATE SELF-TRACKING LOOP 29 4.1 OVERVIEW 29 4.2 FEATURES OF DQ RECEIVER FOR HBM 33 4.3 PROPOSED PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.1 OPERATION OF PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.2 OFFSET CALIBRATION 37 4.3.3 OPERATION SEQUENCE 39 4.4 CIRCUIT IMPLEMENTATION 42 4.5 MEASUREMENT RESULT 46 CHAPTER 5 HIGH-DENSITY TRANSCEIVER FOR HBM WITH 3D-STAGGERED CHANNEL AND CROSSTALK CANCELLATION SCHEME 57 5.1 OVERVIEW 57 5.2 PROPOSED 3D-STAGGERED CHANNEL 61 5.2.1 IMPLEMENTATION OF 3D-STAGGERED CHANNEL 61 5.2.2 CHANNEL CHARACTERISTICS AND MODELING 66 5.3 PROPOSED FEED-FORWARD-EQUALIZER-COMBINED CROSSTALK CANCELLATION SCHEME 72 5.4 CIRCUIT IMPLEMENTATION 77 5.4.1 OVERALL ARCHITECTURE 77 5.4.2 TRANSMITTER WITH FFE-COMBINED XTC 79 5.4.3 RECEIVER 81 5.5 MEASUREMENT RESULT 82 CHAPTER 6 CONCLUSION 93 BIBLIOGRAPHY 95 초 록 102Docto

    Emerging embedded nonvolatile memory solution for ultra low power microcontroller systems

    Get PDF
    13301甲第4810号博士(工学)金沢大学博士論文本文Full 以下に掲載および掲載予定:1.IEEE Journal of Solid-State Circuits 27(4) pp.569-573 1992. IEEE. 共著者:M. Hayashikoshi, H. Hidaka, K. Arimoto, K. Fujishima 2.IEEE Transactions on Multi-Scale Computing Systems IEEE. 共著者:M. Hayashikoshi, H. Noda, H. Kawai, Y. Murai, S. Otani, K. Nii, Y. Matsuda, H. Kond

    Reliability in the face of variability in nanometer embedded memories

    Get PDF
    In this thesis, we have investigated the impact of parametric variations on the behaviour of one performance-critical processor structure - embedded memories. As variations manifest as a spread in power and performance, as a first step, we propose a novel modeling methodology that helps evaluate the impact of circuit-level optimizations on architecture-level design choices. Choices made at the design-stage ensure conflicting requirements from higher-levels are decoupled. We then complement such design-time optimizations with a runtime mechanism that takes advantage of adaptive body-biasing to lower power whilst improving performance in the presence of variability. Our proposal uses a novel fully-digital variation tracking hardware using embedded DRAM (eDRAM) cells to monitor run-time changes in cache latency and leakage. A special fine-grain body-bias generator uses the measurements to generate an optimal body-bias that is needed to meet the required yield targets. A novel variation-tolerant and soft-error hardened eDRAM cell is also proposed as an alternate candidate for replacing existing SRAM-based designs in latency critical memory structures. In the ultra low-power domain where reliable operation is limited by the minimum voltage of operation (Vddmin), we analyse the impact of failures on cache functional margin and functional yield. Towards this end, we have developed a fully automated tool (INFORMER) capable of estimating memory-wide metrics such as power, performance and yield accurately and rapidly. Using the developed tool, we then evaluate the #effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Having a holistic perspective of memory-wide metrics helps us arrive at design-choices optimized simultaneously for multiple metrics needed for maintaining lifetime requirements

    Design of Logic-Compatible Embedded Flash Memories for Moderate Density On-Chip Non-Volatile Memory Applications

    Get PDF
    University of Minnesota Ph.D. dissertation. December 2013. Major: Electrical Engineering. Advisor: Chris H. Kim. 1 computer file (PDF); xx, 129 pages.An on-chip embedded NVM (eNVM) enables a zero-standby power system-on-a-chip with a smaller form factor, faster access speed, lower access power, and higher security than an off-chip NVM. Differently from the high density eNVM technologies such as dual-poly eflash, FeRAM, STT-MRAM, and RRAM that typically require process overhead beyond standard logic process, the moderate density eNVM technologies such as e-fuse, anti-fuse, and single-poly embedded flash (eflash) can be fabricated in a standard logic process with no process overhead. Among them, a single-poly eflash is a unique multiple-time programmable moderate density eNVM, while it is expected to play a key role in mitigating variability and reliability issues of the future VLSI technologies; however, the challenges such as a high voltage disturbance, an implementation of logic compatible High Voltage Switch (HVS), and a limited sensing margin are required to be solved for its implementation using a standard I/O device. This thesis focuses on alleviating such challenges of the single-poly eflash memory with three single-poly eflash designs proposed in a generic logic process for moderate density eNVM applications. Firstly, the proposed 5T eflash features a WL-by-WL accessible architecture with no disturbance issue of the unselected WL cells, an overstress-free multi-story HVS expanding the cell sensing margin, and a selective WL refresh scheme for the higher cell endurance. The most favorable eflash cell configuration is also studied when the performance, endurance, retention, and disturbance characteristics are all considered. Secondly, the proposed 6T eflash features the bit-by-bit re-write capability for the higher overall cell endurance, while not disturbing the unselected WL cells. The logic compatible on-chip charge pump to provide the appropriate high voltages for the proposed eflash operations is also discussed. Finally, the proposed 10T eflash features a multi-configurable HVS that does not require the boosted read supplies, and a differential cell architecture with improved retention time. All these proposed eflash memories were implemented in a 65nm standard logic process, and the test chip measurement results confirmed the functionality of the proposed designs with a reasonable retention margin, showing the competitiveness of the proposed eflash memories compared to the other moderate density eNVM candidates

    A design methodology for robust, energy-efficient, application-aware memory systems

    Get PDF
    Memory design is a crucial component of VLSI system design from area, power and performance perspectives. To meet the increasingly challenging system specifications, architecture, circuit and device level innovations are required for existing memory technologies. Emerging memory solutions are widely explored to cater to strict budgets. This thesis presents design methodologies for custom memory design with the objective of power-performance benefits across specific applications. Taking example of STTRAM (spin transfer torque random access memory) as an emerging memory candidate, the design space is explored to find optimal energy design solution. A thorough thermal reliability study is performed to estimate detection reliability challenges and circuit solutions are proposed to ensure reliable operation. Adoption of the application-specific optimal energy solution is shown to yield considerable energy benefits in a read-heavy application called MBC (memory based computing). Circuit level customizations are studied for the volatile SRAM (static random access memory) memory, which will provide improved energy-delay product (EDP) for the same MBC application. Memory design has to be aware of upcoming challenges from not only the application nature but also from the packaging front. Taking 3D die-folding as an example, SRAM performance shift under die-folding is illustrated. Overall the thesis demonstrates how knowledge of the system and packaging can help in achieving power efficient and high performance memory design.Ph.D

    Memristive Non-Volatile Memory Based on Graphene Materials

    Get PDF
    Resistive random access memory (RRAM), which is considered as one of the most promising next-generation non-volatile memory (NVM) devices and a representative of memristor technologies, demonstrated great potential in acting as an artificial synapse in the industry of neuromorphic systems and artificial intelligence (AI), due its advantages such as fast operation speed, low power consumption, and high device density. Graphene and related materials (GRMs), especially graphene oxide (GO), acting as active materials for RRAM devices, are considered as a promising alternative to other materials including metal oxides and perovskite materials. Herein, an overview of GRM-based RRAM devices is provided, with discussion about the properties of GRMs, main operation mechanisms for resistive switching (RS) behavior, figure of merit (FoM) summary, and prospect extension of GRM-based RRAM devices. With excellent physical and chemical advantages like intrinsic Young’s modulus (1.0 TPa), good tensile strength (130 GPa), excellent carrier mobility (2.0 × 105 cm2∙V−1∙s−1), and high thermal (5000 Wm−1∙K−1) and superior electrical conductivity (1.0 × 106 S∙m−1), GRMs can act as electrodes and resistive switching media in RRAM devices. In addition, the GRM-based interface between electrode and dielectric can have an effect on atomic diffusion limitation in dielectric and surface effect suppression. Immense amounts of concrete research indicate that GRMs might play a significant role in promoting the large-scale commercialization possibility of RRAM devices
    corecore