41 research outputs found

    ARCHITECTING EMERGING MEMORY TECHNOLOGIES FOR ENERGY-EFFICIENT COMPUTING IN MODERN PROCESSORS

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Circuit and Architecture Co-Design of STT-RAM for High Performance and Low Energy

    Get PDF
    Spin-Transfer Torque Random Access Memory (STT-RAM) has been proved a promising emerging nonvolatile memory technology suitable for many applications such as cache mem- ory of CPU. Compared with other conventional memory technology, STT-RAM offers many attractive features such as nonvolatility, fast random access speed and extreme low leakage power. However, STT-RAM is still facing many challenges. First of all, programming STT-RAM is a stochastic process due to random thermal fluctuations, so the write errors are hard to avoid. Secondly, the existing STT-RAM cell designs can be used for only single-port accesses, which limits the memory access bandwidth and constraints the system performance. Finally, while other memory technology supports multi-level cell (MLC) design to boost the storage density, adopting MLC to STT-RAM brings many disadvantages such as requirement for large transistor and low access speed. In this work, we proposed solutions on both circuit and architecture level to address these challenges. For the write error issues, we proposed two probabilistic methods, namely write-verify- rewrite with adaptive period (WRAP) and verify-one-while-writing (VOW), for performance improvement and write failure reduction. For dual-port solution, we propose the design methods to support dual-port accesses for STT-RAM. The area increment by introducing an additional port is reduced by leveraging the shared source-line structure. Detailed analysis on the performance/reliability degrada- tion caused by dual-port accesses is performed, and the corresponding design optimization is provided. To unleash the potential of MLC STT-RAM cache, we proposed a new design through a cross-layer co-optimization. The memory cell structure integrated the reversed stacking of magnetic junction tunneling (MTJ) for a more balanced device and design trade-off. In architecture development, we presented an adaptive mode switching mechanism: based on application’s memory access behavior, the MLC STT-RAM cache can dynamically change between low latency SLC mode and high capacity MLC mode. Finally, we present a 4Kb test chip design which can support different types and sizes of MTJs. A configurable sensing solution is used in the test chip so that it can support wide range of MTJ resistance. Such test chip design can help to evaluate various type of MTJs in the future

    Gestión de jerarquías de memoria híbridas a nivel de sistema

    Get PDF
    Tesis inédita de la Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadoras y Automática y de Ku Leuven, Arenberg Doctoral School, Faculty of Engineering Science, leída el 11/05/2017.In electronics and computer science, the term ‘memory’ generally refers to devices that are used to store information that we use in various appliances ranging from our PCs to all hand-held devices, smart appliances etc. Primary/main memory is used for storage systems that function at a high speed (i.e. RAM). The primary memory is often associated with addressable semiconductor memory, i.e. integrated circuits consisting of silicon-based transistors, used for example as primary memory but also other purposes in computers and other digital electronic devices. The secondary/auxiliary memory, in comparison provides program and data storage that is slower to access but offers larger capacity. Examples include external hard drives, portable flash drives, CDs, and DVDs. These devices and media must be either plugged in or inserted into a computer in order to be accessed by the system. Since secondary storage technology is not always connected to the computer, it is commonly used for backing up data. The term storage is often used to describe secondary memory. Secondary memory stores a large amount of data at lesser cost per byte than primary memory; this makes secondary storage about two orders of magnitude less expensive than primary storage. There are two main types of semiconductor memory: volatile and nonvolatile. Examples of non-volatile memory are ‘Flash’ memory (sometimes used as secondary, sometimes primary computer memory) and ROM/PROM/EPROM/EEPROM memory (used for firmware such as boot programs). Examples of volatile memory are primary memory (typically dynamic RAM, DRAM), and fast CPU cache memory (typically static RAM, SRAM, which is fast but energy-consuming and offer lower memory capacity per are a unit than DRAM). Non-volatile memory technologies in Si-based electronics date back to the 1990s. Flash memory is widely used in consumer electronic products such as cellphones and music players and NAND Flash-based solid-state disks (SSDs) are increasingly displacing hard disk drives as the primary storage device in laptops, desktops, and even data centers. The integration limit of Flash memories is approaching, and many new types of memory to replace conventional Flash memories have been proposed. The rapid increase of leakage currents in Silicon CMOS transistors with scaling poses a big challenge for the integration of SRAM memories. There is also the case of susceptibility to read/write failure with low power schemes. As a result of this, over the past decade, there has been an extensive pooling of time, resources and effort towards developing emerging memory technologies like Resistive RAM (ReRAM/RRAM), STT-MRAM, Domain Wall Memory and Phase Change Memory(PRAM). Emerging non-volatile memory technologies promise new memories to store more data at less cost than the expensive-to build silicon chips used by popular consumer gadgets including digital cameras, cell phones and portable music players. These new memory technologies combine the speed of static random-access memory (SRAM), the density of dynamic random-access memory (DRAM), and the non-volatility of Flash memory and so become very attractive as another possibility for future memory hierarchies. The research and information on these Non-Volatile Memory (NVM) technologies has matured over the last decade. These NVMs are now being explored thoroughly nowadays as viable replacements for conventional SRAM based memories even for the higher levels of the memory hierarchy. Many other new classes of emerging memory technologies such as transparent and plastic, three-dimensional(3-D), and quantum dot memory technologies have also gained tremendous popularity in recent years...En el campo de la informática, el término ‘memoria’ se refiere generalmente a dispositivos que son usados para almacenar información que posteriormente será usada en diversos dispositivos, desde computadoras personales (PC), móviles, dispositivos inteligentes, etc. La memoria principal del sistema se utiliza para almacenar los datos e instrucciones de los procesos que se encuentre en ejecución, por lo que se requiere que funcionen a alta velocidad (por ejemplo, DRAM). La memoria principal está implementada habitualmente mediante memorias semiconductoras direccionables, siendo DRAM y SRAM los principales exponentes. Por otro lado, la memoria auxiliar o secundaria proporciona almacenaje(para ficheros, por ejemplo); es más lenta pero ofrece una mayor capacidad. Ejemplos típicos de memoria secundaria son discos duros, memorias flash portables, CDs y DVDs. Debido a que estos dispositivos no necesitan estar conectados a la computadora de forma permanente, son muy utilizados para almacenar copias de seguridad. La memoria secundaria almacena una gran cantidad de datos aun coste menor por bit que la memoria principal, siendo habitualmente dos órdenes de magnitud más barata que la memoria primaria. Existen dos tipos de memorias de tipo semiconductor: volátiles y no volátiles. Ejemplos de memorias no volátiles son las memorias Flash (algunas veces usadas como memoria secundaria y otras veces como memoria principal) y memorias ROM/PROM/EPROM/EEPROM (usadas para firmware como programas de arranque). Ejemplos de memoria volátil son las memorias DRAM (RAM dinámica), actualmente la opción predominante a la hora de implementar la memoria principal, y las memorias SRAM (RAM estática) más rápida y costosa, utilizada para los diferentes niveles de cache. Las tecnologías de memorias no volátiles basadas en electrónica de silicio se remontan a la década de1990. Una variante de memoria de almacenaje por carga denominada como memoria Flash es mundialmente usada en productos electrónicos de consumo como telefonía móvil y reproductores de música mientras NAND Flash solid state disks(SSDs) están progresivamente desplazando a los dispositivos de disco duro como principal unidad de almacenamiento en computadoras portátiles, de escritorio e incluso en centros de datos. En la actualidad, hay varios factores que amenazan la actual predominancia de memorias semiconductoras basadas en cargas (capacitivas). Por un lado, se está alcanzando el límite de integración de las memorias Flash, lo que compromete su escalado en el medio plazo. Por otra parte, el fuerte incremento de las corrientes de fuga de los transistores de silicio CMOS actuales, supone un enorme desafío para la integración de memorias SRAM. Asimismo, estas memorias son cada vez más susceptibles a fallos de lectura/escritura en diseños de bajo consumo. Como resultado de estos problemas, que se agravan con cada nueva generación tecnológica, en los últimos años se han intensificado los esfuerzos para desarrollar nuevas tecnologías que reemplacen o al menos complementen a las actuales. Los transistores de efecto campo eléctrico ferroso (FeFET en sus siglas en inglés) se consideran una de las alternativas más prometedores para sustituir tanto a Flash (por su mayor densidad) como a DRAM (por su mayor velocidad), pero aún está en una fase muy inicial de su desarrollo. Hay otras tecnologías algo más maduras, en el ámbito de las memorias RAM resistivas, entre las que cabe destacar ReRAM (o RRAM), STT-RAM, Domain Wall Memory y Phase Change Memory (PRAM)...Depto. de Arquitectura de Computadores y AutomáticaFac. de InformáticaTRUEunpu

    Towards Successful Application of Phase Change Memories: Addressing Challenges from Write Operations

    Get PDF
    The emerging Phase Change Memory (PCM) technology is drawing increasing attention due to its advantages in non-volatility, byte-addressability and scalability. It is regarded as a promising candidate for future main memory. However, PCM's write operation has some limitations that pose challenges to its application in memory. The disadvantages include long write latency, high write power and limited write endurance. In this thesis, I present my effort towards successful application of PCM memory. My research consists of several optimizing techniques at both the circuit and architecture level. First, at the circuit level, I propose Differential Write to remove unnecessary bit changes in PCM writes. This is not only beneficial to endurance but also to the energy and latency of writes. Second, I propose two memory scheduling enhancements (AWP and RAWP) for a non-blocking bank design. My memory scheduling enhancements can exploit intra-bank parallelism provided by non-blocking bank design, and achieve significant throughput improvement. Third, I propose Bit Level Power Budgeting (BPB), a fine-grained power budgeting technique that leverages the information from Differential Write to achieve even higher memory throughput under the same power budget. Fourth, I propose techniques to improve the QoS tuning ability of high-priority applications when running on PCM memory. In summary, the techniques I propose effectively address the challenges of PCM's write operations. In addition, I present the experimental infrastructure in this work and my visions of potential future research topics, which could be helpful to other researchers in the area

    상변화 메모리 시스템의 간섭 오류 완화 및 RMW 성능 향상 기법

    Get PDF
    학위논문(박사) -- 서울대학교대학원 : 공과대학 전기·정보공학부, 2021.8. 이혁재.Phase-change memory (PCM) announces the beginning of the new era of memory systems, owing to attractive characteristics. Many memory product manufacturers (e.g., Intel, SK Hynix, and Samsung) are developing related products. PCM can be applied to various circumstances; it is not simply limited to an extra-scale database. For example, PCM has a low standby power due to its non-volatility; hence, computation-intensive applications or mobile applications (i.e., long memory idle time) are suitable to run on PCM-based computing systems. Despite these fascinating features of PCM, PCM is still far from the general commercial market due to low reliability and long latency problems. In particular, low reliability is a painful problem for PCM in past decades. As the semiconductor process technology rapidly scales down over the years, DRAM reaches 10 nm class process technology. In addition, it is reported that the write disturbance error (WDE) would be a serious issue for PCM if it scales down below 54 nm class process technology. Therefore, addressing the problem of WDEs becomes essential to make PCM competitive to DRAM. To overcome this problem, this dissertation proposes a novel approach that can restore meta-stable cells on demand by levering two-level SRAM-based tables, thereby significantly reducing the number WDEs. Furthermore, a novel randomized approach is proposed to implement a replacement policy that originally requires hundreds of read ports on SRAM. The second problem of PCM is a long-latency compared to that of DRAM. In particular, PCM tries to enhance its throughput by adopting a larger transaction unit; however, the different unit size from the general-purpose processor cache line further degrades the system performance due to the introduction of a read-modify-write (RMW) module. Since there has never been any research related to RMW in a PCM-based memory system, this dissertation proposes a novel architecture to enhance the overall system performance and reliability of a PCM-based memory system having an RMW module. The proposed architecture enhances data re-usability without introducing extra storage resources. Furthermore, a novel operation that merges commands regardless of command types is proposed to enhance performance notably. Another problem is the absence of a full simulation platform for PCM. While the announced features of the PCM-related product (i.e., Intel Optane) are scarce due to confidential issues, all priceless information can be integrated to develop an architecture simulator that resembles the available product. To this end, this dissertation tries to scrape up all available features of modules in a PCM controller and implement a dedicated simulator for future research purposes.상변화 메모리는(PCM) 매력적인 특성을 통해 메모리 시스템의 새로운 시대의 시작을 알렸다. 많은 메모리 관련 제품 제조업체(예 : 인텔, SK 하이닉스, 삼성)가 관련 제품 개발에 박차를 가하고 있다. PCM은 단순히 대규모 데이터베이스에만 국한되지 않고 다양한 상황에 적용될 수 있다. 예를 들어, PCM은 비휘발성으로 인해 대기 전력이 낮다. 따라서 계산 집약적인 애플리케이션 또는 모바일 애플리케이션은(즉, 긴 메모리 유휴 시간) PCM 기반 컴퓨팅 시스템에서 실행하기에 적합하다. PCM의 이러한 매력적인 특성에도 불구하고 PCM은 낮은 신뢰성과 긴 대기 시간으로 인해 여전히 일반 산업 시장에서는 DRAM과 다소 격차가 있다. 특히 낮은 신뢰성은 지난 수십 년 동안 PCM 기술의 발전을 저해하는 문제다. 반도체 공정 기술이 수년에 걸쳐 빠르게 축소됨에 따라 DRAM은 10nm 급 공정 기술에 도달하였다. 이어서, 쓰기 방해 오류 (WDE)가 54nm 등급 프로세스 기술 아래로 축소되면 PCM에 심각한 문제가 될 것으로 보고되었다. 따라서, WDE 문제를 해결하는 것은 PCM이 DRAM과 동등한 경쟁력을 갖추도록 하는 데 있어 필수적이다. 이 문제를 극복하기 위해 이 논문에서는 2-레벨 SRAM 기반 테이블을 활용하여 WDE 수를 크게 줄여 필요에 따라 준 안정 셀을 복원할 수 있는 새로운 접근 방식을 제안한다. 또한, 원래 SRAM에서 수백 개의 읽기 포트가 필요한 대체 정책을 구현하기 위해 새로운 랜덤 기반의 기법을 제안한다. PCM의 두 번째 문제는 DRAM에 비해 지연 시간이 길다는 것이다. 특히 PCM은 더 큰 트랜잭션 단위를 채택하여 단위시간 당 데이터 처리량 향상을 도모한다. 그러나 범용 프로세서 캐시 라인과 다른 유닛 크기는 읽기-수정-쓰기 (RMW) 모듈의 도입으로 인해 시스템 성능을 저하하게 된다. PCM 기반 메모리 시스템에서 RMW 관련 연구가 없었기 때문에 본 논문은 RMW 모듈을 탑재 한 PCM 기반 메모리 시스템의 전반적인 시스템 성능과 신뢰성을 향상하게 시킬 수 있는 새로운 아키텍처를 제안한다. 제안된 아키텍처는 추가 스토리지 리소스를 도입하지 않고도 데이터 재사용성을 향상시킨다. 또한, 성능 향상을 위해 명령 유형과 관계없이 명령을 병합하는 새로운 작업을 제안한다. 또 다른 문제는 PCM을 위한 완전한 시뮬레이션 플랫폼이 부재하다는 것이다. PCM 관련 제품(예 : Intel Optane)에 대해 발표된 정보는 대외비 문제로 인해 부족하다. 하지만 알려져 있는 정보를 적절히 취합하면 시중 제품과 유사한 아키텍처 시뮬레이터를 개발할 수 있다. 이를 위해 본 논문은 PCM 메모리 컨트롤러에 필요한 모든 모듈 정보를 활용하여 향후 이와 관련된 연구에서 충분히 사용 가능한 전용 시뮬레이터를 구현하였다.1 INTRODUCTION 1 1.1 Limitation of Traditional Main Memory Systems 1 1.2 Phase-Change Memory as Main Memory 3 1.2.1 Opportunities of PCM-based System 3 1.2.2 Challenges of PCM-based System 4 1.3 Dissertation Overview 7 2 BACKGROUND AND PREVIOUS WORK 8 2.1 Phase-Change Memory 8 2.2 Mitigation Schemes for Write Disturbance Errors 10 2.2.1 Write Disturbance Errors 10 2.2.2 Verification and Correction 12 2.2.3 Lazy Correction 13 2.2.4 Data Encoding-based Schemes 14 2.2.5 Sparse-Insertion Write Cache 16 2.3 Performance Enhancement for Read-Modify-Write 17 2.3.1 Traditional Read-Modify-Write 17 2.3.2 Write Coalescing for RMW 19 2.4 Architecture Simulators for PCM 21 2.4.1 NVMain 21 2.4.2 Ramulator 22 2.4.3 DRAMsim3 22 3 IN-MODULE DISTURBANCE BARRIER 24 3.1 Motivation 25 3.2 IMDB: In Module-Disturbance Barrier 29 3.2.1 Architectural Overview 29 3.2.2 Implementation of Data Structures 30 3.2.3 Modification of Media Controller 36 3.3 Replacement Policy 38 3.3.1 Replacement Policy for IMDB 38 3.3.2 Approximate Lowest Number Estimator 40 3.4 Putting All Together: Case Studies 43 3.5 Evaluation 45 3.5.1 Configuration 45 3.5.2 Architectural Exploration 47 3.5.3 Effectiveness of the Replacement Policy 48 3.5.4 Sensitivity to Main Table Configuration 49 3.5.5 Sensitivity to Barrier Buffer Size 51 3.5.6 Sensitivity to AppLE Group Size 52 3.5.7 Comparison with Other Studies 54 3.6 Discussion 59 3.7 Summary 63 4 INTEGRATION OF AN RMW MODULE IN A PCM-BASED SYSTEM 64 4.1 Motivation 65 4.2 Utilization of DRAM Cache for RMW 67 4.2.1 Architectural Design 67 4.2.2 Algorithm 70 4.3 Typeless Command Merging 73 4.3.1 Architectural Design 73 4.3.2 Algorithm 74 4.4 An Alternative Implementation: SRC-RMW 78 4.4.1 Implementation of SRC-RMW 78 4.4.2 Design Constraint 80 4.5 Case Study 82 4.6 Evaluation 85 4.6.1 Configuration 85 4.6.2 Speedup 88 4.6.3 Read Reliability 91 4.6.4 Energy Consumption: Selecting a Proper Page Size 93 4.6.5 Comparison with Other Studies 95 4.7 Discussion 97 4.8 Summary 99 5 AN ALL-INCLUSIVE SIMULATOR FOR A PCM CONTROLLER 100 5.1 Motivation 101 5.2 PCMCsim: PCM Controller Simulator 103 5.2.1 Architectural Overview 103 5.2.2 Underlying Classes of PCMCsim 104 5.2.3 Implementation of Contention Behavior 108 5.2.4 Modules of PCMCsim 109 5.3 Evaluation 116 5.3.1 Correctness of the Simulator 116 5.3.2 Comparison with Other Simulators 117 5.4 Summary 119 6 Conclusion 120 Abstract (In Korean) 141 Acknowledgment 143박
    corecore