4 research outputs found

    A Modern Primer on Processing in Memory

    Modern computing systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in computing that cause performance, scalability and energy bottlenecks: (1) data access is a key bottleneck as many important applications are increasingly data-intensive, and memory bandwidth and energy do not scale well, (2) energy consumption is a key limiter in almost all computing platforms, especially server and mobile systems, (3) data movement, especially off-chip to on-chip, is very expensive in terms of bandwidth, energy and latency, much more so than computation. These trends are felt especially severely in the data-intensive server and energy-constrained mobile systems of today. At the same time, conventional memory technology is facing many technology scaling challenges in terms of reliability, energy, and performance. As a result, memory system architects are open to organizing memory in different ways and making it more intelligent, at the expense of higher cost. The emergence of 3D-stacked memory plus logic, the adoption of error correcting codes inside the latest DRAM chips, the proliferation of different main memory standards and chips specialized for different purposes (e.g., graphics, low-power, high bandwidth, low latency), and the necessity of designing new solutions to serious reliability and security issues, such as the RowHammer phenomenon, are evidence of this trend. This chapter discusses recent research that aims to practically enable computation close to data, an approach we call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside the memory chips, in the logic layer of 3D-stacked memory, or in the memory controllers), so that data movement between the computation units and memory is reduced or eliminated. Comment: arXiv admin note: substantial text overlap with arXiv:1903.0398

    Architectural Techniques for Multi-Level Cell Phase Change Memory Based Main Memory

    Phase change memory (PCM) has recently emerged as a promising technology to meet the fast-growing demand for large-capacity main memory in modern computing systems. Multi-level cell (MLC) PCM, which stores multiple bits in a single cell, offers high density with low per-byte fabrication cost. However, PCM suffers from long write latency, short cell endurance, limited write throughput and high peak power, which make it challenging to integrate into the memory hierarchy. To address the long write latency, I propose write truncation to reduce the number of write iterations with the assistance of an extra error correction code (ECC). I also propose form switch (FS) to reduce the storage overhead of the ECC. By storing highly compressible lines in single-level cell (SLC) form, FS improves read latency as well. To attack the short cell endurance and large peak power, I propose elastic RESET (ER) to construct triple-level cell PCM. By reducing RESET energy, ER significantly reduces peak power and prolongs PCM lifetime. To improve write concurrency, I propose fine-grained write power budgeting (FPB), which observes a global power budget and regulates power across write iterations according to the step-down power demand of each iteration. A global charge pump is also integrated onto a DIMM to boost power for hot PCM chips while staying within the global power budget. To further reduce the peak power, I propose intra-write RESET scheduling, which distributes cell RESET initializations across the whole duration of the write operation, so that the on-chip charge pump size can also be reduced.
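The intuition behind budgeting power per write iteration, rather than per write, can be illustrated with a toy model. This is only a sketch under stated assumptions and not the thesis's FPB mechanism: the function name `finish_cycle`, the power profile `[8, 4, 2, 1]`, the budget of 16 units, and the one-iteration-per-cycle timing are all hypothetical.

```python
def finish_cycle(num_writes, profile, budget, per_iteration):
    """Return the cycle at which the last of `num_writes` identical writes completes.

    profile       -- hypothetical power demand of each write iteration (step-down)
    budget        -- hypothetical global power budget available in every cycle
    per_iteration -- True: reserve each iteration's actual demand;
                     False: reserve the peak demand for the whole write
    """
    # Power a single write reserves in each of its cycles.
    if per_iteration:
        demand = list(profile)
    else:
        demand = [max(profile)] * len(profile)

    usage = {}  # cycle -> total power already reserved
    finish = 0
    for _ in range(num_writes):
        start = 0
        # Delay the write until every one of its cycles fits under the budget.
        while any(usage.get(start + i, 0) + p > budget
                  for i, p in enumerate(demand)):
            start += 1
        for i, p in enumerate(demand):
            usage[start + i] = usage.get(start + i, 0) + p
        finish = max(finish, start + len(demand))
    return finish


PROFILE = [8, 4, 2, 1]  # assumed: first iteration is the most power-hungry
BUDGET = 16             # assumed global budget, same arbitrary units

peak_based = finish_cycle(4, PROFILE, BUDGET, per_iteration=False)
fine_grained = finish_cycle(4, PROFILE, BUDGET, per_iteration=True)
print(peak_based, fine_grained)
```

In this toy setting, reserving only the power each iteration actually needs lets later writes start while earlier ones are still in their low-power iterations, so the four writes complete in fewer cycles than under peak-based reservation.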

    Multi-Level Main Memory Systems: Technology Choices, Design Considerations, and Trade-off Analysis

    Multi-level main memory systems provide a way to leverage the advantages of different memory technologies to build a main memory that overcomes the limitations of the current flat DRAM-based architecture. The slowdown of DRAM scaling has resulted in the development of new memory technologies that potentially enable the continued improvement of the main memory system in terms of performance, capacity, and energy efficiency. However, all of these novel technologies have weaknesses that necessitate the utilization of a multi-level main memory hierarchy in order to build a main memory system with acceptable characteristics. This dissertation investigates the implications of these new multi-level main memory architectures and provides key insights into the trade-offs associated with the technology and organization choices that are integral to their design. The design space of multi-level main memory systems is much larger than the traditional main memory system's because it also includes additional cache design and technology choices. This dissertation divides the analysis of that space into three more manageable components. First, we begin by exploring the ways in which high-level design choices affect this new type of system differently than current state-of-the-art systems. Second, we focus on the details of the DRAM cache and propose a novel design that efficiently enables associativity. Finally, we turn our attention to the backing store and evaluate the performance effects of different organizations and optimizations for that system. From these studies we are able to identify the critical aspects of the system that contribute significantly to its overall performance. In particular, we note that in most potential systems the ratio of hit latency to miss latency is the dominant factor that determines performance. This motivated the development of our novel associative DRAM cache design in order to minimize the miss rate and reduce the impact of the miss latency while maintaining an acceptable hit latency. In addition, we also observe that selecting the page size, organization, and prefetching degree that best suit each particular backing store technology can help to reduce the miss penalty, thereby improving the performance of the overall system.
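The trade-off between hit latency, miss latency, and miss rate can be made concrete with the standard weighted-average access time model. This is a minimal sketch, not the dissertation's evaluation methodology: the function name `average_access_time` and all latency and miss-rate values are hypothetical and chosen only for illustration.

```python
def average_access_time(hit_latency_ns, miss_latency_ns, miss_rate):
    """Expected latency of one access to a DRAM cache in front of a backing store.

    hit_latency_ns  -- latency when the request hits in the DRAM cache
    miss_latency_ns -- total latency when the request must go to the backing store
    miss_rate       -- fraction of accesses that miss, between 0 and 1
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return (1.0 - miss_rate) * hit_latency_ns + miss_rate * miss_latency_ns


# Assumed numbers: a direct-mapped cache with a fast hit but more misses,
# versus an associative cache with a slower hit but fewer misses.
direct_mapped = average_access_time(hit_latency_ns=50, miss_latency_ns=500, miss_rate=0.20)
associative = average_access_time(hit_latency_ns=60, miss_latency_ns=500, miss_rate=0.10)
print(direct_mapped, associative)
```

With these assumed values the miss latency is ten times the hit latency, so halving the miss rate outweighs the slower hit. If the two latencies were close, the same change would bring little or no benefit.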

    Morphable Resistive Memory Optimization for Mobile Virtualization

    No full text