62 research outputs found

    Bridging the Latency Gap between NVM and DRAM for Latency-bound Operations

    Non-Volatile Memory (NVM) technologies exhibit 4× the read access latency of conventional DRAM. When the working set does not fit in the processor cache, this latency gap between DRAM and NVM leads to a more than 2× runtime increase for queries dominated by latency-bound operations such as index joins and tuple reconstruction. We explain how to easily hide NVM latency by interleaving the execution of parallel work in index joins and tuple reconstruction using coroutines. Our evaluation shows that interleaving, applied to the non-trivial implementations of these two operations in a production-grade codebase, accelerates end-to-end query runtimes on both NVM and DRAM by up to 1.7× and 2.6×, respectively, thereby reducing the performance difference between DRAM and NVM by more than 60%.
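The interleaving idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the binary-search index, and the `group_size` parameter are all hypothetical, and Python generators stand in for the coroutines. Each lookup suspends just before a probe, which is where a native implementation would presumably issue a prefetch, and a round-robin scheduler resumes another lookup in the meantime. Python cannot issue prefetches, so only the control flow is shown, not the latency benefit.

```python
from collections import deque


def binary_search_steps(sorted_keys, target):
    """Hypothetical coroutine-style lookup over a sorted list.

    Yields before each probe; a native implementation would prefetch the
    probed address here and switch to another lookup while the load is
    in flight.
    """
    lo, hi = 0, len(sorted_keys)
    while lo < hi:
        mid = (lo + hi) // 2
        yield mid  # suspension point (stands in for "prefetch, then switch")
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == target
    return lo if found else None


def interleaved_lookups(sorted_keys, targets, group_size=4):
    """Run up to group_size lookups at a time, resuming them round-robin."""
    results = [None] * len(targets)
    pending = deque(enumerate(targets))
    active = deque()
    while pending or active:
        # Top up the group of in-flight lookups.
        while pending and len(active) < group_size:
            i, t = pending.popleft()
            active.append((i, binary_search_steps(sorted_keys, t)))
        i, gen = active.popleft()
        try:
            next(gen)                # advance this lookup by one probe
            active.append((i, gen))  # not finished: requeue behind the others
        except StopIteration as done:
            results[i] = done.value  # finished: record position or None
    return results


positions = interleaved_lookups([2, 4, 6, 8, 10], [6, 5, 2, 10, 11])
```

Here `positions` holds, for each target, its index in the sorted list, or `None` when the key is absent. The group size bounds how many lookups are in flight at once; in a real system it would be tuned to the memory latency being hidden.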

    Extending Memory Capacity in Consumer Devices with Emerging Non-Volatile Memory: An Experimental Study

    The number and diversity of consumer devices are growing rapidly, alongside their target applications' memory consumption. Unfortunately, DRAM scalability is becoming a limiting factor for the memory capacity available in consumer devices. As a potential solution, manufacturers have introduced emerging non-volatile memories (NVMs) into the market, which can be used to increase the memory capacity of consumer devices by augmenting or replacing DRAM. Since entirely replacing DRAM with NVM in consumer devices imposes large system integration and design challenges, recent works propose extending the total main memory space available to applications by using NVM as swap space for DRAM. However, no prior work analyzes the implications of enabling a real NVM-based swap space in real consumer devices. In this work, we provide the first analysis of the impact of extending the main memory space of consumer devices using off-the-shelf NVMs. We extensively examine system performance and energy consumption when the NVM device is used as swap space for DRAM main memory to effectively extend the main memory capacity. For our analyses, we equip real web-based Chromebook computers with the Intel Optane SSD, a state-of-the-art low-latency NVM-based SSD device. We compare the performance and energy consumption of interactive workloads running on our Chromebook with NVM-based swap space, where the Intel Optane SSD capacity is used as swap space to extend main memory capacity, against two state-of-the-art systems: (i) a baseline system with twice as much DRAM as the system with the NVM-based swap space; and (ii) a system where the Intel Optane SSD is naively replaced with a state-of-the-art (yet slower) off-the-shelf NAND-flash-based SSD, which we use as swap space of the same size as the NVM-based swap space.

    Bridging the Gap between Application and Solid-State-Drives

    Data storage is one of the most important and often critical parts of a computing system in terms of performance, cost, reliability, and energy. Numerous new memory technologies, such as NAND flash, phase change memory (PCM), magnetic RAM (STT-RAM) and Memristor, have emerged recently. Many of them have already entered production systems. Traditional storage optimization and caching algorithms are far from optimal because storage I/Os do not show simple locality. To provide optimal storage, we need accurate predictions of I/O behavior. However, workloads are increasingly dynamic and diverse, making both long-term and short-term I/O prediction challenging. Because of the evolution of storage technologies and the increasing diversity of workloads, storage software is becoming more and more complex. For example, a Flash Translation Layer (FTL) is added for NAND-flash-based Solid State Disks (NAND-SSDs). However, it introduces overhead such as address translation delay and garbage collection costs. Many recent studies aim to address this overhead. Unfortunately, there is no one-size-fits-all solution due to the variety of workloads. Despite the rapid evolution of storage technologies, the increasing heterogeneity and diversity of machines and workloads, coupled with the continued data explosion, exacerbate the gap between computing and storage speeds. In this dissertation, we improve data storage performance with both top-down and bottom-up approaches. First, we will investigate exposing storage-level parallelism so that applications can avoid I/O contention and workload skew when scheduling jobs. Second, we will study how architecture-aware task scheduling can improve application performance when PCM-based NVRAM is equipped. Third, we will develop an I/O-correlation-aware flash translation layer for NAND-flash-based Solid State Disks. Fourth, we will build a DRAM-based correlation-aware FTL emulator and study its performance in various filesystems.

    Doctor of Philosophy in Computing

    The demand for main memory capacity has been increasing for many years and will continue to do so. In the past, Dynamic Random Access Memory (DRAM) process scaling has enabled this increase in memory capacity. Along with continued DRAM scaling, the emergence of new technologies like 3D-stacking, buffered Dual Inline Memory Modules (DIMMs), and crosspoint nonvolatile memory promises to continue this trend in the years ahead. However, these technologies will bring with them their own gamut of problems. In this dissertation, I look at the problems facing these technologies from a current delivery perspective. 3D-stacking increases the memory capacity available per package, but the increased current requirement means that more pins on the package must now be dedicated to providing Vdd/Vss, hence increasing cost. At the system level, using buffered DIMMs to increase the number of DRAM ranks increases the peak current requirements of the system if all the DRAM chips in the system are refreshed simultaneously. Crosspoint memories promise to greatly increase bit densities but have long read latencies because of sneak currents in the cross-bar. In this dissertation, I provide architectural solutions to each of these problems. We observe that smart data placement by the architecture and the Operating System (OS) is a vital ingredient in all of these solutions. We thereby mitigate major bottlenecks in these technologies, hence enabling higher memory densities.

    ATOM: Atomic Durability in Non-volatile Memory through Hardware Logging
