23 research outputs found

    Persistent Memory File Systems:A Survey

    Get PDF
    Persistent Memory (PM) is non-volatile byte-addressable memory that offers read and write latencies in the order of magnitude smaller than flash storage, such as SSDs. This survey discusses how file systems address the most prominent challenges in the implementation of file systems for Persistent Memory. First, we discuss how the properties of Persistent Memory change file system design. Second, we discuss work that aims to optimize small file I/O and the associated meta-data resolution. Third, we address how existing Persistent Memory file systems achieve (meta) data persistence and consistency

    Understanding and Optimizing Flash-based Key-value Systems in Data Centers

    Get PDF
    Flash-based key-value systems are widely deployed in today’s data centers for providing high-speed data processing services. These systems deploy flash-friendly data structures, such as slab and Log Structured Merge(LSM) tree, on flash-based Solid State Drives(SSDs) and provide efficient solutions in caching and storage scenarios. With the rapid evolution of data centers, there appear plenty of challenges and opportunities for future optimizations. In this dissertation, we focus on understanding and optimizing flash-based key-value systems from the perspective of workloads, software, and hardware as data centers evolve. We first propose an on-line compression scheme, called SlimCache, considering the unique characteristics of key-value workloads, to virtually enlarge the cache space, increase the hit ratio, and improve the cache performance. Furthermore, to appropriately configure increasingly complex modern key-value data systems, which can have more than 50 parameters with additional hardware and system settings, we quantitatively study and compare five multi-objective optimization methods for auto-tuning the performance of an LSM-tree based key-value store in terms of throughput, the 99th percentile tail latency, convergence time, real-time system throughput, and the iteration process, etc. Last but not least, we conduct an in-depth, comprehensive measurement work on flash-optimized key-value stores with recently emerging 3D XPoint SSDs. We reveal several unexpected bottlenecks in the current key-value store design and present three exemplary case studies to showcase the efficacy of removing these bottlenecks with simple methods on 3D XPoint SSDs. Our experimental results show that our proposed solutions significantly outperform traditional methods. Our study also contributes to providing system implications for auto-tuning the key-value system on flash-based SSDs and optimizing it on revolutionary 3D XPoint based SSDs

    Next-Gen Hybrid Memory and Interconnect System Architectures

    Get PDF
    This dissertation mainly addresses two problems that emerge along with the 'big data' trend: the increasing demands of memory capacity for mobile computing platform, and the needs for interconnection network with higher bandwidth/energy efficiency in the HPC/Data Center. The current mobile applications have rapidly growing memory footprints, posing a great challenge for memory system design. Insufficient DRAM main memory will incur frequent data swaps between memory and storage, a process that hurts performance, consumes energy and deteriorates the write endurance of typical flash storage devices. Alternately, a larger DRAM has higher leakage power and drains the battery faster. Further, DRAM scaling trends make further growth of DRAM in the mobile space prohibitive due to cost. Emerging non-volatile memory (NVM) has the potential to alleviate these issues due to its higher capacity per cost than DRAM and minimal static power. Recently, a wide spectrum of NVM technologies, including phase-change memories (PCM), memristor, and 3D XPoint have emerged. Despite the mentioned advantages, NVM has longer access latency compared to DRAM and NVM writes can incur higher latencies and wear costs. Therefore integration of these new memory technologies in the memory hierarchy requires a fundamental rearchitecting of traditional system designs. In this work, we propose a hardware-accelerated memory manager (HMMU) that addresses both types of memory in a flat space address space. We design a set of data placement and data migration policies within this memory manager, such that we may exploit the advantages of each memory technology. By augmenting the system with this HMMU, we reduce the overall memory latency while also reducing energy consumption and writes to the NVM. Experimental results show that our design achieves a 39% reduction in energy consumption with only a 12% performance degradation versus an all-DRAM baseline that is likely untenable in the future. After developing the pure hardware memory management for the data migration between DRAM and NVM, we consider to integrate information from the software stack into our system. These software information, such as programmers' hints or application profiling results, reveals the longer-term memory access pattern and data object properties; but they come at the cost of high software latency. Hardware approaches can avoid the latencies of software kernel processes related to page migration, such as page fault handling. However, hardware's vision is limited to a short time window, as it can only monitor and analyze the recently received memory requests. Ideally, the execution time advantages of pure hardware approaches, should be combined with the data object properties in a global scope. Further, application programmer's hints could guide the data placement at the allocation time, thus data objects with similar property could be congregated to reduce unnecessary page migrations. In this work, we propose such a hardware-software cooperative approach. In particular, we built a heap memory manager that allows the programmer to choose the memory type for each data object allocation. Such denotations are relayed to the hardware memory manager as hints for the decisions on data placement and migration. Meanwhile the hardware memory manager is still capable of capturing the per-application phase changes and maintaining flexibility in its data redistribution. The integration of the two mechanisms leads to optimal results from both long-term and short-term aspects. Experiment results show that our design shortens the overall memory latency while also reducing energy consumption and writes to the NVM versus prior approaches. Our design achieves a 40% reduction in energy consumption with only a 16% performance degradation versus the all-DRAM memory system. As for the HPC/Data domain, a primary problem is how to scale up the interconnection network to service the ever-increasing number of nodes. Photonic-links, with its high bandwidth and low signal loss across long distance propagation, is a promising technology to solve this problem. The higher bandwidth allows the router to connect more nodes while the long-distance connection makes it possible to implement more advanced typologies, such as the flattened butterfly. Both factors help to reduce the average number of hops between nodes across the network. Such high-radix and short distance network is essential to provisioning low latency communications in massive scale systems. However, due to the different physical and device properties, interconnection network needs redesign to adopt the photonic links. We first listed the basic formulas and design flow for interconnection network, and introduced a highly efficient event-driven simulator. Then we conducted a series of experiments to explore the design space, and gave a quantitative comparison between interconnection networks made of pure electrical links and those with electronic/photonic hybrid design

    Redesigning Transaction Processing Systems for Non-Volatile Memory

    Get PDF
    Department of Computer Science and EngineeringTransaction Processing Systems are widely used because they make the user be able to manage their data more efficiently. However, they suffer performance bottleneck due to the redundant I/O for guaranteeing data consistency. In addition to the redundant I/O, slow storage device makes the performance more degraded. Leveraging non-volatile memory is one of the promising solutions the performance bottleneck in Transaction Processing Systems. However, since the I/O granularity of legacy storage devices and non-volatile memory is not equal, traditional Transaction Processing System cannot fully exploit the performance of persistent memory. The goal of this dissertation is to fully exploit non-volatile memory for improving the performance of Transaction Processing Systems. Write amplification between Transaction Processing System is pointed out as a performance bottleneck. As first approach, we redesigned Transaction Processing Systems to minimize the redundant I/O between the Transaction Processing Systems. We present LS-MVBT that integrates recovery information into the main database file to remove temporary files for recovery. The LS-MVBT also employs five optimizations to reduce the write traffics in single fsync() calls. We also exploit the persistent memory to reduce the performance bottleneck from slow storage devices. However, since the traditional recovery method is for slow storage devices, we develop byte-addressable differential logging, user-level heap manager, and transaction-aware persistence to fully exploit the persistent memory. To minimize the redundant I/O for guarantee data consistency, we present the failure-atomic slotted paging with persistent buffer cache. Redesigning indexing structure is the second approach to exploit the non-volatile memory fully. Since the B+-tree is originally designed for block granularity, It generates excessive I/O traffics in persistent memory. To mitigate this traffic, we develop cache line friendly B+-tree which aligns its node size to cache line size. It can minimize the write traffic. Moreover, with hardware transactional memory, it can update its single node atomically without any additional redundant I/O for guaranteeing data consistency. It can also adapt Failure-Atomic Shift and Failure-Atomic In-place Rebalancing to eliminate unnecessary I/O. Furthermore, We improved the persistent memory manager that exploit traditional memory heap structure with free-list instead of segregated lists for small memory allocations to minimize the memory allocation overhead. Our performance evaluation shows that our improved version that consider I/O granularity of non-volatile memory can efficiently reduce the redundant I/O traffic and improve the performance by large of a margin.ope

    Transactional and analytical data management on persistent memory

    Get PDF
    Die zunehmende Anzahl von Smart-Geräten und Sensoren, aber auch die sozialen Medien lassen das Datenvolumen und damit die geforderte Verarbeitungsgeschwindigkeit stetig wachsen. Gleichzeitig müssen viele Anwendungen Daten persistent speichern oder sogar strenge Transaktionsgarantien einhalten. Die neuartige Speichertechnologie Persistent Memory (PMem) mit ihren einzigartigen Eigenschaften scheint ein natürlicher Anwärter zu sein, um diesen Anforderungen effizient nachzukommen. Sie ist im Vergleich zu DRAM skalierbarer, günstiger und dauerhaft. Im Gegensatz zu Disks ist sie deutlich schneller und direkt adressierbar. Daher wird in dieser Dissertation der gezielte Einsatz von PMem untersucht, um den Anforderungen moderner Anwendung gerecht zu werden. Nach der Darlegung der grundlegenden Arbeitsweise von und mit PMem, konzentrieren wir uns primär auf drei Aspekte der Datenverwaltung. Zunächst zerlegen wir mehrere persistente Daten- und Indexstrukturen in ihre zugrundeliegenden Entwurfsprimitive, um Abwägungen für verschiedene Zugriffsmuster aufzuzeigen. So können wir ihre besten Anwendungsfälle und Schwachstellen, aber auch allgemeine Erkenntnisse über das Entwerfen von PMem-basierten Datenstrukturen ermitteln. Zweitens schlagen wir zwei Speicherlayouts vor, die auf analytische Arbeitslasten abzielen und eine effiziente Abfrageausführung auf beliebigen Attributen ermöglichen. Während der erste Ansatz eine verknüpfte Liste von mehrdimensionalen gruppierten Blöcken verwendet, handelt es sich beim zweiten Ansatz um einen mehrdimensionalen Index, der Knoten im DRAM zwischenspeichert. Drittens zeigen wir unter Verwendung der bisherigen Datenstrukturen und Erkenntnisse, wie Datenstrom- und Ereignisverarbeitungssysteme mit transaktionaler Zustandsverwaltung verbessert werden können. Dabei schlagen wir ein neuartiges Transactional Stream Processing (TSP) Modell mit geeigneten Konsistenz- und Nebenläufigkeitsprotokollen vor, die an PMem angepasst sind. Zusammen sollen die diskutierten Aspekte eine Grundlage für die Entwicklung noch ausgereifterer PMem-fähiger Systeme bilden. Gleichzeitig zeigen sie, wie Datenverwaltungsaufgaben PMem ausnutzen können, indem sie neue Anwendungsgebiete erschließen, die Leistung, Skalierbarkeit und Wiederherstellungsgarantien verbessern, die Codekomplexität vereinfachen sowie die ökonomischen und ökologischen Kosten reduzieren.The increasing number of smart devices and sensors, but also social media are causing the volume of data and thus the demanded processing speed to grow steadily. At the same time, many applications need to store data persistently or even comply with strict transactional guarantees. The novel storage technology Persistent Memory (PMem), with its unique properties, seems to be a natural candidate to meet these requirements efficiently. Compared to DRAM, it is more scalable, less expensive, and durable. In contrast to disks, it is significantly faster and directly addressable. Therefore, this dissertation investigates the deliberate employment of PMem to fit the needs of modern applications. After presenting the fundamental work of and with PMem, we focus primarily on three aspects of data management. First, we disassemble several persistent data and index structures into their underlying design primitives to reveal the trade-offs for various access patterns. It allows us to identify their best use cases and vulnerabilities but also to gain general insights into the design of PMem-based data structures. Second, we propose two storage layouts that target analytical workloads and enable an efficient query execution on arbitrary attributes. While the first approach employs a linked list of multi-dimensional clustered blocks that potentially span several storage layers, the second approach is a multi-dimensional index that caches nodes in DRAM. Third, we show how to improve stream and event processing systems involving transactional state management using the preceding data structures and insights. In this context, we propose a novel Transactional Stream Processing (TSP) model with appropriate consistency and concurrency protocols adapted to PMem. Together, the discussed aspects are intended to provide a foundation for developing even more sophisticated PMemenabled systems. At the same time, they show how data management tasks can take advantage of PMem by opening up new application domains, improving performance, scalability, and recovery guarantees, simplifying code complexity, plus reducing economic and environmental costs

    Prefetched Address Translation

    Get PDF
    With explosive growth in dataset sizes and increasing machine memory capacities, per-application memory footprints are commonly reaching into hundreds of GBs. Such huge datasets pressure the TLB, resulting in frequent misses that must be resolved through a page walk - a long-latency pointer chase through multiple levels of the in-memory radix tree-based page table.Anticipating further growth in dataset sizes and their adverse affect on TLB hit rates, this work seeks to accelerate page walks while fully preserving existing virtual memory abstractions and mechanisms - a must for software compatibility and generality. Our idea is to enable direct indexing into a given level of the page table, thus eliding the need to first fetch pointers from the preceding levels. A key contribution of our work is in showing that this can be done by simply ordering the pages containing the page table in physical memory to match the order of the virtual memory pages they map to. Doing so enables direct indexing into the page table using a base-plus-offset arithmetic.We introduce Address Translation with Prefetching (ASAP), a new approach for reducing the latency of address translation to a single access to the memory hierarchy. Upon a TLB miss, ASAP launches prefetches to the deeper levels of the page table, bypassing the preceding levels. These prefetches happen concurrently with a conventional page walk, which observes a latency reduction due to prefetching while guaranteeing that only correctly-predicted entries are consumed. ASAP requires minimal extensions to the OS and trivial microarchitectural support. Moreover, ASAP is fully legacy-preserving, requiring no modifications to the existing radix tree-based page table, TLBs and other software and hardware mechanisms for address translation. Our evaluation on a range of memory-intensive workloads shows that under SMT colocation, ASAP is able to reduce page walk latency by an average of 25% (42% max) in native execution, and 45% (55% max) under virtualization
    corecore