6 research outputs found

    Cross-Layer Optimization Techniques for Improving Performance and Reliability of NAND Flash-Based Storage Systems

    Thesis (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2015. 김지홍. As the cost-per-bit of NAND flash memory has improved rapidly through advanced process technologies and multi-leveling techniques, NAND flash-based storage systems are now widely used in systems ranging from mobile embedded devices to high-end enterprise servers. Although the advanced process and device techniques have greatly improved the cost-per-bit of NAND flash memory, they have also significantly degraded its performance and reliability as key side effects. For NAND flash-based storage systems to be used more broadly in various computing environments, it is critical to overcome the performance and reliability problems of recent high-density NAND flash memory. In this dissertation, we argue that cross-layer optimization techniques, which vertically integrate various optimization factors from different design abstraction levels, can play key roles in improving the performance and reliability of high-density NAND flash memory. First, we propose read-disturb management techniques which reduce the expensive read-disturb management overheads while maintaining the reliability of NAND flash memory. An FTL using the read-disturb management module, called redFTL, redistributes highly skewed read accesses to a small part of NAND flash memory into more balanced read accesses to a large number of blocks, thus reducing the data migrations needed to avoid read-disturb errors. As an extended version of redFTL, we propose an integrated read-disturb management technique, called redFTL+, which fundamentally solves read-disturb problems by exploiting a tradeoff between read disturbance and write speed. By modifying NAND chips to support multiple read modes with different read voltages and write speeds, redFTL+ intelligently allocates frequently-read data to read-resistant blocks.
Since the read disturbance is also proportional to the read time, redFTL+ takes advantage of the difference in read time among different NAND pages by reallocating read-intensive data to read-resistant pages. Second, we propose data separation techniques which reduce garbage collection overhead. We propose a program context-aware data separation technique, called PDS, which reduces the garbage collection overhead by exploiting program context hints. By using a program context, which serves as a proper granularity for maintaining data update behavior, PDS helps an FTL gather data with similar update times into the same blocks. As an improved version of PDS, we propose an integrated data separation technique, called IDS, which uses both the update history of the NAND device and program context hints to predict data update behaviors. By classifying data based on this cross-layer information, an FTL using IDS can produce more dead or near-dead blocks than PDS, thus reducing the garbage collection overhead. To evaluate the effectiveness of the proposed techniques, we performed a series of evaluations using both a simulator and an emulator with I/O traces collected from various systems. Our experimental results show that the cross-layer optimization techniques are more effective than our single-layer optimization techniques. RedFTL+ decreases the read-disturb management overhead on average by 24% over redFTL. The IDS-based FTL decreases the garbage collection overhead on average by 18% over the PDS-based FTL. The evaluation results demonstrate that our cross-layer optimization techniques improve the overall performance of NAND-based storage systems over previous single-layered optimization techniques by reducing the overheads of read-disturb management and garbage collection while maintaining the reliability of the storage systems.

    Contents
    I. Introduction
      1.1 Motivations
        1.1.1 Read-Disturb Problem
        1.1.2 Garbage Collection Problem
      1.2 Research Goals and Contributions
      1.3 Dissertation Structure
    II. Background
      2.1 NAND Flash Memory
      2.2 System Software for NAND Flash Memory
      2.3 NAND Flash-Based Storage Devices
      2.4 Related Work
        2.4.1 Read-Disturb Techniques
        2.4.2 Data Separation Techniques
    III. A Single-Layered Read Disturb Management Technique
      3.1 Overview
      3.2 Performance Implications of Read Disturbs
        3.2.1 Effect of Frequent Read Reclaims
        3.2.2 Effect of Read Reclaims on Response Time Fluctuations
        3.2.3 Effect of SSD Read Buffer on Read Reclaims
      3.3 Read Disturb Management Techniques
        3.3.1 Data Distribution Technique
        3.3.2 Proactive Data Migration
      3.4 RedFTL: Read Disturb-Aware FTL
        3.4.1 Overview of RedFTL
        3.4.2 Read-Hot Page Separation
        3.4.3 Good Block Pool Management
      3.5 Experimental Results
    IV. An Integrated Approach for Read Disturb Management
      4.1 Overview
      4.2 Read Disturb Management Techniques
        4.2.1 Mitigation of Read Reclaims by Read Voltage Scaling
        4.2.2 Mitigation of Read Reclaims by Read Operation Time Scaling
        4.2.3 NAND Read-Disturbance Model
      4.3 Design and Implementation of RedFTL+
        4.3.1 Overview
        4.3.2 Dynamic Mode Selection
        4.3.3 Distributed Migration to RRBs
        4.3.4 Read-Hotness Detection
      4.4 Experimental Results
    V. A Single-Layered Data Separation Technique
      5.1 Motivations
        5.1.1 Frequency-Based Data Separation
        5.1.2 Garbage Collection Using ORA
        5.1.3 Evaluation of Existing Locality-based Heuristic
      5.2 Correlation between Program Contexts and Updates
      5.3 PDS: Program Context-Aware Data Separation Technique
      5.4 Experimental Results
    VI. An Integrated Data Separation Technique
      6.1 Limitations of Single-Layered Program Context-Aware Data Separation Technique
      6.2 IDS: Integrated Data Separation Technique
        6.2.1 Overview
        6.2.2 Determination of Update Program Context
        6.2.3 Dynamic Clustering Program Contexts Based On Update Locality
        6.2.4 Managing The Hot Data Associated with An Update Program Context
      6.3 Experimental Results
    VII. Conclusions
      7.1 Summary
      7.2 Future Work
        7.2.1 Improving QoS of RedFTL+ by Exploiting Program Context Hints
        7.2.2 Mitigating Read-Disturb Problem by Read Disturb-Aware Read Buffer Management Technique
        7.2.3 Improving Efficiency of Garbage Collection by Adjusting GC Trigger Points
        7.2.4 Improving Performance and Reliability of NAND Flash Memory by Integrating Various Techniques
    Bibliography
    Appendix
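
    The read-disturb idea in the abstract above can be illustrated with a small counting model. This is a minimal sketch under stated assumptions, not the redFTL implementation: `ReadDisturbTracker`, `replay`, and the per-block budget `READ_LIMIT = 1000` are hypothetical names and values chosen for illustration. It only shows why spreading the same number of reads over more blocks triggers fewer read reclaims (data migrations).

```python
from collections import defaultdict

READ_LIMIT = 1000  # hypothetical per-block read budget before a read reclaim


class ReadDisturbTracker:
    """Counts reads per block and counts how often a block must be reclaimed."""

    def __init__(self, read_limit=READ_LIMIT):
        self.read_limit = read_limit
        self.reads = defaultdict(int)  # block id -> reads since last reclaim
        self.reclaims = 0

    def on_read(self, block):
        """Record one page read; return True if the block hit its read budget."""
        self.reads[block] += 1
        if self.reads[block] >= self.read_limit:
            # Model a read reclaim: data is migrated and the counter restarts.
            self.reads[block] = 0
            self.reclaims += 1
            return True
        return False


def replay(trace, read_limit=READ_LIMIT):
    """Replay a list of block ids and return the number of read reclaims."""
    tracker = ReadDisturbTracker(read_limit)
    for block in trace:
        tracker.on_read(block)
    return tracker.reclaims


# Skewed trace: 4000 reads all hit block 0.
skewed_reclaims = replay([0] * 4000)
# Balanced trace: the same 4000 reads spread evenly over 8 blocks.
balanced_reclaims = replay([i % 8 for i in range(4000)])
```

    With these assumed numbers, the skewed trace triggers four reclaims and the balanced trace triggers none, because no block reaches its budget.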

    A Study on Semiconductor Storage Systems Composed of Heterogeneous Non-volatile Memories (異種の不揮発性メモリで構成される半導体ストレージシステムに関する研究)

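    [Degree requirement] Chuo University Degree Regulations, Article 4, Paragraph 1. [Chief examiner] 竹内 健 (Professor, Faculty of Science and Engineering, Chuo University). [Co-examiners] 山村 清隆 (Professor, Faculty of Science and Engineering, Chuo University), 築山 修治 (Professor, Faculty of Science and Engineering, Chuo University), 首藤 一幸 (Associate Professor, Graduate School of Information Science and Engineering, Tokyo Institute of Technology). Doctor of Engineering. 中央大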

    Integration of Non-volatile Memory into Storage Hierarchy

    In this dissertation, we present novel approaches for integrating non-volatile memory devices into the storage hierarchy of a computer system. There are several types of non-volatile memory devices, such as flash memory, Phase Change Memory (PCM), and Spin-transfer torque memory (STT-RAM). These devices have many appealing features for applications; however, they also pose several challenges. This dissertation focuses on how to efficiently integrate these non-volatile memories into existing memory and disk storage systems. This work is composed of two major parts. The first part investigates a main-memory system employing Phase Change Memory instead of traditional DRAM. Compared to DRAM, PCM has higher density and no static power consumption, which are very important factors for building large-capacity memory systems. However, PCM write operations have higher latency and power consumption than read operations. Moreover, PCM has limited write endurance. To efficiently integrate PCM into a memory system, we have to solve the challenges brought by its expensive write operations. We propose new replacement policies and cache organizations for the last-level CPU cache, which can effectively reduce the write traffic to the PCM main memory. We evaluated our design with multiple workloads and configurations. The results show that the proposed approaches improve the lifetime and energy consumption of PCM significantly. The second part of the dissertation considers the design of data/disk storage using non-volatile memories, e.g. flash memory, PCM and non-volatile DIMMs. We consider multiple design options for utilizing the non-volatile memories in the storage hierarchy. First, we consider a system that employs non-volatile memories such as PCM or non-volatile DIMMs on the memory bus along with flash-based SSDs. We propose a hybrid file system, NVMFS, that manages both of these devices.
NVMFS exploits the non-volatile memory to improve the characteristics of the write workload at the SSD. We satisfy most small random write requests on the fast non-volatile DIMM and only issue large, optimized writes to the SSD. We also group data with similar update patterns together before writing to the flash-SSD; as a result, we can effectively reduce the garbage collection overhead. We implemented a prototype of NVMFS in Linux and evaluated its performance through multiple benchmarks. Secondly, we consider the problem of using flash memory as a cache for a disk-drive-based storage system. Since SSDs are expensive, a few SSDs are designed to serve as a cache for a large number of disk drives. SSD cache space can be used for both read and write requests. In our design, we managed multiple flash-SSD devices directly at the cache layer without the help of RAID software. To ensure data reliability and cache space efficiency, we only duplicated dirty data on flash-SSDs. We also balanced the write endurance of different flash-SSDs. As a result, no single SSD will fail much earlier than the others. Thirdly, when using PCM-like devices only as data storage, it is possible to exploit memory management hardware resources to improve file system performance. However, in this case, PCM may share critical system resources such as the TLB and page table with DRAM, which can potentially impact PCM's performance. To solve this problem, we proposed employing superpages to reduce the pressure on memory management resources. As a result, the file system performance is further improved.
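
    The write-absorption idea described for NVMFS above (small random writes land in non-volatile memory, only large writes reach the SSD) can be sketched as a toy router. This is an illustrative model under stated assumptions, not the NVMFS code: `HybridWriteRouter`, `SMALL_WRITE_BYTES`, and `FLUSH_BYTES` are hypothetical names and thresholds, the buffer is keyed by exact offset only, and overlapping byte ranges are not modelled.

```python
SMALL_WRITE_BYTES = 64 * 1024  # hypothetical cutoff below which a write is "small"
FLUSH_BYTES = 1024 * 1024      # hypothetical size of one batched SSD write


class HybridWriteRouter:
    """Absorbs small writes in an NVM buffer and batches them into large SSD writes."""

    def __init__(self, small_write_bytes=SMALL_WRITE_BYTES, flush_bytes=FLUSH_BYTES):
        self.small_write_bytes = small_write_bytes
        self.flush_bytes = flush_bytes
        self.nvm_buffer = {}  # offset -> data currently held in NVM
        self.nvm_bytes = 0    # total bytes held in the NVM buffer
        self.ssd_writes = []  # size in bytes of each write issued to the SSD

    def _drop(self, offset):
        """Discard any buffered data at this offset (it is being overwritten)."""
        old = self.nvm_buffer.pop(offset, None)
        if old is not None:
            self.nvm_bytes -= len(old)

    def write(self, offset, data):
        self._drop(offset)
        if len(data) >= self.small_write_bytes:
            # Large write: send it straight to the SSD.
            self.ssd_writes.append(len(data))
            return
        # Small write: absorb it in NVM and flush once enough has accumulated.
        self.nvm_buffer[offset] = data
        self.nvm_bytes += len(data)
        if self.nvm_bytes >= self.flush_bytes:
            self.flush()

    def flush(self):
        """Issue everything buffered in NVM to the SSD as one large write."""
        if self.nvm_bytes > 0:
            self.ssd_writes.append(self.nvm_bytes)
        self.nvm_buffer.clear()
        self.nvm_bytes = 0


router = HybridWriteRouter()
for i in range(512):
    router.write(i * 4096, b"x" * 4096)  # 512 small 4 KiB writes
small_only = list(router.ssd_writes)
router.write(10_000_000, b"y" * (128 * 1024))  # one large write bypasses NVM
```

    In this run the 512 small writes reach the SSD as two 1 MiB writes, and the 128 KiB write is passed through unchanged.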

    Improving Reliability and Performance of NAND Flash Based Storage System

    The high seek and rotation overhead of magnetic hard disk drives (HDDs) motivates the development of storage devices that offer good random-access performance. As an alternative technology, NAND flash memory demonstrates low power consumption, microsecond-order access latency and good scalability. Thanks to these advantages, NAND flash-based solid-state disks (SSDs) show many promising applications in enterprise servers. With the multi-level cell (MLC) technique, the per-bit fabrication cost is reduced, and the low production cost enables NAND flash memory to extend its application to consumer electronics. Despite these advantages, limited memory endurance, long data protection latency and write amplification continue to be the major challenges in the design of NAND flash storage systems. The limited memory endurance and long data protection latency both derive from memory bit errors. A high bit error rate (BER) severely impairs data integrity and reduces memory endurance. The limited endurance is a major obstacle to applying NAND flash memory in applications with high reliability requirements. To protect data integrity, hard-decision error correction codes (ECC) such as Bose-Chaudhuri-Hocquenghem (BCH) codes are employed. However, the hardware cost becomes prohibitive as the BER increases when BCH ECC is employed to extend system lifetime. To extend system lifespan without high hardware cost, we have proposed the data pattern aware (DPA) error prevention system design. DPA realizes BER reduction by minimizing the occurrence of data patterns vulnerable to high BER with simple linear feedback shift register circuits. Experimental results show that DPA can increase the system lifetime by up to 4× with marginal hardware cost. With the technology node scaling down to 2Xnm, BER increases up to 0.01. Hard-decision ECCs and DPA are no longer sufficient to guarantee data integrity due to either prohibitively high hardware cost or high storage overhead.
Soft-decision ECC, such as low-density parity-check (LDPC) code, has been introduced to provide more powerful error correction capability. However, LDPC code demands extra memory sensing operations, directly leading to long read latency. To reduce LDPC-induced read latency without adverse impact on system reliability, we have proposed the FlexLevel NAND flash storage system design. The FlexLevel design reduces BER by broadening the noise margin via threshold voltage (Vth) level reduction. Under relatively low BER, no extra sensing level is required and therefore read performance can be improved. To balance the capacity loss induced by Vth level reduction against the read speedup, the FlexLevel design identifies the data with high LDPC overhead and performs Vth reduction only on these data. Experimental results show that, compared with the best existing works, the proposed design achieves up to 11% read speedup with negligible capacity loss. Write amplification is a major cause of performance and endurance degradation in NAND flash-based storage systems. In the object-based NAND flash device (ONFD), write amplification partially results from onode partial update and cascading update. An onode partial update overwrites only part of the data in a NAND flash page and incurs unnecessary migration of the un-updated data. A cascading update is an update to object metadata that propagates in a cascading manner due to object data update or migration. Even though only several bytes in the object metadata are updated, one or more pages have to be re-written, significantly degrading write performance. To minimize the write operations incurred by onode partial update and cascading update, we have proposed a Data Migration Minimizing (DMM) device design. The DMM device incorporates 1) a multi-level garbage collection technique to minimize the unnecessary data migration of onode partial update and 2) a virtual B+ tree and diff cache to reduce the write operations incurred by cascading update.
The experimental results demonstrate that the DMM device can offer up to 20% write reduction compared with the best state-of-the-art works.
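
    The abstract above says DPA uses simple linear feedback shift register (LFSR) circuits but gives no circuit details. As a generic illustration only, and not the DPA design, the sketch below shows a common LFSR-based data scrambler in software: a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 produces a keystream that is XORed with the page data, which breaks up long runs of identical bits. The function names, the seed `0xACE1`, and the choice of polynomial are assumptions made for this example.

```python
def lfsr16_stream(seed, nbytes):
    """Yield nbytes keystream bytes from a 16-bit Fibonacci LFSR.

    Feedback polynomial x^16 + x^14 + x^13 + x^11 + 1 (maximal length).
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR stays at zero")
    for _ in range(nbytes):
        byte = 0
        for _ in range(8):
            # XOR the tapped bits to form the feedback bit.
            bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
            state = (state >> 1) | (bit << 15)
            byte = (byte << 1) | (state & 1)
        yield byte


def scramble(data, seed):
    """XOR data with the LFSR keystream; applying it twice restores the data."""
    return bytes(d ^ k for d, k in zip(data, lfsr16_stream(seed, len(data))))


page = bytes(64)                          # an all-zero page: one long run of 0 bits
scrambled = scramble(page, 0xACE1)        # stored form has the run broken up
restored = scramble(scrambled, 0xACE1)    # same seed recovers the original
```

    Because scrambling is a plain XOR with a reproducible keystream, the same function descrambles the data as long as the seed is known.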