117 research outputs found

    Dynamic Virtual Page-based Flash Translation Layer with Novel Hot Data Identification and Adaptive Parallelism Management

    Solid-state disks (SSDs) have increasingly replaced traditional motor-driven hard disks in high-end storage devices over the past few decades. However, the costs of various inherent features, such as out-of-place updates (which require garbage collection, GC) and limited endurance (which requires wear leveling), need to be reduced to a large extent before SSDs can fully replace hard disks. Both GC and wear leveling fundamentally depend on hot data identification (HDI). In this paper, we propose a hot data-aware flash translation layer architecture based on a dynamic virtual page (DVPFTL) to improve the performance and lifetime of NAND flash devices. First, we develop a generalized dual-layer HDI (DL-HDI) framework, composed of a cold data pre-classifier and a hot data post-identifier, which together efficiently track the frequency and recency of data accesses. Then, we design an adaptive parallelism manager (APM) that assigns the clustered data chunks to distinct resident blocks in the SSD to prolong its endurance. Finally, experimental results from our SSD prototype indicate that the DVPFTL scheme reliably improves the parallelizability and endurance of NAND flash devices and lowers GC costs compared with related work. Peer reviewed.
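    As a rough illustration of what tracking both the frequency and the recency of accesses can look like, the sketch below scores each logical page with a periodically decayed write counter. It is a generic stand-in, not the DL-HDI framework from the paper: the class name, the decay factor, the decay interval, and the hot threshold are all assumptions chosen for the example.

```python
from collections import defaultdict


class DecayedHotDataIdentifier:
    """Hypothetical hot-data identifier based on decayed write counters.

    Each write adds 1 to the page's score (frequency); every
    `decay_interval` writes all scores are multiplied by `decay`, so
    old writes count for less than recent ones (recency).
    """

    def __init__(self, decay=0.5, decay_interval=4, hot_threshold=2.0):
        self.decay = decay                    # assumed aging factor
        self.decay_interval = decay_interval  # assumed aging period, in writes
        self.hot_threshold = hot_threshold    # assumed cut-off for "hot"
        self.scores = defaultdict(float)
        self.writes_seen = 0

    def record_write(self, lpn):
        """Register one write to logical page number `lpn`."""
        self.scores[lpn] += 1.0
        self.writes_seen += 1
        if self.writes_seen % self.decay_interval == 0:
            # Age every tracked page so stale popularity fades away.
            for key in list(self.scores):
                self.scores[key] *= self.decay

    def is_hot(self, lpn):
        """Return True if the page's decayed score reaches the threshold."""
        return self.scores.get(lpn, 0.0) >= self.hot_threshold


identifier = DecayedHotDataIdentifier()
for lpn in [7, 7, 7, 1, 7, 7]:
    identifier.record_write(lpn)
print(identifier.is_hot(7), identifier.is_hot(1))
```

    A flash translation layer could use such a classification to place hot and cold pages in different blocks, which is the kind of decision GC and wear leveling rely on.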

    A Survey on the Integration of NAND Flash Storage in the Design of File Systems and the Host Storage Software Stack

    With the ever-increasing amount of data generated in the world, estimated to reach over 200 zettabytes by 2025, pressure on efficient data storage systems is intensifying. The shift from HDDs to flash-based SSDs is one of the most fundamental shifts in storage technology, significantly increasing performance capabilities. However, flash storage has different characteristics than prior HDD storage technology, so existing storage software was unsuitable for leveraging its capabilities. As a result, a plethora of storage applications have been designed to better integrate with flash storage and align with flash characteristics. In this literature study, we evaluate the effect that the introduction of flash storage has had on the design of file systems, which provide one of the most essential mechanisms for managing persistent storage. We analyze the mechanisms for effectively managing flash storage, managing the overheads of the introduced design requirements, and leveraging the capabilities of flash storage. Numerous methods have been adopted in file systems; however, they predominantly revolve around similar design decisions: adhering to flash hardware constraints and limiting software intervention. The future design of storage software remains relevant given the constant growth in flash-based storage devices and interfaces, which provides increasing opportunities to enhance flash integration in the host storage software stack.

    Bridging the Gap between Application and Solid-State-Drives

    Data storage is one of the important and often critical parts of a computing system in terms of performance, cost, reliability, and energy. Numerous new memory technologies, such as NAND flash, phase change memory (PCM), magnetic RAM (STT-RAM), and memristors, have emerged recently, and many of them have already entered production systems. Traditional storage optimization and caching algorithms are far from optimal because storage I/Os do not show simple locality. To provide optimal storage, we need accurate predictions of I/O behavior. However, workloads are increasingly dynamic and diverse, which makes both long-term and short-term I/O prediction challenging. Because of the evolution of storage technologies and the increasing diversity of workloads, storage software is becoming more and more complex. For example, a Flash Translation Layer (FTL) is added for NAND flash-based solid state disks (NAND-SSDs), but it introduces overheads such as address translation delay and garbage collection costs. Many recent studies aim to address these overheads. Unfortunately, there is no one-size-fits-all solution because of the variety of workloads. Despite the rapid evolution of storage technologies, the increasing heterogeneity and diversity of machines and workloads, coupled with the continued data explosion, widen the gap between computing and storage speeds. In this dissertation, we improve data storage performance using both top-down and bottom-up approaches. First, we investigate exposing storage-level parallelism so that applications can avoid I/O contention and workload skew when scheduling jobs. Second, we study how architecture-aware task scheduling can improve application performance when PCM-based NVRAM is available. Third, we develop an I/O correlation-aware flash translation layer for NAND flash-based solid state disks. Fourth, we build a DRAM-based correlation-aware FTL emulator and study its performance with various file systems.

    Letter from the Special Issue Editor

    Editorial work for DEBULL on a special issue on data management on Storage Class Memory (SCM) technologies

    Applying Reinforcement Learning to Mitigate the Long-Tail Latency Problem of SSDs

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2020. 2. Sungjoo Yoo.

    NAND flash memory is widely used in a variety of systems, from real-time embedded systems to high-performance enterprise server systems. Flash memory has two problems: (1) erase-before-write (write-once) and (2) limited endurance. A flash translation layer (FTL) is applied to handle the erase-before-write characteristic. Currently, page-level mapping is mainly used to reduce the latency increase caused by the write-once and block-erase characteristics of flash memory. Garbage collection (GC) is one of the leading causes of long-tail latency, in which latency at the 99th percentile rises to more than 100 times the average latency. As a result, real-time or quality-critical systems cannot satisfy given requirements such as quality of service (QoS) constraints. GC latency also tends to increase as flash memory capacity increases, because the block size (the number of pages in one block) grows with capacity. GC latency is determined by the valid page copy time and the block erase time, so it increases as the block size increases. In particular, block size has grown in the transition from 2D to 3D NAND flash memory, e.g., from 256 pages/block in 2D planar NAND flash memory to 768 pages/block in 3D NAND flash memory. Even within 3D NAND flash memory, the block size is expected to continue to increase. Thus, the long write latency incurred by GC can become more serious in 3D NAND flash memory-based storage. In this dissertation, we propose three versions of a novel GC scheduling method based on reinforcement learning (RL). The purpose of this method is to reduce the long-tail latency caused by GC by utilizing the idle time of the storage system. We also perform a quantitative analysis of the RL-assisted GC solution.

    We propose an RL-assisted GC scheduling technique that learns the storage access behavior online and determines the number of GC operations to perform in order to exploit the idle time. We also present an aggressive method that further reduces the long-tail latency by aggressively performing fine-grained GC operations. In addition, we propose a Q-table cache technique that dynamically manages the key states in RL-assisted GC to reduce the long-tail latency even further. This technique uses a large number of fine-grained pieces of information as state candidates and, with a relatively small amount of memory, manages the key states that suitably represent the characteristics of the workload. Finally, we present a Q-value prediction network (QP Net) that predicts the initial Q-value of a state newly inserted into the Q-table cache. The integrated solution exploits the short-term history of the system through the low-cost Q-table cache, and uses the small QP Net to capture the long-term history and provide a good initial Q-value for newly inserted states. Experiments show that the proposed method reduces the long-tail latency by 25%-37% compared to the state-of-the-art method.

    Contents:
    Chapter 1 Introduction
    Chapter 2 Background: 2.1 System Level Tail Latency; 2.2 Solid State Drive (2.2.1 Flash Storage Architecture and Garbage Collection); 2.3 Reinforcement Learning
    Chapter 3 Related Work
    Chapter 4 Small Q-table based Solution to Reduce Long Tail Latency: 4.1 Problem and Motivation (4.1.1 Long Tail Problem in Flash Storage Access Latency; 4.1.2 Idle Time in Flash Storage); 4.2 Design and Implementation (4.2.1 Solution Overview; 4.2.2 RL-assisted Garbage Collection Scheduling; 4.2.3 Aggressive RL-assisted Garbage Collection Scheduling); 4.3 Evaluation (4.3.1 Evaluation Setup; 4.3.2 Results and Discussion)
    Chapter 5 Q-table Cache to Exploit a Large Number of States at Small Cost: 5.1 Motivation; 5.2 Design and Implementation (5.2.1 Solution Overview; 5.2.2 Dynamic Key States Management); 5.3 Evaluation (5.3.1 Evaluation Setup; 5.3.2 Results and Discussion)
    Chapter 6 Combining Q-table cache and Neural Network to Exploit both Long and Short-term History: 6.1 Motivation and Problem (6.1.1 More State Information can Further Reduce Long Tail Latency; 6.1.2 Locality Behavior of Workload; 6.1.3 Zero Initialization Problem); 6.2 Design and Implementation (6.2.1 Solution Overview; 6.2.2 Q-table Cache for Action Selection; 6.2.3 Q-value Prediction); 6.3 Evaluation (6.3.1 Evaluation Setup; 6.3.2 Storage-Intensive Workloads; 6.3.3 Latency Comparison: Overall; 6.3.4 Q-value Prediction Network Effects on Latency; 6.3.5 Q-table Cache Analysis; 6.3.6 Immature State Analysis; 6.3.7 Miscellaneous Analysis; 6.3.8 Multi Channel Analysis)
    Chapter 7 Conclusion and Future Work: 7.1 Conclusion; 7.2 Future Work
    Bibliography
    Abstract in Korean

    Performance and Reliability Study and Exploration of NAND Flash-based Solid State Drives

    The research that stems from my doctoral dissertation focuses on essential challenges in developing techniques that utilize solid-state memory technologies (with emphasis on NAND flash memory) from the device, circuit, architecture, and system perspectives, in order to exploit their true potential for improving I/O performance in high-performance computing systems. These challenges include not only the performance quirks arising from the physical nature of NAND flash memory, e.g., the inability to modify data in place, read/write performance asymmetry, and slow and constrained erase functionality, but also the reliability drawbacks that keep solid state drives (SSDs) from being widely deployed. To address these challenges, I have proposed, analyzed, and evaluated I/O scheduling schemes, strategies for storage space virtualization, and data protection methods that boost the performance and reliability of SSDs.

    Persistent Memory File Systems: A Survey

    Persistent Memory (PM) is non-volatile, byte-addressable memory that offers read and write latencies an order of magnitude smaller than those of flash storage, such as SSDs. This survey discusses how file systems designed for Persistent Memory address the most prominent challenges in their implementation. First, we discuss how the properties of Persistent Memory change file system design. Second, we discuss work that aims to optimize small file I/O and the associated metadata resolution. Third, we address how existing Persistent Memory file systems achieve (meta)data persistence and consistency.