127 research outputs found

    Self-Learning Hot Data Prediction: Where Echo State Network Meets NAND Flash Memories

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    Understanding the access behavior of hot data is significant for NAND flash memory because of its crucial impact on the efficiency of garbage collection (GC) and wear leveling (WL), which respectively dominate the performance and life span of an SSD. Generally, both GC and WL rely heavily on the recognition accuracy of hot data identification (HDI). In this paper, however, we propose for the first time the novel concept of hot data prediction (HDP), under which conventional HDI becomes unnecessary. First, we develop a hybrid optimized echo state network (HOESN), in which sufficiently unbiased and continuously shrunk output weights are learnt by sparse regression based on L2 and L1/2 regularization. Second, quantum-behaved particle swarm optimization (QPSO) is employed to compute the reservoir parameters (i.e., global scaling factor, reservoir size, scaling coefficient and sparsity degree) to further improve prediction accuracy and reliability. Third, in a test on a chaotic benchmark (Rossler), the HOESN outperforms six recent state-of-the-art methods. Finally, simulation results on six typical metrics over five real disk workloads, together with on-chip experiments on an actual SSD prototype, indicate that our HOESN-based HDP can reliably improve the access performance and endurance of NAND flash memories. Peer reviewed
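The training pipeline of an echo state network like the one described above — a fixed random reservoir plus a regularized linear readout — can be sketched as follows. This is a minimal illustration: the reservoir size, scalings and ridge penalty are assumed values, the QPSO parameter search and L1/2 shrinkage are omitted in favor of a plain L2 readout, and a sine series stands in for the Rossler benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reservoir hyperparameters are illustrative assumptions; the paper
# tunes reservoir size, scaling and sparsity with QPSO, omitted here.
n_in, n_res, washout = 1, 100, 50
spectral_radius = 0.9

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
# Rescale the recurrent weights so the reservoir satisfies the
# echo state property (largest eigenvalue magnitude < 1).
W *= spectral_radius / max(abs(np.linalg.eigvals(W)))

def run_reservoir(u):
    """Drive the reservoir with a 1-D sequence and collect its states."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)
        states.append(x.copy())
    return np.array(states)

# One-step-ahead prediction on a toy sine series, standing in for the
# chaotic Rossler benchmark used in the paper.
t = np.linspace(0, 20 * np.pi, 2000)
u = np.sin(t)
X = run_reservoir(u[:-1])[washout:]   # reservoir states
y = u[1:][washout:]                   # next-step targets

# Ridge (L2-regularized) readout; the paper's HOESN additionally applies
# L1/2 regularization to shrink the output weights, not shown here.
lam = 1e-6
W_out = np.linalg.solve(X.T @ X + lam * np.eye(n_res), X.T @ y)
rmse = np.sqrt(np.mean((X @ W_out - y) ** 2))
```

Only the readout `W_out` is trained; the random input and recurrent weights stay fixed, which is what makes the output-weight regression in the paper cheap enough to re-run as the workload drifts.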

    Block Cleaning Process in Flash Memory


    PIYAS-Proceeding to Intelligent Service Oriented Memory Allocation for Flash Based Data Centric Sensor Devices in Wireless Sensor Networks

    Flash memory has become a widespread storage medium for modern wireless devices because of attractive characteristics such as non-volatility, small size, light weight, fast access speed, shock resistance, high reliability and low power consumption. Sensor nodes are highly resource-constrained in terms of processing speed, runtime memory, persistent storage, communication bandwidth and finite energy. Therefore, for wireless sensor networks supporting sense, store, merge and send schemes, an efficient and reliable file system that respects these sensor-node constraints is highly desirable. In this paper, we propose a novel log-structured file system for external NAND flash memory, called Proceeding to Intelligent service oriented memorY Allocation for flash based data centric Sensor devices in wireless sensor networks (PIYAS). This is the extended version of our previously proposed PIYA [1]. The main goals of the PIYAS scheme are to achieve instant mounting and reduced SRAM usage by keeping the memory-mapping information very small, and to provide high query-response throughput by allocating memory to sensor data according to network business rules. The scheme intelligently samples and stores the raw data and provides high in-network data availability by keeping the aggregate data for a longer period of time than previous schemes. We also propose effective garbage-collection and wear-leveling schemes. The experimental results show that PIYAS is an optimized memory-management scheme offering high performance for wireless sensor networks.
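The out-of-place updates and garbage collection that any log-structured flash scheme like this must manage can be sketched with a toy page store. The block geometry, the greedy victim policy and all names below are illustrative assumptions for exposition, not PIYAS's actual design.

```python
# Toy log-structured page store: writes go out-of-place, updates
# invalidate the old page, and a greedy GC reclaims the block with the
# fewest valid pages. Sizes are deliberately tiny.
PAGES_PER_BLOCK = 4
NUM_BLOCKS = 4

class LogStore:
    def __init__(self):
        self.blocks = [[] for _ in range(NUM_BLOCKS)]  # pages: [key, valid]
        self.current = 0
        self.mapping = {}  # key -> (block, page index), like an FTL map

    def write(self, key):
        # Out-of-place update: invalidate the old page, append a new one.
        if key in self.mapping:
            b, p = self.mapping[key]
            self.blocks[b][p][1] = False
        self._append(key)

    def _append(self, key):
        blk = self.blocks[self.current]
        if len(blk) == PAGES_PER_BLOCK:
            self.current = self._free_block()
            blk = self.blocks[self.current]
        blk.append([key, True])
        self.mapping[key] = (self.current, len(blk) - 1)

    def _free_block(self):
        for i, blk in enumerate(self.blocks):
            if len(blk) < PAGES_PER_BLOCK:
                return i
        return self._gc()

    def _gc(self):
        # Greedy victim selection: fewest valid pages to copy.
        # (Sketch assumes the victim is never fully valid.)
        victim = min(range(NUM_BLOCKS),
                     key=lambda i: sum(v for _, v in self.blocks[i]))
        live = [k for k, v in self.blocks[victim] if v]
        self.blocks[victim] = []   # erase the victim block
        self.current = victim
        for k in live:             # copy valid pages back into the log
            self._append(k)
        return self.current
```

Repeated overwrites of a few hot keys leave mostly invalid pages behind, so the greedy victim is cheap to clean — the same effect PIYAS's sampling and aggregation aim for at scale.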

    Towards Design and Analysis For High-Performance and Reliable SSDs

    NAND Flash-based Solid State Disks (SSDs) have many attractive technical merits, such as low power consumption, light weight, shock resistance, tolerance of hotter operating regimes, and extraordinarily high random-read performance, which has made SSDs immensely popular and widely employed in environments ranging from portable devices and personal computers to large data centers and distributed data systems. However, current SSDs still suffer from several critical inherent limitations, such as the inability to update in place, asymmetric read and write performance, slow garbage-collection processes, limited endurance, and degraded write performance with the adoption of MLC and TLC techniques. To alleviate these limitations, we propose optimizations both in the outside application layer and in the SSDs' internal layer. Because SSDs are a good compromise between performance and price, they are widely deployed as second-level caches sitting between DRAM and hard disks to boost system performance. Owing to special properties of SSDs such as internal garbage collection and limited lifetime, optimizations designed for traditional cache devices like DRAM and SRAM might not work consistently for an SSD-based cache. Therefore, in the outside application layer, our work focuses on integrating these special properties of SSDs into the optimization of SSD caches. Moreover, our work also alleviates the increased Flash write latency and ECC complexity brought by MLC and TLC technologies by analyzing real-world workloads.

    SSD์˜ ๊ธด ๊ผฌ๋ฆฌ ์ง€์—ฐ์‹œ๊ฐ„ ๋ฌธ์ œ ์™„ํ™”๋ฅผ ์œ„ํ•œ ๊ฐ•ํ™”ํ•™์Šต์˜ ์ ์šฉ

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2020. Advisor: Sungjoo Yoo. NAND flash memory is widely used in a variety of systems, from real-time embedded systems to high-performance enterprise server systems. Flash memory has two well-known problems: (1) erase-before-write (write-once) and (2) limited endurance. To handle the erase-before-write property, a flash-translation layer (FTL) is applied; currently, page-level mapping is mainly used to limit the latency increase caused by the write-once and block-erase characteristics of flash memory. Garbage collection (GC) is one of the leading causes of long-tail latency, raising the 99th-percentile latency to more than 100 times the average. Consequently, real-time or quality-critical systems cannot satisfy given requirements such as QoS restrictions. As flash memory capacity increases, GC latency also tends to increase, because the block size (the number of pages in one block) grows with capacity. GC latency is determined by valid-page copy and block-erase time, so as the block size increases, GC latency increases as well. In particular, the block size has grown in the transition from 2D to 3D NAND flash memory, e.g., 256 pages/block in 2D planar NAND flash memory and 768 pages/block in 3D NAND flash memory, and it is expected to keep increasing. Thus, the long write latency incurred by GC can become even more serious in 3D NAND flash memory-based storage. In this dissertation, we propose three versions of a novel GC scheduling method based on reinforcement learning (RL). The purpose of this method is to reduce the long-tail latency caused by GC by exploiting the idle time of the storage system. We also perform a quantitative analysis of the RL-assisted GC solution.
    First, an RL-assisted GC scheduling technique is proposed that learns the storage access behavior online and determines the number of GC operations that can exploit the idle time. We also present aggressive variants, which further reduce the long-tail latency by aggressively performing fine-grained GC operations. Second, we propose a Q-table cache technique that dynamically manages the key states in RL-assisted GC. This technique uses many fine-grained pieces of information as state candidates and, with a relatively small amount of memory, manages the key states that best represent the characteristics of the workload, reducing the long-tail latency even further. Third, we present a Q-value prediction network (QP Net) that predicts the initial Q-value of a state newly inserted into the Q-table cache. The integrated solution of the Q-table cache and QP Net exploits the short-term history of the system with a low-cost Q-table cache, while the small QP Net learns from the long-term history to provide good Q-value initialization for states newly inserted into the Q-table cache. The experiments show that our proposed method reduces the long-tail latency by 25%-37% compared to the state-of-the-art method.

    Contents:
    Chapter 1 Introduction 1
    Chapter 2 Background 6
        2.1 System Level Tail Latency 6
        2.2 Solid State Drive 10
            2.2.1 Flash Storage Architecture and Garbage Collection 10
        2.3 Reinforcement Learning 13
    Chapter 3 Related Work 17
    Chapter 4 Small Q-table based Solution to Reduce Long Tail Latency 23
        4.1 Problem and Motivation 23
            4.1.1 Long Tail Problem in Flash Storage Access Latency 23
            4.1.2 Idle Time in Flash Storage 24
        4.2 Design and Implementation 26
            4.2.1 Solution Overview 26
            4.2.2 RL-assisted Garbage Collection Scheduling 27
            4.2.3 Aggressive RL-assisted Garbage Collection Scheduling 33
        4.3 Evaluation 35
            4.3.1 Evaluation Setup 35
            4.3.2 Results and Discussion 39
    Chapter 5 Q-table Cache to Exploit a Large Number of States at Small Cost 52
        5.1 Motivation 52
        5.2 Design and Implementation 56
            5.2.1 Solution Overview 56
            5.2.2 Dynamic Key States Management 61
        5.3 Evaluation 67
            5.3.1 Evaluation Setup 67
            5.3.2 Results and Discussion 67
    Chapter 6 Combining Q-table cache and Neural Network to Exploit both Long and Short-term History 73
        6.1 Motivation and Problem 73
            6.1.1 More State Information can Further Reduce Long Tail Latency 73
            6.1.2 Locality Behavior of Workload 74
            6.1.3 Zero Initialization Problem 75
        6.2 Design and Implementation 77
            6.2.1 Solution Overview 77
            6.2.2 Q-table Cache for Action Selection 80
            6.2.3 Q-value Prediction 83
        6.3 Evaluation 87
            6.3.1 Evaluation Setup 87
            6.3.2 Storage-Intensive Workloads 89
            6.3.3 Latency Comparison: Overall 92
            6.3.4 Q-value Prediction Network Effects on Latency 97
            6.3.5 Q-table Cache Analysis 110
            6.3.6 Immature State Analysis 113
            6.3.7 Miscellaneous Analysis 116
            6.3.8 Multi Channel Analysis 121
    Chapter 7 Conclusion and Future Work 138
        7.1 Conclusion 138
        7.2 Future Work 140
    Bibliography 143
    Abstract in Korean 154
    Doctor
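The tabular RL loop underlying a GC scheduler of this kind can be sketched as follows. The state (a discretized idle-time estimate), the action set (how many fine-grained GC operations to run in the window), and the reward shaping are illustrative assumptions for exposition, not the dissertation's exact formulation.

```python
import random

random.seed(1)

# Illustrative state/action design: states are discretized predictions
# of how long the next idle window will be; actions are how many
# fine-grained GC operations to schedule into that window.
ACTIONS = [0, 1, 2, 4]
N_STATES = 4
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2

Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]

def reward(idle_budget, n_gc_ops):
    # Reward GC work that fits in the idle window; penalize overruns,
    # which would delay a host request and add to the tail latency.
    if n_gc_ops <= idle_budget:
        return n_gc_ops
    return -2 * (n_gc_ops - idle_budget)

def greedy(s):
    return max(range(len(ACTIONS)), key=lambda a: Q[s][a])

for _ in range(30000):
    s = random.randrange(N_STATES)            # observed idle-time bin
    # epsilon-greedy action selection
    a = random.randrange(len(ACTIONS)) if random.random() < EPS else greedy(s)
    r = reward(s, ACTIONS[a])
    s2 = random.randrange(N_STATES)           # next idle window arrives
    # Standard Q-learning update
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])

# Learned policy: schedule more GC work when a longer idle period is
# predicted, and none when no idle time is expected.
policy = [ACTIONS[greedy(s)] for s in range(N_STATES)]
```

The dissertation's solutions replace this fixed small Q-table with a Q-table cache over many fine-grained states, plus a small network (QP Net) that initializes Q-values for newly inserted states instead of starting them at zero.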