3,704 research outputs found

    A Cache Management Strategy to Replace Wear Leveling Techniques for Embedded Flash Memory

    NAND flash memory prices are falling sharply thanks to market growth and maturing fabrication processes, while research on endurance and density remains very active. NAND flash is becoming the most important storage medium in mobile computing and is spreading beyond that area. The major constraint of the technology is the limited number of erase operations each block can sustain, which quickly leads to memory wear-out. To cope with this issue, state-of-the-art solutions implement wear leveling policies that even out wear across the memory and so increase its lifetime. These policies are integrated into the Flash Translation Layer (FTL) and contribute substantially to degraded write performance. In this paper, we propose to reduce flash memory wear-out and improve performance by absorbing erase operations through a dual cache system that replaces the FTL wear leveling and garbage collection services. We justify this idea with a first performance evaluation of an exclusively cache-based system for embedded flash memories. Unlike wear leveling schemes, the proposed cache solution reduces the total number of erase operations that reach the media by absorbing them in the cache, for workloads that exhibit a minimal global sequential rate. Comment: This paper received the "Best Paper Award" in the "Computer System track"; 8 pages; International Symposium on Performance Evaluation of Computer & Telecommunication Systems, The Hague, Netherlands (2011)
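To make the absorption idea in the abstract above concrete, here is a minimal sketch of a write-back cache placed in front of flash. It is not the paper's dual cache design: the class `WriteBackCache`, the LRU eviction policy, and the use of flash page writes as a stand-in for erase pressure are all assumptions made for illustration.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy LRU write-back cache in front of flash (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # logical page number -> dirty flag
        self.flash_writes = 0       # writes that actually reach the flash

    def write(self, page):
        if page in self.pages:
            # Overwrite hits the cache: nothing new reaches the flash.
            self.pages.move_to_end(page)
            return
        if len(self.pages) >= self.capacity:
            # Evict the least recently used page and flush it to flash.
            self.pages.popitem(last=False)
            self.flash_writes += 1
        self.pages[page] = True

    def flush(self):
        # Write every cached page back to flash and empty the cache.
        self.flash_writes += len(self.pages)
        self.pages.clear()


def flash_writes_for(trace, capacity):
    """Replay a trace of logical page writes and count flash writes."""
    cache = WriteBackCache(capacity)
    for page in trace:
        cache.write(page)
    cache.flush()
    return cache.flash_writes
```

In this toy model, a trace that rewrites the same few pages reaches the flash far less often when the cache can hold its working set, which is the effect the abstract attributes to absorbing operations in the cache.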

    Flash-memories in Space Applications: Trends and Challenges

    Space applications today have far more processing power than was available just a few years ago. Typical mission-critical space systems also have to address the design of solid-state recorders. Flash memories are nonvolatile, shock-resistant and power-efficient, but they also have several drawbacks. A solid-state recorder for space applications must satisfy many different constraints, especially because of radiation-related issues: proper countermeasures are needed, together with EDAC and testing techniques, to improve the dependability of the whole system. Different and often conflicting dimensions need to be explored during the design of a flash-memory-based solid-state recorder. In particular, we explore the most important flash-memory design dimensions and trade-offs to tackle when designing flash-based hard disks for space applications.

    Dynamic Virtual Page-based Flash Translation Layer with Novel Hot Data Identification and Adaptive Parallelism Management

    Solid-state disks (SSDs) have been replacing traditional motor-driven hard disks in high-end storage devices over the past few decades. However, the impact of various inherent features, such as out-of-place updates (which require garbage collection, GC) and limited endurance (which requires wear leveling), must be greatly reduced before that replacement is complete. Both GC and wear leveling fundamentally depend on hot data identification (HDI). In this paper, we propose a hot data-aware flash translation layer architecture based on a dynamic virtual page (DVPFTL) to improve the performance and lifetime of NAND flash devices. First, we develop a generalized dual-layer HDI (DL-HDI) framework, composed of a cold data pre-classifier and a hot data post-identifier, which together efficiently track the frequency and recency of data accesses. Then, we design an adaptive parallelism manager (APM) that assigns the clustered data chunks to distinct resident blocks in the SSD to prolong its endurance. Finally, experimental results from our SSD prototype indicate that the DVPFTL scheme reliably improves the parallelizability and endurance of NAND flash devices, with improved GC costs, compared with related works. Peer reviewed
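The abstract above says hot data identification should track both the frequency and the recency of accesses. A minimal sketch of one common way to combine the two is an exponentially decayed write counter. This is not the DL-HDI framework itself: the class `DecayingHotness`, the `decay` factor, and the `threshold` are hypothetical choices for illustration.

```python
class DecayingHotness:
    """Decayed per-address write counter (illustrative, not DL-HDI)."""

    def __init__(self, decay=0.9, threshold=2.0):
        self.decay = decay          # per-write decay factor (assumed value)
        self.threshold = threshold  # score at or above which data is "hot"
        self.score = {}             # logical address -> decayed write count
        self.last_seen = {}         # logical address -> clock of last write
        self.clock = 0              # advances by one on every write

    def record_write(self, lba):
        self.clock += 1
        # Age the old score by the number of writes since the last access
        # (recency), then add one for this write (frequency).
        age = self.clock - self.last_seen.get(lba, self.clock)
        self.score[lba] = self.score.get(lba, 0.0) * (self.decay ** age) + 1.0
        self.last_seen[lba] = self.clock

    def is_hot(self, lba):
        if lba not in self.score:
            return False
        age = self.clock - self.last_seen[lba]
        return self.score[lba] * (self.decay ** age) >= self.threshold
```

An address written several times in a row crosses the threshold, and it falls back below it once enough writes to other addresses have passed, so a single counter reflects both how often and how recently the data was updated.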

    SimpleSSD: Modeling Solid State Drives for Holistic System Simulation

    Existing solid state drive (SSD) simulators unfortunately lack hardware and/or software architecture models. Consequently, they are far from capturing the critical features of contemporary SSD devices. More importantly, while the performance of modern systems that adopt SSDs can vary based on their numerous internal design parameters and storage-level configurations, a full system simulation with traditional SSD models often requires unreasonably long runtimes and excessive computational resources. In this work, we propose SimpleSSD, a high-fidelity simulator that models all detailed characteristics of hardware and software, while simplifying the nondescript features of storage internals. In contrast to existing SSD simulators, SimpleSSD can easily be integrated into publicly available full system simulators. In addition, it can accommodate a complete storage stack and evaluate the performance of SSDs along with diverse memory technologies and microarchitectures. Thus, it facilitates simulations that explore the full design space at different levels of system abstraction. Comment: This paper has been accepted at IEEE Computer Architecture Letters (CAL)

    Self-Learning Hot Data Prediction: Where Echo State Network Meets NAND Flash Memories

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Understanding the access behavior of hot data well is important for NAND flash memory because of its crucial impact on the efficiency of garbage collection (GC) and wear leveling (WL), which respectively dominate the performance and life span of an SSD. Generally, both GC and WL rely greatly on the recognition accuracy of hot data identification (HDI). In this paper, however, we propose for the first time a novel concept of hot data prediction (HDP), which makes conventional HDI unnecessary. First, we develop a hybrid optimized echo state network (HOESN), in which sufficiently unbiased and continuously shrunk output weights are learnt by a sparse regression based on L2 and L1/2 regularization. Second, quantum-behaved particle swarm optimization (QPSO) is employed to compute the reservoir parameters (i.e., global scaling factor, reservoir size, scaling coefficient and sparsity degree) to further improve prediction accuracy and reliability. Third, in a test on a chaotic benchmark (Rossler), the HOESN performs better than six recent state-of-the-art methods. Finally, simulation results on six typical metrics, tested on five real disk workloads, together with on-chip experiment outcomes from an actual SSD prototype, indicate that our HOESN-based HDP can reliably improve the access performance and endurance of NAND flash memories. Peer reviewed

    Applying Reinforcement Learning to Mitigate the Long-Tail Latency Problem of SSDs

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2020. 2. 유승주. NAND flash memory is widely used in a variety of systems, from real-time embedded systems to high-performance enterprise server systems. Flash memory has (1) an erase-before-write (write-once) problem and (2) an endurance problem. A flash translation layer (FTL) is applied to handle the erase-before-write characteristic. Currently, page-level mapping is mainly used to reduce the latency increase caused by the write-once and block-erase characteristics of flash memory. Garbage collection (GC) is one of the leading causes of long-tail latency, where latency at the 99th percentile grows to more than 100 times the average latency. As a result, real-time or quality-critical systems cannot satisfy given requirements such as Quality of Service (QoS) restrictions. As flash memory capacity increases, GC latency also tends to increase, because the block size (the number of pages in one block) grows with capacity. GC latency is determined by valid page copy time and block erase time, so a larger block size means a longer GC latency. In particular, the block size increased in the transition from 2D planar to 3D vertical NAND flash memory, e.g., from 256 pages/block in 2D planar NAND flash memory to 768 pages/block in 3D NAND flash memory. Even within 3D NAND flash memory, the block size is expected to keep increasing. Thus, the long write latency caused by GC can become more serious in 3D NAND flash memory-based storage. In this dissertation, we propose three versions of a novel GC scheduling method based on reinforcement learning (RL). The purpose of the method is to reduce the long-tail latency caused by GC by utilizing the idle time of the storage system. We also perform a quantitative analysis of the RL-assisted GC solution. First, we propose an RL-assisted GC scheduling technique that learns the storage access behavior online and determines the number of GC operations to perform in the idle time. We also present an aggressive method that further reduces the long-tail latency by aggressively performing fine-grained GC operations. Second, we propose a Q-table cache technique that dynamically manages the key states in RL-assisted GC. This technique uses a large number of fine-grained pieces of information as state candidates and, with a relatively small amount of memory, manages the key states that suitably represent the characteristics of the workload, which reduces the long-tail latency even further. Third, we present a Q-value prediction network (QP Net) that predicts the initial Q-value of a state newly inserted into the Q-table cache. The integrated solution exploits the short-term history of the system with a low-cost Q-table cache, and uses the small QP Net to exploit the long-term history and provide good initial Q-values for the Q-table cache. The experiments show that our proposed method reduces the long-tail latency by 25%-37% compared with the state-of-the-art method.
    Contents:
    Chapter 1 Introduction
    Chapter 2 Background: 2.1 System Level Tail Latency; 2.2 Solid State Drive (2.2.1 Flash Storage Architecture and Garbage Collection); 2.3 Reinforcement Learning
    Chapter 3 Related Work
    Chapter 4 Small Q-table based Solution to Reduce Long Tail Latency: 4.1 Problem and Motivation (4.1.1 Long Tail Problem in Flash Storage Access Latency; 4.1.2 Idle Time in Flash Storage); 4.2 Design and Implementation (4.2.1 Solution Overview; 4.2.2 RL-assisted Garbage Collection Scheduling; 4.2.3 Aggressive RL-assisted Garbage Collection Scheduling); 4.3 Evaluation (4.3.1 Evaluation Setup; 4.3.2 Results and Discussion)
    Chapter 5 Q-table Cache to Exploit a Large Number of States at Small Cost: 5.1 Motivation; 5.2 Design and Implementation (5.2.1 Solution Overview; 5.2.2 Dynamic Key States Management); 5.3 Evaluation (5.3.1 Evaluation Setup; 5.3.2 Results and Discussion)
    Chapter 6 Combining Q-table cache and Neural Network to Exploit both Long and Short-term History: 6.1 Motivation and Problem (6.1.1 More State Information can Further Reduce Long Tail Latency; 6.1.2 Locality Behavior of Workload; 6.1.3 Zero Initialization Problem); 6.2 Design and Implementation (6.2.1 Solution Overview; 6.2.2 Q-table Cache for Action Selection; 6.2.3 Q-value Prediction); 6.3 Evaluation (6.3.1 Evaluation Setup; 6.3.2 Storage-Intensive Workloads; 6.3.3 Latency Comparison: Overall; 6.3.4 Q-value Prediction Network Effects on Latency; 6.3.5 Q-table Cache Analysis; 6.3.6 Immature State Analysis; 6.3.7 Miscellaneous Analysis; 6.3.8 Multi Channel Analysis)
    Chapter 7 Conclusion and Future Work: 7.1 Conclusion; 7.2 Future Work
    Bibliography
    Korean abstract
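The dissertation entry above describes an RL-assisted scheduler that learns online how many GC operations to run in idle time. A minimal sketch of that idea with a plain tabular Q-learning agent follows. It is not the dissertation's implementation: the class `GCScheduler`, the action set, the hyperparameters, and the reward shape in `reward_for` are assumptions for illustration, and the state is left as an opaque hashable label.

```python
import random


class GCScheduler:
    """Tabular Q-learning over (state -> number of GC operations)."""

    def __init__(self, actions=(0, 1, 2, 4), alpha=0.5, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.actions = actions        # candidate GC operation counts (assumed)
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.epsilon = epsilon        # exploration probability
        self.q = {}                   # (state, action) -> Q-value, default 0.0
        self.rng = random.Random(seed)

    def choose(self, state):
        # Epsilon-greedy: explore occasionally, otherwise pick the best action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.actions)
        old = self.q.get((state, action), 0.0)
        target = reward + self.gamma * best_next
        self.q[(state, action)] = old + self.alpha * (target - old)


def reward_for(idle_slots, gc_ops):
    """Hypothetical reward: +1 per GC op that fits in the idle time,
    -5 per op that overruns it and would delay a host request."""
    fits = min(idle_slots, gc_ops)
    overrun = gc_ops - fits
    return fits - 5 * overrun
```

Unseen states start with Q-values of zero in this sketch, which is the kind of cold start that the "Zero Initialization Problem" section and the Q-value prediction network in the outline appear to target.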