Bridging the Gap between Application and Solid-State-Drives

Author: Zhou Jian
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2018

Data storage is a critical part of a computing system in terms of performance, cost, reliability, and energy. Numerous new memory technologies, such as NAND flash, phase change memory (PCM), magnetic RAM (STT-RAM), and memristors, have emerged recently, and many of them have already entered production systems. Traditional storage optimization and caching algorithms are far from optimal because storage I/Os do not show simple locality. Providing optimal storage requires accurate predictions of I/O behavior. However, workloads are increasingly dynamic and diverse, which makes both long-term and short-term I/O prediction challenging. Because of the evolution of storage technologies and the increasing diversity of workloads, storage software is becoming more and more complex. For example, a Flash Translation Layer (FTL) is added for NAND-flash based Solid State Disks (NAND-SSDs), but it introduces overhead such as address translation delay and garbage collection costs. Many recent studies aim to address this overhead. Unfortunately, there is no one-size-fits-all solution because of the variety of workloads. Despite the rapid evolution of storage technologies, the increasing heterogeneity and diversity of machines and workloads, coupled with the continued data explosion, widen the gap between computing and storage speeds. In this dissertation, we improve data storage performance with both top-down and bottom-up approaches. First, we will investigate exposing storage-level parallelism so that applications can avoid I/O contention and workload skew when scheduling jobs. Second, we will study how architecture-aware task scheduling can improve application performance when PCM-based NVRAM is equipped. Third, we will develop an I/O correlation-aware flash translation layer for NAND-flash based Solid State Disks. Fourth, we will build a DRAM-based correlation-aware FTL emulator and study its performance in various filesystems.
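
The two FTL overheads named in this abstract, address translation and garbage collection, can be illustrated with a toy model. The sketch below is not the dissertation's FTL: the class name `TinyFTL`, the greedy victim policy, and the block and page counts are all assumptions chosen for illustration. It maps logical pages to physical pages, writes updates out of place, and counts the extra flash writes caused by relocating live pages.

```python
import random


class TinyFTL:
    """Toy page-mapped FTL: out-of-place writes plus greedy garbage collection.

    Assumes the number of distinct logical pages stays at or below
    (num_blocks - 1) * pages_per_block, so GC can always reclaim space.
    """

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.ppb = pages_per_block
        # Each block lists the logical page held by every programmed page;
        # None marks a page whose data has been superseded (invalid).
        self.blocks = [[] for _ in range(num_blocks)]
        self.free = list(range(1, num_blocks))   # erased blocks
        self.active = 0                          # block currently being filled
        self.l2p = {}                            # logical page -> (block, page)
        self.host_writes = 0
        self.flash_writes = 0

    def _program(self, lpn):
        if len(self.blocks[self.active]) == self.ppb:
            self.active = self.free.pop(0)
        blk = self.blocks[self.active]
        blk.append(lpn)
        self.l2p[lpn] = (self.active, len(blk) - 1)
        self.flash_writes += 1

    def _collect(self):
        # Greedy policy: reclaim the full block holding the fewest valid pages.
        full = [b for b in range(len(self.blocks))
                if b != self.active and b not in self.free]
        victim = min(full, key=lambda b: sum(p is not None for p in self.blocks[b]))
        live = [p for p in self.blocks[victim] if p is not None]
        self.blocks[victim] = []                 # erase the victim block
        self.free.append(victim)
        for lpn in live:                         # relocation costs extra flash writes
            self._program(lpn)

    def write(self, lpn):
        self.host_writes += 1
        if lpn in self.l2p:                      # out-of-place update: invalidate old copy
            b, p = self.l2p[lpn]
            self.blocks[b][p] = None
        while len(self.blocks[self.active]) == self.ppb and not self.free:
            self._collect()
        self._program(lpn)

    def read(self, lpn):
        return self.l2p[lpn]                     # address translation

    def waf(self):
        return self.flash_writes / self.host_writes


rng = random.Random(0)
ftl = TinyFTL()
for _ in range(200):
    ftl.write(rng.randrange(10))
print(f"write amplification = {ftl.waf():.2f}")
```

The ratio printed at the end is the write amplification factor: flash page writes divided by host page writes. It stays at 1.0 only when every reclaimed block contains no live data.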

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Program Context-Based Optimization Techniques for Improving the Performance and Lifetime of NAND Flash Storage Devices

Author: 김태진
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2019

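Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Computer Science and Engineering, College of Engineering, 2019. 2. 김지홍.

To improve the performance of computing systems, research on replacing slow hard disk drives (HDDs) with fast NAND flash memory-based storage devices (SSDs) has recently been active. Although continued semiconductor process scaling and multi-leveling techniques have lowered the price of SSDs to the level of comparable HDDs, the shortened lifetime of NAND flash memory, a side effect of recent advanced device technologies, is one of the major barriers to the wide adoption of SSDs in high-performance computing systems. This dissertation proposes system-level improvement techniques that address the lifetime and performance problems of recent high-density NAND flash memory. The proposed techniques use the write context of an application to analyze data lifetime patterns and duplicate data patterns that could not previously be obtained. Building on this analysis, they overcome the limitations of existing techniques, which use only simple single-layer information, and present an optimization methodology that effectively improves the performance and lifetime of NAND flash memory. First, we confirm through analysis that the I/O activities of an application exhibit distinct data lifetime and duplicate data patterns depending on context. To use the context information effectively, we implemented a method for extracting the program context (write context). With program context information, the limitations of existing techniques for reducing garbage collection overhead and improving the limited lifetime of NAND flash memory can be effectively overcome. Second, we propose a technique that increases the accuracy of data lifetime prediction in order to reduce WAF in multi-streamed SSDs. To this end, we propose a system-level approach that exploits the I/O context of an application. The key motivation is that data lifetime should be evaluated at a higher abstraction level than LBAs. By predicting data lifetime more accurately based on program context, the technique overcomes the limitation of existing techniques that manage data lifetime based on LBAs. As a result, short-lived data can be effectively separated from long-lived data to improve garbage collection efficiency. Lastly, we propose selective deduplication, which avoids unnecessary deduplication work based on an analysis of the duplicate data patterns of write program contexts. We show analytically that some program contexts do not generate duplicate data, and excluding them improves the efficiency of deduplication. We also propose a new maintenance policy for the data structure that manages written data, based on the patterns in which duplicate data occurs. In addition, we propose fine-grained deduplication, which increases the likelihood of eliminating duplicate data by introducing sub-page chunks. To evaluate the effectiveness of the proposed techniques, we performed a series of evaluations, both trace-driven simulations using I/O traces collected from various real systems and emulator-based experiments running real applications. Furthermore, we modified the internal firmware of a multi-stream device and ran experiments in an environment configured to be as close to a real one as possible. The experimental results confirm that the proposed system-level optimization techniques are more effective than existing optimization techniques in improving performance and lifetime. If the proposed techniques are developed further, they are expected to contribute to NAND flash memory being widely used as the main storage of ultra-high-speed computing systems.

Replacing HDDs with NAND flash-based storage devices (SSDs) has been one of the major challenges in modern computing systems, especially with regard to better performance and higher mobility. 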
Although continuous semiconductor process scaling and multi-leveling techniques have lowered the price of SSDs to a level comparable to that of HDDs, the decreasing lifetime of NAND flash memory, a side effect of recent advanced device technologies, is emerging as one of the major barriers to the wide adoption of SSDs in high-performance computing systems. In this dissertation, system-level lifetime improvement techniques for recent high-density NAND flash memory are proposed. Unlike existing techniques, the proposed techniques resolve the problems of decreasing performance and lifetime of NAND flash memory by exploiting the I/O context of an application to analyze data lifetime patterns or duplicate data content patterns. We first show that the I/O activities of an application have distinct data lifetime and duplicate data patterns. In order to effectively utilize the context information, we implemented a program context extraction method. With the program context, we can overcome the limitations of existing techniques for reducing garbage collection overhead and improving the limited lifetime of NAND flash memory. Second, we propose a system-level approach to reduce WAF that exploits the I/O context of an application to increase the accuracy of data lifetime prediction for multi-streamed SSDs. The key motivation behind the proposed technique is that data lifetimes should be estimated at a higher abstraction level than LBAs, so we employ a write program context as the stream management unit. Thus, it can effectively separate data with short lifetimes from data with long lifetimes to improve the efficiency of garbage collection. Lastly, we propose selective deduplication, which can avoid unnecessary deduplication work based on the duplicate data pattern analysis of write program contexts. With the help of selective deduplication, we also propose fine-grained deduplication, which improves the likelihood of eliminating redundant data by introducing sub-page chunks. 
It also resolves technical difficulties caused by its finer granularity, i.e., increased memory requirement and read response time. In order to evaluate the effectiveness of the proposed techniques, we performed a series of evaluations using both a trace-driven simulator and an emulator with I/O traces collected from various real-world systems. To understand the feasibility of the proposed techniques, we also implemented them in the Linux kernel on top of our in-house flash storage prototype and then evaluated their effects on lifetime while running real-world applications. Our experimental results show that the system-level optimization techniques are more effective than existing optimization techniques.

I. Introduction
  1.1 Motivation
    1.1.1 Garbage Collection Problem
    1.1.2 Limited Endurance Problem
  1.2 Dissertation Goals
  1.3 Contributions
  1.4 Dissertation Structure
II. Background
  2.1 NAND Flash Memory System Software
  2.2 NAND Flash-Based Storage Devices
  2.3 Multi-stream Interface
  2.4 Inline Data Deduplication Technique
  2.5 Related Work
    2.5.1 Data Separation Techniques for Multi-streamed SSDs
    2.5.2 Write Traffic Reduction Techniques
    2.5.3 Program Context based Optimization Techniques for Operating Systems
III. Program Context-based Analysis
  3.1 Definition and Extraction of Program Context
  3.2 Data Lifetime Patterns of I/O Activities
  3.3 Duplicate Data Patterns of I/O Activities
IV. Fully Automatic Stream Management For Multi-Streamed SSDs Using Program Contexts
  4.1 Overview
  4.2 Motivation
    4.2.1 No Automatic Stream Management for General I/O Workloads
    4.2.2 Limited Number of Supported Streams
  4.3 Automatic I/O Activity Management
    4.3.1 PC as a Unit of Lifetime Classification for General I/O Workloads
  4.4 Support for Large Number of Streams
    4.4.1 PCs with Large Lifetime Variances
    4.4.2 Implementation of Internal Streams
  4.5 Design and Implementation of PCStream
    4.5.1 PC Lifetime Management
    4.5.2 Mapping PCs to SSD streams
    4.5.3 Internal Stream Management
    4.5.4 PC Extraction for Indirect Writes
  4.6 Experimental Results
    4.6.1 Experimental Settings
    4.6.2 Performance Evaluation
    4.6.3 WAF Comparison
    4.6.4 Per-stream Lifetime Distribution Analysis
    4.6.5 Impact of Internal Streams
    4.6.6 Impact of the PC Attribute Table
V. Deduplication Technique using Program Contexts
  5.1 Overview
  5.2 Selective Deduplication using Program Contexts
    5.2.1 PCDedup: Improving SSD Deduplication Efficiency using Selective Hash Cache Management
    5.2.2 2-level LRU Eviction Policy
  5.3 Exploiting Small Chunk Size
    5.3.1 Fine-Grained Deduplication
    5.3.2 Read Overhead Management
    5.3.3 Memory Overhead Management
    5.3.4 Experimental Results
VI. Conclusions
  6.1 Summary and Conclusions
  6.2 Future Work
    6.2.1 Supporting applications that have unusual program contexts
    6.2.2 Optimizing read request based on the I/O context
    6.2.3 Exploiting context information to improve fingerprint lookups
Bibliography
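
The abstract above uses a write program context as the stream management unit, so that writes issued from the same code path share a stream. A rough user-space analogue is sketched below. It is only an illustration under stated assumptions: the dissertation extracts contexts inside the operating system, whereas this sketch hashes Python function names from the call stack, and the names `program_context`, `StreamMapper`, `submit_write`, `write_log`, and `write_checkpoint` are hypothetical. The first-come stream assignment is a placeholder and not the lifetime-based management the dissertation describes.

```python
import traceback
import zlib


def program_context(depth=2):
    """Return a signature for the call path that led to the current write.

    Assumption: the names of the innermost `depth` calling functions stand in
    for the program counters that a kernel-level implementation would use.
    """
    frames = traceback.extract_stack()[:-1]      # drop this function's own frame
    names = [frame.name for frame in frames[-depth:]]
    return zlib.crc32("/".join(names).encode())


class StreamMapper:
    """Assign each program context to one of a limited number of streams."""

    def __init__(self, num_streams=4):
        self.num_streams = num_streams
        self.pc_to_stream = {}

    def stream_for(self, pc):
        if pc not in self.pc_to_stream:
            # Placeholder policy: hand out stream IDs in arrival order.
            self.pc_to_stream[pc] = len(self.pc_to_stream) % self.num_streams
        return self.pc_to_stream[pc]


mapper = StreamMapper()


def submit_write(lba, data):
    """Tag a write with the stream ID chosen for its program context."""
    return mapper.stream_for(program_context())


def write_log(lba):
    return submit_write(lba, b"log record")


def write_checkpoint(lba):
    return submit_write(lba, b"checkpoint")


print(write_log(10), write_log(11), write_checkpoint(500))
```

Both calls to `write_log` land in the same stream regardless of the target LBA, while `write_checkpoint` gets a different one. This mirrors the point that the write path, rather than the address, is the unit of classification.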

SNU Open Repository and Archive

Improving flash write performance by using update frequency

Author: Agrawal N.; Bouganim L.; Bux W.; Cai Y.; Chiang M.-L.; Choi H.; Corless R.; Grupp L.; Grupp L.; Gupta A.; Haas R.; Hu X.; Johnson R.; Kawaguchi A.; Lee S.; Lee S.-W.; Lee Y.; Ma D.; Park D.; Park D.; Rosenblum M.; Rosenblum M.; Wu M.
Publication venue: 'VLDB Endowment'

Crossref

Cross-Layer Optimization Techniques for Improving Performance and Reliability of NAND Flash-Based Storage Systems

Author: 하건수
Publication venue: Seoul National University Graduate School
Publication date: 01/08/2015

Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, 2015. 8. 김지홍. As advanced process technologies and multi-leveling techniques quickly improve the cost-per-bit of NAND flash memory, NAND flash-based storage systems are widely employed, from mobile embedded systems to high-end enterprise server systems. Although the advanced process and device techniques have greatly improved the cost-per-bit of NAND flash memory, they have also significantly degraded its performance and reliability as key side effects. In order for NAND flash-based storage systems to be more broadly used in various computing environments, it is critical to overcome the performance and reliability problems of recent high-density NAND flash memory in a satisfactory fashion. In this dissertation, we argue that cross-layer optimization techniques, which vertically integrate various optimization factors from different design abstraction levels, can play key roles in improving the performance and reliability of high-density NAND flash memory. First, we propose read-disturb management techniques which reduce the expensive read-disturb management overheads while maintaining the reliability of NAND flash memory. An FTL using the read-disturb management module, called redFTL, redistributes highly skewed read accesses to a small part of NAND flash memory into more balanced read accesses to a large number of blocks, thus reducing the data migrations needed for avoiding read-disturb errors. As an extended version of redFTL, we propose an integrated read-disturb management technique, called redFTL+, which fundamentally solves read-disturb problems by exploiting a tradeoff between read disturbance and write speed. By modifying NAND chips to support multiple read modes with different read voltages and write speeds, redFTL+ intelligently allocates frequently-read data to read-resistant blocks. 
Since the read disturbance is also proportional to the read time, redFTL+ takes advantage of the difference in the read time among different NAND pages by reallocating read-intensive data to read-resistant pages. Second, we propose data separation techniques which reduce garbage collection overhead. We propose a program context-aware data separation technique, called PDS, which can reduce the garbage collection overhead by exploiting program context hints. By using a program context, which serves as a proper granularity for tracking data update behavior, PDS helps an FTL gather data with similar update times into the same blocks. As an improved version of PDS, we propose an integrated data separation technique, called IDS, which uses both the update history of the NAND device and program context hints for predicting data update behaviors. By classifying data based on the cross-layer information, an FTL using IDS can produce more dead or near-dead blocks than PDS, thus reducing the garbage collection overhead. In order to evaluate the effectiveness of the proposed techniques, we performed a series of evaluations using both a simulator and an emulator with I/O traces collected from various systems. Our experimental results show that the cross-layer optimization techniques are more effective than our single-layer optimization techniques. RedFTL+ decreases the read-disturb management overhead on average by 24% over redFTL. The IDS-based FTL decreases the garbage collection overhead on average by 18% over the PDS-based FTL. The evaluation results demonstrate that our cross-layer optimization techniques improve the overall performance of NAND-based storage systems over previous single-layered optimization techniques by reducing overheads from read-disturb management and garbage collection while maintaining the reliability of the storage systems.

Contents
I. Introduction
  1.1 Motivations
    1.1.1 Read-Disturb Problem
    1.1.2 Garbage Collection Problem
  1.2 Research Goals and Contributions
  1.3 Dissertation Structure
II. Background
  2.1 NAND Flash Memory
  2.2 System Software for NAND Flash Memory
  2.3 NAND Flash-Based Storage Devices
  2.4 Related Work
    2.4.1 Read-Disturb Techniques
    2.4.2 Data Separation Techniques
III. A Single-Layered Read Disturb Management Technique
  3.1 Overview
  3.2 Performance Implications of Read Disturbs
    3.2.1 Effect of Frequent Read Reclaims
    3.2.2 Effect of Read Reclaims on Response Time Fluctuations
    3.2.3 Effect of SSD Read Buffer on Read Reclaims
  3.3 Read Disturb Management Techniques
    3.3.1 Data Distribution Technique
    3.3.2 Proactive Data Migration
  3.4 RedFTL: Read Disturb-Aware FTL
    3.4.1 Overview of RedFTL
    3.4.2 Read-Hot Page Separation
    3.4.3 Good Block Pool Management
  3.5 Experimental Results
IV. An Integrated Approach for Read Disturb Management
  4.1 Overview
  4.2 Read Disturb Management Techniques
    4.2.1 Mitigation of Read Reclaims by Read Voltage Scaling
    4.2.2 Mitigation of Read Reclaims by Read Operation Time Scaling
    4.2.3 NAND Read-Disturbance Model
  4.3 Design and Implementation of RedFTL+
    4.3.1 Overview
    4.3.2 Dynamic Mode Selection
    4.3.3 Distributed Migration to RRBs
    4.3.4 Read-Hotness Detection
  4.4 Experimental Results
V. A Single-Layered Data Separation Technique
  5.1 Motivations
    5.1.1 Frequency-Based Data Separation
    5.1.2 Garbage Collection Using ORA
    5.1.3 Evaluation of Existing Locality-based Heuristic
  5.2 Correlation between Program Contexts and Updates
  5.3 PDS: Program Context-Aware Data Separation Technique
  5.4 Experimental Results
VI. An Integrated Data Separation Technique
  6.1 Limitations of Single-Layered Program Context-Aware Data Separation Technique
  6.2 IDS: Integrated Data Separation Technique
    6.2.1 Overview
    6.2.2 Determination of Update Program Context
    6.2.3 Dynamic Clustering Program Contexts Based On Update Locality
    6.2.4 Managing The Hot Data Associated with An Update Program Context
  6.3 Experimental Results
VII. Conclusions
  7.1 Summary
  7.2 Future Work
    7.2.1 Improving QoS of RedFTL+ by Exploiting Program Context Hints
    7.2.2 Mitigating Read-Disturb Problem by Read Disturb-Aware Read Buffer Management Technique
    7.2.3 Improving Efficiency of Garbage Collection by Adjusting GC Trigger Points
    7.2.4 Improving Performance and Reliability of NAND Flash Memory by Integrating Various Techniques
Bibliography
Appendix
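
The abstract's claim that spreading skewed reads over many blocks reduces read-disturb data migrations can be checked with a small counting model. The sketch below is an assumption-laden illustration, not redFTL itself: the function name `count_read_reclaims`, the per-block read counter, and the threshold of 400 reads are hypothetical, and real reclaim thresholds depend on the device.

```python
def count_read_reclaims(page_to_block, reads, threshold):
    """Count read reclaims for a read sequence under a given page placement.

    Assumption: a block is reclaimed (its data migrated and its counter reset)
    once it has absorbed `threshold` reads since its last reclaim.
    """
    counts = {}
    reclaims = 0
    for page in reads:
        block = page_to_block[page]
        counts[block] = counts.get(block, 0) + 1
        if counts[block] >= threshold:
            reclaims += 1
            counts[block] = 0
    return reclaims


# Four read-hot pages, each read 250 times in round-robin order.
reads = [0, 1, 2, 3] * 250

concentrated = {0: 0, 1: 0, 2: 0, 3: 0}   # all hot pages share block 0
distributed = {0: 0, 1: 1, 2: 2, 3: 3}    # one hot page per block

print(count_read_reclaims(concentrated, reads, threshold=400))  # 2
print(count_read_reclaims(distributed, reads, threshold=400))   # 0
```

With all hot pages in one block, the block absorbs 1000 reads and crosses the threshold twice. With one hot page per block, each block sees only 250 reads and is never reclaimed.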

SNU Open Repository and Archive

Cost-Age-Time Data Organized Garbage Collection

Author: Lee Dongjun
Publication venue: 'Oregon State University'

NAND flash based solid state drives (SSDs) require out-of-place updating due to the characteristics of flash memories. In addition, due to the mismatched granularity between read/write and erase operations, a cleaning policy involving garbage collection and wear leveling has to perform data migration, which incurs high overhead. Another challenge is that flash devices can tolerate only a limited number of erases. This paper proposes the Cost-Age-Time Data Organized Garbage Collection (CATDOG) scheme, which clusters data based on their update frequencies to reduce the overhead of data migration, trades off between endurance and performance, and efficiently erases multiple blocks to reduce garbage collection latency. To the best of our knowledge, this is the first paper to provide a holistic discussion of the effects of combining all three factors. Our simulation study shows that CATDOG achieves up to 3.54 times higher throughput and 1.18 times greater endurance than a selected baseline for a heavy write workload.

ScholarsArchive@OSU

Flash Memory Devices

Publication venue: 'MDPI AG'
Publication date: 21/03/2022

Flash memory devices have represented a breakthrough in storage since their inception in the mid-1980s, and innovation is still ongoing. A distinctive feature of the technology is its inherent flexibility in performance and integration density, depending on the architecture chosen for integration. NOR Flash technology is still the workhorse of many code storage applications in the embedded world, ranging from microcontrollers for automotive environments to IoT smart devices. Its usage is also forecast to be fundamental in emerging AI edge scenarios. When massive data storage is required, by contrast, NAND Flash memories are necessary. NAND Flash can be found in USB sticks and cards, but most of all in Solid-State Drives (SSDs). Since SSDs are extremely demanding in terms of storage capacity, they fueled a new wave of innovation, namely the 3D architecture. Today "3D" means that multiple layers of memory cells are manufactured within the same piece of silicon, easily reaching a terabit capacity. So far, Flash architectures have always been based on the "floating gate," where information is stored by injecting electrons into a piece of polysilicon surrounded by oxide. Emerging concepts, by contrast, are based on "charge trap" cells. In summary, flash memory devices represent the largest landscape of storage devices, and we expect more advancements in the coming years. This will require a lot of innovation in process technology, materials, circuit design, flash management algorithms, Error Correction Code and, finally, system co-design for new applications such as AI and security enforcement.

Directory of Open Access Books (DOAB)

Exploiting solid state drive parallelism for real-time flash storage

Author: Missimer Katherine
Publication date: 08/02/2021

The increased volume of sensor data generated by emerging applications in areas such as autonomous vehicles requires new technologies for storage and retrieval. NAND flash memory has desirable characteristics for real-time information storage and retrieval, such as non-volatility, shock resistance, low power consumption and fast access time. However, NAND flash memory management suffers from high tail latency during storage space reclamation. This is unacceptable in a real-time system, where missed deadlines can have potentially catastrophic consequences. Current methods to ensure timing guarantees in flash storage do not explicitly exploit the internal parallelism in Solid State Drives (SSDs). Modern SSDs are able to support massive amounts of parallelism, as evidenced by the shift from the Advanced Host Controller Interface (AHCI) to the Non-Volatile Memory Host Controller Interface (NVMe), a multi-queue interface. This thesis focuses on providing predictable, low-latency guarantees for read and write requests in NAND flash memory by exploiting the internal parallelism in SSDs. The first part of the thesis presents a partitioned flash design that dynamically assigns each parallel flash unit to perform either reads or writes. To access data from a flash unit that is busy servicing a write request or performing garbage collection, the device rebuilds the data using encoding. Consequently, reads are never blocked by writes or storage space reclamation. In this design, however, low read latency is achieved at the expense of write throughput. The second part of the thesis explores how to predictably improve performance by minimizing the garbage collection cost in flash storage. The root cause of this extra cost is the SSD's inability to accurately determine data lifetime and group together data that expires before space needs to be reclaimed. 
This is exacerbated by the narrow block I/O interface, which prevents optimizations from either the device or the application above. By sharing application-specific knowledge of data lifetime with the device, the SSD is able to efficiently lay out data such that garbage collection cost is minimized.
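
The first design in this abstract serves a read aimed at a busy flash unit by rebuilding the data from an encoding held on other units. The abstract does not say which code is used, so the sketch below assumes the simplest one, a single XOR parity page per stripe. The names `xor_pages` and `ParityStripe` are hypothetical, and the model assumes at most one unit per stripe is busy at a time.

```python
from functools import reduce


def xor_pages(pages):
    """Byte-wise XOR of equally sized pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


class ParityStripe:
    """Data pages spread over parallel flash units plus one XOR parity page."""

    def __init__(self, data_pages):
        self.data = list(data_pages)
        self.parity = xor_pages(self.data)
        self.busy = set()          # units currently writing or garbage collecting

    def read(self, unit):
        if unit not in self.busy:
            return self.data[unit]             # normal path: read the unit directly
        # Busy unit: rebuild its page from the idle units and the parity page.
        others = [page for i, page in enumerate(self.data) if i != unit]
        return xor_pages(others + [self.parity])


stripe = ParityStripe([b"AAAA", b"BBBB", b"CCCC"])
stripe.busy.add(1)                 # unit 1 is servicing a write
print(stripe.read(1))              # b'BBBB', rebuilt without touching unit 1
```

The read completes without waiting on unit 1, at the cost of reading every other unit in the stripe plus the parity page. The write-throughput penalty mentioned in the abstract is not modeled here.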

Boston University Institutional Repository (OpenBU)