11 research outputs found

    Reliability Improvement and Performance Optimization Techniques for Ultra-Large Solid State Drives

    Get PDF
    Thesis (Ph.D.) -- Graduate School of Seoul National University: College of Engineering, Department of Computer Science and Engineering, August 2021. Jihong Kim.

    The development of ultra-large NAND flash storage devices (SSDs) has recently been made possible by NAND flash memory process scaling and multi-leveling techniques, together with NAND packaging technology that allows many NAND flash memory dies to be mounted in a single SSD, enabling storage capacity to keep increasing. As the capacity of an SSD increases, the total cost of ownership of the storage system can be reduced very effectively; however, the reliability and performance limitations of ultra-large SSDs remain obstacles to their wide adoption. To take advantage of an ultra-large SSD, new techniques that address these reliability and performance issues must be developed. In this dissertation, we propose various optimization techniques to solve the reliability and performance issues of ultra-large SSDs. To overcome the optimization limitations of existing approaches, our techniques are designed based on extensive characterization of NAND flash devices and analysis of field failure characteristics of real SSDs.

    We first propose a low-stress erase technique that reduces the characteristic variation among wordlines (WLs) in a NAND flash block. By reducing the erase stress on weak WLs, it effectively slows down NAND degradation and improves NAND endurance. From the NAND evaluation results, the conditions that most effectively guard the weak WLs are defined as gerase modes. In addition, considering the user workload characteristics, we propose a technique that dynamically selects the optimal gerase mode to maximize the lifetime of the SSD.

    Second, we propose an integrated approach that maximizes the efficiency of copyback operations to improve performance without compromising data reliability. Based on characterization of real 3D TLC flash chips, we propose a novel per-block error propagation model under consecutive copyback operations. Our model significantly increases the number of successive copybacks by exploiting the aging characteristics of NAND blocks. Furthermore, we devise a resource-efficient error management scheme that can handle successive copybacks in which pages move across multiple blocks with different reliability. By utilizing the proposed copyback operation for internal data movement, SSD performance can be effectively improved without any reliability issues.

    Finally, we propose a new recovery scheme, called reparo, for RAID storage systems built from ultra-large SSDs. Unlike existing RAID recovery schemes, reparo repairs a failed SSD at the NAND die granularity without replacing it with a new SSD, thus avoiding most of the inter-SSD data copies during a RAID recovery step. When a NAND die of an SSD fails, reparo exploits the multi-core processor of the SSD controller to identify failed LBAs from the failed NAND die and to recover the data of those LBAs. Furthermore, reparo ensures no negative post-recovery impact on the performance and lifetime of the repaired SSD.

    To evaluate the effectiveness of the proposed techniques, we implemented them in a storage device prototype, an open NAND flash storage device development environment, and a real SSD environment, and verified their usefulness using various benchmarks and I/O traces collected from real-world applications.
    The experimental results show that the reliability and performance of ultra-large SSDs can be effectively improved through the proposed techniques.

    Per-die NAND flash capacity keeps growing through semiconductor process scaling and multi-leveling, and NAND packaging technology allows a large number of NAND flash memory dies to be mounted in a single NAND flash based storage system; together these have made it possible to build ultra-large NAND flash storage devices far larger than hard disks. The larger a flash storage device is, the more effectively it reduces the total cost of ownership of the storage system, but the reliability and performance limitations of ultra-large NAND flash storage devices stand in the way of their wide adoption. To exploit the advantages of ultra-large storage devices, new techniques that improve their reliability and performance are required. This dissertation proposes various optimization techniques that address the performance and reliability problems of ultra-large NAND flash based storage devices (SSDs). To overcome the optimization limits of existing approaches, our techniques are designed based on extensive characterization of real NAND flash devices and analysis of field failure characteristics of real SSDs, and we present optimization methodologies that improve performance and reliability while taking into account the behavior of the NAND flash, the SSD, and the host system.

    First, we propose a dynamic erase stress relief technique that reduces the characteristic variation among pages within a NAND flash block. To extend NAND block endurance, the technique builds an erase stress relief model from NAND evaluation results so that weaker pages receive less erase stress. In addition, considering user workload characteristics, it dynamically determines the optimal relief level at which the erase stress relief is most effective. By efficiently controlling erase operations, the main cause of NAND block wear, it effectively extends the lifetime of the storage device.

    Second, to mitigate the performance degradation caused by internal data movement in high-capacity SSDs, we propose rCPB, an adaptive technique that exploits the restricted copyback command of NAND flash. rCPB is based on a new copyback error propagation model that reflects the aging characteristics of NAND blocks, maximizing the efficiency of copyback commands without compromising data reliability. In addition, we propose a resource-efficient error management scheme that safely handles copyback-based data movement between blocks of different reliability. This allows internal data movement to be optimized by using copyback as much as possible without harming reliability, thereby improving SSD performance.

    Finally, we propose reparo, a new RAID recovery scheme that minimizes the RAID (redundant array of independent disks) rebuild overhead caused by NAND flash die failures in ultra-large SSDs. Reparo minimizes recovery overhead by repairing only the failed die, without replacing the SSD itself. By selectively recovering only the data of the failed die it minimizes rebuild traffic during recovery, and by exploiting the internal parallelism of the SSD it effectively shortens the failed-die recovery time.
λ˜ν•œ die λΆˆλŸ‰μœΌλ‘œ μΈν•œ 물리적 κ³΅κ°„κ°μ†Œμ˜ λΆ€μž‘μš©μ„ μ΅œμ†Œν™” ν•¨μœΌλ‘œμ¨ 볡ꡬ μ΄ν›„μ˜ μ„±λŠ₯ μ €ν•˜ 및 수λͺ…μ˜ κ°μ†Œ λ¬Έμ œκ°€ 없도둝 ν•œλ‹€. λ³Έ λ…Όλ¬Έμ—μ„œ μ œμ•ˆν•œ 기법듀은 μ €μž₯μž₯치 ν”„λ‘œν† νƒ€μž… 및 곡개 λ‚Έλ“œ ν”Œλž˜μ‰¬ μ €μž₯μž₯치 κ°œλ°œν™˜κ²½, 그리고 μ‹€μž₯ SSDν™˜κ²½μ— κ΅¬ν˜„λ˜μ—ˆμœΌλ©°, μ‹€μ œ μ‘μš© ν”„λ‘œκ·Έλž¨μ„ λͺ¨μ‚¬ν•œ λ‹€μ–‘ν•œ 벀트마크 및 μ‹€μ œ I/O νŠΈλ ˆμ΄μŠ€λ“€μ„ μ΄μš©ν•˜μ—¬ κ·Έ μœ μš©μ„±μ„ κ²€μ¦ν•˜μ˜€λ‹€. μ‹€ν—˜ κ²°κ³Ό, μ œμ•ˆλœ 기법듀을 ν†΅ν•΄μ„œ μ΄ˆκ³ μš©λŸ‰ SSD의 μ‹ λ’°μ„± 및 μ„±λŠ₯을 효과적으둜 κ°œμ„ ν•  수 μžˆμŒμ„ ν™•μΈν•˜μ˜€λ‹€.I Introduction 1 1.1 Motivation 1 1.2 Dissertation Goals 3 1.3 Contributions 5 1.4 Dissertation Structure 8 II Background 11 2.1 Overview of 3D NAND Flash Memory 11 2.2 Reliability Management in NAND Flash Memory 14 2.3 UL SSD architecture 15 2.4 Related Work 17 2.4.1 NAND endurance optimization by utilizing page characteristics difference 17 2.4.2 Performance optimizations using copyback operation 18 2.4.3 Optimizations for RAID Rebuild 19 2.4.4 Reliability improvement using internal RAID 20 III GuardedErase: Extending SSD Lifetimes by Protecting Weak Wordlines 22 3.1 Reliability Characterization of a 3D NAND Flash Block 22 3.1.1 Large Reliability Variations Among WLs 22 3.1.2 Erase Stress on Flash Reliability 26 3.2 GuardedErase: Design Overview and its Endurance Model 28 3.2.1 Basic Idea 28 3.2.2 Per-WL Low-Stress Erase Mode 31 3.2.3 Per-Block Erase Modes 35 3.3 Design and Implementation of LongFTL 39 3.3.1 Overview 39 3.3.2 Weak WL Detector 40 3.3.3 WAF Monitor 42 3.3.4 GErase Mode Selector 43 3.4 Experimental Results 46 3.4.1 Experimental Settings 46 3.4.2 Lifetime Improvement 47 3.4.3 Performance Overhead 49 3.4.4 Effectiveness of Lowest Erase Relief Ratio 50 IV Improving SSD Performance Using Adaptive Restricted- Copyback Operations 52 4.1 Motivations 52 4.1.1 Data Migration in Modern SSD 52 4.1.2 Need for Block Aging-Aware Copyback 53 4.2 RCPB: Copyback with a Limit 55 4.2.1 Error-Propagation Characteristics 55 4.2.2 RCPB Operation Model 58 4.3 Design and Implementation of rcFTL 59 4.3.1 EPM module 60 4.3.2 Data Migration Mode Selection 64 4.4 Experimental Results 65 4.4.1 Experimental Setup 65 4.4.2 Evaluation Results 66 V Reparo: A Fast RAID Recovery Scheme for Ultra- Large SSDs 70 5.1 SSD Failures: Causes and Characteristics 70 5.1.1 SSD Failure Types 70 5.1.2 SSD Failure Characteristics 72 5.2 Impact of UL SSDs on RAID Reliability 74 5.3 RAID Recovery using Reparo 77 5.3.1 Overview of Reparo 77 5.4 Cooperative Die Recovery 82 5.4.1 Identifier: Parallel Search of Failed LBAs 82 5.4.2 Handler: Per-Core Space Utilization Adjustment 83 5.5 Identifier Acceleration Using P2L Mapping Information 89 5.5.1 Page-level P2L Entrustment to Neighboring Die 90 5.5.2 Block-level P2L Entrustment to Neighboring Die 92 5.5.3 Additional Considerations for P2L Entrustment 94 5.6 Experimental Results 95 5.6.1 Experimental Settings 95 5.6.2 Experimental Results 97 VI Conclusions 109 6.1 Summary 109 6.2 Future Work 111 6.2.1 Optimization with Accurate WAF Prediction 111 6.2.2 Maximizing Copyback Threshold 111 6.2.3 Pre-failure Detection 112λ°•

    Fourth NASA Goddard Conference on Mass Storage Systems and Technologies

    Get PDF
    This report contains copies of all those technical papers received in time for publication just prior to the Fourth Goddard Conference on Mass Storage Systems and Technologies, held March 28-30, 1995, at the University of Maryland, University College Conference Center, in College Park, Maryland. This series of conferences continues to serve as a unique medium for the exchange of information on topics relating to the ingestion and management of substantial amounts of data and the attendant problems involved. This year's discussion topics include new storage technology, stability of recorded media, performance studies, storage system solutions, the National Information Infrastructure (Infobahn), the future for storage technology, and lessons learned from various projects. There will also be an update on the IEEE Mass Storage System Reference Model Version 5, on which the final vote was taken in July 1994.

    Prefetching and Caching Techniques in File Systems for Mimd Multiprocessors

    Get PDF
    The increasing speed of the most powerful computers, especially multiprocessors, makes it difficult to provide sufficient I/O bandwidth to keep them running at full speed for the largest problems. Trends show that the difference in the speed of disk hardware and the speed of processors is increasing, with I/O severely limiting the performance of otherwise fast machines. This widening access-time gap is known as the β€œI/O bottleneck crisis.” One solution to the crisis, suggested by many researchers, is to use many disks in parallel to increase the overall bandwidth.

    This dissertation studies some of the file system issues needed to get high performance from parallel disk systems, since parallel hardware alone cannot guarantee good performance. The target systems are large MIMD multiprocessors used for scientific applications, with large files spread over multiple disks attached in parallel. The focus is on automatic caching and prefetching techniques. We show that caching and prefetching can transparently provide the power of parallel disk hardware to both sequential and parallel applications using a conventional file system interface. We also propose a new file system interface (compatible with the conventional interface) that could make it easier to use parallel disks effectively.

    Our methodology is a mixture of implementation and simulation, using a software testbed that we built to run on a BBN GP1000 multiprocessor. The testbed simulates the disks and fully implements the caching and prefetching policies. Using a synthetic workload as input, we use the testbed in an extensive set of experiments. The results show that prefetching and caching improved the performance of parallel file systems, often dramatically.
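    To make the prefetching idea concrete, here is a minimal sketch of a shared block cache with sequential readahead over disks striped round-robin. The striping, cache size, and lookahead depth are illustrative assumptions rather than the dissertation's testbed policies, and a real implementation would issue the prefetches asynchronously.

        from collections import OrderedDict

        STRIPE_DISKS = 4        # blocks are striped round-robin across 4 disks (assumed)
        READAHEAD_DEPTH = 8     # blocks to prefetch after a sequential hit (assumed)
        CACHE_CAPACITY = 1024   # blocks held in the shared cache (assumed)

        class PrefetchingCache:
            def __init__(self, disks):
                self.disks = disks              # disks[i].read(block) -> bytes (hypothetical)
                self.cache = OrderedDict()      # block number -> data, kept in LRU order
                self.last_block = None

            def _fetch(self, block):
                if block not in self.cache:
                    disk = self.disks[block % STRIPE_DISKS]
                    self.cache[block] = disk.read(block)
                    if len(self.cache) > CACHE_CAPACITY:
                        self.cache.popitem(last=False)   # evict the least recently used block
                self.cache.move_to_end(block)
                return self.cache[block]

            def read(self, block):
                data = self._fetch(block)
                # One-block-lookahead sequentiality test: if the application is walking
                # the file in order, pull in the next few blocks from the parallel disks.
                if self.last_block is not None and block == self.last_block + 1:
                    for b in range(block + 1, block + 1 + READAHEAD_DEPTH):
                        self._fetch(b)          # a real system would issue these asynchronously
                self.last_block = block
                return data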

    Galley: A New Parallel File System for Parallel Applications

    Get PDF
    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Most multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access those multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated application and library programmers to use knowledge about their I/O to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support.

    In this work we examine current multiprocessor file systems, as well as how those file systems are used by scientific applications. Contrary to the expectations of the designers of current parallel file systems, the workloads on those systems are dominated by requests to read and write small pieces of data. Furthermore, rather than being accessed sequentially and contiguously, as in uniprocessor and supercomputer workloads, files in multiprocessor file systems are accessed in regular, structured, but non-contiguous patterns.

    Based on our observations of multiprocessor workloads, we have designed Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. In this work, we introduce Galley and discuss its design and implementation. We describe Galley's new three-dimensional file structure and discuss how that structure can be used by parallel applications to achieve higher performance. We introduce several new data-access interfaces, which allow applications to explicitly describe the regular access patterns we found to be common in parallel file system workloads. We show how these new interfaces allow parallel applications to achieve tremendous increases in I/O performance. Finally, we discuss how Galley's new file structure and data-access interfaces can be useful in practice.
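    The kind of regular, non-contiguous request that such interfaces describe can be illustrated with a simple strided read. This is a sketch of the idea only, not Galley's actual API: handing the whole pattern to the file system in one call lets it sort and batch the per-disk requests instead of servicing many tiny reads.

        from dataclasses import dataclass
        from typing import List

        @dataclass
        class StridedRequest:
            offset: int       # starting byte offset within the (sub)file
            record_size: int  # bytes to transfer at each stop
            stride: int       # distance in bytes between consecutive records
            count: int        # number of records to gather

        def read_strided(fileobj, req: StridedRequest) -> List[bytes]:
            """Gather `count` records of `record_size` bytes, `stride` bytes apart.

            This client-side loop only shows the semantics; a parallel file system
            receiving the whole pattern can schedule the underlying disk accesses.
            """
            records = []
            for i in range(req.count):
                fileobj.seek(req.offset + i * req.stride)
                records.append(fileobj.read(req.record_size))
            return records

        # Example: read the first column of a 1000 x 1000 matrix of 8-byte values
        # stored in row-major order, i.e. 1000 records of 8 bytes every 8000 bytes.
        # column = read_strided(f, StridedRequest(offset=0, record_size=8,
        #                                         stride=8000, count=1000))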

    Clusterfile: a parallel file system for clusters

    Get PDF

    Supercomputing Frontiers

    Get PDF
    This open access book constitutes the refereed proceedings of the 6th Asian Supercomputing Conference, SCFA 2020, which was planned to be held in February 2020, but unfortunately, the physical conference was cancelled due to the COVID-19 pandemic. The 8 full papers presented in this book were carefully reviewed and selected from 22 submissions. They cover a range of topics including file systems, memory hierarchy, HPC cloud platform, container image configuration workflow, large-scale applications, and scheduling

    A Fast Approach to Scale Up Disk Arrays With Parity Declustered Data Layout by Minimizing Data Migration

    No full text