174 research outputs found
Self-Learning Hot Data Prediction: Where Echo State Network Meets NAND Flash Memories
Β© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Well understanding the access behavior of hot data is significant for NAND flash memory due to its crucial impact on the efficiency of garbage collection (GC) and wear leveling (WL), which respectively dominate the performance and life span of SSD. Generally, both GC and WL rely greatly on the recognition accuracy of hot data identification (HDI). However, in this paper, the first time we propose a novel concept of hot data prediction (HDP), where the conventional HDI becomes unnecessary. First, we develop a hybrid optimized echo state network (HOESN), where sufficiently unbiased and continuously shrunk output weights are learnt by a sparse regression based on L2 and L1/2 regularization. Second, quantum-behaved particle swarm optimization (QPSO) is employed to compute reservoir parameters (i.e., global scaling factor, reservoir size, scaling coefficient and sparsity degree) for further improving prediction accuracy and reliability. Third, in the test on a chaotic benchmark (Rossler), the HOESN performs better than those of six recent state-of-the-art methods. Finally, simulation results about six typical metrics tested on five real disk workloads and on-chip experiment outcomes verified from an actual SSD prototype indicate that our HOESN-based HDP can reliably promote the access performance and endurance of NAND flash memories.Peer reviewe
Dynamic Virtual Page-based Flash Translation Layer with Novel Hot Data Identification and Adaptive Parallelism Management
Solid-state disks (SSDs) tend to replace traditional motor-driven hard disks in high-end storage devices in past few decades. However, various inherent features, such as out-of-place update [resorting to garbage collection (GC)] and limited endurance (resorting to wear leveling), need to be reduced to a large extent before that day comes. Both the GC and wear leveling fundamentally depend on hot data identification (HDI). In this paper, we propose a hot data-aware flash translation layer architecture based on a dynamic virtual page (DVPFTL) so as to improve the performance and lifetime of NAND flash devices. First, we develop a generalized dual layer HDI (DL-HDI) framework, which is composed of a cold data pre-classifier and a hot data post-identifier. Those can efficiently follow the frequency and recency of information access. Then, we design an adaptive parallelism manager (APM) to assign the clustered data chunks to distinct resident blocks in the SSD so as to prolong its endurance. Finally, the experimental results from our realized SSD prototype indicate that the DVPFTL scheme has reliably improved the parallelizability and endurance of NAND flash devices with improved GC-costs, compared with related works.Peer reviewe
Towards Design and Analysis For High-Performance and Reliable SSDs
NAND Flash-based Solid State Disks have many attractive technical merits, such as low power consumption, light weight, shock resistance, sustainability of hotter operation regimes, and extraordinarily high performance for random read access, which makes SSDs immensely popular and be widely employed in different types of environments including portable devices, personal computers, large data centers, and distributed data systems.
However, current SSDs still suffer from several critical inherent limitations, such as the inability of in-place-update, asymmetric read and write performance, slow garbage collection processes, limited endurance, and degraded write performance with the adoption of MLC and TLC techniques. To alleviate these limitations, we propose optimizations from both specific outside applications layer and SSDs\u27 internal layer. Since SSDs are good compromise between the performance and price, so SSDs are widely deployed as second layer caches sitting between DRAMs and hard disks to boost the system performance. Due to the special properties of SSDs such as the internal garbage collection processes and limited lifetime, traditional cache devices like DRAM and SRAM based optimizations might not work consistently for SSD-based cache. Therefore, for the outside applications layer, our work focus on integrating the special properties of SSDs into the optimizations of SSD caches. Moreover, our work also involves the alleviation of the increased Flash write latency and ECC complexity due to the adoption of MLC and TLC technologies by analyzing the real work workloads
Study On Endurance Of Flash Memory Ssds
Flash memory promises to revolutionize storage systems because of its massive performance gains, ruggedness, large decrease in power usage and physical space requirements, but it is not a direct replacement for magnetic hard disks. Flash memory possesses fundamentally different characteristics and in order to fully utilize the positive aspects of flash memory, we must engineer around its unique limitations. The primary limitations are lack of in-place updates, the asymmetry between the sizes of the write and erase operations, and the limited endurance of flash memory cells. This leads to the need for efficient methods for block cleaning, combating write amplification and performing wear leveling. These are fundamental attributes of flash memory and will always need to be understood and efficiently managed to produce an efficient and high performance storage system.
Our goal in this work is to provide analysis and algorithms for efficiently managing data storage for endurance in flash memory. We present update codes, a class of floating codes, which encodes data updates as flash memory cell increments that results in reduced block erases and longer lifespan of flash memory, and provides a new algorithm for constructing optimal floating codes. We also analyze the theoretically possible limits of write amplification reduction and minimization by using offline workloads. We give an estimation of the minimal write amplification by a workload decomposition algorithm and find that write amplification can be pushed to zero with relatively low over-provisioning. Additionally, we give simple, efficient and practical algorithms that are effective in reducing write amplification and performing wear leveling. Finally, we present a quantitative model of wear levels in flash memory by constructing a difference equation that gives erase counts of a block with workload, wear leveling strategy and SSD configuration as parameters
Elevating commodity storage with the SALSA host translation layer
To satisfy increasing storage demands in both capacity and performance,
industry has turned to multiple storage technologies, including Flash SSDs and
SMR disks. These devices employ a translation layer that conceals the
idiosyncrasies of their mediums and enables random access. Device translation
layers are, however, inherently constrained: resources on the drive are scarce,
they cannot be adapted to application requirements, and lack visibility across
multiple devices. As a result, performance and durability of many storage
devices is severely degraded.
In this paper, we present SALSA: a translation layer that executes on the
host and allows unmodified applications to better utilize commodity storage.
SALSA supports a wide range of single- and multi-device optimizations and,
because is implemented in software, can adapt to specific workloads. We
describe SALSA's design, and demonstrate its significant benefits using
microbenchmarks and case studies based on three applications: MySQL, the Swift
object store, and a video server.Comment: Presented at 2018 IEEE 26th International Symposium on Modeling,
Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS
κ³ μ±λ₯ μ»΄ν¨ν μμ€ν μμ λ²μ€νΈ λ²νΌλ₯Ό μν I/O λΆλ¦¬ κΈ°λ²μ μ€μ¦μ ꡬν
νμλ
Όλ¬Έ(μμ¬)--μμΈλνκ΅ λνμ :곡과λν μ»΄ν¨ν°κ³΅νλΆ,2019. 8. μνμ.To meet the exascale I/O requirements in the High-Performance Computing (HPC), a new I/O subsystem, named Burst Buffer, based on non-volatile memory, has been developed. However, the diverse HPC workloads and the bursty I/O pattern cause severe data fragmentation to SSDs, which creates the need for expensive garbage collection (GC) and also increase the number of bytes actually written to SSD. The new multi-stream feature in SSDs offers an option to reduce the cost of garbage collection. In this paper, we leverage this multi-stream feature to group the I/O streams based on the user IDs and implement this strategy in a burst buffer we call BIOS, short for Burst Buffer with an I/O Separation scheme. Furthermore, to optimize the I/O separation scheme in burst buffer environments, we propose a stream-aware scheduling policy based on burst buffer pools in workload manager and implement the real burst buffer system, BIOS framework, by integrating the BIOS with workload manager. We evaluate the BIOS and framework with a burst buffer I/O traces from Cori Supercomputer including a diverse set of applications. We also disclose and analyze the benefits and limitations of using I/O separation scheme in HPC systems. Experimental results show that the BIOS could improve the performance by 1.44Γ on average and reduce the Write Amplification Factor (WAF) by up to 1.20Γ, and prove that the framework can keep on the benefits of the I/O separation scheme in the HPC environment.Abstract
Introduction 1
Background and Challenges 5
Burst Buffer 5
Write Amplification in SSDs 6
Multi-streamed SSD 7
Challenges of Multi-stream Feature in Burst Buffers 7
I/O Separation Scheme in Burst Buffer 10
Stream Allocation Criteria 10
Implementation 12
Limitations of User ID-based Stream Allocation 14
BIOS Framework 15
Support in Workload Manager 15
Burst Buffer Pools 16
Stream-Aware Scheduling Policy 18
Workflow of BIOS Framework 20
Evaluation 21
Experiment Setup 21
Evaluation with Synthetic Workload 21
Evaluation with HPC Applications 25
Evaluation with Emulated Workload 27
Evaluation with Different Striping Configuration 29
Evaluation on BIOS Framework 30
Summary and Lessons Learned 33
An I/O Separation Scheme in Burst Buffer 33
Evaluation with Synthetic Workload 33
Evaluation with HPC Applications 33
Evaluation with Emulated Workload 34
Evaluation with Striping Configurations 34
A BIOS Framework 34
Evaluation with Real Burst Buffer Environments 34
Discussion 36
Limited Number of Nodes 36
Advanced BIOS Framework 37
Related work 38
Conclusions 40
Bibliography 42
μ΄λ‘ 48Maste
- β¦