Search CORE

56 research outputs found

Towards Design and Analysis For High-Performance and Reliable SSDs

Author: Xia Qianbin
Publication venue: VCU Scholars Compass
Publication date: 01/01/2017
Field of study

NAND Flash-based Solid State Disks have many attractive technical merits, such as low power consumption, light weight, shock resistance, sustainability of hotter operation regimes, and extraordinarily high performance for random read access, which makes SSDs immensely popular and be widely employed in different types of environments including portable devices, personal computers, large data centers, and distributed data systems. However, current SSDs still suffer from several critical inherent limitations, such as the inability of in-place-update, asymmetric read and write performance, slow garbage collection processes, limited endurance, and degraded write performance with the adoption of MLC and TLC techniques. To alleviate these limitations, we propose optimizations from both specific outside applications layer and SSDs\u27 internal layer. Since SSDs are good compromise between the performance and price, so SSDs are widely deployed as second layer caches sitting between DRAMs and hard disks to boost the system performance. Due to the special properties of SSDs such as the internal garbage collection processes and limited lifetime, traditional cache devices like DRAM and SRAM based optimizations might not work consistently for SSD-based cache. Therefore, for the outside applications layer, our work focus on integrating the special properties of SSDs into the optimizations of SSD caches. Moreover, our work also involves the alleviation of the increased Flash write latency and ECC complexity due to the adoption of MLC and TLC technologies by analyzing the real work workloads

VCU Scholars Compass

Recommended from our members

Data reliability and error correction for NAND Flash Memory System

Author: Xu Quan
Publication venue
Publication date
Field of study

NAND flash memory has been widely used for data storage due to its high density, high throughput, and low power. However, as the flash memory scales to smaller process technologies and stores more bits per cell, its reliability is decreasing. The error correction coding can be used to significantly improve the data reliability; nevertheless, the advanced ECCs such as low-density parity-check (LDPC) codes generally demand soft decisions while NAND flash memory channel provides hard-decisions only. Extracting the soft information requires the accurate characterization of flash memory channel and the effective design of coding schemes. To this end, we have presented a novel LDPC-TCM coding scheme for the Multilevel Cell (MLC) flash memories. The a posteriori TCM decoding algorithm is used in the scheme to generate soft information, which is fed to the LDPC decoder for further correction of data bits. It has been demonstrated that the proposed scheme can achieve higher error correction performance than the traditional hard-decisions based flash coding algorithms, and is feasible in the design practice. Further with the LDPC-TCM, we believe it is important to characterize the flash memory channel and investigate a method to calculate the soft decision for each bit, with the available channel outputs. We studied the various noises and interferences occurring in the memory channel and mathematically formulated the probability density function of the overall noise distribution. Based on the results we derived the final distribution for the cell threshold voltages, which can be used to instruct the calculation of soft decisions. The discoveries on the theoretical level have been demonstrated to be consistent with the real channel behaviours. The channel characterization and model provided in this dissertation can enable more design of soft-decisions based ECCs for future NAND flash memories. The data pattern processing algorithm deals with the write patterns and targets to lower the proportion of patterns that would introduce data errors. On the other hand, the voltages applied to the memory cells charges the MOSFET capacitances frequently on programming these data patterns, leading to the power problem. The high energy consumption and current spikes also cause reliability issue to the data stored in the flash memory. This dissertation proposes a write pattern formatting algorithm (WPFA) attempting to solve the two problems together. We have designed and implemented the algorithm and evaluated its performance through both the software simulations and hardware synthesis

City Research Online

Improving Reliability and Performance of NAND Flash Based Storage System

Author: Guo Jie
Publication venue
Publication date: 15/06/2016
Field of study

High seek and rotation overhead of magnetic hard disk drive (HDD) motivates development of storage devices, which can offer good random performance. As an alternative technology, NAND flash memory demonstrates low power consumption, microsecond-order access latency and good scalability. Thanks to these advantages, NAND flash based solid state disks (SSD) show many promising applications in enterprise servers. With multi-level cell (MLC) technique, the per-bit fabrication cost is reduced and low production cost enables NAND flash memory to extend its application to the consumer electronics. Despite these advantages, limited memory endurance, long data protection latency and write amplification continue to be the major challenges in the designs of NAND flash storage systems. The limited memory endurance and long data protection latency issue derive from memory bit errors. High bit error rate (BER) severely impairs data integrity and reduces memory durance. The limited endurance is a major obstacle to apply NAND flash memory to the application with high reliability requirement. To protect data integrity, hard-decision error correction codes (ECC) such as Bose-Chaudhuri-Hocquenghem (BCH) are employed. However, the hardware cost becomes prohibitively with the increase of BER when the BCH ECC is employed to extend system lifetime. To extend system lifespan without high hardware cost, we has proposed data pattern aware (DPA) error prevention system design. DPA realizes BER reduction by minimizing the occurrence of data patterns vulnerable to high BER with simple linear feedback shift register circuits. Experimental results show that DPA can increase the system lifetime by up to 4× with marginal hardware cost. With the technology node scaling down to 2Xnm, BER increases up to 0.01. Hard-decision ECCs and DPA are no longer applicable to guarantee data integrity due to either prohibitively high hardware cost or high storage overhead. Soft-decision ECC, such as lowdensity parity check (LDPC) code, has been introduced to provide more powerful error correction capability. However, LDPC code demands extra memory sensing operations, directly leading to long read latency. To reduce LDPC code induced read latency without adverse impact on system reliability, we has proposed FlexLevel NAND flash storage system design. The FlexLevel design reduces BER by broadening the noise margin via threshold voltage (Vth) level reduction. Under relatively low BER, no extra sensing level is required and therefore read performance can be improved. To balance Vth level reduction induced capacity loss and the read speedup, the FlexLevel design identifies the data with high LDPC overhead and only performs Vth reduction to these data. Experimental results show that compared with the best existing works, the proposed design achieves up to 11% read speedup with negligible capacity loss. Write amplification is a major cause to performance and endurance degradation of the NAND flash based storage system. In the object-based NAND flash device (ONFD), write amplification partially results from onode partial update and cascading update. Onode partial update only over-writes partial data of a NAND flash page and incurs unnecessary data migration of the un-updated data. Cascading update is update to object metadata in a cascading manner due to object data update or migration. Even through only several bytes in the object metadata are updated, one or more page has to be re-written, significantly degrading write performance. To minimize write operations incurred by onode partial update and cascading update, we has proposed a Data Migration Minimizing (DMM) device design. The DMM device incorporates 1) the multi-level garbage collection technique to minimize the unnecessary data migration of onode partial update and 2) the virtual B+ tree and diff cache to reduce the write operations incurred by cascading update. The experiment results demonstrate that the DMM device can offer up to 20% write reduction compared with the best state-of-art works

D-Scholarship@Pitt

FLARES: an aging aware algorithm to autonomously adapt the error correction capability in NAND Flash memories

Author: Bertozzi D.
Di Carlo Stefano
Galfano Salvatore
Indaco Marco
Olivo P.
Prinetto Paolo Ernesto
Zambelli C.
Publication venue: ACM
Publication date: 01/01/2014
Field of study

With the advent of solid-state storage systems, NAND flash memories are becoming a key storage technology. However, they suffer from serious reliability and endurance issues during the operating lifetime that can be handled by the use of appropriate error correction codes (ECC) in order to reconstruct the information when needed.. Adaptable ECCs may provide the flexibility to avoid worst-case reliability design thus leading to improved performance. However, a way to control such adaptable ECCs strength is required. This paper proposes FLARES, an algorithm able to adapt the ECC correction capability of each page of a flash based on a flash RBER prediction model and on a measurement of the number of errors detected in a given time window. FLARES has been fully implemented within the YAFFS 2 filesystem under the Linux operating system. This allowed us to perform an extensive set of simulations on a set of standard benchmarks that highlighted the benefit of FLARES on the overall storage subsystem performance

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Archivio istituzionale della ricerca - Università di Ferrara

PORTO Publications Open Repository TOrino

Flash Memory Devices

Author
Publication venue: 'MDPI AG'
Publication date: 21/03/2022
Field of study

Flash memory devices have represented a breakthrough in storage since their inception in the mid-1980s, and innovation is still ongoing. The peculiarity of such technology is an inherent flexibility in terms of performance and integration density according to the architecture devised for integration. The NOR Flash technology is still the workhorse of many code storage applications in the embedded world, ranging from microcontrollers for automotive environment to IoT smart devices. Their usage is also forecasted to be fundamental in emerging AI edge scenario. On the contrary, when massive data storage is required, NAND Flash memories are necessary to have in a system. You can find NAND Flash in USB sticks, cards, but most of all in Solid-State Drives (SSDs). Since SSDs are extremely demanding in terms of storage capacity, they fueled a new wave of innovation, namely the 3D architecture. Today “3D” means that multiple layers of memory cells are manufactured within the same piece of silicon, easily reaching a terabit capacity. So far, Flash architectures have always been based on "floating gate," where the information is stored by injecting electrons in a piece of polysilicon surrounded by oxide. On the contrary, emerging concepts are based on "charge trap" cells. In summary, flash memory devices represent the largest landscape of storage devices, and we expect more advancements in the coming years. This will require a lot of innovation in process technology, materials, circuit design, flash management algorithms, Error Correction Code and, finally, system co-design for new applications such as AI and security enforcement

Directory of Open Access Books (DOAB)

낸드플래시 메모리 오류정정을 위한 고성능 LDPC 복호방법 연구

Author: 김종홍
Publication venue: 서울대학교 대학원
Publication date: 01/08/2013
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2013. 8. 성원용.반도체 공정의 미세화에 따라 비트 에러율이 증가하는 낸드 플래시 메모리에서 고성능 에러 정정 방법은 필수적이다. Low-density parity-check (LDPC) 부호와 같은 연판정 에러 정정 부호는 뛰어난 에러 정정 성능을 보이지만, 높은 구현 복잡도로 인해 플래시 메모리 시스템에 적용되기 힘든 단점이 있다. 본 논문에서는 LDPC 부호의 효율적인 복호를 위해 고성능 메시지 전파 스케줄링 방법과 저 복잡도 복호 알고리즘을 제안한다. 특히 finite geometry (FG) LDPC 부호에 대한 효율적인 디코더 아키텍쳐를 제안하며, 구현된 디코더를 이용하여 낸드 플래시 메모리에 대해 연판정 복호시의 에너지 소모량에 대해 연구한다. 본 논문의 첫 번째 부분에서는 동적 스케줄링 (informed dynamic scheduling, IDS) 알고리즘의 성능향상 방법에 대해 연구한다. 이를 위해 우선 기존의 가장 빠른 수렴 속도를 보이는 IDS 알고리즘인 레지듀얼 신뢰 전파 (residual belief propagation, RBP) 알고리즘의 동작 특성을 분석하고, 이를 바탕으로 특정 노드에 메시지 갱신이 집중되는 것을 방지하여 RBP 알고리즘의 수렴속도를 증가시킨 improved RBP (iRBP) 알고리즘을 제안한다. 또한 iRBP의 뛰어난 수렴속도와 기존의 NS 알고리즘의 우수한 에러 정정 능력을 모두 갖춘 신드롬 기반의 혼합 스케줄링 (mixed scheduling) 방법을 제안한다. 끝으로 다양한 부호율의 LDPC 부호에 대한 모의실험을 통해 제안된 신드롬 기반의 혼합 스케줄링 방법이 본 논문에서 시험된 다른 모든 스케줄링 알고리즘의 성능을 능가함을 확인하였다. 논문의 두 번째 부분에서는 복호 실패시 많은 비트 에러를 발생시키는 a posteriori probability (APP) 알고리즘의 개선 방안에 방안을 제안한다. 또한 빠른 수렴속도와 우수한 에러 마루 (error-floor) 성능으로 데이터 저장장치에 적합한 FG-LDPC 부호에 대해 제안된 알고리즘이 적용된 하드웨어 아키텍처를 제안하였다. 제안된 아키텍처는 높은 노드 가중치를 가지는 FG-LDPC 부호에 적합하도록 쉬프트 레지스터 (shift registers)와 SRAM 기반의 혼합 구조를 채용하며, 높은 처리량을 얻기 위해 파이프라인된 병렬 아키텍처를 사용한다. 또한 메모리 사용량을 줄이기 위해 세 가지의 메모리 용량 감소 기법을 적용하며, 전력 소비를 줄이기 위해 두 가지의 저전력 기법을 제안한다. 본 제안된 아키텍처는 부호율 0.96의 (68254, 65536) Euclidean geometry LDPC 부호에 대해 0.13-um CMOS 공정에서 구현하였다. 마지막으로 본 논문에서는 연판정 복호가 적용된 낸드 플래시 메모리 시스템의 에너지 소모를 낮추는 방법에 대해 제안한다. 연판정 기반의 에러 정정 알고리즘은 높은 성능을 보이지만, 이는 플래시 메모리의 센싱 수와 에너지 소모를 증가 시키는 단점이 있다. 본 연구에서는 앞서 구현된 LDPC 디코더가 채용된 낸드 플래시 메모리 시스템의 에너지 소모를 분석하고, LDPC 디코더와 BCH 디코더 간의 칩 사이즈와 에너지 소모량을 비교하였다. 이와 더불어 본 논문에서는 LDPC 디코더를 이용한 센싱 정밀도 결정 방법을 제안한다. 본 연구를 통해 제안된 복호 및 스케줄링 알고리즘, VLSI 아키텍쳐, 그리고 읽기 정밀도 결정 방법을 통해 낸드 플래시 메모리 시스템의 에러 정정 성능을 극대화 하고 에너지 소모를 최소화 할 수 있다.High-performance error correction for NAND flash memory is greatly needed because the raw bit error rate increases as the semiconductor geometry shrinks for high density. Soft-decision error correction, such as low-density parity-check (LDPC) codes, offers high performance but their implementation complexity hinders wide adoption to consumer products. This dissertation proposes two high-performance message-passing schedules and a low-complexity decoding algorithm for LDPC codes. In particular, an efficient decoder architecture for finite geometry (FG) LDPC codes is proposed, and the energy consumption of soft-decision decoding for NAND flash memory is analyzed. The first part of this dissertation is devoted to improving the informed dynamic scheduling (IDS) algorithms. We analyze the behavior of the residual belief propagation (RBP), which is the fastest IDS algorithm, and develop an improved RBP (iRBP) by avoiding the concentration of message updates at a particular node. We also study the syndrome-based mixed scheduling of the iRBP and the node-wise scheduling (NS). The proposed mixed scheduling outperforms all other scheduling methods tested in this work. The next part of this dissertation is to develop a conditional variable node update scheme for the a posteriori probability (APP) algorithm. The developed algorithm is robust to decoding failures and can reduce the dynamic power consumption by lowering switching activities in the LDPC decoder. To implement the developed algorithm, we propose a memory-efficient pipelined parallel architecture for LDPC decoding. The architecture employs FG-LDPC codes that not only show fast convergence speed and good error-floor performance but also perform well with iterative decoding algorithms, which is especially suitable for data storage devices. We also developed a rate-0.96 (68254, 65536) Euclidean geometry LDPC code and implemented the proposed architecture in 0.13-um CMOS technology. This dissertation also covers low-energy error correction of NAND flash memory through soft-decision decoding. The soft-decision-based error correction algorithms show high performance, but they demand an increased number of flash memory sensing operations and consume more energy for memory access. We examine the energy consumption of a NAND flash memory system equipping an LDPC code-based soft-decision error correction circuit. The sum of energy consumed at NAND flash memory and the LDPC decoder is minimized. In addition, the chip size and energy consumption of the decoder were compared with those of two Bose-Chaudhuri-Hocquenghem (BCH) decoding circuits showing the comparable error performance and the throughput. We also propose an LDPC decoder-assisted precision selection method that needs virtually no overhead. This dissertation is intended to develop high-performance and low-power error correction circuits for NAND flash memory by studying improved decoding and scheduling algorithms, VLSI architecture, and a read precision selection method.1 Introduction 1 1.1 NAND Flash Memory 1 1.2 LDPC Codes 4 1.3 Outline of the Dissertation 6 2 LDPC Decoding and Scheduling Algorithms 8 2.1 Introduction 8 2.2 Decoding Algorithms for LDPC Codes 10 2.2.1 Belief Propagation Algorithm 10 2.2.2 Simplified Belief Propagation Algorithms 12 2.3 Message-Passing Schedules for Decoding of LDPC Codes 15 2.3.1 Static Schedules 15 2.3.2 Dynamic Schedules 17 3 Improved Dynamic Scheduling Algorithms for Decoding of LDPC Codes 22 3.1 Introduction 22 3.2 Improved Residual Belief Propagation Algorithm 23 3.3 Syndrome-Based Mixed Scheduling of iRBP and NS 26 3.4 Complexity Analysis and Simulation Results 28 3.4.1 Complexity Analysis 28 3.4.2 Simulation Results 29 3.5 Concluding Remarks 33 4 A Pipelined Parallel Architecture for Decoding of Finite-Geometry LDPC Codes 36 4.1 Introduction 36 4.2 Finite-Geometry LDPC Codes and Conditional Variable Node Update Algorithm 38 4.2.1 Finite-Geometry LDPC codes 38 4.2.2 Conditional Variable Node Update Algorithm for Fixed-Point Normalized APP-Based Algorithm 40 4.3 Decoder Architecture 46 4.3.1 Baseline Sequential Architecture 46 4.3.2 Pipelined-Parallel Architecture 54 4.3.3 Memory Capacity Reduction 57 4.4 Implementation Results 60 4.5 Concluding Remarks 64 5 Low-Energy Error Correction of NAND Flash Memory through Soft-Decision Decoding 66 5.1 Introduction 66 5.2 Energy Consumption of Read Operations in NAND Flash Memory 67 5.2.1 Voltage Sensing Scheme for Soft-Decision Data Output 67 5.2.2 LSB and MSB Concurrent Access Scheme for Low-Energy Soft-Decision Data Output 72 5.2.3 Energy Consumption of Read Operations in NAND Flash Memory 73 5.3 The Performance of Soft-Decision Error Correction over a NAND Flash Memory Channel 76 5.4 Hardware Performance of the (68254, 65536) LDPC Decoder 81 5.4.1 Energy Consumption of the LDPC Decoder 81 5.4.2 Performance Comparison of the LDPC Decoder and Two BCH Decoders 83 5.5 Low-Energy Error Correction Scheme for NAND Flash Memory 87 5.5.1 Optimum Precision for Low-Energy Decoding 87 5.5.2 Iteration Count-Based Precision Selection 90 5.6 Concluding Remarks 91 6 Conclusion 94 Bibliography 96 Abstract in Korean 110 감사의 글 112Docto

SNU Open Repository and Archive

A Scalable Flash-Based Hardware Architecture for the Hierarchical Temporal Memory Spatial Pooler

Author: Streat Lennard G
Publication venue: RIT Scholar Works
Publication date: 01/05/2016
Field of study

Hierarchical temporal memory (HTM) is a biomimetic machine learning algorithm focused upon modeling the structural and algorithmic properties of the neocortex. It is comprised of two components, realizing pattern recognition of spatial and temporal data, respectively. HTM research has gained momentum in recent years, leading to both hardware and software exploration of its algorithmic formulation. Previous work on HTM has centered on addressing performance concerns; however, the memory-bound operation of HTM presents significant challenges to scalability. In this work, a scalable flash-based storage processor unit, Flash-HTM (FHTM), is presented along with a detailed analysis of its potential scalability. FHTM leverages SSD flash technology to implement the HTM cortical learning algorithm spatial pooler. The ability for FHTM to scale with increasing model complexity is addressed with respect to design footprint, memory organization, and power efficiency. Additionally, a mathematical model of the hardware is evaluated against the MNIST dataset, yielding 91.98% classification accuracy. A fully custom layout is developed to validate the design in a TSMC 180nm process. The area and power footprints of the spatial pooler are 30.538mm2 and 5.171mW, respectively. Storage processor units have the potential to be viable platforms to support implementations of HTM at scale

RIT Scholar Works