31 research outputs found
Concatenated Coding for Multilevel Flash Memory with Low-Multiplicity Error Correction in the Outer Stage
One approach to organizing error-correcting coding for multilevel flash memory is based on a concatenated construction, in particular on multidimensional lattices for the inner code. A characteristic feature of such constructions is that the outer decoder dominates the total decoder complexity. A concatenated construction with a low-complexity outer decoder is therefore attractive, since in practical applications decoder complexity is the crucial limitation on the use of error-correction coding.
We consider a concatenated coding scheme for multilevel flash memory with Barnes-Wall lattice-based codes as the inner code and a Reed-Solomon code correcting up to 4…5 errors as the outer one.
Performance analysis is carried out for a model capturing the basic physical features of a flash-memory cell: non-uniform target voltage levels and a noise variance that depends on the recorded value (input-dependent additive Gaussian noise, ID-AGN). For this model we develop a modification of our earlier approach to evaluating the error probability of the inner code. The modification exploits the parallel structure of the inner code's trellis, which significantly reduces the computational complexity of the performance estimation. We present numerical examples of the achievable recording density when the outer Reed-Solomon code is restricted to correcting at most four errors, over a wide range of retention times and numbers of write/read cycles.
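The ID-AGN channel model named in the abstract above (non-uniform target voltage levels, noise variance depending on the recorded value) can be sketched in a few lines. The voltage levels and standard deviations below are illustrative assumptions, not the paper's fitted parameters, and plain nearest-level detection is used rather than a full ML rule:

```python
import numpy as np

# Illustrative 4-level (MLC) cell: non-uniform target voltages and
# level-dependent noise standard deviations (assumed values).
mu = np.array([0.0, 1.0, 2.2, 3.6])
sigma = np.array([0.30, 0.22, 0.18, 0.15])

rng = np.random.default_rng(0)
n = 100_000
tx = rng.integers(0, 4, n)                          # recorded levels
rx = mu[tx] + sigma[tx] * rng.standard_normal(n)    # input-dependent Gaussian noise

# Read-back by minimum distance: pick the nearest target level.
det = np.abs(rx[:, None] - mu[None, :]).argmin(axis=1)
raw_ser = np.mean(det != tx)
print(f"raw symbol error rate: {raw_ser:.4f}")
```

This raw symbol error rate is what the inner lattice code must then drive down before the outer Reed-Solomon stage.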
Variable-Rate FEC Decoder VLSI Architecture for 400G Rate-Adaptive Optical Communication
Optical communication systems rely on forward error correction (FEC) to decrease the error rate of the received data. Since the properties of the optical channel vary over time, a variable FEC coding gain is useful: if the channel conditions are benign, lower code overhead can be used, effectively increasing the code rate. We introduce a variable-rate FEC decoder architecture that can operate in several different modes, where each mode is linked to a code rate and a number of decoding iterations. We demonstrate a decoder implementation that provides a net coding gain range of 9.96–10.38 dB at a post-FEC bit-error rate of 10^-15. For this range, a decoder implemented in a 28-nm process technology offers throughputs in excess of 400 Gbps, decoding latencies below 53 ns, and a power dissipation of less than 0.95 W (1.3 pJ/information bit).
Variable-Rate VLSI Architecture for 400-Gb/s Hard-Decision Product Decoder
Variable-rate transceivers, which adapt to channel conditions, will be central to energy-efficient communication. However, the high bit-rate requirements of fiber-optic communication systems make the design of flexible transceivers challenging, since the additional circuits needed to orchestrate the flexibility increase area and degrade speed. We propose a variable-rate VLSI architecture for a forward error correction (FEC) decoder based on hard-decision product codes. Variable shortening of the component codes provides a mechanism by which the code rate can be varied, the number of iterations offers a knob to control the coding gain, and a key-equation solver module that can swap between error-locator polynomial coefficients provides a means to change the error-correction capability. Our evaluations based on 28-nm netlists show that a variable-rate decoder implementation can offer a net coding gain (NCG) range of 9.96–10.38 dB at a post-FEC bit-error rate of 10^-15. The decoder achieves throughputs in excess of 400 Gb/s, latencies below 53 ns, and energy efficiencies of 1.14 pJ/bit or less. While the area of the variable-rate decoder is 31% larger than that of a fixed-rate decoder, its power dissipation is a mere 5% higher. The variable error-correction capability increases the NCG range further, to above 10.5 dB, but at a significant area cost.
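The rate knob described above is simple arithmetic: shortening both (n, k) component codes of a product code by s symbols changes its rate to ((k−s)/(n−s))². The (256, 239) component code below is a hypothetical example, not taken from the paper:

```python
def product_code_rate(n: int, k: int, s: int) -> float:
    """Rate of a product code whose two component codes are both
    (n, k) codes shortened by s symbols."""
    return ((k - s) / (n - s)) ** 2

# Shortening lowers the rate (raises overhead), since k/n < 1.
n, k = 256, 239  # hypothetical extended-BCH component code
for s in (0, 16, 32, 64):
    r = product_code_rate(n, k, s)
    overhead = 1 / r - 1
    print(f"s={s:3d}  rate={r:.4f}  overhead={overhead:5.1%}")
```

Because (k−s)/(n−s) strictly decreases as s grows (for k < n), sweeping s gives a monotone family of rate/overhead operating points for the decoder modes.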
Applications of Sparse Codes: Batched Zigzag Fountain Codes and WOM Codes
Ph.D. dissertation, Department of Electrical and Computer Engineering, Seoul National University, February 2017 (advisor: Jong-Seon No).
This dissertation contains the following two contributions on the applications of sparse codes.
Fountain codes
– Batched zigzag (BZ) fountain codes
– Two-phase batched zigzag (TBZ) fountain codes
Write-once memory (WOM) codes
– WOM codes implemented by rate-compatible low-density generator matrix (RC-LDGM) codes
First, two classes of fountain codes, called batched zigzag (BZ) fountain codes and two-phase batched zigzag (TBZ) fountain codes, are proposed for the symbol erasure channel. At the cost of slightly lengthened code symbols, the message symbols involved in each batch of the proposed codes can be recovered by a low-complexity zigzag decoding algorithm, so the proposed codes have low buffer occupancy during the decoding process. These features suit receivers with limited hardware resources in a broadcasting channel. A method to obtain the degree distribution of code symbols via ripple-size evolution is also proposed, taking into account the code symbols released from the batches. The proposed codes are shown to outperform Luby transform (LT) codes and zigzag-decodable fountain codes in intermediate recovery rate and coding overhead when the message length is short, the symbol erasure rate is low, and the available buffer size is limited.
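The zigzag decoding the abstract relies on can be seen in a minimal two-symbol example: XOR-combine one message vector with a bit-shifted copy of another, then peel bits back and forth between the two code symbols. This is an own toy construction illustrating the principle, not the BZ batch structure itself; it shows why decoding needs only XORs and no Gaussian elimination:

```python
def zz_encode(m1, m2):
    """Two code symbols: s1 = m1 XOR m2, s2 = m1 XOR (m2 shifted by one bit)."""
    k = len(m1)
    s1 = [m1[i] ^ m2[i] for i in range(k)]
    # one extra bit so the shifted copy of m2 fits
    s2 = [(m1[i] if i < k else 0) ^ (m2[i - 1] if i >= 1 else 0)
          for i in range(k + 1)]
    return s1, s2

def zz_decode(s1, s2):
    """Zigzag decoding: s2 yields the next m1 bit, s1 then yields the next m2 bit."""
    k = len(s1)
    m1, m2 = [0] * k, [0] * k
    for i in range(k):
        m1[i] = s2[i] ^ (m2[i - 1] if i >= 1 else 0)  # shifted m2 known up to i-1
        m2[i] = s1[i] ^ m1[i]
    return m1, m2

m1, m2 = [1, 0, 1, 1], [0, 1, 1, 0]
assert zz_decode(*zz_encode(m1, m2)) == (m1, m2)
```

The "slightly lengthened code symbols" of the abstract correspond to the one extra bit in s2 here; larger shifts and larger batches lengthen the symbols a little more in exchange for the same peeling-style decoder.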
In the second part of this dissertation, WOM codes constructed from sparse codes are presented. WOM codes have recently been adopted in NAND flash-based solid-state drives (SSDs) to extend their lifetime by reducing the number of erase operations. Here, a new rewriting scheme for the SSD is proposed, implemented with multiple binary erasure quantization (BEQ) codes; the BEQ codes are constructed from RC-LDGM codes. Moreover, combining the RC-LDGM codes with a page selection method improves the writing efficiency. It is verified via simulation that an SSD with the proposed rewriting scheme outperforms SSDs both without WOM codes and with conventional WOM codes, for single-level cell (SLC) and multi-level cell (MLC) flash memories.

1 Introduction
1.1 Background
1.2 Overview of Dissertation
2 Sparse Codes
2.1 Linear Block Codes
2.2 LDPC Codes
2.3 Message Passing Decoder
3 New Fountain Codes with Improved Intermediate Recovery Based on Batched Zigzag Coding
3.1 Preliminaries
3.1.1 Definitions and Notation
3.1.2 LT Codes
3.1.3 Zigzag Decodable Codes
3.1.4 Bit-Level Overhead
3.2 New Fountain Codes Based on Batched Zigzag Coding
3.2.1 Construction of Shift Matrix
3.2.2 Encoding and Decoding of the Proposed BZ Fountain Codes
3.2.3 Storage and Computational Complexity
3.3 Degree Distribution of BZ Fountain Codes
3.3.1 Relation Between and
3.3.2 Derivation of via Ripple Size Evolution
3.4 Two-Phase Batched Zigzag Fountain Codes with Additional Memory
3.4.1 Code Construction
3.4.2 Bit-Level Overhead
3.5 Numerical Analysis
4 Write-Once Memory Codes Using Rate-Compatible LDGM Codes
4.1 Preliminaries
4.1.1 NAND Flash Memory
4.1.2 Rewriting Schemes for Flash Memory
4.1.3 Construction of Rewriting Codes by BEQ Codes
4.2 Proposed Rewriting Codes
4.2.1 System Model
4.2.2 Multi-rate Rewriting Codes
4.2.3 Page Selection for Rewriting
4.3 RC-LDGM Codes
4.4 Numerical Analysis
5 Conclusions
Bibliography
Abstract (in Korean)
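The write-once constraint behind the WOM codes summarized above (cell charge can only be raised between block erasures) is easiest to see in the classic Rivest-Shamir code, which stores 2 bits in 3 single-level cells and supports two writes before an erase. This is a textbook illustration, not the dissertation's RC-LDGM/BEQ construction:

```python
# First-write codebook: 2-bit value -> 3 cells (at most one cell set).
FIRST = {0b00: (0, 0, 0), 0b01: (1, 0, 0), 0b10: (0, 1, 0), 0b11: (0, 0, 1)}
# Second-write codebook: bitwise complements of the first.
SECOND = {v: tuple(1 - b for b in c) for v, c in FIRST.items()}

def wom_decode(cells):
    # Cell weight tells us which generation the value was written in.
    table = FIRST if sum(cells) <= 1 else SECOND
    return next(v for v, c in table.items() if c == cells)

def wom_write(cells, value):
    """Write a 2-bit value; cells may only change 0 -> 1 (no erase)."""
    if wom_decode(cells) == value:
        return cells
    new = FIRST[value] if sum(cells) == 0 else SECOND[value]
    assert all(n >= c for n, c in zip(new, cells)), "would need an erase"
    return new

state = (0, 0, 0)
state = wom_write(state, 0b10)   # first write
state = wom_write(state, 0b11)   # second write, still no erase needed
assert wom_decode(state) == 0b11
```

Two 2-bit writes into 3 cells beats the naive 4 cells, which is exactly the lifetime gain the SSD rewriting scheme scales up with sparse codes.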
A Study on Low-Complexity Block Turbo Code Decoding for Soft-Decision Error Correction
Ph.D. dissertation, Department of Electrical and Computer Engineering, Seoul National University, August 2016 (advisor: Wonyong Sung).
As the throughput needed for communication systems and storage devices increases, high-performance forward error correction (FEC), especially soft-decision (SD) based techniques, becomes essential. In particular, block turbo codes (BTCs) and low-density parity-check (LDPC) codes are considered candidate FEC codes for next-generation systems, such as beyond-100-Gbps optical networks and under-20-nm NAND flash memory devices, which require capacity-approaching performance and a very low error floor. BTCs have definite strengths in diversity and encoding complexity because they generally employ a two-dimensional structure, which enables sub-frame-level decoding of the row or column code-words. This sub-frame-level decoding is a strong advantage for parallel processing: BTC decoding throughput can be improved by applying a low-complexity algorithm to the sub-frame decoding or by running multiple sub-frame decoding modules simultaneously. In this dissertation, we develop high-throughput BTC decoding software that pursues these advantages.
The first part of this dissertation is devoted to finding efficient test patterns for the Chase-Pyndiah algorithm. Although the complexity of this algorithm increases linearly with the number of test patterns, the algorithm naively considers all possible patterns over the least reliable positions, so considering one more position nearly doubles the complexity. To address this, we first introduce a new position-selection criterion that excludes positions whose reliability is already sufficiently large, which greatly reduces the complexity. Second, we propose a pattern-selection scheme that considers the error coverage. We define an error-coverage factor that represents a pattern's influence on the error-correcting performance and compute it by analyzing error events; based on the computed factor, we select the patterns with a greedy algorithm. With these methods, we can flexibly balance complexity against performance.
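The baseline that the selection criteria above prune can be sketched directly: take the p least reliable positions (LRPs) of the hard decision and enumerate all 2^p bit-flip combinations. A plain-Python sketch of standard Chase test-pattern generation, not the dissertation's pruned variant:

```python
def chase_test_patterns(soft, p):
    """All 2**p hard-decision candidates obtained by flipping bits on the
    p least reliable positions (smallest |soft value|)."""
    hard = [1 if x < 0 else 0 for x in soft]
    lrps = sorted(range(len(soft)), key=lambda i: abs(soft[i]))[:p]
    patterns = []
    for mask in range(1 << p):
        cand = hard[:]
        for j, pos in enumerate(lrps):
            if (mask >> j) & 1:
                cand[pos] ^= 1        # flip this least-reliable bit
        patterns.append(cand)
    return patterns

pats = chase_test_patterns([0.9, -0.1, 0.4, -0.8, 0.05], p=2)
assert len(pats) == 4                 # 2**p candidates
```

The exponential factor 2^p is exactly why excluding even one sufficiently reliable position from the LRP set halves the work.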
The second part of this dissertation develops low-complexity soft-output processing methods needed for BTC decoding. In the Chase-Pyndiah algorithm, the soft output is updated in two different ways, depending on whether competing code-words exist at the positions being updated. If competing code-words exist, the Euclidean distance between the soft-input signal and the code-words generated from the test patterns is used; however, this distance computation is very costly and its cost increases linearly with the sub-frame length. We identify computationally redundant positions and optimize the computation by ignoring them. If no competing code-word exists, a reliability factor that must be pre-determined by an extensive search is required. To avoid this search, we propose adaptive determination methods, which provide even better error-correcting performance. In addition, we investigate Pyndiah's soft-output computation and identify drawbacks that appear during its approximation. To remove them, for positions expected to be seriously damaged by the approximation we replace the distance-based update with the much simpler reliability-factor-based one, even when competing code-words exist.
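The two-branch update described above can be written compactly: with a competing code-word c differing from the decision d at position j, the new soft value is (|r−c|² − |r−d|²)/4 · d_j; without one, a pre-set reliability factor β stands in. A sketch of the standard Chase-Pyndiah rule with BPSK ±1 code-words assumed, not the optimized version proposed in the dissertation:

```python
def pyndiah_soft_output(r, d, c, j, beta=0.5):
    """Soft output for position j.
    r: soft-input vector; d: decided code-word in {-1,+1};
    c: best competing code-word differing at j, or None; beta: reliability factor."""
    if c is None:
        return beta * d[j]           # no competitor: fall back on the factor
    m_d = sum((ri - di) ** 2 for ri, di in zip(r, d))   # |r - d|^2
    m_c = sum((ri - ci) ** 2 for ri, ci in zip(r, c))   # |r - c|^2
    return (m_c - m_d) / 4.0 * d[j]

r = [0.9, -0.2, 0.8]
d = [1, -1, 1]
c = [1, 1, 1]                        # competitor differing at position 1
w = pyndiah_soft_output(r, d, c, j=1)
# extrinsic information for the next half-iteration is w - r[1]
```

Since the full-metric branch reruns the two squared-distance sums for every updated position, its cost grows linearly with the code-word length, which is what motivates skipping redundant positions and widening the β-based branch.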
This dissertation also develops a graphics processing unit (GPU) based BTC decoding program. To hide the latency of arithmetic and memory-access operations, the software uses a kernel structure that processes multiple BTC-words and allocates multiple sub-frames to each thread-block. Global-memory access optimization and data compression, which reduces the required shared-memory space, are also employed. For efficient mapping of the Chase-Pyndiah algorithm onto GPUs, we propose parallel processing schemes employing efficient reduction algorithms and provide step-by-step parallel algorithms for the algebraic decoding.
The last part of this dissertation summarizes the developed decoding method and compares it with decoding of the LDPC convolutional code (LDPC-CC), currently reported as the most powerful candidate for 100-Gbps optical networks. We first examine the complexity reduction and the error-rate performance improvement of the developed method, then analyze the complexity of LDPC-CC decoding and compare it with the developed BTC decoding for 20% overhead codes.
This dissertation develops high-throughput SD decoding software by introducing complexity-reduction techniques for the Chase-Pyndiah algorithm and efficient parallel-processing methods, and demonstrates the competitiveness of BTCs. The proposed decoding methods and parallel algorithms, verified on GPU-based systems, are also applicable to hardware-based decoders; hardware implementations employing the methods developed here can obtain significant improvements in throughput and energy efficiency. Moreover, thanks to the wide rate coverage of BTCs, the developed techniques can be applied to many high-throughput error-correction applications, such as next-generation optical networks and storage systems.

Chapter 1 Introduction
1.1 Turbo Codes
1.2 Applications of Turbo Codes
1.3 Outline of the Dissertation
Chapter 2 Encoding and Iterative Decoding of Block Turbo Codes
2.1 Introduction
2.2 Encoding Procedure of Shortened-Extended BTCs
2.3 Scheduling Methods for Iterative Decoding
2.3.1 Serial Scheduling
2.3.2 Parallel Scheduling
2.3.3 Replica Scheduling
2.4 Elementary Decoding with Chase-Pyndiah Algorithm
2.4.1 Chase-Pyndiah Algorithm for Extended BTCs
2.4.2 Reliability Computation of the ML Code-Word
2.4.3 Algebraic Decoding for SEC and DEC BCH Codes
2.5 Issues of Chase-Pyndiah Algorithm
Chapter 3 Complexity Reduction Techniques for Code-Word Set Generation of the Chase-Pyndiah Algorithm
3.1 Introduction
3.2 Adaptive Selection of LRPs
3.2.1 Selection Constraints of LRPs
3.2.2 Simulation Results
3.3 Test Pattern Selection
3.3.1 The Error Coverage Factor of Test Patterns
3.3.2 Greedy Selection of Test Patterns
3.3.3 Simulation Results
3.4 Concluding Remarks
Chapter 4 Complexity Reduction Techniques for Soft-Output Update of the Chase-Pyndiah Algorithm
4.1 Introduction
4.2 Distance Computation
4.2.1 Position-Index List Based Method
4.2.2 Double Index Set-Based Method
4.2.3 Complexity Analysis
4.2.4 Simulation Results
4.3 Reliability Factor Determination
4.3.1 Refinement of Distance-Based Reliability Factor
4.3.2 Adaptive Determination of the Reliability Factor
4.3.3 Simulation Results
4.4 Accuracy Improvement in Extrinsic Information Update
4.4.1 Drawbacks of the Sub-Optimal Update
4.4.2 Low-Complexity Extrinsic Information Update
4.4.3 Simulation Results
4.5 Concluding Remarks
Chapter 5 High-Throughput BTC Decoding on GPUs
5.1 Introduction
5.2 BTC Decoder Architecture for GPU Implementations
5.3 Memory Optimization
5.3.1 Global Memory Access Reduction
5.3.2 Improvement of Global Memory Access Coalescing
5.3.3 Efficient Shared Memory Control with Data Compression
5.3.4 Index Parity Check Scheme
5.4 Parallel Algorithms with the CUDA Shuffle Function
5.5 Implementation of Algebraic Decoder
5.5.1 Galois Field Operations with Look-Up Tables
5.5.2 Error-Locator Polynomial Setting with the LUTs
5.5.3 Parallel Chien Search with the LUTs
5.6 Simulation Results
5.7 Concluding Remarks
Chapter 6 Competitiveness of BTCs as FEC Codes for the Next-Generation Optical Networks
6.1 Introduction
6.2 The Complexity Reduction of the Modified Chase-Pyndiah Algorithm
6.2.1 Summary of the Complexity Reduction
6.2.2 The Error-Correcting Performance
6.3 Comparison of BTCs and LDPC-CCs
6.3.1 Complexity Analysis of the LDPC-CC Decoding
6.3.2 Comparison of the 20% Overhead BTC and LDPC-CC
6.4 Concluding Remarks
Chapter 7 Conclusion
Bibliography
Abstract (in Korean)