Search CORE

19 research outputs found

FPGA accelerator of algebraic quasi cyclic LDPC codes for NAND flash memories

Author: Martina Maurizio
Masera Guido
Tuoheti Abuduwaili
Zaidi Syed Azhar Ali
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2016
Field of study

In this article, the authors implement an FPGA simulator that accelerates the performance evaluation of very long QC-LDPC codes, and present a novel 8-KB LDPC code for NAND flash memory with better performance

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Error Correction Codes and Signal Processing in Flash Memory

Author: Dong Guiqiang
Pan Liyang
Wang Xueqiang
Zhou Runde
Publication venue: 'IntechOpen'
Publication date: 06/09/2011
Field of study

IntechOpen

Crossref

VLSI algorithms and architectures for non-binary-LDPC decoding

Author: Lacruz Jucht Jesús Omar
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 04/11/2016
Field of study

Tesis por compendio[EN] This thesis studies the design of low-complexity soft-decision Non-Binary Low-Density Parity-Check (NB-LDPC) decoding algorithms and their corresponding hardware architectures suitable for decoding high-rate codes at high throughput (hundreds of Mbps and Gbps). In the first part of the thesis the main aspects concerning to the NB-LDPC codes are analyzed, including a study of the main bottlenecks of conventional softdecision decoding algorithms (Q-ary Sum of Products (QSPA), Extended Min-Sum (EMS), Min-Max and Trellis-Extended Min-Sum (T-EMS)) and their corresponding hardware architectures. Despite the limitations of T-EMS algorithm (high complexity in the Check Node (CN) processor, wiring congestion due to the high number of exchanged messages between processors and the inability to implement decoders over high-order Galois fields due to the high decoder complexity), it was selected as starting point for this thesis due to its capability to reach high-throughput. Taking into account the identified limitations of the T-EMS algorithm, the second part of the thesis includes six papers with the results of the research made in order to mitigate the T-EMS disadvantages, offering solutions that reduce the area, the latency and increase the throughput compared to previous proposals from literature without sacrificing coding gain. Specifically, five low-complexity decoding algorithms are proposed, which introduce simplifications in different parts of the decoding process. Besides, five complete decoder architectures are designed and implemented on a 90nm Complementary Metal-Oxide-Semiconductor (CMOS) technology. The results show an achievement in throughput higher than 1Gbps and an area less than 10 mm2. The increase in throughput is 120% and the reduction in area is 53% compared to previous implementations of T-EMS, for the (837,726) NB-LDPC code over GF(32). The proposed decoders reduce the CN area, latency, wiring between CN and Variable Node (VN) processor and the number of storage elements required in the decoder. Considering that these proposals improve both area and speed, the efficiency parameter (Mbps / Million NAND gates) is increased in almost five times compared to other proposals from literature. The improvements in terms of area allow us to implement NB-LDPC decoders over high-order fields which had not been possible until now due to the highcomplexity of decoders previously proposed in literature. Therefore, we present the first post-place and route report for high-rate codes over high-order fields higher than Galois Field (GF)(32). For example, for the (1536,1344) NB-LDPC code over GF(64) the throughput is 1259Mbps occupying an area of 28.90 mm2. On the other hand, a decoder architecture is implemented on a Field Programmable Gate Array (FPGA) device achieving 630 Mbps for the high-rate (2304,2048) NB-LDPC code over GF(16). To the best knowledge of the author, these results constitute the highest ones presented in literature for similar codes and implemented on the same technologies.[ES] En esta tesis se aborda el estudio del diseño de algoritmos de baja complejidad para la decodificación de códigos de comprobación de paridad de baja densidad no binarios (NB-LDPC) y sus correspondientes arquitecturas apropiadas para decodificar códigos de alta tasa a altas velocidades (cientos de Mbps y Gbps). En la primera parte de la tesis los principales aspectos concernientes a los códigos NB-LDPC son analizados, incluyendo un estudio de los principales cuellos de botella presentes en los algoritmos de decodificación convencionales basados en decisión blanda (QSPA, EMS, Min-Max y T-EMS) y sus correspondientes arquitecturas hardware. A pesar de las limitaciones del algoritmo T-EMS (alta complejidad en el procesador del nodo de chequeo de paridad (CN), congestión en el rutado debido al intercambio de mensajes entre procesadores y la incapacidad de implementar decodificadores para campos de Galois de orden elevado debido a la elevada complejidad), éste fue seleccionado como punto de partida para esta tesis debido a su capacidad para alcanzar altas velocidades. Tomando en cuenta las limitaciones identificadas en el algoritmo T-EMS, la segunda parte de la tesis incluye seis artículos con los resultados de la investigación realizada con la finalidad de mitigar las desventajas del algoritmo T-EMS, ofreciendo soluciones que reducen el área, la latencia e incrementando la velocidad comparado con propuestas previas de la literatura sin sacrificar la ganancia de codificación. Especificamente, cinco algoritmos de decodificación de baja complejidad han sido propuestos, introduciendo simplificaciones en diferentes partes del proceso de decodificación. Además, arquitecturas completas de decodificadores han sido diseñadas e implementadas en una tecnologia CMOS de 90nm consiguiéndose una velocidad mayor a 1Gbps con un área menor a 10 mm2, aumentando la velocidad en 120% y reduciendo el área en 53% comparado con previas implementaciones del algoritmo T-EMS para el código (837,726) implementado sobre campo de Galois GF(32). Las arquitecturas propuestas reducen el área del CN, latencia, número de mensajes intercambiados entre el nodo de comprobación de paridad (CN) y el nodo variable (VN) y el número de elementos de almacenamiento en el decodificador. Considerando que estas propuestas mejoran tanto el área comola velocidad, el parámetro de eficiencia (Mbps / Millones de puertas NAND) se ha incrementado en casi cinco veces comparado con otras propuestas de la literatura. Las mejoras en términos de área nos ha permitido implementar decodificadores NBLDPC sobre campos de Galois de orden elevado, lo cual no habia sido posible hasta ahora debido a la alta complejidad de los decodificadores anteriormente propuestos en la literatura. Por lo tanto, en esta tesis se presentan los primeros resultados incluyendo el emplazamiento y rutado para códigos de alta tasa sobre campos finitos de orden mayor a GF(32). Por ejemplo, para el código (1536,1344) sobre GF(64) la velocidad es 1259 Mbps ocupando un área de 28.90 mm2. Por otro lado, una arquitectura de decodificador ha sido implementada en un dispositivo FPGA consiguiendo 660 Mbps de velocidad para el código de alta tasa (2304,2048) sobre GF(16). Estos resultados constituyen, según el mejor conocimiento del autor, los mayores presentados en la literatura para códigos similares implementados para las mismas tecnologías.[CA] En esta tesi s'aborda l'estudi del disseny d'algoritmes de baixa complexitat per a la descodificació de codis de comprovació de paritat de baixa densitat no binaris (NB-LDPC), i les seues corresponents arquitectures per a descodificar codis d'alta taxa a altes velocitats (centenars de Mbps i Gbps). En la primera part de la tesi els principals aspectes concernent als codis NBLDPC són analitzats, incloent un estudi dels principals colls de botella presents en els algoritmes de descodificació convencionals basats en decisió blana (QSPA, EMS, Min-Max i T-EMS) i les seues corresponents arquitectures. A pesar de les limitacions de l'algoritme T-EMS (alta complexitat en el processador del node de revisió de paritat (CN), congestió en el rutat a causa de l'intercanvi de missatges entre processadors i la incapacitat d'implementar descodificadors per a camps de Galois d'orde elevat a causa de l'elevada complexitat), este va ser seleccionat com a punt de partida per a esta tesi degut a la seua capacitat per a aconseguir altes velocitats. Tenint en compte les limitacions identificades en l'algoritme T-EMS, la segona part de la tesi inclou sis articles amb els resultats de la investigació realitzada amb la finalitat de mitigar els desavantatges de l'algoritme T-EMS, oferint solucions que redueixen l'àrea, la latència i incrementant la velocitat comparat amb propostes prèvies de la literatura sense sacrificar el guany de codificació. Específicament, s'han proposat cinc algoritmes de descodificació de baixa complexitat, introduint simplificacions en diferents parts del procés de descodificació. A més, s'han dissenyat arquitectures completes de descodificadors i s'han implementat en una tecnologia CMOS de 90nm aconseguint-se una velocitat major a 1Gbps amb una àrea menor a 10 mm2, augmentant la velocitat en 120% i reduint l'àrea en 53% comparat amb prèvies implementacions de l'algoritme T-EMS per al codi (837,726) implementat sobre camp de Galois GF(32). Les arquitectures proposades redueixen l'àrea del CN, la latència, el nombre de missatges intercanviats entre el node de comprovació de paritat (CN) i el node variable (VN) i el nombre d'elements d'emmagatzemament en el descodificador. Considerant que estes propostes milloren tant l'àrea com la velocitat, el paràmetre d'eficiència (Mbps / Milions deportes NAND) s'ha incrementat en quasi cinc vegades comparat amb altres propostes de la literatura. Les millores en termes d'àrea ens ha permès implementar descodificadors NBLDPC sobre camps de Galois d'orde elevat, la qual cosa no havia sigut possible fins ara a causa de l'alta complexitat dels descodificadors anteriorment proposats en la literatura. Per tant, nosaltres presentem els primers reports després de l'emplaçament i rutat per a codis d'alta taxa sobre camps finits d'orde major a GF(32). Per exemple, per al codi (1536,1344) sobre GF(64) la velocitat és 1259 Mbps ocupant una àrea de 28.90 mm2. D'altra banda, una arquitectura de descodificador ha sigut implementada en un dispositiu FPGA aconseguint 660 Mbps de velocitat per al codi d'alta taxa (2304,2048) sobre GF(16). Estos resultats constitueixen, per al millor coneixement de l'autor, els millors presentats en la literatura per a codis semblants implementats per a les mateixes tecnologies.Lacruz Jucht, JO. (2016). VLSI algorithms and architectures for non-binary-LDPC decoding [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/73266TESISCompendi

RiuNet

낸드플래시 메모리 오류정정을 위한 고성능 LDPC 복호방법 연구

Author: 김종홍
Publication venue: 서울대학교 대학원
Publication date: 01/08/2013
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2013. 8. 성원용.반도체 공정의 미세화에 따라 비트 에러율이 증가하는 낸드 플래시 메모리에서 고성능 에러 정정 방법은 필수적이다. Low-density parity-check (LDPC) 부호와 같은 연판정 에러 정정 부호는 뛰어난 에러 정정 성능을 보이지만, 높은 구현 복잡도로 인해 플래시 메모리 시스템에 적용되기 힘든 단점이 있다. 본 논문에서는 LDPC 부호의 효율적인 복호를 위해 고성능 메시지 전파 스케줄링 방법과 저 복잡도 복호 알고리즘을 제안한다. 특히 finite geometry (FG) LDPC 부호에 대한 효율적인 디코더 아키텍쳐를 제안하며, 구현된 디코더를 이용하여 낸드 플래시 메모리에 대해 연판정 복호시의 에너지 소모량에 대해 연구한다. 본 논문의 첫 번째 부분에서는 동적 스케줄링 (informed dynamic scheduling, IDS) 알고리즘의 성능향상 방법에 대해 연구한다. 이를 위해 우선 기존의 가장 빠른 수렴 속도를 보이는 IDS 알고리즘인 레지듀얼 신뢰 전파 (residual belief propagation, RBP) 알고리즘의 동작 특성을 분석하고, 이를 바탕으로 특정 노드에 메시지 갱신이 집중되는 것을 방지하여 RBP 알고리즘의 수렴속도를 증가시킨 improved RBP (iRBP) 알고리즘을 제안한다. 또한 iRBP의 뛰어난 수렴속도와 기존의 NS 알고리즘의 우수한 에러 정정 능력을 모두 갖춘 신드롬 기반의 혼합 스케줄링 (mixed scheduling) 방법을 제안한다. 끝으로 다양한 부호율의 LDPC 부호에 대한 모의실험을 통해 제안된 신드롬 기반의 혼합 스케줄링 방법이 본 논문에서 시험된 다른 모든 스케줄링 알고리즘의 성능을 능가함을 확인하였다. 논문의 두 번째 부분에서는 복호 실패시 많은 비트 에러를 발생시키는 a posteriori probability (APP) 알고리즘의 개선 방안에 방안을 제안한다. 또한 빠른 수렴속도와 우수한 에러 마루 (error-floor) 성능으로 데이터 저장장치에 적합한 FG-LDPC 부호에 대해 제안된 알고리즘이 적용된 하드웨어 아키텍처를 제안하였다. 제안된 아키텍처는 높은 노드 가중치를 가지는 FG-LDPC 부호에 적합하도록 쉬프트 레지스터 (shift registers)와 SRAM 기반의 혼합 구조를 채용하며, 높은 처리량을 얻기 위해 파이프라인된 병렬 아키텍처를 사용한다. 또한 메모리 사용량을 줄이기 위해 세 가지의 메모리 용량 감소 기법을 적용하며, 전력 소비를 줄이기 위해 두 가지의 저전력 기법을 제안한다. 본 제안된 아키텍처는 부호율 0.96의 (68254, 65536) Euclidean geometry LDPC 부호에 대해 0.13-um CMOS 공정에서 구현하였다. 마지막으로 본 논문에서는 연판정 복호가 적용된 낸드 플래시 메모리 시스템의 에너지 소모를 낮추는 방법에 대해 제안한다. 연판정 기반의 에러 정정 알고리즘은 높은 성능을 보이지만, 이는 플래시 메모리의 센싱 수와 에너지 소모를 증가 시키는 단점이 있다. 본 연구에서는 앞서 구현된 LDPC 디코더가 채용된 낸드 플래시 메모리 시스템의 에너지 소모를 분석하고, LDPC 디코더와 BCH 디코더 간의 칩 사이즈와 에너지 소모량을 비교하였다. 이와 더불어 본 논문에서는 LDPC 디코더를 이용한 센싱 정밀도 결정 방법을 제안한다. 본 연구를 통해 제안된 복호 및 스케줄링 알고리즘, VLSI 아키텍쳐, 그리고 읽기 정밀도 결정 방법을 통해 낸드 플래시 메모리 시스템의 에러 정정 성능을 극대화 하고 에너지 소모를 최소화 할 수 있다.High-performance error correction for NAND flash memory is greatly needed because the raw bit error rate increases as the semiconductor geometry shrinks for high density. Soft-decision error correction, such as low-density parity-check (LDPC) codes, offers high performance but their implementation complexity hinders wide adoption to consumer products. This dissertation proposes two high-performance message-passing schedules and a low-complexity decoding algorithm for LDPC codes. In particular, an efficient decoder architecture for finite geometry (FG) LDPC codes is proposed, and the energy consumption of soft-decision decoding for NAND flash memory is analyzed. The first part of this dissertation is devoted to improving the informed dynamic scheduling (IDS) algorithms. We analyze the behavior of the residual belief propagation (RBP), which is the fastest IDS algorithm, and develop an improved RBP (iRBP) by avoiding the concentration of message updates at a particular node. We also study the syndrome-based mixed scheduling of the iRBP and the node-wise scheduling (NS). The proposed mixed scheduling outperforms all other scheduling methods tested in this work. The next part of this dissertation is to develop a conditional variable node update scheme for the a posteriori probability (APP) algorithm. The developed algorithm is robust to decoding failures and can reduce the dynamic power consumption by lowering switching activities in the LDPC decoder. To implement the developed algorithm, we propose a memory-efficient pipelined parallel architecture for LDPC decoding. The architecture employs FG-LDPC codes that not only show fast convergence speed and good error-floor performance but also perform well with iterative decoding algorithms, which is especially suitable for data storage devices. We also developed a rate-0.96 (68254, 65536) Euclidean geometry LDPC code and implemented the proposed architecture in 0.13-um CMOS technology. This dissertation also covers low-energy error correction of NAND flash memory through soft-decision decoding. The soft-decision-based error correction algorithms show high performance, but they demand an increased number of flash memory sensing operations and consume more energy for memory access. We examine the energy consumption of a NAND flash memory system equipping an LDPC code-based soft-decision error correction circuit. The sum of energy consumed at NAND flash memory and the LDPC decoder is minimized. In addition, the chip size and energy consumption of the decoder were compared with those of two Bose-Chaudhuri-Hocquenghem (BCH) decoding circuits showing the comparable error performance and the throughput. We also propose an LDPC decoder-assisted precision selection method that needs virtually no overhead. This dissertation is intended to develop high-performance and low-power error correction circuits for NAND flash memory by studying improved decoding and scheduling algorithms, VLSI architecture, and a read precision selection method.1 Introduction 1 1.1 NAND Flash Memory 1 1.2 LDPC Codes 4 1.3 Outline of the Dissertation 6 2 LDPC Decoding and Scheduling Algorithms 8 2.1 Introduction 8 2.2 Decoding Algorithms for LDPC Codes 10 2.2.1 Belief Propagation Algorithm 10 2.2.2 Simplified Belief Propagation Algorithms 12 2.3 Message-Passing Schedules for Decoding of LDPC Codes 15 2.3.1 Static Schedules 15 2.3.2 Dynamic Schedules 17 3 Improved Dynamic Scheduling Algorithms for Decoding of LDPC Codes 22 3.1 Introduction 22 3.2 Improved Residual Belief Propagation Algorithm 23 3.3 Syndrome-Based Mixed Scheduling of iRBP and NS 26 3.4 Complexity Analysis and Simulation Results 28 3.4.1 Complexity Analysis 28 3.4.2 Simulation Results 29 3.5 Concluding Remarks 33 4 A Pipelined Parallel Architecture for Decoding of Finite-Geometry LDPC Codes 36 4.1 Introduction 36 4.2 Finite-Geometry LDPC Codes and Conditional Variable Node Update Algorithm 38 4.2.1 Finite-Geometry LDPC codes 38 4.2.2 Conditional Variable Node Update Algorithm for Fixed-Point Normalized APP-Based Algorithm 40 4.3 Decoder Architecture 46 4.3.1 Baseline Sequential Architecture 46 4.3.2 Pipelined-Parallel Architecture 54 4.3.3 Memory Capacity Reduction 57 4.4 Implementation Results 60 4.5 Concluding Remarks 64 5 Low-Energy Error Correction of NAND Flash Memory through Soft-Decision Decoding 66 5.1 Introduction 66 5.2 Energy Consumption of Read Operations in NAND Flash Memory 67 5.2.1 Voltage Sensing Scheme for Soft-Decision Data Output 67 5.2.2 LSB and MSB Concurrent Access Scheme for Low-Energy Soft-Decision Data Output 72 5.2.3 Energy Consumption of Read Operations in NAND Flash Memory 73 5.3 The Performance of Soft-Decision Error Correction over a NAND Flash Memory Channel 76 5.4 Hardware Performance of the (68254, 65536) LDPC Decoder 81 5.4.1 Energy Consumption of the LDPC Decoder 81 5.4.2 Performance Comparison of the LDPC Decoder and Two BCH Decoders 83 5.5 Low-Energy Error Correction Scheme for NAND Flash Memory 87 5.5.1 Optimum Precision for Low-Energy Decoding 87 5.5.2 Iteration Count-Based Precision Selection 90 5.6 Concluding Remarks 91 6 Conclusion 94 Bibliography 96 Abstract in Korean 110 감사의 글 112Docto

SNU Open Repository and Archive

Decoding of Decode and Forward (DF) Relay Protocol using Min-Sum Based Low Density Parity Check (LDPC) System

Author: Ab Hamid Khairuddin
Othman Al-Khalid
Suud Jamaah Binti
Zen Hushairi
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 17/04/2022
Field of study

Decoding high complexity is a major issue to design a decode and forward (DF) relay protocol. Thus, the establishment of low complexity decoding system would beneficial to assist decode and forward relay protocol. This paper reviews existing methods for the min-sum based LDPC decoding system as the low complexity decoding system. Reference lists of chosen articles were further reviewed for associated publications. This paper introduces comprehensive system model representing and describing the methods developed for LDPC based for DF relay protocol. It is consists of a number of components: (1) encoder and modulation at the source node, (2) demodulation, decoding, encoding and modulation at relay node, and (3) demodulation and decoding at the destination node. This paper also proposes a new taxonomy for min-sum based LDPC decoding techniques, highlights some of the most important components such as data used, result performances and profiles the Variable and Check Node (VCN) operation methods that have the potential to be used in DF relay protocol. Min-sum based LDPC decoding methods have the potential to provide an objective measure the best tradeoff between low complexities decoding process and the decoding error performance, and emerge as a cost-effective solution for practical application

International Journal of Communication Networks and Information Security (IJCNIS)

Energy-Efficient Decoders of Near-Capacity Channel Codes.

Author: Park Youn Sung
Publication venue
Publication date
Field of study

Channel coding has become essential in state-of-the-art communication and storage systems for ensuring reliable transmission and storage of information. Their goal is to achieve high transmission reliability while keeping the transmit energy consumption low by taking advantage of the coding gain provided by these codes. The lowest total system energy is achieved with a decoder that provides both good coding gain and high energy-efficiency. This thesis demonstrates the VLSI implementation of near-capacity channel decoders using the LDPC, nonbinary LDPC (NB-LDPC) and polar codes with an emphasis of reducing the decode energy. LDPC code is a widely used channel code due to its excellent error-correcting performance. However, memory dominates the power of high-throughput LDPC decoders. Therefore, these memories are replaced with a novel non-refresh embedded DRAM (eDRAM) taking advantage of the deterministic memory access pattern and short access window of the decoding algorithm to trade off retention time for faster access speed. The resulting LDPC decoder with integrated eDRAMs achieves state-of-the-art area- and energy-efficiency. NB-LDPC code achieves better error-correcting performance than LDPC code at the cost of higher decoding complexity. However, the factor graph is simplified, permitting a fully parallel architecture with low wiring overhead. To reduce the dynamic power of the decoder, a fine-grained dynamic clock gating technique is applied based on node-level convergence. This technique greatly reduces dynamic power allowing the decoder to achieve high energy-efficiency while achieving high throughput. The recently invented polar code has a similar error-correcting performance to LDPC code of comparable block length. However, the easy reconfigurability of code rate as well as block length makes it desirable in numerous applications where LDPC is not competitive. In addition, the regular structure and simple processing enables a highly efficient decoder in terms of area and power. Using the belief propagation algorithm with architectural and memory improvements, a polar decoder is demonstrated achieving high throughput and high energy- and area-efficiency. The demonstrated energy-efficient decoders have advanced the state-of-the-art. The decoders will allow the continued reduction of decode energy for the latest communication and storage applications. The developed techniques are widely applicable to designing low-power DSP processors.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/108731/1/parkyoun_1.pd

Deep Blue Documents at the University of Michigan

Flexible and Low-Complexity Encoding and Decoding of Systematic Polar Codes

Author: Giard Pascal
Gross Warren J.
Sarkis Gabi
Tal Ido
Thibeault Claude
Vardy Alexander
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

In this work, we present hardware and software implementations of flexible polar systematic encoders and decoders. The proposed implementations operate on polar codes of any length less than a maximum and of any rate. We describe the low-complexity, highly parallel, and flexible systematic-encoding algorithm that we use and prove its correctness. Our hardware implementation results show that the overhead of adding code rate and length flexibility is little, and the impact on operation latency minor compared to code-specific versions. Finally, the flexible software encoder and decoder implementations are also shown to be able to maintain high throughput and low latency.Comment: Submitted to IEEE Transactions on Communications, 201

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

연판정 오류정정을 위한 낮은 복잡도의 블록 터보부호 복호화 연구

Author: Junhee Cho
Publication venue: 서울대학교 대학원
Publication date: 01/08/2016
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2016. 8. 성원용.As the throughput needed for communication systems and storage devices increases, high-performance forward error correction (FEC), especially soft-decision (SD) based technique, becomes essential. In particular, block turbo codes (BTCs) and low-density parity check (LDPC) codes are considered as candidate FEC codes for the next generation systems, such as beyond-100Gbps optical networks and under-20nm NAND flash memory devices, which require capacity-approaching performance and very low error floor. The BTCs have definite strengths in diversity and encoding complexity because they generally employ a two-dimensional structure, which enables sub-frame level decoding for the row or column code-words. This sub-frame level decoding gives a strong advantage for parallel processing. The BTC decoding throughput can be improved by applying a low-complexity algorithm to the small level decoding or by running multiple sub-frame decoding modules simultaneously. In this dissertation, we develop high-throughput BTC decoding software that pursuits these advantages. The first part of this dissertation is devoted to finding efficient test patterns in the Chase-Pyndiah algorithm. Although the complexity of this algorithm linearly increases according to the number of the test patterns, it naively considers all possible patterns containing least reliable positions. As a result, consideration of one more position nearly doubles the complexity. To solve this issue, we first introduce a new position selection criterion that excludes some of the selected ones having a relatively large reliability. This technique excludes the selection of sufficiently reliable positions, which greatly reduces the complexity. Secondly, we propose a pattern selection scheme considering the error coverage. We define the error coverage factor that represents the influence on the error-correcting performance and compute it by analyzing error events. Based on the computed factor, we select the patterns with the greedy algorithm. By using these methods, we can flexibly balance the complexity and the performance. The second part of this dissertation is developing low-complexity soft-output processing methods needed for BTC decoding. In the Chase-Pyndiah algorithm, the soft-output is updated in two different ways according to whether competing code-words exist on the updating positions or not. If the competing code-words exist, the Euclidean distance between the soft-input signal and the code-words that are generated from the test patterns is used. However, the cost of distance computation is very high and linearly increases with the sub-frame length. We identify computationally redundant positions and optimize the computing process by ignoring them. If the competing ones do not exist, the reliability factor that should be pre-determined by an extensive search is demanded. To avoid this, we propose adaptive determination methods, which provides even better error-correcting performance. In addition, we investigate the Pyndiah's soft-output computation and find its drawbacks that appear during the approximation process. To remove the drawbacks, we replace the updating method of the positions that are expected to be seriously damaged by the approximation with the reliability factor-based one, which is much simpler, even though they have the competing words. This dissertation also develops a graphics processing unit (GPU) based BTC decoding program. In order to hide the latency of arithmetic and memory access operations, this software applies the kernel structure that processes multiple BTC-words and allocates multiple sub-frames to each thread-block. Global memory access optimization and data compression, which demands less shared memory space, are also employed. For efficient mapping of the Chase-Pyndiah algorithm onto GPUs, we propose parallel processing schemes employing efficient reduction algorithms and provide step-by-step parallel algorithms for the algebraic decoding. The last part of this dissertation is devoted to summarizing the developed decoding method and comparing it with the decoding of the LDPC convolutional code (CC), which is currently reported as the most powerful candidate for the 100Gbps optical network. We first investigate the complexity reduction and the error rate performance improvement of the developed method. Then, we analyze the complexity of the LDPC-CC decoding and compare it with the developed BTC decoding for the 20% overhead codes. This dissertation is intended to develop high-throughput SD decoding software by introducing complexity reduction techniques for the Chase-Pyndiah algorithm and efficient parallel processing methods, and to emphasize the competitiveness of the BTC. The proposed decoding methods and parallel processing algorithms verified in the GPU-based systems are also applicable to hardware-based ones. By implementing hardware-based decoders that employ the developed methods in this dissertation, significant improvements on the throughputs and the energy efficiency can be obtained. Moreover, thanks to the wide rate coverage of the BTC, the developed techniques can be applied to many high-throughput error correction applications, such as the next-generation optical network and storage device systems.Chapter 1 Introduction 1 1.1 Turbo Codes 1 1.2 Applications of Turbo Codes 4 1.3 Outline of the Dissertation 5 Chapter 2 Encoding and Iterative Decoding of Block Turbo Codes 7 2.1 Introduction 7 2.2 Encoding Procedure of Shortened-Extended BTCs 9 2.3 Scheduling Methods for Iterative Decoding 9 2.3.1 Serial Scheduling 10 2.3.2 Parallel Scheduling 10 2.3.3 Replica Scheduling 11 2.4 Elementary Decoding with Chase-Pyndiah Algorithm 13 2.4.1 Chase-Pyndiah Algorithm for Extended BTCs 13 2.4.2 Reliability Computation of the ML Code-Word 17 2.4.3 Algebraic Decoding for SEC and DEC BCH Codes 20 2.5 Issues of Chase-Pyndiah Algorithm 23 Chapter 3 Complexity Reduction Techniques for Code-Word Set Generation of the Chase-Pyndiah Algorithm 24 3.1 Introduction 24 3.2 Adaptive Selection of LRPs 25 3.2.1 Selection Constraints of LRPs 25 3.2.2 Simulation Results 26 3.3 Test Pattern Selection 29 3.3.1 The Error Coverage Factor of Test Patterns 30 3.3.2 Greedy Selection of Test Patterns 33 3.3.3 Simulation Results 34 3.4 Concluding Remarks 34 Chapter 4 Complexity Reduction Techniques for Soft-Output Update of the Chase-Pyndiah Algorithm 37 4.1 Introduction 37 4.2 Distance Computation 38 4.2.1 Position-Index List Based Method 39 4.2.2 Double Index Set-Based Method 42 4.2.3 Complexity Analysis 46 4.2.4 Simulation Results 47 4.3 Reliability Factor Determination 49 4.3.1 Refinement of Distance-Based Reliability Factor 51 4.3.2 Adaptive Determination of the Reliability Factor 51 4.3.3 Simulation Results 53 4.4 Accuracy Improvement in Extrinsic Information Update 54 4.4.1 Drawbacks of the Sub-Optimal Update 55 4.4.2 Low-Complexity Extrinsic Information Update 58 4.4.3 Simulation Results 59 4.5 Concluding Remarks 61 Chapter 5 High-Throughput BTC Decoding on GPUs 64 5.1 Introduction 64 5.2 BTC Decoder Architecture for GPU Implementations 66 5.3 Memory Optimization 68 5.3.1 Global Memory Access Reduction 68 5.3.2 Improvement of Global Memory Access Coalescing 68 5.3.3 Efficient Shared Memory Control with Data Compression 70 5.3.4 Index Parity Check Scheme 73 5.4 Parallel Algorithms with the CUDA Shuffle Function 77 5.5 Implementation of Algebraic Decoder 78 5.5.1 Galois Field Operations with Look-Up Tables 78 5.5.2 Error-Locator Polynomial Setting with the LUTs 81 5.5.3 Parallel Chien Search with the LUTs 84 5.6 Simulation Results 85 5.7 Concluding Remarks 89 Chapter 6 Competitiveness of BTCs as FEC codes for the Next-Generation Optical Networks 91 6.1 Introduction 91 6.2 The Complexity Reduction of the Modified Chase-Pyndiah Algorithm 92 6.2.1 Summary of the Complexity Reduction 92 6.2.2 The Error-Correcting Performance 94 6.3 Comparison of BTCs and LDPC-CCs 97 6.3.1 Complexity Analysis of the LDPC-CC Decoding 97 6.3.2 Comparison of the 20% Overhead BTC and LDPC-CC 100 6.4 Concluding Remarks 101 Chapter 7 Conclusion 102 Bibliography 105 국문 초록 113Docto

SNU Open Repository and Archive