8 research outputs found

    Massively parallel implementation of cyclic LDPC codes on a general purpose graphic processing unit

    2009 IEEE Workshop on Signal Processing Systems (SiPS), Tampere, Finland, 2009-10-07 to 2009-10-09. Simulation of low-density parity-check (LDPC) codes frequently takes several days, so running it on general-purpose graphics processing units (GPGPUs) is very promising. However, GPGPUs are designed for compute-intensive applications, and they are not optimized for data caching or control management. In LDPC decoding, the parity check matrix H must be accessed at every node update, and H is often larger than the GPU on-chip memory, especially when the code length is long or the weight is high. In this work, the parity check matrix of cyclic or quasi-cyclic LDPC codes is greatly compressed by exploiting the periodic property of the matrix. In our experiments, the Compute Unified Device Architecture (CUDA) of Nvidia is used. With the (1057, 813) and (4161, 3431) projective geometry (PG) LDPC codes, the execution speed of the proposed method is more than twice that of the reference implementations that do not exploit the cyclic property of the parity check matrices.
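
    The compression idea in the abstract above can be illustrated with a toy sketch. It assumes a single circulant matrix in which every row is the first row shifted cyclically, so only the positions of the ones in the first row need to be stored. The names `N`, `FIRST_ROW_ONES`, `check_neighbors`, `var_neighbors`, `syndrome`, and `expand` are hypothetical, and the length-7 code (the incidence matrix of the projective plane PG(2,2)) is chosen for brevity. It is not one of the codes or the CUDA data layout used in the paper.

```python
# Toy sketch: a circulant parity-check matrix stored only by its first row.
# N and FIRST_ROW_ONES are illustrative assumptions (PG(2,2)), not from the paper.
N = 7
FIRST_ROW_ONES = (0, 1, 3)


def check_neighbors(i, ones=FIRST_ROW_ONES, n=N):
    """Columns holding a 1 in row i: the first row shifted cyclically by i."""
    return [(p + i) % n for p in ones]


def var_neighbors(j, ones=FIRST_ROW_ONES, n=N):
    """Rows holding a 1 in column j, recovered without storing the matrix."""
    return [(j - p) % n for p in ones]


def syndrome(word, ones=FIRST_ROW_ONES, n=N):
    """Parity of every check, computed from the compressed description."""
    return [sum(word[j] for j in check_neighbors(i, ones, n)) % 2
            for i in range(n)]


def expand(ones=FIRST_ROW_ONES, n=N):
    """Full n x n matrix, built here only to compare with the compressed form."""
    return [[1 if (j - i) % n in ones else 0 for j in range(n)]
            for i in range(n)]
```

    In this sketch the storage drops from n*n matrix entries to the few indices of the first row, at the cost of one modular addition per neighbor lookup.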

    A Massively Parallel Implementation of QC-LDPC Decoder on GPU

    The graphics processing unit (GPU) provides a low-cost and flexible software-based multi-core architecture for high-performance computing. However, it is still very challenging to map real-world applications efficiently to the GPU and fully utilize its computational power. As a case study, we present a GPU-based implementation of a real-world digital signal processing (DSP) application: a low-density parity-check (LDPC) decoder. The paper shows the efforts we made to map the algorithm onto the massively parallel architecture of the GPU and fully utilize the GPU's computational resources to significantly boost performance. Moreover, several efficient data structures are proposed to reduce the memory access latency and the memory bandwidth requirement. Experimental results show that the proposed GPU-based LDPC decoding accelerator can take advantage of the multi-core computational power provided by the GPU and achieve a throughput of up to 100.3 Mbps. Renesas Mobile; Texas Instruments; Xilinx; National Science Foundation.

    GPU Accelerated Scalable Parallel Decoding of LDPC Codes

    This paper proposes a flexible low-density parity-check (LDPC) decoder that leverages graphics processing units (GPUs) to provide high decoding throughput. LDPC codes are widely adopted by emerging standards for wireless communication systems and storage applications because of their near-capacity error-correcting performance. To achieve high decoding throughput on the GPU, we leverage the parallelism embedded in the check-node and variable-node computations and propose a parallel strategy for partitioning the decoding jobs among the multiprocessors of the GPU. In addition, we propose a scalable multi-codeword decoding scheme to fully utilize the computational resources of the GPU. Furthermore, we developed a novel adaptive performance-tuning method to make our decoder implementation more flexible and scalable. The experimental results show that our LDPC decoder is scalable and flexible, and that the adaptive performance-tuning method can deliver the peak performance based on the GPU architecture. Renesas Mobile; Samsung; National Science Foundation.

    A Study on Low-Complexity Block Turbo Code Decoding for Soft-Decision Error Correction

    Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2016. 성원용 (Wonyong Sung). As the throughput needed for communication systems and storage devices increases, high-performance forward error correction (FEC), especially soft-decision (SD) based techniques, becomes essential. In particular, block turbo codes (BTCs) and low-density parity check (LDPC) codes are considered candidate FEC codes for next-generation systems, such as beyond-100Gbps optical networks and under-20nm NAND flash memory devices, which require capacity-approaching performance and a very low error floor. BTCs have definite strengths in diversity and encoding complexity because they generally employ a two-dimensional structure, which enables sub-frame level decoding of the row or column code-words. This sub-frame level decoding gives a strong advantage for parallel processing. The BTC decoding throughput can be improved by applying a low-complexity algorithm to the sub-frame level decoding or by running multiple sub-frame decoding modules simultaneously. In this dissertation, we develop high-throughput BTC decoding software that pursues these advantages. The first part of this dissertation is devoted to finding efficient test patterns in the Chase-Pyndiah algorithm. The complexity of this algorithm increases linearly with the number of test patterns, yet it naively considers all possible patterns over the least reliable positions. As a result, considering one more position nearly doubles the complexity. To solve this issue, we first introduce a new position selection criterion that excludes some of the selected positions that have a relatively large reliability. Excluding sufficiently reliable positions in this way greatly reduces the complexity. Secondly, we propose a pattern selection scheme that considers the error coverage. We define the error coverage factor, which represents the influence on the error-correcting performance, and compute it by analyzing error events. 
Based on the computed factor, we select the patterns with a greedy algorithm. By using these methods, we can flexibly balance complexity and performance. The second part of this dissertation develops the low-complexity soft-output processing methods needed for BTC decoding. In the Chase-Pyndiah algorithm, the soft output is updated in two different ways according to whether or not competing code-words exist at the positions being updated. If competing code-words exist, the Euclidean distance between the soft-input signal and the code-words generated from the test patterns is used. However, the cost of the distance computation is very high and increases linearly with the sub-frame length. We identify computationally redundant positions and optimize the computation by ignoring them. If no competing code-words exist, a reliability factor is required, which must be predetermined by an extensive search. To avoid this, we propose adaptive determination methods, which provide even better error-correcting performance. In addition, we investigate Pyndiah's soft-output computation and find drawbacks that appear during the approximation process. To remove them, for the positions that are expected to be seriously damaged by the approximation, we replace the updating method with the reliability factor-based one, which is much simpler, even though these positions have competing code-words. This dissertation also develops a graphics processing unit (GPU) based BTC decoding program. In order to hide the latency of arithmetic and memory access operations, this software applies a kernel structure that processes multiple BTC-words and allocates multiple sub-frames to each thread-block. Global memory access optimization and data compression, which demand less shared memory space, are also employed. 
For efficient mapping of the Chase-Pyndiah algorithm onto GPUs, we propose parallel processing schemes employing efficient reduction algorithms and provide step-by-step parallel algorithms for the algebraic decoding. The last part of this dissertation summarizes the developed decoding method and compares it with the decoding of the LDPC convolutional code (CC), which is currently reported as the most powerful candidate for the 100Gbps optical network. We first investigate the complexity reduction and the error rate performance improvement of the developed method. Then, we analyze the complexity of the LDPC-CC decoding and compare it with the developed BTC decoding for the 20% overhead codes. This dissertation is intended to develop high-throughput SD decoding software by introducing complexity reduction techniques for the Chase-Pyndiah algorithm and efficient parallel processing methods, and to emphasize the competitiveness of the BTC. The proposed decoding methods and parallel processing algorithms, verified in GPU-based systems, are also applicable to hardware-based ones. By implementing hardware-based decoders that employ the methods developed in this dissertation, significant improvements in throughput and energy efficiency can be obtained. 
Moreover, thanks to the wide rate coverage of the BTC, the developed techniques can be applied to many high-throughput error correction applications, such as the next-generation optical network and storage device systems.

    Table of contents:
    Chapter 1 Introduction
        1.1 Turbo Codes
        1.2 Applications of Turbo Codes
        1.3 Outline of the Dissertation
    Chapter 2 Encoding and Iterative Decoding of Block Turbo Codes
        2.1 Introduction
        2.2 Encoding Procedure of Shortened-Extended BTCs
        2.3 Scheduling Methods for Iterative Decoding
            2.3.1 Serial Scheduling
            2.3.2 Parallel Scheduling
            2.3.3 Replica Scheduling
        2.4 Elementary Decoding with Chase-Pyndiah Algorithm
            2.4.1 Chase-Pyndiah Algorithm for Extended BTCs
            2.4.2 Reliability Computation of the ML Code-Word
            2.4.3 Algebraic Decoding for SEC and DEC BCH Codes
        2.5 Issues of Chase-Pyndiah Algorithm
    Chapter 3 Complexity Reduction Techniques for Code-Word Set Generation of the Chase-Pyndiah Algorithm
        3.1 Introduction
        3.2 Adaptive Selection of LRPs
            3.2.1 Selection Constraints of LRPs
            3.2.2 Simulation Results
        3.3 Test Pattern Selection
            3.3.1 The Error Coverage Factor of Test Patterns
            3.3.2 Greedy Selection of Test Patterns
            3.3.3 Simulation Results
        3.4 Concluding Remarks
    Chapter 4 Complexity Reduction Techniques for Soft-Output Update of the Chase-Pyndiah Algorithm
        4.1 Introduction
        4.2 Distance Computation
            4.2.1 Position-Index List Based Method
            4.2.2 Double Index Set-Based Method
            4.2.3 Complexity Analysis
            4.2.4 Simulation Results
        4.3 Reliability Factor Determination
            4.3.1 Refinement of Distance-Based Reliability Factor
            4.3.2 Adaptive Determination of the Reliability Factor
            4.3.3 Simulation Results
        4.4 Accuracy Improvement in Extrinsic Information Update
            4.4.1 Drawbacks of the Sub-Optimal Update
            4.4.2 Low-Complexity Extrinsic Information Update
            4.4.3 Simulation Results
        4.5 Concluding Remarks
    Chapter 5 High-Throughput BTC Decoding on GPUs
        5.1 Introduction
        5.2 BTC Decoder Architecture for GPU Implementations
        5.3 Memory Optimization
            5.3.1 Global Memory Access Reduction
            5.3.2 Improvement of Global Memory Access Coalescing
            5.3.3 Efficient Shared Memory Control with Data Compression
            5.3.4 Index Parity Check Scheme
        5.4 Parallel Algorithms with the CUDA Shuffle Function
        5.5 Implementation of Algebraic Decoder
            5.5.1 Galois Field Operations with Look-Up Tables
            5.5.2 Error-Locator Polynomial Setting with the LUTs
            5.5.3 Parallel Chien Search with the LUTs
        5.6 Simulation Results
        5.7 Concluding Remarks
    Chapter 6 Competitiveness of BTCs as FEC codes for the Next-Generation Optical Networks
        6.1 Introduction
        6.2 The Complexity Reduction of the Modified Chase-Pyndiah Algorithm
            6.2.1 Summary of the Complexity Reduction
            6.2.2 The Error-Correcting Performance
        6.3 Comparison of BTCs and LDPC-CCs
            6.3.1 Complexity Analysis of the LDPC-CC Decoding
            6.3.2 Comparison of the 20% Overhead BTC and LDPC-CC
        6.4 Concluding Remarks
    Chapter 7 Conclusion
    Bibliography
    Abstract in Korean
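
    The dissertation abstract notes that the naive Chase-Pyndiah test-pattern set nearly doubles in cost with each additional least reliable position. A minimal sketch of that naive enumeration follows. It assumes soft inputs given as signed reliabilities where a negative value maps to bit 1 (a convention chosen here, not stated in the abstract), and the function names `least_reliable_positions` and `chase_test_sequences` are hypothetical. The algebraic decoding of each candidate and the proposed pattern-pruning methods are not shown.

```python
from itertools import product


def least_reliable_positions(soft, p):
    """Indices of the p entries of soft with the smallest magnitude."""
    return sorted(range(len(soft)), key=lambda i: abs(soft[i]))[:p]


def chase_test_sequences(soft, p):
    """All 2**p candidates obtained by flipping subsets of the p least
    reliable hard-decision bits (assumed mapping: negative value -> bit 1)."""
    hard = [1 if x < 0 else 0 for x in soft]
    lrps = least_reliable_positions(soft, p)
    sequences = []
    for flips in product((0, 1), repeat=p):
        candidate = hard[:]
        for pos, flip in zip(lrps, flips):
            candidate[pos] ^= flip
        sequences.append(candidate)
    return sequences
```

    Calling `chase_test_sequences(soft, p)` and then `chase_test_sequences(soft, p + 1)` on the same input returns twice as many candidates the second time, which is the growth the abstract sets out to limit.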

    Copyright Policies of Scientific Publications in Institutional Repositories: The Case of INESC TEC

    The progressive transformation of scientific practices, driven by the development of new Information and Communication Technologies (ICT), has made it possible to increase access to information, gradually moving towards an opening of the research cycle. In the long term, this will help resolve a difficulty that researchers have faced: barriers, whether geographical or financial, that limit access. Although scientific production is dominated mostly by large commercial publishers and is subject to the rules they impose, the Open Access movement, whose first public declaration, the Budapest Declaration (BOAI), dates from 2002, proposes significant changes that benefit authors and readers. The movement has been gaining importance in Portugal since 2003, when the first institutional repository at the national level was established. Institutional repositories emerged as a tool for disseminating an institution's scientific production, opening up research results both before publication and peer review (preprint) and after (postprint), and consequently increasing the visibility of the work of a researcher and of the respective institution. The present study, which analyzed the copyright policies of the most relevant scientific publications of INESC TEC, showed not only that publishers are increasingly adopting policies that allow self-archiving of publications in institutional repositories, but also that considerable awareness-raising work remains to be done, among researchers as well as the institution and society as a whole. A set of recommendations, including the implementation of an institutional policy that encourages self-archiving in the repository of publications produced within the institution, serves as a starting point for greater appreciation of the scientific production of INESC TEC.

    Ramon Llull's Ars Magna
