427 research outputs found

    ๋‚ธ๋“œํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ ์˜ค๋ฅ˜์ •์ •์„ ์œ„ํ•œ ๊ณ ์„ฑ๋Šฅ LDPC ๋ณตํ˜ธ๋ฐฉ๋ฒ• ์—ฐ๊ตฌ

    Get PDF
    Thesis (Ph.D.) -- Graduate School of Seoul National University: Department of Electrical and Computer Engineering, August 2013. Advisor: Wonyong Sung.

    High-performance error correction is essential for NAND flash memory, whose raw bit error rate rises as semiconductor processes shrink. Soft-decision error correction codes such as low-density parity-check (LDPC) codes show excellent error-correcting performance, but their high implementation complexity makes them difficult to apply to flash memory systems. This dissertation proposes high-performance message-passing scheduling methods and a low-complexity decoding algorithm for efficient LDPC decoding. In particular, an efficient decoder architecture for finite geometry (FG) LDPC codes is proposed, and the implemented decoder is used to study the energy consumption of soft-decision decoding for NAND flash memory. The first part of the dissertation studies how to improve informed dynamic scheduling (IDS) algorithms. We first analyze the behavior of residual belief propagation (RBP), the fastest-converging IDS algorithm known, and based on this analysis propose an improved RBP (iRBP) algorithm that raises the convergence speed of RBP by preventing message updates from concentrating on particular nodes. We also propose a syndrome-based mixed scheduling method that combines the excellent convergence speed of iRBP with the strong error-correcting capability of the existing node-wise scheduling (NS) algorithm. Finally, simulations on LDPC codes of various code rates confirm that the proposed syndrome-based mixed scheduling outperforms all other scheduling algorithms tested in this dissertation. The second part of the dissertation proposes an improvement to the a posteriori probability (APP) algorithm, which produces many bit errors when decoding fails. We also present a hardware architecture that applies the proposed algorithm to FG-LDPC codes, whose fast convergence and good error-floor performance suit data storage devices. The proposed architecture adopts a hybrid structure of shift registers and SRAM to match the high node weights of FG-LDPC codes, and uses a pipelined parallel architecture for high throughput. Three memory-capacity reduction techniques are applied to cut memory usage, and two low-power techniques are proposed to reduce power consumption. The architecture was implemented in a 0.13-um CMOS process for a rate-0.96 (68254, 65536) Euclidean geometry LDPC code. Finally, the dissertation proposes methods for lowering the energy consumption of NAND flash memory systems that employ soft-decision decoding. Soft-decision error correction algorithms perform well, but they increase the number of flash memory sensing operations and the energy consumption. This study analyzes the energy consumption of a NAND flash memory system equipped with the implemented LDPC decoder and compares the chip size and energy consumption of the LDPC decoder with those of BCH decoders. In addition, a sensing-precision selection method that uses the LDPC decoder is proposed. The decoding and scheduling algorithms, VLSI architecture, and read-precision selection method proposed in this research maximize the error-correction performance of NAND flash memory systems while minimizing their energy consumption.

    High-performance error correction for NAND flash memory is greatly needed because the raw bit error rate increases as the semiconductor geometry shrinks for high density. Soft-decision error correction, such as low-density parity-check (LDPC) codes, offers high performance, but its implementation complexity hinders wide adoption in consumer products. This dissertation proposes two high-performance message-passing schedules and a low-complexity decoding algorithm for LDPC codes. In particular, an efficient decoder architecture for finite geometry (FG) LDPC codes is proposed, and the energy consumption of soft-decision decoding for NAND flash memory is analyzed. The first part of this dissertation is devoted to improving the informed dynamic scheduling (IDS) algorithms. We analyze the behavior of residual belief propagation (RBP), which is the fastest IDS algorithm, and develop an improved RBP (iRBP) by avoiding the concentration of message updates at a particular node. We also study the syndrome-based mixed scheduling of the iRBP and node-wise scheduling (NS). The proposed mixed scheduling outperforms all other scheduling methods tested in this work. The next part of this dissertation develops a conditional variable node update scheme for the a posteriori probability (APP) algorithm. The developed algorithm is robust to decoding failures and can reduce the dynamic power consumption by lowering switching activity in the LDPC decoder. To implement the developed algorithm, we propose a memory-efficient pipelined parallel architecture for LDPC decoding. The architecture employs FG-LDPC codes, which not only show fast convergence speed and good error-floor performance but also perform well with iterative decoding algorithms, making them especially suitable for data storage devices. We also developed a rate-0.96 (68254, 65536) Euclidean geometry LDPC code and implemented the proposed architecture in 0.13-um CMOS technology. This dissertation also covers low-energy error correction of NAND flash memory through soft-decision decoding. Soft-decision-based error correction algorithms show high performance, but they demand an increased number of flash memory sensing operations and consume more energy for memory access. We examine the energy consumption of a NAND flash memory system equipped with an LDPC code-based soft-decision error correction circuit, and minimize the sum of the energy consumed by the NAND flash memory and the LDPC decoder. In addition, the chip size and energy consumption of the decoder are compared with those of two Bose-Chaudhuri-Hocquenghem (BCH) decoding circuits showing comparable error performance and throughput. We also propose an LDPC decoder-assisted precision selection method that needs virtually no overhead. This dissertation is intended to develop high-performance and low-power error correction circuits for NAND flash memory by studying improved decoding and scheduling algorithms, VLSI architecture, and a read precision selection method.

    Contents:
    1 Introduction 1
      1.1 NAND Flash Memory 1
      1.2 LDPC Codes 4
      1.3 Outline of the Dissertation 6
    2 LDPC Decoding and Scheduling Algorithms 8
      2.1 Introduction 8
      2.2 Decoding Algorithms for LDPC Codes 10
        2.2.1 Belief Propagation Algorithm 10
        2.2.2 Simplified Belief Propagation Algorithms 12
      2.3 Message-Passing Schedules for Decoding of LDPC Codes 15
        2.3.1 Static Schedules 15
        2.3.2 Dynamic Schedules 17
    3 Improved Dynamic Scheduling Algorithms for Decoding of LDPC Codes 22
      3.1 Introduction 22
      3.2 Improved Residual Belief Propagation Algorithm 23
      3.3 Syndrome-Based Mixed Scheduling of iRBP and NS 26
      3.4 Complexity Analysis and Simulation Results 28
        3.4.1 Complexity Analysis 28
        3.4.2 Simulation Results 29
      3.5 Concluding Remarks 33
    4 A Pipelined Parallel Architecture for Decoding of Finite-Geometry LDPC Codes 36
      4.1 Introduction 36
      4.2 Finite-Geometry LDPC Codes and Conditional Variable Node Update Algorithm 38
        4.2.1 Finite-Geometry LDPC Codes 38
        4.2.2 Conditional Variable Node Update Algorithm for Fixed-Point Normalized APP-Based Algorithm 40
      4.3 Decoder Architecture 46
        4.3.1 Baseline Sequential Architecture 46
        4.3.2 Pipelined-Parallel Architecture 54
        4.3.3 Memory Capacity Reduction 57
      4.4 Implementation Results 60
      4.5 Concluding Remarks 64
    5 Low-Energy Error Correction of NAND Flash Memory through Soft-Decision Decoding 66
      5.1 Introduction 66
      5.2 Energy Consumption of Read Operations in NAND Flash Memory 67
        5.2.1 Voltage Sensing Scheme for Soft-Decision Data Output 67
        5.2.2 LSB and MSB Concurrent Access Scheme for Low-Energy Soft-Decision Data Output 72
        5.2.3 Energy Consumption of Read Operations in NAND Flash Memory 73
      5.3 The Performance of Soft-Decision Error Correction over a NAND Flash Memory Channel 76
      5.4 Hardware Performance of the (68254, 65536) LDPC Decoder 81
        5.4.1 Energy Consumption of the LDPC Decoder 81
        5.4.2 Performance Comparison of the LDPC Decoder and Two BCH Decoders 83
      5.5 Low-Energy Error Correction Scheme for NAND Flash Memory 87
        5.5.1 Optimum Precision for Low-Energy Decoding 87
        5.5.2 Iteration Count-Based Precision Selection 90
      5.6 Concluding Remarks 91
    6 Conclusion 94
    Bibliography 96
    Abstract in Korean 110
    Acknowledgments (in Korean) 112
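    The residual-based scheduling studied in the first part can be illustrated compactly: residual belief propagation (RBP) always propagates the check-to-variable message whose value would change the most, and the iRBP refinement keeps any single variable node from monopolizing the updates. The following Python sketch shows this greedy schedule on a toy code; the parity-check matrix, the min-sum update, and the fixed per-node update cap are illustrative assumptions, not the dissertation's actual algorithm or parameters.

```python
# A minimal sketch of residual-based dynamic scheduling for LDPC decoding.
# The per-node update cap is a crude stand-in for iRBP's anti-concentration
# rule; a real decoder would also track variable-to-check messages and a
# syndrome-based convergence test.
import numpy as np

H = np.array([[1, 1, 0, 1, 0, 0],        # toy parity-check matrix
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1]])
llr = np.array([-0.5, 1.2, 0.8, -2.0, 0.3, 1.5])   # channel LLRs

m, n = H.shape
c2v = np.zeros((m, n))                   # check-to-variable messages

def check_update(c, v):
    """Min-sum update for the message from check c to variable v."""
    others = [u for u in range(n) if H[c, u] and u != v]
    vals = [llr[u] + sum(c2v[cc, u] for cc in range(m)
                         if H[cc, u] and cc != c) for u in others]
    sign = np.prod(np.sign(vals))
    return sign * min(abs(x) for x in vals)

updates = np.zeros(n, dtype=int)         # iRBP-style per-node update counter
CAP = 3                                  # assumed concentration limit

for _ in range(20):                      # greedy RBP loop
    best, best_res = None, -1.0
    for c in range(m):
        for v in range(n):
            if H[c, v] and updates[v] < CAP:
                res = abs(check_update(c, v) - c2v[c, v])
                if res > best_res:
                    best, best_res = (c, v), res
    if best is None or best_res < 1e-9:
        break                            # nothing useful left to propagate
    c, v = best
    c2v[c, v] = check_update(c, v)       # propagate the max-residual message
    updates[v] += 1

posterior = llr + c2v.sum(axis=0)
print("hard decision:", (posterior < 0).astype(int))
```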

    Algorithm Development and VLSI Implementation of Energy Efficient Decoders of Polar Codes

    Get PDF
    With their low error-floor performance, polar codes have attracted significant attention as a potential standard error correction code (ECC) for future communication and data storage. However, the VLSI implementation complexity of polar code decoders is largely driven by the serial nature of their decoding. This dissertation is dedicated to presenting optimal decoder architectures for polar codes. It addresses several structural properties of polar codes and key properties of decoding algorithms that were not dealt with in prior research. The underlying concept of the proposed architectures is a paradigm that simplifies and schedules the computations such that hardware is simplified, latency is minimized, and bandwidth is maximized. In pursuit of the above, throughput-centric successive cancellation (TCSC) and overlapping path list successive cancellation (OPLSC) VLSI architectures and express journey BP (XJBP) decoders for polar codes are presented. An arbitrary polar code can be decomposed into a set of shorter polar codes with special characteristics, referred to as constituent polar codes. By exploiting the homogeneousness between the decoding processes of different constituent polar codes, TCSC reduces the decoding latency of the SC decoder by 60% for codes of length n = 1024. The error correction performance of SC decoding is inferior to that of list successive cancellation (LSC) decoding. The LSC decoding algorithm delivers the most reliable decoding results; however, it consumes the most hardware resources and decoding cycles. Instead of using multiple instances of decoding cores as in LSC decoders, a single SC decoder is used in the OPLSC architecture. The computations of each path in the LSC are arranged to occupy the decoder hardware stages serially in a streamlined fashion, which yields a significant reduction of hardware complexity. The OPLSC decoder achieves about a 1.4 times hardware efficiency improvement over traditional LSC decoders. Hardware-efficient VLSI architectures for the TCSC and OPLSC polar code decoders are also introduced. Decoders based on the SC or LSC algorithms suffer from high latency and limited throughput due to their serial decoding nature. An alternative approach to decoding polar codes is the belief propagation (BP) algorithm, in which a graph, usually referred to as a factor graph, is set up to guide how beliefs are propagated and refined. The BP algorithm allows decoding in parallel to achieve much higher throughput. The XJBP decoder facilitates belief propagation by utilizing the specific constituent codes that exist in the conventional factor graph, which results in an express journey (XJ) decoder. Compared with the conventional BP decoding algorithm for polar codes, the proposed decoder reduces the computational complexity by about 40.6%, which enables an energy-efficient hardware implementation. To further explore the hardware consumption of the proposed XJBP decoder, the computation scheduling is modeled and analyzed in this dissertation. With discussions of different hardware scenarios, optimal scheduling plans are developed. A novel memory-distributed micro-architecture of the XJBP decoder is proposed and analyzed to solve the potential memory access problems of the proposed scheduling strategy. Register-transfer level (RTL) models of the XJBP decoder are set up for comparison with other state-of-the-art BP decoders. The results show that the power efficiency of BP decoders is improved by about 3 times.
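    The constituent-code idea that underlies the TCSC decoder is easy to see in a software model of SC decoding: a sub-tree whose leaves are all frozen (rate-0) or all information bits (rate-1) can be decoded in one shot instead of bit by bit. The sketch below, a minimal Python model under an assumed frozen-bit pattern, shows both shortcuts inside an otherwise standard min-sum SC recursion; it is an illustration, not the decoder proposed here.

```python
# A minimal sketch of rate-0/rate-1 constituent-code shortcuts in SC
# decoding. The f/g LLR updates are the standard min-sum SC formulas;
# the frozen-bit pattern below is an illustrative assumption.
import numpy as np

def f(a, b):                       # check-node (upper-branch) LLR update
    return np.sign(a) * np.sign(b) * np.minimum(np.abs(a), np.abs(b))

def g(a, b, u):                    # variable-node (lower-branch) LLR update
    return b + (1 - 2 * u) * a

def sc_decode(llr, frozen):
    """Recursive SC decoding with rate-0/rate-1 constituent shortcuts."""
    if not frozen.any():           # rate-1 node: hard decision, no descent
        return (llr < 0).astype(int)
    if frozen.all():               # rate-0 node: all bits known to be zero
        return np.zeros(len(llr), dtype=int)
    half = len(llr) // 2
    a, b = llr[:half], llr[half:]
    u1 = sc_decode(f(a, b), frozen[:half])
    u2 = sc_decode(g(a, b, u1), frozen[half:])
    # combine the two half codewords for the parent level
    return np.concatenate([(u1 + u2) % 2, u2])

frozen = np.array([1, 1, 1, 0, 1, 0, 0, 0], dtype=bool)   # assumed (8, 4) code
channel_llr = np.array([2.1, -0.3, 1.7, 0.4, -1.9, 2.5, 0.8, 1.1])
print("decoded codeword:", sc_decode(channel_llr, frozen))
```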

    NEURAL NETWORKS FOR DECISION SUPPORT: PROBLEMS AND OPPORTUNITIES

    Get PDF
    Neural networks offer an approach to computing which - unlike conventional programming - does not necessitate a complete algorithmic specification. Furthermore, neural networks provide inductive means for gathering, storing, and using experiential knowledge. Incidentally, these have also been some of the fundamental motivations for the development of decision support systems in general. Thus, the interest in neural networks for decision support is immediate and obvious. In this paper, we analyze the potential contribution of neural networks to decision support, on the one hand, and point out some inherent constraints that might inhibit their use, on the other. For the sake of completeness and organization, the analysis is carried out in the context of a general-purpose DSS framework that examines all the key factors that come into play in the design of any decision support system. Information Systems Working Papers Series

    A Visual Analytics Approach to Debugging Cooperative, Autonomous Multi-Robot Systems' Worldviews

    Full text link
    Autonomous multi-robot systems, where a team of robots shares information to perform tasks that are beyond an individual robot's abilities, hold great promise for a number of applications, such as planetary exploration missions. Each robot in a multi-robot system that uses the shared-world coordination paradigm autonomously schedules which robot should perform a given task, and when, using its worldview--the robot's internal representation of its belief about both its own state and other robots' states. A key problem for operators is that robots' worldviews can fall out of sync (often due to weak communication links), leading to desynchronization of the robots' scheduling decisions and inconsistent emergent behavior (e.g., tasks not performed, or performed by multiple robots). Operators face the time-consuming and difficult task of making sense of the robots' scheduling decisions, detecting desynchronizations, and pinpointing the cause by comparing every robot's worldview. To address these challenges, we introduce MOSAIC Viewer, a visual analytics system that helps operators (i) make sense of the robots' schedules and (ii) detect and conduct a root cause analysis of the robots' desynchronized worldviews. Over a year-long partnership with roboticists at the NASA Jet Propulsion Laboratory, we conduct a formative study to identify the necessary system design requirements and a qualitative evaluation with 12 roboticists. We find that MOSAIC Viewer is faster and easier to use than the users' current approaches, and that it allows them to stitch low-level details together to formulate a high-level understanding of the robots' schedules and to detect and pinpoint the cause of the desynchronized worldviews. Comment: To appear in IEEE Conference on Visual Analytics Science and Technology (VAST) 202
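    At its core, the desynchronization detection that MOSAIC Viewer supports amounts to comparing the same belief across every robot's worldview and flagging disagreements. A minimal Python sketch of that check follows; the worldview structure (a per-robot map from task to believed owner) is an illustrative assumption, not the system's actual data model.

```python
# A minimal sketch of a worldview desynchronization check: compare every
# robot's belief about task ownership and flag tasks the team disagrees
# on. Robot names, tasks, and the worldview layout are all assumptions.
worldviews = {
    "robot_a": {"drill_site_1": "robot_a", "relay_msg": "robot_b"},
    "robot_b": {"drill_site_1": "robot_a", "relay_msg": "robot_c"},
    "robot_c": {"drill_site_1": "robot_a", "relay_msg": "robot_c"},
}

def find_desyncs(worldviews):
    """Return tasks whose assignment differs across robots' worldviews."""
    desyncs = {}
    tasks = {t for wv in worldviews.values() for t in wv}
    for task in tasks:
        beliefs = {robot: wv.get(task) for robot, wv in worldviews.items()}
        if len(set(beliefs.values())) > 1:      # robots disagree
            desyncs[task] = beliefs
    return desyncs

for task, beliefs in find_desyncs(worldviews).items():
    print(f"desync on {task!r}: {beliefs}")
```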

    System Development and VLSI Implementation of High Throughput and Hardware Efficient Polar Code Decoder

    Get PDF
    Polar codes are the first channel codes proven to achieve the Shannon capacity. Additionally, they perform very well in terms of the error floor. These merits make them a potential candidate for future standards in wireless communication and storage systems, and polar codes have received increasing research interest in recent years. However, hardware decoder implementations still do not meet the expectations of practical applications, in terms of either throughput or hardware efficiency. This dissertation presents several system development approaches and hardware structures for three widely known decoding algorithms: successive cancellation (SC), list successive cancellation (LSC), and belief propagation (BP). All the efforts aim to maximize the throughput while minimizing the hardware cost. A throughput-centric successive cancellation (TCSC) decoder is proposed for SC decoding. By introducing the concept of constituent codes, the decoding latency is significantly reduced with negligible decoding performance loss. However, specially designed computation units dramatically increase the hardware cost, and handling both the conventional polar code sets and the constituent code sets complicates the hardware implementation. By exploiting the natural properties of the conventional SC decoder, datapaths for decoding constituent codes are built compatibly via a computation-unit sharing technique. This approach incurs no additional hardware cost except some multiplexer logic, but significantly increases the decoding throughput. Other techniques, such as pre-computation and gate-level optimization, are used as well to further increase the decoding throughput. A specially designed partial sum generator (PSG) is also investigated in this dissertation; it is hardware efficient and timing compatible with the proposed TCSC decoder. Additionally, a polar code construction scheme with constituent code optimization is presented, aiming to reduce the latency of constituent-code-based SC decoding. Results show that, compared with the state-of-the-art decoder, TCSC achieves at least a 60% latency reduction for codes of length n = 1024. Using the Nangate FreePDK 45nm process, the TCSC decoder reaches throughputs of up to 5.81 Gbps and 2.01 Gbps for the (1024, 870) and (1024, 512) polar codes, respectively. Moreover, with the proposed construction scheme, the TCSC decoder generally achieves at least a further 20% latency reduction with negligible performance loss. Overlapped list successive cancellation (OLSC) is proposed as a design approach for LSC decoding. LSC decoding performs better than SC decoding at the cost of hardware consumption. With this approach, the l (l > 1) instances of the SC decoder required for LSC with list size l can be cut down to only one, dramatically reducing the hardware complexity without any decoding performance loss. Meanwhile, approaches to reduce the latency associated with the pipeline scheme are also investigated. Simulation results show that the proposed design approach increases hardware efficiency significantly over recently proposed LSC decoders. Express journey belief propagation (XJBP) is proposed for BP decoding. The idea originates from extending the constituent code concept from SC to BP decoding. An express journey refers to the datapath of specific constituent codes in the factor graph, which accelerates the propagation of belief information. The XJBP decoder achieves a 40.6% computational complexity reduction compared with conventional BP decoding, enabling an energy-efficient hardware implementation. In summary, the efforts to optimize the polar code decoder are presented in this dissertation, supported by careful analysis, precise description, extensive numerical simulations, thoughtful discussion, and RTL implementation on VLSI design platforms.
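    A rough software model makes the source of the reported latency reduction tangible: conventional SC decoding visits every node of the decoding tree, while constituent rate-0/rate-1 sub-trees terminate in a single step. The Python sketch below counts cycles under a one-node-per-cycle toy model with a crude popcount-based frozen-set construction; both are illustrative assumptions, so the printed saving will not match the dissertation's 60% figure exactly.

```python
# Toy latency model for SC decoding with and without constituent-code
# shortcuts. One cycle per visited tree node is assumed (roughly 2n cycles
# for plain SC in this model); real decoders, including the TCSC decoder,
# use more refined timing. The popcount-based frozen-set construction is
# a crude stand-in for a real polar code design.
def sc_latency(frozen, shortcuts=True):
    """Cycles to decode a sub-tree under the one-node-per-cycle model."""
    if len(frozen) == 1:
        return 1
    if shortcuts and (all(frozen) or not any(frozen)):
        return 1                              # rate-0 / rate-1 sub-tree
    half = len(frozen) // 2
    # visit this node, then decode the two halves sequentially
    return (1 + sc_latency(frozen[:half], shortcuts)
              + sc_latency(frozen[half:], shortcuts))

n, k = 1024, 512
order = sorted(range(n), key=lambda i: bin(i).count("1"))  # reliability proxy
frozen_set = set(order[:n - k])               # freeze the least reliable bits
frozen = [i in frozen_set for i in range(n)]

base = sc_latency(frozen, shortcuts=False)
fast = sc_latency(frozen, shortcuts=True)
print(f"plain SC: {base} cycles; with constituent codes: {fast} cycles "
      f"({100 * (base - fast) / base:.0f}% saved)")
```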

    Dataset Distillation with Convexified Implicit Gradients

    Full text link
    We propose a new dataset distillation algorithm using reparameterization and convexification of implicit gradients (RCIG) that substantially improves the state-of-the-art. To this end, we first formulate dataset distillation as a bi-level optimization problem. Then, we show how implicit gradients can be effectively used to compute meta-gradient updates. We further equip the algorithm with a convexified approximation that corresponds to learning on top of a frozen finite-width neural tangent kernel. Finally, we mitigate bias in the implicit gradients by parameterizing the neural network so that the final-layer parameters can be computed analytically given the body parameters. RCIG establishes the new state-of-the-art on a diverse series of dataset distillation tasks. Notably, with one image per class, on resized ImageNet, RCIG sees on average a 108% improvement over the previous state-of-the-art distillation algorithm. Similarly, we observe a 66% gain over SOTA on Tiny-ImageNet and 37% on CIFAR-100.
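    The bi-level structure described above can be sketched in a few lines. The toy below (PyTorch) distills a synthetic binary classification set by backpropagating through a short unrolled inner loop; RCIG instead computes this meta-gradient with convexified implicit gradients, so the sketch shows only the surrounding bi-level loop. The task, model, and step sizes are illustrative assumptions.

```python
# A minimal sketch of bi-level dataset distillation. The inner loop trains
# a tiny linear model on the distilled points; the outer loop updates the
# distilled points so the trained model does well on the real data. RCIG
# would replace the unrolled backprop below with implicit gradients.
import torch

torch.manual_seed(0)
X_real = torch.randn(256, 2)                        # toy "real" dataset
y_real = (X_real[:, 0] + X_real[:, 1] > 0).float()

X_syn = torch.randn(4, 2, requires_grad=True)       # distilled inputs (meta-params)
y_syn = torch.tensor([0., 0., 1., 1.])              # fixed distilled labels

meta_opt = torch.optim.Adam([X_syn], lr=0.05)

def loss(w, b, X, y):
    return torch.nn.functional.binary_cross_entropy_with_logits(X @ w + b, y)

for step in range(200):
    w = torch.zeros(2, requires_grad=True)           # re-init inner model
    b = torch.zeros(1, requires_grad=True)
    for _ in range(10):                              # inner loop on distilled data
        g_w, g_b = torch.autograd.grad(loss(w, b, X_syn, y_syn), (w, b),
                                       create_graph=True)
        w, b = w - 0.5 * g_w, b - 0.5 * g_b
    meta_loss = loss(w, b, X_real, y_real)           # outer loss on real data
    meta_opt.zero_grad()
    meta_loss.backward()                             # meta-gradient w.r.t. X_syn
    meta_opt.step()

print("final meta-loss on real data:", float(meta_loss))
```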

    New Identification and Decoding Techniques for Low-Density Parity-Check Codes

    Get PDF
    Error-correction coding schemes are indispensable for today's high-capacity, high-data-rate communication systems. Among the various channel coding schemes, low-density parity-check (LDPC) codes, introduced by the pioneer Robert G. Gallager, are prominent due to their capacity-approaching and superior error-correcting properties. There is no hard constraint on the code rate of LDPC codes. Consequently, it is ideal to incorporate LDPC codes with various code rates and codeword lengths in adaptive modulation and coding (AMC) systems, which change the encoder and the modulator adaptively to improve the system throughput. In conventional AMC systems, a dedicated control channel is assigned to coordinate the encoder/decoder changes. A question then arises: does the AMC system still work when such a control channel is absent? This work gives a positive answer to this question by investigating various scenarios consisting of different modulation schemes, such as quadrature-amplitude modulation (QAM) and frequency-shift keying (FSK), and different channels, such as additive white Gaussian noise (AWGN) channels and fading channels. On the other hand, LDPC decoding is usually carried out by iterative belief-propagation (BP) algorithms. As LDPC codes become prevalent in advanced communication and storage systems, low-complexity LDPC decoding algorithms are favored in practical applications. In the conventional BP decoding algorithm, the stopping criterion is to check whether all the parities are satisfied. This single rule may not be able to identify undecodable blocks, and as a result decoding time and power are wasted on executing unnecessary iterations. In this work, we propose a new stopping criterion to identify undecodable blocks in the early stage of the iterative decoding process. Furthermore, in the conventional BP decoding algorithm the variable (check) nodes are updated in parallel. It is known that the number of iterations can be reduced by serial scheduling. Informed dynamic scheduling (IDS) algorithms have been proposed in the literature to further reduce the number of iterations; however, the computational complexity involved in finding the node to update in the existing IDS algorithms cannot be neglected. In this work, we propose a new efficient IDS scheme that provides a better performance-complexity trade-off than the existing IDS schemes. In addition, the iterative decoding threshold (IDT), which is used to decide which of two LDPC codes is better, is investigated in this work. A family of LDPC codes, called LDPC convolutional codes, has drawn a lot of attention from researchers in recent years due to the threshold saturation phenomenon. Computing the IDT for an LDPC convolutional code may be demanding when the termination length goes to thousands or even approaches infinity, especially for AWGN channels. In this work, we propose a fast IDT estimation algorithm that greatly reduces the complexity of the IDT calculation for LDPC convolutional codes with arbitrarily large termination length (including infinity), so that their IDTs can be quickly obtained.
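    The early-identification idea can be illustrated with a small Python sketch: track the syndrome weight (the number of unsatisfied parity checks) across iterations and abort when it stops improving, flagging the block as likely undecodable. The window size, the stall rule, and the toy bit-flipping decoder below are illustrative assumptions, not the stopping criterion proposed in this work.

```python
# A minimal sketch of syndrome-based early stopping for iterative LDPC
# decoding. If the syndrome weight has not improved over a sliding window,
# the block is declared likely undecodable and decoding is aborted.
import numpy as np

def decode_with_early_stop(H, hard_bits, decoder_step,
                           max_iter=50, window=5):
    """Run an iterative decoder, stopping early on a stalled syndrome."""
    history = []
    bits = hard_bits.copy()
    for it in range(max_iter):
        syndrome = H @ bits % 2
        weight = int(syndrome.sum())
        if weight == 0:
            return bits, it, "decoded"
        history.append(weight)
        if len(history) > window and min(history[-window:]) >= history[-window - 1]:
            return bits, it, "undecodable (stalled)"   # give up early
        bits = decoder_step(H, bits, syndrome)
    return bits, max_iter, "max iterations reached"

def bit_flip_step(H, bits, syndrome):
    """Toy Gallager bit-flipping step: flip the worst variable node."""
    unsat = syndrome @ H                    # unsatisfied checks per variable
    worst = int(np.argmax(unsat))
    bits = bits.copy()
    bits[worst] ^= 1
    return bits

H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1]])
received = np.array([0, 1, 1, 1, 1, 0])    # toy hard-decision input
print(decode_with_early_stop(H, received, bit_flip_step))
```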

    Plan Projection, Execution, and Learning for Mobile Robot Control

    Get PDF
    Most state-of-the-art hybrid control systems for mobile robots are decomposed into different layers. While the deliberation layer reasons about the actions required for the robot to achieve a given goal, the behavioral layer is designed to enable the robot to quickly react to unforeseen events. This decomposition guarantees safe operation even in the presence of unforeseen and dynamic obstacles and enables the robot to cope with situations it was not explicitly programmed for. The layered design, however, also leaves us with the problem of plan execution: the problem of arbitrating between the deliberation and behavioral layers. Abstract symbolic actions have to be translated into streams of local control commands, and simultaneously, execution failures have to be handled on an appropriate level of abstraction. It is now widely accepted that plan execution should form a third layer of a hybrid robot control system. The resulting layered architectures are called three-tiered architectures, or 3T architectures for short. Although many high-level programming frameworks have been proposed to support the implementation of the intermediate layer, there is no generally accepted algorithmic basis for plan execution in three-tiered architectures. In this thesis, we propose to base plan execution on plan projection and learning, and we present a general framework for the self-supervised improvement of plan execution. This framework has been implemented in APPEAL, an Architecture for Plan Projection, Execution And Learning, which extends the well-known RHINO control system by introducing an execution layer. This thesis contributes to the field of plan-based mobile robot control, which investigates the interrelation between planning, reasoning, and learning techniques based on an explicit representation of the robot's intended course of action, a plan. In McDermott's terminology, a plan is that part of a robot control program which the robot can not only execute, but also reason about and manipulate. According to that broad view, a plan may serve many purposes in a robot control system, such as reasoning about future behavior, revising intended activities, or learning. In this thesis, plan-based control is applied to the self-supervised improvement of mobile robot plan execution.
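    The arbitration role of the execution layer can be sketched as a small control loop: abstract actions from the deliberation layer are translated into streams of low-level commands, and failures reported by the behavioral layer are handled at the execution level or pushed back up as a replanning request. The Python sketch below is a schematic illustration of that 3T flow; all names and the recovery policy are assumptions, not the APPEAL system's interfaces.

```python
# A schematic sketch of a three-tiered (3T) control flow: deliberation
# plans abstract actions, the execution layer translates them and handles
# failures, and the behavioral layer runs low-level commands. All names
# and the one-retry recovery policy are illustrative assumptions.
def deliberation_layer(goal):
    """Plan abstract symbolic actions for a goal."""
    return [("goto", "room_b"), ("pickup", "cup")]

def behavioral_layer(command, obstacle_ahead):
    """Execute one low-level command, reacting to unforeseen events."""
    if obstacle_ahead:
        return "blocked"
    print("executing:", command)
    return "ok"

def translate(action):
    """Map an abstract action to a stream of local control commands."""
    verb, arg = action
    return [f"{verb}:{arg}:step{i}" for i in range(2)]

def execution_layer(plan, world):
    """Arbitrate between the layers: translate actions, handle failures."""
    for action in plan:
        for command in translate(action):
            status = behavioral_layer(command, world.get("obstacle", False))
            if status == "blocked":
                world["obstacle"] = False           # assume recovery succeeds
                status = behavioral_layer(command, False)  # retry once
            if status != "ok":
                return "replan needed"               # escalate to deliberation
    return "goal achieved"

world = {"obstacle": True}
print(execution_layer(deliberation_layer("fetch cup"), world))
```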
    • โ€ฆ
    corecore