427 research outputs found

    ๋‚ธ๋“œํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ ์˜ค๋ฅ˜์ •์ •์„ ์œ„ํ•œ ๊ณ ์„ฑ๋Šฅ LDPC ๋ณตํ˜ธ๋ฐฉ๋ฒ• ์—ฐ๊ตฌ

    Get PDF
    Thesis (Ph.D.) -- Graduate School of Seoul National University: Department of Electrical and Computer Engineering, August 2013. Advisor: Wonyong Sung.

    High-performance error correction is essential for NAND flash memory, whose raw bit error rate rises as semiconductor processes shrink. Soft-decision error correction codes such as low-density parity-check (LDPC) codes show excellent error-correcting performance, but their high implementation complexity makes them difficult to apply to flash memory systems. This dissertation proposes high-performance message-passing scheduling methods and a low-complexity decoding algorithm for efficient LDPC decoding. In particular, an efficient decoder architecture for finite geometry (FG) LDPC codes is proposed, and the implemented decoder is used to study the energy consumption of soft-decision decoding for NAND flash memory. The first part of the dissertation studies how to improve informed dynamic scheduling (IDS) algorithms. We first analyze the behavior of residual belief propagation (RBP), the fastest-converging IDS algorithm known, and based on this analysis propose an improved RBP (iRBP) algorithm that raises the convergence speed of RBP by preventing message updates from concentrating on particular nodes. We also propose a syndrome-based mixed scheduling method that combines the excellent convergence speed of iRBP with the strong error-correcting capability of the existing node-wise scheduling (NS) algorithm. Finally, simulations on LDPC codes of various code rates confirm that the proposed syndrome-based mixed scheduling outperforms all other scheduling algorithms tested in this dissertation. The second part of the dissertation proposes an improvement to the a posteriori probability (APP) algorithm, which produces many bit errors when decoding fails. We also present a hardware architecture that applies the proposed algorithm to FG-LDPC codes, whose fast convergence and good error-floor performance suit data storage devices. The proposed architecture adopts a hybrid structure of shift registers and SRAM to match the high node weights of FG-LDPC codes, and uses a pipelined parallel architecture for high throughput. Three memory-capacity reduction techniques are applied to cut memory usage, and two low-power techniques are proposed to reduce power consumption. The architecture was implemented in a 0.13-um CMOS process for a rate-0.96 (68254, 65536) Euclidean geometry LDPC code. Finally, the dissertation proposes methods for lowering the energy consumption of NAND flash memory systems that employ soft-decision decoding. Soft-decision error correction algorithms perform well, but they increase the number of flash memory sensing operations and the energy consumption. This study analyzes the energy consumption of a NAND flash memory system equipped with the implemented LDPC decoder and compares the chip size and energy consumption of the LDPC decoder with those of BCH decoders. In addition, a sensing-precision selection method that uses the LDPC decoder is proposed. The decoding and scheduling algorithms, VLSI architecture, and read-precision selection method proposed in this research maximize the error-correction performance of NAND flash memory systems while minimizing their energy consumption.

    High-performance error correction for NAND flash memory is greatly needed because the raw bit error rate increases as the semiconductor geometry shrinks for high density. Soft-decision error correction, such as low-density parity-check (LDPC) codes, offers high performance, but its implementation complexity hinders wide adoption in consumer products. This dissertation proposes two high-performance message-passing schedules and a low-complexity decoding algorithm for LDPC codes. In particular, an efficient decoder architecture for finite geometry (FG) LDPC codes is proposed, and the energy consumption of soft-decision decoding for NAND flash memory is analyzed. The first part of this dissertation is devoted to improving the informed dynamic scheduling (IDS) algorithms. We analyze the behavior of residual belief propagation (RBP), which is the fastest IDS algorithm, and develop an improved RBP (iRBP) by avoiding the concentration of message updates at a particular node. We also study the syndrome-based mixed scheduling of the iRBP and node-wise scheduling (NS). The proposed mixed scheduling outperforms all other scheduling methods tested in this work. The next part of this dissertation develops a conditional variable node update scheme for the a posteriori probability (APP) algorithm. The developed algorithm is robust to decoding failures and can reduce the dynamic power consumption by lowering switching activity in the LDPC decoder. To implement the developed algorithm, we propose a memory-efficient pipelined parallel architecture for LDPC decoding. The architecture employs FG-LDPC codes, which not only show fast convergence speed and good error-floor performance but also perform well with iterative decoding algorithms, making them especially suitable for data storage devices. We also developed a rate-0.96 (68254, 65536) Euclidean geometry LDPC code and implemented the proposed architecture in 0.13-um CMOS technology. This dissertation also covers low-energy error correction of NAND flash memory through soft-decision decoding. Soft-decision-based error correction algorithms show high performance, but they demand an increased number of flash memory sensing operations and consume more energy for memory access. We examine the energy consumption of a NAND flash memory system equipped with an LDPC code-based soft-decision error correction circuit, and minimize the sum of the energy consumed by the NAND flash memory and the LDPC decoder. In addition, the chip size and energy consumption of the decoder are compared with those of two Bose-Chaudhuri-Hocquenghem (BCH) decoding circuits showing comparable error performance and throughput. We also propose an LDPC decoder-assisted precision selection method that needs virtually no overhead. This dissertation is intended to develop high-performance and low-power error correction circuits for NAND flash memory by studying improved decoding and scheduling algorithms, VLSI architecture, and a read precision selection method.

    Contents:
    1 Introduction 1
      1.1 NAND Flash Memory 1
      1.2 LDPC Codes 4
      1.3 Outline of the Dissertation 6
    2 LDPC Decoding and Scheduling Algorithms 8
      2.1 Introduction 8
      2.2 Decoding Algorithms for LDPC Codes 10
        2.2.1 Belief Propagation Algorithm 10
        2.2.2 Simplified Belief Propagation Algorithms 12
      2.3 Message-Passing Schedules for Decoding of LDPC Codes 15
        2.3.1 Static Schedules 15
        2.3.2 Dynamic Schedules 17
    3 Improved Dynamic Scheduling Algorithms for Decoding of LDPC Codes 22
      3.1 Introduction 22
      3.2 Improved Residual Belief Propagation Algorithm 23
      3.3 Syndrome-Based Mixed Scheduling of iRBP and NS 26
      3.4 Complexity Analysis and Simulation Results 28
        3.4.1 Complexity Analysis 28
        3.4.2 Simulation Results 29
      3.5 Concluding Remarks 33
    4 A Pipelined Parallel Architecture for Decoding of Finite-Geometry LDPC Codes 36
      4.1 Introduction 36
      4.2 Finite-Geometry LDPC Codes and Conditional Variable Node Update Algorithm 38
        4.2.1 Finite-Geometry LDPC Codes 38
        4.2.2 Conditional Variable Node Update Algorithm for Fixed-Point Normalized APP-Based Algorithm 40
      4.3 Decoder Architecture 46
        4.3.1 Baseline Sequential Architecture 46
        4.3.2 Pipelined-Parallel Architecture 54
        4.3.3 Memory Capacity Reduction 57
      4.4 Implementation Results 60
      4.5 Concluding Remarks 64
    5 Low-Energy Error Correction of NAND Flash Memory through Soft-Decision Decoding 66
      5.1 Introduction 66
      5.2 Energy Consumption of Read Operations in NAND Flash Memory 67
        5.2.1 Voltage Sensing Scheme for Soft-Decision Data Output 67
        5.2.2 LSB and MSB Concurrent Access Scheme for Low-Energy Soft-Decision Data Output 72
        5.2.3 Energy Consumption of Read Operations in NAND Flash Memory 73
      5.3 The Performance of Soft-Decision Error Correction over a NAND Flash Memory Channel 76
      5.4 Hardware Performance of the (68254, 65536) LDPC Decoder 81
        5.4.1 Energy Consumption of the LDPC Decoder 81
        5.4.2 Performance Comparison of the LDPC Decoder and Two BCH Decoders 83
      5.5 Low-Energy Error Correction Scheme for NAND Flash Memory 87
        5.5.1 Optimum Precision for Low-Energy Decoding 87
        5.5.2 Iteration Count-Based Precision Selection 90
      5.6 Concluding Remarks 91
    6 Conclusion 94
    Bibliography 96
    Abstract in Korean 110
    Acknowledgments (in Korean) 112
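    The residual-based scheduling studied in the first part can be illustrated compactly: residual belief propagation (RBP) always propagates the check-to-variable message whose value would change the most, and the iRBP refinement keeps any single variable node from monopolizing the updates. The following Python sketch shows this greedy schedule on a toy code; the parity-check matrix, the min-sum update, and the fixed per-node update cap are illustrative assumptions, not the dissertation's actual algorithm or parameters.

```python
# A minimal sketch of residual-based dynamic scheduling for LDPC decoding.
# The per-node update cap is a crude stand-in for iRBP's anti-concentration
# rule; a real decoder would also track variable-to-check messages and a
# syndrome-based convergence test.
import numpy as np

H = np.array([[1, 1, 0, 1, 0, 0],        # toy parity-check matrix
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1]])
llr = np.array([-0.5, 1.2, 0.8, -2.0, 0.3, 1.5])   # channel LLRs

m, n = H.shape
c2v = np.zeros((m, n))                   # check-to-variable messages

def check_update(c, v):
    """Min-sum update for the message from check c to variable v."""
    others = [u for u in range(n) if H[c, u] and u != v]
    vals = [llr[u] + sum(c2v[cc, u] for cc in range(m)
                         if H[cc, u] and cc != c) for u in others]
    sign = np.prod(np.sign(vals))
    return sign * min(abs(x) for x in vals)

updates = np.zeros(n, dtype=int)         # iRBP-style per-node update counter
CAP = 3                                  # assumed concentration limit

for _ in range(20):                      # greedy RBP loop
    best, best_res = None, -1.0
    for c in range(m):
        for v in range(n):
            if H[c, v] and updates[v] < CAP:
                res = abs(check_update(c, v) - c2v[c, v])
                if res > best_res:
                    best, best_res = (c, v), res
    if best is None or best_res < 1e-9:
        break                            # nothing useful left to propagate
    c, v = best
    c2v[c, v] = check_update(c, v)       # propagate the max-residual message
    updates[v] += 1

posterior = llr + c2v.sum(axis=0)
print("hard decision:", (posterior < 0).astype(int))
```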

    Algorithm Development and VLSI Implementation of Energy Efficient Decoders of Polar Codes

    Get PDF
    With their low error-floor performance, polar codes have attracted significant attention as a potential standard error correction code (ECC) for future communication and data storage. However, the VLSI implementation complexity of polar code decoders is largely driven by the serial nature of their decoding. This dissertation is dedicated to presenting optimal decoder architectures for polar codes. It addresses several structural properties of polar codes and key properties of decoding algorithms that were not dealt with in prior research. The underlying concept of the proposed architectures is a paradigm that simplifies and schedules the computations such that hardware is simplified, latency is minimized, and bandwidth is maximized. In pursuit of the above, throughput-centric successive cancellation (TCSC) and overlapping path list successive cancellation (OPLSC) VLSI architectures and express journey BP (XJBP) decoders for polar codes are presented. An arbitrary polar code can be decomposed into a set of shorter polar codes with special characteristics, referred to as constituent polar codes. By exploiting the homogeneousness between the decoding processes of different constituent polar codes, TCSC reduces the decoding latency of the SC decoder by 60% for codes of length n = 1024. The error correction performance of SC decoding is inferior to that of list successive cancellation (LSC) decoding. The LSC decoding algorithm delivers the most reliable decoding results; however, it consumes the most hardware resources and decoding cycles. Instead of using multiple instances of decoding cores as in LSC decoders, a single SC decoder is used in the OPLSC architecture. The computations of each path in the LSC are arranged to occupy the decoder hardware stages serially in a streamlined fashion, which yields a significant reduction of hardware complexity. The OPLSC decoder achieves about a 1.4 times hardware efficiency improvement over traditional LSC decoders. Hardware-efficient VLSI architectures for the TCSC and OPLSC polar code decoders are also introduced. Decoders based on the SC or LSC algorithms suffer from high latency and limited throughput due to their serial decoding nature. An alternative approach to decoding polar codes is the belief propagation (BP) algorithm, in which a graph, usually referred to as a factor graph, is set up to guide how beliefs are propagated and refined. The BP algorithm allows decoding in parallel to achieve much higher throughput. The XJBP decoder facilitates belief propagation by utilizing the specific constituent codes that exist in the conventional factor graph, which results in an express journey (XJ) decoder. Compared with the conventional BP decoding algorithm for polar codes, the proposed decoder reduces the computational complexity by about 40.6%, which enables an energy-efficient hardware implementation. To further explore the hardware consumption of the proposed XJBP decoder, the computation scheduling is modeled and analyzed in this dissertation. With discussions of different hardware scenarios, optimal scheduling plans are developed. A novel memory-distributed micro-architecture of the XJBP decoder is proposed and analyzed to solve the potential memory access problems of the proposed scheduling strategy. Register-transfer level (RTL) models of the XJBP decoder are set up for comparison with other state-of-the-art BP decoders. The results show that the power efficiency of BP decoders is improved by about 3 times.
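    The constituent-code idea that underlies the TCSC decoder is easy to see in a software model of SC decoding: a sub-tree whose leaves are all frozen (rate-0) or all information bits (rate-1) can be decoded in one shot instead of bit by bit. The sketch below, a minimal Python model under an assumed frozen-bit pattern, shows both shortcuts inside an otherwise standard min-sum SC recursion; it is an illustration, not the decoder proposed here.

```python
# A minimal sketch of rate-0/rate-1 constituent-code shortcuts in SC
# decoding. The f/g LLR updates are the standard min-sum SC formulas;
# the frozen-bit pattern below is an illustrative assumption.
import numpy as np

def f(a, b):                       # check-node (upper-branch) LLR update
    return np.sign(a) * np.sign(b) * np.minimum(np.abs(a), np.abs(b))

def g(a, b, u):                    # variable-node (lower-branch) LLR update
    return b + (1 - 2 * u) * a

def sc_decode(llr, frozen):
    """Recursive SC decoding with rate-0/rate-1 constituent shortcuts."""
    if not frozen.any():           # rate-1 node: hard decision, no descent
        return (llr < 0).astype(int)
    if frozen.all():               # rate-0 node: all bits known to be zero
        return np.zeros(len(llr), dtype=int)
    half = len(llr) // 2
    a, b = llr[:half], llr[half:]
    u1 = sc_decode(f(a, b), frozen[:half])
    u2 = sc_decode(g(a, b, u1), frozen[half:])
    # combine the two half codewords for the parent level
    return np.concatenate([(u1 + u2) % 2, u2])

frozen = np.array([1, 1, 1, 0, 1, 0, 0, 0], dtype=bool)   # assumed (8, 4) code
channel_llr = np.array([2.1, -0.3, 1.7, 0.4, -1.9, 2.5, 0.8, 1.1])
print("decoded codeword:", sc_decode(channel_llr, frozen))
```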

    NEURAL NETWORKS FOR DECISION SUPPORT: PROBLEMS AND OPPORTUNITIES

    Get PDF
    Neural networks offer an approach to computing which - unlike conventional programming - does not necessitate a complete algorithmic specification. Furthermore, neural networks provide inductive means for gathering, storing, and using experiential knowledge. Incidentally, these have also been some of the fundamental motivations for the development of decision support systems in general. Thus, the interest in neural networks for decision support is immediate and obvious. In this paper, we analyze the potential contribution of neural networks to decision support, on the one hand, and point out some inherent constraints that might inhibit their use, on the other. For the sake of completeness and organization, the analysis is carried out in the context of a general-purpose DSS framework that examines all the key factors that come into play in the design of any decision support system. Information Systems Working Papers Series

    A Visual Analytics Approach to Debugging Cooperative, Autonomous Multi-Robot Systems' Worldviews

    Full text link
    Autonomous multi-robot systems, where a team of robots shares information to perform tasks that are beyond an individual robot's abilities, hold great promise for a number of applications, such as planetary exploration missions. Each robot in a multi-robot system that uses the shared-world coordination paradigm autonomously schedules which robot should perform a given task, and when, using its worldview--the robot's internal representation of its belief about both its own state and other robots' states. A key problem for operators is that robots' worldviews can fall out of sync (often due to weak communication links), leading to desynchronization of the robots' scheduling decisions and inconsistent emergent behavior (e.g., tasks not performed, or performed by multiple robots). Operators face the time-consuming and difficult task of making sense of the robots' scheduling decisions, detecting desynchronizations, and pinpointing the cause by comparing every robot's worldview. To address these challenges, we introduce MOSAIC Viewer, a visual analytics system that helps operators (i) make sense of the robots' schedules and (ii) detect and conduct a root cause analysis of the robots' desynchronized worldviews. Over a year-long partnership with roboticists at the NASA Jet Propulsion Laboratory, we conduct a formative study to identify the necessary system design requirements and a qualitative evaluation with 12 roboticists. We find that MOSAIC Viewer is faster and easier to use than the users' current approaches, and that it allows them to stitch low-level details together to formulate a high-level understanding of the robots' schedules and to detect and pinpoint the cause of the desynchronized worldviews. Comment: To appear in IEEE Conference on Visual Analytics Science and Technology (VAST) 202
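    At its core, the desynchronization detection that MOSAIC Viewer supports amounts to comparing the same belief across every robot's worldview and flagging disagreements. A minimal Python sketch of that check follows; the worldview structure (a per-robot map from task to believed owner) is an illustrative assumption, not the system's actual data model.

```python
# A minimal sketch of a worldview desynchronization check: compare every
# robot's belief about task ownership and flag tasks the team disagrees
# on. Robot names, tasks, and the worldview layout are all assumptions.
worldviews = {
    "robot_a": {"drill_site_1": "robot_a", "relay_msg": "robot_b"},
    "robot_b": {"drill_site_1": "robot_a", "relay_msg": "robot_c"},
    "robot_c": {"drill_site_1": "robot_a", "relay_msg": "robot_c"},
}

def find_desyncs(worldviews):
    """Return tasks whose assignment differs across robots' worldviews."""
    desyncs = {}
    tasks = {t for wv in worldviews.values() for t in wv}
    for task in tasks:
        beliefs = {robot: wv.get(task) for robot, wv in worldviews.items()}
        if len(set(beliefs.values())) > 1:      # robots disagree
            desyncs[task] = beliefs
    return desyncs

for task, beliefs in find_desyncs(worldviews).items():
    print(f"desync on {task!r}: {beliefs}")
```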

    System Development and VLSI Implementation of High Throughput and Hardware Efficient Polar Code Decoder

    Get PDF
    Polar codes are the first channel codes proven to achieve the Shannon capacity. Additionally, they perform very well in terms of the error floor. These merits make them a potential candidate for future standards in wireless communication and storage systems, and polar codes have received increasing research interest in recent years. However, hardware decoder implementations still do not meet the expectations of practical applications, in terms of either throughput or hardware efficiency. This dissertation presents several system development approaches and hardware structures for three widely known decoding algorithms: successive cancellation (SC), list successive cancellation (LSC), and belief propagation (BP). All the efforts aim to maximize the throughput while minimizing the hardware cost. A throughput-centric successive cancellation (TCSC) decoder is proposed for SC decoding. By introducing the concept of constituent codes, the decoding latency is significantly reduced with negligible decoding performance loss. However, specially designed computation units dramatically increase the hardware cost, and handling both the conventional polar code sets and the constituent code sets complicates the hardware implementation. By exploiting the natural properties of the conventional SC decoder, datapaths for decoding constituent codes are built compatibly via a computation-unit sharing technique. This approach incurs no additional hardware cost except some multiplexer logic, but significantly increases the decoding throughput. Other techniques, such as pre-computation and gate-level optimization, are used as well to further increase the decoding throughput. A specially designed partial sum generator (PSG) is also investigated in this dissertation; it is hardware efficient and timing compatible with the proposed TCSC decoder. Additionally, a polar code construction scheme with constituent code optimization is presented, aiming to reduce the latency of constituent-code-based SC decoding. Results show that, compared with the state-of-the-art decoder, TCSC achieves at least a 60% latency reduction for codes of length n = 1024. Using the Nangate FreePDK 45nm process, the TCSC decoder reaches throughputs of up to 5.81 Gbps and 2.01 Gbps for the (1024, 870) and (1024, 512) polar codes, respectively. Moreover, with the proposed construction scheme, the TCSC decoder generally achieves at least a further 20% latency reduction with negligible performance loss. Overlapped list successive cancellation (OLSC) is proposed as a design approach for LSC decoding. LSC decoding performs better than SC decoding at the cost of hardware consumption. With this approach, the l (l > 1) instances of the SC decoder required for LSC with list size l can be cut down to only one, dramatically reducing the hardware complexity without any decoding performance loss. Meanwhile, approaches to reduce the latency associated with the pipeline scheme are also investigated. Simulation results show that the proposed design approach increases hardware efficiency significantly over recently proposed LSC decoders. Express journey belief propagation (XJBP) is proposed for BP decoding. The idea originates from extending the constituent code concept from SC to BP decoding. An express journey refers to the datapath of specific constituent codes in the factor graph, which accelerates the propagation of belief information. The XJBP decoder achieves a 40.6% computational complexity reduction compared with conventional BP decoding, enabling an energy-efficient hardware implementation. In summary, the efforts to optimize the polar code decoder are presented in this dissertation, supported by careful analysis, precise description, extensive numerical simulations, thoughtful discussion, and RTL implementation on VLSI design platforms.
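    A rough software model makes the source of the reported latency reduction tangible: conventional SC decoding visits every node of the decoding tree, while constituent rate-0/rate-1 sub-trees terminate in a single step. The Python sketch below counts cycles under a one-node-per-cycle toy model with a crude popcount-based frozen-set construction; both are illustrative assumptions, so the printed saving will not match the dissertation's 60% figure exactly.

```python
# Toy latency model for SC decoding with and without constituent-code
# shortcuts. One cycle per visited tree node is assumed (roughly 2n cycles
# for plain SC in this model); real decoders, including the TCSC decoder,
# use more refined timing. The popcount-based frozen-set construction is
# a crude stand-in for a real polar code design.
def sc_latency(frozen, shortcuts=True):
    """Cycles to decode a sub-tree under the one-node-per-cycle model."""
    if len(frozen) == 1:
        return 1
    if shortcuts and (all(frozen) or not any(frozen)):
        return 1                              # rate-0 / rate-1 sub-tree
    half = len(frozen) // 2
    # visit this node, then decode the two halves sequentially
    return (1 + sc_latency(frozen[:half], shortcuts)
              + sc_latency(frozen[half:], shortcuts))

n, k = 1024, 512
order = sorted(range(n), key=lambda i: bin(i).count("1"))  # reliability proxy
frozen_set = set(order[:n - k])               # freeze the least reliable bits
frozen = [i in frozen_set for i in range(n)]

base = sc_latency(frozen, shortcuts=False)
fast = sc_latency(frozen, shortcuts=True)
print(f"plain SC: {base} cycles; with constituent codes: {fast} cycles "
      f"({100 * (base - fast) / base:.0f}% saved)")
```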

    Dataset Distillation with Convexified Implicit Gradients

    Full text link
    We propose a new dataset distillation algorithm using reparameterization and convexification of implicit gradients (RCIG) that substantially improves the state-of-the-art. To this end, we first formulate dataset distillation as a bi-level optimization problem. Then, we show how implicit gradients can be effectively used to compute meta-gradient updates. We further equip the algorithm with a convexified approximation that corresponds to learning on top of a frozen finite-width neural tangent kernel. Finally, we mitigate bias in the implicit gradients by parameterizing the neural network so that the final-layer parameters can be computed analytically given the body parameters. RCIG establishes the new state-of-the-art on a diverse series of dataset distillation tasks. Notably, with one image per class, on resized ImageNet, RCIG sees on average a 108% improvement over the previous state-of-the-art distillation algorithm. Similarly, we observe a 66% gain over SOTA on Tiny-ImageNet and 37% on CIFAR-100.
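    The bi-level structure described above can be sketched in a few lines. The toy below (PyTorch) distills a synthetic binary classification set by backpropagating through a short unrolled inner loop; RCIG instead computes this meta-gradient with convexified implicit gradients, so the sketch shows only the surrounding bi-level loop. The task, model, and step sizes are illustrative assumptions.

```python
# A minimal sketch of bi-level dataset distillation. The inner loop trains
# a tiny linear model on the distilled points; the outer loop updates the
# distilled points so the trained model does well on the real data. RCIG
# would replace the unrolled backprop below with implicit gradients.
import torch

torch.manual_seed(0)
X_real = torch.randn(256, 2)                        # toy "real" dataset
y_real = (X_real[:, 0] + X_real[:, 1] > 0).float()

X_syn = torch.randn(4, 2, requires_grad=True)       # distilled inputs (meta-params)
y_syn = torch.tensor([0., 0., 1., 1.])              # fixed distilled labels

meta_opt = torch.optim.Adam([X_syn], lr=0.05)

def loss(w, b, X, y):
    return torch.nn.functional.binary_cross_entropy_with_logits(X @ w + b, y)

for step in range(200):
    w = torch.zeros(2, requires_grad=True)           # re-init inner model
    b = torch.zeros(1, requires_grad=True)
    for _ in range(10):                              # inner loop on distilled data
        g_w, g_b = torch.autograd.grad(loss(w, b, X_syn, y_syn), (w, b),
                                       create_graph=True)
        w, b = w - 0.5 * g_w, b - 0.5 * g_b
    meta_loss = loss(w, b, X_real, y_real)           # outer loss on real data
    meta_opt.zero_grad()
    meta_loss.backward()                             # meta-gradient w.r.t. X_syn
    meta_opt.step()

print("final meta-loss on real data:", float(meta_loss))
```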

    New Identification and Decoding Techniques for Low-Density Parity-Check Codes

    Get PDF
    Error-correction coding schemes are indispensable for today's high-capacity, high-data-rate communication systems. Among the various channel coding schemes, low-density parity-check (LDPC) codes, introduced by the pioneer Robert G. Gallager, are prominent due to their capacity-approaching and superior error-correcting properties. There is no hard constraint on the code rate of LDPC codes. Consequently, it is ideal to incorporate LDPC codes with various code rates and codeword lengths in adaptive modulation and coding (AMC) systems, which change the encoder and the modulator adaptively to improve the system throughput. In conventional AMC systems, a dedicated control channel is assigned to coordinate the encoder/decoder changes. A question then arises: does the AMC system still work when such a control channel is absent? This work gives a positive answer to this question by investigating various scenarios consisting of different modulation schemes, such as quadrature-amplitude modulation (QAM) and frequency-shift keying (FSK), and different channels, such as additive white Gaussian noise (AWGN) channels and fading channels. On the other hand, LDPC decoding is usually carried out by iterative belief-propagation (BP) algorithms. As LDPC codes become prevalent in advanced communication and storage systems, low-complexity LDPC decoding algorithms are favored in practical applications. In the conventional BP decoding algorithm, the stopping criterion is to check whether all the parities are satisfied. This single rule may not be able to identify undecodable blocks, and as a result decoding time and power are wasted on executing unnecessary iterations. In this work, we propose a new stopping criterion to identify undecodable blocks in the early stage of the iterative decoding process. Furthermore, in the conventional BP decoding algorithm the variable (check) nodes are updated in parallel. It is known that the number of iterations can be reduced by serial scheduling. Informed dynamic scheduling (IDS) algorithms have been proposed in the literature to further reduce the number of iterations; however, the computational complexity involved in finding the node to update in the existing IDS algorithms cannot be neglected. In this work, we propose a new efficient IDS scheme that provides a better performance-complexity trade-off than the existing IDS schemes. In addition, the iterative decoding threshold (IDT), which is used to decide which of two LDPC codes is better, is investigated in this work. A family of LDPC codes, called LDPC convolutional codes, has drawn a lot of attention from researchers in recent years due to the threshold saturation phenomenon. Computing the IDT for an LDPC convolutional code may be demanding when the termination length goes to thousands or even approaches infinity, especially for AWGN channels. In this work, we propose a fast IDT estimation algorithm that greatly reduces the complexity of the IDT calculation for LDPC convolutional codes with arbitrarily large termination length (including infinity), so that their IDTs can be quickly obtained.
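    The early-identification idea can be illustrated with a small Python sketch: track the syndrome weight (the number of unsatisfied parity checks) across iterations and abort when it stops improving, flagging the block as likely undecodable. The window size, the stall rule, and the toy bit-flipping decoder below are illustrative assumptions, not the stopping criterion proposed in this work.

```python
# A minimal sketch of syndrome-based early stopping for iterative LDPC
# decoding. If the syndrome weight has not improved over a sliding window,
# the block is declared likely undecodable and decoding is aborted.
import numpy as np

def decode_with_early_stop(H, hard_bits, decoder_step,
                           max_iter=50, window=5):
    """Run an iterative decoder, stopping early on a stalled syndrome."""
    history = []
    bits = hard_bits.copy()
    for it in range(max_iter):
        syndrome = H @ bits % 2
        weight = int(syndrome.sum())
        if weight == 0:
            return bits, it, "decoded"
        history.append(weight)
        if len(history) > window and min(history[-window:]) >= history[-window - 1]:
            return bits, it, "undecodable (stalled)"   # give up early
        bits = decoder_step(H, bits, syndrome)
    return bits, max_iter, "max iterations reached"

def bit_flip_step(H, bits, syndrome):
    """Toy Gallager bit-flipping step: flip the worst variable node."""
    unsat = syndrome @ H                    # unsatisfied checks per variable
    worst = int(np.argmax(unsat))
    bits = bits.copy()
    bits[worst] ^= 1
    return bits

H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1]])
received = np.array([0, 1, 1, 1, 1, 0])    # toy hard-decision input
print(decode_with_early_stop(H, received, bit_flip_step))
```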

    Plan Projection, Execution, and Learning for Mobile Robot Control

    Get PDF
    Most state-of-the-art hybrid control systems for mobile robots are decomposed into different layers. While the deliberation layer reasons about the actions required for the robot to achieve a given goal, the behavioral layer is designed to enable the robot to quickly react to unforeseen events. This decomposition guarantees safe operation even in the presence of unforeseen and dynamic obstacles and enables the robot to cope with situations it was not explicitly programmed for. The layered design, however, also leaves us with the problem of plan execution: the problem of arbitrating between the deliberation and behavioral layers. Abstract symbolic actions have to be translated into streams of local control commands, and simultaneously, execution failures have to be handled on an appropriate level of abstraction. It is now widely accepted that plan execution should form a third layer of a hybrid robot control system. The resulting layered architectures are called three-tiered architectures, or 3T architectures for short. Although many high-level programming frameworks have been proposed to support the implementation of the intermediate layer, there is no generally accepted algorithmic basis for plan execution in three-tiered architectures. In this thesis, we propose to base plan execution on plan projection and learning, and we present a general framework for the self-supervised improvement of plan execution. This framework has been implemented in APPEAL, an Architecture for Plan Projection, Execution And Learning, which extends the well-known RHINO control system by introducing an execution layer. This thesis contributes to the field of plan-based mobile robot control, which investigates the interrelation between planning, reasoning, and learning techniques based on an explicit representation of the robot's intended course of action, a plan. In McDermott's terminology, a plan is that part of a robot control program which the robot can not only execute, but also reason about and manipulate. According to that broad view, a plan may serve many purposes in a robot control system, such as reasoning about future behavior, revising intended activities, or learning. In this thesis, plan-based control is applied to the self-supervised improvement of mobile robot plan execution.
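    The arbitration role of the execution layer can be sketched as a small control loop: abstract actions from the deliberation layer are translated into streams of low-level commands, and failures reported by the behavioral layer are handled at the execution level or pushed back up as a replanning request. The Python sketch below is a schematic illustration of that 3T flow; all names and the recovery policy are assumptions, not the APPEAL system's interfaces.

```python
# A schematic sketch of a three-tiered (3T) control flow: deliberation
# plans abstract actions, the execution layer translates them and handles
# failures, and the behavioral layer runs low-level commands. All names
# and the one-retry recovery policy are illustrative assumptions.
def deliberation_layer(goal):
    """Plan abstract symbolic actions for a goal."""
    return [("goto", "room_b"), ("pickup", "cup")]

def behavioral_layer(command, obstacle_ahead):
    """Execute one low-level command, reacting to unforeseen events."""
    if obstacle_ahead:
        return "blocked"
    print("executing:", command)
    return "ok"

def translate(action):
    """Map an abstract action to a stream of local control commands."""
    verb, arg = action
    return [f"{verb}:{arg}:step{i}" for i in range(2)]

def execution_layer(plan, world):
    """Arbitrate between the layers: translate actions, handle failures."""
    for action in plan:
        for command in translate(action):
            status = behavioral_layer(command, world.get("obstacle", False))
            if status == "blocked":
                world["obstacle"] = False           # assume recovery succeeds
                status = behavioral_layer(command, False)  # retry once
            if status != "ok":
                return "replan needed"               # escalate to deliberation
    return "goal achieved"

world = {"obstacle": True}
print(execution_layer(deliberation_layer("fetch cup"), world))
```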
    • โ€ฆ
    corecore