7 research outputs found

A 5.16Gbps decoder ASIC for Polar Code in 16nm FinFET

Authors: Liu Xiaocheng, Qiu Pengcheng, Tong Jiajie, Wang Jun, Zhang Huazi, Zhang Qifan, Zhao Changyong
Publication date: 04/07/2018

Polar codes have been selected for the 5G standard. However, only a few ASIC decoders have been fabricated, and none of them supports list size L > 4 and code length N > 1024. This paper presents an ASIC implementation of three decoders for polar codes: a successive cancellation (SC) decoder, a flexible decoder, and an ultra-reliable decoder. All three are SC-based, supporting list sizes up to 1, 8, and 32 and code lengths up to 2^15, 2^14, and 2^11, respectively. The chip is fabricated in TSMC 16nm FinFET technology and can be clocked at 1 GHz. Optimization techniques are proposed and employed to increase throughput. Experimental results show that the throughput reaches up to 5.16 Gbps. Compared with fabricated ASIC decoders and synthesized decoders in the literature, the flexible decoder achieves higher area efficiency.

arXiv.org e-Print Archive
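All three decoders in this entry are SC-based. As a point of reference only, the following is a minimal software sketch of recursive SC decoding, not the chip's architecture. It assumes natural-order encoding with $G_N = F^{\otimes n}$, F = [[1, 0], [1, 1]], LLRs defined as log(P(0)/P(1)), and the min-sum approximation for the f-function. The function names and the example frozen set are hypothetical, and the list (SCL) variants are not shown.

```python
from typing import List, Sequence, Tuple


def f_minsum(a: float, b: float) -> float:
    """Check-node update (min-sum approximation): LLR of the XOR of two bits."""
    sign = 1.0 if (a >= 0) == (b >= 0) else -1.0
    return sign * min(abs(a), abs(b))


def g_update(a: float, b: float, partial: int) -> float:
    """Variable-node update once the partial-sum bit of the upper branch is known."""
    return b + a if partial == 0 else b - a


def polar_encode(u: Sequence[int]) -> List[int]:
    """x = u * G_N over GF(2), G_N = F^{(x)n}, F = [[1, 0], [1, 1]], natural order."""
    if len(u) == 1:
        return [u[0]]
    half = len(u) // 2
    upper = [a ^ b for a, b in zip(u[:half], u[half:])]
    return polar_encode(upper) + polar_encode(u[half:])


def sc_decode(llr: Sequence[float], frozen: Sequence[bool]) -> Tuple[List[int], List[int]]:
    """Recursive SC decoding. Returns (u_hat, x_hat), where x_hat re-encodes u_hat.

    LLR convention: llr = log(P(bit=0) / P(bit=1)); frozen bits are fixed to 0.
    """
    if len(llr) == 1:
        bit = 0 if frozen[0] or llr[0] >= 0 else 1
        return [bit], [bit]
    half = len(llr) // 2
    l1, l2 = llr[:half], llr[half:]
    # First half of u: its codeword is x1 XOR x2, so combine the LLRs with f.
    u1, c1 = sc_decode([f_minsum(a, b) for a, b in zip(l1, l2)], frozen[:half])
    # Second half of u: x2 is seen directly and as x1 XOR c1, so combine with g.
    u2, c2 = sc_decode([g_update(a, b, p) for a, b, p in zip(l1, l2, c1)], frozen[half:])
    return u1 + u2, [a ^ b for a, b in zip(c1, c2)] + c2


if __name__ == "__main__":
    # Illustrative N = 8, K = 4 code; information positions {3, 5, 6, 7} are an assumption.
    frozen = [i not in (3, 5, 6, 7) for i in range(8)]
    u = [0, 0, 0, 1, 0, 1, 1, 0]
    x = polar_encode(u)
    llr = [4.0 if bit == 0 else -4.0 for bit in x]  # noiseless BPSK, bit 0 -> +1
    u_hat, x_hat = sc_decode(llr, frozen)
    print(u_hat == u, x_hat == x)
```

The recursion mirrors the decoding tree: the f-branch is decoded first, its re-encoded bits feed the g-branch, and the two partial codewords are combined on the way back up. The strictly serial order of leaf decisions is what limits SC latency.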

Tb/s Polar Successive Cancellation Decoder 16nm ASIC Implementation

Authors: Bertrand Kaoutar, Derudder Veerle, Kolağasıoğlu Ertuğrul, Sezer E. Göksu, Süral Altuğ
Publication date: 20/09/2020

This work presents an efficient ASIC implementation of a successive cancellation (SC) decoder for polar codes. SC is a low-complexity depth-first search decoding algorithm, favorable for beyond-5G applications that require extremely high throughput and low power. The ASIC implementation of SC in this work exploits many techniques, including pipelining and unrolling, to achieve Tb/s data throughput without compromising power and area metrics. To reduce implementation complexity, an adaptive log-likelihood ratio (LLR) quantization scheme is used. This scheme optimizes the bit precision of the internal LLRs within the range of 1-5 bits by considering the irregular polarization and the entropy of the LLR distribution in the SC decoder. The performance cost of this scheme is less than 0.2 dB when the code block length is 1024 bits and the payload is 854 bits. Furthermore, some computations in SC occupy a large area because of their high degree of parallelization, while others take more time steps. To optimize these computations and reduce both memory and latency, a register reduction/balancing (R-RB) method is used. The final decoder architecture is called optimized polar SC (OPSC). Post-place-and-route results in 16nm FinFET ASIC technology show that the OPSC decoder achieves 1.2 Tb/s coded throughput in a 0.79 mm^2 area with 0.95 pJ/bit energy efficiency.

arXiv.org e-Print Archive
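The OPSC quantizer itself is not given in the abstract. A minimal sketch of the general idea, assuming a uniform saturating quantizer whose bit width (1 to 5 bits) can differ per decoder stage, is below. The step size, the symmetric integer range, and the stage-to-width mapping are illustrative assumptions; the paper chooses widths from the irregular polarization and the entropy of the LLR distribution, which is not reproduced here.

```python
from typing import Dict, List, Sequence


def quantize_llr(llr: float, bits: int, step: float = 1.0) -> int:
    """Uniform saturating quantizer to a signed `bits`-bit integer (assumed scheme).

    bits == 1 keeps only the sign (+1 / -1); for bits >= 2 the range is the
    symmetric interval [-(2**(bits-1) - 1), 2**(bits-1) - 1].
    """
    if bits < 1:
        raise ValueError("bits must be >= 1")
    if bits == 1:
        return 1 if llr >= 0 else -1
    qmax = 2 ** (bits - 1) - 1
    q = round(llr / step)  # note: Python rounds exact halves to even
    return max(-qmax, min(qmax, q))


def quantize_by_stage(stage_llrs: Dict[int, Sequence[float]],
                      stage_bits: Dict[int, int],
                      step: float = 1.0) -> Dict[int, List[int]]:
    """Quantize each decoder stage's LLRs with that stage's own (hypothetical) bit width."""
    return {s: [quantize_llr(v, stage_bits[s], step) for v in llrs]
            for s, llrs in stage_llrs.items()}


if __name__ == "__main__":
    # Hypothetical widths: 5 bits for stage 0, sign only for stage 2.
    print(quantize_by_stage({0: [3.7, -20.0], 2: [0.4, -0.1]}, {0: 5, 2: 1}))
```

With the hypothetical mapping above, stage 0 keeps values in [-15, 15] while stage 2 keeps only signs, so its registers shrink to one bit.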

On the Construction of $G_N$-coset Codes for Parallel Decoding

Authors: Ge Yiqun, Li Rong, Tong Jiajie, Wang Jun, Wang Xianbin, Zhang Huazi
Publication date: 29/10/2019

In this paper, we propose a type of $G_N$-coset codes for a highly parallel stage-permuted turbo-like decoder. The decoder exploits the equivalence between two stage-permuted factor graphs of $G_N$-coset codes. Specifically, the inner codes of a $G_N$-coset code consist of independent component codes and are thus decoded in parallel. The extrinsic information of the code bits is obtained and iteratively exchanged between the two graphs until convergence. Accordingly, we explore a heuristic and flexible code construction method (information set selection) for various information lengths and coding rates. Simulations show that the proposed $G_N$-coset codes could achieve coding performance comparable to polar codes while enjoying higher decoding parallelism.
Comment: 6 pages, 6 figures

arXiv.org e-Print Archive
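In the usual definition (Arıkan), a $G_N$-coset code maps a length-N vector u, whose entries outside an information set A are frozen, to x = u G_N over GF(2), with $G_N = F^{\otimes n}$; polar codes are the special case where A is chosen by channel reliability. The sketch below encodes such a code with frozen bits set to zero, building G_N explicitly for clarity rather than speed. The helper names and the information sets in the example are arbitrary illustrations, not the heuristic construction proposed in the paper, and the stage-permuted turbo-like decoder is not modelled.

```python
from typing import List, Sequence


def gn_matrix(n: int) -> List[List[int]]:
    """G_N = F^{(x)n} over GF(2), with F = [[1, 0], [1, 1]] and N = 2**n."""
    g = [[1]]
    for _ in range(n):
        size = len(g)
        new = [[0] * (2 * size) for _ in range(2 * size)]
        for i in range(size):
            for j in range(size):
                new[i][j] = g[i][j]                # top-left block: G
                new[i + size][j] = g[i][j]         # bottom-left block: G
                new[i + size][j + size] = g[i][j]  # bottom-right block: G
        g = new
    return g


def coset_encode(info: Sequence[int], info_set: Sequence[int], n: int) -> List[int]:
    """Place info bits on the positions in info_set, freeze the rest to 0, return u * G_N mod 2."""
    if len(info) != len(info_set):
        raise ValueError("need exactly one information bit per position in info_set")
    size = 2 ** n
    g = gn_matrix(n)
    u = [0] * size
    for bit, pos in zip(info, sorted(info_set)):
        u[pos] = bit
    return [sum(u[i] & g[i][j] for i in range(size)) % 2 for j in range(size)]


if __name__ == "__main__":
    # Hypothetical N = 8, K = 3 code with information set {3, 5, 6}.
    print(coset_encode([1, 1, 1], [3, 5, 6], 3))
```

Choosing a different information set changes which rows of G_N span the code; the construction question in the paper is which set to choose so that the parallel decoder still performs well.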

Toward Terabits-per-second Communications: A High-Throughput Hardware Implementation of $G_N$-Coset Codes

Authors: Dai Shengchen, Li Rong, Tong Jiajie, Wang Jun, Wang Xianbin, Zhang Huazi, Zhang Qifan
Publication date: 21/04/2020

Recently, a parallel decoding algorithm for $G_N$-coset codes was proposed. The algorithm exploits two equivalent decoding graphs. For each graph, the inner code part, which consists of independent component codes, is decoded in parallel. The extrinsic information of the code bits is obtained and iteratively exchanged between the graphs until convergence. This algorithm enjoys higher decoding parallelism than previous successive cancellation algorithms because it avoids serial outer-code processing. In this work, we present a hardware implementation of the parallel decoding algorithm that supports a maximum code length of N = 16384. We complete the decoder's physical layout in the TSMC 16nm process; its size is 999.936 μm × 999.936 μm ≈ 1.00 mm^2. The decoder's area efficiency and power consumption are evaluated for the cases N = 16384, K = 13225 and N = 16384, K = 14161. Scaled to a 7nm process, the decoder's throughput per unit area exceeds 477 Gbps/mm^2 and 533 Gbps/mm^2, respectively, with five iterations.
Comment: 5 pages, 6 figures

arXiv.org e-Print Archive
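The reported layout dimensions and the two evaluated configurations can be checked with a little arithmetic. The sketch below restates only figures given in the abstract; the helper names are hypothetical.

```python
def die_area_mm2(width_um: float, height_um: float) -> float:
    """Rectangular die area: micrometres in, square millimetres out."""
    return (width_um / 1000.0) * (height_um / 1000.0)


def code_rate(k: int, n: int) -> float:
    """Code rate R = K / N."""
    return k / n


if __name__ == "__main__":
    # Layout size reported in the abstract: 999.936 um x 999.936 um.
    print(f"area = {die_area_mm2(999.936, 999.936):.6f} mm^2")
    # The two evaluated (N, K) configurations.
    for k in (13225, 14161):
        print(f"N=16384, K={k}: R = {code_rate(k, 16384):.4f}")
```

Running it prints an area of about 0.999872 mm^2, consistent with the rounded 1.00 mm^2, and code rates of about 0.8072 and 0.8643 for the two configurations.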

An Asymmetric Adaptive SCL Decoder Hardware for Ultra-Low-Error-Rate Polar Codes

Authors: Huang Lingchen, Liu Xiaocheng, Tong Jiajie, Wang Jun, Zhang Huazi
Publication date: 03/04/2019

In theory, Polar codes do not exhibit an error floor under successive-cancellation (SC) decoding. In practice, a frame error rate (FER) down to 10^-12 has not been reported with real SC list (SCL) decoder hardware. This paper presents an asymmetric adaptive SCL (A2SCL) decoder, implemented in real hardware, for high-throughput and ultra-reliable communications. We propose to concatenate multiple SC decoders with an SCL decoder, in which the numbers of SC and SCL decoders are balanced with respect to their area and latency. In addition, a novel unequal-quantization technique is adopted. The two optimizations are crucial for improving SCL throughput within limited chip area. As an application, we build a link-level FPGA emulation platform to measure ultra-low FERs of 3GPP NR Polar codes (with parity-check and CRC bits). It flexibly supports all list sizes up to 8, code lengths up to 1024, and arbitrary code rates. With the proposed hardware, decoding is 7000 times faster than on a CPU core. For the first time, an FER as low as 10^-12 is measured and the quantization effect is analyzed.

arXiv.org e-Print Archive
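A common form of adaptive SCL decoding is CRC-gated fallback: a fast SC decoder runs first, and only frames that fail the CRC are re-decoded by the slower SCL decoder. Assuming that reading of the A2SCL scheme, the control flow can be sketched as below with a bitwise CRC check. The function names and the decoder callables are hypothetical placeholders, and neither the unequal quantization nor the paper's area/latency balancing is modelled.

```python
from typing import Callable, List, Sequence, Tuple

Bits = List[int]


def crc_remainder(payload: Sequence[int], poly: Sequence[int]) -> Bits:
    """Remainder of payload * x^deg divided by poly over GF(2).

    poly lists coefficients from the highest degree down and includes the
    leading 1, e.g. x^3 + x + 1 -> [1, 0, 1, 1].
    """
    deg = len(poly) - 1
    reg = list(payload) + [0] * deg
    for i in range(len(payload)):
        if reg[i]:
            for j, p in enumerate(poly):
                reg[i + j] ^= p
    return reg[-deg:]


def crc_passes(frame: Sequence[int], poly: Sequence[int]) -> bool:
    """True if frame = payload followed by the payload's CRC bits."""
    deg = len(poly) - 1
    return crc_remainder(frame[:-deg], poly) == list(frame[-deg:])


def adaptive_decode(llr: Sequence[float],
                    sc_decode: Callable[[Sequence[float]], Bits],
                    scl_decode: Callable[[Sequence[float]], Bits],
                    check: Callable[[Bits], bool]) -> Tuple[Bits, bool]:
    """Try the cheap SC decoder first; fall back to SCL only if the check fails.

    Returns (decoded frame, whether the SCL decoder was used).
    """
    first = sc_decode(llr)
    if check(first):
        return first, False
    return scl_decode(llr), True
```

When most frames pass the CRC after SC decoding, several SC decoders can share one SCL decoder; how many of each to instantiate is the balancing question the abstract describes.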

Toward Terabits-per-second Communications: Low-Complexity Parallel Decoding of $G_N$-Coset Codes

Authors: Dai Shengchen, Li Rong, Tong Jiajie, Wang Jun, Wang Xianbin, Zhang Huazi
Publication date: 21/04/2020

Recently, a parallel decoding framework for $G_N$-coset codes was proposed. High throughput is achieved by decoding the independent component polar codes in parallel. Various algorithms can be employed to decode these component codes, enabling a flexible throughput-performance tradeoff. In this work, we adopt SC decoders as the component decoders to reach the highest-throughput end of the tradeoff. The benefits over soft-output component decoders are reduced complexity and simpler (binary) interconnections among component decoders. To reduce performance degradation, we integrate an error detector and a log-likelihood ratio (LLR) generator into each component decoder. The LLR generator, specifically its damping factors, is designed by a genetic algorithm. This low-complexity design can achieve an area efficiency of 533 Gbps/mm^2 in 7nm technology.
Comment: 5 pages, 6 figures

arXiv.org e-Print Archive
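The abstract does not specify the LLR generator. A minimal sketch, assuming the generator assigns each SC hard decision a fixed magnitude (a damping factor) that depends on the iteration index and on whether the error detector fired, might look like this. The function name, the two damping tables, and their indexing are hypothetical; in the paper the damping factors are found by a genetic algorithm, which is not reproduced here.

```python
from typing import List, Sequence


def generate_llrs(bits: Sequence[int], error_detected: bool, iteration: int,
                  damping_ok: Sequence[float],
                  damping_err: Sequence[float]) -> List[float]:
    """Turn hard decisions from an SC component decoder into soft LLRs (assumed scheme).

    Sign: +alpha for bit 0, -alpha for bit 1 (LLR = log P(0)/P(1)).
    Magnitude alpha comes from a per-iteration damping table selected by the
    error detector's verdict; iterations past the end reuse the last entry.
    """
    table = damping_err if error_detected else damping_ok
    alpha = table[min(iteration, len(table) - 1)]
    return [alpha if b == 0 else -alpha for b in bits]


if __name__ == "__main__":
    # Hypothetical tables: trust passing decisions more than flagged ones.
    print(generate_llrs([0, 1, 1], False, 0, [2.0, 3.0], [0.5, 1.0]))
    print(generate_llrs([0, 1, 1], True, 5, [2.0, 3.0], [0.5, 1.0]))
```

In this sketch only bits and one detector flag leave each component decoder, and the soft values are regenerated locally, which matches the binary interconnections the abstract mentions.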

A Flip-Syndrome-List Polar Decoder Architecture for Ultra-Low-Latency Communications

Authors: Huangfu Yourui, Li Rong, Qiu Pengcheng, Tong Jiajie, Wang Jun, Wang Xianbin, Xu Chen, Zhang Huazi
Publication date: 06/08/2018

We consider practical hardware implementation of Polar decoders. To reduce the latency due to the serial nature of successive cancellation (SC), existing optimizations improve parallelism with two approaches: multi-bit decision and reduced path splitting. In this paper, we combine the two procedures into one with an error-pattern-based architecture. It simultaneously generates a set of candidate paths for multiple bits with pre-stored patterns. For rate-1 (R1) and single parity-check (SPC) nodes, we prove that only a small number of deterministic patterns is required to guarantee that performance is preserved. For general nodes, low-weight error patterns are indexed by syndrome in a look-up table and retrieved in O(1) time. The proposed flip-syndrome-list (FSL) decoder fully parallelizes all constituent code blocks without sacrificing performance and is thus suitable for ultra-low-latency applications. Meanwhile, two code construction optimizations are presented to further reduce complexity and to improve performance, respectively.
Comment: 10 pages, submitted to IEEE Access (Special Issue on Advances in Channel Coding for 5G and Beyond)

arXiv.org e-Print Archive
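The syndrome-indexed lookup can be illustrated on a small code. The sketch below precomputes, for each syndrome of a parity-check matrix H, up to a fixed number of lowest-weight error patterns, then turns a hard-decision word into candidate codewords with a single dictionary lookup. The function names, the (7,4) Hamming matrix in the example, and the table parameters are illustrative assumptions; the FSL architecture's node types, pattern sets, and path metrics are not modelled.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]


def syndrome(h: Sequence[Sequence[int]], word: Sequence[int]) -> int:
    """s = H * word^T over GF(2), packed into an int so it can index a table."""
    s = 0
    for row in h:
        bit = sum(a & b for a, b in zip(row, word)) % 2
        s = (s << 1) | bit
    return s


def build_syndrome_table(h: Sequence[Sequence[int]], max_weight: int,
                         max_per_syndrome: int) -> Dict[int, List[Pattern]]:
    """Map each syndrome to up to max_per_syndrome error patterns, lightest first."""
    n = len(h[0])
    table: Dict[int, List[Pattern]] = {}
    for w in range(max_weight + 1):  # increasing weight, so light patterns win
        for support in combinations(range(n), w):
            e = tuple(1 if i in support else 0 for i in range(n))
            bucket = table.setdefault(syndrome(h, e), [])
            if len(bucket) < max_per_syndrome:
                bucket.append(e)
    return table


def candidates(table: Dict[int, List[Pattern]], h: Sequence[Sequence[int]],
               hard: Sequence[int]) -> List[Pattern]:
    """Candidate codewords: the hard decision XOR each stored pattern for its syndrome."""
    patterns = table.get(syndrome(h, hard), [])
    return [tuple(a ^ b for a, b in zip(hard, e)) for e in patterns]


if __name__ == "__main__":
    # (7,4) Hamming code: column j (0-based) is the binary form of j + 1.
    H = [[1, 0, 1, 0, 1, 0, 1], [0, 1, 1, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1, 1]]
    table = build_syndrome_table(H, max_weight=1, max_per_syndrome=1)
    print(candidates(table, H, [1, 1, 1, 0, 0, 0, 1]))
```

With `max_per_syndrome` greater than 1, the same lookup returns a short list of candidates ordered by pattern weight, closer to how a list decoder would consume them; path metrics for ranking the candidates are left out.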