Abstract-The successive cancellation list (SCL) decoding algorithm is a powerful method that helps polar codes achieve excellent error-correcting performance. However, current SCL algorithms and decoders are based on likelihood or log-likelihood forms, which incur high hardware complexity. In this paper, we propose a log-likelihood-ratio (LLR)-based SCL (LLR-SCL) decoding algorithm, which needs only half the computation and storage of the conventional one. Then, based on the proposed algorithm, we develop low-complexity VLSI architectures for LLR-SCL decoders. Analysis results show that the proposed LLR-SCL decoder achieves a 50% reduction in hardware and a 98% improvement in hardware efficiency.
I. INTRODUCTION
Polar codes have become one of the most favored forward error correction (FEC) codes since their discovery in 2008 [1]. Information theorists have since shown that capacity-achieving polar codes can outperform LDPC codes in error correction when decoded with the successive cancellation list (SCL) algorithm [2]. Therefore, the SCL algorithm is viewed as the most promising approach for practical polar decoding.
To date, the original SCL algorithm and its variants are based on likelihood [2] or log-likelihood (LL) forms [3] [4] [8]. Because most systems that contain FEC operate on the log-likelihood ratio (LLR) form, the current non-LLR-based SCL decoders are poorly suited to practical applications. Moreover, because two types of decoding messages (the probabilities of a bit being 0 and being 1) need to be processed and stored in non-LLR-based SCL decoders, the corresponding hardware complexity is very high. This paper presents an LLR-based SCL (LLR-SCL) algorithm for polar code decoding. In the proposed algorithm, the ratio of the probabilities of a bit being 0 and 1 is used to represent the decoding messages. As a result, the computation complexity and memory requirement of the LLR-SCL algorithm are greatly reduced compared with the LL-SCL case. Then, based on this new algorithm, a VLSI architecture of the LLR-SCL decoder is presented. Analysis shows that the proposed LLR-SCL decoder achieves a 50% reduction in hardware and a 98% increase in hardware efficiency.
The rest of this paper is organized as follows. Section II reviews polar codes. The proposed LLR-SCL algorithm is presented in Section III. Section IV develops the hardware architecture of the LLR-SCL decoder. Hardware performance is analyzed in Section V. Section VI draws the conclusions.
II. REVIEW OF POLAR CODES

A. Polar Code
As shown in [1], the reliability of decoded bits over a discrete binary memoryless symmetric (D-BMS) channel is polarized according to their positions in the codeword. Therefore, by assigning the k source data bits to the reliable positions and (n-k) "0" bits to the unreliable positions, we can construct a length-n polar code with rate R=k/n. In general, these (n-k) "0" bits are called "frozen" bits, while the k information bits are called "free" bits. For the details of polar encoding, the reader is referred to [1].
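As a minimal sketch of how polarization determines the frozen positions, the following assumes a binary erasure channel (BEC) with erasure probability eps, for which the Bhattacharyya parameter of each bit position follows a simple recursion. The BEC model, the function names, and the index convention (position i is reached by applying the "minus" and "plus" transforms in most-significant-bit-first order) are assumptions made for illustration; the construction in [1] for a general channel is more involved.

```python
def bec_bhattacharyya(n, eps):
    """Bhattacharyya parameters of the n polarized bit positions of a BEC(eps).

    A smaller value means a more reliable position. n must be a power of two.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    z = [eps]
    while len(z) < n:
        nxt = []
        for v in z:
            nxt.append(2 * v - v * v)  # degraded ("minus") position
            nxt.append(v * v)          # upgraded ("plus") position
        z = nxt
    return z


def choose_free_positions(n, k, eps):
    """Return the k most reliable positions (free bits); the rest are frozen."""
    z = bec_bhattacharyya(n, eps)
    ranked = sorted(range(n), key=lambda i: z[i])
    return sorted(ranked[:k])
```

For n=4, k=2 and eps=0.5, `choose_free_positions(4, 2, 0.5)` selects the two positions with the smallest parameters, and the remaining two positions carry the frozen "0" bits.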
B. Successive Cancellation (SC) Algorithm
An SCL decoder can be viewed as multiple copies of a successive cancellation (SC) decoder; therefore, we first introduce the SC decoder in this subsection. Fig. 1 shows the decoding scheme of a likelihood-based SC decoder with n=4. Here the SC decoder consists of $\log_2 n = 2$ stages, where each stage consists of 4-input 2-output f and g units as the basic computation units. At the end of the last stage (stage 2 in this example), a hard-decision unit (h) is used to determine the decoded bit $\hat{u}_i$. Notice that in Fig. 1 each f or g unit is labeled with a number, which indicates the time index at which the corresponding unit is activated. The function of the f and g units can be derived from the polar encoding procedure. Fig. 2(a) shows the basic computation unit of the polar encoder. It can be seen that the basic encoding computation is a left-to-right transformation with $out_1 = in_1 \oplus in_2$ and $out_2 = in_2$, where $\oplus$ is the exclusive-or operation.
In general, (1)-(5) describe the likelihood-based SC algorithm.
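The basic encoding computation described in this subsection (out_1 = in_1 XOR in_2, out_2 = in_2) can be applied recursively to encode a length-n block. The sketch below is illustrative only: the function name is hypothetical, and it assumes natural bit ordering with no bit-reversal permutation, which may differ from the wiring in Fig. 2.

```python
def polar_encode(u):
    """Encode a list of bits u (length a power of two) with the basic
    two-input transform applied stage by stage."""
    n = len(u)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    x = list(u)
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                # out_1 = in_1 XOR in_2 ; out_2 = in_2 is left unchanged
                x[j] ^= x[j + step]
        step *= 2
    return x
```

With frozen bits, the input list would hold "0" at the frozen positions and the source data at the free positions. Because the transform is its own inverse over GF(2), applying `polar_encode` twice returns the original bits.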
C. SC Algorithm over Code Tree
On the other hand, the SC algorithm can be viewed as a path search over an n-level code tree. Fig. 3 shows an example search with n=4. Here level i corresponds to the decoded bit $\hat{u}_i$, and the value associated with each node is the metric of the path from the root node to that node. For example, 0.09 is the path metric of the length-3 path (1,0,1), where (1,0,1) represents $\hat{u}_1 = 1$, $\hat{u}_2 = 0$ and $\hat{u}_3 = 1$. Therefore, the path (1,0,1) with metric 0.09 indicates that the likelihood of this sequence of decisions is 0.09. Notice that the f or g units in the last stage (for example, stage 2 in Fig. 1) output the likelihoods of the two candidate values of $\hat{u}_i$ given the previously decoded bits. Therefore, the path metric of the SC algorithm is calculated by the f or g units in the last stage.
With the use of the path metric, the SC decoder performs a locally optimal search to find the length-n path with the largest metric. As shown in Fig. 3, at each level the SC decoder first visits the two child nodes of the current survival path. By comparing the corresponding path metrics, the SC decoder selects the path with the larger metric as the updated survival path. The example survival path in Fig. 3 is marked with red arrows. It can be seen that the valid length-4 path with the largest metric, 0.19, is found by the SC decoder.
D. SCL Decoding Algorithm
Because its search is only locally optimal, in many cases the SC decoder cannot find the correct decoding path. The SCL algorithm was proposed in [2] to solve this problem. By using multiple SC decoders over the same code tree, the chance of finding the correct decoding path is significantly improved. Here the number of SC component decoders is referred to as the list size L. Fig. 4 shows an example of an n=4 SCL decoder with L=2. It can be seen that the SCL decoder can trace the correct path (1,0,0,0) with metric 0.23, while the SC decoder cannot.
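The difference between the locally optimal SC search and the list search can be sketched on a toy code tree. The branch probabilities below are hypothetical values chosen for illustration (they are not the values of Fig. 3 or Fig. 4), and `list_search` is an illustrative name; in a real decoder the branch likelihoods come from the f and g units rather than from a table.

```python
def list_search(branch_prob, n, list_size):
    """Keep the list_size best paths at every level of an n-level code tree.

    branch_prob(bits, b) returns the likelihood of extending the path `bits`
    with bit b. list_size=1 corresponds to the SC search.
    """
    survivors = [((), 1.0)]
    for _ in range(n):
        candidates = [
            (bits + (b,), metric * branch_prob(bits, b))
            for bits, metric in survivors
            for b in (0, 1)
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        survivors = candidates[:list_size]
    return survivors


# Hypothetical branch likelihoods for a 2-level tree: (P(bit=0), P(bit=1)).
TOY_TREE = {
    (): (0.55, 0.45),
    (0,): (0.6, 0.4),
    (1,): (0.9, 0.1),
}


def toy_branch_prob(bits, b):
    return TOY_TREE[bits][b]
```

With `list_size=1` the search commits to the first bit 0 (0.55 > 0.45) and ends on path (0,0) with metric 0.33, whereas with `list_size=2` the path (1,0) survives the first level and wins with metric 0.405.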
III. THE PROPOSED LLR-SCL ALGORITHM
A. Benefit of LLR-based Representation
In Section II we presented the likelihood-based SC and SCL algorithms. However, in practical applications the LLR-based representation, instead of the likelihood-based form, is usually adopted for soft FEC decoder designs. This is because LLR-based designs have much lower hardware complexity than non-LLR-based ones. Generally, in order to describe the joint probabilistic information of a bit v and an event Ψ, likelihood-based decoders need to process and store two types of messages, Pr(v=0, Ψ) and Pr(v=1, Ψ), while LLR-based decoders only need to deal with one type of message, ln(Pr(v=0, Ψ)/Pr(v=1, Ψ)). As a result, the computation complexity and memory requirement of LLR-based decoders are much lower than those of their likelihood-based counterparts.
In [5], the LLR-based representation was used in SC decoder design. The success of the LLR-based scheme in the SC algorithm is built on the property that the binary value of the decoded bit $\hat{u}_i$ can be directly determined from the sign of the corresponding LLR message. However, no such property exists in the SCL algorithm. There, $\hat{u}_i$ has to be determined by comparing the metrics of all the candidate paths, which are inherently based on the likelihood form. As a result, current SCL decoders are based on either the likelihood or the log-likelihood (LL) form, instead of the LLR form.
B. The Proposed LLR-SCL Decoding Algorithm
In this subsection we present an LLR-based SCL (LLR-SCL) algorithm. First, we convert the original likelihood-based messages in (1)-(4) to their LLR-based forms (see (6) and (7) for the resulting LLR-based f and g functions).
After representing all the messages in LLR form, the next step is to calculate the path metrics, which is the key task in developing an LLR-based SCL algorithm. For the likelihood-based SCL algorithm, this calculation is automatically performed by the likelihood-based f or g unit in the last stage of the SC component decoders. However, for the LLR-SCL algorithm, after the LLR-based f or g unit outputs the LLR message c or d, an extra metric computation unit (MCU) is needed to calculate the path metrics. Next we derive the function of the MCU.
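First, notice that the metric of a length-i path $(\hat{u}_1, \ldots, \hat{u}_i)$ is a likelihood. Then, with the log-domain representation, the metrics of the length-i paths become log-likelihoods. In addition, similar to the case of the length-i path, the log-domain metric of the preceding length-(i-1) path can be represented in the same way, so that the metrics of the length-i paths are obtained from the previous path metric and the LLR message c or d.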
C. Simulation Results
In Section III-B, we apply an approximation to the path metric calculation to avoid the complex ln(•) computation. Fig. 5 shows that this approximation does not cause performance loss. In addition, the approximated LLR-SCL algorithm has the same error-correcting performance as the original non-LLR-based SCL algorithm.
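The kind of approximation referred to here can be illustrated with the standard relation between an LLR and the two bit probabilities. The sketch below is not a reproduction of the derivation in Section III-B. It assumes that the path metric is a log-likelihood (larger is better), that the LLR message for $\hat{u}_i$ is $\lambda_i = \ln(P_0/P_1)$, and that $P_0 + P_1$ is proportional to the likelihood of the parent path with a constant common to all paths, which is dropped. The symbols $M_i$, $\lambda_i$, $P_0$ and $P_1$ are introduced only for illustration.

```latex
\begin{align*}
M_i(\hat{u}_i = 0) &= M_{i-1} + \ln\frac{e^{\lambda_i}}{1 + e^{\lambda_i}}
  = M_{i-1} - \ln\left(1 + e^{-\lambda_i}\right)
  \approx M_{i-1} + \min(\lambda_i, 0), \\
M_i(\hat{u}_i = 1) &= M_{i-1} + \ln\frac{1}{1 + e^{\lambda_i}}
  = M_{i-1} - \ln\left(1 + e^{\lambda_i}\right)
  \approx M_{i-1} - \max(\lambda_i, 0).
\end{align*}
```

Both approximations use $\ln(1 + e^{x}) \approx \max(x, 0)$, whose error is at most $\ln 2$ and vanishes as $|x|$ grows. Under these assumptions, the branch that agrees with the sign of $\lambda_i$ keeps the parent metric, while the other branch is penalized by $|\lambda_i|$.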
IV. HARDWARE ARCHITECTURE OF LLR-SCL DECODER
A. Overall Architecture
In this section, based on the new LLR-SCL algorithm, we develop the corresponding hardware architecture. Fig. 6 shows the overall architecture of an LLR-SCL decoder with list size L. It can be seen that the LLR-SCL decoder consists of L LLR-SC component decoders, which were discussed in our prior work [5]. After the last stages (f or g units) of the SC decoders calculate c or d, these LLR messages, together with the previous path metrics, are input to L metric computation units (MCUs) to generate 2L path metric candidates. Then a sorting block is used to select the L largest metrics among the 2L candidates. The L paths associated with the L selected metrics become the updated survival paths.
Besides the above-mentioned computation blocks, the LLR-SCL decoder also contains three types of memory banks. The LLR message memory bank stores and provides the LLR messages used in the L LLR-SC decoders. The survival path memory bank and the path metric memory bank store and update the L survival paths and the associated path metrics.
Since the design of the memory banks is straightforward, in this section we focus the discussion on the f/g units, the MCU and the sorting block.
B. Processing Element (PE) for LLR-based f and g Units
As shown in Fig. 6, the LLR-SC component decoder consists of multiple processing elements (PEs). Each PE contains an LLR-based f unit and an LLR-based g unit. Since the functions of these two units are described in (6) and (7), the architecture of a q-bit PE is developed as shown in Fig. 7. Here C2S and S2C represent the conversion blocks between 2's complement and sign-magnitude forms. Details of the LLR-based PE can be found in [5].
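For orientation, the update rules commonly used for LLR-based f and g units in the polar-decoding literature can be sketched as follows. Equations (6) and (7) are not reproduced in this text, so the exact form used in the PE of Fig. 7 is an assumption here. The sketch also assumes the LLR convention ln(Pr(0)/Pr(1)), and the function names are illustrative.

```python
import math


def f_exact(a, b):
    """Exact f update on two LLRs. Only suitable for moderate magnitudes,
    since tanh saturates to 1.0 in floating point for large inputs."""
    return 2.0 * math.atanh(math.tanh(a / 2.0) * math.tanh(b / 2.0))


def f_min_sum(a, b):
    """Hardware-friendly approximation of f: product of the signs times the
    smaller magnitude (natural in sign-magnitude arithmetic)."""
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    return sign * min(abs(a), abs(b))


def g(a, b, u):
    """g update: u is the previously decided partial-sum bit (0 or 1)."""
    return b + (1 - 2 * u) * a
```

The min-sum form needs only a sign XOR and a magnitude comparison, which matches the role of conversion blocks between 2's complement and sign-magnitude forms, although whether Fig. 7 implements exactly this variant is not confirmed by the text.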
C. Metric Computation Unit (MCU)
Equations (14)-(17) describe the function of the MCU. Since for each decoded bit $\hat{u}_i$ either c or d can be input to the MCU, we use inputLLR to represent either of them for convenience. Then, according to (14)-(17), the q-bit architecture of the MCU is developed as shown in Fig. 8.
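A behavioral sketch of a metric computation unit is given below. Since (14)-(17) are not reproduced in this text, the sketch does not claim to match them term by term. It assumes a log-likelihood metric (larger is better), the LLR convention ln(Pr(0)/Pr(1)), and the common approximation ln(1+e^x) ≈ max(x, 0). The names `mcu_exact` and `mcu_approx` are illustrative.

```python
import math


def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mcu_exact(prev_metric, input_llr):
    """Return the metrics of the two child paths (bit 0, bit 1)
    using the exact ln(.) terms."""
    m0 = prev_metric - softplus(-input_llr)
    m1 = prev_metric - softplus(input_llr)
    return m0, m1


def mcu_approx(prev_metric, input_llr):
    """Approximate version without ln(.): the child that agrees with the
    sign of the LLR keeps the parent metric, the other loses |input_llr|."""
    m0 = prev_metric + min(input_llr, 0.0)
    m1 = prev_metric - max(input_llr, 0.0)
    return m0, m1
```

In this form each MCU needs only a sign check and one subtraction per candidate, and the approximate metrics never differ from the exact ones by more than ln 2.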
D. Sorting Block
Recall that the sorting block is used to compare the 2L candidate metrics and select the L paths with the largest metrics. Here we use Batcher's odd-even merge algorithm [6] to perform the sorting function. Fig. 9 is an example architecture for 8-input sorting.
Here the C&S unit represents the compare-and-swap operation. It can be seen that for the example 8-input sorting block the critical path delay is $(1+2+3)\,T_{C\&S} = 6\,T_{C\&S}$, where $T_{C\&S}$ is the critical path delay of the C&S unit. In general, for a $2^i$-input sorting block, the critical path delay is $(1+2+\cdots+i)\,T_{C\&S} = \frac{i(i+1)}{2}\,T_{C\&S}$, which is also the critical path delay of the overall LLR-SCL decoder.
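The depth formula can be checked with a small software model of Batcher's odd-even merge sorting network. This is a sketch for illustration, not a description of the architecture in Fig. 9: the function names are hypothetical, the input count is assumed to be a power of two, and each compare-and-swap is modeled as moving the larger metric to the lower index.

```python
def oddeven_merge(lo, hi, r):
    """Comparators that merge two sorted halves of wires lo..hi (inclusive)."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)
        yield from oddeven_merge(lo + r, hi, step)
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))
    else:
        yield (lo, lo + r)


def oddeven_merge_sort(lo, hi):
    """Comparator sequence of the sorting network on wires lo..hi (inclusive)."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort(lo, mid)
        yield from oddeven_merge_sort(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)


def network_depth(n):
    """Number of C&S units on the longest path through the n-input network."""
    ready = [0] * n
    for i, j in oddeven_merge_sort(0, n - 1):
        ready[i] = ready[j] = max(ready[i], ready[j]) + 1
    return max(ready)


def select_largest(metrics, keep):
    """Run the network in descending order and return the `keep` largest."""
    x = list(metrics)
    for i, j in oddeven_merge_sort(0, len(x) - 1):
        if x[i] < x[j]:  # compare-and-swap: larger value to the lower index
            x[i], x[j] = x[j], x[i]
    return x[:keep]
```

For L=4, `select_largest(candidates, 4)` applied to the 2L=8 candidate metrics models the selection step, and `network_depth(8)` gives the number of C&S delays on the critical path.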
V. ANALYSIS OF HARDWARE PERFORMANCE
In this section the hardware performance of the LLR-SCL decoder is analyzed. Table I lists the hardware performance of the LLR-SCL decoder and the conventional log-likelihood-based SCL (LL-SCL) decoder [3] [4]. For a fair comparison, we assume that the listed decoders have the same n=1024, L=4 and q-bit quantization. In addition, both of their SC component decoders adopt the line-type architecture [7]. Table I shows that the proposed LLR-SCL decoder is more area-efficient than prior designs. Compared with the LL-SCL architecture, the LLR-SCL decoder reduces the total gate count by 50%. This saving is attributed to the reduced need for data storage and computation for soft messages. Moreover, because the LLR-SCL and LL-SCL decoders have the same critical path delay and latency, the two designs have the same throughput. Therefore, the hardware efficiency of the LLR-SCL decoder, defined as the ratio between throughput and gate count, is 98% higher than that of the LL-SCL design.
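The relation between the two reported figures can be checked with a short calculation. Writing $T$ for throughput, $G$ for gate count and $\eta = T/G$ for hardware efficiency (symbols introduced here for illustration), and using the statement that both decoders have the same throughput:

```latex
\[
\frac{\eta_{\mathrm{LLR}}}{\eta_{\mathrm{LL}}}
= \frac{T / G_{\mathrm{LLR}}}{T / G_{\mathrm{LL}}}
= \frac{G_{\mathrm{LL}}}{G_{\mathrm{LLR}}} = 1.98
\quad\Longrightarrow\quad
\frac{G_{\mathrm{LLR}}}{G_{\mathrm{LL}}} = \frac{1}{1.98} \approx 0.505 .
\]
```

That is, a 98% efficiency gain corresponds to a gate-count reduction of about 49.5%, which is consistent with the quoted 50% after rounding; an exact halving of the gate count would give a 100% gain.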
It is noted that another approach that derives an LLR-based SCL algorithm was reported in [9]. Without prior access to [9], we independently developed the derivation procedure in this paper. Our derivation procedure is different from that of [9] but leads to the same final LLR-SCL form. This illustrates that the inherent
