Successive-cancellation list (SCL) decoding is an algorithm that provides very good error-correction performance for polar codes. However, its hardware implementation requires a large amount of memory, mainly to store intermediate results.
INTRODUCTION
Polar codes provably achieve the symmetric capacity of memoryless channels and therefore have gained a lot of attention as promising error-correcting codes [1] . Successive-cancellation (SC) decoding was first proposed as a low-complexity decoding algorithm for polar codes. It was shown that the error probability of polar codes under SC decoding goes to zero as the blocklength goes to infinity, provided that the rate of the polar code is less than the capacity of the channel. From a hardware implementation point of view, SC decoding can be represented as a decoder tree having a fixed time and space complexity and is thus very attractive [2] . However, the algorithm is sub-optimal, especially for decoding moderate-length polar codes.
To improve the error-correction performance of SC decoding, the SC list (SCL) decoding algorithm was proposed in [3] . Unlike SC decoding, which estimates each bit based on the estimation of previous bits, SCL keeps a constrained list of the most likely candidates at each decoding step using the loglikelihood (LL) of each candidate. SCL reduces the gap between SC and maximum likelihood (ML) decoding at the cost of increased complexity. Furthermore, it was shown that concatenating polar codes with a cyclic redundancy check (CRC) as an outer code improves the performance of SCL to the extent where polar codes decoded with CRC-aided SCL are able to outperform low-density parity-check (LDPC) codes of the same length and rate [3] . To reduce the hardware complexity associated with LL-based SCL decoding, log-likelihood ratio (LLR) values were used and the path metric calculations adapted accordingly in [4] . Unfortunately, similarly to its LLbased counterpart, LLR-based SCL decoding requires a large memory to store the intermediate values, i.e. the total core area is often largely dominated by memory [4] .
In this paper, a partitioned SCL (PSCL) decoding algorithm is proposed in order to reduce the memory requirements associated with SCL decoding. More specifically, PSCL decoding performs SCL decoding on partitions of the decoder tree and only one path candidate is transferred from one partition to the next. As a result, memory can be shared between the different partitions of the code, therefore, significantly reducing the overall memory requirements. Without loss of generality, we propose a CRC-aided scheme.
The paper is organized as follows: Section 2 offers a brief overview on polar encoding and decoding. Section 3 describes the proposed PSCL algorithm and compares its errorcorrection performance with that of conventional SCL decoding. In Section 4 hardware implementation results are presented showing memory and total area savings of up to 41% and 42%, respectively, at similar error-correction performance. Finally, conclusions are drawn in Section 5.
POLAR CODES
A polar code of length N = 2 n which carries K information bits, denoted by P(N, K), has rate R K N and is constructed by concatenating two polar codes of length 
where G ⊗n is the n-th Kronecker product of the polarizing matrix G = 1 0 1 1 . 
SC works on a decoder tree such as the one illustrated in Fig. 2 . There are two types of messages passed through the different levels in a decoder tree: 1) the soft messages which contain the LLR values α; 2) the hard bit estimates β. Each node at level s of the decoder tree contains 2 s bits and the messages in Fig. 2 are calculated as
and
where ⊕ denotes the bitwise XOR operation [2] and β[i] are called partial sums.
To improve the error-correcting performance of SC decoding, for each non-frozen bit, SCL decoding creates two tentative paths on the decoding tree corresponding toû i = 0 andû i = 1. In order to avoid an exponential growth in the number of tentative paths, only the L best (i.e., most likely) paths are kept. Specifically, in LLR-based SCL decoding [4] , the L best paths are determined by the following path metric 
and [4] 
+ LQ PM path metrics
respectively. We note that Q PM grows at most as log N [4] , so the term LQ PM is negligible in Eq. (7).
PARTITIONED SCL DECODING OF POLAR CODES
The large memory requirements of the SCL algorithm translate into a large area occupation in the actual hardware decoder implementation. In fact, the total area is often largely dominated by memory, e.g. the memory area accounts for 45% of the total area in [4] . In order to reduce the required memory and, therefore, the area of the decoder, we propose a partitioned SCL (PSCL) decoding technique.
Proposed PSCL Decoding Algorithm
The conventional CRC-aided SCL decoding algorithm first performs SCL decoding to obtain the L most likely codeword candidates and, in the end, selects the (hopefully) correct estimate by choosing the candidate that matches the expected CRC. If no codeword verifies the CRC, the candidate with the best path metric is selected. In PSCL decoding, on the other hand, the decoder tree is broken into partitions (i.e., subtrees) and SCL decoding is performed only on the partitions, while the standard SC rules are applied to the remainder of the decoding tree. Each partition outputs a single candidate codeword which is selected with the help of a CRC and then sent to the next partition for further decoding. The decoding process starts with the standard SC update rules given by (3) and (4). Therefore, the decoder does not require memory to store L entire trees of internal LLRs, but only L copies of the partitions on which SCL decoding is performed. Fig. 3 and Fig. 4 show the PSCL process when a code is broken into two and four partitions, respectively. The memory used in each CRC-aided SCL decoding block can be shared with the next decoding block since only one candidate survives after decoding each partition. The total memory usage in PSCL with P partitions and list size L can be calculated as
where P ≥ 2 and P = 1 makes PSCL decoding equivalent to conventional SCL decoding. Also note that when P = 2, P −2 k=1 N 2 k = 0. It should be noted that the lower bound on the memory usage for PSCL is the memory requirement of the SC algorithm and the upper bound is the memory required by SCL with list size L. Fig. 5 illustrates the PSCL memory usage with different numbers of partitions and list sizes for a polar code with N = 2048, Q α = 6 bits, and Q PM = 8 bits. As it can be seen in the figure, the amount of memory decays exponentially towards the SC bound as the number of partitions increases. In other words, a small increase in the number of partitions results in significant savings, e.g. using four partitions in PSCL4 is expected to require less memory than SCL2.
Error-Correction Performance
Fig . 6 shows the frame error rate (FER) and bit error rate (BER) performance of SCL and PSCL. The error-correction performance of the plain SC algorithm is also included as a reference. SCLL-CRCx denotes the SCL algorithm with list size L and CRC length x and PSCL(P, L)-CRCx represents the PSCL algorithm with P partitions, list size L, and a CRC of length x. The performance results are provided for a polar code of length N = 2048 and rate R = 1 2 , while a CRC of length 32 is used for the conventional SCL decoding algorithm. To keep the code rate unchanged and to have a fair comparison, PSCL(2, L) uses a CRC of length 16, i.e. each of its two partitions uses a CRC16. Similarly, for PSCL(4, L) each of the four partitions uses a CRC8. The CRC polynomials were taken from [5, 6] . Fig. 6 shows that PSCL(2, 2)-CRC16 has identical FER and BER performance compared to SCL2-CRC32 and there is only a slight deterioration in performance when the code is broken into four partitions, as shown by the PSCL(4, 2)-CRC8 curve. However, in Fig. 6 , it can also be seen that PSCL(4, 4)-CRC8 has superior error-correction performance compared to that of SCL2-CRC32. Furthermore, PSCL(4, 4) actually requires slightly less memory than SCL2, as shown in Fig. 5 . Thus, PSCL achieves better performance and reduces memory usage at the same time. Table 1 presents indicative synthesis results to compare SC, the conventional CRC-aided SCL algorithm with L ∈ {2, 4}, and the proposed PSCL algorithm for L ∈ {2, 4} and P ∈ {2, 4}, for a polar code of blocklength N = 2048. For the CRC-aided SCL algorithm, the hardware architecture of [4] is used while an appropriately modified version of [4] was used for the PSCL algorithm. All synthesis results are for a TSMC 90 nm CMOS library (1 V, 25
HARDWARE IMPLEMENTATION RESULTS
• C) with a target frequency of 500 MHz. All decoders have an equal latency of 5248 clock cycles (10.5 µs) and throughput of 164 Mbps.
From Table 1 , we observe that the PSCL(2, 2) and PSCL(4, 2) decoders require 23% and 41% less memory area than the SCL2 decoder, respectively. The PSCL (4, 4) decoder implementation is shown to require 23% less memory area than the SCL2 decoder while offering a better coding gain by approximately 0.25 dB at a BER of 10 −5 . Regardless of the implementation, the memory area of the list decoders amounts to 40%-45% of the total area. The memory savings observed for the PSCL implementations thus translate into very significant reductions in the total area, making them very attractive compared to the conventional SCL decoders.
CONCLUSION
In this paper, we have proposed a novel partitioned list decoding algorithm for polar codes. In this algorithm, the code is broken into partitions and each partition is decoded with a CRC-aided successive-cancellation list decoder. Since the memory is shared between different partitions in the code, the memory requirements of a hardware implementation of partitioned list decoder is significantly smaller than that of a conventional list decoder without any error-correction performance degradation. Implementation results show that at equivalent error-correction performance, the proposed algorithm leads to memory and total area savings of 41% and 42%, respectively, when compared to a similar list decoder implementation. Moreover, the proposed algorithm enables a coding gain of approximately 0.25 dB at a bit error rate of 10 −5 while occupying 13% less total area than the conventional CRC-aided successive-cancellation list decoder.
