Polar codes have gained extensive attention during the past few years and have recently been selected for the next generation of wireless communications standards (5G). Successive-cancellation-based (SC-based) decoders, such as SC list (SCL) and SC flip (SCF), provide reasonable error-correction performance for polar codes at the cost of low decoding speed. Fast SC-based decoders, such as Fast-SSC, Fast-SSCL, and Fast-SSCF, identify the special constituent codes in a polar code graph off-line, produce a list of operations, store the list in memory, and feed it to the decoder so that the constituent codes are decoded efficiently in order, thus increasing the decoding speed. However, the list of operations depends on the code rate: whenever the rate changes, a new list must be produced, so fast SC-based decoders are not rate-flexible. In this paper, we propose a completely rate-flexible fast SC-based decoder that creates the list of operations directly in hardware, with low implementation complexity. We further propose a hardware architecture implementing the proposed method and show that, when 5G code rates are supported, the area occupation of the proposed rate-flexible fast SC-based decoder is only 38% of the total area of the memory-based baseline decoder.
I. INTRODUCTION
Polar codes can provably achieve the capacity of a binary memoryless symmetric (BMS) channel under the low-complexity successive-cancellation (SC) decoding algorithm [1]. However, this capacity-achieving property under SC decoding holds only as the code length tends to infinity. For practical code lengths, SC decoding fails to provide reasonable error-correction performance. To improve the error-correction performance of SC decoding, SC list (SCL) [2] and SC flip (SCF) [3] decoders run multiple SC decoders in parallel and in series, respectively. With this improvement, polar codes were selected as a channel coding scheme for the enhanced mobile broadband (eMBB) control channel in the next-generation wireless communications standard (5G).
SC-based decoding algorithms, such as SC, SCL, and SCF, suffer from high latency and low throughput when implemented in hardware. This is due to the serial nature of SC decoding, in which the decoding proceeds bit by bit. To address this issue, polar codes were shown to be a concatenation of smaller constituent codes that can be decoded in parallel [4], [5]. In [6], more constituent codes were identified and low-complexity parallel decoders were designed for them, increasing the throughput of SC decoders even further. It was shown in [7], [8] that the constituent codes can be decoded efficiently under SCL decoding while keeping the error-correction performance of the SCL decoder unaltered. The same approach was applied to the SCF decoder in [9].
The construction of polar codes is based on the identification of reliable bit-channels through which information bits are transmitted. The remaining bit-channels carry fixed values and are called frozen bits. In SC-based decoders, the sequence of frozen and information bits can either be stored in memory or be computed on-line from the bit-channel relative reliability vector and the desired code rate, as proposed in [10]. The latter approach is significantly more efficient for multi-code decoders, and it is facilitated by nested reliability vectors such as the one selected for the 5G eMBB control channel. Therefore, in 5G, the polar encoder and decoder are provided with a vector of bit indices sorted in descending reliability order and with an information length K, from which they extract the frozen/information bit sequence.
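As an illustration of the on-line approach, the following Python sketch derives the frozen/information pattern from a vector of bit indices in descending reliability order and an information length K, and converts that index vector into a rank vector. The function names are hypothetical, and the example ordering is simply the one implied by the reliability vector of Fig. 1, not the 5G sequence.

```python
def info_pattern(q, K):
    """Return s with s[i] = 1 for the K most reliable bit indices in q.

    q lists bit indices in descending reliability order (q[0] is the most
    reliable one), as assumed for the 5G-style interface described above.
    """
    s = [0] * len(q)
    for idx in q[:K]:
        s[idx] = 1  # information bit
    return s


def rank_vector(q):
    """Convert the ordered index list q into a rank vector v (v[q[r]] = r)."""
    v = [0] * len(q)
    for rank, idx in enumerate(q):
        v[idx] = rank
    return v


q = [7, 6, 5, 3, 4, 2, 1, 0]  # ordering implied by Fig. 1 (illustrative)
print(info_pattern(q, 4))  # [0, 0, 0, 1, 0, 1, 1, 1]
print(rank_vector(q))      # [7, 6, 5, 3, 4, 2, 1, 0]
```

Because the information set for K is always a prefix of q, the set obtained for K is contained in the one obtained for K + 1, which is the nesting property that makes on-line computation attractive.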
Fast SC-based decoders rely on the identification of the type and the length of the constituent codes in a polar code. Computing the list of operations of fast SC-based decoders directly requires complicated controller logic [5]. Therefore, the identification of the type and the length of the constituent codes is performed off-line, and the decoding order is stored in a dedicated memory as a list of operations [5], [7], [8]. The decoder fetches the list of operations from memory and decodes the constituent codes one by one, in order. These fast SC-based decoders have two main drawbacks. First, storing the list of operations requires a large amount of memory when implemented in hardware. Second, the list of operations is highly dependent on the rate of the polar code: as the rate changes, the list of operations changes too. Therefore, for 5G applications, which require the support of multiple rates, multiple lists of operations need to be stored in memory.
In this paper, we propose completely rate-flexible fast SC-based decoders by introducing a method that infers the list of operations directly in hardware, without the need to store it in memory. We show that the type and the length of a constituent code in a polar code can be identified with low hardware implementation complexity by checking only a few bits of the constituent code. We further show that the list of operations adapts to the rate of the code, which makes the resulting fast SC-based decoder completely rate-flexible. We design and implement a hardware architecture for the proposed decoder and show that the memory required to store the list of operations can be completely removed, resulting in significantly lower decoder area occupation.
II. PRELIMINARIES
A. Polar Codes
A polar code of length $N = 2^n$ that carries $K$ information bits has rate $R = K/N$ and is denoted by $\mathcal{P}(N, K)$. It can be constructed as $x = uG$, where $x = \{x_0, x_1, \ldots, x_{N-1}\}$ is the vector of coded bits, $u = \{u_0, u_1, \ldots, u_{N-1}\}$ is the vector of input bits, and the generator matrix is $G = B_N F^{\otimes n}$. Here, $B_N$ is the bit-reversal permutation matrix and $F^{\otimes n}$ is the $n$-th Kronecker power of the polarizing matrix $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, which is lower triangular. As $N$ grows large, the polarization phenomenon creates bit-channels that are either completely noisy or completely noiseless, and the fraction of noiseless bit-channels equals the channel capacity. Based on the polarization phenomenon, a bit-channel relative reliability vector $v = \{v_0, v_1, \ldots, v_{N-1}\}$ with $0 \le v_i < N$ is generated for a given block length $N$ and fed into the encoder and the decoder; $v_i$ is the rank of bit-channel $i$. Thus, $v$ is a vector of integers such that, if $v_i < v_j$, then bit-channel $i$ is more reliable than bit-channel $j$. The polar code construction classifies the bit-channels in $u$ into two groups based on $v$: the $K$ most reliable bit-channels, which carry the information bits, and the $N - K$ least reliable bit-channels, which are fixed to a predefined value (here, 0). This classification can be represented as a sequence of binary values $s = \{s_0, s_1, \ldots, s_{N-1}\}$, where $s_i = 1$ if $v_i < K$ (information bit) and $s_i = 0$ otherwise (frozen bit).
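Let $W: \mathcal{X} \to \mathcal{Y}$ be a BMS channel with input alphabet $\mathcal{X} = \{0, 1\}$, output alphabet $\mathcal{Y}$, and transition probabilities $\{W(y \mid x) : x \in \mathcal{X}, y \in \mathcal{Y}\}$. To quantify the reliability of $W$, we use the Bhattacharyya parameter $Z(W) \in [0, 1]$, defined as
$$Z(W) = \sum_{y \in \mathcal{Y}} \sqrt{W(y \mid 0)\, W(y \mid 1)}.$$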
Hence, the good bit-channels are the ones that have the lowest Bhattacharyya parameters.
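The encoding rule $x = uG$ with $G = B_N F^{\otimes n}$ can be written out directly. The following Python sketch is a deliberately naive $O(N^2)$ version for illustration (practical encoders use the $O(N \log N)$ butterfly structure instead); the function names are hypothetical.

```python
def kron_power_F(n):
    """n-th Kronecker power of F = [[1, 0], [1, 1]], as a list of rows."""
    M = [[1]]
    for _ in range(n):
        size = len(M)
        # F kron M = [[M, 0], [M, M]]
        new = [[0] * (2 * size) for _ in range(2 * size)]
        for i in range(size):
            for j in range(size):
                new[i][j] = M[i][j]
                new[i + size][j] = M[i][j]
                new[i + size][j + size] = M[i][j]
        M = new
    return M


def bit_reverse(i, n):
    """Reverse the n-bit binary expansion of i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def polar_encode(u):
    """x = u G over GF(2), with G = B_N F^(kron n)."""
    N = len(u)
    n = N.bit_length() - 1
    Fn = kron_power_F(n)
    x = [0] * N
    for i, ui in enumerate(u):
        if ui:
            # Row i of B_N F^(kron n) is row bit_reverse(i, n) of F^(kron n).
            row = Fn[bit_reverse(i, n)]
            x = [xj ^ gj for xj, gj in zip(x, row)]
    return x


print(polar_encode([0, 1, 0, 0]))  # [1, 0, 1, 0]
```

Frozen positions simply carry $u_i = 0$, so in this sketch they contribute no row to the XOR sum.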
B. SC-Based Decoding
SC-based decoding algorithms can be represented as a depth-first binary tree search with priority given to the left branches, as depicted in Fig. 1. Two kinds of messages are passed between the nodes of the tree: the soft log-likelihood ratio (LLR) values $\alpha = \{\alpha_0, \alpha_1, \ldots, \alpha_{2T-1}\}$, which are passed from a parent node at level $\log_2(2T) = t + 1$ to its child nodes at level $\log_2(T) = t$, and the hard bit estimates $\beta = \{\beta_0, \beta_1, \ldots, \beta_{2T-1}\}$, which are passed from a child node at level $t$ to a parent node at level $t + 1$.
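The $T = 2^t$ elements of the left child node $\alpha^l = \{\alpha^l_0, \alpha^l_1, \ldots, \alpha^l_{T-1}\}$ and of the right child node $\alpha^r = \{\alpha^r_0, \alpha^r_1, \ldots, \alpha^r_{T-1}\}$ can be computed as
$$\alpha^l_i = 2\tanh^{-1}\!\left(\tanh\frac{\alpha_i}{2}\,\tanh\frac{\alpha_{i+T}}{2}\right) \approx \operatorname{sgn}(\alpha_i)\operatorname{sgn}(\alpha_{i+T})\min\left(|\alpha_i|, |\alpha_{i+T}|\right),$$
$$\alpha^r_i = \alpha_{i+T} + \left(1 - 2\beta^l_i\right)\alpha_i,$$
where $\beta^l$ denotes the hard bit estimates returned by the left child node; the min-sum approximation in the first line is the form commonly adopted in hardware.
Fig. 1: SC-based decoding on a binary tree for $\mathcal{P}(8, 4)$ and $v = \{7, 6, 5, 3, 4, 2, 1, 0\}$ ($s = \{0, 0, 0, 1, 0, 1, 1, 1\}$).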
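Assume that the vector $v$ of relative reliabilities of the bit-channels is stored in memory and is available to the decoder. In SC and SCF decoding, when a leaf node is reached, the $i$-th bit $\hat{u}_i$ can be estimated as
$$\hat{u}_i = \begin{cases} 0, & \text{if } v_i \ge K \text{ or } \alpha_i \ge 0, \\ 1, & \text{otherwise}, \end{cases}$$
where $\alpha_i$ is the LLR value reaching the $i$-th leaf node.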
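In SCL decoding, when an information bit is reached, both of its possible values, 0 and 1, are considered. After the estimation of $\hat{u}_i$, the left child and right child node messages $\beta^l$ and $\beta^r$ are combined at the parent node as
$$\beta_i = \begin{cases} \beta^l_i \oplus \beta^r_i, & \text{if } i < T, \\ \beta^r_{i-T}, & \text{otherwise}, \end{cases}$$
where $\oplus$ denotes the bitwise XOR operation.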
The depth-first binary tree search of SC-based decoding algorithms can be represented by a list of operations that can be generated directly in hardware by simple bitwise operations [11], [12]. It is worth mentioning that the list of operations of SC-based decoders is the same for all rates; thus, SC-based decoders are rate-flexible. However, SC-based decoders require at least $2N - 2$ time steps to complete the decoding process [3], [12]. This results in high latency and limits the throughput of polar codes decoded with SC-based decoders.
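The SC recursion described in this subsection (LLR updates, leaf decisions, and $\beta$ combination) and its fixed schedule can be sketched in Python as follows. This is a minimal software model under stated assumptions, not a hardware decoder: it uses the min-sum form of the left-child update, it assumes that the bit-reversal permutation $B_N$ has been undone on the channel LLRs (so a natural-order encoder computing $uF^{\otimes n}$ is used for the round trip), and all function names are hypothetical. It also records each LLR update as `F_t` or `G_t`, counting one time step per update and assuming that $\beta$ combinations and leaf decisions overlap with them.

```python
import math


def f(a, b):
    # Left-child update, min-sum form: sgn(a) sgn(b) min(|a|, |b|).
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))


def g(a, b, beta_l):
    # Right-child update: b + (1 - 2 beta_l) a.
    return b + (1 - 2 * beta_l) * a


def sc_decode(alpha, s, ops):
    """Decode one node with LLRs alpha and frozen (0) / info (1) pattern s.

    Appends "F_t" / "G_t" to ops for every LLR update towards level t and
    returns (u_hat, beta): the bit estimates and the hard output of the node.
    """
    if len(alpha) == 1:
        # Leaf: frozen bits are 0, information bits follow the LLR sign.
        u = 0 if s[0] == 0 or alpha[0] >= 0 else 1
        return [u], [u]
    T = len(alpha) // 2
    t = T.bit_length() - 1  # child nodes have length T = 2^t
    ops.append(f"F_{t}")
    alpha_l = [f(alpha[i], alpha[i + T]) for i in range(T)]
    u_l, beta_l = sc_decode(alpha_l, s[:T], ops)
    ops.append(f"G_{t}")
    alpha_r = [g(alpha[i], alpha[i + T], beta_l[i]) for i in range(T)]
    u_r, beta_r = sc_decode(alpha_r, s[T:], ops)
    # beta_i = beta_l_i XOR beta_r_i for i < T, and beta_{i+T} = beta_r_i.
    beta = [bl ^ br for bl, br in zip(beta_l, beta_r)] + beta_r
    return u_l + u_r, beta


def encode_natural(u):
    """x = u F^(kron n) over GF(2), without bit reversal (butterfly form)."""
    x = list(u)
    N, T = len(x), 1
    while T < N:
        for start in range(0, N, 2 * T):
            for i in range(start, start + T):
                x[i] ^= x[i + T]
        T *= 2
    return x


def to_llr(x):
    # Noiseless BPSK: bit 0 -> +1.0, bit 1 -> -1.0.
    return [1.0 if xi == 0 else -1.0 for xi in x]


def schedule(pattern):
    """Operation list recorded while decoding the all-zero codeword."""
    ops = []
    sc_decode([1.0] * len(pattern), pattern, ops)
    return ops


s = [0, 0, 0, 1, 0, 1, 1, 1]  # P(8, 4) of Fig. 1
u = [0, 0, 0, 1, 0, 0, 1, 1]
x = encode_natural(u)
ops = []
u_hat, beta = sc_decode(to_llr(x), s, ops)
print(u_hat == u, beta == x, len(ops))          # True True 14
print(schedule([0, 1, 1, 1, 1, 1, 1, 1]) == ops)  # True
```

The recorded list has $2N - 2 = 14$ entries for $N = 8$ and is identical for any frozen pattern, which is the rate-independence of plain SC decoding noted above.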
C. Fast SC-Based Decoding
To reduce the latency and increase the throughput of SC-based decoders for polar codes, special node structures are identified and decoded directly from the LLR values at the intermediate levels of the SC-based decoding tree, without traversing the subtrees rooted at them. It was shown in [4], [5] that four such special nodes can be decoded efficiently in fast simplified SC (Fast-SSC) decoding.
Let $s^t = \{s^t_0, s^t_1, \ldots, s^t_{T-1}\}$ represent the subset of $s$ corresponding to a node of length $T$ in a polar code decoding tree, and let $v^t = \{v^t_0, v^t_1, \ldots, v^t_{T-1}\}$ be the corresponding subset of $v$. The four special nodes are the following:
Rate-0 Node: This node consists of only frozen bits, i.e., $v^t_i \ge K$ for all $i \in \{0, 1, \ldots, T-1\}$ ($s^t = \{0, 0, \ldots, 0\}$).
Rate-1 Node: This node consists of only information bits, i.e., $v^t_i < K$ for all $i \in \{0, 1, \ldots, T-1\}$ ($s^t = \{1, 1, \ldots, 1\}$).
Repetition (Rep) Node: This node consists of frozen bits except for the last bit, which is an information bit, i.e., $v^t_{T-1} < K$ and $v^t_i \ge K$ for all $i \in \{0, 1, \ldots, T-2\}$ ($s^t = \{0, \ldots, 0, 0, 1\}$).
Single parity-check (SPC) Node: This node consists of information bits except for the first bit, which is a frozen bit, i.e., $v^t_0 \ge K$ and $v^t_i < K$ for all $i \in \{1, 2, \ldots, T-1\}$ ($s^t = \{0, 1, 1, \ldots, 1\}$).
Recently, five new special nodes were identified in [6], and efficient decoders that can be used in SC decoding were designed for them. These nodes are the following:
Type-I Node: This node consists of frozen bits except for the last two bits, which are information bits, i.e., $v^t_{T-1} < K$, $v^t_{T-2} < K$, and $v^t_i \ge K$ for all $i \in \{0, 1, \ldots, T-3\}$ ($s^t = \{0, \ldots, 0, 1, 1\}$).
Type-II Node: This node consists of frozen bits except for the last three bits, which are information bits, i.e., $v^t_{T-1} < K$, $v^t_{T-2} < K$, $v^t_{T-3} < K$, and $v^t_i \ge K$ for all $i \in \{0, 1, \ldots, T-4\}$ ($s^t = \{0, \ldots, 0, 1, 1, 1\}$).
Type-III Node: This node consists of information bits except for the first two bits, which are frozen bits, i.e., $v^t_0 \ge K$, $v^t_1 \ge K$, and $v^t_i < K$ for all $i \in \{2, 3, \ldots, T-1\}$ ($s^t = \{0, 0, 1, \ldots, 1\}$).
Type-IV Node: This node consists of information bits except for the first three bits, which are frozen bits, i.e., $v^t_0 \ge K$, $v^t_1 \ge K$, $v^t_2 \ge K$, and $v^t_i < K$ for all $i \in \{3, 4, \ldots, T-1\}$ ($s^t = \{0, 0, 0, 1, \ldots, 1\}$).
Type-V Node: This node consists of frozen bits except for the bits $T-5$, $T-3$, $T-2$, and $T-1$, which are information bits, i.e., $v^t_i < K$ for $i \in \{T-5, T-3, T-2, T-1\}$ and $v^t_i \ge K$ for all $i \in \{0, 1, \ldots, T-6\} \cup \{T-4\}$ ($s^t = \{0, \ldots, 0, 1, 0, 1, 1, 1\}$).
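The definitions above amount to fixed frozen/information templates. A direct Python sketch of the full-pattern check, i.e., the kind of comparison a decoder would need if $s^t$ were generated explicitly, could look as follows. The function name is hypothetical, and where two templates coincide for short nodes the first listed type is reported; this tie-breaking is an assumption of the sketch.

```python
def classify_by_pattern(s):
    """Return the special-node type whose template equals s, or None.

    s is the frozen (0) / information (1) pattern of a node of length T.
    Templates follow the definitions above; the first match wins.
    """
    T = len(s)
    templates = [
        ("Rate-0", [0] * T),
        ("Rate-1", [1] * T),
        ("Rep", [0] * (T - 1) + [1]),
        ("SPC", [0] + [1] * (T - 1)),
        ("Type-I", [0] * (T - 2) + [1, 1]),
        ("Type-II", [0] * (T - 3) + [1, 1, 1]),
        ("Type-III", [0, 0] + [1] * (T - 2)),
        ("Type-IV", [0, 0, 0] + [1] * (T - 3)),
        ("Type-V", [0] * (T - 5) + [1, 0, 1, 1, 1]),
    ]
    for name, pattern in templates:
        # Templates that are too long for small T never match (length differs).
        if pattern == list(s):
            return name
    return None


print(classify_by_pattern([0, 0, 0, 1, 0, 1, 1, 1]))  # Type-V
```

Note that the pattern $s = \{0, 0, 0, 1, 0, 1, 1, 1\}$ of Fig. 1 matches the Type-V template for $T = 8$, so the whole code of that example is a single special node once the newer node types are considered.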
Consider again the example of Fig. 1. If the new nodes are not taken into account, $\mathcal{P}(8, 4)$ can be decoded in four time steps by traversing the tree for one level and decoding the resulting Rep and SPC nodes. The resulting list of operations for the decoder is $\{F_2, \text{Rep}_2, G_2, \text{SPC}_2\}$, where $F_t$ and $G_t$ compute the LLRs of the left and right child nodes at level $t$, and $\text{Rep}_t$ and $\text{SPC}_t$ represent the decoding of Rep and SPC nodes of length $T = 2^t$, respectively. As the rate changes, the list of operations changes, and the resulting decoder is not rate-flexible. For applications that support codes with multiple rates, a list of operations has to be stored in memory for each rate to make the decoder rate-flexible. However, this results in high memory usage when implemented in hardware.
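The off-line generation of such a list can be sketched as a recursive traversal that stops at the first Rate-0, Rate-1, Rep, or SPC node, ignoring the five newer node types as in the example above. This Python sketch is illustrative only: the function names and the string encoding of operations (e.g., `"Rep_2"`) are hypothetical, and the $\beta$-combination steps are left implicit, as in the list above.

```python
def node_type(s):
    """Basic Fast-SSC node types (Rate-0, Rate-1, Rep, SPC) from pattern s."""
    T = len(s)
    if s == [0] * T:
        return "Rate0"
    if s == [1] * T:
        return "Rate1"
    if s == [0] * (T - 1) + [1]:
        return "Rep"
    if s == [0] + [1] * (T - 1):
        return "SPC"
    return None


def fast_ssc_operations(v, K):
    """Off-line list of operations for reliability vector v and K info bits."""
    s = [1 if vi < K else 0 for vi in v]  # information bit iff v_i < K

    def visit(seg):
        t = len(seg).bit_length() - 1  # node length T = 2^t
        kind = node_type(seg)
        if kind is not None:
            return [f"{kind}_{t}"]
        half = len(seg) // 2
        return ([f"F_{t - 1}"] + visit(seg[:half])
                + [f"G_{t - 1}"] + visit(seg[half:]))

    return visit(s)


v = [7, 6, 5, 3, 4, 2, 1, 0]  # Fig. 1
print(fast_ssc_operations(v, 4))  # ['F_2', 'Rep_2', 'G_2', 'SPC_2']
print(fast_ssc_operations(v, 5))  # ['F_2', 'Rep_2', 'G_2', 'Rate1_2']
```

Changing K from 4 to 5 replaces the SPC node by a Rate-1 node, so a decoder relying on a stored list would need one list per supported rate.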
III. RATE-FLEXIBLE FAST POLAR DECODING
The high memory usage of storing the list of operations can be mitigated by generating the list of operations in hardware as the decoding proceeds. A rudimentary approach would be to generate the vector $s^t$ from $K$ and the vector $v^t$ using comparators, and to check the pattern of information and frozen bits in $s^t$ for every encountered node. The problem with this approach is that, for nodes of large length, generating $s^t$ from $K$ and $v^t$ and determining the node type incur a high hardware complexity overhead. Moreover, the module that generates the list of operations must account for the largest possible node, i.e., the root node of the decoding tree, of size $N$. This results in a long critical path, which limits the operating frequency.
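To tackle the above issue, the idea is to exploit the inherent order in the Bhattacharyya parameters of the bit-channels. Let $W_i$ and $W_j$ be the bit-channels corresponding to $u_i$ and $u_j$, and let $b_i = \{b_i^{(0)}, b_i^{(1)}, \ldots, b_i^{(n-1)}\}$ and $b_j = \{b_j^{(0)}, b_j^{(1)}, \ldots, b_j^{(n-1)}\}$ be the binary expansions of the integers $i$ and $j$, written with the most significant bit first. In [13], [14], a partial order between the polarized bit-channels was introduced. In particular, it was proven that $W_i$ is stochastically degraded with respect to $W_j$, i.e., $W_i \prec W_j$, when one of the following two properties holds:
Addition Property [15]: There exists $k \in \{0, 1, \ldots, n-1\}$ such that $b_i^{(k)} = 0$, $b_j^{(k)} = 1$, and $b_i^{(m)} = b_j^{(m)}$ for all $m \neq k$.
Left-Swap Property: There exist $l, k \in \{0, 1, \ldots, n-1\}$ with $l < k$ such that $b_i^{(l)} = 0$, $b_i^{(k)} = 1$, $b_j^{(l)} = 1$, $b_j^{(k)} = 0$, and $b_i^{(m)} = b_j^{(m)}$ for all $m \notin \{l, k\}$.
Since stochastic degradation is transitive, the same conclusion holds for any chain of such steps. If $W_i \prec W_j$, then all the reliability measures of $W_i$ are worse than those of $W_j$, i.e., $W_i$ has smaller mutual information, larger Bhattacharyya parameter, and larger error probability. Consequently, if $u_j$ belongs to the frozen set, then $u_i$ also belongs to the frozen set. Furthermore, if $u_i$ belongs to the information set, then $u_j$ also belongs to the information set. By using the two properties above, it was shown in [15] that it suffices to compute the reliability of a sublinear fraction of the bit-channels in order to identify the frozen and the information sets.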
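The addition and left-swap properties can be checked mechanically. The Python sketch below (hypothetical function names; binary expansions written most significant bit first, as in the proofs that follow) searches for a chain of single addition or left-swap steps leading from $b_i$ to $b_j$.

```python
from collections import deque


def one_step_better(i, t):
    """Indices reachable from i by one addition or one left-swap step.

    Position k (0 = most significant) of the t-bit expansion of i is
    integer bit t - 1 - k.
    """
    b = [(i >> (t - 1 - k)) & 1 for k in range(t)]
    out = set()
    for k in range(t):
        if b[k] == 0:  # addition: turn a 0 into a 1
            out.add(i | (1 << (t - 1 - k)))
    for l in range(t):
        for k in range(l + 1, t):
            if b[l] == 0 and b[k] == 1:  # left-swap: move a 1 to position l
                out.add(i ^ (1 << (t - 1 - l)) ^ (1 << (t - 1 - k)))
    return out


def degraded(i, j, t):
    """True if W_i is degraded w.r.t. W_j via a chain of the two properties."""
    seen, queue = {i}, deque([i])
    while queue:
        c = queue.popleft()
        for nxt in one_step_better(c, t):
            if nxt == j:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(degraded(5, 6, 3), degraded(3, 4, 3), degraded(4, 3, 3))  # True False False
```

For $t = 3$, indices 3 ($011$) and 4 ($100$) are not comparable in either direction. These are exactly the positions compared in the proofs for Type-II and Type-IV nodes within a length-8 node ($T - 5 = 3$ and $T - 4 = 4$), which is consistent with those proofs relying on Bhattacharyya bounds rather than on the partial order alone.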
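Another way to order the Bhattacharyya parameters of the bit-channels is described as follows. Consider transmission over a BMS channel $W$ with Bhattacharyya parameter $Z(W)$, and define the synthetic channels $W^0$ and $W^1$ as
$$W^0(y_1, y_2 \mid x_1) = \frac{1}{2} \sum_{x_2 \in \{0, 1\}} W(y_1 \mid x_1 \oplus x_2)\, W(y_2 \mid x_2), \tag{5}$$
$$W^1(y_1, y_2, x_1 \mid x_2) = \frac{1}{2}\, W(y_1 \mid x_1 \oplus x_2)\, W(y_2 \mid x_2). \tag{6}$$
Then, the following inequalities between $Z(W^0)$, $Z(W^1)$, and $Z(W)$ hold:
$$Z(W)\sqrt{2 - Z(W)^2} \le Z(W^0) \le 2Z(W) - Z(W)^2, \qquad Z(W^1) = Z(W)^2, \tag{7}$$
which follow from Proposition 5 of [1] and from Exercise 4.62 of [16]. Furthermore, the bit-channel $W_i$ corresponding to $u_i$ is given by the recursive formula
$$W_i = \Big(\cdots\big((W^{b_i^{(0)}})^{b_i^{(1)}}\big)\cdots\Big)^{b_i^{(n-1)}}, \tag{8}$$
i.e., the transform (5) or (6) is applied to the transmission channel $W$ once for each bit of $b_i$, starting from the most significant one.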
In what follows, we denote by $Z_i$ the Bhattacharyya parameter of $W_i$. We are now ready to state the main results of this paper. The first theorem concerns the identification of Rate-0, Rate-1, Rep, and SPC nodes, and the second theorem concerns the identification of Type-I, Type-II, Type-III, Type-IV, and Type-V nodes.
Theorem 1. Consider a node of length $T = 2^t$ in a polar code of length $N = 2^n$. Then, the following properties hold:
1) If $v^t_{T-1} \ge K$, then the node is a Rate-0 node.
2) If $v^t_0 < K$, then the node is a Rate-1 node.
3) If $v^t_{T-1} < K$ and $v^t_{T-2} \ge K$, then the node is a Rep node.
4) If $v^t_0 \ge K$ and $v^t_1 < K$, then the node is an SPC node.
Proof. 1) Note that $b_{T-1} = \{1, \ldots, 1\}$. By using the addition property, we obtain that $W_i \prec W_{T-1}$ for any $i \in \{0, 1, \ldots, T-2\}$. Hence, as $v^t_{T-1} \ge K$, $v^t_i \ge K$ for any $i \in \{0, 1, \ldots, T-2\}$. This means that the polar code consists of only frozen bits, i.e., it is a Rate-0 node.
2) Note that $b_0 = \{0, \ldots, 0\}$. By using the addition property, we obtain that $W_0 \prec W_i$ for any $i \in \{1, 2, \ldots, T-1\}$. Hence, as $v^t_0 < K$, $v^t_i < K$ for any $i \in \{1, 2, \ldots, T-1\}$. This means that the polar code consists of only information bits, i.e., it is a Rate-1 node.
3) Note that $b_{T-2} = \{1, \ldots, 1, 0\}$. By using the addition and the left-swap properties, we obtain that $W_i \prec W_{T-2}$ for any $i \in \{0, 1, \ldots, T-3\}$. Hence, as $v^t_{T-2} \ge K$, $v^t_i \ge K$ for any $i \in \{0, 1, \ldots, T-3\}$. As $v^t_{T-1} < K$, the polar code consists of frozen bits except for the last bit, which is an information bit, i.e., it is a Rep node.
4) Note that $b_1 = \{0, \ldots, 0, 1\}$. By using the addition and the left-swap properties, we obtain that $W_1 \prec W_i$ for any $i \in \{2, 3, \ldots, T-1\}$. Hence, as $v^t_1 < K$, $v^t_i < K$ for any $i \in \{2, 3, \ldots, T-1\}$. As $v^t_0 \ge K$, the polar code consists of information bits except for the first bit, which is a frozen bit, i.e., it is an SPC node.
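Theorem 1 means that each decision needs at most four comparisons with $K$, independently of the node length $T$. The Python sketch below (hypothetical function names) contrasts this few-entry test with the rudimentary approach of building $s^t$ with $T$ comparators, and checks that both agree on every node of the code of Fig. 1 for every $K$. The agreement relies on $v$ being consistent with the partial order, which is the premise of the theorem and holds for this example.

```python
def classify_theorem1(v_t, K):
    """Node type from at most four entries of v^t, following Theorem 1."""
    T = len(v_t)
    if v_t[T - 1] >= K:
        return "Rate-0"
    if v_t[0] < K:
        return "Rate-1"
    # Here v_t[T-1] < K and v_t[0] >= K, which implies T >= 2.
    if v_t[T - 2] >= K:
        return "Rep"
    if v_t[1] < K:
        return "SPC"
    return None


def classify_full(v_t, K):
    """Reference: build s^t with T comparators and match the whole pattern."""
    s = [1 if x < K else 0 for x in v_t]
    T = len(s)
    if s == [0] * T:
        return "Rate-0"
    if s == [1] * T:
        return "Rate-1"
    if s == [0] * (T - 1) + [1]:
        return "Rep"
    if s == [0] + [1] * (T - 1):
        return "SPC"
    return None


def all_nodes(v):
    """Yield the slice v^t of every node in the decoding tree."""
    N, T = len(v), 1
    while T <= N:
        for start in range(0, N, T):
            yield v[start:start + T]
        T *= 2


v = [7, 6, 5, 3, 4, 2, 1, 0]  # Fig. 1
mismatches = [(K, node) for K in range(len(v) + 1) for node in all_nodes(v)
              if classify_theorem1(node, K) != classify_full(node, K)]
print(mismatches)  # []: the two classifiers agree on every node and rate
```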
Theorem 2. Consider a node of length $T = 2^t$ in a polar code of length $N = 2^n$. Then, the following properties hold:
then the node is a Type-II node.
3) If $v^t_0 \ge K$, $v^t_1 \ge K$, and $v^t_2 < K$, then the node is a Type-III node.
4) If $v^t_0 \ge K$, $v^t_1 \ge K$, $v^t_2 \ge K$, and $v^t_4 < K$, then the node is a Type-IV node.
and $v^t_{T-9} \ge K$, then the node is a Type-V node.
Proof. The proofs of 1), 3), and 5) rely on the addition and the left-swap properties. As they are similar to the proofs of Theorem 1, we omit them; the interested reader can find them in the longer version of this paper [17].
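2) Note that $b_{T-5} = \{1, \ldots, 1, 0, 1, 1\}$. By using the addition and the left-swap properties, we obtain that $W_i \prec W_{T-5}$ for any $i \in \{0, 1, \ldots, T-6\}$. Hence, as $v^t_{T-5} \ge K$, $v^t_i \ge K$ for any $i \in \{0, 1, \ldots, T-6\}$. Furthermore, note that $b_{T-4} = \{1, \ldots, 1, 1, 0, 0\}$. Let $z$ be the Bhattacharyya parameter of the channel obtained by applying the transform (6) $t - 3$ times to the transmission channel $W$. Then, by using (7), we have that
$$Z_{T-5} \le \left(2z - z^2\right)^4, \qquad Z_{T-4} \ge z^2\sqrt{2 - z^4}\,\sqrt{2 - z^4\left(2 - z^4\right)},$$
where the lower bound also uses the fact that $x\sqrt{2 - x^2}$ is increasing for $x \in [0, 1]$. It is easy to check that, for any $z \in [0, 1]$,
$$z^2\sqrt{2 - z^4}\,\sqrt{2 - z^4\left(2 - z^4\right)} \ge \left(2z - z^2\right)^4, \tag{9}$$
which implies that $Z_{T-5} \le Z_{T-4}$. Consequently, as $v^t_{T-5} \ge K$, $v^t_{T-4} \ge K$. Finally, since $v^t_{T-3} < K$, the addition and the left-swap properties give $v^t_{T-2} < K$ and $v^t_{T-1} < K$; hence, the node consists of frozen bits except for the last three bits, which are information bits, i.e., it is a Type-II node.
4) Note that $b_4 = \{0, \ldots, 0, 1, 0, 0\}$. By using the addition and the left-swap properties, we obtain that $W_4 \prec W_i$ for any $i \in \{5, 6, \ldots, T-1\}$. Hence, as $v^t_4 < K$, $v^t_i < K$ for any $i \in \{5, 6, \ldots, T-1\}$. Furthermore, note that $b_3 = \{0, \ldots, 0, 0, 1, 1\}$. Let $z$ be the Bhattacharyya parameter of the channel obtained by applying the transform (5) $t - 3$ times to the transmission channel $W$. Then, by using (7), we have that
$$Z_3 \le \left(2z - z^2\right)^4, \qquad Z_4 \ge z^2\sqrt{2 - z^4}\,\sqrt{2 - z^4\left(2 - z^4\right)}.$$
Since (9) holds for any $z \in [0, 1]$, we obtain that $Z_3 \le Z_4$. Consequently, as $v^t_4 < K$, $v^t_3 < K$. As a result, since $v^t_0 \ge K$, $v^t_1 \ge K$, and $v^t_2 \ge K$, the node consists of information bits except for the first three bits, which are frozen bits, i.e., it is a Type-IV node.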
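To make the chain of bounds in the proofs of items 2) and 4) concrete, the following Python sketch composes the Bhattacharyya bounds $Z(W^0) \ge Z\sqrt{2 - Z^2}$, $Z(W^0) \le 2Z - Z^2$, and $Z(W^1) = Z^2$ along the index suffixes $(1, 0, 0)$ and $(0, 1, 1)$, and evaluates the resulting margin on a grid of $z \in [0, 1]$. It is a numerical sanity check of (9), not a proof, and the function names are hypothetical.

```python
import math


def minus_lower(z):
    # Lower bound on Z(W^0): z * sqrt(2 - z^2), increasing on [0, 1].
    return z * math.sqrt(2.0 - z * z)


def minus_upper(z):
    # Upper bound on Z(W^0): 2z - z^2.
    return 2.0 * z - z * z


def plus(z):
    # Z(W^1) = z^2 holds with equality.
    return z * z


def inequality_9_margin(z):
    """Lower bound for suffix (1, 0, 0) minus upper bound for (0, 1, 1)."""
    lower_100 = minus_lower(minus_lower(plus(z)))  # bounds Z_4 or Z_{T-4}
    upper_011 = plus(plus(minus_upper(z)))         # bounds Z_3 or Z_{T-5}
    return lower_100 - upper_011


grid = [k / 10000 for k in range(10001)]
worst = min(inequality_9_margin(z) for z in grid)
print(worst >= -1e-12)  # True: the margin is never negative on the grid
```

The margin vanishes only at the endpoints $z = 0$ and $z = 1$, where both bounds coincide.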
The proofs for the identification of Rate-0, Rep, SPC, Rate-1, Type-I, Type-III, and Type-V nodes rely on stochastic degradation arguments. Consequently, these results do not depend on the fact that the frozen bits are determined according to the value of the Bhattacharyya parameter. In contrast, the proofs for Type-II and Type-IV nodes use the inequalities (7) for Bhattacharyya parameters. To prove a similar statement for a different reliability measure (e.g., mutual information or error probability), one would need to find bounds of the form (7) for that measure. Let us further clarify that the proofs for Type-II and Type-IV nodes provide an ordering between the Bhattacharyya parameters of bit-channels. As such, they do not depend on the particular technique used to compute those Bhattacharyya parameters. It is also worth mentioning that, since every node in the SC-based decoding tree represents a polar code constructed for a different channel [1], the results in this section are valid for all the nodes in any polar code of any length.
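As one example of a technique for computing Bhattacharyya parameters, the following Python sketch uses the binary erasure channel (BEC), for which the recursive bit-channel construction can be evaluated exactly because $Z(W^0) = 2Z(W) - Z(W)^2$ and $Z(W^1) = Z(W)^2$ hold with equality. The choice of the BEC is an assumption made for illustration (the 5G standard specifies a fixed reliability sequence instead), and the function names are hypothetical. For $N = 8$ and erasure probability 0.5, the resulting ranking coincides with the vector $v$ of Fig. 1.

```python
def bec_bhattacharyya(n, eps):
    """Z_i of all 2^n bit-channels of a BEC(eps).

    Child index 2p + b is obtained from parent p by appending the least
    significant bit b, so the most significant bit of b_i selects the
    first transform applied to the channel.
    """
    z = [eps]
    for _ in range(n):
        nxt = []
        for zi in z:
            nxt.append(2 * zi - zi * zi)  # b = 0: W^0 (exact for the BEC)
            nxt.append(zi * zi)           # b = 1: W^1
        z = nxt
    return z


def reliability_vector(n, eps):
    """Rank vector v: v[i] = 0 for the most reliable (smallest Z) channel."""
    z = bec_bhattacharyya(n, eps)
    order = sorted(range(len(z)), key=lambda i: z[i])
    v = [0] * len(z)
    for rank, i in enumerate(order):
        v[i] = rank
    return v


print(reliability_vector(3, 0.5))  # [7, 6, 5, 3, 4, 2, 1, 0], the v of Fig. 1
```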
IV. HARDWARE IMPLEMENTATION RESULTS
As a proof of concept, we designed a decoder architecture implementing the proposed technique. It implements the layered partitioned SCL (LPSCL) decoding algorithm detailed in [18] and the Fast-SSCL-SPC algorithm introduced in [8], along with the memory-reduction techniques proposed in [19]. The LPSCL decoder reduces the memory requirements of standard SCL decoding by dividing the SC decoding tree into different partitions; the bottom part of the SC decoding tree belonging to each partition is decoded with SCL with list size $L_{\max}$. When information needs to be passed between partitions, i.e., at the top stages of the tree, only $L_t < L_{\max}$ candidate codewords are passed, with $L_t$ decreasing progressively as the stage $t$ increases. The Fast-SSCL-SPC algorithm is applied to the lower stages of the tree, where $L_{\max}$ candidates are considered. This architecture has been described in VHDL and synthesized in TSMC 65 nm CMOS technology. Two versions of the decoder have been implemented: one using the proposed special node identification technique, and one based on the off-line identification and storage used in [8]. Both decoders target the 5G polar code with code length $N = 1024$ and partitioning factor $P = 4$. The bottom part of the SC decoding tree is decoded with list size $L_{\max} = 4$, while $L_{10} = L_9 = 2$ for the upper stages. Additional details about the decoder architecture are provided in the longer version of this paper [17].
