Polar codes are a family of capacity-achieving error-correcting codes, and they have been selected as part of the next-generation wireless communication standard. Each polar code bit-channel is assigned a reliability value, which is used to determine which bits carry information and which are frozen. Relative reliabilities need to be known by both encoders and decoders. In multi-mode systems, where multiple code lengths and code rates are supported, storing the relative reliabilities can lead to high implementation complexity. In this work, we observe patterns among code reliabilities and propose an approximate computation technique that represents the reliabilities of multiple codes through a limited set of variables and update rules. The proposed method allows the trade-off between reliability accuracy and implementation complexity to be tuned. An approximate computation architecture for encoders and decoders is designed and implemented. It occupies 50.7% less area than storage-based solutions, with less than 0.05 dB of error-correction performance degradation. Used within a standard SCL decoder, the proposed architecture results in up to 17.0% less area occupation.
I. INTRODUCTION
Polar codes are a class of error-correcting codes proposed in [1] that can achieve capacity with low-complexity encoding and decoding. Their construction exploits the channel polarization effect: some of the channels through which codeword bits are transmitted (called bit-channels) are more reliable than others. Information bits are transmitted through the most reliable bit-channels, while the least reliable ones are set to a fixed value (frozen bits). The relative order of reliabilities depends on the code length and on the signal-to-noise ratio (SNR) for which the code has been constructed.
The number and positions of information bits in a polar codeword need to be known by both the encoder and the decoder. In encoders, information bits must be correctly interleaved with frozen bits before encoding, and frozen bits must be reset halfway through systematic encoding [2]. Decoders need to be aware of the bit arrangement as well, regardless of the decoding algorithm they implement [1], [3]-[6]. Hardware implementations of encoders and decoders usually treat the frozen-information bit pattern as an input, and thus do not evaluate the cost of storing or computing it. Many encoder and decoder implementations target a single SNR and either one or a limited number of combinations of code lengths and rates [3], [7]-[10]. In that case, it is possible to store the frozen-information bit pattern for each supported code. However, practical applications demand support for a possibly large number of code lengths and rates, and for various SNRs. Multi-mode decoders, and thus encoders, need to provide an even higher degree of flexibility than what can currently be achieved [11]. Within this framework, directly storing the bit pattern for each supported case can lead to prohibitive implementation costs.
978-1-5386-0446-5/17/$31.00 © 2017 IEEE
A few recent works address the problem of efficiently constructing polar code relative reliabilities. Partial orders in the reliability of polar code bit-channels were observed in [12], while a theoretical framework based on β-expansion for fast polar code construction has been proposed in [13]. While these approaches greatly reduce the complexity of computing the relative reliabilities, the cost of implementing them directly in hardware is still very high.
In this work, we propose an approximate method to compute the relative reliabilities of polar codes. The method can be implemented with considerably lower complexity than the direct storage of all values. We also propose a flexible architecture that can be used in both encoders and decoders. The trade-off between implementation complexity and degree of approximation can be tuned according to the application constraints.
The rest of the paper is structured as follows. Section II briefly introduces polar codes. Section III describes the observed patterns in reliabilities and proposes the approximate computation method. Section IV details a case study and evaluates the impact of various approximations on the error-correction performance, whereas Section V details an architecture for the implementation of the proposed technique and provides implementation results. Finally, Section VI draws the conclusions.
II. POLAR CODES
A polar code P(N, K) is a linear block code of length $N = 2^n$ and rate K/N. It is constructed as the concatenation of two polar codes of length N/2, and can be expressed as the matrix multiplication
$$x = u\,G^{\otimes n}, \qquad (1)$$
where $u = \{u_0, u_1, \dots, u_{N-1}\}$ is the input vector, $x = \{x_0, x_1, \dots, x_{N-1}\}$ is the codeword, and the generator matrix $G^{\otimes n}$ is the n-th Kronecker power of the polarizing matrix $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. The polar code structure allows the N-bit input vector u to be sorted according to the reliability of the bit-channels. The reliabilities associated with the bit-channels can be determined either by using the Bhattacharyya parameter [1] or through the direct use of probability functions [14]. The K information bits are assigned to the most reliable bit-channels of u, while the remaining N − K bits (frozen bits) are set to a predetermined value, usually 0. Codeword x is transmitted through the channel, and the decoder receives the log-likelihood ratio (LLR) vector $y = \{y_0, y_1, \dots, y_{N-1}\}$.
The encoding process in Equation (1) can be represented as in Fig. 1, which shows a polar encoding example for P(8, 4) where the frozen bit set $\mathcal{F}$ contains $\{u_0, u_1, u_2, u_4\}$.
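A minimal Python sketch of Equation (1) follows. It assumes the natural-order (non-bit-reversed) form $x = u\,G^{\otimes n}$ over GF(2) and applies G stage by stage with butterflies instead of building the Kronecker power. The helper names `polar_encode` and `build_input` are illustrative only. The frozen set {0, 1, 2, 4} is the P(8, 4) example of Fig. 1.

```python
def polar_encode(u):
    """Return x = u G^{(x)n} over GF(2), with G = [[1, 0], [1, 1]].

    Natural-order encoder (no bit-reversal permutation). Each stage XORs
    the first element of every butterfly with its partner `half` positions away.
    """
    x = list(u)
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "N must be a power of two"
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for j in range(start, start + half):
                x[j] ^= x[j + half]
        half *= 2
    return x


def build_input(message, frozen, n):
    """Place message bits on the non-frozen positions of u (frozen bits = 0)."""
    u = [0] * n
    info_positions = [i for i in range(n) if i not in frozen]
    assert len(message) == len(info_positions)
    for position, bit in zip(info_positions, message):
        u[position] = bit
    return u


u = build_input([1, 0, 1, 1], frozen={0, 1, 2, 4}, n=8)  # P(8, 4) of Fig. 1
print(u)                # [0, 0, 0, 1, 0, 0, 1, 1]
print(polar_encode(u))  # [1, 0, 1, 0, 0, 1, 0, 1]
```

The in-place butterflies cost (N/2) log2 N XOR operations, one row of N/2 per stage, which mirrors the layered structure of the encoding graph.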
Polar codes were defined in [1] together with the successive cancellation (SC) decoder. SC-based decoding can be represented as a full binary tree search, in which the tree is explored depth first, with priority given to the left branches. Fig. 2 shows an example of an SC decoding tree for P(16, 8), where nodes at stage s contain $2^s$ bits. White leaf nodes are frozen bits, while black leaf nodes are information bits.
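The message-passing rules among tree nodes are detailed in Fig. 3. LLR values α are sent from parents to children, which in return send back the hard bit estimates β. For a node at stage s and $0 \le i < 2^{s-1}$, left-branch messages $\alpha^l$ and right-branch messages $\alpha^r$ can be computed in a hardware-friendly way [15] as
$$\alpha^l_i = \operatorname{sgn}(\alpha_i)\,\operatorname{sgn}(\alpha_{i+2^{s-1}})\,\min\left(|\alpha_i|, |\alpha_{i+2^{s-1}}|\right),$$
$$\alpha^r_i = \alpha_{i+2^{s-1}} + \left(1 - 2\beta^l_i\right)\alpha_i,$$
while β is computed as
$$\beta_i = \begin{cases} \beta^l_i \oplus \beta^r_i, & \text{if } i < 2^{s-1},\\ \beta^r_{i-2^{s-1}}, & \text{otherwise},\end{cases}$$
where ⊕ is the bitwise XOR. Due to data dependencies, SC computations need to follow a particular schedule. Every node receives α first, then computes $\alpha^l$, receives $\beta^l$, computes $\alpha^r$, receives $\beta^r$, and finally sends back β. At leaf nodes, $\beta_i$ is set to the estimated bit $\hat{u}_i$:
$$\hat{u}_i = \begin{cases} 0, & \text{if } i \in \mathcal{F} \text{ or } \alpha_i \geq 0,\\ 1, & \text{otherwise}.\end{cases}$$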
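The SC schedule described above can be sketched in Python as an illustrative floating-point model, not a hardware description. It assumes the usual hardware-friendly min-sum form of the left-branch update [15], the standard right-branch and β combination rules, and the convention that a positive LLR favours bit 0. The function names `sc_decode`, `f_minsum` and `g_update` are hypothetical. The tree is explored depth first with priority to the left child.

```python
def sgn(value):
    """Sign function used by the min-sum update (sgn(0) taken as +1)."""
    return -1.0 if value < 0 else 1.0


def f_minsum(a, b):
    """Left-branch LLR: sgn(a) * sgn(b) * min(|a|, |b|)."""
    return sgn(a) * sgn(b) * min(abs(a), abs(b))


def g_update(a, b, beta_left):
    """Right-branch LLR: b + (1 - 2 * beta_left) * a."""
    return b + (1 - 2 * beta_left) * a


def sc_decode(llr, frozen):
    """Successive cancellation decoding of one received LLR vector.

    llr: list of N channel LLRs (positive favours bit 0).
    frozen: set of frozen bit indices (frozen bits are fixed to 0).
    Returns the estimated input vector u_hat in index order.
    """
    u_hat = []

    def visit(alpha, first_index):
        size = len(alpha)
        if size == 1:  # leaf: hard decision, frozen bits forced to 0
            bit = 0 if (first_index in frozen or alpha[0] >= 0) else 1
            u_hat.append(bit)
            return [bit]
        half = size // 2
        alpha_l = [f_minsum(alpha[i], alpha[i + half]) for i in range(half)]
        beta_l = visit(alpha_l, first_index)  # left child is decoded first
        alpha_r = [g_update(alpha[i], alpha[i + half], beta_l[i]) for i in range(half)]
        beta_r = visit(alpha_r, first_index + half)
        # beta_i = beta_l XOR beta_r on the first half, beta_r on the second half
        return [beta_l[i] ^ beta_r[i] for i in range(half)] + beta_r

    visit(list(llr), 0)
    return u_hat


# Noiseless P(8, 4) example: codeword of u = [0, 0, 0, 1, 0, 0, 1, 1], F = {0, 1, 2, 4}.
codeword = [1, 0, 1, 0, 0, 1, 0, 1]
llr = [4.0 if bit == 0 else -4.0 for bit in codeword]
print(sc_decode(llr, {0, 1, 2, 4}))  # [0, 0, 0, 1, 0, 0, 1, 1]
```

The bits are appended to `u_hat` in the order in which the leaves are reached. This order is exactly the sequential bit-by-bit schedule that makes SC decoding inherently serial.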
SC decoding suffers from mediocre error-correction performance at moderate and short code lengths. The SC list (SCL) algorithm described in [4] improves the error-correction performance by storing a set of L codeword candidates, which is updated after every bit estimation.
III. APPROXIMATE RELIABILITY COMPUTATION
Given a polar code P(N, K), let us define the reliability vector p as the length-N sequence of elements $p_i$, where $0 \le i < N$. Each element $p_i$ represents the reliability of bit-channel i, where $p_i = N - 1$ denotes the least reliable bit and $p_i = 0$ the most reliable bit. Index i = 0 refers to the leftmost bit on the decoding tree, while i = N − 1 refers to the rightmost.
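The following sketch makes the rank convention concrete by turning bit-channel reliabilities into p. This work uses the AWGN construction of [14]. As a simpler stand-in, the sketch uses the binary erasure channel (BEC) Bhattacharyya recursion of [1]: $Z \mapsto 2Z - Z^2$ for the degraded channel and $Z \mapsto Z^2$ for the upgraded one, in natural bit order. The function names and the erasure probability of 0.5 are our assumptions. For N = 8 this choice happens to reproduce the vector p = {7, 6, 5, 3, 4, 2, 1, 0} used as an example in Section III-A. BEC and AWGN orders can, however, differ at larger N.

```python
def bec_bhattacharyya(n, erasure_prob):
    """Bhattacharyya parameters Z(W_i) of the N bit-channels of a BEC.

    Natural (non-bit-reversed) order: the bits of i, read MSB first, select
    the degraded transform 2z - z**2 (bit 0) or the upgraded z**2 (bit 1).
    """
    z = [erasure_prob]
    while len(z) < n:
        z = [v for zi in z for v in (2 * zi - zi * zi, zi * zi)]
    return z


def reliability_ranks(z):
    """Map reliabilities to ranks p_i: 0 = most reliable (smallest Z)."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    p = [0] * len(z)
    for rank, index in enumerate(order):
        p[index] = rank
    return p


print(reliability_ranks(bec_bhattacharyya(8, 0.5)))  # [7, 6, 5, 3, 4, 2, 1, 0]
```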
As a few works have shown [12], [13], it is possible to identify regular patterns in the reliability vectors of polar codes, both within the same p and among vectors constructed for polar codes with different lengths. We observe these patterns and propose an efficient approximate method to describe p through variables, update rules, and scaling.
In this section, we describe some of the patterns that we have identified and used to derive a hardware-efficient approximate reliability computation technique. We focus on codes constructed for the AWGN channel with the method used in [14], targeting an SNR of ≈ 6 dB. However, the proposed method can be easily extended to the reliabilities of codes constructed for other SNR values.
We now define some variables that will be useful to explain the proposed method. Let us divide the reliability vector p into two halves, $p_L$ and $p_H$, where $p_L$ contains all $p_i$ for $0 \le i < N/2$, and $p_H$ the remaining ones. We call a series of eight $p_i$ belonging to $p_L$ a reliability byte $p^8_L$, and a series of eight $p_i$ belonging to $p_H$ a reliability byte $p^8_H$. To identify different reliability bytes, we assign an additional subscript to $p^8_L$ and $p^8_H$, as, for example, in $p^8_{B1L}$ and $p^8_{E3H}$.
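The split into halves and reliability bytes takes only a few lines of Python. The helper `split_reliability` is hypothetical, and the N = 16 vector is the exact one quoted in Section III-A. For N = 16 each half is a single reliability byte, whereas for N = 256 each half contains 16 bytes.

```python
def split_reliability(p):
    """Split p into p_L and p_H and group each half into reliability bytes.

    Returns (p_low, p_high, bytes_low, bytes_high); a byte is a list of
    eight consecutive reliabilities (shorter only if a half has < 8 values).
    """
    half = len(p) // 2
    p_low, p_high = p[:half], p[half:]

    def to_bytes(values):
        return [values[k:k + 8] for k in range(0, len(values), 8)]

    return p_low, p_high, to_bytes(p_low), to_bytes(p_high)


p16 = [15, 14, 13, 10, 12, 9, 8, 4, 11, 7, 6, 3, 5, 2, 1, 0]  # exact N = 16 vector
p_low, p_high, bytes_low, bytes_high = split_reliability(p16)
print(bytes_low)   # [[15, 14, 13, 10, 12, 9, 8, 4]]
print(bytes_high)  # [[11, 7, 6, 3, 5, 2, 1, 0]]
```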
A. Intra-code reliability patterns
Observing $p_L$ from i = 0, it is possible to see how, generally, $p_i$ tends to decrease (i.e., the bit-channel becomes more reliable) as i increases: this is because the first bits tend to be the least reliable of the code. In the same way, in $p_H$, starting at i = N − 1, $p_i$ tends to increase as i decreases, since the last bits are usually the most reliable.
Both $p_L$ and $p_H$ can be expressed as a series of variables associated with update rules. As an example, let us consider p for N = 8:
$$p = \{7, 6, 5, 3, 4, 2, 1, 0\},$$
where $p_L = \{7, 6, 5, 3\}$ and $p_H = \{4, 2, 1, 0\}$. We can write these vectors as, for example:
with the values of the variables initialized as N = 8, H = 0, ENDL = 3, ENDH = 4.
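The exact update equations for this example are not reproduced in this text. The following Python sketch therefore uses one hypothetical convention to show how a short variable sequence expands into $p_L$ and $p_H$: each slot emits the current value of its variable, which is then incremented by that variable's update value. Under this emit-then-update assumption, the counter named N must be seeded with N − 1 = 7 rather than 8. H, ENDL and ENDH use the initial values given above. $p_H$ is generated from i = N − 1 downward and then reversed into index order.

```python
def expand_half(structure, init, update, from_top=False):
    """Expand a variable sequence into reliability values (hypothetical scheme).

    structure: variable names in traversal order (i = 0 upward for p_L,
               i = N - 1 downward for p_H when from_top is True).
    init:      initial value of each variable.
    update:    additive update applied after each use (0 if absent).
    """
    current = dict(init)
    values = []
    for name in structure:
        values.append(current[name])          # emit the current value
        current[name] += update.get(name, 0)  # then apply the update rule
    # p_H is traversed from the last index, so restore index order.
    return values[::-1] if from_top else values


# N = 8 example; the seed N - 1 = 7 and the update values are assumptions.
p_low = expand_half(["N", "N", "N", "ENDL"], {"N": 7, "ENDL": 3}, {"N": -1})
p_high = expand_half(["H", "H", "H", "ENDH"], {"H": 0, "ENDH": 4}, {"H": 1}, from_top=True)
print(p_low + p_high)  # [7, 6, 5, 3, 4, 2, 1, 0]
```

In this sketch, four distinct variables and two non-zero update values describe all eight reliabilities, which is the kind of compression the method relies on for longer codes.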
As the code length increases, the regularity of the reliabilities decreases. Either a higher number of variables is needed to represent p exactly, or more irregular update patterns need to be used. For example, for N = 16: p = {15, 14, 13, 10, 12, 9, 8, 4, 11, 7, 6, 3, 5, 2, 1, 0}. With larger code lengths, we can derive an approximate p by limiting the number of variables used and assigning a single, regular update pattern to each variable. As an example, Table I
B. Inter-code reliability patterns
Having defined the code reliability as in Section III-A, it is possible to observe the evolution of p from a smaller code to larger codes. A first observation concerns the reuse of $p_L$ and $p_H$ of a length-N code as part of $p_L$ and $p_H$ of a length-2N code. Fig. 4 shows that the variable sequence found in $p_L$ for length N can approximate the first N/2 positions of $p_L$ for length 2N. A mirrored observation can be made for $p_H$. Different initialization values and different update rules are necessary when the code length is changed, but the same variable sequence can be used. The precision with which variable sequences of shorter codes can approximate parts of longer-code sequences depends on the number of variables. A high number of variables guarantees very good precision and allows a particular variable sequence to be reused for much longer codes. On the other hand, a large number of variables results in higher implementation complexity.
The frequency of occurrence and the position of a variable within p can often be computed exactly. For example, variables N and H in Table I are encountered every $\delta_i = 2\delta_{i-1} + 1$ variables, with $\delta_0 = \delta_1 = 0$. Moreover, the initialization values of many variables can be expressed as a function of the code length. Variable I can be initialized as $\log_2(N) + 1$, while $\text{ENDL} = \log_2(N)$ and $L \approx 6\log_2(N) - 8$.
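The closed-form rules above can be evaluated directly. The short Python sketch below (hypothetical helper names) computes $\delta_i$ with the stated recurrence, which for $i \ge 1$ equals $2^{i-1} - 1$, and the length-dependent initial values of I, ENDL and L. As a sanity check, it confirms that ENDL = log2(N) matches the last element of $p_L$, $p_{N/2-1}$, in the exact N = 8 and N = 16 vectors of Section III-A. The value for L is only the approximation given in the text.

```python
def delta(i):
    """Spacing delta_i = 2 * delta_{i-1} + 1 for i >= 2, with delta_0 = delta_1 = 0."""
    d = 0
    for _ in range(2, i + 1):
        d = 2 * d + 1
    return d


def initial_values(n):
    """Length-dependent initial values quoted in Section III-B (N a power of two)."""
    log_n = n.bit_length() - 1  # exact log2(N) for powers of two
    return {"I": log_n + 1, "ENDL": log_n, "L": 6 * log_n - 8}  # L is approximate


p8 = [7, 6, 5, 3, 4, 2, 1, 0]
p16 = [15, 14, 13, 10, 12, 9, 8, 4, 11, 7, 6, 3, 5, 2, 1, 0]
for p in (p8, p16):
    # ENDL stands for the last element of p_L, i.e. p_{N/2-1}
    assert p[len(p) // 2 - 1] == initial_values(len(p))["ENDL"]

print([delta(i) for i in range(6)])  # [0, 0, 1, 3, 7, 15]
print(initial_values(256))           # {'I': 9, 'ENDL': 8, 'L': 40}
```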
C. Proposed approximate reliability representation
We thus exploit the observed intra- and inter-code reliability patterns to propose an efficient way to store code reliabilities in decoder implementations. A single variable sequence describing p is selected, targeting the maximum supported code length. Shorter code lengths can be derived by considering only some $p^8$, as in Fig. 4. Each variable is assigned an initialization value and an update value for each supported code length. The complete p can be constructed sequentially, starting from i = 0 for $p_L$ and from i = N − 1 for $p_H$. The code structure derived for a certain code length can be extended to codes targeting different SNR points through a limited number of ad hoc $p^8$ substitutions. Theoretical construction approaches such as [13] can guide the selection of variables and update rules. The technique proposed in our work is orthogonal to the construction method.
IV. SIMULATIONS AND PERFORMANCE
As a proof of concept, we provide the full construction method of the approximate reliability for codes of length 8 to 256. Table II lists the variable sequences for the reliability bytes needed to construct codes with a maximum code length of 256. The boldfaced variables in the low (high) half are substituted with ENDL (ENDH) when they represent $p_{N/2-1}$ ($p_{N/2}$). Initialization and update values for all considered code lengths are listed in Table III. This choice of variable placement and number of variables is only one of the possible schemes. A larger number of variables leads to a more precise representation of p, but entails higher storage requirements.
The error-correction performance of the approximated reliability has been evaluated under both SC and SCL decoding. Fig. 5 shows the frame error rate (FER) for the P(64, 32) polar code, both with the original reliabilities and with the approximated ones according to Tables II and III. The degradation in FER brought by the approximation is negligible for both SC and SCL (L = 2 and L = 8): thus, an approximation with 10 variables guarantees sufficient precision. Fig. 6 shows the same type of result for P(256, 128). There, the proposed 24-variable approximation causes significant FER degradation, which is particularly severe with SC. Increasing the number of variables to 32 allows a more precise approximation of p: the red curves show a FER degradation of ≈ 0.05 dB with respect to the ideal case. It is worth noting that the codes considered in Figs. 5 and 6 have rate 1/2, and are thus more susceptible to the imprecisions of the approximation method. In general, the lowest and highest reliabilities are easy to represent with a regular structure, while the middle reliabilities are more complex. Thus, for high- and low-rate codes, the demarcation line between frozen and information bits falls among well-approximated reliabilities, and is less likely to cause degradation.
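The demarcation-line argument above can be checked with a toy experiment. In the Python sketch below, a hypothetical approximation error swaps two adjacent ranks r and r + 1 in the exact N = 16 vector of Section III-A. The information set $\{i : p_i < K\}$ then changes only for K = r + 1. Errors among the middle ranks therefore affect rates around 1/2, while errors among the extreme ranks affect only very low or very high rates. The sketch illustrates the argument only and does not model the FER.

```python
def information_set(p, k):
    """Indices of the K most reliable bit-channels (ranks 0 .. K-1)."""
    return {i for i, rank in enumerate(p) if rank < k}


def affected_dimensions(p, r):
    """Values of K whose information set changes when ranks r and r+1 are swapped."""
    approx = list(p)
    a, b = p.index(r), p.index(r + 1)
    approx[a], approx[b] = r + 1, r  # hypothetical approximation error
    return [k for k in range(1, len(p)) if information_set(p, k) != information_set(approx, k)]


p16 = [15, 14, 13, 10, 12, 9, 8, 4, 11, 7, 6, 3, 5, 2, 1, 0]
print(affected_dimensions(p16, 7))   # [8], i.e. only the rate-1/2 code P(16, 8)
print(affected_dimensions(p16, 0))   # [1]
print(affected_dimensions(p16, 14))  # [15]
```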
V. RELIABILITY COMPUTATION ARCHITECTURE
Reliability vectors need to be stored at both the encoder and the decoder side, and supporting multiple code lengths and rates can lead to high memory requirements. Frozen bit sequences require less memory to store. However, a single reliability vector is sufficient for all code rates, while each frozen bit sequence identifies a single P(N, K). A frozen bit sequence is easily generated by comparing each reliability with a threshold value: if $p_i$ is higher than the threshold, then bit i is among the least reliable bit-channels and is used as a frozen bit. The proposed approximate reliability computation method can be efficiently implemented in hardware. Fig. 7 depicts the architecture at the encoder or decoder side. Two instances of it are used, one for $p_L$ and one for $p_H$. The Structure Memory stores the variable sequence, as shown in Table II. Each variable is represented with 5 bits, sufficient to represent up to $2^5 = 32$ variable types. The Initialization Memory holds the 8-bit initialization values for all combinations of variables and code lengths in $p_L$, and the Update Memory holds the corresponding 5-bit update rules. In our case study, these two memories hold 38 values for $p_L$ and 30 values for $p_H$ in the 24-variable case. In the 32-variable case, 43 values are needed for $p_L$ and 35 for $p_H$. A counter from 0 to N/2 − 1 addresses the Structure Memory. The output variable type, together with the code length, selects the correct variable and update values. The Current Variables memory holds the updated value of all variable types used in the corresponding half of p, appropriately initialized at the beginning of the computation. All memories can be reconfigured through external inputs.
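The threshold comparison described above can be sketched as follows, using the rank convention of Section III (0 = most reliable). The K most reliable bit-channels carry information, so a bit is frozen when $p_i > K - 1$. The function name `frozen_mask` and the choice of K − 1 as the threshold are our reading of the text. The loop at the end shows that a single reliability vector yields the frozen pattern for every code rate of a given length.

```python
def frozen_mask(p, k):
    """Return a list of booleans, True where bit-channel i is frozen in P(N, K).

    p: reliability ranks (0 = most reliable, N - 1 = least reliable).
    k: number of information bits K.
    """
    threshold = k - 1  # ranks 0 .. K-1 carry information
    return [rank > threshold for rank in p]


p8 = [7, 6, 5, 3, 4, 2, 1, 0]
mask = frozen_mask(p8, 4)
print([i for i, is_frozen in enumerate(mask) if is_frozen])  # [0, 1, 2, 4]

# The same vector serves every rate: P(8, K) for K = 1 .. 8.
for k in range(1, 9):
    assert sum(frozen_mask(p8, k)) == 8 - k
```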
The Update Memory stores fixed-point values with four integer bits and one fractional bit, while the Current Variables values are integers. Before the summation, the selected variable is left-shifted by one position, and the result is right-shifted back before the Current Variables memory is updated.
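The following Python sketch models this datapath at the bit level under stated assumptions. The 5-bit Update Memory word is read as a signed two's-complement Q4.1 value, since the sign handling is not specified in the text. The final right shift is arithmetic, i.e. it rounds toward minus infinity. Under these assumptions, a +0.5 update leaves an integer variable unchanged, while a −0.5 update decrements it. The helper names and the four-slot structure for the N = 8 example of Section III-A are hypothetical, as is the emit-then-update order.

```python
UPDATE_BITS = 5  # four integer bits + one fractional bit


def encode_q4_1(value):
    """Encode a multiple of 0.5 as a 5-bit two's-complement Q4.1 word (assumption)."""
    half_units = round(value * 2)
    assert -16 <= half_units <= 15, "out of Q4.1 range"
    return half_units & ((1 << UPDATE_BITS) - 1)


def decode_q4_1(word):
    """Return the signed update in half-units (value * 2)."""
    sign_bit = 1 << (UPDATE_BITS - 1)
    return word - (1 << UPDATE_BITS) if word & sign_bit else word


def apply_update(current, word):
    """Current Variables update: shift left, add the Q4.1 update, shift back."""
    widened = current << 1               # align the integer with the fractional bit
    total = widened + decode_q4_1(word)  # sum in half-units
    return total >> 1                    # arithmetic right shift (floor)


def run_unit(structure, init, update_words):
    """One p-half unit: a counter walks the Structure Memory (N/2 entries)."""
    current = dict(init)                 # Current Variables memory
    out = []
    for var in structure:
        out.append(current[var])         # emit, then update (hypothetical order)
        current[var] = apply_update(current[var], update_words.get(var, 0))
    return out


print(apply_update(7, encode_q4_1(0.5)), apply_update(7, encode_q4_1(-0.5)))  # 7 6
print(run_unit(["N", "N", "N", "ENDL"], {"N": 7, "ENDL": 3}, {"N": encode_q4_1(-1.0)}))
# [7, 6, 5, 3]
```

Keeping the Current Variables integer-valued, as the text describes, keeps the output directly usable as an 8-bit reliability. The fractional bit only influences the rounding of each update.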
Along with the proposed architecture, we have designed a baseline architecture for comparison. If no on-line construction technique is applied, the reliability vector of each supported code must be stored. Thus, the baseline considers a total of 504 8-bit memory elements, along with the upload and reading logic. Both architectures output two reliabilities at each clock cycle. The approximate architectures instantiate two copies of the unit in Fig. 7, so that both $p_L$ and $p_H$ can be computed. With a target frequency of 500 MHz, the area occupation A of the 24-var architecture is 54.4% less than that of the baseline. The multiplexing logic and adders account for a larger share of the area occupation than in the baseline, which reduces the percentage $A_{mem}$ occupied by memory elements. Because of its extra memory requirements and logic, the 32-var architecture achieves a smaller area reduction $A_{red}$ of 50.7%. When the target frequency is increased to 1 GHz, $A_{red}$ decreases for both approximated architectures. This is due to the longer critical path that the reliability computation entails, which leads to higher synthesis effort and logic duplication.
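The baseline count of 504 elements is consistent with storing one 8-bit reliability per bit-channel for every supported length from 8 to 256 (our reading of the setup), where 8 bits suffice for ranks up to N − 1 = 255:

$$\sum_{n=3}^{8} 2^{n} = 8 + 16 + 32 + 64 + 128 + 256 = 504, \qquad 504 \times 8\ \text{bits} = 4032\ \text{bits}.$$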
To evaluate the impact of the proposed architecture on a polar decoder, we synthesized the three techniques listed in Table IV together with the SCL decoder described in [11]. The decoder has been sized for a maximum code length of N = 256, yielding an area occupation of 0.116 mm² for L = 2 and 0.554 mm² for L = 8. The reliability computation architecture leads to an area saving ranging from 2.9% up to 17.0% with respect to SCL decoders that store the full reliability vectors.
VI. CONCLUSION
In this work, we have proposed an approximate approach to polar code reliability computation that can be efficiently implemented in hardware encoders and decoders. We observe regular patterns within the reliability vector of a code and among those of different codes, and use them to express the reliability vectors as a sequence of variables and update rules. Simulations show how the proposed method can be tuned to strike the desired trade-off between accuracy and ease of implementation. A low-complexity architecture has been designed for various degrees of approximation, implemented, and compared to a storage-based solution. It occupies 50.7% less area with < 0.05 dB FER loss. Used within a standard SCL decoder, the proposed architecture results in up to 17.0% less area occupation.
