Novel simplified merged processing element (SMPE) architectures for designing a low-complexity successive-cancellation (SC) polar decoder are presented. The proposed SMPE architectures reduce the number of sign-magnitude conversions and switch networks relative to the conventional merged processing element. Synthesis results show that the (1024, 512) SC polar decoder using the proposed SMPE architectures significantly decreases hardware complexity and improves the technology-scaled normalised throughput compared with previously reported architectures.
Introduction: Polar codes have attracted significant attention among forward error correction codes because they achieve the capacity of binary-input discrete memoryless channels [1]. To date, much of the work on polar codes has addressed their theoretical aspects and has aimed at improving the error correction performance of polar decoders [2]. Several researchers have viewed the successive-cancellation (SC) algorithm as a good candidate for the hardware design of polar decoders because of its low complexity [3] [4] [5] [6]. The semi-parallel SC decoder of Leroux et al. [4] has low processing complexity, but also long latency and low throughput. Since SC decoding has low intrinsic parallelism, look-ahead techniques [6] were proposed to reduce the decoding latency and increase the throughput of SC decoders; however, their hardware complexity is greater than that of the semi-parallel SC decoder. Yuan and Parhi [5] proposed the so-called 2b-SC precomputation decoder, which reduces decoding latency without causing performance loss.
This Letter proposes a novel simplified merged processing element (SMPE) method to design an efficient SC polar decoder. Generally, the merged processing elements (MPEs), which operate on sign-magnitude values, occupy the largest area in the SC polar decoder. The proposed SMPE architectures reduce the number of sign-magnitude conversions and switch networks relative to the conventional MPE. Our results demonstrate that the proposed architecture significantly decreases the hardware complexity and improves the technology-scaled normalised throughput (TSNT) compared with previous works.
SMPE architecture:
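The SC decoding algorithm [1] successively estimates the bits $\hat{u}_i$, $i = 0, \ldots, N - 1$, using the channel output $y$ and the previously estimated bits $\hat{u}_0$ to $\hat{u}_{i-1}$. The SC decoding algorithm based on the log-likelihood ratio (LLR) is also used in the semi-parallel SC decoder [4]. The LLRs are generated by recursively applying the f and g functions in (1) and (2), respectively,
$$f(a, b) = \mathrm{sign}(a)\,\mathrm{sign}(b)\,\min(|a|, |b|) \qquad (1)$$
$$g(a, b, \hat{u}_s) = (-1)^{\hat{u}_s}\,a + b \qquad (2)$$
where $a$ and $b$ are the input LLRs and $\hat{u}_s$ is the binary partial sum of previously estimated bits.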
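The decision rule for each information (non-frozen) bit is
$$\hat{u}_i = \begin{cases} 0, & L_i \ge 0 \\ 1, & \text{otherwise} \end{cases} \qquad (3)$$
where $L_i$ denotes the LLR computed for bit $i$.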
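As a reading aid, the f and g updates and the sign-based decision can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Letter's hardware: it assumes the min-sum form of the f function (sign product times minimum magnitude), integer LLRs, and the convention that a non-negative LLR maps to bit 0. The names `f`, `g`, `decide` and `sign` are illustrative only.

```python
def sign(x: int) -> int:
    """Return +1 for a non-negative value and -1 for a negative one."""
    return 1 if x >= 0 else -1


def f(a: int, b: int) -> int:
    """f function as in (1): product of the signs times the smaller magnitude."""
    return sign(a) * sign(b) * min(abs(a), abs(b))


def g(a: int, b: int, u_s: int) -> int:
    """g function as in (2): a + b when u_s = 0, and -a + b when u_s = 1."""
    return (-a if u_s else a) + b


def decide(llr: int) -> int:
    """Assumed decision convention: bit 0 for a non-negative LLR, else bit 1."""
    return 0 if llr >= 0 else 1


if __name__ == "__main__":
    a, b = 3, -5
    print(f(a, b), g(a, b, 0), g(a, b, 1), decide(f(a, b)))
```

Note that only the sign of the final LLR enters the decision, which is the property the final-stage SMPE 2 exploits.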
Zang et al. [6] proposed an MPE architecture that is used to calculate the pre-computed f and g functions. The conventional MPE [6] consists of a Type 1 processing element (PE), a two's complement to sign-magnitude (TtoS) block, and a sign-magnitude to two's complement (StoT) block, as shown in Fig. 1. The Type 1 PE consists of an adder-subtractor operating in parallel, as shown in Fig. 2. For the full adder, the sum and carry-out bits are denoted by $S$ and $C_{out}$. The difference and borrow-out produced by the full subtractor are denoted by $D$ and $B_{out}$. Equation (1) requires the computation of the minimum magnitude of a and b. The conventional MPE architecture uses a TtoS block to separate the sign and magnitude in order to find this minimum. The magnitude values are then used as the inputs of the Type 1 PE, and its output $B_n$ is used as the control signal of the MUX that selects the minimum magnitude of a and b. Thus, the hardware complexity of the conventional MPE architecture is large because every MPE requires a TtoS block.
In this section, novel SMPE architectures and their design methods are proposed to simplify the hardware design. Fig. 3 shows the proposed SMPE architectures, which find the minimum magnitude of a and b without using a TtoS block. Table 1 shows the Type 1 PE output values, $C_n$ and $B_n$, which are calculated using the SMPE input values. That is, using Table 1, we can calculate min(|a|, |b|) in (1). There are four possible cases for the evaluation of min(|a|, |b|), as follows:
(i) a ≥ 0 and b ≥ 0: When both a and b are non-negative, the sign bits of a and b are both '0'; therefore, $C_n$ is always '0'. $B_n$ has three cases based on the magnitudes of a and b. If |a| > |b| or |a| = |b|, $B_n$ is '0'. If |a| < |b|, $B_n$ is '1'. That is, the $B_n$ values can be used to find min(|a|, |b|) in (1).
(ii) a < 0 and b < 0: When both a and b are negative, the sign bits of a and b are both '1'; therefore, $C_n$ is always '1'. $B_n$ has three cases based on the magnitudes of a and b. If |a| > |b|, $B_n$ is '1'. If |a| < |b| or |a| = |b|, $B_n$ is '0'. That is, the $B_n$ values can be used to find min(|a|, |b|).
(iii) a ≥ 0 and b < 0: When a is non-negative and b is negative, the sign bit of a is '0' and the sign bit of b is '1'; therefore, $B_n$ is always '1'. $C_n$ has three cases based on the magnitudes of a and b. If |a| > |b| or |a| = |b|, $C_n$ is '1'. If |a| < |b|, $C_n$ is '0'. That is, the $C_n$ values can be used to find min(|a|, |b|).
(iv) a < 0 and b ≥ 0: When a is negative and b is non-negative, the sign bit of a is '1' and the sign bit of b is '0'; therefore, $B_n$ is always '0'. If |a| > |b|, $C_n$ is '0'. If |a| < |b| or |a| = |b|, $C_n$ is '1'. That is, the $C_n$ values can be used to find min(|a|, |b|).
Fig. 3a shows the proposed SMPE 1 architecture. SMPE 1 does not require a TtoS block and has only one StoT block, using the method in Table 1. That is, in (2), if $\hat{u}_s = 0$, the output of the g function is (a + b) and does not require an StoT block in SMPE 1. If $\hat{u}_s = 1$, the output of the g function is (−a + b = diff). However, because the Type 1 PE produces (a − b = −diff), a negation block is employed in the output stage of the g function.
Fig. 3b shows the proposed SMPE 2 architecture used in the final stage, in which the decision rule (3) is applied. The output of the f function is generated by the [sign(a) XOR sign(b)] operation. If $\hat{u}_s = 0$, the output of the g function uses the sign bit of the SUM output of the Type 1 PE. If $\hat{u}_s = 1$, the output of the g function is generated using a NOR operation. Therefore, based on (3), the proposed SMPE 2 architecture uses only the sign bit and has a much lower hardware complexity than the conventional MPE.
The conventional MPE uses the $B_n$ output of the Type 1 PE as its MUX control signal. In contrast, the proposed SMPE 1 uses the value of $B_n$ XOR $C_n$ as the control signal of the MUX to find the minimum magnitude of a and b. In addition, the TtoS block and the switch block are removed, and the number of StoT blocks is reduced relative to the conventional MPE.
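The four cases and the $B_n$ XOR $C_n$ control signal can be checked exhaustively with a short behavioural model. The sketch below is not the Letter's circuit: it assumes that $C_n$ is the carry out of the n-bit two's-complement addition a + b and that $B_n$ is the borrow out of the n-bit subtraction a − b, which is consistent with the cases listed above. The word length `N_BITS` and all function names are hypothetical choices for illustration.

```python
N_BITS = 6  # assumed LLR word length; chosen only for this illustration


def carry_and_borrow(a: int, b: int, n: int = N_BITS) -> tuple[int, int]:
    """Return (C_n, B_n) for n-bit two's-complement operands a and b.

    Assumption: C_n is the carry out of a + b and B_n is the borrow out
    of a - b, both computed directly on the raw n-bit patterns.
    """
    mask = (1 << n) - 1
    ua, ub = a & mask, b & mask      # raw n-bit two's-complement patterns
    c_n = (ua + ub) >> n             # carry out of the n-bit adder (0 or 1)
    b_n = 1 if ua < ub else 0        # borrow out of the n-bit subtractor
    return c_n, b_n


def min_magnitude(a: int, b: int, n: int = N_BITS) -> int:
    """Select min(|a|, |b|) using B_n XOR C_n as the MUX control signal."""
    c_n, b_n = carry_and_borrow(a, b, n)
    select_a = b_n ^ c_n             # 1 selects |a|, 0 selects |b|
    return abs(a) if select_a else abs(b)


def check_all(n: int = N_BITS) -> bool:
    """Compare the XOR-based selection with min(|a|, |b|) for every input pair."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    return all(
        min_magnitude(a, b, n) == min(abs(a), abs(b))
        for a in range(lo, hi + 1)
        for b in range(lo, hi + 1)
    )


if __name__ == "__main__":
    print(carry_and_borrow(3, 5))    # case (i) with |a| < |b|
    print(check_all())
```

Under these assumptions, the selection needs no two's complement to sign-magnitude conversion of the inputs: the carry and borrow bits of the raw operands are sufficient to steer the MUX.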
To show that the SC polar decoder using the proposed SMPE architectures is valid and does not degrade the decoding performance, a bit-error-rate (BER) performance curve for the (1024, 512) SC polar code is provided in Fig. 4. According to the BER comparison, the SC polar decoder using the proposed method has the same BER performance as the conventional SC polar decoder.
Implementation and performance comparison: The (1024, 512) SC polar decoder using the proposed SMPE architectures was modelled in Verilog HDL and simulated to verify its functionality. After complete verification of the design functionality, the decoder was synthesised using appropriate time and area constraints. Both the simulation and synthesis steps were carried out using the Synopsys design tool and 65-nm CMOS technology. According to the synthesis results, the estimated total number of NAND gates is 268,200, and the clock speed of the proposed polar decoder is 670 MHz. The TSNT metric is defined as the throughput per thousand gates (Kgate) after scaling to the same technology (65-nm CMOS). Table 2 lists the implementation results for the reported (1024, 512) SC polar decoders. The proposed decoder achieves a 20% reduction in gate count compared with the 2b-SC architecture [5]. The clock speed of the proposed SC polar decoder is improved by 34% relative to that of the semi-parallel architecture [4] in the same CMOS technology. The TSNT of the proposed SC polar decoder is improved by 22.5% relative to that of the 2b-SC architecture [5].
Conclusion: A novel SMPE for efficient SC polar decoder design is proposed. Based on this method, a low-complexity SC polar decoder architecture can be implemented. Results show that the proposed architecture has significant advantages with respect to both hardware complexity and efficiency.
