Abstract-Channel polarization and polar codes are widely considered major breakthroughs in coding theory because they have shown promising features for future wireless standards. The main drawbacks of polar codes are high latency in decoding hardware and unimpressive error-correction performance when the code length is limited. These two disadvantages limit the implementation of polar codes in low-throughput wireless communication systems. In this paper, we propose a low-complexity, low-latency hardware architecture for the compact soft-decision (16,11) Systematic Successive Cancellation Polar Decoder (S-SCD). Experimental results show that the proposed S-SCD reduces latency by factors of 3.75 and 2.75 compared with the conventional and 2b-SC architectures, respectively. It also shows better BER/FER performance than the RS(15,11) code, which is widely applied in current VLC-based systems.
I. INTRODUCTION
Visible light communication (VLC) refers to short-range optical wireless communication using the visible light spectrum. VLC transmits data by intensity-modulating optical sources, such as light emitting diodes (LEDs) and laser diodes, faster than the persistence of the human eye [1]-[3]. LEDs are also increasingly being adopted in the general illumination market in both the commercial and residential segments, because of their advantages over competing lighting technologies in energy efficiency, longevity, color rendering capability, and environmental factors [4]. However, VLC has certain shortcomings compared to traditional RF communication. The main drawback is that the achievable data rate drops sharply with increasing link distance, which limits the range of high-data-rate VLC use cases [4]. Fortunately, channel reliability and link distance can be increased by forward error correction (FEC) techniques [2], [5], [6].
In communication systems, FEC is an error correction method that encodes data with redundant bits at the transmitter. The redundant data enable the receiver to detect and correct some errors without asking the transmitter to re-transmit the data [7]. FEC techniques are also known as channel coding methods. Current optical networks employ FEC based on classical error-correcting codes such as Reed-Solomon (RS) or Bose-Chaudhuri-Hocquenghem (BCH) codes [8]. Both RS and BCH codes currently use hard-decision-based receivers that have limited coding gain. Fig.1 shows the block diagram of a typical low-data-rate VLC transmitter/receiver. In this system, a concatenated FEC solution is selected for channel encoding/decoding: an RS code is used as the outer code and a Convolutional Code (CC) as the inner code. FEC solutions for the different operation modes of VLC systems are presented in [1].
Polar codes were introduced as a low-complexity channel coding method that can achieve Shannon's channel capacity for any binary-input symmetric discrete memoryless channel [9]. Systematic polar codes (SPC) were proposed by Arikan and are known for their improved bit error rate (BER) performance compared to the original non-systematic polar codes [6], [10]-[12]. The basic decoding algorithm for polar codes is the Successive Cancellation (SC) algorithm, which is a non-iterative sequential algorithm with complexity O(N log N) for a code of length N. Due to their low complexity and high performance, polar codes are now applied in many systems. The main drawback of polar codes is unimpressive error-correction performance when a short code length is used. Many approaches have therefore been introduced to enhance the performance of polar codes and make them feasible in systems that require limited code lengths.
In this paper, we propose applying polar codes as an FEC solution for VLC systems. Our experimental results show that the polar code outperforms the RS code in error-correction performance. We also propose a low-latency, low-resource architecture for the (16,11) soft-decision Systematic Successive Cancellation Polar Decoder (S-SCD).
II. POLAR ENCODING/DECODING
A polar code may be specified completely by (N, K, F), where N is the length of a codeword in bits, K is the number of information bits encoded per codeword, and F is a set of indices known as the information bit indices [10], [13]. For an (N, K, F) polar code, we describe below the encoding operation for a vector of information bits u of length K. The rate of the code is R = K/N. Let $n = \log_2 N$ and let $G = F^{\otimes n} = F \otimes \cdots \otimes F$ (n copies) be the n-fold Kronecker product of Arikan's [11], [12] standard polarizing kernel, $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$.
A codeword is then generated as in Equation 1.
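As an illustration of this construction, the following minimal sketch builds G as the n-fold Kronecker power of the 2x2 kernel and computes a codeword x = dG over GF(2). It assumes that d is the length-N vector carrying the information bits at the information positions and zeros elsewhere; the function names and the example index set are hypothetical and are not taken from the paper.

```python
KERNEL = [[1, 0], [1, 1]]  # Arikan's 2x2 polarizing kernel


def kron(a, b):
    """Kronecker product of two binary matrices (lists of lists)."""
    return [[x & y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def generator_matrix(n):
    """Return the n-fold Kronecker power of KERNEL (a 2^n x 2^n matrix)."""
    g = [[1]]
    for _ in range(n):
        g = kron(g, KERNEL)
    return g


def polar_encode(info_bits, info_idx, n):
    """Non-systematic encoding x = d G over GF(2).

    Assumption: d holds info_bits at the positions in info_idx
    (taken in ascending order) and zeros at all other positions.
    """
    size = 1 << n
    d = [0] * size
    for bit, idx in zip(info_bits, sorted(info_idx)):
        d[idx] = bit
    g = generator_matrix(n)
    return [sum(d[i] & g[i][j] for i in range(size)) % 2
            for j in range(size)]


# Toy example with N = 4 and a hypothetical information set {1, 3}.
codeword = polar_encode([1, 1], [1, 3], 2)
```

The matrix product is written out explicitly for clarity; a hardware or optimized software encoder would use the (N/2) log2 N XOR butterfly structure instead.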
Polar codes in their standard form are non-systematic codes [12]. In other words, the information bits do not appear transparently as part of the codeword. A systematic polar code may be described as equivalent to the original polar code, except that the message vectors are mapped to codewords such that the message bits are explicitly visible. Systematic polar encoding of an information vector u of K bits is the solution of Equation 2,
where y_F (message bit positions) and x_{F^c} (frozen bit positions) are the unknowns. There are exactly N unknowns, shared between x and y. In this paper, we implement the non-recursive systematic polar encoder introduced in [10]. Fig.2 shows an example of encoding with an (8,5) Systematic Polar Encoder (SPE). At the positions of frozen bits, the value of y is 0 and the data flow runs from left to right. At the positions of information bits, the data flow runs from right to left. The encoded bits are obtained at x, and the encoded data x contain the original information bits u. This encoder requires only (N/2) log_2 N XOR operations, which is the same as for a non-systematic polar encoder.
The most popular decoding algorithm for polar codes is the SC algorithm, which was first introduced by Arikan [12]. In fact, we employ the same SC decoder for both systematic and non-systematic codes. In both cases, the decoder takes input y and produces an estimate û of u. For non-systematic coding, the decoder stops after outputting the information-bit part of û (indexed by A). For systematic coding, the decoder has an extra step of computing an estimate x̂ = ûG of x, and produces x̂_A as output. Fig.3 presents the decoding diagram of an S-SCD.
The S-SCD involves calculations using likelihood ratio (LR) values. Storing the LRs directly in floating-point variables is well known to cause underflow or overflow. A popular solution to this problem is to store log-likelihood ratios (LLRs) instead of likelihood ratios. Real-valued calculations in the log domain are used for the F function inside each processing element (PE) (Fig.3). Specifically, the F function in the log domain is presented in Equation 3:
F function:
Approximation form of F function:
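As a hedged sketch of these two forms, the code below uses the update rules commonly found in the SC decoding literature rather than the exact expressions of Equation 3 and Eq.4: the exact F function as 2 atanh(tanh(a/2) tanh(b/2)), its min-sum approximation sign(a) sign(b) min(|a|, |b|), and the G function as b + (1 - 2s) a. The LLR convention ln(P(0)/P(1)), the argument names a, b, s, and the clamping constant are assumptions.

```python
import math


def f_exact(a, b):
    """Exact log-domain F function: 2 * atanh(tanh(a/2) * tanh(b/2))."""
    p = math.tanh(a / 2.0) * math.tanh(b / 2.0)
    # Clamp so that atanh stays finite when tanh saturates to +/-1
    # for large LLR magnitudes (assumed guard value).
    limit = 1.0 - 1e-12
    p = max(-limit, min(limit, p))
    return 2.0 * math.atanh(p)


def f_minsum(a, b):
    """Min-sum approximation: sign(a) * sign(b) * min(|a|, |b|)."""
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    return sign * min(abs(a), abs(b))


def g_func(a, b, s):
    """G function: b + a if the partial sum s is 0, b - a if s is 1."""
    return b - a if s else b + a


# Example: the approximation keeps the sign and slightly overestimates
# the magnitude of the exact value.
exact = f_exact(2.0, -3.0)
approx = f_minsum(2.0, -3.0)
```

The approximation needs only a comparison and a sign XOR, which is why it maps to simple logic in hardware, while the exact form requires hyperbolic functions.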
In hardware implementations, computations based on logarithms and exponentials have high complexity. Normally, the F and G functions can be implemented with simple logic gates and logic circuits by using the approximation form shown in Tab.II and Eq.4.
III. PROPOSED ARCHITECTURE
The proposed architecture of the (16,11) S-SCD is specified in Fig.4. The architecture includes five main parts:
• PEs (Processing Elements): Inside each PE block, we implement one F function and one G function. A SEL signal selects whether the output of the F-function circuit or the G-function circuit is the output of the PE. Fig.5 shows the specification of the PE.
• Modified PE: At the last stage of the PE tree, we modify the architecture of a normal PE by extracting the outputs of both the F-function and G-function circuits as the outputs of the PE. The decoded bit û_i is forwarded to the input of the G-function circuit to decode û_{i+1}.
• Control Finite State Machine (FSM): This block implements an FSM that manages scheduling for the whole S-SCD core. It assigns the SEL and s signals, which control the operation of the PE tree network.
• Decoding (DEC) and FN transform: The DEC and FN transform logic is implemented as a simple combinational circuit.
• Registers: In the proposed architecture, the S-SCD finishes decoding 2 bits in each clock cycle. At each positive clock edge, new input data are loaded into the PEs of stage 3, and the two bits decoded in the previous round are stored in data registers.
Current hardware architectures for SCD focus on high throughput [6], [7], [9], [14], [15] and are expected to be applied in high-speed systems. VLC systems, on the other hand, work mostly in low-data-rate (PHY-I) and medium-data-rate (PHY-II, PHY-III) modes [1]. Data rates range from 11.67 kb/s to 5 Mb/s, with optical clock rates of up to 7.5 MHz. In this paper, we propose a low-latency, low-resource architecture for the S-SCD, in which high throughput is not the highest-priority design criterion. Specifically, we implement a compact (16,11) soft-decision S-SCD. The decoder shows good performance with low implementation complexity. Its code rate (K/N = 11/16) is also comparable to that of RS(15,11), which is the FEC solution in many modes of VLC systems.
Fig.6 shows the processing schedule of the proposed architecture. The (16,11) S-SCD has four stages of PEs. In the conventional architecture [14], one clock cycle is dedicated to the processing of each PE stage, and two clock cycles are spent on the last stage. We propose that the PEs of all four stages be processed within one clock cycle. For the last stage, we make minor modifications so that the last-stage PE delivers two decoded bits at its output.
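As a software reference for the SC recursion that such a PE tree carries out, the sketch below decodes a natural-order polar code (x = u G with G the Kronecker power of the 2x2 kernel) using the min-sum F function and the G function. It is a behavioural model only, not the proposed hardware schedule; the frozen set of the toy (8,5) example, the LLR convention ln(P(0)/P(1)), and all function names are assumptions.

```python
def f_minsum(a, b):
    """Min-sum F function: sign(a) * sign(b) * min(|a|, |b|)."""
    sign = -1 if (a < 0) != (b < 0) else 1
    return sign * min(abs(a), abs(b))


def g_func(a, b, s):
    """G function: b + a if the partial sum s is 0, b - a if s is 1."""
    return b - a if s else b + a


def encode(u):
    """Natural-order polar transform x = u G (recursive butterfly)."""
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    left, right = encode(u[:half]), encode(u[half:])
    return [p ^ q for p, q in zip(left, right)] + right


def sc_decode(llr, frozen):
    """Return (u_hat, x_hat); frozen[i] is True where u_i is fixed to 0."""
    n = len(llr)
    if n == 1:
        bit = 0 if (frozen[0] or llr[0] >= 0) else 1
        return [bit], [bit]
    half = n // 2
    upper, lower = llr[:half], llr[half:]
    # F step: LLRs for the first half of u.
    u_left, x_left = sc_decode(
        [f_minsum(a, b) for a, b in zip(upper, lower)], frozen[:half])
    # G step: uses the re-encoded partial sums of the first half.
    u_right, x_right = sc_decode(
        [g_func(a, b, s) for a, b, s in zip(upper, lower, x_left)],
        frozen[half:])
    x_hat = [p ^ q for p, q in zip(x_left, x_right)] + x_right
    return u_left + u_right, x_hat


# Toy (8,5) example with a hypothetical frozen set {0, 1, 2}.
frozen = [True, True, True, False, False, False, False, False]
u = [0, 0, 0, 1, 0, 1, 1, 0]
x = encode(u)
llr = [4.0 if bit == 0 else -4.0 for bit in x]  # noiseless channel LLRs
u_hat, x_hat = sc_decode(llr, frozen)
```

The decoder returns the re-encoded estimate x_hat alongside u_hat, so a systematic decoder can read its output bits directly from x_hat at the information positions.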
Furthermore, the proposed architecture is based on fixed-point calculations, so the choice of the number of quantization bits Q is very important. Fig.7 shows the BER performance of the 4-bit and 5-bit fixed-point (16,11) SCD compared with its floating-point version. At Q=5, the fixed-point decoder shows BER performance similar to that of the floating-point decoder.
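To make the role of Q concrete, the following sketch quantizes a real-valued LLR to a Q-bit signed integer with saturation. The uniform step size, the symmetric range of plus or minus (2^(Q-1) - 1), and the function name are assumptions chosen for illustration; the paper does not state the quantization scheme used.

```python
def quantize_llr(llr, q_bits=5, step=1.0):
    """Uniform, saturating quantizer to a q_bits signed integer level.

    Assumptions: symmetric range [-(2^(q_bits-1) - 1), 2^(q_bits-1) - 1]
    and a fixed step size between adjacent levels.
    """
    max_level = (1 << (q_bits - 1)) - 1
    level = int(round(llr / step))
    return max(-max_level, min(max_level, level))


# With Q=5 the levels span -15..15; with Q=4 they span -7..7,
# so large LLRs saturate earlier in the 4-bit case.
q5 = [quantize_llr(v, q_bits=5) for v in (3.2, -9.7, 40.0)]
q4 = [quantize_llr(v, q_bits=4) for v in (3.2, -9.7, 40.0)]
```

A symmetric range keeps negation exact, which is convenient because the G function may subtract one LLR from another.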
IV. EXPERIMENTAL RESULTS
Tab.III summarizes the latency, in clock cycles, of the proposed S-SCD compared with the conventional architecture and the 2b-SC concept. The proposed (16,11) S-SCD requires only 8 clock cycles to finish decoding 16 bits, which reduces the latency by factors of 3.75 and 2.75 compared with the conventional and 2b-SC architectures, respectively. However, the maximum clock frequency of the proposed decoder is lower than that of the conventional and 2b-SC architectures. This creates a trade-off between latency and clock frequency of the decoder.
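The reported factors can be cross-checked with a short calculation. The sketch assumes the clock counts usually quoted in the SC decoder literature, 2N - 2 for the conventional architecture and 1.5N - 2 for 2b-SC, and N/2 for the proposed decoder (two bits per clock cycle); these formulas are assumptions, and the function names are hypothetical.

```python
def latency_conventional(n):
    """Assumed clock count of a conventional SC decoder: 2N - 2."""
    return 2 * n - 2


def latency_2b_sc(n):
    """Assumed clock count of the 2b-SC decoder: 1.5N - 2."""
    return 3 * n // 2 - 2


def latency_proposed(n):
    """Two decoded bits per clock cycle: N / 2 clocks."""
    return n // 2


N = 16
speedup_vs_conventional = latency_conventional(N) / latency_proposed(N)
speedup_vs_2b_sc = latency_2b_sc(N) / latency_proposed(N)
```

For N = 16 these assumptions give 30, 22, and 8 clock cycles, which is consistent with the factors of 3.75 and 2.75 stated above.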
As explained in Section III, for low-data-rate systems such as VLC, latency is given higher priority [14] than throughput [16].
Fig.8 and Fig.9 show the bit-error-rate (BER) and frame-error-rate (FER) performance of the proposed soft-decision S-SCD. These figures also show the performance of the reference RS(15,11) code and of the hard-decision SCD and S-SCD. The soft-decision polar decoder performs much better than the hard-decision decoder. Specifically, Fig.8 shows a coding gain of about 2 dB for the soft-decision over the hard-decision (16,11) SCD at BER=1E-4. Because RS(15,11) is widely used in VLC systems, we also compare RS(15,11), in which one symbol comprises 8 bits, with the hard-decision (128,96) S-SCD. In this case, not only are the BER and FER of the hard-decision (128,96) S-SCD better, but the S-SCD also uses the channel more efficiently, since it has a higher code rate (96/128 = 0.75 compared with 11/15 ≈ 0.73). In summary, we propose applying the soft-decision (16,11) S-SCD because of its low complexity and its better BER/FER performance compared with current RS solutions in VLC systems.
Tab.IV summarizes the hardware synthesis results of the proposed (16,11) soft-decision S-SCD. With Q=5 bits, the proposed hardware reaches a maximum frequency of around 73 MHz while keeping resource usage low, with no memory bits used. The synthesis results shown in Tab.IV were obtained by synthesizing the proposed design with Quartus II software. The selected FPGA device is an Altera Cyclone IV EP4CE115F29C7N.
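For orientation, the clock count and the maximum frequency can be combined into a rough decoding time and throughput estimate. The sketch assumes that codewords are decoded back to back with no load or control overhead between them, which the paper does not state; the function names are hypothetical.

```python
def decode_time_ns(clocks, f_mhz):
    """Time to decode one codeword, in nanoseconds."""
    return clocks / f_mhz * 1e3


def throughput_mbps(bits, clocks, f_mhz):
    """Bits delivered per second (in Mb/s) for back-to-back codewords."""
    return bits * f_mhz / clocks


CLOCKS = 8      # clock cycles per (16,11) codeword
F_MAX = 73.0    # approximate maximum frequency in MHz

time_ns = decode_time_ns(CLOCKS, F_MAX)
coded_rate = throughput_mbps(16, CLOCKS, F_MAX)   # coded bits
info_rate = throughput_mbps(11, CLOCKS, F_MAX)    # information bits
```

Under these assumptions one codeword takes roughly 110 ns, well above what the 11.67 kb/s to 5 Mb/s data rates of the targeted VLC modes require.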
V. CONCLUSION
In this paper, we have proposed a low-latency, low-resource architecture for the compact (16,11) soft-decision S-SCD. The proposed decoder shows an improvement in latency compared with the conventional and 2b-SC architectures. We have also shown that the BER/FER performance of the proposed S-SCD is better than that of RS(15,11), which is the current FEC approach in VLC systems. Moreover, hardware synthesis results demonstrate that the proposed S-SCD is a low-complexity FEC solution. The proposed decoder is therefore well suited to VLC systems. As future work, we are building a full VLC system based on an FPGA and customized VLC front-ends, in which the proposed S-SCD is also applied.
