Abstract: Turbo decoding for the advanced 4GPP-LTE wireless communication standard is among the most challenging tasks in cellular devices in terms of computational complexity and power consumption. A standard VLSI implementation of a turbo decoder is costly in memory, latency, area and power, and is limited in throughput. These costs cannot be tolerated in some applications, and the earlier floating point method has pitfalls that degrade performance. A novel fixed point technique for parallel SISO advanced long-term evolution turbo decoders with a new hard-wired 6144-bit interleaver is proposed here to meet the specification. The major contribution of the proposed methodology is the enhancement of throughput in parallel SISO decoders by eliminating the critical data path problem of the floating point method. The signal to noise ratio is calculated to reduce the power consumption, and zero state metrics are used to avoid degradation of the bit error rate. The designed algorithms are used to enhance the error correction capacity.
Introduction
A turbo decoder recovers the transmitted message from the received signal using component decoders and an interleaver. In recent years turbo decoding has attracted growing attention because of its high coding gain. Its performance can be improved by a scaling factor and the associated calculations [1]. When the code word length is large, a Gaussian approximation of the decoder messages can be used to predict the threshold value [2].
Evaluating the actual density function allows the extrinsic information of the decoder to be extracted. For a defined noise figure, the decoder converges to the correct code word [3]. For high data rate applications, decoders use parallel architectures that write to and read from memory without corruption [4]. The effect of finite precision can be analysed to determine the optimal word length, which involves a trade-off between cost and performance [5] [6] [7]. The architecture can optimize the decoder and interleaver blocks to achieve low power and high throughput, which can be obtained by parallel interleaving [8]. The design of parallel decoders yields reduced latency and increased throughput. To improve area efficiency, large blocks are used as the foundation [9]. The decoder is implemented with the help of an algorithm that decreases energy consumption, delay and area, and the performance of a particular code depends on the optimal choice of parameters [10]. High throughput decoding uses a low complexity interleaver, but memory access contention then becomes a problem. An efficient turbo decoder is obtained by parallelizing a long MAP decoder [11].
Turbo codes provide high coding gain. High-speed implementations are difficult to realize because of the complex and recursive structure of the decoder. Collision-free interleavers support the parallel decoding operations. Power can be saved by dynamic reconfiguration in response to the channel conditions [12] [13] [14]. The performance of the decoder can be improved by several approaches such as Viterbi, Shannon's, SISO and parallel decoding, but these algorithms have high complexity [15]. Turbo codes are mainly applied in deep space communication, PCS, 3GPP and IS-2000. These coders operate in serial mode, in which the first data block is processed before the second, and data are never exchanged between users [16]. A flexible and programmable unit can be implemented in the interleaver to store the patterns in a large ROM, but this leads to very low performance [17].
A reliable output can be represented by a soft output based on the extrinsic and a priori information. This soft output can be converted into either a hard decision or a reliability value. Through this approach, parameters such as state, code and iterations are evaluated [18] [19] [20]. Existing decoding schemes have demerits such as complexity, power consumption and computational burden. These can be overcome by the proposed scheme.
The rest of the paper is organised as follows. Previous decoder techniques are reviewed in Section 2, the proposed fixed point turbo decoder methodology is explained in Section 3, the results of the proposed method are compared with existing methods in Section 4, and the conclusion is provided in Section 5.
Related work
Error correction codes (turbo codes) are employed in wireless communication to facilitate near-capacity transmission throughput. High-throughput transmission is challenging for turbo decoders because they operate on the basis of the logarithmic Bahl-Cocke-Jelinek-Raviv (Log-BCJR) algorithm, which has data dependencies. The computing resources of networks-on-chip (NoCs) were also limited as their size scaled up. Aldujaily et al. [21] established a novel turbo decoder algorithm that removes the data dependencies. It was demonstrated with an optimization algorithm that maps the decoder onto the NoC to make good use of the available resources. The scheme obtained a factor of up to 2.13 over the Log-BCJR benchmarker.
Turbo and low-density parity check decoders are implemented in hardware to provide near-optimal error correction. Perez-Andrade et al. [22] developed a turbo decoder with enhanced timing error tolerance to reduce the latency. The proposed timing-error-tolerant decoder operated at 1.2 V with an interval of 2.2 ns, and it was compared with the state of the art at 1.2 V and 4 ns with no further variation of the power supply. The approach reduced the power consumption at a rate of 4 and improved the throughput at a rate of 2.42, so the overall latency was reduced.
Signal corruption can be corrected by analogue error correcting codes. In digital communication, errors are detected by the cyclic redundancy check (CRC). Zanko et al. [23] proposed an alternative decoding method based on an analogue CRC. The long block code was decoupled and decoded as parallel LP/LASSO subproblems, which reduced the complexity compared with one-step convex programming decoding. Compared with least squares decoding, the decoder showed little loss.
Different VLSI architectures were analysed for 3GPP LTE/LTE-Advanced decoders with respect to the trade-off between area and throughput. Verma et al. [24] developed two variants of the quadratic permutation polynomial (QPP) interleaver. These simplify the 'mod' operator and provide the best compromise between delay, power and area. The decoder uses multi-port memory to increase the throughput.
In wireless communication, parallel decoders are used for high data rates at the cost of large hardware resources. Shrestha et al. [25] adopted a memory-reduced back-trace technique to calculate the backward recursion metrics. A comparative analysis of bit error rate (BER) performance was carried out between the MAP decoder and the parallel turbo decoder. Higher data rate benchmarks can be met by parallel turbo decoders with 8 to 64 MAP decoders. The decoders showed hardware savings of 34% and 44%, respectively.
Proposed methodology
In the communication field, wireless communication offers high-speed data transmission over the network to improve the quality of audio, video and broadcast services.
Turbo codes are still effective, so they are applicable to the 4G standard. The decoders are designed to enhance the key parameters of throughput, efficiency, power consumption and latency. Here, the throughput is inversely proportional to the complexity.
Figure 2. Parallel fixed point turbo decoder
The existing methods cannot improve these parameters. This is achieved by the proposed fixed point algorithm for advanced long-term evolution. The proposed flow chart is given in Fig. 1, in which the LTE chain consists of two phases, encoding and decoding. Code block segmentation is performed before encoding to split large blocks into smaller code blocks, after which the length of each code block is checked. The binary input signal is then encoded by the encoder, and the encoded output is modulated to provide the binary key values. Noise is added to this output signal, which is then decoded at the receiver side.
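The segmentation step can be sketched as follows. This is a simplified illustration, assuming the maximum code block size Z = 6144 bits and the 24-bit per-block CRC of 3GPP TS 36.212. It only decides how many code blocks are needed and splits the payload as evenly as possible; the filler bits and the selection of block sizes from the interleaver table are omitted. The function name `segment_code_blocks` is hypothetical.

```python
import math

Z = 6144      # assumed maximum code block size in bits
CRC_LEN = 24  # assumed per-block CRC length when segmentation is needed


def segment_code_blocks(bits):
    """Split a transport block (sequence of 0/1) into code block payloads.

    Simplified sketch: returns payloads only (no CRC attached) and splits
    evenly instead of applying the standard's block size selection.
    """
    total = len(bits)
    if total <= Z:
        return [list(bits)]  # fits in one code block, no segmentation
    # Each segmented block carries at most Z - CRC_LEN payload bits.
    count = math.ceil(total / (Z - CRC_LEN))
    base, extra = divmod(total, count)
    blocks, start = [], 0
    for i in range(count):
        size = base + (1 if i < extra else 0)
        blocks.append(list(bits[start:start + size]))
        start += size
    return blocks


if __name__ == "__main__":
    blocks = segment_code_blocks([0] * 20000)
    print(len(blocks), [len(b) for b in blocks])
```

With a 20000-bit input the sketch produces four payloads of 5000 bits each, all of which leave room for a 24-bit CRC within the 6144-bit limit.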
Tail overlapped decoding algorithm
For turbo decoding, the MAP algorithm is used, but it is memory intensive because the forward and backward state metrics are calculated separately and must be stored. This method incurs latency because each phase waits for the previous phase to complete [26]. The maximum a posteriori algorithm is sensitive to SNR mismatch and to errors in the noise variance. An ASIP structure has been provided to achieve flexible throughput, and it has been analysed for high-speed downlink packet access and long term evolution, but it is not well suited to advanced long term evolution [27]. These limitations can be overcome by the floating point decoder.
Floating point decoding algorithm
This algorithm is compatible with all turbo codes of the long term evolution and WiMAX standards. Its data path is critical because instructions execute along the longest path, where the number of stages required creates delay [28]. It considers only the algorithmic part. The fixed point algorithm is used to avoid the critical path and increase the throughput, and it also strikes a trade-off between the performance measures. The error correction capability can be preserved with the minimum bit widths and number of iterations.
Fully parallel fixed-point decoder
The fixed point decoder is characterized by the minimum bit widths and number of iterations needed to maintain the error correction capability. Fixed point overflow can be reduced through the logarithmic likelihood ratio (LLR) representation, which allows a reduced bit width. The new hard-wired 6144-bit interleaver is proposed for better error correction capability and to decode shorter frames as well, and it is shown in Fig. 2.
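As background for the interleaver, LTE turbo codes use a quadratic permutation polynomial (QPP) interleaver of the kind discussed in [24]. The sketch below generates the permutation and checks that it maps odd indices to odd indices and even to even. The parameters K = 40, f1 = 3, f2 = 10 are the smallest entry of the 3GPP TS 36.212 interleaver table and are used only to keep the example short; the 6144-bit case uses the same formula with its own table entry. The function names are hypothetical, and whether the hard-wired interleaver proposed here is wired exactly this way is an assumption.

```python
def qpp_interleave_indices(k, f1, f2):
    """Return pi with pi[i] = (f1*i + f2*i^2) mod k for i = 0..k-1."""
    return [(f1 * i + f2 * i * i) % k for i in range(k)]


def is_odd_even(pi):
    """True if every index keeps its parity after interleaving."""
    return all((i - p) % 2 == 0 for i, p in enumerate(pi))


if __name__ == "__main__":
    pi = qpp_interleave_indices(40, 3, 10)
    print(pi[:8])
    print("permutation:", sorted(pi) == list(range(40)))
    print("odd-even:", is_odd_even(pi))
```

Because the permutation is fixed for a given frame length, a hard-wired design can realize it as static routing rather than computing the indices at run time.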
The trade-off between the hardware attributes and the error correction capacity, measured by the bit error rate, is shown in Fig. 3. Attributes such as area, complexity, throughput and latency themselves trade off against one another. The trade-off can be addressed either at the hardware level or at the algorithmic level. In a hardware-level approach, the error correction capability is degraded. In the proposed scheme, the trade-off is handled at the algorithmic level by the fixed point turbo decoder algorithm. The termination unit of the proposed decoder is implemented with eight data path stages, which operate on the basis of Eq. (1). This is the conversion of the logarithmic likelihood ratios into the extrinsic state metrics. In the first data path stage, the terms in Eq. (2) are calculated, and in the sixth stage, the parameters in Eq. (3) of the algorithmic blocks are calculated in a backward recursive manner, which needs two stages. The last data path stage performs normalization, and the stages have various bit widths. The termination unit is needed in the decoder before the decoding process starts.
The odd-even interleaver plays an important role in the turbo decoder, separating the odd-indexed and even-indexed operations. During decoding, the extrinsic information is exchanged between the upper and lower decoders through the interleaver, as given in Eq. (4). Each algorithmic block of the decoder accepts a number of forward and backward state metrics. For both the forward and backward state metrics, advanced long-term evolution uses M = 8 states. The index value of an algorithmic block is generated in the previous half cycle, so registers are required to store it across half cycles. At the initial stage of the iteration the a priori message is initialised, and the iteration starts from the initial state of zero.
Here, the term −∞ can be replaced by a negative constant of large magnitude, which allows fixed point arithmetic to be employed in the decoder [29].
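The effect of replacing −∞ by a finite constant can be illustrated with saturating arithmetic. The sketch below is a minimal example under stated assumptions: a 6-bit two's complement metric width, the max-log approximation of the state-metric update, and a toy two-state trellis instead of the eight-state LTE trellis. The names `sat_add` and `forward_step` are hypothetical and none of the values are taken from the proposed design.

```python
BITS = 6                         # assumed metric width, for illustration only
MAX_VAL = 2 ** (BITS - 1) - 1    # largest representable value (31)
MIN_VAL = -(2 ** (BITS - 1))     # most negative value (-32), stands in for -inf


def sat_add(a, b):
    """Add two metrics and clip the result to the fixed point range."""
    return max(MIN_VAL, min(MAX_VAL, a + b))


def forward_step(alpha_prev, gammas, predecessors):
    """One max-log forward update.

    predecessors[s] lists (previous_state, branch_index) pairs that lead
    into state s; gammas holds the branch metrics.
    """
    alpha = []
    for preds in predecessors:
        candidates = [sat_add(alpha_prev[p], gammas[g]) for p, g in preds]
        alpha.append(max(candidates))
    return alpha


if __name__ == "__main__":
    # Start in state 0: the other state gets the large negative constant.
    alpha0 = [0, MIN_VAL]
    gammas = [3, -3]
    predecessors = [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
    print(forward_step(alpha0, gammas, predecessors))
```

A finite constant is not a perfect substitute: adding a positive branch metric to it yields a value slightly above the minimum, so the constant has to be far enough below the useful metric range for such paths never to win the maximization.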
The forward state metrics are given as,
The backward state metrics are obtained by the termination unit of the decoder, and they are given by,
In Eqs. (2) and (3) above, the relevant term can be represented by the binary level of one, and those equations are used here to yield the state metrics. In the tail overlapped algorithm, the estimated message bits are obtained from a binary test, whereas in the fixed point algorithm a hard decision value is obtained.
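For orientation, the textbook max-log form of the BCJR state-metric recursions is given below. The notation (forward metric $\alpha_k$, backward metric $\beta_k$, branch metric $\gamma_k$, previous state $s'$ and next state $s$) is generic and is an assumption here; it is not necessarily the exact form or notation of Eqs. (2) and (3).

```latex
\begin{align*}
\alpha_k(s)     &= \max_{s'} \bigl[\, \alpha_{k-1}(s') + \gamma_k(s', s) \,\bigr], \\
\beta_{k-1}(s') &= \max_{s}  \bigl[\, \beta_k(s) + \gamma_k(s', s) \,\bigr], \\
\alpha_0(0)     &= 0, \qquad \alpha_0(s) = -\infty \quad \text{for } s \neq 0 .
\end{align*}
```

In a fixed point implementation the $-\infty$ initial value is the quantity that has to be replaced by a finite negative constant.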
Result and Discussion
The delay, throughput, latency, bit error rate, complexity and power consumption are analysed in this section.
Delay
The decoding delay is made up of the interleaver delay and the computation delay of the dedicated decoder. In this paper the reduction of delay is achieved by reducing the memory size. Ordinarily, the termination unit causes a delay of two clock cycles before the decoding process starts. In the proposed scheme, the delay is reduced because the termination unit starts at the same time as decoding.
Latency and throughput
The aim of this work is to minimize the latency of the fixed point turbo decoder without changing the convergence behaviour or the number of iterations, since any change in the iterations would affect the coding gain. Here the parallel SISO advanced LTE turbo decoder scheme is proposed to reduce the latency relative to the sequential decoder. The latency per frame can be evaluated by,
The number of iterations is taken as 39 and the clock frequency is about 125 MHz. The latency is then reduced to about 0.4 μs, which is a slight improvement over the existing methods.
In Eq. (6) the latency is inversely proportional to the throughput of the decoder: if the latency decreases, the throughput increases, and conversely a decrease in throughput corresponds to higher latency, as shown in Fig. 4. The throughput can be improved by reducing the required number of iterations, and it depends on the number of decoding units. The parallel architecture of the decoder improves the throughput, which can reach about 1 Gbps. An energy-efficient decoder can also improve the throughput. The throughput can be increased by dividing the information blocks into several segments. The iterative decoding is carried out in fixed point to improve the throughput. Fewer processing units are needed than in the other methods, and no external memory is required for storage. The throughput can be represented by,
The throughput is improved while the number of iterations is increased and the latency of each frame is reduced to 0.4 μs. In this fixed point turbo decoder, the number of iterations is increased to 39 compared with the existing decoders, and the latency is reduced as well. Thus the proposed system yields about 15 Gbps, which is greater than that of the existing methods.
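The reported figures can be cross-checked with a short calculation. The sketch below assumes that throughput is simply the frame length divided by the latency per frame, with one K = 6144 bit frame decoded at a time; this is a plausible reading of the inverse relation between latency and throughput, not necessarily the exact expression used in this paper. The function names are hypothetical.

```python
def throughput_bps(frame_bits, latency_s):
    """Throughput when one frame is decoded per latency period."""
    return frame_bits / latency_s


def latency_cycles(latency_s, clock_hz):
    """Number of clock cycles that fit in the given latency."""
    return latency_s * clock_hz


if __name__ == "__main__":
    frame_bits = 6144     # frame length in bits
    latency_s = 0.4e-6    # latency per frame, 0.4 microseconds
    clock_hz = 125e6      # clock frequency, 125 MHz
    gbps = throughput_bps(frame_bits, latency_s) / 1e9
    print(f"throughput: {gbps:.2f} Gbps")   # about 15.36 Gbps
    print(f"cycles per frame: {latency_cycles(latency_s, clock_hz):.0f}")
```

Under these assumptions a 0.4 μs latency corresponds to roughly 15.36 Gbps, which agrees with the quoted figure of about 15 Gbps, and to about 50 clock cycles per frame at 125 MHz.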
Bit Error Rate
Quantization distortion degrades the bit error performance. The size and structure of the interleaver also affect the bit error rate (BER). A larger distance steepens the slope of the bit error rate curve. Varying the bit widths can improve the error performance, and a higher iteration count gives better error performance than a lower one. Fig. 5 shows the bit error rate. When the signal to noise ratio and the bit width increase, the bit error rate decreases. The red marks indicate the reduced bit widths and the blue marks indicate the increased bit widths. An additive white Gaussian noise channel is included with this decoder. The BER degradation can be eliminated by enlarging the state metrics by at least two bits, but the overall bit widths then become large and the other performance measures degrade. The zero state metric is therefore used here to improve the performance: it normalizes the forward and backward state metrics with respect to their state-zero values, as mentioned in Eqs. (2) and (3). A frame length of K = 6144 bits is used to analyse the bit error rate. The resulting error rate is lower than that of the existing methods.
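The idea behind zero state normalization can be sketched in a few lines. Decisions in the decoder depend only on differences between state metrics, so subtracting the state-zero metric from every metric leaves the result unchanged while keeping the values close to zero. The example below is an illustration with made-up metric values for M = 8 states; the function name `normalize_to_state_zero` is hypothetical.

```python
def normalize_to_state_zero(metrics):
    """Subtract the metric of state 0 from all state metrics."""
    reference = metrics[0]
    return [m - reference for m in metrics]


if __name__ == "__main__":
    # Made-up forward metrics for 8 states after many trellis steps.
    alpha = [100, 97, 103, 90, 99, 101, 95, 98]
    normalized = normalize_to_state_zero(alpha)
    print(normalized)
    # Pairwise differences are preserved, so the most likely state is unchanged.
    print(alpha.index(max(alpha)) == normalized.index(max(normalized)))
```

After normalization the state-zero metric is always zero, so it does not need to be stored, and the remaining metrics stay within a bounded range instead of growing with the trellis length.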
Power Consumption
The channel decoder is energy constrained, as it has a limited energy resource. The error value is reduced in the turbo cliff region. An early stopping criterion consumes a lot of power, and this can be reduced by the fixed point approach. If the processing elements consume more power, the processing capability of each element is reduced. The energy of a decoder of arbitrary length is estimated by scaling the average energy consumption of the processing elements located around the middle row, and it is found by,
In fixed point, the transmission errors are eliminated. The proposed scheme therefore consumes less energy, 3.74 μJ per frame. The total energy consumption is also reduced, and it can be represented by,
In Eq. (9) above, the energy consumption is inversely proportional to the signal to noise ratio: if the decoder consumes more energy, the noise ratio is lower, and vice versa. If the noise ratio is high, the transmission errors are reduced and so is the power consumption, as shown in Fig. 6.
Computational Complexity
The number of iterations determines the complexity of the decoder: if the number of iterations is high, the complexity increases. The complexity is reduced by parallel processing in the decoder and by avoiding the critical path. This method therefore has lower complexity than the previous methods.
Table 1 compares the clock frequency, power consumption, throughput and latency of the fixed point approach with those of the existing methods. The main strength of the proposed scheme is the throughput, which is about 15 Gbps and considerably higher than that of the other methods. The comparison shows that the proposed fixed point turbo decoder performs better than the other decoders while using the standard interleaver. Because the latency, clock frequency and power consumption are optimized, better throughput is obtained.
Conclusion
The design of a parallel turbo decoder for the advanced 4GPP-LTE is developed in this paper. A fixed point algorithm is suggested to facilitate the parallel operation of the decoders. The proposed methodology is accurate in decoding when compared with the tail overlapped, floating point, high performance parallel and network-on-chip turbo decoders. The standard turbo decoder enables efficient odd-even interleavers, and the novelty lies in reducing the complexity of the proposed algorithm to a level of 50%. The interleaver supports the decoding of frames with different interleaver patterns and undefined length. In future, a reconfigurable interleaver will be used for multi-mode environments. The choice of bit width at various levels creates a trade-off between low computational complexity and better bit error rate. The proposed scheme offers better throughput, low latency and reduced power consumption, which is superior to the existing methods.
