VLSI ARCHITECTURE FOR AN AREA-EFFICIENT COMPUTATION IN LTE TURBO DECODERS by Aparna, Ms. S. & Pradeep Kumar, Mr. T.
S. Aparna* et al. 
  (IJITR) INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND RESEARCH 
  Volume No.4, Issue No.6,October – November 2016, 4295-4298.  
2320 –5547 @ 2013-2016 http://www.ijitr.com All rights Reserved.  Page | 4295 
VLSI Architecture For An Area-Efficient 
Computation In LTE Turbo Decoders 
Ms. S.APARNA 
M.Tech Student 
Department of ECE 
Vaagdevi College of Engineering 
Warangal, Telangana, India 
Mr. T. PRADEEP KUMAR 
Assistant Professor 
Department of ECE 
Vaagdevi College of Engineering,  
Warangal, Telangana, India
Abstract- Long Term Evolution (LTE) is used to achieve high peak data rates in wireless
communication systems. Turbo codes are used as the channel coding scheme, and the MAP algorithm
is used for decoding. The complexity of the MAP algorithm is reduced by implementing it in the
log domain, giving rise to the Log-MAP algorithm. The main objective of this work is to reduce
the complexity of the state metric computation by employing several algorithms that differ only
in how they implement the correction term. The ACS unit is implemented using the constant
Log-MAP, linear Log-MAP, Max-Log-MAP, multi-step Log-MAP and hybrid Log-MAP algorithms. The
state metric calculation is implemented with a radix-4 Add-Compare-Select (ACS) unit. The
distance calculation required by two concurrent state metric computations can be shared between
them, which gives rise to the Maximum Shared Resource (MSR) architecture. The proposed
implementation of these algorithms reduces the power dissipation, the propagation delay and the
number of logic elements used for the recursion computation in the turbo decoders of an LTE
system. The MSR architecture for recursion computation reduces the number of LUTs by 12.1%
compared with the existing architecture.
 Keywords: Add-Compare-Select (ACS) Unit; Long-Term Evolution (LTE); Turbo Decoder; Wireless
Communications
I. INTRODUCTION 
During the last few years, 3G wireless
communication standards such as HSDPA have firmly
established themselves as an enabling technology
for data-centric communication. The advent of
smartphones, netbooks, and other mobile
broadband devices ushered in an era of
throughput-intensive wireless applications. The
rapid increase in wireless data traffic is now
beginning to strain network capacity, and operators
are looking for novel technologies enabling even
higher data rates than those achieved by HSDPA.
Recently, the new air interface standard LTE (Long
Term Evolution) has been defined by the standards
body 3GPP; it aims at improving data rates by
more than 30× (compared with HSDPA) in the
next few years. Theoretically, LTE supports up to
326.4 Mb/s, whereas the industry plans to realize
the first milestone at about 100 Mb/s within one or
two years.
LTE specifies the use of turbo codes to ensure
reliable communication. Parallel turbo decoding,
which deploys multiple soft-input soft-output
(SISO) decoders operating concurrently, will be the
key to achieving the high data rates offered by
LTE. However, implementing such decoders will be
among the main challenges in terms of
computational intensity and power consumption.
None of the recently reported parallel
turbo decoders achieves the LTE peak data rate, or
offers the power consumption desirable for
battery-powered devices (less than 100 mW at the
100 Mb/s milestone), which indicates that the
architecture design for such decoders is a
challenging task.
Recently, long-term evolution (LTE) advanced has
been nominated as the next-generation wireless
communication standard, which is aimed at higher
peak data rates close to 3 Gb/s. The turbo decoder
specified in LTE proves to be a limiting
block toward this goal due to its iterative decoding
nature, high latency, and significant silicon area
consumption. The decoding procedure is based on
the algorithm for optimal decoding of linear
codes proposed by Bahl et al. Since the
implementation of the actual
maximum a posteriori (MAP) algorithm incurs
very high computational complexity, two
modified forms of the MAP algorithm, i.e., the
max-log-MAP and log-MAP algorithms, are
commonly realized instead.
In these two alternative methods, the MAP core
consists of log-likelihood ratio (LLR) units, as well
as the core units to compute α, β, and γ, i.e., the
forward, backward, and branch metrics,
respectively. The α and β units, due to the
recursive nature of their computation, are the most
challenging units to implement, occupying almost
40% of the whole MAP core area. The γ unit, on
the other hand, is a trivial part of the turbo decoder,
consisting of a few additions. Therefore,
an area-efficient architecture for the α and β metric
computation is highly desirable, and it has always
been a challenge in the literature.
In order to address this challenge, a
new relation between the α and β metrics is
introduced in this paper; based on this relation, a novel
add-compare-select (ACS) unit for forward and
backward computation is proposed. The proposed
scheme results in, at most, an 18.1% reduction in
the silicon area compared with the designs reported
to date.
II. TURBO ENCODING ARCHITECTURE 
Turbo encoders are built by the parallel
concatenation of two recursive systematic
convolutional (RSC) encoders separated by a
single interleaver. Fig. 1 shows the
block diagram of the turbo encoder, which has an
overall rate of 1/3. The input
sequence Ek consists of the binary input
values Ek = [E1, E2, E3, ..., En]. This
sequence is passed through encoder path 1,
producing the systematic output sequence
and the recursive redundant output sequence Okp1,
called the parity1 encoded bits. The input Ek is
also interleaved using a QPP (quadratic
permutation polynomial) or pseudo-random
interleaver. The interleaver is placed between the
two encoders to enhance the performance of the
turbo code; with a pseudo-random
interleaver, the data bits are
read out in a user-designed order. The
interleaved sequence is passed through
encoder path 2, producing a second recursive
redundant output sequence Okp2, called the parity2
encoded sequence. The encoder thus produces three
output bits for every input bit, hence the name
rate-1/3 encoder.
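The encoder structure described above can be sketched in Python. This is a minimal illustration, not the authors' implementation. The generator polynomials (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3) and the QPP parameters f1 = 3, f2 = 10 for a block of 40 bits are assumptions taken from the LTE channel coding specification, since the paper does not state them. Trellis termination is omitted, and all function names are hypothetical.

```python
def qpp_interleave_indices(k, f1, f2):
    """Return the QPP permutation pi(i) = (f1*i + f2*i*i) mod k."""
    return [(f1 * i + f2 * i * i) % k for i in range(k)]


def rsc_parity(bits):
    """Parity output of one 8-state RSC encoder (no trellis termination).

    Assumed generators: feedback 1 + D^2 + D^3, feedforward 1 + D + D^3.
    """
    s1 = s2 = s3 = 0  # shift register contents, s1 is the newest bit
    parity = []
    for u in bits:
        fb = u ^ s2 ^ s3             # feedback bit entering the register
        parity.append(fb ^ s1 ^ s3)  # feedforward taps at 1, D and D^3
        s1, s2, s3 = fb, s1, s2      # shift the register by one position
    return parity


def turbo_encode(bits, f1=3, f2=10):
    """Rate-1/3 encoding: systematic, parity1 and parity2 sequences."""
    pi = qpp_interleave_indices(len(bits), f1, f2)
    interleaved = [bits[j] for j in pi]
    return list(bits), rsc_parity(bits), rsc_parity(interleaved)


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0] * 8  # 40 input bits (arbitrary test pattern)
    systematic, parity1, parity2 = turbo_encode(message)
    print(len(systematic), len(parity1), len(parity2))
```

Each input bit yields one systematic bit and two parity bits, which is where the overall rate of 1/3 comes from.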
 
Fig.1 Turbo Encoder Architecture 
III. TURBO DECODING ARCHITECTURE 
 The turbo decoder mainly consists of serially
connected soft-input soft-output (SISO) decoders
with an interleaver in between and a corresponding
deinterleaver (which performs the reverse operation
of the interleaver). The outputs of the encoder serve
as inputs to the decoder. The decoder thus has
three inputs (the received systematic, parity1 and
parity2 sequences), which on iterative
decoding produce the output Dk. The decoders
considered here are MAP (maximum a posteriori)
decoders. The block representation is shown in
Fig. 2.
MAP decoder 1 receives the systematic and
parity1 data and, on decoding, produces a
soft value that is an extrinsic estimate. These
values are interleaved and fed as input to
MAP decoder 2. After decoding, the output is
sent to the deinterleaver; it now carries the
second extrinsic estimate, which in turn is
fed back to MAP decoder 1.
Iterations continue between MAP
decoders 1 and 2 until the error rate is found to
be null. In the last stage, simple approximations are
performed to obtain the hard decision values at the
MAP decoder 2 stage.
 
Fig.2 Turbo Decoder Architecture 
Turbo decoder algorithms  
In 1974, Bahl, Cocke, Jelinek and Raviv proposed
a decoding algorithm based on a posteriori
probabilities, which later came to be known as the MAP
algorithm. Two other variants of the MAP
algorithm make the computations easier
and faster: the log-MAP and Max-log-MAP
algorithms. The MAP decoders receive the input
sequence and estimate the most likely value of
each input bit. These estimates are expressed as
log-likelihood ratios (LLRs), also called soft
decisions, which have a polarity and an amplitude.
The polarity of the LLR gives the sign
of the bit, and the amplitude gives the reliability
of that decision.
The LLR value is calculated using:
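In its standard textbook form, the LLR of the input bit E_k can be written as below. The received sequence is denoted y here; that symbol is an assumption introduced for illustration.

```latex
L(E_k) = \ln \frac{P(E_k = 1 \mid y)}{P(E_k = 0 \mid y)}
```

A positive value favours E_k = 1, a negative value favours E_k = 0, and the magnitude grows with the confidence of the decision.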
 
where the numerator indicates the APP (a
posteriori probability) of the input bit Ek. The
turbo decoder performs decoding
iteratively: MAP decoder 1 performs
decoding and passes its values to MAP
decoder 2 via the interleaver; MAP decoder 2 then
produces its own estimates and sends them
back to MAP decoder 1, which completes the
first iteration. The values obtained serve as
a priori values for the second iteration. Iterations
are performed until the bit error
rate is reduced to null or approximately null.
The forward, backward and branch metrics are
calculated using the following formulas.
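The standard log-domain forms of these metrics from the turbo decoding literature can be sketched as follows. The notation is assumed for illustration and may differ from the paper's original equations: s' and s are the previous and current trellis states, u_k and p_k are the bipolar (±1) systematic and parity bits on the branch from s' to s, and y_k^u and y_k^p are the corresponding received samples.

```latex
\gamma_k(s', s) = \tfrac{1}{2}\, u_k \left( L_e(u_k) + L_c\, y_k^{u} \right) + \tfrac{1}{2}\, L_c\, p_k\, y_k^{p}

\alpha_k(s) = \max^{*}_{s'} \left[ \alpha_{k-1}(s') + \gamma_k(s', s) \right]

\beta_{k-1}(s') = \max^{*}_{s} \left[ \beta_k(s) + \gamma_k(s', s) \right]

\max^{*}(a, b) = \max(a, b) + \ln\left( 1 + e^{-|a - b|} \right)
```

The last term of max* is the correction term. Dropping it gives the Max-log-MAP recursion.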
 
   
 
  
where α and β are the metrics computed in the forward and
backward trace directions, respectively. Lc and Le
are the channel reliability and the extrinsic
information. In the log-MAP algorithm, the
calculated values are rewritten in the logarithmic
domain to simplify the calculations; the Max-log-MAP
algorithm simplifies them further by dropping the
correction term.
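The algorithm variants named in the abstract differ only in the correction term added to max(a, b). The sketch below compares four of them in Python. The threshold and slope constants are illustrative values commonly quoted in the literature and are assumptions here, not values taken from this paper. The function names are hypothetical.

```python
import math


def max_star_exact(a, b):
    """Log-MAP: Jacobian logarithm, equal to ln(e^a + e^b)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_star_maxlog(a, b):
    """Max-log-MAP: the correction term is dropped entirely."""
    return max(a, b)


def max_star_constant(a, b, c=0.5, t=1.5):
    """Constant log-MAP: fixed correction c while |a - b| < t (assumed values)."""
    return max(a, b) + (c if abs(a - b) < t else 0.0)


def max_star_linear(a, b, slope=-0.24904, t=2.5068):
    """Linear log-MAP: straight-line fit of the correction (assumed values)."""
    d = abs(a - b)
    return max(a, b) + (slope * (d - t) if d < t else 0.0)


if __name__ == "__main__":
    # Print the four results side by side for a few input differences.
    for diff in (0.0, 0.5, 1.0, 2.0, 4.0):
        row = [f(diff, 0.0) for f in (max_star_exact, max_star_maxlog,
                                      max_star_constant, max_star_linear)]
        print(diff, [round(v, 4) for v in row])
```

In hardware, the exact correction is typically read from a small LUT, while the constant and linear forms replace the LUT with a comparator and, for the linear form, a shift-and-add.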
 
ACS UNITS  
The forward and backward recursions
are computed using the ACS architecture. The radix-2
ACS architecture is shown in Fig. 3.
 
Fig.3 ACS Unit for Radix-2 
 
Fig.4 ACS Unit for Radix-4 
The components include adders,
comparators and a selector unit, hence the name
ACS. A LUT (look-up table) is used to
implement the logarithmic correction term. To
increase the processing speed, two radix-2 units
are combined to form a radix-4 unit, as illustrated in
Fig. 4.
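The relation between radix-2 and radix-4 processing can be sketched in Python for the max-log case, where the correction term is omitted. This is an illustration under assumed names and an assumed ordering of the four source states, not the circuit of Fig. 4. It shows that one radix-4 ACS over merged two-step branch metrics gives the same state metric as two cascaded radix-2 ACS steps.

```python
def acs_radix2(path0, path1):
    """Radix-2 compare-select on two path metrics (max-log, no correction).

    Each argument is already the 'add' result: state metric + branch metric.
    """
    return max(path0, path1)


def acs_radix4(path0, path1, path2, path3):
    """Radix-4 compare-select among four two-step path metrics."""
    return max(acs_radix2(path0, path1), acs_radix2(path2, path3))


def two_radix2_steps(alpha, g_first, g_second):
    """Advance one state by two trellis steps with cascaded radix-2 ACS.

    alpha    : four state metrics at time k-2 (hypothetical ordering)
    g_first  : four branch metrics for the k-2 -> k-1 transitions
    g_second : two branch metrics for the k-1 -> k transitions
    """
    mid0 = acs_radix2(alpha[0] + g_first[0], alpha[1] + g_first[1])
    mid1 = acs_radix2(alpha[2] + g_first[2], alpha[3] + g_first[3])
    return acs_radix2(mid0 + g_second[0], mid1 + g_second[1])


def one_radix4_step(alpha, g_first, g_second):
    """Same update in a single radix-4 ACS using merged branch metrics."""
    merged = [g_first[0] + g_second[0], g_first[1] + g_second[0],
              g_first[2] + g_second[1], g_first[3] + g_second[1]]
    return acs_radix4(*(a + g for a, g in zip(alpha, merged)))


if __name__ == "__main__":
    alpha = [0, -3, 5, 2]
    print(two_radix2_steps(alpha, [4, 1, -2, 6], [3, -1]),
          one_radix4_step(alpha, [4, 1, -2, 6], [3, -1]))
```

The radix-4 form processes two trellis stages per recursion step, which shortens the recursion at the cost of extra adders for the merged branch metrics.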
 
Fig.5 Radix-4 ACS Unit with MSR Architecture 
The α, β and γ computations are then as shown
below. According to the trellis diagram, the node
values are calculated as follows:
 
                      
 
One radix-2 unit is used to compute the l and m values, but
another radix-2 unit is necessary for the p and q values. However,
the distances between the inputs of l and p are
equal, and the same holds for m and q. These
distance computations can therefore be shared
between the units, which leads to the novel MSR
architecture and helps reduce the area. It is
illustrated in Fig. 5.
IV. RESULTS 
Fig. 6 Turbo Encoder 
 
Fig. 7 Turbo Decoder 
  
Device Utilization Summary: 
Selected Device: 3s500efg320-4 
Number of Slices:         364 out of 4656     7% 
Number of 4 input LUTs:   660 out of 9312     7% 
Number of IOs:             64 
Number of bonded IOBs:     64 out of  232    27% 
Delay: 10 us 
V. CONCLUSION 
In this paper, a novel method called MSR has
been proposed by investigating the relation between
the recursion computations. Applying
the proposed method to previous ACS
architectures yields an area-efficient architecture
for recursive computations. According to the
implementation results, the proposed architectures
achieve, at most, an 18.1% reduction in
complexity, which significantly reduces
the complexity of the whole MAP core of the turbo
decoder. Furthermore, the proposed method can
also be used for higher-radix designs to reduce
complexity.
REFERENCES 
[1]  A. Ardakani and M. Shabany, “A
novel area-efficient VLSI architecture for
recursion computation in LTE turbo
decoders,” IEEE Trans. Circuits Syst. II,
Exp. Briefs, vol. 62, no. 6, Jun. 2015.
[2]  S. Belfanti, C. Roth, M. Gautschi, C. 
Benkeser, and Q. Huang, “A 1 Gbps LTE-
advanced turbo-decoder ASIC in 65 nm 
CMOS,” in Proc. VLSIC Symp., Jun. 2013, 
pp. C284–C285. 
[3]  L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, 
“Optimal decoding of linear codes for 
minimizing symbol error rate (Corresp.),” 
IEEE Trans. Inf. Theory, vol. IT-20, no. 2, 
pp. 284–287, Mar. 1974. 
[4]  J. Woodard and L. Hanzo, “Comparative 
study of turbo decoding techniques: An 
overview,” IEEE Trans. Veh. Technol., vol. 
49, no. 6, pp. 2208–2233, Nov. 2000.  
[5]  C. Studer, C. Benkeser, S. Belfanti, and Q. 
Huang, “Design and implementation of a 
parallel turbo-decoder ASIC for 3GPP-
LTE,” IEEE J. Solid-State Circuits, vol. 46, 
no. 1, pp. 8–17, Jan. 2011.  
[6]  C. Berrou and A. Glavieux, “Near optimum 
error correcting coding and decoding: 
Turbo-codes,” IEEE Trans. Commun., vol. 
44, no. 10, pp. 1261–1271, Oct. 1996.  
[7]  V. Franz and J. Anderson, “Concatenated 
decoding with a reduced-search BCJR 
algorithm,” IEEE J. Sel. Areas Commun., 
vol. 16, no. 2, pp. 186–195, Feb. 1998.  
[8]  J.-F. Cheng and T. Ottosson, “Linearly 
approximated log-MAP algorithms for turbo 
decoding,” in Proc. IEEE VTC Spring, 
Tokyo, Japan, May 2000, vol. 3, pp. 
2252–2256. 
 
