A FRAMEWORK TO DECREASE POWER CONSUMPTION IN DECODING by Sravyatha, S. & Dada Beer, S.
S. Sravyatha* et al. 
  (IJITR) INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND RESEARCH 
  Volume No.4, Issue No.6, October – November 2016, 5024-5026. 
2320 –5547 @ 2013-2016 http://www.ijitr.com All rights Reserved.  Page | 5024 
A Framework To Decrease Power 
Consumption In Decoding 
S.SRAVYATHA 
PG Scholar, Dept of ECE 
Sir C.V. Raman Institute of technology & Sciences, 
Tadipatri, Anantapur(Dt), AP,India 
S.DADA BEER 
Assistant Professor, Dept of ECE 
Sir C.V. Raman Institute of technology & Sciences, 
Tadipatri, Anantapur(Dt), AP,India
Abstract: This paper presents a high-speed, low-power design of Viterbi decoders for trellis coded 
modulation (TCM) systems. It is well known that the Viterbi decoder (VD) is the dominant module 
determining the overall power consumption of TCM decoders. We propose a precomputation architecture 
combined with the T-algorithm for the VD, which effectively reduces the power consumption without 
degrading the decoding speed much. A general solution for deriving the optimal number of precomputation 
steps is also given. Implementation results for a VD for a rate-3/4 convolutional code used in a TCM 
system show that, compared with the full-trellis VD, the precomputation architecture reduces the power 
consumption by up to 70% without performance loss, while the degradation in clock speed is minimal. 
Two earlier variations of the T-algorithm are also discussed: the relaxed adaptive VD, which uses an 
estimated optimal path metric (PM) instead of finding the actual one in each cycle, and the 
limited-search parallel state VD based on scarce state transition (SST). 
Keywords: Trellis Coded Modulation (TCM); Viterbi Decoder; VLSI 
I. INTRODUCTION 
Typically, a TCM system employs a high-rate 
convolutional code, which leads to a high 
complexity of the Viterbi decoder (VD) for the 
TCM decoder, even if the constraint length of the 
convolutional code is moderate. Therefore, in 
terms of power consumption, the Viterbi decoder 
is the dominant module in a TCM decoder. To 
reduce the computational complexity as well as 
the power consumption, low-power schemes should 
be exploited for the VD in a TCM decoder. 
Over-scaling of the supply voltage usually has to 
take into account the whole system that includes 
the VD (whether or not the system allows such 
over-scaling), which is not the main focus of our 
research [1]. 
Reduced-state sequence detection (RSSD) is in 
general less efficient than the M-algorithm and 
the T-algorithm, and the T-algorithm is more 
commonly used than the M-algorithm in practical 
applications, because the M-algorithm requires a 
sorting process in a feedback loop, while the 
T-algorithm only searches for the optimal path 
metric (PM), that is, the minimum or the maximum 
value of all PMs. Searching for the optimal PM 
inside the feedback loop nevertheless limits the 
decoding speed. To overcome this drawback, two 
variations of the T-algorithm have been proposed: 
the relaxed adaptive VD, which uses an estimated 
optimal PM instead of finding the actual one in 
each cycle, and the limited-search parallel state 
VD based on scarce state transition (SST). In our 
preliminary work, we have shown that, when applied 
to high-rate convolutional codes, the relaxed 
adaptive VD suffers a severe degradation of 
bit-error-rate (BER) performance because of the 
inherent drifting error between the estimated 
optimal PM and the accurate one. In TCM, the 
encoded data are always associated with a complex 
multi-level modulation scheme such as 8-ary 
phase-shift keying (8PSK) or 16/64-ary quadrature 
amplitude modulation (16/64QAM) through a 
constellation point mapper. At the receiver, a 
soft-input VD should be used to guarantee a good 
coding gain. In our preliminary work, we proposed 
an add-compare-select unit (ACSU) architecture 
based on precomputation for VDs incorporating the 
T-algorithm, which efficiently improves the clock 
speed of a VD with the T-algorithm for a rate-3/4 
code [2]. In this work, we further analyze the 
precomputation T-algorithm. A systematic way to 
determine the optimal number of precomputation 
steps is presented, in which the minimum number 
of steps for the critical path to achieve the 
theoretical iteration bound is calculated and the 
computational complexity overhead due to 
precomputation is evaluated. 
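The pruning rule of the T-algorithm mentioned 
above can be illustrated with a minimal sketch. It 
assumes that smaller PMs are better and that a 
state is kept when its PM lies within a threshold 
T of the optimal PM; the function name 
t_algorithm_prune, the parameter name threshold, 
and the integer metrics are hypothetical and chosen 
only for illustration. 

```python
def t_algorithm_prune(path_metrics, threshold):
    """Return (optimal_pm, survivors) for one decoding step.

    path_metrics: dict mapping state index -> path metric (smaller is better).
    threshold:    pruning threshold T (hypothetical parameter name).
    """
    # Only the optimum is needed; no sorting of all PMs is required.
    optimal_pm = min(path_metrics.values())
    # Keep states whose PM is within T of the optimum; purge the rest.
    survivors = {state: pm for state, pm in path_metrics.items()
                 if pm - optimal_pm <= threshold}
    return optimal_pm, survivors


# Illustrative integer metrics for a 4-state toy decoder (assumed values).
pms = {0: 12, 1: 4, 2: 9, 3: 25}
optimal, active = t_algorithm_prune(pms, threshold=6)
print(optimal, sorted(active))
```

With these assumed values, states 0 and 3 are 
purged because their PMs exceed the optimal PM by 
more than the threshold. 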
II. METHODOLOGY 
In a conventional VD, branch metrics (BMs) are 
first computed in the BM unit (BMU). In a TCM 
decoder, this module is replaced by a transition 
metrics unit (TMU), which is more complex than the 
BMU. Then, the BMs are fed into the ACSU, which 
recursively computes the PMs and outputs decision 
bits for each possible state transition. Next, the 
decision bits are stored in and retrieved from the 
survivor-path memory unit (SMU) in order to decode 
the source bits along the final survivor path. The 
PMs of the current iteration are stored in the PM 
unit (PMU). The T-algorithm requires extra 
computation in the ACSU loop for calculating the 
optimal PM and purging states. Therefore, a 
straightforward implementation of the T-algorithm 
dramatically reduces the decoding speed. The key 
to improving the clock speed of the T-algorithm is 
to find the optimal PM quickly. The basic idea of 
the precomputation T-algorithm was presented in 
[3]. Consider a VD for a convolutional code with a 
given constraint length, where each state receives 
multiple candidate paths. 
Then, we group the states into several clusters to 
reduce the computational overhead caused by 
look-ahead computation. The trellis butterflies of 
a VD usually have a symmetric structure. The 
minimum BM for each cluster can be easily obtained 
from the BMU or TMU, and the minimum PM of each 
cluster at the previous time step can be 
precalculated while the ACSU is updating the new 
PMs for the current time step. 
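The look-ahead idea can be checked on a toy 
trellis. The sketch below is not the clustered 
architecture of the paper; it only assumes the 
underlying identity that the minimum of the new 
PMs equals the minimum, over the previous states, 
of the previous PM plus the smallest BM leaving 
that state. The 4-state radix-2 trellis, the 
metric ranges, and all function names are 
hypothetical. 

```python
import random

INF = float("inf")

def acs_update(prev_pm, branches):
    """One add-compare-select step.

    prev_pm:  path metrics at time n-1, indexed by state.
    branches: (from_state, to_state, branch_metric) tuples for time n.
    """
    new_pm = [INF] * len(prev_pm)
    for src, dst, bm in branches:
        new_pm[dst] = min(new_pm[dst], prev_pm[src] + bm)  # add, compare, select
    return new_pm

def optimal_pm_direct(prev_pm, branches):
    """Conventional search: wait for the ACS result, then take the minimum."""
    return min(acs_update(prev_pm, branches))

def optimal_pm_precomputed(prev_pm, branches):
    """One-step look-ahead: use time n-1 PMs and time-n BMs only."""
    min_bm_out = [INF] * len(prev_pm)
    for src, _dst, bm in branches:
        min_bm_out[src] = min(min_bm_out[src], bm)
    return min(pm + bm for pm, bm in zip(prev_pm, min_bm_out))

def toy_branches(rng, n_states=4):
    """Hypothetical toy trellis: state s -> (2s) mod n and (2s+1) mod n."""
    return [(s, (2 * s + b) % n_states, rng.randint(0, 15))
            for s in range(n_states) for b in (0, 1)]

rng = random.Random(0)
prev_pm = [rng.randint(0, 63) for _ in range(4)]
branches = toy_branches(rng)
print(optimal_pm_direct(prev_pm, branches),
      optimal_pm_precomputed(prev_pm, branches))
```

Both functions return the same value, but the 
second one does not depend on the output of the 
ACS step, so in hardware it could be evaluated in 
parallel with the PM update. 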
Consequently, the decoding speed of the low-power 
VD is greatly improved. However, after reaching a 
certain number of steps, further precomputation 
does not bring additional benefits because of the 
inherent iteration bound of the ACSU loop. To 
achieve the iteration bound, for the 
precomputation in each pipelining stage, we limit 
the comparison to a small number of metrics [4]. 
In some cases, the number of remaining metrics 
may slightly expand during a certain pipeline 
stage after addition with BMs. Usually, the extra 
delay can be absorbed by an optimized architecture 
or circuit design. Even if the extra delay is hard 
to eliminate, the resulting clock speed is very 
close to the theoretical bound. 
To fully achieve the iteration bound, we could add 
another pipeline stage, though this is very 
costly, as discussed next. Most of the 
computational overhead comes from adding BMs to 
the metrics at each stage. In other words, if 
there are remaining metrics after the comparison 
in a stage, the computational overhead of this 
stage is at least one addition operation per 
remaining metric. The exact overhead varies from 
case to case, depending on the trellis diagram of 
the convolutional code. Again, to simplify the 
evaluation, we consider a code with a given 
constraint length and a given number of 
precomputation steps. Also, we assume that each 
remaining metric causes a computational overhead 
of one addition operation. Therefore, a small 
number of precomputation steps is preferred, even 
though the iteration bound may not be fully 
satisfied. 
Generally, one- or two-step precomputation is a 
good choice. We again use the 4-D 8PSK TCM system 
as the example. This TCM system employs a rate-3/4 
convolutional code. The BER performance of the VD 
employing the T-algorithm was evaluated for 
different threshold values over an additive white 
Gaussian noise (AWGN) channel. The overall coding 
rate is 11/12 because of the other uncoded bits in 
the TCM system. Compared with the ideal Viterbi 
algorithm, the threshold can be lowered to 0.3 
with less than 0.1 dB of performance loss, while 
the computational complexity is reduced. We have 
concluded that two-step precomputation is the 
optimal choice for the rate-3/4 code VD. The 
minimum value of each BM group (BMG) can be 
calculated in the BMU or TMU and then passed to 
the "Threshold Generator" unit (TGU) to calculate 
the purging threshold. This architecture has been 
optimized to meet the iteration bound. Compared 
with the conventional T-algorithm, the 
computational overhead of this architecture is 12 
addition operations and one comparison, which is 
slightly more than the number obtained from the 
evaluation. We next address an important issue 
regarding SMU design when the T-algorithm is 
employed. There are two types of SMU in the 
literature: register exchange (RE) and trace back 
(TB) schemes. In a regular VD without any 
low-power scheme, the SMU always outputs the 
decoded data from a fixed state (arbitrarily 
selected in advance) if the RE scheme is used, or 
traces back the survivor path from the fixed state 
if the TB scheme is used, to keep the complexity 
low. For a VD incorporating the T-algorithm, no 
state is guaranteed to be active at all clock 
cycles. Thus it is impossible to appoint a fixed 
state for either outputting the decoded bit (RE 
scheme) or starting the trace-back process (TB 
scheme). In the conventional implementation of the 
T-algorithm, the decoder can use the optimal 
state. In our design, an active state is located 
with a priority encoder, whose output is the 
unpurged state with the lowest index. 
Implementation of such a table is not trivial. In 
our design, we employ an efficient architecture 
for the 64-to-6 priority encoder based on three 
4-to-2 priority encoders. 
Implementing a 4-to-2 priority encoder is much 
simpler than implementing the 64-to-6 priority 
encoder directly [5]. The full-trellis VD, the VD 
with the two-step precomputation architecture, and 
one with the conventional T-algorithm are modeled 
in Verilog HDL. The soft inputs of all VDs are 
quantized with 7 bits. Each PM in all VDs is 
quantized with 12 bits. The RE scheme with a 
survival length of 42 is used for the SMU, and the 
register arrays associated with the purged states 
are clock-gated to reduce the power consumption in 
the SMU. For ASIC synthesis, we use a TSMC 90-nm 
CMOS standard cell library. Synthesis targets the 
maximum achievable clock speed in each case. 
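The behaviour of the 64-to-6 priority encoder 
described above can be modelled in a few lines. 
This is a behavioural sketch only, not the Verilog 
design of the paper. It assumes that the 64 
"unpurged" flags are split into 4 blocks of 16 and 
each block into 4 groups of 4, so that three 
4-to-2 stages each contribute two bits of the 
6-bit index; this decomposition and all function 
names are assumptions. 

```python
def priority_encode_4to2(flags):
    """4-to-2 priority encoder: index of the first set flag, lowest index wins."""
    for index, flag in enumerate(flags):
        if flag:
            return index
    return None  # no flag is set

def priority_encode_64to6(active):
    """Hierarchical 64-to-6 priority encoder built from three 4-to-2 stages.

    active: list of 64 booleans, True where the state is unpurged.
    Returns the lowest active state index, or None if all states are purged.
    """
    assert len(active) == 64
    # Stage 1: which block of 16 states contains an active state?
    hi = priority_encode_4to2([any(active[16 * g:16 * g + 16]) for g in range(4)])
    if hi is None:
        return None
    block = active[16 * hi:16 * hi + 16]
    # Stage 2: which group of 4 inside the selected block?
    mid = priority_encode_4to2([any(block[4 * g:4 * g + 4]) for g in range(4)])
    # Stage 3: which state inside the selected group?
    lo = priority_encode_4to2(block[4 * mid:4 * mid + 4])
    return (hi << 4) | (mid << 2) | lo


flags = [False] * 64
flags[42] = flags[57] = True
print(priority_encode_64to6(flags))
```

In this model each stage inspects only four 
inputs, which reflects why a 4-to-2 encoder is 
simpler to build than a flat 64-input one. 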
 
Fig. 1. Proposed T-algorithm 
  
III. RESULTS 
Simulation Results: 
The simulation results were obtained with the 
Xilinx tool. The figure below shows the 
simulation result of the convolutional encoder; 
the output follows the states of the 
convolutional encoder. 
 
Fig : Simulation result of convolutional encoder 
Simulation Results Of Viterbi Decoder 
 
Fig : Simulation result of Viterbi Decoder 
 
These waveforms correspond to the data applied at 
the encoder input. In a digital communication 
link where channel errors are absent or within 
the correcting capability of the code, the data 
at the decoder output is the same as the data 
applied at the encoder input. Accordingly, the 
output obtained from the Viterbi decoder matches 
the input given to the encoder, as can be 
observed in the Viterbi decoder waveforms in the 
figure above. 
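The encoder/decoder round trip can also be 
reproduced in software. The sketch below does not 
model the rate-3/4 code or the Xilinx design of 
this paper; it assumes a small rate-1/2, 
constraint-length-3 convolutional code with 
generators (7, 5) in octal and a hard-decision, 
full-trellis Viterbi decoder, purely as an 
illustration. All names are hypothetical. 

```python
G = (0b111, 0b101)          # assumed toy generators (7, 5) octal
K = 3                       # constraint length of the toy code
N_STATES = 1 << (K - 1)
INF = float("inf")

def parity(x):
    return bin(x).count("1") & 1

def step(state, bit):
    """Return (next_state, output_pair) for one input bit."""
    reg = (bit << (K - 1)) | state           # newest bit in the MSB
    return reg >> 1, tuple(parity(reg & g) for g in G)

def encode(bits):
    state, out = 0, []
    for b in list(bits) + [0] * (K - 1):     # K-1 zeros return the encoder to state 0
        state, pair = step(state, b)
        out.extend(pair)
    return out

def viterbi_decode(received):
    pm = [0] + [INF] * (N_STATES - 1)        # the encoder starts in state 0
    history = []
    for i in range(0, len(received), 2):
        r = received[i:i + 2]
        new_pm, prev = [INF] * N_STATES, [None] * N_STATES
        for s in range(N_STATES):
            if pm[s] == INF:
                continue
            for b in (0, 1):
                ns, out = step(s, b)
                metric = pm[s] + sum(x != y for x, y in zip(out, r))  # Hamming BM
                if metric < new_pm[ns]:      # compare and select
                    new_pm[ns], prev[ns] = metric, (s, b)
        history.append(prev)
        pm = new_pm
    state, bits = 0, []                      # trace back from the known final state 0
    for prev in reversed(history):
        state, b = prev[state]
        bits.append(b)
    bits.reverse()
    return bits[:len(bits) - (K - 1)]        # drop the flush bits


message = [1, 0, 1, 1, 0, 0, 1, 0]
coded = encode(message)
noisy = list(coded)
noisy[3] ^= 1                                # inject a single channel error
print(viterbi_decode(coded) == message, viterbi_decode(noisy) == message)
```

The decoder recovers the message both without 
errors and with the single injected error, which 
is within the correcting capability of this toy 
code. 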
IV. CONCLUSION 
The T-algorithm is suitable for TCM systems, 
which typically use high-rate convolutional 
codes. The precomputation architecture that 
incorporates the T-algorithm efficiently reduces 
the power consumption of VDs without lowering the 
decoding speed appreciably. We have proposed a 
high-speed, low-power VD design for TCM systems. 
We have also analyzed the precomputation 
T-algorithm, for which the optimal number of 
precomputation steps is calculated and discussed. 
ASIC synthesis and power estimation results show 
that, compared with the full-trellis VD without 
any low-power scheme, the precomputation VD 
reduces the power consumption. Finally, we 
presented a design case in which both the ACSU 
and the SMU are modified to decode the signal 
correctly. 
V. REFERENCES 
[1]  J. B. Anderson and E. Offer, “Reduced-state 
sequence detection with convolutional codes,” 
IEEE Trans. Inf. Theory, vol. 40, no. 3, pp. 
965–972, May 1994. 
[2]  J. He, Z. Wang, and H. Liu, “An efficient 4-D 
8PSK TCM decoder architecture,” IEEE 
Trans. Very Large Scale Integr. (VLSI) 
Syst., vol. 18, no. 5, pp. 808–817, May 
2010. 
[3]  F. Chan and D. Haccoun, “Adaptive Viterbi 
decoding of convolutional codes over 
memoryless channels,” IEEE Trans. 
Commun., vol. 45, no. 11, pp. 1389–1400, 
Nov. 1997. 
[4]  J. Jin and C.-Y. Tsui, “Low-power limited-
search parallel state Viterbi decoder 
implementation based on scarce state 
transition,” IEEE Trans. Very Large Scale 
Integr. (VLSI) Syst., vol. 15, no. 11, pp. 
1172–1176, Oct. 2007. 
[5]  J. He, H. Liu, and Z. Wang, “A fast ACSU 
architecture for Viterbi decoder using 
T-algorithm,” in Proc. 43rd IEEE Asilomar 
Conf. Signals, Syst. Comput., Nov. 2009, 
pp. 231–235. 
