Abstract-This work addresses the low-power VLSI implementation of the Viterbi decoder (VD). A new precomputation scheme applied to the calculation of trellis butterflies is presented. The proposed scheme is implemented in a 16-state, rate 1/3 VD. Gate-level power verification indicates that the proposed design reduces the power dissipated by the original trellis butterfly calculation by 42%.
I. INTRODUCTION
The Viterbi algorithm [1], which has been extensively applied to decoding and estimation problems in communication and signal processing, was introduced in 1967 as a method for decoding convolutional codes [2].
Convolutional encoding with Viterbi decoding is one of the most popular forward-error-correction methods in communication systems. In decoding, the VD attempts to reconstruct the action of the encoder based on the transmission of its outputs over a noisy channel.
The VD comprises three basic units: a branch metric unit (BMU), an add-compare-select unit (ACSU) and a trace-back unit (TBU). The VD dissipates most of its power in the ACSU and TBU.
The feedback loop of the ACSU cannot be parallelized using standard methods, so the ACSU is normally considered the most critical part of a high-speed implementation of the Viterbi algorithm. References [3] and [4] showed that, despite the feedback loop, high speed can be achieved using a purely feed-forward parallel implementation that operates on the M-step trellis. A high-speed VD that must calculate more trellis butterflies in parallel usually consumes more power.
Numerous techniques for reducing the power dissipated by the TBU have been proposed. A scarce state transition architecture [5] has been used to reduce the switching rate of the survivor memory unit (SMU). In [6], both system and circuit techniques were proposed to reduce the power consumed by the SMU. In ACSU design for low power consumption, an adaptive VD [7] dynamically discards states in the trellis that have high path metrics during decoding; however, the extra controls required are too complex.
References [8] and [9] introduce low-power designs of large-state VDs.
This work presents a precomputation-based low-power design scheme that can be applied to the ACSU, which is a collection of butterfly calculations.
Each butterfly calculates the path metrics of the two paths that connect two old states at time t to a new state at time t+1, and selects the smaller one as the new path metric of the new state. The TBU traces the decision vector to generate the corrected output sequence. This can be done either by forward-processing the decisions using the so-called register-exchange algorithm or by backward-tracing the previously stored decisions.
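The add-compare-select step described above can be sketched in a few lines of Python. This is only a behavioral illustration, not the hardware described in this work: the function names are hypothetical, and the branch metric is assumed to be the Hamming distance between a hard-decision received symbol and the expected encoder output, which for a rate 1/3 code lies between 0 and 3.

```python
def hamming_branch_metric(received, expected):
    """Assumed branch metric: number of differing bits (0..3 for rate 1/3)."""
    return sum(r != e for r, e in zip(received, expected))


def acs(pm_a, pm_b, bm_a, bm_b):
    """Add-compare-select for one new state.

    pm_a, pm_b: path metrics of the two old states at time t.
    bm_a, bm_b: branch metrics of the two transitions into the new state.
    Returns (new_path_metric, decision); decision 0 keeps path a, 1 keeps path b.
    """
    cand_a = pm_a + bm_a  # add
    cand_b = pm_b + bm_b
    if cand_a <= cand_b:  # compare (ties resolved in favor of path a)
        return cand_a, 0  # select
    return cand_b, 1


# Example: old path metrics 5 and 2, branch metrics 1 and 3.
new_pm, decision = acs(5, 2, 1, 3)
```

The decision bit is what the TBU later uses to reconstruct the surviving path.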
The quality of a VD design is primarily measured by four criteria: coding gain, throughput, area and power consumption. This work focuses on the power-efficient implementation of the trellis butterfly calculation. In a large-state VD, the number of butterflies is even larger. The power consumed by the ACSU must be minimized to reduce the power consumed by the VD.
III. PRECOMPUTATION SCHEME
Precomputation logic, first proposed in [10], trades area for power in a synchronous digital circuit. The principle of precomputation logic is to identify logical conditions on some inputs to a combinational logic block under which the remaining inputs do not affect the output. Since those input values do not affect the output, their transitions can be disabled to reduce switching activity.
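As a generic illustration of this principle (a textbook-style comparator example, not the circuit proposed in this work), consider an n-bit comparator that computes C > D. If the most significant bits of C and D differ, they alone determine the result, so the registers holding the lower bits need not be loaded. The function names below are hypothetical, and the model is purely behavioral.

```python
def compare_precomputed(c, d, n):
    """Behavioral model of an n-bit comparator (c > d) with MSB precomputation.

    Returns (result, low_bits_loaded). low_bits_loaded is False when the
    MSBs alone decide the output, i.e. the lower-bit registers can be disabled.
    """
    c_msb = (c >> (n - 1)) & 1
    d_msb = (d >> (n - 1)) & 1
    if c_msb != d_msb:
        # Precomputation condition holds: the output depends only on the MSBs.
        return c_msb == 1, False
    mask = (1 << (n - 1)) - 1
    return (c & mask) > (d & mask), True


def disabled_fraction(n):
    """Fraction of all input pairs for which the lower-bit registers are disabled."""
    total = 0
    disabled = 0
    for c in range(1 << n):
        for d in range(1 << n):
            result, loaded = compare_precomputed(c, d, n)
            assert result == (c > d)  # the precomputed result must match the full comparison
            total += 1
            disabled += not loaded
    return disabled / total
```

For uniformly distributed inputs the MSBs differ in half of all cases, so `disabled_fraction(4)` evaluates to 0.5 in this model.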
A. Proposed Precomputation Logic
Fig. 5 shows the precomputation architecture of a 16-state, rate 1/3 VD. Due to the nature of the ACSU, there are conditions under which the output is independent of some of the values of the input registers and latches. For a rate 1/3 VD, the maximum branch metric is three. When the difference between the old path metrics of the two competing paths exceeds three, the selected path can be determined independently of the input data of the other path. Under such a precomputation condition, referred to as G(X) in Fig. 5, the register load of these registers and latches can be disabled to prevent unnecessary switching, thereby conserving power. The ACSU output is still computed correctly because it receives all required values from the remaining registers.
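The reasoning behind this condition can be checked with a small behavioral model. The sketch below is an assumption-laden illustration rather than the gate-level G(X) of Fig. 5: it tests the arithmetic difference of the old path metrics directly, assumes branch metrics in the range 0 to 3, and uses hypothetical function names. If the old metric of path b exceeds that of path a by more than three, then pm_a + bm_a <= pm_a + 3 < pm_b <= pm_b + bm_b, so path a wins whatever the branch metrics are, and the data of path b need not be loaded.

```python
MAX_BM = 3  # maximum branch metric assumed for a rate 1/3 code


def acs_reference(pm_a, pm_b, bm_a, bm_b):
    """Conventional add-compare-select; ties are resolved in favor of path a."""
    cand_a, cand_b = pm_a + bm_a, pm_b + bm_b
    return (cand_a, 0) if cand_a <= cand_b else (cand_b, 1)


def acs_precomputed(pm_a, pm_b, bm_a, bm_b):
    """ACS with a precomputation check on the old path metrics.

    Returns (new_path_metric, decision, disabled_path), where disabled_path
    names the path whose inputs were not needed ("a", "b" or None).
    """
    if pm_b - pm_a > MAX_BM:
        return pm_a + bm_a, 0, "b"  # path a must win; path b inputs unused
    if pm_a - pm_b > MAX_BM:
        return pm_b + bm_b, 1, "a"  # path b must win; path a inputs unused
    new_pm, decision = acs_reference(pm_a, pm_b, bm_a, bm_b)
    return new_pm, decision, None


def check_equivalence(max_pm=16):
    """Exhaustively confirm that both versions agree for small metric values."""
    for pm_a in range(max_pm):
        for pm_b in range(max_pm):
            for bm_a in range(MAX_BM + 1):
                for bm_b in range(MAX_BM + 1):
                    ref = acs_reference(pm_a, pm_b, bm_a, bm_b)
                    pre = acs_precomputed(pm_a, pm_b, bm_a, bm_b)
                    if ref != pre[:2]:
                        return False
    return True
```

In this model the winning path still adds its own branch metric, which mirrors the statement that the ACSU receives all required values from the remaining registers.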
Comparing Fig. 4 and Fig. 5 shows that the extra logic added for the precomputation consists of the latches and G(X). Since branch metrics are usually small, the latches do not add much hardware cost. For example, an R=1/3 code has a maximum branch metric of three, so two-bit latches suffice to hold a branch metric. To be efficient, the selection logic G(X) must also be simple. The following section introduces a simple G(X) design.
B. Precomputation Conditions
To generate the load-disable signal for the unneeded registers and latches, a precomputation function G(X) is required to detect the condition under which the ACSU output is independent of those registers and latches. For ACSU, an obvious G(X) is: b2, b1, b0) respectively, then the precomputation conditions G (X) and G , (X) can be calculated by using the following logic expressions.
According to G(X), if the probabilities of obtaining a zero and a one are equal, then the probability that the precomputation condition holds is 30/64, under which condition almost half of the path metric calculations are disabled. Also, when the load-disable signal is asserted, the comparator performs fewer switching activities because some of its input data are not switched.
IV. IMPLEMENTATION AND RESULTS
A 16-state, rate 1/3 VD was designed using the proposed precomputation logic. The conventional and the proposed VDs were coded in VHDL and synthesized with Synopsys using the TSMC 0.13 µm technology library. Fig. 6 shows the results of a gate-level simulation. Here I" refers to the original data input to the convolutional encoder, EOP is the output of the encoder, and SOUT is the output of the VD.
The power consumption of the two architectures is estimated using a gate-level power simulator based on a real delay model. A 1.2 V supply and a 25 MHz operating frequency are assumed. The results indicate that the proposed ACSU design has a 3% larger area, a 1% lower speed and 42% lower power consumption than the conventional ACSU design.
V. CONCLUSIONS
A low-power architecture for the ACSU of the VD was presented. The proposed architecture yields a 42% reduction in power consumption at the cost of a 3% increase in area and a 1% decrease in speed.
