A runtime reconfigurable architecture for high speed Viterbi and Turbo decoding is designed and implemented on an FPGA. The architecture can be reconfigured to decode a range of convolutionally coded data with constraint lengths varying from 3 to 9, rates 1/2 and 1/3, and various generator polynomials. It can also be reconfigured to decode Turbo coded data with constraint length 4 and rate 1/3. Reconfiguration of the architecture requires a single clock cycle and does not require FPGA reprogramming. The proposed architecture can deliver data rates up to 60.5 Mbps for Viterbi decoding and 3.54 Mbps for Turbo decoding, making it suitable for a range of wireless communication standards such as IEEE 802.11a, 3GPP, GSM, GPRS, and many others.
INTRODUCTION
There has been a growing need for devices that have the flexibility to support multiple communication standards. A reconfigurable architecture that can operate under multiple standards has obvious advantages over conventional devices in terms of smaller area and seamless switching across standards. This has motivated the design of the high speed reconfigurable channel decoding architecture with Viterbi and Turbo decoding capability proposed in this paper. Previous work on unified Viterbi/Turbo decoding [1] was limited to 3G codes and data rates (2 Mbps). A flexible Viterbi decoder [2] with data rates up to 2.5 Mbps was also proposed recently.
DECODER ARCHITECTURE
As the proposed architecture is highly flexible and caters to high data rate systems, numerous critical issues had to be addressed in order to realize it. The first was the choice of Turbo decoding algorithm: SOVA and log-MAP are two competing algorithms for Turbo decoding. Although SOVA shows a small degradation in performance compared to log-MAP [3], this disadvantage is more than offset by the fact that the computational complexity of SOVA is nearly half that of log-MAP. SOVA is also very similar to Viterbi decoding, which makes it the ideal candidate for our reconfigurable architecture.
Another important issue was the reconfiguration between different constraint lengths (K) and rates (R), on top of the two different decoding techniques. We know that for a con-
The third important issue was to limit power consumption. Architectural power control schemes were designed to power down parts of the circuit that are not required for a particular decoding type. Figure 3 shows the complete architecture with the reconfigurable/flexible units shaded, each of which we shall now discuss. A detailed study of the architecture can be found in the work by Vaya [4].
Branch Metric Unit (BMU)
In order to provide enhanced flexibility for all the different decoding configurations, the BMU has been divided into three major units: the Branch Metric Computation Unit (BMcompute), the Codeword Look-Up Table (Codeword LUT), and the branch metric multiplexers (BMmux).
ICASSP 2003
BMcompute: As shown in Figure 2, the BMcompute unit computes all the possible branch metrics for a given decoder type, with the inputs being the received data, the decoder type (Turbo/Viterbi), and the rate. Note that the number of possible branch metrics is equal to $2^n$, where $k/n$ is the rate of the constituent encoder. Each ACS unit needs a specific pair of branch metrics, and these are provided to it using the Codeword LUT and the BMmux multiplexer.
Codeword LUT and BMmux: The Codeword LUT uses the constraint length, rate, decoder type, and the index of the ACS unit to provide the relevant BMmux with a codeword, as shown in Figure 2. This codeword is used as a control signal by the BMmux to select the correct branch metric. New codewords can also be programmed into the Codeword LUT, hence providing support for any generator polynomial for constraint length 3-9 Viterbi decoding and constraint length 4 Turbo decoding.
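The interplay of BMcompute, the Codeword LUT, and BMmux can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware described in [4]: the function names, the 3-bit soft-decision range, the absolute-distance metric, and the state/bit ordering are all hypothetical. It also assumes every generator polynomial has both its first and last taps set, so that the second branch metric of the pair belongs to the complemented codeword.

```python
def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def encoder_output(reg: int, gens) -> int:
    """Codeword for shift-register contents `reg`, first generator as MSB."""
    word = 0
    for g in gens:
        word = (word << 1) | parity(reg & g)
    return word


def build_codeword_lut(K: int, gens) -> list:
    """One codeword per ACS index j: the branch leaving state 2j with input 0.

    Assumed convention: reg = (input << (K-1)) | state, and one ACS unit
    serves one butterfly, giving 2**(K-2) units for constraint length K.
    """
    return [encoder_output(2 * j, gens) for j in range(1 << (K - 2))]


def branch_metrics(received, max_level: int = 7) -> list:
    """All 2**n branch metrics for n soft samples in the range 0..max_level."""
    n = len(received)
    metrics = []
    for word in range(1 << n):
        total = 0
        for i, r in enumerate(received):
            bit = (word >> (n - 1 - i)) & 1
            total += abs(r - max_level * bit)  # distance to the ideal level
        metrics.append(total)
    return metrics


def bm_mux(metrics, codeword: int, n: int):
    """Select the metric pair for one ACS unit: codeword and its complement."""
    return metrics[codeword], metrics[codeword ^ ((1 << n) - 1)]
```

For example, `build_codeword_lut(3, (0o7, 0o5))` yields one entry per ACS unit of a constraint length 3, rate 1/2 code, and `bm_mux(branch_metrics([0, 7]), lut[j], 2)` returns the two metrics that unit `j` would receive in this model.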
Add Compare Select Unit
The ACS unit takes as inputs the relevant path metrics and branch metrics, and outputs the survivor path metrics and the decision bits for both Viterbi and Turbo decoding. For SOVA based Turbo decoding, the difference between the path metrics [3] is also computed. Note that, since K = 4 for the constituent Turbo encoders, four ACS units (indices 0 to 3) have been programmed to perform the ACS computations for both Viterbi and Turbo decoding, while the remaining ACS units (4-127) are only activated when computing path metrics and decisions for Viterbi decoding.
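The add-compare-select step, including the path metric difference needed by SOVA, can be sketched as follows. The function names are hypothetical, and the sketch assumes that a smaller metric is better and that one ACS unit handles a butterfly of two successor states, which is consistent with four units for K = 4 but is not spelled out in the text.

```python
def acs(pm_a: int, pm_b: int, bm_a: int, bm_b: int):
    """Add-compare-select for one trellis state.

    Returns (survivor_metric, decision_bit, delta), where decision_bit is 0
    if the first candidate wins and 1 if the second wins, and delta is the
    absolute difference between the two candidates (used only by SOVA).
    """
    cand_a = pm_a + bm_a
    cand_b = pm_b + bm_b
    decision = 1 if cand_b < cand_a else 0
    survivor = cand_b if decision else cand_a
    delta = abs(cand_a - cand_b)
    return survivor, decision, delta


def acs_butterfly(pm_even: int, pm_odd: int, bm: int, bm_comp: int):
    """One ACS unit j under the assumed butterfly model.

    Predecessor states 2j and 2j+1 feed successor states j and j + 2**(K-2).
    The two successors use the same pair of branch metrics in swapped order.
    """
    low = acs(pm_even, pm_odd, bm, bm_comp)
    high = acs(pm_even, pm_odd, bm_comp, bm)
    return low, high
```

In Viterbi mode only the survivor metric and decision bit would be consumed; in Turbo mode the `delta` value would additionally be written out for the soft-output computation.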
Configurable Data Router
Because we have designed a completely parallel high speed architecture, the intermediate path metrics are not stored, but are routed back to the relevant ACS units for use in the next clock cycle. This is a complex problem, since the routing of the path metrics varies with the constraint length in question. Configurable Data Routers (Figure 2) were designed [4] to solve this problem. Configurable Data Routers consist of banks of multiplexers, each multiplexer receiving inputs from the outputs of different ACS units and feeding a particular ACS unit, as shown in Figure 1. The multiplexers have built-in logic to route the path metrics according to the constraint length, decoding type, and the index of the multiplexer. While 2-input multiplexers are sufficient for Viterbi decoding, 4-input multiplexers are needed for Viterbi/Turbo decoding. Configurable Data Routers are therefore critical to the reconfigurability of the architecture, as they provide the flexibility to migrate between different constraint lengths and decoding types.
Fig. 1. jth ACS unit and Configurable Data Router interconnections
When writing the decision bits and path metric differences to memory, it is important that the data be written in order. With varying constraint lengths, the order of the data changes, and hence Configurable Data Routers (as shown in Figures 2 and 3) have been employed to route the data according to the constraint length.
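How the routing depends on the constraint length can be modelled in software. The sketch below assumes a butterfly organisation in which ACS unit j produces the metrics of states j and j + 2^(K-2) and consumes those of states 2j and 2j+1; this matches 128 units for K = 9, but the exact wiring in [4] may differ, and all names are hypothetical. Turbo-specific routing is not modelled.

```python
def num_acs_units(K: int) -> int:
    """Active ACS units for constraint length K under the butterfly model."""
    return 1 << (K - 2)


def route_sources(K: int) -> list:
    """Source of each path-metric input for every active ACS unit.

    Entry i is ((unit, port), (unit, port)) for the inputs carrying states
    2i and 2i+1. port 0 is the producing unit's low output (state j) and
    port 1 its high output (state j + 2**(K-2)).
    """
    half = num_acs_units(K)

    def source(state: int):
        return (state % half, state // half)

    return [(source(2 * i), source(2 * i + 1)) for i in range(half)]


def mux_fan_in(i: int, which: int, constraint_lengths=range(3, 10)) -> int:
    """Distinct sources seen by one input of unit i across constraint lengths."""
    sources = set()
    for K in constraint_lengths:
        if i < num_acs_units(K):
            sources.add(route_sources(K)[i][which])
    return len(sources)
```

Under this model no input sees more than two distinct sources across K = 3 to 9, which is at least consistent with the statement that 2-input multiplexers suffice for Viterbi decoding.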
Survivor Management Unit (SMU)
The flexible traceback units, which form the core of the SMU, use the decoder type, constraint length, current state, and the decision bit stored at a certain state to evaluate the previous state and the decoded bit. The SMU also contains additional hardware for soft decision computation for Turbo decoding, which is powered down when Viterbi decoding is in progress.
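A single traceback step can be written as a small function of the constraint length, current state, and stored decision bit. This is a sketch for the Viterbi case only, assuming a state convention in which the most recent input bit is the most significant bit of the state and the decision bit identifies the low-order bit of the predecessor; the names and the layout of the decision memory are hypothetical.

```python
def traceback_step(state: int, decision: int, K: int):
    """Return (previous_state, decoded_bit) for one trellis stage.

    Assumed convention: the state holds the last K-1 input bits with the
    newest bit as MSB, so the decoded bit is the MSB of the current state
    and the decision bit restores the bit that was shifted out.
    """
    mask = (1 << (K - 1)) - 1
    prev_state = ((state << 1) & mask) | decision
    decoded_bit = state >> (K - 2)
    return prev_state, decoded_bit


def traceback(decisions, end_state: int, K: int) -> list:
    """Walk back through decisions[t][s] (time t, state s) from end_state.

    Returns the decoded bits in transmission order.
    """
    state = end_state
    bits = []
    for column in reversed(decisions):
        state, bit = traceback_step(state, column[state], K)
        bits.append(bit)
    bits.reverse()
    return bits
```

Because `K` is an argument rather than a constant, the same routine serves every constraint length, mirroring the flexibility attributed to the traceback units.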
Interleaving
The focus of this work was on the reconfigurability between different decoding techniques, and since interleaving is only used for Turbo decoding, a simple block based interleaver was implemented. Data is written in a matrix format, and the transpose of the matrix is output as the interleaved data.
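The block interleaver described above amounts to writing the data row by row into a matrix and reading it out column by column. A minimal sketch, with hypothetical function names and the matrix dimensions left as parameters since the text does not give them:

```python
def block_interleave(data, rows: int, cols: int) -> list:
    """Write `data` row-wise into a rows x cols matrix, read it column-wise."""
    if len(data) != rows * cols:
        raise ValueError("data length must equal rows * cols")
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]


def block_deinterleave(data, rows: int, cols: int) -> list:
    """Invert block_interleave(data, rows, cols) by transposing back."""
    return block_interleave(data, cols, rows)
```

For instance, `block_interleave([0, 1, 2, 3, 4, 5], 2, 3)` gives `[0, 3, 1, 4, 2, 5]`, and `block_deinterleave` with the same dimensions restores the original order.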
ARCHITECTURAL POWER CONTROL
As explained earlier, a fully parallel architecture has been developed for high speed decoding. While only 4 ACS and BMmux units are used for constraint length 4 decoding, 128 ACS and BMmux units are used for constraint length 9 decoding. The power control mechanism [4] shuts down the clock inputs to the BMmux and ACS units that are not pertinent to the ongoing decoding (Figure 2), hence saving power. Also, depending upon the decoding type (Viterbi/Turbo), parts of the SMU that are not being used are shut down.
SYSTEM IMPLEMENTATION AND RESULTS
The proposed architecture was implemented on a Xilinx Virtex-II FPGA, and VHDL was used to describe the architecture. Table 1 shows the various tradeoffs involved in the proposed reconfigurable design. The gate counts for logic and memory have been separated in order to give an in-depth analysis of the architecture. Under the column 'Decoder Type', the numbers in brackets represent the constraint length of the decoder. Comparing the logic gate count for a standalone constraint length 9 Viterbi decoder (2nd row) with that for a reconfigurable constraint length 3 to 9 Viterbi decoder (5th row), we see that the gate overhead for reconfiguration is only 9%. Comparing the reconfigurable constraint length 3 to 9 Viterbi decoder with our VITURBO architecture (constraint length 3 to 9 Viterbi and Turbo), we see that the gate overhead for Turbo decoding is only 5%.
Table 2 shows the achievable data rates and the power consumption for different configurations of VITURBO. For Viterbi decoding, the throughput is one output per clock cycle (with some initial latency). As shown in Table 1, the maximum clocking frequency for VITURBO is 60.5 MHz, and hence data rates up to 60.5 Mbps are possible for Viterbi decoding. Turbo decoding throughput is lower, as numerous iterations are required to generate reliable results. For a clocking frequency of 60.5 MHz, the throughput is 3.54 Mbps (for four iterations).
From the table we can also compare the power consumption for different configurations of VITURBO. For the same throughput, constraint length 5 Viterbi decoding requires much less power than constraint length 9 Viterbi decoding, as the computational complexity of constraint length 5 decoding is much smaller. It is clear from the data presented that, while reconfigurable architectures provide enhanced flexibility at the cost of a marginal increase in gate requirements, the power consumption is limited to the active gates in the selected configuration.
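The throughput figures reported for VITURBO can be cross-checked with simple arithmetic. The sketch below uses only the reported numbers (60.5 MHz clock, one Viterbi output per cycle, 3.54 Mbps Turbo throughput at four iterations); the function names are hypothetical, and the derived cycle counts are back-of-the-envelope values, not figures stated by the authors.

```python
CLOCK_HZ = 60.5e6        # maximum clock frequency reported for VITURBO
TURBO_BPS = 3.54e6       # reported Turbo throughput for four iterations
TURBO_ITERATIONS = 4


def viterbi_throughput_bps(clock_hz: float, bits_per_cycle: float = 1.0) -> float:
    """Steady-state Viterbi throughput, ignoring the initial latency."""
    return clock_hz * bits_per_cycle


def cycles_per_turbo_bit(clock_hz: float, turbo_bps: float) -> float:
    """Average number of clock cycles spent per decoded Turbo bit."""
    return clock_hz / turbo_bps


if __name__ == "__main__":
    total = cycles_per_turbo_bit(CLOCK_HZ, TURBO_BPS)
    print(f"Viterbi: {viterbi_throughput_bps(CLOCK_HZ) / 1e6:.1f} Mbps")
    print(f"Turbo: {total:.2f} cycles per bit, "
          f"{total / TURBO_ITERATIONS:.2f} cycles per bit per iteration")
```

The ratio comes out at roughly 17 clock cycles per decoded Turbo bit, or a little over 4 cycles per bit per iteration, which illustrates why the iterative Turbo mode is so much slower than the one-bit-per-cycle Viterbi mode.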

CONCLUSIONS
In this paper, we have reported a single reconfigurable architecture for Viterbi and Turbo decoding. This architecture can provide throughputs in the range of 60 Mbps for constraint length 3-9 Viterbi decoding and 3.54 Mbps for SOVA based Turbo decoding (4 iterations). It was demonstrated that, with a 5% overhead in area (excluding memory), a constraint length 3-9 Viterbi decoder could support Turbo decoding. Such an architecture will find applications in devices that support multiple standards for wireless communications. Power saving techniques ensure that the architecture is feasible for receiver structures, where power is a critical issue.
