ABSTRACT
INTRODUCTION
The Viterbi algorithm (VA) is a maximum-likelihood decoding method that minimizes the probability of word (sequence) error for convolutional codes [1]. In digital communication systems, the VA is widely used both to decode convolutional error-control codes and to detect symbols. In recent years, a new class of concatenated convolutional codes called turbo codes was introduced by Berrou, Glavieux, and Thitimajshima [2]; they are well known for their excellent decoding accuracy. The turbo decoder consists of two component soft-output decoders that operate iteratively. The component soft-output algorithm prescribed in the original turbo code paper [2] is usually known as the maximum a-posteriori probability (MAP) algorithm [2], [4].
Because of its outstanding decoding ability, turbo coding developed rapidly within just a few years and has been standardized [5] in systems such as CDMA2000 and WCDMA. In these systems, the voice and data streams generally use different coding schemes, namely convolutional codes and turbo codes. Traditionally, the corresponding Viterbi and turbo decoders are built separately. Here, to save chip area and keep the design simple and efficient, we propose a unified solution that integrates the two decoders. Several timing charts for the VA or MAP have been presented in recent work [6], [7], [8], but there is no combined timing analysis of both algorithms. We therefore propose two types of triple-mode MAP/VA timing charts in which each algorithm fills the other's idle time. The three modes are: (a) MAP/VA concurrent mode, (b) VA single mode, and (c) MAP single mode. In addition, we introduce three techniques, namely interleaving, pointer, and parallel schemes, which can be applied to the timing charts to reduce memory or increase throughput. Finally, by analyzing the concepts and architectures of the Viterbi and turbo decoders, we apply circuit-sharing techniques to merge their main functions into one unified decoder.
TIMING ANALYSIS OF VITERBI DECODING

Viterbi decoding
In VA operation, the decision information must be stored and then fetched during trace-back. To reduce the storage space, sliding-window decoding is adopted, as shown in Fig. 1. In the first time unit (Area I), the forward path-metric recursion begins with equal probabilities at the head of the first sub block and stores the decision information. When the recursion in the second time unit (Area II) reaches the tail of the second sub block, we can start from any state and trace back during the third time unit (Area III) to find the converged state at the head of the second sub block. Then, in the fourth time unit (Area IV), the valid trace-back runs from this converged state back through the first sub block and decodes the correct information bits. The sliding-window method needs to reuse only three sub blocks of memory and achieves almost the same performance as a global (whole-frame) recursion. We define two parameters: the valid decoding region V and the shift S from the present operation to the next one. The unit L denotes the time needed to decode one sub block, i.e., the sliding-window length. In Fig. 1, the valid region equals L (V=L), and the shift equals L (S=L).
The shaded area indicates when and where the decision information must be kept. Observing a vertical cross line (a single time instant), we find that at any time the decoder needs three sub blocks of memory (M=3L), one forward path-metric (PM) recursive unit (Npm=1), and two trace-back (TB) units (Ntb=2). Finally, the throughput is V divided by S (T=V/S=1).
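The 3L memory figure can be checked with a small occupancy simulation. The sketch below is our own illustration, not the authors' hardware: it assumes (0-indexed) that block b is written by the PM unit in time unit b, that the warm-up trace-back over block b+1 happens in unit b+2, and that the valid trace-back of block b runs from tail to head in unit b+3, freeing each entry as it is read. The function name `va_type_b_peak_memory` is hypothetical.

```python
def va_type_b_peak_memory(L: int, num_blocks: int) -> int:
    """Peak number of stored decision vectors under an assumed Type B schedule.

    Assumed schedule (0-indexed time units, one sub block of length L per unit):
      unit b     : forward PM recursion over block b stores L decision vectors
      unit b + 2 : warm-up trace-back over block b + 1 (reads only)
      unit b + 3 : valid trace-back over block b, tail to head, freeing entries
    """
    live = set()  # (block, position) of decision vectors currently in memory
    peak = 0
    for t in range(num_blocks + 3):
        for j in range(L):
            # Warm-up trace-back over block t - 1 needs its decisions in memory.
            b_warm = t - 1
            if 1 <= b_warm < num_blocks:
                assert (b_warm, L - 1 - j) in live
            # Valid trace-back over block t - 3 reads position L-1-j, then frees it.
            b_valid = t - 3
            if 0 <= b_valid < num_blocks:
                live.remove((b_valid, L - 1 - j))  # KeyError if never stored
            # Forward PM recursion over block t stores the decision at position j.
            if t < num_blocks:
                live.add((t, j))
            peak = max(peak, len(live))
    return peak


print(va_type_b_peak_memory(L=8, num_blocks=10))  # 24, i.e. M = 3L
```

At any step the simulation also uses exactly one writer (the PM unit) and two readers (warm-up and valid trace-back), matching Npm=1 and Ntb=2.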
Several other VA timing schedules have also been studied; we arrange them and plot them graphically in Fig. 2. Compared with Type B, the memory is reduced in both Type A and Type C.
Timing analysis of Viterbi algorithm
Now, we introduce three techniques to reduce memory or increase throughput.
A. Interleaving Technique
To reduce the memory of Type B in Fig. 1, we use more than one PM recursive unit. The large triangular area of Type B, which represents the decision bits stored in memory, can then be interleaved into smaller triangular areas, as in Type D (If=4) in Fig. 3. The forward interleaving parameter If indicates the degree of interleaving: the memory is reduced to approximately 1/If of its original size, while the number of PM recursive units increases by approximately a factor of If.
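The first-order trade-off stated above can be tabulated directly. This is only a rough cost model that restates the two approximations in the text (memory divided by If, PM units multiplied by If), starting from the Type B figures M=3L and Npm=1; real schedules differ by boundary terms, and the helper name `interleaving_cost` is hypothetical.

```python
from fractions import Fraction


def interleaving_cost(base_memory_in_L, base_pm_units: int, i_f: int):
    """First-order cost of forward interleaving of degree i_f (If in the text).

    Memory shrinks to roughly 1/If of the base value (in units of L), while the
    number of PM recursive units grows by roughly a factor of If.
    """
    if i_f < 1:
        raise ValueError("the degree of interleaving must be at least 1")
    return Fraction(base_memory_in_L) / i_f, base_pm_units * i_f


# Type B of Fig. 1: M = 3L, Npm = 1.
for i_f in (1, 2, 4, 8):
    mem, units = interleaving_cost(Fraction(3), 1, i_f)
    print(f"If={i_f}: M ~ {float(mem):.3g} L, Npm ~ {units}")
```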
B. Pointer Technique
From the above discussion, the interleaving technique reduces memory but increases the number of PM recursive units. Observing Type D in Fig. 3, we find that as the degree of interleaving increases, more and more PM recursive units are used, yet only their triangular areas are useful. Therefore, if we could run only the triangular sections, the hardware or power could be reduced. The problem is where the PM recursive units obtain their initial path metrics. In Type E in Fig. 4, one PM recursive unit pre-computes the initial path metrics, stores them in registers, and passes them to the other PM recursive units, which then compute only the triangular sections. The forward pointer parameter Pf indicates the degree of pointer: the memory is reduced to approximately 1/Pf, while the hardware remains approximately the same.
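A software analogue of the pointer idea is checkpointing: one unit runs the add-compare-select (ACS) recursion over the window but keeps only the path-metric vector at the start of each segment, and any other unit can restart from such a pointer and regenerate that segment's decisions exactly. The sketch assumes a rate-1/2, K=3 code with generators (7, 5) octal and hard-decision Hamming metrics; all names (`acs`, `precompute_pointers`, `recompute_segment`) and the message are hypothetical.

```python
from typing import List, Tuple

# Hypothetical example code: rate-1/2, K = 3 convolutional code, generators (7, 5) octal.
K = 3
N_STATES = 1 << (K - 1)
GENS = (0b111, 0b101)
INF = 10**9


def step(state: int, bit: int) -> Tuple[int, Tuple[int, ...]]:
    """Next state and the two coded bits for one input bit."""
    reg = (bit << (K - 1)) | state
    out = tuple(bin(reg & g).count("1") & 1 for g in GENS)
    return reg >> 1, out


def encode(bits: List[int]) -> List[Tuple[int, ...]]:
    state, symbols = 0, []
    for b in bits:
        state, out = step(state, b)
        symbols.append(out)
    return symbols


def acs(pm: List[int], sym) -> Tuple[List[int], List[int]]:
    """One ACS step with Hamming branch metrics.

    Returns the new path metrics and, for each state, its surviving predecessor
    (the decision information that would otherwise be written to memory)."""
    new_pm, decision = [INF] * N_STATES, [0] * N_STATES
    for s in range(N_STATES):
        for b in (0, 1):
            ns, out = step(s, b)
            m = pm[s] + (out[0] != sym[0]) + (out[1] != sym[1])
            if m < new_pm[ns]:
                new_pm[ns], decision[ns] = m, s
    return new_pm, decision


def precompute_pointers(symbols, seg_len: int) -> List[List[int]]:
    """Pre-computing unit: keep only the path metrics at each segment start."""
    pm, pointers = [0] * N_STATES, []  # equal-probability start
    for i, sym in enumerate(symbols):
        if i % seg_len == 0:
            pointers.append(list(pm))
        pm, _ = acs(pm, sym)  # decisions are discarded here
    return pointers


def recompute_segment(symbols, pointers, seg_len: int, k: int) -> List[List[int]]:
    """Another PM unit restarts from pointer k and regenerates segment k's decisions."""
    pm, decisions = list(pointers[k]), []
    for sym in symbols[k * seg_len:(k + 1) * seg_len]:
        pm, d = acs(pm, sym)
        decisions.append(d)
    return decisions


msg = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]
rx = encode(msg)
rx[5] = (1 - rx[5][0], rx[5][1])  # inject one channel error

# Reference: a single unit that stores every decision vector.
pm, full = [0] * N_STATES, []
for sym in rx:
    pm, d = acs(pm, sym)
    full.append(d)

L, Pf = len(rx), 4
seg = L // Pf
ptrs = precompute_pointers(rx, seg)
rebuilt = [d for k in range(Pf) for d in recompute_segment(rx, ptrs, seg, k)]
print(rebuilt == full)  # True: each segment is regenerated from its pointer alone
```

Because each segment's decisions can be rebuilt from one stored path-metric vector, only about L/Pf decision vectors need to be live at a time, which is the memory reduction the text attributes to the pointer technique.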
C. Parallel Technique
The throughput T is determined by the valid region V and the shift S:

$$T = \frac{V}{S} \qquad (1)$$

For Type B in Fig. 1, V = S = L, so its throughput equals one. To increase throughput, we can therefore enlarge the valid region or decrease the shift, as in Type F in Fig. 5.
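Equation (1) can be evaluated for a few configurations. The sketch measures V and S in units of L; apart from Type B (V=S=L), the numbers are hypothetical and do not describe the exact layout of Type F in Fig. 5.

```python
from fractions import Fraction


def throughput(valid, shift) -> Fraction:
    """Throughput T = V / S from Eq. (1); V and S share the same unit (e.g. L)."""
    if shift <= 0:
        raise ValueError("the shift S must be positive")
    return Fraction(valid) / Fraction(shift)


configs = {
    "Type B (Fig. 1)": (Fraction(1), Fraction(1)),             # V = L,  S = L
    "hypothetical: V doubled": (Fraction(2), Fraction(1)),     # V = 2L, S = L
    "hypothetical: S halved": (Fraction(1), Fraction(1, 2)),   # V = L,  S = L/2
}
for name, (v, s) in configs.items():
    print(f"{name}: T = {throughput(v, s)}")
```

In both hypothetical cases T doubles; in hardware this gain comes from running additional recursive units in parallel, as the name of the technique suggests.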
Timing of Viterbi decoding with forward pointer technique, Type E (Pf=4). 
D. Combined Approach
For high throughput and low memory, the interleaving, pointer, and parallel techniques can be combined. Table 1 summarizes all VA timing charts.
TIMING ANALYSIS OF MAP DECODING
MAP decoding
In Log-MAP operation, the forward-alpha values must be stored and then fetched during the backward-beta recursion (or vice versa). Because of the high cost of large memory and long latency, sliding-window decoding is adopted. A long frame is divided into several sub blocks whose length is about five times the constraint length. For convenience, we define one time unit as the time of forward or backward accumulation over one sub block. In the first time unit (Area I) of Fig. 7(a), the backward-beta recursion begins with equal probabilities at the tail of the second sub block. When the recursion reaches the head of the second sub block, the accumulated beta values have converged and are used as the initial beta values for the valid backward-beta recursion in the second time unit (Area II). The valid beta values of the second time unit must be stored in memory. Then, in the third time unit (Area III), the forward-alpha recursion starts at the head of the first sub block and combines the valid beta values from memory to produce the LLR values. The shaded area indicates when and where the valid beta values must be kept. At any time instant, the decoder needs one sub block of memory, one forward recursive unit (RUA), and two backward recursive units (RUB). With the same hardware as in Fig. 7(a), the memory can instead hold half alpha and half beta values (Fig. 7(b)) or only alpha values (Fig. 7(c)).
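The one-sub-block memory figure follows from the two shaded triangles: while the valid beta recursion fills block b from tail to head, the alpha recursion drains block b-1 from head to tail. The sketch below simulates this under an assumed 0-indexed schedule (warm-up over block b+1 in unit b, valid beta over block b in unit b+1, alpha over block b in unit b+2); it is our illustration, and the function name is hypothetical.

```python
def map_type_b_peak_memory(L: int, num_blocks: int) -> int:
    """Peak number of stored beta vectors under an assumed Fig. 7(a)-style schedule.

    Assumed schedule (0-indexed time units, one sub block of length L per unit):
      unit b     : warm-up beta recursion over block b + 1 (nothing stored)
      unit b + 1 : valid beta recursion over block b, tail to head, storing betas
      unit b + 2 : alpha recursion over block b, head to tail, reading and freeing betas
    """
    live = set()  # (block, position) of beta vectors currently in memory
    peak = 0
    for t in range(num_blocks + 2):
        for j in range(L):
            b_alpha = t - 2
            if 0 <= b_alpha < num_blocks:
                live.remove((b_alpha, j))  # KeyError if beta was not ready in time
            b_beta = t - 1
            if 0 <= b_beta < num_blocks:
                live.add((b_beta, L - 1 - j))
            peak = max(peak, len(live))
    return peak


print(map_type_b_peak_memory(L=8, num_blocks=10))  # 8, i.e. one sub block (M = L)
```

In any time unit t of this schedule, one RUB warms up over block t+1, a second RUB computes valid betas for block t-1, and the RUA runs over block t-2, consistent with one RUA and two RUBs.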
Timing analysis of MAP algorithm
Several MAP timing schedules have been studied in recent work [6], [10]; we arrange them and plot them graphically below.
A. Interleaving Technique
As in the VA, to reduce the memory of Type B (D=2L) in Fig. 7(a), we use more than one beta recursive unit (RUB). The large triangular area of Type B, which represents the backward recursive probabilities stored in memory, can then be interleaved into two smaller triangular areas (Ib=2), as in Type C in Fig. 9. The backward interleaving parameter Ib indicates the degree of interleaving: the memory is reduced to approximately 1/Ib, while the number of beta recursive units increases by approximately a factor of Ib.
Forward interleaving can also be used to reduce memory, as in Type D in Fig. 10. The forward interleaving parameter If indicates the degree of interleaving: the memory is reduced to approximately 1/If, while the number of forward recursive units increases by approximately a factor of If. Moreover, backward and forward interleaving can be combined, as in Type E in Fig. 11.
B. Pointer Technique
To improve on the interleaving technique, the pointer method is also used in the MAP timing charts. In Type F in Fig. 12(a), the backward pointer parameter Pb indicates the degree of pointer: the memory is reduced to approximately 1/Pb, while the hardware remains approximately the same. The forward pointer can also be used, as in Type G in Fig. 12(b); the forward pointer parameter Pf likewise reduces the memory to approximately 1/Pf with approximately the same hardware. Moreover, backward and forward pointers can be combined, as in Type H in Fig. 13.
C. Parallel Technique
As in the VA, the throughput T is determined by the valid region V and the shift S. To increase throughput, we can enlarge the valid region (Type I in Fig. 14(a)) or decrease the shift (Type J in Fig. 14(b)).
D. Combined Approach
For high throughput and low memory, backward interleaving can be combined with the parallel technique (Type L in Fig. 15(a)), forward interleaving with the parallel technique (Type M in Fig. 15(b)), and backward and forward interleaving together with the parallel technique (Type N in Fig. 15(e)). Similarly, the backward pointer can be combined with the parallel technique (Type O in Fig. 15(c)), the forward pointer with the parallel technique (Type P in Fig. 15(d)), and backward and forward pointers together with the parallel technique (Type Q in Fig. 15(f)). Table 2 summarizes the MAP timing charts.
TIMING ANALYSIS OF TRIPLE-MODE MAP/VA DECODING
Having discussed the MAP and VA timing charts in detail, we now look for a timing chart that combines both algorithms. In the Type A timing chart of the VA, the PM recursive unit is idle half of the time, and the alpha recursive unit in the Type A timing chart of MAP is in the same situation. We therefore combine the timing charts of MAP and VA so that both can decode at the same time, as shown in Fig. 16. The principle of the combination is to use each other's idle time; thus, the decoder can run in MAP mode, VA mode, or even MAP/VA mode.
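The sharing principle reduces to a conflict check: a single forward recursive unit can serve both decoders only if their busy periods never coincide. The sketch assumes, purely for illustration, that the VA needs the shared unit in even time units and the MAP decoder in odd ones; the real busy patterns are those of Fig. 16, and `merge_schedules` is a hypothetical name.

```python
from typing import List, Optional


def merge_schedules(va_busy: List[bool], map_busy: List[bool]) -> List[Optional[str]]:
    """Assign one shared forward recursive unit per time unit; raise on any clash."""
    owner: List[Optional[str]] = []
    for t, (va, mp) in enumerate(zip(va_busy, map_busy)):
        if va and mp:
            raise ValueError(f"both modes need the shared unit in time unit {t}")
        owner.append("VA" if va else "MAP" if mp else None)
    return owner


# Hypothetical busy patterns: each decoder leaves the unit idle every other time unit.
units = 8
va_busy = [t % 2 == 0 for t in range(units)]
map_busy = [t % 2 == 1 for t in range(units)]
owner = merge_schedules(va_busy, map_busy)
print(owner)
print("shared-unit utilization:", sum(o is not None for o in owner) / units)  # 1.0
```

Each decoder keeps half of the unit's time, which is consistent with the T=0.5 listed for both VA and MAP in the MAP/VA mode of Fig. 16.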
Fig. 16. Timing of triple-mode decoding: Type A of VA and Type A (D=0L) of MAP. VA: Npm=1 (shared), Ntb=1, M=2L+1R, T=0.5. MAP: Na=1 (shared), Nb=1, M=L, T=0.5.
We propose two types of triple-mode MAP/VA timing charts, each supporting a VA mode, a MAP mode, and a MAP/VA mode. In Type I, the MAP/VA mode in Fig. 16 combines Type A of VA and Type A (D=0L) of MAP. In the VA mode (Fig. 17(a)), we apply the interleaving technique (If=2) using the originally shared forward recursive unit. In the MAP mode (Fig. 17(b)), we likewise apply the pointer technique (Pf=4) using the originally shared hardware. Moreover, we convert the backward recursive unit of the MAP part into a forward recursive unit by exchanging the input and output ports of the trellis wiring. Thus, in VA mode, the otherwise idle backward recursive unit of the MAP part can be used to apply the pointer technique (Fig. 18). In Type II, an additional forward recursive unit is used with the pointer technique to reduce memory (Fig. 19). In particular, in MAP mode it can either operate like the original MAP part (Fig. 19(a)) or increase throughput (Fig. 19(c)).
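The port-exchange idea can be checked in software: a max-log beta recursion over a trellis produces the same values as a generic forward recursion run on the same trellis with every edge's source and destination swapped and the inputs fed in reverse order. The sketch assumes max-log-MAP metrics, BPSK correlation branch metrics, and the (7, 5) K=3 code; the function names and soft inputs are made up for illustration and do not describe the authors' circuit.

```python
from typing import List, Tuple

# Hypothetical example: K = 3, (7, 5) code, max-log metrics, BPSK correlation metrics.
K, GENS, NEG = 3, (0b111, 0b101), -10**9
Edge = Tuple[int, int, Tuple[int, ...]]  # (from_state, to_state, coded bits)


def trellis_edges() -> List[Edge]:
    edges = []
    for s in range(1 << (K - 1)):
        for b in (0, 1):
            reg = (b << (K - 1)) | s
            out = tuple(bin(reg & g).count("1") & 1 for g in GENS)
            edges.append((s, reg >> 1, out))
    return edges


def gamma(out, y) -> int:
    """Branch metric: correlation of soft inputs y with BPSK symbols (0 -> +1, 1 -> -1)."""
    return sum((1 - 2 * c) * v for c, v in zip(out, y))


def forward_unit(init: List[int], edges: List[Edge], ys) -> List[List[int]]:
    """Generic forward recursion: metric[dst] = max over edges of metric[src] + gamma."""
    metrics = [list(init)]
    for y in ys:
        prev, cur = metrics[-1], [NEG] * len(init)
        for src, dst, out in edges:
            cur[dst] = max(cur[dst], prev[src] + gamma(out, y))
        metrics.append(cur)
    return metrics


def backward_unit(init: List[int], edges: List[Edge], ys) -> List[List[int]]:
    """Direct beta recursion: beta[src] = max over edges of beta[dst] + gamma."""
    betas = [list(init)]
    for y in reversed(ys):
        nxt, cur = betas[0], [NEG] * len(init)
        for src, dst, out in edges:
            cur[src] = max(cur[src], nxt[dst] + gamma(out, y))
        betas.insert(0, cur)
    return betas


edges = trellis_edges()
swapped = [(dst, src, out) for src, dst, out in edges]  # exchange input/output ports
ys = [(3, -2), (-4, 5), (1, 1), (-2, -3), (5, -1), (0, 2)]  # made-up soft inputs
init = [0] * (1 << (K - 1))  # equal-probability start

betas = backward_unit(init, edges, ys)
as_forward = forward_unit(init, swapped, list(reversed(ys)))
print(list(reversed(as_forward)) == betas)  # True
```

Since the forward unit's arithmetic is unchanged and only its edge list (the wiring) differs, one ACS datapath can serve either direction.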
Figure legend: VA: Npm=2 (shared), Ntb=1, M=1L+1R, T=0.5. MAP: Na=2 (shared), Nb=1, M=L/4+3R, T=0.5.
CONCLUSIONS
In this paper, we present three techniques (interleaving, pointer, and parallel schemes) to reduce memory or increase throughput in both Viterbi and MAP decoding. We then present several timing charts for the two algorithms and compare them. Finally, we propose two types of triple-mode MAP/VA timing charts, which can be used to design a unified convolutional/turbo decoder for modern communication systems such as 3GPP handsets.
