In this paper, a new implementation of the Viterbi decoder (VD) is proposed. The Modified State-Mapping VD algorithm combines the Trace Back (TB) algorithm with the Register Exchange (RE) algorithm. By updating the starting point of the state for each memory bank, and by using Trace Back and Trace Forward information, the LIFO (Last In, First Out) operation can be eliminated, which reduces the latency of the TB algorithm and decreases the resource usage of the RE algorithm. When the memory unit k is 3, the resource usage is 13184 bits and the latency is 54 clocks. The latency of the proposed algorithm is 25% smaller than that of the modified RE (MRE) algorithm and 50% smaller than that of the k-pointer even TB algorithm. In addition, the resource usage is 50% smaller than that of the RE algorithm. The resource usage is slightly larger than that of the MRE algorithm for small values of k, but it becomes smaller once k exceeds 16.
key words: Viterbi decoder, register exchange, trace back, latency, resource usage
Introduction
The Viterbi algorithm, which performs maximum likelihood decoding (MLD), is widely used in modern digital communication to achieve low-error-rate decoding. There are two main algorithms for implementing the Viterbi decoder. The first is the Trace Back (TB) algorithm [1], and the second is the Register Exchange (RE) algorithm [2]. Because of its simplicity, many applications use the TB algorithm. However, the drawback of this algorithm is its rather long latency (2 to 3 times the depth T). The register exchange algorithm [3] uses a trace forward scheme to achieve the optimum latency (T). Because the RE algorithm has large resource usage and power consumption, the TB algorithm is preferred in designs with a large constraint length. Recently, a combination of TB and RE was introduced in [4]. By eliminating the trace back operation from the TB algorithm, it improves both resource usage and latency.
In Feygin's paper [5], the latency and resource usage of various decoding methods are expressed as generalized equations. In this paper, Feygin's equations are revised and applied to the k-pointer even TB algorithm, the RE algorithm, the MRE algorithm and the proposed Modified State-Mapping algorithm.
The rest of the paper is organized as follows. In Sect. 2, the k-pointer even TB, RE and MRE algorithms are described and analyzed. In Sect. 3, the proposed algorithm is analyzed in generalized equations and its performance is compared with the previously described algorithms. Finally, we conclude the paper in Sect. 4.
Table 1 The abbreviations used in this paper.
The Operation and the Performance Analysis of the Basic Viterbi Algorithms

The k-Pointer Even TB Algorithm
The trace back sequence is depicted in Fig. 1, where the constraint length K is 3 and the source sequence is 1101100. From the last stage, a trace back is performed using the decision information produced by the ACS operation. After tracing back from the last stage to the first, we obtain the decoded sequence in reverse order, which means the TB algorithm requires a LIFO operation.
Figure 2 shows the pipelined operation of the 3-pointer even algorithm. After decision information has been written into three blocks of the memory bank, the trace back operation in TB mode runs over two blocks of the memory bank. If the number of stages traced back over these two blocks is large enough (> T), the starting point for tracing in decoding (DC) mode converges with high probability. The decoding operation then starts in the first memory bank, and the source data is decoded in reverse order.
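The trace back and LIFO reversal described above can be illustrated with a minimal sketch. It assumes a rate-1/2 code with generator polynomials (7, 5) in octal, hard-decision Hamming metrics, an all-zero start state and a noiseless channel; none of these are stated in the text, and the function names are hypothetical. Only K = 3 and the source sequence 1101100 come from the Fig. 1 example.

```python
K = 3                        # constraint length of the Fig. 1 example
N_STATES = 1 << (K - 1)
G = (0b111, 0b101)           # ASSUMED generators (7, 5) octal; not given in the text

def parity(x):
    return bin(x).count("1") & 1

def step(state, bit):
    """One encoder step: return (next_state, output_pair)."""
    reg = (bit << (K - 1)) | state        # newest input bit sits in the MSB
    return reg >> 1, tuple(parity(reg & g) for g in G)

def encode(bits):
    state, coded = 0, []
    for b in bits:
        state, out = step(state, b)
        coded.append(out)
    return coded

def viterbi_traceback(coded):
    inf = float("inf")
    pm = [0] + [inf] * (N_STATES - 1)     # path metrics, start in state 0
    decisions = []                        # one decision bit per state per stage
    for received in coded:
        new_pm = [inf] * N_STATES
        dec = [0] * N_STATES
        for s in range(N_STATES):         # ACS for destination state s
            bit = s >> (K - 2)            # input bit that leads into s
            for d in (0, 1):              # d = LSB of the candidate predecessor
                prev = ((s << 1) & (N_STATES - 1)) | d
                _, out = step(prev, bit)
                metric = pm[prev] + sum(a != b for a, b in zip(out, received))
                if metric < new_pm[s]:
                    new_pm[s], dec[s] = metric, d
        pm = new_pm
        decisions.append(dec)
    # Trace back from the best final state; bits come out newest first.
    state = min(range(N_STATES), key=pm.__getitem__)
    lifo = []
    for dec in reversed(decisions):
        lifo.append(state >> (K - 2))
        state = ((state << 1) & (N_STATES - 1)) | dec[state]
    return lifo[::-1]                     # the LIFO step restores source order

source = [1, 1, 0, 1, 1, 0, 0]
decoded = viterbi_traceback(encode(source))
```

The final `lifo[::-1]` is the reversal that the RE-based schemes discussed later avoid.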
The analysis of resource usage and latency is as follows. The number of columns per memory bank ($U$) is $T/(k-1)$ [5]. The number of memory banks accessed to decode one memory bank ($B$) is $2k$, and the total number of memory banks ($B'$) is $2k$. Therefore, the total latency ($L$) is the number of columns per memory bank multiplied by the number of memory banks accessed: $L = UB = 2kT/(k-1)$. From the resource point of view, PM update memories and decision information memories are used ($M_{pu}$ and $M_d$).
We assume that the bit width of PM is 10 bits. The memories for LIFO ($2U - 2$ bits) are also used. Thus, the total resource usage is 20
Register Exchange Algorithm
Figure 3 shows the Register Exchange sequence. The memory for the decision information is updated and exchanged at every stage. The exchange operation involves high power consumption, but the content of the memory is itself the decoded output sequence, which means the RE algorithm does not require a LIFO operation. When the value of K is 7, the size of the ACS decision information memory is 36 (T). This algorithm has a fixed latency of T clocks, but it can be analyzed and represented with the same method as the k-pointer TB algorithm.
The parameters are $U = T/(k-1)$, $B = k-1$, $B' = k-1$, $M_{pu} = 2^{K-1} \cdot 10 \cdot 2$, $M_p = 2^{K-1} \cdot 10 \cdot UB'$, and $M_d = 2^{K-1} UB'$. Therefore, the latency is $L = U \times B = T$ and $T_r = 2^{K-1} \cdot 10 \cdot 2 + T \cdot 2^{K-1} \cdot (10 + 1)$.
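As a worked check of the RE expressions above, they can be evaluated for the configuration used elsewhere in this paper, K = 7 and T = 36. The numbers below are only an evaluation of the stated formulas; the latency does not depend on k.

```latex
\begin{align*}
L   &= U \times B = \frac{T}{k-1}\,(k-1) = T = 36 \text{ clocks},\\
T_r &= 2^{K-1}\cdot 10\cdot 2 + T\cdot 2^{K-1}\cdot(10+1)\\
    &= 64\cdot 20 + 36\cdot 64\cdot 11 = 1280 + 25344 = 26624 \text{ bits}.
\end{align*}
```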
The Modified RE Algorithm
The modified RE algorithm is described in [4]. This algorithm is a combined version of the TB and RE algorithms. While the RE algorithm exchanges the decision information itself, the MRE algorithm exchanges the pointers of states. Because there are only 64 states for the Viterbi decoder with K = 7, the pointer is only 6 bits wide, and these pointers are exchanged at every clock. This operation eliminates the trace back sequence. That is, the starting point of the decoding operation is resolved at the time the writing operation is finished (T clocks later).
Figure 4 describes the operation of the Modified RE algorithm. As in the 3-pointer even TB algorithm, the decision information is written into the memory bank, but the pointers (Mi) of the starting states are also exchanged at every clock. After the pointers of states have been exchanged for T (depth) stages, their values indicate the address of the converged survivor path, which means there is no need to trace back to find the converged survivor path.
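The pointer exchange described above can be sketched as follows. This is an illustration under assumptions, not the implementation from [4]: the survivor decisions are given as a hypothetical per-stage table `pred[s]` (the predecessor chosen by state `s`), and a 4-state trellis stands in for the 64-state one. The second function performs an ordinary trace back so the two results can be compared.

```python
def exchange_pointers(predecessors, n_states):
    """Carry each state's starting-state address forward, one stage per clock."""
    ptr = list(range(n_states))          # at the first stage every state points to itself
    for pred in predecessors:
        # State s inherits the pointer held by the predecessor it selected.
        ptr = [ptr[pred[s]] for s in range(n_states)]
    return ptr

def trace_back_start(predecessors, end_state):
    """Reference: find the starting state by walking the decisions backwards."""
    state = end_state
    for pred in reversed(predecessors):
        state = pred[state]
    return state

# Hypothetical survivor decisions for three stages of a 4-state trellis.
preds = [[0, 2, 0, 2],
         [1, 2, 0, 3],
         [1, 2, 1, 2]]
pointers = exchange_pointers(preds, 4)
```

In this example all four pointers end up holding the same address, which corresponds to the converged case: the starting state is known as soon as writing finishes, with no backward pass.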
To decode one memory bank in Fig. 4, one set of pointers is needed, so two sets of pointers are needed in this pipelined operation. The analysis of resource usage and latency is as follows.
Thus the latency $L$ is $UB$, which is equal to $T(k+1)/(k-1)$. From the resource point of view, M pu
* UB , and the memory block for LIFO is $2U - 2$. Therefore the total resource is equal to 2
Table 2 is a summary of the generalized performance equations for the previously described Viterbi algorithms, for which the constraint length ($K$) is 7 and the depth ($T$) is 36.
The Modified State-Mapping Algorithm
Operation of Modified State-Mapping Algorithm
The pipelined operation of a Viterbi decoder using the Modified State-Mapping algorithm is shown in Fig. 5. Instead of the (k − 1) address sets used in the MRE scheme [4], only one address set is used for each memory bank. The address set has $2^{K-1}$ pointers, one for each state. At each stage, these pointers carry forward the address of the state at which each path started in the memory bank. For example, in Fig. 5, the addresses of the states at time 0 in memory bank0 are exchanged until T/2. That is, in one memory bank, each pointer value at the last stage holds the address of the starting state at the first stage. At time 3T/2, the pointer value in memory bank2 holds the starting state's address, and that address can be used as an index into the pointers of bank1 to find the starting state's address in bank1. This operation is called "MAP (mapping)" in Fig. 5. The converged point of the survivor path can be determined by using this operation.
Figure 6 shows the memory contents for the Modified State-Mapping method. Trace Forward information (TFinfo) and Trace Back information (TBinfo) are generated by the ACS (Add, Compare and Select) operation and are used to decode the source sequences in the memory bank. TFinfo indicates where the state goes in the next stage, and TBinfo indicates which path the state comes from. TFinfo uses 2 bits per state to indicate the status of that state: "00" means a terminated state, which has no path to the next stage; "10" means that there is an upper path to the next state; "01" means that there is a lower path to the next state; and "11" means that both upper and lower paths to the next state exist. To eliminate one of the two paths at a state whose TFinfo is "11", a partial trace back operation is used, as shown in Fig. 6. Tracing back from the terminated state yields the revised TFinfo shown in Fig. 7. For example, the TFinfo "11" at stage 6 in Fig. 6 is revised to "10" by the partial trace back operation from the state with status "00" at stage 7.
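The TFinfo encoding, its revision by partial trace back, and the MAP step can be sketched as follows. This is a simplified illustration under stated assumptions: a 4-state trellis (K = 3), TBinfo stored as the full predecessor state index rather than a single decision bit, the "upper" path taken to be the transition for input bit 0 and the "lower" path the one for input bit 1, and hypothetical function names. The actual memory layout of Figs. 6 and 7 is not reproduced.

```python
K = 3                          # small constraint length, for illustration only
N_STATES = 1 << (K - 1)
UPPER, LOWER = 0b10, 0b01      # ASSUMED: upper = input bit 0, lower = input bit 1

def input_bit(state):
    """Decoding rule from the text: output 1 if state > 2^(K-1)/2 - 1."""
    return 1 if state > (N_STATES // 2) - 1 else 0

def build_tfinfo(tbinfo):
    """tbinfo[t][s] is the predecessor (at stage t) chosen by state s of stage t+1.
    Returns tfinfo[t][p]: which forward paths leave state p at stage t."""
    tfinfo = []
    for pred in tbinfo:
        row = [0b00] * N_STATES
        for s, p in enumerate(pred):
            row[p] |= LOWER if input_bit(s) else UPPER
        tfinfo.append(row)
    return tfinfo

def revise_tfinfo(tfinfo, tbinfo):
    """Partial trace back: remove forward paths that lead into terminated states."""
    tf = [row[:] for row in tfinfo]
    for t in range(len(tf) - 1, 0, -1):        # sweep from the last stored stage
        for s in range(N_STATES):
            if tf[t][s] == 0b00:               # terminated state at stage t
                p = tbinfo[t - 1][s]           # TBinfo: where state s came from
                tf[t - 1][p] &= ~(LOWER if input_bit(s) else UPPER)
    return tf

def map_start(older_bank_ptr, newer_bank_ptr):
    """MAP: use the newer bank's pointer as an index into the older bank's pointers."""
    return [older_bank_ptr[newer_bank_ptr[s]] for s in range(N_STATES)]

# Hypothetical TBinfo for two stages; state 2 of the middle stage is terminated.
tbinfo = [[0, 2, 0, 2],
          [0, 3, 1, 3]]
tfinfo = build_tfinfo(tbinfo)
revised = revise_tfinfo(tfinfo, tbinfo)
```

In this example state 0 of the first stage starts with TFinfo "11" and is revised to "10", because its lower successor has TFinfo "00" at the next stage. This mirrors the stage 6 / stage 7 example in the text.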
In order to decode one block of data, the TB unit must go through DC&WR (Decoding after mapping & Writing) mode, TFIR (Trace Forward Information Revision) mode, MAP (mapping) mode, and DC&WR mode again. Decoding after mapping in DC&WR mode means the decision operation along the converged survivor path using the TFinfo, and Writing means writing the TF and TB information into the memory. In TFIR mode, a partial trace back operation from the terminated state eliminates the dual paths. To perform this operation, TBinfo is also written into the memory. In MAP mode, the starting state is determined by the value of the pointer. From the converged starting point, the decoding operation is performed by using the revised TFinfo. If the state address is larger than $(2^{K-1}/2) - 1$, the decoded output is "1." This output sequence is the decoded output itself, so the LIFO operation is not required. As in the modified RE algorithm [4], the addresses of the states are exchanged at every clock, but the exchange is performed in each memory bank independently.
Latency and Resource Usage Analysis of the Proposed Algorithm
Table 3 shows the resource usage of the proposed algorithm. By using an additional $M_s$ ($2^{K-1} \cdot 6k$) bits, the $M_{su}$ resource of the Modified State-Mapping scheme is reduced to $1/(k-1)$ of that of the MRE algorithm. Because this algorithm eliminates the LIFO operation, the decoded sequence is decided at the same time as the DC operation. This means the value of $B$ is $k$. In exchange for this reduced latency, additional Trace Forward resources ($2 M_d$ bits) are used. The $U$ and $B'$ values of this algorithm are the same as those of the MRE algorithm. The total latency is $L = UB = kT/(k-1)$ and the total resource usage is equal to $2^{K-1}(26 + 6k + 3kT/(k-1))$, where the factor 3 counts the 1-bit TBinfo ($M_d$) plus the 2-bit TFinfo ($2 M_d$) stored per state and stage; for K = 7, T = 36 and k = 3 this gives the 13184 bits quoted in this paper.
Figure 8 shows the latency and resource usage as functions of the parameter k. As shown, the shortest latency is achieved by the RE algorithm, and the smallest resource usage is achieved by the k-pointer even algorithm. The latency of MRE converges to the latency of the RE algorithm at about k = 15, but its resource usage also rises toward that of the RE algorithm, because the number of pointers increases as k increases. The proposed method reduces latency to almost half of the MRE algorithm's latency, and the resource usage increases very slowly as the value of k increases.
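The latency and resource figures quoted in this paper can be reproduced with a short calculation. The sketch below uses the latency expressions given in the text for each algorithm. For the proposed scheme's resource usage it assumes 3 bits per state and stage (1-bit TBinfo plus 2-bit TFinfo), which is the reading that yields the quoted 13184 bits. The function names and the `"msm"` label are hypothetical, and the MRE bank count k + 1 is inferred from its stated latency T(k + 1)/(k − 1).

```python
from fractions import Fraction

def latency(algorithm, T, k):
    """Latency L = U * B in clocks, with U = T / (k - 1) columns per memory bank."""
    U = Fraction(T, k - 1)
    banks = {
        "tb": 2 * k,      # k-pointer even TB: L = 2kT/(k-1)
        "re": k - 1,      # register exchange: L = T
        "mre": k + 1,     # modified RE:       L = T(k+1)/(k-1)
        "msm": k,         # proposed scheme:   L = kT/(k-1)
    }
    return U * banks[algorithm]

def re_resource(K, T, pm_bits=10):
    """T_r of the RE algorithm, in bits."""
    return 2 ** (K - 1) * pm_bits * 2 + T * 2 ** (K - 1) * (pm_bits + 1)

def msm_resource(K, T, k):
    """Resource usage of the proposed scheme, ASSUMING 3 bits per state per stage."""
    return 2 ** (K - 1) * (26 + 6 * k + 3 * Fraction(k * T, k - 1))

K, T, k = 7, 36, 3
for name in ("tb", "re", "mre", "msm"):
    print(name, latency(name, T, k))
print("proposed / MRE latency:", latency("msm", T, k) / latency("mre", T, k))
print("proposed / TB latency: ", latency("msm", T, k) / latency("tb", T, k))
print("proposed / RE resource:", float(msm_resource(K, T, k) / re_resource(K, T)))
```

`Fraction` keeps the results exact for values of k where T/(k − 1) is not an integer.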
Conclusion
This paper proposed an implementation scheme for the Viterbi decoder. The Modified State-Mapping algorithm uses state pointers that indicate the starting point of each trace unit. After this pointer is mapped to the first stage, the trace forward operation is used, which reduces the total decoding latency. When k is 3, the resource usage is 13184 bits and the latency is 54 clocks. The latency of the proposed algorithm is 25% shorter than that of the MRE algorithm and 50% shorter than that of the k-pointer even TB algorithm. The resource usage is 50% smaller than that of the RE algorithm and grows more slowly than that of the MRE algorithm as the trace unit increases. It is slightly larger than that of the MRE algorithm for small values of k, but it becomes smaller once k exceeds 16.
