A simplification of the MAP decoder for use in turbo decoders is presented. The simplification eliminates the need for a ROM or multiplexor-tree lookup table and replaces it with a constant value. Simulations show that the performance of turbo decoders is not adversely affected by this simplification.
I. INTRODUCTION
Maximum-a-posteriori (MAP) decoding of convolutional codes has seen a resurgence of interest since the discovery of turbo codes in 1993 [1]. MAP decoders make optimum symbol-by-symbol decisions and provide 'soft' reliability information, which is necessary in concatenated decoding systems such as turbo decoders. The BCJR algorithm [2] is the most commonly used MAP algorithm in turbo decoding. The BCJR algorithm suffers from shortcomings that make it unsuitable for VLSI implementation, namely the requirement of multiplications and exponentiations. The concept of applying the Jacobi logarithm to simplify MAP decoders was first introduced by Erfanian and Pasupathy [3]. In the logarithmic form of these algorithms, exponentials disappear, multiplications become additions and additions become the MAX* operation (using Viterbi's notation [4]). In this communication we introduce a simplified version of the MAX* operation and demonstrate the performance of a turbo decoder using our simplification.
II. THE LOG-BCJR ALGORITHM
We briefly describe the BCJR algorithm in the logarithmic domain, the Log-BCJR algorithm. We refer the reader to [2], [4] and [5] for detailed derivations of the algorithm. Consider a block u of information bits. We encode the block to obtain the coded block of symbols c. After transmission through a channel we receive the block y. We describe a transition in the binary trellis of the code by its starting state and its ending state.
Following Erfanian and Pasupathy [3] we use the Jacobi logarithm:

$$\ln\left(e^{x} + e^{y}\right) = \max(x, y) + \ln\left(1 + e^{-|x - y|}\right)$$
Viterbi calls this the MAX* operation [4], denoting that it is essentially a maximum operation adjusted by a correction factor. For an AWGN channel, the branch metric of each transition is computed from the received block y and the symbol associated with that transition in the convolutional encoder. Given that the trellis starts and ends in the all-zeros state, the forward state metrics are calculated by a forward recursion through the trellis, with initial conditions that mark the all-zeros state as the only possible starting state. The reverse state metrics are calculated by a backwards recursion, with initial conditions that mark the all-zeros state as the only possible final state.
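The recursions described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the results: the names `max_star`, `next_state` and `log_bcjr_state_metrics` are hypothetical, the 4-state trellis is one plausible recursive encoder structure, and the branch metrics are arbitrary placeholder numbers rather than metrics computed from a received AWGN block.

```python
import math

NEG_INF = float("-inf")


def max_star(a, b):
    """Jacobi logarithm: ln(e^a + e^b) = max(a, b) + ln(1 + e^-|a - b|)."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def next_state(state, bit):
    """ASSUMPTION: a 4-state recursive trellis with feedback taps on both registers."""
    r1, r2 = (state >> 1) & 1, state & 1
    feedback = bit ^ r1 ^ r2
    return (feedback << 1) | r1


def log_bcjr_state_metrics(gamma, n_states=4):
    """Forward (alpha) and reverse (beta) state metrics in the log domain.

    gamma[k][s][bit] is the branch metric of the transition leaving state s
    at step k with input `bit`. The trellis starts and ends in state 0.
    """
    n = len(gamma)
    alpha = [[NEG_INF] * n_states for _ in range(n + 1)]
    beta = [[NEG_INF] * n_states for _ in range(n + 1)]
    alpha[0][0] = 0.0  # only the all-zeros state is a valid start
    beta[n][0] = 0.0   # only the all-zeros state is a valid end

    # Forward recursion: accumulate every branch entering each state with MAX*.
    for k in range(n):
        for s in range(n_states):
            for bit in (0, 1):
                t = next_state(s, bit)
                alpha[k + 1][t] = max_star(alpha[k + 1][t],
                                           alpha[k][s] + gamma[k][s][bit])

    # Backward recursion: accumulate every branch leaving each state with MAX*.
    for k in range(n - 1, -1, -1):
        for s in range(n_states):
            for bit in (0, 1):
                t = next_state(s, bit)
                beta[k][s] = max_star(beta[k][s],
                                      beta[k + 1][t] + gamma[k][s][bit])
    return alpha, beta


# Placeholder branch metrics (arbitrary values, for illustration only).
N = 6
gamma = [[[0.1 * ((3 * k + 5 * s + 7 * bit) % 11) - 0.5 for bit in (0, 1)]
          for s in range(4)] for k in range(N)]
alpha, beta = log_bcjr_state_metrics(gamma)
```

A useful sanity check on such a sketch is that the final forward metric of the all-zeros state equals the initial reverse metric of the all-zeros state, since both are the logarithm of the total weight of all paths that start and end in state zero.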
III. THE SIMPLIFIED MAX* OPERATION
In this section we discuss a simplified implementation of the MAX* operation for use in the Log-BCJR algorithm. It is well known that the correction factor in (2) can be approximated by a small lookup table with negligible effect on performance [3], [5]. We plot the correction function in Fig. 1 [5]. We have implemented the Log-BCJR algorithm using 8-bit 2's complement integer arithmetic. Our numerical representation uses 5 integer bits and 3 fractional bits, and therefore the smallest nonzero value of the correction function we can represent is 1/8, corresponding to a maximum value of its argument of about 2.0. Our simulations of a turbo decoder using the Log-BCJR algorithm have shown that the function in Fig. 1 can be approximated by a two-level rule: a single constant is added when the difference between the operands is small, and nothing is added otherwise.
The implication of the above rule is that the lookup table for the correction function can be reduced to a simple logic circuit which either adds or does not add a constant to the output of the maximum selection circuit. The simplified circuit is shown in Fig. 2. An 8-location lookup table can be implemented in CMOS using 60 transistors, assuming that both differences between the operands are available from the subtractor circuit. The simplified circuit can be implemented using 20 transistors and requires only one of the differences. Synthesis using standard cells in 0.5 μm CMOS results in an area savings of 40%. The savings are even more pronounced when considering FPGA or DSP implementations. In these cases, the multiplexor tree required for the 8-location lookup table cannot be realized as efficiently. The lookup table may require storage in RAM or ROM, which can create a memory bottleneck. In addition, the subtractor circuits used are not likely to provide both differences required, and therefore two subtractors or an absolute value circuit is required. The simplified rule does not have any of these restrictions and is well suited to FPGA or DSP implementations.
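The add-or-do-not-add behaviour described above can be illustrated with a short Python sketch in fixed-point units of 1/8 (3 fractional bits). It rests on two assumptions that the text here does not fix: the threshold is set to 2.0, taken from the maximum argument mentioned in this section, and the constant is set to 3/8 purely as a placeholder. All function and variable names are hypothetical, and saturation to 8 bits is omitted for brevity.

```python
import math

FRAC_BITS = 3                 # 3 fractional bits, as stated in the text
SCALE = 1 << FRAC_BITS        # one unit = 1/8
THRESHOLD_Q = 2 * SCALE       # ASSUMPTION: threshold of 2.0, i.e. 16 units
CONSTANT_Q = 3                # ASSUMPTION: placeholder constant of 3/8


def max_star_const_fixed(a_q, b_q,
                         threshold_q=THRESHOLD_Q, constant_q=CONSTANT_Q):
    """Simplified MAX*: pick the larger operand, then add a constant or nothing.

    Only the single signed difference a_q - b_q is needed: its sign selects
    the maximum and its magnitude decides whether the constant is added.
    """
    diff_q = a_q - b_q
    larger = a_q if diff_q >= 0 else b_q
    if -threshold_q < diff_q < threshold_q:
        larger += constant_q
    return larger


def correction_exact(diff):
    """Exact correction term ln(1 + e^-|diff|) of the Jacobi logarithm."""
    return math.log1p(math.exp(-abs(diff)))


def correction_const(diff):
    """Floating-point view of the two-level rule, for comparison only."""
    if abs(diff) < THRESHOLD_Q / SCALE:
        return CONSTANT_Q / SCALE
    return 0.0


# Largest difference at which the exact correction still reaches one unit (1/8).
lsb_cutoff = -math.log(math.expm1(1.0 / SCALE))

# Worst-case gap between the exact correction and the two-level rule on [0, 8].
worst_error = max(abs(correction_exact(i / 64) - correction_const(i / 64))
                  for i in range(0, 513))

print(f"cutoff for 1/8 resolution: {lsb_cutoff:.3f}")
print(f"worst-case correction error: {worst_error:.3f}")
```

With these placeholder values the largest gap occurs at zero difference, where the exact correction is ln 2. Whether such a gap matters for decoding is a question for the bit error rate simulations, not for this sketch.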
IV. PERFORMANCE
Fig. 3 shows the performance of a 4-state rate 1/2 turbo code (polynomials 7/5) with a block length of 1024 for 1 and 10 iterations using our simplified MAX* operation and 8-bit metrics compared to a floating point simulation. There is a negligible performance loss of 0.03 dB at high bit error rates, but in the bit error rate region of interest there is practically no difference in performance.
V. CONCLUSIONS
We have introduced a simplification of the MAX* operation for the Log-BCJR algorithm which replaces the lookup table with a constant value. Simulations show that turbo code performance is not adversely affected.
