Abstract: The optimum decoding of component codes in block coded modulation (BCM) schemes requires the use of the log-likelihood ratio (LLR) as the signal metric. An approximation to the LLR for the least reliable bit (LRB) in an 8-PSK modulation, based on planar equations with fixed-point arithmetic, is developed that is both accurate and easily realisable for practical BCM schemes. Through an error power analysis and an example simulation it is shown that the approximation results in less than 0.06 dB of degradation relative to the exact expression at an $E_s/N_0$ of 10 dB. It is also shown that the approximation can be realised in combinatorial logic using roughly 7300 transistors. This compares favourably to a look-up table approach in typical systems.
Introduction
Combined modulation and coding is an efficient method of conveying information through power- and bandwidth-limited channels. Imai and Hirakawa's multilevel coded modulation (MLCM) schemes [1], also called block-coded modulation (BCM), can achieve trellis-coded modulation (TCM) performance in a block structure. They can be an alternative to TCM in systems where a block format, code flexibility, and decoding speed are important. Though a BCM scheme is generally not maximum likelihood (ML), its structure can offer more coding for less complexity than TCM in some systems, such as packet-switched systems.
The BCM structure applies an individual code to each bit in a modulated symbol. These component codes are denoted $C_0, C_1, \ldots, C_{n-1}$, where $n$ is the number of bits in the symbol. Each component code can be a block or convolutional code, and they can be decoded with or without channel information. The error-correcting capability of the $i$th component code is chosen in accordance with the channel bit error probability associated with the $i$th ($i = 0, 1, \ldots, n-1$) bit in the modulated symbol, while also taking into account information provided by the decoder from the $(i-1)$th level. Usually the overall goal is to 'balance' the system by obtaining approximately the same decoded error probability for each level of decoded bits.
8-PSK log-likelihood ratio
In applications such as satellite and mobile communications, 8-PSK is emerging as a practical digital modulation format for bandwidth- and power-limited situations. One example of BCM applied to 8-PSK uses three component codes, one for each bit in an 8-PSK symbol.
The associated encoder and decoder structures are illustrated in Figs. 1 and 2, where the bottom code $C_0$ is for the least significant bit (LSB) and the top code $C_2$ is for the most significant bit (MSB). As will be seen shortly, the LSB is also the least reliable bit (LRB). To obtain a benefit from multistage decoding, the LSB in the constellation must alternate between binary 0 and 1 as the symbols are defined from 0 to h/8 radians [2]. A mapping that fits this criterion is shown in Fig. 3. Each symbol is defined to have a power normalised to 1. Multistage decoding requires that the bottom code $C_0$ be decoded first. The signal metric for maximum likelihood decoding (MLD) of this code with the given constellation assignment is the log-likelihood ratio (LLR) [3, 4]. In 8-PSK, the LLR of the rightmost bit, the least reliable bit (LRB), being a binary 0 can be expressed as [...]

Note that in each case the LLR has been normalised so that the maximum absolute value is equal to 1 in each of these plots. An explicit evaluation of the LLR in real time is very undesirable in most practical systems owing to the number of complicated mathematical operations required. For this [...]
where
It is important to remember that these values, whether the exact LLR or the LLR planar approximation (LLRPA), are the soft-decision metrics to be sent to the decoder. The performance of the decoder does not depend on the absolute size of the metrics. Any convenient positive scaling factor can therefore be chosen, since multiplying all outputs by a constant has no effect on the performance of the decoder. This translates into a freedom of choice for one of the two values $\alpha$ and $\beta$; the other value is then determined by the ratio between $\alpha$ and $\beta$. In fixed-point (integer) arithmetic, $\alpha = 29$ and $\beta = -70$ preserve the ratio quite closely.
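As a concrete illustration of the integer coefficients, the following Python sketch evaluates a planar metric of the form $\alpha \max(|I|,|Q|) + \beta \min(|I|,|Q|)$. This form is an assumption, reconstructed from the coefficient values above and from the block diagram described in the implementation analysis; the function name `llrpa` and the sample points are hypothetical. Under the further assumption that the constellation points lie at multiples of $\pi/4$, the zero of the metric should fall midway between adjacent symbols, at a slope of $\tan(\pi/8) \approx 0.41421$, which 29/70 ≈ 0.41429 matches closely.

```python
import math

ALPHA = 29    # integer multiplier quoted in the text
BETA = -70    # integer multiplier quoted in the text


def llrpa(i: int, q: int) -> int:
    """Planar LLR approximation for the LRB (assumed form, see lead-in)."""
    big = max(abs(i), abs(q))      # larger of the two magnitudes
    small = min(abs(i), abs(q))    # smaller of the two magnitudes
    return ALPHA * big + BETA * small


# Distance between the integer ratio 29/70 and tan(pi/8), the slope of the
# line midway between two adjacent symbols (assuming symbols at k*pi/4).
ratio_error = abs(ALPHA / -BETA - math.tan(math.pi / 8))

if __name__ == "__main__":
    print(llrpa(100, 0), llrpa(100, 100), llrpa(70, 29))
    print(f"|29/70 - tan(pi/8)| = {ratio_error:.1e}")
```

With these coefficients a point on an axis such as (100, 0) gives 2900, a point on a diagonal such as (100, 100) gives -4100, and the metric is exactly zero where the smaller magnitude is 29/70 of the larger.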
The evaluation of the LLRPA as a function of I and Q is plotted in Fig. 7. Unlike the exact values of the LLR, the planar approximation does not depend on $E_s/N_0$.
Visually, the plot looks like an increasingly good fit to the LLR as $E_s/N_0$ increases.
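The dependence of the exact LLR on $E_s/N_0$, and its agreement in sign with the planar metric, can be checked with a short Python sketch. Several things here are assumptions rather than statements from the text: symbol $k$ is placed at angle $k\pi/4$ with LRB equal to $k \bmod 2$ (Fig. 3 is not reproduced here), the noise is taken as white Gaussian so that each symbol contributes $e^{-(E_s/N_0)d_k^2}$, and the planar metric is taken as $29\max(|I|,|Q|) - 70\min(|I|,|Q|)$. All function names are hypothetical.

```python
import math


def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a short list."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def exact_llr(i: float, q: float, es_n0_db: float) -> float:
    """LLR that the LRB is 0 for unit-energy 8-PSK in Gaussian noise.

    Assumes symbol k sits at angle k*pi/4 and carries LRB = k mod 2.
    """
    gamma = 10.0 ** (es_n0_db / 10.0)           # Es/N0 as a linear ratio
    exponents = {0: [], 1: []}
    for k in range(8):
        angle = k * math.pi / 4.0
        d2 = (i - math.cos(angle)) ** 2 + (q - math.sin(angle)) ** 2
        exponents[k % 2].append(-gamma * d2)    # log of the Gaussian term
    return _log_sum_exp(exponents[0]) - _log_sum_exp(exponents[1])


def llrpa(i: float, q: float) -> float:
    """Planar approximation (assumed form), independent of Es/N0."""
    big, small = max(abs(i), abs(q)), min(abs(i), abs(q))
    return 29 * big - 70 * small


def signs_agree(es_n0_db: float, steps: int = 64) -> bool:
    """Compare the signs of both metrics at points on the unit circle."""
    for k in range(steps):
        if k % 8 == 4:      # odd multiples of pi/8: both metrics are ~0
            continue
        theta = 2.0 * math.pi * k / steps
        i, q = math.cos(theta), math.sin(theta)
        if (exact_llr(i, q, es_n0_db) > 0) != (llrpa(i, q) > 0):
            return False
    return True


if __name__ == "__main__":
    for snr_db in (6.0, 10.0):
        print(snr_db, exact_llr(1.0, 0.0, snr_db), signs_agree(snr_db))
```

The exact metric at a fixed point grows with $E_s/N_0$, whereas `llrpa` takes no noise parameter at all, which is the property noted above.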
Error power analysis
An error power analysis can be used to find the 'effective' SNR degradation due to the use of the LLRPA as compared with the exact LLR. The approach finds the power associated with the LLRPA error and considers it as an additional noise term. This noise is considered as an effective increase in the channel noise, as depicted in Fig. 8. This analysis is an estimate since both the effect of the nonlinearity associated with the LLR device and the fact that the noise term associated with [...]

The relative size of the LLRPA noise term associated with Fig. 8c is estimated by the relative size of the noise term associated with Fig. 8b. In other words, the expected power of the noise term in Fig. 8b is compared with the expected power of the output of the exact LLR. The error power is given by the expected value of the squared difference signal. The difference signal is given by
$$DS(I, Q) = \mathrm{LLR}(I, Q) - A\,[\mathrm{LLRPA}(I, Q)] \qquad (5)$$
The coefficient $A$ is a scaling factor used to find the best fit between the LLR and the LLRPA. The best fit is the one that minimises the expected value of the squared difference signal. As mentioned in Section 2, a scaling factor on the LLRPA does not affect the performance of the decoder. The coefficient $A$ is therefore omitted in any real system, though it is important in an analysis of error power.
Once the difference signal $DS(I, Q)$ is determined, the expected value of the squared error is found as
$$E[DS^2] = \sum_{i=0}^{7} p(S_i) \iint p_i(I, Q)\, DS^2(I, Q)\, dI\, dQ$$
where $p(S_i)$ is the probability that the $i$th signal was sent, and $p_i(I, Q)$ is the probability density of receiving the point $(I, Q)$ given that the $i$th signal constellation point was transmitted. If the assumption is made that the eight signals are equally likely, then owing to the symmetry of the 8-PSK constellation this simplifies to
$$E[DS^2] = \iint p(I, Q)\, DS^2(I, Q)\, dI\, dQ$$
Here $p(I, Q)$ is the probability density of receiving the point $(I, Q)$ given that a particular symbol was transmitted. The expected squared difference signal can then be related to the expected squared signal, or signal power, after the LLR operation. This is essentially the expected squared output with no approximation, which is given by
$$E[\mathrm{LLR}^2] = \iint p(I, Q)\, \mathrm{LLR}^2(I, Q)\, dI\, dQ \qquad (9)$$
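The ratio
$$\frac{E[DS^2]}{E[\mathrm{LLR}^2]} \qquad (10)$$
is an estimate of the additional noise-to-signal ratio due to the log-likelihood ratio planar approximation. An estimate of the overall signal-to-noise ratio is obtained by
$$\mathrm{SNR}_{\mathrm{estimate}} = \frac{1}{\dfrac{1}{\mathrm{SNR}_{\mathrm{channel}}} + \dfrac{E[DS^2]}{E[\mathrm{LLR}^2]}} \qquad (11)$$
In dB, this corresponds to a reduction in SNR given by
$$10\log_{10}\!\left(\frac{\mathrm{SNR}_{\mathrm{channel}}}{\mathrm{SNR}_{\mathrm{estimate}}}\right) = 10\log_{10}\!\left(1 + \mathrm{SNR}_{\mathrm{channel}}\,\frac{E[DS^2]}{E[\mathrm{LLR}^2]}\right)$$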
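The error power estimate of this section can be evaluated numerically on a grid. The Python sketch below is a minimal illustration under stated assumptions, not the computation used to produce the reported figures: symbols are placed at angles $k\pi/4$ with LRB equal to $k \bmod 2$, the eight symbols are taken as equally likely, the received density is Gaussian with $e^{-(E_s/N_0)d^2}$ terms, the channel SNR is taken to be $E_s/N_0$, and the planar metric is taken as $29\max(|I|,|Q|) - 70\min(|I|,|Q|)$. The best-fit $A$ is obtained in closed form as $E[\mathrm{LLR}\cdot\mathrm{LLRPA}]/E[\mathrm{LLRPA}^2]$, which minimises $E[DS^2]$. The function names, grid span, and grid size are hypothetical choices.

```python
import math

SYMBOLS = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4))
           for k in range(8)]


def llrpa(i: float, q: float) -> float:
    """Planar approximation (assumed form)."""
    big, small = max(abs(i), abs(q)), min(abs(i), abs(q))
    return 29 * big - 70 * small


def llr_and_density(i: float, q: float, gamma: float):
    """Exact LRB LLR and the received density for equally likely symbols."""
    terms = [math.exp(-gamma * ((i - si) ** 2 + (q - sq) ** 2))
             for si, sq in SYMBOLS]
    even, odd = sum(terms[0::2]), sum(terms[1::2])
    if even == 0.0 or odd == 0.0:     # underflow far from every symbol
        return 0.0, 0.0               # such points carry negligible weight
    return math.log(even / odd), gamma / math.pi * (even + odd) / 8.0


def error_power_estimate(es_n0_db: float, span: float = 3.0, n: int = 121):
    """Return (E[DS^2]/E[LLR^2], estimated SNR reduction in dB)."""
    gamma = 10.0 ** (es_n0_db / 10.0)
    step = 2.0 * span / (n - 1)
    s_ll = s_pp = s_lp = 0.0          # density-weighted sums; the cell
    for a in range(n):                # area cancels in every ratio below
        i = -span + a * step
        for b in range(n):
            q = -span + b * step
            llr, w = llr_and_density(i, q, gamma)
            pa = llrpa(i, q)
            s_ll += w * llr * llr
            s_pp += w * pa * pa
            s_lp += w * llr * pa
    best_a = s_lp / s_pp              # minimises E[(LLR - A*LLRPA)^2]
    e_ds2 = s_ll - 2.0 * best_a * s_lp + best_a * best_a * s_pp
    ratio = e_ds2 / s_ll
    snr_estimate = 1.0 / (1.0 / gamma + ratio)
    return ratio, 10.0 * math.log10(gamma / snr_estimate)


if __name__ == "__main__":
    for snr_db in (6.0, 10.0):
        print(snr_db, error_power_estimate(snr_db))
```

Because the weighting, the normalisation, and the definition of channel SNR are all assumptions here, the printed values should be read as indicative only and need not reproduce the figures quoted in the example that follows.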
Example
Consider an $E_s/N_0$ of 6.0 dB as an operating point. We have mentioned that an $E_s/N_0$ operating point of 10 dB is needed for a coded satellite system to obtain a bit error rate of $10^{-6}$. The 6.0 dB example is given here to demonstrate that the LLRPA can perform well even at an SNR lower than 10 dB. Fig. 5 illustrates the LLR for this SNR. The difference signal (DS) is the difference between the normalised LLR and the planar approximation (with the appropriate $A$). This is shown in Fig. 9. Fig. 10 is the squared error signal. Fig. 11 is the probability density function of the received signal for a given transmitted symbol at an $E_s/N_0$ of 6.0 dB.
The ratio of the expected squared difference signal to the expected squared true LLR is an estimate of the additional effective noise-to-signal ratio. For the example, the estimated reduction in the signal-to-noise ratio due to the log-likelihood ratio planar approximation is calculated numerically to be 0.216 dB. This is an estimate of the degradation associated with the LLRPA. The accuracy of the approximate degradation can be assessed through simulation. A realistic simulation example uses the rate 1/4, 16-state convolutional code given in [3] as $C_0$, and 8 bits of quantisation on both I and Q. One simulation uses an LLR look-up table, while the other uses the LLRPA equation. Both simulations use the same PN sequences for both the information and the noise. The exact LLR look-up table performs better at all operating points (values of channel SNR), but the difference (as measured in SNR reduction for a given BER or SNR operating point) is quite small. Fig. 12 illustrates the difference between the SNR reductions computed theoretically and those found by simulation. The $E_s/N_0$ range in the simulation corresponds to a BER range of $10^{-6}$ and lower.
Implementation analysis
Although it is intuitive that a hardware realisation of the LLRPA would be simpler than the exact LLR, in practice the exact LLR is computed via a look-up table (LUT). As such, an implementation analysis is really a comparison between the hardware realisation of the LLRPA and a sufficiently large memory-based LUT for the exact LLR. This type of comparison is somewhat system dependent, and the comparison presented here, which is based strictly on an approximate transistor count, must be taken within the system context. For example, in a demodulator/decoder that is realised mostly with VLSI technology, coming off the device to an external LUT and then back onto the device has disadvantages in both the speed of external routing and the increase in VLSI complexity owing to increased I/O requirements. In this case, the number of transistors required for both techniques in the context of the particular VLSI device is a good comparison. Further, systems implemented with programmable logic such as field programmable gate arrays (FPGAs) tend to be constrained in the amount of memory space available, making the LLRPA implementation attractive. Alternatively, systems that are not fully realised in VLSI circuitry may benefit from the potential simplicity of a single memory device to perform the LLR LUT. The benefits gained from the design maturity of memory technology may outweigh a specific implementation of an algorithm such as the LLRPA.
A block diagram of the required processing for the LLRPA is shown in Fig. 13. The block diagram indicates that 8-bit data from an analogue-to-digital converter or digital filter is first converted to its absolute value. The resulting 7-bit magnitude values of I and Q are compared to find the greater value. If the magnitude of I is greater than or equal to the magnitude of Q, the I data follow the top leg of processing and the Q data the bottom leg. If the magnitude of Q is greater than the magnitude of I, this is reversed. The appropriate values are then multiplied by either 29 or 70 and are then subtracted. The result is then divided by 256 to maintain only the six most significant bits.

The complexity of the LLRPA implementation can be approximated through a rough estimate of the number of gates needed for each of these functions. These gate counts are then converted to an overall estimate of transistor count. The accuracy of the approximation is subject to the goals of a particular system in terms of speed, power consumption, or real estate. Further, the number representation presented by the upstream hardware and required by the downstream hardware can also be relevant. First, in its worst case, the absolute value function requires a magnitude compare, a select, and then an 8-bit addition or subtraction, requiring a rough total of 200 gates. Secondly, the magnitude comparison and select require about 80 gates. Next, the fixed multiplies can be realised by shifts and adds, resulting in about 250 gates. The final subtractor requires approximately 200 gates, and the divider chooses the six MSBs. Assuming an average of ten transistors per gate, the total approximate transistor count is 7300.

For a rough comparison, the LUT would have $2^8 \times 2^8 = 65536$ memory addresses. If each address contains six bits to maintain good quantisation accuracy, this corresponds to a $65536 \times 6$ memory. A static random access memory (SRAM) that used five transistors per cell would require $1.97 \times 10^6$ transistors. This ignores the transistors required for column decoders, row decoders, and read/write circuitry. These estimates indicate that the LLRPA requires approximately 270 times fewer transistors than the LUT. Also, the LLRPA computation can be implemented in parallel to obtain an operating speed increase. In this case, the number of transistors will increase by the factor of the speed increase plus the gates required to multiplex and demultiplex the I/O.
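The processing chain and the transistor estimates above can be mirrored in a short bit-level Python sketch. It is a software model under assumptions, not a description of the actual circuit: the larger magnitude is assumed to take the multiplier 29 and the smaller the multiplier 70, an input of -128 is assumed to saturate to a 7-bit magnitude of 127, the shift-and-add decompositions 29 = 32 - 4 + 1 and 70 = 64 + 4 + 2 are one possible choice, and the division by 256 is modelled as an arithmetic right shift. All names are hypothetical.

```python
def times29(x: int) -> int:
    """Multiply by 29 using shifts and adds: 32x - 4x + x."""
    return (x << 5) - (x << 2) + x


def times70(x: int) -> int:
    """Multiply by 70 using shifts and adds: 64x + 4x + 2x."""
    return (x << 6) + (x << 2) + (x << 1)


def llrpa_datapath(i8: int, q8: int) -> int:
    """Map 8-bit signed I and Q samples to a 6-bit signed metric."""
    mag_i = min(abs(i8), 127)       # absolute value, 7-bit magnitude
    mag_q = min(abs(q8), 127)
    if mag_i >= mag_q:              # compare and select
        top, bottom = mag_i, mag_q
    else:
        top, bottom = mag_q, mag_i
    diff = times29(top) - times70(bottom)   # fixed multiplies, subtract
    return diff >> 8                # divide by 256 (floor), keep the MSBs


# Gate estimates quoted in the text, at ten transistors per gate.
GATES = {"absolute value": 200, "compare and select": 80,
         "fixed multiplies": 250, "subtractor": 200}
LLRPA_TRANSISTORS = sum(GATES.values()) * 10

# LUT estimate: 2^8 x 2^8 addresses, six bits each, five transistors per cell.
LUT_TRANSISTORS = (2 ** 8) * (2 ** 8) * 6 * 5

if __name__ == "__main__":
    print(llrpa_datapath(127, 0), llrpa_datapath(127, 127))
    print(LLRPA_TRANSISTORS, LUT_TRANSISTORS,
          LUT_TRANSISTORS / LLRPA_TRANSISTORS)
```

Under these assumptions the output ranges from -21 to 14, which fits in a 6-bit signed word, and the arithmetic reproduces the 7300 and roughly $1.97 \times 10^6$ transistor figures.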
LLR and the C1 code
Once the bottom code $C_0$ is decoded and re-encoded, the re-encoded data is used to determine which of two 4-PSK symbol sets is used for the remaining two bits, that is, set {S0, S2, S4, S6} or set {S1, S3, S5, S7} with respect to Fig. 3. Given one of these two sets, the least reliable bit (which is really the middle bit now) must also alternate between 0 and 1 as the symbols are encountered moving around the circumference of the circle. The data impressed onto this symbol is from the $C_1$ code. For decoding purposes, the optimum signal metric is the log-likelihood ratio for this constellation. If we consider the set {S0, S2, S4, S6} [...]

We state without proof that the error associated with this approximation is less than that associated with $C_0$. If the set in question is {S1, S3, S5, S7}, a 'rotation' operation will need to be performed. That is, I and Q should exchange positions in eqn. 14.
Conclusions
It has been shown that the planar approximation to the log-likelihood ratio for the least reliable bit of an 8-PSK modulation format is suitable for practical systems. The approximation results in very little degradation in effective SNR, as indicated by an approximate error power analysis and verified through simulation results at relevant operating points. The complexity of the LLRPA, discussed as a comparison between an implementation of the LLRPA and an equivalent memory-based LUT evaluating the exact LLR, indicates that the LLRPA is practical for many systems.
Although the approximation is appropriate for coded 8-PSK, the orthogonality of Gray-coded QPSK and the single dimension of BPSK make the calculation of the appropriate LLR metric simply equivalent to either the value of I or Q. In these cases an approximation is not necessary. For higher-order PSK systems, a similar approach for a planar approximation can be taken. Although the decision device to determine the multipliers for I and Q may be more complex, the required size of the LUT for an exact LLR may become undesirably large. It is uncertain whether there exist small integer multipliers that will preserve a good approximation. Finally, owing to the complex decision regions, it is unclear whether QAM modulation schemes could benefit from a similar approximation technique.
