Abstract-Unary Error Correction (UEC) codes constitute a recently proposed Joint Source and Channel Code (JSCC) family, conceived for alphabets having an infinite cardinality, whilst out-performing previously used Separate Source and Channel Codes (SSCCs). UEC based schemes rely on an iterative decoding process, which involves three decoding blocks when concatenated with a turbo code. Owing to this, following the activation of one of the three blocks, the next block to be activated must be chosen from the other two decoding block options. Furthermore, the UEC decoder offers a number of decoding options, allowing its complexity and error correction capability to be dynamically adjusted. It has been shown that iterative decoding convergence can be expedited by activating the specific decoding option that offers the highest Mutual Information (MI) improvement to computational complexity ratio. This paper introduces an iterative demodulator, which is shown to improve the associated error correction performance, while reducing the overall iterative decoding complexity. The challenge is that the iterative demodulator has to forward its soft-information to the other two iterative decoding blocks, and hence the corresponding MI improvements cannot be compared on a like-for-like basis. Additionally, we also propose a method of eliminating the logarithmic calculations from the adaptive iterative decoding algorithm, hence further reducing its implementational complexity without impacting its error correcting performance.
I. INTRODUCTION
Wireless multimedia transmission has to be both bandwidthefficient and resilient to transmission errors, motivating both source and channel coding. Traditionally, Separate Source and Channel Codes (SSCCs) have been employed for satisfying these requirements. For example, Elias Gamma (EG) codes [1] represent a typical source code, which may be concatenated with channel codes such as turbo codes [2] . However, SSCCs lead to capacity loss due to the residual redundancy that typically remains following source encoding, hence potentially limiting the error correction performance of the scheme [3] . In general, SSCC can be replaced by Joint Source and Channel Codes (JSCCs), which exploit the residual redundancy for the sake of achieving an improved error correction capability [4] . However, the source symbols produced by multimedia codecs such as the H.264 video encoder are approximately zeta distributed [3] , with most symbols having a low value, but with infrequent symbols having high values of around 1000. The resultant high cardinality of the source alphabet prevents the employment of classic JSCCs, such as the Variable Length Error Correction (VLEC) [5] , owing to the associated computational complexity. This potentially excessive complexity
The authors wish to gratefully acknowledge the financial support of the ERSPC, Swindon UK under the auspices of grant EP/J015520/1 and EP/L010550/1, as well as the TSB, Swindon UK under the auspices of grant TS/L009390/1. recently motivated the design of a JSCC known as the Unary Error Correction (UEC) code [3] , which maintains a modest complexity, even when the cardinality of the source alphabet is infinite.
In [6] , we concatenated the UEC encoder with a turbo encoder, corresponding to an iterative decoder comprising three decoding blocks. Following the activation of each decoding block, the next block to be activated must be chosen from the other two decoding options. Furthermore, the UEC decoder offers a number of decoding options [6] , allowing its complexity versus error correction capability tradeoff to be dynamically adjusted. An adaptive iterative decoding algorithm was proposed in [6] for making a dynamic on-line selection of the decoding option to activate at each stage of the iterative decoding process. It was demonstrated that iterative decoding convergence can be expedited by activating the specific decoding option that offers the highest Mutual Information (MI) improvement to computational complexity ratio. This enables a frame to be decoded either at a reduced computational complexity or at a reduced Symbol Error Ratio (SER), when compared to a static activation order of the decoders.
In this paper, we extend our previous contribution of [6] , by additionally concatenating the UEC code and turbo code with an iterative demodulator, producing an iterative receiver comprising four decoding blocks. We use EXtrinsic Information Transfer (EXIT) chart analysis for demonstrating that this iterative demodulator complements conveniently the turbo code, with the aid of reduced-complexity components, without increasing the E b /N 0 at which iterative decoding convergence to a low SER becomes possible. We demonstrate that the introduction of iterative demodulation actually reduces the overall complexity of the iterative receiver. However, since the iterative demodulator is invoked for recovering different bits for forwarding to the UEC decoder and to the turbo component decoders, the extension of the adaptive iterative decoding algorithm of [6] to this scheme is not trivial. More specifically, the MI improvements associated with the iterative demodulator cannot be compared on a like-for-like basis with those offered by the three other decoding blocks. We propose a solution to this problem, which may also be applied in other multicomponent iterative receivers. More specifically, we consider the operation of the iterative demodulator jointly with one of the turbo component decoders, allowing all decoding options or activation orders to be considered on the basis of MI improvements pertaining to the same bits. In this paper, we also explore the practical implementation of the adaptive algorithm advocated, proposing a simple approximation of the MI measurement function. This reduces the associated implementational complexity by avoiding floating point logarithmic and exponential function evaluations, without significantly degrading the SER performance.
We commence in Section II by introducing the UEC code and our proposed schematic. Section III provides the EXIT chart [7] analysis of the decoding blocks, which form the basis of the adaptive algorithm. This section concludes by analyzing the implementational complexity of the proposed scheme, firstly quantifying its storage requirements, followed by detailing the proposed MI approximation. Error correction results are provided in Section IV, where the proposed scheme is found to offer up to 2 dB performance gain without increasing the required transmission energy, bandwidth, delay or decoder complexity. Section V offers our concluding remarks.
II. TRANSMITTER AND RECEIVER SCHEME
In this section, we detail the operation of the UEC-Turbo-QPSK scheme of Fig. 1(a) . The transmitter's operation is described in Section II-A, while the receiver is considered in Section II-B.
A. Transmitter
The transmitter design of the proposed joint source coding, channel coding and modulation scheme is shown in Fig. 1(a) . This scheme is designed for conveying a vector
of a source symbols, each of which has a zeta-distributed integer value x i in the of range one to infinity x i ∈ N 1 , and parameterized by p 1 = P (x i = 1) [3] . This distribution models the symbols generated by a H.264 video encoder [3] , for example. These symbols are forwarded to the UEC source encoder, which comprises a unary encoder and a trellis encoder. The unary encoder produces a binary codeword y i for each symbol x i , where the length of the codeword is equal to the value of the symbol. The codewords are then concatenated for forming the bit vector y = [y j ] b j=1 , which is then encoded using the trellis encoder to obtain the bit vector
x i is the length of the bit vectors y and z, which can vary from one frame to the next, owing to the varying lengths of the unary codewords produced by the UEC encoder. To elaborate fuurther, [6, Fig. 3] shows the r = 6-state UEC trellis, which is employed in the transmitter of Fig. 1(a) . The UEC coding rate R o is a function of the source distribution parameter p 1 , where p 1 = 0.797 results in a coding rate of R o = 0.762, for example.
Two replicas of the vector z are interleaved by the interleavers π 1 and π 2 to obtain the bit vectors u u and u l , as shown in Fig. 1(a) . Turbo channel encoding is performed upon u u and u l by a parallel concatenation of Unity Rate Code (URC) encoders [6] and the resultant bit vectors v u and v l are interleaved by a pair of interleavers, both having the same design π 3 . The bits in the resultant vectors are paired up by a multiplexer to form the vector w, which is QPSK modulated and transmitted over the channel using a natural mapping [8] . As we will show in Section IV, natural mapping facilitates improved error correction capability and/or reduced iterative decoding complexity, compared to Gray mapping. This results in an effective throughput given by η = R o R i log 2 (M ), where the use of the turbo code leads to R i = 0.5 and we have M = 4 for QPSK modulation.
B. Receiver
The receiver of the proposed scheme is shown in Fig. 1(a) . As seen in Fig. 1(a) , the URC decoders convert the a priori As shown in [6] , the number of states employed by the UEC trellis decoder can be selected independently of the number of states employed in the UEC trellis encoder. Increasing the number of states r employed by the UEC trellis decoder in this way has the benefit of improving its error correction capability, albeit this is achieved at the cost of increasing its complexity [3] . In Section III-B, we will exploit this for dynamically adjusting r for the sake of striking an attractive trade-off between the UEC trellis decoder's complexity and its error correction capability.
During the iterative decoding process, the iterative operation of the URC1, URC2, UEC decoders and of the demodulator seen in Fig. 1 (a) may be performed using a wide variety of different decoder activation orderings. For the sake of conceptional simplicity, a fixed decoder activation order may be employed, in which the URC1, URC2, UEC decoders and the demodulator of Fig. 1 (a) are activated using a regular activation order repeated periodically, namely [demod, URC1, URC2, UEC, demod, URC1,...]. Alternatively, we may employ a dynamic decoder activation order, in which an on-line decision is made at each stage of the iterative decoding process, in order to adaptively and dynamically select which of the URC1, URC2, UEC decoders or whether the demodulator of Fig. 1(a) should be activated next. This is explored in Section III using a novel 3D EXIT chart aided technique, in order to expedite iterative decoding convergence towards the Maximum Likelihood (ML) decoder's performance.
Following the achievement of iterative decoding convergence, the UEC trellis decoder of Fig. 1(a) generates the vector of a posteriori LLRsỹ p , which pertain to the corresponding unary-encoded bits in the vector y. Finally,ỹ p can be unary decoded in order to obtain the symbol vectorx, as detailed in [3] . 
III. ADAPTIVE DECODER ACTIVATION ORDER ALGORITHM
This section details the proposed extension to the adaptive iterative decoding algorithm of [6] for the sake of facilitating its contribution with schemes relying on additional serially concatenated blocks, such as the proposed concatenation of an iterative source and channel code with an iterative demodulator. This algorithm is best suited for scenarios which are complexity-limited, since it allows the receiver to adapt the decoder activation order to make best use of the limited resources. We begin in Section III-A by considering the EXIT functions of each decoder, which are used by the proposed algorithm for quantifying the benefit of activating the corresponding decoding block at a particular point in the iterative decoding process. Following this, Section III-B details the decision process used by the proposed adaptive algorithm. Section III-C quantifies the storage requirements of the algorithm, while Section III-D conceives a technique for eliminating the algorithm's complex logarithmic and exponential operations without a reducing its performance.
A. EXIT Function Analysis
EXIT functions [7] rely on the MI to quantify the quality of the extrinsic information output by a decoding block, as a function of the quality of its input a priori information, as well as the the channel Signal-to-Noise Ratio (SNR) if the decoding block performs demodulation, equalization or multiuser detection, for example.
The following sections describe the EXIT functions for the different components of the scheme shown in Fig. 1(a) , with the UEC EXIT function described in Section III-A1, the URC described in Section III-A2, and the combined demodulator and URC EXIT function described in Section III-A3. In common with our previous work [6] , all of the EXIT functions are defined in terms of I(z) and I(w), rather than I(ũ) and I(ṽ).
1) UEC Decoder: The operation of the UEC decoder is only dependent on a single input vector of a priori LLRsz a o , which is used for producing both the output extrinsic LLR vectorz 
2) URC
, where I(z a ) and I(w a ) are the mutual information ofz a andw a , respectively. 3) Demodulator and URC Decoder: The iterative QPSK demodulator and one of the URC decoders can be considered as a single block for the purposes of the proposed algorithm. This is because the operation of the demodulator will always be followed by activating one of the URC decoders, so that the resultant informationw e i gleaned from the demodulator can be propagated to the rest of the iterative decoder. This approach takes the demodulator into account, but allows an EXIT function to be produced that characterizes the expected MI improvement of the signal z e . Since both the URC and UEC EXIT functions also characterize the improvement of z e , this allows like-for-like comparisons to be made between the three different types of decoding blocks.
The joint demodulator and URC decoder EXIT function depends on the channel's E b /N 0 , as well as on the a priori LLR vectorz a u orz a l provided for the URC decoder and on the a priori LLR vectorw a i fed back to the demodulator. This can be expressed as I(z Fig. 1(b) for the combination of natural mapping aided QPSK and a URC decoder having 8 states.
B. Adaptive activation algorithm
Upon starting to detect a new frame, the receiver first has to activate the iterative demodulator, followed by one of the URC blocks in order to disseminate the information received from the channel. Following this, it may decide from several options as to which decoding block to operate next. In general, the decoding activation options that can be selected based on Fig 1(a) are the upper URC decoder (URC1), the lower URC decoder (URC2), the demodulator followed by the upper URC decoder (URC1+demod), the demodulator followed by the lower URC decoder (URC2+demod), or the UEC decoder (UEC), where selecting the UEC presents the option of operating with different numbers of states r.
During the iterative decoding process, the benefit of each decoding option is quantified using the expected improvement to the quality of the extrinsic LLR vector that it provides. This is achieved by first measuring the MI of the most-recently generated values of the a priori LLR vectorsz a o ,z a u ,z a l ,w a u , and w a l , using the technique described in Section III-D. These MIs may be used as inputs to the EXIT functions of Section III-A, which may be generated off-line and stored in Look-Up Tables (LUTs), as will be described in Section III-C. These predict the MI of the extrinsic LLR vector that would be generated by each decoding option This can be compared to the MI of the currently available extrinsic LLR vector that would be replaced to make an estimation of the potential MI improvement offered by selecting this decoding option. Note that additions are used for obtaining the LLR vectorsz , for predicting the expected benefit of operating the demodulator followed by URC1. By subtracting the MI of the currently avaliable extrinsic LLR vector from the expected MI of the vector that would replace it, we obtain the expected MI improvement for each decoding option. By selecting the next decoding option with consideration of its expected MI improvement, it allows us to give priority to those blocks which have not been activated recently.
The algorithm for selecting which of the available decoding options to perform next (UEC, URC1, URC2, URC1+demod, URC2+demod) also quantifies of the cost of each the options. More specifically, we consider using the number of Add-Compare-Select (ACS) [10] Table III ]. Upon dividing the expected MI improvement by the number of ACS operations needed per bit for each of the decoding options, we arrive at a score for each decoding option. The algorithm then picks the specific option with the highest score as the block which should be operated next.
C. Storage Requirements
In this section, we consider the memory required for the storage of the EXIT function LUTs, which are required for the adaptive iterative decoding algorithm of Section III-B to function. As described in Section III-B, three sets of EXIT function LUTs are required: one set for the UEC code I(z 
As a comparison, the combined memory requirement for three different 6144 bit random interleavers is 30 kB, demonstrating that the memory requirement of the proposed algorithm is not excessive, when compared to the original memory requirement of the scheme in Fig. 1(a) .
D. MI Measurement
The adaptive iterative decoding algorithm of Section III-B relies on measuring the MI of the LLR vectors provided by the various blocks of the decoder. Using the averaging method [7] , the MI of an LLR vector may be is estimated without any knowledge of the true bit values. The MI is calculated as a function of the entropies that are implied by the b LLRs in the vector [7] , according to
The binary entropy function H b (z k ) is comprised of several log and exp functions, which imposes a high implementation complexity. As a result, it is desirable to approximate this function. Fig. 2 shows that the function 1 − H b (z k ) resembles an inverted Gaussian distribution. Motivated by this, we propose to approximate it using a linear approximation. As shown in Fig. 2 , this approximation is characterized by a straight line, which is mirrored for positive and negative LLR values, and that is clipped such that it has a maximum MI value of 1, and the minimum MI value of 0. This results in a linear piece-wise approximation of 5 segments, and can be written as I(z k ) = max(min(m|z k | + c, 1), 0), where m and c represent the gradient and intercept point of the line. To choose the optimum values for m and c, LLRs were generated having a range of target MI values in the range of 0 to 1. Following this, the approximate MI measurement function was fitted to these LLRs, producing an estimate for the MI of each group of LLRs. The slope m and intercept c values were chosen for the approximation as those, which produced the minimum aggregate error between the approximate MI values and the exact MI values. In this case, m = 0.24 and c = −0.06 were found to be optimum. To reduce the number of calculations needed, instead of scaling and shifting the MI of each LLR before averaging, we can transform the operation, so that the clipped LLRs are averaged, before this average is scaled by m and shifted by c to get the final MI of the vector. These techniques result in negligible complexity for the MI measurement compared to the operation of the decoding blocks. 
IV. RESULTS
In this section, the error correcting performance of the proposed scheme using the adaptive iterative decoding algorithm is compared to that of various benchmarkers. This section will first describe the nature of these benchmarkers in Section IV-A, before presenting the results in Section IV-B.
A. Considered Schemes
In this section we contrast the proposed scheme of Section II to a number of benchmarkers. A summary of the benchmarkers is shown in Table I and more detailed information is available in [6] . For each scheme, we will consider both a 4-state and an 8-state URC design for the upper and lower URC decoder producing turbo decoders, which are referred to as Turbo4 and Turbo8 from here on. Each of the schemes can also be operated with or without the iterative demodulator, where activation of the iterative demodulator is denoted with the -QPSK suffix. In the case when the QPSK demodulator is not operated iteratively, Gray mapping is employed since it provides superior performance to natural mapping in this case [8] . Table I shows the E b /N 0 capacity bound, which is the theoretical limit for reliable communication of the given effective throughput η, the E b /N 0 area bound which is the limit at which the area beneath the EXIT functions match each other, and finally the E b /N 0 tunnel bound, which is the threshold where the EXIT charts exhibit an open tunnel. These bounds indicate successively stricter limits for the E b /N 0 , where a low SER becomes possible for each scheme. Since all of the schemes have the same effective throughput of η = 0.762, our comparisons between them are fair in terms of transmission energy, delay and bandwidth.
1) UEC-Turbo(-QPSK): This is the proposed scheme of Fig. 1(a) , which may be operated with or without the aid of the adaptive iterative decoding algorithm advocated. The tunnel bound of Table I suggests that the UEC-Turbo8 scheme of our previous work [6] would marginally out-perform UECTurbo4-QPSK by 0.1 dB, but only in the absence of an iterative decoding complexity limit. Although the iterative demodulator of the proposed scheme imposes only a modest complexity, it actually facilitates a significant overall complexity reduction since it allows the less complex 4-state URC to be used, suggesting that it will out-perform the UEC-Turbo8 scheme in the presence of an iterative decoding complexity limit.
2) EG-URC-Turbo(-QPSK): This scheme is a classic SSCC benchmarker, where the unary encoder of Fig. 1(a) is replaced with an EG code for source coding. Furthermore, the UEC trellis code of Fig. 1(a) is replaced with an accumulator, which is an r = 2 state URC encoder. This is required for converting the vector of non-equiprobable bits y into a vector of equiprobable bits z [6] . During decoding, each of the three URC decoders are operated in turn. In this scheme the outer URC decoder is provided with a priori information pertaining to the bit value probabilities in y, which allows some of the residual redundancy to be exploited for error correction.
3) EG-Turbo(-QPSK): This SSCC benchmarker replaces the UEC code of Fig. 1(a) with an EG code. This results in a variation of the EG-URC-Turbo scheme, which omits the URC code. This scheme suffers from a significant capacity loss due to the non-equiprobable bits in z [6] . However, this scheme has a low complexity due to the reduced number of iterative blocks, which are operated in the decoder.
4) Unary-Turbo(-QPSK): This is another SSCC benchmarker, which has non-equiprobable bits at y, and therefore also suffers a from capacity loss. Compared to the proposed scheme of Fig. 1(a) , the UEC code has been replaced by a unary code. It is therefore similar to the EG-Turbo scheme, but uses a unary encoder instead of a EG encoder.
B. SER Performance
In this section, we consider a limited complexity-receiver, where there is a limited amount of computational resource. Fig. 3 provides SER plots for each of the schemes chosen and for a range of complexity limits, where complexity is quantified in terms of ACS operations per bit of z. Results are provided for the transmission of frames comprising a = 650 symbols over an uncorrelated narrowband Rayleigh fading channel. The complexity limits imposed are 1000, 2000 and 3000 operations per bit of z, which allow the relative performance of the various schemes to be compared for a range of available decoder resources. If the iterative scheme of Fig. 1(a) operates each of the blocks successively in turn, these complexity limits allow approximately 3, 6 and 9 iterations to be completed, when 4 states are used for the UEC and URC decoders. Fig. 3 , when the scheme of Fig. 1(a) uses 4-state URCs and the iterative demodulator (UEC-Turbo4-QPSK), it performs better than the UEC-Turbo8 scheme of [6] , which uses 8-state URCs without iterative demodulation. Although the EXIT chart analysis of Section IV-A suggests that the ultimate performance of the UEC-Turbo8 scheme would be 0.1 dB better, the UECTurbo4-QPSK arrangement is seen to be 2 dB superior, when a maximum complexity of C = 2000 ops/bit of z is imposed. This may be attributed to the fact that the 4-state URC decoder imposes about half the complexity of the 8-state URC decoder, while the added complexity of the iterative demodulator is minimal in comparison.
The proposed UEC-Turbo4-QPSK scheme performs particularly well under low complexity limits, since its advantage of reduced complexity is diminished, as the complexity limit is increased. Owing to this, the gap between the UEC-Turbo8 and UEC-Turbo4-QPSK schemes reduces to 1.2 dB for a complexity limit of C = 3000 ACS ops/bit of z.
2) Low Complexity Schemes vs. High Performance Schemes: Fig. 3 also compares the proposed adaptive UECTurbo4-QPSK scheme to other benchmarkers. Since the proposed adaptive algorithm allows for a reduced decoding complexity, it has a performance advantage over the set of high-performance benchmarkers. However, under the low complexity limit of C = 1000 ACS per bit of z, the less complex benchmarkers, such as the EG-Turbo scheme, exhibit a 1.5 dB gain over the more complex near-capacity schemes. However, when the complexity limit is increased to C = 2000 ACS per bit of z, these low complexity schemes suffer a performance loss, while the proposed adaptive UEC-Turbo4-QPSK scheme offers a modest, but non-negligible performance gain of 0.25 dB, without any increase in transmission energy, delay, bandwidth or decoder complexity.
3) Effects of Approximation: Fig. 4 characterizes the performance erosion imposed upon the proposed adaptive UECTurbo4-QPSK scheme, when EXIT chart LUT quantization is used, as well as when approximate MI measurements are used. It can be seen in Fig. 4 that 6-bit and 8-bitquantization imposes only a negligible degradation on the SER performance, while 4-bit-quantization reduces the gain of the proposed scheme by 0.2 dB. The approximation invoked for measuring the MI also has a negligible impact on the proposed scheme, hence this is recommended for hardware implementation.
V. CONCLUSIONS
In this paper we have demonstrated that the adaptive iterative decoding algorithm of [6] can be further developed for reducing the complexity of an iterative decoder with a four stage concatenation, including an iterative demodulator. This iterative demodulator also allows the employment of a URC code with fewer states, resulting in an overall complexity, whilst beneficially increasing the performance of the scheme in complexity-limited scenarios. The proposed adaptive algorithm allows low SERs to be achieved over a range of complexity limits, while the benchmarkers either favor low complexity limits or high complexity limits exclusively. This paper also proposes an approximation to the MI measurement used in the adaptive algorithm, facilitating its hardware implementation with a low overhead.
