ABSTRACT A novel joint source and channel coding (JSCC) scheme is proposed, which we refer to as the reordered Elias gamma error correction (REGEC) code. Like the recently proposed unary error correction (UEC) code and EGEC code, the proposed code facilitates the practical near-capacity transmission of source symbol values that are randomly selected from a large or infinite alphabet. However, in contrast to the UEC code, both the EGEC and our proposed REGEC codes are universal codes, facilitating the transmission of source symbol values that are randomly selected using any monotonic probability distribution. However, the EGEC code has a complicated structure comprising two parts, where unequal error protection is required to balance the two parts with the aid of a specific parameterization that must be tailored to the source distribution, preventing its employment for unknown or non-stationary sources. By contrast, the proposed REGEC code does not need unequal error protection, and hence its parameterization does not have to be tailored to the particular source distribution, and thus the REGEC code is a more attractive scheme. More explicitly, our REGEC code has a simple structure comprising only a single part, which does not suffer from the delay and loss of synchronization that are associated with the two parts of the EGEC code. In a particular practical scenario, where the source symbols obey a specific Zeta probability distribution, our REGEC scheme is shown to offer gains of up to 0.9 dB over the best of JSCC and separate source and channel coding (SSCC) benchmarkers, when QPSK modulation is employed for transmission over an uncorrelated narrowband Rayleigh fading channel. In the scenario where the source symbols obey the distribution produced by the H.265 video codec, our REGEC scheme is shown to offer a gain of 0.7 dB over the SSCC benchmarker. 
These gains are achieved for free, without increasing the required transmit-duration, transmit-bandwidth, transmit-energy, or decoding complexity.
I. INTRODUCTION
Multimedia codecs such as H.264 [1] and H.265 [2] typically produce symbols that have a wide range of values. Our previous work [3, Fig. 1] and Figure 1 demonstrate that both H.264 and H.265 produce symbol values that may be represented using positive integers having values of up to around 1000, where higher values are observed with lower probabilities. This is characteristic of Zipf's law [5] and so the symbol values may be modeled using a Zeta probability distribution [5]. These multimedia codecs employ source codes such as the unary code [6] and the Elias Gamma (EG) code [7] for the entropy coding of these symbols. Here, each symbol value is mapped to a different binary codeword, where the codewords have a variety of different lengths. In order to facilitate the reliable transmission of multimedia signals over error-prone channels, both source and channel coding are required. The timelines of their development are characterized at a glance in Figures 2 and 3, respectively. Shannon's source-coding and channel-coding separation theorem [8] states that near-capacity communication is theoretically possible, when employing Separate Source and Channel Coding (SSCC). For example, this may be achieved by combining a near-entropy source code, such as an adaptive arithmetic code [9] or a Lempel-Ziv code [10], with a near-capacity channel code, such as a Low Density Parity Check (LDPC) code [11] or a turbo code [12]. However, the source-coding and channel-coding separation theorem relies upon a number of idealized simplifying assumptions, such as having an infinite block-length and random channel errors encountered for transmission over Gaussian channels. Hence this profound theorem has a limited validity for practical finite-delay, finite-complexity schemes communicating over fading channels exhibiting bursty, rather than random, error distributions [13]. Furthermore, near-entropy adaptive arithmetic coding or Lempel-Ziv coding requires both the transmitter and receiver to accurately estimate the occurrence probability of every source symbol. However, the occurrence probability of rare symbol values cannot be accurately estimated until a sufficiently high number of symbols have been generated, imposing an excessive latency which cannot be tolerated in many practical applications. This problem becomes particularly severe when the symbol values are selected from a set having an infinite cardinality, such as the set of all positive integers. Furthermore, transmission errors may corrupt longer codewords in a specific way, where the corrupted codewords mimic a shorter legitimate codeword. This inevitably leads to the loss of synchronization between the transmitter and receiver, potentially causing an avalanche-like propagation of decoding errors.

VOLUME 4, 2016

FIGURE 2. Timeline of source coding milestones.
It is these issues that motivate the employment of structured source codes, such as the unary [6] and EG [7] codes, in many practical multimedia communication schemes. More specifically, structured source codes operate on the basis of codewords that conform to a particular structure, rather than having a design that is tailored to the specific probabilities of occurrence of the symbols generated by the particular source, as in arithmetic and Lempel-Ziv codes. Owing to this, structured source codes facilitate the communication of symbols selected from infinite sets, without requiring any knowledge of the corresponding occurrence probabilities at either the transmitter or receiver. Other examples of structured source codes include the Elias delta code [7], the Elias omega code [7], the Even-Rodeh code [14], the Stout code [15] and the Fibonacci code [16], just to name a few. Furthermore, the Exponential Golomb (ExpG) code [17] is a parametrized structured source code, which subsumes the EG code as a special case. As mentioned above, structured source codes are typically employed in multimedia codecs, where they are invoked for encoding the values of various symbols, such as the motion vectors of a sophisticated video codec. However, typically some residual redundancy remains in the source-coded bit-stream when structured source codes are employed for representing symbols that are produced by multimedia codecs, hence imposing a capacity loss and preventing near-capacity operation when SSCC is employed [3]. Furthermore, SSCC is sensitive to transmission errors, with a single bit error potentially causing the corruption of several video frames in H.264, for example. As a remedy, Joint Source and Channel Coding (JSCC) schemes [42] have been proposed for exploiting the residual redundancy associated with structured source codes, hence avoiding capacity loss.
We previously proposed a pair of JSCC schemes for the near-capacity transmission of source symbols that are randomly selected from a large alphabet, namely the Unary Error Correction (UEC) code [3] and the Elias Gamma Error Correction (EGEC) code [43]. More specifically, our previously proposed UEC code [3] was the first JSCC that has a low decoding complexity, when invoked for representing symbol values that are selected from an alphabet having a large or infinite cardinality. However, the UEC code has limited applicability, since it is based on the unary code [6], which is a structured source code, but is not a universal code, as shown in Figure 4. More specifically, the UEC code only has a finite average codeword length for particular source distributions, including only a limited subset of the Zeta probability distributions. This subset does not include the Zeta distribution that models the symbols produced by H.265 most closely. Motivated by this, we subsequently proposed the EGEC code [43], which was the first universal JSCC. More specifically, since the EGEC code is based on the universal EG source code [7], it has a finite average codeword length for any monotonic source distribution, in which lower symbol values have greater occurrence probabilities than higher symbol values. Owing to this, the EGEC code was the first JSCC that facilitates the low-complexity near-capacity transmission of symbol values that are randomly selected from a large or infinite alphabet using a widely applicable range of probability distributions. However, the EGEC scheme has a complicated structure, comprising two parts, namely the EGEC(UEC) part and the EGEC(FLC-CC) part, as shown in Figure 5(b). The EGEC(UEC) part operates on the basis of the UEC code of [3], while the EGEC(FLC-CC) part employs a serial concatenation of a Fixed Length Code (FLC) with a Convolutional Code (CC) and relies on side information provided by the EGEC(UEC) part.
Owing to this specific structure, the EGEC(FLC-CC) part cannot be operated until after the operation of the EGEC(UEC) part has been completed, which causes additional processing delay. Furthermore, if the side information provided by the EGEC(UEC) part contains any decoding errors, the EGEC(FLC-CC) decoder part will become desynchronized with respect to the encoder, hence inflicting a high number of decoding errors. Depending on the particular source probability distribution, the two parts of the EGEC code typically have different error correction performances, with one or the other of the parts becoming the dominant limitation of the overall error correction performance. This deficiency can be solved by using puncturing [43] for invoking Unequal Error Protection (UEP) for the two parts, giving them equal error correction performances. However, the puncturing will impose some capacity loss and it will also increase the complexity of the system, since the punctured bits still have to be decoded during the decoding process. Furthermore, the above-mentioned UEP must be specifically parametrized for a particular source probability distribution. If the actual source distribution is unknown or non-stationary, then the UEP parametrization will typically fail to match it, hence causing further capacity loss.
Against this background, this paper proposes a universal JSCC scheme, which we refer to as the Reordered Elias Gamma Error Correction (REGEC) code. This has a simple structure, which facilitates the near-capacity transmission of symbol values that are randomly selected from large alphabets using any arbitrary monotonic probability distribution at a low complexity. Since it is a universal code, the applicability of the REGEC code is not limited to any particular source symbol distribution, unlike that of the UEC code. Furthermore, since the REGEC code has a simple structure comprising only a single constituent part as shown in Figure 5(a), it does not suffer from the delay, the loss of synchronization, the loss of capacity or the increased complexity of puncturing that are associated with the EGEC code. Furthermore, the REGEC code is an attractive solution, since, unlike the EGEC code, it does not require UEP that is tailored for a specific source distribution. Our REGEC code is based on a novel source code, which we refer to as the Reordered Elias Gamma (REG) code, since it reorders the bits in each of the EG codewords for creating a relatively simple structure. Since this is achieved without changing the length of the codewords, the REG code constitutes a universal code, like the EG code. The proposed REGEC code combines the REG source code with a novel trellis-based channel code. Reordering the bits in the EG codewords allows the REGEC trellis to be designed for ensuring that the transitions between its states are synchronous with the transitions between the consecutive codewords in the REG-encoded bit sequence. This allows the residual redundancy in the REG-encoded bit sequence to be exploited for error correction by the REGEC trellis decoder, hence facilitating near-capacity operation.
As shown in Figure 6, the rest of this paper is organized as follows. In Section II, we describe the Zeta source probability distribution and generalize the infinite-cardinality source alphabet of our previous work to the case of a finite cardinality, where this cardinality represents an additional parameter to be considered. In Section III, we introduce the novel REG code and describe the structure of the REG codewords. Sections IV and V introduce our novel REGEC encoder and decoder, respectively. In Section VI, we analyze the parametrization of the proposed REGEC scheme and demonstrate that it facilitates near-capacity operation. In Section VII, we will consider a wide range of finite Zeta-like probability distributions as well as the H.265 distribution and we will show that our REGEC scheme is capable of offering gains of up to 0.9 dB over the best UEC, EGEC and SSCC benchmarkers in each case, when employing Quaternary Phase Shift Keying (QPSK) for communication over an uncorrelated narrowband Rayleigh fading channel. In the scenario where the source symbols obey H.265 distributions, our REGEC scheme is shown to offer a gain of 0.7 dB over the SSCC benchmarker. Note that these gains are achieved for free, without increasing the required transmit-duration, transmit-bandwidth, transmit-energy or decoding complexity. Finally, we offer our conclusions in Section VIII.

FIGURE 5. Schematics of the (a) REGEC, (b) EGEC, (c) UEC JSCC schemes and (d) the EG-CC SSCC scheme, when serially concatenated with URC and Gray-coded QPSK modulation schemes. Bold notation without a diacritic is used to denote a symbol vector or a bit vector. A diacritical hat represents a reconstruction of the symbol or bit vector having the corresponding notation. A diacritical tilde represents an LLR vector pertaining to the bit vector with the corresponding notation. A roman superscript 'a' is employed to denote an a priori LLR vector, while 'e' is employed for extrinsic LLR vectors. Furthermore, $\{\pi_1, \ldots, \pi_5\}$ represent interleavers, while $\{\pi_1^{-1}, \ldots, \pi_5^{-1}\}$ represent the corresponding deinterleavers. Puncturing may also be performed in $\pi_2$ and $\pi_5$, while the corresponding depuncturing operations take place in $\pi_2^{-1}$ and $\pi_5^{-1}$. Multiplexing and demultiplexing is performed in the crossed boxes.

FIGURE 6. The structure of the paper.
II. SYMBOL VALUE SETS HAVING A LARGE CARDINALITY
The schemes considered in this paper are designed to convey a vector $\mathbf{d} = [d_i]_{i=1}^{a}$ comprising $a$ symbols, where each symbol $d_i \in \mathbb{N}_L$ adopts the value $d$ with the occurrence probability $P(d)$ and where $\mathbb{N}_L = \{1, 2, 3, \ldots, L\}$ is the finite-cardinality alphabet comprising positive integers with the cardinality $L$. Our previous contributions [3], [43] characterized the performance of the UEC, EGEC and SSCC schemes invoked for representing symbol values that are selected from a set having an infinite cardinality. Instead, in this paper we will use the symbol set $\mathbb{N}_L$ having the finite cardinality of $L = 1000$, since the symbol values of H.264 shown in [3, Fig. 1] and those of H.265 shown in Figure 1 of this treatise are selected from an alphabet having a cardinality of approximately 1000. Here, the symbol entropy is given by

$$H_D = -\sum_{d \in \mathbb{N}_L} P(d) \log_2 P(d).$$
Again, Figure 1 exemplifies the distribution of the symbol values that are obtained from the H.265 video encoder, corresponding to a symbol entropy of $H_D = 2.3922$ bits per symbol. Note that these symbol values obey Zipf's law [5], since their distribution may be approximated by the finite Zeta-like distribution. Here, we define the finite Zeta-like distribution as

$$P(d) = \frac{d^{-s}}{H_L^{(s)}}, \quad d \in \mathbb{N}_L, \qquad (1)$$

where

$$H_L^{(s)} = \sum_{d=1}^{L} d^{-s}$$

is the generalized harmonic number of order $L$ of $s$, where $s \in \mathbb{R}$ for finite $L$. The limit of $L \to \infty$ exists when $s > 1$, in which case the generalized harmonic number converges to the Riemann Zeta function. The finite Zeta-like distribution may be more conveniently parametrized by the probability of the symbols adopting the most likely value of 1, which is given by $p_1 = P(1) = 1/H_L^{(s)}$. In the case of the finite Zeta-like distribution, the symbol entropy is given by

$$H_D = \frac{\ln\big(H_L^{(s)}\big)}{\ln(2)} - \frac{s\,H_L^{\prime(s)}}{\ln(2)\,H_L^{(s)}},$$

where

$$H_L^{\prime(s)} = -\sum_{d=1}^{L} \frac{\ln(d)}{d^{s}}$$

is the derivative of the harmonic number with respect to $s$.
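The definitions of this section can be checked numerically. The following minimal sketch assumes that the finite Zeta-like distribution takes the form $P(d) = d^{-s}/\sum_{d'=1}^{L} d'^{-s}$; the function names are illustrative only and do not originate from the paper. It evaluates the symbol entropy both directly and via the closed form based on the generalized harmonic number and its derivative.

```python
import math

def harmonic(L, s):
    """Generalized harmonic number: sum of d**(-s) for d = 1..L."""
    return sum(d ** -s for d in range(1, L + 1))

def zeta_like_pmf(L, s):
    """Assumed finite Zeta-like probabilities P(1), ..., P(L)."""
    h = harmonic(L, s)
    return [d ** -s / h for d in range(1, L + 1)]

def entropy_direct(pmf):
    """Symbol entropy in bits, straight from the definition."""
    return -sum(p * math.log2(p) for p in pmf)

def entropy_closed_form(L, s):
    """Entropy from the harmonic number and its derivative w.r.t. s."""
    h = harmonic(L, s)
    h_prime = -sum(math.log(d) / d ** s for d in range(1, L + 1))
    return math.log(h) / math.log(2) - s * h_prime / (math.log(2) * h)

L, s = 1000, 2.0
pmf = zeta_like_pmf(L, s)
p1 = pmf[0]  # probability of the most likely symbol value d = 1
print("p1 =", p1)
print("H_D (direct)      =", entropy_direct(pmf))
print("H_D (closed form) =", entropy_closed_form(L, s))
```

For $s = 2$ and $L = 1000$, the value of $p_1$ is close to the threshold of 0.608 that is quoted for the infinite-cardinality case in Section III.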
III. REORDERED ELIAS GAMMA CODE
As shown in Table 1, source encoders such as the unary or EG encoders represent each symbol $d_i$ in the vector $\mathbf{d}$ using a corresponding binary codeword, namely Unary($d_i$) or EG($d_i$), respectively. Note that for the convenience of our ensuing discussions, the unary codewords shown in Table 1 are the complements of those that are conventionally employed, for example in [3, Table I]. The average codeword length is given by

$$l = \sum_{d=1}^{L} P(d)\, l(d),$$

where $l(d)$ is the length of the $d$th codeword.

In the case of a unary code, the length of the codeword Unary($d_i$) is $l_{\mathrm{Unary}}(d_i) = d_i$, giving an average unary codeword length of

$$l_{\mathrm{Unary}} = \sum_{d=1}^{L} d\, P(d) = \frac{\sum_{d=1}^{L} d^{1-s}}{\sum_{d=1}^{L} d^{-s}}$$

when the source symbols obey the finite Zeta-like distribution of (1). However, the average unary codeword length $l$ is only finite for $s > 2$ and hence for $p_1 > 0.608$ when $L$ tends to infinity. For the case of the finite Zeta-like distribution having the cardinality $L = 1000$, the average codeword length of the unary code is almost double that of the EG code when $p_1 = 0.608$, as we will characterize below. Despite this, the unary code was used as the basis of the JSCC UEC scheme [3], since its codewords have a relatively simple structure, which can be readily exploited for error correction. More specifically, the structure of the unary codewords can be described by the UEC trellis of [3], without requiring an excessive number of trellis transitions and states. By contrast, an EG codeword EG($d_i$) has a length of $l_{\mathrm{EG}}(d_i) = 2\lfloor \log_2(d_i) \rfloor + 1$. When the source symbols obey the finite Zeta-like distribution, the average EG codeword length becomes [43]

$$l_{\mathrm{EG}} = \sum_{d=1}^{L} P(d)\,\big(2\lfloor \log_2(d) \rfloor + 1\big) = 1 + 2\sum_{d=1}^{L} P(d)\,\big[\log_2(d) - \mathrm{frac}(\log_2(d))\big], \qquad (5)$$
where the frac(·) operator yields the fractional part of the operand, where frac(3.4) = 0.4 for example [43]. Note that the average EG codeword length $l$ is finite for all Zeta distributions as $L \to \infty$, not just for those having $p_1 > 0.608$. For the case of $L = 1000$, the average EG codeword length is lower than that of the unary code for all cases where $p_1 < 0.794$. However, the conventional EG codewords have a relatively complicated structure, which cannot be readily described by a single trellis and hence cannot be readily exploited for low-complexity error correction using a simple JSCC structure. Owing to this, our previous work [43] was only able to develop a trellis representation of the EG code by decomposing each symbol $d_i$ into two sub-symbols $x_i$ and $t_i$, as shown in Figure 5(b). Here, each sub-symbol $x_i$ is encoded by the EGEC(UEC) part of the EGEC code, while each sub-symbol $t_i$ is encoded by the EGEC(FLC-CC) part. However, the reliance on these two parts leads to the requirement to tailor the UEP of the two parts for the specific source probability distribution, which may not match the actual source distribution if it is unknown or non-stationary. It also imposes the disadvantages associated with an increased delay, loss of synchronization, capacity loss and increased complexity due to puncturing, as described in Section I.

In order to eliminate the requirement for a complicated code structure comprising two parts, with the design objective of creating a simple trellis structure, we propose a novel reordering of the bits in each EG codeword. We refer to the reordered code as the REG code, where the generalized structure of each REG codeword is shown in Figure 7. The reordering is conceived as follows. As described above, the first $x_i$ bits of the conventional EG codeword EG($d_i$) are given by a unary codeword Unary($x_i$). These bits become the odd-indexed bits of our corresponding REG codeword.
Notice that the final 1-valued bit in Unary($x_i$) becomes the final bit in REG($d_i$), since all REG codewords comprise an odd number of bits, in common with all EG codewords. The last $x_i - 1$ bits of the conventional codeword EG($d_i$) comprise the FLC codeword FLC($t_i$, $x_i - 1$), whose bits become the even-indexed bits of the corresponding REG codeword REG($d_i$).
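The reordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, rather than a reproduction of Table 1 or Figure 7: we assume $x_i = \lfloor \log_2(d_i) \rfloor + 1$ and $t_i = d_i - 2^{x_i - 1}$, which is consistent with the codeword length of $2\lfloor \log_2(d_i) \rfloor + 1$, and we assume that the FLC bits are written most significant bit first and keep their order after interleaving. All function names are hypothetical.

```python
def unary(x):
    """Unary codeword in the complemented convention: (x - 1) zeros, then a 1."""
    return "0" * (x - 1) + "1"

def flc(t, width):
    """Fixed-length binary representation of t using `width` bits (MSB first)."""
    return format(t, "b").zfill(width) if width > 0 else ""

def split_symbol(d):
    """Assumed decomposition of d into the sub-symbols (x, t)."""
    x = d.bit_length()          # equals floor(log2(d)) + 1 for d >= 1
    t = d - (1 << (x - 1))      # offset of d within its group of 2**(x-1) values
    return x, t

def eg(d):
    """Conventional Elias gamma codeword: unary prefix followed by FLC suffix."""
    x, t = split_symbol(d)
    return unary(x) + flc(t, x - 1)

def reg(d):
    """Reordered codeword: unary bits at odd positions, FLC bits at even ones."""
    x, t = split_symbol(d)
    u, f = unary(x), flc(t, x - 1)
    bits = []
    for k in range(x - 1):
        bits.append(u[k])       # odd-indexed position (1, 3, 5, ...)
        bits.append(f[k])       # even-indexed position (2, 4, 6, ...)
    bits.append(u[-1])          # the terminating 1 of the unary prefix comes last
    return "".join(bits)

def reg_decode(bits):
    """Parse a concatenation of REG codewords back into symbol values."""
    symbols, i = [], 0
    while i < len(bits):
        x, t = 1, 0
        while bits[i] == "0":   # a 0-valued unary bit is followed by an FLC bit
            t = (t << 1) | int(bits[i + 1])
            x += 1
            i += 2
        i += 1                  # consume the terminating 1-valued unary bit
        symbols.append((1 << (x - 1)) + t)
    return symbols

for d in range(1, 9):
    print(d, eg(d), reg(d))
```

The decoder illustrates why the reordering helps: a codeword boundary is detected as soon as a 1-valued bit is seen at an odd position, without having to remember the length of the unary prefix.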
Since each REG codeword has the same length of $l_{\mathrm{REG}}(d_i) = 2\lfloor \log_2(d_i) \rfloor + 1$ as the corresponding EG codeword, the REG code will have the same average codeword length $l_{\mathrm{REG}}$ as the EG code, which is given by (5). Therefore, since the EG code is a universal code, so too is the REG code. This approach is motivated by the difference in the structures of the unary and EG codewords shown in Table 1. In [3], a UEC code was designed for the JSCC of unary-encoded symbols, in order to facilitate near-capacity communication. This is achieved by designing the UEC trellis of [3] for ensuring that the path through the trellis remains synchronized with the unary codewords. More specifically, the UEC trellis uses the logical 1-valued bit at the end of each unary codeword to detect the boundary between consecutive codewords and to trigger a return to a start state. By contrast, maintaining trellis synchronization during the joint source and channel coding of EG-coded symbols is more complicated. This is because the length of the EG codeword depends on the length of its unary prefix, which may be detected using the 1-valued bit at the end. However, an EGEC trellis designed for maintaining synchronization with the EG codewords would require states, i.e. memory, for storing the length of the unary prefix all the way until the end of the FLC suffix is reached, whereupon a return to a start state could be triggered. Since the unary prefix can have any length selected from an infinite set, an infinite number of states would be required to store this information, hence preventing the construction of a practical trellis. Instead, we can maintain synchronization by reordering the bits in the EG codeword, so that the logical 1-valued bit at the end of the unary prefix appears instead at the end of the REG codeword.
In this way, this logical 1-valued bit may be used for detecting the boundary between consecutive codewords and for triggering a return to a start state in the proposed REGEC trellis, which will be introduced in Section IV. Hence, synchronization can be maintained and near-capacity joint source and channel coding can be achieved.
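The average codeword lengths discussed in this section can be compared numerically. The following sketch assumes the finite Zeta-like distribution $P(d) \propto d^{-s}$ over $d = 1, \ldots, L$ and uses hypothetical helper names. It evaluates the average unary length and the average EG length, where the latter also applies to the REG code because the reordering preserves each codeword length. It also evaluates the frac(·)-based form of the average EG length as a cross-check.

```python
import math

def zeta_like_pmf(L, s):
    """Assumed finite Zeta-like probabilities P(1), ..., P(L)."""
    weights = [d ** -s for d in range(1, L + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def avg_unary_length(pmf):
    """Average unary codeword length, where Unary(d) comprises d bits."""
    return sum(d * p for d, p in enumerate(pmf, start=1))

def avg_eg_length(pmf):
    """Average EG (and REG) length, using 2*floor(log2(d)) + 1 bits per symbol."""
    return sum(p * (2 * (d.bit_length() - 1) + 1)
               for d, p in enumerate(pmf, start=1))

def avg_eg_length_frac_form(pmf):
    """Same quantity written with frac(log2(d)), the fractional part."""
    total = 0.0
    for d, p in enumerate(pmf, start=1):
        x = math.log2(d)
        frac = x - math.floor(x)
        total += p * (x - frac)
    return 1 + 2 * total

L = 1000
for s in (1.5, 2.0, 2.5):
    pmf = zeta_like_pmf(L, s)
    print(f"s={s}: p1={pmf[0]:.3f}, "
          f"unary={avg_unary_length(pmf):.3f}, EG/REG={avg_eg_length(pmf):.3f}")
```

With $s = 2$, which corresponds to $p_1 \approx 0.608$ for $L = 1000$, the printed unary average is close to double the EG average, in line with the discussion above.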
IV. REORDERED ELIAS GAMMA ERROR CORRECTION ENCODER
In this section, we introduce the REGEC encoder, which is illustrated in Figure 5 (a). In Section IV-A, we discuss the operation of the REG source encoder. The operation of the REGEC trellis is described in Section IV-B. Finally, Section IV-C describes the serial concatenation of the REGEC encoder with the Unity Rate Code (URC) encoder and QPSK modulator of Figure 5 (a).
A. REORDERED ELIAS GAMMA ENCODER
The REG encoder of Figure 5 
B. REORDERED ELIAS GAMMA ERROR CORRECTION TRELLIS ENCODER
As shown in Figure 5(a), the bit vector $\mathbf{y}$ of concatenated REG codewords is forwarded to a trellis encoder, which employs a novel REGEC trellis for encoding each bit $y_j$ in the vector $\mathbf{y}$, in order of increasing bit-index $j$. The trellis comprises $b$ concatenated trellis stages of the type depicted in Figure 8. Each trellis stage comprises $2r$ transitions between $r$ states, where $r$ is required to satisfy $r = 2f + 2$ and $f$ must be even. For example, an $r = 6$-state trellis is shown in Figure 8(a), an $r = 14$-state trellis is shown in Figure 8(b) and the general case is given in Figure 8(c). Each successive bit of $\mathbf{y}$ forces the trellis encoder to transition from its particular previous state $m_{j-1} \in \{1, 2, \ldots, r\}$ into a new state $m_j \in \{1, 2, \ldots, r\}$ that is selected from two legitimate alternatives, depending on the bit value $y_j$.

As discussed in Section III, the odd-indexed bits in the REG codewords derive from a unary codeword, while the even-indexed bits come from an FLC codeword. These unary and FLC bits force the trellis path into different sub-sets of the $r$ trellis states. More specifically, we decompose the set of $r$ states into three sub-sets, namely the unary states, the FLC states and the holding states. The trellis is designed for ensuring that each input bit $y_j$ that is provided by a unary bit causes a transition from one of the first $f$ states $m_{j-1} \in \{1, 2, \ldots, f\}$, which we refer to as the unary states. The transition enters a next state $m_j$ according to

$$m_j = \begin{cases} 1 + \mathrm{odd}(m_{j-1}) & \text{if } y_j = 1 \text{ and } m_{j-1} \in \{1, \ldots, f\} \\ m_{j-1} + f & \text{if } y_j = 0 \text{ and } m_{j-1} \in \{1, \ldots, f\}, \end{cases} \qquad (6)$$

where odd(·) yields 1 if its operand is odd or 0 if it is even. Note that since each REG codeword ends with a unary bit having the value $y_j = 1$, the trellis path $\mathbf{m}$ is guaranteed to enter either state $m_j = 1$ or $m_j = 2$ after each codeword. By contrast, a unary bit having the value $y_j = 0$ that is encountered in one of the first $f - 2$ unary states causes the path to enter one of the states $\{f + 1, f + 2, \ldots, 2f - 2\}$, which we refer to as the FLC states, since the next bit will be an FLC bit. This FLC bit is guaranteed to cause a transition from the FLC state to a unary state, since an FLC bit is always followed by a unary bit in the REG codewords. The next state $m_j$ is selected according to

$$m_j = \begin{cases} m_{j-1} - f + 1 + 2\,\mathrm{odd}(m_{j-1}) & \text{if } y_j = 1 \text{ and } m_{j-1} \in \{f + 1, \ldots, 2f - 2\} \\ m_{j-1} - f + 2 & \text{if } y_j = 0 \text{ and } m_{j-1} \in \{f + 1, \ldots, 2f - 2\}. \end{cases} \qquad (7)$$
Observe that when $f = 2$, there are no FLC states in the trellis, as shown in Figure 8(a). Note that REG codewords having a length $l(d_i) \le (2f - 2)$ cause the path $\mathbf{m}$ to enter only the unary and FLC states described above. However, REG codewords having a length $l(d_i) > (2f - 2)$ require four additional states, which we refer to as the holding states, since they act as a 'holding pattern' for the bits in the REG codeword from the $(2f - 1)$st bit onward. More specifically, the FLC holding states $m_j \in \{2f - 1, 2f\}$ are entered if the unary bit $y_j = 0$ is encountered while being in one of the unary states of the set $m_{j-1} \in \{f - 1, f\}$, as shown in (6). Upon emerging from the FLC holding states $m_{j-1} \in \{2f - 1, 2f\}$, the next state will be chosen from the unary holding states of the set $m_j \in \{2f + 1, 2f + 2\}$ according to

$$m_j = \begin{cases} 2f + 1 + \mathrm{odd}(m_{j-1}) & \text{if } y_j = 1 \text{ and } m_{j-1} \in \{2f - 1, 2f\} \\ m_{j-1} + 2 & \text{if } y_j = 0 \text{ and } m_{j-1} \in \{2f - 1, 2f\}. \end{cases} \qquad (8)$$

Likewise, upon traversing from the unary holding states $m_{j-1} \in \{2f + 1, 2f + 2\}$, the next state will be chosen according to

$$m_j = \begin{cases} 1 + \mathrm{odd}(m_{j-1}) & \text{if } y_j = 1 \text{ and } m_{j-1} \in \{2f + 1, 2f + 2\} \\ m_{j-1} - 2 & \text{if } y_j = 0 \text{ and } m_{j-1} \in \{2f + 1, 2f + 2\}. \end{cases} \qquad (9)$$

Note that the trellis path $\mathbf{m}$ will remain in the holding states, as long as unary bits having the value of $y_j = 0$ are encountered. When the final $y_j = 1$-valued unary bit of the REG codeword is encountered, the trellis path returns to state $m_j = 1$ or $m_j = 2$, ready for the start of the next REG codeword.
Finally, combining Equations (6) to (9) yields (10) , as shown at the bottom of this page. Note that the total number of states is given by r = (2f + 2).
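The state transitions described above may be sketched as a single function of the previous state and the current bit. Since the equations and Figure 8 are not reproduced in this text, the rules below are assumptions inferred from the prose: a 1-valued bit toggles between odd- and even-indexed states, a 0-valued bit preserves the parity, and a 1-valued unary bit returns the path to state 1 or 2. For the $r = 6$ trellis ($f = 2$) these rules are consistent with the worked example of this section, while the rule for the FLC states, which only exist for $f > 2$, could not be checked in the same way. The function names are hypothetical.

```python
def odd(m):
    """Return 1 if m is odd, or 0 if it is even."""
    return m % 2

def next_state(m_prev, y, f):
    """Assumed REGEC trellis transition for an r = 2f + 2 state trellis."""
    if 1 <= m_prev <= f:                      # unary states
        return 1 + odd(m_prev) if y == 1 else m_prev + f
    if f + 1 <= m_prev <= 2 * f - 2:          # FLC states (assumed rule, f > 2)
        if y == 0:
            return m_prev - f + 2
        return m_prev - f + 1 + 2 * odd(m_prev)
    if m_prev in (2 * f - 1, 2 * f):          # FLC holding states
        return m_prev + 2 if y == 0 else 2 * f + 1 + odd(m_prev)
    if m_prev in (2 * f + 1, 2 * f + 2):      # unary holding states
        return 1 + odd(m_prev) if y == 1 else m_prev - 2
    raise ValueError("state index outside 1..2f+2")

def trellis_path(y_bits, f, m0=1):
    """Sequence of states m_0, m_1, ..., m_b visited for the bit string y_bits."""
    path = [m0]
    for ch in y_bits:
        path.append(next_state(path[-1], int(ch), f))
    return path

# Path for a single REG codeword of seven bits in the r = 6 trellis (f = 2):
print(trellis_path("0000011", f=2))
```

In this sketch, the path only re-enters state 1 or 2 on the final bit of each REG codeword, which is the synchronization property that the trellis is designed to provide.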
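The path $\mathbf{m}$ may be modeled as a particular realization of a vector $\mathbf{M} = [M_j]_{j=0}^{b}$ comprising $(b + 1)$ Random Variables (RVs), which are associated with the transition probabilities $\Pr(M_j = m, M_{j-1} = m') = P(m, m')$ of (11), as shown at the bottom of this page. These transition probabilities depend on the source symbol probabilities $P(d)$, which can be derived by employing the method of [3, Appendix]. In (11), $l_1$ is the average length of Unary($x_i$), as described in Section III. In the case of the finite Zeta-like distribution of (1), $l_1$ is given by [43]

$$l_1 = \frac{\sum_{d=1}^{L} d^{-s}\,\big(\lfloor \log_2(d) \rfloor + 1\big)}{\sum_{d=1}^{L} d^{-s}}. \qquad (12)$$

The conditional transition probabilities $\Pr(M_j = m \mid M_{j-1} = m') = P(m \mid m')$ are obtained by normalizing the transition probabilities according to

$$P(m \mid m') = \frac{P(m, m')}{\sum_{\check{m}=1}^{r} P(\check{m}, m')}.$$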
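Once the path $\mathbf{m}$ has been determined, the trellis encoder uses it to represent each bit $y_j$ in the vector $\mathbf{y}$ by an $n$-bit codeword $\mathbf{z}_j$. This is selected from the matrix of $r/2$ codewords $\mathbf{C} = [\mathbf{c}_1; \mathbf{c}_2; \ldots; \mathbf{c}_{f+1}]$ or from the complementary matrix $\bar{\mathbf{C}} = [\bar{\mathbf{c}}_1; \bar{\mathbf{c}}_2; \ldots; \bar{\mathbf{c}}_{f+1}]$. As shown in Figure 8(c), this is achieved according to

$$\mathbf{z}_j = \begin{cases} \mathbf{c}_{\lceil m_{j-1}/2 \rceil} & \text{if } y_j = \mathrm{odd}(m_{j-1}) \\ \bar{\mathbf{c}}_{\lceil m_{j-1}/2 \rceil} & \text{if } y_j \neq \mathrm{odd}(m_{j-1}). \end{cases}$$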
Following this, the selected codewords are concatenated to obtain the $bn$-bit vector $\mathbf{z} = [z_k]_{k=1}^{bn}$ of Figure 5. For example, the vector $\mathbf{y} = 010010111000001100111001$ of $b = 24$ bits is represented by the vector $\mathbf{z} = 111101111011111000001101110100010011100011110001$ of $bn = 48$ bits, when employing the $r = 6$-state REGEC trellis of Figure 8(a), with the $n = 2$-bit codebook $\mathbf{C} = [00; 11; 01]$.
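The worked example above can be reproduced with a short sketch of the $r = 6$-state trellis encoder. The transition table and the codeword selection rule below are assumptions inferred from the surrounding description, since Figure 8(a) is not reproduced in this text: the codeword with index $\lceil m_{j-1}/2 \rceil$ is used when $y_j = \mathrm{odd}(m_{j-1})$ and its complement is used otherwise. The names `TRANSITIONS`, `CODEBOOK` and `regec_encode` are hypothetical.

```python
# Assumed r = 6 trellis: (previous state, input bit) -> next state.
TRANSITIONS = {
    (1, 0): 3, (1, 1): 2,   # unary states 1 and 2
    (2, 0): 4, (2, 1): 1,
    (3, 0): 5, (3, 1): 6,   # FLC holding states 3 and 4
    (4, 0): 6, (4, 1): 5,
    (5, 0): 3, (5, 1): 2,   # unary holding states 5 and 6
    (6, 0): 4, (6, 1): 1,
}

# n = 2-bit codebook C = [00; 11; 01], one codeword per pair of states.
CODEBOOK = ["00", "11", "01"]

def complement(bits):
    """Bitwise complement of a string of '0' and '1' characters."""
    return "".join("1" if b == "0" else "0" for b in bits)

def regec_encode(y_bits, m0=1):
    """Encode the REG bit string y_bits; returns (z_bits, final_state)."""
    state, out = m0, []
    for ch in y_bits:
        y = int(ch)
        c = CODEBOOK[(state + 1) // 2 - 1]    # index ceil(state / 2), 1-based
        out.append(c if y == state % 2 else complement(c))
        state = TRANSITIONS[(state, y)]
    return "".join(out), state

y = "010010111000001100111001"
z, final_state = regec_encode(y)
print(z)
print("final state:", final_state)
```

Running this sketch on the example vector $\mathbf{y}$ yields the 48-bit vector $\mathbf{z}$ quoted above, which supports the assumed rules for the $r = 6$ case.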
Note that the selection of the number of trellis states $r$ is discussed in Section VI-D, while the selection of the codebook $\mathbf{C}$ is discussed in Section VI-E. We emphasize that the REGEC trellis encoder operates in a similar manner to a UEC trellis encoder and a CC encoder, subject to the following important differences.
1) As in the UEC trellis encoder, a bit having the value of $y_j = 1$ will force a transition from the odd-indexed states at the top half of the REGEC trellis to the even-indexed states in the bottom half and vice versa. Owing to this symmetry and due to using complementary codewords, the REGEC trellis encoder produces equiprobable bit values for the bit vector $\mathbf{z}$. This results in a bit entropy of $H_z = 1$, which is a necessary condition for avoiding capacity loss, as described in [3]. However, in contrast to the unary codewords of the UEC encoder, $y_j = 1$ does not only occur at the end of a REG codeword, resulting in transitions between the top and bottom halves of the REGEC trellis more frequently than only at the end of each codeword. By contrast, CC encoders produce binary values that are not guaranteed to be equiprobable, unless they are specifically parametrized for this purpose, as characterized in [3, Table II].

2) As we described above, the final unary bit $y_j$ in each REG codeword is guaranteed to induce a transition to either state $m_j = 1$ or state $m_j = 2$ of the REGEC trellis, in analogy with the UEC trellis. However, unlike in the UEC encoder, the particular one from the pair of states $m_b = 1$ or $m_b = 2$ that is selected at the end of the REGEC trellis path $\mathbf{m}$ depends on more factors than just whether the length $a$ of the symbol vector $\mathbf{d}$ is odd or even. This is due to the transitions between the top and bottom halves of the REGEC trellis that are caused by bits having the value $y_j = 1$ in the middle of REG codewords, as described above. By contrast, in a generalized CC encoder, the trellis path can potentially end in any state, since the transitions between states are not synchronized with the codewords of the source encoder.
Since the binary values in the vector $\mathbf{z}$ are equiprobable, the average coding rate of the REGEC encoder is given by

$$R^o = \frac{H_D}{l_{\mathrm{REG}} \cdot n}.$$

Here, we employ the roman superscript 'o' to indicate that this coding rate relates to the outer encoder of a serial concatenation, namely the REGEC encoder shown in Figure 5(a).
C. INTEGRATION OF THE REGEC ENCODER INTO A TRANSMITTER
Following REGEC encoding, the bit vector $\mathbf{z}$ is interleaved by the block $\pi_1$, URC encoded [44] and then interleaved again by the block $\pi_2$, as shown in Figure 5(a). Puncturing may also be performed within $\pi_2$ in order to achieve a particular desired effective throughput $\eta$ for the transmitter. This is achieved by discarding an appropriate number of bits following interleaving. The inner coding rate $R^i$ is defined by the ratio of the number of bits input into the URC encoder to the number of bits output by $\pi_2$, where $R^i > 1$ will be obtained if puncturing is used. Here we employ the roman superscript 'i' to indicate that this coding rate relates to the inner code of a serial concatenation, namely the punctured URC code shown in Figure 5(a). In order to avoid obfuscating the performance analysis of the proposed REGEC scheme by invoking the prevalent high-order modulation schemes routinely used for multimedia communication, simple $M = 4$-ary Gray-coded QPSK modulation may be employed for transmission, as shown in Figure 5(a). Note that other mapping schemes or a modulation scheme having a higher modulation order $M$ can be employed instead, although this may increase the complexity of the receiver, as we will discuss in Section V. The effective throughput of the transmitter is given by

$$\eta = R^o \cdot R^i \cdot \log_2(M).$$
Note that no knowledge of the source probability distribution $P(d)$ is required anywhere in the transmitter.
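The relationship between the outer coding rate, the inner coding rate and the effective throughput may be illustrated as follows. This sketch assumes the conventional definitions, namely an outer rate given by the symbol entropy divided by the average number of encoded bits per symbol and a throughput given by the product of the two rates and $\log_2(M)$. The function names and the numerical values are purely illustrative and are not taken from the paper's results.

```python
import math

def outer_rate(symbol_entropy, avg_codeword_length, n):
    """Information bits per REGEC-encoded bit (assumed definition)."""
    return symbol_entropy / (avg_codeword_length * n)

def inner_rate(bits_into_urc, bits_after_puncturing):
    """Ratio of URC input bits to transmitted bits; exceeds 1 when puncturing."""
    return bits_into_urc / bits_after_puncturing

def effective_throughput(r_outer, r_inner, modulation_order):
    """Information bits per modulated symbol."""
    return r_outer * r_inner * math.log2(modulation_order)

# Hypothetical operating point, chosen only to show the arithmetic.
r_o = outer_rate(symbol_entropy=2.0, avg_codeword_length=2.5, n=2)
r_i = inner_rate(bits_into_urc=48, bits_after_puncturing=40)
eta = effective_throughput(r_o, r_i, modulation_order=4)
print(f"R_o = {r_o:.3f}, R_i = {r_i:.3f}, eta = {eta:.3f} bit/symbol")
```

Discarding bits in $\pi_2$ reduces `bits_after_puncturing`, which raises the inner rate and hence the effective throughput.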
V. REORDERED ELIAS GAMMA ERROR CORRECTION DECODER
In this section, we describe the operation of the REGEC decoder of Figure 5 (a). In Section V-A, we discuss the integration of the REGEC decoder with the URC decoder and QPSK demodulator of Figure 5 (a). Following this, we detail the operation of the REGEC trellis decoder in Section V-B, while the REG decoder is described in Section V-C.
A. INTEGRATION OF REORDERED ELIAS GAMMA ERROR CORRECTION DECODER INTO A RECEIVER
In the receiver, soft QPSK demodulation [45], depuncturing and deinterleaving $\pi_2^{-1}$, Bahl-Cocke-Jelinek-Raviv (BCJR)-based URC decoding [13] and further deinterleaving $\pi_1^{-1}$ may be performed, before invoking the proposed REGEC decoder of Figure 5(a). If a higher-order modulation scheme were employed, then iterative decoding between the demodulator and the URC decoder would be required to avoid capacity loss, as is the case in any iterative decoding scheme [46]. Note that the receiver is required to employ the same pseudo-random interleaver designs as the transmitter. However, the entire set of interleavers can be generated independently by both the transmitter and receiver using only a single pseudo-random number generator seed. This seed may be hard-coded into both the transmitter and receiver, or may be reliably conveyed using only a very small amount of side information. The REGEC decoder is provided with the a priori LLR vector $\tilde{\mathbf{z}}^a$ and in response it generates the extrinsic LLR vector $\tilde{\mathbf{z}}^e$ of Figure 5(a), which may be iteratively exchanged with the serially concatenated URC decoder, until iterative decoding convergence to an infinitesimally low Symbol Error Ratio (SER) is achieved. In turn, the URC decoder may also iteratively exchange extrinsic LLRs with the demodulator [47], in order to avoid capacity loss when a mapping scheme other than Gray coding or when a higher-order modulation scheme is employed. Note that the combination of the URC decoder and the demodulator will have an EXtrinsic Information Transfer (EXIT) curve that reaches the (1, 1) point at the top right corner of the EXIT chart [48].
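The generation of a whole set of interleavers from a single seed can be sketched as follows. This is only an illustration using Python's standard pseudo-random number generator. The seed value, the interleaver lengths and the function names are hypothetical, and the paper does not specify which generator is employed.

```python
import random

def make_interleavers(seed, lengths):
    """Derive one pseudo-random permutation per requested length from one seed."""
    rng = random.Random(seed)
    perms = []
    for n in lengths:
        perm = list(range(n))
        rng.shuffle(perm)
        perms.append(perm)
    return perms

def interleave(values, perm):
    """Output position k carries the input element with index perm[k]."""
    return [values[i] for i in perm]

def deinterleave(values, perm):
    """Inverse of interleave() for the same permutation."""
    out = [None] * len(perm)
    for k, i in enumerate(perm):
        out[i] = values[k]
    return out

SEED = 2016                                       # shared by both ends (hypothetical)
tx_pi1, tx_pi2 = make_interleavers(SEED, [48, 48])
rx_pi1, rx_pi2 = make_interleavers(SEED, [48, 48])

bits = [int(b) for b in "111101111011111000001101110100010011100011110001"]
restored = deinterleave(interleave(bits, tx_pi1), rx_pi1)
print(restored == bits)
```

Since both ends draw the permutations in the same order from identically seeded generators, only the seed has to be agreed upon, provided that both ends use the same generator implementation.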
B. REORDERED ELIAS GAMMA ERROR CORRECTION TRELLIS DECODER
As shown in Figure 5(a), the REGEC trellis decoder is provided with a vector of a priori LLRs $\tilde{z}^a = [\tilde{z}^a_k]_{k=1}^{bn}$ that pertain to the corresponding bits in the vector $z$. The trellis decoder applies the BCJR algorithm [32] to a REGEC trellis of the sort shown in Figure 8(c) to consider every legitimate bit vector that could be represented by $\tilde{z}^a$, having the particular length $bn$. Here, the value of $bn$ is assumed to be perfectly known to the receiver, where the transmitter may employ a small amount of side information to reliably convey this value in practice. The synchronization between the REGEC trellis and the REG codewords is exploited during the BCJR algorithm's $\gamma_t$ calculation of [32, eq. (9)], by employing the conditional transition probabilities $P(m|m')$ of (11). Note that the REGEC trellis should be terminated at $m_0 = 1$ and at both possibilities for the final state, namely $m_b = 1$ and $m_b = 2$, as described in Section V-A. As shown in Figure 5(a), the BCJR decoder generates the vector of extrinsic LLRs $\tilde{z}^e = [\tilde{z}^e_k]_{k=1}^{bn}$, which is provided for the next iteration of the concatenated URC decoder's operation. Note that the REGEC trellis decoder's BCJR algorithm has only modest complexity, since it may employ a low number $r$ of states. Furthermore, it facilitates error correction even if the symbol probability distribution $P(d)$ is unknown, provided that the channel Signal to Noise Ratio (SNR) is sufficiently high, as we shall demonstrate in Section VI-E. In this case, the conditional transition probabilities $P(m|m')$ of (11) will also be unknown and so they are simply omitted from the BCJR algorithm's $\gamma_t$ calculation.
The transformation of $\tilde{z}^a$ into $\tilde{z}^e$ by the trellis decoder of Figure 5(a) may be characterized by plotting the inverted REGEC EXIT curve in an EXIT chart [49], as exemplified in Figure 9. Note that if a suitably designed codebook $C$ comprising codewords having at least $n = 2$ bits is employed, then the free distance $d_{free}$ of the REGEC code will be at least two, as will be quantified in Section VI. In this case, the inverted REGEC EXIT curve will reach the (1, 1) point in the top right corner of the EXIT chart [50]. Since the URC decoder and demodulator also have an EXIT curve that reaches the (1, 1) point in the top right corner of the EXIT chart [48], as shown in Figure 9, iterative decoder convergence towards the Maximum Likelihood (ML) performance is facilitated [51].
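Each point of an EXIT curve is a mutual information measured between a bit vector and the LLRs pertaining to it. As a hedged illustration, the sketch below uses the widely used time-averaging estimator $I \approx 1 - \frac{1}{N}\sum_k \log_2(1 + e^{-(1 - 2z_k)\tilde{z}_k})$, which assumes the LLR convention $\ln[P(z_k = 0)/P(z_k = 1)]$ and LLRs that are consistent with the true bit probabilities. The paper does not state which measurement method was used, so the function names and this choice of estimator are assumptions.

```python
import math

def log2_one_plus_exp(t):
    """Numerically stable evaluation of log2(1 + e^t)."""
    return (max(t, 0.0) + math.log1p(math.exp(-abs(t)))) / math.log(2.0)

def estimate_mutual_information(bits, llrs):
    """Time-averaged MI estimate between bits and their LLRs.

    Assumes LLR = ln(P(bit = 0) / P(bit = 1)) and self-consistent LLRs;
    for inconsistent LLRs the estimate can fall outside [0, 1].
    """
    if len(bits) != len(llrs) or not bits:
        raise ValueError("bits and llrs must be non-empty and of equal length")
    total = 0.0
    for bit, llr in zip(bits, llrs):
        sign = 1.0 - 2.0 * bit  # +1 for a 0-valued bit, -1 for a 1-valued bit
        total += log2_one_plus_exp(-sign * llr)
    return 1.0 - total / len(bits)
```

Under these assumptions, all-zero LLRs give an estimate of 0, while LLRs having large magnitudes and correct signs give an estimate close to 1, corresponding to the two ends of the EXIT chart axes.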
The EXIT chart area A o that is situated below the inverted REGEC EXIT curve is given by [43] , [52] 
Note that the REGEC EXIT chart area $A_o$ is independent of the codebook design, but using different codebooks can affect the shape of the EXIT curve, as will be discussed in Section VI-B. Following the completion of iterative decoding, the REGEC trellis decoder may employ the Viterbi algorithm to generate the vector $\hat{y} = [\hat{y}_j]_{j=1}^{b}$ of recovered bits, which pertain to the corresponding bits in the vector $y$, as shown in Figure 5(a).
C. REORDERED ELIAS GAMMA DECODER
The decoded bit vector $\hat{y}$ can be REG decoded in order to obtain the recovered symbol vector $\hat{d}$ of Figure 5(a). If there are any bit errors in the vector $\hat{y}$, then we might arrive either at the wrong legitimate REG codeword or fail to find a legitimate codeword. In the latter case, the affected bits are discarded. If the decoded symbol vector $\hat{d}$ does not contain the correct number $a$ of symbols, then an appropriate number of symbols is removed from the end of $\hat{d}$ or an appropriate number of 1-valued symbols is appended to the end of $\hat{d}$, accordingly. Here, it is assumed that the REG decoder has perfect knowledge of $a$. In practice, this value may be fixed in both the transmitter and receiver, or it may be reliably conveyed from transmitter to receiver using a small amount of side information.
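The length-correction step described above may be sketched as follows. This covers only the final truncation or padding of the recovered symbol vector; the REG codeword parsing itself is not reproduced, and the function name is a hypothetical one chosen for illustration.

```python
def fix_symbol_count(decoded_symbols, a):
    """Force the recovered symbol vector to contain exactly a symbols.

    Surplus symbols are removed from the end; missing symbols are
    replaced by appending 1-valued symbols, as described in the text.
    """
    if a < 0:
        raise ValueError("the number of symbols a must be non-negative")
    if len(decoded_symbols) >= a:
        return list(decoded_symbols[:a])
    return list(decoded_symbols) + [1] * (a - len(decoded_symbols))
```

For example, `fix_symbol_count([3, 1, 4, 1, 5], 3)` keeps only the first three symbols, while `fix_symbol_count([2], 3)` appends two 1-valued symbols.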
VI. PARAMETRIZATION OF THE REORDERED ELIAS GAMMA ERROR CORRECTION CODE
In this section, we discuss the parametrization of the REGEC code. In Section VI-A, we introduce the rule for extending the REGEC codebook. In Section VI-B, we analyze the near-capacity operation of the REGEC decoder. In Section VI-C, we discuss the codebook design of the REGEC trellis encoder, considering the free distance properties of various candidate codebooks. The EXIT curves of the candidate codebooks and their EXIT chart matching are discussed in Section VI-D. Finally, we analyze the error floor of the candidate codebooks in Section VI-E and select a recommended codebook.
A. REORDERED ELIAS GAMMA ERROR CORRECTION CODEBOOK EXTENSION
As described in Section IV-B, an REGEC trellis having $r$ states is parametrized by a set $C$ of $r/2$ codewords, each comprising $n$ bits, where $C = [00; 11; 01]$ in the $r = 6$, $n = 2$ example of Figure 8(a) and $C = [00; 01; 01; 11; 11; 11; 01]$ in the $r = 14$, $n = 2$ example of Figure 8(b). Any codebook $C$ corresponding to a trellis having $r = 2f + 2$ states can be extended to a codebook $C'$ corresponding to $r' > r$ states, including several new unary and FLC states. Note that when provided with the same REG-encoded bit vector $y$, REGEC trellis encoders employing the trellises of Figure 8(a) and Figure 8(b) are guaranteed to generate identical REGEC-encoded bit vectors $z$, despite using different codebooks. This is because the $r = 14$ codebook of Figure 8(b) is an extension of the $r = 6$ codebook of Figure 8(a). In this way, the use of extension allows a higher number of states to be used in the REGEC trellis decoder than in the REGEC trellis encoder. This allows us to dynamically change the number of states employed in the decoder in order to strike an attractive trade-off between its performance and its trellis complexity, as characterized in Section VI-B [53].
B. PERFORMANCE ANALYSIS
Near-capacity operation is achieved, when reliable communication can be maintained at transmission throughputs η that approach the Discrete-input Continuous-output Memoryless Channel (DCMC) capacity C [54] that is associated with M = 4 QPSK modulation and uncorrelated narrowband Rayleigh fading. This is facilitated, if the following conditions are satisfied [52] :
1) The URC decoder of Figure 5 (a) is required to have an EXIT curve having an area beneath it of
2) The area $A_o$ beneath the inverted EXIT curve of the REGEC trellis decoder is required to approach the REGEC coding rate $R_o$.
If these two conditions are satisfied, then near-capacity operation will be achieved, when the shape of the URC decoder's EXIT curve is closely matched to that of the inverted REGEC EXIT curve. This creates a narrow, but marginally open EXIT chart tunnel, which facilitates iterative decoding convergence towards the ML performance [51].
FIGURE 10. Plots of $R_o n$ and $A_o n$ that are obtained for the REGEC scheme, EG-CC scheme and UEC scheme, in the case where the symbol values of $d$ obey a finite Zeta-like distribution having the parameter $p_1$ and cardinality $L = 1000$. Here, $R_o$ is the coding rate, $A_o$ is the area beneath the inverted EXIT curve and $n$ is the codeword length of the corresponding scheme. The value of $A_o n$ is provided for an REGEC code having $f/2 \in \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$, while the value of $A_o n$ is provided for a UEC code having $r \in \{2, 4, 6, 8, 30\}$.
The first condition listed above is satisfied by a punctured URC code, as discussed in [52]. Figure 10 shows that the second of the above-mentioned conditions is satisfied when the RVs in the vector $D$ obey the finite Zeta-like distribution of (1) having the cardinality $L = 1000$ and various values for the parameter $p_1$. This figure plots the REGEC coding rate $R_o$ of (15) when multiplied by the REGEC trellis codeword length $n$. Furthermore, Figure 10 plots the product of $n$ and the area $A_o$ of (17) beneath the inverted REGEC EXIT curve for the case where the trellis decoder employs $f/2 \in \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$, giving $r \in \{6, 10, 14, 18, 22, 26, 30, 34, 38\}$ states. Note that according to (15) and (17), the area $A_o$ and coding rate $R_o$ are dependent on the symbol entropy $H_D$, average REG codeword length $l_{REG}$ and trellis codeword length $n$, but are independent of the codebook design $C$. Furthermore, the product of the REGEC EXIT chart area $A_o$ and the codeword length $n$ is related to the number of unary states $f$, as shown in Figure 10. In the case of the H.265 symbol value distribution of Figure 1, we obtain $R_o n = 0.8787$. Note that in all the scenarios considered, the discrepancy between $R_o n$ and $A_o n$ is less than $10^{-1}$ and becomes less than $10^{-2}$ when $f/2 \geq 4$, including the case of the H.265 symbol value distribution of Figure 1.
TABLE 2. [...] for $n = 2$ bits and $r = 6$ states and their corresponding FD $d_f$. For finite Zeta-like probability distributions having $L = 1000$ and various values of $p_1$, the number of states in the URC having the best matching EXIT curve is provided, together with the corresponding $E_b/N_0$ tunnel bound in brackets.
[...] (1) for a cardinality of $L = 1000$ and for various values for the parameter $p_1$.
However, the trellis complexity and hence the complexity of REGEC decoding is proportional to the number of states. Our experiments revealed that $f = 2$ and $r = 6$ represent an attractive trade-off between maintaining a low trellis complexity and facilitating near-capacity operation.
C. REGEC CODEBOOK CANDIDATE SELECTION
In this section, we will discuss the codebook design for an $n = 2$, $r = 6$ REGEC trellis. An $n = 2$, $r = 6$ codebook comprises $r/2 = 3$ codewords, each comprising $n = 2$ bits. Therefore, there are $2^6 = 64$ possible $n = 2$, $r = 6$ codebooks. However, it can be shown that all of these are equivalent to one of the 10 codebooks shown in Table 2, which contains no pairs of equivalent codebooks. More specifically, two codebooks are equivalent, if each pairing of codewords within one of the codebooks has the same Hamming Distance (HD) as the corresponding pairing of codewords within the other codebook. Owing to this, two codebooks are equivalent, if one can be transformed into the other by toggling all bits and/or changing the order of the bits in each codeword using the same reordering pattern.
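The count of 10 non-equivalent codebooks can be checked with a short enumeration. The sketch below groups all 64 candidate codebooks by their pairwise-HD signature, which is the equivalence criterion stated above, and picks the member of each class whose successive codewords have the smallest decimal values. The function names are hypothetical, and the order in which the representatives are listed is not claimed to match the numbering of Table 2.

```python
from itertools import combinations, product

def hamming_distance(a, b):
    """Number of bit positions in which the integers a and b differ."""
    return bin(a ^ b).count("1")

def hd_signature(codebook):
    """Hamming distances of all codeword pairings, in a fixed order."""
    return tuple(hamming_distance(a, b) for a, b in combinations(codebook, 2))

def group_equivalent_codebooks(n_bits=2, n_codewords=3):
    """Group every possible codebook by its pairwise-HD signature."""
    classes = {}
    for codebook in product(range(2 ** n_bits), repeat=n_codewords):
        classes.setdefault(hd_signature(codebook), []).append(codebook)
    return classes

def as_bit_strings(codebook, n_bits=2):
    """Render a codebook of integers as fixed-width binary strings."""
    return [format(c, "0{}b".format(n_bits)) for c in codebook]

classes = group_equivalent_codebooks()
# One representative per class: the lexicographically smallest member.
representatives = sorted(min(members) for members in classes.values())
for rep in representatives:
    print("; ".join(as_bit_strings(rep)))
```

In this enumeration, the codebook [00; 11; 01] appears as one of the representatives, and every one of the 64 codebooks falls into exactly one of the 10 classes.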
The 10 candidate REGEC codebooks having $n = 2$, $r = 6$ are shown in Table 2, where the bits of the codewords have been toggled and reordered in order to minimize the decimal values that are represented by successive codewords. The error correction capability of a codebook may be characterized by the Free Distance (FD) that results at the output of the REGEC trellis encoder [55]. Table 2 quantifies the FD of each candidate codebook, which was obtained using a brute-force search. As described in Section IV, the REGEC trellis path always starts at the state $m_0 = 1$ and will always end at either state $m_b = 1$ or state $m_b = 2$. Therefore, our brute-force search only needs to consider the free distance between paths that start and end at these states. Owing to this, our experiments revealed that a trellis comprising five stages like that of Figure 8(a) is sufficient for finding the free distance, resulting in only a moderate searching complexity. Table 2 suggests that the candidate codebooks $C_5$, $C_6$, $C_8$ and $C_{10}$ will produce the best error correction capability, since they have the highest FD of 4. However, in iterative decoding schemes it is necessary to separately consider the error correction capability in the turbo cliff and error-floor regions of the Symbol Error Ratio (SER) plot, before the best candidate parametrization can be identified with certainty, as we shall discuss in the following sections.
Note that the FD of an REGEC code remains unaltered if its codebook is extended using the process of Section VI-A, since extension does not change the REGEC-encoded bit vector $z$ produced for a given REG-encoded bit vector $y$. Furthermore, depending on the length $n$ of each codeword, the FD of an REGEC code cannot be increased by increasing the number of states $r$ above a particular limit. For example, the largest possible FD of $n = 2$-bit REGEC codes is 4, regardless of whether $r = 6$ or $r > 6$ states are employed. This is because the legitimate transition path set of an $r$-state trellis is a subset of the legitimate transition path set of a trellis having a higher number of states $r' > r$. Therefore, we will focus our attention on codebooks corresponding to trellises having $r = 6$ states throughout the remainder of this paper.
D. EXIT CHARTS OF THE REGEC CANDIDATE CODEBOOKS AND THE BEST MATCHING URCs
As discussed in Section VI-B, the area $A_o$ beneath the inverted REGEC EXIT function and the REGEC coding rate $R_o$ are independent of the codebook design $C$. However, the shape of the REGEC EXIT curve and therefore its match with the URC EXIT curve does depend on the specific codebook design $C$. Since the candidate codebooks of Table 2 are unique with no pair of codebooks that are equivalent to each other, their inverted EXIT curves are all different from each other. Owing to this, different candidate codebooks have inverted EXIT curves that match best with the EXIT curve of URC codes having different parametrizations. In order to investigate this, we plotted the inverted EXIT curves of each candidate REGEC codebook, when used to encode source symbols obeying finite Zeta-like distributions having the cardinality $L = 1000$ and various values for the parameter $p_1 \in \{0.7942, 0.6, 0.4, 0.2\}$. In each case, the resultant EXIT curve was plotted together with the EXIT curves of URC codes having 2, 4 and 8 states. Figures 9, 12(a) and 12(b) show the resultant EXIT charts for the cases of using the candidate codebooks $C_1$, $C_8$ and $C_9$ to encode symbols obeying the finite Zeta-like distribution for $L = 1000$ and $p_1 = 0.7942$. Figures 9, 12(a) and 12(b) also show the corresponding EXIT charts that result in the case where the symbol probability distribution is unknown in the receiver, as described in Section V-B. The results of Table 2 show that $C_1$ is the codebook that facilitates an open EXIT chart tunnel at the lowest $E_b/N_0$ value. This suggests that $C_1$ should offer the best performance in the turbo cliff region of the SER plot, since an open EXIT chart tunnel implies that iterative decoding convergence to an ML SER performance can be achieved [13]. However, $C_1$ may not offer the best performance in the error floor region of the SER plot, as we will investigate in the next section.
E. ERROR FLOOR ANALYSIS
The error correction capability of the candidate REGEC codebooks in the error floor region may be evaluated by considering the SER plots of Figure 12(c). Note that when knowledge of the source probability distribution $P(d)$ is available at the receiver, the candidate codebooks $C_8$ and $C_9$ offer steep turbo cliffs at $E_b/N_0$ values near the corresponding $E_b/N_0$ tunnel bounds, as predicted by the EXIT chart analysis of Section VI-D. However, the candidate codebook $C_1$ can be seen to suffer from an error floor, which prevents us from achieving a low SER at $E_b/N_0$ values near the corresponding $E_b/N_0$ tunnel bound of 1.6 dB. This may be explained by the observation that the candidate codebook $C_1$ requires the a priori LLR vector $\tilde{z}^a$ of Figure 5(a) to have a higher Mutual Information (MI) $I(\tilde{z}^a; z)$ than $C_8$ and $C_9$ require, in order to achieve a low SER, as shown in Figure 12(d). Owing to this, the candidate codebook $C_1$ requires the iterative decoding process to converge closer towards the (1, 1) point of the EXIT chart, which becomes difficult when the interleaver $\pi_1$ of Figure 5(a) has only a moderate length [56]. As shown in Figure 12(d), the candidate codebooks $C_8$ and $C_9$ require the lowest MIs $I(\tilde{z}^a; z)$ in order to achieve low SERs. Meanwhile, the codebooks $C_5$, $C_6$ and $C_7$ have SER vs MI curves similar to that of $C_4$, while $C_2$, $C_3$ and $C_{10}$ have performance similar to that of $C_1$. Note that the FD-3 codebook $C_9$ offers better SER performance than several of the other codebooks having FDs of 4. We may speculate that this is because the error correction capability of a candidate codebook is not only decided by the overall FD, but also by the Hamming distances between the codewords that are associated with the transitions in the REGEC trellis having the highest transition probabilities of (11).
In the case where the receiver has no knowledge of the source probability distribution $P(d)$, the SER curve of each candidate codebook is degraded, as shown in Figure 12(c). However, this degradation is particularly apparent in the case of $C_8$, since this causes it to develop an error floor. By contrast, the candidate codebooks $C_4$, $C_5$, $C_6$, $C_7$ and $C_9$ do not suffer from an error floor, regardless of whether knowledge of the source probability distribution is available in the receiver, while $C_1$, $C_2$, $C_3$ and $C_{10}$ suffer from error floors in both cases. Overall, we recommend the candidate codebook $C_9$, since it offers the best performance among the candidate codebooks that never suffer from an error floor. Also, the candidate codebook $C_9$ works best with the URC inner code having the lowest complexity, namely that employing only $r = 2$ states. Therefore, we employ the candidate codebook $C_9$ throughout the next section, when we compare the performance of the proposed REGEC scheme with suitably designed benchmarkers.
FIGURE 12. (a) and (b) EXIT charts of the proposed REGEC scheme. The EXIT curves are provided for REGEC codes employing the $n = 2$-bit $r = 6$-state codebooks $C \in \{C_1, C_8\}$, as well as for a URC having $r \in \{4, 8\}$ states. (c) SER vs $E_b/N_0$ plot for the REGEC codes employing the $n = 2$-bit $r = 6$-state codebooks $C \in \{C_1, C_8, C_9\}$, when combined with URC codes having $r \in \{2, 4, 8\}$ states. (d) SER vs $I(\tilde{z}^a; z)$ plot for the REGEC codes employing the $n = 2$-bit $r = 6$-state codebooks $C \in \{C_1, C_4, C_8, C_9\}$ when a priori LLR vectors $\tilde{z}^a$ having different MI $I(\tilde{z}^a; z)$ are provided to the REGEC trellis decoder. In all plots, the symbols of $d$ obey the finite Zeta-like distribution having $p_1 = 0.7942$ and $L = 1000$. The plots labeled 'No Probs' indicate the case where the source distribution $P(d)$ is unknown to the receiver.
VII. PERFORMANCE COMPARISON WITH THE BENCHMARKERS
In this section, we compare the proposed REGEC scheme to the EGEC benchmarker of Figure 5(b) [43] and to the UEC and EG-CC benchmarkers of [3]. Like the proposed REGEC scheme, both the EGEC and the UEC benchmarkers constitute examples of JSCCs, while the EG-CC benchmarker represents SSCC. More specifically, the EG-CC benchmarker employs an EG code for source coding, while an iteratively-decoded serial concatenation of a CC and a URC is employed for separate channel coding. Note that apart from our own previous work, no other JSCC schemes have been designed for large-cardinality sources. For example, the Variable Length Error Correction (VLEC) code [57] suffers from excessive complexity for large-cardinality sources, hence preventing a comparison with the proposed REGEC scheme. Again, we used QPSK modulation for transmission over an uncorrelated narrowband Rayleigh fading channel for all schemes, since this is representative of transmissions over realistic wireless channels and because this facilitates direct comparison with the results of [3] and [58]. In Section VII-A, we will discuss the parametrization of the REGEC scheme as well as of the three benchmarkers, in order to facilitate fair comparisons. Then we will analyze the SER performance of the proposed REGEC scheme and the three benchmarkers in Section VII-B.
A. PARAMETRIZATION
Table 3 provides several parametrizations of the REGEC scheme, which are designed for transmitting symbols that obey the finite Zeta-like distribution of (1). Table 3 also provides corresponding parametrizations for the three benchmarkers, which offer the same throughput $\eta$ as our REGEC scheme parametrizations. We parametrize the finite Zeta-like distribution using a cardinality of $L = 1000$ and the parameter of $p_1 \in \{0.7942, 0.6, 0.4, 0.2\}$, which represents a wide selection of the $p_1$ values shown in Figure 10. Note that the specific value of $p_1 = 0.7942$ is chosen, since it results in the same coding rate for the unary code and the EG code, and hence the same outer coding rate $R_o$ for all schemes considered in this section. Note that when we have $L \to \infty$, the UEC code becomes impractical for $p_1 = 0.2$, 0.4 and 0.6, since the average unary codeword length becomes infinite in these cases [3]. For the finite case of $L = 1000$, the average unary codeword length is more than twice that of the EG code when $p_1 = 0.2$ and 0.4, hence severely degrading the performance of the UEC benchmarker. For this reason, the UEC benchmarker is not considered for these values of $p_1$. Table 3 also considers the case of source symbols obeying the H.265 distribution of Figure 1. Note that as described in Section I, the EGEC benchmarker has two parts that must be jointly optimized for each particular source symbol distribution using UEP. More specifically, the puncturing rates $R_i$ for the UEC part and the FLC-CC part must be carefully selected so that they have the same $E_b/N_0$ tunnel bound [43], as shown in Table 3.
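The comparison between average unary and EG codeword lengths can be explored numerically. The sketch below is illustrative only: since (1) is not reproduced in this section, it assumes a truncated Zeta distribution of the form $P(d) \propto d^{-s}$ for $d \in \{1, \ldots, L\}$, with a hypothetical exponent $s$. It uses the standard codeword lengths of $d$ bits for the unary code and $2\lfloor \log_2 d \rfloor + 1$ bits for the EG code. No particular correspondence between $s$ and the $p_1$ values of Table 3 is claimed.

```python
def truncated_zeta(s, L):
    """Assumed distribution: P(d) proportional to d**(-s) for d = 1..L."""
    weights = [d ** (-s) for d in range(1, L + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def unary_length(d):
    """A unary codeword for the symbol value d comprises d bits."""
    return d

def eg_length(d):
    """An Elias gamma codeword for d comprises 2*floor(log2(d)) + 1 bits."""
    return 2 * (d.bit_length() - 1) + 1

def average_length(probs, length_fn):
    """Expected codeword length, where probs[0] is the probability of d = 1."""
    return sum(p * length_fn(d) for d, p in enumerate(probs, start=1))

probs = truncated_zeta(s=1.2, L=1000)  # hypothetical heavy-tailed example
p_1 = probs[0]                         # probability of the symbol value 1
l_unary = average_length(probs, unary_length)
l_eg = average_length(probs, eg_length)
```

For heavy-tailed settings such as this one, the average unary codeword length greatly exceeds the average EG codeword length, which is consistent with the motivation given above for excluding the UEC benchmarker at small $p_1$.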
For all the schemes considered, we selected codewords comprising $n = 2$ bits when possible, while $n = 3$-bit codewords were selected for the FLC-CC part of the EGEC benchmarker, whenever necessary to achieve the desired effective throughput $\eta$ for designing the UEP. We selected $r = 6$ states for the proposed REGEC scheme, since this is sufficiently high for imposing only an insignificant amount of capacity loss, as discussed in Section VI-B. Furthermore, we employ the REGEC codebook $C_9 = [00; 11; 01]$ in order to avoid the error floors that are characterized in Section VI-E. We also adopt the $r = 4$-state UEC trellis of [3] for both the UEC benchmarker and for the UEC part of the EGEC benchmarker. Meanwhile, we employ an $r = 4$-state CC trellis in both the FLC-CC part of the EGEC benchmarker and in the EG-CC benchmarker, as recommended in [43] and [58] and because using higher numbers of states was found to be detrimental in [3]. All of the schemes considered in this section employ URC inner codes, for the sake of facilitating iterative decoding. As discussed in Section VI-D, the selected REGEC codebook $C_9$ has an EXIT curve that matches best with that of a URC code having 2 states, as shown in Table 2. The EGEC, UEC and EG-CC benchmarkers also have EXIT curves that match best with a 2-state URC, since these were found to yield open EXIT chart tunnels at the lowest $E_b/N_0$ values in [43]. Therefore, we employ 2-state URCs for the inner codes of all schemes considered in this section. Note that the EGEC, UEC and EG-CC benchmarkers offer fair and natural comparisons with the proposed REGEC scheme, since they all employ simple unary, FLC or EG codewords, as well as trellis-based iterative decoding. Table 3 provides the $E_b/N_0$ values where the DCMC capacity $C$ becomes equal to the throughput $\eta$ of each scheme considered.
These $E_b/N_0$ values represent capacity bounds, above which it is theoretically possible to achieve reliable communication, provided that the scheme facilitates near-capacity operation. Furthermore, the specific $E_b/N_0$ values where we have $A_i = A_o$ are provided for each scheme considered in Table 3. These area bounds represent the lowest $E_b/N_0$ values where it is theoretically possible to create an open EXIT chart tunnel, provided that the outer and inner EXIT curves have shapes that closely match each other. Note that the discrepancy between the capacity bound and the area bound of each scheme represents an $E_b/N_0$ capacity loss, as exemplified by Figure 10 for the REGEC, UEC and EG-CC schemes. As in the proposed REGEC code, the EXIT chart area $A_o$ below the inverted UEC curve approaches the UEC coding rate $R_o$, when the number of states $r$ is increased. By contrast, the EXIT chart area $A_o$ below the inverted EG-CC EXIT curve is not affected by the number of states in the CC trellis, hence resulting in large discrepancies between $A_o$ and $R_o$, therefore imposing significant amounts of capacity loss.
As shown in Table 3, the $E_b/N_0$ capacity loss of all JSCC schemes is more significant for smaller $p_1$ values, indicating that trellises having higher numbers of states are required to mitigate capacity loss in these cases. However, these capacity losses are smaller than those of the SSCC EG-CC benchmarker, as shown in Table 3. For each of the source symbol distributions considered, the capacity loss of the REGEC scheme is less than 0.3 dB, which is smaller than the capacity loss of all the benchmarkers in each case, demonstrating that the proposed REGEC scheme facilitates near-capacity operation. Finally, Table 3 provides the tunnel bound of each scheme, which quantifies the lowest $E_b/N_0$ value where an open EXIT chart tunnel can be created upon employing a two-state accumulator for the URC code, as discussed in Section VI-D.
The proposed REGEC schemes facilitate reliable communication at $E_b/N_0$ values that exceed the corresponding tunnel bound, provided that the symbol vector $d$ comprises a sufficiently high number $a$ of symbols. Note that higher $E_b/N_0$ values will be required to achieve low SERs, when employing short frames [59]. For all considered values of $p_1$ as well as for the H.265 distribution, our proposed REGEC scheme offers an open tunnel at the lowest $E_b/N_0$ values, facilitating low SERs at low $E_b/N_0$ values. At high $E_b/N_0$ values, the REGEC scheme will offer the widest open EXIT chart tunnel, requiring fewer decoding iterations to achieve a low SER than the benchmarkers. Table 3 also characterizes the complexity of all the schemes considered in this section. Here, the complexity is quantified by the average number of Add, Compare and Select (ACS) operations performed per decoding iteration and per symbol in the vector $d$. This is justified, since the REGEC trellis decoder, UEC trellis decoder, FLC decoder, CC decoder and the URC decoder operate entirely on the basis of addition, subtraction and $\max^*$ operations, which can be further decomposed into ACS operations. All other components in Figure 5 may be considered to have a relatively insignificant complexity [58], [60]. As in [58], we assume that the addition and subtraction operations each require a single ACS operation, while each $\max^*$ operation may be approximated by a look-up table operation, which can be completed using five ACS operations [61]. As shown in Table 3, the complexity tends to increase as the Zeta distribution parameter $p_1$ is reduced, which may be explained by the resultant increases in the average codeword lengths $l_{REG}$, $l_{EG}$ and $l_{Unary}$. Note that the complexity of the proposed REGEC scheme is higher than those of the benchmarkers, because the REGEC scheme employs an $r = 6$-state trellis, while all benchmarkers employ $r = 4$-state trellises.
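The $\max^*$ operation and the ACS accounting described above can be sketched as follows. The exact form is the Jacobian logarithm $\max^*(a, b) = \max(a, b) + \ln(1 + e^{-|a - b|})$. The look-up table size and step width below are hypothetical choices made for illustration and are not the parameters of [61]; only the cost of one ACS operation per addition or subtraction and five per $\max^*$ is taken from the text.

```python
import math

def max_star_exact(a, b):
    """Jacobian logarithm: ln(e^a + e^b) evaluated in a stable form."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

LUT_STEP = 0.5  # hypothetical quantization step for |a - b|
# Hypothetical 8-entry table holding the correction term at each bin midpoint.
LUT = [math.log1p(math.exp(-(i + 0.5) * LUT_STEP)) for i in range(8)]

def max_star_lut(a, b):
    """Approximate max* using a table look-up for the correction term."""
    index = int(abs(a - b) / LUT_STEP)
    correction = LUT[index] if index < len(LUT) else 0.0  # negligible beyond the table
    return max(a, b) + correction

def acs_operations(additions, subtractions, max_stars):
    """ACS count, using one ACS per add or subtract and five per max*."""
    return additions + subtractions + 5 * max_stars
```

With these illustrative table parameters, the approximation error stays below roughly 0.12, with the largest error occurring when the two arguments are equal.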
In order to make fair comparisons in Section VII-B, we will limit the number of decoding iterations performed by the proposed REGEC scheme, so that all schemes operate within the same overall complexity limits. These complexity limits will be chosen to be sufficient for the benchmarker having the lowest complexity to achieve an SER performance that is within 0.1 dB of the performance it can achieve with unlimited complexity. This facilitates a fair comparison by ensuring that the selected complexity limit is not sufficiently high to favor the schemes having the highest complexity, such as the proposed REGEC scheme.
B. SER COMPARISON WITH THE BENCHMARKERS
Figures 13 and 14 characterize the SER performance of the schemes parametrized in Table 3. We consider the transmission of source symbol vectors $d$ comprising $a = 2 \cdot 10^4$ symbols, which we found to be typical of the number of symbols in an H.265 [2] slice. Therefore, the SER performance of Figures 13 and 14 may be considered to be achievable without imposing any additional latency in multimedia applications.
As shown in Figure 13, the proposed REGEC scheme facilitates reliable communication within as little as 1.2 dB of the capacity bound and consistently offers the best SER performance for each of the finite Zeta-like distribution $p_1$ values considered. This consistency is a key benefit of the proposed REGEC scheme, because while it offers only a small gain over the best of the three benchmarkers in each case, the performance of these benchmarkers is particularly inconsistent. More explicitly, while the proposed REGEC scheme offers a gain of 0.4 dB over the UEC benchmarker for $p_1 = 0.7942$, this gain becomes 5 dB for $p_1 = 0.6$, owing to the severe puncturing that the UEC scheme requires in this case [58]. Similarly, while the proposed REGEC scheme offers only a marginal gain over the EGEC benchmarker for $p_1 = 0.6$, this gain becomes 0.8 dB for $p_1 = 0.2$, owing to the severe puncturing of the two parts of the EGEC benchmarker in order to achieve UEP [58], as described in Section I. Note that the EGEC scheme has worse performance than the SSCC EG-CC benchmarker for $p_1 \in \{0.2, 0.4\}$. In the case of $p_1 = 0.2$, this may also be attributed to the severe puncturing invoked for UEP. In the case of $p_1 \in \{0.4, 0.6\}$, UEP does not improve the performance of the EGEC benchmarker beyond that of Equal Error Protection (EEP). Since our proposed REGEC scheme does not have two parts that must be carefully balanced, it does not suffer from these problems. Similarly, while the proposed REGEC scheme offers only a marginal gain over the EG-CC benchmarker for $p_1 = 0.4$, this gain becomes 0.6 dB for $p_1 = 0.2$ and 0.9 dB for $p_1 = 0.7942$, as shown in Figure 13.
In the case where the source symbols obey the H.265 distribution of Figure 1 , our REGEC scheme offers a gain of 0.7 dB over the SSCC EG-CC benchmarker, as shown in Figure 14 . Furthermore, our REGEC scheme offers 0.3 dB gain over the EGEC benchmarker, where UEP does not improve the performance of the EGEC benchmarker in this scenario. The UEC benchmarker has the worst performance of all the schemes considered in this scenario, owing to the severe puncturing that it requires to achieve the same effective throughput as the other schemes.
Note that since the SER results of Figures 13 and 14 offer fair comparisons in terms of complexity and effective throughput, the gains offered by our proposed REGEC scheme are obtained for free, with no cost in terms of transmit-duration, transmit-bandwidth, transmit-energy or decoding complexity. Therefore, a gain of say 0.9 dB may be deemed significant, particularly since it is achieved in the extreme vicinity of the DCMC capacity bound, namely within about 1.5 dB. This is achieved by mitigating the capacity loss, which is inherent in SSCC and which limits the performance of other JSCC schemes. Since these gains are associated with the improvements offered by the REGEC code over the benchmarker SSCC and JSCC codes, similar gains may be expected when combining them with any other channel codes, modulation schemes or channels.
Note that throughout our discussions above, it was assumed that the receiver of the proposed REGEC scheme has knowledge of the average REG codeword length $l$. Furthermore, it was assumed that the decoder has knowledge of the probabilities of occurrence $P(d)$. However, Figures 13 and 14 show that when the channel SNR is sufficiently high, the proposed REGEC receiver facilitates a low SER, even if it does not have any knowledge of the symbol probabilities $P(d)$. The symbol probabilities may be estimated by storing a sufficient number of symbol vectors $\hat{d}$, in order to heuristically estimate the required information, hence facilitating near-capacity communication for the subsequent symbol vectors.
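One simple way of heuristically estimating the symbol probabilities from stored recovered symbol vectors is to use relative frequencies, as sketched below. The paper does not specify the estimator, so the use of plain relative frequencies, without smoothing for unseen symbol values, and the function name are assumptions made for illustration.

```python
from collections import Counter

def estimate_symbol_probabilities(stored_vectors):
    """Relative-frequency estimate of P(d) from previously recovered vectors.

    Symbol values that never occurred in the stored vectors receive no
    entry; a practical receiver might smooth the estimate instead.
    """
    counts = Counter(symbol for vector in stored_vectors for symbol in vector)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one stored symbol is required")
    return {symbol: count / total for symbol, count in sorted(counts.items())}

# Hypothetical previously recovered symbol vectors.
estimate = estimate_symbol_probabilities([[1, 1, 2], [1, 3]])
```

The resulting estimate could then stand in for $P(d)$ when the decoder processes subsequent symbol vectors.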
VIII. CONCLUSIONS
In this paper, we have proposed a novel REGEC code for the near-capacity transmission of symbol values that are randomly selected from a source set having a large or infinite cardinality. Our REGEC code comprises a novel REG source code and a novel trellis code, which facilitates joint source and channel coding. In contrast to the UEC code previously proposed for the same purpose, our REGEC code is a universal code, facilitating the transmission of symbol values that are randomly selected using any monotonic probability distribution. On the other hand, in contrast to the EGEC code previously proposed for the same purpose, our REGEC code has a simple structure, which solves the delay, synchronization and computational complexity problems associated with the two parts of the EGEC code. In particular, the EGEC code must be specially parametrized for operation with the particular source distribution, preventing its application for unknown or non-stationary sources. By contrast, the proposed REGEC code can be applied for any distribution without requiring special parametrization.
In some practical scenarios where the source symbols obey particular finite Zeta-like probability distributions, our REGEC scheme is shown to offer gains of up to 0.9 dB over the best benchmarkers, when QPSK modulation is employed for transmission over an uncorrelated narrowband Rayleigh fading channel. In the scenario where the source symbols obey the H.265 distribution, our REGEC scheme is shown to offer a gain of 0.7 dB over the SSCC benchmarker under the same modulation and channel conditions. These gains are achieved for free, without increasing the required transmit-duration, transmit-bandwidth, transmit-energy or decoding complexity. We consider these gains to be significant, since they are achieved within the extreme vicinity of the DCMC capacity, namely within 1.4 dB. This is achieved by mitigating the capacity loss inherent in SSCC, which limits the performance of other JSCC schemes. Since these gains are associated with the improvements offered by the REGEC code over the benchmarker SSCC and JSCC codes, similar gains may be expected when combining the REGEC code with any other channel codes, modulation schemes or channels.
Our future work will consider the integration of the proposed REGEC code into a practical video codec, such as H.264 or H.265. This may be achieved by replacing all EG codewords with REG codewords. Since the iterative decoding process performs best when the interleaver length is long, it may be necessary to modify the video codec in order to keep all REG-encoded bits together within each frame. The resultant bit vector y can then be trellis encoded, interleaved and URC encoded, as shown in Figure 5(a).
Following modulation, transmission and demodulation, iterative decoding may be employed to recover the REG-encoded LLRs ỹ^p. These LLRs may then be converted into symbols, or the soft information of these LLRs may be exploited to aid error concealment.
APPENDIX DERIVATION OF (11)
The method of [3, Appendix] may be used to derive the transition probabilities P(m, m′) of (11)
where l_1 is the average length of the unary codeword Unary(x_i), as described in Section III. Dividing the result for all cases by the expected number of transitions in the path m, namely al, yields the transition probability given in (11).
