The Effect of Coupling Memory and Block Length on Spatially Coupled
  Serially Concatenated Codes by Mahdavi, Mojtaba et al.
Mojtaba Mahdavi, Muhammad Umar Farooq, Liang Liu, Ove Edfors, Viktor Öwall, and Michael Lentmaier
Department of Electrical and Information Technology (EIT), Lund University, Lund, Sweden
Emails: {mojtaba.mahdavi, muhammad.umar farooq, liang.liu, ove.edfors, viktor.owall, michael.lentmaier}@eit.lth.se
Abstract—Spatially coupled serially concatenated codes (SC-
SCCs) are a class of spatially coupled turbo-like codes, which
have a close-to-capacity performance and low error floor. In
this paper we investigate the impact of coupling memory, block
length, decoding window size, and number of iterations on
the performance, complexity, and latency of SC-SCCs. Several
design tradeoffs are presented to see the relation between these
parameters in a wide range. Also, our analysis provides design
guidelines for SC-SCCs in different scenarios to make the code
design independent of block length. As a result, block length and
coupling memory can be exchanged flexibly without changing the
latency and complexity. Also, we observe that the performance of
SC-SCCs is improved with respect to the uncoupled ensembles
for a fixed latency and complexity.
I. INTRODUCTION
It has been shown that spatial coupling improves the de-
coding threshold of low-density parity-check (LDPC) codes
[1]. More specifically, the threshold of an iterative belief
propagation (BP) decoder saturates to the threshold of the
optimal maximum-a-posteriori (MAP) decoder [2], [3]. The
concept of spatial coupling has been extended to turbo-like
codes in [4], where it has been proven that threshold saturation
also occurs for this class of codes. The decoding of spatially
coupled codes can be done efficiently using window decoding
[5]–[7]. An information-coupled version of the turbo codes
from the LTE standard was proposed in [8]. On the other hand,
it has been demonstrated in [9] that spatial coupling leads to
a new tradeoff between error floor and waterfall performance
of turbo-like codes. As a result, with spatial coupling, serially
concatenated codes (SCCs) achieve better performance than
parallel concatenated codes (PCCs) in both the waterfall and
the error floor regions [9]. For this reason, spatially coupled
serially concatenated codes (SC-SCCs) are selected as the
focus of this paper.
From the analysis in [4] it can be seen that the decod-
ing thresholds can be improved by increasing the coupling
memory. But since the required size of the decoding window
increases with the coupling memory, this option may not look
appealing from a latency perspective. In this paper, we take
another approach and propose some design criteria that allow
us to increase the coupling memory without increasing latency
or complexity and without any performance loss. To this end,
we investigate the effect of block size, coupling memory,
window size, and number of iterations on the performance,
complexity, and latency of SC-SCCs. Then, based on this
analysis, we introduce a setup which allows us to fix the
latency and complexity and trade between block length and
coupling memory. This enables a fair comparison between
different coding scenarios in terms of performance, complexity,
and latency. As our approach lets us flexibly exchange
the block length with the coupling memory, spatial coupling
allows a code designer to choose the strength and performance
independently for a given block length. A performance loss for
small block lengths is avoided in our scheme with continuous
encoding and decoding.
The simulations were performed on resources provided by the Swedish
National Infrastructure for Computing (SNIC) at the center for scientific and
technical computing at Lund University (LUNARC).
Fig. 1. (a) SCC encoder structure, (b) Compact graph representation of SCC.
II. BACKGROUND
A. SCC Encoder
The structure of an SCC component encoder is depicted
in Fig. 1(a). It consists of two recursive systematic
convolutional (RSC) encoders concatenated serially
through an interleaver. The left and right RSC encoders are
called the outer and inner encoders, with trellis lengths of K
and 2K, respectively. As shown in Fig. 1(a), the outer encoder
receives the information sequence, u, of length K bits and
produces the K-bit parity sequence $p^O$. Then, the sequences
u and $p^O$ are multiplexed and permuted to generate the 2K-bit
sequence $q^O$. This sequence is encoded by the inner encoder
to produce the 2K-bit parity sequence $p^I$. Finally, the output
of the SCC encoder is $v = (u, p^O, p^I)$.
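The encoder chain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not specify the component codes here, so a 4-state RSC code with generators (1, 5/7) in octal is assumed, the multiplexing of u and $p^O$ is simplified to a concatenation, and the interleaver is an arbitrary pseudo-random permutation. The names `rsc_parity` and `scc_encode` are hypothetical.

```python
import random

def rsc_parity(bits, state=(0, 0)):
    """Parity bits of a 4-state RSC encoder, generators (1, 5/7) octal.

    The generator polynomials are an assumption made for illustration.
    Returns (parity_bits, final_state).
    """
    s1, s2 = state
    parity = []
    for b in bits:
        a = b ^ s1 ^ s2        # feedback polynomial 7 = 1 + D + D^2
        parity.append(a ^ s2)  # feedforward polynomial 5 = 1 + D^2
        s1, s2 = a, s1         # shift register update
    return parity, (s1, s2)

def scc_encode(u, perm):
    """Encode K information bits u into (u, p_outer, p_inner)."""
    p_outer, _ = rsc_parity(u)          # K parity bits from the outer encoder
    muxed = u + p_outer                 # multiplexing simplified to concatenation
    q_outer = [muxed[i] for i in perm]  # 2K interleaved bits
    p_inner, _ = rsc_parity(q_outer)    # 2K parity bits from the inner encoder
    return u, p_outer, p_inner

K = 8
rng = random.Random(0)
perm = list(range(2 * K))
rng.shuffle(perm)                       # assumed pseudo-random interleaver
u = [rng.randint(0, 1) for _ in range(K)]
u_out, p_outer, p_inner = scc_encode(u, perm)
print(len(u_out), len(p_outer), len(p_inner))  # lengths K, K, 2K
```

The three output lengths K, K, and 2K match the sequences u, $p^O$, and $p^I$ in the text; puncturing is not modeled in this sketch.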
B. SC-SCC Encoder
We build the SC-SCC encoder by coupling m + 1
instances of the SCC component encoder, as shown in Fig. 2,
where m is the coupling memory. Consider the encoding
process at time instant t to see how the inner and outer
encoders are coupled. As shown in Fig. 2, the outer
encoder receives the information bits, $u_t$, and generates the
parity sequence $p^O_t$. Then, the pair $(u_t, p^O_t)$ is permuted
using Interleaver 1 to create a 2K-bit sequence, $q^O_t$. This
sequence is divided into m + 1 parts of equal size, which are
named $q^O_{t,0}, q^O_{t,1}, \ldots, q^O_{t,m}$. This implies that m + 1 should
be smaller than 2K and also divide 2K. The first subsequence,
$q^O_{t,0}$, is used to generate the input of the current inner encoder
at time t, and the other ones, $q^O_{t,1}, \ldots, q^O_{t,m}$, are used in the
subsequent inner encoders at times t + 1, ..., t + m, respectively. Thus,
at time t, the sequence $(q^O_{t,0}, q^O_{t-1,1}, \ldots, q^O_{t-m,m})$, which is
generated by the current and the previous m outer encoders, is
permuted by Interleaver 2 and sent to the inner encoder to
produce the parity sequence $p^I_t$. Finally, the output of the
SC-SCC encoder at time t is $v_t = (u_t, p^O_t, p^I_t)$. In this paper, a
code rate of 1/3 is considered. Therefore, the output of the
inner encoder, $p^I_t$, is punctured such that only half of it, i.e.
K bits, is transmitted.
arXiv:2006.13396v1 [cs.IT] 24 Jun 2020
Fig. 2. Structure of the SC-SCC encoder with coupling memory m, which is built by spatially coupling m + 1 instances of the SCC component encoder.
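The coupling step described above can be illustrated with a small data-flow sketch. It only shows how the parts of $q^O_t$ are distributed over the inner encoder inputs; Interleaver 2 and the encoders themselves are omitted. Filling the parts that would come from blocks before t = 0 with zeros is an assumption, and the function names are hypothetical.

```python
def split_equal(seq, parts):
    """Split seq into `parts` contiguous subsequences of equal size."""
    if len(seq) % parts != 0:
        raise ValueError("m + 1 must divide 2K")
    size = len(seq) // parts
    return [seq[i * size:(i + 1) * size] for i in range(parts)]

def inner_inputs(q_blocks, m):
    """Assemble the inner encoder input (before Interleaver 2) for each t.

    q_blocks[t] is the 2K-symbol sequence q^O_t. Parts that would come
    from before t = 0 are filled with zeros (an assumed initialization).
    """
    part_len = len(q_blocks[0]) // (m + 1)
    parts = [split_equal(q, m + 1) for q in q_blocks]
    out = []
    for t in range(len(q_blocks)):
        seq = []
        for j in range(m + 1):
            src = t - j  # part j of the block encoded j time instants earlier
            seq += parts[src][j] if src >= 0 else [0] * part_len
        out.append(seq)
    return out

# Labels instead of bits make the data flow visible.
K, m, T = 2, 1, 3
q_blocks = [[f"q{t}[{i}]" for i in range(2 * K)] for t in range(T)]
for t, seq in enumerate(inner_inputs(q_blocks, m)):
    print(t, seq)
```

For m = 1, the input at time t = 1 consists of the first half of $q^O_1$ followed by the second half of $q^O_0$, which mirrors the sequence $(q^O_{t,0}, q^O_{t-1,1})$ in the text.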
Similar to protograph-based LDPC codes, we can describe
turbo-like codes by compact graphs [9]. The compact graph
representation of SCCs is shown in Fig. 1(b), where the input
and parity sequences are shown by black circles and referred
to as variable nodes. Also, the outer and inner code trellises
are represented by squares, which are referred to as factor
nodes and labeled by the corresponding trellis lengths.
The corresponding compact graph representation of an SC-
SCC with coupling memory m = 1 is shown in Fig. 3.
The double circles are referred to as state variable nodes,
which transfer the encoder state at time t to the encoder at
time t + 1. As a result, our scheme performs the encoding
continuously without termination. The reason behind this
strategy is described in Section III-B. In a similar way, the
compact graph of an SC-SCC with larger m can be obtained.
C. SC-SCC Window Decoder
Analogously to LDPC codes, the nodes in an iterative mes-
sage passing decoder exchange log-likelihood ratios (LLRs)
along the edges in the graph (see Fig. 3). The inner and
outer trellises are decoded using the Bahl-Cocke-Jelinek-Raviv
(BCJR) algorithm. Let us consider a decoding window of
length W blocks, which starts at time t and ends at t+W −1,
as shown by a solid rectangle in Fig. 3. Among these blocks,
the first one to be decoded is referred to as the target block,
which is located at the leftmost position of the window.
For all blocks with index $t' = t, \ldots, t+W-1$, first the inner
and then the outer decoder perform $I_W$ decoding iterations as
follows. In each iteration, the inner decoder receives three
sequences: the channel LLR values $L_{ch}(q^O_{t'})$ and $L_{ch}(p^I_{t'})$, and
the a-priori LLR values, $L_a(p^I_{t'})$, which are obtained based
on the previous extrinsic LLRs of the corresponding outer
decoder, $L_e(p^O_{t'})$. The inner decoder produces the extrinsic
LLRs, $L_e(p^I_{t'})$, and sends them back to the outer decoder.
Then, similarly, the outer decoder receives the channel LLR
values $L_{ch}(u_{t'})$ and $L_{ch}(p^O_{t'})$, and the a-priori LLRs, $L_a(p^O_{t'})$,
which are computed based on the previous extrinsic LLRs of
the corresponding inner decoder, $L_e(p^I_{t'})$. The outer decoder
produces the extrinsic LLRs, $L_e(p^O_{t'})$, and sends them back
to the inner decoder. After $I_W$ iterations, the decoding of the
target block, $u_t$, is finished and the window is moved by one
block. The same process is repeated for the next window, i.e. the
dashed rectangle in Fig. 3, to decode the target block $u_{t+1}$.
Fig. 3. Compact graph representation of an infinite chain of SC-SCC for
coupling memory m = 1. Two decoding windows with W = 4 blocks are shown.
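The sliding-window schedule described above can be sketched without any actual BCJR processing, just to show which blocks are touched at each window position. The function names are hypothetical, and truncating the window at the end of a finite stream is an assumption, since the text considers an infinite chain.

```python
from collections import Counter

def window_schedule(num_blocks, W, I_W):
    """Yield (target, iteration, blocks_in_window) for a sliding window decoder."""
    for t in range(num_blocks):
        # Window covers blocks t .. t+W-1, truncated at the stream end (assumption).
        window = list(range(t, min(t + W, num_blocks)))
        for it in range(I_W):
            yield t, it, window

def updates_per_block(num_blocks, W, I_W):
    """Count how many times each block is processed over the whole stream."""
    counts = Counter()
    for _, _, window in window_schedule(num_blocks, W, I_W):
        counts.update(window)
    return counts

counts = updates_per_block(num_blocks=12, W=4, I_W=20)
print(counts[5])  # an interior block lies in W window positions, I_W iterations each
```

Because consecutive windows overlap, an interior block is visited in W window positions, so it is processed W times the number of iterations per window; blocks at the very start of the stream are visited fewer times.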
Definitions: The strength of spatially coupled codes depends
on the constraint length, which is defined as
$$C = K \cdot (m + 1), \quad m < W, \; m < 2K. \tag{1}$$
Also, the structural latency [10], [11] is given by
$$L = W \cdot K \quad \text{(bits)}, \tag{2}$$
which for simplicity we call latency in the rest of the paper.
III. DESIGN GUIDELINES FOR FLEXIBLE CHOICE OF
BLOCK SIZE AND COUPLING MEMORY
A. Using Higher Coupling Memory in a Fixed Latency
A window decoder will perform very poorly if the window
size, W , is smaller than m + 1. Thus, if a higher coupling
memory is needed, W should be increased, which considerably
increases the latency as stated in (2). To solve this problem,
we propose to reduce the block length, K, and increase
the number of blocks per window, W, simultaneously. As
a result, a higher coupling memory can be used without
changing the latency.
Fig. 4 shows an example of an SC-SCC scheme with a
latency of L = 4096 bits in two cases. In Fig. 4(a), four
blocks of K = 1024 bits per window are employed, which
implies that the coupling memory cannot be larger than m = 3.
On the other hand, the same latency is achieved in Fig. 4(b)
by reducing the block length to K = 512 bits and doubling
the window size, so that the coupling memory can be increased
up to m = 7. Thus, depending on the block length, different
window sizes should be considered to keep the latency fixed and
relax the limitation on the coupling memory. In Section IV-A,
we will show that in a fixed-latency scenario, a higher coupling
memory results in better performance than a smaller
one. However, employing small blocks and a large coupling
memory poses some challenges, which are addressed in
our scheme as follows.
Fig. 4. Two fixed-latency scenarios with different block length, K, and
window size, W. (a) K = 1024, W = 4 and (b) K = 512, W = 8.
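The arithmetic behind the two cases of Fig. 4 can be reproduced with a short sketch. It uses L = W · K from (2) and the requirement stated above that the window must contain at least m + 1 blocks; the function name and the dictionary layout are illustrative choices.

```python
def fixed_latency_options(L, window_sizes):
    """For latency L = W * K, list K, W and the largest m with m + 1 <= W."""
    options = []
    for W in window_sizes:
        if L % W:
            continue  # skip window sizes for which K would not be an integer
        options.append({"K": L // W, "W": W, "m_max": W - 1})
    return options

for row in fixed_latency_options(4096, [4, 8]):
    print(row)
```

For L = 4096 this gives K = 1024 with m up to 3 for W = 4, and K = 512 with m up to 7 for W = 8, in line with the two scenarios in Fig. 4.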
B. Continuous Encoding
The classical way of encoding SC-SCCs is to terminate
the encoder after each block [9], i.e. the encoder starts and ends in
the zero state. The drawback of such schemes is a significant
rate loss for small block lengths, K. One of the contributions
of this paper is to perform continuous encoding without
termination after each block to avoid the rate loss, especially
for small K. For this purpose, after encoding the block
at time t, the encoder state is passed to the encoder of the
block at time t + 1. Thus, the last state of the encoder at time
t is used as the starting state of the encoder at time t + 1. To
represent this concept, we have added the state variable nodes
to the SC-SCC graph, as shown by double circles in Fig. 3.
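The state hand-over between consecutive blocks can be illustrated with a single RSC encoder. The sketch below assumes a 4-state RSC code with generators (1, 5/7) in octal, which is not specified in the text, and checks that block-by-block encoding with state passing produces the same parity bits as encoding one long sequence, so no termination bits are needed between blocks. The function names are hypothetical.

```python
def rsc_parity(bits, state=(0, 0)):
    """Parity bits of an assumed 4-state RSC encoder (1, 5/7 octal).

    Returns (parity_bits, final_state) so the state can be handed over.
    """
    s1, s2 = state
    parity = []
    for b in bits:
        a = b ^ s1 ^ s2        # feedback 1 + D + D^2
        parity.append(a ^ s2)  # feedforward 1 + D^2
        s1, s2 = a, s1
    return parity, (s1, s2)

def encode_stream(blocks):
    """Encode blocks one after another, passing the final state onward."""
    state = (0, 0)
    out = []
    for block in blocks:
        parity, state = rsc_parity(block, state)  # start where the last block ended
        out.append(parity)
    return out

blocks = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 1]]
continuous = encode_stream(blocks)
flat = [b for blk in blocks for b in blk]
one_shot, _ = rsc_parity(flat)
print(sum(continuous, []) == one_shot)  # True: same as one long trellis
```

Seen this way, the blocks form a single long trellis, which is also the view taken by the whole-window decoding in Section III-C.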
C. Performance Improvement of Boundaries
The traditional window decoding algorithm usually works
on a block-wise basis [7], as shown in Fig. 5(a). Thus, at time
instant t, the computation of α and β is done in the forward
and backward recursions for the block $u_t$. Then, the results
at time t are used in the decoding of the next block, $u_{t+1}$. This
method works properly for large block lengths, K. However,
for small block lengths, running the BCJR algorithm on a very
short trellis results in poor performance at the boundaries
between blocks. This is due to the unreliable states at the start
and end of each trellis. Thus, the bits which are close to the
boundaries are only weakly protected. To some extent this
problem can be resolved by doing more iterations, but then the
computational complexity increases significantly.
To address this challenge, we propose to perform the
decoding over the whole window at once. As shown in Fig. 5(b),
α is computed in the forward recursion over the
whole window at once and then β is computed in the backward
recursion. Thus, regardless of the block length, K,
and window size, W, in our scheme the BCJR algorithm is
run once per iteration over the whole window instead of
W times per iteration. As a result, the presented decoding
scheme is independent of the block length and window
size. Also, since the trellis becomes long, the boundary
states are more reliable, which can improve the performance,
especially for small block lengths.
Fig. 5. (a) Block-wise window decoding, (b) Proposed window decoding
scheme. The window size, W, is the same in both cases.
D. Fixed Complexity
Since the trellis length of the inner decoder is twice that
of the outer decoder, we define $2O_D$ and $O_D$ as the complexity of
the inner and outer decoders, respectively. Due to the overlaps between
successive windows, shown in Fig. 3, each block is processed
$W \cdot I_W$ times, where $I_W$ is the number of iterations per window
position. Thus, the computational complexity per bit is
$$O_{\mathrm{bit}} = \frac{W \cdot (3 O_D) \cdot I_W}{K} = \frac{3 O_D}{K} \cdot I_{\mathrm{eff}}, \tag{3}$$
which is proportional to the effective number of iterations
$I_{\mathrm{eff}} = W \cdot I_W$, since $O_D$ is proportional to K. Consequently,
if the same $I_W$ is used for both cases in Fig. 4(a) and (b),
the scenario in Fig. 4(b) will have higher complexity than the
one in Fig. 4(a), which is due to the larger W and the larger amount of
overlap between successive windows.
Here, $I_{\mathrm{eff}}$ specifies how often the BCJR algorithm is run to decode
a certain block. The goal is to adjust $I_W$ such that the
same $I_{\mathrm{eff}}$ is achieved for all scenarios, which results in the
same complexity per bit. This enables us to perform a fair
comparison between different SC-SCC scenarios regardless of
their block length, window size, and latency. For example, to
have the same complexity in both scenarios in Fig. 4, $I_W$
in the second scenario, Fig. 4(b), should be set to
$$I_{W_2} = \frac{W_1 \cdot I_{W_1}}{W_2}, \tag{4}$$
where $W_1$ and $I_{W_1}$ correspond to the case in Fig. 4(a).
It can be seen that fewer iterations per window, $I_W$, are used for
smaller blocks. Also, it is important to point out that from a
complexity perspective, both cases in Fig. 5 are the same and
it does not matter whether one long BCJR or several short ones are run.
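As a quick numerical check of (4), the number of iterations per window can be computed for a target effective number of iterations. The sketch below assumes $I_{\mathrm{eff}} = 80$ and rounds non-integer results up, following the note under Table I; the function name is hypothetical.

```python
import math

def iterations_per_window(I_eff, W):
    """I_W = I_eff / W from (4), rounded up to the next integer."""
    return math.ceil(I_eff / W)

I_eff = 80
for W in (4, 8, 16, 32, 64, 128):
    I_W = iterations_per_window(I_eff, W)
    print(W, I_W, W * I_W)  # window size, iterations per window, resulting W * I_W
```

The resulting values 20, 10, 5, 3, 2, 1 agree with the $I_W$ rows of Table I. Because of the rounding, the product $W \cdot I_W$ is 96 for W = 32 and 128 for W = 64 and W = 128, so for these window sizes the complexity match is only approximate.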
It is worth mentioning that the computational complexity
is not the only comparison metric that should be taken into
account. There are other costs, such as the size of the required memory
and routing, which contribute to the hardware cost. However,
these implementation issues are mainly related to the hardware
architecture, which is outside the scope of this paper.
IV. PERFORMANCE EVALUATION
We have investigated the effect of code properties (e.g.
K, m) and also the decoding parameters (e.g. W, $I_{\mathrm{eff}}$) on
the performance and complexity of SC-SCCs. To this end,
we have defined and used five SC-SCC scenarios as listed
in Table I. In each scenario the latency L, constraint length
C, and complexity are fixed, which are obtained by different
combinations of K, W, m, and $I_W$ in a wide range. In
the simulations, the information sequence is modulated using
binary phase shift keying (BPSK) modulation and transmitted
through the additive white Gaussian noise (AWGN) channel¹.

TABLE I
DIFFERENT SCENARIOS OF SC-SCCS WITH THE SAME LATENCY (L), CONSTRAINT LENGTH (C), AND COMPUTATIONAL COMPLEXITY.

| Scenario | Parameter | | | | | | |
|---|---|---|---|---|---|---|---|
| L‡ = 16384, C† = 8192 | K | 4096 | 2048 | 1024 | 512 | 256 | 128 |
| | W | 4 | 8 | 16 | 32 | 64 | 128 |
| | m | 1 | 3 | 7 | 15 | 31 | 63 |
| | $I_W$∗ | 20 | 10 | 5 | 3 | 2 | 1 |
| L = 8192, C = 4096 | K | 2048 | 1024 | 512 | 256 | 128 | 64 |
| | W | 4 | 8 | 16 | 32 | 64 | 128 |
| | m | 1 | 3 | 7 | 15 | 31 | 63 |
| | $I_W$ | 20 | 10 | 5 | 3 | 2 | 1 |
| L = 4096, C = 2048 | K | 1024 | 512 | 256 | 128 | 64 | 32 |
| | W | 4 | 8 | 16 | 32 | 64 | 128 |
| | m | 1 | 3 | 7 | 15 | 31 | 63 |
| | $I_W$ | 20 | 10 | 5 | 3 | 2 | 1 |
| L = 2048, C = 1024 | K | 512 | 256 | 128 | 64 | 32 | - |
| | W | 4 | 8 | 16 | 32 | 64 | - |
| | m | 1 | 3 | 7 | 15 | 31 | - |
| | $I_W$ | 20 | 10 | 5 | 3 | 2 | - |
| L = 1024, C = 512 | K | 256 | 128 | 64 | 32 | 16 | - |
| | W | 4 | 8 | 16 | 32 | 64 | - |
| | m | 1 | 3 | 7 | 15 | 31 | - |
| | $I_W$ | 20 | 10 | 5 | 3 | 2 | - |

‡ Calculated using (2). † Calculated using (1). ∗ Calculated using (4); non-integer values are rounded up to the next integer. Also, $I_{\mathrm{eff}} = 80$ is used to have the same complexity and to perform $I_W \geq 1$ for all scenarios.
- Not available in this scenario since (1) implies that m < 2K.
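The entries of Table I follow from (1), (2), and (4) once the latency is fixed. The sketch below regenerates the rows for one latency value; it assumes the pattern m = W/2 − 1 that the table entries follow, an effective number of iterations of 80, and rounding up of non-integer $I_W$. The function name and the dictionary layout are illustrative.

```python
import math

def table_rows(L, I_eff=80, window_sizes=(4, 8, 16, 32, 64, 128)):
    """List the (K, W, m, C, I_W) combinations for a fixed latency L."""
    rows = []
    for W in window_sizes:
        K = L // W          # from L = W * K, see (2)
        m = W // 2 - 1      # assumed pattern of the table entries
        if L % W or m >= 2 * K:
            continue        # K not an integer, or m < 2K from (1) is violated
        rows.append({
            "K": K,
            "W": W,
            "m": m,
            "C": K * (m + 1),                 # constraint length, see (1)
            "I_W": math.ceil(I_eff / W),      # see (4), rounded up
        })
    return rows

for row in table_rows(2048):
    print(row)
```

For L = 2048 the combination with W = 128 is dropped because it would require K = 16 and m = 63, which violates m < 2K; this corresponds to the entries marked as not available in Table I. In all remaining rows the constraint length equals L/2.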
A. Effect of Coupling Memory on the Performance
We have investigated the effect of the coupling memory, m, on
the performance of SC-SCCs. The goal is to fix the window
size, W, and block length, K, and then find the value of the
coupling memory, m, which leads to the best performance.
As an example, this is investigated for three cases:
{L = 1024, K = 32, W = 32}, {L = 8192, K = 512,
W = 16}, and {L = 8192, K = 64, W = 128}, and the
corresponding results are depicted in Fig. 6(a)-(c). The results
show that increasing the coupling memory up to m = W/2 − 1
improves the performance considerably (by 0.2 dB to 1.1
dB). Also, the error floor moves to lower BERs and the
waterfall performance becomes better. However, if m > W/2 − 1 the
performance degrades, as shown by the dotted curves in
Fig. 6. This is because in such a case not
even one constraint length, C, is visible inside the window (see Fig. 3),
and therefore the decoder cannot fully
exploit the code. Thus, for a given K and W, the coupling
memory of m = W/2 − 1 results in the best performance
in such a setup. It is worth mentioning that this performance
improvement is achieved without compromising the latency
and complexity.
¹ We have picked a set of pseudo-random interleavers and fixed them for
all the code sequences throughout the simulations.
Fig. 6. The effect of coupling memory, m, on the performance in different
scenarios: (a) K = 32, W = 32, (b) K = 512, W = 16, and (c) K = 64, W = 128.
The latency is (a) L = 1024, (b) L = 8192, and (c) L = 8192
bits. The complexity is the same for all scenarios ($I_{\mathrm{eff}} = 80$ for all cases).
B. Shorter Block Length with Higher Coupling Memory
As mentioned in Section IV-A, the coupling memory of
m = W/2 − 1 leads to the best performance for the given
K and W . This choice of coupling memory results in the
constraint length of C = K · (m+ 1) = K ·W/2, which can
be achieved by either a small K and large m or a large K
and small m while the latency remains fixed (L = 2C). For
example, {K = 256,W = 4,m = 1} and {K = 64,W =
16,m = 7} achieve the same latency of L = 1024 bits and
constraint length of C = 512 as shown in Table I.
Scenarios with the same constraint length are expected to
achieve the same performance. We have investigated
this for the five scenarios in Table I, and the simulation
results are shown in Fig. 7(a)-(e). In each scenario the latency
and constraint length are fixed, and the complexity is the
same for all of them by setting $I_{\mathrm{eff}} = 80$. The simulation
results show that, for a given latency and constraint length,
selecting a small block length, K, and a large coupling memory,
m, can lead to better performance than a large
block length and a small coupling memory. This performance
improvement can be seen in both the waterfall and error floor
regions in Fig. 7.
As a result, our analysis reveals the flexibility of SC-SCCs:
for a given latency and constraint length, it is possible
to make the block length smaller and use a higher coupling
memory while obtaining the same or even better performance
compared to a larger K. It is worth mentioning that, for
very small K, the performance degrades and an error floor
appears at high BERs, as shown by the dashed curves
in Fig. 7(a)-(e). This is mainly because, in our
scheme, we have employed independent random interleavers
to show how the performance changes for different block
lengths. For very small K, however, short random
interleavers are not efficient. In such cases, the interleavers
should not be designed independently². Moreover, since we
want to have the same complexity for different block lengths,
$I_W$ becomes very low (e.g. $I_W = 1, 2, 3$ in Table I) for
very small K, which degrades the performance. Note that
whether a block length is small or large is relative to the latency. For
example, K = 128 is considered a large K for
L = 1024, while it is a small K for L = 8192.
² We plan to investigate a joint interleaver design for small block lengths,
K, in our future research.
Fig. 7. BER performance of the scenarios in Table I, where the latency and constraint length are fixed to (a) L = 1024, C = 512, (b) L = 2048, C = 1024, (c)
L = 4096, C = 2048, (d) L = 8192, C = 4096, and (e) L = 16384, C = 8192. The same complexity is considered for all scenarios by choosing $I_{\mathrm{eff}} = 80$.
C. Performance Comparison with Uncoupled Codes
Fig. 8 shows the performance comparison between the
presented SC-SCC scheme and the uncoupled ensembles,
SCC, for different latencies, L, and block lengths, K. To
have a fair comparison, the same complexity is considered for
both SC-SCC and SCC regardless of L and K, as described in
Section III-D. It can be seen that spatial coupling significantly
improves the performance of the SCC and brings it much
closer to capacity. For the same interleaver
size (i.e. fixed block length, K), the SC-SCC achieves around
1 dB better performance than the corresponding SCC scheme
with the same K. Also, for equal latency, the SC-SCC
scheme still achieves around 0.5 dB better performance
than the SCC at a BER of $10^{-4}$, where the latency of an SCC
is L = K. Moreover, Fig. 8 shows that the performance
improvement resulting from increasing the latency is more
pronounced in SC-SCCs than in SCCs. More specifically, by
increasing the latency from L = 1024 to L = 32768 bits,
a performance improvement of 0.7 dB and 1.1 dB is achieved in
SCC and SC-SCC, respectively.
It is worth mentioning that, even with lower latency,
the SC-SCC can achieve better performance than the
SCC scheme. For example, as shown in Fig. 8, the SC-SCC
with L = 8192 has better performance than the SCC
with L = 16384 or L = 32768. This means that by just increasing
the latency and block length the SCC cannot achieve better
performance than the SC-SCC, which is due to the threshold
improvement resulting from spatial coupling. Moreover, the
asymptotic decoding thresholds of the SCC and SC-SCC
ensembles for the AWGN channel are depicted using vertical
lines in Fig. 8. These values are computed using the erasure
channel prediction method introduced in [12].
V. DESIGN TRADEOFFS
So far we assumed fixed latency and complexity in the
evaluations. Now we are interested to see the performance
gain of SC-SCCs if we increase the latency or the complexity.
Fig. 8. Performance comparison between the proposed SC-SCC and SCC
for different block lengths, K, and latencies, L. The same complexity is
considered for all cases by choosing $I_{\mathrm{eff}} = 80$. Legend: SCC with
K = 1024, 2048, 4096, 8192, 16384, 32768; SC-SCC with (L, K) = (1024, 256),
(2048, 512), (4096, 1024), (8192, 2048), (16384, 4096), (32768, 8192);
decoding thresholds of both ensembles.
A. Performance-Latency Tradeoff
As mentioned in Section IV-A, for a given latency, L, and
complexity, the performance can be improved by increasing
the constraint length, C, i.e., by a larger m or K. Now, we want
to see whether, for a given constraint length and complexity, it
is possible to improve the performance by increasing the
latency. More specifically, for fixed K and
m, how does the performance change if we make the window
size, W, larger? We have investigated this question and the
results are shown in Fig. 9. For a given constraint length
and complexity, if we only change the window size (e.g.
doubling W), we gain little in performance
while the latency increases (e.g. doubles). So, if the
targeted application can tolerate higher latencies, it is better
to also increase the coupling memory, m, to make the
code stronger rather than just increasing the window size,
W. This strategy makes effective use of a
given latency to achieve better performance. It is worth
mentioning that a larger window size enables us to employ
a higher coupling memory. This is due to the limitation on
the coupling memory, i.e. m ≤ W/2 − 1, as described in
Section IV-A.
Fig. 10 shows the tradeoff between latency and performance
for the scenarios in Table I. In each scenario the latency,
L, and constraint length, C, are fixed and the complexity is
the same for all scenarios. In this figure, the x-axis shows the
required $E_b/N_0$ to achieve a BER of $10^{-5}$ in all scenarios.
Markers toward the lower left corner of Fig. 10
correspond to the scenarios with a low latency, L, and good
performance. Thus, by considering this tradeoff, proper
values of W, K, and m can be obtained.
Fig. 9. Simulation results to investigate the effect of window size, W, on the
performance. The block length, K, is (a) 128 bits, (b) 256 bits, and (c) 2048
bits. The computational complexity is the same for all scenarios ($I_{\mathrm{eff}} = 80$).
Fig. 10. The latency-performance tradeoff for the scenarios in Table I. The
required $E_b/N_0$ to achieve a BER of $10^{-5}$ is shown on the x-axis. The listed values
of K in the legend correspond to the markers from left to right. The
same computational complexity is considered for all scenarios ($I_{\mathrm{eff}} = 80$).
Legend: K = [1024 2048 4096 256], C = 8192; K = [512 1024 2048 128], C = 4096;
K = [256 512 1024 64], C = 2048; K = [128 256 512 32], C = 1024;
K = [64 128 32 256], C = 512.
B. Performance-Complexity Tradeoff
It is worthwhile to examine the effect of the number of iterations on
the performance: how much does the performance improve if we
spend more complexity? We have considered different
effective numbers of iterations, $I_{\mathrm{eff}}$, for the SC-SCC scheme
with latencies of L = 4096 and L = 16384 bits. The
corresponding simulation results are shown in Fig. 11(a) and
(b), respectively. The simulation results show that the performance
improvement resulting from a higher effective number of
iterations, $I_{\mathrm{eff}}$, is more pronounced in the high-latency
scenarios than in the low-latency scenarios. Also, it can be seen
that by spending more complexity the error floor is lowered
and appears at a much lower BER.
Fig. 11. Simulation results to investigate the effect of number of iterations
on the performance for $I_{\mathrm{eff}} = 4, 8, 12, 20, 40, 80$. The latency equals
(a) L = 4096 bits (K = 1024, W = 4, m = 1) and (b) L = 16384 bits (K = 4096, W = 4, m = 1).
VI. CONCLUSION
We have investigated the effect of coupling memory, block
length, window size, and number of iterations on the performance,
complexity, and latency of SC-SCCs. Our approach
provides the flexibility to exchange the block size with the
coupling memory, which makes the code design independent
of the block length. We have demonstrated how a higher
coupling memory can be used without increasing the latency
and complexity. Moreover, we have shown that SC-SCCs can
achieve better performance than the uncoupled ensembles with
the same latency and complexity.
REFERENCES
[1] M. Lentmaier, A. Sridharan, D. J. Costello, and K. S. Zigangirov,
“Iterative decoding threshold analysis for LDPC convolutional codes,”
IEEE Transactions on Information Theory, vol. 56, no. 10, pp. 5274–
5289, Oct 2010.
[2] S. Kudekar, T. J. Richardson, and R. L. Urbanke, “Threshold saturation
via spatial coupling: Why convolutional LDPC ensembles perform so
well over the BEC,” IEEE Transactions on Information Theory, vol. 57,
no. 2, pp. 803–834, Feb 2011.
[3] S. Kumar, A. J. Young, N. Macris, and H. D. Pfister, “Threshold
saturation for spatially coupled LDPC and LDGM codes on BMS
channels,” IEEE Transactions on Information Theory, vol. 60, no. 12,
pp. 7389–7415, Dec 2014.
[4] S. Moloudi, M. Lentmaier, and A. Graell i Amat, “Spatially coupled
turbo-like codes,” IEEE Transactions on Information Theory, vol. 63,
no. 10, pp. 6199–6215, Oct 2017.
[5] N. U. Hassan, M. Schlüter, and G. P. Fettweis, “Fully parallel window
decoder architecture for spatially-coupled LDPC codes,” in 2016 IEEE
International Conference on Communications (ICC), 2016, pp. 1–6.
[6] O. İşcan and W. Xu, “Window-interleaved turbo codes,” IEEE
Communications Letters, vol. 22, no. 4, pp. 676–679, 2018.
[7] M. Zhu, D. G. M. Mitchell, M. Lentmaier, D. J. Costello, and B. Bai,
“Braided convolutional codes with sliding window decoding,” IEEE
Transactions on Communications, vol. 65, no. 9, pp. 3645–3658, 2017.
[8] L. Yang, Y. Xie, X. Wu, J. Yuan, X. Cheng, and L. Wan, “Partially
information-coupled turbo codes for LTE systems,” IEEE Transactions
on Communications, vol. 66, no. 10, pp. 4381–4392, Oct 2018.
[9] S. Moloudi, M. Lentmaier, and A. Graell i Amat, “Spatially coupled
turbo-like codes: A new trade-off between waterfall and error floor,”
IEEE Transactions on Communications, vol. 67, no. 5, pp. 3114–3123, May 2019.
[10] C. Rachinger, J. B. Huber, and R. R. Müller, “Comparison of
convolutional and block codes for low structural delay,” IEEE Transactions on
Communications, vol. 63, no. 12, pp. 4629–4638, 2015.
[11] C. Rachinger, R. Müller, and J. B. Huber, “Low latency-constrained
high rate coding: LDPC codes vs. convolutional codes,” in 2014 8th
International Symposium on Turbo Codes and Iterative Information
Processing (ISTC), 2014, pp. 218–222.
[12] M. U. Farooq, S. Moloudi, and M. Lentmaier, “Thresholds of braided
convolutional codes on the AWGN channel,” in 2018 IEEE International
Symposium on Information Theory (ISIT), 2018, pp. 1375–1379.
