Shuffled iterative receiver for LDPC-coded MIMO systems by Zhao, Peiyao et al.
Shuffled Iterative Receiver for LDPC-Coded MIMO Systems
Peiyao Zhao¹, Chen Qian¹, Zhaocheng Wang¹, Linglong Dai¹ and Sheng Chen²,³
¹Tsinghua National Laboratory for Information Science and Technology (TNList),
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
²Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, U.K.
³Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia
Abstract—In this paper, we consider the low density parity
check (LDPC) coded multi-input multi-output (MIMO) system
with iterative detection and decoding (IDD). Since the traditional frame-by-frame receiver scheme suffers from a large decoding delay, we propose an efficient scheme with a shuffled structure between the demapper and decoder, which adopts the group vertical shuffled belief propagation (BP) algorithm. The proposed shuffled iterative receiver converges faster and significantly reduces the delay introduced by the IDD process. Simulation results demonstrate that the proposed shuffled iterative receiver achieves a signal-to-noise ratio gain of several tenths of a dB over the existing schemes, while requiring a much lower average number of iterations for the IDD process.
Index Terms—Iterative detection and decoding, MIMO, shuf-
ﬂed iterative receiver, LDPC code, shufﬂed BP algorithm
I. INTRODUCTION
Many receiver schemes have been designed to approach
the channel capacity of multi-input multi-output (MIMO)
systems. In particular, the receivers that adopt an iterative
detection and decoding (IDD) structure [1], [2] are capable
of closely approximating the optimal joint detection and
decoding in an iterative fashion and, therefore, achieving
excellent performance while maintaining tractable complexity.
An IDD receiver consists of a soft detector/demapper and a
soft decoder. The demapper estimates the log likelihood ratios
(LLRs) of the encoded bits, which serve as the input of the
decoder. Then the decoder generates a posteriori LLRs and
feeds back the extrinsic information to the demapper. This
iterative process is repeated until the procedure converges or
the preset maximum number of iterations is reached.
Low-density parity-check (LDPC) codes are a class of linear block codes with near-Shannon-limit performance. They have been widely considered as forward error correction (FEC) codes in the IDD schemes for MIMO systems [3]–[7]. In [3], the decoder exchanges the extrinsic information with the demapper frame by frame, once per $l_c$ decoding-loop iterations. In the process, the check node messages are either reset to zero or not reset after each demapper-decoder iteration, which are referred to as the resetting and non-resetting algorithms, respectively. The non-resetting algorithm with $l_c = 1$ is the traditional frame-by-frame scheme commonly used in LDPC-coded MIMO systems. However, such LDPC-coded MIMO systems suffer from high computational complexity and severe iteration delay.
Shuffled decoding was first proposed in the turbo-decoding field to improve the convergence speed. The scheme proposed in [8] extends shuffled decoding to reduce the delay of the demapper-decoder iteration for bit-interleaved coded modulation with iterative demapping (BICM-ID) in single-input single-output systems. However, the number of demapping units required remains large, as it equals the parallel order of the decoder. This may lead to a prohibitive computational complexity for high-order modulation, which makes the scheme unsuitable for MIMO systems.
In this paper, we develop an efficient IDD scheme with a shuffled structure between the demapper and decoder for LDPC-coded MIMO systems. As usual, the proposed shuffled iterative receiver consists of a soft demapper, a bit-wise interleaver and an LDPC decoder. However, our decoder adopts a semi-parallel structure in which the extrinsic information generated in each decoding cycle is fed back to the demapper as the a priori information immediately, instead of waiting for the whole code frame to be decoded. The bit-wise interleaver is carefully designed to guarantee that the bits fed back by the decoder in each cycle are mapped onto several intact symbol vectors. The number of demapping units required by our shuffled iterative receiver equals the number of these symbol vectors, which is much smaller than that of the scheme proposed in [8]. We also propose a partial feedback of decoded bits, which offers a flexible performance-complexity tradeoff. Based on a well-designed schedule, our scheme enjoys a low iteration delay as well as a relatively low complexity. Simulation results show that the proposed shuffled iterative receiver exhibits gains of several tenths of a dB in the signal-to-noise ratio (SNR) in comparison to the non-iterative scheme and the resetting algorithm given in [3], while requiring a much lower average number of iterations.
II. BACKGROUND
A. System Model
The LDPC-coded MIMO system with $N_t$ transmit antennas and $N_r$ receive antennas is considered, in which the interleaver and de-interleaver are denoted by $\Pi$ and $\Pi^{-1}$, respectively.
33
>W
ŶĐŽĚĞƌ
^ǇŵďŽů
DĂƉƉĞƌ ^ͬW
^ŽĨƚ
ĞŵĂͲ
ƉƉĞƌ
>W
ĞĐŽĚĞƌ
3
^ŽƵ͘ŝƚƐ
^ŝŶŬŝƚƐ

D/

H/H/

D/
Fig. 1: LDPC-coded MIMO system with iterative detection and
decoding.
As shown in Fig. 1, the source bits $u = [u_1\ u_2\ \cdots\ u_K]^T$ are encoded by a rate-$R_c$ LDPC code into $c = [c_1\ c_2\ \cdots\ c_N]^T$, where $K = N \cdot R_c$. The coded bits, after passing through the interleaver, are grouped into vectors of length $K_b = m \cdot N_t$, and each bit vector is mapped onto a symbol vector $s \in \mathbb{C}^{N_t \times 1}$ whose entries are chosen from a complex-valued constellation $\mathcal{A}$, where $|\mathcal{A}| = 2^m$ and $m$ is the number of bits per constellation symbol. The received signal $y$ is given by

$$y = Hs + n, \tag{1}$$

where $H \in \mathbb{C}^{N_r \times N_t}$ is the MIMO channel matrix and $n \in \mathbb{C}^{N_r \times 1}$ denotes a complex-valued additive white Gaussian noise (AWGN) vector with covariance matrix $\sigma^2 I_{N_r}$. We assume a quasi-static Rayleigh flat fading environment, and the entries of $H$ are independent and identically distributed (i.i.d.) complex-valued Gaussian variables with zero mean and a variance of 0.5 per dimension. We further assume that the receiver has perfect knowledge of the channel matrix $H$.
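The channel model of Eq. (1) can be illustrated with a minimal sketch using only the Python standard library. The Gray-labelled 16-QAM mapping, the unit-energy normalisation and the function names `map_bits`, `cgauss` and `transmit` are assumptions made for illustration; the paper does not specify a bit labelling.

```python
import math
import random

# 16-QAM (m = 4) built from Gray-labelled 4-PAM per real dimension.
# This labelling is an assumption of the sketch, not taken from the paper.
PAM4 = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): 1.0, (1, 0): 3.0}
SCALE = 1.0 / math.sqrt(10.0)  # normalises the average symbol energy to 1


def map_bits(bits, nt, m=4):
    """Map Kb = m * nt bits onto a length-nt vector of 16-QAM symbols."""
    assert m == 4 and len(bits) == m * nt
    symbols = []
    for a in range(nt):
        b = bits[m * a:m * (a + 1)]
        symbols.append(SCALE * complex(PAM4[(b[0], b[1])], PAM4[(b[2], b[3])]))
    return symbols


def cgauss(rng, var_per_dim):
    """Draw one complex Gaussian sample with the given variance per real dimension."""
    std = math.sqrt(var_per_dim)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def transmit(bits, nr, nt, sigma2, rng):
    """Return (y, H) for y = H s + n as in Eq. (1)."""
    s = map_bits(bits, nt)
    # Channel entries: zero mean, variance 0.5 per dimension.
    H = [[cgauss(rng, 0.5) for _ in range(nt)] for _ in range(nr)]
    # Covariance sigma2 * I corresponds to variance sigma2 / 2 per real dimension.
    n = [cgauss(rng, sigma2 / 2.0) for _ in range(nr)]
    y = [sum(H[r][c] * s[c] for c in range(nt)) + n[r] for r in range(nr)]
    return y, H
```

For a 2 × 2 system, `transmit(bits, 2, 2, sigma2, random.Random(0))` with `len(bits) == 8` produces one received vector together with its channel realisation.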
The receiver performs IDD as illustrated in Fig. 1. For each demapper-decoder iteration, the demapper calculates the extrinsic information $L^e_1$ based on the channel observation $y$ and the a priori information $L^a_1$ provided by the decoder. Then $L^e_1$ is forwarded to the decoder as the a priori information $L^a_2$ after de-interleaving, based on which the decoder generates the a posteriori LLRs. The extrinsic information $L^e_2$ of the decoder is in turn fed back to the demapper as the a priori information $L^a_1$ after re-interleaving for the next iteration. The iterative operations are repeated until all the checks are satisfied or the pre-determined maximum iteration number is reached.
B. Soft-Input Soft-Output Demapper
Both the optimal maximum likelihood (ML) demapper [1] and the suboptimal K-BEST sphere decoder demapper [9] are considered. The demapper computes the extrinsic information for each coded block of bits based on the received vector $y$ and the a priori information $L^a_1$. Let the a priori information $L^a_1$ and the extrinsic information $L^e_1$ associated with each coded block of bits $b_i$, $1 \le i \le K_b$, be denoted as $\{L^a_{1,1}, L^a_{1,2}, \cdots, L^a_{1,K_b}\}$ and $\{L^e_{1,1}, L^e_{1,2}, \cdots, L^e_{1,K_b}\}$, respectively. The extrinsic information $L^e_{1,i}$ of the $i$-th bit $b_i$, where $1 \le i \le K_b$, associated with the transmit vector $s$ is given by [1]

$$L^e_{1,i} = \log \sum_{s \in B_{i,0}} \exp\left(-\frac{\|y - Hs\|^2}{\sigma^2} + \sum_{j \ne i,\, b_j = 0} L^a_{1,j}\right) - \log \sum_{s \in B_{i,1}} \exp\left(-\frac{\|y - Hs\|^2}{\sigma^2} + \sum_{j \ne i,\, b_j = 0} L^a_{1,j}\right), \tag{2}$$
where $B_{i,0}$ and $B_{i,1}$ denote the sets of the candidate symbol vectors with $b_i = 0$ and $b_i = 1$, respectively. The ML demapper considers all the possible transmit vectors, leading to a computational complexity that increases exponentially with $K_b = m \cdot N_t$. By contrast, the K-BEST demapper generates a small set of candidate vectors by a breadth-first tree search that keeps only the best $K$ candidates at each level, and consequently the complexity is reduced. It should be noted that if either $B_{i,0}$ or $B_{i,1}$ is empty, no information is obtained regarding one of the two hypotheses of this bit. In such a case, the output LLR is clipped to a constant value of magnitude $l_{\mathrm{clip}}$, whose sign is determined by the non-empty set.
For both the ML demapper and the K-BEST demapper, the candidate transmit vectors and the corresponding Euclidean distances $\|y - Hs\|$ are stored for the iterative operation.
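As an illustration of Eq. (2), the following sketch evaluates the extrinsic LLRs by exhaustive enumeration, which corresponds to the ML demapper. To keep the candidate list small it assumes QPSK ($m = 2$) with a hypothetical labelling (`qpsk`); the function names are illustrative and the K-BEST candidate search and LLR clipping are not covered.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def qpsk(b0, b1):
    """Hypothetical unit-energy QPSK labelling: bit 0 -> +, bit 1 -> -."""
    return complex(1 - 2 * b0, 1 - 2 * b1) / math.sqrt(2.0)


def ml_demap(y, H, sigma2, la1):
    """Extrinsic LLRs of Eq. (2) for QPSK, enumerating every candidate vector."""
    nr, nt = len(H), len(H[0])
    kb = 2 * nt
    metrics = []  # (bit vector, -||y - Hs||^2 / sigma2) for each candidate
    for bits in itertools.product((0, 1), repeat=kb):
        s = [qpsk(bits[2 * a], bits[2 * a + 1]) for a in range(nt)]
        dist2 = sum(abs(y[r] - sum(H[r][c] * s[c] for c in range(nt))) ** 2
                    for r in range(nr))
        metrics.append((bits, -dist2 / sigma2))
    le1 = []
    for i in range(kb):
        terms = {0: [], 1: []}
        for bits, metric in metrics:
            # A priori term: sum of La1[j] over j != i with b_j = 0.
            prior = sum(la1[j] for j in range(kb) if j != i and bits[j] == 0)
            terms[bits[i]].append(metric + prior)
        le1.append(logsumexp(terms[0]) - logsumexp(terms[1]))
    return le1
```

With the convention of Eq. (2), a positive output favours $b_i = 0$. Because the a priori term excludes $j = i$, changing `la1[i]` leaves the $i$-th output unchanged, which is what makes the value extrinsic.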
C. LDPC Decoder
A group vertical shuffled belief propagation (BP) algorithm [10] is adopted at the decoder to speed up the convergence of decoding. We group all the bit nodes into $G$ layers uniformly and perform the vertical shuffled BP algorithm layer by layer in each iteration. The decoding process is summarized in Algorithm 1.

Algorithm 1 Group Vertical Shuffled BP Algorithm
For iteration $t$ ($t = 1, 2, \cdots, t_{\max}$) and layer $g$ ($g = 1, 2, \cdots, G$), perform the following operations on each bit node $b$ that belongs to layer $g$.
Horizontal process
$$R^{(g,t)}_{l,b} = 2\tanh^{-1}\left(\frac{M^{(g-1,t)}_{l}}{\tanh\left(Q^{(t-1)}_{l,b}/2\right)}\right), \quad \text{where } l \in \mathcal{M}(b) \tag{3}$$
Vertical process
$$Q^{(t)}_{b} = F^{(t-1)}_{b} + \sum_{l \in \mathcal{M}(b)} R^{(g,t)}_{l,b} \tag{4}$$
$$Q^{(t)}_{l,b} = Q^{(t)}_{b} - R^{(g,t)}_{l,b} \tag{5}$$
Updating process
$$M^{(g,t)}_{l} = \prod_{\substack{b' \in \mathcal{N}(l) \\ G(b') < g}} \tanh\left(\frac{Q^{(t)}_{l,b'}}{2}\right) \prod_{\substack{b' \in \mathcal{N}(l) \\ G(b') \ge g}} \tanh\left(\frac{Q^{(t-1)}_{l,b'}}{2}\right) \tag{6}$$

In Algorithm 1, $R_{l,b}$ denotes the message passed from check node $l$ to bit node $b$ and $Q_{l,b}$ denotes the message passed in the reverse direction, while $F_b$ and $Q_b$ are the a priori and a posteriori LLRs of bit $b$, respectively. The superscript pair $(g, t)$ of a symbol represents the corresponding value at the $g$-th layer and $t$-th iteration. Since the values of $Q_{l,b}$, $Q_b$ and $F_b$ do not change with the layer number $g$, we omit the superscript $g$ on them. Furthermore, $\mathcal{M}(b)$ denotes the set of the check nodes connected to bit node $b$, and $\mathcal{N}(l)$ denotes the set of bit nodes that participate in check $l$, while $M_l$ is an intermediate variable defined in Eq. (6), where $G(b')$ denotes the layer number of bit node $b'$. Notice that in Eq. (6), $G(b') < g$ means that the bit node $b'$ has been decoded in the former layers, and thus we use the updated value $Q^{(t)}_{l,b'}$. The decoding process of a layer is referred to as a decoding cycle. In each decoding cycle, the decoder generates the a posteriori LLRs of $P = N/G$ bits. The initialization, stopping criterion test and output steps remain the same as those of the standard BP algorithm [11].
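The layered schedule of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: instead of storing $M_l$ and dividing by $\tanh(Q_{l,b}/2)$ as in Eq. (3), it multiplies over the other bit nodes of check $l$ directly, which gives the same value whenever the divisor is non-zero and avoids a division by zero. The a priori LLRs are held fixed during decoding, and the names `clamp_atanh` and `group_shuffled_bp` are hypothetical.

```python
import math


def clamp_atanh(x, eps=1e-12):
    """atanh with the argument kept strictly inside (-1, 1)."""
    return math.atanh(max(-1.0 + eps, min(1.0 - eps, x)))


def group_shuffled_bp(H, F, layers, t_max=20):
    """Group vertical shuffled BP on a parity-check matrix H (list of 0/1 rows).

    F      : a priori LLRs, log P(0)/P(1), one per bit node
    layers : list of lists of bit indices, processed in order
    Returns (hard decisions, a posteriori LLRs, iterations used).
    """
    n_checks, n_bits = len(H), len(H[0])
    N = [[b for b in range(n_bits) if H[l][b]] for l in range(n_checks)]
    M = [[l for l in range(n_checks) if H[l][b]] for b in range(n_bits)]
    # Bit-to-check messages, initialised with the a priori LLRs.
    Q = {(l, b): F[b] for l in range(n_checks) for b in N[l]}
    Qb = list(F)
    hard = [0 if q >= 0 else 1 for q in Qb]
    for t in range(1, t_max + 1):
        for layer in layers:
            # Horizontal step for all bit nodes of the layer, using the
            # newest messages left behind by the previous layers.
            R = {}
            for b in layer:
                for l in M[b]:
                    prod = 1.0
                    for b2 in N[l]:
                        if b2 != b:
                            prod *= math.tanh(Q[(l, b2)] / 2.0)
                    R[(l, b)] = 2.0 * clamp_atanh(prod)
            # Vertical step, Eqs. (4) and (5).
            for b in layer:
                Qb[b] = F[b] + sum(R[(l, b)] for l in M[b])
                for l in M[b]:
                    Q[(l, b)] = Qb[b] - R[(l, b)]
        hard = [0 if q >= 0 else 1 for q in Qb]
        # Stopping criterion: all parity checks satisfied.
        if all(sum(hard[b] for b in N[l]) % 2 == 0 for l in range(n_checks)):
            return hard, Qb, t
    return hard, Qb, t_max
```

Because the messages of earlier layers are already refreshed when a later layer is processed, information propagates within a single iteration, which is the source of the faster convergence compared with a flooding schedule.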
The group shuffled decoding is well suited to quasi-cyclic LDPC (QC-LDPC) codes, whose parity-check matrix is composed of circulant matrices and null matrices of the same size $q \times q$. We can simply set $G = q$, so that the $g$-th layer contains the $g$-th bit of each sub-matrix, where $g = 1, 2, \cdots, G$. In this paper, we use a QC-LDPC code as an example, but other kinds of LDPC codes can also be supported.
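The layer assignment described above for QC-LDPC codes can be written down directly. The sketch below uses zero-based indices and a hypothetical helper name `qc_layers`; it only reflects the rule $G = q$ with one bit per block column in each layer.

```python
def qc_layers(n, q):
    """Return G = q layers; layer g holds the g-th bit of every block column."""
    assert n % q == 0
    blocks = n // q  # number of q-column blocks, i.e. P = N / G bits per layer
    return [[j * q + g for j in range(blocks)] for g in range(q)]


# Example: N = 1944 and q = 81 give 81 layers of 24 bits each.
layers = qc_layers(1944, 81)
```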
D. Iterative Operation Between Demapper and Decoder
The traditional MIMO IDD receiver performs the demapper-decoder iteration in a frame-by-frame schedule, which means that the extrinsic information generated by the decoder can only be fed back to the demapper after the entire code frame has been decoded. In such a frame-by-frame schedule, the decoder and demapper work in turn, and each waits for the other to complete its operations on an entire code frame, which leads to a long iteration delay. This delay severely limits the effective throughput of the system. Reducing the IDD delay requires a large number of demapping units, which in turn increases the complexity considerably.
For example, the message passing algorithm proposed in [8] extends the idea of shuffled decoding to exchange information between the demapper and decoder efficiently, utilizing the parallelism of the LDPC decoder and a partial update strategy of the demapper. In each sub-iteration, the LLRs of the bits involved are calculated by the demapper employing the existing a priori information, and then the decoder generates the extrinsic information of these bits, which is fed back to the demapper immediately. This scheme reduces the delay introduced by the demapper-decoder iteration considerably, but the number of demapping units required, which equals the parallel order of the LDPC decoder, is large. Consequently, for a large MIMO system with high-order modulation, the computational complexity of the shuffled decoding scheme proposed in [8] may become prohibitively high.
III. EFFICIENT SHUFFLED ITERATIVE RECEIVER
As discussed in the previous section, a critical problem of the conventional frame-by-frame receiver scheme is the severe iteration delay it induces. The shuffled receiver scheme of [8] may effectively reduce this IDD delay, but at the cost of high complexity. We propose an efficient shuffled iterative receiver which enjoys a low IDD delay and converges faster, while only imposing a relatively low complexity.
A. The Proposed Shufﬂed Iterative Receiver
Consider the group vertical shuffled BP algorithm adopted by the decoder. In the $g$-th decoding cycle, the extrinsic information $L^e_2$ of $P$ bits, denoted as $L^e_{2,g}$, is generated by the decoder employing the a priori information $L^a_2$ of these bits, denoted as $L^a_{2,g}$. Note that $L^a_{2,g}$ will only be used in the $g$-th decoding cycle of the next iteration, and thus the updating of $L^a_{2,g}$ by the demapper does not interfere with the decoding operation of other layers, which means the demapper and decoder can work in parallel with a well-designed schedule. In our proposed scheme, the extrinsic information $L^e_{2,g}$ is fed back to the demapper immediately after it is generated by the decoder. Then the demapper updates the a priori information $L^a_{2,g}$, which will be forwarded to the decoder for the next iteration. At the same time, the decoder moves to the next decoding cycle without waiting for the completion of the demapping operation. The demapper and decoder form a pipeline structure which reduces the iteration delay significantly, compared with the traditional frame-by-frame scheme.
We also propose a partial feedback strategy, which only feeds back the extrinsic information of $P_f$ bits in each decoding cycle, where $P_f \le P$. With a smaller number of bits participating in the feedback, the computational complexity of demapping is reduced. Besides, only the candidate transmit vectors and the corresponding Euclidean distances associated with the bits that participate in the feedback need to be stored, which reduces the RAM resources required. Hence the partial feedback strategy offers a flexible trade-off between performance and complexity. In particular, $P_f = 0$ indicates that no extrinsic information is exchanged between the decoder and demapper, which is equivalent to the non-iterative scheme, while $P_f = P$ means that all the extrinsic information is fed back and, therefore, the hardware complexity required is the highest and the BER performance attainable is the best.
The interleaver and de-interleaver are carefully designed to guarantee that the feedback bits are mapped onto several intact symbol vectors. Thus we have $P_f/K_b \in \mathbb{N}$. This minimizes the number of symbol vectors related to these $P_f$ bits, and consequently it reduces the number of demapping units required, which equals $P_f/K_b$. By contrast, the scheme proposed in [8] requires $P$ demapping units, which is much larger than $P_f/K_b$. Thus our proposed shuffled iterative receiver enjoys a much lower complexity. The interleaving process amounts to reading and writing the extrinsic information at the appropriate addresses and, therefore, it can simply be realized as a look-up table (LUT) that stores the reading/writing addresses for each decoding cycle.
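One simple way to obtain a LUT with the stated property is sketched below. The construction is hypothetical and only meant to show the constraint being enforced: the feedback bits of each decoding cycle are written to consecutive interleaved positions, so that they fill whole symbol vectors. The paper does not describe its interleaver in this detail, and a practical design would likely also permute the bits to preserve interleaving gain. The helper names `build_lut` and `symbol_vectors_touched` are illustrative.

```python
def build_lut(layers, kb, pf):
    """Return lut[g]: (code bit index, interleaved position) pairs for the pf
    feedback bits of decoding cycle g, filling whole symbol vectors."""
    assert pf % kb == 0 and all(pf <= len(layer) for layer in layers)
    lut = []
    position = 0
    for layer in layers:
        lut.append([(bit, position + k) for k, bit in enumerate(layer[:pf])])
        position += pf
    return lut


def symbol_vectors_touched(entries, kb):
    """Indices of the symbol vectors that the given LUT entries fall into."""
    return sorted({pos // kb for _, pos in entries})


# Example: N = 1944, q = 81, Kb = 8 (2 x 2 MIMO, 16-QAM), Pf = P = 24.
layers = [[j * 81 + g for j in range(24)] for g in range(81)]
lut = build_lut(layers, kb=8, pf=24)
```

In this example every cycle touches exactly $P_f/K_b = 3$ symbol vectors, which matches the number of demapping units stated above.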
ϭƐƚĚĞĐŽĚŝŶŐĐǇĐůĞ
;WďŝƚƐͿ
>hd ĞŵĂƉƉŝŶŐ;WĨͬ<ďƐǇŵďŽůƐͿ
ϮŶĚĚĞĐŽĚŝŶŐĐǇĐůĞ
;WďŝƚƐͿ
KŶĞĐǇĐůĞŽĨƐŚƵĨĨůĞĚƐĐŚĞĚƵůĞ
'ͲƚŚĚĞĐŽĚŝŶŐĐǇĐůĞ
;WďŝƚƐͿ
KŶĞŝƚĞƌĂƚŝŽŶŽĨƐŚƵĨĨůĞĚƐĐŚĞĚƵůĞ
ϭƐƚĚĞĐŽĚŝŶŐĐǇĐůĞ
;WďŝƚƐͿ
^ƚĂƌƚŽĨŶĞǆƚŝƚĞƌĂƚŝŽŶ
>hd ĞŵĂƉƉŝŶŐ;WĨͬ<ďƐǇŵďŽůƐͿ >hd
ĞŵĂƉƉŝŶŐ
;WĨͬ<ďƐǇŵďŽůƐͿ
Fig. 2: Schedule of the proposed shufﬂed iterative receiver.
The proposed shuffled iterative receiver is given in Algorithm 2. In each decoding cycle $g$, $P$ bits are decoded. For the $P_f$ bits among these $P$ bits that are to be fed back, the extrinsic information is calculated by the $P_f/K_b$ demapping units, and thereafter the a priori information of these $P_f$ bits is updated. Note that for MIMO systems with a large number of antennas and/or high-order modulation, $K_b$ may become larger than $P$, which means that the bits decoded in one cycle cannot be mapped onto an intact symbol vector. The schedule is therefore modified for such systems: we perform a decoder-demapper iteration once per $l_c$ decoding cycles, and the $l_c \cdot P_f$ decoded bits are mapped onto several intact symbol vectors. Thus the number of demapping units required becomes $l_c \cdot P_f/K_b$. The receiver still enjoys a pipeline structure with low IDD delay.
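The relation between $l_c$, $P_f$ and $K_b$ can be made concrete with a short calculation. The sketch assumes that the smallest $l_c$ for which $l_c \cdot P_f$ is a multiple of $K_b$ is chosen; the paper does not state how $l_c$ is selected, and the function name `feedback_schedule` is hypothetical.

```python
from math import gcd


def feedback_schedule(pf, kb):
    """Return (lc, units): decoding cycles per exchange and demapping units."""
    lc = kb // gcd(pf, kb)   # smallest lc with (lc * pf) % kb == 0
    units = lc * pf // kb    # number of intact symbol vectors per exchange
    return lc, units


# 2 x 2 MIMO with 16-QAM: Kb = 8, Pf = 24 -> exchange every cycle, 3 units.
print(feedback_schedule(24, 8))
# 8 x 8 MIMO with 64-QAM: Kb = 48 > Pf = 24 -> exchange every 2 cycles, 1 unit.
print(feedback_schedule(24, 48))
```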
Algorithm 2 Algorithm of Shuffled Iterative Receiver
For iteration $t$ ($t = 1, 2, \cdots, t_{\max}$) and cycle $g$ ($g = 1, 2, \cdots, G$), denote $n$ ($n = 1, 2, \cdots, P$) as the index of the bits processed in this cycle, $\tilde{n}$ ($\tilde{n} = 1, 2, \cdots, P_f$) as the index of the bits to be fed back to the demapper, and $\hat{n}$ ($\hat{n} = \Pi(1), \Pi(2), \cdots, \Pi(P_f)$) as the index of the interleaved feedback bits associated with index $\tilde{n}$. The interleaved bits are mapped onto symbol vector $k$ ($k = 1, 2, \cdots, P_f/K_b$).
Decoding Process
1. Calculate the a posteriori LLRs of all the bits $Q^{(t)}_n$ using Algorithm 1.
2. Calculate the extrinsic information of the bits to be fed back as
$$L^e_2(\tilde{n}) = Q^{(t)}_{\tilde{n}} - F^{(0)}_{\tilde{n}} \tag{7}$$
Interleaving Process
$$L^a_1(\hat{n}) = L^e_2(\tilde{n}) \tag{8}$$
Demapping Process
Calculate the extrinsic information $L^e_1(\hat{n})$ by demapping symbol vector $k$ using Eq. (2).
De-interleaving Process
Update the a priori information $F^{(t)}_{\tilde{n}}$ according to
$$F^{(t)}_{\tilde{n}} = L^e_1(\hat{n}) \tag{9}$$
The a priori information of the bits that do not participate in the iterative process remains unchanged.
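The data flow of Eqs. (7) to (9) for a single feedback exchange can be sketched as follows. The demapper is passed in as a callable standing in for Eq. (2), and the LUT entries are (code bit index, interleaved position) pairs; both conventions, as well as the name `feedback_step`, are assumptions of this sketch rather than details given in the paper.

```python
def feedback_step(Q, F0, F, entries, demap_vector, kb):
    """One decoder-to-demapper exchange for the bits listed in `entries`.

    Q, F0, F     : a posteriori, initial and current a priori LLRs, indexed
                   by code bit (lists or dicts)
    entries      : (code bit index, interleaved position) pairs that cover
                   whole symbol vectors
    demap_vector : callable (vector index, list of kb a priori LLRs) ->
                   list of kb extrinsic LLRs, standing in for Eq. (2)
    """
    la1 = {}
    for bit, pos in entries:
        le2 = Q[bit] - F0[bit]            # Eq. (7)
        la1[pos] = le2                    # Eq. (8): interleaving via the LUT
    le1 = {}
    for k in sorted({pos // kb for pos in la1}):
        prior = [la1[k * kb + i] for i in range(kb)]
        out = demap_vector(k, prior)      # demapping of symbol vector k
        for i in range(kb):
            le1[k * kb + i] = out[i]
    for bit, pos in entries:
        F[bit] = le1[pos]                 # Eq. (9): de-interleaving
    return F
```

Bits that do not appear in `entries` keep their a priori values, which corresponds to the partial feedback case $P_f < P$.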
B. Analysis of The Proposed Scheme
The proposed shuffled iterative receiver has several advantages. First, the delay induced by the IDD process is greatly reduced compared with the traditional frame-by-frame scheme. Second, the number of demapping units required is much smaller than that of the existing shuffled receiver given in [8], leading to a low complexity. Furthermore, the proposed partial feedback strategy offers a flexible trade-off between performance and complexity.
Fig. 2 illustrates the schedule of our proposed shuffled iterative receiver, where it can be observed that this shuffled iterative receiver employs a parallel schedule, namely, the decoder and demapper form a pipeline structure and work simultaneously. As long as the sum of the clock cycles required for the LUT and demapping operations is no larger than that of a decoding cycle, the decoder never has to idle while waiting for the demapping operation to complete.
Let us take the QC-LDPC code in IEEE 802.11n with the code length $N = 1944$, the code rate $R_c = 2/3$ and the sub-matrix size $q = 81$ as an example. The decoding process consists of $G = 81$ cycles, and in each cycle the $P = N/G = 24$ bits of a layer are decoded. For simplicity, consider that the extrinsic information of all the $P_f = P$ bits is fed back. Further assume that a decoding cycle occupies $T_c$ clock cycles, a demapping unit which handles a symbol vector needs $T_d$ clock cycles, and the LUT operation needs $\delta$ clock cycles. As can be inferred from Fig. 2, for the proposed shuffled iterative receiver, a total of $81T_c + T_d + \delta$ clock cycles are required for an iteration, since only the LUT and demapping operations of the last decoding cycle are not hidden behind decoding. By contrast, for the traditional frame-by-frame scheme, if we use the same number of demapping units as the shuffled one, a decoder-demapper iteration requires $81T_c + 81T_d + \Delta$ clock cycles, where $\Delta$ is the delay of the interleaver, which is typically very long. It can be seen that our shuffled iterative receiver significantly reduces the delay induced by the IDD process, compared with the traditional frame-by-frame scheme. Even compared with the non-iterative scheme, for which $81T_c$ clock cycles are needed for one iteration, our proposed shuffled scheme is competitive in terms of processing delay.
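The delay expressions of this example are easy to evaluate numerically. In the sketch below the clock-cycle values are purely hypothetical placeholders chosen for illustration, since the paper gives no numbers for $T_c$, $T_d$, $\delta$ or $\Delta$; the formulas mirror the case $P_f = P$ with the same number of demapping units in both schemes.

```python
def shuffled_delay(g, tc, td, delta):
    """Clock cycles per iteration of the shuffled schedule: G*Tc + Td + delta."""
    return g * tc + td + delta


def frame_by_frame_delay(g, tc, td, big_delta):
    """Clock cycles per iteration of the frame-by-frame schedule with the same
    number of demapping units: G*Tc + G*Td + Delta."""
    return g * tc + g * td + big_delta


def demapper_keeps_up(tc, td, delta):
    """True if the LUT and demapping operations fit inside one decoding cycle."""
    return td + delta <= tc


# Hypothetical timing values, for illustration only.
G, Tc, Td, delta, Delta = 81, 10, 8, 2, 100
print(shuffled_delay(G, Tc, Td, delta))        # 81*10 + 8 + 2
print(frame_by_frame_delay(G, Tc, Td, Delta))  # 81*10 + 81*8 + 100
print(demapper_keeps_up(Tc, Td, delta))
```

Under these placeholder values the shuffled schedule needs 820 clock cycles per iteration against 1558 for the frame-by-frame schedule, and only 10 more than the 810 cycles of pure decoding.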
IV. SIMULATION RESULTS
We now present the simulation results to compare the proposed shuffled iterative receiver with the non-iterative scheme and the resetting algorithm given in [3]. The QC-LDPC code in IEEE 802.11n with code length 1944 and code rate 2/3 was employed. A quasi-static Rayleigh flat fading MIMO channel was assumed. The SNR was defined as $\mathrm{SNR} = E_s/\sigma^2$, where $E_s$ denotes the average symbol energy.

[Figure: BER versus SNR (dB), 2x2 MIMO, 16QAM, ML detection. Curves: shuffled iterative receiver (max_iter = 20 and 30), non-iterative scheme (max_iter = 50), resetting algorithm (lc = 25, max_iter = 2).]
Fig. 3: BER performance comparison of the proposed shuffled iterative receiver, the non-iterative scheme and the resetting algorithm over the 2 × 2 MIMO channel with 16-QAM modulation and ML detection.
For the $(N_r = 2) \times (N_t = 2)$ MIMO system with 16-QAM modulation and ML detection, Figs. 3 and 4 compare the bit error rate (BER) performance and the average iteration numbers of the three receivers, respectively. For the non-iterative scheme, the maximum iteration number was set to 50. For the resetting algorithm, the decoder and demapper exchanged the extrinsic information once per $l_c = 25$ decoding-loop iterations, and the maximum iteration number of the decoder-demapper loop was set to 2. For our proposed scheme, the extrinsic information of $P_f = P = 24$ bits was fed back in each cycle, while the maximum iteration numbers of 20 and 30 were considered. For each scenario, the iterative process was repeated until the LDPC decoder converged or the preset maximum iteration number was reached.

[Figure: average iteration number versus BER, 2x2 MIMO, 16QAM, ML detection. Curves: shuffled iterative receiver (max_iter = 20 and 30), non-iterative scheme (max_iter = 50), resetting algorithm (lc = 25, max_iter = 2).]
Fig. 4: Average iteration number comparison of the proposed shuffled iterative receiver, the non-iterative scheme and the resetting algorithm over the 2 × 2 MIMO channel with 16-QAM modulation and ML detection.

[Figure: BER versus SNR (dB), 3x3 MIMO, 16QAM, K-BEST detection. Curves: shuffled iterative receiver (max_iter = 20), non-iterative scheme (max_iter = 50), resetting algorithm (lc = 25, max_iter = 2).]
Fig. 5: BER performance comparison of the proposed shuffled iterative receiver, the non-iterative scheme and the resetting algorithm over the 3 × 3 MIMO channel with 16-QAM modulation and K-BEST detection.
It can be observed from Fig. 3 that our proposed shuffled iterative receiver provides SNR gains of approximately 0.7 dB and 0.5 dB over the non-iterative scheme and the resetting algorithm, respectively, at the BER level of $10^{-5}$. Furthermore, the average iteration number of our shuffled iterative algorithm is much smaller than those of the other two schemes at the same BER level, as can be seen from Fig. 4. Additionally, we notice that the performance gain attained by increasing the maximum iteration number of our proposed shuffled iterative receiver from 20 to 30 is limited. This demonstrates that our shuffled iterative receiver is capable of obtaining a good performance even with a relatively small maximum iteration number.

[Figure: average iteration number versus BER, 3x3 MIMO, 16QAM, K-BEST detection. Curves: shuffled iterative receiver (max_iter = 20), non-iterative scheme (max_iter = 50), resetting algorithm (lc = 25, max_iter = 2).]
Fig. 6: Average iteration number comparison of the proposed shuffled iterative receiver, the non-iterative scheme and the resetting algorithm over the 3 × 3 MIMO channel with 16-QAM modulation and K-BEST detection.
Next we present the simulation results for the $(N_r = 3) \times (N_t = 3)$ MIMO system with 16-QAM modulation and K-BEST detection, where $K$ was set to 64. For our shuffled iterative receiver, only the maximum iteration number of 20 was considered, while in each cycle all the decoded $P_f = P = 24$ bits were fed back. The parameters of the other two schemes remained the same as in the previous example. As can be seen from Fig. 5, the proposed shuffled iterative receiver exhibits SNR gains of approximately 0.5 dB and 0.2 dB at the BER level of $10^{-5}$ over the non-iterative scheme and the resetting algorithm, respectively. Due to the suboptimal demapping algorithm adopted, the gains are not as large as in the case of the ML demapper, but they are still substantial. Fig. 6 shows that the average iteration number is also greatly reduced by our shuffled iterative receiver, compared with the other two schemes.
Our simulation investigation therefore shows that the proposed shuffled iterative receiver attains SNR gains of several tenths of a dB in comparison to the widely used non-iterative scheme and resetting algorithm, and requires a smaller number of iterations at a given BER level than the existing schemes. Furthermore, as demonstrated in the previous section, our shuffled iterative receiver exhibits a much lower IDD delay, compared with the traditional frame-by-frame scheme. Therefore, our proposed scheme offers a low-complexity and low-delay design to achieve a high MIMO system throughput.
V. CONCLUSIONS
In this contribution, we have proposed a shuffled iterative receiver for LDPC-coded MIMO systems. In our shuffled iterative receiver, the decoder adopts the group vertical shuffled BP algorithm, and the extrinsic information of the decoded bits generated in each cycle is fed back to the demapper immediately, rather than waiting for the entire code frame to be decoded. The decoder and demapper form a pipeline structure which leads to a significant reduction in the IDD delay. A partial feedback strategy has also been suggested to provide a flexible trade-off between performance and complexity. Simulation results have demonstrated that the proposed shuffled iterative receiver outperforms the existing non-iterative scheme and resetting algorithm in terms of achievable BER performance, while requiring a smaller average number of iterations. Our work has thus shown that the proposed scheme offers a low-complexity and low-delay IDD design for high-throughput LDPC-coded MIMO systems.
ACKNOWLEDGMENT
This work was supported by the National Natural Science Foundation of China (Grant No. 61271266), the National High Technology Research and Development Program of China (Grant No. 2014AA01A704), the Beijing Natural Science Foundation (Grant No. 4142027), the Shenzhen Visible Light Communication System Key Laboratory (ZDSYS20140512114229398) and the Shenzhen Peacock Plan (No. 1108170036003286).
REFERENCES
[1] B. M. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Trans. Commun., vol. 51, no. 3, pp. 389–399, Mar. 2003.
[2] Y. Liu, M. P. Fitz, and O. Y. Takeshita, “Full rate space-time turbo codes,” IEEE J. Select. Areas Commun., vol. 19, no. 5, pp. 969–980, May 2001.
[3] J. Hou, P. H. Siegel, and L. B. Milstein, “Design of multi-input multi-output systems based on low-density parity-check codes,” IEEE Trans. Commun., vol. 53, no. 4, pp. 601–611, Apr. 2005.
[4] S. ten Brink, G. Kramer, and A. Ashikhmin, “Design of low-density parity-check codes for modulation and detection,” IEEE Trans. Commun., vol. 52, no. 4, pp. 670–678, Apr. 2004.
[5] J. Nam, S. R. Kim, J. Ha, and J. Y. Ahn, “A new design of iterative detection and decoding with soft interference cancellation,” in Proc. VTC 2008-Fall (Calgary, BC, Canada), Sept. 21-24, 2008, pp. 1–6.
[6] J. Liu, P. Li, and R. C. de Lamare, “Iterative detection and decoding for MIMO systems with knowledge-aided belief propagation algorithms,” in Proc. Asilomar Conf. Signals, Syst., Comp. (Pacific Grove, CA), Nov. 4-7, 2012, pp. 1250–1254.
[7] B. Lu, G. Yue, and X. Wang, “Performance analysis and design optimization of LDPC-coded MIMO OFDM systems,” IEEE Trans. Signal Processing, vol. 52, no. 2, pp. 348–361, Feb. 2004.
[8] M. Li, C. A. Nour, C. Jego, J. Yang, and C. Douillard, “A shuffled iterative bit-interleaved coded modulation receiver for the DVB-T2 standard: Design, implementation and FPGA prototyping,” in Proc. 2011 IEEE Workshop Signal Processing Systems (Beirut, Lebanon), Oct. 4-7, 2011, pp. 55–60.
[9] Z. Guo and P. Nilsson, “Algorithm and implementation of the K-best sphere decoding for MIMO detection,” IEEE J. Select. Areas Commun., vol. 24, no. 3, pp. 491–503, Mar. 2006.
[10] J. Zhang and M. P. C. Fossorier, “Shuffled iterative decoding,” IEEE Trans. Commun., vol. 53, no. 2, pp. 209–213, Feb. 2005.
[11] D. J. C. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.
