Staircase Codes: FEC for 100 Gb/s OTN by Smith, Benjamin P. et al.
arXiv:1201.4106v1 [cs.IT] 19 Jan 2012
IEEE/OSA JOURNAL OF LIGHTWAVE TECHNOLOGY 1
Staircase Codes: FEC for 100 Gb/s OTN
Benjamin P. Smith, Arash Farhood, Andrew Hunt,
Frank R. Kschischang, Fellow, IEEE, and John Lodge, Fellow, IEEE
Abstract—Staircase codes, a new class of forward-error-
correction (FEC) codes suitable for high-speed optical commu-
nications, are introduced. An ITU-T G.709-compatible staircase
code with rate R = 239/255 is proposed, and FPGA-based
simulation results are presented, exhibiting a net coding gain
(NCG) of 9.41 dB at an output error rate of $10^{-15}$, an
improvement of 0.42 dB relative to the best code from the ITU-T
G.975.1 recommendation. An error floor analysis technique is
presented, and the proposed code is shown to have an error floor
at $4.0 \times 10^{-21}$.
Index Terms—Staircase codes, fiber-optic communications, for-
ward error correction, product codes, low-density parity-check
codes.
I. INTRODUCTION
ADVANCES in physics—the invention of the laser, low-loss optical fiber, and the optical amplifier—have driven
the exponential growth in worldwide data communications.
However, as these technologies mature, system designers have
increasingly focused on techniques from communication the-
ory, including forward error correction, to simultaneously in-
crease transmission capacity and decrease transmission costs.
One of the first proposals for FEC in an optical system
appeared in [1], which demonstrated a shortened (224, 216)
Hamming code implementation at 565 Mbit/s. Since then,
ITU-T Recommendations G.975 and G.975.1 have standard-
ized more powerful codes for optical transport networks
(OTNs). More recently, low-density parity-check (LDPC)
codes [2], [3]—which provide the potential for capacity-
approaching performance—have been investigated, as aptly
summarized in [4], [5]. While implementations exist at 10
Gb/s (for 10GBase-T ethernet networks), the blocklengths of
such implementations (∼ 500–2000) are too short to provide
performance close to capacity; the (2048, 1723) RS-LDPC
code is approximately 3 dB from the Shannon Limit at
$10^{-15}$ [6], see also [7]. Another significant roadblock is that
fiber-optic communication systems are typically required to
provide bit-error-rates below $10^{-15}$. It is well-known that
capacity-approaching LDPC codes exhibit error floors [8],
and to achieve the targeted error rate would likely require
concatenation with an outer code (e.g., as in [9]). In this work,
we focus on product-like codes (by product-like codes, we
mean any generalized LDPC code with algebraic component
codes), since they possess properties that make them particularly
suited to providing error-correction in fiber-optic communication
systems. In particular, for 100 Gb/s implementations,
we argue that syndrome-based decoding of product-like codes
is significantly more efficient than message-passing decoding
of LDPC codes.
B. P. Smith and F. R. Kschischang are with the Electrical and
Computer Engineering Department, University of Toronto, 10 King’s
College Road, Toronto, Ontario M5S 3G4, Canada (e-mail: {ben,
frank}@comm.utoronto.ca).
A. Farhood is with Cortina Systems Inc., 535 Legget Drive, Suite 1000,
Kanata, Ontario K2K 3B8, Canada.
A. Hunt and J. Lodge are with the Communications Research Centre
Canada, 3701 Carling Ave., Ottawa, Ontario K2H 8S2, Canada.
This paper presents a new class of high-rate binary error-
correcting codes—staircase codes—whose construction com-
bines ideas from convolutional and block coding. Indeed, stair-
case codes can be interpreted as having a ‘continuous’ product-
like construction. In the context of wireless communications,
related code constructions include braided block codes [10],
braided convolutional codes [11], diamond codes [12] and
cross parity check convolutional codes [13], each of which
is related to the recurrent codes of Wyner-Ash [14]. However,
these proposals considered soft decoding of the component
codes, which is unsuitable for high-speed fiber-optic commu-
nications. Herein, we describe a syndrome-based decoder for
staircase codes that provides excellent performance with an
efficient decoder implementation.
In Section II, we review the specifications and performance
of FEC codes defined in ITU-T Recommendations G.975
and G.975.1. In Section III, we describe the syndrome-based
decoder for product-like codes, and argue that it results in a
decoder data-flow that is more than two orders of magnitude
smaller than the message-passing decoder of an LDPC code.
Staircase codes are presented in Section IV, and a G.709-
compatible staircase code is proposed. In Section V, we
present an analytical method for determining the error floor
of iteratively decoded staircase codes, and show that the
proposed staircase code has an error floor at $4.0 \times 10^{-21}$.
Finally, in Section VI, we present FPGA-based simulation
results, illustrating that the proposed code provides a 9.41 dB
NCG at an output error rate of $10^{-15}$, an improvement of
0.42 dB relative to the best code from the ITU-T G.975.1
recommendation, and only 0.56 dB from the Shannon Limit.
II. EXISTING PROPOSALS
A. ITU-T Recommendation G.975
The first error-correction code standardized for optical com-
munications was the (255, 239) Reed-Solomon code, with
symbols in $\mathbb{F}_{2^8}$, capable of correcting up to 8 symbol errors
in any codeword. For an output-error-rate of $10^{-15}$, the NCG
of the RS code is 6.2 dB, which is 3.77 dB from capacity.
In order to provide improved burst-error-correction, 16
codewords are block-interleaved, providing correction for
bursts of as many as 1024 transmitted bits. A framing row
consists of 16 · 255 · 8 bits, 30592 of which are information
bits, and the remaining 2048 bits of which are parity. The
resulting framing structure—a frame consists of four rows—
is standardized in ITU-T recommendation G.709, and remains
the required framing structure for OTNs; as a direct result, the
coding rate of any candidate code must be R = 239/255.
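The framing arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the constant names are hypothetical, and the burst figure assumes a burst aligned to 8-bit symbol boundaries.

```python
from fractions import Fraction

RS_N, RS_K, SYMBOL_BITS = 255, 239, 8   # (255, 239) RS code over 8-bit symbols
INTERLEAVE_DEPTH = 16                   # codewords block-interleaved per framing row
ROWS_PER_FRAME = 4                      # a G.709 frame consists of four rows

row_bits = INTERLEAVE_DEPTH * RS_N * SYMBOL_BITS
row_info_bits = INTERLEAVE_DEPTH * RS_K * SYMBOL_BITS
row_parity_bits = row_bits - row_info_bits
frame_bits = ROWS_PER_FRAME * row_bits
rate = Fraction(row_info_bits, row_bits)

# Each codeword corrects t = (n - k) / 2 symbols. With 16-way interleaving, a
# symbol-aligned burst of 16 * t symbols touches each codeword at most t times
# (assumption: the burst starts on a symbol boundary).
t = (RS_N - RS_K) // 2
max_burst_bits = INTERLEAVE_DEPTH * t * SYMBOL_BITS

print(row_bits, row_info_bits, row_parity_bits)  # 32640 30592 2048
print(frame_bits, rate)                          # 130560 239/255
print(max_burst_bits)                            # 1024
```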
B. ITU-T Recommendation G.975.1
As per-channel data rates increased to 10 Gb/s, and the
capabilities of high-speed electronics improved, the (255, 239)
RS code was replaced with stronger error-correcting codes.
In ITU-T recommendation G.975.1, several ‘next-generation’
coding schemes were proposed; among the many proposals,
the common mechanism for increased coding gain was the use
of concatenated coding schemes with iterative hard-decision
decoding. We now describe four of the best proposals, which
will motivate our approach in Section IV.
In Appendix I.3 of G.975.1, a serially concatenated coding
scheme is described, with outer (3860, 3824) binary BCH
code and inner (2040, 1930) binary BCH code, which are
obtained by shortening their respective mother codes. First,
30592 = 8 · 3824 information bits are divided into 8 units,
each of which is encoded by the outer code; we will refer
to the resulting unit of 30880 bits as a ‘block’. Prior to
encoding by the inner code, the contents of consecutive blocks
are interleaved (in a ‘continuous’ fashion, similar to convolu-
tional interleavers [15]). Specifically, each inner codeword in
a given block involves ‘information’ bits from each of the
eight preceding ‘outer’ blocks. Note that the interleaving step
increases the effective block-length of the overall code, but it
necessitates a sliding-window style decoding algorithm, due to
the continuous nature of the interleaver. Furthermore, unlike
a product code, the parity bits of the inner code are protected
by a single component codeword, which reduces their level of
protection. For an output-error-rate of $10^{-15}$, the NCG of the
I.3 code is 8.99 dB, which is 0.98 dB from capacity.
In Appendix I.4 of G.975.1, a serially concatenated scheme
with (shortened versions of) an outer (1023, 1007) RS code
and (shortened versions of) an inner (2047, 1952) binary BCH
code is proposed. After encoding 122368 bits with the outer
code, the coded bits are block interleaved and encoded by
the inner BCH code, resulting in a block length of 130560
bits, i.e., exactly one G.709 frame. As in the previous case,
the parity bits of the inner code are singly-protected. For an
output-error-rate of $10^{-15}$, the NCG of the I.4 code is 8.67
dB, which is 1.3 dB from capacity.
In Appendix I.5 of G.975.1, a serially concatenated scheme
with an outer (1901, 1855) RS code and an inner (512, 502)×
(510, 500) extended-Hamming product code is described. It-
erative decoding is applied to the inner product code, after
which the outer code is decoded; the purpose of the outer
code is to eliminate the error floor of the inner code, since
the inner code has small stall patterns (see Section V). For an
output-error-rate of $10^{-15}$, the NCG of the I.5 code is 8.5 dB,
which is 1.47 dB from capacity.
Finally, in Appendix I.9 of G.975.1, a product-like code
with (1020, 988) doubly-extended binary BCH component
codes is proposed. The overall code is described in terms
of a 512 × 1020 matrix of bits, in which the bits along
both the rows of the matrix as well as a particular choice
of ‘diagonals’ must form valid codewords in the component
code. Since the diagonals are chosen to include 2 bits in
every row, any diagonal codeword has two bits in common
with any row codeword; in contrast, for a product code, any
row and column have exactly one bit in common. Note that
the I.9 construction achieves a product-like construction (their
choice of diagonals ensures that each bit is protected by two
component codewords) with essentially half the overall block
length of the related product code (even so, the I.9 code has the
longest block length among all G.975.1 proposals). However,
the choice of diagonals decreases the size of the smallest stall
patterns, introducing an error floor above $10^{-14}$. For an output-
error-rate of $2 \cdot 10^{-14}$, the NCG of the I.9 code is 8.67 dB,
which is 1.3 dB from capacity.
III. LDPC VS. PRODUCT CODES
In this section, we present a high-level view of iterative
decoders for LDPC and product codes. Due to the differences
in their implementations, a precise comparison of their im-
plementation complexities is difficult. Nevertheless, since the
communication complexity of message-passing is a significant
challenge in LDPC decoder design, we consider the decoder
data-flow, i.e., the rate of routing/storing messages, as a
surrogate for the implementation complexity.
A. Decoder-Data-flow Comparison
We consider a system that transmits information at D bits/s,
using a binary error-correcting code of rate R—for which
hard decisions at D/R bits/s are input to the decoder—and a
decoder that operates at a clock frequency fc Hz.
1) LDPC Code: We consider an LDPC decoder that implements
sum-product decoding (or some quantized approximation)
with a parallel-flooding schedule. We assume q-bit
messages internal to the decoder, an average variable node
degree $d_{av}$, and N decoder iterations; typically, q is 4 or 5
bits, $d_{av} \approx 3$, and N is between 15 and 25. Initially, hard-decisions are
input to the decoder at a rate of D/R bits/s and stored in
flip-flop registers. At each iteration, variable nodes compute
and broadcast q-bit messages over every edge, and similarly
for the check nodes, i.e., $2q d_{av}$ bits are broadcast per iteration
per variable node. Since bits arrive from the channel at D/R
bits/s, the corresponding internal data-flow per iteration is then
$2Dq d_{av}/R$, and the total data-flow, including initial loading
of 1-bit channel messages, is
$$F_{\mathrm{LDPC}} = \frac{D}{R} + \frac{2NDq d_{av}}{R} \approx \frac{2NDq d_{av}}{R}.$$
For N = 20, q = 4, $d_{av} = 3$, $F_{\mathrm{LDPC}} \approx 480D/R$, which
corresponds to a data-flow of more than 48 Tb/s for 100 Gb/s
systems.
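As a quick numerical check of this expression, the following sketch evaluates the LDPC data-flow for the quoted parameters. The function name is hypothetical, and R = 239/255 is an assumption taken from the G.709 rate discussed in Section II.

```python
def ldpc_dataflow(D, R, N, q, d_av):
    """Total decoder data-flow in bits/s: channel loading plus message passing."""
    channel_load = D / R                       # 1-bit hard decisions from the channel
    message_passing = 2 * N * D * q * d_av / R # var-to-check and check-to-var messages
    return channel_load + message_passing

D = 100e9          # information rate, bits/s
R = 239 / 255      # assumed code rate (G.709-compatible)
F_ldpc = ldpc_dataflow(D, R, N=20, q=4, d_av=3)
print(f"{F_ldpc / 1e12:.1f} Tb/s")
```

With these values the result is a little above 51 Tb/s, consistent with the "more than 48 Tb/s" bound obtained from 480D alone.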
Fig. 1. Data-flow in an LDPC decoder
2) Product Code: When the component codes of a product
code can be efficiently decoded via syndromes (e.g., BCH
codes), there exists an especially efficient decoder for the prod-
uct code. Briefly, by operating exclusively in the ‘syndrome
domain’—which compresses the received signal—and passing
only ≤ t messages per (component) decoding (for t-error-
correcting component codes), the implementation complexity
of decoding is significantly reduced.
The following is a step-by-step description of the decoding
algorithm:
1) From the received data, compute and store the syndrome
for each row and column codeword. Store a copy of the
received data in memory R.
2) Decode those non-zero syndromes corresponding to row
codewords¹. In the event of a successful decoding, set
the syndrome to zero, flip the corresponding t or fewer
positions in memory R, and update the t or fewer
affected column syndromes by a masking operation.
3) Repeat Step 2, reversing the roles of rows and columns.
4) If any syndromes are non-zero, and fewer than the
maximum number of iterations have been performed,
go to Step 2. Otherwise, output the contents of memory
R.
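The four steps above can be sketched in Python. To keep the example short and runnable, the sketch swaps in a single-error-correcting (7, 4) Hamming component code (t = 1) in place of the BCH codes considered in this paper; the function names and the integer representation of syndromes are assumptions made for illustration only.

```python
def mask(pos):
    """Column of the Hamming parity-check matrix for 0-based position `pos`."""
    return pos + 1

def decode_product(received, max_iters=4):
    n = len(received)                      # n x n array, n = 7 for the (7, 4) code
    mem = [row[:] for row in received]     # Step 1: copy of the data ("memory R")
    row_syn, col_syn = [0] * n, [0] * n
    for i in range(n):                     # Step 1: syndromes of all rows and columns
        for j in range(n):
            if mem[i][j]:
                row_syn[i] ^= mask(j)
                col_syn[j] ^= mask(i)
    for _ in range(max_iters):
        for i in range(n):                 # Step 2: decode non-zero row syndromes
            if row_syn[i]:
                j = row_syn[i] - 1         # t = 1: the syndrome names the position
                mem[i][j] ^= 1             # flip the bit in memory
                row_syn[i] = 0
                col_syn[j] ^= mask(i)      # masking update of the column syndrome
        for j in range(n):                 # Step 3: same with the roles reversed
            if col_syn[j]:
                i = col_syn[j] - 1
                mem[i][j] ^= 1
                col_syn[j] = 0
                row_syn[i] ^= mask(j)
        if not any(row_syn) and not any(col_syn):
            break                          # Step 4: all syndromes are zero
    return mem

# Usage: the all-zero codeword with three bit errors, two of them in one row.
n = 7
rx = [[0] * n for _ in range(n)]
for i, j in [(0, 0), (0, 3), (2, 5)]:
    rx[i][j] = 1
decoded = decode_product(rx)
print(sum(map(sum, decoded)))  # 0 residual errors
```

Note that the decoder touches the data memory only when a bit is flipped; everything else happens on the stored syndromes, which is the source of the data-flow savings discussed next.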
We quantify the complexity of decoding a product code by
its decoder data-flow. At first glance, it may seem that this
approach ignores the complexity of decoding the (component)
t-error-correcting BCH codewords. However, for relatively
small t, the decoding of a component codeword can be ef-
ficiently decomposed into a series of look-up table operations,
for which the data-flow interpretation is well-justified. In
this section, we will ignore the data-flow contribution of the
BCH decoding algorithm, but we return to this point in the
Appendix, where it is shown that the corresponding data-flow
is negligible.
We assume that rows are encoded by a t1-error-correcting
(n1, k1 = n1 − r1) BCH code, and the columns are encoded
by a t2-error-correcting (n2, k2 = n2 − r2) BCH code, for
an overall rate R = R1R2. We assume each row/column
codeword is decoded (on average, over the course of decoding
the overall product code) v times, where typically v ranges
from 3 to 4.
The hard-decisions from the channel—at D/R bits/s—are
written to a data RAM, in addition to being processed by a
syndrome computation/storage device. Contrary to the LDPC
decoder data-flow, the clock frequency fc plays a central role,
namely in the data-flow of the initial syndrome calculation.
¹ In practice, the syndrome corresponding to a fixed row is decoded only if
its value has changed since its last decoding.
Fig. 2. Data-flow in the initial syndrome computation
Referring to Fig. 2, and assuming that the bits in a product
code are transmitted row-by-row, the input bus-width (i.e., the
number of input bits per decoder clock cycle) is D/(Rfc)
bits. Now, assuming these bits correspond to a single row
of the product code, each non-zero bit corresponds to some
r1-bit mask (i.e., the corresponding column of the parity-
check matrix of the row code), the modulo-2 sum of these is
performed by a masking tree, and the r1-bit output is masked
with the current contents of the corresponding (syndrome) flip-
flop register. That is, each clock cycle causes an r1-bit mask
to be added to the contents of the corresponding row in the
syndrome bank. Of course, each received bit also impacts a
distinct column syndrome; however, the same r2-bit mask is
applied (when the corresponding received bit is non-zero) to
each of the involved column syndromes; the corresponding
data-flow is then r2 bits per clock cycle.
Once the syndromes are computed from the received data,
iterative decoding commences. To perform a row decoding, an
r1-bit syndrome is read from the syndrome bank. Since there
are n2 row codewords, and each row is decoded on average v
times, the corresponding data-flow from the syndrome bank to
the row decoder is r1n2vD/(Rn1n2) = r1vD/(Rn1) bits/s.
For each row decoding, at most t1 positions are corrected, each
of which is specified by ⌈log2 n1⌉+ ⌈log2 n2⌉ bits. Therefore,
the data-flow from the row decoder to the data RAM is
$$\frac{t_1 n_2 v D (\lceil \log_2 n_1 \rceil + \lceil \log_2 n_2 \rceil)}{R n_1 n_2} = \frac{t_1 v D (\lceil \log_2 n_1 \rceil + \lceil \log_2 n_2 \rceil)}{R n_1}$$
bits/s. Furthermore, for each corrected bit, an r2-bit mask must
be applied to the corresponding column syndrome, which
yields a data-flow from the row decoder to the syndrome bank
of t1n2r2vD/(Rn1n2) = t1r2vD/(Rn1) bits/s. A similar
analysis can be applied to column decodings. In total, the
decoder data-flow is
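$$F_P = \frac{D}{R} + (r_1 + r_2) \cdot f_c + \frac{Dv}{R n_1} \cdot \left( t_1 \lceil \log_2 n_1 \rceil + t_1 \lceil \log_2 n_2 \rceil + r_1 + t_1 r_2 \right) + \frac{Dv}{R n_2} \cdot \left( t_2 \lceil \log_2 n_1 \rceil + t_2 \lceil \log_2 n_2 \rceil + r_2 + t_2 r_1 \right).$$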
In this work, we will focus on codes for which n1 = n2 ≈
1000, r1 = r2 = 32, t1 = t2 = 3, and the decoder is assumed
to operate at fc ≈ 400 MHz. For v = 4, we then have a data-
flow of approximately 293 Gb/s. Note that this is more than
two orders of magnitude smaller than the corresponding data-
flow for LDPC decoding. Intuitively, the advantage arises from
two facts. First, when $R_1 > 1/2$ and $R_2 > 1/2$, syndromes
provide a compressed representation of the received signal.
Second, the algebraic component codes admit an economical
message-passing scheme, in the sense that message updates are
only required for the small fraction of bits that are corrected
by a particular (component code) decoding.
Fig. 3. Data-flow in a product-code decoder
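The product-code data-flow can be evaluated numerically for the parameters quoted above. In this sketch the function name is hypothetical, n1 = n2 = 1000 stands in for "approximately 1000", and D = 100 Gb/s with R = 239/255 is an assumed operating point; the LDPC figure is recomputed from the expression in Section III-A1 for comparison.

```python
from math import ceil, log2

def product_dataflow(D, R, f_c, v, n1, n2, r1, r2, t1, t2):
    """Decoder data-flow F_P in bits/s for syndrome-based product decoding."""
    addr_bits = ceil(log2(n1)) + ceil(log2(n2))     # bits naming one corrected position
    loading = D / R + (r1 + r2) * f_c               # data RAM write + initial syndromes
    rows = D * v / (R * n1) * (t1 * addr_bits + r1 + t1 * r2)
    cols = D * v / (R * n2) * (t2 * addr_bits + r2 + t2 * r1)
    return loading + rows + cols

D, R = 100e9, 239 / 255                             # assumed operating point
F_p = product_dataflow(D, R, f_c=400e6, v=4,
                       n1=1000, n2=1000, r1=32, r2=32, t1=3, t2=3)
F_ldpc = D / R + 2 * 20 * D * 4 * 3 / R             # N = 20, q = 4, d_av = 3

print(f"F_P    = {F_p / 1e9:.0f} Gb/s")
print(f"F_LDPC = {F_ldpc / 1e12:.1f} Tb/s, ratio = {F_ldpc / F_p:.0f}")
```

Under these assumptions the product-code figure comes out close to 293 Gb/s, and the ratio to the LDPC figure exceeds 100.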
IV. STAIRCASE CODES
The staircase code construction combines ideas from recur-
sive convolutional coding and block coding. Staircase codes
are completely characterized by the relationship between
successive matrices of symbols. Specifically, consider the
(infinite) sequence B0, B1, B2, . . . of m-by-m matrices Bi,
i ∈ Z+. Herein, we restrict our attention to Bi with elements
in F2, but an analogous construction applies in the non-binary
case.
Block B0 is initialized to a reference state known to the
encoder-decoder pair, e.g., block B0 could be initialized to
the all-zeros state, i.e., an m-by-m array of zero symbols.
Furthermore, we select a conventional FEC code (e.g., Ham-
ming, BCH, Reed-Solomon, etc.) in systematic form to serve
as the component code; this code, which we henceforth refer
to as C, is selected to have block length 2m symbols, r of
which are parity symbols.
Encoding proceeds recursively on the $B_i$. For each i,
m(m − r) information symbols (from the streaming source)
are arranged into the m − r leftmost columns of $B_i$; we denote
this sub-matrix by $B_{i,L}$. Then, the entries of the rightmost r
columns (this sub-matrix is denoted by $B_{i,R}$) are specified as
follows:
1) Form the m × (2m − r) matrix $A = [B_{i-1}^T \; B_{i,L}]$, where
$B_{i-1}^T$ is the matrix-transpose of $B_{i-1}$.
2) The entries of $B_{i,R}$ are then computed such that each
of the rows of the matrix $[B_{i-1}^T \; B_{i,L} \; B_{i,R}]$ is a valid
codeword of C. That is, the elements in the jth row of
$B_{i,R}$ are exactly the r parity symbols that result from
encoding the 2m − r ‘information’ symbols in the jth
row of A.
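The recursive encoding rule can be illustrated with a toy example. The sketch below is not the code proposed in this paper: it assumes m = 6 and r = 4 and uses a shortened Hamming code of length 2m = 12 as a stand-in component code C; all names are hypothetical.

```python
import random

M, R_PAR = 6, 4                       # toy block side m and parity count r (assumed)
MASKS = [3, 5, 6, 7, 9, 10, 11, 12]   # one distinct 4-bit parity mask per info bit

def parity_of(info_bits):
    """Systematic encoder of the toy component code: XOR the masks of set bits."""
    acc = 0
    for bit, m in zip(info_bits, MASKS):
        if bit:
            acc ^= m
    return [(acc >> k) & 1 for k in range(R_PAR)]

def is_codeword(row):
    return parity_of(row[:-R_PAR]) == row[-R_PAR:]

def transpose(block):
    return [list(col) for col in zip(*block)]

def encode_block(prev_block, info_rows):
    """Return B_i = [B_{i,L}  B_{i,R}] given B_{i-1} and the new information rows."""
    prev_t = transpose(prev_block)
    block = []
    for j in range(M):
        a_row = prev_t[j] + info_rows[j]               # row j of A
        block.append(info_rows[j] + parity_of(a_row))  # append the r parity symbols
    return block

random.seed(0)
blocks = [[[0] * M for _ in range(M)]]                 # B_0: all-zeros reference state
for _ in range(3):
    info = [[random.randint(0, 1) for _ in range(M - R_PAR)] for _ in range(M)]
    blocks.append(encode_block(blocks[-1], info))

# Every row of [B_{i-1}^T  B_i] should be a codeword of the component code.
ok = all(is_codeword(transpose(blocks[i - 1])[j] + blocks[i][j])
         for i in range(1, len(blocks)) for j in range(M))
print(ok)  # True
```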
Generally, the relationship between successive blocks in a
staircase code satisfies the following relation: for any i ≥ 1,
each of the rows of the matrix $[B_{i-1}^T \; B_i]$ is a valid codeword in
C. An equivalent description, from which the term ‘staircase
codes’ originates, is suggested by Fig. 4, in which (the
concatenation of the symbols in) every row (and every column)
in the ‘staircase’ is a valid codeword of C; this representation
suggests their connection to product codes. However, staircase
codes are naturally unterminated (i.e., their block length is
indeterminate), and thus admit a range of decoding strategies
with varying latencies. Most importantly, we will see that they
outperform product codes.
Fig. 4. The ‘staircase’ visualization of staircase codes (m-by-m blocks $B_0^T$, $B_1$, $B_2^T$, $B_3$, $B_4^T$ arranged as a staircase).
The rate of a staircase code is
$$R_s = 1 - \frac{r}{m},$$
since encoding produces r parity symbols for each set of m − r
‘new’ information symbols. However, note that the related
product code has rate
$$R_p = \left( \frac{2m - r}{2m} \right)^2 = 1 - \frac{r}{m} + \frac{r^2}{4m^2},$$
which is greater than the rate of the staircase code. However,
for sufficiently high rates, the difference is small, and staircase
codes outperform product codes of the same rate.
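To make the size of the rate difference concrete, the following sketch evaluates both expressions exactly for m = 510 and r = 32, the values adopted for the G.709-compatible code in Section IV-C. The function names are hypothetical.

```python
from fractions import Fraction

def staircase_rate(m, r):
    return 1 - Fraction(r, m)

def product_rate(m, r):
    return Fraction(2 * m - r, 2 * m) ** 2

m, r = 510, 32
R_s = staircase_rate(m, r)
R_p = product_rate(m, r)

print(R_s, float(R_s))             # exact rate and its decimal value
print(float(R_p), float(R_p - R_s))  # product-code rate and the gap r^2 / (4 m^2)
```

Here the gap is just under 0.001, so the related product code has only a marginally higher rate.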
From the context of transmitter latency—which includes
encoding latency and frame-mapping latency—staircase codes
have the advantage (relative to product codes) that the effective
rate (i.e., the ratio of ‘new’ information symbols, m−r, to the
total number of ‘new’ symbols, m) of a component codeword
is exactly the rate of the overall code. Therefore, the encoder
produces parity at a ‘regular’ rate, which enables the design
of a frame-mapper that minimizes the transmitter latency.
We note that staircase codes can be interpreted as general-
ized LDPC codes with a systematic encoder and an indeter-
minate block-length, which admits decoding algorithms with
a range of latencies.
Using arguments analogous to those used for product codes,
it can be shown that, for a t-error-correcting component code C
with minimum distance $d_{\min}$, the Hamming distance between
any two staircase codewords is at least $d_{\min}^2$.
A. Decoding Algorithm
Staircase codes are naturally unterminated (i.e., their block
length is indeterminate), and thus admit a range of decoding
strategies with varying latencies. That is, decoding can be accomplished
in a sliding-window fashion, in which the decoder
operates on the received bits corresponding to L consecutively
received blocks $B_i, B_{i+1}, \ldots, B_{i+L-1}$. For a fixed i, the
decoder iteratively decodes as follows: First, those component
codewords that ‘terminate’ in block $B_{i+L-1}$ (i.e., whose
parity bits are in $B_{i+L-1}$) are decoded; since every symbol
is involved in two component codewords, the corresponding
syndrome updates are performed, as in Section III-A2. Next,
those codewords that terminate in block $B_{i+L-2}$ are decoded.
This process continues until those codewords that terminate in
block $B_i$ are decoded. Now, since decoding those codewords
terminating in some block $B_j$ affects those codewords that
terminate in block $B_{j+1}$, it is beneficial to return to $B_{i+L-1}$
and to repeat the process. This iterative process continues until
some maximum number of iterations is performed, at which
time the decoder outputs its estimate for the contents of $B_i$,
accepts a new block $B_{i+L}$, and the entire process repeats
(i.e., the decoding window slides one block to the ‘right’).
Fig. 5. A multi-edge-type graphical representation of staircase codes. Π is
a standard block interleaver, i.e., it represents the transpose operation on an
m-by-m matrix. Edges are ordered from most reliable to least reliable.
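The order in which the sliding-window decoder visits blocks can be sketched as follows. Only the schedule is shown; the component decoding itself is omitted, and the function names and the choice of 0-based window positions are assumptions.

```python
def window_schedule(i, L, max_iters):
    """Yield j, meaning 'decode the component codewords terminating in B_j'."""
    for _ in range(max_iters):
        # Newest block first, then backwards to the oldest block in the window.
        for j in range(i + L - 1, i - 1, -1):
            yield j

def sliding_windows(num_blocks, L, max_iters):
    """Yield (index of the block output, visiting order) per window position."""
    for i in range(num_blocks - L + 1):
        yield i, list(window_schedule(i, L, max_iters))

# Usage: a window of L = 3 blocks, two iterations per window position.
for out_block, order in sliding_windows(num_blocks=5, L=3, max_iters=2):
    print(out_block, order)
```

For the first window position this visits blocks 2, 1, 0 twice before block 0 is output and the window slides.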
B. Multi-edge-type Interpretation
Staircase codes have a simple graphical representation,
which provides a multi-edge-type [3] interpretation of their
construction. The term ‘multi-edge-type’ was originally ap-
plied to describe a refined class of irregular LDPC codes, in
which variable nodes (and check nodes) are classified by their
degrees with respect to a set of edge types. Intuitively, the
introduction of multiple edge types allows degree-one variable
nodes, punctured variable nodes, and other beneficial features
that are not admitted by the conventional irregular ensemble.
In turn, better performance for finite blocklengths and fixed
decoding complexities is possible.
In Fig. 5, we present the factor graph representation of a
decoder that operates on a window of L = 4 blocks; the graph
for general L follows in an obvious way. Dotted variable nodes
indicate symbols whose value was decoded in the previous
stage of decoding. The key observation is that when these
symbols are correctly decoded (which is essentially always
the case, since the output BER is required to be less than
$10^{-15}$), the component codewords in which they are involved
are effectively shortened by m symbols. Therefore, the most
reliable messages are passed over those edges connecting
variable nodes to the shortened (component) codewords, as
indicated in Fig. 5. On the other hand, the rightmost collection
of variable nodes are (with respect to the current decoding
window) only involved in a single component codeword, and
thus the edges to which they are connected carry the least
reliable messages. Due to the nature of iterative decoding,
the intermediate edges carry messages whose reliability lies
between these two extremes.
C. A G.709-compatible Staircase Code
The ITU-T Recommendation G.709 defines the framing
structure and error-correcting coding rate for OTNs. For our
purposes, it suffices to know that an optical frame consists
of 130560 bits, 122368 of which are information bits, and
the remaining 8192 are parity bits, which corresponds to
error-correcting codes of rate R = 239/255. Since (510 −
32)/510 = 239/255, we will consider a component code
with m = 510 and r = 32. Specifically, the binary (n =
1023, k = 993, t = 3) BCH code with generator polynomial
$(x^{10} + x^3 + 1)(x^{10} + x^3 + x^2 + x + 1)(x^{10} + x^8 + x^3 + x^2 + 1)$
is adapted to provide an additional 2-bit error-detecting mechanism,
resulting in the generator polynomial²
$$g(x) = (x^{10} + x^3 + 1)(x^{10} + x^3 + x^2 + x + 1)(x^{10} + x^8 + x^3 + x^2 + 1)(x^2 + 1).$$
In order to provide a simple mapping to the G.709 frame, we
first note that 2 ·130560 = 510 ·512. This leads us to define a
slight generalization of staircase codes, in which the blocks Bi
consist of 512 rows of 510 bits. The encoding rule is modified
as follows:
1) Form the 512 × (512 + 478) matrix $A = [\hat{B}_{i-1}^T \; B_{i,L}]$,
where $\hat{B}_{i-1}^T$ is obtained by appending two all-zero rows
to the top of the matrix-transpose of $B_{i-1}$.
2) The entries of $B_{i,R}$ are then computed such that each
of the rows of the matrix $[\hat{B}_{i-1}^T \; B_{i,L} \; B_{i,R}]$ is a valid
codeword of C. That is, the elements in the jth row of
$B_{i,R}$ are exactly the 32 parity symbols that result from
encoding the 990 ‘information’ symbols in the jth row
of A.
Here, C is the code obtained by shortening the code
generated by g(x) by one bit, since our overall codeword
length is 510 + 512 = 1022.
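The parameters of this component code can be checked with carry-less polynomial arithmetic over GF(2). The sketch below represents polynomials as Python integers (bit e holds the coefficient of x^e) and computes parity by polynomial division, a standard systematic encoding for codes defined by a generator polynomial. The helper names and the bit ordering are assumptions for illustration, not a description of the hardware encoder.

```python
import random

def poly(*exponents):
    """Build a GF(2) polynomial as an integer bitmask from its exponents."""
    value = 0
    for e in exponents:
        value |= 1 << e
    return value

def gf2_mul(a, b):
    """Carry-less product of two GF(2) polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2_mod(a, g):
    """Remainder of a divided by g over GF(2)."""
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a

FACTORS = [poly(10, 3, 0), poly(10, 3, 2, 1, 0), poly(10, 8, 3, 2, 0), poly(2, 0)]
g = 1
for f in FACTORS:
    g = gf2_mul(g, f)

R_PARITY = g.bit_length() - 1      # degree of g(x): number of parity bits
K_INFO = (510 + 512) - R_PARITY    # information bits per shortened codeword

def encode_row(info):
    """Systematic encoding: information bits followed by the division remainder."""
    shifted = info << R_PARITY
    return shifted ^ gf2_mod(shifted, g)

random.seed(1)
info = random.getrandbits(K_INFO)
codeword = encode_row(info)
print(R_PARITY, K_INFO, gf2_mod(codeword, g) == 0)  # 32 990 True
```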
V. ERROR FLOOR ANALYSIS
For iteratively decoded codes, an error floor (in the output
bit-error-rate) can often be attributed to error patterns that
‘confuse’ the decoder, even though such error patterns could
easily be corrected by a maximum-likelihood decoder. In the
context of LDPC codes, these error patterns are often referred
to as trapping sets [8]. In the case of product-like codes with
an iterative hard-decision decoding algorithm, we will refer to
them as stall patterns, due to the fact that the decoder gets
locked in a state in which no updates are performed, i.e., the
decoder stalls, as in Fig. 6.
Definition 1: A stall pattern is a set s of codeword posi-
tions, for which every row and column involving positions in
s has at least t+ 1 positions in s.
We note that this definition includes stall patterns that are
correctable, since an incorrect decoding may fortuitously
cause one or more bits in s to be corrected, which could
then lead to all bits in s eventually being corrected. In this
section, we obtain an estimate for the error floor by over-
bounding the probabilities of these events, and pessimistically
assuming that every stall pattern is uncorrectable (i.e., if any
stall pattern appears during the course of decoding, it will
appear in the final output). The methods presented for the
error floor analysis apply to a general staircase code, but for
simplicity of the presentation, we will focus on a staircase code
with m = 510 and doubly-extended triple-error-correcting
component codes.
² This is the code applied to the rows (but not the slopes) of the I.9 code
in G.975.1.
Fig. 6. A stall pattern for a staircase code with a triple-error correcting
component code. Since every involved component codeword has 4 errors,
decoding stalls.
A. A Union Bound Technique
Due to the streaming nature of staircase codes, it is neces-
sary to account for stall patterns that span (possibly multiple)
consecutive blocks. In order to determine the bit-error-rate due
to stall patterns, we consider a fixed block Bi, and the set
of stall patterns that include positions in $B_i$. Specifically, we
‘assign’ to $B_i$ those stall patterns that include symbols in $B_i$
(and possibly additional positions in $B_{i+1}$) but no symbols in
$B_{i-1}$. Let $S_i$ represent the set of stall patterns assigned to $B_i$.
By the union bound, we then have
$$\mathrm{BER}_{\mathrm{floor}} \le \sum_{s \in S_i} \Pr[\text{bits in } s \text{ in error}] \cdot \frac{|s|}{510^2}.$$
Therefore, bounding the error floor amounts to enumerating
the set Si, and evaluating the probabilities of its elements being
in error.
B. Bounding the Contribution Due to Minimal Stalls
Definition 2: A minimal stall pattern has the property that
there are only t+ 1 rows with positions in s, and only t+ 1
columns with positions in s.
The minimal stall patterns of a staircase code can be counted
in a straightforward manner; the multiplicity of minimal stall
patterns that are assigned to Bi is
$$M_{\min} = \binom{510}{4} \cdot \sum_{m=1}^{4} \binom{510}{m} \cdot \binom{510}{4-m},$$
and we refer to the set of minimal stall patterns by $S_{\min}$. The
probability that the positions in some minimal stall pattern s
are received in error is $p^{16}$.
Next, we consider the case in which not all positions in
some minimal stall pattern s are received in error, but that due
to incorrect decoding(s), all positions in s are—at some point
during decoding—simultaneously in error. For some fixed s
and l, 1 ≤ l ≤ 16, there are $\binom{16}{l}$ ways in which 16 − l
positions in s can be received in error. For the moment, let’s
assume that erroneous bit flips occur independently with some
probability ζ, and that ζ does not depend on l. Then we can
overbound the probability that a particular minimal stall s
occurs by
$$\sum_{l=0}^{16} \binom{16}{l} p^{16-l} \zeta^{l} = (p + \zeta)^{16}.$$
In order to provide evidence in favor of these assumptions,
Table I presents empirical estimates, for l = 0, l = 1 and
l = 2, of the probability that a minimal stall pattern s occurs
during iterative decoding, given that 16− l positions in s are
(intentionally) received in error. Note that even if a minimal
stall is received, there exists a non-zero probability that it
will be corrected as a result of erroneous decodings; we will
ignore this effect in our estimation, i.e., we make the worst-
case assumption that any minimal stall persists. Furthermore,
from the results for l = 1 and l = 2, it appears that our
stated assumptions regarding ζ hold true, and $\zeta \approx 5.8 \times 10^{-4}$.
For l > 2, we did not have access to sufficient computational
resources for estimating the corresponding probabilities. Nevertheless,
based on the evidence presented in Table I, the error
floor contribution due to minimal stall patterns is estimated as
$$\frac{16}{510^2} \cdot M_{\min} \cdot (p + \zeta)^{16},$$
where $\zeta = 5.8 \times 10^{-4}$ when $p = 4.8 \times 10^{-3}$.
TABLE I
ESTIMATED PROBABILITY OF A MINIMAL STALL s, GIVEN THAT 16 − l
POSITIONS ARE RECEIVED IN ERROR
l    Estimated probability
0    149/150
1    1/1725
2    $(1/1772)^2$
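The minimal-stall estimate can be evaluated directly from the expressions above. This sketch uses exact integer binomials from the Python standard library; the variable names are hypothetical, and the values of p and ζ are those quoted in the text.

```python
from math import comb

m, t = 510, 3
p, zeta = 4.8e-3, 5.8e-4

# Multiplicity of minimal stall patterns assigned to a block:
# C(510, 4) * sum_{j=1..4} C(510, j) * C(510, 4 - j).
inner = sum(comb(m, j) * comb(m, t + 1 - j) for j in range(1, t + 2))
M_min = comb(m, t + 1) * inner

stall_size = (t + 1) ** 2          # 16 positions in a minimal stall pattern
floor_min = stall_size / m**2 * M_min * (p + zeta) ** stall_size

print(f"M_min = {M_min:.3e}")
print(f"minimal-stall contribution = {floor_min:.2e}")
```

The result is approximately 3.55e-21, which agrees with the K = L = 4 entry of Table II.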
C. Bounding the Contribution Due to Non-minimal Stalls
We now wish to account for the error floor contribution of
non-minimal stalls, e.g., the stall pattern illustrated in Fig. 7. In
the general case, a stall pattern s includes codeword positions
in K rows and L columns, K ≥ 4, L ≥ 4; we refer to
these as (K,L)-stalls. Furthermore, each (K,L)-stall includes
l positions, $4 \cdot \max(K, L) \le l \le K \cdot L$, where the lower bound
follows from the fact that every row and column (in the stall)
includes at least 4 positions. Note that there are
$$A_{K,L} = \binom{510}{L} \cdot \sum_{m=1}^{K} \binom{510}{m} \cdot \binom{510}{K-m}$$
ways to select the involved rows and columns.
For a fixed $(K, L) \ne (4, 4)$ and a fixed choice of rows and
columns, we now proceed to overbound the contributions of
candidate stall patterns. Without loss of generality, we assume
that K ≥ L, and note that there are $\binom{L}{4}^{K}$ ways of choosing
l = 4K elements (in the L · K ‘grid’ induced by the choice
of rows and columns) such that each column includes exactly
four elements, and that every stall pattern ‘contains’ at least
one of these. Now, since a stall pattern includes l elements,
$4 \cdot K \le l \le K \cdot L$, the number of stall patterns with l elements
is overbounded as
$$\binom{L}{4}^{K} \cdot \binom{K \cdot L - 4 \cdot K}{l - 4 \cdot K}.$$
Fig. 7. A non-minimal stall pattern for a staircase code with a triple-error
correcting component code.
For a general (K,L) 6= (4, 4), it follows that the number of
stall patterns with l elements, 4 ·max(K,L) ≤ l ≤ K · L, is
overbounded as(
min(K,L)
4
)max(K,L)
·
(
K · L− 4 ·max(K,L)
l − 4 ·max(K,L)
)
.
Finally, over the choice of the K rows and L columns, there
are
\[
M^{l}_{K,L} = A_{K,L} \cdot \binom{\min(K,L)}{4}^{\max(K,L)} \cdot \binom{KL - 4 \cdot \max(K,L)}{l - 4 \cdot \max(K,L)}
\]
(K,L)-stalls with l elements.
For a fixed K and L, the contribution to the error floor can
be estimated as
\[
\sum_{l = 4 \cdot \max(K,L)}^{K \cdot L} \frac{l}{510^2} \cdot M^{l}_{K,L} \cdot (p + \zeta)^{l},
\]
and in Table II, we provide values for various K and L, when
$\zeta = 5.8 \times 10^{-4}$ and $p = 4.8 \times 10^{-3}$.
Note that the dominant contribution to the error floor is due
to minimal stall patterns (i.e., K = L = 4), and that the overall
estimate for the error floor of the code is $3.8 \times 10^{-21}$. Finally,
we note that by a similar (but more cumbersome) analysis, the
error floor of the G.709-compliant staircase code is estimated
to occur at $4.0 \times 10^{-21}$.
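The counting formulas of this subsection can be evaluated directly. The following is a sketch, not the authors' code: it transcribes A_{K,L}, M^l_{K,L} and the per-(K,L) sum as written above, uses the quoted values of p and ζ, and takes the constant 510 from those formulas. The function names are illustrative.

```python
from math import comb

N = 510        # the 510 appearing in A_{K,L} and in the l / 510^2 factor
P = 4.8e-3     # channel error probability p used in the text
ZETA = 5.8e-4  # estimate of zeta quoted in the text


def a_kl(K, L):
    """A_{K,L}: number of ways to select the involved rows and columns."""
    return comb(N, L) * sum(comb(N, m) * comb(N, K - m) for m in range(1, K + 1))


def m_kl(K, L, l):
    """M^l_{K,L}: overbound on the number of (K,L)-stalls with l elements."""
    lo, hi = min(K, L), max(K, L)
    return a_kl(K, L) * comb(lo, 4) ** hi * comb(K * L - 4 * hi, l - 4 * hi)


def contribution(K, L, p=P, zeta=ZETA):
    """Error floor contribution of (K,L)-stalls, summed over admissible l."""
    hi = max(K, L)
    return sum(l / N**2 * m_kl(K, L, l) * (p + zeta) ** l
               for l in range(4 * hi, K * L + 1))


if __name__ == "__main__":
    for K, L in [(4, 4), (5, 5), (6, 6), (7, 7)]:
        print(K, L, f"{contribution(K, L):.3e}")
```

For K = L = 4 the sum has the single term l = 16 and reduces to the minimal-stall estimate, which comes out close to the 3.55 × 10^{-21} listed in Table II. Note that A_{K,L} as written is not symmetric in K and L, so for off-diagonal entries the orientation matters; in this sketch it is the evaluation at (K, L) = (5, 4) that appears to reproduce the 7.81 × 10^{-28} entry.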
TABLE II
CONTRIBUTION TO ERROR FLOOR ESTIMATE OF (K,L)-STALL PATTERNS

K    L    Contribution
4    4    $3.55 \times 10^{-21}$
4    5    $7.81 \times 10^{-28}$
5    5    $2.54 \times 10^{-22}$
5    6    $2.21 \times 10^{-28}$
6    6    $1.40 \times 10^{-23}$
6    7    $1.49 \times 10^{-29}$
7    7    $8.53 \times 10^{-25}$
7    8    $1.83 \times 10^{-32}$

VI. SIMULATION RESULTS
In Fig. 8, simulation results, generated in hardware on
an FPGA implementation, are provided for the G.709-compatible
staircase code, for L = 7. We also present the
bit-error-rate curves for the G.975 RS code, as well as the
G.975.1 codes described in Section II. For an output error rate
of $10^{-15}$, the staircase code provides approximately 9.41 dB net
coding gain, which is within 0.56 dB of the Shannon limit,
and an improvement of 0.42 dB relative to the best G.975.1
code.

[Figure: BERout versus BERin, with an upper scale of 20 log10(Q) (dB).
Curves: BSC limit (C = 239/255), RS(255,239), staircase, and G.975.1
codes I.3, I.4, I.5 and I.9.]
Fig. 8. Performance of a R = 239/255 staircase code on a binary symmetric
channel with crossover probability BERin, compared with various G.975.1
codes. The upper scale plots the equivalent binary-input Gaussian channel Q
(in dB), where $\mathrm{BER_{in}} = \frac{1}{2}\,\mathrm{erfc}(Q/\sqrt{2})$.
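The relation between BER and Q given in the caption of Fig. 8 can be inverted numerically to move between the two horizontal scales. The sketch below does this by bisection and then evaluates net coding gain using the commonly used definition NCG = 20 log10(Q_out) − 20 log10(Q_in) + 10 log10(R); that definition is an assumption here, since the paper does not restate it in this section, and the function names are illustrative.

```python
from math import erfc, log10, sqrt


def ber_from_q(q):
    """BER = (1/2) erfc(Q / sqrt(2)), as in the caption of Fig. 8."""
    return 0.5 * erfc(q / sqrt(2))


def q_from_ber(ber, lo=0.0, hi=40.0):
    """Invert ber_from_q by bisection (ber_from_q is decreasing in q)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if ber_from_q(mid) > ber:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def q_db(ber):
    """Q expressed as 20 log10(Q), the upper scale of Fig. 8."""
    return 20 * log10(q_from_ber(ber))


def net_coding_gain_db(ber_in, ber_out, rate):
    """Assumed NCG definition: Q gain in dB plus the rate penalty 10 log10(R)."""
    return q_db(ber_out) - q_db(ber_in) + 10 * log10(rate)


if __name__ == "__main__":
    # ber_in below is a hypothetical threshold chosen only to exercise the code.
    print(net_coding_gain_db(ber_in=4.8e-3, ber_out=1e-15, rate=239 / 255))
```

The input error rate passed in the usage line is a placeholder; the NCG value reported in the text corresponds to the threshold read from the measured curve.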
VII. CONCLUSIONS
We proposed staircase codes, a class of product-like FEC
codes that provide reliable communication for streaming
sources. Their construction admits low-latency encoding and
variable-latency decoding, and a decoding algorithm with
an efficient hardware implementation. For R = 239/255, a
G.709-compatible staircase code was presented, and performance
within 0.56 dB of the Shannon limit at an output error rate of
$10^{-15}$ was demonstrated via an FPGA-based simulation.
APPENDIX
This section briefly describes known techniques for efficiently
decoding triple-error-correcting binary BCH codes, and
discusses the data-flow associated with a lookup-table-based
decoder architecture.
For a syndrome $S = (S_1, S_3, S_5)$, $S_i \in \mathbb{F}_{2^m}$, we first
compute $D_3 = S_1^3 + S_3$ and $D_5 = S_1^5 + S_5$. A triple-error
correcting decoder distinguishes the cases
\[
\begin{aligned}
v = 0 &: S_1 = S_3 = S_5 = 0 \\
v = 1 &: S_1 \neq 0,\ D_3 = D_5 = 0 \\
v = 2 &: S_1 \neq 0,\ D_3 \neq 0,\ S_1 D_5 = S_3 D_3 \\
v = 3 &: D_3 \neq 0,\ v \neq 2,
\end{aligned}
\]
where v is the number of positions to invert in order to obtain
a valid codeword.
In order to determine the corresponding positions, a reciprocal
error-locator polynomial $\tilde{\sigma}(x)$ is defined, the roots of
which identify the positions. From [16], we have:
\[
\begin{aligned}
v = 1 &: \tilde{\sigma}(x) = x + S_1 \\
v = 2 &: \tilde{\sigma}(x) = x^2 + S_1 x + D_3/S_1 \\
v = 3 &: \tilde{\sigma}(x) = x^3 + S_1 x^2 + b x + S_1 b + D_3
\end{aligned}
\]
where
\[
b = (S_1^2 S_3 + S_5)/D_3.
\]
When v = 2, note that all of the coefficients of $\tilde{\sigma}(x)$ are
nonzero.
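The case analysis and the coefficients of the reciprocal error-locator polynomial can be exercised on a toy field. The following is a minimal sketch rather than the decoder architecture discussed here: it assumes F_{2^4} built from the primitive polynomial x^4 + x + 1, finds roots by exhaustive search instead of table lookup, and tests v = 0 on all three syndrome components S1, S3 and S5. All function names are illustrative.

```python
M = 4
SIZE = 1 << M
PRIM = 0b10011  # x^4 + x + 1, assumed primitive polynomial for this toy field


def gf_mul(a, b):
    """Multiply two elements of F_{2^M} (shift-and-add with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & SIZE:
            a ^= PRIM
    return r


def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r


def gf_inv(a):
    return gf_pow(a, SIZE - 2)  # a^(2^M - 2) is the inverse of a nonzero a


def syndromes(locators):
    """(S1, S3, S5) for an error pattern given by its locators X_i."""
    s1 = s3 = s5 = 0
    for x in locators:
        s1 ^= x
        s3 ^= gf_pow(x, 3)
        s5 ^= gf_pow(x, 5)
    return s1, s3, s5


def locator_poly(s1, s3, s5):
    """Return (v, coefficients of the reciprocal locator, highest degree first)."""
    d3 = gf_pow(s1, 3) ^ s3
    d5 = gf_pow(s1, 5) ^ s5
    if s1 == 0 and s3 == 0 and s5 == 0:
        return 0, [1]
    if s1 != 0 and d3 == 0 and d5 == 0:
        return 1, [1, s1]
    if s1 != 0 and d3 != 0 and gf_mul(s1, d5) == gf_mul(s3, d3):
        return 2, [1, s1, gf_mul(d3, gf_inv(s1))]
    if d3 != 0:
        b = gf_mul(gf_mul(gf_mul(s1, s1), s3) ^ s5, gf_inv(d3))
        return 3, [1, s1, b, gf_mul(s1, b) ^ d3]
    return None, []  # none of the four cases applies


def roots(coeffs):
    """Nonzero roots by exhaustive search (Horner evaluation)."""
    found = []
    for x in range(1, SIZE):
        acc = 0
        for c in coeffs:
            acc = gf_mul(acc, x) ^ c
        if acc == 0:
            found.append(x)
    return found
```

For an error pattern with locators X_1, ..., X_v, the roots recovered from the polynomial returned by `locator_poly` should be exactly those locators, since the reciprocal polynomial is the product of the factors (x + X_i).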
It remains to determine the roots of $\tilde{\sigma}(x)$. For v = 1,
it is trivial to determine the error location. For v = 2 or
v = 3, lookup-based methods for solving the corresponding
quadratic and cubic equations are described in [17], [18]. In the
remainder of this section, we briefly describe these methods,
and discuss their data-flow.
For a quadratic equation $f_X(x) = x^2 + ax + b$ with $a \neq 0$,
substitute x = ay to obtain
\[
f_Y(y) = a^2 (y^2 + y + b/a^2).
\]
If $f_Y(r) = 0$ then $f_X(ar) = 0$. Thus the problem of finding
roots of $f_X(x)$ reduces to the problem of finding roots of the
suppressed quadratic $f_Y(y)$, which can be solved by lookup
using a table with $2^m$ entries, each of which is a pair of
elements in $\mathbb{F}_{2^m}$. Therefore, when v = 2, decoding requires
2m bits to be read from a lookup-table memory.
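The quadratic lookup can be sketched on a small field. The example below is an illustration under stated assumptions, not the implementation of [17], [18]: it uses F_{2^4} with the primitive polynomial x^4 + x + 1, and it fills the table by enumerating r and recording c = r^2 + r. The names are hypothetical.

```python
M = 4
SIZE = 1 << M
PRIM = 0b10011  # x^4 + x + 1, assumed primitive polynomial for this toy field


def gf_mul(a, b):
    """Multiply two elements of F_{2^M}."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & SIZE:
            a ^= PRIM
    return r


def gf_inv(a):
    """Inverse of a nonzero element, computed as a^(2^M - 2)."""
    r = 1
    for _ in range(SIZE - 2):
        r = gf_mul(r, a)
    return r


def build_quadratic_table():
    """table[c] = (r, r + 1) with r^2 + r = c, or None if no root exists."""
    table = [None] * SIZE
    for r in range(SIZE):
        c = gf_mul(r, r) ^ r
        if table[c] is None:
            table[c] = (r, r ^ 1)  # the two roots of y^2 + y + c differ by 1
    return table


def solve_quadratic(a, b, table):
    """Roots of x^2 + a x + b with a != 0, using the substitution x = a y."""
    c = gf_mul(b, gf_inv(gf_mul(a, a)))  # constant term b / a^2
    entry = table[c]
    if entry is None:
        return []
    return sorted(gf_mul(a, y) for y in entry)


if __name__ == "__main__":
    table = build_quadratic_table()
    x1, x2 = 3, 9                      # two hypothetical error locators
    a, b = x1 ^ x2, gf_mul(x1, x2)     # x^2 + a x + b = (x + x1)(x + x2)
    print(solve_quadratic(a, b, table))
```

The table has 2^m entries of two field elements each, matching the 2m bits read per lookup; only half of the entries are populated, because y^2 + y takes each of its values twice.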
Similarly, for a cubic equation $f_X(x) = x^3 + ax^2 + bx + c$,
substitute x = y + a to obtain
\[
f_Y(y) = y^3 + (a^2 + b) y + ab + c.
\]
Note that $y f_Y(y)$ is a linearized polynomial with respect
to $\mathbb{F}_2$ and hence the set of zeros of $y f_Y(y)$ is a vector space
over $\mathbb{F}_2$. In particular, the roots of $y f_Y(y)$, if distinct, are of
the form $\{0, r_1, r_2, r_1 + r_2\}$. Thus only $r_1$ and $r_2$ need to be
stored in the lookup table.
Two cases arise, depending on the value of $a^2 + b = D_5/D_3$.
If $D_5 = 0$, so that $a^2 + b = 0$, then $f_Y(y) = y^3 + ab + c$, and the
roots can be found by finding the cube roots of $ab + c = D_3$,
which requires lookup using a table with $2^m$ entries, each
of which is a pair of elements in $\mathbb{F}_{2^m}$. If $D_5 \neq 0$, so that
$a^2 + b \neq 0$, substitute $y = (a^2 + b)^{1/2} z$ to obtain
\[
f_Z(z) = (a^2 + b)^{3/2} \left( z^3 + z + (ab + c)/(a^2 + b)^{3/2} \right),
\]
where
\[
\frac{ab + c}{(a^2 + b)^{3/2}} = \left( \frac{D_3^5}{D_5^3} \right)^{1/2}.
\]
The roots of the suppressed cubic $f_Z(z)$ can be found by
lookup using a table with $2^m$ entries, each of which is a pair of
elements in $\mathbb{F}_{2^m}$. Therefore, in either case, decoding requires
2m bits to be read from a lookup-table memory.
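Both cubic cases can be traced on a toy field. The sketch below is illustrative only: it assumes F_{2^4} with primitive polynomial x^4 + x + 1, builds the two tables by exhaustive search as a stand-in for precomputed memory contents, and stores two roots per entry with the third recovered as their sum. The names are hypothetical.

```python
M = 4
SIZE = 1 << M
PRIM = 0b10011  # x^4 + x + 1, assumed primitive polynomial for this toy field


def gf_mul(a, b):
    """Multiply two elements of F_{2^M}."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & SIZE:
            a ^= PRIM
    return r


def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r


def gf_inv(a):
    return gf_pow(a, SIZE - 2)


def gf_sqrt(a):
    return gf_pow(a, SIZE // 2)  # a^(2^(M-1)) squares to a


def build_tables():
    """cubic[k]: two roots of z^3 + z + k; cube_root[k]: two cube roots of k."""
    cubic = [None] * SIZE
    cube_root = [None] * SIZE
    for k in range(SIZE):
        rc = [z for z in range(SIZE) if gf_pow(z, 3) ^ z ^ k == 0]
        if len(rc) == 3:
            cubic[k] = (rc[0], rc[1])
        rr = [z for z in range(SIZE) if gf_pow(z, 3) == k]
        if len(rr) == 3:
            cube_root[k] = (rr[0], rr[1])
    return cubic, cube_root


def solve_cubic(a, b, c, cubic, cube_root):
    """Roots of x^3 + a x^2 + b x + c when it has three distinct roots."""
    lin = gf_mul(a, a) ^ b    # a^2 + b, coefficient of y after x = y + a
    const = gf_mul(a, b) ^ c  # ab + c, constant term after x = y + a
    if lin == 0:
        entry, scale = cube_root[const], 1
    else:
        scale = gf_sqrt(lin)  # y = (a^2 + b)^(1/2) z
        k = gf_mul(const, gf_inv(gf_mul(lin, scale)))
        entry = cubic[k]
    if entry is None:
        return []
    r1, r2 = entry
    return sorted(gf_mul(scale, z) ^ a for z in (r1, r2, r1 ^ r2))
```

A cubic with known roots X_1, X_2, X_3 has a = X_1 + X_2 + X_3, b = X_1 X_2 + X_1 X_3 + X_2 X_3 and c = X_1 X_2 X_3, which gives a convenient way to check `solve_cubic` against a chosen triple of locators.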
Finally, for $n = n_1 = n_2$, the data-flow contribution of the
lookup-table-based decoding architecture is $4mvD/(nR)$. For n =
1000, m = 10, v = 4, R = 239/255 and D = 100 Gb/s, the
corresponding data-flow is 17.1 Gb/s, which is small relative
to the data-flow that arises due to those effects considered in
Section III-A2.
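As a quick arithmetic check of the quoted figure, the expression can be evaluated with the stated parameters. This sketch reads the data-flow expression as 4mvD/(nR), an interpretation that is consistent with the 17.1 Gb/s value; the function name is illustrative.

```python
def lookup_dataflow_gbps(m, v, d_gbps, n, rate):
    """Data-flow of the lookup-table decoder, read as 4 m v D / (n R)."""
    return 4 * m * v * d_gbps / (n * rate)


if __name__ == "__main__":
    # Parameters quoted in the text: n = 1000, m = 10, v = 4, R = 239/255, D = 100 Gb/s.
    print(lookup_dataflow_gbps(m=10, v=4, d_gbps=100, n=1000, rate=239 / 255))
```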
REFERENCES
[1] W. D. Grover, “Forward error correction in dispersion-limited lightwave
systems,” J. Lightw. Technol., vol. 6, no. 5, pp. 643–645, May 1988.
[2] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA:
MIT Press, 1963.
[3] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge,
UK: Cambridge University Press, 2008.
[4] I. B. Djordjevic, M. Arabaci, and L. L. Minkov, “Next generation
FEC for high-capacity communication in optical transport networks,”
J. Lightw. Technol., vol. 27, no. 16, pp. 3518–3530, Aug. 2009.
[5] T. Mizuochi, “Recent progress in forward error correction and its
interplay with transmission impairments,” IEEE J. Sel. Topics Quantum
Electron., vol. 12, no. 4, pp. 544–554, Jul. 2006.
[6] Z. Zhang, V. Anantharam, M. J. Wainwright, and B. Nikolic, “An
efficient 10GBASE-T ethernet LDPC decoder design with low error
floors,” IEEE J. Solid-State Circuits, vol. 45, no. 4, pp. 843–855, Apr.
2010.
[7] A. Darabiha, A. Chan Carusone, and F. R. Kschischang, “Power
reduction techniques for LDPC decoders,” IEEE J. Solid-State Circuits,
vol. 43, no. 8, pp. 1835–1845, Aug. 2008.
[8] T. Richardson, “Error floors of LDPC codes,” in Proc. 41st Allerton
Conf. Comm., Control, and Comput., Monticello, IL, 2003.
[9] T. Mizuochi et al., “Experimental demonstration of concatenated LDPC
and RS codes by FPGAs emulation,” IEEE Photon. Technol. Lett.,
vol. 21, no. 18, pp. 1302–1304, Sep. 2009.
[10] A. J. Feltström, D. Truhachev, M. Lentmaier, and K. S. Zigangirov,
“Braided block codes,” IEEE Trans. Inf. Theory, vol. 55, no. 6, pp.
2640–2658, Jun. 2009.
[11] W. Zhang, M. Lentmaier, K. S. Zigangirov, and D. J. Costello Jr.,
“Braided convolutional codes: A new class of turbo-like codes,” IEEE
Trans. Inf. Theory, vol. 56, no. 1, pp. 316–331, Jan. 2010.
[12] C. P. M. J. Baggen and L. M. G. M. Tolhuizen, “On diamond codes,”
IEEE Trans. Inf. Theory, vol. 43, no. 5, pp. 1400–1411, Sep. 1997.
[13] T. Fuja, C. Heegard, and M. Blaum, “Cross parity check convolutional
codes,” IEEE Trans. Inf. Theory, vol. 35, no. 6, pp. 1265–1276, Nov.
1989.
[14] A. D. Wyner and R. B. Ash, “Analysis of recurrent codes,” IEEE Trans.
Inf. Theory, vol. 9, no. 3, pp. 143–156, 1963.
[15] G. D. Forney Jr., “Burst-correcting codes for the classic bursty channel,”
IEEE Trans. Commun., vol. 19, no. 5, pp. 772–781, Oct. 1971.
[16] I. S. Reed and X. Chen, Error-Control Coding for Data Networks.
Boston, MA: Kluwer Academic Publishers, 1999.
[17] R. T. Chien, B. D. Cunningham, and I. B. Oldham, “Hybrid methods
for finding roots of a polynomial with application to BCH decoding,”
IEEE Trans. Inf. Theory, vol. 15, pp. 329–335, Mar. 1969.
[18] E. R. Berlekamp, H. Rumsey, and G. Solomon, “On the solution of
algebraic equations over finite fields,” Inform. Contr., vol. 10,
pp. 553–564, 1967.
