Abstract. A low-complexity method for resolving stall patterns when decoding staircase codes is proposed. Stall patterns are the dominating contributor to the error floor in the original decoding method. Our improvement is based on locating stall patterns by intersecting non-zero syndromes and flipping the corresponding bits. The approach effectively lowers the error floor of decoding staircase codes. This allows for a new range of block sizes to be considered for optical communication at a certain rate or, alternatively, a significantly decreased error floor for the same block size. Further, an improved analysis of the error floor behavior is introduced which provides a more accurate estimation of the contributions to the error floor.
Introduction
Staircase codes were introduced by Smith et al. in [1] and are a powerful code construction for error correction in optical communication systems. Staircase codes are based on a spatially-coupled binary BCH component code. With performance close to capacity for the binary symmetric channel (BSC) and decoder complexity lower than a comparable LDPC code, staircase codes provide a cost-efficient alternative to soft-decision decoding of LDPC codes. As shown in [2], staircase codes perform well for a multitude of different parameters. However, the usability of staircase codes is limited by the requirement of optical communication systems to guarantee an error floor below 10^{-15}, which allows small block sizes of the staircase code only at relatively low code rates.
Similar to trapping sets in decoding LDPC codes, certain constellations of errors, called stall patterns, cannot be resolved by the component codes of the staircase code. A strategy that enables the decoder to resolve stall patterns will improve the performance in the error floor region and potentially allow for more efficient decoding, as smaller block sizes can be used. For the structurally closely related product and half-product codes, several approaches for resolving stall patterns have been proposed. In [6, 9], resolving stall patterns by erasure decoding is considered. However, this approach requires additional hardware, offers no advantage in stall-pattern resolving capability, and for larger stall patterns is even outperformed by lower-complexity approaches based on bit-flipping [7, 8, 4]. To evaluate the performance of a code with given parameters, an analysis of the error floor is required. While [7, 4] offer an analysis based on exhaustive search, this work introduces a new analytical approach devised by relating the problem of counting stall patterns to the numerical problem of finding the number of binary matrices with certain row and column weights [5].
In this paper, we present an improved decoder for staircase codes that locates stall patterns and resolves many of them by adapting the concept of bit-flipping known from product codes. In particular, our method is guaranteed to correct all stall patterns when the numbers of involved columns and rows are each smaller than the minimum distance of the component BCH code and no undetected error events occur. Another contribution of this work is a new estimation of the error floor that is significantly more accurate than the one from [1]. Finally, we present simulation results for a staircase code with a quarter of the block size compared to the scheme of [1], which reaches BER_out = 10^{-15} at ∼ 1 dB from BSC capacity at the cost of a tiny rate loss ( are codewords of the BCH code, for all i ≥ 1. An illustration is given in Fig. 1.
Sliding-Window Decoding
Decoding is performed by hard-decision BCH decoders operating iteratively on a sliding window of appropriate size W. A window comprising B_i to B_{i+W-1} is decoded by first decoding the BCH codewords spanning [B_i^T B_{i+1}], followed by the codewords spanning [B_{i+1}^T B_{i+2}], until the last block B_{i+W-1} of the window is reached. Then the decoder returns to [B_i^T B_{i+1}] and the process is repeated. When no more errors are detected in the last block of the window or a fixed maximum number of iterations is reached, the decoder declares B_i as decoded, slides the window by one block and repeats the process for the new window comprising B_{i+1} to B_{i+W}.
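The schedule described above can be sketched as follows. This is a minimal illustration of the visiting order only: the callbacks `decode_pair` and `last_block_clean` are hypothetical placeholders for the component BCH decoding of the codewords spanning two neighbouring blocks and for the syndrome check of the last block, neither of which is specified here.

```python
from typing import Callable, List, Tuple


def window_schedule(i: int, W: int) -> List[Tuple[int, int]]:
    """Block pairs (j, j+1) visited in one pass over the window B_i .. B_{i+W-1}."""
    return [(j, j + 1) for j in range(i, i + W - 1)]


def decode_window(
    i: int,
    W: int,
    decode_pair: Callable[[int, int], None],    # hypothetical: decodes codewords spanning B_j, B_{j+1}
    last_block_clean: Callable[[int], bool],    # hypothetical: True if block index has no detected errors
    max_iters: int,
) -> int:
    """Repeat passes over the window until the last block is clean or
    max_iters passes were made. Returns the number of passes performed."""
    for passes in range(1, max_iters + 1):
        for j, k in window_schedule(i, W):
            decode_pair(j, k)
        if last_block_clean(i + W - 1):
            return passes
    return max_iters
```

After `decode_window` returns, block B_i would be declared decoded and the same routine called again with `i + 1`.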
Stall Patterns and Known Error Floor Analysis
The [n, k] BCH component code has (unique) error-correcting capability t, i.e. if t + 1 errors affect the codeword, the decoder is not able to resolve them.
A stall pattern for the staircase code is a set of erroneous bit positions such that each erroneous row and each erroneous column contains at least t + 1 erroneous bits. The minimal number of rows K and columns L involved in such a pattern is K = L = t + 1 and the minimal number of errors is ε = (t + 1)^2. For such minimal stall patterns (compare Fig. 2), every intersection of an involved row and an involved column is an erroneous bit.
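The definition can be checked mechanically. The sketch below is a simplification under stated assumptions: the error pattern is given as a single binary array (in the staircase code, rows and columns actually span two neighbouring blocks), and the helper names `is_stall_pattern` and `minimal_stall_pattern` are illustrative, not taken from the paper.

```python
def is_stall_pattern(errors, t):
    """True if the binary matrix `errors` is non-empty and every row and
    column that contains an error contains at least t + 1 errors."""
    row_w = [sum(row) for row in errors]
    col_w = [sum(col) for col in zip(*errors)]
    if sum(row_w) == 0:
        return False
    return all(w == 0 or w >= t + 1 for w in row_w + col_w)


def minimal_stall_pattern(m, rows, cols):
    """m x m error matrix with an error at every intersection of `rows` and `cols`."""
    return [[int(i in rows and j in cols) for j in range(m)] for i in range(m)]


t = 2
pattern = minimal_stall_pattern(6, rows={0, 2, 5}, cols={1, 3, 4})
print(is_stall_pattern(pattern, t))   # True, with (t + 1)^2 = 9 errors

pattern[0][1] = 0                     # remove one error: row 0 now holds only t errors
print(is_stall_pattern(pattern, t))   # False, row 0 is correctable again
```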
If more than t + 1 rows or columns are part of the stall pattern, it is possible that not every bit in the intersection of involved rows and columns is in error. The weight of the error vector of each involved row or column has to be at least t + 1 and therefore the number of errors ε in a (K, L) stall pattern is bounded by
$$ (t + 1) \cdot \max\{K, L\} \le \epsilon \le K \cdot L. \tag{1} $$
Fig. 3 shows a non-minimal (4, 4)-stall pattern with t = 2 and ε = ε_min = 12. The error floor estimation given in [1] is based on the assumption that the dominating contributors to the error floor are stall patterns. It is estimated by enumerating the number of possible stall patterns and weighting each pattern with the probability that the corresponding positions are in error. A stall pattern is associated with the block B_i with the lowest index that contains at least one of its errors. The number of combinations of K rows and L columns such that the stall pattern belongs to a certain block is
Given the rows and columns, the number of different ways to distribute errors within their intersections is denoted by N_{K,L}^{ε} and is overestimated by (see [1]):
With (2) and (3), the contribution of (K, L)-stall patterns to the BER_out in the error floor region can be overbounded by weighting each pattern with the probability that the corresponding positions are in error, to obtain
where p is the crossover probability of the BSC and ξ is an additional correction factor, see [1] for details.
Resolving Stall Patterns

The Bit-Flip Operation
Assume that all errors that are not part of a stall pattern are resolved by the regular sliding-window decoding procedure (see Section 2.2) and hence only stall patterns remain. In a minimal stall pattern, each involved row and column contains exactly t + 1 erroneous bits and therefore results in a non-zero syndrome, since the component code has distance d ≥ 2t + 2. Thus, a minimal stall pattern can be resolved by flipping each bit at the intersection of the words with non-zero syndromes.
Definition 1 (Bit Flipping). Consider a staircase code with the BCH component code C and let
be the masking matrix and let B_i^{(z)} be block B_i after z decoding iterations. Define the operation bit-flip as
The matrix M_i constructs a stall pattern of given size and maximum weight. It completely resolves minimal stall patterns, but it is not always successful for non-minimal stall patterns.
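The idea of intersecting non-zero syndromes can be illustrated with a short sketch. Several simplifications are assumptions made here and not part of the definition: the all-zero codeword is taken as transmitted, so the block content equals its error pattern; the pattern lives in a single array; and `nonzero_syndrome_flags` is a stand-in that flags a row or column whenever it contains an error, which matches a real syndrome check only when no undetected error events occur. The mask is modelled as the AND of the row flag and the column flag.

```python
def nonzero_syndrome_flags(errors):
    """Stand-in for syndrome computation (assumption: no undetected errors,
    so a word has a non-zero syndrome iff it contains at least one error)."""
    row_flags = [int(any(row)) for row in errors]
    col_flags = [int(any(col)) for col in zip(*errors)]
    return row_flags, col_flags


def bit_flip(block, row_flags, col_flags):
    """XOR the block with the mask M[i][j] = row_flags[i] AND col_flags[j]."""
    return [
        [bit ^ (row_flags[i] & col_flags[j]) for j, bit in enumerate(row)]
        for i, row in enumerate(block)
    ]


# Minimal stall pattern for t = 2: errors at all intersections of 3 rows and 3 columns.
m = 6
rows, cols = {0, 2, 5}, {1, 3, 4}
stalled = [[int(i in rows and j in cols) for j in range(m)] for i in range(m)]

resolved = bit_flip(stalled, *nonzero_syndrome_flags(stalled))
print(sum(map(sum, stalled)), sum(map(sum, resolved)))  # 9 0
```

For a minimal stall pattern the mask coincides with the error pattern, so a single flip removes all errors.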
Analysis of Bit-Flip Without Undetected Error Events
First assume that no undetected error events occur. By undetected error event, we refer to an incorrectly decoded component word with all-zero syndrome.
In general, for a non-minimal stall pattern of B 
This mask therefore reconstructs a stall pattern of correct size, but of maximum weight (compare (1)), which might introduce ε̄ new errors, where
$$ \bar{\epsilon} = K \cdot L - \epsilon. $$
For example, for the (4, 4) non-minimal stall pattern of Fig. 4, the bit-flip operation resolves the 12 erroneous bits of the stall pattern, but introduces 4 new errors, as indicated by red markers. These 4 new errors can then be corrected by a usual sliding-window decoding iteration.
Proof. When K, L < 2(t + 1), the weight of every involved row r is t + 1 ≤ w_H(r) ≤ L, where the lower bound is given by the definition of a stall pattern. When the L involved bits of that row are flipped, the new weight of the row is
$$ L - w_H(r) \le L - (t + 1) \le t, \tag{8} $$
which can be corrected by the component codes in a normal sliding-window iteration.
□
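The argument of the proof can be checked numerically on a small case. The sketch below uses one possible (4, 4) stall pattern with t = 2 and 12 errors (an illustrative choice, not necessarily the pattern drawn in Fig. 4), assumes the all-zero codeword and no undetected error events, and flips every intersection of an involved row and an involved column.

```python
def flip_involved(errors):
    """Flip every bit at the intersection of a row and a column that contain errors."""
    row_flags = [any(row) for row in errors]
    col_flags = [any(col) for col in zip(*errors)]
    return [
        [bit ^ int(row_flags[i] and col_flags[j]) for j, bit in enumerate(row)]
        for i, row in enumerate(errors)
    ]


def line_weights(errors):
    """Hamming weights of all rows and of all columns."""
    return [sum(row) for row in errors], [sum(col) for col in zip(*errors)]


t = 2
K = L = 4  # K, L < 2(t + 1) = 6
# Every row and column has weight 3 = t + 1; only the diagonal is error free.
pattern = [[0 if i == j else 1 for j in range(L)] for i in range(K)]

residual = flip_involved(pattern)
row_w, col_w = line_weights(residual)
print(sum(map(sum, pattern)), sum(map(sum, residual)))  # 12 4
print(max(row_w + col_w) <= t)                          # True
```

The 4 residual errors equal K · L − 12, and each remaining row and column has weight L − w_H(r) = 1 ≤ t, so an ordinary component decoding pass can remove them.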
For larger K, L, the restriction of (8) no longer holds in general. Clearly, if ε ≥ d_min^2, it is possible that a stall pattern is undetectable. However, this case is unlikely and the staircase code cannot be protected against it. The more likely problem arises when K ≥ 2(t + 1) and L ≥ 2(t + 1), but ε < d_min^2. Then the ε̄ previously error-free positions can form a stall pattern of size (K, L), causing the bit-flip operation to result in another stall pattern. This case can be avoided with high probability by flipping only a single column if K ≥ 2(t + 1) and L ≥ 2(t + 1). Then, with high probability, some rows can be decoded, leading to decodable columns and eventually to the stall pattern being resolved.
Analysis of Bit-Flip With Undetected Error Events
Assume that undetected error events occur, i.e., there is an incorrect component word with all-zero syndrome. This component word is clearly a codeword of the component code, but the blocks do not give the correct staircase codeword. As a result, not all positions involved in a stall pattern can be located and (8) does not necessarily hold. When an extended BCH code of minimum distance 2t + 2 is used as component code, the lowest weight of a component error that can cause an undetected error event is t + 2. Since it is more likely that an incorrect codeword is at distance t from the received word than at a smaller distance, performing iterations that correct fewer than t errors resolves many errors inserted due to undetected error events by the adjacent component code decoders, without them being inserted again. However, multiple equal error vectors can prevent this, as the inserted errors appear in the same positions. It is therefore hard to give a theoretical analysis of stall patterns with undetected error events, but our simulations (Section 4.2) show that many of these patterns can still be resolved. Fig. 5 shows an example of an uncorrectable (4, 3) stall pattern with three equal error vectors in the columns and additional undetected erroneous rows that cannot be resolved at all. While the decoder detects that the block is not valid, the columns cannot be located because their syndromes are zero and the operation bit-flip fails. In this case the stall pattern is neither detectable nor correctable.
Improved Error-Floor Analysis
An Improved Analysis
The error floor of the standard sliding-window decoder (see Section 2.2) is dominated by minimal stall patterns [1] . As shown in Section 3.1, the improved decoder is able to solve all minimal stall patterns and its error floor is dominated by the remaining unsolvable stall patterns.
Since the success of the resolving strategy described in Section 3.1 depends on K, L and also ε, we analyze each summand from (4) separately:
Since minimal stall patterns are no longer dominating, this estimation is not accurate, due to the overestimation of N_{K,L}^{ε}.
The problem of finding the number of stall patterns of weight ε within K rows and L columns is equivalent to the problem of finding the number of binary K × L matrices with weight ε and a given weight in each row and column. A solution to this combinatorial problem is given in [5]. The function takes a vector r = [r_1, r_2, ..., r_K] containing the row weights as well as s = [s_1, s_2, ..., s_L] containing the column weights and returns the number of distinct binary matrices that meet the weight restrictions on the rows and columns, denoted by A(r, s). By definition, for stall patterns it holds that
$$ r_i \ge t + 1 \ \ \forall i \in \{1, \dots, K\}, \qquad s_j \ge t + 1 \ \ \forall j \in \{1, \dots, L\}. $$
Since every error has to be in one of the rows and one of the columns, \sum_{i=1}^{K} r_i = ε and \sum_{j=1}^{L} s_j = ε. Let R be the set of all pairs of vectors such that these conditions on r and s hold. The exact number of stall patterns of weight ε, given the rows and columns, is
where D(·) is a function devised to recursively iterate through all vector pairs of R and return the result of the summation (see Lemma 1 in the appendix). Thus, the contribution of stall patterns of a given size to the error floor can be stated without overestimating the number of unique stall patterns.
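For very small parameters, the exact count can be cross-checked by brute force. The sketch below does not implement the recursive function D(·) or the formula from [5]; it simply enumerates all placements of ε errors in a K × L array and keeps those in which every row and every column has weight at least t + 1, which is equivalent to summing A(r, s) over all admissible weight vectors. The function name is illustrative.

```python
from itertools import combinations


def count_stall_patterns(K, L, eps, t):
    """Number of binary K x L matrices of weight eps in which every row and
    every column has weight at least t + 1 (exhaustive enumeration)."""
    cells = [(i, j) for i in range(K) for j in range(L)]
    count = 0
    for chosen in combinations(cells, eps):
        row_w = [0] * K
        col_w = [0] * L
        for i, j in chosen:
            row_w[i] += 1
            col_w[j] += 1
        if min(row_w) >= t + 1 and min(col_w) >= t + 1:
            count += 1
    return count


print(count_stall_patterns(3, 3, 9, t=2))   # 1: the minimal stall pattern
print(count_stall_patterns(4, 4, 12, t=2))  # 24: one error-free cell per row and column
```

The value 24 for the (4, 4) case with ε = 12 follows because the 4 error-free cells must form a permutation matrix, of which there are 4! = 24. The enumeration grows as the binomial coefficient of K · L over ε, so it is only practical as a sanity check for small sizes.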
Theorem 2. Consider a staircase code of block size m × m. Given ξ, the contribution of stall patterns of size (K, L) and weight ε to the error floor is given by
The accuracy of (11) is determined by ξ, which is usually approximated by an estimation or simulation. This function determines the contribution to the error floor for all stall patterns, assuming they cannot be resolved by the techniques introduced in Section 3.1. Note that a large percentage of these "unsolvable" patterns is in fact resolvable, but due to the wide variety of combinations that can occur, it is difficult to obtain analytical results.
Simulation Results
To show the improvement in performance of our new technique, simulation results for a specific staircase code are presented. The parameters are chosen such that the encoder and decoder can employ a simpler structure than in [1], by using less memory and lower-complexity component codes. The size of the blocks is quartered compared to [1] by using an [n = 510, k = 491] extended t = 2 error-correcting BCH code, resulting in block size
Decoding with the regular decoder, as described in Section 2.2, results in an error floor at ∼ 2 · 10^{-10} for p = 5 · 10^{-3}, which is well above the desired error floor of 10^{-15}. This error floor was found by simulation, using a sliding window of size W = 7. The variable ξ, which adjusts for the difference between the estimated error floor and the simulated probability of a stall pattern occurring, was determined to be ξ = 1.6 · 10^{-3}. To find the capability of the improved decoder to solve stall patterns, a dedicated channel was implemented. Stall patterns of a certain size and weight are inserted at a random position within two blocks. At the output, it is determined whether the stall pattern was resolved or not. It was assumed that the decoder was able to resolve all surrounding errors. For this reason and to avoid unwanted effects, such as the resolving of a stall pattern through an undetected error event, no errors other than the ones belonging to the stall pattern were inserted. The simulation results are given in Table 1.
In Columns 1-3 of Table 1, the size and weight of the respective stall patterns are given. The fourth column gives the percentage of stall patterns that the improved decoder was able to solve. For each size, more than 2000 stall patterns were inserted as described above. If any errors in the corresponding blocks were observed at the decoder output, the stall pattern was counted as unsolved. The remaining columns give the contribution to the error floor, as found by applying different estimations. The column labeled P_{C,old} gives the estimated contribution of stall patterns of the respective size, obtained by applying the estimation from [1], shown in (9). In column P_{C,new}, the corresponding values obtained by (11) are given. The column P_{floor,new} shows the contribution for the improved decoder, which resolves the respective percentage of stall patterns. The new dominating contributors are the stall patterns of size (3, 4) and (4, 3), mainly due to the inability of the decoder to solve a large percentage of these. An illustration of such an unsolvable stall pattern was given in Fig. 5. Our simulations show that the resolving strategy of only partially inverting large stall patterns (i.e., those which are not covered by Theorem 1) resolves a large percentage of these (e.g., 99.9% of (6, 6) stall patterns with ε_min = 18). The error floor of the staircase code with the given parameters is expected to be at BER_out ≈ 9 · 10^{-15} for a BSC crossover probability of p = 5 · 10^{-3}. For comparison, the error floor of the same staircase code employing the regular decoder lies at BER_out ≈ 2 · 10^{-10}. The gap to BSC capacity for the given rate is estimated at 1 dB, and the code achieves a net coding gain (NCG) of 9.16 dB at a BER_out of 10^{-15}. For comparison, the code presented in [1] achieves an NCG of 9.41 dB and a gap to capacity of 0.56 dB, however operating on much larger blocks.
Fig. 6 shows a comparison between the expected performance when using the regular decoder and the improved decoder. Note that the simulations on the ability of the improved decoder to resolve stall patterns were performed under the assumption that the decoder is able to isolate each stall pattern, i.e., resolve all errors which are not part of stall patterns. Furthermore, the assumption made in [1], that the stall patterns dominate the error floor, is adopted. Assuming that the window size of the regular decoder is chosen such that only the first (lowest indexed) block is free of all correctable errors, the sliding window size of the improved decoder has to be slightly larger to obtain a sufficient number of blocks that can be considered to be error-free with the exception of stall patterns.
Conclusion
Staircase codes are a powerful code construction for optical networks which perform close to the BSC capacity. The decoding based on a hard-decision component code provides efficient implementations, even at high data rates. However, the usable block size is limited by requiring an error floor of 10^{-15} in optical communications. In this work, an improved decoder is proposed, lowering the error floor significantly while increasing complexity only marginally. This improvement enables the use of smaller block sizes at comparable rates, which effectively lowers the memory requirement and the component decoder complexity. Furthermore, an analysis of the error floor was presented, resulting in a more accurate estimation, especially for the improved decoder.
Future work of interest includes an FPGA implementation capable of simulating the code with the given parameters, down to its estimated error floor to show that assumptions made on the capability of the decoder to correct errors surrounding a stall pattern hold.
