This letter presents a "before convergence" early stopping criterion for the LDPC decoder defined in the second generation of DVB standards. The idea is to stop the decoding process once the estimated number of remaining errors is within the error-correction capability of the outer BCH decoder used in the DVB-S2, -T2 and -C2 standards. Simulations show that the average number of iterations is reduced by up to 26% compared with the classical early stopping criterion.
Introduction:
The DVB-S2 [1], -C2, -T2 and -S2X standards (called DVB-X2 in the sequel) were ratified by the DVB committee to specify the broadcast of TV through different channels (satellite, cable and terrestrial). The coding part of these standards contains a common inner Low Density Parity Check (LDPC) encoder and an outer Bose-Chaudhuri-Hocquenghem (BCH) encoder. Similarly, the Chinese Mobile Multimedia Broadcasting (CMMB) standard [2] defines an LDPC encoder and an outer Reed-Solomon (RS) encoder. The use of an Early Stopping Criterion (ESC) avoids wasting time and energy on useless decoding iterations once the inner LDPC decoder has converged to a codeword, i.e., when all the parity checks are satisfied. In [3], Kienle et al. propose to extend the use of the ESC to discriminate between decodable and non-decodable blocks. In [5], the Hard-Decision-Aided (HDA) method, initially designed for Turbo codes [4], was implemented for LDPC decoders. The HDA method is simple to implement but suffers from performance degradation. The key idea of this letter is to replace the ESC by a "Before Convergence" Early Stopping Criterion (BC-ESC), i.e., to stop the iterative decoding process as soon as the estimated number of remaining errors in the information bits is less than or equal to the error-correction capability of the outer code.
Principle of the "Before Convergence" Stopping Criterion:
The DVB-X2 standards feature a powerful coding scheme based on an outer BCH encoder and an inner LDPC encoder. The outer code is introduced to avoid error floors at low bit-error rates. One key feature of BCH codes is the ease with which they can be decoded. Another advantage is their capability to correct all patterns of t or fewer errors among the information bits, with t = 12 for code rates r = 1/4, 1/3, 2/5, 1/2, 3/5, 3/4 and 4/5; t = 10 for code rates 2/3 and 5/6; and t = 8 for code rates 8/9 and 9/10. In the DVB-X2 LDPC codes, all redundant bits have a variable node degree equal to two, whereas the information bits have a degree greater than or equal to three. As a consequence, information bits tend to converge faster than redundant bits. If the errors are all located on the redundancy bits, a non-null syndrome can still be associated with error-free information bits. Moreover, since the BCH code can correct t errors in the information bits, it is possible to stop the decoding process as soon as the number of errors e in the information bits is bounded by t (e ≤ t), with an arbitrary number of errors in the redundant part. In the sequel, the parity check matrix contains m parity checks of degree $d_c$ and n variables. For a given Check Node (CN) c, V(c) represents the set of Variable Nodes (VNs) connected to CN c, and V(c)/v represents V(c) excluding VN v. The messages from check to variable and from variable to check are denoted $M_{c \to v}$ and $M_{v \to c}$, respectively; the soft output of variable v is denoted SO(v).
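As an illustration of the stopping condition e ≤ t, the following minimal Python sketch tabulates the values of t listed above and checks whether a decoded frame could be handed to the BCH decoder. It is a genie-style check that assumes the transmitted bits are known, which is only possible in simulation; the names `BCH_T` and `genie_can_stop`, and the convention that the first k positions hold the information bits, are assumptions made for this example.

```python
from fractions import Fraction

# BCH error-correction capability t for each LDPC code rate, as listed in the text.
BCH_T = {
    **{Fraction(n, d): 12 for n, d in [(1, 4), (1, 3), (2, 5), (1, 2), (3, 5), (3, 4), (4, 5)]},
    **{Fraction(n, d): 10 for n, d in [(2, 3), (5, 6)]},
    **{Fraction(n, d): 8 for n, d in [(8, 9), (9, 10)]},
}


def genie_can_stop(decoded_bits, sent_bits, k, rate):
    """Hypothetical genie check: True when the k information bits hold at most t errors.

    Errors located in the redundancy part (positions k and above) are ignored,
    since the outer BCH code only has to correct the information bits.
    `rate` may be a string such as "1/2" or a Fraction.
    """
    e = sum(d != s for d, s in zip(decoded_bits[:k], sent_bits[:k]))
    return e <= BCH_T[Fraction(rate)]
```

With this check, a frame whose remaining errors all lie in the redundancy part is accepted even though its LDPC syndrome is non-null.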
When the number of errors is low, the number of unsatisfied checks should also be low. From this observation, a "Before Convergence" Early Stopping Criterion can be derived. After each decoding iteration, the number $\tau_o$ of unsatisfied CNs is computed by counting the number of ones in the syndrome vector $(S_o(c),\ c = 0, \dots, m-1)$, with $S_o(c)$ computed using the sum over GF(2) as
$$S_o(c) = \bigoplus_{v \in V(c)} \mathrm{sign}(SO(v)) \qquad (1)$$
where the sign function is the hard decision associated with the soft value x: sign(x) = 0 if x > 0, 1 otherwise. When $\tau_o$ does not exceed a threshold $T_o$, the decoding process is stopped. Note that if $\tau_o = 0$, the decoded message is a codeword. In a flooding scheduling architecture, an iteration consists of updating all the check nodes (and thus computing the $S_o$ values), then all the variable nodes. Unfortunately, the $S_o$ values are not computed in a layered decoding algorithm.
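The syndrome count and threshold test described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the representation of the parity check matrix as a list of variable-node index lists are assumptions made for this example, not part of the decoder architecture.

```python
def hard(x):
    """Hard decision used in the text: 0 if x > 0, 1 otherwise."""
    return 0 if x > 0 else 1


def syndrome_weight(soft_outputs, checks):
    """Count the unsatisfied parity checks (tau_o).

    soft_outputs[v] is SO(v); checks[c] lists the variable nodes in V(c).
    """
    tau = 0
    for vns in checks:
        s = 0
        for v in vns:
            s ^= hard(soft_outputs[v])  # sum over GF(2)
        tau += s
    return tau


def should_stop(soft_outputs, checks, threshold):
    """Stop decoding when the number of unsatisfied checks is at most the threshold."""
    return syndrome_weight(soft_outputs, checks) <= threshold
```

With a threshold of 0, `should_stop` reduces to the classical ESC, which stops only when a codeword has been reached.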
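Syndrome computation in layered decoders: Let us recall the principle of layered decoding [6] and the associated stopping criteria. One iteration consists of sequentially processing all the CNs. The update of the VNs connected to a given CN is done serially in three steps. First, the $M_{v \to c}$ are calculated:
$$M_{v \to c} = SO(v) - M^{old}_{c \to v} \qquad (2)$$
with $M^{old}_{c \to v}$ set to 0 during the first iteration and SO(v) initialized with the intrinsic information. The second step is the serial $M_{c \to v}$ update. For implementation convenience, the sign and the absolute value of the message are updated separately:
$$\mathrm{sign}(M^{new}_{c \to v}) = \bigoplus_{v' \in V(c)/v} \mathrm{sign}(M_{v' \to c}) \qquad (3)$$
$$|M^{new}_{c \to v}| = f\Big(\sum_{v' \in V(c)/v} f(|M_{v' \to c}|)\Big) \qquad (4)$$
where $f(x) = -\ln \tanh(x/2)$. The third step is the calculation of the SO values:
$$SO^{new}(v) = M_{v \to c} + M^{new}_{c \to v} \qquad (5)$$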
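The three steps of the layered update can be illustrated with a small floating-point Python sketch for a single check node. It assumes the usual sum-product function f(x) = -ln tanh(x/2), clamps its argument to avoid a logarithm of zero, and uses hypothetical names and data structures; a hardware implementation would be serial and fixed-point.

```python
import math


def f(x):
    """f(x) = -ln(tanh(x / 2)); the argument is clamped to stay strictly positive."""
    x = max(x, 1e-12)
    return -math.log(math.tanh(x / 2.0))


def hard(x):
    """Hard decision used in the text: 0 if x > 0, 1 otherwise."""
    return 0 if x > 0 else 1


def process_check_node(so, m_cv_old, vns):
    """Layered update of one check node.

    so: list of soft outputs SO(v), updated in place.
    m_cv_old: dict mapping v to the previous check-to-variable message (0 if absent).
    vns: list of variable nodes V(c) connected to this check node.
    Returns a dict mapping v to the new check-to-variable message.
    """
    # Step 1: variable-to-check messages.
    m_vc = {v: so[v] - m_cv_old.get(v, 0.0) for v in vns}

    # Step 2: sign and magnitude of each check-to-variable message,
    # computed from all the other variable nodes of the check.
    m_cv_new = {}
    for v in vns:
        others = [m_vc[u] for u in vns if u != v]
        sign_bit = 0
        for m in others:
            sign_bit ^= hard(m)
        magnitude = f(sum(f(abs(m)) for m in others))
        m_cv_new[v] = -magnitude if sign_bit else magnitude

    # Step 3: new soft outputs.
    for v in vns:
        so[v] = m_vc[v] + m_cv_new[v]
    return m_cv_new
```

For a check node of degree two, each variable simply receives the message sent by the other one, which gives a quick sanity check of the sketch.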
From these equations, the node processor architecture can be derived, as depicted in Fig. 1. The left adder of the architecture performs (2), while the right adder implements (5). The central part is in charge of the serial $M_{c \to v}$ updates, i.e., (3) and (4). The computation of $\tau_o$ requires copying all SO values at the end of every decoding iteration and processing them independently of the layered decoding process. To avoid freezing the layered decoding process while the syndrome is computed, additional hardware is required to process the syndrome in parallel. The drawbacks are a more complex design and the systematic waste of one decoding iteration before the ESC is computed (assuming that the syndrome computation takes as long as one decoding iteration). To overcome this problem, the syndrome can be computed on the fly during the decoding iteration. Two sub-optimal syndromes can be generated in the horizontal scheduling (or layered) architecture. First, the syndrome $S_l(c)$ can be computed on the fly using the current value of the soft information (5):
$$S_l(c) = \bigoplus_{v \in V(c)} \mathrm{sign}(SO^{new}(v)) \qquad (6)$$
The additional hardware required to compute $S_l(c)$ is shown in dashed lines in Fig. 1. $S_l(c)$ is set to zero when the check node c starts to be processed (this mechanism is not shown in Fig. 1). During $d_c$ clock cycles, $S_l(c)$ accumulates the signs of the $SO^{new}$ values to compute (6). The computation of $S_l$ requires less hardware than that of $S_o$; however, $S_l$ is sub-optimal because the SO values used in (6) may change sign several times during one iteration. The classical method to compute (3) for all $v \in V(c)$ is to use the fact that $M \oplus M = 0$ in GF(2), thus:
$$\mathrm{sign}(M^{new}_{c \to v}) = S_a(c) \oplus \mathrm{sign}(M_{v \to c}) \qquad (7)$$
with
$$S_a(c) = \bigoplus_{v' \in V(c)} \mathrm{sign}(M_{v' \to c}) \qquad (8)$$
Considering $M_{v' \to c}$ as an approximation of $SO(v')$ (see (2)), $S_a(c)$ can be used as an approximate version of the syndrome.
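The following Python sketch illustrates the two on-the-fly syndromes for one check node: the XOR of the signs of the variable-to-check messages, which also yields every outgoing sign through the GF(2) identity M ⊕ M = 0, and the XOR of the signs of the updated soft outputs. Function names and the list-based inputs are assumptions made for this example.

```python
def hard(x):
    """Hard decision used in the text: 0 if x > 0, 1 otherwise."""
    return 0 if x > 0 else 1


def approximate_syndrome_and_signs(m_vc):
    """Return (S_a, outgoing sign bits) for one check node.

    m_vc lists the variable-to-check messages of the check node.
    S_a is the XOR of all their signs. XOR-ing S_a with the sign of one message
    removes that message from the total, which gives the sign of the
    check-to-variable message sent back to the same variable node.
    """
    s_a = 0
    for m in m_vc:
        s_a ^= hard(m)
    signs_out = [s_a ^ hard(m) for m in m_vc]
    return s_a, signs_out


def on_the_fly_syndrome(so_new):
    """Return S_l for one check node from the updated soft outputs of its variable nodes."""
    s_l = 0
    for so in so_new:
        s_l ^= hard(so)
    return s_l
```

Since the first value is already needed to compute the outgoing signs, using it as a syndrome estimate requires no extra XOR tree in this sketch.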
This paper is a postprint of a paper submitted to and accepted for publication in Electronics Letters and is subject to Institution of Engineering and Technology Copyright. The copy of record is available at IET Digital Library Jan. 2015 Vol. 00 No. 00
Simulation results: A bit-true C simulation was performed using the architecture presented in [7]. For each code rate, the signal-to-noise ratio (SNR) is set to the value required by the standard for Quasi Error Free (QEF) transmission, and $N = 5 \times 10^6$ frames are simulated with $it_{max}$ iterations. A BC-ESC is said to be "standard compliant" if, among the N tested frames, it never stops the decoding process with more than t remaining errors. In the sequel, the subscript x denotes one of the three BC-ESC methods (x = o, l or a), and the average number of decoding iterations for a given BC-ESC method is denoted $A_x$.
Let $\tau_x(k, i)$ (respectively $e(k, i)$) be the number of unsatisfied checks (respectively remaining errors) for the k-th simulated codeword at iteration i ($i \le it_{max}$). Let $\mathcal{T}_x$ be the set of integers such that $T \in \mathcal{T}_x$ implies that, for all k = 1 . . . N − 1, i = 1 . . . $it_{max}$,
$$\tau_x(k, i) \le T \Rightarrow e(k, i) \le t \qquad (9)$$
The threshold $T_x$ is thus determined as $T_x = \max \mathcal{T}_x$. A two-dimensional BC-ESC involving both $\tau_l$ and $\tau_a$ can be defined by extending (9) to a dual-criteria BC-ESC. Let $\mathcal{T}_{l,a}$ be the set of couples $(T_l, T_a) \in \mathbb{N}^2$ such that, for all k = 1 . . . N − 1, i = 1 . . . $it_{max}$:
$$\big(\tau_l(k, i) \le T_l \ \text{and} \ \tau_a(k, i) \le T_a\big) \Rightarrow e(k, i) \le t \qquad (10)$$
At a given iteration, if $(T_l = \tau_l(k, i), T_a = \tau_a(k, i))$ belongs to $\mathcal{T}_{l,a}$, then, according to (10), the number of residual errors does not exceed t; the LDPC decoder can thus stop the iterative process and output the current hard decision to the outer BCH decoder. Fig. 2 represents the set $\mathcal{T}_{l,a}$. Table 1 shows $T_x$ and $A_x$ for different code rates. By convention, $A_x(T_x)$ indicates the average number of iterations when the BC-ESC $\tau_x \le T_x$ is used. $A_g$ is the minimum average number of iterations obtainable when a genie-aided BC-ESC stops the decoding process as soon as e ≤ t. For the state-of-the-art ESC $A_o$, one iteration is added to take into account the latency of the syndrome computation in a layered architecture.
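The threshold selection described above can be sketched in Python as a search over recorded simulation data. The record layout (one tuple per frame and iteration) and the function names are assumptions made for this illustration, and any numbers fed to it in an example are toy data, not results from the letter.

```python
def largest_safe_threshold(records, t):
    """Largest T such that tau <= T implies e <= t on every record.

    records: iterable of (tau, e) pairs, one per simulated frame and iteration.
    Returns -1 when even tau = 0 occurs with more than t errors, and None when
    no record has e > t (the data then gives no upper bound on T).
    """
    unsafe = [tau for tau, e in records if e > t]
    if not unsafe:
        return None
    return min(unsafe) - 1


def dual_criterion_is_safe(T_l, T_a, records, t):
    """True if stopping on (tau_l <= T_l and tau_a <= T_a) never leaves more than t errors.

    records: iterable of (tau_l, tau_a, e) triples.
    """
    return all(
        e <= t
        for tau_l, tau_a, e in records
        if tau_l <= T_l and tau_a <= T_a
    )
```

A returned value of -1 corresponds to a criterion that can never be used safely on its own, while the dual test may still accept the same syndrome value when it is combined with a bound on the second one.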
Note that, in [6] , convergence is detected when all HDA l (c) = 0 (no change of SO sign during two consecutive iterations, see Fig. 1 ) and when τ a = 0 (all check nodes are fulfilled). This ESC also requires an extra decoding iteration after convergence, again leading to an average number of decoding iterations equal to A o + 1.
For all code rates, $A_l$, $A_a$ and $A_{l,a}$ are below $A_o + 1$, except for code rates r = 2/3, 3/4 and 4/5, where $A_l(T_l) = it_{max}$. For these code rates, $T_l = -1$, which means that the stopping condition $\tau_l \le T_l$ of (9) is never fulfilled. In fact, as shown in Fig. 2, when $\tau_l = 0$ and $\tau_a > 180$, e can be greater than t, i.e., $\tau_l = 0$ alone is not a sufficient criterion to ensure that e ≤ t. The last line of Table 1 shows the reduction in the average number of iterations between the classical ESC ($A_o + 1$) and the proposed BC-ESC when the dual-criteria $A_{l,a}$ is used. The gain varies from 8% (high code rates) up to 26% for code rate 1/4. The performance of the genie-aided BC-ESC shows that there is still significant room for reducing the average number of decoding iterations at low code rates (for code rate r = 1/4, $A_g = 15.7$ while $A_{l,a} = 20.1$).
Combined with an input buffer, the BC-ESC can be used to increase the decoding throughput [8]. It can also be used to reduce the average energy required to decode a codeword.
Conclusion: This letter shows that stopping the decoding process of an inner LDPC code as soon as the estimated number of remaining errors is within the error-correction capability of the outer decoder can significantly reduce the average number of decoding iterations without any performance degradation. A low-complexity dual-criteria BC-ESC has also been proposed for the DVB-S2, -T2 and -C2 standards. Compared with the classical ESC, the BC-ESC reduces the average number of iterations by 8% to 26%, depending on the code rate. The proposed BC-ESC can be used for energy saving or for increasing the average decoding throughput.
