Abstract: We give an architecture of a storage system consisting of a storage medium made of unreliable memory elements and an error correction circuit made of a combination of noisy and noiseless logic gates. This system is capable of retaining the stored information with a lower probability of error than a storage system whose correction circuit is made entirely of noiseless logic gates. Our correction circuit is based on the iterative decoding of low-density parity check codes, and it uses the positive effect of errors in logic gates to correct errors in memory elements. In the spirit of Marcus Tullius Cicero's Clavus clavo eicitur (one nail drives out another), the proposed storage system operates on the principle error errore eicitur: one error drives out another. The randomness that is present in the logic gates makes this class of decoders superior to its noiseless counterparts. Moreover, random perturbations do not require any additional computational resources, as they are inherent to the unreliable hardware itself. To utilize the benefits of logic gate failures, our correction circuit relies on two key novelties: a mixture of reliable and unreliable gates, and decoder rewinding. We present a method based on absorbing Markov chains for the probability of error analysis, and explain how the randomness in the variable and check node update functions helps a decoder escape from local minima associated with trapping sets.
Index Terms: Faulty hardware, Gallager-B decoding, LDPC codes, Markov chains, unreliable logic.
I. INTRODUCTION

The origins of a system's unreliability lie in the physical mechanisms governing the operation of its parts. For example, in micro- and nanoelectronic devices, unreliability stems from low supply voltages and imperfections in the manufacturing process [1], [2], and in space-mission electronics from high-energy particles striking the semiconductor devices [3].
To ensure the robustness of a system to noise and/or faults in its parts, one relies on redundancy and computation, which compensate for the negative effects of unreliable parts. Typical examples are systems for the storage of information. Information written on a storage medium is physically represented as one of several stable states of the memory elements comprising the medium (magnetization direction of magnetic grains, surface reflectivity, charge in capacitors, etc.). Since the reliability of the memory elements cannot be improved because of the underlying physics and manufacturing cost, one relies on periodic refreshing of the stored data to prevent data decay. In this process, the errors that may have occurred in the medium are corrected in a computational device called a correction circuit. Without loss of generality, a correction circuit can be assumed to perform a sequence of binary operations, and all traditional systems rely on the assumption that the correction circuit is noiseless, i.e., made of noiseless Boolean logic gates. In other words, computations performed in the correction circuit are deterministic, while randomness (in the form of noise and/or errors) exists only in the storage medium.
The above assumption is certainly appropriate for correction circuits in which the reliability of logic gates is many orders of magnitude higher than the reliability of memory elements. However, an interesting situation arises when a correction circuit is itself made of noisy components. For example, in the low-power submicron complementary metal oxide semiconductor (CMOS) chips mentioned above, the supply voltage is kept low to reduce power consumption, which makes logic gates susceptible to noise and increases the probability of an incorrect gate output. Because its logic gates are unreliable, the correction circuit, whose purpose is to correct errors, introduces new errors while correcting errors from the storage medium.
Making logic gates reliable (for example, by using larger supply voltages) appears to be the obvious solution. Another way to ensure the robustness of a decoder is to employ von Neumann multiplexing [4]. However, this comes at the price of large redundancy, because multiplexing does not take into account the specifics of the decoding algorithm. The first attempts to use a more advanced coding scheme to ensure fault tolerance of storage systems made of unreliable components are due to Taylor [5] and Kuznetsov [6]. Fault-tolerant decoding and storage have attracted significant attention lately, and numerous approaches have been proposed that exploit the inherent redundancy of existing decoders [7]-[10]. In this paper, we show that with two simple but key modifications, namely a rewinding schedule and more reliable gates for critical computations, the decoder can be made tolerant to a wide range of gate failure rates. Moreover, we show that logic gate errors may help the decoder, and we present a class of noisy decoders that perform better than their noiseless counterparts. The correction circuit can operate in the unreliable-gate regime and still correct more errors than the noiseless correction circuit, and the random perturbations do not require any additional computational resources, as they are built into the unreliable hardware itself. User information is stored as a codeword of a low-density parity check (LDPC) code, and the correction circuit is based on the Gallager-B decoding algorithm (an architecture based on bit flipping performs similarly [11], [12]). For the scheme to work, two conditions must be met. First, not all logic gates can be allowed to be noisy: a small fraction of critical gates are made reliable, which can be done in practice by using larger transistors, a higher supply voltage, or a slower clock. Second, to avoid accumulation of errors and divergence from the true codeword, the decoder is periodically rewound, i.e., re-initialized and restarted.

0090-6778 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
The first trace of the deliberate-error idea can be found in Gallager's work, where random flips are used to resolve ties in the majority voting operation in the variable node, while the first iterative decoding algorithm that explicitly relies on randomness to correct errors is the Probabilistic Bit Flipping (PBF) algorithm of Miladinovic and Fossorier [13]. A closely related technique, adding noise to the messages of a BP decoder on the AWGN channel to lower the error floor of noiseless decoders, was proposed by Leduc-Primeau et al. [14]. Recently, Sundararajan et al. [15] showed that random perturbations can improve the performance of the gradient descent bit flipping (GDBF) decoder introduced by Wadayama et al. [16]. At the same time, we observed that the randomness coming from computational noise improves GDBF decoding performance even further. Based on that result, we developed a probabilistic gradient-descent bit flipping (PGDBF) algorithm [17]; introducing random perturbations in PGDBF is reminiscent of the mutation operation in genetic algorithms [18].
The dominant method for the analysis of noisy decoders is the noisy density evolution (DE) technique [19]. It provides a convenient measure of the robustness of a decoder, but it cannot explain why and when a noisy decoder performs better than a noiseless one, because DE relies on the assumptions of message independence and an infinite number of iterations, which imply an infinitely long code. For finite-length codes, the performance of a noiseless iterative decoder depends on the presence of trapping sets, which can be annihilated by the randomness in the decoder. Although the analysis in this paper is given for a storage system based on the Gallager-B decoder, it is also applicable to other decoding algorithms.
The rest of the paper is organized as follows. In Section II, the preliminaries on iterative message passing decoding of LDPC codes are discussed. In Section III, we give a description of our storage system architecture. Section IV is dedicated to the theoretical analysis of the noisy Gallager-B decoder performance. The numerical results are presented in Section V. Finally, some concluding remarks and future research directions are given in Section VI.
II. PRELIMINARIES
A. LDPC Codes
Consider a $(\gamma, \rho)$-regular binary LDPC code, denoted by $(n, k)$, with code rate $R = k/n \ge 1 - \gamma/\rho$ and parity check matrix $H$. The parity check matrix is the bi-adjacency matrix of a bipartite (Tanner) graph $G = (V \cup C, E)$, where $V$ is the set of $n$ variable nodes, $C$ is the set of $n\gamma/\rho$ check nodes, and $E$ is the set of $n\gamma$ edges. A matrix element $H_{c,v} = 1$ indicates that there is an edge between nodes $c \in C$ and $v \in V$, which are referred to as neighbors. Let $N_v$ ($N_c$) be the set of neighbors of the variable node $v$ (check node $c$). Then $|N_v| = \gamma$ for all $v \in V$ and $|N_c| = \rho$ for all $c \in C$, where $|\cdot|$ denotes cardinality. In irregular LDPC codes, nodes do not necessarily have the same number of neighbors.
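The following minimal Python sketch builds the neighbor sets $N_v$ and $N_c$ directly from the columns and rows of $H$ and checks the degree relations above. The function `tanner_neighbors` and the toy matrix are illustrative, not from the paper. The example is a small (2,3)-regular code with $n = 6$, for which $|C| = n\gamma/\rho = 4$ and $|E| = n\gamma = 12$.

```python
def tanner_neighbors(H):
    """Return (N_v, N_c) for the Tanner graph whose bi-adjacency matrix is H.

    H is a list of rows, one per check node c; H[c][v] == 1 means that
    check node c and variable node v are neighbors.
    """
    m, n = len(H), len(H[0])
    N_c = [[v for v in range(n) if H[c][v]] for c in range(m)]
    N_v = [[c for c in range(m) if H[c][v]] for v in range(n)]
    return N_v, N_c


# Hypothetical toy example: a (2,3)-regular code with n = 6 variable nodes.
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
N_v, N_c = tanner_neighbors(H)
gamma = {len(nb) for nb in N_v}         # set of column weights
rho = {len(nb) for nb in N_c}           # set of row weights
num_edges = sum(len(nb) for nb in N_v)  # |E| = n * gamma
print(N_v[0], N_c[0], gamma, rho, num_edges)
```

For a regular code, both weight sets contain a single value. The edge count can be obtained from either side of the graph.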
Let $x = (x_1, x_2, \ldots, x_n)$ denote a codeword of an LDPC code, where $x_v$ is the binary value associated with the variable node $v$. During transmission over a channel, an error vector $e = (e_1, e_2, \ldots, e_n)$ is superimposed on the codeword, and the received word is $y = x \oplus e$, where $\oplus$ is componentwise XOR and $y = (y_1, y_2, \ldots, y_n)$. The analysis methodology presented here is not restricted to a particular type of channel errors or error statistics, but most of the results are given for the binary symmetric channel (BSC), where we assume that the code bits are flipped independently with probability $\alpha_M$. The message passed from a check node $c$ to a variable node $v$ in the $\ell$-th iteration is denoted by $\mu^{(\ell)}_{c\to v}$, and the message passed from $v$ to $c$ by $\nu^{(\ell)}_{v\to c}$.
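The following minimal sketch implements the BSC model $y = x \oplus e$ with independent Bernoulli($\alpha_M$) flips. The helper `bsc` is hypothetical. For the all-zero codeword, the received word coincides with the error vector.

```python
import random


def bsc(x, alpha_M, rng=random):
    """Pass the binary word x through a BSC with crossover probability alpha_M.

    Returns (y, e) with y = x XOR e, where each e_v is an independent
    Bernoulli(alpha_M) flip.
    """
    e = [1 if rng.random() < alpha_M else 0 for _ in x]
    y = [xv ^ ev for xv, ev in zip(x, e)]
    return y, e


rng = random.Random(2016)
x = [0] * 10
y, e = bsc(x, 0.1, rng)
print(y == e)  # for the all-zero codeword, y equals the error vector
```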
B. Noiseless Gallager-B Decoder
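The Gallager-B decoder works by sending binary messages over the edges of the graph. The messages are calculated by the node update functions, following the rule that a message sent over an edge is computed from all received messages except the one arriving over that edge. The check node update function corresponds to the $(\rho - 1)$-input XOR logic gate, and the $(\gamma - 1)$-input majority logic (MAJ) gate is used to implement the variable node update function. In other words, the following operations are performed until a codeword is found or a maximum number of iterations, denoted by $L$, is reached. In each iteration $\ell \ge 1$, for each variable node $v \in V$ and for all $c \in N_v$,
$$\nu^{(\ell)}_{v\to c} = \begin{cases} \mathrm{MAJ}\big(m^{(\ell)}_{v\to c}\big), & \mathrm{MAJ}\big(m^{(\ell)}_{v\to c}\big) \neq 0,\\ y_v, & \mathrm{MAJ}\big(m^{(\ell)}_{v\to c}\big) = 0, \end{cases} \qquad m^{(\ell)}_{v\to c} = \big(\mu^{(\ell-1)}_{c'\to v}\big)_{c' \in N_v \setminus \{c\}}, \tag{1}$$
with $\nu^{(0)}_{v\to c} = y_v$.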
The function MAJ is defined as $\mathrm{MAJ}(m) = \operatorname{sgn}\big(\sum m\big)$, where $\sum$ denotes the sum of its argument's components and $\operatorname{sgn}$ is the signum function. By convention, we take $\operatorname{sgn}(0) = 0$. We note that an alternative rule is possible that does not require a register for storing the channel values. In this case, the previous variable estimate is used when there is a tie in the incoming messages of the variable node $v$. More precisely, $\nu^{(\ell)}_{v\to c} = \hat{x}^{(\ell-1)}_v$ when $\mathrm{MAJ}\big(m^{(\ell)}_{v\to c}\big) = 0$. The superior performance of the rule given in Eq. (1) justifies the hardware overhead of reliable registers for storing $y$. Keeping the received word in a reliable temporary register is crucial for stabilizing the decoder trajectory. It anchors the decoder state to the state determined by the initial channel values and prevents the trajectory from diverging from the initial value.
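The variable node rule of Eq. (1) can be sketched in Python as follows. The sketch uses the bipolar convention (+1 for bit 0, −1 for bit 1) used for the node updates. The names `variable_to_check` and `tie_value` are illustrative choices of ours. Passing the channel value $y_v$ as `tie_value` gives Eq. (1). Passing the previous estimate $\hat{x}^{(\ell-1)}_v$ gives the register-free alternative rule.

```python
def sgn(s):
    """Signum with sgn(0) = 0."""
    return (s > 0) - (s < 0)


def maj(m):
    """MAJ(m) = sgn(sum of the components of m), for bipolar messages +1/-1."""
    return sgn(sum(m))


def variable_to_check(mu_in, exclude, tie_value):
    """Bipolar message from variable node v to its neighbor at index `exclude`.

    mu_in: incoming check-to-variable messages at v, one per neighbor c in N_v.
    tie_value: value used when MAJ returns 0 (y_v for Eq. (1), or the
    previous estimate x_hat_v for the alternative rule).
    """
    m = [mu for i, mu in enumerate(mu_in) if i != exclude]
    out = maj(m)
    return out if out != 0 else tie_value


# Column weight gamma = 3, so each outgoing message is a 2-input majority.
mu_in = [-1, -1, +1]
y_v = +1
print([variable_to_check(mu_in, i, y_v) for i in range(3)])
```

With $\gamma = 3$, ties occur whenever the two extrinsic messages disagree. This is exactly when the stored channel value decides the outgoing message.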
In each iteration $\ell \ge 0$, for each check node $c \in C$ and for all $v \in N_c$,
$$\mu^{(\ell)}_{c\to v} = \oplus\big(n^{(\ell)}_{c\to v}\big), \qquad n^{(\ell)}_{c\to v} = \big(\nu^{(\ell)}_{v'\to c}\big)_{v' \in N_c \setminus \{v\}}, \qquad \oplus(n) = \prod_i n_i, \tag{2}$$
where the product is taken over the components of $n$ and is equivalent to computing the XOR of the incoming bits. The decided bit value $\hat{x}^{(\ell)}_v$ in the $\ell$-th iteration is calculated by a majority vote among all incoming messages together with the channel value. By convention, $+1$ corresponds to the input 0 and $-1$ to 1. For decoders with binary messages, it is more convenient to define the message alphabet as $\mathcal{M} = \{0, 1\}$. Then a check node $c$ performs the XOR operation, and it is said to be satisfied if $\sigma_c = 0$ and unsatisfied if $\sigma_c = 1$. The syndrome at the $\ell$-th iteration is the ordered set $\sigma^{(\ell)} = \big(\sigma^{(\ell)}_c\big)_{c \in C}$, where $\sigma^{(\ell)}_c = \bigoplus_{v \in N_c} \hat{x}^{(\ell)}_v$. We say that a codeword is found if all $m = |C|$ check nodes are satisfied. The iterative procedure is halted when all parity checks are satisfied or the predefined maximum number of iterations $L$ is reached. The decoding is called successful if a codeword (either correct or wrong) is found; otherwise, the decoding is said to have failed. The event of producing a codeword estimate that is a wrong codeword is called miscorrection. Let $t$ be the error correction capability of a given iterative decoder. The decoder is said not to converge to the correct codeword if some error pattern $e$ of weight $|\mathrm{supp}(e)| > t$ leads to a decoding trajectory $\{\mu^{(\ell)}_{C\to V}\}_{\ell \ge 0}$ such that, for arbitrary $\ell$, there is at least one bit estimate $\hat{x}^{(\ell)}_v \neq x_v$.
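Putting Eqs. (1) and (2) together with the syndrome-based halting rule gives the following sketch of a noiseless Gallager-B decoder in the binary alphabet $\{0, 1\}$. It is a hypothetical reference implementation, not the authors' code. The tie rule for the final bit decision is not specified above, so we assume that ties are broken in favor of the channel value. The toy matrix $H = J - I$ (all-ones minus identity, column weight 3) is chosen only because it exercises the 2-input majority tie rule. Its code contains only the all-zero word.

```python
def gallager_b(H, y, L):
    """Noiseless Gallager-B decoding of the binary word y (illustrative sketch).

    Returns (x_hat, found, iterations), where found is True if all parity
    checks are satisfied within L iterations.
    """
    m, n = len(H), len(H[0])
    N_c = [[v for v in range(n) if H[c][v]] for c in range(m)]
    N_v = [[c for c in range(m) if H[c][v]] for v in range(n)]

    def majority(bits, tie):
        ones = sum(bits)
        zeros = len(bits) - ones
        return tie if ones == zeros else int(ones > zeros)

    # Variable-to-check messages start from the channel values (reliable register).
    nu = {(v, c): y[v] for v in range(n) for c in N_v[v]}
    x_hat = list(y)
    for ell in range(L):
        # Check node update, Eq. (2): XOR of all incoming messages except the edge's own.
        mu = {}
        for c in range(m):
            total = 0
            for v in N_c[c]:
                total ^= nu[(v, c)]
            for v in N_c[c]:
                mu[(c, v)] = total ^ nu[(v, c)]
        # Bit decisions: majority of all incoming messages plus the channel value
        # (ties -> channel value, an assumption of this sketch).
        x_hat = [majority([mu[(c, v)] for c in N_v[v]] + [y[v]], y[v]) for v in range(n)]
        # Syndrome check (reliable gates): halt when every check is satisfied.
        if all(sum(x_hat[v] for v in N_c[c]) % 2 == 0 for c in range(m)):
            return x_hat, True, ell + 1
        # Variable node update, Eq. (1): majority of the other messages, ties -> y_v.
        for v in range(n):
            for c in N_v[v]:
                others = [mu[(c2, v)] for c2 in N_v[v] if c2 != c]
                nu[(v, c)] = majority(others, y[v])
    return x_hat, False, L


H4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
print(gallager_b(H4, [1, 0, 0, 0], L=10))
```

The single error is removed in the first iteration. Each check node excludes the bit it sends to, which is the extrinsic rule described above.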
C. Noisy Gallager-B Decoder
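Due to hardware unreliability, the results of the update functions are not always computed correctly. We model this by XOR-ing the noiseless output with a binary error $e_{\mathrm{MAJ}}$ or $e_{\oplus}$:
$$\nu^{(\ell)}_{v\to c} \leftarrow \nu^{(\ell)}_{v\to c} \oplus e^{(\ell)}_{\mathrm{MAJ}}, \qquad \mu^{(\ell)}_{c\to v} \leftarrow \mu^{(\ell)}_{c\to v} \oplus e^{(\ell)}_{\oplus},$$
where the right-hand sides are the noiseless gate outputs of Eqs. (1) and (2), expressed in the binary alphabet and evaluated on the (possibly erroneous) incoming messages.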
We consider the von Neumann probabilistic failure model, according to which $e_{\mathrm{MAJ}}$ and $e_{\oplus}$ are independent Bernoulli random variables with parameters $\alpha_{\mathrm{MAJ}}$ and $\alpha_{\oplus}$, respectively. A gate with a smaller Bernoulli parameter is more reliable.
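The following minimal sketch implements the von Neumann gate model: each gate output is XOR-ed with an independent Bernoulli error. The function names are hypothetical. The empirical flip rate of a noisy XOR gate should be close to $\alpha_{\oplus}$.

```python
import random


def noisy(output_bit, alpha, rng=random):
    """von Neumann model: XOR the noiseless gate output with a Bernoulli(alpha) error."""
    return output_bit ^ (1 if rng.random() < alpha else 0)


def noisy_xor(bits, alpha_xor, rng=random):
    """Multi-input XOR gate whose output is flipped with probability alpha_xor."""
    out = 0
    for b in bits:
        out ^= b
    return noisy(out, alpha_xor, rng)


def noisy_maj(bits, tie, alpha_maj, rng=random):
    """Binary majority gate (ties resolved by `tie`) with output error rate alpha_maj."""
    ones = sum(bits)
    zeros = len(bits) - ones
    out = tie if ones == zeros else int(ones > zeros)
    return noisy(out, alpha_maj, rng)


rng = random.Random(1)
trials = 100_000
# The noiseless XOR of [1, 0, 1] is 0, so every output 1 is a gate error.
flips = sum(noisy_xor([1, 0, 1], 0.05, rng) for _ in range(trials))
print(flips / trials)  # empirical output error rate, close to alpha_xor = 0.05
```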
Note that different realizations of Bernoulli random variables … In addition to the logic gates needed to calculate the messages, the decoder also requires logic gates for the final bit estimations and the parity-check calculation. Suppose the gates used to compute the variable node estimates were allowed a failure rate comparable to the bit error rate of the decoder. The performance would then be determined by the failure probabilities of these gates, not by the error control scheme [20]. Hence, these gates must have higher reliability.
III. THE PROPOSED STORAGE SYSTEM ARCHITECTURE
In our storage system, each block of $k$ user bits is stored as a codeword of an $(n, k)$ regular LDPC code of length $n$ and code rate $R = k/n$ (the applicability of our method to irregular codes is illustrated in Section V). The storage medium contains $Nn$ memory elements and is capable of storing $Nk$ user bits and $N(n-k)$ redundant bits. The memory elements are unreliable and fail transiently and independently of each other; that is, they follow the von Neumann failure model [4]. To prevent data decay, the stored information is periodically refreshed. Each of the $N$ codewords is processed in round-robin fashion by a common correction circuit. The circuit's $n$-bit output is written back into the corresponding codeword location on the medium, as shown in Fig. 1(a). The set $\{x^{(1)}, x^{(2)}, \ldots, x^{(N)}\}$ in Fig. 1(a) denotes the set of codewords stored on the medium.
Because of errors in memory locations, $x^{(i)}$ is read back as $y^{(i)}$. The role of the correction circuit is to correct errors in each word $y^{(i)}$ and write the $n$-bit codeword estimate $\hat{x}^{(i)}$ back into the corresponding location on the medium. When a codeword on the medium is scheduled to be processed, the probability of error in each memory element is $\alpha_M$. Since the correction of each codeword is independent of the others, the storage of a codeword $x$ can be modeled as transmission through a communication channel. The medium is modeled as a BSC in which the code bits are flipped independently with probability $\alpha_M$. The decoder must maintain a low probability of not recovering $x$, the so-called frame error rate (FER).
A. Critical Gates Must Be Reliable
As explained at the end of Section II, the logic gates used to extract and output bits from any decoder must be more reliable than the gates used to implement the node update functions. Thus, the majority-logic gates in the decoder's decision logic are made of the more reliable gates. We also assume that the effect of errors in the encoder is incorporated in the value of the memory element error probability $\alpha_M$.
The register inside the decoder that temporarily stores the word read from the memory medium (the channel values) is also reliable. This is necessary because otherwise the codeword estimate would drift away from the true codeword as iterations progress. As the channel value memory is usually much smaller than the memory required for the messages, the resulting overhead is small.
Since the syndrome checker (the $\rho$-input XOR gates together with the $m$-input AND gate) is used as the decoding halting criterion, it must also be made of reliable logic gates. The registers storing the intermediate results of the variable and check node update computations are unreliable, and their unreliability is accounted for in $\alpha_{\mathrm{MAJ}}$ and $\alpha_{\oplus}$. The blocks made of noiseless gates and those made of noisy gates are shown in Fig. 1(b).
B. Rewinding
For the decoder to benefit from gate errors, a large number of iterations is needed at some levels of gate unreliability. However, too many logic gate errors can overwhelm the decoder. In addition to the Gallager-B update rules given in Eq. (1), our decoder is therefore equipped with the following key feature, which prevents the accumulation of errors in the messages when the number of iterations is large. If a codeword is not found after $L_R$ iterations, where $L_R \ll L$, the decoding algorithm is re-initialized with the word received from the channel. Instead of running all $L$ iterations at once, the decoder runs $r = L/L_R$ short rounds with a maximum of $L_R$ iterations each. A decoder with such a rewinding schedule is referred to as the rewind-decoder and denoted by $\tilde{F}_r(L_R)$. We write
$$\tilde{F}_r(L_R) = \underbrace{\tilde{F}(L_R) \circ \tilde{F}(L_R) \circ \cdots \circ \tilde{F}(L_R)}_{r\ \text{terms}},$$
where $\circ$ denotes the rewinding schedule. The plain noisy Gallager-B decoder with no rewinding, denoted by $\tilde{F}(L)$, is the special case $r = 1$ of the rewinding decoder, $\tilde{F}(L) = \tilde{F}_1(L)$.
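The rewinding schedule can be sketched as a wrapper around any base decoder. Here `run_round` is a hypothetical callable that runs the noisy decoder for at most $L_R$ iterations and reports whether a codeword was found. Each round starts from the word held in the reliable register. In the example, a stub merely stands in for a real noisy decoder.

```python
def rewind_decode(run_round, r):
    """Rewind-decoder F_r(L_R): up to r rounds of a base decoder F(L_R).

    run_round() must run the (noisy) decoder for at most L_R iterations,
    starting from the stored channel word, and return (x_hat, codeword_found).
    Decoding stops at the first round that finds a codeword; otherwise the
    estimate of the last round is returned.
    """
    if r < 1:
        raise ValueError("r must be at least 1")
    for i in range(1, r + 1):
        x_hat, found = run_round()
        if found:
            return x_hat, True, i
    return x_hat, False, r


# Stub round outcomes standing in for a noisy decoder: two failures, then success.
outcomes = iter([([1, 0], False), ([0, 1], False), ([0, 0], True)])
print(rewind_decode(lambda: next(outcomes), r=5))
```

Because every round restarts from the same stored word, gate errors accumulated in one round do not carry over to the next.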
IV. PERFORMANCE EVALUATION OF NOISY GALLAGER-B DECODER
To characterize the FER performance of a given noisy decoder $\tilde{F}$, we need to compute the probability $\Pr\{\hat{x}^{(\ell)} \neq x\}$ for an arbitrary error vector $e$. We also need to find how this probability varies with the parameters $\alpha_{\oplus}$, $\alpha_{\mathrm{MAJ}}$, and $L$, and, for the rewind-decoder $\tilde{F}_r$, with the parameter $L_R$. The goal is to find a region in this 4-dimensional design space in which $\tilde{F}$ and $\tilde{F}_r$ outperform their noiseless counterpart $F$. We now introduce a Markov model that facilitates this analysis. We first derive the model for the entire code graph, and then show that it can be simplified and applied to specific code subgraphs while retaining its accuracy. Let $\mu^{(\ell)} = \big(\mu^{(\ell)}_c\big)_{c \in C}$ be the ordered set of all messages from check nodes in iteration $\ell$. From Eq. (2), they can be expressed as
$$\mu^{(\ell)} = \Upsilon_C\big(\mu^{(\ell-1)}\big) \oplus e^{(\ell)}_{\mu}, \tag{3}$$
where the function $\Upsilon_C$ is the composition of the variable and check node update functions of Eqs. (1) and (2), and defines the dynamical system of the noiseless decoder. The binary vector $e^{(\ell)}_{\mu}$ of length $n\gamma$ $(= m\rho)$ is the realization of errors at time $\ell$ that affect the computation of the messages $\mu$. Its elements are deterministic functions of the Bernoulli random variables representing the errors in the MAJ and XOR gates. Their distribution is time invariant, but they are not independent. The probability distribution of $e_{\mu}$ can be determined using elementary probability.
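From the discussion above, for a given decoder $\tilde{F}$ and error pattern $e$ in the memory elements, the random process $\{\mu^{(\ell)}\}_{\ell > 0}$ is a homogeneous Markov chain $W_e$. Its state space is the finite set $S = \{0,1\}^{|C|\rho}$, its transition probability matrix is $P_{W_e} = (p_{\varepsilon,\delta})_{\varepsilon,\delta \in S}$, and its initial state distribution is $\pi^{(0)} = \big(\pi^{(0)}_{\varepsilon}\big)_{\varepsilon \in S}$, which is a function of the memory error vector $e$ and $\alpha_{\oplus}$. In the case of the noiseless decoder, the initial state $\hat{\varepsilon}_e$ of the Markov chain is uniquely determined for each channel error vector $e$. Thus $\pi^{(0)}_{\hat{\varepsilon}_e} = 1$, while all other states have zero initial probability. In a noisy decoder, however, every outgoing message from a check node is also subject to a flip induced in that check node. Therefore, there is a nonzero probability of starting from any state in $S$. Given the channel error vector $e$, the initial probability of the state $\varepsilon$ is
$$\pi^{(0)}_{\varepsilon} = \alpha_{\oplus}^{\,d_{\hat{\varepsilon}_e,\varepsilon}} \,(1 - \alpha_{\oplus})^{|C|\rho - d_{\hat{\varepsilon}_e,\varepsilon}},$$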
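where $d_{\hat{\varepsilon}_e,\varepsilon}$ is the Hamming distance between the binary vectors $\hat{\varepsilon}_e$ and $\varepsilon$. The transition probabilities between states, $p_{\varepsilon,\delta} = \Pr\{\mu^{(\ell)} = \delta \mid \mu^{(\ell-1)} = \varepsilon\}$, depend on $\alpha_{\oplus}$ and $\alpha_{\mathrm{MAJ}}$. Since the variable node update function of Eq. (1) depends on the memory output $y$, $\Upsilon_C$ also depends on $y$.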
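Assume that each of the $|C|\rho$ initial check-to-variable messages is flipped independently with probability $\alpha_{\oplus}$. Assume also that the first variable-to-check messages are the channel values from the reliable register, so MAJ errors play no role at the start. Then the initial distribution is $\pi^{(0)}_{\varepsilon} = \alpha_{\oplus}^{d}(1-\alpha_{\oplus})^{|C|\rho - d}$ with $d = d_{\hat{\varepsilon}_e,\varepsilon}$. The sketch below evaluates this distribution on a toy state space with three messages and checks that it sums to one. All names are illustrative.

```python
from itertools import product


def hamming(a, b):
    """Hamming distance between two equal-length binary tuples."""
    return sum(ai != bi for ai, bi in zip(a, b))


def initial_prob(eps, eps_hat, alpha_xor):
    """Probability that the noisy decoder starts in state eps, assuming each
    message is flipped independently with probability alpha_xor relative to
    the noiseless initial state eps_hat."""
    d = hamming(eps, eps_hat)
    return alpha_xor ** d * (1 - alpha_xor) ** (len(eps) - d)


eps_hat = (1, 0, 0)  # toy noiseless initial state with |C|*rho = 3 messages
pi0 = {eps: initial_prob(eps, eps_hat, 0.1) for eps in product((0, 1), repeat=3)}
print(round(sum(pi0.values()), 12), round(pi0[(0, 0, 0)], 6))
```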
Thus, the transition probabilities depend on the channel error vector $e$, and for a given decoder we have an ensemble of Markov chains $\{W_e\}_{e \in \{0,1\}^n}$. Because the logic gate errors are independent, their impact can be condensed into a single parameter $\alpha_G$ (where G stands for "gate"). Let $\hat{\delta}_e = \Upsilon_C(\varepsilon, e)$ be the state of the noiseless decoder $F$ reached from the state $\varepsilon$ for the channel error vector $e$. Then
$$p_{\varepsilon,\delta} = \alpha_G^{\,d_{\hat{\delta}_e,\delta}} \,(1 - \alpha_G)^{|C|\rho - d_{\hat{\delta}_e,\delta}}, \tag{4}$$
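where $d_{\hat{\delta}_e,\delta}$ is the Hamming distance between the binary vectors $\hat{\delta}_e$ and $\delta$. Here $\alpha_G$ is the probability that a single XOR gate output in a noisy decoder $\tilde{F}$ differs from the corresponding XOR gate output in the noiseless decoder $F$,
$$\alpha_G = (1 - \alpha_{\oplus})\,\frac{1 - (1 - 2\alpha_{\mathrm{MAJ}})^{\rho-1}}{2} + \alpha_{\oplus}\,\frac{1 + (1 - 2\alpha_{\mathrm{MAJ}})^{\rho-1}}{2}.$$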
The term multiplying $(1 - \alpha_{\oplus})$ is the probability of an odd number of errors among the $\rho - 1$ MAJ gates feeding the XOR gate. The term multiplying $\alpha_{\oplus}$ is the probability of an even number of such errors. For small logic gate error rates, the above expression can be approximated by
$$\alpha_G \approx (\rho - 1)\,\alpha_{\mathrm{MAJ}} + \alpha_{\oplus}.$$
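The XOR output is wrong when the $\rho - 1$ MAJ input errors and the XOR gate's own error contain an odd number of flips in total. This leads to $\alpha_G = (1-\alpha_{\oplus})P_{\text{odd}} + \alpha_{\oplus}P_{\text{even}}$. The computation uses the standard fact that $k$ independent Bernoulli($a$) flips contain an odd number of ones with probability $\frac{1}{2}[1 - (1-2a)^k]$. The sketch below (illustrative names) computes this closed form and checks it against brute-force enumeration. It also compares the result with the small-rate approximation $(\rho-1)\alpha_{\mathrm{MAJ}} + \alpha_{\oplus}$.

```python
from itertools import product


def alpha_G(alpha_maj, alpha_xor, rho):
    """Probability that a noisy XOR gate output differs from the noiseless one,
    given rho-1 independent MAJ input errors and an independent XOR output error."""
    q = (1 - 2 * alpha_maj) ** (rho - 1)
    p_odd = (1 - q) / 2   # odd number of MAJ errors among the rho-1 inputs
    p_even = (1 + q) / 2  # even number of MAJ errors (including zero)
    return (1 - alpha_xor) * p_odd + alpha_xor * p_even


def alpha_G_brute(alpha_maj, alpha_xor, rho):
    """Same quantity by enumerating all error configurations (for checking)."""
    total = 0.0
    for errs in product((0, 1), repeat=rho):  # rho-1 MAJ errors, then 1 XOR error
        p = 1.0
        for i, err in enumerate(errs):
            a = alpha_xor if i == rho - 1 else alpha_maj
            p *= a if err else 1 - a
        if sum(errs) % 2 == 1:  # output flipped iff the total flip count is odd
            total += p
    return total


# Exact value vs. small-rate approximation for rho = 6.
print(alpha_G(1e-3, 1e-3, 6), (6 - 1) * 1e-3 + 1e-3)
```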
Let the vectors $\hat{x}^{(\ell)}$ and $\hat{\sigma}^{(\ell)}$ be the variable node decisions and check node estimates in iteration $\ell$. If they were computed by noisy gates, the probability of convergence to a codeword would be close to zero for any iteration $\ell$. This follows from Eq. (3) and the fact that $\pi^{(\ell)} = \pi^{(0)} \big(P_{W_e}\big)^{\ell}$. This explains the requirement for using noiseless gates to compute the variable node estimates.
It is instructive to classify the states of $W_e$ with respect to their closeness to codewords. Owing to the symmetries of the channel, the decoder, and the gate-error mechanism, the decoder behavior is independent of the stored codeword; it depends only on the error pattern $e$ in the memory elements. Furthermore, since the code is linear, it is sufficient to consider the all-zero codeword $x = 0$ for the FER analysis [21]. Let $S_0 \subset S$ denote the subset of states for which all parity checks are satisfied and the variable node decisions form the all-zero codeword $0$. Similarly, let $S_{\sim 0}$ denote the set of states for which all parity checks are satisfied and the variable node decisions form a nonzero codeword. The set $S_{\sim C}$ includes all states for which the variable node decisions do not form a codeword. Thus, these three disjoint sets partition the state space, $S = S_0 \cup S_{\sim 0} \cup S_{\sim C}$.
Note that the above analysis is applicable not only to the von Neumann model, but to any failure model in which gate failures are transient and happen with non-zero probability. In such a case the transition probabilities of the absorbing Markov chain would be non-zero, although the corresponding expressions could be more complex.
A. FER Performance and Time to Absorption for a Given Error Pattern
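For a given error pattern $e$, consider the conditional FER, $\Pr\{\hat{x}^{(\ell)} \neq x\}$, and the conditional miscorrection rate (MER), $\Pr\{\hat{x}^{(\ell)} \neq x, \hat{\sigma}^{(\ell)} = 0\}$, of the decoder $\tilde{F}$ in iteration $\ell$. Both can now be found from $W_e$ and expressed as
$$\mathrm{FER}_e\big(\tilde{F}(\ell)\big) = 1 - \sum_{\varepsilon \in S_0} \pi^{(\ell)}_{\varepsilon}, \qquad \mathrm{MER}_e\big(\tilde{F}(\ell)\big) = \sum_{\varepsilon \in S_{\sim 0}} \pi^{(\ell)}_{\varepsilon}. \tag{5}$$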
Theorem 1: For a noisy Gallager-B decoding algorithm $D = \tilde{F}(L)$ on any LDPC code $\mathcal{C}$, $\exists L^*$ and …, which depends on …
Proof: The proof is given in Appendix A.
The average number of iterations needed for absorption into the states in $S_0$ and $S_{\sim 0}$ can be calculated from the Markov chain $W_e$. If we combine all states in $S_0$ into a single state, and do the same for $S_{\sim 0}$, we end up with a reduced Markov chain denoted by $M_e$. Its transition probability matrix
$$P_{M_e} = \begin{pmatrix} Q & R \\ 0 & I_2 \end{pmatrix} \tag{6}$$
can be obtained from that of $W_e$ by summing the transition probabilities into the merged states, as explained in Appendix A. $M_e$ is also homogeneous and absorbing, but it has only two absorbing states: one corresponding to the correct decision and the other to miscorrection. The matrix $Q$ in Eq. (6) is the transition probability matrix between the transient states in $S_{\sim C}$, with no zero entries, and $I_2$ is the $2 \times 2$ identity matrix. The transition probabilities from transient to absorbing states are given by the matrix $R = (R_0, R_{\sim 0})$. The fundamental matrix $N = (I - Q)^{-1}$ determines the average times to absorption from different transient states. As long as the gate failures are transient, the sum of the entries in any row of $Q$ is strictly less than one, so its largest eigenvalue is less than one. Since the set of transient states is finite, the invertibility of $I - Q$ is guaranteed [22]. More specifically, $\sum_{\delta \in S_{\sim C}} N_{\beta,\delta}$ is the average time to absorption from the transient state $\beta$. The complete derivation is provided in Appendix A. We note that the Markov chain was derived for a given channel error vector $e$, and that no assumptions were made about the statistics of $e$. Hence, our methodology captures a wide range of memory models, including memories with permanent errors (defects). In the rest of the paper, we focus on the case in which the bits in memory are flipped independently and transiently with probability $\alpha_M$.
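Given $Q$ and $R$ of the reduced chain $M_e$, these quantities reduce to a few lines of linear algebra. The NumPy sketch below uses illustrative names. Its 2-state $Q$ and $R$ are made-up numbers, not derived from any code. It computes the fundamental matrix $N = (I - Q)^{-1}$, the expected number of iterations to absorption $N\mathbf{1}$, and the absorption probabilities $NR$ into the correct-decision and miscorrection states. It also computes the conditional FER and MER after $\ell$ iterations from $\pi^{(\ell)} = \pi^{(0)}P^{\ell}$.

```python
import numpy as np


def absorbing_analysis(Q, R):
    """Fundamental matrix N = (I - Q)^{-1}, expected steps to absorption N @ 1,
    and absorption probabilities B = N @ R (columns: correct, miscorrection)."""
    Q = np.asarray(Q, dtype=float)
    R = np.asarray(R, dtype=float)
    N = np.linalg.inv(np.eye(Q.shape[0]) - Q)
    return N, N.sum(axis=1), N @ R


def fer_mer(pi0, Q, R, ell):
    """Conditional FER and MER after ell steps from the initial distribution pi0
    over (transient states..., S_0, S_~0), with P = [[Q, R], [0, I_2]]."""
    Q = np.asarray(Q, dtype=float)
    R = np.asarray(R, dtype=float)
    k = Q.shape[0]
    P = np.block([[Q, R], [np.zeros((2, k)), np.eye(2)]])
    pi = np.asarray(pi0, dtype=float) @ np.linalg.matrix_power(P, ell)
    return 1.0 - pi[k], pi[k + 1]  # FER = 1 - Pr{S_0}, MER = Pr{S_~0}


# Toy reduced chain with two transient states (illustrative numbers only).
Q = [[0.5, 0.2], [0.3, 0.4]]
R = [[0.25, 0.05], [0.2, 0.1]]
N, t, B = absorbing_analysis(Q, R)
print(t)                                 # expected iterations to absorption
print(B)                                 # each row sums to one
print(fer_mer([1, 0, 0, 0], Q, R, 200))  # FER approaches MER for large ell
```

For large $\ell$, the FER approaches the MER, because all probability mass eventually reaches one of the two absorbing states.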
B. Average FER Performance
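We have shown that the Markov chain model allows us to determine the conditional probabilities $\mathrm{FER}_e$ and $\mathrm{MER}_e$; the details of this procedure are given in Appendix A. By averaging over all error patterns, we obtain the average FER as
$$\mathrm{FER}\big(\tilde{F}(\ell)\big) = \sum_{e \in \{0,1\}^n} \Pr\{e\}\, \mathrm{FER}_e\big(\tilde{F}(\ell)\big). \tag{7}$$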
Note that Eq. (5) is valid for a given error pattern $e$, and it translates to the averages (i.e., … (7) can differ significantly. The probability $\Pr\{e\}$ of the error pattern $e$ depends on its Hamming weight $w(e)$. In the case of transient i.i.d. memory errors occurring with probability $\alpha_M$, it can be expressed as
$$\Pr\{e\} = \alpha_M^{\,w(e)} (1 - \alpha_M)^{n - w(e)}. \tag{8}$$
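For i.i.d. memory errors, $\Pr\{e\} = \alpha_M^{w(e)}(1-\alpha_M)^{n-w(e)}$, and the average FER is the $\Pr\{e\}$-weighted sum of the conditional FERs. The brute-force sketch below uses illustrative names. It is feasible only for very short codes, since it enumerates all $2^n$ patterns. As $\mathrm{FER}_e$ it uses the ML decoder of the length-5 repetition code, which fails exactly when $w(e) \ge 3$. The result is compared with the binomial tail.

```python
from itertools import product
from math import comb


def pr_error_pattern(e, alpha_M):
    """Probability of the error pattern e for i.i.d. bit flips with probability alpha_M."""
    w = sum(e)
    return alpha_M ** w * (1 - alpha_M) ** (len(e) - w)


def average_fer(fer_e, n, alpha_M):
    """Average FER: sum over all 2^n error patterns of Pr{e} * FER_e (tiny n only)."""
    return sum(pr_error_pattern(e, alpha_M) * fer_e(e)
               for e in product((0, 1), repeat=n))


# Example: ML decoding of the length-5 repetition code fails iff w(e) >= 3.
def fer_ml(e):
    return 1.0 if sum(e) >= 3 else 0.0


alpha_M = 0.1
print(average_fer(fer_ml, 5, alpha_M))
print(sum(comb(5, w) * alpha_M**w * (1 - alpha_M)**(5 - w) for w in range(3, 6)))
```

In practice $\mathrm{FER}_e$ would come from the Markov-chain analysis of the noisy decoder rather than from a closed-form rule.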
For a noiseless decoder, for which $\alpha_G = 0$, transitions between the states are deterministic. The attractors of the dynamical system $\mu^{(\ell)} = \Upsilon_C\big(\mu^{(\ell-1)}\big)$ in Eq. (3) include the codewords, which are fixed points, and trapping sets, which can be either fixed points or cycle attractors. A noiseless decoder may oscillate between these states and thus fail to converge to a codeword. In a noisy decoder, on the other hand, every state can be reached with nonzero probability. Thus, the noisy decoder will eventually converge to a codeword, either the correct one or an incorrect one.
When the decoding algorithm has a small probability of miscorrection in the first decoding iterations, it is better to use the rewinding decoder with $r = L/L_R$ rounds, where restarts and re-initializations are performed after every $L_R$ iterations.
The rewinding decoder is a composition of $r$ rounds of the non-rewinding decoder $\tilde{F}(L_R)$.
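This fact allows us to obtain the miscorrection probability for every particular error pattern. A restart is performed only in the case of decoding failure, i.e., when the syndrome is not equal to zero. Therefore, the miscorrection probability of the rewinding decoder is
$$\mathrm{MER}_e\big(\tilde{F}_r(L_R)\big) = \mathrm{MER}_e\big(\tilde{F}(L_R)\big) \sum_{i=0}^{r-1} \Big[\mathrm{FER}_e\big(\tilde{F}(L_R)\big) - \mathrm{MER}_e\big(\tilde{F}(L_R)\big)\Big]^i, \tag{9}$$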
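and the frame error rate after $r$ rounds includes two kinds of events: a miscorrection in any of the $r$ rounds, and a nonzero syndrome at the end of all $r$ rounds. Therefore,
$$\mathrm{FER}_e\big(\tilde{F}_r(L_R)\big) = \mathrm{MER}_e\big(\tilde{F}_r(L_R)\big) + \Big[\mathrm{FER}_e\big(\tilde{F}(L_R)\big) - \mathrm{MER}_e\big(\tilde{F}(L_R)\big)\Big]^r. \tag{10}$$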
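Finally, the average FER is obtained as
$$\mathrm{FER}\big(\tilde{F}_r(L_R)\big) = \sum_{e \in \{0,1\}^n} \Pr\{e\}\, \mathrm{FER}_e\big(\tilde{F}_r(L_R)\big). \tag{11}$$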
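Assume the $r$ rounds are independent and identically distributed: each starts from the same stored word and draws fresh gate errors. Let the subscript 1 denote a single round of $L_R$ iterations. Each round then ends in one of three ways: correct decoding, miscorrection with probability $\mathrm{MER}_1$, or a nonzero syndrome with probability $p = \mathrm{FER}_1 - \mathrm{MER}_1$. This gives $\mathrm{MER}_r = \mathrm{MER}_1\sum_{i=0}^{r-1}p^i$ and $\mathrm{FER}_r = \mathrm{MER}_r + p^r$. A minimal sketch with illustrative names:

```python
def rewind_rates(fer1, mer1, r):
    """Conditional FER and MER of the rewind-decoder with r rounds, given the
    conditional FER and MER of a single round of L_R iterations.

    Assumes independent, identically distributed rounds: each round restarts
    from the same stored word with fresh, independent gate errors.
    """
    p_fail = fer1 - mer1                      # round ends with a nonzero syndrome
    mer_r = mer1 * sum(p_fail ** i for i in range(r))
    fer_r = mer_r + p_fail ** r
    return fer_r, mer_r


print(rewind_rates(0.5, 0.1, 1))  # a single round reproduces the inputs
print(rewind_rates(0.5, 0.1, 2))
```

As $r$ grows, $\mathrm{FER}_r$ approaches $\mathrm{MER}_1/(1-p)$. This limit is small when a single round rarely miscorrects compared with how often it decodes correctly.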
We have shown that the ensemble of Markov chains $\{W_e\}$ completely determines the decoder's performance when the states are defined by the messages on the entire Tanner graph. However, even for short codes, the state space $S$ is large, and the above approach is computationally demanding. On the other hand, the theory of iterative decoders identifies the graph topologies, known as trapping sets, that are responsible for decoding failures [23], [24]. As the dominant trapping sets are typically the smallest ones, considering only the messages on an isolated trapping set results in a Markov chain with a tractable number of states. Thus, instead of averaging $\mathrm{FER}_e(D)$ over all error patterns (as in Eqs. (7), (11)), we estimate the probability that an error pattern $e$ on a trapping set of a critical class $\chi$ is not corrected. By using expressions (1)-(3) from [25] and [26], the decoder performance in the error floor region is estimated as
where $w(e)$ is the weight of the error pattern, $m$ and $s$ denote the critical number and strength of the trapping set as defined in [25], $M$ is the maximum number of errors that lead to decoding failure due to the trapping set $\chi$, and $s_{|\chi|}$ is the number of non-isomorphic trapping sets of class $\chi$ [25].
V. NUMERICAL RESULTS
We start with an example of a short code whose performance can be determined analytically, as it helps to explain the concepts introduced so far. We then present results for various column-weight-three codes.
A. (5,1) Code: Analytical Approach
As a motivating example, we present the performance analysis of a short irregular code defined by the bipartite graph presented in Fig. 2. It has codeword length $n = 5$, regular row weight ($\rho_j = 2$ for all $j$), and irregular column weight. It is easy to check that there are only two codewords: the all-zero codeword and the all-one codeword. This is a repetition code with minimum Hamming distance $d_{\min} = 5$, which allows the maximum likelihood (ML) decoder to correct all error patterns of weight $t = 2$ or less. It is a perfect code, which attains the sphere packing bound.
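The perfect-code claim can be checked directly. The radius-2 Hamming spheres around the two codewords exactly fill the space of binary words of length 5:
$$2\sum_{i=0}^{2}\binom{5}{i} = 2\,(1 + 5 + 10) = 32 = 2^5.$$
Hence every received word lies within distance 2 of exactly one codeword, and ML decoding fails precisely when the error pattern has weight 3 or more.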
As mentioned before, for the BSC and symmetric decoders, we can assume that the all-zero codeword is transmitted when analyzing the effect of errors. First, we consider the weight-two error pattern $e = (11000)$. The initial state is determined by the error pattern from the channel. Listing the messages $\mu_{c_j\to v_i}$ sent from the $j$-th check node to the neighboring variable nodes per variable node, this initial state is $\mu^{(0)} = (100, 10, 10, 10, 100)$. According to these messages, the codeword estimate after the first iteration is $\hat{x}^{(1)} = (0, 1, 0, 0, 0)$. If all logic gates in the decoder are noiseless, the next state is completely determined by the check node and variable node update functions. Therefore, the next state $\mu^{(1)} = \hat{\delta}_e = (000, 00, 00, 00, 111)$ is reached with probability one, i.e., $\Pr\{\mu^{(\ell+1)} = \hat{\delta}_e \mid \mu^{(\ell)} = \hat{\varepsilon}_e\} = 1$, while all other transitions are forbidden. The syndrome remains nonzero in all further iterations, so this error pattern is not correctable by the noiseless decoder.
In the case of the noisy Gallager-B decoder, the $(\rho_j - 1)$-input XOR gates that generate the messages $\mu_{c_j\to v_i}$ may be noisy. We assume only that the $\gamma_i$-input MAJ gates that produce the codeword estimate, as well as the syndrome checker logic, are noiseless. For such a noisy decoder, every other state $\delta \neq \hat{\delta}_e$ can also be reached in the next iteration, in addition to the state $\hat{\delta}_e$. The transition probabilities $\Pr\{\mu^{(\ell+1)} = \delta \mid \mu^{(\ell)} = \hat{\varepsilon}_e\}$ are given in Eq. (4). As shown in Section IV, the theory of absorbing Markov chains makes it possible to determine analytically the conditional FER and MER of a noisy decoder $\tilde{F}(L)$ for any memory error vector $e$.
The performance of the noisy Gallager-B decoder for the error pattern $e = (11000)$ is presented in Fig. 3(a). Increasing the number of iterations improves the performance for certain values of the failure rates. For an infinite number of decoding iterations, the probability that the pattern is not decoded converges to the corresponding MER, i.e., $\lim_{\ell\to\infty} \mathrm{FER}_e\big(\tilde{F}(\ell)\big) = \lim_{\ell\to\infty} \mathrm{MER}_e\big(\tilde{F}(\ell)\big)$. The accuracy of the proposed analytical model was verified against an independent simulation model for every number of iterations.
The same figure shows that rewinding reduces the miscorrection probability. Since MER lower-bounds FER, the decoder performance can be improved for any error pattern if rewinding is applied with an appropriately chosen rewinding period $L_R$. If rewinding is applied at a moment when the probability of miscorrection is still negligible but the probability of correct decoding is not, the overall performance after rewinding improves. Because the conditional FER and MER of a particular error pattern are known numerically for all iterations, the optimal value of the parameter $L_R$ can be estimated. The numerical results obtained with the proposed analytical approach match the simulation results.
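Once the per-iteration absorption probabilities are known, plain decoding and rewinding can be compared directly. The sketch below assumes that rewinding restarts a not-yet-converged decoder from the same initial distribution every $L_R$ iterations with independent gate failures, and that reaching any codeword (correct or not) stops decoding. The two-state transient chain `Q`, `R` is a made-up toy example in which miscorrection only becomes likely after the decoder drifts away from its initial state; it is not derived from any code in this paper, and the function names are hypothetical.

```python
import numpy as np

def absorption_curves(Q, R, pi0, L):
    """Cumulative Pr(absorbed in correct / wrong codeword) after 1..L iterations.
    Q: transient-to-transient block; R: columns [to S_x, to S_~x]."""
    p = np.asarray(pi0, dtype=float)
    correct, wrong = 0.0, 0.0
    pc, pm = [], []
    for _ in range(L):
        step = p @ R          # probability mass absorbed in this iteration
        correct += step[0]
        wrong += step[1]
        p = p @ Q             # mass that remains in transient states
        pc.append(correct)
        pm.append(wrong)
    return np.array(pc), np.array(pm)

def fer_mer_with_rewinding(Q, R, pi0, L, LR):
    """(FER, MER) after L iterations if the decoder is rewound every LR iterations."""
    if L % LR != 0:
        raise ValueError("L must be a multiple of LR in this sketch")
    pc, pm = absorption_curves(Q, R, pi0, LR)
    c, m = pc[-1], pm[-1]
    stay = 1.0 - c - m                            # a round ends without convergence
    geom = sum(stay ** k for k in range(L // LR))  # probability each round is reached
    return 1.0 - c * geom, m * geom

# Hypothetical toy chain: state 0 = initial state, state 1 = drifted state
Q = np.array([[0.50, 0.20],
              [0.05, 0.60]])
R = np.array([[0.30, 0.00],
              [0.05, 0.30]])
pi0 = [1.0, 0.0]
pc, pm = absorption_curves(Q, R, pi0, 30)
print("no rewinding: FER=%.3g MER=%.3g" % (1 - pc[-1], pm[-1]))
print("rewind LR=1:  FER=%.3g MER=%.3g" % fer_mer_with_rewinding(Q, R, pi0, 30, 1))
print("rewind LR=3:  FER=%.3g MER=%.3g" % fer_mer_with_rewinding(Q, R, pi0, 30, 3))
```

In this toy chain, rewinding keeps the decoder in the early regime where miscorrection is unlikely, so both FER and MER drop. Scanning `LR` over a range of values is one simple way to pick the rewinding period from known FER and MER curves.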
The average FER as a function of the crossover probability is presented in Fig. 3(b); it is obtained by averaging over all error patterns, i.e., by using Eq. (7). A high failure rate can help decode some high-weight patterns that are uncorrectable by the noiseless decoder, but it increases the MER for low-weight error patterns, and some error patterns correctable by the noiseless decoder become uncorrectable with high probability. Therefore, at high failure rates the performance in the error-floor region is typically poor. For a lower failure rate, the average FER is reduced over a wide range of $\alpha_M$, at the price of an increased number of iterations (the effect seen in Fig. 3(a)). If the failure rate of the logic gates is high, the decoder performance can be significantly improved by the rewinding procedure. For noisy decoding with $\alpha_\oplus = 0.01$, rewinding after $L_R = 3$ iterations yields performance close to the ML bound.
B. Annihilation of Trapping Sets by Gate Failures
For the received word y, the subgraph induced by the set of variable nodes that are not eventually correct is called a trapping set. TS(a, b) denotes a trapping set with a variable nodes and b odd-degree check nodes. TS(5, 3) and TS(4, 4) are shown in Fig. 4(a) and Fig. 4(b), respectively. In Fig. 4, distinct symbols mark initially incorrect and initially correct variable nodes, unsatisfied and satisfied even-degree check nodes, and odd-degree check nodes.
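The labels a and b can be read directly off a parity-check matrix. The sketch below assumes the Tanner graph is given as a binary parity-check matrix H (a small (7, 4) Hamming matrix is used purely as a stand-in, not one of the codes studied here) and counts the check nodes that have odd degree in the subgraph induced by a chosen set of variable nodes. When exactly those variable nodes are in error, the odd-degree checks are precisely the unsatisfied ones. The helper `trapping_set_label` is hypothetical, and it only computes the (a, b) label: whether a set actually traps a given decoder is not checked.

```python
import numpy as np

def trapping_set_label(H, var_nodes):
    """Return (a, b) for the subgraph induced by `var_nodes` in the Tanner graph of H.

    a = number of variable nodes in the set,
    b = number of check nodes connected to the set an odd number of times.
    """
    H = np.asarray(H, dtype=int)
    cols = sorted(set(var_nodes))
    degrees = H[:, cols].sum(axis=1)   # degree of each check node inside the subgraph
    b = int(np.count_nonzero(degrees % 2 == 1))
    return len(cols), b

# Stand-in parity-check matrix of the (7, 4) Hamming code
H = [[1, 1, 1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0, 1, 0],
     [1, 0, 1, 1, 0, 0, 1]]

print(trapping_set_label(H, {0}))      # one variable node of degree 3
print(trapping_set_label(H, {0, 1}))   # two variable nodes sharing two checks
```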
The topology of TS(5, 3) shown in Fig. 4(a) is similar to that of the graph in Fig. 2; the difference lies in the three additional check nodes that connect this subgraph with the rest of the graph. Let us now adopt the following simplified scenario, commonly used in the literature to analyze trapping sets and referred to as the independence assumption [21]. In this scenario, (i) messages generated inside the trapping set do not return from the rest of the graph, and (ii) gate failures that originate in the rest of the graph are represented by failures generated in the check nodes that connect the trapping set with the rest of the graph. Such check nodes are marked with their own symbol in Fig. 4, and for the analysis of an isolated TS(5, 3) the Markov chain states are determined by $n\gamma = 5 \cdot 3 = 15$ bits, so the chain has at most $2^{15}$ states.
Using the above assumption, we now estimate the probability of "escaping" from the TS(5, 3) induced by the three-bit error pattern given in Fig. 4(a). This is the only combination of three erroneous bits among the variable nodes $\{v_1, v_2, \ldots, v_5\}$ that the noiseless decoder cannot correct. The corresponding initial state is (μ 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0). To exhibit a net performance gain, the noisy decoder must correct more errors overall, covering both the patterns correctable and those uncorrectable by the noiseless decoder. Therefore, we also analyze the effect of logic gate errors on error patterns correctable by the noiseless decoder, for example the pattern in which the variables $v_1$, $v_3$ and $v_5$ are in error. Fig. 5 shows the probability of correcting an error pattern on an isolated trapping set as a function of the number of iterations, for different values of $\alpha_G$, for TS(5, 3) and TS(4, 4). While a larger $\alpha_G$ reduces the decoding latency for error patterns uncorrectable by the noiseless decoder, gate failures have a negative effect on decoding the error patterns that the noiseless decoder corrects.
The analysis in this section is given for isolated trapping sets and for the critical error patterns. If the trapping set is not isolated, logic gate failures in the rest of the graph could further degrade the overall performance. This effect is strong at very high failure rates, when gate failures introduce more new errors than can be corrected within the trapping set. However, we show in the next section that the main conclusions remain the same even when logic gate failures are inserted in the entire graph, provided the code has good distance properties. In that case there is a broad range of failure rates in which the noisy decoder outperforms its noiseless counterpart.
C. Simulation Analysis of the Finite Length LDPC Codes
In Section V-B, we showed that an isolated trapping set can be corrected by our noisy decoder with high probability. However, the isolation assumption means that logic gate errors in the rest of the graph were ignored. We now show that the performance improvement due to logic gate failures, which we observed and quantified on isolated subgraphs, also holds when logic gate failures are present in the entire Tanner graph.
We consider the regular (n, k) = (155, 64) LDPC code with γ = 3 and ρ = 5, constructed by Tanner et al. [27]. Its minimum Hamming distance is $d_{\min} = 20$, so a noiseless maximum likelihood (ML) decoder is guaranteed to correct any pattern of up to nine errors (t = 9). It was shown in [24] that TS(5, 3) is a dominant trapping set of the (155, 64) code, and the noiseless Gallager-B decoder fails on some three-error patterns [28]. The most harmful low-weight error pattern for this code is shown in Fig. 4(a). The conditional FER of the Tanner (155, 64) code for this pattern is presented in Fig. 6.
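The value t = 9 follows from the standard relation between minimum distance and guaranteed error-correcting capability:

```latex
t = \left\lfloor \frac{d_{\min} - 1}{2} \right\rfloor
  = \left\lfloor \frac{20 - 1}{2} \right\rfloor
  = \lfloor 9.5 \rfloor = 9
```

For a weight-ten pattern, the received word can lie exactly halfway between the transmitted codeword and another codeword at distance 20, so correct decoding is no longer guaranteed.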
This particular low-weight error pattern, which cannot be decoded by the noiseless decoder, can be decoded with nonzero probability by a noisy decoder over a wide range of gate error probabilities. For high values of $\alpha_G$, failures in logic gates do more harm than good, as some error patterns correctable by the noiseless decoder need more iterations to be decoded successfully. Increasing the maximum number of iterations L reduces the probability that the error pattern remains uncorrected. The impact of L is more significant for highly reliable gates. In this case, hardware errors help little in annihilating trapping sets, because the state transition probabilities in W are small for most transitions other than those already present in the noiseless decoder. Consequently, convergence to the subset $S_0$ takes longer, and the probability of escaping from a trapping set becomes significant only after a large number of iterations. For this particular error pattern, the optimal failure rate lies in the range $0.01 < \alpha_G < 0.1$, which agrees with the results on an isolated trapping set shown in Fig. 5. The results in Fig. 6 are given for an error pattern that the noiseless decoder cannot correct, i.e., whose conditional FER under noiseless decoding equals one. Therefore, the noisy decoder performs better on TS(5, 3) than the noiseless one for any gate failure rate $\alpha_G \le 0.1$ and for any L. Over a broad range of gate error rates, our decoder actually benefits from logic gate errors and exhibits the stochastic resonance phenomenon.
If the storage medium is modeled as a BSC, any error pattern can occur at the decoder input with some probability. We estimate the performance of the noisy Gallager-B decoder in this case as well, for $\alpha_M = 2 \times 10^{-3}$. The numerical results are presented in Fig. 7(a) for the case when both XOR and MAJ gates are noisy. Since the previously analyzed error pattern is the most critical, the main effects are the same as in Fig. 6, and the lowest FER is achieved for the failure rates that maximize successful correction of the most critical trapping sets. The noisy Gallager-B decoder outperforms its noiseless counterpart for all logic gate failure rates below $\alpha_G = 5 \times 10^{-2}$. More importantly, when L = 1000 and the gate error rates have near-optimal values, the noisy hard-decision algorithm performs better than the more complex soft-decision min-sum algorithm realized in noiseless hardware. In the error-floor region, L has the dominant effect on the FER, as shown in Fig. 7(a).
For a fixed hardware error rate $\alpha_G$, we can identify the range of tolerable channel error rates $\alpha_M$ for which the average FER after L decoding iterations does not exceed a predefined value. The numerical results are presented in Fig. 7(b). In the waterfall region, as expected, an increase of the logic gate failure rate always reduces the tolerable $\alpha_M$. A similar effect was observed in [9]; however, the analysis in [9] was performed with the density evolution technique. Since density evolution is valid only for cycle-free Tanner graphs, it does not capture the stochastic resonance effect. Our analysis accounts for the effect of cycles through the analysis of trapping sets in the presence of logic gate failures, and it identifies a range of $\alpha_G$ in which logic gate failures increase the tolerance to channel errors in the error-floor region.
In Fig. 8(a), the average FER of the (155, 64) code is presented as a function of L. With a failure rate of $\alpha_G = 0.01$, a performance improvement is visible after L = 10 iterations, and after L = 30 iterations the performance becomes significantly better. However, prolonging decoding further does not reduce the FER significantly. For smaller failure rates, a significant performance improvement can be achieved only for large L, as shown in Fig. 5. Rewinding amplifies the positive effect of gate failures because it uses a different initialization in every round. It also mitigates the negative effects of gate failures because it prevents the accumulation of errors over a large number of iterations, which results in faster performance improvement. Interestingly, only L = 100 iterations are required for convergence when a noisy decoder with failure rate $\alpha_G = 0.01$ is run in a rewinding schedule. This convergence is slightly faster than that of the two-bit bit-flipping (TBBF) decoding algorithm [29], which was optimized for column-weight-three codes with girth g = 8.
In Fig. 8(b), we present the performance of two longer codes, the Margulis (2640, 1320) code and the PEG (1008, 504) code, both (3, 6)-regular LDPC codes. Although the dominant trapping sets differ for these codes [30], a performance improvement can be observed for both of them if L ≥ 20. Allowing a larger number of iterations yields additional performance improvement.
A comparison of decoding strategies suited to logic gates of high or low reliability is shown in Fig. 9. The FER curves for the Tanner (155, 64) code are shown in Fig. 9(a). In the error-floor region, the numerical results obtained from the isolated trapping set analysis (using Markov chains) combined with Eq. (12) match the simulation results. For all L, the decoder F outperforms the ideal decoder F, and for large L, its performance approaches the

Recently, Varshney identified the fundamental limits for the construction of reliable memories from noisy binary logic gates affected by i.i.d. errors and pointed out the importance of the sphere packing bound [31]. In another recent paper [12], we have shown that rewinding can be successfully applied to various hard-decision decoders built of noisy gates, and that the maximum-likelihood bound can be approached if very large decoding latency is allowed.
The FER curves of the PEG (1008, 504) code and the Margulis (2640, 1320) code are shown in Fig. 9(b). For both codes, the error-floor performance is greatly improved. A comparison with the analytical results obtained using Eq. (12) is given for the Margulis code. The PEG (1008, 504) code suffers a slight degradation in the waterfall region, but failures in logic gates result in a significant performance improvement in the error-floor region. Although numerical results are presented for only two values of $\alpha_G$, the performance of both codes improves over a wide range of logic gate failure rates.
VI. CONCLUSIONS AND PERSPECTIVES
The decoder proposed in this paper is built of a mixture of noisy and noiseless logic gates, and for a broad range of gate failure rates it works better than a decoder made completely of noiseless gates. The fact that noise can be used constructively has been observed in many natural and artificial analog signal processing systems and is known as stochastic resonance [32]. The phenomenon studied in this paper can also be interpreted in the language of stochastic resonance. However, owing to the huge complexity of our correction circuits, the available stochastic resonance analysis tools are not sufficient to characterize their improved robustness.
The proposed analytical model based on absorbing Markov chains is used to quantify the contributions of dominant trapping sets to the FER. Its parameters can be populated from the failure mechanism statistics, and it is used to identify the impact of critical parameters on the performance and to offer design tradeoffs. For example, small gate failure rates require more iterations but result in better performance. When rewinding is applied, the convergence speed was shown to be comparable to that of the best existing deterministic algorithm of comparable complexity designed to escape from trapping sets. The analysis is applicable to any memory error statistics, including permanent failures. Although the results are given for the von Neumann model of logic gate failures, we showed that an absorbing Markov chain with a fully connected set of transient states can be defined for any type of transition failures in logic gates.
We have shown that the main effects of errors in logic gates can be captured by the analysis of isolated trapping sets, and that this analysis is predictive of the behavior on the entire Tanner graph. Even though all the XOR and MAJ gates used in the message update functions are subject to failures, the noisy decoder outperforms its noiseless counterpart over a broad range of failure rates. We have verified this by simulating codes of various lengths obtained by different constructions.
APPENDIX (PROOF OF THEOREM)
A. Classification of States
It is instructive to classify the states of $W_e$ with respect to their closeness to codewords. Let $\mathcal{C}$ denote the set of all $2^k$ codewords and let $\sim\mathcal{C}$ denote the set of $2^n - 2^k$ n-tuples that are not codewords. For a given codeword x, let $\sim x$ denote the set $\mathcal{C} \setminus \{x\}$. For a given decoder F and error pattern e in the memory elements, let S be the set of states of the Markov chain $W_e$, and let $S_x$ denote the subset of S for which all parity checks are satisfied and the variable node decisions form the codeword x. Similarly, $S_{\sim x}$ denotes the set of states for which all parity checks are satisfied and the variable node decisions form a codeword different from x, and $S_{\sim\mathcal{C}}$ denotes the set of states for which the variable node decisions are not codewords. For any $x \in \mathcal{C}$, these three disjoint sets partition the set of states:
$$S = S_x \cup S_{\sim x} \cup S_{\sim\mathcal{C}}.$$
When the noiseless syndrome checker is turned on and the Markov chain is in a state $\beta \in S_x \cup S_{\sim x}$, decoding is terminated and the Markov chain stays in β. Thus, the states in $S_x$ and $S_{\sim x}$ are absorbing (the state transition diagram is shown in Fig. 10(a)).
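The partition depends only on the hard decisions carried by a state. As a minimal sketch, assuming a binary parity-check matrix H and a decision vector $\hat{x}$ already extracted from a state (the (7, 4) Hamming code serves only as a stand-in, and `classify_state` is a hypothetical helper), a state can be assigned to $S_x$, $S_{\sim x}$ or $S_{\sim\mathcal{C}}$ as follows. The first two classes are the absorbing ones when the noiseless syndrome checker is on.

```python
import numpy as np

def classify_state(H, x_hat, x):
    """Return 'S_x', 'S_~x' or 'S_~C' for a state whose variable node decisions are x_hat."""
    H = np.asarray(H, dtype=int)
    x_hat = np.asarray(x_hat, dtype=int)
    syndrome = H @ x_hat % 2
    if syndrome.any():
        return "S_~C"   # decisions are not a codeword: transient state
    if np.array_equal(x_hat, np.asarray(x, dtype=int)):
        return "S_x"    # the transmitted codeword: absorbing
    return "S_~x"       # a different codeword (miscorrection): absorbing

H = [[1, 1, 1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0, 1, 0],
     [1, 0, 1, 1, 0, 0, 1]]
x = [0] * 7             # all-zero codeword assumed transmitted
print(classify_state(H, [0] * 7, x))
print(classify_state(H, [1, 0, 0, 0, 1, 1, 1], x))
print(classify_state(H, [1, 0, 0, 0, 0, 0, 0], x))
```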
B. Probability of Absorption
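Define now the matrices $P_{\sim\mathcal{C},x}$, $P_{\sim\mathcal{C},\sim x}$ and $P_{\sim\mathcal{C},\sim\mathcal{C}}$ with dimensions $|S_{\sim\mathcal{C}}| \times 1$, $|S_{\sim\mathcal{C}}| \times 1$ and $|S_{\sim\mathcal{C}}| \times |S_{\sim\mathcal{C}}|$, respectively, whose entries are
$$\left(P_{\sim\mathcal{C},x}\right)_{\beta} = \sum_{\delta \in S_x} (W_e)_{\beta,\delta}, \qquad \left(P_{\sim\mathcal{C},\sim x}\right)_{\beta} = \sum_{\delta \in S_{\sim x}} (W_e)_{\beta,\delta}, \qquad \left(P_{\sim\mathcal{C},\sim\mathcal{C}}\right)_{\beta,\delta} = (W_e)_{\beta,\delta},$$
for $\beta, \delta \in S_{\sim\mathcal{C}}$. The matrix
$$M_e = \begin{pmatrix} I_2 & 0 \\ R & Q \end{pmatrix} \tag{14}$$
defines the transition probability matrix of a new Markov chain $M_e$ (Fig. 10(b)). In $M_e$, all the states in $S_x$ are lumped into a single state; with a moderate abuse of notation, this new state is labeled $S_x$. The second absorbing state (with the lumped states from $S_{\sim x}$) is labeled $S_{\sim x}$. The matrix $Q = P_{\sim\mathcal{C},\sim\mathcal{C}}$ in Eq. (14) is the transition probability matrix between the transient states in $S_{\sim\mathcal{C}}$, and $I_2$ is the $2 \times 2$ identity matrix. The initial distribution of $M_e$ can be written as
$$\pi^{(0)} = \left(0,\ 0,\ \pi^{(0)}_{\sim\mathcal{C}}\right),$$
where $\pi^{(0)}_{\sim\mathcal{C}}$ is the initial distribution over the transient states. Note that the transition diagram of the transient states is a strongly connected graph, and that Q does not have any zero entries. The transition probabilities from the transient states to the absorbing states $S_x$ and $S_{\sim x}$ are given by the matrix $R = (R_x, R_{\sim x})$, where $R_x = P_{\sim\mathcal{C},x}$ and $R_{\sim x} = P_{\sim\mathcal{C},\sim x}$.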
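The transition probabilities between states in ℓ iterations are given by
$$M_e^{\ell} = \begin{pmatrix} I_2 & 0 \\ \left(\sum_{i=0}^{\ell-1} Q^{i}\right) R & Q^{\ell} \end{pmatrix}.$$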
The fundamental matrix of the absorbing chain, $N = (I - Q)^{-1}$, determines the average times to absorption from the different transient states. More specifically, $\sum_{\delta \in S_{\sim\mathcal{C}}} N_{\beta,\delta}$ is the average time to absorption from the transient state β.
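Numerically, N and the average absorption times follow from a single linear solve. The sketch below uses a small, made-up transient block `Q` (any substochastic matrix with spectral radius below one would do); in the analysis above, Q would be the transient block of the lumped chain $M_e$. It also cross-checks N against a truncated Neumann series, since $N = I + Q + Q^2 + \cdots$.

```python
import numpy as np

# Hypothetical transient-to-transient block of an absorbing chain
Q = np.array([[0.50, 0.20],
              [0.05, 0.60]])
I = np.eye(len(Q))

N = np.linalg.inv(I - Q)                      # fundamental matrix N = (I - Q)^(-1)
t = np.linalg.solve(I - Q, np.ones(len(Q)))   # t[beta] = sum over delta of N[beta, delta]

# Cross-check with the Neumann series N = sum_{i >= 0} Q^i (truncated at 200 terms)
series = sum(np.linalg.matrix_power(Q, i) for i in range(200))
print("N =\n", N)
print("average time to absorption from each transient state:", t)
print("series matches N:", np.allclose(series, N))
```

Solving $(I - Q)t = \mathbf{1}$ avoids forming N explicitly, which matters when $|S_{\sim\mathcal{C}}|$ is large (e.g., up to $2^{15}$ states for an isolated TS(5, 3)).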
C. FER and MER
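For a given decoder and error pattern e, the probability $\Pr\{\hat{x}^{(\ell)} = x\}$ that the codeword estimate in iteration ℓ equals x can now be found from $W_e$, and the frame error probability and the miscorrection probability in iteration ℓ can be expressed as
$$\mathrm{FER}_e^{(\ell)} = 1 - \Pr\{\hat{x}^{(\ell)} = \mathbf{0}\}, \qquad \mathrm{MER}_e^{(\ell)} = \sum_{x \in \sim\mathbf{0}} \Pr\{\hat{x}^{(\ell)} = x\},$$
where $\mathbf{0}$ denotes the all-zero codeword.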
Write the matrix $B^{(\ell)}$ in Eq. (17) as $B^{(\ell)} = (B^{(\ell)}_{x}, B^{(\ell)}_{\sim x})$. Because the sum of the entries in every row of Q is strictly less than one, the spectral radius of Q is less than one. Therefore, $Q^{\ell+1} \to 0$ as $\ell \to \infty$. Thus, π
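A short numerical check of the limiting argument, under toy assumptions (hypothetical Q, R and initial distribution over the transient states, with the all-zero codeword as the correct one): the cumulative absorption probabilities after ℓ transitions are $\pi(I + Q + \cdots + Q^{\ell-1})R = \pi(I - Q^{\ell})NR$. Since the rows of $(Q, R)$ sum to one, $\mathrm{FER} - \mathrm{MER} = \pi Q^{\ell}\mathbf{1}$, the probability mass still in transient states, which vanishes as ℓ grows and leaves $\mathrm{FER} \to \mathrm{MER} = \pi N R_{\sim x}$. The function name `fer_mer` is hypothetical.

```python
import numpy as np

def fer_mer(Q, R, pi0, ell):
    """FER and MER after `ell` transitions of the lumped chain (closed form)."""
    Q = np.asarray(Q, dtype=float)
    I = np.eye(len(Q))
    N = np.linalg.inv(I - Q)
    B = (I - np.linalg.matrix_power(Q, ell)) @ N @ R   # equals sum_{i < ell} Q^i R
    absorbed = np.asarray(pi0, dtype=float) @ B         # [to S_0, to S_~0]
    return 1.0 - absorbed[0], absorbed[1]

# Hypothetical toy chain (two transient states; R columns: [S_0, S_~0])
Q = np.array([[0.50, 0.20],
              [0.05, 0.60]])
R = np.array([[0.30, 0.00],
              [0.05, 0.30]])
pi0 = np.array([1.0, 0.0])

for ell in (1, 5, 20, 60):
    fer, mer = fer_mer(Q, R, pi0, ell)
    gap = pi0 @ np.linalg.matrix_power(Q, ell) @ np.ones(2)   # mass still transient
    print(f"l={ell:2d}  FER={fer:.6f}  MER={mer:.6f}  FER-MER={fer - mer:.2e}  piQ^l1={gap:.2e}")

mer_limit = pi0 @ np.linalg.inv(np.eye(2) - Q) @ R[:, 1]
print("limiting MER:", mer_limit)
```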
