Thanks to its superior features of non-volatility, fast read/write speed, high endurance, and low power consumption, spin-torque transfer magnetic random access memory (STT-MRAM) has become a promising candidate for the next generation non-volatile memories (NVMs) and storage class memories (SCMs). However, it has been found that the write errors and read errors caused by thermal fluctuation and process variation severely degrade the reliability of STT-MRAM. Moreover, process imperfection also causes a diversity of the raw bit error rate (BER) among different dies of STT-MRAM. In this paper, we propose the design of novel rate-compatible protograph low-density parity-check (RCP-LDPC) codes to correct memory cell errors and mitigate the raw BER diversity of STT-MRAM. In particular, to deal with the asymmetric property of the STT-MRAM channel, we first apply an independent and identically distributed (i.i.d.) channel adapter to symmetrize the STT-MRAM channel. We then present a modified protograph extrinsic information transfer (P-EXIT) algorithm for the symmetrized STT-MRAM channel. We further propose a combined guideline, including the modified P-EXIT algorithm, the asymptotic weight enumerator (AWE) analysis, as well as the actual error rate performance, for designing protograph LDPC codes with short information word lengths for STT-MRAM. By further applying a code extension approach, we design novel RCP-LDPC codes that can work with a single encoder/decoder. Simulation results show that our proposed RCP-LDPC codes outperform the well-known rate-compatible AR4JA protograph codes as well as the fixed-rate quasi-cyclic (QC) LDPC codes in terms of both the error rate performance and the convergence speed over the STT-MRAM channel.
I. INTRODUCTION
In recent years, spin-torque transfer magnetic random access memory (STT-MRAM) has been considered an ideal candidate for both embedded non-volatile memory (NVM) and storage class memory (SCM) [1]. Key advantages of STT-MRAM include high endurance, fast read/write speed, and low switching energy [2]. However, due to process variation and thermal fluctuation, both write errors and read errors occur, which degrades the reliability of the data stored in the memory cells [3]. Moreover, due to memory process imperfection, the raw bit error rate (BER) also varies among different dies of STT-MRAM. Therefore, it is critical to develop advanced error correction coding techniques to effectively correct memory cell errors and improve the reliability of STT-MRAM.

The associate editor coordinating the review of this manuscript and approving it for publication was Yi Fang.
To maintain the fast read/write speed of STT-MRAM for applications such as SCMs, the error correction code (ECC) [4] adopted should have a short information word length of around 512 bits or lower [5]. For example, a (71, 64) single-error-correcting Hamming code is adopted by Everspin's 16Mb MRAM [6]. Bose-Chaudhuri-Hocquenghem (BCH) codes with double-error-correction capability are also applied to improve the reliability of STT-MRAM [7]. Moreover, in [8], a Euclidean Geometry (EG) low-density parity-check (LDPC) code with a reliability-based min-sum (RB-MS) decoder, which can achieve better error rate performance than the BCH code, is proposed for STT-MRAM. However, the above ECCs are all fixed-rate codes that have to be designed for the worst-case raw BER, leading to high ECC redundancy and hence a waste of memory storage density.

VOLUME 7, 2019. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
To reduce the ECC redundancy, adaptive error correcting schemes have also been proposed [5], [9]. In these schemes, ECCs with different error correction capabilities are assigned to data blocks with different raw BERs. However, as these ECCs need separate encoders and decoders, the proposed schemes incur a large silicon area and higher power consumption.
On the other hand, in recent years, long protograph LDPC codes with information word lengths of around 4k bits have shown superior performance for the additive white Gaussian noise (AWGN) channel [10], the magnetic recording channel [11], and the multi-level-cell flash memory channel [12]. These codes possess a readily parallelizable decoder structure, which guarantees a low implementation complexity. In the design of protograph LDPC codes, the protograph extrinsic information transfer (P-EXIT) algorithm [10] and the asymptotic weight enumerator (AWE) analysis [13], [14] are the theoretical tools used to analyze the performance of the designed codes in the waterfall region and the error floor region, respectively [13]. They are typically effective for the design of long protograph codes. Rate-compatible protograph LDPC (RCP-LDPC) codes can support protograph coding across a wide range of code rates with a common encoder/decoder structure. They have been investigated for the AWGN channel, with the AR4JA rate-compatible protograph codes showing superior error rate performance [14]. However, so far no work has been reported on the design of rate-compatible protograph codes for STT-MRAM.
In this work, we propose novel RCP-LDPC codes that can effectively correct both the write errors and read errors of STT-MRAM. Since the rate-compatible property of the codes can mitigate the raw BER diversity of the memory, they provide a clear advantage over the prior-art fixed-rate BCH codes and the EG LDPC code designed for STT-MRAM. More specifically, as the STT-MRAM channel is asymmetric by nature, we first adopt an independent and identically distributed (i.i.d.) channel adapter [15] to force the channel to be symmetric. We then present a modified P-EXIT algorithm for the symmetrized STT-MRAM channel. Since for short codes the P-EXIT algorithm and the AWE analysis may not possess the same significance as in the case of long codes, we further propose a combined guideline that includes the modified P-EXIT algorithm, the AWE analysis [13], [14], and the actual error rate performance to construct RCP-LDPC codes with short information word lengths for the STT-MRAM channel. Simulation results demonstrate that our proposed RCP-LDPC codes outperform the rate-compatible AR4JA codes as well as fixed-rate quasi-cyclic (QC) LDPC codes with the same code rates and information word length [14], [16], in terms of both the decoding error rate performance and the convergence speed.
The rest of the paper is organized as follows. Section II introduces the channel model of STT-MRAM adopted by this work. Section III presents the detailed methods to design RCP-LDPC codes for STT-MRAM. The simulation results are presented in Section IV. Finally, Section V concludes the paper. 
II. PRELIMINARIES OF STT-MRAM
A. STT-MRAM CELL BASICS
As shown by Fig. 1, an STT-MRAM cell typically has a magnetic tunneling junction (MTJ) and an nMOS transistor, which serve as the data storage element and the access control device, respectively [2], [17]. In the MTJ, a tunneling oxide layer is sandwiched between a reference layer, whose magnetization direction is fixed, and a free layer, whose magnetization direction can be switched by passing spin-polarized currents (i.e., the write currents) of different directions through the MTJ. When the magnetization directions of the free layer and the reference layer are anti-parallel, the MTJ has a high resistance, which can be used to denote an input information bit of '1'. Otherwise, if the magnetization directions of the free layer and the reference layer are parallel, the MTJ has a low resistance, which can be used to denote an input information bit of '0'.
B. WRITE AND READ OPERATIONS
As illustrated by Fig. 1 , to write a '0', we connect the wordline (WL) and bit-line (BL) to the supply voltage and sourceline (SL) to the ground. Thus, the write current will flow from the free layer to the reference layer. To write a '1', we connect the BL to the ground, and the WL and SL to the supply voltage, and hence the direction of the write current is reversed.
To read the stored data in an STT-MRAM cell, the WL is first set to turn on the access transistor. A small read current, which can be set to either the write-0 direction or the write-1 direction, is then applied to the MTJ. The resistance state of the MTJ is then detected by a memory sensing circuit.
C. RELIABILITY CHALLENGES
In STT-MRAM, process variation causes variations of both the MTJ geometry and the nMOS transistor size, leading to both write errors and read errors [2], [7]. A write error occurs when the magnetization of the free layer fails to switch during writing. Furthermore, due to its lower spin-transfer efficiency, the 0→1 switching needs a higher write current [5]. However, in the typical STT-MRAM write circuit, the switching current for writing a '1' is lower than that for writing a '0'. This causes the asymmetry of the write errors: the write error rate for 0→1 switching, $P_{0 \to 1}$, is much greater than the error rate for 1→0 switching, $P_{1 \to 0}$.
Read errors can be classified into read decision errors and read disturb errors (with an error rate of $P_{rd}$). A read decision error occurs when the resistance states of the MTJ cannot be differentiated during the read operation because the process variation widens the distributions of the MTJ resistances. Furthermore, process variation or thermal fluctuation may lead to a large read current, which causes an accidental switching of the MTJ and hence a read disturb error. Moreover, if the read current is set to the write-1 direction, only the 0→1 switching can occur. Similarly, if the read current is set to the write-0 direction, only the 1→0 switching can occur.
D. CASCADED CHANNEL MODEL OF STT-MRAM
Based on the major characteristics and cell failure mechanism of STT-MRAM, a cascaded channel model for STT-MRAM is proposed by [3] and used in our work. The cascaded channel model adopts a binary asymmetric channel (BAC) to describe the combined write error and the read disturb error of STT-MRAM. For reading with write-1 direction, the corresponding crossover probability of the BAC can be expressed
The cascaded channel model further uses a Gaussian mixture channel (GMC) to model the distributions of the two cell resistance states (denoted by $R_1$ and $R_0$, respectively), since they are found to approximately follow Gaussian distributions. The means and standard deviations of $R_1$ and $R_0$ are denoted by $\mu_1$, $\sigma_1$, $\mu_0$, and $\sigma_0$, respectively. In the simulations of this work, the channel parameters are taken from [8], corresponding to a 45nm×90nm in-plane MTJ under a PTM 45nm technology node, with $\mu_1 = 2\,\mathrm{k\Omega}$, $\mu_0 = 1\,\mathrm{k\Omega}$, and $\sigma_1/\mu_1 = \sigma_0/\mu_0$. The influence of the process variation on the read decision errors is incorporated by varying the resistance distributions through $\sigma_0/\mu_0$ (and hence $\sigma_1/\mu_1$). Thus, a large $\sigma_0/\mu_0$ represents a worse fabrication process with more read decision errors, while a small $\sigma_0/\mu_0$ represents a better fabrication process with fewer read decision errors. Correspondingly, whereas for the AWGN channel the decoding threshold is measured by the signal-to-noise ratio (SNR), namely the lowest SNR value that meets the stopping criterion of the modified EXIT and P-EXIT analysis [10], [18], [19], we use the value of $\sigma_0/\mu_0$ to evaluate the decoding threshold of the STT-MRAM channel.
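The cascaded structure described above (a BAC followed by a Gaussian resistance draw) can be illustrated with a minimal Python sketch. It is not the model implementation of [3]: the function name `cascaded_channel`, the parameter names, and the two crossover probabilities passed in the example are hypothetical placeholders and are not derived from the write and read disturb error rates.

```python
import random

def cascaded_channel(bits, p_in0_err, p_in1_err, mu0, mu1, spread, rng):
    """Pass input bits through a BAC, then draw a Gaussian cell resistance.

    p_in0_err / p_in1_err: assumed BAC crossover probabilities for an
    input 0 and an input 1 (placeholders, not taken from the paper).
    spread: sigma0/mu0 = sigma1/mu1, the normalized resistance spread.
    """
    resistances = []
    for b in bits:
        # Binary asymmetric channel: the stored state may differ from the input.
        flipped = rng.random() < (p_in0_err if b == 0 else p_in1_err)
        state = b ^ 1 if flipped else b
        # Gaussian mixture channel: resistance of the resulting cell state.
        mu = mu1 if state == 1 else mu0
        resistances.append(rng.gauss(mu, spread * mu))
    return resistances

rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(8)]
readback = cascaded_channel(bits, p_in0_err=1e-5, p_in1_err=1e-3,
                            mu0=1000.0, mu1=2000.0, spread=0.095, rng=rng)
print(bits)
print([round(r) for r in readback])
```

Sweeping `spread` in such a simulation corresponds to varying $\sigma_0/\mu_0$ to emulate better or worse fabrication processes.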
III. DESIGN OF RCP-LDPC CODES FOR STT-MRAM
A. MODIFIED P-EXIT ANALYSIS FOR THE STT-MRAM CHANNEL
The conventional P-EXIT algorithm [10] , [18] , based on Gaussian assumption, is a theoretical tool to measure the decoding threshold of protograph LDPC codes, for the binary-input symmetric channels only. Hence it will no longer be applicable to the STT-MRAM channel which is asymmetric by nature. In the following, we first symmetrize the STT-MRAM channel, and then modify the P-EXIT analysis and utilize it for designing the RCP-LDPC codes.
1) I.I.D. CHANNEL ADAPTER
A binary-input channel is symmetric if $p(Y = y \mid C = 0) = p(Y = -y \mid C = 1)$, with $y$ and $C$ being the output and input of the channel [14], respectively. The STT-MRAM channel does not satisfy this condition and is therefore asymmetric. To overcome this obstacle, we adopt an i.i.d. channel adapter [15] to force the STT-MRAM channel to be symmetric. An i.i.d. channel adapter, as illustrated by Fig. 2, first generates binary symbols $t_i$, $i = 1, 2, \ldots, n$, following an i.i.d. equiprobable distribution, where $n$ is the length of the channel input sequence. It then mod-2 adds these symbols to the channel input bits $c_i$. At the receiver side, a sign adjuster performs the operation $v_i = u_i (1 - 2 t_i)$, with $u_i$ being the log-likelihood ratio (LLR) of the channel output signal $y_i$ and $v_i$ being the input of the LDPC decoder. Here, $u_i$ is provided by an LLR generator, such as a soft-output channel detector [3]. As can be seen, the sign adjuster undoes the effect of the mod-2 adder. It can be proved that the new augmented channel with input $c_i$ and output $v_i$ is a binary-input symmetric channel [15]. In this way, the STT-MRAM channel is symmetrized. Since the decoding error probability over the symmetrized channel is independent of the specific codeword stored in the memory, it is reasonable to adopt the all-zero codeword for the threshold analysis and code design. Moreover, the mutual information (MI) of the symmetrized channel is the same as that of the original STT-MRAM channel [15].
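The adapter operations described above can be sketched in a few lines of Python. The functions `scramble` and `sign_adjust` follow the mod-2 addition and the rule $v_i = u_i (1 - 2 t_i)$ from the text; `toy_llr` is a hypothetical, error-free stand-in for the LLR generator (assuming the convention that a positive LLR favours bit 0), used only to show that the sign adjuster undoes the scrambling.

```python
import random

def scramble(code_bits, rng):
    """Mod-2 add an i.i.d. equiprobable sequence t to the channel input bits."""
    t = [rng.randint(0, 1) for _ in code_bits]
    x = [c ^ ti for c, ti in zip(code_bits, t)]
    return x, t

def sign_adjust(llrs, t):
    """Sign adjuster: v_i = u_i * (1 - 2 * t_i)."""
    return [u * (1 - 2 * ti) for u, ti in zip(llrs, t)]

def toy_llr(stored_bits, magnitude=4.0):
    """Hypothetical noiseless LLR generator: positive LLR for a stored 0."""
    return [magnitude * (1 - 2 * x) for x in stored_bits]

rng = random.Random(1)
c = [0] * 8                      # all-zero codeword
x, t = scramble(c, rng)          # bits actually written to the cells
v = sign_adjust(toy_llr(x), t)   # decoder input after sign adjustment
print(x)
print(v)
```

Although the written bits `x` are a random mixture of 0s and 1s, every entry of `v` has the sign expected for the all-zero codeword, which is why the threshold analysis may assume that codeword.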
2) MODIFIED P-EXIT ALGORITHM
Next, in order to modify the P-EXIT algorithm for the symmetrized STT-MRAM channel, we first investigate the probability density function (PDF) of $v_i$ and find that it can be fitted by two Gaussian-Mixture (GM) distributions, as shown by Fig. 3. These two GM distributions can be considered as a weighted sum of Gaussian distributions, given by:
where w i , µ GM i , and σ GM i are the weight, the mean, and the variance of the ith Gaussian distribution. The above parameters can be estimated by using the Expectation-Maximization (EM) method [20] . Thus, the modified P-EXIT algorithm can then be described as follows.
1) MI of the symmetrized STT-MRAM channel. Given a σ 0 /µ 0 , the channel input LLR V , which is a vector of v i , can be collected easily. Then the MI I ch of the symmetrized STT-MRAM channel can be computed by the Monte Carlo simulation [12] as:
2) MI of variable nodes (VNs). By using the GM approximation, the MI of VNs can be computed as:
where w vn i , µ vn i , and σ vn i are defined as:
where $b_{ij}$ denotes the $(i, j)$th entry of an $M \times N$ base matrix $\mathbf{B}$ corresponding to a protograph. If $s = p$, set $\delta_{sp} = 1$; otherwise, set $\delta_{sp} = 0$.
3) MI of check nodes (CNs). By exploiting the duality property between the VN and CN, the MI of the CN update can be computed similarly to (3).
4) Decoding threshold criterion. With a predetermined iteration number, the MI is updated iteratively between CNs and VNs. We define the decoding threshold as the highest value of $\sigma_0/\mu_0$ for which the a posteriori MIs of all VNs converge to 1.
We verify the effectiveness of the above modified P-EXIT algorithm over the symmetrized STT-MRAM channel by comparing the decoding thresholds it yields with those obtained by using the density evolution (DE) method [21]. Table 1 lists the difference between the decoding thresholds obtained by the two methods, denoted by $\varepsilon_{th}$, for different high-rate LDPC codes with different VN degrees $d_v$ and CN degrees $d_c$. Observe that the difference between the obtained decoding thresholds is within 0.05. Therefore, it is valid to apply the modified P-EXIT algorithm to obtain the decoding thresholds of the protograph codes over the symmetrized STT-MRAM channel, based on which we can design and optimize the protograph codes.
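Two ingredients of the algorithm above, the EM fit of a GM distribution to the decoder-input LLRs and the Monte Carlo estimate of the channel MI in step 1, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `fit_gaussian_mixture` is a textbook 1-D EM routine, `channel_mi` uses the standard estimator $I \approx 1 - \frac{1}{n}\sum_i \log_2(1 + e^{-v_i})$ for a symmetric channel under the all-zero codeword (with positive LLRs favouring bit 0), and the LLR samples are synthetic placeholders rather than outputs of the STT-MRAM channel.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Gaussian density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gaussian_mixture(samples, init_means, iterations=30):
    """EM for a 1-D Gaussian mixture; returns (weights, means, variances)."""
    n, k = len(samples), len(init_means)
    centre = sum(samples) / n
    spread = max(sum((x - centre) ** 2 for x in samples) / n, 1e-9)
    weights, means, variances = [1.0 / k] * k, list(init_means), [spread] * k
    for _ in range(iterations):
        # E-step: responsibility of every component for every sample.
        resp = []
        for x in samples:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: update the weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, samples)) / nj
            variances[j] = max(var_j, 1e-9)
    return weights, means, variances

def channel_mi(llrs):
    """Monte Carlo MI estimate 1 - mean(log2(1 + exp(-v))) from LLR samples."""
    total = 0.0
    for v in llrs:
        # Numerically stable evaluation of log(1 + exp(-v)).
        if v >= 0:
            total += math.log1p(math.exp(-v))
        else:
            total += -v + math.log1p(math.exp(v))
    return 1.0 - total / (len(llrs) * math.log(2.0))

# Synthetic decoder-input LLRs (placeholder data, not channel measurements).
rng = random.Random(0)
v = [rng.gauss(4.0, 1.0) if rng.random() < 0.9 else rng.gauss(-4.0, 1.0)
     for _ in range(2000)]
weights, means, variances = fit_gaussian_mixture(v, init_means=[-3.0, 3.0])
print(weights, means, variances)
print(channel_mi(v))
```

In a threshold search, such an MI estimate would be recomputed for each candidate value of $\sigma_0/\mu_0$ before running the iterative VN/CN updates.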
B. AWE ANALYSIS
The normalized logarithmic AWE r(δ) for the protograph code ensemble can be evaluated by
where N is the number of VNs, r( ) denotes the logarithmic AWE for each scalar weight vector , and r t ( t ) for any subvector t of partial weight is obtained by maximizing r( ) [14]. The second zero-crossing point of $r(\delta)$ is called the typical minimum distance ratio (TMDR). If the TMDR is positive, the linear minimum distance growth property [13] ensures that the minimum distance of the code ensemble increases, with high probability, linearly with the block length, thus leading to a low error floor during decoding.
C. RCP-LDPC CODE DESIGN
As introduced earlier, LDPC codes with short information word lengths should be adopted so as to sustain the fast read access time of STT-MRAM. However, in this case, the P-EXIT algorithm for evaluating the decoding threshold and the AWE analysis for assessing the error floor may not possess the same significance as in the case of long codes [14]. Therefore, in this work, a list of L codes (e.g., L = 30) with low decoding thresholds and TMDRs is first generated. Then, from among these codes, we choose the code with the best performance in both the waterfall region and the error floor region based on computer simulations [22]. Thus, we use a combined guideline, as summarized by Fig. 4, for the design of protograph LDPC codes with short information word lengths for the STT-MRAM channel.
By using the combined guideline described above, we first construct a high-rate protograph LDPC code with an information word length of 520 bits. Since an exhaustive search for the base matrix of a high-rate protograph LDPC code is impractical due to the large search space, we start by finding a rate R = 0.91 code with a small base matrix of size 1 × 11. The following empirical constraints are imposed to obtain the corresponding code protograph.
• The protograph with low decoding threshold has no punctured (un-transmitted) VN [23] and the maximum degree of the VN is 5.
• In order to preserve the linear minimum distance growth property of the protograph [13] , the number of degree-2 VNs should be limited to 1. This also ensures that no length-4 cycles will be formed.
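The two constraints above are easy to check mechanically during a search over candidate base matrices. The Python sketch below is only an illustration: `satisfies_constraints` is a hypothetical helper, and the 1 × 11 matrix `candidate` is an arbitrary placeholder, not the base matrix obtained in this work. The no-puncturing constraint is a transmission choice and is therefore not visible in the matrix itself.

```python
def vn_degrees(base_matrix):
    """VN degree = column sum of the base matrix (entries count parallel edges)."""
    return [sum(column) for column in zip(*base_matrix)]

def satisfies_constraints(base_matrix, max_vn_degree=5, max_degree2_vns=1):
    """Check the maximum VN degree and the limit on degree-2 VNs."""
    degrees = vn_degrees(base_matrix)
    degree2_count = sum(1 for d in degrees if d == 2)
    return max(degrees) <= max_vn_degree and degree2_count <= max_degree2_vns

# Placeholder 1 x 11 candidate (hypothetical values, for illustration only).
candidate = [[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2]]
print(vn_degrees(candidate))
print(satisfies_constraints(candidate))
```

Candidates that pass such a filter would then be ranked by decoding threshold, TMDR, and simulated error rate, following the combined guideline.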
After a simple search by incorporating the above constraints, the base matrix of the rate-0.91 protograph code is obtained, given by 
The corresponding protograph structure is illustrated by Fig. 5. The rate-0.91 code serves as the daughter code of our proposed RCP-LDPC codes. That is, we further apply an approach called code extension, adding the same number of VNs and CNs to the daughter code, and thereby generate a family of rate-compatible codes with rates $R = 10/(11 + k)$, where $k = 1, 2, \ldots$ is the number of VNs and CNs added to the daughter code. For example, we can construct a rate-0.83 protograph code by adding one VN and one CN simultaneously to the rate-0.91 daughter code, and the corresponding base matrix can be written as
where $\mathbf{0}$ is an all-zero sub-matrix, and $\mathbf{B}_{0.83}$ and $\mathbf{B}_{0.91}$ are the base matrices for the rate-0.83 code and the rate-0.91 code, respectively. By following a similar process, we can construct a rate-0.72 protograph code, and its base matrix is given by
Therefore, for a specific information word length (e.g. 520 bits), we can construct novel RCP-LDPC codes with code rates of 0.91, 0.83, 0.72, respectively. Since the proposed codes are based on the same parity-check matrix with specific structure, a single protograph encoder/decoder is sufficient to support all the code rates [22] .
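The effect of code extension on the design rate can be checked with a short Python sketch. It assumes that the extended base matrix keeps the daughter matrix in its upper-left block, pads it with an all-zero column for the new VN, and appends one new row for the new CN; the helper names and all matrix entries are hypothetical placeholders, not the matrices designed in this work.

```python
from fractions import Fraction

def extend(base_matrix, new_row):
    """Add one VN (column) and one CN (row): [[B, 0], [new_row]]."""
    n = len(base_matrix[0])
    if len(new_row) != n + 1:
        raise ValueError("new_row needs one entry per VN of the extended code")
    return [row + [0] for row in base_matrix] + [list(new_row)]

def design_rate(base_matrix):
    """Design rate (N - M) / N of an unpunctured protograph with M CNs and N VNs."""
    m, n = len(base_matrix), len(base_matrix[0])
    return Fraction(n - m, n)

# Placeholder daughter matrix and extension row (hypothetical values).
daughter = [[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2]]
extended = extend(daughter, [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1])
for matrix in (daughter, extended):
    rate = design_rate(matrix)
    print(len(matrix), len(matrix[0]), rate, round(float(rate), 2))
```

With one added VN and CN the rate becomes 10/12, i.e. about 0.83, in agreement with $R = 10/(11 + k)$ for $k = 1$.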
IV. SIMULATION RESULTS
The simulations are based on the STT-MRAM channel model proposed by [3], without adding the i.i.d. channel adapter, and on the 3-bit capacity-maximizing quantizer used in the simulation work of [24]. We also construct short rate-compatible AR4JA protograph codes according to [14] and short fixed-rate QC LDPC codes according to [16]. They have the same code rates and information word length as our proposed RCP-LDPC codes. Note that the fixed-rate QC LDPC codes would need a separate encoder/decoder for each code rate. In addition, the RB-MS algorithm [8], which only requires integer and logical operations, is adopted for decoding.
The TMDR of protograph codes can be obtained from the AWE analysis. Hence, in Table 2 we compare the TMDRs of the proposed RCP-LDPC codes with those of the rate-compatible AR4JA codes, for code rates of 0.91, 0.83, and 0.72. From the table, we observe that the TMDRs of all the codes are positive, which guarantees that all the codes have linear minimum distance growth. Furthermore, compared with the rate-compatible AR4JA codes at each code rate, the proposed RCP-LDPC codes possess greater TMDRs, and hence lower error floors. Next, we carry out simulations by fixing the write error rate $P_{0 \to 1}$ at $2 \times 10^{-4}$ (with $P_{1 \to 0}$ and $P_{rd}$ two orders of magnitude lower than $P_{0 \to 1}$) [3], and varying the mean normalized resistance spread $\sigma_0/\mu_0$ of STT-MRAM to account for the impact of process variations. Fig. 6 and Fig. 7 compare, respectively, the BER and the average number of iterations for the decoder to converge for the proposed RCP-LDPC codes, the rate-compatible AR4JA codes, and the fixed-rate QC LDPC codes. Observe from Fig. 6 that the proposed RCP-LDPC codes perform significantly better than the AR4JA codes and the fixed-rate QC LDPC codes for all the code rates over the STT-MRAM channel. In particular, at BER = $10^{-6}$ the RCP-LDPC codes achieve a gain of more than 0.5% in terms of the maximum affordable resistance spread $\sigma_0/\mu_0$ at code rates of 0.72 and 0.83, and a gain of more than 2% of $\sigma_0/\mu_0$ at the code rate of 0.91, over both the rate-compatible AR4JA codes and the fixed-rate QC LDPC codes. Moreover, Fig. 7 shows that the RCP-LDPC codes also converge faster than both the rate-compatible AR4JA codes and the fixed-rate QC LDPC codes at all code rates. Again, the reduction in decoding iterations of the RCP-LDPC codes over both the rate-compatible AR4JA codes and the fixed-rate QC LDPC codes is largest in the rate-0.91 case.
Finally, we fix the resistance spread $\sigma_0/\mu_0$ at 9.5% and vary the write error rate $P_{0 \to 1}$ to investigate the system's tolerance to write errors and read disturb errors with different LDPC codes. As shown by Fig. 8, both the rate-compatible AR4JA codes and the fixed-rate QC LDPC codes have an error floor at around BER = $10^{-5}$ at the code rate of 0.91. This error floor is mainly caused by the read decision errors that occur at $\sigma_0/\mu_0$ = 9.5%. The proposed RCP-LDPC codes, however, can overcome this high error floor and improve the system's performance under both write errors and read errors. The RCP-LDPC codes outperform the AR4JA codes and the fixed-rate QC LDPC codes at the code rates of 0.83 and 0.72 as well. Furthermore, Fig. 9 shows that the proposed protograph codes require fewer iterations to converge than both the rate-compatible AR4JA codes and the fixed-rate QC LDPC codes when dealing with write errors and read disturb errors, for all three code rates. Thus, the read latency can be reduced as well.
V. CONCLUSION
In this paper, we have considered the application of RCP-LDPC codes to correct memory cell errors and improve the reliability of STT-MRAM. We first symmetrized the STT-MRAM channel and, on that basis, derived a modified P-EXIT algorithm to obtain the decoding thresholds of the designed codes. We then proposed a combined guideline consisting of the modified P-EXIT algorithm, the AWE analysis, and the actual error rate performance to design protograph LDPC codes with short information word lengths for STT-MRAM. We further adopted a code extension approach to construct novel RCP-LDPC codes that can work with a single encoder/decoder. Simulation results have shown that the proposed RCP-LDPC codes achieve superior error rate performance and faster convergence than both the rate-compatible AR4JA protograph codes and the fixed-rate QC LDPC codes, thus demonstrating their potential to improve the data recovery reliability of STT-MRAM.
