Abstract: Spin-torque transfer magnetic random access memory (STT-MRAM) has emerged as a promising nonvolatile memory technology, with advantages in scalability, speed, endurance, and power consumption. This paper presents an STT-MRAM cell operation channel model with write and read operations for information theorists and error correction code designers. The model accounts for process variations and thermal fluctuations and covers all principal flaws arising during fabrication and operation. With this model, evaluations are made not only for the write channel and the read channel but also for the combined write and read channel, using metrics such as operation failure rate, bit error rate, channel ergodic capacity, and channel outage probability at a given outage capacity. Moreover, it is proved that the written-in bit states are not uniformly distributed and that their probabilities are proportional to their respective write success probabilities. Finally, simulation results show that practical code rates and code block lengths can guarantee reliable performance only if the operation success rate difference between state 1 and state 0 is small enough.
technical parameters mutually correlated, it is possible to reduce the design complexity and tolerate a certain level of device imperfection by introducing another degree of freedom, i.e., error correction codes (ECCs) [1], [2].
In order to design a satisfactory ECC, the STT-MRAM cell channel needs to be carefully modeled and investigated by obtaining metrics such as operation failure rate, bit error rate (BER), channel ergodic capacity, and channel outage probability at a given outage capacity. Such a model is essential for the efficient selection of both the code rate and the code length in order to meet practical performance requirements. The scientific framework for this design is information theory: the STT-MRAM is considered as a device having an input (i.e., the original information) and an output (i.e., a resistance corresponding to the written/read-out information), the output being statistically linked to the input through the physical properties of the medium. For more distorted channels, more redundancy should be added by the ECC, i.e., the code rate (the ratio of information bits to coded bits, the latter comprising information and added redundancy) should be lower. Shannon showed that an arbitrarily small error probability can be achieved if the code rate is less than the channel capacity [3]. However, very few works correlate the ECC design with the cell channel. Chen et al. [4] modeled the STT-MRAM operation channel as an asymmetrical resistance variation channel; the write and read failures and the process variations are all treated as factors enlarging the standard deviations (STDs) of the resistance distributions, which is an oversimplification because the write and read failures are nonlinear processes. Moreover, Wen et al. [5] proposed an asymmetric write channel model considering process variations and thermal fluctuations.
However, in an STT-MRAM memory system, both the write and the read operations are important, and each must be modeled carefully. Optimizing either the write or the read operation alone generally does not lead to a global optimum, and technical parameters are usually selected to balance write and read performance. Take the transistor width as an example: it generally determines the current drive capacity. On one side, the write operation needs a large current to switch the magnetization as quickly as possible, and the read operation requires a sufficient current to drive the sense amplifier as fast as possible; on the other side, the read current should be kept small enough to avoid flipping the cell content. Unlike [4] and [5], Fong et al. [6] proposed an optimization technique to minimize both read and write failures and developed a mixed-mode framework to optimize the bit-cell level reliability. This framework captured the transport physics using the nonequilibrium Green's function method, solved the magnetic tunnel junction (MTJ) magnetization dynamics with the Landau-Lifshitz-Gilbert (LLG) equation, and performed bit-level optimization with HSPICE. Zhao et al. [7] dealt with the reliability issues by analyzing the impact of the nonpermanent soft errors introduced by various operations, as well as the permanent hard errors caused by permanent device damage. Fong et al. [6] and Zhao et al. [7] concentrate on hardware design in order to optimize the memory performance. However, such a performance level can also be reached using an additional degree of freedom, namely a properly designed ECC [1], [8]-[13]; the hardware design can thus tolerate a certain level of unreliability, which is then corrected by the ECC.
Unlike the previous works [4]-[7], this paper proposes a channel model to simulate the reliability of the basic STT-MRAM cell write and read operations by considering both process variations and thermal fluctuations (without considering the impact of hard errors). Also unlike the compact models [14]-[18], this model aims to bridge the gap between the information theory community and the physical device community by considering various process variations and thermal fluctuations without solving any complex equations. Moreover, aiming at an efficient ECC design, operation failure rates, BERs, and channel capacities are evaluated. Comments are also made on the highly asymmetrical characteristics of the STT-MRAM channel. Finally, suggestions are made for the selection of both the code rate and the code block length (BLK).
The rest of this paper is organized as follows. The basics of STT-MRAM cell operations, various process variations, thermal fluctuations, and capacity definitions are briefly reviewed in Section II. The proposed channel model, including both write and read operations, is detailed in Section III. Simulation and numerical results are given in Section IV with comments on the ECC design. Finally, the conclusions are drawn in Section V.
II. STT-MRAM CELL OPERATIONS

A. STT-MRAM Cell Basics
A datum in an STT-MRAM cell is represented as the resistance state of an MTJ device, which can be switched by applying programming currents of opposite polarities [19], [20]. A widely used STT-MRAM cell structure, the so-called 1T-1MTJ structure, is displayed in Fig. 1. It consists of one transistor and one MTJ, in which a tunneling oxide layer [see Fig. 1 (gray bars)] is sandwiched between two ferromagnetic layers; one of these layers, called the reference layer, has a fixed magnetization, and the other, called the free layer, has two possible magnetizations to represent a bit.
Writing a 0 or writing a 1 to a cell is achieved by applying currents in opposite directions. When writing 0 [MTJ in parallel (P) state], the word line (WL) and the bit line (BL) are connected to the supply voltage $V_{DD}$, and the source line (SL) is connected to the ground [Fig. 1(a)]. The nMOS transistor works either in its saturation region for a small transistor width or in its linear region for a large transistor width. When writing 1 [MTJ in antiparallel (AP) state], the WL and the SL are connected to $V_{DD}$, while the BL is connected to the ground [Fig. 1(b)]. The transistor then works in its saturation region.
There are two ways to read a cell: the so-called P direction read, with the same current direction as writing 0, and the AP direction read, with the same current direction as writing 1. In the P direction read, a low voltage is applied between the BL and the SL. After activating the WL, a current flows from the BL to the SL. In the AP direction read, the voltage polarity applied to the BL and the SL is switched and the current flows in the reverse direction, from the SL to the BL.
B. CMOS Process Variations
The CMOS process variations contribute to the variability of the driving strength of the nMOS transistor due to random dopant fluctuations, line-edge roughness, shallow trench isolation stress, and geometry variations of the transistor channel length/width [21]. All these process variations have a direct impact on the transistor's threshold voltage $V_{th}$ and its equivalent resistance.
C. MTJ Process Variations
The MTJ process variations are independent of the CMOS process variations and lead to the variability of the MTJ. These variations stem from the MTJ shape variations, from the oxide thickness variation, and from the localized fluctuation of magnetic anisotropy [22]. The first two factors cause variations of the MTJ resistance and of the MTJ switching current by changing the bias conditions of the nMOS transistor, whereas the third factor is an intrinsic variation of the magnetic material that affects both the MTJ's critical switching current density $J_{C0}$ and the magnetization stability barrier height.
D. Random Thermal Fluctuations
In general, the magnetization dynamics of the MTJ switching affected by thermal fluctuations can be modeled by the LLG equation with an added thermal agitation fluctuating field [23]. Due to the random thermal fluctuations, the MTJ switching time varies from one operation to the next, and this variation is independent of the process variations.
The switching behavior of an MTJ depends on the duration of the switching current and can be classified into three distinct modes [24]: 1) thermal activation; 2) dynamic reversal; and 3) precessional switching.
For a long current pulse (>10 ns), the magnetization switching is a thermally activated process. In this regime, the magnetization switching is independent of the initial conditions and is only determined by thermal agitation during the switching process.
For a very short switch current duration (<3 ns), the magnetization switching is precessional switching and is mainly dependent on the initial thermal distribution. In this regime, both the magnetization switching distribution and the switching probability are independent of the thermal agitation during the switching process.
For an intermediate current pulse duration (between 3 and 10 ns), the magnetization switching is dynamic reversal [24] and is determined by the initial thermal distribution and the thermal agitation during the switching process.
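The three regimes above can be summarized as a simple lookup on the pulse duration. The following is a minimal sketch: the 3-ns and 10-ns thresholds are those quoted in the text, whereas the function name `switching_mode` and the treatment of the exact boundary values are assumptions made for illustration.

```python
def switching_mode(pulse_ns: float) -> str:
    """Classify the MTJ switching regime from the current pulse duration (ns).

    The 3 ns and 10 ns thresholds are the ones quoted in the text; assigning
    the exact boundary values to "dynamic reversal" is an assumption.
    """
    if pulse_ns <= 0.0:
        raise ValueError("pulse duration must be positive")
    if pulse_ns < 3.0:
        # Governed mainly by the initial thermal distribution.
        return "precessional switching"
    if pulse_ns <= 10.0:
        # Governed by both the initial distribution and thermal agitation.
        return "dynamic reversal"
    # Governed only by thermal agitation during the switching process.
    return "thermal activation"


if __name__ == "__main__":
    for duration in (1.0, 5.5, 20.0):
        print(duration, "ns ->", switching_mode(duration))
```

A channel simulator could use such a lookup to select which thermal-fluctuation distribution to sample for a given switching time.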
E. Write Variations
During the write operation, two kinds of failures can occur.
1) The cell fails to be flipped from 0 to 1 and keeps the 0 state when AP writing is performed.
2) The cell fails to be flipped from 1 to 0 and stays in the 1 state when P writing is performed.
These failures come from two factors that cause variations of the MTJ switching current and thus result in a switching time uncertainty [25]. One factor is the CMOS transistor and MTJ process variations, which cause a variation of the transistor's driving ability; the other factor is the random thermal fluctuations, which induce a stochastic MTJ magnetization switching process [23].
Moreover, these two factors lead to a high asymmetry between the two write transitions 0 → 1 and 1 → 0. Because of the different bias conditions of the transistor [26], the 0 → 1 transition takes longer than the 1 → 0 transition, and the switching time distribution of the 0 → 1 transition is much broader than that of the 1 → 0 transition [27]. Therefore, the 0 → 1 write operation is the main contributor to write failure events [26] and is considered the unfavorable switching direction.
F. Read Variations
A compromise must be found when setting the read current [6], [7], [28]. On one side, the read current must be high enough to generate a sufficient sense voltage margin to drive the sense amplifier and to ensure a fast read access time; on the other side, the read current must be kept low enough to avoid flipping the stored state.
Therefore, three types of errors can occur during the read operation.
1) The cell stores a 0 but is read out as a 1.
2) The cell stores a 1 but is read out as a 0.
3) The cell stores a 0 (resp. 1) but is flipped to 1 (resp. 0) during an AP (resp. P) read operation.
The first two error types come from the process variations of the cell MTJs and transistors, when the cell resistance is compared with a reference resistance that is assumed to be ideal, with neither process variations nor thermal fluctuations; the third error type stems from a read current that is large enough to flip the MTJ cell state.
G. Channel Capacity
In order to design an efficient ECC with reliable performance for STT-MRAM systems, not only should the operation failure rates be measured, but the operational channel capacity, i.e., the maximum information rate (in bits per cell) that can be reliably written into and read out from 1T-1MTJ cells, also needs to be evaluated.
For the STT-MRAM write and read channel, the capacity can be written as
$$C = \max_{p(x)} I(X; Y) \tag{1}$$
where $X \in \{0, 1\}$ is the input of the channel and $Y$ is a continuous output resistance value.
Since the a priori information about the input bit $X$ is highly content dependent, it is reasonable to assume an equiprobable distribution for $X$, i.e., $p(x = 0) = p(x = 1) = 0.5$. Therefore, the channel capacity is equal to the mutual information $I(X; Y)$, given by
$$I(X; Y) = H(Y) - H(Y \mid X) \tag{2}$$
where $H(Y)$ is the entropy of the channel output
$$H(Y) = -\int p(y) \log_2 p(y) \, dy \tag{3}$$
and the probability density function (PDF) $p(y)$ is
$$p(y) = \sum_{x \in \{0, 1\}} p(x) \, p(y \mid x) \tag{4}$$
and $H(Y \mid X)$ is the conditional entropy of the channel output $Y$ given the channel input $X$, defined as
$$H(Y \mid X) = -\sum_{x \in \{0, 1\}} p(x) \int p(y \mid x) \log_2 p(y \mid x) \, dy. \tag{5}$$
Note that (2) can be applied to the capacity evaluation of the write channel, the read channel, and the combined write and read channel in order to balance write and read operations. The capacity (2), also called the ergodic capacity, is obtained by averaging over all possible channel realizations (i.e., an infinite number of 1T-1MTJ cells). This implies that the ergodic capacity can be achieved only by a theoretical infinite-length ECC.
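To make the capacity definition concrete, the mutual information of a binary-input, continuous-output channel can be evaluated by numerical integration. The sketch below is illustrative only: it assumes Gaussian conditional resistance densities and hypothetical means and STDs, whereas the densities in this paper are obtained from Monte Carlo simulation of the cell model. The function names are also assumptions.

```python
import math


def gaussian_pdf(y, mean, std):
    """Gaussian density, used here as a stand-in for the simulated p(y|x)."""
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def mutual_information(mean0, std0, mean1, std1, p0=0.5, steps=20000):
    """Return I(X;Y) = H(Y) - H(Y|X) in bits for a binary input X.

    Both differential entropies are integrated with the midpoint rule over
    an interval covering +/- 8 standard deviations of each conditional PDF.
    """
    p1 = 1.0 - p0
    lo = min(mean0 - 8.0 * std0, mean1 - 8.0 * std1)
    hi = max(mean0 + 8.0 * std0, mean1 + 8.0 * std1)
    dy = (hi - lo) / steps
    h_y = 0.0
    h_y_given_x = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * dy
        f0 = gaussian_pdf(y, mean0, std0)
        f1 = gaussian_pdf(y, mean1, std1)
        py = p0 * f0 + p1 * f1          # output PDF as a mixture over the input
        if py > 0.0:
            h_y -= py * math.log2(py) * dy
        if f0 > 0.0:
            h_y_given_x -= p0 * f0 * math.log2(f0) * dy
        if f1 > 0.0:
            h_y_given_x -= p1 * f1 * math.log2(f1) * dy
    return h_y - h_y_given_x


if __name__ == "__main__":
    # Hypothetical low/high resistance states in ohms.
    print(mutual_information(1000.0, 50.0, 2000.0, 100.0))
    print(mutual_information(1000.0, 150.0, 1300.0, 200.0))
```

With well-separated conditional densities the result approaches 1 bit/cell; with identical densities it drops to 0.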
However, in practice, for a finite code length, the channel capacity varies from one block to another due to the limited number of channel realizations. The outage probability $\varepsilon_o$ [29] is more useful in this case; $\varepsilon_o$ is defined as the probability that the capacity $C_N$ measured over a finite sample of size $N$ is lower than a given capacity threshold $C_o$, where $C_N$ represents the actual data rate and $C_o$ represents a target data rate that can be correctly stored and delivered. When the actual blockwise channel capacity $C_N$ is smaller than the required data rate $C_o$, no ECC exists that guarantees an error-free block, and a decoding failure is thus declared. In other words, if a design target with a block code of length $N$ bits and a decoding failure rate $\varepsilon_o$ is set, the maximum number of useful information bits is $N C_N$ and the minimum number of redundant bits introduced by the ECC is $N(1 - C_N)$. Mathematically, the definition of the outage probability is given by
$$\varepsilon_o = \Pr(C_N < C_o) \tag{6}$$
where the terms $C_N$, $H(X_N)$, and $H(X_N \mid y_n)$ can be computed as follows:
It is noted that $y_n$ is just one realization of $Y$ and that a finite block of $N$ realizations cannot cover the whole distribution of $Y$.
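The outage probability can be estimated by Monte Carlo simulation over many blocks. The sketch below is a simplified illustration and not the blockwise capacity computation used in this paper. It assumes a hard-decision binary asymmetric channel with hypothetical per-bit inversion probabilities `flip0` and `flip1`, and it uses the plug-in (empirical) mutual information of each block as a stand-in for $C_N$.

```python
import math
import random


def empirical_mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) bit pairs."""
    n = len(pairs)
    joint, px, py = {}, {}, {}
    for x, y in pairs:
        joint[(x, y)] = joint.get((x, y), 0) + 1
        px[x] = px.get(x, 0) + 1
        py[y] = py.get(y, 0) + 1
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def outage_probability(block_len, flip0, flip1, c_o, blocks=2000, seed=0):
    """Fraction of simulated blocks whose empirical capacity is below c_o.

    flip0 (resp. flip1) is the assumed probability that a stored 0 (resp. 1)
    comes out inverted; inputs are drawn equiprobably.
    """
    rng = random.Random(seed)
    outages = 0
    for _ in range(blocks):
        pairs = []
        for _ in range(block_len):
            x = rng.randrange(2)
            flip = flip1 if x == 1 else flip0
            y = 1 - x if rng.random() < flip else x
            pairs.append((x, y))
        if empirical_mutual_information(pairs) < c_o:
            outages += 1
    return outages / blocks


if __name__ == "__main__":
    for n in (64, 256, 1024):
        print(n, outage_probability(n, flip0=0.001, flip1=0.02, c_o=0.9))
```

The plug-in estimate is biased for short blocks, so the numbers printed by this sketch should be read as qualitative only.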
III. STT-MRAM CELL OPERATION CHANNEL
In this section, a complete STT-MRAM cell operation channel model with both write and read operations is proposed. This model considers transistor and MTJ process variations, random thermal fluctuations, writing failures, reading flipping errors, and resistance variations.
The proposed complete cell channel model is shown in Fig. 2 and includes two operations and three states. The two operations, the write channel and the read channel, are further elaborated in Figs. 3 and 6, respectively. The three states, target bit (TB), written-in bit (WIB), and read-out bit (ROB), are the three successive states of a stored bit: before writing, after writing (or before reading), and after reading, respectively.
A. Write Operation Channel
The write operation channel model is divided into five consecutive steps.
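1) Generate the mean write current value $\bar{I}_w^{PV}$ for the TB $w \in \{0, 1\}$.
2) Add a random variation to the mean write current $\bar{I}_w^{PV}$ to generate the write current $I_w^{PV}$ affected by process variations.
3) Map the write current $I_w^{PV}$ to the corresponding switching time $T_w^{PV}$.
4) Apply the random thermal fluctuations to the switching time $T_w^{PV}$ to obtain the final switching time $T_w^{FST}$.
5) Finally, a write operation success/failure decision is made by comparing the given write pulse duration (WPD) $T_w^{WPD}$ with the required final switching time $T_w^{FST}$. If $T_w^{WPD} \ge T_w^{FST}$, the WIB $z$ is successfully updated to the TB $w$; otherwise, the write operation fails and the WIB $z$ keeps the state $z^-$ that it had before this write operation
$$z = \begin{cases} w, & T_w^{WPD} \ge T_w^{FST} \\ z^-, & \text{otherwise.} \end{cases} \tag{12}$$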
Moreover, the soft WIB state $z$, i.e., the MTJ resistance value $R_z$ [22], [25], is such that
where $t_{ox}$ and $A_{MTJ}$ are the MTJ's tunneling oxide thickness and shape area, respectively. The previous steps of the write operation channel model are displayed in Fig. 3 and will be further detailed in Sections III-B and III-C.
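The five steps of the write operation channel can be sketched as a small Monte Carlo routine. This is a minimal illustration under stated assumptions, not the calibrated model: all numerical parameters are hypothetical (they are not the values of Table I), the current-to-time mapping of Fig. 4 is replaced by a linear frequency-versus-current stand-in, and the thermal deviation is drawn from a Gaussian stand-in rather than from the mode-dependent distributions.

```python
import math
import random

# Hypothetical parameters for illustration only (not taken from Table I or Fig. 4).
MEAN_CURRENT_UA = {0: 150.0, 1: 100.0}   # step 1: mean write current per target bit
STD_CURRENT_UA = {0: 5.0, 1: 10.0}       # STD induced by process variations
THERMAL_RATIO = {0: 0.1, 1: 0.3}         # assumed ratio sigma_TF / T_PV
CRITICAL_CURRENT_UA = 50.0               # assumed switching threshold
GAIN_PER_NS_UA = 0.004                   # assumed slope of frequency versus current


def laplace(rng, std):
    """Zero-mean dual-exponential (Laplace) sample with the given STD."""
    scale = std / math.sqrt(2.0)
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def final_switching_time(rng, w):
    """Steps 1-4: return a sample of the final switching time in ns."""
    current = MEAN_CURRENT_UA[w] + laplace(rng, STD_CURRENT_UA[w])  # steps 1-2
    overdrive = current - CRITICAL_CURRENT_UA
    if overdrive <= 0.0:
        return math.inf                           # current too weak to switch
    t_pv = 1.0 / (GAIN_PER_NS_UA * overdrive)     # step 3: T = 1 / f (linear stand-in)
    t_fst = rng.gauss(t_pv, THERMAL_RATIO[w] * t_pv)  # step 4: Gaussian stand-in
    return max(t_fst, 1e-3)


def write_cell(rng, w, z_prev, wpd_ns):
    """Step 5: the cell takes the target bit only if the pulse is long enough."""
    return w if wpd_ns >= final_switching_time(rng, w) else z_prev


def write_failure_rate(w, wpd_ns, trials=20000, seed=0):
    """Estimate the failure rate when the cell initially holds the opposite bit."""
    rng = random.Random(seed)
    failures = sum(write_cell(rng, w, 1 - w, wpd_ns) != w for _ in range(trials))
    return failures / trials


if __name__ == "__main__":
    print("0 -> 1 failure rate:", write_failure_rate(1, 6.0))
    print("1 -> 0 failure rate:", write_failure_rate(0, 6.0))
```

With these hypothetical parameters, the 0 → 1 direction has both a longer mean switching time and a broader spread, so it dominates the failure events, which mirrors the asymmetry described in Section II-E.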
The write switching current $I_w^{PV}$ impacted by the transistor and MTJ process variations can be modeled as a dual-exponential distribution [5]
where $w = 0$ (resp. 1) denotes the write switching current in the P (resp. AP) direction; $\bar{I}_w^{PV}$ and $\sigma_w^{PV}$ are, respectively, the mean nominal switching current value and the STD of the corresponding switching current listed in Table I.
The mapping of step 3 from the MTJ switching current $I_w^{PV}$ to a switching frequency $f_w^{PV}$ (the reciprocal of the switching time $T_w^{PV}$) for both transitions 0 → 1 and 1 → 0 is given in Fig. 4 [25].
The ratio $\sigma_w^{TF}/T_w^{PV}$ between the STD $\sigma_w^{TF}$ and the mean MTJ switching time $T_w^{PV}$ versus the switching frequency $f_w^{PV}$ is shown in Fig. 5 for both transitions 0 → 1 and 1 → 0 [25]. In order to solve this theoretical problem, the random thermal-induced deviation $T_w^{TF}$ is calibrated as
where the PDF of δ E is given by 
where
For an intermediate switching time 3 ns < $T_w^{PV}$ < 10 ns, the thermal-induced switching time $T_w^{FST}$ is a mixture of the two previous distributions [30]
Therefore, the final switching time $T_w^{FST}$ follows a distribution characterized by both the process-variation-induced switching time $T_w^{PV}$ and the thermal-induced STD $\sigma_w^{TF}$. The write operation failure rate of the STT-MRAM cell at step 5 can be defined as the probability that the write access to the STT-MRAM cell cannot be completed within a given WPD $T_w^{WPD}$, i.e., the probability that the given WPD $T_w^{WPD}$ is shorter than the final switching time $T_w^{FST}$. Both the MTJ's tunneling oxide thickness $t_{ox}$ and the shape area $A_{MTJ}$ follow Gaussian distributions [25]:
$$t_{ox} \sim \mathcal{N}(\mu_{t_{ox}}, \sigma_{t_{ox}}^2) \tag{20}$$
$$A_{MTJ} \sim \mathcal{N}(\mu_{A_{MTJ}}, \sigma_{A_{MTJ}}^2) \tag{21}$$
where $\mu_{t_{ox}}$ and $\sigma_{t_{ox}}$ are the mean and STD of the tunneling oxide thickness, respectively, while $\mu_{A_{MTJ}}$ and $\sigma_{A_{MTJ}}$ are the mean and STD of the shape area, respectively. Considering (13), the equivalent resistance of the MTJ with technical variations can be approximated as
The technical parameters $\sigma_{t_{ox}}$ and $\sigma_{A_{MTJ}}$ in (20)-(22) are obtained from [22]. The other parameters $\mu_{t_{ox}}$ and $\mu_{A_{MTJ}}$ are taken from [25], in which an elliptical 45 nm × 90 nm in-plane MTJ under a 45-nm predictive technology model [31] was proposed. These parameters were calibrated with measurement data from a leading magnetic recording company and are recalled in Table II. The STD $\sigma_{V_{th}}$ of the threshold voltage $V_{th}$ (Table II) is approximately computed as [25]
where $W_T$ and $L_T$ are, respectively, the transistor width and length in nanometers.
[TABLE II: MTJ and transistor technical parameters]
Let us now turn to the evaluation of the write channel capacity. Given the equiprobable assumption made for the input TB $w$, the capacity of the write channel can be written as (24), where the key terms $p(R_z)$ and $p(R_z \mid w)$ are given by
$p(z)$, given the previous state distribution $p(z^-)$, can be computed as
From Section II-A and step 5 of Section III-A, the transition probability $p(z \mid w, z^-)$ can be expressed in terms of the write success and failure probabilities
Substituting (28) into (27), $p(z = 0)$ and $p(z = 1)$ can be further written as
Since the previous state $z^-$ has asymptotically the same distribution as $z$, (29) can be reformulated as
Since $p(w = 0) = p(w = 1) = 0.5$, the ratio of $p(z^- = 1)$ to $p(z^- = 0)$ can readily be obtained as
$$\frac{p(z^- = 1)}{p(z^- = 0)} = \frac{p(\text{write 1 success})}{p(\text{write 0 success})} \tag{32}$$
Equation (32) simply means that the WIB $z$ distribution depends only on the write operation success rates and that the state distribution ratio is exactly equal to the ratio of the AP and P write success probabilities. In other words, with no a priori information on the TB, the distribution of the WIB converges to the distribution given by (32). Therefore, the equiprobable assumption no longer holds for the WIB $z$. Thus, computing the capacity (24) requires p(write success) and the PDF $p(R_z \mid w, z^-)$, which can be obtained by Monte Carlo simulations of the proposed write channel model.
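The convergence of the WIB distribution can be checked numerically by simulating a long sequence of equiprobable writes to one cell and comparing the empirical state frequencies with the closed-form ratio. The sketch below is a minimal illustration; the success probabilities 0.99 and 0.6 are hypothetical and the function names are assumptions.

```python
import random


def stationary_wib(p_success_0, p_success_1):
    """Closed form: p(z=1) / p(z=0) equals p_success_1 / p_success_0."""
    total = p_success_0 + p_success_1
    return p_success_0 / total, p_success_1 / total


def simulate_wib(p_success_0, p_success_1, writes=200000, seed=0):
    """Empirical WIB distribution after many equiprobable writes to one cell."""
    rng = random.Random(seed)
    z = 0
    ones = 0
    for _ in range(writes):
        w = rng.randrange(2)                       # equiprobable target bit
        p_ok = p_success_1 if w == 1 else p_success_0
        if rng.random() < p_ok:                    # write succeeds: z takes w
            z = w                                  # otherwise z keeps its state
        ones += z
    return 1.0 - ones / writes, ones / writes


if __name__ == "__main__":
    print("closed form:", stationary_wib(0.99, 0.6))
    print("simulation: ", simulate_wib(0.99, 0.6))
```

When the two success probabilities are close to each other, both outputs approach (0.5, 0.5), which is the regime in which the equiprobable assumption remains acceptable.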
B. Read Channel Model
Due to the unbalanced driving ability of the transistor, the failure probability of AP (0 → 1) writing is much higher than that of P (1 → 0) writing. However, this higher write failure probability favors the read operation, because it implies a lower probability of flipping the cell in that direction. Therefore, unlike for the write channel, the AP direction is preferable to the P direction for the read operation.
The read operation channel model can also be divided into five consecutive steps. Here, the mean threshold voltage is listed in Table II, $V_{th}$ is the actual threshold voltage, which depends on process variations, $R_r$ is the actual resistance value corresponding to the ROB state $r$, and $R_{REF} = 1500\ \Omega$ (Table II). Due to the $V_{th}$ variation, the nominal resistance value for the bit decision is
It should be mentioned that there are many kinds of sense amplifiers [32]-[35], and none of them has become a standard. Because of this, the sense amplifier in the read channel is assumed to be an ideal current sense amplifier whose reference current is simply the mean of the current values of the low- and high-resistance states; in other words, this sense amplifier is affected by neither process variations nor thermal fluctuations.
Apart from the current direction and the current strength, the read operation is analogous to the write operation (Fig. 6) . In this way, most of the technical parameters and all the distribution models already used for the write channel can be used again for the read channel. In addition, the reference resistance is assumed to be ideal with neither process variations nor thermal fluctuations.
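The hard decision against an ideal reference can be illustrated as follows. This is a simplified sketch: the reference value of 1500 Ω is the one quoted in the text, whereas the Gaussian resistance model, the state means of 1000 Ω and 2000 Ω, and the STDs are hypothetical, and the transistor threshold voltage variation is ignored.

```python
import math

R_REF_OHM = 1500.0  # ideal reference resistance quoted in the text (Table II)


def read_bit(resistance_ohm, r_ref_ohm=R_REF_OHM):
    """Hard decision: a resistance above the reference is read as 1."""
    return 1 if resistance_ohm > r_ref_ohm else 0


def q_function(x):
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def read_error_probability(stored_bit, mean_ohm, std_ohm, r_ref_ohm=R_REF_OHM):
    """Decision error probability for an assumed Gaussian resistance spread."""
    margin = (r_ref_ohm - mean_ohm) / std_ohm
    # A stored 0 is misread when R exceeds the reference; a stored 1 when R falls below it.
    return q_function(margin) if stored_bit == 0 else q_function(-margin)


if __name__ == "__main__":
    print("read 0 as 1:", read_error_probability(0, 1000.0, 100.0))
    print("read 1 as 0:", read_error_probability(1, 2000.0, 200.0))
```

Under these assumptions, the state with the larger resistance spread has the higher decision error probability.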
For the AP read operation over the 1T-1MTJ cell, there are three types of reading errors.
1) The cell stores a 0 but is read out as a 1.
2) The cell stores a 1 but is read out as a 0.
3) The cell stores a 0 but is flipped to 1.
The capacity of the read channel can be written as
where the resistance distribution p(Rr ) of the nominal resistance value in (38) can be written as
and p(z) is obtained from (32) by computing p(write success). Therefore, in order to evaluate the PDF p(Rr |z) and to compute (39), one has to simulate both the write and read operations.
C. Write and Read Channel Capacity
The combined write and read channel capacity can be written as
where p(Rr ) and p(Rr |w) are
Similarly to Sections III-A and III-B, p(z|w, z − ) and p(Rr |z) are computed by simulating the write channel and the read channel, respectively.
IV. SIMULATIONS AND RESULTS
In this section, the proposed channel model is first validated by comparing simulation results to the experimental results published in [25]. After validation, the reliability of the 1T-1MTJ operation channel is evaluated in terms of PDF, operation failure rate, BER, and channel capacity. All the process variations and thermal fluctuations mentioned in Section III are included in these simulations.
[Figure caption: Comparison of writing 1 error rates between our model and [25, Fig. 7(b)].]
A. Model Validation
In this section, all the curves with REF represent the original experimental results published in [25] , whereas the curves with SIMU represent the recreated results via the proposed channel model.
Figs. 7 and 8 (see [25, Fig. 7(a) ]) show our simulation results of the write error rates (WERs) with a WPD equal to 10 and 20 ns. It can be observed that the recreated results obtained by the proposed STT-MRAM operation model follow closely the already published corresponding results. Fig. 9 (see [25, Fig. 7(b) ]) displays the required WPDs for different nMOS transistor widths. In this figure, the ideal switching time represents the results based on the mean device parameters without considering any process variations and thermal fluctuations. It can be observed that the recreated 1% and 5% WER switching time also follow closely the corresponding already published curves. The limited differences between the 1% (resp. 5%) WER switching time curves are mainly due to the small difference between the recreated ideal switching time and the corresponding published ideal switching time. Fig. 10 (see [25, Fig. 3]) gives the read error rates for different transistor widths. As [25] uses a practical sense amplifier, the recreated results with an ideal sense amplifier lead to slightly better results in most cases.
B. Write Operation Channel
In this section, the write channel is evaluated according to the model illustrated in Fig. 3. The TBs are assumed to be equiprobable. We recall that the switching current parameters for the write operations are listed in Table I, the conversion from switching current to switching time is displayed in Fig. 4, and the thermal-induced switching time is generated by using (15)-(19).
Fig. 11 displays the whole process of the write operation channel with a 540-nm transistor size. Fig. 11(a) shows the distributions of the write currents $I_w^{PV}$ under the impact of process variations, as given by (14). Fig. 11(b) shows the distribution of the switching time $T_w^{PV}$ mapped from the switching current $I_w^{PV}$ (see Fig. 4 and step 3 in Section III-A), where the larger deviation for the AP direction (0 → 1) can be easily observed. Fig. 11(c) highlights the even larger spread of the final switching time $T_w^{FST}$ of (11) affected by thermal fluctuations; a large difference can be noticed between Fig. 11(b) and (c), i.e., without and with thermal fluctuations, respectively. Moreover, in Fig. 11(c), a green dashed line indicates a given WPD = 10 ns serving as the boundary between the write operation success region (left-hand side), where the required cell flipping time is shorter than the given WPD, and the write operation failure region (right-hand side). Fig. 11(d) gives the written-in resistance distributions generated from (22) for a WPD = 10 ns. The write failure for the 0 → 1 transition, i.e., the small red peak around the low-resistance state (∼1000 Ω), can be clearly observed; this peak stems from the large tail to the right of the green 10-ns dashed line in Fig. 11(c), which corresponds to switching durations that are too long. Therefore, for WPD = 10 ns, a high write operation failure rate can be predicted.
Fig. 12 displays the whole process of the write operation channel with even more critical parameters (720-nm transistor size and WPD = 5.5 ns). Unlike in Fig. 11, due to the insufficient WPD, the reliability of writing 0 is also affected, and write operation failures for both transitions can be observed in Fig. 12(c) and (d).
Fig. 13 shows the write operation failure rates for different transistor widths and different WPDs. The write failure rate for the 0 → 1 transition is several orders of magnitude higher than the failure rate for the 1 → 0 transition. The larger the transistor, the larger the drive current strength, so that the required switching time is shorter and the write operation failure rate tends to be lower. The same improvement in the write operation failure rate can be observed when increasing the WPD. Since the operation failure rate involves only the 0 → 1 and 1 → 0 transitions, this metric is not influenced by the original cell state before the write operation.
Unlike Fig. 13, Fig. 14 measures the written-in BER, and the corresponding simulations necessarily involve the original cell state before the write operation: whether the TB ends up correctly stored in the STT-MRAM cell also depends on the original cell state. To simplify the simulations, we assume that there are originally as many 0s as 1s.
Fig. 15 displays the write channel capacity (24) for various transistor widths and various WPDs. As for Fig. 14, since Fig. 15 is related to bit reliability, the capacity simulations also involve the original cell state. From Fig. 15, for a WPD = 10 ns, a target channel code rate equal to 0.85 cannot meet the capacity requirement because the write failure rate of the AP direction is too high; moreover, solutions with transistor widths below 200 nm cannot satisfy the system requirements either. As the target code rate increases slightly to 0.9, solutions can only be selected among the designs with a transistor width above 360 nm. Fig. 16 shows the result given by (32).
It shows that the distribution of the WIB is not equiprobable in general and that writing 0 is always easier than writing 1. Moreover, solutions with WPD = 10 ns and solutions with a transistor width smaller than 270 nm cause large differences between p(z = 0) and p(z = 1). By comparing the results of Fig. 13 with those of Fig. 16, it can be further observed that the WIB approaches equiprobability as the write operation failure rate decreases. It can then be concluded by comparing with Fig. 15 that the hypothesis that WIBs are equiprobable holds for reliable write channels with a write channel capacity above 0.9 bit/cell.
C. Read Operation Channel
In this section, the read channel is evaluated according to the model illustrated in Fig. 6. The WIBs are assumed to be uniformly distributed to eliminate any write channel influence. The parameters for the read operation are listed in Table II and are applied to (20)-(23).
Figs. 17 and 18 show the read operation failure rate and the read channel capacity for different transistor widths and for different read pulse durations (RPDs). Because read currents are low, the flipping error, i.e., the third type of read error, almost never happens even with RPD = 15 ns, and operation failures are mainly due to process variations of the MTJ resistance and to the threshold voltage variations. As the threshold voltage STD decreases when the transistor width increases, the operation failure rate for a large transistor width is lower than the rate for a small transistor width. Since the MTJ resistance distribution is independent of the transistor width, the read failure rate is not affected by the short RPDs. Moreover, because of the larger resistance variations of state 1, the failure rate when reading a 1 is higher than that when reading a 0. The read channel capacity of Fig. 18 is much higher than the write channel capacity of Fig. 15; this is due to the small current values used for read operations, which keep the flipping error rate near zero. Fig. 19 shows the ROB distributions after both the cell write and read operations are completed. It can be seen that the ROB distributions are similar to the WIB distributions in Fig. 16. However, as the transistor width increases, the ROB distribution difference is slightly larger than the WIB distribution difference (compare the curves within the gray dashed circle in Fig. 19 with the corresponding curves in Fig. 16). This comes from the difference between the MTJ resistance deviations of state 0 and state 1 due to process variations of the MTJ's shape area and tunnel oxide thickness. However, the equiprobable assumption still approximately holds for a transistor size larger than 270 nm and a WPD longer than 10 ns.
D. Combined Write and Read Channel
Figs. 20 and 21 show, respectively, the operation failure rate and the BER of the combined write and read operation channel. Due to the process variations of the MTJ resistance, there exist error floors for both P and AP directions. Note that the intrinsic resistance variations cannot be removed by changing extrinsic parameters, such as the transistor size or the WPD; therefore, using an ECC becomes necessary when the BER does not meet the target requirement. Fig. 22 gives the combined channel capacities, i.e., the maximum number of bits that can be reliably written into and read out of one cell. This metric gives the upper bound for the channel-coding rate with infinite code length. Note that as the transistor drive capacity increases, the reduced operation failure rate and the increased channel capacity indicate that an ECC needs less redundancy (i.e., can have a higher efficiency) to protect messages. If the target code rate is 0.7, the transistor width needs to be larger than 270 nm and the WPD has to be kept longer than 17.5 ns. If the target code rate is 0.9, the minimum transistor width is 350 nm for a minimum WPD equal to 20 ns.
Finally, Fig. 23 gives the outage probability for an outage capacity $C_o$ = 0.9 bit/cell with different transistor widths, different BLKs, and different WPDs. The ergodic curves of different WPDs and different transistor sizes serve as limits. The lowest outage probabilities simulated for different WPDs and all transistor sizes reach at least the level of $10^{-7}$. In other words, if a point is not plotted, it simply means that the outage probability is below $10^{-7}$.
It can be observed that the outage probability improves as the BLK increases; this is simply because an increased BLK covers more channel realizations and thus brings the block capacity closer to the ergodic capacity limit. Moreover, as the transistor width or the WPD increases, the decreased outage probability should be attributed to both the improved operation channel quality and the lower write operation failure rate. Note that for large outage probabilities the gain obtained with an increased BLK is usually smaller than the gain obtained with improved technical parameters; this is because the former only brings the block capacity closer to the ergodic capacity without improving the channel conditions, whereas improving technical (i.e., physical) parameters directly increases the channel capacity.
Obviously, there is a price for improved technical parameters. For example, increasing the transistor width does improve the channel capacity and thus allows the use of higher code rate ECCs; however, both the memory area and the power consumption then increase. Therefore, for a specific application, the optimum solution will be selected by balancing the various requirements on latency, throughput, size, and power constraints.
V. CONCLUSION
This paper proposed a complete channel model to simulate write and read operations of the 1T-1MTJ STT-MRAM cells. This model considered both process variations and thermal fluctuations. Based on the proposed cell operation channel, reliabilities, including operation failure rate, BER, channel ergodic capacity, and channel outage probability, were evaluated from an information theory perspective. Moreover, it is proved that the distributions of the WIB states are not equiprobable and that their ratio is determined by their respective write success probabilities. Finally, simulation results show that practical code rates and code BLKs can guarantee reliable performances only if the difference between state 1 and state 0 operation success rates is small enough.
