Magnetic random-access memory (MRAM) is a promising memory technology due to its high density, nonvolatility, and high endurance. However, achieving high memory fidelity incurs significant write-energy costs, which should be reduced for large-scale deployment of MRAMs. In this paper, we formulate an optimization problem for maximizing the memory fidelity given energy constraints, and propose a biconvex optimization approach to solve it. The basic idea is to allocate non-uniform write pulses depending on the importance of each bit position. The fidelity measure we consider is minimum mean squared error (MSE), for which we propose an iterative waterfilling algorithm. Although the iterative algorithm does not guarantee global optimality, we can choose a proper starting point that decreases the MSE exponentially and guarantees fast convergence. For an 8-bit accessed word, the proposed algorithm reduces the MSE by a factor of 21.
I. INTRODUCTION
Magnetic random access memory (MRAM) is a nonvolatile memory technology that has the potential to combine the speed of static RAM (SRAM) and the density of dynamic RAM (DRAM). Furthermore, MRAM technology is attractive since it provides high endurance and complementary metal-oxide-semiconductor (CMOS) compatibility [1], [2].
In spite of its attractive features, one of the main challenges is the high energy consumption to write information reliably in the memory element [1] - [3] . In an MRAM device, a memory state "1" or "0" is determined by the magnetic moment orientation of the memory element [1] . Switching the magnetic moment orientation requires high write current, which introduces write errors when the energy budget is limited [2] . In addition, high current injection through the tunneling barriers incurs a severe stress and leads to breakdown, which degrades the endurance of MRAM cells [3] , [4] . Hence, one of the key directions of MRAM research has been toward providing reliable switching with limited energy cost. At the device level, new materials [5] , [6] or new switching mechanisms [7] , [8] have been explored. Several architectural techniques to reduce write energy can be found in [3] , [9] , [10] .
However, prior efforts have not considered the differential importance of each bit position in error-tolerant applications such as signal processing and machine learning (ML) tasks. In these applications, the impact of bit errors depends on bit position, i.e., most significant bits (MSBs) are more important than least significant bits (LSBs) [11], [12]. This differential importance has been leveraged to effectively optimize energy in major memory technologies such as SRAMs [13]-[16] and DRAMs [17], [18].
In this paper, we provide a principled approach to improving MRAM's write fidelity. In error tolerant applications, the mean squared error (MSE) is a more meaningful fidelity metric than the write failure probability (or bit error rate). We formulate a biconvex optimization problem to minimize the MSE for a given write energy constraint. Since the write energy and the MSE depend on the write current and the write pulse duration, we attempt to optimize both parameters by solving the biconvex problem.
A biconvex problem is an optimization problem where the objective function and the constraint set are biconvex [19]. A common algorithm for solving biconvex problems is alternate convex search (ACS), which iteratively updates each variable by fixing the other and solving the corresponding convex problem [20]. We propose an iterative algorithm based on ACS to optimize the write current and the write pulse duration. In addition, we show that the proposed iterative algorithm converges and that the convergence can be very fast if a proper starting point is chosen.
In general, ACS cannot guarantee the global optimal solution since biconvex problems may have a large number of local minima [19] . However, we prove that the proposed iterative algorithm can reduce the MSE exponentially by choosing a proper starting point. Furthermore, we show that this starting point guarantees the fastest convergence. We derive analytic expressions of the optimal solutions for each iteration. Since each iteration of the algorithm corresponds to solving convex problems, we rely on the Karush-Kuhn-Tucker (KKT) conditions to derive the optimal solutions. We also provide waterfilling interpretations for each iteration.
Prior optimization studies on voltage swing of SRAMs [15], [16] and refresh operations of DRAMs [18] are similar in spirit, viz. minimizing the MSE for given resource constraints. However, the MRAM write optimization of this work is nonconvex whereas the formulated problems in [15], [18] are convex. Hence, we propose the iterative algorithm and analyze convergence and improvement of the optimized MSE. To the best of our knowledge, our work is the first information-theoretic approach to optimization of write pulse parameters of MRAMs.
The rest of this paper is organized as follows. Section II explains the basics of MRAM and the challenges of high write energy consumption. Section III introduces the optimization metrics for MRAM write operations. Section IV formulates optimization problems and provides the iterative algorithm based on ACS. Section V provides theoretical analysis on convergence and MSE reduction. Section VI gives numerical results and Section VII concludes. 
II. BASIC PRINCIPLES OF MRAMS

A. Basics of MRAMs
MRAM cells store information by controlling bistable magnetization of ferromagnetic material and retrieve information by sensing resistance of magnetic tunnel junctions (MTJs). An MTJ device consists of two ferromagnetic layers, a reference layer (RL) and a free layer (FL), separated by a very thin tunneling barrier. The RL has a very stable magnetization and maintains it throughout all operations, while the FL can be switched between two stable magnetization states by a moderate stimulus. The resistance of an MTJ depends on the relative orientation of the FL magnetization with respect to that of the RL as shown in Fig. 1. If the magnetizations of FL and RL are in the same direction (parallel or P-state), then the corresponding resistance is low. The opposite direction (antiparallel or AP-state) results in high resistance.
The difference in tunneling currents between a P-state (low resistance) and an AP-state (high resistance) is utilized to encode binary data [1], [2].
Writing information into an MTJ is performed by driving a sufficient current through it. Depending on the current's direction, one can flip the magnetization of the FL into the P- or AP-state. If a current flows from FL to RL (electrons from RL to FL), electrons are spin-polarized along the magnetization of RL while passing through the layer. The electrons transmitted from the RL interact and exchange the magnetic moments with ones in the FL. If the MTJ is in the AP-state and the current is sufficiently high, then the magnetization orientation is flipped to the P-state. When the current is reversed, incoming electrons are polarized along the magnetization of FL. Since the RL's magnetization is parallel to the FL, the majority of the electrons tunnel through the barrier while the minority that have antiparallel magnetizations are reflected. Because of this selective tunneling, the antiparallel spins are accumulated in the FL. If the enriched antiparallel spin dominates the FL, it flips the magnetization of the FL into the AP-state.
The magnetization switching between P state and AP state is not deterministic. The write (switching) failure probability depends on the magnitude and the duration of the write current pulse as follows [4, Eq. (26)]:
where $\Delta$ denotes the thermal stability factor. The normalized current $i$ is given by $i = I/I_c$, where $I$ denotes the actual write current and $I_c$ is the critical current. The normalized duration is given by $t = T/T_c$, where $T$ denotes the actual write duration and $T_c$ is the characteristic relaxation time. Note that $\Delta$, $I_c$, and $T_c$ are fabrication parameters [4], [21].
To ensure a low write failure probability, we should control the write current magnitude or the duration judiciously. A longer write duration may lower the write failure probability at the expense of longer write latency and higher energy consumption. Instead of increasing the write duration, we can adopt higher write current. However, it increases the write energy and the risk of dielectric breakdown of the MTJ.
B. Subarray Architecture
The MRAM cells are arranged in arrays and each of the cells is selectively connected to the read/write circuits to access the data. Metal-oxide-semiconductor field-effect transistors (MOSFETs) are commonly used as selectors in DRAMs, where the required current for memory operations is low enough that a MOSFET with minimum feature sizes can drive it. However, the required MRAM write current is more than an order of magnitude higher than that of DRAMs, which requires MOSFETs with a large channel width. Such wide transistors are not suitable for high-density memories because they occupy a large area on the silicon substrate.
In order to handle this problem, each MRAM cell consists of an MTJ and a threshold switching selector [2], [22]. These MRAM cells are populated in a crossbar array. To access an MRAM cell, a voltage higher than the threshold voltage of the selector is applied, which turns on the corresponding selector between the selected row-line and column-line, while all the unselected row-lines and column-lines are biased to a midpoint voltage, which keeps all the unselected cells in the array under biases below their threshold voltages. In this manner, the number of MOSFETs needed to drive high currents can be reduced from $n_{\text{row}} \times n_{\text{col}}$ to $n_{\text{row}} + n_{\text{col}}$ for a subarray, which is much better suited for high-density memories.
Because of the limited current drivability of the row line and the column line drivers, only one cell can be accessed at a time in each subarray unlike DRAMs where a whole page (row-line) can be read/written together (see Fig. 2 ). Multiple subarrays are operated in parallel to match the required data bandwidth. This MRAM architecture provides an opportunity to write each bit in different conditions (e.g., write current and pulse duration).
III. METRICS FOR MRAM WRITE OPERATIONS
The write failure probability expression of (1) is too complicated to formulate an optimization problem. Fortunately, we can use the following approximation instead of (1):
$$p(i, t) \approx c \exp\left(-2(i - 1)t\right) \tag{2}$$
where $c = \frac{\Delta \pi^2}{4}$. This is a slightly modified approximation of [4, Eq. (27)] so as to formulate a biconvex optimization problem. Fig. 3 shows that the approximated write failure probability (2) is very close to (1), especially for lower $p$. The write failure probability can be controlled by the normalized current $i$ and the normalized write duration $t$.
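As a quick numerical illustration, the following minimal sketch evaluates the exponential approximation of the write failure probability. It assumes the form $p \approx c\,e^{-2(i-1)t}$ with $c = \Delta\pi^2/4$, which is the form implied by the objective $(i-1)t$ used in Section IV-A. The function name `write_failure_prob` is hypothetical, and the default $\Delta = 60$ is the value used in Section VI.

```python
import math

def write_failure_prob(i, t, delta=60.0):
    """Approximate write failure probability c * exp(-2 (i - 1) t).

    i: normalized current (must exceed 1), t: normalized duration (>= 0).
    The functional form is an assumption taken from the approximation above.
    """
    if i <= 1.0 or t < 0.0:
        raise ValueError("the approximation assumes i > 1 and t >= 0")
    c = delta * math.pi ** 2 / 4.0
    return c * math.exp(-2.0 * (i - 1.0) * t)

# At i = 2, extending the pulse from t = 5 to t = 10 multiplies p by exp(-10).
p_short = write_failure_prob(2.0, 5.0)
p_long = write_failure_prob(2.0, 10.0)
```

Because $c$ is much larger than 1 for $\Delta = 60$, the returned value is only meaningful as a probability when the exponent is large enough, which matches the observation that the approximation is tight for low $p$.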
The normalized energy for writing a single bit is given by
$$E(i, t) = i^2 t. \tag{3}$$
As shown in (2) and (3), the write current $i$ and the write duration $t$ are key knobs to control the trade-off between write failure probability and the write energy. If we allocate different write currents and durations depending on the importance of each bit position, then the corresponding current and duration assignments are given by
$$\mathbf{i} = (i_0, \ldots, i_{B-1}), \quad \mathbf{t} = (t_0, \ldots, t_{B-1}) \tag{4}$$
where $i_0$ and $t_0$ define the write pulse for the least significant bit (LSB) and $i_{B-1}$ and $t_{B-1}$ are the write pulse parameters for the most significant bit (MSB). We define metrics for energy, latency, and fidelity for writing a $B$-bit word.
Definition 1 (Normalized Energy): The normalized energy of writing a $B$-bit word is given by
$$E(\mathbf{i}, \mathbf{t}) = \sum_{b=0}^{B-1} i_b^2 t_b. \tag{5}$$
Definition 2 (Normalized Latency): The normalized latency of writing a $B$-bit word depends on the maximum write duration among $\mathbf{t} = (t_0, \ldots, t_{B-1})$, i.e.,
$$L(\mathbf{t}) = \max\{t_0, \ldots, t_{B-1}\}. \tag{6}$$
Note that $E(\mathbf{i}, \mathbf{t})$ and $L(\mathbf{t})$ are resource metrics. As a fidelity metric, we consider mean squared error (MSE).
Definition 3: The MSE of $B$-bit words is given by
$$\text{MSE}(\mathbf{i}, \mathbf{t}) = \sum_{b=0}^{B-1} 4^b p(i_b, t_b) \tag{7}$$
where the weight $4^b$ represents the differential importance of each bit position [14], [15]. Table I summarizes the defined metrics for writing a $B$-bit word.
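The three metrics can be sketched in a few lines. The code below assumes the energy is $\sum_b i_b^2 t_b$, the latency is $\max_b t_b$, and the MSE is $\sum_b 4^b p(i_b, t_b)$ with $p(i_b, t_b) = c\,e^{-2(i_b-1)t_b}$; index 0 is taken to be the LSB. The function names and the two example allocations are hypothetical.

```python
import math

def energy(i, t):
    """Normalized energy: sum of i_b^2 * t_b over bit positions."""
    return sum(ib * ib * tb for ib, tb in zip(i, t))

def latency(t):
    """Normalized latency: the longest pulse duration in the word."""
    return max(t)

def mse(i, t, c=1.0):
    """Weighted error: sum of 4^b * p(i_b, t_b), with index 0 as the LSB.

    p is taken to be c * exp(-2 (i_b - 1) t_b); this form is an assumption.
    """
    return sum(4.0 ** b * c * math.exp(-2.0 * (ib - 1.0) * tb)
               for b, (ib, tb) in enumerate(zip(i, t)))

# Two allocations with the same energy for a 4-bit word written at i_b = 2.
i = [2.0] * 4
uniform = [2.0, 2.0, 2.0, 2.0]
skewed = [1.0, 1.5, 2.5, 3.0]   # shorter pulses on LSBs, longer on MSBs
```

With these numbers both allocations spend the same energy, but the skewed one gives a lower MSE at the cost of a higher latency, which is the trade-off the optimization in Section IV exploits.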
IV. OPTIMIZING PARAMETERS OF WRITE OPERATIONS
In this section, we investigate optimization of write operation parameters. First, the optimized current and duration for a single bit will be discussed and then we provide biconvex optimization problems for a B-bit word.
A. Optimized Parameters for Single Bit Write
First, we note that the normalized current should be greater than 1 for a successful write in (2). That is, the write current should be greater than the critical current (i.e., $I > I_c$) so as to switch the direction of magnetization [4], [21]. Then, we can formulate the following optimization problem for single-bit (also multi-bit uniform) write:
$$\begin{aligned} \underset{i,\,t}{\text{minimize}} \quad & c \exp\left(-2(i - 1)t\right) \\ \text{subject to} \quad & i^2 t \le E, \quad i \ge 1 + \epsilon, \quad t \ge 0 \end{aligned} \tag{8}$$
where $E$ is a constant corresponding to the given write energy budget. We introduce $\epsilon > 0$ to guarantee $i > 1$. This optimization problem is equivalent to
$$\begin{aligned} \underset{i,\,t}{\text{maximize}} \quad & (i - 1)t \\ \text{subject to} \quad & i^2 t \le E, \quad i \ge 1 + \epsilon, \quad t \ge 0. \end{aligned} \tag{9}$$
Note that the objective function $(i - 1)t$ is not concave. However, we can readily obtain the optimal $i^*$ and $t^*$ as follows.
Lemma 4: The optimized current and duration for single bit write are $i^* = 2$ and $t^* = \frac{E}{4}$, respectively. The corresponding write failure probability is given by
$$p(i^*, t^*) = c \exp\left(-\frac{E}{2}\right). \tag{10}$$
Proof: The proof is given in Appendix A. Note that the write failure probability is an exponentially decaying function of $E$.
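The single-bit optimum can be checked numerically. The sketch below is a brute-force grid search, not the analytic argument of Appendix A: it assumes the whole budget is spent ($t = E/i^2$) and looks for the current that maximizes $(i-1)t$. The function name, the grid resolution, and the upper limit `i_max` are illustrative choices.

```python
def best_single_bit_pulse(E, steps=5000, i_max=6.0):
    """Grid-search the current i in (1, i_max] that maximizes (i - 1) * t
    when the whole budget is spent, i.e. t = E / i^2."""
    best_i, best_val = None, float("-inf")
    for k in range(1, steps + 1):
        i = 1.0 + (i_max - 1.0) * k / steps
        val = (i - 1.0) * E / (i * i)
        if val > best_val:
            best_i, best_val = i, val
    return best_i, E / (best_i * best_i)

# For a budget of E = 40 the search should land near i = 2 and t = E / 4.
i_star, t_star = best_single_bit_pulse(E=40.0)
```

The location of the maximizer does not depend on $E$, since $E$ only scales the objective $E(i-1)/i^2$.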
B. Optimized Parameters for B-bit Word Writes
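We formulate an optimization problem to determine the currents and durations. For a given write energy constraint, we seek to minimize MSE as follows.
$$\begin{aligned} \underset{\mathbf{i},\,\mathbf{t}}{\text{minimize}} \quad & \sum_{b=0}^{B-1} 4^b \exp\left(-2(i_b - 1)t_b\right) \\ \text{subject to} \quad & \sum_{b=0}^{B-1} i_b^2 t_b \le E \\ & i_b \ge 1 + \epsilon, \quad t_b \ge 0, \quad b = 0, \ldots, B-1 \end{aligned} \tag{11}$$
The constant factor $c$ of (2) is omitted from the objective because it does not change the minimizer.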
We may include additional constraints such as L(t) ≤ δ to guarantee a required write speed performance. Note that L(t) ≤ δ is a convex constraint.
Although the optimization problem (11) is not convex, we show that (11) is a biconvex optimization problem. Hence, we can find suboptimal solutions via effective algorithms such as alternate convex search (ACS) [19] .
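Definition 5 (Biconvex Set [19]): Let $S \subseteq X \times Y$ where $X \subseteq \mathbb{R}^n$ and $Y \subseteq \mathbb{R}^m$ denote two non-empty and convex sets. The set $S$ is defined as a biconvex set on $X \times Y$ if, for every fixed $x \in X$, $S_x \triangleq \{y \in Y \mid (x, y) \in S\}$ is a convex set in $Y$ and, for every fixed $y \in Y$, $S_y \triangleq \{x \in X \mid (x, y) \in S\}$ is a convex set in $X$.
Definition 6 (Biconvex Function [19]): A function $f : S \to \mathbb{R}$ is defined as a biconvex function on $S$ if, for every fixed $x \in X$, $f_x(\cdot) = f(x, \cdot) : S_x \to \mathbb{R}$ is a convex function on $S_x$ and, for every fixed $y \in Y$, $f_y(\cdot) = f(\cdot, y) : S_y \to \mathbb{R}$ is a convex function on $S_y$.
Definition 7 (Biconvex Problem [19]): An optimization problem of the following form:
$$\min \{ f(x, y) : (x, y) \in S \} \tag{12}$$
is defined as a biconvex problem if the feasible set $S$ is biconvex on $X \times Y$ and the objective function $f$ is biconvex on $S$.
Theorem 8: The optimization problem (11) is biconvex.
Proof: First, we show that
$$\exp\left(-2(i_b - 1)t_b\right)$$
is a biconvex function of $i_b$ and $t_b$. For fixed $i_b$ the exponent is affine in $t_b$, and for fixed $t_b$ it is affine in $i_b$, so the function is convex in each variable separately. Since the positive weight $4^b$ preserves convexity, the objective function is biconvex. The energy constraint $\sum_b i_b^2 t_b \le E$ is linear in $\mathbf{t}$ for fixed $\mathbf{i}$ and convex quadratic in $\mathbf{i}$ for fixed $\mathbf{t} \ge 0$, so the feasible set is also biconvex.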
Since (11) is a biconvex problem, ACS can effectively find a suboptimal solution [19], [20]. It alternately updates the variables by fixing one of them and solving the corresponding convex optimization problem. We propose Algorithm 1 to optimize the write current $\mathbf{i}$ and the write duration $\mathbf{t}$ of the biconvex optimization problem (11) by using ACS.
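Algorithm 1 ACS algorithm to solve (11)
1: Choose a starting point $\mathbf{i}^{(0)}$ from the feasible set $S$ and set $k = 0$.
2: For fixed $\mathbf{i}^{(k)}$, find $\mathbf{t}^{(k+1)}$ by solving the following convex problem:
$$\begin{aligned} \underset{\mathbf{t}}{\text{minimize}} \quad & \sum_{b=0}^{B-1} 4^b \exp\left(-2\left(i_b^{(k)} - 1\right)t_b\right) \\ \text{subject to} \quad & \sum_{b=0}^{B-1} \left(i_b^{(k)}\right)^2 t_b \le E, \quad t_b \ge 0, \quad b = 0, \ldots, B-1 \end{aligned} \tag{13}$$
3: For fixed $\mathbf{t}^{(k+1)}$, find $\mathbf{i}^{(k+1)}$ by solving the following convex problem:
$$\begin{aligned} \underset{\mathbf{i}}{\text{minimize}} \quad & \sum_{b=0}^{B-1} 4^b \exp\left(-2(i_b - 1)t_b^{(k+1)}\right) \\ \text{subject to} \quad & \sum_{b=0}^{B-1} i_b^2 t_b^{(k+1)} \le E, \quad i_b \ge 1 + \epsilon, \quad b = 0, \ldots, B-1 \end{aligned} \tag{14}$$
4: If the point $(\mathbf{i}^{(k+1)}, \mathbf{t}^{(k+1)})$ satisfies a stopping criterion, then stop. Otherwise, set $k := k + 1$ and go back to line 2.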
Remark 9 (Starting Point): Since biconvex optimization problems may have a large number of local minima [19], a starting point $\mathbf{i}^{(0)}$ can affect the final solution. We can choose $\mathbf{i}^{(0)} = (2, \ldots, 2)$ as a starting point, which minimizes the uniform write failure probability (see Lemma 4). In Corollary 16, we show that this starting point guarantees the fastest convergence.
Remark 10 (Stopping Criterion [19]): There are several ways to define the stopping criterion in Algorithm 1. For example, we can consider the absolute values of the differences between $(\mathbf{i}^{(k)}, \mathbf{t}^{(k)})$ and $(\mathbf{i}^{(k+1)}, \mathbf{t}^{(k+1)})$ or the difference between $\text{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)})$ and $\text{MSE}(\mathbf{i}^{(k+1)}, \mathbf{t}^{(k+1)})$. Alternatively, we can set a maximum number of iterations.
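The alternating structure of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the closed-form procedure analyzed in Section V: the objective is taken to be $\sum_b 4^b e^{-2(i_b-1)t_b}$ with the energy constraint $\sum_b i_b^2 t_b \le E$, each convex subproblem is solved by bisection on its dual variable rather than by an analytic expression, and the stopping criterion is the relative MSE decrease. All function names, the margin `EPS`, and the bisection ranges are hypothetical choices.

```python
import math

EPS = 1e-3  # margin in the constraint i_b >= 1 + EPS (illustrative value)

def mse(i, t):
    return sum(4.0 ** b * math.exp(-2.0 * (ib - 1.0) * tb)
               for b, (ib, tb) in enumerate(zip(i, t)))

def energy(i, t):
    return sum(ib * ib * tb for ib, tb in zip(i, t))

def _match_energy(alloc, E, lo=-600.0, hi=600.0, iters=60):
    """Bisect on log(nu) until alloc(nu) spends (just under) the budget E.
    alloc(nu) returns (i, t) whose energy is non-increasing in nu."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if energy(*alloc(math.exp(mid))) > E:
            lo = mid
        else:
            hi = mid
    return alloc(math.exp(hi))

def update_t(i, E):
    """Step 2: minimize over t for fixed i (stationarity clipped at t_b >= 0)."""
    def alloc(nu):
        t = [max(0.0, math.log(2.0 * 4.0 ** b * (ib - 1.0) / (nu * ib * ib))
                 / (2.0 * (ib - 1.0))) for b, ib in enumerate(i)]
        return i, t
    return _match_energy(alloc, E)[1]

def _stationary_i(b, tb, nu, iters=60):
    """Solve nu * i = 4^b * exp(-2 tb (i - 1)) for i, in logarithmic form."""
    lo, hi = 1e-12, 1e6
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.log(nu) + math.log(mid) + 2.0 * tb * (mid - 1.0) < b * math.log(4.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def update_i(t, i_prev, E):
    """Step 3: minimize over i for fixed t, keeping i_b >= 1 + EPS.
    Bits with t_b = 0 keep their previous current (it has no effect)."""
    def alloc(nu):
        i = [max(1.0 + EPS, _stationary_i(b, tb, nu)) if tb > 0.0 else ib
             for b, (tb, ib) in enumerate(zip(t, i_prev))]
        return i, t
    return _match_energy(alloc, E)[0]

def acs(i0, E, max_iter=20, tol=1e-9):
    """Alternate the two convex updates until the MSE stops improving."""
    i, t, history = list(i0), None, []
    for _ in range(max_iter):
        t = update_t(i, E)
        i = update_i(t, i, E)
        history.append(mse(i, t))
        if len(history) > 1 and history[-2] - history[-1] <= tol * history[-2]:
            break
    return i, t, history

# Start from i = (2, ..., 2) for an 8-bit word with budget E = 200.
i_opt, t_opt, hist = acs([2.0] * 8, E=200.0)
```

Starting from $(2, \ldots, 2)$, the current update in this sketch leaves the currents essentially unchanged, so the loop stops after the second pass. Other starting points need more iterations, and the recorded MSE history is non-increasing up to numerical tolerance.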
V. ANALYSIS OF ALTERNATE CONVEX SEARCH FOR MRAM WRITE PARAMETERS
A. Optimal Solutions for Each Iteration
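In this subsection, we present the optimal solutions for (13) and (14). Since these problems are convex, we exploit the structure of the problems to derive the optimal solutions analytically using the KKT conditions.
Theorem 11: For fixed $\mathbf{i}^{(k)} = \mathbf{i}$, the optimal $\mathbf{t}^{(k+1)} = \mathbf{t}^*$ of (13) is given by
$$t_b^* = \begin{cases} \dfrac{1}{2(i_b - 1)} \log \dfrac{2 \cdot 4^b (i_b - 1)}{\nu i_b^2}, & \text{if } \nu < \dfrac{2 \cdot 4^b (i_b - 1)}{i_b^2} \\ 0, & \text{otherwise} \end{cases} \tag{15}$$
where $\nu$ is a dual variable of the corresponding KKT conditions. Note that $\nu$ depends on the energy budget $E$.
Proof: We define the Lagrangian $L_1(\mathbf{t}, \nu, \boldsymbol{\lambda})$ associated with problem (13) as
$$L_1(\mathbf{t}, \nu, \boldsymbol{\lambda}) = \sum_{b=0}^{B-1} 4^b e^{-2(i_b - 1)t_b} + \nu \left( \sum_{b=0}^{B-1} i_b^2 t_b - E \right) - \sum_{b=0}^{B-1} \lambda_b t_b \tag{16}$$
where $\nu$ and $\boldsymbol{\lambda} = (\lambda_0, \ldots, \lambda_{B-1})$ are the dual variables. The details of the proof are given in Appendix B.
Theorem 12: For fixed $\mathbf{t}^{(k+1)} = \mathbf{t}$, the optimal $\mathbf{i}^{(k+1)} = \mathbf{i}^*$ of (14) is given by
$$i_b^* = \max\left\{ 1 + \epsilon,\; \frac{1}{2t_b} W\left( \frac{2 \cdot 4^b t_b}{\nu'} e^{2t_b} \right) \right\} \tag{17}$$
where $\nu'$ is a dual variable and $t_b > 0$ (if $t_b = 0$, then $i_b$ affects neither the MSE nor the energy). Also, $W(\cdot)$ denotes the Lambert W function (i.e., the inverse function of $f(x) = xe^x$) [23].
Proof: We define the Lagrangian $L_2(\mathbf{i}, \nu', \boldsymbol{\lambda}')$ associated with problem (14) as
$$L_2(\mathbf{i}, \nu', \boldsymbol{\lambda}') = \sum_{b=0}^{B-1} 4^b e^{-2(i_b - 1)t_b} + \nu' \left( \sum_{b=0}^{B-1} i_b^2 t_b - E \right) - \sum_{b=0}^{B-1} \lambda'_b (i_b - 1 - \epsilon) \tag{18}$$
where $\nu'$ and $\boldsymbol{\lambda}' = (\lambda'_0, \ldots, \lambda'_{B-1})$ are the dual variables. The details of the proof are given in Appendix C.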
Remark 13: The solutions of (15) and (17) can be interpreted as water-filling. Each bit position can be regarded as an individual channel among B parallel channels as in [15] , [16] . The ground levels depend on the importance of bit positions; hence larger current or longer duration are assigned to more significant bit positions.
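The water-filling behavior of the duration update can be made concrete for the special case $i_b = 2$. The sketch below assumes the stationarity condition of (13) with objective $\sum_b 4^b e^{-2(i_b-1)t_b}$, which for $i_b = 2$ gives $t_b = \max\{0, \tfrac{1}{2}\log(4^b/(2\nu))\}$; the dual variable $\nu$ plays the role of the energy price. The function names and the two example values of $\nu$ are illustrative.

```python
import math

def waterfill_durations(nu, B=8):
    """Durations t_b = max(0, 0.5 * log(4^b / (2 nu))) for currents i_b = 2."""
    return [max(0.0, 0.5 * math.log(4.0 ** b / (2.0 * nu))) for b in range(B)]

def spent_energy(nu, B=8):
    """Energy sum of i_b^2 * t_b with i_b = 2."""
    return sum(4.0 * tb for tb in waterfill_durations(nu, B))

low_water = waterfill_durations(nu=8.0)    # high price: the lowest bits get no pulse
high_water = waterfill_durations(nu=0.25)  # low price: every bit position is written
```

Each step up in bit position adds $\log 2$ to the duration once the bit is above the threshold, so MSBs always receive the longest pulses, and lowering $\nu$ raises all active durations together while increasing the spent energy.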
B. Convergence of MSE
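We show that Algorithm 1 guarantees convergence to a locally optimal MSE. The converged MSE depends on the starting point.
Lemma 14: The sequence $\{\text{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)})\}_{k \in \mathbb{N}}$ obtained by Algorithm 1 is monotonically decreasing, i.e.,
$$\text{MSE}(\mathbf{i}^{(k+1)}, \mathbf{t}^{(k+1)}) \le \text{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)}).$$
Proof: Note that $\text{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k+1)}) \le \text{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)})$ and $\text{MSE}(\mathbf{i}^{(k+1)}, \mathbf{t}^{(k+1)}) \le \text{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k+1)})$ because of (13) and (14), respectively. Hence, $\text{MSE}(\mathbf{i}^{(k+1)}, \mathbf{t}^{(k+1)}) \le \text{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)})$.
Theorem 15: The sequence $\{\text{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)})\}_{k \in \mathbb{N}}$ obtained by Algorithm 1 converges monotonically.
Proof: It is clear that $\text{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)}) \ge 0$ for all $k \in \mathbb{N}$ by (2) and (7). Since the sequence is monotonically decreasing by Lemma 14 and bounded below, it converges by the monotone convergence theorem.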
Corollary 16: By setting $\mathbf{i}^{(0)} = (2, \ldots, 2)$, we obtain
$$\mathbf{i}^{(1)} = \mathbf{i}^{(0)}$$
if $t_b^{(1)} > 0$ for all $b \in [0, B-1]$.
Proof: We will show that $(\mathbf{i}^{(0)}, \mathbf{t}^{(1)})$ (i.e., the solution of (13)) satisfies the KKT conditions of (14). Then, $\mathbf{i}^{(1)} = \mathbf{i}^{(0)}$, which makes Algorithm 1 converge in one step. The details of the proof are given in Appendix D.
Corollary 16 means that the starting point $\mathbf{i}^{(0)} = (2, \ldots, 2)$ guarantees the fastest convergence. Note that we do not need to solve (14).
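The starting point $\mathbf{i}^{(0)} = (2, \ldots, 2)$ guarantees the fastest convergence. Note that it minimizes the write failure probability for the single bit case (see Lemma 4). In this subsection, we show that $\mathbf{i}^{(0)} = (2, \ldots, 2)$ is a good starting point, in the sense that it reduces the MSE exponentially with $B$.
Suppose that the starting point is $\mathbf{i}^{(0)} = (2, \ldots, 2)$. By Theorem 11 and Corollary 16, Algorithm 1 provides the following optimized write durations $\mathbf{t}^{(1)}$:
$$t_b^{(1)} = \begin{cases} \dfrac{1}{2} \log \dfrac{4^b}{2\nu}, & \text{if } \nu < \dfrac{4^b}{2} \\ 0, & \text{otherwise.} \end{cases} \tag{20}$$
Lemma 17: If $E > 2B(B - 1) \log 2$, then the optimized write durations are given by
$$t_b^{(1)} = \frac{E}{4B} + \frac{2b - B + 1}{2} \log 2 \tag{21}$$
for $b \in [0, B-1]$.
Proof: The proof is given in Appendix E.
Theorem 18: If $E > 2B(B - 1) \log 2$, then the MSE reduction ratio by Algorithm 1 is given by
$$\gamma = \frac{\text{MSE}(\mathbf{i}^{(0)}, \mathbf{t}^{(1)})}{\text{MSE}(\mathbf{i}^{(0)}, \mathbf{t}^{(0)})} = \frac{3B \cdot 2^{B-1}}{4^B - 1} \approx \frac{3B}{2} \cdot 2^{-B} \tag{22}$$
where $\text{MSE}(\mathbf{i}^{(0)}, \mathbf{t}^{(1)})$ (i.e., the optimized MSE by Algorithm 1) is given by
$$\text{MSE}(\mathbf{i}^{(0)}, \mathbf{t}^{(1)}) = c B \cdot 2^{B-1} \exp\left(-\frac{E}{2B}\right) \tag{23}$$
where the optimized $\mathbf{t}^{(1)}$ is given by (20). In addition, $\text{MSE}(\mathbf{i}^{(0)}, \mathbf{t}^{(0)})$ (i.e., the MSE by uniform energy allocation) is given by
$$\text{MSE}(\mathbf{i}^{(0)}, \mathbf{t}^{(0)}) = c \cdot \frac{4^B - 1}{3} \exp\left(-\frac{E}{2B}\right) \tag{24}$$
where $\mathbf{t}^{(0)}$ is the uniform value that satisfies the energy constraint (i.e., $\mathbf{t}^{(0)} = \frac{E}{4B} \cdot (1, \ldots, 1)$).
Proof: The proof is given in Appendix E. Note that $\text{MSE}(\mathbf{i}^{(0)}, \mathbf{t}^{(0)})$ is the MSE corresponding to the parameters minimizing the write failure probability (see Lemma 4).
Remark 19: By setting $\mathbf{i}^{(0)} = (2, \ldots, 2)$, Algorithm 1 reduces the MSE exponentially with $B$, compared to the parameters optimized for write failure probability. Although we cannot guarantee that $(\mathbf{i}^{(0)}, \mathbf{t}^{(1)})$ is globally optimal, it decreases the MSE exponentially by solving (13) once (see Corollary 16). Furthermore, the solution of (13) can be easily computed by Lemma 17.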
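The closed-form allocation and the resulting reduction ratio can be evaluated directly. The sketch below assumes $i_b = 2$ for all bits, the durations $t_b = \frac{E}{4B} + \frac{2b - B + 1}{2}\log 2$ (obtained by combining the water-filling solution with the energy constraint $\sum_b 4 t_b = E$), and the weighted error $\sum_b 4^b e^{-2t_b}$; the constant factor $c$ cancels in the ratio and is omitted. Function names and the example budgets are hypothetical.

```python
import math

def optimized_durations(E, B):
    """Closed-form durations for i_b = 2, valid when E > 2 B (B - 1) log 2."""
    if E <= 2.0 * B * (B - 1) * math.log(2.0):
        raise ValueError("budget too small: some t_b would be clipped to zero")
    return [E / (4.0 * B) + 0.5 * (2 * b - B + 1) * math.log(2.0)
            for b in range(B)]

def weighted_error(t):
    """Sum of 4^b * exp(-2 t_b) for i_b = 2 (constant factor c omitted)."""
    return sum(4.0 ** b * math.exp(-2.0 * tb) for b, tb in enumerate(t))

def reduction_ratio(E, B):
    """Optimized MSE divided by the MSE of the uniform allocation."""
    uniform = [E / (4.0 * B)] * B
    return weighted_error(optimized_durations(E, B)) / weighted_error(uniform)

gamma_8 = reduction_ratio(E=200.0, B=8)    # about 0.0469, i.e. a factor of ~21
gamma_16 = reduction_ratio(E=400.0, B=16)  # about 3.66e-4
```

The ratio does not depend on $E$ as long as the budget exceeds the threshold, because both the optimized and the uniform MSE scale with the same factor $e^{-E/(2B)}$.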
VI. NUMERICAL RESULTS
We evaluate the solutions to optimize the write failure probability for single bits as well as the MSE for $B$-bit words. The critical current $I_c$ and the characteristic relaxation time $T_c$ do not affect the numerical results because the normalized values $i = I/I_c$ and $t = T/T_c$ are considered. As in [4], we set $\Delta = 60$ for the thermal stability factor. Fig. 4 shows that $i^* = 2$ and $t^* = \frac{E}{4}$ minimize the write failure probability as proved in Lemma 4. The corresponding minimal write failure probability decreases exponentially with the write energy as shown in (10). Fig. 5 shows numerical results by solving (11). Fig. 5(a) compares the MSEs of uniform write energy allocation and the optimized energy allocation by Algorithm 1. We set a starting point $\mathbf{i}^{(0)} = (2, \ldots, 2)$. As shown in Theorem 18, the MSE reduction ratio is $\gamma \approx \frac{3B}{2} \cdot 2^{-B} = 0.0469$ for $B = 8$. Fig. 5(b) compares the peak signal-to-noise ratios (PSNRs), a widely used fidelity metric for image and video quality. The PSNR depends on the MSE as
$$\text{PSNR} = 10 \log_{10} \frac{(2^B - 1)^2}{\text{MSE}}.$$
At PSNR = 40 dB, the optimized write energy allocation can reduce the write energy by 24%. Fig. 6 shows that the MSE reduction ratio improves exponentially with $B$ (as derived in Theorem 18). Although we cannot guarantee the optimality, the proposed Algorithm 1 is very effective in reducing the MSE. Note that $\gamma = 3.66 \times 10^{-4}$ for $B = 16$ and $\gamma = 1.12 \times 10^{-8}$ for $B = 32$. Fig. 7 characterizes the convergence of Algorithm 1. The convergence speed depends on the starting point $\mathbf{i}^{(0)}$. For both $\mathbf{i}^{(0)} = (1, \ldots, 1)$ and $\mathbf{i}^{(0)} = (2, \ldots, 2)$, Algorithm 1 converges; however, convergence from $\mathbf{i}^{(0)} = (1, \ldots, 1)$ is slower than from $\mathbf{i}^{(0)} = (2, \ldots, 2)$. As shown in Corollary 16, the starting point $\mathbf{i}^{(0)} = (2, \ldots, 2)$ guarantees the fastest convergence (see Fig. 7(c) and (d)). Fig. 8 compares the MSEs of $\mathbf{i}^{(0)} = (1, \ldots, 1)$ and $\mathbf{i}^{(0)} = (2, \ldots, 2)$. We observe that the MSE for $\mathbf{i}^{(0)} = (2, \ldots, 2)$ is lower than that for $\mathbf{i}^{(0)} = (1, \ldots, 1)$. The gap between these two MSEs vanishes as the iterations progress.
VII. CONCLUSION
We proposed an information-theoretic approach to improving MRAM's write energy efficiency. After formulating the biconvex optimization problem, we proposed an iterative algorithm to solve it, which attempts to minimize the MSE under a write energy budget. Also, we proved that the proposed algorithm converges and that it can reduce the MSE exponentially. The proposed optimization scheme can be extended in future work to coded information representations, where redundancy is added to the written values to further improve the fidelity.
APPENDIX A PROOF OF LEMMA 4
It is clear that $i^*$ and $t^*$ satisfy $i^2 t = E$ to maximize $(i - 1)t$. Then, we can set $t = \frac{E}{i^2}$ and the corresponding objective function is given by
$$g(i) = \frac{E(i - 1)}{i^2}. \tag{25}$$
Since $g'(i) = E \cdot \frac{2 - i}{i^3}$, we have $g'(2) = 0$, $g'(i) > 0$ for $1 < i < 2$, and $g'(i) < 0$ for $i > 2$. Hence, $g(i)$ is maximized when $i^* = 2$ and $t^* = \frac{E}{4}$.
APPENDIX B PROOF OF THEOREM 11
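The corresponding KKT conditions are as follows:
$$\sum_{b=0}^{B-1} i_b^2 t_b \le E, \quad \nu \ge 0, \quad \nu \left( \sum_{b=0}^{B-1} i_b^2 t_b - E \right) = 0, \tag{26}$$
$$t_b \ge 0, \quad \lambda_b \ge 0, \quad \lambda_b t_b = 0, \tag{27}$$
$$-2 \cdot 4^b (i_b - 1) e^{-2(i_b - 1)t_b} + \nu i_b^2 - \lambda_b = 0 \tag{28}$$
for $b \in [0, B-1]$. From (28), $\lambda_b$ is given by
$$\lambda_b = \nu i_b^2 - 2 \cdot 4^b (i_b - 1) e^{-2(i_b - 1)t_b}. \tag{29}$$
Suppose that $\nu = 0$. Then $\lambda_b < 0$ because of $i_b \ge 1 + \epsilon$, which violates the condition of $\boldsymbol{\lambda} \ge 0$. Hence, $\nu \ne 0$ and
$$\sum_{b=0}^{B-1} i_b^2 t_b = E. \tag{30}$$
From (27) and (29),
$$t_b \left( \nu i_b^2 - 2 \cdot 4^b (i_b - 1) e^{-2(i_b - 1)t_b} \right) = 0, \tag{31}$$
$$\nu i_b^2 \ge 2 \cdot 4^b (i_b - 1) e^{-2(i_b - 1)t_b}. \tag{32}$$
If $\nu \ge \frac{2 \cdot 4^b (i_b - 1)}{i_b^2}$, then $t_b = 0$. Otherwise (i.e., $t_b > 0$), the inequality in (32) becomes strict because $e^{-2(i_b - 1)t_b} < 1$, which violates (31). If $\nu < \frac{2 \cdot 4^b (i_b - 1)}{i_b^2}$, then $t_b = 0$ is not allowed because of (32). Hence, $t_b > 0$ and $\lambda_b = 0$. By (29) and $i_b \ge 1 + \epsilon$,
$$e^{-2(i_b - 1)t_b} = \frac{\nu i_b^2}{2 \cdot 4^b (i_b - 1)} \tag{33}$$
which results in
$$t_b = \frac{1}{2(i_b - 1)} \log \frac{2 \cdot 4^b (i_b - 1)}{\nu i_b^2}. \tag{34}$$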
APPENDIX C PROOF OF THEOREM 12
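The corresponding KKT conditions are as follows:
$$\sum_{b=0}^{B-1} i_b^2 t_b \le E, \quad \nu' \ge 0, \quad \nu' \left( \sum_{b=0}^{B-1} i_b^2 t_b - E \right) = 0, \tag{35}$$
$$i_b \ge 1 + \epsilon, \quad \lambda'_b \ge 0, \quad \lambda'_b (i_b - 1 - \epsilon) = 0, \tag{36}$$
$$-2 \cdot 4^b t_b e^{-2t_b(i_b - 1)} + 2\nu' i_b t_b - \lambda'_b = 0 \tag{37}$$
for $b \in [0, B-1]$. From (37), $\lambda'_b$ is given by
$$\lambda'_b = 2\nu' i_b t_b - 2 \cdot 4^b t_b e^{-2t_b(i_b - 1)}. \tag{38}$$
Suppose that $\nu' = 0$. Then, $\lambda'_b = -2t_b \cdot 4^b e^{-2t_b(i_b - 1)} \le 0$, which is compatible with $\lambda'_b \ge 0$ only if $t_b = 0$ for all $b \in [0, B-1]$. Since this is a trivial case, we focus on $\nu' \ne 0$ and $\sum_{b=0}^{B-1} i_b^2 t_b = E$. If $t_b = 0$, then the corresponding $i_b$ affects neither the MSE nor the energy. Hence, we suppose that $t_b \ne 0$. If $\lambda'_b = 0$, then
$$\nu' i_b = 4^b e^{-2t_b(i_b - 1)} \tag{39}$$
which is equivalent to
$$2t_b i_b e^{2t_b i_b} = \frac{2 \cdot 4^b t_b}{\nu'} e^{2t_b} \tag{40}$$
where $W(\cdot)$ denotes the Lambert W function [23]. Hence,
$$i_b = \frac{1}{2t_b} W\left( \frac{2 \cdot 4^b t_b}{\nu'} e^{2t_b} \right). \tag{41}$$
APPENDIX D PROOF OF COROLLARY 16
If $t_b^{(1)} > 0$ for all $b \in [0, B-1]$, then we obtain the following solution by solving (13).
$$t_b^{(1)} = \frac{1}{2} \log \frac{4^b}{2\nu} \tag{42}$$
which follows from (15). We will show that $(\mathbf{i}^{(0)}, \mathbf{t}^{(1)})$ also satisfies all the KKT conditions for (14) (i.e., (35)-(37) in Appendix C). First, $(\mathbf{i}^{(0)}, \mathbf{t}^{(1)})$ satisfies (35), which is equivalent to (26). In addition, $i_b^{(0)} = 2$ satisfies (36) and makes $\lambda'_b = 0$ for all $b \in [0, B-1]$. Then, (37) will be
$$4\nu' t_b^{(1)} - 2 \cdot 4^b t_b^{(1)} e^{-2t_b^{(1)}} = 0. \tag{43}$$
Suppose that $\mathbf{i}^{(1)} = \mathbf{i}^{(0)}$. Then, (39) is modified to $\nu' = \frac{4^b}{2} e^{-2t_b^{(1)}}$, which satisfies (43). By (42), this value equals $\nu$ for every $b$, so the single dual variable $\nu' = \nu$ serves all bit positions. Thus, $(\mathbf{i}^{(0)}, \mathbf{t}^{(1)})$ satisfies all the KKT conditions of (13) and (14).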
APPENDIX E PROOF OF LEMMA 17 AND THEOREM 18
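From (20), we observe that $t_b^{(1)} > 0$ for all $b \in [0, B-1]$ if $\nu < \frac{1}{2}$. By (30), $(\mathbf{i}^{(0)}, \mathbf{t}^{(1)})$ satisfies
$$E = \sum_{b=0}^{B-1} 4 t_b^{(1)} = 2B(B - 1) \log 2 - 2B \log(2\nu). \tag{45}$$
Then, the condition $\nu < \frac{1}{2}$ is equivalent to
$$E > 2B(B - 1) \log 2. \tag{46}$$
Hence, $t_b^{(1)} > 0$ for all $b \in [0, B-1]$ if (46) holds. By (20) and (45), we obtain (21). By (7) and (21),
$$\text{MSE}(\mathbf{i}^{(0)}, \mathbf{t}^{(1)}) = c \sum_{b=0}^{B-1} 4^b e^{-2t_b^{(1)}} = c B \cdot 2^{B-1} \exp\left(-\frac{E}{2B}\right). \tag{47}$$
The uniform energy allocation of $(\mathbf{i}^{(0)}, \mathbf{t}^{(0)})$ results in
$$\text{MSE}(\mathbf{i}^{(0)}, \mathbf{t}^{(0)}) = c \sum_{b=0}^{B-1} 4^b \exp\left(-\frac{E}{2B}\right) = c \cdot \frac{4^B - 1}{3} \exp\left(-\frac{E}{2B}\right). \tag{48}$$
From (47) and (48), we obtain (22).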
ACKNOWLEDGMENT
The work of Yuval Cassuto was partly supported by the US-Israel Binational Science Foundation, and by the Israel Science Foundation.
