Optimizing the Write Fidelity of MRAMs by Kim, Yongjune et al.
arXiv:2001.03803v1 [cs.IT] 11 Jan 2020
Optimizing the Write Fidelity of MRAMs
Yongjune Kim, Yoocharn Jeon, Cyril Guyot, and Yuval Cassuto
Abstract—Magnetic random-access memory (MRAM) is a
promising memory technology due to its high density, non-
volatility, and high endurance. However, achieving high memory
fidelity incurs significant write-energy costs, which should be
reduced for large-scale deployment of MRAMs. In this paper,
we formulate an optimization problem for maximizing the mem-
ory fidelity given energy constraints, and propose a biconvex
optimization approach to solve it. The basic idea is to allocate
non-uniform write pulses depending on the importance of each
bit position. The fidelity measure we consider is minimum mean
squared error (MSE), for which we propose an iterative water-
filling algorithm. Although the iterative algorithm does not
guarantee global optimality, we can choose a proper starting
point that decreases the MSE exponentially and guarantees fast
convergence. For an 8-bit accessed word, the proposed algorithm
reduces the MSE by a factor of 21.
I. INTRODUCTION
Magnetic random access memory (MRAM) is a nonvolatile
memory technology that has the potential to combine the speed
of static RAM (SRAM) and the density of dynamic RAM
(DRAM). Furthermore, MRAM technology is attractive since
it provides high endurance and complementary metal-oxide-
semiconductor (CMOS) compatibility [1], [2].
In spite of its attractive features, one of the main challenges
is the high energy consumption to write information reliably in
the memory element [1]–[3]. In an MRAM device, a memory
state “1” or “0” is determined by the magnetic moment
orientation of the memory element [1]. Switching the magnetic
moment orientation requires high write current, which intro-
duces write errors when the energy budget is limited [2]. In
addition, high current injection through the tunneling barriers
incurs severe stress and leads to breakdown, which degrades
the endurance of MRAM cells [3], [4]. Hence, one of the
key directions of MRAM research has been toward providing
reliable switching with limited energy cost. At the device level,
new materials [5], [6] or new switching mechanisms [7], [8]
have been explored. Several architectural techniques to reduce
write energy can be found in [3], [9], [10].
However, prior efforts have not considered the differential
importance of each bit position in error tolerant applications
such as signal processing and machine learning (ML) tasks.
In these applications, the impact of bit errors depends on bit
position, i.e., most significant bits (MSBs) are more important
than least significant bits (LSBs) [11], [12]. This differential
importance has been leveraged to effectively optimize energy
in major memory technologies such as SRAMs [13]–[16] and
DRAMs [17], [18].
Y. Kim, Y. Jeon, and C. Guyot are with Western Digital Research,
Milpitas, CA 95035 USA (e-mail: {yongjune.kim, yoocharn.jeon,
cyril.guyot}@wdc.com). Y. Cassuto is with the Viterbi Department of
Electrical Engineering, Technion – Israel Institute of Technology, Haifa, Israel
(e-mail: ycassuto@ee.technion.ac.il).
In this paper, we provide a principled approach to improving
MRAM’s write fidelity. In error tolerant applications, the mean
squared error (MSE) is a more meaningful fidelity metric than
the write failure probability (or bit error rate). We formulate
a biconvex optimization problem to minimize the MSE for
a given write energy constraint. Since the write energy and
the MSE depend on the write current and the write pulse
duration, we attempt to optimize both parameters by solving
the biconvex problem.
A biconvex problem is an optimization problem where the
objective function and the constraint set are biconvex [19]. A
common algorithm for solving biconvex problems is alternate
convex search (ACS), which alternately updates each variable while
fixing the other and solving the corresponding convex problem in an
iterative manner [20]. We propose an iterative algorithm based
on ACS to optimize the write current and the write pulse
duration. In addition, we show that the proposed iterative
algorithm converges and the convergence speed can be very
fast by choosing a proper starting point.
In general, ACS cannot guarantee a globally optimal solution
since biconvex problems may have a large number of local
minima [19]. However, we prove that the proposed iterative
algorithm can reduce the MSE exponentially by choosing a
proper starting point. Furthermore, we show that this starting
point guarantees the fastest convergence. We derive analytic
expressions of the optimal solutions for each iteration. Since
each iteration of the algorithm corresponds to solving convex
problems, we rely on the Karush-Kuhn-Tucker (KKT) condi-
tions to derive the optimal solutions. We also provide water-
filling interpretations for each iteration.
Prior optimization studies on voltage swing of SRAMs [15],
[16] and refresh operations of DRAMs [18] are similar in
spirit, viz. minimizing the MSE for given resource constraints.
However, the MRAM write optimization of this work is non-
convex whereas the formulated problems in [15], [18] are
convex. Hence, we propose the iterative algorithm and analyze
convergence and improvement of the optimized MSE. To the
best of our knowledge, our work is the first information-
theoretic approach to optimization of write pulse parameters
of MRAMs.
The rest of this paper is organized as follows. Section II
explains the basics of MRAM and the challenges of high write
energy consumption. Section III introduces the optimization
metrics for MRAM write operations. Section IV formulates
optimization problems and provides the iterative algorithm
based on ACS. Section V provides theoretical analysis on
convergence and MSE reduction. Section VI gives numerical
results and Section VII concludes.
[Figure labels: free layer, barrier, reference layer; P-state (low resistance), AP-state (high resistance).]
Fig. 1. P state and AP state of MTJ MRAM devices.
II. BASIC PRINCIPLES OF MRAMS
A. Basics of MRAMs
MRAM cells store information by controlling bistable mag-
netization of ferromagnetic material and retrieve information
by sensing resistance of magnetic tunnel junctions (MTJs). An
MTJ device consists of two ferromagnetic layers, a reference
layer (RL) and a free layer (FL), separated by a very thin
tunneling barrier. RL has a very stable magnetization and it
maintains the magnetization throughout all operations, while
FL can be switched between two stable magnetization states
by a moderate stimulus. The resistance of an MTJ depends on
the relative orientation of the FL magnetization with respect
to that of the RL as shown in Fig. 1. If the magnetizations
of FL and RL are in the same direction (parallel- or P-
state), then the corresponding resistance is low. The opposite
direction (antiparallel- or AP-state) results in high resistance.
The difference in tunneling currents between a P-state (low
resistance) and an AP-state (high resistance) is utilized to
encode binary data [1], [2].
Writing information into an MTJ is performed by driving
a sufficient current through it. Depending on the current’s
direction, one can flip the magnetization of the FL into P- or
AP-state. If a current flows from FL to RL (electrons from RL
to FL), electrons are spin-polarized along the magnetization of
RL while passing through the layer. The electrons transmitted
from the RL interact and exchange the magnetic moments
with ones in the FL. If the MTJ is in the AP-state and the
current is sufficiently high, then the magnetization orientation
is flipped to P-state. When the current is reversed, incoming
electrons are polarized along the magnetization of FL. Since
the RL’s magnetization is parallel to the FL, the majority
of the electrons tunnel the barrier while the minority that
have antiparallel magnetizations are reflected. Because of this
selective tunneling, the antiparallel spins are accumulated in
the FL. If the enriched antiparallel spin dominates the FL, it
flips the magnetization of the FL into the AP-state.
The magnetization switching between P state and AP state
is not deterministic. The write (switching) failure probability
depends on the magnitude and the duration of the write current
pulse as follows [4, Eq. (26)]:
$$p(i, t) = 1 - \exp\left(-\frac{\Delta \pi^2 (i - 1)}{4\left\{i \exp(2(i - 1)t) - 1\right\}}\right), \tag{1}$$
where $\Delta$ denotes the thermal stability factor. The normalized
current $i$ is given by $i = I/I_c$, where $I$ denotes the actual write
current and $I_c$ is the critical current. The normalized duration
is given by $t = T/T_c$, where $T$ denotes the actual write duration
and $T_c$ is the characteristic relaxation time. Note that $\Delta$, $I_c$,
and $T_c$ are fabrication parameters [4], [21].
[Figure labels: Subarray (two instances), Row Decoder, Col. Decoder, R/W Circuit, 1-bit, Data Buffer, Data Bus, Address Bus.]
Fig. 2. MRAM subarray architecture where each subarray consists of $n_{\text{row}}$
rows and $n_{\text{col}}$ columns.
To ensure a low write failure probability, we should control
the write current magnitude or the duration judiciously. A
longer write duration may lower the write failure probability
at the expense of longer write latency and higher energy
consumption. Instead of increasing the write duration, we can
adopt a higher write current. However, this increases the write
energy and the risk of dielectric breakdown of the MTJ.
B. Subarray Architecture
The MRAM cells are arranged in arrays and each of the
cells is selectively connected to the read/write circuits to
access the data. The metal-oxide-semiconductor field-effect
transistors (MOSFETs) are commonly used for the selectors
in DRAMs where the required current for memory operations
is low enough; a MOSFET with minimum feature sizes can
drive the required current. However, the required MRAM write
current is more than an order of magnitude higher than that of
DRAMs, which requires MOSFETs with a large channel width
to drive the high write current. Such wide MOSFETs are not suitable for
high-density memories because of their large area on a silicon substrate.
In order to handle this problem, each MRAM cell consists
of an MTJ and a threshold switching selector [2], [22]. These
MRAM cells are populated in a crossbar array. To access
an MRAM cell, a voltage higher than the threshold voltage
of the selector is applied, which turns on the corresponding
selector between the selected row-line and column-line, while
all the unselected row-lines and column-lines are biased to
a midpoint voltage, which keeps all the unselected cells in
the array under biases below their threshold voltages. In this
manner, the number of needed MOSFETs driving high currents
can be reduced from $n_{\text{row}} \times n_{\text{col}}$ to $n_{\text{row}} + n_{\text{col}}$ for a subarray,
which is much better suited for high density memories.
Because of the limited current drivability of the row line
and the column line drivers, only one cell can be accessed at
a time in each subarray unlike DRAMs where a whole page
(row-line) can be read/written together (see Fig. 2). Multiple
subarrays are operated in parallel to match the required data
bandwidth. This MRAM architecture provides an opportunity
to write each bit in different conditions (e.g., write current and
pulse duration).
[Figure: write failure probability $p$ (log scale, $10^{-10}$ to $10^{0}$) versus normalized current $i$ (1 to 10); curves of $p(i, t)$ and of its approximation for $t = 1, 2, 5, 10, 20$.]
Fig. 3. Comparison of the write failure probability (1) and its approximation
(2) ($\Delta = 60$ as in [4, Fig. 13]).
III. METRICS FOR MRAM WRITE OPERATIONS
The write failure probability expression of (1) is too com-
plicated to formulate an optimization problem. Fortunately, we
can use the following approximation instead of (1):
$$p(i, t) \approx c \exp\left(-2(i - 1)t\right), \tag{2}$$
where $c = \frac{\Delta \pi^2}{4}$. This is a slightly modified approximation
of [4, Eq. (27)] so as to formulate a biconvex optimization
problem. Fig. 3 shows that the approximated write failure
probability (2) is very close to (1), especially for lower p. The
write failure probability can be controlled by the normalized
current i and the normalized write duration t.
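To get a feel for how (2) tracks (1), the following Python sketch evaluates both expressions at a few operating points. The function names `p_exact` and `p_approx` and the sample points are illustrative assumptions, not part of the paper; $\Delta = 60$ follows Fig. 3.

```python
import math

DELTA = 60.0  # thermal stability factor used for Fig. 3


def p_exact(i, t, delta=DELTA):
    """Write failure probability of Eq. (1); assumes i > 1 and t >= 0."""
    num = delta * math.pi ** 2 * (i - 1.0)
    den = 4.0 * (i * math.exp(2.0 * (i - 1.0) * t) - 1.0)
    # 1 - exp(-x), evaluated with expm1 so it stays accurate when x is tiny.
    return -math.expm1(-num / den)


def p_approx(i, t, delta=DELTA):
    """Approximation of Eq. (2): c * exp(-2 (i - 1) t), c = delta * pi^2 / 4."""
    c = delta * math.pi ** 2 / 4.0
    return c * math.exp(-2.0 * (i - 1.0) * t)


# Illustrative operating points (hypothetical choices).
for i, t in [(2.0, 5.0), (2.0, 10.0), (4.0, 5.0)]:
    print(f"i={i}, t={t}: exact={p_exact(i, t):.3e}, approx={p_approx(i, t):.3e}")
```

For large $(i - 1)t$, expression (1) behaves like $c \cdot \frac{i-1}{i} \exp(-2(i-1)t)$, so the ratio of (2) to (1) tends to $i/(i-1)$. This is a constant offset on the logarithmic scale of Fig. 3 and does not change the exponential decay rate.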
The normalized energy for writing a single bit is given by
$$E(i, t) = i^2 t. \tag{3}$$
As shown in (2) and (3), the write current i and the write
duration t are key knobs to control the trade-off between write
failure probability and the write energy. If we allocate different
write currents and durations depending on the importance of
each bit position, then the corresponding current and duration
assignments are given by
$$\mathbf{i} = (i_0, \ldots, i_{B-1}), \quad \mathbf{t} = (t_0, \ldots, t_{B-1}), \tag{4}$$
where $i_0$ and $t_0$ define the write pulse for the least significant bit
(LSB) and $i_{B-1}$ and $t_{B-1}$ are the write pulse parameters for
the most significant bit (MSB).
We define metrics for energy, latency, and fidelity for
writing a B-bit word.
Definition 1 (Normalized Energy): The normalized energy
of writing a B-bit word is given by
$$E(\mathbf{i}, \mathbf{t}) = \sum_{b=0}^{B-1} i_b^2 t_b. \tag{5}$$
TABLE I
RESOURCE AND FIDELITY METRICS FOR WRITE OPERATION
Metrics | Expression | Remarks
Energy | $E(\mathbf{i}, \mathbf{t}) = \sum_{b=0}^{B-1} i_b^2 t_b$ | Definition 1
Latency | $L(\mathbf{t}) = \max\{t_0, \ldots, t_{B-1}\}$ | Definition 2
Fidelity | $\mathrm{MSE}(\mathbf{i}, \mathbf{t}) = \sum_{b=0}^{B-1} 4^b p(i_b, t_b)$ | Definition 3
Definition 2 (Normalized Latency): The normalized latency
of writing a B-bit word depends on the maximum write
duration among t = (t0, . . . , tB−1), i.e.,
L(t) = max{t0, . . . , tB−1}. (6)
Note that E(i, t) and L(t) are resource metrics. As a fidelity
metric, we consider mean squared error (MSE).
Definition 3: The MSE of B-bit words is given by
$$\mathrm{MSE}(\mathbf{i}, \mathbf{t}) = \sum_{b=0}^{B-1} 4^b p(i_b, t_b) = c \cdot \sum_{b=0}^{B-1} 4^b \exp\left(-2(i_b - 1)t_b\right), \tag{7}$$
where the weight $4^b$ represents the differential importance of
each bit position [14], [15].
Table I summarizes the defined metrics for writing a B-bit
word.
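The three metrics of Table I can be sketched in a few lines of Python. The function names, the list-based representation (index 0 is the LSB), and the example values $B = 8$ and energy budget 300 are assumptions made for illustration; the MSE uses the approximation (2) with $\Delta = 60$.

```python
import math

DELTA = 60.0
C = DELTA * math.pi ** 2 / 4.0  # constant c of Eq. (2)


def energy(i, t):
    """Normalized energy of Definition 1: sum of i_b^2 * t_b."""
    return sum(ib ** 2 * tb for ib, tb in zip(i, t))


def latency(t):
    """Normalized latency of Definition 2: the longest pulse duration."""
    return max(t)


def mse(i, t, c=C):
    """MSE of Definition 3, using the approximated failure probability (2)."""
    return c * sum(4 ** b * math.exp(-2.0 * (ib - 1.0) * tb)
                   for b, (ib, tb) in enumerate(zip(i, t)))


# Hypothetical example: uniform pulses for an 8-bit word, energy budget 300.
B, BUDGET = 8, 300.0
i_uniform = [2.0] * B
t_uniform = [BUDGET / (4 * B)] * B
print(energy(i_uniform, t_uniform), latency(t_uniform), mse(i_uniform, t_uniform))
```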
IV. OPTIMIZING PARAMETERS OF WRITE OPERATIONS
In this section, we investigate optimization of write opera-
tion parameters. First, the optimized current and duration for
a single bit will be discussed and then we provide biconvex
optimization problems for a B-bit word.
A. Optimized Parameters for Single Bit Write
First, we note that the normalized current should be greater
than 1 for a successful write in (2). That is, the write
current should be greater than the critical current (i.e., $I > I_c$)
so as to switch the direction of magnetization [4], [21].
Then, we can formulate the following optimization problem
for single-bit (also multi-bit uniform) write:
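$$\begin{aligned}
\underset{i,\,t}{\text{minimize}} \quad & p(i, t) = c \exp\left(-2(i - 1)t\right) \\
\text{subject to} \quad & i^2 t \le \mathcal{E}, \quad i \ge 1 + \epsilon, \quad t \ge 0,
\end{aligned} \tag{8}$$
where $\mathcal{E}$ is a constant corresponding to the given write energy
budget. We introduce $\epsilon > 0$ to guarantee $i > 1$. This
optimization problem is equivalent to
$$\begin{aligned}
\underset{i,\,t}{\text{maximize}} \quad & (i - 1)t \\
\text{subject to} \quad & i^2 t \le \mathcal{E}, \quad i \ge 1 + \epsilon, \quad t \ge 0.
\end{aligned} \tag{9}$$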
Note that the objective function (i − 1)t is not concave.
However, we can readily obtain the optimal i∗ and t∗ as
follows.
Lemma 4: The optimized current and duration for single bit
write are $i^* = 2$ and $t^* = \frac{\mathcal{E}}{4}$, respectively. The corresponding
write failure probability is given by
$$p(i^*, t^*) = c \exp\left(-\frac{\mathcal{E}}{2}\right). \tag{10}$$
Proof: The proof is given in Appendix A.
Note that the write failure probability is an exponentially
decaying function of E .
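As a numerical sanity check of Lemma 4, the sketch below searches a grid of currents on the energy boundary $t = \mathcal{E}/i^2$ and reports the maximizer of $(i - 1)t$. The grid resolution, the search range, and the function name are assumptions for illustration; the grid search only stands in for the analytic argument of Appendix A.

```python
def best_single_bit_pulse(budget, steps_per_unit=100, i_max=8):
    """Grid search over i in (1, i_max] with t = budget / i^2 (energy boundary)."""
    candidates = [k / steps_per_unit
                  for k in range(steps_per_unit + 1, i_max * steps_per_unit + 1)]
    # Objective of (9) restricted to the boundary: (i - 1) * budget / i^2.
    i_star = max(candidates, key=lambda i: (i - 1.0) * budget / i ** 2)
    return i_star, budget / i_star ** 2


i_star, t_star = best_single_bit_pulse(40.0)
print(i_star, t_star)
```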
B. Optimized Parameters for B-bit Word Writes
We formulate an optimization problem to determine the
currents and durations. For a given write energy constraint,
we seek to minimize MSE as follows.
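$$\begin{aligned}
\underset{\mathbf{i},\,\mathbf{t}}{\text{minimize}} \quad & \sum_{b=0}^{B-1} 4^b \exp\left(-2(i_b - 1)t_b\right) \\
\text{subject to} \quad & \sum_{b=0}^{B-1} i_b^2 t_b \le \mathcal{E}, \\
& i_b \ge 1 + \epsilon, \quad t_b \ge 0, \quad b = 0, \ldots, B - 1.
\end{aligned} \tag{11}$$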
We may include additional constraints such as L(t) ≤ δ
to guarantee a required write speed performance. Note that
L(t) ≤ δ is a convex constraint.
Although the optimization problem (11) is not convex, we
show that (11) is a biconvex optimization problem. Hence, we
can find suboptimal solutions via effective algorithms such as
alternate convex search (ACS) [19].
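Definition 5 (Biconvex Set [19]): Let $S \subseteq X \times Y$, where
$X \subseteq \mathbb{R}^n$ and $Y \subseteq \mathbb{R}^m$ denote two non-empty and convex
sets. The set $S$ is defined as a biconvex set on $X \times Y$ if, for
every fixed $\mathbf{x} \in X$, $S_{\mathbf{x}} \triangleq \{\mathbf{y} \in Y \mid (\mathbf{x}, \mathbf{y}) \in S\}$ is a convex set
in $Y$ and, for every fixed $\mathbf{y} \in Y$, $S_{\mathbf{y}} \triangleq \{\mathbf{x} \in X \mid (\mathbf{x}, \mathbf{y}) \in S\}$
is a convex set in $X$.
Definition 6 (Biconvex Function [19]): A function $f : S \to \mathbb{R}$
is defined as a biconvex function on $S$ if, for every fixed
$\mathbf{x} \in X$, $f_{\mathbf{x}}(\cdot) = f(\mathbf{x}, \cdot) : S_{\mathbf{x}} \to \mathbb{R}$ is a convex function on
$S_{\mathbf{x}}$ and, for every fixed $\mathbf{y} \in Y$, $f_{\mathbf{y}}(\cdot) = f(\cdot, \mathbf{y}) : S_{\mathbf{y}} \to \mathbb{R}$ is a
convex function on $S_{\mathbf{y}}$.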
Definition 7 (Biconvex Problem [19]): An optimization
problem of the following form:
minimize {f(x,y) | (x,y) ∈ S} (12)
is defined as a biconvex problem, if the feasible set S is
biconvex on X × Y and the objective function f is biconvex
on S.
Theorem 8: The optimization problem (11) is biconvex.
Proof: First, we show that the constraint $\sum_{b=0}^{B-1} i_b^2 t_b \le \mathcal{E}$ defines a biconvex
set. Note that $i_b^2 t_b$ is a convex function of $i_b$ for every fixed
$t_b \ge 0$. In addition, $i_b^2 t_b$ is a convex (in fact, linear) function of $t_b$ for every fixed
$i_b \ge 1 + \epsilon$. Hence, $\sum_{b=0}^{B-1} i_b^2 t_b \le \mathcal{E}$ defines a biconvex set.
It is clear that $\exp(-2(i_b - 1)t_b)$ is a biconvex function of
$i_b$ and $t_b$. Since the positive weight $4^b$ preserves convexity, the
objective function is biconvex.
Since (11) is a biconvex problem, ACS can effectively
find a suboptimal solution [19], [20]. It alternately updates
the variables by fixing one of them and solving the corresponding
convex optimization problem. We propose Algorithm 1 to
optimize the write current i and the write duration t of the
biconvex optimization problem (11) by using ACS.
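Algorithm 1 ACS algorithm to solve (11)
1: Choose a starting point $\mathbf{i}^{(0)}$ from the feasible set $S$ and
set $k = 0$.
2: For fixed $\mathbf{i}^{(k)}$, find $\mathbf{t}^{(k+1)}$ by solving the following convex
problem:
$$\begin{aligned}
\underset{\mathbf{t}}{\text{minimize}} \quad & \sum_{b=0}^{B-1} 4^b \exp\left(-2\left(i_b^{(k)} - 1\right)t_b\right) \\
\text{subject to} \quad & \sum_{b=0}^{B-1} \left(i_b^{(k)}\right)^2 t_b \le \mathcal{E}, \\
& t_b \ge 0, \quad b = 0, \ldots, B - 1.
\end{aligned} \tag{13}$$
3: For fixed $\mathbf{t}^{(k+1)}$, find $\mathbf{i}^{(k+1)}$ by solving the following
convex problem:
$$\begin{aligned}
\underset{\mathbf{i}}{\text{minimize}} \quad & \sum_{b=0}^{B-1} 4^b \exp\left(-2(i_b - 1)t_b^{(k+1)}\right) \\
\text{subject to} \quad & \sum_{b=0}^{B-1} i_b^2 t_b^{(k+1)} \le \mathcal{E}, \\
& i_b \ge 1 + \epsilon, \quad b = 0, \ldots, B - 1.
\end{aligned} \tag{14}$$
4: If the point $(\mathbf{i}^{(k+1)}, \mathbf{t}^{(k+1)})$ satisfies a stopping criterion,
then stop. Otherwise, set $k := k + 1$ and go back to line
2.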
Remark 9 (Starting Point): Since biconvex optimization
problems may have a large number of local minima [19], a
starting point i(0) can affect the final solution. We can choose
i(0) = (2, . . . , 2) as a starting point, which minimizes the
uniform write failure probability (see Lemma 4). In Corol-
lary 16, we show that this starting point guarantees the fastest
convergence.
Remark 10 (Stopping Criterion [19]): There are several
ways to define the stopping criterion in Algorithm 1. For
example, we can consider the absolute values of the differences
between (i(k), t(k)) and (i(k+1), t(k+1)) or the difference be-
tween MSE(i(k), t(k)) and MSE(i(k+1), t(k+1)). Alternatively,
we can set a maximum number of iterations.
V. ANALYSIS OF ALTERNATE CONVEX SEARCH FOR
MRAM WRITE PARAMETERS
A. Optimal Solutions for Each Iteration
In this subsection, we present the optimal solutions for
(13) and (14). Since these problems are convex, we exploit
the structure of the problems to derive the optimal solutions
analytically using the KKT conditions.
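Theorem 11: For fixed $\mathbf{i}^{(k)} = \mathbf{i}$, the optimal $\mathbf{t}^{(k+1)} = \mathbf{t}^*$ of
(13) is given by
$$t_b^* = \begin{cases} 0, & \text{if } \nu \ge \frac{2 \cdot 4^b (i_b - 1)}{i_b^2}; \\ \frac{1}{2(i_b - 1)} \log\left(\frac{1}{\nu} \cdot \frac{2 \cdot 4^b (i_b - 1)}{i_b^2}\right), & \text{otherwise,} \end{cases} \tag{15}$$
where $\nu$ is a dual variable of the corresponding KKT conditions.
Note that $\nu$ depends on the energy budget $\mathcal{E}$.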
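Proof: We define the Lagrangian $L_1(\mathbf{t}, \nu, \boldsymbol{\lambda})$ associated
with problem (13) as
$$L_1(\mathbf{t}, \nu, \boldsymbol{\lambda}) = \sum_{b=0}^{B-1} 4^b \exp(-2(i_b - 1)t_b) + \nu \left(\sum_{b=0}^{B-1} i_b^2 t_b - \mathcal{E}\right) - \sum_{b=0}^{B-1} \lambda_b t_b, \tag{16}$$
where $\nu$ and $\boldsymbol{\lambda} = (\lambda_0, \ldots, \lambda_{B-1})$ are the dual variables. The
details of the proof are given in Appendix B.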
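The water-filling form of (15) leaves the dual variable $\nu$ implicit. One way to obtain it numerically is to bisect on $\nu$ until the energy constraint is met, which works because every $t_b^*$ in (15) is non-increasing in $\nu$. The sketch below is a minimal illustration under that assumption; the bisection scheme, the function name `update_t`, and the example budget are not taken from the paper.

```python
import math


def update_t(i, budget, iters=200):
    """Solve (13) for fixed currents i (all i_b > 1) via bisection on nu."""
    # Thresholds a_b = 2 * 4^b * (i_b - 1) / i_b^2 appearing in (15).
    a = [2.0 * 4.0 ** b * (ib - 1.0) / ib ** 2 for b, ib in enumerate(i)]

    def durations(nu):
        # Eq. (15): zero when nu >= a_b, logarithmic otherwise.
        return [max(0.0, math.log(ab / nu) / (2.0 * (ib - 1.0)))
                for ab, ib in zip(a, i)]

    def used(nu):
        return sum(ib ** 2 * tb for ib, tb in zip(i, durations(nu)))

    hi = max(a)               # at nu = max(a) every duration is zero
    lo = hi
    while used(lo) < budget:  # the consumed energy grows as nu shrinks
        lo *= 0.5
    for _ in range(iters):    # bisection on a logarithmic scale
        mid = math.sqrt(lo * hi)
        if used(mid) > budget:
            lo = mid
        else:
            hi = mid
    return durations(hi), hi


# Hypothetical example: B = 8, all currents equal to 2, energy budget 300.
t_opt, nu = update_t([2.0] * 8, 300.0)
print(t_opt, nu)
```

Returning the durations at the upper end of the final bracket keeps the consumed energy at or just below the budget.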
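Theorem 12: For fixed $\mathbf{t}^{(k+1)} = \mathbf{t}$, the optimal $\mathbf{i}^{(k+1)} = \mathbf{i}^*$
of (14) is given by
$$i_b^* = \begin{cases} 1 + \epsilon, & \text{if } \nu' \ge \frac{4^b}{1 + \epsilon} e^{-2 t_b \epsilon}; \\ \frac{1}{2 t_b} W\left(\frac{2 \cdot 4^b t_b e^{2 t_b}}{\nu'}\right), & \text{otherwise,} \end{cases} \tag{17}$$
where $\nu'$ is a dual variable. Also, $W(\cdot)$ denotes the Lambert
$W$ function (i.e., the inverse function of $f(x) = x e^x$) [23].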
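Proof: We define the Lagrangian $L_2(\mathbf{i}, \nu', \boldsymbol{\lambda}')$ associated
with problem (14) as
$$L_2(\mathbf{i}, \nu', \boldsymbol{\lambda}') = \sum_{b=0}^{B-1} 4^b e^{-2(i_b - 1)t_b} + \nu' \left(\sum_{b=0}^{B-1} i_b^2 t_b - \mathcal{E}\right) - \sum_{b=0}^{B-1} \lambda_b' \{i_b - (1 + \epsilon)\}, \tag{18}$$
where $\nu'$ and $\boldsymbol{\lambda}' = (\lambda_0', \ldots, \lambda_{B-1}')$ are the dual variables. The
details of the proof are given in Appendix C.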
Remark 13: The solutions of (15) and (17) can be interpreted
as water-filling. Each bit position can be regarded as an
individual channel among B parallel channels as in [15], [16].
The ground levels depend on the importance of bit positions;
hence, a larger current or a longer duration is assigned to more
significant bit positions.
B. Convergence of MSE
We show that Algorithm 1 guarantees convergence to a
locally optimal MSE. The converged MSE depends on the
starting point.
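Lemma 14: The sequence $\{\mathrm{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)})\}_{k \in \mathbb{N}}$ obtained
by Algorithm 1 is monotonically decreasing, i.e.,
$\mathrm{MSE}(\mathbf{i}^{(k+1)}, \mathbf{t}^{(k+1)}) \le \mathrm{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)})$ for all $k \in \mathbb{N}$.
Proof: Note that $\mathrm{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k+1)}) \le \mathrm{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)})$
and $\mathrm{MSE}(\mathbf{i}^{(k+1)}, \mathbf{t}^{(k+1)}) \le \mathrm{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k+1)})$ because of
(13) and (14), respectively. Hence, $\mathrm{MSE}(\mathbf{i}^{(k+1)}, \mathbf{t}^{(k+1)}) \le \mathrm{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)})$.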
Theorem 15: The sequence $\{\mathrm{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)})\}_{k \in \mathbb{N}}$ obtained
by Algorithm 1 converges monotonically.
Proof: It is clear that $\mathrm{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)}) \ge 0$ for all $k \in \mathbb{N}$ by
(2) and (7). Since $\{\mathrm{MSE}(\mathbf{i}^{(k)}, \mathbf{t}^{(k)})\}_{k \in \mathbb{N}}$ is monotonically
decreasing by Lemma 14 and bounded below, it converges
by the monotone convergence theorem.
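Corollary 16: By setting $\mathbf{i}^{(0)} = (2, \ldots, 2)$, we obtain
$$\lim_{k \to \infty} \left(\mathbf{i}^{(k)}, \mathbf{t}^{(k)}\right) = \left(\mathbf{i}^{(0)}, \mathbf{t}^{(1)}\right), \tag{19}$$
if $t_b^{(1)} \ne 0$ for all $b \in [0, B - 1]$.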
Proof: We will show that (i(0), t(1)) (i.e., the solution of
(13)) satisfies the KKT conditions of (14). Then, i(1) = i(0),
which makes Algorithm 1 converge in one step. The details
of the proof are given in Appendix D.
Corollary 16 means that the starting point i(0) = (2, . . . , 2)
guarantees the fastest convergence. Note that we do not need
to solve (14).
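The following Python sketch puts Algorithm 1, Theorem 11, and Theorem 12 together and can be used to observe the behavior stated in Corollary 16. It is a minimal illustration under several assumptions that are not from the paper: both dual variables are found by bisection, the Lambert $W$ function is evaluated by a hand-written Newton iteration on $w + \log w = \log x$ rather than a library routine, $\epsilon = 10^{-3}$, and bits with $t_b = 0$ simply keep their previous current.

```python
import math

C = 60.0 * math.pi ** 2 / 4.0  # c of Eq. (2) with Delta = 60
EPS = 1e-3                     # epsilon of (11); this value is an assumption


def mse(i, t):
    return C * sum(4.0 ** b * math.exp(-2.0 * (ib - 1.0) * tb)
                   for b, (ib, tb) in enumerate(zip(i, t)))


def energy(i, t):
    return sum(ib ** 2 * tb for ib, tb in zip(i, t))


def solve_dual(values, budget, hi, iters=200):
    """Log-scale bisection on a dual variable; energy falls as it grows."""
    lo = hi
    while energy(*values(lo)) < budget:
        lo *= 0.5
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if energy(*values(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return values(hi)


def update_t(i, budget):
    """Eq. (15): durations for fixed currents (all i_b > 1)."""
    a = [2.0 * 4.0 ** b * (ib - 1.0) / ib ** 2 for b, ib in enumerate(i)]

    def values(nu):
        t = [max(0.0, math.log(ab / nu) / (2.0 * (ib - 1.0)))
             for ab, ib in zip(a, i)]
        return i, t

    return solve_dual(values, budget, max(a))[1]


def lambert_w(log_x, iters=50):
    """Principal branch W(x), x > 0, from log(x): Newton on w + log(w) = log(x)."""
    w = log_x if log_x > 1.0 else math.log1p(math.exp(log_x))
    for _ in range(iters):
        w = w * (1.0 + log_x - math.log(w)) / (1.0 + w)
    return w


def update_i(t, budget, i_prev):
    """Eq. (17): currents for fixed durations; bits with t_b = 0 keep i_prev."""
    def values(nu):
        i = []
        for b, (tb, ib_old) in enumerate(zip(t, i_prev)):
            if tb <= 0.0:
                i.append(ib_old)
                continue
            log_x = (math.log(2.0 * tb) + b * math.log(4.0)
                     + 2.0 * tb - math.log(nu))
            i.append(max(1.0 + EPS, lambert_w(log_x) / (2.0 * tb)))
        return i, t

    # For nu' >= 4^B every active bit is clamped to 1 + EPS.
    return solve_dual(values, budget, 4.0 ** len(t))[0]


def acs(i0, budget, max_iter=100, tol=1e-12):
    """Algorithm 1 with a relative-MSE-change stopping criterion."""
    i, history = list(i0), []
    for _ in range(max_iter):
        t = update_t(i, budget)
        i = update_i(t, budget, i)
        history.append(mse(i, t))
        if len(history) > 1 and history[-2] - history[-1] <= tol * history[-1]:
            break
    return i, t, history


i_fin, t_fin, hist = acs([2.0] * 8, 300.0)
print([round(ib, 6) for ib in i_fin], [round(tb, 4) for tb in t_fin], len(hist))
```

With the starting point $(2, \ldots, 2)$, the current update is expected to leave the currents unchanged up to numerical precision. Other feasible starting points with $i_b > 1$ can be passed to `acs` to explore slower convergence.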
C. Starting Point of i(0) = (2, . . . , 2)
The starting point i(0) = (2, . . . , 2) guarantees the fastest
convergence. Note that it minimizes the write failure probabil-
ity for the single bit case (see Lemma 4). In this subsection,
we show that i(0) = (2, . . . , 2) is a good starting point, in the
sense that it reduces the MSE exponentially with B.
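Suppose that the starting point is $\mathbf{i}^{(0)} = (2, \ldots, 2)$. By
Theorem 11 and Corollary 16, Algorithm 1 provides the
following optimized write durations $\mathbf{t}^{(1)} = \tilde{\mathbf{t}} = (\tilde{t}_0, \ldots, \tilde{t}_{B-1})$,
where
$$\tilde{t}_b = \begin{cases} 0, & \text{if } \nu \ge \frac{4^b}{2}; \\ \frac{1}{2} \log\left(\frac{1}{\nu} \cdot \frac{4^b}{2}\right), & \text{otherwise.} \end{cases} \tag{20}$$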
Lemma 17: If $\mathcal{E} > 2B(B - 1) \log 2$, then $\tilde{t}_b > 0$ for all
$b \in [0, B - 1]$ and
$$\tilde{t}_b = \frac{\mathcal{E}}{4B} + \left(b - \frac{B - 1}{2}\right) \cdot \log 2. \tag{21}$$
Proof: The proof is given in Appendix E.
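The closed form (21) is straightforward to evaluate. The sketch below computes the durations for a given budget and word length and checks the condition of Lemma 17; the function name and the example values are illustrative assumptions.

```python
import math


def optimized_durations(budget, num_bits):
    """Durations of Eq. (21) for currents i_b = 2; index 0 is the LSB."""
    threshold = 2 * num_bits * (num_bits - 1) * math.log(2)
    if budget <= threshold:
        raise ValueError("Lemma 17 requires budget > 2B(B-1) log 2")
    mid = (num_bits - 1) / 2
    return [budget / (4 * num_bits) + (b - mid) * math.log(2)
            for b in range(num_bits)]


t_tilde = optimized_durations(300.0, 8)
print(t_tilde)
# With i_b = 2 the consumed energy is sum(4 * t_b), which should equal the budget.
print(sum(4 * tb for tb in t_tilde))
```

Each step toward the MSB lengthens the pulse by $\log 2$, while the average duration stays at $\mathcal{E}/(4B)$.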
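Theorem 18: If $\mathcal{E} > 2B(B - 1) \log 2$, then the MSE
reduction ratio by Algorithm 1 is given by
$$\gamma = \frac{\mathrm{MSE}\left(\mathbf{i}^{(0)}, \tilde{\mathbf{t}}\right)}{\mathrm{MSE}\left(\mathbf{i}^{(0)}, \mathbf{t}^{(0)}\right)} = \frac{3B}{2} \cdot \frac{2^B}{4^B - 1} \approx \frac{3B}{2} \cdot 2^{-B}, \tag{22}$$
where $\mathrm{MSE}(\mathbf{i}^{(0)}, \tilde{\mathbf{t}})$ (i.e., the optimized MSE by Algorithm 1)
is given by
$$\mathrm{MSE}\left(\mathbf{i}^{(0)}, \tilde{\mathbf{t}}\right) = c \cdot \frac{B}{2} \cdot 2^B \exp\left(-\frac{\mathcal{E}}{2B}\right), \tag{23}$$
where the optimized $\tilde{\mathbf{t}}$ is given by (20). In addition,
$\mathrm{MSE}(\mathbf{i}^{(0)}, \mathbf{t}^{(0)})$ (i.e., the MSE by uniform energy allocation)
is given by
$$\mathrm{MSE}\left(\mathbf{i}^{(0)}, \mathbf{t}^{(0)}\right) = c \cdot \frac{4^B - 1}{3} \exp\left(-\frac{\mathcal{E}}{2B}\right), \tag{24}$$
where $\mathbf{t}^{(0)}$ is the uniform value to satisfy the energy constraint
(i.e., $\mathbf{t}^{(0)} = \frac{\mathcal{E}}{4B} \cdot (1, \ldots, 1)$).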
Proof: The proof is given in Appendix E.
Note that $\mathrm{MSE}(\mathbf{i}^{(0)}, \mathbf{t}^{(0)})$ is the MSE corresponding to
the parameters minimizing the write failure probability (see
Lemma 4).
Remark 19: By setting $\mathbf{i}^{(0)} = (2, \ldots, 2)$, Algorithm 1
reduces the MSE exponentially with $B$, compared to the
parameters optimized for write failure probability. Although
we cannot guarantee that $(\mathbf{i}^{(0)}, \mathbf{t}^{(1)} = \tilde{\mathbf{t}})$ is globally optimal,
$(\mathbf{i}^{(0)}, \mathbf{t}^{(1)})$ decreases the MSE exponentially by solving (13)
once (see Corollary 16). Furthermore, the solution of (13) can
be easily computed by Lemma 17.
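The reduction ratio of (22) can be evaluated directly, and it can also be cross-checked by inserting the durations of (21) and the uniform durations into (7). The sketch below does both; the function names and the example budget are assumptions for illustration, and the constant $c$ cancels in the ratio.

```python
import math


def reduction_ratio(num_bits):
    """Closed-form gamma of (22) and its approximation (3B/2) * 2^(-B)."""
    exact = 1.5 * num_bits * 2 ** num_bits / (4 ** num_bits - 1)
    approx = 1.5 * num_bits * 2.0 ** (-num_bits)
    return exact, approx


def reduction_ratio_direct(budget, num_bits):
    """Ratio of (7) at i_b = 2: durations of (21) versus uniform durations."""
    mid = (num_bits - 1) / 2
    t_opt = [budget / (4 * num_bits) + (b - mid) * math.log(2)
             for b in range(num_bits)]
    t_uni = budget / (4 * num_bits)
    opt = sum(4 ** b * math.exp(-2 * tb) for b, tb in enumerate(t_opt))
    uni = sum(4 ** b * math.exp(-2 * t_uni) for b in range(num_bits))
    return opt / uni  # the constant c of (7) cancels


for bits in (8, 16, 32):
    print(bits, reduction_ratio(bits))
print(reduction_ratio_direct(300.0, 8))
```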
[Figure: write failure probability $p$ (log scale, $10^{-10}$ to $10^{0}$) versus normalized current $i$ (1 to 8) for energy budgets $\mathcal{E}$ = 25, 30, 35, 40, 45, 50.]
Fig. 4. Normalized write current to minimize the write failure probability
(see Lemma 4) for several energy constraints.
VI. NUMERICAL RESULTS
We evaluate the solutions to optimize the write failure
probability for single bits as well as the MSE for B-bit words.
The critical current $I_c$ and the characteristic relaxation time
$T_c$ do not affect the numerical results because the normalized
values $i = I/I_c$ and $t = T/T_c$ are considered. As in [4], we set
$\Delta = 60$ for the thermal stability factor.
Fig. 4 shows that $i^* = 2$ and $t^* = \frac{\mathcal{E}}{4}$ minimize the write
failure probability as proved in Lemma 4. The corresponding
minimal write failure probability decreases exponentially with
the write energy as shown in (10).
Fig. 5 shows numerical results by solving (11). Fig. 5(a)
compares the MSEs of uniform write energy allocation and the
optimized energy allocation by Algorithm 1. We set a starting
point $\mathbf{i}^{(0)} = (2, \ldots, 2)$. As shown in Theorem 18, the MSE
reduction ratio is $\gamma \approx \frac{3B}{2} \cdot 2^{-B} = 0.0469$ for $B = 8$. Fig. 5(b)
compares the peak signal-to-noise ratio (PSNR), which is a
widely used fidelity metric for image and video quality. The
PSNR depends on the MSE as $\mathrm{PSNR} = 10 \log_{10} \frac{(2^B - 1)^2}{\mathrm{MSE}}$. At
PSNR = 40 dB, the optimized write energy allocation can
reduce the write energy by 24%.
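The PSNR relation can be combined with the closed forms (23) and (24) to compare the two allocations at a given budget. The sketch below is an illustration under the approximation (2) with $\Delta = 60$; the function names and sample budgets are assumptions, and the values it prints need not coincide exactly with the curves of Fig. 5.

```python
import math

C = 60.0 * math.pi ** 2 / 4.0  # c of Eq. (2) with Delta = 60


def psnr_db(mse, num_bits):
    """PSNR in dB for B-bit words with the given MSE."""
    return 10.0 * math.log10((2 ** num_bits - 1) ** 2 / mse)


def mse_uniform(budget, num_bits):
    """Eq. (24): uniform energy allocation with i_b = 2."""
    return C * (4 ** num_bits - 1) / 3.0 * math.exp(-budget / (2 * num_bits))


def mse_optimized(budget, num_bits):
    """Eq. (23): valid only when budget > 2B(B-1) log 2 (Lemma 17)."""
    if budget <= 2 * num_bits * (num_bits - 1) * math.log(2):
        raise ValueError("budget too small for the closed form (23)")
    return C * num_bits * 2 ** (num_bits - 1) * math.exp(-budget / (2 * num_bits))


B = 8
for per_bit in (15.0, 20.0, 25.0):  # normalized write energy per bit
    budget = per_bit * B
    print(per_bit,
          psnr_db(mse_uniform(budget, B), B),
          psnr_db(mse_optimized(budget, B), B))
```

Under these closed forms the PSNR gap between the two allocations equals $10 \log_{10}(1/\gamma)$ and does not depend on the budget.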
Fig. 6 shows that the MSE reduction ratio improves
exponentially with $B$ (as derived in Theorem 18). Although we
cannot guarantee the optimality, the proposed Algorithm 1 is
very effective at reducing the MSE. Note that $\gamma = 3.66 \times 10^{-4}$
for $B = 16$ and $\gamma = 1.12 \times 10^{-8}$ for $B = 32$.
Fig. 7 characterizes the convergence of Algorithm 1. The
convergence speed depends on the starting point i(0). For both
i(0) = (1, . . . , 1) and i(0) = (2, . . . , 2), Algorithm 1 con-
verges; however, the convergence speed of i(0) = (1, . . . , 1)
is slower than that of i(0) = (2, . . . , 2). As shown in Corol-
lary 16, the starting point i(0) = (2, . . . , 2) guarantees the
fastest convergence (see Fig. 7(c) and (d)). Fig. 8 compares
the MSEs of i(0) = (1, . . . , 1) and i(0) = (2, . . . , 2). We
observe that the MSE for i(0) = (2, . . . , 2) is better than that
for i(0) = (1, . . . , 1). The gap between these two MSEs
vanishes as the iterations progress.
[Figure: (a) MSE (log scale, $10^{-4}$ to $10^{4}$) and (b) PSNR [dB] (0 to 90) versus normalized write energy per bit $\mathcal{E}/B$ (10 to 40), each for uniform energy allocation and optimized energy allocation.]
Fig. 5. Comparison of the conventional uniform energy allocation and the
optimized energy allocation by Algorithm 1 (B = 8): (a) MSE and (b) PSNR.
[Figure: MSE reduction ratio (log scale, $10^{-8}$ to $10^{0}$) versus bits per word $B$ (1, 2, 4, 8, 16, 32).]
Fig. 6. The MSE reduction ratio γ by Theorem 18.
[Figure: four panels over iterations 0 to 60, showing the normalized currents $i_0, \ldots, i_7$ in (a) and (c) and the normalized durations $t_0, \ldots, t_7$ in (b) and (d).]
Fig. 7. Convergence of Algorithm 1 ($B = 8$ and $\mathcal{E} = 300$): (a) $\mathbf{i}^*$ for $\mathbf{i}^{(0)} = (1, \ldots, 1)$, (b) $\mathbf{t}^*$ for $\mathbf{i}^{(0)} = (1, \ldots, 1)$, (c) $\mathbf{i}^*$ for $\mathbf{i}^{(0)} = (2, \ldots, 2)$, and (d)
$\mathbf{t}^*$ for $\mathbf{i}^{(0)} = (2, \ldots, 2)$.
[Figure: MSE (in units of $10^{-4}$, 5 to 10) versus iteration (0 to 60) for $\mathbf{i}^{(0)} = (1, \ldots, 1)$ and $\mathbf{i}^{(0)} = (2, \ldots, 2)$.]
Fig. 8. The MSE comparison for $\mathbf{i}^{(0)} = (1, \ldots, 1)$ and $\mathbf{i}^{(0)} = (2, \ldots, 2)$
($B = 8$ and $\mathcal{E} = 300$).
VII. CONCLUSION
We proposed an information-theoretic approach to improving
MRAM’s write energy efficiency. After formulating the
biconvex optimization problem, we proposed the iterative
algorithm to solve the biconvex problem, which attempts to
minimize the MSE under a write energy budget. Also, we
proved that the proposed algorithm converges and that it can
reduce the MSE exponentially. The proposed optimization
scheme can be extended in future work to coded information
representations, where redundancy is added to the written
values to further improve the fidelity.
APPENDIX A
PROOF OF LEMMA 4
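Since $(i - 1)t$ increases with $t$ for $i > 1$, the optimal $i^*$ and $t^*$ satisfy
$i^2 t = \mathcal{E}$. Then, we can set $t = \frac{\mathcal{E}}{i^2}$ and the corresponding objective
function is given by
$$g(i) = (i - 1)t = \mathcal{E} \cdot \frac{i - 1}{i^2}. \tag{25}$$
Since $g'(i) = \mathcal{E} \cdot \frac{2 - i}{i^3}$, we have $g'(i) > 0$ for $1 < i < 2$, $g'(2) = 0$, and $g'(i) < 0$ for $i > 2$.
Hence, $g(i)$ is maximized when $i^* = 2$ and $t^* = \frac{\mathcal{E}}{4}$.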
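APPENDIX B
PROOF OF THEOREM 11
The corresponding KKT conditions are as follows:
$$\sum_{b=0}^{B-1} i_b^2 t_b \le \mathcal{E}, \quad \nu \ge 0, \quad \nu \cdot \left(\sum_{b=0}^{B-1} i_b^2 t_b - \mathcal{E}\right) = 0, \tag{26}$$
$$t_b \ge 0, \quad \lambda_b \ge 0, \quad \lambda_b t_b = 0, \tag{27}$$
$$\frac{\partial L_1}{\partial t_b} = -2 \cdot 4^b (i_b - 1) e^{-2(i_b - 1)t_b} + \nu i_b^2 - \lambda_b = 0 \tag{28}$$
for $b \in [0, B - 1]$. From (28), $\lambda_b$ is given by
$$\lambda_b = i_b^2 \left(\nu - \frac{2 \cdot 4^b (i_b - 1)}{i_b^2} \cdot e^{-2(i_b - 1)t_b}\right). \tag{29}$$
Suppose that $\nu = 0$. Then $\lambda_b < 0$ because $i_b \ge 1 + \epsilon$, which
violates the condition $\lambda_b \ge 0$. Hence, $\nu \ne 0$ and
$$\sum_{b=0}^{B-1} i_b^2 t_b = \mathcal{E}. \tag{30}$$
From (27) and (29),
$$\lambda_b t_b = i_b^2 \cdot t_b \left\{\nu - \frac{2 \cdot 4^b (i_b - 1)}{i_b^2} \cdot e^{-2(i_b - 1)t_b}\right\} = 0. \tag{31}$$
Because $\lambda_b \ge 0$ and $i_b \ge 1 + \epsilon$, we obtain
$$\nu \ge \frac{2 \cdot 4^b (i_b - 1)}{i_b^2} \cdot e^{-2(i_b - 1)t_b}. \tag{32}$$
If $\nu \ge \frac{2 \cdot 4^b (i_b - 1)}{i_b^2}$, then $t_b = 0$. Otherwise, $t_b > 0$
and $\lambda_b = 0$ by (27), so (29) gives $\nu = \frac{2 \cdot 4^b (i_b - 1)}{i_b^2} \cdot e^{-2(i_b - 1)t_b}$. Because
$e^{-2(i_b - 1)t_b} < 1$ for $t_b > 0$, this implies $\nu < \frac{2 \cdot 4^b (i_b - 1)}{i_b^2}$, which contradicts
$\nu \ge \frac{2 \cdot 4^b (i_b - 1)}{i_b^2}$.
If $\nu < \frac{2 \cdot 4^b (i_b - 1)}{i_b^2}$, then $t_b = 0$ is not allowed because of
(32). Hence, $t_b > 0$ and $\lambda_b = 0$. By (29) and $i_b \ge 1 + \epsilon$,
$$\nu = \frac{2 \cdot 4^b (i_b - 1)}{i_b^2} \cdot e^{-2(i_b - 1)t_b}, \tag{33}$$
which results in
$$t_b^* = \frac{1}{2(i_b - 1)} \log\left(\frac{1}{\nu} \cdot \frac{2 \cdot 4^b (i_b - 1)}{i_b^2}\right). \tag{34}$$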
APPENDIX C
PROOF OF THEOREM 12
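The corresponding KKT conditions are as follows:
$$\sum_{b=0}^{B-1} i_b^2 t_b \le \mathcal{E}, \quad \nu' \ge 0, \quad \nu' \cdot \left(\sum_{b=0}^{B-1} i_b^2 t_b - \mathcal{E}\right) = 0, \tag{35}$$
$$i_b \ge 1 + \epsilon, \quad \lambda_b' \ge 0, \quad \lambda_b' \{i_b - (1 + \epsilon)\} = 0, \tag{36}$$
$$\frac{\partial L_2}{\partial i_b} = -2 \cdot 4^b t_b e^{-2(i_b - 1)t_b} + 2 \nu' t_b i_b - \lambda_b' = 0 \tag{37}$$
for $b \in [0, B - 1]$. From (37), $\lambda_b'$ is given by
$$\lambda_b' = 2 t_b i_b \left(\nu' - \frac{4^b e^{-2 t_b (i_b - 1)}}{i_b}\right). \tag{38}$$
Suppose that $\nu' = 0$. Then, $\lambda_b' = -2 t_b \cdot 4^b e^{-2 t_b (i_b - 1)} \le 0$,
which is compatible with $\lambda_b' \ge 0$ only if $t_b = 0$ for all $b \in [0, B - 1]$. Since this
is a trivial case, we focus on $\nu' \ne 0$ and $\sum_{b=0}^{B-1} i_b^2 t_b = \mathcal{E}$.
If $t_b = 0$, then the corresponding $i_b$ affects neither the MSE
nor the energy. Hence, we suppose that $t_b \ne 0$. If $\lambda_b' = 0$, then
$$\nu' = \frac{4^b e^{-2 t_b (i_b - 1)}}{i_b}, \tag{39}$$
which is equivalent to
$$\frac{2 \cdot 4^b t_b e^{2 t_b}}{\nu'} = 2 t_b i_b e^{2 t_b i_b} = z e^z, \tag{40}$$
where $z = 2 t_b i_b$. Then, $z = 2 t_b i_b = W\left(\frac{2 \cdot 4^b t_b e^{2 t_b}}{\nu'}\right)$, where
$W(\cdot)$ denotes the Lambert $W$ function [23]. Hence,
$$i_b^* = \frac{1}{2 t_b} W\left(\frac{2 \cdot 4^b t_b e^{2 t_b}}{\nu'}\right). \tag{41}$$
Note that $i_b^* = 1 + \epsilon$ for $\nu' = \frac{4^b e^{-2 t_b \epsilon}}{1 + \epsilon}$ because $W(z e^z) = z$.
Let $g(i_b) = \frac{4^b e^{-2 t_b (i_b - 1)}}{i_b}$ denote the second term in the parentheses of (38). Because
$\frac{d g(i_b)}{d i_b} < 0$, $g(i_b)$ is a monotonically decreasing function. If
$\nu' > \frac{4^b e^{-2 t_b \epsilon}}{1 + \epsilon} = g(1 + \epsilon)$, then $\lambda_b' \ne 0$ and $i_b^* = 1 + \epsilon$ by (36).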
APPENDIX D
PROOF OF COROLLARY 16
If $i_b^{(0)} = 2$ and $t_b^{(1)} > 0$ for all $b \in [0, B - 1]$, then we
obtain the following solution by solving (13):
$$t_b^{(1)} = \frac{1}{2} \log\left(\frac{4^b}{2 \nu}\right), \tag{42}$$
which follows from (15). We will show that $(\mathbf{i}^{(0)}, \mathbf{t}^{(1)})$ also
satisfies all the KKT conditions for (14) (i.e., (35)–(37) in Appendix C).
First, $(\mathbf{i}^{(0)}, \mathbf{t}^{(1)})$ satisfies (35), which is equivalent
to (26). In addition, $i_b^{(0)} = 2$ satisfies (36) and makes $\lambda_b' = 0$
for all $b \in [0, B - 1]$. Then, (37) becomes
$$-2 \cdot 4^b t_b^{(1)} e^{-2 t_b^{(1)}} + 4 \nu' t_b^{(1)} = 0. \tag{43}$$
Suppose that $\mathbf{i}^{(1)} = \mathbf{i}^{(0)}$. Then, (39) becomes $\nu' = \frac{4^b}{2} e^{-2 t_b^{(1)}}$,
which satisfies (43) and, by (42), equals $\nu$ for every $b$. Thus, $(\mathbf{i}^{(0)}, \mathbf{t}^{(1)})$ satisfies all
the KKT conditions of (13) and (14).
APPENDIX E
PROOF OF LEMMA 17 AND THEOREM 18
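From (20), we observe that $\tilde{t}_b > 0$ for all $b \in [0, B - 1]$ if
$\nu < \frac{1}{2}$. By (30), $(\mathbf{i}^{(0)}, \tilde{\mathbf{t}})$ satisfies
$$\sum_{b=0}^{B-1} 4 \tilde{t}_b = \sum_{b=0}^{B-1} 2 \log\left(\frac{1}{\nu} \cdot \frac{4^b}{2}\right) = \mathcal{E}, \tag{44}$$
which results in
$$\nu = 2^{B-2} \cdot \exp\left(-\frac{\mathcal{E}}{2B}\right). \tag{45}$$
Then, the condition $\nu < \frac{1}{2}$ is equivalent to
$$\mathcal{E} > 2B(B - 1) \log 2. \tag{46}$$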
9Hence, t˜b > 0 for all b ∈ [0, B− 1] if (46) holds. By (20) and
(45), we obtain (21).
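By (7) and (21),
$$\mathrm{MSE}\left(i^{(0)}, \tilde{t}\right) = c \cdot \sum_{b=0}^{B-1} 4^b \exp\left(-2\tilde{t}_b\right) = c \cdot B \cdot 2^{B-1} \exp\left(-\frac{\mathcal{E}}{2B}\right). \tag{47}$$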
The uniform energy allocation of $(i^{(0)}, t^{(0)})$ results in
$$\mathrm{MSE}\left(i^{(0)}, t^{(0)}\right) = c \cdot \frac{4^B - 1}{3} \exp\left(-\frac{\mathcal{E}}{2B}\right). \tag{48}$$
From (47) and (48), we obtain (22).
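The closed forms (45), (47), and (48) can be cross-checked numerically. The sketch below is illustrative only: it assumes $i_b = 2$, takes $\tilde{t}_b = \frac{1}{2}\log\left(\frac{4^b}{2\nu}\right)$ as implied by (44), infers the uniform allocation $t_b^{(0)} = \mathcal{E}/(4B)$ from the exponent in (48), and works with $\mathrm{MSE}/c$ so that the constant $c$ drops out. The budget `E = 100.0` is an arbitrary choice that satisfies (46) for `B = 8`.

```python
import math

def nu_from_energy(B, E):
    """Lagrange multiplier from (45)."""
    return 2.0 ** (B - 2) * math.exp(-E / (2.0 * B))

def t_tilde(b, nu):
    """Pulse duration implied by (44): 4 t_b = 2 log(4^b / (2 nu))."""
    return 0.5 * math.log(4**b / (2.0 * nu))

def mse_over_c(t):
    """sum_b 4^b exp(-2 t_b), i.e. MSE / c when every i_b = 2."""
    return sum(4**b * math.exp(-2.0 * tb) for b, tb in enumerate(t))

B = 8
E = 100.0  # must exceed 2 B (B - 1) log 2 (about 77.6 for B = 8), see (46)
nu = nu_from_energy(B, E)
t_opt = [t_tilde(b, nu) for b in range(B)]
t_uni = [E / (4.0 * B)] * B  # uniform allocation: sum_b 4 t_b = E

ratio = mse_over_c(t_uni) / mse_over_c(t_opt)
closed_form_ratio = (4**B - 1) / (3.0 * B * 2 ** (B - 1))
```

Under these assumptions, `ratio` agrees with `closed_form_ratio`, the quotient of (48) by (47), which is about 21.3 for $B = 8$ and does not depend on $\mathcal{E}$. This is in line with the factor of 21 quoted in the abstract for an 8-bit word.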
ACKNOWLEDGMENT
The work of Yuval Cassuto was partly supported by the
US-Israel Binational Science Foundation, and by the Israel
Science Foundation.
REFERENCES
[1] J. Zhu, “Magnetoresistive random access memory: The path to competitiveness and scalability,” Proc. IEEE, vol. 96, no. 11, pp. 1786–1798, Nov. 2008.
[2] J. Kim et al., “Spin-based computing: Device concepts, current status,
and a case study on a high-performance microprocessor,” Proc. IEEE,
vol. 103, no. 1, pp. 106–130, Jan. 2015.
[3] Y. Kim, S. K. Gupta, S. P. Park, G. Panagopoulos, and K. Roy, “Write-
optimized reliable design of STT MRAM,” in Proc. ACM/IEEE Int.
Symp. Low Power Electron. Design (ISLPED), Jul.-Aug. 2012, pp. 3–8.
[4] A. V. Khvalkovskiy et al., “Basic principles of STT-MRAM cell
operation in memory arrays,” J. Phys. D: Appl. Phys., vol. 46, no. 7,
p. 074001, Feb. 2013.
[5] S. Ikeda et al., “A perpendicular-anisotropy CoFeB–MgO magnetic tunnel
junction,” Nature Mater., vol. 9, no. 9, pp. 721–724, Jul. 2010.
[6] H. Meng and J.-P. Wang, “Spin transfer in nanomagnetic devices with
perpendicular anisotropy,” Appl. Phys. Lett., vol. 88, no. 17, p. 172506,
Apr. 2006.
[7] T. Nozaki, Y. Shiota, M. Shiraishi, T. Shinjo, and Y. Suzuki, “Voltage-
induced perpendicular magnetic anisotropy change in magnetic tunnel
junctions,” Appl. Phys. Lett., vol. 96, no. 2, p. 022506, Jan. 2010.
[8] W.-G. Wang, M. Li, S. Hageman, and C. L. Chien, “Electric-field-
assisted switching in magnetic tunnel junctions,” Nature Mater., vol. 11,
no. 1, pp. 64–68, Nov. 2012.
[9] P. Zhou, B. Zhao, J. Yang, and Y. Zhang, “Energy reduction for STT-
RAM using early write termination,” in Proc. IEEE/ACM Int. Conf.
Comput.-Aided Design (ICCAD), Nov. 2009, pp. 264–268.
[10] A. Ranjan, S. Venkataramani, X. Fong, K. Roy, and A. Raghunathan,
“Approximate storage for energy efficient spintronic memories,” in Proc.
Design Autom. Conf. (DAC), Jun. 2015, pp. 1–6.
[11] S. Mittal, “A survey of techniques for approximate computing,” ACM
Comput. Surv., vol. 48, no. 4, pp. 62:1–62:33, Mar. 2016.
[12] M. Alioto, “Energy-quality scalable adaptive VLSI circuits and systems
beyond approximate computing,” in Proc. Design Autom. Test Europe
(DATE), Mar. 2017, pp. 127–132.
[13] F. Frustaci, D. Blaauw, D. Sylvester, and M. Alioto, “Approximate
SRAMs with dynamic energy-quality management,” IEEE Trans. VLSI
Syst., vol. 24, no. 6, pp. 2128–2141, Jun. 2016.
[14] X. Yang and K. Mohanram, “Unequal-error-protection codes in SRAMs
for mobile multimedia applications,” in Proc. IEEE/ACM Int. Conf.
Comput.-Aided Design (ICCAD), Nov. 2011, pp. 21–27.
[15] Y. Kim, M. Kang, L. R. Varshney, and N. R. Shanbhag, “Generalized
water-filling for source-aware energy-efficient SRAMs,” IEEE Trans.
Commun., vol. 66, no. 10, pp. 4826–4841, Oct. 2018.
[16] ——, “SRAM bit-line swings optimization using generalized waterfilling,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jun. 2018, pp. 1670–1674.
[17] K. Cho, Y. Lee, Y. H. Oh, G.-c. Hwang, and J. W. Lee, “eDRAM-based
tiered-reliability memory with applications to low-power frame buffers,”
in Proc. ACM/IEEE Int. Symp. Low Power Electron. Design (ISLPED),
Aug. 2014, pp. 333–338.
[18] Y. Kim, W. H. Choi, C. Guyot, and Y. Cassuto, “On the optimal refresh
power allocation for energy-efficient memories,” in Proc. IEEE Global
Commun. Conf. (GLOBECOM), Dec. 2019, pp. 1–6.
[19] J. Gorski, F. Pfeuffer, and K. Klamroth, “Biconvex sets and optimization
with biconvex functions: A survey and extensions,” Math. Methods Oper.
Res., vol. 66, no. 3, pp. 373–407, Jun. 2007.
[20] R. E. Wendell and A. P. Hurter, “Minimization of a non-separable
objective function subject to disjoint constraints,” Oper. Res., vol. 24,
no. 4, pp. 643–657, Jul.-Aug. 1976.
[21] W. H. Butler et al., “Switching distributions for perpendicular spin-
torque devices within the macrospin approximation,” IEEE Trans.
Magn., vol. 48, no. 12, pp. 4684–4700, Dec. 2012.
[22] H. Yang et al., “Threshold switching selector and 1S1R integration
development for 3D cross-point STT-MRAM,” in Proc. IEEE Int.
Electron Devices Meeting (IEDM), Dec. 2017, pp. 38.1.1–38.1.4.
[23] R. M. Corless, G. H. Gonnet, D. E. G. Hare, D. J. Jeffrey, and D. E.
Knuth, “On the Lambert W function,” Adv. Comput. Math., vol. 5, no. 1,
pp. 329–359, Dec. 1996.
