MERAM: Non-Volatile Cache Memory Based on Magneto-Electric FETs by Angizi, Shaahin et al.
MERAM: Non-Volatile Cache Memory Based on Magneto-Electric FETs
Shaahin Angizi†, Navid Khoshavi∗, Andrew Marshall‡, Peter Dowben§ and Deliang Fan†
†School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287
∗Department of Computer Science, Florida Polytechnic University, FL
‡Department of Electrical and Computer Engineering, The University of Texas at Dallas, Richardson, TX
§Department of Physics and Astronomy, University of Nebraska-Lincoln, Lincoln, NE
sangizi@asu.edu, nkhoshavinajafabadi@floridapoly.edu, Andrew.Marshall@utdallas.edu, pdowben@unl.edu, dfan@asu.edu
Abstract—Magneto-Electric FET (MEFET) is a recently developed post-CMOS FET that offers intriguing characteristics for high-speed and low-power design in both logic and memory applications. In this paper, for the first time, we propose a non-volatile 2T-1MEFET memory bit-cell with separate read and write paths. We show that with proper co-design at the device, cell and array levels, such a design is a promising candidate for fast non-volatile cache memory, termed MERAM. To further evaluate its performance in a memory system, we, for the first time, build a device-to-architecture cross-layer evaluation framework based on an experimentally calibrated MEFET device model to quantitatively analyze and benchmark the proposed MERAM design against other memory technologies, including both volatile memories (i.e., SRAM, eDRAM) and popular emerging non-volatile memories (i.e., ReRAM, STT-MRAM, and SOT-MRAM). The experimental results show that MERAM has high state distinguishability, with an almost 36× difference in sense current. Results for the PARSEC benchmark suite indicate that, as an L2 cache alternative, MERAM reduces the Energy Area Latency (EAT) product on average by ∼98% and ∼70% compared with typical 6T SRAM and 2T SOT-MRAM platforms, respectively.
Index Terms—Magneto-electric FETs, Memory bit-cell, Cache
design.
I. INTRODUCTION
Over the past decade, Non-Volatile Memories (NVMs) have been actively explored with the main goals of achieving robustness, minimal stand-by leakage, high speed and high integration density, the major requirements for replacing conventional volatile memory technologies in main memory (i.e., DRAM) or cache (i.e., SRAM) [1]–[3]. This could, optimistically, boost memory capacity and performance, especially for on-chip caches in embedded applications and low-energy-budget IoT systems. However, very few memory technologies are still surviving in this arena. Among popular NVM technologies, ReRAM [4] and PCM [5] offer a higher ON/OFF ratio, and thus a higher sense margin, as well as a higher packing density than DRAM (∼2-4×) [5]. However, they suffer from slow and power-hungry write operations as well as low endurance (∼10^5–10^10) [3], [6]. Recent experiments on and fabrication of spin-based NVMs show the ability to switch the magnetization using current-induced Spin-Transfer Torque (STT) or Spin-Orbit Torque (SOT) with high speed (sub-nanosecond), long retention time (10 years) and less than fJ/bit write energy (close to SRAM) [7], [8]. However, such NVMs have poor ON/OFF ratios (maximum resistance ratio ∼7×) between the parallel and anti-parallel configurations. Moreover, the current densities in current-driven spin devices cause reliability issues and large static power dissipation [9], [10]. Ferroelectric random access memories (FERAMs) [2] offer high endurance and sense margin, but suffer from a destructive read operation. FE-FET memories [11], [12] offer the benefits of FERAMs with a reduced write time of 1-10 ns [11] and could be a possible alternative. Their downside is a large write voltage (>4.0 V) and high power consumption [13]–[15].
Recently, another promising spintronic device based on Magneto-Electric (ME) phenomena [1], [3], [16], [17] has shown superior performance in terms of switching speed, energy, ON/OFF ratio, etc. The principal innovative feature of this emerging device, which significantly differs from traditional spintronic devices, is that the switching speed of these voltage-controlled spintronic devices is limited only by the switching dynamics of the ME material [1], [18], [19]. With coherent rotation as the domain switching mechanism, the switching speed might be as fast as 5-6 ps [19], as it does not require the switching of a ferromagnet or the movement of a ferromagnetic domain wall. Therefore, this may be considered spintronics without a ferromagnet, achieving fast write speeds (<20 ps/full-adder) [18]–[20] and a low energy cost (<20 aJ/full-adder) [20], combined with great temperature stability (operational to 400 K or more) and improved scalability.
While there are proposals for logic design based on ME devices, such as ME-MTJ [20], [21] and MEFET [1], [16], [18], [22], to the best of our knowledge there is no published ME memory design and evaluation. In this work, we are the first to propose a 3-terminal MEFET-based non-volatile memory cell design with separate read and write paths. We show that with proper co-design at the device, cell and array levels, such a design may be a potential competitor in the current race to replace on-chip caches with non-volatile memory. Our main contributions in this work are summarized as follows:
• We enhance the experimentally calibrated MEFET Verilog-A circuit model [23] to perform extensive analysis at the device and cell/array circuit levels.
• We propose a 2-transistor 1-MEFET memory bit-cell
circuit design with high speed and low read/write energy
suitable for on-chip memory cache.
• We develop a bottom-up evaluation framework to extensively assess and benchmark MEFET cache performance against current volatile and non-volatile memories, including SRAM, eDRAM, ReRAM, STT-MRAM, and SOT-MRAM.
arXiv:2009.06119v1 [cs.ET] 14 Sep 2020
II. MAGNETO-ELECTRIC SPIN FIELD EFFECT TRANSISTOR (MEFET)
A. Device Characterization
The Magneto-Electric spin Field Effect Transistor (MEFET) is structurally very similar to the conventional CMOS FET. Fig. 1a shows the basic single-source version of the MEFET as a 4-terminal device with gate (T1), source (T2), drain (T3), and back gate (T4) terminals [24]. The device is a stacked structure in which a narrow semiconductor channel is sandwiched between two dielectrics, i.e., a Magneto-Electric (ME) material (e.g., chromia, Cr2O3) and an insulator (e.g., alumina, Al2O3). Two electrodes contact the stack: the gate (T1) at the bottom, via the ME layer, and the back gate at the top, via the alumina layer. The channel, shown in Fig. 1b, is made of tungsten diselenide (WSe2), enabling an on-off ratio of ∼10^6 [24] and a hole mobility comparable to CMOS. The source (T2) can be made of either a fixed-spin ferromagnetic (FM) polarizer or a conductor. The MEFET operates by programming the spin polarization of the semiconductor channel, referred to as the Spin-Orbit Coupling (SOC) channel, through the proximity effect of the ME gate's boundary polarization. In other words, the channel can be polarized by the ME layer at an extremely low voltage of around ±100 mV [1], [16], [17] applied at T1 while T4 is grounded.
The ME layer has a high interface polarization that can be controlled by a vertical voltage [1]. Chromia is a promising ME gate dielectric with the potential to induce spin polarization in an over-layer channel [1]. Applying a voltage across the gate and back gate terminals is equivalent to charging the ME capacitor. Therefore, depending on whether a positive or negative voltage is applied to T1, a vertical electric field of the corresponding polarity is created across the gate. In response to this field, the paraelectric polarization and the antiferromagnetic (AFM) order of the ME insulator layer are switched, which first changes the orientation of the chromia spin vectors through SOC. The ME boundary polarization can have an exchange interaction with the semiconductor channel that polarizes the carriers' spins in the channel and induces preferred conduction, i.e., much lower resistance, in only one direction along the channel. This high spin boundary polarization was predicted independently by Andreev and Belashchenko [25], [26] and has been experimentally confirmed for ME chromia by a wide variety of techniques [1]. In other words, the influence of the surface magnetization on the channel produces a directionality of conduction that is not possible with conventional gate dielectrics, as depicted in Fig. 1d. The current-voltage characteristic as a function of the ME polarization direction is obtained by NEGF transport simulation [27] of a 2-D ribbon with a width of 20 nm and a band mass of 0.1 me. We consider a conservative exchange splitting of 0.1 eV, a T3-T2 bias of 0.1 V, and a temperature of 300 K. As shown in Fig. 1e, chromia induces a very high level of spin polarization in the WSe2 channel, virtually 100% at the top of the valence band for hole conduction. Such interaction with chromia is shown in Fig. 1f (adapted from the authors' preliminary work in [1]). Therefore, the channel spin vector finally changes to either the ‘up’ or the ‘down’ direction, as shown in Fig. 1d. After the SOC channel is biased, a charge current is injected through the source, generating a spin-polarized current at T3. Fig. 1b and Fig. 1c show the 2D view of the MEFET device and the transistor circuit representation used in this work. As T4 is grounded, the simplified three-terminal scheme is used hereafter.
Figure 1: (a) The basic Magneto-Electric spin-FET (MEFET) with gate, source, drain and back gate. The narrow semiconductor channel can be made of any suitable material (e.g., graphene, InP, GaSb, PbS, WSe2, etc.), (b) A 2D view of MEFET, (c) The proposed MEFET circuit scheme, (d) Source to drain current versus voltage at T1 in the ME-FET. The SOC channel polarized in opposite directions (+ or -) by the ME gate, (e) Induced spin polarization in WSe2, and (f) Interaction with chromia adapted from [1], (g) Verilog-A modules developed for MEFET modeling.
Compared to the 200 ps coupling delay of the Magneto-Electric Magnetic Tunnel Junction (ME-MTJ) [20], the MEFET achieves an extremely low switching delay (somewhere in the region of 10-100 ps [28]), thus avoiding the excessive delays associated with exchange-coupled ferromagnets. Such a device is non-volatile owing to the non-volatile AFM ordering of the ME, and it has a very high and sharp turn-on voltage due to the sharp turn-on of the ME switching [23]. It is worth pointing out that a more energy-efficient read circuit can be expected for sensing the MEFET's resistance, as it shows a much higher ON/OFF ratio than the TMR sensing of traditional spin-based devices. The ON/OFF current ratio for WSe2 [29] has been experimentally shown to extend up to 10^6, while the magneto-resistance effect in MTJs does not exceed 10^2. Regarding the MEFET fabrication status, many experiments have shown that, with a static magnetic field, chromia's magnetic order can be switched back and forth by applying an electric field [30], [31]. However, the creation of a desired direction via SOC is yet to be proved experimentally and has only been anticipated based on first-principles computations [1], [22].

Table I: Compact model parameters of the MEFET, used in the Verilog-A model adapted from [20], [23]
Parameter | Value | Description of parameter and units
εME | 12 | Dielectric constant of chromia [32]
εAl2O3 | 10 | Dielectric constant of alumina
tME | 10 | Thickness of magnetoelectric layer, nm
WME × LME | 900 | Area of magnetoelectric layer, nm²
tox | 2 | Oxide barrier thickness, nm
Vt | 0.05 | Threshold of chromia state inversion, V
Vg | 0.1 | Voltage applied across ME layer, V
Ron | 1.05 | ON resistance, kΩ
Roff | 63.4 | OFF resistance, MΩ
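Since writing a MEFET amounts to charging the ME capacitor, the Table I parameters allow a rough estimate of the gate capacitance and of the energy to charge it. The Python sketch below is our own back-of-envelope estimate, not a result reported in this work: it assumes an ideal parallel-plate capacitor over the 900 nm² chromia area and ignores the alumina layer, the channel and fringing fields.

```python
# Back-of-envelope ME-gate estimate from Table I.
# Assumption: ideal parallel-plate capacitor; alumina, channel and
# fringing fields are ignored.
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def plate_capacitance(eps_r: float, area_m2: float, thickness_m: float) -> float:
    """C = eps0 * eps_r * A / t."""
    return EPS0 * eps_r * area_m2 / thickness_m


def charge_energy(c_farad: float, v_volt: float) -> float:
    """Energy drawn from a fixed source to charge C to V: C*V^2 (half is stored)."""
    return c_farad * v_volt ** 2


c_me = plate_capacitance(eps_r=12, area_m2=900e-18, thickness_m=10e-9)
e_write = charge_energy(c_me, 0.1)  # Vg = 0.1 V from Table I
print(f"C_ME ~ {c_me * 1e18:.2f} aF, energy per write ~ {e_write * 1e18:.3f} aJ")
# C_ME ~ 9.56 aF, energy per write ~ 0.096 aJ
```

The result is orders of magnitude below the per-access cache energies in Table III, which suggests that in an array the bit-line and access-transistor capacitances (not modeled here) would likely dominate the write energy.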
B. Device Modeling
We consider two aspects in developing the MEFET model: first, ME control of the channel spin polarization based on the proximity-induced polarization in the narrow 2D channel; second, the spin injection/detection function at the source/drain [16], [23]. The model is developed in Verilog-A with three distinct modules, as shown in Fig. 1g. In the presented MEFET model, T2 and T3 are used to inject and detect the spin-polarized current, respectively.
In the Verilog-A model shown in Fig. 1g, we further consider the experimental switching parameters of the chromia layer and the SOC channel. Module-I receives the gate and source voltages, compares the gate-source voltage with the threshold of chromia state inversion (0.050 V [16]), initializes the memory state, and assigns it back to the voltage across the drain and source terminals. Module-II accounts for the delay in the transition between source and drain: we assume that a computed delay element is associated with the boundary magnetization at the interface between the ME film and the channel, so the switching time of the MEFET device is limited only by the switching dynamics of the ME. Module-III assigns the proper Rch and calculates the corresponding electrical parameters, i.e., the output voltage at the drain. The channel resistance (Rch) across the two-dimensional (2D) narrow channel is considered in series with the input resistance Rin to define the boundary conditions for switching. In addition, we consider two spin-state terminals (“Ss” and “Ds”) to validate the spin state injected/detected at the source/drain terminals, as shown in Fig. 1g. The ‘up’ and ‘down’ spins are represented by constant voltage sources of ‘+1 V’ and ‘-1 V’, respectively, at the “Ss” terminal. The injected spin orientation can be selected before running the simulation, which makes the model flexible enough to be used in various CAD tools such as the Cadence platform. The spin current can then be detected at the drain (“Ds” in Fig. 1g). In our compact model, the precessional delay across the FM layer is taken into account by a fixed delay assumed to be 200 ps. This assumption is based on the best estimate of the coupling delay [18]. Table I lists the parameter values used in the models.
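The behavior of the three modules can be sketched as a small event-style model. The Python below is a hypothetical re-expression for illustration only, not the authors' Verilog-A code: the class and method names are ours, the mapping of negative/positive gate voltage to Ron/Roff is taken from the write experiments in Sec. IV-B, the default Rin of 0 is an assumption, and the spin-state terminals ("Ss"/"Ds") are not modeled.

```python
from typing import Optional, Tuple

# Parameters from Table I and Sec. II-B.
V_T = 0.05        # V, threshold of chromia state inversion
R_ON = 1.05e3     # ohm, ON (low) resistance
R_OFF = 63.4e6    # ohm, OFF (high) resistance
T_ME = 200e-12    # s, fixed delay assumed in the compact model


class MefetSketch:
    """Hypothetical Python re-expression of Modules I-III (not the authors' Verilog-A)."""

    def __init__(self, low_r: bool = True, r_in: float = 0.0) -> None:
        self.low_r = low_r   # non-volatile channel state
        self.r_in = r_in     # series input resistance Rin (hypothetical default 0)
        self._pending: Optional[Tuple[float, bool]] = None  # (t_effective, new_state)

    def apply_gate(self, t: float, v_gs: float) -> None:
        """Module I: compare V_GS with +/-V_T. Module II: schedule the flip after T_ME."""
        if v_gs < -V_T:
            target = True    # negative write -> Ron (polarity as in Sec. IV-B)
        elif v_gs > V_T:
            target = False   # positive write -> Roff
        else:
            return           # below threshold: state retained (non-volatile)
        self._pending = (t + T_ME, target)

    def r_ch(self, t: float) -> float:
        """Module III: assign the channel resistance once the delay has elapsed."""
        if self._pending is not None and t >= self._pending[0]:
            self.low_r = self._pending[1]
            self._pending = None
        return R_ON if self.low_r else R_OFF

    def drain_current(self, t: float, v_ds: float) -> float:
        """Drain current with Rch in series with Rin."""
        return v_ds / (self.r_ch(t) + self.r_in)


cell = MefetSketch()
cell.apply_gate(0.0, +0.1)
print(cell.r_ch(100e-12))  # 1050.0: delay not yet elapsed
print(cell.r_ch(300e-12))  # 63400000.0: switched to Roff
```

Keeping the pending transition separate from the stored state mirrors the split between Module I (decision) and Module II (delay), so the delay value can be changed without touching the threshold logic.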
III. MEFET MEMORY
The proposed non-volatile 2T-1MEFET RAM bit-cell, called MERAM, consists of one MEFET as the main storage element and two access transistors, as shown in Fig. 2a. By virtue of the three-terminal structure of the MEFET, depicted in Fig. 1c, we design the memory bit-cell with separate read and write paths, which facilitates independent optimization of both operations and avoids the read-write conflicts found in many 1T1R resistive non-volatile memory designs. Each cell is controlled by five signals, i.e., Write Word Line (WWL), Write Bit Line (WBL), Read Word Line (RWL), Read Bit Line (RBL), and Source Line (SL). The read/write access transistor is controlled by RWL/WWL, enabling selective read/write operation on the cells located within one row. An m×n MERAM array built from the proposed bit-cell is shown in Fig. 2b. The WLs and RBLs are shared among the cells within the same row, and the WBLs and SLs are shared among the cells in the same column. The WLs are controlled by the memory row decoder (active-high output). The column decoder (active-high output) controls the activation of the read current path through the SL. The voltage driver component is designed to set the proper voltage on the WBL. In the following, we explain the read and write operations, respectively.
[Figure 2 graphic: (a) bit-cell with MEFET, write and read access transistors, WWL, RWL, WBL, RBL, SL, Vwrite, VDD, GND, Iread and SA; (b) m×n array with WWL1–WWLm, RWL1–RWLm, RBL1–RBLm, WBL1–WBLn, SL1–SLn, memory row and column decoders, and per-column voltage drivers.]
Figure 2: (a) The 2T-1MEFET RAM (MERAM) bit-cell with Read and Write signals, (b) An m×n MERAM array with peripheral circuitry.
A. Read Operation
The concept behind MERAM's read operation is to sense the resistance of the selected memory cell and compare it with a reference resistor using a Sense Amplifier (SA), as shown in Fig. 2a. At the array level, shown in Fig. 3a, the row and column decoders activate the RWL and SL paths, respectively. When a memory cell is selected, applying a very small sense current (sub-microampere) to the RBL generates a voltage (Vsense) on the corresponding SL, which is taken as the input of the sense circuit, as shown in Fig. 3b. Depending on whether the selected 2T-1MEFET RAM bit-cell (RM1) is in the low or high resistance state, the sense voltage is Vlow or Vhigh (Vlow<Vhigh), respectively. Thus, by setting the reference voltage Vref at (Vlow+Vhigh)/2, the SA outputs binary ‘1’ when Vsense>Vref and ‘0’ otherwise. We designed and tuned the sense circuit based on the StrongARM latch [33] shown in Fig. 3c. Each read operation requires two clock phases: pre-charge (CLK ‘high’) and sensing (CLK ‘low’). During the pre-charge phase, both SA outputs are reset to ground potential. Then, in the sensing phase, the input transistors provide different charging currents depending on their gate biasing voltages (Vsense and Vref), leading to different switching speeds of the latch's cross-coupled inverters. The biasing conditions for the read operation are tabulated in Table II.
[Figure 3 graphic: (a) array-level read path with RWL1, RBL1–RBL128, SL1–SL128, voltage drivers, RBL switches, Isense and per-column SAs with Rref and Iref; (b) Isense through RM1 (Rlow/Rhigh giving Vlow/Vhigh) compared with Vref from Iref through Rref; (c) StrongARM-latch SA with CLK, Vsense, Vref and complementary outputs.]
Figure 3: (a) The array-level read operation. Read access transistors isolate the un-accessed cells, avoiding sneak paths. (b) The idea of voltage comparison between Vsense and Vref for memory read, (c) The Sense Amplifier (SA).
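The midpoint-reference decision can be worked through numerically. The Python sketch below uses Isense = 900 nA from Sec. IV-B and the Table I resistances; it ignores the read access transistor and assumes an ideal current source. Because 900 nA × 63.4 MΩ ≈ 57 V, an ideal source cannot develop that voltage, so the sketch clips the SL node at a hypothetical 1.0 V rail; the rail value and the clipping are our assumptions, not the authors' circuit.

```python
# Read-path sketch. Assumptions: ideal current source, access-transistor
# resistance ignored, SL node clipped at a hypothetical 1.0 V rail.
I_SENSE = 900e-9                 # A, sense current used in Sec. IV-B
R_LOW, R_HIGH = 1.05e3, 63.4e6   # ohm, Table I
VDD = 1.0                        # V, hypothetical supply rail (not given here)


def v_sense(r_cell: float, i_sense: float = I_SENSE, vdd: float = VDD) -> float:
    """Voltage developed on SL by forcing i_sense through the cell."""
    return min(i_sense * r_cell, vdd)


V_LOW, V_HIGH = v_sense(R_LOW), v_sense(R_HIGH)
V_REF = (V_LOW + V_HIGH) / 2     # midpoint reference, as in the text


def sense_amp(v: float) -> int:
    """Latch decision: '1' if Vsense > Vref, else '0'."""
    return 1 if v > V_REF else 0


print(f"Vlow={V_LOW * 1e3:.3f} mV, Vhigh={V_HIGH:.2f} V, Vref={V_REF:.3f} V")
print(sense_amp(V_LOW), sense_amp(V_HIGH))  # 0 1
```

With such a wide resistance gap the decision is insensitive to the exact placement of Vref, which is one way to read the high state distinguishability claimed for MERAM.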
B. Write Operation
The write operation, shown in Fig. 2a, is accomplished by activating WWL and asserting the appropriate bipolar write voltage (Vwrite/-Vwrite) on WBL through the voltage driver. As detailed above, this voltage provides a vertical electric field across the gate that is large enough to switch the spin vectors of the underlying chromia layer and the channel of the MEFET. Unlike in other memory technologies such as FEFET [14], the RBL and SL do not need to be grounded during the write period. The unaccessed rows are isolated by the row decoder, which drives the corresponding WLs to GND. This is necessary to avoid unwanted current paths that can cause false write/read states. The biasing conditions for the write operation are tabulated in Table II.
Table II: Bias configuration of MERAM array.
Operation | Rows | WWL | RWL | WBL | RBL | SL
Read | Accessed | 0 | 1 | - | Isense | -
Read | Unaccessed | 0 | 0 | - | - | -
Write (‘1’/‘0’) | Accessed | 1 | 0 | Vwrite/-Vwrite | - | -
Write | Unaccessed | 0 | 0 | - | - | -
Hold | All | 0 | 0 | - | - | -
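Table II can be read as a lookup from (operation, accessed) to signal levels. The Python helper below is hypothetical, written only to make the table executable: it treats '-' as "not driven" (None), takes the ‘1’ ↔ +Vwrite and ‘0’ ↔ -Vwrite mapping from the table's ordering, and uses ±100 mV from Sec. IV-B as the default write voltage. It also simplifies by returning one cell's view, although WBL and SL are column signals shared along a column.

```python
from typing import Dict, Optional, Union

Level = Union[int, float, str, None]  # None stands for '-' (not driven) in Table II


def row_bias(op: str, accessed: bool, bit: Optional[int] = None,
             v_write: float = 0.1) -> Dict[str, Level]:
    """Hypothetical helper returning the Table II levels seen by one cell."""
    idle: Dict[str, Level] = {"WWL": 0, "RWL": 0, "WBL": None, "RBL": None, "SL": None}
    if op not in ("read", "write", "hold"):
        raise ValueError(f"unknown operation {op!r}")
    if op == "hold" or not accessed:
        return idle                      # all word lines low, nothing driven
    if op == "read":
        return {**idle, "RWL": 1, "RBL": "Isense"}
    if bit not in (0, 1):
        raise ValueError("a write needs bit=0 or bit=1")
    # '1' -> +Vwrite, '0' -> -Vwrite; v_write = 0.1 V follows Sec. IV-B
    return {**idle, "WWL": 1, "WBL": v_write if bit == 1 else -v_write}


for args in [("read", True), ("write", True, 1), ("write", True, 0), ("write", False)]:
    print(args, row_bias(*args))
```

Returning a fresh dictionary per call keeps the idle levels in one place, which matches the table's structure: every row differs from "Hold" in at most two signals.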
IV. EVALUATION
A. Bottom-Up Evaluation Framework
For cross-technology evaluation and comparison, we developed a comprehensive bottom-up cross-layer framework, as shown in Fig. 4. The device-level model of each technology was first developed or extracted from various models and assessments. For MERAM, we used the Verilog-A model developed by our co-author Dr. Marshall in preliminary work [16]. For STT-MRAM and SOT-MRAM, we jointly use the Non-Equilibrium Green's Function (NEGF) and Landau-Lifshitz-Gilbert (LLG) equations to model the bit-cell, as developed in our co-author Dr. Fan's preliminary work [34]–[36]. We leveraged the default ReRAM and SRAM cell configurations of NVSim [37]. The eDRAM cell parameters were adopted and scaled from Rambus [38]. At the circuit level, we developed a 256×256 memory sub-array with peripheral circuitry for each memory technology, simulated in Cadence Spectre with the 45nm NCSU Process Design Kit (PDK) library [39]. At the architecture level, we developed memory libraries for MERAM in C++ based on the circuit-level results, and configured the other memory technologies based on the existing NVSim [37] and CACTI [40] libraries. Performance data (i.e., delay, energy, area) for the cache are estimated with respect to a single input memory configuration file (.cfg), as tabulated in Table III. The results are then fed to the cycle-accurate MARSSx86 simulator [41] for each memory technology to show the architecture-level performance.
[Figure 4 graphic: Device level: MTJ modeling using NEGF-LLG (Verilog-A), Verilog-A 1T1R STT-/2T SOT-MRAM, Verilog-A 1T1R digital ReRAM, Verilog-A 2T-1MEFET RAM with compact model parameters, DRAM cell parameters from Rambus, default NVSim ReRAM and SRAM .cell files. Circuit level: 6T SRAM and 1T1C eDRAM circuits (Spectre), design and verification of a single 256x256 sub-array (Cadence Spectre), controller (Synopsys Design Compiler), extraction of delay, energy and area (Spectre/Spice). Architecture level: NVSim memory library for MERAM from circuit-level data, NVSim configured for existing technologies, modified CACTI based on circuit-level DRAM data, extraction of delay, energy and area w.r.t. the cache configuration file (.cfg), and MARSSx86 simulation for the PARSEC 2.1 suite.]
Figure 4: The bottom-up evaluation framework developed for cache memory evaluation.
B. Device and Circuit Level
Fig. 6 shows the transient simulation results of a 2T-1MEFET RAM cell located in a 256×256 sub-array based on the architecture shown in Fig. 2. Here, we consider two experimental scenarios for the write operation, indicated by the solid blue and dotted red lines in Fig. 6. For clarity of the waveforms, we assume that a 3 ns period clock synchronizes the write and read operations; however, a period below 1 ns can be used for a reliable read operation. During the pre-charge phase of the SA (Clk=1), Vwrite is set (-100 mV in the 1st experiment or +100 mV in the 2nd) and applied to the WBL to change the MEFET resistance to Rlow=1.05 kΩ or Rhigh=63.4 MΩ, respectively. Prior to the evaluation phase (Eval.) of the SA, WWL and WBL are grounded while RBL is fed by a very small sense current, Isense=900 nA. In the evaluation phase, RWL goes high and, depending on the resistance state of the MERAM bit-cell, Vsense is generated on the SL at the first input of the SA, while Vref is generated at the second input. The comparison between Vsense and Vref for both experiments is plotted in Fig. 6. We observe that when Vsense<Vref (1st experiment), the SA outputs binary ‘0’, whereas it outputs ‘1’ when Vsense>Vref (2nd experiment). As can be seen, the same value is sensed again in the next sensing cycle (3-6 ns) regardless of the WBL voltage, since WWL is deactivated and the MERAM bit-cell therefore remains unchanged.
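The two experiments can be replayed at the cycle level. The Python sketch below is illustrative only: the class is hypothetical, the SA decision is simplified to a resistance comparison (equivalent to the voltage comparison if the same current flows through the cell and the reference), and the reference value of 1 MΩ is an assumed point between the two states. It checks that the second cycle, with WWL low and the WBL voltage reversed, leaves the stored value intact.

```python
from typing import List, Optional, Tuple

R_LOW, R_HIGH = 1.05e3, 63.4e6  # ohm, states written by -/+100 mV (Sec. IV-B)
V_TH = 0.05                      # V, chromia inversion threshold (Table I)
R_REF = 1.0e6                    # ohm, hypothetical reference between the two states


class MeramCell:
    """Hypothetical cycle-level sketch of one 2T-1MEFET bit-cell."""

    def __init__(self, r: float = R_LOW) -> None:
        self.r = r

    def cycle(self, wwl: int, v_wbl: float, rwl: int) -> Optional[int]:
        # Pre-charge phase: WBL reaches the MEFET gate only if WWL is on.
        if wwl and abs(v_wbl) > V_TH:
            self.r = R_HIGH if v_wbl > 0 else R_LOW
        # Evaluation phase: the SA compares the cell with the reference.
        if rwl:
            return 1 if self.r > R_REF else 0
        return None


results: List[Tuple[float, Optional[int], Optional[int]]] = []
for v_write in (-0.1, +0.1):                           # 1st and 2nd experiment
    cell = MeramCell()
    first = cell.cycle(wwl=1, v_wbl=v_write, rwl=1)    # 0-3 ns: write, then read
    second = cell.cycle(wwl=0, v_wbl=-v_write, rwl=1)  # 3-6 ns: WBL reversed, WWL off
    results.append((v_write, first, second))
print(results)  # [(-0.1, 0, 0), (0.1, 1, 1)]
```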
Metrics | ReRAM | STT-MRAM | SOT-MRAM | SRAM | eDRAM | MERAM
Non-volatility | Yes | Yes | Yes | No | No | Yes
# of access transistors | 1 | 1 | 2 | 6 | 1 | 2
Area (mm²) | 1.77 | 5.42 | 5.85 | 12.4 | 4.46 | 6.94
Cache Hit Latency (ns) | 2.55 | 3.14 | 5.07 | 1.59 | 3.1 | 0.65
Cache Miss Latency (ns) | 1.21 | 1.28 | 1.32 | 0.34 | - | 0.22
Cache Write Latency (ns) | 20.5 | 10.7 | 3.93 | 0.78 | 3.1 | 0.94
Cache Hit Dynamic Energy (nJ) | 0.33 | 0.52 | 0.21 | 0.73 | 0.24 | 0.22
Cache Miss Dynamic Energy (nJ) | 0.033 | 0.044 | 0.03 | 0.017 | - | 0.037
Cache Write Dynamic Energy (nJ) | 0.82 | 1.27 | 0.27 | 0.72 | 0.24 | 0.27
Cache Total Leakage Power (W) | 0.38 | 0.79 | 0.21 | 6.2 | 0.57 | 0.19
Endurance | ∼10^5–10^10 | ∼10^10–10^15 | ∼10^10–10^15 | Unlimited | ∼10^15 | ∼10^17
Data Over-written Issue | No | No | No | No | Yes | No
Table III: Estimated raw performance of various memory technologies as a 4MB unified L2 cache with 64 Bytes cache line size.
Figure 5: Benchmarking radar plot of MEFET vs. other
technologies.
Figure 6: The transient simulation results of two experiments on
MERAM cell.
Table III compares the raw performance of six memory technologies, i.e., volatile SRAM and eDRAM and non-volatile ReRAM, STT-MRAM, SOT-MRAM, and MERAM, each integrated as a 4MB L2 cache with a 64-Byte cache line. It is worth pointing out that the data are extracted from our bottom-up evaluation framework and architecture-level simulators. Here, we discuss the results reported in Table III. Additionally, the radar plot in Fig. 5 intuitively shows, at the array level, the pros and cons of each technology compared with MERAM for building high-density arrays.
1) Latency: As listed in Table III, MERAM shows a remarkable improvement in cache hit and miss latency (0.65/0.22 ns) compared with the other platforms. In terms of cache write, MERAM achieves the second-shortest latency after volatile SRAM. Its 0.94 ns cache write operation is ∼4× shorter than that of the best non-volatile memory (SOT-MRAM, 3.93 ns), though still slower than its SRAM counterpart (0.78 ns). Therefore, MERAM meets the requirements of a high-speed non-volatile cache unit.
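The latency comparisons above follow directly from Table III. The short Python check below recomputes them from the table values; the dictionary names are ours. Note that 3.93/0.94 ≈ 4.18, which the text rounds to ∼4×.

```python
# Cache latencies from Table III, in ns.
HIT = {"ReRAM": 2.55, "STT-MRAM": 3.14, "SOT-MRAM": 5.07,
       "SRAM": 1.59, "eDRAM": 3.1, "MERAM": 0.65}
WRITE = {"ReRAM": 20.5, "STT-MRAM": 10.7, "SOT-MRAM": 3.93,
         "SRAM": 0.78, "eDRAM": 3.1, "MERAM": 0.94}
NVM = ("ReRAM", "STT-MRAM", "SOT-MRAM")  # non-volatile competitors

fastest_hit = min(HIT, key=HIT.get)
best_nvm = min(NVM, key=WRITE.get)
print(fastest_hit)                                           # MERAM
print(best_nvm, round(WRITE[best_nvm] / WRITE["MERAM"], 2))  # SOT-MRAM 4.18
print(sorted(WRITE, key=WRITE.get)[:2])                      # ['SRAM', 'MERAM']
```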
2) Energy Consumption: The energy budget for the various cache operations is shown in Table III. The proposed MERAM design achieves a cache hit dynamic energy comparable to those of eDRAM and SOT-MRAM, the most energy-efficient designs. SRAM, SOT-MRAM and ReRAM achieve the lowest energy consumption for a cache miss. Moreover, SOT-MRAM and MERAM consume the smallest cache write dynamic energy among all the NVM platforms, owing to their intrinsically low-power device operation, although eDRAM shows the lowest write energy overall. MERAM consumes 0.27 nJ per write operation, which is ∼2.6× smaller than the SRAM platform. The MERAM writing technique averts dissipative currents and is thus energy-efficient and inert against the detrimental impacts of Joule heating [1]. The cache total leakage power is also reported in Table III. We observe that MERAM and SOT-MRAM consume the least leakage power among the candidates. Thus, MERAM could be considered a promising non-volatile unit in terms of energy efficiency.
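The energy ratios can likewise be recomputed from Table III. In the Python check below (dictionary names are ours), the raw SRAM/MERAM write-energy quotient is about 2.67, reported in the text as ∼2.6×, and the leakage-power quotient shows MERAM's roughly 33× advantage over SRAM.

```python
# Dynamic write energy (nJ) and total leakage power (W) from Table III.
WRITE_NJ = {"ReRAM": 0.82, "STT-MRAM": 1.27, "SOT-MRAM": 0.27,
            "SRAM": 0.72, "eDRAM": 0.24, "MERAM": 0.27}
LEAK_W = {"ReRAM": 0.38, "STT-MRAM": 0.79, "SOT-MRAM": 0.21,
          "SRAM": 6.2, "eDRAM": 0.57, "MERAM": 0.19}

write_gain = WRITE_NJ["SRAM"] / WRITE_NJ["MERAM"]
leak_gain = LEAK_W["SRAM"] / LEAK_W["MERAM"]
print(f"SRAM/MERAM write energy: {write_gain:.2f}x")  # 2.67x
print(f"SRAM/MERAM leakage power: {leak_gain:.1f}x")  # 32.6x
print(min(LEAK_W, key=LEAK_W.get))                    # MERAM
```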
3) Endurance: The inorganic MEFET easily lasts to 10^17 switches [16]. The reason is that the required current densities are very low, which considerably reduces the device failure rate compared with other spintronic devices, which require much higher current densities (10^10–10^15) [42].
4) Area: The MERAM bit-cell requires two minimum-size access transistors (W:L=90:50 at the 45nm technology node) to enable separate read and write paths. As a result, MERAM occupies 6.94 mm² to implement a 4MB cache, which is a larger chip area than the other non-volatile memories and eDRAM, although it is ∼1.7× smaller than the 6T SRAM platform. Therefore, the MERAM array cannot be considered an area-efficient non-volatile memory candidate compared to the ReRAM, STT-MRAM and SOT-MRAM designs.
5) Integration with CMOS: The device characteristics of the chromia layer make this material interesting for integration into the back end of line (BEoL). The feasibility of integrating chromia with silicon was experimentally demonstrated in [43], [44]. The bit-cell layout of the single-port MERAM, shown in Fig. 7(b), was estimated using λ-based layout rules (λ is half of the minimum feature size F; here λ = 22.5 nm) [45]. The proposed cell takes approximately 40λ×16λ = 640λ², in contrast to 2000λ² for the layout of the baseline 6T-SRAM adapted from [46]. The layout of a 4×4 MERAM array with its controlling signals is illustrated in Fig. 7(a).
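Converting the λ-based layout estimates to physical units makes them comparable with the array-level areas in Table III. The Python sketch below uses λ = 22.5 nm from the text. The cell-level ratio (3.125×) is larger than the array-level quotient from Table III (12.4/6.94 ≈ 1.79, reported as ∼1.7×), presumably because the array figure also covers peripheral circuitry; that explanation is our assumption.

```python
LAMBDA_NM = 22.5          # lambda = F/2, value from the text
MERAM_CELL_L2 = 40 * 16   # lambda^2, estimated MERAM bit-cell
SRAM_CELL_L2 = 2000       # lambda^2, baseline 6T-SRAM layout


def lambda2_to_um2(n_lambda2: float, lam_nm: float = LAMBDA_NM) -> float:
    """Convert an area in lambda^2 to um^2 (1 um = 1000 nm)."""
    return n_lambda2 * (lam_nm / 1000.0) ** 2


meram_um2 = lambda2_to_um2(MERAM_CELL_L2)
sram_um2 = lambda2_to_um2(SRAM_CELL_L2)
print(f"MERAM cell ~ {meram_um2:.2f} um^2, 6T-SRAM cell ~ {sram_um2:.2f} um^2")
print(f"cell-level ratio: {SRAM_CELL_L2 / MERAM_CELL_L2:.3f}x")  # 3.125x
print(f"4MB array ratio (Table III): {12.4 / 6.94:.2f}x")         # 1.79x
```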
C. Architecture Level
1) Experiment Setup: The cycle-accurate simulator MARSSx86 [41] was used to evaluate the efficiency of our proposed circuit-to-architecture cross-layer framework. The cache controller was modified to realize the functionality of the L2 architecture with the various memory technology candidates. We configured the simulator with the parameters listed in Table IV. We selected eleven benchmarks from the PARSEC 2.1 suite to test the performance of MERAM against the existing cache technologies. The cache in the simulator was warmed up with 5 million instructions, after which 500 million instructions starting at the Region Of Interest (ROI) of each workload were executed. The collected reports were used to analyze the L2 candidates based on the Energy Area Latency (EAT) product. The EAT metric can holistically identify the preferred L2 candidate by taking into account several essential metrics of cache design.
Figure 7: Layout of (a) 4×4 MERAM array and (b) MERAM bit-cell.
Table IV: System Configuration
CPU | 4 cores, 3.3 GHz, Fetch/Exec/Commit width 4
L1 | private, 32 KB, I/D separate, 8-way, 64 B, SRAM, WB
L2 | private, 4 MB, unified, 8-way, 64 B, memory tech. candidate, WB
Main Memory | 8 GB, 1 channel, 4 ranks/channel, 8 banks/rank
2) Energy Comparison: Fig. 8a compares the dynamic energy consumption of the L2 candidates. The dynamic energy consumption varies with the workload characteristics (e.g., read/write intensity) and with the power required for read/write operations in the given memory technology. In particular, workloads with a higher write-to-read ratio impose considerably more dynamic energy on candidates with high write energy consumption, such as STT-MRAM. For example, the dynamic energy consumption of the heavily write-intensive facesim and ferret is 6.09 mJ and 5.22 mJ, respectively, in the STT-MRAM-based L2 candidate, which is considerably higher than in the other candidates. On the other hand, read-intensive workloads such as streamcluster experience a relatively higher number of read accesses. Thus, running read-intensive workloads incurs considerably higher dynamic energy consumption in candidates with relatively energy-costly read accesses. Among the L2 candidates, SOT-MRAM and MERAM consume the least dynamic energy, owing to their innovative approaches to controlling the magnetic state of the memory cell.
The execution time of each workload, together with the leakage power of each L2 candidate, was used to compute the leakage energy. As shown in Fig. 8b, SRAM incurs significantly higher leakage energy, primarily due to its sub-threshold leakage paths and gate leakage current. Among the L2 candidates, MERAM consumes the least leakage energy owing to its intrinsically energy-efficient operation. Since leakage energy is the major contributor to the overall energy consumption, low leakage power can significantly reduce the EAT of an individual L2 candidate.
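The split between dynamic and leakage energy can be illustrated with Table III values. The Python sketch below charges each hit, miss and write at its Table III per-access energy and computes leakage energy as leakage power × runtime, as described above; this simple per-access model and the access counts and 50 ms runtime are hypothetical, chosen only for illustration, and the simulator's actual accounting may differ.

```python
from typing import Tuple

# (hit, miss, write) dynamic energy in nJ and leakage power in W, from Table III.
PARAMS = {
    "SRAM": (0.73, 0.017, 0.72, 6.2),
    "SOT-MRAM": (0.21, 0.03, 0.27, 0.21),
    "MERAM": (0.22, 0.037, 0.27, 0.19),
}


def l2_energy_mj(tech: str, hits: int, misses: int, writes: int,
                 runtime_s: float) -> Tuple[float, float]:
    """Return (dynamic, leakage) energy in mJ under a simple per-access model."""
    e_hit, e_miss, e_write, p_leak = PARAMS[tech]
    dynamic_nj = hits * e_hit + misses * e_miss + writes * e_write
    return dynamic_nj * 1e-6, p_leak * runtime_s * 1e3  # nJ -> mJ, J -> mJ


# Hypothetical workload: 2M hits, 0.5M misses, 1M writes, 50 ms runtime.
for tech in PARAMS:
    dyn, leak = l2_energy_mj(tech, 2_000_000, 500_000, 1_000_000, 0.05)
    print(f"{tech:9s} dynamic {dyn:6.2f} mJ  leakage {leak:6.1f} mJ")
```

Even with these made-up counts, SRAM's 6.2 W leakage dwarfs its dynamic energy, which illustrates why leakage dominates the comparison.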
3) Latency Comparison: We used the key parameters acquired from NVSim, CACTI, and our MERAM memory library to compute the overall access latency of read/write operations based on the cache access pattern of each workload, as illustrated in Fig. 9a. To reduce the deviation of the simulation results from commercialized designs, we combined the acquired profiles with the latency associated with the peripheral circuits. SRAM offers faster read/write access than the other L2 candidates because its symmetrical structure makes small voltage swings easy to detect. MERAM is the closest candidate to SRAM, offering significantly low read/write latency. This means that SRAM and MERAM should benefit from their low overall latency when the EAT of each L2 candidate is estimated.
4) EAT Product: MERAM offers SRAM-competitive performance, superior energy consumption, and an admissible area overhead compared to the other candidates, which makes it the preferred L2 candidate. As illustrated in Fig. 9b, MERAM delivers the lowest EAT among the L2 candidates. Compared to SOT-MRAM, which was considered the superior alternative, MERAM reduces the EAT by 70.81% on average. In the heavily write-intensive workloads vips and facesim, the EAT is reduced by 71.74% and 71.44%, respectively, relative to the EAT delivered by SOT-MRAM. The same trend is seen in the read-intensive streamcluster workload, where the EAT is reduced by 68.63% relative to the SOT-MRAM candidate. Specifically, MERAM decreases the EAT by 80.26%, 82.48%, 94.57%, and 98.12% relative to ReRAM, eDRAM, STT-MRAM, and SRAM, respectively.
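Percentage reductions in a product metric can be easier to interpret as multiplicative factors. The Python sketch below converts the reported average EAT reductions into "baseline EAT is N× MERAM's EAT" (for example, a 98.12% reduction means SRAM's EAT is about 53× MERAM's). The `eat` helper simply multiplies the three factors named by the metric and is our illustration, not the authors' tooling; the 0.5/1.0/0.5 inputs are hypothetical.

```python
from typing import Dict


def eat(energy: float, area: float, latency: float) -> float:
    """Energy Area Latency product; meaningful once normalized to a baseline."""
    return energy * area * latency


def reduction_to_factor(pct: float) -> float:
    """An X% EAT reduction means baseline EAT = MERAM EAT / (1 - X/100)."""
    return 1.0 / (1.0 - pct / 100.0)


# Average EAT reductions of MERAM reported in the text, in percent.
REPORTED: Dict[str, float] = {"SOT-MRAM": 70.81, "ReRAM": 80.26, "eDRAM": 82.48,
                              "STT-MRAM": 94.57, "SRAM": 98.12}

for tech, pct in REPORTED.items():
    print(f"{tech:9s} EAT ~ {reduction_to_factor(pct):5.1f}x that of MERAM")

# Hypothetical: half the energy and half the latency at equal area -> 0.25x EAT.
print(eat(0.5, 1.0, 0.5) / eat(1.0, 1.0, 1.0))  # 0.25
```

Because EAT is a product, moderate gains in each factor compound, which is why a ∼1.8× area penalty relative to other NVMs can coexist with a large EAT advantage.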
V. CONCLUSIONS
In this work, we presented a non-volatile 2T-1MEFET memory
bit-cell with separate read and write paths. We designed
a device-to-architecture cross-layer evaluation framework to
quantitatively analyze and compare the proposed design with
other memory technologies. Our simulation results showed
that MERAM offers SRAM-competitive performance, superior
energy consumption, and acceptable area overhead
compared with the other candidates, which makes it the
preferred L2 candidate. As an L2 cache alternative, MERAM
reduces the Energy Area Latency (EAT) product on average
by ∼98% and ∼70% relative to 6T SRAM and 2T SOT-MRAM
platforms, respectively. We believe such circuit- and
architecture-level studies can motivate and guide device-level
researchers in this domain by revealing the potential
performance of this emerging paradigm.
Figure 8: L2 cache (a) dynamic energy and (b) leakage energy breakdown for SRAM, STT-MRAM, SOT-MRAM, ReRAM, eDRAM, and
MERAM. [Plot not reproduced. Panel (a) y-axis: Dynamic Energy (mJ); panel (b) y-axis: Leakage Energy (mJ); x-axis: SRAM, STT-MRAM, SOT-MRAM, ReRAM, eDRAM, MERAM.]
Figure 9: (a) Latency comparison and (b) EAT comparison of various L2 candidates. The results are normalized w.r.t. the average latency
(ns)/EAT of SRAM across all workloads and among the various L2 candidates. [Plot not reproduced. Panel (a) y-axis: Normalized Latency; panel (b) y-axis: Normalized EAT; x-axis: SRAM, STT-MRAM, SOT-MRAM, ReRAM, eDRAM, MERAM.]
REFERENCES
[1] P. A. Dowben, C. Binek, K. Zhang, L. Wang, W.-N. Mei, J. P. Bird,
U. Singisetti, X. Hong, K. L. Wang, and D. Nikonov, “Towards a
strong spin–orbit coupling magnetoelectric transistor,” IEEE Journal
on Exploratory Solid-State Computational Devices and Circuits, vol. 4,
no. 1, pp. 1–9, 2018.
[2] S. Das and J. Appenzeller, “FeTRAM. An organic ferroelectric material
based novel random access memory cell,” Nano Letters, vol. 11, no. 9,
pp. 4003–4007, 2011.
[3] P. Dowben, D. Nikonov, A. Marshall, and C. Binek, “Magneto-electric
antiferromagnetic spin–orbit logic devices,” Applied Physics Letters, vol.
116, no. 8, p. 080502, 2020.
[4] H. Akinaga and H. Shima, “Resistive random access memory (ReRAM)
based on metal oxides,” Proceedings of the IEEE, vol. 98, no. 12, pp.
2237–2251, 2010.
[5] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger, “Architecting phase change
memory as a scalable DRAM alternative,” in Proceedings of the 36th
Annual International Symposium on Computer Architecture, 2009, pp.
2–13.
[6] X. Dong, X. Wu, G. Sun, Y. Xie, H. Li, and Y. Chen, “Circuit and
microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as
a universal memory replacement,” in 2008 45th ACM/IEEE Design
Automation Conference. IEEE, 2008, pp. 554–559.
[7] S. Angizi, Z. He, A. S. Rakin, and D. Fan, “CMP-PIM: An energy-efficient
comparator-based processing-in-memory neural network accelerator,” in
Proceedings of the 55th Annual Design Automation Conference, 2018,
pp. 1–6.
[8] S. Angizi, Z. He, F. Parveen, and D. Fan, “IMCE: Energy-efficient bit-wise
in-memory convolution engine for deep neural network,” in 2018
23rd Asia and South Pacific Design Automation Conference (ASP-DAC).
IEEE, 2018, pp. 111–116.
[9] S. Angizi, Z. He, A. Awad, and D. Fan, “MRIMA: An MRAM-based in-memory
accelerator,” IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, vol. 39, no. 5, pp. 1123–1136, 2019.
[10] S. Angizi, Z. He, and D. Fan, “PIMA-Logic: A novel processing-in-memory
architecture for highly flexible and energy-efficient logic
computation,” in Proceedings of the 55th Annual Design Automation
Conference, 2018, pp. 1–6.
[11] S. George, K. Ma, A. Aziz, X. Li, A. Khan, S. Salahuddin, M.-F. Chang,
S. Datta, J. Sampson, S. Gupta et al., “Nonvolatile memory design
based on ferroelectric FETs,” in Proceedings of the 53rd Annual Design
Automation Conference, 2016, pp. 1–6.
[12] D. Reis, M. Niemier, and X. S. Hu, “Computing in memory with
FeFETs,” in Proceedings of the International Symposium on Low Power
Electronics and Design, 2018, pp. 1–6.
[13] K. Ni, M. Jerry, J. A. Smith, and S. Datta, “A circuit compatible accurate
compact model for ferroelectric-FETs,” in 2018 IEEE Symposium on VLSI
Technology. IEEE, 2018, pp. 131–132.
[14] D. Reis, K. Ni, W. Chakraborty, X. Yin, M. Trentzsch, S. D. Dünkel,
T. Melde, J. Müller, S. Beyer, S. Datta et al., “Design and analysis of
an ultra-dense, low-leakage, and fast FeFET-based random access memory
array,” IEEE Journal on Exploratory Solid-State Computational Devices
and Circuits, vol. 5, no. 2, pp. 103–112, 2019.
[15] D. Reis, D. Gao, S. Angizi, X. Yin, D. Fan, M. Niemier, C. Zhuo,
and X. S. Hu, “Modeling and benchmarking computing-in-memory for
design space exploration,” in Proceedings of the 2020 on Great Lakes
Symposium on VLSI, ser. GLSVLSI ’20. Association for Computing
Machinery, 2020, pp. 39–44.
[16] N. Sharma, A. Marshall, J. Bird, and P. Dowben, “Verilog-A based
compact modeling of the magneto-electric FET device,” in 2017 Fifth
Berkeley Symposium on Energy Efficient Electronic Systems & Steep
Transistors Workshop (E3S). IEEE, 2017, pp. 1–3.
[17] N. Sharma, J. P. Bird, C. Binek, P. A. Dowben, D. E. Nikonov, and
A. Marshall, “Evolving magneto-electric device technologies,” Semi-
conductor Science and Technology, 2020.
[18] D. E. Nikonov and I. A. Young, “Benchmarking of beyond-CMOS
exploratory devices for logic integrated circuits,” IEEE Journal on
Exploratory Solid-State Computational Devices and Circuits, vol. 1, pp.
3–11, 2015.
[19] A. Parthasarathy and S. Rakheja, “Reversal time of jump-noise dynamics
for large nucleation,” IEEE Transactions on Magnetics, vol. 55, no. 2,
pp. 1–3, 2018.
[20] N. Sharma, A. Marshall, J. Bird, and P. Dowben, “Novel ring oscillator
design using ME-MTJ based devices,” in 2017 Fifth Berkeley Symposium
on Energy Efficient Electronic Systems & Steep Transistors Workshop
(E3S). IEEE, 2017, pp. 1–3.
[21] N. Sharma, A. Marshall, P. A. Dowben, and D. E. Nikonov, “Circuits
based on magnetoelectric transistor devices,” Mar. 26 2020, US Patent
App. 16/581,691.
[22] C. Pan and A. Naeemi, “Complementary logic implementation for
antiferromagnet field-effect transistors,” IEEE Journal on Exploratory
Solid-State Computational Devices and Circuits, vol. 4, no. 2, pp. 69–
75, 2018.
[23] N. Sharma, C. Binek, A. Marshall, J. Bird, P. Dowben, and D. Nikonov,
“Compact modeling and design of magneto-electric transistor devices
and circuits,” in 2018 31st IEEE International System-on-Chip
Conference (SOCC). IEEE, 2018, pp. 146–151.
[24] H.-J. Chuang, B. Chamlagain, M. Koehler, M. M. Perera, J. Yan,
D. Mandrus, D. Tomanek, and Z. Zhou, “Low-resistance 2d/2d ohmic
contacts: a universal approach to high-performance WSe2, MoS2, and
MoSe2 transistors,” Nano Letters, vol. 16, no. 3, pp. 1896–1902, 2016.
[25] A. Andreev, “Macroscopic magnetic fields of antiferromagnets,” Journal
of Experimental and Theoretical Physics Letters, vol. 63, no. 9, pp. 758–
762, 1996.
[26] K. D. Belashchenko, “Equilibrium magnetization at the boundary of
a magnetoelectric antiferromagnet,” Physical Review Letters, vol. 105,
no. 14, p. 147204, 2010.
[27] M. Anantram, M. S. Lundstrom, and D. E. Nikonov, “Modeling of
nanoscale devices,” Proceedings of the IEEE, vol. 96, no. 9, pp. 1511–
1550, 2008.
[28] S. Manipatruni, D. E. Nikonov, R. Ramesh, H. Li, and I. A. Young,
“Spin-orbit logic with magnetoelectric nodes: A scalable charge me-
diated nonvolatile spintronic logic,” arXiv preprint arXiv:1512.05428,
2015.
[29] H. Fang, S. Chuang, T. C. Chang, K. Takei, T. Takahashi, and A. Javey,
“High-performance single layered WSe2 p-FETs with chemically doped
contacts,” Nano Letters, vol. 12, no. 7, pp. 3788–3792, 2012.
[30] T. Kosub, M. Kopte, R. Hühne, P. Appel, B. Shields, P. Maletinsky,
R. Hübner, M. O. Liedke, J. Fassbender, O. G. Schmidt et al., “Purely
antiferromagnetic magnetoelectric random access memory,” Nature
Communications, vol. 8, no. 1, pp. 1–7, 2017.
[31] K. Toyoki, Y. Shiratsuchi, A. Kobane, C. Mitsumata, Y. Kotani, T. Nakamura,
and R. Nakatani, “Magnetoelectric switching of perpendicular exchange
bias in Pt/Co/α-Cr2O3/Pt stacked films,” Applied Physics Letters,
vol. 106, no. 16, p. 162404, 2015.
[32] A. Iyama and T. Kimura, “Magnetoelectric hysteresis loops in Cr2O3
at room temperature,” Physical Review B, vol. 87, no. 18, p. 180408,
2013.
[33] B. Razavi, “The StrongARM latch [a circuit for all seasons],” IEEE
Solid-State Circuits Magazine, vol. 7, no. 2, pp. 12–17, 2015.
[34] X. Fong, Y. Kim, K. Yogendra, D. Fan, A. Sengupta, A. Raghunathan,
and K. Roy, “Spin-transfer torque devices for logic and memory:
Prospects and perspectives,” IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, vol. 35, no. 1, pp. 1–22,
2015.
[35] S. Angizi, Z. He, D. Reis, X. S. Hu, W. Tsai, S. J. Lin, and D. Fan,
“Accelerating deep neural networks in processing-in-memory platforms:
Analog or digital approach?” in 2019 IEEE Computer Society Annual
Symposium on VLSI (ISVLSI). IEEE, 2019, pp. 197–202.
[36] S. Angizi, J. Sun, W. Zhang, and D. Fan, “GraphS: A graph processing
accelerator leveraging SOT-MRAM,” in 2019 Design, Automation & Test in
Europe Conference & Exhibition (DATE). IEEE, 2019, pp. 378–383.
[37] X. Dong, C. Xu, Y. Xie, and N. P. Jouppi, “NVSim: A circuit-level
performance, energy, and area model for emerging nonvolatile memory,”
IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, vol. 31, no. 7, pp. 994–1007, 2012.
[38] “DRAM power model.” [Online]. Available: https://www.rambus.com/energy/
[39] (2011) NCSU EDA FreePDK45. [Online]. Available:
http://www.eda.ncsu.edu/wiki/FreePDK45:Contents
[40] S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi, “CACTI
5.1,” HP Labs, Tech. Rep. HPL-2008-20, 2008.
[41] A. Patel, F. Afram, S. Chen, and K. Ghose, “MARSS: A full system
simulator for multicore x86 cpus,” in 2011 48th ACM/EDAC/IEEE
Design Automation Conference (DAC). IEEE, 2011, pp. 1050–1055.
[42] S. Fukami, T. Anekawa, C. Zhang, and H. Ohno, “A spin–orbit torque
switching scheme with collinear magnetic easy axis and current
configuration,” Nature Nanotechnology, vol. 11, no. 7, pp. 621–625, 2016.
[43] S. Punugupati, J. Narayan, and F. Hunte, “Strain induced ferromagnetism
in epitaxial Cr2O3 thin films integrated on Si (001),” Applied Physics
Letters, vol. 105, no. 13, p. 132401, 2014.
[44] A. K. Panda, A. Singh, R. Divakar, N. G. Krishna, V. Reddy, R. Thirumurugesan,
S. Murugesan, P. Parameswaran, and E. Mohandas, “Crystallographic
texture study of pulsed laser deposited Cr2O3 thin films,”
Thin Solid Films, vol. 660, pp. 328–334, 2018.
[45] S. K. Gupta, S. P. Park, N. N. Mojumder, and K. Roy, “Layout-aware
optimization of STT MRAMs,” in 2012 Design, Automation & Test in
Europe Conference & Exhibition (DATE). IEEE, 2012, pp. 1455–1458.
[46] M. R. Guthaus, J. E. Stine, S. Ataei, B. Chen, B. Wu, and M. Sarwar,
“OpenRAM: An open-source memory compiler,” in 2016 IEEE/ACM
International Conference on Computer-Aided Design (ICCAD). IEEE,
2016, pp. 1–6.
