Clockless Spin-based Look-Up Tables with Wide Read Margin by Salehi, Soheil et al.
Clockless Spin-based Look-Up Tables with Wide Read Margin
Soheil Salehi, Ramtin Zand, Ronald F. DeMara
Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL, 32816 USA
ABSTRACT
In this paper, we develop a 6-input fracturable non-volatile Clockless
LUT (C-LUT) using spin Hall effect (SHE)-based Magnetic Tunnel
Junctions (MTJs) and provide a detailed comparison between the
SHE-MTJ-based C-LUT and the Spin Transfer Torque (STT)-MTJ-based
C-LUT. The proposed C-LUT offers an attractive alternative for
implementing combinational logic as well as sequential logic versus
previous spin-based LUT designs in the literature. Foremost, C-LUT
eliminates the sense amplifier typically employed, by using a
differential-polarity dual-MTJ design as opposed to a static reference
resistance MTJ. This realizes a much wider read margin, and Monte Carlo
simulation of the proposed fracturable C-LUT indicates no read or write
errors in the presence of a variety of process variation scenarios
involving MOS transistors as well as MTJs. Additionally, simulation
results indicate that the proposed C-LUT reduces the standby power
dissipation by 5.4-fold compared to the SRAM-based LUT. Furthermore,
the proposed SHE-MTJ-based C-LUT reduces the area by 1.3-fold and
2-fold compared to the SRAM-based LUT and the STT-MTJ-based C-LUT,
respectively.
CCS CONCEPTS
• Hardware → Spintronics and magnetic technologies; Emerging
architectures; Asynchronous circuits; Combinational circuits;
Programmable logic elements; Process, voltage and temperature
variations;
KEYWORDS
Reconfigurable Logic, Fracturable LUT, Magnetic Tunnel Junction,
Spin-based Memory Cell, Spin Hall Effect, Spin Transfer Torque.
1 INTRODUCTION
Flexibility and runtime adaptability are two of the main motivations
for the wide adoption of reconfigurable fabrics. Among the most
commonly used reconfigurable fabrics, Field Programmable Gate Arrays
(FPGAs) have been the primary focus because their flexibility allows
realization of logic elements at medium and fine granularities while
incurring low non-recurring engineering costs and enabling rapid
deployment to market. Additionally, FPGAs have been researched as a
promising platform that can be utilized effectively to increase
reliability under process-voltage-temperature variation [1]. The main
challenge of static random access memory (SRAM)-based FPGAs is the
increased area and power consumption required to achieve a flexible
design. The main components of FPGAs are Look-Up Tables (LUTs) and
switch boxes, which consist mainly of SRAM cells [6]. However,
SRAM-based LUTs incur limitations such as high static power,
volatility, and low logic density.
Innovations using emerging devices within FPGAs have been sought to
overcome the limitations of SRAM-based FPGAs. High-endurance
non-volatile spin-based LUTs have been studied in the literature as
promising alternatives to SRAM-based LUTs, Flash-based LUTs, and other
state-of-the-art emerging LUTs such as resistive random access memory
(RRAM)-based LUTs and phase change memory (PCM)-based LUTs
[2, 4, 10–12, 14]. Spin-based devices offer non-volatility, near-zero
static power, high endurance, and high integration density [9, 13]. The
spin-based LUTs presented in the literature [2, 4, 10–12, 14] require
separate read and write operations as well as a clock, which makes
these LUTs suitable candidates for sequential logic operations.
However, the main challenge that has not been addressed in the
literature is providing a spin-based LUT design for combinational
logic operation without the need for a clock. Additionally, the
spin-based LUTs proposed in the literature fail to maintain a wide
sense margin and high reliability without incurring significant area
and power dissipation overheads [2, 4, 10–12, 14]. In this paper, to
address these challenges, we develop a clockless 6-input fracturable
non-volatile Combinational LUT (C-LUT) with wide read margin using
spin Hall effect (SHE)-based Magnetic Tunnel Junctions (MTJs) and
provide a detailed comparison between the SHE-MRAM and Spin Transfer
Torque (STT)-MRAM C-LUTs. Additionally, we provide a detailed analysis
of the reliability of our proposed C-LUT in the presence of Process
Variation (PV).
2 REALIZING FRACTURABLE 6-INPUT
CLOCKLESS LUT
The primary goal of using LUTs in reconfigurable fabrics is to
implement combinational logic. Generally, $M$-input Boolean functions
are implemented using LUTs, which can be regarded as a memory with
$2^M$ memory cells. The inputs are applied through a select tree, which
is constructed with Pass Transistors and Transmission Gates (TGs) [15].
Most contemporary FPGAs utilize fracturable 6-input LUTs in order to
implement either one 6-input Boolean function or two 5-input Boolean
functions [7]. Fig. 1(a) depicts our proposed 6-input fracturable
SHE-MRAM C-LUT and Fig. 1(b) illustrates the 6-input fracturable
STT-MRAM C-LUT. In Fig. 1(a) and Fig. 1(b), red indicates the write
path and black indicates the read path. When the WWL and
$\overline{WWL}$ signals are asserted, the Write TGs of each memory
cell, TGW1 and TGW2, turn on, and the Bit Lines, $BL_i$, and Source
Lines, $SL_i$, are used to write into both MTJs in each memory cell,
$MTJ_i$ and $\overline{MTJ_i}$, so that they hold complementary values.
If $MTJ_i$ is in the parallel (P) state, then $\overline{MTJ_i}$ will
be in the anti-parallel (AP) state, and vice versa. This results in a
wide read margin during the read operation.
After the write operation terminates, the RWL and $\overline{RWL}$
signals are enabled in order to read the data stored in the MTJs, which
activates the Read TG of each memory cell, TGR. During the read
operation, the PR and NR transistors are turned on when RWL and
$\overline{RWL}$ are asserted, which provides the read path from VDD to
GND. The source of PR, which is a PMOS transistor, is connected to VDD
to provide a strong one, and the source of NR, which is an NMOS
transistor, is connected to GND to provide a strong
arXiv:1903.00978v2 [cs.ET] 13 Mar 2019
(a)
(b)
Figure 1: The circuit-level diagram of the proposed 6-input
fracturable Combinational Look-Up Table (C-LUT) using (a)
SHE-MTJ devices and (b) STT-MTJ devices.
zero. The resistance difference between $MTJ_i$ and $\overline{MTJ_i}$
forms a voltage divider, and the divided voltage can be observed at the
$D_i$ nodes shown in Fig. 1(a) and Fig. 1(b). Depending on the select
tree input signals, shown as A, B, C, D, E, and F in Fig. 1, the
voltage on the selected $D_i$ node is amplified by two inverters to
generate the required output. Since the values stored in the $MTJ_i$
and $\overline{MTJ_i}$ devices are complementary, using one MTJ device
to retain the data value and the other as the reference value results
in a wide read margin from AP to P [8], which we leverage herein to
increase the reliability of the read operation.
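As a rough illustration of why the differential cell widens the read margin, the following Python sketch treats the read path as an ideal resistive divider. The resistance values, the 1 V supply, the placement of $MTJ_i$ on the VDD side, and the neglect of the TG and PR/NR on-resistances are all assumptions made for illustration; none of them are taken from the device models used in this paper.

```python
def divider_voltage(r_top, r_bottom, vdd=1.0):
    """Voltage at the midpoint of two series resistors between VDD and GND."""
    return vdd * r_bottom / (r_top + r_bottom)


# Hypothetical MTJ resistances in ohms (assumed, not values from the paper).
R_P = 3_000.0   # parallel, low-resistance state
R_AP = 6_000.0  # anti-parallel, high-resistance state

# Differential cell: the two MTJs always hold complementary states.
v_d_high = divider_voltage(r_top=R_P, r_bottom=R_AP)  # MTJ_i = P, complement = AP
v_d_low = divider_voltage(r_top=R_AP, r_bottom=R_P)   # MTJ_i = AP, complement = P
differential_swing = v_d_high - v_d_low

# Single data MTJ read against a static reference midway between R_P and R_AP.
R_REF = (R_P + R_AP) / 2
v_ref_high = divider_voltage(r_top=R_P, r_bottom=R_REF)
v_ref_low = divider_voltage(r_top=R_AP, r_bottom=R_REF)
reference_swing = v_ref_high - v_ref_low

print(f"differential swing:     {differential_swing:.3f} V")
print(f"static-reference swing: {reference_swing:.3f} V")
```

Under these assumed values the two stored states sit on opposite sides of VDD/2 in the differential cell, and the swing at the divided node is roughly twice that of the static-reference arrangement.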
In the proposed C-LUT design there is no need for an external
clock or a large sense amplifier circuit. Furthermore, the proposed
Table 1: Comparison between SRAM-LUT and MRAM-LUT.

                          Power (µW)                 Delay
                          Read    Write   Standby    Read    Write
SRAM LUT     Logic “0”    2.58    28.4    1.5        30 ps   20 ps
             Logic “1”    7.55    27.7    1.85       30 ps   20 ps
             Average      5.06    25.08   1.67       30 ps   20 ps
MRAM C-LUT   Logic “0”    14.38   81.16   0.31       20 ps   2 ns
             Logic “1”    19.91   81.25   0.31       60 ps   2 ns
             Average      17.15   81.18   0.31       40 ps   2 ns
Table 2: Area and Energy Consumption comparison between SRAM LUT and
MRAM C-LUT.

Features                          SRAM LUT    MRAM C-LUT
Device Count     Storage Cells    384 MOS     128 MTJ
                 Write/Control    384 MOS     256×4 + 256 MOS (1)
                 Read             261 MOS     267 MOS
                 Total            1029 MOS    1547 MOS + 128 MTJ
Average Energy   Read             2.53 fJ     8.58 fJ
Consumption      Write            14 fJ       162.36 fJ

(1) Write transistors are 4× larger than minimum feature size.
fracturable C-LUT can operate as a single 6-input LUT or as two
5-input LUTs. The operation mode of the proposed LUT is controlled
using the S5 and S6 signals. If S5 is enabled and S6 is disabled, the
C-LUT operates as two 5-input LUTs and its outputs are OUT0 and OUT2.
On the other hand, if S5 is disabled and S6 is enabled, the C-LUT
operates as a 6-input LUT and OUT1 is its output. The proposed
fracturable C-LUT provides significantly higher functional flexibility
at the expense of slightly higher power consumption, as studied in
Section 3.
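The select tree and the two operating modes can be sketched behaviorally in Python as follows. This is a functional model only, not the circuit of Fig. 1: the function names, the choice of A as the least significant select bit, and the mapping of OUT0 and OUT2 to the lower and upper 32-cell halves are assumptions made for illustration.

```python
def select_tree(cells, select_bits):
    """Reduce stored bits through successive levels of 2:1 multiplexers.

    select_bits[0] drives the level nearest the memory cells.
    """
    level = list(cells)
    for bit in select_bits:
        # Each mux forwards the odd (bit=1) or even (bit=0) entry of a pair.
        level = [level[i + bit] for i in range(0, len(level), 2)]
    return level


def fracturable_lut(cells, a, b, c, d, e, f, s5, s6):
    """Return the active outputs of a 64-cell fracturable LUT as a dict."""
    if len(cells) != 64:
        raise ValueError("a 6-input LUT needs 2**6 = 64 stored bits")
    if s5 == s6:
        raise ValueError("exactly one of S5 and S6 must be enabled")
    # Five mux levels leave two candidates, one per 32-cell half.
    lower, upper = select_tree(cells, [a, b, c, d, e])
    if s5:
        return {"OUT0": lower, "OUT2": upper}  # two 5-input functions
    return {"OUT1": upper if f else lower}     # one 6-input function


# Configuration bits for a 6-input OR: only the all-zero address stores 0.
or6_cells = [0] + [1] * 63

print(fracturable_lut(or6_cells, 0, 0, 0, 0, 0, 0, s5=0, s6=1))
print(fracturable_lut(or6_cells, 1, 1, 1, 1, 1, 1, s5=0, s6=1))
print(fracturable_lut(or6_cells, 0, 0, 0, 0, 0, 0, s5=1, s6=0))
```

In the 5-input mode the F input is ignored in this model, and each half of the stored bits acts as an independent 32-entry truth table sharing inputs A through E.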
3 SIMULATION FRAMEWORK, RESULTS,
AND ANALYSIS
Herein, we use the HSPICE circuit simulator to validate the
functionality of the proposed C-LUT using 45 nm CMOS technology and
the STT-MRAM model developed by Kim et al. [5]. Fig. 2(a) and
Fig. 2(b) show the transient response of the C-LUT implementing a
6-input OR operation for the ABCDEF = “000000” and ABCDEF = “111111”
input signals, respectively. In order to generate the current required
for a write delay of less than 2 ns, the write transistors must be
enlarged 4-fold. As shown, the HSPICE simulations verify the correct
functionality of our proposed C-LUT.
Table 1 lists comparison results between the SRAM-LUT and the proposed
C-LUT in terms of power consumption and delay. The results show more
than 80% standby power reduction at the cost of increased write power,
which can be tolerated because write operations occur infrequently in
LUTs. There are three energy profiles in FPGA LUT circuits: (1) read
energy consumed during normal FPGA operation, (2) standby energy for
the LUTs that are not on the active datapath, which can constitute a
significant portion of the FPGA fabric, and (3) write energy that is
consumed during the LUTs’ configuration operation, which occurs
(a)
(b)
Figure 2: Transient response of C-LUT implementing 6-
input OR operation for (a) ABCDEF = “000000” input signal,
and (b) ABCDEF = “111111” input signal.
rarely. Table 2 provides an area and energy consumption comparison
between the SRAM-LUT and the C-LUT. As listed, the structure of a
6-input MRAM-based C-LUT requires 1,547 MOS transistors plus 128 MTJs,
which can be fabricated on top of the CMOS transistors and thus incur
low area overhead, while the conventional 6-input SRAM-LUT includes
1,029 MOS transistors. This results in an area overhead of roughly 50%
for the C-LUT compared to the SRAM-LUT, which is primarily induced by
the write circuits. Thus, innovations are
Table 3: Iso-Delay Area and Write Energy Consumption comparison
between STT-MRAM and SHE-MRAM C-LUTs.

                                    C-LUT
Features                            STT-MRAM                SHE-MRAM
Device Count      Storage Cells     128 MTJ                 128 MTJ
                  Write/Control     (256×4)+256 MOS (1)     256+256 MOS (2)
                  Read              267 MOS                 267 MOS
                  Total             1547 MOS + 128 MTJ      779 MOS + 128 MTJ
Average Write Energy per Cell       162.3 fJ                175.5 fJ

(1) Write transistors are 4× larger than minimum feature size.
(2) Write transistors with minimum feature size are used.
sought to reduce the area and energy consumption of the MRAM cell’s
write circuit to mitigate these issues. Recently, SHE-MRAM cells have
attracted considerable attention as an alternative to conventional
STT-MRAMs. Herein, we have used the SHE-MRAM device model proposed by
Camsari et al. [3] to realize a circuit-level simulation of our
SHE-MRAM C-LUT. The results show that a TG-based write circuit with
minimum-sized MOS transistors can produce the write current amplitude
required for switching the SHE-MRAM’s state in less than 2 ns. Thus,
Table 3 provides an iso-delay comparison between the STT-MRAM and
SHE-MRAM C-LUTs in terms of device count and write energy. As listed,
the SHE-MRAM C-LUT can achieve more than 49% area reduction while
realizing comparable write energy consumption. Moreover, the SHE-MRAM
C-LUT achieves at least 24% device count reduction compared to the
SRAM-LUT.
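The headline ratios quoted above follow directly from the entries of Tables 1 to 3, as the short Python check below illustrates. It assumes, as the "256×4" entries suggest, that each 4×-sized write transistor is counted as four minimum-sized devices and that area is approximated by MOS device count; the variable names are illustrative only.

```python
# MOS device counts taken from Tables 2 and 3.
sram_total = 384 + 384 + 261        # storage + write/control + read
stt_total = (256 * 4 + 256) + 267   # 4x-sized write devices counted four times
she_total = (256 + 256) + 267       # minimum-sized write devices

stt_overhead_vs_sram = stt_total / sram_total - 1    # about 0.50
she_reduction_vs_stt = 1 - she_total / stt_total     # about 0.496
she_reduction_vs_sram = 1 - she_total / sram_total   # about 0.243

# Average standby power from Table 1, in microwatts.
sram_standby_uw = 1.67
mram_standby_uw = 0.31
standby_fold = sram_standby_uw / mram_standby_uw           # about 5.4
standby_reduction = 1 - mram_standby_uw / sram_standby_uw  # about 0.81

# Average write energy as average write power times the 2 ns write time
# (microwatts times nanoseconds gives femtojoules).
mram_write_energy_fj = 81.18 * 2.0

print(f"STT C-LUT overhead vs SRAM-LUT:  {stt_overhead_vs_sram:.1%}")
print(f"SHE C-LUT reduction vs STT:      {she_reduction_vs_stt:.1%}")
print(f"SHE C-LUT reduction vs SRAM-LUT: {she_reduction_vs_sram:.1%}")
print(f"Standby power reduction:         {standby_fold:.2f}-fold")
print(f"MRAM C-LUT write energy:         {mram_write_energy_fj:.2f} fJ")
```

The computed write energy agrees with the 162.36 fJ entry of Table 2, and the device-count differences of 250 and 768 transistors match the totals of 1,029, 1,547, and 779.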
Furthermore, to analyze the reliability of the read and write
operations of the proposed C-LUT, Monte Carlo (MC) simulation is
performed to cover a wide range of PV scenarios that may occur in the
fabricated device. The MC simulation is performed with 1,000 instances
considering the effects of PV on the CMOS peripheral circuit and the
MTJs. In particular, a variation of 10% for the MTJs’ dimensions along
with a 10% variation on the threshold voltage and a 1% variation on the
transistors’ dimensions are assessed. Fig. 3(a) depicts the
distribution of the switching times $T_{P-AP}$ and $T_{AP-P}$,
Fig. 3(b) illustrates the distribution of MTJ resistances in the
$R_{AP}$ and $R_P$ states, and Fig. 3(c) shows the distribution of the
read, $I_{READ}$, and write, $I_{Write}$, currents for the 1,000 MC
instances. According to the MC simulation results, the C-LUT provides
reliable write performance, resulting in less than 0.001% write errors,
with all 1,000 MC instances error-free. In particular, the results of
the MC simulation show that the switching time for P-AP is 1.63 ns on
average and the switching time for AP-P is 1.13 ns on average, both of
which fall under the 2 ns duration of the write operation, as depicted
in Fig. 3(a). Additionally, since the states of the MTJs are
differential, they provide a wide read margin, and as a result there
are less than 0.001% read errors caused by PV based on the 1,000
error-free MC simulation results. Furthermore, our proposed C-LUT does
not suffer from read disturbance because the read current is small
compared to the write current, as shown in Fig. 3(c). According to our
MC simulation results, the read current is 38.21 µA on average, which
is significantly lower than the write current of 71.13 µA on average.
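The flavor of such a Monte Carlo read-margin check can be sketched in a few lines of Python. This is a heavily simplified stand-in for the HSPICE analysis, not a reproduction of it: the nominal resistances, the 1 V supply, the VDD/2 decision threshold, the Gaussian distribution, and the reading of "10% variation" as a 3σ bound are all assumptions, and only the MTJ dimension variation is modeled (transistor variation is ignored).

```python
import random


def sample_resistance(nominal_ohms, sigma, rng):
    """Scale a nominal MTJ resistance by 1/area for randomly varied dimensions."""
    length = rng.gauss(1.0, sigma)  # relative length, nominal 1.0
    width = rng.gauss(1.0, sigma)   # relative width, nominal 1.0
    return nominal_ohms / (length * width)


def count_read_errors(instances=1000, sigma=0.10 / 3, seed=0):
    """Count instances whose divided read voltage falls on the wrong side."""
    rng = random.Random(seed)
    r_p_nominal = 3_000.0   # hypothetical parallel-state resistance (ohms)
    r_ap_nominal = 6_000.0  # hypothetical anti-parallel-state resistance (ohms)
    vdd = 1.0               # assumed supply voltage
    errors = 0
    for _ in range(instances):
        r_p = sample_resistance(r_p_nominal, sigma, rng)
        r_ap = sample_resistance(r_ap_nominal, sigma, rng)
        # Assumed stored value: P device on the VDD side, AP device on the GND side.
        v_d = vdd * r_ap / (r_p + r_ap)
        if v_d <= vdd / 2:  # assumed inverter switching threshold
            errors += 1
    return errors


print("read errors in 1,000 instances:", count_read_errors())
```

Because the two devices are complementary, a read error in this model requires the varied AP resistance to drop below the varied P resistance, which needs a far larger excursion than the assumed spread allows.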
(a) (b) (c)
Figure 3: Simulation results of 1,000 MC instances for (a) $T_{P-AP}$ and $T_{AP-P}$ switching times, (b) $R_{AP}$ and $R_P$ resistance states,
and (c) read, $I_{READ}$, and write, $I_{Write}$, currents.
4 CONCLUSION
To overcome conventional SRAM-LUT limitations such as high static
power, volatility, and low logic density, we have proposed a novel LUT
design using spin-based devices. The proposed C-LUT is a clockless
design and a suitable candidate for combinational logic, which can
also be combined with a flip-flop circuit to implement sequential
logic. According to our simulation results, the standby power
dissipation of the proposed C-LUT is 0.31 µW, which is a 5.4-fold
reduction compared to the SRAM-based LUT. Moreover, the structure of
the proposed SHE-MRAM-based C-LUT includes 250 and 768 fewer
transistors than the SRAM-based LUT and the STT-MRAM-based C-LUT,
respectively. Additionally, according to the process variation
reliability analysis, the C-LUT circuit exhibits a < 0.001% error rate
for read and write operations in the presence of variations spanning
both transistors and MTJs.
ACKNOWLEDGEMENT
This work was supported in part by the National Science Foundation
(NSF) through ECCS-1810256.
REFERENCES
[1] Rawad Al-Haddad, Rashad S. Oreifej, Ramtin Zand, Abdel Ejnioui, and Ronald F.
DeMara. 2015. Adaptive Mitigation of Radiation-Induced Errors and TDDB in
Reconfigurable Logic Fabrics. In 2015 IEEE 24th North Atlantic Test Workshop.
IEEE, 23–32. https://doi.org/10.1109/NATW.2015.14
[2] Aliyar Attaran, Tyler David Sheaves, Praveen Kumar Mugula, and Hamid
Mahmoodi. 2018. Static Design of Spin Transfer Torques Magnetic Look Up
Tables for ASIC Designs. In Proceedings of the 2018 on Great Lakes Symposium
on VLSI - GLSVLSI ’18. ACM Press, New York, New York, USA, 507–510.
https://doi.org/10.1145/3194554.3194651
[3] Kerem Yunus Camsari, Samiran Ganguly, and Supriyo Datta. 2015. Modular
approach to spintronics. Scientific Reports 5, 1 (9 2015), 10571.
https://doi.org/10.1038/srep10571
[4] Kejie Huang, Yajun Ha, Rong Zhao, Akash Kumar, and Yong Lian. 2014. A
Low Active Leakage and High Reliability Phase Change Memory (PCM) Based
Non-Volatile FPGA Storage Element. IEEE Transactions on Circuits and Systems
I: Regular Papers 61, 9 (9 2014), 2605–2613.
https://doi.org/10.1109/TCSI.2014.2312499
[5] Jongyeon Kim, An Chen, Behtash Behin-Aein, Saurabh Kumar, Jian-Ping Wang,
and Chris H. Kim. 2015. A technology-agnostic MTJ SPICE model with
user-defined dimensions for STT-MRAM scalability studies. In 2015 IEEE Custom
Integrated Circuits Conference (CICC). IEEE, 1–4.
https://doi.org/10.1109/CICC.2015.7338407
[6] Ian Kuon, Russell Tessier, and Jonathan Rose. 2008. FPGA Architecture: Survey and
Challenges. Foundations and Trends in Electronic Design Automation 2, 2 (2008),
135–253. https://doi.org/10.1561/1000000005
[7] Andrew Percey. 2007. Advantages of the Virtex-5 FPGA 6-Input LUT Architecture.
(2007). www.BDTIC.com/XILINX
[8] Soheil Salehi and Ronald F. DeMara. 2018. BGIM: Bit-Grained Instant-on Memory
Cell for Sleep Power Critical Mobile Applications. In 2018 IEEE 36th International
Conference on Computer Design (ICCD). IEEE, 342–345.
https://doi.org/10.1109/ICCD.2018.00058
[9] Soheil Salehi, Deliang Fan, and Ronald F. DeMara. 2017. Survey of STT-MRAM
Cell Design Strategies: Taxonomy and Sense Amplifier Tradeoffs for Resiliency.
ACM Journal on Emerging Technologies in Computing Systems 13, 3 (2017), 1–16.
https://doi.org/10.1145/2997650
[10] Daisuke Suzuki and Takahiro Hanyu. 2019. Design of a highly reliable,
high-speed MTJ-based lookup table circuit using fractured logic-in-memory structure.
Japanese Journal of Applied Physics 58, SB (2 2019), SBBB10.
https://doi.org/10.7567/1347-4065/aafd98
[11] Daisuke Suzuki, Yuhui Lin, Masanori Natsui, and Takahiro Hanyu. 2013. A
71%-Area-Reduced Six-Input Nonvolatile Lookup-Table Circuit Using a
Three-Terminal Magnetic-Tunnel-Junction-Based Single-Ended Structure. Japanese
Journal of Applied Physics 52, 4S (4 2013), 04CM04.
https://doi.org/10.7567/JJAP.52.04CM04
[12] Xifan Tang, Gain Kim, Pierre-Emmanuel Gaillardon, and Giovanni De Micheli.
2016. A Study on the Programming Structures for RRAM-Based FPGA Architectures.
IEEE Transactions on Circuits and Systems I: Regular Papers 63, 4 (4 2016),
503–516. https://doi.org/10.1109/TCSI.2016.2528079
[13] Hiroaki Yoda, Hideyuki Sugiyama, Tomoaki Inokuchi, Yuushi Kato, Yuichi
Ohsawa, Keiko Abe, Naoharu Shimomura, Yoshiaki Saito, Satoshi Shirotori,
Katsuhiko Koui, Buyandalai Altansargai, Souichi Oikawa, Mariko Shimizu, Mizue
Ishikawa, Kazutaka Ikegami, Yuuzo Kamiguchi, Shinobu Fujita, and Atsushi
Kurobe. 2017. High-Speed Voltage-Control Spintronics Memory (High-Speed
VoCSM). In 2017 IEEE International Memory Workshop (IMW). IEEE, 1–4.
https://doi.org/10.1109/IMW.2017.7939085
[14] Ramtin Zand and Ronald F. DeMara. 2017. Radiation-hardened MRAM-based
LUT for non-volatile FPGA soft error mitigation with multi-node upset tolerance.
Journal of Physics D: Applied Physics 50, 50 (12 2017), 505002.
https://doi.org/10.1088/1361-6463/aa9781
[15] Ramtin Zand, Arman Roohi, Soheil Salehi, and Ronald F. DeMara. 2016. Scalable
Adaptive Spintronic Reconfigurable Logic Using Area-Matched MTJ Design.
IEEE Transactions on Circuits and Systems II: Express Briefs 63, 7 (2016), 678–682.
https://doi.org/10.1109/TCSII.2016.2532099
