CrossStack: A 3-D Reconfigurable RRAM Crossbar Inference Engine by Eshraghian, Jason K. et al.
Jason K. Eshraghian1, Kyoungrok Cho2 and Sung Mo Kang3
1School of Electrical, Electronic and Computer Engineering, University of Michigan, Ann Arbor, MI 48109 USA
2College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 362763, South Korea
3Jack Baskin School of Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064 USA
Abstract—Deep neural network inference accelerators are
rapidly growing in importance as we turn to massively par-
allelized processing beyond GPUs and ASICs. The dominant
operation in feedforward inference is the multiply-and-accumulate
process, where each column in a crossbar generates the current
response of a single neuron. As a result, memristor crossbar
arrays parallelize inference and image processing tasks very
efficiently. In this brief, we present a 3-D active memristor
crossbar array ‘CrossStack’, which adopts stacked pairs of
Al/TiO2/TiO2−x/Al devices with common middle electrodes. By
designing CMOS-memristor hybrid cells used in the layout of
the array, CrossStack can operate in one of two user-configurable
modes as a reconfigurable inference engine: 1) expansion mode
and 2) deep-net mode. In expansion mode, the resolution of the
network is doubled by increasing the number of inputs for a
given chip area, reducing IR drop by 22%. In deep-net mode,
inference speed per-10-bit convolution is improved by 29% by
simultaneously using one TiO2/TiO2−x layer for read processes,
and the other for write processes. We experimentally verify both
modes on our 10 × 10 × 2 array.
Index Terms—deep learning, in-memory computing, memris-
tors, neural network, RRAM.
I. INTRODUCTION
Increasing the sizes of artificial neural networks (ANNs)
has been the most common response to the copious amount
of data being continuously generated. Where training sets are
in excess of billions of inputs processed through hundreds
of millions of parameters in a neural network [1], new ways
to speed up the processing of all this information must be
developed. Since 2012, the training runtime of neural networks
has doubled every 3–4 months. It is equally important to
develop hardware that is not only optimized for running very
large scale networks, but is adaptable to the unceasing wave
of emerging ANN topologies.
Memristors are now ubiquitous in neuromorphic computing
literature due to their long retention [2]–[4], excellent scala-
bility [5], [6], fast read and write speeds [7], [8], compatibility
with CMOS technology [9]–[12], and precise weight updates
[13], [14]. The development of dense integrated structures with
3-D stacked crossbar arrays enables an increase in the through-
put for a given chip area, but thus far, target applications of
3-D RRAM have mostly been limited to digital memory [16]–
[23].
It can be difficult for ASIC designs to keep pace with the
latest developments in machine learning due to the lag time
between algorithm development and the full IC design cycle.
With popular machine learning methods in a rapidly evolving
state, reconfigurability is of paramount importance to ensure
hardware does not become obsolete the moment new network
architectures and topologies are introduced. In response to
this, we present a reconfigurable stacked pair of memristor
crossbars dubbed ‘CrossStack’ that can be operated in one
of two modes: 1) expansion mode, and 2) deep-net mode. In
expansion mode, the resolution of the network is doubled by
increasing the number of inputs for a given chip area, thus
reducing IR drop by 22% relative to an equivalent array. In deep-net
mode, inference speed per-10-bit convolution is improved by
29% by simultaneously using one array for read processes, and
the other for write processes. This brief will demonstrate how
to selectively isolate and couple the two layers using CMOS
cell design. We experimentally verify this on our in-house
fabricated crossbar stack, using separately controlled CMOS
circuitry in the SK Hynix 180-nm process.
II. MATRIX-VECTOR MULTIPLICATION
To perform analog-domain multiply-and-accumulate (MAC)
using RRAM arrays, a voltage vector V_i is applied at the
input and multiplied by a conductance matrix G to generate a
current vector i = V_i G, in accordance with Ohm’s Law
and Kirchhoff’s Current Law. On a pre-trained network, the
conductance of each memristor is programmed prior to read-out.
For further detail on MAC on a crossbar, we recommend
referring to [26], [27]. In a conventional crossbar, where the
number of parameters exceeds the memory resources available,
RRAM cells must be reprogrammed while the data flow of the
activations is stalled. CrossStack avoids possible stalling that
may occur by adding the option to pipeline the read and write
processes simultaneously across the two layers. How this is
achieved will be described in the following section.
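As an illustration of the relation i = V_i G, the following is a minimal Python sketch of an ideal crossbar MAC. It assumes zero wire resistance and no device variation; the function name `crossbar_mac` and the example voltages are hypothetical, and the two conductance values are simply the reciprocals of the 10 kΩ and 100 kΩ static resistances reported for our devices.

```python
# Minimal sketch of an ideal crossbar MAC, i = V_i G.
# Assumptions: no wire resistance, no device variation, ideal read-out.

def crossbar_mac(voltages, conductances):
    """Return the m column currents for n row voltages and an n x m conductance matrix."""
    n = len(conductances)
    if len(voltages) != n:
        raise ValueError("need one input voltage per crossbar row")
    m = len(conductances[0])
    # Ohm's law gives each cell current V_k * G[k][j];
    # Kirchhoff's current law sums the cell currents along column j.
    return [sum(voltages[k] * conductances[k][j] for k in range(n))
            for j in range(m)]

# Illustrative values only: set and reset conductances from 10 kOhm and 100 kOhm.
G_SET, G_RESET = 1 / 10e3, 1 / 100e3
G = [[G_SET, G_RESET],
     [G_RESET, G_SET],
     [G_SET, G_SET]]
V = [0.5, 0.0, 0.25]  # hypothetical read voltages in volts

currents = crossbar_mac(V, G)
print(currents)  # column currents in amperes
```

Each output entry corresponds to the current response of a single neuron (one column).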
III. CIRCUIT OPERATION MODES
A simplified structure of CrossStack is depicted in Fig. 1(a),
and the memristor-CMOS cell schematic is given in Fig. 1(b).
CrossStack may operate in one of two modes, which are
controlled by an active-high read-enable signal RE.
A. Expansion Mode
Fig. 1. (a) Simplified 3-D memristor crossbar array in expansion mode with
cumulative current through the shared electrode. (b) Memristor-CMOS cell. (c)
Current flow in read mode when read-enable is set high. (d) Current flow in
write mode when read-enable is set low. (e) A stacked pair of cells during the
read cycle in expansion mode. (f) A stacked pair of cells in deep-net mode.
Expansion mode enables access to a shared column line
from both the upper and lower crossbar arrays. The
stacking of memristors doubles the number of possible inputs
and weights to each neuron for a given length of column wire
when compared to conventional crossbars, as illustrated in
Fig. 1(a). This can be formalized by the following equation:
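\begin{equation}
\begin{bmatrix} i_1 & i_2 & \cdots & i_m \end{bmatrix}
=
\begin{bmatrix} V_{i,1} & V_{i,2} & \cdots & V_{i,n} \end{bmatrix}
\begin{bmatrix}
G_{1,1} & G_{1,2} & \cdots & G_{1,m} \\
\vdots & \vdots & \ddots & \vdots \\
G_{n,1} & G_{n,2} & \cdots & G_{n,m}
\end{bmatrix}
\tag{1}
\end{equation}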
where m is the number of columns in the crossbar and n is
the number of rows. In expansion mode, n is double that of a
2-D array, as there are rows both above and below the shared
column contributing to output current.
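The doubling of n in (1) can be sketched as follows: summing the currents of the upper and lower layers on the shared column is equivalent to one MAC over a conductance matrix with twice as many rows. This is a minimal sketch under ideal assumptions (no wire resistance, ideal switches); the function names and all numerical values are hypothetical.

```python
# Sketch: expansion mode as a single MAC with twice the number of rows.
# Assumptions: ideal devices and wires; all values are illustrative.

def column_currents(voltages, conductances):
    """Ideal MAC: column j collects sum_k V[k] * G[k][j]."""
    n, m = len(conductances), len(conductances[0])
    return [sum(voltages[k] * conductances[k][j] for k in range(n))
            for j in range(m)]

def expansion_mode_currents(v_top, g_top, v_bottom, g_bottom):
    """Both layers drive the shared column, so their currents add."""
    top = column_currents(v_top, g_top)
    bottom = column_currents(v_bottom, g_bottom)
    return [a + b for a, b in zip(top, bottom)]

g_top = [[1e-4, 1e-5], [1e-5, 1e-4]]     # upper-layer conductances (S)
g_bottom = [[1e-4, 1e-4], [1e-5, 1e-5]]  # lower-layer conductances (S)
v_top, v_bottom = [0.5, 0.25], [0.1, 0.4]  # row voltages (V)

shared = expansion_mode_currents(v_top, g_top, v_bottom, g_bottom)
# Equivalent view: concatenate the rows of both layers into one 2n x m matrix.
stacked = column_currents(v_top + v_bottom, g_top + g_bottom)
print(shared, stacked)
```

The two results agree up to floating-point rounding, which is the sense in which n in (1) is double that of a 2-D array.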
To activate a cell in expansion mode, the read-enable signal
of all cells must be identical. The current pathway from input
to output of a single cell is depicted in Fig. 1(c), and a pair
of stacked cells is shown in Fig. 1(e). To generate a read-out
current at each shared column, RE must be set high (in
our case, V ≥ VTh = 0.4 V; described in further detail
in our experimental results). If transistors N1 and N2 are
treated as switches, then N2 would be off and N1 would be
on. Therefore, a current pathway is formed from both upper
and lower crossbar arrays to the column line. To program
a memristor, RE is set low as in Fig. 1(d). This causes
N2 to switch on and N1 off, which forms a pathway from
the memristor to ground and prevents current from flowing
to the shared column. Therefore, the two crossbars can be
programmed independently of one another by isolating them.
The transistors are sized to ensure a negligible leakage current,
and to sustain a sufficiently low ON resistance in comparison
to the memristance while it is operating in the linear region
such that it behaves as an ideal switch.
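The switching behaviour described above can be summarized in a small behavioural model. This is a sketch only: it treats N1 and N2 as ideal switches (zero ON resistance, zero leakage) and the memristor as a fixed resistance, and the function name `cell_currents` is hypothetical.

```python
# Behavioural sketch of one memristor-CMOS cell with N1 and N2 as ideal switches.
# Assumptions: zero ON resistance, zero leakage, fixed memristance during the step.

V_TH = 0.4  # transistor threshold voltage in volts

def cell_currents(v_in, memristance, re_voltage, v_th=V_TH):
    """Return (column_current, ground_current) in amperes for a single cell."""
    current = v_in / memristance
    if re_voltage >= v_th:
        # RE high: N1 on, N2 off, so the cell current flows to the shared column.
        return current, 0.0
    # RE low: N2 on, N1 off, so the memristor is tied to ground for programming
    # and contributes nothing to the shared column.
    return 0.0, current

read = cell_currents(0.5, 10e3, re_voltage=1.8)   # read at 0.5 V, RE high
write = cell_currents(1.2, 10e3, re_voltage=0.0)  # write at 1.2 V, RE low
print(read, write)
```

In expansion mode every cell receives the same RE value, so either all cells route current to the column or all are isolated from it.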
B. Deep-net Mode
Deep-net mode ensures that the two arrays are isolated
from one another at all times. Isolation biasing enables each
layer to operate independently. In contrast to expansion mode,
the two arrays must have complementary RE signals, as
depicted in Fig. 1(f). This means that while one array generates
a read-out current, the memristors of the other array are being
programmed (in write mode). As described above, when RE
is low the cell does not contribute to the read-out current, and
the input voltage V_i for the write layer is applied such that
the conductances written to the memristors correspond to the
weights of the next hidden layer in the neural network. By the
time the analog output current has been digitized as a voltage,
the write layer has already been programmed, so there is no
need to buffer the current or store it in memory, as is required
by most other pipelines [27], [29], [30]. The process then
repeats with the roles of the crossbars reversed: the original
read layer is programmed with the weights of the following
hidden layer, and the original write layer generates the
read-out current. Thus, read and write processes run in
parallel. One layer is programmed in anticipation of the output
from the other layer, which enables a novel in-situ pipeline.
In describing how to program a memristor, the key dif-
ference between expansion and deep-net modes is that in
expansion, all cells are identically biased for either read or
write at any given time. In deep-net mode, 50% of the cells
are biased for read processes and the remaining 50% are biased
for write processes, only switching once each operation is
complete. The shorter read-out time is subsumed within the
programming cycle, but at the expense of half the number
of inputs n in (1). We quantitatively demonstrate this in our
experimental results.
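The alternation of roles in deep-net mode can be sketched as a simple schedule. The sketch assumes that the weights of the first network layer are already programmed into the upper array before the first step; the function name `deep_net_schedule` and the dictionary keys are hypothetical and serve only to make the ping-pong pattern explicit.

```python
# Sketch of the deep-net mode schedule: the two arrays swap read/write roles
# at every step. Assumption: layer 0 is pre-programmed into the upper array.

def deep_net_schedule(num_layers):
    """Return one entry per step for a network with num_layers weight layers."""
    arrays = ("upper", "lower")
    steps = []
    for layer in range(num_layers):
        has_next = layer + 1 < num_layers
        steps.append({
            "read_array": arrays[layer % 2],   # RE high on this array
            "read_layer": layer,
            # RE low on the other array while it is programmed with the next layer
            "write_array": arrays[(layer + 1) % 2] if has_next else None,
            "write_layer": layer + 1 if has_next else None,
        })
    return steps

schedule = deep_net_schedule(3)
for step in schedule:
    print(step)
```

At every step exactly one array is biased for read and, while further layers remain, the other is biased for write, which matches the 50%/50% split described above.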
Fig. 2. A 10 × 10 × 2 prototype of CrossStack, with a cross-sectional view
of the memristor.

TABLE I
Symbol     Description                              Value
Rs         static resistance of set; Rs = 1/gs      10 kΩ ± 7%
Rr         static resistance of reset; Rr = 1/gr    100 kΩ ± 10%
VDD        supply voltage                           1.8 V
Vread      read voltage                             0.5 V
Vwrite     write voltage                            1.2 V
tread      current read-out time                    10 ns
twrite     programming time                         250 ns
n          number of memristors                     200
vTh        threshold voltage                        |0.4| V
Pcritical  worst-case power consumption             2.9 mW
Rwire      wire resistance                          3.2 Ω per cell
Acell      cell area                                20 µm × 20 µm
W/L        transistor sizing                        450 nm/180 nm = 2.5
IV. EXPERIMENTAL RESULTS
A. Crossbar Fabrication
CrossStack was constructed from two monolithically
integrated crossbar arrays with a shared central electrode
that makes up the column line, based on a sandwich structure
of Al/TiO2/TiO2−x/Al layers. A layer of Al (200-nm-thick and
20-µm-wide) was deposited using photolithography on a glass
wafer as the bottom electrode (irradiated using mask alignment
for 100 s, subsequently developed at 23 °C for 120 s). Any
excess Al outside of the channel region was removed via wet
etching (H3PO4 : HNO3 : CH3COOH : H2O = 80 ml : 5 ml : 5
ml : 10 ml) at a rate of ∆d/t = 300 nm/min. TiO2 (5-nm-thick)
and TiO2−x (15-nm-thick) thin films were formed by atomic
layer deposition and magnetron sputtering to fabricate the
memristor. Another 200-nm-thick layer of Al was sputtered as
the top electrode using photolithography to create a 20 µm ×
20 µm mask. After a planarization step, the top stack of active
layer and metal were also deposited. Note that the polarity of
the pair of active layers is mirrored, in contrast to
[31]. This allows for identical input voltages to be applied
when programming the memristors. Fig. 2 shows a working
10 × 10 × 2 prototype of CrossStack, and a cross-sectional
view of the memristor taken using a focused ion beam analyzer.
B. Cell Test
The CMOS cell was designed in the SK Hynix 180-nm
process where vDD = 1.8 V, vTh = 0.4 V, and our parameters
are summarized in Table I. We used a read voltage in the range
of 0 V to 0.5 V and a write voltage of 1.2 V, both applied at VIN,
with measurements taken using a Micromanipulator tungsten
probe tip. The pinched hysteresis loop measured with a 50 Hz
source is shown in Fig. 3(a).
First, we tested the circuit in expansion mode. In the critical
case where a write voltage Vwrite = 1.2 V is applied to all devices
and RE is set HIGH (1.8 V), we show that IR drop is decreased
by approximately 22% compared to a similar planar inference
engine in [26]. This is shown by the slower decline of current
output across columns in Fig. 3(b) for CrossStack, where the
ideal response would be a perfectly straight line. Therefore, we
verified that expansion mode reduces line losses for a given
number of inputs due to the shorter length of column wire
required. The trade-off is that column wires must carry twice
the current of an equivalent 2-D array. This demands
wide column lines to handle such current without
risk of electromigration. However, given that RRAM is integrated
in the back end of the line, the minimum thickness of higher-layer
routing wires may mitigate this.
Designing for deep-net mode opens up susceptibility to
leakage currents through N1 during write mode, concurrently
with N2 in read mode (see Fig. 1). The worst case leakage
occurs along the shared column line, when there is a minimal
read current and a maximal write current. The read array
memristors will all be OFF, Rr = 100 kΩ, and all write
array memristors will be ON, Rs = 10 kΩ. To calculate the
minimum value for Vread, we use 0.5 V as the maximum read
voltage and assume an input of 7-bit resolution, which requires
increments of approximately 500 mV/128 ≈ 4 mV. The output
current of a single cell under these conditions was measured to
be 39.6 nA, which is 1% off the ideal 40 nA. The accumulated
leakage current through a column of 10 memristors being
programmed (VIN = 1.2 V; RE = LOW) was negligible in our
experiments, and simulated to be approximately 2.5 pA per
cell (i.e., 2.5 pA × 10 cells = 25 pA column current). This is
6.3 × 10^−2 % of the worst-case read current, and so in the
180-nm process used, we are able to employ minimum transistor
dimensions (W = 450 nm, L = 180 nm, W/L = 2.5). Leakage
from a single cell as a function of a DC sweep at the input
is shown in Fig. 3(c) and (d), with a Monte Carlo
parametric sweep of resistance overlaid (10 kΩ ± 7%, Gaussian
distribution across 200 trials).
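The worst-case figures above follow from a few lines of arithmetic, reproduced here as a short Python sketch. All inputs are the values quoted in the text; the variable names are hypothetical, and the read step is rounded to 4 mV as in the text (the unrounded step is 3.90625 mV).

```python
# Arithmetic check of the worst-case leakage figures, using values from the text.

V_READ_MAX = 0.5         # maximum read voltage (V)
INPUT_BITS = 7           # assumed input resolution
R_RESET = 100e3          # OFF-state memristance Rr (ohm)
I_MEASURED = 39.6e-9     # measured single-cell read current (A)
LEAK_PER_CELL = 2.5e-12  # simulated leakage per programmed cell (A)
CELLS_PER_COLUMN = 10

v_step_exact = V_READ_MAX / 2 ** INPUT_BITS  # 3.90625 mV before rounding
v_step = 4e-3                                # rounded to 4 mV as in the text
i_ideal = v_step / R_RESET                   # ideal minimum read current (40 nA)
read_error_pct = 100 * (i_ideal - I_MEASURED) / i_ideal
column_leak = LEAK_PER_CELL * CELLS_PER_COLUMN  # 25 pA
leak_pct = 100 * column_leak / i_ideal          # share of worst-case read current

print(f"step = {v_step_exact * 1e3:.5f} mV, ideal current = {i_ideal * 1e9:.1f} nA")
print(f"read error = {read_error_pct:.2f} %, leakage = {leak_pct:.4f} %")
```

The leakage share evaluates to 0.0625%, which rounds to the 6.3 × 10^−2 % quoted above.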
Current read-out measurements taken from a transient anal-
ysis under a switching input during a read-cycle are given in
Fig. 4, where current deviates by 8% from the measured value
in the worst case. This suggests that a single device should
be limited to 3.5 bits. We note this is a limitation not of the
CrossStack architecture, but rather of memristor variability.
In the conservative case of a 1-bit cell across 10 columns,
the 10 ns read-out is subsumed within the 25 ns programming
time rather than being added as a separate cycle after
programming. This confirms a 29% improvement in speed due to
our in-situ pipelining mechanism.
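One way to arrive at the 29% figure, assuming that a conventional crossbar spends the read-out and programming times back to back while deep-net mode overlaps them, is sketched below. The function name `pipelined_reduction` is hypothetical, and the serial-versus-overlapped model is an assumption made for illustration.

```python
# Sketch: cycle-time reduction when the read-out is hidden inside programming.
# Assumption: serial cycle = read + write; pipelined cycle = the longer of the two.

def pipelined_reduction(t_read_ns, t_write_ns):
    """Fractional reduction in cycle time from overlapping read and write."""
    serial_ns = t_read_ns + t_write_ns         # read added as a separate cycle
    pipelined_ns = max(t_read_ns, t_write_ns)  # read subsumed within programming
    return (serial_ns - pipelined_ns) / serial_ns

reduction = pipelined_reduction(t_read_ns=10, t_write_ns=25)
print(f"{100 * reduction:.1f}% shorter cycle")  # prints "28.6% shorter cycle"
```

With a 10 ns read-out and a 25 ns programming time, the cycle shrinks from 35 ns to 25 ns, or roughly 29%.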
V. DISCUSSION AND CONCLUSION
With two different modes available to the user, how should
one be chosen over the other? In short, deep-net mode is suited
to tasks where speed is paramount, and expansion mode to
tasks involving a large number of inputs, be it for increased
resolution or to process a very large number of parameters in
a shallow network.
Fig. 3. Experimental results. (a) Pinched hysteresis loop of a single memristor. (b) IR loss comparison in expansion mode. (c) Worst-case leakage current
through transistor N1 during a write cycle in deep-net mode, with Monte Carlo parameter sweep of memristance. (d) Extreme test case under a large input write
signal, resulting in a nonlinear voltage drop across the memristor in deep-net mode.
Fig. 4. Transient analysis of a single cell current output during a read cycle
in deep-net mode.
There are three primary motivations for using expansion
mode: 1) Fully-connected networks typically have far more
connections than convolutional layers. Expansion
mode doubles the number of possible inputs for a fixed area.
2) A larger number of inputs requires a longer column
wire. By using expansion mode to double up on inputs,
we reduce wire resistance to half the original amount for
a given number of inputs. More inputs per unit length of
wire make our crossbar more resilient to write failures
arising from line-resistance IR drops, reducing line losses
by 22%. 3) The lack of reliable analog memory technology
makes it hard to perform hardware multiplexing in analog,
and transmitting analog values over long distances or at
high speed is not efficient. Restricting each memristor to
one of two conductance values (i.e., single-bit memristors)
means that one would require log2(n) memristors to represent
n distinct levels. For crossbars that use conductance states of
memristors conservatively, digital computing is more desirable
but requires more memristors in a crossbar than analog for
the same precision. Expansion mode facilitates this increase
in devices whilst halving crossbar area. The drawback is that
a larger current must be carried through the wire with more
vias.
Deep-net mode is a novel processing scheme where the two
crossbar layers are isolated from one another by appropriately
biasing RE, in order to parallelize read and write operations.
It is engaged where speed is of greater importance than
precision. In its simplest form (ignoring max-pooling
and dropout), a crossbar performs inference as
follows:
1) Write weights to memristor conductances,
2) Apply a sub-threshold read voltage,
3) Buffer or store read-out signal in memory,
4) Write the next hidden layer weights to the crossbar,
5) Repeat steps 2–4 until the output is generated.
In deep-net mode, CrossStack performs steps 2 and 4 simul-
taneously which enables read and write operations to occur
together. By the time an output is generated from step 3, it is
ready to be processed by the next hidden layer in step 4 for a
speed increase of 29% over an equivalent 2-D array.
The most prevailing issues in realizing large scale cross-
bar arrays beyond our working prototype are mismatch and
endurance. At this stage, our current error rate was 8%
which limits the number of bits that can be represented by
a single cell. Subthreshold current was hardly an issue in
our design, but Fig. 3(d) shows the nonlinearity of voltage
across the memristor under high write voltages (VIN > 3 V),
and leakages will become more prevalent at shorter channel
lengths. Capacitive coupling as a result of high programming
voltages into bit-lines that are reading out poses a degree of
risk, but should be tolerable for higher metal layers given
the wider spacing. Allowing for heat dissipation in a 3D
structure is an ongoing challenge and the subject of significant
process-related research, often calling for external heat sinks.
In general, the advantages seem to outweigh the drawbacks
and CrossStack presents a promising methodology for recon-
figurable inference acceleration to adapt to the various types
of ANNs being deployed.
ACKNOWLEDGMENT
This work was supported by the National Research Founda-
tion of Korea grant funded by the Korean government (MSIT)
(No.2020R1F1A1069381).
REFERENCES
[1] D. Mahajan, R. Girshick, V. Ramanathan, K. He, M. Paluri, Y. Li,
A. Bharambe and L. van der Maaten, “Exploring the limits of weakly
supervised pretraining,” Proc. of the European Conf. on Computer Vision
(ECCV), pp. 181–196, 2018.
A. G. Howard, et al., “MobileNets: Efficient convolutional neural networks
for mobile vision applications”, arXiv preprint, arXiv:1704.04861.
[2] K. H. Kim, S. Jo, S. Gaba and W. Lu, “Nanoscale resistive memory
with intrinsic diode characteristics and long endurance,” Applied Physics
Letters, vol. 96, no. 5, p. 053106, Feb. 2010.
[3] A. Kumar, M. Das, V. Garg, B. S. Sengar, M. T. Htay, S. Kumar, A. Kranti
and S. Mukherjee, “Forming-free high-endurance Al/ZnO/Al memristor
fabricated by dual ion beam sputtering,” Applied Physics Letters, vol. 110,
no. 25, p. 253509, Jun. 2017.
[4] C.-Y. Lin et al., “Adaptive synaptic memory via lithium ion modulation
in RRAM devices”, Small, vol. 16, no. 42, p. 2003964, Oct. 2020.
[5] S. Pi, C. Li, H. Jiang, W. Xia, H. Xin, J. J. Yang and Q. Xia, “Memristor
crossbar arrays with 6-nm half-pitch and 2-nm critical dimension,” Nature
nanotechnology, vol. 14, no. 1, p. 35, Jan. 2019.
[6] E. J. Fuller, S. T. Keene, A. Melianas, Z. Wang, S. Agarwal, Y. Li,
Y. Tuchman, C. D. James, M. J. Marinella, J. J. Yang, A. Salleo and
A. A. Talin, “Parallel programming of an ionic floating-gate memory array
for scalable neuromorphic computing”, Science, vol. 364, no. 6440, pp.
570–574, May 2019.
[7] E. J. Merced-Grafals, N. Dávila, N. Ge, R. S. Williams and J. P. Strachan,
“Repeatable, accurate, and high speed multi-level programming of mem-
ristor 1T1R arrays for power efficient analog computing applications,”
Nanotechnology, vol. 27, no. 36, p. 365202, Aug. 2016.
[8] C.-Y. Lin et al., “A high-speed MIM resistive memory cell with an
inherent vanadium selector”, Applied Materials Today, vol. 21, p. 100848,
Dec. 2020.
[9] J. K. Eshraghian, K. Cho, C. Zheng, M. Nam, H. H. C. Iu, W. Lei, and
K. Eshraghian, “Neuromorphic Vision Hybrid RRAM-CMOS Architec-
ture,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 26,
no. 12, pp. 2816–2829, Dec. 2018.
[10] B. Chakrabarti, M. A. Lastras-Montaño, G. Adam, M. Prezioso,
B. Hoskins, M. Payvand, A. Madhavan, A. Ghofrani, L. Theogarajan,
K. T. Cheng, D. B. Strukov, “A multiply-add engine with monolithically
integrated 3D memristor crossbar/CMOS hybrid circuit,” Scientific reports,
vol. 14, no. 7, p. 42429, Feb. 2017.
[11] F. Cai, J. M. Correll, S. H. Lee, Y. Lim, V. Bothra, Z. Zhang, M. P. Flynn
and W. D. Lu, “A fully integrated reprogrammable memristor-CMOS
system for efficient multiply-accumulate operations”, Nature Electronics,
vol. 15, no. 1, Jul. 2019.
[12] M. R. Azghadi et al., “Complementary Metal-Oxide Semiconductor and
Memristive Hardware for Neuromorphic Computing”, Advanced Intelli-
gent Systems, vol. 2, no. 5, Mar. 2020.
[13] C. Lammie and M. R. Azghadi, “MemTorch: A simulation framework
for deep memristive Cross-Bar architectures”, 2020 IEEE Int. Symp.
Circuits and Systems (ISCAS), Sevilla, Spain, Oct. 2020.
[14] O. Krestinskaya, K. N. Salama and A. P. James, “Analog Backpropaga-
tion Learning Circuits for Memristive Crossbar Neural Networks”, 2018
IEEE Int. Symp. Circuits and Systems (ISCAS), Florence, Italy, May 2018.
[15] A. Serb, J. Bill, A. Khiat, R. Berdan, R. Legenstein and T. Prodro-
makis, “Unsupervised learning in probabilistic neural networks with multi-
state metal-oxide memristive synapses,” Nature communications, vol. 7,
p. 12611, Sep. 2016.
[16] C. J. Chevallier, C. H. Siau, S. F. Lim, S. R. Namala, M. Matsuoka,
B. L. Bateman and D. A. Rinerson, “A 0.13µm 64Mb multi-layered
conductive metal-oxide memory”, 2010 IEEE Int. Solid-State Circuits
Conf. (ISSCC), pp. 260–261, Feb. 2010.
[17] C. H. Wang, Y. H. Tsai, K. C. Lin, M. F. Chang, Y. C. King, C. J. Lin,
S. S. Sheu, Y. S. Chen, Y. H. Lee, F. T. Chen and M. J. Tsai, “Three-
dimensional 4F 2 ReRAM cell with vertical BJT driver by CMOS logic
compatible process,” IEEE Trans. on Electron Devices, vol. 58, no. 8,
pp. 2466–2472, Jul 2011.
[18] S. H. Jo, T. Kumar, S. Narayanan, W. D. Lu and H. Nazarian, “3D-
stackable crossbar resistive memory based on field assisted superlinear
threshold (FAST) selector,” 2014 IEEE Int. Electron Devices Meeting,
Washington, DC, USA, pp. 6–7, Dec. 2014.
[19] J. Hong, M. Stone, B. Navarrete, K. Luongo, Q. Zheng, Z. Yuan, K. Xia,
N. Xu, J. Bokor, L. You and S. Khizroev, “3D multilevel spin transfer
torque devices,” Applied Physics Letters, vol. 112, no. 11, p. 112402,
Mar. 2018.
[20] M. R. Azghadi et al., “Hardware implementation of deep network
accelerators towards healthcare and biomedical applications”, IEEE Trans.
Biomedical Circuits and Systems, vol. 14, no. 6, pp. 1138-1159, Dec. 2020.
[21] S. Baek, J. K. Eshraghian, S.-H. Ahn, A. James and K. Cho, “A
memristor-CMOS Braun multiplier array for arithmetic pipelining”, 2019
26th IEEE Int. Conf. Electronics, Circuits and Systems (ICECS), Genoa,
Italy, pp. 735–738, Nov. 2019.
[22] R. Fastow, K. Hasnat, P. Majhi and O. Jungroth, “Three-dimensional
(3D) memory with shared control circuitry using wafer-to-wafer bonding,”
United States Patent Application, US 16/011, 139, Feb. 2019.
[23] J. K. Eshraghian, K. R. Cho, H. H. C. Iu, T. Fernando, N. Iannella,
S. M. Kang and K. Eshraghian, “Maximization of crossbar array memory
using fundamental memristor theory,” IEEE Trans. on Circuits and Syst.
II: Express Briefs, vol. 64, no. 12, pp. 1402–1406, Oct. 2017.
[24] C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery, J. Zhang, W. Song,
N. Davila, C. E. Graves, Z. Li, J. P. Strachan, P. Lin, Z. Wang, M. Barnell,
Q. Wu, R. S. Williams, J. J. Yang, and Q. Xia, “Analogue signal and image
processing with large memristor crossbars”, Nature Electronics, vol. 1, no.
1, pp. 52–59, Jan. 2018.
[25] J. K. Eshraghian, S. M. Kang, S. Baek, G. Orchard, H. H. C. Iu, and
W. Lei, “Analog weights in ReRAM DNN Accelerators”, 2019 IEEE
Artificial Circuits and Systems Conference, Mar. 2019.
[26] M. Hu, J. P. Strachan, Z. Li, E. M. Grafals, N. Davila, C. Graves,
S. Lam, N. Ge, J. J. Yang, R. S. Williams, “Dot-product engine for
neuromorphic computing: Programming 1T1M crossbar to accelerate
matrix-vector multiplication,” Proc. of the 53rd Annual Design Automation
Conference, p. 19, Jun. 2016.
[27] A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Stra-
chan, M. Hu, R. S. Williams and V. Srikumar, “ISAAC: A convolutional
neural network accelerator with in-situ analog arithmetic in crossbars,”
ACM SIGARCH Computer Architecture News, vol. 44, no. 3, pp. 14–26,
Oct. 2016.
[28] S. Stathopoulos, A. Khiat, M. Trapatseli, S. Cortese, A. Serb, I. Valov
and T. Prodromakis, “Multibit memory operation of metal-oxide bi-layer
memristors,” Scientific Reports, vol. 7, no. 17532, Dec. 2017.
[29] L. Song, X. Qian, H. Li and Y. Chen, “Pipelayer: A pipelined ReRAM-
based accelerator for deep learning,” 2017 IEEE Int. Symp. on High
Performance Computer Architecture (HPCA), pp. 541–552, Feb. 2017.
[30] H. Valavi, P. J. Ramadge, E. Nestler and N. Verma, “A 64-Tile 2.4-
Mb In-Memory-Computing CNN accelerator employing charge-domain
compute,” IEEE Journal of Solid-State Circuits, vol. 54, no. 6, pp. 1789–1799,
Mar. 2019.
[31] G. C. Adam, B. D. Hoskins, M. Prezioso, F. Merrikh-Bayat,
B. Chakrabarti and D. B. Strukov, “3-D memristor crossbars for analog
and neuromorphic computing applications,” IEEE Transactions on Elec-
tron Devices, vol. 64, no. 1, pp. 312–318, Jan. 2017.
