Hardware and software co-optimization for the initialization failure of
  the ReRAM based cross-bar array by Kim, Youngseok et al.
Hardware and soware co-optimization for the initialization failure of the ReRAM
based cross-bar array
YOUNGSEOK KIM, IBM Research, Albany, NY 12203, USA
SEYOUNG KIM∗, Department of Materials Science and Engineering, POSTECH, Pohang, South Korea
CHUN-CHEN YEH, IBM Research, Albany, NY 12203, USA and IBM Thomas J. Watson Research Center, Yorktown
Heights, NY 10598, USA
VIJAY NARAYANAN, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
JUNGWOOK CHOI†, Department of Electronic Engineering, Hanyang University, Seoul, South Korea
Recent advances in deep neural networks demand handling millions of parameters and mandate high-performance
computing resources with improved efficiency. The cross-bar array architecture has been considered one of the promising deep
learning architectures, showing a significant computing gain over conventional processors. To investigate the feasibility of the
architecture, we examine non-idealities and their impact on performance. Specifically, we study the impact of cells that fail during
the initialization process of the resistive memory based cross-bar array. Unlike in a conventional memory array, individual memory
elements cannot be rerouted, so failed cells may have a critical impact on model accuracy. We categorize the possible failures and propose
a hardware implementation that minimizes catastrophic failures. Such hardware optimization bounds the possible logical values of the
failed cells and gives us the opportunity to compensate for the loss of accuracy via off-line training. By introducing random weight
defects during training, we show that the model becomes more resilient to device initialization failures and is therefore less prone
to degraded inference performance due to failed devices. Our study sheds light on a hardware and software co-optimization
procedure to cope with potentially catastrophic failures in the cross-bar array.
CCS Concepts: •Hardware→ Non-volatile memory; Emerging architectures;
Additional Key Words and Phrases: inference, accelerator, neural networks, ReRAM
1 INTRODUCTION
Recent progress in algorithms and computing hardware has made it possible to train neural networks at large scale
and has demonstrated that neuromorphic computing is a robust and efficient way to solve various problems, including
pattern recognition [15], speech recognition [34], and optimization [32]. The central step of training is to modify
the mapping rule from one neuron layer to the adjacent layer so as to minimize the assumed cost function. Such a
mapping is often expressed as weight matrices, and optimizing the weights requires intensive matrix operations, which set
a bottleneck in training. While transistor scaling and the subsequent performance gain have relieved this bottleneck over
the last few decades, the pace of scaling has significantly slowed due to growing cost and diminishing returns. These
technological and economic challenges drive computing toward specialized architectures
rather than general-purpose processors [31].
Recently, cross-bar array hardware has been investigated as an emerging architecture. The architecture exploits
analog memory elements for multi-state weight representations and performs in-memory multiply-accumulate (MAC)
∗This work was done while the author was a member of IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
†This work was done while the author was a member of IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
arXiv:2002.04605v1 [cs.ET] 11 Feb 2020
operations [11, 14]. The architecture is optimized to perform the MAC operations in parallel, and such parallelism shows
significant advantages over conventional hardware in both speed and power consumption [6, 11, 27, 29]. Moreover,
the feasibility of the architecture has been demonstrated using non-volatile memory elements such as phase change memory
(PCM) [1], resistive random access memory (ReRAM) [3, 36], and charge trapping memory [13, 23].
As cross-bar array hardware has a fundamentally different physical realization from the conventional architecture,
the optimal ways of implementing the cross-bar array at the architecture level have been investigated for fully
connected [11], convolutional [6, 9, 27, 29], and recurrent [10, 22] neural networks. The analog nature of the memory
element, however, lacks explicit quantization of the states, and errors that occur in the analog hardware domain may
accumulate. Thus, evaluating and understanding the impact of non-idealities in the analog domain and at the
analog/digital interface is a key element in enabling the cross-bar array technology. The device-to-device and cycle-to-cycle
update variations [11] as well as the resistance drift [21] have been considered for the individual analog elements. The
variations in the peripheral circuitry have been investigated, including the error occurring at the analog/digital interface
[9–11] and in the sense amplifier circuitry [30]. In addition, general strategies to address the non-idealities in the analog domain
have been discussed [17].
Following this trajectory, we investigate possible device failure scenarios and their impact on the model
accuracy. The impact of failed cells has been examined in a fully-connected neural network for the PCM based
architecture [26]. However, the lack of prior study on ReRAM hardware motivates us to study the device failure
scenario that occurs during the initialization process of the ReRAM based cross-bar array. After the fabrication of the
device, the forming process is a necessary step to generate the filamentary conductive path for the ReRAM. During the
forming process, the memory cell may become stuck in the high-resistance or low-resistance state and potentially act as a
critical source of error. Unlike in conventional memory, however, the analog memory elements are hard-wired and it
is not straightforward to re-route the failed cells. Instead, one may incorporate whole redundant arrays or columns to
address the failed cells [33]. To provide a different perspective, we attempt to address the failed cells in the cross-bar array
via hardware and software co-optimization without utilizing hardware redundancies. We first evaluate the potential
impact of such failed cells on the architecture and propose optimized hardware. We then discuss the inference model
accuracy and propose an off-line training strategy that further compensates for the accuracy lost due to the failed cells.
2 MODELING OF INITIALIZATION FAILURE IN RERAM CROSS-BAR ARRAY
Using a cross-bar array, we may perform multiplication and summation in parallel and improve calculation efficiency
by the following mechanism. Figure 1(a) shows the individual resistive memory unit cell connected to a word-line
(horizontal line) and a bit-line (vertical line). Assuming that the voltage across the word-line is $v_i$, the current read
from the bit-line is $I_{ji} = g_{ji} v_i$, where $g_{ji}$ is the conductance of the resistive element. By applying a certain voltage to
the available word-line for a given period of time $t_i \in [0, t_{max}]$ and integrating the output current, each bit-line reads the
total accumulated charge as
$$ z_j = \sum_i \int_t dt\, I_{ji} = \sum_i w_{ji} \times a_i, \tag{1} $$
where $a_i = v_i t_{max}$ and $g_{ji} = w_{ji}$. Here, the conductance of the resistive element $g_{ji}$ represents the weight value $w_{ji}$
and the applied bias $v_i$ at the given word-line maps to the activation of the previous layer or an input value. As a result,
Eq. (1) effectively represents the matrix multiplication of the hidden layer. Once the matrix multiplication is done by
the cross-bar array, the results are processed in peripheral circuits and converted to digital data at the data interface
in Fig. 1(a) [11]. Then, other operations such as batch normalization or activation functions are executed in the
[Figure 1 panels: (a) 2D cross-bar array of resistive memory cells with peripheral circuits and data interface (ADC/DAC); (b) 1R cell; (c) 1T1R cell with gate bias $v_g$.]
Fig. 1. The schematics of cross-bar array system. (a) A cross-bar array is comprised of a resistive memory cell, peripheral circuits,
and data interface. (b) The resistive memory cell stores weight values in a form of a resistance, which may consist of one resistor (1R).
(c) The resistive memory cell may comprise one resistor and one transistor (1T1R) where the transistor plays a role to limit the current.
digital circuits before the next cross-bar array consumes the output data. For this reason, the weight elements utilized
for the matrix multiplication are represented by the analog memory elements, whereas the remaining weights, including
batch normalization parameters, are assumed to be handled in the digital circuits in floating point precision.
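As a minimal sketch of the multiply-accumulate described by Eq. (1), the following plain-Python function treats the conductance matrix as the weight matrix and the word-line voltages as activations. It assumes, for illustration only, that every word-line bias is held constant for the full integration window `t_max`; the function name `crossbar_mac` and the list-of-lists layout (`conductances[j][i]` for bit-line `j` and word-line `i`) are hypothetical choices, not part of the original work.

```python
def crossbar_mac(conductances, voltages, t_max):
    """Accumulated charge per bit-line: z_j = sum_i g_ji * v_i * t_max.

    conductances[j][i] plays the role of the weight w_ji, and
    v_i * t_max plays the role of the activation a_i in Eq. (1).
    Assumes each bias v_i is constant over the whole window t_max.
    """
    activations = [v * t_max for v in voltages]
    return [sum(g * a for g, a in zip(row, activations)) for row in conductances]


# Two bit-lines, two word-lines, arbitrary illustrative units.
charges = crossbar_mac([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.25], 2.0)
print(charges)
```

Each output entry corresponds to one bit-line, so the whole matrix-vector product is obtained from a single parallel read of the array.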
Among the various resistive memory candidates, we mainly focus on the resistive switching memory, or resistive
random access memory (ReRAM), whose resistance states are determined by a filamentary conductive path in a dielectric
material. The filament is formed by applying a large bias across the dielectric, which induces a soft breakdown of the
material; this process is called forming. For example, the breakdown process in HfO$_x$ occurs through oxygen vacancy
movement, which creates a metallic conductive path [18, 24]. Once such a conductive path is formed, a reverse polarity
bias (or RESET bias) may be applied to induce the recombination of oxygen and oxygen vacancies. Once such
recombination disconnects the filamentary conductive path, the device is in a high resistance state (HRS). Likewise,
a bias of the same polarity as the forming bias (or SET bias) may be applied to re-connect the conductive filament and set the
device in a low resistance state (LRS). Figure 2 shows the ideal ReRAM cell, which allows us to explore intermediate
states between LRS and HRS by applying SET and RESET biases. In this study, we assume that each ReRAM cell is
written by a write-and-verify method; that is, the writing process involves multiple checking and
re-writing steps until the desired state is reached for each cell. Although such a method becomes a serious bottleneck
for training, it is a valid approach for inference, as we need to update the weight matrices
only once, when we copy the trained model to the ReRAM cells. Using this approach, ReRAM cells have been demonstrated to
express 7 bits (128 states) with 0.39% fluctuations compared with the spacing between states [20]. Of course, there
are other sources of non-idealities such as noise and intrinsic stochasticity in ReRAM. More comprehensive studies,
e.g. [11], have shown that the cross-bar array architecture is robust to a certain degree against such
non-idealities. Therefore, we assume that the ReRAM devices that have not failed are ideal and that their states are written exactly
as intended, up to 4 bits of quantization levels.
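The write-and-verify idea can be illustrated with a toy loop: read the cell, compare with the target, and apply a SET-like or RESET-like pulse until the state is within tolerance. Everything about the device here is an assumption made for illustration: `ToyCell`, its normalized conductance in [0, 1], the fixed pulse step, and the Gaussian pulse noise are hypothetical and do not model any measured ReRAM behaviour.

```python
import random


class ToyCell:
    """Hypothetical analog cell with a normalized conductance in [0, 1]."""

    def __init__(self, g=0.0, step=0.02, noise=0.003, rng=None):
        self.g = g
        self.step = step
        self.noise = noise
        self.rng = rng or random.Random()

    def pulse(self, direction):
        """Apply one pulse: +1 raises the conductance (SET-like), -1 lowers it (RESET-like)."""
        self.g += direction * self.step + self.rng.gauss(0.0, self.noise)
        self.g = min(1.0, max(0.0, self.g))


def write_and_verify(cell, target, tolerance, max_pulses=1000):
    """Pulse the cell until |target - g| <= tolerance.

    Returns the number of pulses used, or None if max_pulses is exhausted.
    """
    for n_pulses in range(max_pulses):
        error = target - cell.g          # verify step: read and compare
        if abs(error) <= tolerance:
            return n_pulses
        cell.pulse(+1 if error > 0 else -1)
    return None


cell = ToyCell(rng=random.Random(0))
print(write_and_verify(cell, target=0.5, tolerance=0.02), cell.g)
```

The loop is slow per cell, which is why it suits a one-time weight transfer for inference rather than repeated updates during training.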
However, the device may fail during the forming process if (i) the ReRAM cell is not formed and remains open
for a given forming bias. The forming voltage is a function of dielectric thickness and area [4]. A ReRAM cell that
has process variations in these parameters may have a forming voltage higher than the maximum voltage that
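[Figure 2 schematic: a conductance axis with the forming-failed region ($g_{FF} \simeq 0$), the normal cell dynamic range from $g_{HRS}$ to $g_{LRS}$ quantized into $N$ states with $\Delta g = (g_{LRS} - g_{HRS})/(N - 1)$, and the overformed region ($g_{OF}$), with deviations $\delta g_{FF}$ and $\delta g_{OF}$; a logic value axis with $w_{FF}$, $w_{HRS}$, $w_{LRS}$, $w_{OF}$, $\Delta w = (w_{LRS} - w_{HRS})/(N - 1)$, and deviations $\delta w_{FF}$ and $\delta w_{OF}$.]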
Fig. 2. The schematics of the conductance range of ReRAM. The top schematics show the filament states and their corresponding
conductance range. The normal cell dynamic range is defined by the conductance of the high resistance state ($g_{HRS}$) and the low
resistance state ($g_{LRS}$). The intermediate states are quantized into $N$ states and represent the relevant weight values ranging from
$w_{HRS}$ to $w_{LRS}$. The conductance of the forming failed cell is close to 0 and the resultant deviation from $g_{HRS}$ is defined as
$\delta g_{FF}$. Such deviation results in an additional error in weight value from $w_{HRS}$, which is defined as $\delta w_{FF}$. The overformed device
tends to have lower resistance than the typical LRS. The resultant deviation from the desired LRS conductance is defined as $\delta g_{OF}$,
and the corresponding deviation in the weight element is defined as $\delta w_{OF}$.
is supplied by the peripheral circuit, and may not be formed. In addition, (ii) the filament may be overformed so that it
cannot be disconnected by the RESET bias. To form the conduction path with a desirable filament thickness, it is
important to control the current supplied during the forming process. Failure of the current control often results in
overforming the device. The fast rate of filament formation (less than 1 ns [2]) makes it challenging to control the supplied
current simply by removing the applied voltage in time. Therefore, a typical approach to limit the supply current is to
utilize a transistor. However, parasitic capacitance results in an overshoot of the current above the
compliance current even in the presence of the transistor [19]. Such current overshoot contributes to forming
a filament thicker than the desired value and may produce a filament that is overformed and cannot be reset. The
above-mentioned failures result in cell states stuck in (i) HRS or (ii) LRS.
According to Eq. (1), an extreme resistance value results in an abnormally large weight value and may have a
non-trivial impact on the model accuracy. In a conventional random access memory architecture, such failed cells may
be re-routed to working redundant cells. However, the cells in the cross-bar array are hard-wired and the failed cells
cannot be re-routed. The goal of this study is to evaluate the impact of the two major failure mechanisms on the inference
model, and to provide methods to minimize the loss in model accuracy as well as insight into the acceptable forming
yield.
2.1 Individual resistive memory forming failure analysis
Figure 2 depicts the conductance range of the cells that fail during the forming process. The top and bottom
electrodes are electrically disconnected by the insulating dielectric for the forming failed (FF) devices. As a result,
the conductance of an FF device is typically a few orders of magnitude lower than the conductance of a formed
device. Therefore, we may assume the conductance of the FF devices to be $g_{FF} \simeq 0$. This assumption is consistent with
other works, such as a dead device modeled as $g = 0$ in a PCM array [26], or a stuck-at-1 failed cell modeled as the
minimum conductance (or $g \simeq 0$) of the system in a ReRAM array [33]. In contrast, the resistance of the overformed
(OF) devices can be as low as a few hundred Ω to a few kΩ, whereas the desirable ReRAM operation dynamic range is
from a few hundred kΩ to a few MΩ [11]. Assuming the dynamic range of the working device is $g \in [g_{HRS}, g_{LRS}]$, the
corresponding logical value is mapped to $w \in [w_{HRS}, w_{LRS}]$. However, if the conductance of OF or FF devices deviates
significantly from the typical range of $g$, this may lead to a serious failure of the cross-bar array architecture.
For example, if the resistance state of the OF device is stuck at 2 kΩ whereas the LRS of the working cell is 100 kΩ, a
single OF device carries a current 50 times larger than that of the expected LRS devices for a given input bias. In this case, the
current level may be above the maximum acceptable range of the current integrator in the peripheral circuits and the
whole bit-line signal may be overwhelmed. To prevent such a scenario, we propose to use the 1 transistor + 1 resistor (1T1R)
cell structure shown in Fig. 1(c) rather than the 1 resistor + 1 selector or 1 resistor (1R) structure described in Fig. 1(b).
The 1T1R structure limits the maximum current by adjusting the gate bias, $v_g$. Assuming that the transistor is operating
in the linear regime, the drain current is determined as $I_{ds} = (W/L)\mu C_{ox}(v_g - v_{th})v_{ds}$, where $W$ and $L$ are the width
and length of the gate, respectively, $\mu$ is the carrier mobility, $C_{ox}$ is the gate oxide capacitance, $v_{th}$ is the threshold
voltage, and $v_{ds}$ is the bias across the source and drain. Therefore, we may set the effective conductance of the transistor
$g_{tr} = (W/L)\mu C_{ox}(v_g - v_{th})$ close to $g_{LRS}$ by adjusting the gate bias. In this case, the unit cell conductance range
becomes
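$$
\begin{aligned}
g^{cell}_{LRS} &= \frac{g_{LRS} \cdot g_{tr}}{g_{LRS} + g_{tr}}, \\
g^{cell}_{HRS} &= \frac{g_{HRS} \cdot g_{tr}}{g_{HRS} + g_{tr}} \simeq g_{HRS}, \\
g^{cell}_{OF} &= \frac{g_{OF} \cdot g_{tr}}{g_{OF} + g_{tr}} \simeq g_{tr}, \\
g^{cell}_{FF} &= \frac{g_{FF} \cdot g_{tr}}{g_{FF} + g_{tr}} \simeq g_{FF},
\end{aligned} \tag{2}
$$
where we assume $g_{FF} \ll g_{HRS} \ll g_{tr} \ll g_{OF}$. With the conductance values defined in Eq. (2), we may map the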
conductance to the logical value, or weight value, as described in Eq. (1):
$$ g^{cell}_{LRS} \to w_{LRS}, \qquad g^{cell}_{HRS} \to w_{HRS}, \tag{3} $$
where $(w_{HRS}, w_{LRS}) \in \{(0, 2), (0, -1), (0, 1)\}$ depending on the cross-bar array architecture. For the inference model,
the weight value is often quantized into $N$ states without any significant accuracy loss of the model. Therefore, we may
define the spacing of the conductance and the corresponding spacing in the logical value as
$$ \Delta g = \frac{g^{cell}_{LRS} - g^{cell}_{HRS}}{N - 1} \;\to\; \Delta w = \frac{w_{LRS} - w_{HRS}}{N - 1}. \tag{4} $$
Note that the OF and FF devices deviate from the expected maximum and minimum logical values, respectively.
Figure 2 describes such deviations as $\delta g_{FF} = g^{cell}_{HRS} - g^{cell}_{FF}$ for the FF device and $\delta g_{OF} = g^{cell}_{OF} - g^{cell}_{LRS}$ for the OF device.
Their relative significance may be quantified by comparing with the logical value spacing defined in Eq. (4):
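$$ \frac{\delta g_{FF}}{\Delta g} = \frac{\delta w_{FF}}{\Delta w} = (N - 1)\frac{g^{cell}_{HRS} - g^{cell}_{FF}}{g^{cell}_{LRS} - g^{cell}_{HRS}} \simeq (N - 1)\frac{1}{\bar{g} - 1}, \tag{5} $$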
[Figure 3 panels: (a) weight cells sharing a reference column $G_{mid}$; (b) external reference currents $I_{ref,i}$, $I_{ref,i+1}$, ..., $I_{ref,N_i}$; (c) each weight cell paired with a $G_{mid}$ reference cell; (d) differential pairs $G^+$ and $G^-$; (e) comparison of (a)-(d) by resistive cell resources, peripheral resources, vulnerability on fails, and margin to error.]
Fig. 3. Cross-bar array cell structure. (a) Individual ReRAM cell represents $w_{ji} \in [0, 2]$ and shares the reference column of ReRAM
cells whose conductance is set to be $g_{mid}$, or $w_{mid} = 1$. By subtracting the current from the reference column, each ReRAM cell
represents the full range of weight values, or $w_{ji} - w_{mid} \in [-1, 1]$. (b) The reference current, $I_{ref}$, can be supplied from the external
circuit. (c) Individual ReRAM cell is paired with the reference cell. (d) A pair of ReRAM cells represents plus ($w^+$) and minus ($w^-$)
weight values. This choice of differential reading reduces the number of quantized states per cell, as each cell needs to store
half of the weight range. (e) Summary of the pros and cons of the cell structures. ✗ indicates an unacceptable disadvantage over
other candidates, − indicates no difference from others or an acceptable level of disadvantage. ✓ indicates that the candidate is
superior or equivalent to other candidates.
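$$ \frac{\delta g_{OF}}{\Delta g} = \frac{\delta w_{OF}}{\Delta w} = (N - 1)\frac{g^{cell}_{OF} - g^{cell}_{LRS}}{g^{cell}_{LRS} - g^{cell}_{HRS}} = (N - 1)\frac{g^{cell}_{OF}/g^{cell}_{LRS} - 1}{\bar{g} - 1} \simeq (N - 1)\frac{g_{tr}}{g_{LRS}(\bar{g} - 1)}, \tag{6} $$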
where $\bar{g} = g^{cell}_{LRS}/g^{cell}_{HRS}$ is the min/max conductance ratio between LRS and HRS, and the last equalities in Eqs. (5-6)
follow from Eq. (2). The logical value deviation of both OF and FF devices is minimized when the min/max ratio ($\bar{g}$) is
maximized or when the cell has fewer quantized states ($N$). In addition, setting $g_{tr}$ closer to $g_{LRS}$ is beneficial in further
reducing the logical value deviation of the OF devices. However, this reduces the overall dynamic range of the resistive
unit cell due to the diminishing cell LRS conductance, $g^{cell}_{LRS}$, and eventually reduces $\bar{g}$. Before examining this trade-off,
we need to further specify $N$. The number of quantized states $N$ is the number of states per individual cell, and its value may differ
depending on the choice of the specific cross-bar array architecture.
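To make the quantities in this subsection concrete, the sketch below evaluates the series cell conductances of Eq. (2) and then the normalized deviations $\delta g_{FF}/\Delta g$ and $\delta g_{OF}/\Delta g$ directly from their definitions, using the spacing of Eq. (4), rather than from the closed-form approximations. The function names and the device values in the usage lines (1 MΩ HRS, 100 kΩ LRS, a 2 kΩ overformed device, a nearly open forming-failed device) are illustrative assumptions and are not parameters reported in the text.

```python
def series(g_device, g_tr):
    """Series combination of a device and a transistor conductance, as in Eq. (2)."""
    return g_device * g_tr / (g_device + g_tr)


def normalized_deviations(n_states, g_hrs, g_lrs, g_tr, g_ff, g_of):
    """Return (delta_g_FF / delta_g, delta_g_OF / delta_g) for a 1T1R cell.

    The ratios are computed from the cell conductances themselves,
    with delta_g taken from Eq. (4).
    """
    cell_lrs = series(g_lrs, g_tr)
    cell_hrs = series(g_hrs, g_tr)
    cell_ff = series(g_ff, g_tr)
    cell_of = series(g_of, g_tr)
    delta_g = (cell_lrs - cell_hrs) / (n_states - 1)   # Eq. (4)
    return (cell_hrs - cell_ff) / delta_g, (cell_of - cell_lrs) / delta_g


# Illustrative values in siemens (assumptions, not measured data).
G_HRS, G_LRS, G_FF, G_OF = 1e-6, 1e-5, 1e-9, 5e-4
for g_tr in (1e-5, 5e-5):
    for n_states in (8, 16):
        ff, of = normalized_deviations(n_states, G_HRS, G_LRS, g_tr, G_FF, G_OF)
        print(f"g_tr={g_tr:.0e} N={n_states}: FF={ff:.2f} steps, OF={of:.2f} steps")
```

Sweeping `g_tr` and `n_states` in this way shows the two trends discussed above: fewer states shrink both deviations, while the transistor conductance moves the FF and OF deviations in opposite directions.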
2.2 Forming failure scenario analysis on resistive memory cell
Figure 3 shows four possible resistive memory cell structures which are designed to express $w_{ji} \in [-1, 1]$ logical values.
The most efficient implementation from a hardware resource perspective is shown in Fig. 3(a). In this structure, each
individual cell represents $(w_{min}, w_{max}) = (0, 2)$ and the cells in the last column serve as reference devices by fixing their
conductance value to $g_{mid} = (g_{max} + g_{min})/2$, or $w_{mid} = 1$. By subtracting the reference column from the individual
cells, the architecture represents plus and minus values of the weights, or $w_{ji} - w_{mid} \in [-1, 1]$. However, the
[Figure 4 panels: contour maps of (a) $\delta w_{FF}/\Delta w$ and (b) $\delta w_{OF}/\Delta w$, with axes $n_{bit}$ and $g_{tr}/g_{LRS}$.]
Fig. 4. A logical value deviation of FF and OF devices. (a) A logical value deviation of the FF device ($\delta w_{FF}$) as a function of
the quantization bit ($n_{bit}$) and the conductance of the transistor ($g_{tr}$). $\delta w_{FF}$ is calculated from Eq. (5). The contour lines show
$\delta w_{FF}/\Delta w = 1, 2, 3, 4, 8$ and $16$. (b) A logical value deviation of the OF device ($\delta w_{OF}$). $\delta w_{OF}$ is calculated from Eq. (6). The contour
lines show $\delta w_{OF}/\Delta w = 1, 2, 3, 4, 8, 16, 32$ and $64$. We set the min/max ratio of the conductance $g_{LRS}/g_{HRS} = 10$ for (a) and
(b).
impact of an OF or FF device extends to the whole row if a forming process failure occurs in the reference device. To
avoid such a scenario, we may simply subtract the current at the end of the column-wise current integration, as described
in Fig. 3(b), by introducing $I_{ref,i} = \sum_i v_i \cdot g_{mid}$. While this resolves the problem, we now need to calculate $I_{ref,i}$
using extra peripheral circuits. Instead, Fig. 3(c) shows that we may double the hardware resources by introducing an
additional resistive memory to form a resistive memory unit cell. Here, we use one cell as weight storage and the other
cell as a reference device. Note that each weight storage cell represents $(w_{min}, w_{max}) = (0, 2)$, which is quantized into
$N = 2^{n_{bit}}$ levels, where $n_{bit}$ is the target model quantization bit. An alternative scheme in Fig. 3(d) utilizes one cell for
a plus logical value, or $(w^+_{min}, w^+_{max}) = (0, 1)$, and the other cell for a minus logical value, or $(w^-_{min}, w^-_{max}) = (-1, 0)$.
As the $n_{bit}$ quantization is performed over $(-1, 1)$, each cell now stores $N = 2^{n_{bit}-1}$ quantized levels, as each cell
covers half of the weight range. According to Eqs. (5-6), the 2T2R structure (two 1T1R) in Fig. 3(d) is advantageous
over the other candidates as it reduces the number of quantized states from $N = 2^{n_{bit}}$ to $N = 2^{n_{bit}-1}$, which effectively
reduces the logical value deviation of OF and FF devices. The above-mentioned arguments are summarized in Fig. 3(e).
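The differential scheme of Fig. 3(d) can be sketched as an encode/decode pair: a signed weight is split into a non-negative part stored in the $w^+$ cell and a non-positive part stored in the $w^-$ cell, each quantized to $N = 2^{n_{bit}-1}$ levels. The function names and the nearest-level rounding rule are assumptions made for this illustration; the text does not specify how a weight is rounded onto the grid.

```python
def encode_differential(w, n_bit):
    """Split w in [-1, 1] into (w_plus, w_minus) for a 2T2R pair.

    w_plus lies in [0, 1] and w_minus in [-1, 0]; each cell uses
    N = 2**(n_bit - 1) levels, so its step is 1 / (N - 1).
    Nearest-level rounding is an illustrative assumption.
    """
    n_states = 2 ** (n_bit - 1)
    level = round(min(abs(w), 1.0) * (n_states - 1))
    magnitude = level / (n_states - 1)
    return (magnitude, 0.0) if w >= 0 else (0.0, -magnitude)


def decode_differential(w_plus, w_minus):
    """Logical weight read back from the pair."""
    return w_plus + w_minus


for w in (-1.0, -0.3, 0.0, 0.2, 0.77, 1.0):
    pair = encode_differential(w, n_bit=4)
    print(w, pair, decode_differential(*pair))
```

With `n_bit = 4`, each cell only needs 8 levels, which is the reduction in $N$ that makes the per-cell deviations of Eqs. (5-6) smaller.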
Adopting the 2T2R resistive memory cell described in Fig. 3(d), we examine how the undesirable conductance deviation
in Fig. 2 is affected by physical parameter variations and quantify its significance in terms of the weight quantization
step. Specifically, we vary the number of quantization bits ($n_{bit}$) and the effective conductance of the transistor ($g_{tr}$).
Figures 4(a) and (b) show the resultant logical weight deviations $\delta w_{FF}$ and $\delta w_{OF}$, respectively, with their magnitude
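Table 1. Summary of the possible forming failure types, the resultant logical error, and its occurrence probability. Working cells
are indicated as -.

| $w^+$ | $w^-$ | possible value range | best possible value | logical error | prob. |
|---|---|---|---|---|---|
| FF | - | $[-1, 0]$ | $0$ | $-\delta w_{FF}$ | $p_{FF}(1 - p)$ |
| FF | FF | $0$ | $0$ | $\sim 0$ | $p_{FF} p_{FF}$ |
| FF | OF | $-1$ | $-1$ | $-\delta w_{OF}$ | $p_{FF} p_{OF}$ |
| OF | - | $[0, +1]$ | $0$ | $+\delta w_{OF}$ | $p_{OF}(1 - p)$ |
| OF | FF | $+1$ | $+1$ | $+\delta w_{OF}$ | $p_{OF} p_{FF}$ |
| OF | OF | $0$ | $0$ | $\sim 0$ | $p_{OF} p_{OF}$ |
| - | - | $[-1, +1]$ | | $0$ | $(1 - p)(1 - p)$ |
| - | FF | $[0, +1]$ | $0$ | $+\delta w_{FF}$ | $(1 - p) p_{FF}$ |
| - | OF | $[-1, 0]$ | $0$ | $-\delta w_{OF}$ | $(1 - p) p_{OF}$ |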
normalized by the quantization step ($\Delta w$). We choose to vary $n_{bit}$ and $g_{tr}$ for the following reasons: (i) Eqs. (5-6) show
that these variables determine $\delta w_{FF}$ and $\delta w_{OF}$ and (ii) they are tunable parameters that require no significant modification
of the hardware. If we lower the quantization bit, the quantization step $\Delta w$ is represented by a larger conductance
window. This wider window allows the cell to tolerate a given amount of conductance deviation. As a result, a smaller
$n_{bit}$ always reduces the logical error of the resistive memory cell. Figure 4(a) shows that the logical error of the FF device
approaches one quantization step when $n_{bit} \sim 4$. For a given $n_{bit}$, a larger $g_{tr}$ helps to reduce $\delta w_{FF}$. This is because
a larger transistor conductance allows a larger dynamic range of the resistive cell, so the relative significance
of the conductance deviation is reduced. In contrast, Fig. 4(b) shows that a smaller $g_{tr}$ helps to reduce $\delta w_{OF}$. In this
case, having $g_{tr}$ closer to $g_{LRS}$ sets a tighter bound on the conductance of OF devices. The overall trade-off shows
that optimizing the gate bias to obtain a smaller $g_{tr}$ is more desirable for reducing the logical weight value deviations
than fully turning on the transistor during the MAC operation. This is because $\delta w_{OF}$ is more sensitive to the parameters
than $\delta w_{FF}$. For example, $\delta w_{OF}$ is as large as three quantization steps at $(n_{bit}, g_{tr}/g_{LRS}) = (4, 1)$, whereas $\delta w_{FF}$ is
equal to or smaller than one quantization step over a wider range of parameters. Note that we fix the min/max ratio
of the resistive cell as $g_{LRS}/g_{HRS} = 10$ in Fig. 4. The logical errors are further reduced for a higher min/max ratio,
although improving the min/max ratio is a longer-term effort as it requires improvements in materials and device
structures.
Even with a zero logical value deviation, the conductance state of the OF device is fixed near $g^{cell}_{max}$. Therefore, the
corresponding logical value is stuck at ±1. We define such failed cells as ±1 defects in the weight matrices. In contrast,
the conductance of the FF device is close to $g^{cell}_{min}$ and the resultant logical value is stuck at 0. We define these failed
cells as 0 defects. With the 2T2R structure in Fig. 3(d), we further minimize ±1 defects by utilizing the following forming
protocol. For example, in case the $w^+$ cell is overformed while its pair $w^-$ is working properly, we can avoid an
unintentional +1 value by setting $w^- = -1$. In this case, the conductance of the OF device in $w^+$ is compensated by
the conductance of $w^-$ and we have $0 + \delta w_{OF}$ as a logical error instead of $+1 + \delta w_{OF}$. In other words, by utilizing
the resistive cell pairs, we effectively prune the malfunctioning weight elements instead of having a +1 logical error.
Table 1 summarizes all the possible scenarios. In Table 1, we define the probabilities for FF and OF devices as $p_{FF}$ and $p_{OF}$,
respectively, and the total forming failure probability is defined as $p = p_{FF} + p_{OF}$. The straightforward initialization
protocol (referred to as strategy A) may faithfully follow Table 1. In this case, the resistive memory cells are stuck at
Table 2. Two forming strategies based on Table 1 and the resultant 0 defect ($p_0$) and ±1 defect ($p_1$) probabilities.

| | strategy A | strategy B |
|---|---|---|
| $p_0$ | $\sim p$ | $\sim 2p$ |
| $p_1$ | $2 p_{OF} \cdot p_{FF}$ | $p_{OF} \cdot p_{FF}$ |
a ±1 value with a probability of $p_1 = 2 p_{FF} p_{OF}$, while a fraction $(1 - p)^2$ of the cells work properly. The remaining portion of the
cells is forced to 0 and the weights are effectively pruned to minimize the impact of the failed cells. The probability
of having a 0 defect is $p_0 = 2(1 - p)p + p_{OF} p_{OF} + p_{FF} p_{FF} \simeq 2p$ if $p \ll 1$. Among all possible scenarios for 0 defects,
the $(w^+, w^-) = (FF, -), (-, OF)$ cases and the $(w^+, w^-) = (-, FF), (OF, -)$ cases are programmable and may be programmed correctly
if the weight value is negative or positive, respectively. If we assume that the chance of having a minus or plus weight
value for an arbitrary cell is 50%, half of these cells still record the correct value and the probability reduces to $p_0 \simeq 2p/2 \simeq p$.
As a result, we end up having 0 defects with a probability of $\sim p$.
Another possible strategy (referred to as strategy B) is to form one device first and choose not to form its pair if
the first device is an FF device. This strategy avoids the risk of having an OF device as its pair during the forming process,
and further reduces the probability of having a ±1 defect from $2 p_{FF} p_{OF}$ to $p_{FF} p_{OF}$. Although such a choice is effective in
minimizing ±1 defects, we no longer have $(w^+, w^-) = (FF, -)$ or $(-, FF)$ as we choose not to form the pair of FF devices.
If we simply assume that we do not write the weight values to the failed cells, the probability of having a 0 defect is now
$p_0 \simeq 2p$.
When the OF device occurrence is low, it is more beneficial to choose strategy A due to the lower $p_0$.
However, strategy B is a good option if the OF device occurrence rate is substantial compared with that of
the FF device. The two forming strategies A and B and their corresponding probabilities of having a ±1 defect
($p_1$) and a 0 defect ($p_0$) are summarized in Table 2. The numerical analysis in the following section utilizes strategy B,
but the qualitative results are consistent for both strategies as they differ only in the probability combinations
of 0 and ±1 defects.
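The defect probabilities of the two forming strategies can be checked numerically with a short sketch. For strategy A it sums the rows of Table 1 and then applies the 50% sign assumption stated above. For strategy B it encodes one reading of the text, namely that every pair which is neither fully working nor a ±1 defect is left unwritten and counts as a 0 defect; that reading, along with the function names and the example probabilities, is an assumption of this illustration.

```python
def strategy_a(p_ff, p_of):
    """Return (p0_raw, p0_effective, p1) when Table 1 is followed directly."""
    p = p_ff + p_of
    p1 = 2 * p_ff * p_of                              # (FF, OF) and (OF, FF) pairs
    p0_raw = 2 * (1 - p) * p + p_of ** 2 + p_ff ** 2  # all pairs whose best value is 0
    # Pairs with one working cell hold the right value for half of the weights
    # (50% sign assumption), so only half of them act as 0 defects.
    p0_effective = (1 - p) * p + p_of ** 2 + p_ff ** 2
    return p0_raw, p0_effective, p1


def strategy_b(p_ff, p_of):
    """Return (p0, p1) when the partner of an FF device is left unformed.

    Assumes every pair that is neither fully working nor a +/-1 defect
    is not written and therefore counts as a 0 defect.
    """
    p = p_ff + p_of
    p1 = p_of * p_ff
    p0 = 1 - (1 - p) ** 2 - p1
    return p0, p1


p0_raw, p0_eff, p1_a = strategy_a(p_ff=0.01, p_of=0.02)
p0_b, p1_b = strategy_b(p_ff=0.01, p_of=0.02)
print(f"A: p0 ~ {p0_eff:.4f}, p1 = {p1_a:.4f}")
print(f"B: p0 ~ {p0_b:.4f}, p1 = {p1_b:.4f}")
```

For small failure rates the numbers approach the entries of Table 2: roughly $p$ versus $2p$ for 0 defects, and a factor of two between the ±1 defect probabilities.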
3 NUMERICAL EXPERIMENTS
3.1 The impact of ±1 and 0 defects on the inference accuracy
We use the CIFAR-10 dataset and test the impact of the forming failed cells on an image recognition model. The model
is ResNet-20 [16]. 50,000 images are used for training and 10,000 images are used for
testing. According to the analysis in Section 2.1, the deviation of the weight from the desired logical value is minimized for
a smaller quantization bit, $n_{bit}$. We quantize the weights [35] and activations [7] and find no significant degradation in the
test accuracy for $n_{bit} \geq 4$ for both weights and activations [7, 37]. Going below $n_{bit} = 4$ requires special techniques to maintain
the model accuracy [8, 25], which is out of the scope of this study; thus, we choose $n_{bit} = 4$. We obtain the baseline with a
test error of 7.99% after 200 epochs of training with a learning rate schedule of 0.1, 0.01, 0.005 at 80, 120, 180 epochs.
We first assume an ideal defect by setting $\delta w_{OF,FF} \to 0$ and evaluate the impact of 0 and ±1 random defects on
the trained model. Specifically, we randomly change the weight matrix elements of the trained model to 0 or ±1 with
a probability of $p_0$ or $p_1$, respectively, and we set an equal probability for +1 and −1 defects for simplicity. As a first
step, we isolate the impact of the ±1 defects from that of the 0 defects by setting $p_0 = 0$. We then vary $p_1$ and obtain the inference
results. The black dashed line in Fig. 5(a) shows the inference results for $p_1 = 0.2\% - 5\%$. The test error rapidly increases
and the model shows more than a 20% error rate for $p_1 > 1\%$. However, such degradation in the test accuracy may be
Fig. 5. Inference error in the presence of ±1 and 0 defects. The black dotted line shows the inference error from the baseline, and the red
solid line shows the inference results from the defect-aware model. Each box plot represents the inference results from 50 different
defect configurations. The blue cross symbol shows the retraining results from one particular random defect configuration. (a) The
impact of the ±1 defects on the inference accuracy. The weight elements of the trained model are replaced by a ±1 value with a given
probability $p_1$. (b) The impact of the 0 defects on the inference accuracy. The weight elements of the trained model are replaced by
a 0 value with a given probability $p_0$. (c) The impact of both ±1 and 0 defects on the inference accuracy. We vary the probability of
forming failure $p = p_{OF} + p_{FF}$, and the corresponding ±1 and 0 defects are placed with the calculated probability following Table 1.
recovered by retraining the network if we know the exact defect configuration. The blue x symbol in Fig. 5(a) shows the
accuracy recovered through 200 epochs of retraining from the baseline model, using a fixed defect configuration and
the same learning rate schedule as the baseline. This approach is valid when we know the exact location of the
existing defects. The defect configuration, however, is likely to be random and to vary from chip to chip. Therefore,
this strategy requires prior knowledge of the specific configuration as well as computational resources to retrain for
each individual chip.
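As an illustration, the random replacement of weight elements described at the start of this section can be sketched in a few lines of NumPy. This is a minimal sketch rather than the code used in this study: the function name `inject_defects`, the toy weight matrix, and the use of a single uniform draw per element are assumptions made here for illustration.

```python
import numpy as np

def inject_defects(w, p0, p1, rng):
    """Return a copy of w with random 0 and +/-1 defects.

    Each element independently becomes 0 with probability p0,
    +1 with probability p1/2, and -1 with probability p1/2.
    """
    if p0 < 0 or p1 < 0 or p0 + p1 > 1:
        raise ValueError("need p0 >= 0, p1 >= 0 and p0 + p1 <= 1")
    u = rng.random(w.shape)            # one uniform draw per weight element
    out = np.array(w, dtype=float)     # copy; the input is left untouched
    out[u < p0] = 0.0
    out[(u >= p0) & (u < p0 + p1 / 2)] = 1.0
    out[(u >= p0 + p1 / 2) & (u < p0 + p1)] = -1.0
    return out

rng = np.random.default_rng(0)
w = rng.uniform(-0.5, 0.5, size=(256, 256))   # stand-in for trained weights (assumption)
w_defect = inject_defects(w, p0=0.0, p1=0.01, rng=rng)
```

Calling the function repeatedly with the same probabilities yields independent defect configurations, which is how a set of configurations for one box plot could be generated.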
Instead, we may train the network with prior knowledge of only the type of defect and its likelihood. By introducing
random defects with a given probability during training, the model may find a local minimum that minimizes the
loss function even in the presence of random defects. Through this process, the model may become resilient to a particular
set of defects without knowledge of a specific defect configuration, so a single trained model may be used for
numerous defect configurations. We examine this hypothesis by introducing randomly generated ±1 defects with a
probability $p_1$. After the weight matrices at each layer are quantized, a portion of the weight elements is replaced
by +1 with probability $p_1/2$ and by −1 with probability $p_1/2$. The modified weight matrices are used for a given training
image to perform the forward and backward propagation. We then repeat this procedure, generating a new set of random defects
with the same probability for the next training image, and train the defect-aware model for 200 epochs. Using
the resulting model, we generate 50 different random defect configurations. The corresponding inference results are
plotted as a red solid line in Fig. 5(a). When the defect probability is smaller than 0.2%, the inference results from
the baseline (black dotted line) and the defect-aware model (red line) show little difference. However, as we increase the defect
probability, the defect-aware model produces much better inference results than the baseline.
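The training loop described above can be illustrated on a toy problem. The sketch below is not the network or the training code of this study: it uses a single linear layer with a squared-error loss, and it assumes that the gradient computed with the defective weights is applied to the clean full-precision weights (a straight-through update). The helper names `quantize`, `pm1_defects`, and `defect_aware_step` are hypothetical.

```python
import numpy as np

def quantize(w, n_bit=4):
    """Uniformly quantize w onto 2**n_bit levels spanning [-1, 1]."""
    step = 2.0 / (2 ** n_bit - 1)
    return np.round((np.clip(w, -1.0, 1.0) + 1.0) / step) * step - 1.0

def pm1_defects(w, p1, rng):
    """Replace elements by +1 or -1, each with probability p1/2."""
    u = rng.random(w.shape)
    out = w.copy()
    out[u < p1 / 2] = 1.0
    out[(u >= p1 / 2) & (u < p1)] = -1.0
    return out

def defect_aware_step(w, x, y, p1, lr, rng):
    """One training step on one sample for a linear layer y_hat = w_eff @ x."""
    w_eff = pm1_defects(quantize(w), p1, rng)  # fresh defects for this sample
    err = w_eff @ x - y                        # forward pass
    grad = np.outer(err, x)                    # gradient of 0.5*||err||^2 w.r.t. w_eff
    # Assumption: the gradient is applied to the clean, full-precision weights.
    return w - lr * grad, 0.5 * float(err @ err)

rng = np.random.default_rng(0)
w = rng.uniform(-0.5, 0.5, size=(10, 32))   # toy layer: 32 inputs, 10 outputs
x, y = rng.normal(size=32), np.zeros(10)    # toy sample and target
for _ in range(5):                          # a new defect set on every step
    w, loss = defect_aware_step(w, x, y, p1=0.01, lr=1e-3, rng=rng)
```

The key point is that the defect mask is redrawn on every step, so the model never sees the same defect configuration twice.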
We perform a similar analysis for the 0 defects. Namely, we replace the quantized weight matrix elements with 0
with a probability of $p_0$. Figure 5(b) shows a trend consistent with Fig. 5(a). One noticeable difference is the overall
shift in the onset of the model error degradation. The model accuracy degrades significantly when $p_0 \gtrsim 1\%$ for 0 defects,
compared with $p_1 \gtrsim 0.2\%$ for ±1 defects. In other words, the ±1 defects have a more critical impact on the model accuracy. This
may be understood by looking at the weight value distribution. After the 200-epoch training, most of the baseline
weight values are centered at 0, as the loss function tends to minimize the weight values. The test error is expected to
increase upon the introduction of defects, as the difference between the desired weight value and the defect value acts as
a source of error. The model weight matrices contain far fewer ±1 values than values near 0; therefore, an arbitrary change of a
weight value to ±1 is more likely to induce a larger error. Through the defect-aware training, however,
the weight value distribution is forced to shift toward ±1 by intentionally populating extreme weight values in a random
fashion. Although the resulting model deviates from the global minimum, it becomes less prone to fail when
similar types of defects are introduced.
To keep the test error on the CIFAR-10 dataset below 10%, for example, the baseline results show that one needs to keep
the ±1 defect probability below 0.1%, whereas the defect-aware model may tolerate up to ∼0.5%. In contrast, the
baseline model is less susceptible to 0 defects. As the baseline model inherently includes a certain number of randomly
distributed weight values close to 0, both the baseline and the defect-aware model tolerate a defect probability of up to ∼1%.
The baseline and the defect-aware model start to show a noticeable difference once we introduce more than 2% of 0
defects. In summary, the defect-aware model approach is effective when the defect type tends to manifest itself
as a logical value that is uncommon in the typical weight value distribution. Furthermore, Fig. 5(a-b) shows that the baseline
model is more sensitive to ±1 defects. This result further justifies the 2T2R structure over 1T1R shown in Fig. 3(d), as it
reduces the probability of having ±1 defects from $p_{OF}$ to $p_{OF} p_{FF}$.
With this understanding of the impact of ±1 and 0 defects on the inference accuracy, we now use
Table 1 to evaluate the performance of the optimized hardware as a function of the probability of having OF and FF devices.
When the total probability of initialization failure is $p = p_{OF} + p_{FF}$, we choose $p_{OF} = p_{FF}$, which is the
worst-case scenario with the maximum number of ±1 defects. Figure 5(c) shows the inference results as a function of $p$. The 2T2R
hardware minimizes the ±1 defects, and the inference results approach those of the 0 defects in Fig. 5(b).
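The worst-case choice can be checked with a short calculation. It takes at face value the statement that the 2T2R cell produces a ±1 defect with a probability proportional to $p_{OF} p_{FF}$; the exact prefactors are given in Table 1 and are not reproduced here, and a constant prefactor does not change the location of the maximum. For a fixed total $p = p_{OF} + p_{FF}$:

```latex
P_{\pm 1}(p_{OF}) = p_{OF}\,(p - p_{OF}), \qquad
\frac{dP_{\pm 1}}{dp_{OF}} = p - 2\,p_{OF} = 0
\;\Rightarrow\; p_{OF} = p_{FF} = \frac{p}{2}, \qquad
P_{\pm 1}^{\max} = \frac{p^{2}}{4}.
```

For example, $p = 2\%$ gives $P_{\pm 1}^{\max} = (0.02)^2/4 = 10^{-4} = 0.01\%$, compared with $p_{OF} = 1\%$ for a 1T1R cell with the same split.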
3.2 The impact of $\delta w_{OF,FF}$ on the inference accuracy
In reality, the conductance of an OF device may be higher than that of the normal cells so that, for example, the logical
value becomes $+1 + \delta w_{OF}$ instead of +1. Similarly, an FF device may exhibit a lower conductance
than the expected value, which results in $0 \pm \delta w_{FF}$. Such non-ideal deviations from the expected logical value are
described in Fig. 2 and formulated in Eqs. (5-6). For our cell design, the probabilities of having ±1 or 0 defects and
the corresponding non-ideal deviations are summarized in Table 1. Following the prescriptions in the table, we
examine the impact of the non-ideal deviations $\delta w_{FF}$ and $\delta w_{OF}$ of the logical values of the ±1 and 0 defects. Figure 4
shows that $\delta w_{FF}/\Delta w \sim 1$–$2$ and $\delta w_{FF}/\Delta w \sim 3$–$4$ are reasonable ranges for $n_{bit} = 4$ and $g_{LRS}/g_{HRS} = 10$. Therefore,
we choose the parameter range $\delta w_{FF}, \delta w_{OF} \in [0, 3\Delta w]$, where $\Delta w = (1 - (-1))/(2^{n_{bit}} - 1)$ is the quantization step
of the weight elements.
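For concreteness, the quantization step and the upper end of the deviation range can be evaluated numerically. The snippet below is a small worked example; the function name `quantization_step` and the choice of integer multiples of the step as sweep points are assumptions, since the text specifies only the range.

```python
def quantization_step(n_bit, w_min=-1.0, w_max=1.0):
    """Quantization step for 2**n_bit uniform levels on [w_min, w_max]."""
    return (w_max - w_min) / (2 ** n_bit - 1)

dw = quantization_step(4)                 # 2/15, about 0.133
dw_sweep = [k * dw for k in range(4)]     # 0, dw, 2*dw, 3*dw (upper end: 0.4)
```

With $n_{bit} = 4$, a deviation of $3\Delta w$ therefore corresponds to a shift of 0.4 on the logical weight scale of [−1, 1].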
Figure 6(a) shows the inference results from the baseline for $p = p_{OF} + p_{FF} = 2\%$. The inference error is averaged
over 15 different defect configurations, and the error is 16.2% for $\delta w_{FF} = \delta w_{OF} = 0$. As we increase the non-ideal
deviations of the FF and OF devices, the inference error is again averaged over 15 different defect configurations, and the
error relative to that at $\delta w_{FF} = \delta w_{OF} = 0$ is plotted in Fig. 6(a). The error increases monotonically as
$\delta w_{FF}$ and $\delta w_{OF}$ increase, and we lose an additional ∼10.1% accuracy for $\delta w_{FF} = \delta w_{OF} = 3\Delta w$.
We perform a similar analysis on the defect-aware model for $p = p_{OF} + p_{FF} = 2\%$, and the inference error is
obtained by averaging over the results from 15 different defect configurations. We first obtain an inference error of 12.8%
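[Figure 6: two panels, (a) inference results from the baseline model and (b) inference results from the defect-aware model. Each panel shows the relative error (%) as a function of $\delta w_{FF}/\Delta w$ and $\delta w_{OF}/\Delta w$; the marked values are 0 (reference) and +10.1 in (a), and 0 (reference) and +3.3 in (b).]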
Fig. 6. Robustness of the defect-aware model to the non-ideality of the defects ($\delta w_{OF,FF}$) on the test error. (a) The
inference error from the baseline at $p = 2\%$ is ∼16.2% with $\delta w_{OF,FF} = 0$. The plot shows the relative error increase caused by
non-zero $\delta w_{OF,FF}$. (b) The inference error from the defect-aware model at $p = 2\%$ is ∼12.8% with $\delta w_{OF,FF} = 0$. The plot shows
that the relative error increase caused by $\delta w_{OF,FF}$ is smaller than that of the baseline model.
for $\delta w_{FF} = \delta w_{OF} = 0$, which is lower than the inference error of the baseline. As we increase $\delta w_{FF}$
and $\delta w_{OF}$, the error increases monotonically, consistent with the trend of the baseline inference results.
However, the relative error is ∼3.3% at $\delta w_{FF} = \delta w_{OF} = 3\Delta w$, which is about three times smaller than the relative error
observed in the baseline results. Therefore, the defect-aware model shows more robust inference capability against
non-ideal deviations of the weight elements.
3.3 The impact of the defect probability distribution on the inference accuracy
We may extract the initialization failure probability by collecting statistics on the individual ReRAM devices for a given
die. However, such statistics may vary from die to die due to process variation across the wafer, or from wafer to
wafer due to process drift induced by the equipment. As we have so far discussed the defect-aware model only at a fixed
probability, it is worthwhile to investigate a strategy to address statistical variations in the defect probability.
For a given statistically meaningful interval of $p$, we aim to obtain a lower test error with minimal standard deviation.
We proceed with an exemplary Gaussian distribution of the defect probability.
The failure analysis of a 4Mb ReRAM test chip shows $p_{OF} \sim 1.75\%$ and $p_{FF} \sim 9.04\%$ [5]. After a dc bias is applied
to form 128Kb arrays, applying alternating set/reset pulses results in $p_{OF} \sim 1.28\%$ and $p_{FF} \sim 4.76\%$ [28]. A series of
optimized pulse inputs applied to 4Kb arrays improved the forming yield, but still left $p_{FF} \sim 1\%$
[12]. Based on these results, we set a reasonable distribution of $p_{OF}$ and $p_{FF}$ to illustrate our approach. Specifically, we
assume mean values of $\mu_{p_{OF}} = \mu_{p_{FF}} = 1.5\%$ with standard deviations of $\sigma_{p_{OF}} = \sigma_{p_{FF}} = 0.5\%$. As a result, the total
defect probability has a mean value of $\mu_p = \mu_{p_{OF}} + \mu_{p_{FF}} = 3\%$ and a standard deviation of
$\sigma_p = \sqrt{\sigma_{p_{OF}}^2 + \sigma_{p_{FF}}^2} \simeq 0.7\%$. The right-side y-axis
of Fig. 7(a) describes the statistics of the defect probability, and the relevant probability interval from $p = 1\%$ ($-3\sigma_p$) to
$p = 5\%$ ($+3\sigma_p$) is indicated by dashed vertical lines.
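The numbers quoted above follow from adding the two standard deviations in quadrature, which treats the OF and FF failure probabilities as independent. A short numerical check, with all values in percent and variable names chosen here for illustration:

```python
import math

mu_of = mu_ff = 1.5          # mean of p_OF and p_FF, in percent
sigma_of = sigma_ff = 0.5    # standard deviations, in percent

mu_p = mu_of + mu_ff                        # 3.0
sigma_p = math.hypot(sigma_of, sigma_ff)    # sqrt(0.5**2 + 0.5**2), about 0.707
interval = (mu_p - 3 * sigma_p, mu_p + 3 * sigma_p)   # about (0.88, 5.12)
```

Rounded to whole percent, the three-sigma interval is the 1% to 5% range used in the discussion.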
The black line in Fig. 7(a) and the left-side y-axis depict the inference results from the baseline model. The test error
increases rapidly from ∼10% at $p = 1\%$ to ∼40% at $p = 5\%$. This steep slope shows that the baseline
model is sensitive to the statistical variation of the defects. The observed variation of the test
error is unacceptably large; a small deviation from the expected defect probability may result in a large
deviation from the expected inference accuracy. The remaining curves show the improved results from the defect-aware
models. The gray dotted line in Fig. 7(a) shows the inference error from the defect-aware model trained at $p = 1\%$
($-3\sigma_p$). It shows a minor improvement over the baseline, but still suffers a rapid degradation of the accuracy
for larger $p$. The dark red dotted line in Fig. 7(a) shows the inference results of the defect-aware model trained at
$p = 5\%$ ($+3\sigma_p$). As its test error saturates for $p < 3.0\%$, the inference error becomes less sensitive to the
statistical variation. However, its test error at $p < 2.0\%$ is even higher than that of the baseline, which degrades
the averaged test accuracy. The pink dashed line in Fig. 7(a) shows the inference error
from the defect-aware model trained at $p = 3\%$. This model shows better accuracy than the baseline or the
defect-aware model trained at $p = 1\%$ within the relevant range of $p$. At $p > 4\%$ it performs worse
than the model trained at $p = 5\%$, but its statistical performance is expected to be better, as its test error is
comparable or smaller within the most significant interval $\mu_p - \sigma_p \le p \le \mu_p + \sigma_p$.
Alternatively, we may use the known distribution of the defect probability during training. We randomly select
$p_{OF}$ and $p_{FF}$ from the known distribution and generate a defect configuration. The generated defect configuration
is used for the forward and backward propagation of a given image during the training procedure. The same procedure
is repeated to generate a new configuration with a different defect probability for the next training image. The
inference results from the distribution-aware model are indicated by a blue solid line in Fig. 7(a). The
distribution-aware model shows a comparable or lower test error than the defect-aware model trained at the
Fig. 7. Robustness of the defect-aware model to the statistical variation of the failure probability. (a) Inference
results from the baseline model (black solid line), defect-aware models (dashed lines), and distribution-aware model (blue solid line).
Each box plot represents the inference results from 50 different defect configurations. The left y-axis shows the assumed Gaussian
distribution with a mean defect probability $\mu_p = 3\%$ and a standard deviation $\sigma_p \simeq 0.7\%$. (b) Inference results from the same set of
models as in (a), evaluated with the assumed defect probability distribution shown in (a). Each black dot represents a possible
inference test error for an individual inference chip, and the mean and standard deviation are summarized by the box plot.
mean value of $p = 3\%$ and, therefore, is expected to perform better in terms of the mean and standard deviation of the
inference test error.
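The per-image sampling used for the distribution-aware model can be sketched as follows. This is an illustrative sketch under stated assumptions: the two probabilities are drawn independently from the Gaussian given earlier, negative draws are clipped to zero, and the helper names `sample_failure_probs` and `device_states` are hypothetical. The mapping from device states to logical weight defects follows Table 1 and is deliberately left out.

```python
import numpy as np

def sample_failure_probs(rng, mu=0.015, sigma=0.005):
    """Draw (p_OF, p_FF) independently from N(mu, sigma^2), clipped to [0, 1]."""
    p_of, p_ff = np.clip(rng.normal(mu, sigma, size=2), 0.0, 1.0)
    return float(p_of), float(p_ff)

def device_states(shape, p_of, p_ff, rng):
    """Label each device 0 (normal), 1 (overformed), or 2 (forming failure)."""
    u = rng.random(shape)
    states = np.zeros(shape, dtype=np.int8)
    states[u < p_of] = 1
    states[(u >= p_of) & (u < p_of + p_ff)] = 2
    return states

rng = np.random.default_rng(0)
for _ in range(3):                               # one draw per training image
    p_of, p_ff = sample_failure_probs(rng)
    states = device_states((128, 128), p_of, p_ff, rng)
    # Converting states to logical weight defects would follow Table 1.
```

Because both the probabilities and the configuration are redrawn for every image, the model is exposed to the whole assumed distribution rather than to a single value of $p$.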
Figure 7(b) shows the distribution of the inference results from the models discussed in Fig. 7(a). The defect probabilities
$p_{OF}$ and $p_{FF}$ are randomly selected from the assumed distribution, and we generate defect configurations with the
selected probabilities. The same procedure is repeated for 500 different trials. Each point represents one inference chip
whose defect probability follows the given Gaussian distribution, and the resulting statistics of the inference performance are
presented. Figure 7(b) clearly shows that both the mean and the standard deviation are lowest for the distribution-aware
model, whose test error is $\mu \pm \sigma = 13.5 \pm 0.3\%$.
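The chip-level statistics can be collected with a simple Monte Carlo loop. The sketch below assumes a user-supplied callable `test_error_fn(p_of, p_ff, rng)` that returns the test error of one simulated chip; this callable, the function name `chip_population_stats`, and the toy stand-in used in the example are hypothetical and do not reproduce the evaluation of this study.

```python
import numpy as np

def chip_population_stats(test_error_fn, n_chips=500, mu=0.015, sigma=0.005, seed=0):
    """Mean and standard deviation of the test error over simulated chips."""
    rng = np.random.default_rng(seed)
    errors = np.empty(n_chips)
    for i in range(n_chips):
        # Each chip gets its own (p_OF, p_FF), drawn from the assumed Gaussian.
        p_of, p_ff = np.clip(rng.normal(mu, sigma, size=2), 0.0, 1.0)
        errors[i] = test_error_fn(p_of, p_ff, rng)
    return float(errors.mean()), float(errors.std())

# Toy stand-in for a real evaluation: the "error" is the total defect probability.
mean_err, std_err = chip_population_stats(lambda p_of, p_ff, rng: p_of + p_ff)
```

In practice the callable would generate a defect configuration for the sampled probabilities, apply it to a trained model, and return the measured test error.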
4 SUMMARY AND CONCLUSION
We have investigated the failure mechanisms of the forming process in the resistive-memory-based cross-bar array.
As the forming process is an essential initialization step for ReRAM devices, it is pivotal to understand the impact of
forming failures on the cross-bar array performance. Specifically, we have focused on two failure scenarios: (i)
forming failure (FF) devices and (ii) overformed (OF) devices. In the cross-bar array architecture, an FF device conducts
minimal current and may act as a 0 defect, whereas an OF device allows an excessive amount of current, which may cause a
catastrophic error. We have discussed the 1T1R structure to regulate the excessive current flow and set a lower bound for
conductance. In this case, an OF device acts as a ±1 defect. We have further discussed the 2T2R structure and a corresponding
forming strategy that minimizes ±1 defects. With optimized hardware that minimizes the critical initialization
errors, we then evaluated the impact of the ±1 and 0 defects on the inference accuracy of the model. The numerical experiments
show that the impact of ±1 defects is more significant than that of 0 defects. Furthermore, we show that the trained
model becomes resilient to the defects if we apply the same type of random defects to the weight matrices during
training. The defect-aware model shows a smaller test error than the baseline when ±1
and 0 defects are introduced. We have also discussed the logical value deviations of the OF device ($\delta w_{OF}$) and the FF device
($\delta w_{FF}$). Such deviations from the desired value occur because the conductance of an OF or FF device deviates
from the lowest/highest possible conductance value. We have discussed a reasonable range for such deviations and
shown that the defect-aware model is also resilient to the errors caused by $\delta w_{OF,FF}$. Lastly, we have discussed variation in the
defect probability, which may arise from process variation or process drift. We have examined the impact of such
variation using an exemplary defect probability distribution. The results show that including a known distribution of
the defect probability during training further improves both the average inference performance and the standard
deviation of the test error.
One caveat of this study is that our analysis includes only the initialization failure scenario as a hardware
non-ideality. In reality, the intricate interplay between different error sources may worsen the model accuracy and demand
tighter parameter control than a scenario in which individual components are considered separately [14]. To make
further statements on realistic accuracy, future studies need to incorporate our results into a simulation framework
capable of addressing other types of non-idealities.
Yield improvement is one of the most critical factors determining manufacturing cost in the semiconductor
industry. In this regard, understanding the major failure mechanisms and co-optimizing the hardware and software may
provide an opportunity for sustainable, incremental yield improvement on top of device engineering and material
innovations. The proposed optimization strategy is straightforward to implement and may open a
new avenue to address similar problems for cross-bar arrays based on other types of devices (e.g., phase-change memory)
or other types of neural networks (e.g., recurrent networks).
5 ACKNOWLEDGMENTS
Y. Kim thanks Theodorus E. Standaert and Robert R. Robison for their managerial support, and Wilfried
Haensch and Geoffrey W. Burr for fruitful discussions. S. Kim acknowledges useful discussions with Tayfun Gokmen
and managerial support from John Rozen. J. Choi thanks Kailash Gopalakrishnan for his managerial support.
REFERENCES
[1] Stefano Ambrogio, Pritish Narayanan, Hsinyu Tsai, Robert M Shelby, Irem Boybat, Carmelo Nolfo, Severin Sidler, Massimo Giordano, Martina
Bodini, Nathan CP Farinha, et al. 2018. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature 558, 7708 (2018),
60.
[2] G Bersuker, DC Gilmer, and D Veksler. 2019. Metal-oxide resistive random access memory (RRAM) technology: Material and operation details and
ramifications. In Advances in Non-Volatile Memory and Storage Technology. Elsevier, 35–102.
[3] M. Bocquet, T. Hirtzlin, J.-O. Klein, E. Nowak, E. Vianello, J.-M. Portal, and D. Querlioz. 2018. In-Memory and Error-Immune Differential ReRAM
Implementation of Binarized Deep Neural Networks. In 2018 IEEE International Electron Devices Meeting (IEDM). 20.6.1–20.6.4.
https://doi.org/10.1109/IEDM.2018.8614639
[4] An Chen. 2013. Area and thickness scaling of forming voltage of resistive switching memories. IEEE Electron Device Letters 35, 1 (2013), 57–59.
[5] Ching-Yi Chen, Hsiu-Chuan Shih, Cheng-Wen Wu, Chih-He Lin, Pi-Feng Chiu, Shyh-Shyuan Sheu, and Frederick T Chen. 2014. RRAM defect
modeling and failure analysis based on march test and a novel squeeze-search scheme. IEEE Trans. Comput. 64, 1 (2014), 180–190.
[6] Ping Chi, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie. 2016. Prime: A novel processing-in-memory
architecture for neural network computation in reram-based main memory. In ACM SIGARCH Computer Architecture News, Vol. 44. IEEE Press,
27–39.
[7] Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. 2018. PACT:
Parameterized Clipping Activation for Quantized Neural Networks. arXiv preprint arXiv:1805.06085 (2018).
[8] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks: Training deep neural
networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830 (2016).
[9] Tayfun Gokmen, Murat Onen, and Wilfried Haensch. 2017. Training Deep Convolutional Neural Networks with Resistive Cross-Point Devices.
Frontiers in Neuroscience 11 (2017), 538. https://doi.org/10.3389/fnins.2017.00538
[10] Tayfun Gokmen, Malte Rasch, and Wilfried Haensch. 2018. Training LSTM Networks with Resistive Cross-Point Devices. arXiv preprint
arXiv:1806.00166 (2018).
[11] Tayfun Gokmen and Yurii Vlasov. 2016. Acceleration of deep neural network training with resistive cross-point devices: design considerations.
Frontiers in neuroscience 10 (2016), 333.
[12] Alessandro Grossi, Cristian Zambelli, Piero Olivo, Enrique Miranda, Valeriy Stikanov, Christian Walczyk, and Christian Wenger. 2016. Electrical
characterization and modeling of pulse-based forming techniques in RRAM arrays. Solid-State Electronics 115 (2016), 17–25.
[13] X. Guo, F. M. Bayat, M. Bavandpour, M. Klachko, M. R. Mahmoodi, M. Prezioso, K. K. Likharev, and D. B. Strukov. 2017. Fast, energy-efficient,
robust, and reproducible mixed-signal neuromorphic classifier based on embedded NOR flash memory technology. In 2017 IEEE International
Electron Devices Meeting (IEDM). 6.5.1–6.5.4. https://doi.org/10.1109/IEDM.2017.8268341
[14] W. Haensch, T. Gokmen, and R. Puri. 2019. The Next Generation of Deep Learning Hardware: Analog Computing. Proc. IEEE 107, 1 (Jan 2019),
108–122. https://doi.org/10.1109/JPROC.2018.2871057
[15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference
on computer vision and pattern recognition. 770–778.
[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Identity mappings in deep residual networks. In European conference on computer
vision. Springer, 630–645.
[17] Shubham Jain, Aayush Ankit, Indranil Chakraborty, Tayfun Gokmen, Malte J. Rasch, Wilfried Haensch, Kaushik Roy, and Anand Raghunathan.
2019. Neural network accelerator design with resistive crossbars: Opportunities and challenges. IBM Journal of Research and Development 63 (2019),
10:1–10:13.
[18] Hyoungsub Kim, Paul C McIntyre, Chi On Chui, Krishna C Saraswat, and Susanne Stemmer. 2004. Engineering chemically abrupt high-k metal
oxide/silicon interfaces using an oxygen-gettering metal overlayer. Journal of Applied Physics 96, 6 (2004), 3467–3472.
[19] K Kinoshita, K Tsunoda, Y Sato, H Noshiro, S Yagaki, M Aoki, and Y Sugiyama. 2008. Reduction in the reset current in a resistive random access
memory consisting of NiOx brought about by reducing a parasitic capacitance. Applied Physics Letters 93, 3 (2008), 033506.
[20] Can Li, Miao Hu, Yunning Li, Hao Jiang, Ning Ge, Eric Montgomery, Jiaming Zhang, Wenhao Song, Noraica Dávila, Catherine E Graves, et al. 2018.
Analogue signal and image processing with large memristor crossbars. Nature Electronics 1, 1 (2018), 52.
[21] Rui Liu, Heng-Yuan Lee, and Shimeng Yu. 2017. Analyzing inference robustness of ReRAM synaptic array in low-precision neural network. In
Solid-State Device Research Conference (ESSDERC), 2017 47th European. IEEE, 18–21.
[22] Yun Long, Taesik Na, and Saibal Mukhopadhyay. 2018. ReRAM-based processing-in-memory architecture for recurrent neural network acceleration.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 99 (2018), 1–14.
[23] F. Merrikh-Bayat, X. Guo, M. Klachko, M. Prezioso, K. K. Likharev, and D. B. Strukov. 2018. High-Performance Mixed-Signal Neurocomputing
With Nanoscale Floating-Gate Memory Cell Arrays. IEEE Transactions on Neural Networks and Learning Systems 29, 10 (Oct 2018), 4782–4790.
https://doi.org/10.1109/TNNLS.2017.2778940
[24] Feng Pan, Shuang Gao, Chao Chen, C Song, and F Zeng. 2014. Recent progress in resistive random access memories: materials, switching
mechanisms, and performance. Materials Science and Engineering: R: Reports 83 (2014), 1–59.
[25] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. Xnor-net: Imagenet classification using binary convolutional
neural networks. In European Conference on Computer Vision. Springer, 525–542.
[26] Louis P Romero, Stefano Ambrogio, Massimo Giordano, Giorgio Cristiano, Martina Bodini, Pritish Narayanan, Hsinyu Tsai, Robert M Shelby, and
Geoffrey W Burr. 2019. Training fully connected networks with resistive memories: Impact of device failures. Faraday Discussions 213 (2019),
371–391.
[27] Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R Stanley Williams, and Vivek Srikumar.
2016. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. ACM SIGARCH Computer Architecture News
44, 3 (2016), 14–26.
[28] Hsiu-Chuan Shih, Ching-Yi Chen, Cheng-Wen Wu, Chih-He Lin, and Shyh-Shyuan Sheu. 2011. Training-based forming process for RRAM yield
improvement. In 29th VLSI Test Symposium. IEEE, 146–151.
[29] Linghao Song, Xuehai Qian, Hai Li, and Yiran Chen. 2017. Pipelayer: A pipelined reram-based accelerator for deep learning. In High Performance
Computer Architecture (HPCA), 2017 IEEE International Symposium on. IEEE, 541–552.
[30] Xiaoyu Sun, Shihui Yin, Xiaochen Peng, Rui Liu, Jae-sun Seo, and Shimeng Yu. [n. d.]. XNOR-ReRAM: A scalable and parallel resistive synaptic
architecture for binary neural networks. algorithms 2 ([n. d.]), 3.
[31] Neil ompson and Svenja Spanuth. 2018. e Decline of Computers As a General Purpose Technology: Why Deep Learning and the End of
Moores Law are Fragmenting Computing. Available at SSRN 3287769 (2018).
[32] Gabriel Villarrubia, Juan F De Paz, Pablo Chamoso, and Fernando De la Prieta. 2018. Articial neural networks used in optimization problems.
Neurocomputing 272 (2018), 10–16.
[33] Lixue Xia, Wenqin Huangfu, Tianqi Tang, Xiling Yin, Krishnendu Chakrabarty, Yuan Xie, Yu Wang, and Huazhong Yang. 2017. Stuck-at fault
tolerance in RRAM computing systems. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 8, 1 (2017), 102–115.
[34] Ying Zhang, Mohammad Pezeshki, Philémon Brakel, Saizheng Zhang, César Laurent, Yoshua Bengio, and Aaron Courville. 2016. Towards End-to-End
Speech Recognition with Deep Convolutional Neural Networks. Interspeech 2016 (2016), 410–414.
[35] Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. 2016. Dorefa-net: Training low bitwidth convolutional neural
networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160 (2016).
[36] Z. Zhou, P. Huang, Y. C. Xiang, W. S. Shen, Y. D. Zhao, Y. L. Feng, B. Gao, H. Q. Wu, H. Qian, L. F. Liu, X. Zhang, X. Y. Liu, and J. F. Kang. 2018. A
new hardware implementation approach of BNNs based on nonlinear 2T2R synaptic cell. In 2018 IEEE International Electron Devices Meeting (IEDM).
20.7.1–20.7.4. https://doi.org/10.1109/IEDM.2018.8614642
[37] Neta Zmora, Guy Jacob, and Gal Novik. 2018. Neural Network Distiller. https://doi.org/10.5281/zenodo.1297430
