Many mainstream applications require multiply-accumulate calculations, such as image processing and neuromorphic computing. Multiply-accumulate calculations using memristor crossbar arrays is a remarkable method for extremely high implementation efficiency, whereas the memristor array fabrication technology is still not mature and it is difficult to fabricate large-scale arrays with high-yield, which will seriously affect the performance of the application running on the RRAM crossbar. This paper proposes an inputs split based calibration method that improves the application accuracy in tolerating variations and stuck-at-fault of memristor devices. To demonstrate the performance of the calibration algorithm, the case of image sharpening and three neural networks architectures are applied for simulation experiments. And experimental results show that the calculation accuracy can be improved by up to 26.83% at 90% yield of crossbar arrays, and the success rate of the algorithm can be as high as 99.3% when there are several arrays cascaded. It is of great significance to the application of arrays in multiply-accumulate computation.
I. INTRODUCTION
As demand for neuromorphic computing rapidly develops, the traditional hardware implementation becomes less efficient [1] . In recent years, neuromorphic computing systems based on the conventional FPGA, GPU and CPU are widely used in industry and enterprises [2] - [4] , such as mobile or real-time and robotics scenarios. As an important part of a neuromorphic computing system, a neural network often uses a weight matrix to represent a group of synapses [5] . Consequently, the computation process of deep neural networks (DNNs) can be transformed into matrix multiplication, but traditional hardware implementation of neural networks require a large amount of memory and consume excessive system resources [6] , [7] .
In recent years, the memristor [8] has received significant attention as synapses for neuromorphic systems [9] - [12] , and memristor crossbar arrays can easily carry out matrix multiplication [13] which is a computationally expensive operation for neural networks [14] . The basic structure of a memristor crossbar array [15] is shown in Fig. 1 . A memristor crossbar is an array with memristive devices at its cross-The associate editor coordinating the review of this manuscript and approving it for publication was Asif Islam Khan. points, which receives voltage inputs (at its rows) and produces an output current (at its columns) that is the weighted summation of the encoded weights at the column and the input voltage pulses. It is a direct result of the Kirchhoff's law. The voltage pulses as the input, and the current output along a column from any cross-point will be the product of the conductance at that cross-point. Moreover, the memristor can achieve an integration density of 100Gbits/cm 2 [16] , which is a promising device for large-scale and highly parallel neuromorphic computing [17] , [18] .
However, device variations and manufacturing yield are still a big issue in memristor fabrication [19] , [20] , and the memristor may be damaged by testing cycles and aggressive programming. These situations will remarkably degrade the multiply-accumulate computation accuracy. Unfortunately, there are few researches on improving application accuracy of memristive crossbar arrays. Most of works focused on the memristor fabrication reliability as electronic synapses [21] , [22] , and some works proposed solutions to neuromorphic computing with few levels [23] , [24] or damaged device [25] . In addition, a number of hardware-based solutions [26] - [28] are proposed, however, these methods also bring inevitable power consumption and hardware cost. To maintain the performance affected by stuck-at faults (SAF) or resistance variations of the application running on the RRAM crossbar, we propose a software-based calibration method for multiplyaccumulate computation on RRAM crossbar arrays, which splits the input vector into a standard vector and another input. And to evaluate the performance of the method, the pearson correlation coefficient is introduced to measure the correlation of ideal output and calibration output. To show the performance of calibration method more intuitively, we executed the experiments on image sharpening and some neuromorphic computing architecture based on memristive arrays [10] , [29] , [30] to verify the calibration performance of the array in cascade, experimental results demonstrate that the accuracy can be effectively improved by the calibration method when the yield of device is too low.
The rest of this paper is organized as follows, Section II describes the related work of this paper. Section III introduces the calibration methodology for memristor crossbar arrays. Section IV introduces the pearson correlation coefficient to measure the correlation and exhibits the simulation performance of calibration method on a case of image sharpening and some neuromorphic computing architecture. The final Section V concludes the paper.
II. RELATED WORK
As memristor device suffers from various reliability issues, the idea of exploiting fault tolerance to improve the application efficiency of memristor crossbar arrays has attracted some contributions.
Liu et al. [25] proposed a retraining method for memristorbased neuromorphic computing systems to regain the network weights. This approach classifies synapses weights into insignificant and significant sets: defects on insignificant synapses hardly affects the system accuracy whereas the degradation of the accuracy is giant when the significant synapses fall at bad devices. The algorithm is designed for neural networks on crossbar arrays and requires knowledge of the structure of the neural network.
The error correct code (ECC) [31] with crossbar arrays was utilized to detect errors by using specific codes and recover the corrupted data. In contrast, the memristor-based neuromorphic computing showed a totally different vulnerability on each individual devices [32] . Besides, the computing accuracy reflects the impact and not the logic error of devices.
Li et al. [33] found that the memristor device slowly drifts from its original programmed state (e.g., a voltage of 0.1V causes the device to deviate about 2% from the original state in 1s), so the accuracy of memristor crossbar arrays based system decreases gradually. They presented an inline calibration method to improve system accuracy under drift.
A pre-computation algorithm called ''VAT'' was proposed by Liu et al. [34] , which maps the vital synapses to the memristor device with low variation. In the training process of neural networks, a scalar parameter γ is used for evaluating the resistance variation of memristor crossbar arrays. The new weight matrix derived by repeatedly self-adjusting the parameter γ is utilized to construct the neural network.
III. PROPOSED CALIBRATION METHOD
Assuming a standard input vector V STD in , it has n elements which have a value of s for a n × m memristor crossbar array.
So a input vector V in (the vector to be calculated) can be described as
where V in is the input vector that is actually input into the crossbar array, and V i is the i th element of the V in vector. The matrix G is used to describe the conductance of the devices in the crossbar array. The G can be described as
where G n,m represents the conductance of the device in column m of row n. So when the input vectors are V in and V STD in respectively, the ideal output of crossbar array is described as
where V out represents the ideal output of vector V in and V STD out represents the ideal output of V STD in . Considering the yield of the crossbar array, the change of the conductance of the memristor in the array can be described by matrix G.
After considering the yield, the conductance in the array can be described as
and the original output of V in can be described as
We use the δ to represent the difference between the original output and the ideal output of V in , which is the error of the output due to the yield issue. So the δ can be described as
Similarly, the original output of V STD in is described below. 
As it can be seen from Eq. (12), we can pre-calculate V STD out by the matrix G, so the output error of the V STD in can be calculated from the original output V STD out (as shown in Eq. (12)). The δ is estimated by P and δ STD . And the definition of P is described as
where δ is an estimate of δ . Therefore, the calibrated output V out can be described as
The calibration method takes V out as the output of the input vector V in , and Fig. 2 shows the flowchart of the method.
According to the principle of the method, the process of the calibration method can be summarized as follows.
1) Define a 1 × n standard vector V STD in that satisfies each element in the vector to be equal, and the value (15)) The G i,k (i = 1, 2, .., n; j = 1, 2, .., m) and V i (i = 1, 2, ..n) can affect the accuracy, especially when G i,k is so large. In other words, when G i,k is very large, this also means that the yield of RRAM is very low.
IV. EXPERIMENTS A. EXPERIMENTS SETUP
All experiments are conducted using an Intel Xeon E7 (2.6 GHz), 16GB DDR4 and a NVIDIA Titan XP graphics card. This work is implemented with the Theano python open-source library to train the neural network models, and HSPICE is also used for some circuit simulations. Experiments are conducted with Monte-Carlo simulation method.
1) MEMRISTOR MODEL
The simulation is conducted based on the Ti/AlOx/TaOx/Pt memristor device, which was proposed in our prior work [22] . The bottom electrode (Pt, 25nm) and the resistive switching layer (AlOx/TaOx, 5nm/3nm) were deposited by electron beam evaporation in the fabrication process. The top electrode (Ti, 30nm) was grown by magnetron sputtering. By the lump, three lithography processes were performed to form the patterns of the three different layers.
Multilevel devices have a resistance range of 1K to 12K , which can be tuned to target resistance by repeated programming, and 200 states achieved by applying triangular pulses are utilized for simulations. 
2) DEVICE DEFECTS GENERATION METHOD
Device variations and RRAM yield are considered as device defects in our simulations. In detail, the damaged devices denote the single-bit failure (SBF) [25] , that means a memristor that fixes in a low (900-1100 ) or a high (10800-13200 ) resistance state. When the simulated device occurs a SBF issue, the device is at a high resistance with a probability of 50%, and the probability of 50% is at a low resistance.
Fluctuating devices are also included in the device defects generation. The variation of memristor devices mainly includes two types: switching variation [35] and parametric variation [36] . The driving circuit causes switching variation during the writing or reading cycle, programmed voltage with micro variation leads to a large variation in the memristor resistance. The imperfect fabrication such as oxide thickness, line-edge roughness and random dopants causes the parametric variation. When the resistance of the memristor fluctuates, the change of resistance on RRAM crossbar array, from R ij to R ij , is described as follows:
where the θ ij denotes the resistance variation, which follows the lognormal distribution [26] . The parameter σ is utilized to denote the extent of the variation in the experimental simulation. Unless otherwise specified in the following simulation, the default value of σ is 0.8. Fig. 3(a) shows a 28 × 28 pattern which selects from the MNIST dataset [37] that simulated programming in a 32 × 32 memristor crossbar array. In the process of simulated programming, 1%-20% of errors are generated by the write circuit [17] . Fig. 3(b) demonstrates the resistance distribution of a 15% defects 32 × 32 crossbar array. From the figure it can be seen that defects (variations and SBF) distribute stochastically across the crossbar array and blur the programmed pattern.
3) IMAGE SHARPENING WITH CALIBRATION METHOD
In order to show the performance of calibration method more intuitively, a case of image sharpening is introduced. The memristor crossbar array is utilized for edge detection.
A laplace mask is used to implement the convolution operation on the input image for achieving the edge of the image, a 3 × 3 laplace mask can be described as
When the edge detection is completed, the sharpened image can be obtained by subtracting the edge from the original image. Fig. 4 demonstrates the case of image sharpening. As shown in Fig. 4(a) , the classic image 'Lena' with 512 × 512 pixels is resized to 270 × 270 pixels after median filtered, and the zero-padding operation was executed around the image for the convenience of processing, so the size of the final image is 272 × 272. The input image is segmented into 9 × 9 small fields of 32 × 32 pixels, and a 1024 × 1800 crossbar array is used for completing the edge detection. The laplace mask is mapped into the array, which is used for performing the convolution operation. Each 32 × 32 input field produced 900 outputs by convolution. As shown in Eq. (17), positive and negative weights are existed in mask, so 1800 outputs were produced by differential output.
In detail, The image is divided into RGB channels for processing, each channel sends 32 × 32 field into the array, and 900 outputs generated after differential processing. The output of the array is the original output, and the output after applying the calibration method is the calibrated output. Fig. 4(b) shows the ideal output for an array without defects, and Fig. 4(c) illustrates the original output of the array when the array yield is 95%. Fig. 5 (a)-(c) demonstrate histograms of the input voltages of the R/G/B three channels, and the standard voltage s for each channel is 0.0769V , 0.0380V and 0.0392V respectively.
In order to better evaluate the correlation between original output and calibrated output and ideal output, the pearson correlation coefficient (PCC) is introduced to measure. The formula of PPC can be described as
where ρ X ,Y is the coefficient, X and Y represent two vectors that need to be measured, and N indicates the dimension of the input vectors. In practical computation, X is replaced by ideal output and Y is replaced by original output or calibrated output. Table 1 demonstrates the correlation strength corresponding to different ranges of correlation coefficients. A thousand groups of ideal output, original output and calibrated output are used for PCC calculation. Fig. 5(d) shows the correlation coefficients of 1,000 groups of output at 95% yield. It can be seen that original outputs coefficients are below 0.40 and calibrated outputs coefficients are above 0.50, the minimum value of the calibrated output differs from the maximum value of the original output by 0.19. According to the statistics, the average of original outputs coefficient is 0.197, while the calibrated is 0.742. From Table 1 , it can be seen that the correlation between the original output and the ideal output is very weak correlation, while the correlation between the calibrated output and the ideal output can be strong correlation. Table 2 shows the average PCC of different defects. In Table 2 , both variation and stuck-at faults are taken into account. In other words, the ''variation'' and ''stuck-at faults'' share the same proportion. At 98% array yield, the PCC of original output is 0.386, which corresponds to weak correlation, while the PCC of calibrated output can reach 0.827, which corresponds to extremely strong correlation. Obviously, the PCC decreases as the proportion of defects percentage increases. When the yield is only 90%, the PCC of original output is only 0.088, and the average PCC of the calibrated output can still reach 0.650, which remains strong correlation. 
B. CALIBRATION IN NEUROMORPHIC COMPUTING
In the experiments, the MNIST dataset [37] is chosen for accuracy verification. The trained synapses weights are converted to conductivity values of memristors that fall within the bounded range of devices state.
Experiments are executed based on a 784 × 10 singlelayer perceptron (SLP) [29] , a 784 × 100 × 10 multilayer perceptron (MLP) [30] and a ''3-2-1'' RRAM crossbar based cascaded architecture [12] which includes six convolutional neural networks. And each 28 × 28 handwritten image is converted to a 784 × 1 vector by voltage generator as the crossbar array input. Fig. 6 demonstrates the implementation on crossbar array. It can be seen that a 784 × 10 SLP is mapped into a 784 × 20 memristive array. The matrix W represents a 784 × 10 synaptic networks, which includes positive and negative values. Therefore, each original value is extended in two parts, W + and W − . If one element is a negative value, then the value is defined in W − , and the value of W + is zero. Similarly, the positive value is defined in W + and the W − is zero [10] . In the actual mapping of the array, we use the minimum conductance (Smin) of device to indicate the zero value. 
1) SINGLE-LAYER PERCEPTRON ON CROSSBAR ARRAY

2) MULTILAYER PERCEPTRON ON CROSSBAR ARRAY
In order to verify the performance of the method when two arrays are cascaded, a 784 × 100 × 10 MLP is applied. Fig. 7 shows the implementation on two crossbar arrays. The weight matrix W 1 is mapped to a 784 × 200 array. Similarly, matrix W 2 is mapped to a 100 × 20 crossbar array, and the output of the first array is treated as the input of the second array after passing through the absolute activation function (Abs) [10] . It can be seen that the outputs of both arrays are all calibrated.
3) CASCADED ARCHITECTURE ON CROSSBAR ARRAY
In order to further verify the performance of the calibration method which utilizes on multiply crossbar arrays cascaded, the cascaded architecture [12] is implemented with largescale memristor crossbar arrays. A ''3-2-1'' cascaded framework is utilized for simulation, which includes six basic computation units (BCU), and each BCU is a convolutional neural network. In each BCU, five 9 × 9 convolution kernels are mapped into a 784 × 4000 array, and each kernel performs 400 convolution operations, so there are 4000 outputs including positive and negative weights. In the same way, the fully connected layer is mapped into a 100 × 20 crossbar array. And the activation circuit and pooling circuit are referred to [10] . 
4) SIMULATION RESULTS
Device variations and yield are considered in our simulations. The 1%-20% defects in memristor crossbar array were randomly generated. In the experiments, the s we used is determined according to the median of the training dataset (excluded zero value), and here s = 0.0831. Fig. 9 illustrates the original accuracy and calibrated accuracy with defects percent, and the solid lines in the figure represent the average accuracy of each set of data. In each percent, the positions of fluctuating devices and damaged devices are randomly generated, and 100 experiments are executed in each defect percent. As figures shown, with the increase of defect percentages, the accuracy rate decreases, in other words, the computation errors of arrays increase. Fig. 9(a) shows the comparison between the calibrated accuracy and the original accuracy of SLP. It can be seen that the average accuracy has been improved after calibrated. For instance, only 49.85% original accuracy can be obtained on average when considering 20% defects, and the accuracy can be calibrated to 73.05% on average.
The degradation of MLP increases with number of defects as shown in Fig. 9(b) . It demonstrates that the accuracy can be calibrated to 92.45% and 81.32%, considering 2% and 4% random defects respectively.
As shown in Fig. 9(c) , when the defects is 10.0% (90% yield of arrays), the original accuracy of cascaded neural networks is 32.53%, and the calibrated accuracy is 59.36%, which is 26.83% higher than that. Fig. 9(d) depicts the number of successes (improved accuracy) and failures (degraded accuracy) after calibrated is executed in 1000 sets of data. In the SLP implementation (only has one array), the success rate of the calibration method is 76%, and the success rate is as high as 96.2% and 99.3% when there are several arrays (MLP and cascaded framework implementations). It can be seen that the number of successes of the calibration algorithm is much greater than the number of failures.
It can be seen that the cascaded architecture provides dramatic improvements over the non-calibrated version, but the MLP provides only modest accuracy improvements even though it is a cascaded architecture. In fact, the output of each array is calibrated in the experiments (as shown in Fig. 7 and Fig. 8 ), so the more arrays that are cascaded, the higher the calibrated accuracy will be. Table 3 shows the performance comparison between this work and the ''Vortex'' algorithm [34] . It can be seen that this work dramatically outperforms the ''Vortex'', especially when the resistance fluctuates greatly (σ 1.0). From the table it can be seen that this work significantly surpasses ''Vortex'' by 29.36% accuracy with σ = 1.0 and 42.20% accuracy with σ = 1.2.
We further compare this work with the method of Ref. [25] which retrained the neural networks for improving the accuracy. 
V. CONCLUSION
In this paper, a calibration method is proposed for tolerating variations and stuck-at-fault on RRAM crossbar, which effectively pulls up the calculation accuracy. Three neuromorphic computing architectures and a case study of image sharpening were mapped into the array for experiments. From the simulation results above, the accuracy of calculations has been greatly improved (nearly 30% of cascaded framework). Experiments show that the number of successes of the calibration method is far greater than the number of failures, the success rate of the method can be as high as 99.3%. Compared with state-of-the-art algorithm, the proposed method can obtain desired accuracy with defects on crossbar arrays. HUI XU received the Ph.D. degree in information engineering from the National University of Defense Technology, Changsha, China, in 1995, where he is currently a Professor with the College of Electronic Science.
His current research interests include circuits and signal processing systems.
JIWEI LI received the M.S. degree in electrical engineering from the National University of Defense Technology, Changsha, China, where he is currently pursuing the Ph.D. degree in electronic science and technology.
His current research interests include circuit design and spiking neural networks.
ZHIWEI LI received the M.S. degree in electronic science and technology from the National University of Defense Technology, Changsha, China, where he is currently pursuing the Ph.D. degree in electronic science and technology.
His current research interests include circuit design, 3-D memory technology, memristor modeling, and memristive systems.
YI SUN received the M.S. degree in electrical engineering from the National University of Defense Technology, Changsha, China, where he is currently pursuing the Ph.D. degree in electronic science and technology.
His current research interests include neuromorphic computing and memristor characterization.
QINGJIANG LI received the Ph.D. degree in electronic science from the National University of Defense Technology, Changsha, China, in 2014, where he is currently an Associate Professor with the College of Electronic Science.
His current research interests include memristor characterization and applied circuits.
HAIJUN LIU received the Ph.D. degree in information and telecommunication systems from the National University of Defense Technology, Changsha, China, in 2010, where he is currently an Associate Professor with the College of Electronic Science.
His current research interests include the study of memristor and memristive systems, and applications based on two-terminal resistive switches.
