Abstract-The memristor has been regarded as a promising candidate for the basic cell of next-generation computing systems. Compared to the traditional MOSFET device, the memristor is much more efficient in energy and area. However, the biggest obstacle to replacing CMOS devices with memristors is precision: the highest writing precision reported so far is 8 bits, which does not meet the requirements of most numerical computation. In this paper, we propose a memristor crossbar-based computing system for high-precision computation. We divide the multiple bits of a computation into many groups. The computation of each group is carried out by a memristor crossbar-based structure. Analog-to-Digital Converters (ADCs) are used to extract the valid most significant bits of each group. These valid bits are then combined to obtain the final result with high precision. Moreover, the precision of our computing system is user-defined, which means it can conduct both high-precision and low-precision computation. It can therefore realize both neuromorphic computation, such as pattern recognition, and numerical computation, such as digital signal processing algorithms.
I. INTRODUCTION
HP Lab [1] demonstrated the first memristor in 2008; the device had been predicted by L. O. Chua [2] about 40 years earlier. The physical realization of the memristor by HP Lab is based on a TiO2 thin-film structure. Many memristive materials and devices have been tried since [3], [4], [5], [6], [7]. The memristor has been demonstrated to be a promising device for many applications. For example, memristor-based non-volatile memory can achieve higher integration density than traditional flash memory. Memristor-CMOS hybrid structures have been demonstrated to be useful for reconfigurable computing.
Moreover, due to the similarity between memristive and synaptic behaviors, the memristor provides an efficient way to implement neuromorphic computing systems. The memristor crossbar-based structure has been widely employed for this purpose. In the memristor crossbar-based structure, memristors are placed at the cross-points of the horizontal and vertical metal wires. The crossbar structure is similar to traditional neural network models: the memristors implement the synaptic connections and can be programmed to hold the synaptic weights.
The wide application of memristors to neuromorphic computing is mainly due to the similarity between the memristor and the synapse. It is also because neuromorphic computing is not sensitive to the large random process variations of memristive devices. Recently, an algorithm was proposed in [8], [9] to improve the writing precision of memristors. It can achieve 8-bit writing precision with on the order of a hundred writing and reading operations. Although the writing precision has been significantly improved, it still cannot satisfy the requirements of high-precision computation, where the error should be below $10^{-10}$ in most cases.
In this paper, we propose a memristor crossbar-based computing system towards higher precision, building on the result of [8]. Because the precision of a single memristor is limited, we divide the multiple bits of a computation into many groups. The computation of each group can achieve adequate accuracy with a memristor crossbar-based structure despite the limited precision of the memristors. Combining the results of all the groups is challenging, because only the several most significant bits of the result of a group are valid. Therefore, we employ Analog-to-Digital Converters (ADCs) to extract these valid most significant bits. The valid bits are then combined to obtain the final result with high precision. We tested our design with fixed-point multiplications.
The experimental results demonstrate that our design can achieve $10^{-16}$ precision for 32-bit fixed-point multiplications.
The rest of the paper is organized as follows. Section II reviews the background of the memristor and its computation structure. The detailed design is presented from the aspects of theory and circuit structure in Sections III and IV, respectively. The experimental results are reported in Section V. Section VI concludes the paper.
II. BACKGROUND REVIEW
A. Memristor
We first give a brief introduction to the memristor. A typical structure of the memristor proposed by HP Lab in [1] is shown in Figure 1. It is a semiconductor film, such as copper oxide or hafnium oxide, sandwiched between two metal contacts. The resistance of the device is determined by two variable resistors connected in series, as shown in Figure 1. The device has a region with a high dopant concentration, shown in darker color, and a region with a low dopant concentration, shown in lighter color. Suppose the thickness of the memristor is D, the length of the highly doped region is w, the resistance of the doped region is $R_{on}$, the resistance of the region without dopants is $R_{off}$, and the mobility of the ions is $\mu_v$. We then have the following I-V relationship of the memristor:
$$v(t) = \left( R_{on}\frac{w(t)}{D} + R_{off}\left(1 - \frac{w(t)}{D}\right) \right) i(t), \qquad \frac{dw(t)}{dt} = \mu_v \frac{R_{on}}{D}\, i(t), \tag{1}$$
where v(t) is the voltage applied across the two ports of the memristor and i(t) is the current flowing through it. Define the doped side of the memristor to be $p_1$ and the other side to be $p_2$. According to (1), if the current flows from $p_1$ to $p_2$, the doped area expands and w becomes larger, which decreases the resistance. Similarly, if the current flows from $p_2$ to $p_1$, the resistance of the memristor increases. The I-V relationship of a memristor is thus the curve shown in Figure 2 [1].
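The dependence of the resistance on the direction of the current can be illustrated numerically. The following is a minimal sketch, assuming the linear ion-drift model of [1] and simple forward-Euler integration; the function name `simulate_memristor` and all default parameter values are illustrative assumptions, not values taken from this paper.

```python
def simulate_memristor(current, dt, D=10e-9, r_on=100.0, r_off=16e3,
                       mu_v=1e-14, w0=5e-9):
    """Forward-Euler integration of the linear ion-drift model (assumed).

    current : list of current samples in amperes (positive = p1 -> p2)
    dt      : time step in seconds
    Returns a list of (w, resistance) pairs, one per time step.
    All default device parameters are illustrative assumptions.
    """
    w = w0
    trace = []
    for i in current:
        # dw/dt = mu_v * R_on / D * i(t)
        w += mu_v * r_on / D * i * dt
        # the doped region cannot leave the device
        w = min(max(w, 0.0), D)
        # two variable resistors in series
        resistance = r_on * w / D + r_off * (1.0 - w / D)
        trace.append((w, resistance))
    return trace


forward = simulate_memristor([1e-4] * 100, dt=1e-3)    # current from p1 to p2
backward = simulate_memristor([-1e-4] * 100, dt=1e-3)  # current from p2 to p1
print(forward[-1][1] < forward[0][1])    # resistance falls
print(backward[-1][1] > backward[0][1])  # resistance rises
```

With a constant positive current the doped width grows and the resistance drops; reversing the current has the opposite effect, in line with the description above.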
B. Crossbar-based Structure for Neuromorphic Computing
The most common application of the memristor is neuromorphic computing, for which the basic structure is the crossbar array. Most artificial neural networks can be decomposed into a series of matrix-vector multiplications, i.e.,
$$y = Mx,$$
where y and x are vectors and M is a matrix. Such a computation can be accomplished by a memristor crossbar structure. An example of the crossbar structure is shown in Figure 3.
In the memristor crossbar-based structure, memristors are placed at the cross-points of the horizontal and vertical metal wires. x and y are the input and output of the crossbar array, respectively. The resistances (conductances) of the memristors can be viewed as the values of the elements of the matrix M. In neuromorphic computing, the resistances of the memristors are programmed to represent the weights of the artificial neural network for a specific application. Based on this structure, many machine learning algorithms can be realized [10].
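The matrix-vector product performed by a crossbar can be sketched in a few lines. This is an idealized illustration only: it assumes that each column current is the sum of input voltage times conductance over all rows, and it ignores wire resistance and sneak paths. The function name `crossbar_output` and the numbers are hypothetical.

```python
def crossbar_output(conductances, voltages):
    """Ideal crossbar: column current I_j = sum_i V_i * G[i][j].

    conductances[i][j] is the conductance at row i, column j;
    voltages[i] is the input applied to row i.
    """
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


G = [[0.5, 0.25],
     [0.125, 1.0]]       # programmed conductances (arbitrary example values)
x = [1.0, 0.5]           # input voltages
print(crossbar_output(G, x))   # [0.5625, 0.75]
```

In matrix terms the output equals the transpose of the conductance matrix applied to the input vector, so the matrix M of the text corresponds to the programmed conductances read column by column.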
Note that neuromorphic computing is not sensitive to the large random process variations of memristive devices. As a result, the resistances of the memristors do not need to be programmed with high precision. For other applications, such as DSP algorithms, the resistances of the memristors must be programmed with much higher precision. Recently, an algorithm was proposed in [8] to improve the writing precision of memristors. It can achieve 8-bit writing precision with on the order of a hundred writing and reading operations. Unfortunately, such a precision is still not acceptable for high-precision computation, where the error should be below $10^{-10}$ in most cases.
III. DATA REPRESENTATION
In this section, we will present the theoretical analysis of our proposed memristor crossbar-based computing scheme towards higher precision.
A. Representation of the High-precision Data by Multiple Memristors
In prior works, one memristor usually corresponds to one element of the matrix M. To improve the computation accuracy, we propose to store high-precision data in a group of memristors. Although the precision of one memristor is limited, multiple memristors together can accurately represent data with high precision. For convenience, we consider only fixed-point unsigned numbers in this paper. We also assume that the target value α to be stored in the memristors lies in the range (0, 1). The data α can be written in binary form as
$$\alpha = \sum_{i=1}^{n} \alpha_i\, 2^{-i},$$
where $\alpha_i \in \{0, 1\}$ is the i-th bit of the binary representation and n is the total number of bits. We aim to store α in k memristors. For convenience, we assume that n = k · m, so that each memristor holds m bits.
The value stored in the j-th memristor can be expressed as
$$\beta_j = \sum_{i=1}^{m} \alpha_{(j-1)m+i}\, 2^{-i},$$
where $\beta_j$ is the data stored in the j-th memristor. α can also be expressed by
$$\alpha = \sum_{j=1}^{k} \beta_j\, 2^{-(j-1)m}.$$
It is important to point out that the least significant bit (LSB) of $\beta_j$ ($1 \le j \le k$) is $2^{-m}$, in contrast to $2^{-km}$ for α in its original form. In other words, a single memristor would need a programmed-resistance precision of $2^{-km}$ to store the original data α accurately, whereas each $\beta_j$ ($j = 1, \cdots, k$) can be stored accurately with a precision of only $2^{-m}$. With multiple memristors, the precision requirement is greatly reduced.
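The grouping can be checked with exact rational arithmetic. The sketch below assumes that the first group holds the m most significant bits and that every group is read as an m-bit fraction, which is the reading consistent with an LSB of $2^{-m}$ per memristor; the helper names `split_into_groups` and `recombine` are hypothetical.

```python
from fractions import Fraction


def split_into_groups(bits, m):
    """Split alpha_1..alpha_n (most significant first) into k m-bit fractions."""
    assert len(bits) % m == 0, "n must be a multiple of m"
    groups = []
    for start in range(0, len(bits), m):
        chunk = bits[start:start + m]
        # each group is a fraction whose least significant bit is 2^-m
        groups.append(sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(chunk)))
    return groups


def recombine(groups, m):
    """Rebuild alpha by shifting the j-th group (0-based) right by j*m bits."""
    return sum(beta * Fraction(1, 2 ** (j * m)) for j, beta in enumerate(groups))


bits = [1, 0, 1, 1, 0, 1]            # alpha = 0.101101 in binary = 45/64
groups = split_into_groups(bits, m=3)
print(groups)                         # [Fraction(5, 8), Fraction(5, 8)]
print(recombine(groups, m=3))         # 45/64
```

Each group only needs a resolution of $2^{-3}$ here, while the recombined value has a resolution of $2^{-6}$.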
B. Multiplication with Data Stored in Multiple Memristors
We consider the following multiplication:
$$z = x \cdot y,$$
where x, y, z are scalars represented by n bits. We assume that y is programmed in k memristors and that x is encoded by k separate input signals; z is the output. x and y can be written in binary form as
$$x = \sum_{i=1}^{n} x_i\, 2^{-i}, \qquad y = \sum_{i=1}^{n} y_i\, 2^{-i},$$
where $x_i$ and $y_i$ are the i-th bits of x and y, respectively. Since x is encoded by k separate signals and y is stored in k memristors, they are also expressed as
$$x = \sum_{j=1}^{k} X_j\, 2^{-(j-1)m}, \qquad y = \sum_{j=1}^{k} Y_j\, 2^{-(j-1)m},$$
where
$$X_j = \sum_{i=1}^{m} x_{(j-1)m+i}\, 2^{-i}, \qquad Y_j = \sum_{i=1}^{m} y_{(j-1)m+i}\, 2^{-i}.$$
If x is encoded by the amplitudes of the waveforms, $X_j$ is the amplitude of the j-th waveform. $Y_j$ is the value programmed in the j-th memristor. Now, we consider the expression of z based on $X_j$ and $Y_j$. It can be easily verified that z can be expressed as
$$z = \sum_{j=1}^{2k-1} Z_j\, 2^{-(j-1)m}, \qquad Z_j = \sum_{\substack{i+l=j+1 \\ 1 \le i,\, l \le k}} X_i Y_l. \tag{8}$$
Equation (8) indicates that we can accomplish the multiplication of x and y based on the sub-component expressions of x and y. The rule is very similar to traditional long multiplication in binary form and can be efficiently implemented by the memristor crossbar-based structure.
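The sub-component rule can be verified with exact arithmetic. The sketch below assumes that $Z_j$ collects all products $X_i Y_l$ with $i + l = j + 1$ and that $Z_j$ carries the weight $2^{-(j-1)m}$, which is the form implied by the 2k − 1 sub-components described in the text; the helper names and operand values are hypothetical.

```python
from fractions import Fraction


def sub_components(value, k, m):
    """Split a k*m-bit fraction in [0, 1) into k m-bit fractions, MSB group first."""
    scaled = value * 2 ** (k * m)
    assert scaled.denominator == 1, "value must fit in k*m bits"
    scaled = int(scaled)
    mask = (1 << m) - 1
    return [Fraction((scaled >> ((k - 1 - j) * m)) & mask, 1 << m)
            for j in range(k)]


def partial_sums(X, Y):
    """Z_j = sum of X_i * Y_l over i + l = j + 1 (1-based), j = 1..2k-1."""
    k = len(X)
    Z = [Fraction(0)] * (2 * k - 1)
    for a in range(k):          # 0-based index of X
        for b in range(k):      # 0-based index of Y
            Z[a + b] += X[a] * Y[b]
    return Z


def combine(Z, m):
    """z = sum over j of Z_j * 2^(-(j-1)m), written with a 0-based index."""
    return sum(zj * Fraction(1, 2 ** (t * m)) for t, zj in enumerate(Z))


x, y = Fraction(45, 64), Fraction(27, 64)    # arbitrary 6-bit operands
X = sub_components(x, k=2, m=3)
Y = sub_components(y, k=2, m=3)
print(combine(partial_sums(X, Y), m=3) == x * y)   # True
```

Note that each $Z_j$ may exceed the range of a single m-bit group, which is why carries between consecutive sub-components have to be handled separately.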
IV. CIRCUIT IMPLEMENTATION
In this section, we present the circuit implementation of our proposed approach. An overview of the circuit structure is given first, followed by the implementations of the memristor crossbar and the chain structure. We again assume that the number of bits of the data is n. The n bits are divided into k groups for further processing, and n = k · m.
A. Overview
The input of our proposed structure is a single scalar encoded by sinusoidal waves. The output of the proposed multiplier is also a scalar. In contrast, the input to the traditional memristor crossbar structure is a vector, and its output is also a vector, namely the product of the input vector and the matrix programmed in the memristors.
As shown in Figure 4, our proposed multiplier consists of two main components. The first component is a memristor-based crossbar array, and the second component is a chain structure consisting of operational amplifiers and ADCs. The crossbar array is used to obtain $Z_j$ according to (8). The chain structure is used to extract the valid bits of $Z_j$ obtained from the array and to deal with the carry bits.
B. Memristor Crossbar Array
The structure of our crossbar array is similar to the traditional crossbar structure described above. The input and output signals are all in analog form, i.e., sinusoidal waves whose amplitudes range from 0 to 1. The amplitudes of the sinusoidal waves encode the target values of the input and output. Memristors are placed at the cross-points of the horizontal and vertical metal wires. Instead of using an NMOS transistor to cut off the sneak path [10], we use an operational amplifier in each column to collect the current and convert it to a voltage output. Thus, the input and output signals are all in voltage form, which is convenient for series connection. The memristor crossbar structure is shown in Figure 5.
Note that our crossbar is used to implement the computation of $Z_j$ as shown in (8). Since x and y are each divided into k sub-components, the result z = xy consists of 2k − 1 sub-components. As a result, our crossbar structure is a (2k − 1) × k memristor array. The k input signals are fed into the array from the left and encode the value of x by their amplitudes. More specifically, $X_i$ in (6) is encoded by the amplitude of the i-th sinusoidal wave. The i-th input signal is connected to the i-th row of the array, as shown in Figure 5. The sub-components $\{Y_1, \cdots, Y_k\}$ are programmed in the memristors of the crossbar. For the first row, the conductances of the 1st to k-th memristors are programmed with $Y_1, \cdots, Y_k$, respectively. With such a crossbar structure, it can be easily verified that the current in the i-th column equals $Z_i$ as shown in (8). An operational amplifier is used in each column to convert the current to a voltage output.
However, two problems still need to be solved. First, the noise induced by the variations of the memristors remains in $Z_i$, so only the first several most significant bits of $Z_i$ are valid. Second, the carries between consecutive $Z_i$ are not yet handled. These issues are addressed by a chain structure, as discussed in the next subsection.
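One conductance layout that produces the required column sums is sketched below. It is an assumption inferred from the description of the first row: row i is taken to hold $Y_1, \cdots, Y_k$ shifted right by i − 1 columns, with all remaining cross-points idealized as zero conductance. The function names and values are hypothetical, and an ideal (noise-free) array is assumed.

```python
def build_conductances(Y):
    """Return a k x (2k-1) table: row i, column j holds Y[j - i] (0-based)
    when 0 <= j - i < k, and 0.0 otherwise (assumed shifted layout)."""
    k = len(Y)
    return [
        [Y[j - i] if 0 <= j - i < k else 0.0 for j in range(2 * k - 1)]
        for i in range(k)
    ]


def column_outputs(X, G):
    """Ideal column sums: Z_j = sum_i X_i * G[i][j]."""
    return [
        sum(X[i] * G[i][j] for i in range(len(X)))
        for j in range(len(G[0]))
    ]


X = [0.75, 0.25]    # x = 0.1101 in binary, split into two 2-bit groups
Y = [0.5, 0.25]     # y = 0.1001 in binary, split into two 2-bit groups
G = build_conductances(Y)
Z = column_outputs(X, G)
print(Z)                                    # [0.375, 0.3125, 0.0625]
print(Z[0] + Z[1] / 4 + Z[2] / 16)          # 0.45703125, i.e. 0.8125 * 0.5625
```

In this layout every $Y_l$ is written into k different memristors, one per row, so each copy carries its own writing error.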
C. Chain Structure
We use a chain structure to extract the valid most significant bits from $\{Z_1, \cdots, Z_{2k-1}\}$ and to deal with the carry bits. Each of the inputs $\{Z_1, \cdots, Z_{2k-1}\}$ should have at least $p = m \log_2 k$ effective bits. The first p − 1 bits are the carry bits, and the lowest bit is the output of the chain structure. As shown in Figure 6, the chain structure is similar to a traditional carry chain. The difference is that an ADC is used to extract the p valid bits from each input, and the number of carry bits is not 1 but p − 1. The p − 1 carry bits are encoded by the amplitude of a voltage signal, which is obtained directly from the inline DAC of the ADC. The chain structure consists of 2k − 1 basic cells connected in series. Each basic cell has two inputs and two outputs. The two inputs are the encoded p − 1 carry bits from the neighboring cell and $Z_i$ from the memristor crossbar array. The two outputs are the encoded m carry bits and m bits in binary form.
For convenience, we assume that the two inputs of the j-th basic cell are $V^j_{in1}$ and $V^j_{in2}$, and that its two outputs are $V^j_{o1}$ and $V^j_{o2}$. The two inputs are first combined into the sum
$$V^j_s = V^j_{in1} + 2^{-m}\, V^j_{in2}, \tag{9}$$
where $V^j_{in1}$ and $V^j_{in2}$ are aligned by multiplying $V^j_{in2}$ by $2^{-m}$. Equation (9) can be implemented efficiently by a simple memristor structure, as shown in Figure 6. The two memristors are programmed with conductances 1 and $2^{-m}$, respectively. The current from the memristor structure is then converted to a voltage by an operational amplifier. $V^j_s$ is then fed to an ADC to extract the effective bits. After the first p − 1 bits are extracted, they are converted back to analog form by the inline DAC of the ADC and taken as the output $V^j_{o2}$. The lowest bit, in binary form, is the output $V^j_{o1}$ of the j-th cell. Note that the ADCs can be reused in our design.
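The carry handling can be illustrated with a simplified digital model of the chain. This is a sketch under stated assumptions, not the analog circuit: the column values are expressed in integer units of the least significant product bit, the ADC is modeled as ideal rounding, each cell emits one m-bit digit, and the integer division by $2^m$ plays the role of the $2^{-m}$ alignment between neighboring cells. The function name `carry_chain` is hypothetical.

```python
def carry_chain(z_columns, m):
    """Turn column sums Z_1..Z_{2k-1} (integer units, possibly noisy)
    into base-2^m digits, most significant first.

    The cells are visited from the least significant column upwards.
    """
    base = 1 << m
    carry = 0
    digits = []
    for z in reversed(z_columns):
        s = round(z) + carry       # ideal ADC reading plus incoming carry
        digits.append(s % base)    # m bits kept by this cell
        carry = s // base          # remaining bits passed to the next cell
    digits.append(carry)           # carry out of the most significant cell
    digits.reverse()
    return digits


# x = 0.1101 (groups 3, 1) and y = 0.1001 (groups 2, 1) with m = 2:
# the exact column sums are [3*2, 3*1 + 1*2, 1*1] = [6, 5, 1]
print(carry_chain([6, 5, 1], m=2))          # [1, 3, 1, 1]
print(carry_chain([6.2, 4.9, 1.1], m=2))    # [1, 3, 1, 1]
```

Read in base 4, the digits 1, 3, 1, 1 give 117 = 13 × 9, the product of the two 4-bit operands. The second call shows that column errors below half a unit are removed by the rounding step.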
V. EXPERIMENTAL RESULT
We implemented our proposed memristor-based high-precision structure. The memristor model is based on [11]. The CMOS circuits, including the operational amplifiers and ADCs, use ideal models for convenience. HSPICE 2010 is used for circuit simulation. To evaluate the impact of variations in the memristors and input signals, random variations are added to all memristor conductances and input signals in our experiments. We consider fixed-point unsigned numbers in the range (0, 1).
A. An Illustrative Example
In the first example, each memristor holds 2 valid bits. We set x to 0.8359375, which can be accurately represented by 8 bits. Therefore, it is divided into 4 groups, and $\{X_j, j = 1, \cdots, 4\}$ are listed as follows.
$$X_1 = 0.11_{(2)}, \quad X_2 = 0.01_{(2)}, \quad X_3 = 0.01_{(2)}, \quad X_4 = 0.10_{(2)}$$
The result z is expressed as
$$\begin{aligned} z &= xy && (10) \\ &= 0.35592265137 && (11) \\ &= 0.10110110001111_{(2)}. \end{aligned}$$
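The grouping of x in this example can be reproduced directly from its value. The sketch below assumes 2 bits per group with the most significant group first; the helper name `group_bits` is hypothetical.

```python
from fractions import Fraction


def group_bits(value, k, m):
    """Return the k groups of m bits of a fraction in [0, 1), MSB group first."""
    scaled = value * 2 ** (k * m)
    assert scaled.denominator == 1, "value does not fit in k*m bits"
    bits = format(int(scaled), "0{}b".format(k * m))
    return [bits[j * m:(j + 1) * m] for j in range(k)]


x = Fraction("0.8359375")
groups = group_bits(x, k=4, m=2)
print(groups)    # ['11', '01', '01', '10']

# each group read as a 2-bit fraction, e.g. '11' -> 3/4
values = [Fraction(int(g, 2), 4) for g in groups]
print(sum(v * Fraction(1, 4 ** j) for j, v in enumerate(values)) == x)   # True
```

The value 0.8359375 equals 214/256, i.e. 0.11010110 in binary, so 8 bits are indeed sufficient.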
To simulate the variations of the memristor conductances, random noise is added to $\{Y_j, j = 1, \cdots, 4\}$. The result of our multiplier is $0.10110110001111_{(2)}$ V, which is equal to the standard result shown above. This means that our structure is still able to achieve higher precision even though the written conductances of the memristors are not accurate and the input signals are noisy.
B. Multiplication of Scalars with Different Ranges
We also consider the multiplication z = xy, where x, y, z are all fixed-point numbers. x and y have 16 effective bits, and their ranges are set to (0, 1). According to [8], the written conductance of a memristor can achieve 8-bit precision. We therefore add random noise with absolute value less than $2^{-8}$ to the conductances of the memristors. To achieve $2^{-16}$ accuracy, each memristor holds 1 effective bit in this example. In this case, y with 16 effective bits is held by 16 memristors.
Considering the outputs of the memristor crossbar structure, 16 bits are added together in the worst case. Therefore, the accumulated error of 16 memristors with $2^{-8}$ precision is less than $2^{-4}$ for $\{Z_1, \cdots, Z_{16}\}$, which means we can extract 4 effective bits from $\{Z_1, \cdots, Z_{16}\}$. 4 effective bits are enough to represent the sum of 16 bits, which means we can obtain the accurate result for the multiplication of scalars with 16 effective bits.
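The error argument can be checked with a small Monte Carlo sketch. It is an idealized model and not the HSPICE experiment: only the conductances are perturbed (by uniform noise bounded by $2^{-8}$, drawn independently for every cross-point), the inputs are exact bits, the ADC is ideal rounding, and the carries are propagated digitally. All function names and the seed are hypothetical.

```python
import random


def bits_to_int(bits):
    """Interpret a list of 0/1 values (most significant first) as an integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def noisy_product_bits(x_bits, y_bits, noise, rng):
    """Multiply two k-bit numbers with one bit per memristor (m = 1).

    Returns the 2k product bits, most significant first.
    """
    k = len(x_bits)
    carry = 0
    out = []
    for t in reversed(range(2 * k - 1)):       # columns, least significant first
        z = 0.0
        for a in range(k):                     # rows
            b = t - a
            if 0 <= b < k:
                # conductance meant to hold bit y_b, written with bounded error
                g = y_bits[b] + rng.uniform(-noise, noise)
                z += x_bits[a] * g
        s = round(z) + carry                   # ideal ADC plus incoming carry
        out.append(s % 2)
        carry = s // 2
    out.append(carry)
    out.reverse()
    return out


rng = random.Random(0)    # fixed seed, chosen only for repeatability
k = 16
mismatches = 0
for _ in range(1000):
    x_bits = [rng.randint(0, 1) for _ in range(k)]
    y_bits = [rng.randint(0, 1) for _ in range(k)]
    got = bits_to_int(noisy_product_bits(x_bits, y_bits, 2 ** -8, rng))
    if got != bits_to_int(x_bits) * bits_to_int(y_bits):
        mismatches += 1
print(mismatches)
```

Because at most 16 noisy terms are summed per column, the column error stays below $2^{-4}$, well under half a unit, so the rounding step recovers every column sum exactly in this model. Noise on the input signals, which the circuit-level experiments also include, is not modeled here.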
To test the precision of the multiplications, we randomly generate many combinations of x and y in the range from 0 to 1 and calculate the error with respect to the accurate result. The test data show that the error is less than $2^{-16}$ in more than 99 percent of the cases. Some of the test data for x, y and the error are shown in Table I.
The proposed circuit can run at a frequency of 1 GHz, with an area of 6.68 µm² and a power consumption of 44.8 mW.
C. Testing Results with Higher Precisions of the Memristors
If the written conductance of a memristor can achieve higher precision, e.g., 10-bit precision, we can accurately implement the multiplication of x and y with 32 effective bits. To achieve $2^{-32}$ accuracy, each memristor holds 1 effective bit in this case, and y with 32 effective bits is held by 32 memristors.
The accumulated error of 32 memristors with $2^{-10}$ precision is less than $2^{-5}$ for $\{Z_1, \cdots, Z_{32}\}$, which means we can extract 5 effective bits from $\{Z_1, \cdots, Z_{32}\}$. 5 effective bits are enough to represent the sum of 32 bits, which means we can obtain the accurate result for the multiplication of scalars with 32 effective bits.
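The two error budgets follow from the same worst-case bound. As a short worked derivation, assume that a column sums at most k terms and that each written conductance has an error $e_i$ with $|e_i| < 2^{-b}$, where b is the writing precision in bits (the symbols $e_i$ and b are introduced here for illustration only):

```latex
\left| \sum_{i=1}^{k} e_i \right|
  \;\le\; \sum_{i=1}^{k} |e_i|
  \;<\; k \cdot 2^{-b}
  \;=\; 2^{\log_2 k - b},
\qquad
\begin{cases}
  k = 16,\; b = 8:  & 16 \cdot 2^{-8}  = 2^{-4}, \\
  k = 32,\; b = 10: & 32 \cdot 2^{-10} = 2^{-5}.
\end{cases}
```

Under this bound, doubling the number of bits per operand costs one additional bit of writing precision just to keep the column error at the same level.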
To test the precision of the multiplications, we randomly generate many combinations of x and y in the range from 0 to 1 and calculate the error with respect to the accurate result. Some of the test data for x, y and the error are shown in Table II. From the table, we can see that our proposed multiplier realizes the multiplication with a precision of $2^{-32}$.
VI. CONCLUSION
Compared to traditional MOSFET devices, the memristor is much more efficient in energy and area, so it is considered a promising candidate for overcoming the von Neumann bottleneck. To realize this goal, however, a memristor-based system must be able to conduct both precise and imprecise computation; otherwise it can only play a role similar to that of a GPU and cannot fully replace the CPU and memory. In this paper, we proposed a memristor-based multiplier towards higher precision. We divide the multiple bits of a computation into many groups. The computation of each group can achieve adequate accuracy. Analog-to-Digital Converters (ADCs) are then used to extract the valid most significant bits, which are combined to obtain the final result with high precision. Experimental results demonstrate that if the conductances of the memristors can achieve 8-bit precision, as shown in [8], our proposed approach can achieve accurate results for the multiplication of 16-bit fixed-point numbers. If the conductances of the memristors can achieve 10-bit precision, our approach can achieve accurate results for the multiplication of 32-bit fixed-point numbers. In future work, we plan to extend our structure to other computations, such as DSP algorithms.
