Abstract-Neural processor development is reducing our reliance on remote server access to process deep learning operations in an increasingly edge-driven world. By employing in-memory processing, parallelization techniques, and algorithm-hardware co-design, memristor crossbar arrays are known to efficiently compute large-scale matrix-vector multiplications. However, state-of-the-art implementations of negative weights require duplicative column wires, and high-precision weights built from single-bit memristors further distribute computations across columns. These constraints dramatically increase chip area and resistive losses, which lead to increased power consumption and reduced accuracy. In this paper, we develop an adaptive precision method by varying the number of memristors at each crosspoint. We also present a weight mapping algorithm designed for implementation on our crossbar array. We describe this novel algorithm-hardware solution as the radix-X Convolutional Neural Network Crossbar Array, and demonstrate how to efficiently represent negative weights using a single column line, rather than doubling the number of columns. Using both simulation and experimental results, we verify that our radix-5 CNN array achieves a validation accuracy of 90.5% on the CIFAR-10 dataset, a 4.5% improvement over binarized neural networks, whilst simultaneously reducing crossbar area by 46% over conventional arrays by removing the need for duplicate columns to represent signed weights.
network (CNN), which are well suited for object detection and vision-based processing, due to their high performance in feature recognition and object detection in images [5] .
One of the challenges associated with machine learning stems from dimensionality issues, where algorithms with more features in higher dimensional spaces lead to difficulty in interpretability of the network. When a learning algorithm does not work, the simplest path to success is often to feed the machine more data. This leads to scalability issues, where we have more data but lack the processing power to compute new inferences. An almost real-time prediction with sufficient accuracy is required for portable devices and edge sensors, using a constrained power budget to implement ambient-assisted technologies. This challenge was initially addressed by shifting computations over to graphical processing units (GPU), as GPU architectures consist of many small cores that parallelize the processing of data. Calculations of similar form are carried out simultaneously, thus maximizing throughput across all threads, which boosts performance while reducing the bottleneck when paired with a CPU. However, when dealing with algorithms that must call a significant number of parameters from memory (e.g., 138 million parameters in the VGG-16 CNN [6]), these parameters must be accessed from and stored in data memory via a shared bus with restrictive data transfer rates. This issue is referred to as the von Neumann bottleneck. More recently, application-specific Neural Processing Units (NPUs) were deployed in mobile devices for real-time operation without the need for server connections to perform deep learning operations [7] [8] [9] [10]. NPUs are optimized for power and area efficiency for matrix-vector multiplication (MVM) without the need for 'cloud-based' processing. However, this approach still relies on conventional CMOS technology where process scaling is bound to performance degradation (retention, cycling and reliability), and memory and processing are physically delocalized.
This has given rise to the exploration of beyond-CMOS architectures for artificial neural network (ANN) and CNN applications.
Researchers have offered a variety of hardware solutions that implement memristors into neuromorphic processors [11] [12] [13] [14] [15] [16] [17] [18]. The memristor is a two-terminal nanoscale device which serves as non-volatile memory and also doubles as a resistor. That is, memory and computation based on the linear form of Ohm's Law exist within the same device. Memristors are scaled into a dense crossbar structure for an area-efficient means to parallelize multiply-and-accumulate (MAC) functions, where high-speed computation is achieved through the column-wise parallelism of arrays. However, problems such as memory leakage, variability and device sensitivity make it challenging to reliably store multi-bit and analog data [19] [20] [21] [22]. The work in [23] demonstrates the storage of over 64 conductance states per memristor, though the difference between simulated and experimental efficiency is on the order of $10^2$ TOPS/W, speculated to be a result of the slow write times needed to ensure precise conductance control and noise mitigation.
To combat the limitation of multi-bit and analog state memristors, hardware implementations of a memristive binarized neural network (BNN) model [24], [25] using two states ($R_{ON}$, $R_{OFF}$) have been proposed. Where weights are limited to single-bit resistances, lower precision results in decreased classification accuracy. Other methods to achieve multi-bit weights are through binarized encoding schemes with column-wise distribution, or via frequency modulation by encoding weight information in the time-domain of the driving voltage [26]. In all cases, either chip area or timing is compromised due to additional columns and the need for more complex CMOS driving circuitry. The representation of negative weights with positive conductances requires double the number of columns, with outputs passed through a differential amplifier [23].
In this paper, we propose a novel solution derived from nanoelectronics to overcome the above limitations of conventional crossbar architectures. This is done by introducing parallel-connected memristors at each crosspoint junction on a crossbar, by either splitting larger memristors and insulating the smaller counterparts from one another, or laying out multiple masks per crosspoint. This means we are able to process radix-X weights (i.e., higher bit precision at each junction), and formalize a hardware mapping approach that significantly reduces circuit area utilization by representing both negative and positive weights without the need to distribute computations across column wires. Furthermore, this approach significantly reduces exposure to line losses.
The main contributions of this paper are:
1) Radix-X CNN: we introduce a CNN implementation using radix-X weights, where the weights and activation values are mapped to the range of the radix numeral system, or 'X'. We develop a straightforward algorithm based on regularization, and provide both pseudo-code and our Python implementation. We test the accuracy of our radix-X CNN by training it on the CIFAR-10 dataset, and comparing it with several prominent models. Intuitively, this can be thought of as targeting the algorithmic component in the algorithm-hardware co-design methodology.
2) Parallel-connected memristors at each crossbar junction for radix-X weight representation: this is the hardware implementation of the radix-X CNN. We show improved stability, reliability and decreased area consumption by using our proposed parallel-connected memristor architecture for storage of radix-X CNN weights. This focuses on the hardware aspect of the co-design methodology.
3) Negative weight representation: implementation of negative weights is a significant overhead in crossbar arrays. Conventional methods use twice the area of crossbars to address this problem. Here, we demonstrate how our radix-X CNN significantly reduces the circuit area by using a single crossbar reference column for both negative and positive weight representation, rather than doubling the number of column wires.
The above contributions are quantified by showing how our proposed radix-X CNN hardware achieves a validation accuracy of 90.5% on the CIFAR-10 dataset when X = 5, a 4.5% improvement on conventional low-precision weights (namely, BNNs). Importantly, we reduce chip area by 46% over conventional state-of-the-art arrays by condensing the number of required column wires to represent negative weights down to a single reference line.
This paper is organized as follows: section II introduces the concepts that drive the technology of the radix-X CNN approach in a memristor crossbar. Section III describes our radix-X CNN learning algorithm with pseudo-code provided, and section IV demonstrates how it is implemented using a parallel-connected memristive crossbar array for representation of radix-X weights, and proposes a solution for negative and multi-bit weight representation. Section V shows our simulation results by running a classification example on the CIFAR-10 dataset, and section VI presents the nanofabrication techniques employed in the development of our crossbar array, with accompanying experimental results of a simple convolutional kernel with a Sobel filter containing both positive and negative elements applied to an input image. Section VII provides a discussion of some of the design trade-offs of the hardware-implemented radix-5 CNN, with concluding remarks given in section VIII.
II. BACKGROUND

A. Resistive Switching in Memristors
The reconfigurability of conductance in a memristor is leveraged in neuromorphic computing to represent updatable weight values. Resistive switching has been demonstrated in metal-oxide devices, with Ta₂O₅ [27], [28], HfO₂ [29] and TiO₂ [30], [31] being among the most recognized. Under the influence of an applied electric field, a conductive filament made up of oxygen vacancies can be formed which creates a pathway for electrons to flow through [32]. The formation of the filament corresponds to a low resistance, and the rupture of the filament breaks the conductive pathway resulting in a high resistance.
Under a forward bias, the memristor switches to a low resistance state (LRS). When a reversal of the bias is applied, it switches to a high resistance state (HRS). Fig. 1(a) illustrates the physical structure of a memristor formed by TiO₂ and oxygen-deficient TiO₂₋ₓ layers sandwiched between two metal electrodes. Fig. 1(b) illustrates the resultant V-I curve under a sinusoidal driving voltage, causing the device to switch between two resistance states.
To achieve analog or multi-bit states, the width of the filament must be precisely modulated, which is challenging in practice. It often requires the use of lower write voltages applied across longer durations, which super-exponentially increase the time of write cycles [33] . Therefore, many realizations of crossbar arrays employ conservative design techniques and treat metal-oxide memory cells as single-bit storage [34] . Multi-bit weights are often implemented using multiple memristors, distributed across multiple column wires.
B. Convolutional Neural Networks
A generic structure of a CNN is depicted in Fig. 2 [35], [36]. Its high performance in image classification is enabled by retaining some spatial dependencies (i.e., taking into account the location of pixels relative to neighboring pixels). This is achieved by treating the image as a matrix rather than vectorizing it as in a fully-connected neural network. As higher-level features are extracted, the channel depth increases, which results in a much larger number of MVMs (computational equivalent of a MAC operation) for a given number of inputs.
C. Neural Network Using Memristor Crossbar Arrays
The key to memristor crossbar arrays being capable of neural network acceleration is that MVMs are the dominant process in CNNs. By parallelizing a large number of MACs across column wires using weights that have been stored in the form of conductance values, we are able to optimize the hardware mapping of neural network architectures.
Figs. 3(a) and (b) depict the mapping of the neuron model to a circuit. The inputs of the neural network $X_0$ to $X_n$ are linearly mapped to the input voltages $V_{x0}$ to $V_{xn}$ of the crossbar, and the weights $w_0$ to $w_n$ are linearly mapped to the conductances $G_{x0}$ to $G_{xn}$ of the memristor. By using the virtual ground of an inverting amplifier to hold each column wire as the reference node (detailed in section IV), the current drawn by each memristor can be calculated using Ohm's Law, and then summed along the column wire in accordance with Kirchhoff's Current Law. Equations (1) and (2) mathematically describe this process:
where $Y$ is the pre-activation output of the artificial neuron, $n$ corresponds to the number of inputs to the neuron, and $i_{tot}$ is the total current through a column wire. A vectorized implementation of Fig. 3(b) is defined by (3). When the number of columns is increased to a value $m$ in an array, (2) can be extended to MVM in (4):
The conductance weights in a single column in the crossbar array correspond to a single channel in a CNN kernel. One can implement deep-channel kernels in parallel by distributing these across column wires. The voltage corresponding to the image data is applied at the input terminals of the crossbar (i.e., at the row wires), where the convolution operation is performed.
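As a minimal numerical sketch of the column-wise MAC operation described above (not the authors' implementation), the following Python models an ideal crossbar: each memristor draws a current equal to its conductance times its row voltage (Ohm's Law), and these currents are summed along each column wire (Kirchhoff's Current Law). The function name `crossbar_mvm` and the example conductances and voltages are hypothetical, and line resistance and device variation are ignored.

```python
def crossbar_mvm(conductances, voltages):
    """Column currents of an ideal crossbar.

    conductances[i][j] is the conductance (siemens) at row i, column j;
    voltages[i] is the voltage (volts) applied to row i.
    Columns are assumed to be held at virtual ground.
    """
    n_rows = len(conductances)
    if n_rows != len(voltages):
        raise ValueError("one input voltage is required per row")
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Ohm's Law per device, accumulated per column (KCL).
            currents[j] += conductances[i][j] * voltages[i]
    return currents


# Hypothetical 3-row, 2-column array (illustrative values only).
G = [[10e-6, 20e-6],
     [20e-6, 0.0],
     [30e-6, 40e-6]]
V = [0.1, 0.2, 0.3]  # volts
print(crossbar_mvm(G, V))
```

Each entry of the returned list corresponds to one column wire, so all columns of the matrix-vector product are obtained from a single application of the input voltages.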
III. RADIX-X CNN ALGORITHM
The conventional methods of working around single-bit weight restrictions in memristor crossbar arrays are either algorithmic, by using BNNs, or via hardware distribution of computations through binarized encoding across columns. As mentioned, the former compromises accuracy and the latter expands chip area and power consumption. In the past, BNNs have been implemented at the weight level, at the activation level (akin to the classic perceptron [37]), or both in unison. The work in [24], [38], [39] implements a binarized activation that can adopt both positive and negative values:
Although this bounding approach is convenient for digitized implementation, there is a degradation of inference accuracy as a result of high precision compression [40]. This may be counteracted by using more learning parameters with an increased number of training epochs, but this offsets the advantages of parallelization. In light of these limitations, we propose a novel approach based on a radix-X weight representation, and present our method for algorithm and hardware co-design to realize it on a memristor crossbar array. The radix of a digital numeral system refers to the number of unique digits used to represent values in a positional numeral system, including the digit zero. If X is the radix of a numeral system, then in the context of a neural network, radix-X refers to the complete set of values that are assignable as a weight and activation value. For example, where X = 5, the weights and activations in a radix-5 CNN can take on any one of 5 values. We present an algorithm that normalizes a high-precision pre-trained weight matrix into a radix-X weight matrix whose elements each take one of the values in the set {-2, -1, 0, 1, 2}. By employing the ReLU activation function, we ensure the outputs can also be represented within the limits of the radix-X numeral system as one of the values in the set {0, 1, 2, 3, 4}. In the most generalized case for radix-X, we propose that the weights must first be normalized according to the pseudo-code:
Algorithm 1 Convert pre-trained weights into radix-X
1: function NORMALIZEDTENSOR(x, weights): Return normalized weights given input of radix-x and pre-trained weights
2:     range ← max of weights − min of weights
5: function QUANTIZEDTENSOR(weights): Return quantized weights given input of pre-trained weights
6:     for element in weights do:
7:         if element < 0 then round element up
8:         else round element down
9:     return quantized
In main function
10: return QUANTIZEDTENSOR(NORMALIZEDTENSOR(x, weights))
An integer input x, the radix of the numeral system, and a matrix or tensor of pre-trained weights, plainly denoted weights, are both passed into the function NORMALIZEDTENSOR, which returns a normalized set of weights where the minimum element is −X/2 and the maximum element is +X/2. The output is passed as the argument to QUANTIZEDTENSOR, which quantizes all floating point decimals into integers. For accessibility, we have provided a link to the GitHub repository containing our Python 3 implementation of the above pseudo-code [42]. The Python code also includes the radix-X activation function. Where X = 5, the mathematical equivalent for a radix-5 CNN of the above algorithm is:
where range refers to the range of weights prior to normalization, and $w_{max}$ and $w_{min}$ are the maximum and minimum weights respectively. In (7), $w_{r-5}$ is the equivalent set of weights in radix-5, and $w$ refers to the weights in the base 10 system. In (8), 'pixel' is the input data convolved with a kernel before activation, and when passed through the $\mathrm{ReLU}_{r-5}$ activation, gives an output of $pixel_{r-5}$, which is bound to one of five integer values. Figs. 4 and 5 illustrate the process.
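The normalize-then-quantize procedure can be sketched in Python as follows. This is an illustration only and may differ from the implementation in the authors' repository [42]: the linear rescaling in `normalized_tensor` is an assumption chosen so that the minimum maps to −X/2 and the maximum to +X/2 as stated in the text, `relu_radix` is a hypothetical floor-and-clip form of the radix-X activation, and weights are given as a flat list rather than a tensor.

```python
import math


def normalized_tensor(x, weights):
    """Rescale weights so the minimum maps to -x/2 and the maximum to +x/2.

    The linear rescaling used here is an assumption; the text only states
    the resulting minimum and maximum.
    """
    w_min, w_max = min(weights), max(weights)
    value_range = w_max - w_min
    if value_range == 0:
        raise ValueError("weights must not all be equal")
    return [(w - w_min) / value_range * x - x / 2 for w in weights]


def quantized_tensor(weights):
    """Round negative elements up and non-negative elements down."""
    return [math.ceil(w) if w < 0 else math.floor(w) for w in weights]


def relu_radix(pixel, x=5):
    """Assumed radix-X ReLU: floor, then clip to the integers 0 .. x-1."""
    return max(0, min(x - 1, math.floor(pixel)))


pretrained = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical flat weight list
print(quantized_tensor(normalized_tensor(5, pretrained)))  # [-2, -1, 0, 1, 2]
print([relu_radix(p) for p in (-1.3, 0.4, 2.7, 9.0)])      # [0, 0, 2, 4]
```

Rounding negative values up and positive values down truncates toward zero, which is what maps the normalized interval [−2.5, +2.5] onto the five integers {-2, -1, 0, 1, 2} for X = 5.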
When training data is passed through the network, the neuron output and weights are converted to $w_{r-5}$ using (6)-(8). Then, the classification result is obtained through forward propagation. The cost function for the output is obtained, and the slope of the cost function for $w_{r-5}$ is calculated using backward propagation. We compute the real-valued weights using the ADAM optimizer [41], which is used to calculate and store $w_{r-5}$. This feedback process is represented in Fig. 6, and while it bears many similarities to conventional backpropagation, we will show how it can be harnessed at the system level using parallel-connected memristive junctions in a crossbar array in the following sections.
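The training loop described above can be illustrated with a deliberately small sketch. Several simplifications are assumed here and are not taken from the paper: a single linear neuron stands in for the CNN, a squared-error cost is used, plain gradient descent replaces the ADAM optimizer, and `to_radix` is a hypothetical helper that rescales and truncates toward zero. What it does preserve is the structure of the process: the forward pass uses radix-X weights, while the gradient update is applied to the stored real-valued weights.

```python
import math


def to_radix(weights, x=5):
    """Assumed radix-X conversion: rescale to [-x/2, +x/2], truncate toward zero."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [0 for _ in weights]
    return [math.trunc((w - lo) / (hi - lo) * x - x / 2) for w in weights]


def train_step(real_weights, inputs, target, lr=0.01, x=5):
    """One update: forward pass with radix-X weights, update real-valued weights."""
    w_radix = to_radix(real_weights, x)
    y = sum(w * a for w, a in zip(w_radix, inputs))  # forward propagation
    cost = (y - target) ** 2
    # Gradient of the cost with respect to each radix-X weight.
    grads = [2 * (y - target) * a for a in inputs]
    # The gradient is applied to the stored real-valued weights
    # (plain gradient descent here, standing in for ADAM).
    updated = [w - lr * g for w, g in zip(real_weights, grads)]
    return updated, cost


weights = [-0.9, 0.1, 0.8]  # hypothetical real-valued weights
for _ in range(3):
    weights, cost = train_step(weights, inputs=[1, 2, 3], target=5)
    print(to_radix(weights), cost)
```

Keeping the real-valued weights between steps matters because a single small update rarely changes the quantized value; the accumulated real-valued weight is what eventually crosses a quantization boundary.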
IV. RADIX-X CNN ACCELERATOR CIRCUIT
We designed and fabricated an application-specific reconfigurable crossbar array, intended precisely for the implementation of our radix-X CNN accelerator. Here, we will describe the operating principle of our design, how to achieve multi-bit and negative weights at a single crosspoint, and then detail the nanofabrication techniques used in its development.
Fig. 6. The process of training radix-X CNN models. The gradients of the cost function $\delta C/\delta w_{r-X}$ are obtained through forward- and back-propagation using radix-X converted weights $w_{r-X}$. The ADAM optimizer updates real-valued weights $w$ based on those in the previous cycle. This updated $w$ is converted to a radix-X weight $w_{r-X}$ and used as a parameter to decrement the cost function again. The real-valued weight must be saved during training.
A. Multi-bit Weights
As the resistance precision of the memristor for storing information is limited, and the impact of writing variation increases with the number of resistance states [20] [21] [22], we seek to circumvent this issue by introducing parallel-connected memristors at each crosspoint in the array. Each of these heterogeneous memristors is still only used to store binarized weights, but by forming and severing connections to the memristor electrodes, we introduce additional bits per crosspoint, despite our conservative design approach. We have chosen X = 5 (i.e., radix-5), which requires four parallel memristors per column-row wire intersection. Four 1-bit memristors are placed in a quad-parallel structure per metal crosspoint. Thus, 0-4 memristors are connected to the top metal and pre-programmed to either a HRS or LRS. That is, these memristors are used as read-only memory to ensure high reliability and to avoid write variability.
As shown in Fig. 8 , five resistance values can be obtained depending on the number of activated parallel-connected memristors. The proposed parallel-connected memristor demonstrates how a set of radix-5 CNN weights using 5 discrete resistance states can be implemented.
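To make the five resistance levels concrete, the short sketch below computes the equivalent resistance of a crosspoint as a function of how many memristors are connected. It assumes ideal, identical devices that each present the single-memristor resistance of 100 kΩ quoted later in the text, and that a disconnected memristor contributes no conduction; the function name `equivalent_resistance` is hypothetical.

```python
def equivalent_resistance(n_connected, r_m=100e3):
    """Equivalent resistance (ohms) of n identical memristors in parallel.

    Returns float('inf') for an open junction (no memristor connected).
    Assumes every connected device presents the same resistance r_m.
    """
    if n_connected < 0:
        raise ValueError("n_connected must be non-negative")
    if n_connected == 0:
        return float("inf")
    return r_m / n_connected


# The five configurations available at a radix-5 crosspoint.
for n in range(5):
    print(n, equivalent_resistance(n))
```

Because the conductance grows linearly with the number of connected devices, the five configurations are evenly spaced in conductance, which is what allows them to stand in for evenly spaced integer weights.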
B. Negative Weights
Existing studies have implemented the hardware described in Fig. 9 to represent negative weights, which requires twice the number of columns. In our proposed method, each of the radix-5 weights {-2, -1, 0, 1, 2} is mapped to one of five available memristor configurations. This is depicted in Table I. However, in each of the 5 configurations, the equivalent resistance at a crosspoint is still non-negative. We will demonstrate how to remove the need for duplicative columns by mapping negative weights into positive conductances. First, all radix-X weights $w_{r-X}$ are positively shifted by the magnitude of the minimum weight $w_{r-X,min}$. This translates the minimum weight to 0. Next, each level-shifted weight is divided by the resistance of a single memristor to calculate the equivalent conductance. For example, in radix-5, the minimum weight is -2. Where $w_{r-X}$ = -1, a level shift of |−2| gives +1, and the equivalent resistance can be found by dividing $R_m$ by this value. Table I shows that only one memristor (1M) should be connected between row and column wires to attain $R_m$. For $w_{r-X}$ = 0, the equivalent resistance will be $R_m$/2. Table I indicates that two memristors are connected in parallel:
The equivalent conductance is given by:
Substituting (9) into (2) gives the following equation for the column current for n rows in radix-5:
However, this is an insufficient representation of output current. To see why, consider Figs. 10(a) and (c) which are radix-5 ANNs consisting of 3 unique inputs, and Figs. 10(b) and (d) which are the crossbar array equivalents using our parallel-connected structure. The equivalent conductances are derived from Table I and (9), where $R_m$ = 100 kΩ. The current through the first column in Fig. 10(b) is calculated using (10):
and for the first column of Fig. 10(d):
To counter the $w_{r-X,min}$ level-shift, we must design an adaptive reference line to be subtracted from the signal columns. To do this, we note that the minimum column current 12µA in Fig. 10(b) corresponds to the ANN output of Y = 0. If we subtract 12µA from each current in the set {12µA, 17µA, 16µA}, the resulting set of column currents becomes {0µA, 5µA, 4µA}; there is now a 1:1 correspondence to the ANN output. For Fig. 10(d), subtracting the minimum current 14µA from {14µA, 19µA, 18µA} yields {0µA, 5µA, 4µA}. The current sets now match the ANN outputs. In both cases, the solution is to subtract the current corresponding to the ANN output of '0' from all column signals.
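The subtraction argument can be checked numerically with a short sketch. The row voltages and per-column weights below are hypothetical: they are not read from Fig. 10, but were chosen so that the resulting column currents reproduce the quoted set {12µA, 17µA, 16µA}. The single-memristor resistance of 100 kΩ is the value quoted in the text, and each weight w is assumed to be stored as (w + 2) parallel memristors.

```python
R_M = 100e3  # ohms, single-memristor resistance quoted in the text


def column_current(weights, voltages, w_min=-2, r_m=R_M):
    """Column current when each weight w is stored as (w - w_min) parallel memristors."""
    return sum(v * (w - w_min) / r_m for w, v in zip(weights, voltages))


# Hypothetical inputs and weights, chosen so the totals match the quoted values.
voltages = [0.1, 0.2, 0.3]                      # volts
columns = [[1, 1, -1], [-1, 0, 2], [-2, 0, 2]]  # radix-5 weights per column

i_ref = column_current([0, 0, 0], voltages)     # zero-weight reference column
for weights in columns:
    i_col = column_current(weights, voltages)
    # Current an ideal signed-weight column would produce, for comparison.
    ideal = sum(w * v for w, v in zip(weights, voltages)) / R_M
    print(round(i_col * 1e6, 3), round((i_col - i_ref) * 1e6, 3), round(ideal * 1e6, 3))
```

In this example the first column happens to draw the same current as a column of all-zero weights, which illustrates why the quantity to subtract is the zero-weight column current rather than a fixed constant: it changes whenever the input voltages change.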
In a radix-5 crossbar array, we create our own zero-weight reference column by having two memristors in parallel at each row (2M in Table I). This corresponds to a radix-5 weight of 0 for an entire column. The output current of the reference line can be calculated by substituting $w_{r-5}$ = 0 into (10):
This is generalized to any radix-X numeral system, by substituting $w_{r-X}$ = 0 into (9), and the result into (2):
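Written out from the level-shift mapping described above, the reference current would take the following form. This is a sketch derived from the prose rather than a verbatim reproduction of the numbered equations; the row index $i$ and the symbol $V_i$ for the voltage on row $i$ are notational assumptions.

```latex
i_{ref}\Big|_{X=5} = \sum_{i} V_i \,\frac{2}{R_m},
\qquad
i_{ref} = \sum_{i} V_i \,\frac{\lvert w_{r-X,\min} \rvert}{R_m}
```

The first expression is the radix-5 case, where a weight of 0 is level-shifted to 2 and therefore stored as two parallel memristors; the second replaces the shift of 2 with the magnitude of the minimum weight for a general radix.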
The reference current is dependent on the input voltages, and therefore cannot be implemented using a constant current. This was demonstrated by example in Fig. 10. The reference current $i_{ref}$ is converted into a voltage using an op-amp, and subtracted from all signal voltages with an array of differential amplifiers.
The hardware level implementation of the level-shift is shown in Fig. 11, with the reference line highlighted in red. The inverting amplifiers are used to fix all columns at virtual ground. To find the potential at the output of the inverting amplifier on the reference line, note that $i_{ref}$ from (13) is passing through the negative feedback resistor $R$:
Similarly, for the inverting amplifier output of the signal columns:
Given all resistors of the differential amplifier are equivalent, the output stage of the crossbar array is a subtractor with $V_{ref}$ from (15) being passed into the positive terminal, and $V_{inv}$ from (16) into the negative terminal:
The final result of (17) shows how the '+2' linear shift is removed by $V_{ref}$, thus ensuring a correct representation of negatively weighted MVMs following the demonstration in Fig. 10. The relationship between a neural network input $X_n$ and the input voltage $V_n$ in the circuit is given as,
where $S$ is the scaling factor of $V_n$. Substituting (18) and (1) into (17) yields:
This verifies that the output voltage of our radix-X CNN accelerator is simply scaled by $(R_m \cdot S / R)$, and confirms that we are able to represent multi-bit negative weights with a parallel-connected memristor structure without duplicative columns.
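The output stage can be traced end to end with a small sketch under stated assumptions: ideal inverting amplifiers (output equal to minus the column current times the feedback resistance), a unity-gain subtractor computing $V_{ref} - V_{inv}$, and input voltages obtained as $V_n = X_n / S$. The direction of that scaling is an assumption, as are the example inputs and weights; the component values ($R_m$ = 100 kΩ, $R$ = 10 Ω, $S$ = 10) are those quoted in the simulation section, and the function name `output_voltage` is hypothetical.

```python
def output_voltage(weights, voltages, r_m=100e3, r_f=10.0, w_min=-2):
    """Differential output V_ref - V_inv for one signal column (ideal op-amps)."""
    i_col = sum(v * (w - w_min) / r_m for w, v in zip(weights, voltages))
    i_ref = sum(v * (0 - w_min) / r_m for v in voltages)
    v_inv = -i_col * r_f  # inverting amplifier on the signal column
    v_ref = -i_ref * r_f  # inverting amplifier on the reference column
    return v_ref - v_inv  # unity-gain subtractor


S = 10                 # scaling factor; V_n = X_n / S is assumed here
inputs = [1, 2, 3]     # hypothetical neuron inputs X_n
weights = [-1, 0, 2]   # hypothetical radix-5 weights
voltages = [x / S for x in inputs]

v_out = output_voltage(weights, voltages)
y = sum(w * x for w, x in zip(weights, inputs))  # ideal neuron output
print(v_out, y * 10.0 / (100e3 * S))             # the two values should agree
```

Under these assumptions the level shift cancels exactly, and multiplying the output voltage by $R_m \cdot S / R$ recovers the signed neuron output.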
V. SIMULATION RESULTS
We conducted a simulation of the radix-X CNN accelerator described above with all memristors being used as read-only memory, and peripheral circuitry in the SK Hynix 180nm CMOS Process. The characteristics of the simulated memristor are based on our own Al/TiO₂/TiOₓ/Al crossbar array, which we will provide details of in the next section. The relevant features for our feed-forward simulation of a pre-trained network are $R_m$ = 100kΩ and $V_{Th}$ = 0.5V. As all parallel configurations are fixed on our crossbar, there was no need to consider switching time characteristics and programming variations. The relatively large width of our metal lines (20µm) meant low line resistance and so line losses were negligible. When scaling the metal lines down and the number of rows and columns up, this assumption will need to be adapted accordingly. The final idealization made was assuming negligible device-to-device variation, which was instead accounted for in experimentation. The peripheral resistances were chosen to be $R$ = 10Ω and the scaling factor $S$ = 10 to ensure read voltages did not exceed the switching threshold.
The architecture of our radix-5 CNN is shown in Fig. 12. We evaluated the validation accuracy for three implementations: a high-precision 16-bit CNN, a BNN, and the proposed radix-5 CNN. Fig. 13 shows the classification accuracy during training on the CIFAR-10 dataset, where the high-precision CNN and radix-5 CNN showed a difference in accuracy of approximately 0.8%. This is a 5.3% improvement over BNNs, which is to be expected given the higher base value used, but is achieved with a substantial decrease in chip area. A more detailed comparison is summarized in Table II. As shown in Fig. 14, the behavior of a simple neural network for the proposed radix-5 CNN is fully implemented and simulated. Analyzing the simulation results in Fig. 14(b) shows that the output of the neuron for the first pulse during time t = 1µs to 2µs agrees with (2):
Therefore,
In the same manner as (20) and (21), the $V_{col}$ outputs from the second and third input pulses are 20µV and 70µV, respectively. The results of our simulation agree with our mathematical derivations in section IV.
VI. EXPERIMENTAL RESULTS
A. Nanofabrication
We fabricated a proof-of-concept 4 × 4 parallel-connected crossbar array in-house to demonstrate the feasibility of the proposed memristor-based radix-5 CNN method. This was achieved with a sandwich structure composed of Al/TiO₂/TiOₓ/Al layers. A 200-nm-thick Al layer was deposited as the bottom electrode on a glass wafer. Standard photolithography was conducted to produce 20-µm-wide Al lines. During the microfabrication process, the wafers were irradiated by using a mask alignment system for 100 s and then developed at 296K for 120 s. The Al channel was then defined by wet etching (H₃PO₄:HNO₃:CH₃COOH:H₂O = 80 ml : 5 ml : 5 ml : 10 ml), removing any Al outside of the channel regions at an etching rate of ∆d/t = 300 nm/min. A 5-nm-thick TiO₂ thin film and a 15-nm-thick TiOₓ thin film were formed by atomic layer deposition (ALD) and magnetron sputtering. Subsequently, another 200-nm-thick Al layer was sputtered as the top electrode, followed by standard photolithography to create 20µm × 20µm windows. Fig. 15 shows a cross-sectional image of a single memristor taken with a focused ion beam (FIB) analyzer.
B. Image Processing
We performed image convolution on 100 images of handwritten digits from the MNIST dataset [43], each 28 × 28 pixels in dimension, by passing them through a Sobel filter, which is typically used in edge detection algorithms. The Sobel operator takes the form of a 3 × 3 matrix in radix-5 form:
The rationale is that if the crossbar is capable of performing MVMs, then by extension classification tasks using a CNN will also be possible on larger arrays. The image is processed using similar parameters to those in the simulations, where input pixels are linearly mapped from a null input for a black pixel to 0.4 V for a white pixel. As per Table I, a kernel element of '-2' is implemented as an open junction at a crosspoint, and an element of '2' is mapped to four parallel-connected memristors. The maximum current drawn from a memristor was measured to be approximately 1.6µA, and the critical value for $i_{tot}$ from a full column under the test case of MNIST images passed through an edge detection filter was 4.0µA. This column current is relatively small when compared to similar arrays based on conduction via oxygen vacancies, but this is a result of having a small-scale array rather than low read voltages. The output voltages at $V_{col}$ were then linearly mapped back into output images. Qualitatively, we successfully generated a near-perfect 2D convolution with a stride of 1 and no zero-padding, as can be seen in Fig. 16, and a scaled-up sample in Fig. 17. The small-scale prototyped nature of our array meant that for a 3 × 3 kernel, each pixel required 3 read cycles where 4 output pixels could be pipelined across columns, and convolving a 28 × 28 image required a total of 21 read cycles.
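For reference, the software equivalent of this operation can be sketched as below. The kernel shown is the standard horizontal-gradient Sobel operator, whose elements all lie in {-2, ..., 2}; whether the experiment used this orientation is an assumption. The function `convolve2d_valid` and the synthetic half-black, half-white test image are hypothetical, and the kernel is applied without flipping, as is common in CNN frameworks.

```python
# Assumed horizontal-gradient Sobel kernel; all elements lie in {-2, ..., 2}.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]


def convolve2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no zero-padding.

    The kernel is applied without flipping (cross-correlation).
    """
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = sum(kernel[i][j] * image[r + i][c + j]
                            for i in range(k) for j in range(k))
    return out


# Hypothetical 28 x 28 test image: black left half, white right half.
image = [[0 if c < 14 else 1 for c in range(28)] for _ in range(28)]
edges = convolve2d_valid(image, SOBEL_X)
print(len(edges), len(edges[0]))  # 26 26
print(edges[0][11:15])
```

With a stride of 1 and no zero-padding, a 28 × 28 input produces a 26 × 26 output, and non-zero responses appear only where the 3 × 3 window straddles the vertical edge.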
VII. DISCUSSION
Implementing BNNs on memristor crossbars is a common technique used to enhance robustness of crossbar arrays in light of analog write variability. Our proposed technique follows this conservative design methodology where the radix-X CNN accelerator uses single-bit memristors. Rather than using binarized encoding across multiple columns, we instead modulate the number of memristors at crosspoints between row and column lines (i.e., a 1TXM cell), and have thus proposed a new crossbar architecture and co-developed an algorithm specifically suited to adapt to the number of memristors per cell. The first trade-off to consider is the number of additional memristors per cell, versus additional columns to improve precision and implementation of negative weights. This analysis is process dependent, and in our array where the metal lines occupy a width of 20 microns, the minimum width of a single memristor is of sub-micron pitch (and of a few nanometers in more advanced processes [44], [45]). For single-bit memristors in conventional binarized crossbars, the closest equivalent comparison to radix-5 is by using 2-bit weights, which will require a total of 4 columns (2 for positive weights, and 2 for the differential pair). We are able to implement the above scheme in 2 columns, with a 20% improvement in precision using radix-5 over 2-bit representations. The alternative option for column reduction is to use analog weights, which remains a developing but promising field of research.
The limiting factor is where the radix of the numeral system becomes larger, resulting in an increasing number of parallel-connected memristors per cell, and an associated reduction in equivalent resistance. Larger metal lines and more vias are needed to cope with the increasing current capacity. While our array had no issues with a critical current of 4.0µA (owing to the wide metal lines used in our process, which had a current capacity of over 100mA; see Fig. 8(b)), this will become an increasingly important trade-off when optimizing for higher values of X in radix-X. The effect of decreasing equivalent resistance can be partially mitigated by reducing the read voltage, where state-of-the-art crossbar arrays have demonstrated read currents of under 10nA [33].
The second trade-off is with respect to pipelining. Given that parallel-connections are fixed at the time of fabrication, the radix-X crossbar will typically be optimized for specific conductance matrices. In general, this will be advantageous only for kernels containing a particular set of elements. The benefit to reduced reconfigurability is that write-variability is no longer an issue, and endurance is also prolonged due to the application of only read pulses.
VIII. CONCLUSION
We have proposed a crossbar array with multiple metal-oxide thin film switches at each crosspoint, and a co-designed algorithm tailored for this inference accelerator to convert a set of pre-trained weights into values based on user-selected precision. We conducted CNN classification on the CIFAR-10 dataset using a large-scale simulation, and performed experimental validation of convolution image processing on a subset of the MNIST dataset using a small-scale crossbar array. We demonstrated that we could achieve multi-bit and negative weights using 46% of the area of conventional differential pairs of columns, all whilst including an adaptive precision mechanism within our array. What has been proposed is not an exhaustive use of this array. For example, future work includes the use of transistor switches to reconfigure the number of memristors at each crosspoint to enable a higher degree of reconfigurability. Alternatively, as research on multi-bit memristors matures and values of memristance increase, these will be key enablers of higher precision by extending the range of possible base values usable for a given crossbar dimension in radix-X.
