Abstract: Three distinct methods of reading multi-level cross-point resistive states from selectorless RRAM arrays are implemented in a physical system and compared for read-out accuracy. They are the standard, direct measurement method and two methods that attempt to enhance accuracy by computing cross-point resistance on the basis of multiple measurements. Results indicate that the standard method performs as well as or better than its competitors. SPICE simulations are then performed with controlled amounts of non-idealities introduced in the system in order to test whether any technique offers particular resilience against typical practical imperfections such as crossbar line resistance. We conclude that even though certain non-idealities are shown to be minimized by different circuit-level read-out strategies, line resistance within the crossbar remains an outstanding challenge.
I. INTRODUCTION
Resistive random-access memory (RRAM) is a promising, emerging, beyond-Moore memory technology whereby storage nodes operate on the basis of the resistive switching phenomenon. RRAM systems exhibit small size and good scalability (down to 8 × 8 nm node size reported in the literature) [1], [2], multi-state memory storage [3] and low-power operation [4], and rely on the use of simple, two-terminal devices; these are all highly desirable characteristics for applications ranging from large, industrial memory cells to neuromorphic applications. In neuromorphic engineering, RRAM is seen as a possible means of linking pre- and post-synaptic neurons through area-effective artificial synapses [5], the lack of which is currently a fundamental stumbling block towards the development of large-scale, area- and power-efficient artificial brain-inspired computational systems.
The key benefit of RRAM scalability is typically leveraged by implementing RRAM cells as crossbar arrays, a configuration that maximizes memory area density (down to $4F^2$/storage node for planar crossbars, where $F$ is the minimum feature size of the crossbar array [6], [7]). The main drawback of this implementation, however, is the issue of "sneak currents" [8], whereby current tends to pass through devices other than the device under test (DUT) during "read" and "write" operations (Fig. 1). Sneak currents prove disruptive to the accuracy of read operations [9] and may precipitate programming of non-target storage nodes during write operations.

Fig. 1. The crossbar sneak current problem: when attempting to interface a target device, "sneak current" will flow through "sneak paths," thus corrupting the accuracy of read operations and potentially disrupting non-target devices during the write operation. The specific paths and magnitudes of sneak currents will depend on the biasing regime applied to the marked nodes. Inset shows the structure of each cross-point.
Research towards mitigating sneak current effects revolves mainly around the development of "selector devices" that can be embedded into the storage nodes themselves and allow highly selective targeting of DUTs [10]-[12]. The implementation of selector devices, however, adds complexity to the overall fabrication process. Sneak currents can also be mitigated in selectorless arrays, e.g., by employing suitable biasing regimes (for an overview of such biasing regimes, see [13, Sec. 2.6]) or by attempting to calculate cross-point resistance via multiple, multiport measurements [14]. Each mitigation/read-out strategy has merits and drawbacks.
In this paper, we build upon previous work [15] and investigate the read-out accuracy limits in selectorless, multi-level RRAM crossbar arrays for three distinct read-out techniques. We implement them on a custom-built instrumentation platform and compare their ability to successfully read resistive states from a small (12 × 12), selectorless reference array consisting of linear resistors. We then broaden the scope of our study by extrapolating, via SPICE simulations, towards the scaling limits of selectorless RRAM arrays as well as towards worst-case scenarios. The focus is kept on well-behaved linear test devices in order to eliminate any sources of uncertainty that would be present if RRAM devices had been used (inadvertent switching, resistive state drift, etc.).
The capability of performing accurate array-level read-outs is a crucial element in many applications. One example is the mass testing of novel device architectures, where array-level electrical characterization would be a major boost towards the automation of the process development cycle. Another example can be found in multi-level memory cell development, where the amount of information potentially extractable from storage nodes able to assume resistive states within a continuous range is limited by read-out accuracy. Finally, the field of neuromorphic engineering could exploit the benefits arising from the development of nanoelectronic artificial synapse banks far more efficiently if there were an option to monitor the internal state of every synapse in the bank for debugging and analysis purposes.
The paper is organized as follows: Section II provides the definition of "memory state" and introduces the three different read-out techniques. Section III presents a physical system capable of performing all three types of read operation while Section IV provides measured results from our prototyped reference array used to assess the instrument's accuracy. Section V presents SPICE simulations whereby system performance is estimated for larger arrays set up in standard "worst case" configurations. Finally, Section VI discusses the merits of each read operation technique in light of the simulated and measured results and draws the overall conclusions of the paper.
II. THEORETICAL BACKGROUND

A. Defining a Memory State
The resistive state (RS) of any two-terminal, non-linear component (such as typical RRAM devices) can be measured in a multitude of ways, e.g., as the static or the differential resistance measured when some fixed potential difference is applied across the component. In practice, it is convenient to use the static resistance at a fixed read-out voltage $V_{read}$ as the RS definition. This approach is amenable to easy circuit implementation, as it employs a constant read-out voltage and allows the DUT RS to be determined via a simple amperometric measurement. In this work, we define RS as the static resistance measured at $V_{read}$. This value was arbitrarily chosen in order to demonstrate sub-volt operation while allowing our instrumentation platform to operate well above the noise floor. Read-out voltage optimization lies outside the scope of this paper.
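The difference between the static and differential definitions can be made concrete with a short sketch. It assumes a hypothetical sinh-shaped I-V characteristic (a qualitative stand-in for non-linear conduction, not a model fitted to any device in this work) with made-up parameters `I0` and `V0`, and an illustrative read voltage of 0.5 V.

```python
import math

# Hypothetical non-linear device: I(V) = I0 * sinh(V / V0).
# I0 and V0 are illustrative assumptions, not fitted to any device in this work.
I0 = 1e-5   # A
V0 = 0.25   # V


def current(v: float) -> float:
    """Device current at bias v (hypothetical sinh model)."""
    return I0 * math.sinh(v / V0)


def static_resistance(v: float) -> float:
    """Static resistance V / I(V): the RS definition adopted in this work."""
    return v / current(v)


def differential_resistance(v: float, dv: float = 1e-6) -> float:
    """Differential resistance dV/dI at v, via a central finite difference."""
    return 2 * dv / (current(v + dv) - current(v - dv))


v_read = 0.5  # V, illustrative only
print(f"static:       {static_resistance(v_read):.0f} Ohm")
print(f"differential: {differential_resistance(v_read):.0f} Ohm")
```

For this super-linear curve the two definitions differ by roughly a factor of two at 0.5 V, which is why the RS definition must be fixed before read-out accuracy can be discussed.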
B. Crossbar Array Nomenclature
Reading from a crossbar array typically involves targeting a single cross-point element at a time and attempting to determine its resistive state. This gives rise to "active" (leading to the target device) and "inactive" word- and bit-lines. In order to simplify the tangled structure of the crossbar network, read-out operations are typically performed with all inactive word-lines shorted together and, similarly, all inactive bit-lines shorted together. This reduces the crossbar network to a "four-node-four-component" (4n4c) system where the four components are: 1) the target device alone, $R_T$; 2) the devices sharing the word-line with the target (we shall call this the "word complement" of $R_T$), with equivalent resistance $R_W$; 3) the devices sharing the bit-line with the target (the "bit complement" of $R_T$), with equivalent resistance $R_B$; and 4) the rest of the array, with equivalent resistance $R_R$ (see [9], [14]). Fig. 2 illustrates these concepts. We shall use this reduced system in order to describe how the three read-out operations function conceptually.
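The 4n4c reduction can be sketched in a few lines, assuming ideal (zero-resistance) lines and a crossbar given as a matrix `R[word][bit]` of linear cross-point resistances. The function names and the 0-based indexing are our own choices for illustration.

```python
from typing import List, Tuple


def parallel(resistances: List[float]) -> float:
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def reduce_4n4c(R: List[List[float]], word: int, bit: int) -> Tuple[float, float, float, float]:
    """Collapse crossbar R[word][bit] into the 4n4c components around a target.

    Assumes ideal (zero-resistance) lines, inactive word-lines shorted together
    and inactive bit-lines shorted together; needs at least a 2 x 2 array.
    Returns (R_T, R_W, R_B, R_R).
    """
    n_words, n_bits = len(R), len(R[0])
    r_t = R[word][bit]
    r_w = parallel([R[word][b] for b in range(n_bits) if b != bit])    # word complement
    r_b = parallel([R[w][bit] for w in range(n_words) if w != word])   # bit complement
    r_r = parallel([R[w][b] for w in range(n_words) for b in range(n_bits)
                    if w != word and b != bit])                        # rest of array
    return r_t, r_w, r_b, r_r


# Illustrative 12 x 12 array with every device at 10 kOhm; target (7, 8) in 1-based terms.
R = [[10e3] * 12 for _ in range(12)]
print(reduce_4n4c(R, 6, 7))
```

Even for a uniform array, $R_W$, $R_B$ and especially $R_R$ are much smaller than $R_T$, which is what makes sneak currents dominant.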
C. The Three Read-out Techniques
1) Single Read (One Direct Measurement):
There are many biasing regimes that can be used to carry out a "single read" operation. The specific choice of regime will depend largely on the type of device being used (e.g., unipolar- vs. bipolar-switched) and on other considerations, such as power dissipation, but ultimately it will aim to simultaneously determine the voltage drop across, and the current through, the target device:

$$R_T = \frac{V_{read}}{I_T} \quad (1)$$

where $R_T$ is the target resistance to be determined, $V_{read}$ denotes the standard read-out voltage applied across the target device (fixed throughout this work, see Section II-A) and $I_T$ is the current through the target device. $I_T$ can be determined by bootstrapping the word- or bit-complement devices (or both) and measuring the current entering/exiting the active line servicing the bootstrapped devices. An example of a possible single read measurement configuration is shown in Fig. 3(a). A good way to implement it in practice would be to realize the ammeter as a transresistance amplifier (TRA), thus enforcing virtual earthing of the active bit-line. This, in turn, ensures bootstrapping of the bit complement while keeping the voltage drop across the target device under control, as shown in Fig. 3(b).
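This technique is relatively easy to implement and relies on a single measurement, but when the target device is in a very high RS, $I_T$ becomes very small. This makes the technique vulnerable to systematic offset errors such as constant leakage currents and, especially, voltage offsets that affect the quality of bootstrapping, as seen in Fig. 3(b). The importance of proper bootstrapping for the correct operation of the single read technique can be demonstrated by calculating the current reaching the ammeter in the example of Fig. 3. Assuming the active word-line is driven at $V_{read}$, the inactive word-lines are grounded and a TRA input offset $V_{os}$ lifts the active bit-line to $V_{os}$, the ammeter current is:

$$I_A = \frac{V_{read} - V_{os}}{R_T} - \frac{V_{os}}{R_B} \quad (2)$$

so that, since $R_B$ can be far smaller than $R_T$, even a small $V_{os}$ can produce a spurious current comparable to or larger than $I_T$.

2) Differential Read (Two Measurements): The "differential read" complements a single read at $V_{read}$ with a second read at a different bias $V_2$, and computes:

$$R_T = \frac{V_{read} - V_2}{I_{T,1} - I_{T,2}} \quad (3)$$

where $I_{T,1}$, $I_{T,2}$ is the current through $R_T$ during the first and second sub-operations respectively. Subtracting the two readings will compensate for systematic offsets present in the system.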
The differential read technique computes the static resistance at $V_{read}$ independently of the shape of the target device I-V characteristic only for $V_2 = 0$. Any other choice of $V_2$ causes the technique to yield the chord resistance defined by the two chosen voltage bias points along the I-V characteristic of the target device. If we set $V_2 \to V_{read}$, the differential technique attempts to compute the differential resistance at the chosen bias point $V_{read}$. The main benefit of a "differential read" approach is its offset-cancelling nature, which becomes apparent only when systematic offsets become significant sources of error, overpowering sources of random error such as instrumentation noise. However, the technique requires two sub-operations, one of which is a single read operation; therefore, it is slower than the single read technique.
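The offset sensitivity of the single read, and its removal by the differential read, can be illustrated with a simplified model of the Fig. 3 arrangement. The sketch assumes that the active word-line sits at the bias voltage, the inactive word-lines are grounded, and a TRA input offset `v_os` lifts the active bit-line to `v_os`, so that the bit complement carries a spurious current. All values (a 220 kΩ target, eleven 1 kΩ bit-complement devices, a 100 µV offset, a 0.5 V read voltage) are illustrative assumptions, and the function names are hypothetical.

```python
def ammeter_current(v_bias: float, r_t: float, r_b: float, v_os: float) -> float:
    """TRA input current for the single-read arrangement (simplified model).

    Assumes the active word-line at v_bias, inactive word-lines at 0 V and
    the active bit-line at v_os because of the TRA input offset.
    """
    i_target = (v_bias - v_os) / r_t   # current through the target device
    i_spurious = -v_os / r_b           # imperfectly bootstrapped bit complement
    return i_target + i_spurious


def single_read(v_read, r_t, r_b, v_os):
    """Eq. (1) applied to the measured ammeter current."""
    return v_read / ammeter_current(v_read, r_t, r_b, v_os)


def differential_read(v_read, v_2, r_t, r_b, v_os):
    """Two sub-operations; bias-independent offset terms cancel in the difference."""
    i_1 = ammeter_current(v_read, r_t, r_b, v_os)
    i_2 = ammeter_current(v_2, r_t, r_b, v_os)
    return (v_read - v_2) / (i_1 - i_2)


r_t = 220e3               # Ohm, high-RS target (illustrative)
r_b = 1.0 / (11 / 1e3)    # Ohm, eleven 1 kOhm devices in parallel
v_os = 100e-6             # V, assumed TRA input offset
for name, r in [("single", single_read(0.5, r_t, r_b, v_os)),
                ("differential", differential_read(0.5, 0.0, r_t, r_b, v_os))]:
    print(f"{name:>12}: {r / r_t - 1:+.2%} error")
```

With these numbers the single read overestimates the target by close to 94%, while the differential read recovers it to within rounding. In this linear model the cancellation is exact because the offset-related terms do not depend on the bias; real offsets that drift between sub-operations would not cancel as cleanly.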
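3) Triple Read (Three Proxy Measurements): The "triple read" operation is based upon the idea of computing $R_T$ by proxy. The procedure flows as follows (see Fig. 4): a) Measure the equivalent resistance of all devices in the target word-line (the "full word," with equivalent resistance $R_{FW}$). b) Measure the resistance of the target bit-line (the "full bit," $R_{FB}$). c) Short together and bias the active word- and bit-lines with the aim of shunting $R_T$ while determining the "full complement," $R_{FC}$. The procedure yields three equations for three unknowns ($R_T$, $R_W$ and $R_B$) and hence $R_T$ can be computed, as shown in [14]. All read-outs are performed at the standard $V_{read}$. The equations for each sub-operation are as follows:

$$\frac{1}{R_{FW}} = \frac{I_{T,1} + I_{W,1}}{V_{read}} = \frac{1}{R_T} + \frac{1}{R_W} \quad (4)$$

$$\frac{1}{R_{FB}} = \frac{I_{T,2} + I_{B,2}}{V_{read}} = \frac{1}{R_T} + \frac{1}{R_B} \quad (5)$$

$$\frac{1}{R_{FC}} = \frac{|I_{W,3}| + |I_{B,3}|}{V_{read}} = \frac{1}{R_W} + \frac{1}{R_B} \quad (6)$$

where $I_{X,i}$ indicates the current through $R_X$ during the $i$th sub-operation.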
It then follows that:
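$$R_T = \frac{2}{G_{FW} + G_{FB} - G_{FC}} \quad (7)$$

where $G_X = 1/R_X$. Although algebraically a correct method for determining $R_T$, this method suffers from an inherent tendency to amplify errors committed while determining its partial results $R_{FW}$, $R_{FB}$ and $R_{FC}$. Let us define the fractional read-out error of any single read measurement as:

$$\epsilon = \frac{R_{meas} - R_{nom}}{R_{nom}} \quad (8)$$

where $R_{nom}$ is the nominal and $R_{meas}$ the measured RS of the target element(s). Noting that $G_{FC} = G_{FW} + G_{FB} - 2G_T$, it can be shown that the computed target device conductance in the presence of read-out errors is given by:

$$\hat{G}_T = \frac{1}{2}\left[\frac{G_{FW}}{1+\epsilon_{FW}} + \frac{G_{FB}}{1+\epsilon_{FB}} - \frac{G_{FW} + G_{FB} - 2G_T}{1+\epsilon_{FC}}\right] \quad (9)$$

where $\epsilon_{FW}$, $\epsilon_{FB}$ and $\epsilon_{FC}$ are the fractional errors in reading $R_{FW}$, $R_{FB}$ and $R_{FC}$ respectively and $G_T$ is the actual target device conductance. To first order in these errors, the fractional error in determining $R_T$ is then given by:

$$\epsilon_T = \frac{G_T}{\hat{G}_T} - 1 \approx \epsilon_{FC} + \frac{R_T}{2R_{FW}}\left(\epsilon_{FW} - \epsilon_{FC}\right) + \frac{R_T}{2R_{FB}}\left(\epsilon_{FB} - \epsilon_{FC}\right) \quad (10)$$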
Equation (10) shows that the fractional error involved in reading $R_{FC}$ transmits directly into the fractional error in reading $R_T$, but the terms $(\epsilon_{FW} - \epsilon_{FC})$ and $(\epsilon_{FB} - \epsilon_{FC})$ are amplified by $R_T/(2R_{FW})$ and $R_T/(2R_{FB})$ respectively. Furthermore, the Bienaymé formula¹ and the variance scaling property² tell us that any variance in $\epsilon_{FW}$, $\epsilon_{FB}$ and $\epsilon_{FC}$ (random read-out errors, e.g., due to noise in the circuitry) will be amplified by these factors and add up, causing a higher final computed variance in $\epsilon_T$ than in the measurements of $R_{FW}$, $R_{FB}$ and $R_{FC}$. Theoretically, the main benefit of using this technique would be that the entire crossbar array is treated as a two-terminal device in every sub-operation. This implies that, unlike in single and differential read-out, all current entering the array will exit through the ammeter, while the exclusive use of shunting (as opposed to bootstrapping) in order to neutralise various components of the crossbar (e.g., $R_T$ or $R_R$) should remove the requirement for careful handling of offsets. However, this technique requires three sub-operations to complete, and the system must be able to treat each crossbar line flexibly as either a word- or a bit-line. The relatively minor increase in circuit complexity needed to accommodate this additional flexibility is overshadowed by the fact that during the last sub-operation current flows through both $R_W$ and $R_B$, but in opposite polarities from the perspective of the DUTs. Therefore, this read-out technique is not suitable for arrays consisting of storage nodes with asymmetric I-V curves.
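The error-amplification argument can be checked numerically. The sketch below computes the ideal partial results of the three sub-operations for a 4n4c network with ideal lines, inverts them as in (7), and then perturbs each partial result with independent Gaussian fractional errors. It compares the resulting spread of the target error with a first-order propagation estimate. The values (a 220 kΩ target with 1 kΩ word and bit complements, 0.01% random error per partial result) are illustrative assumptions, not data from Table II, and the function names are hypothetical.

```python
import random
import statistics


def parallel(*rs: float) -> float:
    """Equivalent resistance of resistors in parallel."""
    return 1.0 / sum(1.0 / r for r in rs)


def triple_read(r_t, r_w, r_b, e_fw=0.0, e_fb=0.0, e_fc=0.0):
    """R_T recovered via (7) from partial results carrying fractional errors."""
    r_fw = parallel(r_t, r_w) * (1.0 + e_fw)   # a) full word
    r_fb = parallel(r_t, r_b) * (1.0 + e_fb)   # b) full bit
    r_fc = parallel(r_w, r_b) * (1.0 + e_fc)   # c) full complement (R_T shunted)
    return 2.0 / (1.0 / r_fw + 1.0 / r_fb - 1.0 / r_fc)


def first_order_gain(r_t, r_w, r_b):
    """Std of the target error per unit std of each (uncorrelated) partial error."""
    k_w = r_t / (2.0 * parallel(r_t, r_w))
    k_b = r_t / (2.0 * parallel(r_t, r_b))
    # Coefficients of e_fw, e_fb and e_fc; for uncorrelated errors variances add.
    return (k_w ** 2 + k_b ** 2 + (1.0 - k_w - k_b) ** 2) ** 0.5


def monte_carlo_gain(r_t, r_w, r_b, sigma, trials=20000, seed=1):
    """Empirical std of the target fractional error divided by sigma."""
    rng = random.Random(seed)
    errors = [
        triple_read(r_t, r_w, r_b, *(rng.gauss(0.0, sigma) for _ in range(3))) / r_t - 1.0
        for _ in range(trials)
    ]
    return statistics.pstdev(errors) / sigma


r_t, r_w, r_b = 220e3, 1e3, 1e3   # Ohm, illustrative
print(f"ideal triple read: {triple_read(r_t, r_w, r_b):.1f} Ohm")
print(f"first-order gain:  {first_order_gain(r_t, r_w, r_b):.0f}")
print(f"Monte Carlo gain:  {monte_carlo_gain(r_t, r_w, r_b, sigma=1e-4):.0f}")
```

With these values both estimates put the gain at roughly 270, so a 0.01% random error on each partial result already becomes a spread of a few percent on $R_T$. When all three partial errors are equal, the computed $R_T$ inherits that common error unchanged, which is the cancellation the method relies on.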
As a side note, we notice a further benefit of this technique concerning speed: each sub-operation aims to compute a low resistance formed by the parallel combination of either $N$ or $2(N-1)$ devices. Consequently, for sufficiently large $N$ the determination of all l.h.s. terms in (4), (5) and (6) can potentially be much faster than obtaining a "single" or "differential" read result, and this speed benefit will tend to improve with increased array size. The detailed study of the transient behavior of the three read-out techniques is outside the scope of this paper, where we concentrate on steady-state behavior (read-outs are assumed to be taken when all voltages throughout the system have settled).

¹ The variance of the sum or difference of uncorrelated random variables equals the sum of their individual variances.
² The variance of $aX$, where $X$ is a random variable and $a$ a constant, equals $a^2$ times the variance of $X$.

III. SYSTEM IMPLEMENTATION

Fig. 5 shows a simplified schematic of the instrumentation platform used to implement the read-out techniques under study. At its core sits the crossbar array under study. Around it lies the access framework module, which consists of single-pole triple-throw (SP3T) switches whose positions determine which lines act as active and inactive word- and bit-lines. Pairs of relays (G6EU-134P-US) were used in order to implement the SP3T functionality while minimizing access resistance. Finally, the measurement environment consists mainly of the bias generator and the TRA. The TRA consists of a precision op-amp (OPA227, low maximum input offset) and its feedback resistor bank (six precision resistors: five with 0.1% tolerance and one with 1% tolerance), and acts as an ammeter with its output voltage $V_{out}$ as the current-reading variable. The bank allows the TRA to measure a large range of currents while keeping $V_{out}$ within the amplifier's output swing limits. Each resistor in the bank is software-assigned to measure target loads within given RS ranges (Table I). The boundaries between the RS ranges of adjacent resistors are given by their geometric means. The bias generator consists of an LT1970A amplifier whose output is measured each time a device is read in order to improve measurement accuracy.

TABLE I: TARGET LOAD RS CLASSES (with the number of devices of each class in the reference crossbar; see Section IV)
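The class assignment rule (boundaries at the geometric means of adjacent feedback resistors) can be sketched as follows. The decade-spaced bank values are hypothetical placeholders rather than the values of Table I, the function names are our own, and `r_estimate` (how the firmware obtains a first estimate of the load) is an assumed input.

```python
import math
from bisect import bisect_right

# Hypothetical six-resistor feedback bank (Ohm); not the values of Table I.
BANK = [1e3, 10e3, 100e3, 1e6, 10e6, 100e6]


def class_boundaries(bank):
    """Boundaries between adjacent RS classes: geometric means of neighbours."""
    rs = sorted(bank)
    return [math.sqrt(lo * hi) for lo, hi in zip(rs, rs[1:])]


def rs_class(r_estimate, bank=BANK):
    """1-based RS class (i.e. which feedback resistor to select) for a load."""
    return bisect_right(class_boundaries(bank), r_estimate) + 1


for r in (500.0, 5e3, 220e3, 1e9):
    print(f"{r:>10.0f} Ohm -> class {rs_class(r)}")
```

Placing each boundary at the geometric mean splits the interval between neighbouring resistors symmetrically on a logarithmic scale, which suits loads spanning several orders of magnitude.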
At a higher level, the system is operated by a microcontroller (mbed LPC1768) and implemented on a custom-made PCB with discrete components. This choice was crucial, as it allowed us to use very high-specification components in order to test the limits of read-out accuracy. Other system modules, not shown in the schematic of Fig. 5, include read-out buffers, voltage references, the power management unit and the device programming unit. A photograph of the instrument is shown in Fig. 6.
IV. EXPERIMENTAL RESULTS
The performance of the "single," "differential," and "triple" read-out techniques was benchmarked against a reference crossbar array. The array consisted of discrete, linear resistors with RS ranging between 1 and 220 (Fig. 7), values covering the initially intended region of operation of the instrument (1-100). Notable features of the reference array include: a) an all-low-RS (1) bit-line intended to uncover the effects of attempting to bootstrap a very low RS path; b) a high-RS word-line testing read-out performance at excessively high RS; and c) components randomly drawn from pools of available devices to provide insight into the read-out of intermediate-value components in a relatively homogeneous environment.
Every device in the reference crossbar was measured with each technique described in Section II and the fractional read-out error was computed. Results obtained from measurements on the reference array are shown in Fig. 8. Notably, results for the single and differential read-out techniques are very similar and cover broadly similar ranges of fractional error. On the other hand, results obtained for the triple read show fractional errors of up to thousands of percentage points above and below nominal. This is rather surprising considering that the partial results for full word, full bit and full complement are all fairly tightly distributed. Panels (d) and (e) in Fig. 8 show characteristic horizontal and vertical bands, respectively. This is because the triple read sub-operations used to determine the full word or full bit RS are procedurally identical for all devices on a line, and therefore their results should not depend on the specific selection of target device. As a result, panels (d) and (e) of Fig. 8 contain 12 independent measurements of each full word and full bit RS, one for each device in a line. Differences within each set of 12 measurements reveal the effects of random measurement errors. These effects are summarized in Table II, where for each word/bit line the average fractional error $\mu$ over all 12 measurements and the corresponding standard deviation $\sigma$ were computed. $\sigma$ can be a useful indicator of spread in the data even though the underlying distribution may not necessarily be Gaussian.
Many pairs of lines are read with statistically significant fractional error differences (differences in $\mu$ significantly larger than both associated standard deviations $\sigma$), as can be seen by examining, e.g., the cells highlighted in yellow in Table II (a two-tailed, two-sample t-test shows that the highlighted values of $\mu$ are significantly different, with a p-value of 0.00095). This indicates that each line is read at a fundamentally distinct fractional error, possibly a function of the RS of the line itself and/or of the state of the rest of the crossbar array. This is important because it implies that the fractional errors committed in measuring the partial results entering (7) do not necessarily cancel each other out. At a higher level, we also observe that our reference array's word- and bit-lines are read at significantly different average fractional errors, although for both, each line contains readings with, on average, similar spread.
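The per-line statistics of Table II and the significance test mentioned above can be sketched with the standard library. We assume Welch's unequal-variance form of the two-sample t statistic, since the text does not state which variant was used; obtaining the p-value additionally requires the Student t distribution (for instance, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the complete Welch test). The input lists are made-up numbers, not data from Table II.

```python
import math
import statistics


def line_stats(errors):
    """Mean and sample standard deviation of repeated fractional errors."""
    return statistics.mean(errors), statistics.stdev(errors)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    mean_a, sd_a = line_stats(a)
    mean_b, sd_b = line_stats(b)
    var_a, var_b = sd_a ** 2 / len(a), sd_b ** 2 / len(b)
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1))
    return t, dof


# Made-up repeated fractional errors for two lines (not data from Table II).
line_a = [0.010, 0.012, 0.011, 0.009, 0.010, 0.011]
line_b = [0.020, 0.021, 0.019, 0.022, 0.020, 0.021]
print(line_stats(line_a), line_stats(line_b))
print(welch_t(line_a, line_b))
```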
Revisiting (10), we note that the $R_T/(2R_{FW})$ and $R_T/(2R_{FB})$ factors take large values for some devices, as evidenced in Fig. 7: for example, the device sitting at (word, bit) location (7, 8) has a value of 220 while its corresponding full word and full bit have values of 908 and 832, respectively. If we set device (7, 8) as the target, both factors become very large. Even in the absence of systematic fractional errors, this would cause relatively small random errors in determining $R_{FW}$, $R_{FB}$ and $R_{FC}$ (see the $\sigma$ values in Table II) to generate intolerably high read-out errors on most trials.
In order to investigate the resulting data further, the distributions of fractional errors and their corresponding cumulative distributions were extracted for each read-out technique, as shown in Fig. 9, top two rows. The bottom row shows cumulative error distributions separately for devices belonging to each RS class. Notably, both single and differential read-out show a propensity for overestimating smaller resistances while underestimating larger ones. In both cases the worst performers tend to be the devices in RS class 5. The relatively large number of devices in RS classes 1 and 2 skews the overall error distribution towards a 3% average overestimate. In the triple read case, the error distribution shows a vast range of values, including many resistances that were read as negative ($\epsilon < -1$ implies $R_{meas} < 0$). Devices exhibiting higher RS tend to suffer much higher read-out errors.
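The cumulative distributions discussed above are empirical CDFs of the fractional errors, optionally grouped by RS class. A minimal sketch follows; the `(rs_class, error)` pairing and the sample values are illustrative assumptions. It also counts readings with $\epsilon < -1$, i.e., resistances read as negative.

```python
from bisect import bisect_right
from collections import defaultdict


def ecdf(samples):
    """Return F(x) = fraction of samples <= x (empirical CDF)."""
    xs = sorted(samples)
    return lambda x: bisect_right(xs, x) / len(xs)


def group_by_class(readings):
    """readings: iterable of (rs_class, fractional_error) pairs."""
    groups = defaultdict(list)
    for rs_class, err in readings:
        groups[rs_class].append(err)
    return dict(groups)


def negative_reads(errors):
    """Count readings with error < -1, i.e. a measured resistance below zero."""
    return sum(1 for e in errors if e < -1.0)


# Illustrative (class, error) pairs only.
readings = [(1, 0.03), (1, 0.04), (2, 0.02), (5, -0.08), (5, -1.7), (5, 2.5)]
groups = group_by_class(readings)
cdf_5 = ecdf(groups[5])
print(cdf_5(0.0), negative_reads(groups[5]))
```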
V. SPICE ANALYSIS
In order to investigate how the three read-out techniques can be expected to perform in the presence of a controlled amount of realistic non-idealities, each was examined through SPICE simulations based on the simplified system schematic in Fig. 5. Table III lists the non-ideality factors taken into account, three of which were of key significance. All non-ideality factors except the line resistance were based on the components used in our system and kept fixed for all simulations. The TRA core amplifier was modelled behaviorally, although result accuracy was validated with the more computationally demanding SPICE model provided by the supplier of the component (Texas Instruments).
The simulated system was first tested with the reference array from Fig. 7 in order to compare its performance against the physical system, and then with a 12 × 12 "worst case" array with minimum RS $R_{min}$ and maximum RS $R_{max}$. The line resistance was 50 in both cases. We define "worst case" arrays as arrays consisting of linear I-V elements where the device farthest from the access resistors, in our case the device with (word, bit) coordinates (1, N), is at the maximum allowed RS while every other cross-point node is set at the minimum RS (see Fig. 10(d)). Results are summarized in Fig. 10.
Results on the reference array show that, under the simulated system set-up, the low-RS bit-line exhibits significantly higher errors than the rest of the array due to the effects of line resistance. This is confirmed by noting that the errors abate gradually as one moves towards the bit-line access switch (towards word-line 12). This issue affects all read-out techniques similarly. Next, we notice that only in the case of the single read can we still observe traces of the pattern present in Fig. 8(a). This pattern vanishes when the single read operation is simulated with an offset-free TRA. The differential and triple read techniques seem to eliminate much of the propensity of the single read technique to underestimate devices at high RS, likely because of the offset-cancelling nature of the differential read.
Results on the 12 × 12 worst case array confirm that the differential and triple reads mitigate high-RS device underestimation, as seen in the table inset in Fig. 10(d). In the case of the triple read, the high-RS device is read successfully because the full word and full bit are read with exactly the same fractional error, while the full complement is read at an extremely close one. The main contribution to the final overall read-out error of 0.73% comes from the first r.h.s. term of (10) (0.59%).
Interestingly, in this particular array set-up it is the worst-read of the low-RS devices that forms the bottleneck of the design. This device shows a fairly consistent fractional error and is located at address (2, 11) for all read-out techniques (marked with a red box in Fig. 10(d)). Notably, this is the device farthest from the word- and bit-line access switches that does not have the high-RS device on either of its lines, a result underlining that the worst-performing device is not always the most obvious one (typically assumed to be the high-RS device).
Next, simulations were carried out on a variety of worst case arrays of different sizes and line resistances. The line resistance was swept between a minimum of 50, corresponding to the expected line resistance for the reference array, and a maximum of 10.6, corresponding to the expected line resistance for an array employing electrodes with a 100 × 10 nm cross-section and 100 nm pitch. The fractional read-out errors for the high-RS device are shown in Fig. 11.
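The upper end of the line-resistance sweep can be estimated as the resistance of one electrode segment between adjacent cross-points, $R = \rho \cdot \text{pitch}/(\text{width} \cdot \text{thickness})$. The electrode material is not stated in this section; assuming bulk platinum with a resistivity of roughly $1.06 \times 10^{-7}$ Ω·m (our assumption) gives a value consistent with the quoted 10.6 if that figure is read as ohms per segment. Thin-film resistivity is typically higher than bulk, so this is an optimistic estimate.

```python
def segment_resistance(resistivity: float, pitch: float, width: float, thickness: float) -> float:
    """Resistance (Ohm) of one electrode segment: rho * length / cross-section."""
    return resistivity * pitch / (width * thickness)


RHO_PT = 1.06e-7  # Ohm*m, assumed bulk platinum resistivity (material not stated in the text)

r_seg = segment_resistance(RHO_PT, pitch=100e-9, width=100e-9, thickness=10e-9)
print(f"{r_seg:.1f} Ohm per segment")
for n in (12, 64, 256):
    # End-to-end resistance of a full line with n segments.
    print(f"N = {n:>3}: {n * r_seg:.0f} Ohm end to end")
```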
Read-out accuracy is very similar for all read-out techniques, indicating that line resistance and overall system loading affect all three read-out techniques similarly in the idealized simulation framework used in this section. We notice three key trends. First, small arrays with low line resistance tend to allow for accurate read-out of the target device, as expected, but as line resistance increases the target resistance starts to be underestimated (Fig. 11(a), marked (i)). This is probably caused by the line resistance interfering with the bootstrapping/shunting of the active line(s). For example, in Fig. 3 the inactive word-lines can no longer be assumed to be sufficiently well grounded throughout their entire lengths, resulting in above-ground voltages at the word-line terminals of the bit-complement devices. When the bit-line terminals of the bit-complement devices (which sit on the active bit-line) are below their word-line terminals (a situation aided by low currents), extra current is injected onto the active bit-line and hence the TRA block overestimates the current through the target element. Secondly, if line resistance continues to grow, the system tends to start overestimating target element resistance (Fig. 11(a), marked (ii)). This probably occurs because voltage delivery to the target element fails completely, with effectively all current between the bias generator and ground bypassing the target element and choosing shorter pathways through the array. Finally, we notice that for larger arrays the two aforementioned trends are still present, but appear at lower values of line resistance.
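The line-resistance mechanism can be explored with a much simpler model than the SPICE set-up of this section. The sketch below performs nodal analysis of an N × N worst-case array of linear resistors under a single read of the (1, N) device. It assumes that every line segment, including the link from each line to its driver, has resistance `r_line`; word-lines are driven from the bit-1 end and bit-lines from the word-N end; the active word-line is at `v_read` and every other driver, including an ideal TRA, is at 0 V. Access switches, TRA offsets and the other Table III factors are deliberately omitted and the function names are hypothetical, so the numbers should only be read qualitatively.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (in place)."""
    n = len(b)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            if f != 0.0:
                for c in range(col, n):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def worst_case_single_read_error(n, r_min, r_max, r_line, v_read=0.5):
    """Fractional single-read error of the (1, N) device in a worst-case array."""
    tw, tb = 0, n - 1                      # target (word 1, bit N), 0-based
    res = [[r_min] * n for _ in range(n)]
    res[tw][tb] = r_max
    size = 2 * n * n
    G = [[0.0] * size for _ in range(size)]
    I = [0.0] * size
    g = 1.0 / r_line

    def w(i, j):                           # node on word-line i at bit j
        return i * n + j

    def b(i, j):                           # node on bit-line j at word i
        return n * n + i * n + j

    def link(p, q, cond):                  # conductance between two nodes
        G[p][p] += cond
        G[q][q] += cond
        G[p][q] -= cond
        G[q][p] -= cond

    def drive(p, cond, v):                 # conductance to a fixed voltage
        G[p][p] += cond
        I[p] += cond * v

    for i in range(n):
        for j in range(n):
            link(w(i, j), b(i, j), 1.0 / res[i][j])      # cross-point device
        for j in range(n - 1):
            link(w(i, j), w(i, j + 1), g)                # word-line i segments
            link(b(j, i), b(j + 1, i), g)                # bit-line i segments
        drive(w(i, 0), g, v_read if i == tw else 0.0)    # word driver at bit 1
        drive(b(n - 1, i), g, 0.0)                       # bit driver at word N
    v = solve(G, I)
    i_meas = v[b(n - 1, tb)] * g           # current into the (ideal) TRA
    return (v_read / i_meas) / r_max - 1.0


for r_line in (1e-3, 1.0, 10.0):
    err = worst_case_single_read_error(4, 1e3, 1e5, r_line)
    print(f"r_line = {r_line:g} Ohm: error {err:+.3%}")
```

In this toy model the error magnitude grows with `r_line` even though every other source of error has been removed; no quantitative agreement with Fig. 11 should be assumed.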
VI. DISCUSSION
In this paper, we have examined the issue of accurate read-out of the resistive state of devices within linear, selectorless crossbar arrays. We have presented three read-out techniques, analyzed some of their key sources of error, implemented a system capable of carrying them all out, presented measured data from a reference array and performed simulations in order to better understand the unique attributes of each technique. Our analysis indicates that the differential read technique becomes advantageous vis-à-vis the more traditional single read if systematic offsets within the system are a significant source of error. In the noiseless, simplified simulated system we see a clear accuracy benefit, although in the physical system other sources of error seem to dominate: errors that cannot be eliminated through use of the differential read technique.
With regard to the triple read technique, we showed that it has an in-built tendency to amplify differences in the read-out errors of its partial results. Interestingly, in the simulated system these "partial errors" tended to cancel each other out, which shows that the triple read technique exhibits some degree of inherent resilience to the controlled imperfections we introduced in our simulated system. In the physical system they failed to cancel out and led to extraordinarily high RS read-out errors, possibly because of the highly randomized and asymmetric configuration of the reference array used as a test subject. We also noted that the triple read technique requires employing at least two different read-out voltages throughout its cycle (in our case the standard $V_{read}$ and $-V_{read}$). This renders it hard to operate on practical devices with asymmetric I-V curves.
Finally, we have shown simulated results indicating that none of the examined read-out techniques can compensate for the inherent limitations arising from within the array (line resistance, selectorless nature) much more successfully than the others, even when many other sources of error are factored out. This, in combination with the fact that the region of satisfactory operation in (array size, line resistance) space is rather restricted, points towards the absolute necessity of operating crossbar arrays with high-quality selectors.
This work can be pursued further by addressing three crucial issues: a) introducing transient analysis in order to fully investigate issues of read-out speed and power dissipation; b) measuring and simulating arrays with non-linear, possibly asymmetric, I-V cross-point elements; and c) assessing system performance on arrays that include selector devices, ideally two-terminal selectors integrated into the array fabric.
