The high demands for performance and energy efficiency pose significant challenges for computational systems. Memristor-based crossbar architectures are actively considered as viable rivals to the traditional solutions. Nonetheless, density- and energy-driven passive array structures, which lack a switching control per cell, suffer from sneak paths that limit the range of accurate operation of the crossbar array. In this paper, the crossbar array is treated as a communication channel with added distortion to represent the sneak current. Estimation techniques based on preset pilots are utilized to alleviate the distorting effects and enhance the system throughput. A two-dimensional setting of these reference points leads to an accurate estimation of, and compensation for, the sneak path effects. Thereby, a comprehensive technique is presented that boosts the performance and accommodates the functional metrics of speed, energy efficiency, accuracy and density all within a single envelope. SPICE simulations cover the data pattern dependencies, the non-linearity impact, and the crossbar distortion. They offer further validation, from several aspects, of the reliable operation attained with the complete separation of the high and low bit regions.
Introduction
Design and technology innovation of computational architectures, along with data and server growth, poses an exigent need for novel approaches. It has initiated a mandated exploration of emerging technologies [1, 2] that address the set conventions beyond Moore's law [3]. The discrepancy in data rate and throughput between the memory and the processing block further stresses the communication bottleneck, where the memory falls short of providing the anticipated data rates [4]. Moreover, extensive scaling has pushed the conventional memory technologies up to their limits, which compromises the main functionality metrics of reliability, speed, and retention capabilities [5].
Memristors are promising candidates that could be up to the stringent challenges in diverse applications [6] [7] [8] [9] [10] [11] . The crossbar structure is of particular interest, due to its many advantages, whether in terms of density, efficiency of fabrication and power consumption [12, 13] . It has paved its way into memory applications [14] [15] [16] [17] [18] [19] , including associative memories [20] in the process to bridge the gap between memory and designated computational blocks. It was also adopted in unconventional computing systems, showing high potential for embedded architectures and co-localized in memory computation [21] [22] [23] [24] [25] . Neuromorphic processing techniques incorporated the crossbar as well in terms of synapses to overcome the density and integration challenges faced by the current CMOS approaches [26] [27] [28] [29] .
Data is saved within the memristor-based memory array in the form of resistances. The two memristor boundaries of R_off and R_on correspond to the bits '0' and '1' respectively. Writing to the memory is accomplished by applying a voltage across the memristor to ensure its switching to either the high or the low resistance state. Reading out data from the memory, in contrast, requires activating the corresponding row and column of a target cell at a voltage lower than its switching threshold. The density-driven adoption of a passive topology, where no access control [6] is available per cell, imposes an additional challenge to be addressed in the read operation: the sneak path phenomenon [6, 30, 31]. Sneak paths are considered as added distortion to the actual cell data. The readout value of a particular cell is the combination of the saved data along with the added distortion, as depicted in Fig. 1a. The sneak paths are a result of having open access to any of the cells in the array. Such a circuit structure is adopted in order to preserve the density attained with the memristor crossbar structure. Nonetheless, the presence of sneak paths leads to a large overlap in the regions for the readout values of '1' and '0' bits, as shown in Fig. 1b.
A trivial solution to the sneak path problem is to add a switch per cell to control the access, thus leading to gated or 1T1M/1D1M architectures [32, 33]. The added switch provides higher reliability but at the expense of increased area per cell and consequently reduced density. Gateless solutions that address the distortion imposed by the sneak path are split into circuit-based and data analysis approaches. The former apply different architectural [16, 34-39] and delay accommodation techniques [17], which achieve improved margins and accuracy of the readout data while sacrificing other metrics of power efficiency, density and speed respectively. On the other hand, data analysis approaches take the data-dependent characteristics into consideration and dampen the sneak current effects by applying information theoretic and statistical principles. These data-constrained measures add complexity to the sensing circuit design [40] and limit the amount of information that could be encoded within the array [17, 41, 42]. A compromise is encountered in all of the aforementioned techniques for overcoming the sneak path or distortion limitation. Gated approaches suffer mainly from density limitation but preserve the reliability of operation. On the other hand, gateless approaches prioritize the density measure over all other metrics. Thus, contrasting measures of energy efficiency, circuit design complexity, information constraints, speed, and accuracy are all set forward without a comprehensive solution.
In this paper, we propose a novel readout technique that preserves the advantages of the crossbar array while adhering to the strict design requirements of integration and operation efficiency. We model the sneak path distortion affecting a memory cell as an additive noise that is correlated with the distortion imposed upon the cells in the same row and column. Exploiting this correlation, we devise a technique for estimating the sneak path current by making use of pilots; memory cells that contain known data values. Once the distorting sneak path effect is estimated, its effect can be alleviated. A simple threshold-based readout mechanism can then be employed, and is shown to yield high accuracy. Using SPICE simulations, we demonstrate the improvements achieved in terms of read margins and the resulting complete separation between the high and low bit regions. The simulation setup incorporated the array non-idealities such as the wire resistance. Tests addressed the non-linearity effects, the probability of '1' or low resistance within the array data, and different data patterns including random and NIST memory images [43] . We report the SPICE simulation results for large arrays of 256 kb and reaching up to 1 Mb in size. Incorporating pilots within gateless crossbar arrays could be used for different purposes. At one end, it provides insight to the sneak path and a way to alleviate its effect using a simple linear model. The model is not strictly confined to memristor cells, but is also applicable to other emerging technologies within a crossbar structure. On the other hand, it could also serve as post silicon validation techniques to ensure the operations are within the set constraints.
The remainder of the paper is arranged as follows. Section 2 presents the sneak path model within passive arrays and its estimation principles. The integration of the pilot within the crossbar array and its consequent effect on operation are illustrated in Section 3. Simulations and results of the pilot-assisted readout operation covering different aspects of the data effects and circuit design are discussed further in Section 4. The Discussion in Section 5 holds the analysis of our proposed technique and the comparison with alternative approaches that target the sneak path phenomenon. Section 6 presents the conclusions drawn and a summary of the overall paper.
Dynamic sneak path estimation
Reading a cell value within a memory array is usually done by applying a read voltage across the corresponding word line and bit line of the target cell. This approach to the reading operation, in which the remaining rows and columns of the memory array are left floating, constitutes a floating rows and columns (FRC) scheme. In gateless architectures, the memory cell is solely composed of the memristor element as a storage device, as shown in Fig. 2. The memristor model used throughout the paper is the HP device reported in [17] 10 /10 9 1 2 respectively. Upon activating a particular row and column for the read operation, no cell selector is available to suppress the access to the remaining cells in the memory array. Consequently, current sneaks over to the other cells and forms a significant amount of distortion. The sensed current is then composed of the original cell data along with the sneak path current. Fig. 1a shows a sample of the sneak path current that is added to the target cell to form the readout value.
Channel model
As current sensing is used in order to read the memory cell value, the sneak paths act as a resistor in parallel with the target cell [6, 38]. Thus, the sensed current $I_{sense}$ is the sum of the current of the target cell $I_t$ and the sneak path current $I_{sneak}$:
$$I_{sense} = I_t + I_{sneak} \qquad (1)$$
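The parallel-resistor view of the sensed current can be illustrated with a minimal sketch. The read voltage and the equivalent sneak resistance below are hypothetical values chosen only for illustration; the cell resistances follow the linear device parameters quoted later in the paper (1 MΩ and 1 GΩ).

```python
def sensed_current(v_read, r_target, r_sneak):
    """Return (I_sense, I_t, I_sneak) for a target cell in parallel with a sneak network."""
    i_target = v_read / r_target
    i_sneak = v_read / r_sneak
    return i_target + i_sneak, i_target, i_sneak


V_READ = 1.0             # read voltage in volts (assumed for illustration)
R_ON, R_OFF = 1e6, 1e9   # low/high resistance states of the linear device
R_SNEAK = 0.5e6          # hypothetical equivalent resistance of the sneak network

ideal_one = V_READ / R_ON                                 # undistorted '1' readout
zero_with_sneak, _, _ = sensed_current(V_READ, R_OFF, R_SNEAK)

print(f"ideal '1' current      : {ideal_one:.3e} A")
print(f"'0' with sneak current : {zero_with_sneak:.3e} A")
```

With these assumed numbers, a stored '0' reads out a larger current than an undistorted '1', which is the kind of overlap between the two regions that the estimation technique targets.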
The sneak current is primarily a function of the complete array data and dimension with no confined leakage path. Moreover, the level of distortion added is quite random and spreads across the complete dimensions of the array [44, 45]. It forces a random distribution on the high and low resistance states that shifts them away from their original values. It causes an overlap between the regions and diminishes the read margins, as shown in the shaded area in Fig. 1b. The overlap percentage, defined as the shaded region normalized by the overall range of values, is the measure to be addressed and alleviated using the estimation technique. In this paper, we model the sneak path effect as a distortion added to the binary signal stored in a memory cell. As depicted in Fig. 3, the sneak path estimation is the filtering step required in the process of rearranging the distributions of the ones and zeros by shifting them apart and forming the clear separation needed for decoding. The input signal $S_i$ stands for the original data saved within the array, whereas $D_i$ corresponds to the added distortion or sneak path current. With the estimation of the distortion, the decoded signal $\hat{S}_i$ is the resulting output value. We explain below how we model the distortion introduced by the sneak path.
The sneak path current is modeled as an additive Gaussian distributed noise due to its randomness and dependence on the data within the memory array. Moreover, following the noise analysis in [44] , the distortion is found to be two-dimensional and depends on the row and column of the target cell that we desire to read, as depicted in Fig. 4 . As the column and row of the target cell are the only lines with voltage activated across them, the current passing through these two dimensions is the most prominent factor on the sneak path current. Furthermore, the number of ones within the row/column has a direct impact on the distortion encountered at a particular cell within the corresponding lane. As the bit '1' is mapped into R ON compared to R OFF for bit '0', a lower resistance is seen per cell, and more sneak current is allowed to pass through the cell. This effect propagates along the row/column as higher current accumulates with larger number of '1's along the vertical/horizontal dimensions.
The distortion is thus primarily a function of the row and column factors, $D = f(D_r, D_c)$. However, the contributing percentage of each dimension is not easily formulated as it is data dependent. Thereby, the proposed estimation process is built on fixing one dimension and scaling the distortion by the number of low resistance states in the perpendicular dimension. When reading cells across a particular row in the array, the same row is activated for every cell, but a different column is activated for each cell. Thus, the row effect is common among all the readings along the same row, whereas the column effect is the variable factor. Moreover, simulating the array, it was apparent that the larger the number of '1's in the row/column, the higher the distortion seen in the cells along the vertical/horizontal dimension [38]. Hence, the maximum distortion that occurs in a certain row takes place at a column where all stored data are ones. Thus, for any target cell, depending on the row it resides on, the maximum distortion along this particular row is obtained. Furthermore, the dependency of the distortion on the overall number of ones allows for the scaling of the maximum row distortion by the actual number of ones in the column that the target cell resides on. Hence, we can formulate the following expression for the distortion at the target cell (i, j), i.e., the cell at the ith row and jth column:
$$D_{i,j} = D_{max,i} \cdot \frac{N_{c,j}}{N_A} \qquad (3)$$
where $D_{max,i}$ is the maximum distortion affecting the ith row. Parameters $N_{c,j}$ and $N_A$ are the actual number of ones in column j and the dimension of the array, respectively. In order to estimate $D_{i,j}$, we need to estimate $D_{max,i}$ and $N_{c,j}$. To this end, we use pilot cells in the memory array where the stored data are always one. After explaining the pilot setting within the memory array, we provide the details of the steps needed to estimate the sneak path signal along with the simulation setup and results.
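The scaling described above, where the maximum row distortion is weighted by the fraction of ones in the target column, can be sketched in a few lines. The function and argument names are illustrative and not taken from the paper, and the numeric values are made up.

```python
def estimate_distortion(d_max_row, n_ones_col, n_array):
    """Scale the maximum row distortion by the fraction of ones in the target column."""
    if not 0 <= n_ones_col <= n_array:
        raise ValueError("column count of ones must lie within the array dimension")
    return d_max_row * n_ones_col / n_array


# Illustrative values only: a 512-cell column and a normalized row distortion of 0.8.
full_column = estimate_distortion(0.8, 512, 512)   # all-ones column: maximum distortion
half_column = estimate_distortion(0.8, 256, 512)   # half the cells store '1'
print(full_column, half_column)
```

An all-ones column returns the full row distortion, which matches the statement that the maximum distortion in a row occurs at a column where all stored data are ones.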
Pilot setting
In its abstract sense, pilots are preset cells that allow for estimating the channel and distortion undergone by data prior to its detection [46] . Earlier practices introducing reference points in memory primarily rely on the assumption of noise correlation over a single dimension. Thereby, alternating values of high and low reference bits are used. With current sensing applied for the detection of saved data, a dynamic current threshold is set per fragment to distinguish the low and high resistance states with no distortion estimation [47] . It is a complex setup that occupies a large portion of the array, requires constant threshold adjustment, and forces a compromise between the estimation range and the enhancement attained. However, our proposed technique emphasizes the two dimensional nature of the distortion in a direct relation to the position of the target cell in terms of the word and bit lines respectively.
Inserting reference points within the crossbar memristor memory is a challenging task considering the intermingled factors that have to be addressed. The foremost condition is concerned with the number of cells that should be reserved for this task, thus compromising the overall information capacity of the array. Moreover, the location where the pilots are placed plays an important role in the estimation process. The parameters needed for estimating the distortion are also highly dependent on the pilot cells and its location setting. To that end, several allocation possibilities were investigated while bearing in mind the density metric and the performance enhancement.
Starting with pilots in the vertical dimension, setting the pilots to all zeros will impose a high resistance along the line. Thus, the current passing through the column will be too small to capture the variation of the data within the array. It would act to suppress, to a certain extent, the sneak path effect, as shown in [38], but it requires the reservation of several lines for that purpose and would not allow for the estimation. On the other hand, setting the pilots to a low resistance exposes the sensitivity of the pilot cells to the change in the data within the array, particularly along the rows they reside on. Similarly, setting the pilots with high bits on the horizontal dimension captures the sensitivity of the pilot cells to the variation along the columns. Setting pilots along a single dimension is not sufficient, as it does not provide all the parameters needed for the estimation.
Hence, the choice to set the first row and column of the array as pilots and to fill them with ones, stands out as an adequate setting that serves the main purposes for the estimation process and density conservation. Fig. 5a shows the allocation of the data within the array. The column pilots provide the maximum distortion per row, whereas the row pilots provide the required estimate regarding the count of the high bits in a column.
Estimation principles
As explained previously in the argument leading to (3), the following two quantities are needed for the sneak path estimation:
1. The maximum distortion along the row, $D_{max,i}$.
2. The number of high bits in the vertical dimension, $N_{c,j}$.
Parameter $D_{max,i}$ is estimated using the pilot cell in the same row as the target cell. Note that this pilot cell lies within an all-ones column. The pilot cells are read out using current sensing, with an input bias applied across the first column and the row the pilot resides on. The estimated value is simply the difference between the pilot readout value, $P_{r,i}$, and $I_{high}$, which is the signal we would ideally get if the stored signal were not subject to any contamination or distortion. Regarding parameter $N_{c,j}$, we make use of the following hypothesized model for the readout pilot value associated with the pilot cell on top of each column. Specifically, we assume that the readout pilot value, $P_{c,j}$, is a linear function of $N_{c,j}$:
$$P_{c,j} = \alpha N_{c,j} + \beta \qquad (4)$$
where α and β are the linear fitting parameters that are obtained offline prior to the reading stage. As depicted in Fig. 6, reading out pilot values from several simulations of different array sizes, with the first row and column filled completely with ones, shows the dependence of the pilot readout value on the number of ones across the column/row it resides in.
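The offline fitting step can be sketched as an ordinary least-squares line fit, followed by inverting the fitted line to recover the number of ones from a pilot readout. This is a minimal sketch under assumptions: the calibration points are synthetic, and the names `fit_pilot_model` and `estimate_column_ones` are hypothetical, not the paper's.

```python
def fit_pilot_model(n_ones, pilot_readouts):
    """Least-squares fit of pilot_readout = alpha * n_ones + beta."""
    n = len(n_ones)
    mean_x = sum(n_ones) / n
    mean_y = sum(pilot_readouts) / n
    sxx = sum((x - mean_x) ** 2 for x in n_ones)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(n_ones, pilot_readouts))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta


def estimate_column_ones(pilot_readout, alpha, beta):
    """Invert the fitted line to estimate the number of ones in a column."""
    return (pilot_readout - beta) / alpha


# Synthetic calibration data (assumed, noise-free): normalized readout vs. ones count.
calib_ones = [1, 128, 256, 384, 512]
calib_readouts = [1.0 + 0.002 * n for n in calib_ones]

alpha, beta = fit_pilot_model(calib_ones, calib_readouts)
n_est = estimate_column_ones(1.0 + 0.002 * 300, alpha, beta)
print(alpha, beta, round(n_est))
```

In practice the calibration points would come from array simulations or measurements rather than from a known line, so the fit would carry some residual error.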
Linear model
The main concept behind the pilot linear model is to capture the effect of the column data variation on the readout value of the pilot. Consequently, the relationship established simplifies the required estimation to an extent where the regions for the high and low resistance state readings are distinguishable. Indeed, diverse and more elaborate models could be devised, but at the expense of more complex computations and estimation principles. In that regard, the two main factors that affect the modeling are the $R_{OFF}/R_{ON}$ ratio (α) and the size of the array N.
The change of α directly affects the values of the high and low resistance states. Thus, with decreasing ratios, the high resistance state starts to play a role in the readout value of the pilot cell. However, linearity still holds but with a much higher sensitivity required. As a conclusion, adopting higher ratios allows for a better estimation, bearing in mind that ratios up to 1000 are typical in memory design [17]. Similarly, in terms of the array size, with increasing array dimensions, the sneak path value rises along with the effect of the horizontal and vertical dimensions the read cell lies upon. Hence, it reflects back on the sensitivity of the readings, with lower precision needed at higher array sizes.
where ρ stands for the sensitivity of the reading. There is a trade-off between the complexity of the estimation process and the model adopted on the one hand, and the array size and the ratio of the resistance states on the other. The linear relationship apparent between the pilots and the number of ones is constant across various distributions and patterns saved within the same array size. Thus, the linear fitting operation is required only once in the crossbar lifetime.

Fig. 5. Crossbar array pilots. (a) Reserved pilot setting on the crossbar along the primary row and column. The pilot cells contain preset values (set to '1' in this case) to be used for the estimation process. (b) The vertical and horizontal pilots needed for the estimation of the sneak path along a target cell. For the readout of any target cell, the vertical pilot provides the maximum distortion that could be seen within the row it resides in. The horizontal pilot provides the number of ones in the target column and thus allows for the scaling of the maximum row distortion attained earlier.
Estimation process
Having the estimation parameters at hand, simulations of data points across a 256 kb memristor crossbar reveal a predominance of the distortion on the readout values. A random sample of 100 points was picked from different locations within the array to show the effect of the proposed estimation on the decoding process. The readout values of the data bits were normalized with respect to the ideal read-out value of a high bit. Fig. 7 shows the readout signals and the decoded bits prior to the application of any estimation principles. However, as shown in Fig. 8 , estimating the distortion parameter and removing its effect from the readout values significantly improves the signal range. It separates the data levels apart and results in a simple mapping to the original cell values. It also smoothes the signal and allows for setting a clear threshold for a completely accurate decoding step.
Pilot assisted readout technique
In the process of applying the pilot technique into the read operation of the memory, design considerations in terms of complexity and speed add a further dimension to the estimation problem. The readout process and the decision circuitry required for the final output result are discussed.
Pilot assisted memory readout
In a pilot-incorporated memristor crossbar, two main components must be available for the read process of any of the data bits within the array: the row and column pilots, $P_{R,i}$ and $P_{C,j}$ respectively. Considering a single row, the first element will suffer from the highest level of distortion as it lies in a column with the maximum number of high bits along the dimensions of the array. On the other hand, the remaining data cells will incur a different level of distortion that is mainly shaped by the number of high bits along their vertical lane. Thus, subtracting the actual saved value of the row pilot (a high bit) from its readout value provides the maximum level of distortion along the row it resides in. The column pilots set on the first row of the array, in turn, provide insight into the number of high bits available along the corresponding column.
As depicted in Fig. 5c, reading the target cell $S_{i,j}$ involves the following steps:
1. Read the row pilot $P_{R,i}$ and extract the row distortion.
2. Read the column pilot $P_{C,j}$ and extract the number of high bits.
3. Read the target cell $S_{i,j}$ and adjust it according to the estimated distortion.
We apply a threshold to $\hat{S}_{i,j}$, which is equal to $S_{i,j}$ minus $\hat{D}_{i,j}$, the estimated sneak path distortion.
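The three read steps and the final threshold can be walked through on a toy example. This is only a sketch under stated assumptions: the "crossbar" below is a hypothetical additive model in which the readout equals the ideal value plus a per-row distortion scaled by the fraction of ones in the column, not a circuit simulation. The array contents, the row distortions in `d_row`, the normalized levels and the 0.5 threshold are all made up for illustration.

```python
I_HIGH, I_LOW = 1.0, 0.001  # assumed normalized ideal readouts for '1' and '0'


def make_toy_reader(data, d_row):
    """Hypothetical crossbar model: ideal value plus row distortion scaled by column ones."""
    n = len(data)
    ones_per_col = [sum(row[c] for row in data) for c in range(n)]

    def read_current(i, j):
        ideal = I_HIGH if data[i][j] else I_LOW
        return ideal + d_row[i] * ones_per_col[j] / n

    return read_current


def pilot_assisted_read(read_current, i, j, alpha, beta, n_array, threshold=0.5):
    # Step 1: the row pilot sits in the all-ones column 0 and gives the row distortion.
    d_max = read_current(i, 0) - I_HIGH
    # Step 2: the column pilot sits in the all-ones row 0 and gives the ones count.
    n_col = (read_current(0, j) - beta) / alpha
    # Step 3: read the target cell and remove the estimated distortion.
    s_hat = read_current(i, j) - d_max * n_col / n_array
    return 1 if s_hat > threshold else 0


# First row and first column are pilots filled with ones.
data = [[1, 1, 1, 1],
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 1]]
d_row = [0.8, 1.2, 0.5, 2.0]          # made-up maximum distortion per row
read_current = make_toy_reader(data, d_row)
n = len(data)
alpha, beta = d_row[0] / n, I_HIGH    # in this toy model the linear fit is known exactly

decoded = [[pilot_assisted_read(read_current, i, j, alpha, beta, n) for j in range(n)]
           for i in range(n)]
naive = [[1 if read_current(i, j) > 0.5 else 0 for j in range(n)] for i in range(n)]
print("pilot-assisted decode matches data:", decoded == data)
print("fixed-threshold decode matches data:", naive == data)
```

In this toy setting the fixed threshold misreads some stored zeros as ones, while the pilot-assisted decode recovers the stored pattern. Because the toy model is exactly linear, it says nothing about the residual error in a real array.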
In principle, three reading steps are required per data cell in order to achieve complete accuracy. That mainly applies when the memory is accessed completely at random, disregarding the features available in memory, particularly the locality principle, where chunks of data are taken into the cache and continuous cell access is implemented. With such access, all cells on the same row share a single reading of the row pilot and only the column pilot is needed per cell; consequently, this reduces the readings to an average of two read steps per cell. Divergence between the pilot reading during the estimation phase and the readout values will have an impact on the decoding process. It will cause a shift in the distortion estimation that will be mapped in a similar manner to the complete array. Thus, a narrowing of the noise margin will be encountered. A way to combat this divergence effect is to use dedicated resistors for the pilot cells instead of memristors. In this way the variable effect is mitigated to a certain extent, allowing the estimation process to hold over a larger range.
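The read-count argument can be checked with simple arithmetic. The sketch below assumes that one row-pilot read is shared by all cells accessed consecutively on a row, and that each cell still needs its own column-pilot read and target read. The function name is illustrative.

```python
def average_reads_per_cell(cells_accessed_in_row):
    """Average read operations per data cell when the row pilot is read once per row access."""
    if cells_accessed_in_row < 1:
        raise ValueError("at least one cell must be accessed")
    total_reads = 1 + 2 * cells_accessed_in_row   # shared row pilot + (column pilot, target) per cell
    return total_reads / cells_accessed_in_row


for cells in (1, 8, 511):
    print(cells, round(average_reads_per_cell(cells), 3))
```

A single random access costs three reads, and the average approaches two as more cells of the same row are read in sequence.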
Digital decision circuit
Once the memory cell is read out, the sneak path parameter is estimated and the actual cell value is adjusted accordingly. The estimation, adjustment, and decision processes are all performed in the digital domain. An Analog to Digital Converter (ADC) is used to convert the readout values into digital format. The decision process is then based on adjusting the readout value by subtracting the estimated distortion, as illustrated in Eq. (6). The parameters required for the underlying operations are made available through the linear fitting and the pilot cell values. Hence, a simple arithmetic circuit composed of basic subtraction and shifting operations is required to reach the final decision on the readout value. Fig. 9 shows the corresponding circuit along with the underlying operations and parameters. Performing the processing in the digital domain avoids the analog circuit variations, such as temperature, process, and voltage, that might affect the sneak path estimation. The processing is then affected only by the quantization noise, which depends on the number of bits used. Further elaboration and analysis of the quantization and the number of bits attained are presented in light of the array size and simulations in the following section.
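A fixed-point version of the decision step might look as follows. This sketch rests on several assumptions that are not stated in the paper: an ideal uniform ADC, an array dimension that is a power of two so that the division by the array dimension becomes a right shift, and a column ones count that has already been derived from the column pilot. It also uses an integer multiply for brevity, whereas the circuit of Fig. 9 is described as using only subtraction and shifting.

```python
def adc(value, full_scale, bits):
    """Ideal uniform ADC: map [0, full_scale] to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    clipped = min(max(value, 0.0), full_scale)
    return round(clipped / full_scale * levels)


def decide(s_code, p_row_code, n_col, i_high_code, log2_n, threshold_code):
    """Integer-only decision: subtract the scaled row distortion, then threshold."""
    d_max = p_row_code - i_high_code        # maximum row distortion (subtraction)
    d_hat = (d_max * n_col) >> log2_n       # scale by n_col / N_A, with N_A = 2**log2_n
    s_hat = s_code - d_hat                  # compensated readout
    return 1 if s_hat > threshold_code else 0


# Made-up normalized readouts for a 4-wide array, digitized with an 8-bit ADC.
FULL_SCALE, BITS, LOG2_N = 4.0, 8, 2
i_high_code = adc(1.0, FULL_SCALE, BITS)
threshold_code = adc(0.5, FULL_SCALE, BITS)
p_row_code = adc(2.2, FULL_SCALE, BITS)

bit_a = decide(adc(0.601, FULL_SCALE, BITS), p_row_code, 2, i_high_code, LOG2_N, threshold_code)
bit_b = decide(adc(1.9, FULL_SCALE, BITS), p_row_code, 3, i_high_code, LOG2_N, threshold_code)
print(bit_a, bit_b)
```

The first readout (0.601) lies above the fixed threshold before compensation yet decodes as a '0' once the estimated distortion is removed, and the second decodes as a '1'.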
Simulation and results
The sneak path estimation technique has paved the way for improvement in the readout process over several array sizes. However, in order to test the validity of the proposed technique, diverse simulations and analysis were conducted. The details of the simulation setup along with results for the array simulation analysis are discussed further showing the induced margins and separation results for the readout data.
Simulation setup
To cater to the large and realistic size simulations required, a Python script was formulated that generates SPICE netlists [16]. It calls the Cadence APS iteratively and simulates the array in a circuit-based environment. The flexibility offered within the script allows for conducting a variety of tests and concept verification. The script provides the option to simulate any array length up to the levels that match the realistically available arrays. The simulations conducted varied from a 4 kb array (64 × 64 square array) up to 1 Mb. Furthermore, complete flexibility in the data input is provided through a large choice of input patterns, whether in random, checkered, or real memory dump (NIST images) format. This setup has allowed for a detailed analysis of the influencing factors, where the complete array was simulated instead of its equivalent circuit model. The memory cell was modeled as a component with the possibility of incorporating any device model, whether linear, non-linear, with an added selector, or even with an added switch. In contrast to an added selector scheme [48], where the non-selected terminals are biased to V/2, the simulations performed were based on a floating rows and columns (FRC) setting as shown in Fig. 2. The FRC option was adopted in order to target the worst case scenario for the sneak path. The FRC environment was also set to maintain the minimum level of power consumption in relation to the crossbar connectivity and the added read voltage needed to exceed the selector threshold. Moreover, within the biased scheme and despite the selector presence, the sneak path is not completely eliminated, and caution needs to be applied when choosing the selector and its underlying non-linearity [32].
Furthermore, the script accounted for the different non-idealities and parasitics available in the system, such as the crossbar resistance of the connecting wires. In the simulation platform, the worst case of the crossbar resistance is considered, $R_{cb} = 10\,\Omega$. To that end, the simulation setup provided the required testing platform for validating our proposed technique with simulations covering all the dimensions of the sneak path problem.
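The authors' script is not reproduced in the paper, so the following is only a rough sketch of how a netlist generator for a linear-device crossbar under the FRC scheme could look. The node naming, the placement of the read and sense sources, and the default read voltage are assumptions. The resistance values follow the linear device parameters and the 10 Ω wire segment quoted in the text. The simulator invocation is omitted, since the source does not show how Cadence APS is called.

```python
def crossbar_netlist(data, sel_row, sel_col, r_on=1e6, r_off=1e9, r_cb=10.0, v_read=1.0):
    """Build a generic SPICE netlist for a gateless crossbar with floating rows and columns."""
    n_rows, n_cols = len(data), len(data[0])
    lines = ["* gateless memristor crossbar, FRC read (illustrative sketch)"]
    for i in range(n_rows):
        for j in range(n_cols):
            # Cell resistor between the row-wire node and the column-wire node.
            r_cell = r_on if data[i][j] else r_off
            lines.append(f"Rcell_{i}_{j} r{i}_{j} c{i}_{j} {r_cell:g}")
            # Wire segments between neighbouring cross points.
            if j + 1 < n_cols:
                lines.append(f"Rrow_{i}_{j} r{i}_{j} r{i}_{j + 1} {r_cb:g}")
            if i + 1 < n_rows:
                lines.append(f"Rcol_{i}_{j} c{i}_{j} c{i + 1}_{j} {r_cb:g}")
    # Drive the selected row; ground the selected column through a 0 V source so that
    # its branch current can be probed. All other lines are left floating.
    lines.append(f"Vread r{sel_row}_0 0 DC {v_read:g}")
    lines.append(f"Vsense c{n_rows - 1}_{sel_col} 0 DC 0")
    lines += [".op", ".end"]
    return "\n".join(lines)


netlist = crossbar_netlist([[1, 0], [0, 1]], sel_row=0, sel_col=1)
print(netlist)
```

A non-linear device would need a behavioral element in place of the plain resistor, which this sketch does not attempt.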
Data array analysis
Within the realm of memory, different factors have an effect on the levels of distortion incurred per cell and consequently the margins of operation. In this section we investigate two main factors affecting the overall system performance, the level of the non-linearity of the memristor model and the distributions of high bits within the array.
Device non-linearity
Linear saturation memristor devices usually have a fixed value for the high and low resistance states. On the other hand, nonlinear saturation devices would have a distribution of values for the R OFF and R ON , which is primarily affected by the applied voltage across its terminals. The non-linear HP device reported in [17] has the following relation between the input voltage and output current behavior.
where $k_{on/off}$ and α stand for the ON/OFF-state constants for the memristor device and V corresponds to the input voltage. Utilizing the non-linearity within the crossbar has an improving impact on system performance. Simulations of different array sizes with random data sets were conducted for the two models of the memristor, the linear and non-linear cases. Memristors designed for memory applications are characterized by a high $R_{OFF}/R_{ON}$ ratio [16]. The parameters used for the devices are taken from [17]. The parameters for the linear device are $R_{on} = 1$ MΩ, $R_{off} = 1$ GΩ, and crossbar resistance $R_{cb} = 10$ Ω. For the non-linear device, the parameters are
As illustrated earlier, the sneak path forces the readout values for the high and low bits to have a distribution of values rather than a single quantity. For each of the high and low resistance states, the difference between the maximum readout value, $low/high_{max}$, and the minimum readout value, $low/high_{min}$, is considered its corresponding range. Thus, the resistance range is calculated as follows:
$$\mathrm{Range}_{low/high} = low/high_{max} - low/high_{min} \qquad (8)$$
To measure the effect of the non-linearity, the overlap between the distributions or ranges for the high and low resistances was recorded. It corresponds to the difference between the minimum value for the high bit and the maximum value for the low bit:
$$\mathrm{Overlap} = \mathrm{High}_{min} - \mathrm{Low}_{max} \qquad (9)$$
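The range and overlap measures can be computed directly from lists of readout values. The sketch below follows the definitions in the text; the overlap percentage uses the earlier description of the overlapping region normalized by the overall range of values. The sample readouts and function names are made up for illustration.

```python
def readout_range(values):
    """Spread of the readout values of one resistance state."""
    return max(values) - min(values)


def overlap(high_values, low_values):
    """Minimum high-bit readout minus maximum low-bit readout."""
    return min(high_values) - max(low_values)


def overlap_percentage(high_values, low_values):
    """Size of the overlapping region as a percentage of the overall range of values."""
    shared = max(0.0, max(low_values) - min(high_values))
    overall = max(high_values + low_values) - min(high_values + low_values)
    return 100.0 * shared / overall


# Made-up normalized readouts for cells storing '1' (high) and '0' (low).
high = [1.2, 1.5, 2.0]
low = [0.3, 0.9, 1.4]
print(readout_range(high), readout_range(low))
print(overlap(high, low), overlap_percentage(high, low))
```

Under this sign convention, a negative difference means the two regions intersect, and the percentage is zero once they are fully separated.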
As depicted in Fig. 10, the distribution overlap between the high and low bits is mainly affected by the size of the array. It increases dramatically with increasing sizes. The linear devices suffered from severe overlaps even at lower array sizes, reaching a complete collapse of the margins at high array sizes (1 Mb). On the other hand, the non-linear devices provided improved margins and a better performance, but without complete mitigation of the sneak path effect.
High bits distributions
Aside from the array length, the data distribution, and particularly the probability of high bits within the array, plays a significant role in the level of distortion incurred in the data cells. In that regard, abiding by the severity of the induced distortion and the practicality of the sizes under investigation, the tests were run for a 256 kb memory array with the non-linear saturation devices incorporated within the crossbar and a random data pattern. The random pattern would generally provide an equiprobable percentage for the low and high bits within the array. Tests of data patterns such as checkered, rows, columns, and interleaved were not performed, as they are rarely present in real scenarios and would not provide any added value to the system verification. Instead, building on the random nature of the sneak path rather than applying systematic or structured patterns, the percentage of the high bits within the array was varied from 10% up to 80%. These different simulations provided an added dimension in the testing and verification of the proposed scheme. As shown in Fig. 11a, the increasing percentage of high bits within the array induced larger levels of distortion and increased the overlap between the high and low bit spectra. Furthermore, the percentage of region overlap for the low bit is much more severe than for the high bit, as the readout becomes completely governed by the sneak path distortion and the actual cell data begins to lose its impact in comparison to the level of the distortion imposed. The subplot of Fig. 11a shows the discrepancy of the overlap for the high and low bits and its significance, particularly for the low bit values; it reaches up to 60% for an 80% distribution, compared with only 20% for the high bit values.
However, with the application of the pilot technique to arrays with different distribution percentages, the region overlap between the high and low bits was alleviated and the read margins were immensely improved regardless of the distribution percentage. As a result, an error-free readout operation was achieved. Fig. 11b illustrates the separated regions across the distribution percentage levels. This analysis stresses the flexibility of the pilot technique in comparison to the statistical approaches. Its accuracy is not bound by the data distribution, in contrast to [40, 42], where limitations are set on the data distribution in order to achieve the required margins.
Decision circuit design
Next, the implications for the arithmetic circuit design are discussed, primarily in terms of the quantization process and the area estimate for the digital circuitry, as these represent the two major factors quantifying the level of complexity required for the decision phase.
Quantization
The decision process is based on quantized values of the data required for the estimation, mainly the row and column pilots along with the linear fitting parameters that are made available through an offline step. The level of accuracy to be attained is determined by the number of quantization bits used. Larger array sizes incur a considerable amount of distortion and an elevated level of sneak path, leading to higher nominal values and a wider range of high and low values. Furthermore, the read margins collapse, making the difference between the readout values of the high and low bits quite small. The error percentage is calculated as the overlap region between the high and low bits over the complete range of values. Thus, the number of bits required to assure an error-free readout with completely separated regions increases with the array size. Fig. 12 shows the dependence of the error percentage on the number of quantization bits and the effect of the array size on the number of bits required to assure error-free operation. Merely 5 bits are needed for the practical array size of 256 kb, while the requirement reaches 11 bits for an array of 1 Mb. With these requirements, the pilot technique preserves the simplicity of the circuit design and shows improved performance compared to the alternative circuit technique [16], which requires a higher number of bits for a similar operation. Moreover, it allows error-free operation for larger array sizes without constrained circuit elements, exceeding the limits set by other approaches as well [40].
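The dependence of the error percentage on the number of quantization bits can be illustrated with a minimal sketch. It assumes a uniform quantizer spanning the complete range of readout values and measures the overlap in quantization codes; the function names and the sample readout values are hypothetical and are not taken from the simulations reported here.

```python
def quantize(values, n_bits, v_min, v_max):
    """Map each value in [v_min, v_max] to an integer code in [0, 2**n_bits - 1]."""
    top_code = (1 << n_bits) - 1
    return [round((v - v_min) / (v_max - v_min) * top_code) for v in values]


def error_percentage(low_codes, high_codes, n_bits):
    """Overlap region between low and high codes over the complete code range, in percent."""
    # Number of codes claimed by both regions (zero or negative when separated).
    shared = max(low_codes) - min(high_codes) + 1
    return 100.0 * max(shared, 0) / (1 << n_bits)


def min_bits_for_separation(low_vals, high_vals, max_bits=16):
    """Smallest bit count for which the quantized regions no longer overlap, else None."""
    v_min = min(low_vals + high_vals)
    v_max = max(low_vals + high_vals)
    for n_bits in range(1, max_bits + 1):
        low_codes = quantize(low_vals, n_bits, v_min, v_max)
        high_codes = quantize(high_vals, n_bits, v_min, v_max)
        if error_percentage(low_codes, high_codes, n_bits) == 0.0:
            return n_bits
    return None


# Hypothetical normalized readouts with a narrow margin between the regions.
low = [0.0, 0.31]
high = [0.33, 1.0]
print(min_bits_for_separation(low, high))
```

In this sketch, a narrower margin relative to the full range pushes the required bit count up, which mirrors the trend described for larger arrays.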
Area estimate
With the digital circuit mainly composed of simple arithmetic operations and subtractors, its contribution to the total area is minimal considering the level of scaling achieved in CMOS technology [5]. On the other hand, the analog-to-digital converter (ADC) required for the data conversion and the proper detection process should be further analyzed, as its size depends on the frequency of operation along with the number of bits used. As established previously, 5 bits are needed for the anticipated operation of the circuit for a 256 kb array. The state-of-the-art 5-bit interleaved SAR ADC operating at 250 MS/s in 65 nm technology [49] is reported to have an estimated area of 5 mm². This estimate could be further reduced with the frequency scaling required to match the operation of the Redox memory, which has a delay of less than 50 ns mapping to 100 MS/s frequency operation according to the International Technology Roadmap for Semiconductors (ITRS) report [5]. As per the area model equations in [50], the estimated scaled area reaches $1.56 \times 10^{-2}$ mm², which consumes less than 0.01% of a 1 Gb DDRAM of die area 102 mm² [51]. The proposed digital circuitry therefore adds a negligible overhead to the die area. It consequently preserves the energy efficiency of the crossbar functionality as well as the relaxed design complexity.
NIST data simulations
Testing the validity of the pilot technique in real-case scenarios called for the use of NIST data images [43] that correspond to real memory dumps. To that end, simulations of a 256 kb crossbar array filled with NIST data images show a significant impact of the sneak path, resulting in a massive shift of the low bit region into the region of the high bits, with a large overlap between the low and high bit values. Fig. 13a illustrates the merged regions for the high and low bits prior to any estimation. However, once the reading technique was applied with the underlying estimation principles, the distorting effect of the sneak path was immensely reduced, resulting in a complete separation of the high and low bit distributions. This allowed a threshold to be set to distinguish between the high and low bits, achieving an error-free reading. Fig. 13b shows the resultant readings after the application of the sneak path estimation, where the data sets are divided with a clear indication of a threshold. The enhancement attained with the pilot technique alleviated the sneak path effects and paved the way for accurate decoding across different data sets, while also accounting for non-idealities within the system such as the interconnect resistance. More than 115% enhancement in the read margin was achieved at the mere expense of two reserved lines, a negligible density loss compared to the levels proposed in [37, 38].
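The threshold decision on separated regions can be sketched as follows. This is a minimal illustration that assumes compensated readout values are already available and places the threshold midway between the two regions; the midpoint choice, the function names, and the sample values are assumptions made for illustration, not details specified in the text.

```python
def midpoint_threshold(low_vals, high_vals):
    """Return a threshold midway between the two regions, or None if they overlap."""
    top_low = max(low_vals)
    bottom_high = min(high_vals)
    if top_low >= bottom_high:
        return None  # merged regions: no single threshold separates them
    return (top_low + bottom_high) / 2


def decode(readouts, threshold):
    """Classify each readout as a high bit (1) or a low bit (0)."""
    return [1 if value > threshold else 0 for value in readouts]


# Hypothetical compensated readouts (arbitrary units).
low_region = [0.25, 0.5]
high_region = [1.0, 1.5]
threshold = midpoint_threshold(low_region, high_region)
print(threshold, decode([0.3, 1.2, 0.5, 1.0], threshold))
```

When the regions are merged, as in the situation before estimation, the function returns None, reflecting that no error-free threshold exists.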
Discussion
In an overall comparison with the existing techniques, the pilot technique provides a compromise among the performance metrics. Table 1 holds the quantitative analysis of the area, power and speed of operation for several techniques along with this work. The density loss is calculated as the number of non-data bits divided by the total number of bits in an N × N memory array. The non-data bits are defined as the bits used for pilots, isolation points, or the added lines used to short the rows/columns. The techniques are split mainly according to the basic principle of leakage mitigation, which depends either on the data distribution or on the circuit structure. The circuit-oriented approaches place no limitation on the data set to be saved within the array, thus allowing for any type of memory state, whether in a static or dynamic environment, and any type of architecture, be it a flat or a hierarchical model. However, these features come at the expense of the speed metric, where three [16] or six [17] reading steps are required, with varying power consumption levels as well. Density is also affected in alternative circuit-based approaches [37, 38], where inserted isolating points or complementary switches occupy a considerable portion of the array in order to limit the effect of the sneak path without targeting any estimation or decoding operations. On the other hand, the data analysis techniques try to alleviate the distortion imposed by the sneak path by enforcing some limitations on the distribution of the data within the array [40, 42]. They approach the arrays from a probabilistic and distribution perspective and enhance the accuracy of the read values, with a clear compromise of the capacity and a dependency on the distribution of high and low bits within the array. Further to that, the complexity of the estimation process and the decision phase rises with the desired level of precision.
Table 1
Performance metrics for reading a complete row in an N × N array. The density loss is calculated as (number of non-data bits)/(total number of bits in the memory for an N × N array).
Despite its data analysis nature, the pilot technique still benefits from the advantages of the circuit approach. It continues to handle the array with the floating rows and columns strategy, providing the minimum level of power consumption for structures that do not involve any fabrication or isolation points as proposed in [38]. Using pilots also imposes no constraint or limitation on the data and its distribution within the array, providing a high density measure. These features are offered with no added complexity or circuit modification, and preserve the simplicity of the decision stage in a manner comparable to the circuit technique. Furthermore, and with significant impact, the pilot-based technique allows for approximately a single read in slowly varying environments, or a level of speed similar to that of the improved circuit techniques but at a lower power cost. The sequential reading operation makes it challenging to cope with circuit techniques, where parallel reads are easily feasible but at a much higher energy cost. In the pilot technique, a parallel read is also applicable, with the pilots read first and followed by the actual cells. However, it is a matter of resource allocation and of compromise between the performance metrics sought in the application at hand. From that perspective, the pilots provide a competitive solution that captures the essence of the noise mechanism within the array and allows for its simple extraction and elimination, all while maintaining the accuracy and an overall speed improvement over comparable non-intrusive approaches. The primary parameters affecting the feasibility and optimality of the proposed solution are the capacity of the array, in terms of its information storage capability, and the bandwidth, or speed, of the readout operation.
These metrics are optimized bearing in mind the simplicity of the decision circuitry and the estimation technique. This applies in particular to the performance analysis of the overall two-dimensional pilot setting, where the predetermined set of allocated values immensely improves the accuracy of the readout process. As only two lines of data are reserved for the pilots, the density loss decreases significantly with increasing array size, reaching less than 0.05% in the range of array sizes used for our simulations. For a square array with dimensions (n, n), the array density is estimated by $n^2 - (2n - 1)$, as only two lines are reserved for the pilots. It is at a similar level to the circuit approaches that are optimal in terms of density. It also outperforms the data analytic techniques by relieving the constraints that limit the capacity for the sake of improved performance, where limitations up to $n \log_2 n$ were reported in [42] and accuracy-bound array sizes of up to 16 kb in [40]. It provides the added advantage of a reduced read delay with no architectural modification or limitation of the information capacity.
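The density loss definition used in this section (non-data bits over total bits) can be expressed as a short sketch for the pilot setting. It assumes that the two reserved lines are one pilot row and one pilot column sharing a corner cell, so that 2n - 1 cells of an n × n array hold no data; this cell count and the function name are assumptions made for illustration.

```python
def pilot_density_loss(n):
    """Fraction of non-data cells in an n x n array with one pilot row and one pilot column."""
    if n < 2:
        raise ValueError("array must be at least 2 x 2")
    non_data = 2 * n - 1  # the two pilot lines share one corner cell (assumption)
    return non_data / (n * n)


# The loss shrinks roughly as 2/n, so larger arrays pay a smaller relative cost.
for n in (64, 256, 1024):
    print(n, pilot_density_loss(n))
```

Because the number of pilot cells grows linearly with n while the array grows quadratically, the relative loss falls as the array size increases.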
Conclusion
An estimation scheme for the sneak path was presented in the context of the passive memristor crossbar. It builds on detection theory principles with signal extraction to improve the decoding process. Pilots were introduced into the array at the primary vertical and horizontal lanes as a two-dimensional setting to benefit from the decomposition of the distortion parameters, to serve as the extraction window for the data cells, and to allow for an accurate reading operation. An a priori offline measurement paved the way for a comprehensive solution that accommodates the contrasting design challenges and allows for better performance compared to circuit-based and analytic techniques.
