A series of breakthroughs in memristive devices has demonstrated the potential of crossbar-based memristor arrays as ultra-high-density and low-power memory. However, their unique device characteristics can cause data disturbance during both read and write operations, resulting in serious data reliability problems.
INTRODUCTION
The evolutionary improvement of current memory technologies cannot keep up with the fast-growing demand for denser, lower-power, and higher-bandwidth memories. In traditional transistor-based memories, high leakage current is becoming a major concern, and imprecision in the fabrication process is reducing the yield to an alarming level as the technology feature size continues to shrink [3]. To address such problems, several new memory technologies have been proposed. Redox-Based Resistive Switching Memories [36], Phase Change Memories [13], and Spin-Transfer Torque Magneto-resistive Memories [28] are some of the emerging technologies that could serve as next-generation memories for various applications. Among these candidates, metal oxide valence change ReRAMs (more generally referred to as memristors [8]) are especially promising due to their excellent scaling prospects, high endurance, and high speed, which can also be combined with great retention [2, 18].
A memristor is a passive non-linear resistive device whose resistance depends on the time integral of the current flowing through its terminals. Hence, it retains its resistance in the absence of electrical current, which makes it suitable as a non-volatile memory element. The theoretical foundation of memristors goes back to 1971, when L. Chua predicted the existence of such a device [7]. However, it took researchers decades to unify the theory with experimental observations [37].
There have been very active research efforts recently, in both industry and academia, on various aspects of memristors [1, 5, 17, 14, 23]. Memristive devices with a feature size of 15nm have been fabricated in academia [20] to form a crossbar-based memory array [21]. A crossbar architecture is used due to its high density and regularity. Feature sizes as small as 3nm are anticipated to be feasible [29] thanks to a simpler and imprecision-resistant fabrication process [32]. The fabrication process is CMOS-friendly [24], and efficient methods exist to stack layers of such memories [10], which facilitates the integration of 3D memristive memories with CMOS computing cores and decoding logic. Estimates as well as preliminary experimental measurements of their power consumption show considerable improvement over existing technologies [19, 2], as maintaining the data stored in memory does not consume any power, and there is no active leakage current (as memristors are two-terminal passive elements). Reported experimental data show very fast write operations [15], while the speed of a read operation is limited by that of its CMOS sensing circuitry. All these characteristics make memristive memories ideal for integration with computing cores as an extremely dense and low-power on-chip non-volatile memory in the near future [6, 34].
However, some intrinsic characteristics of memristive memories result in data reliability issues when memristors are used to form a crossbar-based memory. One issue with such memories is an undesired coupling effect whereby writing into one memristor may affect the data in several other memristors sharing the same word- and/or bit-lines. This effect is referred to as write disturbance [9, 21]. Moreover, as the resistance of a memristor is current-history-dependent, reading its resistance value by applying a read voltage across the memristor and measuring the resulting current can slightly change the strength of the stored data [30]. This effect is referred to as read disturbance. Both effects can accumulate over a series of read/write operations, which can result in data corruption and degrade the reliability of memory data. Thus, these issues must be addressed before memristive memories can serve as system memories.
In this paper, we first describe the data reliability issues of memristive memories in detail, and then present a comprehensive solution to address them. Our proposal is based on restraining the write disturbance effect, detecting data corruption by adding redundancy, and restoring/refreshing the disturbed data before it is corrupted. We then evaluate the cost of the proposed solution in terms of the area, performance, and energy overheads.
Paper 14.3 978-1-4799-0859-2/13/$31.00 © 2013 IEEE
The main contribution of this work is that it solves the data reliability problems of crossbar-based memristive memories. In addition, the proposed solution achieves the following goals:
• Incurring low area, performance, and energy overheads.
• Using only standard memristive elements without adding any special elements, thus preserving the regularity and the scalability to achieve high-density memristor arrays.
The rest of the paper is organized as follows: Section 2 provides the necessary background on memristors and memristive memory architectures. Section 3 describes the data reliability issues of memristive memories in detail. Section 4 presents our proposed solution, followed by the experimental results in Section 5, where we also elaborate on a comprehensive electrical model for the memristor crossbar array that is used to analyze the performance and energy overheads of our solution. Section 6 concludes the paper.
BACKGROUND ON MEMRISTORS

Device Physics
Figure 1 shows one possible realization of memristors. A simple memristor consists of three layers: two metallic electrodes, such as Pt, on top and bottom, and a doped thin film, such as TiO2, in between.
In the initial state, a filament of conductive TiO2−x is formed in the non-conductive TiO2 film in an irreversible forming step [16]. However, the filament does not connect the two electrodes, so the device is in a High Resistance State (HRS). To turn the device ON, a sufficiently high positive voltage is applied across its electrodes. This makes the filament connected to the top electrode attract positively charged vacancies in the oxide. This effectively grows the filament, as the vacancies drift in the applied electric field along the most favorable diffusion paths and form a channel between the two electrodes [16]. Once such highly conductive channels are formed, the device is in the Low Resistance State (LRS) and is considered ON.
To switch the device to the high resistance OFF state, a voltage with the opposite polarity should be applied on the electrodes. This repels away the vacancies that formed the conductive channel, thus shifting the device back to its high resistance state.
The state of the device, and thus its resistance, changes only when an electric current passes through the device, and this change is continuous between two extremes: RHRS and RLRS. The change can be modeled as a function of the time integral of the current. RHRS and RLRS depend on the initial filament and are set in the forming step. However, forming-free devices are also under research [12].
Data Storage
In order to use memristors as memory elements to store binary data, the possible resistance range of the memristor is divided into three regions, as illustrated in Figure 2 . Lower resistances are considered as logic 1 and higher resistances are considered as logic 0. Any resistance that falls in the marginal region in between is considered as unknown to ensure accurate distinction of logic 0 and 1.
Throughout this paper, the term value is used to refer to a memristor's resistance value, while the actual binary data is referred to as data. Moreover, the term LRS (HRS) is generally used to refer to the range of resistance values representing logic 1 (logic 0). 
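The three-region encoding can be sketched as a small classification function. This is a minimal illustration only: the boundaries `r_low_max` and `r_high_min` are hypothetical parameters, since the text does not give numeric region limits.

```python
from typing import Optional


def classify(resistance: float, r_low_max: float, r_high_min: float) -> Optional[int]:
    """Map a memristor's resistance value to its binary data.

    r_low_max:  hypothetical upper bound of the logic-1 (LRS) region
    r_high_min: hypothetical lower bound of the logic-0 (HRS) region
    Values strictly between the two bounds fall in the marginal region.
    """
    if not r_low_max < r_high_min:
        raise ValueError("the LRS region must lie below the HRS region")
    if resistance <= r_low_max:
        return 1        # low resistance -> logic 1
    if resistance >= r_high_min:
        return 0        # high resistance -> logic 0
    return None         # marginal region -> unknown


# Hypothetical range R_LRS = 1 kOhm .. R_HRS = 100 kOhm split into equal thirds.
print(classify(5e3, 34e3, 67e3), classify(50e3, 34e3, 67e3), classify(90e3, 34e3, 67e3))
```

Returning `None` for the marginal region keeps "unknown" distinct from both logic values, which is what later allows a weakened cell to be flagged before it flips.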
Read and Write Operations
To write binary data into a memristor, a proper write voltage (Vw) pulse of width twrite is applied across the device to set its resistance to the desired value. Vw and twrite are chosen so that the write pulse can completely shift the memristor's resistance to RHRS or RLRS, depending on the polarity: a negative pulse shifts the resistance toward RHRS and a positive pulse shifts it toward RLRS.
The read operation decides if the resistance is in LRS or HRS. To do so, a read voltage (Vr) is applied across the memristor terminals. This results in the injection of a current through the memristor, the magnitude of which depends on the memristor's resistance. The stored data can be read by measuring this current.
I-V Characteristics
Without loss of generality, Figure 3 can be used as a model of a memristor's I-V characteristics [35] , based on which the following three regions can be defined:
• Diode Voltage Region (DVR): Applying a small voltage across the memristor terminals would not generate any noticeable current, and ideally would not change the device resistance. For example, this could be due to integrated Metal-Insulator-Metal (MIM) structure in series with the memristive layer. For a relatively small applied voltage, the bias would drop mostly across the MIM layer resulting in a negligible change of the resistance in the memristor.
However, as the resulting current is negligible regardless of the memristor resistance, an applied voltage in this region cannot determine the stored data. This kind of diode behavior has been further strengthened by the introduction of complementary resistive switches [11] .
• Read Voltage Region (RVR): As the applied voltage rises above a certain threshold, the resulting current starts to increase considerably. This current is still small and only slightly changes the resistance of the device, but it is large enough to differentiate between a memristor in HRS and one in LRS and thus determine the stored data. Voltages in this range (either negative or positive) can be used for read operations.
• Write Voltage Region (WVR): By further increasing the applied voltage, the resulting current increases even further, having exponentially higher altering effect on the resistance of the memristor. This is due to the highly nonlinear kinetics typically associated with truly non-volatile memristors [18] . Such voltages can effectively change the memristor's state from LRS to HRS (or vice versa, depending on the polarity of the applied voltage), and are used to write data into a memristor.
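The three regions can be captured by two magnitude thresholds. The sketch below is illustrative: `v_read_min` and `v_write_min` are hypothetical parameters, and the regions are assumed to be symmetric in polarity (the text notes that both polarities are usable for reads and writes, but Figure 3 is only a model).

```python
def voltage_region(v: float, v_read_min: float, v_write_min: float) -> str:
    """Classify an applied voltage as DVR, RVR, or WVR.

    v_read_min:  hypothetical |V| above which the current becomes noticeable
    v_write_min: hypothetical |V| above which the resistance changes rapidly
    The regions are assumed symmetric in polarity, so only |v| is used.
    """
    if not 0 < v_read_min < v_write_min:
        raise ValueError("expected 0 < v_read_min < v_write_min")
    magnitude = abs(v)
    if magnitude < v_read_min:
        return "DVR"    # negligible current, cannot sense the stored data
    if magnitude < v_write_min:
        return "RVR"    # readable current, small resistance change
    return "WVR"        # large current, switches the device


for v in (0.2, -0.7, 1.2):
    print(v, voltage_region(v, v_read_min=0.4, v_write_min=1.0))
```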
Memory Architecture
Different architectures have been proposed to utilize memristors to form a memory array. The most popular architecture is the crossbar organization, shown in Figure 4 , which consists of two perpendicular layers of parallel nanowires forming a memristor at each cross section. The memory controller and address decoding circuitry are implemented in a peripheral CMOS subsystem [21] .
However, the simple crossbar architecture encounters some scalability limitations: (1) the voltage drop along long nanowires can prevent effective application of read or write voltages at the desired cross-point, and (2) noise margin requirements impose an upper limit on the number of cross-points on each nanowire. To address such limitations, the authors of [33] proposed an innovative crossbar-based architecture called CMOL, which segments the crossbar nanowires, thus limiting their length and the number of cross-points per nanowire while preserving the cross-point density.
In this paper we use the simple crossbar architecture as the underlying memory architecture to illustrate the problem and our solution for the following reasons:
• It is easier to explain the concept using the simple crossbar architecture, without loss of generality.
• Currently functional memristive memories are built in the form of a simple crossbar [21] .
• The proposed solution can also be generalized to architectures such as [33] , which are variants of the simple crossbar architecture, with minor modifications.
In this paper, we assume an n-by-n crossbar memory, where Mij refers to the memristor at the cross-point of the i-th row (also referred to as the word-line) and the j-th column (also referred to as the bit-line).
Read and Write Operations in Crossbar
To write data into Mij, a sufficiently wide pulse with an amplitude Vw in the WVR is applied across its terminals. Among the several methods proposed to apply Vw to the memristor [22, 23], the least intrusive one applies Vw/2 on the i-th word-line and −Vw/2 on the j-th bit-line, and grounds all other word- and bit-lines.
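The V/2 scheme can be made concrete by computing the voltage seen by every cell of the array. In this sketch the sign convention (cell voltage = word-line voltage minus bit-line voltage) is an assumption; it shows that the target receives the full Vw while the 2(n−1) line-shared cells receive Vw/2.

```python
from typing import List


def half_select_voltages(n: int, i: int, j: int, vw: float) -> List[List[float]]:
    """Voltage across every cell of an n-by-n crossbar when Mij is written.

    Applies +Vw/2 to word-line i, -Vw/2 to bit-line j, and grounds all other
    lines. The cell voltage is taken as V(word-line) - V(bit-line)
    (assumed sign convention).
    """
    word = [0.0] * n
    bit = [0.0] * n
    word[i] = vw / 2
    bit[j] = -vw / 2
    return [[word[r] - bit[c] for c in range(n)] for r in range(n)]


v = half_select_voltages(n=4, i=1, j=2, vw=2.0)
for row in v:
    print(row)
```

All cells off the selected row and column see 0 V, so only the line-shared cells are exposed to a partial voltage.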
For a read operation, a sufficiently wide pulse of amplitude Vr in the RVR is applied across the memristor in a similar way. A sensing and comparing (S&C) circuit is then used to read the resistance value; one possible implementation is shown in Figure 5. Because the current through a transistor is a function of its gate-source voltage, the diode-connected transistor T in the S&C circuit makes its gate voltage (Vout) track its current, essentially converting the current to a voltage. Since the transistor's and the memristor's currents are identical and depend on the memristor's resistance, Vout reflects the memristor's resistance value.
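A first-order model of the current-to-voltage conversion can be sketched with the long-channel square law. This is not the circuit of Figure 5: the transconductance parameter `k`, threshold `v_th`, and reference `v_ref` are hypothetical, and the voltage drop across the sensing transistor is neglected so that the memristor current is simply Vr/R.

```python
import math
from typing import Tuple


def sense_and_compare(r_mem: float, v_read: float, v_ref: float,
                      k: float = 2e-3, v_th: float = 0.4) -> Tuple[float, int]:
    """Sketch of the S&C read path; all parameter values are hypothetical.

    Assumptions: the memristor current is approximately v_read / r_mem, and
    the diode-connected transistor obeys I = (k / 2) * (V_out - v_th) ** 2,
    with k = mu * Cox * W / L. A diode-connected device is always saturated.
    """
    current = v_read / r_mem
    v_out = v_th + math.sqrt(2.0 * current / k)   # current-to-voltage conversion
    bit = 1 if v_out > v_ref else 0               # low R -> high I -> high V_out -> logic 1
    return v_out, bit


for r in (10e3, 100e3):
    print(r, sense_and_compare(r, v_read=0.5, v_ref=0.55))
```

The square root compresses the current range, so the choice of `v_ref` between the LRS and HRS output levels sets the read margin.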
DATA RELIABILITY ISSUES IN MEMRISTIVE MEMORIES
Read Disturbance
As current flows through the device during a read, the device's resistance might change slightly. This read disturbance effect mostly affects the target memristor rather than the other memristors in the crossbar array. The read voltage Vr is chosen to meet two criteria: (1) it lies within the RVR, and (2) Vr/2, the voltage applied to other memristors that share the same word- or bit-line with the target memristor, falls within the DVR. The second criterion ensures that the side effect of the read operation on the other line-shared memristors is negligible.
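The two criteria together bound the usable read voltage. A minimal check, assuming hypothetical region thresholds `v_read_min` and `v_write_min` (the text gives no numbers):

```python
def read_voltage_ok(vr: float, v_read_min: float, v_write_min: float) -> bool:
    """Check the two criteria for the read voltage Vr.

    (1) |Vr| lies in the RVR:  v_read_min <= |Vr| < v_write_min
    (2) |Vr|/2, seen by line-shared cells, lies in the DVR:  |Vr|/2 < v_read_min
    The two thresholds are hypothetical device parameters.
    """
    magnitude = abs(vr)
    in_rvr = v_read_min <= magnitude < v_write_min
    half_in_dvr = magnitude / 2 < v_read_min
    return in_rvr and half_in_dvr


# With hypothetical thresholds 0.4 V and 1.0 V, valid Vr lies in [0.4, 0.8).
for vr in (0.3, 0.6, 0.9):
    print(vr, read_voltage_ok(vr, 0.4, 1.0))
```

Criterion (2) caps |Vr| below twice the DVR limit, so the feasible window is [v_read_min, min(2·v_read_min, v_write_min)).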
It should be noted that not every read operation is disturbing. For a given polarity of Vr, only one of the two logic values will be disturbed. For example, assume a positive Vr is applied for the read operation. If the stored data is logic 0 (i.e. its resistance value is within HRS), the resulting current will slightly shift its resistance toward RLRS, making the stored data a weaker 0. However, if the stored data is logic 1 (i.e. LRS) the read current will have a healing effect, by shifting its resistance toward RLRS, thus making the stored data a stronger 1.
Few articles in the literature have addressed the read disturbance problem. In [30], the authors propose more complex read pulses consisting of alternating Vr and −Vr pulses to compensate for the destructive effect of the read operation. The opposite-polarity pulse appended to the original read pulse heals the destructive effect of the original read. However, this method unnecessarily doubles the read time and energy, as not all read operations are disturbing. The incurred overhead is particularly expensive because read operations are more time-consuming than write operations in memristive memories.
In Section 4, we propose two restoring schemes to address this issue. Our solution reduces the energy overhead of a reliable read operation by triggering data restoration only for disturbing reads, and it expedites data restoration by utilizing voltages that already exist in the system.
Write Disturbance
Applying Vw/2 and −Vw/2 on the word- and bit-lines, respectively, to write data into a memristor has an undesired side effect: a voltage of Vw/2 (which falls within the RVR) is also applied to all memristors that share either the word-line or the bit-line with the memristor under write, which can slightly change their resistances. This effect can be disturbing or healing depending on the written logic and the logic stored in the line-shared memristors: if memristor M stores a logic 0, writing a logic 1 (logic 0) into one of its line-shared memristors shifts M's resistance toward logic 1 (logic 0), weakening (strengthening) the stored logic. The same happens to every other memristor that shares a line with the memristor under write and stores a logic 0. Note that the write disturbance problem is harder to deal with than read disturbance because of its broad impact. Figure 6 illustrates the effect of write disturbance on the other line-shared memristors.
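The disturb/heal rule can be expressed as a split of the line-shared cells. The sketch below assumes the symmetric V/2 scheme and a memory represented as a list of lists of 0/1 values; the function name is hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def write_side_effects(mem: List[List[int]], i: int, j: int,
                       bit: int) -> Tuple[List[Cell], List[Cell]]:
    """Split the line-shared cells of mem[i][j] into disturbed and healed sets.

    Models the symmetric V/2 scheme: every cell sharing word-line i or
    bit-line j sees the partial write voltage. Cells storing the opposite
    value of `bit` are weakened; cells storing `bit` are pushed further
    toward their own extreme (healed, or unchanged if already there).
    """
    rows, cols = len(mem), len(mem[0])
    shared = [(i, c) for c in range(cols) if c != j]
    shared += [(r, j) for r in range(rows) if r != i]
    disturbed = [cell for cell in shared if mem[cell[0]][cell[1]] != bit]
    healed = [cell for cell in shared if mem[cell[0]][cell[1]] == bit]
    return disturbed, healed


mem = [[0, 0, 1],
       [1, 0, 0],
       [0, 1, 1]]
print(write_side_effects(mem, i=1, j=1, bit=1))
```

Because both the row and the column are exposed, a single write can weaken up to 2(n−1) cells, which is why the disturbance is harder to contain than read disturbance.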
One solution is to add a switch (i.e., a transistor) to each memristor so that it can be isolated from the rest of the memory array [26] (referred to as the 1T-1M technique), thus avoiding the destructive effect on the line-shared memristors. However, due to the integration of the transistors, this technique faces the same technology scaling limitations as other transistor-based memories.
In Section 4, we propose a solution to this problem by having additional ordinary memristors with known data content in the memristor layer. These extra memristors are used as references for detecting possible data corruption. This solution, to the best of our knowledge, is the first solution to the write disturbance problem that preserves the scalability advantages of the memristor technology.
Disturbance Accumulation
The data reliability problem arises from the fact that the effects of read and write disturbances are cumulative and can eventually lead to data corruption; that is, a memristor's resistance can be shifted into the unknown region or even into the opposite logic region. Figure 7 illustrates the disturbance accumulation after a sequence of write operations.
Figure 6: Write Disturbance. Vw is applied between times t1 and t2 to set the memristor-under-write to logic 1 (LRS). Black (white) memristors are in HRS (LRS). The data is written into the target memristor correctly (top-right). The white memristors sharing the same row or column are either not affected (top-left) or slightly healed (bottom-left), but the black ones are slightly disturbed (bottom-right).
ADDRESSING MEMRISTOR DATA RELIABILITY
Read and write disturbances are intrinsic features of memristors which, if not addressed, will result in frequent data errors that cannot be handled by system-level solutions such as Error Correction Codes (ECC) alone. Here we prevent, detect, and resolve data errors caused by such disturbances with a circuit- and architecture-level solution. ECC can still be used in conjunction with our method to provide additional protection.
Read-Restore solution for Read Disturbance
The read-restore mechanism in [30] can be made more energy-efficient by detecting destructive reads: only in the case of a destructive read is the read operation extended by applying a voltage of the opposite polarity to heal the disturbance. That is, if the original read uses Vr (−Vr) and causes disturbance, the value is restored by applying −Vr (Vr). Note that after the original read operation, both the stored data and the polarity of Vr are known, so it is known whether the read operation was destructive. The peripheral memory controller circuitry is extended to differentiate a disturbing read from a non-disturbing one. Moreover, during restoration, the power-hungry S&C circuitry is turned off, which helps minimize the energy overhead.
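The controller decision can be sketched as a function of the stored bit and the read polarity, both known once the read completes. The function name and signature are hypothetical; `v_restore` is Vr for the baseline scheme, or Vw for the accelerated variant described next (whose shorter pulse width is not modeled here).

```python
from typing import Optional


def restore_pulse(stored_bit: int, read_polarity: int, v_restore: float) -> Optional[float]:
    """Return the signed restoring pulse amplitude, or None if no restore is needed.

    Convention from the text: a positive pulse pushes the resistance toward
    R_LRS (logic 1). Hence a positive read weakens a stored 0, a negative read
    weakens a stored 1, and the other two cases heal the stored value.
    """
    if stored_bit not in (0, 1) or read_polarity not in (1, -1):
        raise ValueError("stored_bit must be 0/1 and read_polarity +1/-1")
    disturbing = (read_polarity == 1) == (stored_bit == 0)
    if not disturbing:
        return None                        # healing read: skip restore, save energy
    return -read_polarity * v_restore      # opposite polarity heals the disturbance


print(restore_pulse(0, +1, 0.6), restore_pulse(1, +1, 0.6))
```

Only two of the four (bit, polarity) combinations trigger a restore, which is where the energy saving over unconditional alternating pulses comes from.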
However, restoring by applying Vr roughly doubles the read time, because the restoring pulse with the opposite polarity needs to have the same pulse width as the original read pulse in order to recover the disturbing effect.
To accelerate the restoring process, we propose applying the larger voltage Vw instead. This can speed up the restoring operation by one order of magnitude, since the higher Vw heals the degraded data faster. Moreover, since Vw is already available for the write operation, no extra voltage sources are needed to implement this method.
The potential problem with restoring by Vw is the write disturbance effect on other line-shared cells. However, our write disturbance solution, described in the next subsection, resolves this side effect and makes it possible to use Vw for data restoration. Note that a restorative Vw pulse is shorter than a normal write Vw pulse, so its negative effect is also less significant.
Applying the higher restorative voltage Vw incurs higher energy consumption because (1) a larger current passes through the target memristor, and (2) other line-shared memristors experience the higher partial voltage Vw/2, which lies in the RVR and thus generates more current. The energy and performance trade-offs of this method are illustrated in Section 5.
Redundancy-based Corruption Detection for Write Disturbance
Write disturbance affects all memristors sharing the same word- or bit-line with the target memristor. Our solution addresses the problem by employing the following principles: (1) limiting the disturbance to only those memristors sharing the same word-line with the target, (2) adding the capability of detecting accumulated disturbance, and (3) refreshing the disturbed data before it is corrupted.
Both word- and bit-lines are affected by the write operation because of the common assumption of applying symmetric voltages: +Vw/2 on the word-line and −Vw/2 on the bit-line. To confine the set of disturbed memristors, we propose an asymmetric application of Vw, i.e., applying a higher absolute voltage on the word-line and a lower voltage, which falls within the DVR of the memristors, on the bit-line. This makes the write disturbance effect easier to address: it protects the memristors on the bit-line from write disturbance at the cost of a more destructive effect on the word-line-shared memristors. For asymmetric voltage application, we propose applying 2Vw/3 on the word-line and −Vw/3 on the bit-line, where we assume:
$$V_r = \frac{2}{3} V_w \qquad (1)$$
This offers several advantages: (1) the bit-line-shared memristors will not experience write disturbance, as Vw/3 (i.e., Vr/2) is always within the DVR; (2) the voltage applied to the word-line-shared memristors equals Vr, making it possible to read the other cells on the same word-line while the target memristor is being written, simply by enabling their S&C (sensing and comparing) circuitry; and (3) the number of required voltage levels remains the same (i.e., {±2Vw/3, ±Vw/3, GND} instead of {±Vw/2, ±Vr/2, GND}). Note that while other asymmetric voltage applications are also feasible (e.g., ±3Vw/4 and ∓Vw/4), they would increase the number of required voltage levels.
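The three advantages can be checked with exact rational arithmetic. This sketch assumes Equation (1), Vr = 2Vw/3, normalizes Vw to 1, and takes the cell voltage as word-line voltage minus bit-line voltage (an assumed sign convention).

```python
from fractions import Fraction


def asymmetric_write_voltages(vw: Fraction) -> dict:
    """Cell voltages when 2Vw/3 drives the selected word-line, -Vw/3 drives the
    selected bit-line, and all other lines are grounded.
    Cell voltage is taken as V(word-line) - V(bit-line) (assumed convention)."""
    v_word = Fraction(2, 3) * vw
    v_bit = -Fraction(1, 3) * vw
    return {
        "target": v_word - v_bit,           # full Vw: the cell is written
        "word_line_shared": v_word - 0,     # 2Vw/3
        "bit_line_shared": 0 - v_bit,       # Vw/3
        "unselected": Fraction(0),
    }


vw = Fraction(1)                 # normalized write voltage
vr = Fraction(2, 3) * vw         # assumption (1): Vr = 2Vw/3
v = asymmetric_write_voltages(vw)
print(v["word_line_shared"] == vr, v["bit_line_shared"] == vr / 2)

symmetric_levels = {vw / 2, -vw / 2, vr / 2, -vr / 2, Fraction(0)}
asymmetric_levels = {2 * vw / 3, -2 * vw / 3, vw / 3, -vw / 3, Fraction(0)}
print(len(symmetric_levels), len(asymmetric_levels))
```

Using `Fraction` avoids floating-point round-off, so equalities such as "word-line-shared voltage equals Vr" hold exactly.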
In the next step, we add the capability of detecting data corruption before the resistance change accumulates to the level of moving the memristor to the unknown state. The key difficulty for such detection is that if the correct data stored in the memristor is unknown, it is not possible to distinguish a weakened but correct data from an already corrupted (inverted) data.
To address this challenge, we propose adding an always-1 (A1) and an always-0 (A0) memristor to each word-line, as shown in Figure 8, to facilitate the detection of data corruption. These bits are ordinary memristors, initially set to their corresponding states (LRS for A1 and HRS for A0). The only difference is that the user does not have write access to these cells, which can be ensured by a proper decoder design. Having such bits on the word-line has two useful properties: (1) as their correct binary data is always known, detecting data corruption in them becomes feasible, and (2) a write operation disturbs them in the same way as it disturbs other memristors on the same word-line. Because they are never written through standard memory accesses, they experience the worst-case accumulated disturbance among all cells on the same word-line; other cells may have been written by write accesses, which offset their accumulated disturbance. This means that the A0 (A1) cell always holds the weakest 0 (1) on its word-line. Thus, as long as the resistance values of the A0/A1 cells stay within the correct range, which can be ensured by continuously monitoring them, the integrity of the data stored in the other cells on the same word-line is guaranteed. Figure 7 illustrates the idea.
Figure (word-line cells A0, A1, D0 to D3 across successive write operations): It can be observed that A1 has the worst-case disturbance among the 1-bits on the word-line, since other bits are either equally or less disturbed.
Paper 14.3 INTERNATIONAL TEST CONFERENCE
Because the asymmetric write scheme applies 2Vw/3, that is Vr, to the other memristors on the word-line, the values of those cells, including the A0 and A1 cells, can be read and monitored simultaneously in every write cycle using the same Sense and Compare (S&C) circuitry shown in Figure 5.
Note that the A0/A1 bits are intended to detect a potential corruption before it actually happens, in order to trigger the refreshing mechanism. Hence, the reference voltages (Vref) of the S&C circuitry on the A0 and A1 bit-lines are chosen so that the comparator output is asserted close to, but before, the corruption. When this happens, the close-to-corruption logic value must be refreshed: if the output on the A0 (A1) bit-line is asserted, all memristors on the same word-line storing a 0 (1) should be refreshed.
Note that, for clarity, it is assumed here that A0 and A1 are disturbed in exactly the same way as any other memristor on the word-line. In general, there might be small variations. The small resistance of the nanowires may cause a voltage drop along the line, which in turn causes the memristors to experience slightly different disturbing effects. This can be addressed by placing the A0/A1 bits closest to the word-line driver, so that they experience the worst-case disturbance effect, as they are not affected by the voltage drop. Moreover, the disturbing effect might differ slightly among memristors due to process variation. A conservative setting of the A0/A1 corruption threshold (Vref) can take this variation into account by considering the worst case.
The next step is refreshing the close-to-corruption data. That is, if the A0 cell's 0 becomes too weak, all the 0s on the line are refreshed. Refreshing the memristors storing logic 0 (0-bits) consists of two steps: (1) finding out which memristors are 0-bits, for which all bits on the word-line of interest are read simultaneously by applying Vr = 2Vw/3 on the target word-line, grounding all bit-lines, and turning on the S&C circuitry; and (2) refreshing the 0-bits simultaneously by applying a write voltage of −2Vw/3 on the word-line and Vw/3 on all bit-lines whose corresponding memristors need to be refreshed, while grounding the other bit-lines.
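The second refresh step can be sketched as the construction of line voltages from the data read in the first step. The interface is hypothetical, and the logic-1 case is an assumed mirror of the described logic-0 case with polarities flipped.

```python
from typing import List, Tuple

Step = Tuple[int, float, List[float]]


def refresh_plan(a0_alarm: bool, a1_alarm: bool, word: List[int], vw: float) -> List[Step]:
    """Build the write step(s) that refresh one word-line (hypothetical interface).

    a0_alarm / a1_alarm: comparator outputs on the A0 / A1 bit-lines
    word: data read in step (1) with Vr = 2Vw/3 on the word-line, all
          bit-lines grounded, S&C enabled (A0/A1 included)
    Each step is (logic refreshed, word-line voltage, bit-line voltages).
    The logic-1 case mirrors the logic-0 case with flipped polarities (assumed).
    """
    steps = []
    for logic, alarm in ((0, a0_alarm), (1, a1_alarm)):
        if not alarm:
            continue
        sign = -1 if logic == 0 else 1            # negative cell voltage writes 0
        v_word = sign * 2 * vw / 3                # -2Vw/3 when refreshing 0-bits
        v_bits = [-sign * vw / 3 if b == logic else 0.0 for b in word]
        steps.append((logic, v_word, v_bits))     # selected cells see sign * Vw
    return steps


word = [0, 1, 0, 1, 1, 0]      # A0, A1, then four data bits (example)
for logic, v_word, v_bits in refresh_plan(True, False, word, vw=3.0):
    print(logic, v_word, v_bits)
```

The 0-bits see −2Vw/3 − Vw/3 = −Vw (a full write of 0), while the 1-bits on grounded bit-lines see −2Vw/3, which is the refresh side effect on 1-bits mentioned in the next paragraph.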
During the refresh procedure, A0 is also refreshed, thus it experiences the same refreshing imperfection, if any, that other 0-bits might encounter (e.g. not wide enough refreshing pulses, etc.). Similarly, during the refresh, the A1 bit on the same word-line will experience the same side-effects as other memristors storing logic 1 (e.g. disturbance of their value due to the refreshing of 0-bits). Hence the method is robust and will not be affected by such imperfections or side-effects.
The main parameter affecting the refresh rate is the Write Disturbance Tolerance (WDT), which is defined as the number of consecutive writes of only logic 1 (0) before the resistance of the line-shared memristors is corrupted from a strong 0 (1) to the unknown state. The higher the WDT, the fewer refreshes are needed. This number depends on two factors: (1) the applied write voltage Vw, as a lower write voltage has a smaller destructive effect, and (2) the non-linearity of the device's I-V curve, as higher non-linearity helps decrease the destructive side effects of write accesses. Hence, with technology advancement and the introduction of devices with better non-linear kinetics, WDT would continue to improve. Measurements in [25] show that applying a partial voltage of 2Vw/3 (i.e., the partial voltage that causes write disturbance) takes 100x more time (i.e., 100 write operations) to completely change the state of the device compared to when Vw is applied. Hence, assuming an equal division of the possible resistance range into logic 0, logic 1, and unknown regions, moving a strong 0 (1) into the unknown region spans one third of the range, and it can be deduced that WDT for current memristive devices is ≈ 33 (i.e., 100/3).
The average number of random write operations (logic 0 or 1) that necessitates a refresh is estimated based on WDT and is called ψ(WDT) hereafter. ψ is used for energy and performance overhead estimation and is calculated by Monte Carlo simulations: we count the number of refreshes required during a run of 10^9 write operations for different WDTs, and ψ is obtained by dividing the total number of write operations by the number of refreshes. Figure 9 shows the number of refreshes and the resulting ψ versus WDT.
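A Monte Carlo estimate of ψ(WDT) can be sketched for a single word-line. The disturbance model here is an assumption, not the paper's simulator: each random write moves the opposite reference cell one unit toward corruption and heals the same-valued reference cell by one unit, and the A0/A1 monitor triggers a refresh once a reference cell's net disturbance reaches WDT.

```python
import random


def estimate_psi(wdt: int, n_writes: int, seed: int = 0) -> float:
    """Monte Carlo estimate of psi(WDT) for a single word-line (simplified model).

    Assumptions (not taken from the paper's simulator):
      * each write stores a random bit (0 or 1 with equal probability);
      * a write of b disturbs the opposite reference cell (A0 if b == 1,
        A1 if b == 0) by one unit and heals the other by one unit, never past
        its undisturbed state;
      * a refresh is triggered when a reference cell's net disturbance
        reaches wdt, which resets that cell's disturbance;
      * side effects of the refresh pulse itself are ignored.
    """
    if wdt < 1 or n_writes < 1:
        raise ValueError("wdt and n_writes must be positive")
    rng = random.Random(seed)
    d0 = d1 = 0           # net disturbance of A0 and A1
    refreshes = 0
    for _ in range(n_writes):
        if rng.random() < 0.5:          # random write of logic 1
            d0 += 1
            d1 = max(d1 - 1, 0)
        else:                           # random write of logic 0
            d1 += 1
            d0 = max(d0 - 1, 0)
        if d0 >= wdt:                   # refresh all 0-bits on the word-line
            d0 = 0
            refreshes += 1
        if d1 >= wdt:                   # refresh all 1-bits on the word-line
            d1 = 0
            refreshes += 1
    return n_writes / refreshes if refreshes else float("inf")


# The paper uses 10^9 writes; a much shorter run keeps this sketch fast.
for wdt in (5, 10, 33):
    print(wdt, estimate_psi(wdt, n_writes=200_000))
```

Since only the word-line reference cells are tracked, neither the number of bit-lines nor the word-line length enters the model, consistent with the observation that they do not affect ψ.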
As the disturbance effect is confined to the word-lines, the number of bit-lines has no effect on ψ. Moreover, as write operations on any memristor on the word-line are assumed to affect the A0/A1 bits similarly, the number of memristors on the word-line does not affect ψ either. Note that the proposed method guarantees data integrity regardless of ψ(WDT), which only changes the refresh rate.
The energy and performance overheads of the proposed solution, as well as the effect of WDT on those metrics will be discussed in Section 5.
EXPERIMENTAL RESULTS
We derived an electrical model for crossbar-based memories using the Cadence Virtuoso tool, and designed the peripheral CMOS Sense and Compare (S&C) and decoding circuits. With these circuits and models, we simulated the electrical properties of the crossbar and evaluated the energy consumption and performance of the proposed solution.
In the following, we first elaborate on the electrical model, based on which we discuss the overhead figures of our solution.
Electrical Model and Experimental Setup
The crossbar structure, shown in Figure 10, is represented as two perpendicular layers of parallel nanowires. The separation of parallel nanowires is α × Fnano, where Fnano is the width of a nanowire and α is 2 for the highest density. t × Fnano in Figure 10 represents the thickness of the nanowires.
Figure 9: The number of required refreshes in a run of 10^9 write operations (dashed) and the average number of writes before a refresh is required, called ψ (solid), for different WDT numbers. As WDT increases, the number of refreshes drops significantly, which in turn increases ψ considerably.
To model each nanowire electrically, it is partitioned into nanowire segments of length αFnano, and a resistor and a capacitor are used to model each segment. The resistance per nanowire segment can be extracted from the cross-sectional area and the resistivity ρ of the material:
$$R_{seg} = \rho \, \frac{\alpha F_{nano}}{F_{nano} \cdot t F_{nano}} = \frac{\rho \, \alpha}{t \, F_{nano}} \qquad (2)$$
It is a well-known effect that at nanometric scales the electrical resistivity (ρ) of a material increases when the mean free path of the electrons in the bulk material becomes comparable to the dimensions of the structure. In this paper, the expected increase in resistivity [4] is taken into account and plugged into Equation 2 to estimate the resistance of a segment.
As for the capacitive effect, we use the results obtained in [34] in which the capacitance per nanowire segment can be approximated as:
where ε is the relative dielectric constant of the insulating material. For SiO2, ε = 3.9. Hence, for a given feature size Fnano, pitch αFnano, and relative wire thickness t, we can extract the capacitive and resistive components for each nanowire segment and form an RC network that is driven by lateral circuitry, as shown in Figure 11 .
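Equation (2) and the resulting RC ladder can be turned into a quick first-order delay estimate. The Elmore delay used here is a swapped-in analytical approximation, not the authors' SPICE-level simulation; the per-segment capacitance is passed in as a number because the expression from [34] is not reproduced here, and all numeric values in the usage lines are hypothetical.

```python
def segment_resistance(rho: float, alpha: float, t: float, f_nano: float) -> float:
    """Eq. (2): length alpha*F over cross-section F x t*F gives rho*alpha/(t*F).
    rho should already include the nanoscale resistivity increase."""
    return rho * alpha / (t * f_nano)


def elmore_delay(r_seg: float, c_seg: float, n_segments: int, r_driver: float = 0.0) -> float:
    """Elmore delay to the far end of a uniform RC ladder driven at one end.

    Node k (1..n) carries c_seg and sits behind the driver plus k segment
    resistors, so delay = sum_k c_seg * (r_driver + k * r_seg).
    """
    return sum(c_seg * (r_driver + k * r_seg) for k in range(1, n_segments + 1))


# Hypothetical values: rho in Ohm*m, F in m, C per segment in F, 256 cross-points.
r = segment_resistance(rho=5e-8, alpha=2, t=1.0, f_nano=15e-9)
print(r, elmore_delay(r, c_seg=1e-18, n_segments=256, r_driver=1e3))
```

For a uniform ladder without driver resistance the sum reduces to R·C·n(n+1)/2, so the wire delay grows roughly quadratically with the number of cross-points per nanowire.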
The memristive devices, formed at every cross-point, are modeled based on the model proposed in [27] which considers the dynamic characteristics of the device.
The peripheral S&C (Sense and Compare) circuitry is implemented in 45 nm CMOS technology and uses a latch-based comparator based on [31] to produce the output. This comparator latches the output only at selected times and is effectively turned off at other times to save energy. Nevertheless, most of the energy of a read operation is still consumed in the S&C circuitry.
Crossbar memories of size 1Kb to 64Kb are modeled and simulated to estimate the performance and energy overheads. We do not show simulation results for larger memories for the following reasons: (1) SPICE-level simulation limits the memory size that can be simulated; (2) the simple crossbar does not scale well to larger memories. Instead, as stated earlier, other crossbar-based architectures such as CMOL [33], which extend the simple crossbar while keeping a similar number of cross-points per nanowire segment, offer a much larger capacity and thus address the scalability issue. While the proposed method can be adapted to these architectures, presenting results for significantly larger memories under them would require an in-depth explanation of these architectures, which space limitations prevent. Therefore, we illustrate the trends using memory sizes in the range of 1Kb to 64Kb (i.e. 32 to 256 cross-points per nanowire) under the simple crossbar architecture. Larger memories under the enhanced architectures should follow a similar trend.
Figure 10: Physical characteristics of crossbars (labels: cross-sectional area, nanowire segment).
Table 1 summarizes the estimated energy consumption and timing of the baseline (unreliable) read and write operations, based on our electrical model. Memories with more cross-points per nanowire have considerably higher energy consumption due to the larger number of partially activated line-shared devices. Figure 12 illustrates the timing and energy consumption of the different restoring methods in our simulations, and Table 2 lists their estimated energy consumption and timing for different memory sizes. Performance and energy overheads are calculated relative to the baseline read operation, based on the numbers shown in Table 1.
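The mapping between capacity and cross-points per nanowire quoted above assumes square crossbars with 1Kb = 1024 bits. The hypothetical helpers below make that arithmetic explicit and show how an overhead is expressed relative to the baseline read; the cost values are placeholders, not entries of Table 1 or Table 2.

```python
import math


def crosspoints_per_nanowire(capacity_bits: int) -> int:
    """Cross-points per nanowire for a square N x N crossbar (assumption)."""
    n = math.isqrt(capacity_bits)
    if n * n != capacity_bits:
        raise ValueError(f"{capacity_bits} bits is not a square crossbar")
    return n


def overhead_pct(extra_cost: float, baseline_cost: float) -> float:
    """Extra time or energy expressed as a percentage of the baseline read."""
    return 100.0 * extra_cost / baseline_cost


for kb in (1, 4, 16, 64):
    print(f"{kb:>2}Kb -> {crosspoints_per_nanowire(kb * 1024)} cross-points per nanowire")

# Placeholder numbers: a restore step costing 0.8 units over a 1.0-unit read.
print(f"overhead = {overhead_pct(0.8, 1.0):.0f}%")
```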
Read Disturbance
It can be observed in the sixth column of Table 2 that restoring with a Vr pulse has a prohibitively large (≈80%) performance overhead, as the restorative pulse must have approximately the same width as the original read pulse to heal its destructive effect. However, as shown in the eighth column, this method offers a very low energy overhead (<1%), because the power-hungry S&C circuitry used in a typical read operation is turned off during the restore operation. Moreover, the restorative pulse is slightly shorter than the original pulse, as it is directly applied to … This also contributes to the lower energy overhead. By applying a restorative Vw pulse instead, the performance overhead can be reduced significantly (to ≈8%), as shown in the seventh column. This is due to the high nonlinearity of memristive devices: a higher voltage changes the device state significantly faster. However, as shown in the ninth column, this method incurs a higher energy overhead (<4%), since the partial voltage on the line-shared memristors is no longer in the DVR range, so more current is injected through those partially selected memristors.
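Why a restorative Vw pulse can be much shorter than a Vr pulse follows from the nonlinear switching kinetics. The sketch below illustrates this with a toy state-drift rate proportional to sinh(|V|/V0), a commonly used nonlinear form chosen here as an assumption; it is not the device model of [27], and all voltages, pulse widths, and parameters are hypothetical. It computes the restore pulse width needed to undo the drift caused by one read pulse, ignoring half-selected devices and polarity asymmetry.

```python
import math


def drift_rate(v: float, k: float = 1.0, v0: float = 0.25) -> float:
    """Toy state-change rate (arbitrary units/s); grows steeply with |v|."""
    return k * math.sinh(abs(v) / v0)


def restore_pulse_width(v_read: float, t_read: float, v_restore: float,
                        k: float = 1.0, v0: float = 0.25) -> float:
    """Width of an opposite-polarity pulse at v_restore that cancels one read's drift.

    Assumes symmetric kinetics for both polarities, so only |v| matters.
    """
    disturbance = drift_rate(v_read, k, v0) * t_read
    return disturbance / drift_rate(v_restore, k, v0)


# Hypothetical levels: Vr = 0.3 V applied for 10 ns; Vw = 0.6 V.
t_read = 10e-9
for name, v in (("Vr", 0.3), ("Vw", 0.6)):
    t_restore = restore_pulse_width(0.3, t_read, v)
    print(f"restore with {name}: {t_restore * 1e9:.2f} ns")
```

In this toy model, restoring at Vr needs a pulse as wide as the read pulse, while Vw needs only a fraction of it; the model does not capture the extra energy drawn by line-shared memristors at the higher partial voltage.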
It can be observed that the energy consumption increases with more memristive devices on each nanowire, as more devices will be partially activated due to the partial restorative voltage applied on the line-shared memristors. However, the effect of memory size on performance is negligible, as the increase in restoring time is small.
There is an energy-performance trade-off between the proposed restoring methods. Since memristive memories already offer a large improvement in energy rather than in performance, spending a small amount of extra energy to cut the performance overhead is the better trade, so restorative Vw pulses are the preferable choice. Other restorative voltages could also be applied, but generating additional voltage levels would increase energy consumption and is therefore not desirable.
Write Disturbance
The performance overhead of our write-disturbance solution is due to the refresh operation. The refresh procedure takes t_read + t_write, corresponding to (1) the simultaneous read of all memristors on the word-line, and (2) the concurrent refresh of those memristors storing close-to-corruption data.
The mean time between refreshes (MTBR) depends on: (1) ψ(WDT), i.e. the average number of random writes before a refresh is required, and (2) the probability of write accesses, since lower write probabilities (P_write) slow down the accumulation of write disturbances and thus decrease the refresh frequency. That is, if α = P_read/P_write, then for every ψ(WDT) write operations (which on average necessitate one refresh), α × ψ(WDT) read operations have been performed, which increases the MTBR and thus reduces the performance overhead. Moreover, since each read/write operation must also be decoded, the decoding time t_dec is also considered.
Assuming equal read/write access probabilities, WDT = 33, and the timing parameters extracted from simulation (Table 1), the performance overhead is ≈0.15%, which is insignificant.
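The sketch below is an assumed accounting, not necessarily the exact form of Equation 4, that follows the reasoning above: per refresh interval, ψ(WDT) writes and α·ψ(WDT) reads are each decoded and executed back to back, and one refresh costs t_read + t_write. The function name and all inputs are hypothetical placeholders rather than the values of Table 1, so its output should not be compared with the ≈0.15% figure.

```python
def refresh_performance_overhead(psi: float, p_read: float, p_write: float,
                                 t_read: float, t_write: float,
                                 t_dec: float) -> tuple[float, float]:
    """Return (MTBR, fractional performance overhead) under an assumed model.

    psi:             average number of writes between refreshes, psi(WDT).
    p_read, p_write: access probabilities; their ratio is alpha in the text.
    Assumes back-to-back accesses, each decoded before it executes.
    """
    reads_per_write = p_read / p_write                   # alpha in the text
    mtbr = (psi * (t_dec + t_write)                      # writes in one interval
            + reads_per_write * psi * (t_dec + t_read))  # reads in one interval
    refresh_time = t_read + t_write                      # one refresh per interval
    return mtbr, refresh_time / mtbr


# Placeholder timings in ns (not the values of Table 1).
mtbr, ovh = refresh_performance_overhead(psi=1000, p_read=0.5, p_write=0.5,
                                         t_read=5.0, t_write=10.0, t_dec=1.0)
print(f"MTBR = {mtbr:.0f} ns, overhead = {100 * ovh:.3f}%")
```

Lowering P_write raises the number of reads per interval, which lengthens the MTBR and shrinks the overhead, matching the qualitative argument above.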
The energy overhead of the proposed method is due to the energy consumed for: (1) reading the A0/A1 bits, which is required for every write operation, and (2) performing the occasional refreshing process.
The refreshing energy overhead is caused by: (1) reading the values of all memristors on the word-line, and (2) writing data back to those cells that must be refreshed. Hence, since multiple memristors are read/refreshed, the refreshing energy also depends on the number of memristors on the word-line, WS, and the number of cells to be refreshed, RC. Since the refresh procedure is triggered only when necessary, the refreshing energy should be divided among all the write operations performed between two refreshes (i.e. ψ(WDT)) to obtain the average energy overhead per write operation.
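The amortization described above can be written down directly. The sketch below uses an assumed form, not necessarily identical to Equation 5: every reliable write pays one reference read of A0/A1, and the refresh energy (WS cell reads plus RC cell rewrites) is spread over ψ(WDT) writes. The function name and every energy value are hypothetical placeholders.

```python
def write_energy_overhead(e_ref_read: float, ws: int, e_read_cell: float,
                          rc: int, e_write_cell: float, psi: float,
                          e_write_baseline: float) -> float:
    """Fractional energy overhead of a reliable write over the baseline write.

    e_ref_read: energy to read the A0/A1 reference cells (paid on every write).
    ws, rc:     memristors on the word-line / cells rewritten per refresh.
    psi:        average number of writes between refreshes, psi(WDT).
    """
    refresh_energy = ws * e_read_cell + rc * e_write_cell
    extra_per_write = e_ref_read + refresh_energy / psi
    return extra_per_write / e_write_baseline


# Placeholder energies in pJ; sweep psi to see the amortization.
for psi in (10, 100, 1000, 10000):
    ovh = write_energy_overhead(e_ref_read=0.4, ws=64, e_read_cell=0.05,
                                rc=8, e_write_cell=1.0, psi=psi,
                                e_write_baseline=1.0)
    print(f"psi = {psi:>5}: overhead = {100 * ovh:.1f}%")
```

As ψ grows, the refresh term vanishes and the overhead approaches a floor set by the reference read, e_ref_read / e_write_baseline; this is qualitatively the saturation of the energy overhead seen for large WDT.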
Equation 5 estimates the average energy overhead, where E_x denotes the energy consumption of operation x:
Figure 13: Write Disturbance Tolerance (WDT) vs. energy overhead of a reliable write over the baseline write.
Table 3 summarizes the estimated energy and performance overheads of the reliable write operation, calculated from Equations 4 and 5 and the timing and energy numbers presented in Table 1. The numbers are calculated for an example WDT value of 33.
Memories with fewer cross-points per nanowire have lower (≈40%) energy overheads, as fewer memristors are written (refreshed) and the energy consumed per memristor write is small, whereas decoding, which is done in CMOS, consumes more energy. As the number of cross-points per nanowire increases, the refreshing energy increases since: (1) more memristors must be refreshed, and (2) refreshing consumes more energy due to the larger number of partially activated devices (caused by the partial voltage 2Vw/3 on the word-line-shared memristors). However, note that in scalable crossbar-based architectures such as CMOL [33], the number of cross-points per nanowire segment does not increase as the memory scales. Thus, when applied to such structures, our proposed method will not suffer from this increase in energy overhead.
Figure 13 shows, on a logarithmic scale, the effect of different WDTs on the energy overhead of a reliable write operation for different memory sizes. Smaller WDTs necessitate frequent refresh operations, thus increasing the energy overhead of a reliable write. As WDT increases, the refresh rate, and thus the energy overhead of a reliable write, decreases to ≈40%. The figure also shows that a higher number of cross-points per nanowire increases the energy overhead, as described before.
As for area overhead, the proposed method adds only two memristors (i.e. A0 and A1) to each word-line, regardless of the word size. Hence, the relative area overhead depends on the word size WS and is equal to $2/WS$. For an example word-line containing 64 memristors, this overhead is 2/64 ≈ 3.12%.
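The area arithmetic above is simple enough to tabulate for other word sizes. The helper below is a hypothetical illustration that counts only the two reference memristors per word-line and ignores the peripheral CMOS circuitry.

```python
def area_overhead(word_size: int, extra_cells: int = 2) -> float:
    """Fraction of extra memristors per word-line (A0 and A1 by default)."""
    return extra_cells / word_size


for ws in (32, 64, 128, 256):
    print(f"WS = {ws:>3}: {100 * area_overhead(ws):.2f}%")
```

Because the two reference cells are a fixed cost, the relative overhead halves each time the word size doubles.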
CONCLUSION
In this paper, we address the data reliability issues of the emerging crossbar-based memristive memories.
The read disturbance problem is addressed by a read-restore mechanism. Using the voltage levels already available in the memory system, two restoring methods are proposed and evaluated for their energy-performance trade-offs.
The write disturbance issue, which affects the memristors on the same word-/bit-line as the memristor under write, is addressed by first limiting the disturbance domain to the memristors sharing the same word-line, through an asymmetric distribution of the write voltage Vw. Furthermore, possible data corruption is detected by adding two extra memristors without write access to each word-line, which store logic 0 and 1, respectively, and serve as references for checking the corruption trend and status. A refreshing scheme is also proposed to refresh the disturbed cells. One main advantage of the proposed solution is that the design-for-reliability hardware uses only regular memristors in the memristor layer plus some CMOS circuitry that can be implemented outside the memristor arrays. Hence, unlike other methods that require the integration of transistors to decouple memristors, our solution maintains array regularity and will not suffer from technology scaling issues.
Paper 14.3, INTERNATIONAL TEST CONFERENCE
Our case study shows that the performance overheads of the proposed reliable read and write operations are 8% and 0.1%, respectively, and the energy overheads are 0.5% and 38%, respectively, compared with the baseline (unreliable) implementation. These overheads should be affordable given the ultra-low-power characteristics of memristive memories.
