Abstract: The advent of the first TiO2-based memristor in 2008 revived scientific interest in this device technology in both academia and industry, and has so far led to several emerging applications, including logic and in-memory computing. Several memristive logic families have been proposed in the current quest for energy-efficient future computing systems. However, limited device endurance and variability (both cycle-to-cycle and device-to-device) are important parameters to consider when assessing logic operations. In this work, we use an accurate physics-based model of a bipolar memristor (supporting parasitics of the device structure and variability of switching voltages and resistance states) and demonstrate that the performance of memristor-based in-memory computations can be degraded by both variability and state drift if these features are not properly considered in the design flow. Inspired by pseudo-NMOS ratioed logic and building on a previous CMOS-like logic scheme, we propose a crossbar-compatible memristive ratioed logic style that is tolerant to device variability and does not affect device endurance, since computations do not involve conditional switching of memristors. Using the Cadence Virtuoso suite, we compare this logic scheme with the MAGIC and CNIMP approaches, focusing on the universal NOR gate and more complex logic functions.
I. INTRODUCTION
The existence of the memristor as the fourth fundamental circuit element was postulated by Leon Chua in 1971 [1]. However, this emerging device technology drew unprecedented attention only much later, after 2008, when a group at Hewlett-Packard Laboratories (HP Labs) demonstrated the first TiO2-based memristor [2], connecting the nature of such devices with Chua's earlier theory. Owing to their analog nature, potential nonvolatility, high integration density and CMOS BEOL compatibility, memristors constitute an important trend in modern electronics [3], representing a very promising technology whose influence has extended beyond memory [4] to logic [5], neuromorphic and in-memory computing [6], [7]. The 2008 HP Labs work also introduced a simplistic device model, which has since been the basis for several later models [8]-[10]. Most of these models capture only the basic behavioral device characteristics (e.g., threshold-based switching and nonlinearities near the resistive boundaries, implemented with different window functions [11]) and have thus contributed to the progress of this emerging research field, being adequate to demonstrate the impact and usefulness of memristors in a variety of applications [12], [13]. However, recent advanced physics-based device models [14] go deeper into the device dynamics and take into account parasitics (owing to the device structure), variability of voltage thresholds and resistance states, temperature dependencies, and cycle-to-cycle stochasticity. These features are normally observed in real device implementations, so such models enable more realistic circuit simulations.
Among several potential applications, memristive logic design, i.e., the methodology of designing logic circuits using memristors, is an emerging concept in the constant quest for energy-efficient post-CMOS computing systems. Many such memristive logic families have been proposed, including IMPLY [15], MAGIC [16], CNIMP [17], MRL [18], and MAD [19]. In these families resistance is normally used to represent data, so all of them are suitable for resistive computing, and most are also compatible with the crossbar geometry. The latter is considered a requirement for real in-/near-memory computing, since the topology of the logic circuits to be implemented should fit in the crossbar memory array, which is most likely the prevailing topology for memristive memory. Moreover, the metrics proposed so far for comparing such logic families naturally focus on latency, energy, and area efficiency [20]. However, most of these works omit crucial factors such as variability (both cycle-to-cycle and device-to-device) and endurance of memristors, the latter being a major limitation if frequent switching is necessary during computations. For instance, Papandroulidakis et al. [21] recently suggested using different memristor technologies for memory and computing tasks to address endurance issues. Likewise, another logic scheme called Scouting Logic [22] was proposed to alleviate the endurance requirement by executing logic operations through just sensing the memristor state, even though this scheme still suffers from device variability. Moreover, MAD gates [19] were proposed based on the same principle (using Memristors As Drivers), but the target topology was not compatible with the crossbar geometry. Performing logic computations without switching the states of the involved memristors was first seen in the so-called CMOS-like memristive logic [23], revisited recently in [24], which was unfortunately not crossbar-compatible either.
In this work, we build upon ideas and preliminary results shown in [25] and provide an extended presentation and simulation-based evaluation of a crossbar-compatible ratioed logic scheme, first introduced in [26] . Such scheme is completely variation-tolerant; i.e., it does not suffer from device variability or state drift, and also it does not impact the device endurance as logic operations are performed by just reading the state of memristors acting as inputs. We performed circuit simulations of the suggested ratioed scheme using the Cadence Virtuoso suite and compared its performance with the two most popular memristive logic families of the literature (MAGIC and CNIMP), focusing on the universal NOR gate and on more complex logic functions such as AND-OR-Invert (AOI) all implemented in a 1T1R crossbar array. We based our simulations on an advanced physics-based model of a bipolar metal-oxide resistive RAM (ReRAM) device [14] , [27] derived by Stanford University and our results underline the importance of taking into account device variability in circuit simulations as both read and write errors can emerge due to variability and state drift impact. Furthermore, CMOS device mismatch was considered to explore the possible additional impact of such nonidealities on memristor-based logic circuits. Such features are rarely considered so far in other relevant publications in this field and the presented results highlight that variability can be critical for proper device selection and circuit design quality/viability assessment. Finally, we proved that the ratioed logic scheme outperforms the rest in terms of robustness, viability and also scalability (fan-in), getting us to the simple conclusion that "rethinking of memristive logic design from a practical point of view" is necessary if we aspire to enable in-memory computing.
II. TARGET MEMRISTIVE LOGIC GATE DESIGN STYLES
This section briefly describes the three crossbar-compatible logic gate design schemes (i.e., MAGIC, CNIMP, and Ratioed Logic) whose operation inside the crossbar array is later studied and compared in the presence of device variability. Owing to its crossbar compatibility, the same NORn logic function is always performed (i.e., a NOR gate with n inputs). In all cases it is assumed that a memristor stores a logic '0' when it is in a high resistive state (HRS) and a logic '1' when it is in a low resistive state (LRS). We refer to a memristor as forward-biased (reverse-biased) when the voltage at its top (bottom) terminal is higher than that at its bottom (top) terminal; the bottom terminal is denoted by the thick black line in the circuit schematic (see Fig. 1). So, a SET process (HRS → LRS) occurs when the device is forward-biased with a voltage amplitude higher than the V_SET threshold, whereas a RESET process (LRS → HRS) occurs when it is reverse-biased with a voltage amplitude higher than the |V_RESET| threshold.
A. Memristor-Aided loGIC (MAGIC)
According to Memristor-Aided loGIC (MAGIC) [16], different gates are built using different networks of memristors, and computation takes place by applying voltage pulses to the whole network. Each logic gate is composed of one or more input memristors, holding the input logic values in their resistive states, and a single output memristor that stores the result of the logic operation. It is worth mentioning that only NOR MAGIC gates are crossbar-compatible, so any logic function must be computed using sequential NOR operations. More specifically, Fig. 1(a) shows a generalized topology for a NORn gate implemented in a row of a crossbar array (although column-wise gate implementations are also possible). The input data participating in the operation (x_1 … x_n) is stored in the resistive state of a set of input memristors, whereas an additional output memristor eventually stores the result. Assuming that the input data is initially stored in the memory array, MAGIC NORn is evaluated in two steps, regardless of the number of inputs. First, the memristor storing y (output data) is SET to LRS. Next, input voltage pulses of amplitude V_0 > 0 are applied to the top terminal of every input memristor, whereas the top electrode of the output memristor is connected to ground. Note that, at the same time, the row line must be left at high impedance (floating) at both ends. Thus, the gate consists of a pull-up network of input memristors and a single pull-down output memristor. The latter is conditionally RESET to HRS only if at least one input memristor is in LRS. The value of V_0 is selected so that it guarantees the switching of the output memristor while leaving the input memristor states undisturbed. Its value therefore depends on both the memristor parameters and the fan-in.
According to [16], for a 2-input NOR (NOR2), and assuming a very high R_HRS/R_LRS ratio (R_HRS and R_LRS are the memristor resistances in HRS and LRS, respectively), the input voltage requirements are as follows: 2 × |V_RESET| < V_0 < V_SET. In other words, MAGIC NOR gates cannot be implemented unless 2 × |V_RESET| < V_SET holds in a particular device technology.
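The voltage requirement above can be checked numerically with a minimal sketch. The function name `magic_nor2_window` is hypothetical, and the check uses only the idealized NOR2 inequality quoted above (very high R_HRS/R_LRS ratio); it does not account for variability, state drift, or parasitics, so a simulated design space may differ from this window.

```python
def magic_nor2_window(v_set, v_reset):
    """Return the open interval (low, high) of admissible V_0 values,
    or None if the interval is empty.

    Idealized NOR2 condition only: 2*|V_RESET| < V_0 < V_SET.
    """
    low = 2.0 * abs(v_reset)
    high = v_set
    return (low, high) if low < high else None


# Mean thresholds that the text reports for the tuned device model in Section III.
window = magic_nor2_window(v_set=2.0, v_reset=-0.5)
print(window)

# A hypothetical technology with V_SET = 1.0 V and V_RESET = -0.6 V has no window.
print(magic_nor2_window(v_set=1.0, v_reset=-0.6))
```

A return value of `None` corresponds to the case in which the gate cannot be implemented in the given technology.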
B. Converse NonIMPlication (CNIMP)
The Converse NonIMPlication (CNIMP) [17] is a logic gate design that improves on the earlier IMPLY logic [15] by solving the issue of incomplete switching of the memristors; the latter was originally stated in [28] and was also observed experimentally [29]. The CNIMP and the SET programming operations together form a universal logic gate set. CNIMP is named after the binary operation it reproduces, using a circuit consisting of memristors and an auxiliary resistor R_G that fits in a row of a crossbar array, as shown in Fig. 1(b). The CNIMP operation is performed by applying voltage pulses of amplitude V_COND+ > 0 and V_COND- < 0 to the memristors containing the first and second operand, respectively. Fig. 1(b) shows how to realize a NOR2 operation in 3 steps: first, the device that will finally hold the output data is programmed to LRS. Next, the operation x_1 CNIMP y is performed (the intermediate result is stored in y), and finally x_2 CNIMP y is performed to store the result of NOR2(x_1, x_2) in y. The values of V_COND+, V_COND- and R_G are selected to guarantee: (i) no state change in the memristor corresponding to the first operand, and (ii) a conditional change in the state of the second operand, which holds the output data. As in the MAGIC case, CNIMP involves conditional switching, which is why the requirements for the applied voltage pulse amplitudes are very similar to those of MAGIC [17]. Therefore, unless 2 × |V_RESET| < V_SET holds in a particular device technology, CNIMP gates cannot be implemented.
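The three-step NOR2 sequence described above can be traced at the purely logical level. This is a minimal sketch that assumes the usual Boolean definition of converse nonimplication (true only when the first operand is 0 and the second is 1) and ignores all electrical behavior; the function names are hypothetical.

```python
from itertools import product


def cnimp(p, q):
    """Converse nonimplication: 1 only when p is 0 and q is 1."""
    return int((not p) and q)


def nor2_via_cnimp(x1, x2):
    y = 1             # step 1: program the output device to LRS (logic '1')
    y = cnimp(x1, y)  # step 2: x1 CNIMP y, intermediate result kept in y
    y = cnimp(x2, y)  # step 3: x2 CNIMP y, y now holds NOR2(x1, x2)
    return y


# Compare against the NOR2 truth table for every input combination.
for x1, x2 in product((0, 1), repeat=2):
    assert nor2_via_cnimp(x1, x2) == int(not (x1 or x2))
```

In this view the output device y is conditionally reset whenever the first operand is '1', which is the conditional switching that the ratioed scheme avoids.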
C. Ratioed Logic With Memristors and Transistors
The focus of this work is on the memristor-ratioed logic scheme, recently introduced in [26] and inspired by the pseudo-NMOS logic design. It works very similarly to the CMOS-like logic [23] but involves less circuit complexity, which makes it compatible with crossbar arrays. Two possible implementations are shown on the far left side of Fig. 1(c), which concern a PMOS (NMOS) transistor connected to a pull-down (pull-up) memristor network (the version with the PMOS transistor and the pull-down memristors is assumed in the rest of this paper). The memristors substitute the transistors in the NMOS pull-down (or PMOS pull-up) block of a conventional ratioed logic gate. Moreover, the memristors hold the input data; i.e., a memristor in LRS substitutes a turned-ON transistor, whereas a memristor in HRS substitutes a turned-OFF transistor. A particular example of a NOR2 gate is shown on the far right side of Fig. 1(c). The logic operation is performed by applying a voltage V_read across the entire circuit, while making sure that the voltage drop on the memristor network is appropriate for sensing the input states without disturbing them. Contrary to MAGIC and CNIMP, both of which have a resistive output, here the logic output corresponds to a voltage level V_out whose value falls between V_REF = V_DD − V_read and V_DD. The overall circuit works as a voltage divider with an output voltage as described in (1), V_out = V_REF + (V_DD − V_REF) × R_eq / (R_eq + R_L), where R_eq is the equivalent resistance of the whole memristor network and R_L is the equivalent PMOS channel resistance. V_out is eventually compared to a threshold value V_COMP, properly selected so that the output voltage levels are correctly interpreted as '0' or '1'.
As for the two rival logic schemes described previously, Fig. 1(d) shows how a ratioed NORn logic gate can be implemented in a crossbar row. Note that the result of the computation is not simultaneously stored in the memory array, so it is reasonable to talk about near-memory computing. However, the result can be stored back into a memory element right afterwards through an additional programming step. To clarify this, we include in the same figure a simplified version of the peripheral circuit. The logic operation is performed when "write/op" is '0'. The pull-up PMOS transistor is tuned to exhibit a channel resistance R_L ≈ R_LRS, since ratioed circuits depend on a proper pull-up/pull-down resistance ratio for correct operation. In this case, once the V_REF and V_DD pulses are applied (enable signals "en1" … "enn" define which memristors are involved), two different cases are observed depending on the states of the input memristors. For instance, in the NOR2 example, when the input is "00" the input memristors are both in R_HRS, so V_out is roughly V_DD, which is interpreted as logic '1'. On the other hand, if at least one input memristor is in R_LRS, V_out drops to a lower level in the range corresponding to logic '0'. V_out is compared to a threshold value V_COMP, properly selected so that the V_out voltage levels for '0' and '1' are clearly distinguished. The details of this operation are summarized in Table I. The logic output V_out is momentarily stored in a flip-flop. Next, with "write/op" = '1', the output is written back to a memristor by means of the enable signals. The flip-flop and the comparator certainly add some area and power overhead, whereas the rest of the peripheral circuitry is also required for MAGIC and CNIMP gates. Avoiding the conditional switching of memristors during logic computations is beneficial in many different aspects, and the advantages of the proposed ratioed logic scheme are further demonstrated in the rest of this work.
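The voltage-divider behavior of the ratioed NOR gate can be illustrated with a minimal sketch. It assumes that the input memristors of a crossbar row act as parallel resistors between V_out and V_REF, and that the PMOS load behaves as a linear resistor R_L between V_DD and V_out. All numeric values (R_LRS, R_HRS, V_DD, V_REF, V_COMP) are hypothetical and chosen only for illustration; they are not the design values of this work.

```python
def parallel(resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def ratioed_vout(input_resistances, r_load, v_dd, v_ref):
    """Divider output: V_out = V_REF + (V_DD - V_REF) * R_eq / (R_eq + R_L)."""
    r_eq = parallel(input_resistances)
    return v_ref + (v_dd - v_ref) * r_eq / (r_eq + r_load)


# Hypothetical values for illustration only.
R_LRS, R_HRS = 10e3, 2e6   # ohms
V_DD, V_REF = 0.5, 0.0     # volts, so V_read = V_DD - V_REF = 0.5 V
R_L = R_LRS                # load tuned so that R_L is about R_LRS
V_COMP = 0.75 * V_DD       # comparator threshold, assumed


def ratioed_nor(bits):
    """Map input bits to memristor states and compare V_out with V_COMP."""
    rs = [R_LRS if b else R_HRS for b in bits]
    return int(ratioed_vout(rs, R_L, V_DD, V_REF) > V_COMP)


for bits in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    rs = [R_LRS if b else R_HRS for b in bits]
    print(bits, round(ratioed_vout(rs, R_L, V_DD, V_REF), 4), ratioed_nor(bits))
```

With R_L close to R_LRS, a single LRS input already pulls V_out to about the midpoint between V_REF and V_DD, while the all-HRS case stays near V_DD, so the two logic levels are well separated under these assumptions.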
Moreover, it is worth mentioning some other dimensions in which logic design could benefit from the ratioed scheme. MAGIC and CNIMP allow row-wise and column-wise operations in the crossbar array, including in parallel, if a 1R array is considered. However, for the ratioed logic a 1T1R crossbar is preferred, because the logic operation is equivalent to sensing a voltage level, which could be affected by the rest of the crossbar if no select transistors are used at the cross-points. Moreover, if the select transistors are connected in rows, then NOR operations can be performed in a row-wise or column-wise manner, but only column-wise parallel operations are possible, as shown in Fig. 2. Furthermore, the ratioed logic scheme accepts several modifications that enable other useful logic functions in a very compact way. For instance, ORn gates are possible just by including an inverter stage after the comparator (or by using a parallel memristor pull-up block instead). With the ORn gate directly available, the particular case n = 1 is the COPY gate, which copies the content of one cell into another in just one step, unlike MAGIC and CNIMP, which would need two steps (two NOT operations) and conditional switching of memristors. Therefore, assuming that NOR, OR, NOT and COPY are readily available, Table II shows the cross-point area needed to compute a NOR, OR, AND or NAND gate in each of the three logic styles compared here, to give a perspective of the necessary resources in terms of area. The {IN, AUX, OUT} columns correspond to the memristors holding the input data (always 2 for 2-input gates), the number of auxiliary devices holding intermediate results in chained operations, and the single memristor that holds the output value in all cases. Both aligned and non-aligned input memristors are considered, i.e., whether or not they share the same row or column.
With aligned input devices, fewer AUX devices are required in most cases, which makes the difference. Overall, the ratioed logic scheme outperforms the other two and is thus the most compact solution among the three.
Finally, regarding scalability, the mapping of CNIMP (or IMPLY) and MAGIC has been studied in some recent works [30], [31]. Mapping relies on the capability of performing parallel row- and column-wise operations with NOR and NOT gates, and optimized algorithms for faster and less area- and power-consuming logic realizations have been proposed. Likewise, the memristive ratioed logic style has the same potential for efficient mapping, using slightly different strategies than for MAGIC or CNIMP.
III. VARIABILITY-AWARE PHYSICS-BASED MEMRISTOR DEVICE MODEL FOR REALISTIC SIMULATION

A. Description of the Memristor Model
In order to perform more realistic circuit simulations, advanced physics-based models that capture most of the device behavioral characteristics are required. Here we briefly provide the basics of the memristor model used in this work, which is the Stanford-PKU ReRAM model [14] . It is a compact physics-based model (described in Verilog-A) which captures typical DC and AC electrical behavior of metal-oxide based ReRAM devices. The model assumes a conductive filament (CF) growth process described by a change of the CF geometry during the SET and RESET processes under various bias conditions. Thus the core of the model is a two-dimensional description of a unique CF, which includes both the CF gap region and the CF width as control variables. Most importantly, the model includes parasitic effects such as the parasitic resistance of the switching layer and the electrodes, as well as the parasitic metal-insulator-metal (MIM) capacitance. Furthermore, the model supports intrinsic variation effects, such as statistical distribution of switching thresholds and resistance states, temperature dependency and dynamic current fluctuations for the RESET process, thus supporting literally all the ReRAM device variation effects known to date. Operation is very similar to that of other models. A positive applied voltage produces a SET process, where the oxide layer suffers a soft-breakdown; the CF is formed and the device is in a low resistive state. A negative applied voltage causes a RESET process in which the CF is dissolved through ion diffusion or drift processes.
The model is executed in the Cadence Virtuoso suite. The majority of the parameters were kept at the default values suggested in [27], except those directly affecting the switching thresholds, i.e., the average active energy of oxygen vacancies (E_a), the hopping barrier of O^2- (E_h) and the energy barrier between the electrode and the oxide (E_i). Tuning these parameters is recommended to adjust the overall device behavior according to the target application requirements. For instance, in order to comply with the previously mentioned threshold voltage requirement for MAGIC and CNIMP logic operations (V_SET > 2 × |V_RESET|), these parameters were tuned as follows: E_a = 0.9 eV, E_h = 0.9 eV and E_i = 0.7 eV. Fig. 3(a) shows i-v curves for 20 cycles of a device under a triangular voltage waveform. The compliance current (cc) is adjusted by tuning the gate voltage of a series NMOS transistor (we use here realistic 0.35 μm MOS models). Since the memristor model does not include voltage thresholds directly as parameters, we define the SET threshold V_SET as the voltage at which the current reaches 90% of the cc. Likewise, we define the RESET threshold V_RESET as the voltage at which the current first experiences a sudden decrease. These statistics give the following mean values: V_SET = 2 V and V_RESET = −0.5 V, which allow us to establish the appropriate amplitude of the programming/reading voltage pulses. Moreover, the conduction of the model is purely ohmic for the conductive filament body (LRS) but nonlinear for the tunneling gap (HRS). This is observed in Fig. 3(b), which shows how the effective R_HRS/R_LRS ratio can change depending on the applied voltage. The device is first SET to R_MIN, the lowest resistive value, and then a positive voltage is applied without modifying its state. As expected, R_LRS shows the ohmic conduction of the CF. Nevertheless, when the device is RESET to R_MAX, i.e., the highest resistive value, and a larger negative voltage is applied, again without disturbing its state, a highly nonlinear behavior is observed in the effective R_HRS owing to the hopping current through the tunneling gap. This dependency of the effective R_HRS state on the voltage across the device marks a significant difference with respect to other device models commonly used in memristive logic design. It could have a significant impact on the efficiency of memristive applications, so it is also an important consideration for logic gate design.
B. Memristor State Encoding, Read Out, and Variability
Binary encoding of memristance is necessary for logic applications. Given the parameter values used, the model exhibits a memristance range from 5 kΩ (R_MIN) to 3 MΩ (R_MAX). Taking into consideration the results in Fig. 3(b), we defined values above 1 MΩ as HRS and values below 100 kΩ as LRS, whereas all values within the intermediate guard band are treated as undefined states (see the memristance map in Fig. 4(a)). The memristor state is read in voltage mode with a 100 kΩ series resistor by applying a small voltage pulse (V_read = 0.5 V, 20 ns wide) which does not affect the device state. For the purposes of our simulations, we implemented the state decoder shown in Fig. 4(a) (adapted from [32]). The voltage read through the memristive voltage divider is compared with two reference values that represent the upper bound of LRS, i.e., R_LRS,h = 100 kΩ, and the lower bound of HRS, i.e., R_HRS,l = 1 MΩ. So, depending on the result of the comparison, the output of the circuit is either logic '1' for LRS, logic '0' for HRS, or 'X' for any undefined intermediate resistive state. An important aspect of the Stanford-PKU model is its capability of generating cycle-to-cycle variability. During the switching process, a random variable is added to the change rate of the tunneling gap distance g between the electrode and the tip of the conductive filament (CF), and to the change rate of the CF width w. This random variable follows a zero-mean Gaussian distribution with standard deviation σ_g and σ_w, respectively, thus modifying g and w in every step of the transient simulation. Different devices show different amounts of variability, hence it is interesting to include different scales of the statistical distributions of these parameters.
To do so, we set σ_g = k × σ_g0 and σ_w = k × σ_w0 (σ_g0 = 10^-4 m/s and σ_w0 = 5 × 10^-4 m/s are the default values), where k = 1, 2, 3, … is a variability factor that makes it easy to configure the desired amount of variability so as to satisfy the required R_LRS and R_HRS distributions. Such state variability affects the memristance values as well as the switching thresholds (as shown previously in Fig. 3(a) for k = 1).
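The state encoding and read-out described in this subsection can be summarized in a minimal behavioral sketch. It assumes that the sensed quantity is the voltage across the 100 kΩ series resistor of the divider and that the band edges themselves count as valid states; the function names are hypothetical and this is not the decoder circuit of Fig. 4(a).

```python
R_SERIES = 100e3    # series read resistor (ohms)
V_READ = 0.5        # read pulse amplitude (volts)
R_LRS_HIGH = 100e3  # upper bound of the LRS band (ohms)
R_HRS_LOW = 1e6     # lower bound of the HRS band (ohms)


def divider_voltage(r_mem, r_series=R_SERIES, v_read=V_READ):
    """Voltage across the series resistor (assumed sensing node)."""
    return v_read * r_series / (r_mem + r_series)


# Reference voltages that correspond to the two resistance bounds.
V_REF_LRS = divider_voltage(R_LRS_HIGH)
V_REF_HRS = divider_voltage(R_HRS_LOW)


def decode_state(r_mem):
    """Return '1' for LRS, '0' for HRS, 'X' for the undefined guard band."""
    v = divider_voltage(r_mem)
    if v >= V_REF_LRS:
        return "1"
    if v <= V_REF_HRS:
        return "0"
    return "X"


for r in (5e3, 500e3, 3e6):
    print(r, decode_state(r))
```

A lower memristance gives a higher sensed voltage, so the LRS reference is the larger of the two thresholds in this arrangement.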
We inject variability into the initial states of the devices through a two-step initialization process, as shown in Fig. 4(b): programming the device to the LRS (HRS) is done by first performing a hard RESET (SET) and then a soft SET (RESET). A hard SET (RESET) completely forms (destroys) the CF and thus eliminates the previous history of the memristor. In contrast, the HRS or LRS programming that follows (called soft programming) initializes the memristor to a state within the HRS or LRS range, as desired, thus including the variability effect in the memristance. The voltage pulses applied for the HRS initialization are: V_ON amplitude of 3 V, 200 ns width and 500 μA cc for the hard SET, and V_OFF,soft amplitude of −2 V and 100 ns width for the soft RESET. For the LRS initialization, they are: V_OFF,hard amplitude of −2.5 V and 200 ns width for the hard RESET, and V_ON amplitude, 100 ns width and 50 μA cc for the soft SET. Fig. 4(c) shows simulation results for the memristance distribution of R_LRS and R_HRS read after this initialization process. It can be observed that variability-aware simulations of HRS/LRS programming show normal distributions of the device state, whose standard deviation increases linearly with increasing values of k.
IV. TARGET SYSTEM OVERVIEW: CROSSBAR MEMORY ARRAY AND PERIPHERAL CIRCUITRY
Crossbar is the target memory topology in this work and all considered logic design schemes are crossbar-compatible. For the purposes of our simulations, we designed both the crossbar memory array as well as the entire peripheral circuitry used to properly program the devices, read their states, and perform the logic operations of interest according to the logic styles described previously.
The main unit of the simulated system is the crossbar array shown in Fig. 5(a) , where each cross-point cell consists of a memristor and a series select transistor (1T1R). 1T1R arrays are beneficial since they mitigate sneak-path currents and also isolate the memristors involved in memory/logic operations from the rest of the devices. However, compared with passive arrays that use highly nonlinear memristors, the only drawback of the 1T1R structure is a penalty on the cross-point density that can be nearly 2.5 times the passive cell area, as shown in [33] . However, such area penalty may be compensated by the improvements in terms of reliability of operations. We assume that the memristors are stacked directly on top of group-accessed select transistors, which are placed on the bottom layer (fabricated as front-end transistors) to limit the influence of the parasitic capacitance and resistance and also to minimize the area penalty (such as in the case of [34] with the nano-pillar gate-all-around transistors).
Moreover, there are multiplexer (MUX) and demultiplexer (DEMUX) units to drive the selected columns and rows. The MUX, whose detailed implementation is shown in Fig. 5(b), provides different voltage levels to the rows or columns. It is primarily composed of a decoder, which connects the output to any of the input voltages according to the selection signals sel_mux, or completely disconnects the block from the crossbar (high-impedance node, HZ) using the enable signal en. The input voltage signals are shown in two groups: those concerning memory operations and those used in logic operations with any of the three logic styles supported here. On the other hand, the DEMUX shown in Fig. 5(c) provides connection to extra circuitry when necessary, such as the reading circuit described in Fig. 4(a), the transistor that limits the current in LRS or HRS programming, and the R_G used in the CNIMP logic style. There are also pull-up PMOS transistors connected to both rows and columns, which are enabled only when computations using the ratioed logic scheme take place (see Fig. 1(d)).
The system was designed specifically to support all the features required by the three different logic styles (apart from the typical memory operations). Therefore, the area overhead of the peripheral circuitry is not an issue here, since it would be significantly reduced if just one logic style were supported.
V. SIMULATION RESULTS AND DISCUSSION
Next we describe the setup of the entire target system that enables the simulation of different in-memory operations, which we used to evaluate the impact of device variability in the design process of memristive logic gates.
A. Simulation Setup
The system described in Section IV was implemented in Cadence Virtuoso. For the peripheral blocks, Verilog-A behavioral description of decoders and comparators was initially used along with ideal switches for the transistors, to minimize simulation overhead and focus on the impact of parasitics owing to the 1T1R topology. To this end, only the cross-point select transistors and those directly participating in the programming tasks and in the memristor-ratioed logic scheme, were implemented using 0.35 um standard CMOS technology. Once the impact of variability in memristor devices was explored, some of the simulated scenarios were repeated including CMOS device mismatch, for completeness, aiming to highlight the further impact of the design of the peripheral circuit and potential CMOS nonidealities on the memristor-based logic circuits.
The inputs of the simulation were primarily all the voltage pulses applied to MUXes, DEMUXes and the pull-up transistors that enable each operation. The target logic functions were: NOR2, NOR4 and a 2-2 AND-OR-Invert (AOI22). The NOR2 concerns an easy to understand design and straightforward variability analysis, whereas with NOR4 the increment of fan-in was meant to be explored. Finally, the AOI22 is a complex function with chained operations, built as a two-level NOR2 implementation using inverted inputs. For the chained operations, two auxiliary memristors were employed to hold the output of the first level NOR2 gates and serve as input for the last level NOR2 gate.
In the early design steps, the main goal was to find the design parameters for each operation that guarantee the expected outcome, also called the design space. In MAGIC this primarily concerns selecting V_0, whereas in CNIMP it concerns V_COND+, V_COND- and R_G. Finally, for the ratioed logic scheme the load transistor needs to be properly designed and the V_DD and V_REF values need to be properly selected. In order to find the proper design space, we performed a sweep of the design parameter values for every possible combination of the input variables of the logic functions. The impact of variability was evaluated using the error rate as the primary metric, which we computed as the number of unsuccessful logic operations over the total number of logic operations performed. A logic operation is considered successful only if the state of both the input and output memristors is not driven to any undesired region once the operation is finished. The error rate is computed using 1000 simulations for each input combination of interest. MAGIC and ratioed logic NORn gates are computed as a commutative operation, i.e., the NOR2 input cases "01" and "10" are equivalent, hence only "01" was considered. However, CNIMP NORn is computed via a sequence of CNIMP operations, so "01" and "10" are by default different operations even though the logic result is expected to be the same, and both cases were simulated. Finally, the average error rate was calculated; for the MAGIC and ratioed logic styles it is a weighted average of the error rates of the simulated input cases (e.g., in NOR2, "00" and "11" have a weight of 0.25, whereas "01" has a weight of 0.5 since it includes the equivalent "10" case).
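The error-rate metric and its weighted average for the commutative gates can be sketched as follows. The function names are hypothetical and the per-case error rates in the example are made-up numbers used only to show the arithmetic; they are not results of this work.

```python
def error_rate(outcomes):
    """Fraction of unsuccessful operations; each outcome is True on success."""
    outcomes = list(outcomes)
    failures = sum(1 for ok in outcomes if not ok)
    return failures / len(outcomes)


def weighted_average_error(rates, weights):
    """Weighted mean of per-input-case error rates."""
    total_weight = sum(weights[case] for case in rates)
    return sum(rates[case] * weights[case] for case in rates) / total_weight


# NOR2 weights for commutative gates: "01" also stands for the "10" case.
NOR2_WEIGHTS = {"00": 0.25, "01": 0.5, "11": 0.25}

# Made-up per-case error rates, for illustration only.
example_rates = {"00": 0.0, "01": 0.02, "11": 0.10}
average = weighted_average_error(example_rates, NOR2_WEIGHTS)
print(average)
```

For CNIMP, where "01" and "10" are simulated separately, the same helper could be called with four equally weighted cases.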
In order to accommodate a large number of logic operations with several memristors and to simplify the configuration of every simulation, a Python script was prepared which works similarly to a compiler: it takes as inputs the crossbar size and an instruction file that lists the operations to be performed in the system during the simulation, and it generates all the required input signals (e.g., enable and select signals for MUX/DEMUX blocks, gate signals for the row select transistors and the PMOS devices, etc.) in the form of piecewise-linear descriptions w.r.t. time, as required by the Cadence simulation environment. The instruction file is a sequential list of all the different possible operations supported (i.e., HRS and LRS programming, READ, MAGIC NOR, CNIMP and RATIOED NOR) along with several arguments related to the position in the array of the memristors that are involved in every operation, or other specific features such as the number of inputs of a logic operation, etc. The whole simulation flow is shown in Fig. 6(a). An example of a simulation file for a MAGIC NOR2 with previous inputs programmed at "01" and memristors read before and after the simulation is shown in Fig. 6(b). Once the piecewise-linear files are generated by the Python script, the Cadence tool incorporates them along with the schematics, device models and variable values, and the simulation is executed for the given scenario. Finally, the simulation results are exported, containing the state of all the memristors of interest, both before and after the execution of the logic operation(s). It is worth mentioning that such a "compiler" is versatile enough to be extended in order to support more logic styles (by updating the peripheral circuitry blocks), as well as to be optimized for less 
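As an illustration only, the translation from a sequential instruction list to piecewise-linear signals could look like the following minimal sketch. The instruction tuple format, the signal names, the time slot, the edge time and the two-column "time value" text layout are all assumptions made for this example; they do not reproduce the actual script or the instruction file of Fig. 6(b).

```python
EDGE = 1e-9  # assumed rise/fall time in seconds


def pulse_points(start, width, level):
    """Breakpoints (time, volts) of one trapezoidal pulse that returns to 0 V."""
    return [
        (start, 0.0),
        (start + EDGE, level),
        (start + EDGE + width, level),
        (start + 2 * EDGE + width, 0.0),
    ]


def compile_schedule(instructions, slot=300e-9, delay=10e-9):
    """Map a sequential instruction list to per-signal breakpoints.

    Each instruction is a hypothetical (signal, width, level) tuple and
    occupies its own time slot, so consecutive operations never overlap.
    """
    signals = {}
    for index, (signal, width, level) in enumerate(instructions):
        points = signals.setdefault(signal, [(0.0, 0.0)])
        points.extend(pulse_points(index * slot + delay, width, level))
    return signals


def to_pwl_text(points):
    """Render breakpoints as 'time value' lines, one breakpoint per line."""
    return "\n".join(f"{t:.4e} {v:.4f}" for t, v in points)


program = [
    ("row_select", 200e-9, 3.3),  # hypothetical enable signal
    ("v0_driver", 200e-9, 1.95),  # 200 ns pulse at V0 = 1.95 V
]
signals = compile_schedule(program)
print(to_pwl_text(signals["v0_driver"]))
```

Giving every instruction its own time slot keeps the generated waveforms strictly sequential, which mirrors the sequential nature of the instruction list.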
B. Simulation Results
1) MAGIC Logic Gates:
Regarding MAGIC, the proper design space for NOR2 and NOR4 was first explored and the impact of variability was then evaluated for both, including the AOI22 function. Note that the AOI22 does not have to be designed separately since it consists of sequential NOR2 operations. In the design process, the V 0 value was swept while monitoring the state of all the memristors after the logic operation. The applied V 0 voltage pulse width was 200 ns, a time long enough to perform the switching of the output memristor. If any input memristor did not hold its initial logic level, or if the logic level of the output memristor was not the expected one, then the specific V 0 value was not included in the design space. This process took place for every distinct input combination ("00", "01/10" and "11"). Figure 7(a) presents the results for NOR2, i.e., the resistance of the input and output memristors after the logic operation. The x-axis shows the valid V 0 range, extending from 1.89 V to 2.19 V. The initial HRS (R HRS,ini ) and initial LRS (R LRS,ini ) are marked by black dashed lines. The upper bound of the V 0 range is found in the "00" input case, as the state of the input memristors enters the undefined region when V 0 > 2.19 V. Likewise, the lower bound is found in the "11" input case, as the state of the output memristor enters the undefined region when V 0 < 1.89 V. Note that in the "00" input case the input memristors do not switch their state from HRS to LRS; however, we notice a state drift (see Fig. 3(b)) towards the undefined region, due to the considerable voltage drop on the input memristors, which is insufficient to cause a complete switching event but high enough to slightly modify their state. State drift is undesirable as it can cause wrong evaluations in subsequent computations that may use these devices as input memristors. The design of the NOR4 gate is performed in the same way but involves checking more input cases. 
Here V 0 is bounded from 1.89 V (for any combination having two inputs at '1', such as "0011") up to 2.47 V (for the "0000" input case). Interestingly, Fig. 7(b) shows how the design space of NORn is modified with increasing fan-in. The range of valid V 0 values differs for different numbers of inputs, first getting wider up to a particular point and then shrinking.
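The way the valid V 0 window follows from the per-case limits can be sketched as an interval intersection. The snippet below is a minimal illustration that uses only the bounds quoted in the text; cases whose limits are not reported are left out, and an unreported side of a case is treated as unbounded. The function name and data layout are assumptions made for this example.

```python
import math


def design_space(case_bounds):
    """Intersect the valid (lower, upper) V0 interval of every input case.

    Returns None when no V0 value satisfies all cases at once.
    """
    lower = max(lo for lo, _ in case_bounds.values())
    upper = min(hi for _, hi in case_bounds.values())
    return (lower, upper) if lower <= upper else None


# Bounds reported in the text; the other side of each case is left unbounded.
nor2_bounds = {"00": (-math.inf, 2.19), "11": (1.89, math.inf)}
nor4_bounds = {"0000": (-math.inf, 2.47), "0011": (1.89, math.inf)}

print(design_space(nor2_bounds))  # (1.89, 2.19)
print(design_space(nor4_bounds))  # (1.89, 2.47)
```

In a full design flow each interval would come from the V 0 sweep of the corresponding input case, where a V 0 value is rejected as soon as any input or output memristor leaves its expected logic level.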
Once the design space for NORn was defined, variability was injected in the involved memristors. The error rate of NOR2 and NOR4 for variability factor k = 5 is displayed in Fig. 7(c) and Fig. 7(d), respectively. The error rate is shown separately for each input case (equivalent input cases are grouped in the same category; e.g., "10" and "01", or "0001" and "0100", since what matters is the number of logic '1' values in the input combination rather than the specific bit sequence). Apparently, the most affected case is the "all-zero" input as V 0 increases. We observed that the higher the applied voltage, the larger the state drift in the input memristors. Although all V 0 values within the design space should in principle be valid, it turns out that the error rate in some cases is unacceptable. The average error rate is minimized for V 0 = 1.95 V in NOR2 and for V 0 = 2 V in NOR4, highlighting that memristor device variability must be taken into account in the design process. Note that in NOR4 the average error rate appears flatter than in NOR2. This is because most input cases exhibit similarly low error rates, whereas the only critical case is the "all-zero" case, which weighs less in the computation of the average value.
Next, the error rate of AOI22 was computed. Since AOI22 was implemented via sequential NOR2 operations, the design space of NOR2 was considered and only values around V 0 = 1.95 V were tested, i.e., around the value that minimized the error rate for NOR2. Fig. 7(e) shows the results for variability factor k = 5. The error rate is detailed for each grouped input case. Equivalent input cases are grouped in pairs, each corresponding to the inputs of one of the two first-level NOR2 gates. For instance, the "00|01" case also includes "00|10", "01|00", and "10|00", which are equivalent. We observed that the most error-prone cases are the ones with a "00" input pair. This was somewhat expected since "00" is the input combination with the highest error rate in NOR2 (Fig. 7(c)). Thus, in AOI22 the first-level NOR2 gates produce quite unreliable outputs that are used next as inputs in the second level, to eventually produce an even less reliable final output. Moreover, the average error rate, here minimized for V 0 = 1.9 V, does not change significantly with V 0 . We also observe that the error rate curve, even for the most critical input case, is not as steep as in previous results. Generally, in agreement with the NOR2 and NOR4 functions, it seems that using higher voltage amplitudes results in an increasing error rate. However, the error rate of AOI22 shows higher values than the NOR2 or NOR4 operations, reflecting the undesired consequence of chained logic operations. A logic operation with a minimum error rate of 15.6% ("00|00" input for V 0 = 1.9 V) is clearly not feasible. So, after every NOR2 operation, the state of the output memristor should be properly rewritten to minimize the propagated error rate.
2) CNIMP Logic Gates:
For CNIMP, the only design necessary is that of the basic CNIMP operation which is used to perform any other logic function. 
Once this is completed, the NOR2, NOR4 and AOI22 are simulated while injecting the same amount of variability to finally compute the error rate. Contrary to the MAGIC case, in the CNIMP design there are three design parameters: V COND+ , V COND− and R G . So, in order to keep the design process as simple and short as possible, we reduced the test set of V COND+ and R G values and rather swept only V COND− . The applied input voltage pulse width was again 200 ns as in MAGIC.
The corresponding results are shown in Fig. 8(a). Different designs were obtained for different R G values. Then, NOR2, NOR4 and AOI22 operations were tested to obtain the error rate, shown in Fig. 8(b), Fig. 8(c) and Fig. 8(d), respectively. R G = 10 kΩ and V COND+ = 1 V were fixed, whereas V COND− was tested for −1.1 V, −1.2 V, and −1.3 V. As in the MAGIC case, the results again show that the error rate is very much dependent on the design parameters. Also, we observe that the CNIMP NOR2 error rate is lower than that of MAGIC NOR2 but, as the number of inputs increases, the CNIMP error rate becomes higher than that of the corresponding MAGIC gate (NOR4). We even note a set of parameter values that leads to a 100% error rate for the "0000" input combination. This is an undesirable effect of state drift, since NOR4 is computed using a chain of four CNIMP operations. Furthermore, the AOI22 results confirm that, although the function is implemented with sequential NOR2 operations, the error rate is not only higher than in NOR2 but actually prohibitive. Generally, in CNIMP-based implementations the chained operations are even more critical than in MAGIC as every NORn involves a chain of basic CNIMP operations. As in NOR4, we confirmed that V COND− = −1.3 V produces 100% faulty results for certain input combinations. Unfortunately, it is not as easy to establish correlations between NOR2 and AOI22 faulty input cases as it was for the MAGIC AOI22. However, it is noticeable that input combinations with a high number of logic '0' and also those including a few logic '1' (for instance, "1000", "0100" or "1100"), evaluated in early stages of the computation, are the ones with the highest error rate. Finally, the minimum average error rate was displaced here (w.r.t. that in CNIMP NOR2) to

Fig. 9. Simulation results for error rate while considering CMOS mismatch for (a) MAGIC NOR2 and (b) CNIMP NOR2. 500 operations were realized in each case, with circuit parameters as described in Fig. 7(c) and Fig. 8(b), respectively.
All in all, MAGIC and CNIMP proved quite sensitive to variability in memristor states even after being properly designed, and while considering quite a large resistance window. What is more, since large voltages are generally required, the state drift issue appears. So, chained operations may worsen the situation, increasing the number of faulty operations to prohibitive levels and thus requiring extra intermediate steps to restore the state of memristors in between logic operations. Furthermore, in Fig. 9 we show simulation results for some of the scenarios shown previously, while considering CMOS nonidealities and having all blocks in the peripheral circuit designed using 0.35 µm standard CMOS models. Monte Carlo simulations were carried out including the best possible levels of process variability to highlight the potential impact of CMOS device mismatch on the memristive logic operations. The same trends are observed and the error rate now increases in both MAGIC and CNIMP gates. So, the CMOS components could eventually worsen the performance of logic operations and an optimized design of the driving circuit is required, along with a potential adjustment of other parameters related to memristors (V 0 , V SET , V RESET , V COND+ , V COND− , …) so as to minimize the error rate. Therefore, it is expected that a logic scheme which does not imply conditional switching of the memristor states would most likely constitute a more viable and robust approach to follow.
3) Ratioed Logic Gates With Memristors-Transistors:
Last, we present the results concerning the design exploration of the alternative ratioed-logic scheme proposed in [26]. The voltage difference V DD − V REF must be low enough not to disturb the state of the input memristors. Using V read = 0.5 V as a reference, we applied V DD = 3.3 V to operate the 0.35 µm PMOS load transistor and set V REF = 2.8 V. Moreover, the PMOS channel resistance was set very close to the LRS resistance of the memristors, through proper sizing and by using a gate voltage of 1.6 V. In addition, since no switching of any memristor is required, the voltage pulse width was set to 20 ns, much shorter than the pulses applied in the MAGIC and CNIMP gates. Regarding the evaluation of the variability impact, it should be noted that the error rate of NOR2 is zero, so it was meaningless to use this as an evaluation metric. Instead, the distributions of the V out values for each different input combination were considered more illustrative for this logic style. The simulation results are shown in Fig. 10(a). Each distribution is colored depending on the variability factor k used in the simulation, i.e., blue for k = 1, red for k = 5 and black for k = 10, whereas the insets show separately the details of each distribution. As expected (see Fig. 1(c)), in the "00" input case V out falls near V DD and has a very narrow distribution. On the other hand, the "01/10" and "11" input cases have broader distributions that are centered on particular values, as explained theoretically in Section II.C (see Table I). The results confirm that the three cases are clearly distinguishable and a comparison voltage V COMP can be easily defined to correctly interpret the values corresponding to logic '1' or logic '0'. The NOR4 results in Fig. 10(b) show similar performance. Indeed, as there are more input memristors in parallel, the V out distributions are displaced further from V DD . 
So, V COMP clearly separates the input cases that give a logic '1' output from those that give a logic '0' output.
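The ordering of the V out levels can be illustrated with an idealized resistive-divider sketch. It assumes that the PMOS load behaves as a fixed resistance between V DD and the output node, that the input memristors sit in parallel between the output node and V REF, and that logic '1' is stored as LRS. The resistance values below are hypothetical placeholders rather than the device parameters used in the simulations, and the actual expressions are those of Section II.C and Table I.

```python
V_DD = 3.3   # supply voltage from the text, in volts
V_REF = 2.8  # reference voltage from the text, in volts

R_LRS = 10e3     # hypothetical low-resistance state, in ohms
R_HRS = 1e6      # hypothetical high-resistance state, in ohms
R_LOAD = R_LRS   # load assumed equal to the LRS resistance


def v_out(n_ones, n_inputs, r_load=R_LOAD, r_lrs=R_LRS, r_hrs=R_HRS):
    """Idealized output voltage for n_ones inputs in LRS out of n_inputs."""
    # Parallel conductance of the input memristors.
    conductance = n_ones / r_lrs + (n_inputs - n_ones) / r_hrs
    r_inputs = 1.0 / conductance
    # Divider between the load (towards V_DD) and the inputs (towards V_REF).
    return V_REF + (V_DD - V_REF) * r_inputs / (r_inputs + r_load)


levels = {ones: v_out(ones, 2) for ones in range(3)}
# One possible choice: midway between the two levels closest to each other.
v_comp = 0.5 * (levels[0] + levels[1])
for ones, volts in levels.items():
    print(ones, round(volts, 3), "logic 1" if volts > v_comp else "logic 0")
```

Under these assumptions only the all-zero case lies above V COMP, which is the NOR behavior, and adding more inputs in HRS pulls the all-zero level further below V DD , in line with the trend reported for NOR4.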
Next, Fig. 10(c) underlines the high tolerance of the proposed scheme to device variability. The graph shows the evolution of the V out range with an increasing number of inputs of the NOR gate. The output is shown only for the all-zero input case ("00…00") and for the combinations with only one input storing a logic '1' (indicated as "00…01"), since their results are the closest to V COMP , as seen in Figs. 10(a),(b). The green line shows an extrapolation following the results without any variability for input number n = 2, 4, 8 and 16, whereas the simulation results shown were taken after the injection of variability. It can be noticed that the two cases remain separated enough to be properly distinguished by the comparator. Eventually, the "00…00" case theoretically intersects the V COMP line at approximately n = 64. However, such a limitation could be removed by adjusting the V COMP level according to the number of inputs. Hence, the error rate of the ratioed logic scheme becomes non-zero only when the V out values for these two extreme input combination cases are too close to each other.
We did not include any simulation results for AOI22 since the performance of the basic operation, i.e., NORn, was found a priori superior to that of the other logic schemes and practically error-free. On the other hand, since the output here is expressed as a voltage instead of a memristance, conducting chained operations in this scheme requires an intermediate step to store the logic output in a memristor state which will later act as an input variable, as explained in Section II.C. Therefore, in principle the delay of computations in the ratioed scheme would be higher. However, as commented before, unless an intermediate "regenerating" step is included in both MAGIC and CNIMP gates, chained operations in either of these styles are unacceptably error-prone. So, after all, requiring an extra step to store intermediate results in ratioed logic operations is not much of a disadvantage, considering the high gains in reliability and also the much shorter computation time, which involves only sensing the memristor states. Finally, no simulation results are shown here for the impact of CMOS nonidealities on the ratioed logic scheme, since we found they have no significant effect on the output voltage levels, thus the error rate remained zero.
Relying on logic styles that assume frequent device switching, even for the implementation of simple logic functions, poses a major concern for the endurance-limited memristor technology. This is why, as mentioned in Section I, the focus has moved to alternative schemes that implement logic using just read operations, such as Scouting Logic [22]. The latter is also crossbar-compatible and is based on a modified sense amplifier for the implementation of different logic gates. It shares some of the advantages of the ratioed logic scheme [26], even though its design process is iterative, similar to the one presented for MAGIC and CNIMP. On the other hand, the ratioed logic proved robust here, with a straightforward and noniterative design process, and proper functionality was verified even while scaling the number of gate inputs.
VI. CONCLUSION
The paper compared the performance of three memristive logic styles in terms of tolerance to device variability. Simulation results confirmed that a variability-aware design and more realistic circuit simulations using accurate physics-based device models are necessary in the assessment of memristive logic design schemes. Our analysis showed that MAGIC and CNIMP behavior is highly sensitive to design parameters, with the error rate increasing significantly up to unacceptable levels. State drift was also observed in the design process and it strongly impacts chained operations. On the other hand, the ratioed logic scheme, with a much simpler design space exploration, proved much more robust against device variability. Also, it is crossbar-compatible and fast, and it does not affect the device endurance as computations do not involve switching the memristor states. The smaller voltage amplitudes required are expected to decrease power consumption as well (not covered in this work) and certainly prevent state drift. Everything considered, the memristor-based ratioed logic scheme stands out as a good candidate for in/near-memory logic in future computing systems. 
Manuel Escudero
