Abstract: Memristive crossbars have been shown to be excellent candidates for building ultra-dense memory systems because a per-cell access-transistor may no longer be necessary. However, eliminating the access-transistor introduces several parasitic effects caused by partially-selected devices during memory accesses, which could limit the scalability of access-transistor-free (ATF) memristive crossbars. In this paper, we discuss these challenges in detail and describe solutions that address them at multiple levels of design abstraction.
I. INTRODUCTION
CMOS-based memory technologies cannot keep up with the ever-increasing demand for denser and lower-power memories. As the memory cell size is mainly limited by the size of its access-transistor, CMOS technology scaling is reaching its limit due to the increased leakage current of these transistors and the yield drop induced by fabrication imprecision [1].
As an alternative, emerging resistive memory technologies such as Phase Change Memories (PCM) [2], Spin-Transfer Torque Magneto-resistive Memories (STT-MRAM) [3], and metal oxide valence change ReRAMs [4] have recently been investigated; they offer ultra-small and low-power memory elements with fast switching speeds. Among them, metal oxide valence change ReRAMs, generally referred to as memristors [5], are especially promising because their unique electrical characteristics make it possible to eliminate the access-transistor while maintaining the same power, speed, and endurance advantages [6].
A memristor is a two-terminal passive programmable resistor whose resistance is maintained in the absence of an electric field. This characteristic makes it an ideal candidate for non-volatile memory. High and low resistances can be used to represent logic values 0 and 1, respectively. Applying adequate voltage (or current) pulses can change the resistance of the device. The change in resistance has a strong non-linear dependency on the amplitude and the duration of the applied pulse [7]. This non-linearity opens up opportunities to obviate the need for an access-transistor for each memory cell. Ultra-high density memory arrays can be realized with memristive devices, as their feature size can be shrunk to a sub-10nm scale due to the elimination of the access-transistor and the simple structure of the memristors [8]. Moreover, both analysis and preliminary experimental measurements have demonstrated the potential of memristive memories for lower power consumption than existing technologies [9, 10, 11]. The main contributors to this power efficiency are the access-transistor-free (ATF) memory structure and the passiveness of the devices. These characteristics make memristive memories attractive as an extremely dense and low-power non-volatile memory [6]. Several nanoscale memristive crossbars have been successfully demonstrated recently [12, 13, 14].
The elimination of per-cell access-transistors also introduces several challenges. In an ATF memory array, selecting a target cell puts a partial bias on several other devices in the crossbar. These partially-selected devices could introduce sneak/leakage currents that interfere with the operation of the sensing circuitry used during read and write operations. This parasitic effect is data-dependent, as the amount of sneak/leakage current depends on the data pattern stored in the ATF crossbar. Furthermore, in an ATF crossbar, the partial voltages applied to the partially-selected devices during the write operation might slightly affect the states of such devices. This effect, also known as "write disturbance", could accumulate and result in corruption or complete inversion of the stored logic value.
The sneak/leakage current and write disturbance issues become worse for a larger-scale ATF crossbar, as the number of partially-selected devices during read/write operations generally increases with the size of the memory. Moreover, the high cumulative leakage current in a large-scale ATF crossbar would result in high current requirements and high power consumption for the memory system. Such issues limit the scalability of ATF crossbars and thus must be addressed.
In the rest of the paper, we describe the challenges of building a scalable access-transistor-free memristive crossbar in detail, and illustrate solutions to these challenges, ranging from device-level fabrication schemes to higher-level architectural solutions. The rest of the paper is organized as follows: Section II provides background on memristors. Section III discusses the challenges introduced by the elimination of the access-transistors and some solutions addressing these challenges. Section IV focuses on the scalability issues of ATF crossbars and methods for addressing them. Section V concludes the paper.
II. BACKGROUND ON MEMRISTORS
Memristors typically have a metal/insulator/metal (MIM) structure. The change in resistance happens due to the nonvolatile formation of a conductive filament inside the insulating oxide layer. Such a filament is formed by applying a voltage (or current) pulse across the device. The applied electric field mobilizes the conductive particles (e.g., oxygen vacancies, metallic ions, etc.) to form a filament by making them drift inside the insulating oxide layer [15]. With the formation of such a highly conductive filament, the device is set to a low resistance state (ON state). To reset the device back to a high resistance state (OFF state), a pulse with the opposite voltage polarity is applied. Such a pulse disperses the conductive particles and ruptures the filament. Figs. 1a and b show one possible realization of memristors and the filament formation and rupture processes.
Memristive memory arrays are fabricated in the form of a crossbar: two perpendicular layers of parallel nanowires are fabricated with one layer on top of the other, having switching oxide material deposited in between, as illustrated in Fig. 1c. Hence each cross-point has the MIM structure of a memristor with no access-transistor. The memory controller, decoder modules, and the sensing circuitry are implemented in the peripheral CMOS subsystem [12]. Such a crossbar structure offers a very small footprint of 4F^2 for each memristive device, where F is the feature size of the memory technology, enabling the realization of ultra-dense memristive memory arrays. A feature size as small as 9nm has been fabricated in academia [8] and even smaller feature sizes are under active research [16, 17]. The memory density can be further increased by growing multiple layers of memristive crossbars in a 3D fashion [18].
Elimination of the access-transistor is possible because of the strong non-linearity in the rate of change of the memristor's resistance versus the applied voltage, i.e., dR/dt versus V [19, 20], as shown by the solid curve in Fig. 1d. While applying voltages above a write threshold, V_thw, effectively switches the internal state of the device, applying voltages below V_thw has a negligible effect on the device's state. This non-linearity, combined with proper voltage application schemes, could effectively provide the functionality of an access-transistor. For the write operation, a write voltage pulse of amplitude V_w above the write threshold V_thw is applied across the device in a crossbar via a method known as the V/2 scheme [21]: V_w/2 is applied on the target word-line (i.e., a nanowire in the top layer of the crossbar array) and −V_w/2 is applied on the target bit-line (i.e., a nanowire in the bottom layer of the crossbar array) while all other lines in both top and bottom layers are grounded, as shown in Fig. 2a. In Fig. 2, horizontal and vertical lines represent word-lines and bit-lines, respectively. In such a configuration, the target cell experiences the full V_w pulse, which is greater than V_thw, and thus switches, while the bit-line-shared and word-line-shared devices experience only V_w/2, which falls below V_thw and has a negligible effect on the state of such devices. The remaining unselected devices have no bias across them. Note that the V_w pulse can be applied on the target word- and bit-lines asymmetrically, e.g., −V_w/3 on the bit-line and 2V_w/3 on the word-line, as long as the partial voltages across line-shared devices fall below V_thw.
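The voltage application scheme described above can be sketched in a few lines of Python. This is only an illustration under idealized assumptions (no wire resistance, ideal drivers); the function names, the `wl_fraction` parameter, and the numeric values are hypothetical and not taken from the cited works.

```python
def write_bias_map(rows, cols, target_row, target_col, v_w, wl_fraction=0.5):
    """Voltage across every cell (word-line minus bit-line potential).

    wl_fraction is the share of V_w driven on the target word-line;
    0.5 gives the V/2 scheme, 2/3 gives the asymmetric 2V_w/3, -V_w/3 split.
    Assumes ideal wires and drivers (hypothetical simplification).
    """
    v_wl = wl_fraction * v_w            # target word-line potential
    v_bl = -(1.0 - wl_fraction) * v_w   # target bit-line potential
    bias = []
    for r in range(rows):
        wl = v_wl if r == target_row else 0.0   # unselected lines are grounded
        bias.append([wl - (v_bl if c == target_col else 0.0)
                     for c in range(cols)])
    return bias


def scheme_is_safe(bias, v_thw, target):
    """True if only the target cell sees a voltage above the write threshold."""
    t_row, t_col = target
    for r, row in enumerate(bias):
        for c, v in enumerate(row):
            if (r, c) == (t_row, t_col):
                if abs(v) <= v_thw:
                    return False        # target would not switch
            elif abs(v) >= v_thw:
                return False            # a partially-selected cell could switch
    return True


if __name__ == "__main__":
    bias = write_bias_map(3, 3, target_row=1, target_col=2, v_w=2.0)
    for row in bias:
        print(row)
    print("safe for V_thw = 1.5 V:", scheme_is_safe(bias, 1.5, (1, 2)))
```

With `wl_fraction=0.5`, the target cell sees the full V_w, line-shared cells see V_w/2, and all other cells see zero bias, matching the description of Fig. 2a.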
The read operation is performed by applying a low read voltage V_r across the device and sensing the resulting current to determine the state of the device. This could be realized by applying V_r on the target word-line and grounding the target bit-line, while other lines are grounded (or less favorably, floated [22]), as shown in Fig. 2b. While applying low V_r voltages generally would not change the state of the device, the value of the resulting current depends on the device's state, i.e., its resistance: a high (low) current indicates a low (high) resistance, and thus a logic 1 (0), as illustrated in Fig. 1d. A typical sensing circuit measures the current on the bit-line, which ideally should be equal to the current passing through the target cell, I_target, and compares it with a fixed reference current, I_ref. The value of the reference current is chosen between the expected current value for a memristor in the ON state (I_ON) and that in the OFF state (I_OFF), that is, I_OFF < I_ref < I_ON. Some sensing circuits convert the bit-line current into a voltage and do the comparison in the voltage domain [23].
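As a minimal sketch of the fixed-reference comparison just described, the following Python snippet models an ideal read with a linear device and no parasitic currents. The resistance and voltage values, and the choice of the midpoint between I_OFF and I_ON as the reference, are assumptions made for illustration; the text only requires I_OFF < I_ref < I_ON.

```python
def midpoint_reference(v_r, r_on, r_off):
    """One possible I_ref: halfway between I_OFF and I_ON (an assumption)."""
    i_on = v_r / r_on
    i_off = v_r / r_off
    return 0.5 * (i_on + i_off)


def sense(v_r, r_cell, i_ref):
    """Ideal read: bit-line current equals I_target; high current means logic 1."""
    i_target = v_r / r_cell
    return 1 if i_target > i_ref else 0


if __name__ == "__main__":
    V_R, R_ON, R_OFF = 0.5, 1e4, 1e6    # hypothetical device parameters
    i_ref = midpoint_reference(V_R, R_ON, R_OFF)
    print("ON cell reads as", sense(V_R, R_ON, i_ref))
    print("OFF cell reads as", sense(V_R, R_OFF, i_ref))
```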
7S-1
III. CHALLENGES IN ACCESS-TRANSISTOR-FREE CROSSBARS
The elimination of the per-cell access-transistors reduces the fabrication complexity and increases the memory density of memristive arrays. However, the lack of an access-transistor also results in several undesired effects in an ATF memristive crossbar. During the read operation, sneak currents could flow into the intended path and interfere with the sensing operation. During the write operation, the partial voltage bias across line-shared devices could disturb the state of such devices, causing write disturbance. Such partial bias can also result in leakage currents that affect the write operations. This section discusses these issues in detail.
A. Sneak Current
Fig. 3a shows the sneak paths during a read operation. A sneak path is formed when there is a path consisting of low-resistance ON devices connecting the target word-line to the target bit-line. Such paths form parallel resistances to the target memristor, through which sneak current, I_sneak, flows. Consequently, the input current to the read circuitry is no longer contributed solely by the current of the target cell, I_target, but also includes an I_sneak component. As a typical sensing circuit employs a fixed reference current I_ref to determine the state of the target device, such undesired sneak currents could affect the read operation and result in erroneous reads. The sneak current is data-dependent, as having more low-resistance (ON) devices will form more sneak paths and result in higher sneak currents.
The sneak path problem is more prominent when the unselected lines are floated [21, 22]. Keeping such lines floating would result in nondeterministic and data-dependent voltage bias across line-shared and unselected devices. Grounding the unselected lines ameliorates the sneak path problem by minimizing the voltage difference across the unselected devices and thus the amount of I_sneak. However, the sneak current still exists due to the small voltage drop on the nanowires and the limited driving capability of the grounded lines' drivers.
Two solutions exist for the sneak current problem: one is to prevent the sneak current by altering the I-V characteristics of memristive devices, and the other is to adjust the reference current I_ref based on the sneak current.
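To make the effect concrete, consider the smallest possible case: a 2x2 crossbar with linear devices, ideal wires, and floated unselected lines. The only sneak path then runs through the three non-target devices in series, in parallel with the target. The sketch below evaluates that case; all numeric values, including the fixed reference current, are hypothetical.

```python
def floating_2x2_read(v_r, r_target, r_a, r_b, r_c):
    """Currents on the target bit-line of a 2x2 crossbar with floated lines.

    r_a, r_b, r_c are the three non-target devices, which form a single
    series sneak path in parallel with the target (linear devices and
    ideal wires assumed).
    """
    i_target = v_r / r_target
    i_sneak = v_r / (r_a + r_b + r_c)
    return i_target, i_sneak


if __name__ == "__main__":
    V_R, R_ON, R_OFF = 0.5, 1e4, 1e6    # hypothetical device parameters
    I_REF = 1e-5                        # hypothetical fixed reference, I_OFF < I_REF < I_ON
    i_target, i_sneak = floating_2x2_read(V_R, R_OFF, R_ON, R_ON, R_ON)
    i_bitline = i_target + i_sneak
    print("I_target =", i_target, "I_sneak =", i_sneak)
    print("OFF target read as", 1 if i_bitline > I_REF else 0)
```

In this configuration the sneak component dominates the bit-line current of an OFF target, which is how a fixed reference can produce an erroneous read.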
Sneak Current Prevention:
The sneak current exists mainly due to the linear I-V characteristics of memristive devices: a small bias voltage across a line-shared or unselected device, which is common in ATF crossbars, generates a sneak current proportional to the bias and inversely proportional to the resistance of the memristive device.
Hence, one solution to the sneak current problem is engineering non-linearity in the I-V characteristics of the memristive devices, such that small biases below a read threshold V_thr across the device result in negligible currents regardless of the state of the device (illustrated in Fig. 3b). Yang et al. engineered such non-linearity by adding a conductive oxide layer inside the device stack. The non-linearity is generally associated with the Schottky-like interface of the internal oxide layers [7]. Given this non-linear I-V characteristic curve, unselected devices in an ATF crossbar appear as highly resistive paths under small biases, thus considerably reducing the sneak current.
The complementary resistive switch (CRS) was proposed as an alternative solution to the data-dependent sneak path problem [24] . A CRS is formed by connecting two memristors antiserially and stores binary information as an internal configuration rather than as a resistance. As originally envisioned, the idea of a CRS is to keep at least one of the constituent memristors in the OFF (high resistance) state: logic 1 (0) is defined as the configuration in which the bottom (top) memristor is ON and the other memristor is OFF, as shown in Fig. 3c . As a result, a CRS always presents a high resistance regardless of the stored logic, thus reducing the sneak current by eliminating the low-resistance paths inside the ATF crossbar.
A major drawback of CRS cells is the destructive nature of the read operation in such devices: as both logic values present a high resistance, a write operation must first be performed to differentiate logic 1 from logic 0 (the destructive write shown in Fig. 3c). Such a write operation switches a CRS cell storing logic 0 into a low-resistance device by switching both constituent memristors into the ON state, while it does not affect the state of a CRS cell storing logic 1. The device is then read using a regular reading scheme, followed by another write operation to put the device back into the high-resistance CRS configuration. The additional write operations adversely affect the endurance and cause excessive energy consumption.
In [25], the destructive read problem of CRS cells is addressed. The authors proposed HReRAM, a hybrid crossbar structure formed by CRS cells that can act both as a memristor and as a CRS, a behavior that has been observed in TaOx-based single memory cells [26]. HReRAM alleviates the CRS destructive read issue by keeping the frequently-accessed devices in the memristive mode, whose read accesses are nondestructive. Moreover, keeping the seldom-accessed devices in a high-resistance CRS mode helps dampen the effect of the sneak current.
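The destructive read sequence can be summarized with a small behavioral model. This is a sketch only: the class and attribute names are hypothetical, each constituent memristor is reduced to an ON/OFF flag, and the restoring write is assumed to be issued only when the cell was actually switched to the low-resistance configuration.

```python
class CRSCell:
    """Behavioral model of a complementary resistive switch (illustrative)."""

    def __init__(self, bit):
        # Logic 1: bottom memristor ON, top OFF. Logic 0: top ON, bottom OFF.
        self.top_on = (bit == 0)
        self.bottom_on = (bit == 1)
        self.write_count = 0            # tracks endurance-relevant writes

    def is_low_resistance(self):
        # The cell conducts well only when both memristors are ON.
        return self.top_on and self.bottom_on

    def destructive_read(self):
        # Step 1: write pulse that turns a logic-0 cell into the ON/ON
        # configuration and leaves a logic-1 cell unchanged.
        self.write_count += 1
        if self.top_on and not self.bottom_on:
            self.bottom_on = True
        # Step 2: regular low-voltage read of the resulting resistance.
        bit = 0 if self.is_low_resistance() else 1
        # Step 3: restore the high-resistance configuration if it was lost
        # (assumed to be skipped when the cell was not switched).
        if bit == 0:
            self.bottom_on = False
            self.write_count += 1
        return bit


if __name__ == "__main__":
    for stored in (0, 1):
        cell = CRSCell(stored)
        print("stored", stored, "read", cell.destructive_read(),
              "writes used", cell.write_count)
```

Even in this simplified model, every read costs at least one write operation, which is the source of the endurance and energy overheads noted above.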
Reference Adjustment: Another solution to the sneak current problem is to employ a read circuit that can adjust its reference to take into account the effect of the data-dependent sneak current in an ATF crossbar.
One example is the utilization of local reference memristors [27]: a subset of memristors on each word-line are programmed to have a resistance R_REF between R_ON and R_OFF. During the read operation, a V_r pulse is also applied to a reference cell adjacent to the target cell, producing a current between I_ON and I_OFF through the reference cell. However, the current observed at the end of the reference bit-line includes a sneak current similar to the sneak current on the target bit-line. This essentially provides a locally-generated reference current that is adjusted to include the amount of the sneak current in the crossbar, thus reducing the adverse effects of the sneak current. However, this method requires several reference cells per line, as the reference cell should be sufficiently close to the target cell to be affected by a similar, though not identical, amount of sneak current. Moreover, the resistance value of the reference cells needs to be carefully programmed and maintained.
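The benefit of a locally-generated reference can be illustrated numerically. The sketch below compares a fixed reference with a reference bit-line that carries a similar sneak component; the sneak currents are supplied as inputs rather than derived from a crossbar model, and all values and names are hypothetical.

```python
def read_fixed_reference(v_r, r_target, i_sneak_target, i_ref):
    """Fixed I_ref: the sneak current adds directly to the sensed current."""
    i_bitline = v_r / r_target + i_sneak_target
    return 1 if i_bitline > i_ref else 0


def read_local_reference(v_r, r_target, r_ref, i_sneak_target, i_sneak_ref):
    """Local reference cell: both compared currents include a sneak component."""
    i_bitline = v_r / r_target + i_sneak_target
    i_reference = v_r / r_ref + i_sneak_ref
    return 1 if i_bitline > i_reference else 0


if __name__ == "__main__":
    V_R, R_ON, R_OFF, R_REF = 0.5, 1e4, 1e6, 1e5   # hypothetical values
    SNEAK_TARGET, SNEAK_REF = 2.0e-5, 1.9e-5       # similar, not identical
    I_REF_FIXED = 1e-5
    for name, r in (("ON", R_ON), ("OFF", R_OFF)):
        print(name,
              "fixed:", read_fixed_reference(V_R, r, SNEAK_TARGET, I_REF_FIXED),
              "local:", read_local_reference(V_R, r, R_REF,
                                             SNEAK_TARGET, SNEAK_REF))
```

The local reference tolerates the sneak current as long as the mismatch between the two sneak components stays smaller than the margin between the reference cell current and I_ON or I_OFF.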
B. Leakage Current
During the write operation, line-shared devices are partially biased. Such partial voltage introduces a leakage current on the target bit-line, as shown in Fig. 2a. The leakage current is observed even in the case of memristive devices with non-linear I-V characteristics such as the one shown in Fig. 3b, since the partial voltage is usually higher than the typical read threshold voltage V_thr of such devices. Similar to the sneak current, the leakage current is also data-dependent: the larger the number of ON devices on the bit-line, the greater the leakage current. Such leakage currents usually do not interfere with the write operation, since the write operation typically does not involve current sensing.
However, the leakage current becomes problematic during an adaptive write operation. In an adaptive write operation, the length of the write pulse is adjusted for individual devices by terminating the write pulse as soon as the target device completes the switching. This is typically done by monitoring (i.e., sensing) the write current through the target cell, I_target, to detect a sudden jump in the current that indicates the completion of switching. In ATF crossbars, I_target is superimposed on a considerable data-dependent leakage current, I_leak, due to the bit-line-shared devices. A typical sensing circuit cannot detect the switching in such noisy conditions. One approach to address this issue is to avoid on-the-fly sensing during the write operation. Instead, the long write pulse is broken into several shorter write pulses, each of which is followed by a regular leakage-free read operation with low V_r to detect the completion of switching [20]. However, this approach suffers from the power and performance overheads of the additional read operations.
In [28], a leakage-current-filtering mechanism is proposed to address this problem. In this method, each adaptive write operation consists of two stages. In the first stage, the data-dependent leakage current of the bit-line-shared devices is latched. The latched I_leak is then subtracted from the total current observed on the bit-line in the second stage, to obtain the write current contributed only by the target cell, i.e., I_target. This filtered current is then sensed by a typical sensing circuit to detect the switching event. As there exists significant temporal variation among memristive devices in the time needed to complete switching during a write operation [29], this method yields a considerable energy saving in ATF memristive crossbars by enabling adaptive write operation in such crossbars.
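The pulse-and-verify alternative can be sketched as a simple loop. The model below is deliberately abstract: the device is assumed to switch once the accumulated pulse time reaches a hypothetical switching time, times are integers (e.g., nanoseconds) to avoid rounding issues, and each pulse is followed by exactly one verifying read.

```python
def pulses_until_switched(t_switch, t_pulse, max_pulses):
    """Number of short write pulses (and verify reads) needed to switch.

    t_switch: time the device needs under the write voltage (hypothetical)
    t_pulse:  duration of each short write pulse, same integer unit
    Returns None if the device has not switched after max_pulses.
    """
    elapsed = 0
    for n in range(1, max_pulses + 1):
        elapsed += t_pulse              # apply one short write pulse
        if elapsed >= t_switch:         # the low-V_r read confirms switching
            return n
    return None


if __name__ == "__main__":
    n = pulses_until_switched(t_switch=35, t_pulse=10, max_pulses=10)
    print("pulses and verify reads used:", n)
```

Shorter pulses terminate the write closer to the actual switching time but require more verify reads, which is the overhead trade-off mentioned above.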
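A behavioral sketch of the two-stage idea follows. It is not the circuit of [28]: the latched leakage is simply passed in as a number, the bit-line current is a list of hypothetical samples, and switching is detected when the filtered current crosses a hypothetical threshold.

```python
def detect_switching(bitline_samples, latched_leak, i_threshold):
    """Index of the first sample whose filtered current indicates switching.

    bitline_samples: total bit-line current over time during the write pulse
    latched_leak:    I_leak captured in the first stage
    i_threshold:     filtered I_target level taken as 'switching completed'
    Returns None if no sample crosses the threshold.
    """
    for k, i_bitline in enumerate(bitline_samples):
        i_target = i_bitline - latched_leak   # second stage: subtract I_leak
        if i_target >= i_threshold:
            return k                          # terminate the write pulse here
    return None


if __name__ == "__main__":
    I_LEAK = 40e-6                            # hypothetical leakage level
    samples = [41e-6, 41e-6, 90e-6, 90e-6]    # target switches at index 2
    print("with filtering:", detect_switching(samples, I_LEAK, 25e-6))
    print("without filtering:", detect_switching(samples, 0.0, 25e-6))
```

Without the subtraction, the leakage alone exceeds the threshold and the pulse would be terminated before the device has switched.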
C. Write Disturbance
Write disturbance is an undesired coupling effect during the write operation that affects other memristors sharing the same word- and/or bit-line [30, 12]. Due to write disturbance, writing a logic value into one memristor may disturb the resistance of the line-shared memristors that store the opposite logic value. This effect is caused by the notable partial voltage (e.g., V_w/2) across line-shared devices during the write operation. The resistance degradation due to write disturbance could accumulate over several write cycles and may eventually result in corruption or complete inversion of the stored logic value.
A solution proposed in [23] addresses the problem caused by write disturbance. This solution confines the write disturbance effect to word-line-shared devices by applying asymmetric voltages to the word-line and the bit-line (e.g., −V_w/3 on the target bit-line and 2V_w/3 on the target word-line): bit-line-shared devices experience a smaller partial bias (e.g., V_w/3), which has a negligible write-disturbance effect. Then, two regular devices on each word-line are assigned as canary cells, in which undisturbed logic values 1 and 0 are stored initially. The canary cells cannot be accessed through the standard write interface and are meant to keep track of the worst-case, cumulative write disturbance effect for their corresponding logic value on each word-line: while they are affected by write disturbance like the other memristors on the same word-line, they cannot be restored to the strong logic values via standard write operations. During a write operation on word-line W, the resistance values of W's two canary cells are monitored to avoid data corruption: since the canary cells experience the worst disturbance accumulation of all devices on W (as explained above), the validity of the data stored on the other devices on W can be guaranteed as long as the resistance value of each canary cell is in its valid range. Whenever the resistance value of a canary cell reaches a known close-to-corruption threshold, R_TH, a refresh operation is triggered that refreshes all memristive devices on W. Fig. 4 illustrates the write disturbance effect as well as the proposed solution.
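The canary-cell monitoring loop can be illustrated with a toy model. Everything numeric here is an assumption: disturbance is modeled as a fixed multiplicative resistance drift per write, a write of logic 0 is assumed to disturb cells holding logic 1 (and vice versa), and the thresholds standing in for R_TH are arbitrary. Only the control flow (monitor the canaries on each write, refresh the word-line when one reaches its threshold) follows the description above.

```python
class CanaryWordLine:
    """Toy model of one word-line guarded by two canary cells (illustrative)."""

    def __init__(self, r_on=1e4, r_off=1e6, drift=1.05,
                 r_th_on=2e4, r_th_off=5e5):
        self.r_on, self.r_off = r_on, r_off
        self.drift = drift              # hypothetical per-write disturbance factor
        self.r_th_on = r_th_on          # R_TH for the canary storing logic 1
        self.r_th_off = r_th_off        # R_TH for the canary storing logic 0
        self.canary_on = r_on           # canary initially holding a strong 1
        self.canary_off = r_off         # canary initially holding a strong 0
        self.refresh_count = 0

    def refresh(self):
        # Rewrite every device on the word-line, canaries included.
        self.canary_on, self.canary_off = self.r_on, self.r_off
        self.refresh_count += 1

    def write(self, bit):
        """Write one data cell on this word-line; return True if refreshed."""
        if bit == 0:
            self.canary_on *= self.drift    # cells holding 1 drift toward OFF
        else:
            self.canary_off /= self.drift   # cells holding 0 drift toward ON
        if self.canary_on >= self.r_th_on or self.canary_off <= self.r_th_off:
            self.refresh()
            return True
        return False


if __name__ == "__main__":
    line = CanaryWordLine()
    writes = 0
    while not line.write(0):
        writes += 1
    print("refresh triggered after", writes + 1, "writes of logic 0")
```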

IV. SCALABILITY CHALLENGES IN MEMRISTIVE CROSSBARS
ATF memristive crossbars have limited scalability because the number of partially-selected devices increases as the crossbar size grows.
One major issue is due to the collective sneak current in larger-scale ATF crossbars during the read operation. Engineering non-linearity in the I-V characteristics of memristive devices [7] could reduce the sneak current through individual sneak paths; however, the number of sneak paths increases with the size of the crossbar. This in turn results in a large cumulative sneak current in larger-scale ATF crossbars. Such large sneak currents adversely affect the read circuitry's ability to accurately determine the stored logic value.
Moreover, increasing the size of the crossbar escalates the leakage current of partially-selected devices during the write operation. This in turn results in 1) lower energy efficiency due to the higher wasted power in partially-selected devices, and 2) higher current requirements for the memory chip that might saturate the driving CMOS circuitry and result in significant voltage drop [7] .
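A rough estimate shows how quickly these quantities grow with array size. The sketch below counts the partially-selected devices for a single access and bounds the bit-line leakage during a V/2 write, assuming (hypothetically) that every bit-line-shared device is ON and behaves as a linear resistor at V_w/2; device values are illustrative.

```python
def partially_selected_count(rows, cols):
    """Devices sharing the target word-line or bit-line (target excluded)."""
    return (cols - 1) + (rows - 1)


def worst_case_bitline_leakage(rows, v_w, r_on):
    """Upper bound on I_leak on the target bit-line for the V/2 scheme.

    Assumes all (rows - 1) bit-line-shared devices are ON and linear.
    """
    return (rows - 1) * (v_w / 2.0) / r_on


if __name__ == "__main__":
    V_W, R_ON = 2.0, 1e4                # hypothetical device parameters
    for n in (16, 64, 256, 1024):
        print(n, "x", n,
              "partially selected:", partially_selected_count(n, n),
              "worst-case I_leak (A):", worst_case_bitline_leakage(n, V_W, R_ON))
```

Both quantities grow linearly with the number of cross-points per line, which motivates limiting the length of each nanowire segment.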
Architectural solutions have been proposed that prevent the accumulation of sneak and leakage currents by limiting the number of cross-points per nanowire, i.e., the number of partially-selected devices [14, 31]. The general idea is to segment long nanowires in large-scale crossbars into smaller segments with a limited number of cross-points per segment, while maintaining the high memory density through novel architectural and interfacing innovations. One such solution was proposed by Kawahara et al., in which a hierarchical bit-line structure is employed [14], as shown in Fig. 5. This structure breaks a long bit-line into several short segments, only one of which can be accessed at any given time. Each short bit-line segment has only 16 cross-points, effectively limiting the level of leakage and sneak currents. A select-transistor is utilized in the CMOS layer to select the active bit-line segment. The select-transistor does not affect the overall memory density, as it is shared among 16 memristors and is laid out underneath the memristive crossbar. However, this method can support only up to two layers of memristive memory, which limits the potential memory density offered by this architecture.
Another architecture proposed by Likharev et al., called CMOL, is especially designed for many-layer memristive memories [31]. In CMOL, the crossbar nanowires in the memristive layer are broken into smaller segments with few cross-points per segment. Shorter segments are accessed via an area-distributed interface from the underlying CMOS circuitry. The distributed interface consists of two rectangular arrays of pins, called blue and red pins, rotated by an angle α with respect to the direction of the crossbar nanowires. Blue and red pins provide individual access to the horizontal and vertical nanowires, respectively. A memristive device is accessed by selecting one of the blue pins, and one of the red pins that falls in the connectivity domain of the blue pin. The distributed interface, rotated crossbar, and connectivity domain for two CMOL configurations (with different α) are shown in Figs. 6a and b. Several CMOS-CMOL integration efforts have been summarized in [6], and a realization of CMOL can be found in [13].
A major drawback of the CMOL architecture is its sophisticated addressing scheme, as the shape of the connectivity domain depends on the number of cross-points per nanowire segment. Figs. 6a and b show two of the possible shapes for a connectivity domain. In [32], a memory organization and address decoding scheme were proposed that facilitate the implementation of scalable 3D memory systems with a CMOL interface. The proposed solution divides the crossbar and underlying CMOS circuitry into multicells that are square regardless of the shape of the connectivity domain, as shown with the dashed lines in Figs. 6a and b. The main advantage of this solution is that an operation in one multicell can be decoupled from other areas, enabling concurrent read, write logic 1, and write logic 0 operations in memristive crossbars, as illustrated in Fig. 6c. This organization allows the usage of such crossbars either as standalone memories or as memory banks in a multi-bank memory.
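The segmentation idea can be illustrated with a simple address split. The sketch assumes a flat, contiguous mapping of cell positions along a long bit-line onto 16-cross-point segments; this mapping and the function names are assumptions for illustration and do not describe the actual decoder of [14].

```python
SEGMENT_SIZE = 16   # cross-points per short bit-line segment, as stated above


def decode_position(position, segment_size=SEGMENT_SIZE):
    """Split a position along the long bit-line into (segment, offset).

    The segment index would drive the select-transistor; the offset picks
    the word-line within the active segment (hypothetical flat mapping).
    """
    if position < 0:
        raise ValueError("position must be non-negative")
    return divmod(position, segment_size)


def bitline_shared_devices(segment_size=SEGMENT_SIZE):
    """Bit-line-shared devices seen by one access inside an active segment."""
    return segment_size - 1


if __name__ == "__main__":
    segment, offset = decode_position(37)
    print("segment", segment, "offset", offset)
    print("bit-line-shared devices per access:", bitline_shared_devices())
```

Because only one segment is active at a time, the number of bit-line-shared devices per access stays at 15 regardless of how long the full bit-line is.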
V. CONCLUSION
In this paper, we discuss the challenges caused by the elimination of access-transistors in memristive crossbars and describe solutions that address them. The parasitic sneak currents during read operations are reduced using memristive devices with non-linear I-V characteristics. Local reference cells are utilized to generate adjustable reference currents, which provide a robust read operation even in the presence of sneak current. A filtering scheme is proposed to cancel the considerable leakage current during the write operation to enable an adaptive write operation in access-transistor-free crossbars. Finally, the write disturbance is addressed by employing per-word-line canary cells that monitor the level of resistance degradation and trigger a refresh operation when boundary conditions are met.
We also discuss the escalation of these effects in larger-scale ATF crossbars, which limits crossbars' scalability. Innovative architectures, such as CMOL, are presented that offer ATF crossbars with better scalability. A novel memory organization which addresses the inherent addressing complexity of CMOL architecture is also described.
