Abstract-This paper describes a new memristor crossbar architecture proposed for use in a high-density cache design. The design consumes less than 10% of the write energy of a simple memristor crossbar. It also offers up to 4 times the bit density of an STT-MRAM system and up to 11 times the bit density of an SRAM architecture. The proposed architecture is analyzed using a detailed SPICE simulation that accounts for the resistance of the wires in the memristor structure. Additionally, the memristor model used in this work has been matched to specific device characterization data to provide accurate energy, area, and timing results.
I. INTRODUCTION
As CMOS devices have shrunk into the nanoscale regime, rising power density in CMOS systems has halted the growth of single-core processor performance. For this reason, CPUs are now based on multicore architectures. Two main factors limit the performance of these architectures. First, there is currently not enough on-chip memory to handle the instruction and data load that a multicore architecture is capable of processing. Second, power consumption limits the number of cores and the amount of on-chip memory, thus limiting performance.
As an alternative to traditional SRAM, Resistive Random Access Memory (RRAM) is a promising solution to the forthcoming memory wall problem in conventional CPUs. These memories work based on different resistive switching mechanisms where a dynamic resistance value determines the memory state of the device. The three main types of RRAM include memristors [1] [2], Phase Change Random Access Memory (PCRAM) [3] , and Spin-Torque Transfer Magnetic Random Access Memory (STT-MRAM) [4] .
In 2008, the first physical realization of the memristor (initially theorized in 1971 [1]) was published [2]. Memristor crossbar arrays have also been proposed [5] as a potential building block for ultra-high-density memory systems. The problem with these high-density crossbar arrays is that their power consumption increases dramatically with the size of the crossbar [6]. This is due to the many alternate current paths that lower the effective resistance of the array. These alternate current paths also make read errors much more likely. To solve this, a one-transistor, one-memristor (1T1M) bit cell, which is commonplace in STT-MRAM architectures [7] [8], can be used. Unfortunately, this lowers the density of the memristor memory system to that of a single-transistor array.
This paper presents a memristor-based memory system that is capable of achieving more than 4 times the density of a typical STT-MRAM array. It also has dramatically lower power consumption than a high-density, transistor-less memristor crossbar. This is achieved by tiling many small memristor arrays into partially isolated resistive grids.
The analysis of this memory design is performed through SPICE simulation. To model the memristors, a previously published device model [9] that reproduces memristor characteristics very accurately is used. Both the wire resistance and the isolating transistors in the array are simulated to provide a more complete crossbar analysis. This work presents a detailed device-level simulation of a novel memristor-based memory architecture and reports results for energy consumption and noise margin within the circuit. An area analysis describing the layout of the memory system is also performed. Very few crossbar simulations [10] [11] account for wire resistance, and those that do use less accurate device models.
This paper is organized as follows: Section II compares existing resistive memory devices and crossbar memory designs. Section III describes the design of the proposed memory architecture, and Section IV discusses the procedure used to analyze the simulated crossbar tiles. Section V presents the results of the crossbar tile analysis, and Section VI concludes the paper.
II. RESISTIVE MEMORY TECHNOLOGY

A. Resistive Memory Devices
Resistive switching devices such as STT-MRAM, PCRAM, and memristors have all been proposed as possible solutions for the development of high-density memory. These different types of resistive memory devices are used in a similar manner, although their properties differ slightly.
Previous research [8] [12] suggests that STT-MRAM is the most promising candidate for future high-density, nonvolatile, resistance-switching memories. However, the R_OFF/R_ON ratio of STT-MRAM is typically only about 2.5 [4] [13], so an STT-MRAM memory system is unlikely to work without an access transistor for each individual memory device. As a result, the maximum areal density of this type of memory system is limited by the size of an access transistor (similar to Fig. 1).
PCRAM is another promising new memory technology. It has the lowest endurance of these devices in terms of switching cycles before failure, and it generally has a longer switching time (50 to 100ns) [14] than memristors and STT-MRAM. For these reasons, most PCRAM-based memory systems are proposed as a replacement for DRAM rather than SRAM. However, PCRAM has the advantage of unipolar switching [3], so diodes can be used to limit unwanted current paths in a higher-density design.
Memristors have several advantages compared with these alternative RRAM technologies. The memristor device selected for the final results in this paper has a relatively fast switching time (10ns) and a very low current draw, with a minimum resistance of 125kΩ [5]. Additionally, the device has a very large R_OFF/R_ON ratio (about 10^6), which is very useful in the proposed design because a small number of unwanted current paths will still be present. A number of other memristor devices [15] [16] were tested for use in the system, but according to our simulations they had either too large a power consumption or too small an R_OFF/R_ON ratio.
B. Crossbar Array Designs
Other studies [5] [6] [17]-[20] have proposed large memristor crossbar arrays as a high-density memory design. The layout and circuit diagram typically proposed for this type of memory are shown in Figs. 2(a) and 2(b). In this design, each nanoscale memory element occupies an area of just 4F^2 [21], where F is the feature size of the fabrication process. In contrast, a typical STT-MRAM array has a transistor for each memory element, which increases the cell size to 40F^2 [42]. The schematic in Fig. 2(b) illustrates a potential problem with high-density memristor crossbars. To read or write memory element m_{1,1}, voltages are applied to wires a_1 and b_1. Because there are no access transistors in this design, nothing prevents current from flowing through other memristor devices. This can cause read errors in a large crossbar: the current sensed at b_1 could flow through a chain of devices in the low resistance state even when the selected memristor is in the high resistance state. The high R_OFF/R_ON ratio of memristor devices helps alleviate the impact of alternate current paths, because it allows a much wider range of voltages to be observed at the output of the crossbar.
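The effect of alternate current paths can be illustrated with a simplified resistor model. The sketch below assumes floating unselected row and column wires, zero wire resistance, and every unselected device in the low resistance state (the worst case for reading a stored 0). Under these assumptions the sneak network reduces by symmetry to three parallel groups in series. R_ON and R_OFF are the model resistances reported in Section IV; the function names are illustrative and are not part of the paper's simulation flow.

```python
import math

R_ON = 124.95e3    # low resistance state of the device model, ohms
R_OFF = 125.79e9   # high resistance state of the device model, ohms


def parallel(r1: float, r2: float) -> float:
    """Resistance of two resistors in parallel (r2 may be infinite)."""
    if math.isinf(r2):
        return r1
    return r1 * r2 / (r1 + r2)


def sneak_resistance(n: int, r_unselected: float) -> float:
    """Equivalent sneak-path resistance seen by cell m_{1,1} of an n x n crossbar.

    Assumes floating unselected lines, ideal wires, and all unselected cells
    at r_unselected. By symmetry the network collapses to three groups in series:
    row a_1 -> other columns (n-1 cells in parallel),
    other columns -> other rows ((n-1)^2 cells in parallel),
    other rows -> column b_1 (n-1 cells in parallel).
    """
    if n < 2:
        return math.inf
    k = n - 1
    return r_unselected / k + r_unselected / (k * k) + r_unselected / k


def read_resistance(n: int, r_selected: float, r_unselected: float) -> float:
    """Resistance between a_1 and b_1: selected cell in parallel with sneak paths."""
    return parallel(r_selected, sneak_resistance(n, r_unselected))


for n in (4, 8, 16):
    r_one = read_resistance(n, R_ON, R_ON)    # selected cell stores a 1
    r_zero = read_resistance(n, R_OFF, R_ON)  # selected cell stores a 0, worst case
    print(f"{n:2d}x{n:<2d}  R(1) = {r_one / 1e3:7.1f} kOhm  "
          f"R(0) = {r_zero / 1e3:7.1f} kOhm  ratio = {r_zero / r_one:.2f}")
```

Even with a 10^6 device ratio, the ratio between the two read cases collapses toward 1 as n grows. The real arrays also include wire resistance and driven lines, which this idealized model ignores.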
Alternate current paths can also lead to unrealistically high energy consumption. Fig. 3 shows preliminary simulation results for how the energy consumption of a high-density crossbar increases with crossbar size. The value plotted is the energy required to write a single bit. The increase is due to the growing number of possible unwanted current paths within a larger crossbar. Because the simulations were performed in SPICE, a 16×16 crossbar containing 256 memristors was the largest system that could be simulated in a reasonable time. If the trend is assumed to be linear, the data can be extrapolated to show that large crossbars quickly reach a level of energy consumption that would be unrealistic for a competitive memory technology. This study assumed a 5Ω wire resistance between memristors, devices modeled after [5], and 10ns, ±7V write and erase pulses.
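The linear extrapolation of the energy-per-bit trend can be sketched as an ordinary least-squares line fit. This is a minimal sketch. It assumes the energy per bit is linear in the chosen size measure (for example the crossbar dimension N). The (size, energy) pairs would be read off Fig. 3 and are not reproduced here. The function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate_energy(sizes, energies, target_size):
    """Predict energy per bit at target_size from simulated (size, energy) points."""
    slope, intercept = fit_line(sizes, energies)
    return slope * target_size + intercept
```

If the true growth is super-linear, a linear fit of this kind understates the energy of large crossbars, so the extrapolation is conservative.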
As a possible solution to this problem, one publication [22] presents a memristor device with current suppression in one direction. This reduces the problem of alternate current paths, although the switching time of this device (100µs) is too long for an on-chip memory application.
In the following sections we use accurate SPICE models of memristors [9] [23] [24] and modeled wire resistance in the crossbars to capture detailed behaviors of alternate current paths. We simulated multiple read and write cycles for memristor crossbars of varying sizes. Our results indicate that crossbars of 16×16 memristors have errors when reading the stored data. Our results also show that alternate current paths lead to an increase in the total current drawn within larger crossbars, thus increasing the power consumption.
It should be noted that other research has proposed using devices with a more non-linear I-V curve [17] to increase the size of the crossbar before errors occur. However, we show that our design is capable of working with linear devices. The proposed crossbar design therefore relaxes the requirements placed on the memristor devices themselves.
III. PROPOSED HYBRID CROSSBAR DESIGN
The proposed hybrid crossbar architecture combines a high-density array with transistor isolation. In this design, transistors isolate small crossbars within a larger array; these smaller memristor arrays are referred to as tiles. Fig. 4 shows a portion of the circuit design for the hybrid memory system.
In this example, 4 memristor tiles are shown, each consisting of 16 memristors arranged in a 4×4 square. The top of the circuit contains a pulse generator block, which sends data to a single row in each tile (through D_R1, D_R2, D_R3, or D_R4) and grounds the remaining rows. A row decoder drives the row select signals (S_1 through S_N) so that only one row of tiles is turned on during a parallel read or write operation. During a read or write, data from the selected row of tiles is processed by the column circuits. This design confines all unwanted current paths to the small 4×4 crossbars while providing twice the bit-cell density of a 1T1M design.
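The row-of-tiles selection can be illustrated with a hypothetical address mapping that is not taken from the paper. In this sketch a word is one memristor row across every tile in the selected row of tiles. The tile row determines which S_k line is asserted, and the local row determines which D_R line is pulsed while the others are grounded. The class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TiledArray:
    """Hypothetical geometry of a tiled crossbar memory."""
    tile_size: int   # n for an n x n tile (4 in the example above)
    tile_rows: int   # number of rows of tiles, one select line S_k each
    tile_cols: int   # number of tiles sharing each select line

    @property
    def word_bits(self) -> int:
        # One access touches one memristor row in every tile of the selected row of tiles.
        return self.tile_size * self.tile_cols

    @property
    def words(self) -> int:
        return self.tile_rows * self.tile_size

    def decode(self, word_address: int) -> tuple[int, int]:
        """Map a word address to (tile_row, local_row).

        tile_row picks the asserted S line; local_row picks which data line
        (D_R1 .. D_Rn, zero-based here) is pulsed while the others are grounded.
        """
        if not 0 <= word_address < self.words:
            raise ValueError("address out of range")
        return divmod(word_address, self.tile_size)


array = TiledArray(tile_size=4, tile_rows=8, tile_cols=16)
print(array.word_bits, array.words, array.decode(5))
```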
A write operation in this design is a two-step process [25] that writes an entire row of memristors in each selected tile (see Fig. 5), where V_w is the write voltage. In the first step, a voltage of V_w/2 is applied to the selected row (the other 3 rows are grounded), a voltage of -V_w/2 is applied to all columns where a 1 (low resistance state) should be written, and a voltage of V_w/2 is applied (through the write enable transistors) to all global column wires where a 0 (high resistance state) should be written. This writes a 1 only to the memristors in the selected row that need to be set to 1. In the second step, a voltage of -V_w/2 is applied to the selected row, and all global column wires are driven as in the first step. This writes a 0 to the rest of the memristors in the row. A parallel read is performed by applying a voltage below the switching threshold to a selected row of memristors and activating the read enable transistors in each column circuit. The analog read voltage across R_S (in Fig. 2) is converted to a binary signal using a comparator and the constant threshold resistance R_T.
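The voltage arithmetic of the two-step write can be checked with a short sketch. It assumes the cell voltage is the row voltage minus the column voltage and that unselected rows in the tile are held at 0V. It also assumes V_w = 7V, the write amplitude used in Section IV. The function name is illustrative. Only cells in the selected row see the full ±V_w; cells in the grounded rows see ±V_w/2.

```python
def write_step_voltages(data, v_w, step):
    """Cell voltages (row minus column) during one step of the two-step write.

    data: target bits for the selected row (1 = low resistance state).
    Returns (selected_row, unselected_row) lists of cell voltages.
    """
    if step not in (1, 2):
        raise ValueError("step must be 1 or 2")
    # Selected row: +V_w/2 in step 1 (sets 1s), -V_w/2 in step 2 (resets 0s).
    v_row = v_w / 2 if step == 1 else -v_w / 2
    # Column drive is identical in both steps: -V_w/2 for a 1, +V_w/2 for a 0.
    v_cols = [-v_w / 2 if bit else v_w / 2 for bit in data]
    selected = [v_row - v_c for v_c in v_cols]
    unselected = [0.0 - v_c for v_c in v_cols]  # grounded rows in the same tile
    return selected, unselected


bits = [1, 0, 1, 1]
for step in (1, 2):
    sel, unsel = write_step_voltages(bits, v_w=7.0, step=step)
    print(f"step {step}: selected row {sel}, grounded rows {unsel}")
```

The scheme therefore relies on the device not switching at V_w/2 (3.5V here). This is an assumption about the switching threshold that the device model must satisfy.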
[Figure panel: Data After Complete Write Cycle]
IV. HYBRID CROSSBAR ANALYSIS
A. Memristor Device Model
To perform a device-level analysis of a memristor crossbar memory system, a SPICE equivalent of the memristor model first proposed in [9] was used. This model was fit to the characterization data of one of the memristor devices published in [5] (see Fig. 6). This device was chosen for the proposed memory design because it has a large R_OFF/R_ON ratio (10^6) while retaining a relatively short switching time (about 10ns). It also has a large on-state resistance of about 125kΩ (determined by the 8µA current drawn during a 1V read pulse).
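The on-state resistance quoted above follows from Ohm's law, and the R_OFF/R_ON ratio then implies the off-state resistance. The short check below uses only the 1V read voltage, the 8µA on-state current, and a ratio of 10^6.

```python
V_READ = 1.0        # read pulse amplitude, volts
I_ON = 8e-6         # on-state read current, amps
ON_OFF_RATIO = 1e6  # approximate R_OFF / R_ON of the device

r_on = V_READ / I_ON            # Ohm's law: about 125 kOhm
r_off = r_on * ON_OFF_RATIO     # implied off-state resistance: about 125 GOhm
i_off = V_READ / r_off          # off-state read current at the same voltage

print(f"R_ON  = {r_on / 1e3:.0f} kOhm")
print(f"R_OFF = {r_off / 1e9:.0f} GOhm")
print(f"I_OFF = {i_off * 1e12:.0f} pA")
```

The resulting estimate of roughly 125 GΩ is consistent with the 125.79×10^9 Ω maximum resistance of the fitted model.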
The simulation result in Fig. 6 shows the minimum and maximum resistances of the model to be 124.95kΩ and 125.79×10^9 Ω respectively, which agrees closely with the characterization data [5]. Applying a +7V pulse switches the device into the low resistance state, and applying a -7V pulse drives the model into the high resistance state. This close agreement indicates that the model reproduces the measured device behavior, which supports the accuracy of the crossbar simulations. The following parameter values were used in the model [9]
B. SPICE Circuit Simulation
To determine the maximum noise margin and the energy consumption of both the 4×4 and 8×8 tiles, a long sequence of read and write signals was applied to a crossbar simulation. Crossbar circuit operation varies with the resistance values of the memristors within the crossbar, so a large number of randomized signals was applied to obtain an average result. This approximates actual memory operation more closely than testing a few worst-case scenarios.
To complete this task, the signals were generated in MATLAB and then exported in a file that could be interpreted in LTSpice (see Fig. 7 ). The signals generated included switching signals for the transistors to select the correct memristors, as well as the data signals that contained the read and write pulses. The row of memristors that was to be written was chosen at random by the MATLAB script. The data that was written in the memristors was also randomized by choosing to apply either a +7 or -7V pulse signifying a write or erase operation respectively. These voltages were chosen to match the switching characteristics in [5] . After each write operation, every row within the crossbar was read and the read output voltages were recorded. Upon completing the simulation the read output voltage signals were compared to the write data to determine if any read errors had occurred.
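The paper generated its stimulus in MATLAB. The sketch below shows the same idea in Python as an illustration under stated assumptions, not the authors' script. For each write it draws a random target row and a random ±7V pulse. It emits the data waveform as whitespace-separated time/voltage pairs, the form an LTspice PWL source can read from a file. The pulse width, edge time, and gap are placeholder values. The row select signals are returned as a schedule rather than as separate waveforms.

```python
import random


def random_write_pwl(n_rows, n_writes, v_write=7.0, t_pulse=10e-9,
                     t_edge=1e-9, t_gap=20e-9, seed=0):
    """Build a randomized write sequence for one data line.

    Returns (pwl_text, schedule): pwl_text holds one "time voltage" pair per
    line; schedule lists (row, voltage) for each write so the matching row
    select signal can be generated and the written data checked later.
    Timing defaults are placeholders, not values from the paper.
    """
    rng = random.Random(seed)
    points = [(0.0, 0.0)]
    schedule = []
    t = 0.0
    for _ in range(n_writes):
        row = rng.randrange(n_rows)
        volts = rng.choice((v_write, -v_write))  # +V: write, -V: erase
        schedule.append((row, volts))
        t += t_gap                                # idle time before the pulse
        points.append((t, 0.0))
        points.append((t + t_edge, volts))        # rising (or falling) edge
        points.append((t + t_edge + t_pulse, volts))
        t += 2 * t_edge + t_pulse
        points.append((t, 0.0))                   # return to 0V
    pwl_text = "\n".join(f"{time:.12g} {value:g}" for time, value in points)
    return pwl_text, schedule


text, schedule = random_write_pwl(n_rows=4, n_writes=5)
print(text)
```

Keeping the schedule alongside the waveform makes the final comparison step straightforward: each recorded read voltage can be checked against the last voltage written to that row.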
If no read errors were present, the noise margin was calculated. The noise margin is defined as the voltage range between the largest output voltage that represents a 0 (V_0H) and the smallest output voltage that represents a 1 (V_1L), as shown in Fig. 8. This experiment was repeated for different values of R_S until the noise margin was maximized. To determine the write energy, a similar experiment was completed in SPICE, with the write sequences applied and no read cycles. The power dissipated in each device in the circuit was integrated over the simulation time to determine the total energy consumption in the crossbar. This total was then divided by the number of writes to determine the average write energy per bit.
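For a simple series divider in which the selected cell path feeds R_S, the noise margin can be written in closed form and maximized over R_S. The sketch assumes that divider model and a fixed read voltage. It also assumes worst-case effective resistances: the largest seen for a stored 1 and the smallest seen for a stored 0, both including alternate current paths. These inputs are hypothetical and would come from simulation. For this model the margin peaks at R_S = sqrt(R1_max · R0_min), and a sweep like the one in the paper should approach it.

```python
import math


def sense_voltage(v_read, r_sense, r_path):
    """Voltage across R_S when the cell path and R_S form a series divider."""
    return v_read * r_sense / (r_sense + r_path)


def noise_margin(v_read, r_sense, r1_max, r0_min):
    """V_1L - V_0H for worst-case effective path resistances.

    r1_max: largest effective resistance seen when a 1 is stored
    r0_min: smallest effective resistance seen when a 0 is stored
    """
    v_1l = sense_voltage(v_read, r_sense, r1_max)  # lowest output for a 1
    v_0h = sense_voltage(v_read, r_sense, r0_min)  # highest output for a 0
    return v_1l - v_0h


def best_sense_resistance(v_read, r1_max, r0_min, candidates):
    """Sweep candidate R_S values and keep the one with the largest margin."""
    return max(candidates, key=lambda r: noise_margin(v_read, r, r1_max, r0_min))


def optimal_sense_resistance(r1_max, r0_min):
    """Closed-form optimum of the divider model: the geometric mean."""
    return math.sqrt(r1_max * r0_min)
```

The closed form follows from setting the derivative of the margin with respect to R_S to zero. In a real tile, r1_max and r0_min themselves shift with the data pattern, which is why the paper determines R_S from randomized simulations.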
To determine the read energy, half of the memristors in the SPICE circuit were initialized to their high state and the other half to the low state. Each row in the memristor crossbar was read, and the total power in the circuit was integrated and then divided by the total number of memristors in the circuit to determine the average read energy of a single bit.
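The energy calculations described above amount to integrating each element's instantaneous power over the simulation and normalizing by the number of bits. This minimal sketch assumes the power waveforms (V·I for each memristor, transistor, and resistor) have been exported from the simulator on a common time grid. The trapezoidal rule stands in for whatever integration the simulator itself performs, and the function names are illustrative.

```python
def trapezoid(ts, ps):
    """Integrate sampled power p(t) over time with the trapezoidal rule (joules)."""
    if len(ts) != len(ps) or len(ts) < 2:
        raise ValueError("need matching time and power samples")
    return sum((t1 - t0) * (p0 + p1) / 2
               for t0, t1, p0, p1 in zip(ts, ts[1:], ps, ps[1:]))


def energy_per_bit(ts, device_powers, n_bits):
    """Total energy over all circuit elements divided by the number of bits.

    device_powers: one list of power samples (watts) per circuit element.
    n_bits: number of writes (write energy) or memristors read (read energy).
    """
    total = sum(trapezoid(ts, ps) for ps in device_powers)
    return total / n_bits
```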
V. CROSSBAR TILE RESULTS
A. Memory Analysis
A large number of simulations were completed to determine the sense resistance and write voltage that maximize the noise margin (see Table 1) for each crossbar tile design (4×4 and 8×8). Comparing the two tile designs, Table 1 shows that the 8×8 crossbar provides twice the bit density. However, it has a lower noise margin and consumes more energy because of its alternate current paths. Table 2 compares the proposed memory array designs with other existing memory architectures. The bit density values were calculated assuming 45nm technology. The 4×4 and 8×8 tiled systems consume less than 5% and 10%, respectively, of the write energy consumed in the unconstrained 1kB crossbar. Furthermore, the read energy in the tiled systems is less than 1/3 of the read energy in the unconstrained crossbar.
The proposed architecture has a higher density than SRAM, although it has a longer write time. As noted in Table 2, while it takes 10ns to write a single bit in the memristor designs, it takes 20ns to write an entire row because of the two-step write process. SRAM has a large leakage energy [26] that is not present in the memristor or STT-MRAM based systems. This is because these devices do not require power to retain their memory state, and the transistors in the crossbar tiles are only active when a read or write pulse is present.
The SRAM leakage current was assumed to be 29μA/Mb at a 1V operating voltage [26]. To obtain the leakage energy of a single-bit access, an average of 1000 accesses per bit per second was assumed. Note that leakage energy is consumed by all memory cells within an SRAM array even when they are not being accessed, which leads to larger total energy consumption. The read energy of the STT-MRAM system was found to be larger than that of the memristor systems, because STT-MRAM cells are assumed to have a lower resistance range and thus draw a larger current during a read.
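The per-access leakage figure follows from the stated assumptions with simple arithmetic. One assumption is added here: a megabit is taken as 2^20 bits; substituting 10^6 changes the result by about 5%.

```python
I_LEAK_PER_MB = 29e-6   # SRAM leakage current, amps per megabit [26]
V_DD = 1.0              # operating voltage, volts
BITS_PER_MB = 2 ** 20   # assumption: binary megabit (use 1e6 for a decimal megabit)
ACCESSES_PER_SECOND = 1000

p_leak_per_bit = I_LEAK_PER_MB * V_DD / BITS_PER_MB       # watts per bit
e_leak_per_access = p_leak_per_bit / ACCESSES_PER_SECOND  # joules per access
print(f"{p_leak_per_bit:.3e} W per bit, {e_leak_per_access * 1e15:.1f} fJ per access")
```

With the binary definition this gives roughly 28 fJ per access (29 fJ for a decimal megabit). This energy is charged to every stored bit whether or not it is accessed.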
B. Area Analysis
Each tile has transistors along two edges: one for each row access wire and one for each column access wire. Thus, a 4×4 tile requires 8 transistors to drive the 16 memristors within it (an average of 2 memristors per transistor), and an 8×8 tile requires 16 transistors to drive its 64 memristors (an average of 4 memristors per transistor). This allows the memristive memories to store multiple bits per transistor.
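The transistor count per tile generalizes directly. An n×n tile with one access transistor per row wire and per column wire uses 2n transistors for n^2 memristors, or n/2 memristors per transistor. The sketch below assumes, as the area analysis does, that transistor area dominates, so n/2 is also the density gain over a 1T1R cell. The function name is illustrative.

```python
def tile_stats(n):
    """Return (transistors, memristors_per_transistor) for an n x n tile.

    Assumes one access transistor per row wire and one per column wire.
    """
    if n < 1:
        raise ValueError("tile size must be positive")
    transistors = 2 * n
    memristors = n * n
    return transistors, memristors / transistors


for n in (4, 8, 16):
    transistors, per_transistor = tile_stats(n)
    print(f"{n}x{n} tile: {transistors} transistors, "
          f"{per_transistor:g} memristors per transistor")
```

The 16×16 case shows the density a larger tile would offer. According to the simulations reported in this paper, however, such a tile did not yield a usable noise margin with linear devices.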
In these designs, the size of the transistors is still the limiting factor in memory density. However, the 4×4 and 8×8 tile systems increase density over a 1T1R system by factors of 2 and 4, respectively. The layouts for the 4×4 and 8×8 tiles are shown in Fig. 9, where the red squares represent memristors. To keep the density of the memristor tiles high, the access transistors must be made as small as possible. Using memristors with very low R_ON values in the proposed memory system would increase the drain current in the access transistors and require very large transistors. Using a memristor device with a high R_ON value, such as the one modeled in Fig. 6 (R_ON ≈ 125kΩ), helps minimize the worst-case current requirement of the crossbar. According to the crossbar simulations described in Section IV, the absolute maximum current spike through a transistor was about 175µA in the 4×4 tile and about 225µA in the 8×8 tile.
Equation (1) is used to determine the minimum possible width of the transistors based on the drain currents required for the crossbars. A similar method for determining transistor size was presented in [25] . 
Using a value of 225µA for I_D, Equation (1) was solved for W to determine the minimum transistor size. A transistor area of 50F^2 was found to be required for this system. This result is also consistent with the general trend for transistor sizing in STT-MRAM [27] [28], where a 40F^2 device is required to drive a 196µA current [27].
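Equation (1) is not reproduced in this version of the text. As an illustration only, the sketch below assumes the long-channel square-law saturation model I_D = (1/2)·µC_ox·(W/L)·(V_GS - V_th)^2 and solves it for W. Whether this matches Equation (1) is an assumption, and all process parameters are left as inputs rather than 45nm values.

```python
def min_width_square_law(i_d, mu_cox, length, v_gs, v_th):
    """Solve I_D = 0.5 * mu_cox * (W / L) * (V_GS - V_th)**2 for W.

    i_d in amps, mu_cox in A/V^2, length in meters, voltages in volts.
    The long-channel model is an assumption; short-channel devices deviate
    from it, so the result is only a rough first estimate.
    """
    v_ov = v_gs - v_th
    if v_ov <= 0:
        raise ValueError("V_GS must exceed V_th for the device to conduct")
    return 2.0 * i_d * length / (mu_cox * v_ov ** 2)


def saturation_current(width, mu_cox, length, v_gs, v_th):
    """Forward form of the same model, useful for checking a chosen width."""
    return 0.5 * mu_cox * (width / length) * (v_gs - v_th) ** 2
```

Plugging the returned width back into saturation_current recovers the target drain current, which is a quick consistency check when trying different process parameters.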
VI. CONCLUSION
A new hybrid memory system has been proposed that can provide up to 5.2 times the memory density of an STT-MRAM system. The proposed system has significantly lower energy consumption than a large, high-density memristor crossbar. A detailed analysis of the design, including wire resistance and accurate device modeling, was performed in SPICE. Future work includes further study of whether larger tiles would benefit the system. Our existing simulations show that a 16×16 tile does not produce a significant noise margin when devices with a linear I-V curve are modeled. However, it may be possible to correct this by modeling non-linear memristor devices. If larger tiles are used, this system may approach the bit density of a transistor-less crossbar.
