Abstract-This paper addresses the design challenges of simulating 3-D vertical resistive random access memory (V-RRAM) toward the MB level. Interconnect IR drop and sneak paths are known to be the limiting factors for building large-scale V-RRAM arrays. The previous approach to evaluating the write/read margin of V-RRAM was based on exhaustive SPICE simulations, which prohibit design exploration at the MB level because they consume a huge amount of computation resources. In this paper, a quasi-analytical model is proposed to reduce the simulation time and the required memory usage. Validation against SPICE simulation results shows that the proposed model achieves similar accuracy. Based on the proposed quasi-analytical model, the worst case data pattern of 3-D V-RRAM with array sizes up to 4 MB is analyzed. The results show that it is more efficient to increase the number of stack layers than to expand the horizontal array size to achieve a large subarray size.
especially the word-plane type of cross-point array) is very attractive due to its bit-cost scalability and the smaller number of interconnection lines it requires [6]. Interconnect IR drop (voltage degradation) and sneak paths are known to be the limiting factors for building large-scale cross-point arrays [7] in both 2-D and 3-D architectures, because the write/read margin shrinks as the array size increases.
Compact device models have been developed to capture the dynamic switching characteristics of a single RRAM cell [8], [9]. Quasi-analytical models based on MATLAB have been developed to evaluate the write/read margin for 2-D cross-point arrays [10]-[12]. For 3-D cross-point arrays, previous works [13]-[15] were mostly based on exhaustive SPICE simulations that run on a huge netlist of the full 3-D resistor network. These SPICE simulations require an enormous amount of computation resources, in terms of CPU time and RAM capacity, as the array size increases. For example, simulating a 256 kB 3-D V-RRAM array [13] takes 20 min and 10 GB of RAM on a 3.6 GHz Intel i7 CPU, and the simulation of an array with more than 512 kb typically does not converge.
This paper aims to address the design challenges of simulating 3-D V-RRAM toward the MB level and is organized as follows. Section II introduces the basic parameters of 3-D V-RRAM and the operation scheme. Section III analytically addresses the worst case data pattern dependence in the write and read operations of 3-D V-RRAM, taking into account the interconnect IR drop and sneak paths, and then proposes a quasi-analytical model. Section IV validates the quasi-analytical model against SPICE simulation results and discusses the design challenges for increasing the size of 3-D V-RRAM above the MB level. Section V concludes the paper.
II. 3-D V-RRAM ARCHITECTURE AND OPERATION SCHEMES

A. 3-D V-RRAM Architecture
The schematic of 3-D V-RRAM with m select lines (SLs), n bit lines (BLs), and l word planes (WPs) is illustrated in Fig. 1. The word lines (WLs), BLs, and SLs are used to decode the 3-D array. WLs are connected to one edge of each plane electrode. BLs, located at the bottom of the array, are connected to the pillars. Vertical transistors in series with the pillar electrodes are controlled through the SLs. The RRAM cells (shown in purple) are formed at the sidewall between the pillar electrodes and the plane electrodes. Note that the number of stacked layers l is determined by the feature size (F), the etching aspect ratio (AR), the thickness t_i of the isolation layer, and the thickness t_m of the plane electrode, and can be calculated as l = F × AR/(t_i + t_m). Here, F is defined as the diameter of the pillar electrode (d) plus twice the RRAM oxide thickness (t_ox), which is also the half pitch between the centers of neighboring pillar electrodes. Another constraint in the V-RRAM architecture is the drivability of the vertical transistor at the bottom of the pillar, which decreases as the pillar diameter decreases. Based on the scaling trend shown in [16], we adopt 100 μA as the saturation current for F = 30 nm.
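As a quick illustration of the layer-count relation l = F × AR/(t_i + t_m), the following minimal sketch evaluates it numerically. Only F = 30 nm comes from the text; the aspect ratio and the two thicknesses are hypothetical values chosen for illustration, and rounding down to a whole number of layers is an assumption of this sketch.

```python
import math


def max_stack_layers(feature_nm, aspect_ratio, t_iso_nm, t_plane_nm):
    """Stacked layers l = F * AR / (t_i + t_m), rounded down (assumption)."""
    pillar_height_nm = feature_nm * aspect_ratio  # etched depth F * AR
    layer_pitch_nm = t_iso_nm + t_plane_nm        # one isolation + one plane layer
    return math.floor(pillar_height_nm / layer_pitch_nm)


# F = 30 nm is from the text; AR = 32, t_i = 10 nm, t_m = 5 nm are hypothetical.
layers = max_stack_layers(feature_nm=30, aspect_ratio=32, t_iso_nm=10, t_plane_nm=5)
print(layers)
```

With these illustrative numbers, thickening the plane electrode by 1 nm already costs several layers, which shows how tightly l is tied to the layer pitch.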
B. Write/Read Scheme
According to the random access memory decoding scheme for the 1-b write operation, only one WP, one BL, and one SL are selected at a time to access the selected cell R_sss, and the others are unselected. This means every voltage parameter, i.e., the applied voltage of the word plane (V_i), vertical transistor (S), and bit line (V_BL), can be classified into two groups: the selected voltage and the unselected voltage. In the ideal case, which ignores the resistance of the plane electrode and pillar electrode, we can divide the 3-D V-RRAM cells into six groups, where the cells in the same group share the same two-terminal voltage, as shown in Fig. 2(a). Each group is classified according to the decoding scheme of SLs, BLs, and WPs, as shown in Table I. To help understand the equivalent circuit of each group, we label it with four parts, including the selected WP, unselected WPs, unselected slice pillar parts, and selected slice pillar parts, as shown in Fig. 2(b). Besides the resistance of the selected cell R_sss, which is known, the equivalent resistances of the other five groups (R_sus, R_sxu, R_uxu, R_uus, and R_uss) can be analytically calculated as
where R(i, j, k) denotes the resistance of the RRAM cell at the ith word plane, jth bit line, and kth select line, and (i_0, j_0, k_0) corresponds to the position of the selected element R_sss. To avoid undesired disturbance of unselected cells, the 1/2 bias scheme is typically used for the write operation. In the 1/2 bias scheme, the voltages of the selected WP (V_sel_WP) and BL (V_sel_BL) are biased to the write voltage (V_w) and ground, respectively, while the voltages of the other WPs (V_unsel_WP) and BLs (V_unsel_BL) are biased to V_w/2 to prevent unintentional writes.
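The grouping can be sketched in code. The version below is an illustration under stated assumptions, not the paper's own expression: it assumes the three subscript letters refer to the WP, BL, and SL in that order (s = selected, u = unselected, x = either), and that cells in one group combine in parallel because they share the same two-terminal voltage in the ideal case. The function names are hypothetical.

```python
def parallel(resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def group_resistances(R, selected):
    """R[i][j][k]: cell resistance at word plane i, bit line j, select line k.

    selected = (i0, j0, k0) is the position of the selected cell.
    Returns the ideal-case equivalent resistance of each non-empty group.
    """
    i0, j0, k0 = selected
    groups = {"sss": [], "sus": [], "uss": [], "uus": [], "sxu": [], "uxu": []}
    for i, plane in enumerate(R):
        for j, line in enumerate(plane):
            for k, cell in enumerate(line):
                wp = "s" if i == i0 else "u"
                if k != k0:
                    name = wp + "xu"          # unselected slice: BL is "don't care"
                else:
                    bl = "s" if j == j0 else "u"
                    name = wp + bl + "s"      # cell on the selected slice
                groups[name].append(cell)
    return {name: parallel(cells) for name, cells in groups.items() if cells}


# Hypothetical uniform array: l = 4 planes, n = 3 bit lines, m = 2 select lines.
R = [[[1e5 for _ in range(2)] for _ in range(3)] for _ in range(4)]
eq = group_resistances(R, selected=(0, 0, 0))
```

For a uniform array this gives R_uss = R/(l − 1), which agrees with the expression for R_uss used in the worst case analysis of the read operation.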
For the read operation, we follow the "read-in-a-row" scheme: V_sel_WP is biased to the read voltage (V_r), while V_unsel_WP and the voltages of all the BLs (both V_sel_BL and V_unsel_BL) are grounded. The current-mode sense amplifier is typically used in today's RRAM macro designs because of its lower latency compared with the voltage-mode sense amplifier. In this paper, we assume that a current-mode sense amplifier is designed to sense the readout current from the BL, and the minimum I = 100 nA is used as the read margin criterion for a fast sensing latency within tens of ns [17]. For typical RRAM devices with an average switching voltage of 2 V, a possible switching range of 1.7-2.3 V is assumed. In this paper, the write access threshold is set to 2.5 V to ensure a safe write operation.
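The two bias schemes can be written down compactly. The sketch below lists the applied WP and BL voltages for the 1/2 bias write scheme and the read-in-a-row scheme, and derives the ideal two-terminal voltage of the four cell groups on the selected slice. It assumes zero interconnect resistance and no drop across the vertical transistor; the function names and the example voltages (3 V write, 1 V read) are hypothetical.

```python
def bias_voltages(operation, v_applied):
    """Applied WP/BL voltages for the write (1/2 bias) or read (read-in-a-row) scheme."""
    if operation == "write":
        return {"sel_WP": v_applied, "unsel_WP": v_applied / 2,
                "sel_BL": 0.0, "unsel_BL": v_applied / 2}
    if operation == "read":
        return {"sel_WP": v_applied, "unsel_WP": 0.0,
                "sel_BL": 0.0, "unsel_BL": 0.0}
    raise ValueError("operation must be 'write' or 'read'")


def ideal_cell_voltages(bias):
    """Two-terminal voltage (WP minus BL) per group on the selected slice, ideal case."""
    return {
        "sss": bias["sel_WP"] - bias["sel_BL"],
        "sus": bias["sel_WP"] - bias["unsel_BL"],
        "uss": bias["unsel_WP"] - bias["sel_BL"],
        "uus": bias["unsel_WP"] - bias["unsel_BL"],
    }


write_v = ideal_cell_voltages(bias_voltages("write", 3.0))  # hypothetical V_w = 3 V
read_v = ideal_cell_voltages(bias_voltages("read", 1.0))    # hypothetical V_r = 1 V
```

In the write case only the selected cell sees the full V_w, while half-selected cells see V_w/2. In the read case every cell on the selected WP in the active slice sees V_r, which is what makes it a row read.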
III. QUASI-ANALYTICAL MODEL OF 3-D VERTICAL RRAM
For a realistic circuit analysis of 3-D V-RRAM, the interconnect IR drop effect should be taken into consideration for both write and read operations. It accounts for the voltage degradation in the selected WP away from the edge driver and for the sneak paths from the unselected WPs to the selected BL caused by unequal voltage potentials. Previous works [13], [14] used exhaustive SPICE simulations to run such a complete circuit model, which consumed an enormous amount of computation resources for large array sizes.
A. Complete SPICE Model
Due to the constraints of fabrication technology, the number of stacked layers l is typically much smaller than the number of select lines m and bit lines n. For the pillar part in the unselected slice, the current flowing in each pillar is no larger than the switching current (I_w), which is relatively small, so it is reasonable to neglect the pillar resistance in that part. Note that we do consider the pillar resistance in the selected slice. With this approximation, Fig. 3(b) shows the equivalent circuit of the complete model. Because of the relatively large area of the WP network, the groups R_sxu and R_uxu can no longer each be represented by one resistor, and they are divided into n(m − 1) pillar units in the equivalent circuit. The blue resistors in each word plane, r_sr_i and r_sr_k, represent the piecewise equivalent plane resistance of the shared path to the selected slice, and the red/green resistors r_ur_i and r_ur_k represent the equivalent plane resistance of the independent paths to the pillar units. Finally, r_p denotes the equivalent pillar resistance in the selected slice.
B. Worst Case Data Pattern Analysis
For the write operation, we care about the write voltage V_sc across the selected cell. It can be calculated from three terms, as in (2): the first term is V_sel_WP = V_w (the nominal write voltage), the second term represents the voltage (IR) drop due to the leakage current I_p_j in the unselected slice pillar part from the selected WP to the unselected WPs, and the third term represents the voltage drop caused by the currents of groups R_sss and R_sus across the selected WP. The worst case data pattern maximizes I_p_j, I_sss, and I_sus, which means all the cells in the 3-D V-RRAM should be in the low resistance state (LRS) to maximize the IR drop effect.
For the read operation, we care about the current flowing through the selected BL:
where V_sel_BL = 0, V_sc can be derived from (2) with V_sel_WP = V_r (the nominal read voltage), and the voltage for the active slice V_as can be calculated as
where V_unsel_WP = 0 and r_sr_k = r_sr_i/(l − 1). When we read a selected cell in the high resistance state (HRS) in the worst case, each cell in the group R_uss is assumed to be in LRS to maximize the read current according to (3). So R_uss = R_LRS/(l − 1) and we have
where the worst case data pattern maximizes I_p_j (R_sxu and R_uxu are in LRS) and minimizes I_uus and I_sus (R_uus and R_sus are in HRS) under the longest IR drop path. Similarly, when we read a selected cell in LRS in the worst case, each cell in the group R_uss is assumed to be in HRS to minimize the read current. So R_uss = R_HRS/(l − 1) and we have
where the worst case data pattern maximizes I_p_j, I_uus, and I_sus under the longest IR drop path, which means all the cells in the R_sxu, R_uxu, R_uus, and R_sus parts should be in LRS to maximize the current leakage effect. Table II summarizes the worst case data pattern, which depends on the operation being performed as well as on the state of the selected cell.
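The worst case patterns described in the text can be collected into a small lookup table, which is convenient when setting up a simulation. The sketch below encodes only what the paragraphs above state; the dictionary layout, the key names, and the helper function are hypothetical, and the resistance values in the example are illustrative.

```python
# Worst case state of each cell group, per operation and selected-cell state.
WORST_CASE = {
    "write":    {"sss": "LRS", "sus": "LRS", "sxu": "LRS",
                 "uxu": "LRS", "uus": "LRS", "uss": "LRS"},
    "read_HRS": {"sss": "HRS", "sus": "HRS", "sxu": "LRS",
                 "uxu": "LRS", "uus": "HRS", "uss": "LRS"},
    "read_LRS": {"sss": "LRS", "sus": "LRS", "sxu": "LRS",
                 "uxu": "LRS", "uus": "LRS", "uss": "HRS"},
}


def group_cell_resistance(case, r_lrs, r_hrs):
    """Map each group to the per-cell resistance implied by the worst case pattern."""
    return {group: (r_lrs if state == "LRS" else r_hrs)
            for group, state in WORST_CASE[case].items()}


# Illustrative resistances only.
pattern = group_cell_resistance("read_HRS", r_lrs=500e3, r_hrs=2.5e6)
```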
C. Quasi-Analytical Model
The solution of a circuit model is commonly based on Kirchhoff's laws, which are used to calculate the unknown current in every branch of the circuit or the unknown node voltages by a numerical iteration method. The complexity of solving the circuit model, including the simulation time and memory cost, is therefore determined by the number of coupled equations, which equals the number of junction points. For the complete model above, there are 8mn junction points in each word plane and 8mnl + 2mn junction points in total. SPICE simulation becomes inefficient and struggles to converge as the array size grows, especially in the vertical direction when l is large.
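To make the cost argument concrete, the following toy example solves Kirchhoff's current law by Gauss-Seidel iteration on a one-dimensional ladder: a driven wire with a series resistance between adjacent nodes and one cell to ground at each node. It is a deliberately simplified stand-in, not the 3-D network of the complete model, and all names and values are hypothetical. Each junction point contributes one coupled equation, so the work grows with the node count.

```python
def ladder_voltages(v_drive, r_wire, r_cell, n_nodes, tol=1e-12, max_iter=100000):
    """Node voltages along a driven wire with one grounded cell per node.

    KCL at node t: currents from the left and right wire segments balance the
    current through the cell to ground. Solved by Gauss-Seidel iteration.
    """
    g_wire, g_cell = 1.0 / r_wire, 1.0 / r_cell
    v = [v_drive] * n_nodes                      # initial guess: no IR drop
    for _ in range(max_iter):
        max_change = 0.0
        for t in range(n_nodes):
            left = v_drive if t == 0 else v[t - 1]
            inflow = g_wire * left
            g_total = g_wire + g_cell
            if t < n_nodes - 1:                  # the last node has no right neighbor
                inflow += g_wire * v[t + 1]
                g_total += g_wire
            new_v = inflow / g_total
            max_change = max(max_change, abs(new_v - v[t]))
            v[t] = new_v
        if max_change < tol:
            break
    return v


# Hypothetical values: 3 V driver, 10-ohm segments, 100-kilohm cells, 16 nodes.
voltages = ladder_voltages(3.0, 10.0, 1e5, 16)
```

The voltage falls monotonically away from the driver, so the cell farthest from the driver sees the smallest voltage, which is the one-dimensional analogue of the longest IR drop path.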
With the help of the aforementioned worst case data pattern analysis, we propose a quasi-analytical model that contains only two circuit planes: the selected word plane and the active slice of the array, as shown in Fig. 4. As a result, the total number of junction points is reduced to 8mn + nl.
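The two junction-point counts quoted in the text, 8mnl + 2mn for the complete model and 8mn + nl for the quasi-analytical model, can be compared directly. The sketch below simply evaluates both expressions; the function names are hypothetical.

```python
def junction_points_complete(m, n, l):
    """Total junction points of the complete model: 8mnl + 2mn."""
    return 8 * m * n * l + 2 * m * n


def junction_points_quasi(m, n, l):
    """Total junction points of the quasi-analytical model: 8mn + nl."""
    return 8 * m * n + n * l


for layers in (8, 64):
    full = junction_points_complete(256, 256, layers)
    reduced = junction_points_quasi(256, 256, layers)
    print(layers, full, reduced, round(full / reduced, 1))
```

For a 256 × 256 plane, going from 8 to 64 layers multiplies the complete model's count by almost eight, while the reduced count grows by under 3%, which is consistent with the observation that the simplified model's cost is nearly independent of the number of stack layers.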
The main idea of the proposed model is to replace the complex resistor network of the unselected word planes with two critical parameters: the pillar voltage V_p and the voltage for the active slice in the unselected word planes V_as. For the worst case data patterns in Table II, the RRAM cells in the groups R_sxu and R_uxu are always in LRS. We also notice that, compared with the selected slice part, the current flowing in the unselected slice pillar parts dominates the current leakage for large array sizes (especially when the number of SLs is sufficiently large). The effect of the third term in (2) and (4) is therefore almost negligible. With the help of (1), (2), and (4), we can derive
IV. SIMULATION RESULTS AND DISCUSSION
A. Simulation Parameters
A summary of the parameters used in this simulation is listed in Table III; they are adapted from [13]. The RRAM device I-V nonlinearity, which can be increased by selector devices in a 1-selector-1-resistor (1S1R) configuration or tuned by band structure engineering of heterojunction oxide materials, is defined as the ratio of the current at V_w to that at V_w/2
where R_Vw and R_Vw/2 denote the resistance of the RRAM cell at V_w and at V_w/2, respectively. To enable performance evaluation on a very large array size, the full nonlinear I-V characteristics of the RRAM (or selector) devices (e.g., a Verilog-A model or TCAD) are not incorporated into the SPICE simulation, as doing so would significantly increase the simulation time and the required memory resources. Similar to the previous work [13], we use the nonlinearity as a generic parameter to characterize the current ratio between the selected and unselected cells as an approximation. Here R_ON and R_OFF refer to the resistances at V_w. As an example, the device I-V simulated using a bipolar RRAM compact model [18] and an exponential-type selector [19] with R_ON = 200 kΩ, R_OFF = 20 MΩ, and K_w = 10× is shown in Fig. 5.
Fig. 5. Example of bipolar RRAM I-V characteristics with built-in selector: R_ON = 200 kΩ, R_OFF = 20 MΩ, and nonlinearity = 10×. In the simulation, the nonlinearity is considered to be the same for the ON and OFF states.
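The role of the nonlinearity as a generic parameter can be illustrated numerically. The sketch below assumes that K_w is the ratio of the cell current at V_w to the current at V_w/2, which gives R_Vw/2 = K_w × R_Vw/2 from Ohm's law; the paper's own defining equation may carry a different constant factor, so this is a sketch under that assumption. The function name and the 3 V write voltage are hypothetical.

```python
def half_bias_resistance(r_at_vw, k_w):
    """Cell resistance at V_w/2, assuming K_w = I(V_w) / I(V_w/2).

    I(V_w) = V_w / R_Vw and I(V_w/2) = (V_w/2) / R_half, so
    K_w = 2 * R_half / R_Vw and R_half = K_w * R_Vw / 2.
    """
    return k_w * r_at_vw / 2.0


v_w = 3.0                                   # hypothetical write voltage
r_on = 200e3                                # R_ON at V_w, from the Fig. 5 example
r_on_half = half_bias_resistance(r_on, 10)  # K_w = 10
i_selected = v_w / r_on                     # current of a cell at full bias
i_half_selected = (v_w / 2) / r_on_half     # current of a half-selected cell
```

Under this assumption a half-selected LRS cell behaves like a 1 MΩ resistor, so it draws one tenth of the selected cell's current, which is the suppression of sneak current that a larger K_w buys.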
In the simulation, the resistor network of the selected WP and selected slice is kept the same as in the complete circuit model of previous works [13], [14], and we add a new dc voltage source V_p, which is calculated by (7) and directly connected to each RRAM cell in the selected WP. According to (8), the applied voltage V_as is implemented with a voltage-controlled voltage source.
B. Accuracy Validation
We validate the proposed quasi-analytical model against the complete circuit model proposed in the previous works [13], [14]. We define the error rate as
Error rate = (R_Sim − R_Pre)/R_Pre (10)
where R_Sim and R_Pre denote the simulation results of the simplified model and the previous model, respectively. Fig. 6(a) shows the selected cell's write voltage as a function of array planar size for 4 vertical layers and 8 vertical layers with fixed parameters K_w = 10, R_ON = 500 kΩ, and R_OFF = 2.5 MΩ. Fig. 6(b) shows the error rate of the write margin between the quasi-analytical model and the complete model. The proposed quasi-analytical model is in good agreement with the complete SPICE model for the write operation, exhibiting a maximum error of less than 3% at a 256 × 256 word-plane size. Similarly, Fig. 7(a) shows the read current as a function of array planar size for eight layers with fixed parameters K_w = 10, R_ON = 500 kΩ, and R_OFF = 2.5 MΩ. Fig. 7(b) shows the error rate of the read margin. The proposed quasi-analytical model is in good agreement with the complete SPICE model for the read operation, exhibiting a maximum error of 4% at a 256 × 256 word-plane size with K_w = 100, R_ON = 500 kΩ, and R_OFF = 50 MΩ. Although the error rates in Figs. 6(b) and 7(b) appear to diverge for larger array sizes, the absolute difference between the proposed quasi-analytical model and the SPICE model is still very small (no more than 0.05 V for the write operation and no more than 5 nA for the read operation). When we evaluate the write margin at 2.5 V or the read margin at 100 nA, such an error is still much smaller than the write/read failure criterion. Because the primary goal of the proposed model is to quickly evaluate whether different 3-D V-RRAM array configurations pass the write/read margin, such a small error does not affect the qualitative assessment.
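The error rate in (10) and the pass/fail argument can be expressed in a few lines. In the sketch below, the 2.5 V write criterion is taken from the text, while the two write-voltage values are hypothetical numbers chosen only to show an absolute difference of 0.05 V; the function names are also hypothetical.

```python
def error_rate(r_sim, r_pre):
    """Relative error of the simplified model against the previous (complete) model."""
    return (r_sim - r_pre) / r_pre


def same_verdict(r_sim, r_pre, criterion):
    """True if both models agree on passing or failing the margin criterion."""
    return (r_sim >= criterion) == (r_pre >= criterion)


# Hypothetical write voltages (V) from the two models, 0.05 V apart.
v_pre, v_sim = 2.80, 2.85
print(round(100 * error_rate(v_sim, v_pre), 2), "%")
print(same_verdict(v_sim, v_pre, criterion=2.5))
```

A difference of this size changes the verdict only when the write voltage lies within 0.05 V of the criterion, which is why a relative error of a few percent is acceptable for a qualitative screening of array configurations.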
C. Cost Performance
The above analyses are based on 3-D V-RRAM arrays of no more than 512 kb (256 × 256 × 8 layers). We run HSPICE on a computer with an Intel i7-4790 processor (8 MB cache, 3.6 GHz) and 32 GB of DRAM (DDR3-1600). In our simulations, 512 kb appears to be the maximum 3-D V-RRAM array size that the traditional complete circuit model can handle; a convergence error occurs when the array size exceeds 512 kb.
To validate the cost performance of the proposed quasi-analytical model, we compare the required memory resources (DRAM usage) and the simulation time of the traditional model and the proposed model, as shown in Fig. 8. The simplified quasi-analytical model clearly exhibits much better cost performance than the traditional model.
In addition, we observe that the cost performance of the traditional model is very sensitive to the number of stack layers, while the cost performance of the simplified model is determined almost entirely by the plane size and is nearly independent of the number of stack layers. These characteristics make the proposed simplified model extendable to larger arrays, especially in the vertical direction.
D. Design of MB-Level 3-D V-RRAM Subarray
For high-density memory with array efficiency comparable to 3-D NAND Flash, the design space of R_ON and the nonlinearity ratio K_w is explored, as shown in Fig. 9. It shows that a high R_ON or a high K_w tends to decrease the read current of the selected cell, which results in read failure, while a high I_w or a low K_w tends to increase the write current through the unselected cells and enlarge the interconnect IR drop, which results in write failure. In general, there are two types of bidirectional selectors: Type I, with exponential I-V, and Type II, with threshold I-V [19]. Type I selectors rely on an exponential slope in the I-V curve to turn on the selector, accompanied by an increase of the current by several orders of magnitude. Our model takes the nonlinearity of the RRAM cell into consideration, which can be treated as the effect of a Type I selector. As a result of this exponential I-V property, the read sense margin generally degrades because the read current for LRS is also suppressed at higher K_w. On the other hand, a low R_ON and low K_w [the bottom left corner of Fig. 9(a)] in a large-size array design may lead to too much current leakage from the selected WP to the unselected WPs, also resulting in read failure.
A comparison between Fig. 9(a) and (b) shows that 3-D V-RRAM scales more efficiently in the vertical direction than in the planar direction, as 256 × 256 × 64 still allows some "all-pass" region for the design parameters. Moreover, considering the cost benefits [1] and decoding burden [6], increasing the number of stack layers instead of enlarging the plane size is a more promising way to design a 3-D V-RRAM subarray for a given array size. If R_ON is low, e.g., 50 kΩ or below, it is impossible to meet the write margin requirement even with a large nonlinearity. This is because a significant voltage drop occurs across the transistor due to the insufficient current drivability of the vertical transistor, as illustrated in the write voltage drop chart in Fig. 9(c). In Fig. 9(c), we can also see that the voltage drop along the pillar is very small. Therefore, we also explore the effect of the current drivability of the vertical transistor in Fig. 10. We can boost the current drivability either by increasing the gate voltage or by increasing the channel length. However, caution is needed: increasing the gate voltage may degrade gate dielectric reliability, and increasing the channel length means a larger pillar diameter and thus a loss of integration density.
V. CONCLUSION
A quasi-analytical model is proposed to explore the design space of 3-D V-RRAM. It not only shows good agreement with the complete SPICE model but also greatly reduces the simulation time and the memory resources. By applying the proposed model, we analyzed the worst case data pattern of 3-D V-RRAM with MB-level subarray size. The results show that it is more efficient to increase the number of stack layers than to expand the horizontal array size to achieve a large subarray size.
To further enable larger subarray sizes, cutting the word plane into separated word lines, as suggested in [5], may be helpful. Under the same read/write scheme (e.g., the 1/2 bias scheme), the 3-D array with separate word lines may have better performance than the word-plane type of 3-D array, because in the active plane the separate WLs could help reduce the leakage paths in the pillars from the selected plane to the unselected planes. The trade-off is a more complicated fabrication process for forming the separated word lines.
Adding a selector in series with the RRAM cell is an attractive way to achieve high nonlinearity, as typically employed in 2-D or 3-D horizontal arrays. However, adding a selector is challenging for the 3-D V-RRAM array. The first reason is that the selector will increase the lateral cell size, as it is located at the vertical pillar sidewall, limiting the downscaling of the RRAM cells. The second reason is that the selector on the vertical sidewall may cause a short circuit between all the layers if the conduction of the selector is not filamentary. Thus, finding a thin-layer selector candidate that allows local conduction, or achieving built-in nonlinearity with a self-selecting RRAM cell [20], is critical for developing future 3-D V-RRAM arrays.
