order to keep pace with Moore's law doubling the number of transistors every 18 months [1] [2] [3] [4]. Among them, 3-D cross point phase change memory (PCM) promises to replace NAND Flash and fundamentally change the memory-storage hierarchy because of its combination of high performance, high density and nonvolatility [5] [6] [7] [8]. However, chip performance is now limited by parasitic elements, sneak currents and vertical integration; it can be improved by optimizing the bias scheme [9] [10] [11] [12].
3-D cross point memory uses stacked cross point arrays in which metal lines cross orthogonally one over another, and a memory cell is formed at each intersection [13] [14] [15] [16]. Three common read schemes for a planar crossbar or cross point array (the ground, V/2, and V/3 schemes) have been extensively studied [17] [18] [19] [20] [21]. In a 3-D cross point array architecture, the ground scheme selects the memory cells in both the upper and lower layers on the selected word line (WL). Partial bias schemes are therefore favorable, as they support a single-bit read in a subarray [22] [23] [24]. The V/3 bias scheme has much higher power consumption than the V/2 bias scheme [16], [25] [26] [27] [28], which may not be acceptable in 3-D memory. Selectors capable of bidirectional operation, such as the ovonic threshold switch (OTS) and mixed-ionic-electronic-conduction (MIEC) selectors, can be used in 3-D cross point PCM [29] [30] [31] [32] [33]. The V/2 bias scheme is favorable in this memory because it supports a single-bit read, has low sneak currents and is compatible with bidirectional selectors [16], [34].
Previous studies on bias schemes use static analyses to assess array performance (power consumption, writing voltage margin and sensing margin) with device parameters and array design details. However, dynamic simulations for assessing chip performance (dynamic power consumption, read access time and read errors) with device parameters, bias schemes, arrays and circuits are rarely reported.
Lei et al. [34] proposed a single-reference parasitic-matching (SRPM) sensing circuit for 3-D cross point PCM that takes into account the factors affecting the read operation. Unlike conventional read reference currents [15], [24], its reference current is not constant but follows a curve similar to that of the read current.
The remainder of this paper is organized as follows. Section II introduces the array architecture, conventional bias scheme and the array core circuit. Section III analyzes limitations of chip performance and the role of bias scheme. Section IV introduces the proposed 2V/3 bias scheme. Section V presents evaluation results and discussions. Finally, Section VI concludes the paper.
II. BUILDING 3-D CROSS POINT PCM
I cell is the cell read current. I read1 is the source current of NM A1. I charge is the charge current of the half-selected cell. I sneak is the cell sneak current. V C is the voltage of the memory cell.
In the read operation, cells in the subarray can be divided into four groups. Each group is classified according to the decoding scheme of the WLs and BLs, as shown in Table I.
Each BL operates two layers of memory cells. The array core circuit is designed accordingly, as shown in Fig. 3. The circuit comprises a deselect BL voltage source (DESBL) and a deselect WL voltage source (DESWL). The voltages supplied by the DESBL and DESWL voltage sources are set according to the bias scheme. The BL is connected to the programming bit line (PBL) and the read bit line (RBL) through the programming transmission gate (PTG) and the read transmission gate (RTG), respectively. The EN signal controls read and write operations.
III. CONVENTIONAL BIAS SCHEME AND ANALYSIS
The 64Mbit 3-D cross point PCM has two stacked layers of memory cells, two layers of WLs and one layer of BLs. Each memory layer is divided into 4 banks. Each bank comprises 8 tiles of sub-banks (1024 rows × 1024 columns). In the array, n=1024, a=1024, m=16, c=16, b=8. Every 512Kbit (1024 rows × 1 column × 2 layers × 16 LTGs × 16 GTGs) shares an SA. Periphery circuits are designed in the SMIC 40nm CMOS process. The supply voltage is 2.5V. The chip utilizes the V/2 bias scheme and the SRPM sensing circuit [34].
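The capacity figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch; the constant and function names are illustrative, and the number of SAs it derives is an inference that assumes every bit is covered by exactly one SA, which the text does not state.

```python
# Array organization as stated in the text; names are illustrative only.
LAYERS = 2            # stacked layers of memory cells
BANKS_PER_LAYER = 4   # banks in each memory layer
TILES_PER_BANK = 8    # tiles of sub-banks in each bank
ROWS = 1024           # rows per tile
COLS = 1024           # columns per tile
LTGS = 16             # LTGs sharing one SA
GTGS = 16             # GTGs sharing one SA


def total_bits():
    """Total number of cells in the chip."""
    return LAYERS * BANKS_PER_LAYER * TILES_PER_BANK * ROWS * COLS


def bits_per_sa():
    """Cells sharing one SA: 1024 rows x 1 column x 2 layers x 16 x 16."""
    return ROWS * 1 * LAYERS * LTGS * GTGS


def implied_sa_count():
    """Assumption: each bit belongs to exactly one SA group."""
    return total_bits() // bits_per_sa()


print(total_bits() // 2**20, "Mbit in total")
print(bits_per_sa() // 2**10, "Kbit per SA")
print(implied_sa_count(), "SAs implied")
```

The products reproduce the 64Mbit and 512Kbit figures quoted in the text.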
Parameters used in the simulation are listed in Table II, based on [16], [29], [30], [35] [36] [37] [38]. R PCM is the PCM resistance. Compared to the previous research [34], the switching speed of the selector and the line resistance are also considered. The selector compact model in this paper is also based on [29], [30], [35], [36], [38]. The model of PCM is based on [39].
In the read operation, the SA has to charge parasitic capacitors in the array, which enlarges the read current and delays the read operation. In planar PCM, most parasitic elements are planar. In 3-D PCM, however, vertical parasitic elements also delay the read operation. In the SRPM sensing scheme, the factors that affect the read operation in 3-D cross point PCM are all considered, and I refnew shares a similar curve with I cell. However, a high peak I read still exists in the SRPM scheme. At the beginning of the read operation, I read rises to its maximum, then drops exponentially [9], [37]. The decay of I read from its maximum value to its stable value takes time. A high peak I read therefore increases both the read access time and the dynamic power consumption.
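The link between a high peak I read and a longer read access time can be illustrated with a first-order decay model. The following is a minimal sketch, assuming a single time constant tau; the function names and all numerical values are hypothetical and are not taken from Table II or from the simulated chip.

```python
import math


def read_current(t, i_peak, i_stable, tau):
    """First-order model: I(t) = I_stable + (I_peak - I_stable) * exp(-t / tau)."""
    return i_stable + (i_peak - i_stable) * math.exp(-t / tau)


def settling_time(i_peak, i_stable, tau, tol):
    """Time after the peak at which I(t) comes within `tol` of I_stable."""
    excess = i_peak - i_stable
    if excess <= tol:
        return 0.0
    return tau * math.log(excess / tol)


# Hypothetical values chosen only to show the trend.
TAU = 5e-9        # decay time constant in seconds (assumed)
I_STABLE = 5e-6   # stable read current in amperes (assumed)
TOL = 0.1e-6      # settling tolerance in amperes (assumed)

t_high_peak = settling_time(300e-6, I_STABLE, TAU, TOL)
t_low_peak = settling_time(150e-6, I_STABLE, TAU, TOL)
print(f"high peak: {t_high_peak * 1e9:.2f} ns, low peak: {t_low_peak * 1e9:.2f} ns")
```

In this model the settling time grows with the logarithm of the excess current, so lowering the peak shortens the wait before the current can be sensed reliably, even when tau is unchanged.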
Three factors contribute to a high peak I read for 3-D cross point memory:
1) Parasitic capacitors in the array [34] . As the capacity of memory becomes larger, the number of parasitic capacitors in [29] ) to conduct a read operation. But V sth of emerging selectors is larger than that of diodes and V ds of transistors [29] , [30] , [35] , [36] .
3) High voltage variation of half-selected cells on the BL. It is not V C itself but the variation of the cell voltage that contributes to a high peak read current. The voltage variations of cells in the V/2 bias scheme are shown in Fig. 2. The voltage variation of C SS between the standby and read phases is V. As there is only one selected cell, its influence on the high peak I read is small. The voltage variations of C US and C SU are both V/2. Because their I charge flows in opposite directions, C US has a much bigger impact on the high peak I read than C SU [22], [34]. The voltage variation of C UU is 0. Overall, the high voltage variation of C US contributes the most among the four cell groups.
These three factors all lead to an extra charging process in the read operation. The first two factors can be optimized through circuit and device design. The last one can be optimized through bias scheme design.
IV. PROPOSED 2V/3 BIAS SCHEME
The main aim of the proposed read bias scheme is to reduce the high voltage variation of C US without severely sacrificing other aspects of chip performance. The proposed 2V/3 bias scheme is illustrated in Fig. 6. The selected BL and WL are biased to V and 0, respectively. The unselected BLs and WLs are biased to 2V/3. The voltage variations of C SS and C UU are still V and 0, respectively. The voltage variations of C US and C SU are V/3 and 2V/3, respectively.
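The voltage variations quoted for the V/2 and 2V/3 schemes follow directly from the line biases. The following is a minimal sketch, assuming that in standby every line sits at the deselect voltage so that each cell starts from 0 V; this standby condition is an assumption that reproduces the values in the text, and the function and key names are illustrative.

```python
from fractions import Fraction


def voltage_variations(deselect):
    """Return the cell-voltage variation of each cell group, in units of V.

    `deselect` is the bias of unselected BLs and WLs as a fraction of V.
    Assumption (not stated in the text): in standby all lines are held at
    the deselect voltage, so every cell voltage starts at 0.
    """
    deselect = Fraction(deselect)
    selected_bl = Fraction(1)  # selected BL is biased to V
    selected_wl = Fraction(0)  # selected WL is biased to 0
    read_voltage = {
        "C_SS": selected_bl - selected_wl,  # selected BL, selected WL
        "C_US": selected_bl - deselect,     # selected BL, unselected WL
        "C_SU": deselect - selected_wl,     # unselected BL, selected WL
        "C_UU": deselect - deselect,        # unselected BL, unselected WL
    }
    standby_voltage = Fraction(0)
    return {group: abs(v - standby_voltage) for group, v in read_voltage.items()}


v_half = voltage_variations(Fraction(1, 2))
v_two_thirds = voltage_variations(Fraction(2, 3))
for group in v_half:
    print(group, "V/2 scheme:", v_half[group], " 2V/3 scheme:", v_two_thirds[group])
```

Under this assumption the variation of C US is (1 - k)V and that of C SU is kV for a deselect voltage kV, so raising k above 1/2 shifts the variation away from the cells on the selected BL.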
For C US, I charge comes from the selected BL, which weakens I read. Compared to the V/2 scheme, the voltage variation of C US in the 2V/3 scheme decreases from V/2 to V/3. For C SU, I charge comes from the unselected BL, which does not affect the transient value of I read directly. Despite the increase in its voltage variation, I charge of C SU still has little impact on the high peak I read.
The proposed scheme also changes the amount of I sneak for the cell groups. I sneak of C US decreases, which helps to reduce read errors. Meanwhile, I sneak of C SU increases. As it flows from the unselected BL to the selected BL, I sneak of C SU does not affect the stable value of I read directly and has little impact on it.
Compared to the conventional bias scheme, the proposed scheme lowers the maximum value of I read (and hence the dynamic power consumption) and reduces read errors. As a result of the reduced peak value of I read, a shorter read access time is obtained. In addition, C UU has no sneak current, which avoids full-array current sneaking. Therefore, the 2V/3 bias scheme has much lower power consumption than the V/3 bias scheme.
In the proposed scheme, 2V/3 should be lower than V sth to keep the sneak current of two-thirds-selected cells low. Moreover, the DESBL and DESWL voltages can take values other than 2V/3, for example 3V/5 or 4V/5, as long as they are higher than V/2 and lower than V sth. All these bias schemes have advantages similar to those of the 2V/3 bias scheme.
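The admissible window for the deselect voltage can be expressed as a simple check. The following is a minimal sketch in which voltages are normalized to V; the ratio K_STH = V sth / V used here is a hypothetical value chosen for illustration, not a parameter from Table II, and the function names are illustrative.

```python
from fractions import Fraction


def is_valid_deselect(k, k_sth):
    """True if the deselect voltage k*V lies strictly between V/2 and V_sth.

    Both arguments are fractions of the read voltage V.
    """
    return Fraction(1, 2) < Fraction(k) < Fraction(k_sth)


def c_us_variation(k):
    """Voltage variation of C US in units of V, assuming cells start at 0 V."""
    return 1 - Fraction(k)


K_STH = Fraction(9, 10)  # hypothetical V_sth / V ratio (assumption)
candidates = [Fraction(1, 2), Fraction(3, 5), Fraction(2, 3),
              Fraction(4, 5), Fraction(19, 20)]
valid = [k for k in candidates if is_valid_deselect(k, K_STH)]
for k in valid:
    print(f"deselect = {k} V -> C_US variation = {c_us_variation(k)} V")
```

With this hypothetical threshold, 3V/5, 2V/3 and 4V/5 are accepted, while V/2 (the conventional scheme) and a value above V sth are rejected.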
V. PERFORMANCE EVALUATION
We then utilize the 2V/3 bias scheme and the SRPM sensing circuit in a 64Mbit 3-D cross point PCM. Simulation parameters of the 2V/3 bias scheme are listed in Table II. Fig. 7 and Fig. 8 show simulation results of the proposed chip when RTG L <0>, RTG G <0>, BL<0> and WL 1 <0> are selected. The peak values of I read and I ref are 160.27μA and 160.28μA, reduced by 114.64μA and 114.68μA, respectively. The sensing time is 29.7ns. The stable values of the read and reference currents are also reduced (set: 5.49μA to 5.37μA, reference: 2.18μA to 1.73μA, reset: 1.67μA to 1.19μA). Compared to the conventional scheme, the sensing time and the maximum and stable values of I read are all decreased, which is consistent with our theoretical analysis in Section IV.
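The reported reductions can be put in relative terms with simple arithmetic. The following is a minimal sketch that uses only the current values quoted above (in μA); the variable names are illustrative, and the conventional-scheme peak values are derived here by adding the stated reduction to the stated result rather than read from the source.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Peak currents of the proposed scheme and their stated reductions (uA).
PEAK_AFTER = {"I_read": 160.27, "I_ref": 160.28}
PEAK_REDUCTION = {"I_read": 114.64, "I_ref": 114.68}

# Stable currents as (conventional, proposed) pairs (uA).
STABLE = {"set": (5.49, 5.37), "reference": (2.18, 1.73), "reset": (1.67, 1.19)}

# Derived, not quoted: conventional peak = proposed peak + reduction.
peak_before = {k: PEAK_AFTER[k] + PEAK_REDUCTION[k] for k in PEAK_AFTER}
peak_rel = {k: relative_reduction(peak_before[k], PEAK_AFTER[k]) for k in PEAK_AFTER}
stable_rel = {k: relative_reduction(b, a) for k, (b, a) in STABLE.items()}

for name, frac in {**peak_rel, **stable_rel}.items():
    print(f"{name}: {100 * frac:.1f}% lower")
```

The peak currents fall by roughly 42%, which is a much larger relative change than that of the set-state stable current.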
Simulation results of the proposed chip when RTG L <15>, RTG G <15>, BL<1023> and WL 1 <1023> are selected are shown in Fig. 9. The sensing time is 30.89ns, which is 22.5% lower than that of the conventional V/2 scheme. The sensing time differs between the two cell positions because of the resistance of the long BL and WL.
Fig. 10 presents simulation results for four combinations of sensing circuits and bias schemes. The results include the sensing time of the regular simulation, the worst-case sensing time of the Monte Carlo simulation and read errors. The read performance is evaluated under both the regular and the worst R PCM. In these simulations, RTG L <15>, RTG G <15>, BL<1023> and WL 1 <1023> are selected. With the proposed bias scheme, the sensing time of the SRPM sensing circuit is reduced by 5.68ns under the worst R PCM. That of the conventional sensing circuit is reduced by 16.34ns and 16.86ns under the regular and the worst R PCM, respectively. The 2V/3 bias scheme thus speeds up the read operation for both SRPM and conventional sensing circuits.
Monte Carlo simulations are performed using the industry-compatible SMIC 40-nm model parameters. The simulation is conducted with the whole array and the periphery circuit. In the process and mismatch analysis, six standard deviations (6σ) are used for the variations of parameters and the mismatches of MOS transistors and resistors. In each simulation, 4000 trials are run. With the 2V/3 bias scheme, the worst-case sensing time of the SRPM sensing circuit is reduced by 50.04ns and 78.23ns under the regular and the worst R PCM, respectively. That of the conventional sensing circuit is reduced by 28.64ns and 37.9ns under the regular and the worst R PCM, respectively. Read errors are reduced for both SRPM (by 100%) and conventional (by 28.98%) sensing circuits. In particular, the combination of the SRPM circuit and the 2V/3 scheme shows zero read errors under both the regular and the worst R PCM. Based on the above simulation results, the 2V/3 bias scheme is verified to be robust to process variations and superior to the conventional V/2 bias scheme.
VI. CONCLUSION
We have reported the design of a 2V/3 bias scheme for 3-D cross point PCM. Better dynamic performance has been achieved, including shorter read access time, lower dynamic power consumption and fewer read errors. In contrast to traditional static analyses, the design uses dynamic simulations to assess dynamic chip performance with device parameters, bias schemes, arrays and circuits. The simulations show a high peak read current at the beginning of the read operation, which increases the read access time. Three factors that contribute to this high current are analyzed, and a read bias scheme that reduces the high voltage variation of half-selected cells on the BL is proposed. The 2V/3 bias scheme exhibits lower maximum and stable values of read current and consequently a shorter sensing time for both SRPM and conventional sensing circuits. The scheme maintains a fast read access time even at the edge of the array. The chip performance of four combinations of sensing circuits and bias schemes has been compared, and the 2V/3 bias scheme is verified to be robust to process variations and superior to the conventional V/2 bias scheme. Dynamic analyses prove to be powerful for optimizing the bias scheme and improving the dynamic performance of the chip. The next step will be to apply dynamic analyses of bias schemes to planar and other 3-D emerging memories.
ACKNOWLEDGMENT
We thank G. Liu and S. Zhang for useful discussions and Y. Xu, E. Deng and Y. Xia for reviewing the manuscript. 
