Abstract: A multi-stage dual replica bit-line delay (MDRBD) technique is proposed to reduce the access time of low-voltage static random-access memory (SRAM) by suppressing the variation of the sense-amplifier enable (SAE) timing. Compared with the conventional technique, this strategy, based on statistical theory, reduces the timing variation by dividing the replica bit-line (RBL) into multiple stages while doubling both the RBL capacitance and the discharge path in each stage. At a supply voltage of 0.6 V, simulation results in TSMC 65 nm CMOS technology (Taiwan Semiconductor Manufacturing Company, Taiwan) show that, compared with a conventional RBL delay technique, the proposed technique reduces the standard deviation of the SAE timing by 69.2% and the cycle time by 47.2%.
Introduction
At present, static random-access memory (SRAM) is widely used in many emerging portable electronic devices (Chang et al., 2011; Gammie et al., 2011). A sense amplifier (SA) is generally used to reduce the power consumption and speed up the read operation of SRAM. Fig. 1 presents the read operation of SRAM. Because of the mismatch between the cross-coupled NMOS transistors (N1 and N2, Fig. 1b) caused by random dopant fluctuations (RDF) (Keyes, 1975; Pelgrom et al., 1989; Johnson et al., 2008), the sense-amplifier enable (SAE) signal should be activated only after the differential voltage between the bit-line pair, ΔV_BL, has reached a small but detectable level larger than the offset voltage (V_OS) of the SA (Lovett et al., 2000; Song et al., 2010). The smaller the bit-line swing, the shorter the access time and the lower the power consumption, so accurate SAE timing is essential. If the SAE signal arrives before the differential voltage exceeds V_OS, the SA cannot amplify correctly and a read failure occurs. On the other hand, if the SAE signal is asserted too late, access time and power consumption increase unnecessarily. Hence, the timing of the SAE signal has a critical impact on the design of high-speed and low-power SRAM. However, the SAE timing is sensitive to process, voltage, and temperature (PVT) variations (Amrutur and Horowitz, 1998; Osada et al., 2001; Arslan et al., 2008; Komatsu et al., 2009; Niki et al., 2010; Arandilla and Madamba, 2011; Kawasumi et al., 2012; Li et al., 2014; Wu et al., 2014), and this timing variation becomes worse in low-supply-voltage applications.
‡ Corresponding author. * Project supported by the National Natural Science Foundation of China (No. 61474001). ORCID: Chun-yu PENG, http://orcid.org/0000-0003-2408-5048. © Zhejiang University and Springer-Verlag Berlin Heidelberg 2015
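The SAE timing window described in the introduction can be illustrated with a first-order model. The sketch below assumes a constant read current, so that ΔV_BL grows linearly with time, and uses hypothetical values for the read current, bit-line capacitance, and V_OS; the function names are illustrative only. It shows that firing SAE before the swing reaches V_OS is a read failure, while any time after that point is slack.

```python
# First-order model of the SAE timing window (illustrative sketch only).
# Assumption: the accessed cell discharges one bit-line with a constant read
# current, so the differential swing grows linearly: dV_BL(t) = I_read * t / C_BL.
# All numeric values are hypothetical and are not taken from the paper.


def bitline_swing(t_s: float, i_read_a: float, c_bl_f: float) -> float:
    """Differential bit-line voltage (V) after t_s seconds of discharge."""
    return i_read_a * t_s / c_bl_f


def earliest_sae_time(v_os_v: float, i_read_a: float, c_bl_f: float) -> float:
    """Time (s) at which the swing reaches the SA offset voltage V_OS."""
    return v_os_v * c_bl_f / i_read_a


def classify_sae(t_sae_s: float, v_os_v: float, i_read_a: float, c_bl_f: float):
    """Return ("read failure", deficit) if SAE fires too early, else ("ok", slack)."""
    t_min = earliest_sae_time(v_os_v, i_read_a, c_bl_f)
    if t_sae_s < t_min:
        return "read failure", t_min - t_sae_s
    # Any slack beyond t_min is wasted access time and extra bit-line discharge.
    return "ok", t_sae_s - t_min


if __name__ == "__main__":
    I_READ = 10e-6   # hypothetical read current, 10 uA
    C_BL = 100e-15   # hypothetical bit-line capacitance, 100 fF
    V_OS = 0.05      # hypothetical SA offset voltage, 50 mV
    t_min = earliest_sae_time(V_OS, I_READ, C_BL)
    print(f"earliest safe SAE time: {t_min * 1e9:.2f} ns")
    for t in (0.3e-9, 0.8e-9, 2.0e-9):
        status, delta = classify_sae(t, V_OS, I_READ, C_BL)
        print(f"SAE at {t * 1e9:.1f} ns -> {status} ({delta * 1e9:.2f} ns)")
```

In a real array the read current is itself a random variable, which is exactly why the SAE edge needs the tracking schemes discussed next.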
To obtain the optimum SAE timing with minimal slack, the replica bit-line (RBL) technique is commonly adopted (Amrutur and Horowitz, 1998; Arandilla and Madamba, 2011). In this method, replica cells (RCs), dummy cells (DCs), and the replica bit-line capacitance, rather than a chain of logic gates, are used to track the discharge delay of the normal bit-lines and thereby generate a suitable SAE timing. However, as fabrication technology scales, the threshold voltage (V_TH) of transistors becomes more prone to RDF-induced shifts. The RBL technique cannot track this random V_TH variation, which leads to read failures and longer access times, particularly in low-supply applications.
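Why an RBL tracks the normal bit-line can be seen from a simplified first-order model. The sketch assumes that all cells sink the same read current, that the normal bit-line must develop a hypothetical swing `dv_bl`, and that the SAE driver on the RBL trips after a hypothetical drop `v_trip`; the helper names are illustrative. Because both delays scale as 1/`i_cell`, a global current shift cancels in their ratio, whereas a local mismatch between replica cells and the accessed cell would not.

```python
# Simplified first-order model of replica bit-line (RBL) tracking.
# Assumptions (hypothetical, not from the paper): every cell sinks the same
# read current i_cell; the normal bit-line must develop a swing dv_bl, and the
# SAE driver on the RBL trips after the RBL has dropped by v_trip.


def normal_bl_delay(c_bl: float, dv_bl: float, i_cell: float) -> float:
    """Time for one accessed cell to develop dv_bl on a normal bit-line."""
    return c_bl * dv_bl / i_cell


def rbl_delay(c_rbl: float, v_trip: float, n_rc: int, i_cell: float) -> float:
    """Time for n_rc replica cells to pull the RBL down by v_trip."""
    return c_rbl * v_trip / (n_rc * i_cell)


def replica_cell_count(c_bl: float, dv_bl: float, c_rbl: float, v_trip: float) -> int:
    """Replica-cell count that makes rbl_delay match normal_bl_delay."""
    return round(c_rbl * v_trip / (c_bl * dv_bl))


if __name__ == "__main__":
    C = 100e-15                 # hypothetical: RBL and bit-line capacitance are equal
    DV_BL, V_TRIP = 0.06, 0.3   # hypothetical: 60 mV swing, trip at VDD/2 for VDD = 0.6 V
    n = replica_cell_count(C, DV_BL, C, V_TRIP)
    print(f"replica cells needed: {n}")
    # A global (systematic) change of i_cell scales both delays equally, so the
    # ratio stays fixed; this is why the RBL follows global PVT shifts.
    for i_cell in (10e-6, 5e-6):
        ratio = rbl_delay(C, V_TRIP, n, i_cell) / normal_bl_delay(C, DV_BL, i_cell)
        print(f"i_cell = {i_cell * 1e6:.0f} uA -> RBL/BL delay ratio = {ratio:.3f}")
```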
To solve this issue, the SAE timing variation must be suppressed further. In this paper, based on statistical theory, a multi-stage dual replica bit-line delay technique is proposed, which divides the RBL into M stages and doubles both the replica bit-line capacitance and the discharge path in each stage. The resulting optimized SAE timing is suitable for low-voltage SRAM applications.
Related RBL delay techniques
The conventional RBL technique, presented in Fig. 2, was proposed to make the timing of the control path closely track the read discharge time of the bit-lines (Amrutur and Horowitz, 1998). Amrutur and Horowitz (1998) showed that the RBL technique is more robust than the inverter-chain delay technique under process variations (PVs). The robustness of the RBL delay technique relies on the replica path and the normal bit-lines experiencing the same systematic variations. However, because of local variations, the RBL delay can decrease, reducing the bit-line swing and causing a read failure, or increase, raising power consumption. Fig. 3 presents the effect on performance of the SAE timing variation caused by local variation of the RBL.
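The two failure modes above can be quantified with a simple statistical sketch. It assumes, beyond what the text states, that the SAE delay is Gaussian with mean `mu` and standard deviation `sigma`, and that `t_min` (a hypothetical name) is the fixed time at which the bit-line swing first exceeds V_OS. Holding the failure probability at a k-sigma level, a larger sigma forces a later mean SAE edge, which is the wasted time and power the text describes.

```python
import math

# Sketch: how local SAE timing variation translates into read-failure risk.
# Assumptions (not from the paper): the SAE delay is Gaussian with mean mu and
# standard deviation sigma, and t_min is the fixed time at which the normal
# bit-line swing first exceeds V_OS. Times are in arbitrary (normalized) units.


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def read_failure_prob(mu: float, sigma: float, t_min: float) -> float:
    """P(t_SAE < t_min): SAE fires before the swing is large enough."""
    return normal_cdf((t_min - mu) / sigma)


def required_mean_delay(t_min: float, sigma: float, k: float = 3.0) -> float:
    """Mean SAE delay that keeps the SAE edge k sigma after t_min."""
    return t_min + k * sigma


if __name__ == "__main__":
    T_MIN = 1.0
    for sigma in (0.3, 0.1):
        mu = required_mean_delay(T_MIN, sigma)
        p = read_failure_prob(mu, sigma, T_MIN)
        print(f"sigma = {sigma}: mean SAE delay = {mu:.2f}, P(fail) = {p:.2e}")
```

Both cases reach the same failure probability, but the smaller sigma lets the mean SAE edge sit much closer to `t_min`.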
To obtain the optimum SAE timing, the configurable replica bit-line delay (CRBD) technique (Arslan et al., 2008) has been proposed, based on the finding of Osada et al. (2001) that using plural replica cells in the RBL column reduces the SAE timing variation compared with a conventional RBL circuit. The method shrinks the SAE timing variation by ∼14×. However, the CRBD technique increases implementation costs because additional post-silicon tests are […] RBL. As a result, the theoretical total standard deviation of the timing variation, σ, is divided by √M (M is the number of stages) compared with that of the conventional RBL. However, as M increases, the mismatch between the global RBL and the normal bit-line becomes obvious owing to the gate delay of the inverters inserted between adjacent stages. To address this, a digitized replica bit-line delay (digitized-RBD) technique (Fig. 4b) was proposed by Niki et al. (2010; . The digitized-RBD technique uses K times as many RCs in the RBL column as the conventional RBL, so that the standard deviation of the RBL delay is divided by K√K. The delay is then multiplied by K using a time multiplying circuit (TMC), so that it matches the delay of the normal bit-line. That is to say, the theoretical standard deviation of the timing variation is divided by √K compared with the conventional RBL. The disadvantage of this technique is that, as the number of RCs increases, the quantization error caused by the delay unit used in the TMC grows, and the total area overhead increases. Another technique, called multiple-stage parallel replica bit-line delay (MPRBD) (Fig. 4c) (Wu et al., 2014), which essentially divides K parallel RBLs into M × K stages, faces the same problems as the digitized-RBD technique.
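The √M and √K improvements quoted above can be checked with a small Monte Carlo experiment. This sketch assumes independent Gaussian replica-cell currents with a hypothetical 5% relative spread, models each (sub-)RBL delay as capacitance over total replica current in normalized units, and ignores both the inverter delays between stages and the TMC quantization error. The function names are illustrative, not taken from the cited works.

```python
import random
import statistics

# Monte Carlo sketch of RBL delay spread under local replica-cell mismatch.
# Assumptions (not from the paper): each replica-cell read current is an
# independent Gaussian with a hypothetical 5 % relative sigma; the delay of a
# (sub-)RBL is its capacitance divided by the total current of its replica
# cells; units are normalized (full C_RBL = nominal current = 1); the delay of
# the inverters between stages and the quantization error of the TMC are ignored.

SIGMA_I = 0.05


def cell_current(rng: random.Random) -> float:
    return rng.gauss(1.0, SIGMA_I)


def conventional_rbl(rng: random.Random) -> float:
    # One replica cell discharging the full RBL capacitance.
    return 1.0 / cell_current(rng)


def multi_stage_rbl(rng: random.Random, m: int) -> float:
    # M sub-RBLs, each with 1/M of the capacitance and its own replica cell;
    # the stages fire one after another, so their delays add up.
    return sum((1.0 / m) / cell_current(rng) for _ in range(m))


def digitized_rbl(rng: random.Random, k: int) -> float:
    # K replica cells on the full capacitance; the delay is then multiplied
    # by K (the role of the TMC) so that the mean matches the conventional RBL.
    return k * (1.0 / sum(cell_current(rng) for _ in range(k)))


def spread(delay_fn, n: int = 20000, seed: int = 1):
    """Return (mean, standard deviation) of n sampled delays."""
    rng = random.Random(seed)
    samples = [delay_fn(rng) for _ in range(n)]
    return statistics.fmean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    _, s0 = spread(conventional_rbl)
    for x in (2, 4, 8):
        mu_m, s_m = spread(lambda rng: multi_stage_rbl(rng, x))
        mu_k, s_k = spread(lambda rng: digitized_rbl(rng, x))
        print(f"M = K = {x}: sqrt = {x ** 0.5:.2f} | multi-stage sigma ratio "
              f"{s0 / s_m:.2f} (mean {mu_m:.3f}) | digitized sigma ratio "
              f"{s0 / s_k:.2f} (mean {mu_k:.3f})")
```

Both schemes keep the mean delay essentially unchanged while the sigma ratio approaches √M or √K; the practical penalties the text names (inverter delay, TMC quantization, area) are outside this model.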
Considering the trade-off between area overhead and performance, a delay technique with two sets of replica bit-lines (main and reservoir) was employed by Kawasumi et al. (2012). To further reduce the area overhead, the dual replica bit-line delay (DRBD) technique (Fig. 4d) has recently been proposed by Li et al. (2014). Without extra area overhead, this design reduces the target standard deviation of the SAE timing variation to 1/√2 of that of the conventional RBL. However, the capacitance of the dual-RBL is doubled by connecting the two sides of the replica column. Consequently, the charging time of the dual-RBL doubles relative to the normal bit-line, which increases the access time. In addition, the new replica cell proposed by Li et al. (2014) cannot accurately mimic a real 6T bitcell because its cross-coupling is broken.
Principle and structure of the proposed MDRBD technique
According to Arslan et al. (2008), the scaling of the conventional RBL delay variation (σ/μ) is a function of the number of RCs, described by
$$\frac{\sigma_n}{\mu_n} = \frac{1}{\sqrt{n}}\cdot\frac{\sigma_0}{\mu_0}, \qquad (1)$$
where σ_0 and μ_0 are the standard deviation and mean of the timing of the conventional RBL, respectively, and σ_n and μ_n are those of an RBL with n times as many RCs as the conventional one. Because n times as many RCs are activated (i.e., I_read of the RBL increases n times), μ_n is equal to μ_0/n. Keeping the capacitance C_RBL invariant, σ_n is therefore approximately equal to σ_0/(n√n) (this is the theoretical basis of the K√K factor given directly in Niki et al. (2010; )). However, the time delay of the RBL is directly proportional to the ratio C_RBL/I_read. Therefore, keeping I_read invariant (i.e., the number of RCs is unchanged) and multiplying C_RBL by m, σ_m is approximately equal to mσ_0 (i.e., μ_m = mμ_0, so σ_m/μ_m = σ_0/μ_0), according to Komatsu et al. (2009). Furthermore, to ensure that the mean timing is the same as that of the conventional RBL, extra circuits such as TMCs or accumulative circuits are necessary.
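The scaling law attributed to Arslan et al. (2008), σ_n/μ_n = (1/√n)(σ_0/μ_0), and the capacitance scaling attributed to Komatsu et al. (2009) can both be recovered with a first-order (delta-method) argument. The derivation below is a sketch under assumptions the text does not state: the RBL delay is t = C_RBL·V_trip/I_tot, where V_trip is a hypothetical symbol for the RBL drop at which the SAE driver switches, I_tot is the sum of n independent replica-cell currents with mean I_0 and standard deviation σ_I, and σ_I ≪ I_0.

$$
\begin{aligned}
t_n &= \frac{C_{RBL} V_{trip}}{I_{tot}}, \quad I_{tot} = \sum_{i=1}^{n} I_i, \quad \mathrm{E}[I_{tot}] = n I_0, \quad \mathrm{SD}[I_{tot}] = \sqrt{n}\,\sigma_I \\
\frac{\sigma_0}{\mu_0} &\approx \frac{\sigma_I}{I_0} \quad (n = 1) \\
\mu_n &\approx \frac{C_{RBL} V_{trip}}{n I_0} = \frac{\mu_0}{n}, \qquad
\frac{\sigma_n}{\mu_n} \approx \frac{\mathrm{SD}[I_{tot}]}{\mathrm{E}[I_{tot}]} = \frac{1}{\sqrt{n}}\cdot\frac{\sigma_I}{I_0} = \frac{1}{\sqrt{n}}\cdot\frac{\sigma_0}{\mu_0} \\
\sigma_n &\approx \frac{1}{\sqrt{n}}\cdot\frac{\sigma_0}{\mu_0}\cdot\frac{\mu_0}{n} = \frac{\sigma_0}{n\sqrt{n}} \\
t_m &= \frac{m\,C_{RBL} V_{trip}}{I_1} = m\,t_0 \;\Rightarrow\; \mu_m = m\mu_0, \quad \sigma_m = m\sigma_0
\end{aligned}
$$

The last line is exact under this model, because scaling the capacitance with the same replica cell multiplies every sample of the delay by m.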
According to the aforementioned principle, the proposed MDRBD technique, shown in Fig. 5, works as follows. First, using twice as many RCs as the conventional RBL, the standard deviation and mean of the timing become σ_0/(2√2) and μ_0/2, respectively. They then become σ_0/√2 and μ_0 because C_RBL is doubled by using both sides of the replica column. Next, the RBL is divided into several sub-RBLs (i.e., M stages of novel dual replica bit-lines, DRBs), while the number of RCs in each stage is kept the same as before the division; the standard deviation and mean of each DRB become σ_0/(M√2) and μ_0/M. Finally, accumulating the M stages using inverters, they become σ_0/√(2M) and μ_0. The proposed MDRBD technique thus uses the multi-stage idea to solve the doubled charging time of the DRBD technique. Moreover, to achieve the same effect, the MRBD technique requires 2M stages and therefore 2M inverters, whereas the proposed MDRBD technique requires only M inverters because each stage doubles both the replica bit-line capacitance and the discharge path. Consequently, the mismatch between the global RBL and the normal bit-line caused by the gate delay of the inverters is reduced to a certain degree.
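The step-by-step derivation above is bookkeeping on (σ, μ) pairs, so it can be replayed mechanically. The sketch below applies the scaling rules quoted in the text in units of the conventional RBL (σ_0 = μ_0 = 1). It assumes independent, identical stages and ignores inverter delays; the class and function names are hypothetical, and the `mrbd` model (one RC per stage on 1/stages of the capacitance) is an assumption chosen to reproduce the σ/√M result stated earlier.

```python
import math
from dataclasses import dataclass

# Bookkeeping sketch of the scaling rules quoted in the text, in units of the
# conventional RBL (sigma_0 = mu_0 = 1). Assumptions: stages are statistically
# independent and identical, and inverter delays between stages are ignored.


@dataclass(frozen=True)
class RblTiming:
    sigma: float  # standard deviation of the delay, in units of sigma_0
    mu: float     # mean delay, in units of mu_0

    def more_rcs(self, n: float) -> "RblTiming":
        """n times as many replica cells on the same capacitance."""
        return RblTiming(self.sigma / (n * math.sqrt(n)), self.mu / n)

    def scale_cap(self, m: float) -> "RblTiming":
        """Capacitance multiplied by m with the same replica cells."""
        return RblTiming(self.sigma * m, self.mu * m)

    def chain(self, stages: int) -> "RblTiming":
        """Accumulate `stages` independent copies in series."""
        return RblTiming(self.sigma * math.sqrt(stages), self.mu * stages)


CONVENTIONAL = RblTiming(1.0, 1.0)


def drbd() -> RblTiming:
    # Twice the RCs (sigma_0/(2*sqrt 2), mu_0/2), then twice the capacitance.
    return CONVENTIONAL.more_rcs(2).scale_cap(2)


def mdrbd(m: int) -> RblTiming:
    # Split the dual RBL into M sub-RBLs with the same RCs, then chain M stages.
    return drbd().scale_cap(1 / m).chain(m)


def mrbd(stages: int) -> RblTiming:
    # Multi-stage RBL: each stage keeps one RC on 1/stages of the capacitance.
    return CONVENTIONAL.scale_cap(1 / stages).chain(stages)


if __name__ == "__main__":
    for m in (1, 2, 4):
        a, b = mdrbd(m), mrbd(2 * m)
        print(f"M = {m}: MDRBD sigma = {a.sigma:.4f}, mu = {a.mu:.2f} | "
              f"MRBD with {2 * m} stages sigma = {b.sigma:.4f}, mu = {b.mu:.2f}")
```

Under these assumptions M MDRBD stages match 2M MRBD stages in σ with the same mean, which is the equivalence behind the M versus 2M inverter count. The inverter-delay mismatch that favours MDRBD in simulation is not captured here.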
Simulation results and discussion
Fig. 7 presents the standard deviation of SAE timing for different numbers of stages using the MRBD and the proposed MDRBD techniques. For the same number of stages, the standard deviation of SAE timing with the proposed MDRBD technique is reduced to approximately 1/√2 of that with the MRBD technique. Moreover, even when both techniques are configured for the same theoretical improvement, the proposed MDRBD achieves a smaller standard deviation of SAE timing than MRBD, because it uses only half as many inverters (Table 1). As shown in Fig. 8, a comparative study among the conventional RBL, MRBD, DRBD, and the proposed MDRBD is carried out in TSMC 65 nm CMOS technology (Taiwan Semiconductor Manufacturing Company, Taiwan) at a 0.6 V supply voltage, the slow-slow (SS) corner, −40 °C, and with 128-row memory cells. Here, the RBLs are divided into four stages for MRBD and MDRBD, and N = 2. The Monte Carlo simulation results show that the proposed MDRBD has the lowest standard deviation σ (8.4 ns), improvements of 69.2%, 45.1%, and 11.6% over the conventional RBL, DRBD, and MRBD, respectively. Generally, without a timing margin, the desired access cycle time is double the SAE timing delay (Niki et al., 2010; Li et al., 2014). Considering random V_TH variation, however, a large timing margin is necessary. Assuming a margin of three standard deviations for the SAE timing, the access timing margins of the conventional and proposed designs are 163.8 ns and 50.4 ns (i.e., 3σ × 2), respectively. The timing margin is thus reduced by about 113.4 ns owing to the 69.2% reduction of the SAE timing variation. Consequently, the access time is reduced by 227 ns, and a 47.2% cycle time improvement is expected using the proposed MDRBD technique.
Fig. 9 shows the standard deviation of the existing techniques and the proposed design at different supply voltages. The other simulation conditions are unchanged (i.e., SS, −40 °C). When the supply voltage varies from 0.6 V to 1.0 V, the standard deviation of the timing variation of the proposed design is reduced by about 63.2% to 69.2%, 44% to 48%, and 6.7% to 13.3% compared with those of the conventional RBL, DRBD, and MRBD, respectively. In particular, at the supply voltage of 0.6 V, the standard deviation is decreased by 69.2% compared with that of the conventional RBL, which indicates that the proposed design is more effective in low-supply applications.
Fig. 10 shows the standard deviation of these techniques and the proposed design at different process corners. With the supply voltage kept at 0.6 V and the temperature at −40 °C, the standard deviation of the proposed design is reduced by 59.2% to 69.2%, 45.1% to 49.8%, and 11.6% to 24.3% compared with those of the conventional RBL, DRBD, and MRBD, respectively. Therefore, the proposed MDRBD is more robust under process variation than the other techniques. With the supply voltage kept at 0.6 V and the process corner at SS, the standard deviation as a function of temperature is presented in Fig. 11. As the temperature changes from −40 °C to 125 °C, the standard deviation of the proposed design is reduced by 62.8% to 69.2%, 40.5% to 45.1%, and 10% to 12.8% compared with those of the conventional RBL, DRBD, and MRBD techniques, respectively. Table 2 compares the different timing strategies. As mentioned above, the proposed technique requires the same area overhead as MRBD. In terms of power consumption, it consumes 5.67% more than DRBD.
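The timing-margin figures at 0.6 V, SS, and −40 °C follow from simple arithmetic, replayed below from the reported numbers. The conventional-RBL σ is not listed explicitly in this section, so the sketch back-computes it as 163.8 ns / 6 under the stated 3σ × 2 margin; the constant and function names are hypothetical.

```python
# Re-derives the timing-margin figures from the reported numbers.
# Assumption: the conventional-RBL sigma is back-computed as 163.8 ns / 6;
# it is not listed explicitly in this section.

SIGMA_MDRBD_NS = 8.4     # reported sigma of the proposed MDRBD (0.6 V, SS, -40 C)
MARGIN_CONV_NS = 163.8   # reported access timing margin of the conventional RBL


def access_margin_ns(sigma_ns: float, k: float = 3.0) -> float:
    """Access timing margin as used in the text: k*sigma, times 2 (3 sigma x 2)."""
    return k * sigma_ns * 2.0


sigma_conv_ns = MARGIN_CONV_NS / 6.0
sigma_reduction = 1.0 - SIGMA_MDRBD_NS / sigma_conv_ns
margin_mdrbd_ns = access_margin_ns(SIGMA_MDRBD_NS)
margin_saving_ns = MARGIN_CONV_NS - margin_mdrbd_ns

if __name__ == "__main__":
    print(f"conventional sigma : {sigma_conv_ns:.1f} ns")
    print(f"sigma reduction    : {sigma_reduction * 100:.1f} %")
    print(f"MDRBD margin       : {margin_mdrbd_ns:.1f} ns")
    print(f"margin saving      : {margin_saving_ns:.1f} ns")
    # The reported 227 ns access-time reduction is numerically twice this saving.
    print(f"2 x margin saving  : {2 * margin_saving_ns:.1f} ns")
```

The back-computed σ of 27.3 ns reproduces the reported 69.2% reduction, 50.4 ns margin, and 113.4 ns saving. The 47.2% cycle-time figure additionally depends on the nominal cycle time, which this section does not give.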
Conclusions
A multi-stage dual replica bit-line delay technique has been proposed to further optimize the SA timing of low-supply SRAM. By combining multiple stages with dual replica bit-lines, the proposed MDRBD technique also reduces, to some degree, the mismatch between the global RBL and the normal bit-line caused by the gate delay of the inverters. Simulation results in TSMC 65 nm CMOS technology indicate that the proposed design achieves the smallest SAE timing variation among the compared techniques, particularly at low supply voltages. At 0.6 V, the SS process corner, and −40 °C, the standard deviation of the SAE timing in the proposed design is 69.2% lower than that of the conventional scheme, and the cycle time is reduced by 47.2%.
