Robust SRAM Design via BIST-Assisted Timing-Tracking (BATT)
Ya-Chun Lai and Shi-Yu Huang
Abstract: A BIST-Assisted Timing-Tracking (BATT) scheme is proposed in this paper to facilitate robust read operation in an SRAM design without sacrificing circuit performance. The scheme has very low area overhead because it uses the commonly available memory BIST circuit to track the worst-case silicon speed of the bitlines. It is also highly scalable and therefore suitable for an SRAM compiler that needs to support a wide range of configurations. Measurement results from 8 manufactured chips of a 2 K-bit SRAM design in TSMC 0.18-μm CMOS technology demonstrate that it can rescue one originally failing chip while still ensuring correct functionality of the other seven chips, even under some injected variations in which conventional schemes may fail badly.
Index Terms: Memory BIST, sense amplifier, SRAM, timing tracking.
I. INTRODUCTION
A static random access memory (SRAM) operation can fail in several different modes, e.g., read failure, write failure, and sensing failure [1]. These functional failures are sometimes linked to the performance variations of the SRAM bit cells. As reported in [1], with the continuing scaling of device geometries, the failure count of bit cells can easily exceed the maximum repair capacity provided by structural redundancy, thus degrading the yield. Moreover, sensing failure can sometimes be attributed to improper timing control of the sense amplifiers. Therefore, a reliable timing-tracking method for sense amplifier control is critical for reducing sensing failures.
The voltage sense amplifier commonly used in an SRAM circuit is the latch-type sense amplifier [10], because it consumes less power and occupies less area than other types of sense amplifiers such as the current-mirror sense amplifier. However, the timing control for the latch-type sense amplifier is crucial. If the sense amplifier is turned on too early, the bitline voltage swing could be smaller than the input offset voltage of the sense amplifier, and an incorrect sensing operation could occur. On the other hand, if the sense amplifier is turned on too late, the access time and power dissipation will increase.
The conventional timing-tracking method for sense amplifier control is based on a delay element such as an inverter chain or a dummy bitline composed of replica cells [2]-[5]. Although the replica cells are less sensitive to process, voltage, and temperature (PVT) variations than the inverter chain, as claimed by [2], [3], timing tracking based on replica cells still cannot guard an SRAM circuit against sensing failure, especially when aggressive timing control is adopted to achieve low power and high speed. Conversely, more conservative timing control avoids risky sensing operations but degrades the circuit's performance. As a result, these conventional methods cannot easily strike a good balance in the timing control of the sense amplifiers. The method presented in [6] uses a variable delay element to flexibly adjust the sense amplifier's timing. However, the control code used to configure the variable delay element is programmed by an external input. In this paper, we apply the popular at-speed memory Built-In Self-Test (BIST) circuit to automatically determine the appropriate control code for the variable delay element. Thus, this timing-tracking scheme, called BIST-Assisted Timing-Tracking (BATT), can adapt to the silicon speed seamlessly from one chip to another. Compared with the conventional timing-tracking schemes, an SRAM design using the proposed scheme can operate more safely while still achieving the best circuit performance.
The rest of this paper is organized as follows. Section II describes the conventional timing-tracking schemes and the relevant problems. Section III presents the circuit design of the proposed BATT scheme. Section IV discusses the scalability and area overhead of the proposed scheme. Section V demonstrates the measurement results. Section VI concludes.
II. PRELIMINARIES
A self-timed embedded SRAM design relies on timing tracking to ensure the correct operation sequence, including the pulsed wordline control, sense amplifier control, and precharge control. A satisfactory timing-tracking scheme must accommodate the worst-case silicon speed under varying PVT conditions. The key part of a successful timing-tracking scheme is the delay element that determines the timing window from the activation of the wordline to the activation of the sense amplifiers during a read operation. The delay element could be made of an inverter chain or a column of replica cells. Traditionally, replica-cell-based timing tracking has been touted as a more robust scheme that is less sensitive to process variation [2], [3]. However, as process technology advances deeper into the nanometer regime, improvement is needed. Fig. 1 shows the block diagram of an SRAM circuit using the replica-column scheme, where the data and control paths are highlighted by solid and dashed lines, respectively. The data path for the sensing operation runs from the clock (CLK) through the wordline enabling signal (wl_en), the wordline, and the bitline pair to the inputs of the sense amplifier. The control path runs from the clock (CLK) through the dummy wordline (dwl), the dummy bitline (dbl), and the feedback reset signal (fb) to the sense-enabling (SE) signal of the sense amplifiers. In this timing-tracking scheme, all replica cells are used as dummy bitline loads, while only a selected number of replica cells are used as dummy bitline drivers to discharge the dummy bitline. The number of replica cells used as dummy bitline drivers influences the time at which the sense amplifiers are activated, and is usually decided by circuit designers through thorough simulation during the design phase.
We use the following notations and terms as we analyze the potential problems associated with this conventional timing-tracking scheme.
• Td: the time that bitline voltage swing is sufficient for sensing.
• Tc: the time that sense amplifiers are activated.
• Timing deviation: the timing difference between the actual Td (Tc) and its mean value.
• Timing window: the timing difference between the actual Tc and the actual Td (Tc minus Td).
• Timing margin: the timing difference between the mean values of Tc and Td.
Fig. 2 conceptually depicts the statistical distributions of Td and Tc from the viewpoint of process variations, where the actual Td (Tc) differs from one chip to another. Moreover, it also fluctuates with variations of the supply voltage and the temperature. Fig. 3 gives examples of correct and wrong sensing. In case I, the actual Tc is greater than the actual Td, implying that the timing window is positive and the sensing margin is sufficient for correct operation. In case II, the actual Tc is less than the actual Td due to adverse timing deviation. In this case, the timing window is negative, the sensing margin is insufficient, and wrong sensing could occur.
In order to achieve low-power and high-speed operation, conventional timing tracking adopts an aggressive timing setting, and thus the timing margin of the SRAM design is limited. However, as technology advances and devices continue to shrink, the statistical distributions of Td and Tc become wider due to more severe timing deviation. As a result, the probability of a negative timing window could increase significantly. The timing deviation of the data path in an SRAM circuit has multiple sources, including (1) variation of the cell current [7], (2) variation of the bitline resistance and capacitance, (3) data-pattern-dependent bitline leakage [8], [9], and (4) input offset voltages of the sense amplifiers [10].
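The effect of wider distributions on the probability of a negative timing window can be illustrated numerically. The following is a minimal sketch, assuming that Td and Tc are independent and normally distributed; neither assumption is made in the text above, and the function name and all numbers (in picoseconds) are hypothetical and chosen only for illustration.

```python
import math


def negative_window_probability(margin, sigma_d, sigma_c):
    """Probability that Tc < Td, i.e., that the timing window is negative.

    Assumes Td and Tc are independent Gaussians (an illustrative assumption).
    margin  : mean(Tc) - mean(Td), the timing margin
    sigma_d : standard deviation of Td
    sigma_c : standard deviation of Tc
    """
    # Standard deviation of the timing window (Tc - Td).
    sigma = math.hypot(sigma_d, sigma_c)
    # Standard normal CDF evaluated at -margin / sigma.
    return 0.5 * (1.0 + math.erf(-margin / (sigma * math.sqrt(2.0))))


# Hypothetical values in picoseconds.
tight = negative_window_probability(margin=100.0, sigma_d=20.0, sigma_c=20.0)
wide = negative_window_probability(margin=100.0, sigma_d=40.0, sigma_c=40.0)
relaxed = negative_window_probability(margin=200.0, sigma_d=40.0, sigma_c=40.0)

print(f"narrow distributions, small margin: {tight:.2e}")
print(f"wide distributions, small margin:   {wide:.2e}")
print(f"wide distributions, large margin:   {relaxed:.2e}")
```

Under these assumptions, doubling both standard deviations at a fixed margin raises the failure probability, and doubling the margin brings it back down, at the cost of a later sense-amplifier activation on every chip.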
One solution to combat potentially large deviations of Td and Tc is to increase the timing margin of the SRAM design, e.g., by adopting conservative timing settings. As shown in Fig. 4, a conservative timing setting provides a guard band that reduces the probability of a negative timing window and guarantees safe sensing operation. Nevertheless, this design strategy sacrifices the circuit performance of all manufactured chips. Even if this sacrifice is acceptable, determining how much to extend the timing margin is difficult, since simulation does not match silicon well in nanometer technologies. Besides, such a simulation-based strategy also makes a design less portable from one process to another.
III. CIRCUIT DESIGN OF BIST-ASSISTED TIMING-TRACKING SCHEME
The block diagram of an SRAM circuit using the proposed BATT scheme is shown in Fig. 5. Compared with the replica-column scheme shown in Fig. 1, an extra reconfigurable delay line is added on the control path. It serves as a vehicle for tuning the timing of the sense-activation signal under a control code that is determined automatically by the companion at-speed memory BIST circuit appended to the SRAM circuit. Multiple runs of memory BIST are conducted during the power-on self-test (POST) to decide a proper control code for each individual chip. The SRAM circuit can then perform safe functional operation with the appropriate control code. The controller highlighted in Fig. 5 regulates the process of finding the most suitable control code for the reconfigurable delay line by running memory BIST.
A. Reconfigurable Delay Line
The circuit schematic of the reconfigurable delay line is shown in Fig. 6. It consists of an inverter chain and several CMOS transmission gates. There are four paths from the input port to the output port, each of which represents a specific delay line. The delay across the selected delay line dictates the time at which the sense amplifiers are activated. Specifically, the largest control code (2'b11) corresponds to the shortest delay line, which activates the sense amplifiers earliest. Conversely, the smallest control code (2'b00) corresponds to the longest delay line, which activates the sense amplifiers latest. Fig. 7 shows the influence of the control code on the sense-activation timing, defined as the delay from the rising edge of the clock signal to that of the sense-activation signal. Fig. 8 shows the influence of the control code on the bitline voltage swing, defined as the voltage difference across the bitline pair at the moment when the sense amplifier is activated. Figs. 7 and 8 show that the larger the control code, the earlier the sense amplifiers are activated and the smaller the bitline voltage swing. Figs. 7 and 8 also show the influence of the control code on sense-activation timing and bitline voltage swing across different PVT conditions. The curve of the typical case has a trend similar to those of the best and worst cases.
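The relation between the control code and the selected delay can be captured in a small behavioral model. The sketch below is illustrative only: it assumes that each of the four paths differs by a whole number of equal delay segments, and the segment delay `UNIT_DELAY_PS` and the function name are hypothetical values not taken from the measured design.

```python
UNIT_DELAY_PS = 50  # hypothetical delay of one segment, in picoseconds


def delay_line_ps(control_code, unit_delay_ps=UNIT_DELAY_PS):
    """Return the modeled delay of the path selected by a 2-bit control code.

    Code 3 (2'b11) selects the shortest path and code 0 (2'b00) the longest,
    matching the ordering described in the text. The equal-segment structure
    is an assumption of this sketch.
    """
    if control_code not in (0, 1, 2, 3):
        raise ValueError("control code must be a 2-bit value (0 to 3)")
    segments = 4 - control_code  # code 3 -> 1 segment, code 0 -> 4 segments
    return segments * unit_delay_ps


for code in (3, 2, 1, 0):
    print(f"control code {code:02b}: {delay_line_ps(code)} ps")
```

A larger control code therefore yields an earlier sense activation and, correspondingly, a smaller bitline voltage swing at the moment of sensing.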
B. BIST-Assisted Timing-Tracking Controller
The proposed BATT controller, which generates the control code, is integrated into a memory BIST circuit that is dedicated to testing the functionality of an SRAM macro. The modified memory BIST can thus not only test the SRAM macro but also tune the timing of the sense-activation signal according to the test result. After the POST session completes, the BATT controller supplies an appropriate control code. Fig. 9 depicts the operation flow of the BATT controller during the POST session. The operation flow is described step by step as follows.
1. Set the control code to its initial value, i.e., the largest value 3 (which represents the most aggressive timing setting).
2. Test the SRAM macro by means of the memory BIST.
3. If the test fails, go to step (4). Otherwise, go to step (6).
4. If the control code is down to the last value, i.e., the smallest value 0 (which represents the most conservative timing setting), go to step (7). Otherwise, go to step (5).
5. Decrease the control code by 1 and then go to step (2). This step relaxes the timing setting in an attempt to accommodate some slow bit cells and/or bitlines in silicon.
6. Report a pass signal and the final control code for the tested SRAM macro.
7. Report a fail signal and the final control code for the tested SRAM macro.
C. Memory Built-In Self-Test
Fig. 10(a) illustrates how the memory BIST writes test patterns into the cell array. Since the successive data patterns of the cell array look like a forward movement of logic-1 or a backward movement of logic-0, such test patterns are called march patterns. Fig. 10(b) describes how the memory BIST generates march patterns. To begin with, the memory BIST writes logic-0 into all SRAM cells. Then, the memory BIST writes logic-1 into the SRAM cells in ascending address order. Because a sequence of operations is performed at the same address, this type of test pattern performs an at-speed test that checks whether a dynamic timing failure occurs. Subsequently, the memory BIST writes logic-0 into the SRAM cells in descending address order. Finally, the memory BIST reads out the data stored in all SRAM cells and thereby completes a BIST session.
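The POST flow of the BATT controller (steps 1 to 7 in Section III-B) can be sketched in software as follows. This is a behavioral illustration, not the hardware implementation: the function `batt_post`, the callable `run_bist`, and the toy BIST model are hypothetical names introduced here, and the memory BIST is abstracted to a function that returns True when the SRAM macro passes at a given control code.

```python
def batt_post(run_bist, max_code=3, min_code=0):
    """Search for a control code following the seven-step flow.

    run_bist : callable taking a control code and returning True on pass.
    Returns (passed, final_code).
    """
    code = max_code                  # step 1: most aggressive setting
    while True:
        if run_bist(code):           # steps 2 and 3: run BIST, check result
            return True, code        # step 6: report pass and final code
        if code == min_code:         # step 4: no more relaxed setting left
            return False, code       # step 7: report fail and final code
        code -= 1                    # step 5: relax timing, test again


def make_toy_bist(highest_passing_code):
    """Toy stand-in for a chip: it passes at or below the given code."""
    return lambda code: code <= highest_passing_code


print(batt_post(make_toy_bist(3)))   # chip that passes at the fastest setting
print(batt_post(make_toy_bist(2)))   # chip that needs slightly relaxed timing
print(batt_post(make_toy_bist(-1)))  # chip that fails at every setting
```

Because the search starts from the most aggressive setting and stops at the first pass, each chip ends up with the largest control code at which it passes the BIST, so faster chips are not slowed down to accommodate slower ones.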
IV. SCALABILITY AND AREA OVERHEAD
An SRAM compiler must support a wide variety of memory macro configurations, so the memory BIST needs to be parameterized accordingly. The proposed BATT controller, however, can be employed without any alteration. Fig. 11 shows the gate count of the memory BIST and the BATT controller as a function of the SRAM's capacity. The gate count of the BATT controller remains unchanged as the SRAM's capacity increases, whereas the gate count of the memory BIST rises from 408 to 2242. Thus, the area overhead of the BATT controller relative to the memory BIST decreases from 14.5% to 2.6%.
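The reported overhead figures can be cross-checked with simple arithmetic. The text does not state the gate count of the BATT controller; the value of 59 gates used below is an inference from the reported percentages, and it assumes the overhead is the ratio of controller gates to memory BIST gates. It should not be read as a measured number.

```python
BIST_GATES_SMALLEST = 408    # memory BIST gate count at the smallest capacity
BIST_GATES_LARGEST = 2242    # memory BIST gate count at the largest capacity
CONTROLLER_GATES = 59        # inferred from the percentages, not reported


def overhead_percent(controller_gates, bist_gates):
    """Controller gate count as a percentage of the memory BIST gate count."""
    return 100.0 * controller_gates / bist_gates


small = overhead_percent(CONTROLLER_GATES, BIST_GATES_SMALLEST)
large = overhead_percent(CONTROLLER_GATES, BIST_GATES_LARGEST)
print(f"overhead at smallest capacity: {small:.1f}%")
print(f"overhead at largest capacity:  {large:.1f}%")
```

A constant controller size of roughly this magnitude is consistent with both reported endpoints, which supports the observation that the relative overhead shrinks as the memory BIST grows.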
Since the reconfigurable delay line can be implemented in fragmented space around the corner of the SRAM macro, its area overhead is almost negligible. Additionally, the same design and layout of the reconfigurable delay line can be reused for different configurations of SRAM macros. Hence, the proposed BATT scheme is scalable and area-efficient.
V. CHIP FABRICATION AND MEASUREMENT RESULTS
A. Chip Fabrication
In order to verify the effectiveness of our scheme, we implemented a 2 K-bit SRAM macro equipped with an at-speed memory BIST using the design flow provided by the Chip Implementation Center (CIC) in Taiwan [11]. We also implemented another SRAM macro with the same configuration, in which the output loads of the sense amplifiers are intentionally unmatched so as to mimic the potential input offset voltages of sense amplifiers in nanometer technologies. Test chips including these two SRAM macros and their memory BIST circuits were manufactured in a 0.18-μm CMOS technology. The die photo of a fabricated test chip is shown in Fig. 12. The two SRAM macros are located at the top and bottom of the test chip, respectively, and their corresponding memory BIST circuits are located in the middle of the test chip. Table I summarizes the test chip characteristics.
B. Measurement Results
Table II presents the measured performance of the SRAM macro under different control codes. Compared with the conservative timing setting (i.e., the smallest control code), the aggressive timing setting (i.e., the largest control code) achieves 7.4% and 8.8% improvement in average power and access time, respectively.
Fig. 13 shows the control code reported by the BATT controller of the memory BIST for eight test chips. The diamond-marked dots correspond to the normal SRAM macro with sense amplifiers of matched output loads (denoted as SRAM macro 1), and the square-marked dots correspond to the SRAM macro with sense amplifiers of unmatched output loads (denoted as SRAM macro 2). According to simulation results, the control code for SRAM macro 1 is 3, while the control code for SRAM macro 2 is 2. However, for SRAM macro 1, it is interesting to see that the real control code in silicon is different from that predicted by simulation. This result suggests that an SRAM design using the conventional replica-cell-based timing tracking with the aggressive timing setting does not match the silicon very well. By contrast, an SRAM design using the proposed timing tracking can work correctly by means of a slightly relaxed timing setting. Additionally, in comparison with the conventional timing tracking with the conservative timing setting, the proposed method does not sacrifice the circuit performance of all the other manufactured chips. The control code for SRAM macro 2 varies from one chip to another. This implies that the unmatched output loads of the sense amplifiers could cause significant variation in input offset voltage, and thus a larger sensing margin may be needed. It is worth mentioning that SRAM macro 2 in the chip of index 1 fails even when the control code is set to 0 (the longest delay line). To cover such a case, the reconfigurable delay line should be designed to cover a wider timing range; in other words, the longest delay line needs an even larger delay.
Fig. 14 shows the failure count under different control codes for the SRAM macro with sense amplifiers of unmatched output loads (i.e., SRAM macro 2). The failure count declines to zero when the appropriate control code is set by the proposed BATT scheme for all test chips except the chip of index 1. Although the SRAM macro in this test chip fails, its failure count is reduced by 88% as the control code is relaxed from 3 to 0. This implies that the SRAM macro could have been rescued if the reconfigurable delay line had offered more relaxed timing settings. Furthermore, the failure count of the chip of index 1 under the control code of 3 exceeds 250, so this SRAM macro cannot easily be fixed by the redundancy and repair scheme. However, if the control code is reduced to 0 so that the timing setting is relaxed, the failure count drops significantly, which creates an opportunity for the redundancy and repair scheme to rescue the SRAM macro.
The shmoo plot of the operating frequency versus the supply voltage shown in Fig. 15 validates that the test chip works successfully. This shmoo plot also demonstrates the effectiveness of the proposed BATT scheme: the SRAM macro with sense amplifiers of unmatched output loads (SRAM macro 2) functions correctly under the appropriate control code even though the input offset voltages of its sense amplifiers could be large. Fig. 16 shows the shmoo plots under different control codes for SRAM macro 2 in the chip of index 2. The results were obtained with external Automatic Test Equipment (ATE), an Agilent-93000. It is interesting to see that when the control code is set to 3, the SRAM macro passes at an operating frequency of 100 MHz and an intermediate supply voltage (e.g., 1.3 V), but fails at the higher, typical supply voltage (1.8 V). This indicates that increasing the supply voltage may not be a cure for a failing chip. By contrast, if the control code is set to 2, the SRAM macro presents the typical characteristics of a shmoo plot. Consequently, it can be inferred that adjusting the timing of the sense amplifiers is a more effective way to rescue SRAM macros that work incorrectly due to insufficient sensing margin.
C. Comparison and Discussion
To clarify the benefits of the proposed BATT scheme, it is compared with the conventional replica-cell-based timing-tracking scheme from the following viewpoints:
1) Resiliency (yield): The proposed BATT scheme can adjust the sense amplifier's timing to avoid sensing failures that arise when weak-driving cells are sensed by sense amplifiers with excessive input offset voltages owing to severe local process variations. As a result, manufactured SRAM macros with the BATT scheme can maintain high yield due to improved resiliency. On the other hand, traditional timing tracking with an aggressive timing setting may incur significant yield loss due to the tight timing margin. For example, according to the measurement results shown in Fig. 13, 1 of the 8 test chips would fail under the traditional scheme for lack of an appropriate timing setting to accommodate some slow bit cells in silicon.
2) Power consumption: In order to achieve satisfactory resiliency (yield), traditional timing tracking can adopt a conservative timing setting to gain sufficient timing margin. However, the power consumption of all manufactured chips rises as the timing margin of the designed SRAM circuit increases. By contrast, the proposed BATT scheme can adjust the timing margin from one chip to another so as to reduce the power consumption of manufactured chips that require a relatively small timing margin (e.g., the 7.4% power reduction shown in the measurement results).
3) Area overhead: Since the reconfigurable delay line located in the corner of the SRAM macro does not degrade array efficiency, the area of the entire SRAM macro using the BATT scheme is comparable to that of an SRAM macro employing traditional timing tracking.
4) Scalability: Since the reconfigurable delay line can be reused for different configurations of SRAM macros, the proposed BATT scheme is as scalable as traditional timing tracking.
VI. CONCLUSION
The sense amplifier control is critical for an SRAM design to reduce sensing failure and to achieve low-power and high-speed operation simultaneously. With the increase of the process variation, the statistical distributions of the data-path and control-path timings become wider and wider and the worst-case is harder and harder to predict by simulation. Therefore, the conventional timing tracking may fall victim to substantial sensing failures. Motivated by the need to automatically control the timing of the sense amplifiers based on the silicon speed, we have proposed a low-area-overhead BIST-assisted timing-tracking scheme in this paper. Measurements of fabricated test chips demonstrated its capability to guarantee robust sensing while having zero penalty on the circuit performance. In this scheme, a reconfigurable delay line is added on the control path to tune the timing of the sense-activation signal guided by a control code determined by a sequence of memory BIST sessions. The measurement results from test chips show that our scheme can effectively rescue one chip that may have failed using the traditional timing-tracking scheme. For memory macros with injected variations, our scheme can adapt from one chip to another to maintain satisfactory yield, except for one chip in which we can still reduce the failure count effectively by 88%.
