Abstract: Due to the spatial locality of data caches and the temporal locality of instruction caches, significant leakage reduction can be achieved by switching a large number of cache lines into the low-power standby (drowsy) mode. It has been shown that 80%-90% of the data cache lines can be kept in drowsy state with a performance loss of no more than 0.6% (IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 2, pp. 167-184, Feb. 2004). However, the drowsy-cache design technique introduces new fault behaviors, and more restrictive design rules must be applied to the chip fabrication process. In this paper, we use HSpice to simulate all possible spot defects (SDs) under normal mode and drowsy mode in different resistance regions. Six new fault models appear when drowsy mode is introduced into memory arrays. When deriving a march algorithm for the new fault models of this low-power cache, we use several simplification rules to reduce the test complexity. According to these simplification rules, each of these new faults has an equivalent counterpart in both data caches and instruction caches. As a result, we develop a march algorithm that can detect all SDs in either data caches or instruction caches. Since some faults occur only in drowsy mode, a built-in self-repair (BISR) scheme is also developed. With BISR, the cache can still work even if some cache lines fail in drowsy mode.
I. INTRODUCTION
In the past, dynamic power dominated the total power consumption of CMOS circuits. When a CMOS transistor is not switching, there is no direct current path from Vdd to ground, and leakage power is negligible. However, as feature sizes shrink, leakage power increases much faster than dynamic power does. In current 0.13-0.09-µm technologies, leakage power is already considerable compared with active power dissipation. Below 0.09 µm, leakage is approaching or exceeding 50% of the total power consumption [2], making it the dominant component. Suppressing leakage current is hence critical.
Manuscript received March 13, 2006; revised June 20, 2006. This work was supported in part by the National Science Foundation under Grant CCF-0541103. This paper was recommended by Associate Editor N. K. Jha.
W. Pei is with Sun Microsystems, Inc., Sunnyvale, CA 94086 USA (e-mail: wei.pei@sun.com).
On-chip memories, especially large caches, used to provide high performance at a lower power density than logic circuits. As a result, ever larger portions of the die area have been occupied by cache memories. For instance, 50% of the Pentium-4 chip area and 60% of the StrongARM chip area are allocated to cache structures [3], [4]. On-chip memories had a lower power density because typically only a small portion of a memory is accessed in each clock cycle. This is no longer true now that leakage has become a major problem for transistors. Due to the large number of storage cells and the lack of a stacking effect [5] to reduce leakage current, leakage power will dominate cache power consumption and, thus, the total power of a chip. According to the projection in [1], for a 70-nm process, leakage can contribute more than 60% of the power consumption of L1 caches if left unchecked. A modern system-on-a-chip (SOC) usually carries much larger on-chip L2 caches, which will consume much more energy than L1 caches. Reducing the leakage power of on-chip caches can therefore decrease the total power consumption of a chip significantly.
Several leakage-reduction techniques have been presented. In [6], a dual-Vt technique uses transistors with a high threshold voltage in the noncritical parts of a circuit, since subthreshold leakage current decreases exponentially as Vt increases. The gated-Vdd technique inserts a high-Vt transistor between the circuit and one of the power-supply rails (Vdd/GND) [7], so that the circuit is detached from its power supply when it is not in use. A multithreshold-CMOS technique has also been presented to lower the threshold voltage and to reduce the leakage power [8]. A simple but effective drowsy technique is proposed in [1]. This method implements caches with a drowsy (standby) mode and a normal mode, each with its own selectable supply voltage. SRAM cells consume significantly less leakage power when placed into drowsy mode by supplying a lower voltage. Due to the spatial and temporal locality of on-chip caches, a large portion of cache lines can be placed into drowsy mode to cut down power consumption.
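The exponential dependence on Vt that motivates the dual-Vt technique can be seen in the first-order textbook subthreshold model I ≈ I0·exp((VGS − Vt)/(n·kT/q)). The sketch below is only an illustration under assumed values (slope factor n = 1.5, room temperature, example thresholds of 0.3 V and 0.5 V); it is not a model used in this paper.

```python
import math

# Thermal voltage kT/q near 300 K, in volts.
THERMAL_VOLTAGE = 0.0259


def subthreshold_leakage(vt, n=1.5, vgs=0.0, i0=1.0):
    """Relative subthreshold current of a transistor with gate-source voltage vgs.

    First-order textbook model: I = I0 * exp((Vgs - Vt) / (n * kT/q)).
    The slope factor n and the prefactor i0 are assumed, illustrative values.
    """
    return i0 * math.exp((vgs - vt) / (n * THERMAL_VOLTAGE))


def leakage_reduction(vt_low, vt_high, n=1.5):
    """Factor by which raising Vt from vt_low to vt_high cuts OFF-state leakage."""
    return subthreshold_leakage(vt_low, n=n) / subthreshold_leakage(vt_high, n=n)


if __name__ == "__main__":
    # Hypothetical low-Vt / high-Vt pair, not process data from the paper.
    print(f"Raising Vt by 0.2 V cuts leakage by about {leakage_reduction(0.3, 0.5):.0f}x")
```

Under these assumptions, each extra 0.1 V of Vt divides the leakage by the same constant factor, which is the exponential behavior that dual-Vt design exploits.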
Many faults in memory circuits are caused by spots of extra, missing, or undesired material in a small area. These defects are called spot defects (SDs) and are the primary testing target. In [9], a complete analysis of SDs for industrial SRAMs is presented. Functional fault models (FFMs) are defined to describe the fault behaviors, and march tests are developed based on these FFMs. All electrical faults are mapped to FFMs, which consist of nine single-cell faults (e.g., stuck-at fault) and five coupling faults (e.g., deceptive-read destructive fault). The March SRD algorithm (simple, realistic, and able to deal with data-retention faults), with a test length of 14n (n is the number of bits), is developed to detect all FFMs with deterministic data outputs at the sense amplifiers [9]. Recently, a similar defect-injection and circuit-simulation technique has also been used to derive the fault behaviors of embedded DRAMs [10] and multiport SRAMs [11]. Built-in self-test (BIST) is a technique that enables circuits to test themselves without expensive test equipment [12]. BIST methods based on patterns generated by march tests dominate memory testing today [13]. As the complexity and size of embedded caches/memories increase, built-in self-repair (BISR) is used to improve the overall yield. BISR begins by applying memory-test patterns and collecting the test responses. Traditionally, the defective addresses are then replaced with redundant memory circuits [14]-[16], which can dramatically increase the memory yield.
0278-0070/$25.00 © 2007 IEEE
Unfortunately, new fault behaviors can appear with the introduction of drowsy-mode caches or memories. In this research, we implement a drowsy SRAM cache with peripheral circuits such as sense amplifiers, address decoders, and write circuits. All possible SDs in a memory cell are simulated in both normal mode and standby (drowsy) mode using HSpice, and new fault behaviors are found in drowsy mode. These fault behaviors are mapped to FFMs, and a march algorithm is developed. A set of drowsy-fault simplification rules (similar to equivalent-fault identification in logic circuits) is identified to ease march-test pattern generation. We demonstrate that all drowsy faults can be detected by our proposed march algorithm. Furthermore, if a cell functions properly in normal mode but manifests its defect in drowsy mode, the entire cache line containing this cell is marked as nondrowsy (using a register) and is not subject to drowsy operations. Thus, no redundant memory cells are required for the BISR circuit, and drowsy defects can be tolerated as long as the power budget is not exceeded.
This paper is organized as follows. In Section II, the working principle of a drowsy cache is presented, and fault modeling for memory SDs is briefly reviewed. Drowsy-fault modeling for SDs in a drowsy cache is presented in Section III, where fault behaviors obtained by defect injection and HSpice simulation are transformed into FFMs. Based on the identified fault models, a march algorithm that detects all drowsy and normal defects is presented in Section IV, together with a built-in self-repair method that tolerates all drowsy defects. Section V concludes this paper and outlines future work.
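The register that marks lines as nondrowsy can be pictured as a per-line bit mask consulted by the drowsy controller. The class below is a minimal sketch of that idea; the class and method names, and expressing the power budget as a maximum number of lines kept awake, are assumptions rather than the paper's BISR circuit.

```python
class DrowsyLineMask:
    """Per-line 'nondrowsy' register for drowsy-fault repair (sketch).

    Hypothetical structure: one bit per cache line. A set bit means the line
    passed the normal-mode test but failed in drowsy mode, so the drowsy
    controller must keep it at full Vdd.
    """

    def __init__(self, num_lines: int):
        self.num_lines = num_lines
        self.bits = 0  # bit i set => line i is excluded from drowsy mode

    def record_drowsy_failure(self, line: int) -> None:
        if not 0 <= line < self.num_lines:
            raise ValueError(f"line {line} out of range")
        self.bits |= 1 << line

    def is_nondrowsy(self, line: int) -> bool:
        return bool((self.bits >> line) & 1)

    def lines_to_put_to_sleep(self, requested):
        """Filter a drowsy request so that masked lines stay awake."""
        return [ln for ln in requested if not self.is_nondrowsy(ln)]

    def awake_line_count(self) -> int:
        """Lines forced to stay at full voltage (their leakage is the repair cost)."""
        return bin(self.bits).count("1")

    def within_power_budget(self, max_awake_lines: int) -> bool:
        """Repair is acceptable only while the extra awake lines fit the budget."""
        return self.awake_line_count() <= max_awake_lines


if __name__ == "__main__":
    mask = DrowsyLineMask(num_lines=8)
    mask.record_drowsy_failure(3)                       # e.g., a DTF found in line 3
    print(mask.lines_to_put_to_sleep(range(8)))         # [0, 1, 2, 4, 5, 6, 7]
    print(mask.within_power_budget(max_awake_lines=1))  # True
```

Because a failing line simply stays awake, no spare rows are consumed; the cost shows up only as leakage, which is why a budget check sits next to the mask.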
II. BACKGROUND
This section briefly introduces the drowsy memory technique and memory testing.
A. Drowsy-Cache Working Principles
In this paper, the cache line (block) [1] is adopted as the basic unit of the cache architecture, and multiple bytes in a cache line are accessed simultaneously. The cache is implemented with 6T SRAM cells and includes the corresponding peripheral circuits, such as the sense amplifier, address decoder, and precharge circuit. Magic is used to generate the layout in TSMC 0.18-µm technology, and HSpice is used for simulation. The diagram of a drowsy cache is shown in Fig. 1. For this 0.18-µm technology, the nominal Vdd is 1.8 V.
With the introduction of drowsy mode, two questions arise: how low the standby voltage can be, and how long the drowsy state has to be simulated. Answering these two questions yields the minimum standby voltage and the minimum simulation time for drowsy mode.
1) Data-Retention Voltage: Since leakage power decreases superlinearly as the standby voltage is reduced [1], the minimum standby voltage that still preserves the data stored in an SRAM cell, called the data-retention voltage (DRV), achieves the minimum leakage power. Here, we analyze the data-preservation limit of a low-voltage SRAM.
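One mechanism behind this superlinear drop, drain-induced barrier lowering (DIBL), can be illustrated with a textbook subthreshold model: lowering Vdd reduces both the voltage across an OFF transistor and, through DIBL, its current, so P = Vdd·I falls faster than Vdd. This is a minimal sketch with assumed parameters (DIBL coefficient eta = 0.1, slope factor n = 1.5); only Vt = 0.53 V and the 1.8-V/0.36-V supplies come from this paper, and the output is not HSpice data.

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q near 300 K, in volts


def off_leakage_current(vdd, eta=0.1, n=1.5, vt=0.53, i0=1.0):
    """Relative leakage of an OFF transistor (Vgs = 0) with its drain at vdd.

    Textbook subthreshold model with DIBL:
    I = I0 * exp((-Vt + eta*Vds) / (n*vT)) * (1 - exp(-Vds / vT)).
    eta, n, and i0 are assumed illustrative parameters.
    """
    vds = vdd
    return (i0 * math.exp((-vt + eta * vds) / (n * THERMAL_VOLTAGE))
            * (1.0 - math.exp(-vds / THERMAL_VOLTAGE)))


def leakage_power(vdd, **kw):
    """Leakage power P = Vdd * I_leak(Vdd) under the model above."""
    return vdd * off_leakage_current(vdd, **kw)


if __name__ == "__main__":
    normal, drowsy = 1.8, 0.36
    voltage_ratio = normal / drowsy
    power_ratio = leakage_power(normal) / leakage_power(drowsy)
    print(f"Vdd ratio {voltage_ratio:.1f}, leakage-power ratio {power_ratio:.0f}")
```

Setting eta = 0 removes the DIBL term, and the power ratio collapses to roughly the voltage ratio, which isolates DIBL as the source of the superlinear behavior in this toy model.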
The cell stability of a memory array is often characterized by the static-noise margin (SNM), where noiselike mismatches and disturbances are modeled as dc offsets [17]-[19]. When these dc offsets exceed the SNM of an SRAM cell, the cell switches falsely, i.e., it assumes a wrong logic value. SNM can be visualized by superimposing the voltage-transfer curves (VTCs) of the two cross-coupled inverters of an SRAM cell; its value is defined as the side of the largest square that fits between the two VTCs [19]. In Fig. 2, VT and VF denote the voltages of nodes T and F of the SRAM cell shown in Fig. 3. VTC_T denotes the VTC of the inverter whose input is T and output is F, while VTC_F denotes the VTC of the inverter whose input is F and output is T. When Vdd is 0.36 V, the resulting SNM is around 100 mV. When Vdd is reduced to 0.1 V, the VTCs degrade until the noise margin drops to zero; if Vdd is reduced further, the SRAM cell can no longer retain the stored data. However, the actual noise margin depends not only on Vdd but also on temperature, process variation, etc. Therefore, the standby voltage cannot be reduced all the way down to 0.1 V. In [20], it is found that a guard band of over 100 mV above the minimum voltage (the one with zero SNM) is sufficient to overcome these noise effects. In this paper, 0.36 V is used as the DRV for our 0.18-µm technology.
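The largest-square definition of SNM can be made concrete with a brute-force numerical sketch. It assumes a symmetric cell (both inverters share one VTC) and replaces the SPICE-swept VTCs of Fig. 2 with a hypothetical logistic curve whose steepness is set by a gain parameter, so its numbers are illustrative only and do not reproduce the 100-mV SNM quoted above. For each candidate corner height y0, it bisects for the largest square that fits inside the upper-left lobe of the butterfly plot.

```python
import math


def logistic_vtc(vin: float, vdd: float, gain: float) -> float:
    """Hypothetical inverter VTC: a logistic curve falling from ~vdd to ~0.

    A toy stand-in for SPICE-swept VTCs; gain sets the steepness.
    """
    return vdd / (1.0 + math.exp(gain * (vin / vdd - 0.5)))


def largest_square_at(f, y0: float, vdd: float, iters: int = 60) -> float:
    """Side of the largest axis-aligned square with bottom-left corner
    (f(y0), y0) inside the upper-left lobe {x >= f(y), y <= f(x), y >= x}.

    Returns -1.0 if not even a zero-size square fits at this corner.
    """
    x0 = f(y0)  # leftmost corner allowed by the curve x = f(y)

    def fits(s: float) -> bool:
        under_curve = y0 + s <= f(x0 + s)  # top-right corner below y = f(x)
        above_diagonal = x0 + s <= y0      # bottom-right corner stays in lobe
        return under_curve and above_diagonal

    if not fits(0.0):
        return -1.0
    lo, hi = 0.0, vdd  # invariant: fits(lo) is True, fits(hi) is False
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo


def static_noise_margin(vdd: float, gain: float, steps: int = 1000) -> float:
    """SNM of a symmetric cell (both inverters use the same VTC).

    By symmetry the two butterfly lobes are equal, so searching the
    upper-left lobe over a grid of corner heights y0 is enough.
    """
    def f(v: float) -> float:
        return logistic_vtc(v, vdd, gain)

    best = 0.0
    for i in range(steps + 1):
        best = max(best, largest_square_at(f, vdd * i / steps, vdd))
    return best


if __name__ == "__main__":
    for gain in (2.0, 10.0, 40.0):
        print(gain, round(static_noise_margin(1.8, gain), 3))
```

When the gain drops to 4 or below (a midpoint slope of magnitude at most 1), the two curves no longer enclose a lobe and the computed SNM is zero, qualitatively similar to the collapse of the noise margin as Vdd approaches 0.1 V.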
2) Drowsy Operation: To illustrate how a drowsy cache works, the SRAM cell in Fig. 3 is used as an example. The absolute value of both the p-transistor and n-transistor threshold voltages is 0.53 V, as specified in the technology file (TSMC SCN6M_SUBM). It is assumed that the cell contains logic "1" before being placed into drowsy state; hence, initially VT = 1.8 V and VF = 0 V. The memory cell goes through two phases to enter drowsy state, as illustrated in Figs. 3 and 4. Fig. 3 shows four steps: 1) Fig. 3(a) shows the initial state of the cell, when Vdd = 1.8 V is supplied; 2) when Vdd is reduced to 0.553 V, the voltage of T immediately follows it down to 0.553 V [Fig. 3(b)]; 3) when Vdd is reduced to 0.36 V, the voltage of T decreases very slowly because of the small leakage current [Fig. 3(c)]; and 4) the voltage of T reaches 0.36 V, and the cell enters its static drowsy state [Fig. 3(d)]. In Fig. 4, VT (VF) denotes the voltage of node T (F) of the SRAM cell in Fig. 3, and Vdd denotes the supply voltage of the cell. The numbered regions (1, 2) show the different phases of placing the cell into drowsy state. The boundary between phases 1 and 2 is defined by the Vdd value and is marked by the point (2.18e-08, 5.53e-01), i.e., 21.8 ns and 0.553 V.
In phase 1 (Fig. 4, region 1), the supply voltage Vdd is reduced but is still above the absolute value of the threshold voltage (VTH = −0.53 V) of P1. During this period, the gate-to-source voltage VGS of P1 (0 − Vdd) is less than VTH (−0.53 V), so transistor P1 is ON (P2 and N1 are OFF, and N2 is ON), and VT immediately follows Vdd down [Fig. 3(b)]. In this phase, VT tracks Vdd exactly and drops to 0.553 V within 1.8 ns. Note that Vdd starts to drop at 20 ns in Fig. 4. This phase corresponds to (a)→(b) in Fig. 3, where the memory cell begins to enter drowsy state.
In phase 2 (Fig. 4, region 2), the supply voltage is further reduced to the drowsy voltage (0.36 V). We have Vdd < |VTH|; hence, VGS > VTH, and transistor P1 is OFF (P2, N1, and N2 are OFF as well). At this time, all four transistors are in the subthreshold region. A leakage current flows from node T to node Vdd, as shown in Fig. 3(c). During this time, VT decreases slowly compared with the change in Vdd. This phase corresponds to Fig. 3(c) and (d): when VT finally reaches 0.36 V [Fig. 3(d)], the leakage current flowing through P1 is around zero, and the memory cell enters its static drowsy state. This phase is very slow because the leakage current flows through a turned-off transistor (P1). Note that the leakage current from T to ground is very small compared with that flowing from T to Vdd. According to our HSpice simulation, it takes 270 ns (i.e., until 290 ns) for the leakage current from node T to Vdd to reach zero. Although VT appears equal to Vdd much earlier (e.g., at 140 ns), current is still leaking from T to Vdd at that point, though it is extremely small.
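The ON/OFF condition of P1 used in phases 1 and 2 can be written out explicitly. With the gate of P1 at node F (0 V, since the cell stores "1") and its source at Vdd, the numbers from the text give:

```latex
\begin{align*}
V_{GS,P1} &= V_F - V_{dd} = -V_{dd}, \qquad V_{TH,P1} = -0.53~\text{V}\\
\text{P1 ON} &\iff V_{GS,P1} < V_{TH,P1} \iff V_{dd} > \lvert V_{TH,P1}\rvert = 0.53~\text{V}\\
\text{End of phase 1 } (V_{dd} = 0.553~\text{V}):&\quad V_{GS,P1} = -0.553~\text{V} < -0.53~\text{V} \Rightarrow \text{ON}\\
\text{Phase 2 } (V_{dd} = 0.36~\text{V}):&\quad V_{GS,P1} = -0.36~\text{V} > -0.53~\text{V} \Rightarrow \text{OFF (subthreshold)}
\end{align*}
```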
In summary, the phase 1 drowsy time (the time needed for a cell to enter the initial drowsy mode, where VT = 0.553 V) and the wakeup time (the time needed to charge the cell back to normal voltage) depend on the slope of Vdd. However, the time needed for a cell to enter "static" drowsy state (VT reduced exactly to the drowsy voltage) is much longer: in Fig. 4, it takes about 270 ns after Vdd begins to drop. To simplify the following discussion, we define a very early point in region 2, 5 ns after Vdd begins to drop, as the "early" drowsy state. Fortunately, to detect most faults in a drowsy cell, the cell only needs to enter the "early" drowsy state, which takes only a few nanoseconds, as shown in the following sections.
B. Memory Fault Modeling
1) Spot Defects:
Defects in SRAM memory chips can be categorized as global defects and local defects [13]. Global defects affect a large part of the silicon, while local defects affect only a small (local) area of an IC and are called SDs. SDs can be modeled as spots of extra, missing, or undesired material (resistance) and can cause undesired connections or disconnections in circuits. SDs can be introduced during any of the many steps of the IC fabrication process and are the primary test target, since they are much harder to detect than global defects. In this paper, only SDs are considered. Depending on their conductivities, SDs can be categorized into the following three groups [9].
1) Open: an extra resistance within a connection.
2) Short: an undesired resistive path between a node and Vdd/GND.
3) Bridge: an undesired resistive path (Rbr) between two connections which are not Vdd/GND, where 0 < Rbr ≤ ∞.
More than 22 defects must be considered when defect locations both between and within cells are taken into account. However, due to the symmetric structure of the 6T SRAM cell, it has been demonstrated in [9] that only a subset of these defects needs to be simulated, by exploiting complementary behavior, interchanged behavior, and interchanged complementary behavior.
2) Definition and Location of Open Faults:
Opens in an SRAM cell are categorized as opens within a cell (OC), opens at bit lines, and opens at word lines. As shown in Fig. 5, opens at locations OCx and OCxc show complementary behavior, so only defects at OCx need to be simulated. Table I gives a detailed description of these open defects.
3) Definition and Location of Short Faults: Short defects can be classified as shorts within a cell (SC), shorts at bit lines (SB), and shorts at word lines (SW). As shown in Table II, for example, a short at F shows complementary behavior to a short at T. SBs and SWs affect many cells; only the first affected cell is considered.
4) Definition and Location of Bridges:
Assume that a bridge can exist only between a pair of nodes located close to each other. Thus, all bridge faults can be classified as bridges within a cell and bridges between cells. Table III lists all possible bridge defects within a cell (denoted as BCx), while Fig. 6 illustrates the relative cell locations in a memory. All possible bridges between cells, which depend on the layout implementation, are listed in Table IV. Here, rBCCx denotes bridges between cells in the same row, cBCCx denotes bridges between cells in the same column, and dBCCx denotes bridges between diagonally adjacent cells.
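The row/column/diagonal classification of inter-cell bridges can be expressed as a simple rule on cell coordinates. The sketch assumes a plain (row, column) grid with no address scrambling, whereas the actual candidate list in Table IV depends on the layout; the function names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column) of a cell in the array


def bridge_class(a: Cell, b: Cell) -> Optional[str]:
    """Classify an inter-cell bridge by the relative position of its two cells.

    Returns "rBCC" (same row), "cBCC" (same column), or "dBCC" (diagonal
    neighbors). Returns None for non-adjacent pairs, since bridges are assumed
    to form only between nodes located close to each other. Assumes logical and
    physical positions coincide (no address scrambling).
    """
    d_row, d_col = abs(a[0] - b[0]), abs(a[1] - b[1])
    return {(0, 1): "rBCC", (1, 0): "cBCC", (1, 1): "dBCC"}.get((d_row, d_col))


def bridge_candidates(cell: Cell, rows: int, cols: int) -> Dict[str, List[Cell]]:
    """All neighbors of cell that could share an inter-cell bridge, by class."""
    found: Dict[str, List[Cell]] = {"rBCC": [], "cBCC": [], "dBCC": []}
    for r in range(max(0, cell[0] - 1), min(rows, cell[0] + 2)):
        for c in range(max(0, cell[1] - 1), min(cols, cell[1] + 2)):
            kind = bridge_class(cell, (r, c))
            if kind is not None:
                found[kind].append((r, c))
    return found


if __name__ == "__main__":
    # {'rBCC': [(0, 1)], 'cBCC': [(1, 0)], 'dBCC': [(1, 1)]}
    print(bridge_candidates((0, 0), rows=2, cols=2))
```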
5) Fault Notation:
To describe the fault behaviors involving SRAM cells, fault primitives (FPs) with compact notation are introduced [9] . Each FP represents a fault behavior, and all FPs can be divided into the following three categories.
1) S/F/R: This FP describes a fault involving a single cell.
Here, S is the sensitizing operation; S ∈ {dr0, dr1, 0, 1, w0, w1, w↑, w↓, r0, r1, ∀}, where dr0 (dr1) denotes a drowsy operation on a cell holding logic "0" ("1"). Furthermore, 0/1 denote the logic values "0" and "1", respectively; w0/w1 and r0/r1 denote write and read operations; and w↑ (w↓) denotes an up (down) transition write operation. If the fault behavior of S appears only after time T, the sensitizing operation is denoted as S^T. Notation ∀ can be "0" or "1". F describes the fault behavior of the cell, F ∈ {0, 1, ↑, ↓, X}, where ↑ (↓) denotes an up (down) transition and "X" denotes an undefined logic value. R denotes the output value of the SRAM cell if the sensitizing operation applied to the cell is a read operation; R ∈ {0, 1, X, −}, where "−" means that no output is available. For example, when S is a write operation, R is denoted as "−".
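2) Sa; Sv/F/R: This FP involves two cells, an aggressor cell (a-cell) and a victim cell (v-cell), where Sa and Sv are the sensitizing operations (or states) of the a-cell and the v-cell, respectively; Sa, Sv ∈ {…, ∀}, whereby ∀ is the don't-care value, ∀ ∈ {0, 1}. The definitions of F and R are the same as those of S/F/R above.
3) wF (weak fault): A weak fault is only partially sensitized by a read/write operation [9]; e.g., a defect that causes only a small disturbance within the noise margin cannot be detected by a single operation. In other words, in the presence of a wF, all operations pass correctly [9].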
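To make the FP notation concrete, the sketch below encodes S/F/R and the two-cell form as small Python data classes that validate their fields against the value sets defined above and print the compact notation. The class names, the needs_time flag for S^T, and the _{i,j} suffix for cell indices are assumptions made for this illustration; angle brackets mark coupling faults within one cache line, following the convention of Section III-B.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Value sets from the S/F/R definition; "−" (U+2212) marks "no output".
SENSITIZING = {"dr0", "dr1", "0", "1", "w0", "w1", "w↑", "w↓", "r0", "r1", "∀"}
FAULT_EFFECTS = {"0", "1", "↑", "↓", "X"}
READ_OUTPUTS = {"0", "1", "X", "−"}


@dataclass(frozen=True)
class FP1:
    """Single-cell fault primitive S/F/R (S^T when the fault needs time T)."""
    s: str
    f: str
    r: str = "−"
    needs_time: bool = False

    def __post_init__(self):
        if self.s not in SENSITIZING or self.f not in FAULT_EFFECTS:
            raise ValueError(f"invalid FP {self.s}/{self.f}")
        if self.r not in READ_OUTPUTS:
            raise ValueError(f"invalid read output {self.r}")
        if not self.s.startswith("r") and self.r != "−":
            raise ValueError("R is only defined when S is a read")

    @property
    def is_drowsy(self) -> bool:
        return self.s.startswith("dr")

    def __str__(self) -> str:
        t = "^T" if self.needs_time else ""
        return f"{self.s}{t}/{self.f}/{self.r}"


@dataclass(frozen=True)
class FP2:
    """Two-cell fault primitive Sa; Sv/F/R with optional (aggressor, victim) cells."""
    sa: str
    victim: FP1
    cells: Optional[Tuple[int, int]] = None
    same_line: bool = False  # True: coupling within one cache line, shown as <...>

    def __post_init__(self):
        if self.sa not in SENSITIZING:
            raise ValueError(f"invalid aggressor operation {self.sa}")

    def __str__(self) -> str:
        body = f"{self.sa}; {self.victim}"
        if self.same_line:
            body = f"<{body}>"
        if self.cells is not None:
            body += "_{%d,%d}" % self.cells
        return body


if __name__ == "__main__":
    print(FP1("dr1", "X"))                              # dr1/X/−
    print(FP2("dr0", FP1("dr0", "1"), cells=(2, 0)))    # dr0; dr0/1/−_{2,0}
```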
III. DROWSY-FAULT MODELING
Based on the cache implementation of Section II, we simulated all possible SDs in both normal mode and drowsy mode. FFM1 (FFM with FP1s) and FFM2 (FFM with FP2s) are then derived from the simulation results.
A. FFM1 Fault Class
The simulation results of FFM1 are listed in Tables V-IX. By default, all SDs involving one cell are simulated at cell zero in Fig. 6. Hence, in these tables, FP1s without a subscript give the simulation result for cell zero, while those with subscript x (S/F/R_x) give the observed fault behavior of cell x.
In Tables V-IX, column "Name" identifies each SD by its type and position. For bridge and short faults, the notation (A-B) in the "Name" column means that a bridge (short) exists between nodes A and B; for each open fault, the open position can be found in Fig. 5. The "Resistance" column denotes the different resistance regions, in increasing order of resistance (Table V). The resistance range of a given region may also differ from defect to defect. Take BC1 and BC2 as an example: region 2 of BC1 is 40 to 8000 kΩ, while region 2 of BC2 is 2 to 5 kΩ. The "Fault behavior" column shows the fault behavior of each SD, where "−" denotes that there is no fault behavior for the given setting, and wF denotes that the defect causes only a small disturbance and does not affect the cell function. Column "Comp. behavior" shows the complementary behavior of a specific fault, while column "Class" indicates whether the defect involves one cell (FP1) or two cells (FP2). Finally, column "FFM" gives the name of the FFM, as defined in this section or in [9]. The FP2s mentioned in Table V are explained in the next section.
The simulation time for each drowsy operation (the time a cell spends in drowsy mode) ranges from 5 ns ("early" drowsy state in Fig. 4) to 270 ns ("static" drowsy state in Fig. 4). We found that, for most defects, drowsy operation times anywhere between 5 and 270 ns give the same simulation results, which means that the "early" drowsy state is sufficient for simulation. Only the faults marked with a dagger (†) or double dagger (‡) in Tables V-X require a long test-application time, and this is taken into account in the march-algorithm design later. As a result, the time for each drowsy operation can be 5 ns (unless data-retention faults are involved), which considerably reduces the test time. To save space, the simulation results for different drowsy times are not shown in Tables V-X.
New fault behaviors appear with the introduction of drowsy state. Take Fig. 7 as an example, and assume that there is an extra resistance (25 kΩ) between T and F. When the cell operates at the standard voltage, current flows from Vdd through T and F to GND [Fig. 7(a)]. Due to the large resistance, the voltage of T is 1.36 V, and hence the cell retains its value (logic "1"). When the cell enters drowsy state, all four transistors are OFF, and the voltages of T and F become equal; at this point, no current path exists, and the cell can no longer retain its value when woken up. This can be observed in the waveform of Fig. 8. Thus, when bridge defect BC1 (25 kΩ) exists, the cell operates properly at the standard voltage, but once it enters drowsy mode, the voltages of both T and F become the same (0.226 V), and when it is accessed after wakeup, the cell returns an undefined state (0.58 V). This fault is represented as dr1/X/− in Table V, and this fault model is defined here as the drowsy undefined state fault (DUF).
Another new fault behavior introduced by the drowsy technique is the drowsy data-retention fault (DDRF), where a drowsy operation applied to a cell changes the cell value to its inverse after some time. Take the bridge fault BC2 in Table V as an example, and assume that a resistance (bridge fault) of 40 kΩ exists between nodes T0 and BL0 in cell zero of Fig. 6. The cell is written logic "0" (at time ∼104 ns). When it enters drowsy mode, node T0 is charged through the bridge when BL0 is precharged to 1.8 V. After a certain delay (8 ns, as shown in Fig. 9), the cell value is inverted (at time ∼130 ns), so when the cell is woken up, it contains logic "1" (Fig. 9). This fault can be modeled as dr0^T/1/−, and it does not occur if the power-supply voltage is not reduced to the drowsy voltage. We found that as the resistance of a bridge defect increases, the simulation time required to observe a DDRF increases up to 2 µs. The drowsy operation times (i.e., test-application times) needed to detect all possible DDRFs are indicated in the comments of Tables V-IX. Based on the fault simulation results for opens, shorts, and bridges within a single cell, the following new drowsy FP1s in FFM1 are derived.
1) Drowsy transition fault (DTF): A drowsy operation applied to a cell with value x changes the value to x̄ when the cell is woken up. DTF consists of two FPs: dr0/1/− and dr1/0/−. It can be caused by: a) bridges between one node of a cell and bit line BL/BL̄ or word line WL within a cell (BC2, BC3, and BC4); b) a broken gate of the pull-up transistor on the true side (OC5); c) shorts between one node of a cell and GND/Vdd (SC1, SC2); d) bridges between one node of a cell and bit lines BL/BL̄ (rBCC3, rBCC4); and e) a bridge between one node of a cell and its adjacent word line (cBCC3).
2) DUF: A drowsy operation performed on a cell brings the cell into an undefined state (dr0/X/−, dr1/X/−). It can be caused by a bridge between the T and F nodes of a cell (BC1).
Three new fault models (DTF, DUF, and DDRF) are introduced by bridging faults within a cell in drowsy state (Table V). For opens within a cell, only open defects at the source/drain of the pull-up transistors (OC1 and OC2) and OC5, OC11, and OC12 introduce new fault behaviors (DDRF and DTF in Table VI). The cell needs to be placed into drowsy mode for at least 2 ms to observe the DDRF behavior of these opens (Table VI). For shorts within a cell, drowsy state introduces one new fault model, DTF (Table VII). Furthermore, bridge defects between cells in the same row/column also give DTF and DDRF (Tables VIII and IX). Note that all of the nondrowsy faults listed in Tables V-IX, except CFiw and CFdrd, can be found in [9]. The fault behaviors of CFiw and CFdrd are discussed with FFM2.
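The three single-cell drowsy fault models can be summarized as a behavioral wake-up function, which also shows why the drowsy time of a test matters: a DDRF appears only if the cell sleeps for at least T. This is a minimal sketch; the 8-ns default for T is taken from the BC2 example above and is not a general value, and for simplicity a DTF is assumed to flip either stored value, whereas a particular defect may exhibit only one of its two FPs.

```python
from typing import Optional, Union

Readout = Union[int, str]  # 0, 1, or "X" (undefined level)


def wake_up_value(stored: int, fault: Optional[str], drowsy_ns: float,
                  ddrf_delay_ns: float = 8.0) -> Readout:
    """Value read from a cell after a drowsy period, under a behavioral fault model.

    fault is one of:
      None:   fault-free cell
      "DTF":  drowsy transition fault, drx/x̄/-
      "DUF":  drowsy undefined state fault, drx/X/-
      "DDRF": drowsy data-retention fault, drx^T/x̄/-; the cell flips only if it
              stays drowsy for at least ddrf_delay_ns (the time T)
    """
    if stored not in (0, 1):
        raise ValueError("stored must be 0 or 1")
    if fault is None:
        return stored
    if fault == "DTF":
        return 1 - stored  # simplification: both FPs present
    if fault == "DUF":
        return "X"         # in hardware, needs the window detector to observe
    if fault == "DDRF":
        return 1 - stored if drowsy_ns >= ddrf_delay_ns else stored
    raise ValueError(f"unknown fault model {fault!r}")


def detected(stored: int, fault: Optional[str], drowsy_ns: float, **kw) -> bool:
    """A read after wake-up exposes the fault if it returns anything but stored."""
    return wake_up_value(stored, fault, drowsy_ns, **kw) != stored


if __name__ == "__main__":
    # A 5-ns "early" drowsy period exposes DTF and DUF but misses this DDRF.
    for fault in ("DTF", "DUF", "DDRF"):
        print(fault, detected(0, fault, drowsy_ns=5.0), detected(0, fault, drowsy_ns=8.0))
```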
B. FFM2 Fault Class
Cells in the same row, column, or diagonal are simulated to derive FFM2. To save space, their interchanged (and interchanged complementary) behaviors are not listed in Tables VIII-X. The fault notation Sa; Sv/F/R_{i,j} indicates that cell i is the aggressor (a-cell) and cell j is the victim (v-cell).
New coupling faults also appear with the introduction of drowsy operations. Take the rBCC2 defect (a bridge between two cells in the same row, Table VIII) as an example. In Fig. 6, assume that the resistance between F0 and T2 is 15 kΩ and that both cells originally store logic "0". When both cells enter drowsy state, the voltage of F0 is pulled down through the resistive path between F0 and T2. When cell zero is woken up, both of its nodes (T0 and F0) are charged; but since the voltage of T2 is zero at this time and T2 is bridged to F0, node F0 charges much more slowly. As a result, cell zero manifests its defect when it is woken up. This fault is denoted as dr0; dr0/1/−_{2,0} and is called a coupling DTF (CFdtf). The simulation results are shown in Fig. 10, where VT0/VF0 and VT2/VF2 denote the voltages of the T/F nodes of cell zero and cell two, respectively. Due to the bridge between nodes F0 and T2, VF0 is forced to around 0 V when cells zero and two are placed into drowsy mode, and when cell zero is woken up, its value is inverted. Note that in normal mode, the fault does not occur because the resistance between F0 and T2 is large.
Based on simulations of SDs between two neighboring cells in the same row (or column, or diagonal), the following new drowsy FFM2s have been derived.
1) Coupling DTF (CFdtf): … It can be caused by defects such as: a) bridges between two cells in the same row (rBCC1 and rBCC2); b) bridges between two adjacent cells in the same column (cBCC1 and cBCC2); and c) bridges between nodes of two adjacent cells in the same diagonal (dBCC1 and dBCC2). Faults 0; dr0/1/−, 0; dr1/0/−, 1; dr0/1/−, and 1; dr1/0/− can exist only in data caches and between adjacent cache lines belonging to different subbanks of instruction caches, because in instruction caches all cache lines within a subbank enter drowsy mode simultaneously.
2) Coupling drowsy write destructive fault (DWDF) (CFdwdf): A write operation applied to the a-cell causes a transition in the v-cell, which is in drowsy mode. CFdwdf consists of four FPs: w1; dr0/1/−, w0; dr0/1/−, w1; dr1/0/−, and w0; dr1/0/−. It can be caused by bridges between nodes of two adjacent cells in the same column (cBCC1 and cBCC2) or in the same diagonal (dBCC1 and dBCC2). Again, this fault can exist only in data caches, because all cache lines within an instruction cache enter drowsy mode at the same time.
3) Coupling drowsy retention fault (DRF) (CFdrf): When the a-cell is in a specific state (drowsy or nondrowsy), a drowsy operation performed on the v-cell causes a transition in the v-cell after a long drowsy time T. CFdrf consists of eight FPs: …
There are two nondrowsy faults, CFiw and CFdrd, that are not listed in [9]; their fault behaviors are discussed as follows.
1) Incorrect write coupling fault (CFiw): When a write operation is applied to the a-cell, the v-cell in the same cache line fails to undergo a transition. It consists of four FPs: <w0; w↓/1/−>, <w1; w↓/1/−>, <w1; w↑/0/−>, and <w0; w↑/0/−>. CFiw can be caused by bridges involving bit lines (rBCC3, rBCC4, rBCC5, and rBCC6).
This fault also occurs when the a-cell holds a static logic value while a write operation is performed on the v-cell (in this case, the two cells are necessarily in different cache lines). This case also consists of four FPs: … be caused by bridges within a cell (BC2 and BC3). Obviously, this case occurs when the a-cell and the v-cell are in different cache lines. This fault can also occur when the a-cell and the v-cell are in the same cache line; the fault effects can then be represented by <r0; r0/↑/0>, <r0; r1/↓/1>, <r1; r1/↓/1>, and <r1; r0/↑/0>. This can be caused by rBCC3 and rBCC4. The above two (new) nondrowsy coupling faults can also be detected by the march algorithm proposed in the next section. In Table VIII, several coupling faults are marked with the notation <>_{i,j}. This notation means that i, j = 0, 2 or i, j = 2, 0, since these are coupling faults within the same cache line. Similarly, coupling faults with <>_{i,j} in Tables IX and X
IV. MARCH ALGORITHM AND BISR
Based on the fault models, we first use a voltage-window-detection circuit (presented in Section IV-A) to identify defects that result in an undefined state. A few simplification rules are then developed to reduce the number of faults that must be handled. A drowsy march algorithm is proposed to detect all traditional and drowsy faults. Finally, a BISR circuit is designed to tolerate drowsy defects in drowsy-cache devices.
A. March Drowsy Word-Oriented Memory (DWOM)
Since caches are organized in cache lines, the cache we implemented can be treated as a word-oriented memory (WOM) for testing purposes. A WOM contains B bits per word, where B is greater than two and is usually a power of two. Many memory-test algorithms are designed for bit-oriented memories, i.e., read and write operations access only one bit of the memory. It has been shown that WOMs can be tested by repeatedly applying a test for bit-oriented memories, using a different data background in each iteration [21]. The new march algorithm, referred to as March DWOM, is given in Table XI. To describe March DWOM, the traditional march notation is used. A complete march test consists of a finite sequence of march elements [22] and is delimited by a pair of braces "{· · ·}". A march element, denoted by a pair of parentheses "(· · ·)", is a finite sequence of operations applied to every word in the memory, with all operations applied to one word before proceeding to the next. A march element can be applied in two address orders: an increasing (⇑) address order (from address 0 to address n − 1) or a decreasing (⇓) address order (from address n − 1 to address 0). The test algorithm shown in Table XI assumes that SDs exist only within one cell or between two adjacent cells, and under this assumption, it can detect all traditional and drowsy faults.
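The march notation can be mirrored in code by representing a march test as a list of (address order, operations) pairs and expanding it into the sequence of operations it applies. The sketch below does this and also maps march data 0/1 onto a B-bit data background for a WOM. The "*" marker for whole-memory operations and all function names are assumptions, and the classic MATS+ test is used as the example because Table XI is not reproduced here.

```python
from typing import List, Optional, Sequence, Tuple

# A march element: (address order, operations). Orders: "⇑" ascending,
# "⇓" descending, "⇕" either (taken as ascending here). "*" is a hypothetical
# marker for whole-memory operations such as a drowsy period.
MarchElement = Tuple[str, Sequence[str]]


def expand_march(test: Sequence[MarchElement], n: int) -> List[Tuple[Optional[int], str]]:
    """Flatten a march test into the (address, operation) stream it applies.

    Every operation of an element is applied to one word before moving to the
    next word; whole-memory operations get address None.
    """
    steps: List[Tuple[Optional[int], str]] = []
    for order, ops in test:
        if order == "*":
            steps.extend((None, op) for op in ops)
            continue
        if order in ("⇑", "⇕"):
            addresses = range(n)
        elif order == "⇓":
            addresses = range(n - 1, -1, -1)
        else:
            raise ValueError(f"unknown address order {order!r}")
        for addr in addresses:
            steps.extend((addr, op) for op in ops)
    return steps


def data_word(bit: int, background: int, width: int) -> int:
    """Map march data 0/1 to a B-bit word: the background or its complement."""
    mask = (1 << width) - 1
    return background & mask if bit == 0 else ~background & mask


# MATS+ = {⇕(w0); ⇑(r0,w1); ⇓(r1,w0)}, a classic 5n test used only as an example.
MATS_PLUS = [("⇕", ["w0"]), ("⇑", ["r0", "w1"]), ("⇓", ["r1", "w0"])]

if __name__ == "__main__":
    stream = expand_march(MATS_PLUS, n=4)
    print(len(stream), stream[:3])
```

Counting the expanded stream gives the test length directly (5n for MATS+), which is how figures such as the 14n of March SRD are obtained.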
In total, the proposed March DWOM contains 16 march elements, four of which are drowsy operations. The major march elements are explained as follows. M2 performs four operations on each word: 1) a read operation expecting background data zero; 2) a write operation that writes background data one into the addressed word; 3) a read operation expecting background data one; and 4) a write operation that writes background data zero into the word. In M3, two consecutive read operations are performed on each word, both expecting background data zero. In M4 (and also M9, M12, and M15), the entire cache memory is driven into drowsy mode for T time units, where T is the longest time required by any drowsy fault; in this paper, T is set to 2 ms, the longest drowsy time required by the open defects within a cell (Table VI). In M11, all words with odd (even) addresses are written with background data zero (one), and then the entire cache memory is driven into drowsy mode in M12. The remaining march elements can be explained similarly. Note that the concept of memory background data can be found in [13].
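The role of the drowsy element M4 can be illustrated by running M2, M3, and M4 on a small bit-oriented memory with an injected DTF. The initialization element M1 and the read-back element M5 below are hypothetical (Table XI is not reproduced here), as are the ascending address orders, and the drowsy period is modeled as an instantaneous sleep-and-wake step.

```python
from typing import Dict, List, Sequence, Tuple

# ("⇑" | "⇓", operations) per word, or ("*", ["dr"]) for a whole-memory
# drowsy period (hypothetical marker).
Element = Tuple[str, Sequence[str]]


def run_march(n: int, elements: Sequence[Element],
              dtf: Dict[int, int]) -> List[Tuple[int, int, str]]:
    """Run bit-oriented march elements on an n-cell memory with injected DTFs.

    dtf maps a cell address to the value it loses on a drowsy period, e.g.
    {2: 0} models dr0/1/- at cell 2. Returns the failing reads as
    (element index, address, operation).
    """
    memory = [0] * n  # initial content; overwritten by the first write element
    failures: List[Tuple[int, int, str]] = []
    for idx, (order, ops) in enumerate(elements):
        if order == "*":
            # Sleep and wake up: each faulty cell holding its bad value flips.
            for addr, bad_value in dtf.items():
                if memory[addr] == bad_value:
                    memory[addr] = 1 - bad_value
            continue
        addresses = range(n) if order == "⇑" else range(n - 1, -1, -1)
        for addr in addresses:
            for op in ops:
                kind, bit = op[0], int(op[1])
                if kind == "w":
                    memory[addr] = bit
                elif memory[addr] != bit:  # a read returned the wrong value
                    failures.append((idx, addr, op))
    return failures


# M2-M4 follow the text; M1 (initialization) and M5 (read-back) are hypothetical.
ELEMENTS = [
    ("⇑", ["w0"]),                    # M1: assumed initialization
    ("⇑", ["r0", "w1", "r1", "w0"]),  # M2 (address order assumed)
    ("⇑", ["r0", "r0"]),              # M3
    ("*", ["dr"]),                    # M4: drowsy for T, then wake up
    ("⇑", ["r0"]),                    # M5: assumed read-back element
]

if __name__ == "__main__":
    print(run_march(4, ELEMENTS, {2: 0}))  # [(4, 2, 'r0')]
```

The DTF (dr0/1/− at cell 2) escapes M2 and M3 and is caught only by the read that follows the drowsy element. A cell with dr1/0/− would escape this sequence, because every cell holds 0 when M4 is applied; sensitizing such a fault requires a drowsy element applied while the cell holds 1.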
FFMs with undefined-state outputs (X), i.e., output voltages between the high and low logic levels, can also be detected by the proposed march algorithm with the voltage-window-detection circuit shown in Fig. 11. In Fig. 11, A and B are both operational amplifiers, and the reference voltages are $V_{\mathrm{ref1}} = \frac{R_3}{R_1 + R_2 + R_3} V_{dd}$ and $V_{\mathrm{ref2}} = \frac{R_2 + R_3}{R_1 + R_2 + R_3} V_{dd}$. The output $V_{\mathrm{out}}$ is high when $V_{\mathrm{ref1}} < V_{\mathrm{in}} < V_{\mathrm{ref2}}$ and low otherwise. Hence, by choosing R1, R2, and R3 so that this window covers the undefined voltage range, the circuit can detect undefined states.
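The window condition can be checked numerically. The sketch below assumes ideal comparators and a ladder V_dd, R1, R2, R3, ground; the function names and the component values in the example (R1 = R2 = R3 = 1 kΩ, V_dd = 1.2 V) are hypothetical and chosen only to show the arithmetic.

```python
from typing import Tuple


def window_refs(r1: float, r2: float, r3: float, vdd: float) -> Tuple[float, float]:
    """Reference voltages of the R1-R2-R3 ladder: (V_ref1, V_ref2)."""
    total = r1 + r2 + r3
    v_ref1 = r3 / total * vdd         # fraction of V_dd dropped across R3
    v_ref2 = (r2 + r3) / total * vdd  # fraction of V_dd dropped across R2 + R3
    return v_ref1, v_ref2


def is_undefined(v_in: float, r1: float, r2: float, r3: float, vdd: float) -> bool:
    """Ideal window comparator: V_out is high iff V_ref1 < V_in < V_ref2."""
    v_ref1, v_ref2 = window_refs(r1, r2, r3, vdd)
    return v_ref1 < v_in < v_ref2


if __name__ == "__main__":
    # Hypothetical values: equal 1-kOhm resistors and V_dd = 1.2 V.
    for v in (0.1, 0.6, 1.1):
        print(v, is_undefined(v, 1e3, 1e3, 1e3, 1.2))
```

With equal resistors the window spans V_dd/3 to 2V_dd/3 (0.4 V to 0.8 V here), so an output of 0.6 V is flagged as undefined while 0.1 V and 1.1 V are not.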
In the following discussion, $\mathrm{SD}_i$ denotes the FFM of an SD described in Section II, where SD ∈ {BC1, . . . , dBCC2} and i ∈ {I, II, . . .} denotes the resistance region. For example, $\mathrm{BC1}_{\mathrm{I}}$ denotes the FFM of the BC1 defect in resistance region I, which is an undefined-state fault [9].
B. Fault Model Simplification
Section III gives a detailed description of the FFMs of each SD in different resistance regions. However, the complexity of our march test algorithm can be greatly reduced by the following simplification rules. 16. Note that all combinations of logic values required for each pair of a-cell and v-cell can be supported by these four march elements.
6) All faults in data caches and instruction caches are detected. As mentioned before, $\mathrm{CF}_{dwdf}$s ($wx; drz/z/-$), some of the $\mathrm{CF}_{dtf}$s ($x; drz/z/-$), and some of the $\mathrm{CF}_{drf}$s ($x; drz_T/z/-$) constitute the major difference between the fault behaviors of data caches and instruction caches. This is because memory cells within the same instruction subbank share one drowsy signal and are therefore always in the same mode, so they do not exhibit the above faults. However, adjacent cache lines belonging to different instruction subbanks can still exhibit these faults and must be considered separately. Fortunately, based on simplification Rule 1), SDs with $\mathrm{CF}_{dwdf}$ are detected by the march operations for $\mathrm{CF}_{st}$; based on simplification Rules 3) and 4), all $\mathrm{CF}_{dtf}$s and $\mathrm{CF}_{drf}$s discussed here can be detected by ($drx; drz_T/z/-$). Consequently, the faults discussed in this item can be detected simply by driving the entire data cache and instruction cache into drowsy mode.
In conclusion, March DWOM can detect all the faults in data caches and instruction caches.
D. Built-In Self-Repair
It is found that some defects manifest themselves only in drowsy mode. For example, in resistance regions 4 and 5 of the BC3 defect in Table V, the cell works properly in normal mode, but its value is inverted when it is placed into drowsy mode. Instead of discarding a chip when such a fault is detected, we can still use it if the defect manifests itself only in drowsy mode.
The basic idea is that, during test mode, a drowsy_mask[i] register bit is set to "0" if cache line i contains a fault that manifests itself only in drowsy mode. This can be done in march elements M5, M10, M13, and M16 in Table XI. During normal operation of the cache, the drowsy-cache controller checks drowsy_mask[i] when issuing the drowsy control signal to cache line i. This check requires only an AND gate between the drowsy control signal and drowsy_mask[i], so a line whose mask bit is "0" is never placed into drowsy mode. The BISR architecture is shown in Fig. 12.
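The mask-setting step and the AND-gate check can be summarized behaviorally. This is a sketch, not the hardware of Fig. 12: the function names and the set-based bookkeeping of failing lines are assumptions, and normal-mode failures are simply rejected because the BISR scheme targets drowsy-only defects.

```python
from typing import List, Set


def build_drowsy_mask(n_lines: int, normal_fail: Set[int], drowsy_fail: Set[int]) -> List[int]:
    """drowsy_mask[i] = 0 if line i fails only in drowsy mode, otherwise 1.

    normal_fail: lines that fail the normal-mode march elements.
    drowsy_fail: lines that fail the reads after drowsy elements (M5, M10, M13, M16).
    """
    if normal_fail:
        # Only drowsy-only defects are tolerated; normal-mode faults are not repaired here.
        raise ValueError(f"normal-mode failures on lines {sorted(normal_fail)}")
    return [0 if i in drowsy_fail else 1 for i in range(n_lines)]


def data_cache_drowsy(request: List[int], drowsy_mask: List[int]) -> List[int]:
    """One AND gate per line: line i enters drowsy mode only if request[i] AND mask[i]."""
    return [r & m for r, m in zip(request, drowsy_mask)]
```

For example, if lines 2 and 4 of a six-line data cache fail only after drowsy elements, the mask becomes [1, 1, 0, 1, 0, 1], and those two lines stay in normal mode whatever the controller requests.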
The BISR architecture differs for data caches and instruction caches. For data caches, the drowsy control circuit has a register bit (i.e., drowsy_mask[i]) for each cache line. Instruction caches are divided into several subbanks to implement drowsy control, and each subbank needs only one drowsy signal. To utilize the BISR feature, each cache line within a subbank is also connected to a drowsy_mask[i] register, as shown in Fig. 12. With this two-level drowsy control, an instruction-cache subbank can still be placed into drowsy mode even when one of its cache lines fails in drowsy mode: all cache lines of the subbank enter drowsy mode except the faulty one. Therefore, this BISR architecture works well for both data caches and instruction caches.
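For instruction caches, the two-level control combines each line's mask bit with the drowsy signal shared by its subbank. The following sketch uses a hypothetical function name and assumes every subbank holds the same number of lines, with line i belonging to subbank i // lines_per_subbank.

```python
from typing import List


def icache_line_drowsy(subbank_drowsy: List[int], drowsy_mask: List[int],
                       lines_per_subbank: int) -> List[int]:
    """Two-level control: line i follows its subbank's signal unless its mask bit is 0."""
    return [subbank_drowsy[i // lines_per_subbank] & m
            for i, m in enumerate(drowsy_mask)]
```

With two four-line subbanks, subbank 0 drowsy and subbank 1 active, and line 1 masked, only lines 0, 2, and 3 enter drowsy mode. A masked line stays in normal mode and keeps its full leakage, which is why the scheme is acceptable only while the power budget is not exceeded.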
V. CONCLUSION AND FUTURE WORK
In this paper, we implemented a fully functional drowsy SRAM cache, including peripheral circuits such as the address decoder and sense amplifiers, to investigate the fault behaviors of SDs under the standby voltage. Assuming that SDs can exist only within a cell or between two adjacent cells in the same row, column, or diagonal, we simulated all possible SDs over different resistance regions (from 0 to ∞ Ω) in normal mode and drowsy mode separately. Six new faults (DTF, DUF, DDRF, $\mathrm{CF}_{dtf}$, $\mathrm{CF}_{dwdf}$, and $\mathrm{CF}_{drf}$) appear with the introduction of drowsy operations. A data-background-based march algorithm called March DWOM has been developed, and it has been greatly simplified by a set of simplification rules similar to fault collapsing for random-logic circuits [12]. A voltage-window-detection circuit is used to detect faults with undefined states. With this circuit, March DWOM detects all SDs in the drowsy cache we implemented; thus, it achieves full fault coverage for the drowsy-fault models identified in this paper and for the traditional fault models developed in [9]. A BISR technique has also been proposed to tolerate cache lines with drowsy defects, as long as the power budget is not exceeded. Our future work will focus on: 1) refining the drowsy-fault models using more advanced fabrication technologies (e.g., 0.09-µm technology) and industrial SRAM circuits; 2) investigating more powerful march algorithms to further reduce the test application time; 3) examining drowsy-fault behavior under process variation; and 4) adding a control bit to each block of cache lines (instead of to each cache line) to trade off yield against hardware overhead.
