In this paper we address the test scheduling problem for built-in self-tested (BISTed) 
Introduction
Embedded memories, in particular embedded SRAMs (e-SRAMs), tend to consume most of the silicon area in today's systems-on-chip (SoCs), ranging from register files as small as 64 bits to larger caches with sizes of hundreds of kilobits or even megabits [2]. Because of their extreme density, e-SRAMs are more prone to manufacturing defects than other types of on-chip circuitry (e.g., standard cells), and it is important to test them thoroughly to ensure an acceptable SoC yield. Testing the hundreds of e-SRAM instances on a chip efficiently and effectively is therefore a major challenge for SoC system integrators [3]. On the one hand, we would like to test more e-SRAMs in parallel to reduce the total testing time and hence the SoC test cost. On the other hand, test power becomes a major concern because power consumption in test mode is usually higher than in functional mode [17]. Efficient power-constrained test scheduling techniques (e.g., [12]) therefore play a key role in reducing e-SRAM test cost.
Most prior work in test scheduling assumes constant power consumption during the entire test process. Although simple and effective for logic testing, this model is overly pessimistic for e-SRAM testing when data retention faults (DRFs) are considered [1]. DRFs model the defects in SRAM bit cells that fail to retain a stored logic value. The most common retention test method is simply to load a known value into the cell, wait for a period of time (up to hundreds of milliseconds [2]), and read it out. During the two DRF pause times (for the retention test of both logic '0' and logic '1'), no read/write operations are performed and hence the memory consumes near-zero test power. By taking this property into account, Wang et al. [1] proposed a "retention-aware" test power model for testing built-in self-tested (BISTed) e-SRAMs, in which each e-SRAM test is represented by three rectangles A, B, and C separated by intervals $T_{AB}$ and $T_{BC}$, as shown in Figure 1. The empirical analysis given in [1] shows that this model may significantly reduce the total e-SRAM testing time. However, automatic test scheduling algorithms are not provided in that work. Therefore, in this paper, we propose "retention-aware" test scheduling techniques for e-SRAMs based on the test power model presented in [1]. Experimental results show that our approach significantly reduces the total testing time under given power constraints.
The remainder of this paper is organized as follows. Section 2 reviews the related work in this domain. The detailed "retention-aware" test scheduling algorithms are presented in Section 3. Next, Section 4 presents our experiments on four test cases. Finally, Section 5 concludes this paper.
Prior Work
Related Work in DRF Tests: As discussed in [2], retention testing needs to consider the slow process corner case, whose leakage (responsible for the loss of the stored logic value) actually becomes slower from 130nm to 90nm. Because of this, the pause time for testing DRFs does not decrease significantly with increasing chip operational frequency, and hence the testing time for DRFs dominates the total e-SRAM testing time. To address this problem, many design-for-test (DFT) techniques have been proposed to completely remove the pause time from the test flow by introducing customized circuitry [2, 10, 14, 15]. Although effective, these DFT techniques often come with a high cost in terms of hardware/performance overhead and design effort. Moreover, the memory compilers supplied by memory vendors often do not provide the features needed to apply these DFT techniques. As a result, in this paper we assume the DRF tests are applied in the traditional way with an extended wait period.
Related Work in Test Scheduling: Many test scheduling techniques have been proposed in the literature [4, 6, 7, 9, 13, 17] (to name only a few). In particular, [4, 7] considered power constraints in their work. The above work, however, mainly targets the test scheduling of logic cores (usually scanned), and one of its design aims is an efficient test access mechanism (TAM) architecture that links the test source/sink to the core under test. e-SRAM tests, in contrast, are usually conducted by BIST engines, without involving TAM design and optimization issues. Another major difference between e-SRAM test and logic test is that the testing time for an e-SRAM is a fixed constant once its size is given, while the testing time for a logic core usually varies with the assigned TAM width. Wang et al. [12] proposed a simulated annealing (SA) algorithm for the test scheduling of BISTed memory cores. The test power for each memory is assumed to be constant during its entire test process, and the computational time is quite high when the number of memory cores is large.
The Impact of e-SRAM BIST Architecture: How the e-SRAM BIST architectures are designed affects the test scheduling process. For example, when many different e-SRAMs share the same BIST engine to save silicon area, depending on the BIST scheme, they may [8] or may not [11] be able to be tested in parallel. In this paper, we consider the case in which each e-SRAM is supplied with its own BIST engine. It is important to note, however, that the proposed approach can be generalized to the above BIST-sharing scenario by adding constraints to the test scheduling process. In addition, whether the BIST engine is "soft" or "hard" significantly affects the test scheduling process. When it is "soft", i.e., the system integrator is able to modify its architecture, the pause time for DRF tests (i.e., $T_{AB}$ and $T_{BC}$) can be changed easily. When it is hardwired, the pause time is a pre-determined fixed value. Without loss of generality, we consider both cases in this paper, as shown in the following section.
Retention-Aware Test Scheduling
The retention-aware test scheduling problem investigated in this section can be stated as follows:
Problem $P_{\text{drf-opt}}$: Given the test parameters for the BISTed e-SRAMs, including
• the total number of e-SRAMs $N_m$;
• the maximum allowed test power $P_{max}$;
• for each BISTed e-SRAM $i$, the test power consumption $P_i$ and the testing times $T_{A_i}$, $T_{B_i}$ and $T_{C_i}$ for blocks A, B and C;
• the minimum pause time for testing DRFs $T_{pause}$;
determine the test schedule of all e-SRAMs such that (i) the total testing time is minimized; (ii) the pause time for testing DRFs satisfies $T_{AB} \ge T_{pause}$ and $T_{BC} \ge T_{pause}$; and (iii) the test power consumption at any moment does not exceed $P_{max}$.
In this section, we first consider the case in which the BIST architectures for e-SRAMs are "soft" and hence $T_{AB}$ and $T_{BC}$ can vary as long as they are at least $T_{pause}$. Next, we consider the case in which the BIST architectures for e-SRAMs are "hard" or the system integrator is not willing to make any changes. In this case, $T_{AB} = T_{BC} = T_{pause}$ holds for all e-SRAMs.
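The three requirements of the problem statement can be checked mechanically for any candidate schedule. The following Python sketch is illustrative only: the `Block` type and the `peak_power` and `check_schedule` helpers are hypothetical names, and times are assumed to be integer clock cycles.

```python
from dataclasses import dataclass

@dataclass
class Block:
    start: int    # begin time of the block, in clock cycles
    length: int   # testing time of the block
    power: float  # test power drawn while the block is active

def peak_power(blocks):
    """Sweep over start/end events and return the maximum total power."""
    events = []
    for b in blocks:
        events.append((b.start, b.power))
        events.append((b.start + b.length, -b.power))
    # At equal times, process releases (negative deltas) before new starts.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0.0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

def check_schedule(tests, t_pause, p_max):
    """tests: list of (A, B, C) Block triples, one triple per e-SRAM.
    Returns (feasible, total_time); total_time is None when infeasible."""
    all_blocks = []
    for a, b, c in tests:
        t_ab = b.start - (a.start + a.length)   # pause between A and B
        t_bc = c.start - (b.start + b.length)   # pause between B and C
        if t_ab < t_pause or t_bc < t_pause:    # requirement (ii)
            return False, None
        all_blocks += [a, b, c]
    if peak_power(all_blocks) > p_max:          # requirement (iii)
        return False, None
    # Requirement (i) is the objective: the quantity to be minimized.
    return True, max(b.start + b.length for b in all_blocks)
```

A scheduler would minimize the returned total time over all feasible placements; this helper only validates one placement.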
Scheduling with Flexible DRF Pause Time

Packing-based Scheduling Strategy
The retention-aware test scheduling problem with flexible DRF pause time can be modeled as a constrained rectangle packing problem. That is, we try to pack all the rectangles $A_i$, $B_i$ and $C_i$ ($i = 1 \ldots N_m$) into a rectangular region of height not exceeding $P_{max}$ and of minimized width such that, for each BISTed e-SRAM $i$, the separation between $A_i$ and $B_i$ and the separation between $B_i$ and $C_i$ are at least $T_{pause}$. We first borrow an SA-based approach as described in [16] to solve this problem.
To impose a 'minimum separation' constraint between two blocks, an edge of weight $T_{pause}$ is inserted into the horizontal constraint graph from $A_i$ to $B_i$ (and from $B_i$ to $C_i$, respectively). The cost function of a candidate solution $S$ used in the annealing process combines three terms, $area(S)$, $Penalty_1(S)$ and $Penalty_2(S)$, where $\alpha$ and $\beta$ are weights, $area(S)$ is the area of $S$ and is computed as $P_{max} \times width(S)$, $Penalty_1(S)$ is the penalty for exceeding the maximum allowed test power $P_{max}$, and $Penalty_2(S)$ is the penalty for violating the minimum separation constraints. $Penalty_1(S)$ and $Penalty_2(S)$ are computed from the block positions, where $x(R)$ of a rectangular block $R$ is the x-coordinate of the lower-left corner of $R$.
The simulated annealing engine provides a very flexible framework to solve the constrained packing problem. However, its runtime is long for problem instances with a large number of blocks and constraints. Therefore, in order to use this packing-based approach effectively, the memories are first grouped in a pre-processing step. First, all memory test blocks of the same type and of the same testing time (all A's, all B's or all C's) are grouped together as one block, arranged so that the resulting rectangle is as close to a square as possible (packing square-shaped rectangles is relatively easier). Then, different types of memories with the same testing time may be grouped together if their total area does not exceed a given threshold, expressed as a fraction of the total area of the memory blocks. This threshold controls the tradeoff between the optimality of the solution and the runtime: the smaller the threshold, the less grouping is done, so the solution quality is higher but the runtime is longer. In all the following experiments, a threshold of 0.01 is used such that there will be at most 100 blocks in the problem instance after grouping.
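To make the role of the weights and penalties concrete, the following Python sketch shows one possible cost evaluation. The exact formulas are not reproduced here, so everything specific is an assumption: the weighted-sum form area + α·Penalty1 + β·Penalty2, the definition of Penalty1 as the summed overshoot of block tops above $P_{max}$, the definition of Penalty2 as the summed shortfall of each pause against $T_{pause}$, and the names `Rect` and `cost`.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: float  # x-coordinate of the lower-left corner (start time)
    y: float  # y-coordinate of the lower-left corner (power offset)
    w: float  # width (testing time)
    h: float  # height (test power)

def cost(triples, p_max, t_pause, alpha, beta):
    """triples: list of (A, B, C) Rect triples, one per e-SRAM.
    Assumed form: area + alpha * penalty1 + beta * penalty2."""
    rects = [r for triple in triples for r in triple]
    width = max(r.x + r.w for r in rects)
    area = p_max * width                      # area(S) = P_max * width(S)
    # Assumed Penalty1: how far each block top sticks out above P_max.
    penalty1 = sum(max(0.0, r.y + r.h - p_max) for r in rects)
    # Assumed Penalty2: how much each A-B and B-C gap falls short of T_pause.
    penalty2 = 0.0
    for a, b, c in triples:
        penalty2 += max(0.0, t_pause - (b.x - (a.x + a.w)))
        penalty2 += max(0.0, t_pause - (c.x - (b.x + b.w)))
    return area + alpha * penalty1 + beta * penalty2
```

With large α and β, infeasible packings are strongly discouraged while the annealer remains free to pass through them, which is the usual motivation for penalty terms in simulated annealing.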
A Fast Scheduling Heuristic
The pre-processing step used in the above packing-based scheduling strategy significantly reduces computational complexity, but it also greatly restricts the available solution space and hence may lead to excessive testing time. In this section, we present another heuristic, based on the algorithm presented in [5], that is both efficient in terms of runtime and effective in terms of testing time. In this heuristic, each memory test block is treated as a scheduling unit, and its data structure is as follows:
Data structure memory block
While the other variables are self-explanatory, the variable lowerLimit is used to meet the DRF interval constraint $T_{pause}$ and is discussed in detail in the following algorithm.
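A minimal Python sketch of such a scheduling unit is given below. The field set is an assumption inferred from the variables used in the algorithm description (type, test power, test length, lowerLimit, begin, end, isScheduled); `MemoryBlock` and `make_blocks` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class MemoryBlock:
    mem_id: int                     # index i of the e-SRAM this block belongs to
    block_type: str                 # 'A', 'B' or 'C'
    power: float                    # test power P_i while the block runs
    length: int                     # testing time of this block
    lower_limit: float = math.inf   # earliest time the block may start
    begin: int = -1                 # assigned start time (-1: not scheduled yet)
    end: int = -1                   # assigned end time
    is_scheduled: bool = False

def make_blocks(mem_id, power, t_a, t_b, t_c):
    """Create the A, B and C blocks of one e-SRAM test.
    Only 'A' starts with lower_limit = 0; 'B' and 'C' start at infinity."""
    return [
        MemoryBlock(mem_id, kind, power, length,
                    0 if kind == "A" else math.inf)
        for kind, length in zip("ABC", (t_a, t_b, t_c))
    ]
```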
The algorithm DRF Flexible Schedule starts by initializing the lowerLimit for every memory test block in $MB$. For the blocks whose type is 'A', lowerLimit is initialized to zero, while for the other memory blocks, whose type is 'B' or 'C', it is initialized to $\infty$. As a result, at the very beginning of the test scheduling process, only 'A' type memory test blocks can be scheduled. Next, the current schedule begin time thisTime is initialized to zero, the currently available power $P_{avl}$ is initialized to $P_{max}$, and the number of unscheduled memory blocks $N_{unscheduled}$ is initialized to the size of $MB$ (line 2). As long as there exist unscheduled blocks, the algorithm first tries to find the memory test block $m_i$ with the maximum test length that can be scheduled at thisTime (line 5). If such an $m_i$ exists, it is scheduled by updating its $begin_i$, $end_i$ and $isScheduled_i$ (line 7). Line 8 updates $P_{avl}$ and $N_{unscheduled}$ after scheduling $m_i$. If $m_i$ is of type 'A' or 'B', we need to update the lowerLimit of the corresponding 'B' or 'C' block (line 9). If no such block can be found and $P_{avl} = P_{max}$, all the unscheduled blocks are of type 'B' or 'C' and their lowerLimit values all exceed thisTime. In this case, we have to insert idle time into the test schedule and update thisTime accordingly (lines 11-12). If no such block can be found but $P_{avl} < P_{max}$, the currently available test power is not enough; we then record this idle power $P_{idle}$ (line 14) and branch to finish some currently running blocks to release more test power (lines 15-20). The algorithm then repeats the loop (lines 4-24) and ends only when all memory test blocks are scheduled.
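The greedy loop described above can be sketched in Python as follows. This is an illustrative reading, not the exact listing: the two "no block found" branches are merged into a single step that advances time to the next event, the admission test uses $P_i \le P_{avl}$ rather than a strict inequality, and all names (`MemBlock`, `drf_flexible_schedule`) are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class MemBlock:
    mem: int
    kind: str                      # 'A', 'B' or 'C'
    power: float
    length: int
    lower_limit: float = math.inf
    begin: int = -1
    end: int = -1
    scheduled: bool = False

def drf_flexible_schedule(mems, t_pause, p_max):
    """mems: list of (power, t_a, t_b, t_c). Returns (blocks, total_time)."""
    blocks = []
    for i, (p, ta, tb, tc) in enumerate(mems):
        if p > p_max:
            raise ValueError("a single test exceeds the power budget")
        for kind, length in zip("ABC", (ta, tb, tc)):
            blocks.append(MemBlock(i, kind, p, length,
                                   0 if kind == "A" else math.inf))
    by_key = {(b.mem, b.kind): b for b in blocks}
    successor = {"A": "B", "B": "C"}
    this_time, p_avl, running = 0, p_max, []
    unscheduled = len(blocks)
    while unscheduled > 0:
        ready = [b for b in blocks if not b.scheduled
                 and b.lower_limit <= this_time and b.power <= p_avl]
        if ready:
            m = max(ready, key=lambda b: b.length)   # longest block first
            m.begin, m.end, m.scheduled = this_time, this_time + m.length, True
            running.append(m)
            unscheduled -= 1
            if m.kind in successor:                  # release the next block
                by_key[(m.mem, successor[m.kind])].lower_limit = m.end + t_pause
        else:
            # Nothing fits now: jump to the next block end or lower limit.
            events = [b.end for b in running]
            events += [b.lower_limit for b in blocks if not b.scheduled
                       and this_time < b.lower_limit < math.inf]
            this_time = min(events)
            running = [b for b in running if b.end > this_time]
        p_avl = p_max - sum(b.power for b in running)
    return blocks, max(b.end for b in blocks)
```

In this sketch the pause between consecutive blocks of the same e-SRAM is never shorter than `t_pause` because a successor's lower limit is set to the predecessor's end time plus `t_pause`.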
Scheduling with Fixed DRF Pause Time
This section considers how to schedule e-SRAM tests with a fixed DRF pause time $T_{AB} = T_{BC} = T_{pause}$. Because of this fixed wait period, whenever an 'A' type memory test block $m_{A_i}$ is scheduled, the schedules of its corresponding $m_{B_i}$ and $m_{C_i}$ are determined. Therefore, the three blocks cannot be treated as independent scheduling units, and it is fairly difficult to keep track of the power profile during the scheduling process. As can be observed from Figure 3, the power profile after scheduling only 3 e-SRAMs is already quite complex.
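The bookkeeping difficulty can be seen by computing the power profile explicitly. In the following Python sketch (hypothetical helper names, integer clock cycles assumed), fixing the start of an 'A' block fixes all three blocks, and the resulting profile is a step function whose number of breakpoints grows with every scheduled e-SRAM.

```python
def fixed_pause_blocks(start, power, t_a, t_b, t_c, t_pause):
    """With a fixed pause, the A start time determines B and C as well.
    Returns three (start, length, power) tuples."""
    b_start = start + t_a + t_pause
    c_start = b_start + t_b + t_pause
    return [(start, t_a, power), (b_start, t_b, power), (c_start, t_c, power)]

def power_profile(blocks):
    """Return the step function of total power as (time, level) breakpoints."""
    deltas = {}
    for start, length, power in blocks:
        deltas[start] = deltas.get(start, 0) + power
        deltas[start + length] = deltas.get(start + length, 0) - power
    profile, level = [], 0
    for t in sorted(deltas):
        if deltas[t] == 0:      # a start and an end cancel: no visible step
            continue
        level += deltas[t]
        profile.append((t, level))
    return profile
```

Even two small tests already produce a profile with many steps, which is why a dynamic scheduler would have to search this profile for every remaining candidate.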
Algorithm 1. DRF Flexible Schedule
INPUT: $MB$, $T_{pause}$, $P_{max}$
OUTPUT: e-SRAM test schedule
1.  Initialize lowerLimit for $MB$;
2.  Initialize thisTime = 0; $P_{avl} = P_{max}$; $N_{unscheduled} = |MB|$;
    …
5.  find $m_i$ with maximum test length subject to
    $lowerLimit_i \le$ thisTime and $P_i < P_{avl}$;
6.  if (found) {
7.      schedule $m_i$;
    …
11.     find $m_i$ with minimum $lowerLimit_i$ subject to
        $isScheduled_i$ = false;
12.     thisTime = $lowerLimit_i$;
13. } else {
14.     $P_{idle} = P_{avl}$; $P_{avl} = 0$; }
    } else {
15.     P …
    …
To reduce the complexity of this problem, instead of dynamically scheduling memory test blocks within the DRF pause times $T_{AB}$ and $T_{BC}$, we propose to group multiple e-SRAM tests statically before scheduling them. The main idea is to fill up the DRF pause time as much as possible during the initial grouping phase, and then treat the entire group of e-SRAM tests as a single scheduling unit. The pseudocode for this pre-processing procedure is shown in Figure 5. The procedure starts by initializing the set of ungrouped e-SRAMs $M_{ungrouped}$ and the index $i$ of the current memory group $mg_i$. Then we sort the memory tests in non-increasing order of test power (line 2). Inside the outer loop of the procedure, the first e-SRAM test (i.e., the memory test in $M_{ungrouped}$ with the maximum test power) is put in $mg_i$ (line 4). When this is the last ungrouped e-SRAM, the grouping is complete and the procedure terminates (lines 5-7). Otherwise, we try to group other e-SRAM tests with their 'A' and 'B' blocks embedded in $T_{AB_1}$ and $T_{BC_1}$. To check the feasibility, we define the terms $Range_A$, $Range_B$, $T_{AB}^{occupied}$, and $T_{BC}^{occupied}$, which denote the range to fit the e-SRAM's 'A' block, the range to fit the e-SRAM's 'B' block, the already occupied DRF pause time $T_{AB}$, and the already occupied DRF pause time $T_{BC}$, respectively. The physical meaning of the above terms can be easily observed from Figure 4. Whenever an e-SRAM test is grouped into $mg_i$, these values are updated (lines 8-9, 22-23). When $T_{AB}^{occupied} > T_{pause}$ or $T_{BC}^{occupied} > T_{pause}$, no more e-SRAM tests can be grouped into $mg_i$, and hence we proceed to a new memory test group (lines 11-13). Otherwise, we first try to find a compatible e-SRAM test with maximum $P$ that is able to fit in without conflicts (see Figure 4). If such a memory test exists, it is grouped (line 18).
If the available test power allows and there are other memories of exactly the same type, they are grouped with the same schedule (lines 19-21). The procedure halts when all e-SRAM tests are grouped. Figure 4 shows an example grouping process with four e-SRAM tests. Each memory test group is treated as a single unit during the scheduling process, which, again, can be modeled as a rectangle (i.e., the dashed rectangle shown in Figure 4). A heuristic similar to Algorithm 1, without the DRF pause-time constraints, can then be used for e-SRAM scheduling.
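The compatibility check at the core of the grouping step can be sketched as follows. This Python fragment is a simplified illustration under stated assumptions: the first test of the group starts at time 0, a candidate is accepted only if its 'A' block lies inside the first test's A-B pause, its 'B' block lies inside the B-C pause, and no two blocks overlap in time. The helper names are hypothetical, and the procedure's range and occupancy bookkeeping is not modeled.

```python
def block_intervals(start, t_a, t_b, t_c, t_pause):
    """Half-open [begin, end) intervals of A, B, C for a fixed pause."""
    a_end = start + t_a
    b_start = a_end + t_pause
    b_end = b_start + t_b
    c_start = b_end + t_pause
    return [(start, a_end), (b_start, b_end), (c_start, c_start + t_c)]

def overlaps(i1, i2):
    """True if two half-open intervals share any time."""
    return i1[0] < i2[1] and i2[0] < i1[1]

def fits_in_pauses(first, other, offset, t_pause):
    """first, other: (t_a, t_b, t_c) tuples; `other` starts at `offset`.
    Assumed rule: A of `other` inside the first A-B pause, B of `other`
    inside the first B-C pause, and no block overlap in time."""
    f = block_intervals(0, *first, t_pause)
    o = block_intervals(offset, *other, t_pause)
    a_in_ab = f[0][1] <= o[0][0] and o[0][1] <= f[1][0]
    b_in_bc = f[1][1] <= o[1][0] and o[1][1] <= f[2][0]
    no_clash = not any(overlaps(x, y) for x in f for y in o)
    return a_in_ab and b_in_bc and no_clash
```

A grouping pass could try candidate offsets for each ungrouped test and keep the first one for which this check succeeds.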
Algorithm 2. Group Tests
    …
        break;
    }
8.  $T_{AB}^{occupied} = T_{B_1}$; $T_{BC}^{occupied} = T_{C_1}$;
9.  determine …
Experimental Results
To show the benefits of the proposed retention-aware test scheduling techniques, we constructed four test cases as follows:
1. 500 instances of 64*256 and 500 instances of 512*8 e-SRAMs, in total 1000 e-SRAMs and about 10Mb;
2. 10 instances of 16k*32 and 5 instances of 64k*16 e-SRAMs, in total 15 e-SRAMs and about 10Mb;
3. 37 mixed types of e-SRAMs, in total 418 e-SRAMs and about 5Mb;
4. a combination of the above, in total 1433 e-SRAMs and about 25Mb.
The detailed configurations for the test cases are shown in Table 1, in which $N$, $P$, $T_A$, $T_B$, and $T_C$ denote the number of each type of e-SRAM, the test power consumption, and the testing times for blocks A, B, and C, respectively. Note that we assume all e-SRAMs are tested at 100MHz when we acquire $P$ from our memory compiler. Although different e-SRAMs may be tested at distinct frequencies in practice, this would not affect the effectiveness of our approach. Tables 2-5 compare the total e-SRAM testing time using different test scheduling schemes as the DRF pause time $T_{pause}$ and the given power constraint $P_{max}$ vary. $T_{reg}$, $T_{packing}$, $T_{flex}$, and $T_{fixed}$ represent the testing time using the regular "single-rectangle" test power model, the testing time using the packing-based algorithm of Section 3.1.1 when the DRF pause time is flexible, the testing time using the fast heuristic of Section 3.1.2 when the DRF pause time is flexible, and the testing time using the grouping-based strategy of Section 3.2, respectively. They are all in units of clock cycles. Since we assume the e-SRAMs are tested at 100MHz, the values of $T_{pause}$ in our experiments correspond to 500μs to 100ms. $\Delta T_{flex}$ and $\Delta T_{fixed}$ are calculated as
$$\Delta T_{flex} = \frac{T_{reg} - T_{flex}}{T_{reg}} \times 100\%, \qquad \Delta T_{fixed} = \frac{T_{reg} - T_{fixed}}{T_{reg}} \times 100\%,$$
which show the benefit of the proposed "retention-aware" test scheduling algorithms for variable and fixed DRF pause time, respectively.
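As a small worked illustration of the units involved, the following Python sketch converts a pause time to clock cycles at a given test frequency and computes a percentage reduction in testing time. It assumes the reduction is measured relative to the single-rectangle time $T_{reg}$; the function names are hypothetical.

```python
def pause_in_cycles(pause_seconds, freq_hz):
    """Convert a DRF pause time in seconds to clock cycles."""
    return round(pause_seconds * freq_hz)

def reduction_percent(t_reg, t_new):
    """Assumed definition: (T_reg - T_new) / T_reg * 100.
    A negative value means the new schedule is longer than T_reg."""
    return (t_reg - t_new) / t_reg * 100.0

# At 100MHz, 500 microseconds is 50,000 cycles and 100ms is 10M cycles.
short_pause = pause_in_cycles(500e-6, 100e6)
long_pause = pause_in_cycles(100e-3, 100e6)
```

Under the same assumption, a pause of 5M cycles corresponds to 50ms at 100MHz.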
From these tables, we can observe that $T_{flex}$ (with computational time within a second) is better than $T_{packing}$ (with computational time in minutes) in all cases. This is mainly because, to reduce runtime, the packing-based scheduling strategy groups many e-SRAM tests first. This limits the solution space for Problem $P_{\text{drf-opt}}$, which, in contrast, can be explored by the fast heuristic presented in Section 3.1.2.
It can also be seen from Tables 2-5 that the total e-SRAM testing time is reduced in most cases with the proposed "retention-aware" test scheduling techniques, both with flexible DRF pause time and with fixed DRF pause time. The reduction is especially significant when $T_{pause}$ is large. This is expected because more e-SRAMs can fit in the DRF pause time during the scheduling process in such cases, whereas these periods of idle test power are wasted in the traditional single-rectangle model. We can also observe that the savings in testing time are usually larger when $P_{max}$ is smaller. This is also expected because the "retention-aware" test power model is not very effective when the power constraint is relaxed. For example, the total test power consumption of all e-SRAMs in test case 3 is less than 800mW. When $P_{max} = 500$mW, similar to the single-rectangle test power model, the retention-aware scheduling approach also wastes a lot of idle power in the final schedule. Therefore, the savings are not very large.
In a few cases, the proposed method leads to a slightly longer testing time (e.g., when the DRF pause time is fixed, $P_{max} = 200$mW and $T_{pause} \ge 5$M in test case 2). This is due to the fact that test case 2 has only 15 large e-SRAMs, and when $T_{pause} \ge 5$M several e-SRAMs can be grouped into one scheduling unit. As shown in Figure 3, the grouping happens in the horizontal direction and the testing time of the group becomes larger than the testing time of each individual e-SRAM. By using the single-rectangle power model, however, these e-SRAMs may be able to be scheduled in the vertical direction and
Conclusion
We proposed retention-aware test scheduling techniques in this paper for testing e-SRAMs when DRFs are considered, both for cases where the DRF pause time is fixed and for cases where it can be varied. Experimental results show that the proposed approach can significantly reduce e-SRAM testing time, especially when the power constraint is tight and/or the DRF pause time is large. Since, as stressed in [2], the DRF pause time can be as large as hundreds of milliseconds, the proposed approach is able to greatly reduce the e-SRAM test cost.
