Abstract: This paper addresses the test scheduling problem for built-in self-tested embedded SRAMs (e-SRAMs) when data retention faults (DRFs) are considered. We propose a 'retention-aware' test power model that takes advantage of the fact that test power is near zero during the pause time for testing DRFs. The proposed test scheduling algorithm then utilises this new test power model to minimise the total testing time of e-SRAMs without violating the given power constraints, by scheduling some e-SRAM tests during the pause time of DRF tests. Without loss of generality, we consider both the case where the pause time for DRFs is fixed and the case where it can be varied. Experimental results show that the proposed 'retention-aware' test power model and the corresponding test scheduling algorithm can reduce the testing time of e-SRAMs significantly with negligible computational time.
Introduction
Embedded memories, in particular embedded SRAMs (e-SRAMs), tend to consume most of the silicon area in today's systems-on-a-chip (SoCs), ranging from register files as small as 64 bits to larger caches with sizes of hundreds of kilobits or even megabits [1]. Because of their extreme density, e-SRAMs are more prone to manufacturing defects than other types of on-chip circuitry (e.g. standard cells), and it is important to test them thoroughly to ensure an acceptable SoC yield. How to test these hundreds of on-chip e-SRAM instances efficiently and effectively for all possible faults therefore becomes a major challenge for SoC system integrators [2]. On the one hand, we would like to test more e-SRAMs in parallel to reduce the total testing time and hence the SoC test cost. On the other hand, the test power constraint becomes a major concern because power consumption in test mode is usually higher than that in functional mode [3]. Efficient power-constrained test scheduling techniques (e.g. [4]) therefore play a key role in reducing e-SRAM test cost. Most prior work in test scheduling assumes a constant power consumption during the entire test process. As shown in Fig. 1b, an e-SRAM test can be represented by a rectangle whose width denotes the testing time and whose height denotes the test power. Although simple and effective for logic testing, this model is overly pessimistic for e-SRAM testing when data retention faults (DRFs) are considered [5]. DRFs model the defects in SRAM bit cells that fail to retain a stored logic value. The most common test method for DRFs is simply to load a known value into the cell, wait for a period of time (up to hundreds of milliseconds [1]) and then read it out, as shown in Fig. 1a. During the two DRF pause times (for the retention test of both logic '0' and logic '1'), no read/write operations are performed and hence the e-SRAM consumes near-zero test power.
By taking this property into account, we propose a 'retention-aware' test power model for built-in self-tested (BISTed) e-SRAMs, in which each e-SRAM test is represented by three rectangles A, B and C separated by intervals T_AB and T_BC that correspond to the DRF pause times, as shown in Fig. 1c. Based on this new test power model, we present an efficient and effective test scheduling algorithm that minimises the total testing time of e-SRAMs under given power constraints, by scheduling some e-SRAM tests during the pause time of DRF tests. Without loss of generality, we consider both the case where the pause time for DRF tests is fixed and the case where it can be varied. Experimental results show that our approach significantly reduces the total testing time under given power constraints.
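The difference between the two power models can be made concrete with a small sketch. The class and function names below are hypothetical, and the pause length of 100 cycles is purely illustrative; the sketch only compares how much of the power-time plane each model reserves for one e-SRAM test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SramTest:
    power: int  # P_i, drawn only while block A, B or C is running
    t_a: int    # duration of block A
    t_b: int    # duration of block B
    t_c: int    # duration of block C


def single_rectangle(test, t_ab, t_bc, start=0):
    """Conventional model: one rectangle spanning the whole test."""
    total = test.t_a + t_ab + test.t_b + t_bc + test.t_c
    return [(start, start + total, test.power)]


def retention_aware(test, t_ab, t_bc, start=0):
    """Retention-aware model: three rectangles, zero power in the pauses."""
    s_b = start + test.t_a + t_ab
    s_c = s_b + test.t_b + t_bc
    return [
        (start, start + test.t_a, test.power),
        (s_b, s_b + test.t_b, test.power),
        (s_c, s_c + test.t_c, test.power),
    ]


def reserved_area(rects):
    """Power-time area that a model blocks for other tests."""
    return sum((end - begin) * power for begin, end, power in rects)
```

With these assumed values, the single rectangle reserves the pause intervals at full power although nothing is drawn there, which is exactly the slack the retention-aware model frees for other tests.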
Prior work and motivation

Related work in DRF tests
From the functional point of view, an e-SRAM data retention fault manifests itself as an e-SRAM cell that cannot retain a logic 1/0 after a certain amount of time [6]. From the defect point of view, DRFs are usually caused by a defective source, drain or gate open of the pull-up transistor of the e-SRAM cell, or by a defective power or ground path. Based on the above, there are mainly two types of DRF testing methodologies: (i) functional-based, that is, introducing pause time in March tests [7, 8] and (ii) defect-based, that is, embedding various design for test (DFT) circuitries to identify DRFs in a short time [1, 9-17].
DFT-based DRF testing methods embed dedicated circuitries in e-SRAM cells and/or their peripherals and detect DRF-related defects with specially designed operations. Among the previous work [1, 9-17], the weak-write method [15] has excellent DRF detectability because the weak-write value can be programmed on the fly, while the pre-discharge write method proposed in [16] leads to the most significant test time savings (DRF testing time close to zero) and has the additional benefit of at-speed testability, which is more important for deep sub-micron technology [18]. Although effective in detecting DRFs, the above DFT techniques require more design effort and often come with high hardware and/or performance overhead. Moreover, since the DFT circuitries are implemented at transistor level, these techniques are technology-dependent and hence require verification at every technology node for all corner cases, which may significantly increase time-to-market. For these reasons, most memory compilers supplied by memory vendors today do not support the above DFT techniques.
Therefore we consider the case in which DRF tests are applied using the traditional functional-based methods. As shown in Fig. 1a, all the e-SRAM cells are first initialised to a logic value 1/0. After that, the e-SRAM under test is disabled, that is, no read or write operation is conducted, for a pre-defined pause time (up to several hundred milliseconds) before the values are read out. To reduce DRF testing time, Wang et al. [8] proposed to reuse, as part of the pause time, the initialisation time of the neighbourhood cells that are not on the same row as the cells under test. This technique, however, is only effective for large e-SRAMs. As discussed in [1], retention testing needs to consider the slow process corner case, whose leakage (responsible for the loss of the stored logic value) actually slows down from the 130 nm to the 90 nm node. Because of this, the pause time for testing DRFs does not decrease significantly with increasing chip operational frequency, and hence the testing time for DRFs dominates the total e-SRAM testing time when pause tests are applied, especially for small e-SRAMs. It is this observation that motivates this work on how to utilise the pause time for DRF tests effectively and efficiently in the test scheduling process.
Related work in test scheduling
Test scheduling is the process that allocates test resources (e.g. test bus lines or BIST engines) to cores at different times in order to minimise the overall testing time, while at the same time satisfying the given constraints [19]. Various constraints need to be considered during test scheduling, but probably the most important one is the test power constraint. That is, testing more cores in parallel usually results in reduced testing time; however, it also increases the test power, which may lead to destructive testing [20].
Many test scheduling techniques have been proposed in the literature [3, 21-25] (to name only a few). In particular, [21, 23] considered the power constraint in their work. The above work, however, mainly targets the test scheduling of logic cores (usually scanned), and one of its design aims is an efficient test access mechanism (TAM) architecture that links the test source/sink to the core under test. e-SRAM tests, in contrast, are usually conducted by BIST engines and do not involve TAM design and optimisation issues. Another major difference between e-SRAM tests and logic tests is that the testing time for an e-SRAM is a fixed constant once its size is given, while the testing time for a logic core usually varies with the assigned TAM width. Wang et al. [4] proposed a simulated annealing (SA) algorithm for the test scheduling of BISTed memory cores. In that work, the test power for each memory is assumed to be constant during its entire testing process, and the computational time is quite high when the number of memory cores is large. Fang et al. [26] presented an effective and efficient power-constrained test scheduling heuristic for their hardware/software co-testing methodology. None of the above work considered the special features of DRF pause tests.
Impact of e-SRAM BIST architecture
How the e-SRAM BIST architectures are designed affects the test scheduling process. For example, when many different e-SRAMs share the same BIST engine to save silicon area, they may [27] or may not [28] be testable in parallel, depending on the BIST scheme. Since at-speed testability for e-SRAMs becomes more important with the ever increasing operational frequency, most current system integrators prefer to design a unique BIST engine for each e-SRAM, at least for the timing-critical portion of the BIST engine, for example, the address generator, the control signal generator and the comparator [29]. As a result, we consider the case in which each e-SRAM is supplied with its own BIST engine. It is important to note, however, that the proposed approach can easily be generalised to the BIST-sharing scenario by adding constraints to the test scheduling process.
In addition, whether the BIST engine is 'soft' or 'hard' significantly affects the test scheduling process. When it is 'soft', that is, the system integrator is able to modify its architecture, the pause time for DRF tests (i.e. T_AB and T_BC) can be changed easily. When it is hard-wired, however, the pause time is a pre-determined fixed value. Without loss of generality, we consider both cases.
3 Retention-aware test scheduling
The retention-aware test scheduling problem investigated in this section can be stated as follows:
Problem P_drf-opt: Given the test parameters for the BISTed e-SRAMs, including
• the total number of e-SRAMs N_m;
• the maximum allowed test power P_max;
• for each BISTed e-SRAM i, the test power consumption P_i and the testing times T_A_i, T_B_i and T_C_i for blocks A, B and C;
• the minimum pause time for testing DRFs T_pause;
determine the test schedule of all e-SRAMs such that (i) the total testing time is minimised; (ii) the pause time for testing DRFs satisfies T_AB ≥ T_pause and T_BC ≥ T_pause and (iii) the test power consumption at any moment does not exceed P_max.
IET Comput. Digit. Tech., Vol. 1, No. 3, May 2007
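Conditions (ii) and (iii) of Problem P_drf-opt can be checked mechanically for any candidate schedule. The following is a minimal sketch, assuming each e-SRAM is given as a tuple (P_i, T_A_i, T_B_i, T_C_i) and each schedule entry as the start times of its three blocks; the function name and data layout are illustrative and not part of the original formulation.

```python
def check_schedule(tests, starts, t_pause, p_max):
    """Return the total testing time of a feasible schedule.

    tests:  list of (power, t_a, t_b, t_c), one tuple per e-SRAM
    starts: list of (s_a, s_b, s_c), start times of blocks A, B and C
    Raises ValueError if condition (ii) or (iii) is violated.
    """
    events = []
    finish = 0
    for (power, t_a, t_b, t_c), (s_a, s_b, s_c) in zip(tests, starts):
        # Condition (ii): both pauses must last at least T_pause.
        if s_b - (s_a + t_a) < t_pause or s_c - (s_b + t_b) < t_pause:
            raise ValueError("DRF pause time shorter than T_pause")
        for begin, length in ((s_a, t_a), (s_b, t_b), (s_c, t_c)):
            events.append((begin, power))            # block starts drawing power
            events.append((begin + length, -power))  # block releases its power
        finish = max(finish, s_c + t_c)
    # Condition (iii): sweep over time; at equal times releases sort first.
    level = 0
    for _, delta in sorted(events):
        level += delta
        if level > p_max:
            raise ValueError("test power exceeds P_max")
    return finish
```

The returned value is the quantity that condition (i) asks to minimise, assuming the schedule starts at time zero.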
3.1 Scheduling with flexible DRF pause time
3.1.1 Packing-based scheduling strategy:
Since each e-SRAM test i can be modelled by three rectangular blocks A_i, B_i and C_i (see Fig. 1c), our objective can be seen as packing all the rectangles A_i, B_i and C_i (i = 1, ..., N_m) into a rectangular region of height not exceeding P_max and of minimised width such that, for every e-SRAM i, the separation between A_i and B_i and the separation between B_i and C_i are at least T_pause. This is a typical constrained rectangle packing problem and can be modelled and solved by using an SA approach as described in [30], borrowed from the floorplanning literature. In this approach, SA is used to search for a good packing satisfying a given set of general placement constraints. In each annealing step, a candidate packing solution S, represented by a sequence pair [31], is evaluated. A pair of constraint graphs, G_h and G_v, are constructed according to the sequence pair to realise a packing from its representation. To impose a 'minimum separation' constraint between two blocks, for example, between A_i and B_i (or between B_i and C_i), an edge of weight T_pause is inserted into the horizontal constraint graph from A_i to B_i (from B_i to C_i, respectively). According to the definition of the horizontal constraint graph, an edge e(v_i, v_j) from v_i to v_j of weight w means that the block represented by v_j must be placed at a distance of at least w units to the right of the block represented by v_i. After adding all these additional constraint edges, a single source shortest path algorithm can be performed on the constraint graphs to find the location of each block. The resulting packing will automatically have all the minimum separation constraints satisfied.
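How block locations follow from a constraint graph can be illustrated with a short sketch. Because an edge demands x(v_j) ≥ x(v_i) + w, the sketch computes longest-path distances with Bellman-Ford-style relaxation, which is equivalent to a shortest-path computation on negated weights. The function name is hypothetical, and the edge weights are assumed to already include the width of the left block plus any required separation.

```python
def place_blocks(num_blocks, edges):
    """Compute x-coordinates from a horizontal constraint graph.

    edges: list of (i, j, w) meaning x[j] >= x[i] + w.
    Returns the list of x-coordinates, or None if the graph contains a
    positive cycle (the constraints cannot all be satisfied).
    """
    # Starting every block at 0 plays the role of a virtual source with
    # zero-weight edges to all blocks.
    x = [0] * num_blocks
    for _ in range(num_blocks):
        changed = False
        for i, j, w in edges:
            if x[i] + w > x[j]:
                x[j] = x[i] + w
                changed = True
        if not changed:
            return x
    # Still changing after num_blocks passes: a positive cycle exists.
    return None
```

For one e-SRAM with block widths 60, 12 and 12 and an illustrative T_pause of 100, the edges (A, B, 160) and (B, C, 112) place the blocks at 0, 160 and 272. Adding an edge that forces C to the left of A creates a positive cycle, and the function reports infeasibility.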
It may happen that a positive cycle is formed in the horizontal constraint graph after adding those additional constraint edges, in which case the single source shortest path algorithm will fail, implying that the current candidate floorplan solution cannot satisfy all the minimum separation constraints. In this case, we remove all the additional constraint edges and simply pack the blocks according to the sequence pair. A penalty term is included in the cost function to penalise the violated constraints. The cost function of a candidate solution S used in the annealing process is as follows
where a and b are weights, area(S) is the area of S and is computed as P_max × width(S), Penalty_1(S) is the penalty for exceeding the maximum allowed test power P_max and Penalty_2(S) is the penalty for violating the minimum separation constraints. Penalty_1(S) and Penalty_2(S) are computed as
where x(R) of a rectangular block R is the x-coordinate of the lower left corner of R. The SA engine provides a very flexible framework to solve this constrained block packing problem. However, its runtime is very long for problem instances with a large number of blocks and constraints. To make use of this packing-based approach, the memories are therefore grouped in a pre-processing step. Memories of the same type that belong to the same testing period, that is, period A, B or C, are grouped together as one block, and they are arranged so that the combined block is as close to a square as possible (packing square-shaped rectangles is relatively easier). For example, if there are 12 BISTed e-SRAMs i, each with test power consumption P_i = 18 and testing times T_A_i = 60, T_B_i = 12 and T_C_i = 12, the 12 A blocks are grouped in the form of 6 × 2, since the dimensions of this 6 × 2 combined block are 108 × 120 (6 × P_i = 108 and 2 × T_A_i = 120), which is the possible shape closest to a square. We apply the same pre-processing to the A, B and C blocks of each BISTed e-SRAM i to reduce the problem size. Memories of the same type that belong to the same testing period are grouped together only if their total area does not exceed a certain threshold, expressed as a fraction of the total area of all the memory blocks. This threshold is set by the user to control the trade-off between the optimality of the solution and the runtime. The smaller the threshold, the less grouping is done, and the higher the solution quality but the longer the runtime. In all the following experiments, a threshold of 0.01 is used, which means that there will be at most 100 blocks in the problem instance after grouping.
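The choice of the 6 × 2 arrangement in the example above can be reproduced with a few lines of code. This is only a sketch: it assumes that 'closest to a square' is measured by the aspect ratio of the combined block, it ignores the P_max limit and the area threshold, and the function name is hypothetical.

```python
def squarest_grouping(count, power, time):
    """Arrange `count` identical blocks as rows x cols, closest to a square.

    Rows are stacked along the power axis and columns along the time axis.
    Returns (rows, cols, height, width) of the combined block.
    """
    best = None
    for rows in range(1, count + 1):
        if count % rows:
            continue  # only exact rectangular arrangements are considered
        cols = count // rows
        height, width = rows * power, cols * time
        ratio = max(height, width) / min(height, width)  # 1.0 is a square
        if best is None or ratio < best[0]:
            best = (ratio, rows, cols, height, width)
    return best[1:]
```

For the 12 A blocks with P_i = 18 and T_A_i = 60, the call `squarest_grouping(12, 18, 60)` selects the 6 × 2 arrangement with dimensions 108 × 120.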
3.1.2 Fast scheduling heuristic:
The pre-processing step used in the above packing-based scheduling strategy significantly reduces computational complexity, but it also greatly restricts the available solution space and hence may lead to excessive testing time. In this section, we present another heuristic, based on the algorithm presented in [26], that is both efficient in terms of runtime and effective in terms of testing time. In this heuristic (as shown in Fig. 2), every e-SRAM test block (i.e. A_i, B_i or C_i) is treated as a scheduling unit with an associated data structure. While the other variables of this structure are self-explanatory, the variable lowerLimit is utilised to meet the DRF interval constraint T_pause and is discussed in detail with the following algorithm.
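A possible shape for the per-block record is sketched below. The field names mirror the variables mentioned in the text (begin, end, isScheduled, lowerLimit); the remaining fields, the class name and the helper function are assumptions made for illustration and are not taken from the original listing.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryTestBlock:
    sram_id: int                    # index i of the e-SRAM (assumed field)
    kind: str                       # 'A', 'B' or 'C'
    power: int                      # P_i
    time: int                       # T_A_i, T_B_i or T_C_i
    begin: int = -1                 # scheduled start time, -1 if unscheduled
    end: int = -1                   # scheduled end time, -1 if unscheduled
    is_scheduled: bool = False
    lower_limit: float = math.inf   # earliest allowed start time


def make_blocks(sram_id, power, t_a, t_b, t_c):
    """Create the A, B and C blocks of one e-SRAM test."""
    blocks = [
        MemoryTestBlock(sram_id, kind, power, time)
        for kind, time in zip("ABC", (t_a, t_b, t_c))
    ]
    # Only the 'A' block may start immediately; in this sketch 'B' and 'C'
    # stay blocked (infinite lower limit) until their predecessor is placed.
    blocks[0].lower_limit = 0
    return blocks
```

Keeping the earliest allowed start time inside each block lets a scheduler test the pause constraint with a single comparison against the current time.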
The algorithm DRF_Flexible_Schedule takes the set of memory test blocks MB, T_pause and P_max as inputs and outputs the test schedule of all e-SRAMs. It starts by initialising the lowerLimit of every memory test block in MB. For the blocks of type 'A', lowerLimit is initialised to zero, while for the memory blocks of type 'B' or 'C' it is initialised to ∞. As a result, at the very beginning of the test scheduling process, only 'A' type memory test blocks can be scheduled. Next, the current schedule begin time thisTime is initialised to zero, the currently available power P_avl is initialised to P_max and the number of unscheduled memory blocks N_unscheduled is initialised to the size of MB (line 2). As long as there exist unscheduled memory test blocks, the algorithm first tries to find the maximum block m_i that can be scheduled at thisTime (line 5). If such an m_i exists, it is scheduled by updating its begin_i, end_i and isScheduled_i (line 7). Line 8 updates P_avl and N_unscheduled after scheduling m_i. If m_i is of type 'A' or 'B', we need to update the lowerLimit of the corresponding 'B' or 'C' block (line 9). If no such block can be found and at the same time P_avl = P_max, all the unscheduled blocks are of type 'B' or 'C' and their lowerLimit values all exceed thisTime. In this case, we have to insert idle time into the test schedule and update thisTime accordingly (lines 11-12). If no such block can be found but P_avl < P_max, the currently available test power is not sufficient; we record this idle power P_idle (line 14) and branch to finish some currently scheduled blocks in order to release more test power (lines 15-20).
The algorithm then repeats the loop (lines 4-24) and ends only when all memory test blocks are scheduled.
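The overall flow of such a greedy scheduler can be sketched as follows. This is a simplified variant written for illustration, not the algorithm of Fig. 2: it interprets 'maximum' as maximum test power, it does not record idle power, and it advances time to the next block completion or the next lower limit whenever nothing fits. All names are hypothetical.

```python
import heapq


def drf_flexible_schedule(tests, t_pause, p_max):
    """Greedy sketch; tests is a list of (power, t_a, t_b, t_c).

    Returns (starts, total_time), where starts[i] holds the start times of
    blocks A, B and C of e-SRAM i.
    """
    if any(test[0] > p_max for test in tests):
        raise ValueError("a single test exceeds P_max")
    n = len(tests)
    next_k = [0] * n   # index of the next unscheduled block (0=A, 1=B, 2=C)
    ready = [0] * n    # lower limit (earliest start) of that next block
    starts = [[None] * 3 for _ in range(n)]
    running = []       # min-heap of (end_time, power) of active blocks
    now, p_avl, remaining, finish = 0, p_max, 3 * n, 0
    while remaining:
        fits = [i for i in range(n)
                if next_k[i] < 3 and ready[i] <= now and tests[i][0] <= p_avl]
        if fits:
            i = max(fits, key=lambda j: tests[j][0])  # highest power first
            end = now + tests[i][1 + next_k[i]]
            starts[i][next_k[i]] = now
            heapq.heappush(running, (end, tests[i][0]))
            p_avl -= tests[i][0]
            next_k[i] += 1
            ready[i] = end + t_pause  # successor must wait for the pause
            remaining -= 1
            finish = max(finish, end)
        else:
            # Nothing fits now: jump to the next completion or lower limit.
            pending = [ready[i] for i in range(n)
                       if next_k[i] < 3 and ready[i] > now]
            if running:
                pending.append(running[0][0])
            now = min(pending)
            while running and running[0][0] <= now:
                p_avl += heapq.heappop(running)[1]  # release finished power
    return starts, finish
```

For two identical e-SRAMs with (P, T_A, T_B, T_C) = (18, 60, 12, 12), an illustrative T_pause of 100 and P_max of 20, the sketch interleaves the two tests and finishes at time 344, whereas running two single rectangles of length 284 back to back would take 568.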
3.2 Scheduling with fixed DRF pause time
DRF tests with flexible pause time, as described above, require the system integrator to revise the BIST engine of each e-SRAM based on the final test schedule. This not only involves development effort that may result in longer time-to-market but, more importantly, there are cases in which the BIST engines are hard-wired and the pause time simply cannot be changed. In this section, we therefore consider how to schedule e-SRAM tests with a fixed DRF pause time.
Because of this fixed wait period, whenever an 'A' type memory test block m_i^A is scheduled, the schedules of its corresponding m_i^B and m_i^C are already determined. Therefore the three blocks cannot be treated as independent scheduling units and have to be considered as a whole. At the same time, it is fairly difficult to keep track of the power profile during the scheduling process. For example, as can be observed from Fig. 3, the power profile after scheduling only three e-SRAMs is already quite complex. To reduce the complexity of this problem, instead of dynamically scheduling memory test blocks within the DRF pause times T_AB and T_BC, we propose to group multiple e-SRAM tests statically before scheduling them. The main idea is to fill up the DRF pause time as much as possible during the initial grouping phase, and then to treat the entire group of e-SRAM tests as a single scheduling unit. The pseudocode for this pre-processing procedure is shown in Fig. 4. The procedure Group_Tests takes the set of e-SRAMs M, T_pause and P_max as inputs and outputs the e-SRAM test groups MG. It starts by initialising the set of ungrouped e-SRAMs M_ungrouped and the index i of the current memory group mg_i. Then we sort the memory tests in non-increasing order of their power consumption (line 2). Inside the outer loop of the procedure, the first e-SRAM test (i.e. the memory test in M_ungrouped with the maximum test power) is put in mg_i (line 4). If this is the last ungrouped e-SRAM, the procedure has finished grouping and terminates (lines 5-7). Otherwise, we try to group other e-SRAM tests with their 'A' and 'B' blocks embedded in T_AB_1 and T_BC_1.
To check the feasibility, we define the terms Range_A, Range_B, T_AB_occupied and T_BC_occupied, which denote the range in which the e-SRAM's 'A' block can fit, the range in which the e-SRAM's 'B' block can fit, the already occupied part of the DRF pause time T_AB and the already occupied part of the DRF pause time T_BC, respectively. The physical meaning of these terms can easily be observed from Fig. 5. Whenever an e-SRAM test is grouped into mg_i, these values are updated (lines 8-9 and 22-23). When T_AB_occupied > T_pause or T_BC_occupied > T_pause, no more e-SRAM tests can be grouped into mg_i, and hence we proceed to generate a new memory test group (lines 11-13). Otherwise, we first try to find a compatible e-SRAM test with maximum power consumption that is able to fit in without conflicts (see Fig. 4). If such a memory test exists, it is grouped (line 18). If the available test power allows and there are other memories of exactly the same type, they are grouped with the same schedule (lines 19-21). The procedure halts when all e-SRAM tests are grouped. Fig. 5 shows an example grouping process with four e-SRAM tests.
After the e-SRAM test groups are generated with the above procedure, each group is treated as a single unit during the test scheduling process, which, again, can be modelled as a rectangle (i.e. the dashed-rectangle as shown in Fig. 5) . A heuristic similar to Algorithm 1 without constraints is then utilised for this problem to minimise testing time.
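The two properties used in this section, namely that a fixed pause time determines the positions of blocks B and C once block A is placed, and that a finished group is handled as one rectangle, can be sketched as follows. The function names, the member tuple layout and the illustrative T_pause of 100 are assumptions; the sketch does not reproduce the Group_Tests procedure of Fig. 4.

```python
def fixed_pause_blocks(start, power, t_a, t_b, t_c, t_pause):
    """With a fixed pause, the start of A fixes B and C as well."""
    s_b = start + t_a + t_pause
    s_c = s_b + t_b + t_pause
    return [
        (start, start + t_a, power),
        (s_b, s_b + t_b, power),
        (s_c, s_c + t_c, power),
    ]


def group_rectangle(members, t_pause):
    """Bounding rectangle (width, height) of a group of e-SRAM tests.

    members: list of (start, power, t_a, t_b, t_c), with start measured
    from the beginning of the group.
    """
    rects = []
    for start, power, t_a, t_b, t_c in members:
        rects += fixed_pause_blocks(start, power, t_a, t_b, t_c, t_pause)
    width = max(end for _, end, _ in rects) - min(b for b, _, _ in rects)
    # Peak power by a sweep over start/end events (ends sort first on ties).
    events = sorted([(b, p) for b, _, p in rects] +
                    [(e, -p) for _, e, p in rects])
    level = height = 0
    for _, delta in events:
        level += delta
        height = max(height, level)
    return width, height
```

Placing a second identical e-SRAM (P = 18, T_A = 60, T_B = T_C = 12) at offset 60 inside the first one's pause yields a group of width 344 and height 18. Scheduled as separate single rectangles under a tight power limit, the same two tests would occupy two rectangles of width 284 each.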
Experimental results
To show the benefits of the proposed retention-aware test scheduling techniques, we constructed four test cases as follows:
1. 500 instances of 64 × 256 and 500 instances of 512 × 8 e-SRAMs, in total 1000 e-SRAMs and about 10 Mb;
2. 10 instances of 16 k × 32 and 5 instances of 64 k × 16 e-SRAMs, in total 15 e-SRAMs and about 10 Mb;
3. 37 mixed types of e-SRAMs, in total 418 e-SRAMs and about 5 Mb;
4. a combination of the above, in total 1433 e-SRAMs and about 25 Mb.
The detailed configurations for the test cases are shown in Table 1, in which N, P, T_A, T_B and T_C denote the number of e-SRAMs of each type, the test power consumption and the testing times for blocks A, B and C, respectively. Note that we assume all e-SRAMs are tested at 100 MHz when we acquire P from our memory compiler. Although different e-SRAMs may be tested at distinct frequencies in practice, this would not affect the effectiveness of our approach. Tables 2-5 compare the total e-SRAM testing time obtained with different test scheduling schemes as the DRF pause time T_pause and the given power constraint P_max vary. T_reg, T_packing, T_flex and T_fixed represent the testing time using the regular 'single-rectangle' test power model, the testing time using the packing-based algorithm of Section 3.1.1 when T_pause can be varied, the testing time using the fast heuristic of Section 3.1.2 when T_pause can be varied, and the testing time using the grouping-based strategy of Section 3.2, respectively. All are given in clock cycles. Since we assume the e-SRAMs are tested at 100 MHz, T_pause varies from 500 μs to 100 ms in our experiments. ΔT_flex and ΔT_fixed are calculated as ΔT_flex = (T_flex − T_reg)/T_reg × 100% and ΔT_fixed = (T_fixed − T_reg)/T_reg × 100%, which show the benefit of the proposed 'retention-aware' test scheduling algorithms for variable and fixed DRF pause time, respectively. From these tables, we can observe that T_flex (with computational time within a second) is better than T_packing (with computational time in minutes) in all cases. This is mainly because, to reduce runtime, the packing-based scheduling strategy first groups many e-SRAM tests. This limits the solution space for Problem P_drf-opt, a restriction that the fast heuristic presented in Section 3.1.2 does not have.
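The relative change in testing time and the conversion between pause time and clock cycles can be written out as a short sketch. The function names are hypothetical and the testing times used in the example are illustrative values, not entries from Tables 2-5.

```python
def delta_percent(t_new, t_reg):
    """Relative change of a testing time with respect to T_reg, in percent.

    Negative values mean that the retention-aware schedule is shorter.
    """
    return (t_new - t_reg) / t_reg * 100.0


def pause_cycles(pause_seconds, freq_hz=100e6):
    """Number of clock cycles in a pause at the given test frequency."""
    return round(pause_seconds * freq_hz)
```

For instance, a schedule of 344 cycles compared with a single-rectangle schedule of 568 cycles gives a change of about −39.4%, and a 100 ms pause at 100 MHz corresponds to 10 M clock cycles.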
It can also be seen from Tables 2-5 that the total e-SRAM testing time is reduced in most cases with the proposed 'retention-aware' test scheduling techniques, both with flexible and with fixed DRF pause time. The reduction is especially significant when T_pause is large. This is expected because more e-SRAMs can fit into the DRF pause time during the scheduling process in such cases, whereas these periods of idle test power are wasted in the traditional single-rectangle model. We can also observe that the savings in testing time are usually larger when P_max is smaller. This is also expected because the 'retention-aware' test power model is not very effective when the power constraint is relaxed. For example, the total test power consumption of all e-SRAMs in test case 3 is less than 800 mW. When P_max = 500 mW, the retention-aware scheduling approach, like the single-rectangle test power model, also leaves a large amount of power idle in the final schedule. Therefore the savings are not very large.
In a few cases, the proposed method leads to a slightly longer testing time (e.g. when the DRF pause time is fixed, P_max = 200 mW and T_pause ≥ 5 M in test case 2). This is because test case 2 has only 15 large e-SRAMs, and when T_pause ≥ 5 M several e-SRAMs can be grouped into one scheduling unit (when T_pause < 5 M, the e-SRAMs in test case 2 cannot be grouped and the scheduling process is exactly the same as with the single-rectangle power model). As shown in Fig. 3, the grouping happens in the horizontal direction, and the testing time of the group becomes larger than the testing time of each individual e-SRAM. With the single-rectangle power model, however, these e-SRAMs may be scheduled in the vertical direction, and hence a reduced testing time can be achieved. Nevertheless, this situation rarely happens when the number of e-SRAMs is large and/or the e-SRAMs are small. There are also a few other cases in which T_reg < T_flex; we attribute them to the fact that the fast heuristic explores only part of the solution space.
Conclusion
Traditionally, test power modelling treats e-SRAMs in the same way as logic cores and represents each test using a 'single-rectangle' model. This paper showed that this model is overly conservative because of the near-zero-power delay cycles used to detect data retention faults. By taking advantage of this property, we proposed a retention-aware test power model and the associated test scheduling techniques. We considered both the case where the DRF pause time is fixed and the case where it can be varied. Experimental results show that the proposed approach can significantly reduce e-SRAM testing time, especially when the power constraint is tight and/or the DRF pause time is large. Since, as stressed in [1], the DRF pause time can be as large as hundreds of milliseconds even for future technologies, the proposed approach is able to greatly reduce the e-SRAM test cost.
