Thermal-Safe Test Scheduling for Core-Based System-on-Chip Integrated Circuits
Paul Rosinger, Bashir M. Al-Hashimi, Senior Member, IEEE, and Krishnendu Chakrabarty, Senior Member, IEEE
Abstract-Overheating has been acknowledged as a major problem during the testing of complex system-on-chip integrated circuits. Several power-constrained test-scheduling solutions have been recently proposed to tackle this problem during system integration. However, we show that these approaches cannot guarantee hot-spot-free test schedules because they do not take into account the nonuniform distribution of heat dissipation across the die and the physical adjacency of simultaneously active cores. This paper proposes a new test-scheduling approach that is able to produce short test schedules and guarantee thermal safety at the same time. Two thermal-safe test-scheduling algorithms are proposed. The first algorithm computes an exact (shortest) test schedule that is guaranteed to satisfy a given maximum temperature constraint. The second algorithm is a heuristic intended for complex systems with a large number of embedded cores, for which the exact thermal-safe test-scheduling algorithm may not be feasible. Based on a low-complexity test-session thermal-cost model, this algorithm produces near-optimal length test schedules with significantly less computational effort compared to the optimal algorithm.
Index Terms-Design for testability, manufacturing testing, reliability.
I. INTRODUCTION

RECENT REPORTS from industry indicate that power consumption during scan testing can, in some designs, be significantly higher than in the normal operation mode. A case study involving the Motorola Version 3 ColdFire processor [14] reported that test power was at least 3X the functional power. In the same experiment, the test power for at-speed compressed patterns was in some cases as high as 8X the functional power. A more recent industrial paper [19] reports test power up to 30X higher than in the normal operation mode. The elevated power dissipation during test inherently leads to higher die temperatures than in normal operation. This creates a number of problems because both soft error rates and aging increase exponentially with temperature. An undesirable consequence of overheating is thermal stress. At high temperatures, transistors fail to switch properly and many failure mechanisms, such as electromigration, are accelerated, resulting in an overall decrease in reliability or even permanent damage. These problems are exacerbated for core-based system-on-chip (SOC) designs because, quite often, several embedded cores are tested concurrently in order to reduce the overall test time.
A significant amount of research has been devoted to reducing the power consumption during test in order to avoid overheating the silicon die. Consequently, several low-power solutions targeting core-level design-for-test (DFT), as well as system-level DFT, have been recently proposed. Techniques falling in the first category include low-power scan chain architectures with gated clocks [17], [18], scan cell and test pattern reordering [3], [5], and low-transition test patterns generated by specialized automatic test-pattern generation (ATPG) algorithms [22] and low-transition TPGs [21].
The second category of techniques is mainly based on power-constrained test-scheduling algorithms [1], [2], [6], [7], [9], [11]-[13], [15]. This paper focuses on avoiding overheating during test through appropriate test scheduling. The main contributions of the paper are:
1) proposing thermal-aware testing as a better alternative to power-constrained testing when dealing with overheating during test;
2) proposing a test-session thermal model that reduces the thermal-simulation effort required to identify a thermal-safe test schedule.
The motivation for this paper is presented in Section II. The basic ideas behind the existing power-constrained test-scheduling approaches are examined from the perspective of chip overheating, and it is explained why these approaches cannot guarantee thermal safety. Section III proposes a new test-scheduling approach that overcomes this problem. An exact algorithm that guarantees minimum test times as well as thermal safety is presented in Section III-A. It is shown through experimental results that significantly shorter test schedules can be obtained without increasing the maximum die temperature during test when compared with existing power-constrained test-scheduling approaches. While this algorithm guarantees the optimal solution for a given thermal limit, it may require significant computational effort for complex systems, mainly because of the number of accurate thermal simulations required. Therefore, a fast heuristic algorithm for thermal-aware test scheduling is proposed in Section III-C. This approach uses a low-complexity test-session thermal-cost model in order to speed up the solution-space exploration and reduce the thermal-simulation effort required to reach an acceptable solution. The experiments show that this heuristic produces nearly optimal test schedules (and even optimal schedules in some cases) while significantly reducing the thermal-simulation effort.
II. MOTIVATION
In this section, we examine the effectiveness of power-constrained test scheduling (PCTS) as a means of avoiding die overheating during test. The common idea behind PCTS is to impose a chip-wide maximum allowable limit on the power consumption, which should not be exceeded during the test application. Several recently proposed power-constrained test-scheduling algorithms aim to maximize the number of tests running in parallel without exceeding this limit [1], [2], [6], [7], [9], [11]-[13], [15].
Silicon die hot spots result from localized overheating, which occurs much faster than chip-wide overheating because of the nonuniform spatial on-die power distribution. Recent research supported by industrial observations suggests that spatial temperature gradients exceeding 30 °C are possible even under typical operating conditions [20], which indicates that there are large variations in power density across the die. These gradients, especially between active and inactive blocks, are likely to increase during testing since test power dissipation can be significantly higher than functional power [14], [19]. Large variations in power density across the die mean that constraining the maximum chip-level power consumption is not an effective way of avoiding local overheating. We demonstrate this using the hypothetical system shown in Fig. 1, which serves as an example of nonuniform power distribution. As shown in the test-description table in Fig. 1(c), cores with different sizes, such as C1 and C3, are assumed to consume the same amount of power during test. Let us consider two possible test sessions TS1 = {C2, C4, C5} and TS2 = {C3, C6, C7}. They are both valid in terms of test compatibility, as can be seen from the associated test-compatibility graph shown in Fig. 1(b). According to the existing power-constrained test-scheduling approaches, both test sessions would be acceptable under a power constraint of 15 W or more. We have run thermal simulations on each of these test sessions using the HotSpot tool presented in [20] and found a large discrepancy in maximum die temperature: 127.19 °C for TS1 but only 66.47 °C for TS2. This difference arises mainly because the power density (power consumed per unit area) varies significantly between cores such as C2, C4, and C5 and cores such as C3, C6, and C7 (for example, the power density of core C2 is four times higher than that of C3). Moreover, the hottest cores in the two test sessions were C5 and C7. This is because of their reduced lateral heat-removal paths: cores C5 and C7 have only two "cold" (inactive) neighbors (more specifically, on their "EAST" and "SOUTH" edges), while all other cores have three "cold" neighbors.
This example has shown that imposing a global power constraint during testing cannot guarantee thermal safety because it considers neither the power densities across the die nor the clustering of "hot" cores, which can limit the lateral heat removal. In the following section, we present a new test-scheduling approach that overcomes these issues.
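The point of the example can be illustrated with a minimal sketch. The per-core power and area values below are hypothetical (they are not taken from Fig. 1(c)); they were chosen only so that both sessions draw 15 W and the small cores have four times the power density of the large ones, as in the text. The function names are likewise illustrative.

```python
# Hypothetical per-core test power (W) and area (mm^2); not taken from Fig. 1(c).
POWER_W = {"C2": 5.0, "C4": 5.0, "C5": 5.0, "C3": 5.0, "C6": 5.0, "C7": 5.0}
AREA_MM2 = {"C2": 4.0, "C4": 4.0, "C5": 4.0, "C3": 16.0, "C6": 16.0, "C7": 16.0}


def total_power(session):
    """Chip-level power of a test session: the only quantity PCTS looks at."""
    return sum(POWER_W[core] for core in session)


def meets_power_constraint(session, limit_w):
    """Power-constrained admission test for a session."""
    return total_power(session) <= limit_w


def peak_power_density(session):
    """Highest power density (W/mm^2) among the active cores of a session."""
    return max(POWER_W[core] / AREA_MM2[core] for core in session)


TS1 = ["C2", "C4", "C5"]
TS2 = ["C3", "C6", "C7"]
```

Under these assumed values, a 15 W limit admits both TS1 and TS2, although their peak power densities differ by a factor of four. A chip-wide power check cannot see this difference, nor the adjacency of the active cores.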
III. THERMAL-SAFE TEST SCHEDULING
The mean time to failure (MTTF), a commonly used metric in reliability models, is based on the Arrhenius equation, which shows that reliability decreases exponentially with increasing absolute junction temperature: $\mathrm{MTTF} = A \exp[E_a/(kT)]$, where $A$ is an empirical constant, $E_a$ is the so-called activation energy, $k$ is Boltzmann's constant, and $T$ is the absolute junction temperature [20]. The semiconductor industry currently uses commonly accepted limits for the maximum tolerable operating junction temperature based on the device package type. These limits are well accepted as corresponding to reasonable device lifetimes and, thus, failure rates. For example, for devices fabricated in a molded package, the maximum allowable junction temperature is 150 °C, while for devices assembled in ceramic or cavity dual in-line packages (DIP), the maximum allowable junction temperature is 175 °C [10]. Based on these practices, the thermal-safe test-scheduling approach proposed in this paper aims to produce solutions guaranteeing that the maximum allowable junction temperature will not be exceeded during test. Throughout this paper, the term "hot spot" will be used to refer to cores that exceed the maximum allowable junction temperature during test. Any tests running below this critical temperature are considered to be "thermally safe."
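To make the temperature dependence in the Arrhenius relation concrete, the sketch below compares the MTTF at two junction temperatures. The empirical constant A cancels in the ratio. The activation energy of 0.7 eV is an illustrative assumption, not a value from this paper, and the function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV/K


def mttf_ratio(temp_low_c, temp_high_c, activation_energy_ev):
    """Return MTTF(temp_low) / MTTF(temp_high) for MTTF = A * exp(Ea / (k * T)).

    Temperatures are given in degrees Celsius and converted to kelvin.
    """
    t_low = temp_low_c + 273.15
    t_high = temp_high_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_low - 1.0 / t_high)
    return math.exp(exponent)


# Assumed activation energy of 0.7 eV (illustrative only).
ratio = mttf_ratio(110.0, 127.19, activation_energy_ev=0.7)
```

A ratio above 1 means the cooler junction is expected to last longer. Because the temperature appears in the exponent, a difference of a few tens of degrees changes the predicted lifetime by a large factor.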
In the following, we propose two thermal-safe test-scheduling algorithms. The first one, although computationally expensive, computes the exact solution to the problem, i.e., the shortest test schedule that meets the thermal constraint. The second proposed algorithm takes into consideration the on-chip lateral heat-transfer paths in order to determine a nearly optimal solution with less computational effort. The results obtained using this algorithm are then compared with the solutions obtained using the exact algorithm.
Both proposed test-scheduling algorithms start from the set of cores (S) of the target system, the corresponding test-compatibility graph (TCG), such as the one shown in Fig. 1(b), and the maximum junction temperature that can be tolerated during test ($T_{\max}$). Each core is annotated with the length of its corresponding test. The TCG captures the concurrency-compatibility relationships between the system cores: each node in the TCG corresponds to a core, and an edge between two nodes means that the two corresponding cores can be tested concurrently without causing any resource conflicts. The floorplan of each system is also needed for performing thermal simulations on the generated test sessions. The algorithms return a thermal-safe test schedule as a list of test sessions, where each test session is a group of cores to be tested concurrently.
A. Exact Algorithm
In this section, we present an algorithm for determining an exact solution to the thermal-safe test-scheduling problem (see Fig. 2). The algorithm computes the shortest test schedule that guarantees that the specified $T_{\max}$ will not be exceeded during test. An outline of this algorithm follows. First, a thermal simulation is performed on each individual core and its corresponding test in order to ensure that they all comply with the thermal constraint. The shortest test schedule is then computed using only the test-compatibility relations between the cores. A thermal simulation is then performed to check whether the test schedule complies with the thermal constraint ($T_{\max}$). If $T_{\max}$ is violated during any test sessions, these test sessions are discarded and the process is repeated until a thermal-safe test schedule is found.
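The outline above can be sketched as follows. This is a minimal illustration under stated assumptions and not the listing of Fig. 2. The callback `max_temperature` stands in for a thermal simulator such as HotSpot, `compatible` stands in for the TCG edge test, and all function names are hypothetical. The set cover is solved by exhaustive recursion, which is practical only for small core counts.

```python
from functools import lru_cache
from itertools import combinations


def compatible_sessions(cores, compatible):
    """All non-empty groups of pairwise test-compatible cores (subsets of TCG cliques)."""
    sessions = []
    for size in range(1, len(cores) + 1):
        for group in combinations(cores, size):
            if all(compatible(a, b) for a, b in combinations(group, 2)):
                sessions.append(frozenset(group))
    return sessions


def shortest_schedule(cores, sessions, test_length):
    """Exact minimum-weight set cover; a session weighs as much as its longest test."""
    weight = {s: max(test_length[c] for c in s) for s in sessions}

    @lru_cache(maxsize=None)
    def solve(uncovered):
        if not uncovered:
            return 0.0, ()
        pivot = min(uncovered)  # some chosen session must contain this core
        best = (float("inf"), ())
        for s in sessions:
            if pivot in s:
                cost, rest = solve(uncovered - s)
                if weight[s] + cost < best[0]:
                    best = (weight[s] + cost, (s,) + rest)
        return best

    return solve(frozenset(cores))


def exact_thermal_safe_schedule(cores, compatible, test_length, max_temperature, t_max):
    """Repeat: compute the shortest schedule, simulate it, discard hot sessions."""
    for core in cores:
        if max_temperature(frozenset([core])) > t_max:
            raise ValueError(f"core {core} alone violates the thermal limit")
    sessions = compatible_sessions(cores, compatible)
    while True:
        length, schedule = shortest_schedule(cores, sessions, test_length)
        hot = [s for s in schedule if max_temperature(s) > t_max]
        if not hot:
            return length, list(schedule)
        sessions = [s for s in sessions if s not in hot]
```

Since every single-core session passes the initial check and is never discarded, a cover always exists and the loop terminates once no selected session exceeds the limit.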
We will explain the steps of the algorithm using the hypothetical system shown in Fig. 1, assuming a thermal constraint $T_{\max}$ = 110 °C. In the first stage (lines 1-6), it is ensured through thermal simulations that each core, when tested individually with all other cores inactive, does not exceed $T_{\max}$. None of the cores in our example system was found to violate the thermal constraint when tested individually. If a thermal violation is detected, the designer needs to fix it by appropriate modifications to the core DFT infrastructure and/or test set. If the violation cannot be fixed, it means that $T_{\max}$ is too restrictive, and other means of reducing the core temperature are needed. Possible solutions include redesigning the cooling structures for the chip and reducing the test clock frequency. Once all cores have passed this initial check, the algorithm computes the clique set [4] of test-compatibility cliques (TCC) for the TCG (line 7). For our example, the clique set for the TCG shown in Fig. 1 [...]
Since the number of nodes in the TCG (the number of cores in a design) is reasonably low, we have used a straightforward exhaustive search algorithm for determining the TCC (the all_cliques function in Fig. 2). Each clique in the TCC represents a maximal group of cores that can be tested concurrently without causing resource-sharing conflicts. Consequently, any valid test schedule must consist only of subsets (TCS) of the cliques in the TCC (line 8 in Fig. 2). The shortest test schedule can be determined as the minimum-weight set cover for TCS, where the weight of each test-compatible subset in TCS is the length of its longest test (it is assumed that all tests in a test session start at the same time). In order to reach this result, a total of 5.8 s of test-session time had to be thermally simulated.
B. Experimental Results for Exact Algorithm
Table I compares the results obtained using the proposed algorithm with those obtained using the power-constrained test-scheduling approach presented in [7]. We have chosen the approach presented in [7] for comparison since it is very recent, has been applied to large designs, and performs well in comparison with other existing power-constrained test-scheduling approaches. Details such as floorplan information and realistic test power and time values had to be added or modified in the original design descriptions in order to provide all the information needed by the proposed thermal-safe test-scheduling algorithms. The modified design descriptions used in our experiments can be found at [16]. Some of the physical constants used for the thermal simulations performed with the HotSpot tool presented in [20] are reported in Table II. The second column shows the test times corresponding to the power-constrained test schedules. Columns four to seven show the results corresponding to the proposed thermal-aware test-scheduling algorithm. For each design, the temperature limit $T_{\max}$ was set to the maximum temperature of the power-constrained test schedule in order to see whether shorter test schedules could be obtained within the same thermal limits. Columns four and five show the test times and relative savings obtained using the thermal-safe test-scheduling algorithm when compared to the algorithm presented in [7]. The experimental data show that the proposed algorithm was able to produce up to 26% shorter test schedules without increasing the maximum die temperature during test. It should be noted that the proposed thermal-aware test-scheduling approach does not pose any constraints on the overall power dissipation; hence, it is possible that the resulting test schedules may exhibit higher overall power compared to the power-constrained test schedules. However, this falls outside the goal of the proposed test-scheduling approach, which is only to keep the die temperature within safe limits. The last two columns show the number of iterations and the cumulative length of the thermal simulations required in each case to find a thermal-safe test schedule. One iteration consists of computing a test schedule and checking whether it meets the thermal constraint. For example, for system_s, six test schedules had to be computed before a suitable (thermal-safe) solution was found. It should be noted that the most time-consuming part of the algorithm is the thermal simulations, which could take up to a couple of minutes per design depending on the chosen time step. Computing the clique set and solving the ILP for the minimum-weight set cover took under 1 s of CPU time on a 1.8-GHz Pentium IV system.
C. Heuristic Algorithm
Although the algorithm presented in the previous section computes the optimal solution to the thermal-safe test-scheduling problem, it requires significant computational effort, especially because it requires a large number of thermal simulations. This is mainly because no knowledge of the heat-transfer paths is used while computing the test schedule, and the thermal compliance check is performed only in a postscheduling phase. This implies that for tight thermal constraints (such as in the case of system_s shown in Table I), several iterations and, thus, several thermal-simulation runs are required until a valid solution is found. The thermal-simulation effort required to identify thermal-safe test schedules can be reduced by exploiting the knowledge of the on-chip heat-transfer paths. There are two predominant paths for heat transfer out of the integrated-circuit package. The first one is from the die to the surrounding package material, then to the package lead frame and on to the printed circuit board, and finally to the ambient air. The second path is from the package to the heat spreader, to the heat sink, and then to the ambient air. Local die temperature is strongly dependent on the proximity of other heat sources because nearby heat sources mean that more heat has to flow through the same paths. Therefore, keeping simultaneous heat sources as far apart as possible reduces the probability of hot spots.
In order to capture the thermal interactions between different cores that are tested concurrently, we have derived a thermo-resistive model for the test sessions. The basic idea is to derive a quantitative measure of the lateral heat-removal paths for a core by taking into account the thermal interactions with active neighboring cores.
The duality between the electrical and thermal domains, illustrated in Table III, offers a convenient basis for an architecture-level thermal model. According to this duality relationship, heat flow can be described as a "current" passing through a thermal resistance, leading to a temperature difference analogous to a "voltage." The thermal resistance $R_{th}$ is directly proportional to the thickness of the material ($t$) and inversely proportional to the cross-sectional area across which the heat is being transferred ($A$)
$$R_{th} = \frac{t}{k \cdot A} \qquad (1)$$
where $k$ is the thermal conductivity of the material per volume unit (100 W/mK for silicon and 400 W/mK for copper at 85 °C). In order to clarify how the lateral thermal resistances are computed, consider the two adjacent cores CORE1 and CORE2 shown in Fig. 3. The chip thickness is $t$ and the core dimensions are ($L_1$, $W_1$) and ($L_2$, $W_2$), respectively. The lateral resistance $R_{th21}$ is the thermal resistance from the center of CORE2 to the shared edge of CORE1 and CORE2. In this case, the heat is constricted from CORE1 to CORE2 via the surface areas defined by $L_1 \cdot t$ and $L_2 \cdot t$. The constriction thermal resistance can be calculated by assuming the heat-source area to be $L_1 \cdot t$, the silicon bulk area that accepts the heat to be $L_2 \cdot t$, and the thickness of the bulk to be $W_2/2$. With these values found, the spreading/constriction resistance can be computed using the formulas given in [8]. The resistance is of the spreading type if the lateral area of the source is smaller than the bulk lateral area, and it is of the constriction type otherwise. When computing the lateral thermal resistances, each core is assumed to present a thermal resistance toward each neighboring core.
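As a rough numerical illustration of the slab relation, in which resistance equals thickness divided by the product of conductivity and cross-sectional area, the sketch below evaluates the plain conduction term from the center of a core to one of its edges. The spreading/constriction correction from [8] is deliberately omitted, and the core dimensions and die thickness are hypothetical values chosen for easy arithmetic.

```python
def thermal_resistance(thickness_m, conductivity_w_per_mk, area_m2):
    """Slab thermal resistance R_th = t / (k * A), in K/W."""
    return thickness_m / (conductivity_w_per_mk * area_m2)


# Hypothetical geometry: a 4 mm x 4 mm core on a 0.5 mm thick silicon die.
CORE_LENGTH_M = 4e-3      # L2, the edge shared with the neighboring core
CORE_WIDTH_M = 4e-3       # W2, measured perpendicular to the shared edge
DIE_THICKNESS_M = 0.5e-3  # chip thickness
K_SILICON = 100.0         # W/mK, the silicon value quoted in the text

# Heat travels W2/2 from the core center to the edge, through a cross
# section of L2 times the die thickness (spreading/constriction ignored).
r_center_to_edge = thermal_resistance(
    CORE_WIDTH_M / 2.0, K_SILICON, CORE_LENGTH_M * DIE_THICKNESS_M
)
```

With these assumed dimensions, the conduction term comes to about 10 K/W. A wider shared edge or a thicker die lowers it, while a longer path from the core center raises it.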
The lateral thermo-resistive representation for the example floorplan shown in Fig. 1(a), according to the thermo-resistive thermal model presented in [20], is shown in Fig. 4. In the following, we propose a test-session thermal-cost model aiming to capture the thermal effects due to the physical proximity of simultaneously active cores, since these effects can be controlled by the choice of cores that are to be active at the same time. The proposed test-session thermal-cost model is based on the following assumptions.
2) The heat transfer between two cores tested concurrently is considered to be negligible; hence, the thermal resistance between those cores is ignored for the current test session. This is a valid assumption because the amount of exchanged heat depends on the temperature difference, which is low for cores tested at the same time.
3) Inactive cores are assumed to be thermally grounded, i.e., their temperature is assumed to be equal to the ambient temperature and fixed for the entire duration of the test session.
Let us consider a test session consisting of cores C2, C4, and C5 from our example shown in Fig. 1. The lateral thermo-resistive model derived for this test session according to the previous assumptions is shown in Fig. 5(a). The white arrows pointing to the center of the active cores signify the power dissipated by each of the cores, which is pumped away from the core through the lateral thermal resistances. As can be observed, the thermal resistances between the pairs of nodes corresponding to active cores (such as [C2, C5] and [C4, C5]) are omitted (assumption 2), while all remaining thermal resistances connect the active core nodes to the ambient, i.e., thermal ground (assumption 3). According to this model, the heat-transfer paths from an active core to its cooler surroundings appear as a number of thermal resistances in parallel. For example, core C4 in Fig. 5(a) has three lateral heat-removal paths, toward cores C1 and C6 and the left chip edge (west edge). C5 does not represent a heat-removal path for C4 since it is itself an active and, thus, "hot" core. A small equivalent lateral spreading resistance associated with an active core represents good heat exchange between the core and the ambient; consequently, it predicts a lower core temperature during test. On the other hand, a large lateral thermal spreading resistance means poor heat exchange with the ambient; therefore, it signals a potential hot spot during test for cores with high power consumption. This can be seen by comparing the lateral thermo-resistive models shown in Fig. 5. Each active core in Fig. 5(a) has only three lateral heat-removal paths, represented by three thermal resistors. Cores C2 and C4 shown in Fig. 5(b) have both gained an additional lateral heat-removal path through the removal of core C5 from the test session. The equivalent lateral thermal resistances of cores C2 and C4 are lower in this case than in the scenario for test session [C2, C4, C5] shown in Fig. 5(a). The thermal simulations performed on the two test sessions shown in Fig. 5 yielded a maximum temperature of 103.20 °C for [C2, C4] and 127.19 °C for [C2, C4, C5], which supports our earlier observations.
The thermal-cost model we are proposing for a core is basically the value of the equivalent thermal resistance toward cooler surroundings weighted by the power dissipated by that core, as shown in (2). This is necessary in order to account for the actual power density of the core as well as for the lateral heat-removal paths
In order to assess the impact of the lateral heat exchange on the core temperature, we have performed the following experiment. We randomly generated a number of test sessions. For each core in each test session, we computed its thermal cost according to (2), and the worst case (i.e., maximum) values were correlated with the maximum temperature reached during the execution of each test session. The high correlation coefficients obtained for several designs, shown in Table IV, suggest that lateral heat spreading has a significant influence on the maximum core temperature. The maximum core temperature during test was determined through thermal simulations using the HotSpot tool [20].
Based on the results of the previous experiment, we are extending our thermal-cost model to test sessions as follows:
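One way to read the cost model described in prose above is sketched below. It is an interpretation and not the authors' formulas (2) and (3). The per-core cost is taken as the core power times the parallel combination of its lateral resistances toward inactive neighbors. The session cost is assumed to be the maximum per-core cost, in line with the worst-case values used in the correlation experiment. All resistance and power values, the neighbors listed for C2 other than C5, and the function names are hypothetical.

```python
# Hypothetical lateral resistances (K/W) toward each neighbor or chip edge.
LATERAL_R = {
    "C2": {"C1": 8.0, "C3": 8.0, "NORTH": 8.0, "C5": 8.0},
    "C4": {"C1": 8.0, "C6": 8.0, "WEST": 8.0, "C5": 8.0},
    "C5": {"C2": 8.0, "C4": 8.0, "EAST": 8.0, "SOUTH": 8.0},
}
POWER_W = {"C2": 5.0, "C4": 5.0, "C5": 5.0}  # hypothetical test power per core


def equivalent_lateral_resistance(core, active, lateral_r):
    """Parallel combination of the paths toward inactive (thermally grounded) neighbors."""
    conductance = sum(
        1.0 / r for neighbor, r in lateral_r[core].items() if neighbor not in active
    )
    return float("inf") if conductance == 0.0 else 1.0 / conductance


def core_thermal_cost(core, active, lateral_r, power):
    """Equivalent lateral resistance weighted by the power dissipated by the core."""
    return power[core] * equivalent_lateral_resistance(core, active, lateral_r)


def session_thermal_cost(session, lateral_r, power):
    """Assumed session cost: the worst (maximum) per-core cost in the session."""
    active = set(session)
    return max(core_thermal_cost(c, active, lateral_r, power) for c in session)


cost_without_c5 = session_thermal_cost(["C2", "C4"], LATERAL_R, POWER_W)
cost_with_c5 = session_thermal_cost(["C2", "C4", "C5"], LATERAL_R, POWER_W)
```

With these assumed values, adding C5 removes one heat-removal path from both C2 and C4 and leaves C5 with only two paths, so the session cost rises. This mirrors the qualitative behavior reported for Fig. 5.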
In the following, we present a fast heuristic algorithm for computing thermal-safe test schedules, which uses the proposed test-session thermal-cost model in order to reduce the required number of thermal simulations (see Fig. 6). As in the exact algorithm presented in Section III-A, the heuristic algorithm starts by checking whether the individual cores comply with the maximum-allowable-temperature limit $T_{\max}$ (lines 1-6). Once all cores have passed this test, they are marked as available (line 7) and are arranged in descending order of their test lengths (line 8). While there are still unscheduled cores, the algorithm tries to assign them to the current test session (TS). The TS is initially empty (line 11), and cores are added to it until no core can be added due to resource conflicts (lines 12-22). A thermal simulation is performed on TS to verify whether it complies with $T_{\max}$. In our experiments, we have used the HotSpot tool presented in [20]; however, any other thermal simulator could be used for this purpose. Consequently, the accuracy of the results depends on the chosen thermal simulator. According to the data reported in [20], the simulation-accuracy error of HotSpot is at most 5.8% with respect to FloTherm, the commercial thermal simulator from FloWorks (http://www.floworks.com).
TABLE IV: CORRELATION BETWEEN TEST-SESSION THERMAL COST AND MAXIMUM CORE TEMPERATURE
If the maximum temperature for TS complies with $T_{\max}$, the test session is added to the test schedule and the process is repeated for the unscheduled cores. If the maximum temperature during TS exceeds $T_{\max}$, the flag GotCostLimit is set to true, and a thermal-cost limit is computed based on the thermal cost of TS and the fraction by which $T_{\max}$ was exceeded (lines 34-37). The thermal cost of TS is computed according to (3). The test-session cost limit is computed as the thermal cost of TS, scaled down linearly by the fraction by which $T_{\max}$ was exceeded. The thermal-cost adjustment factor is computed as
where K ∈ (0, 1] is a user-specified constant used to relax the thermal-cost limit (ThCostLimit). In our experiments, we have used K = 0.5. Once a thermal-cost limit has been computed, a core is added to the current test session only if it does not raise the test-session thermal cost above ThCostLimit (lines 23-38). This way, it is ensured that once a thermal violation has been detected, test sessions with similar or worse lateral-heat-exchange capabilities are avoided without requiring lengthy thermal simulations. We illustrate the steps of this algorithm using the example system shown in Fig. 1. The same thermal constraint $T_{\max}$ = 110 °C used for the exact algorithm in Section III-A is used here as well. After the initial core check, the available array is initialized with all cores in the system arranged in descending order of their test lengths (line 7).
GotCostLimit is set to false (line 8) and an empty test session TS is created (line 11). The first core added to the TS is C5. The next available core, C3, cannot be added to TS because it is not test compatible with C5 [see Fig. 1(b)] (line 15). This process continues until no more cores can be added to TS. At this moment, TS = [C5, C4, C2]. A thermal simulation is performed on TS (line 31) to determine the maximum temperature reached during TS, in this case 127.19 °C. This violates the thermal constraint of 110 °C; therefore, the algorithm proceeds by setting GotCostLimit to true and computing the thermal-cost limit based on the maximum temperature reached during TS and the thermal-cost value of TS: ThCostLimit = 35.43 (line 36). The algorithm continues by discarding TS and building a new test session, this time also checking that the thermal cost of the test session does not exceed the previously computed thermal-cost limit. The first core added to TS is C5. At this point, the thermal cost for TS is 26.88. This is below the imposed limit; therefore, the algorithm continues to add test-compatible cores to TS (line 26). The next core to be added is C4, which raises the thermal cost of TS to 32.01. This is still below the imposed limit, so a new core, C2, is added to TS. The thermal cost of TS becomes 39.57, which is over the imposed limit. Consequently, C2 is removed from TS. Since no more cores can be added to TS, a thermal simulation is performed on TS (line 31). The maximum temperature is found to be 117.04 °C, which violates the thermal constraint. The [...], where N is the number of cores. However, the thermal simulation (line 31 in Fig. 6), which is the most computationally expensive part of the algorithm, is performed in the outermost loop, which has a complexity of only O(N). This is a considerable improvement in terms of the required computational effort over the exact algorithm described in Section III-A, which has an exponential complexity due to the NP-hard nature of the optimization problem.
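The heuristic described in this section can be sketched as follows. This is a simplified illustration and not the listing of Fig. 6. The callbacks `thermal_cost`, `max_temperature`, and `compatible` are hypothetical stand-ins for the cost model, the thermal simulator, and the TCG. The adjustment formula in `cost_limit` is an assumption. It was chosen because, with K = 0.5 and an assumed ambient temperature of 45 °C, it reproduces the ThCostLimit = 35.43 value of the walkthrough from a session cost of 39.57 and a peak of 127.19 °C. It should not be read as the authors' equation. Keeping the limit in force across later sessions is also an assumption.

```python
def cost_limit(session_cost, t_reached, t_max, t_ambient, k=0.5):
    """Scale the session cost down by K times the relative overshoot (assumed form)."""
    overshoot = (t_reached - t_max) / (t_reached - t_ambient)
    return session_cost * (1.0 - k * overshoot)


def heuristic_schedule(cores, compatible, test_length, thermal_cost,
                       max_temperature, t_max, t_ambient, k=0.5):
    """Greedy session building, guided by a thermal-cost limit after a violation."""
    # Longest tests first, as in the description of the heuristic.
    available = sorted(cores, key=lambda c: test_length[c], reverse=True)
    limit = None  # plays the role of GotCostLimit / ThCostLimit
    schedule = []
    while available:
        session = []
        for core in available:
            if not all(compatible(core, other) for other in session):
                continue
            candidate = session + [core]
            # The first core is always admitted: single cores were checked beforehand.
            if session and limit is not None and thermal_cost(candidate) > limit:
                continue
            session = candidate
        t_reached = max_temperature(session)
        if t_reached <= t_max:
            schedule.append(session)
            available = [c for c in available if c not in session]
        elif len(session) == 1:
            raise ValueError(f"core {session[0]} alone violates the thermal limit")
        else:
            # Discard the session and tighten the limit before rebuilding it.
            limit = cost_limit(thermal_cost(session), t_reached, t_max, t_ambient, k)
    return schedule
```

Each violation lowers the limit below the cost of the rejected session, so the same multi-core session cannot be rebuilt. The thermal simulator is called once per candidate session instead of once per candidate schedule.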
D. Experimental Results for Heuristic Algorithm
A number of experiments have been performed in order to assess the performance of the proposed test-scheduling heuristic. The first set of experiments was used to compare the proposed heuristic with the power-constrained test-scheduling approach presented in [7]. The results of these experiments are reported in Table V. The maximum temperature during test corresponding to the power-constrained test schedules was used as a thermal constraint ($T_{\max}$) for the proposed test-scheduling algorithm. This way, it is guaranteed that the resulting test schedules will not lead to higher temperatures than those reached using the power-constrained approach. From the fourth and fifth columns, it can be observed that the proposed heuristic algorithm outperformed the power-constrained test-scheduling approach for all designs, producing up to 24% shorter test schedules. Moreover, for three out of the six designs considered, the heuristic algorithm produced the same test schedules as the exact algorithm presented in Section III-A. The last column in Table VI shows significant reductions in terms of thermal-simulation effort when compared to the exact algorithm. For example, the simulation length was reduced from 54 to 18 s for system_s.
TABLE V: POWER CONSTRAINED TEST SCHEDULING VERSUS HEURISTIC THERMAL-AWARE TEST SCHEDULING
TABLE VI: TEST TIMES FOR DIFFERENT TEMPERATURE CONSTRAINTS
Another set of experiments was performed to analyze the effect of different maximum temperature limits on the test time [...] °C reduced the simulation effort by half, from nearly 18 s to less than 8.5 s.
Table VII compares the proposed heuristic and the exact thermal-safe test-scheduling algorithms. As mentioned earlier, the heuristic determines the optimum solution for four out of the six designs considered. In only one case did the required thermal-simulation effort exceed that required by the exact algorithm, while in all other cases reductions of up to 75% were obtained.
IV. CONCLUSION
Overheating has been acknowledged as a major problem during the testing of complex SOC integrated circuits. In this paper, we have outlined the need for thermal-safe testing and explained that existing power-constrained test-scheduling approaches cannot guarantee thermal safety during test. Next, we have proposed a new test-scheduling approach that produces short test schedules and guarantees thermal safety during testing at the same time. Two algorithms have been developed for the proposed thermal-safe test-scheduling approach. The first proposed algorithm, although computationally expensive, provides an optimal solution to the thermal-safe test-scheduling problem. The second algorithm uses a fast heuristic based on a low-complexity test-session thermal model in order to reduce the required computational effort while producing optimal or near-optimal test schedules. Experimental results show that up to 24% shorter test schedules can be obtained using the proposed approach without increasing the maximum temperature during test application, when compared to power-constrained test-scheduling approaches. The proposed approach provides an effective solution to the problems arising from chip overheating during test.
