SUMMARY Rapid advances in semiconductor manufacturing technology have led to higher chip power densities, which places greater emphasis on packaging and temperature control during testing. For system-on-chips, peak power-based scheduling algorithms have been used to optimize tests under specified power constraints. However, imposing power constraints does not always solve the problem of overheating due to the non-uniform distribution of power across the chip. This paper presents a TAM/Wrapper co-design methodology for system-on-chips that ensures thermal safety while still optimizing the test schedule. The method combines a simplified thermal-cost model with a traditional bin-packing algorithm to minimize test time while satisfying temperature constraints. Furthermore, for temperature checking, thermal simulation is done using cycle-accurate power profiles for more realistic results. Experiments show that even a minimal sacrifice in test time can yield a considerable decrease in test temperature as well as the possibility of further lowering temperatures beyond those achieved using traditional power-based test scheduling.
Introduction
As feature sizes and frequencies of newer System-on-Chips scale much faster than operating voltages, not only power densities but also heat densities will increase considerably. Furthermore, the problem of overheating becomes much larger during testing, when beyond-normal switching activity occurs due to the need to test cores concurrently to shorten test time. Overheating can lead to problems such as increased leakage power and even permanent chip damage. For every 20 °C rise in temperature, there is approximately a 5-6% increase in interconnect delay [15]. These timing uncertainties can result in further yield loss. Traditionally, simply using better packaging and cooling methods would suffice, but this has become increasingly difficult and expensive. To reduce packaging cost, packages have increasingly been designed for the worst-case typical application [12], [13], and the cost of cooling during test application has become prohibitive.
For SoCs, test planning usually involves the design of a test data delivery method (TAM: Test Access Mechanism) and the use of wrappers which isolate cores under test. While several approaches to optimize wrapper designs for single-frequency embedded core test [1], [2] have been proposed, Iyengar et al. [3], [4] integrated the process into one wrapper and TAM co-optimization algorithm. Up to now, limiting power consumption during test has been the main method of temperature control, and test scheduling under power constraints has been considered in [4]-[7]. Because of the non-uniform spatial power distribution across the chip, limiting the maximum chip-level power dissipation is not effective in reducing and avoiding localized heating (called hot spots), which occurs faster than chip-wide heating [9], [12], [13], as shown in Table 1. In Table 1, the maximum test temperatures, maxT, do not scale with the power constraints P_max for the SoC p93791 using the power-constrained method in [4]. Furthermore, power-constrained test scheduling does not allow further exploration of schedule variations with the same test peak power. For the benchmark SoC d695 with the layout shown in Fig. 1, two schedules having the same peak power value can have different peak temperatures, as shown in Fig. 2. The peak temperatures for the three hottest cores, c5, c6, and c10, are indicated, and the maximum temperature for c5 varies from 89.6 °C to 77.2 °C simply by changing its allocated TAM width from 32 to 31 bits.
Table 1: p93791, TAM = 32.

In this paper, we propose a design framework which integrates the TAM/wrapper co-optimization process with a thermal-aware test scheduling algorithm. The main contributions of this paper are as follows: (1) we consider a different cycle-accurate power profile per wrapper configuration for more realistic results; (2) we present a simplified thermal cost function and develop a test scheduling algorithm to minimize the overall test time while satisfying temperature constraints; (3) we show that for the ITC'02 SoC benchmarks [8], even a small increase in test time can yield a considerable decrease in test temperature as well as the possibility of further lowering temperatures beyond those achieved using traditional power-based test scheduling.
The rest of this paper is organized as follows. A review of related works is given in Sect. 2. The motivation for this work is discussed in Sect. 3. Section 4 discusses the proposed TAM/wrapper co-optimization algorithm and the proposed test scheduling algorithm. Section 5 gives the experimental results while Section 6 concludes this paper.
Related Work and Motivation
Rosinger et al. [9] first proposed using a thermal model as a guide to test scheduling instead of a chip-level power constraint. They used the RC-equivalent micro-architecture thermal model from [12] - [14] which in turn makes use of the well-known duality between heat transfer and electrical phenomena: heat can be described as a current passing through a thermal resistance and leading to a temperature difference analogous to a voltage [12] . More specifically, [9] only considered the lateral flow of heat away from an active core by reducing a chip into a network of thermal resistances as shown in Fig. 3 . The same thermal resistance network model is used in this work. The proposed test scheduling algorithm in [9] uses a test compatibility graph as its basis and cores are grouped into test sessions which are applied sequentially.
In [10], Liu et al. define a "hot spot" as a core whose temperature is substantially higher than the average temperature over all cores. They proposed two algorithms which try to spread heat more evenly over a chip via layout information and a progressive weighting function, respectively. For this work, we define a "hot spot" as any core which exceeds the thermal constraint during test. Thus, unlike in [10], a core can be scheduled even if its temperature is much higher than that of its surrounding cores. In [11], He et al. proposed using test partitioning and interleaving to allow hot cores to cool off while freeing the test resources to test other cores and avoid overheating.

Fig. 3 Lateral thermo-resistive model [9].

Table 2 Max. temperatures of d695 under various power constraints using different power models (d695, TAM = 24).
For all previous methods, only a single fixed power value per core was considered and steady-state temperatures were used as temperature upper bounds [9]. However, this is not realistic, as shown in Table 2, where the peak temperatures of test schedules simulated using static average power, T_pavg, are usually lower than the cycle-accurate values T_real, while the maximum temperatures obtained using peak power values, T_peak, are usually much higher and can be considered pessimistic.
Furthermore, the choice of using a single fixed power profile per core is also not realistic. From our experiments, we found that higher TAM widths (and therefore shorter test times) can yield lower maximum temperatures despite having higher peak power values. This is shown in Fig. 4, where the temperature profile as well as the peak temperature of core 5 of d695 varies with TAM width. This can be attributed to the varying power profile per TAM configuration [7] as well as the RC characteristic of temperature rise: if a test can finish before the temperature curve reaches steady state, the capacitance can have a "filtering" effect on the maximum temperature values. Thus, test time must also be considered when deriving a thermal model or thermal cost function, as discussed in the next section.
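The "filtering" effect can be illustrated with a single lumped RC node heated at constant power. The following is only a sketch under stated assumptions: the thermal resistance, capacitance, ambient temperature, and the two wrapper configurations are hypothetical values chosen for illustration, not data from the experiments, and a single node is far simpler than the full thermal network.

```python
import math

def peak_temperature(power_w, test_time_s, r_th=2.0, c_th=5.0, t_amb=45.0):
    """Temperature at the end of a test for one lumped RC node.

    T(t) = t_amb + P * R * (1 - exp(-t / (R * C))).
    r_th [degC/W], c_th [J/degC] and t_amb [degC] are assumed values.
    """
    tau = r_th * c_th  # thermal time constant in seconds
    return t_amb + power_w * r_th * (1.0 - math.exp(-test_time_s / tau))

def steady_state_temperature(power_w, r_th=2.0, t_amb=45.0):
    """Upper bound reached only if the test runs long enough."""
    return t_amb + power_w * r_th

# Hypothetical configurations of the same core:
# narrow TAM: lower power, long test; wide TAM: higher power, short test.
narrow_peak = peak_temperature(power_w=10.0, test_time_s=30.0)
wide_peak = peak_temperature(power_w=12.0, test_time_s=8.0)

print(f"narrow TAM: {narrow_peak:.1f} degC, wide TAM: {wide_peak:.1f} degC")
```

With these assumed numbers the wide configuration has the higher steady-state bound but the lower peak, because its test ends before the temperature curve saturates.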
Finally, flexible TAM width and partitioned testing were also outside the scope of [9] and [11]. To the best of our knowledge, this is the first work which attempts to integrate TAM/wrapper co-optimization and test scheduling under a thermal constraint using a cycle-accurate power profile per wrapper configuration for more realistic temperature simulation.
TAM/Wrapper Co-optimization and Test Scheduling
In this section, we formally present the TAM/Wrapper co-optimization and test scheduling problem P_TWOP.
Problem P_TWOP: For an SoC S, given:
• W_ext: external TAM width allocated to the SoC
• N_C: number of cores
• Temp_max: maximum allowed temperature during test
• TAM_ij: allotted TAM width

Determine the following output:

Rectangular 2-D bin packing has been extensively used to solve the test scheduling problem for embedded cores. Each wrapper configuration of a core is represented by a rectangle whose width and height represent test application time and TAM width, respectively. The rectangles are packed into a bin with unbounded width, representing overall test time, and bounded height, representing external TAM width. The aim is to find the optimal way of packing the rectangles such that the overall test time (i.e., the bin width) is minimized. For scheduling under a power constraint, the problem can be extended into a restricted 3-D bin packing problem where the length, width and height represent total test time, peak power and TAM width, respectively, for an SoC core. For this paper, previous bin-packing algorithms cannot be directly applied since we cannot simply add the various temperatures of the cores to obtain the overall temperature of the SoC. Furthermore, since it has been shown that the bin packing problem is NP-hard, this paper proposes a heuristic algorithm to solve the problem.
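The rectangle view of a schedule can be sketched in a few lines. The class and function names below are hypothetical, and the feasibility check is a simplification that only compares the summed TAM width of overlapping tests with W_ext; it does not model the assignment of individual TAM wires.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    core: str
    tam_width: int   # rectangle height
    test_time: int   # rectangle width
    start: int       # position along the time axis

    @property
    def end(self):
        return self.start + self.test_time

def overall_test_time(rects):
    """Bin width actually used: the latest end time of any rectangle."""
    return max((r.end for r in rects), default=0)

def fits_in_bin(rects, w_ext):
    """True if concurrently tested cores never need more than w_ext wires."""
    # TAM usage only rises at start times, so checking those is sufficient.
    for t in {r.start for r in rects}:
        used = sum(r.tam_width for r in rects if r.start <= t < r.end)
        if used > w_ext:
            return False
    return True

# Hypothetical three-core schedule.
schedule = [
    Rect("c1", tam_width=16, test_time=100, start=0),
    Rect("c2", tam_width=16, test_time=60, start=0),
    Rect("c3", tam_width=16, test_time=40, start=60),
]
print(overall_test_time(schedule), fits_in_bin(schedule, w_ext=32))
```

A thermal constraint cannot be added to this check as a third summed dimension, which is why a separate cost function is needed.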
Thermal Cost Function
Since temperature cannot be handled in the same simple and direct way as power (i.e., simple superposition is inapplicable), we need a thermal model and cost function which can express the heating phenomena effectively and simply without the need for data from thermal simulations.
The results in [9] show that there exists a positive correlation between heat and the heat dissipation paths represented by lateral thermal resistances. Thus, we have chosen lateral thermal resistance as one of the bases for our model and cost function, with the necessary modifications to the assumptions of previous works so that the model better approximates heating patterns during testing. From Eq. 1, the thermal resistance R_TH between two adjacent bodies is directly proportional to the thickness t of the heat source and inversely proportional to the cross-sectional area A of the destination across which the heat is being transferred and to the thermal conductivity k of the material per volume unit:

$$R_{TH} = \frac{t}{k \cdot A} \qquad (1)$$
First, similar to [9], it is assumed that heat transfer between two cores tested concurrently is negligible, and the thermal resistances between these cores are removed as shown in Fig. 5, where we are left with lateral resistances in parallel for core 1 and core 2. Since the thermal resistances of a core C_i are in parallel with each other, the total thermal resistance Rth_TOT,i can be computed as follows:

$$Rth_{TOT,i} = \left( \sum_{j=1}^{N} \frac{1}{R_{i,j}} \right)^{-1} \qquad (2)$$

where N is the total number of thermal resistances R_i,j of C_i. Note that removing a thermal resistance from the network increases the total thermal resistance, which reflects the fact that there are fewer paths for heat to escape through and hence a higher maximum temperature for the core.

The assumption made in [9] that inactive cores are thermally grounded and do not heat up is not realistic unless ample time is given for tested cores to cool down before the next test session, as shown in Fig. 6, where c5 can increase the temperatures of its inactive peripheral cores by as much as 10 °C for c10. Obviously, such cooling periods are not practical because of the required increase in idle time. Furthermore, our experiments show that the temporal dimension, more specifically the test length as well as the order in which cores are tested, can greatly affect the maximum temperature of the next core to be tested, as shown in Fig. 7, where the peak temperature of core 5 increases by 7 °C when core 10 is tested right before it (Fig. 7 (b)) compared to the opposite sequence (Fig. 7 (a)). Thus, when a core is about to be tested, the lateral resistances to cores whose test has just ended are also removed from the total lateral resistance. For example, if core 2 is tested right after core 1 in Fig. 3, then R_2,1 is removed.
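The effect of removing lateral resistances can be sketched as a parallel combination over the remaining heat paths. The function name, the neighbour labels, and the resistance values below are hypothetical and serve only to illustrate the rule described above.

```python
def total_lateral_resistance(resistances, removed=()):
    """Parallel combination of a core's remaining lateral resistances.

    resistances: dict mapping neighbour name -> lateral thermal resistance
    removed: neighbours tested concurrently or whose test has just ended
    """
    active = [r for name, r in resistances.items() if name not in removed]
    if not active:
        return float("inf")  # no lateral escape path is left
    return 1.0 / sum(1.0 / r for r in active)

# Hypothetical lateral resistances of core 2 to its neighbours.
r_core2 = {"core1": 4.0, "core3": 4.0, "core4": 2.0}

all_paths = total_lateral_resistance(r_core2)
after_core1 = total_lateral_resistance(r_core2, removed={"core1"})

print(all_paths, after_core1)
```

Testing core 2 right after core 1 removes one path, so the total resistance seen by core 2 rises, which the cost function interprets as a hotter situation.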
Furthermore, the time dependence of temperature and the power consumption must also be considered. As a rule, we want to test hot cores with large power densities for as short a time as possible and minimize their effects on other cores (i.e., avoid concurrency and immediate precedence with cores in the immediate physical periphery of the hot spot core).
Due to the localized nature of hot spots as well as the effects of layout and varying thermal resistance configurations, the core with the highest thermal cost is not always hotter than cores with lower thermal costs. Thus, we define the following thermal cost for each core C_i with respect to its wrapper configuration w_ij and time t as shown below:
TAT_ij is the test application time and p_ij is the average power computed from the power profile P_ij for wrapper configuration w_ij. The lateral resistance R_THi is expressed as a function of time because it changes according to when core C_i is scheduled and which cores are tested before it as well as concurrently with it. In our experiments, the average power dissipation was found to give a thermal profile curve closer to the actual thermal profile derived from cycle-accurate values than peak power values did. Thus, instead of considering cycle-accurate power, we chose to use average power values, which vary with w_ij, to greatly simplify cost calculations. The main idea is to pick out hot spot cores, determine an upper limit on their thermal cost, cost_max_i, and gradually decrease this limit until the thermal constraint is satisfied. Furthermore, a thermal cost minimum is computed which represents the worst-case configuration of a core to be packed. It inevitably leads to the core being tested alone regardless of time frame, and not preceded by any immediate peripheral cores, as given by the equation below:
where Cost_i(w_ij, NULL) denotes the cost of unscheduled core C_i with wrapper configuration w_ij when no thermal resistance is removed in Eq. 3, denoted by a NULL time.
Test Scheduling Algorithm
The pseudo-code for our proposed algorithm is shown in Fig. 8 .
Init: Optimal Wrapper Configuration Creation
The initialization steps (lines 1-5 of Fig. 8) first make sure that a configuration can be found for each core which satisfies the thermal constraint Temp_max. Initially, the highest cost cost_max is set to infinity, and the minimum cost cost_min is computed for each core (line 4). The algorithm then uses a selection process introduced in [4], where Pareto-optimal points of the TAM width vs. test time graph are chosen as optimal wrapper configurations (w_iopt) in line 5. When choosing optimal wrapper configurations, the thermal cost must always satisfy both cost constraints.
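One simple way to pick Pareto-optimal points from a core's TAM width vs. test time curve is sketched below. This is an illustrative filter rather than the selection procedure of [4], it leaves out the thermal cost limits, and the configuration values are hypothetical.

```python
def pareto_optimal(configs):
    """Keep the points where a wider TAM actually shortens the test.

    configs: iterable of (tam_width, test_time) pairs for one core.
    Returns the kept pairs in order of increasing TAM width.
    """
    kept = []
    shortest = float("inf")
    for width, time in sorted(configs):
        if time < shortest:  # strictly better than every narrower option
            kept.append((width, time))
            shortest = time
    return kept

# Hypothetical staircase: widths 3 and 5 bring no gain over widths 2 and 4.
core_configs = [(1, 100), (2, 60), (3, 60), (4, 35), (5, 35), (6, 30)]
print(pareto_optimal(core_configs))
```

In the setting described above, each kept point would additionally have to satisfy the cost limits before being accepted as an optimal wrapper configuration.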
Priority 1: Packing Rectangles with Optimal Wrapper Configuration
Before packing, the algorithm takes note of the current time in the schedule, denoted by the variable current_t. In line 8, we try to pack as many cores as possible using optimal TAM widths while available_TAM > 0. Each core C_i is examined in order of decreasing thermal cost when using its optimal wrapper configuration, denoted by Cost_i(w_iopt, NULL), since potential hot spot cores should be scheduled as early and as quickly as possible to minimize their effects on subsequent cores.
Here and in all subsequent steps, the thermal costs for all active cores are computed and checked against their upper and lower limits before packing, since they change whenever a new core is scheduled. The Assign() function in Fig. 9, invoked after every priority step, updates the parameters of the core to be scheduled (lines 1-3), recomputes the thermal costs of all the scheduled cores (line 4), and updates the remaining core list (line 5) and the available TAM (line 6). As the algorithm iterates further, hot spot cores are gradually separated from each other during scheduling due to the imposition of cost limits.
Priority 2: Insertion of Rectangles into Idle Space
If no rectangle can be packed in its optimal configuration, the algorithm looks for a core C_i whose optimal TAM width TAM_iopt is less than or equal to available_TAM + α, where 1 ≤ α ≤ 4, in line 9. In Fig. 10, the wrapper configuration of core 4 was changed to a non-optimal configuration (c4_old to c4_new) and inserted into the idle space of the schedule.
Priority 3: Filling Idle Space by Increasing TAM width
The algorithm checks among the currently scheduled cores whose start times t_start equal current_t and determines which core would have the largest gain in test time if given the unused TAM lines, and packs this core in line 10. In Fig. 11, the TAM width allotted to the scheduled core c6 is changed to minimize wasted TAM wires. In line 11, current_t is updated when available_TAM becomes zero or when no cores can be scheduled in lines 8-10.
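The selection made in this priority step can be sketched as follows. The function name, the test-time table, and its values are hypothetical, and the sketch ignores the thermal cost check that accompanies every packing step.

```python
def core_with_largest_gain(active_tam, spare_tam, test_time):
    """Pick the active core whose test shortens most when given the spare wires.

    active_tam: dict core -> currently allotted TAM width
    spare_tam:  number of unused TAM wires at current_t
    test_time:  callable (core, tam_width) -> test application time
    """
    if not active_tam or spare_tam <= 0:
        return None

    def gain(core):
        width = active_tam[core]
        return test_time(core, width) - test_time(core, width + spare_tam)

    best = max(active_tam, key=gain)
    return best if gain(best) > 0 else None

# Hypothetical test-time table: TIMES[core][tam_width]
TIMES = {"c6": {12: 900, 16: 700}, "c10": {8: 500, 12: 450}}

def lookup(core, width):
    return TIMES[core][width]

chosen = core_with_largest_gain({"c6": 12, "c10": 8}, spare_tam=4, test_time=lookup)
print(chosen)
```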
Updating and Cost Adjustment
When all cores have been scheduled, thermal simulation using the HotSpot tool is performed with cycle-accurate power profiles in line 13. The peak chip-wide temperature maxT is then compared to the thermal constraint. If the constraint is satisfied, the program ends. If not, cost adjustment is performed on the hottest core C_hot in lines 14-15 and cost_max_hot is updated. Line 16 looks for the next hottest core to adjust when the current hot spot core's cost can no longer be adjusted. The program ends when the thermal constraint is satisfied or no more cores can be adjusted. The adjustment factor, adjust_factor, can be any value between 0 and 1. For this work, a constant factor of 0.90 is used.
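The outer adjustment loop can be sketched as below. Several details are assumptions made for illustration: the scheduler and the thermal simulator are passed in as hypothetical callables, and a core's new limit is taken as adjust_factor times the smaller of its current limit and its current cost, which the text does not specify. The toy scheduler and simulator at the end are stand-ins, not models of the actual tools.

```python
def adjust_until_safe(cores, temp_max, cost_min, build_schedule, simulate,
                      adjust_factor=0.90):
    """Lower per-core cost limits until the simulated peak temperature fits.

    build_schedule(cost_max) -> {core: thermal cost in the resulting schedule}
    simulate(costs)          -> {core: peak temperature}
    Returns the final cost limits, or None if no core can be adjusted further.
    """
    cost_max = {c: float("inf") for c in cores}
    while True:
        costs = build_schedule(cost_max)
        temps = simulate(costs)
        if max(temps.values()) <= temp_max:
            return cost_max  # thermal constraint satisfied
        # Try the hottest core first, then the next hottest, and so on.
        for core in sorted(temps, key=temps.get, reverse=True):
            new_limit = adjust_factor * min(cost_max[core], costs[core])
            if new_limit >= cost_min[core]:
                cost_max[core] = new_limit
                break
        else:
            return None  # every core has reached its minimum cost

# Toy stand-ins: cost is capped by its limit, temperature grows with cost.
BASE_COST = {"c5": 40.0, "c6": 20.0}

def toy_schedule(cost_max):
    return {c: min(BASE_COST[c], cost_max[c]) for c in BASE_COST}

def toy_simulate(costs):
    return {c: 50.0 + cost for c, cost in costs.items()}

limits = adjust_until_safe(BASE_COST, temp_max=80.0,
                           cost_min={"c5": 10.0, "c6": 10.0},
                           build_schedule=toy_schedule, simulate=toy_simulate)
print(limits)
```

In the toy run only c5 is tightened, since it is the only core above the limit; c6 keeps an unbounded cost limit.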
Experimental Result
The experiments were done using three SoCs from the ITC'02 SoC Benchmark suite [8]: d695, p22810, and p93791. For thermal simulation, cycle-accurate power profiles provided by the authors of [7] were used. Note that the actual power profiles were originally expressed as the number of transitions per clock cycle. We converted the values into watts by simply dividing them by 20, 200, and 500 for d695, p22810, and p93791, respectively, to reflect power dissipation during test. Thermal simulation of the test data for d695 reveals that the total test times under the TAM configurations used for this experiment (16, 24, 32, 64) are too short to show any significant heating of the chip. Therefore, when necessary, we increased the length of the sampling interval during thermal simulation to allow the effects of heat to show. This is reasonable if we consider that test sets for delay faults are normally 2-4 times larger than stuck-at fault test sets. Since the test application time per core is normally much larger in magnitude than the lateral resistance, we scaled the test time values for each SoC such that their magnitudes are within an acceptable range of each other (in this work, both total lateral resistance and test time were adjusted to not exceed 100) when computing the thermal costs. Experiments were done using an HP ProLiant workstation with four Opteron CPUs operating at 2.4 GHz and 32 GB of memory.
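The two preprocessing steps described above can be sketched as follows. The divisors are those stated in the text; the linear rescaling is an assumed scheme, since the text only says that values were adjusted to not exceed 100, and the function names are hypothetical.

```python
# Divisors used to convert transitions per clock cycle into watts.
WATT_DIVISOR = {"d695": 20, "p22810": 200, "p93791": 500}

def transitions_to_watts(soc, profile):
    """Convert a power profile given as transition counts into watts."""
    divisor = WATT_DIVISOR[soc]
    return [count / divisor for count in profile]

def scale_to_limit(values, limit=100.0):
    """Rescale linearly so that the largest value equals `limit`.

    Assumed scheme: values already within the limit are left unchanged.
    """
    peak = max(values)
    if peak <= limit:
        return list(values)
    return [v * limit / peak for v in values]

# A peak of 1598 transitions for d695 corresponds to 79.9 W.
print(transitions_to_watts("d695", [1598]))
# Hypothetical test times brought into the same range as the resistances.
print(scale_to_limit([50, 200, 400]))
```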
Since the original SoC benchmarks did not include layout information, we handcrafted the layout of each SoC. The scheduling and thermal simulation results for d695, p22810 and p93791 are shown in Tables 3 to 5. Before applying any thermal constraints, we used our scheduling algorithm to create a base schedule without any constraints. From the non-constrained schedule, we determine its maximum temperature, maxT, and use it as the thermal constraint, Temp_max. We gradually decreased the constraint in 5-degree steps, each time recording the actual maximum temperature (maxT), the test application time (TAT), and the peak power value (P_max), given as the number of switches. We also computed the gains in temperature (dT) with respect to the base temperature as well as the differences in TAT (dTAT). In Table 3 for d695, a maximum temperature gain of 26.64% was achieved with a modest 24.75% increase in TAT (TAM = 32, Temp_max = 80.16 °C). For as little as a 5.30% increase in TAT, we can get a relatively large gain of 20.86% in temperature reduction (TAM = 24, Temp_max = 107.42 °C). The limitations of global peak-power-based approaches become apparent when we consider the results for TAM = 32 in Table 3. For most of the temperature variations, the peak power value remained constant at 1598. When such a power constraint is applied, the temperatures of the generated schedule can vary within the range of 77.15 °C to 89.58 °C, and our algorithm makes sure that the thermal constraint is indeed satisfied. For p22810 in Table 4, a maximum temperature reduction of 33.82% can be had for a 20.38% increase in TAT (TAM = 24, Temp_max = 111.37 °C). At TAM = 32, the algorithm was able to decrease the temperature from 155.5 °C to a manageable 109.36 °C with just a 9.33% sacrifice in TAT. Similar results were obtained for p93791 in Table 5, where there is a maximum temperature reduction of 35.61% with a 35.64% increase in TAT at TAM = 64. Note that at TAM = 24, there was no further gain in temperature relative to the temperature at Temp_max = ∞, which is mainly a result of the lack of flexibility due to the limited usable TAM width. Overall, the algorithm works well for designs with many cores and exploits the availability of wider TAMs.
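The relative gains can be computed as in the sketch below, which assumes that dT and dTAT are percentages taken with respect to the unconstrained base schedule. The function names are hypothetical, and the TAT values in the second call are placeholders, not figures from the tables.

```python
def temperature_gain(base_temp, new_temp):
    """Temperature reduction as a percentage of the base temperature."""
    return 100.0 * (base_temp - new_temp) / base_temp

def tat_increase(base_tat, new_tat):
    """Increase in test application time as a percentage of the base TAT."""
    return 100.0 * (new_tat - base_tat) / base_tat

# p22810 at TAM = 32: 155.5 degC reduced to 109.36 degC.
gain_p22810 = temperature_gain(155.5, 109.36)
print(f"dT = {gain_p22810:.2f}%")

# Placeholder TAT values, only to show the call.
print(f"dTAT = {tat_increase(1000.0, 1093.3):.2f}%")
```

Under this assumed definition, the drop from 155.5 °C to 109.36 °C corresponds to a temperature gain of roughly 29.7%.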
Conclusion
In this paper, we have presented a TAM/Wrapper co-optimization framework for system-on-chips that ensures thermal safety while still optimizing the test schedule. The proposed method allows us to explore, beyond the limits of peak-power-based test scheduling, possible variations of a schedule which can lead to further reductions in temperature while limiting increases in test application time. Overall, using cycle-accurate power profiles per wrapper configuration and considering both the spatial and temporal dimensions of heat transfer allows us to more closely approximate real-world thermal phenomena.

