Abstract: Large temperature gradients exacerbate various types of defects including early-life failures and delay faults. Efficient detection of these defects requires that burn-in and test for delay faults, respectively, are performed when temperature gradients with proper magnitudes are enforced on an Integrated Circuit (IC). This issue is much more important for 3-D stacked ICs (3-D SICs) compared with 2-D ICs because of the larger temperature gradients in 3-D SICs. In this paper, two methods to efficiently enforce the specified temperature gradients on the IC, for burn-in and delay-fault test, are proposed. The specified temperature gradients are enforced by applying high-power stimuli to the cores of the IC under test through the test access mechanism. Therefore, no external heating mechanism is required. The tests, high-power stimuli, and cooling intervals are scheduled together based on temperature simulations so that the desired temperature gradients are rapidly enforced. The schedule generation is guided by functions derived from a set of thermal equations. The experimental results demonstrate the efficiency of the proposed methods.
Temperature-Gradient-Based Burn-In and Test Scheduling for 3-D Stacked ICs
I. INTRODUCTION
Large temperature gradients (e.g., the temperature difference between two adjacent cores) exacerbate various types of defects including early-life failures and delay faults. The capability to detect these temperature-gradient-induced defects is crucial for many ICs. In particular, 3-D ICs exhibit considerably larger temperature gradients compared with normal ICs (for example, three times larger is reported in [22]) and, therefore, temperature-gradient-based test is necessary for them.
A promising technology for fabricating 3-D ICs is based on through-silicon vias (TSVs) used for interdie connections [8] , [12] . The ICs fabricated using TSVs are commonly referred to as 3-D stacked IC (3-D SIC) [12] .
A. Test for Early-Life Failures
Burn-in is a common way of accelerating and detecting early-life failures and it should be done with low cost in a reasonably short time. For this purpose, usually the dies are operated at elevated temperature and voltage. The elevated temperature and voltage speed up the aging and wear mechanisms so that the dies experience their early life before testing.
The wear mechanisms that are speeded up include metal stress voiding and electromigration, metal slivers bridging shorts, as well as gate-oxide wear out and breakdown [17] .
Recently, several studies have, however, shown that some wear mechanisms are speeded up more efficiently by large temperature gradients rather than the high temperature itself. A temperature-gradient-induced wear mechanism is identified in [19] , which shows that a metal layer elevation develops rapidly on the sites that experience large temperature gradients. Moreover, in the atomic flux equation that models the electromigration, temperature gradient is present directly and also indirectly through its effect on the mechanical-stress gradient [15] . Therefore, a burn-in process that has not created the appropriate thermal scenarios does not sufficiently speed up the formation of the defects and, consequently, such early-life defects will go undetected. To prevent these test escapes, it is necessary to introduce a burn-in process that enforces appropriate temperature scenarios on the IC. This necessity is more urgent for the ICs that suffer from large temperature gradients, such as 3-D SIC.
3-D SIC technology, similar to other deep submicrometer technologies, suffers from high power densities. In addition, power densities are considerably higher in the test mode compared to the functional mode, in particular for core-based designs [25]. Consequently, overheating may damage the ICs under test [2], [24]. This means that the application of test stimuli to ICs can raise their temperatures beyond their tolerable limits. This often undesirable effect is, however, utilized in this paper to heat up the IC for burn-in. In our case, the stimuli are not necessarily actual test patterns. Instead, they could be specially generated sequences that cause large switching activities. Such stimuli are called heating sequences. Using heating sequences to heat up the IC from inside means that special equipment for heating the IC from outside is not necessary. This leads to a large reduction in cost and also allows the needed temperature gradients to be generated.
For 2-D ICs, there are usually two possible stages for burn-in: 1) wafer-level burn-in that is performed before packaging and 2) die-level burn-in performed after packaging [17]. For 3-D SIC, there are more stages, including prebond, midbond, postbond, and final stages [20]. At different stages, different defects can be targeted, based on their likelihood and considering the corresponding burn-in costs. A 3-D SIC, in addition to the defects that also exist for 2-D, is affected by TSV-related defects (e.g., defects related to TSV bonding). Such defects motivate, among other reasons, the use of these extra stages.
The existence of test stages before the IC is fully assembled is a key difference between the 2-D and 3-D SIC burn-in processes. In the case of 3-D SIC, using input ports in the functional mode may benefit burn-in for the postbond and the final stages, similar to 2-D ICs. But for the prebond or midbond stages, the inputs to the die or partially stacked dies are not necessarily the inputs to the IC. The input ports to the unit under test for 3-D SICs, before the final bonding, are likely to include a number of TSVs. The TSVs and test equipment are not designed to support simultaneous application of functional signals, particularly to a large number of TSVs (even though they might be designed to allow simple electrical tests for the TSV itself). Therefore, the use of the IC ports for enforcing the temperature gradients is not possible for the prebond and midbond stages. Despite this lack of access in the functional mode, the test access mechanism (TAM) provides access to the cores in the test mode [1]. Therefore, the heating sequences could be applied using the TAM to enforce the desired gradients.
B. Test for Delay Faults
Three-dimensional stacked ICs and other deep submicrometer technologies suffer from a considerably larger number of delay faults as compared with previous technologies. Therefore, delay-fault testing is necessary to provide sufficient fault coverage [9] . A large number of prebond TSV defects are resistive in nature and, moreover, the mechanical stress caused by TSVs contributes also to delay faults [8] . Therefore, the expected number of delay faults for 3-D SIC is larger than that for 2-D ICs.
Since temperature has a significant effect on delay, its impact should be considered for delay-fault test. A very important effect of temperature on signal integrity is its effect on the clock network [6] . Delay faults usually occur because of increased clock skew and a major contributor to skew in 3-D SICs is temperature gradient [14] . Since propagation delays depend on temperature, different temperatures on different sites (i.e., temperature gradients) result in clock skew. Temperature gradients may reach up to 50°C in adjacent cores for normal operation and even higher during test [6] , [7] , [14] . Besides, as mentioned before, the temperature gradients in 3-D SICs are much larger than in 2-D ICs [22] . This will exacerbate temperature-gradient-related issues including delay faults, in particular, for 3-D SIC. Therefore, the associated tests should be performed when the proper temperature maps are enforced. A temperature map specifies the appropriate temperatures for different sites (e.g., cores) in the IC. These temperatures are to be realized simultaneously to enforce the proper temperature gradients. The temperature maps are given along with their corresponding tests. Beside the gradient-based burn-in (discussed in Section I-A), the other objective of this paper is to introduce a technique to apply the tests while the corresponding maps are enforced on the IC.
II. RELATED WORKS
Traditionally, burn-in is performed at elevated temperature, which is achieved by special equipment (e.g., temperature chambers) [17]. A more elaborate technique is a hybrid burn-in and test technique that provides tighter temperature control using an active heat sink, without using temperature chambers [13]. These existing techniques are not able to enforce the specified temperature gradients, especially those with large magnitudes.
Despite the fact that enforcing specific temperature gradients will facilitate the detection of delay-faults, the existing methods for delay-fault testing focus only on performing the tests disregarding the temperature gradients [5] . Apart from gradients, creating and maintaining some other kinds of thermal conditions during the test has been addressed previously. These existing techniques are briefly reviewed in the following.
Two different approaches for multicore ICs are introduced in [10] and [23] to guarantee that the core temperatures are kept within the specified range when the corresponding tests are being applied. They focus on the temperature of the individual cores that are under test and the temperatures of other cores are neglected.
Speeding up the test while minimizing the damages caused by overheating due to process variation is addressed in [2] . The test temperatures are kept sufficiently low by introducing cooling cycles. A number of test schedules are generated for different variation scenarios. During the test, the proper schedule is selected based on the on-chip temperature sensor readouts (adapting to the current thermal variation situation). A fast temperature simulation technique that isolates heavy computations into a single initial phase is also suggested in [2] .
The existing methods for controlling the chip temperatures during test try to respect a global upper temperature limit to prevent overheating [24] or to respect upper and lower bounds for individual cores to target temperature dependent defects [10] , [23] . In both cases, the temperature bounds are defined for each core independent from other cores and therefore, spatial temperature gradients cannot be enforced.
The first proposals to consider temperature gradients in the burn-in and delay-fault test processes were made in [3] and [4], respectively, where the concept of temperature maps was introduced. Preliminary sketches of two simple algorithms to consider such temperature maps for burn-in and delay-fault test, respectively, were reported in [3] and [4]. This paper builds on the preliminary results of [3] and [4] and develops an integrated and systematic framework to address temperature-gradient issues in both burn-in and test processes. It also presents several advanced techniques to compute proper values for the test schedule period, to order the temperature maps, and to generate high-quality temperature-gradient-based burn-in solutions, which were outside the scope of [3] and [4].
III. TEMPERATURE-GRADIENT-BASED BURN-IN

A. Preliminaries and Problem Formulation
As discussed earlier, a temperature map specifies the desired temperature values for different sites (e.g., cores) in an IC. The temperature maps are to be given by the user who studies the typical temperature-gradient-induced failure mechanisms analytically or experimentally [15], [19]. Each map corresponds to a particular temperature condition of an IC, such as large temperature differences between adjacent cores (i.e., large temperature gradients), that can accelerate aging for early-life failures or enlarge the delay-fault effects so that they can easily be tested for. There might also be some sites whose temperatures are not important regarding the targeted defects. Such sites are indicated as don't-cares.
When the expected locations in the IC simultaneously have the temperature values that are specified by a map, it is said that the temperature map is enforced. The specified temperature maps should be enforced quickly. In the case of burn-in, the temperatures should then be enforced for a given period of time to achieve the intended effect. In the case of test, a map should be enforced as long as the corresponding tests are being applied. Usually, there are many temperature maps; therefore, it is important to enforce them rapidly whether the ICs start from the ambient temperature or from another map. The order of the maps has a considerable impact on the overall burn-in/test time, as will be discussed in Section V. For the time being, we assume that the map order is given and focus on other aspects of the problem. In our work, a temperature map will be enforced by applying heating sequences sent through the TAM. Moreover, it is assumed that no test is applied when an IC is kept under a temperature map for burn-in. This will be relaxed in Section IV so that the tests can be applied when a map is enforced.
Assume that there are M modules in an IC (on one or multiple dies) and their tests can be started and stopped independently (e.g., the modules are cores with core wrappers in a core-based design). To enforce the specified temperature maps, heating sequences are used to heat up some of the modules. The average power of the heating sequence is given by a real number, denoted by $p^{HS}_m$ for module m (0 ≤ m < M). It is assumed that the TAM only affords W (a positive integer) modules to be tested simultaneously.
A heating sequence could be a collection of high power tests, being applied one after the other. A heating sequence generates heat in the same way that a test does and occupies TAM in the same way as a test. In general, heat generation happens with every shift in the scan-chain, since we do not assume scan chain masking during the shift mode.
A 3-D SIC is usually laid out so that the main blocks (e.g., logic and memory) are placed in a certain distance relative to TSVs to avoid undesirable effects induced by TSVs such as high mechanical stress. Such forbidden areas are called keep-out zones (KOZs) [8] . A collection of the TSVs placed next to each other (perhaps to overlap the KOZ of different TSVs and save area on the die), is called a TSV block. A TSV block may consist of only one TSV if the TSVs are placed far apart.
In this section, it is assumed that a module is a single active thermal node. Furthermore, it is assumed that TSV blocks are always thermally don't-care. They do not generate heat (are passive thermal nodes) since TSV drivers are not considered as parts of the TSV blocks.
Assume that the desired temperature map is specified by a low temperature limit and a high temperature limit for each module and the don't-care modules are declared separately. For example, a temperature map specifies that module m has a low temperature limit equal to $\theta^{L}_m$ and a high temperature limit equal to $\theta^{H}_m$. The inputs to the proposed method include temperature maps, the IC's temperature model, the IC's electrical model (e.g., specification of the TAM and power-related specifications), switching activities of the heating sequences, ambient temperature ($\theta_{ambient}$), and overheating limit ($\theta_{overheating}$). The output is a schedule that guides the application of the heating sequences to the modules so that their temperatures move into the specified ranges and stay there.
As an example, consider an IC with 3 modules, m 0 , m 1 , and m 2 . Assume that a temperature map is specified as
, and $\theta^{L}_2$ = 55°C, and no module is specified as don't-care. These temperature limits are shown in Fig. 1(a) with dashed/dotted lines. A temperature simulation is performed for this IC based on a proper periodic schedule and the simulated temperatures are shown in Fig. 1(a). Starting from the ambient temperature ($\theta_{ambient}$ = 30°C), the module temperatures steadily rise until they are inside the specified ranges. As shown in this example, applying heating sequences can drive the IC into a high-temperature situation. For example, the temperature of module $m_0$ has reached 120°C at around $4 \times 10^4$ time units (TUs). A TU consists of $4 \times 10^3$ test cycles in this example.
The temperatures around the $6 \times 10^4$ TU point are magnified in Fig. 1(b). The time interval shown in Fig. 1(b) corresponds to three periods of the schedule. Since the schedule is periodic, one period captures the entire schedule, which is repeated in a cyclic manner. Fig. 1(c) further magnifies one period of the schedule that starts at $t_0$ and ends at $t_3$. The period length is denoted by τ ($\tau = t_3 - t_0$). One period is divided into three intervals, specified by numbers 0, 1, and 2 in Fig. 1(b) and Fig. 1(c). The schedule specifies that the heating sequence for module $m_0$ is applied only in the $[t_0, t_1]$ interval, the $[\tau + t_0, \tau + t_1]$ interval, and in general, in $[k\tau + t_0, k\tau + t_1]$ for $k = 0, 1, 2, \ldots$, assuming that the process starts at time $t_0$. The application of the heating sequences for module $m_1$ and module $m_2$ is specified in a similar manner by the schedule. For the $[t_0, t_3]$ period, the time intervals at which the heating sequences are applied are shown by gray areas in Fig. 1(c).
B. Steady-State Solution
Let us first analyze a simplified situation, where we assume that a steady-state power could be provided for the modules. In this case, there is a steady-state solution that could generate and maintain the specified temperature map. Providing continuous steady-state powers simultaneously for all modules is, however, very likely to be impossible, mainly due to TAM limitations. One solution is to use the maximal practical power for each module in combination with a pulse-width modulation (PWM) technique. Therefore, the best that can be achieved is a discrete stimulus sequence that has a constant long-term average power with small ripples.
In this way, the modules have time-divided multiple access to the TAM. To reduce the risk of out-of-range temperatures due to ripples in the input power, the desired steady-state temperatures are defined at the middle of the specified ranges
$$\theta^{SS}_m = \frac{\theta^{L}_m + \theta^{H}_m}{2}.$$
Such ripples can be observed in the temperature curves given in Fig. 1. To find the power values that result in the specified temperatures, the IC's temperature model should be analyzed. A widely used temperature model is the lumped-element temperature model, as used in HotSpot [11]. Such a model divides an IC into elements represented by nodes. Each node has a heat capacitance modeling its thermal capacity. Adjacent nodes are connected through a heat resistance that models the thermal conductivity between them. They are connected together in a network configuration, similar to an electric circuit. The temperatures correspond to voltages and the heat dissipation corresponds to a current source. A node is called active if it directly receives electrical power caused by switching activities. Detailed information on such models can be found in [7] and [11].
All the characteristics of a temperature model are captured in two matrices A and B. The thermal behavior of an IC is captured in the following system of ordinary differential equations [2], [11]:
$$\frac{d\Theta}{dt} = A\,\Theta + B\,P \tag{1}$$
In this equation, Θ is the temperature vector and P is the power vector. Heat transfer among nodes is included in the temperature model and it means that a node can be heated up by its neighboring nodes even if it has no switching activities.
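The lumped-element model described above can be illustrated with a small sketch. It assumes the usual RC-network construction (temperatures measured relative to ambient, A built from conductances and capacitances, B the inverse capacitance matrix) and a plain forward-Euler integrator; the function names and the two-node numbers are hypothetical and not taken from the paper or from HotSpot.

```python
def build_model(cap, g_amb, g_link):
    """Return (A, B) such that dTheta/dt = A*Theta + B*P.

    Assumption: Theta is the temperature rise over ambient, cap[i] is the heat
    capacitance of node i, g_amb[i] its conductance to ambient, and
    g_link[(i, j)] the conductance between adjacent nodes i and j.
    """
    n = len(cap)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        G[i][i] += g_amb[i]
    for (i, j), g in g_link.items():
        G[i][i] += g
        G[j][j] += g
        G[i][j] -= g
        G[j][i] -= g
    A = [[-G[i][j] / cap[i] for j in range(n)] for i in range(n)]
    B = [[(1.0 / cap[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]
    return A, B


def derivative(A, B, theta, power):
    """Right-hand side of the thermal equation for one time instant."""
    n = len(theta)
    return [sum(A[i][j] * theta[j] + B[i][j] * power[j] for j in range(n))
            for i in range(n)]


def simulate(A, B, theta0, power, dt, steps):
    """Forward-Euler integration with a constant power vector."""
    theta = list(theta0)
    for _ in range(steps):
        d = derivative(A, B, theta, power)
        theta = [theta[i] + dt * d[i] for i in range(len(theta))]
    return theta


# Two nodes: node 0 is heated, node 1 has no switching activity.
A, B = build_model(cap=[1.0, 1.0], g_amb=[1.0, 1.0], g_link={(0, 1): 1.0})
final = simulate(A, B, theta0=[0.0, 0.0], power=[3.0, 0.0], dt=0.01, steps=5000)
```

In this toy setup node 1 settles above ambient although it receives no power, which is the neighbor-heating effect mentioned in the text.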
The specified temperature map consists, in fact, of the steady-state temperatures at which the IC should be kept. A temperature map can be thought of as the targeted steady-state temperatures, $\Theta^{SS}$, which are composed of the desired steady-state temperatures for each module (e.g., $\theta^{SS}_m$ for module m). Since $\Theta^{SS}$ is, in this case, considered constant (for a certain amount of time), its derivative is zero (no variation in time). Therefore, with $P^{SS}$ denoting the vector of steady-state powers, (1) could be written as
$$0 = A\,\Theta^{SS} + B\,P^{SS} \tag{2}$$
This means that it is possible to calculate the required powers that lead to the specified temperature map. In order for the specified temperature map to be achievable, the computed steady-state power values must satisfy a feasibility and a schedulability condition. The first part of the feasibility condition is that the computed steady-state power for module m ($p^{SS}_m$) should be larger than or equal to the stray power dissipated by the module. The stray power is the sum of the leakage power and the clock network power. It is denoted by $p_m$ (for module m). The second part of the feasibility condition is that $p^{SS}_m$ should be less than or equal to the average power of the corresponding heating sequence, $p^{HS}_m$, plus $p_m$. The feasibility condition is, therefore, as follows:
$$p_m \le p^{SS}_m \le p^{HS}_m + p_m \tag{3}$$
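As a rough illustration of how the steady-state powers and the feasibility condition could be evaluated, the sketch below solves the steady-state form of (1) for the power vector and then checks the two-sided bound described above. The Gaussian-elimination helper, the function names, and the two-node matrices are illustrative assumptions, not the authors' implementation.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def steady_state_power(A, B, theta_ss):
    """Powers that hold theta_ss constant: solve B * P = -A * theta_ss."""
    n = len(theta_ss)
    rhs = [-sum(A[i][j] * theta_ss[j] for j in range(n)) for i in range(n)]
    return solve(B, rhs)


def is_feasible(p_ss, p_stray, p_hs, eps=1e-9):
    """Check stray <= p_ss <= heating-sequence power + stray for every module."""
    return all(ps - eps <= pss <= ph + ps + eps
               for pss, ps, ph in zip(p_ss, p_stray, p_hs))


# Hypothetical two-node model (temperatures relative to ambient).
A = [[-2.0, 1.0], [1.0, -2.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
p_ss = steady_state_power(A, B, theta_ss=[2.0, 1.0])
```

With these numbers the second node needs no power of its own, so the map is feasible only if that node's stray power is zero; `is_feasible` makes such cases explicit.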
Usually, the feasibility condition is easily met if the specified temperature map is realistic (i.e., the specified temperature is not lower than the ambient and not larger than the achievable temperature). Assuming that (3) is satisfied, the schedulability condition which is related to the limited TAM bandwidth should be verified. The challenging problem here is to create the required average power values, P SS , using the available TAM bandwidth. This is done by selectively applying the heating sequences to the modules.
The continuous application of the heating sequence generates an average dynamic power equal to $p^{HS}_m$. The desired power values, $p^{SS}_m$, which are smaller than $p^{HS}_m + p_m$, are created by applying the heating sequence, with power $p^{HS}_m$, for a fraction of a time period. The average power in a period should be made equal to the required steady-state power. As mentioned before, this is done using a technique similar to PWM. The ratio of the duration of heating sequence application to the overall time period is therefore called the duty cycle ($D_m$) and its value is calculated using the following:
$$D_m = \frac{p^{SS}_m - p_m}{p^{HS}_m} \tag{4}$$
The duty cycles might not be achievable if their values are relatively large and if the TAM does not provide sufficient bandwidth. For example, assume a design with two modules with the duty cycles D 0 = 0.6 and D 1 = 0.8. This means that in a period of time equal to 1, we need access to module 0 for 60% of the time and access to module 1 for 80% of the time. Therefore, simultaneous access to more than one module (0.6 + 0.8 = 1.4 modules) is needed. This means that the TAM must provide simultaneous access to these two modules; otherwise, these duty cycles are not schedulable and the specified temperature map cannot be enforced.
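The two-module example can be checked numerically. The sketch below assumes that the duty cycle is the steady-state power minus the stray power, divided by the heating-sequence power (so that the period average equals the steady-state power), and that a set of duty cycles fits the TAM when each lies in [0, 1] and their sum does not exceed W. The function names and power values are hypothetical.

```python
def duty_cycles(p_ss, p_stray, p_hs):
    """Fraction of each period during which a module's heating sequence runs."""
    return [(pss - ps) / ph for pss, ps, ph in zip(p_ss, p_stray, p_hs)]


def is_schedulable(duties, tam_width, eps=1e-9):
    """Every duty cycle in [0, 1] and total demand within the TAM width W."""
    in_range = all(-eps <= d <= 1.0 + eps for d in duties)
    return in_range and sum(duties) <= tam_width + eps


# Hypothetical powers chosen to reproduce D0 = 0.6 and D1 = 0.8.
D = duty_cycles(p_ss=[3.5, 4.5], p_stray=[0.5, 0.5], p_hs=[5.0, 5.0])
fits_one_lane = is_schedulable(D, tam_width=1)   # 0.6 + 0.8 = 1.4 > 1
fits_two_lanes = is_schedulable(D, tam_width=2)  # 1.4 <= 2
```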
Note that $D_m$ can be divided into pieces; for example, $D_1 = 0.8$ could be implemented by first applying the heating sequence for a duration equal to $D_{1,0} = 0.3$ at the middle of the period and later on for a duration of $D_{1,1} = 0.5$ at the end of the same period. The feasibility and schedulability conditions could be written together using the duty cycle concept as follows:
$$0 \le D_m \le 1, \qquad \sum_{m=0}^{M-1} D_m \le W \tag{5}$$
The first inequality in (5) is identical to the feasibility condition in (3), rewritten in terms of the duty cycles. The second inequality in (5) is the schedulability condition, where W is the number of modules that can access the TAM simultaneously. Given a temperature map that satisfies both feasibility and schedulability conditions, it is relatively simple to develop a schedule to deliver the required duty cycles. One such scheduling algorithm is presented in [4]. It is demonstrated in [4] that for the schedules generated by the proposed method, at every moment in time, W modules or fewer are receiving their heating sequences, which means that the TAM limitation is not exceeded. Furthermore, it is shown that the average of the applied heating sequences for each module is equal to the specified steady-state power for it. For example, in Fig. 1(c), the average power of each of the modules $m_0$, $m_1$, and $m_2$ over one period equals its duty cycle times its heating-sequence power plus $p_0$, $p_1$, and $p_2$, respectively. This is indicated by the width of the gray areas as compared with the schedule period τ ($\tau = t_3 - t_0$).
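The algorithm of [4] is not reproduced in this section. As a stand-in, the sketch below uses the classic wrap-around rule (McNaughton's rule for preemptive scheduling on identical machines) to show that duty cycles satisfying both conditions can be laid out on W TAM lanes within one period. It is an illustrative substitute, not the authors' method; all names are hypothetical.

```python
def wraparound_schedule(duties, tam_width, period=1.0, eps=1e-12):
    """Return {module: [(start, end), ...]} for one period.

    Durations are packed lane by lane; a module that spills over the end of a
    lane continues at the start of the next lane. Because each duty cycle is
    at most 1, the two pieces of a split module never overlap in time.
    """
    if any(d < -eps or d > 1.0 + eps for d in duties) or sum(duties) > tam_width + eps:
        raise ValueError("duty cycles are not schedulable")
    schedule = {m: [] for m in range(len(duties))}
    cursor = 0.0  # position inside the current lane
    for m, d in enumerate(duties):
        remaining = d * period
        while remaining > eps:
            if period - cursor <= eps:  # lane is full, wrap to the next lane
                cursor = 0.0
            piece = min(remaining, period - cursor)
            schedule[m].append((cursor, cursor + piece))
            cursor += piece
            remaining -= piece
    return schedule


def max_concurrency(schedule):
    """Largest number of modules heated at the same instant."""
    events = []
    for pieces in schedule.values():
        for start, end in pieces:
            events.append((start, 1))
            events.append((end, -1))
    events.sort(key=lambda e: (e[0], e[1]))  # process ends before starts at ties
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


example = wraparound_schedule([0.6, 0.8, 0.5], tam_width=2)
```

In the example, the second module is split into a piece at the end of the period and a piece at its start, mirroring the remark that a duty cycle can be divided into pieces.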
As mentioned before, a temperature map may leave the temperatures for some nodes unspecified (don't-care nodes). Besides, the temperatures for inactive thermal nodes (e.g., TSV blocks) are also left unspecified. On the other hand, to compute the steady-state powers [using (2)], these temperatures should also be known. The proper choice of the temperatures for the don't-care nodes may determine whether the temperature map can be achieved or not. The problem of finding proper temperature values for the don't-care nodes can be formulated as a linear programming (LP) problem.
In the LP formulation shown in Fig. 2 , the duty cycles are decision variables. The main objective is to find a feasible solution. The temperatures, θ m , should be equal to the temperatures that are specified by the temperature map, θ SS m . If not specified by the temperature map (don't-care modules), the temperatures should be between the ambient temperature and the overheating temperature. The relation between the power values, p SS m , and the duty cycles is defined by (4). The temperatures, θ m , are computed based on power values, p SS m , using (2) (by replacing P SS and Θ SS with vectors composed of p SS m and θ m , respectively). For an inactive module, the power value should be equal to the stray power p m and therefore, the duty cycles should be zero. For an active node, the duty cycles are between zero and one, as defined in (5) . The duty cycles should satisfy the schedulability condition [i.e., the second inequation in (5)]. Assuming that the LP solver has found a feasible solution and has calculated the duty cycles successfully, a proper period for the PWM-like method has to be computed.
The duty cycles and the scheduling approach, discussed so far, are independent of the schedule period τ. They generate the module temperatures such that their average equals the specified steady-state temperatures. The period, τ, should be short enough so that the fluctuations in the temperatures do not violate the specified limits (θ L m and θ H m ). On the other hand, a longer period is desirable to minimize the switching actions in the schedule. An example for the results obtained by the proposed algorithm could be observed in Fig. 1(a) . After the temperatures have completed their transitions to their new values (after 4 × 10 4 TU), the proper choice of the period keeps them inside the specified ranges, with a relatively low number of switching actions in the schedule.
To find a relatively long period, τ, that nevertheless keeps the temperature fluctuations inside the specified ranges, two different situations should be considered: 1) (H) the heating sequence is applied and 2) (L) no stimuli are applied. To estimate the proper period for situation H, (1) is rewritten around the steady-state temperature for the heating sequence power, as shown in (6a). For situation L, (6b) is used instead
An example for (6a) is the tangent line that touches the temperature curve in Fig. 3 at point A (around the steady-state temperature). A similar example for (6b) is the tangent line CD in Fig. 3 . Equation (6a) is then used to estimate the desired value for the period focusing only on the high-temperature limit. 
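Now, $T^{H}_m$ is computed for module m as
$$T^{H}_m = \frac{\theta^{H}_m - \theta^{SS}_m}{\left(\frac{d\Theta}{dt}\right)^{H}_m} \tag{8}$$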
The values for $(d\Theta/dt)^{H}_m$ are obtained from the right side of (6a) and, consequently, the values for $T^{H}_m$ are computed using (8). For example, in Fig. 3, when the module is receiving power, the derivative, represented by a straight line, is tangential to the temperature curve at its intersection with the steady-state temperature at point A and later on intersects with the high-temperature limit at point B. The period, $T^{H}_m$, is then calculated based on the time difference between A and B. The other part of the line, which lies between A and the low-temperature limit, is deliberately left out to achieve a shorter period that is safe in most situations (e.g., variation in the input power).
In a similar manner, the values for situation L, $T^{L}_m$, are calculated based on (6b) by focusing only on the low-temperature limit. Since the temperatures should not violate any of the specified limits, the shortest
$$T_m = \min\left(T^{H}_m,\, T^{L}_m\right)$$
is selected as the acceptable period for module m. The actual period, τ, should be the smallest among the acceptable periods for all modules ($\tau = \min_m \{T_m\}$) so that none of the temperature limits for the modules is violated.
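The period selection can be sketched as follows. The heating and cooling slopes are taken as given inputs (in the paper they come from (6a) and (6b)); the symmetric treatment of the low limit and all names and numbers are assumptions made for illustration.

```python
import math


def safe_period(theta_low, theta_high, slope_heating, slope_cooling):
    """Smallest per-module acceptable period, based on tangent-line estimates.

    slope_heating[m] > 0 is the temperature derivative of module m at its
    steady-state temperature while heated; slope_cooling[m] < 0 is the
    derivative when no stimuli are applied (both assumed to be precomputed).
    """
    periods = []
    for lo, hi, up, down in zip(theta_low, theta_high, slope_heating, slope_cooling):
        theta_ss = 0.5 * (lo + hi)  # steady-state target at mid-range
        t_h = (hi - theta_ss) / up if up > 0 else math.inf      # time to reach the high limit
        t_l = (lo - theta_ss) / down if down < 0 else math.inf  # time to reach the low limit
        periods.append(min(t_h, t_l))
    return min(periods)


# Hypothetical limits (degrees C) and slopes (degrees C per time unit).
tau = safe_period(theta_low=[100.0, 50.0], theta_high=[120.0, 60.0],
                  slope_heating=[2.0, 1.0], slope_cooling=[-4.0, -0.5])
```

Here the first module cools quickly, so its low limit dictates the period for the whole schedule.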
For the steady-state solution, the average powers that are applied when a new map is going to be enforced are the steady-state powers that will also maintain that map. This implies that the transition to the new map is very slow since even during the transition, these steady-state powers are applied. The steady-state solution results in an excessively long transition time and therefore, a faster solution is necessary.
C. Transient Solution
In this section, a solution to reduce the overall transition time is introduced. We start by looking into the analytic solution for (1) for a duration of time equal to t, as shown below [2]
$$\Theta_t = \alpha(t)\,\Theta_0 + \beta(t)\,P^{B} \tag{9}$$
In the above equation, the initial temperatures are expressed by $\Theta_0$ and the temperatures at time t are denoted by $\Theta_t$. $P^{B}$ is the power vector that is assumed to be constant for the time interval t. An intuitive explanation of (9) is that α(t) determines how fast the initial temperatures fade away and β(t) determines how fast the input power affects the temperatures. α(t) and β(t) are matrices that are computed based on A and B (with I denoting the identity matrix), for a duration of time equal to t as follows [2]:
$$\alpha(t) = e^{At}, \qquad \beta(t) = A^{-1}\left(e^{At} - I\right)B \tag{10}$$
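For a single thermal node the matrices reduce to scalars, which makes the roles of α and β easy to see. The sketch below uses the closed-form solution of the linear thermal equation with constant power for that one-node case; the scalar coefficients `a` and `b` and the numbers are hypothetical, and a real model would need a matrix exponential instead.

```python
import math


def alpha(a, t):
    """Decay factor applied to the initial temperature (scalar case, a < 0)."""
    return math.exp(a * t)


def beta(a, b, t):
    """Gain from a constant input power to the temperature after time t."""
    return (math.exp(a * t) - 1.0) / a * b


def temperature_after(a, b, theta0, power, t):
    """Temperature (relative to ambient) after holding `power` for time t."""
    return alpha(a, t) * theta0 + beta(a, b, t) * power


# One node with a = -0.5 and b = 1.0: the steady-state rise is -b*power/a.
theta_2 = temperature_after(a=-0.5, b=1.0, theta0=0.0, power=10.0, t=2.0)
theta_long = temperature_after(a=-0.5, b=1.0, theta0=0.0, power=10.0, t=100.0)
```

As t grows, the α term vanishes and the β term approaches the steady-state gain, which matches the intuitive explanation given in the text.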
In the rest of this paper, α(t) and β(t) are represented as α and β, respectively. As mentioned before, achieving a new temperature map in a short time is crucial and, therefore, this transition should happen as fast as possible. Once the IC's temperatures have converged to the specified temperature map, they can be maintained using the steady-state powers, P SS , as presented in Section III-B.
We would like to extend the steady-state solution approach to (9) , which includes the transient response, to find the schedulable power values that result into the shortest transition time. The new problem can be formulated as: find the shortest transition time, t, and the corresponding power values, P B , such that the specified map is achievable. The transition time from map μ i to map μ j is defined as the time required to construct the temperatures specified by map μ j starting from the temperatures specified by map μ i .
This problem can be solved using an iterative search that tries different alternatives for t. The algorithm uses the latest information regarding the interval that contains the optimal transition time (denoted by [λ, σ]). At any step, it is known from the previous steps that the specified map is not achievable for transition times shorter than λ. It is also known that since the temperature map is achievable for a transition time equal to σ, longer transition times are not optimal. Initially, λ is set to zero and σ to the transition time for the steady-state approach. This steady-state transition time is obtained by simulating the temperatures when the steady-state schedule is used (similar to Fig. 1). A number of candidate transition times with uniform distances are selected between λ and σ as follows:
The r th candidate transition time is denoted by t r . R is the number of parallel LP solvers and its value is selected based on the degree of parallelism offered by the platform that runs the algorithm. For example, for a machine that supports eight threads, eight is a reasonable choice for R. For each candidate t r , solving the LP formulation determines whether the temperature map is achievable or not. The value of σ is updated to the smallest t r that leads to schedulable power values. The value of λ is updated to the largest t r that leads to power values that are not schedulable. Note that if for all the candidate transition times, denoted by t r (r = 0, 1, . . . , R − 1) the map is achievable, then λ remains unchanged. On the other extreme, if none of the t r s are schedulable then σ remains unchanged. The algorithm stops when the smallest transition time is found with acceptably low error. The error is bounded to (σ − λ) and therefore, if this difference is smaller than a certain limit, then the actual error, too, will be smaller than that limit.
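The interval-narrowing search can be sketched as below. The LP feasibility check is replaced by a hypothetical callback `is_achievable(t)`, the R candidates are assumed to be the uniformly spaced interior points of [λ, σ], and achievability is assumed to be monotone in t (achievable for every time above the optimum). In the paper the R checks run on parallel LP solvers; here they run sequentially.

```python
def shortest_transition_time(is_achievable, sigma, R=8, tol=1e-3):
    """Narrow [lam, sigma] until the achievable bound sigma is within tol.

    `sigma` must be a transition time known to be achievable (for instance,
    the steady-state transition time); `lam` starts at zero.
    """
    lam = 0.0
    while sigma - lam > tol:
        step = (sigma - lam) / (R + 1)
        candidates = [lam + (r + 1) * step for r in range(R)]
        results = [is_achievable(t) for t in candidates]  # one LP per candidate
        achievable = [t for t, ok in zip(candidates, results) if ok]
        failed = [t for t, ok in zip(candidates, results) if not ok]
        if achievable:
            sigma = min(achievable)  # smallest candidate that works
        if failed:
            lam = max(failed)        # largest candidate that does not work
    return sigma


# Toy oracle: pretend the map becomes achievable from t = 3.7 onward.
best = shortest_transition_time(lambda t: t >= 3.7, sigma=10.0, R=8, tol=1e-3)
```

Each round shrinks the interval by a factor of R + 1, so the number of rounds grows only logarithmically with the required precision.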
The problem formulation for the LP solver that is used here is similar to the LP formulation in Section III-B and Fig. 2 , with the following differences: 1) instead of θ m s, the temperatures at the end of the transition time, θ t m s, are used and 2) instead of (2), (9) is used to calculate the temperatures based on the power values. The relation between the power values and the duty cycles is defined by (4) similar to Fig. 2 . If the LP solver finds a feasible solution, the temperature map is achievable. This information is then used to update the λ and σ values.
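The interval-narrowing search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the callback `is_achievable` is a hypothetical stand-in for one LP feasibility check, the candidates are assumed to be R uniformly spaced interior points of [λ, σ], and the checks run sequentially here rather than in parallel. The sketch also assumes that achievability is monotone in t (every transition time longer than an achievable one is also achievable).

```python
from typing import Callable, Tuple


def shortest_transition_time(
    is_achievable: Callable[[float], bool],
    sigma: float,
    num_candidates: int = 8,
    tolerance: float = 1e-3,
) -> Tuple[float, float]:
    """Narrow the interval [lam, sigma] until sigma - lam < tolerance.

    is_achievable(t) is a hypothetical callback that replaces the LP solve
    for transition time t. The initial sigma must already be achievable
    (e.g., the steady-state transition time).
    """
    lam = 0.0
    while sigma - lam >= tolerance:
        # Assumed spacing: R interior points, uniformly spaced in (lam, sigma).
        step = (sigma - lam) / (num_candidates + 1)
        candidates = [lam + (r + 1) * step for r in range(num_candidates)]
        results = [is_achievable(t) for t in candidates]

        feasible = [t for t, ok in zip(candidates, results) if ok]
        if feasible:
            sigma = min(feasible)  # smallest achievable candidate
        # Only candidates below the updated sigma can tighten the lower bound.
        infeasible = [
            t for t, ok in zip(candidates, results) if not ok and t < sigma
        ]
        if infeasible:
            lam = max(infeasible)  # largest non-achievable candidate
    return lam, sigma


# Toy usage: pretend the map becomes achievable at t = 3.7 time units.
lam, sigma = shortest_transition_time(lambda t: t >= 3.7, sigma=10.0)
```

Under the monotonicity assumption, each iteration shrinks the interval by a factor of R + 1, so the number of rounds grows only logarithmically with the required precision.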
The matrix exponential computation for α, in (10), is performed using techniques proposed in [21] . These techniques are used to speed up the repeated recalculations of α and β for alternative transition times. They are based on eigenvalue decomposition utilizing the inherent properties of matrices A and B and replace the excessively time-consuming matrix exponential calculations in (10) with simpler operations. Although these techniques speed up the calculations, the required time is still very large, as experimentally shown in Section VI.
Even though the transient solution is an intuitive extension of the steady state solution and greatly outperforms it, it is slow in generating the schedules. Therefore, a new approach that avoids the time-consuming successive calculations of α and β is necessary. Such an approach is proposed in the next section based on a fast heuristic. Moreover, this new approach is capable of handling a more realistic problem formulation compared with the steady-state and transient solutions.
D. Transient-Based Heuristic
So far, temperature maps could only specify temperatures for modules. Therefore, the module's area limits the resolution of the temperature maps. This limitation is relaxed, from this section onward, by allowing the modules to be divided into smaller areas called submodules.
1) Support for High-Resolution Temperature Maps:
Previous techniques require the ability to apply the heating sequence only to a selected thermal node, avoiding the application of heating sequences to other nodes. Therefore, the smallest element in the temperature model could not be smaller than the corresponding module (e.g., a core with core wrapper). The method proposed in this section can work even if heating sequence application to a selected node generates heat in some other nodes. This supports division of a module into a number of submodules. Now we can assume that the overall number of thermal nodes, denoted by N, is larger than or equal to the number of modules (M ≤ N). In the rest of this paper, the desired temperature maps are specified for the thermal nodes instead of the modules. Consequently, the temperature map specifies that node n has a low-temperature limit equal to θ L n and a high-temperature limit equal to θ H n (0 ≤ n < N). The switching activities for heating sequences must be more specific, providing the power breakdown among active thermal nodes. For example, assume that module m is divided into two active thermal nodes n and o. The average power of a heating sequence for active node n is represented by p HS n . Node o may also receive power, denoted by p HS n,o , when heating is targeted for node n. Similarly, when trying to heat up node o with p HS o , node n is also heated by p HS o,n . Furthermore, power dissipation for TSV blocks is now supported, and the TSV drivers/buffers may be placed in TSV blocks, and their desired temperatures might also be specified in the temperature maps. Note that a TSV block is not only a TSV. A TSV block includes TSVs, interconnects, bulk silicon, and possibly transistors. The TSV, itself, is a metal rod that is a good conductor of heat and will not generate heat on its own.
Fig. 4. Transient-based heuristic demonstrated using a temperature curve.
2) Operations During the Transition (Boosting):
The transient-based heuristic generates the schedule offline. The schedule generation process is based on temperature simulation. The general idea is to apply heating when the simulated temperature is below θ L and stop the heating before it reaches θ H . These start/stop events construct the schedule.
The transition time between two maps is wasted time and must be minimized. The fact that the temperatures during transition are not important (except that overheating is avoided) is used to shorten the transition time. The proposed technique may apply heating more than needed. Heating, in this case, is called boosting. Boosting stops when the node reaches the stop boosting temperature, θ SB n . The stop boosting temperature may be higher than the high-temperature limit, θ H n , but it is always lower than θ overheating .
The following example shows how boosting helps. Assume that node n is initially boosted beyond θ H n (θ SB n > θ H n ). Then n does not need to receive heating for a while and this leaves the TAM available for other nodes. Meanwhile, n's temperature keeps decreasing (naturally). Just before all other nodes are in their specified temperature ranges, n's temperature drops below θ H n . This simplifies and shortens the schedule for the transition period and, therefore, is desirable. Fig. 4 shows how the transient-based heuristic works, by showing the temperature curve for one of the thermal nodes. The curve starts at the exact moment that the transition to the new map is started. The transition interval ends when all the nodes are in their valid temperature ranges. Since only one of the nodes is shown here, the transition time cannot be observed directly; therefore, it is indicated by the gray area at the lower left corner of Fig. 4 . Boosting is shown in interval a: the temperature increases beyond θ H n and continues to θ SB n . This helps to achieve a shorter transition time, as discussed before. Apart from boosting, other operations during the transition are similar to the operations after the transition.
3) Operations After the Transition: Just after transition, the map is enforced. However, since a node's temperature will naturally decrease, the temperature will eventually fall below θ L n if no or little power is applied to it. Therefore, a heating sequence should be applied at some point, before the temperature falls out of range. This point is marked with a temperature level named heating trigger and denoted by θ HT n for node n (θ HT n > θ L n ). The heating sequence should be applied when the temperature of node n falls below θ HT n . The difference between θ HT n and θ L n provides sufficient time for the node to wait for gaining access to the TAM without its temperature falling below θ L n . In Fig. 4 , the heating is required at the beginning of the interval c, but since the TAM is not available, the node waits. At the beginning of the interval d, the node has finally gained access to the TAM and the heating begins.
Heating should stop when the temperature reaches θ H n . The time it takes to get back to the low-temperature limit could be utilized to heat up other nodes that need heating. When a module consists of multiple active thermal nodes, the heating sequence can only be applied if all of these thermal nodes have temperatures lower than their high-temperature limits.
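The per-node decision rule after the transition can be illustrated with a small sketch. It is only an illustration under simplifying assumptions: the names `Action` and `next_action` are hypothetical, a node that has started heating is assumed to keep its TAM access until it reaches θ H n , and the multi-node module condition is not modeled.

```python
from enum import Enum


class Action(Enum):
    IDLE = "idle"  # temperature is in range and no heating is needed
    WAIT = "wait"  # heating is needed, but the TAM is not available
    HEAT = "heat"  # the heating sequence is applied


def next_action(theta, theta_ht, theta_h, heating, tam_available):
    """Decide what a node does in the next step of the simulated schedule.

    theta         current simulated temperature of the node
    theta_ht      heating trigger temperature (above the low limit)
    theta_h       high-temperature limit
    heating       True if the node is currently receiving a heating sequence
    tam_available True if the TAM can serve this node now
    """
    if heating:
        # Assumption: ongoing heating continues until the high limit is reached.
        return Action.HEAT if theta < theta_h else Action.IDLE
    if theta < theta_ht:
        # Below the trigger: heat if possible, otherwise wait for the TAM.
        return Action.HEAT if tam_available else Action.WAIT
    return Action.IDLE
```

The gap between the heating trigger and the low-temperature limit is what makes the `WAIT` state safe: the node can tolerate a delay in TAM access without leaving its specified range.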
4) TAM Access Management:
The nodes that simultaneously require heating should be accommodated within the available bandwidth of the TAM. This bandwidth might not be sufficient for all of them and, therefore, the nodes that need heating more than others should be prioritized. The priorities for using the TAM are determined based on the regional need for heating (denoted by d n around a node n). The value of d n is recomputed whenever node n needs heating (during offline schedule generation). A node requires heating in the following two situations: 1) when θ n < θ HT n after the transition, for example, the interval c in Fig. 4 and 2) when θ n < θ SB n during the transition, for example, the interval a in the same figure.
In the following, we explain how to calculate d n for the situation 1). Regional need for heating for situation 2) is obtained in a similar manner by replacing θ HT n with θ SB n . Equation (1) is rewritten, with the approximate derivatives, as
The input power, P, in (1) is substituted with the stray power P plus the PWM power of the heating sequences, D × P HS . Vector D is the vector form of the regional need for heating and consists of d n s. Equation (12) is written for one test cycle with period T, which is a very short time. The equation is then solved for the nodes that need heating as follows:
The regional need for heating d n depends on the required heating for node n (consider the summations when k is equal to n), on the required heating that is related to the adjacent nodes (consider the summations when k denotes an adjacent node to n), and on the average power of the corresponding heating sequence, p HS n . The regional need for heating for a node has the highest dependency on the node itself, and then a relatively high dependency on the adjacent nodes (this characteristic is captured by the temperature model). The influence of other nodes located far away from the targeted node is small. The heat transfer between nodes is considered automatically, since (13) is derived from the temperature equation (1), and includes the thermal conductances from matrix B. This is reflected by b n,k in (13) . Equation (13) ensures that the priority for using the TAM is given to the regions that need longer heating times, for example, because of large (θ HT n − θ n ) and small p HS n . Furthermore, the locality of this heuristic is helpful because adjacent nodes are likely to be in the same module and therefore, these nodes will receive some desirable heating sequence ( p HS n,k ) or heat transferred from module n. An effect of the interplay between priorities could be observed in Fig. 4 . The waiting period in the interval f is much shorter than the waiting period in the interval c. The length of a waiting period depends on the other node priorities in addition to the node n's priority.
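How the regional need for heating could drive TAM access is sketched below. The values of d n are taken as given inputs (their computation follows (13) and is not reproduced here). The per-node TAM width and the rule that a lower-priority node may use leftover bandwidth are assumptions made for illustration; `grant_tam_access` is a hypothetical helper name.

```python
def grant_tam_access(need, width, tam_width):
    """Grant TAM access in order of decreasing regional need for heating.

    need      dict: node -> regional need for heating d_n (precomputed)
    width     dict: node -> TAM width required by the node's heating
              sequence (assumed to be known per node)
    tam_width total available TAM width
    Returns the list of nodes that receive access, highest priority first.
    """
    granted = []
    remaining = tam_width
    for node in sorted(need, key=need.get, reverse=True):
        # Assumption: a lower-priority node may fill leftover bandwidth
        # when a higher-priority node does not fit.
        if width[node] <= remaining:
            granted.append(node)
            remaining -= width[node]
    return granted


# Toy usage with three nodes competing for a 12-bit TAM.
need = {"n0": 0.2, "n1": 0.9, "n2": 0.5}
width = {"n0": 4, "n1": 8, "n2": 8}
print(grant_tam_access(need, width, tam_width=12))  # prints ['n1', 'n0']
```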
As discussed before, the performance of the transient-based heuristic strongly depends on the stop boosting θ SB n and heating trigger θ HT n temperatures. One example is the priorities calculated using (13), since they depend on θ HT n after the transition and on θ SB n during the transition. Efficient values for these temperature levels for each temperature map and each thermal node are found using a particle swarm optimization (PSO) technique. PSO is a well-known iterative population-based optimization metaheuristic. A canonical form of PSO [16] is used in this paper in a straightforward manner.
E. Remarks
The proposed transient-based heuristic allows the IC to be divided into a desired number of thermal nodes N. A large N means high resolution and translates into a higher computational effort. Therefore, the resolution can be restricted if the available computational power is limited.
In this paper, as mentioned previously, it is assumed that the temperature maps capable of identifying the gradient-dependent defects are given. Furthermore, it is assumed that the combination of the given tests and heating sequences (mixed with cooling intervals) is capable of enforcing the desired temperatures on the specified modules.
IV. TEMPERATURE-GRADIENT-BASED TEST
For the temperature-gradient based test, the goal is to make sure that the tests are performed when the temperature gradients are enforced. This means that the specified temperature maps should be enforced before the test and then maintained during the test. The straightforward algorithm and the fast heuristic, which are proposed in the following, do this differently.
A. Straightforward Algorithm
This algorithm works by changing between two modes, the temperature construction mode and the test mode. Initially, the temperature construction mode is activated and it enforces the specified temperature map using a method similar to the transient-based heuristic proposed in Section III-D. Then the test mode is activated and the tests that are scheduled with a third-party algorithm (e.g., scheduling method proposed in [18] ) are applied. The test temperatures are simulated at design time and as soon as at least one of the thermal nodes is out of its specified range, the test mode is paused and the temperature construction mode takes over again. When all thermal nodes are brought into the specified temperature ranges, the temperature construction mode is paused and testing resumes.
Similar to the transient-based heuristic, if the temperature of a node is lower than the heating trigger temperature, it should be heated by applying the heating sequence to it. If there are many nodes that need heating (more than what the TAM can support), priority is given to those with higher regional need for heating, as defined in Section III-D. The construction mode, unlike the transient-based heuristic, should not heat the nodes up to their high-temperature limit since the power of the tests that are applied immediately after the construction mode may rapidly heat up the node beyond the high-temperature limit. Therefore, testing trigger temperatures that are denoted by θ TT n for node n (θ HT n < θ TT n < θ H n ) are introduced here. During the temperature construction mode, the heating for node n stops as soon as the temperature reaches θ TT n . In the test mode, as soon as the temperature of a node reaches the high-temperature limit, the test mode is immediately paused, the temperature construction mode is activated and, consequently, a cooling interval is applied. The cooling continues until the node is cooled down to the testing trigger temperature, θ TT n , and then the node is ready for testing again. The actual activation of the test mode will also depend on the temperatures of the other nodes. Efficient values for testing trigger temperatures, θ TT n , for each map are found using a PSO technique along with θ SB n and θ HT n . The inputs to the methods proposed here include the inputs to the methods proposed in Section III as well as the test specifications (e.g., test switching activities). The output is a set of offline schedules, generated based on temperature simulations. There is no need for temperature sensors and the precision of the achieved temperature maps depends on the temperature model.
Although generating such offline schedules is the focus of this paper, the proper values for the heating trigger θ HT n , stop boosting temperatures θ SB n , and testing trigger temperatures θ TT n that result in a rapid test could also be considered as the outputs that provide a basis for an online scheduling scenario. In this case, temperature sensors are used and the precision depends on the temperature sensors. Such online approaches are, however, beyond the scope of this paper.
The straightforward algorithm is simple, and allows the choice of a desired arbitrary test schedule that is used in the test mode. However, the overall test application time offered by this method is very long. Note that the total test application time also includes time spent for temperature construction.
B. Fast Heuristic
The fast heuristic schedules the tests together with the heating sequences such that the specified temperature map is maintained. In this way, a shorter test application time can be achieved. An illustrative example for the proposed method is given in Fig. 5 for a . Testing continues until the temperature of at least one of the nodes goes beyond the high-temperature limit θ H n or falls below the heating trigger θ HT n . For example, at the end of interval c, the node is too cold for testing and a heating interval should be introduced. Note that the TAM may no longer be available and therefore, the node is waiting for access to TAM in interval d.
Finally, when access to the TAM is obtained, the heating sequence is applied in interval e. To start heating, all nodes covered by a module should be colder than the high-temperature limit since the heating sequence for one node is very likely to inject power to other nodes in the same module (as explained in Section III-D). Heating continues until the temperature goes beyond the testing trigger temperature, and then the test resumes as in interval f in Fig. 5 . When the temperature reaches the high-temperature limit, a cooling interval is introduced as in interval g. This procedure continues until all tests corresponding to the current temperature map are completed.
As mentioned before, nodes will compete for access to the TAM and, therefore, some of them should be prioritized. First, the nodes that require heating (not the tests) are granted access to the TAM. This helps to keep the temperatures within the specified limits most of the time and, thus, keeps the flow of the tests uninterrupted. Note that even if only one node falls out of its specified range, all tests must be interrupted until the map is achieved again. This will waste a lot of time since the tests for the modules that are in their specified range should also be interrupted. The priorities for the nodes that require heating are determined based on the regional need for heating, as proposed in Section III-D.
If the TAM is left with some available bandwidth after the heating sequences are scheduled, the modules that are thermally qualified may resume their tests. A module is thermally qualified if none of the nodes that correspond to that module are demanded by the previously discussed rules to receive heating, wait for heating, or receive cooling. The priority is given to the modules that are expected to offer long test endurance. The test endurance is denoted by e m for module m and is defined as
The test endurance is directly proportional to the remaining test size denoted by r m for module m. The larger the remaining test size, the longer the test endurance. The thermal tolerance, denoted by tt m for module m, is the other contributor to the test endurance. High thermal tolerance, tt m , indicates that the module is capable of receiving tests for a relatively long time without exceeding the specified thermal limits. Therefore, a module with a large thermal tolerance may remain under test for a relatively long time. The thermal tolerance is defined as
In (15), it is assumed that module m covers K active thermal nodes. k (k = 0, 1, . . . , K − 1) denotes the expected thermal distance to a temperature limit for node k and is defined as
As mentioned in Section III-B, the desired steady-state power p SS k is the power that results in a temperature equal
. Equation (16) indicates that if the upcoming tests have relatively high average power, then it is likely that the thermal node exceeds the high-temperature limit and, therefore, the difference between the current temperature θ k , and the high-temperature limit θ H k is a good measure for thermal tolerance. Similarly, for a relatively low power test, it is more likely that the temperature falls below the heating trigger in the future. Therefore, the difference between the current temperature θ k and the heating trigger temperature θ HT k is a good measure for thermal tolerance. Thermal tolerance tt m is defined as the smallest k (k = 0, 1, . . . , K − 1) since as soon as a single node is out of the specified range [θ L n , θ H n ], disregarding the temperatures of the other nodes, the test should be interrupted. Note that if the temperature falls below θ HT n , only for a node in module m, then the test is interrupted only for module m.
A proper value for the testing trigger temperature θ TT n is selected so that the temperature variation during test (caused by the variations in the test power) rarely results in temperatures below θ HT n or above θ H n . Every time that θ HT n or θ H n is violated, the test must be interrupted and a heating or cooling interval must be introduced, respectively. Since these are time consuming, a proper θ TT n value helps to obtain a short test application time by reducing the number of interruptions. Besides the testing trigger temperature, stop boosting and heating trigger temperatures (θ SB n and θ HT n , respectively) have a considerable effect on the test application time and therefore, proper values for them should be found. A PSO technique is used to find the proper values for θ SB n , θ HT n , and θ TT n for each map. Similar to the previous sections, the canonical PSO [16] is utilized in a straightforward manner.
V. TEMPERATURE MAP ORDERING TECHNIQUE
The order in which the maps are enforced has a considerable impact on the overall burn-in and test time. Since there are usually a number of temperature maps to be applied, their ordering is important. In this section, we present methods to rapidly obtain a proper order for temperature maps that results in a short burn-in and test time.
To simplify the discussions, let us assume that the temperature map for a thermal node is represented by the middle value of the specified temperature range
As an example, assume that an IC has two thermal nodes and the initial temperature is 30°C. The temperatures specified in map μ 0 are denoted by {θ SS 0 , θ SS 1 }. This means that temperatures θ SS 0 and θ SS 1 are specified by map μ 0 for nodes 0 and 1, respectively. Assume that there are three temperature maps denoted by μ 0 , μ 1 , and μ 2 . These maps specify the following temperatures: μ 0 = {110°, 90°}, μ 1 = {40°, 50°}, and μ 2 = {110°, 80°}, respectively. These temperature maps are represented in Fig. 6 by three points in a Cartesian space. The temperature for node 0 is represented by the horizontal axis, θ 0 , and for node 1 by the vertical axis, θ 1 . The initial order of temperature maps {μ 0 , μ 1 , μ 2 } requires a long time to increase the temperature for node 0 from 30° to 110° [a 0 in Fig. 6(a) ], then decrease it to 40° [a 1 in Fig. 6(a) ], and then again increase it from 40° to 110° [a 2 in Fig. 6(a) ]. This process will take a long time due to the required large temperature changes. In contrast, it is much faster to work with the maps ordered as {μ 1 , μ 2 , μ 0 }, since in this case, the required temperature changes consist of smaller temperature variations, as shown in Fig. 6(b) .
As discussed earlier, to minimize the overall transition time for burn-in, a PSO finds the proper values for stop boosting and heating trigger temperatures (θ SB n s and θ HT n s, respectively). The map orders should be optimized along with these temperatures, since all of these factors have a crucial effect on the overall transition time for a given set of temperature maps. The naïve approach to find proper map orders is to introduce them as decision variables into the PSO along with θ SB n s and θ HT n s. Experiments showed that this naïve approach takes a very long CPU time to complete. Since the optimized values for θ SB n and θ HT n depend on the map order, different map orders result in different optimized values for θ SB n and θ HT n .
The initial PSO population in the naïve approach consists only of random solutions (random θ SB n s, θ HT n s, and random map orders). Introducing a relatively good map order into the initial population of PSO (among other initial solutions that are random) will help to speed up the search. This approach is denoted by A1. The idea for approach A1 is to rapidly find a potentially good map order using some initialization heuristic and introduce it into the initial PSO population. By doing this, the search should speed up while the quality of the final values for θ SB n s and θ HT n s is kept reasonably high. In the majority of cases, PSO finds a better map order than the one produced by the initialization heuristic.
It is, in fact, possible to find a potentially good map order without having to go through the time-consuming optimization of θ SB n s and θ HT n s. Furthermore, it is possible to do it without the relatively time-consuming scheduling procedures for the heating sequences. A temperature map could be considered as a point in an N-dimensional Euclidean space (N is the number of thermal nodes). The thermal distance between two maps is defined as the Euclidean distance between them [e.g., b 2 between maps μ 1 and μ 2 in Fig. 6(b) ]. For a sequence of maps, the total thermal distance (TTD) is defined as the sum of the thermal distances between successive maps. For example, TTD for Fig. 6(a) is approximately 257, while for Fig. 6(b) , it is 108, which is much smaller. In general, a sequence of maps with smaller TTD is expected to have a shorter transition time compared with a sequence with larger TTD.
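The TTD values quoted for the two-node example can be reproduced with a few lines of code. The sketch below assumes, consistently with the quoted numbers, that the first leg from the initial temperatures (30°, 30°) to the first map is counted; the function name `total_thermal_distance` is hypothetical.

```python
import math


def total_thermal_distance(initial, maps):
    """Sum of Euclidean distances along initial -> maps[0] -> maps[1] -> ..."""
    points = [initial] + list(maps)
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


initial = (30.0, 30.0)
mu0, mu1, mu2 = (110.0, 90.0), (40.0, 50.0), (110.0, 80.0)

ttd_a = total_thermal_distance(initial, [mu0, mu1, mu2])  # order of Fig. 6(a)
ttd_b = total_thermal_distance(initial, [mu1, mu2, mu0])  # order of Fig. 6(b)
print(round(ttd_a, 1), round(ttd_b, 1))  # prints 256.8 108.5
```

The individual legs for the order {μ 0 , μ 1 , μ 2 } are 100, about 80.6, and about 76.2, whereas the order {μ 1 , μ 2 , μ 0 } needs only about 22.4, 76.2, and 10.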
Note also that the time required to change the temperature differs from node to node depending on the node's location on the IC, the adjacent node temperatures, the heating sequence powers, and so on. Moreover, depending on these factors, the rise time and the fall time for the temperature of a certain node are also different (e.g., in many cases heating up is faster than cooling down, with the same temperature gap). The TTD does not consider these differences in favor of a simple but meaningful metric that is fast to evaluate. However, when the map order is optimized using PSO, all these once ignored factors are automatically considered.
This problem is equivalent to finding the shortest Hamiltonian path in a complete graph whose vertices are temperature maps and the distance between two vertices is their Euclidean distance. Therefore, the initial heuristic based map order that is added to the PSO's initial population in approach A1 is called shortest Hamiltonian path. Due to the reasons discussed previously, this shortest path does not necessarily correspond to the optimal map order.
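For a handful of maps, the shortest Hamiltonian path can be found by plain enumeration, as sketched below. The text does not state which algorithm is used to find this path, so exhaustive search is only a stand-in here; it grows factorially with the number of maps and is practical only for small sets. The path is assumed to start at the initial temperature point, and `shortest_hamiltonian_order` is a hypothetical name.

```python
import itertools
import math


def shortest_hamiltonian_order(initial, maps):
    """Return (order, ttd) minimizing the total thermal distance.

    order is a tuple of indices into maps. Exhaustive enumeration is used,
    so this is only suitable for a small number of maps.
    """
    best_order, best_ttd = None, math.inf
    for order in itertools.permutations(range(len(maps))):
        points = [initial] + [maps[i] for i in order]
        ttd = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
        if ttd < best_ttd:
            best_order, best_ttd = order, ttd
    return best_order, best_ttd


# Two-node example from the text: mu0, mu1, mu2 with a 30-degree start.
maps = [(110.0, 90.0), (40.0, 50.0), (110.0, 80.0)]
order, ttd = shortest_hamiltonian_order((30.0, 30.0), maps)
print(order)  # prints (1, 2, 0)
```

For the example, the result is the order {μ 1 , μ 2 , μ 0 } of Fig. 6(b), which is the kind of seed that approach A1 would add to the initial PSO population.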
If A1 is allowed to run for a long time, it will produce very high-quality solutions. However, for larger designs, this is unaffordable. We have, therefore, proposed the A2 approach, which consists of a short run of A1 followed by a post-PSO optimization of map orders. The motivation for this is that PSO optimization in A1 can rapidly identify possible solutions in the near optimal area of the search space, but it then becomes very slow. Knowing the near optimal area, other optimization techniques can be deployed to rapidly improve the results. In the following, the post-PSO optimization for the map orders is discussed.
In the general case, the post-PSO optimization could be excessively time consuming. A greedy heuristic is therefore used to rapidly find a near optimal solution. The greedy approach is characterized by its size, S. This size is the number of alternative partial solutions that are kept at each step (i.e., among the vertices with equal depth in the search tree). A greedy heuristic with size S works as follows. Starting from the root vertex (initial temperature) in the search tree, S vertices (i.e., S temperature maps) that have the shortest partial transition times are selected. This corresponds to the first map in the final map order. Here, the full-fledged scheduling is performed to calculate the transition times. Then again S new vertices that have the shortest partial transition times are selected out of the set of vertices that succeed the previous best S vertices. Two maps (in the final map order) are scheduled so far. This procedure repeats until all maps are scheduled. For S equal to one, at each step, the map that is the fastest to achieve is selected. A large S slows down the search, but it may provide better results. Our experiments showed that 10 is a good choice for S.
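The greedy heuristic of size S can be sketched as a search that keeps the S best partial map orders at each depth. In the sketch below, `transition_time` is a hypothetical callback that stands in for the full-fledged scheduling; the toy usage substitutes the Euclidean distance for it, which is an assumption made only to keep the example self-contained.

```python
import math


def greedy_map_order(initial, maps, transition_time, size):
    """Keep the `size` partial orders with the shortest elapsed time per depth.

    transition_time(src, dst) is a stand-in for scheduling the transition
    from temperature point src to map dst and returning its duration.
    Returns (total_time, order), where order is a tuple of indices into maps.
    """
    beam = [(0.0, ())]  # (elapsed transition time, partial order)
    for _ in range(len(maps)):
        extended = []
        for elapsed, order in beam:
            current = maps[order[-1]] if order else initial
            for i in range(len(maps)):
                if i not in order:
                    step = transition_time(current, maps[i])
                    extended.append((elapsed + step, order + (i,)))
        # Keep only the best `size` partial solutions at this depth.
        extended.sort(key=lambda item: item[0])
        beam = extended[:size]
    return beam[0]


# Toy usage: Euclidean distance replaces the real transition-time scheduling.
maps = [(110.0, 90.0), (40.0, 50.0), (110.0, 80.0)]
total_time, order = greedy_map_order((30.0, 30.0), maps, math.dist, size=1)
print(order)  # prints (1, 2, 0)
```

With `size=1`, the search always commits to the map that is fastest to reach next; larger sizes evaluate more partial orders per depth in exchange for a better chance of escaping a poor early choice.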
In contrast to this general case, which targets large ICs that are time consuming to schedule, for smaller ICs it is possible to find the optimal map order (i.e., exact solution) using an exact algorithm (e.g., branch and bound). Since a relatively good solution is already found by PSO in approach A1, we can skip many paths in the search tree that result in a larger transition time, without wasting time to fully schedule them. For example, assuming that the map order in Fig. 6(b) is already found by A1, there is no need to schedule a 2 [ Fig. 6(a) ] at all. Scheduling a 1 may also be aborted before completion since the overall transition time of this path in the search tree exceeds the overall transition time of the path corresponding to Fig. 6(b) before it even gets to vertex μ 1 . Note that in this algorithm, the edges are actual transition times and not the Euclidean distances. Despite the significant acceleration achieved by utilizing the near optimal result from the A1 approach, larger examples are excessively time consuming and, therefore, finding their optimal solution is not practical.
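The pruning idea can be illustrated with a depth-first branch-and-bound sketch that is seeded with the transition time of a known order (e.g., the result of A1). As before, `transition_time` is a hypothetical stand-in for the actual scheduling, and the toy usage replaces it with the Euclidean distance. Unlike the procedure described above, this sketch prunes only between maps and does not abort a transition that is partially scheduled.

```python
import math


def branch_and_bound_order(initial, maps, transition_time, incumbent_time=math.inf):
    """Search for a map order that is strictly faster than incumbent_time.

    Returns (order, time). If no order beats the incumbent, order is None and
    time equals incumbent_time. Pruning is valid because transition times are
    non-negative, so a partial path can never become shorter by extending it.
    """
    best = {"time": incumbent_time, "order": None}

    def extend(current, order, elapsed):
        if elapsed >= best["time"]:
            return  # prune: this path cannot beat the best known order
        if len(order) == len(maps):
            best["time"], best["order"] = elapsed, order
            return
        for i in range(len(maps)):
            if i not in order:
                step = transition_time(current, maps[i])
                extend(maps[i], order + (i,), elapsed + step)

    extend(initial, (), 0.0)
    return best["order"], best["time"]


# Toy usage: Euclidean distance replaces the real transition-time scheduling.
maps = [(110.0, 90.0), (40.0, 50.0), (110.0, 80.0)]
order, time_units = branch_and_bound_order((30.0, 30.0), maps, math.dist)
print(order)  # prints (1, 2, 0)
```

A tighter incumbent prunes more of the search tree, which is why seeding the search with the near optimal result of A1 accelerates it.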
Although this section has focused on map ordering for the temperature-gradient based burn-in, the map ordering for the delay test is very similar and the same technique can be used.
VI. EXPERIMENTAL RESULTS
We have first performed experiments to demonstrate that the proposed technique can rapidly and accurately achieve the specified temperature maps. For this purpose, temperature simulations with HotSpot [11] based on the generated schedules are performed. Consider an IC with three modules and two temperature maps. The first map requires average temperatures of 120°C, 90°C, and 60°C for modules m 0 , m 1 , and m 2 , respectively (μ 0 = {120, 90, 60}). The second map requires 90°C, 60°C, and 120°C, respectively (μ 1 = {90, 60, 120}). Both maps assume a temperature range of 10°C (e.g., 120° means [115°, 125°]). The temperature curves for the steady-state solution and the transient-based heuristic are plotted in Fig. 7(a) and (b) , respectively. Boosting of temperature at module m 2 , which happens around 5 × 10^3 TU in Fig. 7(b) , is magnified in Fig. 7(c) . As discussed earlier, the temperature is allowed to rise beyond the high-temperature limit during the boost. The burn-in intervals for the first and second maps are marked with bi 0 and bi 1 , respectively. The transition time from the ambient temperatures to the first map (denoted by tr 0 ) and from the first map to the second one (denoted by tr 1 ) are much shorter for the transient-based heuristic. Consequently, the total time for the burn-in process performed by the transient-based heuristic is about 4 × 10^4 TU shorter, compared with the steady-state solution.
The proposed temperature-gradient-based methods for burn-in and delay-fault test are evaluated for 12 experimental ICs with one to three layers, as detailed in Table I , columns 2-4. The one-layer experimental ICs (rows 1 to 4 in Table I ) are bare dies and could represent the prebond test stage. The ICs that have two layers (rows 5 to 8) could represent the mid-bond test stage. The ICs with three layers (rows 9 to 12) could represent the postbond test stage. There are two, four, eight, and 16 physical modules per layer for different dies, resulting in the total number of modules ranging from two to 48, as given in column 3. There are one, two, and three TSV blocks per layer on the dies, resulting in the total number of TSV blocks given in column 4, ranging from one to nine. Each TSV block hosts a relatively large number of TSVs. The dies are assumed to be stacked in a face-to-back configuration.
The temperature models are extracted using an approach similar to the method proposed in [7] for 3-D SIC. This is an extended form of the technique used by HotSpot [11] for normal 2-D ICs. The heating pattern switching activities are generated using Markov chains, similar to [24] . The temperature maps specify the valid temperature ranges for nodes in the temperature model. The valid ranges are randomly selected between 35° and 95°, and some modules/nodes are randomly selected to be don't-care. Only temperature maps that can be achieved in practice are considered. An example of a temperature map that cannot be achieved is one that requires a central node with a very low temperature and its adjacent nodes with very high temperatures. In this case, the temperature gradient is huge and it will probably require negative power (active cooling) for the central node.
A. Evaluation of Transient-Based and Fast Heuristics
The transient solution and the transient-based heuristic for burn-in (Sections III-C and III-D, respectively) are evaluated and compared with the steady-state solution (Section III-B). The transient-based method is capable of handling temperature models having multiple nodes per module, while the steady-state and transient solutions only support one thermal node per module. To have comparable experiments, the temperature model that is supported by the steady-state method is used in the experiments for burn-in. The CPU time to generate the schedules for all twelve experimental ICs together is about 12 min for the transient-based method, 17 min for the transient solution, and 2 s for the steady-state method. As discussed earlier, the overall transition time is defined in this paper as the time required to bring the IC into a thermal situation that complies with the first temperature map and then to each subsequent map until all maps are applied. The percentage changes in overall transition time offered by the transient solution and the transient-based heuristic, compared with the steady-state solution, are given in columns 5 and 6 of Table I, respectively. A considerable speedup (78% on average) is achieved by the transient-based heuristic, which also outperforms the transient solution.
The fast heuristic for test (Section IV-B) is evaluated and compared with the straightforward method (Section IV-A). The temperature model used for these experiments has multiple nodes per module. First, the total CPU times for generating the schedules are compared. The percentage change in CPU time required by the fast heuristic compared with the straightforward method is −36% on average. The overall CPU time depends on the interaction between the computational complexity of a single decision point in the schedule and the schedule length. The experimental results indicate that, since the fast heuristic makes better decisions than the straightforward method, the overall length of the schedule is reduced considerably and therefore the overall CPU time is also reduced. This happens despite the fast heuristic's higher computational complexity for individual decision points. In fact, the schedule length is an important contributor to the CPU time, since longer schedules require longer temperature simulations and temperature simulation is, per se, very time consuming.
The test time is defined in this paper as the total time required to enforce a temperature map and maintain it while the tests are being applied, plus the time spent applying the corresponding tests. The percentage change in test time offered by the fast heuristic compared with the straightforward method is given in column 7 of Table I, which shows that a considerable speedup (67% on average) is achieved.
The CPU times for the transient-based heuristic for different numbers of modules are given in Fig. 8. Even though they grow rapidly with the number of modules, the CPU time for an IC with 48 modules is still relatively short (480 s). The CPU times for the fast heuristic are relatively larger since the tests are also scheduled along with the heating sequences. The rate of increase in the CPU times, as shown in Fig. 8, is tolerable and similar to that of the transient-based heuristic. This was expected since the two algorithms are very similar.
B. Evaluation of Temperature Map Ordering Technique
The percentage change in CPU time for the A1 approach compared with the naïve approach is −266% on average. Furthermore, the overall transition time achieved by A1 is 18% smaller than the overall transition time achieved by the naïve approach.
Optimal map orders are found for some of the experimental ICs to be used for comparison purposes. It was not practical to find optimal map orders for all the experimental ICs because of the excessive search time that relatively large ICs require. The overall transition times achieved by A1 are around 23% larger than the overall transition times offered by the optimal map orders. As mentioned before, this shows that the map orders found by A1 are close to optimal, but A2 can do better.
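To illustrate why exhaustive search is only practical for small cases, the following Python sketch evaluates the overall transition time of a map order and finds the optimal order by enumerating all permutations. It assumes, purely for illustration, that the transition times from ambient to each map (`from_ambient`) and between each pair of maps (`between`) are known constants; in the experiments these times result from temperature simulations of the generated schedules, and all names and numbers below are hypothetical.

```python
from itertools import permutations
from typing import Sequence, Tuple


def overall_transition_time(
    order: Sequence[int],
    from_ambient: Sequence[float],
    between: Sequence[Sequence[float]],
) -> float:
    """Sum the transition times along a map order, starting from ambient."""
    total = from_ambient[order[0]]
    for prev, nxt in zip(order, order[1:]):
        total += between[prev][nxt]
    return total


def optimal_order(
    from_ambient: Sequence[float],
    between: Sequence[Sequence[float]],
) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively search all n! orders; feasible only for a few maps."""
    best = min(
        permutations(range(len(from_ambient))),
        key=lambda order: overall_transition_time(order, from_ambient, between),
    )
    return best, overall_transition_time(best, from_ambient, between)


# Hypothetical transition times (in TU) for three temperature maps.
from_ambient = [5, 9, 8]
between = [
    [0, 4, 8],
    [3, 0, 2],
    [6, 1, 0],
]
```

The number of candidate orders grows factorially with the number of maps, and each candidate requires its own schedule generation in practice, which is consistent with the excessive search time observed for the larger ICs.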
In the following, A2, which includes the post-PSO optimization, is compared with A1, which terminates after the PSO optimization.
The greedy approach with a population size of one (S = 1) is used to find map orders for all of the experimental ICs. The results show a 16% improvement over the A1 results, but they are 13% worse than the optimal. Increasing the population size to ten (S = 10) further improves the results, giving a 21% improvement over A1 and a result only 7% worse than the optimal, but it almost doubles the search time. In short, A1 finds map orders that result in an overall transition time around 23% worse than optimal. The post-PSO optimization in A2 improves the map orders by 21%, which brings them very close to the optimum.
VII. CONCLUSION
Early-life failures and delay faults that depend on temperature gradients introduce additional challenges to achieving efficient burn-in and delay-fault test. The negative effects of temperature gradients are more pronounced for 3-D SIC technology since their magnitude is much larger. The challenge for burn-in is that some defects develop and cause early-life failures very rapidly when the IC is working with certain temperature maps that include large temperature gradients, which are difficult to enforce by traditional burn-in methods. The challenge for delay-fault test is that some defects can be detected only when a certain temperature map is enforced on the IC.
To effectively detect these defects, it is necessary to construct and maintain the specified temperature maps during burn-in and delay-fault test. The methods proposed in this paper utilize the available TAMs to do so. The specified temperature maps are constructed and maintained by selectively applying high-power stimuli to the IC. Therefore, there is no need for expensive equipment to heat up the chip externally. To the best of our knowledge, this is the first technique to achieve temperature maps for burn-in and test without any external heating mechanism.
For burn-in, a steady-state solution is introduced that generates the schedules quickly, but the schedules are slow to achieve the specified temperatures. A schedule in this case consists of a single periodic schedule for each map. The steady-state solution has been extended to the transient solution, which is slow in generating the schedules, but constructs the maps faster. Finally, the transient-based heuristic is proposed to support a more precise temperature model and offer a shorter overall transition time by generating schedules that rapidly bring the IC to the specified temperature conditions. The experiments indicate that this method outperforms the transient solution. Moreover, this method is 78% faster than the steady-state solution in realizing the specified temperature maps.
For delay-fault test, a straightforward method is proposed that is based on two working modes, the temperature construction mode and the test mode. The temperature construction mode works similarly to the transient-based method for burn-in and brings the IC to the specified temperature conditions. Then the test mode applies the tests according to a given test schedule until the IC's temperatures exit the specified range, at which point the temperature construction mode is activated again. This continues until all tests are performed. Furthermore, another method (the fast heuristic) has been developed to schedule the heating and cooling intervals mixed with the tests. Therefore, the test time offered by this method is reduced. The experiments indicate that the fast heuristic is 67% faster in performing the tests compared with the straightforward method.
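The alternation between the two modes of the straightforward method can be sketched in Python as follows. This is only an illustrative sketch under strong assumptions: a single thermal node with a lumped first-order model, tests that can be interrupted at any time unit, and hypothetical parameter values (`ambient`, `r_th`, `tau`, `heat_power`). The methods in this paper rely on multi-node temperature simulation instead.

```python
def step(temp, power, ambient=25.0, r_th=2.0, tau=20.0):
    """Advance a lumped first-order thermal model by one time unit (assumed model)."""
    steady = ambient + r_th * power  # temperature reached if power is held forever
    return temp + (steady - temp) / tau


def straightforward_schedule(tests, low, high, heat_power=40.0,
                             start=25.0, max_steps=10_000):
    """Build a schedule of 'heat', 'cool', and test-name entries, one per time unit.

    tests: list of (name, power, duration) with positive integer durations.
    """
    temp, schedule, pending = start, [], list(tests)
    remaining = pending[0][2] if pending else 0
    while pending:
        if len(schedule) >= max_steps:
            raise RuntimeError("schedule did not converge")
        if low <= temp <= high:
            # Test mode: temperature is inside the specified range.
            name, power, _ = pending[0]
            schedule.append(name)
            temp = step(temp, power)
            remaining -= 1
            if remaining <= 0:
                pending.pop(0)
                remaining = pending[0][2] if pending else 0
        elif temp < low:
            # Temperature construction mode: apply high-power stimuli.
            schedule.append("heat")
            temp = step(temp, heat_power)
        else:
            # Temperature construction mode: insert a cooling interval.
            schedule.append("cool")
            temp = step(temp, 0.0)
    return schedule


# Hypothetical tests: (name, power, duration in time units).
demo = straightforward_schedule([("t1", 10.0, 5), ("t2", 30.0, 5)], 60.0, 70.0)
```

In this sketch every time unit spent in `heat` or `cool` adds to the test time without applying any test, which is the overhead that mixing the heating and cooling intervals with the tests aims to reduce.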
The order of the temperature maps has a considerable effect on the overall burn-in and test time. Therefore, map orders need to be optimized, since they affect the optimal values for other decision variables. The experiments for map ordering show that the introduction of an initialization heuristic that adds an initial map order to the PSO's initial population speeds up the search by 266% on average. Furthermore, the overall transition time improves by 18% on average for burn-in. The overall transition times are further improved by 21% through the introduction of a post-PSO optimization stage that consists of a greedy approach.
