SUMMARY This paper presents a power-constrained test scheduling method for multi-clock domain SoCs that consist of cores operating at different clock frequencies during test. In the proposed method, we use virtual TAM to bridge the frequency gaps between the cores and the ATE. Moreover, we present a technique that uses virtual TAM to reduce the power consumption of cores during test while their test time remains the same or increases only slightly. Experimental results show the effectiveness of the proposed method.
Introduction
Today's SoCs embed hundreds of memory cores and several different types of logic cores obtained from various vendors. Moreover, multiple clocks operate at multiple frequencies in a single SoC. Due to the increasing design complexity, SoC testing is a crucial and time-consuming problem. SoCs are increasingly tested in a modular fashion because the system integrator in most cases has very limited knowledge about the structural content of an adopted core, and hence deals with it as a black box. Therefore, he/she cannot develop the DFT structures and the corresponding test patterns for it. This is especially true if the core is a hard core or an encrypted Intellectual Property block [1]. In order to enable modular test, each embedded core should be isolated from its surrounding circuitry. Zorian et al. introduced a generic test architecture that enables modular test for SoCs [1]. It consists of the following three components: 1) test pattern source and test response sink, 2) test access mechanism (TAM), and 3) wrapper. The TAM propagates test patterns for a core from the test pattern source to the core, and propagates the responses from the core to the test response sink. The wrapper provides an interface between the TAM and the core, and also switches the core between the modes defined in IEEE Std. 1500 [2]: 1) normal, 2) INTEST (to test the core), 3) EXTEST (to test the interconnects between cores), and 4) BYPASS. The goal is to develop techniques for wrapper design, TAM design and test scheduling that minimize the test application time under given constraints such as the number of test pins and power consumption. A number of IEEE Std. 1500 compliant wrapper design approaches have been proposed [3]-[5]. Similarly, several TAM architectures have been proposed, such as TestBus [6], [7], TESTRAIL [8], and transparency-based TAM [9]-[11]. Moreover, many approaches to the core-internal test scheduling problem have been proposed [3], [8], [12]-[16].
Recently, [17] proposed a test scheduling method to minimize the overall test time for core-internal logic and core-external interconnects.
However, these previous approaches are applicable only to single-clock domain SoCs, which consist of embedded cores operating at the same clock frequency during test. Today's SoC designs in telecommunications, networking and digital signal processing applications consist of embedded cores operating at different clock frequencies. The test clock frequency of some embedded cores is limited by their scan chain frequencies. Other cores, on the other hand, may be tested at-speed in order to increase the coverage of non-modeled and performance-related defects. Moreover, there is also a frequency gap between each embedded core and the ATE used to test the SoC. Consequently, the previous approaches have the following two problems: 1) when the test clock frequency of a core is higher than that of the ATE, the ATE cannot provide test sequences at the test clock frequency of the core, and 2) when the test clock frequency of a core is lower than that of the ATE, testing the core by lowering the ATE frequency does not use the ATE capability effectively. Therefore, a technique that solves these problems for multi-clock domain SoCs is needed.
Recently, virtual TAM based on bandwidth matching [18] has been proposed in [19] to exploit the ATE capability more fully when the clock frequency of a core is lower than that of the ATE. Xu and Nicolici extended the virtual TAM technique to multi-frequency TAM design to reduce the test time of single-clock domain SoCs [20]. Moreover, wrapper designs for cores with multiple clock domains were proposed in [21]-[24] to achieve at-speed testing of the cores using the virtual TAM technique. However, the test scheduling problem for multi-clock domain SoCs is not addressed in these works.
To the best of our knowledge, this paper is the first to discuss and formulate the core-internal test scheduling problem for multi-clock domain SoCs.
We 
Preliminaries

Power Consumption
Power consumption in CMOS circuits can be classified into two categories: static power and dynamic power. Static power dissipation is caused by leakage or other current drawn continuously from the power supply, whereas dynamic power dissipation is caused by output switching. In current CMOS technology, dynamic power is the dominant source of power consumption. High average power consumption can cause structural damage to the silicon, bonding wires or package, and if peak power consumption exceeds a certain limit, designers cannot guarantee that the entire circuit will function correctly. Average power consumption is generally associated with scan operation, whereas peak power consumption is associated with capture operation during test. In this paper, we consider only the power consumption during scan operation, defined as follows.
First, the energy E(k) consumed in the circuit on application of two consecutive test vectors (Vk-1, Vk) is defined as follows [25]:
$$E(k) = \frac{1}{2}\, c_0 V_{DD}^{2} \sum_{i} S_i(k)\, F_i$$
where c0 is the circuit's minimum parasitic capacitance, VDD is the power supply voltage, Si(k) is the number of switchings provoked by Vk at node i, and Fi is the number of fanouts at node i. Let N be the number of clock cycles for scan operation. The total energy consumed in the circuit during the scan operation is defined as follows:
$$E_{total} = \sum_{k=1}^{N} E(k)$$

of the cores and serial-in/parallel-out registers at the outputs of the cores. Therefore, the hardware cost is relatively low compared to the area of the core itself. Second, since the TDM and TDdeM used for implementation are placed next to the cores, only the original TAM wires are routed through the SoC. Thus, the routing cost is also low.
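The weighted switching-activity energy model of [25] defined above can be illustrated with a small Python sketch. It assumes that the per-node switching counts Si(k) and fanout counts Fi are already available (for example, from logic simulation of the scan shift cycles); the function names and the sample values are hypothetical and chosen only for illustration.

```python
from typing import Sequence

def vector_energy(c0: float, vdd: float,
                  switchings: Sequence[int], fanouts: Sequence[int]) -> float:
    """E(k) = 1/2 * c0 * VDD^2 * sum_i S_i(k) * F_i for one vector pair (V_{k-1}, V_k)."""
    if len(switchings) != len(fanouts):
        raise ValueError("need one switching count and one fanout count per node")
    weighted = sum(s * f for s, f in zip(switchings, fanouts))
    return 0.5 * c0 * vdd ** 2 * weighted

def scan_energy(c0: float, vdd: float,
                switchings_per_cycle: Sequence[Sequence[int]],
                fanouts: Sequence[int]) -> float:
    """Total energy over N scan cycles: the sum of E(k) for k = 1..N."""
    return sum(vector_energy(c0, vdd, s, fanouts) for s in switchings_per_cycle)

if __name__ == "__main__":
    fanouts = [2, 1, 3]              # F_i for three hypothetical nodes
    cycles = [[1, 0, 1], [0, 1, 1]]  # S_i(k) for N = 2 hypothetical scan cycles
    print(scan_energy(1.0, 2.0, cycles, fanouts))  # normalized c0 and VDD
```

With c0 = 1 and VDD = 2 (normalized units), the two cycles contribute 10 and 8 energy units, so the script prints 18.0.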
In this paper, we also use the virtual TAM technique to reduce the power consumption of a core while its test time remains the same or increases only slightly. From Eqs. (3) and (4), we observe that the power consumption of a core during test can be reduced by lowering its test frequency. However, this increases the test time of the core in proportion to the power reduction ratio. To compensate, we insert TDdeM/TDM circuits between the ATE and the core. More virtual TAM wires then become available to the core, and its test time can be reduced. In the best case, we can reduce the power
[Figure: Test data multiplexing/de-multiplexing.]
1. the total number of test pins used at any moment does not exceed Wmax,
2. the total power consumption at any moment does not exceed Pmax,
3. each core satisfies its at-speed test requirement (i.e., if atspeed(ci)=yes, ci must be tested at fmax(ci); otherwise, ci can be tested at frequencies lower than fmax(ci)), and
4. the overall SoC test time is minimized.
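The power/test-time trade-off that virtual TAM exploits can be illustrated numerically. The following Python sketch rests on three simplifying assumptions that are not taken from this paper's equations: dynamic scan power scales linearly with the test frequency fmax(ci)/mci, w ATE pins clocked at fATE supply floor(w·fATE/f) virtual TAM wires at core frequency f (bandwidth matching), and the scan data of a core is split evenly over its wrapper chains. All names and numbers are hypothetical.

```python
import math

# Simplified models for illustration only (assumptions, not the paper's equations).

def virtual_width(ate_pins: int, f_ate_mhz: int, f_core_mhz: int) -> int:
    """Virtual TAM wires available to a core (bandwidth matching assumption):
    ate_pins wires at f_ate carry floor(ate_pins * f_ate / f_core) wires of data
    at the core frequency."""
    return (ate_pins * f_ate_mhz) // f_core_mhz

def core_test_time_us(scan_bits: int, patterns: int, width: int, f_core_mhz: int) -> float:
    """Test time in microseconds, assuming scan_bits are split evenly over `width`
    wrapper chains: (1 + L) * patterns + L cycles, with L = ceil(scan_bits / width)."""
    if width < 1:
        raise ValueError("ATE bandwidth too low for this core frequency")
    chain_len = math.ceil(scan_bits / width)
    cycles = (1 + chain_len) * patterns + chain_len
    return cycles / f_core_mhz  # cycles / MHz = microseconds

def scaled_power(power_at_fmax: float, m: int) -> float:
    """Scan power at f = fmax / m, assuming dynamic power is proportional to f."""
    return power_at_fmax / m

if __name__ == "__main__":
    fmax, f_ate, pins = 100, 100, 4  # hypothetical core and ATE (MHz, pins)
    for m in (1, 2):
        f = fmax // m  # integer MHz keeps the sketch simple
        w = virtual_width(pins, f_ate, f)
        t = core_test_time_us(1000, 100, w, f)
        print(f"m={m}: f={f} MHz, width={w}, time={t} us, power={scaled_power(400.0, m)} mW")
```

For a hypothetical core with 1000 scan cells, 100 patterns, fmax = fATE = 100 MHz and 4 ATE pins, halving the frequency (m = 2) halves the power from 400 mW to 200 mW and doubles the virtual TAM width from 4 to 8, so the test time grows only from 253.5 µs to 254.5 µs.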
Scheduling Algorithm
This section presents a heuristic algorithm for Pmcds. The outline of the proposed algorithm is shown in Fig. 3.
Step 1: Testability Analysis
First, the algorithm checks whether a solution exists for the given problem instance (Line 1). For a core ci such that atspeed(ci)=yes, we cannot change the test frequency fmax(ci) or the power consumption power(ci) during test. Therefore, there is no solution under the given Pmax if power(ci) exceeds Pmax. Moreover, there is no solution if the ATE cannot provide enough bandwidth to test ci at fmax(ci). We summarize these conditions as follows.
For each ci ∈ C such that atspeed(ci)=yes, if ci violates either of the following two conditions, there is no solution and the algorithm exits. Otherwise, it moves to Step 2.
[Fig. 3: Outline of the proposed algorithm.]
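Step 1 can be sketched as a simple feasibility check. Since the two conditions themselves are not reproduced in this section, the sketch assumes that the power condition is power(ci) ≤ Pmax and that the bandwidth condition requires at least one virtual TAM wire at fmax(ci), i.e., Wmax·fATE ≥ fmax(ci). The Core type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Core:
    name: str
    fmax_mhz: int   # maximum test frequency fmax(ci)
    power: float    # power(ci) when tested at fmax(ci)
    atspeed: bool   # True if atspeed(ci) = yes

def has_solution(cores: Iterable[Core], p_max: float, w_max: int, f_ate_mhz: int) -> bool:
    """Step 1 sketch: every at-speed core must fit the power budget at fmax(ci)
    and receive at least one virtual TAM wire at fmax(ci) (assumed conditions)."""
    for c in cores:
        if not c.atspeed:
            # Assumption: lowering f(ci) can always bring such a core under Pmax.
            continue
        if c.power > p_max:
            return False  # power condition violated
        if w_max * f_ate_mhz < c.fmax_mhz:
            return False  # bandwidth condition violated (assumed form)
    return True
```

For example, an at-speed core with fmax = 200 MHz and power 500 is feasible with Pmax = 1000, Wmax = 4 and fATE = 50 MHz (4 × 50 = 200), but infeasible with Wmax = 3 or with Pmax = 400.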
Step 2: Lower Bound Calculation
The authors of [8] proposed architecture-independent lower bounds on core and SoC test time. In this step (Lines 2-3), similar lower bounds are calculated for use in the later steps. First, we calculate a lower bound TLB(ci) on the test time of each core ci as follows.
where rmaxi is the wrapper configuration of ci such that pin(rmaxi) is maximum and rmaxi satisfies the following condition. The test frequency f(ci) of ci is determined as follows.
In this paper, we assume that the frequency division factor mci is an integer. However, to simplify the hardware implementation, mci can be further restricted to powers of two or to a user-specified frequency set.
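The candidate test frequencies of a core can be enumerated from the allowed division factors. The sketch assumes f(ci) = fmax(ci)/mci, which is consistent with the role of mci described above but is not the paper's equation reproduced verbatim; the function names are hypothetical.

```python
from typing import Iterable, List

def integer_factors(max_m: int) -> List[int]:
    """Unrestricted integer division factors m = 1, 2, ..., max_m."""
    return list(range(1, max_m + 1))

def power_of_two_factors(max_m: int) -> List[int]:
    """Division factors restricted to powers of two (1, 2, 4, ...) up to max_m."""
    factors, m = [], 1
    while m <= max_m:
        factors.append(m)
        m *= 2
    return factors

def user_frequencies(fmax_mhz: float, allowed_mhz: Iterable[float]) -> List[float]:
    """User-specified frequency set: keep frequencies not above fmax(ci), fastest first."""
    return sorted((f for f in allowed_mhz if f <= fmax_mhz), reverse=True)

def candidate_frequencies(fmax_mhz: float, factors: Iterable[int]) -> List[float]:
    """Assumed relation f(ci) = fmax(ci) / mci for each candidate factor."""
    return [fmax_mhz / m for m in factors]
```

For fmax(ci) = 200 MHz and factors up to 8, the power-of-two restriction leaves 200, 100, 50 and 25 MHz, whereas unrestricted integer factors also allow intermediate frequencies such as 200/3 MHz. The coarser set trades scheduling flexibility for a simpler clock divider.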
Step 4: Test Schedule at Time 0
This step determines the test schedule at time 0 (Lines 7-16). First, we initialize the available power P0, the available test pins W0 at time 0, and the set S of unscheduled cores (Line 7). Then, we sort the cores in descending order of TLB (Line 9). After that, we schedule each core ci in this order at time 0 with wrapper rrest and test frequency f(ci) (Line 11), and update the corresponding variables (Line 12). This process is repeated until all cores are scheduled or ci cannot satisfy the conditions at Line 10. The condition TLB>TLB/|C| prevents cores with a small amount of test data from being scheduled at time 0. Instead of scheduling such small cores at time 0, the next step tries to reduce the test time of the cores scheduled in this step by distributing the remaining power and test pins. Figure 4 shows the test schedule obtained after Step 4. In Fig. 4(a), the horizontal axis denotes the test time and the vertical axis denotes the power consumption at each point in time. In Fig. 4(b), the horizontal axis denotes the test time and the vertical axis denotes the number of test pins used at each point in time.
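The greedy loop of Step 4 can be sketched in Python as follows. The three checks mirror the conditions described for Line 10 (remaining power, remaining pins, and the TLB threshold). Because the threshold expression is garbled in the text, it is passed in as a parameter here; the CoreCfg type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class CoreCfg:
    name: str
    t_lb: float   # lower bound TLB(ci)
    power: float  # power at the chosen test frequency f(ci)
    pins: int     # ATE pins used by the chosen wrapper configuration

def schedule_at_time0(cores: List[CoreCfg], p_max: float, w_max: int,
                      threshold: float) -> Tuple[List[str], float, int]:
    """Step 4 sketch: visit cores in descending TLB order and start them at time 0
    until one core fails a condition; return started cores and remaining P0, W0."""
    p0, w0 = p_max, w_max
    started: List[str] = []
    for c in sorted(cores, key=lambda core: core.t_lb, reverse=True):
        if c.power > p0 or c.pins > w0 or c.t_lb <= threshold:
            break  # stop at the first core that violates a condition
        started.append(c.name)
        p0 -= c.power
        w0 -= c.pins
    return started, p0, w0
```

With cores A (TLB 500, power 400, 3 pins), B (300, 300, 2), C (200, 200, 2) and D (50, 100, 1), Pmax = 1000, Wmax = 8 and a threshold of 262.5 (the mean TLB, an assumed choice), A and B start at time 0 and the loop stops at C, leaving P0 = 300 and W0 = 3 for the next step.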
Step 5: Remaining Power/Pin Distribution at Time 0
P0 (the available power at time 0) may remain greater than 0 after Step 4, as shown in Fig. 4(a). This is because Step 4 terminates as soon as one of the three conditions at Line 10 cannot be satisfied. In this step (Lines 17-19), we find the core ci with the longest test time among the currently scheduled cores such that
If such a core ci exists, we update mci to mci-1, thereby increasing f(ci) and reducing the test time of ci according to Eq. (11), while satisfying the power constraint of Eq. (13). Equation (14) prevents one core from dominating the power consumption, which helps increase test concurrency when the remaining cores are scheduled in the next step (Step 6). This process is repeated while both of the following conditions hold:
1. P0 > 0, and
2. there exists a core that satisfies all three of Eqs. (12), (13) and (14).
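The power redistribution loop of Step 5 can be sketched as follows. Eqs. (11)-(14) are not reproduced in this section, so the sketch substitutes simple assumptions: the test cycle count of a core is fixed by its wrapper, its power is the power at fmax(ci) divided by mci, and Eq. (14) is replaced by a hypothetical per-core power cap. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Scheduled:
    name: str
    cycles: int      # test cycles of the chosen wrapper (assumed fixed)
    fmax_mhz: float  # fmax(ci)
    p_fmax: float    # power at fmax(ci); assumed to scale as 1 / mci
    m: int           # current division factor mci

    def time_us(self) -> float:
        return self.cycles / (self.fmax_mhz / self.m)

    def power(self, m: int) -> float:
        return self.p_fmax / m

def distribute_power(cores: List[Scheduled], p0: float, per_core_cap: float) -> float:
    """Step 5 sketch: while P0 > 0, speed up (m -> m - 1) the longest-running core
    whose extra power fits in P0 and whose new power stays under a hypothetical
    per-core cap (stand-ins for Eqs. (12)-(14)). Returns the remaining P0."""
    while p0 > 0:
        eligible = [c for c in cores
                    if c.m > 1
                    and c.power(c.m - 1) - c.power(c.m) <= p0
                    and c.power(c.m - 1) <= per_core_cap]
        if not eligible:
            break
        c = max(eligible, key=lambda core: core.time_us())
        p0 -= c.power(c.m - 1) - c.power(c.m)
        c.m -= 1
    return p0
```

For example, with P0 = 150, a core B (power 300 at fmax, mci = 3) can be sped up to mci = 2 at a cost of 50 power units, whereas speeding up a longer-running core A (power 400 at fmax, mci = 2) to mci = 1 would need 200; the loop therefore changes only B and returns P0 = 100.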
Figure 5(a) shows the result of applying this process to the schedule obtained after Step 4, shown in Fig. 4. In this example, the frequencies of cores 2, 3, 4 and 6 are increased, and consequently their test times are reduced.
[Fig. 5: (a) Power vs. test time after recalculating test frequencies. (b) Pins vs. test time after redesigning wrappers.]
Similarly, W0 (the number of available test pins at time 0) may remain greater than 0 after Step 4. In this case, we find the core ci with the longest test time, assign one more test pin to ci, and thereby reduce its test time. This process is repeated while W0 > 0 (Lines 20-22). Figure 5(b) shows the result of applying this process to the schedule of Fig. 5(a).
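The pin distribution loop can be sketched in the same way. The sketch assumes a simplified test-time model that is not the paper's: bandwidth matching gives floor(pins·fATE/f(ci)) virtual TAM wires, scan data is split evenly over them, and a test takes (1 + L)·p + L cycles for chain length L and p patterns. Like the text, it gives one pin at a time to the core with the longest test time; the PinCore type is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class PinCore:
    name: str
    scan_bits: int
    patterns: int
    f_mhz: int  # current test frequency f(ci)
    pins: int   # ATE pins currently assigned

    def time_us(self, f_ate_mhz: int) -> float:
        """Assumed model: virtual width from bandwidth matching, balanced chains."""
        width = (self.pins * f_ate_mhz) // self.f_mhz
        if width < 1:
            raise ValueError(f"{self.name}: not enough ATE bandwidth")
        chain = math.ceil(self.scan_bits / width)
        return ((1 + chain) * self.patterns + chain) / self.f_mhz

def distribute_pins(cores: List[PinCore], w0: int, f_ate_mhz: int) -> None:
    """While spare pins remain (W0 > 0), give one pin to the core with the
    longest test time; cores are updated in place."""
    while w0 > 0:
        longest = max(cores, key=lambda c: c.time_us(f_ate_mhz))
        longest.pins += 1
        w0 -= 1
```

For instance, with fATE = 100 MHz, a core A (1000 scan cells, 100 patterns, f = 50 MHz, 2 pins) takes 507 µs and a core B (600 scan cells, 100 patterns, f = 100 MHz, 2 pins) takes 304 µs; both spare pins go to A, reducing its test time to 254.5 µs.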
Step
The difference from the power-independent lower bound TLB is at most 5.4% when Pmax = 3000. Therefore, we can say that the proposed heuristic algorithm is effective and efficient. Table 4 shows the test time results for another multi-clock domain SoC, p93791, from the ITC'02 SoC benchmarks [27] with fATE = 50 MHz. As the original benchmark SoC does not include any power consumption data, we used the following settings for each core ci: (1)
