Abstract-Dynamic voltage scaling (DVS) has been widely adopted in multicore SoCs for reducing dynamic power consumption. Despite its benefits, the use of DVS increases test time because high product quality can only be ensured by testing every core at multiple supported voltage settings; hence the repetitive application of the same or different tests at multiple voltage settings becomes necessary. In addition, testing at lower supply voltage settings increases considerably the length of each test because lower scan frequencies must be used for shifting test data using scan chains. Standard scheduling techniques fail to reduce the test time for DVS-based SoCs since they do not model testing at multiple voltage settings. In addition, they do not consider the practical aspects of tester overhead and the dependencies between core voltage settings due to the use of voltage islands. To alleviate the detrimental impact of DVS on test application time, we propose a time-division multiplexing (TDM) method and an integer linear programming-based test scheduling technique, which exploit high automatic test equipment (ATE) frequencies even when low shift frequencies must be used at low voltage settings. Experimental results on two industrial SoCs highlight the effectiveness of TDM and the associated scheduling method.
I. INTRODUCTION
Dynamic voltage scaling (DVS) offers a good trade-off between power consumption and system performance as it adaptively adjusts the supply voltage depending on the workload of the SoC [15]. In addition, SoCs consist of multiple voltage islands with separate supply rails and unique power characteristics [17]. Voltage islands permit the adaptation of the DVS technique to the specific requirements of the cores of each island and maximize the power gains [15]. Several state-of-the-art processors adopt this technique [2], [8], [19].
Defect-free operation of an SoC that supports DVS can only be assured by testing the SoC at multiple voltage levels, as various defects manifest themselves in different ways at various voltage levels [1], [7], [12]. Testing at different voltage settings increases test time because either test patterns have to be applied multiple times or additional patterns must be used for the other voltage settings. As a result, test cost is increased considerably due to the use of DVS for power management.
Even though many techniques reduce the test cost for single-V dd SoCs [6], [9], [13], [14], [16], [21], they do not model testing at multiple voltage settings. The method in [10] considers the additional constraints imposed by the use of multiple voltage settings, but it does not tackle the problem that low shift frequencies must be used at the lower voltage settings. Thus its effectiveness for test scheduling in DVS-based SoCs is limited.
The work presented in this paper alleviates the problem of low shift frequencies in DVS-based SoCs; it reduces the test time by means of an efficient time-division-multiplexing (TDM) architecture and an effective integer linear programming (ILP)-based test scheduling method. It even allows us to reduce the shift frequency for testing various cores at multiple voltage settings below the nominal value, whenever this choice of shift frequencies and TDM minimizes the total SoC test time. Experimental results for two representative industrial SoCs are presented, which highlight the clear benefits of applying the proposed technique to multi-V dd designs. Although the proposed ILP method is efficient for contemporary industrial SoCs, it might not be computationally feasible for future generations of SoCs. For these cases, we present an LP-relaxation approach, which provides sub-optimal but nevertheless good solutions to the test-scheduling problem.
II. MOTIVATION
Testing of multi-V dd SoCs requires long test application times due to the high volume of tests that must be repeatedly applied at the various voltage levels. Test time is dominated by the process of serially loading test data into the cores through scan chains. As scan chains are usually not designed to operate at the rated speed of the cores, an ATE transfers and loads test data to cores using a slow scan shift frequency. Consequently the time for testing each core is very long. Even in the case that the tester supports higher scan frequencies, this capability cannot be exploited, thus leaving tester potential underutilized.
The above problem is exacerbated when cores are tested at the low power-supply voltages where the maximum allowable scan frequency is low. In addition, testers usually conduct SoC testing using a single scan frequency over the duration of the test period. Thus, to avoid scan violations at any voltage setting, the lowest frequency for shifting test data has to be used, which corresponds to the lowest voltage level. As a result, the test time of the SoC increases even more.
Despite these limitations, the partitioning of every SoC into multiple voltage islands permits the concurrent testing of different cores at different voltage settings when they belong to different islands. In addition, cores which are tested at lower shift frequencies and which share the same TAM resources may be tested concurrently by also sharing the high frequencies of the ATE channels. For example, two cores, which have to be loaded using low scan frequencies, can be loaded in parallel using the same TAM resource by multiplexing the test data in time on an ATE channel. By using time-division multiplexing (TDM), the test data can be transmitted by the tester at a higher frequency, while at the same time, they can be shifted into the scan chains of multiple cores at much lower frequencies which depend on the voltage setting used in each case. Note that this is a generalization of the TDM approach, as it exploits not only the gap between the shift frequency of the ATE and that of the cores but also the gap between the shift frequencies of cores of different islands which are concurrently tested at different voltage settings.
Example 1. Let the maximum scan frequencies of the cores at the voltage settings V 1 > V 2 > V 3 be 200, 100 and 50 MHz respectively, and the maximum scan frequency supported by the tester be 200 MHz. Then, a TAM resource can be used to concurrently test: (1) a single core at V 1 or (2) two cores at V 2 or (3) four cores at V 3 or (4) one core at V 2 and two cores at V 3 or (5) one core at V 1 at 100 MHz, one core at V 2 at 50 MHz (note that a slower shift frequency than the nominal one is used for these two cores) and one core at V 3 at 50 MHz, etc.
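The combinations listed in Example 1 can be checked mechanically. The following is a minimal sketch, not part of the proposed method: it assumes that a set of tests fits on one TAM resource when every shift frequency respects the limit of its voltage setting and the shift frequencies together do not exceed the tester frequency. The function name `fits_on_tam` is hypothetical.

```python
from fractions import Fraction

ATE_MHZ = 200  # maximum tester frequency in Example 1
MAX_SCAN_MHZ = {"V1": 200, "V2": 100, "V3": 50}  # scan limits per voltage setting


def fits_on_tam(tests, ate_mhz=ATE_MHZ):
    """Check whether tests can share one TAM resource concurrently.

    tests: list of (voltage, shift_mhz) pairs.
    Assumption: the share of the channel used by a test is shift_mhz / ate_mhz,
    and the shares of all concurrent tests must sum to at most 1.
    """
    for voltage, shift_mhz in tests:
        if shift_mhz > MAX_SCAN_MHZ[voltage]:
            return False  # would violate scan-chain timing at this voltage
    used = sum(Fraction(shift_mhz, ate_mhz) for _, shift_mhz in tests)
    return used <= 1


# Cases (1) to (5) of Example 1.
cases = [
    [("V1", 200)],
    [("V2", 100), ("V2", 100)],
    [("V3", 50)] * 4,
    [("V2", 100), ("V3", 50), ("V3", 50)],
    [("V1", 100), ("V2", 50), ("V3", 50)],
]
results = [fits_on_tam(case) for case in cases]
```

Under these assumptions all five cases fit, whereas two cores shifted at 200 MHz, or a core at V 3 shifted at 100 MHz, are rejected.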
Depending on the maximum tester frequency and the maximum scan frequency at each voltage setting, there exist many different scheduling scenarios that exploit the full capability of the tester and the TAM mechanism for increasing the parallelism in loading test data into the cores. In addition, counter-intuitively, shifting test data into the scan chains at a frequency lower than the nominal one may be beneficial in terms of ATE-channel-frequency utilization, and this strategy may reduce the overall SoC test time. Since the solution space is very large, it is computationally challenging to find a good solution, especially when lower than nominal scan frequencies are also explored for reducing the overall test time. Test scheduling using multiple scan frequencies is a generalization of the simple scheduling problem that assumes a single scan frequency for all cores; the latter problem is NP-complete, therefore the scheduling problem being considered is at least NP-hard. To this end, an ILP-based test scheduling approach is proposed in this paper to provide optimal or near-optimal solutions.
III. TDM SCHEME
Consider three cores A, B, C that load test data from a common bus, as in Fig. 1. Cores A, B belong to island L 1 and core C belongs to L 2 . Both islands support voltage settings V 1 , V 2 , V 3 and the nominal scan frequencies at each voltage setting are F, F/2 and F/4, respectively. Let the tester provide the ATE CLK signal with frequency F, used as the generator clock signal for loading the scan chains. The tester also provides test data on the bus at frequency F. Each core is assigned one cyclical shift register with length equal to 4, which divides the scan frequency by a value equal to 1, 2 or 4. The scan frequency for each core is determined by loading the appropriate pattern into each register before the testing of the core begins. Every shift register is clocked with the fast ATE CLK (frequency F) and in turn provides a clock signal with frequency equal to or smaller than F. The following example illustrates this method.
Example 2. Let us assume that at a specific time instance cores A, B are tested at voltage V 3 and C is tested at voltage V 2 . Then, the highest frequencies that can be used for A, B, C are F/4, F/4, F/2, respectively. In order to provide a scan frequency of F/4 to core A, register R A in Fig. 1 is loaded with the pattern "0001". Then, during every 4 successive cycles of ATE CLK, the rightmost cell of R A receives the value '1' only once and permits the application of one out of the four active edges of signal ATE CLK to core A. In this case the scan clock frequency for this core is equal to F(ATE CLK)/4 = F/4. Register R B is loaded with the pattern "0100", which sets the scan frequency of core B equal to F/4 too. However, note that a different pattern from that of core A is used in order to offer non-overlapping loading of the test data from the common bus. Register R C is initialized with pattern "1010" and thus core C sees one active clock edge every two ATE CLK cycles. Therefore, the scan clock frequency for core C is set to F/2. Similar to the previous case, the pattern loaded into R C has non-overlapping '1' logic values with the patterns loaded into registers R A , R B .
From the above example we note that: a) all shift registers are concurrently shifted at every ATE CLK cycle, and b) they are loaded with patterns that have no overlapping logic values of '1'. Therefore, at most one core at any ATE CLK cycle receives the active edge of ATE CLK. At the same time, at every ATE CLK cycle, one test data vector is available at the bus and the core which receives the active clock edge at that time also loads the respective test vector from the bus.
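The non-overlap property can be illustrated with a short simulation. The sketch below is illustrative only: it assumes that each register rotates one position to the right per ATE CLK cycle and that its rightmost cell gates the clock of the core. The function names are hypothetical and do not describe the actual hardware implementation.

```python
def enable_sequence(pattern, cycles):
    """Value of the rightmost register cell at each ATE CLK cycle.

    Assumption: the register is rotated right by one position per cycle,
    so at cycle t the rightmost cell holds pattern[(n - 1 - t) mod n].
    """
    n = len(pattern)
    return [int(pattern[(n - 1 - t) % n]) for t in range(cycles)]


def schedule(registers, cycles):
    """Return, for each ATE CLK cycle, the core that receives the clock edge.

    registers maps a core name to its register pattern. An entry of None
    means that no core loads data in that cycle. Overlapping '1' values
    raise ValueError because two cores would read the bus at once.
    """
    sequences = {core: enable_sequence(p, cycles) for core, p in registers.items()}
    owners = []
    for t in range(cycles):
        active = [core for core, seq in sequences.items() if seq[t]]
        if len(active) > 1:
            raise ValueError(f"cores {active} overlap at cycle {t}")
        owners.append(active[0] if active else None)
    return owners


# Register patterns of Example 2.
owners = schedule({"A": "0001", "B": "0100", "C": "1010"}, 8)
```

With the patterns of Example 2 and the assumed shift direction, the bus is used in the repeating order A, C, B, C: cores A and B each receive one edge in four cycles (F/4) and core C receives one edge in two cycles (F/2).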
The timing diagram for Example 2 is shown in Fig. 2. The tester channel utilization is increased without violating the timing specifications of the cores. Whenever a core is not tested, the corresponding shift register is loaded with the all-'0' pattern and the core receives no active edges of the scan clock. Larger shift registers can also be used to allow various levels of TDM (e.g., 8-bit registers offer division by 2, 4 and 8). The selection of the register size depends on the specifications of each SoC and the amount of TDM required. We note that the proposed scheme uses clock gating only for loading the scan chains and the wrapper with test data (this mechanism can be bypassed during normal operation as well as during the capture cycles where on-chip PLLs are used).
The various frequencies offered by the TDM technique cannot perfectly match the highest scan frequencies that can be used at the various voltage settings. However, this is not a limitation of the proposed technique, as the optimization target is to exploit as much of the available ATE frequency as possible regardless of the shift frequency used for each core. This goal is achieved by the proposed test scheduling method, even when scan frequencies lower than the nominal ones are used at each voltage setting for shifting test data into the cores. For example, suppose that a tester supports a maximum shift frequency of 300 MHz, and that cores C 1 , C 2 share the same TAM resource and are being tested using shift frequencies 150 and 200 MHz, respectively. Let the test time in each case be 10 and 5 ms, respectively. Since 150 + 200 MHz exceeds the tester frequency, these two tests must be applied serially and the total test time is equal to 15 ms. If the shift frequency for core C 2 is reduced to 150 MHz then the test time for this core increases to about 6.7 ms (5 ms × 200/150). However, both tests can now be applied concurrently, so the total test time drops to 10 ms.
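The arithmetic of this example can be reproduced as follows. This is a small sketch under two assumptions taken from the text: the test time scales inversely with the shift frequency, and two tests can overlap on a shared TAM resource only when their shift frequencies together do not exceed the tester frequency. The helper names are hypothetical.

```python
from fractions import Fraction


def rescaled_time(time_ms, old_mhz, new_mhz):
    """Test time after changing the shift frequency (time scales as 1/frequency)."""
    return Fraction(time_ms) * old_mhz / new_mhz


def pair_time(test_a, test_b, ate_mhz):
    """Total time for two tests on one TAM resource.

    Each test is a (shift_mhz, time_ms) pair.
    """
    (freq_a, time_a), (freq_b, time_b) = test_a, test_b
    if freq_a + freq_b <= ate_mhz:
        return max(time_a, time_b)  # both fit within the tester frequency
    return time_a + time_b          # otherwise they run one after the other


c1 = (150, Fraction(10))
c2_nominal = (200, Fraction(5))
c2_slowed = (150, rescaled_time(5, 200, 150))  # 20/3 ms, about 6.7 ms

serial_total = pair_time(c1, c2_nominal, ate_mhz=300)
concurrent_total = pair_time(c1, c2_slowed, ate_mhz=300)
```

Here `serial_total` evaluates to 15 ms and `concurrent_total` to 10 ms, matching the figures in the text.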
Similar to [20] , the proposed scheme can be combined with test data compression and BIST to derive more benefits. Identical cores can be tested using a broadcast mode if identical patterns are loaded at the respective registers to concurrently shift test data from the bus. Power constraints can be considered to set an upper bound on the amount of parallelism. At the same time, cores that are not tested can be completely shut down. For those ATE CLK cycles in which no core receives test data, the ATE repeat command can be used to avoid storing unnecessary test data in tester memory. Finally, testing of the logic in between the cores can be done by considering tests that involve multiple cores at a time.
IV. TEST SCHEDULING METHOD
We consider a multi-core SoC with N c cores C 1 , . . . , C Nc and N I voltage islands L 1 , . . . , L NI (N c ≥ N I ). Each island includes a subset of the N c cores. We consider a set of N v voltage levels V 1 , . . . , V Nv sorted in descending order (i.e., V 1 > · · · > V Nv ) and we associate each island with a subset of these voltage levels. We assume that the DFT infrastructure for the chip has been implemented, whereby TAMs do not have to be optimized. The DFT infrastructure imposes restrictions on how much parallelism can be achieved during the testing of the cores.
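Since capture cycles constitute a negligible portion of the test period, we state that the time T CiVjFk for testing core C i at voltage V j using scan frequency F k is inversely proportional to F k :
$$T_{C_i V_j F_k} = MT_{C_i,V_j} \cdot \frac{F^{max}_{C_i,V_j}}{F_k} \qquad (1)$$
where MT Ci,Vj is the minimum test time of C i at V j , obtained at the maximum scan frequency F max Ci,Vj that can be used at V j .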
For every core C i , voltage V j and frequency F k a binary variable S CiVj F k is assigned, which is equal to '1' whenever the test of C i at V j is applied using F k . When F k > F max Ci,Vj , F k cannot be used for testing core C i at V j and thus we set S CiVj F k = 0 (higher frequencies are not supported at the lower voltage settings due to timing violations of the scan chains).
The first constraint that must be satisfied is that any test for core C i at any voltage setting V j is applied using a single scan frequency. This is modeled by the following relation:
$$\sum_{k} S_{C_i V_j F_k} = 1 \qquad (2)$$
For those voltage settings that a core is not tested the above constraint is omitted. Let ST CiVj F k denote the start time of task t CiVj F k . For testing core C i at voltage setting V j , only a single frequency can be used as indicated by (2) . Therefore, the start and end time for this test is given by the following relations
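One form of relations (3) and (4) that is consistent with the definitions above is sketched here. The symbols ST CiVj and ET CiVj (start and end time of the test of C i at V j ) and T CiVjFk (duration of task t CiVjFk ) are assumed names and may differ from the original notation.

```latex
\begin{align}
ST_{C_i V_j} &= \sum_{k} S_{C_i V_j F_k} \cdot ST_{C_i V_j F_k} \tag{3} \\
ET_{C_i V_j} &= \sum_{k} S_{C_i V_j F_k} \cdot \left( ST_{C_i V_j F_k} + T_{C_i V_j F_k} \right) \tag{4}
\end{align}
```

Because exactly one of the binary variables of a test is equal to '1', each sum selects the start (or end) time of the task at the chosen frequency.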
The product S CiVj F k · ST CiVj F k is not linear so it is replaced by the variable y CiVj F k and new constraints are introduced which can be found in Part 1 of the Appendix in [11] . Thus (3), (4) become
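Assuming that (3) and (4) sum the products of the selection variables with the task start times over all frequencies, substituting y CiVjFk gives the following possible form of (5) and (6). The symbols ST CiVj , ET CiVj and T CiVjFk are assumed names, and the big-M constraints shown for y are a standard linearization of a binary-times-continuous product that may differ from the one used in [11].

```latex
\begin{align}
ST_{C_i V_j} &= \sum_{k} y_{C_i V_j F_k} \tag{5} \\
ET_{C_i V_j} &= \sum_{k} \left( y_{C_i V_j F_k} + S_{C_i V_j F_k} \cdot T_{C_i V_j F_k} \right) \tag{6}
\end{align}
% Standard big-M linearization of y = S * ST, with M an upper bound on ST:
\begin{align*}
0 \le y_{C_i V_j F_k} &\le M \cdot S_{C_i V_j F_k}, \\
ST_{C_i V_j F_k} - M \left( 1 - S_{C_i V_j F_k} \right) \le y_{C_i V_j F_k} &\le ST_{C_i V_j F_k}
\end{align*}
```

Since T CiVjFk is a constant, the term S · T is already linear and only the product S · ST needs an auxiliary variable.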
The second constraint is that any two cores C i1 , C i2 in the same island cannot be concurrently tested at different voltage settings V j1 , V j2 (V j1 ≠ V j2 ). Therefore, either the test for C i1 , V j1 begins after the test for C i2 , V j2 finishes or vice versa. Using relations (5), (6), this is written as constraint (7): C 1 or C 2 , where C 1 states that the test for C i1 , V j1 starts no earlier than the end of the test for C i2 , V j2 , and C 2 states the converse.
Constraint (7) is linearized as shown in Part 2 of the Appendix in [11] .
In a similar way, we determine the concurrency between different tests: if the test for C i1 , V j1 begins after the test for C i2 , V j2 finishes or vice versa, then the two tests are not concurrent, else they are concurrent. Concurrency is determined for all cores sharing a common bus, excluding those that are in the same island and correspond to V j1 ≠ V j2 , as they are excluded by the second constraint. Let Conc Ci1Vj1Ci2Vj2 be a binary variable which is equal to '1' if and only if the tests C i1 , V j1 and C i2 , V j2 overlap, i.e., neither test begins after the other finishes; this relation is referred to as (8).
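In symbols, the island constraint (7) and the concurrency relation (8) can be sketched as follows. The names ST CiVj and ET CiVj for the start and end time of a test are assumptions, as is the exact form of the relations.

```latex
\begin{align}
& C_1 \lor C_2, \quad
  C_1:\; ST_{C_{i1} V_{j1}} \ge ET_{C_{i2} V_{j2}}, \quad
  C_2:\; ST_{C_{i2} V_{j2}} \ge ET_{C_{i1} V_{j1}} \tag{7} \\
& \mathit{Conc}_{C_{i1} V_{j1} C_{i2} V_{j2}} = 1
  \iff \neg \left( C_1 \lor C_2 \right)
  \iff ST_{C_{i1} V_{j1}} < ET_{C_{i2} V_{j2}} \;\land\; ST_{C_{i2} V_{j2}} < ET_{C_{i1} V_{j1}} \tag{8}
\end{align}
```

In (7) the disjunction is enforced, whereas in (8) its truth value is only recorded in the binary variable so that the capacity of the shared bus can be checked for overlapping tests.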
The linearization of (8) is shown in Part 3 of the Appendix in [11] .
The final constraint bounds the number and type of tests that can concurrently use the same TAM resource. Each scan frequency is assigned a weight proportional to the capacity (in time) of the resource consumed by the tests applied at this frequency. For example, consider a scan frequency F supported by the tester. Then, any core that is tested using scan frequency F/2, F/4, · · · consumes half, a quarter, etc., of the capacity of the TAM resource. Any concurrent combination of tests must not exceed the capacity of the TAM resource. Let W (F k ) be the capacity consumed by any test using scan frequency F k . Let also W (T AM ) be the capacity available on the TAM resource. Then, for any two cores C i1 , C i2 connected to the same TAM resource and tested concurrently at supply voltages V j1 , V j2 the sum of their weights must not exceed W (T AM ). This is modeled by the following constraint:
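A form of constraint (9) that is consistent with the description above is sketched below. The normalization W(F k ) = F k /F ATE with W(TAM) = 1, where F ATE denotes the tester frequency, is an assumption chosen to match the example of a test at F/2 consuming half of the capacity.

```latex
\begin{equation}
\mathit{Conc}_{C_{i1} V_{j1} C_{i2} V_{j2}} \cdot
\left( \sum_{k} S_{C_{i1} V_{j1} F_k} \, W(F_k)
     + \sum_{k} S_{C_{i2} V_{j2} F_k} \, W(F_k) \right)
\le W(TAM),
\qquad W(F_k) = \frac{F_k}{F_{ATE}} \tag{9}
\end{equation}
```

When the two tests do not overlap, the concurrency variable is '0' and the constraint is trivially satisfied; when they overlap, the weights of the two selected frequencies must fit within the capacity of the TAM resource.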
Again, (9) is linearized as shown in Part 4 of the Appendix in [11]. In the same way we construct the constraints for triplets or larger sets of tests depending on the maximum number of cores which are connected to any TAM resource (see Part 5 of the Appendix in [11]). Finally, if TestLength is the total test time of the SoC, i.e., the time at which the last test finishes, the optimization objective of the ILP model is the following:
$$\text{minimize } TestLength \qquad (10)$$
Even though ILP models can optimally solve test scheduling optimization problems, solving them is NP-hard [5] and they do not scale well for large SoC designs. Approximate solutions can be obtained in polynomial time by using the method of LP-relaxation [3]. In LP-relaxation, the binary variables are relaxed to real-valued variables such that the solution to the relaxed LP problem provides a lower bound for the cost function (test time in this case). However, fractional values of these variables are inadmissible in practice. A common technique to map these fractional values to '0', '1' values is the weighted probabilistic technique of randomized rounding [18]. After the corresponding LP problem is solved, all binary variables that are assigned fractional values are identified. One of them is randomly selected and it is set to '1' with a probability equal to its fractional value (and to '0' otherwise). At this point, we check this assignment for consistency with the constraints of the ILP model, and if it violates any constraints we reverse the assignment. Then, the LP problem is solved again, and the randomized rounding step is repeated until all variables are set to either 0 or 1.
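The rounding loop described above can be sketched as follows. This is an illustrative outline only: `solve_lp` and `is_consistent` are hypothetical callables that stand in for the LP solver and the constraint check of the ILP model, and the toy problem at the end (one test that must select exactly one of three frequencies) is an assumption made purely for demonstration.

```python
import random


def randomized_rounding(solve_lp, is_consistent, rng=None, eps=1e-9):
    """Round a relaxed LP solution to 0/1 values, one variable at a time.

    solve_lp(fixed) must return {name: value in [0, 1]} for the relaxed
    problem in which the variables in `fixed` are pinned to 0 or 1.
    is_consistent(fixed) must return True when the pinned values do not
    violate any constraint of the ILP model.
    """
    rng = rng or random.Random()
    fixed = {}
    while True:
        solution = solve_lp(fixed)
        fractional = [name for name, x in solution.items()
                      if name not in fixed and eps < x < 1 - eps]
        if not fractional:
            return {name: int(round(x)) for name, x in solution.items()}
        name = rng.choice(fractional)
        # Set the variable to 1 with probability equal to its fractional value.
        value = 1 if rng.random() < solution[name] else 0
        if not is_consistent({**fixed, name: value}):
            value = 1 - value  # reverse the assignment if it violates a constraint
        fixed[name] = value


# Toy stand-ins (hypothetical): three selection variables that must sum to 1.
NAMES = ["s_f1", "s_f2", "s_f3"]


def toy_lp(fixed):
    free = [n for n in NAMES if n not in fixed]
    remaining = 1 - sum(fixed.values())
    solution = dict(fixed)
    for n in free:
        solution[n] = remaining / len(free)  # spread the remainder evenly
    return solution


def toy_consistent(fixed):
    return sum(fixed.values()) <= 1


rounded = randomized_rounding(toy_lp, toy_consistent, random.Random(0))
```

In the toy problem every run ends with exactly one selection variable set to 1, regardless of the random seed. A real implementation would call an LP solver inside `solve_lp` and would typically repeat the whole procedure several times, keeping the best schedule.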
V. EXPERIMENTAL RESULTS
For evaluating the proposed method we used test data for two industrial SoCs, hereafter referred to as SoC-A and SoC-B, which are targeted for portable wireless applications. SoC-A has 4 voltage islands. Table I and Table II present the minimum testing times MT Ci,Vj for all cores C i of SoC-A and SoC-B, at the various voltage levels V j . The test data for the cores are grouped into columns according to the island they belong to. The test times are calculated using the maximum scan frequency, denoted as 'F m ', that does not cause scan chain timing violations at any core for shifting at the corresponding voltage level (they are presented in normalized time units, so as not to reveal confidential data). This frequency is presented in the last column of each island (reported in MHz and different for every voltage setting). For the last two islands I B 6 , I B 7 of SoC-B, the value of F m is not included in Table II; it is equal to 200 MHz. The entries denoted as "N/A" correspond to cores that either do not operate at the respective voltage settings or are not tested at these voltage settings.
We assume that the tester provides a clock signal with a frequency that is close to the highest scan frequency of any core of the SoC (200 MHz for SoC-A and 400 MHz for SoC-B). For SoC-A this frequency is divided on-chip using the proposed TDM scheme by 8, 4, 2 and 1, which corresponds to scan frequencies 25, 50, 100 and 200 MHz. For SoC-B, it is divided by 16, 8, 4, 2 and 1, and the scan frequencies are 25, 50, 100, 200 and 400 MHz, respectively. Each core C i is tested at a voltage V j using any of the TDM frequencies that are smaller than or equal to the corresponding highest nominal scan frequency of C i at V j . For example, for testing core C A 8 at V 1 , the highest scan frequency that can be used is 133 MHz. If we consider a TDM scheme that provides frequencies of 200, 100, 50 and 25 MHz, then only the frequencies 100, 50 and 25 MHz can be used for testing C A 8 at V 1 and using equation (1) the respective test times are equal to 800, 1600 and 3200, respectively. Note that when the scan frequency is divided by two, the completion time for the task doubles. However, at the same time, the use of a smaller frequency permits a higher level of parallelization between the various tasks, which offsets the increase in task time due to the reduction in frequency.
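The selection of admissible TDM frequencies and the resulting task times can be sketched as follows. The code assumes that the test time scales inversely with the scan frequency, as the doubling rule above implies. The minimum test time of 600 units and the exact maximum frequency of 400/3 MHz (about 133 MHz) are hypothetical values chosen so that the result reproduces the times quoted for C A 8 at V 1 ; the function names are also hypothetical.

```python
from fractions import Fraction


def tdm_frequencies(ate_mhz, lowest_mhz):
    """Frequencies obtained by dividing the ATE frequency by 1, 2, 4, ..."""
    freqs = []
    f = Fraction(ate_mhz)
    while f >= lowest_mhz:
        freqs.append(f)
        f /= 2
    return freqs


def admissible_tests(min_time, f_max_mhz, freqs):
    """Map each usable TDM frequency to the test time at that frequency.

    A frequency is usable when it does not exceed f_max_mhz; the time is
    min_time * f_max_mhz / f (inverse proportionality).
    """
    return {f: min_time * f_max_mhz / f for f in freqs if f <= f_max_mhz}


soc_a_freqs = tdm_frequencies(200, 25)
soc_b_freqs = tdm_frequencies(400, 25)

# Hypothetical data for core C A 8 at V 1 (the actual table entry is not shown).
times = admissible_tests(Fraction(600), Fraction(400, 3), soc_a_freqs)
```

For these assumed inputs the 200 MHz option is rejected and the remaining options give 800, 1600 and 3200 time units at 100, 50 and 25 MHz.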
First, we ran various experiments for SoC-A where the ATE frequency is set between 25 MHz and 400 MHz. We considered as TAM two test buses that are shared between the cores of SoC-A. For each ATE frequency, we assume that the TDM scheme divides the frequency by 2, 4, 8, etc., until the frequency reaches the lowest value, which is equal to 25 MHz according to Table I. The results are shown in Fig. 3. Above each bar we report the scan frequencies generated by the TDM scheme. Despite the low frequency used for shifting test data into scan chains, as the tester frequency increases, the test time decreases due to the higher parallelism achieved. Without TDM, testing has to be conducted at the lowest nominal scan frequency (i.e., 25 MHz in this case). The test time at this frequency is reported in the leftmost bar of Fig. 3 and it represents the least effective solution.
Next we compare the proposed method against two non-TDM based methods. The first one is the Shortest-Job-First (SJF) approach proposed in [4], which is efficient for scheduling tests in single-V dd designs. This method schedules the tasks using a priority order based on the length (in time) of each task (the next task selected to be scheduled each time is the task with the shortest test length). We adapted this approach to multi-V dd designs by appending additional constraints. In addition, we ran experiments using the multi-V dd approach proposed in [10] for the test cases used in this paper.
For these baseline methods, we consider a realistic scenario (denoted as "RL") with respect to tester resources and TAM constraints, according to which there is only one ATE channel for providing the clock signal to every core. Note that usually testers in industrial multi-site testing environments do not provide separate clocks for different scan partitions; hence it is likely that all partitions may have to be shifted at the same rate. In addition, parallel testing of multiple cores at different voltage settings imposes the use of a lower frequency that does not violate the scan-chain timing for any of the cores being tested. Therefore, only one frequency is available for testing all cores, and it needs to equal the lowest shift frequency used for any of the cores and any voltage setting. For both SoCs this frequency was set equal to 25 MHz.
Even though the above baseline scenario is realistic for multi-site testing in industry, we also consider a second, hypothetical scenario for the baseline methods. Our goal is to show that the benefits of the proposed method are not limited to the leveraging of higher scan frequencies for loading cores. Higher utilization of TAM bandwidth can also be achieved due to the increased parallelism achieved by running different tasks (tests) at different voltage settings and frequencies at the same time. We assume that for each core, the highest possible scan frequency is used for loading the test data at any voltage setting. In other words, test scheduling is done using the minimum test times reported in Tables I-II. A separate ATE clock signal is required for every core, which increases the test cost as it increases both the number of ATE channels and the number of pins required for concurrently supporting these different scan frequencies. However, no non-TDM method can achieve a better test time than this scenario; therefore it provides the lower bound of the test time for non-TDM methods. This baseline scenario is denoted as "LB".
We used a commercial solver for the proposed ILP model. The solver was allowed to run for a few hours and the best schedule provided during that time is reported. We also ran the LP-relaxation technique (denoted as "Rand") for ten randomized experiments and we report the best result found. The CPU time for "Rand" is a few minutes for each experiment.
We used various TAM configurations, assuming a predetermined TAM architecture in each case (the problem of TAM optimization was not considered). For each volume of TAM resources, considered as shared buses in this work, we assumed three different random sharings of the buses between the various cores. The results for SoC-A are shown in Table III and for SoC-B in Table IV . The first two columns present the number of buses and the configuration index number while the next three pairs of columns present the results for SJF, the method proposed in [10] and the proposed method respectively. The proposed test scheduling method achieves remarkably high reduction in test time compared to all baseline methods. The test time is considerably lower than that of both realistic baseline approaches and it is also much lower in almost all cases than the lower bound for any non-TDM approach. The percentage reduction of the proposed ILP approach over [10] for the RL scenario approaches the value of 86.1% while for the LB scenario it approaches the value of 48.8%. The respective reduction percentages over the SJF approach are even higher. Therefore, we conclude that the proposed method offers considerable reductions in test time.
VI. CONCLUSIONS

