Despite the efficiency they achieve at the system level, these test-scheduling algorithms cannot be applied to BIST RTL data paths for two reasons. First, all these algorithms are formulated for a fixed test-resource allocation. Because test synthesis and test scheduling are highly interrelated, a fixed test-resource formulation leads to a prohibitively large computation time, hindering efficient exploration of the testable design space. Second, the algorithms assume a fixed amount of power dissipation associated with each test. This assumption is not valid for BIST RTL data paths, in which transitions associated with power dissipation required for testing each module can propagate to other registers and untested modules, causing useless power dissipation that has no influence on test efficiency.
In this article, we introduce a power-conscious test synthesis and scheduling technique that accounts for the complexity of the testable design space caused by the interrelation of test synthesis and test scheduling. The technique leads to power-minimized testable data paths in a small computation time. It targets dataflow-intensive application domains, such as digital signal processing, communications, and graphics, in which
Higher power dissipation during test
Dynamic power dissipation in CMOS VLSI circuits depends on three parameters: supply voltage, clock frequency, and switching activity. Lowering either of the first two reduces power dissipation, but at the expense of circuit performance. Reducing power by minimizing switching activity (and hence switched capacitance) does not degrade performance, and it is the main technique researchers have studied over the past 15 years. 6 Systems comprising many memory elements and multifunctional execution units carry out power-conscious architectural policies such as power management, in which blocks are not simultaneously activated during functional operation. Hence, inactive blocks do not contribute to dissipation during functional operation. The fundamental premise of power management is that systems and their components carry a nonuniform workload during functional operation. However, this assumption is not valid during test application. Minimizing test application time when the system is in test mode requires concurrent test execution. Concurrent test execution means that many blocks are active at the same time, which conflicts with the power management policy and dissipates more power than functional operation does.
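The dependence on these three parameters is often summarized by a first-order model in which dynamic power is the product of switching activity, switched capacitance, the square of the supply voltage, and clock frequency. The sketch below assumes that model. The function name, the capacitance value, and the activity factors are illustrative and do not come from this article, and some texts add a factor of 1/2 depending on how activity is defined.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order dynamic power estimate in watts (assumed model)."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical block: 1 pF of switched capacitance at 3.3 V and 100 MHz.
# The activity factors are made-up values chosen only to contrast the modes.
functional = dynamic_power(0.25, 1e-12, 3.3, 100e6)  # partly idle in normal mode
test_mode = dynamic_power(0.50, 1e-12, 3.3, 100e6)   # busier during concurrent test
```

Under this model, doubling the switching activity at a fixed voltage and frequency doubles the dissipated power, which is why concurrent test execution can exceed the power seen during functional operation.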
Power dissipation classification
We classify power dissipation in BIST RTL data paths as necessary or useless, according to its effect on required test efficiency: ing high-level synthesis, which assigns operations to functional units and operands to registers, to generate an RTL data path with reduced power dissipation during normal operation. 7 The power-conscious technique we propose complements those approaches by investigating multiplexer control during test synthesis and test scheduling. The technique ensures that during testing, a synthesized RTL data path obtained using low-power high-level synthesis algorithms does not exceed power ratings.
Effect of test synthesis and scheduling
We analyze the effect of test synthesis and scheduling on useless power dissipation through two examples. The first shows how test synthesis affects both TAP and SP dissipation. The second shows how module selection during test scheduling affects the elimination of useless power dissipation in registers and modules.
Figure 2 illustrates the first example. Assume that in this BIST RTL data path, modules M0 and M2 are tested simultaneously without exceeding the given power constraints. When linear-feedback shift register LFSR1 generates test patterns for M2, any configuration of control signals for the multiplexer at M1's input will cause useless power dissipation in M1, as Figure 2a shows. Moreover, the useless power dissipation in M1 will further propagate to R3, leading to useless power dissipation in both modules and registers. In Figure 2b, however, when LFSR2 generates test patterns for M2 by selecting inactive register R1 at M1's input, it eliminates useless power in M1 without any penalty in test area or test efficiency. LFSR2's selection also eliminates useless power dissipation in R3. Test synthesis also has a profound impact on SP dissipation because an inappropriate selection of test registers can lead to useless power dissipation in M1 and R3 during the shifting in of seeds for test pattern generators LFSR0 and LFSR1 (Figure 2a).
Figure 3 shows the second example. Assume that module M0 is already scheduled in the current test session and that we must examine the selection of M1 and M2. For simplicity, signature analysis registers for M0, M1, and M2 are not shown in Figure 3, and R3 and R4 are not used as analyzers in the current test session. Selecting M1 and M2 to be tested simultaneously and choosing inactive module M2 at R4's input eliminates useless power dissipation in R4. However, any configuration of control signals for the multiplexer at R3's input causes useless power dissipation in R3 (Figure 3a). Selecting M2 and M0 to be tested simultaneously and setting the appropriate values on control signals of multiplexers at the R3 and R4 inputs eliminates useless power dissipation in both R3 and R4 (Figure 3b).
Thus far we have outlined test scheduling's effect on useless power dissipation in registers. Figure 4 Figure 4b ). Clearly, test scheduling affects useless power dissipation in both registers and modules.
For designs that use gated clocks at the RTL, gating the inactive registers' clocks can also eliminate useless power dissipation in registers. However, eliminating useless power dissipation in modules by controlling multiplexer inputs is necessary even when modules are highly sequential and use power enable/disable signals to turn off untargeted modules. The reason is that useless power dissipation is eliminated in the combinational logic up to the first sequential boundary in the module only where power enable/disable signals take effect. Therefore, although the technique we propose is aimed at design styles without gated clocks and power enable/disable signals, it can successfully be combined with clock-gating power reduction techniques for further power dissipation savings. 
Proposed technique
We have integrated our power-conscious test synthesis and scheduling (PC-TSS) technique into an efficient, tabu-search-based exploration of the testable design space. The exploration combines the accuracy of incremental test-scheduling algorithms with the exploration speed of test-scheduling algorithms using fixed test-resource allocation. 8 A solution in the testable design space is a testable data path in which test pattern generators and signature analysis registers are allocated for each data path module. During testable design space exploration, a move transforms a feasible solution into another one by reallocating the test registers. Our technique investigates the neighboring testable data paths for useless power dissipation according to new move acceptance and module selection criteria.
Move acceptance during test synthesis
To minimize power dissipation, we have modified previous move acceptance criteria 8 to determine whether a newly generated testable design causes useless power dissipation. If a move generates a testable design with useless power dissipation, it must be rejected. Figure 5 shows our new accept-move algorithm. Given testable data path T-DP and the current solution's test registers (left and right test pattern generators TPG_L and TPG_R and signature analyzers SA), the algorithm accepts or rejects new testable designs by analyzing the interconnect between test registers and modules. For every module from the output module set of test registers, the algorithm examines the left and right input register sets (LIRS and RIRS).
If all the registers from either input register set are test registers, the algorithm rejects the move because no value of control signals for multiplexers at data path module inputs will eliminate the propagation of spurious transitions. Rejecting the testable data paths eliminates useless power dissipation during test application and while shifting out test responses. If all moves lead to useless power dissipation, the move leading to the lowest test application time is accepted, and the useless power dissipation is minimized with power-conscious test scheduling.
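The rejection rule described above can be sketched as follows. This is a minimal illustration of the stated criterion only, not a reproduction of the accept-move algorithm in Figure 5: the dictionary layout, the function name `accept_move`, and the example data path are assumptions made for this sketch, and the fallback that accepts the move with the lowest test application time when every move is rejected is omitted.

```python
def accept_move(modules, test_registers):
    """Return False if the move would force useless power dissipation.

    modules: hypothetical map of module name -> (LIRS, RIRS), where each
             element is the set of registers feeding that module input.
    test_registers: set of registers allocated as test registers.
    """
    for lirs, rirs in modules.values():
        # Only modules in the output module set of the test registers matter.
        if not (lirs | rirs) & test_registers:
            continue
        # If every register at one input is a test register, no multiplexer
        # setting can select an inactive source, so the move is rejected.
        for input_set in (lirs, rirs):
            if input_set and input_set <= test_registers:
                return False
    return True


# Hypothetical data path fragment with a single two-input module.
EXAMPLE_MODULES = {"M1": ({"R0", "R1"}, {"R2", "R3"})}
```

With this fragment, allocating only R0 as a test register is accepted because R1 remains available as an inactive source, whereas allocating both R0 and R1 is rejected.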
Module selection during test scheduling
Testable design space exploration tries to minimize test application time under power constraints, using BIST area overhead as the tie-breaker among many possible solutions with the same test application time. To satisfy power constraints, we compute test application time by making two modifications of Craig, Kime, and Saluja's test-scheduling algorithm based on partitioned testing with the "run-to-completion" method. 9 First, a new module selection algorithm eliminates useless power dissipation. Second, the test-scheduling algorithm computes the power dissipated by scheduling selected module M_i; if the power constraint is not satisfied during the current test, then test t_i for M_i is removed from the candidate node set 9 and postponed.
Figure 6 shows the proposed algorithm for module selection during power-conscious test scheduling. Module selection tries to eliminate useless power dissipation not only in useless registers (UR) at the output of currently tested modules, but also in useless modules (UM) to which spurious transitions propagate through useless registers. Given the testable data path, the modules scheduled at the current test time (tested modules) and the candidate modules to be scheduled according to the resource conflict graph, 8,9 the select-module algorithm selects the candidate module that will cause the minimum increase in power dissipation when scheduled at the current test time.
Initially, the active module set (AMS) contains the tested modules, and the active register set (ARS) contains the test registers that generate test patterns and analyze test responses for currently tested modules. For each candidate module, the algorithm computes power dissipation by recursively propagating spurious transitions through UR and UM (lines 4 through 11 in Figure 6). Initially, both sets of useless registers and useless modules are null. The algorithm computes UM using ARS and UR. A module is assigned to UM if all the registers in its left or right input register set are active at the current test time. The algorithm considers useless modules to detect the propagation of spurious transitions to useless registers. Once AMS is updated with UM, the algorithm computes useless registers UR, using the updated AMS. A register is assigned to UR if all the modules in its input module set are active at the current test time. All useless registers detected in the current iteration are used to update ARS in the next iteration.
Once ARS is updated, the algorithm detects new useless modules, and the recursive propagation of spurious transitions continues until no new useless registers are detected. At the end of the recursive propagation (lines 5 through 10 in Figure 6), AMS and ARS contain not only the tested modules and their test registers, but also all the data path elements active during the current test time. AMS and ARS are used to compute both necessary and useless power dissipation associated with selecting candidate module CM_i to be scheduled at the current test time. Finally, the candidate module CM_s that leads to minimum power dissipation is selected to be scheduled at the current test time.
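The propagation and selection steps described above can be sketched as follows. This is an illustrative reading of the text, not the listing in Figure 6: the data layout, the function names `propagate` and `select_module`, and the example data path are assumptions. The unit power values (one unit per register and adder, four per multiplier) follow the generic model used in the experiments.

```python
P_REG = 1  # power of one register, in generic units P_u


def propagate(modules, reg_inputs, ams, ars):
    """Grow the active module/register sets until no new useless register appears."""
    ams, ars = set(ams), set(ars)
    while True:
        # Useless modules: every register at the left or right input is active.
        um = {m for m, (lirs, rirs) in modules.items()
              if m not in ams
              and ((lirs and lirs <= ars) or (rirs and rirs <= ars))}
        ams |= um
        # Useless registers: every module in the input module set is active.
        ur = {r for r, ims in reg_inputs.items()
              if r not in ars and ims and ims <= ams}
        if not ur:
            return ams, ars
        ars |= ur


def select_module(modules, reg_inputs, module_power, tested, tested_regs, candidates):
    """Return (candidate, power) for the candidate with minimum total power."""
    best = None
    for cand, cand_regs in candidates.items():
        ams, ars = propagate(modules, reg_inputs,
                             tested | {cand}, tested_regs | cand_regs)
        power = sum(module_power[m] for m in ams) + P_REG * len(ars)
        if best is None or power < best[1]:
            best = (cand, power)
    return best


# Hypothetical data path: M3 is an untested multiplier sharing input registers.
MODULES = {"M0": ({"R0"}, {"R1"}), "M1": ({"R3"}, {"R4"}),
           "M2": ({"R1", "R6"}, {"R7"}), "M3": ({"R0", "R3"}, {"R1", "R4"})}
REG_INPUTS = {"R2": {"M0"}, "R5": {"M1"}, "R8": {"M2"}, "R9": {"M3"}}
MODULE_POWER = {"M0": 1, "M1": 1, "M2": 1, "M3": 4}
CANDIDATES = {"M1": {"R3", "R4", "R5"}, "M2": {"R6", "R7", "R8"}}

choice = select_module(MODULES, REG_INPUTS, MODULE_POWER,
                       {"M0"}, {"R0", "R1", "R2"}, CANDIDATES)
```

In this example, scheduling M1 alongside M0 activates both registers at M3's left input, so M3 and its output register R9 become useless elements and the session costs 13 units. Scheduling M2 instead leaves M3 idle and costs 8 units, so M2 is selected.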
To explore the testable design space under power constraints, we consider generic power models. Estimating power dissipation at a lower abstraction level for each solution hinders efficient design space exploration because module selection during test scheduling requires power computation during each stage of the algorithm for each solution in the design space. Therefore, the only extra constraint on the design library is that besides performance, cost, and pseudorandom test length, it should provide an average or peak power value.
Experimental results
We compared the proposed technique (PC-TSS) with the time and area test synthesis and scheduling (TA-TSS) technique, 8 whose main objective is to minimize test application time under the given power constraint. TA-TSS uses BIST area overhead as a tie-breaker among many possible solutions with the same test application time, without considering useless power. 8 Unlike TA-TSS, the main objective of PC-TSS is to eliminate useless power dissipation and then to use test application time and BIST area overhead as tie-breakers. Our power-conscious technique is applicable and hence orthogonal to any test register allocation algorithm, test-scheduling algorithm, or testable design space exploration algorithm because it improves decision making during the exploration process by accounting for useless power dissipation.
During the design space exploration process, we assume that power dissipation values for registers, adders, and multipliers are P_Reg = P_u, P_Add = P_u, and P_Mult = 4P_u, where P_u is a generic technology-dependent model of power dissipation. Similarly, to achieve 100% fault coverage for 8-bit data path modules, we assume that test application times for adders and multipliers are T_Add = T_u and T_Mult = 4T_u, where T_u = 128.
To compute area overhead, we specified the BIST data paths and controllers in VHDL code at the RTL and synthesized and technology-mapped them into Austria Mikro Systeme (AMS) 0.35-micron technology. 10 To estimate the dynamic power shown in Figures 7c and 7d, we used AMS 0.35-micron timing and power information and a delay model simulator operating at a 3.3-V supply voltage and a 100-MHz clock frequency. To compute power dissipation during test application in the entire data path, we summed the power dissipation of all active elements. To assess the techniques' effectiveness, we compared PC-TSS and TA-TSS for 10 power constraints ranging from 10P_u to 19P_u. For all the evaluated circuits, PC-TSS produces less TAP and SP dissipation than TA-TSS, which does not account for useless power dissipation.
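As a worked illustration of summing the power of all active elements under the generic model, the sketch below uses the unit values assumed in the experiments (registers and adders at P_u, multipliers at 4P_u, T_u = 128). The function names and the example session are hypothetical and are not taken from the evaluated circuits.

```python
P_U = 1    # generic power unit
T_U = 128  # generic test-length unit

POWER = {"reg": P_U, "add": P_U, "mult": 4 * P_U}
TEST_TIME = {"add": T_U, "mult": 4 * T_U}


def session_power(active_elements):
    """Sum the power of every active element, given as a list of element kinds."""
    return sum(POWER[kind] for kind in active_elements)


def meets_constraint(active_elements, limit):
    """True if the summed power does not exceed the power constraint."""
    return session_power(active_elements) <= limit


# Hypothetical session: one multiplier and one adder under test,
# with six registers active as pattern generators and analyzers.
session = ["mult", "add"] + ["reg"] * 6
total = session_power(session)
```

This session dissipates 11P_u, so it violates a 10P_u constraint but satisfies any constraint of 11P_u or more. If spurious transitions activated one extra multiplier, the total would rise to 15P_u, which shows how useless power can push an otherwise valid schedule past the constraint.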
TA-TSS, which assumes a fixed amount of power dissipation with each test, yields a lower test application time and BIST area overhead than PC-TSS. However, TA-TSS always causes higher power dissipation because it ignores spurious activity (the source of useless power) and hence violates the power constraint, possibly leading to destructive testing. To learn how PC-TSS algorithms scale computationally to highly complex designs, we generated 18 hypothetical data paths containing 35 to 45 modules and 90 to 115 registers (the increment step is 5). 8 The computational time for these complex data paths reaches 30 minutes on a Pentium III processor running at 800 MHz.
OUR FUTURE WORK will investigate power-conscious techniques for addressing useless power dissipation in test-per-scan environments and core-based SoCs.
