Test access is a major problem for core-based system-on-a-chip (SOC) designs. Since embedded cores in an SOC are not directly accessible via chip inputs and outputs, special access mechanisms are required to test them at the system level. An efficient test access architecture should also reduce test cost by minimizing test application time. We address several issues related to the design of optimal test access architectures that minimize testing time, including the assignment of cores to test buses, the distribution of test data width among multiple test buses, and the analysis of the test data width required to satisfy an upper bound on the testing time. Even though the decision versions of all these problems are shown to be NP-complete, they can be solved exactly for practical instances using integer linear programming (ILP). As a case study, the ILP models for two hypothetical but nontrivial systems are solved using a public-domain ILP software package.
INTRODUCTION
Embedded cores are now increasingly being used in large system-on-a-chip (SOC) designs [Zorian et al. 1998 ]. These complex, predesigned functional blocks facilitate design reuse, allow greater on-chip functionality, and lead to shorter product development cycles. However, the manufacturing test and debug of such SOC designs remains a major challenge. Since embedded cores are not directly accessible via chip inputs and outputs, special access mechanisms are required to test them at the system level. The development of efficient test access architectures is therefore of considerable interest to the SOC design and test community.
A test access architecture, also referred to as a test access mechanism (TAM), provides the means for on-chip test data transport [Zorian et al. 1998]. It can be used to transport test patterns from a pattern source to a core-under-test, and to transport test responses from a core-under-test to a response monitor. A number of test access architectures have been proposed in the literature [Zorian et al. 1998]. These include macro test [Marinissen and Lousberg 1999]; core transparency [Ghosh et al. 1993; Ghosh et al. 1998]; dedicated test bus [Varma and Bhatia 1998]; multiplexed access [Immaneni and Raman 1990]; and a bus architecture based on the concept of a TESTRAIL [Marinissen et al. 1998]. A TESTRAIL provides a flexible and scalable test access mechanism; a single TESTRAIL can provide access to one or more cores, and an IC may contain one or more TESTRAILs of varying widths. The width of a TESTRAIL is referred to as the test data width, since it determines the overall system testing time. Figure 1, derived from Marinissen et al. [1998], illustrates one possible implementation of the TESTRAIL architecture. (The core wrapper and the bypass mechanism are not shown explicitly in the figure.)
In order to reduce test cost, the testing time for a core-based system should be minimized by adopting an appropriate test access architecture. While the TESTRAIL architecture allows the system designer to trade off testing time against area overhead by varying test data widths, the precise relationship between the testing time and the test access architecture has not been formally studied. Related prior work has either been limited to test scheduling for a given test access mechanism [Chakrabarty 1999; 2000a; Sugihara et al. 1998], or to determining the optimal number of internal scan chains in the cores [Aerts and Marinissen 1998]. The latter requires redesign of the scan chains for each customer, and thereby affects core reuse. While Aerts and Marinissen [1998] present several novel strategies for TAM design (e.g., multiplexing, daisy chaining, and distribution), they do not directly address the problem of optimal sizing of test buses in the SOC. We are interested here in the problem of minimizing the SOC testing time via optimal test bus design, and without any redesign of the embedded cores. The design of the test access architecture is especially important for the system designer/integrator, since the IEEE P1500 standard (being developed for embedded core testing) leaves TAM design up to the system integrator.
The system integrator is interested in the following TAM design problems: (1) Given an SOC and maximum test data width, how should the width be distributed among the various test buses in order to minimize the testing time? (2) How should the embedded cores in the system be assigned to the test buses? (3) For a given test access architecture, how much test data width is required to meet specified testing time objectives? To the best of our knowledge, this paper presents the first systematic solutions to these SOC design problems.
The main contributions of the paper are listed below.
• We formulate several design problems related to test access architectures, and show that the decision versions of all these problems are NP-complete.
• Even though these design problems are NP-complete, we show that they can be solved exactly using integer linear programming (ILP). We first develop an ILP model for optimally assigning cores to test buses when the widths of these test buses are known. We refer to this as the "test bus assignment problem".
• We note that the testing time can be reduced further by optimally distributing the total test width among the individual test buses. We develop an ILP model for minimizing the testing time by combining optimal width distribution with optimal test bus assignment.
• Given a constraint on the maximum testing time, we develop an ILP model to determine the minimum test data width and an optimal assignment of cores to test buses.
• The above ILP models make the simplifying assumption that a test bus cannot be subdivided into test buses of smaller width which subsequently merge before the test data is transported to the IC outputs. In order to account for this realistic scenario, we refine our ILP models to allow a test bus to fork into test buses of smaller width.
• We evaluate the feasibility of the proposed ILP models by solving them using an ILP solver for two hypothetical, but nontrivial and representative SOCs. The experimental results demonstrate that optimal solutions to these important design problems in SOC testing are indeed feasible.
Our ILP models do not address the problem of testing the interconnect and wiring between the cores. A complete solution for SOC testing that also addresses these issues requires enhancements to the basic ILP framework presented here for isolation testing of embedded cores.
The organization of the paper is as follows. In Section 2 we briefly review integer linear programming and formulate the problem of optimal test bus assignment. In Section 3 we develop ILP models for minimizing the testing time by determining an optimal test width distribution. In Section 4 we present case studies for two example SOCs (described below). We solve the various ILP models for these systems using the lpsolve software package from Eindhoven University of Technology [Berkelaar 1999]. Finally, in Section 5, we extend our basic ILP models to handle cases where a test bus may fork into several test buses that subsequently merge before the test data is transported to the IC outputs. We present experimental results on optimal and near-optimal subdivision of the test buses.
In order to illustrate the proposed optimization methods, throughout the paper we use as examples the core-based SOCs $S_1$ and $S_2$ shown in Figure 2. These hypothetical but nontrivial SOCs consist of ten ISCAS 85 [Brglez and Fujiwara 1985] and ISCAS 89 [Brglez et al. 1989] benchmark circuits each. We assume that the ISCAS 89 circuits contain internal scan chains. $S_1$ contains seven combinational cores and three sequential cores, while $S_2$ consists of two combinational cores and eight sequential cores. The complexity of the ILP models depends more on the number of cores in the SOC than on the sizes of the cores. For the sake of illustration, only two test buses are shown in Figure 2. Our ILP models can easily be used for any number of test buses.
Fig. 2. Two examples of core-based SOCs with two test buses each: (a) System $S_1$ containing seven combinational cores and three cores with internal scan; (b) System $S_2$ containing two combinational cores and eight cores with internal scan.
OPTIMAL ASSIGNMENT OF CORES TO TEST BUSES
We first briefly review ILP by means of matrix notation [Williams 1985]. The goal of ILP is to minimize a linear objective function on a set of integer variables while satisfying a set of linear constraints. A typical ILP model is described as follows:
$$\text{minimize } Ax \quad \text{subject to } Bx \le C,\ x \ge 0,$$
where A is a cost vector, B is a constraint matrix, C is a column vector of constants, and x is a vector of integer variables. Efficient ILP solvers are now readily available, both commercially and in the public domain. For our experiments (described in Sections 4 and 5), we used the lpsolve package from Eindhoven University of Technology in the Netherlands.
Let the SOC design consist of $N_C$ cores and let core $i$, $1 \le i \le N_C$, have $n_i$ inputs and $m_i$ outputs. We assume that the $n_i$ inputs of core $i$ include data inputs and scan inputs. Similarly, the $m_i$ outputs of core $i$ include data outputs and scan outputs. Each full or partial scan core may have one or more internal scan chains. (A combinational or nonscan legacy core has no scan inputs and outputs.)
Test data serialization is required at core I/Os when the width of the test bus is less than the number of core terminals. This happens often because the number of core terminals is determined by the functionality of the core, while test bus width is determined by the test pattern source, the SOC routing and area constraints and, in some cases, the width of the existing system bus. Therefore, test data serialization must be performed in the test wrapper if the number of core terminals is larger than the test bus width.
We assume in our serialization model that the test sets for the SOC cores are available in scan format, in which the functional input values remain unchanged during successive scan cycles. (If the test sets for the cores are obtained before the translation to scan format, then alternative serialization models involving the lengths of the scan chains in the cores can be used to reduce the testing time even further.) The amount of test data serialization necessary at the inputs and outputs of core $i$ is therefore determined by its test width $\phi_i = \max\{n_i, m_i\}$. This influences the testing time for core $i$. We assume that core $i$ requires $t_i$ (scan) cycles for testing. Finally, we assume that the system contains $N_B$ test buses with widths $w_1, w_2, \ldots, w_{N_B}$.
The problem that we address in this section is to minimize the system testing time by optimally assigning cores to test buses. It is formally stated as follows:
• P1: Given $N_C$ cores and $N_B$ test buses of widths $w_1, w_2, \ldots, w_{N_B}$, respectively, determine an assignment of cores to test buses such that the total testing time is minimized.
Note that P1 is equivalent to the well-known multiprocessor scheduling problem, and is therefore NP-complete [Garey and Johnson 1979, p. 65]. The multiprocessor scheduling problem is stated as follows:
Instance: A finite set $A$ of "tasks," a "length" $l(a) > 0$ for each $a \in A$, a number $m > 0$ of "processors," and a deadline $D > 0$.
Question: Is there a partition $A = A_1 \cup A_2 \cup \cdots \cup A_m$ of $A$ into $m$ disjoint sets such that $\max\{\sum_{a \in A_i} l(a) : 1 \le i \le m\} \le D$?
The equivalence between a decision version of P1 and multiprocessor scheduling can easily be established by noting the correspondence between processors and test buses and between tasks and test sets. The deadline D corresponds to the overall SOC testing time.
Even though P1 is NP-complete, we show that, as in the case of many other NP-complete problems, it can be solved exactly for practical instances using integer linear programming. We assume for now that a test bus does not fork (split) into multiple branches that may merge later. This restriction is removed in Section 5. We also assume that all cores on any given test bus are tested sequentially. Two or more test buses can be used simultaneously for delivering test data to cores and for propagating test responses. We assume that the number of test buses (and thereby the amount of test parallelism) is determined by the core user (system integrator) after a careful consideration of system-level I/O, area, and power dissipation issues.
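We first note that if core $i$ is assigned to bus $j$, then the testing time for core $i$ is given by
$$T_{ij} = \begin{cases} t_i, & \text{if } \phi_i \le w_j \\ (\phi_i - w_j + 1)\,t_i, & \text{if } \phi_i > w_j \end{cases} \qquad (1)$$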
If $\phi_i > w_j$ then the width of the test bus is insufficient for parallel loading of test data, and serialization is necessary at the inputs and/or outputs of core $i$. In order to calculate the test time due to serialization, we use an interconnection strategy, similar to one suggested in prior work, for connecting core I/Os to the test bus, namely, provide direct (parallel) connection to core I/Os that transport more test data. We assume a worst-case scenario of test data serialization, in which the first $(w_j - 1)$ test bus lines are connected to $(w_j - 1)$ core I/Os in parallel and the last test bus line is serially connected to the remaining $(\phi_i - w_j + 1)$ core I/Os; see Figure 3. This can potentially reduce the amount of interconnect within the wrapper. If the width of bus $j$ is adequate, i.e., $\phi_i \le w_j$, then no serialization is necessary, and core $i$ can be tested in exactly $t_i$ cycles.
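The serialization model described above can be sketched as a small function. This is only an illustration: the function name `core_test_time` is hypothetical, and the sample values (test width 36 and 2507 cycles) are the figures listed for s838 in Table I.

```python
def core_test_time(phi: int, w: int, t: int) -> int:
    """Testing time of a core with test width phi and t scan cycles
    when it is attached to a test bus of width w."""
    if w < 1:
        raise ValueError("bus width must be at least 1")
    if phi <= w:
        # The bus is wide enough: no serialization is needed.
        return t
    # Worst case: one bus line serially feeds phi - w + 1 core I/Os.
    return (phi - w + 1) * t


# s838 (test width 36, 2507 cycles) on a 32-bit and on a 16-bit bus.
print(core_test_time(36, 32, 2507))  # 12535
print(core_test_time(36, 16, 2507))  # 52647
# A core whose test width fits the bus needs exactly t cycles.
print(core_test_time(32, 32, 12))    # 12
```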
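Let $x_{ij}$ be a 0-1 variable defined as follows:
$$x_{ij} = \begin{cases} 1, & \text{if core } i \text{ is assigned to bus } j \\ 0, & \text{otherwise} \end{cases}$$
Since the cores on a bus are tested sequentially and the buses operate in parallel, the time needed on bus $j$ is $\sum_{i=1}^{N_C} T_{ij} x_{ij}$, and the system testing time is the cost function
$$C = \max_{1 \le j \le N_B} \sum_{i=1}^{N_C} T_{ij} x_{ij},$$
which is to be minimized subject to $\sum_{j=1}^{N_B} x_{ij} = 1$ for every core $i$, i.e., every core is assigned to exactly one bus.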
The above minmax nonlinear cost function can easily be linearized [Williams 1985]. The resulting integer linear programming model is shown in Figure 4. It can be easily seen that the ILP model for P1 contains $N_B N_C$ 0-1 variables, one nonbinary integer variable, and
As an example, we consider the SOCs $S_1$ and $S_2$ introduced in Section 1. Table I presents the test data for each embedded core in these systems. We assume that s838 contains one internal scan chain, and s5378 and s9234 contain 4 internal scan chains each. We also assume that s35932 and s38417 contain 32 internal scan chains each, and s13207 and s15850 contain 16 scan chains each. For the combinational cores, $1 \le i \le 7$, the number of test cycles $t_i$ is equal to the number of test patterns $p_i$. However, for the remaining three cores with internal scan,
$$t_i = (p_i + 1) \lceil f_i / n_i \rceil + p_i,$$
where core $i$ contains $f_i$ flip-flops and $n_i$ internal scan chains [Aerts and Marinissen 1998]. The test patterns for these circuits were obtained from [Hamzaoglu and Patel 1998].
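As a quick check of how the cycle counts for scan cores arise, the sketch below evaluates $t_i = (p_i + 1)\lceil f_i/n_i \rceil + p_i$, a form that reproduces the tabulated values. The function name is hypothetical, balanced scan chains are assumed, and the flip-flop counts (32 for s838, 179 for s5378) are the commonly quoted ISCAS 89 figures rather than values taken from this paper.

```python
def scan_test_cycles(patterns: int, flip_flops: int, chains: int) -> int:
    """Scan test cycles for a core, assuming balanced scan chains."""
    # Length of the longest chain: ceil(flip_flops / chains) in integer arithmetic.
    longest_chain = -(-flip_flops // chains)
    # Each pattern needs one shift-in of the longest chain plus a capture
    # cycle; a final shift-out unloads the last response.
    return (patterns + 1) * longest_chain + patterns


print(scan_test_cycles(75, 32, 1))   # s838:  2507
print(scan_test_cycles(97, 179, 4))  # s5378: 4507
```

Both results agree with the $t_i$ values listed for these cores in Table I.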
Example. Let $N_B = 2$, and let the total test data width for $S_1$ be 48 bits, i.e., $w_1 + w_2 = 48$. In addition, let $w_1 = 32$ and $w_2 = 16$. The optimal assignment of cores to these two test buses is given by the vector (1,1,1,1,1,1,1,2,2,1), where a 1 (2) in position $i$ of the vector indicates that core $i$ is assigned to bus 1 (2). This is shown in Figure 5. The optimal testing time for these values of $w_1$ and $w_2$ obtained using lpsolve is 411884 cycles. Note that this is not the minimum testing time that can be achieved with a total test width of 48 bits. For example, a testing time of 408077 cycles is achieved using $w_1 = 28$, $w_2 = 20$, and the test bus assignment vector (1,1,2,1,2,1,2,2,2,1). In the next section we examine the problem of determining an optimal distribution of the total test data width among the individual test buses.
Table I(b). Test data for the cores in $S_2$.
Circuit   Core i   n_i   m_i   phi_i   p_i   t_i
c6288     1        32    32    32      12    12
c7552     2        207   108   207     73    73
s838      3        36    3     36      75    2507
s9234     4        40    43    43      105   5723
s38584    5        70    336   336     110   5105
s13207    6        78    168   168     234   9634
s15850    7        93    166   166     95    3359
s5378     8        39    53    53      97    4507
s35932    9        67    352   352     12    714
s38417    10       60    138   138     68    3656
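For very small systems, P1 can also be solved by exhaustive enumeration, which is a convenient way to cross-check an ILP solution. The sketch below is not the ILP formulation used in the paper; it simply tries every assignment. The function names and the three-core instance are hypothetical.

```python
from itertools import product


def core_test_time(phi, w, t):
    """Testing time of a core (test width phi, t cycles) on a bus of width w."""
    return t if phi <= w else (phi - w + 1) * t


def assign_cores(cores, widths):
    """Exhaustively solve P1.

    cores  -- list of (phi, t) pairs, one per core
    widths -- list of test bus widths
    Returns (testing_time, assignment) with 1-based bus numbers.
    """
    best_time, best_assignment = None, None
    for assignment in product(range(len(widths)), repeat=len(cores)):
        load = [0] * len(widths)
        for (phi, t), j in zip(cores, assignment):
            load[j] += core_test_time(phi, widths[j], t)  # sequential on a bus
        system_time = max(load)                           # buses run in parallel
        if best_time is None or system_time < best_time:
            best_time, best_assignment = system_time, assignment
    return best_time, tuple(j + 1 for j in best_assignment)


# Hypothetical instance: three cores, an 8-bit bus and a 4-bit bus.
toy_cores = [(8, 10), (8, 10), (4, 5)]
print(assign_cores(toy_cores, [8, 4]))  # (20, (1, 1, 2))
```

The search visits $N_B^{N_C}$ assignments, so it is practical only for a handful of cores; this is the reason an ILP solver is used for the ten-core systems.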
The following theorem presents a lower bound on the total testing time when the widths of the test buses are known. This lower bound can indeed be achieved in practice; we illustrate this below using the system $S_1$ as an example. We also make use of this theorem in Section 3 to derive a lower bound on the testing time when only the total test data width is known and the optimal widths of the test buses have to be determined.
THEOREM 1. For an SOC with $N_C$ cores and $N_B$ test buses of widths $w_1, w_2, \ldots, w_{N_B}$, a lower bound on the total testing time $T$ is given by $T \ge \max_i\{\min_j\{T_{ij}\}\}$, where $\phi_i$ is the test width of core $i$ and $T_{ij}$ is defined by (1).
PROOF. The testing time for core $i$ depends on the width of the test bus to which it is assigned. Clearly, the testing time for core $i$ is at least $\min_j\{T_{ij}\}$. Since the overall system testing time is determined by the core that has the longest test time, $T \ge \max_i\{\min_j\{T_{ij}\}\}$.
For the system $S_1$ with two test buses of 32 bits and 16 bits, respectively, Theorem 1 provides a lower bound on the testing time of 391,190 cycles. This corresponds to a test bus assignment in which only core 10 is assigned to the 32-bit first test bus. Such an assignment is indeed optimal.
The ILP model presented in this section can also be used for optimally assigning cores to test buses for more general test access architectures. For example, Figure 6 shows a test access architecture consisting of two test buses in which the 20-bit test bus forks into two sets of buses, which in turn merge into the original 20-bit-wide test bus. If we use this test bus architecture for $S_1$, then a minimum testing time of 407,991 cycles is obtained using the test bus assignment vector (1,2,2b,2,1,2a,2,2,2,1). A more general discussion of this problem is presented in Section 5.
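The bound of Theorem 1 is cheap to evaluate. The sketch below computes $\max_i \min_j T_{ij}$ for the $S_2$ test data of Table I with the bus widths 32 and 16; the pairing of $S_2$ with these widths is an assumption made purely for illustration, and the function names are hypothetical.

```python
# (test width phi_i, test cycles t_i) for the ten cores of S2, from Table I.
S2_CORES = [(32, 12), (207, 73), (36, 2507), (43, 5723), (336, 5105),
            (168, 9634), (166, 3359), (53, 4507), (352, 714), (138, 3656)]


def core_test_time(phi, w, t):
    """Testing time of a core (test width phi, t cycles) on a bus of width w."""
    return t if phi <= w else (phi - w + 1) * t


def theorem1_lower_bound(cores, widths):
    """max over cores of the best-case (min over buses) testing time."""
    return max(min(core_test_time(phi, w, t) for w in widths)
               for phi, t in cores)


print(theorem1_lower_bound(S2_CORES, (32, 16)))  # 1557025
```

The bound is set by core 5 (s38584), whose test width of 336 forces heavy serialization even on the wider bus: $(336 - 32 + 1) \cdot 5105 = 1557025$.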
OPTIMAL TEST BUS WIDTH
In this section we examine the problem of minimizing system testing time by determining (i) optimal widths for the test buses, and (ii) optimal assignment of cores to test buses. This generalizes the optimization problem discussed in Section 2. We assume that the total system test width can be at most $W$. We also assume that the width of a test bus does not exceed the width required for any given core, i.e., $\max_j\{w_j\} \le \min_i\{\phi_i\}$, so that test data serialization is required for every core. This assumption is necessary to avoid complex nonlinear models that are difficult to linearize. From a practical point of view, this assumption implies that cores with very small test widths are assigned to test buses after the cores with larger test widths are optimally assigned. We extend the ILP model and remove this restriction in Section 4.
We now formulate the problem of optimally allocating the total width among the N B buses, as well as determining the optimal allocation of cores to these buses. The optimization problem is formally stated as follows:
• P2: Given $N_C$ cores and $N_B$ test buses of total width $W$, determine the optimal width of the test buses and an assignment of cores to test buses such that the total testing time is minimized.
THEOREM 2. P2 is NP-complete.
PROOF. To show that P2 belongs to NP, we consider the following decision problem version of P2: Given $N_C$ cores and $N_B$ test buses of total width $W$, does there exist a width distribution for the test buses and an assignment of cores to test buses such that the total testing time is less than or equal to $T$? A nondeterministic algorithm can guess a width distribution and a test bus assignment for the cores and check in polynomial time if the testing time is less than or equal to $T$. To show that P2 is NP-hard, we use the method of restriction [Garey and Johnson 1979]. Consider an instance of P2 for which $W = N_B \min_i\{\phi_i\}$ and the full width is used. Since the width of a test bus is at most $\min_i\{\phi_i\}$, this implies that every test bus has a width of $\min_i\{\phi_i\}$. This is equivalent to an instance of P1 which, as discussed in Section 2, is NP-complete. Therefore, P2 is NP-hard.
Even though P2 is NP-complete, the sizes of practical SOC problem instances allow it to be solved exactly. We now present an integer programming model for P2 that allows us to determine optimal widths and an optimal assignment of cores to buses simultaneously. We use the 0-1 variable $x_{ij}$ defined in Section 2.
Minimize C, subject to:
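One plausible way to write these constraints is sketched below. It is consistent with the testing time expression $(\phi_i - w_j + 1)t_i$ and with the assumptions stated at the beginning of this section, but the numbering of constraints (2) to (4), and the choice of an inequality for the total width, are assumptions of this sketch.

```latex
\begin{align*}
\text{(1)}\quad & C \ge \sum_{i=1}^{N_C} (\phi_i - w_j + 1)\, t_i\, x_{ij},
      && 1 \le j \le N_B \\
\text{(2)}\quad & \sum_{j=1}^{N_B} x_{ij} = 1,
      && 1 \le i \le N_C \\
\text{(3)}\quad & \sum_{j=1}^{N_B} w_j \le W \\
\text{(4)}\quad & 1 \le w_j \le \min_i \{\phi_i\},
      && 1 \le j \le N_B
\end{align*}
```

Expanding the summand in (1) gives $(\phi_i + 1)\,t_i\,x_{ij} - t_i\,w_j\,x_{ij}$, which makes the product of the two decision variables $w_j$ and $x_{ij}$ explicit.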
Note that constraint (1) above is nonlinear because it contains a product term. We linearize it by replacing the product term $w_j x_{ij}$ with a new integer variable $y_{ij}$ ($y_{ij} \ge 0$) and adding the following three constraints for every such product term:
(1) $y_{ij} - w_{\max} x_{ij} \le 0$, where $w_{\max} = W$ is an upper bound on the widths of the test buses.
(2) $y_{ij} - w_j \le 0$.
(3) $w_j - y_{ij} + w_{\max} x_{ij} \le w_{\max}$.
The intuitive reasoning behind the above three constraints is as follows. Since $x_{ij}$ can take only 0-1 values, $y_{ij}$ is restricted to be either 0 (if $x_{ij} = 0$) or $w_j$ (if $x_{ij} = 1$). This implies that $0 \le y_{ij} \le w_{\max}$. The three additional inequalities are necessary and sufficient to constrain the values that $y_{ij}$ can take. This leads us to the (linearized) ILP model for P2 shown in Figure 7.
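The claim that three inequalities pin $y_{ij}$ to the product $w_j x_{ij}$ can be checked by enumeration. The sketch below assumes the standard form of this linearization, namely $y - w_{\max}x \le 0$, $y - w \le 0$, and $w - y + w_{\max}x \le w_{\max}$, together with $y \ge 0$; the function name and the value $w_{\max} = 8$ are chosen only for illustration.

```python
def feasible_y(x: int, w: int, w_max: int) -> list:
    """All integers y in [0, w_max] satisfying the three linearization
    constraints for a 0-1 variable x and a bus width w."""
    return [y for y in range(w_max + 1)
            if y - w_max * x <= 0               # y is forced to 0 when x = 0
            and y - w <= 0                      # y never exceeds w
            and w - y + w_max * x <= w_max]     # y is at least w when x = 1


w_max = 8
for x in (0, 1):
    for w in range(w_max + 1):
        # The only feasible value is exactly the product w * x.
        assert feasible_y(x, w, w_max) == [w * x]
print("y = w * x is the unique feasible value in every case")
```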
As expected, the ILP model for P2 is bigger in size than the ILP model for P1. It contains $N_B N_C$ 0-1 variables, $N_C N_B + N_B + 1$ nonbinary integer variables, and $(6 N_B N_C + N_B + N_C + 1)$ constraint inequalities. The ILP model for P2 is especially useful in determining the effect of increased test data width on the testing time. However, there is a limit to which the testing time can be decreased by simply increasing the system test width. The following theorem provides a lower bound on the testing time $T$ for a core-based system. It is useful in determining the maximum test width beyond which the testing time cannot be decreased by simply increasing width.
THEOREM 3. For a core-based system with $N_C$ cores, a lower bound on the total testing time $T$ is given by
$$T \ge \max_i \{(\phi_i - \min_k\{\phi_k\} + 1)\, t_i\}.$$
PROOF. Let the system consist of $N_B$ test buses with (undetermined) test widths $w_1, w_2, \ldots, w_{N_B}$ such that $\min_k\{\phi_k\} \ge \max_j\{w_j\}$. We know from Theorem 1 that $T \ge \max_i\{\min_j\{(\phi_i - w_j + 1)\,t_i\}\}$. Since $\min_j\{(\phi_i - w_j + 1)\,t_i\} = (\phi_i - \max_j\{w_j\} + 1)\,t_i$ and $\max_j\{w_j\} \le \min_k\{\phi_k\}$, we have
$$T \ge \max_i \{(\phi_i - \max_j\{w_j\} + 1)\, t_i\} \ge \max_i \{(\phi_i - \min_k\{\phi_k\} + 1)\, t_i\}.$$
This completes the proof of the theorem.
Next, we address the related optimization problem of determining the minimum system test width required to meet a specified testing time objective. In addition, we determine an optimal distribution of the width among the test buses, and an optimal test bus assignment. The optimization problem is formally stated as follows:
• P3: Given $N_C$ cores, $N_B$ test buses, and a maximum testing time $T$, determine the minimum total test width, an optimal distribution of the test width among the test buses, and an optimal assignment of cores to test buses.
THEOREM 4. P3 is NP-complete.
PROOF. Once again, using the same strategy as in the proof of Theorem 2, it is straightforward to show that P3 belongs to NP. To show that P3 is NP-hard, we polynomially transform an arbitrary instance of the known NP-complete problem P2 to an instance of P3. Consider an instance of P2 parameterized by $(N_C, N_B, W)$, with the decision problem version checking if the testing time is less than or equal to $T$. The corresponding instance of P3 that we consider is parameterized by $(N_C, N_B, T)$. Suppose a solution to P3 is obtained in polynomial time with a width of $W^*$. We now check if $W^* \le W$. This provides a solution in polynomial time for P2. Thus, we conclude that P3 is NP-hard, and therefore NP-complete.
As in the case of P2, even though P3 is NP-complete, it can be solved exactly for instances of realistic core-based systems. The ILP model for P3 can be derived directly from P2, and is shown in Figure 8. This model is of the same size as that for P2, i.e., it has the same number of variables and constraints. The following theorem relates the width of the widest test bus to the maximum testing time $T$ and the test widths of the cores.
THEOREM 5. Let $\{w_1, w_2, \ldots, w_{N_B}\}$ be the optimal width distribution for a core-based system with $N_C$ cores and maximum testing time $T$. A lower bound on the width of the widest test bus is given by $\max_j\{w_j\} \ge \max_i\{\phi_i - T/t_i + 1\}$.
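PROOF. From the proof of Theorem 3, we know that
$$T \ge \max_{i \in \{1, 2, \ldots, N_C\}} \{(\phi_i - \max_j\{w_j\} + 1)\, t_i\}.$$
Hence $(\phi_i - \max_j\{w_j\} + 1)\, t_i \le T$ for every core $i$, which rearranges to $\max_j\{w_j\} \ge \phi_i - T/t_i + 1$. Taking the maximum over all cores $i$ gives the stated bound.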
As an example, consider $S_1$ with two test buses as shown in Figure 2. If an upper bound $T = 430000$ cycles is placed on the testing time, then Theorem 5 yields $\max_j\{w_j\} \ge 22$. As demonstrated in Table III, this lower bound on the test bus width is achieved using the ILP model for P3; hence Theorem 5 provides a tight lower bound.
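Both bounds of this section are simple to compute. The sketch below evaluates Theorem 3 and an integer form of Theorem 5 on the $S_2$ test data of Table I. The function names are hypothetical, and the deadline of 1,600,000 cycles is an arbitrary value chosen for illustration, not one used in the experiments.

```python
# (test width phi_i, test cycles t_i) for the ten cores of S2, from Table I.
S2_CORES = [(32, 12), (207, 73), (36, 2507), (43, 5723), (336, 5105),
            (168, 9634), (166, 3359), (53, 4507), (352, 714), (138, 3656)]


def time_lower_bound(cores):
    """Theorem 3: max_i (phi_i - min_k phi_k + 1) * t_i."""
    phi_min = min(phi for phi, _ in cores)
    return max((phi - phi_min + 1) * t for phi, t in cores)


def widest_bus_lower_bound(cores, deadline):
    """Theorem 5 for integer widths.

    (phi - w + 1) * t <= deadline  is equivalent to
    w >= phi + 1 - floor(deadline / t), so the bound is the maximum of
    that expression over all cores (and at least one bus line).
    """
    return max(1, max(phi + 1 - deadline // t for phi, t in cores))


print(time_lower_bound(S2_CORES))                   # 1557025
print(widest_bus_lower_bound(S2_CORES, 1_600_000))  # 24
```

Using integer division avoids rounding a floating-point quotient: since $\phi_i + 1$ is an integer, $\lceil \phi_i - T/t_i + 1 \rceil = \phi_i + 1 - \lfloor T/t_i \rfloor$.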
CASE STUDIES
In this section we present case studies using $S_1$ and $S_2$ for the optimization problems P2 and P3. (Solutions for the optimization problem P1 are presented in Section 2.) We also remove some of the restrictions that were imposed in Sections 2 and 3 to simplify the ILP models. We solved the ILP models using the lpsolve software package on a Sun Ultra 10 workstation with a 333 MHz processor and 128 MB memory. We were unable to obtain actual CPU times from lpsolve; however, the user time for P1 is less than one minute in all cases, while the user time for P2 and P3 is less than one hour in all cases; in fact, in most cases, the CPU time is only a few minutes. The problem instances, while realistic and representative of real-world SOCs, are small enough to be solved exactly using ILP. Table II presents the optimal test data width, optimal width distribution, and test bus assignment vector when two test buses are considered for $S_1$ and $S_2$. For $S_1$, the lower bound of 391,190 cycles predicted by Theorem 3 is reached for $W = 56$ bits. Any further increase in the system test width $W$ does not decrease testing time, since the widest test bus can be at most $\min_i\{\phi_i\} = 32$ bits. Table III shows the optimal width and width distribution for $S_1$ and $S_2$ with two test buses for various values of the maximum testing time $T$.
In Figure 9, we report experimental data for P2 and P3 when $S_1$ contains three test buses. As expected, for a given total test width, the testing time with three buses is less than with two buses; see Figure 9(a). Not surprisingly, this difference becomes more pronounced as the total width increases. Figure 9(b) shows the total width needed for two and three buses, respectively, for a given maximum testing time $T$. As $T$ increases, the difference between the two cases decreases. This is expected, since less stringent testing time requirements imply lower width requirements and decrease the need for more test buses.
Finally, we present experimental results for $S_2$ when a greedy heuristic test bus design is used. The heuristic divides the total test width $W$ equally between the two test buses. In the first set of experiments, we solve P1 to determine the testing time and an optimal test bus assignment for this equidistribution. In the second set of experiments, we simply calculate the testing time using the assignment vector of Table II. Table IV reports the results. When the assignment vector of Table II is used, the increase in testing time is as high as 15%. This motivates the need for an optimal test bus design approach.
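The gap between an equal split and an optimal split can be reproduced on a toy system by exhaustive search. The sketch below is not the ILP model; it enumerates every split of the total width between two buses and every core assignment. The function names and the three-core instance are hypothetical.

```python
from itertools import product


def core_test_time(phi, w, t):
    """Testing time of a core (test width phi, t cycles) on a bus of width w."""
    return t if phi <= w else (phi - w + 1) * t


def best_time(cores, widths):
    """Minimum system testing time over all core-to-bus assignments."""
    best = None
    for assignment in product(range(len(widths)), repeat=len(cores)):
        load = [0] * len(widths)
        for (phi, t), j in zip(cores, assignment):
            load[j] += core_test_time(phi, widths[j], t)
        if best is None or max(load) < best:
            best = max(load)
    return best


def equal_split_time(cores, total_width):
    """Heuristic: divide the total width equally between two buses."""
    half = total_width // 2
    return best_time(cores, (half, total_width - half))


def optimal_split_time(cores, total_width):
    """Exhaustive search over all two-bus width distributions."""
    return min(best_time(cores, (w1, total_width - w1))
               for w1 in range(1, total_width))


# Hypothetical system: three cores given as (test width, test cycles).
toy_cores = [(40, 100), (20, 100), (20, 10)]
print(equal_split_time(toy_cores, 16))    # 3300
print(optimal_split_time(toy_cores, 16))  # 2600
```

In this toy case the equal split is about 27% slower than the optimum, because the core with the largest test width benefits most from a wide bus while the remaining cores tolerate a narrow one.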
Next, we describe how the ILP models can be extended to remove the restriction $\max_j\{w_j\} \le \min_i\{\phi_i\}$. This is necessary to decrease the testing time below the limit of Theorem 3 if greater test width is available. Let $\delta_{ij}$ be an "indicator" 0-1 variable defined as follows:
Table III (excerpt). Minimum total test width, width distribution, and test bus assignment for a given maximum testing time $T$.
T (cycles)   W    (w_1, w_2)   Test bus assignment
-            48   (27,21)      (2,2,2,2,2,2,2,2,2,1)
420000       43   (25,18)      (2,2,2,2,1,1,2,2,2,1)
430000       39   (22,17)      (2,2,2,2,2,2,2,2,2,1)
440000       34   (19,15)      (2,2,2,2,2,2,2,2,2,1)
450000       30   (16,14)      (2,2,2,2,2,2,2,2,2,1)
460000       25   (14,11)      (2,2,2,1,2,2,2,2,2,1)
470000       21   (11,10)      (2,2,2,2,2,2,2,2,2,1)
480000       16   (8,8)        (2,2,2,2,2,2,2,2,2,1)
The testing time $T_{ij}$ for core $i$ assigned to test bus $j$ can now be expressed as
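A natural choice, offered here only as a sketch consistent with equation (1), is to let the indicator variable mark the cores that need serialization on a given bus. The exact definitions used in the ILP model may differ from the ones assumed below.

```latex
\[
\delta_{ij} =
\begin{cases}
1, & \text{if } \phi_i > w_j \\
0, & \text{otherwise}
\end{cases}
\qquad
T_{ij} = \delta_{ij}\,(\phi_i - w_j + 1)\,t_i + (1 - \delta_{ij})\,t_i
       = t_i + \delta_{ij}\,(\phi_i - w_j)\,t_i .
\]
```

With $\delta_{ij} = 1$ this reduces to $(\phi_i - w_j + 1)\,t_i$, and with $\delta_{ij} = 0$ to $t_i$, matching the two cases of (1). The product $\delta_{ij} w_j$ is again a product of a 0-1 variable and an integer variable.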
The nonlinear terms in this formulation can be linearized as in Section 3, and the resulting ILP model can be solved easily to obtain optimal width distribution and test bus assignment. We solved the ILP model for $S_2$, and the results in Table V indicate that significant reductions in testing time are achieved, especially for higher test widths. The ILP formulation also allows us to sometimes decrease the test width of the cores in the system, i.e., the number of lines that connect the cores to their respective test buses, without increasing the overall testing time. For example, if core $i$ is connected to test bus $j$ of width $w_j$, then even though $w_j$ lines are available for propagating test data to and from core $i$, it is not always necessary to use all these lines. This is especially the case if bus $j$ is not the test bottleneck. In such situations, $b_i < w_j$ lines connect core $i$ to test bus $j$. We refer to $b_i$ as the test width of core $i$. The motivation for using smaller core test width lies in the reduction of routing and interconnect area for SOC testing.
We allow $b_i$ to be less than $w_j$ in our ILP model by introducing the constraint $x_{ij} b_i \le w_j$, which implies that the test width for core $i$ may be less than the width of test bus $j$ if core $i$ is assigned to bus $j$. We also replace $w_j$ by $b_i$ in the expression for $T_{ij}$. The product term $x_{ij} b_i$ does not pose a problem, since it is linearized for both P1 and P2. Table VI shows how the test widths of the cores in $S_2$ can be reduced without increasing the system testing time.
Table IV (fragment).
W    Test bus assignment          Testing time   % over optimum   Testing time with Table II assignment   % over optimum
24   (2,2,2,2,2,1,1,1,1,2)        2383808        4.6              2573397                                 12.9
32   (2,2,2,1,1,1,2,1,1,2,2 ...
OPTIMAL SUBDIVISION OF TEST BUSES
In this section we allow the width of the test buses to be distributed among several buses with smaller widths. This allows the $w_j$ bits of test bus $j$ to be divided into several parts, each of which can test one or more cores in parallel (see Figure 6, for example). For a given total test width, the subdivision of test buses allows further reductions in the testing time. We first make the simplifying assumption that the width $w_j$ for each test bus $j$ is known. Later we extend our model to the case where the widths are not known and optimal widths have to be determined. The optimization problem being considered here is stated formally below.
• P4: Given $N_C$ cores, $N_B$ test buses with known widths $w_1, w_2, \ldots, w_{N_B}$, respectively, and an upper limit $j_{\max}$ on the number of subdivisions allowed for test bus $j$, $1 \le j \le N_B$, determine (i) an optimal subdivision of test bus widths, and (ii) an optimal assignment of cores to test buses such that the total testing time is minimized.
Note that the decision version of P4 can also be shown to be NP-complete using the method of restriction. A restriction of P4 to P1 is achieved by setting $j_{\max} = 1$, $1 \le j \le N_B$.
Let $x_{ij}$ be a 0-1 variable as defined in Section 2. Test bus $j$ can be divided into a maximum of $j_{\max}$ parts, each part serving as a test bus for a subset of cores in the system. Suppose that these parts have widths $w_{j1}, w_{j2}, \ldots, w_{j j_{\max}}$, respectively, such that $\sum_{k=1}^{j_{\max}} w_{jk} = w_j$.
Let $y_{ijk}$ be a 0-1 variable defined as follows:
$$y_{ijk} = \begin{cases} 1, & \text{if core } i \text{ is assigned to the } k\text{th part of bus } j \\ 0, & \text{otherwise} \end{cases}$$
The following constraint follows directly from the definitions of the 0-1 variables. It denotes the fact that a core is either assigned to a test bus with its complete width or to a portion of a test bus (with reduced width):
$$\sum_{j=1}^{N_B} \Bigl( x_{ij} + \sum_{k=1}^{j_{\max}} y_{ijk} \Bigr) = 1, \quad 1 \le i \le N_C.$$
If core $i$ is assigned the entire width of bus $j$, then its testing time is $(\phi_i - w_j + 1)\,t_i$. (We assume as before that $\max_j\{w_j\} \le \min_i\{\phi_i\}$.) On the other hand, if it is assigned to the $k$th test bus derived from bus $j$, then its testing time is $(\phi_i - w_{jk} + 1)\,t_i$. The cost function (system testing time) $C$ can now be expressed in terms of the above parameters.
The right-hand side of the above equation is nonlinear, but it can be linearized as before by a sequence of transformations. Let
The cost function can then be expressed as $C = \max_{j \in \{1, 2, \ldots, N_B\}} (C_{1j} + C_{2j})$ and the optimization problem can be formulated as minimize $C$, subject to
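One reading of the two components, consistent with the per-core testing times given above, is sketched below. It assumes that the cores using the full width of bus $j$ are tested one after another, that the branches of a forked bus operate in parallel, and that the cores on each branch are tested sequentially; the definitions of $C_{1j}$ and $C_{2j}$ are therefore assumptions of this sketch.

```latex
\begin{align*}
C_{1j} &= \sum_{i=1}^{N_C} (\phi_i - w_j + 1)\, t_i\, x_{ij}, \\
C_{2j} &= \max_{1 \le k \le j_{\max}} \sum_{i=1}^{N_C} (\phi_i - w_{jk} + 1)\, t_i\, y_{ijk}, \\
C &= \max_{1 \le j \le N_B} \left( C_{1j} + C_{2j} \right).
\end{align*}
```

When the branch widths $w_{jk}$ are themselves decision variables, the term $w_{jk}\,y_{ijk}$ is a product of an integer variable and a 0-1 variable, which is the kind of term handled by the linearization of Section 3.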
We next linearize constraint (1). Note that C 1j Ն iϭ1
can be linearized by adding a nonbinary, integer variable $r_{ijk}$ for each $i$, $j$, $k$, and adding three constraints as in the case of P2 in Section 3. This yields the ILP model for P4 shown in Figure 10.
We next generalize P4 to the case where the widths of the test buses also need to be optimally determined. The formal statement of this problem is given below:
• P5: Given $N_C$ cores, $N_B$ test buses with total width $W$, and an upper limit $j_{\max}$ on the number of subdivisions allowed for test bus $j$, $1 \le j \le N_B$, determine (i) an optimal width for each test bus, (ii) an optimal subdivision of the width of every test bus, and (iii) an assignment of cores to test buses such that the total testing time is minimized.
The decision version of P5 can also be shown to be NP-complete by restricting it to P2. This is achieved by imposing the restriction $j_{\max} = 1$, $1 \le j \le N_B$. The ILP model for P5, shown in Figure 11, is obtained by combining the ILP models for P2 and P4.

Finally, we present experimental results on solving the optimization problems P4 and P5. We considered $S_1$ and $S_2$ with two test buses (1 and 2), and we modeled the situation where the first test bus can fork into at most two branches (1a and 1b). The objective of this set of experiments was twofold: (i) to demonstrate that P4 (P5) provides lower testing time than P1 (P2), and (ii) to show that even nonoptimal solutions for P5 provide lower testing time than P2.
We first return to the example based on $S_1$, which we presented in Section 2 to illustrate P1. For this example, $w_1 = 32$, $w_2 = 16$, and an optimal testing time of 411,884 cycles was obtained using P1. By allowing the 32-bit bus to fork into two branches of 27 bits (1a) and 5 bits (1b), we achieve a reduced, but nonoptimal, testing time of 409,472 cycles using the test bus assignment (2,2,2,2,2,2,2,2,1b,1a).
Unfortunately, lpsolve did not run to completion in every case when we attempted to solve P4 and P5. Nevertheless, we allowed it to run for up to two hours, after which we tabulated the best solution obtained. These results (optimal and nonoptimal) for P5 are presented in Table VII. The experimental results show that the added flexibility of allowing test buses to be subdivided can reduce the testing time significantly, especially for an SOC such as $S_2$. Note also that, for $S_1$, subdivision provides the same minimum testing time with a 36-bit width as with a 44-bit width. Figure 12 illustrates an optimal test access architecture based on P5 for $S_1$ when the total width is 36 bits and at most one subdivision of the first test bus is allowed.
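The effect of forking can be reproduced on a toy instance by exhaustive search. The sketch below is not the ILP model and does not use the data for $S_1$ or $S_2$: the core parameters `PHI` and `T` are hypothetical, the bus widths are fixed as in P4, only the first bus may fork, and the branches of a forked bus are assumed to run in parallel while cores sharing a bus or branch are tested sequentially.

```python
from itertools import product

PHI = [100, 90, 80, 60]   # hypothetical core test widths
T = [10, 12, 20, 25]      # hypothetical per-core time multipliers t_i
W1, W2 = 32, 16           # fixed widths of bus 1 and bus 2 (P4-style)


def core_time(i, width):
    """Testing time (phi_i - width + 1) * t_i; the formula needs width <= phi_i."""
    assert 1 <= width <= PHI[i]
    return (PHI[i] - width + 1) * T[i]


def cost(assign, split=None):
    """System testing time for assign[i] in {"1", "1a", "1b", "2"}.

    split = (width of 1a, width of 1b); it is only needed when a core uses a branch.
    """
    width = {"1": W1, "2": W2}
    if split is not None:
        width["1a"], width["1b"] = split
    load = {"1": 0, "1a": 0, "1b": 0, "2": 0}
    for i, target in enumerate(assign):
        load[target] += core_time(i, width[target])
    # ASSUMPTION: branches 1a and 1b run in parallel, so only the slower one
    # adds to the time spent on the full-width cores of bus 1.
    return max(load["1"] + max(load["1a"], load["1b"]), load["2"])


def best_without_fork():
    """Exhaustive P1-style search: every core uses bus 1 or bus 2 at full width."""
    return min((cost(a), a) for a in product(("1", "2"), repeat=len(PHI)))


def best_with_fork():
    """Exhaustive P4-style search over every split of bus 1 and every assignment."""
    best = None
    for w_a in range(1, W1):
        split = (w_a, W1 - w_a)
        for a in product(("1", "1a", "1b", "2"), repeat=len(PHI)):
            candidate = (cost(a, split), a, split)
            if best is None or candidate < best:
                best = candidate
    return best


if __name__ == "__main__":
    print("no fork:", best_without_fork())
    print("fork   :", best_with_fork())
```

Because every unforked assignment is also a candidate in the forked search, the forked optimum can never be worse than the unforked one under this cost model. Enumeration of this kind is only feasible for a handful of cores, which is why an ILP solver is used for the real instances.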
Fig. 12. An optimal test bus architecture for $S_1$ with two test buses, total width of 36 bits, and only one subdivision allowed for the first test bus.

CONCLUSIONS
We have presented a formal methodology for designing optimal test access architectures for testing SOC designs. In doing so, we have attempted to provide a formal basis for comparing the several ad hoc test access architectures proposed in the literature. The methodology allows designers to explore design options and make appropriate choices. We examined several problems related to the design of optimal test architectures, including assigning cores to test buses, distributing a given test data width among multiple test buses, and determining the amount of test data width required to satisfy an upper bound on testing time. We have shown that even though the decision versions of these design problems are NP-complete, they can be modeled efficiently using integer linear programming for practical instances. We applied these models to two nontrivial core-based systems and solved them using a standard software package available in the public domain. We are currently extending the ILP models to incorporate routing and additional power constraints, and recently reported initial results in this direction [Chakrabarty 2000b].
Our results give rise to a number of useful extensions and new directions for further research, summarized below.
• The ILP models need to be generalized to handle test access architectures of the type shown in Figure 1, where a test bus may fork, but not necessarily merge.
• Test access architectures may also be designed hierarchically. Hence, ILP models should be able to handle hierarchical compositions, where complex cores embed one or more simple cores. Moreover, P4 and P5 should be extended to handle recursive subdivision of the test buses.
• The ILP model descriptions that we have used in our experiments are problem-specific, i.e., they are described in a format specific to the problem instance and to the lpsolve program. This is a cumbersome process. It is far more convenient to use high-level languages such as AMPL [Fourer 1993] and GAMS [GAMS Development Corporation 1993] that allow the model to be described in a parameterized form, independent of the ILP solver and the input data used for a specific instance of the model.
• Finally, significant advances have been made in recent years in solving nonlinear integer programs, and a number of these solvers are now readily available, e.g., through the Argonne National Laboratory <http://www.mcs.anl.gov/otc/Server/neos.html>. We are examining the feasibility of using such nonlinear solvers for designing optimal test access architectures.
