Abstract: Testing of embedded core based system-on-chip (SoC) ICs is a well-known problem, and the upcoming IEEE P1500 Standard on Embedded Core Test (SECT) proposes DfT solutions to alleviate it. One of the proposals is to provide every core in the SoC with a test access wrapper. Previous approaches to the problem of wrapper design have proposed static core wrappers, which are designed for a fixed test access mechanism (TAM) width. We present the first report of a design of reconfigurable core wrappers, which allow a dynamic change in the width of the TAM executing the core test. Analysis of the corresponding scheduling problem indicates that good approximate schedules can be achieved without significant computational effort. Specifically, we derive an $O(N_C^2 B)$ time algorithm which can compute near-optimal SoC test schedules, where $N_C$ is the number of cores and $B$ is the number of top-level TAMs. Experimental results on benchmark SoCs are presented which improve upon integer programming based methods, not only in the quality of the schedule but also with significantly reduced computation time.
I. INTRODUCTION
The widening design productivity gap between VLSI manufacturing capabilities and engineering ability, in a limited time-to-market scenario, has prompted many design groups to adopt a policy of design reuse at the core level [13]. Typical cores include CPUs like MIPS and ARM, network controllers, embedded memories, digital signal processing (DSP) cores, and associated peripherals like Ethernet controllers and UARTs. As has been noted in [20], reusability of design alone is not sufficient, as the verification and test generation efforts now dominate the typical design time. Reusability of tests is, therefore, crucial for reducing total design time. This raises the problems of test knowledge transfer and physical test application. The proposed IEEE P1500 Standard on Embedded Core Test (SECT) [8], [19] provides facilities for test knowledge transfer using the core test language (CTL) [12] and has suggested various core wrapper designs for providing test access to embedded cores. The core wrapper is a thin shell around the design proper which connects the test access mechanism (TAM) to the core and provides support for switching the core between normal functional use and test access.
The complete problem of core test application can be divided into three subproblems: 1) the TAM design problem [5]: as the number of test pins available at the IC level is constrained, these test pins must be partitioned optimally so as to reduce the total test cost (test cost in this paper refers to the test application time on the automatic test equipment (ATE)); 2) the wrapper design problem [15]: given a particular TAM width, how should the core-level scan chains be connected so as to reduce the length of the longest scan chain for that core; and 3) the test scheduling problem [4]: given a set of tests and test resources such as TAMs (many of which are shared between cores), determine the TAM assignment to execute the tests of each core so as to reduce the total SoC test time.
Previous work in the area of core wrapper design has focused on static TAM partitions, i.e., the number of TAM bits (the TAM width) assigned to a core is fixed.
Recent years have seen a lot of activity in the area of wrapper and test access architecture design (cf. [3], [6], [9], [15], [18], [20]). Varma and Bhatia [18] proposed a dedicated test bus to transport the test vectors and responses. Their method is referred to as FScan-TBus, and it uses full scan for core-level testability; a test bus is added to the SoC which is responsible for transporting the test vectors. Another bus-based method is presented by Zorian, Marinissen, and Dey [20]. Ghosh, Dey, and Jha present a method [6] which can use transparent paths in the cores. The method assumes that core providers can provide different versions of the core having varying test access latency, and that the test integrator can then make a good choice of which core to use for his particular SoC. In practice, however, it is difficult to customize a core with respect to DfT for multiple instances of delivery.
The problem of efficient wrapper design was first investigated in detail by Marinissen, Goel, and Lousberg in [15]. They showed that the scan chain partitioning problem (PSC) is NP-hard by reducing the multiprocessor job scheduling problem to PSC. They presented several heuristic solutions based on bin-packing and bin-design problems. Chakrabarty [3] has approached the TAM optimization problem with place and route constraints by formulating it as an integer linear program (ILP). Recently, Iyengar et al. [9] proposed ILP-based methods to solve the TAM and wrapper co-optimization problem.
All previous approaches have designed test wrappers around cores assuming a static TAM width. The proposed IEEE P1500 standard describes a core wrapper that is designed with the width of the external TAM fixed at a value known in advance. This wrapper design allows two modes: parallel loading of test patterns using $w_i$ TAM bits, or serial access. This is clearly suboptimal, as a core can have a variety of tests, each with a different bitwidth requirement. As an example, a built-in self-test (BIST) typically requires application of only a few parallel patterns from the TAM and, hence, has a low bitwidth requirement. Compare this to a scan test, which has a high bitwidth requirement. The P1500 standard does not describe the SoC-level test access mechanism structures which will be connected to the core; as a consequence, the core provider does not have any means to optimize his core delivery for varying test scenarios. The choice of the number of core-internal scan chains exposed at the wrapper level is a static one, and cannot be changed dynamically by the SoC integrator (of course, the SoC integrator may be able to reduce the number even further by chaining multiple internal chains together).
1063-8210/03$17.00 © 2003 IEEE
Assigning a core a wider TAM than it requires can increase the SoC test time, as some other core could have benefited from the wider TAM. It is more efficient if the wrapper can adjust to the bitwidth requirement dynamically; different tests of the same core can then be executed on TAMs of different widths. When a core wrapper is designed in such a way that it admits the formation of balanced internal and surround scan chains for more than one TAM width, the wrapper is said to be reconfigurable. The number of different TAM widths which can be admitted is called the degree of reconfigurability. Dynamic reconfiguration is the process by which a core wrapper is reconfigured by special DfT hardware to admit a different TAM width, while still maintaining balanced scan chains.
III. TEST TIME CALCULATION FOR WRAPPED CORES
Let us denote the test time calculator function as $F_T(T_i, w_i)$, where $T_i$ is a test and $w_i$ is the width of the TAM assigned to execute this test. For scan tests, $F_T$ depends on the number of scan patterns and the length of the longest scan chain in the core. Aerts and Marinissen [1] provide a simple expression to compute the test time for a core test $T_i$ when executed on a TAM of width $w_i$. From [1] we can derive, for each test $i$, a limiting TAM width.
Informally, this limiting width is the TAM width value beyond which there is no decrease in the test time for test $i$.
We also associate with each core test a processing speed function (or speedup), which describes the speedup achieved by executing the core test on $w_i$ TAM wires. Let us denote this speed function for a test $i$ with a TAM width $w_i$ by $g_i(w_i) \geq 0$. If $w_i$ bits are assigned to execute test $i$, then the test executes in time $F_T(T_i, 1) / g_i(w_i)$, where $g_i(w_i)$ is a nondecreasing processing speed function defined for every integer $w_i$ from 0 up to the limiting width of test $i$, with $g_i(0) = 0$.
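The relation between test time and speedup stated above can be illustrated with a minimal Python sketch. It assumes that the test times per TAM width are already available as a table; the numbers below are hypothetical and are not taken from the paper, and the helper names `speedup_table` and `limiting_width` are introduced here only for illustration.

```python
from typing import Dict


def speedup_table(test_time: Dict[int, float]) -> Dict[int, float]:
    """Return g(w) = F_T(T, 1) / F_T(T, w) for every width w, with g(0) = 0."""
    base = test_time[1]
    g = {0: 0.0}
    best = float("inf")
    for w in sorted(test_time):
        # Assumption: a running minimum is used so that g stays nondecreasing
        # even if the raw table is not monotone.
        best = min(best, test_time[w])
        g[w] = base / best
    return g


def limiting_width(test_time: Dict[int, float]) -> int:
    """Smallest TAM width beyond which the test time no longer decreases."""
    best = min(test_time.values())
    return min(w for w, t in test_time.items() if t == best)


# Hypothetical test times in clock cycles, indexed by TAM width.
times = {1: 100, 2: 60, 3: 40, 4: 40}
g = speedup_table(times)
w_limit = limiting_width(times)
```

With this table, the test time on `w` wires is recovered as `times[1] / g[w]`, which mirrors the expression $F_T(T_i, 1) / g_i(w_i)$.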
Using some recent results from the field of malleable parallel task scheduling (MPTS) as presented by Blazewicz, Kovalyov, Machowiak, Trystram, and Weglarz [2], we can analyze the SoC test time as follows. We can represent a core test as $dx_i/dt$
Example 1: Given the core test execution time data for the two cores in Table I, calculate the optimal schedule makespan when the SoC containing both cores is tested with a top-level TAM width of $W = 4$ bits.
Conventional scheduling models to solve the embedded core test scheduling problem (ECTSP) above would first create a partition of the top-level TAM width $W$ into candidate subTAMs, and then solve the resulting minimum makespan assignment problem sequentially. One such framework is described in [9], which formulates the ECTSP as an integer linear programming problem. Since $N_C = 2$ and $W = 4$ in the above example, only two candidate partitions into subTAMs have to be analyzed, namely $\{[1, 3], [2, 2]\}$. The core assignments yield makespans of 100, 80, and 80 clock cycles for the three cases in which one core is tested on a one-bit subTAM, the other core is tested on a one-bit subTAM, and both cores are tested on two-bit subTAMs, respectively. The optimal TAM partitions and core assignments are therefore those for which the makespan is equal to 80 clock cycles.
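The enumeration of candidate TAM partitions can be sketched in Python as follows. This is only an illustration of the partitioning step, not the ILP formulation of [9]; the function name `tam_partitions` is hypothetical, and partitions are treated as unordered, with every subTAM at least one bit wide.

```python
from typing import Iterator, Tuple


def tam_partitions(total_width: int, parts: int,
                   smallest: int = 1) -> Iterator[Tuple[int, ...]]:
    """Yield all ways to split total_width into `parts` subTAM widths.

    Widths are generated in nondecreasing order, so each unordered
    partition is produced exactly once.
    """
    if parts == 1:
        if total_width >= smallest:
            yield (total_width,)
        return
    # The first (smallest) width cannot exceed the average remaining width.
    for first in range(smallest, total_width // parts + 1):
        for rest in tam_partitions(total_width - first, parts - 1, first):
            yield (first,) + rest


# W = 4 bits split over N_C = 2 subTAMs, as in Example 1.
candidates = list(tam_partitions(4, 2))
```

For the example, `candidates` holds the two partitions (1, 3) and (2, 2). The number of partitions grows quickly with the TAM width and the number of subTAMs, which is the growth that the static approach has to cope with.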
The ILP formulation of [9] uses the above described scheme, which is exponential both in the number of partitions (even ignoring cyclically isomorphic partitions) and the possible assignments. Moreover, the optimal makespan (80 clock cycles) can be improved using dynamically reconfigured TAM assignments as shown below.
We represent the core test time function as a piecewise linear function and calculate the speedup functions $g_1$ and $g_2$. Using $g_1$ and $g_2$, we can compute $f_1$ and $f_2$, and solve the following equation:
For Example 1, we get $w_1 \approx 2.5$ and $w_2 \approx 1.5$. We also compute the makespan $C_{\max} = C_1 = C_2 \approx 70$ clock cycles. Instead of using a discrete TAM assignment model, we have computed the optimal schedule with continuous TAM bit assignment. Of course, a TAM width of 2.5 is not realizable in hardware. Hence, we compute two TAM assignments which approximate the optimal assignment of (2.5, 1.5) bits; these are (2, 2) and (3, 1), respectively. We then create two linear variables $\lambda_1$ and $\lambda_2$, the fractions of the makespan spent in each of these assignments, and solve the following equations:
$$2\lambda_1 + 3\lambda_2 = 2.5, \qquad 2\lambda_1 + \lambda_2 = 1.5$$
Solving for $\lambda_1$ and $\lambda_2$ yields $\lambda_1 \approx 0.5$ and $\lambda_2 \approx 0.5$. We also compute the processing intervals $\Delta t_1 = C_{\max} \lambda_1$ and $\Delta t_2 = C_{\max} \lambda_2$, giving $\Delta t_1 = 35$ clock cycles and $\Delta t_2 = 35$ clock cycles, respectively. Combining the above information, the schedule for the SoC can be output:
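The rounding of a fractional TAM width into two integral widths that share the makespan can be sketched as follows. This is a minimal illustration under the assumption that only the two neighboring integer widths are used; the function name `time_share` is hypothetical.

```python
import math
from typing import List, Tuple


def time_share(frac_width: float, makespan: float) -> List[Tuple[int, float]]:
    """Split a fractional TAM width into (integer width, duration) pairs.

    The two durations are chosen so that the time-weighted average of the
    integer widths equals frac_width and the durations sum to makespan.
    """
    lo = math.floor(frac_width)
    hi = lo + 1
    share_hi = frac_width - lo      # fraction of time spent on the wider TAM
    share_lo = 1.0 - share_hi       # fraction of time spent on the narrower TAM
    plan = [(lo, share_lo * makespan), (hi, share_hi * makespan)]
    # Drop empty intervals (an integral width needs no reconfiguration).
    return [(w, t) for w, t in plan if t > 0]


# Values from Example 1: widths 2.5 and 1.5 bits, makespan 70 clock cycles.
plan_core1 = time_share(2.5, 70)
plan_core2 = time_share(1.5, 70)
```

For Example 1 this gives 35 clock cycles on each of the two widths for both cores. In the combined schedule, the interval in which one core uses three bits is paired with the interval in which the other core uses one bit, so that the total never exceeds $W = 4$.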
The first core is assigned three TAM bits for the first 35 clock cycles and two TAM bits for the next 35 clock cycles. The second core is assigned one TAM bit for the first 35 clock cycles and two TAM bits for the next 35 clock cycles. A graphical representation of this schedule, with varying TAM bit assignments, is shown in Fig. 1. The time $\Delta t_{recon}$ is the time consumed by the reconfiguration itself; we show in Section IV that, using test control mechanisms, this overhead is very small in comparison to the core test time itself.
As can be seen, the schedule calls for two different TAM width assignments to the same core at different time instances. For optimum test time, the core-internal scan chains should be able to form a balanced group for both of these TAM widths. In Section IV, we describe our proposed design of a reconfigurable core wrapper which allows for a dynamic change in the width of the TAM assigned to the core.
IV. DESIGN OF RECONFIGURABLE CORE WRAPPERS
We propose to design reconfigurable core test wrappers using multiplexers at the input and output of each reconfigurable scan chain. The set of reconfigurable scan chains can be calculated using graph theoretic methods as explained below. We first describe the proposed method with the help of an example. Consider a core A which has a total of six scan chains of lengths $\{12, 5, 17, 9, 14, 4\}$, respectively. We calculated partitions of these scan chains for some TAM widths. These partitions are shown in Table II. The objective used to compute these partitions is to minimize the length of the longest scan chain, i.e., $\min L^{w}_{\max}(A)$.
We can also represent each of these partitions by means of directed graphs. We define $G = (V \cup I, E)$ to be a directed graph such that for each scan chain $S_i$ there is a vertex $i$ in $G$ with label $l(S_i)$, where $l(S_i)$ is the length of the scan chain. $I$ denotes the input/output TAM for the core and is assumed to have infinite length. Hence, $l(I) = \infty$.
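A partition of the six scan chains of core A for a given TAM width can be sketched with a simple greedy heuristic. The code below uses the longest-processing-time rule, which is an assumption made here for illustration; it is not claimed to be the method used to produce Table II, and the function names are hypothetical.

```python
from typing import List, Sequence


def partition_scan_chains(lengths: Sequence[int],
                          tam_width: int) -> List[List[int]]:
    """Greedily assign scan chains to tam_width wrapper chains.

    Chains are taken in decreasing order of length and each one is placed
    on the wrapper chain that is currently shortest.
    """
    groups: List[List[int]] = [[] for _ in range(tam_width)]
    loads = [0] * tam_width
    for length in sorted(lengths, reverse=True):
        k = loads.index(min(loads))   # currently shortest wrapper chain
        groups[k].append(length)
        loads[k] += length
    return groups


def longest_wrapper_chain(groups: List[List[int]]) -> int:
    """Length of the longest wrapper chain, the quantity to be minimized."""
    return max(sum(group) for group in groups)


core_a = [12, 5, 17, 9, 14, 4]
longest = {w: longest_wrapper_chain(partition_scan_chains(core_a, w))
           for w in (1, 2, 3, 4)}
```

For core A the heuristic gives longest wrapper chains of 61, 31, 21, and 17 for TAM widths 1 to 4. Beyond width 4 no further reduction is possible, because the longest internal chain already has length 17.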
Two vertices are said to be comparable iff they are in the same partition.
The special vertex $I$ is assumed to be present in every partition. Then, the labels of the vertices which are comparable induce a partial order on $V \cup I$. The vertex set of $G$ forms a partially ordered set (poset). A pairwise comparable subset of a poset is called a chain. A path from $I$ to any other vertex forms a chain. We add an edge $u \to v$ in $G$ for $u, v \in V$ such that $l(u) \geq l(v)$ and $P(u) = P(v)$, i.e., $u$ precedes $v$ in the chain, where $P(i)$ denotes the partition number of vertex $i$. Informally, we add an edge between two vertices iff they are in the same partition, and the direction of the edge is from the vertex representing the longer scan chain to the next shorter one in the poset. For each TAM width instance we create such a graph. Let the graph for TAM width $i$ be $G_i$. Example graphs for core A with TAM widths 1 and 4 are shown in Fig. 2. The shaded vertices are the scan chains that are also to be connected to the output TAM.
We form the combined directed graph representation for a core by taking the union of all directed graphs of the core. Formally, $G^* = \bigcup_i G_i$. The combined graph is shown in Fig. 3(a).
The combined directed graph representation captures the partitioning requirement per core. The indegree of any vertex $i$ in $G^*$ represents the number of signals that must be multiplexed into that scan chain. We can make some general observations about $G^*$. First, the longer a scan chain (or the higher the label of the vertex), the less it needs to be multiplexed (muxed). Second, the longest chain of the core need not be muxed at all. This statement is easily proved: in any graph $G_k$, for every TAM width $k$ considered, there always exists one chain (vertex) which is connected only to $I$. This is the longest scan chain. Given the combined directed graph $G^*$ of a core, we can derive the scan chain access structure as follows. We design our architecture using multiplexers to switch the chain configuration. We observe that the number of select signals for each scan chain $i$ is $\lceil \lg \mathrm{indegree}(i) \rceil$.
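The construction of the combined graph and the resulting multiplexer requirement can be sketched as follows. The partitions used for TAM widths 1 and 4 are assumptions made for illustration (Table II is not reproduced here), and the chain names S1 to S6 and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

TAM = "I"  # special vertex for the input/output TAM, treated as infinitely long

Edge = Tuple[str, str]


def chain_edges(groups: List[List[str]], length: Dict[str, int]) -> Set[Edge]:
    """Edges of one per-width graph: I -> longest -> next shorter, per group."""
    edges: Set[Edge] = set()
    for group in groups:
        prev = TAM
        # Longer chains come first; the name breaks ties deterministically.
        for name in sorted(group, key=lambda n: (-length[n], n)):
            edges.add((prev, name))
            prev = name
    return edges


def combined_graph(partitions: Dict[int, List[List[str]]],
                   length: Dict[str, int]) -> Set[Edge]:
    """Union of the per-width graphs over all TAM widths considered."""
    edges: Set[Edge] = set()
    for groups in partitions.values():
        edges |= chain_edges(groups, length)
    return edges


def indegrees(edges: Set[Edge]) -> Dict[str, int]:
    """Number of distinct signals that must be muxed into each scan chain."""
    preds: Dict[str, Set[str]] = {}
    for u, v in edges:
        preds.setdefault(v, set()).add(u)
    return {v: len(p) for v, p in preds.items()}


def select_bits(indegree: int) -> int:
    """ceil(lg indegree) computed with integer arithmetic."""
    return (indegree - 1).bit_length()


# Scan chains of core A; the partitions below are assumed, not from Table II.
length = {"S1": 12, "S2": 5, "S3": 17, "S4": 9, "S5": 14, "S6": 4}
partitions = {
    1: [["S1", "S2", "S3", "S4", "S5", "S6"]],
    4: [["S3"], ["S5"], ["S1", "S6"], ["S4", "S2"]],
}
indeg = indegrees(combined_graph(partitions, length))
mux_bits = {name: select_bits(d) for name, d in indeg.items()}
```

In this sketch the longest chain S3 has indegree 1 and needs no select signal, which agrees with the observation that the longest chain need not be muxed.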
V. SOC TEST SCHEDULING
Previous work in the area of test scheduling can be found in [4], [9], and [14]. In [4], Chakrabarty analyzed the test scheduling problem by reducing it to open-shop scheduling with m processors. Recent improvements to this formulation have been presented by Iyengar, Chakrabarty, and Marinissen in [9]. The authors give an ILP formulation of the test scheduling problem for a fixed partition. The approach is then extended to variable-width partitions using ILP. A graph theoretic formulation of test scheduling has been presented by the author in [14], where the scheduling problem is solved using a single-source unsplittable flow algorithm.
A. Constrained Reconfiguration Scheduling
Designing reconfigurable wrappers for small cores might be expensive in terms of area and routing. A natural question is whether it is possible to solve ECTSP when only some of the cores have reconfigurable wrappers. In this section, we analyze the case when only one of the cores (to be more precise, one of the larger cores) has a reconfigurable wrapper. We will show that, under a reasonable assumption on the size of the core, close to optimal schedules can be found.
Assume one of the tests is malleable, i.e., it can be dynamically reas- 
It is easy to see that if the area $A_m$ of the malleable test is at least the idle area left on the $W$ TAM wires by the other tests, then $C_{\max}(Z) = \sum_{i \in Z} A_i / W$, that is, there is no idle time on the TAM wires (see Fig. 4). Using this result, we can construct an algorithm, ECTSP1SOL, which tries each malleable test (in decreasing order of area) as a candidate for the last addition. For any malleable test, if the area condition is met, then we have found a schedule with zero idle time on the TAM wires. Barring the increase in the area of a test rectangle due to nonlinear speedup, this schedule is optimal. The remaining $N_C - 1$ cores have to be packed in an LB fashion (borrowing a term from the floorplanning literature, an LB packing is a packing of modules such that no module can be moved to the left or bottom without moving other modules). For ECTSP, such an algorithm has been presented earlier by the author in [14], which formulates minimum makespan assignment as a network transportation problem. It was shown in [14] that an LB schedule can be computed in time $O(N_C^2 B)$. The $N_C - 1$ cores are arranged using this method, which we call TFLOW.
In order to compute the SoC makespan after the addition of the core with the reconfigurable wrapper, we can write a procedure mhe (for Manhattan Evaluate), as the completion times for all TAMs form a Manhattan-like shape. The procedure mhe evaluates such a Manhattan shape for a given core. This is shown in Fig. 5. The Manhattan shape is given as a list; for Fig. 5 this is $((w_1, T_2), (w_3, T_1))$. It is easy to see that this implies that $w_1 + w_3$ bits are idle for time $C_{\max} - T_2$, and $w_3$ bits are idle for time $T_2 - T_1$.
Using the core test data, we can compute how many patterns will be executed for such TAM widths. If we denote the total number of idle TAM bits in an interval $\Delta T_i$ by $\Delta w_i$, then the total number of patterns consumed is
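The evaluation of the Manhattan shape can be sketched as follows. This is an illustrative reading of the idle intervals described above, not the original implementation of mhe; the function names and the numeric values are hypothetical.

```python
from typing import List, Tuple


def manhattan_idle(shape: List[Tuple[int, float]],
                   c_max: float) -> List[Tuple[int, float]]:
    """Convert a Manhattan shape into (idle bits, duration) intervals.

    `shape` lists (TAM width, completion time) pairs. TAMs that finish
    early stay idle until c_max, so the number of idle bits grows each
    time another TAM completes.
    """
    steps = sorted(shape, key=lambda step: step[1])   # earliest finish first
    intervals: List[Tuple[int, float]] = []
    idle_bits = 0
    for k, (width, finish) in enumerate(steps):
        idle_bits += width
        end = steps[k + 1][1] if k + 1 < len(steps) else c_max
        if end > finish:
            intervals.append((idle_bits, end - finish))
    return intervals


def idle_area(intervals: List[Tuple[int, float]]) -> float:
    """Total idle area (bits times clock cycles) available on the TAM wires."""
    return sum(bits * duration for bits, duration in intervals)


# Hypothetical values: w1 = 2 bits finish at T2 = 60, w3 = 1 bit at T1 = 40.
intervals = manhattan_idle([(2, 60), (1, 40)], c_max=100)
area = idle_area(intervals)
```

With these values, one bit is idle for 20 clock cycles and three bits are idle for 40 clock cycles, matching the pattern of $w_3$ bits idle for $T_2 - T_1$ and $w_1 + w_3$ bits idle for $C_{\max} - T_2$.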
B. Analysis of ECTSP1SOL
The call to TFLOW takes $O(N_C^2 B)$ time.
VI. IMPLEMENTATION DETAILS AND EXPERIMENTAL RESULTS
The algorithms presented in this paper were implemented in Common Lisp, running on an Intel Pentium Celeron 1.2 GHz machine with 768 MB RAM running Linux. We present the results of our experiments on two SoCs of the ITC'02 SoC Benchmark Suite [16] in Table IV. The names of the SoCs are a measure of their test complexity. SoC d695 is an academic SoC from Duke University, Durham, NC, consisting of ISCAS benchmark circuits. The other benchmark SoC, p93791, is an industrial IC from Philips. All the cores were assumed to be individually tested blocks, and no hierarchy was used. Functional inputs, outputs, and inouts were assumed to be equivalent to scan chains of unit length.
The test planner software we have written accepts SoC data in the ITC'02 benchmark format [16], and proceeds to create an optimized TAM architecture for a user-specified top-level TAM width. It then uses this architecture to schedule each of the core tests, and reports the makespan (the schedule time) in clock cycles. The software also prints useful diagnostic information and graphical plots. A number of experiments for varying $W$ were conducted, and the results are shown below. For comparison, we list alongside the results reported by Iyengar et al. in [9]. We compare the schedules with the rectangle-packing based methods proposed by Iyengar et al. [11] in the third column. For the comparison with test-rail based methods, we have compared our results against those presented by Goel and Marinissen in [7]. For an empirical comparison, we also show a lower bound for each schedule; the lower bound was computed using the relation
The columns labeled $\Delta T$ show the relative deviation, in percent, from the computed lower bound for each of the methods: $\Delta T = ((T - LB)/LB) \times 100$. In the last column, we present the schedule makespan obtained using our proposed method with reconfigurable wrappers (ECTSP1SOL). In the case of ECTSP1SOL, cores s38584 and c6 were chosen to have reconfigurable wrappers for SoC d695 and p93791, respectively. An important factor contributing to the efficiency of ECTSP1SOL is the low dependence of the run time on the top-level TAM width, $W$. The results for schedule makespan are in close agreement with the enumerative method of [9] and improve upon the integer linear programming method. The reconfiguration time in these experiments has been assumed to be zero, as by using core-level test control mechanisms (TCMs), the reconfiguration overhead can be eliminated; a TCM can change the test mode of a core without disturbing the ongoing test of surrounding logic [17].
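The relative deviation from the lower bound can be computed as in the short sketch below. The function name is hypothetical and the numbers in the usage line are illustrative only, not values from Table IV.

```python
def deviation_from_lower_bound(makespan: float, lower_bound: float) -> float:
    """Relative deviation of a schedule makespan from the lower bound, in percent."""
    if lower_bound <= 0:
        raise ValueError("lower bound must be positive")
    return 100.0 * (makespan - lower_bound) / lower_bound


# Illustrative values only: a makespan 10% above the lower bound.
delta_t = deviation_from_lower_bound(110_000, 100_000)
```

A value of zero means that the schedule meets the lower bound exactly; larger values indicate more idle time on the TAM wires.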
A. Discussion of Results
In this implementation of reconfigurable wrapper based test scheduling, we have used a partitioned-TAM testbus model in which the top-level TAM width $W$ is first divided into $B$ testbuses, which are then used for scheduling. A static TAM partition can lead to larger test times, as shown in Table III, but it is popular because it is the easiest TAM design to implement. Some of the newer scheduling approaches use rectangle-packing methods [11], which can create hard-to-route TAM designs but provide very good test schedule times. Another approach is to forgo the testbus style, adopt a testrail style of TAM design, and use pattern pipelining to optimize test time [7].
The proposed method is a DfT/test scheduling technique to optimize existing testbus based TAM architectures for dynamic scan chain reconfiguration; it is simple to implement in hardware (the TAM layout need not be changed), its hardware cost is low, and it can improve the test schedule by optimizing the TAM utilization factor. An advanced version of this software, which does not need a static TAM partition at the top level, is under development and is an area of future research for us.
VII. CONCLUSION
We have described the design of reconfigurable test wrappers for cores; using these wrappers, the width of a TAM connected to a core may be allowed to change dynamically. A procedure for the automatic derivation of the DfT hardware needed for reconfiguration, using a graph theoretic representation of core wrappers, was also presented. Integration of the proposed scheme with the proposed IEEE P1500 (SECT) standard is described. We show in this paper that the area and performance overhead of our proposed scheme is minimal. Using some recent results from the theory of malleable task scheduling, we have designed a simple (yet effective) test scheduling algorithm. For SoCs employing a combination of static and reconfigurable wrappers, an easy to implement scheme, ECTSP1SOL, is shown which is a natural derivation from the geometrical representation. In our opinion, this method has the most promise, as we feel that large cores which have a high number of scan chains and patterns are ideal candidates for the addition of reconfiguration capability. The time complexity of our algorithm is $O(N_C^2 B)$, which compares favorably with the ILP formulations of [9] and the $N_C^3$ approximation schemes proposed in [4].
In addition to improving the SoC schedule complexity, reconfiguration can also address the concerns of core providers who must provide cores to different integrators who have varying SoC level test constraints. A CPU core meant to be integrated onto a SoC for SmartCard will be expected to operate in test mode with very few TAM bits at top level. In contrast, the same CPU core when integrated on a game processor IC can expect to be connected to a large number of test pins in parallel. To avoid test regeneration effort per customized version of the design, the core provider should design the core with reconfigurable wrappers. Using reconfigurable wrapper design, the core provider can create a generic core supporting both scenarios by providing reconfiguration test modes.
Future work will concentrate on benchmarking this method against static wrapper test methods on the axes of test time, DfT area, and routing congestion. On the theoretical side, an analysis of the approximation factor introduced by the relaxation technique used in this paper to approximate the nonintegral TAM widths is required. Extensions of this scheme to incorporate test conflicts, power constraints, and precedence relations are also topics of future research.
