A solution for mapping core I/O pins to System-on-a-Chip (SOC)
1 Introduction
Core-based SOC design is becoming increasingly popular. The Semiconductor Industry Association's (SIA) Technology Roadmap [1] predicts that the percentage of reusable cores in SOCs will rise to 80% in 2006, resulting in a 50% reduction in time-to-market.
However, conflicting design objectives, such as increasing complexity and shorter design cycles, have made test application time a major bottleneck in meeting aggressive market requirements. Under such circumstances, concurrent SOC testing (i.e. testing more than one core simultaneously) is becoming an attractive way to reduce the total test application time. In this paper, the pin mapping problem in concurrent SOC testing is addressed, and a solution is presented that minimizes the total number of SOC pins needed.
The paper is organized in the following manner. In Section 2, related work in the SOC test scheduling area is reviewed. In Section 3, various formulations of the pin mapping problem are presented, and, in Section 4, a heuristic algorithm to achieve optimized concurrent SOC test is proposed. Experimental results are presented in Section 5, followed by the conclusion section.
2 Review of Related Work
The complexity of a SOC makes manufacturing test a much more difficult problem than before. Many new DFT techniques have been developed to address this problem. However, considering test issues only for individual cores and User-Defined Logic (UDL) may not be enough. The composite SOC test requires adequate test scheduling under a number of chip-level requirements, such as total test time, power dissipation, and pin limitations. Test scheduling is also necessary to run intra-core and inter-core tests in an order that does not affect the contents of individual cores. The SOC test must be created to satisfy these scheduling constraints, and the requirements for test scheduling will become more complex as the level of SOC integration increases. Previous research in this area is discussed in the following paragraphs.
Sugihara et al. [2] addressed the problem of selecting a test set for each core from a collection of test sets provided by the core vendors, and then scheduling the selected test sets to minimize the test application time. The authors assumed that: 1) each core has its own BIST logic, 2) external testing can be carried out for only one core at a time, and 3) core vendors provide multiple test sets, comprising both BIST and external test patterns, for each core. The problem was modeled as a combinatorial optimization problem and solved using a heuristic method.
Chakrabarty [3] generalized the test scheduling problem of [2]. He assumed that the Test Access Mechanism (TAM) consists of one external test bus and multiple BIST resources, and that the cores have already been assigned to a test bus. The problem was formulated as an m-processor open shop scheduling problem and solved with a Mixed-Integer Linear Programming (MILP) model to minimize the test time. Ravikumar et al. [4, 5] proposed a method to solve the test scheduling problem under a power consumption constraint, assuming that BIST is the only test methodology for individual cores.
All of the above test scheduling algorithms for core-based systems ([2]-[5]) assume a fixed TAM structure. More recently, however, a few works have considered the scheduling problem together with test resource allocation [6]-[11].
Chakrabarty [6] assumed a TestRAIL TAM structure and formulated the problem as an ILP model to find an optimal allocation of N test lines to a fixed number ($N_B$) of test buses, together with an assignment of each core to a test bus, so as to minimize the total test time. Place-and-route and power constraints were also considered in [7]. In [8], a technique was introduced to determine optimal SOC test schedules with precedence constraints, i.e. schedules that preserve a desired ordering among tests, and an algorithm was proposed that solves the problem in polynomial time by using preemption. Bagchi et al. [9] addressed the same problem as [6], but clustered some cores into modules and scheduled the testing of modules rather than individual cores in order to reduce the total test time.
Marinissen et al. [10, 11] proposed a method to allocate test resources and schedule test sets in order to achieve an optimal concurrent SOC test. The objective was to minimize the test application time while supporting full-scan, partial-scan, and functional tests on different TAMs under a peak power consumption constraint.
In contrast to the methods above, the method proposed in this paper does not address the test scheduling problem itself; instead, it uses the test scheduling information as a constraint when optimizing the required SOC resources.
The concurrent test scheduling information is usually specified by core integrators. Because core integrators have access to information such as the test application time and power dissipation of each core, they prefer to specify the groups of cores that should be tested concurrently. In practice, this scenario is most typical when the core integrators are also the core providers.
In this paper, we address the problem of mapping core pins to SOC pins, so that cores can be accessed directly from chip-level I/Os, under given test scheduling constraints. There are several major techniques for direct access to embedded cores, such as multiplexed access [12, 13], Test Bus [14], and TestRAIL [15]. Our methodology is general and applies to all of these test access mechanisms. Given a fixed set of concurrent test groups and the total number of SOC pins, the proposed method determines an optimized resource requirement and allocation that satisfies the specified concurrent test constraints while using as few valuable chip-level I/Os as possible.
3 Problem Formulations

3.1 Assumptions
Before the problem is described, our assumptions are listed as follows.
1. It is assumed that only a subset of the SOC I/Os is available for mapping (either via MUXes or through test buses), based on functionality and user preferences. Specifically, dedicated SOC pins such as clocks, scan inputs, scan outputs, scan enables, and test enables are excluded from consideration. A total of N SOC I/Os are available for mapping.
2. User-Defined Logic (UDL) is treated as a core, and each core has well-defined modes that place it either in isolation or under test. An isolated core is protected from any harmful effects of input changes at its boundary, so test patterns can be applied without constraint to the other core(s) under test. Cores specified for serial test or internal BIST use dedicated resources and are not considered in the problem formulation. It is assumed that a total of K cores await resource allocation.
3. For each core $C_i$ ($0 < i \le K$), the number of I/O pins that must be mapped to SOC pins is $W_i$. The SOC pins are assumed to be bidirectional, so pin direction is not a constraint in the problem we address, i.e. any pin of a core can be mapped to any SOC pin.
4. All core pins are assumed to be directly accessible from the SOC pins.
3.2 Problem Statement
The problem is formally stated as follows. Given N SOC pins and K cores, the total number of I/O pins $W_i$ of each core $C_i$ ($0 < i \le K$) is recorded as the weight of core $i$. Let Ω denote the set of concurrent test groups. Each group contains a set of cores that must be tested concurrently, and a core can appear in more than one group. The objective is to determine a one-to-one mapping from the pins of each core $C_i$ to the SOC pins such that the realization of the concurrent SOC test satisfies the following conditions.
(1) The total number of SOC pins is minimized. If the resulting number of SOC pins M is less than the maximum number of allowed pins N, the additional available SOC pins can be used to balance the load on each pin or to alleviate routing congestion. If M > N, the program reports the pin shortage back to the core integrator, who then decides whether to add more SOC pins or change the grouping constraints.
(2) At any given time $t$, any group of cores $(C_{t_1}, C_{t_2}, \ldots, C_{t_r})$ included in Ω can be tested simultaneously.
3.3 Problem Formulation I
The problem can be transformed into the well-known chromatic number problem as follows.
A graph G(V,E) corresponding to the problem stated in the previous subsection is built. Each vertex represents a pin on a core. An edge exists between two vertices if and only if one of the following conditions holds:
(1) The two pins represented by these two vertices belong to the same core.
(2) The two pins belong to two different cores, but those two cores are specified together in at least one concurrent group in Ω.
Each vertex is then assigned a color, meaning that the core pin represented by the vertex is mapped to the SOC pin indicated by that color. The original problem thus becomes that of finding the minimum number of colors needed for a proper coloring of G, i.e. a coloring in which no two adjacent vertices have the same color. The chromatic number problem is well known to be NP-complete (in its decision version) [16].
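For a concrete picture of Formulation I, the following Python sketch builds the pin-level graph for a small hypothetical instance and colors it. The function names `pin_conflict_graph` and `greedy_coloring`, the `(core, pin_index)` vertex labels, and the toy cores A, B, and C are illustrative assumptions, not part of the original method. The coloring is a simple first-fit heuristic in descending-degree order, not an exact chromatic-number algorithm, so the number of colors it returns is only an upper bound on the number of SOC pins.

```python
from itertools import combinations


def pin_conflict_graph(core_pins, groups):
    """Build the pin-level graph of Formulation I.

    core_pins maps a core name to its number of pins; groups lists the
    concurrent test groups. Each vertex is a (core, pin_index) pair.
    """
    vertices = [(core, k) for core, n in core_pins.items() for k in range(n)]
    # Unordered pairs of cores that appear together in at least one group.
    together = {frozenset(pair) for g in groups for pair in combinations(g, 2)}
    adj = {v: set() for v in vertices}
    for u, v in combinations(vertices, 2):
        # Edge condition (1): same core; edge condition (2): cores share a group.
        if u[0] == v[0] or frozenset((u[0], v[0])) in together:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def greedy_coloring(adj):
    """First-fit coloring in descending-degree order (heuristic, not exact)."""
    color = {}
    for v in sorted(adj, key=lambda u: len(adj[u]), reverse=True):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c  # the color index doubles as a SOC pin index
    return color


if __name__ == "__main__":
    # Hypothetical toy instance: A and B are tested together, as are B and C.
    pins = {"A": 2, "B": 3, "C": 1}
    colors = greedy_coloring(pin_conflict_graph(pins, [("A", "B"), ("B", "C")]))
    print("SOC pins used:", len(set(colors.values())))
```

The pairwise loop over all pins is quadratic in the total pin count, which illustrates the computational cost concern raised for pin-based formulations.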
3.4 Problem Formulation II
The same problem can also be formulated as the dependency matrix partitioning problem used in pseudo-exhaustive testing [17]. The dependency matrix corresponding to the problem stated in subsection 3.2 has m rows and n columns, where m is the cardinality of the concurrent test set Ω and n is the total number of pins over all cores. An entry $a_{ij}$ is 1 if and only if the pin corresponding to the $j$th column belongs to a core included in the $i$th concurrent test group; all other entries are 0.
The dependency matrix partitioning problem is to partition the columns of the dependency matrix into a minimum number of sets such that, within each set, every row has at most one 1-entry. Each set then corresponds to one SOC pin.
This problem has also been shown to be NP-complete [18].
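A minimal Python sketch of Formulation II for a small hypothetical instance is shown below. The names `dependency_matrix` and `first_fit_partition` are illustrative assumptions, and first-fit column placement is a greedy heuristic chosen only for illustration: it always returns a valid partition, but not necessarily a minimum one. Each resulting set of columns corresponds to one SOC pin.

```python
def dependency_matrix(core_pins, groups):
    """Rows are concurrent groups, columns are core pins (Formulation II).

    Entry a[i][j] is 1 iff the core owning pin j belongs to group i.
    """
    columns = [(core, k) for core, n in core_pins.items() for k in range(n)]
    matrix = [[1 if core in group else 0 for core, _ in columns] for group in groups]
    return columns, matrix


def first_fit_partition(matrix, n_cols):
    """Place each column in the first set where no row would get a second 1."""
    sets, rows_taken = [], []
    for j in range(n_cols):
        rows_j = {i for i, row in enumerate(matrix) if row[j]}
        for cols, taken in zip(sets, rows_taken):
            if not (taken & rows_j):
                cols.append(j)
                taken |= rows_j  # updates the stored set in place
                break
        else:  # no compatible set found: open a new set (a new SOC pin)
            sets.append([j])
            rows_taken.append(set(rows_j))
    return sets


if __name__ == "__main__":
    # Hypothetical toy instance: A and B are tested together, as are B and C.
    cols, a = dependency_matrix({"A": 2, "B": 3, "C": 1}, [("A", "B"), ("B", "C")])
    parts = first_fit_partition(a, len(cols))
    print(len(parts), [[cols[j] for j in p] for p in parts])
```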
4 Proposed Algorithm
In this section, a clique-partitioning-based heuristic method is proposed to solve the pin mapping problem for concurrent SOC testing.
Note that both formulations above are pin-based. Although several heuristics exist for the chromatic number problem, they must be used with care in this scenario, because the total number of core pins in a SOC can be very large (up to ~10K), which leads to high computational cost. In addition, pin-based heuristic solutions are typically 10%, and as much as 100%, away from the exact solutions [19]. In contrast, the number of cores in a SOC is limited (typically fewer than 100 in practical designs). Therefore, this paper proposes a heuristic based on cores rather than pins.
The basic idea of the algorithm is to find a lower bound for this problem and then raise the bound gradually until a solution is found. The algorithm is described in the flow chart of Fig. 1.
Fig. 1. Flow Chart of the Proposed Algorithm.
The steps of the algorithm as presented in the flow chart are explained in detail with the following example.
Example 1:
Let there be 7 cores $C_1, C_2, \ldots, C_7$ in a design, and let the numbers of pins for these cores be 200, 200, 100, 150, 160, 100, and 80, respectively. Let the concurrent test groups be the following: $\Omega = \{(C_1, C_2), (C_1, C_4), (C_1, C_6, C_7), (C_2, C_5), (C_3, C_4), (C_3, C_5), (C_3, C_6, C_7)\}$.
The proposed algorithm obtains an optimized number of SOC pins under the given constraints through the following steps.
Step 1: To start, a weighted incompatibility graph G(V,E) is built. In a weighted incompatibility graph, each vertex in V represents a core, and an edge exists between two vertices if and only if the corresponding cores appear in the same group at least once (such cores are incompatible because they cannot share any SOC pins). The weight of each vertex is the number of pins of the corresponding core.
The incompatibility graph for the given example is shown in Fig. 2.
Fig. 2. Incompatibility Graph for Example 1.
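Step 1 can be sketched in Python for Example 1 as follows. Cores are identified by their indices, the graph is kept as a dictionary of adjacency sets with the vertex weights in a separate dictionary, and the name `incompatibility_graph` is an illustrative assumption; this is a sketch, not the authors' C implementation.

```python
from itertools import combinations

# Example 1: pin counts (vertex weights) and concurrent test groups.
WEIGHTS = {1: 200, 2: 200, 3: 100, 4: 150, 5: 160, 6: 100, 7: 80}
GROUPS = [(1, 2), (1, 4), (1, 6, 7), (2, 5), (3, 4), (3, 5), (3, 6, 7)]


def incompatibility_graph(weights, groups):
    """Cores are adjacent iff they appear together in at least one group."""
    adj = {core: set() for core in weights}
    for group in groups:
        for a, b in combinations(group, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


if __name__ == "__main__":
    for core, nbrs in sorted(incompatibility_graph(WEIGHTS, GROUPS).items()):
        print(f"C{core} (w={WEIGHTS[core]}): incompatible with {sorted(nbrs)}")
```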
Step 2: A maximum-weight clique in the weighted incompatibility graph is found; let W denote its total weight. A clique of G is a complete subgraph of G, so no two cores corresponding to vertices in the clique can share SOC pins. Therefore, the total weight W of this clique is a lower bound on the number of required SOC pins. Since the number of cores in practice is limited (<100), it is computationally feasible to solve the maximum-weight clique problem exactly. In the proposed algorithm, the branch-and-bound method proposed in [20] is used for this purpose.
In Example 1, the maximum-weight clique is $(C_1, C_2)$, and its total weight W is 400.
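The following sketch computes the Step 2 lower bound for Example 1. As a plainly swapped-in technique, it enumerates maximal cliques with the Bron-Kerbosch algorithm (with pivoting) and keeps the heaviest one, rather than using the branch-and-bound method of [20]. Because all weights are positive, a maximum-weight clique is always maximal, so this enumeration finds it. The enumeration is exponential in the worst case and is intended only for small illustrative graphs; all function names are hypothetical.

```python
from itertools import combinations

# Example 1: pin counts (vertex weights) and concurrent test groups.
WEIGHTS = {1: 200, 2: 200, 3: 100, 4: 150, 5: 160, 6: 100, 7: 80}
GROUPS = [(1, 2), (1, 4), (1, 6, 7), (2, 5), (3, 4), (3, 5), (3, 6, 7)]


def incompatibility_graph(weights, groups):
    """Cores are adjacent iff they appear together in at least one group."""
    adj = {core: set() for core in weights}
    for group in groups:
        for a, b in combinations(group, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield r  # r cannot be extended: it is a maximal clique
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(frozenset(), set(adj), set())


def max_weight_clique(adj, weights):
    """With positive weights, the heaviest clique is among the maximal ones."""
    return max(maximal_cliques(adj), key=lambda c: sum(weights[v] for v in c))


if __name__ == "__main__":
    clique = max_weight_clique(incompatibility_graph(WEIGHTS, GROUPS), WEIGHTS)
    print(sorted(clique), sum(WEIGHTS[v] for v in clique))  # [1, 2] 400
```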
Step 3: Based on the incompatibility graph, a compatibility graph is built. The compatibility graph is the complement of the incompatibility graph. For Example 1, the compatibility graph is shown in Fig. 3.
Fig. 3. Compatibility Graph for Example 1.
Step 4: Next, a maximum clique in the compatibility graph is determined that satisfies the following conditions.
(1) At least one of its cores is selected from the maximum-weight clique determined in Step 2.
(2) The maximum clique is determined by its number of vertices, without considering their weights.
(3) If there is more than one candidate group, the group with the largest total weight is picked. If there is still a tie, one of the tied groups is picked randomly.
In Example 1, there are 2 maximum cliques that satisfy the above conditions, $(C_2, C_4, C_6)$ and $(C_2, C_4, C_7)$. Group $(C_2, C_4, C_6)$ is picked because it has the larger weight (450 versus 430).
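Steps 3 and 4 can be sketched as below for Example 1. The complement is taken over the core set, and `select_group` (a hypothetical name) keeps only maximal cliques that intersect the Step 2 clique, which is hard-coded here as {1, 2}. A largest clique containing such a core cannot be extended, so it always appears among the maximal cliques and no candidate is missed. As an assumption, a tie that survives the weight comparison is resolved by enumeration order rather than randomly.

```python
from itertools import combinations

# Example 1 data and the maximum-weight clique found in Step 2.
WEIGHTS = {1: 200, 2: 200, 3: 100, 4: 150, 5: 160, 6: 100, 7: 80}
GROUPS = [(1, 2), (1, 4), (1, 6, 7), (2, 5), (3, 4), (3, 5), (3, 6, 7)]
STEP2_CLIQUE = frozenset({1, 2})


def incompatibility_graph(weights, groups):
    adj = {core: set() for core in weights}
    for group in groups:
        for a, b in combinations(group, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(frozenset(), set(adj), set())


def compatibility_graph(incompat):
    """Step 3: complement of the incompatibility graph."""
    cores = set(incompat)
    return {v: cores - incompat[v] - {v} for v in incompat}


def select_group(compat, weights, seed):
    """Step 4: largest clique (by vertex count) containing a core of `seed`.

    Ties are broken by total pin weight; any remaining tie goes to the
    clique enumerated first.
    """
    candidates = [c for c in maximal_cliques(compat) if c & seed]
    if not candidates:
        return None
    size = max(len(c) for c in candidates)
    return max((c for c in candidates if len(c) == size),
               key=lambda c: sum(weights[v] for v in c))


if __name__ == "__main__":
    compat = compatibility_graph(incompatibility_graph(WEIGHTS, GROUPS))
    print(sorted(select_group(compat, WEIGHTS, STEP2_CLIQUE)))  # [2, 4, 6]
```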
Step 5: Let $w_i$ be the weight of core $i$ included in the maximum clique selected in Step 4, and let $\min(w_i)$ be the minimum weight of a core in this clique. From each core in the clique, $\min(w_i)$ pins are selected and mapped to the same $\min(w_i)$ SOC pins. For each core in the clique, the weight is then updated to $w_i - \min(w_i)$. If a core's weight becomes 0, the corresponding vertex and all edges incident to it are deleted.
In order to illustrate the mapping process, let C[a,b] indicate the pins in the range from a to b in a core or the SOC.
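For the given example, $C_2[1,100]$, $C_4[1,100]$, and $C_6[1,100]$ are mapped to SOC[1,100]. The weights of $C_2$ and $C_4$ are updated to 100 and 50, respectively, and $C_6$, whose weight becomes 0, is deleted. The updated compatibility graph is shown in Fig. 4.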
Fig. 4. Updated Compatibility Graph
Steps 4 and 5 are repeated until the number of mapped SOC pins reaches the limit W determined in Step 2. The process is shown in Fig. 5.
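Putting Steps 1 to 5 together gives the following end-to-end sketch for Example 1. It is an illustration under stated assumptions, not the authors' implementation: Bron-Kerbosch enumeration replaces the branch-and-bound method of [20], residual ties are broken by enumeration order, and the pins of each core are consumed in ascending order so that the ranges follow the C[a,b] notation above. Because the rule for raising the bound is given only in the Fig. 1 flow chart, the sketch stops once W SOC pins are used (or no core of the Step 2 clique remains) and reports any still-unmapped pins instead of continuing. The function `map_pins` is hypothetical.

```python
from itertools import combinations

WEIGHTS = {1: 200, 2: 200, 3: 100, 4: 150, 5: 160, 6: 100, 7: 80}
GROUPS = [(1, 2), (1, 4), (1, 6, 7), (2, 5), (3, 4), (3, 5), (3, 6, 7)]


def incompatibility_graph(weights, groups):
    adj = {core: set() for core in weights}
    for group in groups:
        for a, b in combinations(group, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(frozenset(), set(adj), set())


def select_group(compat, weights, seed):
    """Step 4: largest clique containing a seed core; ties -> larger weight."""
    candidates = [c for c in maximal_cliques(compat) if c & seed]
    if not candidates:
        return None
    size = max(len(c) for c in candidates)
    return max((c for c in candidates if len(c) == size),
               key=lambda c: sum(weights[v] for v in c))


def map_pins(weights, groups):
    """Steps 1-5, repeated until W SOC pins are used or no seed core is left."""
    incompat = incompatibility_graph(weights, groups)                   # Step 1
    seed = max(maximal_cliques(incompat), key=lambda c: sum(weights[v] for v in c))
    bound = sum(weights[v] for v in seed)                               # Step 2: W
    cores = set(incompat)
    compat = {v: cores - incompat[v] - {v} for v in incompat}           # Step 3
    remaining, next_pin = dict(weights), {c: 1 for c in weights}
    soc_used, mapping = 0, []
    while soc_used < bound:
        group = select_group(compat, remaining, seed)                  # Step 4
        if group is None:
            break
        k = min(remaining[c] for c in group)                            # Step 5
        ranges = {c: (next_pin[c], next_pin[c] + k - 1) for c in sorted(group)}
        mapping.append(((soc_used + 1, soc_used + k), ranges))
        soc_used += k
        for c in group:
            remaining[c] -= k
            next_pin[c] += k
            if remaining[c] == 0:  # delete the exhausted vertex and its edges
                for nbr in compat.pop(c):
                    compat[nbr].discard(c)
    leftover = {c: w for c, w in remaining.items() if w > 0}
    return bound, mapping, leftover


if __name__ == "__main__":
    bound, mapping, leftover = map_pins(WEIGHTS, GROUPS)
    for soc, ranges in mapping:
        cores_txt = ", ".join(f"C{c}[{a},{b}]" for c, (a, b) in ranges.items())
        print(f"SOC[{soc[0]},{soc[1]}] <- {cores_txt}")
    print("W =", bound, "unmapped pins:", leftover)
```

Run on Example 1, the first iteration reproduces the mapping of $C_2[1,100]$, $C_4[1,100]$, and $C_6[1,100]$ to SOC[1,100]. The loop then reaches W = 400 with 10 pins of $C_3$ and 30 pins of $C_7$ still unmapped; under this sketch, the bound would have to be raised to place them.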
5 Experimental Results
In order to evaluate the proposed algorithm, we compare its results with two greedy algorithms, namely Heuristic 2 (H2) and Heuristic 3 (H3). H2 is based on the information of a core and its immediate neighbors: a degree is associated with each core, defined as the sum of the weights of all its neighbors and of the core itself, and the core with the maximum degree is selected first for mapping. H3 is identical except that the degree of each core is calculated from its neighbors alone, without considering their weights. In other words, both algorithms try to map the most "hard-to-map" cores first based on locally optimal decisions. In addition, the proposed method is compared with another greedy algorithm in which the cores are ordered randomly for pin mapping. Because no prior work on this problem has been reported in the literature, all of these algorithms were implemented to provide a fair comparison with the proposed method.
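The core orderings used by H2 and H3 can be sketched as follows for Example 1. The text does not state which graph defines a core's neighbors; this sketch assumes the incompatibility graph of Step 1. Only the ordering is computed, because the subsequent greedy mapping step is not detailed here, and the name `greedy_orders` is hypothetical.

```python
from itertools import combinations

WEIGHTS = {1: 200, 2: 200, 3: 100, 4: 150, 5: 160, 6: 100, 7: 80}
GROUPS = [(1, 2), (1, 4), (1, 6, 7), (2, 5), (3, 4), (3, 5), (3, 6, 7)]


def incompatibility_graph(weights, groups):
    adj = {core: set() for core in weights}
    for group in groups:
        for a, b in combinations(group, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def greedy_orders(weights, groups):
    """Mapping orders for H2 and H3 (neighbors assumed from the incompatibility graph)."""
    adj = incompatibility_graph(weights, groups)
    # H2 degree: the core's own weight plus the weights of all its neighbors.
    h2 = {c: weights[c] + sum(weights[n] for n in adj[c]) for c in weights}
    # H3 degree: the number of neighbors, ignoring weights.
    h3 = {c: len(adj[c]) for c in weights}
    # sorted() is stable, so ties keep ascending core order.
    return (sorted(weights, key=h2.get, reverse=True),
            sorted(weights, key=h3.get, reverse=True))


if __name__ == "__main__":
    h2_order, h3_order = greedy_orders(WEIGHTS, GROUPS)
    print("H2:", h2_order)  # [1, 3, 2, 6, 7, 5, 4]
    print("H3:", h3_order)  # [1, 3, 6, 7, 2, 4, 5]
```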
These algorithms are run on 9 hypothetical but nontrivial SOCs (S1 to S9) and 2 real industrial designs (IND1, IND2). For S1 to S9, the number of pins of each core is randomly generated between 30 and 300. The number of cores and the number of pins of each core in IND1 and IND2 are those fixed in the commercial designs. However, since the core grouping information is not available, the concurrent test groups for all circuits were generated randomly. The algorithms were implemented in C and run on a Sun Blade 1000 workstation. The CPU time per benchmark is less than 1 s for the proposed algorithm (H1) and up to ~10 s for H2 and H3; for this reason, CPU times are not included in the tabulated results.
The experimental results are shown in Table 1. The number of cores in each test case is shown in the second column. The lower bound on SOC pins given by the proposed clique partitioning algorithm is listed in Column 3, and the number of SOC pins obtained by the proposed algorithm (H1) in Column 4. The additional SOC pins required by algorithms H2 and H3 relative to H1 are shown in Columns 5 and 6, respectively. The next 6 columns show the results of 6 runs of the greedy algorithm based on random selection. From the results, it is evident that:
(1) The proposed algorithm H1 gives the best solution for all test cases used, and its results are quite close to the lower bound.
(2) The order of mapping is critical; a bad ordering of cores can lead to results that are far from the optimal solution.
(3) The greedy algorithm H2 is better than H3.
(4) The greedy algorithm based on random core selection usually cannot guarantee good solutions.
On average, the total numbers of SOC pins computed by the greedy algorithms H2 and H3 are about 3% and 5.5% higher, respectively, than those of the proposed method H1. The random algorithm, on the other hand, may use up to 13% more SOC pins.
To compare the heuristics with an exact solution, a pin-based exact coloring algorithm was implemented using the method proposed in [19]. However, for the test cases in Table 1, its computational time grows exponentially and the runs were far from completion; for some of the large test cases, the exact algorithm could not produce a solution even after running for 2 days. Therefore, small test cases were used for this comparison, and the results are shown in Table 2. Column 2 gives the total number of pins on all cores, followed by the optimum number of SOC pins needed in each case. The last 3 columns present the numbers of SOC pins obtained by H1, H2, and H3.
The computational time of all heuristics is less than 0.01 s for every test case in Table 2, whereas the exact solution suffers from exponential computational time: even for these small test cases, its CPU time reaches 6386.3 s. The results in Table 2 show that the proposed heuristics produce solutions very close to the exact solution, and they are therefore expected to give good results in practical scenarios.
6 Conclusions
In this paper, an algorithm is presented that efficiently allocates test resources to achieve concurrent SOC test under specified test constraints.
The objective is to minimize the total number of SOC pins required for a design. The problem can be formulated as a chromatic number problem or a clique partitioning problem, and a heuristic algorithm is proposed to solve it. Experimental results show that the proposed method produces better results than several greedy approaches, and that it requires far less computation time than an exact solution while producing results close to the optimum.
