The test time for core-external interconnect shorts and opens is typically much less than that for core-internal logic. Therefore, prior work on test-infrastructure design for core-based system-on-a-chip (SOC) has mainly focused on minimizing the test time for core-internal logic. However, as feature sizes shrink for newer process technologies, the test time for signal integrity (SI) faults on interconnects cannot be neglected. The test time for SI faults can be comparable to, or even larger than, the test time for the embedded cores. We investigate the impact of interconnect SI tests on SOC test-architecture design and optimization. A compaction method for SI faults and algorithms for test-architecture optimization are also presented. Experimental results for the ITC'02 benchmarks show that the proposed approach can significantly reduce the overall testing time for core-internal logic and core-external interconnects.
INTRODUCTION
As feature sizes shrink and clock frequencies increase for high-performance system-on-a-chip (SOC) designs, signal integrity (SI), that is, the ability of an input signal to generate correct responses in a circuit [Guler and Kilic 1999], is becoming a major concern for the interconnects between embedded cores [Kao et al. 2001]. SI problems, typically caused by capacitance and inductance between interconnects, include overshoots, undershoots, glitches, oscillations, excessive signal delay, and even signal speedup [Kundu et al. 2005]; see Figure 1. SI-related problems are aggravated in core-based SOC designs because interconnects transporting signals between embedded cores tend to be long, hence they suffer more from crosstalk effects [Nordholz et al. 1998]. If the noise-induced voltage swing and timing skews depart from the noise-immune region, functional errors may occur.
Traditionally, SI problems have been treated as design errors, and a number of physical design and fabrication solutions [Becer et al. 2004; Chen et al. 2004; Massoud et al. 2002; Zhang and Sapatnekar 2004] have been proposed in the literature to tackle them. These design techniques rely on accurate simulation of SI effects, which are affected by many parameters (e.g., characteristics of interconnects and transistors, input data and environmental noise). Unfortunately, these parameters are interdependent and our lack of complete knowledge of this interdependence leads to uncertainty and inaccuracies in the simulation of SI loss [Wang et al. 2007] . Moreover, process variations and manufacturing defects may aggravate the SI-related problems [Natarajan et al. 1998 ]. Since it is unacceptable to over-design the circuit to tolerate signal integrity loss in all cases and it is impossible to predict the occurrence of defects, manufacturing test strategies are essential for detecting SI-related errors [Cuviello et al. 1999; Sirisaengtaksin and Gupta 2002; Tehranipour et al. 2003 ].
Various SI fault models [Cuviello et al. 1999; Kundu et al. 2005; Tehranipour et al. 2004] and associated test methodologies [Bai et al. 2000; Tehranipour et al. 2003 ] have been proposed in the literature. SI-related problems are aggravated in core-based SOC designs because interconnects carrying signals between embedded cores tend to be long and hence they suffer more from parasitic effects [Nordholz et al. 1998 ]. Despite this problem, most prior work in SOC test-architecture optimization has focused on core-internal test (InTest) only [Ebadi and Ivanov 2003; Goel and Marinissen 2002; Iyengar et al. 2002; Larsson and Peng 2002; Larsson and Fujiwara 2003; Nahvi and Ivanov 2004; Xu and Nicolici 2004; Zhao and Upadhyaya 2005; Zou et al. 2003 ] and neglected the problem posed by core-external interconnect SI faults. The test time for SI faults is long because of the need to exercise a large number of signal-state combinations for the interconnects [Sirisaengtaksin and Gupta 2002; et al. 2003 ]. For nanometer SOCs running at speeds of several hundred MHz and higher, the test time for SI faults can be as high as or even exceed the test time for the embedded cores. Therefore, the goal of this article is to study, for the first time, the impact of SI faults on SOC test-architecture design. The main contributions of this article are as follows:
- We present a two-dimensional SI test pattern compaction strategy to reduce the interconnect SI test data volume.
- We develop algorithms for SOC test-architecture optimization to minimize the overall SOC testing time for both interconnect SI faults and core-internal faults.
- We show that for the ITC 2002 SOC Test benchmarks, the test time obtained using the proposed method is significantly less than that using two baseline methods: (i) a test-access architecture optimized for core-internal test and then used for core-external SI fault testing; (ii) a test-access architecture optimized for both core-internal test and core-external test, but where only pattern count reduction is employed for the tests for SI faults.
The remainder of this article is organized as follows. Section 2 reviews related prior work and provides motivation for the work described in this article. Section 3 presents the proposed test pattern compaction method for SI faults. In Section 4, SOC test-architecture optimization techniques for handling both core-internal faults and core-external SI faults are described. Experimental results for benchmark SOCs are presented in Section 5. Finally, Section 6 concludes this article.
RELATED WORK AND MOTIVATION
Early attempts at testing SI-related problems modeled crosstalk at the circuit level [Attarha and Nourani 2002; Chen et al. 1999]. Although these models are more accurate than gate-level models, the complexity of the associated test-pattern generation procedures limits their usefulness for SOC interconnects. Cuviello et al. [1999] proposed a behavioral-level SI fault model, called the maximal aggressor (MA) model. This approach assumes that all aggressors¹ make the same simultaneous transition (in the same direction) and act collectively to generate a glitch when the victim is quiescent, or a delay error when the victim makes an opposite transition. Therefore, $6N$ test-vector pairs are needed to detect SI faults for a set of $N$ interconnects. Since in reality the inter-core interconnects in an SOC can have an arbitrary topology, Sirisaengtaksin and Gupta [2002] reduced the test pattern count by extending the MA fault model to the so-called maximum-affecting-line (MAL) fault model, which takes the physical layout information into account. If all the physical defects are capacitive or resistive, all MA/MAL faults can be targeted using a pattern count that is linear in the number of interconnects. When inductance is considered, however, such test patterns may not be able to generate maximum noise/delay on the victim line [Chen et al. 1999; Naffziger 1999]; hence, Tehranipour et al. [2004] presented a multiple transition (MT) fault model that covers all transitions on the victim and multiple transitions on the aggressors. The number of test patterns for this MT fault model, however, is exponential in the number of interconnects under test. To address this problem, an empirically determined locality factor $k$, showing how far the effect of aggressors remains significant, was introduced. For a set of $N$ interconnects, the number of test patterns for the reduced-MT fault model is approximately $N \cdot 2^{2k+2}$.
Built-In Self-Test (BIST) has been a popular test method used to detect SI-related errors [Sekar and Dey 2002; Tehranipour et al. 2004]. In this approach, the driver sides of the interconnects are equipped with test generators to generate transitions on the aggressors and victims, while at the receiver side, various types of integrity-loss sensor (ILS) cells are embedded to detect SI-related errors. Bai et al. [2000] introduced on-chip test generators and error detectors at the core boundaries, based on the MA fault model [Cuviello et al. 1999]. Nourani and Attarha [2001] presented two ILS cell designs to detect voltage distortions and timing violations, respectively. Later, several other ILS designs [Caignet et al. 2001; Tabatabaei and Ivanov 2002] were introduced, which are more accurate in measuring voltage and/or timing violations, at the cost of large area overheads. Assuming the existence of logic BIST structures in an SOC, Sekar and Dey [2002] presented a self-test solution, called LI-BIST, for both the core-internal logic and the SOC interconnects. Zhao et al. [2004] presented an online testing technique to capture noise-induced logic failures in functional buses. Yang et al. [2001] used boundary scan and IDDT to test functional buses. Finally, Chen et al. [2001] discussed how to test SI defects on the data bus and the address bus by executing a test program on the microprocessor.
A test method that relies on hardware-based test generators may cause overtesting and/or under-testing since not all test patterns generated in the test mode are valid in the normal functional mode of the SOC. In addition, since the SOC interconnect topology can be arbitrary (see Figure 2 ) and it is hard to predict it during test-hardware insertion, the interconnects between several cores may be close enough to result in SI errors [Sirisaengtaksin and Gupta 2002] . It is very difficult, if not impossible, to take interconnect proximity into account for these hardware-based test techniques. Therefore, in this work, we assume that the test stimuli are loaded from an external tester to the core-test wrapper. To apply SI test at the core-level, as shown in Figure 3 [Tehranipour et al. 2003 ], the wrapper-output cell (WOC) should be able to provide the necessary consecutive transitions to interconnects; the wrapper-input cell (WIC) needs to be equipped with a signal integrity loss sensor [Bai et al. 2000; Tehranipour et al. 2003 ] to capture the signal with noise and/or delay error.
Most prior work in SOC test-architecture optimization [Xu and Nicolici 2005] only takes core-internal testing into account, mainly because testing interconnect shorts/opens requires little time and therefore core-external (ExTest) testing can be ignored in the test-architecture optimization process. However, when high SI fault coverage is desired for today's SOCs, the testing time for SOC interconnects can be comparable to or even higher than the testing time for the core-internal logic. To understand this issue, let us estimate the interconnect SI testing time for a representative video-processing SOC [Dutta et al. 2001; Goel et al. 2004], which contains two 32-bit programmable interconnect (PI) buses, each connecting to a number of embedded cores (e.g., MIPS/TriMedia processor, MPEG-2 decoder, transport stream processor, and IEEE 1394 controller). Without loss of generality, suppose ten cores connect to each PI bus and assume that each core on average sends data to two other cores on the bus. Hence the number of victim interconnects under test on each PI bus is $N = 2 \times 10 \times 32 = 640$. Based on the previous discussion, without test set compaction, $6N \times 2 = 7680$ test vector pairs are needed for the MA fault model, while roughly $N \cdot 2^{2k+2} \times 2 = 327680$ test vector pairs are needed for the reduced-MT fault model with the locality factor $k = 3$. Since the total number of core I/Os for a typical SOC is in the range of several thousand, the test time for MA faults is in the range of millions of clock cycles for serial ExTest, while the test time for reduced-MT faults is two orders of magnitude higher. On the other hand, as reported in Goel et al. [2004], the SOC test time for core-internal logic is less than two million clock cycles when the total number of test access mechanism (TAM) wires is 140, which is less than the testing time for the SI faults estimated above. Moreover, with shrinking feature sizes of deep-submicron technology, short interconnects may also suffer from SI problems [Nordholz et al. 1998]. Therefore, it is likely that we need to test for SI faults on hundreds or even thousands of interconnects in the SOC. The test time becomes prohibitively high if an effective test-pattern compaction scheme is not employed and the SOC test-architecture is not optimized for both core-internal logic test and interconnect SI test.
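The pattern-count estimates for the video-processing SOC example can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function and variable names are hypothetical and do not come from the cited work.

```python
def ma_pairs(n_victims: int) -> int:
    """Vector pairs for the MA fault model: 6 per victim interconnect."""
    return 6 * n_victims


def reduced_mt_pairs(n_victims: int, k: int) -> int:
    """Approximate vector pairs for the reduced-MT model: N * 2^(2k+2)."""
    return n_victims * 2 ** (2 * k + 2)


# Figures taken from the example in the text.
cores_per_bus, receivers_per_core, bus_width, num_buses = 10, 2, 32, 2
n = receivers_per_core * cores_per_bus * bus_width  # victims per PI bus

print(n)                                     # 640
print(ma_pairs(n) * num_buses)               # 7680
print(reduced_mt_pairs(n, k=3) * num_buses)  # 327680
```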
Three important conclusions can be drawn from this discussion:
- An effective test set compaction strategy should be utilized to reduce the volume of test data for interconnect SI faults;
- Parallel external testing is required in order to reduce the test time for interconnect SI faults;
- The SOC test-architecture needs to minimize the overall testing time for both core-internal logic and core-external interconnects.
These observations motivate the work presented in this article.
TWO-DIMENSIONAL SI TEST-SET COMPACTION
We assume that the test stimuli for SI faults are given to us a priori; these stimuli can take the form of functional patterns, pseudorandom patterns, and/or patterns generated for various SI fault models [Cuviello et al. 1999; Sirisaengtaksin and Gupta 2002; Tehranipour et al. 2004] . Since a victim interconnect is mainly affected by its neighboring aggressors [Kundu et al. 2005] , the signal integrity test patterns typically feature a large number of don't-care bits. The format of the SI test vector pairs applied at the wrapper output cells of the embedded cores is shown in Table I . The entry 'x' represents a don't-care bit; '0/1' indicates that the corresponding core output terminal stays at 0/1 in consecutive cycles, and ↑ (↓) represents a positive (negative) transition. For each test pattern, we also add a postfix to denote whether this test pattern utilizes a shared bus line (as discussed in the following paragraph)-a '1' indicates that the specific bus line is utilized while 'x' implies that it is a "don't-care".
Test-pattern-count reduction.
Because of the large number of don't-care bits in each test pattern, it is possible to reduce the volume of test data by compacting multiple test vectors into one vector when they are compatible (i.e., their intersection is nonempty). Note that since bus lines are shared by the cores and they may connect many cores at the same time, several SI test patterns may trigger the same bus line from different core boundaries; these patterns cannot be compacted into one test pattern. The postfix that we add to each SI test pattern is used to identify such situations. If the bit values for a specific position in the postfix of two SI test patterns are both '1', they are marked as incompatible (e.g., $p_0$ and $p_1$ in Table I). The problem of finding a compacted test set of minimum size for a given test set is very similar to the traditional test compaction problem and can be formulated as a maximal clique-partitioning problem [Jha and Gupta 2003]. The pattern compaction problem is mapped to a graph, where each vertex corresponds to a test pattern and an edge is added between two vertices if the corresponding test patterns are mutually compatible. A set of compatible SI test patterns forms a clique in this graph; our objective is to find a minimum number of disjoint cliques that cover all the vertices in the graph. The clique partitioning problem, however, is known to be NP-complete [Garey and Johnson 1979], and approximation algorithms, that is, those with bounded approximation error, suffer from high computational complexity [Arora 1998].
To reduce computation time, we use a simple greedy heuristic as shown in Figure 4. The algorithm takes the original test set $P_o$ as input. A compacted test pattern is generated in each inner loop (Lines 4-7) by merging the first pattern $p_1$ in the uncompacted test set $P_u$ with the compatible patterns that follow in one pass. The algorithm terminates when all test patterns are compacted, and it outputs the compacted test set $P_c$. Obviously this greedy strategy is not optimal, and the quality of the resulting $P_c$ depends on the order of the test patterns. To address this problem, we randomize the order of test patterns several times, apply our greedy heuristic, and select the best result, that is, the ordering that leads to the smallest compacted set $P_c$. Suppose the number of original test patterns is $n$ and the test pattern width is $m$. In the worst case, no test pattern is compatible with any other pattern, and the time complexity of the heuristic is $O(mn^2)$. The preceding compaction scheme to reduce test pattern count can be viewed as reducing the volume of the test data in a vertical manner.
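The greedy pass and the random restarts described above can be sketched as follows. This is a minimal illustration and not the pseudocode of Figure 4. The pattern encoding is an assumption: each pattern is a pair `(body, postfix)` of strings, where the body uses `'0'`, `'1'`, `'u'` (rising), `'d'` (falling), and `'x'` (don't-care), and the postfix uses `'1'` or `'x'` per shared bus line. All function names are hypothetical.

```python
import random


def compatible(a, b):
    """Bodies must agree wherever both are specified; no bus line may be
    marked '1' in both postfixes."""
    (body_a, post_a), (body_b, post_b) = a, b
    body_ok = all(x == y or "x" in (x, y) for x, y in zip(body_a, body_b))
    bus_ok = not any(x == "1" and y == "1" for x, y in zip(post_a, post_b))
    return body_ok and bus_ok


def merge(a, b):
    """Combine two compatible patterns into one."""
    (body_a, post_a), (body_b, post_b) = a, b
    body = "".join(y if x == "x" else x for x, y in zip(body_a, body_b))
    post = "".join("1" if "1" in (x, y) else "x" for x, y in zip(post_a, post_b))
    return body, post


def greedy_compact(patterns):
    """One greedy pass: merge the first pattern with every later compatible one."""
    remaining, compacted = list(patterns), []
    while remaining:
        current, leftover = remaining[0], []
        for p in remaining[1:]:
            if compatible(current, p):
                current = merge(current, p)
            else:
                leftover.append(p)
        compacted.append(current)
        remaining = leftover
    return compacted


def compact_with_restarts(patterns, tries=20, seed=0):
    """Repeat the greedy pass on shuffled orderings and keep the smallest set."""
    rng = random.Random(seed)
    best, order = greedy_compact(patterns), list(patterns)
    for _ in range(tries):
        rng.shuffle(order)
        candidate = greedy_compact(order)
        if len(candidate) < len(best):
            best = candidate
    return best


patterns = [("ux0x", "1x"), ("xd0x", "1x"), ("xxx1", "xx"), ("dxxx", "x1")]
print(len(compact_with_restarts(patterns)))  # 2
```

In this toy set, the first two patterns both use bus line 0 and therefore can never share a compacted pattern, so two patterns is the smallest achievable result.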
Test-pattern-length reduction.
If we compact all the test patterns together, the length of every compacted pattern will be very large: it will be equal to the sum of the number of WOCs for the different cores. Since each SI test pattern involves only a few cores' terminals (referred to as care cores of the SI test pattern), we can bypass the boundaries of the remaining don't-care cores (e.g., Core 1 for p x in Table I) and reduce the length of this test pattern. The above strategy can be viewed as compacting the test pattern in a horizontal manner.
That is, instead of compacting all the test patterns together, we first partition the set of cores into several smaller groups of cores (say, $N_g$ groups). Next we classify the SI test patterns in such a way that the test patterns whose care cores are all within the same core group form an SI test group. The length of each test pattern is now reduced to the sum of the number of WOCs of this core group, instead of the WOCs of all cores. Let $woc_i$ denote the number of WOCs for core group $i$. For the remaining test patterns whose care cores fall into multiple core groups, we simply group them as a whole and their length remains the sum of the number of WOCs for all the cores, denoted as $woc_{SOC}$. The test data volume $V_c$ after two-dimensional compaction is as follows:

$$V_c = 2 \times \left( \sum_{i=1}^{N_g} p_i \cdot woc_i + p_r \cdot woc_{SOC} \right) \qquad (1)$$
where $p_i$ and $p_r$ represent the number of compacted test patterns in SI test group $i$ and the number of compacted remaining test patterns, respectively. The factor 2 in this equation appears because each SI test pattern contains two vectors. To achieve better compression, we should minimize the number of remaining patterns and, at the same time, balance the test-pattern lengths for the partitions. This problem can be formulated as a hypergraph partitioning problem, with each vertex in the hypergraph corresponding to a core. The weight of each vertex is the number of WOCs of the corresponding core and is used to balance the partitions. A hyperedge is added for each test pattern, connecting all its care cores (vertices). Since multiple test patterns may have the same care cores, we use the weight of each hyperedge to record this information. The hypergraph partitioning problem has been well-researched in the literature and we use the hMetis package [Selvakkumaran and Karypis 2003] to solve it. As shown in Figure 5, for the horizontal SI test pattern compaction of a hypothetical SOC containing seven cores, the patterns corresponding to the cut hyperedge 7-4-6 need to load the WOCs for all the cores, while the other patterns can be applied with shorter pattern lengths. For simplicity, the vertex and edge weights are not shown in the figure. The main objective of the hMetis package is to minimize the number of hyperedges that are cut (i.e., the remaining patterns in our problem), while our objective is to achieve the minimum test data volume as shown in Equation (1). To obtain better results, we use the partitioning result obtained from hMetis as an initial solution, and on top of it employ the FM partitioning algorithm [Fiduccia and Mattheyses 1982] with the cost function shown in Equation (1) to refine the original solution.
That is, we try to move one core at a time between partitions, and check whether the test data volume is reduced. If this is indeed the case, we keep this move; otherwise, we try an alternative move. Our experiments show that this refinement step further reduces test data volume by 3 to 5 percent when compared to the solution obtained from hMetis.
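The cost function and the move-based refinement can be illustrated with a small sketch. It is deliberately simplified: it is a plain single-move improvement loop rather than a full FM implementation with gain buckets and locking, and it assumes the compacted pattern count per care-core set is fixed, whereas in practice the counts change with the grouping. The names `data_volume` and `refine`, and the toy data, are hypothetical.

```python
def data_volume(group_of, woc, patterns):
    """Test data volume: 2 * (sum of pattern count * pattern length).

    group_of: {core: group id}; woc: {core: number of WOCs};
    patterns: list of (care_cores, pattern_count).
    """
    woc_soc = sum(woc.values())
    group_len = {}
    for core, g in group_of.items():
        group_len[g] = group_len.get(g, 0) + woc[core]
    total = 0
    for care, count in patterns:
        groups = {group_of[c] for c in care}
        # Patterns spanning several groups load the WOCs of every core.
        length = group_len[next(iter(groups))] if len(groups) == 1 else woc_soc
        total += count * length
    return 2 * total


def refine(group_of, woc, patterns):
    """Move one core at a time and keep a move only if the volume drops."""
    group_of = dict(group_of)
    groups = set(group_of.values())
    best = data_volume(group_of, woc, patterns)
    improved = True
    while improved:
        improved = False
        for core in list(group_of):
            home = group_of[core]
            for g in sorted(groups - {home}):
                group_of[core] = g
                volume = data_volume(group_of, woc, patterns)
                if volume < best:
                    best, home, improved = volume, g, True
            group_of[core] = home  # best group found for this core
    return group_of, best


woc = {1: 10, 2: 10, 3: 10, 4: 10}
patterns = [((1, 2), 5), ((3, 4), 5), ((2, 3), 1)]
initial = {1: 0, 2: 1, 3: 1, 4: 1}  # stand-in for an initial partition

print(data_volume(initial, woc, patterns))  # 760
print(refine(initial, woc, patterns))       # ({1: 0, 2: 0, 3: 1, 4: 1}, 480)
```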
TEST-ACCESS ARCHITECTURE DESIGN AND OPTIMIZATION
We consider, as a starting point, that every core in the SOC uses wrapper cells as shown in Figure 3 [Tehranipour et al. 2003]. These wrappers are compatible with the IEEE 1500 standard [IEEE Std. 1500 2004], with some additional hardware added to the wrappers for signal integrity test, including a new wrapper instruction to enter the signal-integrity test (SITest) mode. In addition, the user-defined logic (e.g., the glue logic between embedded cores) is also treated as a wrapped core. In other words, the SOC is assumed to contain only wrapped logic blocks and interconnect wires that are affected by signal integrity faults.
In addition, we use the TestRail TAM architecture in this work [Marinissen et al. 1998]. While it is possible to use the alternative Test Bus architecture [Varma and Bhatia 1998] to support parallel external testing [Xu and Nicolici 2003], the TestRail architecture is more amenable to core-external testing [Goel and Marinissen 2002].
SI Test-Architecture Optimization: Problem Formulation
As discussed in Section 2, the testing time for interconnect SI faults can be comparable to or even higher than the testing time for core-internal logic. Therefore, it is necessary for system integrators to optimize the SOC test-architecture for both kinds of tests in order to reduce the overall testing time. The optimization problem addressed in this section can be formulated as follows:
Problem $P^{SI}_{opt}$: Given the maximum TAM width $W_{max}$ for the SOC, and
- the test set parameters for each embedded core, including the number of input and output terminals, the number of test patterns for core-internal logic, the number of scan chains and the length of each scan chain;
- the test set parameters for each group of compacted interconnect SI tests obtained using the method proposed in Section 3, including the set of cores involved and the number of SI test patterns;

determine the wrapper design for each core, the TAM resources assigned to each core, and a test schedule for the entire SOC such that: (i) the sum of the TAM width used at any time does not exceed $W_{max}$; (ii) the total SOC testing time $T_{SOC}$ is minimized.
One of the subproblems of $P^{SI}_{opt}$ is to design and optimize the test wrapper for each core. Since the test application time of a core is dependent on the length of the maximum wrapper scan chain,² the main objective in wrapper design and optimization is to build balanced wrapper scan chains. This is a well-researched problem [Marinissen et al. 2000; Iyengar et al. 2002], and we use the Combine procedure from [Marinissen et al. 2000] for solving it in InTest mode. For a core wrapper in SI test mode, wrapper scan chains contain wrapper cells only and we can therefore assume that balanced wrapper input/output scan chains are achieved. Based on the TestRail architecture, we propose to solve Problem $P^{SI}_{opt}$ in two steps. First, we describe how to schedule SI tests for a given TAM design, as shown in Section 4.2. Next, we describe our solution for the general problem of how to design and optimize the SOC test-architecture from scratch by adapting an existing method [Goel and Marinissen 2002]; this approach is presented in Section 4.3.
SI Test Scheduling for a Given TAM Design
The same wrapper cells are used for both core-external interconnect SI test and core-internal logic test. Hence, to avoid test-resource conflicts, we schedule the two types of tests at different times. Therefore, $T_{SOC} = T^{in}_{SOC} + T^{si}_{SOC}$, where $T^{in}_{SOC}$ and $T^{si}_{SOC}$ denote the SOC testing time for core-internal logic and for interconnect SI faults, respectively. The need for combining interconnect SI test with core-internal test makes test-architecture optimization more difficult compared to the case when only core-internal test is considered. This difficulty results from the fact that interconnect SI test patterns may involve multiple TAMs at the same time, since victim and aggressors in a crosstalk environment may link cores connected to different TAMs. To highlight this problem, we examine how $T_{SOC}$ can be calculated for a given TAM design.
Consider the hypothetical SOC shown in Figure 2. Suppose that after two-dimensional test compaction, the SI test has been placed in three groups, where the $SI_1$ group involves all the five embedded cores (these are the remaining test patterns after partitioning), the $SI_2$ group involves Core 1, Core 4 and Core 5, and the $SI_3$ group involves Core 2 and Core 3. Two possible TAM designs and their
For the second test schedule shown in Figure 6(b), the TAM architecture. The calculation of $T^{si}_{SOC}$, however, is less straightforward because multiple TAMs may be involved. First, we need to calculate $T^{si}_j$ for each SI test group $j$, which is determined by a single TAM denoted as the bottleneck TAM for this SI test (e.g., $TAM_2$ for SI test $SI_2$). Second, we need to schedule the SI tests to minimize $T^{si}_{SOC}$. We next elaborate on these two steps.

Data structure. The data structures that we use to store the SI test group information and the TestRail configuration are presented in Figure 7. The two data structures are updated whenever the SOC TAM design is changed. In particular, in the data structure for TestRail $r$, we use $time_{in}(r)$, $time_{si}(r)$ and $time_{used}(r)$ to denote the internal testing time, the SI testing time and the utilized testing time on TAM $r$, respectively. For example, for $TAM_3$ shown in We use $time_{used}(r)$ to compare the actual utilization of TAM resources for different TAMs.
Calculation of test time for individual SI test.
The pseudocode for the procedure to calculate the testing time for each SI test group is shown in Figure 8 .
The procedure takes the TestRail architecture $R_{SOC}$ and all the SI test groups $S_{SOC}$ as inputs and calculates $time_{si}(s_i)$ for each SI test group $s_i$. In the inner loop (Lines 3-8), we check all the TAMs that are involved in SI test $s_i$ to identify the TAM that determines $time_{si}(s_i)$. Line 4 finds $C_{involved}$, that is, all the cores on TAM $r_j$ that are involved in SI test $s_i$. Line 5 then calculates $time_{si}(r_j)$, the signal integrity testing time contributed by TAM $r_j$. We record the SI testing time $time_{si}(s_i)$ (Line 7) and the bottleneck TAM $r_{btn}(s_i)$ for SI test $s_i$ (Line 8). Finally, the procedure returns the SI tests with updated SI testing time (Line 9).
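The structure of this procedure can be sketched as below. The sketch is not the pseudocode of Figure 8, and the per-TAM cost model is an assumption made purely for illustration: the time contributed by a TAM is taken as 2 vectors per pattern times the number of shift cycles needed to load the wrapper cells of the involved cores over the TAM width, ignoring bypass and capture cycles. The dictionary layout and all names are hypothetical.

```python
from math import ceil


def calculate_si_test_time(tams, si_tests, woc):
    """Return {si_test: (time_si, bottleneck_tam)}.

    tams: {tam_id: {"width": int, "cores": set}}
    si_tests: {name: {"cores": set, "patterns": int}}
    woc: {core: number of wrapper cells to load}
    """
    result = {}
    for name, test in si_tests.items():
        worst_time, bottleneck = 0, None
        for tam_id, tam in tams.items():
            involved = tam["cores"] & test["cores"]  # cores on this TAM used by the test
            if not involved:
                continue
            cells = sum(woc[c] for c in involved)
            # ASSUMED cost model: two vectors per pattern, serial shift over the TAM.
            t = 2 * test["patterns"] * ceil(cells / tam["width"])
            if t > worst_time:
                worst_time, bottleneck = t, tam_id
        result[name] = (worst_time, bottleneck)
    return result


tams = {"TAM1": {"width": 2, "cores": {1, 2}},
        "TAM2": {"width": 1, "cores": {3, 4, 5}}}
si_tests = {"SI2": {"cores": {1, 4, 5}, "patterns": 10},
            "SI3": {"cores": {2, 3}, "patterns": 5}}
woc = {c: 8 for c in range(1, 6)}

print(calculate_si_test_time(tams, si_tests, woc))
# {'SI2': (320, 'TAM2'), 'SI3': (80, 'TAM2')}
```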
The procedure ScheduleSITest for scheduling the SI tests is shown in Figure 9. Line 1 performs procedure CalculateSITestTime to calculate the testing time for each SI test. Line 2 initializes unSchedSI, the unscheduled SI tests, and currSchedTAMs, the TAMs that are utilized by the SI tests currently under schedule. Line 3 initializes currTime, that is, the begin time for the to-be-scheduled SI test. After the initialization, the following loop schedules the SI tests one by one (Lines 4-17). If all the unscheduled SI tests utilize the TAM resources in currSchedTAMs and hence cannot be scheduled with begin time currTime, we find nextTime, that is, the time for the first SI test that is expected to end after currTime (Line 14). We then update the begin time of the to-be-scheduled SI tests (Line 15) and currSchedTAMs based on the SI tests still under schedule (Line 16). Finally, the procedure returns the SOC SI testing time $T^{si}_{SOC}$ and the SI tests with updated schedule information. Let us take the test architecture and test schedule shown in Figure 6(b) as an example to demonstrate the ScheduleSITest algorithm. After the internal tests have been scheduled, all the TAM resources are available and we choose to schedule SI test $SI_1$. However, before $SI_1$ is completed, there is no TAM available because $SI_1$ uses them to shift test patterns to all cores' wrapper cells. Therefore, the other SI tests have to wait for the completion of $SI_1$. Afterwards, the TAM resources occupied by $SI_1$ are released and $SI_2$ and $SI_3$ can be scheduled with $TAM_2$ and $TAM_3$ in parallel.
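The scheduling loop can be sketched as follows. This is an illustration under assumptions rather than the pseudocode of Figure 9: the rule for choosing the next SI test is assumed to be first-fit in the given order, and the test times and TAM sets in the example are invented to mirror the $SI_1$, $SI_2$, $SI_3$ discussion. All names are hypothetical.

```python
def schedule_si_tests(si_tests, start_time=0):
    """Return (finish_time, {si_test: begin_time}).

    si_tests: {name: {"time": int, "tams": set of TAM ids the test occupies}}
    """
    unscheduled = dict(si_tests)
    running = []  # (end_time, name) for SI tests still under schedule
    begin = {}
    curr_time = finish = start_time
    while unscheduled:
        busy = set()
        for _, name in running:
            busy |= si_tests[name]["tams"]
        # ASSUMED selection rule: first unscheduled test whose TAMs are all free.
        candidate = next(
            (n for n, t in unscheduled.items() if not (t["tams"] & busy)), None)
        if candidate is not None:
            end = curr_time + unscheduled.pop(candidate)["time"]
            begin[candidate] = curr_time
            running.append((end, candidate))
            finish = max(finish, end)
        else:
            # Nothing fits: advance to the earliest completion and release its TAMs.
            curr_time = min(end for end, _ in running)
            running = [(e, n) for e, n in running if e > curr_time]
    return finish, begin


example = {"SI1": {"time": 100, "tams": {"TAM1", "TAM2", "TAM3"}},
           "SI2": {"time": 60, "tams": {"TAM2"}},
           "SI3": {"time": 40, "tams": {"TAM3"}}}

print(schedule_si_tests(example))
# (160, {'SI1': 0, 'SI2': 100, 'SI3': 100})
```

In the example, SI1 blocks every TAM, so SI2 and SI3 start only at time 100 and then run in parallel.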
TAM Design and Optimization
The preceding discussion for calculating $T^{si}_{SOC}$ is based on a given TAM architecture. What makes Problem $P^{SI}_{opt}$ more difficult is that the testing time for an SI test, $time_{si}(s)$, is not known until the SOC test-architecture is determined. This makes $P^{SI}_{opt}$ fundamentally different from the problem of designing and optimizing an SOC test-architecture for core-internal logic only. In the latter case, the testing time for each core can be pre-determined for a given TAM width [Xu and Nicolici 2005]. Unlike many test scheduling algorithms that schedule cores one after another and terminate after all cores are scheduled, the TR-Architect algorithm proposed in [Goel and Marinissen 2002] generates an initial test-architecture with all cores assigned to TAMs in the beginning and then optimizes this architecture in an iterative manner. This strategy is particularly attractive for interconnect SI test, since we are able to calculate the SI testing time in each optimization step. Therefore, we propose to adapt the TR-Architect algorithm for solving Problem $P^{SI}_{opt}$ in this article. At the same time, this adaptation is not straightforward, as described in the following paragraphs.
Identifying bottleneck TAMs. The basic idea of the TR-Architect algorithm is to optimize $T^{in}_{SOC}$ at the TAM level by merging TAMs and/or distributing free TAM wires to the bottleneck TAM, that is, the TAM with the longest $T^{in}_{tam}$. As a result, we define the bottleneck TAMs of the SOC (in contrast to the single bottleneck TAM for an SI test) to be those which are critical to the test time; $T_{SOC}$ is reduced if extra wires are assigned to them. The remaining TAMs are referred to as non-bottleneck TAMs of the SOC. In TR-Architect, there exists only a single bottleneck TAM at a time during the optimization process. Either two non-bottleneck TAMs are merged with less TAM width to release freed TAM resources to the bottleneck TAM, or the bottleneck TAM is merged with another TAM to decrease $T^{in}_{SOC}$ [Goel and Marinissen 2002]. In our problem, as we try to minimize $T_{SOC} = T^{in}_{SOC} + T^{si}_{SOC}$, it is possible that multiple bottleneck TAMs exist at the same time. That is, in addition to the bottleneck TAM for core-internal logic test, each SI test has its own bottleneck TAM, which may affect the total SOC testing time $T_{SOC}$. For example, for the schedule shown in Figure 6(a), the bottleneck TAM for $SI_2$ (i.e., $TAM_2$) is a bottleneck TAM for the SOC; on the other hand, for the schedule shown in Figure 6(b), the bottleneck TAM for $SI_3$ (i.e., $TAM_1$) is not a bottleneck TAM for the SOC. For the schedule shown in Figure 6(a), $TAM_1$ and $TAM_2$ are bottleneck TAMs and $TAM_3$ is a non-bottleneck TAM, while for the schedule shown in Figure 6(b), $TAM_2$ is a bottleneck TAM and $TAM_1$ is a non-bottleneck TAM.
The procedure to identify SOC bottleneck TAMs is shown in Figure 10. The bottleneck TAM for core-internal logic test is guaranteed to affect $T_{SOC}$. Therefore, in Line 1 and Line 2, we find this bottleneck TAM $r$ and identify it as an SOC bottleneck TAM (e.g., $TAM_1$ in Figure 6(a)). Next, in Line 3, we find the TAM $r^*$ with the longest SI test time. Each core on $r^*$ might be involved in several SI tests. For every one of these SI tests, we identify its bottleneck TAM $r_{btn}$. Since the SI test bottleneck TAMs identified in this way must affect the total SOC testing time $T_{SOC}$, we treat each of them as a bottleneck TAM of the SOC (Lines 4-7). From this procedure, it can be seen that the bottleneck TAMs for those SI tests that do not involve any core on $r^*$ can be ignored, for example, the bottleneck TAM for $SI_3$ shown in Figure 6(b). For the test architecture and test schedule in the example of Figure 6(a), $TAM_1$ is the bottleneck TAM for internal test. On the other hand, both $TAM_1$ and $TAM_2$ are bottleneck TAMs for SI tests. Therefore, the bottleneck TAM set for this schedule is composed of $TAM_1$ and $TAM_2$.
In the IdentifyBtnTAMs algorithm, every SI test involving a core on the TAM with the longest SI test time is checked for its bottleneck TAM. Therefore, the complexity of this algorithm is $O(N_{SI} N_c)$.
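The steps of the bottleneck identification can be sketched as follows. This is a minimal illustration, not the pseudocode of Figure 10; the dictionary layout, the function name, and the numbers in the example are hypothetical and chosen only so that the outcome resembles the two-bottleneck situation described for Figure 6(a).

```python
def identify_btn_tams(tams, si_tests):
    """Return the set of SOC bottleneck TAMs.

    tams: {tam_id: {"cores": set, "time_in": int, "time_si": int}}
    si_tests: {name: {"cores": set, "btn_tam": tam_id}}
    """
    bottlenecks = set()
    # The TAM with the longest core-internal test time always affects T_SOC.
    bottlenecks.add(max(tams, key=lambda r: tams[r]["time_in"]))
    # The TAM with the longest SI test time.
    r_star = max(tams, key=lambda r: tams[r]["time_si"])
    # Every SI test touching a core on r_star contributes its own bottleneck TAM.
    for test in si_tests.values():
        if test["cores"] & tams[r_star]["cores"]:
            bottlenecks.add(test["btn_tam"])
    return bottlenecks


tams = {"TAM1": {"cores": {2, 3}, "time_in": 500, "time_si": 200},
        "TAM2": {"cores": {1, 4}, "time_in": 300, "time_si": 350},
        "TAM3": {"cores": {5}, "time_in": 200, "time_si": 100}}
si_tests = {"SI1": {"cores": {1, 2, 3, 4, 5}, "btn_tam": "TAM1"},
            "SI2": {"cores": {1, 4, 5}, "btn_tam": "TAM2"},
            "SI3": {"cores": {2, 3}, "btn_tam": "TAM1"}}

print(sorted(identify_btn_tams(tams, si_tests)))  # ['TAM1', 'TAM2']
```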
Algorithm for problem $P^{SI}_{opt}$. Next we introduce our algorithm for Problem $P^{SI}_{opt}$. Similar to the TR-Architect algorithm, we first create an initial TestRail architecture and then optimize it by merging TAMs and distributing free TAM wires. Two key questions arise during the optimization process, namely, how to find the merging candidates and merge them, and how to distribute free TAM wires. Because multiple bottleneck TAMs may exist at the same time in our problem, the answers to these two questions highlight the main differences between our algorithm and the TR-Architect algorithm proposed in [Goel and Marinissen 2002].
The procedure for distributing free TAM wires is shown in Figure 11. The procedure takes the given TestRail architecture $R_{SOC}$, all the SI tests $S_{SOC}$, and the number of free TAM wires numFreeWires as inputs. The free TAM wires are distributed iteratively to the bottleneck TAMs (Lines 2-6). Since we may have multiple bottleneck TAMs at the same time, we select the one for which $T_{SOC}$ is minimum after it obtains the extra TAM wire (Line 4). Because $R_{SOC}$ changes whenever a free TAM wire is assigned (Line 5), $time_{si}(r)$ and $time_{used}(r)$ are updated for every $r \in R_{SOC}$ (Line 6). Finally, the procedure outputs the new TestRail architecture $R_{SOC}$ with all free TAM wires assigned.
Let us take the test architecture and test schedule shown in Figure 6(a) as an example to explain the distributeFreeWires algorithm. Suppose there is one more free wire to be distributed to one of the three TAMs. The algorithm tries to distribute it to either $TAM_1$ or $TAM_2$ (they are the bottleneck TAMs), compares the overall SOC testing time for the two choices, and finally selects the one with the smaller testing time.
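The greedy wire distribution can be illustrated with a small Python sketch. The cost model below is an assumption made only for this example (each TAM's time is its workload divided by its width, and the SOC time is the longest internal time plus the longest SI time); it stands in for the wrapper design and SI test scheduling that the actual procedure relies on. The names `work_in` and `work_si` and all numbers are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def soc_time(widths, work_in, work_si):
    """Toy model (assumption): T_SOC = longest internal time + longest SI time."""
    t_in = max(ceil_div(work_in[r], widths[r]) for r in widths)
    t_si = max(ceil_div(work_si[r], widths[r]) for r in widths)
    return t_in + t_si


def bottleneck_tams(widths, work_in, work_si):
    """Toy model: the TAMs that determine the internal and the SI term."""
    r_in = max(widths, key=lambda r: ceil_div(work_in[r], widths[r]))
    r_si = max(widths, key=lambda r: ceil_div(work_si[r], widths[r]))
    return {r_in, r_si}


def distribute_free_wires(widths, num_free_wires, work_in, work_si):
    """Assign free wires one at a time to the bottleneck TAM that gives the lowest T_SOC."""
    widths = dict(widths)  # do not modify the caller's architecture
    for _ in range(num_free_wires):
        best_tam, best_time = None, None
        for r in sorted(bottleneck_tams(widths, work_in, work_si)):
            trial = dict(widths)
            trial[r] += 1
            t = soc_time(trial, work_in, work_si)
            if best_time is None or t < best_time:
                best_tam, best_time = r, t
        widths[best_tam] += 1  # commit the best choice, then re-identify bottlenecks
    return widths


# Hypothetical workloads for three one-bit TAMs.
work_in = {"TAM1": 120, "TAM2": 60, "TAM3": 30}
work_si = {"TAM1": 20, "TAM2": 80, "TAM3": 10}
start = {"TAM1": 1, "TAM2": 1, "TAM3": 1}
one_wire = distribute_free_wires(start, 1, work_in, work_si)
two_wires = distribute_free_wires(start, 2, work_in, work_si)
```

With one free wire, the sketch compares giving it to `TAM1` (toy $T_{SOC}$ of 140) or to `TAM2` (160) and picks `TAM1`; the second wire then goes to `TAM2`, because the set of bottleneck TAMs is re-evaluated after every assignment.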
To distribute a free wire, we need to identify all bottleneck TAMs at the current time and schedule the SI tests with one more wire added to one of the bottleneck TAMs, which means that we need to run SI test scheduling $O(N_{TAM})$ times in the worst case. Therefore, if the number of free wires is $N_{FW}$, the overall complexity of the distributeFreeWires procedure is $O(N_{FW} N_{SI} N_c N_{TAM}^2)$.
The procedure for merging TAMs is shown in Figure 12. In this procedure, with the given TestRail architecture $R_{SOC}$, all the SI tests $S_{SOC}$, and one of the merging candidates $r_1$ as inputs, we look for another TAM candidate in $R_{find} = R_{SOC} \setminus \{r_1\}$ that leads to the lowest testing time after merging with $r_1$. After initialization (Lines 1 and 2), we try every TAM $r_i$ in $R_{find}$ in turn as the other merging candidate (Lines 3-14). In addition, we try to merge $r_i$ and $r_1$ with every TAM width in the range from $width_{min} = \max\{width(r_i), width(r_1)\}$ (Line 4) to $width_{max} = width(r_i) + width(r_1)$ (Line 5). The intuition is that we may be able to merge two TAMs into a TAM with less total width, so that the freed TAM wires can be assigned to other bottleneck TAMs to reduce $T_{SOC}$. The procedure outputs the TestRail architecture $R_{SOC}$ with the lowest testing time after merging (Line 15). It is also possible that no merging plan reduces $T_{SOC}$. In that case, the original TestRail architecture is returned.
Again, let us take the test architecture and test schedule shown in Figure 6(a) as an example to explain the mergeTAMs procedure. Consider the case in which $TAM_1$ is the candidate TAM and we need to find another TAM to be merged with $TAM_1$ and redistribute the TAM wires. We try every one of the other TAMs ($TAM_2$ and $TAM_3$ in this case) to be merged with $TAM_1$. If $TAM_1$ is merged with $TAM_2$, the newly merged TAM, $TAM_{12}$, is initially assigned $\max\{width(r_1), width(r_2)\}$ wires. At this point there are two TAMs ($TAM_{12}$ and $TAM_3$) and $width(r_1) + width(r_2) - \max\{width(r_1), width(r_2)\}$ free wires. We then distribute these free wires to the two TAMs one wire at a time to achieve the maximum total test time reduction. The merging of $TAM_1$ and $TAM_3$ proceeds in the same way. We then select the merging with the smaller test time, and the mergeTAMs algorithm returns the corresponding TAM architecture.
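The search space explored when merging can be made concrete with a short Python sketch. It only enumerates the (partner, merged width, freed wires) combinations implied by the width range between $\max\{width(r_i), width(r_1)\}$ and $width(r_i) + width(r_1)$; evaluating each combination (redistributing the freed wires and rescheduling the SI tests) is left out. The function name and the TAM widths in the example are hypothetical.

```python
def merge_options(widths, r1):
    """Yield (partner, merged_width, freed_wires) for every way to merge r1 with another TAM."""
    for ri in widths:
        if ri == r1:
            continue
        width_min = max(widths[ri], widths[r1])   # narrowest merged TAM considered
        width_max = widths[ri] + widths[r1]       # widest: keep all wires of both TAMs
        for w in range(width_min, width_max + 1):
            # Wires not kept by the merged TAM become free for redistribution.
            yield ri, w, width_max - w


# Hypothetical widths: TAM1 is the merging candidate r1.
widths = {"TAM1": 4, "TAM2": 3, "TAM3": 2}
options = list(merge_options(widths, "TAM1"))
```

For these widths, merging `TAM1` with `TAM2` gives merged widths 4 through 7 with 3 down to 0 freed wires, and merging with `TAM3` gives widths 4 through 6 with 2 down to 0 freed wires.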
In the mergeTAMs procedure, to find a TAM to be merged with the candidate TAM for maximum test time reduction, we need to try all other TAMs (i.e., $N_{TAM} - 1$ times). For each of these TAMs, we need to call the distributeFreeWires procedure multiple times ($O(W_{max})$ times in the worst case). Since the worst-case complexity for the utilized distributeFreeWires procedure is
The pseudocode for our top-level algorithm TAM Optimization for Problem $P^{SI}_{opt}$ is presented in Figure 13; it is adapted from the TR-Architect algorithm [Goel and Marinissen 2002]. First, we create a start solution (Lines 1-16). This mainly consists of three steps. In Step 1 (Lines 2-5), we assign each core to a one-bit wide TAM and calculate the testing time of core-internal logic $time_{in}(r)$, the testing time of interconnects $time_{si}(r)$, and the actual utilized testing time $time_{used}(r)$ for every $r \in R_{SOC}$. If $W_{max} < |R_{SOC}|$, we do not have enough TAM wires and hence need to merge TAMs (Lines 7-13). We first sort $R_{SOC}$ based on the total utilized testing time of each TAM (Line 9); then $r_{W_{max}+1}$ is merged iteratively with another TAM $r_i$. We select the merging candidate $r_i$ for which $T_{SOC}$ is minimum after merging with $r_{W_{max}+1}$ (Line 10). Since $R_{SOC}$ changes after merging, $time_{si}(r)$ and $time_{used}(r)$ are updated for every $r \in R_{SOC}$ (Line 13). If $W_{max} > |R_{SOC}|$, we have extra free TAM wires left, and procedure distributeFreeWires is called to distribute them.
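The merging step of the start solution (the case $W_{max} < |R_{SOC}|$) can be sketched in Python as follows. This is a simplified illustration under strong assumptions, not the algorithm of Figure 13: SI test times are ignored, every TAM is one bit wide, the time used by a TAM is taken to be the sum of the test times of its cores, and $T_{SOC}$ is the maximum over all TAMs. The function name and the core test times are hypothetical.

```python
def create_start_solution(core_times, w_max):
    """Merge one-bit TAMs until at most w_max TAMs remain (toy cost model)."""
    tams = [[core] for core in core_times]  # Step 1: one one-bit TAM per core

    def time_used(tam):
        # Assumption: cores on the same one-bit TAM are tested one after another.
        return sum(core_times[c] for c in tam)

    def soc_time(arch):
        # Assumption: SI tests are ignored, so T_SOC is the longest TAM time.
        return max(time_used(t) for t in arch)

    while len(tams) > w_max:
        tams.sort(key=time_used, reverse=True)  # nonincreasing utilized time
        extra = tams.pop(w_max)                 # the TAM r_{W_max + 1}
        # Pick the partner whose merge with `extra` gives the lowest T_SOC.
        best_i = min(
            range(len(tams)),
            key=lambda i: soc_time(tams[:i] + [tams[i] + extra] + tams[i + 1:]),
        )
        tams[best_i] = tams[best_i] + extra
    return tams


# Hypothetical core test times (clock cycles) and a two-wire TAM budget.
core_times = {"a": 100, "b": 80, "c": 30, "d": 20}
start_arch = create_start_solution(core_times, w_max=2)
```

In the described flow, the test times would be recomputed through SI test scheduling after every merge; the toy model above replaces that step with a simple sum so that the example stays self-contained.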
Next, we optimize the TAM architecture by merging the TAM with the lowest $time_{used}$ with another TAM (Lines 17-23). We first sort $R_{SOC}$ in nonincreasing order of $time_{used}$ and select $r_{|R_{SOC}|}$ as one of the merging candidates, $r_1$; then we call procedure mergeTAMs to search for another TAM to merge with $r_1$ and possibly redistribute TAM resources to reduce $T_{SOC}$. This procedure is iterative, and it stops when no further reduction in $T_{SOC}$ can be achieved. Afterwards, we try to further optimize the TAM architecture by merging the TAM with the longest $time_{used}$ with another TAM (Lines 25-30) and by merging other TAMs (Lines 31-36). Finally, TAM Optimization tries to minimize $T_{SOC}$ by iteratively moving one core from a bottleneck TAM of the SOC to another TAM, if possible (Line 37).
As can be observed in Figure 13, the computational complexity of the TAM Optimization algorithm is mainly determined by the bottom-up and top-down optimization procedures, which require the mergeTAMs procedure to be carried out $O(W_{max})$ times in the worst case. Therefore, the complexity of the overall algorithm is O(W
× 100%.
EXPERIMENTAL RESULTS
To evaluate the effectiveness of the proposed solution, experiments were carried out for three ITC'02 benchmark SOCs from Marinissen et al. [2002], namely, g1023, p34392, and p93791. Without loss of generality, we do not consider hierarchy in the testing of core-internal logic. Since the topology of these benchmark SOCs and the connections between embedded cores are not available, we cannot obtain the test patterns for core-external interconnect SI faults for these benchmark SOCs. Therefore, we generate random test patterns for our experiments in the following manner. For the smaller SOC g1023, we generate 1,000 and 5,000 random patterns, respectively. For p34392 and p93791, we generate 10,000 and 50,000 random patterns, respectively. Each test pattern targets one victim and $N_a$ ($2 \le N_a \le 6$) random aggressors. Suppose the victim wire connects two cores $Core_a$ and $Core_b$. Then at least $N_a - 2$ aggressor lines are between these two cores. In addition, we assume that a 32-bit bus is utilized in all three SOCs. The probability that the bus is used by a test pattern is set to 50%. If the bus is used for a particular pattern, we randomly generate between 1 and $N_a$ specified bits in the postfix of the pattern (see Section 3).
Table II shows the results for our two-dimensional test compaction scheme. We partition the SOCs into $N_g$ parts using the hMetis package [Selvakkumaran and Karypis 2003]. The row with $N_g = 1$ therefore corresponds to the case in which the test set is compressed without partitioning.
Tables III, IV, and V present results for the SOC test application time, measured in terms of the number of clock cycles. We compare the following cases: (i) optimizing $T^{in}_{SOC}$ using only the TR-Architect algorithm [Goel and Marinissen 2002] ($T_{TR-Arch}$); (ii) optimizing $T_{SOC}$ using our proposed algorithm TAM Optimization for several SI test pattern counts $N_r$ and the SI test grouping strategy.
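The random SI test pattern generation described at the start of this section can be sketched in Python as follows. Since the benchmark interconnect topology is not available, the wire list below is entirely hypothetical, as are the function name, the pattern representation, and the modeling of "specified bits in the postfix" as a dictionary from bus position to bit value. The sketch assumes every core pair has at least six other wires and that at least two wires exist outside each pair.

```python
import random


def generate_si_patterns(num_patterns, wires, bus_width=32, bus_prob=0.5, seed=0):
    """Generate random SI test patterns; each wire is a (core_a, core_b, index) tuple."""
    rng = random.Random(seed)  # seeded for reproducible experiments
    patterns = []
    for _ in range(num_patterns):
        victim = rng.choice(wires)
        n_a = rng.randint(2, 6)  # number of aggressors, 2 <= N_a <= 6
        pair = victim[:2]
        local = [w for w in wires if w[:2] == pair and w != victim]
        remote = [w for w in wires if w[:2] != pair]
        # At least N_a - 2 aggressors lie between the victim's two cores.
        n_local = rng.randint(n_a - 2, n_a)
        aggressors = rng.sample(local, n_local) + rng.sample(remote, n_a - n_local)
        bus_bits = {}
        if rng.random() < bus_prob:  # the shared bus is used with probability 50%
            k = rng.randint(1, n_a)  # between 1 and N_a specified bus bits
            positions = rng.sample(range(bus_width), k)
            bus_bits = {pos: rng.randint(0, 1) for pos in positions}
        patterns.append({"victim": victim, "aggressors": aggressors, "bus_bits": bus_bits})
    return patterns


# Hypothetical topology: three core pairs with eight wires each.
pairs = [("core1", "core2"), ("core2", "core3"), ("core1", "core3")]
wires = [(a, b, i) for (a, b) in pairs for i in range(8)]
patterns = generate_si_patterns(1000, wires)
```

Different pattern counts, such as the 1,000 and 5,000 patterns used for g1023, correspond to different values of `num_patterns`.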
Note that T [Goel and Marinissen 2002] is determined by optimizing the SOC TAM architecture in terms of core-internal test time T × 100%, respectively. Note that $T_g$ quantifies the benefit derived from our two-dimensional compaction strategy over the one-dimensional compaction scheme that reduces only the test-pattern count. We can see that more than 20% test-time reduction can be achieved in some cases.
From Tables III, IV, and V, we note that optimizing SOC test-architectures obliviously, that is, without considering interconnect SI faults, leads to much higher test time. This gap grows with an increase in the pattern count for the SI faults and the associated percentage of SI testing time in $T_{SOC}$. We can also see that when $W_{max}$ is small, there is no significant advantage in using the proposed algorithm; in a few cases, worse results are obtained compared to SI-oblivious TAM optimization (e.g., for SOC p34392, when $W_{max} = 8$ and $N_r = 10,000$). This is mainly because the TAM design solution space is small for smaller values of $W_{max}$; therefore, similar TAM architectures are obtained with different optimization criteria. When $W_{max}$ is higher, we have more freedom during the TAM design process, and hence the improvement offered by the new optimization procedure is more noticeable. We can also observe that, for SOC p34392, when $W_{max} > 32$, $T_{min}$ remains nearly the same. This is because the testing time for $Core_{18}$, the largest embedded core, dominates $T_{SOC}$.
We attribute the few exceptions to the nature of the heuristics that explore a limited part of the solution space.
When the number of SI test patterns grows, it becomes more important to optimize the test-architecture for both core-internal faults and interconnect SI faults. In Figure 14, we vary the original SI test-pattern count while keeping the TAM width at 32 bits. We compare the test time obtained using the proposed method with the test time for the baseline method based on Goel and Marinissen [2002]. The number of (given) SI test patterns is increased from 100 to 5,000 for SOC g1023, and from 1,000 to 50,000 for SOCs p34392 and p93791. It can be observed that the gap between the two solutions becomes larger as the number of SI test patterns increases, which highlights the importance of optimizing the SOC test-architecture for interconnect SI faults for newer technology generations.
CONCLUSION
As feature sizes shrink with newer process technologies, and clock frequencies increase, the test cost due to interconnect signal integrity faults can be considerable. To cope with this problem, we have presented a new TAM optimization flow for core-based SOCs that considers test times for both core-internal logic and core-external signal integrity faults on interconnects. This is in contrast to prior work on test infrastructure design for core-based system-on-a-chip, which has focused on minimizing only the test time for core-internal logic. We have investigated the impact of interconnect SI tests on SOC test-architecture design and optimization. We have also presented a compaction method for SI test sets that reduces the test data volume. Experimental results for the ITC'02 benchmarks show that the proposed approach can significantly reduce the overall testing time for core-internal logic and core-external interconnects. The test times obtained using this approach are noticeably lower than those obtained by a baseline based on the TR-Architect algorithm, which considers only the core-internal test time during optimization. As part of future work, we are considering the role of different core frequencies in reducing the test time [Xu and Nicolici 2006]. We are also investigating how interconnect layout information can be used for more effective test-infrastructure optimization.
