Available techniques for testing core-based systems-on-a-chip (SOCs) do not provide a systematic means for synthesising low-overhead test architectures and compact test solutions. In this paper, we provide a comprehensive framework that generates low-overhead compact test solutions for SOCs. First, we develop a common ground for addressing issues such as core test requirements, core access and test hardware additions. For this purpose, we introduce finite-state automata for modeling tests, transparency modes and test hardware behavior. In many cases, the tests repeat a basic set of test actions for different test data, which can again be modeled using finite-state automata. While earlier work can derive a single symbolic test for a module in a register-transfer level (RTL) circuit as a finite-state automaton, this work extends the methodology to the system level and additionally contributes a satisfiability-based solution to the problem of applying a sequence of tests phased in time. This problem is known to be a bottleneck in testability analysis not only at the system level, but also at the RTL. Experimental results show that the system-level average area overhead for making SOCs testable with our method is only 4.4%, while achieving an average test application time reduction of 78.5% over recent approaches. At the same time, it provides 100% test coverage of the precomputed test sets/sequences of the embedded cores.
Introduction
Embedded cores are being increasingly used to provide SOC solutions to complex integrated circuit design problems. Synthesis of low-overhead test architectures and compact test sets for these SOCs is of critical importance [1, 2].
Traditional approaches for testing SOCs rely on variants of boundary scan or test bus to provide test access to the embedded cores [2, 3, 4]. However, the area and delay overheads of such methods are usually high, as is the test application time for boundary scan based methods. Another approach is to use existing functionality for test access of embedded cores [5, 6, 7]. Even though such functional access approaches reduce overheads, they are usually not flexible enough to handle embedded cores made testable using diverse test methodologies such as scan, built-in self-test (BIST), sequential test generation, etc.
In this paper, we present a comprehensive technique for testing core-based SOCs. Our solution focuses on effectively reducing the test application time, in addition to reducing the test overheads, while providing complete test coverage of the test sets/sequences of the embedded cores. The key strengths of our work are as follows.
It is able to use individual cores to transfer test data to and from the core under test in a much more aggressive and effective manner than has been done in previous work. To provide such functional test access, it allows complex core transparency modes, where core transparency can be loosely defined as the ability of the core to propagate test data from its inputs to its outputs (a more formal definition is given later). Consider, for example, a situation where we need to transfer data from an 8-bit input of a core to its 16-bit output. Our technique can make use of the fact that the core might provide a mechanism to compose the output vector from two time-separated input vectors, whereas other techniques [6, 7] would consider the core incapable of executing the required data transfer and solve the problem by adding design-for-test (DFT) hardware such as test multiplexers.
It provides controllability and observability to cores on an as-needed basis: past work [6, 7] requires all the inputs (outputs) of a core to be simultaneously, though indirectly, controllable (observable). This can lead to significant area overheads, especially if the test scheme for the core does not require all its inputs (outputs) to be simultaneously controllable (observable).
It explicitly accounts for sequential test sequence requirements of cores: core test strategies apply a single symbolic test (a set of test actions) to a core multiple times with different test data at pre-defined time-instants. As opposed to the existing expensive practice of meeting these test requirements with extra test hardware, our technique tries to overlap multiple tests in time to realize a low-cost solution, whenever possible.
It requires SOC-level test insertion only as a last resort.
It also effectively models diverse core test strategies like scan and BIST.
The paper is organized as follows. Section 2 introduces transparency and test models through examples. Section 3 presents the SOC testing algorithm. Section 4 gives the experimental results and Section 5 the conclusions.
Transparency and Test Models
In this section, we present our models for different aspects of core-based testing. First, we introduce a novel way of modeling the transparency of a core using finite-state automata. Next, we model diverse test requirements for a core under test. Lastly, we illustrate a compact and efficient test architecture derived for an SOC with the help of a test strategy (Section 3) employing these models.
Transparency of a core
We first formalize the notion of transparency used in our work as follows.
Definition 1: Transparency of a core is a collection C of nondeterministic finite-state automata [8], each of which allows test data to propagate from a set of its inputs I to a set of its outputs J.
The advantages of using a finite-state automaton (FSA) to describe transparency are manifold. First, we can compactly describe temporally-separated circuit events. For example, if we examine the transparency of a register R1, we can deduce that the element (at some time instant) can propagate data at its input upon a LoadR1 operation. Subsequently, the data is available at its output for as many cycles as allowed by the HoldR1 operation. Subgraph1 in Figure 1(b) compactly captures this behavior (note that S1 would need to be changed to an accept state for this purpose).
0-7803-5832-5/99/$10.00 © 1999 IEEE
We can also use an FSA to capture spatial aggregation of test data. Consider, for example, the two 8-bit registers shown in Figure 1. The two solutions in the FSA of Figure 1(b) are (a) S0 to S3 via S1, and (b) S0 to S3 via S2. Both solutions take two time-steps. In (a), setting of the higher (lower) 8 bits of data is achieved in the first (second) time-step, while in (b), it is the reverse (indicated by the annotation to states S1 and S2, respectively).
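The two behaviors described above (load-then-hold for a single register, and two orderings for filling a 16-bit value from two 8-bit registers) can be pictured with a small automaton simulator. This is only a sketch: the transition tables are assumptions reconstructed from the prose, not a copy of Figure 1(b), and the operation names LoadHigh and LoadLow are hypothetical labels introduced for illustration.

```python
def run_nfa(transitions, start, accept, ops):
    """Return True if the operation sequence `ops` can end in an accept state."""
    current = {start}
    for op in ops:
        # Follow every enabled transition (nondeterministic step).
        current = {nxt for s in current for nxt in transitions.get((s, op), ())}
        if not current:
            return False
    return bool(current & accept)

# Assumed model of register R1: load once, then hold for any number of cycles.
REGISTER = {
    ("S0", "LoadR1"): {"S1"},
    ("S1", "HoldR1"): {"S1"},
}

# Assumed model of two 8-bit registers feeding a 16-bit value:
# S0 -> S1 -> S3 sets the higher byte first, S0 -> S2 -> S3 the lower byte first.
AGGREGATE = {
    ("S0", "LoadHigh"): {"S1"},
    ("S0", "LoadLow"): {"S2"},
    ("S1", "LoadLow"): {"S3"},
    ("S2", "LoadHigh"): {"S3"},
}

if __name__ == "__main__":
    print(run_nfa(REGISTER, "S0", {"S1"}, ["LoadR1", "HoldR1", "HoldR1"]))
    print(run_nfa(AGGREGATE, "S0", {"S3"}, ["LoadLow", "LoadHigh"]))
```

Both orderings of the aggregate automaton reach S3 in two steps, which mirrors the statement that both solutions take two time-steps.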
In this way, we can represent the transparency behavior of an RTL element in a compact manner using an FSA. Since a core can be viewed as an interconnection of RTL elements, we can clearly extend the transparency behavior concept to the core-level. For this purpose, we first pre-compute the transparency behavior for the different RTL elements. We next perform RTL symbolic justification and propagation analysis to compose a transparency behavior for the core outputs in terms of the core inputs. Exploiting the equivalence of regular expressions and FSA, we tailored an existing regular expression based justification and propagation framework [9] for this purpose. Modeling of core-level transparency behavior using an FSA is analyzed in the following example.
Example 1: Consider the core DG1 shown in Figure 2. It is a part of an SOC used as a data address generator. The core consists of a 16-bit input Data, a reset input R, a test input T and a 16-bit output Generate. The core has an arithmetic logic unit, ALU, and a counter, CTR, which facilitate both logical and arithmetic modifications to the input data.
Let us analyze the transparency behavior of DG1 as shown in Figure 3. The FSA has one start state S0' and six accept states S0, S4, ..., S8. States S0' and S0 have the property that incoming transitions to these states preserve the previously held values in the different register elements. Therefore, it follows from Figure 3 that values generated at an accept state (other than S0) are always available for one extra time-step through a transition to S0. This is useful when we use DG1 to feed cores that require the same test data for an extra cycle. The edges in the FSA are annotated with input labels that denote the values required at the primary inputs, while states are annotated with the operations performed in that cycle. For example, the transition from state S2 to S3 is labeled with R,T, which indicates that an input of R = 0, T = 0 when applied to DG1 in state S2 transfers DG1 to state S3. In state S3, CTR loads its input (as given by the annotation Cc = load to state S3). Observe that the FSA is only partially specified, thereby giving only the necessary information required for transparency analysis.
Consider the problem of propagating a test data sequence <v1, v2> from Data to Generate. From Figure 3, we can see that the path S0' to S5 (via S1, S2, S3 and S4) in the FSA has two accept states, S4 and S5. If we examine the path S0' to S4, we need v1 at Data at time t = 2 for providing v1 at Generate at time t = 4. Likewise, path S0' to S5 propagates v2 at Data at time t = 3 to Generate at time t = 5. From the annotations to the edges, we can see that the data transfers are activated from the start state by the following input sequence at R, T: 00, followed by 00, 00, 00 and 00. In this way, we can use an FSA to effectively propagate a test data sequence and also determine the additional constraints (e.g., R, T here) that facilitate this transfer.
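The timing in this example can be replayed with a few lines of code. The sketch below assumes that the start state is occupied at t = 0, that each edge takes one cycle, and that every accept state on the path emits at Generate the value presented at Data two cycles earlier; these assumptions are read off the numbers quoted above rather than from Figure 3, and the function name plan_transfer is hypothetical.

```python
# State sequence of the path described in the text; the index is the time-step.
PATH = ["S0'", "S1", "S2", "S3", "S4", "S5"]
ACCEPT = {"S4", "S5"}
DATA_TO_GENERATE = 2   # cycles, inferred from "Data at t = 2" and "Generate at t = 4"
CONTROL = "00"         # R, T value applied on every edge of this path

def plan_transfer(vectors):
    """Return (vector, time at Data, time at Generate) triples plus the R,T sequence."""
    accept_times = [t for t, state in enumerate(PATH) if state in ACCEPT]
    if len(vectors) > len(accept_times):
        raise ValueError("path has too few accept states for this sequence")
    transfers = [
        (v, t_out - DATA_TO_GENERATE, t_out)
        for v, t_out in zip(vectors, accept_times)
    ]
    controls = [CONTROL] * (len(PATH) - 1)  # one input label per edge
    return transfers, controls

if __name__ == "__main__":
    transfers, controls = plan_transfer(["v1", "v2"])
    print(transfers)
    print(controls)
```

Under these assumptions the sketch reproduces the input times 2 and 3, the output times 4 and 5, and the five consecutive 00 control inputs.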

Core test requirements
Diverse strategies adopted by different core vendors to test a core create a range of controllability/observability objectives for an SOC. For example, high-level symbolic test generation techniques [9] for RTL circuits have controllability and observability requirements only at some specific time-instants (don't cares otherwise). Test strategies such as scan, BIST, sequential test generation, etc., place similar cycle-by-cycle requirements. Consequently, an SOC test framework must be flexible enough to encapsulate different specifications of test sets and also provide a common ground for systematic analysis. In the following example, we illustrate how an FSA can provide a convenient and compact representation of the given core test requirements.
Figure 5(a) models a general scan test schedule as an FSA consisting of a start state start, an accept state finish and two other states test1 and test2. Each state represents a high-level granularity of test actions (as represented by the annotations to the states) and, in turn, can be decomposed into an FSA. For example, the steady-state scan actions of applying the test vector after scanning in the state vector and scanning out the previous response (represented by state test2 in Figure 5(a)) can be decomposed into the FSA shown in Figure 5(b). This decomposition is shown in the context of the scan test requirements of the core TLU. Scanning one test pattern into TLU requires four cycles of shifting from the input with Test = 1 (indicated by states S0 to S4). Simultaneously, we also scan out the stored test response. Thereafter, we can apply the input test vector by setting Test = 0 (S4 → S5 transition). If there are V test vectors, this FSA is executed V - 1 times, accounting for the self-loop at state test2.
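As a rough illustration of the steady-state part of this scan schedule, the sketch below unrolls the value of Test cycle by cycle. It covers only the repeated test2 behavior described above (four shift cycles with Test = 1 followed by one apply cycle with Test = 0, executed V - 1 times); what happens in test1 and finish is not modeled, and the function name is hypothetical.

```python
def steady_state_test_signal(num_vectors, chain_length=4):
    """Value of Test in each cycle of the steady-state phase (state test2).

    chain_length is the number of shift cycles per pattern; four matches TLU.
    """
    if num_vectors < 1:
        raise ValueError("need at least one test vector")
    per_pattern = [1] * chain_length + [0]   # shift in/out, then apply the vector
    return per_pattern * (num_vectors - 1)   # the test2 FSA runs V - 1 times

if __name__ == "__main__":
    print(steady_state_test_signal(3))
```

For V = 3 this yields two repetitions of the five-cycle pattern, one per traversal of the self-loop at test2.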
The FSA to model sequential test sequences and BIST can be obtained in a similar fashion. For example, we can use a sequential test generator such as HITEC [10] to generate a sequence of test vectors for the gate-level implementation of core DG1. We can then model this test schedule by the FSA shown in Figure 6(a). Similarly, we can model a BIST test scheme for a memory module that has a 1-bit TestStart input and a 1-bit Test_Flag output by the FSA shown in Figure 6(b). In this way, FSA representations of basic test schedules provide a common ground for creating a systematic framework for their analysis (Section 3). This, in turn, leads to a compact and low-cost test architecture design discussed next.
System test architecture
In this section, we first introduce an example system test architecture generated for an SOC, and then quantitatively analyze it for possible test application time savings in comparison with existing techniques. Figure 7 shows the steady-state flow of test data at Lo from In[0 to 15] using the transparency behavior of DG1. Suppose DG1 allows variable-latency transparency. Specifically, assume that the first vector can propagate through DG1 in four cycles, but subsequent vectors take only one extra cycle to propagate through it because of a pipeline in it. The scan action described by the FSA in Figure 5 is realized by the window shown in Figure 7. Specifically, four 16-bit vectors are scanned in at Lo through DG1 at cycles i + 9, i + 13, i + 14 and i + 15. Since the state of TLU must be preserved between cycles i + 9 and i + 13, the test controller sets cl = 1 within this period to gate the clock of TLU (and thus preserve its state). The circuit response is captured at cycle i + 16, and scan-out takes four cycles starting at i + 17. However, since scan-in of a new state vector and scan-out of the previous captured response can occur simultaneously in the window shown, it takes eight cycles per test vector to test TLU.
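The eight-cycle figure can be checked directly from the cycle offsets quoted above. The sketch below does this, and also provides a small helper for a core whose transparency has a fixed latency per vector, so that the two cases can be compared. The helper names are hypothetical, and the fixed-latency accounting (every vector, including the test vector itself, costs the same number of cycles) is an assumption made for the comparison.

```python
def window_length(scan_in_offsets, capture_offset):
    """Cycles from the first scan-in of a window through the capture cycle."""
    return capture_offset - min(scan_in_offsets) + 1

def constant_latency_cycles(num_scan_vectors, latency):
    """Cycles per test vector if every vector needs `latency` cycles to pass through."""
    # num_scan_vectors scan-in vectors plus the test vector itself
    return (num_scan_vectors + 1) * latency

if __name__ == "__main__":
    ours = window_length([9, 13, 14, 15], 16)   # offsets relative to cycle i
    fixed = constant_latency_cycles(4, 4)
    print(ours, fixed, fixed / ours)
```

With the offsets i + 9 through i + 16 the window spans eight cycles, while a fixed four-cycle latency for five vectors gives twenty cycles.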
We next compare our scheme with the ones presented in [6, 7], which only allow constant-latency transparency. In other words, under their scheme, DG1 can only feed the desired 16-bit vectors to Lo every four clock cycles (the clock needs to be gated here as well to preserve the state when necessary). Thus, in the steady state, it takes 16 cycles to scan in the desired state into TLU and four more to feed it the desired test vector, for a total of 20 cycles (scan-out can take place in parallel with scan-in as usual). This means that our scheme results in a test application time speed-up of 2.5X for testing TLU.
In the next example, we will illustrate how our methodology can also help lower area and delay overheads. Figure 8 depicts the main components of an SOC called SysProc (ignore the shaded mux temporarily and assume that ASIC1.Out is connected directly to ASIC3.In1). Consider the objective of testing ASIC4 with a given test sequence at its input In, and observing the resulting test responses at its output Out. The transparency characteristics of the different cores, which are significant for the testability of ASIC4, are as follows: ASIC1 is opaque (empty transparency set), while cores ASIC2 and ASIC3 are transparent. The transparency of ASIC2 is significantly different since it takes 4-bit inputs at time-instants 1, 2, 4 and 5 to compose a 16-bit input at time-instant 6. This is depicted by the FSA shown in Figure 9. The testability schemes proposed in [6, 7] cannot model this transparency. Consequently, the only option for those schemes is to provide test data at ASIC3.In1 through additional test hardware. This is done by adding the shaded test hardware shown in Figure 8, causing area and delay overheads.
Subsequent testability analysis by our algorithm in Section 3 exploits the transparency of ASIC2 and determines ASIC4 to be testable. Hence, no additional test hardware is necessary for test data access at ASIC4.In from the system inputs, leading to savings in area and delay overheads. This case study clearly suggests that better test and transparency models are crucial in the development of low-overhead SOC test solutions.
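The time-separated composition attributed to ASIC2 above (4-bit inputs at time-instants 1, 2, 4 and 5 yielding a 16-bit value at time-instant 6) can be pictured with the following sketch. The ordering of the four nibbles within the 16-bit word is not stated in the text, so placing the earliest nibble in the most significant position is purely an assumption, as is the function name.

```python
ARRIVAL_TIMES = (1, 2, 4, 5)   # time-instants at which 4-bit inputs are taken
OUTPUT_TIME = 6                # time-instant at which the 16-bit value is available

def compose_word(nibbles_by_time):
    """Combine four 4-bit values into one 16-bit word (earliest nibble first, assumed)."""
    if sorted(nibbles_by_time) != list(ARRIVAL_TIMES):
        raise ValueError("inputs must arrive exactly at time-instants 1, 2, 4 and 5")
    word = 0
    for t in ARRIVAL_TIMES:
        nibble = nibbles_by_time[t]
        if not 0 <= nibble <= 0xF:
            raise ValueError("each input must fit in 4 bits")
        word = (word << 4) | nibble
    return OUTPUT_TIME, word

if __name__ == "__main__":
    when, word = compose_word({1: 0xA, 2: 0xB, 4: 0xC, 5: 0xD})
    print(when, hex(word))
```

A model that only recognizes single-cycle, same-width transfers would have no way to express this behavior, which is the point the example makes about schemes that cannot exploit ASIC2.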
The SOC Testing Algorithm
In this section, we detail the algorithmic aspects of our methodology. Our algorithm takes as its input a system of cores, their connectivity and test requirements, and outputs a low-overhead test architecture and a test schedule that facilitate its testability. In the process, it follows the steps outlined below.
The first step in the algorithm is to model the transparency and test requirements of the individual cores (Section 2). In many cases, the core tests involve a repetition of a basic set of test actions (a single symbolic test) for different test data. Therefore, we next perform system-level symbolic justification and propagation to satisfy the requirements of this symbolic test. This is very similar to symbolic RTL testability analysis for testing an RTL element (e.g., functional unit, register, multiplexer, etc.) in a standalone core with a symbolic test vector. Hence, we adopted the regular expression based symbolic testability analysis scheme from [9] for this purpose. Note that the analysis scheme in [9] was applied to individual cores, not SOCs. In general, any other high-level testability analysis scheme can also be used to determine the system-level test actions for a single symbolic test. For such cases, the solution capturing the cycle-by-cycle test actions is simply equivalent to an FSA. Unlike the analysis which terminates at this juncture for a standalone core, we need to compose a sequence of SOC tests at time-instants dictated by the test models. If such a sequence is not realizable with the existing transparency and connectivity, we employ additional test hardware, e.g., clock gating or system-level test multiplexers, to relax the core test requirements and output a low-cost solution. Finally, we employ the framework provided in [9] to minimize the test hardware added.
Composing a test sequence
We now propose a Boolean satisfiability based framework for composing a test sequence from a single symbolic test. We first illustrate our method with the help of some simple examples.
Example 5: Consider the FSA for a single symbolic test shown in Figure 10(a). If the system operates according to the sequence of actions specified by this FSA, we achieve the test objective when the system enters the accept state SN. Let tN denote the time-instant associated with state SN assuming the time-scale starts with state S0 at t0. Now, suppose that the test requirements specify that the test objective must be achieved every two cycles in the steady state. In other words, we require the system to enter accept state SN at time-instants tN, tN + 2, tN + 4, etc. This, in turn, is possible if we can pipeline the FSA as shown in Figure 10(b). From the time-chart shown, it is evident that we can realize the sequence of tests if and only if states SN, SN-2, etc., co-exist, and states SN-1, SN-3, etc., also co-exist. In other words, we merely need to check if the odd-numbered group of states and the even-numbered group of states are compatible.
In the next example, we will study the additional issues that must be considered when the FSA for a single symbolic test is available. We will also motivate why a satisfiability-based approach forms a natural solution to the problem.
For the Boolean expression TESTS in Equation (1) to be complete, we must also consider the constraints due to the compatibility graph (see Figure 11(b)). Specifically, the incompatibility of state S0 with state S3 translates to the Boolean expression $\bigwedge_{i} \bigwedge_{j} (S0_i \rightarrow \overline{S3_j}) \wedge (S3_j \rightarrow \overline{S0_i})$. The Boolean expression INCOMPAT captures the incompatibility relationships given by the compatibility graph.
A state in the test schedule is a tuple of states existing at that time-instant for some instantiation in the time-chart. However, this schedule is a solution only for a finite number (five) of test objectives. We can extend this schedule to the infinite case by comparing states for equivalence. We note that state (S4, S2, S0) repeats in Sch1. Using state equivalence, we can obtain the compacted schedule Sch2, as shown in Figure 11(d). Schedule Sch2 clearly satisfies an infinite number of test objectives.
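The ideas in these examples (phasing copies of one symbolic test in time, checking that co-existing states are compatible, and compacting a finite schedule by spotting a repeated state tuple) can be sketched as follows. The sketch assumes a strictly linear FSA with states numbered 0 to N, a fixed period between test instances, and an incompatibility relation given as unordered pairs of state indices; the FSAs in Figures 10 and 11 may differ, and all function names are hypothetical.

```python
from itertools import combinations

def time_chart(num_states, period, num_tests):
    """Row t lists the state index of every test instance active at time t."""
    horizon = (num_tests - 1) * period + num_states
    chart = []
    for t in range(horizon):
        row = tuple(
            t - k * period
            for k in range(num_tests)
            if 0 <= t - k * period < num_states
        )
        chart.append(row)
    return chart

def is_realizable(chart, incompatible):
    """True if no row contains a pair of states marked incompatible."""
    return not any(
        frozenset(pair) in incompatible
        for row in chart
        for pair in combinations(row, 2)
    )

def compact(chart):
    """Split the chart into (prefix, cycle) at the first repeated row."""
    seen = {}
    for t, row in enumerate(chart):
        if row in seen:
            return chart[:seen[row]], chart[seen[row]:t]
        seen[row] = t
    return chart, []

if __name__ == "__main__":
    chart = time_chart(num_states=5, period=2, num_tests=5)
    print(is_realizable(chart, {frozenset((0, 3))}))
    print(compact(chart))
```

With five states and a period of two, even-numbered and odd-numbered states never share a row, so an incompatibility between states 0 and 3 does not block the schedule, whereas a period of three would place them in the same row. The repeating portion returned by compact plays the role of a schedule that can be iterated indefinitely.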
The pseudocode for SOC test sequence composition is given in Figure 12. The function Compose_TestSeq takes as its input an array of FSA Tests which must be phased according to the test requirements CoreTest. The compatibility graph for each FSA in Tests is precomputed using Definition 2, and is passed as the input array CG.
The function Phase_Tests generates the time-chart Timechart using the FSA Tests and the specific test requirements CoreTest (statement 1). Clause_Gen (statement 2) uses Timechart and CG to construct the set of conjunctive clauses TESTS, as described in Example 6. SatSolve (statement 3) next checks TESTS for satisfiability and, if satisfiable, the function Schedule (statement 4) returns a valid test schedule.
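To make the four steps concrete, here is a toy rendering of the same flow in Python. It is not a transcription of Figure 12: the encoding (one Boolean variable per test instance and candidate path, exactly-one constraints per instance, and a two-literal clause for every pair of choices that would place incompatible states in the same cycle) is an assumption chosen for illustration, the solver is a plain exhaustive search suitable only for tiny inputs, and all function names are hypothetical Python counterparts of the names in the text.

```python
from itertools import combinations, product

def phase_tests(paths, period, num_tests):
    """Time-chart: chart[(k, p)] maps time -> state if instance k follows path p."""
    return {
        (k, p): {k * period + step: state for step, state in enumerate(path)}
        for k in range(num_tests)
        for p, path in enumerate(paths)
    }

def clause_gen(chart, num_tests, num_paths, incompatible):
    """CNF as lists of (variable, polarity); variable (k, p) = instance k takes path p."""
    clauses = []
    for k in range(num_tests):
        clauses.append([((k, p), True) for p in range(num_paths)])   # at least one path
        for p, q in combinations(range(num_paths), 2):               # at most one path
            clauses.append([((k, p), False), ((k, q), False)])
    for a, b in combinations(sorted(chart), 2):
        if a[0] == b[0]:
            continue  # same instance, already covered by the at-most-one clauses
        clash = any(
            frozenset((s, chart[b][t])) in incompatible
            for t, s in chart[a].items() if t in chart[b]
        )
        if clash:
            clauses.append([(a, False), (b, False)])
    return clauses

def sat_solve(clauses):
    """Exhaustive satisfiability check; returns a model or None."""
    variables = sorted({var for clause in clauses for var, _ in clause})
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(any(model[v] == pol for v, pol in clause) for clause in clauses):
            return model
    return None

def compose_test_seq(paths, period, num_tests, incompatible):
    """Return {instance: chosen path index}, or None if the tests cannot be phased."""
    chart = phase_tests(paths, period, num_tests)
    model = sat_solve(clause_gen(chart, num_tests, len(paths), incompatible))
    if model is None:
        return None
    return {k: p for (k, p), chosen in model.items() if chosen}

if __name__ == "__main__":
    paths = [("S0", "S1", "S2", "S3"), ("S0", "S2", "S1", "S3")]
    busy = {frozenset(("S1",)), frozenset(("S2",))}  # a state cannot host two instances
    print(compose_test_seq(paths, period=1, num_tests=3, incompatible=busy))
```

In this toy instance consecutive tests are forced onto the same path, because mixing paths would put two instances in S1 or S2 in the same cycle. Adding an incompatibility between S0 and S3 and a fourth instance makes the formula unsatisfiable, which corresponds to the case where additional test hardware would be needed.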
Experimental Results
We next present experimental results obtained by applying our algorithm to some example SOCs. The SOC Sys-Gen was seen in Section 2. SOCs Grid, Star4, Mesh and Star8 are systolic architectures proposed to study the performance of digital signal processing applications. They consist of processor cores connected in different configurations to effect pipelined processing of input data.
The testability results are given in Table 1. Columns 2 and 3 give the area of the SOCs before and after running our algorithm. The area numbers are actual layout numbers generated after placement and routing with the Octtools package from University of California, Berkeley. Column 4 reports the resultant area overheads for these SOCs, with an average of only 4.4%. Columns 5 and 6 compare the test application time for our approach with the one in [6], which drastically reduces test application time compared to the traditional approaches. Column 7 gives the percentage reduction in the test application time, with an average of 78.5%. Since the scheme in [6] cannot handle some of the SOCs, we conservatively extended their approach to estimate the test application time. Our testability approach achieves 100% test coverage of all embedded cores in all SOCs. For example, the test architecture derived for the SOC Sys-Gen (see Figure 4) provides complete access to apply the scan tests for the core TLU as well as the precomputed test sequences for cores CU and DG1. In this way, all embedded cores are completely tested with the precomputed test sets (or sequences) provided for them.
Conclusions
We provided a comprehensive framework for analyzing the testability of core-based SOCs and generating low-overhead test architectures and compact test schedules. Salient features of this work include the modeling of transparency, tests and test hardware using finite-state automata, and providing a rigorous system-level testability analysis framework. Experimental results show complete test coverage with low area overheads and test application times.
