The testability of a VLSI design is strongly affected by its register-transfer level (RTL) structure. Since the high-level synthesis process determines the RTL structure, it is necessary to consider testability during high-level synthesis. A synthesis system composed of scheduling and binding components minimizes the number of hardware sharing conflicts between tests in the test schedule. Novel test conflict estimates are used to direct the synthesis process. The test conflict estimation is based on examination of the interconnect structure of the partial design state during synthesis. Test conflict estimates enable our synthesis system to select design options which increase test concurrency, thereby decreasing test time. Experimental results show that designs generated by this approach are testable in a highly concurrent manner.
INTRODUCTION
The cost of chip testing has become a large fraction of the total chip production expenditure as other cost components have improved. Furthermore, increasing gate-to-pin ratios limit the feasibility of testing chips externally. Incorporating test structures into the design improves the testability of hardware which is not easily testable through external pins. The Built-In Self-Test (BIST) approach [2, 11] tests chip components using pseudo-random patterns which are generated on-chip. Consequently, testing can be performed on-site with minimal additional testing equipment, and at chip speed. BIST requires the placement of pseudo-random pattern generators (PRPGs) and multiple-input signature registers (MISRs) on the chip. PRPGs generate pseudo-random patterns to test the combinational modules on the chip, and MISRs compact the results of the tests. All test registers are in a shift register chain so that seeds to the PRPGs can be shifted in at the beginning of testing, and compacted results can be shifted out of the MISRs after testing.
High chip test time reduces chip production throughput and increases chip production cost. In order to minimize test time, as many tests as possible should be executed in parallel, yet total parallelization is often impossible due to hardware sharing conflicts between tests. Hardware sharing conflicts occur because the test results of two different hardware modules may be forced to propagate through the same hardware in order to arrive at a MISR. Such conflicts occur as a result of the nature of the interconnect structure between different modules in the datapath. Considering test concurrency during high-level synthesis may greatly improve test time since the structural representation is determined at that stage.
Previous research concentrates on the effect of self-loops and sequential depth from test registers. Such approaches may increase the testability of a design, but they only indirectly address the problem of conflicts between tests. It is the test conflicts which increase test time by requiring that tests be executed sequentially. Since reduction of self-loops and sequential depth only resolves part of the test conflict problem, those approaches may degrade the datapath area and delay characteristics, while providing little increase in testability. A new metric for testability is needed which reflects the conflict properties of the tests.
In this paper we present metrics for the estimation of the number of test conflicts in the final design, and we present a scheduling and binding algorithm which uses these metrics to make intelligent high-level synthesis decisions. This novel approach introduces the possibility of including test concurrency and test time issues in microarchitectural synthesis. The proposed approach enables, for the first time, reasoning about evolving test concurrency in tandem with the evolving microarchitectural design. The results of the described algorithm are demonstrated.
This work is supported by the Semiconductor Research Corporation under contract number 93-DJ-538, the National Science Foundation under grant number CDA-9314748, and a graduate fellowship from Brooktree Inc.
PROBLEM DEFINITION
In this paper we describe a scheduling and binding algorithm which produces microarchitectural designs with high levels of test concurrency. The algorithm is applicable to a range of Built-In Self-Test methodologies, such as Partial-Intrusion BIST [1] and Full BIST [11]. The algorithm uses a testability metric which minimizes test application time by reducing hardware sharing conflicts between tests.
Under the assumption used throughout this paper, flip-flops are not necessarily placed at each control line, so random data is not sent on these lines. It is the responsibility of the test controller to select the mux configurations at each step of testing. We use this assumption because control of the mux configurations allows hardware sharing conflicts to be reduced by enabling test data to be directed through non-conflicting paths of the datapath.
In order to test a module, each of its inputs must receive test data from PRPGs, and its output must send test results to a MISR. The test data may pass through a number of modules between the output of the PRPGs and the input of the MISR. The subgraph of the datapath through which test data flows from PRPGs to MISRs in order to test a module is called a test path. When two or more test paths share hardware, they are said to conflict. Conflicts between test paths restrict the test scheduling [6], thereby reducing the throughput. The concurrency of conflicting test paths is restricted differently depending on the type of hardware being shared. The two types of conflicts are listed below.
Hard Conflicts occur when one test path uses a register as a MISR, while another test path uses the same register as a PRPG. Figure 1a shows a hard conflict between two test paths due to their sharing of register r3. Since a register with both PRPG and MISR capabilities entails large area overhead, we disallow this option. This assumption forces the two test paths in figure 1a to be executed in different test sessions. All of the test paths in one test session are executed concurrently with one another, but each test session must be executed sequentially. Additional test sessions increase test time.
Soft Conflicts exist when two test paths share intermediate registers, muxes, buses, or functional units at the same control step. Soft conflicts can be avoided by scheduling operations which conflict into different control steps. In the example of figure 1b, by scheduling the use of ADD2 in test path 1 to the first control step, and the use of ADD2 in test path 2 to the second control step, the conflict has been avoided and the two test paths can be executed in parallel. The work in this paper assumes that test scheduling will be performed in a single test session which contains no hard conflicts. For this reason, references to conflicts in this paper refer to soft conflicts.
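The two conflict types can be made concrete with a small sketch. The code below is only an illustration under assumed data structures: the `PathSpec` class, its fields, and the register names are hypothetical and do not come from the synthesis system described here. A test path is modeled as the set of registers it uses as PRPGs, the set it uses as MISRs, and the shared resources it occupies at each control step.

```python
from dataclasses import dataclass, field


@dataclass
class PathSpec:
    """Hypothetical model of a test path (an assumption for illustration)."""
    name: str
    prpgs: set   # registers used as pattern generators
    misrs: set   # registers used as signature registers
    usage: dict = field(default_factory=dict)  # control step -> shared resources


def hard_conflict(a, b):
    """True if some register is a PRPG in one path and a MISR in the other."""
    return bool((a.prpgs & b.misrs) | (a.misrs & b.prpgs))


def soft_conflict(a, b):
    """True if both paths use the same intermediate resource in the same step."""
    common_steps = a.usage.keys() & b.usage.keys()
    return any(a.usage[s] & b.usage[s] for s in common_steps)


# Two paths that both need ADD2; names mirror the ADD2 example in the text.
p1 = PathSpec("path1", {"r1"}, {"r5"}, {1: {"ADD2"}})
p2 = PathSpec("path2", {"r2"}, {"r6"}, {1: {"ADD2"}})
# Moving the use of ADD2 in path 2 to the second control step removes the conflict.
p2_rescheduled = PathSpec("path2", {"r2"}, {"r6"}, {2: {"ADD2"}})
# A path that uses r5 as a PRPG while path 1 uses it as a MISR.
p3 = PathSpec("path3", {"r5"}, {"r7"})
```

In this sketch `soft_conflict(p1, p2)` holds but `soft_conflict(p1, p2_rescheduled)` does not, while `hard_conflict(p1, p3)` holds regardless of control steps, which is why hard conflicts force separate test sessions.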
We identify two goals which should be satisfied during synthesis to ensure testability: (a) each module should be covered (included in a test path), and (b) full coverage should be achieved with as few test conflicts as possible. Since test paths are not defined until after the datapath is complete, it is necessary at this stage of synthesis to design a datapath which allows the definition of non-conflicting test paths. A module must be included in a test path in order to be tested, so goal (a) is the minimum requirement for testing the chip. The number of conflicts between tests is important due to its effect on test time. Since all tests which share hard conflicts must be performed in different test sessions, the maximum number of mutually conflicting tests is a lower bound on the number of test sessions. Soft conflicts can be avoided, but their avoidance often requires an increase in the number of control steps required to perform a test. Both hard and soft conflicts can be avoided through proper datapath definition during high-level synthesis. The effects of test conflicts are also apparent when testing is performed in a pipelined fashion. During pipelined testing, test conflicts must be avoided between different test paths, as well as between different instantiations of each path. Consequently, soft conflict consideration is crucial to maximizing throughput during pipelined testing.
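The lower bound mentioned above, the maximum number of mutually conflicting tests, can be sketched as the size of the largest clique in a conflict graph. The brute-force search below is an illustrative assumption suitable only for small examples; the function name and the conflict pairs are hypothetical and are not taken from any figure in this paper.

```python
from itertools import combinations


def session_lower_bound(tests, conflicts):
    """Size of the largest set of mutually conflicting tests (brute force).

    tests: list of test names
    conflicts: iterable of (test, test) pairs that share a hard conflict
    """
    conflict_set = {frozenset(pair) for pair in conflicts}
    # Try the largest groups first so the first hit is the maximum.
    for size in range(len(tests), 1, -1):
        for group in combinations(tests, size):
            if all(frozenset(p) in conflict_set for p in combinations(group, 2)):
                return size
    # With no conflicting pair, a single session suffices (zero if no tests).
    return 1 if tests else 0


# Hypothetical conflict pairs: A1, S1, and M1 conflict pairwise.
tests = ["A1", "S1", "M1", "IN1"]
hard = [("A1", "S1"), ("S1", "M1"), ("A1", "M1"), ("M1", "IN1")]
bound = session_lower_bound(tests, hard)
```

Here `bound` is 3, since A1, S1, and M1 must each occupy a different session. The bound is not always achievable, because a valid session assignment is a coloring of the conflict graph and may need more sessions than the largest clique.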
MOTIVATION
The work presented here attempts to: (a) cover each module in a test path, and (b) achieve full coverage with the minimum number of test sessions. Since every module must participate in at least one test path, each module must be covered. Coverage is reduced if a module's input port is reachable only by registers which must act as MISRs, or if a module's output port can only reach registers which must act as PRPGs. A self-loop is perhaps an extreme case of a lack of coverage since an input port and an output port cannot be independently covered. Test time is greatly increased by reduced test concurrency caused by hardware sharing conflicts between tests. The datapath in figure 2a has a test conflict because both modules A1 and S1 must be observed through module IN1. The test paths in figure 2b show that the outputs of modules A1 and S1 must both be connected to the input of IN1. This conflict forces A1 and S1 to be tested sequentially, in different test sessions, increasing test time.
Coverage and conflict problems may be inadvertently created during high-level synthesis unless care is taken to avoid them. Evolving concurrency problems can be estimated during high-level synthesis to determine which decisions will reduce the probability of conflicts. Scheduling and binding decisions which do not consider the conflicts between tests can limit the testing options of many modules to single options which share hardware. When the test options of two or more modules are limited in this way, a conflict is forced to occur which may have been avoidable.
We use figure 3 to illustrate the necessity of considering test conflicts during scheduling. Two schedulings are shown for the dataflow graph in figure 3. Dashed horizontal lines denote clock cycle boundaries. In the scheduling in figure 3a, at least one binding exists, as shown, which results in the concurrently testable datapath shown in figure 3b. All functional units are testable in a single test session through the test path which is shown in dotted lines in figure 3b. The scheduling in figure 3c does not allow a binding to be generated that avoids a test conflict because the outputs of both the adder and the subtracter are connected only to the right input of the divider. In order for the subtracter and adder to be tested concurrently, the divider's right input would need to receive input from both the adder and subtracter at the same time. The scheduling in figure 3a allows binding to be performed without creating a conflict. Consequently, it is more advantageous, from a self-test perspective, than the scheduling shown in figure 3c.
Similarly, the example in figure 4a illustrates the necessity of considering test conflicts during binding. Figure 4a shows the dataflow graph from which the datapath in figure 2a was generated. Node inc1 in figure 4a has the option of being bound to IN1 or IN2. With respect to interconnect, these two options are equivalent, but from a testing standpoint, one option causes the conflict shown in figure 2a, and the other option generates the conflict-free datapath in figure 4b whose test paths (which are testable in one session) are shown in figure 4c. Notice that there is no conflict at the input of IN2 because A2 may be observed through M1 as shown in figure 4c.
Suboptimal scheduling and binding decisions can easily increase test time as shown above, while reduced test time solutions may exist which incur no additional interconnect overhead. It is clear that an effort must be made during scheduling and binding to avoid these conflicts.
PREVIOUS WORK
The use of high-level synthesis as a technique which allows fast exploration of microarchitectural design possibilities has been thoroughly studied. Recently, efforts have been made to incorporate new design constraints into high-level synthesis such as fault-tolerance [7, 8] and testability. Testability constraints have been included into high-level synthesis in [10], wherein synthesis is performed to reduce sequential depth between registers and primary I/O pins. Reduction of sequential depth will reduce test time by reducing the number of patterns necessary to control and observe each possible fault, but it does not directly reduce the number of conflicts between tests. In [12], an algorithm is presented to perform register and operator binding to remove self-loops in the datapath which can cause problems during testing. Work performed in [5] is an initial attempt at incorporating test conflict considerations in conjunction with synthesis. Test conflicts are removed in [5] by modifying high-level synthesis binding decisions to redefine RTL interconnect. Various metrics have been used to estimate the testability of a datapath. In [3], a metric is proposed to estimate the controllability/observability of chip modules based on adjacency to registers and external pins. Chiu and Papachristou [4] present metrics to estimate the fault coverage of circuit modules based on the locations of test registers in the datapath.
DESIGN REPRESENTATION
It is necessary to maintain a representation of the partial design state during synthesis in order to estimate the effect of synthesis decisions on the testability of the design. While performing binding, we model the design as a graph whose nodes are functional units and registers with a number of input ports, and a single output port. The edges of the graph represent point-to-point connections between those ports.
During scheduling (before binding has been performed), no distinction can be made between hardware modules with the same functionality, so each node in the Scheduling Design Representation corresponds to the set of all modules with the same functionality. Each node in the scheduling design representation is annotated with an allocation value to indicate the number of modules with the corresponding functionality that will exist in the final design. Each edge in the scheduling design representation is annotated with a connection weight which is the total number of connections which exist between the two classes of modules that the edge spans. The Binding Design Representation distinguishes each hardware module as a different node, and each connection between modules as a different edge with unit weight. Figure 5 shows an example of a scheduling design representation and one binding design representation to which it maps.
The goal during scheduling is to selectively break connections between adjacent dataflow graph nodes by scheduling them to different clock cycles. This causes the insertion of a register between the functional units to which the dataflow graph nodes are eventually mapped during binding. In this way, connections between functional units and registers are distributed, to allow each operator to have sufficient conflict-free test options. During scheduling, test conflicts can only be avoided between groups of functional units of the same type. It is not until the binding phase that conflicts between individual functional units can be examined. As was illustrated in the example of figure 3, unless scheduling creates a good distribution of interconnections between hardware types, it may be impossible for binding to avoid a conflict.
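The relationship between the two representations can be sketched by collapsing a binding-level graph into the scheduling-level graph it corresponds to. This runs in the opposite direction from synthesis, where the scheduling representation comes first, but the collapse is deterministic and makes the allocation values and connection weights easy to see. The function name, module names, and connections below are hypothetical.

```python
from collections import Counter


def to_scheduling_representation(module_type, connections):
    """Collapse a binding-level graph into a scheduling-level graph.

    module_type: module name -> functional type (e.g. "add", "mul", "reg")
    connections: list of (source module, destination module) unit-weight edges
    Returns (allocation, weights):
      allocation[type]           = number of modules of that type
      weights[(src_type, dst_type)] = number of connections between the types
    """
    allocation = Counter(module_type.values())
    weights = Counter((module_type[s], module_type[d]) for s, d in connections)
    return allocation, weights


# Hypothetical datapath fragment with two adders, a multiplier, and a register.
module_type = {"A1": "add", "A2": "add", "M1": "mul", "R1": "reg"}
connections = [("A1", "M1"), ("A2", "M1"), ("M1", "R1"), ("R1", "A1")]
allocation, weights = to_scheduling_representation(module_type, connections)
```

In this example the two adder nodes merge into one node with allocation 2, and the two adder-to-multiplier connections merge into one edge with connection weight 2. Many different binding-level graphs collapse to the same scheduling-level graph, which is why conflicts between individual units cannot be examined until binding.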
A scheduling decision is chosen by first finding a connection in the scheduling design representation whose connection weight should be increased. The connection is chosen using the proposed test conflict metric to evaluate the design representation as a result of increasing each connection individually. Then a scheduling decision is selected which will cause the weight of the chosen connection in the design representation to be increased. The chosen scheduling decision is performed and the effects of the decision on the design are propagated by pruning design options which have been made infeasible due to the scheduling decision. Scheduling decisions are selected in this manner until scheduling is complete.
Binding decisions distribute interconnections over the modules to allow all modules to have a multiplicity of test options. Operator and register binding are performed in an intertwined fashion. At each decision step, all possible operator and variable decisions are enumerated and evaluated. The options are evaluated using the proposed test conflict metric to evaluate the testability of the binding design representation as a result of each option. The selected binding option is then performed and its effects are propagated throughout the design state to prune away infeasible design options. Additional binding decisions are selected until binding is complete.
The rest of the paper is organized as follows: Section 7 describes the test conflict metrics which are used to guide synthesis. Sections 8 and 9 describe the use of the test conflict metric during scheduling and binding. Experimental results are discussed in section 10.
METRICS FOR TEST CONCURRENCY
We propose the following testability metrics which examine an incomplete design and estimate the number of hardware sharing conflicts that will occur between tests in the final design. These metrics are used in the algorithm to evaluate the testability of proposed scheduling and binding decisions, and thus guide the algorithm towards designs with superior testability characteristics.
The two aspects that influence test time are: (a) concurrency between tests, and (b) the number of patterns required for each test path. The degree of chaining affects the number of patterns for each test path. This issue has been explored previously in [13]. The research described here maximizes the test concurrency by avoiding the creation of test hardware sharing conflicts in the structural representation. Consequently, we propose metrics for modeling the number of test conflicts during high-level synthesis.
Metric Definition
We identify two characteristics of the I/O ports of a module that indicate the testability of that I/O port.
Coverage Probability: The probability that at least one of the incoming edges will be included in a test path which connects it to LFSRs. If at least one of the incoming edges is covered (included in a test path), then the I/O port will be covered also.
Conflict Probability: The probability that a hardware component will need to be multiply included in some test path(s). Conflict probability is only valid for the input ports of a component, since an output port can fan out to many different modules, while an input port can only select a single input.
To evaluate the testability quality of a datapath, we examine the ratio between the average coverage probability and the maximum conflict probability over all I/O ports. The average coverage is maximized because all I/O ports are equally important since they must all be tested. The maximum conflict probability is used because the largest conflict in any I/O port determines the maximum degree of allowable concurrency. For example, in the datapath of figure 2a, although there is a conflict on the input of only a single module, IN1, that single conflict causes two test sessions to be the minimum number.
Coverage Calculation
The coverage of an I/O port is the probability that the port will be contained in a test path which connects it to either an LFSR or a MISR. The coverage probability of a port is a path-based metric which evaluates the paths between a port and a test register. The coverage can be computed as a function of the coverages of the neighboring edges and nodes.
Each input port i of a module m has a coverage Cin(m, i), which is a function of the coverage probabilities of its incoming edges. The output port is similarly associated with a coverage probability Cout(m), which is a function of the output coverages of the outgoing edges. Each edge has an input coverage value, Cin(e), which represents the probability that it will be needed by its predecessor module to send test data to a MISR. Each edge also has an output coverage, Cout(e), which is the probability that its successor module will need it to connect to a PRPG. The formulas for Cin(e) and Cout(e) are shown in
where mp(e) is the predecessor module of edge e, ms(e) is the successor module of edge e, iMax is the number of input ports that module mp has, OutDeg(m) is the number of outgoing edges of m, and InDeg(m, e) is the number of edges entering module m at the input through which edge e is connected.
The input coverage probability of an input port is the probability that at least one of the incoming edges is covered. This can be computed as shown in equation 2.
Cin(m;i) = 
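The phrase "at least one of the incoming edges is covered" can be illustrated with a short sketch. It assumes that the edge coverages are independent, as the conflict calculation in this paper does, and it is offered only as an illustration of that idea rather than as the exact form of equation 2. The function name and the probability values are hypothetical.

```python
import math


def at_least_one(probabilities):
    """Probability that at least one of several independent events occurs.

    Computed as one minus the probability that none of them occurs.
    """
    return 1.0 - math.prod(1.0 - p for p in probabilities)


# Hypothetical coverage values for the edges entering one input port.
port_coverage = at_least_one([0.5, 0.5])
```

Here `port_coverage` is 0.75: each edge alone is covered half the time, and both fail to be covered a quarter of the time. Adding more incoming edges can only raise this value, which is consistent with the goal of giving each port a multiplicity of test options.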
Conflict Calculation
Since conflicts can only occur on the inputs of a module, only input ports have conflict probabilities. The conflict probability X(m, i) for input port i of module m is the probability that two incoming edges will need to be covered. This is modeled in the binding design representation by finding the probability that the two incoming edges with the highest coverage probabilities will both be covered. The coverage probabilities of two edges are assumed to be independent, so the probability of their coincident occurrence is modeled as the product of their respective probabilities. In the binding design representation the conflict probability is calculated as shown in equation 4.
$$X(m, i) = C_{in}(e_{M1}(m, i)) \cdot C_{in}(e_{M2}(m, i)) \qquad (4)$$
$$e_{M1}(m, i) = e_1 \ni (e_1 \in eList(m, i) \cap \ldots$$
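Equation 4 can be sketched directly: take the two incoming edges with the highest coverage probabilities and multiply them. Returning zero for a port with fewer than two incoming edges is an assumption of this sketch, motivated by the observation that a conflict needs two edges to compete for the same input. The function name and values are hypothetical.

```python
import heapq


def conflict_probability(edge_coverages):
    """Product of the two largest edge coverage probabilities at an input port.

    edge_coverages: coverage probabilities of the edges entering the port,
    assumed independent as stated in the text.
    """
    if len(edge_coverages) < 2:
        # Assumption: a port with a single incoming edge cannot conflict.
        return 0.0
    highest, second_highest = heapq.nlargest(2, edge_coverages)
    return highest * second_highest


# Hypothetical input port with three incoming edges.
x = conflict_probability([0.5, 0.25, 0.5])
```

Here `x` is 0.25, the product of the two edges with coverage 0.5. The third edge does not affect the result, since only the two most likely edges are considered.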
SCHEDULING FOR TEST CONCURRENCY
During scheduling, in order to minimize test conflicts it is necessary to distribute connections between different hardware types in such a way that all of the hardware types can be covered with minimal increase to the conflict probability. It is important that each hardware module participate in at least one test path, but not two or more conflicting test paths. The scheduling algorithm distributes connections between different module types to minimize the probability that a single hardware type is required to participate in multiple test paths.
Each node in the scheduling design representation corresponds to the set of modules with a particular functionality. The edge weight of each edge in the representation is the number of connections between the sets of modules represented by the nodes adjacent to the edge. Before any scheduling decisions have been performed, only the connections to constants and architectural variables can be determined. Consequently, the weights of the edges to the variable and constant nodes in the representation are initialized with the number of connections to variables and constants respectively. Since connections to nodes other than variables and constants cannot be determined before scheduling, the edges to these nodes are initialized with zero weight.
The two types of scheduling decisions which may be performed at each step are: (a) scheduling two adjacent nodes in the same clock cycle, and (b) scheduling two adjacent nodes in different clock cycles. Adjacent pairs of nodes are scheduled together so that at least one connection will be determined at each step. An adjacent pair of nodes is simultaneously considered to ensure a well-defined effect on the interconnect definition in the structural representation.
Each scheduling decision increments the weights of edges in the scheduling design representation. The quality of a scheduling decision is measured by increasing the weights of the edges which will be affected, and evaluating the resulting design using the proposed testability metric for estimating coverage and conflict probability. The algorithm applies one of the scheduling options which most increases the coverage/conflict ratio. There may be many scheduling options which equally improve the coverage/conflict ratio. A scheduling option is chosen from the set of candidate options which best preserves the degrees of scheduling freedom of adjacent nodes.
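The selection step described above can be sketched as a greedy choice with a tie-break. Everything named in this sketch is a hypothetical stand-in: the option dictionaries, the `evaluate` callback (standing in for the coverage/conflict ratio), and the `freedom` callback (standing in for the preserved degrees of scheduling freedom) are assumptions for illustration.

```python
def choose_scheduling_option(weights, options, evaluate, freedom):
    """Pick the option with the best score, breaking ties by scheduling freedom.

    weights:  edge -> current connection weight
    options:  list of dicts, each with an "increments" map of edge -> amount
    evaluate: function scoring a trial weight map (higher is better)
    freedom:  function scoring how much scheduling freedom an option preserves
    """
    best_score, best_options = None, []
    for option in options:
        # Apply the option to a copy so the real design state is untouched.
        trial = dict(weights)
        for edge, amount in option["increments"].items():
            trial[edge] = trial.get(edge, 0) + amount
        score = evaluate(trial)
        if best_score is None or score > best_score:
            best_score, best_options = score, [option]
        elif score == best_score:
            best_options.append(option)
    return max(best_options, key=freedom)


# Stand-in metric (an assumption): prefer evenly distributed connections.
weights = {("add", "reg"): 2, ("sub", "reg"): 0}
options = [
    {"name": "o1", "increments": {("add", "reg"): 1}, "freedom": 5},
    {"name": "o2", "increments": {("sub", "reg"): 1}, "freedom": 1},
    {"name": "o3", "increments": {("sub", "reg"): 1}, "freedom": 3},
]
chosen = choose_scheduling_option(
    weights, options,
    evaluate=lambda w: -max(w.values()),
    freedom=lambda o: o["freedom"],
)
```

Options o2 and o3 tie on the stand-in metric, so the tie-break selects o3, which preserves more freedom. Option o1 has the most freedom of all but is never considered for the tie-break because its score is worse.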
BINDING FOR TEST CONCURRENCY
During binding, it is necessary to distribute connections between individual modules in such a way that all of the modules can be covered, yet no single module is required to participate in more than one test path. Each node in the binding design representation corresponds to a unique module in the structural representation, and each edge corresponds to a point-to-point connection between modules in the datapath. Before binding has been performed, no module connections can be determined, so the binding design representation contains no edges.
The two types of binding decisions that can be made are: (a) binding a node to a functional unit, and (b) binding a variable to a register. Each binding decision necessitates the addition of new edges to the binding design graph. The quality of each binding decision is measured by adding edges to the design representation which would need to be added if the binding decision were performed, and using our testability metric to evaluate the resulting design.
As a result of a binding decision, the binding options of the remaining unbound nodes need to be restricted. No other operation may be bound to a module if an operation assigned to the same control step is already bound to that module. The binding possibilities of each unbound node are updated to avoid these hardware usage conflicts. A similar algorithm is applied for updating register binding possibilities; in this case, register lifetimes are additionally considered.
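The pruning step for operator binding can be sketched as follows. The function name, the operation names, and the data layout are hypothetical; the register case, which additionally considers lifetimes, is not modeled here.

```python
def prune_after_binding(options, step, bound_op, unit):
    """Restrict the binding options of unbound operations after one binding.

    options:  unbound operation -> set of functional units it may still use
    step:     operation -> control step assigned by scheduling
    bound_op: the operation that has just been bound
    unit:     the functional unit it was bound to
    Returns a new options map that excludes bound_op.
    """
    pruned = {}
    for op, units in options.items():
        if op == bound_op:
            continue  # bound_op is no longer unbound
        if step[op] == step[bound_op]:
            # Same control step: the unit is busy, so remove it as an option.
            pruned[op] = units - {unit}
        else:
            pruned[op] = set(units)
    return pruned


# Hypothetical schedule: add1 and add2 share a control step, add3 does not.
step = {"add1": 1, "add2": 1, "add3": 2}
options = {op: {"A1", "A2"} for op in step}
remaining = prune_after_binding(options, step, "add1", "A1")
```

After binding add1 to A1, add2 can only use A2 because it executes in the same control step, while add3 keeps both options. An operation whose option set becomes empty would indicate an infeasible design state, which a full implementation would need to detect.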
EXPERIMENTAL RESULTS
The results are illustrated in two parts. We first show detailed results for selected high-level synthesis benchmarks. We then summarize extensive results in tabular form, applied to a number of high-level synthesis benchmarks. To illustrate the performance of this system in detail, we examine the design results for the differential equation example [15] and the design of the 16 point elliptic filter benchmark [9] (with chaining). Chained execution of operations in a clock cycle is used in the elliptic filter design, but not in the differential equation design.
The differential equation dataflow graph is scheduled and bound by our system in 4 clock cycles, with 2 adders, 2 multipliers, and 1 relational operator. The resulting datapath is shown in figure 6. The rectangles in the datapath are architectural registers used to store input and output variables, and intermediate registers used to store intermediate values. The scheduling and binding enable each functional unit to be tested in a single test session using the test paths shown in figure 7 (the test registers are shown shaded). Test conflicts are successfully avoided during synthesis by giving each functional unit a non-conflicting test option.
Figure 7: Differential Equation Test Paths
We also explore a design for the elliptic filter example in figure 8 which is scheduled and bound by our system. For clarity we have only included the dataflow operations and the non-recursive edges in figure 8. The full dataflow graph and signal flow graph can be found in [9]. Synthesis was performed in 9 clock cycles, with 5 adders and 2 multipliers, with chaining. This example also enables test conflicts to be avoided, and testing to be performed in a single session. The non-conflicting test paths are shown in figure 9. An interesting feature is that chaining in a test path, which would cause an increased number of test patterns, is not necessary due to the range of test options allowed by the structure of the design.
In addition to the two designs described above, we have generated multiple designs for the AR-filter [14], the FIR-filter [14], the fifth-order elliptic filter, and the differential equation flowgraphs under various constraints. The results of these synthesis experiments with different allocations and clock cycle limits are summarized in table 1. The Chain column indicates the degree of chaining allowed during scheduling. The last column of the table contains the number of test sessions in which each design can be tested. In all experiments, the high-level synthesis system achieved a single test session, independent of the synthesis constraints given, or the existence of chaining in the scheduling of the dataflow graph.
CONCLUSIONS
The importance of test time as a component of chip cost has caused test time to become a design attribute that needs to be considered at the earliest stages of design. The strong effect of test conflicts on the test time of a design makes the use of a test conflict metric necessary in order to reduce test time during high-level synthesis. The work presented here is the first to integrate test conflict information into scheduling and binding in this way. The use of the proposed test conflict metric enabled all of the designs which we generated in our experiments to be testable with maximum test concurrency. We have thus shown that highly testable designs with high levels of test concurrency can be achieved by considering testability during microarchitectural synthesis. Future planned extensions of this work include utilization of the proposed test conflict metric to select testable registers in conjunction with selection of test paths.
