Directed acyclic graphs (dags) are often used to model circuits. Path lengths in such dags represent circuit delays. In the vertex splitting problem, the objective is to determine a minimum number of vertices to split so that the resulting dag has no path of length greater than δ. This problem has applications to the placement of flip-flops in partial scan designs, the placement of latches in pipelined circuits, the placement of signal boosters in lossy circuits and networks, etc. Several simplified versions of this problem are shown to be NP-hard. A linear time algorithm is obtained for the case when the dag is a tree. A backtracking algorithm and heuristics are developed for general dags, and experimental results using dags obtained from ISCAS benchmark circuits are reported.
Introduction
In order to achieve high fault coverage in sequential circuits, they are often designed to be easily testable. The current method of choice is the scan-design. In test mode all flip-flops in a sequential circuit, using scan-design, are connected into one or more shift registers. This allows one to set the contents of the flip-flops to the desired state as well as to observe the states of the flip-flops. As the complexity of logic circuits grows, the overhead for full scan-designs may become unacceptable. For such situations, partial-scan designs have been proposed. In partial-scan designs only a selected subset of the flip-flops in a sequential circuit is included in the scan-path. Several methods to choose the flip-flops to be included in the scan-path have been proposed [CHEN90] , [GUPT90] , [LEE90] . One of these proposals gives a method to use the structural information in a sequential circuit to determine the flip-flops to be placed in a scan-path [CHEN90] . We briefly discuss this method. Figure 1 is an example of an S-graph. Empirical evidence suggests that the existence of cycles and the maximum path length between nodes of the S-graph increase the complexity of deriving tests for sequential circuits. It was therefore suggested in [CHEN90] to include a minimum subset of flip-flops into a scan-path such that the resulting S-graph is cycle-free and the maximum distance between a pair of nodes is small. The maximum distance between nodes 2 o and 2 i is six. If a flip-flop corresponding to node 5 is also included in the scan-path then the S-graph of Figure 3 is obtained. In this the maximum distance between any pair of nodes is less than or equal to 3. Two-step methods to select the flip-flops to be scanned were proposed in [CHEN90] , [GUPT90] , and [LEE90] . In the first step a minimal subset of flip-flops is selected to be included in the scan-path such that the resulting S-graph is acyclic.
In the second step additional flip-flops are selected to be included in the scan-path such that in the resulting S-graph the maximum distance between any pair of nodes is less than or equal to a specified number δ. This second step can be modeled as a vertex splitting problem on directed acyclic graphs (dags).
In this paper we study solutions to the problem of finding a minimum number of nodes, in a dag, to be split such that the maximum distance between any two nodes in the resulting digraph is less than or equal to a pre-specified value δ. The dags we consider are more general than the ones that arise from S-graphs. We permit each edge in the dag to have a positive integral weight instead of requiring all edges to have unit weight. This generalization can be shown to have application in the placement of latches in pipelined circuits and in the placement of signal boosters in lossy circuits.
In Section 2, we introduce the terminology we shall use in the remainder of this paper. The NP-hardness results are developed in Section 3 and the linear time algorithm for tree dags is given in Section 4. A backtracking algorithm and heuristics for the dag vertex splitting problem are proposed in Sections 5 and 6, respectively. Section 7 reports on experiments with the ISCAS benchmark circuits. It should be noted that a linear time algorithm for series-parallel dags is easily derived from the linear time dag vertex deletion algorithm of [PAIK90] .
Terminology
Let G = (V,E,w) be a weighted directed acyclic graph (wdag) with vertex set V, edge set E, and edge weighting function w. w (i, j) is the weight of the edge < i, j > ∈ E and is a positive integer.
A source vertex is a vertex with zero in-degree while a sink vertex is a vertex with zero out-degree. The delay, d (P), of the path P is the sum of the weights of the edges on that path. The delay, d (G), of the graph G is the maximum path delay in the graph, i.e., d (G) = max { d (P) : P is a path in G }.
Figure 3 shows the result, G /X, of splitting the vertex 5 of the dag of Figure 2 . The dag vertex splitting problem (DVSP) is to find a least cardinality vertex set X such that d (G /X) ≤ δ.
Lemma 1: Let G = (V,E,w) be a weighted dag and let δ be a prespecified delay value. Let Max-
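The delay of a split dag can be illustrated with a short sketch. It assumes the usual reading of vertex splitting, which is not spelled out in the text above: a split vertex v is replaced by an "in" copy that keeps v's incoming edges (and so becomes a sink) and an "out" copy that keeps v's outgoing edges (and so becomes a source). The function names split_graph and delay and the edge-dictionary representation are illustrative choices, not taken from the paper.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def split_graph(edges, X):
    """Return the edges of G/X (assumed splitting rule, see lead-in).

    edges maps (u, v) -> positive integer weight.
    A split vertex v becomes (v, "in") and (v, "out").
    """
    new_edges = {}
    for (u, v), w in edges.items():
        tail = (u, "out") if u in X else u
        head = (v, "in") if v in X else v
        new_edges[(tail, head)] = w
    return new_edges


def delay(edges):
    """Return d(G): the maximum total weight of any path in the dag."""
    preds = {}
    for (u, v), w in edges.items():
        preds.setdefault(v, []).append((u, w))
        preds.setdefault(u, [])
    order = TopologicalSorter(
        {v: [u for u, _ in ps] for v, ps in preds.items()}
    ).static_order()
    longest = {}  # longest[v] = maximum delay of a path ending at v
    for v in order:
        longest[v] = max((longest[u] + w for u, w in preds[v]), default=0)
    return max(longest.values(), default=0)


# Unit weight chain a -> b -> c -> d
chain = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1}
print(delay(chain))                           # delay of the unsplit chain
print(delay(split_graph(chain, {"b"})))       # delay after splitting b
```

Under this reading, splitting a source or a sink never changes the delay, which is consistent with the later remark that for δ = 1 exactly the vertices that are neither sources nor sinks must be split.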
Complexity Results
If w (i, j) = 1 for every edge in the wdag, then the edge weighting function w is said to be a unit weighting function and we say that G has unit weights. In this section we show that the following problems are NP-hard.
1. DVSP for unit weight graphs with δ ≥ 2.
Since unit weight wdags are just a special case of general wdags, the results obtained imply the NP-hardness of the corresponding problems with the unit weight constraint removed.
Unit Weight DVSP
We shall show that the known NP-complete problem 3SAT can be solved in polynomial time if the unit weight DVSP with δ ≥ 2 can.
3SAT Problem[GARE79]
Input: A boolean function F = C 1 C 2 . . . C m in n variables x 1 , x 2 , ... , x n . Each clause C i is the disjunction of exactly three literals.
Output: "Yes" if there is a binary assignment for the n variables such that F = 1. "No" otherwise.
For each instance F of 3SAT, we construct an instance G F of the unit weight DVSP such that from the size of the solution to G F we can determine, in polynomial time, the answer to the 3SAT problem for F. This construction employs two unit weight dag subassemblies: variable subassembly and clause subassembly.
Variable Subassembly
Figure 4 
Clause Subassembly
The clause subassembly CS (j) is obtained by connecting together four δ − 1 vertex chains with another three vertex subgraph as shown in Figure 5 (a). The schematic for CS (j) is given in Figure 5 (b). The number of vertices in CS (j) is 4δ − 1 and d (CS (j)) = 2δ. One may easily verify
To construct G F from F, we use n VS (i)'s, one for each variable x i in F and m CS (j)'s, one for
There is a directed edge from vertex Figure 6 is obtained. Since the total number of vertices in G F is 3δn + (4δ − 1)m, the construction of G F can be done in polynomial time for any fixed δ.
Theorem 1: Let F be an instance of 3SAT and let G F be the instance of unit weight DVSP obtained using the above construction. For δ ≥ 2, F is satisfiable iff there is a vertex set X such that |X| = n + 2m and d (G F /X) ≤ δ.
Proof: If F is satisfiable then there is a binary assignment to the x i 's such that F has value 1. Let b 1 , b 2 , ... , b n be this assignment. Construct a vertex set X in the following way:
1.
2. From each CS (j) add exactly two of the vertices l j 1 , l j 2 , l j 3 to X. These are chosen such that the literal corresponding to the vertex not chosen has value 1. Each clause has at least one literal with value 1.
We readily see that |X| = n + 2m and that d (G F /X) ≤ δ.
Next, suppose that there is an X such that |X| = n + 2m and d (G F /X) ≤ δ. From the construction of the variable and clause assemblies and from the fact that |X| = n + 2m, it follows that X must contain exactly one vertex from each of the sets {x i , x _ i }, 1 ≤ i ≤ n and exactly 2 from each of the sets {l j 1 , l j 2 , l j 3 }, 1 ≤ j ≤ m. Hence there is no i such that both x i ∈ X and x _ i ∈ X and there is no j for which l j 1 ∈ X and l j 2 ∈ X and l j 3 ∈ X. Consider the Boolean assignment
Suppose that l jk ∉ X and l jk
) must be split as otherwise there is a source to sink path with delay greater than δ. So, x i (x _ i ) ∈ X and b i = 1 (0). As a result, the k'th literal of clause C j is true. Hence, b 1 , ... , b n results in each clause having at least one true literal and F has value 1.
When δ = 1, the unit weight DVSP is easily solved as now every vertex that is not a source or sink has to be split.
DVSP For Unit Weight Multistage Graphs
A multistage graph is a dag in which the vertices are partitioned into stages and each edge connects two vertices in adjacent stages. An example is given in Figure 7 . In the construction of Section 3.1, VS (i) is a multistage graph but CS (j) is not as the edges < l j 1 , l j 2 >, < l j 2 , l j 3 > require l j 1 and l j 3 to be two stages apart while the edge < l j 1 , l j 3 > requires them to be one stage apart.
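The stage argument above can be checked mechanically. The following is a minimal sketch, assuming that "adjacent stages" means every edge goes from some stage s to stage s + 1; the function name stage_assignment is an illustrative choice. It tries to assign a stage to every vertex and reports failure when two edges impose contradictory stage differences, as happens for the three vertices l j 1 , l j 2 , l j 3 of CS (j).

```python
from collections import defaultdict, deque


def stage_assignment(vertices, edges):
    """Return a dict vertex -> stage if the dag is multistage, else None.

    edges is a list of directed pairs (u, v); each edge must go from
    stage s to stage s + 1 (assumed meaning of "adjacent stages").
    Stages are relative offsets within each connected component.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append((v, +1))   # following the edge moves one stage forward
        adj[v].append((u, -1))   # traversing it backwards moves one stage back
    stage = {}
    for start in vertices:
        if start in stage:
            continue
        stage[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, step in adj[u]:
                if v not in stage:
                    stage[v] = stage[u] + step
                    queue.append(v)
                elif stage[v] != stage[u] + step:
                    return None  # contradictory stage requirements
    return stage


# The three literal vertices of a clause subassembly: not multistage.
clause = [("l1", "l2"), ("l2", "l3"), ("l1", "l3")]
print(stage_assignment(["l1", "l2", "l3"], clause))
```

The vertex names l1, l2, l3 stand in for l j 1 , l j 2 , l j 3 ; the edges < l1, l2 > and < l2, l3 > place l3 two stages after l1 while < l1, l3 > places it one stage after, so no assignment exists.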
To show that DVSP for multistage graphs is NP-hard, we use the problem 2-3SAT defined as:
Input: A boolean function F = C 1 C 2 . . . C n in n variables x 1 , x 2 , ... ,
Output: "Yes" iff there is a truth assignment for the n variables such that F = 1. "No" otherwise.
In this way F is transformed into an instance H of 2-3SAT. One may verify that H is satisfiable iff F is.
From an instance F of 2-3SAT we can construct an instance G F of the multistage DVSP using the variable and clause subassemblies of Figure 8 .
One may verify that for δ ≥ 4 :
The construction of G F is similar to that used in Section 3.1 except that the variable and clause subassemblies of Figure 8 are used. In case |C j| = 2, a modified subassembly CS 2(j), as in Figure 9 (a), is used. If |C j| = 3, then a modified CS 3(j) is used. This modification is now
Proof: Analogous to that of Theorem 1.
Tree DVSP
In this section we develop a linear time algorithm for the DVSP when the wdag G is a rooted tree.
The algorithm is a simple postorder [HORO90] traversal of the tree. During this traversal we compute, for each node x, the maximum delay, D (x), from x to any other node in its subtree. If x has a parent z and D (x) + w (z,x) exceeds δ, then the node x is split and D (x) is set to 0. Consider the example tree of Figure 11 and assume δ = 3. The delay, D (x), for x a leaf node is 0. So, D (x) = 0 for x ∈ { h , i , e , j , k }. In postorder, a node is visited after its children have been. When a node x is visited, its delay may be computed as: D (x) = max { D (y) + w (x,y) : y is a child of x }.
Theorem 4: Procedure DVSP_tree finds a minimum cardinality X such that d (T/X) ≤ δ.
Proof:
The proof is by induction on the number, n, of nodes in the tree T. If n = 1, the theorem is trivially valid. Assume this is so for n ≤ m where m is an arbitrary natural number. Let T be a tree with m + 1 nodes. Let X be the set of vertices split by DVSP_tree and let W be a minimum cardinality vertex set such that d (T/W) ≤ δ. We need to show that |X| = |W|. If |X| = 0, this is trivially true. If |X| > 0, then let z be the first vertex added to X by DVSP_tree. Let T z be the subtree of T rooted at z. As z is added to X by DVSP_tree, D (z) + w (parent (z),z) > δ.
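The postorder procedure described in this section can be sketched as follows. This is an illustrative reconstruction, not the paper's DVSP_tree listing: the function name dvsp_tree, the dictionary-based tree representation, and the iterative traversal are assumptions. It also assumes that no edge weight exceeds δ and that a split node contributes delay 0 to its parent.

```python
def dvsp_tree(children, weight, root, delta):
    """Return the list of nodes split by a postorder sweep of a rooted tree.

    children: dict node -> list of child nodes (leaves may be omitted)
    weight:   dict (parent, child) -> positive integer edge weight
    Assumes every edge weight is at most delta.
    """
    split = []
    D = {}                       # D[x] = max delay from x down into its subtree
    parent = {root: None}
    stack = [(root, False)]      # iterative postorder traversal
    while stack:
        x, children_done = stack.pop()
        if not children_done:
            stack.append((x, True))
            for y in children.get(x, []):
                parent[y] = x
                stack.append((y, False))
            continue
        # All children of x have been visited (and possibly split).
        D[x] = max((D[y] + weight[(x, y)] for y in children.get(x, [])),
                   default=0)
        z = parent[x]
        if z is not None and D[x] + weight[(z, x)] > delta:
            split.append(x)      # the path through x would exceed delta
            D[x] = 0             # after the split, x acts as a leaf for z
    return split


# Unit weight path r -> a -> b -> c with delta = 2
children = {"r": ["a"], "a": ["b"], "b": ["c"]}
weight = {("r", "a"): 1, ("a", "b"): 1, ("b", "c"): 1}
print(dvsp_tree(children, weight, "r", 2))
```

Each node is pushed and popped a constant number of times and each edge is examined once, so the sketch runs in time linear in the size of the tree.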
A Backtracking Algorithm For DVSP
Backtracking algorithms [HORO78] generally search a tree organization of the solution space using bounding functions. The solution to our problem is a 0/1 vector X = (x 1 , x 2 , ... , x n ) where n is the number of vertices and x i = 0 iff vertex i is not split. We use the binary tree organization used in [HORO78] for the 0/1-knapsack problem. In this organization, the nodes at level i denote a decision on x i , 1 ≤ i ≤ n. If x i = 0 we move to the left subtree. Otherwise we move to the right subtree of a level i node. Figure 15 shows the solution space tree for the case n = 3.
Each root to leaf path defines a vector X in the solution space.
The remaining features of our backtracking algorithm are :
1) The vertices of the dag are considered in topological order. Thus, x i (of Figure 15 ) denotes a decision on whether or not the i'th vertex, in the topological order, of the dag is split. 
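A minimal sketch of a backtracking search over the 0/1 vector, with vertices taken in topological order, is given below. It is not the paper's algorithm: the two bounding rules used here (abandon a branch as soon as some path exceeds δ, or as soon as it cannot beat the best solution found so far) are simple assumptions chosen for illustration, as are the function name dvsp_backtrack and the successor-list representation. Splitting is assumed to turn a vertex into a sink for its incoming edges and a source for its outgoing edges.

```python
def dvsp_backtrack(order, succ, delta):
    """Return a minimum-size list of vertices to split, or None if infeasible.

    order: vertices in topological order
    succ:  dict u -> list of (v, w) pairs, one per edge <u, v> of weight w
    """
    best = {"size": len(order) + 1, "split": None}
    into = {v: 0 for v in order}   # longest delay of a path ending at v so far

    def search(i, chosen):
        if len(chosen) >= best["size"]:
            return                 # bound: cannot improve on the best solution
        if i == len(order):
            best["size"], best["split"] = len(chosen), list(chosen)
            return
        u = order[i]
        out_edges = succ.get(u, [])
        for do_split in (False, True):   # left branch x_i = 0, right x_i = 1
            start = 0 if do_split else into[u]
            saved = [(v, into[v]) for v, _ in out_edges]
            feasible = True
            for v, w in out_edges:
                into[v] = max(into[v], start + w)
                if into[v] > delta:
                    feasible = False     # some path already exceeds delta
            if feasible:
                if do_split:
                    chosen.append(u)
                search(i + 1, chosen)
                if do_split:
                    chosen.pop()
            for v, old in saved:         # undo the updates before backtracking
                into[v] = old

    search(0, [])
    return best["split"]


order = ["a", "b", "c", "d"]
succ = {"a": [("b", 1)], "b": [("c", 1)], "c": [("d", 1)]}
print(dvsp_backtrack(order, succ, 1))
```

Because all predecessors of the i'th vertex precede it in topological order, the value into[u] is final when the decision on u is made, which is what makes the early infeasibility test valid. The recursion depth equals the number of vertices, so this form is only suitable for small dags.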
Heuristics For DVSP
We formulate four simple and intuitively appealing constructive heuristics to obtain a set X such that d (G /X) ≤ δ. All four split one vertex at a time until the remaining dag has delay ≤ δ. They assume that the input dag has a feasible solution, i.e., no edge has delay > δ.
The first three heuristics have the form given in Figure 16 and differ only in the criteria used to select the next vertex to split. 
Heuristic 1 (h1)
The selection criteria for the next vertex to split are:
This heuristic is easily implemented to have run time O( k (n + e) ) where k is the number of vertices split, n is the number of vertices in the dag, and e is the number of edges in the dag.
Heuristic 2 (h2)
In this heuristic, the next vertex, v , to split satisfies criteria a) and b) of Heuristic 1. In addition, the following criterion is employed:
c') Of all the vertices that satisfy a) and b), v is a vertex whose splitting results in a dag that has the fewest vertices on paths of delay > δ. Ties are broken as in h1.
Heuristic 2 may be implemented to have complexity O( kne ).
Heuristic 3 (h3)
Heuristic 3 also uses criteria a) and b) used by Heuristic 1. However, criterion c) is replaced by:
c'') Of all the vertices that satisfy a) and b), v is such that its splitting results in a dag with least delay. I.e., v is such that d( G /(X ∪ {v}) ) is minimum over all choices for v. Ties are broken as in h1.
The complexity of Heuristic 3 is O( kne ).
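The greedy rule c'') can be sketched as follows. This is a simplified illustration rather than Heuristic 3 itself: criteria a) and b) and the h1 tie-breaking rule are not reproduced, so every unsplit vertex is treated as a candidate and ties go to the vertex that comes first in topological order. The names delay_after_split and greedy_min_delay are illustrative, and splitting is assumed to make a vertex a sink for its incoming edges and a source for its outgoing edges.

```python
def delay_after_split(order, succ, split):
    """Return d(G/X) for X = split, with vertices given in topological order."""
    into = {v: 0 for v in order}   # longest delay of a path ending at v
    best = 0
    for u in order:
        start = 0 if u in split else into[u]   # a split vertex restarts paths
        for v, w in succ.get(u, []):
            into[v] = max(into[v], start + w)
            best = max(best, into[v])
    return best


def greedy_min_delay(order, succ, delta):
    """Split one vertex at a time, each time minimizing the resulting delay."""
    split = set()
    while delay_after_split(order, succ, split) > delta:
        candidates = [v for v in order if v not in split]
        if not candidates:
            raise ValueError("no feasible solution: some edge has delay > delta")
        v = min(candidates,
                key=lambda c: delay_after_split(order, succ, split | {c}))
        split.add(v)
    return split


# Unit weight chain a -> b -> c -> d -> e with delta = 2
order = ["a", "b", "c", "d", "e"]
succ = {"a": [("b", 1)], "b": [("c", 1)], "c": [("d", 1)], "d": [("e", 1)]}
print(greedy_min_delay(order, succ, 2))
```

Each of the k iterations evaluates the delay once per candidate vertex at a cost of O(n + e) per evaluation. Without criteria a) and b), the sketch may split vertices that the actual heuristic would never consider, so its solutions can be larger.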
Heuristic 4 (h4)
In this heuristic, the vertices of the dag are examined in two different orders: topological and reverse topological. When the i'th vertex in the topological (reverse topological) order is examined, it is split if the current dag contains a path comprised solely of vertices 1, ... , i and one additional vertex that has delay > δ. The heuristic is specified in Figure 17 . It can be imple- 
Experimental Results
The backtracking algorithm of Section 5 and the four heuristics of Section 6 were programmed in Pascal and run on an Apollo DN3500 workstation. We experimented with two sets of acyclic directed graphs. The first set was obtained from the S-graphs of the ISCAS-89 benchmark sequential circuits [BRGL89] . The S-graphs were first rendered cycle free by the procedure given in [LEE90] . The characteristics of the resulting dags are given in Table 1 . The other set of graphs was derived from the ISCAS-85 benchmark combinational circuits [BRGL85] . Here the nodes in the digraph model the gates in the circuit and the edges correspond to the connections between gates. Associated with each edge is the propagation delay along the corresponding circuit gate input. The edge delay was set to the maximum of the rising and falling delays provided in [BRGL85] . The characteristics of these circuits are given in Table 2 . For each dag, G, we experimented with the δ values { .9d (G), .8d (G), .7d (G), .6d (G), .5d (G), .4d (G) }. Table 3 gives the results for the case G = s400. Note from Table 1 that d (s400) = 16. For δ close to d (s400) (specifically, δ = 12 and 14), all four heuristics found optimal solutions. Heuristic 2 was the only one that obtained optimal solutions for all tested δ values. Table 4 gives the results for circuit s38584. The backtracking algorithm was able to complete only for the cases δ = .9d (G) and δ = .8d (G) in the time allotted for each run. Heuristic h2 consistently obtained better solutions than those obtained by the remaining heuristics. However, its run time, while quite acceptable, was greater than that of heuristics 1 and 4. Table 5 gives the results for the combinational circuit c432. For this circuit, heuristics 2 and 3 found the optimal solution for all tested δ values. The results for circuit c6288 are given in Table 6 .
The backtracking algorithm successfully found the optimal solution only for the cases δ = 287.89 = 0.9d (G) and δ = 255.90 = 0.8d (G) . Of the four heuristics, h2 obtained the best solutions for five of the six δ values tested and h4 was best for the remaining δ value. Tables 7 and 8 give the total number of nodes split by each of the four heuristics for each of the sequential and combinational circuits, respectively. For each circuit the six δ values { .9d (G), .8d (G), .7d (G), .6d (G), .5d (G), .4d (G) } were used and the tables give the sum of the number of vertices split for each of these δ values. Tables 9 and 10 give the % of tests on which each heuristic obtained the best solution. Heuristic 2, on average, was significantly better than the others.
Tables 11 − 14 give the number of nodes split at the two extremes δ = 0.9d (G) and δ = 0.4d (G) of the range of δ values tested. Generally, for δ close to d (G) the four heuristics tended to obtain solutions of comparable quality while for smaller δ the differences were more noticeable. However, in all δ ranges tested, heuristic 2 tended to produce the best solutions. The average run time for each of the circuits and each δ value is given in Tables 15 and 16. As can be seen, heuristics 1 and 4 are very fast. While heuristic 2 is significantly faster than heuristic 3, it is slower than heuristics 1 and 4.

Circuit s38584: nodes split and run time for each δ

         nodes split                     run time
δ        h1   h2   h3   h4   optimal     h1   h2    h3     h4    optimal
116      21   2    2    5    2           3    61    61     < 1   222
103      15   2    4    24   2           2    64    127    < 1   17280
90       18   2    5    20   -           3    68    181    < 1   -
77       27   4    6    20   -           5    127   218    < 1   -
64       40   8    13   27   -           7    293   682    < 1   -
51       89   10   37   44   -           18   439   2126   < 1   -

Total number of nodes split (only these entries are legible)

circuit  h1   h2   h3   h4
s400     28   20   23   33
s420     31

% of tests on which each heuristic obtained the best solution

circuit  h1    h2    h3    h4
s400     50    100   83    33
s420     83    83    83    83
s526     100   33    50    50
s526n    100   33    50    50
s838     33    83    83    83
s1423    83    17    100   33
s5378    17    100   83    67
s9234    0     100   50    0

Nodes split and run time at δ = 0.9d (G)

         nodes split                     run time
circuit  h1   h2   h3   h4   optimal     h1    h2    h3    h4    optimal
s13207   5    3    3    3    3           < 1   9     10    < 1   16
s15850   5    2    2    2    2           < 1   25    25    < 1   8
s35932   10   10   10   10   -           < 1   127   157   < 1   -
s38417   5    3    3    2    2           < 1   47    56    < 1   10
s38584   21   2    2    5    2           3     61    61    < 1   222
