Abstract-It is known that the clock-period in a sequential circuit can be shorter than the maximum signal delay between registers if the clock arrival time to each register is controlled. We propose an algorithm to find the minimum clock-period of a circuit whose signal propagation delays are given. Experimental results on LGSynth93 benchmarks show that this technique achieves as much as about 16% reduction of clock-period compared with the conventional maximum signal delay based methods. An application of this technique to improve the reliability of circuits is considered.
I. INTRODUCTION
In the design of a sequential circuit with globally clocked registers, there are various techniques to achieve a shorter clock-period [1, 2, 3, 4, 5, 6, 9, 13, 15].
A signal starts to propagate from a register when the register is ticked by a clock-edge. It must arrive at the next register before that register is ticked by the next clock-edge. If every register in the circuit is ticked exactly simultaneously, the maximum signal propagation delay along functional elements (wires, transistors, etc.) from one register to another is a lower bound of the clock-period. The purpose of retiming and performance driven layout is to reduce this delay in the circuit. Retiming, investigated in [9, 12], relocates the registers of a given circuit while preserving its functionality. For a unit delay circuit, Papaefthymiou [12] gave an exact characterization of the minimum clock-period that can be achieved by retiming, in terms of the minimum cycle-mean introduced by Karp [7].
The clock-delay of a register is the time needed for a clock-edge to propagate from the clock source to the register. For various reasons, caused mainly by the layout, a clock-edge arrives at different registers at noticeably different times; this difference is the clock-skew. Conventional design usually sets the clock-period no smaller than the maximum signal propagation delay plus the maximum clock-skew to guarantee correct function. Thus the clock-skew has been considered a negative effect against speeding up a sequential circuit, and efforts have been directed towards its elimination, i.e., zero clock-skew routing [3, 5, 15].
However, there is a different point of view which makes use of the clock-skew to shorten the clock-period. For a circuit whose signal propagation delays are fixed, minimizing the clock-period by controlling the arrival times of clock-edges at registers is called here the clock scheduling problem. The decision version of the problem, which determines whether a given clock-period works correctly, was formulated as a feasibility decision problem of a system of linear inequalities [1, 4]. Similar ideas are found in the multiphase clock scheduling problem [8, 13, 14]. Since their algorithms only answer the decision problem, they can only find a solution with a precision of k bits after k trials following the binary search strategy.
Due to routings, process variations, etc., it may be hard to estimate accurately signal propagation delays. It may also be hard to supply clock in designated timing. In such cases, we should take these deviations into consideration. The clock scheduling can be used to improve the reliability of a circuit.
In this paper, we solve the clock scheduling problem: the exact solution is characterized graph theoretically and a polynomial time algorithm to find it is presented.
The clock scheduling problem is formulated on a directed weighted graph consisting of two kinds of edges, as the problem of finding a particular cycle. A polynomial time algorithm is provided to find it. Accordingly, the minimum clock-period and the clock-timing of each register are determined. Experimental results on LGSynth93 benchmarks revealed the significance of such considerations on clock scheduling by showing a 16% reduction of the clock-period on average. They also show that circuit reliability is improved by clock scheduling when the clock-period is fixed.
To achieve such controlled clock scheduling, techniques for clock distribution routing must be developed. For this purpose, it is believed that the technology of zero clock-skew routing is available [3, 5, 15].
The rest of the paper is organized as follows. Section II analyzes the clock scheduling problem and formulates it as a graph problem. Section III introduces some terms which are defined on the graph and discusses their implications. Section IV proves the main theorem that claims how the minimum clock-period is determined, and realizes the claim as an algorithm. Experimental results are presented in Section V. Section VI is the conclusion.
II. PRELIMINARIES
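Let $G$ be a directed edge-weighted graph. $V(G)$ and $E(G)$ denote the vertex set and edge set of $G$, respectively. An edge directed from $u$ to $v$ is denoted by $(u, v)$, and its weight by $w(u, v)$. The weight of a walk $P$ consisting of edges $e_1, e_2, \dots, e_k$, denoted by $w(P)$, is the sum of the edge weights of $P$, that is, $w(P) = \sum_{1 \le i \le k} w(e_i)$.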
The sequential circuit $N$ under consideration consists of registers and gates, and wires connecting them. Every register is clocked with the same period but not necessarily simultaneously. In clocking design, only the signal propagation delay between registers is of concern; it is not unique because of signal propagation on various paths, different rise and fall gate delays, etc. In our discussion, only the maximum and minimum propagation delays are significant, and they are assumed to be estimable.
We model $N$ by a directed graph $G$ as follows: a vertex $v \in V(G)$ represents a register, and an edge $(u, v)$ represents the signal transmission from register $u$ to register $v$ along functional elements of the circuit. The weight of edge $(u, v)$ is a pair $(w_{\min}(u, v), w_{\max}(u, v))$, where $w_{\min}(u, v)$ and $w_{\max}(u, v)$ are the minimum and maximum signal propagation delays, respectively. See an example in Fig. 1.
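A clock from the clock source arrives at each register $v$ with clock-delay $d(v)$. Let $t$ be the clock-period. For every edge $(u, v) \in E(G)$, the following two constraints must be satisfied.
No-Double-Clocking Constraint: $d(u) + w_{\min}(u, v) \ge d(v)$
No-Zero-Clocking Constraint: $d(u) + w_{\max}(u, v) \le d(v) + t$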
Other technology-dependent constraints related to the setup and hold times of the registers, the deviation of the propagation delays, etc. [l$], are assumed to be contained in the signal propagation delays. If there are registers constrained to be triggered simultaneously, such as those associated with primary inputs and/or outputs, then we consider these registers as one register. Thus our framework is defined only by the no-double-clocking and no-zero-clocking constraints. The period $t$ is called feasible if there exist $d(v)$'s that satisfy both of them.
Note that if $t$ is the minimum clock-period, any $t' \ge t$ is also a feasible clock-period.
These constraints are formulated as linear inequalities of the form $d(v) - d(u) \le f(u, v)$:
D-edge: $d(v) - d(u) \le w_{\min}(u, v)$
Z-edge: $d(u) - d(v) \le t - w_{\max}(u, v)$
When we regard the $d(v)$'s as unknown variables and the $f(u, v)$'s as given constants, the system is called a system of difference constraints. The following well-known fact enables us to solve the decision version of the problem [9, 8, 11, 12].
Lemma 1 A system of difference constraints has a solution if and only if the graph having an edge $(u, v)$ of weight $f(u, v)$ for each constraint contains no cycle of negative weight.
Our problem is to find the minimum clock-period efficiently. The constraint graph $G_t$, where $t$ is a variable standing for the clock-period, is obtained from the graph model $G$ of the circuit by replacing each edge $(u, v) \in E(G)$ with two edges $(u, v)$ and $(v, u)$ with weights $w_{\min}(u, v)$ and $t - w_{\max}(u, v)$, respectively. The former edge is called the D-edge and the latter the Z-edge. D-edges (Z-edges) correspond to the No-Double (No-Zero) Clocking Constraints. For example, the constraint graph $G_t$ for the circuit shown in Fig. 1 is shown in Fig. 3. We claim that the minimum $t$ and the associated clock scheduling are determined from this graph.
Once a feasible $t$ is given, we can determine the clock-timing $d(v)$ of every register by solving shortest path problems on $G_t$. The $d(v)$'s that make $t$ feasible lie in some range. We can use this margin to improve the reliability of the circuit by choosing the clock-timing appropriately. A related discussion with experiments will be given.
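The decision version described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `clock_schedule`, the numbering of registers as `0..n-1`, and the edge tuple layout `(u, v, w_min, w_max)` are all hypothetical, and every register is assumed to be reachable from the reference register `s` in the constraint graph. The sketch runs Bellman-Ford on $G_t$ and reports either the shortest-path lengths as clock-timings or the presence of a negative-weight cycle.

```python
def clock_schedule(n, edges, t, s=0):
    """Return clock-timings d[0..n-1] if period t is feasible, else None.

    edges: list of (u, v, w_min, w_max) tuples (hypothetical layout).
    """
    # Constraint graph G_t: D-edge (u, v) of weight w_min,
    # Z-edge (v, u) of weight t - w_max.
    gt = []
    for u, v, wmin, wmax in edges:
        gt.append((u, v, wmin))
        gt.append((v, u, t - wmax))

    inf = float("inf")
    d = [inf] * n
    d[s] = 0
    # Bellman-Ford: n - 1 rounds of relaxation from the reference register s.
    for _ in range(n - 1):
        for u, v, w in gt:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    # Any further improvement reveals a negative-weight cycle: t is infeasible.
    for u, v, w in gt:
        if d[u] + w < d[v]:
            return None
    return d


# Two registers: 0 -> 1 with delays (2, 10) and 1 -> 0 with delays (3, 6).
example = [(0, 1, 2, 10), (1, 0, 3, 6)]
print(clock_schedule(2, example, 8))   # a feasible period
print(clock_schedule(2, example, 7))   # an infeasible period
```

In this toy instance the maximum delay is 10, yet a period of 8 is accepted once register 1 is clocked 2 time units after register 0.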
III. FORMULATION OF THE MINIMUM CLOCK-PERIOD
Let $G_t$ be the constraint graph. For a subgraph or an edge set $S$ of $G_t$, $E_Z(S)$ denotes the set of Z-edges contained in $S$. For an edge set $S$ which contains at least one Z-edge, $w(S)/|E_Z(S)|$ is called the Z-mean of $S$ and denoted by $w_Z(S)$. In particular, if $S$ is a cycle, it is called a Z-cycle and $w_Z(S)$ is called the cycle Z-mean. Its minimum over all Z-cycles in $G_t$, called the minimum cycle Z-mean of $G_t$ and denoted by $Z(G_t)$, is a key index in the following discussion.
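For very small graphs the minimum cycle Z-mean can be checked directly from the definition. The following Python sketch is a brute-force illustration only; the function name and the edge layout `(u, v, weight, is_z)` are assumptions made here. It enumerates simple cycles, which is sufficient under the assumption (made in the next paragraph of the paper) that every cycle consisting only of D-edges has positive weight, and it takes exponential time, so it is meant for cross-checking, not for real circuits.

```python
def min_cycle_z_mean_bruteforce(n, g):
    """Minimum of w(C)/|E_Z(C)| over simple cycles C with at least one Z-edge.

    g: list of (u, v, weight, is_z) edges, is_z being 1 for a Z-edge
    and 0 for a D-edge (hypothetical layout). Returns None if there is
    no Z-cycle.
    """
    out = [[] for _ in range(n)]
    for u, v, w, z in g:
        out[u].append((v, w, z))

    best = None

    def dfs(start, u, weight, zcount, visited):
        nonlocal best
        for v, w, z in out[u]:
            if v == start:
                zc = zcount + z
                if zc > 0:                      # only Z-cycles have a Z-mean
                    mean = (weight + w) / zc
                    if best is None or mean < best:
                        best = mean
            elif v > start and v not in visited:
                # Each simple cycle is explored from its smallest vertex.
                dfs(start, v, weight + w, zcount + z, visited | {v})

    for s in range(n):
        dfs(s, s, 0, 0, {s})
    return best


# G_0 (t = 0) for two registers: 0 -> 1 with delays (2, 10), 1 -> 0 with (3, 6).
g0 = [(0, 1, 2, 0), (1, 0, -10, 1), (1, 0, 3, 0), (0, 1, -6, 1)]
print(min_cycle_z_mean_bruteforce(2, g0))
```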
In the following, we assume that the weight of each D-edge and Z-edge is finite, since the signal propagation delay is naturally assumed to be so. Due to the hold time, etc., the weights of some D-edges may be negative. However, if there is a negative weight cycle which consists only of D-edges, there is no feasible clock-period by Lemma 1. Therefore our concern is the circuits in which the weight of any cycle which consists only of D-edges is zero or positive. However, if there is a zero weight cycle, the following discussion becomes unnecessarily lengthy, although a similar discussion is possible. Therefore, for compactness of presentation, we assume that the weight of any cycle which consists only of D-edges is positive. (Note that the existence of a negative or zero weight cycle is easily checked.)
Our problem is to determine the minimum clock-period $T_N$ of the circuit $N$ by clock scheduling. The following theorem is claimed by Lawler [8, 14], but we state it differently for our purpose. A proof is given for completeness.

Theorem 1 $T_N = -Z(G_0)$.

Proof: First we show that $T_N \ge -Z(G_0)$. Let $C$ be any Z-cycle of $G_0$ and let $C'$ be the corresponding cycle of $G_{T_N}$. Since $T_N$ is feasible, $w(C') = w(C) + T_N |E_Z(C)| \ge 0$ by Lemma 1. Since $|E_Z(C)| \ge 1$, we have $T_N \ge -w_Z(C)$. Since it holds for any Z-cycle, $T_N \ge -Z(G_0)$.

Next we show that $T_N \le -Z(G_0)$.
The weight of each Z-cycle of $G_{T_N}$ is non-negative by Lemma 1. There exists a Z-cycle $C'$ of $G_{T_N}$ such that $w(C') = 0$; otherwise we could reduce the clock-period further without violating the feasibility (Lemma 1), which contradicts $T_N$ being minimum. Let $C$ be the Z-cycle of $G_0$ corresponding to $C'$.
Then $w(C') = w(C) + T_N |E_Z(C)| = 0$, and we have
$$T_N = -w(C)/|E_Z(C)| = -w_Z(C) \le -Z(G_0).$$ □
IV. COMPUTATION OF THE MINIMUM CYCLE Z-MEAN
A. Main Algorithm
Now our problem is to provide a polynomial time algorithm to compute the minimum cycle Z-mean $Z(G_0)$.
Graph $G_0$ is a very special graph in which every edge has a parallel edge of opposite direction and different type. The weights of these two edges correspond to the minimum delay and the maximum delay with a minus sign, respectively. The example shown in Fig. 4 is the one derived from the circuit in Fig. 1. We assume that $G_0$ is strongly connected, since otherwise we can determine the minimum cycle Z-mean as the minimum of the cycle Z-means of all strongly connected components of $G_0$.

Let $n = |V(G_0)|$. Let $s$ be an arbitrarily chosen vertex of $G_0$. For every vertex $v \in V(G_0)$ and all non-negative integers $i$ and $j$ ($0 \le i \le j \le n$), let $\mathcal{P}(v, i/j)$ be the set of walks from $s$ to $v$ that use exactly $i$ Z-edges and exactly $j$ edges, and let $F(v, i/j)$ be the minimum weight of a walk in $\mathcal{P}(v, i/j)$ (infinite if the set is empty).

In terms of $F(v, i/j)$, two more indexes $X(v, i)$ ($0 \le i \le n$) and $Q(v, i)$ ($1 \le i \le n$) are defined.
$$X(v, i) = \min_{i \le j \le n} F(v, i/j)$$
That is, $X(v, i)$ denotes the minimum of the weights of all walks from $s$ to $v$ that use exactly $i$ Z-edges and at most $n$ edges, while $Q(v, i)$ is an upper bound of the Z-mean of a cycle contained in a walk whose weight is minimum in $\mathcal{P}(v, i/n)$.

Our proposed algorithm is shown in Fig. 5.

Step 1: Choose a vertex as $s$, and compute $F(v, i/j)$ for every triple $(v, i, j)$.
Step 2: Compute $X(v, i)$ for each $v \in V(G_0)$ and $i$.
Step 3: Compute $Q(v, i)$ for each $v \in V(G_0)$ and $i$.
Step 4: Take the minimum of all $Q(v, i)$ over all pairs $(v, i)$. Then, output its negation as $T_N$.
Step 5: Find a shortest path in $G_{T_N}$ from $s$ to each vertex $v$, and determine $d(v)$ as the length of the shortest path from $s$ to $v$.

In the following, we show the correctness of the algorithm.
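The five steps of Fig. 5 can be sketched in Python as follows. This is a hedged reconstruction, not the authors' code. In particular, the closed form of $Q(v, i)$ is not legible in this copy of the paper, so the sketch assumes $Q(v, i) = \max_{0 \le i' \le i-1} (F(v, i/n) - X(v, i'))/(i - i')$ when $F(v, i/n)$ is finite and equal to $X(v, i)$, and $Q(v, i) = +\infty$ otherwise; this form is consistent with the proof fragments in Section IV-B but remains an assumption. The function name, the register numbering `0..n-1`, and the edge layout `(u, v, w_min, w_max)` are likewise hypothetical.

```python
from math import inf


def min_clock_period(n, edges, s=0):
    """Return (T_N, d) for registers 0..n-1.

    edges: list of (u, v, w_min, w_max). Assumes G_0 is strongly connected
    and every cycle made only of D-edges has positive weight.
    """
    # G_0: D-edge (u, v) of weight w_min, Z-edge (v, u) of weight -w_max.
    g0 = []
    for u, v, wmin, wmax in edges:
        g0.append((u, v, wmin, 0))    # last field is 1 for a Z-edge
        g0.append((v, u, -wmax, 1))

    # Step 1: F[j][i][v] = minimum weight of a walk s -> v with exactly
    # j edges, i of which are Z-edges (inf if no such walk exists).
    F = [[[inf] * n for _ in range(n + 1)] for _ in range(n + 1)]
    F[0][0][s] = 0
    for j in range(1, n + 1):
        for u, v, w, z in g0:
            for i in range(z, j + 1):
                cand = F[j - 1][i - z][u] + w
                if cand < F[j][i][v]:
                    F[j][i][v] = cand

    # Step 2: X[i][v] = min over i <= j <= n of F[j][i][v].
    X = [[min(F[j][i][v] for j in range(i, n + 1)) for v in range(n)]
         for i in range(n + 1)]

    # Steps 3 and 4: minimum of Q(v, i) over all pairs (assumed form of Q).
    best = inf
    for i in range(1, n + 1):
        for v in range(n):
            f = F[n][i][v]
            if f == inf or f > X[i][v]:
                continue              # treated as Q(v, i) = +infinity
            ratios = [(f - X[k][v]) / (i - k)
                      for k in range(i) if X[k][v] < inf]
            if ratios:
                best = min(best, max(ratios))
    period = -best

    # Step 5: Bellman-Ford on G_{T_N}; each Z-edge weight becomes T_N - w_max.
    d = [inf] * n
    d[s] = 0
    for _ in range(n - 1):
        for u, v, w, z in g0:
            if d[u] + w + z * period < d[v]:
                d[v] = d[u] + w + z * period
    return period, d


# Three registers in a ring with fixed delays 5, 3 and 4.
print(min_clock_period(3, [(0, 1, 5, 5), (1, 2, 3, 3), (2, 0, 4, 4)]))
```

The triple loop of Step 1 dominates the running time, which matches the $O(n^2 e)$ bound stated in the complexity analysis; the floating-point division in Steps 3 and 4 could be replaced by exact rational arithmetic if delays are integers.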
B. Theorem, lemmas, proofs, and complexity analysis
The correctness of Step 5 in Fig. 5 comes from Lemma 1. Then the main theorem claims that the algorithm in Fig. 5 
the proof completes. Thus we assume that $F(v, i/n)$ is finite and $F(v, i/n) = X(v, i)$.
Let $P$ be a walk whose weight is minimum in $\mathcal{P}(v, i/n)$, i.e., $w(P) = F(v, i/n)$. Since the number of edges of $P$ is $n$, $P$ contains cycles. Let an arbitrary one of them be $C$. For simplicity of terminology, let $i^*$, $j^*$, and $w^*$ denote the number of Z-edges, the number of edges, and the weight of $C$, respectively, and let $P'$ be the walk obtained from $P$ by deleting the edges of $C$. (Note that a walk minus a cycle is a walk.) Then, the number of Z-edges, the number of edges, and the weight of the walk $P'$ are $i - i^*$, $n - j^*$, and $F(v, i/n) - w^*$, respectively, and since $F(v, (i - i^*)/(n - j^*))$ is the minimum weight of such walks,
$$F(v, i/n) - w^* \ge F(v, (i - i^*)/(n - j^*)).$$
If $i^* = 0$, then $w^* > 0$ by the assumption that the weight of a cycle consisting only of D-edges is positive. We have
$$X(v, i) \le F(v, i/(n - j^*)) \le F(v, i/n) - w^* < F(v, i/n),$$
which contradicts $F(v, i/n) = X(v, i)$. Thus we have $i^* \ge 1$. Then the cycle Z-mean of $C$, which is $w_Z(C)$, is $w^*/i^*$. Since $Z(G_0)$ is the minimum of such averages, we have
$$Z(G_0) \le w^*/i^*.$$
Proof: Let $C$ be a Z-cycle of weight 0 in $G_0$. Let $v'$ be a vertex in $V(C)$ and $P'$ be a minimum weight walk from $s$ to $v'$. Since the weight of $C$ is 0, the walk $P'$ followed by any number of repetitions of $C$ is also a minimum weight walk. Any initial part of a minimum weight walk, from $s$ to a vertex on the way, is also a minimum weight walk from $s$ to that vertex. So there exists a vertex $v^* \in V(C)$ such that a walk that comprises exactly $n$ edges is a minimum weight walk from $s$ to $v^*$. Let $P$ be such a minimum weight walk.
Lemma 4 There exist a vertex $v \in V(G_0)$ and an integer $i$ ($1 \le i \le n$) such that $Q(v, i) \le Z(G_0)$.
Proof: Let $t = Z(G_0)$, and let $G'$ be the graph obtained from $G_0$ by reducing each Z-edge weight by $t$. Then an important fact is that $Z(G') = 0$. Let $F'(v, i/j)$ be the minimum weight of a walk from $s$ to $v$ in $G'$ using $i$ Z-edges and $j$ edges (i.e., of the elements of $\mathcal{P}(v, i/j)$ defined in $G'$). $X'(v, i)$ and $Q'(v, i)$ for $G'$ are similarly defined.
Another important fact is that a minimum weight walk from $s$ to $v$ in $G'$ using $i$ Z-edges and $j$ edges is also such a minimum weight walk in $G_0$. Hence
$0 \le i' \le i^* - 1$
Thus $Q(v^*, i^*) \le Z(G_0)$. □
Since $T_N = -Z(G_0)$, Theorem 2 follows from Lemmas 2 and 4. Notice that Theorem 2 is naturally described independently of the choice of $s$, the standard register used for clock-timing reference. Notice also that the minimum cycle Z-mean can be computed for any graph $G$ consisting of two types of edges, D-edges and Z-edges, if the weight of any cycle consisting only of D-edges is positive.
Finally, we show that $T_N$ and the corresponding clock-timing $d(v)$ for every register $v$ in $N$ can be computed in $O(n^2 e)$ time, where $n = |V(G_0)|$ and $e = |E(G_0)|$.
Step 1 can be executed in $O(n^2 e)$ time. The following steps take a further $O(n^3)$ time in total. Since $N$ is strongly connected, $n \le e$, so the minimum cycle Z-mean can be computed in $O(n^2 e)$ time.
C. Example
The constraint graph $G_t$ with $t = 0$ is shown in Fig. 4 for the sequential circuit $N$ in Fig. 1. The minimum cycle Z-mean of $G_0$ is $-9$, which is attained by the cycle $(a, b), (b, d), (d, a)$.
Then, the clock-timing $d(v)$ of each register $v$ is obtained as follows, taking register $a$ as the standard. See a clock realization in Fig. 6. It is easily verified that the clock assignment as above works correctly.
V. EXPERIMENTS
The clock scheduling technique was applied to LGSynth93 benchmarks using a JCC JS5/70 (equivalent to a SUN SPARC 5). Results are presented in Tables I and II. The first experiment is to see the effect of the clock scheduling. In Table I, "Regs" includes registers themselves and IO terminals. "$w_{\max}$" is the maximum delay between registers, estimated by the sum of the maximum gate delays along a path, where the maximum gate delay is defined as the larger of the rise and fall delays. Similarly, the minimum delay is estimated by the sum of the minimum gate delays along a path. "$T_N$" is our result. The reduction of the clock-period relative to the maximum signal propagation delay is 16.1% on average.
The second experiment is to see the allowance in clock schedule design. In Table II, the minimum clock-periods are shown for cases in which a certain deviation from the estimated propagation delay is inevitable. The deviations are within ±2.5%, ±5%, or ±10% of each signal propagation delay. The last column shows the allowable deviation of the propagation delay, in percent, in the case that the clock-period is set equal to the maximum signal propagation delay in the circuit. This shows that a clock-period equal to the maximum signal propagation delay can be achieved if the deviation of each signal propagation delay is at most 13.7% (average) in the case of LGSynth93 benchmarks. We can also show that a clock-period equal to the maximum signal propagation delay can be achieved if the deviation of the clock-timing is at most 3.5% (average) of the maximum signal propagation delay (experimental results are omitted here).
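One way to picture the deviation experiment is the following Python sketch. It is an illustration under assumptions, not the procedure used for Table II: a deviation of ±p is modelled here by widening every delay interval to $(w_{\min}(1 - p),\, w_{\max}(1 + p))$, and the minimum period is then approximated by binary search on the feasibility test (negative cycle detection on $G_t$) rather than by the exact algorithm of Section IV. The function names, the tolerance `tol`, and the assumption that all minimum delays are non-negative (so that the largest widened maximum delay is a feasible starting period) are hypothetical choices.

```python
def feasible(n, edges, t):
    """True if period t admits clock-timings (no negative cycle in G_t)."""
    gt = []
    for u, v, wmin, wmax in edges:
        gt.append((u, v, wmin))          # D-edge
        gt.append((v, u, t - wmax))      # Z-edge
    d = [0.0] * n                        # virtual source at distance 0 to all
    for _ in range(n):
        changed = False
        for u, v, w in gt:
            if d[u] + w < d[v] - 1e-12:
                d[v] = d[u] + w
                changed = True
        if not changed:
            return True
    return False                         # still relaxing: negative cycle


def min_period_with_deviation(n, edges, p, tol=1e-6):
    """Approximate minimum period when each delay may deviate by a factor p."""
    widened = [(u, v, wmin * (1 - p), wmax * (1 + p))
               for u, v, wmin, wmax in edges]
    lo, hi = 0.0, max(e[3] for e in widened)   # hi: zero-skew period
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(n, widened, mid):
            hi = mid
        else:
            lo = mid
    return hi


example = [(0, 1, 2, 10), (1, 0, 3, 6)]
for p in (0.0, 0.025, 0.05, 0.10):
    print(p, round(min_period_with_deviation(2, example, p), 3))
```

In this toy instance the period stays below the maximum delay of 10 for all four deviations, which mirrors the kind of allowance reported in the last column of Table II.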
VI. CONCLUDING REMARKS
The considerations of this paper rest entirely on the expectation that some way is available to control the clock distribution so that an arbitrary clock-timing can be delivered to every register. Though such a technology falls beyond the scope of this paper, it seems probable that zero clock-skew routing is available for the purpose. In fact, some zero clock-skew routing algorithms, for example [3], are being considered. When a large amount of clock-skew is requested, a combination of multiple clock sources and clock routing techniques may be valid for our purpose. The idea in [10], in which the non-zero clock-skew routing problem is discussed, may be helpful. The larger the differences between the maximum and minimum propagation delays, the more severe the constraints are, so the reduction of the clock-period tends to be smaller. Thus our results have created a new problem: to reduce the difference between the maximum and minimum propagation delays from register to register.
As far as reliability is concerned, it is essential to take a sufficient margin in the clock-period to ensure the correct performance of a sequential circuit. The idea in this paper will be useful to improve circuit reliability in this way.
The urgent future work in this direction should be to develop layout optimization techniques based on the clock scheduling technique.
