We propose a novel sequential delay op-
clock (i.e., single-phase circuits) and the latching is always positive (or always negative) edge triggered. We assume that the clock has a period of $\phi$ seconds and every gate has a unit propagation delay. Retiming [1] attempts to reduce the clock period of S to $\phi - \epsilon$ ($\epsilon > 0$) by moving the latches in the circuit.
The behavior of the retimed circuit $S_R$ is identical to the behavior of S for all input sequences.
If the desired clock period $\phi - \epsilon$ cannot be achieved by retiming, then a combination of combinational logic resynthesis and retiming can be used to achieve the desired clock period [2, 3, 4]. A combinational delay optimizer can be used to resynthesize the combinational logic of the sequential circuit. The delay optimizer attempts to satisfy delay constraints specified as the arrival and required times of the primary inputs and primary outputs, respectively, of the combinational circuit. A simple and natural specification of delay constraints is to assign an arrival time of 0 to all primary inputs and a required time equal to the desired clock period to all primary outputs. However, in many cases it may be impossible to resynthesize the circuit to meet this delay constraint [4]. A recent technique [4] enables retiming by using combinational logic transformations. It uses forward movement of latches to derive a set of arrival and required times for the inputs and outputs of the combinational logic of the sequential circuit. However, as we show in Section 2, these delay constraints may be unduly restrictive. This is because they are computed based on local information like slacks at latches.
In this paper, we propose a sequential delay optimization technique that simultaneously exploits delays on all paths in the circuit. Recognizing that a delay optimizer can satisfy certain delay constraints more easily than others, we first propose a measure of difficulty for the delay optimizer. We then derive a delay constraint set that is optimal in the sense that it is the easiest constraint that can be specified to the delay optimizer.
The optimal delay constraint set is computed by viewing the sequential circuit as an interconnection of path segments with pre-specified delays. Path segments are bounded by flip-flops, primary inputs or primary outputs. We simultaneously consider delays on all path segments and formulate the delay constraint calculation problem as a minimum cost network flow problem. The optimal solution to the flow problem corresponds to an optimal delay constraint set. If the delay optimizer satisfies the optimal delay constraint set, then the resynthesized circuit may have several paths exceeding the desired clock period. However, we show that the resynthesized circuit can always be retimed to achieve the desired clock period. The path segment view of sequential circuits also yields a new retiming technique for unit delay circuits. This technique is presented elsewhere [5] .
DELAY CONSTRAINT COMPUTATION
Consider a sequential circuit S. Let $L = \{l_1, \ldots, l_k\}$ be the set of flip-flops, primary inputs and outputs of S. The primary inputs or latch outputs of S are the primary inputs of its combinational logic. Also, the primary outputs or latch inputs of S are the primary outputs of its combinational logic. In the sequel, we will refer to the primary inputs and primary outputs of the combinational logic as inputs and outputs, respectively. A combinational delay optimizer attempts to satisfy pre-specified maximum tolerable path delays between inputs and outputs of the combinational logic. We refer to the maximum tolerable path delay between a given input and output of the combinational logic as a delay constraint.
30th ACM/IEEE Design Automation Conference. © 1993 ACM 0-89791-577-1/93
Different input and output pairs can have different delay constraints. These delay constraints are usually specified as the arrival and required times of the inputs and outputs, respectively, of the combinational logic. We use the following notation to represent these arrival and required times. If $l_i$ is a latch, its output is an input to the combinational logic. We denote the arrival time of the latch output by $x_i^a$. Similarly, the input to latch $l_i$ is also an output of the combinational logic and the required time of the latch input signal is denoted by $x_i^r$. If $l_i$ is a primary input of S, it is also an input of the combinational logic. The arrival time of this input is denoted by $x_i^a$. Note that in this case, $x_i^r$ is not defined. Similarly, if $l_i$ is a primary output of S, then it is also an output of the combinational logic. The required time of this output is denoted by $x_i^r$. Note that in this case, $x_i^a$ is not defined. In the absence of external interface constraints, we assume that the arrival time for all primary inputs is 0 and the required time for all primary outputs is the desired clock period. If external timing constraints are specified, they can be easily incorporated into our delay computation framework.
Forward movements of latches can be used to obtain a set of delay constraints [4]. If latch $l_i$ has an input slack $s_i$, then the latch output signal is assigned an arrival time of $x_i^a = -s_i$. This is possible since the latch can be moved forward by $s_i$ without making any path terminating at $l_i$ critical. However, moving the latch forward by $s_i$ implies that the latch input signal must arrive $s_i$ units of time earlier than the default required time of $\phi - \epsilon$ for all outputs of the combinational logic. Therefore, the new required time for the latch input signal is $x_i^r = \phi - \epsilon - s_i$. The delay optimizer resynthesizes the combinational logic under these delay constraints. However, there are cases when the delay optimizer will fail to meet the delay specifications. Again, there may not exist a combinational logic implementation that satisfies the delay requirements.
For example, consider the circuit in Figure 1. The clock period of the circuit is $\phi = 3$ and this cannot be reduced any further by retiming. This is because there is a combinational path between primary input d and the primary output f that has a delay of 3. Also, combinational delay optimization cannot reduce the delay of the circuit any further.
If the desired clock period is 2, latches $l_1$ and $l_2$ have an input slack of 1. However, latch $l_3$ has no input slack. Therefore, $x_1^a = x_2^a = -1$ and $x_3^a = 0$. The arrival time of all primary inputs is 0 and the required time of the primary output f is 2. The required times of the latch input signals of $l_1$, $l_2$ and $l_3$ are $x_1^r = x_2^r = 1$ and $x_3^r = 2$. It is impossible to resynthesize the combinational logic to meet these delay constraints since there does not exist an implementation for f that meets the delay constraints. However, as we show later in Section 6, it is possible to compute an easier set of delay constraints that can be satisfied by the delay optimizer.
A clock period of 2 can be achieved by combinational resynthesis and subsequent retiming.
The above example clearly reveals limitations of computing delay constraints based on local information like the input slack of latches.
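The local, slack-based assignment described above can be sketched in a few lines. This is only an illustration of the rule as stated in the text, not the implementation of [4]; the function name `slack_based_constraints` and the dictionary layout are assumptions made for this example. It uses the slacks quoted for Figure 1 (1, 1 and 0) and a desired clock period of 2.

```python
def slack_based_constraints(slacks, desired_period):
    """Return (arrival, required) times per latch from input slacks.

    A latch with input slack s gets an output arrival time of -s and
    an input required time of desired_period - s.
    """
    arrival = {latch: -s for latch, s in slacks.items()}
    required = {latch: desired_period - s for latch, s in slacks.items()}
    return arrival, required


# Slacks of l1, l2, l3 as stated for the Figure 1 circuit.
arrival, required = slack_based_constraints({"l1": 1, "l2": 1, "l3": 0}, 2)
print(arrival)   # {'l1': -1, 'l2': -1, 'l3': 0}
print(required)  # {'l1': 1, 'l2': 1, 'l3': 2}
```

Each latch is handled in isolation here, which is exactly the locality that the example identifies as the limitation.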
A MEASURE OF DIFFICULTY
A combinational delay optimizer can satisfy certain delay constraints more easily than others. For example, a delay constraint set that requires all path delays to be less than $\phi - \epsilon$ is more stringent than a delay constraint set that requires most paths to have a delay less than or equal to $\phi - \epsilon$ but allows some paths to have a delay more than $\phi - \epsilon$. This is because the delay optimizer may be able to resynthesize the logic to satisfy the latter constraint set but it may fail to satisfy the former constraint set. Also, if the delay optimizer satisfies the former constraint set, it automatically satisfies the latter constraint set.
We propose the following measure of difficulty for combinational delay optimization.
Given only structural descriptions of circuits, we use path lengths in the combinational logic to obtain a measure of difficulty. If functional information about the circuit or its internal signals is available, it is possible to incorporate this information into our measure. Let $D_1$ and $D_2$ be two delay constraint sets on paths in the combinational logic. Let p be any path in the combinational logic. If the maximum allowable path delay on any path p in constraint set $D_1$ is always greater than or equal to the corresponding allowable path delay on p in set $D_2$, then we define $D_1 < D_2$. Note that our definition induces a partial order on the delay constraints on paths in the combinational logic. Constraint $D_1$ is less stringent than $D_2$ because the delay optimizer may be able to satisfy $D_1$ but it may fail to satisfy $D_2$. Also, $D_1$ is automatically satisfied whenever $D_2$ is satisfied.
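The comparison of two constraint sets can be illustrated with a small sketch. It assumes, purely for illustration, that a constraint set is stored as a dictionary from (input, output) pairs to the maximum allowable delay; the name `less_stringent` and the sample values are hypothetical.

```python
def less_stringent(d1, d2):
    """True if d1 allows at least as much delay as d2 on every path."""
    if d1.keys() != d2.keys():
        raise ValueError("constraint sets must cover the same paths")
    return all(d1[path] >= d2[path] for path in d1)


# Hypothetical constraint sets over two input/output pairs.
d_a = {("a", "y"): 3, ("b", "y"): 2}
d_b = {("a", "y"): 2, ("b", "y"): 2}
d_c = {("a", "y"): 1, ("b", "y"): 3}

print(less_stringent(d_a, d_b))  # True: d_a is easier than d_b
print(less_stringent(d_b, d_c))  # False
print(less_stringent(d_c, d_b))  # False: d_b and d_c are incomparable
```

The last two calls show why the relation is only a partial order: neither of `d_b` and `d_c` dominates the other on every path.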
Let $D_2$ be the set of actual maximum path delays between any input and output pair of the combinational logic. If the combinational logic has m inputs and n outputs, then $D_2$ can have at most $m \times n$ elements. Since the clock period of S is more than the desired clock period $\phi - \epsilon$, the delay on some paths in the combinational logic exceeds $\phi - \epsilon$. Paths with delays exceeding the desired clock period are called long paths and paths with delays less than the desired clock period are called short paths. We simultaneously consider delays on all path segments to obtain a delay constraint $D_1$ that satisfies the following two conditions:
1. $D_1 < D_2$.
2. $D_1$ is the greatest lower bound for $D_2$. Therefore, there is no delay constraint $D_3$ so that $D_3 < D_1 < D_2$.
In a sense, the constraint $D_1$ is the easiest constraint that can be specified to the delay optimizer. Note that if the resynthesized logic meets the delay constraint $D_1$, there may be path segments with delays exceeding the desired clock period. However, as we show in Section 7, it is always possible to retime the resynthesized circuit to achieve the desired clock period $\phi - \epsilon$.
OPTIMAL DELAY CONSTRAINT SET
The arrival and required times of the inputs and outputs, respectively, of the combinational logic are computed by simultaneously considering all path segments of the sequential circuit. Let the default arrival time of all inputs of the combinational logic be 0. Also, let $\phi - \epsilon$ be the default required time of all outputs of the combinational logic. Note that the primary inputs and outputs of the sequential circuit S assume the default values in any optimal delay constraint set. We specify the arrival times of all inputs of the combinational logic with respect to the primary inputs of S. Similarly, we specify the required times of all outputs of the combinational logic with respect to the required time of the primary outputs of S. The arrival time of the output signal of a latch and the required time of the latch input signal are related as follows. Consider a latch $l_i$. If the arrival time of the latch output signal is advanced by $x_i$ (i.e., this signal arrives $x_i$ units of time ahead of the primary inputs of S), then the latch input signal's required time is also advanced by the same amount.
Therefore, the input signal of the latch is required to be ready $x_i$ units of time ahead of the primary outputs of S. Let $x_i$ be the number of time units by which the output signal of latch $l_i$ is advanced as compared to the primary inputs of S. If $x_i$ is negative, then the output signal of latch $l_i$ arrives $-x_i$ units of time after the primary inputs of S. Also, let $x_0$ denote the change in the arrival and required times of primary inputs and primary outputs of S.
We formulate the optimization problem by separately considering short and long path segments.
Short Paths: Let p be the maximum delay from latch $l_i$ to $l_j$. Since we are considering a short path segment, $p < \phi - \epsilon$. If we assume that the output signal of latch $l_i$ arrives at the same time as the primary inputs of S, then the input signal of latch $l_j$ is ready before the default required time of $\phi - \epsilon$. Let the input signal of latch $l_j$ arrive $x_j$ units of time before its default required time. This implies that the output signal of latch $l_j$ is ready $x_j$ units of time before the primary inputs of S. Therefore, an additional delay of $x_j$ units of time can be tolerated on all path segments originating from latch $l_j$. The delay optimizer can resynthesize path segments originating from latch $l_j$ so that their delay does not exceed $\phi - \epsilon + x_j$ rather than the default value of $\phi - \epsilon$. Assuming that the delay optimizer is able to resynthesize these path segments to meet the delay constraint of $\phi - \epsilon + x_j$, some of the resynthesized path segments may have a delay exceeding the desired clock period of $\phi - \epsilon$. However, the delay on these resynthesized path segments can be reduced by moving latch $l_j$ forward by at most $x_j$ units of time during the retiming phase.
A similar argument applies to path segments terminating at latch $l_i$. If we assume that the input signal of latch $l_j$ arrives at the same time as the primary outputs of S, then the output signal of latch $l_i$ can arrive after the primary input signals have arrived. This is because the path segment between $l_i$ and $l_j$ is short. Let $x_i$ be the number of time units the output signal of latch $l_i$ can arrive after the primary input signals of S have arrived. This implies that the input signal of latch $l_i$ can be ready $x_i$ units of time after the primary outputs of S. Therefore, an additional delay of $x_i$ units of time can be tolerated on all path segments terminating at latch $l_i$. The delay optimizer can resynthesize path segments terminating at latch $l_i$ so that their delay does not exceed $\phi - \epsilon + x_i$ rather than the default value of $\phi - \epsilon$. Again, assuming that the delay optimizer is able to resynthesize these path segments to meet the delay constraint of $\phi - \epsilon + x_i$, some of the resynthesized path segments may have a delay exceeding the desired clock period of $\phi - \epsilon$. However, the delay on these resynthesized path segments can be reduced by moving latch $l_i$ backward by at most $x_i$ units of time during the retiming phase.
We now analyze the more general case where the arrival times of output signals of both latches are advanced. Let $x_i$ and $x_j$ be the amounts by which the output signals of latches $l_i$ and $l_j$, respectively, are advanced. If latch output signals are assigned their default arrival times, then the delay optimizer must resynthesize the path segment between $l_i$ and $l_j$ so that the delay does not exceed $\phi - \epsilon$. If we advance only the output signal of latch $l_i$, then a delay of $\phi - \epsilon + x_i$ can be tolerated between latches $l_i$ and $l_j$. However, if we advance only the output signal of latch $l_j$, then a delay of only $\phi - \epsilon - x_j$ can be tolerated between the two latches. Note that if $\phi - \epsilon - x_j \ge p$, then the delay optimizer does not have to resynthesize the path segment between the latches since the delay constraint is already satisfied by the current implementation.
If output signals of both latches are advanced, then the delay optimizer must resynthesize the path segment between the two latches so that the delay does not exceed $\phi - \epsilon - \Delta p$.
Here, $\Delta p = x_j - x_i$ is the net decrease in the tolerable delay between the two latches as compared to the default tolerable delay of $\phi - \epsilon$. If $\phi - \epsilon - \Delta p$ becomes less than the original delay of p, then it may be impossible to resynthesize the logic to achieve this delay bound. Therefore, we require that $\phi - \epsilon - \Delta p \ge p$.
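The short-path requirement can be checked numerically with a small sketch. The helper names `tolerable_delay` and `short_segment_ok` and the sample numbers are hypothetical; the only thing taken from the text is the rule that the tolerable delay between two latches is the desired clock period minus the net advance $x_j - x_i$, and that it must not drop below the current delay p.

```python
def tolerable_delay(target_period, x_i, x_j):
    """Delay allowed on segment l_i -> l_j: (phi - eps) - (x_j - x_i)."""
    return target_period - (x_j - x_i)


def short_segment_ok(target_period, delay, x_i, x_j):
    """A short segment is acceptable if its tolerable delay covers its delay."""
    return tolerable_delay(target_period, x_i, x_j) >= delay


# Desired clock period 2, segment delay 1.
print(tolerable_delay(2, 0, 1))        # 1
print(short_segment_ok(2, 1, 0, 1))    # True: tolerable delay equals p
print(short_segment_ok(2, 1, 0, 2))    # False: advancing l_j by 2 is too much
```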
Long Paths: Let p be the maximum delay from latch $l_i$ to $l_j$. Clearly, $p > \phi - \epsilon$. If output signals of the two latches are assigned their default arrival times, then the delay optimizer must resynthesize the path segment between $l_i$ and $l_j$ to reduce the delay from p to $\phi - \epsilon$. If we advance only the output signal of latch $l_i$, then a delay of $\phi - \epsilon + x_i$ can be tolerated between latches $l_i$ and $l_j$ and the delay optimizer will be required to reduce the delay of this path segment from p to $\phi - \epsilon + x_i$ rather than $\phi - \epsilon$. Assuming that the delay optimizer is able to resynthesize these path segments to meet the delay constraint of $\phi - \epsilon + x_i$, some of the resynthesized path segments may have a delay exceeding the desired clock period of $\phi - \epsilon$. However, the delay on these resynthesized path segments can be reduced by moving latch $l_i$ forward by at most $x_i$ units of time during the retiming phase.
If we advance only the output signal of latch $l_j$, then only a delay of $\phi - \epsilon - x_j$ can be tolerated between the two latches. The delay optimizer will have to reduce the delay of this path segment from p to $\phi - \epsilon - x_j$, which may be more difficult to achieve than the original goal of $\phi - \epsilon$.
If output signals of both latches are advanced, then the delay optimizer must resynthesize the path segment between the two latches to reduce the delay from p to $\phi - \epsilon - \Delta p$.
Clearly, we require that $\Delta p \le 0$. Otherwise, the delay optimizer will have to reduce the delay from p to a quantity that is lower than $\phi - \epsilon$ and this may be impossible to achieve.
The smaller the value of $\Delta p$, the less stringent is the delay constraint for the delay optimizer.
However, $\Delta p$ need not decrease beyond $\phi - \epsilon - p$. This is because at this value of $\Delta p$, the tolerable delay on the path segment is equal to p and this delay constraint is already satisfied by the current implementation.
Therefore, the delay optimizer does not have to resynthesize the path segment. The fact that $\Delta p$ need not decrease beyond $\phi - \epsilon - p$ can be captured in an optimization framework as follows: We construct an objective function that is heavily biased towards increasing the tolerable delay on long path segments so that the tolerable delay is equal to the delay in the current implementation.
This amounts to minimizing $\epsilon_{ij}$ for all long path segments. A secondary goal is to increase the tolerable delays on all path segments.
Let $P$ be the set of all path segments. Also, let $P_1$ and $P_2$ be the sets of short and long path segments, respectively. We denote the path segment from $l_i$ to $l_j$ as $l_i \to l_j$. Let $d_{ij}$ be the delay of this segment. The optimization problem to obtain the optimal delay constraint can be stated as follows:
Here, $\alpha$ is significantly larger than $\beta$. The optimization is performed under the following constraints:
• Tolerable delay on a short path segment is greater than or equal to the actual delay of the path segment.
• Tolerable delay on long paths is greater than or equal to the desired clock period.
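Written out in terms of the advances $x_i$, the two constraint classes take roughly the following form. This is a reconstruction from the surrounding text and from the inequalities quoted in the example of Section 6, not a verbatim copy of the original formulation. In particular, the symbol $\epsilon_{ij}$ for the per-segment slack and the second long-segment inequality are inferred from that example and from the statement that $\Delta p$ need not decrease beyond $\phi - \epsilon - p$. The objective function is not reproduced here.

```latex
\begin{align*}
\text{short segment } l_i \to l_j \in P_1: \quad
  & (\phi - \epsilon) - (x_j - x_i) \ge d_{ij}
    \;\Longleftrightarrow\; x_j - x_i \le (\phi - \epsilon) - d_{ij} \\
\text{long segment } l_i \to l_j \in P_2: \quad
  & (\phi - \epsilon) - (x_j - x_i) \ge \phi - \epsilon
    \;\Longleftrightarrow\; x_j - x_i \le 0 \\
  & x_j - x_i - \epsilon_{ij} \le (\phi - \epsilon) - d_{ij},
    \qquad \epsilon_{ij} \ge 0
\end{align*}
```

For a long segment of delay 3 and a desired clock period of 2, the last line gives $x_j - x_i - \epsilon_{ij} \le -1$, which agrees with the long-arc inequality stated in Section 6.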
A solution of the optimization problem may have $x_0 > 0$. Therefore, the arrival time for the output signal of latch $l_i$ is given by $x_i - x_0$.
The above optimization problem is the dual [6] of the minimum cost flow problem.
We will refer to the above optimization problem as the dual problem and the minimum cost flow problem as the primal problem. The network for the flow problem consists of a vertex for each variable $x_i$ in the dual. If the dual has a constraint $x_j - x_i \le c$, then the network has an arc from j to i. Furthermore, the cost of unit flow over this arc is equal to c and this arc can carry an arbitrarily large amount of non-negative flow. If the dual has a constraint $x_j - x_i - \epsilon_{ij} \le c$, then the network has an arc from j to i. The cost of unit flow over this arc is equal to c and the flow on this arc cannot exceed $\alpha$. The coefficient of $x_i$ in the dual objective function is the net flow out of vertex i in the flow network. If the net flow is positive (negative), then vertex i is a source (sink). If the net flow is 0, then vertex i is a transshipment node of the network and the total flow is conserved.
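The mapping from dual constraints to network arcs can be sketched as follows. This is a minimal illustration of the correspondence described above, not the authors' program; the `Arc` record, the function `arcs_from_dual` and the tuple layout `(j, i, c)` are assumptions made for this example. No flow solver is invoked.

```python
from typing import NamedTuple, Optional


class Arc(NamedTuple):
    tail: int                  # arc leaves vertex j
    head: int                  # arc enters vertex i
    cost: float                # cost of unit flow, the constant c
    capacity: Optional[float]  # None means unbounded


def arcs_from_dual(plain, slacked, alpha):
    """Build flow-network arcs from dual constraints.

    plain:   (j, i, c) triples for constraints x_j - x_i <= c
    slacked: (j, i, c) triples for constraints x_j - x_i - eps_ij <= c
    alpha:   flow bound on arcs that come from slacked constraints
    """
    arcs = [Arc(j, i, c, None) for (j, i, c) in plain]
    arcs += [Arc(j, i, c, alpha) for (j, i, c) in slacked]
    return arcs


# Constraints quoted in Section 6: x3 - x0 <= 1, x0 - x3 <= 0,
# and x0 - x3 - eps_30 <= -1, with alpha = 10.
network = arcs_from_dual([(3, 0, 1), (0, 3, 0)], [(0, 3, -1)], alpha=10)
for arc in network:
    print(arc)
```

The resulting arc list could be handed to any minimum-cost flow routine once the vertex supplies are set from the coefficients of the dual objective.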
A useful variation of the above problem is as follows. Among long paths, we may prefer to decrease certain longer paths more than others. Our preference may be dictated by functional information available about the long paths. This can be easily incorporated into the objective function as follows. If the maximum path delay between $l_i$ and $l_j$ is p ($p > \phi - \epsilon$), then we include the term $p \times (-\Delta p)$ in the objective function. Another variation would be to require that the arrival times (required times) of any latch output (input) signal be ahead of the primary inputs (outputs). All these variations translate into additional constraints that can be easily added to the basic optimization framework. Many other variations are possible using the above optimization framework.
A systematic procedure to obtain the optimal set of delay constraints is as follows:
1. Construct the path graph for circuit S. A path graph P has a vertex $l_i$ for every latch $l_i$. Primary inputs and primary outputs of circuit S are represented by a single vertex $l_0$. If there is some path from latch $l_i$ to $l_j$ in circuit S, graph P has an arc from vertex $l_i$ to vertex $l_j$ with a weight equal to the maximum path delay from latch $l_i$ to latch $l_j$. If $l_i$ is a primary input, then there is an arc from $l_0$ to $l_i$. Similarly, if $l_i$ is a primary output, then there is an arc from $l_i$ to $l_0$. Combinational paths between primary inputs and primary outputs are not included in the path graph since the delays on these paths can only be reduced by combinational resynthesis.
2. Classify arcs into short and long arcs. An arc is long (short) if its weight exceeds (is less than) the desired clock period.
3. Formulate inequalities for short and long arcs. There is one inequality for every short arc and three inequalities for every long arc.
5. Solve the optimization problem using a minimum-cost flow algorithm.
Let $X_i$, $0 \le i \le k$, be the optimal arrival time of the output signal of latch $l_i$. If $X_0$ is non-zero, then we adjust the arrival time for the latch output signal to be $X_i - X_0$. This translation is done since there is no change in the arrival times of the primary inputs. In the sequel, we assume that this adjustment (if necessary) has been performed and that $X_i$ refers to the adjusted arrival time for the latch output signal. The optimal arrival and required times for the combinational delay optimizer are obtained as follows:
1. Primary inputs are assigned an arrival time of 0. Also, the output of latch $l_i$ is assigned an arrival time of $-X_i$.
2. Primary outputs are assigned a required time equal to the desired clock period $\phi - \epsilon$. All other latch inputs are assigned a required time of $\phi - \epsilon - X_i$.
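The bookkeeping in this procedure (everything except the flow solve itself) can be sketched as below. This is an illustrative sketch under stated assumptions, not the SDO implementation: the function names are hypothetical, the path graph is assumed to be a dictionary from `(i, j)` vertex pairs to arc weights, and arcs whose weight equals the desired clock period are treated as short, a case the text does not specify. The sample arcs are the three whose weights are stated in Section 6, and the sample solution is the one reported there.

```python
def classify_arcs(arcs, target_period):
    """Split path-graph arcs {(i, j): weight} into short and long arcs."""
    # Assumption: a weight equal to the target period counts as short.
    short = {arc: w for arc, w in arcs.items() if w <= target_period}
    long_ = {arc: w for arc, w in arcs.items() if w > target_period}
    return short, long_


def difference_constraints(arcs, target_period):
    """Return rows (j, i, bound, has_slack): x_j - x_i [- eps_ij] <= bound.

    The third inequality of a long arc, eps_ij >= 0, is a plain
    non-negativity bound and is not listed as a row.
    """
    short, long_ = classify_arcs(arcs, target_period)
    rows = []
    for (i, j), w in short.items():
        rows.append((j, i, target_period - w, False))
    for (i, j), w in long_.items():
        rows.append((j, i, 0, False))
        rows.append((j, i, target_period - w, True))
    return rows


def timing_from_solution(x, target_period):
    """Arrival/required times per latch from adjusted advances X_i."""
    arrival = {latch: -adv for latch, adv in x.items()}
    required = {latch: target_period - adv for latch, adv in x.items()}
    return arrival, required


arcs = {(0, 1): 1, (0, 3): 1, (3, 0): 3}
rows = difference_constraints(arcs, 2)
arrival, required = timing_from_solution({1: 1, 2: 1, 3: 1}, 2)
print(rows)
print(arrival, required)
```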
6. AN EXAMPLE
We illustrate the delay constraint set calculation by an example. Consider the circuit shown in Figure 1. The clock period of the circuit is $\phi = 3$ and this cannot be reduced any further by retiming. This is because there is a combinational path between primary input d and the primary output f that has a delay of 3. Also, combinational delay optimization cannot reduce the delay of the circuit any further. This is because the primary output function f cannot be resynthesized to achieve a clock period of 2. We show that combinational resynthesis using the optimal delay constraint set reduces the clock period to 2. Therefore, the reduction in clock period is $\epsilon = 1$.
The path graph for the circuit in Figure 1 is shown in Figure 2. It has three vertices $l_1$, $l_2$ and $l_3$ corresponding to the three latches in the circuit.
Vertex $l_0$ corresponds to the primary inputs and primary outputs of the circuit. Since there is a path from primary inputs to latch $l_1$ with a maximum delay of 1, we include the arc $l_0 \to l_1$ with a weight of 1 in the path graph. Similarly, paths from latch $l_3$ to primary outputs are represented by the arc $l_3 \to l_0$. The weight of this arc is 3 since the maximum delay on any path from $l_3$ to a primary output is 3. Other arcs in the path graph can be constructed similarly. Arc $l_0 \to l_3$ is a short arc and the corresponding inequality is $x_3 - x_0 \le 1$. Similar inequalities are constructed for the remaining five short arcs in the path graph. The path graph has only one long arc $l_3 \to l_0$. This arc contributes three inequalities: $x_0 - x_3 \le 0$, $x_0 - x_3 - \epsilon_{30} \le -1$ and $\epsilon_{30} \ge 0$. The optimization problem can be formulated directly from the path graph:
The first six inequalities correspond to the short arcs. The last three inequalities correspond to the long arc $l_3 \to l_0$. We solve the optimization problem using a minimum-cost flow algorithm and obtain the solution: $x_0 = 0$, $x_1 = 1$, $x_2 = 1$ and $x_3 = 1$. We assume that $\alpha = 10$ and $\beta = 1$. The arrival times for all primary inputs are 0. The arrival times for outputs of latches $l_1$, $l_2$ and $l_3$ are -1, -1 and -1, respectively. The required time for all primary outputs is the desired clock period 2. Required times for the inputs of latches $l_1$, $l_2$ and $l_3$ are 1, 1 and 1, respectively. We resynthesize the combinational logic under these delay constraints. The resynthesized circuit is shown in Figure 3. The delay optimizer has satisfied all the specified delay constraints. However, note that the resynthesized combinational logic has paths exceeding the desired clock period of 2. However, as shown in the next section, this circuit can always be retimed to achieve the desired clock period. The retimed circuit is shown in Figure 4.

We show that the delay of this path is less than or equal to $(n + 1) \times (\phi - \epsilon)$. Let $p_{ij}$ be the delay between $l_i$ and $l_j$. Therefore, the delay of this path is bounded by $\sum_{i=0}^{n} p_{i,i+1}$.
Since resynthesis of S guarantees that $p_{i,i+1} \le (\phi - \epsilon) - (x_{i+1} - x_i)$ and $x_0 = x_{n+1} = 0$, the summation is bounded by $(n + 1) \times (\phi - \epsilon)$.
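The bound follows because the advances cancel in pairs along the path. The step can be spelled out as below, assuming (as the indices in the text suggest) that the path runs through latches $l_1, \ldots, l_n$ with $l_0$ and $l_{n+1}$ standing for its primary input and primary output ends.

```latex
\begin{align*}
\sum_{i=0}^{n} p_{i,i+1}
  &\le \sum_{i=0}^{n} \bigl[ (\phi - \epsilon) - (x_{i+1} - x_i) \bigr] \\
  &= (n + 1)(\phi - \epsilon) - (x_{n+1} - x_0) \\
  &= (n + 1)(\phi - \epsilon) \qquad \text{since } x_0 = x_{n+1} = 0 .
\end{align*}
```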
Absence of critical cycles:
Consider a cycle in S' with latches $l_1, \ldots, l_n$. We show that the delay of this cycle is less than or equal to $n \times (\phi - \epsilon)$. Again, let $p_{ij}$ be the delay between $l_i$ and $l_j$. Therefore, the delay of this cycle is bounded by $\sum_{i=1}^{n} p_{i,i+1}$. Here, $p_{n,n+1}$ is the delay between latches $l_n$ and $l_1$. Using $p_{i,i+1} \le (\phi - \epsilon) - (x_{i+1} - x_i)$, the summation is bounded by $n \times (\phi - \epsilon)$. ∎

8. EXPERIMENTAL RESULTS

We implemented the proposed delay optimization technique in a prototype C language program called SDO (sequential delay optimizer).
Our implementation consists of three main parts: retiming, delay constraint computation and combinational resynthesis. Retiming and combinational resynthesis in SDO are performed using the unit delay retiming and speed_up tools, respectively, that are part of the logic synthesis framework SIS [7]. Delay constraint computation in SDO is performed using a commercial linear programming package called CPLEX [8] that also solves network flow problems. Table 1 summarizes the experimental results on the MCNC synthesis benchmarks. We transform every circuit into a circuit that consists of only two-input NAND gates by using the tech_decomp -a 2 program in SIS [7]. The circuit obtained after using tech_decomp is the initial circuit for our experiments.
Under column Initial, we show the area, the number of flip flops and the clock period of the initial circuit.
The area of a circuit is the number of literals in the circuit. The number of flip flops in the circuit is indicated under column Reg. The clock period of the circuit is indicated in column $\phi$.
For each circuit, we conducted three experiments. For a fair comparison, we used the same retiming and combinational delay optimizer (speed_up) for all experiments. In the first experiment, we used the retiming program in SIS to obtain an optimally retimed circuit. The area, the number of flip flops and the clock period of the optimally retimed circuit are shown under the column Only Retiming. In the second experiment, we performed optimal retiming as well as combinational delay optimization using the speed_up program in SIS with arguments -d 6 -m unit. Column Retiming & speed_up shows the area, number of flip flops and the clock period of the circuit obtained by using a combination of optimal retiming and speed_up. Finally, column SDO shows the area, number of flip flops and the clock period of the circuit obtained by using our delay optimization technique. The delay constraint calculation part of our program took less than one second of CPU time on a Sparc2 workstation for all example circuits.
As an example, the circuit dk14 initially has a clock period of 14. It has 283 literals and three flip flops. After optimal retiming, the clock period of the circuit reduces to 13. If we use a combination of retiming and speed_up, then a clock period of only 12 is achievable. When the optimally retimed circuit is processed by SDO, the clock period reduces from 13 to 8. The optimized circuit has 310 literals and three flip flops. The experimental results indicate that, by using global path delays in the circuit, it is possible to reduce the clock period beyond what is achievable using optimal retiming.
CONCLUSION
We have presented a new framework for sequential delay optimization that improves the performance of circuits beyond what is possible by using optimal retiming. The sequential circuit is viewed as an interconnection of weighted path segments and a network flow formulation is used to obtain an optimal set of delay constraints.
If desired, our framework can exploit functional information about paths to bias delay optimization.
Our formulation also provides a new technique for retiming unit delay synchronous circuits [5]. We are currently considering the initialization and register minimization of circuits produced by SDO.
APPENDIX
We show that the absence of a critical path or cycle is a necessary and sufficient condition for retiming to achieve the desired clock period.
We model a digital circuit as a directed graph G that has a vertex for every primary input, primary output or combinational logic gate. There is an arc e from vertex u to vertex v (represented as $u \xrightarrow{e} v$) if gate u is an input to gate v. Furthermore, we associate a delay $d_v \ge 0$ with vertex v and a weight $w_e \ge 0$ with every arc e. Here, $d_v$ is the propagation delay of gate v and $w_e$ is the number of latches on the arc e. The augmented graph H is obtained from the graph G by replacing primary input and primary output vertices in graph G by a single host node. All outgoing arcs from primary inputs in G now originate from the host in the augmented graph H. Similarly, all incoming arcs to primary outputs in G are now incident to the host. The delays of vertices in G and H are also the same, except that the host vertex is assigned a delay of zero. The arc weights in H are the same as the arc weights in G. However, we increment the weight on all incoming arcs of the host by 1 and we will refer to such arcs as latch arcs. Otherwise, a combinational path in G will result in a zero weight cycle in H.
It is convenient to introduce a new graph H' that has the same vertices and arcs as H. Vertices in H' have the same delay as the corresponding vertices in H. However, the arc weight $w'_e$ of any arc e in H' is $w'_e = w_e - \frac{1}{\phi - \epsilon}$, except for outgoing arcs of the host that have identical weights in H and H'. We obtain a retiming of H by using the graph H'. A similar construction has been used by Leiserson and Saxe [1]. However, in their construction, weights of outgoing arcs of the host are also decreased by $\frac{1}{\phi - \epsilon}$.
If G has a path from a primary input to a primary output with delay $\phi - \epsilon$, their construction of H' results in the following problem. The corresponding cycle in H' will have a negative weight of $-1 + 1 - \frac{1}{\phi - \epsilon} = -\frac{1}{\phi - \epsilon}$ and we wrongly conclude that G is not retimable.
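The difference between the two constructions can be checked on a tiny cycle. The sketch below assumes that the per-arc decrement is $\frac{1}{\phi - \epsilon}$, as in the unit-delay construction of [1]; the function `cycle_weight` and the three-arc cycle are hypothetical and serve only to illustrate the point made above. Exact fractions avoid rounding issues.

```python
from fractions import Fraction


def cycle_weight(latch_counts, host_out_index, target_period, decrease_host_out):
    """Weight in H' of a cycle through the host.

    latch_counts:      arc weights w_e around the cycle, as in H
    host_out_index:    position of the host's outgoing arc in that list
    decrease_host_out: True mimics decreasing every arc, False leaves
                       the host's outgoing arc unchanged
    """
    step = Fraction(1, target_period)
    total = Fraction(0)
    for k, w in enumerate(latch_counts):
        if k == host_out_index and not decrease_host_out:
            total += w
        else:
            total += w - step
    return total


# Cycle host -> g1 -> g2 -> host for a two-gate combinational path and a
# target period of 2. The latch arc g2 -> host has weight 1 in H.
weights = [0, 0, 1]
print(cycle_weight(weights, 0, 2, decrease_host_out=False))  # 0
print(cycle_weight(weights, 0, 2, decrease_host_out=True))   # -1/2
```

With the host's outgoing arc left unchanged the cycle has weight 0, so a combinational path whose delay equals the target period is not flagged as a negative cycle.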
Leiserson and Saxe [1] (Theorem 11) have identified necessary and sufficient conditions under which H is retimable. They assume that all vertices in H have unit delay. However, in our case, the host node in H has zero propagation delay. Furthermore, we are interested in a specific retiming in which $H_R$ has at least one latch on every latch arc. The retiming proposed by Leiserson and Saxe [1] does not guarantee this. We first show that G is retimable if and only if the augmented graph H can be retimed so that each latch arc has a weight of 1. Given such a retiming of H, we obtain $G_R$ as follows. We delete the host node and remove a latch from every incoming arc to the host node in $H_R$.

Proof:
Assume that H' has no negative weight cycle. We shall produce a retiming r of H so that the clock period is less than or equal to $\phi - \epsilon$. Let g(v) be the weight of the shortest path from v to $v_h$, the host vertex in H'. We define the retiming function r as follows: $r(v) = \lceil \frac{1}{\phi - \epsilon} + g(v) \rceil - 1$. We claim that this is a legal retiming of H. Using the retiming function r, it can be shown that (1) every arc in the retimed graph has non-negative edge weight and (2) there is at least one latch on any path with delay greater than $\phi - \epsilon$ [5].

We now show that the proposed retiming guarantees that every latch arc has at least one latch. Consider the latch

If H' contains a negative weight cycle, a retiming of H is impossible. Consider a cycle p in H' that includes the host and has n arcs. Therefore, the delay of this cycle is n - 1. Since the cycle has a negative weight, $w(p) - \frac{n-1}{\phi - \epsilon} < 0$. Here, w(p) is the weight of the corresponding cycle in H. Since the number of latches is less than $\frac{n-1}{\phi - \epsilon}$, this is a critical cycle. Similarly, consider a cycle p that does not include the host and has n arcs. The delay of this cycle is n. Therefore, $w(p) - \frac{n}{\phi - \epsilon} < 0$ and the cycle is critical. Hence, if H' has a negative cycle, then retiming is impossible.

Theorem 2: A unit delay synchronous circuit S that has a clock period $\phi$ can be retimed to achieve a clock period $\phi - \epsilon$ ($\epsilon > 0$) iff S has no critical paths or cycles.

Proof:
If S has a critical path or cycle, then S cannot be retimed [4]. We show that if S is not retimable, then S has a critical path or cycle. Let H be the augmented graph of circuit S. From Lemma 1, H is not retimable iff H' has a negative weight cycle. Furthermore, a negative weight cycle in H' corresponds to a critical cycle in H. Critical cycles in H that include the host correspond to critical paths in S.

a weight of at least one, then circuit $S_R$ is obtained from $H_R$ by deleting the host node and by removing a latch from every latch arc. ∎
