Abstract-In the general-synchronous framework, in which the clock is distributed periodically to each register but not necessarily simultaneously, circuit performance measures such as the clock period are expected to be improved by delay insertion. However, if too much delay is inserted, the circuit changes substantially and its performance might not improve. In this paper, we propose an efficient delay insertion method that minimizes the amount of inserted delays when improving the clock period in the general-synchronous framework. In the proposed method, the amount of inserted delays is minimized by using an appropriate clock schedule and by inserting delays into appropriate places in the circuit. Experiments show that the proposed method obtains optimum solutions in a short time for most circuits.
I. INTRODUCTION
Advances in semiconductor manufacturing process technology have improved the scale, speed, and power consumption of LSI circuits. However, the increasing ratio of routing delay to propagation delay limits the improvement attainable in the complete-synchronous framework (c-frame), in which simultaneous clock distribution to every register is assumed. The growing size and power consumption of the clock distribution circuit have also become serious issues in c-frame. In contrast, the general-synchronous framework (g-frame) [1]-[3], in which the clock is assumed to be distributed periodically to each individual register though not necessarily to all the registers simultaneously, is expected to give an essential solution. By using g-frame, circuit qualities such as the clock frequency, clock distribution circuit size, and power consumption are expected to be improved. The efforts toward improving these qualities in g-frame are summarized in [4].
Since the clock period might not be reduced in g-frame even if the maximum delay is reduced, optimization efforts aimed at c-frame might degrade the circuit performance in g-frame. Therefore, circuit synthesis optimization that takes g-frame into account must be investigated. In this paper, we focus on delay insertion methods [5]-[8]. In [5], a delay insertion method that minimizes the clock period was proposed. Since the amount of each inserted delay is determined iteratively by searching the whole circuit, the method takes too much computation time. In [6], a fast delay insertion method that minimizes the clock period was proposed. Although it is fast and inserts a smaller amount of delay than the method in [5], many redundant delays are still inserted. In [7] and [8], mixed integer linear programming (MILP) formulations that minimize the amount of inserted delays were proposed. Although optimal solutions are obtained by MILP, the MILP-based methods cannot be applied to large circuits since their required memory size and computation time are large.
In this paper, we propose an efficient delay insertion method that minimizes the amount of inserted delays in clock period improvement in g-frame. The proposed method is based on the method in [6]. In the method in [6], a clock schedule for a target clock period that has timing violations is assumed, and delays are greedily inserted to recover the timing violations of the assumed clock schedule. The amount of delay it inserts is relatively large since it assumes an inappropriate clock schedule and adopts a greedy delay insertion approach.
In the proposed method, a clock schedule that has fewer timing violations is assumed in order to reduce the amount of inserted delays. Furthermore, delays are inserted into appropriate places so that the timing violations are recovered with less inserted delay. Experiments show that the proposed method obtains optimum solutions in a short time for most circuits.
II. PRELIMINARIES
In this paper, we consider a circuit consisting of registers, gates, and the wires connecting them. We refer to them as elements. A circuit is represented by the graph $G = (V_g, E_g)$, where $V_g$ is the vertex set corresponding to the elements in the circuit and $E_g$ is the directed edge set corresponding to signal propagations in the circuit. In this paper, we assume that the maximum delay of each element is equal to its minimum delay. Let d(v) be the weight of $v \in V_g$, which corresponds to the delay of the corresponding element. Let $V_r$ be the register set; by definition, $V_r$ is a subset of $V_g$. An example circuit is shown in Fig. 1(a). In Fig. 1(a), {a, b, c, d} is the register set, and the number in each vertex other than a register represents its weight.
In the general-synchronous framework (g-frame), the clock arrival timing of a register may differ from that of other registers. The clock timing S(r) of a register r is defined as the difference in clock arrival time between r and an arbitrarily chosen reference register. Moreover, the set S of the clock timings of all the registers is called the clock schedule.
A circuit works correctly with a clock period T if the following two types of constraints are satisfied for every register pair with signal propagations [1] .
Setup (No-Zero-Clocking) Constraints: $S(a) - S(b) \le T - D_{\max}(a, b)$
Hold (No-Double-Clocking) Constraints: $S(b) - S(a) \le D_{\min}(a, b)$
where $D_{\max}(a, b)$ is the maximum delay and $D_{\min}(a, b)$ is the minimum delay from a register a to a register b (Fig. 2). Since the clock ticks all the registers simultaneously in the complete-synchronous framework (c-frame), the clock period must be larger than or equal to the maximum delay between registers. On the other hand, in g-frame, a circuit can work correctly with a clock period smaller than the maximum delay between registers, provided that every register pair connected by a signal path satisfies both types of constraints.
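As an illustration, the two types of constraints can be checked for a given clock schedule with a short Python sketch. The function name `check_schedule`, the dictionary-based data layout, and the two-register example are assumptions made for this illustration and do not come from [1]. The setup test uses the standard form $S(a) + D_{\max}(a, b) \le S(b) + T$ and the hold test uses $S(b) \le S(a) + D_{\min}(a, b)$.

```python
def check_schedule(paths, schedule, period):
    """Return (setup_violations, hold_violations) as lists of register pairs.

    paths:    dict mapping a register pair (a, b) to (d_min, d_max),
              the minimum and maximum delay from a to b (hypothetical layout).
    schedule: dict mapping each register r to its clock timing S(r).
    period:   clock period T.
    """
    setup_violations, hold_violations = [], []
    for (a, b), (d_min, d_max) in paths.items():
        # Setup (no-zero-clocking): S(a) + D_max(a, b) <= S(b) + T
        if schedule[a] + d_max > schedule[b] + period:
            setup_violations.append((a, b))
        # Hold (no-double-clocking): S(b) <= S(a) + D_min(a, b)
        if schedule[b] > schedule[a] + d_min:
            hold_violations.append((a, b))
    return setup_violations, hold_violations


# Hypothetical two-register loop: delay 6 from a to b, delay 2 from b to a.
paths = {("a", "b"): (6, 6), ("b", "a"): (2, 2)}
print(check_schedule(paths, {"a": 0, "b": 0}, 4))  # ([('a', 'b')], [])
print(check_schedule(paths, {"a": 0, "b": 2}, 4))  # ([], [])
```

In this example, the simultaneous schedule violates the setup constraint of (a, b) at T = 4, while delaying the clock of b by 2 satisfies all four constraints at the same period.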
Let $T_S(G)$ be the minimum clock period of a circuit G in g-frame under the assumption that the clock can be input to each register at an arbitrarily designated timing. Hereafter, we simply call $T_S(G)$ the minimum clock period of G in g-frame. $T_S(G)$ is determined by the constraint graph $H(G) = (V_r, E_r)$ for G, where the vertex set $V_r$ corresponds to the registers in G and the directed edge set $E_r$ corresponds to the two types of constraints [2], [3]. An edge in $E_r$ from a register a to a register b with weight $D_{\min}(a, b)$, called the D-edge, corresponds to the hold constraint, and an edge from a register b to a register a with weight $T - D_{\max}(a, b)$, called the Z-edge, corresponds to the setup constraint. Let H(G, t) be the constraint graph in which the clock period T is set to t. Let the weight of a directed cycle in H(G, t) be the sum of the edge weights on the directed cycle. It is known that the minimum clock period $T_S(G)$ is the minimum t such that there is no cycle with negative weight in the constraint graph H(G, t).
For example, the constraint graph H(G, 7) of G shown in Fig. 1(a) is shown in Fig. 1(b). Since the clock period must be larger than or equal to the maximum delay between registers in c-frame, the minimum clock period of G in c-frame is $T_C(G) = 10$. On the other hand, since H(G, 7) has no cycle with negative weight and the weight of cycle (a, c, a) is negative when T < 7, $T_S(G) = 7$.
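The characterization of $T_S(G)$ by negative cycles suggests a simple numerical procedure: binary search on t combined with Bellman-Ford negative-cycle detection on H(G, t). The following Python sketch shows one such procedure; it is not necessarily the algorithm used in [2], [3]. The function names, the tolerances, and the data layout (a dictionary from register pairs to minimum and maximum delays) are assumptions made for this illustration. The upper end of the search is the maximum delay between registers, which is always feasible with the simultaneous schedule when delays are nonnegative.

```python
def constraint_edges(paths, t):
    """Edges (u, v, w) of H(G, t); paths maps (a, b) to (d_min, d_max)."""
    edges = []
    for (a, b), (d_min, d_max) in paths.items():
        edges.append((a, b, d_min))      # D-edge: hold constraint
        edges.append((b, a, t - d_max))  # Z-edge: setup constraint
    return edges


def has_negative_cycle(nodes, edges, eps=1e-9):
    """Bellman-Ford started from a virtual source tied to every vertex."""
    dist = {v: 0.0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - eps:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances converged, so no negative cycle
    return True  # still relaxing after |V_r| passes


def min_clock_period(paths, tol=1e-6):
    """Approximate the smallest t for which H(G, t) has no negative cycle."""
    nodes = {r for pair in paths for r in pair}
    lo, hi = 0.0, float(max(d_max for _, d_max in paths.values()))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_negative_cycle(nodes, constraint_edges(paths, mid)):
            lo = mid  # infeasible: the period must be larger
        else:
            hi = mid  # feasible: try a smaller period
    return hi
```

For a hypothetical two-register loop with delay 6 from a to b and delay 2 from b to a, `min_clock_period({("a", "b"): (6, 6), ("b", "a"): (2, 2)})` converges to approximately 4, whereas the c-frame period of the same loop is 6.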
III. EXISTING METHOD
In [6], a delay insertion method that minimizes the clock period was proposed (Fig. 3). It consists of two steps. In the first step, a clock schedule is determined without considering any of the hold constraints, and in the second step, all the hold constraints violated by the clock schedule are recovered by delay insertion. The amount of each inserted delay is determined according to the delay-slack and delay-demand, which are defined by the difference between the arrival time of the latest signal and that of the earliest signal (see Fig. 4).
The methods in [5], [6] guarantee that the obtained circuit achieves the lower bound of the minimum clock period of the circuit G in g-frame (hereafter, we refer to it as $T_L(G)$). However, since the amount of inserted delays was not taken into account, many redundant delays were inserted. For example, the clock schedule and the circuit obtained from G shown in Fig. 1 by the method in [6] with $t = 4$ $(= T_L(G))$ are shown in Fig. 5.
[Fig. 3: the delay insertion method in [6]]
Procedure SchedulingZ(H(G, t))
Input : constraint graph H(G, t)
Output : clock schedule S
Step 1 : Determine a clock schedule S that satisfies all the setup constraints.
Step 2 : return S.

Procedure InsertDelayDFS(G, S)
Input : circuit G, clock schedule S
Output : circuit G after delay insertion
Step 1 : Until all the hold constraints are satisfied, repeat the following.
  1) Calculate delay-slack and delay-demand.
  2) Find an edge e by DFS from a register at which a hold constraint is violated, and insert a delay equal to min{slack(e), demand(e)} (> 0) into e.
Step 2 : return G.

Step 1 : Sinit := SchedulingZ(H(G, t))
Step 2 : G := InsertDelayDFS(G, Sinit)
Step 3 : return G.

IV. PROPOSED METHOD
In this paper, we propose an efficient delay insertion method that minimizes the amount of inserted delays to achieve the target clock period, which is set to be larger than or equal to $T_L(G)$ and smaller than $T_S(G)$ in g-frame. The proposed method is based on the method in [6] and is shown in Fig. 6. In the method in [6], a clock schedule is determined ignoring hold constraints. Therefore, the number of hold violations in the clock schedule tends to be large, and so does the amount of inserted delays. The proposed method tries to minimize the number of hold violations in the assumed clock schedule to reduce the amount of inserted delays.
[Fig. 5: the clock schedule and the circuit obtained from G shown in Fig. 1 by the method in [6] with $t = 4$ $(= T_L(G))$]

[Fig. 6: the proposed method]
Procedure ModifyS(H(G, t), S)
Input : constraint graph H(G, t), clock schedule S
Output : clock schedule S
Step 1 : Until no hold constraint can be recovered, repeat the following.

Procedure InsertDelayMinCut(G, S)
Step 1 : Until all the hold constraints are satisfied, repeat the following.
  1) Calculate delay-slack and delay-demand.
  2) Insert a delay equal to min{slack(e), demand(e)} (> 0) into each edge e which is on a minimum cut of edges with positive delay-demand.
Step 2 : return G.

Input : circuit G, target clock period t
Output : circuit G after delay insertion
Step 1 : Sinit := SchedulingZ(H(G, t))
Step 2 : Smod := ModifyS(H(G, t), Sinit)
Step 3 : G := InsertDelayMinCut(G, Smod)
Step 4 : return G.

Let
$\Delta(a, b) = w(a, b) + S(a) - S(b)$,
where $w(a, b)$ is the edge weight of (a, b) on H(G, t) and S is an assumed clock schedule. Note that the timing constraint of … β (see Fig. 7). However, this change might cause new timing violations since the slack of (b, c) decreases. Let $H_S(G, t)$ be the graph obtained from H(G, t) by deleting all the edges (a, b) with Δ(a, b) < 0 in clock schedule S. Let the length of a path be the sum of the slacks of the edges in the path in $H_S(G, t)$. Let dist(u, v) be the minimum length of the paths from a vertex u to a vertex v in $H_S(G, t)$. Assume that there are hold violations. Let Δ(a, b) < 0.
If dist(b, a) ≥ −Δ(a, b), then the slack of (a, b) becomes 0 and no new violation is generated when S(v) is changed to
The hold violation of (a, b) can be recovered without generating new timing violations by the above procedure. In our proposed method, a clock schedule that removes as many hold violations as possible is obtained by the above procedure and is used in delay insertion.
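The schedule modification described above can be sketched in Python as follows. The update rule is an assumption: the sketch uses S(v) := S(v) − max{0, −Δ(a, b) − dist(b, v)}, which is one rule consistent with the stated properties (the slack of (a, b) becomes 0 and no edge of $H_S(G, t)$ becomes violated when dist(b, a) ≥ −Δ(a, b)). It is not claimed to be the exact rule of the proposed method. The function names, the edge tuples `(u, v, w, kind)` with kind "D" or "Z", and the order in which violations are processed are also hypothetical.

```python
import heapq


def slack(edge, S):
    """Slack of a constraint edge (u, v, w, kind): w + S(u) - S(v)."""
    u, v, w, _ = edge
    return w + S[u] - S[v]


def distances(edges, S, source):
    """Dijkstra on H_S: only edges with nonnegative slack, slack as length."""
    adj = {}
    for e in edges:
        s = slack(e, S)
        if s >= 0:
            adj.setdefault(e[0], []).append((e[1], s))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, s in adj.get(u, []):
            if d + s < dist.get(v, float("inf")):
                dist[v] = d + s
                heapq.heappush(heap, (d + s, v))
    return dist


def modify_schedule(edges, S):
    """Repeat until no violated hold (D-edge) constraint can be recovered."""
    S = dict(S)  # leave the caller's schedule untouched
    progress = True
    while progress:
        progress = False
        for e in edges:
            a, b, _, kind = e
            alpha = -slack(e, S)  # amount of violation of (a, b)
            if kind != "D" or alpha <= 0:
                continue
            dist = distances(edges, S, b)
            if dist.get(a, float("inf")) >= alpha:
                # Assumed update: shift S(v) down for vertices close to b.
                for v, d in dist.items():
                    S[v] -= max(0, alpha - d)
                progress = True
    return S
```

Each successful update removes one violation and keeps every edge of $H_S(G, t)$ satisfied, so the number of violated edges strictly decreases and the loop terminates.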
In the method in [6] , since a delay is inserted iteratively without considering the global structure of the circuit, redundant delays are inserted. If a delay is inserted into an appropriate place, several hold violations might be recovered at once.
The proposed method constructs a flow graph in which the capacity of each edge, the source vertices, and the sink vertices are defined. Then, delays are inserted into the edges of a minimum capacity cut with finite capacity which separates the source vertices from the sink vertices. If the hold constraint of (a, b) is violated, then the vertex in G corresponding to register a is set to a source and that corresponding to register b is set to a sink. The capacity of each edge is defined according to the delay-slack and delay-demand as shown in Fig. 8.
If there is a source-to-sink path on which each edge has an infinite capacity, there is no finite cut. In such cases, the hold violation cannot be recovered by inserting delays into edges in the cut since delays cannot be inserted into some edges in the cut. The proposed method removes edges with infinite capacity whose tail is a source. Furthermore, for each removed edge, the method removes the tail vertex if it has no outgoing edge and regards the head vertex as a source vertex if it has no incoming edge.
Assume that more than two edges in a source-to-sink path from a register a to a register b are cut by a minimum capacity cut. In such cases, if delays are inserted into all the edges in the cut, a setup violation of (a, b) might occur since too much delay is inserted on the path from a to b. Therefore, if more than two edges in a source-to-sink path from a to b are cut by a minimum capacity cut, the proposed method inserts a delay only into the edge nearest to a.
A finite cut can be obtained and some hold violations are relaxed by the above procedure. Our proposed method can recover all the hold constraints by applying the above procedure iteratively.
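The cut computation at the core of this procedure can be illustrated with a generic minimum-cut routine. The following Python sketch uses the Edmonds-Karp maximum-flow algorithm with a super source and a super sink; this choice of algorithm is an assumption and is not stated in the text. Edge capacities are taken as given, since their definition from delay-slack and delay-demand (Fig. 8) is not reproduced here. The preprocessing of infinite-capacity edges and the rule for several cut edges on one path are omitted. All names are hypothetical.

```python
from collections import deque


def min_cut(capacity, sources, sinks):
    """Return the edges of a minimum cut separating sources from sinks.

    capacity: dict mapping a directed edge (u, v) to its capacity,
              float("inf") for edges that must not be cut.
    """
    INF = float("inf")
    src, snk = ("super", "source"), ("super", "sink")
    residual = {src: {}, snk: {}}

    def add_edge(u, v, cap):
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + cap
        residual[v].setdefault(u, 0)  # reverse edge for flow cancellation

    for (u, v), cap in capacity.items():
        add_edge(u, v, cap)
    for s in sources:
        add_edge(src, s, INF)
    for t in sinks:
        add_edge(t, snk, INF)

    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if snk not in parent:
            break  # no augmenting path: the flow is maximum
        path, v = [], snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        if bottleneck == INF:
            raise ValueError("no finite cut separates sources from sinks")
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck

    reachable = set(parent)  # source side of the cut in the residual graph
    return sorted(e for e in capacity
                  if e[0] in reachable and e[1] not in reachable)
```

For example, with hypothetical capacities `{("a1", "x"): 2, ("a2", "x"): 2, ("x", "b"): 3}`, sources `a1` and `a2`, and sink `b`, the routine returns the single shared edge `("x", "b")`, which illustrates how one well-placed delay can address several hold violations at once.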
For example, the clock schedule and the circuit obtained from G shown in Fig. 1 by the proposed method with $t = 4$ $(= T_L(G))$ are shown in Fig. 9. The amount of inserted delays by the proposed method is less than that by the method in [6] shown in Fig. 5.
V. EXPERIMENTAL RESULTS
We implemented the clock scheduling method of [6] ($S_{init}$), the delay insertion method of [6] (DFS), the clock scheduling method of the proposed method ($S_{mod}$), and the delay insertion method of the proposed method (MinCut). These methods were implemented in C++, compiled by gcc 4.1.2, and executed on a PC with a 3.40 GHz Intel Pentium 4 CPU and 1 GB RAM. In order to observe the efficiency of each clock scheduling method and that of each delay insertion method, we applied the four combinations to circuits: $S_{init}$+DFS [6], $S_{init}$+MinCut, $S_{mod}$+DFS, and $S_{mod}$+MinCut. We also implemented the mixed integer linear programming formulation for minimization of the inserted delay (MILP) based on [7], [8]. MILP is solved by CPLEX 10.0.0 [9] on the same PC. We applied these methods to the ISCAS89 benchmark suite. The delay of each NOT, AND, OR, NAND, and NOR gate is set to 1 and that of each register and routing is set to 0. For 30 of the 48 ISCAS89 benchmark circuits, delay insertion does not decrease the minimum clock period in g-frame, so the original circuits are already optimal. The other 18 circuits are shown in Table I. In all the following experiments, the target clock period is set to the lower bound clock period in g-frame $T_L$ [5], [6]. The optimum solutions except for s6669 are obtained by MILP. The optimum solution of s6669 is not obtained by MILP because CPLEX ran out of memory while solving the MILP. The feasible solutions obtained by MILP are also shown in Table I.

Notation in Table I:
$|V_g|$ : the number of vertices in the circuit graph
$|V_r|$ : the number of registers in the circuit
$T_C$ : the minimum clock period of the original circuit in c-frame
$T_S$ : the minimum clock period of the original circuit in g-frame
$T_L$ : the lower bound clock period in g-frame
$D_{add}$ : the minimum inserted delay to achieve $T_L$
Time [s] : the total computation time to formulate and solve the MILP
* For s6669, the best feasible solution found before memory was exhausted is shown.
The experimental results of the four combinations are shown in Table II. The amount of inserted delays obtained by the method in [6] is about 948 times that of the optimal solution on average, and the computation time of the method in [6] was longer than that of MILP. In contrast, the amount of inserted delays obtained by the proposed method $S_{mod}$+MinCut is only about 1.27 times that of the optimum solution on average, and the proposed method is about four times faster than MILP on average. In particular, the proposed method obtains the optimum solutions for 11 of the 18 circuits.
VI. CONCLUSION
We proposed an efficient delay insertion method that minimizes the amount of inserted delays based on the method in [6]. Experiments showed that the proposed method obtains optimum solutions in a short time for most circuits under the assumption that the maximum delay of each element is equal to its minimum delay.
As future work, we will apply the proposed method to a realistic delay model and evaluate the whole circuit performance after the delay insertion, including the clock distribution circuit.
ACKNOWLEDGMENT
This research was partially supported by Grant-in-Aid for JSPS Fellows (19·6015).
