Reducing the peak power consumption of an LSI is required to reduce the instability of gate operation, the delay increase, noise, and so on. Clock scheduling can reduce the peak power consumption because it controls the switching timings of registers and combinational logic elements. In this paper, we propose a fast power estimation method for clock scheduling and fast clock scheduling methods for peak power reduction. Experiments show that the peak power wave estimated by the proposed method in a few seconds is highly correlated with the peak power wave obtained by HSPICE simulation in several days. Using the proposed power estimation method, the proposed clock scheduling methods find, in a few minutes, clock schedules for benchmark circuits that greatly reduce the peak power.
INTRODUCTION
Semiconductor manufacturing process technology has improved the scale, speed, and power consumption of LSI circuits. However, the increasing ratio of routing delay in the propagation delay bounds the amount of improvement in the complete-synchronous framework (c-frame), in which simultaneous clock distribution to every register is assumed. The increases in the size and power consumption of the clock distribution circuit have also become serious issues in c-frame. In contrast, the general-synchronous framework (g-frame) [1, 2, 3, 4, 5, 6, 7], in which the clock is assumed to be distributed periodically to each individual register though not necessarily to all registers simultaneously, is expected to give an essential solution. By using g-frame, improvements in circuit quality such as clock frequency, clock distribution circuit size, power consumption, and peak power consumption are expected.
In LSI design, reducing the peak power consumption is required to reduce the instability of gate operation, the delay increase, noise, and so on. In c-frame, the difference between the maximum and the minimum power consumption in the power wave is usually very large, since all the registers switch simultaneously and the switchings of gates follow. Therefore, the peak power consumption is very large compared with the average power consumption. In g-frame, by contrast, the peak power consumption is expected to be reduced, since the switching timings of registers and gates can be controlled by clock scheduling. Peak power reduction methods based on clock scheduling are proposed in [5, 7].
In [7], the power consumption is estimated under the assumption that the switching timings of gates are fixed regardless of the clock schedule. A feasible clock schedule that reduces the peak power is obtained by a genetic algorithm (GA) in which the clock input timing of each register is changed. In fact, however, the switching timings of gates in the combinational logic circuit depend on the clock schedule, so the estimation on which the obtained clock schedule is based is not accurate. Moreover, the computation time of this method is large due to the stochastic nature of GA.
In [5], the method of [7] is extended. The power consumption is estimated under the assumption that the switching timings of gates in a combinational logic circuit depend on the minimum delay from a register and the clock timing of the register. A feasible clock schedule is obtained by a GA that includes an operation converting an infeasible clock schedule into a feasible one. However, since gates also switch at timings caused by non-minimum delays from a register, the estimation on which the obtained clock schedule is based is still not accurate. Moreover, the computation time of this method is large due to the nature of GA.
In this paper, we propose a fast power estimation method for clock scheduling. The proposed method uses the following model. The power consumption of a circuit is the sum of the power consumptions of elements caused by switching. A switching event emerges at a register when the clock is input to the register, and propagates through the combinational circuit. The switching event might be blocked by a gate because of the side inputs of the gate. Therefore, a gate switches with some probability at the time at which a switching event arrives. The arrival time of a switching event is estimated from the clock schedule, the gate delays, and the routing delays. Since each switching event originates at a register, the power consumption of a circuit can be divided into register-originated power consumptions. In the proposed method, the power consumption is expressed as the sum of the register-originated power consumptions.
In this paper, clock scheduling methods using the proposed power estimation method are also proposed. The proposed scheduling consists of two stages. In the first stage, a clock schedule is generated by determining the clock timing of each register, one by one, within its feasible clock timing range so that the objective is as small as possible. In the second stage, the clock schedule obtained by the first stage is greedily modified by changing the clock timing of one register at a time to reduce the objective.
Experiments show that the peak power wave estimated by the proposed method in a few seconds is highly correlated with the peak power wave obtained by HSPICE simulation in several days. Using the proposed power estimation method, the proposed clock scheduling methods find, in a few minutes, clock schedules for benchmark circuits that greatly reduce the peak power.
G-FRAME
In the conventional synchronous framework, the clock timing of a register is the same as those of the other registers. We call this framework c-frame (complete-synchronous framework). C-frame, in which the clock timings of registers are assumed to be equal, is a special case of g-frame (general-synchronous framework). In g-frame [1, 2, 3, 4, 5, 6, 7], the clock timing of a register may differ from those of other registers. The clock timing S(r) of a register r is defined as the difference in the clock arrival time between r and an arbitrarily chosen reference register.
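A circuit works correctly with clock period T if the following two types of constraints are satisfied for every register pair (a, b) with signal propagation from a to b [3]:

$$S(a) - S(b) \le T - D_{\max}(a, b) \quad \text{(setup)}$$

$$S(b) - S(a) \le D_{\min}(a, b) \quad \text{(hold)}$$

where Dmax(a, b) (Dmin(a, b)) is the maximum (minimum) delay from a register a to a register b.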
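A minimal sketch of checking these constraints for a given set of clock timings, assuming they take the standard setup and hold forms S(a) − S(b) ≤ T − Dmax(a, b) and S(b) − S(a) ≤ Dmin(a, b). The function name, the data layout, and the two-register example are hypothetical and only serve as illustration.

```python
from typing import Dict, Tuple

def is_feasible(S: Dict[str, float],
                delays: Dict[Tuple[str, str], Tuple[float, float]],
                T: float) -> bool:
    """Return True if clock timings S satisfy setup and hold for every pair.

    delays maps a register pair (a, b) with a signal path from a to b
    to the tuple (Dmin(a, b), Dmax(a, b)).
    """
    for (a, b), (d_min, d_max) in delays.items():
        setup_ok = S[a] - S[b] <= T - d_max   # assumed setup form
        hold_ok = S[b] - S[a] <= d_min        # assumed hold form
        if not (setup_ok and hold_ok):
            return False
    return True

# Hypothetical two-register loop; delays in arbitrary time units.
delays = {("r1", "r2"): (2.0, 9.0), ("r2", "r1"): (1.0, 5.0)}
zero_skew = {"r1": 0.0, "r2": 0.0}   # c-frame: all timings equal
skewed = {"r1": 0.0, "r2": 2.0}      # g-frame: r2 clocked later

print(is_feasible(zero_skew, delays, 8.0))
print(is_feasible(skewed, delays, 8.0))
```

In this toy loop the zero-skew timing needs a clock period of at least 9, while the skewed timing also works with a period of 8, which illustrates how g-frame can relax the clock period bound of c-frame.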
The clock timing of a register is not necessarily unique. The range of clock timings of a register is referred to as the timing range of the register [4]. Let [Smin(r), Smax(r)] be the timing range of a register r, where Smin(r) is the lower bound and Smax(r) is the upper bound of the timing range of r. We call the set of the timing ranges of all registers a clock schedule. If the constraints are satisfied whenever the clock timing of each register is selected from its timing range, then the clock schedule is said to be feasible. Since the timing ranges of registers depend on each other, a feasible clock schedule is not unique.
The evaluation of a feasible clock schedule depends on the optimization targets, such as clock frequency, clock distribution circuit size, and power consumption. Although finding a feasible clock schedule that balances multiple objectives is difficult, a feasible clock schedule that maximizes the clock frequency can be obtained in a very short time [2, 6]. In addition, if a target clock timing is given for each register, a feasible clock schedule that minimizes the sum of the differences from the targets can also be obtained in a short time [1].
ESTIMATION OF POWER CONSUMPTION
In our power estimation, the peak power consumption of a circuit at each time unit in the clock period is estimated and evaluated. The power consumption differs in each clock period, since the input vector to the circuit and the register outputs differ. Since simulation-based estimation is very time consuming, it is not practical to use it in clock scheduling. Therefore, we propose a fast probability-based peak power estimation method for use in clock scheduling.
In our model, the power consumption of a circuit is the sum of the power consumptions of elements caused by switching. A switching event emerges at a register when the clock is input to the register, and propagates into the combinational circuit. The emerging time of a switching event at a register is determined by the clock schedule. A switching event is blocked by an element if certain conditions hold, so the switching event is assumed to propagate along a path with some probability. The propagation time of a switching event from a register to an element along a path is estimated as the sum of the gate delays and routing delays of the path.
The power consumption of an element caused by a switching lasts for several time units. The power wave of an element caused by a switching event depends on the environment, such as the slew rate of the switching and the number of fanouts of the element. An empirically evaluated power wave of an element for one switching is used for the estimation. The details are explained in the experiments.
Since each switching event originates at a register, the power consumption of a circuit can be divided into register-originated power consumptions. A register-originated power consumption is defined for each register: the power consumption caused by a switching event that emerged at a register belongs to the register-originated power consumption of that register (see Figure 1). In Figure 1, the power of each gate is assumed to be consumed within the time unit in which the switching event occurs.
If the input vector to the circuit and the register outputs change, then the actual register-originated power consumption wave changes. It also depends on the clock schedule: the propagation of a switching event that is not blocked under one clock schedule may be blocked under another. In the proposed estimation, however, the register-originated power consumption is estimated based on probability and is assumed to be independent of the clock schedule. Under this assumption, the computation time for obtaining the estimated power consumption wave during clock scheduling is very small. The estimated power consumption wave is not exact, but its accuracy is sufficient for clock scheduling.
Let R be the set of registers. Let W(r, t) be the register-originated power consumption of a register r ∈ R at time t, which is derived by the clock input to r at time 0. The register-originated power consumption of r at time t in clock schedule S is denoted by W_S(r, t). The register-originated power consumption of r is shifted according to S(r). That is,

$$W_S(r, t) = W(r, t - S(r)),$$

where S(r) is the clock timing of r in S. Let W_S(t) be the power consumption of the circuit in clock schedule S. We have

$$W_S(t) = \sum_{r \in R} W_S(r, t).$$
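The shift-and-sum structure described above can be sketched as follows. This is an illustration under stated assumptions: waves are sampled once per time unit over one clock period, clock timings are integers in the same unit, and the shift wraps around modulo the clock period (the wrap-around is an assumption of this sketch). All names and numbers are hypothetical.

```python
from typing import Dict, List

def shifted_wave(W_r: List[float], s: int) -> List[float]:
    """W_S(r, t) = W(r, (t - s) mod T), where T = len(W_r) (assumed wrap)."""
    T = len(W_r)
    return [W_r[(t - s) % T] for t in range(T)]

def circuit_wave(W: Dict[str, List[float]], S: Dict[str, int]) -> List[float]:
    """W_S(t): sum of the shifted register-originated waves."""
    T = len(next(iter(W.values())))
    total = [0.0] * T
    for r, wave in W.items():
        for t, power in enumerate(shifted_wave(wave, S[r])):
            total[t] += power
    return total

# Hypothetical register-originated waves over a 4-unit clock period.
W = {"r1": [4.0, 1.0, 0.0, 0.0], "r2": [3.0, 2.0, 0.0, 0.0]}

print(circuit_wave(W, {"r1": 0, "r2": 0}))  # both registers clocked together
print(circuit_wave(W, {"r1": 0, "r2": 2}))  # r2 clocked two units later
```

Because each W(r, t) is computed once and only shifted afterwards, evaluating a candidate clock schedule costs one pass over the waves, which is what makes the estimate cheap inside a scheduling loop.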
W(r, t) consists of the static power consumption of a register r and the dynamic power consumption of r and gates. The static power consumption of r is consumed whenever the clock is input, independently of whether a switching event emerges at r. Let Wc(r, t) and Wg(r, t) be the static power consumption and the dynamic power consumption at time t, respectively, which are derived by the clock input to r at time 0. That is, W(r, t) = Wc(r, t) + Wg(r, t).
Let wg(v, t) be the switching power consumption of an element v at time t when the switching of v occurs at time 0. Let p(r, v, t) be the switching probability of an element v at time t caused by switching events that emerged at a register r at time 0. The dynamic power consumption at time t is estimated as

$$W_g(r, t) = \sum_{v \in O(r)} \sum_{t'} p(r, v, t')\, w_g(v, t - t'),$$

where O(r) is the set of gates in the output cone of r.
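One way to read the definitions of wg(v, t) and p(r, v, t) is that the dynamic power is each element's switching pulse weighted by the switching probability and placed at the switching time. The sketch below implements that reading; the sparse probability layout, the wrap-around at the end of the clock period, and all names and numbers are assumptions made for illustration.

```python
from typing import Dict, List

def dynamic_wave(p: Dict[str, Dict[int, float]],
                 w_g: Dict[str, List[float]],
                 T: int) -> List[float]:
    """Probability-weighted sum of element pulses for one register.

    p[v][t_sw] is the probability that element v switches at time t_sw;
    w_g[v][k] is the power of v at k time units after its switching.
    Pulses that run past the period wrap around (assumption).
    """
    wave = [0.0] * T
    for v, events in p.items():
        for t_sw, prob in events.items():
            for k, power in enumerate(w_g[v]):
                wave[(t_sw + k) % T] += prob * power
    return wave

# Hypothetical output cone with two gates.
p = {"g1": {1: 0.5}, "g2": {2: 0.25, 3: 0.5}}
w_g = {"g1": [2.0, 1.0], "g2": [4.0]}

print(dynamic_wave(p, w_g, 5))
```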
The switching of a gate occurs when several of its fanin gates switch and the other fanin gates do not prevent the switching (see Figure 2). For example, the switching of a NAND gate occurs when several fanin gates switch in the same direction and the outputs of the other fanin gates are 1. Let the condition probability ca(v) be the probability that the output of a gate v is a. In this paper, ca(v) as shown in Table 1 is used in the switching probability analysis. For a NOT gate v, c0(v) = c1(v′) and c1(v) = c0(v′), where v′ is the fanin gate of v. By using the condition probability of each gate, which is independent of input vectors, the switching probability of each gate can be computed quickly. Let v be a NAND gate, I(v) be the set of fanin gates of v, X be a subset of I(v), and d(v) be the delay of v. The switching probability p(r, v, t) is then obtained by summing, over the nonempty subsets X of I(v), the probability that the gates in X switch in the same direction at time t − d(v) while the output of every gate in I(v) \ X is 1.

Figure 2: The switching probability of a gate and the condition probability. (Each gate delay is assumed to be 1.)
Switching probabilities of NOR, AND, and OR gates are defined similarly. If v is a NOT gate and the fanin gate of v is v′, then

$$p(r, v, t) = p(r, v', t - d(v)).$$
The switching probability of a register is the probability that its fanin gate switches an odd number of times within one clock period. Let p1, p2, . . . , pn be the non-zero switching probabilities of the fanin gate of a register r in one clock period. Then, treating the n switchings as independent events, the switching probability p(r, r, 0) of r is

$$p(r, r, 0) = \frac{1}{2}\left(1 - \prod_{i=1}^{n} (1 - 2 p_i)\right).$$
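As a small worked example of the odd-parity probability, the sketch below assumes that the individual switchings of the fanin gate are independent. Under that assumption, the probability of an odd number of switchings equals (1 − Π(1 − 2 p_i)) / 2, and a brute-force enumeration over all outcomes gives the same value. The function names are hypothetical.

```python
from itertools import product
from typing import Sequence

def odd_switch_probability(probs: Sequence[float]) -> float:
    """Closed form for P(odd number of events), assuming independence."""
    prod = 1.0
    for p in probs:
        prod *= 1.0 - 2.0 * p
    return 0.5 * (1.0 - prod)

def odd_switch_probability_bruteforce(probs: Sequence[float]) -> float:
    """Enumerate every switch/no-switch outcome and add the odd ones."""
    total = 0.0
    for outcome in product([0, 1], repeat=len(probs)):
        if sum(outcome) % 2 == 1:
            weight = 1.0
            for switched, p in zip(outcome, probs):
                weight *= p if switched else 1.0 - p
            total += weight
    return total

probs = [0.25, 0.25, 0.5]  # hypothetical fanin switching probabilities
print(odd_switch_probability(probs))
print(odd_switch_probability_bruteforce(probs))
```

The closed form is linear in the number of switching times, whereas the enumeration is exponential and is shown only as a cross-check.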
Input: register set R, clock period T, the delays between registers, and the register-originated power consumptions.
Output: clock schedule S.
Objective: minimization of the peak power consumption or of the variance of the power consumption.
Step 1: Obtain a feasible clock schedule by setting the clock timing of each register to the middle of its clock timing range determined by the scheduling engine in [1].
Step 2: Ru := R, W_S(t) := 0 (0 ≤ t < T).
Step 3: Choose the register rt in Ru such that W_peak(rt) = max_{r∈Ru} W_peak(r).
Step 4: Modify the current feasible clock schedule by enlarging the clock timing range of rt so that the clock timing range of rt is maximized while fixing the clock timings of the other registers.
Step 5: Set the clock timing of rt within its clock timing range so that the objective of W_S(t) + W_S(rt, t) is minimized.
Step 6: W_S(t) := W_S(t) + W_S(rt, t), Ru := Ru \ {rt}.
Step 7: If Ru ≠ ∅, then go to Step 3. Otherwise output S and terminate.
Figure 3: The first stage of clock scheduling.

The switching probability of a gate depends on the switching probabilities of registers, and the switching probability of a register depends on the switching probability of the fanin gate of the register. Therefore, the computation of the switching probabilities is repeated until it converges. In this paper, the initial switching probability of each register is set to 1 to increase the proportion of the dynamic power consumption relative to the static power consumption, because the peak power occurs when many elements switch. The computation is repeated until the change in the switching probability of every register in one iteration is less than 0.0001.
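The repeated computation of switching probabilities can be sketched as a fixed-point iteration. The initial value 1 and the tolerance 0.0001 follow the text; everything else, including the `update` callback that stands in for one propagation pass through the circuit and the toy update rule, is a hypothetical placeholder.

```python
from typing import Callable, Dict, Iterable

Probabilities = Dict[str, float]

def iterate_until_converged(update: Callable[[Probabilities], Probabilities],
                            registers: Iterable[str],
                            tol: float = 1e-4,
                            max_iter: int = 1000) -> Probabilities:
    """Repeat `update` until no register probability changes by tol or more."""
    registers = list(registers)
    p = {r: 1.0 for r in registers}  # initial switching probability is 1
    for _ in range(max_iter):
        new_p = update(p)
        if max(abs(new_p[r] - p[r]) for r in registers) < tol:
            return new_p
        p = new_p
    raise RuntimeError("switching probabilities did not converge")

def toy_update(p: Probabilities) -> Probabilities:
    """Placeholder for one propagation pass: two registers feeding each other."""
    return {"r1": 0.5 * p["r2"] + 0.1, "r2": 0.5 * p["r1"] + 0.1}

result = iterate_until_converged(toy_update, ["r1", "r2"])
print(result)
```

In a real flow, `update` would propagate the register probabilities through the gates and recompute each register's probability from its fanin gate; the toy rule here only demonstrates the stopping criterion.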
CLOCK SCHEDULING
We propose fast clock scheduling methods for the reduction of the peak power using the proposed power estimation.
The goal of the clock scheduling is to minimize the peak power of a circuit. Since the error of the proposed power estimation is not small, the actual peak power may well occur at a time when the estimated power is merely near the estimated peak power. Therefore, the time period in which the estimated power is much larger than the average power should be short. For this reason, we also adopt the minimization of the variance of power as an objective. If the variance of the power of a circuit becomes very small, the time period in which the estimated power is near the estimated peak power becomes long, but the difference between the actual peak power and the average power will be small enough.
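As a small illustration of why the variance is a useful objective alongside the peak, the sketch below compares two hypothetical power waves that have the same peak and the same average. The waves and the function names are made up for illustration.

```python
from typing import List

def peak(wave: List[float]) -> float:
    """Largest power value in the wave."""
    return max(wave)

def variance(wave: List[float]) -> float:
    """Population variance of the power over the clock period."""
    mean = sum(wave) / len(wave)
    return sum((w - mean) ** 2 for w in wave) / len(wave)

# Two hypothetical estimated power waves with identical peak and average.
wave_a = [4.0, 4.0, 4.0, 0.0]   # near the peak for most of the period
wave_b = [4.0, 2.0, 3.0, 3.0]   # near the peak only briefly

for name, wave in (("a", wave_a), ("b", wave_b)):
    print(name, peak(wave), variance(wave))
```

A peak-only objective cannot distinguish the two waves, whereas the variance prefers `wave_b`, in which the power stays close to the average and is near the peak in only one time unit.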
Let W_peak(r) = max_{0≤t<T} W(r, t) be the peak power of the register-originated power consumption of a register r.
Input: register set R, clock period T, the delays between registers, the register-originated power consumptions, clock schedule S, and the power consumption of the circuit W_S(t).
Step 1: Ru := R.
Step 2: Choose the times tmax and tmin between 0 and T such that W_S(tmax) = max_{0≤t<T} W_S(t) and W_S(tmin) = min_{0≤t<T} W_S(t).
Step 3: If the objective is the peak power, then choose the register rt in Ru such that W_S(rt, tmax) is maximum. Otherwise, choose the register rt in Ru such that W_S(rt, tmax) − W_S(rt, tmin) is maximum.
Step 5: Set the clock timing of rt within its clock timing range so that the objective of W_S(t) is minimized.
Step 6: If the clock timing of rt is changed, then go to Step 1. Otherwise Ru := Ru \ {rt}.
Step 7: If Ru ≠ ∅, then go to Step 2. Otherwise output S and terminate.
Figure 4: The second stage of clock scheduling (clock schedule modification).

The proposed clock scheduling consists of two stages. The first stage of clock scheduling is shown in Figure 3. In the first stage, a feasible clock schedule is generated by determining the clock timing of each register, one by one, in order from the register with the largest peak power to the register with the smallest peak power. Each clock timing is chosen within the feasible clock timing range of the register so that the objective is as small as possible.
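The first stage (Figure 3) can be sketched as below. This is a simplified illustration, not the authors' implementation: the initial feasible schedule is passed in directly instead of coming from the scheduling engine in [1], the constraints are assumed to take the standard setup and hold forms, timings are integers searched exhaustively, waves wrap modulo the clock period, and only the peak objective is shown. All names and the two-register instance are hypothetical.

```python
from typing import Dict, List, Tuple

Delays = Dict[Tuple[str, str], Tuple[int, int]]  # (a, b) -> (Dmin, Dmax)

def shift(wave: List[float], s: int) -> List[float]:
    T = len(wave)
    return [wave[(t - s) % T] for t in range(T)]

def timing_range(r: str, S: Dict[str, int], delays: Delays, T: int) -> Tuple[int, int]:
    """Largest range for S[r] with all other timings fixed (Step 4)."""
    lo, hi = -T, T  # hypothetical search bounds
    for (a, b), (d_min, d_max) in delays.items():
        if a == r and b != r:
            hi = min(hi, S[b] + T - d_max)  # setup: S(r) - S(b) <= T - Dmax
            lo = max(lo, S[b] - d_min)      # hold:  S(b) - S(r) <= Dmin
        elif b == r and a != r:
            lo = max(lo, S[a] - T + d_max)  # setup: S(a) - S(r) <= T - Dmax
            hi = min(hi, S[a] + d_min)      # hold:  S(r) - S(a) <= Dmin
    return lo, hi

def first_stage(W: Dict[str, List[float]], delays: Delays, T: int,
                S0: Dict[str, int]) -> Tuple[Dict[str, int], List[float]]:
    S = dict(S0)
    total = [0.0] * T  # Step 2: accumulated W_S(t)
    # Step 3: registers in decreasing order of their peak power.
    for r in sorted(W, key=lambda r: max(W[r]), reverse=True):
        lo, hi = timing_range(r, S, delays, T)
        # Step 5: timing that minimizes the peak of total + shifted wave.
        best = min(range(lo, hi + 1),
                   key=lambda s: max(x + y for x, y in zip(total, shift(W[r], s))))
        S[r] = best
        total = [x + y for x, y in zip(total, shift(W[r], best))]  # Step 6
    return S, total

T = 10
W = {"r1": [4.0, 1.0] + [0.0] * 8, "r2": [3.0, 2.0] + [0.0] * 8}
delays = {("r1", "r2"): (2, 9), ("r2", "r1"): (1, 5)}
S, total = first_stage(W, delays, T, {"r1": 0, "r2": 0})
print(S, max(total))
```

In this toy instance the zero-skew schedule has an estimated peak of 7, and the greedy pass reduces it to 4 while keeping every pair within the assumed constraints. Swapping the `max(...)` inside the key for a variance computation would give the variance-minimizing variant.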
The result of the first stage of clock scheduling shown in Figure 3 depends on the order in which the clock timings of registers are fixed. Therefore, we also propose the clock schedule modification method in Figure 4 as the second stage of clock scheduling. In this stage, the clock schedule obtained by the first stage is greedily modified by changing the clock timing of one register at a time to reduce the objective.
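The second stage (Figure 4) can be sketched in the same simplified setting. The assumptions are the same as before and are again hypothetical: standard setup and hold forms, integer timings searched exhaustively, waves wrapping modulo the clock period, and the peak objective only. A timing is treated as changed only when it strictly lowers the peak, which is a choice of this sketch to guarantee termination.

```python
from typing import Dict, List, Tuple

Delays = Dict[Tuple[str, str], Tuple[int, int]]  # (a, b) -> (Dmin, Dmax)

def shift(wave: List[float], s: int) -> List[float]:
    T = len(wave)
    return [wave[(t - s) % T] for t in range(T)]

def timing_range(r: str, S: Dict[str, int], delays: Delays, T: int) -> Tuple[int, int]:
    lo, hi = -T, T  # hypothetical search bounds
    for (a, b), (d_min, d_max) in delays.items():
        if a == r and b != r:
            hi, lo = min(hi, S[b] + T - d_max), max(lo, S[b] - d_min)
        elif b == r and a != r:
            lo, hi = max(lo, S[a] - T + d_max), min(hi, S[a] + d_min)
    return lo, hi

def total_wave(W: Dict[str, List[float]], S: Dict[str, int], T: int) -> List[float]:
    total = [0.0] * T
    for r, wave in W.items():
        for t, power in enumerate(shift(wave, S[r])):
            total[t] += power
    return total

def second_stage(W: Dict[str, List[float]], delays: Delays, T: int,
                 S0: Dict[str, int]) -> Tuple[Dict[str, int], List[float]]:
    S = dict(S0)
    remaining = set(W)                                   # Step 1
    while remaining:                                     # Step 7
        total = total_wave(W, S, T)
        t_max = max(range(T), key=lambda t: total[t])    # Step 2
        # Step 3: register contributing most at the peak time.
        r = max(sorted(remaining), key=lambda q: shift(W[q], S[q])[t_max])
        rest = [x - y for x, y in zip(total, shift(W[r], S[r]))]
        lo, hi = timing_range(r, S, delays, T)
        peak_at = lambda s: max(x + y for x, y in zip(rest, shift(W[r], s)))
        best = min(range(lo, hi + 1), key=peak_at)       # Step 5
        if peak_at(best) < max(total):                   # Step 6
            S[r] = best
            remaining = set(W)
        else:
            remaining.discard(r)
    return S, total_wave(W, S, T)

T = 10
W = {"r1": [4.0, 1.0] + [0.0] * 8, "r2": [3.0, 2.0] + [0.0] * 8}
delays = {("r1", "r2"): (2, 9), ("r2", "r1"): (1, 5)}
S, total = second_stage(W, delays, T, {"r1": 0, "r2": 0})
print(S, max(total))
```

Each accepted move keeps the schedule feasible, because the new timing is taken from the range that is valid with all other timings fixed.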
EXPERIMENTS
In experiments, the proposed methods are applied to four circuits in the ISCAS89 benchmark suite, shown in Table 2, using the 0.35 µm library of Rohm Corporation. The power consumption of each gate and of the register is set to the results obtained by HSPICE. The power wave of an element with n fanouts is obtained as follows. n NAND gates are used as fanouts of the element, and wire effects are ignored. The output of 20 buffers in series with a step input is used as the input wave of the element. The average of the rise and fall waves is used. The unit time of the power estimation, the unit time of clock timing, and the unit time of the variance calculation are set to 5 ps, 100 ps, and 25 ps, respectively. The clock period is set to the minimum clock period of c-frame rounded up to the unit time of sampling.

Table 2
circuit   registers   gates   (unlabeled)   (unlabeled)
s1238     18          508     8135          6214
s1423     74          657     27350         23531
s5378     179         2779    7641          6553
s9234     211         5597    17991
First, the clock scheduling methods for minimization of the peak power and for minimization of the variance are applied, and the clock schedules obtained by the first stage and the second stage are compared in terms of their quality under the proposed power estimation.
The estimation results of the obtained clock schedules are shown in Table 3. In Table 3, P and P&P are the results of the first and second stages, respectively, where the objective is minimization of the peak power. Similarly, V and V&V are the results of the first and second stages, respectively, where the objective is minimization of the variance. The variances are shown in the column var. Estimated peak powers smaller than those of c-frame are obtained by P within about a second. They are slightly improved by the second stage at a small computation cost. Although the estimated peak powers of V and V&V are almost the same as those of P and P&P, the variances become small. The computation times of V&V are large compared with the other stages, but they are less than a few minutes.
Next, several of the obtained clock schedules are evaluated by HSPICE simulation. Due to time limitations, not all of the obtained clock schedules are evaluated. In the HSPICE simulation, randomly generated input vectors are used. The length of the input vectors is set to the number of registers in the circuit. At each time unit in the clock period, the maximum power consumption among all cycles is recorded.
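The way the peak power wave is recorded from the simulated cycles can be illustrated with a short sketch: for each time unit within the clock period, take the maximum over all cycles. The function name and the sample values are hypothetical.

```python
from typing import List

def peak_power_wave(cycles: List[List[float]]) -> List[float]:
    """Per-time-unit maximum over all simulated clock cycles.

    cycles[c][t] is the power at time unit t of cycle c; every cycle
    must cover the same number of time units.
    """
    return [max(samples) for samples in zip(*cycles)]

# Three hypothetical cycles, four time units each.
cycles = [
    [1.0, 5.0, 2.0, 0.0],
    [3.0, 4.0, 0.0, 1.0],
    [2.0, 6.0, 1.0, 0.0],
]
print(peak_power_wave(cycles))
```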
The peak power waves of s1238 and s1423 obtained by HSPICE for the clock schedules obtained by P and V&V are shown in Figures 5 and 6, respectively. The peak power waves in c-frame are also shown in these figures. In c-frame, the power consumption is very high at the time when the clock is input to the registers, and it is near 0 at other time units. Therefore, the peak power consumption is also very high in c-frame. In g-frame, by contrast, the peak power is reduced, since the switching timings of registers and gates are controlled by the clock schedule. The peak power of the circuit with the clock schedule obtained by P is 63.2% lower than that of c-frame on average. The peak power of the circuit with the clock schedule obtained by V&V is lower than that with the clock schedule obtained by P on most circuits, but V&V requires more CPU time.
In Table 4, the peak powers by our proposed estimation method, the peak powers by HSPICE simulation, the simulation times, and the correlation between the estimated power wave and the peak power wave by HSPICE simulation are shown in esti., peak, time, and corr., respectively.
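A correlation between two power waves can be computed as sketched below. The text does not state which correlation coefficient is used, so the Pearson coefficient here is an assumption, and the two waves are hypothetical sample data rather than results from the experiments.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long waves."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical estimated and simulated waves on the same time grid.
estimated = [1.0, 3.0, 2.0, 0.5]
simulated = [1.2, 2.7, 2.4, 0.4]
print(pearson(estimated, simulated))
```

A high coefficient only says that the two waves rise and fall together; it does not guarantee that the absolute peak values agree.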
The correlation between the peak power wave simulated by HSPICE and our power estimation is high on most circuits, which supports the validity of the proposed power estimation method. In s1423, however, the peak power of the circuit obtained by V&V is higher than that of the circuit obtained by P. The power waves of our estimation and of HSPICE for s1423 with the clock schedules obtained by P and V&V are shown in Figures 7 and 8, respectively. The figures show that the proposed power estimation is not accurate enough near the peak power. Accurately estimating the peak power in these few cases remains future work.
CONCLUSIONS
In this paper, we proposed a fast power estimation method for clock scheduling and clock scheduling methods for peak power reduction. Experiments show that the power wave estimated by the proposed method in a few seconds is highly correlated with the peak power wave obtained by HSPICE simulation in several days. Using the proposed power estimation method, the proposed clock scheduling methods find, in a few minutes, clock schedules for benchmark circuits that greatly reduce the peak power.
As future work, improving the accuracy of the power estimation without increasing the computation time is desired. For example, the power consumed by the clock tree is not taken into account in the proposed estimation. Although any clock schedule can be realized by a clock tree, the power consumption of the clock tree would become unacceptably large if the clock schedule were determined without considering layout information. A clock scheduling method that reduces both the peak power and the total power should be pursued.
ACKNOWLEDGMENTS
This work is supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with Synopsys, Inc., Cadence Design Systems, Inc., Rohm Corporation and Toppan Printing Corporation.
