ABSTRACT
I. INTRODUCTION
Sequential circuits are considered major contributors to a system's power dissipation, because one of their inputs is the clock, which is the only signal that switches all the time. In addition, the clock signal tends to be highly loaded. To distribute the clock and control the clock skew, one needs to construct a clock network (often a clock tree) with clock buffers. All of this adds to the capacitance of the clock net. Recent studies indicate that the clock signals in digital computers consume a large percentage (15%-45%) of the system power (1). Thus, the circuit power can be greatly reduced by reducing the clock power dissipation.
Most efforts for clock power reduction have focused on issues such as reduced voltage swings, buffer insertion and clock routing (2). In many cases, switching of the clock causes a lot of unnecessary gate activity. For that reason, circuits are being developed with controllable clocks. This means that other clocks are derived from the master clock which, based on certain conditions, can be slowed down or stopped completely with respect to the master clock. This scheme results in power savings due to the following factors:
1) Load on the master clock is reduced and the number of required buffers in the clock tree is decreased. Therefore, the power dissipation of the clock tree can be reduced.
2) The flip-flop receiving the derived clock is not triggered in idle cycles; the corresponding dynamic power dissipation is thus saved.
3) The excitation function of the flip-flop triggered by the derived clock may be simplified, since it has a don't-care condition in every cycle in which the flip-flop is not triggered by the derived clock.
In (3) the authors presented a technique for saving power in the clock tree by stopping the clock fed into idle modules. However, a number of engineering issues related to the design of the clock tree were not addressed, and hence the proposed approach has not been adopted in practice. This paper investigates various issues in deriving a gated clock from a master clock. In Section II, a quaternary variable is used to model the clock behavior and to discuss its triggering action on flip-flops. Based on this analysis, two clock-gating schemes are proposed. In Section III, we use the covering relation between the clock and the transition behaviors of the triggered flip-flops to derive conditions for gating the master clock. Two common sequential circuits, i.e. an 8421 BCD code up-counter and an excess-three code up-counter, are then described to illustrate the procedure for finding a derived clock. In Section IV, a new technique for clock-gating is presented which generates a clock synchronous with the master clock. This eliminates the additional skew between the master clock and the derived clock, so the designed sequential circuit is a synchronous one. Finally, we present circuit simulation results to verify the quality of the derived clock and its ability to reduce power dissipation in the circuit.
II. DESCRIPTION FOR CLOCK BEHAVIOR AND CLOCK-GATING
In a synchronous system, a flip-flop is triggered by a certain directional transition of a clock signal. If the clock is a signal other than the master clock, it must offer the same directional transition to trigger the flip-flop, and it must be "in step" with the master clock.
For the clock signal clk in a circuit, if we denote its logic values before and after a transition as clk(t) and clk […], respectively, four combinations can be used to express different behaviors of the clock, as shown in Table 1.
Assume that there are n flip-flops in a sequential circuit and that their outputs and clock inputs are denoted by Q_i and clk_i, i = 0, 1, …, n-1, respectively. For a synchronous sequential circuit, we have clk_i = clk, namely all flip-flops are triggered by the same master clock signal clk. However, if a flip-flop Q_i is to be disconnected from the master clock during some (idle) cycles, then we have to use a derived clock for Q_i. Notice that this derived clock should be "in step" with the master clock for the circuit to remain synchronous.
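The four clock behaviors obtained from the (before, after) value pairs can be enumerated in a few lines. The following is a minimal sketch, not the notation of Table 1: the names `Behavior`, `HOLD_LOW`, `RISE`, `FALL`, `HOLD_HIGH`, `classify`, `behaviors` and `triggers` are hypothetical, and a rising-edge-triggered flip-flop is assumed by default.

```python
from enum import Enum


class Behavior(Enum):
    """Four behaviors of a binary signal, keyed by its (before, after) values."""
    HOLD_LOW = (0, 0)   # signal stays at 0
    RISE = (0, 1)       # 0 -> 1 transition
    FALL = (1, 0)       # 1 -> 0 transition
    HOLD_HIGH = (1, 1)  # signal stays at 1


def classify(before, after):
    """Map one (before, after) pair of logic values to its behavior."""
    return Behavior((before, after))


def behaviors(samples):
    """Classify every pair of consecutive samples of a waveform."""
    return [classify(a, b) for a, b in zip(samples, samples[1:])]


def triggers(behavior, edge=Behavior.RISE):
    """A flip-flop reacts to one directional transition only (assumed: rising)."""
    return behavior is edge
```

For example, `behaviors([0, 1, 1, 0, 0])` walks through all four cases in turn, and only the first of them would trigger a rising-edge flip-flop.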
Generally, we consider that the derived clock is obtained from the master clock clk and the outputs of other flip-flops (which make transitions following the triggering transition of their respective clocks). Since both AND gating and OR gating can be used for controlling the master clock, we have the following two clock-gating forms, (2) and (3).
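To make the difference between the two gating styles concrete, the sketch below samples a master clock four times per cycle and gates it with a control signal that changes while the clock is high, as a flip-flop output would shortly after a rising edge. It assumes the generic forms clk AND ctrl and clk OR ctrl, which may differ in detail from (2) and (3); all function names and the sampled-waveform model are illustrative assumptions.

```python
def master_clock(cycles, steps=4):
    """Square wave sampled `steps` times per cycle: low half first, then high half."""
    half = steps // 2
    return ([0] * half + [1] * half) * cycles


def rising_edges(wave):
    """Sample indices at which the waveform goes from 0 to 1."""
    return [t for t in range(1, len(wave)) if wave[t - 1] == 0 and wave[t] == 1]


def and_gate(clk, ctrl):
    """AND gating: the clock passes while ctrl = 1 and is held low while ctrl = 0."""
    return [c & x for c, x in zip(clk, ctrl)]


def or_gate(clk, ctrl):
    """OR gating: the clock passes while ctrl = 0 and is held high while ctrl = 1."""
    return [c | x for c, x in zip(clk, ctrl)]


def in_step(derived, clk):
    """True if every rising edge of the derived clock coincides with a master edge."""
    master = set(rising_edges(clk))
    return all(t in master for t in rising_edges(derived))


clk = master_clock(cycles=4)   # rising edges at samples 2, 6, 10, 14
ctrl = [0] * 7 + [1] * 9       # changes at sample 7, while clk is high

and_clk = and_gate(clk, ctrl)  # gains a late edge at sample 7 (a glitch)
or_clk = or_gate(clk, ctrl)    # clk = 1 masks the change, so no extra edge
```

In this model `in_step(or_clk, clk)` holds while `in_step(and_clk, clk)` does not, which is one way to see why the gating function must be kept simple and well timed when a glitch-prone form is used.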
It should be pointed out that the circuitry attached for generating the derived clock should be simple, to avoid excessive power dissipation in this overhead circuitry. Therefore, g_i and p_i in (2) and (3) should be relatively simple functions. In particular, we require g_i to be simple to avoid dangerous glitches. Note that if g_i = 0 […], we return to the condition of applying the master clock clk in a synchronous sequential circuit.
The covering relation can be expressed as:
III. DESIGN OF SEQUENTIAL CIRCUITS BASED ON DERIVED CLOCK
Since AND and OR operations on Boolean variables can be interpreted as minimum and maximum operations on these variables, i.e. x · y = min(x, y) and x + y = max(x, y), we can obtain the following equations from (6): […]
Therefore, we should first obtain […] and then generate the derived clock clk_i for flip-flop Q_i. We will show the procedure by using design examples.
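The min/max reading of AND and OR used in this section can be checked exhaustively over the Boolean values. The sketch below also checks one direct consequence, namely that an AND-gated level never exceeds the master clock level and an OR-gated level never falls below it. The function names are hypothetical and the check is an illustration, not part of the derivation from (6).

```python
from itertools import product

BOOL_PAIRS = list(product((0, 1), repeat=2))


def and_is_min():
    """x AND y equals min(x, y) for all Boolean x, y."""
    return all((x & y) == min(x, y) for x, y in BOOL_PAIRS)


def or_is_max():
    """x OR y equals max(x, y) for all Boolean x, y."""
    return all((x | y) == max(x, y) for x, y in BOOL_PAIRS)


def gated_levels_bracket_master():
    """(clk AND c) <= clk <= (clk OR c) for every clock level and control value."""
    return all((clk & c) <= clk <= (clk | c) for clk, c in BOOL_PAIRS)
```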
Example 1. Design of an 8421 BCD code up-counter
The next states and state behaviors of an 8421 BCD code up-counter are shown in Table 2. […] (10)
Therefore, we have […]. Because the circuit has lower node capacitance, the asynchronous design saves power.
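The covering idea for this example can be checked mechanically. The sketch below enumerates the 8421 BCD count sequence, lists the states in which each flip-flop must change, and tests whether a candidate gating condition fires in all of those states. The candidate used here (pass the clock to Q_1 to Q_3 only when Q_0 = 1) is an assumption chosen for illustration; it is consistent with the shared derived clock clk_1-3 discussed in Section IV, but it is not claimed to be the exact expression derived in this example. All function names are hypothetical.

```python
def next_state(q, first=0, last=9):
    """Up-counter over the codes first..last (8421 BCD: 0..9), wrapping around."""
    return first if q == last else q + 1


def toggle_states(bit, first=0, last=9):
    """States in which flip-flop `bit` must change at the next triggering edge."""
    return [q for q in range(first, last + 1)
            if ((q ^ next_state(q, first, last)) >> bit) & 1]


def covers(enable, bit, first=0, last=9):
    """True if the gated clock fires in every state where `bit` must change."""
    return all(enable(q) for q in toggle_states(bit, first, last))


def always(q):
    """Ungated case: the master clock reaches the flip-flop in every cycle."""
    return True


def q0_high(q):
    """Candidate gating condition (assumed): pass the clock only when Q_0 = 1."""
    return (q & 1) == 1


def clock_events(enables, first=0, last=9):
    """Triggering edges delivered to all flip-flops over one full count period."""
    return sum(1 for q in range(first, last + 1) for en in enables if en(q))


ungated = clock_events([always] * 4)                        # 4 flip-flops x 10 cycles
gated = clock_events([always, q0_high, q0_high, q0_high])   # Q_1..Q_3 see 5 edges each
```

Under these assumptions Q_0 changes in every state, Q_1 to Q_3 change only in states with Q_0 = 1, and the number of flip-flop clock events per count period drops from 40 to 25.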
Example 2. Design of an excess-three code up-counter
The next state and state transition of an excess-three code up-counter are shown in Table 3. Transition functions for each flip-flop can be derived as below: […] (17)
Therefore, we have […]
Based on (2) and (4), (22) and (23) can be re-expressed as […]
Obviously, if we take clk_3 […] and clk_0 = clk, the covering relation will set the excitation functions of all four flip-flops as […] (i = 0, 1, 2, 3). On the other hand, if we use the master clock for triggering all four flip-flops, we obtain the following complicated excitation functions:
Since the above […]
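The simplification of the excitation functions under a derived clock can be illustrated on the excess-three sequence (codes 0011 to 1100). The sketch below assumes an idealized derived clock for each flip-flop that fires exactly in the states where that flip-flop must change; with such a clock, a D input equal to the complement of the output is sufficient, because all other cycles are don't-care. This ideal gating is an assumption for illustration and is not necessarily the derived clock chosen in this example; all names are hypothetical.

```python
FIRST, LAST, BITS = 3, 12, 4   # excess-three codes 0011 .. 1100, four flip-flops


def next_state(q):
    """Specified behavior: count up from 3 to 12, then wrap back to 3."""
    return FIRST if q == LAST else q + 1


def toggle_states(bit):
    """States in which flip-flop `bit` must change at the next triggering edge."""
    return {q for q in range(FIRST, LAST + 1) if ((q ^ next_state(q)) >> bit) & 1}


def step_gated(q):
    """One cycle with ideal per-bit gating and excitation D = NOT Q."""
    nxt = q
    for bit in range(BITS):
        if q in toggle_states(bit):   # the derived clock fires for this bit
            nxt ^= 1 << bit           # D = NOT Q, so the stored bit flips
        # otherwise the flip-flop receives no edge and holds its value
    return nxt


def run(q, cycles):
    """State sequence produced by the gated design, starting from state q."""
    seq = [q]
    for _ in range(cycles):
        q = step_gated(q)
        seq.append(q)
    return seq


def q0_enable_covers(bit):
    """Would the simple condition Q_0 = 1 fire whenever `bit` must change?"""
    return all(q & 1 for q in toggle_states(bit))
```

In this model `run(3, 10)` reproduces the full count sequence and returns to 3. The simple condition Q_0 = 1 does not cover Q_1 to Q_3 here, because all of them must also change in state 1100, where Q_0 = 0, so a richer gating function is needed for this counter.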
IV. SYNCHRONOUS DERIVED CLOCK AND ITS APPLICATION
In Example 1 of the last section, we take […]. According to this form of the derived clock, we get another asynchronous design, as shown in Fig. 4(a). At first glance, the circuit has one more AND gate than the design in Fig. 3(b). Besides, it appears that the derived clock clk_1-3 may have an increased phase delay. However, the timing relation shown in Fig. 1 indicates that the transition delay of clk_1-3 is independent of the delay of the Q_0 output. The delay between clk and clk_1-3 is only 2t_g (where t_g is the average delay of a gate), which is less than the delay of the flip-flop output.
Based on the above discussion, we can rewrite […]. Besides, we take clk from the previous stage of the clock tree. Thus, we obtain a new design, as shown in Fig. 4(b). If we consider delay of the […]
We simulated the new design in Fig. 4(b) with SPICE 3f3 using a 2 µm CMOS technology, which confirmed that the new design exhibits the intended logic operation. We also measured the power dissipation of the two synchronous designs in Fig. 3(a) and Fig. 4(b). The power dissipation diagrams, shown in Fig. 5, indicate that the new design reduces the power dissipation by 22%.
V. CONCLUSION
The behavioral description of a clock is the basis for analyzing its triggering action on flip-flops. Based on it, two types of clock-gating were introduced to form a derived clock. We showed that the procedure for designing a derived clock can be systematized so as to isolate the triggered flip-flop from the master clock in its idle cycles.
The achieved power saving can be significant. However, the additional clock skew may lower the maximum operating frequency. By analyzing the timing relation in clock-gating, we then presented a new technique for generating a derived clock that is synchronous with the master clock. Circuit simulation confirmed the quality of the new derived clock and its capability to reduce power dissipation. The engineering issues mentioned in (3) have thus been resolved for practical application, opening the path for widespread adoption of the clock-gating technique in low-power design of custom ICs.
