# Selective Clock-Gating for Low Power/Low Noise Synchronous Counters <sup>1</sup>

Pilar Parra, Antonio Acosta, and Manuel Valencia

Instituto de Microelectrónica de Sevilla-CNM / Universidad de Sevilla Avda. Reina Mercedes s/n, 41012-Sevilla, SPAIN Phone: +34-95-505-66-66; Fax: +34-95-505-66-86; acojim@imse.cnm.es

**Abstract.** The objective of this paper is to explore the applicability of clock-gating techniques to binary counters in order to reduce both power consumption and switching-noise generation. A measurement methodology for making fair comparisons between different implementations of clock-gated counters is presented. Two ways of applying clock gating are considered: clock gating on independent bits and clock gating on groups of bits. Selecting the right bits to gate, and composing the groups of bits suitably, is essential when applying this technique. We have found that grouping bits is the best option when applying clock gating to reduce power consumption and, especially, noise generation.

## **1** Introduction

In sequential circuits, there are often idle parts where no computation is performed. We can selectively stop the clock of these portions of the circuit by using a signal that blocks the global clock. This technique is called clock gating and has traditionally been used to reduce the power consumption due to system clock transitions [1-3]. In those cycles in which the partial or total state of the system does not change, the clock does not need to make a transition. The same reasoning applies to switching noise: clock transitions that take place in idle cycles are a source of non-productive noise.

In this paper we focus on binary counters. Such circuits are a good example of non-productive noise generation because, in most cycles, only a few bits switch: the more significant bits change at a lower rate than the less significant ones.
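The switching rates mentioned above can be illustrated with a short sketch. It assumes a 16-bit counter counting up from 0 for 500 cycles (the starting state and the function name `toggle_counts` are assumptions made for illustration) and compares the clock edges that actually change a flip-flop with the edges delivered when every flip-flop is clocked in every cycle.

```python
def toggle_counts(n_bits: int, n_cycles: int) -> list[int]:
    """Count how often each bit toggles while counting up from 0."""
    counts = [0] * n_bits
    mask = (1 << n_bits) - 1
    state = 0
    for _ in range(n_cycles):
        nxt = (state + 1) & mask
        changed = state ^ nxt          # bits that differ between states
        for i in range(n_bits):
            if (changed >> i) & 1:
                counts[i] += 1
        state = nxt
    return counts


N_BITS, N_CYCLES = 16, 500             # assumed run: 500 count-up cycles from 0
counts = toggle_counts(N_BITS, N_CYCLES)
productive = sum(counts)               # clock edges that change a flip-flop
delivered = N_BITS * N_CYCLES          # clock edges seen without gating
print(counts)
print(f"productive clock edges: {productive} of {delivered}")
```

Under these assumptions bit *i* toggles once every 2^i cycles, so only a small fraction of the delivered clock edges produce a state change.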

Synchronous counters are key subsystems in modern VLSI circuits because of their wide use and importance. Unlike ripple counters, they do not present spurious states, although their extra hardware and clock-synchronised transitions can make them unsuitable for low power/low noise applications [4].

Usually, when clock-gating techniques are applied, specific conditions must be met that stop the system clock in all or in a large part of the system. In that way, the extra hardware introduced by the clock gating is compensated by the number of elements that save power and noise. Only in those cases where a significant part of the clocked system can be disabled can the clock-gating technique bring advantages in terms of power/noise reduction [5].

<sup>1.</sup> This work has been sponsored by the Spanish MCYT TIC2000-1350 MODEL and TIC2001-2283 VERDI Projects

Counters have no idle cycles (except for a "no operation" function, if one exists); on the contrary, at least one state variable switches in every clock cycle. For this reason the clock cannot be stopped globally, and clock gating has to be applied to individual bits or to groups of bits. This feature makes the study of clock gating in counters a special case.

In this work we establish how to stop the clock of a counter separately in some of its bits or, when a group strategy is selected, what the suitable size of the groups is. We also quantify the benefit obtained when applying this technique. To do so, a comparison methodology based on switch-level simulation has been defined. We also determine the timing implications of these techniques.

The structure of the paper is as follows. Section 2 presents the counter used to illustrate the power/noise reduction technique and the measurements selected to quantify the savings. Section 3 summarises the different options for gating the clock in sequential circuits and analyses the timing implications of the clock-gating strategies. Section 4 presents simulation results and, finally, Section 5 draws our conclusions.

## 2 The Counter and the Measurement Methodology

We have selected as demonstrator a 16-bit binary counter with four operations: count up, count down, parallel load and inhibition. It also has an asynchronous reset. Fig 1a shows the functional and structural descriptions of the counter. It has been implemented with a modular design based on the cell in Fig 1b. The circuit has been designed using the AMS standard 0.35 $\mu$m CMOS library. Although power consumption and noise generation strongly depend on the kind of flip-flops used [6,7], our goal is to explore what improvements can be made in a counter with a given flip-flop.

Concerning the measurement methodology, a simulation-based method has been selected to estimate the power consumption and the noise generated by the counter and by its clock-gated versions. Focusing on the count-up operation, 500 cycles have been simulated at switch level (Mach PA from Mentor Graphics). The peak-to-peak deviation (Vpp) of the power supply (VDD) and the RMS noise (PRMS) are taken as indirect measurements of noise. To measure power consumption, the average value of the supply current (AVG($I_{VDD}$)) was computed. In the simulations we have considered 1 nH parasitic coupling inductances between the circuit and the power line and between the circuit and the ground line [8] (Fig 2).
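As a rough illustration of the three quantities, the sketch below computes them from sampled waveforms. The exact definitions used by the simulator are not given in the text, so this is only one plausible reading: Vpp as the maximum minus the minimum of VDD over the window, the RMS noise as the RMS deviation of VDD from its nominal value, and AVG(I) as the arithmetic mean of the current samples. The function name, the sample values and the 3.3 V nominal supply are hypothetical.

```python
import math


def noise_metrics(vdd_samples, ivdd_samples, vdd_nominal):
    """Return (Vpp, RMS deviation, average current) for sampled waveforms."""
    vpp = max(vdd_samples) - min(vdd_samples)
    # RMS of the deviation from the nominal supply (assumed definition)
    rms = math.sqrt(
        sum((v - vdd_nominal) ** 2 for v in vdd_samples) / len(vdd_samples)
    )
    avg_i = sum(ivdd_samples) / len(ivdd_samples)
    return vpp, rms, avg_i


# Hypothetical samples, for illustration only (volts and amperes)
vdd = [3.30, 3.32, 3.28, 3.30]
ivdd = [-1e-3, -3e-3]
vpp, rms, avg_i = noise_metrics(vdd, ivdd, vdd_nominal=3.3)
print(vpp, rms, avg_i)
```

The paper additionally reports the mean and standard deviation of Vpp separately for the two half-cycles (Vppr and Vppf); that would amount to applying the same peak-to-peak computation to each half-cycle window and averaging over cycles.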

We first applied the measurement methodology to the 16-bit binary counter before applying clock gating, in order to obtain reference values for comparison with the clock-gated solutions. The results are shown in Fig 3, where the variations in power supply and in supply current are plotted against simulation time to show the counter behavior qualitatively.

Fig. 1. (a) Structural and functional descriptions of the implemented counter (b) Basic cell for the modular design



Fig. 2. Environment for simulating realistic conditions

The $I_{VDD}$ versus time plot has a remarkable feature: large current peaks appear periodically. They can be explained by looking at the instants at which they occur, which correspond to the times when more than five or six cells of the counter change state simultaneously. For example, the biggest peak in the plot is produced when the counter goes from 255 to 256 (011111111 -> 100000000 in radix 2), so 9 flip-flops switch at the same time. The next peaks in size are produced when the counter goes from 127 to 128 (8 flip-flops switching), from 63 to 64 (7 flip-flops switching), from 31 to 32 (6 flip-flops switching), and so on.
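The flip-flop counts quoted above follow from the number of bits that differ between consecutive states. A minimal sketch (the helper name `flips` is illustrative):

```python
def flips(n: int, n_bits: int = 16) -> int:
    """Number of flip-flops that change state when counting from n to n + 1."""
    mask = (1 << n_bits) - 1
    changed = (n ^ ((n + 1) & mask)) & mask   # bits that differ
    return bin(changed).count("1")


for n in (0, 31, 63, 127, 255):
    print(f"{n} -> {n + 1}: {flips(n)} flip-flops switch")
```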

The values of Vpp (mean and standard deviation) and PRMS are also shown in Fig 3. We have also computed the mean of Vpp when Ck = 1 (Vppr) and when Ck = 0 (Vppf). The noise is higher for Ck = 1 (just after the active edge), an effect due to the particular configuration of the flip-flops. The average supply current (AVG(I)) is shown as well. These quantities give us a reference for estimating how the clock-gating technique improves the counter performance in terms of power consumption and noise generation.



Fig. 3. Power supply and supply current versus time of simulation

In Fig 4 we have plotted VDD and $I_{VDD}$ in detail for some cycles of interest: when the counter goes from 0 to 1 (only one flip-flop changes), from 127 to 128 (8 flip-flops switch at the same time) and from 255 to 256 (9 flip-flops switch simultaneously). Each plot is divided into two parts by a vertical line. The first part corresponds to the half-cycle in which the active edge occurs and the second part to the other half-cycle.

We have marked two peaks in each plot. The first one (marked with a continuous line) is the same in the three cases (0->1, 127->128, 255->256) and is due to the noise produced by the clock signal arriving at the flip-flops. The second one grows with the number of flip-flops that switch simultaneously: $I_{VDD}$ is, for example, under 0.005 in the case 0->1, over 0.010 in the case 127->128 and near 0.012 in the case 255->256. These are the peaks already pointed out in Fig 3.

From this analysis we conclude that there are two sources of noise. Clock-gating techniques are clearly aimed at the first type, the clock noise, which is unnecessary for flip-flops that do not switch; the second type of noise needs different considerations.



Fig. 4.  $I_{VDD}$  and VDD versus time of simulation

## **3** Selective Clock Gating

#### 3.1 Clock-Gating Circuits

Here we present the approaches we have used to gate the clock. Two cases have been considered: the latch-based solution and the latch-free solution [5]. The latch-free option uses only one gate (AND or OR), but the changes in the signal that stops the clock (INH) must be restricted. Fig 5a shows the latch-free gated-clock generator with an AND gate. The timing diagram below the circuit shows how a positive edge in INH when Ck = 1 causes an erroneous (unsynchronised) edge in the signal gated\_ck. The same situation arises for the OR gate at Ck = 0 (Fig 5b). This effect can be avoided by adding a latch that filters the glitches in INH (Fig 5c and Fig 5d), which yields the latch-based solution.
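The glitch mechanism can be reproduced with a zero-delay, discrete-time sketch. It assumes that INH = 1 stops the clock, so that the AND-type gate computes Ck AND NOT INH, and that the latch is transparent while Ck = 0; neither polarity is stated in the text, and the names `clock`, `simulate` and `unsync_edges` are illustrative.

```python
def clock(t: int, period: int = 10) -> int:
    """Square clock: high during the first half of each period."""
    return 1 if (t % period) < period // 2 else 0


def simulate(inh_wave, use_latch: bool):
    """Return the gated clock for an AND-type gate (assumed: INH = 1 stops it)."""
    gated = []
    latched = inh_wave[0]
    for t, inh in enumerate(inh_wave):
        ck = clock(t)
        if ck == 0:
            latched = inh              # latch transparent while Ck = 0 (assumed)
        stop = latched if use_latch else inh
        gated.append(ck & (1 - stop))
    return gated


def unsync_edges(gated) -> int:
    """Count gated-clock edges that do not coincide with an edge of Ck."""
    return sum(
        1
        for t in range(1, len(gated))
        if gated[t] != gated[t - 1] and clock(t) == clock(t - 1)
    )


# INH rises at t = 12 and falls at t = 42, both while Ck = 1
inh = [1 if 12 <= t < 42 else 0 for t in range(60)]
print("latch-free :", unsync_edges(simulate(inh, use_latch=False)))
print("latch-based:", unsync_edges(simulate(inh, use_latch=True)))
```

In this model every change of INH during the high phase of Ck appears directly on the latch-free output, whereas the latch holds its value during that phase and only passes the change on once Ck is low.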

Let us analyse which of the circuits in Fig 5 suit our purposes. In the counter, the signal INH comes from the comparison between $D_{int}$ (excitation signal of flip-flop *i*) and *q* (state of cell *i*): the state changes only when the two differ. Taking into account that the active edge in our circuit is the positive one, the transitions in $D_{int}$ and *q*, and hence in INH, will probably occur when Ck = 1.



**Fig. 5.** (a), (b) Latch-free gated-clock generation; (c), (d) latch-based gated-clock generation

For this reason we must discard the circuit in Fig 5a, which would lead to erroneous edges in gated\_ck. However, we could consider the circuit in Fig 5b if we guarantee that the clock period is long enough for INH to settle, which limits the operating frequency of the counter:

$$T = \frac{1}{f} > T_p(\mathrm{INH}) = T_p(q) + T_p(\mathrm{AND}) + T_p(\mathrm{XNOR}).$$

Finally, we can choose either of the circuits in Fig 5c and Fig 5d, the latch-based solutions, because they filter the glitches and impose no restriction on frequency.

As stated previously, there are two ways of applying clock-gating techniques to counters: on some individual bits or on groups of bits.

In the first case, cell *i* is replaced by a clock-gating cell. If latch-based clock gating is selected, the overhead for each replaced cell is a latch and two gates, so the extra hardware produces noise and power consumption comparable to the savings obtained. If latch-free clock gating is selected, the overhead is smaller but the operating frequency is reduced.

The second option is to consider groups of bits that share the same gated clock. In this way, only one inhibition signal (INH) has to be generated for each group, and the noise and power introduced by the extra hardware are much lower than the noise and power that can be saved. Some efforts have been made to speed up long counters using partitioning [9]. Our interest here is different: to study how noise and power can be reduced when the counter is partitioned into several parts.

#### 3.2 Grouping Bits for Clock Gating

To form groups of bits that share the same clock we must select bits that are consecutive in significance. The least significant bit of the group has the highest switching frequency and sets the frequency of the group clock. Fig 6 shows three grouping options for the 16-bit counter. In the first, there are two blocks of 8 bits: one works with the original clock of the counter (ck), and for the other we have generated a gated clock for the 8th bit, so that this cell and the following ones are synchronised. In the second grouping we have generated gated clocks for the 4th, 8th and 12th bits, giving four synchronised groups. The last option has the highest cost, because gated clocks are generated for 7 groups.
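The three grouping options can be compared with an idealised sketch. It assumes one latch and two gates per gated clock (the latch-based overhead quoted in section 3.1), a count-up run of 500 cycles starting from 0, and a group clock that pulses exactly when the least significant bit of the group has to toggle. The function name `grouping_summary` is hypothetical, and the pulse count is only a proxy for clock activity, not a power or noise figure.

```python
def grouping_summary(n_bits: int, group_size: int, n_cycles: int):
    """Return (latches, gates, clock pulses delivered to flip-flops)."""
    starts = list(range(0, n_bits, group_size))
    gated_groups = len(starts) - 1      # the lowest group keeps the original ck
    latches = gated_groups              # one latch per gated clock (assumed)
    gates = 2 * gated_groups            # two gates per gated clock (assumed)
    pulses = 0
    for s in starts:
        # Counting up from 0, bit s toggles once every 2**s cycles, so the
        # group starting at bit s receives n_cycles // 2**s clock pulses.
        pulses += group_size * (n_cycles >> s)
    return latches, gates, pulses


for size in (16, 8, 4, 2):
    print(size, grouping_summary(16, size, 500))
```

Under these assumptions, most of the saving in delivered clock pulses already comes from the coarse groupings, while the hardware overhead keeps growing as the groups get smaller.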

It must be pointed out that, with this grouping strategy, the stages of the counter are not fully synchronised, because each group clock is obtained from the clock signals of the previous groups. This has a positive consequence for switching noise: simultaneous switching of flip-flops was the cause of the big peaks in $I_{VDD}$ (section 2), which are therefore indirectly reduced.



Fig. 6. Three approaches to the bit-grouping clock-gating technique

## 4 Simulation Results

This section presents the results obtained for the implemented clock-gated counters. We first focus on the grouping strategy and then compare it with the independent-bit strategy.

Table 1 contains the results for the grouping strategy applied with blocks of 8, 4 and 2 bits. The first row repeats the reference values of the ungated 16-bit binary counter to make comparisons easy. Each of the other rows gives the absolute values for one of the three gated-clock cases, together with the percentage that each value represents of the reference. The table also includes an overhead column: for each clock-gated counter it shows two values separated by a slash, the number of latches followed by the number of gates. The 8-bit grouping achieves a 50% reduction in noise with minimum extra cost (only two gates and a latch) and minimum desynchronisation, although power consumption is reduced by only 20%. The 4-bit grouping lowers the generated noise to about 35% and the power consumption to about 70% of the reference values. The 2-bit grouping improves the behavior by only about a further 10%, with much more extra hardware (7 latches and 14 gates). Overall, this technique is very well suited to reducing noise.

Table 2 shows the results of applying clock gating to independent bits. In this case we focus on the bits that switch at a lower rate (the more significant bits), because they allow the greatest savings. As stated above, each selected cell must be replaced by the clock-gated cell. We have done this using latch-based clock gating in three cases: for each of the 8 most significant bits, for each of the 4 most significant bits and for each of the 2 most significant bits. The reductions obtained with this method are clearly much smaller than with grouping of bits. This is due, as already mentioned, to the extra hardware introduced, which is comparable to the circuit being stopped.

| bits<br>per<br>group | PRMS<br>(mV) | Vpp<br>(mV) | Vppr<br>(mV) | Vppf<br>(mV) | AVG(I)<br>(µA) | overhead |
|----------------------|--------------|-------------|--------------|--------------|----------------|----------|
| 16 | 12.35 | 70.84 ± 16.37 | 87.12 ± 1.65 | 54.53 ± 1.45 | -516.90 | 0 |
| 8 | 6.18<br>50% | 33.22 ± 8.55<br>47% | 41.46 ± 2.65<br>48% | 24.95 ± 1.70<br>46% | -418.20<br>80% | 1/2 |
| 4 | 4.15<br>34% | 21.09 ± 6.70<br>30% | 27.10 ± 3.57<br>31% | 15.06 ± 2.10<br>28% | -355.65<br>69% | 3/6 |
| 2 | 3.13<br>25% | 13.63 ± 5.52<br>19% | 17.02 ± 5.59<br>20% | 10.23 ± 2.57<br>19% | -326.41<br>63% | 7/14 |

Table 1. Measurement results for clock gating on groups of bits (overhead is number of latches/number of gates).
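The percentages in Table 1 can be checked by dividing each value by the 16-bit reference. The sketch below does this for PRMS and for the magnitude of AVG(I), using the figures from the table; the dictionary layout and the helper name `percent_of_reference` are illustrative.

```python
REFERENCE = {"PRMS_mV": 12.35, "AVG_I_uA": 516.90}   # 16-bit ungated counter
GROUPED = {                                           # bits per group -> values
    8: {"PRMS_mV": 6.18, "AVG_I_uA": 418.20},
    4: {"PRMS_mV": 4.15, "AVG_I_uA": 355.65},
    2: {"PRMS_mV": 3.13, "AVG_I_uA": 326.41},
}


def percent_of_reference(value: float, reference: float) -> float:
    """Value expressed as a percentage of the reference value."""
    return 100.0 * value / reference


for bits, row in GROUPED.items():
    prms = percent_of_reference(row["PRMS_mV"], REFERENCE["PRMS_mV"])
    # Current magnitudes are used; the table lists the currents as negative.
    avg_i = percent_of_reference(row["AVG_I_uA"], REFERENCE["AVG_I_uA"])
    print(f"{bits}-bit groups: PRMS {prms:.1f}%, AVG(I) {avg_i:.1f}%")
```

The computed ratios agree with the printed percentages to within about one point; the 8-bit current ratio, for instance, evaluates to roughly 80.9%, which the table lists as 80%.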

Table 2. Measurement results for clock gating on independent bits

| bits  | PRMS<br>(mV) | Vpp<br>(mV) | Vppr<br>(mV) | Vppf<br>(mV) | AVG(I)<br>(µA) | overhead |
|-------|--------------|-------------|--------------|--------------|----------------|----------|
| none  | 12.35 | 70.84 ± 16.37 | 87.12 ± 1.65 | 54.53 ± 1.45 | -516.90 | 0 |
| all   | 10.86 | 51.81 ± 19.87<br>73% | 71.51 ± 2.75<br>82% | 32.08 ± 2.08<br>60% | -683.02<br>132% | 16/32 |
| 8 MSB | 10.75<br>87% | 52.86 ± 14.39<br>74% | 67.04 ± 2.24<br>77% | 38.64 ± 2.43<br>71% | -513.63<br>99.4% | 8/8 |
| 4 MSB | 11.39<br>92% | 59.83 ± 17.53<br>84% | 77.24 ± 1.74<br>89% | 42.38 ± 1.87<br>78% | -514.99<br>99.6% | 4/4 |
| 2 MSB | 11.81<br>96% | 64.96 ± 18.59<br>92% | 83.44 ± 1.50<br>96% | 46.43 ± 1.78<br>85% | -515.65<br>99.8% | 2/2 |

## **5** Conclusions

Nowadays it is widely accepted that power consumption and switching noise are crucial factors that must be reduced when designing high-performance circuits. This communication analyses the applicability of the clock-gating technique to binary counters. Although this technique has usually been used to reduce power consumption, in this work its influence on switching-noise generation has also been analysed. Two clock-gating strategies have been considered and compared: clock gating on independent bits and clock gating on groups of bits. We have found that clock-gating techniques, when suitably applied to counters, are a very good option for reducing power and, especially, switching noise. In particular, selecting 2-bit groups for clock gating reduces the generated noise to 25% and the power consumption to 63% of those of the original counter. The timing implications of each solution have also been considered, and we have concluded that there is a small desynchronisation that benefits switching-noise reduction.

#### References

- 1. Benini, L., De Micheli, G., "Automatic Synthesis of Low Power Gated-Clock Finite-State Machines", *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 15, No. 6, June 1996, pp. 630–643.
- 2. Benini, L., De Micheli, G., Macii, E., Poncino, M. and Scarsi, R., "Symbolic Synthesis of Clock-Gating Logic for Power Optimization of Control-Oriented Synchronous Networks", *European Design & Test Conference (EDTC-97)*, pp. 514–520.
- 3. Piguet, C., "Low-Power Design of Finite State Machines", in PATMOS, pp. 25–34, Bologna, Sept. 1996.
- 4. Acosta, A.J., Jiménez, R., Juan, J., Bellido, M.J. and Valencia, M., "Influence of clocking strategies on the design of low switching-noise digital and mixed-signal VLSI circuits", in 10th PATMOS, pp. 316–326, Göttingen, Sept. 2000.
- 5. Emnett, F., Biegel, M., "Power Reduction through RTL Clock Gating", SNUG San Jose 2000.
- 6. Stojanovic, V., Oklobdzija, V.G., "Comparative Analysis of Master-Slave Latches and Flip-Flops for High-Performance and Low-Power Systems", *IEEE Journal of Solid State Circuits*, Vol. 34, No. 4, April 1999.
- 7. Jiménez, R., Parra, P., Sanmartín, P. and Acosta, A.J., "Analysis of high-performance flip-flops for submicron mixed-signal applications", *International Journal of Analog Integrated Circuits and Signal Processing*, Kluwer Academic Publishers (accepted).
- 8. Aragonès, X., González, J.L. and Rubio, A., "Analysis and Solutions for Switching Noise Coupling in Mixed-Signal ICs", Kluwer Academic Publishers, 1999.
- 9. Stan, M., Tenca, A. and Ercegovac, M., "Long and Fast Up/Down Counters", *IEEE Transactions on Computers*, Vol. 47, No. 7, July 1998.