Analysis and Reduction of Glitches in Synchronous Networks by Jeroen Leijten et al.
Analysis and Reduction of Glitches in Synchronous Networks
Jeroen Leijten1,2, Jef van Meerbergen1 and Jochen Jess2
1 Philips Research Laboratories, WAY 4.47, Prof. Holstlaan 4, 5656 AA Eindhoven, The Netherlands
2 Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands
Abstract
The inﬂuence of transition activity on dynamic power
dissipation is analysed by examining three components:
dissipation in combinational logic, ﬂipﬂops and clock line.
Transition activity is analysed by making a distinction be-
tween useful transitions and glitches (useless transitions).
A transition counting and parity evaluation method is used
for this.
Most glitches can be eliminated by introducing ﬂipﬂops
using retiming and pipelining and/or by choosing different
architectures. In this way an optimal level for pipelining
can be found.
1 Introduction
As battery powered products using integrated circuits
become more important, severe constraints are imposed on
the power that may be consumed by these circuits. Apart
from this, the integration of an increasing number of tran-
sistors on a single chip can lead to areas of excessive pow-
er dissipation which can cause reliability problems.
Furthermore excessive power dissipation leads to IC-pack-
aging problems. These considerations reveal the need for a
reduction in power dissipation of integrated circuits.
Several actions at different levels of the design can be tak-
en to reduce power dissipation in CMOS integrated cir-
cuits [1]. Our research focuses on gate level architectures
for synchronous static CMOS circuits. Glitches can occur
in these circuits, leading to unnecessary power dissipation.
Other research [2][3][4] shows that the number of transi-
tions can be counted and used to obtain a circuit perform-
ance measure regarding power dissipation. It further shows
that the reduction of glitches helps to obtain a signiﬁcant
decrease in power dissipation.
In our work a new transition counting method is used in
which a distinction is made between useful transitions and
glitches. First we analysed the large impact of glitches on
transition activity using probability calculations on a ripple
carry adder. Next we investigated the inﬂuence of delay
imbalance in circuit architectures on the number of glitch-
es. This was done using unit delay gate level simulations
on different multiplier architectures.
Layout level simulations were carried out on a typical
video processing unit. The effect of introducing ﬂipﬂops
(by using retiming) to reduce the amount of glitches was
investigated using these simulations. For this purpose dy-
namic power dissipation was examined by dividing it in
three components: dissipation in the combinational logic,
in the ﬂipﬂops and in the clock line.
After some preliminary remarks regarding transition ac-
tivity and power dissipation our methods and results are
presented. Finally our conclusions will be given.
2 Transition activity and power dissipation
We will refer to a signal change in a circuit node from
logic level 0 to logic level 1 or vice versa as a signal tran-
sition. A signal change from 0 to 1 will be termed a power-
consuming transition, because in this case the loading ca-
pacitance will be charged directly from the supply and
therefore power from the supply is consumed by the cir-
cuit. This can be seen from Figure 1 where the effect of a
transition in an arbitrary circuit node is depicted.
Figure 1 Dynamic power dissipation caused by sig-
nal transitions in node X.
A transition can be either useful or useless. A transition
is termed useful when it is needed to ensure the correct
functional behaviour of the architecture being used. Other-
wise it is termed useless. Whether a transition is useful or
useless depends on other possible transitions in the same
clock cycle. This will be further explained in section 3.3.
[Figure 1 shows two panels for node X with supply V_dd and load C_load: charging of the loading capacitance and discharge of the loading capacitance.]
Two consecutive useless transitions constitute a glitch.
Figure 2 shows an arbitrary binary signal exhibiting these
different forms of transitions.
In equation 1 the dynamic power dissipation in CMOS is given
as a function of the probability of a power-consuming
transition p_t, load capacitance C_load, supply voltage V_dd
and clock frequency f.
$$P_{dyn} = p_t \, C_{load} \, V_{dd}^2 \, f \tag{1}$$
In static CMOS the dynamic power dissipation dominates the
total power dissipation. From equation 1 it is clear that the
dynamic power dissipation is linearly dependent on the number
of power-consuming signal transitions and consequently on the
number of glitches.
Figure 2 Possible forms of signal transitions.
[Figure 2: a binary signal (logic level 0 or 1 against time in clock cycles) with its transitions labelled pct = power-consuming transition, uft = useful transition, ult = useless transition and glt = glitch.]
3 Probability analysis of transition activity
To gain a first insight into the problem of glitches in logic
circuits, transition activity in a ripple carry adder was
analysed using probability calculations.
An N-bit ripple carry adder consists of N cascaded full
adders (see Figure 3). Here we assume a unit delay model
for every full adder stage. We further assume that new input
bits A_i and B_i of the two input operands A_{N-1}...A_0 and
B_{N-1}...B_0 always arrive at the beginning of a clock cycle.
Figure 3 Worst case number of transitions in an N=4
bit ripple carry adder.
[Figure 3: full adders FA_0 to FA_3 with carries C_0 to C_4 and sums S_0 to S_3; the signal values are listed per delta time step Δt within one clock cycle, from the previous inputs to the new inputs, with the useful transitions marked.]
When the inputs change, full adder FA_0 will introduce a
delay in the calculation of sum bit S_0 and carry out bit C_1.
Thus full adder FA_1 will first compute its sum S_1 and carry
out C_2 using the new input bits A_1 and B_1 as well as the old
carry input C_1 of the previous addition. After this C_1 will
reach its new value and full adder FA_1 will perform another
calculation with this new value. This means that two
transitions can occur in S_1 and C_2 within a single clock cycle.
Similarly it can be found that in total three transitions
can occur in S_2 and C_3. In general we find that i+1
transitions can occur in S_i and C_{i+1} of full adder FA_i within
a single clock period. This means that in a worst case
situation N transitions occur in S_{N-1} and C_N.
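The argument above can be checked with a small unit delay simulation. The sketch below is an illustration under stated assumptions, not the authors' tool: every full adder output updates one delta step after its inputs, the carry in of the first stage is fixed at 0, bit lists are given LSB first, and the names settle and count_transitions are hypothetical.

```python
def settle(a_bits, b_bits):
    """Settled sums and carries for LSB-first operand bits.

    carries[0] is the carry in of the first stage (fixed at 0).
    """
    carries = [0]
    sums = []
    for a, b in zip(a_bits, b_bits):
        c = carries[-1]
        sums.append(a ^ b ^ c)
        carries.append((a & b) | (c & (a ^ b)))
    return sums, carries


def count_transitions(prev_inputs, new_inputs):
    """Unit-delay simulation of one clock cycle.

    Returns (s_count, c_count): the transitions seen on each sum bit and
    on each carry; c_count[0] belongs to the constant carry in.
    """
    a_new, b_new = new_inputs
    n = len(a_new)
    sums, carries = settle(*prev_inputs)  # state left by the previous addition
    s_count = [0] * n
    c_count = [0] * (n + 1)
    for _ in range(n):  # N delta steps let the carry ripple through fully
        # Each full adder sees the new operand bits and the carry of the
        # previous delta step.
        next_sums = [a_new[i] ^ b_new[i] ^ carries[i] for i in range(n)]
        next_carries = [0] + [
            (a_new[i] & b_new[i]) | (carries[i] & (a_new[i] ^ b_new[i]))
            for i in range(n)
        ]
        for i in range(n):
            s_count[i] += int(next_sums[i] != sums[i])
            c_count[i + 1] += int(next_carries[i + 1] != carries[i + 1])
        sums, carries = next_sums, next_carries
    return s_count, c_count


# 2-bit example: previous addition 1 + 1, new addition 2 + 0 (bits LSB first).
s_count, c_count = count_transitions(([1, 0], [1, 0]), ([0, 1], [0, 0]))
print(s_count, c_count)  # [0, 2] [0, 1, 2]
```

In this example the second sum bit and the last carry each make two transitions within one clock cycle and end at their old values, which is the behaviour described for the second full adder stage.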
3.1 Worst case number of transitions
The worst case situation is only possible if C_N has the
values 0,1,0,1,... or 1,0,1,0,... for the delta time moments
Δt_0, Δt_1, Δt_2, ..., Δt_{N-1} within a single clock period.
This can only occur if two conditions are met:
1. After the completion of the previous addition the carry
bits of the full adders have the values
(C_N, C_{N-1}, C_{N-2}, C_{N-3}, ...) = (0,1,0,1,...) or
(C_N, C_{N-1}, C_{N-2}, C_{N-3}, ...) = (1,0,1,0,...).
2. The carry must be able to ripple through all N full
adder stages.
It can be shown [5] that the probability that both
conditions are met is equal to 3 × (1/8)^N for random inputs.
Already for relatively small word sizes N this probability will
be negligible. More interest lies in the average number of
transitions appearing in a circuit.
3.2 Average number of transitions for random inputs
We have calculated the average number of transitions
per clock period appearing in a signal X_i. We call this
figure the average transition ratio TR(X_i).
Arithmetic circuits, like multipliers and adders, are often
used in a multiplexed environment. Due to this multiplexing
the original signal statistics and correlations are lost, and
randomly distributed input signals arrive at the inputs of the
arithmetic circuits. Furthermore source coding is often
applied to signals that are fed as inputs to arithmetic
circuits. Source coding also results in almost complete loss
of signal correlation. Under those conditions assuming
random inputs gives a good approximation of a practical
situation in which these arithmetic circuits are used.
In [5] we show that for random inputs the average
transition ratio TR(C_{i+1}) for carry out C_{i+1} of full adder
FA_i in a ripple carry adder is equal to:
$$TR(C_{i+1}) = \frac{3}{4} - \frac{3}{4} \times \left(\frac{1}{2}\right)^{i+1} \tag{2}$$
The average transition ratio TR(S_i) for sum S_i of full
adder FA_i is equal to:
$$TR(S_i) = \frac{5}{4} - \frac{3}{4} \times \left(\frac{1}{2}\right)^{i} \tag{3}$$
3.3 Average number of useful and useless transitions
A useful transition in a signal occurs when its final value
which resulted from the previous addition differs from
the value resulting from the new addition. In other words, a
useful transition occurs if the signal makes transitions
0→1→0→...→1 or 1→0→1→...→0 within a single clock
cycle. From this observation the following properties of
useful and useless transitions can be derived:
1. If the number of transitions a signal makes within one
clock period is an odd number, this signal makes one
useful transition. All extra transitions it makes are
useless transitions.
2. If the number of transitions a signal makes within one
clock period is an even number, all these transitions
are useless transitions.
An example for this is given in Figure 4.
Figure 4 Useful and useless transitions in a single
clock cycle.
[Figure 4: three example signals against time in clock cycles: signal 1 makes 2 useful transitions, signal 2 makes 2 useless transitions, and signal 3 makes 1 useful transition and 2 useless transitions.]
We used these two properties and probability calculations
to determine the average useful transition ratios per
clock cycle (UFTR(X_i)) and average useless transition
ratios per clock cycle (ULTR(X_i)) [5]. The following ratios
were found:
$$UFTR(S_i) = \frac{1}{2} \tag{4}$$
$$ULTR(S_i) = \frac{3}{4} - \frac{3}{4} \times \left(\frac{1}{2}\right)^{i} \tag{5}$$
$$UFTR(C_{i+1}) = \frac{1}{2} - \frac{1}{2} \times \left(\frac{1}{4}\right)^{i+1} \tag{6}$$
$$ULTR(C_{i+1}) = \frac{1}{2} \times \left(\left(\frac{1}{2}\right)^{i+1} - \frac{1}{2}\right) \times \left(\left(\frac{1}{2}\right)^{i+1} - 1\right) \tag{7}$$
If we multiply these ratios with the total number of
clock cycles (or equivalently the total number of input
stimuli) the average number of transitions, useful
transitions, and useless transitions can be calculated. This is
done in Figure 5 for 4000 random inputs applied to a ripple
carry adder.
It is apparent that the number of useless transitions
constitutes a large part of the total number of transitions
appearing in the circuit. For the 16-bit ripple carry adder
used with 4000 random inputs, a total of 119002 transitions
is found using equations 2 and 3 and summing the results
for all sixteen sum and carry bits. Using equations 4, 5, 6
and 7 it can be found that 63334 of these transitions are
useful. The remaining 55668 transitions are useless. In
other words, the ratio of useless transitions to useful
transitions is L/F = 55668/63334 = 0.88.
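The totals quoted above can be approximately reproduced by evaluating the closed-form ratios of equations 2 to 7 for sixteen bits. The sketch below is a hedged cross-check, not the original calculation: the formulas are written out as read from the source, the function names are hypothetical, and exact fractions are used so that the identity TR = UFTR + ULTR holds without rounding.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
QUARTER = Fraction(1, 4)


def tr_sum(i):       # equation 3, TR(S_i)
    return Fraction(5, 4) - Fraction(3, 4) * HALF ** i


def tr_carry(i):     # equation 2, TR(C_{i+1})
    return Fraction(3, 4) - Fraction(3, 4) * HALF ** (i + 1)


def uftr_sum(i):     # equation 4, UFTR(S_i)
    return HALF


def ultr_sum(i):     # equation 5, ULTR(S_i)
    return Fraction(3, 4) - Fraction(3, 4) * HALF ** i


def uftr_carry(i):   # equation 6, UFTR(C_{i+1})
    return HALF - HALF * QUARTER ** (i + 1)


def ultr_carry(i):   # equation 7, ULTR(C_{i+1})
    x = HALF ** (i + 1)
    return HALF * (x - HALF) * (x - 1)


def expected_counts(n_bits, n_inputs):
    """Expected (total, useful, useless) transitions over all sum and carry bits."""
    bits = range(n_bits)
    total = n_inputs * sum(tr_sum(i) + tr_carry(i) for i in bits)
    useful = n_inputs * sum(uftr_sum(i) + uftr_carry(i) for i in bits)
    useless = n_inputs * sum(ultr_sum(i) + ultr_carry(i) for i in bits)
    return float(total), float(useful), float(useless)


total, useful, useless = expected_counts(16, 4000)
print(round(total), round(useful), round(useless))  # 119000 63333 55667
print(round(useless / useful, 2))                   # 0.88
```

The results land within two counts of the quoted 119002, 63334 and 55668. The small difference is presumably due to rounding in the original calculation, which is an assumption; the L/F ratio of 0.88 is unaffected.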
Figure 5 Useful transitions and useless transitions
occurring for 4000 random inputs.
4 Inﬂuence of delay imbalance
To investigate the effect of delay imbalance on transi-
tion activity, gate level simulations were carried out on dif-
ferent architectures. These architectures implement the
same functional behaviour, but have more or less unbal-
anced delay paths in their structures. During the gate level
simulation all internal signal nodes are monitored. The
number of signal transitions for each monitored signal and
for each clock cycle is counted. Using parity evaluation ac-
cording to the two properties mentioned in section 3.3 a
distinction between useful and useless transitions can then
be made.
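The parity evaluation step can be sketched in a few lines. This is a minimal illustration of the two properties of section 3.3, not the authors' simulator; the function names classify and tally are hypothetical, and the input is assumed to be the number of transitions one monitored signal makes in one clock cycle.

```python
def classify(n_transitions):
    """Split the transitions of one signal in one clock cycle into (useful, useless)."""
    if n_transitions < 0:
        raise ValueError("transition count cannot be negative")
    useful = n_transitions % 2  # odd count: exactly one transition is useful
    return useful, n_transitions - useful


def tally(counts_per_cycle):
    """Accumulate (useful, useless) over an iterable of per-cycle transition counts."""
    useful = useless = 0
    for n in counts_per_cycle:
        u, l = classify(n)
        useful += u
        useless += l
    return useful, useless


print(classify(3))          # (1, 2)
print(tally([1, 1, 2, 3]))  # (3, 4)
```

Running tally over every monitored node and every clock cycle gives the useful count F and the useless count L from which the L/F ratio is formed.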
4.1 Multiplier architectures
Multipliers have functional properties that make it easy
to introduce paths with more or less delay imbalance in their
architecture. For example, a choice between an array
architecture and a Wallace tree architecture can be made. These
types of multiplier architectures were used as test cases to
investigate the influence of delay imbalance on transition
activity.
[Figure 5 panels: sum and carry, each plotting the number of useful and useless transitions against bit number.]
Figure 6 shows the basic architecture of an 8x8 array
multiplier for positive numbers. It is clear that many
unbalanced delay paths exist in this type of architecture. Figure
7 shows the architecture of an 8x8 Wallace tree multiplier.
This architecture has a much more balanced structure.
Figure 6 8x8 array multiplier architecture.
[Figure 6: array of multiplier cells indexed (i,j), each drawn with its neighbours (i-1,j), (i+1,j), (i,j-1) and (i,j+1); the inputs are the bit pairs X0 Y0 to X7 Y7 and the outputs are the product bits P0 to P15.]
Figure 7 8x8 Wallace tree multiplier architecture.
[Figure 7: summands M0 to M7 pass through rows of carry save adders (two 10-bit CSAs; a 13-bit and an 11-bit CSA; a 15-bit CSA; a 16-bit CSA), followed by a 17-bit ripple carry adder (RCA) that performs the final addition to form the product.]
Both multiplier architectures were simulated using unit
delay modelling for the multiplier cells. This was done
both for the 8x8 multipliers and their 16x16 equivalents.
The results of our simulations using 500 random inputs are
given in Table 1.
Table 1 Transition activity for 500 random inputs.
             array multiplier      Wallace tree multiplier
             8x8        16x16      8x8        16x16
total        58858      438575     50824      200380
useful F     23418      102845     39608      173330
useless L    35440      335730     11216      27050
L/F          1.51       3.26       0.28       0.16
From this table it can be seen that the Wallace tree
multiplier has far fewer useless transitions and a much better
useless/useful ratio than the array multiplier. The 16x16
Wallace tree multiplier has more useful transitions than the
16x16 array multiplier due to its larger number of gates.
However, the number of useless transitions in the 16x16
Wallace tree (27050) is much smaller than in the 16x16 array
multiplier (335730). It is clear that decreasing the number of
unbalanced delay paths in the circuit architecture
significantly reduces the number of useless transitions.
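The totals and L/F ratios of Table 1 follow directly from the useful and useless counts. The short sketch below recomputes them as a consistency check; the dictionary layout and the name summarize are hypothetical, and the numbers are copied from Table 1.

```python
# (useful F, useless L) per architecture, copied from Table 1.
TABLE_1 = {
    "array 8x8": (23418, 35440),
    "array 16x16": (102845, 335730),
    "Wallace tree 8x8": (39608, 11216),
    "Wallace tree 16x16": (173330, 27050),
}


def summarize(useful, useless):
    """Return the total transition count and the useless/useful ratio L/F."""
    return {"total": useful + useless, "L/F": useless / useful}


for name, (useful, useless) in TABLE_1.items():
    row = summarize(useful, useless)
    print(f"{name}: total={row['total']}, L/F={row['L/F']:.2f}")
```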
In practice the delay of the sum calculation in a full
adder is about twice as large as the delay of the carry
calculation. A more realistic unit delay modelling of the 8x8
multipliers was therefore applied by defining the delay of
the sum generation d_sum in every full adder (in a multiplier
cell) to be twice as large as the delay of the carry
generation d_carry. The same transition activity simulations were
then carried out on these more accurate models and
compared to the previously used unit delay models (which used
equal sum and carry delays). The results of these
simulations are shown in Table 2. As expected because of the
increased delay imbalance, the number of useless transitions
is higher. This further deteriorates the useless/useful ratios.
Table 2 Transition activity for 500 random inputs.
             8x8 array                           8x8 Wallace tree
             d_sum = d_carry  d_sum = 2*d_carry  d_sum = d_carry  d_sum = 2*d_carry
useful F     23552            23552              38786            38786
useless L    34346            47340              11274            24762
L/F          1.46             2.01               0.29             0.64
The difference in the number of useless transitions
appearing in the array multiplier compared to the number
appearing in the Wallace tree multiplier is still significant.
4.2 Direction detector architecture
Transition activity simulations were also carried out on
a processing unit for Phideo, known as a direction detector.
A block diagram of a direction detector is shown in
Figure 8. The direction detector is used in the implementation
of a so-called progressive scan conversion algorithm
[6].
Random inputs are again a good choice for these
transition activity simulations. The original video input signal
statistics and correlations are almost completely lost very
early in the circuit, immediately after the absolute
differences are taken. In the rest of the circuit the signals will be
more and more randomly distributed.
The direction detector was simulated using unit delay
modelling and 4320 random inputs. The results of the
transition activity simulation on the direction detector were as
follows:
number of useful transitions: 272842
number of useless transitions: 1033970
ratio useless/useful: 3.79
From this useless/useful ratio it is clear that on average
about 3.8 useless transitions occur for each useful
transition. Therefore transition activity in the combinational logic
of the direction detector can be reduced by a factor of 1 +
3.8 = 4.8 if all delay paths are balanced.
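The factor follows from comparing the current transition count F + L with the count F that would remain in the idealised case where balancing removes every useless transition and leaves the useful ones unchanged (an assumption of this worked step):

```latex
\[
\frac{F + L}{F} \;=\; 1 + \frac{L}{F}
\;=\; 1 + \frac{1033970}{272842}
\;\approx\; 1 + 3.79 \;=\; 4.79
\]
```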
Figure 8 Direction detector.
5 Inﬂuence of ﬂipﬂops
Power dissipation is inﬂuenced by transition activity.
However, equation 1 shows that loading capacitances of
circuit nodes also play an important role. To be able to ob-
tain accurate power dissipation results, circuit level simu-
lations were carried out on extracted netlists of real circuit
layouts. The goal of these simulations was to investigate
the inﬂuence of extra ﬂipﬂops in the circuit on power dissi-
pation. Flipﬂops can be used to improve delay balance in a
circuit. In this way they help to reduce the amount of
glitches and thus power dissipation in combinational logic.
Figure 9 shows how the insertion of ﬂipﬂops by using
retiming [7][8] affects the occurrence of glitches in a cir-
cuit. The ﬂipﬂops inserted in the input lines just before the
operation node make sure both inputs arrive at the same
time. Therefore no glitches will appear at the output of the
operation node.
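The effect of aligning input arrival times can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the operation node is taken to be a zero-delay XOR, each input switches from its old to its new value at a given delta step, and the names input_wave and output_transitions are hypothetical.

```python
from operator import xor


def input_wave(old, new, arrival, steps):
    """Value seen at an input per delta step: `old` until `arrival`, then `new`."""
    return [old if t < arrival else new for t in range(steps)]


def output_transitions(op, a_wave, b_wave, initial_output):
    """Count the output changes of a zero-delay operation node."""
    count, previous = 0, initial_output
    for a, b in zip(a_wave, b_wave):
        current = op(a, b)
        count += int(current != previous)
        previous = current
    return count


STEPS = 5
# Both inputs go from 0 to 1, so the settled XOR output stays 0.
# Unbalanced paths: input a arrives at step 0, input b only at step 3.
unbalanced = output_transitions(
    xor, input_wave(0, 1, 0, STEPS), input_wave(0, 1, 3, STEPS), initial_output=0)
# Flipflops on both input lines release the new values at the same step.
balanced = output_transitions(
    xor, input_wave(0, 1, 0, STEPS), input_wave(0, 1, 0, STEPS), initial_output=0)
print(unbalanced, balanced)  # 2 0
```

With skewed arrivals the output makes two useless transitions (one glitch); with aligned arrivals it makes none.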
The power dissipation simulation results were divided
into three components:
1. dissipation in the combinational logic
2. dissipation in the ﬂipﬂops
3. dissipation in the clock line
This was done by simulating the circuit with a separate
clock power supply and a main power supply feeding the
rest of the circuit. Clock power dissipation and main power
dissipation were measured separately. Flipflop power
dissipation was calculated by multiplying the average power
dissipation of a single flipflop, having 50% input transition
activity (see footnote 1), by the total number of flipflops in
the circuit. Power dissipation in the combinational logic was
then calculated by subtracting the flipflop power from the
simulated main power dissipation.
[Figure 8 block labels: inputs a[2], a[1], a[0] and b[2], b[1], b[0]; blocks |a-b|, comparators (>), find min/max, select min/max and select; signals threshold, min, max, is_min, is_max and default direction along a[1],b[1]; output direction.]
Figure 9 Glitches and retiming.
Four different direction detector layouts were created in
a 0.8 µm, 5 V layout technology. Each was retimed for a
different clock frequency, resulting in more or fewer pipeline
flipflops in the circuits. Introducing more flipflops
increases the delay balance in the circuit and consequently more
glitches are eliminated. On the other hand, more flipflops
result in more flipflop power dissipation and more clock
lines, and more clock lines increase the clock capacitance
and consequently clock power dissipation.
All circuits were simulated for 20 clock periods using
20 ‘random’ inputs at an equivalent clock frequency of
5MHz. The relatively small number of 20 ‘random’ inputs
was used to limit the large amount of simulation time
needed for circuit level simulation. However, this limited
number of inputs still appeared to give reliable results. The
results of the simulations are shown in Table 3 and are
graphically displayed in Figure 10.
Table 3 Power dissipation simulation results.
circuit number           1      2      3      4
number of flipflops      48     174    218    350
area (mm^2)              0.73   0.99   1.00   1.23
clock capacitance (pF)   3.2    10.5   12.8   19.9
logic power (mW)         21.8   9.7    7.5    6.1
flipflop power (mW)      0.9    3.3    4.1    6.6
clock power (mW)         0.5    1.5    1.8    2.8
total power (mW)         23.2   14.5   13.4   15.5
Footnote 1: It is realistic to assume that on average the input of a flipflop in the
circuit is constant for about 50% of the time and changing for the rest of
the time under normal circuit operation for random inputs. Using this
50% input transition activity figure the dynamic power dissipation of
a single flipflop can be estimated using circuit level simulation.
[Figure 9 panels: an operation node fed by unbalanced delay paths, with glitches at its output; the same node with flipflops in its input lines, with no glitches at its output.]
The power dissipation in the combinational logic reduces
as the retiming frequency and consequently the number
of flipflops in the circuit increases. A factor of
21.8/6.1 ≈ 3.6 difference in logic power dissipation is
found between circuits 1 and 4. This reduction comes from
the dramatic decrease in the number of glitches when a
large number of pipelining flipflops is inserted in the
circuit.
Flipﬂop power dissipation increases with an increasing
number of ﬂipﬂops. As we stated before, this power dissi-
pation ﬁgure was calculated and is linearly dependent on
the number of ﬂipﬂops.
The clock power dissipation is highly dependent on the
load capacitance of the clock input. Because extra clock
circuitry is necessary when more ﬂipﬂops are inserted in
the circuit, this capacitance will increase. This explains the
fact that the clock power dissipation increases with an in-
creasing number of ﬂipﬂops.
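As a rough cross-check, equation 1 can be applied to the clock capacitances of Table 3. The sketch below rests on assumptions that are not stated in the source: the clock is taken to make one power-consuming transition per period (p_t = 1), the supply is 5 V and the frequency is the 5 MHz used in the simulations. The function name dynamic_power_mw is hypothetical.

```python
def dynamic_power_mw(p_t, c_load_pf, v_dd, f_mhz):
    """Equation 1 with C in pF, V_dd in V and f in MHz; the result is in mW."""
    # pF * V^2 * MHz = 1e-12 F * V^2 * 1e6 Hz = 1e-6 W, i.e. 1e-3 mW.
    return p_t * c_load_pf * v_dd ** 2 * f_mhz * 1e-3


V_DD = 5.0       # V, from the 5 V technology
F_CLOCK = 5.0    # MHz, the equivalent clock frequency of the simulations
P_T_CLOCK = 1.0  # assumption: one 0 -> 1 clock transition per period

# (clock capacitance in pF, simulated clock power in mW) from Table 3.
TABLE_3_CLOCK = [(3.2, 0.5), (10.5, 1.5), (12.8, 1.8), (19.9, 2.8)]

for c_pf, simulated_mw in TABLE_3_CLOCK:
    estimate = dynamic_power_mw(P_T_CLOCK, c_pf, V_DD, F_CLOCK)
    print(f"C = {c_pf} pF: estimate {estimate:.2f} mW, simulated {simulated_mw} mW")
```

Under these assumptions the estimates come out at roughly 0.4, 1.3, 1.6 and 2.5 mW, somewhat below the simulated values but following the same trend with clock capacitance. Equation 1 covers only the charging of the load capacitance, so exact agreement with a circuit level simulation should not be expected.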
Figure 10 Power dissipation results as a function of
the number of flipflops in the circuit.
[Figure 10: power dissipation (mW, axis 0 to 25) against number of flipflops (axis 50 to 350), with curves for total, logic, flipflop and clock power.]
The total power dissipation plot exhibits a minimum for
a certain number of flipflops. Since this number is
dependent on the retiming frequency that was used, we find a
retiming frequency which is optimal for power dissipation.
Suppose the operating point of the original circuit lies to
the left of this minimum in Figure 10. Then, retiming the
circuit for a higher throughput than necessary, while
running it at the original clock frequency, will result in lower
power dissipation.
On the other hand, consider the case where the operating
point of the original circuit lies to the right of the
minimum in Figure 10. Then, retiming the circuit for a lower
throughput than necessary, lowering the clock frequency
accordingly and putting two or more (sub)circuits in
parallel may also result in lower power dissipation, provided that
the power consumed by the necessary extra multiplexing
and control circuitry is low enough.
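The trade-off can be read off Table 3 by summing the three components per circuit and picking the lowest total. The sketch below does just that; the data layout and the name total_power are hypothetical, and with only four sampled layouts the true minimum may lie between two of them.

```python
# Rows of Table 3: power figures in mW for the four retimed layouts.
CIRCUITS = [
    {"circuit": 1, "flipflops": 48, "logic": 21.8, "flipflop": 0.9, "clock": 0.5},
    {"circuit": 2, "flipflops": 174, "logic": 9.7, "flipflop": 3.3, "clock": 1.5},
    {"circuit": 3, "flipflops": 218, "logic": 7.5, "flipflop": 4.1, "clock": 1.8},
    {"circuit": 4, "flipflops": 350, "logic": 6.1, "flipflop": 6.6, "clock": 2.8},
]


def total_power(row):
    """Total dynamic power as the sum of logic, flipflop and clock power."""
    return row["logic"] + row["flipflop"] + row["clock"]


best = min(CIRCUITS, key=total_power)
for row in CIRCUITS:
    marker = "  <- lowest total" if row is best else ""
    print(f"{row['flipflops']:>3} flipflops: {total_power(row):.1f} mW{marker}")
```

Among the four layouts the lowest total is found for circuit 3 with 218 flipflops: beyond that point the added flipflop and clock power outweighs the remaining savings in logic power.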
6 Conclusions
Glitches play an important role in excessive useless
power dissipation. The ratio of useless transitions to useful
transitions can be very large as we showed in our examples
of adder, multiplier and processing unit architectures.
A signiﬁcant reduction in power dissipation can be
achieved if the amount of glitches is reduced. This can be
done by balancing delay paths and/or by introducing ﬂip-
ﬂops in the circuit.
Flipﬂops can be introduced in the circuit by using
retiming. By ﬁnding the right amount and positions of
these pipelining ﬂipﬂops an optimal total power dissipa-
tion can be obtained. This means that an optimum retiming
for power dissipation exists.
References
[1] R. Brodersen, A. Chandrakasan and S. Sheng, “Low-Power
Signal Processing Systems”, 5th IEEE Workshop on VLSI
Signal Processing, California, USA, October 1992.
[2] T.K. Callaway and E.E. Swartzlander, Jr., “Optimizing
Arithmetic Elements for Signal Processing”, 5th IEEE
Workshop on VLSI Signal Processing, California, USA,
October 1992.
[3] M.D. Ercegovac and T. Lang, “Reducing Transition Counts
in Arithmetic Circuits”, Digest of Technical Papers IEEE
Symposium on Low Power Electronics, San Diego, USA,
October 1994.
[4] Uming Ko, P.T. Balsara and Wai Lee, “A Self-timed
Method to Minimize Spurious Transitions in Low Power
CMOS Circuits”, Digest of Technical Papers IEEE
Symposium on Low Power Electronics, San Diego, USA,
October 1994.
[5] J. Leijten, J. van Meerbergen and J. Jess, “Analysis of
Transition Activity and Power Dissipation in Synchronous
Logic Circuits”, Internal Report Eindhoven University of
Technology, Eindhoven, The Netherlands, December 1993.
[6] J. van Meerbergen, et al., “Architectural Strategies for
High-Throughput Applications”, Journal of VLSI Signal
Processing, vol. 5, 1993, p. 201-220.
[7] A. van der Werf, et al., “Efﬁcient Timing Constraint
Derivation for Optimally Retiming High Speed Processing
Units”, Proceedings of the 7th International Symposium on
High-Level Synthesis, Ontario, Canada, May 1994.
[8] OPTIMA, Philips Electronic Design & Tools, Hilversum,
The Netherlands, 1992.