Minimal-power, delay-balanced smart repeaters for interconnects in the nanometer regime. by Weerasekera, Roshan et al.
Minimal-Power, Delay-Balanced Smart Repeaters
for Interconnects in the Nanometer Regime
Roshan Weerasekera, Dinesh Pamunuwa*, Li-Rong Zheng and Hannu Tenhunen
Department of Electronic, Computer and Software Systems,
KTH Information and Communication Technology,
ELECTRUM 229, 164 40 Kista, Sweden.
*Centre for Microsystems Engineering, Faculty of Applied Sciences,
Lancaster University, Lancaster LA1 4YR, United Kingdom.
{roshan,lirong,hannu}@imit.kth.se, d.pamunuwa@lancaster.ac.uk
ABSTRACT
In this paper we propose a smart repeater that consumes
less energy and is suitable for driving global interconnec-
tions in nanometre technologies. When there is coupling
between interconnects, the effective capacitance of a given
wire is a function not only of the physical geometry, but also
the relative switching pattern described by the bits on the
wire in question (the victim) and the adjacent wires (ag-
gressors). The drive strength of a traditional repeater is
static, resulting in a spread of the propagation delay, with
the repeater strength being essentially too much for every
bit pattern other than the worst-case pattern. In the pro-
posed SMART repeater, the drive strength is dynamically
altered depending on the relative bit pattern, by partitioning
it into a Main Driver and Assistant Driver. For a higher
effective load capacitance both drivers switch, while for a
lower effective capacitance the assistant driver is quiet. By
disconnecting part of the repeater when it is not needed, the
total load capacitance to the previous stage is reduced, re-
sulting in reduced energy consumption for those instances.
It is shown that the potential average saving in energy can
be as much as 15%, with an 18% jitter reduction, over a
traditional repeater for typical global wire lengths in nanometre
technologies.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles—




Delay-Balanced, Interconnects, Minimal-Power, Repeaters.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SLIP’06, March 4–5, 2006, Munich, Germany.
Copyright 2006 ACM 1-59593-255-0/06/0003 ...$5.00.
1. INTRODUCTION
Shrinking of the minimum feature size used in fabrication
of ICs has resulted in exponential growth of performance
and functionality over the past four decades. The integra-
tion of millions of devices on a single die however poses many
difficult engineering challenges, most notably in power man-
agement and on-chip communication. As chip complexity
and area grow, despite the best efforts to exploit locality
with innovative architectural solutions, the average distance
across which a bit has to be transferred has increased, and
interconnection delay is a key bottleneck in modern digital
design. Scaling of wires and tighter integration has also re-
sulted in signal integrity problems which only add to the
interconnection woes; cross-talk between signal lines results
in signal corruption and variable delay, depending on the
respective switching patterns.
A key technique in reducing propagation delay and signal
degradation is repeater insertion. Although very effective
and simple, this has an adverse effect on power consumption,
and it has been estimated that over 50% of the power in a
high performance microprocessor is dissipated by repeaters
charging and discharging interconnects [13, 21, 14]. Further,
over 90% of this power is concentrated in only 10% of the
interconnects; i.e. those which are classed as global and run





Figure 1: Basic schematic of the proposed driver
scheme
In this paper we propose a repeater that consumes less
energy, and is suitable for exactly these kinds of global in-
terconnections. It exploits the fact that in a parallel wire
structure, the effective capacitance of a given wire is dy-
namic; i.e. it is a function of not only the physical geometry,
but also the relative switching pattern described by the bits
on the wire in question (the victim) and the adjacent wires
(aggressors). With a traditional repeater, since the drive
strength is static, the result is a spread of the propagation
delay, with the repeater strength being essentially too much
for every bit pattern other than the worst-case pattern. In
the proposed repeater, the drive strength is dynamically al-
tered depending on the relative bit pattern, by partitioning
it into a Main Driver and Assistant Driver. For a higher
effective load capacitance both drivers switch, while for a
lower effective capacitance the assistant driver is quiet [27].
By disconnecting part of the repeater when it is not needed,
the total load capacitance to the previous stage is reduced,
resulting in reduced energy consumption for those instances.
It is shown that the potential average saving in energy can
be as much as 15% over a traditional repeater for typical
global wire lengths in nanometre technologies.
The ramifications of the dynamically changing load in coupled
interconnects have received a fair amount of attention
in the literature. A comprehensive analysis of design con-
siderations for repeater insertion in a bus structure with
heavy coupling was presented in [19]. A scheme proposed
in [9] staggers the repeaters so that opposing transitions
only persist for the length of the offset between repeaters,
and become best-case patterns for the remainder, resulting
in a delay reduction. Many innovative alternatives to the
traditional repeater have also been proposed, such as the
Transient Sensitive Accelerator (TSA) [8], Charge Recycling
Technique (CRT) [24], Boosters [15], the TAGS receiver [11],
the Aggressor-Aware Repeater [10], and the Capacitor Cou-
pled Trigger and Accelerator combination [7]. Some of these
use skewed inverters to trade-off noise margin for speed [8,
15, 11], while others consume more energy [10] and occupy
a larger area [8, 15, 11] to produce a faster response.
The work in [6] and [17] also seeks to reduce the delay
by avoiding simultaneous switching, similar to [9], but
accomplishes this by introducing static delays in the repeaters
rather than physical offsets in the placement. The authors
report an overall reduction in the delay for the worst-case
pattern of up to 20%, but this scheme dissipates more power for
transitions in the same direction, due to additional charging
and discharging of the coupling capacitance. The schemes in [4] and [16]
report average energy savings of up to 25% by introducing a
delay dependent on the relative transition pattern between
two adjacent wires, but this additional delay introduces a
timing penalty. The worst-case pattern for the delay is also
the worst case pattern for the energy, and hence any energy
saving is at the cost of an increase in the cycle time, which
may not always be possible.
Another approach is to use Error Control or Transition
Coding Techniques [26, 22, 25, 12, 23, 1, 20] to overcome the
effects of inter-symbol interference. The relatively complex
codec circuitry causes additional delay and consumes more
power, rendering the coding ineffective in many cases [3].
Even otherwise, these schemes mostly address the problem
of reducing transitions on a given wire, which is less impor-
tant than reducing the relative switching activity between
lines, given that the aspect ratio of on-chip interconnect em-
phasises the coupling capacitance over the self capacitance.
In general, not only do these alternatives to traditional re-
peaters require much effort in circuit design similar to library
cell design, but they also lack a clear high-level abstraction;
in contrast, performance metrics such as delay and energy
consumption can easily be quantified in terms of a few crit-
ical design parameters for the traditional inverting repeater









Figure 2: Schematic of proposed SMART repeater,
shown here only for two coupled nets
ent levels of hierarchy from initial signal planning to detailed
place and route [18].
One of the main advantages of the repeater circuit pro-
posed here is that the relatively minor increase in circuit
complexity required to obtain the energy saving and delay
equalization described above can be completely abstracted
in the performance analysis. We present a design method-
ology similar to that for traditional single-wire inverting re-
peaters, including an RC equivalent circuit and closed-form
expressions for the first-order approximation to the delay.
Therefore this repeater can be very easily modeled in tasks
such as delay calculation, signal integrity analysis and tim-
ing driven optimisation in any CAD flow for physical design.
The rest of this document is structured as follows. Sec-
tion 2 describes the repeater circuit and analyses the energy
saving resulting from it. The following section presents the
circuit abstraction and design methodology. In the next
section we present simulation results that validate the delay
analysis and show the average saving in energy for typical
wire lengths in a 180nm technology. We end with a discus-
sion.
2. CONCEPT OF THE SMART DRIVER
The effective interconnect capacitance varies with the tran-
sitions of neighboring lines and can be written as Cs + λCc,
where Cs is the self capacitance of the wire, λ is the switch
factor and Cc is the inter-wire capacitance. In this work we
use different switch factors for delay and power estimation
(given in Table 1) based on the experimental validation
in [5] which proposes power-based switch factors that are
slightly different from the delay-based ones. The variation of
the effective capacitance with the relative switching pattern
introduces a spread in the arrival time at the far end of the
wire. To demonstrate this a pair of coupled lines is used as a
constituent unit for a bus. For two simultaneously switching
lines, sixteen possible switching combinations can be iden-
tified. These can be categorized into five different groups
according to the effective capacitance as follows. Group 1:
Both switch in the same direction; Group 2: Both lines are
quiet (at 0 or 1); Group 3: One line is switching while the
other is quiet at 0; Group 4: One line is switching while the
other is quiet at 1; Group 5: The lines switch in opposite
directions.
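The grouping above can be checked by brute-force enumeration. The following is a minimal Python sketch and not part of the original work: the function names `direction` and `group` are illustrative, and each wire's state is modelled, as an assumption, by a (previous bit, next bit) pair.

```python
from itertools import product

# Each wire state is a (previous bit, next bit) pair (modelling assumption).
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def direction(state):
    """Return +1 for a rising edge, -1 for a falling edge, 0 if quiet."""
    prev_bit, next_bit = state
    return next_bit - prev_bit

def group(wire_i, wire_j):
    """Classify a two-wire switching pattern into Groups 1-5."""
    di, dj = direction(wire_i), direction(wire_j)
    if di != 0 and dj != 0:
        return 1 if di == dj else 5       # same or opposite direction
    if di == 0 and dj == 0:
        return 2                          # both lines quiet
    quiet = wire_i if di == 0 else wire_j
    return 3 if quiet[0] == 0 else 4      # other line quiet at 0 or at 1

counts = {}
for wi, wj in product(STATES, repeat=2):
    g = group(wi, wj)
    counts[g] = counts.get(g, 0) + 1
print(counts)
```

Of the sixteen combinations, two fall in Group 1, four in each of Groups 2 to 4, and two in Group 5.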
To ensure error-free operation, timing constraints have
to be satisfied for the switching patterns that cause the
worst-case delay, namely the ↑↓ and ↓↑ combinations.
Since the effective load is highest for these patterns, the size
Columns: switching event on wires i and j; delay-based (λd) and power-based (λp) switch factors; energy dissipation for wire i in units of (1/2)V_dd^2.

Group | Case | Wire i | Wire j | λd   | λp   | Traditional driver | Smart driver
1     | 1    | ↓      | ↓      | 0    | 0.25 | Cw trad + 0.25Cc   | Cw smart + 0.25Cc
1     | 2    | ↑      | ↑      | 0    | 0.25 | Cw trad + 0.25Cc   | Cw smart + 0.25Cc
2     | 3    | 0      | 0      | n.a. | n.a. | 0                  | 0
2     | 4    | 0      | 1      | n.a. | n.a. | 0                  | 0
2     | 5    | 1      | 0      | n.a. | n.a. | 0                  | 0
2     | 6    | 1      | 1      | n.a. | n.a. | 0                  | 0
3     | 7    | 0      | ↑      | 1    | 1    | 0                  | 0
3     | 8    | ↑      | 0      | 1    | 1    | Cw trad + Cc       | Cw smart + Cc
3     | 9    | 0      | ↓      | 1    | 1    | 0                  | 0
3     | 10   | ↓      | 0      | 1    | 1    | Cw trad + Cc       | Cw smart + Cc
4     | 11   | 1      | ↑      | 1    | 0    | 0                  | 0
4     | 12   | ↑      | 1      | 1    | 0    | Cw trad            | Cw smart
4     | 13   | 1      | ↓      | 1    | 0    | 0                  | 0
4     | 14   | ↓      | 1      | 1    | 0    | Cw trad            | Cw smart
5     | 15   | ↑      | ↓      | 2    | 1.75 | Cw trad + 1.75Cc   | Cw trad + 1.75Cc
5     | 16   | ↓      | ↑      | 2    | 1.75 | Cw trad + 1.75Cc   | Cw trad + 1.75Cc
Table 1: Switching Activities on the lines and the variation of effective capacitance.
of the buffer designed statically for the worst-case delay is
much larger than would be necessary for the same timing
requirements for other patterns [19]. Now this worst-case
condition occurs only twice out of 16 possible input switch-
ing patterns, with a probability of 1/8 for simultaneously
switching lines if the transitions are equally distributed as
in a random bit stream. For the 14 other cases, the wire
is driven faster, which just translates to slack which typi-
cally cannot be used, consuming energy unnecessarily. The
driver proposed here changes its drive strength depending
on the neighbour’s switching direction by using some simple
logic. A basic schematic of the proposed SMART repeater
is shown in Figure 2. If the switching pattern belongs to
Groups 1, 3, or 4, a single inverter (the Main driver) drives
the interconnect. When a switching pattern in Group 5 oc-
curs, another inverter (the Assistant) also drives the line,
increasing the total drive strength appropriately. By dis-
connecting the Assistant driver when it is not needed, part
of the parasitic capacitance is disconnected for the majority
of the switching patterns, leading to a saving in the average
energy consumption.
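A behavioural sketch of the selection rule described above, assuming an idealized pattern detector with no delay of its own. The names `assistant_enabled` and `drive_strength` and their arguments are hypothetical and used for illustration only. The sketch models only the decision, not the selection circuit itself.

```python
from itertools import product

def assistant_enabled(victim_prev, victim_next, aggr_prev, aggr_next):
    """True only when victim and aggressor switch in opposite directions (Group 5)."""
    victim_dir = victim_next - victim_prev   # +1 rising, -1 falling, 0 quiet
    aggr_dir = aggr_next - aggr_prev
    return victim_dir != 0 and aggr_dir == -victim_dir

def drive_strength(h_m, h_a, victim_prev, victim_next, aggr_prev, aggr_next):
    """Repeater size connected to the line, in multiples of a minimum sized inverter."""
    if assistant_enabled(victim_prev, victim_next, aggr_prev, aggr_next):
        return h_m + h_a      # Main and Assistant both drive the line
    return h_m                # Assistant disconnected

# Enumerate all (victim_prev, victim_next, aggr_prev, aggr_next) combinations.
patterns = list(product((0, 1), repeat=4))
worst_case = [p for p in patterns if assistant_enabled(*p)]
print(len(worst_case), "of", len(patterns), "patterns enable the Assistant")
```

With equally likely patterns the Assistant is needed for 2 of the 16 combinations, which matches the probability of 1/8 quoted above.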
2.1 Energy Saving with the SMART Driver
Each time a wire is driven from 0 to VDD, an energy
amounting to $C_{eff}V_{DD}^2$ is drawn from the power supply. Half
of this is stored in the load capacitance while the rest is dis-
sipated in the pull-up network of the driver. During a VDD
to 0 transition, the energy stored in the capacitance is dis-
sipated in the pull-down network of the driver. The energy
dissipation when two wires are coupled together is given in
Table 1 for all possible switching patterns. In accordance
with common terminology, the size of a traditional inverting
repeater is defined in terms of multiples of a minimum sized
repeater as Ht. Since the driving portions of the SMART
driver are two inverters, they can be characterized in a sim-
ilar fashion as Hm and Ha, which denote the sizes of the
Main and Assistant drivers respectively. The total static
capacitive load of the traditional driver, Cw trad, can be
defined as Cs + Ht(Cdmin + Cgmin), i.e. the sum of the self
capacitance of the wire, the driver's own parasitic drain
capacitance, and the gate capacitance of the target load (a
repeater for the purpose of this analysis) at the end of the wire. Here
Cgmin and Cdmin are the gate capacitance and the drain
diffusion capacitance of a minimum sized inverter. Similarly,
Cw smart can be described as Cs + HtCdmin + HmCgmin.
The energy dissipation per cycle depends on whether or
not switching transitions occur, and on the relative switching
pattern as given in Table 1. A switching transition is
a probabilistic event, and in general there can be temporal
and spatial correlations between transitions on the same line
and also on different lines. The average energy dissipation
for wire i with the traditional driver can be written as
$$E_{trad} = \frac{1}{2}V_{DD}^2\left[p_{s,s}(C_{w\,trad} + 0.25C_c) + p_{e,1}C_{w\,trad} + p_{e,0}(C_{w\,trad} + C_c) + p_{o,o}(C_{w\,trad} + 1.75C_c)\right] \qquad (1)$$
where px,y is the probability that wires i, j switch as defined
below:
(s, s) - both wires switch in the same direction;
(e, 0) - wire i switches up or down while wire j is quiet at 0;
(e, 1) - wire i switches up or down while wire j is quiet at 1;
(o, o) - the wires switch in opposite directions.




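Grouping terms, this can be rewritten as
$$E_{trad} = \frac{1}{2}V_{DD}^2\left[(p_{s,s} + p_{e,0} + p_{e,1} + p_{o,o})C_{w\,trad} + (0.25p_{s,s} + p_{e,0} + 1.75p_{o,o})C_c\right] \qquad (2)$$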
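As enumerated in Table 1, the capacitive load Cw for the
SMART driver in a bus structure is Cw smart for Groups 1-4
and Cw trad for Group 5, so that the average energy dissipation becomes
$$E_{smart} = \frac{1}{2}V_{DD}^2\left[(p_{s,s} + p_{e,0} + p_{e,1})C_{w\,smart} + p_{o,o}C_{w\,trad} + (0.25p_{s,s} + p_{e,0} + 1.75p_{o,o})C_c\right] \qquad (3)$$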
From (2) and (3) the average total energy saving for a single
segment of a repeater-inserted net is found to be:
$$\Delta E = \frac{1}{2}(p_{s,s} + p_{e,0} + p_{e,1})(C_{w\,trad} - C_{w\,smart})V_{DD}^2 \qquad (4)$$
Substituting Cw trad = Cs + Ht(Cgmin + Cdmin) and Cw smart =
Cs + HtCdmin + HmCgmin gives
$$\Delta E = \frac{1}{2}(H_t - H_m)(p_{s,s} + p_{e,0} + p_{e,1})C_{gmin}V_{DD}^2 \qquad (5)$$
If all switching events are random, uniformly distributed
events with no correlation between neighbouring lines, then
$p_{s,s} = p_{e,1} = p_{e,0} = \frac{2}{16}$ and (5) reduces to
$$\Delta E = \frac{3}{16}(H_t - H_m)C_{gmin}V_{DD}^2 \qquad (6)$$
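This can be verified by averaging the energy dissipation
of wire i over the 16 equally likely patterns in Table 1:
$$E_{trad} = \frac{V_{DD}^2}{32}(8C_{w\,trad} + 6C_c) \qquad (7)$$
$$E_{smart} = \frac{V_{DD}^2}{32}(2C_{w\,trad} + 6C_{w\,smart} + 6C_c) \qquad (8)$$
It can easily be seen that the energy saving given by (7)-(8) is
the same as (6).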
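The averaging over Table 1 can be reproduced with exact rational arithmetic. The sketch below is an illustration only: the names `SWITCHING_GROUPS` and `average_coefficients` are not from the original work, and it assumes all 16 patterns are equally likely. Loads are kept symbolic as coefficients of (Cw trad, Cw smart, Cc), in units of half the squared supply voltage.

```python
from fractions import Fraction

# Patterns of Table 1 in which wire i switches:
# group -> (number of cases, power-based switch factor, Assistant quiet?)
SWITCHING_GROUPS = {
    1: (2, Fraction(1, 4), True),
    3: (2, Fraction(1), True),
    4: (2, Fraction(0), True),
    5: (2, Fraction(7, 4), False),
}

def average_coefficients(total_patterns=16):
    """Average load as coefficients of (Cw_trad, Cw_smart, Cc).

    The result is in units of 0.5 * Vdd**2 and assumes that all
    patterns are equally likely.
    """
    trad = [Fraction(0), Fraction(0), Fraction(0)]
    smart = [Fraction(0), Fraction(0), Fraction(0)]
    for cases, lam_p, assistant_quiet in SWITCHING_GROUPS.values():
        weight = Fraction(cases, total_patterns)
        trad[0] += weight                  # traditional driver always sees Cw_trad
        trad[2] += weight * lam_p          # coupling term
        smart[1 if assistant_quiet else 0] += weight
        smart[2] += weight * lam_p
    return tuple(trad), tuple(smart)

trad, smart = average_coefficients()
# Saving per unit of (Cw_trad - Cw_smart) * Vdd**2
saving = Fraction(1, 2) * (trad[0] - smart[0])
print(trad, smart, saving)
```

Since Cw trad - Cw smart = (Ht - Hm)Cgmin, a saving coefficient of 3/16 corresponds to an energy saving of (3/16)(Ht - Hm)Cgmin times the squared supply voltage.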
This energy analysis assumes that the energy consumed by
the selection logic of the SMART driver is negligible. This is
true for repeaters driving global wires, and the smaller the
technology, the more accurate is the assumption.
To obtain the same performance as a traditional driver for
the worst-case, the main and assistant drivers are designed
such that Ht = Hm + Ha. In essence (5) affirms that the
larger the Assistant driver, the larger the energy saving, as
this results in a lower average load.
2.2 Jitter Reduction with the SMART Driver
In Figure 3 the curves with solid lines represent the out-
put response of a conventional driver, for minimum effec-
tive capacitance (Best-Case) and maximum effective capac-
itance (Worst-Case). Our aim, with the SMART driver, is
to delay the response for the best-case without affecting the
worst-case, so that the variation in delay becomes as small
as possible. In other words, the concept is to make the re-
sponse slower with minimum effective capacitance, as the
cycle delay has to be set to the worst-case delay anyway.
3. DESIGN METHODOLOGY
3.1 Delay Modeling
The delay analysis for repeater insertion is carried out
by characterizing a minimum sized inverter by an output
resistance Rdmin, in addition to the input gate capacitance
Cgmin and output drain-diffusion capacitance Cdmin already
defined. This linearization allows us to use superposition
to find the total delay, and is sufficiently accurate for the
intended design tasks of global signal planning and incre-
mental physical optimization. The 50% delay of the circuit
can be approximated by the Elmore delay, with a factor of
0.4 being used to account for the distributed nature of the
wires. The delay of the pattern sensing circuitry has been
neglected.
3.1.1 Delay Analysis with both Drivers Switching
With the linearisation of the driver, the equivalent circuit
for one repeater segment can be shown to be the circuit in
Figure 4. Hence the 50% delay for the wire can be expressed
[Figure 3: driver output responses; curve labels: Worst-Case, Best-Case, Best-Case with Smart Driver]











































, Cg = Cgmin(Hm +Ha),
Cd = Cdmin(Hm +Ha) and Cw = Cs + λCc. Here Hm and
Ha are the sizes of the Main and Assistant drivers respec-
tively, and λ is the switching factor. Since the Assistant
driver switches only when adjacent lines switch in opposite
directions, λ=2. To simplify the delay equation, the follow-
ing time constants are defined: tDout = RdminCdmin, tDWs =
RdminCs, tDWc = RdminCc, tDin = RdminCgmin, tWD =
RwCgmin, tWs = RwCs and tWc = RwCc. This results in:
$$T_{MA} = 0.7k(t_{Dout} + t_{Din}) + \frac{0.7(t_{DWs} + 2t_{DWc})}{H_m + H_a} + 0.7t_{WD}(H_m + H_a) + \frac{0.4(t_{Ws} + 2t_{Wc})}{k} \qquad (10)$$
3.1.2 Delay Analysis with the Assistant Quiet
When the Assistant driver is quiet while the Main driver
is switching, the gate capacitance of the Assistant will not
add to the load, as it is disconnected by a switch in which
the input capacitance is negligible compared to the Assis-
tant driver’s input capacitance (see Figure 5). However the










Figure 5: Equivalent Circuit for the case when the
Main driver is switching







































where λ ∈ {0, 1}
Equations (10) and (11) are the two principal delay equations
of the SMART driver for its two states: Main and
Assistant drivers both switching, and Main driver switching while
the Assistant driver is quiet. The accuracy of these delay
expressions was checked against simulated values, and the
results are presented in Table 2. Here Rdmin is chosen as 9kΩ
and Cdmin = Cgmin = 1.8fF, as representative values for a
repeater in a 180 nm technology. The average error of under
10% is low enough to allow this first order delay model to be
used for optimisation tasks early in the design flow.
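The two delay states can be sketched numerically. The code below is a hedged illustration, not the paper's expressions verbatim: it assumes the usual first-order form (a factor 0.7 for lumped RC terms, 0.4 for the distributed wire, k identical segments), and it assumes that with the Assistant quiet its drain capacitance still loads the Main driver while its gate capacitance is disconnected. The names `TimeConstants`, `t_ma` and `t_m` are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimeConstants:
    t_dout: float  # Rdmin * Cdmin
    t_din: float   # Rdmin * Cgmin
    t_dws: float   # Rdmin * Cs
    t_dwc: float   # Rdmin * Cc
    t_wd: float    # Rw * Cgmin
    t_ws: float    # Rw * Cs
    t_wc: float    # Rw * Cc

def make_time_constants(r_dmin, c_dmin, c_gmin, r_w, c_s, c_c):
    return TimeConstants(r_dmin * c_dmin, r_dmin * c_gmin, r_dmin * c_s,
                         r_dmin * c_c, r_w * c_gmin, r_w * c_s, r_w * c_c)

def t_ma(tc, k, h_m, h_a, lam=2):
    """Assumed delay with Main and Assistant both switching."""
    h = h_m + h_a
    return (0.7 * k * (tc.t_dout + tc.t_din)
            + 0.7 * (tc.t_dws + lam * tc.t_dwc) / h
            + 0.7 * tc.t_wd * h
            + 0.4 * (tc.t_ws + lam * tc.t_wc) / k)

def t_m(tc, k, h_m, h_a, lam):
    """Assumed delay with only the Main driver switching (lam is 0 or 1)."""
    return (0.7 * k * (tc.t_dout * (1 + h_a / h_m) + tc.t_din)
            + 0.7 * (tc.t_dws + lam * tc.t_dwc) / h_m
            + 0.7 * tc.t_wd * h_m
            + 0.4 * (tc.t_ws + lam * tc.t_wc) / k)

# 10 mm wire with Rw = 39 Ohm/mm, Cs = 36 fF/mm, Cc = 98.6 fF/mm;
# Rdmin = 9 kOhm, Cdmin = Cgmin = 1.8 fF (values quoted in the text).
tc = make_time_constants(9e3, 1.8e-15, 1.8e-15, 390.0, 360e-15, 986e-15)
delay_both = t_ma(tc, k=4, h_m=105, h_a=68)
delay_main_l1 = t_m(tc, k=4, h_m=105, h_a=68, lam=1)
delay_main_l0 = t_m(tc, k=4, h_m=105, h_a=68, lam=0)
print(delay_both, delay_main_l1, delay_main_l0)
```

Under these assumptions the worst-case value comes out at roughly 352 ps for k = 4 and a total size of 173, and the Main-only delays stay below it.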
3.2 Optimum Buffer Sizes
The delay equations (10) and (11) predict a global min-
imum for the delay for optimal k,Hm and Ha values. The
variation of TMA and TM with Hm and Ha is shown in Fig-
ures 6 and 7. TMA is a convex function of Hm and Ha, and
hence also of (Hm + Ha). TM is a convex function of Hm,
while it has a linear dependence on Ha for a given value
of Hm. This is a consequence of the fact that the Assis-
tant driver contributes a parasitic capacitance to the load
while not contributing any drive strength for the switching
combinations represented by TM .
We obtain the optimal Hm, Ha and k values by first deriving
expressions for (Hm + Ha) and k that minimise TMA.
Then Hm is obtained by minimising TM, which then allows
us to also solve for Ha (see footnote 1). The delay when the
assistant driver is switching, TMA, is minimised by setting the partial
derivatives of TMA with respect to k and (Hm + Ha) to zero,
which results in:
Footnote 1: A value for the optimal k can also be obtained by setting
the partial derivative of TM with respect to k to zero, but




















































Figure 7: The Variation of TM with Hm and Ha










Since (10) is a convex function of total drive strength (Hm+
Ha), and also of k, (12) gives the optimum value for (Hm +
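Ha) and (13) the optimal number of sections, k opt, to minimise
delay. Now the optimal value for Hm is obtained by
setting the partial derivative of TM with respect to Hm to zero:
$$\frac{\partial T_M}{\partial H_m} = 0 \Rightarrow H_m^2 = \frac{kH_at_{Dout} + t_{DWs} + \lambda t_{DWc}}{t_{WD}} \qquad (14)$$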



















The two possible solutions to the above quadratic equation
Rw (Ω) | Cs (fF) | Cc (fF) | TMA model (ps) | TMA simulated (ps) | TMA % error | TM model (ps) | TM simulated (ps) | TM % error
400    | 30.00   | 70.00   | 106.28         | 113.394            | 6.27        | 75.306        | 86.186            | 12.62
400    | 30.00   | 100.00  | 112.47         | 119.865            | 6.17        | 75.306        | 86.128            | 12.57
400    | 30.00   | 130.00  | 118.66         | 125.953            | 5.79        | 75.306        | 86.426            | 12.87
400    | 50.00   | 70.00   | 110.41         | 118.021            | 6.45        | 79.96         | 90.669            | 11.81
400    | 50.00   | 100.00  | 116.59         | 124.187            | 6.12        | 79.96         | 90.96             | 12.09
400    | 50.00   | 130.00  | 122.78         | 130.333            | 5.80        | 79.96         | 90.895            | 12.03
400    | 70.00   | 70.00   | 114.53         | 122.033            | 6.15        | 84.615        | 95.033            | 10.96
400    | 70.00   | 100.00  | 120.72         | 128.065            | 5.74        | 84.615        | 95.196            | 11.11
400    | 70.00   | 130.00  | 126.91         | 134.099            | 5.36        | 84.615        | 95.381            | 11.29
900    | 30.00   | 70.00   | 206.75         | 212.233            | 2.58        | 132.51        | 142.467           | 6.99
900    | 30.00   | 100.00  | 218.93         | 224.466            | 2.47        | 132.51        | 142.518           | 7.02
900    | 30.00   | 130.00  | 231.12         | 236.577            | 2.31        | 132.51        | 142.445           | 6.97
900    | 50.00   | 70.00   | 214.87         | 220.57             | 2.58        | 141.17        | 150.895           | 6.44
900    | 50.00   | 100.00  | 227.06         | 232.635            | 2.40        | 141.17        | 150.382           | 6.13
900    | 50.00   | 130.00  | 239.25         | 244.967            | 2.33        | 141.17        | 150.86            | 6.42
900    | 70.00   | 70.00   | 223            | 228.248            | 2.30        | 149.82        | 159.309           | 5.96
900    | 70.00   | 100.00  | 235.18         | 241.478            | 2.61        | 149.82        | 159.528           | 6.09
900    | 70.00   | 130.00  | 247.37         | 253.616            | 2.46        | 149.82        | 159.477           | 6.06
Table 2: Comparison of the simulated delay and the delay predicted by the model
are

















Since Ha < Ht, the meaningful value for Ha is:






















Here λ can be either 0 or 1. It can be seen that when
there is no coupling (i.e. λ = 0) (15) gives Ha = 0 and
consequently Hm = Ht which is the logical result.
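The sizing steps of this section can be sketched as follows. This is a hedged illustration: the closed forms for the total size and the number of sections are derived here by minimising an assumed first-order expression for TMA (they are not copied from (12) and (13)), and Hm is obtained from (14) with Ha = Ht - Hm substituted. Function names are illustrative.

```python
import math

def optimal_total_size(t_dws, t_dwc, t_wd):
    """Total size Hm + Ha minimising the assumed T_MA (lambda = 2)."""
    return math.sqrt((t_dws + 2 * t_dwc) / t_wd)

def optimal_sections(t_dout, t_din, t_ws, t_wc):
    """Number of sections k minimising the assumed T_MA (lambda = 2)."""
    return math.sqrt(0.4 * (t_ws + 2 * t_wc) / (0.7 * (t_dout + t_din)))

def main_driver_size(h_t, k, lam, t_dout, t_dws, t_dwc, t_wd):
    """Positive root of (14) with Ha = Ht - Hm:
    t_wd*Hm**2 + k*t_dout*Hm - (k*Ht*t_dout + t_dws + lam*t_dwc) = 0
    """
    b = k * t_dout
    c = -(k * h_t * t_dout + t_dws + lam * t_dwc)
    return (-b + math.sqrt(b * b - 4 * t_wd * c)) / (2 * t_wd)

# Rdmin = 9 kOhm, Cdmin = Cgmin = 1.8 fF; 10 mm wire with
# Rw = 39 Ohm/mm, Cs = 36 fF/mm, Cc = 98.6 fF/mm (values quoted in the text).
r_dmin, c_min, r_w, c_s, c_c = 9e3, 1.8e-15, 390.0, 360e-15, 986e-15
t_dout = t_din = r_dmin * c_min
t_dws, t_dwc = r_dmin * c_s, r_dmin * c_c
t_wd, t_ws, t_wc = r_w * c_min, r_w * c_s, r_w * c_c

h_t = round(optimal_total_size(t_dws, t_dwc, t_wd))
k = round(optimal_sections(t_dout, t_din, t_ws, t_wc))
h_m_group1 = main_driver_size(h_t, k, 0, t_dout, t_dws, t_dwc, t_wd)
h_m_group34 = main_driver_size(h_t, k, 1, t_dout, t_dws, t_dwc, t_wd)
print(h_t, k, h_m_group1, h_m_group34)
```

For these parameters the sketch gives a total size of about 173 and about 4 sections, with Hm close to 105 for λ = 0 and close to 142 for λ = 1, so that Ha follows as Ht - Hm.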
3.3 Delay Balancing with the SMART Driver
As explained earlier, the SMART driver saves energy by
reducing the capacitive load for certain switching combina-
tions, which in turn is achieved by switching off part of the
driver. This means that the driver is essentially slower for
the switching combinations that give rise to a lower capac-
itive load, and hence reduces jitter - the variation between
the best-case and worst-case delays. This is a secondary
benefit of the driver, and here we present a design method-
ology for sizing the SMART driver to minimize jitter.
Since the Assistant driver switches only for the worst-case
switching pattern defined by Group 5 in Table 1, the size of
the Assistant driver, Ha, can be used to tune the delays for
the other switching combinations defined by Groups 1 and
3-4. The expressions in (11) and (10) represent the delay for
all these switching combinations. For clarity of explanation,
say T1, T2 and T3 are the wire delays for Groups 1, 3-4,
and 5 respectively. Hence T1 = TM |λ=0, T2 = TM |λ=1 and
T3 = TMA. Now increasing Ha increases TM (see Figure
7), and hence Ha can be sized so that either T1 = T3 or
T2 = T3 (T1 = T2 = T3 is not possible because the relative
delay variation between T1 and T2 is not a function of Ha).
The delay variation can be quantified as
∆T = TMA − TM .
By setting ∆T = 0, delay balancing can be achieved. Sub-
stituting for TMA and TM from (11) and (10) and using the
relation HmDB = Ht − HaDB the following quadratic for
HaDB can be obtained.
$$AH_{aDB}^2 + BH_{aDB} + C = 0 \qquad (18)$$
where
A = 0.7tWD














Now sizing HaDB to equalise T1 and T3 results in T2 be-
ing larger than T3, which may not always be possible due
to constraints on T3, the worst-case delay. However equal-
ising T2 and T3 does not result in any such adverse effect.
For the former strategy λ = 0 while for the latter strategy
λ = 1 in (18). Here Ht and k can be calculated according
to the strategy adopted. For example for optimal repeater
insertion, Ht can be calculated from Equation (12) and k
from Equation (13).
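A numerical sketch of the delay-balancing step follows. Only A = 0.7tWD is taken from the text; the coefficients B and C used here are derived under an assumed first-order model for TMA and TM (the same assumptions as stated in the docstrings), so they should be read as a reconstruction rather than the paper's own expressions. All names are illustrative.

```python
import math

def time_constants(r_dmin, c_dmin, c_gmin, r_w, c_s, c_c):
    return {"dout": r_dmin * c_dmin, "din": r_dmin * c_gmin,
            "dws": r_dmin * c_s, "dwc": r_dmin * c_c,
            "wd": r_w * c_gmin, "ws": r_w * c_s, "wc": r_w * c_c}

def t_ma(tc, k, h_t):
    """Assumed worst-case delay, both drivers switching (lambda = 2)."""
    return (0.7 * k * (tc["dout"] + tc["din"])
            + 0.7 * (tc["dws"] + 2 * tc["dwc"]) / h_t
            + 0.7 * tc["wd"] * h_t
            + 0.4 * (tc["ws"] + 2 * tc["wc"]) / k)

def t_m(tc, k, h_m, h_a, lam):
    """Assumed delay with the Assistant quiet; its drain still loads the Main."""
    return (0.7 * k * (tc["dout"] * (1 + h_a / h_m) + tc["din"])
            + 0.7 * (tc["dws"] + lam * tc["dwc"]) / h_m
            + 0.7 * tc["wd"] * h_m
            + 0.4 * (tc["ws"] + lam * tc["wc"]) / k)

def balanced_assistant_size(tc, k, h_t, lam):
    """Positive root of A*Ha**2 + B*Ha + C = 0, from T_MA = T_M with Hm = Ht - Ha."""
    a = 0.7 * tc["wd"]
    p = 0.7 * (tc["dws"] + 2 * tc["dwc"]) / h_t + 0.4 * (2 - lam) * tc["wc"] / k
    b = 0.7 * k * tc["dout"] + p - 0.7 * tc["wd"] * h_t
    c = -(2 - lam) * (0.7 * tc["dwc"] + 0.4 * tc["wc"] * h_t / k)
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

# Rdmin = 9 kOhm, Cdmin = Cgmin = 1.8 fF; 10 mm wire with
# Rw = 39 Ohm/mm, Cs = 36 fF/mm, Cc = 98.6 fF/mm; Ht = 173, k = 4.
tc = time_constants(9e3, 1.8e-15, 1.8e-15, 390.0, 360e-15, 986e-15)
h_a_db = balanced_assistant_size(tc, k=4, h_t=173, lam=1)
h_m_db = 173 - h_a_db
print(h_a_db, h_m_db)
```

Equalising T2 and T3 (λ = 1) in this sketch gives an Assistant size a little under 98 and a Main size a little over 75.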
4. DISCUSSION ON DELAY AND ENERGY
CONSUMPTION
Having identified the delay expressions and possible ways
of optimizing the buffers, both the main driver and assis-
tant, calculations were carried out in order to verify the
strategy proposed in the paper. We designed the assistant







w s = w
Rw = 39 Ω/mm
Cs = 36 fF/mm
Cc = 98.6 fF/mm
Length = 10mm
Figure 8: Wire Geometrical Parameters
of delay for Group 1 (λ = 0), and the second case is delay
optimization for Groups 3-4 (λ = 1). These two cases are
termed Design Strategy One and Design Strategy Two.
In addition, the drivers can be sized according to the
delay balancing technique. The calculations were carried out
for minimum sized global wires (see Figure 8, with
w = 525nm) in the 180 nm technology node, and the results are
shown in Table 3. Switching probabilities to evaluate the
energy saving (∆E) were found using two uniformly distributed
pseudo random bit sequences (PRBS), giving probabilities
ps,s = 0.1111, pe,0 = 0.1311, pe,1 = 0.1542 and po,o = 0.1001.
For all the strategies the maximum delay is 352 ps, and the
$\Delta E = (E_{trad} - E_{smart})/E_{trad}$ and
$\Delta T = (\Delta T_{trad} - \Delta T_{smart})/\Delta T_{trad}$ values are
expressed as percentages relative to the traditional
repeater.
Case           | k | Ht  | Hm   | Ha   | ∆T    | ∆E
Strategy One   | 4 | 173 | 105  | 68   | 3.1%  | 10.9%
Strategy Two   | 4 | 173 | 142  | 31   | -1.6% | 5.0%
Delay Balanced | 4 | 173 | 75   | 97   | 18.7% | 15.6%
Traditional    | 4 | 173 | N.A. | N.A. | -     | -
Table 3: Hm and Ha for Two Optimization Strategies
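As a rough cross-check of the ∆E column, the relative saving can be estimated from the energy expressions (2) and (4) and the PRBS probabilities quoted above. This is a sketch under stated assumptions: capacitances are taken per repeater segment (wire totals divided by k = 4), Cdmin = Cgmin = 1.8 fF, and the function name is illustrative. The results land near, but not exactly on, the tabulated values, so the paper's exact bookkeeping may differ.

```python
def relative_energy_saving(h_t, h_m, probs, c_s, c_c, c_dmin, c_gmin):
    """Fractional saving (E_trad - E_smart) / E_trad for one repeater segment."""
    p_ss, p_e0, p_e1, p_oo = probs
    c_w_trad = c_s + h_t * (c_dmin + c_gmin)
    # Eq. (2) without the common factor 0.5 * Vdd**2, which cancels in the ratio
    e_trad = ((p_ss + p_e0 + p_e1 + p_oo) * c_w_trad
              + (0.25 * p_ss + p_e0 + 1.75 * p_oo) * c_c)
    # Eq. (4) with Cw_trad - Cw_smart = (Ht - Hm) * Cgmin
    delta_e = (p_ss + p_e0 + p_e1) * (h_t - h_m) * c_gmin
    return delta_e / e_trad

PRBS_PROBS = (0.1111, 0.1311, 0.1542, 0.1001)  # p_ss, p_e0, p_e1, p_oo
K = 4
C_S_SEG = 360e-15 / K   # 10 mm at 36 fF/mm, split over k segments
C_C_SEG = 986e-15 / K   # 10 mm at 98.6 fF/mm, split over k segments

savings = {
    name: relative_energy_saving(173, h_m, PRBS_PROBS,
                                 C_S_SEG, C_C_SEG, 1.8e-15, 1.8e-15)
    for name, h_m in [("Strategy One", 105), ("Strategy Two", 142),
                      ("Delay Balanced", 75)]
}
for name, value in savings.items():
    print(f"{name}: {100 * value:.1f}%")
```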
The optimum driver sizes given in Table 3 are used to
evaluate the propagation delays for the switching patterns
in Group 1, Groups 3 and 4, and Group 5. The
propagation delays are shown in Figure 9. Switching
patterns in Group 5 show the maximum (worst-case)
delay, those in Group 1 the minimum, and those in Groups
3-4 lie in between.
When the delay balancing technique is used, the delays for
Group 5 and Groups 3-4 become equal and the delay for
Group 1 increases slightly, reducing the delay variation, as
shown in Figure 10. The reduction in jitter is 18% with
the delay-balanced sizing of the SMART repeater.
5. CONCLUSIONS
With the ever decreasing feature size, the delay and energy
dissipation of on-chip global wires pose great
challenges in the nanometer regime. Repeaters are usually
used to reduce the delay, at the cost of some active area and
power. Due to the static nature of the traditional repeater,
there is excessive power loss in bus wire structures. We
addressed this unnecessary power dissipation by dynamically
changing the size of a repeater: the SMART repeater
concept. The driver required for the worst-case delay is
divided into two drivers, namely the Main and the Assistant.
The price paid for this power and jitter reduction
is just some extra area for a logic circuit and the assistant
driver. However, compared to a bus wire structure, the area



























Figure 9: Propagation Delay
maximum delay as with traditional repeaters, the SMART repeaters
presented in this work show a 15% energy reduction



























Figure 10: Propagation delay with delay balancing
This work presented a detailed Elmore delay model for
the SMART repeater and optimum sizes for the Assistant and
Main drivers. A delay balancing technique was also
presented. In Section 2 a probabilistic average energy model is
described for a clear analysis of the energy saving of the SMART
repeater concept. Unlike other interconnect repeater concepts,
apart from the original concept by Bakoglu [2], this
SMART repeater concept can easily be implemented in an
interconnect planning CAD tool. However, a detailed circuit
level implementation, practical design issues of the
SMART driver, and its usefulness for different applications
will be discussed elsewhere.
Acknowledgment
The authors would like to acknowledge the financial support
of SIDA under the auspices of the research capacity build-
ing project at the Department of Electrical and Electronic
Engineering, University of Peradeniya, Sri Lanka.
6. REFERENCES
[1] M. Anders, N. Rai, R. K. Krishnamurthy, and
S. Borkar. A transition-encoded dynamic bus
technique for high-performance interconnects. IEEE
Journal of Solid-State Circuits, 38(05):709–714, May
2003.
[2] H. B. Bakoglu. Circuits, Interconnections and
Packaging for VLSI. Addison-Wesley, 1990.
[3] C. Kretzschmar, A. K. Nieuwland, and D. Müller.
Why transition coding for power minimization of
on-chip buses does not work. In DATE ’04:
Proceedings of the conference on Design, automation
and test in Europe, pages 512–517, February 2004.
[4] M. Ghoneima and Y. I. Ismail. Utilizing the effect of
relative delay on energy dissipation in low-power
on-chip buses. IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, 12(12):1348–1359,
December 2004.
[5] P. Gupta and A. B. Kahng. Quantifying error in
dynamic power estimation of CMOS circuits. In IEEE
Intl. Symp. on Quality Electronic Design, pages
273–278, March 2003.
[6] K. Hirose and H. Yasuura. A bus delay reduction
technique considering crosstalk. In DATE ’00:
Proceedings of the conference on Design, automation
and test in Europe, pages 441–445, 2000.
[7] H.-Y. Huang and S.-L. Chen. Interconnect accelerating
techniques for sub-100-nm gigascale systems. IEEE
Transactions on Very Large Scale Integration (VLSI)
Systems, 12(11):1192–1200, November 2004.
[8] T. Iima, M. Mizuno, T. Horiuchi, and M. Yamashina.
Capacitance coupling immune, transient sensitive
accelerator for resistive interconnect signals of
subquarter micron ULSI. IEEE Journal of Solid-State
Circuits, 31(4):531–536, April 1996.
[9] A. B. Kahng, S. Muddu, E. Sarto, and R. Sharma.
Interconnect tuning strategies for high-performance
ICs. In DATE ’98: Proceedings of the conference on
Design, automation and test in Europe, pages
471–478, 1998.
[10] A. Katoch, S. Jain, and M. Meijer. Aggressor aware
repeater circuits for improving on-chip bus
performance and robustness. European Solid-State
Circuits, 2003. ESSCIRC ’03. Conference on, pages
261–264, Sep 2003.
[11] H. Kaul and D. Sylvester. Low-power on-chip
communication based on transition-aware global
signaling (TAGS). IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, 12(05):464–476,
May 2004.
[12] K.-W. Kim, K.-H. Baek, N. Shanbhag, C. L. Liu, and
S.-M. Kang. Coupling-driven signal encoding scheme
for low-power interface-design. In Proceedings of the
IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pages 318–321,
November 2000.
[13] D. Liu and C. Svensson. Power consumption
estimation in CMOS VLSI chips. IEEE Journal of
Solid-State Circuits, 29(6):663–670, 1994.
[14] N. Magen, A. Kolodny, U. Weiser, and N. Shamir.
Interconnect-power dissipation in a microprocessor. In
SLIP ’04: Proceedings of the 2004 international
workshop on System level interconnect prediction,
pages 7–13, 2004.
[15] A. Nalamalpu, S. Sirinivasan, and W. P. Burleson.
Boosters for driving long on chip interconnects-design
issues, interconnect synthesis, and comparison with
repeaters. IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems,
21(1):50–62, January 2002.
[16] A. K. Nieuwland, A. Katoch, and M. Meijer. Reducing
cross-talk induced power consumption and delay. In
PATMOS 2004 Proceedings, Lecture Notes in
Computer Science, volume LNCS 3254, pages
179–188, September 2004.
[17] K. Nose and T. Sakurai. Two schemes to reduce
interconnect delay in bi-directional and uni-directional
buses. In VLSI Circuits, 2001. Digest of Technical
Papers. 2001 Symposium on, pages 193–194, 2001.
[18] D. Pamunuwa, S. Elassaad, and H. Tenhunen.
Modeling delay and noise in arbitrarily coupled RC
trees. IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, 24(11):1725–1739,
November 2005.
[19] D. Pamunuwa, L.-R. Zheng, and H. Tenhunen.
Maximizing throughput over parallel wire structures
in the deep submicrometer regime. IEEE transactions
on Very Large Scale Integration (VLSI) Systems,
11(2):224–243, April 2003.
[20] K. N. Patel and I. L. Markov. Error-correction and
crosstalk avoidance in DSM busses. In SLIP ’03:
Proceedings of the 2003 international workshop on
System-level interconnect prediction, pages 9–14,
2003.
[21] T. Sakurai. Perspectives on power-aware electronics.
In Solid-State Circuits Conference, 2003. Digest of
Technical Papers. ISSCC., volume 1, pages 20–26,
2003.
[22] Y. Shin, S.-I. Chae, and K. Choi. Partial bus-invert
coding for power optimization of system level bus. In
ISLPED ’98: Proceedings of the 1998 international
symposium on Low power electronics and design,
pages 127–129, 1998.
[23] Y. Shin and T. Sakurai. Coupling-driven bus design
for low-power application-specific systems. In DAC
’01: Proceedings of the 38th conference on Design
automation, pages 750–753, 2001.
[24] P. Sotiriadis, T. Konstantakopoulos, and
A. Chandrakasan. Analysis and implementation of
charge recycling for deep sub-micron buses. In
ISLPED ’01: Proceedings of the 2001 international
symposium on Low power electronics and design,
pages 364–369, 2001.
[25] P. P. Sotiriadis and A. Chandrakasan. Bus energy
minimization by transition pattern coding (TPC) in
deep sub-micron technologies. In ICCAD ’00:
Proceedings of the 2000 IEEE/ACM international
conference on Computer-aided design, pages 322–328,
2000.
[26] M. R. Stan and W. P. Burleson. Bus-invert coding for
low-power I/O. IEEE Transactions on Very Large
Scale Integration Systems, 3(1):49–58, 1995.
[27] R. Weerasekera, L.-R. Zheng, D. Pamunuwa, and
H. Tenhunen. Switching sensitive interconnect driver
to combat dynamic delay in on-chip buses. In
PATMOS 2005 Proceedings, Lecture Notes in
Computer Science, volume LNCS 3728, pages
277–285, September 2005.
