Delay-balanced smart repeaters for on-chip global signaling. by Weerasekera, Roshan et al.
Delay-Balanced Smart Repeaters for On-Chip
Global Signaling
Roshan Weerasekera∗, Dinesh Pamunuwa†, Li-Rong Zheng∗ and Hannu Tenhunen∗
∗ Department of Electronic, Software and Computer Systems,
KTH School for Information and Communication Technology,
ELECTRUM 229, 164 40 Kista, Sweden.
Email: {roshan,lrzheng,hannu}@imit.kth.se
† Centre for Microsystems Engineering, Faculty of Applied Sciences
Lancaster University, Lancaster LA1 4YR, United Kingdom
Email: d.pamunuwa@lancaster.ac.uk
Abstract— In this paper we propose a smart driver whose
drive strength is dynamically altered depending on the relative bit
pattern, by partitioning it into a Main Driver and an Assistant Driver.
For a higher effective load capacitance both drivers switch, while
for a lower effective capacitance the assistant driver is quiet. It
is shown that in a UMC 0.18µm technology the potential peak
power saving, for typical global wire lengths, can be as much as
18%, with a 12% jitter reduction over a traditional repeater, for
a data rate of 1Gb/s.
I. INTRODUCTION
Scaling of wires and tighter integration has resulted in signal
integrity problems which only add to the interconnection woes;
cross-talk between signal lines results in signal corruption and
variable delay, depending on the respective switching patterns.
A key technique in reducing propagation delay and signal
degradation is repeater insertion. Although very effective and
simple, this has an adverse effect on power consumption,
and it has been estimated that over 50% of the power in a
high performance microprocessor is dissipated by repeaters
charging and discharging interconnects [1]. Further, over 90%
of this power is concentrated in only 10% of the interconnects;
i.e. those which are classed as global and run for a significant





Fig. 1. Basic schematic of the proposed driver scheme
In this paper we propose a repeater that consumes less
energy, and is suitable for exactly these kinds of global
interconnections. It exploits the fact that in a parallel wire
structure, the effective capacitance of a given wire is dynamic;
i.e. it is a function of not only the physical geometry, but also
the relative switching pattern described by the bits on the wire
in question (the victim) and the adjacent wires (aggressors).
With a traditional repeater, since the drive strength is static, the
result is a spread of the propagation delay, with the repeater
strength being essentially too much for every bit pattern other
than the worst-case pattern. In the proposed repeater, the drive
strength is dynamically altered depending on the relative bit
pattern, by partitioning it into a Main Driver and Assistant
Driver. For a higher effective load capacitance both drivers
switch, while for a lower effective capacitance the assistant
driver is quiet [2]. It is found experimentally that the potential
peak power saving can be as much as 18% and the jitter
reduction is about 12% over a traditional repeater for typical
global wire lengths in UMC 0.18µm technology.
The ramifications of the dynamically changing load in
coupled interconnects have received a fair amount of attention
in the literature. A comprehensive analysis of design consid-
erations for repeater insertion in a bus structure with heavy
coupling was presented in [3]. A scheme proposed in [4]
staggers the repeaters so that opposing transitions only persist
for the length of the offset between repeaters, and become best-
case patterns for the remainder, resulting in a delay reduction.
Many innovative alternatives to the traditional repeater have
also been proposed, such as the Transient Sensitive Accelerator
(TSA) [5], Charge Recycling Technique (CRT) [6], Boosters
[7], the TAGS receiver [8], the Aggressor-Aware Repeater [9],
and the Capacitor Coupled Trigger and Accelerator combi-
nation [10]. Some of these use skewed inverters to trade-off
noise margin for speed [5], [7], [8], while others consume
more energy [9] and occupy a larger area [5], [7], [8] to
produce a faster response.
In general, not only do these alternatives to traditional
repeaters require much effort in circuit design similar to library
cell design, but they also lack a clear high-level abstraction;
in contrast, performance metrics such as delay and energy
consumption can easily be quantified in terms of a few critical
design parameters for the traditional inverting repeater [11],
resulting in easy amalgamation in CAD flows at different
levels of hierarchy from initial signal planning to detailed place
20th International Conference on VLSI Design (VLSID'07)
0-7695-2762-0/07 $20.00  © 2007










Fig. 2. Schematic of proposed SMART repeater, shown here only for two
coupled nets
and route [12].
One of the main advantages of the repeater circuit proposed
here is that the relatively minor increase in circuit complexity
required to obtain the energy saving and delay equalization
described above can be completely abstracted in the perfor-
mance analysis. We present a design methodology similar to
that for traditional single-wire inverting repeaters, including
an RC equivalent circuit and closed-form expressions for the
first-order approximation to the delay. Therefore this repeater
can be very easily modeled in tasks such as delay calculation,
signal integrity analysis and timing driven optimisation in any
CAD flow for physical design.
The rest of this document is structured as follows. Section
II describes the smart repeater circuit and circuit abstraction.
The following section presents the design methodology. In the
next section we present simulation results that validate the
delay analysis. Then, we end with a discussion.
Group  Case  Switching event  Switching event  Status of
             on wire i        on wire j        the Assistant
1      1     ↓                ↓                Off
1      2     ↑                ↑                Off
2      3     0                0                Off
2      4     0                1                On
2      5     1                0                On
2      6     1                1                Off
3      7     0                ↑                Off
3      8     ↑                0                Off
3      9     0                ↓                Off
3      10    ↓                0                On
4      11    1                ↑                Off
4      12    ↑                1                On
4      13    1                ↓                Off
4      14    ↓                1                Off
5      15    ↑                ↓                On
5      16    ↓                ↑                On
TABLE I
STATUS OF ASSISTANT DRIVER FOR POSSIBLE SWITCHING ACTIVITIES
ON THE LINES.
II. CONCEPT OF THE SMART DRIVER
The variation of effective capacitance with the relative
switching pattern introduces a spread in the arrival time at
the far end of the wire, which is known as dynamic delay. To
Fig. 3. Method of jitter reduction using SMART driver (curves: Worst-Case, Best-Case, Best-Case with Smart Driver)
demonstrate this, a pair of coupled lines is used as a constituent
unit for a bus. For two simultaneously switching lines, sixteen
possible switching combinations can be identified (see Table
I). These can be categorized into five different groups according
to the effective capacitance as follows. Group 1: Both lines
switch in the same direction; Group 2: Both lines are quiet
(at 0 or 1); Group 3: One line is switching while the other is
quiet at 0; Group 4: One line is switching while the other is
quiet at 1; Group 5: The lines switch in opposite directions.
To ensure error-free operation, timing constraints have to
be satisfied for the switching pattern that causes the worst-
case delay, which are the ↑↓ and ↓↑ combinations. Since the
effective load is highest for these patterns, the size of the buffer
designed statically for the worst-case delay is much larger
than would be necessary for the same timing requirements
for other patterns [3]. Now this worst-case condition occurs
only twice out of 16 possible input switching patterns, with
a probability of 1/8 for simultaneously switching lines if the
transitions are equally distributed as in a random bit stream.
For the 14 other cases, the wire is driven faster, which just
translates to slack which typically cannot be used. The driver
proposed here changes its drive strength depending on the
neighbour’s switching direction by using some simple logic. A
basic schematic of the proposed SMART repeater is shown in
Figure 2. If the switching pattern belongs to Groups 1, 3, or
4, a single inverter (the Main driver) drives the interconnect.
When a switching pattern in Group 5 occurs, another inverter
(the Assistant) also drives the line, increasing the total drive
strength appropriately.
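The grouping of the sixteen combinations and the 1/8 worst-case probability can be checked with a short enumeration. The following is only an illustrative sketch: the symbols "R" and "F" for rising and falling transitions and the helper name group_of are assumptions made for this example, not part of the original work.

```python
from collections import Counter
from itertools import product

# Per-wire events: quiet at 0, quiet at 1, rising (R), falling (F).
EVENTS = ("0", "1", "R", "F")
SWITCHING = {"R", "F"}


def group_of(wire_i: str, wire_j: str) -> int:
    """Return the group (1 to 5) of a two-wire switching pattern."""
    i_switches = wire_i in SWITCHING
    j_switches = wire_j in SWITCHING
    if i_switches and j_switches:
        # Same direction is Group 1, opposite directions is Group 5.
        return 1 if wire_i == wire_j else 5
    if not i_switches and not j_switches:
        return 2  # both lines quiet
    quiet_level = wire_j if i_switches else wire_i
    return 3 if quiet_level == "0" else 4  # other line quiet at 0 or at 1


counts = Counter(group_of(i, j) for i, j in product(EVENTS, repeat=2))
worst_case_probability = counts[5] / sum(counts.values())

for group in sorted(counts):
    print(f"Group {group}: {counts[group]} patterns")
print(f"P(worst case) = {worst_case_probability}")
```

Assuming all sixteen patterns are equally likely, as in a random bit stream, the enumeration gives two Group 5 patterns out of sixteen.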
In Figure 3 the curves with solid lines represent the output
response of a conventional driver, for minimum effective
capacitance (Best-Case) and maximum effective capacitance
(Worst-Case). Our aim, with the SMART driver, is to delay
the response for the best-case without affecting the worst-case,
so that the variation in delay becomes as small as possible. In
other words, the concept is to make the response slower with
minimum effective capacitance, as the cycle delay has to be
set to the worst-case delay anyway.
A. Circuit Realization
It is difficult to change the state of the assistant before the
input completes its transition due to the inherent delay in logic
Fig. 4. Circuit schematic of the proposed Smart driver
elements in the smart driver. Hence, in the implementation we
anticipate whether the next state could be a worst-case pattern.
For example, assume that wire i is currently at logic level 0 and
wire j is at logic level 1. One of the possible next states is wire i
going to logic level 1 and wire j switching to logic level 0, which is
a worst-case pattern of Group 5. The same holds when wire i is
currently at logic level 1 and wire j is at logic level 0. Therefore,
if wire i is at logic level 0 (1) and wire j is at logic level 1 (0),
the assistant is switched on; for all other states the assistant is
off. Because of this anticipation, the assistant is on not only for
the worst-case switching patterns of Group 5, but also for two
other patterns, namely cases 10 and 12. The decision to switch the
assistant on or off should be taken only after the interconnect
voltage has crossed the threshold voltage, and for this reason the
clock cycle has to be adjusted such that
Tdriver ≤ Tlogic < Tclk (1)
where Tdriver is the propagation delay of the driver, Tlogic is
the delay of the selection logic circuit, and Tclk is the clock
period. The output of the selection logic should be available
just after the interconnect is driven, and this sets the maximum
clock rate that can be achieved with the proposed scheme. The
lower bound of the inequality ensures that the state of the
assistant driver is maintained until the output has crossed the
threshold voltage (VDD/2).
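The anticipation rule can be illustrated with a small behavioural sketch. It models only the decision "assistant on when the two wires currently hold different logic levels" and enumerates the transitions in which wire i switches; the function names and the "up"/"down" labels are assumptions of this example, and the sketch does not model the transistor-level circuit or its timing.

```python
from itertools import product


def assistant_enabled(prev_i: int, prev_j: int) -> bool:
    """Anticipation rule: enable the assistant when the two wires
    currently hold different logic levels."""
    return prev_i != prev_j


def event(prev: int, nxt: int) -> str:
    """Describe a wire event as a quiet level ('0'/'1'), 'up' or 'down'."""
    if prev == nxt:
        return str(prev)
    return "up" if nxt > prev else "down"


# Enumerate every transition in which wire i (the victim) switches.
enabled_patterns = set()
for prev_i, prev_j, next_j in product((0, 1), repeat=3):
    next_i = 1 - prev_i  # the victim always toggles in this enumeration
    if assistant_enabled(prev_i, prev_j):
        enabled_patterns.add((event(prev_i, next_i), event(prev_j, next_j)))

for wire_i_event, wire_j_event in sorted(enabled_patterns):
    print(f"wire i: {wire_i_event:4s}  wire j: {wire_j_event}")
```

Under this rule the assistant is enabled for the two opposite-direction patterns of Group 5 and additionally for (wire i down, wire j quiet at 0) and (wire i up, wire j quiet at 1), which correspond to cases 10 and 12 of Table I.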
The transistor-level schematic of the Smart driver is shown
in Figure 4. The transistors Pa and Na form the Assistant
driver, whereas the inverter I1 works as the Main
Driver. Two transmission gates, TGp and TGn, are used to
connect or disconnect the path between the signal input and the
input of the assistant driver. The transistors Pk and Nk ensure
that the assistant driver is off while the signal input path to
the assistant is cut off; otherwise the assistant driver would
simply be floating. The selection logic signals Psel and Nsel are
determined as:
Psel = Agg + Victim










Fig. 6. Only when the main driver is active: Aggressor and the Victim
switches in the same direction
When the victim input is at logic 0 and the aggressor is at
logic 1, the next victim state would be logic 1, and this might
be a worst-case pattern if the aggressor also changes its state.
In this case Na is switched on and Pa is switched off, since Pa
is not needed during this discharging time. This is achieved
by setting Nasel = 0 and Pasel = 0.
Transmission gates are used as the switches that open and close
the path between the wire input and the assistant driver, in order
to reduce the path resistance. A cascaded two-stage buffer is used
to drive each transmission gate. The cascaded buffer also
delays the output of the selection logic so that it satisfies the
condition in Equation (1). If an odd number of
stages is needed in the cascaded buffer, the selection logic has
to be changed accordingly.
Figures 5 and 6 show the simulation results at the near
end and far end (points A and B in Figure 9, respectively) of a
2.5mm long wire driven by a smart repeater and by a traditional
repeater (inverter). When both inputs, Aggressor and
Victim, switch in the same direction, the smart driver shows a
delayed response, whereas the traditional driver responds much
faster. However, the far-end waveform of the traditional repeater
takes much longer to settle than that of the smart driver.
When the Victim and Aggressor switch in opposite
directions (here the victim switches upward and the aggressor
switches downward), both drivers show a similar response, i.e.
the smart driver has similar drive capability.
III. DELAY BALANCING WITH THE SMART DRIVER
A. First-Order Delay Model
The delay analysis for repeater insertion is carried out by
characterizing a minimum-sized inverter by an output resistance
Rdmin, in addition to the input gate capacitance Cgmin
and output drain-diffusion capacitance Cdmin already defined.
The following notation is used for the wire electrical parameters:
Rw - total wire resistance, Cs - total self (wire-to-ground)
capacitance and Cc - total interwire capacitance.
1) Delay Analysis with both Drivers Switching: With the
linearisation of the driver, the equivalent circuit for one re-
peater segment can be shown to be the circuit in Figure 7.





Fig. 7. Equivalent circuit for the case when both drivers are switching (labelled capacitances: (Ha + Hm)Cdmin and (Ha + Hm)Cgmin)
$$T_{MA} = 0.7k\,(t_{Dout} + t_{Din}) + \frac{0.7\,(t_{DWs} + \mu_i\, t_{DWc})}{H_m + H_a} + \cdots \qquad (2)$$







where i = 3. Here Hm and Ha are the sizes of the Main and
Assistant drivers respectively, λ and µ are the empirical switch
factors (footnote 1) proposed in [3], and, to simplify the delay
equation, the following time constants have been defined:
tDout = RdminCdmin, tDin = RdminCgmin, tDWs = RdminCs,
tDWc = RdminCc, tWD = RwCgmin, tWs = RwCs and tWc =
RwCc.
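As a minimal sketch of how these time constants follow from the device and wire parameters, the definitions above can be written out directly. The class name, field names and the numerical values below are placeholders chosen for this example; they are not the values of Table II.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkParameters:
    r_dmin: float  # output resistance of a minimum-sized inverter (ohm)
    c_dmin: float  # drain-diffusion capacitance of that inverter (F)
    c_gmin: float  # gate capacitance of that inverter (F)
    r_w: float     # total wire resistance (ohm)
    c_s: float     # total wire-to-ground capacitance (F)
    c_c: float     # total interwire capacitance (F)


def time_constants(p: LinkParameters) -> dict:
    """Return the seven time constants used in the delay equations."""
    return {
        "tDout": p.r_dmin * p.c_dmin,
        "tDin": p.r_dmin * p.c_gmin,
        "tDWs": p.r_dmin * p.c_s,
        "tDWc": p.r_dmin * p.c_c,
        "tWD": p.r_w * p.c_gmin,
        "tWs": p.r_w * p.c_s,
        "tWc": p.r_w * p.c_c,
    }


# Placeholder values for illustration only (not taken from the paper).
example = LinkParameters(
    r_dmin=10e3, c_dmin=1e-15, c_gmin=2e-15,
    r_w=500.0, c_s=400e-15, c_c=600e-15,
)
for name, value in time_constants(example).items():
    print(f"{name} = {value:.3e} s")
```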
2) Delay Analysis with the Assistant Quiet: When the
Assistant driver is quiet while the Main driver is switching,
the gate capacitance of the Assistant will not add to the
load, as it is disconnected by a switch in which the input
capacitance is negligible compared to the Assistant driver’s
input capacitance (see Figure 8). However the parasitic drain-








Fig. 8. Equivalent Circuit for the case when the Main driver is switching
1 When one wire is switching and the other is: switching in the same
direction, λ0 = µ0 = 0; quiet at either zero or one, λ1 = 0.57 and
µ1 = 0.67; switching in the opposite direction, λ2 = 1.51 and µ2 = 2.2.























where i = 1, 2
Equations (2) and (3) are the two principal delay equations
of the SMART driver for its two states of Main and Assis-
tant drivers switching, and Main driver switching while the
Assistant driver is quiet.
B. Sizing of the Drivers
We obtain the optimal Hm, Ha and k values by first deriving
expressions for (Hm + Ha) and k that minimise TMA.










Since the Assistant driver switches only for the worst-case
switching pattern defined by Group 5 in Table I, the size of
the Assistant driver, Ha, can be used to tune the delays for
the other switching combinations defined by Groups 1 and 3-
4. The expressions in (3) and (2) represent the delay for all
these switching combinations. For clarity of explanation, say
T1, T2 and T3 are the wire delays for Groups 1, 3-4, and 5
respectively. Hence T1 = TM |λ=0, T2 = TM |λ=1 and T3 =
TMA. Now increasing Ha increases TM (see Figure ??), and
hence Ha can be sized so that either T1 = T3 or T2 = T3
(T1 = T2 = T3 is not possible because the relative delay
variation between T1 and T2 is not a function of Ha).
The delay variation can be quantified as
∆T = TMA − TM .
By setting ∆T = 0, delay balancing can be achieved. Substituting
for TMA and TM from (2) and (3) and using the relation
HmDB = Ht − HaDB, the following quadratic for HaDB can
be obtained:
$$A\,H_{aDB}^{2} + B\,H_{aDB} + C = 0 \qquad (6)$$
where
$$A = 0.7\,t_{WD}$$








$$C = 0.7\,(\lambda_3 - \lambda_2)\,t_{DWc} + \frac{0.4\,(\mu_3 - \mu_2)\,t_{Wc}}{k}$$
Now sizing HaDB to equalise T1 and T3 results in T2 being
larger than T3, which may not always be possible due to
constraints on T3, the worst-case delay. However equalising
T2 and T3 does not result in any such adverse effect. Here Ht
and k can be calculated according to the strategy adopted. For
example, for optimal repeater insertion, Ht can be calculated
from Equation (4) and k from Equation (5). Note also that Ha
is not a function of the wire length, but depends solely on
the crosstalk capacitance.
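Once the coefficients of the quadratic (6) are known, the delay-balanced assistant size can be found numerically. The sketch below is a generic quadratic solver that keeps only roots in the physically meaningful range 0 < HaDB < Ht; the function name and the coefficient values are hypothetical, chosen only so that the arithmetic is easy to follow, and are not derived from the expressions for A, B and C.

```python
import math


def delay_balanced_assistant_sizes(a: float, b: float, c: float, h_t: float) -> list:
    """Real roots of a*H^2 + b*H + c = 0 that satisfy 0 < H < h_t."""
    if a == 0:
        # Degenerate (linear) case.
        roots = [] if b == 0 else [-c / b]
    else:
        discriminant = b * b - 4 * a * c
        if discriminant < 0:
            return []  # no real solution, so no delay-balanced size exists
        sqrt_disc = math.sqrt(discriminant)
        roots = [(-b - sqrt_disc) / (2 * a), (-b + sqrt_disc) / (2 * a)]
    return sorted({r for r in roots if 0 < r < h_t})


# Hypothetical coefficients for illustration only.
H_T = 120.0
sizes = delay_balanced_assistant_sizes(1.0, -200.0, 9100.0, h_t=H_T)
h_a_db = sizes[0]
h_m_db = H_T - h_a_db  # HmDB = Ht - HaDB
print(f"HaDB = {h_a_db}, HmDB = {h_m_db}")
```

With these illustrative coefficients the roots are 70 and 130, and only the first lies below Ht, which leaves a Main driver of size 50.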





Aspect Ratio (AR) 2.1
TABLE II
BUFFER CHARACTERISTICS FOR 0.18 µm AND WIRE PARAMETERS
IV. PEAK POWER/DELAY COMPARISON
UMC 0.18-µm CMOS technology is used for the device
models with a Vdd of 1.8V, and Cadence Spectre is used for all the
simulations. A typical global metal layer is used for routing
the bus, with a minimum pitch of 1050nm. The wire electrical
parameters were obtained using the equation proposed in [13],
and the inverter characteristics Rdmin, Cdmin, and Cgmin
were determined experimentally using the simple method outlined
in [14]. They are presented in Table II. Each interconnect
wire segment is modeled as a π-10 capacitively coupled
network. The two wires, the representative unit for bus structures,
are fed with two uniformly distributed pseudo-random bit
sequences (PRBS). Each bit sequence has a cycle time
of 1ns with a rise/fall time of 350ps. In this particular design
Tsel is 567ps, which bounds the shortest cycle time that can be
achieved using this scheme.
Fig. 9. Interconnect link used for the simulation (nodes I, A, B and O): total length is 10mm,
and one segment is 2.5mm
For these parameters, the values for driver sizes and the
number of repeaters needed for delay minimization and delay-
balanced assistant are found using Equations (4), (5) and (6).
They are presented in Table III:
Case          Ht    kopt  Ha   Tmax     Tmin     Tmax − Tmin
From calculations
Traditional   120   4     -    741 ps   488 ps   253 ps
Smart         120   4     77   741 ps   600 ps   141 ps
From simulations
Traditional   120   4     -    695 ps   483 ps   212 ps
Smart         120   4     70   718 ps   531 ps   187 ps
TABLE III
CALCULATED VALUES FOR OPTIMAL BUFFER SIZES, RELATED DELAY AND
ENERGY
Simulations were also carried out with the buffer size Ht slightly
increased and decreased from the calculated optimum, to check
whether the worst-case delay could be reduced further, but the
reduction of the delay is not pronounced, and hence the calculated
value itself was used for the simulations. For this repeater
configuration the delays (footnote 2) for different switching
patterns are shown in Figure 10. The 50% delay is measured from the
input to the far end of the last section (Point O in Figure 9). The
figure clearly illustrates the idea of delay balancing: reducing the
drive strength for low effective capacitance and increasing the
drive capability for higher effective capacitance. The variation of
the maximum and minimum delays with Ha is shown in Figure 11, from
which it is found that Ha should be in the range of 70-75 in order
to have minimum delay variation. For the traditional driver
the delay variation is 212ps, whereas for the smart driver it
is 187ps. With the smart repeater technique the
delay variation is thus reduced by about 12% and the peak power
consumption is reduced by about 18% (the peak power
dissipation for the traditional case is 16mW and that of the smart
driver is 13.1mW).
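The quoted percentages follow directly from the simulated values in Table III and the peak power figures above. A short check, with variable names that are assumptions of this example, is:

```python
# Simulated values quoted in the text and in Table III.
delays_ps = {
    "traditional": {"t_max": 695.0, "t_min": 483.0},
    "smart": {"t_max": 718.0, "t_min": 531.0},
}
peak_power_mw = {"traditional": 16.0, "smart": 13.1}


def spread(case: str) -> float:
    """Delay variation Tmax - Tmin for one driver type, in ps."""
    return delays_ps[case]["t_max"] - delays_ps[case]["t_min"]


def relative_reduction(reference: float, improved: float) -> float:
    """Fractional reduction of `improved` with respect to `reference`."""
    return (reference - improved) / reference


jitter_reduction = relative_reduction(spread("traditional"), spread("smart"))
power_reduction = relative_reduction(
    peak_power_mw["traditional"], peak_power_mw["smart"]
)

print(f"Delay variation: {spread('traditional'):.0f} ps -> {spread('smart'):.0f} ps")
print(f"Jitter reduction: {jitter_reduction:.1%}")      # 11.8%
print(f"Peak power reduction: {power_reduction:.1%}")   # 18.1%
```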
When Ha is very small (10-20% of Ht) the delay variation is
very close to that of a traditional driver, but as it increases,
which in turn reduces the size of Hm, the delay variation is






















Fig. 10. Delay Comparison for two driver techniques
The delay model predicts that there might be a jitter reduc-






average energy consumption per cycle for the smart driver and
the selection logic combination is about 858fJ, whereas the
traditional driver consumes only 756fJ, which is almost the
same.
There is a slight increase in the peak crosstalk voltage with
a smart driver compared with the traditional driver, but the peak
crosstalk at the far end is about 12% of Vdd, which is within the
normally acceptable range of 20% of Vdd.
2 The first-order Elmore delay model presented in this work assumes that
the input is a step, but in practice it is a finite ramp. In order to increase its
accuracy, half of the rise time (tr/2) is added to the values computed by TM
and TMA.


















Fig. 11. The Variation of Delay Variation with Ha
                       Near End (A)   Far End (B)
Smart Repeater         0.113 V        0.213 V
Traditional Repeater   0.069 V        0.206 V
TABLE IV
MAXIMUM CROSSTALK ON A QUIET LINE: POINTS A AND B ARE THE
NEAR END AND FAR END
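The entries of Table IV can be expressed as a fraction of the 1.8V supply and compared with the 20% of Vdd limit mentioned in the text. This is only an arithmetic check; the dictionary layout and names are assumptions of this example.

```python
VDD = 1.8                  # supply voltage in volts
ACCEPTABLE_FRACTION = 0.20  # acceptable crosstalk limit, 20% of Vdd

# Peak crosstalk voltages on a quiet line, from Table IV (volts).
peak_crosstalk_v = {
    "smart": {"near": 0.113, "far": 0.213},
    "traditional": {"near": 0.069, "far": 0.206},
}

# Convert each entry to a fraction of Vdd.
fractions = {
    repeater: {end: volts / VDD for end, volts in ends.items()}
    for repeater, ends in peak_crosstalk_v.items()
}

for repeater, ends in fractions.items():
    for end, fraction in ends.items():
        status = "within" if fraction <= ACCEPTABLE_FRACTION else "above"
        print(f"{repeater:12s} {end:4s} end: {fraction:.1%} of Vdd ({status} limit)")
```

The far-end value for the smart repeater works out to roughly 11.8% of Vdd, consistent with the figure of about 12% quoted in the text.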
In order to demonstrate the relationship between the coupling
capacitance and the size of the delay-balanced assistant buffer,
the relative size of Ha was found for a range of coupling
capacitances, both experimentally at circuit level and at system
level. This relationship is shown in Figure 12. The
calculated and simulated values are not very close, but show
similar characteristics.
















Fig. 12. Relative Size of Ha for different Cc/Cs
V. CONCLUSIONS
We have presented a detailed Elmore delay model for the
SMART repeater and high-level sizing for the Assistant and the
Main drivers. A jitter reduction method called the delay balancing
technique is also presented. Compared with the bus
wire structure, the area of the extra gates is considerably
small. With the same maximum delay as traditional
repeaters, the SMART repeaters presented in this work show a
12% jitter reduction and an 18% reduction in peak power
consumption. We conclude that SMART repeaters consume less peak
power and reduce the data-dependent jitter, with an area penalty but
not a delay penalty. In future technologies, since transistors
are less expensive than routing area, this would probably be
an excellent replacement.
REFERENCES
[1] N. Magen, A. Kolodny, U. Weiser, and N. Shamir, “Interconnect-power
dissipation in a microprocessor,” in SLIP ’04: Proceedings of the 2004
international workshop on System level interconnect prediction, 2004,
pp. 7–13.
[2] R. Weerasekera, L.-R. Zheng, D. Pamunuwa, and H. Tenhunen, “Switch-
ing sensitive interconnect driver to combat dynamic delay in on-chip
buses,” in PATMOS 2005 Proceedings, Lecture Notes in Computer
Science, vol. LNCS 3728, September 2005, pp. 277–285.
[3] D. Pamunuwa, L.-R. Zheng, and H. Tenhunen, “Maximizing throughput
over parallel wire structures in the deep submicrometer regime,” IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 11,
no. 2, pp. 224–243, April 2003.
[4] A. B. Kahng, S. Muddu, E. Sarto, and R. Sharma, “Interconnect tuning
strategies for high-performance ICs,” in DATE ’98: Proceedings of the
conference on Design, automation and test in Europe, 1998, pp. 471–
478.
[5] T. Iima, M. Mizuno, T. Horiuchi, and M. Yamashina, “Capacitance cou-
pling immune, transient sensitive accelerator for resistive interconnect
signals of subquarter micron ULSI,” IEEE Journal of Solid-State Circuits,
vol. 31, no. 4, pp. 531–536, April 1996.
[6] P. Sotiriadis, T. Konstantakopoulos, and A. Chandrakasan, “Analysis
and implementation of charge recycling for deep sub-micron buses,”
in ISLPED ’01: Proceedings of the 2001 international symposium on
Low power electronics and design, 2001, pp. 364–369.
[7] A. Nalamalpu, S. Srinivasan, and W. P. Burleson, “Boosters for driving
long on chip interconnects-design issues, interconnect synthesis, and
comparison with repeaters,” IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, vol. 21, no. 1, pp. 50–62,
January 2002.
[8] H. Kaul and D. Sylvester, “Low-power on-chip communication based
on transition-aware global signaling (TAGS),” IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, vol. 12, no. 5, pp. 464–476,
May 2004.
[9] A. Katoch, S. Jain, and M. Meijer, “Aggressor aware repeater circuits
for improving on-chip bus performance and robustness,” European Solid-
State Circuits, 2003. ESSCIRC ’03. Conference on, pp. 261–264, Sep
2003.
[10] H.-Y. Huang and S.-L. Chen, “Interconnect accelerating techniques for
sub-100-nm gigascale systems,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 12, no. 11, pp. 1192–1200, November
2004.
[11] H. B. Bakoglu, Circuits, Interconnections and Packaging for VLSI.
Addison-Wesley, 1990.
[12] D. Pamunuwa, S. Elassaad, and H. Tenhunen, “Modeling delay and
noise in arbitrarily coupled RC trees,” IEEE Transactions on Computer-
Aided Design of Integrated Circuits and Systems, vol. 24, no. 11, pp.
1725–1739, November 2005.
[13] L.-R. Zheng, D. Pamunuwa, and H. Tenhunen, “Accurate a priori
signal integrity estimation using a dynamic interconnect model for
deep submicron VLSI design,” in European Solid-State Circuits, 2000.
ESSCIRC ’00. Conference on, 2000, pp. 324–327.
[14] P. Caputa, R. Kallsten, and C. Svensson, “Capacitive crosstalk effects on
on-chip interconnect latencies and data-rates,” in NORCHIP Conference,
2005., 2005, pp. 281 – 284.
