Minimal-power, delay-balanced smart repeaters for global interconnects in the nanometer regime. by Weerasekera, Roshan et al.
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 16, NO. 5, MAY 2008 589
Transactions Briefs
Minimal-Power, Delay-Balanced SMART Repeaters for
Global Interconnects in the Nanometer Regime
Roshan Weerasekera, Dinesh Pamunuwa, Li-Rong Zheng, and
Hannu Tenhunen
Abstract—A SMART repeater is proposed for driving capacitively-cou-
pled, global-length on-chip interconnects that alters its drive strength dy-
namically to match the relative bit pattern on the wires and thus the effec-
tive capacitive load. This is achieved by partitioning the driver into main
and assistant drivers; for a higher effective load capacitance both drivers
switch, while for a lower effective capacitance the assistant driver is quiet.
In a UMC 0.18-µm technology the potential energy saving is around 10%
and the reduction in jitter 20%, in comparison to a traditional repeater for
typical global wire lengths. It is also shown that the average energy saving
for nanometer technologies is in the range of 20% to 25%. The driver ar-
chitecture exploits the fact that as feature sizes decrease, the capacitive
load per transistor shrinks, whereas global wire loads remain relatively
unchanged. Hence, the smaller the technology, the greater the potential
saving.
Index Terms—Buffer, interconnects, nanometer design, on-chip sig-
naling, repeaters.
I. INTRODUCTION
A key technique in reducing propagation delay and signal degradation
in global on-chip interconnects is repeater insertion. Although very
effective and simple, this has an adverse effect on power consumption
and it has been estimated that over 50% of the power in a high
performance microprocessor is dissipated by repeaters charging and
discharging interconnects [1]. Further, [1] concludes that over 90%
of this power is concentrated in only 10% of the interconnects;
i.e., those which are classed as global and run for a significant
fraction of the die length.
The repeater we propose in this paper exploits the fact that in a par-
allel wire structure, the effective capacitance of a given wire is dy-
namic; i.e., it is a function not only of the physical geometry, but also
of the relative switching pattern described by the bits on the wire in
question (the victim) and the adjacent wires (aggressors). With a tradi-
tional repeater, since the drive strength is static, the result is a spread of
the propagation delay, with the repeater strength being essentially too
much for every bit pattern other than the worst-case pattern. In the pro-
posed repeater, the drive strength is dynamically altered depending on
the relative bit pattern, by partitioning it into a main driver and assistant
driver (see Fig. 1). For a higher effective load capacitance, both drivers
switch, while for a lower effective capacitance the assistant driver is
quiet [2]. By disconnecting part of the repeater when it is not needed,
the total load capacitance to the previous stage is reduced, resulting in
reduced energy consumption for those instances. It is experimentally
shown that for a UMC 0.18-µm technology the potential average saving
in energy can be as much as 10% over a traditional repeater for typical
global wire lengths in nanometer technologies. Since this SMART
repeater works by reducing the variation in the delay, an added benefit
is that the jitter is reduced. In the same technology, the jitter reduction
was as much as 20%.
Fig. 1. Basic schematic of the proposed driver scheme.
Manuscript received July 1, 2006; revised February 9, 2007, March 18, 2007,
and July 2, 2007.
R. Weerasekera, L.-R. Zheng, and H. Tenhunen are with KTH School of In-
formation and Communication Technologies (ICT), Electrum 229, 164 40 Kista,
Sweden (e-mail: roshan@kth.se).
D. Pamunuwa is at the Center for Microsystem Engineering, Lancaster Uni-
versity, Lancaster LA1 4WA, U.K.
Digital Object Identifier 10.1109/TVLSI.2008.917555
The ramifications of the dynamically changing load in coupled in-
terconnects have received a fair amount of attention in the literature.
A comprehensive analysis of design considerations for repeater inser-
tion in a bus structure with heavy coupling was presented in [3]. A
scheme proposed in [4] staggers the repeaters so that opposing tran-
sitions only persist for the length of the offset between repeaters, and
become best-case patterns for the remainder, resulting in a delay reduc-
tion. Many innovative alternatives to the traditional repeater have also
been proposed, such as the transient sensitive accelerator (TSA) [5],
charge recycling technique (CRT) [6], boosters [7], the TAGS receiver
[8], the aggressor-aware repeater [9], and the capacitor-coupled trigger
and accelerator combination [10]. Some of these use skewed inverters
to trade off noise margin for speed [5], [7], [8], while others consume
more energy [9] and occupy a larger area [5], [7], [8] to produce a faster
response.
In general, not only do these alternatives to traditional repeaters re-
quire much effort in circuit design similar to library cell design, but
they also lack a clear high-level abstraction; in contrast, performance
metrics such as delay and energy consumption can easily be quantified
in terms of a few critical design parameters for the traditional inverting
repeater [11], resulting in easy amalgamation in computer-aided design
(CAD) flows at different levels of hierarchy from initial signal planning
to detailed place and route.
A secondary advantage of the repeater circuit proposed here is that
the relatively minor increase in circuit complexity required to obtain
the energy saving and delay equalization described above can be com-
pletely abstracted in the performance analysis.
II. IMPLEMENTATION OF THE SMART DRIVER
A. Concept
In order to demonstrate the variation of effective capacitance of
wires, a pair of coupled lines is used as a constituent unit for a bus. For
two simultaneously switching lines, sixteen possible switching combi-
nations can be identified as given in Table I. These can be categorized
into five different groups according to the effective capacitance as
follows.
Group 1 Both switch in the same direction.
Group 2 Both lines are quiet (at 0 or 1).
Group 3 One line is switching while the other is quiet at 0.
Group 4 One line is switching while the other is quiet at 1.
Group 5 The lines switch in opposite directions.
TABLE I
SWITCHING ACTIVITIES ON THE LINES AND THE VARIATION
OF EFFECTIVE CAPACITANCE
In Table I, $C_{wt} = (C_s/k) + H_t(C_{d\min} + C_{g\min})$ and
$C_{ws} = (C_s/k) + H_t C_{d\min} + H_m C_{g\min}$, where $H_m$ and $H_a$
denote the sizes of the main and assistant drivers, respectively;
$H_t = H_m + H_a$; $k$ is the number of wire segments; $C_{g\min}$ and
$C_{d\min}$ are the gate capacitance and the drain diffusion capacitance
of a minimum-sized inverter; and $C_s$ and $C_c$ are the total
wire-to-ground and wire-to-wire capacitances, respectively.
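As a rough numerical illustration of the two expressions above, the following Python sketch evaluates them side by side. The driver sizes and segment count are the ones quoted later in Section IV; the capacitance values and the helper names `c_wt` and `c_ws` are placeholders assumed here for illustration and are not taken from Table II.

```python
# Sketch: the load terms C_wt and C_ws defined alongside Table I.
# All capacitance values below are illustrative placeholders (assumptions).

def c_wt(c_s, k, h_m, h_a, c_dmin, c_gmin):
    """Load term when the main and assistant drivers both switch."""
    h_t = h_m + h_a
    return c_s / k + h_t * (c_dmin + c_gmin)


def c_ws(c_s, k, h_m, h_a, c_dmin, c_gmin):
    """Load term when only the main driver switches (assistant quiet)."""
    h_t = h_m + h_a
    return c_s / k + h_t * c_dmin + h_m * c_gmin


H_M, H_A, K = 52, 104, 5   # driver sizes and segment count quoted in Section IV
C_S = 1.0e-12              # total wire-to-ground capacitance in F (assumed)
C_GMIN = 1.0e-15           # minimum-inverter gate capacitance in F (assumed)
C_DMIN = 1.0e-15           # minimum-inverter drain capacitance in F (assumed)

both = c_wt(C_S, K, H_M, H_A, C_DMIN, C_GMIN)
main_only = c_ws(C_S, K, H_M, H_A, C_DMIN, C_GMIN)

print(f"C_wt = {both:.3e} F, C_ws = {main_only:.3e} F")
print(f"difference = {both - main_only:.3e} F")
```

Whatever values are used, the two terms differ by exactly $H_a C_{g\min}$, which is the gate load removed from the previous stage when the assistant is disconnected.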
To ensure error-free operation, timing constraints have to be satisfied
for the switching patterns that cause the worst-case delay, which are the
↑↓ and ↓↑ combinations. Since the effective load is highest for these
patterns, the size of the buffer designed statically for the worst-case
delay is much larger than would be necessary for the same timing re-
quirements for other patterns [3]. Now this worst-case condition occurs
only twice out of 16 possible input switching patterns, with a proba-
bility of 1/8 for simultaneously switching lines if the transitions are
equally distributed as in a random bit stream. For the 14 other cases,
the wire is driven faster, which just translates to slack which typically
cannot be used, consuming energy unnecessarily. The driver proposed
here changes its drive strength depending on the neighbour’s switching
direction by using some simple logic.
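The pattern count behind the 1/8 figure can be checked by brute force. The sketch below enumerates every (previous, next) bit pair on two lines and sorts the 16 combinations into the five groups listed earlier; the function name `classify` and the tuple encoding are assumptions of this example, and the pattern numbering of Table I is not reproduced.

```python
from fractions import Fraction
from itertools import product


def classify(a_prev, a_next, b_prev, b_next):
    """Return the group number (1-5) for one transition on a pair of lines."""
    a_switches = a_prev != a_next
    b_switches = b_prev != b_next
    if a_switches and b_switches:
        # Both switch: same direction is Group 1, opposite is Group 5.
        return 1 if a_next == b_next else 5
    if not a_switches and not b_switches:
        return 2  # both lines quiet
    # Exactly one line switches; the group depends on the quiet line's level.
    quiet_level = b_prev if a_switches else a_prev
    return 3 if quiet_level == 0 else 4


counts = {group: 0 for group in range(1, 6)}
for a_prev, a_next, b_prev, b_next in product((0, 1), repeat=4):
    counts[classify(a_prev, a_next, b_prev, b_next)] += 1

total = sum(counts.values())
p_worst = Fraction(counts[5], total)

print(counts)
print("P(worst case) =", p_worst)
```

Under the equal-likelihood assumption stated in the text, Group 5 accounts for 2 of the 16 combinations, which is the 1/8 probability quoted above.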
The other useful feature in the SMART driver is its ability to re-
duce jitter while saving energy. The SMART driver achieves this energy
saving by delaying the response for the best-case without affecting the
worst-case (delay-balancing), so that the variation in delay is as small
as possible [2], [12]. This is illustrated in Fig. 2, where the curves with
solid lines represent the output response of a conventional driver for
minimum effective capacitance (best-case) and maximum effective ca-
pacitance (worst-case).
We propose this SMART driver circuit for a regular bus structure, in
which a middle wire has at least two aggressors. This is handled by
adding an extra assistant driver for each extra aggressor, as in our
previous work [2]. A unified analytic optimization algorithm is not
derived for this case, but an empirical sizing methodology can easily be
obtained. In terms of the hardware, there is very little added
complexity in the control logic. Similarly, irregularly spaced aggressors
do not impose any special problems, because we can merely stagger the
spacing of repeaters appropriately, so that long extra control signals are
not necessary.
Fig. 2. Jitter reduction using SMART driver.
B. Circuit Realization
The implementation has been carried out in a UMC 0.18-µm CMOS
technology, with a VDD of 1.8 V. All simulations are carried out using
Cadence Spectre.
In the implementation, a decision is made prior to the next transition
about whether or not it constitutes a worst-case pattern. This decision
is based on the relative logic values of the aggressor and the victim
at the current time. Since the assistant driver needs to switch on for
the worst-case patterns described in Group 5 in Table I, any time the
present state has opposing logic values on the victim and aggressor, the
assistant is turned on. This actually turns the assistant on for two other
patterns which are not worst-case, namely patterns 10 and 12 in Table I,
which reduces the energy saving from the theoretical maximum, but
allows a robust and fairly simple circuit implementation.
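The effect of deciding from the present state alone can be seen by enumerating the victim's transitions. The following sketch is a behavioral model only, not the circuit of Fig. 3; the function name `assistant_enabled` and the tuple encoding are assumptions of this example.

```python
from itertools import product


def assistant_enabled(victim_now, aggressor_now):
    """Enable decision taken from the present state: opposing logic values."""
    return victim_now != aggressor_now


# Collect every victim transition that is driven by main + assistant.
assisted = []
for v_now, v_next, a_now, a_next in product((0, 1), repeat=4):
    victim_switches = v_now != v_next
    if victim_switches and assistant_enabled(v_now, a_now):
        aggressor_switches = a_now != a_next
        # Worst case: both switch and end at opposite values (Group 5).
        worst_case = aggressor_switches and v_next != a_next
        assisted.append(((v_now, v_next), (a_now, a_next), worst_case))

for victim, aggressor, worst_case in assisted:
    label = "worst case" if worst_case else "not worst case"
    print("victim", victim, "aggressor", aggressor, label)
```

In this model the assistant takes part in four victim transitions: the two opposite-direction patterns and two in which the aggressor stays quiet at the opposite level. The latter two are presumably the extra patterns the text refers to.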
The complete schematic is shown in Fig. 3. The transistors Pa and
Na form the assistant driver, whereas the inverter I1 is the main driver.
Two transmission gates (TGp and TGn) drive the pull-up and pull-down
networks of the assistant driver. The weak transistors Pk and Nk
act as keepers, ensuring that the assistant driver is turned off properly
when the corresponding transmission gate is disabled.
The propagation delay of the selection logic, Tlogic, is designed to
be greater than or equal to the delay of the main driver (from node
Victim In to node Victim Out) so that node Victim Out is able
to change before Pa or Na change.
Figs. 4 and 5 show the simulation results at the far end of a
2.5-mm-long wire driven by a SMART repeater and a traditional
repeater (inverter). The waveforms show the delay equalization for
different switching patterns taking place.
Along with the delay performance, the noise resilience of the pro-
posed driver is of paramount importance. In our implementation, we
avoid the use of skewed inverters while using complementary logic
with a switching threshold of $V_{DD}/2$ throughout the control circuitry.
The only exception is the transmission gate pair, which is protected by
keepers. Hence, a preliminary analysis reveals a relatively high noise
margin for the circuit. Nevertheless, a more comprehensive noise anal-
ysis is earmarked for future work.
III. ENERGY MODEL OF THE SMART DRIVER
1) Dynamic Energy: If all switching events are random uniformly
distributed events with no correlation between neighboring lines, the
average energy dissipation per transition for wire i can be obtained by
averaging out the dynamic energy consumption for each pattern. Then
Fig. 3. Circuit schematic of the proposed SMART driver and the selector logic.
(a) Driver circuit. (b) Gate-Level schematic of selector logic. (c) Transistor-level
schematic of selector logic.
Fig. 4. Waveforms when the aggressor and victim switch in the same direction
with only the main driver being active.
Fig. 5. Waveforms when the aggressor and victim switch in opposite directions
with both drivers active.










TABLE II
BUFFER AND WIRE PARAMETERS FOR VARIOUS FUTURE TECHNOLOGIES
BASED ON ITRS [15] PROJECTIONS AND [16]. WIRE ELECTRICAL
PARAMETERS WERE OBTAINED USING FORMULA GIVEN IN [17]








The dynamic energy consumption of the selection logic is found by
estimating the total effective load capacitance, including the parasitic
capacitances of all the gates, and multiplying it by $V_{DD}^2/2$ and the
activity factor.
2) Short Circuit Energy: Assuming the short-circuit current spike
is a triangle with a peak Ipeak, and a base tsc, the short-circuit energy











$$t_{sc} = 1.1\,[R_d(C_d + C_g + C_w) + R_w C_g + 0.4 R_w C_w] \tag{5}$$
with $R_d$ being the device resistance averaged over the switching range
during which the short-circuit current flows, and $V_{gs}$ being the
source-to-gate voltage of the MOS transistor. It is assumed that the peak
current occurs in the middle of the transition and hence $V_{gs} \approx V_{DD}/2$.
In modeling the short circuit power consumed in the selector logic,
the series connected pMOS/nMOS combination is represented by an
equivalent single pMOS/nMOS device for the purpose of computing
the driving resistance. This resistance is multiplied by the load
capacitance to obtain $t_{sc}$, which is
$$t_{sc,\mathrm{gate}} \approx R_{gout}(C_{dout} + C_{gin}) \tag{6}$$
where $R_{gout}$ is the equivalent output resistance of the gate, $C_{dout}$ is
the output capacitance, and $C_{gin}$ is the fan-out capacitance.
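To make the triangular-pulse model concrete, the sketch below evaluates the base width from (5) and then the energy of a triangular current pulse drawn from the supply. The closed form used for the energy (supply voltage times the triangle area) follows from the stated triangle assumption and may differ in detail from the expression in the original; the function names and all numerical values except the 1.8-V supply are placeholders assumed for this example.

```python
def t_sc(r_d, c_d, c_g, c_w, r_w):
    """Base width of the short-circuit current triangle, following (5)."""
    return 1.1 * (r_d * (c_d + c_g + c_w) + r_w * c_g + 0.4 * r_w * c_w)


def e_sc_triangle(i_peak, t_base, v_dd):
    """Energy drawn from the supply by a triangular current pulse.

    The charge is the triangle area (0.5 * i_peak * t_base) and the energy
    is v_dd times that charge. This closed form is an assumption here.
    """
    return 0.5 * i_peak * t_base * v_dd


# Illustrative placeholder values (assumptions, not Table II data).
R_D, R_W = 100.0, 200.0                    # ohms
C_D, C_G, C_W = 50e-15, 50e-15, 400e-15    # farads
I_PEAK = 1.0e-3                            # amperes
V_DD = 1.8                                 # volts, supply used in Section II-B

t_base = t_sc(R_D, C_D, C_G, C_W, R_W)
energy = e_sc_triangle(I_PEAK, t_base, V_DD)

print(f"t_sc = {t_base:.3e} s")
print(f"E_sc = {energy:.3e} J")
```

The same two functions can be reused for a gate in the selector logic by passing the product in (6) as the base width instead of calling `t_sc`.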
3) Leakage Energy: The average leakage energy of a MOS tran-









Fig. 6. Variation of energy per transition with the size of assistant driver.
TABLE III
ENERGY DISSIPATION FOR EACH SWITCHING GROUP ($H_a = 104$)
Fig. 7. Delay variation with the size of assistant driver.
IV. ENERGY MODEL VALIDATION
The energy models derived in Section III are validated for the UMC
0.18-µm implementation by running simulations for a wire length of
10 mm. The relevant parameters for this technology node are given in
column one of Table II. For the traditional inverting repeater, the pa-
rameters of repeater size (Ht = 156) and number of repeaters (k = 5)
for minimizing delay are obtained from the well-known methodology
described in [3]. Applying the general methodology described in the
Appendix for optimizing the SMART driver, the parameters Hm = 52,
Ha = 104, and k = 5 are obtained.
The simulations show that the energy models derived for the tradi-
tional and SMART repeaters are accurate to within 95% of their sim-
ulated values as evidenced in Fig. 6 and summarized in Table III. As
predicted by the model, increasing the size of the assistant driver will
increase the energy saving, although at the cost of increased delay, if
the size is increased beyond the optimal (refer to Fig. 7).
It is evident from Table III that the energy loss introduced by the
extra selection logic for switching patterns in Group 5, where both the
assistant and main drivers switch, is more than offset by the energy
saving for those patterns in Groups 1, 3, and 4, where the assistant
does not switch. On average, assuming equally likely occurrences of
all patterns, the total energy saving is around 10%.
TABLE IV
MAXIMUM CROSSTALK ON A QUIET LINE
TABLE V
ENERGY SAVING FOR FUTURE GENERATIONS
There is a slight increase in the peak crosstalk voltage with a SMART
driver compared to a traditional driver, but the peak crosstalk at the
far end is about 12% of VDD, which is within the normally acceptable
limit of 20% of VDD. It can be seen from Table IV that the near-end
crosstalk with the SMART repeater is doubled compared to a traditional
repeater, but it still amounts to only 0.1% of VDD.
V. IMPACT OF TECHNOLOGY SCALING
In this section, the potential of the SMART driver to save energy
in future technology nodes is investigated. As the feature size
decreases, the short circuit energy increases fairly sharply, which
adversely affects the energy saving due to the fact that the SMART
driver has a few transistors in the selector logic. However, this is
offset to some degree due to the relative decrease in area and the
associated dynamic energy consumption of the selection logic in
comparison to the driving inverters. Since global wires are scaled
selectively, the wire parasitics remain approximately the same, or
are worse, and the driving transistors see no reduction in size [15].
In contrast, the selection logic can be implemented with minimum
sized transistors and the dynamic energy consumed becomes truly
negligible. An analysis was carried out using ITRS predictions
to derive the relevant technology parameters, as summarized in
columns 2–6 in Table II. The predicted total average energy saving
in driving global length wires is shown in Table V, highlighting
the usefulness of the SMART driver right up to the 32-nm node.
VI. CONCLUSION
In this paper, we addressed the issue of reducing energy consumption
by exploiting the switching-pattern-dependent delay of
repeater-inserted global wires. The proposed circuit was implemented in a
UMC 0.18-µm CMOS technology and tested for proof of concept. The
average energy saving was shown to be around 10%, and the jitter
reduction to be 20% for a data rate of 1 GB/s.
A comprehensive delay and energy analysis was presented,
including a design methodology to obtain the optimal repeater config-
urations for minimizing delay while also minimizing jitter. Further,
as processes scale, the selector latency shrinks, and higher data rates
can be achieved. The total energy saving that can be achieved by the
SMART driver in future nanometer technologies is found to be in the
range of 20%–25%.
APPENDIX
DELAY-BALANCED DRIVER SIZING
The methodology for delay-balanced driver sizing for the SMART
driver is described extensively in our previous work [12]. The delay
analysis for SMART repeater insertion uses the characterization of
a minimum-sized repeater in terms of an output resistance $R_{d\min}$,
input gate capacitance $C_{g\min}$, and output drain-diffusion capacitance
$C_{d\min}$. When both the main and assistant drivers are switching,
$R_d = R_{d\min}/(H_m + H_a)$ and $C_g = C_{g\min}(H_m + H_a)$. An
expression can be derived for the associated delay $T_{MA}$ by using the
Elmore delay as in [13]. When the assistant driver is quiet, the driver
resistance changes to $R_d = R_{d\min}/H_m$, and the gate capacitance
to $C_g = H_m C_{g\min}$. Thus, the delay expression ($T_M$) when the
assistant is quiet is found. Since the assistant driver switches only for
the worst-case switching pattern defined by Group 5 in Table I, the
size of the assistant driver $H_a$ can be used to tune the delays for the
other switching combinations defined by Groups 1, 3, and 4. The delay
variation can be quantified as
$$\Delta T = T_{MA} - T_M.$$
By setting $\Delta T = 0$, delay balancing can be achieved. Using the
relation $H_{mDB} = H_t - H_{aDB}$, a quadratic equation for $H_{aDB}$ can be
obtained, the solution to which gives the delay-balanced assistant driver
size.
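The balancing step can be sketched numerically as follows. This is only an illustration under stated assumptions, not the procedure of [12]: the Elmore-type stage delay with coefficients 0.7 and 0.4, the choice of effective wire capacitances `C_WORST` and `C_OTHER`, the function names, and every parameter value are hypothetical. With this assumed delay form, setting the two delays equal gives a quadratic in the main driver size, solved here in closed form.

```python
import math


def stage_delay(h_drive, h_load, h_t, c_wire, r_w, r_dmin, c_dmin, c_gmin):
    """Assumed Elmore-type delay of one repeater stage (not from the paper)."""
    r_d = r_dmin / h_drive          # resistance of the drivers that switch
    c_g = h_load * c_gmin           # gate load presented by the next stage
    return (0.7 * r_d * (h_t * c_dmin + c_wire + c_g)
            + r_w * (0.4 * c_wire + 0.7 * c_g))


def balanced_sizes(h_t, c_worst, c_other, r_w, r_dmin, c_dmin, c_gmin):
    """Split h_t into (h_m, h_a) so that T_MA equals T_M in the assumed model."""
    if c_other >= c_worst:
        raise ValueError("c_other must be smaller than c_worst")
    t_ma = stage_delay(h_t, h_t, h_t, c_worst, r_w, r_dmin, c_dmin, c_gmin)
    # T_M(h_m) = a / h_m + c + b * h_m, so T_M = T_MA becomes
    # b * h_m**2 + (c - T_MA) * h_m + a = 0.
    a = 0.7 * r_dmin * (h_t * c_dmin + c_other)
    b = 0.7 * r_w * c_gmin
    c = 0.7 * r_dmin * c_gmin + 0.4 * r_w * c_other
    disc = (t_ma - c) ** 2 - 4.0 * a * b
    # Smaller root (h_m < h_t), written in a cancellation-free form.
    h_m = 2.0 * a / ((t_ma - c) + math.sqrt(disc))
    return h_m, h_t - h_m


# Hypothetical per-segment values chosen only to exercise the code.
H_T = 156                       # total driver size quoted in Section IV
R_W = 200.0                     # wire resistance per segment in ohms (assumed)
C_WORST = 600e-15               # effective wire load, worst case (assumed)
C_OTHER = 400e-15               # effective wire load, reference case (assumed)
R_DMIN, C_DMIN, C_GMIN = 10e3, 1e-15, 1e-15   # minimum inverter (assumed)

h_m, h_a = balanced_sizes(H_T, C_WORST, C_OTHER, R_W, R_DMIN, C_DMIN, C_GMIN)
t_ma = stage_delay(H_T, H_T, H_T, C_WORST, R_W, R_DMIN, C_DMIN, C_GMIN)
t_m = stage_delay(h_m, h_m, H_T, C_OTHER, R_W, R_DMIN, C_DMIN, C_GMIN)

print(f"H_m = {h_m:.1f}, H_a = {h_a:.1f}")
print(f"T_MA = {t_ma:.3e} s, T_M = {t_m:.3e} s")
```

The smaller root is taken because the main driver must be smaller than the total size; the sizes reported in Section IV come from the authors' own delay model and technology data, so the numbers produced here are not expected to match them.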
REFERENCES
[1] N. Magen, A. Kolodny, U. Weiser, and N. Shamir, “Interconnect-power
dissipation in a microprocessor,” in Proc. Int. Workshop Syst. Level
Interconnect Prediction (SLIP), 2004, pp. 7–13.
[2] R. Weerasekera, L.-R. Zheng, D. Pamunuwa, and H. Tenhunen,
“Switching sensitive interconnect driver to combat dynamic delay in
on-chip buses,” in Proc. Lecture Notes Comput. Sci. (PATMOS), Sep.
2005, vol. 3728, pp. 277–285.
[3] D. Pamunuwa, L.-R. Zheng, and H. Tenhunen, “Maximizing
throughput over parallel wire structures in the deep submicrom-
eter regime,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol.
11, no. 2, pp. 224–243, Apr. 2003.
[4] A. B. Kahng, S. Muddu, E. Sarto, and R. Sharma, “Interconnect tuning
strategies for high-performance ICS,” in Proc. Conf. Des., Autom. Test
Eur. (DATE), 1998, pp. 471–478.
[5] T. Iima, M. Mizuno, T. Horiuchi, and M. Yamashina, “Capacitance
coupling immune, transient sensitive accelerator for resistive intercon-
nect signals of subquarter micron ULSI,” IEEE J. Solid-State Circuits,
vol. 31, no. 4, pp. 531–536, Apr. 1996.
[6] P. Sotiriadis, T. Konstantakopoulos, and A. Chandrakasan, “Analysis
and implementation of charge recycling for deep sub-micron buses,”
in Proc. Int. Symp. Low Power Electron. Des. (ISLPED), 2001, pp.
364–369.
[7] A. Nalamalpu, S. Srinivasan, and W. P. Burleson, “Boosters for
driving long on chip interconnects-design issues, interconnect syn-
thesis, and comparison with repeaters,” IEEE Trans. Comput.-Aided
Des. Integr. Circuits Syst., vol. 21, no. 1, pp. 50–62, Jan. 2002.
[8] H. Kaul and D. Sylvester, “Low-power on-chip communication based
on transition-aware global signaling (TAGS),” IEEE Trans. Very Large
Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 464–476, May 2004.
[9] A. Katoch, S. Jain, and M. Meijer, “Aggressor aware repeater circuits
for improving on-chip bus performance and robustness,” in Eur. Solid-
State Circuits Conf. (EUSCIRC), Sep. 2003, pp. 261–264.
[10] H.-Y. Huang and S.-L. Chen, “Interconnect accelerating techniques for
sub-100-nm gigascale systems,” IEEE Trans. Very Large Scale Integr.
(VLSI) Syst., vol. 12, no. 11, pp. 1192–1200, Nov. 2004.
[11] H. B. Bakoglu, Circuits, Interconnections and Packaging for VLSI.
New York: Addison-Wesley, 1990.
[12] R. Weerasekera, D. Pamunuwa, L.-R. Zheng, and H. Tenhunen, “Min-
imal-power, delay-balanced smart repeaters for interconnects in the
nanometer regime,” in Proc. International Workshop Syst.-Level Inter-
connect Prediction (SLIP), 2006, pp. 113–120.
[13] T. Sakurai and A. Newton, “Alpha-power law MOSFET model and
its applications to CMOS inverter delay and other formulas,” IEEE J.
Solid-State Circuits, vol. 25, no. 2, pp. 584–594, Apr. 1990.
[14] D. Sylvester, W. Jiang, and K. Keutzer, “Berkeley advanced chip per-
formance calculator,” [Online]. Available: http://www.eccs.umich.edu/
dennis/bacpac/
[15] Semiconductor Corp., “The international technology roadmap for
semiconductors (ITRS),” 2003 [Online]. Available: http://www.itrs.
net/Links/2003ITRS/Home2003.htm
[16] W. Zhao and Y. Cao, “New generation of predictive technology model
for sub-45 nm design exploration,” in Proc. 7th Int. Symp. Quality Elec-
tron. Des. (ISQED), 2006, pp. 585–590.
[17] L.-R. Zheng, D. Pamunuwa, and H. Tenhunen, “Accurate a priori signal
integrity estimation using a dynamic interconnect model for deep
submicron VLSI design,” in Proc. Conf. Euro. Solid-State Circuits
(ESSCIRC), 2000, pp. 324–327.
