On-Chip Self-Calibrating Communication Techniques Robust to Electrical Parameter Variations by Worm, Frédéric et al.
Self-Calibration Techniques
524 0740-7475/04/$20.00 © 2004 IEEE Copublished by the IEEE CS and the IEEE CASS IEEE Design & Test of Computers
RECENT SILICON TECHNOLOGIES used for SoCs are
increasingly different from original CMOS, which was an
almost ideal digital technology. Second-order effects of
different sorts, such as capacitive or inductive crosstalk,
have become more critical because of a wealth of physi-
cal phenomena, including 3D and quantum effects.
Similarly, ever-decreasing geometries are widening the
spread of electrical parameters linked to imprecision in
the manufacturing process. All these factors contribute
signiﬁcantly to design process complexity. Without a dra-
matic, unforeseen change in circuit and manufacturing
technology, this situation can only get worse.
The adopted design methodologies have always
been based on worst-case design approaches. Gate and
interconnect delays are a typical example. Circuits
clock registers only when data is sure to be stable, and
designers use worst-case analysis—as determined, for
example, by static timing analyzers—to achieve this tim-
ing estimate. Designers simply model all sources of devi-
ation from the nominal situation and total them to
determine the most conservative estimate of the
incurred delay.
Observed trends in worst-case analysis for current
design methodologies could invalidate
the beneﬁts of faster, scaled-down semi-
conductor technologies. As a result, large
capital investments in deep-submicron
silicon fabrication might not return com-
petitive chips. Worst-case design will
show diminishing returns in speed as
designers scale down devices and supply
voltages. The complex interaction of several physical
factors will become increasingly hard to model
accurately, pushing designers toward ever more
conservative assumptions. Although some research
aims to improve the accuracy of worst-case static tim-
ing estimations,1 a more radical approach is needed.
Otherwise, there will be a heavy price to pay, mostly
in terms of energy consumption, even as power savings
become a primary goal in many SoC applications.
Figure 1 illustrates the point with a simple qualitative
example. Recall that accurate knowledge of the delay
and voltage relation is key for many optimization tech-
niques, such as transistor sizing and dynamic voltage scal-
ing. The nominal relation between delay and supply
voltage is modiﬁed by several physical phenomena,
whose cumulative effects constitute a worst-case relation.
Therefore, at a given supply voltage, VDD, a designer will
assume the most conservative delay—that is, that the
operating point is not, for instance, A but B—and imple-
ment the design accordingly. However, at a particular
instant, the device is likely to be operating under far more
favorable conditions—for instance, with a lower delay
indicated by operating point C. This implies a waste of
energy because operation at reduced voltage VDD′ (B′)
would yield the actual performance the system was
designed for. A less conservative operation in B′ rather
than B would achieve the same user function in the same
time with potentially significant energy savings, roughly
proportional to the difference of the squares of the
supply voltages VDD and VDD′.
Editor’s note:
Dynamic self-calibration holds the promise of overcoming conservative
worst-case design techniques needed to combat deep-submicron process and
operating variations. This article proposes an on-chip point-to-point
interconnect scheme characterized by self-calibration that can operate
dynamically to achieve the best energy/performance trade-off.
Soha Hassoun, Tufts University
Frédéric Worm, Paolo Ienne, and Patrick Thiran
Swiss Federal Institute of Technology Lausanne
Giovanni De Micheli
Stanford University
Self-calibrating circuits
Tolerance for process, voltage, and
temperature (PVT) variations is becom-
ing increasingly important.2 To achieve
aggressive circuits that exploit the fea-
tures of expensive downscaled tech-
nologies, we propose designing
self-calibrating circuits that break the
worst-case barrier. Such circuits deter-
mine operational parameters, such as
voltage, at runtime to meet overall relia-
bility constraints. In other words, these
parameters ensure that the number of
data and timing errors is bounded and
that most can be corrected. Because all factors that
reduce circuit performance could combine to realize
the worst-case situation, self-calibrating circuits must
still be conservatively overdesigned to withstand this
possibility. However, we want to avoid paying the price
(typically, in wasted energy) for such conservative
designs in the general case.
Of course, using adaptive design techniques in
extremely aggressive designs is not new; in some situa-
tions it is commonplace. Researchers and practitioners
have already gone far. For instance, in a state-of-the-art
commercial processor, regional clock skew is adaptive-
ly tuned at power-up using relatively complex controllers
to compensate for local process variations across a sin-
gle die.3 Nonetheless, it is not common to use powerful
digital controllers, such as complex finite-state
machines, to adjust the operating point of transistors
when the overall design might be jeopardized, or while
the circuit is operating. Designers generally use tight ana-
log feedback loops (phase-locked loops or delay-locked
loops). We believe that digital controllers can be effec-
tive in certain limited but important situations.
We see a strong potential for applying small synthesizable
digital controllers in applications in which
• it’s possible to trade robustness for energy (for example,
in Figure 1 an investment of energy guarantees
correct operation under all conditions);
• it’s possible to check, with low overhead, whether
the system is operating correctly and, if not, to operate
the circuit under different conditions; and
• the application has some intrinsic tolerance to limited
latency deviations (as in modern communication
systems and memory hierarchies).
This article describes an on-chip point-to-point inter-
connect scheme that permits on-line self-calibration to
achieve the best energy/performance trade-off. We
designed the scheme to recover from the occasional
choice of an overly aggressive value for the operating
point at which the interconnect, in fact, does not oper-
ate correctly—or at all. 
The idea of operating CMOS devices at voltages
below the worst-case characterization point—and
thus in subcritical regions where errors might occur—
has received little attention. A recent article
addressed the possibility of exploiting devices in sub-
critical regions for DSP.4 In that case, DSP algorithms
compensate for errors arising from subcritical volt-
ages. Our goal is similar, but it concerns communica-
tion rather than computation; therefore, we can
exploit classic techniques to achieve correct behav-
ior despite occasional errors—as the second condi-
tion in our previous list requires.
525November–December 2004
[Figure 1: plot of delay versus supply voltage with worst-case, nominal, and actual curves; marked points A, B, B′, C, and X′ to X′′′′; supply voltages VDD′ and VDD.]
Figure 1. Delay and voltage relation for nominal, actual, and worst-case
design. The worst-case design typically wastes resources, usually silicon
area and, more critically, energy. Traditional dynamic voltage-scaling
techniques would select only points such as X′ to X′′′′.
On-chip interconnects
The successful design of highly complex SoCs
depends on the availability of robust design method-
ologies that allow a short time to market with low risk.
Faced with the need to integrate billions of transistors
on a single chip, design technologies are under increas-
ing pressure.
Designing such SoCs is possible only by using com-
plex components such as full subsystems with proces-
sors, controllers, and digital signal processors as major
predeﬁned building blocks. Therefore, because of the
difﬁculty of global synchronization, we can view mod-
ern SoCs as heterogeneous multiprocessing systems with
multiple, possibly asynchronous, timing references.
Given a library of modular components, designers’ main
challenge for future SoCs will be to efﬁciently connect
such components into an effective network that imple-
ments the desired functionality. On-chip micronetworks,
or networks on a chip,5 will become the central focus of
the design process and will inherit techniques such as
layered design and packetized communications and
methodologies from today’s macronetworks.
In our discussion of long-distance on-chip VLSI interconnects
(informally called buses), we focus, without
loss of generality, on three objectives:
• Performance requirements. A bus implementing a
communication link should provide enough bandwidth
to support the required communication
demand. This demand might not be precisely known
in the early design stages. Additionally, we must recognize
that a bus’s workload can change dynamically,
meaning its bandwidth needn’t always be kept at
its peak. Therefore, dynamically adjusting bus bandwidth
can greatly enhance design versatility.
• Energy consumption. Studies have shown that wires
account for a significant portion of total energy consumption
(40% to 50%).6 A large share of this consumption
results from long, high-capacity wires
crossing the die and connecting different subsystems.
With larger dies and more subsystems on a
chip, the proportion of power consumed by communication
can only grow. Obviously, we need techniques
to reduce the energy consumed by on-chip
communication.
• Reliability and noise sensitivity. We already mentioned
that many technological factors challenge the
traditional robustness of digital CMOS design, and
functionality depends on phenomena that are
increasingly difficult to model. This conflicts
dramatically with the fact that the best way to
achieve low-energy communication is to use small
voltage swings, but at the cost of further decreasing a
circuit’s noise immunity. Design methodologies for
interconnects must account for growing noise sensitivity
and indeterminacy.
A common technique for minimizing power con-
sumption on buses is to choose an appropriate encoding
scheme that reduces switching activity without affecting
the signal information content. This approach accounts
for interwire capacitances7 and has recently been extend-
ed to address reliability issues.8 Bus encoding techniques
have proved effective at reducing power consumption,
although best results are generally obtained in speciﬁc
devices, such as address buses. In fact, energy-efﬁcient
encoding complements our scheme.
As Figure 1 already suggests, the classic way to
reduce power consumption is to use a lower supply volt-
age, and for interconnects and buses in particular, low-
swing signaling techniques.9,10 Although very effective
on the power side, these techniques alone signiﬁcantly
compromise a design’s robustness. Instead of helping
designers address new deep-submicron effects, they fur-
ther complicate the design process. Our proposed
scheme uses low-swing communication judiciously
while ensuring that the system’s overall reliability does
not decline but, on the contrary, increases.
Like low-swing techniques, the well-established and
effective technique called dynamic voltage scaling
(DVS) reduces power consumption in systems under
given performance constraints.11 Its most common
application is for dynamically adapting mobile-proces-
sor speeds to current computational requirements, and
several commercial processors (Intel XScale, Mobile
Pentium, and Transmeta Crusoe) now support it. DVS
is based on the characterization of devices at several
different working points (pairs of supply voltages and
operating frequencies). These pairs correspond to a set
of safe operating conditions computed or measured
while accounting for all worst-case parameters—for
instance, points X′ to X′′′′ in Figure 1.
Shang et al. introduced a transmission scheme apply-
ing DVS to chip-to-chip interconnection networks.12
Such a system is a direct extension of processor voltage
scaling and assumes the knowledge of a ﬁxed relation
between voltage and frequency for safe operation. Our
communication scheme similarly extends the idea of
DVS to on-chip communications in the form of variable
voltage-swing signaling, but in the spirit of our self-cali-
bration idea, it doesn’t rely on prior knowledge of
robust working points.
Self-calibrating architecture
For simplicity, we focus on a typical unidirectional
point-to-point interconnect between subsystems. Figure
2a shows a qualitative view of the classic interconnect.
At the producer end, a FIFO or similar buffer decouples
the two subsystems, which might operate at different
frequencies, and a large driver (typically a chain of
appropriately sized inverters) charges or discharges the
large capacitance represented by the interconnecting
wires. A receiver (typically a CMOS gate) compares the
voltage level of the line to a threshold.
As Figure 2b shows, we add a few elements to the
classic scheme. To reduce the energy consumed per bit,
we apply a form of DVS to the interconnect by dynami-
cally controlling the driver swing and the correspond-
ing receiver threshold. There are well-known electrical
schemes to reduce the interconnect’s voltage swing. Of
course, the variable voltage swing affects the speed at
which the interconnect driver can charge or discharge
the load capacitance; thus, lower swings reduce the
maximum reliable operating frequency. Hence, we
need to adapt the communication speed, too, as in tra-
ditional DVS techniques.
Our architecture is seamlessly applicable to seg-
mented buses. In such cases, we can use the same volt-
age swing along all segments because every repeater
consists only of an inverter supplied at voltage Vch. Later
in this article, we report conservative results that con-
sider the energy spent only on the interconnect wires.
In reality, the repeaters draw additional energy, which
also scales down with our technique.
Operating with lower voltage swings makes our com-
munication more sensitive to several noise sources. To
cancel this effect, we introduce error detection encod-
ing at the word level on the source side, and we imple-
ment a typical automatic repeat request (ARQ) strategy,
namely Go-Back-N.13 The ARQ strategy entails small
latency variations. Although hard real-time applications
might suffer from these variations, many practical
embedded systems can tolerate them because of their
softer real-time constraints. Finally, our scheme requires
a self-calibration controller that decides on the operat-
ing frequency and voltage swing. This controller must
choose voltage/frequency pairs from a set of safe oper-
ating points and as a function of the requested band-
width. It must also explore the design space to discover
safe and lowest-power operating points. Therefore, it
needs as an input some information on both bandwidth
requirements and link reliability.
In summary, our system uses variable frequency and
voltage swing to trade off speed for energy, implements
error detection and ARQ to guarantee reliable commu-
nication, and exploits a variable relation between oper-
ating frequency and voltage swing to ﬁnd the best safe
operating point under current environmental condi-
tions by monitoring the error rate.
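The latency cost of the Go-Back-N strategy can be illustrated with a small behavioral model. The Python sketch below is not the ARQ controller of this article. It assumes a hypothetical pipeline depth n (the number of words sent before a negative acknowledgment reaches the sender) and a caller-supplied function corrupted that marks which transmissions the decoder rejects; both names are illustrative only.

```python
def go_back_n(words, n, corrupted):
    """Behavioral model of Go-Back-N over a link with error detection.

    words:     list of data words to deliver in order
    n:         assumed pipeline depth; the nack for a bad word reaches the
               sender only after n - 1 further words have been sent
    corrupted: hypothetical callback, corrupted(t) -> True if the t-th
               transmission is rejected by the decoder
    Returns (delivered_words, number_of_transmissions).
    """
    delivered = []
    transmissions = 0
    i = 0  # index of the next word the receiver expects
    while i < len(words):
        bad = corrupted(transmissions)
        transmissions += 1
        if not bad:
            delivered.append(words[i])
            i += 1
            continue
        # Words sent after the bad one are discarded by the receiver,
        # so they cost time and energy but deliver nothing.
        in_flight = min(n - 1, len(words) - 1 - i)
        transmissions += in_flight
        # The sender then goes back to word i and resends from there.
    return delivered, transmissions


words = list(range(10))
delivered, sent = go_back_n(words, n=4, corrupted=lambda t: t == 2)
print(delivered == words, sent)
```

In this model every detected error adds up to n transmissions but never corrupts the delivered sequence, which is the kind of small latency variation the text refers to. The loop does not terminate if corrupted always returns True, so a real controller would need the operating point to be relaxed in that case.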
Challenges of self-calibration
Making the system robust under the expected
extreme conditions entails several challenges. The main
point is that we are not trying to screen out and remove
some relatively infrequent errors, as error detection
codes and ARQ protocols do. On the contrary, we try to
operate the system as close as possible to the point at
which it becomes nonoperational. In a sense, we push
our system to explore the operating space, so that at
times it actually becomes nonoperational.
Figure 3 shows a more practical view of our system.
It represents in greater detail the idea illustrated in Figure
2b, with the addition of some necessary components.
Channel bit-error modeling
Worm et al. have discussed several system modeling
issues.14 The issue most relevant to achieving self-calibration
is the availability of a reliable error model as a
function of the voltage swing and the transmission frequency.
[Figure 2: block diagrams (a) and (b). Labels: FIFO, encoder, decoder, controller, traffic, errors, VDD, Vch, Fch.]
Figure 2. The basic idea of a self-calibrating, point-to-point,
unidirectional on-chip interconnect: the classic static
scheme, with a FIFO buffer to decouple the two subsystems
(a); the proposed self-calibrating scheme, with the elements
needed to achieve the desired goals (b).
We consider two possible sources of errors, or noise.
The ﬁrst is an additive white Gaussian noise, modeling
external disturbances. The second noise source cap-
tures the variability of the channel cutoff frequency
around its nominal value, representing the effects of
temperature, manufacturing conditions, and so forth,
on the propagation delay through the interconnect. We
assume these two noise sources are uncorrelated. We
further assume that an error occurs if the operating fre-
quency exceeds the channel cutoff frequency or the
additive noise exceeds half the voltage swing. Although
external disturbances are more accurately modeled as
burst noise, a white-noise model suffices to prove our
concept. Note that the operating-point control policy
doesn’t rely on any assumption about the noise model,
which serves only to generate random bit errors in our
experiments.
With these assumptions, we can derive a relation to
express the probability of errors on a single line as a
function of the voltage swing and the transmission fre-
quency. At a given voltage, for transmission frequencies
below the channel cutoff frequency, the probability of
error is not 0 but extremely small. Conversely, for very
high transmission frequencies, the same probability is
practically 1.
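One possible reading of this error model can be written down as a short Python sketch. It is an assumption-laden illustration rather than the analytical model of reference 14: it treats the cutoff frequency as Gaussian, treats the additive-noise condition as one-sided (noise larger than half the swing), and takes its default parameters from the values listed in the "Simulation results" section. The function names are hypothetical.

```python
import math


def gaussian_tail(x):
    """P(Z > x) for a standard normal variable Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bit_error_probability(v_swing, f_op, sigma_noise=0.1,
                          f_cut_mean=500e6, f_cut_sigma=36e6):
    """Per-line error probability at swing v_swing (V) and frequency f_op (Hz)."""
    # Timing error: the actual cutoff frequency falls below f_op.
    p_timing = gaussian_tail((f_cut_mean - f_op) / f_cut_sigma)
    # Amplitude error: additive noise exceeds half the voltage swing
    # (one-sided condition, an assumption of this sketch).
    p_noise = gaussian_tail(v_swing / (2.0 * sigma_noise))
    # The two sources are assumed uncorrelated, so an error occurs
    # unless both events are avoided.
    return 1.0 - (1.0 - p_timing) * (1.0 - p_noise)


print(bit_error_probability(1.5, 250e6))  # far below the cutoff: tiny but nonzero
print(bit_error_probability(1.5, 900e6))  # far above the cutoff: close to 1
print(bit_error_probability(0.5, 250e6))  # low swing: noise term dominates
```

Under these assumptions the probability is nonzero well below the cutoff frequency and approaches 1 well above it, matching the qualitative behavior described in the text.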
The bit-error probability doesn’t express a bit-ﬂipping
probability. Because we model the charging and dis-
charging of interconnect bit lines—including timing
errors such as those induced by crosstalk—the bit-error
probability models approximately the probability that
a line is sampled before having time to change to its
new state (see Figure 4). That is, we can assume that if
the operating frequency is too high, the word read on
the interconnect is simply the previous one, because
there wasn’t enough time for the lines to transition to
their final state. This has important consequences for
our choice of encoding.
[Figure 3: block diagram. Blocks: FIFOs, encoder, channel driver, channel, synchronizer, decoder, automatic repeat request (ARQ) controller, operating-point controller (OPC), clock generator, voltage generator. Signals: wdata, wen, wnack, wclk, full, empty, ren, rdata, rnack, rclk, go-back, ack, error, fill level, fch, vch. Bus widths shown: 32 and 40 bits.]
Figure 3. Possible architecture for the self-calibrating, point-to-point, unidirectional on-chip interconnect.
Delay-insensitive encoding
Simple spatial encoding (such as adding parity bits
to the data word) is insufficient. Such encoding would
effectively detect, for instance, that because of crosstalk
a single bit hasn’t yet transitioned. Yet, if our clock is so
fast that the entire previous word is still present on the
interconnect (for example, when the sampling process
is like (2) in Figure 4), a pure spatial encoding would
see the result as correct and would not detect that the
new word is simply not ready. Instead of more-classical
delay-insensitive encodings, such as 1-of-N schemes, we
use the simpler scheme shown in Figure 5. Our error
detection scheme works by generating one additional bit,
alternately a 0 and a 1, that is not transmitted but is
produced independently at the source and destination,
and by computing and transmitting an 8-bit cyclic
redundancy check code (CRC-8) using the generator
polynomial x^8 + x^2 + x + 1 on the data word (for
example, 32 bits) padded with the generated bit.13
This bit ensures that no two successive identical data
words can have the same encoding; hence, two successive
40-bit encoded words on the channel can be identical
only if an error occurs. We have verified that for
independent uniformly distributed input data, the
redundant bit lines have the same switching activity as
nonencoded data lines, transitioning an average of once
every two cycles (half the switching rate of a clock).
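The encoding just described can be sketched in Python. This is a minimal illustration, not the authors' implementation: the bit ordering (alternating bit appended as the least significant bit of a 33-bit message), the zero initial CRC value, the MSB-first processing, and the function names are all assumptions made for the example.

```python
CRC8_POLY = 0x07  # x^8 + x^2 + x + 1; the x^8 term is implicit


def crc8(message, nbits):
    """Bitwise CRC-8 of an nbits-wide integer, MSB first, zero initial value."""
    crc = 0
    for i in reversed(range(nbits)):
        bit = (message >> i) & 1
        top = (crc >> 7) & 1
        crc = (crc << 1) & 0xFF
        if top ^ bit:
            crc ^= CRC8_POLY
    return crc


def encode_word(data, flip):
    """Return the 40-bit channel word: 32 data bits followed by 8 CRC bits.

    The alternating bit `flip` enters the CRC but is not transmitted.
    """
    return (data << 8) | crc8((data << 1) | flip, 33)


def check_word(word, expected_flip):
    """Receiver check using its own, locally generated alternating bit."""
    data, received_crc = word >> 8, word & 0xFF
    return crc8((data << 1) | expected_flip, 33) == received_crc


w0 = encode_word(0xDEADBEEF, 0)  # first transfer of a data word
w1 = encode_word(0xDEADBEEF, 1)  # same data word sent again one cycle later
print(w0 != w1)           # identical data, different channel words
print(check_word(w0, 0))  # fresh word accepted
print(check_word(w0, 1))  # stale word (sampled too early) rejected
```

The last call models sampling case (2) of Figure 4: the receiver has already toggled its local bit, but the channel still carries the previous word, so the check fails even though that word was a valid codeword one cycle earlier.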
This scheme combines a flipping bit and a CRC-8. However,
analytically assessing the scheme’s robustness is
beyond the scope of this article.15 We performed
simulations in VHDL with a functional model of the
channel that approximates the analytical bit-error-rate
model. We transferred 0.32 × 10^9 random bits and
observed no residual undetected error for raw bit-error
rates up to 10^-3. Figure 6 shows the residual bit errors
as a function of high raw bit-error rates.
Although by no means specific to this encoding, it’s worth
noting that as the bit-error rate approaches 1, the
absolute number of undetected errors increases
dramatically. This is of no concern in typical
applications of error-correcting codes,
where we can assume that the error rate is always small,
but a self-calibrating system might operate briefly in
regions of extremely high bit-error rates. Contrary to the
classic CRC-8 and thanks to the flipping bit, our encoding
scheme detects errors when the raw bit-error rate
approaches 1. Encodings with an even stronger detection
probability under our error model are an active
subject of research.
[Figure 4: timing diagram of transmitted and received values on a line, with sampling instants labeled (1), (2), and (3) and undefined received values marked U.]
Figure 4. A qualitative view of the error sources in a self-calibrated
interconnect operating under delay/voltage conditions that are too
aggressive: correct operation after a sufficient delay (1); bit errors resulting
from the sampling after a largely insufficient delay (2); and risk of
metastability in the receiver for sampling times that are slightly too
aggressive (3). (The figure is simplistic because a new symbol would
normally be emitted at the same time the line is sampled.) The two
horizontal lines represent the receiver thresholds. U indicates an undefined
received value because the sampled value is between the two thresholds.
[Figure 5: block diagram with a CRC-8 encoder at the source and a CRC-8 decoder at the destination, each paired with a clocked flip-flop, connected by the 40-bit channel (32 data bits plus 8 CRC bits). Signals: data in, data out, error.]
Figure 5. Possible delay-insensitive encoding scheme. The error signal also detects
whether the sampled word is still the last word correctly sent across the channel.
Operating-point control policy
As Figure 3 shows, it’s possible to completely sepa-
rate an ARQ controller from a controller devoted to
choosing the operating point. The former’s sole task is
to push all data words through the channel until they’re
communicated to the receiver without error, ignoring
the channel parameters. In other words, the ARQ con-
troller decides only which words to push through the
channel. The operating-point controller,
on the other hand, selects the lowest fre-
quency and voltage swing required to
meet some communication constraint,
such as an average delay. It decides how
to communicate and determines whether
the choice is appropriate, but it ignores
what is going through the channel.
Figure 7 shows a simplified control algorithm for the
operating-point controller, which memorizes the best
operating point for each possible frequency.
The controller performs three tasks independently:
• It records the location of the best voltage/frequency
points (that is, for each possible frequency, it
discovers the lowest usable voltage swing). It does
this on the basis of experienced errors
and periodic attempts to explore
more-aggressive operating regions.
• It chooses a frequency on the basis of
the delay constraint.
• It chooses the estimated best point’s
voltage swing at the selected frequency.
[Figure 6: log-log plot of the residual bit-error rate (10^-8 to 10^-3) versus the raw bit-error rate (10^-3 to 10^0).]
Figure 6. Residual bit-error rate as a function of the raw bit-
error rate.
[Figure 7: flowchart. Decisions: "Errors?" (yes/no), "mode?" (exploring/normal), "counter > threshold" (yes/no). Actions: mode = normal; counter = 0; V_best(F_ch)++; V_best(F_ch)--; mode = exploring; counter++. Final steps: F_ch = fn(fifo_level, deadline); V_ch = V_best(F_ch); apply F_ch and V_ch.]
Figure 7. Simplified operating-point control policy.
We assume the delay constraint is known, which is often
the case with multimedia data transfers (see the
“Simulation results” section). Figure 8a shows how the
controller selects the operating point among a set of
possibilities (one point per frequency); the recorded points
are an estimation of Pareto operating points. The
controller chooses the most appropriate operating point as
a function of the observed traffic and the delay
constraint. Figure 8b illustrates the effect of the estimation
process: Errors immediately push the system to become
more conservative (that is, to increase the voltage swing
associated with a given frequency). To ensure the most
aggressive operation, whenever the system works
satisfactorily for a given number of cycles (a threshold value
of, say, 500 to 1,000 cycles), it briefly attempts to reduce
the voltage at a constant frequency. If errors aren’t
observed for a few cycles (say, 50), the controller records
the new point as the best point at that frequency.
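The policy described above can be sketched as a small state machine in Python. This is a loose interpretation of the prose, not a transcription of Figure 7: the class and method names, the voltage step of 100 mV, the use of millivolt integers, and the decision to keep a failed trial from changing the recorded point are all assumptions of this example. Frequency selection from the FIFO level and deadline is left to the caller.

```python
class OperatingPointController:
    """Tracks the lowest error-free swing found so far for each frequency."""

    def __init__(self, freqs_mhz, v_min_mv=500, v_max_mv=1600, step_mv=100,
                 threshold=500, explore_cycles=50):
        self.v_min, self.v_max, self.step = v_min_mv, v_max_mv, step_mv
        self.threshold = threshold            # error-free cycles before exploring
        self.explore_cycles = explore_cycles  # error-free cycles to accept a trial
        # Start from the most conservative swing at every frequency.
        self.v_best = {f: v_max_mv for f in freqs_mhz}
        self.mode = "normal"
        self.counter = 0
        self.trial = None  # (frequency, swing) under test while exploring

    def _to_normal(self):
        self.mode, self.counter, self.trial = "normal", 0, None

    def step_cycle(self, f_ch, errors):
        """Update state from the last cycle; return the swing to apply next."""
        if self.mode == "exploring" and self.trial[0] != f_ch:
            self._to_normal()  # frequency changed: abandon the trial
        if errors:
            if self.mode == "normal":
                # The recorded point is no longer safe: be more conservative.
                raised = self.v_best[f_ch] + self.step
                self.v_best[f_ch] = min(raised, self.v_max)
            # A failed trial is dropped; the recorded point is kept.
            self._to_normal()
        elif self.mode == "exploring":
            self.counter += 1
            if self.counter >= self.explore_cycles:
                self.v_best[f_ch] = self.trial[1]  # record the new best point
                self._to_normal()
        else:
            self.counter += 1
            if self.counter > self.threshold and self.v_best[f_ch] > self.v_min:
                self.mode, self.counter = "exploring", 0
                lowered = max(self.v_best[f_ch] - self.step, self.v_min)
                self.trial = (f_ch, lowered)
        return self.trial[1] if self.mode == "exploring" else self.v_best[f_ch]


# Toy channel (an assumption): transfers fail whenever the swing is below 1.0 V.
ctrl = OperatingPointController([250], threshold=3, explore_cycles=2)
v = ctrl.v_best[250]
for _ in range(200):
    v = ctrl.step_cycle(250, errors=(v < 1000))
print(ctrl.v_best[250])
```

With the toy channel, the recorded swing walks down from 1,600 mV and settles at the lowest value that still transfers without errors; later exploration attempts below that value fail and are discarded.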
Figure 7’s control policy deserves a brief mention
here. In particular, we’re interested in comparing our
policy with that of an optimal controller that already
knows the Pareto voltage that should have been used
for every frequency. It turns out that the two controllers
perform similarly. We observed no difference in terms
of transfer delay and residual word errors (none in
either case), but the optimal controller saved approxi-
mately 1 percentage point more power.
We are also interested in sensitivity to the empirical
threshold parameter that dictates how often the controller
tries to reduce voltage. Figure 9 reveals that our policy
results in correct behavior for a wide range of values.
Although implementing this control algorithm entails
considerably more than Figure 7 shows, hardware com-
plexity is still relatively low and requires an area equiv-
alent to a few thousand two-input NAND gates.
Simulation results
We synthesized and simulated a self-calibrating 32-bit
interconnect system and compared it with a classic
fixed-swing system. We modeled typical 0.13-µm CMOS
technology and noise sources as follows:
• nominal supply voltage, 1.5 V;
• device threshold voltage, 0.3 V;
• additive noise standard deviation, 0.1 V; and
• average cutoff frequency and standard deviation,
500 MHz and 36 MHz, respectively.
This technology data applies only to bit-error simula-
tion; the controller is completely technology indepen-
dent. Table 1 shows the systems’ operating ranges.
[Figure 8: two plots of delay versus voltage, (a) and (b), marking possible operating points and Pareto operating points. Other labels: negative slack, positive slack, explore, errors.]
Figure 8. Use and estimation of best operating points. The
control policy fixes the operating frequency as a function of
the delay constraint; it sets the operating voltage to the
minimum value that has experienced error-free transmission
(a). When the system experiences errors, the controller
raises the best voltage associated with a given frequency;
otherwise, every few cycles it tries to reduce the voltage to
ensure the most aggressive operation (b).
The classic system does not implement an error
detection scheme, whereas our system contains the
encoder and decoder illustrated in Figure 5. We present
our results with delays and frequencies relative to the
classic system. To calculate the self-calibrating system’s
energy advantage, we account for the main sources of
inefficiency, namely, the need to communicate 25%
more bits for the error-detecting code and the need to
occasionally resend some pieces of data because of
errors. However, we disregard the small amount of energy
spent in the ARQ and the operating-point controllers
and in the encoder and decoder. The ARQ controller
has roughly 100 gates. A study of the encoding/decoding
circuitry shows that the incurred logic overhead
doesn’t significantly affect the energy balance. We can
expect that current high-end systems and future systems
in general will already contain an encoder and a
decoder.16 Because we neglect the control logic energy
overhead, the ratio of energy consumed by the
self-calibrating system to that consumed by the classic system
doesn’t depend on parameters such as bus length or
capacitive load. Therefore, we don’t have to specify
their actual value in the results.
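The energy accounting described above can be made concrete with a short Python sketch. It is a simplified model under stated assumptions, not the authors' simulator: it assumes that wire energy per transferred bit scales with the square of the voltage swing (so the wire capacitance cancels in the ratio), that encoded and data lines have the same switching activity, and that resends are summarized by a single hypothetical parameter resend_fraction.

```python
def relative_energy(v_swing, v_classic=1.5, data_bits=32, code_bits=8,
                    resend_fraction=0.0):
    """Energy of the self-calibrating link relative to the classic link.

    v_swing:         average swing used by the self-calibrating link (V)
    v_classic:       fixed swing of the classic link (V)
    resend_fraction: hypothetical share of words that must be sent again
    """
    bit_overhead = (data_bits + code_bits) / data_bits  # 40/32 = 1.25
    resend_overhead = 1.0 + resend_fraction
    swing_ratio = (v_swing / v_classic) ** 2            # assumes E ~ C * V^2
    return bit_overhead * resend_overhead * swing_ratio


print(relative_energy(1.5))                        # full swing: only overhead
print(relative_energy(1.0))                        # reduced swing
print(relative_energy(1.0, resend_fraction=0.05))  # reduced swing with resends
```

Under these assumptions the 25% coding overhead is recovered once the swing drops below roughly 1.34 V (1.5 V divided by the square root of 1.25); below that point the self-calibrating link uses less wire energy than the classic one.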
[Figure 9: three plots against the threshold value (logarithmic axis, 10^0 to 10^4): (a) energy savings (%), (b) average delay (ns), and (c) residual errors.]
Figure 9. Sensitivity of various metrics to the threshold appearing in the description of the controller algorithm.
Values that are too small cause frequent word retransmissions, negatively affecting energy savings (a), word
transfer delay (b), and residual word errors (c).
Table 1. Operating ranges for comparing the classic and self-calibrating
systems.
Operating parameter    Classic    Self-calibrating
Voltage swing          1.5 V      0.5-1.6 V
Frequency              250 MHz    50-500 MHz
In Figures 10–12, we present results from three experiments.
The first, Figure 10, shows the energy advantage
of dynamic bandwidth adaptation on a realistic
MPEG-based workload. The second, Figure 11, shows
the energy advantage of dynamically tuning the operating
point to actual technology variations. The third,
Figure 12, illustrates our system’s robustness to
unpredictable noise sources.
Modern multimedia algorithms have dynamically
varying requirements. Figure 10b shows how the self-
calibrating system takes advantage of a time-varying
MPEG workload, shown in Figure 10a. The adaptive sys-
tem tries to exactly match the bandwidth to the current
needs. It slows down the communication link to send
every MPEG frame exactly in the allotted time and, ide-
ally, not any faster. Operating at a lower frequency
grants a substantial reduction in average energy con-
sumption: The whole trace, consisting of 400 frames of
several kilobytes each, requires 53% less energy with a
dynamically self-calibrating system than with a classic
system.
Figure 11 illustrates the effect of technology on the
choice of control points. On a wafer whose electrical
parameters are poor, simulated with an average cutoff
frequency of 430 MHz, the controller chooses mainly
Pareto points relatively close to the worst-case delay
line. On a good wafer, simulated with an average cutoff
frequency of 570 MHz, the points chosen are mostly
along a more aggressive delay/voltage line and reﬂect
the lowest delays that the system experiences. (In both
cases, the cutoff frequency standard deviation has been
decreased to 15 MHz to account for the lower indeter-
minacy.) On the poor wafer, the self-calibrating system
provides an energy savings of 17%, compared with the
classic system. The energy savings rises to 38% on the
good wafer. The simulated traffic is an artificial work-
load of 100,000 words, with arrival times following a
Poisson process. Average latency through the commu-
nication system increases in the self-calibrating system
by 14% for the good wafer and 26% for the poor wafer.
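An artificial workload of this kind can be generated as follows. This is only a sketch: a Poisson process has independent, exponentially distributed inter-arrival times, so the arrival times are obtained by accumulating exponential gaps. The arrival rate of 100 million words per second and the function name are assumptions, not values from the experiment.

```python
import random


def poisson_arrival_times(n_words, rate_per_s, seed=0):
    """Arrival times of a Poisson process with the given average rate."""
    rng = random.Random(seed)  # seeded for reproducible traces
    t = 0.0
    times = []
    for _ in range(n_words):
        # Exponential inter-arrival gap with mean 1 / rate_per_s.
        t += rng.expovariate(rate_per_s)
        times.append(t)
    return times


# 100,000 words as in the text; the rate is a hypothetical choice.
arrivals = poisson_arrival_times(100_000, rate_per_s=100e6)
mean_gap = arrivals[-1] / len(arrivals)
print(f"mean inter-arrival time: {mean_gap * 1e9:.2f} ns")
```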
Figure 12 illustrates the effect of design hypotheses
that turned out to be too optimistic. To simulate the self-
calibrating system with more noise, we raised the stan-
dard deviation of the additive noise from 0.1 V to 0.15 V
and the cutoff frequency standard deviation from 
36 MHz to 55 MHz. The classic system will probably not
work anymore under these conditions. Overlooking or
[Figure 10 plots: frame size (Kbytes) versus frame number (a); frame delay (µs) versus frame number (b).]
Figure 10. Transmission of a variable workload: workload variation in time (a); incurred frame delay
in the classic system (low delay—solid line at bottom) and in the self-calibrating interconnect (delay
as close as possible to the imposed constraint—dashed line at top) (b).
underestimating any error source (such as crosstalk or
other deep-submicron second-order effects) in the normal
design flow might prevent the manufactured chips
from working or result in a very limited yield. As the figure
shows, the self-calibrating system adapts to the strong
noise by choosing less-aggressive operating points and
by trading energy for robustness. Energy savings shrinks
to 14% and the average latency grows by 34%, compared
with the desired behavior of the classic system.
However, the interconnect operates correctly and avoids
the yield reductions incurred by the classic system.
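To see why a larger noise standard deviation pushes the controller toward higher voltage swings, consider a simplified sketch. It assumes zero-mean Gaussian additive noise and a receiver threshold at half the voltage swing, so that the bit error probability is Q(V/(2σ)). This model and the function names are assumptions for illustration; the sketch ignores the cutoff-frequency variation that the simulation also includes.

```python
import math


def q_function(x):
    """Tail probability of a standard normal variable, Q(x) = P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bit_error_probability(swing_v, noise_sigma_v):
    """Assumed model: threshold at half swing, zero-mean Gaussian noise."""
    return q_function(swing_v / (2.0 * noise_sigma_v))


# Noise standard deviations from the text; swings within the Table 1 range.
for sigma in (0.10, 0.15):
    for swing in (0.8, 1.2, 1.6):
        p = bit_error_probability(swing, sigma)
        print(f"sigma={sigma:.2f} V  swing={swing:.1f} V  P(bit error)={p:.2e}")
```

Under this model, a given error target requires a proportionally larger swing when σ grows, which is consistent with the move toward less aggressive, more energy-consuming operating points.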
OUR NOVEL DESIGN PARADIGM for tolerating electrical
parameter variations offers much-needed advantages
because a wider spread of electrical parameters will be
unavoidable as technologies shrink further. With such a
spread, worst-case design assumptions may very well cancel
the benefits of technology investments. Therefore, designers
will need dynamically self-calibrating techniques to fully
exploit the potential of future nanometric CMOS technologies
and overcome manufacturing limitations.
Acknowledgments
The MARCO Consortium and the Gigascale System
Research Center partly supported this work.
References
1. M. Orshansky and K. Keutzer, “A General Probabilistic
Framework for Worst Case Timing Analysis,” Proc. 39th
Design Automation Conf. (DAC 39), ACM Press, 2002,
pp. 556-561.
2. S. Borkar et al., “Parameter Variations and Impact on
Circuits and Microarchitecture,” Proc. 40th Design
Automation Conf. (DAC 03), ACM Press, 2003, pp. 338-
342.
3. S. Tam et al., “Clock Generation and Distribution for the
First IA-64 Microprocessor,” IEEE J. Solid-State Circuits,
vol. 35, no. 11, Nov. 2000, pp. 1545-1552.
4. R. Hegde and N.R. Shanbhag, “Soft Digital Signal Pro-
cessing,” IEEE Trans. Very Large Scale Integration
(VLSI) Systems, vol. 9, no. 6, Dec. 2001, pp. 813-823.
5. L. Benini and G. De Micheli, “Networks on Chips: A New
SoC Paradigm,” Computer, vol. 35, no. 1, Jan. 2002, pp.
70-78.
6. D. Liu and C. Svensson, “Power Consumption Estima-
tion in CMOS VLSI Chips,” IEEE J. Solid-State Circuits,
vol. 29, no. 6, June 1994, pp. 663-670.
[Figure 11 plot: normalized delay versus Vch (volts). Legend: worst-case delay/voltage relation; other delay/voltage relation; classic system (poor and good wafers); self-calibrating system (good wafer); self-calibrating system (poor wafer).]
Figure 11. The operating points used depend on technology
variations. The operating points are drawn with a size
proportional to usage.
[Figure 12 plot: normalized delay versus Vch (volts). Legend: worst-case delay/voltage relation; other delay/voltage relation; classic system; self-calibrating system.]
Figure 12. Operating points used by the self-calibrating
system in the presence of strong noise. The classic system
has a reduced yield under these conditions, while the
self-calibrating system moves to more energy-consuming, but
safer, operating points. The operating points are drawn with
a size proportional to usage.
7. P.P. Sotiriadis and A. Chandrakasan, “Low Power Bus
Coding Techniques Considering Inter-wire
Capacitances,” Proc. IEEE Custom Integrated Circuits
Conf. (CICC 2000), IEEE Press, 2000, pp. 507-510.
8. D. Bertozzi, L. Benini, and G. De Micheli, “Low Power
Error Resilient Encoding for On-Chip Data Buses,” Proc.
Design, Automation and Test in Europe (DATE 02),
IEEE CS Press, 2002, pp. 102-109.
9. H. Zhang, V. George, and J.M. Rabaey, “Low-Swing On-
Chip Signaling Techniques: Effectiveness and Robust-
ness,” IEEE Trans. Very Large Scale Integration (VLSI)
Systems, vol. 8, no. 3, June 2000, pp. 264-272.
10. C. Svensson, “Optimum Voltage Swing on On-Chip and
Off-Chip Interconnects,” IEEE J. Solid-State Circuits, vol.
36, no. 7, July 2001, pp. 1108-1112.
11. T. Pering, T. Burd, and R. Brodersen, “The Simulation
and Evaluation of Dynamic Voltage Scaling Algorithms,”
Proc. Int’l Symp. Low Power Electronics and Design
(ISLPED 98), ACM Press, 1998, pp. 76-81.
12. L. Shang, L.-S. Peh, and N.K. Jha, “Dynamic Voltage
Scaling with Links for Power Optimization of
Interconnection Networks,” Proc. 9th Int’l Symp. High-
Performance Computer Architecture (HPCA 03), IEEE
CS Press, 2003, pp. 91-102.
13. J. Walrand and P. Varaiya, High-Performance Communi-
cation Networks, 2nd ed., Morgan Kaufmann, 2000.
14. F. Worm et al., “An Adaptive Low-Power Transmission 
Scheme for On-Chip Networks,” Proc. 15th Int’l Symp.
System Synthesis (ISSS 02), ACM Press, 2002, pp.
92-100.
15. F. Worm et al., “Soft Self-Synchronizing Codes for Self-
Calibrating Communication,” to be published in Proc. Int’l
Conf. Computer-Aided Design (ICCAD 04), IEEE CS
Press, 2004.
16. C. McNairy and D. Soltis, “Itanium 2 Processor Microar-
chitecture,” IEEE Micro, vol. 23, no. 2, Mar./Apr. 2003,
pp. 44-55.
Frédéric Worm is a PhD student in
the School of Computer and Commu-
nication Sciences at the Swiss Federal
Institute of Technology Lausanne
(EPFL), Switzerland. His research
interests include self-calibration techniques for net-
works on chips. Worm has an MS in communication
systems from the Swiss Federal Institute of Technolo-
gy Lausanne and a DEA (diploma for advanced stud-
ies) in networking and distributed systems from the
University of Nice-Sophia Antipolis, France.
Paolo Ienne is a professor at the
School of Computer and Communica-
tion Sciences at the Swiss Federal
Institute of Technology Lausanne
(EPFL), Switzerland, where he heads
the Processor Architecture Laboratory. His current
research interests cover aspects of advanced SoC
design, including automatic processor specialization,
programming abstractions for reconfigurable com-
puting, and self-calibrating design methodologies.
Ienne has an MS in electrical engineering from Politec-
nico di Milano and a PhD in computer science from the
Swiss Federal Institute of Technology Lausanne. He is
a member of the IEEE and the IEEE Computer Society.
Patrick Thiran is a professor in the
School of Computer and Communica-
tion Sciences at the Swiss Federal
Institute of Technology Lausanne,
Switzerland. His research interests
include communication networks and dynamic sys-
tems. Thiran has an electrical engineering diploma
from the Université Catholique de Louvain, Belgium;
an MS from the University of California at Berkeley; and
a PhD from the Swiss Federal Institute of Technology
Lausanne, both in electrical engineering. He is a mem-
ber of the IEEE and the ACM.
Giovanni De Micheli is a professor
of electrical engineering and comput-
er science at Stanford University. His
research interests include several
aspects of design technologies for ICs
and systems. He has an MS and a PhD in electrical
engineering and computer science from the Universi-
ty of California at Berkeley. De Micheli is a Fellow of
the IEEE and the ACM and a recipient of the IEEE
Emanuel R. Piore Award for contributions to synthesis
technology.
Direct questions and comments about this article
to Frédéric Worm, Processor Architecture Laboratory,
EPFL, Lausanne, Switzerland; frederic.worm@epfl.ch.
