Robust Checkers for Self-Calibrating Designs
Frédéric Worm
Thesis No. 3647 (2006)
École Polytechnique Fédérale de Lausanne
Presented on 24 November 2006
to the Faculty of Computer and Communication Sciences
Laboratoire d'architecture de processeurs
Computer Science Section
for the degree of Docteur ès Sciences
by Frédéric Worm,
communication systems engineer (EPF diploma), of French nationality
Accepted on the proposal of the jury:
Prof. E. Telatar, jury president
Prof. P. Ienne, Prof. P. Thiran, thesis directors
Prof. G. De Micheli, examiner
Prof. T. Mudge, examiner
Prof. N. Shanbhag, examiner
Lausanne, EPFL
2006
To my parents
Acknowledgements
First of all, I would like to express my wholehearted gratitude to both of my
advisors. I am deeply convinced of the professionalism of their abundant and
high-quality guidance. I feel obliged to mention quantity since I have enjoyed the
luxury of weekly meetings with two professors for at least half of the thesis...
One of the contributions of my work is to exploit the strong complementarity
of two checkers and combine them into a novel, highly robust checker
architecture. Likewise, from a human perspective, the thesis has brought together two
complementary characters, namely my advisors.
I am hardly exaggerating by describing one as a wild and rapid thinker,
exploiting roundtable discussions to produce plenty of illuminating graphs (when
his pencil is not lost), and, on top of that, mastering fully the finest nuances of
written and oral communication.
The other one is an embodiment of cold logic, rather reluctant to embark
on any speculative thinking, and always striving for the perfection of written
explanations (in a sense, the simplest possible one). A picture being worth a
thousand words, I imagine a bulldozer proceeding at its own pace, certainly, but
always forward. Through the work done with them, I will take with me features
from both worlds.
While I am well aware that many Ph.D. students would be jealous due to my
exceptional access to professorial resources, I have to admit that the beginning
of the thesis actually saw me coached by three professors: my advisors and
Professor De Micheli. I thank Nanni truly for his counselling and bootstrapping
of the project.
I am also grateful to Professor Mudge and Professor Shanbhag, who thoroughly
reviewed the thesis.
Finally, I would like to thank all the LAP members I have met during my
thesis: LAP has definitely been a nice place to work. Special thanks to the
secretaries: Brigitte for her kindness, and Chantal for her everlasting cheerful
humor and recurrent laugh. I am very grateful to Ajay and his colleague for
their fruitful help in solving a particular mathematical problem I encountered.
Many thanks also to Miljan who has preceded me in the various tasks required to
finalise the thesis. Lastly, I want to thank Michael for the many aviation-related
debates held during coffee breaks and apologize to all other “coffee breakers”
who suffered from the apparent monotony of our discussions.
Abstract
So far, performance and reliability of circuits have been determined by worst-
case characterisation of silicon and of environmental conditions such as noise
and temperature. As nanometer technologies exacerbate process variations and
reduce noise margins, worst-case design will eventually fail to meet an aggres-
sive combination of objectives in performance, reliability, and power. In order
to circumvent these difficulties, researchers have recently proposed a new de-
sign paradigm: self-calibrating circuits. The design parameters (e.g., operating
points) of a self-calibrating circuit are tuned at run-time by a controller. The
latter receives feedback from a checker that monitors correct operation of the
circuit. A self-calibrating circuit can thus dynamically trade reliability for power
or performance, depending on actual silicon capabilities and noise conditions.
This thesis pioneers the use of digital self-calibration techniques to dynam-
ically tune the operating points of an on-chip link based on the detection of
run-time transfer errors. In particular, we show that the energy overhead
induced by the checker and operating point controller is offset by operating
the link at sub-critical voltage. Such a system-level study strengthens the
interest in self-calibrated links by demonstrating their feasibility.
The primary focus of the thesis bears on the development of robust and low
overhead checkers for a self-calibrating on-chip data link subject to errors caused
by operation at sub-critical voltage. Such errors—we call them timing errors—
may be numerous and cause error rates as large as 100%. We abstract timing
errors by the failure of bit transitions and propose ad-hoc coding techniques to
detect them reliably. We emphasise the originality of the coding requirements
by showing that (i) traditional error correcting codes (like CRCs) fail to detect
timing errors under over-aggressive operation of the link, and (ii) asynchronous
codes such as dual-rail detect all timing errors, but incur a significant bandwidth
overhead in the synchronous context of our problem. Next, we introduce a novel
code-based checker satisfying such requirements and featuring unique detection
capabilities towards both timing and additive errors.
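Claim (i) can be illustrated with a minimal sketch. It assumes the simplest form of the timing error abstraction (each wire independently keeps its previous value with a given probability) and uses a single even-parity bit as a stand-in for a standard block code. The function names, word width, and trial count are hypothetical choices made for illustration and are not taken from this work.

```python
import random


def parity_encode(data_bits):
    """Append one even-parity bit (a stand-in for a standard block code)."""
    return data_bits + [sum(data_bits) % 2]


def parity_ok(word):
    """A word passes the check when its overall parity is even."""
    return sum(word) % 2 == 0


def timing_error_channel(prev_word, new_word, error_rate, rng):
    """Each wire keeps its previous value with probability error_rate.

    A wire whose value does not change cannot show a timing error,
    because the old and the new value coincide.
    """
    return [p if rng.random() < error_rate else n
            for p, n in zip(prev_word, new_word)]


def undetected_fraction(error_rate, trials=2000, width=8, seed=1):
    """Fraction of corrupted transfers that still pass the parity check."""
    rng = random.Random(seed)
    prev = parity_encode([0] * width)
    corrupted = 0
    undetected = 0
    for _ in range(trials):
        new = parity_encode([rng.randint(0, 1) for _ in range(width)])
        received = timing_error_channel(prev, new, error_rate, rng)
        if received != new:
            corrupted += 1
            undetected += int(parity_ok(received))
        # Assumption: the reference for the next transfer is the word
        # that was actually sent, not the word that was sampled.
        prev = new
    return undetected / corrupted if corrupted else 0.0


if __name__ == "__main__":
    for rate in (0.01, 0.5, 1.0):
        print(rate, undetected_fraction(rate))
```

Under these assumptions, `undetected_fraction(1.0)` evaluates to 1.0: when every transition fails, the receiver samples the previous codeword in its entirety, which is itself valid, so the check never fires.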
Then, we contrast the error detection capabilities of the code-based checker
with those of double-sampling flip-flops. We stress the complementarity of
the two approaches and show how to optimally combine them into an even more
robust checker featuring a very limited wiring and circuitry overhead. Finally,
we extend our work to computing elements by giving preliminary research di-
rections on the detection of timing errors resulting from the self-calibration of
the operating points of an adder.
The main contribution of this work is to propose novel checker architectures
based on codes and/or double sampling flip-flops to detect massive timing errors
caused by self-calibration of the link operating points. A requirement rendering
our work unique is that reliable operation of the checker should be ensured
over the whole range of bit error rates from 0 to 100%. Furthermore, we have
developed a unified framework bringing fundamental insights into the timing
error detection capabilities of various practical encoding schemes.
Keywords
self-calibrating VLSI design, electrical parameter variation, error detection, self-
synchronisation, low-power on-chip link.
Résumé
The performance and reliability of integrated circuits are currently determined
by pessimistic, worst-case assumptions about transistor characteristics and
about the circuit environment, such as noise or temperature. This approach is
bound to fail sooner or later, because the growing miniaturisation of integrated
circuits comes with significant variation in transistor characteristics and with
reduced noise margins. It is thus becoming increasingly difficult, if not
impossible, to simultaneously meet the objectives set in terms of performance,
reliability, and low power consumption. Recently, researchers have proposed a
different approach, which consists in self-calibrating the important parameters
instead of determining them through pessimistic assumptions. For example, the
frequency and voltage of a circuit are adjusted dynamically by a controller
based on the on-line detection of the errors caused by the self-calibration
process. It thus becomes possible to use operating points far more aggressive
than those resulting from pessimistic assumptions.
This thesis is devoted to the study and development of the circuit in charge of
detecting on-line the errors caused by self-calibration of the operating points
of an on-chip link. During this process, the error rate may well reach 100%. We
model these errors by the failure of bit-to-bit transitions during
communication. First, we propose an ad-hoc technique for detecting such errors
based on data coding. Compared with asynchronous codes (such as dual-rail), this
technique significantly reduces the amount of redundancy while guaranteeing a
low probability of undetected errors. By simulating an on-chip link capable of
self-calibrating its frequency and supply voltage, we demonstrate that the
energy balance of such a system is favourable, because the additional energy
spent by the controller and the redundancy bits is largely offset by the gains
from supplying the link at a sub-critical voltage. Next, we compare the proposed
error detection technique (i.e., data encoding) with another method based on
double-sampling flip-flops. After highlighting the strong complementarity of
these two techniques in terms of error detection capability, we show how they
can be simply combined. The result is an extremely robust circuit, even with low
redundancy. Finally, we give some preliminary research directions on detecting
errors due to self-calibration of the operating points of an adder.
The main contribution of this thesis is to propose new techniques using codes
and/or double-sampling flip-flops to detect errors due to self-calibration of
the operating points of an on-chip link. From a theoretical point of view, we
present a complete taxonomy of encodings that are of practical interest in
synchronous circuits and have self-synchronisation capabilities.
Keywords
self-calibration for on-chip integrated circuits, electronic parameter
variation, on-line error detection, self-synchronisation, low-power on-chip
link.
Contents
1 Introduction
1.1 Worst-Case Design
1.2 Self-Calibration
1.3 Goal and Problem Statement
1.4 Outline
2 How to Design Differently Than Worst-Case?
2.1 Dynamic Voltage and Frequency Scaling
2.2 Alternatives to Worst-Case Design
2.2.1 Better Than Worst-Case Designs
2.2.2 Asynchronous Circuits
2.3 Fault-Tolerant VLSI Design
2.4 Conclusions
3 Bit Error Rate and Channel Models
3.1 Bit Error Rate Model
3.2 Channel Model
3.3 Conclusions
4 System-Level View of a Self-Calibrating On-Chip Link
4.1 System-Level Description
4.2 Encoding
4.3 Control
4.3.1 Statement of the Link Control Problem
4.3.2 Motivations for a Decoupled Control
4.3.3 Decoupled Control: a Particular Policy
4.3.3.1 Delay Estimation
4.3.3.2 Properties
4.3.3.3 Hardware Overhead
4.4 Simulation Results
4.4.1 Dynamic Bandwidth Adaptation
4.4.2 Exploiting Technology Variations
4.4.3 Robustness Towards Design Uncertainties
4.5 Conclusions
5 Self-Synchronisation for Synchronous Encoding Schemes
5.1 Related Work
5.2 Notations
5.3 Self-Synchronisation and Sequencing Rules
5.4 Hard Self-Synchronising Sequencing Rules
5.4.1 Self-Synchronising Sets
5.4.2 Properties of Self-Synchronising Sets
5.4.3 Optimum Hard Self-Synchronising Encoding
5.4.3.1 Optimum Symbol-Invariant Sequencing Rules
5.4.3.2 Summary on Hard Self-Synchronising Encoding
5.5 Soft Self-Synchronisation With Linear Codes
5.5.1 Linear Codes Over the Timing Error Channel
5.5.2 Alternating-Phase Encoding With Linear Codes
5.5.3 Case Study
5.6 Conclusions
5.6.1 Summary of Achievements
5.6.2 Fundamental Limits of Soft Self-Synchronising Encodings
6 Double Sampling, Coding, or Both?
6.1 Qualitative Comparison of Razor Flip-Flops with Codes
6.2 Double Sampling and Codes
6.2.1 Razor Flip-Flops Combined with Codes
6.2.2 Robustness to Timing Errors
6.2.2.1 Experimental Set-Up
6.2.2.2 Comparison Results
6.2.3 Giving Up Correction?
6.2.4 Hardware Complexity of the Combined Checkers
6.2.5 Double Sampling and/or Codes: Conclusions
6.3 Comparing Checkers With Operating Point Usage
6.4 Towards Self-Calibrating Computation
6.5 Conclusions
7 Conclusion
7.1 Achievements
7.2 Applications
7.3 Future Work
7.4 Perspectives
A Residual Error Rate of Linear Codes Over the Timing Error Channel
B Residual Error Rate of Alternating-Phase Encoding With Linear Codes Over the Timing Error Channel
C Residual Error Rate of the Berger Code Over the Binary Symmetric Channel
D Qualitative Comparison of Razor Flip-Flops and Codes
Bibliography
Curriculum Vitae
List of Figures
1.1 Delay distribution of a new and an old CMOS technology. Even though the average delay of the newer technology is improved with respect to the older one, we observe barely any improvement in worst-case delay due to the much larger spread of the newer technology.
1.2 Illustration of the potential waste resulting from worst-case design.
1.3 A circuit with self-calibrated frequency and voltage supply. Contrary to worst-case design, reliability concerns are confined to the checker, while power and performance trade-offs are determined by the operating point controller.
2.1 A Razor flip-flop. The shadow latch samples the input data some time after the main flip-flop. The sampled values are then compared. A mismatch indicates a timing error because the main flip-flop has sampled data just before it changed. In this case, the output of the shadow latch is returned to the main flip-flop input, thus correcting the timing error at the expense of an additional cycle of latency.
3.1 A data link supplied at voltage $v_{ch}$ and clocked at frequency $F_{ch}$.
3.2 Contour plot of the bit error rate $\varepsilon$ in the voltage ($v_{ch}$) and delay ($T_c$) plane.
3.3 Bit error rate landscape in the voltage ($v_{ch}$) and delay ($T_c$) plane.
3.4 Left: a binary symmetric channel BSC($\varepsilon_a$). Right: a timing error channel TEC($\varepsilon_t$). The box with symbol $\Delta$ denotes a single-cycle delay element. The top right representation shows explicitly that errors are caused by the erasure of individual bit transitions. The bottom right representation emphasises that the channel output $y_k$ is either $x_k$ or $x_{k-1}$ depending on $e_{k,t}$.
3.5 Additive (top) and timing (bottom) errors during the transfer of the bit sequence 01011.
4.1 The basic idea of a self-calibrating point-to-point unidirectional on-chip interconnect. (a) The classic static scheme, with a FIFO that decouples two subsystems. (b) The proposed self-calibrating scheme with additional elements needed in order to adjust the operating points as required by workload and detected transfer errors.
4.2 A more detailed architecture of the self-calibrating point-to-point unidirectional on-chip interconnect.
4.3 A qualitative view of the sources of error in a self-calibrated interconnect operating in too aggressive delay/voltage conditions. (a) Correct operation after a sufficient delay. (b) Bit errors due to sampling after a largely insufficient delay. (c) Risk of metastability in the receiver for slightly too aggressive sampling times. (Note that the figure is simplistic in that a new symbol would be emitted at the same time the line is sampled.)
4.4 Timing errors and encoding schemes with spatial (left) and temporal (right) redundancy.
4.5 Synchronous implementation of a K-bit LEDR encoder. The flip-flop generates the alternating phase bit. LEDR is thus an “alternating-parity” encoding scheme.
4.6 The new self-synchronising encoding scheme we propose. The input (respectively received) data is augmented with a phase bit that is not transmitted but generated by the encoder (respectively decoder). The error signal not only detects bit flips, but also detects when the sampled word is still the last one correctly sent across the channel.
4.7 The code $C_0$ (respectively $C_1$) used in the alternating-phase encoding consists of all codewords of the code $C$ having 0 (respectively 1) as the most significant information bit.
4.8 Word error rate and residual bit error rate as a function of the raw bit error rate for the CRC-8 alternating-phase encoding generated by the polynomial $x^8 + x^2 + x + 1$ (40-bit words).
4.9 The energy scaling ratio of Eq. (4.2) as a function of the word error rate. The parameter $\Theta$ is the ratio $E_{sys}/E_{ch}$ under the worst-case voltage $v_{dd}$.
4.10 Use and estimation of best operating points. (a) The control policy fixes the operating frequency as a function of the delay constraint; it sets the operating voltage to the minimum value that has experienced error-free transmission. (b) The controller raises the best voltage for a given frequency when experiencing errors; otherwise, every few cycles, it tentatively tries to reduce it in order to ensure aggressive operation.
4.11 Simplified operating point control policy. $T_1$ and $T_2$ are the values of two constant thresholds.
4.12 GI/D/1 model of the transmission scheme. The arrivals in the queue are i.i.d. processes. According to Little's law, the average transfer delay is proportional to the average buffer fill level. We simply estimate the average transfer delay by the delay experienced by the last queued element.
4.13 Circuit determining the slowest frequency that makes it possible to meet the required delay constraint. By convention, frequency index 0 corresponds to the fastest frequency. $K_0, \ldots, K_Q$ are constants that are hardwired for every frequency.
4.14 Sensitivity of various metrics to the threshold $T_1$ (expressed in transmission cycles) mentioned in the description of the controller algorithm.
4.15 Impact of the ratio between controller and data path clock on: (top) energy saving and (bottom) average transfer delay. The dashed line represents the target delay of the classic system.
4.16 Transmission of a variable workload. Top: workload variation in time. Bottom: incurred frame delay in the classic system (low delay) and in the self-calibrating interconnect (delay as close as possible to the imposed constraint, shown as a dashed line).
4.17 Energy breakdown of the self-calibrating interconnect.
4.18 Energy saving (with respect to the classic system) as a function of the average transfer delay for different Poissonian workloads. The system becomes energy-inefficient only under high workloads requiring the interconnect to work most of the time at full speed.
4.19 Operating points used depending on technology variations. ◦ → classic system; + → self-calibrating system on a poor wafer; × → self-calibrating system on a good wafer. The bold line represents the worst-case relation between delay and voltage. According to the model, slightly less than 1% of the wafers are classified as good or better than good, and poor or worse than poor.
4.20 Energy saving (with respect to the classic system) as a function of the measured average transfer delay for different wafer qualities. The better the wafer, the larger the energy saving.
4.21 Operating points used by the self-calibrating system in the presence of strong noise. ◦ → classic system; + → self-calibrating system. The classic system has a reduced yield under these conditions, while the self-calibrating one moves to more energy-consuming, but safer operating points.
5.1 Two different self-synchronising encoding schemes that both obey a sequencing rule where two dictionaries (namely, $C_0$ and $C_1$) are used alternately.
5.2 Hard self-synchronisation, contrary to soft self-synchronisation, ensures that any intermediate symbol that may be sampled while individual bit transitions are taking place is not mistaken for a valid codeword.
5.3 Encoder structure of a sequencing rule. The next emitted symbol depends on the previously emitted one, on the number of symbols emitted so far, and on the input information.
5.4 The different symbol sequences that can be generated up to time index 2, for a 2-bit sequencing rule with symbol set $S = \{s_0, s_1, s_2, s_3\}$.
5.5 Structure of an alternating-phase encoder. The flip-flop acts as a 1-bit counter.
5.6 Structure of a differential encoder. The next emitted codeword depends on the previously emitted one and on the input information.
5.7 Encoder structure of a (a) general, (b) time-invariant, and (c) symbol-invariant sequencing rule. The simplest encoder structure (such as standard codes as defined in Example 5) is drawn in case (d) and has no self-synchronising property because only spatial redundancy is added.
5.8 Synchronous implementations of an ($N = 2K$, $K$) LEDR encoder: (a) symbol-invariant and (b) time-invariant. Implementation (a) is symbol-invariant since the decoding set can be determined knowing only the phase, which is a 1-bit counter. Implementation (b) is time-invariant because the next decoding set is a function only of the previously emitted symbol.
5.9 The graph $G(010)$. The equivalence classes are drawn with a dotted line. The claims of Lemma 1 can be verified on this example.
5.10 Relative bandwidth efficiency of various codes with respect to the optimal solution, i.e., the Sperner code with spacer-based or differential encoding.
5.11 Residual error rate as a function of the timing error rate $\varepsilon_t$ for the linear ($N = 40$, $K = 32$) CRC code generated by the polynomial $x^8 + x^2 + x + 1$.
5.12 Possible options of alternating-phase encoding. The bottom right drawing depicts the LEDR code. All options with more than one independent encoder are particular cases of the top row.
5.13 Residual error rate ($\varepsilon_{res}$) as a function of the timing error rate $\varepsilon_t$ for the ($N = 40$, $K = 32$) CRC-8 alternating-phase encoding generated by the polynomial $x^8 + x^2 + x + 1$. The 40-bit word error rate curve has been plotted to show that the residual error rate of the encoding is very low (less than $10^{-10}$) as long as the word error rate remains less than a few percent.
5.14 Residual error rate as a function of the additive bit error rate $\varepsilon_a$ over a 10-bit wide BSC for the ($N = 10$, $K = 7$) Berger code (top), the ($N = 10$, $K = 5$) LEDR code (middle), and the ($N = 10$, $K = 7$) alternating-phase encoding generated by the polynomial $x^3 + x^2 + 1$ (bottom).
5.15 Residual error rate as a function of the additive bit error rate $\varepsilon_a$ over a 38-bit BSC for the $2 \cdot (N = 19, K = 15)$ Berger code, the ($N = 38$, $K = 19$) LEDR code, and the ($N = 38$, $K = 30$) alternating-phase encoding generated by the polynomial $x^8 + x^2 + x + 1$.
5.16 The graph sketches how the novel CRC-based alternating-phase encoding compares with existing encoding schemes targeting either timing or additive errors. The results developed in the chapter show that improving the robustness to timing errors of the CRC alternating-phase encoding entails both a reduction of bandwidth efficiency and a lower robustness towards additive errors, as indicated by the thick arrow.
6.1 Top and middle: residual (top) and reported (middle) word error probability as a function of supply voltage for a bus terminated by Razor flip-flops or a soft SSC. Bottom: ratio of residual to reported error probability for each checker. The reliability metric plotted on the vertical axis expresses clearly and compactly the complementarity of both checkers.
6.2 The K-bit input data is first encoded into an N-bit codeword. The codeword is transmitted over a link terminated by Razor flip-flops. Finally, the data sampled by the Razor flip-flops is validated in the decoding stage.
6.3 Timing diagram of a timing error (top) and of a short-path error (bottom).
6.4 Model of an N-bit bus terminated by Razor flip-flops.
6.5 All possible error outcomes for each of the 3 checkers (soft SSC, Razor flip-flops, and combined).
6.6 The ratio $\rho = \varepsilon_{res}/\varepsilon_{rep}$ as a function of bit error rate under normal variance (top) and large variance (bottom). The curves are extended with vertical dotted lines at the first and last simulated points where residual errors could be measured. The maximum unreliability of RZR+CRC1 is comparable to CRC3; however, it occurs at significantly larger bit error rates (and thus smaller voltages).
6.7 Bit error rate as seen by the soft SSC with and without Razor flip-flops. Without Razor flip-flops, the bit error rate at the decoder input is the bit error rate of the link. With Razor flip-flops, the bit error rate at the decoder input is less than the bit error rate of the link, since some errors are corrected.
6.8 A data link using a checker combining double-sampling flip-flops with a soft SSC. Timing errors are not corrected. Error recovery (by retransmission) and in-sequence data delivery are ensured by an ARQ controller such as the one described in Sec. 6.2.1.
6.9 Timing diagram describing the operation of the checker depicted in Fig. 6.8 in the presence of a timing error. Note that, contrary to the top diagram of Fig. 6.3, the phase of the encoder and decoder is not affected by the timing error.
6.10 The ratio $\rho = \varepsilon_{res}/\varepsilon_{rep}$ as a function of bit error rate under normal variance (top) and large variance (bottom). In both scenarios, the DSFF+CRC1 checker outperforms all other checkers, except the RZR+CRC3 checker in a limited range of error rates.
6.11 Hypothetical comparison of two checkers. With the operating point distribution of scenario 1, checker A is more reliable than checker B because the circuit is mostly used under error rate $\varepsilon_3$. On the contrary, the distribution of scenario 2 uses mainly the error rate $\varepsilon_1$, where checker B is more reliable than checker A.
6.12 Successive steps required for the computation of the effective reliability metric $\rho_{eff}$.
6.13 The error rate tracking controller modelled as a discrete-time Markov chain. The probability of requesting voltage level $j$ given that the current voltage level is $i$ is denoted by $p_{i,j}$.
6.14 An input-output property verified by the computing element under correct operation is used as a checking principle. Without loss of generality, a single-input, single-output computing element can be considered.
6.15 An adder with an alternating-parity prediction scheme. A and B are the two parity-encoded operands. S and C are respectively the sum and internal carries.
6.16 An adder combining an alternating-parity prediction scheme with Razor flip-flops. A and B are the two parity-encoded operands. S and C are respectively the sum and internal carries.
A.1 Information bits of $\tilde{e}$ that are 1 impose that the bits at the same position in the codeword $t$ are also 1, as indicated by the two leftmost vertical arrows. Similarly, the redundant bits of $\tilde{e}$ that are 1 impose that the corresponding bit positions are 1 for each element of Sup($\tilde{e}$). This constraint is not met under any possible assignment of unconstrained information bit positions marked by an “X”.
D.1 Word error rate $\varepsilon_w$ (top), residual error rate $\varepsilon_{res}$ (middle) and reported error rate $\varepsilon_{rep}$ (bottom) of Razor flip-flops (left) and alternating-phase code (right) with the 8-bit CRC generated by the polynomial $x^8 + x^2 + x + 1$ for 32-bit information.
Chapter 1
Introduction
Out of intense complexity, intense
simplicities can emerge.
Sir Winston Churchill
We start by briefly introducing worst-case design in Sec. 1.1. In particular, we
explain why this technique faces growing difficulties as CMOS technology scales
down. Next, Sec. 1.2 describes the concept of self-calibrating circuits. We
motivate the interest in this novel technique, which we present as an alternative
to worst-case design. Sec. 1.3 states qualitatively our goal and the problem we
focus on. Finally, Sec. 1.4 gives the general structure of the thesis.
1.1 Worst-Case Design
Worst-case design is a widespread engineering technique extremely simple in its
principle: it consists in taking a safety margin sufficient to ensure that certain
design objectives are met even when several worst-case situations coincide.
Engineering knowledge is required to quantify what is both “sufficient” and
cost-effective.
Described at a high level, worst-case design in the VLSI field is the technique
designers have used so far to ensure that a circuit meets its performance, relia-
bility, and power objectives, even though there are uncertainties at design time.
It consists in making worst-case assumptions about various quantities that are
unknown at design time. Essentially, these assumptions bear on the following
aspects.
• Process capability. This aspect relates to the speed at which silicon
operates. It is modelled as a probabilistic distribution, either continuous
(speed is then a positive real number) or discrete (e.g., speed is either
slow, typical, or fast). The speed of silicon assumed at design time is that
of a slow process. If silicon speed is modelled as a Gaussian random variable,
then the speed assumed at design time is usually 3σ away from
the typical speed (with σ the standard deviation of the Gaussian random
variable).
• Noise and environmental conditions. Different noise sources
(capacitive and inductive coupling, IR drops, etc.) are modelled by
probabilistic distributions. Their effect is considered additive. Environmental
conditions such as temperature are also taken into account.
Worst-case design thus aims at meeting required performance, reliability, and
energy consumption levels under some very pessimistic noise, process, and
environmental conditions considered as worst-case. For example, a circuit
designed with a slow process should operate at the maximum required performance
and meet energy consumption and reliability objectives under the worst noise
conditions.
Only a tiny fraction of all the manufactured chips do not meet these design
objectives: the remaining ones constitute the yield. In addition, worst-case de-
sign ensures correctness-by-design, since actual process capabilities and actual
operating conditions are necessarily more favourable than the ones assumed
at design time. This nice property comes at the expense of another: conser-
vativeness. Indeed, worst-case design is intrinsically conservative, because the
conjunction of all the worst-case assumptions is unlikely.
Recently, researchers have raised concerns about the viability of worst-case
design as CMOS technology continues to scale down [Austin et al., 2005]. The
downside of technology scaling is an increase in variability and design complexity.
• Variability. As CMOS technology moves to nanometer geometry, cir-
cuits become more vulnerable to internal and external noise sources. The
ever decreasing supply voltage reduces noise margins, which in turn exac-
erbates the effect of various noise sources and the susceptibility to alpha
particles (soft errors) [Nicolaidis, 2005]. At the same time, we see
increasingly more variation in device dimensions and behavior. This fact
is referred to as parameter variation and has several causes.
First, due to extreme scaling, random variations in the concentrations of
dopant atoms in transistor channels affect their characteristics, such as
speed and leakage current. Typically, chips manufactured in 130nm technology
exhibit a 30% variation in operating frequency and a 5- to 10-fold
variation in leakage power [Borkar, 2005]. Leakage variations of the order
of 20 times are reported in 90nm technology. Secondly, lithography—the
method used for patterning transistors—is pushed to its limits, causing
rough line edges. It is thus difficult to scale device dimensions without cre-
ating deep sub-micron noise effects, or even defects. For example, lateral
coupling dominates in nanometer wires, which causes serious crosstalk.
Moreover, it is anticipated that, in future technologies, as much as 10%
of the transistors may have a variation in critical dimensions larger than
6 σ [Borkar et al., 2004]. Another factor contributing to differences in
transistor behavior is the variation in the heat flux over the chip. While
the two previous factors cause static variation, variation of temperature
over the chip is dynamic, as it varies as a function of time, workload, and
location.
• Complexity. According to Moore’s law, which has been tracked accu-
rately until now, the number of transistors available to designers doubles
every 18 months. Due to this exponential increase of the transistor budget,
circuits of unprecedented complexity can be manufactured. However, at the
same time, the verification of such circuits requires huge efforts. In
particular, verifying the different corner cases (such as process, voltage, and
temperature) is tedious work, although most of these corners are quite
unlikely to be encountered during actual operating conditions. The main
culprit here is one attribute of worst-case design: correct operation
has to be ensured. As a result, designers have the challenging task of
verifying highly complex circuits under highly unfavorable conditions.
In summary, worst-case design has been successful so far, since manufacturing
has been nearly defect-free, process has exhibited low variation, and susceptibil-
ity to noise has been controlled. Nanometer technologies, which are forecast in
the near future, exhibit none of these properties. These observations raise
concerns about the ability of worst-case design to build reliable, fast,
high-yield, and low-power chips. Fig. 1.1 illustrates the inefficiency of
worst-case design when facing large delay variation (the spread of the
distributions has been exaggerated for the purpose of illustration). As a
result, costly efforts made by technologists to reduce typical delay are
cancelled by a simultaneous increase in delay variation.
[Figure: probability distribution versus delay for the older technology and
for the newer technology (faster but less predictable), with typical delays
τtyp and τ′typ, slow-corner delays τslow and τ′slow located at 3σ and 3σ′, and
the same planned yield losses. Annotations: significant improvement in typical
delay, but minimal improvement in worst-case delay.]
Figure 1.1: Delay distribution of a new and an old CMOS technology. Even though
the average delay of the newer technology is improved with respect to the older one,
we observe barely any improvement in worst-case delay due to the much larger spread
of the newer technology.
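The effect depicted in Fig. 1.1 can be reproduced with a few lines of
arithmetic. The sketch below is only an illustration under stated assumptions:
delay is modelled as a Gaussian random variable, the delay assumed at design
time is taken 3σ above the typical delay, and the typical delays and standard
deviations are hypothetical, normalised values that do not come from any
measured technology. The function names are ours.

```python
from statistics import NormalDist


def slow_corner_delay(typical_delay, sigma, k=3.0):
    """Delay assumed at design time: k standard deviations slower than typical."""
    return typical_delay + k * sigma


def planned_yield_loss(k=3.0):
    """Fraction of chips slower than the k-sigma corner under a Gaussian model."""
    return 1.0 - NormalDist().cdf(k)


# Hypothetical, normalised numbers chosen only for illustration.
old_typ, old_sigma = 1.00, 0.05   # older technology: slower but predictable
new_typ, new_sigma = 0.80, 0.11   # newer technology: faster but larger spread

old_slow = slow_corner_delay(old_typ, old_sigma)
new_slow = slow_corner_delay(new_typ, new_sigma)

# Relative improvements brought by the newer technology.
typical_gain = 1.0 - new_typ / old_typ
worst_case_gain = 1.0 - new_slow / old_slow

print(f"typical delay gain:    {typical_gain:.1%}")
print(f"worst-case delay gain: {worst_case_gain:.1%}")
print(f"planned yield loss:    {planned_yield_loss():.3%}")
```

With these made-up numbers, a 20% gain in typical delay shrinks to less than
2% at the 3σ corner, while the planned yield loss (about 0.135% of the chips
under the Gaussian model) is identical for both technologies.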
The next section introduces the concept of self-calibrating design, and
explains why it is expected to handle uncertainty better than the worst-case
technique just presented.
1.2 Self-Calibration
Worst-case design does not allow concerns such as reliability, performance,
and energy consumption to be separated. As a result, exploiting the trade-offs between
these objectives becomes quite intricate. For example, if a manufactured circuit
is faster than expected or required, worst-case design prevents it from either op-
erating faster, as actual silicon capabilities would permit, or from saving energy
by working at lower voltage without any performance penalty. Self-calibration is
meant to exploit such trade-offs. Fig. 1.2 illustrates this point with a simple
qualitative example. The nominal relation between delay and supply voltage might
be worsened by the deviation of a number of physical phenomena whose cumulative
effect is expressed in the worst-case relation. Points X′ to X′′′′ could be used
by Dynamic Voltage Scaling (DVS) techniques that we discuss in Sec. 2.1. At a
given supply voltage vdd, a designer assumes the most conservative delay (that
is, that the operating point is not, for instance, A but B) and implements the
design accordingly. In fact, the actual device at a particular instant is very likely
to be operating in much more favourable conditions and, for instance, the delay
may actually be as in C. This implies the following energy waste: operation
at the reduced voltage v′dd (B′) would yield the same performance the system
has been designed for. A less conservative operation in B′ rather than B would
achieve the very same user function in the same time but would save a
potentially significant amount of energy, roughly proportional to the difference
of the squares of the supply voltages vdd and v′dd. As this example illustrates,
worst-case design typically results in a waste of resources: usually silicon area
and, more critically, energy.
[Figure: delay versus supply voltage, showing the worst-case, nominal, and
actual relations, the operating points A, B, B′, C, and X′ to X′′′′, and the
supply voltages vdd and v′dd.]
Figure 1.2: Illustration of the potential waste resulting from worst-case design.
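The energy argument made with points B and B′ can be turned into a small
calculation. The sketch below assumes, as the text does, that the energy spent
on a given task scales with the square of the supply voltage; the function
name and the two voltage values are hypothetical and serve only as an example.

```python
def relative_energy_saving(vdd, vdd_reduced):
    """Fraction of energy saved by operating at vdd_reduced instead of vdd,
    assuming energy is proportional to the square of the supply voltage."""
    if not 0 < vdd_reduced <= vdd:
        raise ValueError("expected 0 < vdd_reduced <= vdd")
    return (vdd ** 2 - vdd_reduced ** 2) / vdd ** 2


# Hypothetical operating points: B at 1.0 V and B' at 0.8 V.
saving = relative_energy_saving(1.0, 0.8)
print(f"energy saved by operating in B' rather than B: {saving:.0%}")
```

Under this quadratic assumption, a 20% reduction in supply voltage at
unchanged performance saves 36% of the energy.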
Digital self-calibration has recently gained momentum in the VLSI research
community. At the device level, examples include the self-calibration of a
keeper1 based on actual measurement of the leakage current [Kim et al., 2005]
and the adaptive source biasing of an SRAM array [Ghosh et al., 2006]. At
the circuit level, several architectures have been proposed where the operating
voltage and frequency are tuned dynamically as a function of detected errors.
Through the work described in Chapter 4, this thesis has actually pioneered
the use of digital self-calibration to tune the operating points of an on-chip
link [Worm et al., 2002; 2005]. Shortly after our initial work of 2002, researchers
from the university of Michigan have developed self-calibration techniques based
on a checker consisting of double sampling flip-flops [Austin et al., 2004;
Kaul et al., 2005; Das et al., 2006].
1A keeper is a device usually added to large fan-in domino gates.
[Figure: block diagram in which the input drives the circuit/link, whose output
is verified by the checker (reliability). The checker delivers the correct
output, may let through an undetected erroneous output, and reports detected
erroneous outputs to the controller, which sets the frequency F and the supply
voltage Vdd according to the performance requirement (power vs. performance
trade-off).]
Figure 1.3: A circuit with self-calibrated frequency and voltage supply. Contrary to
worst-case design, concerns about, on the one hand, reliability are confined to the
checker, while, on the other hand, power and performance trade-offs are determined
by the operating point controller.
Fig. 1.3 depicts the structure of a circuit where the supply voltage is self-
calibrated as a function of reported errors. The output is verified by the checker.
The latter informs the controller of detected errors. Depending on the occur-
rence of detected errors and on the actual performance requirement, the con-
troller adjusts frequency and/or voltage. Correction of an erroneous circuit
output is achieved either by the checker or by re-evaluating the output. While
the values of frequency and voltage do not result from worst-case assumptions
about the circuit or link, the checker and controller are assumed to be designed
worst-case. Clearly, self-calibration incurs some hardware overhead (mainly the
checker and operating point controller logic). Yet, the overhead is expected to
be offset by operation at sub-critical voltage. We will verify this expectation in
Chapter 4.
Circuits with self-calibrating operating points have the following peculiari-
ties. Contrary to fault-tolerant circuits (that we discuss in Sec. 2.3), the checker
is part of a closed-loop control system. This feature complicates the estimation
of reliability. For example, the controller should not request operating points
where the checker reliability is poor. Moreover, during the self-calibration pro-
cess, the circuit may be operated temporarily at sub-critical voltage, i.e., under
an error rate as large as 100%. Detecting errors even under such conditions puts
an original and challenging requirement on the checker.
The next section states the goal of this work and formulates qualitatively
the problem we address.
1.3 Goal and Problem Statement
A circuit with a self-calibrated operating point must (i) detect and correct
corrupted circuit outputs, and (ii) control the operating points based on feed-
back about errors and performance requirements. We want to address items (i)
and (ii) by separating concerns between, on the one hand, reliability—confined
to the checker—and, on the other hand, power and performance trade-offs that
can be exploited by the operating point controller. That is, ideally, reliability
should be determined exclusively by the checker, while the operating point con-
troller would determine power and performance of the circuit. Such a situation
can only be attained if reliability is ensured by the checker under any circuit
error rate. Otherwise, the operating point control policy needs to account for
reliability concerns (e.g., avoiding weak spots of the checker) in addition to
power and performance objectives. This situation complicates design trade-offs,
or may still require worst-case assumptions to ensure reliability, as in the
case of double sampling flip-flops. In short, we aim at developing highly robust
checkers enabling a reliability-agnostic control of the circuit operating points.
The work presented in this thesis focuses on item (i), in the context of an
on-chip link with self-calibrating operating points. More specifically, our goal is
to design a low-overhead link checker, reliable under any error rate ranging from
0 to 1. This translates into the following problem. Let tp be the propagation
delay through a line of a link, whose operating frequency and voltage are self-
calibrated. Even though the link may be operated at sub-critical voltage, tp is
not characterised by any worst-case assumption. The link error rate may thus
reach 100%, which happens if the propagation time tp significantly exceeds the
sampling period. In this context, we want to design a reliable checker for the
link. The worst-case solution of this problem is to use conservative operating
points that ensure statistically tp < Tc.
We will argue in Sec. 4.3 that, if this goal is fulfilled, frequency and voltage
can be determined in a decoupled manner. On the one hand, frequency is set
based on performance requirement only. This question is well researched, as
we mention later in Sec. 2.1. On the other hand, voltage is adjusted based
on detected errors by simple policies, such as tracking a given error rate, or
requesting to increase voltage as soon as an error is reported. While the focus
of this thesis is on solving item (i), the solutions proposed for this problem
also make it possible to address item (ii) using the dynamic voltage scaling technique
we describe in Sec. 2.1. The benefits of this approach are as follows:
• separation of concerns between reliability, power, and performance;
• use of well studied dynamic voltage scaling techniques to determine oper-
ating frequency;
• simple control of the operating voltage based on detected errors.
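One of the simple voltage policies mentioned above, requesting a higher
voltage as soon as an error is reported, can be sketched as follows. This is a
minimal illustration and not the controller studied in this thesis: the class
name, the discrete voltage levels indexed from 0 (lowest) upwards, and the rule
that lowers the voltage after a number of consecutive error-free reports
(the `patience` parameter) are all assumptions made for the example.

```python
class VoltageController:
    """Toy operating-voltage controller driven only by checker feedback."""

    def __init__(self, num_levels, start_level, patience=3):
        self.num_levels = num_levels    # number of available voltage levels
        self.level = start_level        # index of the current voltage level
        self.patience = patience        # error-free reports before lowering
        self.error_free = 0             # consecutive error-free reports seen

    def update(self, error_reported):
        """Return the voltage level requested after one checker report."""
        if error_reported:
            # Raise the voltage as soon as an error is reported.
            self.level = min(self.level + 1, self.num_levels - 1)
            self.error_free = 0
        else:
            # Assumed rule: probe a lower voltage after a quiet period.
            self.error_free += 1
            if self.error_free >= self.patience:
                self.level = max(self.level - 1, 0)
                self.error_free = 0
        return self.level


ctrl = VoltageController(num_levels=4, start_level=3)
feedback = [False, False, False, False, False, False, True, False]
trace = [ctrl.update(e) for e in feedback]
print(trace)
```

Note that the policy never looks at reliability: it relies entirely on the
checker reporting errors at every operating point it may visit, which is the
separation of concerns listed above.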
1.4 Outline
Chapter 2 reviews various techniques related to some extent to our goal, such
as dynamic voltage and frequency scaling, double sampling flip-flops, on-chip
data encoding, asynchronous circuits, and fault-tolerant circuits. Each time,
we emphasise what makes the requirements of our problem unique.
Chapter 3 defines the models of the bit error rate and communication channel
assumed in this work. The bit error rate model is not developed for high
accuracy, but rather in order to generate errors during simulations
and compare qualitatively different checkers under a common error model.
Regarding the channel, the model introduced defines formally timing errors, i.e.,
how we abstract errors caused by operation at sub-critical voltage.
Chapter 4 gives a detailed system-level description of a self-calibrating link.
The goal of this chapter is twofold. First, we study the energy balance of the
self-calibrating link. We observe that tangible energy gains can be achieved
without sacrificing performance or reliability. This result establishes the
interest of checkers such as the one we study, because it shows that a
self-calibrating link, despite requiring additional elements, is not
unrealistic. Second, we introduce
a particular embodiment of the code-based checker we use to detect massive
timing errors. We state up-front that the main contributions of this thesis are
not to be found in this chapter, but rather in the two following ones where
we study in an analytical framework the self-synchronising properties of more
general encoding families and propose more robust checker architectures.
In Chapter 5, we develop an analytical framework to express and study
in a synchronous context the self-synchronisation properties required to detect
timing errors. More specifically, we give an achievable lower bound on the
wiring overhead required to detect all such errors—a property that we call hard
self-synchronisation. We address this question first in full generality, and then
focus on a particular encoder structure. Next, we study soft self-synchronising
encodings, i.e., schemes that use bandwidth more efficiently than hard self-
synchronising ones (and thus do not detect all possible timing errors). Moreover,
we show how the encoding proposed in Chapter 4 is obtained by reducing the
wiring overhead of a particular hard self-synchronising encoding.
In a first part, Chapter 6 focuses on checkers based on double sampling flip-
flops. Having explained in Chapter 2 how they detect and correct timing errors,
we now contrast their error detection capabilities with our coding technique.
We emphasise their complementarity. Next, in a second part, we exploit this
fact and show how to simply combine them, while preserving the benefits of
each one. The resulting checker architecture features an unmatched robustness
to timing errors over the whole range of error rates. Moreover, we show that the
combination makes it possible to relax the coding requirements: very basic and tiny codes
can be used in the combined checker. We take advantage of this opportunity
to extend our scope to computation. We give preliminary checker architectures
for an adder with self-calibrated supply voltage. The main achievement of this
chapter is to propose novel checker architectures meeting the goal stated in
Sec. 1.3.
Finally, Chapter 7 concludes by summarising our achievements with respect
to the goal introduced in this chapter and points out perspectives opened by
this work.
Chapter 2
How to Design Differently Than
Worst-Case?
This chapter briefly reviews existing pieces of work that relate to our goal.
Some of them constitute enabling techniques (e.g., dynamic voltage and
frequency scaling), while others provide complementary solutions or address slightly
different objectives. We proceed as follows. Sec. 2.1 introduces voltage and fre-
quency scaling techniques, which we use to handle uncertainty not only about
workload—in fact, its traditional application—but also in noise and process
condition. Then, Sec. 2.2 describes alternatives to worst-case design, i.e., tech-
niques that do not satisfy design objectives such as power, performance, and
reliability by relying on pessimistic worst-case assumptions. In their intent,
such techniques are similar to the one we focus on in this work. Finally, Sec. 2.3
discusses various contributions to fault-tolerance in VLSI systems. To conclude,
we stress that no single piece of work mentioned in this chapter meets our goal,
which motivates the originality of our approach.
2.1 Dynamic Voltage and Frequency Scaling
Dynamic Voltage and Frequency Scaling (DVFS) makes it possible to adjust the
operating frequency of a circuit as a function of actual workload requirements,
which are typically below peak performance. By operating at reduced frequency, a reduction in
supply voltage is in turn possible. The dynamic power Pd dissipated in a circuit
is given by the well-known equation
\[ P_d = \frac{1}{2}\,\alpha\,C\,f\,v_{dd}^{2}, \]
where α is the switching activity factor, C the total switched capacitance ab-
stracting the circuit netlist, f is the operating frequency, and vdd is the supply
voltage. Because DVFS reduces both the operating voltage and frequency, the
resulting decrease in dynamic power consumption is significant. Reducing power
consumption is a key concern, especially for battery-powered circuits (there is
no equivalent of Moore's law for batteries) and for high performance computing
elements, where cooling is critical.
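To make the leverage of DVFS on dynamic power concrete, the following sketch
evaluates the equation above at two operating points. It is an illustration
only: the function name and all numerical values (activity factor, switched
capacitance, frequencies, and voltages) are hypothetical.

```python
def dynamic_power(alpha, capacitance, frequency, vdd):
    """Dynamic power P_d = 1/2 * alpha * C * f * vdd^2, in watts."""
    return 0.5 * alpha * capacitance * frequency * vdd ** 2


# Hypothetical circuit: activity factor 0.1, 1 nF of switched capacitance.
ALPHA, CAP = 0.1, 1e-9

p_peak = dynamic_power(ALPHA, CAP, frequency=1.0e9, vdd=1.0)    # peak point
p_scaled = dynamic_power(ALPHA, CAP, frequency=0.5e9, vdd=0.8)  # scaled point

ratio = p_scaled / p_peak
print(f"peak: {p_peak * 1e3:.1f} mW, scaled: {p_scaled * 1e3:.1f} mW, "
      f"ratio: {ratio:.2f}")
```

With these example values, halving the frequency and lowering the supply
voltage from 1 V to 0.8 V reduces dynamic power to 32% of its peak value,
whereas halving the frequency alone would only reach 50%.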
DVFS has been successfully applied to microprocessors [Nowka et al., 2002;
Flautner et al., 2001]. For example, it has been implemented for the Pentium
processor and is supported by both the Windows 32 and Linux operating sys-
tems. In addition, DVFS has been proposed for off-chip links [Shang et al.,
2003]. As pointed out in the discussion of Fig. 1.2, such designs still rely on
worst-case assumptions to determine the voltage level chosen for each possible
operating frequency. The energy overhead required to change voltage is kept
low due to the high efficiency of state-of-the-art voltage converters [Sakiyama
et al., 1999].
A DVFS-capable processor offers the possibility to operate at less than peak
performance. Yet, in a real-time context, deadlines of tasks should not be
missed, even though the processor they are running on does not always operate
at peak performance. This requirement leads to the formulation of the following
scheduling problem. Given a set of possible operating points and a set of tasks
(each one is characterised by its arrival time, processor cycles requirement, and
deadline), find a schedule of operating points and tasks that minimises energy
consumption (approximated by the average square operating voltage) and meets
the deadline of each task. This problem has been heavily researched and is
solved: for example, we refer the reader to the elegant solution proposed by
Gaujal et al. [2005]. For applications without any hard real-time constraints,
the control of frequency can be determined by simpler algorithms, e.g., using
history-based workload prediction [Sinha and Chandrakasan, 2001].
Besides the fact that DVFS techniques rely on worst-case design and are
thus affected by the increasing variability of nanometer technology, their us-
age in future designs suffers from additional limitations. First, an ineluctable
consequence of the reduction in supply voltage is that the range of voltage avail-
able for further scaling dramatically shrinks. For illustration, typical values of
the supply and threshold voltages of a 90nm technology are respectively 1V and
0.2V, which does not leave a large margin for voltage scaling. Somewhat related
to this first point is the fact that, as technology progresses, an increasingly large
part of the power dissipated is static. While DVFS only decreases the dynamic
power consumption, researchers have proposed to combine it with a method also
addressing static power [Martin et al., 2002].
In the perspective of this work, DVFS in itself does not provide a solution
since each frequency-voltage operating pair is chosen under worst-case assumptions.
Nevertheless, DVFS embodies a design technique that handles workload uncer-
tainty differently than by always operating at peak frequency (i.e., worst-case).
Our approach consists in extending this idea to accommodate uncertainty not
only about workload, but also about noise and process conditions. Moreover,
we do so by leveraging existing work about frequency control. With respect
to our intents, the significance of DVFS techniques is to provide mature
building blocks that make it possible to dynamically vary the on-chip supply voltage. Many
researchers are currently exploiting this opportunity for different purposes (e.g.,
dynamic thermal management [Brooks and Martonosi, 2001] or dynamic relia-
bility management [Karl et al., 2006]). Yet, all these design methods share with
our work the fact that they bring alternatives to traditional worst-case design.
[Figure: schematic of a Razor flip-flop with input din, output dout, clocks clk
and clk_del, an input selector labelled 0/1, and the error signal rzr_error.
The main flip-flop is optimised for performance; the shadow latch is the
checker, optimised for reliability.]
Figure 2.1: A Razor flip-flop. The shadow latch samples the input data some
time after the main flip-flop. The sampled values are then compared. A mismatch
indicates a timing error because the main flip-flop has sampled data just before it
was changing. In this case, the output of the shadow latch is returned to the main
flip-flop input, correcting thus the timing error at the expense of an additional cycle
latency.
2.2 Alternatives to Worst-Case Design
This section discusses a few examples of designs sharing the feature that the
traditional worst-case approach is either marginally or even not at all used to
meet design objectives.
2.2.1 Better Than Worst-Case Designs
Better than worst-case designs make it possible to separate concerns such as performance
and reliability by optimising the main circuit for the former while ensuring the
latter with a checker [Austin et al., 2005].
Recently, novel designs have been successfully implemented that rely on
double sampling flip-flops as a checker to correct timing errors caused by self-
calibration of the supply voltage of a circuit [Austin et al., 2004; Kaul et al.,
2005; Das et al., 2006; Li et al., 2006]. Previously, double sampling data—i.e.,
the addition of intra-cycle time redundancy—has been proposed by Nicolaidis
as a method for recovering from transient faults, which exploits their locality in
time [1999]. As Fig. 2.1 shows, a Razor flip-flop consists of a main flip-flop,
fed by the normal clock, and a so-called shadow latch, fed by a clock delayed
with respect to the main clock. Data at the input is first latched by the main
flip-flop. Then, it is latched after some delay by the shadow latch. The setup
and hold constraints of the latter are ensured by both worst-case and best-case
assumptions about the data arrival time. Provided these assumptions are met,
the data held by the shadow latch acts as a “reference”. Next, a metastability-
tolerant comparator validates the data latched in the main flip-flop by comparing
it with the one of the shadow latch. In case a timing error—i.e., a mismatch—
is detected, the output of the shadow latch is re-issued at the main flip-flop
input, while at the same time raising an error signal. The timing error is thus
corrected at the expense of a one cycle latency (in a processor pipeline, error
recovery actually incurs a larger penalty).
A Razor flip-flop corrects timing errors reliably as long as the setup and hold
constraints of the shadow latch are met, because the data held by the latter is
used to correct timing errors. These constraints are ensured by best-case and
worst-case assumptions about the arrival time of data at the input of the shadow
latch. The worst-case constraint is somewhat relaxed, because the shadow latch
is fed with a delayed clock. More specifically, a too late data arrival results
in an undetected timing error since both the main flip-flop and shadow latch
hold the same corrupted data: no mismatch is detected. On the contrary, a
too early data arrival causes a short-path error, whereby the data held in the
main flip-flop, although correct, is invalidated due to the early arrival of the
next data sampled in the shadow latch. In such a situation, a timing error
is reported although none actually occurred. Razor flip-flops provide a generic
method of correcting timing errors because they can replace standard flip-flops
after any combinational logic block, as long as the timing constraints of the
shadow latch are met and short-path errors are avoided. Yet, depending on
the considered circuit, the fact that both the shortest and longest (i.e., critical)
paths are constrained may constitute a severe limitation. The latency penalty
required to recover from errors is architecture-dependent. For example, error
recovery in a pipeline may require flushing it entirely.
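The four situations described above (correct sampling, corrected timing error,
undetected timing error, and short-path error) can be summarised by a small
behavioural model. The sketch below is a simplification under stated
assumptions: all times are measured from the clock edge that launches the data,
sampling is ideal (setup and hold windows and metastability are ignored), and
the function name, parameter names, and outcome labels are ours.

```python
def classify_razor_sample(arrival, period, shadow_delay, next_shortest_delay):
    """Classify one sampling event of an idealised Razor flip-flop.

    arrival             -- arrival time of the current data
    period              -- clock period; the main flip-flop samples at this time
    shadow_delay        -- lag of the delayed clock; the shadow latch samples
                           at period + shadow_delay
    next_shortest_delay -- shortest propagation delay of the following data,
                           which is launched at time `period`
    """
    if next_shortest_delay < shadow_delay:
        # The following data reaches the shadow latch before it samples:
        # the reference is lost and a mismatch may be reported wrongly.
        return "short-path error"
    if arrival <= period:
        # Both the main flip-flop and the shadow latch hold the right data.
        return "correct"
    if arrival <= period + shadow_delay:
        # Only the shadow latch holds the right data: mismatch, then recovery.
        return "detected and corrected"
    # Data is too late even for the shadow latch: both hold stale data.
    return "undetected"


# Hypothetical timings in arbitrary units.
PERIOD, SHADOW = 10, 3
results = [
    classify_razor_sample(8, PERIOD, SHADOW, next_shortest_delay=5),
    classify_razor_sample(12, PERIOD, SHADOW, next_shortest_delay=5),
    classify_razor_sample(14, PERIOD, SHADOW, next_shortest_delay=5),
    classify_razor_sample(8, PERIOD, SHADOW, next_shortest_delay=2),
]
print(results)
```

The model makes the two-sided constraint explicit: reliable correction requires
the longest path to stay below the period plus the shadow delay, and the
shortest path to exceed the shadow delay.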
As a checker, Razor flip-flops do not fully meet the goal expressed in Sec. 1.3
because reliable operation requires worst-case characterisation of the controlled
circuit. We expand on this point later in Chapter 6.
2.2.2 Asynchronous Circuits
In a synchronous circuit, the validity of signals and the progression of events
are defined with respect to a global clock. The notion of clock is intrinsically
related to the one of worst-case. Indeed, all combinational logic blocks of a
synchronous circuit—even though radically different—must evaluate in less than
a single clock period. As a result, the performance of a synchronous circuit is
defined in terms of worst-case: blocks that evaluate in much less than a clock
period do not improve performance at all, since their output is only used at the
end of the clock period.
On the contrary, asynchronous circuits do not use a clock to indicate events.
Instead, they are able to detect when a combinational block has evaluated (com-
pletion detection) and indicate events by changes in handshaking signals. The
advantages of asynchronous circuits are numerous.
• No clock. Asynchronous circuits do not need complex clock distribution
trees. Clock distribution is a notorious source of headaches for designers
and consumes a significant amount of power.
• Average performance. Asynchronous circuits do not rely on a clock
to indicate the validity of signals. Instead they use completion detection
techniques. Therefore, the result produced by a combinational logic block
can be exploited as soon as it is available and not at the end of a clock
period. That is, asynchronous circuits enable average performance.
• Tolerance to variations. Because asynchronous circuits use comple-
tion detection techniques, they are tolerant to process and environmental
variations.
• No idle power. A clock always toggles—and thus contributes to dynamic
power—even though there is no data activity for the circuit. On the
contrary, all switching activity occurring on asynchronous circuits is useful.
• Modularity. Because, in asynchronous circuits, timing assumptions are
explicit in the handshaking protocol, the redesign of a component is easily
handled.
Despite these benefits, synchronous circuits constitute the vast majority of
all circuits manufactured. The main reasons explaining why asynchronous cir-
cuits are confined to niche applications are as follows.
• Lack of tools. The design and verification of asynchronous circuits are
poorly automated. At the same time, tools for synchronous circuits are ex-
tremely optimised—a consequence of their heavy use—and alleviate some
of the inherent drawbacks of clocked circuits.
• Difference and complexity. Asynchronous circuits are radically dif-
ferent from synchronous ones, which in itself is sufficient to explain why
the community of designers is reluctant to adopt them. For example, the
operation of asynchronous circuits is concurrent and thus harder to grasp
than purely synchronous circuits. In addition, completion detection needs
to be implemented with glitch-free logic, which is complex and restricts
the set of codes that can be used for this purpose.
Completion detection is performed either by a code membership test or by
a matched delay line. In the first method, delay-insensitive codes, such as
dual-rail [Varshavsky, 1990] or LEDR (described in Example 2 of Chapter 4),
indicate the correct sequencing of events under any timing error rate and, thus,
constitute perfect checkers. In fact, timing errors are not even considered in
asynchronous circuits since they are defined with respect to a clock (i.e., a too
early sampling). The code membership test must be implemented with glitch-free
logic, which often requires the use of a spacer and significantly reduces the set
of possible codes (in practice, mostly to dual-rail or M-of-N codes
[Bainbridge et al., 2003]).
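As an illustration of a code membership test, the following minimal sketch models dual-rail completion detection in software. The rail ordering (true rail, false rail), the all-zero spacer, and all function names are common conventions assumed here for illustration; they are not taken from the cited works, and the sketch ignores the glitch-free hardware implementation that the text identifies as the hard part.

```python
SPACER = (0, 0)  # assumed convention: both rails low means "no data present"

def dual_rail_encode(bits):
    """Encode each bit b as the rail pair (true_rail, false_rail) = (b, 1 - b)."""
    return [(b, 1 - b) for b in bits]

def is_codeword(pairs):
    """Code membership test: every pair has exactly one rail high."""
    return all(t + f == 1 for t, f in pairs)

def is_spacer(pairs):
    """True when every pair has returned to the all-zero spacer."""
    return all(pair == SPACER for pair in pairs)

def dual_rail_decode(pairs):
    """Recover the data bits; only meaningful once is_codeword() holds."""
    if not is_codeword(pairs):
        raise ValueError("word is still in transition or invalid")
    return [t for t, _ in pairs]

word = dual_rail_encode([1, 0, 1])
in_flight = [word[0], SPACER, word[2]]   # the middle bit has not arrived yet
print(is_codeword(in_flight), is_codeword(word), dual_rail_decode(word))
# prints: False True [1, 0, 1]
```

A receiver that waits until the membership test succeeds cannot sample too early, whatever the wire delays are, which is why such codes act as perfect checkers in the sense used above.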
In this work, we do not consider asynchronous design because, first, as
pointed out before, asynchronous circuits are poorly supported by tools and,
second, they incur a significant overhead in terms of wiring and bandwidth. Yet,
the codes used as checkers that we develop later in Chapters 4, 5, and 6 borrow
ideas from asynchronous design.
In addition, a much larger set of codes is relevant for our problem since there is
no strict requirement of glitch-free implementation.
The second method of completion detection matches the critical path of a
combinational logic block with the delay through a chain of buffers (called a
delay line). The handshaking protocol relies on the delay line to define the
availability of the data output by the combinational logic. To operate
correctly, the delay through the line should always exceed the delay through
the critical path of the logic. For this to hold, the delay line still needs to
be designed for the worst case, even though temperature variations are
compensated for (the delay line and the circuit are exposed to the same
environment).
Nonetheless, designs based on a matched delay line bring some advantages with
respect to synchronous worst-case designed circuits. For example, they better
tolerate imbalance between different logic stages, as demonstrated by a technique
called de-synchronisation [Blunno et al., 2004; Cortadella et al., 2004]. Besides
their application to detect completion, matched delay lines are also used in
voltage controlled oscillators to generate clock signals [Toprak and Leblebici,
2005].
2.3 Fault-Tolerant VLSI Design
Traditional worst-case designed circuits do not tolerate errors at all. As a
matter of fact, design parameters (which impact performance and power
consumption) are chosen so that, on average, the circuit operates error-free for
a specified period of time (i.e., the mean time to failure). As discussed in
Chapter 1, this design paradigm is bound to encounter severe difficulties
because, among others, soft errors, thermal effects, and increased variation
challenge reliability.
Many techniques that provide fault-tolerance, from the gate level to the
application level, have existed since the beginning of CMOS technology.
Fault-tolerance at the gate level was first proposed in 1956 in the seminal work
of Von Neumann in order to compensate for the poor reliability of early NAND
logic gates and majority voters. Each binary logic signal was replaced by a
bundle of N lines [1956]. Because each gate was replicated N times, most of the
faults could be masked by the architecture. The application of this concept to
blocks larger than single gates has led to the well-known N-modular redundancy
technique, whereby a functional block is repeated N times (typically 3) and the
different outputs are resolved by one or several majority voters. Motivated by
reliability concerns regarding nanometer technologies, researchers have recently
investigated further refinements of modular redundancy [Han et al., 2005;
Thaker et al., 2005].
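The masking effect of modular redundancy can be illustrated with a short sketch. The bitwise voter below is the usual two-out-of-three majority function; the functional block, its 4-bit width, and the injected fault are hypothetical and serve only as an illustration.

```python
from collections import Counter

def vote3(a, b, c):
    """Bitwise majority of three replica outputs (integers used as bit vectors)."""
    return (a & b) | (a & c) | (b & c)

def vote_n(outputs):
    """Value returned by a strict majority of the N replicas, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if 2 * count > len(outputs) else None

def block(x):
    """Hypothetical functional block used for illustration: 4-bit increment."""
    return (x + 1) & 0xF

good = block(0b1010)        # 0b1011
faulty = good ^ 0b1000      # one replica suffers a single bit flip
print(bin(vote3(good, faulty, good)), vote_n([good, faulty, good]) == good)
# prints: 0b1011 True
```

Because the voter works bit by bit, it also masks faults that hit different bit positions in different replicas, as long as no bit position is corrupted in two replicas at once.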
Unlike data links, computing elements exhibit some intrinsic
capabilities of masking faults. For example, when two 1s are input to an OR
gate, the output is still correct even if a single error corrupts either in-
put. Building on this property, the sensitivity of microprocessors to fail-
ure due to soft errors has been recently studied by contrasting combinational
with sequential logic and control with execution parts [Saggese et al., 2005a;
2005b].
Other techniques do not mask faults, but instead rely on a checker to detect
errors produced by misbehaving circuits—typically, the ones affected by stuck-at
faults—or transient faults such as single event upsets [Gorshe and Bose, 1996;
Jien-Chung et al., 1992; Nicolaidis, 2003]. Correct outputs produced by the
circuit are expected to belong to a particular code. Moreover, the checker itself
is not assumed error-free. For this reason, the dual-rail code is often used, which
prevents a single stuck-at fault in the checker from corrupting the error signal. Two
important properties characterise checkers used in such systems. The first one is
defined by the ability to detect all erroneous outputs produced by the circuit for
a set of modelled faults. A checker satisfying this property is called fault-secure,
because it avoids the delivery of corrupted outputs. Yet, some faults could still
produce correct outputs and thus remain hidden. The second property aims at
avoiding such situations and requires that, for each modelled fault, there exists
an input vector producing an error at the circuit output. A checker satisfying
this property is self-testing, while a checker that is both fault-secure and self-
testing is called strongly fault-secure. Much of the research in this field targets
the development of low-overhead strongly fault-secure checkers for ALUs.
As far as on-chip communication is concerned, a wide range of coding
schemes exist that aim at improving reliability. Some of these techniques
affect not only reliability, but also performance or energy consumption. For
example, coding schemes that reduce the switching activity of large inter-wire
capacitance decrease energy consumption and, at the same time, increase relia-
bility because bit transition patterns maximising crosstalk are less frequent or
even avoided [Zhang et al., 2002; Victor and Keutzer, 2001].
Error correcting codes (ECC) such as CRC codes [MacWilliams and Sloane,
1977; Koopman and Chakravarty, 2004] are widely used in order to increase
the tolerance to additive errors. By applying ECC to an on-chip bus, Bertozzi
et al. have studied the especially important trade-off between reliability and
energy consumption [2002]. Indeed, an on-chip bus capable of error detection
or correction brings the overhead of both additional wires and the codec cir-
cuitry. However, the added redundancy offers the same reliability level as a
non-redundant bus at a smaller operating voltage. Although the considered
trade-off still relies on worst-case design—in particular, reliability figures are
derived from a questionable bit error rate model—our work has similar intents,
in that we use a coding technique tolerant to timing errors in order to operate
an on-chip link more aggressively than worst-case design enables. Moreover,
our work does not depend on a particular bit error rate model since our goal is
to ensure reliability under any bit error rate ranging from 0 to 1. Even though
traditional ECCs like CRCs increase the robustness to additive errors, Rossi et
al. have shown that by optimising the inter-wire spacing under a given area con-
straint, they better tolerate crosstalk caused by inter-wire capacitance [2005a;
2005b].
Finally, fault-tolerance may also be carried out at the algorithm level,
as performed by Shanbhag [2002; 2004]. This technique, referred to by its
authors as algorithmic noise-tolerance, jointly addresses reliability and energy
efficiency issues in digital signal processing systems. It exploits properties of
applications run over a DSP (typically, FIR filtering) to enable operation at
sub-critical voltage and thus grant significant energy saving. Like
better-than-worst-case designs, this technique exploits data dependencies, i.e.,
the fact that the critical path is excited only for some particular input data,
which possibly do not occur frequently. To ensure reliability, architectural
enhancements make it possible to recover from run-time errors and thus guarantee
a minimum signal-to-noise ratio. More specifically, blocks implementing linear
prediction are added. Furthermore, an original means of achieving fault-tolerance
is called reduced precision redundancy, whereby a low precision replica of the
main DSP is used to validate outputs of the latter.
While our work shares the feature of operating at sub-critical voltage, algo-
rithmic noise tolerance targets specifically digital signal processing and its goal
is essentially to achieve tangible energy saving without degrading the signal-
to-noise ratio. Furthermore, algorithmic noise tolerance does not use detected
errors as part of a closed-loop control of the supply voltage.
As discussed in this section, fault-tolerance in VLSI systems is achieved by
deploying a whole spectrum of techniques that mask the occurrence of faults.
Except in the case of algorithmic noise-tolerance, faults are modelled as either
relatively infrequent and uncorrelated (typically, transient faults caused by
single event upsets) or, on the contrary, as permanent faults (typically,
stuck-at faults).
The context of our work is totally different: instead of trying to screen out
the effect of faults, we aim at operating an on-chip link aggressively and thereby,
temporarily, under massive timing errors, while at the same time ensuring reli-
ability. Moreover, unlike all techniques discussed in this section, our approach
adjusts dynamically its level of fault-tolerance to silicon capabilities. As a
matter of fact, most of the work reviewed is complementary to our approach,
because there is no sense in adjusting the operating points of a circuit subject
to rare and uncorrelated errors.
2.4 Conclusions
To conclude, the following points emphasise the novelty of our work by
summarising what is particular about its context and requirements.
• Synchronous operation. Contrary to the techniques discussed in
Sec. 2.2.2, we do not consider asynchronous communication.
• No worst-case assumption about the link. We do not rely on any
worst-case characterisation of the link. In particular, the reliable operation
of the checkers we propose is to be ensured without any assumption about
the link itself, even though the checkers themselves can be designed worst-
case. This requirement explains the difference of our approach with respect
to checkers consisting of double sampling flip-flops (discussed in Sec. 2.2.1).
• Independence towards bit error rate models. This point follows
from the last stated requirement. We do not make any worst-case as-
sumption about the link and, in particular, about its bit error rate. While
we assume a particular type of error—namely, timing errors that we define
in Chapter 3—we do not make any assumption on the bit error rate it-
self, that is, the probability that a timing error occurs given the operating
points of the link.
• Separation of design concerns. Reliability issues are confined to the
checker, since the latter should operate reliably under any possible error
rate. Other design objectives such as performance are decoupled from
reliability and are determined by an operating point control policy uncon-
strained by the need of avoiding weak spots of the checker. That is, we
aim at a reliability-agnostic control of the self-calibrating circuit.
• Closed-loop control of voltage. Contrary to techniques providing
fault-tolerance, we apply voltage scaling techniques to adjust dynamically
the supply voltage as a function of detected errors. That is, our checker
is a part of a closed-loop control system. A practical consequence is that
the self-calibration process may cause the link to be temporarily operated
under massive error rates as large as 100%.
Chapter 3
Bit Error Rate and Channel Models
This chapter presents the bit error rate model (Sec. 3.1) and the channel model
(Sec. 3.2) of the data link depicted in Fig. 3.1.
On the one hand, the model of the link bit error rate quantifies the
probability that a bit error occurs given the operating voltage vch and
frequency Fch. We do not aim at a highly accurate model of the bit error rate.
Instead, the model is needed to generate errors in the simulations presented in
Chapter 4 and to compare different checkers under a common error model, which we
do in Chapter 6. On the other hand, the channel model describes how the bit
sampled at the link output differs from the one originally emitted when a
timing error happens. That is, it explains how errors caused by operation at
sub-critical voltage are modelled.
Figure 3.1: A data link supplied at voltage vch and clocked at frequency Fch.
3.1 Bit Error Rate Model
We consider two possible sources of noise. The first one is an additive white
Gaussian noise modelling transient external disturbances, e.g., electro-magnetic
interference. The second error source captures the variability of the link delay
around its nominal value, representing the effects of temperature, manufacturing
conditions, power supply noise, etc. on the link propagation delay. We assume
that these two noise sources are uncorrelated.
Regarding the second error source, we would like to model how the link
propagation delay tp depends on the link supply voltage. A simple expression can
be derived by assuming that delay is measured as the time for the lumped
capacitance to attain half a swing (i.e., vch/2) and by neglecting velocity
saturation and channel length modulation [Weste and Eshraghian, 1993]:
\[ t_p = \frac{C_L}{k_m} \cdot \frac{v_{ch}}{(v_{ch} - v_{th})^2}, \tag{3.1} \]
where
• km is the transistor transconductance, which depends on the driver tran-
sistor dimensions and on some technological parameters,
• CL is the interconnect capacitance, and
• vth is the threshold voltage of the devices.
A more complex expression can be used when considering velocity saturation
and channel length modulation [Rabaey et al., 2003] as well as sense amplifiers at
the receiving end. We refrain from using more complex delay models. Further-
more, we point out that the lumped capacitance model described in Eq. (3.1)
is currently widely used in design tools, e.g., by those of Cadence.
In order to model the effect of crosstalk, temperature, and process variation
over the link delay, we describe the ratio CL/km of Eq. (3.1) as a random
quantity. We denote this ratio by α and model it by a Gamma random variable
α ∼ Γ (a , b) ,
where a and b are two real positive parameters characterising the distribution.
We use a Gamma distribution because it can model more accurately the line
delay than a Gaussian distribution. For example, a Gamma distribution takes
only positive values, contrary to a Gaussian. Exponential (a = 1) and Erlang
(a ∈ N) distributions are particular cases of the Gamma distribution. While α
is modelled as random, both vch and vth are considered deterministic.
Let $\mu_{t_p}$ be the mean of tp and $\sigma_{t_p}^2$ be its variance under a
reference voltage vdd (later used to characterise nominal operating conditions).
Because vch and vth are modelled as deterministic constants, $\mu_{t_p}$ and
$\sigma_{t_p}^2$ can be determined as a function of the parameters a and b as
indicated below:
\[ \mu_{t_p} = \frac{a\,b\,v_{dd}}{(v_{dd} - v_{th})^2}, \quad \text{and} \quad \sigma_{t_p}^2 = \frac{a\,b^2\,v_{dd}^2}{(v_{dd} - v_{th})^4}. \tag{3.2} \]
The distribution of α is thus entirely determined by the mean and standard
deviation of tp under the reference voltage vdd. By solving the last equation for
a and b, we obtain easily
\[ a = \frac{\mu_{t_p}^2}{\sigma_{t_p}^2}, \quad \text{and} \quad b = \frac{\sigma_{t_p}^2}{\mu_{t_p}} \cdot \frac{(v_{dd} - v_{th})^2}{v_{dd}}. \tag{3.3} \]
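For readers who want the intermediate step, Eqs. (3.2) and (3.3) follow from the first two moments of a Gamma variable with shape a and scale b (the parameterisation implied by the cumulative distribution function given with Eq. (3.4)), applied to Eq. (3.1) at the reference voltage. The shorthand K is introduced here for brevity only and is not part of the original notation:

```latex
\begin{align*}
K &= \frac{v_{dd}}{(v_{dd} - v_{th})^2}, \qquad t_p = \alpha K, \qquad \alpha \sim \Gamma(a, b), \\
\mathrm{E}[\alpha] &= a b, \qquad \mathrm{Var}[\alpha] = a b^2, \\
\mu_{t_p} &= K\,\mathrm{E}[\alpha] = a b K, \qquad
\sigma_{t_p}^2 = K^2\,\mathrm{Var}[\alpha] = a b^2 K^2, \\
\frac{\mu_{t_p}^2}{\sigma_{t_p}^2} &= \frac{a^2 b^2 K^2}{a b^2 K^2} = a, \qquad
\frac{\sigma_{t_p}^2}{\mu_{t_p}} = b K
\;\Rightarrow\;
b = \frac{\sigma_{t_p}^2}{\mu_{t_p}} \cdot \frac{(v_{dd} - v_{th})^2}{v_{dd}}.
\end{align*}
```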
The bit timing error rate, εt, is the probability that a bit line is sampled
incorrectly. More precisely, εt is the probability that tp exceeds the sampling
period Tc (with Tc = 1/Fch). By definition of α, tp > Tc if and only if
\[ \alpha > T_c\,\frac{(v_{ch} - v_{th})^2}{v_{ch}}. \]
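It follows that
\[
\begin{aligned}
\varepsilon_t &= P(t_p > T_c) \\
&= P\left( \alpha > T_c\,\frac{(v_{ch} - v_{th})^2}{v_{ch}} \right) \\
&= 1 - F\left( T_c\,\frac{(v_{ch} - v_{th})^2}{v_{ch}} \,\Big|\, a, b \right),
\end{aligned}
\tag{3.4}
\]
where F(· | a, b) is the cumulative distribution function of α:
\[ F(x \mid a, b) = \frac{1}{b^a\,\gamma(a)} \int_0^x t^{a-1} e^{-t/b}\,dt, \]
and γ(·) is the Gamma function
\[ \gamma(x) = \int_0^\infty e^{-t}\,t^{x-1}\,dt, \quad x \in \mathbb{C} \setminus \{0, -1, -2, \ldots\}. \]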
In Eq. (3.4), the values of the parameters a and b can be determined knowing
the mean and standard deviation of tp, as indicated in Eq. (3.3). The timing
error rate εt clearly depends on Tc, vch and vth.
In addition, we model the additive noise vn by a white Gaussian noise, with
standard deviation σvn :
vn ∼ N (0, σvn) .
Although on-chip disturbances are more accurately modelled as bursty noise,
the white noise model developed here suffices to prove our concept. The additive
bit error rate, εa, is the probability that the additive noise vn exceeds half the
voltage swing vch:
\[ \varepsilon_a = P(v_n > v_{ch}/2) = Q\left( \frac{v_{ch}}{2\sigma_{v_n}} \right), \tag{3.5} \]
with Q(·) the complementary cumulative Gaussian distribution function:
\[ Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-y^2/2}\,dy. \]
Eq. (3.5) is common in the literature [Hegde and Shanbhag, 2000]. Since it
accounts only for additive white noise, it is not a function of the sampling
period Tc.
Having now modelled the two noise sources, we express the overall bit error
rate ε as a function of the timing and additive bit error rates, respectively, εt
and εa. Because they model different physical phenomena, the random variables
α and vn are assumed independent. For the sake of simplicity, we make the
following approximation: a bit error occurs if either a timing error or an
additive error occurs. This implies that the contribution of inter-symbol
interference to the bit error rate is neglected if tp < Tc, and is approximated
by one otherwise. Using Eqs. (3.4) and (3.5), the approximation yields directly
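\[
\begin{aligned}
\varepsilon &= 1 - (1 - \varepsilon_t) \cdot (1 - \varepsilon_a) \\
&= 1 - \left[ F\left( T_c\,\frac{(v_{ch} - v_{th})^2}{v_{ch}} \,\Big|\, a, b \right) \cdot \left( 1 - Q\left( \frac{v_{ch}}{2\sigma_{v_n}} \right) \right) \right].
\end{aligned}
\tag{3.6}
\]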
Whether a bit error occurs for given values of vch and Tc depends on the re-
alisation of the two random variables α (which in turn impacts tp) and vn.
Eq. (3.6) is a generalisation of the relation previously introduced [Hegde and
Shanbhag, 2000]. The new relation does not model only additive noise—as done
in Eq. (3.5)—but also takes into account some other important error sources
impacting the bit error rate.
Example 1 illustrates how we model the bit error rate of an on-chip link.
Example 1. (bit error rate of a link). To model a given technology, we de-
termine the nominal operating points, the device threshold voltage, the mean and
variance of tp under the nominal conditions, and lastly the standard deviation
of the additive white noise. We define these parameters as follows.
• The nominal operating points are vdd = 1.2V and Tc = 2ns.
• The device threshold voltage is vth = 0.2V.
• The mean and standard deviation of tp at the nominal operating points are
µtp = 1ns and σtp = 0.1ns.
• The standard deviation of vn is σvn = 0.08V.
The nominal operating voltage vdd and the threshold voltage vth could reflect a
90nm or 130nm CMOS technology. The values assigned to µtp (1ns) and Tc
(2ns) describe respectively the typical and worst-case delay. It is common for
the safety margin between the worst-case and typical delay to be as large as
100% [Cortadella et al., 2004]. In addition, the value given to σtp is 1/10 of the
typical delay, which has been observed in recent technologies. Under the nominal
conditions vdd = 1.2V and Tc = 2ns, these values result in the following error
rates:
• the timing error rate is εt = 1.9 · 10^−15;
• the additive error rate is εa = 3.2 · 10^−14;
• the overall bit error rate is ε = 3.4 · 10^−14.
Moreover, assuming that errors occurring on different bit lines of a bus are
statistically independent, the probability that at least one bit of a 32-bit word is
corrupted is
εw = 1 − (1 − ε)^32 ≈ 10^−12.
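The figures of Example 1 can be checked numerically. The following is a minimal sketch that uses only the Python standard library. The function names are ours and are not taken from the thesis. The Gamma tail is evaluated with the closed-form Erlang sum, which assumes that the shape parameter a is an integer (it equals 100 for the values of Example 1).

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) used in Eq. (3.5)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def erlang_tail(shape, z):
    """P(Gamma(shape, scale=1) > z); valid for integer shape only (assumption)."""
    if z <= 0.0:
        return 1.0
    # Closed form: sum over k < shape of e^{-z} z^k / k!, evaluated in log space.
    return sum(math.exp(-z + k * math.log(z) - math.lgamma(k + 1))
               for k in range(shape))

def gamma_params(mu_tp, sigma_tp, vdd, vth):
    """Shape a and scale b of alpha, following Eq. (3.3)."""
    a = (mu_tp / sigma_tp) ** 2
    b = (sigma_tp ** 2 / mu_tp) * (vdd - vth) ** 2 / vdd
    return a, b

def bit_error_rates(vch, tc, a, b, vth, sigma_vn):
    """Return (eps_t, eps_a, eps) following Eqs. (3.4), (3.5) and (3.6)."""
    x = tc * (vch - vth) ** 2 / vch           # threshold on alpha
    eps_t = erlang_tail(round(a), x / b)      # 1 - F(x | a, b)
    eps_a = q_function(vch / (2.0 * sigma_vn))
    eps = eps_t + eps_a - eps_t * eps_a       # same as 1 - (1 - eps_t)(1 - eps_a)
    return eps_t, eps_a, eps

# Parameters of Example 1 (times in ns, voltages in V).
a, b = gamma_params(mu_tp=1.0, sigma_tp=0.1, vdd=1.2, vth=0.2)
eps_t, eps_a, eps = bit_error_rates(vch=1.2, tc=2.0, a=a, b=b,
                                    vth=0.2, sigma_vn=0.08)
# Word error rate for 32 independent bit lines, computed stably for tiny eps.
eps_w = -math.expm1(32 * math.log1p(-eps))
print(f"eps_t={eps_t:.2e} eps_a={eps_a:.2e} eps={eps:.2e} eps_w={eps_w:.2e}")
```

The printed values should be close to those quoted in Example 1. Small differences in the last digit of εt are possible, because such tiny tail probabilities are sensitive to how 1 − F is evaluated in floating-point arithmetic.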
Figure 3.2: Contour plot of the bit error rate ε in the voltage (vch) and delay (Tc)
plane. Axes: voltage [Volt] and delay [ns]; contour levels range from 1e−14 to 1.
Fig. 3.2 shows a contour plot of the bit error rate ε in the voltage (vch) and
delay (Tc) plane. Contour lines are vertical, i.e., independent of Tc, as soon
as the voltage becomes low enough for the additive bit error rate εa to exceed
the contour level. One recognises the critical zone where the circuit passes
from a faulty to a functionally correct state: for delay values sufficiently
above the critical value the probability of error is almost zero, whereas the
same probability is 1 in regions where the circuit is over-constrained. Fig. 3.3
depicts the transition line from 0 to 1 for the bit error rate ε.
We proceed by defining timing and additive errors along with the associated
channels. The next section is complementary to this one in that the bit error
rate model quantifies the probability of a bit error, while the channel model
determines how the received data is obtained from the input data and errors.
3.2 Channel Model
We define first additive errors and the well-known associated channel, which is
the Binary Symmetric Channel (BSC). Next, we describe timing errors and the
resulting channel.
A binary symmetric channel is an additive noise channel. The noise is modelled
by a Bernoulli source. The channel output is thus
\[ y_k = x_k \oplus e_{k,a}, \tag{3.7} \]
with xk the input data and ek,a a sequence of i.i.d. (independent and
identically distributed) Bernoulli random variables defined by
\[
e_{k,a} =
\begin{cases}
1 & \text{with probability } \varepsilon_a, \\
0 & \text{with probability } (1 - \varepsilon_a).
\end{cases}
\tag{3.8}
\]
Figure 3.3: Bit error rate landscape in the voltage (vch) and delay (Tc) plane.
εa can be quantified using Eq. (3.5). The left drawing of Fig. 3.4 depicts the
corresponding channel. The channel is said to be symmetric since a 0 has the
same probability εa of being corrupted into a 1, as a 1 into a 0.
We would like to illustrate timing errors by the following phenomenon: as-
sume that a binary signal is sampled synchronously, but with a clock whose
period is not guaranteed to be larger than the time elapsed between two signal
transitions. In such a situation, the link may be operated beyond its RC de-
lay, which causes inter-symbol interference [Proakis, 2000]. The signal sampled
at the receiver end contains not only the current transmitted information but
also a contribution from the previously transmitted symbols. In the worst case,
these symbols combine constructively and interfere negatively with the current
symbol: an error happens if the interfering contribution is large enough for the
sampled value to be in the wrong decision interval. To summarise, by operating
the link at sub-critical voltage, there is a risk of sampling the received signal
while data transitions have not yet completed: the receiver either samples the
previous data, or has metastability problems.
Fig. 3.5 illustrates timing errors. In the top drawing of Fig. 3.5, the data rate
is low enough to ensure correct sampling. Therefore, no timing error occurs.
Yet, additive noise can corrupt the sampled data, as shown for x1. The bottom
drawing of Fig. 3.5 depicts a situation where the sampling rate is too fast to
ensure correct sampling. In this example, x1 and x2 are sampled incorrectly.
We adopt a discrete time model representing the sampling instants. We now
explain how timing errors affect the sampled data. We denote the input data by
xk and the sampled data by yk. Moreover, we denote by ek,t ∈ {0, 1} the failure
of data sampling: ek,t = 1 indicates that the sampling was too early, whereas
ek,t = 0 indicates a correct sampling. A timing error occurs when yk ≠ xk,
which requires the two following conditions to be met:
• the sampling is too early, i.e., ek,t = 1, and
• a bit transition occurs, i.e., xk ≠ xk−1.
In the latter condition, note that we have not defined a transition as xk ≠
yk−1. Indeed, doing so would preclude some errors (such as the sampling of x2
in the bottom of Fig. 3.5) from being modelled correctly. Yet, our definition
does not capture timing errors caused by a signal transition spreading over more
than two sampling intervals.
Figure 3.4: Left: a binary symmetric channel BSC(εa). Right: a timing error
channel TEC(εt). The box with symbol ∆ denotes a single cycle delay element.
The top right representation shows explicitly that errors are caused by the erasure
of individual bit transitions. The bottom right representation emphasises that the
channel output yk is either xk or xk−1 depending on ek,t.
We model the failure of sampling by a sequence of i.i.d. Bernoulli random
variables {ek,t}, k ∈ N,
\[
e_{k,t} =
\begin{cases}
1 & \text{with probability } \varepsilon_t, \\
0 & \text{with probability } (1 - \varepsilon_t).
\end{cases}
\]
That is, we have P(yk ≠ xk | xk ≠ xk−1) = εt, while, for a BSC, P(yk ≠ xk) =
εa. The expression of Eq. (3.4) can be used to determine εt as a function of the
link supply voltage vch and the sampling period Tc.
Figure 3.5: Additive (top) and timing (bottom) errors during the transfer of the
bit sequence 01011. In the bottom drawing, the inputs x0, ..., x4 = 0, 1, 0, 1, 1
are sampled as y0, ..., y4 = 0, 0, 1, 1, 1.
For an N-bit wide channel, xk and ek,t are both N-bit vectors. The relation
between the input vector xk and the sampled data yk is given by
\[ y_k = x_k \oplus e_{k,t} \cdot (x_k \oplus x_{k-1}), \tag{3.9} \]
where · denotes the bitwise product. Eq. (3.9) models the fact that yk = xk if
$e_{k,t} = \vec{0}$ or $x_k \oplus x_{k-1} = \vec{0}$. On the contrary, a timing
error on a bit position i occurs if and only if (i) $x^i_k \neq x^i_{k-1}$ and
(ii) $e^i_{k,t} = 1$ (according to our convention, a superscript denotes the bit
position in a vector, while the subscript denotes the time index).
We call a channel described by Eq. (3.9) a timing error channel, and we
denote it by TEC(εt). We introduce a few notations and rewrite Eq. (3.9) as
\[
\begin{aligned}
y_k &= x_k \oplus e_{k,t} \cdot (x_k \oplus x_{k-1}) \\
&= x_k \oplus e_{k,t} \cdot t_k \\
&= x_k \oplus \tilde{e}_k,
\end{aligned}
\tag{3.10}
\]
where we have defined the transition vector $t_k = x_k \oplus x_{k-1}$ and the
timing error vector $\tilde{e}_k = e_{k,t} \cdot t_k$.
In this particular case, the timing error vector $\tilde{e}_k$ is not a Bernoulli
process, unlike the additive error of Eq. (3.8). Indeed, only the failure of
sampling (which does not necessarily cause an error) is determined by a
Bernoulli random process. A graphic description
of the timing error channel is given in the right part of Fig. 3.4. A timing error
channel differs from a BSC in two main features.
• Random errors affect data transitions and not directly the transmitted
data itself.
• Errors are asymmetric. While a transition can be cancelled, no spurious
transition is ever added.
Later, in Chapter 5, we use the timing error channel to define self-synchronisation
and to estimate analytically the reliability of various encoding schemes.
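The two channels of this section can be sketched in a few lines of code. The sketch below applies Eqs. (3.7) and (3.9) to words stored as integers, with bitwise operators playing the role of ⊕ and of the bitwise product. The function names, the initial line state `prev=0`, and the failure pattern chosen to reproduce the bottom drawing of Fig. 3.5 are assumptions made for illustration.

```python
import random

def timing_error_channel(words, failures, prev=0):
    """Apply Eq. (3.9) word by word: y_k = x_k XOR (e_k AND (x_k XOR x_{k-1}))."""
    out = []
    for x, e in zip(words, failures):
        transitions = x ^ prev               # transition vector t_k
        out.append(x ^ (e & transitions))    # an erased transition keeps the old bit
        prev = x
    return out

def binary_symmetric_channel(words, errors):
    """Apply Eq. (3.7): y_k = x_k XOR e_k."""
    return [x ^ e for x, e in zip(words, errors)]

def bernoulli_mask(n_bits, eps, rng):
    """N-bit mask whose bits are i.i.d. Bernoulli(eps) sampling failures."""
    return sum((rng.random() < eps) << i for i in range(n_bits))

# Single bit line, sequence of Fig. 3.5 (bottom): x1 and x2 are sampled too early.
x = [0, 1, 0, 1, 1]
e = [0, 1, 1, 0, 0]                          # assumed failure pattern
print(timing_error_channel(x, e))            # prints: [0, 0, 1, 1, 1]

# With a timing error rate of 1, every word is replaced by its predecessor.
rng = random.Random(0)
words = [0b1010, 0b0110, 0b0110]
masks = [bernoulli_mask(4, 1.0, rng) for _ in words]
print(timing_error_channel(words, masks) == [0, 0b1010, 0b0110])  # prints: True
```

The second experiment illustrates the asymmetry noted above: the timing error channel can only cancel transitions, so a repeated word (here the third one) is always received correctly, whereas a BSC could corrupt it.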
3.3 Conclusions
The primary goal of this chapter is to model at a high level errors occurring on a
self-calibrating on-chip link. In this situation, voltage and frequency are not set
based on worst-case assumptions about the link delay. Therefore, data sampled
on the receiver side of the link may not have completely transitioned to the
correct state. In addition, the receiver may encounter metastability problems.
We have modelled operation at sub-critical voltage by the possible failure of
bit transitions. This approach is motivated by the fact that inter-symbol
interference and inter-wire crosstalk would significantly delay bit transitions
under sub-critical operation. It has led to the definitions of timing errors and
their corresponding communication channel. These models reflect the original feature
of the considered problem. Indeed, in Chapter 5, we define self-synchronisation
properties of synchronous encoding schemes based on the timing error channel.
Furthermore, we have formulated an expression of the link bit error rate as
a function of the supply voltage and the sampling period (i.e., the inverse of the
link frequency). In doing so, we have extended the well known expression of
the bit error rate under additive noise [Hegde and Shanbhag, 2000]. We use the
bit error rate model to generate errors during the simulations from which the
results presented later in Chapters 4 and 6 are derived. However, our approach is
independent of a bit error rate model because we study the reliability of checkers
under the whole range of possible bit error rates, up to 100%.
Finally, we point out that metastability is not taken into account in our
models. Nevertheless, the checkers presented later include metastability avoid-
ance techniques such as inserting two flip-flops in a row at the link output or
the metastability-tolerant comparator deployed in a Razor flip-flop.
Chapter 4
System-Level View of a Self-Calibrating On-Chip Link
This chapter gives a detailed description of a self-calibrating link. Sec. 4.1
discusses its system-level structure and introduces the main idea of the
design. Next, Sections 4.2 and 4.3 focus respectively on the encoding and
operating point control policy. Then, Sec. 4.4 demonstrates and quantifies the
advantages of a self-calibrating link in terms of energy saving and performance
in a few particular situations. The goal of this chapter is twofold:
• By comparing the energy consumption of a self-calibrating link with
that of a classic system, we show that, although self-calibration incurs some
overhead, energy savings are possible due to operation at lower voltage.
• As a checker, we introduce a particular embodiment of the encoding family
that we study in more depth in Chapter 5.
Through the material presented in this chapter, the thesis has pioneered the
development of a self-calibrating on-chip link [Worm et al., 2002; 2005]. Never-
theless, our primary contributions—presented later in Chapters 5 and 6—bear
on (i) the analytical study of the reliability of the encoding used as checker as
well as other more general encoding families, and (ii) the development of checker
architectures even more robust than the one deployed in the self-calibrating link
we discuss now.
4.1 System-Level Description
We consider a unidirectional point-to-point link interconnecting two subsys-
tems. Fig. 4.1(a) shows a qualitative view of the classic interconnect: at the
producer end, a FIFO or a similar buffer is used to decouple the two subsys-
tems which may operate at different frequencies, and a large driver (typically
a chain of appropriately sized inverters) charges or discharges the large capac-
itance represented by the interconnecting wires. A receiver (typically a CMOS
gate) compares the level of the line to a threshold and delivers the resulting
information to the consumer.
Figure 4.1: The basic idea of a self-calibrating point-to-point unidirectional
on-chip interconnect. (a) The classic static scheme, with a FIFO that decouples two
subsystems. (b) The proposed self-calibrating scheme with additional elements
(encoder, decoder, and controller) needed in order to adjust the operating points
as required by workload and detected transfer errors.
We add a few elements to the classic scheme, as indicated in Fig. 4.1(b). To
reduce the energy consumed per bit, we control dynamically the driver swing and
the corresponding receiver threshold. Electrical schemes to reduce the voltage
swing of the interconnect are known and well studied [Zhang et al., 2000]. Of
course, the variable voltage swing impacts the speed at which the interconnect
driver is able to charge or discharge the load capacitance—this phenomenon
has been modelled by Eq. (3.1). The maximum reliable operating frequency is
thus reduced with lower swings: hence, we need to adapt the communication
speed too, as with traditional Dynamic Voltage and Frequency Scaling (DVFS)
techniques. The key difference with the latter technique stems from the fact
that actual operating points (i.e., the pair voltage-frequency) do not result from
worst-case assumptions but, instead, are dynamically established by a controller.
The operation with lower swings makes communication more sensitive to
several noise sources. To cancel this effect, we introduce an error detection
encoding at the word level on the source side—a word is the data unit used
for transfers over the link. The link may be operated at bit error rates as
large as 1, thus causing multiple errors in a single word. It follows that, in
this context, no low-overhead encoding can correct errors. As a result, we
implement a typical Automatic Repeat reQuest (ARQ) strategy, such as Go-
Back-N [Walrand and Varaiya, 2000], which relies on the encoding for error
detection only, while corrupted words are retransmitted. We have chosen in
this case to perform error detection and retransmission on a per word basis;
however, one could very well imagine an ARQ strategy dealing with packets of
several words. In its simplest embodiment, error detection is provided by a code
(e.g., CRC) which is transmitted in parallel with the data. DVFS is also applied
to the redundant lines that consume additional energy. Nevertheless, the energy
saving achieved by operation with a lower swing outweighs the energy required
by driving the additional lines, as reported later in Sec. 4.4. Moreover, it has
been reported that, for on-chip links, retransmitting corrupted words is more
energy-efficient than correcting errors [Bertozzi et al., 2002].
Finally, our scheme requires a controller that decides on the operating
frequency and voltage swing. Such a controller must not only choose
voltage-frequency pairs from a set of operating points as a function of the
requested bandwidth; it must also explore the design space to find safe operating
pairs. Therefore, it needs as input some information on both bandwidth
requirements and channel reliability. In summary, our system
1. uses a variable frequency and swing to trade off speed for energy,
2. implements error detection and ARQ to guarantee reliable communication,
and
3. exploits a variable relation between operating frequency and voltage swing
to find the best safe operating point in the current environmental condi-
tions, based on monitoring the error rate.
In essence, the controller will provide the minimum voltage swing such that
communication is achieved at the requested bandwidth and under a limited
retransmission rate. In the ideal case that all transmission errors could be
detected, our scheme would trade off transmission energy for additional
communication latency (i.e., tardiness). In reality, codes detect only a subset of the
possible transmission errors. In our case, undetected malfunctions cause data
corruption: we will measure reliability of the link by the residual error rate.
[Figure 4.2: block diagram. Blocks: FIFO, encoder, channel, synchroniser, decoder,
ARQ controller, operating point controller (OPC), clock generator, voltage
generator. Signals: wdata, wen, wnack, wclk, full, empty, ren, rdata, rnack, rclk,
go-back, ack, error, fill level, fch, vch. Data widths: 32 bits at the producer and
consumer interfaces, 40 bits on the channel.]
Figure 4.2: A more detailed architecture of the self-calibrating point-to-point
unidirectional on-chip interconnect.
A more practical view of our system is represented in Fig. 4.2. It describes
in greater detail the idea of Fig. 4.1(b). The FIFO separates two clock domains:
the write clock on the data producer side, and the read clock which is fed to the
remainder of the design. Yet, the data interfaces on the producer and consumer
ends are the same. The FIFO consists essentially of a dual ported memory with
some logic around to control the read/write address and to generate the empty
signal. In particular, the FIFO needs to handle data retransmissions. In order
to reduce the probability of metastability in the decoding logic, two flip-flops in
a row are inserted before the decoding stage [Dally and Poulton, 1998]. This is
represented by the synchroniser box.
In practice one can separate completely an ARQ controller and a controller
devoted to the choice of the operating point. The former has the sole task
to push all words of data through the channel until they are communicated
without error and in sequence, ignoring the channel parameters: it addresses
thus error recovery. In other words, the ARQ controller decides which words
to push through the channel in order to deliver only those that are in order
and for which no error has been detected. On the other hand, the operating
point controller is in charge of picking the lowest frequency and voltage swing
required to meet some communication constraint—such as an average delay: it
decides how to communicate, checks if the choice is appropriate by monitoring
the error rate, but ignores what is going through the channel.
As Fig. 4.2 suggests, the data path is pipelined into an encoding stage, a
synchronizing stage, and a decoding stage. Although the choice of pipelining
belongs to implementation issues, our architecture relies on this fact to recover
from metastability.
We point out that our architecture can seamlessly be applied to segmented
busses. In this case, the same voltage swing is used along all segments—which
is possible as every repeater only consists of an inverter supplied at vch. As a
matter of fact, we report in Sec. 4.4 conservative results that only consider
the energy consumed by the interconnect wires; in reality, the repeaters will
draw additional energy which also scales down with our technique. We have
modelled a segmented bus, and found that the energy difference compared to a
non-segmented bus amounts to slightly less than 5%.
The focus of this work lies on the overall feasibility and potential advan-
tages of such an adaptive transmission scheme and on the challenges of aban-
doning a conservative worst-case design style. We will not detail some cir-
cuit design aspects such as the implementation of the variable supply trans-
mitters and receivers (which we suppose achievable as a derivation of known
techniques [Zhang et al., 2000]), nor the availability of on-chip efficient con-
trollable power supply sources (a key component for any DVFS technique and
thus the object of many research efforts [Gutnik and Chandrakasan, 1997;
Stratakos, 1998]). In other words, our goal is not to demonstrate that DVFS
techniques can be successfully deployed over an on-chip link, but rather to show
that abandoning worst-case assumptions to determine operating points is fea-
sible and advantageous in many aspects. Fundamentally, dynamic frequency
scaling is not required to prove our concept.
The next section introduces the coding scheme we use to detect timing errors.
We study its self-synchronising properties in depth later in Chapter 5, as well
as other encoding schemes, which are a more general form of the particular case
we present now.
4.2 Encoding
Several challenges must be addressed to make the system robust under the extreme
conditions planned. The main difference is that we are not trying to screen
out and remove some relatively infrequent errors, which is what error detection
codes and ARQ protocols are typically used for. Instead, we operate the system
within a small margin from where it becomes no longer operational. In a sense,
we will push our system to explore the operating space and, thus, to become
at times non-operational. In this section we will analyse some of the related
challenges and suggest possible solutions.
Using only a spatial encoding—such as, in the simplest case, adding some
parity bits to the data word—is not sufficient. This encoding would be effective
to detect, for instance, that a single bit has not yet made a transition, due to
crosstalk. Yet, if our sampling rate is so fast that the complete previous word
is still present on the interconnect (for example, when the sampling process is
like (b) in Fig. 4.3), a pure spatial encoding would diagnose the result to be
correct and would not detect that the new word is simply not ready, since a
codeword would be sampled twice. Fig. 4.4 contrasts two systematic¹ encoding
schemes adding spatial (left) and temporal (right) redundancy. As already
pointed out, we observe that a spatial encoding cannot distinguish a new ready
codeword from an old one sampled twice consecutively due to the failure of all
bit transitions.
¹ An encoding is systematic when its codewords are obtained by concatenating
redundant bits to information bits.
[Figure 4.3: waveform. Transmitted symbol sequence: 1, 0, 0. Values sampled at the
instants (b), (c), (a) of each symbol: 0 U 1, then 1 U 0, then 0 0 0.]
Figure 4.3: A qualitative view of the sources of error in a self-calibrated intercon-
nect operating in too aggressive delay/voltage conditions. (a) Correct operation
after a sufficient delay. (b) Bit-errors due to the sampling after a largely insufficient
delay. (c) Risk of metastability in the receiver for slightly too aggressive sampling
times. (Note that the figure is simplistic in that a new symbol would be emitted at
the same time the line is sampled.)
[Figure 4.4: two panels, each with time on the horizontal axis and space (bits) on
the vertical axis. Labels: information bits, redundant bits, REDUNDANCY, ERRORS.]
Figure 4.4: Timing errors and encoding schemes with spatial (left) and temporal
(right) redundancy.
u   φ   r   x = (u | r)
0   0   0   (0, 0)
0   1   1   (0, 1)
1   1   0   (1, 0)
1   0   1   (1, 1)
Table 4.1: Codewords of the LEDR code. The encoder outputs a sequence of
codewords with alternating phase.
Example 2 illustrates a particular encoding adding temporal redundancy,
and explains why it detects all timing errors.
Example 2. (Level Encoded Dual-Rail (LEDR) [Dean et al., 1991]).
Dual-rail uses a spacer symbol as temporal redundancy. On the contrary, LEDR
includes temporal redundancy implicitly in the codeword sequencing. The encod-
ing is systematic and appends one redundant bit to each information bit. Each
codeword is said to have a phase, which we define as follows. We denote the
information bit by u and the redundant bit by r. The phase, φk, of the infor-
mation uk is a binary information obtained as the parity of the sequence index
k:
φk = k mod 2 ∈ {0, 1}.
By extension, the phase of a codeword is the phase of the encoded information.
LEDR encodes a single information bit into a 2-bit codeword. The redundant
bit r is computed by xor-ing the information bit u and the phase φ (i.e., r =
u ⊕ φ). Equivalently, the phase of an LEDR codeword can be obtained directly
by checking whether the information and redundant bits are equal (φ = 0) or
different (φ = 1). A codeword contains thus information about data sequencing.
Because LEDR computes the redundant bit using temporal information (namely,
the phase), it adds temporal redundancy. Table 4.1 lists LEDR codewords.
One verifies easily that LEDR detects all timing errors. Let us consider
a 2-bit timing error channel over which information is transmitted with the
LEDR code. By inspecting Table 4.1, we observe that exactly one bit transitions
between any two consecutive codewords of opposite phase. As a result, the only
timing error that can occur is that a single bit that should transition does not.
In this case, LEDR declares an error, since the phase of the received data is
unchanged.
When transmitting more than one information bit, one redundant bit is com-
puted for each information bit by xor-ing the latter with the phase. Fig. 4.5
depicts a possible implementation of an LEDR encoder; while the hardware cost
is relatively low, the wiring overhead is significant (100%). In order to use the
LEDR as an encoding scheme in the system of Fig. 4.2, the encoder and decoder
are provided with a synchronous clock to generate locally the phase bit, since the
latter is not transmitted.
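The behaviour described in Example 2 can be illustrated with a minimal Python sketch. It is not the hardware implementation of Fig. 4.5: the function names are hypothetical, each word carries a single information bit, and the receiver is assumed to know the sequence index k (and hence the expected phase) of every sample.

```python
from typing import List, Tuple

Codeword = Tuple[int, int]  # (information bit u, redundant bit r)


def ledr_encode(bits: List[int]) -> List[Codeword]:
    """Encode bit k with phase k mod 2, i.e. r = u XOR phase."""
    return [(u, u ^ (k % 2)) for k, u in enumerate(bits)]


def ledr_phase(codeword: Codeword) -> int:
    """Phase is 0 when both bits are equal and 1 when they differ."""
    u, r = codeword
    return u ^ r


def ledr_receive(samples: List[Codeword]) -> Tuple[List[int], List[int]]:
    """Return the accepted bits and the indices flagged as timing errors.

    Assumption: sample k is expected to carry phase k mod 2, as if the
    receiver toggled a local phase bit on every clock cycle.
    """
    accepted, errors = [], []
    for k, codeword in enumerate(samples):
        if ledr_phase(codeword) == k % 2:
            accepted.append(codeword[0])
        else:
            errors.append(k)
    return accepted, errors


bits = [0, 1, 1, 0, 1]
codewords = ledr_encode(bits)

# Exactly one wire toggles between consecutive codewords.
toggles = [sum(a != b for a, b in zip(c0, c1))
           for c0, c1 in zip(codewords, codewords[1:])]

# Timing error: the transition towards codeword 2 fails, so the receiver
# samples codeword 1 a second time.
stale = list(codewords)
stale[2] = stale[1]
accepted, errors = ledr_receive(stale)
```

In this sketch the stale sample has the phase of the previous word, which is why the receiver flags index 2 rather than accepting a duplicate of word 1.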
[Figure 4.5: K-bit LEDR encoder with inputs u(1), u(2), ..., u(K) and a flip-flop
producing the phase bit.]
Figure 4.5: Synchronous implementation of a K-bit LEDR encoder. The flip-flop
generates the alternating phase bit. LEDR is thus an “alternating-parity” encoding
scheme.
Typical self-synchronising codes [Varshavsky, 1990], such as the 1-of-N or
LEDR (introduced in Example 2) schemes, incur a large wiring overhead. Instead
of using them, we enhance simpler schemes, like classical CRCs, with
the notion of phase that has been introduced with the description of LEDR.
A particular embodiment is shown in Fig. 4.6. It generates 8 redundant bits
computed from all 32 information bits. Our new error detection scheme, which
we call CRC-8 alternating-phase encoding, works by (i) generating a phase bit,
i.e., alternately a 0 and a 1, which is not transmitted but produced independently
at the source and destination, and (ii) transmitting eight redundant bits
computed using the generator polynomial x^8 + x^2 + x + 1 [Walrand and Varaiya,
2000] from the 32-bit data padded with the generated bit. The inclusion of
the phase bit ensures that any two consecutive identical data pieces cannot have
the same encoding; hence, two successive 40-bit encoded words on the channel
may be identical only if an error has occurred. The choice of the particular
employed CRC-8 is motivated by a study on the error detection capabilities of
8-bit CRC over the binary symmetric channel that concluded that this polyno-
mial has the best overall performance [Baicheva et al., 1998]. We have observed
on a few examples that this result still holds over the timing error channel.
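The following Python sketch illustrates the CRC-8 alternating-phase idea under stated assumptions. The function names are hypothetical, the CRC register starts at zero with no final XOR, and the phase bit is placed as the most significant bit of the 33-bit message (as Fig. 4.7 suggests); the actual bit ordering of the hardware is not specified here.

```python
POLY = 0x07   # x^8 + x^2 + x + 1, with the x^8 term implicit
K = 32        # information bits per word


def crc8(message: int, nbits: int) -> int:
    """Bit-serial CRC-8 of the nbits-wide message, most significant bit first."""
    reg = 0
    for i in reversed(range(nbits)):
        feedback = ((reg >> 7) & 1) ^ ((message >> i) & 1)
        reg = (reg << 1) & 0xFF
        if feedback:
            reg ^= POLY
    return reg


def encode_word(data: int, phase: int) -> int:
    """Return the 40-bit channel word: 32 data bits followed by 8 CRC bits.

    The phase bit enters the CRC computation but is not transmitted.
    """
    redundancy = crc8((phase << K) | data, K + 1)
    return (data << 8) | redundancy


def has_error(word: int, local_phase: int) -> bool:
    """Recompute the CRC with the locally generated phase and compare."""
    data, redundancy = word >> 8, word & 0xFF
    return crc8((local_phase << K) | data, K + 1) != redundancy


word = encode_word(0xDEADBEEF, phase=0)

# Correct reception: the decoder phase matches the encoder phase.
ok = not has_error(word, local_phase=0)

# Timing error rate of 1: the old word is sampled again while the decoder
# has already toggled its phase, so the check fails.
stale_detected = has_error(word, local_phase=1)
```

Because the two phases differ in a single message bit, and a CRC detects every single-bit error, a word sampled twice is always rejected in this sketch.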
The scheme is called alternating-phase encoding because it alternately transmits
codewords from two distinct codes. As depicted in Fig. 4.7, the two
codes can be obtained from a single systematic code by discriminating over the
most significant bit. In the case where the timing error rate is as large as 1, all
bit transitions towards the new codeword fail. Hence, the decoder samples
the same codeword twice. As already mentioned, an encoding adding only spatial
redundancy does not detect this error, which is mistaken for the
transmission of the same information twice. In contrast, the proposed encoding
detects such an error deterministically. Indeed, the phase bit generated
locally by the decoder is included in the parity checks and causes a single bit
error, which is always detected by CRC codes. The key feature of this encoding
is to include information about data sequencing into the redundant bits, since
the latter are computed from parity checks including the phase bit. However,
the wiring overhead of the resulting scheme is significantly reduced compared
to LEDR, since the parity check of a CRC code involves several information
bits, instead of a single one as LEDR does. Possible desynchronisation errors
between the toggling flip-flops can be corrected by resetting both the encoder
and decoder phase each time an error is detected.
[Figure 4.6: block diagram. A CRC encoder takes the K-bit input data plus a phase
bit (K+1 bits) and produces N-K redundant bits; data and redundancy travel over
the N-bit link; a CRC decoder takes the N received bits plus a phase bit (N+1 bits)
and delivers the K-bit output data and an error signal.]
Figure 4.6: The new self-synchronising encoding scheme we propose. The input
(respectively received) data is augmented with a phase bit that is not transmitted
but generated by the encoder (respectively decoder). The error signal does not only
detect bit flips, but also detects when the sampled word is still the last one correctly
sent across the channel.
We defer the study of the reliability of this original scheme—as well as other
related encodings—to Chapter 5. In the meantime, we have estimated the
residual bit error rate of the encoding scheme described in Fig. 4.6. The residual
bit error rate is the fraction of transmitted bits affected by an undetected error.
We have simulated in VHDL a functional model of the timing error channel and
approximated the analytical bit error rate model introduced in Sec. 3.1. We
have simulated the transfer of 0.32 · 10^9 random bits. As shown in Fig. 4.8,
no residual undetected error could be observed for raw bit error rates less than
10^-2. Moreover, for bit error rates larger than 0.1, the word error rate reaches
and remains at 1, while the residual bit error rate reaches a maximum value
before eventually decreasing to zero.
Although by no means specific to the choice of the particular CRC, one
point is worth mentioning: as the bit error rate approaches unity, the absolute
number of undetected errors increases dramatically. While this is no concern
in typical applications of error correcting codes, where the error rate is assumed
to be always small, a self-calibrating system might operate briefly in regions of
extremely high bit error rate. Thanks to the alternating phase bit, our encoding
scheme has the important feature of detecting errors when the raw bit error
rate approaches unity. As a matter of fact, the residual error rate is 0 if the bit
error rate is 1. This feature is essential to prevent the operating point controller
from using overly aggressive, unsafe operating points.
Other error-resilient encoding schemes can be applied, for example codes
with stronger detection probability. It is also possible to avoid the additional
lines required to transmit the redundant codes, and to insert error detecting
codes in the data stream, as done for data packets. This implies trading off
latency for area. A more subtle trade-off involves the energy consumed and
error detecting probability of various coding styles. In this work, we consider
the parallel coding scheme of Fig. 4.6 for the sake of simplicity. Nevertheless,
we stress that our approach is general, and can be combined with different data
formats, including packet encapsulation.
[Figure 4.7: codeword layout of the code C, showing the phase bit, the information
bits, and the redundant bits, with the codewords split into the subcodes C0 and C1.]
Figure 4.7: The code C0 (respectively C1) used in the alternating-phase encoding
consists of all codewords of the code C having 0 (respectively 1) as the most
significant information bit.
The next section focuses on the link operating point control problem.
4.3 Control
We state the link control problem as a constrained minimisation problem. Then,
we explain why and how the voltage and frequency control can be decoupled.
Finally, we introduce a particular instance of a decoupled link control which we
use later in the simulations presented in Sec. 4.4.
[Figure 4.8: log-log plot. Horizontal axis: raw bit error rate, from 10^-3 to 10^0.
Vertical axis: error rate, from 10^-8 to 10^0. Curves: residual bit error rate and
word error rate.]
Figure 4.8: Word error rate and residual bit error rate as a function of the raw
bit error rate for the CRC-8 alternating-phase encoding generated by the polynomial
x^8 + x^2 + x + 1 (40-bit words).
4.3.1 Statement of the Link Control Problem
The link control problem consists in finding the operating pair (vch, Fch) which
1. minimises the energy consumption,
2. meets a performance constraint, and
3. meets a reliability constraint.
Both the performance and reliability constraints are considered as inputs defining
some quality of service requirements: the operating frequency and, thus, the
corresponding voltage are adjusted as a function of workload requirements. Traditional
DVFS techniques match energy consumption to the required performance
level. In addition, worst-case assumptions determine which voltage is safe for
operation at a given frequency, ensuring thus that reliability is met. While more
energy-efficient than full time operation at peak performance, this approach does
not address concerns raised by increasing process variations, since it relies on
worst-case assumptions to determine which voltage is used for a given frequency.
We proceed by stating more precisely the control problem of a self-calibrating
link. We define first the energy, performance, and reliability metrics. Referring
to Fig. 4.2, we decompose the energy consumed in the transmission scheme into
the two following contributions:
• the energy Ech spent over the link to transmit information bits and additional
redundancy bits of the error detection code,
• the energy Esys consumed by the logic of the ARQ and operating point
controllers, the encoder, the decoder, the synchroniser and the FIFO memory.
Ech corresponds to the energy required to charge or discharge the link capac-
itance. It scales down with the supply voltage vch. On the contrary, Esys is
independent of vch since all elements accounted for are supplied at vdd. This
decomposition neglects the energy lost in the voltage converter: the efficiency
of state-of-the-art voltage converters can be as high as 95% [Sakiyama et al.,
1999]. It also neglects the energy spent in three backward control lines due to
their very low switching activity.
We are interested in the energy required to transmit a job through the link.
We call job a collection of information bits that needs to be transmitted over the
on-chip link. We define a transaction on the link as a cycle during which valid
information is transmitted. Transactions on a self-calibrating link include suc-
cessful word transfers as well as additional transfers caused by retransmissions.
However, cycles during which the link is idle are not counted as transactions.
Let Φtot be the number of transactions required for a given job. The energy
consumed by the adaptive transmission scheme to transmit the job through the
link is
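\begin{align}
E_{job} &= E_{ch} + E_{sys} \nonumber \\
        &= \Phi_{tot} \cdot (K + n)\, C_1\, \overline{v_{ch}^2} + \Phi_{tot}\, C_2 \nonumber \\
        &= \Phi_{tot} \cdot \left( (K + n)\, C_1\, \overline{v_{ch}^2} + C_2 \right), \tag{4.1}
\end{align}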
where
• K is the number of information bits per word,
• n is the number of redundant bits per word (i.e., n = N − K),
• $\overline{v_{ch}^2}$ is the squared voltage averaged over all transactions,
• C1 is a constant accounting for the switching activity and the bit line
capacitance, and
• C2 is a constant equal to the energy per transaction spent by the ARQ and
operating point controllers, the encoder, the decoder, the synchroniser
and the FIFO memory.
Eq. (4.1) accounts for the fact that a self-calibrating link introduces an energy
overhead both in space (due to the additional redundant lines) and in time
(due to the additional transactions incurred by retransmissions). Esys accounts
for the energy spent in the elements supplied at voltage vdd. Therefore, it is
proportional to Φtot. Although not indicated expressly, Φtot is a function of the
link word error rate (among other factors) since it depends on retransmissions. In turn,
the link word error rate is a complex function of both vch and Fch.
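As a small worked illustration of Eq. (4.1), the sketch below evaluates the job energy for a given number of transactions. The function name and all numerical values are hypothetical and expressed in arbitrary units; they only show how the overhead in space (n extra lines) and in time (extra transactions) enters the formula.

```python
def job_energy(phi_tot: int, k_bits: int, n_bits: int,
               c1: float, v_ch_sq_mean: float, c2: float) -> float:
    """Evaluate Eq. (4.1): E_job = Phi_tot * ((K + n) * C1 * mean(v_ch^2) + C2)."""
    e_ch = phi_tot * (k_bits + n_bits) * c1 * v_ch_sq_mean   # link energy
    e_sys = phi_tot * c2                                     # logic supplied at vdd
    return e_ch + e_sys


# Hypothetical job: 1000 words, 20 of which had to be retransmitted once.
words, retransmissions = 1000, 20
phi_tot = words + retransmissions

# Assumed constants (arbitrary units): C1 = 1.0, C2 = 2.0, mean v_ch^2 = 0.25.
energy = job_energy(phi_tot, k_bits=32, n_bits=8,
                    c1=1.0, v_ch_sq_mean=0.25, c2=2.0)
```

With these assumed values each transaction costs 40 · 0.25 + 2 = 12 units, so the 20 retransmissions add 2% to the job energy.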
Regarding performance, we consider a requirement about the average word
transfer delay, which we denote by ∆. Constraining the average transfer delay
is equivalent to requiring a certain throughput for the link. Moreover, a self-
calibrating link cannot ensure a non-trivial (i.e., low) absolute word transfer
delay, since retransmissions eventually occur. We denote by ∆est the approxi-
mation of the average transfer delay. ∆est is clearly a function of Fch. Because
vch affects the error rate (which, in turn, affects retransmissions and the trans-
fer delay), ∆est is also a function of vch. We explain later in Sec. 4.3.2 how we
estimate the average transfer delay.
As far as reliability is concerned, we would like to meet a constraint about
the residual word error rate, εres. The residual word error rate is the fraction
of transmitted words affected by an undetected error. We denote the required
residual word error rate as εresw . Now, we state the link control problem as
follows.
Problem 1. (link control problem). Consider a job characterised by its
average delay requirement ∆ and its residual error rate requirement εresw . Find
a sequence of operating pairs (vch, Fch) that:
1. minimises Ejob as expressed in Eq. (4.1),
2. ensures that the average delay constraint is met, i.e., ∆est ≤ ∆, and
3. ensures that the reliability constraint is met, i.e., εres ≤ εresw .
The coupling of voltage and frequency makes this problem highly complex.
Indeed, Ejob, Φtot, ∆est, and εres are functions of both vch and Fch. We point
out that, in order to obtain meaningful solutions, it is necessary to give both
performance and reliability requirements.
In the following example, we illustrate qualitatively how the total energy
required to transmit a job varies with the link error rate, which brings some
insight into the solution of Prob. 1.
Example 3. (energy optimal error rate). We illustrate how the total energy
Ejob varies as a function of the word error rate using very high-level energy
models. For the sake of simplicity, we consider a system operating at a fixed
frequency. We assume that bit errors on a given line are statistically independent
from errors occurring on other lines, and that bit errors in a given cycle are
statistically independent from bit errors occurring in other cycles. Moreover, we
assume that the bit error rate depends on the communication voltage vch as
\[ \varepsilon = Q\left( \frac{v_{ch}}{2\,\sigma_{vn}} \right), \]
with σvn the standard deviation of the additive noise. Due to the independence
of bit errors, the word error rate is
\[ \varepsilon_w = 1 - (1 - \varepsilon)^N, \]
where N is the word size. The average number of transfer attempts (including
the last successful one) is given by 1/(1 − εw). This relation expresses
how the total number of transactions Φtot increases with the word error rate.
Since Esys is proportional to Φtot, it also increases accordingly. On the contrary,
the energy spent on the interconnect decreases quadratically as vch decreases.
Our goal is to illustrate how the total energy required to transfer a job varies
with the link error rate. Let Θ be the value of the ratio Esys/Ech under the
worst-case voltage vdd. As the voltage vch is decreased, we can write
\[ E_{ch}(v_{ch}) = E_{ch}(v_{dd}) \cdot \frac{v_{ch}^2}{v_{dd}^2\,(1 - \varepsilon_w)} \]
and
\[ E_{sys}(v_{ch}) = E_{sys}(v_{dd}) \cdot \frac{1}{1 - \varepsilon_w}. \]
[Figure 4.9: plot of the ratio Ejob(vch) / Ejob(vdd) (vertical axis, from 0.6 to 1.3)
against the word error rate (horizontal axis, logarithmic, from 10^-14 to 10^0).
Curves: Θ = 10 and Θ = 1. The worst-case point is marked.]
Figure 4.9: The energy scaling ratio of Eq. (4.2) as a function of the word error
rate. The parameter Θ is the ratio Esys/Ech under the worst-case voltage vdd.
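Therefore, the total energy scales as follows
\begin{align*}
E_{job}(v_{ch}) &= E_{ch}(v_{ch}) + E_{sys}(v_{ch}) \\
  &= E_{ch}(v_{dd}) \cdot \frac{v_{ch}^2}{v_{dd}^2\,(1 - \varepsilon_w)}
     + E_{sys}(v_{dd}) \cdot \frac{1}{1 - \varepsilon_w} \\
  &= E_{ch}(v_{dd}) \cdot \left( \frac{v_{ch}^2}{v_{dd}^2\,(1 - \varepsilon_w)}
     + \frac{\Theta}{1 - \varepsilon_w} \right).
\end{align*}
Finally, normalising by Ejob (vdd), we obtain the energy scaling ratio
\[
\frac{E_{job}(v_{ch})}{E_{job}(v_{dd})}
  = \frac{\dfrac{v_{ch}^2}{v_{dd}^2\,(1 - \varepsilon_w)} + \dfrac{\Theta}{1 - \varepsilon_w}}{1 + \Theta}.
\tag{4.2}
\]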
Fig. 4.9 plots the ratio expressed in Eq. (4.2) as a function of the word
error rate for a 40-bit wide link and for two different values of the parameter
Θ. Fig. 4.9 clearly shows the existence of an error rate minimising the total
energy: this error rate lies in the 1–10% range, depending on the value of the
parameter Θ. However, the error rate minimising the total energy does not
necessarily solve Prob. 1 because it may cause either an excessive transfer delay
or an excessive residual error rate. As expected, large values of the parameter Θ
decrease the maximum possible energy saving (since most of the energy is spent
in the overhead elements supplied at fixed voltage vdd), and cause the error
rate of minimum energy to decrease (since a large error rate would result in a
significant amount of energy spent in the overhead elements).
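The trade-off of Example 3 can be explored numerically with the sketch below, which evaluates Eq. (4.2) over a grid of voltages. The function names, the supply voltage vdd = 1.0, and the noise standard deviation σvn = 0.05 are assumptions chosen for illustration only; they are not the values behind Fig. 4.9.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def word_error_rate(v_ch: float, sigma: float, n_bits: int = 40) -> float:
    """Word error rate for independent bit errors with rate Q(v_ch / (2 sigma))."""
    eps = q_function(v_ch / (2.0 * sigma))
    return 1.0 - (1.0 - eps) ** n_bits


def energy_ratio(v_ch: float, v_dd: float, sigma: float, theta: float,
                 n_bits: int = 40) -> float:
    """Energy scaling ratio of Eq. (4.2)."""
    eps_w = word_error_rate(v_ch, sigma, n_bits)
    if eps_w >= 1.0:
        return math.inf            # no word ever gets through
    attempts = 1.0 / (1.0 - eps_w)  # average transfers per word
    return ((v_ch / v_dd) ** 2 * attempts + theta * attempts) / (1.0 + theta)


def best_voltage(theta: float, v_dd: float = 1.0, sigma: float = 0.05,
                 steps: int = 900) -> float:
    """Grid search for the voltage minimising the energy ratio (assumed grid)."""
    candidates = [v_dd * (0.1 + 0.9 * i / steps) for i in range(steps + 1)]
    return min(candidates, key=lambda v: energy_ratio(v, v_dd, sigma, theta))


v_theta_1 = best_voltage(theta=1.0)
v_theta_10 = best_voltage(theta=10.0)
ratio_theta_1 = energy_ratio(v_theta_1, 1.0, 0.05, 1.0)
ratio_theta_10 = energy_ratio(v_theta_10, 1.0, 0.05, 10.0)
```

Under these assumptions the sketch reproduces the qualitative behaviour discussed above: a larger Θ leaves less energy to save and moves the optimum towards a higher voltage, that is, a lower word error rate.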
We proceed by arguing why the solution of Prob. 1 can be accurately ap-
proximated by a decoupled control of voltage and frequency.
4.3.2 Motivations for a Decoupled Control
As stated in Prob. 1, the link controller has to meet a reliability constraint: the
residual word error rate should not exceed a safe value. However, it is impossible
for the controller to receive feedback about words delivered and corrupted, since,
by definition, these errors are undetected. The controller is only informed of
detected word errors.
An alternative approach to ensure reliability consists in modelling the
checker reliability as a function of the link operating points, and using worst-case
assumptions to forbid a priori operating pairs that do not meet the reliability
requirement. While the feasibility of this approach has been demonstrated for
Razor flip-flops [Austin et al., 2004], we show later in Chapter 6 that the robust-
ness of these checkers can be increased to a level sufficient to avoid worst-case
assumptions about the link error rate by adding little hardware overhead. As
a result, we do not follow such an approach that relies on worst-case assump-
tions to avoid unsafe operating points. On the contrary, our goal is to discover
adaptively which operating points are safe in the current noise environment and
silicon conditions.
It follows from these observations that, in order to meet a reliability con-
straint, the link controller has to use the feedback it receives about detected
word errors as a correlated feedback on undetected word errors. Because we
want to ensure reliability without using any worst-case assumption about the
link, the controller should react by avoiding operating pairs that cause retrans-
missions. We state this important remark as a necessary assumption to solve
Prob. 1: the link controller interprets retransmissions as a sign of unreliability.
Moreover, as far as the CRC-8 alternating-phase encoding is concerned, Fig. 4.8
attests that detected errors need to be avoided in order not to compromise re-
liability. Indeed, residual errors can be avoided if the link word error rate does
not reach 100 %. Accordingly, the link controller should avoid operating points
where detected errors are reported. We confirm this fact later in Chapter 5 by
studying analytically the reliability of the CRC-8 alternating-phase encoding as
a function of the link word error rate.
Because operating points with detected errors are avoided by the link con-
troller, the overall word error rate is expected to be small. Besides reliability
concerns, large error rates are likely to be energy-inefficient and cause large
transfer delays. Infrequent retransmissions constitute the key motivation for
separating concerns between, on the one hand, energy and, on the other hand,
performance. That is, provided that the word error rate is small (typically,
around a few percent), Prob. 1 can be accurately approximated by assuming a
zero word error rate. In this ideal situation, Φtot becomes a constant independent
of both vch and Fch. Thus, referring to Eq. (4.1), Ejob no longer depends
on Fch. Moreover, ∆est is a function of Fch only, and no longer of
vch.
In summary, we propose a link control policy that
1. ensures reliability by avoiding operating points where errors are reported,
2. dynamically adapts its operating frequency as a function of the perfor-
mance requirement, and
3. minimises energy consumption by tracking the lowest possible voltage
swing which does not cause transfer errors (retransmissions).
The control policy is reliable since it avoids operating the checker where relia-
bility is poor, energy-aware since it implements a DVFS technique to determine
its operating frequency, and finally variation-resilient since it discovers which
voltage to use for a given frequency. Fig. 4.10 illustrates how such a control pol-
icy selects so-called Pareto points within the design space. Known techniques
can be used to solve the frequency selection problem, such as history based pre-
diction of the link workload [Shang et al., 2003]. While we have formulated this
problem by constraining the average delay, other formulations—e.g., tracking a
target FIFO fill level—would be possible as well. On the contrary, the prob-
lem of finding out adaptively the minimum safe voltage for a given operating
frequency is more original and related to our intentions.
Having now explained why the link control problem can be decoupled, we
proceed by describing a particular decoupled control policy. We study its prop-
erties and hardware overhead.
4.3.3 Decoupled Control: a Particular Policy
Fig. 4.11 depicts a simplified operating point control algorithm. The control
consists of three main tasks. First, it stores in a table the optimum voltage
for each possible frequency. Second, it updates the optimum voltages based on
reported errors. The update principle is simple: as soon as an error is
reported, the optimum voltage of the current frequency (which just suffered an
error) is raised. To ensure the most aggressive operation, whenever no error is
reported for a given number of cycles, the controller tentatively reduces the
Pareto voltage of the current frequency. The threshold T1 dictates how often
such attempts are made: a counter is incremented each time a word is
transmitted successfully, and a reduction is attempted once the counter exceeds
T1. Once the voltage is decreased, the controller enters an explore mode whose
purpose is to validate the candidate new voltage. The candidate is validated
only if T2 consecutive successful word transfers are observed. In that case,
the controller resets the counter and returns to normal mode. Typical values
for T1 and T2 are 500 and 50, respectively. The value of T1 should be large
enough to avoid frequent retransmissions caused by tentative reductions of the
Pareto voltage.
Finally and independently of this voltage selection process, the minimum
frequency meeting the average delay requirement is determined as a function of
the FIFO fill level, as indicated by Fch = fn (fifo level, deadline) in Fig. 4.11.
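The voltage-tracking part of the policy described above can be sketched in a few lines of Python. This is only an illustrative software model, not the synthesised controller: the class name `VoltageTracker`, the use of integer voltage indices (0 being the lowest swing), the initialisation at the highest voltage, and the clamping at the table boundaries are assumptions that the text does not specify. Frequency selection is handled separately and is not modelled here.

```python
from dataclasses import dataclass, field

NORMAL, EXPLORING = "normal", "exploring"


@dataclass
class VoltageTracker:
    """Per-frequency table of the lowest voltage index believed error-free."""
    n_freqs: int = 4          # assumed number of frequency levels
    n_volts: int = 4          # assumed number of voltage levels (0 = lowest)
    t1: int = 500             # successes required before a reduction attempt
    t2: int = 50              # successes required to validate a candidate
    mode: str = NORMAL
    counter: int = 0
    v_best: list = field(default_factory=list)

    def __post_init__(self):
        if not self.v_best:
            # Assumption: start from the safest (highest) voltage everywhere.
            self.v_best = [self.n_volts - 1] * self.n_freqs

    def step(self, f_idx: int, error: bool) -> int:
        """Update the table after one word transfer; return the voltage index."""
        if error:
            # An error always raises the voltage of the current frequency.
            self.v_best[f_idx] = min(self.v_best[f_idx] + 1, self.n_volts - 1)
            self.mode, self.counter = NORMAL, 0
        elif self.mode == EXPLORING:
            if self.counter == self.t2:
                # Candidate voltage validated: back to normal mode.
                self.mode, self.counter = NORMAL, 0
            else:
                self.counter += 1
        elif self.counter > self.t1:
            # Long error-free run: tentatively try a lower voltage.
            if self.v_best[f_idx] > 0:
                self.v_best[f_idx] -= 1
                self.mode = EXPLORING
            self.counter = 0
        else:
            self.counter += 1
        return self.v_best[f_idx]
```

Calling `step` once per transmitted word with the current frequency index and the checker's error flag reproduces the raise-on-error and explore-after-T1 behaviour. Keeping one table entry per frequency is what lets the controller remember a different Pareto voltage for each operating frequency.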
In what follows, we describe in more depth how the average delay is es-
timated (Sec. 4.3.3.1), we verify some desired properties of the control policy
(Sec. 4.3.3.2), and lastly we discuss the hardware complexity of the controller
(Sec. 4.3.3.3).
[Figure 4.10 (plots not reproduced): two panels, (a) and (b), with axes voltage and delay. Panel (a) marks positive and negative slack; panel (b) marks explore steps and errors.]
Figure 4.10: Use and estimation of best operating points. (a) The control policy
fixes the operating frequency as a function of the delay constraint; it sets the
operating voltage to the minimum value which has experienced error-free
transmission. (b) The controller raises the best voltage for a given frequency
when experiencing errors; otherwise, every so often, it tentatively tries to
reduce it in order to ensure aggressive operation.
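[Figure 4.11 (flowchart not reproduced). Recoverable node labels: Start; errors? (Y/N); on an error, V_best(F_ch)++, mode = normal, counter = 0; mode? (exploring/normal); in normal mode, counter > T1 ? (Y/N), with V_best(F_ch)--, mode = exploring, counter = 0 on one branch and counter++ on the other; in exploring mode, if counter = T2 then mode = normal and counter = 0, else counter++; a further test counter = 0 ? (Y/N); F_ch = fn(fifo_level, deadline); V_ch = V_best(F_ch); Apply F_ch and V_ch.]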
Figure 4.11: Simplified operating point control policy. T1 and T2 are the values
of two constant thresholds.
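[Figure 4.12 (block diagram not reproduced): general independent arrivals feed a FIFO holding l words; the transmission path comprises encoder, link (Ξw cycles), decoder, and synchroniser; the diagram is annotated with ∆est = l · Ξw / Fch.]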
Figure 4.12: GI/D/1 model of the transmission scheme. The arrivals in the queue
form an i.i.d. process. According to Little's law, the average transfer delay is
proportional to the average buffer fill level. We simply estimate the average
transfer delay by the delay experienced by the last queued element.
4.3.3.1 Delay Estimation
Our controller estimates, for each possible frequency, the expected word transfer
delay through the system. Then, it selects the smallest frequency meeting the
performance requirement.
Because resources are limited, the average transfer delay must be estimated
very simply. Our approximation of the average transfer delay, ∆est, is the
estimated delay experienced by the last word queued. According to this
approximation, ∆est, which includes queueing and transmission, is given by
\[
  \Delta_{\mathrm{est}}(F_{\mathrm{ch}}) = \frac{l \cdot \Xi_w}{F_{\mathrm{ch}}},
\]
where l represents the queue size, that is, the number of words currently present
in the FIFO buffer, and Ξw is the expected number of cycles needed to send a
word through the link. Because our control policy enforces a low error rate, Ξw
can be accurately approximated by the number of cycles required for a word to
go through the link when no error occurs—i.e., the depth of the transmission
pipeline.
We briefly motivate the approximation made for ∆est. As shown in Fig. 4.12, we
can model the transmission scheme by a GI/D/1 queue, i.e., by a single-server
queue with a general i.i.d. arrival process and a deterministic service time.
According to Little's law [Arnold, 1990], the average response time of such a
system is proportional to the average fill level. In order to minimise the
overhead, we avoid computing an average fill level and simply approximate it by
its instantaneous value. We have also considered finding which element in the
queue is the most “slack-critical”. The most critical element is not
necessarily the last one, and identifying it turns out to be even more complex
than computing an average transfer delay.
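The proportionality invoked above can be written out explicitly. In its usual form, Little's law relates the mean number of words in the system, the mean arrival rate, and the mean time spent in the system. The symbols \bar{l}, \lambda and \bar{\Delta} are notation introduced here for illustration only and do not appear in the text:

\[
  \bar{l} = \lambda \, \bar{\Delta}
  \quad\Longleftrightarrow\quad
  \bar{\Delta} = \frac{\bar{l}}{\lambda}
\]

For a given arrival rate \lambda, the mean delay is therefore proportional to the mean fill level, which is why a fill level can serve as an inexpensive proxy for the transfer delay.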
[Figure 4.13 (circuit not reproduced): inputs FIFO level and deadline; hardwired coefficients K0, ..., KQ, one per frequency Fch,0, ..., Fch,Q; output: index of chosen frequency.]
Figure 4.13: Circuit determining the slowest frequency that meets the required
delay constraint. By convention, frequency index 0 corresponds to the fastest
frequency. K0,...,KQ are constants that are hardwired for every frequency.
As far as the implementation is concerned, the controller has to find the
slowest frequency among a discrete set meeting the delay requirement. Fig. 4.13
illustrates this selection process. The delay constraint is expressed in cycles
with respect to a given frequency. The factors K0,...,KQ account for the
coefficients Ξw/Fch,i, i = 0, ..., Q and are constant. The circuit needs to
multiply these coefficients by the FIFO fill level. Therefore, the complexity
of the circuit can be minimised by limiting the number of possible operating
frequencies (e.g., to 4) and by picking frequency values whose ratios are
powers of two.
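The delay estimate and the selection of the slowest feasible frequency can be illustrated with a minimal Python sketch. The function names, the four example frequencies, the pipeline depth Ξw = 4, and the choice of expressing the deadline in seconds (the text expresses it in cycles of a reference frequency) are all assumptions made for the example.

```python
def estimated_delay(fifo_level: int, xi_w: int, f_ch: float) -> float:
    """Delta_est = l * Xi_w / F_ch, in seconds when f_ch is given in Hz."""
    return fifo_level * xi_w / f_ch


def slowest_feasible_index(fifo_level, deadline, freqs, xi_w):
    """Return the index of the slowest frequency meeting the deadline.

    `freqs` is sorted from fastest (index 0) to slowest, as in Fig. 4.13.
    If no frequency meets the deadline, fall back to the fastest one.
    """
    # Hardwired coefficients K_i = Xi_w / F_ch,i (one per frequency).
    k = [xi_w / f for f in freqs]
    chosen = 0
    for i, k_i in enumerate(k):
        # Feasibility is monotone: once a frequency is too slow, so are
        # all the following ones, hence the last feasible index wins.
        if fifo_level * k_i <= deadline:
            chosen = i
    return chosen


# Hypothetical operating set with power-of-two ratios (Hz).
FREQS = [1000e6, 500e6, 250e6, 125e6]
XI_W = 4  # assumed pipeline depth in cycles

idx = slowest_feasible_index(fifo_level=10, deadline=100e-9,
                             freqs=FREQS, xi_w=XI_W)
print(idx, estimated_delay(10, XI_W, FREQS[idx]))
```

When the frequency ratios are powers of two, every K_i is K_0 scaled by a power of two, so a hardware version would presumably derive all products from l · K_0 by shifts instead of using one multiplier per frequency.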
4.3.3.2 Properties
We now discuss the optimality, stability, and sensitivity of the control policy
to design parameters. We consider two main design parameters. The first one is
the threshold T1 mentioned in Fig. 4.11, which governs how often the controller
attempts to use more aggressive voltages. The second parameter, introduced for
implementation reasons, determines how much slower the controller runs compared
to the data path. The goal of this parameter is to limit the energy overhead
of the operating point controller. We now discuss the impact of these two
parameters on optimality, stability, and sensitivity. We give more experimental
details concerning the presented graphs later in Sec. 4.4.
• Optimality. The algorithm presented in Fig. 4.11 contains two deviations
from optimality:
(i) The slower controller clock that delays voltage and frequency up-
dates.
(ii) The tracking of the optimal voltage (Pareto voltage) associated to
every operating frequency.
While the choice of Fch is not approximated, the delay estimation neglects
retransmissions. However, such approximations are inherent to any practical
DVFS-capable scheme. In summary, the first deviation from optimal control
stems from implementation reasons, while the second is intrinsic to the
algorithm. We study later the impact of the slower controller clock and focus
now on the tracking of the Pareto voltage. We show that the latter hardly
affects the control quality. To do so, we simulate a 100,000-word transfer and
determine a posteriori the Pareto voltage of every frequency. Then, we compare
the performance of our controller with that of an optimal controller which
knows which Pareto voltage to use for every frequency. The difference in
performance is minor: our controller saves 25% of energy in this case, while
the optimal controller saves one percent more. Both controllers perform the
same in terms of average transfer delay and residual word errors (none in both
cases).
• Stability. In our context, stability translates into (i) ensuring that the
FIFO level does not grow to infinity, and (ii) after an uncontrolled dis-
turbance (traffic variation or errors), the controller eventually settles on
an(other) operating point. Because it tries to obey a delay constraint, the
controller does not let the FIFO fill level grow to infinity but, instead,
invests more energy by operating at high frequency. The FIFO fill level
would only overfill in pathological cases where the classical system’s FIFO
would too. This is even more evident as the self-calibrating system has a
maximum frequency larger than the classic one which suffers worst-case
limitations. The second item is granted by the algorithm itself. The only
point to discuss stems from the threshold that governs the controller's
aggressiveness. Too low a value leads to undesirable voltage level
oscillations. The next paragraph shows that a wide range of values enables the
system to exhibit sound behaviour.
• Sensitivity. We show here how relevant metrics such as the energy, transfer
delay, and residual word errors are affected by the choice of the controller
threshold and by how much slower the controller runs compared to the data
path. Recall that the threshold determines how often the controller attempts
to set more aggressively the voltage level used for a certain frequency. This
threshold therefore determines the speed of reaction to changes in noise
level. In addition to illustrating the sensitivity to these parameters, the
presented results guide us towards a wide range of acceptable values. We have
simulated the transfer of 100,000 words. Fig. 4.14 shows that energy saving,
word transfer delay, and residual word errors remain in desired intervals over
a wide range of threshold values. The top two graphs of Fig. 4.14 do not show
any trade-off: a large threshold value results in positive energy savings, a
low transfer delay, and no residual errors. The reason is that the error
statistics were not modified during simulation. As a matter of fact, the
Pareto voltages are constant over the whole simulation. On the contrary, small
threshold values negatively affect all metrics of interest. First, unsafe
operating points are often visited, which causes a significant number of
undetected errors. In addition, numerous retransmissions incur a large
transfer delay and cancel the energy saving.
[Figure 4.14 (plots not reproduced): three panels plotted against the threshold [cycle] on a logarithmic axis from 10^0 to 10^4, showing energy savings [%], average delay [ns], and residual errors.]
Figure 4.14: Sensitivity of various metrics to the threshold T1 (expressed in trans-
mission cycles) mentioned in the description of the controller algorithm.
Finally, Fig. 4.15 illustrates the impact of the ratio between the controller
clock and the data path clock on the transfer delay and energy savings.
The controller is only allowed to run more slowly. If the controller clock
is too slow, the system behaves poorly both in terms of energy saving
and average delay. Indeed, energy efficient operating points are excluded,
due to the violated delay constraint. On the contrary, once the control
policy runs often enough to meet the delay constraint, there is no point
in running the controller more often, since its overhead in terms of gates
becomes tangible and decreases energy savings. According to Fig. 4.15,
running the controller 4 times slower than the data path maximises the
energy saving, while barely affecting the transfer delay.
We proceed now by briefly discussing the hardware cost of such a controller.
4.3.3.3 Hardware Overhead
The implementation of the algorithm described in Fig. 4.11, although
considerably more detailed than what is shown, still has a relatively low
hardware complexity and requires an area equivalent to a few thousand NAND-2
gates. The whole system actually consists of 770 standard cells (230 sequential
and 540 combinational) from the Artisan library in UMC 90 nm CMOS. An accurate
estimation of the energy consumed is given later in Fig. 4.17, assuming a
discrete operation set of four voltage and four frequency levels. The hardware
overhead of the controller increases with the number of possible frequencies
and voltages, since the Pareto voltages must be stored in a table.
[Figure 4.15 (plots not reproduced): energy savings [%] (top) and average delay [ns] (bottom) plotted against the clock ratio, from 0 to 30.]
Figure 4.15: Impact of the ratio between controller and data path clock on: (top)
energy saving and (bottom) average transfer delay. The dashed line represents the
target delay of the classic system.
                 classic system   self-calibrating system
voltage swing    1.0 V            0.6, ..., 1.2 V
frequency        500 MHz          50, ..., 1000 MHz
Table 4.2: Operating range of the classic and self-calibrating interconnect.
4.4 Simulation Results
The goal of this section is to illustrate the advantages of a self-calibrating link
over a classical fixed-swing interconnect designed from worst-case assumptions.
More precisely, we have compared the self-calibrating link with two kinds of
classic fixed-swing systems. The first classic system transmits raw unencoded
data on 32 bits. The second one represents the increasing concern with reliability
of on-chip communication (see for instance the implementation of practically
all internal buses of Itanium 2 [McNairy and Soltis, 2003]). It uses an error
detecting code such as an 8-bit CRC to protect 32-bit words and retransmits
corrupted words. We refer to the former system as the classic one and to the
latter as the classic with codec. We summarise in Table 4.2 the operating range
of the considered systems. We present our results with delays and frequencies
relative to the two classic systems.
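To make the "classic with codec" baseline concrete, the following Python sketch protects a 32-bit word with an 8-bit CRC and checks it at the receiver. The generator polynomial (x^8 + x^2 + x + 1, written 0x07), the zero initial value, and the function names are assumptions chosen for illustration; the text does not say which CRC is used.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first, zero initial value (assumed parameters)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def encode_word(word: int) -> int:
    """Append the 8 CRC bits to a 32-bit word, giving a 40-bit codeword."""
    return (word << 8) | crc8(word.to_bytes(4, "big"))


def check_word(codeword: int) -> bool:
    """Return True if the received 40-bit codeword passes the CRC check."""
    word, crc = codeword >> 8, codeword & 0xFF
    return crc8(word.to_bytes(4, "big")) == crc


cw = encode_word(0xDEADBEEF)
# A failed check would trigger a retransmission of the word.
needs_retransmission = not check_word(cw ^ (1 << 17))
print(check_word(cw), needs_retransmission)
```

A failed check is what triggers a retransmission in such a scheme. The eight redundant bits correspond to eight additional bit lines on top of the 32 data lines.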
We have measured from simulations the energy consumption, transfer de-
lay, and undetected word errors of the classic and self-calibrating transmission
schemes under different—artificial and real-life—traffic traces. A traffic trace is
a file defining the arrival times of words or frames (i.e., a group of words) into
the input FIFO.
In order to generate errors during the simulation, we use the bit error rate
model introduced in Chapter 3. More specifically, we have modelled the bit
error rate and noise sources of a typical 90nm CMOS technology as follows:
• Nominal supply voltage: 1.0V.
• Device threshold voltage: 0.2V.
• Additive noise standard deviation: 0.05V.
• Mean and standard deviation of the propagation delay: 1ns and 0.1ns.
Regarding performance, we measure the average transfer delay in case of
word transfers, or the absolute delay required to transfer a whole frame. Reli-
ability is measured by counting the number of undetected errors that occurred
during the transfer of a given workload. Furthermore, we have estimated the
energy required to transfer a workload. We follow the decomposition introduced
in Eq. (4.1), which distinguishes the energy Ech spent on the interconnect and
the energy Esys consumed by the encoder, decoder, synchroniser, ARQ, and
operating point controller. The total number of transactions required to trans-
fer a workload includes the successful transfer and retransmission cycles, and
has been measured in the simulation. To estimate Ech, we assume a 1 cm bus
length, and a bit line capacitance amounting to 2.73 pF. We have obtained this
capacitance value from the technology manual. Moreover, we assume a switch-
ing activity factor of 0.5 since our workloads consist of randomly generated data,
even for those with a realistic arrival schedule. Finally, we have computed Esys
by synthesising the overhead elements required by the self-calibrating system.
The error detecting circuitry is as defined in Sec. 4.2. We have approximated
the switching activity as 50% on all nets.
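As a rough illustration of how the interconnect energy scales with the voltage swing, the sketch below uses the textbook dynamic-energy model (activity × capacitance × V²) with the capacitance and activity factor quoted above. This model is an assumption: Eq. (4.1) is not reproduced here and may define Ech differently, and the absolute value depends on how the activity factor is defined. The function name and the 32-line default are likewise hypothetical.

```python
def bus_energy_per_word(v_swing, n_lines=32, c_line=2.73e-12, activity=0.5):
    """Switching energy in joules for one word: activity * n_lines * C * V^2.

    Assumed model; constant factors depend on the activity definition.
    """
    return activity * n_lines * c_line * v_swing ** 2


e_nominal = bus_energy_per_word(1.0)   # classic swing from Table 4.2
e_low = bus_energy_per_word(0.6)       # lowest self-calibrating swing
print(f"{e_nominal * 1e12:.2f} pJ at 1.0 V, {e_low * 1e12:.2f} pJ at 0.6 V")
print(f"ratio = {e_low / e_nominal:.2f}")
```

Under this quadratic model, the ratio between two swings does not depend on the capacitance or on the activity factor, which is why lowering the swing is attractive even after the redundant bit lines and retransmissions are accounted for.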
In summary, the estimated energy consumption of the self-calibrating system
includes the overhead caused by the redundant bit lines, the overhead caused
by the additional retransmissions, and the overhead of the logic required by the
self-calibration (mainly the codec and operating point controller). However, as
mentioned in Sec. 4.3.1, the energy overhead of the voltage converter and of
three backward control lines has been neglected.
We present three experiments.
1. The first example focuses on the energy advantage of dynamic bandwidth
adaptation considering a realistic MPEG-based workload and an artificial
one.
2. The second example shows the energy advantage of dynamically tuning
the operating point to actual technology variations.
3. The last example exhibits the robustness of our system to unpredictable
noise sources.
4.4.1 Dynamic Bandwidth Adaptation
[Figure 4.16 (plots not reproduced): top panel, frame size [KByte] versus frame number (0 to 400); bottom panel, frame delay [us] versus frame number, with curves for the classic system and the self-calibrating system and a line for the delay constraint.]
Figure 4.16: Transmission of a variable workload. Top: workload variation in
time. Bottom: incurred frame delay in the classic system (low delay) and in the
self-calibrating interconnect (delay as close as possible to the imposed
constraint, shown as a dashed line).
Modern multimedia algorithms have dynamically varying requirements.
Fig. 4.16 shows how the self-calibrating system takes advantage of a
time-varying MPEG workload. In the bottom graph, one can observe that the
adaptive system tries to exactly match the bandwidth to the current needs: it
slows down the communication link to send every MPEG frame as slowly as
possible in the allotted time and, ideally, not any faster. The operation at a
lower frequency
grants a tangible reduction in average energy: the whole trace, consisting of
400 frames of several Kbytes each, requires 42% less energy with a dynamically
self-calibrating system compared to a classic system, and 47% less energy
compared to the classic system with codec. Such a saving was achieved by
letting the channel controller run at half the frequency of the other
components. No undetected error has been reported. Fig. 4.17 depicts the
contribution of each component of the self-calibrating system to the energy
budget. One can notice that the controller logic and the synchronising
registers, which are the only contributions added in our scheme compared to the
classic system with codec, impose a relatively limited overhead. On the other
hand, it is interesting to note that the energy cost of the codec is not fully
negligible and is of about the same order of magnitude. Finally, the visible
overhead caused by retransmissions amounts to 4%, which accounts for the ARQ
controller and the additional memory reads. While this part is clearly
quantified, retransmissions also affect the energy budget indirectly, since
their delay overhead forces the controller to operate on faster, more
energy-costly operating points.
Figure 4.17: Energy breakdown of the self-calibrating interconnect.
As a last illustration of the energy saving resulting from dynamic bandwidth
adaptation, we expose both the classic and the self-calibrating systems to
various Poissonian workloads of different traffic intensity. Each workload
consists of 100,000 words generated according to Poisson arrivals. We require
that, for each workload, the self-calibrating interconnect offers the same
average transfer latency as the one experienced by the classic interconnect.
Fig. 4.18 illustrates the energy saving granted in this case. The graph shows
that our controller manages to save energy consistently in all those cases
where the average delay is not too constrained; only for tight constraints is
the controller energy overhead larger than the diminishing savings. Moreover,
reliability has been ensured, as no simulation run suffered from an undetected
error.
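A Poissonian workload of the kind described above can be generated by accumulating exponentially distributed inter-arrival times. The sketch below is one possible way to build such a traffic trace; the function name, the arrival rate, and the seed are hypothetical and are not taken from the thesis.

```python
import random


def poisson_arrivals(n_words: int, rate: float, seed: int = 0) -> list:
    """Arrival times (seconds) of n_words words for a Poisson process.

    `rate` is the mean number of arrivals per second; inter-arrival
    times are therefore exponential with mean 1 / rate.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(n_words):
        t += rng.expovariate(rate)
        times.append(t)
    return times


# Hypothetical trace: 100,000 words at an average of 1e8 words per second.
trace = poisson_arrivals(100_000, rate=1e8)
print(len(trace), trace[-1])
```

Varying `rate` changes the traffic intensity, which is the knob that moves a workload along the average-delay axis of Fig. 4.18.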
4.4.2 Exploiting Technology Variations
Fig. 4.19 illustrates the effect of technology on the choice of control points: On a
poor wafer, simulated with an average propagation delay of 1.125ns (vs. 1ns for
the typical delay), the controller chooses mainly Pareto points relatively close
to the worst-case delay line. On the contrary, a good wafer, simulated with an
average propagation delay of 0.875ns, has operating points chosen mostly along
a more aggressive delay-voltage line, which reflects the lowest delays experienced
by the system. In other words, our control policy is effective in “discovering” the
real delay-voltage characteristic of the technology without making any assump-
tions on it. In both cases, the standard deviation has been reduced to 0.05ns
to account for the lower indeterminacy on wafers of a defined quality. These
hypotheses result in slightly less than 1% of the wafers being classified as good
or better than good, and poor or worse than poor. The simulated traffic consists
of an artificial workload of 100,000 words with arrival times following a Poisson
process. Table 4.3 summarises the energy saving and relative performance of
the self-calibrating interconnect compared to the classic and classic with codec
systems.
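The text does not spell out how the "slightly less than 1%" figure is obtained. One reading that reproduces its order of magnitude, and which is purely an assumption here, is that the average propagation delay of a wafer is normally distributed around the typical 1ns with a standard deviation of 0.05ns, so that the good and poor wafers sit 2.5 standard deviations from the mean. The sketch below evaluates the corresponding Gaussian tails.

```python
import math


def normal_upper_tail(x: float, mean: float, std: float) -> float:
    """P[X >= x] for X ~ Normal(mean, std)."""
    return 0.5 * math.erfc((x - mean) / (std * math.sqrt(2.0)))


# Assumed wafer-to-wafer distribution: mean 1.0 ns, std 0.05 ns.
P_POOR = normal_upper_tail(1.125, mean=1.0, std=0.05)        # poor or worse
P_GOOD = 1.0 - normal_upper_tail(0.875, mean=1.0, std=0.05)  # good or better
print(f"poor or worse: {P_POOR:.4%}, good or better: {P_GOOD:.4%}")
```

Under this assumption each tail is a fraction of a percent, which is compatible with the statement in the text. Other assumptions about the distribution would give different figures.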
Fig. 4.20 shows the energy-latency trade-off of the self-calibrating link for
different wafer qualities. As expected, for any type of wafer, the energy
saving improves as the timing constraints are relaxed and depends on the
quality of the wafer: the better the wafer and the larger the average delay
that can be tolerated, the larger the energy saving. This can have a very
interesting effect in products designed soon after the introduction of a new
technology node: at design time the technology is poorly controlled and chances
are that our system will not save significant amounts of energy. Yet, as chips
go into production and the technology matures, a more significant saving
becomes possible. Classic systems would need a redesign, whereas our system
benefits automatically from the technology improvements. Regarding reliability,
no residual error has been observed.
[Figure 4.18 (plot not reproduced): energy saving [%] plotted against average delay [ns], from 10 to 55 ns.]
Figure 4.18: Energy saving (with respect to the classic system) as a function of
the average transfer delay for different Poissonian workloads. The system becomes
energy-inefficient only under high workloads requiring the interconnect to work
most of the time at full speed.
4.4.3 Robustness Towards Design Uncertainties
Fig. 4.21 simulates the effect of design hypotheses which have turned out to
be too optimistic after manufacturing, for instance due to unexpected sources
of on-chip noise. To simulate the self-calibrating system with higher noise,
we raise the standard deviation of the additive noise from 0.05V to 0.1V and
the standard deviation of the propagation delay from 0.1ns to 0.15ns. These
noise parameters result in a word error rate of approximately 10^-5 under the
nominal operating conditions. It should be stressed that the classic system is
not expected to work any more under this condition: if, in the normal design
flow, any source of error is overlooked or underestimated (such as crosstalk or
other deep sub-micron second-order effects), the manufactured chips may not
work or may have a very limited yield. The simulated traffic is the same
artificial workload consisting of 100,000 words as described in Sec. 4.4.2. As
the figure shows, the self-calibrating system adapts to the strong noise by
choosing less aggressive operating points and by trading energy for robustness:
energy savings shrink to 4% and 16% compared to the classic system and classic
with codec, respectively. That is, it behaves essentially as in the presence of
a poor wafer (refer to Fig. 4.19). Two residual errors have been reported
during the simulation. As to the average latency, the increase amounts to 4%
compared to the desired behaviour of the classic system, but the interconnect
operates correctly and avoids the yield reductions incurred by the classic
system.
                      energy saving             average delay variation
                      good wafer   poor wafer   good wafer   poor wafer
classic               21%          -8%          ≤0.5%        ≤0.5%
classic with codec    30%          5%           ≤0.5%        ≤0.5%
Table 4.3: Energy saving and average delay variation for different wafer
qualities, compared to the classic and classic with codec systems.
[Figure 4.19 (plot not reproduced): normalized delay plotted against voltage [Volt], from 0.5 to 1.2 V.]
Figure 4.19: Operating points used depending on technology variations. ◦ →
classic system; + → self-calibrating system on a poor wafer; × → self-calibrating
system on a good wafer. The bold line represents the worst-case relation between
delay and voltage. According to the model, slightly less than 1% of the wafers are
classified as good or better than good, and poor or worse than poor.
[Figure 4.20 (plot not reproduced): energy saving [%] plotted against average delay [ns], from 10 to 55 ns, for a bad, a typical, and a good wafer.]
Figure 4.20: Energy saving (with respect to the classical system) as a function
of the measured average transfer delay for different wafer qualities. The better
the wafer, the more important the energy saving.
[Figure 4.21 (plot not reproduced): normalized delay plotted against voltage [Volt], from 0.5 to 1.2 V.]
Figure 4.21: Operating points used by the self-calibrating system in the presence of
strong noise. ◦ → classic system; + → self-calibrating system. The classic system
has a reduced yield under these conditions, while the self-calibrating one moves to
more energy-consuming, but safer operating points.
4.5 Conclusions
The content of this chapter constitutes pioneering work on the application
of digital self-calibration techniques to an on-chip link [Worm et al., 2002;
2005]. We have described a self-calibrating link at the system level and
explained in detail the additional elements imposed by self-calibration: timing
error detection circuitry, error recovery (i.e., the ARQ scheme), and operating
point controller.
Regarding the encoding scheme, we have stressed the fact that detecting
timing errors requires temporal redundancy. Borrowing from the LEDR encod-
ing, we have combined the notion of phase—a binary information about the
data sequencing—with standard error detecting codes such as CRCs to obtain
an encoding that detects additive as well as massive timing errors. We study in
detail the detection capabilities of this encoding in the next chapter.
Next, we have formulated the link control problem as a constrained
minimisation. Contrary to reliability (which is measured through residual
errors and thus has to be verified off-line), energy and performance can be
observed by the link controller at different levels of indirection. Still, the
problem remains complex because retransmissions affect energy efficiency and
performance, and depend on both voltage and frequency. Due to the limited
resources available to an on-line controller, we approximate the original and
complex link control
problem by a simple decoupled control of voltage and frequency. Our approx-
imation relies on the observation that retransmissions are expected to occur
rarely. The motivations are manifold. First, we have observed by simulation
that our particular encoding scheme meets tight reliability constraints only for
limited word error rates, and hence infrequent retransmissions. Another more
general argument is that retransmitting often is energy inefficient and results in
large transfer delays. In a DVFS capable system, large transfer delays reduce
the available slack, which, in turn, increases the use of energy-costly operating
points. In addition, the decoupled control leverages traditional DVFS techniques
in order to determine the smallest operating frequency meeting a given perfor-
mance requirement. However, the minimum voltage used for a given frequency
is tuned by monitoring the occurrence of detected errors, instead of relying on
worst-case defined voltage-frequency pairs. In the presented case, the controller
has to request a safer operating point as soon as an error is detected in order
to avoid operating points where the checker is not reliable enough.
On the one hand, our problem formulation is “link-centric” in the sense that
energy and performance metrics are derived solely from considerations of the
link and related elements, which leads to a local optimisation. On the other
hand, we argue that a global energy optimisation in a network of
self-calibrating links is possible if an external unit, such as the one
proposed by Toprak and Leblebici [2005], provides the individual target delay
of each self-calibrating link. Extending this principle to the voltage control
problem, one could imagine that the local voltage control of a particular link
would target an error rate that has been determined in order to minimise the
energy consumed not only by the link but by a larger system, e.g., a link and a
cache. To the best of our knowledge, such questions, which are outside the
scope of the present work, have not been addressed.
Our experimental results have emphasised the versatility of our transmission
scheme. We believe that such self-calibrating circuits display features that
are essential in future VLSI designs. Specifically, we have shown that a
self-calibrating interconnect saves energy under both artificial and realistic
workloads. Our experiments have demonstrated that operating the link at
sub-critical voltage compensates for the overhead incurred by the additional
elements required by self-calibration. Furthermore, the energy saving reported
depends on the quality of the manufactured chips. We have also shown that,
while the yield of a traditional system drops when the design is done with
optimistic assumptions on the noise sources, self-calibration allows a circuit
to trade energy for robustness without reducing the yield.
In summary, we have studied in detail a particular self-calibrating on-chip
link and validated the feasibility of the concept. We have motivated a decou-
pled control of the link operating points. On the one hand, frequency can
be estimated using well-known DVFS techniques which we have introduced in
Sec. 2.1. On the other hand, the on-line control of voltage based on monitoring
the error rate constitutes, in our opinion, a more interesting and challenging is-
sue. However, there still exists a coupling between the reliability of the checker
and the voltage control algorithm. For example, the control policy presented
in Sec. 4.3.2 is constrained by the need of avoiding operation under large word
error rates. That is, objectives in reliability and energy efficiency are not
decoupled. While double-sampling flip-flops make it possible to track a target
error rate [Kaul et al., 2005], they rely on worst-case assumptions in order to
ensure reliability, which is incompatible with our goal. The remainder of this
work is therefore devoted to the development of checkers that reliably detect
timing errors under any error rate.
The next chapter studies analytically self-synchronising properties of the
alternating-phase encoding as well as other more general schemes and gives
fundamental bounds on the amount of wiring overhead necessary for detecting
all timing errors. Then, in Chapter 6, we will further improve the link checker
architecture presented so far, and propose novel architectures bridging the gap
between high robustness and low overhead, fulfilling thereby the goal expressed
in Sec. 1.3.
Chapter 5
Self-Synchronisation for
Synchronous Encoding Schemes
An expert is a man who has made
all the mistakes which can be
made, in a narrow field.
Niels Bohr
The goal of this chapter is to derive self-synchronisation properties of synchronous encoding schemes. It reflects the particularity of the considered
problem: we use a coding technique in order to detect timing errors reliably
under any possible timing error rate. The perspective differs from purely
asynchronous communication, because we sample the link output at well-defined
instants and would like to indicate (possibly with a non-zero but small error
probability) whether the sampled data is correct. In contrast, asynchronous
codes indicate with no error probability the first instant when a transmitted
piece of data is available.
The self-synchronising scheme introduced in Sec. 4.2, namely alternating-
phase encoding, raises many natural questions. For example, why does it not
detect all timing errors? Is it possible to find an encoding that detects all timing
errors and has the same wiring overhead? In this chapter, we answer such
questions by formally defining self-synchronisation in a synchronous context
and giving fundamental bounds on the bandwidth efficiency of self-synchronising
encodings. To do so, we introduce sequencing rules, i.e., rules dictating which
symbols can follow which. For example, spacer-based encoding schemes obey a
sequencing rule requiring a spacer symbol to be inserted after each codeword
conveying information. Self-synchronisation has been extensively studied in a
framework assuming only two sequencing rules, namely spacer-based encoding
schemes (e.g., dual-rail) and differential encoding where information transmitted
on the channel consists of the values of signal changes, rather than signal values
themselves.
When considering classic sequencing rules (spacer-based and differential
encoding), self-synchronisation is equivalent to the unordering property of a
code [Verhoeff, 1988; Bose, 1991]. Unordered codes are such that for any
codeword p, there exists no other codeword q such that qi = 1 for all bit positions
i where pi = 1. In other words, for any two distinct codewords p, q of an unordered
code, there exist at least two bit positions i, j such that pi = 1, qi = 0 and
pj = 0, qj = 1. In this context, research has essentially focused on unordered
codes of minimum redundancy [Berger, 1961; Freiman, 1962; Verhoeff, 1988;
Varshavsky, 1990]. Such codes are called optimal self-synchronising codes.
[Figure 5.1, left panel: C0 = {(0,0,0,0), (1,1,1,1)} and C1 = {(0,0,1,1),
(0,1,0,1), (1,0,0,1), (0,1,1,0), (1,0,1,0), (1,1,0,0)}, giving |C0| · |C1| = 12.
Right panel: C0 = {(0,0,0,0), (1,1,1,1), (0,0,1,1), (1,1,0,0)} and C1 =
{(0,1,0,1), (1,0,0,1), (0,1,1,0), (1,0,1,0)}, giving |C0| · |C1| = 16.]
Figure 5.1: Two different self-synchronising encoding schemes that both obey a
sequencing rule where two dictionaries (namely, C0 and C1) are used alternately.
Although alternating-phase encoding is a generalisation of spacer-based en-
coding, it does not fit in this framework because it is neither defined by spacer-
based nor by differential sequencing rules. Moreover, its self-synchronisation
properties are weaker—and thus novel—since it does not detect all possible tim-
ing errors. The originality of our approach is to consider sequencing rules that
are more general than—and, indeed, include—the framework described above.
While the main motivation is to study self-synchronising properties of a wider
range of encodings (e.g., alternating-phase encoding), an additional motivation
is to ascertain whether such a generalisation could lead to improvements in
bandwidth efficiency. Moreover, we consider a larger family of sequencing rules
and codes than pure asynchronous codes because we do not need to
implement the code membership test with glitch-free logic.
To illustrate the discussion, we give a motivational example showing that
minimising the redundancy of unordered codes does not necessarily maximise
bandwidth efficiency.
Example 4. (minimum redundancy vs. maximum bandwidth effi-
ciency). Fig. 5.1 depicts two particular self-synchronising encoding schemes.
Both obey a sequencing rule requiring codewords to be transmitted alternately
from two codes C0 and C1. We have not yet formally defined self-synchronisation.
Nevertheless, we observe that, for both schemes, the failure of any arbitrary
combination of bit transitions towards a new correct codeword never results in the
received data being mistaken for a valid codeword. For example, let us focus on
the encoding depicted on the left and assume that the all-zero codeword has been
transmitted. Next, a codeword from C1 will follow. As soon as at least one
transition from the all-zero codeword towards any codeword of C1 fails, the received
word has a Hamming weight different from 2 and thus does not belong to C1.
Similar observations can be made for any codeword of both encoding schemes.
The code C1 on the left side of Fig. 5.1 consists of all 4-bit vectors of weight
2 and is called a Sperner code. Although the fact is already known, we establish
later in Sec. 5.4 that the Sperner code used in spacer-based or differential encod-
ing schemes minimises redundancy, while detecting all possible timing errors.
The 4-bit Sperner code is used alternately with another code C0 containing the
all-zero and all-one spacers. The all-one spacer can be added “for free” in this
case and thus increases bandwidth efficiency (we show later in Thrm. 2 that C0
cannot be further extended without removing codewords from C1).
In contrast, neither of the codes used in the encoding scheme on the right of
Fig. 5.1 is a self-synchronising code of minimum redundancy. Yet, the bandwidth
efficiency of this scheme is larger than that of the encoding based on
the Sperner code, since 16 (instead of 12) different codeword combinations can
be transmitted over two cycles. In fact, the encoding scheme on the right is
the 4-bit version of LEDR. We conjecture later in Sec. 5.4.3.1 the optimality of
LEDR under sequencing rules alternating coding dictionaries.
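The observation made in Example 4 can be checked by brute force. The following is a minimal Python sketch, not part of the original work: the helper names `intermediate_words` and `detects_all_timing_errors` are hypothetical, and the two dictionaries are transcribed from Fig. 5.1 (the right-hand split into C0 and C1 assumes the usual 4-bit LEDR layout). A timing error is modelled as any subset of bit transitions failing to complete before the sampling instant.

```python
from itertools import product


def intermediate_words(prev, nxt):
    """Words that can be sampled when at least one bit transition from
    prev to nxt has not completed (any subset of the others may have)."""
    changing = [i for i in range(len(prev)) if prev[i] != nxt[i]]
    words = set()
    for done in product((False, True), repeat=len(changing)):
        if all(done):
            continue  # every transition completed: no timing error
        word = list(prev)
        for i, completed in zip(changing, done):
            if completed:
                word[i] = nxt[i]
        words.add(tuple(word))
    return words


def detects_all_timing_errors(c0, c1):
    """True if no partially completed transition between the two
    dictionaries can be mistaken for a word of the expected dictionary."""
    c0, c1 = set(c0), set(c1)
    if c0 & c1:
        return False  # a stale symbol could be accepted as a new one
    for src, dst in ((c0, c1), (c1, c0)):
        for prev in src:
            for nxt in dst:
                if intermediate_words(prev, nxt) & dst:
                    return False
    return True


# Dictionaries of Fig. 5.1 (assumed transcription of the figure).
WEIGHT_2 = {w for w in product((0, 1), repeat=4) if sum(w) == 2}
LEFT = ({(0, 0, 0, 0), (1, 1, 1, 1)}, WEIGHT_2)
RIGHT = ({(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 1)},
         {(0, 1, 0, 1), (0, 1, 1, 0), (1, 0, 0, 1), (1, 0, 1, 0)})

for name, (c0, c1) in (("left", LEFT), ("right", RIGHT)):
    print(name, detects_all_timing_errors(c0, c1), len(c0) * len(c1))
```

Under these assumptions both schemes pass the check, while the number of codeword combinations over two cycles is 12 on the left and 16 on the right, as stated in the example.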
The particular case of Example 4 can be generalised: do self-synchronising
encoding schemes exist that achieve a better bandwidth efficiency than optimal
codes used with sequencing rules of the classic framework? Indeed, when con-
sidering any possible sequencing rule, it is not clear whether unordered codes
of minimum redundancy (i.e., optimal self-synchronising codes) also maximise
bandwidth efficiency.
We proceed as follows. In Sec. 5.1, we briefly review the state-of-the-art of
self-synchronising encoding schemes. We discuss in depth later in Chapter 6
other approaches to detect timing errors, such as double sampling flip-flops.
Sec. 5.2 summarises the notations used in this chapter. Then, Sec. 5.3 defines
the general form of symbol sequencing rules we consider, as well as two particular
types which we study in more depth. Moreover, attributes related to sequencing
rules—such as self-synchronisation properties and bandwidth efficiency—are de-
fined. Key results are stated in Sec. 5.4, which is devoted to sequencing rules that
detect all possible timing errors and maximise bandwidth efficiency. We give the
combination of sequencing rule and code that maximises bandwidth efficiency,
i.e., show that differential encoding using unordered codes of minimum redun-
dancy is optimal. Next, we focus on other—and thus sub-optimal—particular
sequencing rules. Under these more restrictive assumptions (the choice of the
sequencing rule is imposed), we show that unordered codes of minimum redun-
dancy are not necessarily optimal. Finally, Sec. 5.5 studies the self-synchronising
properties of low-overhead encoding schemes that do not detect all possible tim-
ing errors. In particular, we derive the self-synchronising properties of linear
codes and alternating-phase encoding with linear codes. We give tight bounds
on the residual error rate of the CRC-8 alternating-phase encoding introduced
in Sec. 4.2.
5.1 Related Work
Self-synchronising (or delay-insensitive) codes enjoy the property of detecting all
timing errors. As a result, they are potential solutions of the considered encoding
problem. In practice, self-synchronising codes whose membership test can be
implemented with glitch-free logic are used in asynchronous circuits. Dual-rail,
LEDR [Dean et al., 1991], and 1-of-N codes are the best-known examples.
Varshavsky [1990] gives an exhaustive study of codes used in asynchronous circuits.
A code is said to be systematic (or separable) if its codewords are formed by
concatenating redundant bits to bits carrying information. The Sperner code is
a non-systematic unordered code that minimises redundancy [Freiman, 1962].
It consists of all vectors with 1s in half (rounded to an integer if needed) of
the bit positions. On the contrary, the Berger code is a minimum redundancy,
systematic, unordered code [Berger, 1961]. The redundant bits of the Berger
code are the binary encoding of the number of 0s in the information bits. The
Sperner and Berger codes are not used in asynchronous circuits, because their
membership test cannot be implemented with glitch-free logic. However, they
can be used in the considered encoding problem that has no requirement about
how to implement code membership test.
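The two constructions described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the Sperner code uses the floor of N/2, and the Berger check field is assumed to be the number of 0s written most-significant bit first on the smallest number of bits that can hold any count from 0 to K.

```python
from itertools import product


def sperner_code(n):
    """All n-bit vectors with 1s in floor(n / 2) of the bit positions."""
    return [w for w in product((0, 1), repeat=n) if sum(w) == n // 2]


def berger_encode(info):
    """Append the binary count of 0s in the information bits (MSB first)."""
    zeros = len(info) - sum(info)
    width = len(info).bit_length()  # enough bits to count 0..K zeros
    check = tuple((zeros >> i) & 1 for i in reversed(range(width)))
    return tuple(info) + check


def is_unordered(code):
    """True if no codeword is covered bitwise by another codeword."""
    code = list(code)
    return not any(
        p != q and all(a <= b for a, b in zip(p, q))
        for p in code for q in code
    )


berger_3 = [berger_encode(u) for u in product((0, 1), repeat=3)]
print(len(sperner_code(4)), is_unordered(sperner_code(4)))
print(berger_encode((1, 0, 1)), is_unordered(berger_3))
```

The Berger code is systematic (the information bits appear unchanged at the start of each codeword), whereas a Sperner codeword has to be mapped to and from the information by a separate table or enumeration.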
Errors are said to be unidirectional when, in any particular received word,
either some 1s have been transformed into 0s, or vice versa, but both cases never
occur together in the same word. For example, errors occurring in a logic
network consisting only of AND and OR gates are unidirectional, since each
particular gate preserves the unidirectional property of errors. In contrast,
XOR gates and inverters do not preserve unidirectional errors. Unordered codes
detect all unidirectional errors, since transforming a codeword of an unordered
code into another codeword requires changing both some 1s into 0s and some 0s
into 1s, which is not a unidirectional error. Researchers, especially Bose, have
also studied the so-called t-unidirectional error detecting codes, which detect
up to t unidirectional errors, and codes that detect all unidirectional errors
while correcting a few (typically, a single) errors [Bose and Rao, 1982; Bose
and Lin, 1985; Albassam et al., 1991]. In the considered encoding problem, codes
detecting all unidirectional errors are preferred, as the bit error rate can be
as large as 100%.
5.2 Notations
We give the notations used in the rest of this chapter. Some special care is
needed to handle vectors that we address both in time and space.
A letter x (respectively y) denotes the data sent (respectively received). A
letter in subscript denotes the time index. For example, $x_k$ is the vector x
sent at time k ∈ N. A letter in superscript denotes the bit position index.
For example, $x^i$ is the ith bit of the vector x. The letter N denotes the bus
width or, equivalently, the symbol size, while the letter K denotes the number
of information bits per symbol. A code encoding K information bits into an
N-bit codeword is referred to as an (N, K) code. We use the term symbol to
denote any N-bit binary vector. We call a codeword a particular symbol that
carries information. The operator ⊕ denotes the bitwise exclusive or of two
binary values. Moreover, the operator ⊕ applied to a binary vector and a set of
binary vectors is to be understood as follows.
Notation 1. Let $u \in \{0,1\}^N$ be a binary vector. Let $V \subseteq \{0,1\}^N$ be a set of
binary vectors. We denote by $u \oplus V$ the set defined by
\[ u \oplus V = \{ w \in \{0,1\}^N \mid w = u \oplus v,\ v \in V \}. \]
The operator · denotes the bitwise and of two binary values. We denote
by w(u) the weight of a binary vector u. The weight of a binary vector is the
number of 1s it contains. The cardinality operator is denoted by | · |.
5.3 Self-Synchronisation and Sequencing Rules
We first explain our assumption as to how an encoding scheme determines
whether the data received at the channel output contains a new piece of infor-
mation. We call this feature readiness, as expressed in the following definition.
Definition 1. (data readiness). A binary data is ready if and only if all
transitions from the previous piece of data have completed.
We formulate the following definition of self-synchronisation.
Definition 2. (hard self-synchronisation). An encoding is hard self-syn-
chronising if and only if it detects any possible error over a timing error channel
TEC (εt) with 0 ≤ εt ≤ 1.
While equivalent to the definition of delay-insensitivity given by Var-
shavsky [1990], Def. 2 is more suited to a synchronous context. Indeed, the
discrete sampling instants are modelled in the timing error channel.
We contrast hard self-synchronisation with soft self-synchronisation, which
is the property of detecting not all but many timing errors. Informally, soft
self-synchronisation could be defined as the property of an encoding scheme that
includes some time redundancy but is not hard self-synchronising. The
fact that hard self-synchronising schemes detect all possible timing errors is a
particularity of the error model. In contrast, there exists no equivalent
concept for additive errors, since the possible outputs of a binary symmetric
channel span the whole space, which makes it impossible to detect all possible
errors. Fig. 5.2 emphasises the difference between a hard and a soft
self-synchronising encoding scheme.
We proceed by defining sequencing rules. We assume that time is discrete
and determined by the sampling instants. We would like to transfer an information
sequence that is encoded into a sequence of symbols $\{x_k\}_{k \in \mathbb{N}}$ over a timing
error channel. We denote by $x_k$ the $k$-th symbol emitted by the encoder, and by
$y_k$ the output of the timing error channel for the input $x_k$. According to Def. 1,
a data $y_k$ at the output of a timing error channel is ready if and only if $y_k = x_k$.
Let $D_k$ be the set of symbols $x_k$ that can be emitted at time k. We call the
sequence $\{D_k\}_{k \in \mathbb{N}}$ the sequence of decoding sets. Therefore, $D_k$ also constitutes
the set of expected symbols at the channel output. If $y_k$ is ready, then $y_k \in D_k$.
The converse is not necessarily true. However, as in any syndrome decoding
scheme, we assume that a membership test is used to detect whether a given
output is ready or not: the output $y_k$ is declared ready if and only if $y_k \in D_k$.
[Figure 5.2: timeline with sampling instants. Hard self-synchronisation: every
symbol that can be sampled between two codewords is a non-codeword. Soft
self-synchronisation: low probability of sampling a codeword between two codewords.]
Figure 5.2: Hard self-synchronisation, contrary to soft self-synchronisation,
ensures that any intermediate symbol that may be sampled while individual bit
transitions are taking place is not mistaken for a valid codeword.
In asynchronous systems, this membership test has to be implemented with
glitch-free logic.
So far, we have characterised an encoding scheme by its sequence of decoding
sets. We now define sequencing rules as encoding schemes whose sequence of
decoding sets obeys a particular rule. While we have explained that the decoding
method consists in applying a membership test between $y_k$ and $D_k$, we have
not said how each decoding set is obtained from one time index to the next. In
the most general case, a decoding set is a function of the whole transfer history,
$D_k = F(x_0, x_1, \ldots, x_{k-1}, k)$. For practical reasons, we restrict our considerations
to encoding schemes with “Markovian” history. More precisely, we focus on
schemes having the property that each decoding set is a function only of the
last emitted symbol and the time index. We call such schemes sequencing rules.
Definition 3. (sequencing rule). An encoding scheme is a sequencing rule if
and only if its sequence of decoding sets is such that $D_k = F(x_{k-1}, k)$, for all
k ≥ 1.
Fig. 5.3 depicts the encoder structure of a sequencing rule. We now define
a few sets useful to study sequencing rules.
Definition 4. (symbol set). The symbol set S of a sequencing rule is the
subset of $\{0,1\}^N$ containing all symbols that the encoder may output:
\[ S = \{ s \in \{0,1\}^N \mid \exists \text{ a sequence of symbols } x_0, x_1, \ldots, x_k \text{ with } x_k = s \}. \]
Without loss of generality, we assume that the symbol set is ordered, i.e., we
have, assuming an arbitrary ordering, $S = \{s^0, s^1, \ldots, s^{|S|-1}\}$.
[Figure 5.3: encoder block diagram; labels: information, L, counter, previous
symbol $x_{k-1}$, encoder, N-bit output $x_k$.]
Figure 5.3: Encoder structure of a sequencing rule. The next emitted symbol
depends on the previously emitted one, on the number of symbols emitted so far,
and on the input information.
[Figure 5.4: tree of symbol sequences over time indices 0, 1, and 2. The levels
show the possible decoding sets D0, D1, and D2; branches are labelled with
successor sets such as $\mathrm{Succ}(s^1, 0)$, $\mathrm{Succ}(s^2, 0)$, and $\mathrm{Succ}(s^0, 1)$, ..., $\mathrm{Succ}(s^3, 1)$.]
Figure 5.4: The different symbol sequences that can be generated up to time index
2, for a 2-bit sequencing rule with symbol set $S = \{s^0, s^1, s^2, s^3\}$.
We call the symbol index the index of a symbol in the symbol set. By
convention, we write the symbol index in superscript. It follows from Def. 3
that, for each time index and for each symbol, we can define a set of symbols
that may follow it and will constitute the next decoding set if the symbol is
emitted.
Definition 5. (the set Succ (s, k)). For a sequencing rule, we define
Succ (s, k) = {p ∈ S | ∃ a sequence of symbols such that xk+1 = p and xk = s} .
In a sequencing rule, Dk+1 = Succ (s, k) whenever xk = s. Fig. 5.4 illustrates
a few sequences of symbols that can be generated according to a particular
sequencing rule. We point out that Def. 3 does not describe explicitly how a
sequencing rule conveys information, i.e., how particular codes are used in a
sequencing rule. We address this question later in Prop. 1 and Prop. 2.
The examples given below attest that sequencing rules encompass a wide
family of encoding schemes.
[Figure 5.5: two encoders (code C0 and code C1) fed by the information input;
a selector with inputs labelled 0 and 1 drives the N-bit output $x_k$.]
Figure 5.5: Structure of an alternating-phase encoder. The flip-flop acts as a 1-bit
counter.
Example 5. (standard codes). Consider an error detecting code C (e.g.,
a CRC). The corresponding sequencing rule is easily obtained. We have S =
D0 = C and Succ (c, k) = C, for all c ∈ C and k ∈ N.
Example 6. (alternating-phase encoding). Alternating-phase encoding was
introduced in Sec. 4.2. The scheme transmits codewords of two
codes C0 and C1 alternately. The symbol set is S = C0 ∪ C1. Moreover, either D0 = C0,
or D0 = C1. Lastly, for all c ∈ C0 and k ∈ N, Succ (c, k) = C1 and for all
c ∈ C1 and k ∈ N, Succ (c, k) = C0. Spacer-based encoding (e.g., dual-rail) is a
particular alternating-phase encoding where one of the two codes contains only
a single symbol (i.e., a spacer symbol that carries no information). As shown
in Fig. 5.5, the structure of an alternating-phase encoder is a particular case of
the encoder structure depicted in Fig. 5.3.
Example 7. (differential encoding). Let C be a code. Let c (u) be the
codeword obtained by encoding the information u. Differential encoding is a
sequencing rule outputting xk+1 = xk ⊕ c (u) to transmit the information u in
the symbol xk+1, k > 0. Initially, x0 can be an arbitrary symbol. We have,
for all s ∈ S and k ∈ N, Succ (s, k) = s ⊕ C. D0 contains only one arbitrary
symbol. S = C if C is linear and D0 ⊂ C. Differential encoding is also known
as encoding by changes, or transition signalling. LEDR [Dean et al., 1991] can
be described with differential encoding—see later Fig. 5.8(b).
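Examples 5 to 7 can be sketched as successor functions Succ(s, k). The Python below is a minimal illustration with hypothetical names (`succ_standard`, `succ_alternating`, `succ_differential`, `decoding_sets`); symbols are tuples of bits, and the alternating-phase variant assumes that C0 and C1 are disjoint so that the phase can be read off the last symbol.

```python
def xor(u, v):
    """Bitwise exclusive or of two equal-length bit tuples."""
    return tuple(a ^ b for a, b in zip(u, v))


def succ_standard(code):
    """Example 5: the same code is expected at every time index."""
    code = set(code)
    return lambda s, k: code


def succ_alternating(c0, c1):
    """Example 6: a codeword of C0 is followed by C1, and vice versa."""
    c0, c1 = set(c0), set(c1)
    return lambda s, k: c1 if s in c0 else c0


def succ_differential(code):
    """Example 7: the next symbol is the last one XORed with a codeword."""
    code = set(code)
    return lambda s, k: {xor(s, c) for c in code}


def decoding_sets(d0, succ, symbols):
    """Return [D_0, D_1, ...] with D_{k+1} = Succ(x_k, k) for the emitted
    sequence x_0, x_1, ...; reject symbols outside the expected set."""
    sets = [set(d0)]
    for k, x in enumerate(symbols):
        if x not in sets[-1]:
            raise ValueError(f"symbol {x} not allowed at time {k}")
        sets.append(succ(x, k))
    return sets


# Dual-rail as an alternating-phase encoding with a single spacer.
SPACER, DATA = {(0, 0)}, {(0, 1), (1, 0)}
dual_rail = decoding_sets(SPACER, succ_alternating(SPACER, DATA),
                          [(0, 0), (0, 1), (0, 0), (1, 0)])

# 2-bit LEDR as differential encoding, starting from x_0 = (0, 0).
ledr = decoding_sets({(0, 0)}, succ_differential(DATA),
                     [(0, 0), (0, 1), (1, 1)])
print(dual_rail)
print(ledr)
```

In all three cases the decoder only needs the last accepted symbol and the time index to know which set to test against, which is exactly the restriction imposed by Def. 3.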
In Def. 6 and 7, we introduce two particular sequencing rules especially
relevant from an implementation point of view: time- and symbol-invariant
sequencing rules.
Definition 6. (symbol-invariant sequencing rule). A sequencing rule is
symbol-invariant if and only if the set Dk is entirely determined by the knowledge
of k, for all integers k ≥ 1.
[Figure 5.6: an encoder (code C) with inputs: information and the previous
symbol $x_{k-1}$; output: the N-bit symbol $x_k$.]
Figure 5.6: Structure of a differential encoder. The next emitted codeword depends
on the previously emitted one and on the input information.
Definition 7. (time-invariant sequencing rule). A sequencing rule is time-
invariant if and only if the set Dk is entirely determined by the knowledge of
xk−1, for all integers k ≥ 1.
For symbol-invariant sequencing rules, xk is entirely determined by k and
the information bits, whereas for time-invariant sequencing rules, xk is entirely
determined by xk−1 and the information bits. The structure of these sequencing
rules is illustrated in Fig. 5.7. Alternating-phase encoding (described in
Example 6) is an example of a symbol-invariant sequencing rule, since the next
decoding set is entirely determined by the number of symbols emitted so far.
The alternating-spacer protocol presented in [Sokolov et al., 2005] is also a
symbol-invariant sequencing rule. In contrast, differential encoding is a
time-invariant sequencing rule because the next decoding set depends only on
the last emitted symbol xk, but not on the value of k itself.
We characterise time- and symbol-invariant sequencing rules in more depth
by giving the relation between codes (i.e., mappings between information and
symbols) and the decoding sets of these sequencing rules.
It follows from Def. 7 that a time-invariant sequencing rule is entirely
specified by the finite collection of sets $\mathrm{Succ}(s^j)$ with $j = 0, \ldots, |S|-1$ (in this
notation, the index in superscript denotes the symbol index in the symbol set).
Of course, $S = \bigcup_{j=0}^{|S|-1} \mathrm{Succ}(s^j)$. We consider a time-invariant sequencing rule
described henceforth by the $|S|$ sets $\mathrm{Succ}(s^j)$, $j = 0, \ldots, |S|-1$. Prop. 1 shows that
such a sequencing rule can be specified by a relation between a set of distinct
codes and each of the sets $\mathrm{Succ}(s^j)$, $j = 0, \ldots, |S|-1$.
Property 1. (time-invariant sequencing rule). A sequencing rule is
time-invariant if and only if there exists a set of distinct codes $\{C_0, C_1, \ldots, C_{Q-1}\}$
and a mapping M from $\{0, \ldots, |S|-1\}$ to $\{0, \ldots, Q-1\}$ such that for all
$j = 0, \ldots, |S|-1$, $\mathrm{Succ}(s^j) = s^j \oplus C_{M(j)}$.
Proof. We start by assuming that there exists a set of distinct codes
$\{C_0, C_1, \ldots, C_{Q-1}\}$ and a mapping $M : \{0, \ldots, |S|-1\} \to \{0, \ldots, Q-1\}$ such
that $\mathrm{Succ}(s^j) = s^j \oplus C_{M(j)}$. This clearly means that the set $\mathrm{Succ}(s^j, k)$ is a
function only of the symbol index j. Therefore, the sequencing rule is time-invariant.
Now, we prove the reverse implication and consider a time-invariant sequencing
rule. We show how to build a set of distinct codes and a mapping M as
claimed. We proceed iteratively over the symbol set. Let $\mathcal{C} = \emptyset$ be an empty
set of codes. Then, we do the following iterations: for all $j = 0, \ldots, |S|-1$, we
define $C_j = s^j \oplus \mathrm{Succ}(s^j)$. If $C_j \notin \mathcal{C}$, then we add $C_j$ to $\mathcal{C}$ and set M(j) = j.
Otherwise, there exists an integer i < j such that $C_i = C_j$. In this case, we
set M(j) = i. After a finite number of iterations, we have constructed a set of
distinct codes, namely $\mathcal{C}$, and a mapping M as claimed.
[Figure 5.7: four encoder block diagrams, each with an N-bit output $x_k$.
(a) inputs: information, counter, and previous symbol $x_{k-1}$; (b) inputs:
information and $x_{k-1}$; (c) inputs: information and counter; (d) input:
information only.]
Figure 5.7: Encoder structure of a (a) general, (b) time-invariant, and (c)
symbol-invariant sequencing rule. The simplest encoder structure, such as that of
the standard codes defined in Example 5, is drawn in case (d) and has no
self-synchronising property because only spatial redundancy is added.
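The iterative construction used in the proof of Prop. 1 translates directly into code. The sketch below is illustrative only: `codes_and_mapping` is a hypothetical name, the symbol set is passed as an ordered list, and the successor sets as a dictionary. As in the proof, a newly found code keeps the index j of the symbol that produced it.

```python
def xor(u, v):
    """Bitwise exclusive or of two equal-length bit tuples."""
    return tuple(a ^ b for a, b in zip(u, v))


def codes_and_mapping(symbols, succ):
    """Build the distinct codes C_j = s^j XOR Succ(s^j) and the mapping M
    for a time-invariant sequencing rule.

    symbols: ordered list [s^0, ..., s^(|S|-1)]
    succ:    dict mapping each symbol to its set of successors
    """
    codes = {}    # index -> code, only for distinct codes
    mapping = {}  # symbol index j -> index of the code used after s^j
    for j, s in enumerate(symbols):
        c_j = frozenset(xor(s, p) for p in succ[s])
        for i, c_i in codes.items():
            if c_i == c_j:
                mapping[j] = i  # code already known: reuse its index
                break
        else:
            codes[j] = c_j      # new code: add it and map j to itself
            mapping[j] = j
    return codes, mapping


# 2-bit LEDR described by its successor sets (time-invariant view).
SYMBOLS = [(0, 0), (0, 1), (1, 0), (1, 1)]
LEDR_SUCC = {s: {xor(s, c) for c in [(0, 1), (1, 0)]} for s in SYMBOLS}
ledr_codes, ledr_map = codes_and_mapping(SYMBOLS, LEDR_SUCC)
print(ledr_codes, ledr_map)
```

For LEDR the construction recovers a single code, {(0, 1), (1, 0)}, shared by all four symbols, which is the differential description given in Example 8.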
The next example illustrates Prop. 1 by showing that differential encoding
is a time-invariant sequencing rule.
Example 8. (differential encoding). Let $C_0 \subset \{0,1\}^N$ be a code. The
symbol set is taken (a priori) as $S = \{0,1\}^N$. The mapping M is defined as
follows:
\[ M : \{0, 1, \ldots, 2^N - 1\} \to \{0\}, \qquad M(j) = 0 \ \text{for all } j = 0, 1, \ldots, 2^N - 1. \]
As a result, we have $\mathrm{Succ}(s) = s \oplus C_0$, for all s ∈ S, which indeed describes
differential encoding. We point out that LEDR is a particular case of differential
encoding, where the symbol set is given by $S = \{0,1\}^2$ and $C_0 = \{(0,1), (1,0)\}$.
We proceed by stating the equivalent of Prop. 1 for symbol-invariant se-
quencing rules.
Property 2. (symbol-invariant sequencing rule). A sequencing rule
is symbol-invariant if and only if there exists a set of distinct codes
$\{C_0, C_1, \ldots, C_{Q-1}\}$ and a mapping M from N to $\{0, \ldots, Q-1\}$ such that for all
k ∈ N, $\mathrm{Succ}(x_k) = C_{M(k)}$ and the initial decoding set $D_0 \in \{C_0, C_1, \ldots, C_{Q-1}\}$.
Proof. We start by assuming that there exists a set of distinct codes
$\{C_0, C_1, \ldots, C_{Q-1}\}$ and a mapping $M : \mathbb{N} \to \{0, \ldots, Q-1\}$ such that
$\mathrm{Succ}(x_k) = C_{M(k)}$ and $D_0 \in \{C_0, C_1, \ldots, C_{Q-1}\}$. Obviously, this means that
the set $\mathrm{Succ}(x_k, k)$ is a function only of the time index k. Therefore, the sequencing
rule is symbol-invariant. Now, we prove the reverse implication and consider a
symbol-invariant sequencing rule. We show how to build a set of codes and a
mapping M as claimed. We give an iterative construction. Let $\mathcal{C}$ be the set
of codes. We initialise $\mathcal{C}$ as follows: $\mathcal{C} = \{D_0\}$, i.e., $C_0 = D_0$. Then, we do the
following iterations: for all k ≥ 0, define
\[ C_{k+1} = \{ p \in \{0,1\}^N \mid p \in \mathrm{Succ}(q, k),\ q \in C_k \}. \]
If $C_{k+1} \notin \mathcal{C}$, then we add $C_{k+1}$ to $\mathcal{C}$ and set M(k+1) = k+1. Otherwise, there
exists an integer j ≤ k such that $C_j = C_{k+1}$. In this case, we set M(k+1) = j.
By iterating infinitely, we construct a set of codes, namely $\mathcal{C}$, and a mapping
M as claimed.
We illustrate Prop. 2 by applying it to alternating-phase encoding (which
we have defined in Example 6).
Example 9. (alternating-phase encoding). Consider a set of codes
$\{C_0, C_1\}$. Let $S = C_0 \cup C_1$ be the symbol set. Let M be a mapping defined
by
\[ M : \mathbb{N} \to \{0, 1\}, \qquad M(k) = k \bmod 2. \]
Then, we obtain
\[ \mathrm{Succ}(x_k) = \begin{cases} C_1 & \text{if } k \bmod 2 = 1, \\ C_0 & \text{if } k \bmod 2 = 0. \end{cases} \]
Dual-rail is a particular case where $C_0 = \{(0,0)\}$ and $C_1 = \{(0,1), (1,0)\}$.
LEDR, which we have already described in Example 8 by a time-invariant
sequencing rule, is another particular case where $C_0 = \{(0,0), (1,1)\}$ and
$C_1 = \{(0,1), (1,0)\}$.
Some particular encoding schemes can be described by a time-invariant or by
a symbol-invariant sequencing rule. For example, Fig. 5.8 depicts two different
encoder structures (one symbol-invariant, the other time-invariant) for LEDR.
We point out that there is no general method to express a symbol-invariant
sequencing rule as a time-invariant sequencing rule, and vice-versa.
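For 2-bit LEDR, the equivalence of the two descriptions can be checked by enumerating every symbol sequence each rule can emit. This is a small sketch under stated assumptions: the function names are hypothetical, and the phase convention (the first decoding set is C0 in both views) is an arbitrary choice made for the comparison.

```python
def xor(u, v):
    """Bitwise exclusive or of two equal-length bit tuples."""
    return tuple(a ^ b for a, b in zip(u, v))


C0 = {(0, 0), (1, 1)}
C1 = {(0, 1), (1, 0)}


def sequences_symbol_invariant(length):
    """D_k depends only on k: C0 at even k, C1 at odd k (assumed phase)."""
    seqs = [()]
    for k in range(length):
        d_k = C0 if k % 2 == 0 else C1
        seqs = [seq + (x,) for seq in seqs for x in d_k]
    return set(seqs)


def sequences_time_invariant(length):
    """D_0 = C0, then D_{k+1} = x_k XOR C1 depends only on the last symbol.
    Requires length >= 1."""
    seqs = [(x,) for x in C0]
    for _ in range(length - 1):
        seqs = [seq + (xor(seq[-1], c),) for seq in seqs for c in C1]
    return set(seqs)


same = sequences_symbol_invariant(4) == sequences_time_invariant(4)
print(same, len(sequences_time_invariant(4)))
```

The two generators agree here because XORing any symbol of C0 with C1 yields C1, and XORing any symbol of C1 with C1 yields C0. This is a property of these particular codes, not of sequencing rules in general.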
The fundamental difference between time- and symbol-invariant sequencing
rules stems from the convention (namely, the mapping M) they define. Symbol-
invariant sequencing rules define a convention based on the time index k. In-
deed, the encoder and decoder have to keep track of how many symbols have
been emitted. In practice, this would be done by the counter (Cntr) drawn in
Fig. 5.7(c). On the contrary, time-invariant sequencing rules define a convention
bearing on the symbol space: the encoder and decoder have to keep track of the
last emitted symbol, as shown in Fig. 5.7(b).
[Figure 5.8: two encoder schematics, (a) and (b), showing K information bits
$u_1, \ldots, u_K$ and the redundant bits $r_1, \ldots, r_K$; schematic (a) also uses a phase bit.]
Figure 5.8: Synchronous implementations of an (N = 2K, K) LEDR encoder: (a)
symbol-invariant and (b) time-invariant. Implementation (a) is symbol-invariant
since the decoding set can be determined knowing only the phase, which is a 1-bit
counter. Implementation (b) is time-invariant because the next decoding set is a
function only of the previously emitted symbol.
Finally, we define bandwidth efficiency, which is the metric we use in order
to quantify how efficiently a sequencing rule uses wiring resources to transmit
information. Let T ∈ N be the number of N-bit symbols needed to transfer
K information bits with a particular encoding scheme. A maximum of $2^{TN}$
different sequences of bit vectors can be generated. However, among these, only
$\prod_{i=0}^{T-1} |D_i|$ different sequences of symbols carry information. As a result, we
define bandwidth efficiency for the transfer of K information bits by
\[ \eta(K) = \frac{\sum_{i=0}^{T-1} \log_2(|D_i|)}{TN}. \tag{5.1} \]
By definition, 0 ≤ η(K) ≤ 1. For most encoding schemes, bandwidth efficiency
is not a function of the number K of information bits transferred, nor of the
number of needed symbols T. This means that the use of bandwidth is independent
of how much information is transferred. We will use the metric of Eq. (5.1) to
compare the bandwidth efficiency of different schemes.
As an illustration, we compute the bandwidth efficiency of differential encoding
with the Sperner code [Freiman, 1962] and of the alternating-phase encoding.
Example 10. (bandwidth efficiency of differential encoding with the
Sperner code). The Sperner code is a non-systematic unordered code
minimising redundancy. It consists of all N-bit vectors of weight $\lfloor N/2 \rfloor$ (the choice
between the ceiling or floor function is arbitrary). We denote the Sperner code
by $W_N = \{ q \in \{0,1\}^N \mid w(q) = \lfloor N/2 \rfloor \}$. Differential encoding with the Sperner
code is the sequencing rule defined by the relation $D_k = x_{k-1} \oplus W_N$. Let T be
the number of symbols emitted to convey a sequence of K information bits
(actually, $T = K / \log_2(|W_N|)$; however, the exact value of T is not needed). Here,
$|D_i| = |W_N|$, so that for all K ≥ 1
\[ \eta = \frac{T \log_2 |W_N|}{T N} = \frac{\log_2 |W_N|}{N}, \]
which is the rate of the Sperner code.
Example 11. (bandwidth efficiency of the alternating-phase encoding).
We consider the alternating-phase encoding (defined in Example 6) using
two codes $C_0$ and $C_1$. Let T be an even number of emitted symbols, i.e., T = 2·t,
with t ∈ N. Then, Eq. (5.1) reads
\[ \eta = \sum_{i=1}^{t} \frac{\log_2 |C_0| + \log_2 |C_1|}{2 \cdot t \cdot N} = \frac{\log_2 |C_0| + \log_2 |C_1|}{2 \cdot N}. \]
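Eq. (5.1) is easy to evaluate numerically. The following sketch (the function name `bandwidth_efficiency` and its argument layout are assumptions made for illustration) applies it to the 4-bit cases discussed in this chapter: differential encoding with the Sperner code, and the two alternating-phase schemes of Fig. 5.1.

```python
from math import comb, log2


def bandwidth_efficiency(decoding_set_sizes, n):
    """Eq. (5.1): sum of log2 |D_i| over the T emitted symbols, divided
    by T * N, where N is the symbol width in bits."""
    t = len(decoding_set_sizes)
    return sum(log2(size) for size in decoding_set_sizes) / (t * n)


N = 4
sperner_size = comb(N, N // 2)  # number of 4-bit words of weight 2

# Example 10: every decoding set has the size of the Sperner code.
eta_differential = bandwidth_efficiency([sperner_size] * 6, N)

# Example 11 applied to Fig. 5.1: sizes alternate between |C0| and |C1|.
eta_left = bandwidth_efficiency([2, sperner_size] * 3, N)
eta_right = bandwidth_efficiency([4, 4] * 3, N)

print(round(eta_differential, 3), round(eta_left, 3), round(eta_right, 3))
```

The values do not depend on the number of symbols T, as noted after Eq. (5.1). With these inputs, the right-hand scheme of Fig. 5.1 is more efficient than the left-hand one, and differential encoding with the Sperner code is more efficient than both.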
Having now defined hard self-synchronisation (Def. 2), sequencing rules
(Def. 3), and bandwidth efficiency (Eq. (5.1)), a natural question is to bound
the bandwidth efficiency of hard self-synchronising sequencing rules. This is the
topic of the next section.
5.4 Hard Self-Synchronising Sequencing Rules
First, Sec. 5.4.1 defines self-synchronising sets and uses this notion to charac-
terise hard self-synchronising sequencing rules. In Sec. 5.4.2, we derive useful
properties of self-synchronising sets, which we exploit afterwards in Sec. 5.4.3 in
order to upper bound bandwidth efficiency of hard self-synchronising sequencing
rules.
5.4.1 Self-Synchronising Sets
We first state a few definitions.
Definition 8. (ordering relation). Let u and v be two N-bit vectors. We
say that u ≤ v when the ordering relation holds component-wise, i.e.,
$u \le v \Leftrightarrow u^i \le v^i$ for all i = 1, ..., N.
We point out that the ordering relation of Def. 8 is not preserved by
translation: in general, $x \le y \not\Rightarrow x \oplus z \le y \oplus z$ for $x, y, z \in \{0,1\}^N$.
For example, with N = 1, x = 0, y = 1, and z = 1, we have x ≤ y but
x ⊕ z = 1 and y ⊕ z = 0.
Unordered codes are easily defined using the latter ordering relation.
Definition 9. (unordered codes). A code C is unordered if and only if for
all u ∈ C, there exists no v ∈ C \ {u} such that v ≤ u.
We introduce now self-synchronising sets.
Definition 10. (self-synchronising set for a symbol). Let $s \in \{0,1\}^N$ be
a binary vector. Let $S \subseteq \{0,1\}^N$ be a set of binary vectors. We define S as a
self-synchronising set for s if and only if the two following conditions are met:
(i) $s \notin S$ and
(ii) for all $p \in S$, there exists no $q \in S \setminus \{p\}$ such that $s \oplus q \le s \oplus p$.
We use the notation s ≺ S (s can precede S) to denote that S is a self-
synchronising set for s. In addition, for two sets of vectors P, Q ⊆ {0, 1}N ,
P ≺ Q means that p ≺ Q for all p ∈ P . Def. 10 essentially states that if a
self-synchronising set S for a symbol s contains a symbol p, then all symbols q
such that s⊕ q ≤ s⊕ p cannot belong to S. We emphasise later the difference
between self-synchronising sets and unordered codes in Sec. 5.4.2.
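Def. 9 and Def. 10 can be written as two short predicates, which makes the difference between them easy to see on small cases. This is a minimal sketch with hypothetical names (`is_unordered`, `is_self_sync_set`); vectors are tuples of bits.

```python
def xor(u, v):
    """Bitwise exclusive or of two equal-length bit tuples."""
    return tuple(a ^ b for a, b in zip(u, v))


def leq(u, v):
    """Component-wise ordering relation of Def. 8."""
    return all(a <= b for a, b in zip(u, v))


def is_unordered(code):
    """Def. 9: no codeword is below another codeword."""
    return not any(u != v and leq(v, u) for u in code for v in code)


def is_self_sync_set(s, symbols):
    """Def. 10: s is not in the set, and no two distinct members are
    ordered once both are translated by s."""
    symbols = set(symbols)
    if s in symbols:
        return False
    return not any(
        p != q and leq(xor(s, q), xor(s, p))
        for p in symbols for q in symbols
    )


SPACERS = {(0, 0), (1, 1)}
print(is_unordered(SPACERS))                 # (0, 0) is below (1, 1)
print(is_self_sync_set((0, 1), SPACERS))     # translated: (0, 1), (1, 0)
print(is_self_sync_set((0, 0), SPACERS))     # fails condition (i)
```

The set {(0, 0), (1, 1)} is not an unordered code, yet it is a self-synchronising set for the symbol (0, 1): what matters in Def. 10 is the ordering of the transitions away from s, not of the symbols themselves.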
We state here the condition under which a sequencing rule is hard self-
synchronising.
Property 3. (hard self-synchronising sequencing rule). A sequencing
rule is hard self-synchronising if and only if xk−1 ≺ Dk for all k ≥ 1.
Proof. We need to show the two following assertions: (i) if an undetected error occurs at time k, then x_{k−1} ⊀ D_k, and (ii) if x_{k−1} ⊀ D_k, an undetected error can occur at time k. Let t_k = x_k ⊕ x_{k−1} be the transition vector. Eq. (3.9) gives the input-output relation of a timing error channel. We rewrite it as
\begin{align*}
y_k &= x_k \oplus e_{k,t} \cdot t_k \\
    &= (x_k \oplus t_k) \oplus (t_k \oplus e_{k,t} \cdot t_k) \\
    &= x_{k-1} \oplus \left( e_{k,t} \oplus \vec{1} \right) \cdot t_k \\
    &= x_{k-1} \oplus \bar{e}_{k,t} \cdot t_k, \tag{5.2}
\end{align*}
where ē_{k,t} is the bitwise logical negation of the vector e_{k,t}. We show (i) first. If an undetected error occurs at time k, then y_k ∈ D_k \ {x_k}. Since it always holds that ē_{k,t} · t_k ≤ t_k, using Eq. (5.2) on the left side and the definition of t_k on the right side, one obtains x_{k−1} ⊕ y_k ≤ x_{k−1} ⊕ x_k. This violates the second condition of Def. 10 and thus x_{k−1} ⊀ D_k. Next, we show (ii). If x_{k−1} ⊀ D_k, then, from Def. 10, either (a) x_{k−1} ∈ D_k or (b) there exist u, v ∈ D_k, u ≠ v, such that x_{k−1} ⊕ v ≤ x_{k−1} ⊕ u. In the former case, if e_{k,t} = \vec{1}, from Eq. (5.2), one has y_k = x_{k−1}. If x_k ≠ x_{k−1}, the error is then undetected because y_k ≠ x_k and y_k ∈ D_k. In the latter case, since x_{k−1} ⊕ v ≤ x_{k−1} ⊕ u, there exists a vector f ∈ {0, 1}^N such that x_{k−1} ⊕ v = f · (x_{k−1} ⊕ u). Using Eq. (5.2), if x_k = u and ē_{k,t} = f, then y_k = v, which causes an undetected error because y_k ∈ D_k.
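Prop. 3 can be cross-checked by brute force on small symbols. The sketch below assumes the channel relation quoted from Eq. (3.9), y_k = x_k ⊕ e_{k,t} · t_k, with vectors stored as integers; the function names are hypothetical. The degenerate decoding set D_k = {x_{k−1}} is skipped, because no transition, and hence no error, can occur in that case.

```python
from itertools import combinations

N = 3                      # symbol size used for the exhaustive check
ALL = range(1 << N)

def leq(u, v):
    return (u & ~v) == 0

def is_self_sync_set(s, S):
    T = {s ^ p for p in S}
    return s not in S and not any(leq(a, b) or leq(b, a) for a, b in combinations(T, 2))

def undetected_error_possible(prev, D):
    """True if some x_k in D and some error vector e yield y_k in D with y_k != x_k."""
    for x in D:
        t = x ^ prev                  # transition vector t_k
        for e in ALL:                 # e_i = 1: bit i keeps its previous value
            y = x ^ (e & t)           # timing error channel, as quoted from Eq. (3.9)
            if y != x and y in D:
                return True
    return False

def check_property_3():
    """Compare both sides of Prop. 3 for every previous symbol and decoding set."""
    for prev in ALL:
        for mask in range(1, 1 << (1 << N)):
            D = {v for v in ALL if (mask >> v) & 1}
            if D == {prev}:
                continue              # degenerate case: x_k = x_{k-1} always
            if undetected_error_possible(prev, D) == is_self_sync_set(prev, D):
                return False
    return True
```

Calling `check_property_3()` enumerates all 8 previous symbols and all non-empty decoding sets for N = 3, which remains cheap; larger N would need a smarter enumeration.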
Since Dk = Succ (s, k − 1) whenever xk−1 = s, it is equivalent to state that
a sequencing rule is self-synchronising if and only if for all s ∈ S and k ∈ N,
s ≺ Succ (s, k).
We illustrate Prop. 3 by applying it to differential encoding (defined in Ex-
ample 7).
Property 4. (hard self-synchronisation with differential encoding).
Differential encoding with a code C is a hard self-synchronising sequencing rule
if and only if C is an unordered code.
Proof. After emitting a symbol x_{k−1}, the next decoding set is D_k = x_{k−1} ⊕ C. By Prop. 3, hard self-synchronisation is obtained if and only if x_{k−1} ≺ x_{k−1} ⊕ C for all k ≥ 1. We show later in Prop. 6 that self-synchronising sets are preserved by translation, i.e., x_{k−1} ≺ x_{k−1} ⊕ C if and only if \vec{0} ≺ C. The latter condition expresses the unordering property of the code C.
By a direct application of Prop. 3 to alternating-phase encoding (defined in
Example 6), we obtain the following.
Property 5. (hard self-synchronisation with alternating-phase encod-
ing). Consider the alternating-phase encoding with two codes C0 and C1. The
encoding is hard self-synchronising if and only if (i) C0 ≺ C1 and (ii) C1 ≺ C0.
Proof. With the alternating-phase encoding, Dk = C0 whenever xk−1 ∈ C1 and
Dk = C1 whenever xk−1 ∈ C0. The result follows by application of Prop. 3.
In the particular case of alternating-phase encoding, self-synchronisation does not necessarily require C0 or C1 to be an unordered code. An unordered code is required only if one of these codes is a singleton consisting of the all-zero symbol. Indeed, if C0 = {\vec{0}}, then Prop. 5 imposes \vec{0} ≺ C1, i.e., C1 has to be unordered to ensure hard self-synchronisation. The other requirement, C1 ≺ C0, is satisfied as C0 is a singleton.
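The two conditions of Prop. 5 are easy to test mechanically. The following sketch uses hypothetical helper names and stores vectors as integers. The second example pair, {00, 11} and {01, 10}, is assumed here to correspond to the two phases of two-wire LEDR; it shows that C0 itself need not be unordered.

```python
from itertools import combinations

def leq(u, v):
    return (u & ~v) == 0

def is_unordered(code):
    return not any(leq(a, b) or leq(b, a) for a, b in combinations(set(code), 2))

def is_self_sync_set(s, S):
    return s not in S and is_unordered({s ^ p for p in S})

def precedes(P, Q):
    """P ≺ Q: Q is a self-synchronising set for every symbol of P."""
    return all(is_self_sync_set(p, Q) for p in P)

def alternating_phase_is_hard_self_sync(C0, C1):
    """Conditions (i) and (ii) of Prop. 5."""
    return precedes(C0, C1) and precedes(C1, C0)

# Spacer-based use: C0 is the all-zero spacer, so C1 must be unordered.
print(alternating_phase_is_hard_self_sync({0b00}, {0b01, 0b10}))
# Pair assumed to match two-wire LEDR: C0 = {00, 11} is ordered, yet Prop. 5 holds.
print(alternating_phase_is_hard_self_sync({0b00, 0b11}, {0b01, 0b10}))
```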
Lastly, we explain in the light of Prop. 3 why standard codes as defined in
Example 5 are not hard self-synchronising: the same symbol can be emitted
twice consecutively, which violates condition (i) of Def. 10.
5.4.2 Properties of Self-Synchronising Sets
Contrary to Def. 8, self-synchronising sets express a relation that is preserved
by translation, as the following property shows.
Property 6. (invariance by translation). Let s, p ∈ {0, 1}N be two arbitrary
symbols and S ⊆ {0, 1}N be a set of symbols. Then, p⊕ s ≺ p⊕ S if and only
if s ≺ S.
Proof. (⇐) We first show that if s ≺ S, then s ⊕ p ≺ p ⊕ S. One immediately sees that p ⊕ s ∉ p ⊕ S because s ∉ S. Hence, condition (i) in Def. 10 is verified. Let
u ∈ p⊕S. We need to show that any v ∈ p⊕S such that s⊕p⊕u ≤ s⊕p⊕ v is
identical to u to prove that condition (ii) also holds. Let x = p⊕u and y = p⊕v.
Clearly, x, y ∈ S. As s ≺ S, s ⊕ x ≤ s ⊕ y implies that x = y and thus that
u = v.
(⇒) Now, we show that if s ⊕ p ≺ p ⊕ S, then s ≺ S. Since p ⊕ s ≺ p ⊕ S, we have p ⊕ s ∉ p ⊕ S and thus s ∉ S. Hence, condition (i) of Def. 10 is verified.
To show that condition (ii) is also verified, let us pick any x ∈ S, and show that
any y ∈ S such that x⊕s ≤ y⊕s is necessarily identical to x. Let u = x⊕p and
v = y ⊕ p. Clearly, u, v ∈ p⊕ S and u⊕ p⊕ s ≤ v ⊕ p⊕ s. Since p⊕ s ≺ p⊕ S,
it follows that u = v, which concludes the proof.
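The invariance stated in Prop. 6, and its use in Prop. 4, can be checked exhaustively for a small N. This is only a sketch: the helper names, including `decoding_set` for D_k = x_{k−1} ⊕ C, are assumptions made for illustration.

```python
from itertools import combinations

def leq(u, v):
    return (u & ~v) == 0

def is_self_sync_set(s, S):
    T = {s ^ x for x in S}
    return s not in S and not any(leq(a, b) or leq(b, a) for a, b in combinations(T, 2))

def translation_invariant(n):
    """Check that p^s ≺ p^S holds exactly when s ≺ S, for all s, p and S on n bits."""
    size = 1 << n
    for mask in range(1 << size):
        S = {v for v in range(size) if (mask >> v) & 1}
        for s in range(size):
            expected = is_self_sync_set(s, S)
            for p in range(size):
                if is_self_sync_set(p ^ s, {p ^ x for x in S}) != expected:
                    return False
    return True

def decoding_set(prev, C):
    """Decoding set of differential encoding with a code C: D_k = x_{k-1} ^ C."""
    return {prev ^ c for c in C}

two_of_three = {0b011, 0b101, 0b110}    # an unordered code
print(all(is_self_sync_set(x, decoding_set(x, two_of_three)) for x in range(8)))
```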
We can now state the relation between unordered codes and self-synchronising sets. By definition, an unordered code C is a self-synchronising set for the all-zero symbol: \vec{0} ≺ C. By a direct application of Prop. 6, we obtain
that S is a self-synchronising set for the symbol s if and only if s ⊕ S is an
unordered code. Because of these relations, self-synchronising sets can be re-
garded as a generalisation of unordered codes: self-synchronising sets express an
unordering property with respect to a particular symbol. This additional infor-
mation is essential to determine the bandwidth efficiency of a symbol-invariant
sequencing rule (see later, Thrm. 2).
When considering self-synchronising sets which are not singletons, a ques-
tion arises naturally. Given an arbitrary symbol, what is the largest self-
synchronising set for this symbol? Prop. 6 enables a first observation: the
size of a self-synchronising set for a given symbol is independent of the choice
of the symbol. Indeed, if s ≺ S, then p ⊕ S is a self-synchronising set for the
symbol p⊕ s. As p can be chosen arbitrarily, p⊕ s can be any arbitrary symbol,
as well. Moreover, the set p ⊕ S has obviously the same cardinality as S. In
order to answer the question, we introduce preliminary definitions and results.
Then, Thrm. 1, 2, and 3 state key results.
We define first a maximum self-synchronising set.
Definition 11. (maximum self-synchronising set for a symbol). Let
s ∈ {0, 1}N . Let S ⊂ {0, 1}N be a self-synchronising set for s. S is a maximum
self-synchronising set for s if and only if there exists no other self-synchronising
set for s, T ⊂ {0, 1}N , such that |T | > |S|.
We also define equivalence classes in {0, 1}N . Let s ∈ {0, 1}N and consider
the equivalence relation between N -bit binary vectors: p ≡ q if and only if
w(s⊕ p) = w(s⊕ q) (the dependence on s is not written explicitly). From this
equivalence relation, we derive N + 1 equivalence classes.
Definition 12. (equivalence classes in {0, 1}^N). Let s ∈ {0, 1}^N. We define N + 1 equivalence classes in {0, 1}^N: V_N^i(s) = { p ∈ {0, 1}^N | w(s ⊕ p) = i }, with 0 ≤ i ≤ N.
Def. 12 implies that V_N^i(s) contains exactly all vectors that are at Hamming distance i from s. Lastly, we define a graph structure on {0, 1}^N.
Definition 13. (graph on {0, 1}^N). Let s ∈ {0, 1}^N. We define a graph structure, G(s), on {0, 1}^N. An arc goes from a node u ∈ {0, 1}^N to a node v ∈ {0, 1}^N if and only if u ≠ v and s ⊕ v ≤ s ⊕ u.
We use the notation u → v to mean that an arc of G (s) connects a node u
to a node v. As illustration, Fig. 5.9 depicts the graph G (010). We give basic
properties of the graph structure defined in Def. 13.
Lemma 1. A node belonging to equivalence class V_N^i(s) has arcs
(i) only towards nodes in equivalence classes V_N^j(s), with j < i,
(ii) towards i nodes in equivalence class V_N^{i−1}(s), and
(iii) from N − i nodes in equivalence class V_N^{i+1}(s).
Proof. (i) An arc connects node u to node v if and only if s ⊕ v < s ⊕ u, which implies w(s ⊕ v) < w(s ⊕ u). The inequality is strict since, by definition of an arc, u ≠ v.
(ii) and (iii) Let u ∈ V_N^i(s). By definition, w(u ⊕ s) = i. As a result, there exist exactly i possibilities to transform the vector u ⊕ s into a vector u′ ⊕ s such that w(u′ ⊕ s) = i − 1 and u → u′. Conversely, there are exactly N − i possibilities to transform the vector u ⊕ s into a vector u′′ ⊕ s such that w(u′′ ⊕ s) = i + 1 and u′′ → u.
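The three claims of Lemma 1 can be verified by enumerating the arcs of G(s) for a small N. The sketch below is illustrative only; `arcs` and `check_lemma_1` are hypothetical names, and vectors are stored as integers.

```python
def leq(u, v):
    return (u & ~v) == 0

def weight(v):
    return bin(v).count("1")

def arcs(s, n):
    """Arcs u -> v of G(s) from Def. 13: u != v and s^v <= s^u."""
    size = 1 << n
    return [(u, v) for u in range(size) for v in range(size)
            if u != v and leq(s ^ v, s ^ u)]

def check_lemma_1(s, n):
    A = arcs(s, n)
    for u in range(1 << n):
        i = weight(s ^ u)                       # u belongs to class V_N^i(s)
        out_levels = [weight(s ^ v) for (a, v) in A if a == u]
        if any(j >= i for j in out_levels):     # claim (i)
            return False
        if out_levels.count(i - 1) != i:        # claim (ii)
            return False
        incoming = sum(1 for (a, v) in A if v == u and weight(s ^ a) == i + 1)
        if incoming != n - i:                   # claim (iii)
            return False
    return True

print(check_lemma_1(0b010, 3))
```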
For a set of binary vectors Q, define i_max(Q) = max_{0≤j≤N} { j | Q ∩ V_N^j(s) ≠ ∅ }. In other words, i_max(Q) is the largest index among all equivalence classes that contain at least one element of Q.
We show the following claim.
Figure 5.9: The graph G(010). The equivalence classes are drawn with a dotted line: V_3^0(010) = {010}, V_3^1(010) = {110, 000, 011}, V_3^2(010) = {100, 111, 001}, and V_3^3(010) = {101}. The claims of Lemma 1 can be verified on this example.
Lemma 2. Let S be a self-synchronising set for a vector s. Assume that i_max(S) > ⌊N/2⌋. Then, there exists a self-synchronising set for s, S′, such that (i) i_max(S′) = i_max(S) − 1, and (ii) |S′| = |S|.
Proof. Let R = S ∩ V_N^{i_max(S)}(s). Knowing R, we define the following set of vectors:
A = { a ∈ V_N^{i_max(S)−1}(s) | ∃ r ∈ R such that r → a }.
By definition of A and because S is a self-synchronising set for s, S ∩ A = ∅.
We observe that (S\R)∪A is a self-synchronising set for s. We prove this claim
by contradiction, and assume that (S\R) ∪A is not a self-synchronising set for
s. This implies the existence of two distinct symbols a, p ∈ (S\R)∪A such that
a → p. Because S\R is a self-synchronising set for s, we have necessarily a ∈ A.
We have therefore a situation where r → a → p with r ∈ R. This implies in
turn r → p, which is impossible since S is a self-synchronising set.
We obtain S′ from S by performing |R| iterations defined as follows. Assume that R is arbitrarily ordered: R = {r_0, ..., r_{|R|−1}}. Let S_0 = S. Define, for i = 0, ..., |R| − 1, S_{i+1} = (S_i \ {r_i}) ∪ {a_i}, where a_i ∈ A is such that r_i → a_i. We
have to show that it is effectively possible to perform |R| iterations, i.e., that
there exist at least |R| vectors ai ∈ A such that ri → ai for i = 0, ..., |R| − 1.
This is feasible if and only if |A| ≥ |R|. We show that the latter inequality
holds by reasoning on the number of arcs that connect vectors of R to vectors
of A. According to (ii) of Lemma 1, this number of arcs is exactly |R| · imax (S).
However, each vector in A is pointed to by at most N − (i_max(S) − 1) vectors of R (due to (iii) of Lemma 1). Therefore, this very same number of arcs is upper bounded by |A| · (N − (i_max(S) − 1)), so that we have the relation |R| · i_max(S) ≤ |A| · (N − (i_max(S) − 1)). We write thus
\[ \frac{|R|}{|A|} \le \frac{N - (i_{\max}(S) - 1)}{i_{\max}(S)} \le 1, \]
where the last inequality holds if i_max(S) ≥ (N + 1)/2, which is true by assumption since i_max(S) > ⌊N/2⌋ implies i_max(S) ≥ ⌊N/2⌋ + 1 ≥ (N + 1)/2.
Finally, we show that S′ = S_{|R|} fulfils items (i) and (ii) of the claim. By construction, item (ii) is verified as |S| = |S_i| for i = 1, ..., |R|. Moreover, the construction ensures that i_max(S_{|R|}) = i_max(S) − 1. Lastly, since S_{|R|} ⊂ (S\R) ∪ A and (S\R) ∪ A is a self-synchronising set for s, S_{|R|} is also a self-synchronising set for s. As a result, S_{|R|} satisfies (i) and (ii).
We state a last claim, very similar to Lemma 2. For a set of binary vectors Q, define i_min(Q) = min_{0≤j≤N} { j | Q ∩ V_N^j(s) ≠ ∅ }.
Lemma 3. Let S be a self-synchronising set for a vector s, and assume that i_min(S) < ⌊N/2⌋. Then, there exists a self-synchronising set for s, S′, such that (i) i_min(S′) = i_min(S) + 1, and (ii) |S′| = |S|.
A proof very similar to the one of Lemma 2 is possible.
Proof. We define the set R = { r ∈ S | r ∈ V_N^{i_min(S)}(s) }. Knowing R, we define the following set of vectors:
A = { a ∈ V_N^{i_min(S)+1}(s) | ∃ r ∈ R such that a → r }.
We observe that (S\R)∪A is a self-synchronising set for s. We prove this claim
by contradiction, and assume that (S\R) ∪A is not a self-synchronising set for
s. This implies the existence of two distinct symbols a, p ∈ (S\R)∪A such that
p → a. Because S\R is a self-synchronising set for s, we have necessarily a ∈ A.
We have therefore a situation where p → a → r with r ∈ R. This implies in
turn p → r, which is impossible since S is a self-synchronising set.
We obtain S′ from S by performing |R| iterations defined as follows. Let
S0 = S and define, for i = 0, ..., |R| − 1, Si+1 = (Si\ {ri}) ∪ {ai}, where ai ∈ A
such that ai → ri. We have to show that it is effectively possible to perform
|R| iterations, i.e., that there exist at least |R| vectors ai ∈ A such that ai → ri
for i = 0, ..., |R| − 1. Again, this is feasible if and only if |A| ≥ |R|. According to (iii) of Lemma 1, the number of arcs joining vectors of A to vectors of R is exactly |R| · (N − i_min(S)). However, each vector in A is connected to at most i_min(S) + 1 vectors of R (due to (ii) of Lemma 1). Therefore, the number of arcs is upper bounded by |A| · (i_min(S) + 1), so that we have the relation |R| · (N − i_min(S)) ≤ |A| · (i_min(S) + 1). We write thus
\[ \frac{|R|}{|A|} \le \frac{i_{\min}(S) + 1}{N - i_{\min}(S)} \le 1, \]
where the last inequality holds if i_min(S) ≤ (N − 1)/2, which is true by assumption since i_min(S) < ⌊N/2⌋ implies i_min(S) ≤ ⌊N/2⌋ − 1 ≤ (N − 1)/2.
Finally, we verify easily that S′ = S_{|R|} fulfils items (i) and (ii) of the claim (same verifications as in Lemma 2).
We have now derived the properties required to state the following theorem
that constructs a maximum self-synchronising set.
Theorem 1. (maximum self-synchronising set for a symbol). Let s ∈ {0, 1}^N. Define W_N as
\[ W_N = \left\{ q \in \{0,1\}^N \mid w(q) = \lfloor N/2 \rfloor \right\}. \]
Then, s ⊕ W_N is a maximum self-synchronising set for s.
Proof. We have the set equality V_N^{⌊N/2⌋}(s) = s ⊕ W_N. Indeed, s ⊕ W_N contains all vectors that are at Hamming distance ⌊N/2⌋ from s, which is exactly the definition of V_N^{⌊N/2⌋}(s). Due to (i) of Lemma 1, we obtain that s ⊕ W_N is a self-synchronising set for s. Let S be another self-synchronising set for s. We have to show that |S| ≤ |s ⊕ W_N|. We know from Lemmas 2 and 3 that S can be used to generate another self-synchronising set for s, S′, such that (a) |S′| = |S|, and (b) i_max(S′) = i_min(S′) = ⌊N/2⌋. However, (b) implies that S′ ⊂ V_N^{⌊N/2⌋}(s). As a result, S′ ⊂ s ⊕ W_N, which in turn implies |S′| ≤ |s ⊕ W_N|. Finally, combining with (a), we get the desired result |S| ≤ |s ⊕ W_N|.
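For small N, the cardinality claimed by Thrm. 1 can be confirmed by exhaustive search over all subsets of {0, 1}^N. The sketch below is a brute-force check under the integer representation of vectors, not the constructive argument of the proof; all function names are hypothetical.

```python
from itertools import combinations
from math import comb

def leq(u, v):
    return (u & ~v) == 0

def is_self_sync_set(s, S):
    T = {s ^ p for p in S}
    return s not in S and not any(leq(a, b) or leq(b, a) for a, b in combinations(T, 2))

def middle_layer(s, n):
    """s ^ W_N: all vectors at Hamming distance floor(n/2) from s."""
    return {s ^ q for q in range(1 << n) if bin(q).count("1") == n // 2}

def max_self_sync_size(s, n):
    """Largest cardinality of a self-synchronising set for s, by exhaustive search."""
    size = 1 << n
    # conflict[p]: bitmask of the vectors q != p that are comparable with p relative to s
    conflict = [0] * size
    for p in range(size):
        for q in range(size):
            if p != q and (leq(s ^ p, s ^ q) or leq(s ^ q, s ^ p)):
                conflict[p] |= 1 << q
    best = 0
    for mask in range(1 << size):
        if (mask >> s) & 1:
            continue                              # condition (i) of Def. 10
        members = [p for p in range(size) if (mask >> p) & 1]
        if all((mask & conflict[p]) == 0 for p in members):
            best = max(best, len(members))
    return best

for n in (2, 3, 4):
    print(n, max_self_sync_size(0, n), comb(n, n // 2))
```

The search visits 2^(2^N) subsets, so it is only practical up to N = 4.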
As mentioned earlier, the cardinality of a maximum self-synchronising set
only depends on the size of the vector space, N , and not on the choice of a
particular symbol s.
Although we have proven Thrm. 1 by relying only on properties of self-
synchronising sets, we could have applied Prop. 6 to motivate the fact that find-
ing a maximum self-synchronising set for the all-zero symbol—i.e., a maximum
unordered code—is an equivalent problem. Since the Sperner code WN is an un-
ordered code minimising redundancy, it contains the maximum possible number
of codewords for a given symbol size. While the proof given by Freiman [1962] is
more complex than our approach, it provides an additional result: uniqueness.
That is, the Sperner code is the unique unordered code minimising redundancy.
As a corollary, the self-synchronising set defined in Thrm. 1 is also unique (be-
sides the arbitrary choice between the floor and ceiling functions).
We give another result about maximum self-synchronising sets.
Theorem 2. (maximum self-synchronising set for a set). Let s ∈ {0, 1}^N be a binary vector. Let S be a maximum self-synchronising set for s. Let Q = { q ∈ {0, 1}^N | S is a self-synchronising set for q }. Then, Q = {s, s̄}, where s̄ = s ⊕ \vec{1} is the bitwise complement of s.
Proof. The proof relies on the fact that a maximum self-synchronising set for
a vector has a known specific form—as mentioned previously. For brevity, we
assume N even. By uniqueness of a maximum self-synchronising set, it holds that for all q ∈ Q, S = s ⊕ W_N = q ⊕ W_N. Equivalently, we write
\[ s \oplus q \oplus W_N = W_N. \]
We show that the latter equation has no other solution than q = s and q = s̄. Let t ∈ {0, 1}^N be a symbol such that t ⊕ W_N = W_N. We show that the only solutions of the latter equation are t = \vec{0} or t = \vec{1}. We assume first that 0 < w(t) < N/2. Then, there exists a vector u ∈ W_N such that t ≤ u. As a result, w(t ⊕ u) < N/2, which implies t ⊕ u ∉ W_N. This shows that there exists no t ∈ {0, 1}^N such that t ⊕ W_N = W_N and 0 < w(t) < N/2. Now, assume that w(t) = N/2, i.e., t ∈ W_N. Let u = t̄ ∈ W_N. Since w(t ⊕ u) = N, t ⊕ u ∉ W_N. There exists thus no t ∈ {0, 1}^N such that t ⊕ W_N = W_N and 0 < w(t) ≤ N/2. Finally, let us assume that N/2 < w(t) < N. Then, there exists a vector u ∈ W_N such that u ≤ t. As a result, w(t ⊕ u) < N/2, which implies t ⊕ u ∉ W_N. In conclusion, we have shown that there exists no t ∈ {0, 1}^N such that t ⊕ W_N = W_N and 0 < w(t) < N. As a result, either t = \vec{0} or t = \vec{1}.
If N is odd, a similar discussion is possible.
The meaning of the latter theorem is that if S is a maximum self-synchronising set for a symbol s, then s ⊕ \vec{1} is the only other symbol for which S is also a maximum self-synchronising set.
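Thrm. 2 can be illustrated numerically by listing, for S = s ⊕ W_N, every symbol q such that S is a self-synchronising set for q. The sketch below assumes vectors stored as integers and uses hypothetical helper names; it starts at N = 2 because, with the floor convention, s ⊕ W_1 = {s} contains s itself.

```python
from itertools import combinations

def leq(u, v):
    return (u & ~v) == 0

def is_self_sync_set(s, S):
    T = {s ^ p for p in S}
    return s not in S and not any(leq(a, b) or leq(b, a) for a, b in combinations(T, 2))

def middle_layer(s, n):
    """s ^ W_N: all vectors at Hamming distance floor(n/2) from s."""
    return {s ^ q for q in range(1 << n) if bin(q).count("1") == n // 2}

def symbols_preceding(S, n):
    """The set Q of Thrm. 2: all q for which S is a self-synchronising set."""
    return {q for q in range(1 << n) if is_self_sync_set(q, S)}

n, s = 4, 0b0110
ones = (1 << n) - 1                       # the all-one vector
print(symbols_preceding(middle_layer(s, n), n) == {s, s ^ ones})
```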
We give a last result that characterises maximum systematic self-
synchronising sets. Systematicity of a set is defined as follows.
Definition 14. (systematic set of binary vectors). Let S ⊂ {0, 1}^N and let K, 0 < K ≤ N, be an integer. S is K-systematic if and only if for all u ∈ {0, 1}^K, there exists a unique s ∈ S such that s is the concatenation of u and v, for a certain vector v ∈ {0, 1}^{N−K}.
For example, a systematic (N, K) code constitutes a K-systematic set consisting of 2^K N-bit codewords, each one being obtained by concatenating redundant bits to information bits.
Theorem 3. (maximum systematic self-synchronising set). Let s ∈ {0, 1}^N. Let S ⊂ {0, 1}^N be a self-synchronising set for s. Then, if S is K-systematic, N ≥ K + log2(K + 1).
Proof. We assume that S is K-systematic and consider N as an unknown constant. We are looking for an integer R such that concatenating an R-bit vector to every vector in {0, 1}^K yields a self-synchronising set for s. K, R and N therefore satisfy K + R = N and we have to show that R ≥ log2(K + 1). We denote by r(x) ∈ {0, 1}^R the R-bit vector concatenated to the K-bit vector x ∈ {0, 1}^K.
According to Def. 12, we define K + 1 equivalence classes V_K^i(s_K) for i = 0, ..., K, where s_K is the K-bit vector formed by the K most significant bits of the N-bit vector s. We also define R + 1 equivalence classes V_R^i(s_R) for i = 0, ..., R, where s_R is the R-bit vector formed by the R least significant bits of the N-bit vector s. We denote by (u | v) the concatenation of vector v to vector u. Using the decomposition {0, 1}^R = ⋃_{i=0}^{R} V_R^i(s_R), there exist orderings of the vectors of {0, 1}^R, {0, 1}^R = {r_0, ..., r_{2^R−1}}, such that s_R ⊕ r_i ≰ s_R ⊕ r_j if i < j. For example, take r_0 ∈ V_R^R(s_R), r_i ∈ V_R^{R−1}(s_R), i = 1, ..., (R choose R − 1), etc. We claim that the following construction (which describes the Berger code [Berger, 1961]) forms a self-synchronising set for s: for i = 0, ..., K and for all u ∈ V_K^i(s_K), r(u) = r_i. To prove the claim, let us assume that there exist u, v ∈ {0, 1}^K, u ≠ v, such that s_K ⊕ u ≤ s_K ⊕ v. This implies w(s_K ⊕ u) < w(s_K ⊕ v). By construction, there exist two integers i and j such that r(v) = r_j, r(u) = r_i, and i < j. As i < j, we have s_R ⊕ r_i ≰ s_R ⊕ r_j, which ensures s ⊕ (u | r(u)) ≰ s ⊕ (v | r(v)). Therefore, the construction gives a self-synchronising set for s with 2^R − 1 = K, i.e., R = log2(K + 1).
It remains to show that R = log2(K + 1) is indeed the smallest integer needed to obtain a self-synchronising set for s. Take K + 1 distinct vectors u_i ∈ V_K^i(s_K), for i = 0, ..., K, such that u_0 ≤ u_1 ≤ ... ≤ u_K. For S to be a self-synchronising set for s, we must have r(u_i) ≠ r(u_j), for all i ≠ j, 0 ≤ i, j ≤ K. As a result, the mapping r(·) defines at least K + 1 different values, i.e., we have necessarily 2^R ≥ K + 1.
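As an illustration of the construction used in this proof, the sketch below builds a Berger code in its common textbook form, where the check symbol is the binary count of zeros among the K information bits. This form is assumed here for the case s = \vec{0}; the thesis only fixes the ordering of the check vectors. The helper names are hypothetical, and R is rounded up to an integer.

```python
from itertools import combinations

def leq(u, v):
    return (u & ~v) == 0

def is_unordered(code):
    return not any(leq(a, b) or leq(b, a) for a, b in combinations(set(code), 2))

def check_bits(k):
    """Smallest integer R with 2**R >= k + 1, i.e. ceil(log2(k + 1))."""
    return k.bit_length()

def berger_codeword(u, k):
    """Concatenate the k-bit vector u with the binary count of its zero bits."""
    r = check_bits(k)
    zeros = k - bin(u).count("1")
    return (u << r) | zeros

def berger_code(k):
    return {berger_codeword(u, k) for u in range(1 << k)}

def is_k_systematic(code, n, k):
    """Def. 14: every k-bit prefix occurs in exactly one codeword."""
    prefixes = sorted(c >> (n - k) for c in code)
    return prefixes == list(range(1 << k))

k = 3
n = k + check_bits(k)                     # N = K + R = 5
code = berger_code(k)
print(sorted(format(c, "05b") for c in code))
print(is_unordered(code), is_k_systematic(code, n, k))
```

By Prop. 6, translating such a code by an arbitrary symbol s gives a K-systematic self-synchronising set for s.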
The next section characterises optimum hard self-synchronising sequencing
rules. More specifically, we use the results derived in this section to find the
combinations of sequencing rules and codes that maximise bandwidth efficiency,
while ensuring the hard self-synchronising property.
5.4.3 Optimum Hard Self-Synchronising Encoding
We first give an achievable upper bound on the bandwidth efficiency of any hard
self-synchronising sequencing rule.
Theorem 4. (upper bound on the bandwidth efficiency of hard self-
synchronising sequencing rules). Differential encoding with the Sperner
code (defined in Example 10) has the largest bandwidth efficiency among all
hard self-synchronising sequencing rules.
Proof. Let us consider differential encoding with the Sperner code which is
defined by the symbol sequencing rule: Dk = xk−1⊕WN . According to Thrm. 1,
each decoding set Dk is a maximum self-synchronising set for the previously
emitted symbol xk−1. As a result, no sequencing rule uses bandwidth more
efficiently than differential encoding with the Sperner code.
Equivalently, the bandwidth efficiency of any hard self-synchronising sequencing rule cannot be larger than log2|W_N| / N, which is the bandwidth efficiency of differential encoding with the Sperner code (computed in Example 10).
Furthermore, we bound the bandwidth efficiency of hard self-synchronising
sequencing rules using systematic encoding.
Theorem 5. (upper bound on the bandwidth efficiency of systematic hard self-synchronising sequencing rules). Any hard self-synchronising sequencing rule using a systematic encoding to convey K information bits has a bandwidth efficiency less than or equal to K/N, where N ≥ K + log2(K + 1).
Proof. A systematic hard self-synchronising sequencing rule maximises bandwidth efficiency if and only if all its decoding sets are maximum systematic self-synchronising sets. According to Thrm. 3, the symbol size N of such a sequencing rule is lower bounded by K + log2(K + 1).
Differential encoding with the Berger code—which has been suggested by the
construction in the proof of Thrm. 3—achieves the upper bound of Thrm. 5.
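The two bounds of Thrm. 4 and Thrm. 5 are simple to evaluate numerically. The sketch below computes log2|W_N| / N and the ratio K/N obtained with N = K + ⌈log2(K + 1)⌉; the function names are hypothetical, and rounding R up to an integer is an assumption of this example.

```python
from math import comb, log2

def sperner_efficiency(n):
    """log2|W_N| / N with |W_N| = C(N, floor(N/2)) (bound of Thrm. 4)."""
    return log2(comb(n, n // 2)) / n

def berger_efficiency(k):
    """K / N with N = K + ceil(log2(K + 1)) (bound of Thrm. 5, integer R)."""
    n = k + k.bit_length()        # k.bit_length() equals ceil(log2(k + 1))
    return k / n

for k in (1, 3, 7, 15):
    n = k + k.bit_length()
    print(n, round(berger_efficiency(k), 3), round(sperner_efficiency(n), 3))
```

For equal symbol size N, the systematic bound never exceeds the general one, since a systematic unordered code with 2^K codewords cannot be larger than W_N.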
We state another result showing that symbol-invariant sequencing rules (defined in Def. 6) cannot achieve the upper bound on bandwidth efficiency of Thrm. 4, which is attained by a time-invariant sequencing rule.
Theorem 6. (sub-optimality of symbol-invariant sequencing rules). Any N-bit, N > 2, symbol-invariant hard self-synchronising sequencing rule has a bandwidth efficiency strictly less than log2|W_N| / N.
Proof. We have shown in Prop. 2 that any symbol-invariant sequencing rule is
such that for all k ∈ N, there exist 2 codes Ck and Ck+1 such that Dk = Ck and
Dk+1 = Ck+1. To achieve the upper bound on bandwidth efficiency, Ck+1 must
be a maximum self-synchronising set for any symbol belonging to Ck. Then, by
Thrm. 2, Ck cannot contain more than 2 symbols. Nonetheless, in order to reach
the upper bound of Thrm. 4, Ck must also be a maximum self-synchronising
set for any symbol belonging to D_{k−1}. Since |W_N| > 2 for all N > 2, both C_k and C_{k+1} cannot be simultaneously maximum self-synchronising sets, unless N = 2. This shows that log2|W_N| / N is not a tight upper bound on the bandwidth efficiency of hard self-synchronising symbol-invariant sequencing rules.
To summarise, Thrm. 4 shows that transmitting each symbol at Hamming distance ⌊N/2⌋ from the preceding one is an optimal encoding strategy as far as hard self-synchronisation is concerned. This result is based on Thrm. 1, which exhibits the differential structure of a maximum self-synchronising set. If a systematic encoding is required, differential encoding using the Berger code is optimal. Lastly, by exploiting Prop. 2 and Thrm. 2, we have shown that symbol-invariant sequencing rules are sub-optimal, unless N = 2. In the latter case, LEDR (that can be described by both a time- and symbol-invariant sequencing rule) is optimal.
In order to illustrate the results presented in this section, Fig. 5.10 plots the
ratio of the bandwidth efficiency of a few well-known codes to the bandwidth
efficiency of the Sperner code, assuming a sequencing rule either with a spacer, or
differential encoding. Dual-rail and LEDR always have a bandwidth efficiency
that is at least 55% of optimal. Yet, contrary to more bandwidth-efficient
encoding, they can be implemented with a glitch-free membership test.
A natural question triggered by Thrm. 6 is as follows. Since symbol-invariant
hard self-synchronising sequencing rules cannot achieve the upper bound on
bandwidth efficiency mentioned in Thrm. 4 for symbol sizes larger than 2, what
is a tight bound on the bandwidth efficiency of such sequencing rules? So far, we could not completely answer this question. Nevertheless, we state the problem and give a conjecture in the next section.
5.4.3.1 Optimum Symbol-Invariant Sequencing Rules
As previously discussed, alternating-phase encoding is a particular symbol-
invariant sequencing rule. In all generality, a symbol-invariant encoding al-
ternates transmission between more than two codes, while alternating-phase
encoding uses only two codes, C0 and C1, as presented in Example 6. In the
sequel, we focus the discussion on alternating-phase encoding. That is, referring
to Fig. 5.5 , we look for a pair of N -bit codes (C0, C1) maximising bandwidth
efficiency and ensuring hard self-synchronisation.
Figure 5.10: Relative bandwidth efficiency of various codes with respect to the optimal solution, i.e., the Sperner code with spacer-based or differential encoding. Horizontal axis: channel width (N) [bit]. Vertical axis: relative bandwidth efficiency [%]. Curves: 1-of-4; dual-rail (if spacer), LEDR (if differential); Berger; Sperner.
We know from Example 11 that, in order to maximise bandwidth efficiency of the alternating-phase encoding, the product |C0| · |C1| has to be maximised. Combining with Prop. 5, we obtain that a hard self-synchronising alternating-phase encoding maximises bandwidth efficiency if and only if the pair of codes (C0, C1) solves the following problem.
Problem 2. Let C0, C1 ⊆ {0, 1}N be two codes. Maximise |C0| · |C1| under the
constraint that (i) C0 ≺ C1 and (ii) C1 ≺ C0.
As a consequence of Thrm. 2, the codes C0 and C1 solving Prob. 2 are
not unordered codes of minimum redundancy, which we already verified by an
example in Fig. 5.1. Furthermore, we have verified by an exhaustive search that
LEDR solves Prob. 2 for even values of N not larger than 8 (solutions for odd
values are not exactly LEDR, but can be derived from it). We conjecture that
LEDR solves Prob. 2 for any even value of N . Since LEDR is a systematic
encoding adding one redundant bit to every information bit, its bandwidth
efficiency is 50%. According to this conjecture, the bandwidth efficiency of any alternating-phase encoding would be upper bounded by 50%.
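For very small N, Prob. 2 can be explored by brute force. The sketch below is only an illustration of such a search, not a reproduction of the exhaustive search mentioned above; the function names are hypothetical and vectors are stored as integers. It enumerates all pairs of subsets of {0, 1}^N and keeps a pair maximising |C0| · |C1| under C0 ≺ C1 and C1 ≺ C0.

```python
from itertools import combinations

def leq(u, v):
    return (u & ~v) == 0

def is_self_sync_set(s, S):
    T = [s ^ p for p in S]
    return s not in S and not any(leq(a, b) or leq(b, a) for a, b in combinations(T, 2))

def solve_problem_2(n):
    """Exhaustive search for (C0, C1) maximising |C0|*|C1| with C0 ≺ C1 and C1 ≺ C0."""
    size = 1 << n
    subsets = [[v for v in range(size) if (mask >> v) & 1] for mask in range(1 << size)]
    # ok[s][m]: subsets[m] is a self-synchronising set for the symbol s
    ok = [[is_self_sync_set(s, S) for S in subsets] for s in range(size)]
    best, best_pair = 0, None
    for m0, c0 in enumerate(subsets):
        for m1, c1 in enumerate(subsets):
            if len(c0) * len(c1) <= best:
                continue                  # cannot improve on the current best
            if all(ok[p][m1] for p in c0) and all(ok[q][m0] for q in c1):
                best, best_pair = len(c0) * len(c1), (set(c0), set(c1))
    return best, best_pair

print(solve_problem_2(2))
```

The number of pairs grows as 2^(2^(N+1)), so this direct enumeration is practical only for N ≤ 3.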
While we could not solve Prob. 2, the optimality of LEDR has been shown
for a related problem.
Problem 3. Let C0, C1 ⊆ {0, 1}^N be two codes. Maximise |C0| · |C1| under the constraint that for all s ∈ C0 and p ∈ C1, w(s ⊕ p) = ⌊N/2⌋.
The proof that LEDR solves Prob. 3 is complex and has been omitted from this thesis. Notice that if a pair of codes C0, C1 ⊆ {0, 1}^N satisfies w(s ⊕ p) = ⌊N/2⌋ for all s ∈ C0 and p ∈ C1, clearly C0 ≺ C1 and C1 ≺ C0, which according to Prop. 5 ensures hard self-synchronisation. However, proving the reverse implication, namely that if two codes satisfy C0 ≺ C1 and C1 ≺ C0 then w(s ⊕ p) = ⌊N/2⌋ for all s ∈ C0 and p ∈ C1, is not obvious.
Although we could not show the equivalence between Prob. 2 and Prob. 3, the requirement in the latter problem that any codeword in C0 differs in exactly ⌊N/2⌋ bit positions from any codeword in C1 is an intuitive extension of the optimality criterion exposed in Thrm. 1. In our opinion, this remark strengthens the suspicion that LEDR solves Prob. 2.
5.4.3.2 Summary on Hard Self-Synchronising Encoding
So far, hard self-synchronising codes have been exclusively used in asynchronous
circuits. Because this application requires the code membership test to be
implemented glitch-free, only a few codes meeting this requirement—such as
1-of-N encoding, dual-rail, and LEDR—have been considered. However, as
timing errors become increasingly widespread in synchronous circuits, self-
synchronisation can now be studied in a broader framework where a glitch-free
code membership test is not required.
While self-synchronisation has been mainly considered as a property of un-
ordered codes [Verhoeff, 1988], our approach is more general and defines self-
synchronisation as a property of the symbol sequencing rule (Prop. 3). We have
shown that, for particular sequencing rules, e.g., differential encoding (Prop. 4)
or spacer-based encoding (corollary of Prop. 5) using the all-zero symbol as a
spacer, self-synchronisation is indeed equivalent to the unordering property of
a code. On the contrary, for other sequencing rules such as alternating-phase
encoding, hard self-synchronisation is not equivalent to the unordering property
of a code (Prop. 5).
In Def. 10, we have introduced self-synchronising sets, which are a generalisa-
tion of unordered codes. Self-synchronising sets express an unordering property
with respect to a particular symbol. In Thrm. 1, we have given the form of a
maximum self-synchronising set for a particular symbol s: this set consists of all symbols that are at Hamming distance ⌊N/2⌋ (or ⌈N/2⌉) from s. This result
is fundamental since it shows that maximum self-synchronising sets (i) have a
differential form, and (ii) are defined using the Sperner code. Exploiting these
facts, we have deduced that differential encoding with unordered codes of min-
imum redundancy is the combination of sequencing rule and code maximising
bandwidth efficiency. This optimal sequencing rule is time-invariant and attains the general upper bound on bandwidth efficiency stated in Thrm. 4.
Next, we have discussed hard self-synchronisation for symbol-invariant
sequencing rules. The problem formulation leads to the notion of self-
synchronising set, not only for a symbol, but for a set of symbols—a concept
definitely more general than the unordering property. As a corollary of Thrm. 2,
we have shown in Thrm. 6 that symbol-invariant sequencing rules cannot achieve
the upper bound on bandwidth efficiency given in Thrm. 4. Furthermore, we
have pointed out that unordered codes of minimum redundancy do not max-
imise bandwidth efficiency under such sequencing rules. Finally, we have stated
Prob. 2, i.e., maximising bandwidth efficiency of alternating-phase encoding. We
have conjectured the optimality of LEDR under these assumptions and given a
related problem (Prob. 3) solved by this encoding scheme.
We proceed now by deriving the soft self-synchronisation properties of the
widespread linear codes and of alternating-phase encoding using linear codes.
5.5 Soft Self-Synchronisation With Linear Codes
Linear codes have been extensively studied and have many applications in error
detection and correction. Their capability of detecting additive errors is well-
known [MacWilliams and Sloane, 1977]. In addition, researchers have developed
efficient implementations. Because linear codes are so widespread, we compute
their capability of detecting timing errors. More precisely, we derive in Sec. 5.5.1
the undetected error probability of systematic linear codes over the timing error
channel. We call the undetected error probability of a code its residual error
rate. As expected, the error detection capability becomes very poor under a large timing error rate (because linear codes only add spatial redundancy). In
Sec. 5.5.2, we discuss a modification that improves the detection capability under
a large error rate.
5.5.1 Linear Codes Over the Timing Error Channel
The goal of this section is twofold. First, we believe that the timing error channel is as relevant a model of the errors occurring on an on-chip link as the binary symmetric channel. Therefore, we bound and approximate the residual error
rate of systematic linear codes over a timing error channel TEC(εt). Second,
the results presented in this section will be useful to derive those presented in
Sec. 5.5.2.
Deriving the residual error rate is not straightforward—the full derivation
is given in Appendix A. In what follows, we only sketch the main steps. We
consider a (N, K) code C (i.e., a linear code that encodes K-bit information
vectors into N-bit codewords) and make the following assumptions.
• C is systematic. That is, its codewords are obtained by concatenating
N − K redundant bits to the K information bits.
• C is linear. A code is linear if and only if it includes the all-zero codeword
and the sum of any two codewords is also a codeword.
• All codewords of C are equally likely to be transmitted.
Systematic codes are very convenient because they simplify the encoding pro-
cedure. Moreover, linear and systematic codes have the same error detection
capabilities as linear and non-systematic codes.
In addition, we make a last assumption which we need to upper bound the
residual error rate. The N − K parity checks of the code C can be expressed
compactly in a K × (N − K) matrix P, whose element p_{l,i} = 1 if and only if
the information bit x_l is involved in the i-th parity check constraint. That is,
any codeword x ∈ C satisfies N − K parity check relations
\[
\bigoplus_{l=1}^{K} x_l \cdot p_{l,i} = x_{K+i}, \qquad i = 1, \ldots, N-K. \tag{5.3}
\]
We assume that the code C satisfies Hypothesis 1.
Hypothesis 1. The K × (N − K) matrix P defined by the code C has full rank:
\[
\operatorname{rank}(P) = \min(K,\, N-K) =
\begin{cases}
K & \text{if } K \le N/2, \\
N-K & \text{otherwise.}
\end{cases}
\]
Although stronger than the previous ones, the latter assumption is often
verified by codes exhibiting strong error detection capabilities. For example, it
is verified by the CRC presented later in this section and by the well-known
Hamming code. Parity check codes as well as any linear code defined by two
parity checks also verify Hypothesis 1 because the associated matrix P fails to be
full rank only in degenerate cases that have no practical interest.
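Hypothesis 1 can be checked numerically for a given code. The following Python fragment is a minimal sketch, not the procedure used in the thesis: it builds the matrix P of a systematic CRC from its generator polynomial and computes its rank over GF(2). The helper names (`poly_mod`, `crc_parity_matrix`, `gf2_rank`) and the mapping of information bit l to the coefficient of x^(r+l) are assumptions made for illustration.

```python
def poly_mod(a: int, g: int) -> int:
    """Remainder of a(x) modulo g(x) over GF(2); bit i of an int is the coefficient of x^i."""
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a


def crc_parity_matrix(g: int, k: int) -> list[int]:
    """Rows of P for a systematic CRC: row l packs the parity bits involving information bit l."""
    r = g.bit_length() - 1  # number of parity bits, N - K
    # Assumed bit ordering: information bit l is the coefficient of x^(r + l).
    return [poly_mod(1 << (r + l), g) for l in range(k)]


def gf2_rank(rows: list[int]) -> int:
    """Rank over GF(2) of a matrix whose rows are packed into integers."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            lowest = pivot & -pivot  # lowest set bit of the pivot row
            # Cancel that bit in every remaining row that contains it.
            rows = [row ^ pivot if row & lowest else row for row in rows]
    return rank


G_CRC8 = 0x107  # x^8 + x^2 + x + 1
K, N = 32, 40
P = crc_parity_matrix(G_CRC8, K)
print(gf2_rank(P), min(K, N - K))
```

For the (N = 40, K = 32) CRC considered in this section, the two printed values coincide, which is what Hypothesis 1 requires.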
We proceed now with the derivation of the residual error rate, which we
denote by εres. We use the notations of Sec. 3.2. εres is the probability that the
received data y_k is a codeword different from the one sent, i.e., x_k. Equivalently,
$y_k \neq x_k$ if and only if $\tilde{e} \neq \vec{0}$. The undetected error probability εres is then
\[
\varepsilon_{\mathrm{res}} = \sum_{\tilde{e} \in C \setminus \{\vec{0}\}} P(\tilde{e}), \tag{5.4}
\]
with $P(\tilde{e})$ denoting the probability of occurrence of the timing error vector $\tilde{e}$.
Because C is a linear code, the transition vector is a codeword. We express now
εres by conditioning on the transition vector t:
\[
\varepsilon_{\mathrm{res}} = \sum_{\tilde{e} \in C \setminus \{\vec{0}\}} \; \sum_{t \in C} P(\tilde{e} \mid t)\, P(t). \tag{5.5}
\]
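In addition, $\tilde{e} \le t$, because $\tilde{e} = t \cdot e_t$, so that $P(\tilde{e} \mid t) \neq 0$ if and only if $\tilde{e} \le t$.
Therefore, we rewrite Eq. (5.5) as
\[
\varepsilon_{\mathrm{res}} = \sum_{\tilde{e} \in C \setminus \{\vec{0}\}} \; \sum_{\substack{t \in C \\ t \ge \tilde{e}}} P(\tilde{e} \mid t)\, P(t). \tag{5.6}
\]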
By the codeword equiprobability assumption, the transition vector t is evenly
distributed over the code C. Hence,
\[
P(t) =
\begin{cases}
\dfrac{1}{2^K} & \text{if } t \in C, \\
0 & \text{otherwise.}
\end{cases} \tag{5.7}
\]
Combining Eqs. (5.6) and (5.7) yields
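\[
\varepsilon_{\mathrm{res}} = \frac{1}{2^K} \sum_{\tilde{e} \in C \setminus \{\vec{0}\}} \; \sum_{\substack{t \in C \\ t \ge \tilde{e}}} P(\tilde{e} \mid t). \tag{5.8}
\]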
The main difficulty consists in evaluating the sum
\[
\sum_{\substack{t \in C \\ t \ge \tilde{e}}} P(\tilde{e} \mid t), \tag{5.9}
\]
for each non-zero timing error vector $\tilde{e}$. We derive in Appendix A a lower bound,
an approximation, and an upper bound on the sum of Eq. (5.9). We obtain the
lower bound
\[
\varepsilon_{\mathrm{res}} \ge \frac{1}{2^K} \sum_{i=1}^{K} A_i\, \varepsilon_t^{\,i} \left\{ 1 + (1-\varepsilon_t)^{N-K} \left[ (2-\varepsilon_t)^{K-i} - 1 \right] \right\} + \frac{1}{2^K} \sum_{i=K+1}^{N} A_i\, \varepsilon_t^{\,i}, \tag{5.10}
\]
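and the approximation
\[
\varepsilon_{\mathrm{res}} \cong \frac{1}{2^K} \sum_{i=1}^{K} A_i\, \varepsilon_t^{\,i} \left\{ 1 + (1-\varepsilon_t)^{\lfloor (N-K)/2 \rfloor} \left[ (2-\varepsilon_t)^{K-i} - 1 \right] \right\} + \frac{1}{2^K} \sum_{i=K+1}^{N} A_i\, \varepsilon_t^{\,i}, \tag{5.11}
\]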
where, in Eqs. (5.10) and (5.11), εt is the timing error rate and Ai is the number
of codewords of weight i (i.e., containing exactly i bits equal to 1). The value
of Ai can be readily obtained for every code [MacWilliams and Sloane, 1977].
If the code C satisfies Hypothesis 1, the following upper bound holds
\[
\varepsilon_{\mathrm{res}} \le \frac{1}{2^K} \sum_{i=1}^{N} A_i\, \varepsilon_t^{\,i}\, (2-\varepsilon_t)^{M(i)}, \tag{5.12}
\]
where M depends on i as follows
\[
M(i) =
\begin{cases}
K - i & \text{if } i \le d(C^{\perp}) - 1, \\
K - d(C^{\perp}) + 1 & \text{if } d(C^{\perp}) \le i \le N - K + d(C^{\perp}) - 1, \\
N - i & \text{if } i \ge N - K + d(C^{\perp}),
\end{cases} \tag{5.13}
\]
where C⊥ is the code orthogonal to C and d(C⊥) is the minimum distance of C⊥,
that is, the minimum number of 1s in any non-zero codeword of C⊥. By definition,
C⊥ contains all words whose inner product with every codeword of C
is null. Because C is linear, its orthogonal code C⊥ is easily obtained using the
matrix P characterising C.
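Eqs. (5.10) to (5.13) are straightforward to evaluate once the weight distribution is known. The sketch below transcribes them in Python; the function names are hypothetical, and the example code (the (7, 4) Hamming code, with weight distribution A_3 = A_4 = 7, A_7 = 1 and a dual of minimum distance 4) is chosen purely for illustration and is not one of the codes studied in this chapter.

```python
def m_exponent(i: int, n: int, k: int, d_dual: int) -> int:
    """Exponent M(i) of Eq. (5.13)."""
    if i <= d_dual - 1:
        return k - i
    if i <= n - k + d_dual - 1:
        return k - d_dual + 1
    return n - i


def _bound(weights: dict[int, int], n: int, k: int, eps_t: float, exponent: int) -> float:
    """Common form of Eqs. (5.10) and (5.11); only the exponent of (1 - eps_t) differs."""
    total = 0.0
    for i in range(1, n + 1):
        a_i = weights.get(i, 0)
        if i <= k:
            total += a_i * eps_t**i * (1 + (1 - eps_t)**exponent * ((2 - eps_t)**(k - i) - 1))
        else:
            total += a_i * eps_t**i
    return total / 2**k


def lower_bound(weights, n, k, eps_t):
    """Lower bound of Eq. (5.10)."""
    return _bound(weights, n, k, eps_t, n - k)


def approximation(weights, n, k, eps_t):
    """Approximation of Eq. (5.11)."""
    return _bound(weights, n, k, eps_t, (n - k) // 2)


def upper_bound(weights, n, k, d_dual, eps_t):
    """Upper bound of Eq. (5.12); assumes the code satisfies Hypothesis 1."""
    return sum(
        weights.get(i, 0) * eps_t**i * (2 - eps_t)**m_exponent(i, n, k, d_dual)
        for i in range(1, n + 1)
    ) / 2**k


# Illustrative example: (7, 4) Hamming code.
HAMMING_WEIGHTS = {0: 1, 3: 7, 4: 7, 7: 1}
N, K, D_DUAL = 7, 4, 4

for eps in (0.01, 0.1, 1.0):
    print(eps,
          lower_bound(HAMMING_WEIGHTS, N, K, eps),
          approximation(HAMMING_WEIGHTS, N, K, eps),
          upper_bound(HAMMING_WEIGHTS, N, K, D_DUAL, eps))
```

Term by term, the three expressions are ordered as lower bound, approximation, upper bound, and they coincide when εt = 1.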
In case εt = 1, Eqs. (5.10), (5.11), and (5.12) amount to (2^K − 1)/2^K, which is
nearly 1 when K is large. As expected in such a situation, the received data y_k
is exactly x_{k−1}. But, as x_{k−1} is itself a codeword, an undetected error occurs,
unless x_k = x_{k−1}.
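For a small code, the double sum of Eq. (5.8) can also be evaluated by brute force, which gives a direct check of the εt = 1 limit. The sketch below rests on an assumption that is not restated in this section: each transitioning bit is hit independently with probability εt, so that P(ẽ | t) = εt^w(ẽ) (1 − εt)^(w(t) − w(ẽ)) whenever ẽ ≤ t, with w(·) the Hamming weight. The generator rows describe a (7, 4) Hamming code used only as an illustration.

```python
from itertools import product


def codewords(generator_rows: list[int]) -> list[int]:
    """All codewords of the linear code spanned by the given rows (packed as integers)."""
    words = []
    for bits in product((0, 1), repeat=len(generator_rows)):
        word = 0
        for bit, row in zip(bits, generator_rows):
            if bit:
                word ^= row
        words.append(word)
    return words


def weight(v: int) -> int:
    """Hamming weight of a bit vector packed into an integer."""
    return bin(v).count("1")


def residual_error_rate(code: list[int], eps_t: float) -> float:
    """Brute-force evaluation of Eq. (5.8) under the assumed timing error model."""
    total = 0.0
    for t in code:                      # transition vector, uniform over the code
        for e in code:                  # candidate timing error vector
            if e != 0 and e & ~t == 0:  # non-zero codeword with e <= t bitwise
                total += eps_t**weight(e) * (1 - eps_t)**(weight(t) - weight(e))
    return total / len(code)


# Systematic generator rows of a (7, 4) Hamming code: 4 information bits, then 3 parity bits.
HAMMING_ROWS = [0b1000110, 0b0100101, 0b0010011, 0b0001111]
CODE = codewords(HAMMING_ROWS)

for eps in (0.01, 0.1, 0.5, 1.0):
    print(eps, residual_error_rate(CODE, eps))
```

With εt = 1 only the term ẽ = t survives in the inner sum, so the result is (2^K − 1)/2^K, here 15/16.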
We have drawn in Fig. 5.11 the bounds and approximation derived for the
residual error rate of a particular linear code as a function of the timing error
rate εt. The figure shows that, when the timing error rate is relatively small
(i.e., εt ≤ 0.1), the residual error rate is several orders of magnitude smaller
than the timing error rate. However, the residual error rate tends to 1 as the
timing error rate increases to 1. This feature is a major impediment for the use
of linear codes as an encoding scheme checking timing errors on a self-calibrating
link: no negative feedback would be received even when the link is completely
inoperative.
[Figure 5.11: log-log plot of the residual error rate (εres, from 10^−10 to 10^0) versus the timing error rate (εt, from 10^−3 to 10^0). Curves: upper bound of Eq. (5.12), lower bound of Eq. (5.10), approximation of Eq. (5.11), and simulation.]
Figure 5.11: Residual error rate as a function of the timing error rate εt for the
linear (N = 40, K = 32) CRC code generated by the polynomial x^8 + x^2 + x + 1.
5.5.2 Alternating-Phase Encoding With Linear Codes
In Fig. 4.5, we have shown that the LEDR encoding involves each information
bit with the phase bit in exactly one parity check relation. A straightforward
modification that reduces the wiring overhead consists in including
more than one information bit in a parity check relation. For example, let us
consider the transfer of 32 information bits. Instead of computing 32 redundant
bits with 32 independent encoders, as LEDR would do, one could very well
use, for instance, an 8-bit CRC to generate only 8 redundant bits. Each redundant
bit is computed by only one encoder as a sum of some information bits
and the phase bit. This example underlines that two parameters can be varied,
namely (i) the number of independent encoders, and (ii) the amount of redun-
dancy added per encoder. Since LEDR is a (N = 2, K = 1) code, the number of
independent encoders is in this case equal to the number of information bits to
transfer. Fig. 5.12 shows different encoding options, including LEDR. In what
follows, we focus on the most general case: the alternating-phase encoding with
only one independent encoder. We call the option with only one CRC encoder
CRC-based alternating-phase encoding.
[Figure 5.12: grid of encoder block diagrams arranged by redundancy (+25%, +50%, +100%) and by number of independent encoders (1, 2, 32). One encoder: CRC-8, CRC-16, or CRC-32 over 32 information bits. Two encoders: two CRC-4, two CRC-8, or two CRC-16 encoders over 16 information bits each. 32 encoders: 32 CRC-1 encoders over 1 information bit each at +100%; N.A. at +25% and +50%. Each diagram includes a clocked D flip-flop.]
Figure 5.12: Possible options of alternating-phase encoding. The bottom right
drawing depicts the LEDR code. All options with more than one independent
encoder are particular cases of the top row.

As shown in Sec. 5.5.1, linear codes detect timing errors very poorly under a
large timing error rate (εt ≥ 0.5), because, contrary to hard self-synchronising
codes, they only add spatial redundancy. We have already illustrated the fact
that detecting timing errors requires temporal redundancy, i.e., additional
information about the sequencing of data (e.g., refer to Fig. 4.4). Due to the
inclusion of the phase bit into several parity checks, CRC-based alternating-phase
encoding detects timing errors certainly better than bare CRCs. It remains to
quantify exactly the residual error rate of CRC-based alternating-phase encod-
ing. To do so, we make the same assumptions about C as in Sec. 5.5.1 (i.e.,
C is linear, systematic, all codewords are equally likely to be transmitted and
we assume that Hypothesis 1 is verified). We can thus apply the bounds and
the approximation of the sum in Eq. (5.9) derived in Appendix A to obtain the
following expressions (the proof is given in Appendix B):
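\[
\varepsilon_{\mathrm{res}} \cong \frac{1}{2^K} \sum_{i=1}^{K} A_i\, \varepsilon_t^{\,i} \left\{ (1-\varepsilon_t)^{d-1} + (1-\varepsilon_t)^{\lfloor (N-K)/2 \rfloor} \left[ (2-\varepsilon_t)^{K-i} - 1 \right] \right\}, \tag{5.14}
\]
and, if C satisfies Hypothesis 1 of Appendix A,
\[
\varepsilon_{\mathrm{res}} \le \frac{1}{2^K} \sum_{i=1}^{N} A_i\, \varepsilon_t^{\,i} \left\{ (1-\varepsilon_t)^{d-1} + \left[ (2-\varepsilon_t)^{M(i)} - 1 \right] \right\}, \tag{5.15}
\]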
where d is the minimum distance of the code, A_i is the number of weight-i
codewords in the (N + 1, K + 1) code C, and M(i) depends on the weight i as
defined in Eq. (5.13). We have plotted the upper bound and approximation of
Eqs. (5.15) and (5.14) in Fig. 5.13, for the CRC-8 alternating-phase encoding
generated by the polynomial x^8 + x^2 + x + 1, and validated the quality of the
approximation by comparing with a simulation. The residual word error rate
is actually maximum for a bit error rate equal to 0.5. In such a case, the data
received by the decoder is completely mixed between the previous word and the
correct one, which is the worst situation for the decoder. However, as the bit
error rate further increases, the residual word error rate eventually reaches 0;
this is verified from the graph and from the equations of both the upper bound and
the approximation. In this situation, the decoder detects the error deterministically
because the phase bit added before the parity checks causes a single bit error,
which is always detected. Such a feature is essential for a self-calibrating link
that may operate sporadically under such a large error rate.

[Figure 5.13: log-log plot of the residual error rate (εres, from 10^−10 to 10^0) versus the timing error rate (εt, from 10^−3 to 10^0). Curves: upper bound of Eq. (5.15), approximation of Eq. (5.14), simulation, and 40-bit word error rate.]
Figure 5.13: Residual error rate (εres) as a function of the timing error rate εt
for the (N = 40, K = 32) CRC-8 alternating-phase encoding generated by the
polynomial x^8 + x^2 + x + 1. The 40-bit word error rate curve has been plotted to
show that the residual error rate of the encoding is very low (less than 10^−10) as
long as the word error rate remains less than a few percent.
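The deterministic detection at εt = 1 can be illustrated with a few lines of Python. This is only a sketch under explicit assumptions that go beyond the text: the phase bit is placed above the K data bits before the CRC is computed, it toggles at every transfer, it is not transmitted, and the decoder regenerates the expected phase locally. The function names are hypothetical.

```python
G = 0x107  # CRC-8 generator polynomial x^8 + x^2 + x + 1
R = 8      # number of redundant bits, N - K
K = 32     # number of information bits


def poly_mod(a: int, g: int) -> int:
    """Remainder of a(x) modulo g(x) over GF(2); bit i of an int is the coefficient of x^i."""
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a


def parity(data: int, phase: int) -> int:
    """CRC over the K data bits extended by the phase bit (assumed layout: phase above the data)."""
    return poly_mod(((phase << K) | data) << R, G)


def encode(data: int, phase: int) -> int:
    """N = K + R bits put on the link; the phase bit itself is not transmitted (assumption)."""
    return (data << R) | parity(data, phase)


def check(word: int, expected_phase: int) -> bool:
    """True if the received word is consistent with the phase expected by the decoder."""
    data, crc = word >> R, word & ((1 << R) - 1)
    return parity(data, expected_phase) == crc


previous = encode(0xDEADBEEF, phase=0)
current = encode(0x12345678, phase=1)

print(check(current, expected_phase=1))   # word of the current cycle
print(check(previous, expected_phase=1))  # stale word sampled again (every bit late)
```

Because the CRC is linear, checking a stale word against the toggled phase amounts to checking a single-bit error, whose remainder modulo the generator polynomial is never zero; the second check therefore always fails, whatever the data.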
In this section, we have introduced an embodiment of soft self-synchronising
codes. By a significant generalisation of LEDR, we have derived a new encoding
family: CRC-based alternating-phase encoding. The new encoding has
a bandwidth efficiency significantly larger than LEDR (32/40 = 80% in the
case discussed previously vs. 50% for LEDR). The increase in bandwidth efficiency
comes at the expense of a loss in reliability: the resulting encoding is
no longer hard self-synchronising, as expected from the conjecture expressed in
Sec. 5.4.3.1. Nonetheless, we have bounded analytically and accurately approximated
the induced loss of reliability.

Encoding scheme               (N, K)                        Bandwidth efficiency
Berger code                   (N = K + log2(K + 1), K)      K / (K + log2(K + 1))
LEDR code                     (N = 2·K, K)                  K / (2·K) = 50%
CRC alternating-phase code    (N, K)                        K / N
Table 5.1: Bandwidth efficiency of the compared encoding schemes.
5.5.3 Case Study
We illustrate the discussion conducted in this chapter by comparing the following
systematic encoding options:
• differential encoding with the Berger code,
• the LEDR encoding, and
• the CRC-8 alternating-phase encoding,
under the comparison metrics: bandwidth efficiency, robustness to timing and
additive errors. We perform the comparison for a common and fixed wiring
resource (i.e., bus width N). More precisely, we consider a narrow bus N = 10
and a wider bus N = 38. The different metrics are computed as follows. Since
the three considered schemes are systematic, the computation of bandwidth
efficiency is straightforward. Indeed, the bandwidth efficiency of each encoding
scheme is K/N, as shown in Table 5.1.
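The entries of Table 5.1 can be reproduced for the two bus widths of the case study with a short Python sketch. The function names are hypothetical; for the Berger code, the check symbol is assumed to take ceil(log2(K + 1)) bits, which equals log2(K + 1) for the values K = 7 and K = 15 used here.

```python
def berger_width(k: int) -> int:
    """Bus width N of a Berger code with K information bits.

    k.bit_length() equals ceil(log2(K + 1)) for K >= 1 (assumed check-symbol size).
    """
    return k + k.bit_length()


def efficiency(k: int, n: int) -> float:
    """Bandwidth efficiency K/N of a systematic (N, K) encoding."""
    return k / n


# Narrow bus, N = 10.
print("Berger (10, 7):", efficiency(7, berger_width(7)))
print("LEDR   (10, 5):", efficiency(5, 2 * 5))
print("CRC    (10, 7):", efficiency(7, 10))

# Wide bus, N = 38: two (19, 15) Berger encoders, LEDR with K = 19, CRC-8 with K = 30.
print("Berger 2 x (19, 15):", efficiency(2 * 15, 2 * berger_width(15)))
print("LEDR   (38, 19):", efficiency(19, 2 * 19))
print("CRC    (38, 30):", efficiency(30, 38))
```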
The Berger encoding and LEDR are hard self-synchronising. In contrast,
the CRC-8 alternating-phase encoding does not detect all timing errors: its residual
error rate is approximated by Eq. (5.14).
To assess the robustness towards additive errors, we compute the residual
error rate over a binary symmetric channel (BSC). The residual error rate of
the CRC-based alternating-phase encoding is easily obtained [MacWilliams and
Sloane, 1977]. Denoting by εa the additive bit error rate, we have
\[
\varepsilon^{\mathrm{CRC}}_{\mathrm{res},a} = \sum_{i=1}^{N} A_i\, \varepsilon_a^{\,i}\, (1-\varepsilon_a)^{N-i}, \tag{5.16}
\]
with A_i the number of weight-i codewords. We derive now the residual error
rate of the Berger encoding and LEDR over the BSC. To do so, we need a
preliminary result.
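For a short code, the weight distribution A_i needed by Eq. (5.16) can simply be enumerated. The Python sketch below does so for the plain (N = 10, K = 7) systematic CRC generated by x^3 + x^2 + 1 and then evaluates Eq. (5.16). It is an illustration under an assumption: the phase bit of the alternating-phase variant is ignored here, and the function names are hypothetical.

```python
import math


def poly_mod(a: int, g: int) -> int:
    """Remainder of a(x) modulo g(x) over GF(2); bit i of an int is the coefficient of x^i."""
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a


def weight_distribution(g: int, k: int) -> list[int]:
    """A[i] = number of weight-i codewords of the systematic CRC with K information bits."""
    r = g.bit_length() - 1
    counts = [0] * (k + r + 1)
    for data in range(1 << k):
        word = (data << r) | poly_mod(data << r, g)  # information bits followed by the CRC
        counts[bin(word).count("1")] += 1
    return counts


def residual_bsc(counts: list[int], eps_a: float) -> float:
    """Residual error rate over a BSC, Eq. (5.16)."""
    n = len(counts) - 1
    return sum(counts[i] * eps_a**i * (1 - eps_a)**(n - i) for i in range(1, n + 1))


A = weight_distribution(0b1101, 7)  # generator x^3 + x^2 + 1, N = 10
print(A)

# Slope of the curve on a log-log scale between two small bit error rates.
slope = math.log10(residual_bsc(A, 1e-3) / residual_bsc(A, 1e-4))
print(slope)
```

At small εa the sum is dominated by the lowest non-zero weight, so the log-log slope is close to the minimum distance of the code.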
Property 7. (residual error rate of several identical encoders). Con-
sider a (M · N, M ·K) encoding scheme consisting of M identical (N, K) en-
coders. Each individual (N, K) encoder transmits over a binary symmetric
channel (BSC) with bit error rate εa. Assume that the errors occurring on
a particular BSC are statistically independent from the ones occurring on the
other BSCs. Let εres be the residual error rate of each individual (N, K) encoder.
Let $\varepsilon^{\mathrm{tot}}_{\mathrm{res}}$ be the residual error rate of the global (M·N, M·K) encoding
scheme. Then, $\varepsilon^{\mathrm{tot}}_{\mathrm{res}}$ is given by
\[
\varepsilon^{\mathrm{tot}}_{\mathrm{res}} = \sum_{i=1}^{M} \binom{M}{i}\, \varepsilon_{\mathrm{res}}^{\,i}\, (1-\varepsilon_a)^{N(M-i)}. \tag{5.17}
\]
Proof. An undetected error on the (M·N, M·K) encoding scheme occurs if
and only if, for some i with 1 ≤ i ≤ M, exactly i among the M
independent encoders each have an undetected error and the M − i others
have no error at all. Summing the probabilities of these disjoint events over i
gives exactly Eq. (5.17).
Using Prop. 7, the residual error rate of LEDR is easily obtained, since a
(N = 2·K, K) LEDR encoder consists of K independent encoders, each one
encoding one information bit into a 2-bit codeword. Moreover, the only error a
2-bit LEDR encoder does not detect is one affecting both bits, because such
errors leave the phase of the corrupted data unchanged. These errors happen
with probability εa^2. Applying Prop. 7 with N = 2, M = K and εres = εa^2
yields directly
\[
\varepsilon^{\mathrm{LEDR}}_{\mathrm{res},a} = \sum_{i=1}^{K} \binom{K}{i}\, \varepsilon_a^{\,2i}\, (1-\varepsilon_a)^{2(K-i)}. \tag{5.18}
\]
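Eqs. (5.17) and (5.18) translate directly into code. The following Python sketch (function names are hypothetical) evaluates Prop. 7 and specialises it to LEDR; as a sanity check it compares the sum with the closed form obtained from the binomial theorem, (εa^2 + (1 − εa)^2)^K − (1 − εa)^(2K).

```python
from math import comb, isclose


def residual_identical_encoders(eps_res: float, eps_a: float, n: int, m: int) -> float:
    """Eq. (5.17): M identical (N, K) encoders over independent BSCs."""
    return sum(
        comb(m, i) * eps_res**i * (1 - eps_a)**(n * (m - i))
        for i in range(1, m + 1)
    )


def residual_ledr(eps_a: float, k: int) -> float:
    """Eq. (5.18): K independent 2-bit LEDR encoders, each missing only double-bit errors."""
    return residual_identical_encoders(eps_a**2, eps_a, n=2, m=k)


def residual_ledr_closed_form(eps_a: float, k: int) -> float:
    """Binomial-theorem form of Eq. (5.18), used here only as a cross-check."""
    a, b = eps_a**2, (1 - eps_a)**2
    return (a + b)**k - b**k


for eps in (1e-3, 1e-2, 1e-1):
    print(eps, residual_ledr(eps, 19), residual_ledr_closed_form(eps, 19))
```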
The residual error rate of the Berger code over the binary symmetric channel
is derived in Appendix C. In the wide bus scenario presented, the (N = 38, K =
30) Berger encoder consists of two (N = 19, K = 15) encoders. The resulting
residual error rate is obtained by using Prop. 7.
We have plotted in Figures 5.14 and 5.15 the residual word error rate of the
considered encoding schemes as a function of the additive bit error rate over a
narrow and wide bus. We point out that, in the narrow bus case, the 3-bit
CRC code presented is the one with the lowest residual error rate among the 8
possible other 3-bit CRC codes.
Finally, we have reported in Tables 5.2 and 5.3 the bandwidth efficiency, the
kind of self-synchronisation (i.e., hard or soft) and the robustness to additive
errors. Regarding the latter metric, we have computed the orders of magnitude
change in the residual error rate for a one order of magnitude change in the raw
bit error rate. We call this metric δ. Equivalently, δ is the slope of the curves
obtained in Figures 5.14 and 5.15, that is, δ = Δ(log(εres,a)) / Δ(log(εa)).

                        CRC APC    Berger    LEDR
bandwidth efficiency    70%        70%       50%
self-synchronisation    soft       hard      hard
δ                       2          2         2
Table 5.2: Comparison metrics for the narrow bus (N = 10).

In both scenarios,
differential encoding with the Berger code outperforms LEDR: the former has
a better bandwidth efficiency than the latter and the same robustness to both
timing and additive errors. While differential encoding with the Berger code
stands out as the best option in the narrow bus scenario, the CRC-8 alternating-
phase encoding may be preferred over a wide bus dominated by additive errors.
Indeed, this encoding shows a better robustness to additive errors than the
Berger code: its residual error rate decreases by approximately 3.5 orders of
magnitude as the bit error rate decays by one order of magnitude (vs. 2
for the Berger code). Moreover, the hard self-synchronisation property has to be
de-emphasised in the sense that no real-life link introduces only timing errors.
In fact, all hard self-synchronising encodings have a non-zero residual error rate
as soon as additive errors are considered.

                        CRC APC    Berger    LEDR
bandwidth efficiency    79%        79%       50%
self-synchronisation    soft       hard      hard
δ                       3.5        2         2
Table 5.3: Comparison metrics for the wide bus (N = 38).

[Figure 5.14: log-log plot for the narrow bus (N = 10) of the residual error rate (εres,a, from 10^−12 to 10^0) versus the additive bit error rate (εa, from 10^−6 to 10^0). Curves: Berger code (K = 7), CRC code (K = 7), LEDR code (K = 5).]
Figure 5.14: Residual error rate as a function of the additive bit error rate εa over
a 10-bit wide BSC for the (N = 10, K = 7) Berger code (top), the (N = 10, K =
5) LEDR code (middle), and the (N = 10, K = 7) alternating-phase encoding
generated by the polynomial x^3 + x^2 + 1 (bottom).

[Figure 5.15: log-log plot for the wide bus (N = 38) of the residual error rate (εres,a, from 10^−25 to 10^0) versus the additive bit error rate (εa, from 10^−6 to 10^0). Curves: Berger code (K = 30), CRC code (K = 30), LEDR code (K = 19).]
Figure 5.15: Residual error rate as a function of the additive bit error rate εa over
a 38-bit BSC for the 2 · (N = 19, K = 15) Berger code, the (N = 38, K = 19)
LEDR code, and the (N = 38, K = 30) alternating-phase encoding generated by
the polynomial x^8 + x^2 + x + 1.
5.6 Conclusions
Sec. 5.6.1 summarises the achievements of the chapter. Next, Sec. 5.6.2 discusses
the limitations of coding to further improve the robustness of alternating-phase
encoding, which links with the next chapter.
5.6.1 Summary of Achievements
Due to the scaling of CMOS technology and reduction in noise margins, we argue
that timing errors—and not only additive errors—are becoming a concern in
synchronous circuits. Motivated by this observation and the particular features
of the problem considered in this thesis, we have expressed and studied self-
synchronisation in the context of a synchronous link.
In a first part of this chapter, we have presented a comprehensive taxonomy
of hard self-synchronising encoding schemes. The originality of our approach re-
lies on the definition and study of symbol sequencing rules. We have shown that,
depending on the considered symbol sequencing rule, hard self-synchronising
encoding schemes maximising bandwidth efficiency do not necessarily use un-
ordered codes of minimum redundancy—although the latter have been known so
far as the optimal coding solution as far as self-synchronisation is concerned. In
particular, the theorems stated in Sec. 5.4.3 and the conjecture of Sec. 5.4.3.1
bring a fundamental insight into the self-synchronisation capabilities of syn-
chronous systems. We refer the reader to Sec. 5.4.3.2 for a detailed summary
about hard self-synchronisation.
In a second part, we have contrasted hard self-synchronisation with soft self-
synchronisation, i.e., encoding schemes able to detect many but not all timing
errors. We have presented a particular embodiment, CRC-based alternating-
phase encoding, that exhibits unique detection capabilities towards both timing
and additive errors. First, we have bounded and approximated the residual
error rate of systematic linear codes over a timing error channel. This result
complements the well-known formula for the residual error rate of linear codes
over the binary symmetric channel and, in our opinion, is equally important
for estimating the level of reliability offered by these codes. Moreover, the result
confirms the expectation that linear codes detect timing errors very poorly
under a large error rate. In order to improve the detection capability under a large
error rate, we have proposed a novel family of encoding schemes, CRC-based
alternating-phase encoding, which we derived from a significant generalisation of
LEDR. As CRC-based alternating-phase encoding reduces the wiring
overhead to a level that no longer ensures hard self-synchronisation, we
have developed tight bounds on the residual error rate of this encoding over a
timing error channel.
Lastly, we have applied some of the results derived in this chapter to determine
which of a few particular hard and soft self-synchronising schemes are
best suited to transmission over a link subject to both timing and additive
errors.
5.6.2 Fundamental Limits of Soft Self-Synchronising Encodings
As shown in Fig. 5.13, the CRC-based alternating-phase encoding does not
detect timing errors reliably under moderate bit error rates. Thus, the downside
is that the operating point controller is constrained to avoid the weak spot of
this checker.
In order to improve the robustness to timing errors, one may wonder whether
a hard self-synchronising encoding scheme exists with the same additive error
detection capabilities as alternating-phase encoding. Equivalently, can coding
improve the latter scheme such that it detects all timing errors and preserves
its detection capabilities towards additive errors? Referring to the conjecture
stated in Sec. 5.4.3.1, we know that, among the hard self-synchronising en-
codings alternating two dictionaries, LEDR is the one maximising bandwidth
efficiency. As a result, using coding to improve the robustness to timing errors
of alternating-phase encoding means modifying the encoding towards LEDR, as
suggested in Fig. 5.16. However, such modifications imply two negative aspects.
First, the bandwidth efficiency cannot exceed the one of LEDR, i.e., only 50%.
Second, LEDR has the particularity of including only a single information bit
in each parity check. Therefore, it has a poor robustness to additive errors.
In summary, coding can further increase the timing error detection capabilities
of alternating-phase encoding, but only at the expense of bandwidth efficiency
and robustness to additive errors. Hence, the next chapter proposes
double sampling, instead of coding, to enhance simultaneously the robustness
and bandwidth efficiency of the CRC alternating-phase checker.
[Figure 5.16: qualitative map with horizontal axis "detection of additive errors" and vertical axis "detection of timing errors" (both up to 100%). Labelled points: Berger, LEDR, Sperner, 1-Hot, CRC, BCH, Reed-Solomon, and CRC alternating-phase encoding. Labelled regions: "codes optimised for timing errors" and "codes optimised for additive errors". A thick arrow is labelled "bandwidth efficiency and robustness to additive errors DECREASE".]
Figure 5.16: The graph sketches how the novel CRC-based alternating-phase
encoding compares with existing encoding schemes targeting either timing or additive
errors. The results developed in the chapter show that improving the robustness
to timing errors of the CRC alternating-phase encoding entails both a reduction of
bandwidth efficiency and a lower robustness towards additive errors, as indicated by
the thick arrow.
Chapter 6
Double Sampling, Coding, or
Both?
On ne connaît que les choses que
l'on apprivoise.
(One only understands the things that one tames.)
Antoine de Saint-Exupéry.
Before answering the question posed in the title, we contrast qualitatively in Sec. 6.1
the detection capabilities of Razor flip-flops (introduced in
Sec. 2.2.1) and soft self-synchronising codes (abbreviated soft SSC and introduced
in Sec. 5.5). We emphasise the strong complementarity of these two checkers
and introduce a novel reliability metric specific to checkers for self-calibrating
designs. Exploiting their complementarity, Sec. 6.2 is devoted to the optimal
combination of double sampling checkers with soft SSC. First, Sec. 6.2.1 describes
in detail a checker combining Razor flip-flops with a soft SSC. Then,
Sec. 6.2.2 shows the superiority of this combined checker by thoroughly comparing
its robustness to timing errors with that of Razor flip-flops alone and of
the soft SSC alone.
Then, Sec. 6.2.3 proposes a variation on the combined checker, whereby
reliability is further enhanced at the expense of error correction capabilities.
Our conclusion on the robustness of the combined checkers, presented in
Sec. 6.2.5, is that double sampling without any error correction combined with
a soft SSC is optimal as far as reliability is concerned. The hardware complexity
of the combined checkers is briefly discussed in Sec. 6.2.4. Interestingly, despite
their strong robustness, their hardware overhead is minimal.
Next, Sec. 6.3 develops briefly the question of comparing checkers for self-
calibrating circuits. Specifically, we discuss whether information about the op-
erating point control policy is required to compare checkers. In Sec. 6.4, we
propose research directions to apply checkers combining double sampling flip-
flops with soft SSC not only to communication, but also to computing elements
such as adders. Finally, Sec. 6.5 concludes by summarising the contributions of
the chapter.
6.1 Qualitative Comparison of Razor Flip-Flops with Codes
In this section, we emphasise the complementarity of Razor flip-flops and soft
SSC by a qualitative discussion. The conclusions drawn from the comparison are
confirmed later by the simulations presented in Sec. 6.2.2. Finally, we introduce
a reliability metric specific to checkers used in a self-calibrating design.
First, we show qualitatively how the residual and reported error probabilities
of each checker vary as a function of the link supply voltage. This information
characterises completely the quality of a checker. As with any checker, the data
delivered to the end-user should not be corrupted by residual errors. In addition,
the checker informs the controller of each detected error, so that unsafe operating
points—supply voltage, in the particular case—can be dynamically avoided.
The reader may want to read Appendix D where the detection capabilities of
both checkers are compared in the whole delay-voltage plane using the error
model introduced in Chapter 3. The comparison below is a snapshot
focusing on a fixed delay value. Essentially, the conclusions drawn in Appendix D
are similar to the ones that follow.
Let tp be the propagation delay through a bit line. We consider tp a random
quantity parametrised by the supply voltage vch: for each particular value of
vch, tp is characterised by a probability distribution (details are given in Ap-
pendix D). We perform a qualitative comparison because the points we want to
make depend on important features of the checkers and not on a particular bit
error rate model (i.e., on the actual relation between tp and vch). By definition,
a bit line is affected by a timing error whenever
tp ≥ Tc (6.1)
with Tc the sampling period.
A single Razor flip-flop is affected by a residual bit error if and only if
tp ∉ [Td; Tc + Td] (6.2)
with Td the delay between the clock fed to the main flip-flop and the clock fed
to the shadow latch. If a timing error occurs on a bit line and is such that
tp ≥ Tc + Td, then the error is undetected since both the main flip-flop and
shadow latch hold the same data piece. Therefore, even if Razor flip-flops on
other bit lines detect a timing error, the data output for the particular line
where tp ≥ Tc + Td is corrupted. On the contrary, if the propagation time on a
bit line is so short that tp ≤ Td, then the next data piece is latched too early by
the shadow latch, which causes the correct data held by the main flip-flop to be
invalidated since a mismatch is detected. This phenomenon is called a short-path
error. It is worth noting that the data delivered on a bit line where a short-
path error occurs is corrupted—indeed, the next data piece is delivered—even
though no timing error actually occurred. As a result, reliable operation of a
single Razor flip-flop occurs in a limited range of voltages, where the probabilities
of a short-path error and of an undetected timing error are both acceptably low. In
practice, this is ensured by introducing buffers that delay data arrival, thus avoiding
short-path errors, while undetected timing errors are avoided thanks to a
worst-case characterisation of tp.
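The four operating regions of a single Razor flip-flop described above can be summarised by a small classifier. The Python sketch below is illustrative only: the names are hypothetical, it assumes Td < Tc, and it treats the interval of Eq. (6.2) as closed, so that the boundary values tp = Td and tp = Tc + Td are counted as safe.

```python
from enum import Enum


class Outcome(Enum):
    SHORT_PATH = "short-path error: mismatch reported, delivered data corrupted"
    CORRECT = "no timing error, data correct"
    DETECTED = "timing error caught by the shadow latch"
    UNDETECTED = "timing error missed by both the main flip-flop and the shadow latch"


def classify(tp: float, tc: float, td: float) -> Outcome:
    """Outcome for one bit line with propagation delay tp (assumes td < tc)."""
    if tp < td:
        return Outcome.SHORT_PATH   # next data piece reaches the shadow latch too early
    if tp < tc:
        return Outcome.CORRECT      # Eq. (6.1) not met: no timing error
    if tp <= tc + td:
        return Outcome.DETECTED     # late for the main flip-flop, on time for the shadow latch
    return Outcome.UNDETECTED       # late for both samplers


def is_residual(tp: float, tc: float, td: float) -> bool:
    """Residual bit error condition of Eq. (6.2)."""
    return not (td <= tp <= tc + td)


TC, TD = 1.0, 0.3  # arbitrary illustrative values
for tp in (0.1, 0.5, 1.1, 1.5):
    print(tp, classify(tp, TC, TD).name, is_residual(tp, TC, TD))
```

Only the two outer regions, short-path errors and undetected timing errors, correspond to residual bit errors.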
Considering a group of Razor flip-flops (such as in a Razorised bus), a word
error is reported when at least one of the Razor flip-flops reports an error.
Equivalently, the error signal fed to the controller is obtained by ORing the
individual error signals of each Razor flip-flop. A residual word error occurs
whenever at least one Razor flip-flop is affected by a residual bit error, even
though Razor flip-flops other than the one(s) causing a residual error may report
an error. Surprisingly, in a situation where at least one Razor flip-flop is affected
by a residual bit error and at least one Razor flip-flop reports an error, residual
and reported word errors are not exclusive events.

[Figure 6.1: three qualitative plots against the link supply voltage (up to vdd), each comparing Razor and soft SSC. Top: residual word error probability (0 to 1), annotated "short path errors (Razor)". Middle: reported word error probability (0 to 1), annotated "error-free feedback to controller", "short path errors (Razor)", "weak spot of soft SSC", "high residual bit errors (Razor)", and "worst-case operating voltage". Bottom: ratio of residual to reported error probability (0 to 1), annotated "unreliable feedback" and "weak spot of soft SSC".]
Figure 6.1: Top and middle: residual (top) and reported (middle) word error
probability as a function of supply voltage for a bus terminated by Razor flip-flops or
a soft SSC. Bottom: ratio of residual to reported error probability for each checker.
The reliability metric plotted on the vertical axis expresses clearly and compactly the
complementarity of both checkers.
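The word-level behaviour follows from two OR reductions over the per-bit flags. The Python sketch below uses a hypothetical representation, one (reported, residual) pair per Razor flip-flop, to show a word in which both events occur at once.

```python
def word_status(bits):
    """Combine per-flip-flop flags into word-level flags.

    bits: iterable of (reported, residual) pairs, one per Razor flip-flop.
    Returns (word error reported, residual word error).
    """
    bits = list(bits)
    reported = any(rep for rep, _ in bits)  # OR of the individual error signals
    residual = any(res for _, res in bits)  # at least one corrupted bit is delivered
    return reported, residual


# Hypothetical 4-bit bus: one detected timing error, one undetected timing error,
# and two error-free bit lines.
bus = [(True, False), (False, True), (False, False), (False, False)]
print(word_status(bus))
```

In this example the controller receives an error report while the delivered word still contains a corrupted bit.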
The top (respectively middle) graph of Fig. 6.1 compares the residual (re-
spectively reported) word error probability of the two link checkers: a group of
Razor flip-flops and a soft SSC. The latter operates satisfactorily both under
small and large error rates—as expected from Fig. 5.13—where Razor flip-flops
suffer from short-path or undetected timing errors. On the contrary, Razor flip-
flops provide correct information to both the end-user and the controller in the
operating area where the soft SSC is unreliable.
Now, we introduce the reliability metric used to compare the different check-
ers. In a system with fixed operating points, reliability is measured by the
residual error rate, which is the probability that an undetected error occurs.
This metric makes sense because the operating points are not expected to be
dynamically adjusted due to detected errors. On the contrary, a self-calibrating
system uses a checker not only to avoid the delivery of corrupted outputs—like
in traditional systems with fixed operating points—but also to provide a feed-
back to the controller so that unsafe operating points are dynamically avoided.
That is, a checker used in a self-calibrating design should provide reliable in-
formation about both residual errors (feedback relevant for the end-user) and
reported errors (feedback relevant for controller). The particularity of Razor-
based checkers is that reported and residual errors are not exclusive. Thus,
there is no simple relation between these two quantities, contrary to soft SSC
where any undetected error is necessarily residual and vice-versa. A
relevant reliability metric for a self-calibrating circuit involving Razor flip-flops
is thus the ratio ρ of the residual error probability to the reported (i.e., detected)
error probability:
\[
\rho = \frac{\varepsilon_{\mathrm{res}}}{\varepsilon_{\mathrm{rep}}}. \tag{6.3}
\]
The smaller the metric is, the more reliable the checker. The complementarity of
Razor flip-flops and soft SSC is obvious by comparing their respective reliability
ratio, which is sketched in the bottom graph of Fig. 6.1. This graph conveys
compactly the information expressed in the top and middle graphs of the figure.
In addition, the detection capabilities of Razor flip-flops and soft SSC are
also complementary in terms of failure mode of the link. That is, as the voltage
is reduced to sub-critical values, soft SSC favour a steep transition from low to
extremely large error rates. Such a feature minimises the exposure to moderate
error rates where they are the least reliable. On the contrary, a more shallow
transition from low to high error rates makes the possible operating range of
Razor flip-flops larger. Finally, Razor flip-flops and soft SSC differ in how errors
are recovered. The former can correct errors—and, thus recover efficiently—
while the latter rely on retransmissions, which incurs a larger latency penalty.
Having emphasised the complementarity of the two checkers, the next section
proposes to combine them.
Figure 6.2: The K-bit input data is first encoded into an N-bit codeword. The
codeword is transmitted over a link terminated by Razor flip-flops. Finally, the data
sampled by the Razor flip-flops is validated in the decoding stage.
6.2 Double Sampling and Codes
The section is organised as follows. First, Sec. 6.2.1 describes a novel checker
architecture combining Razor flip-flops with a soft SSC. Sec. 6.2.2 studies its
robustness to timing errors and shows its superiority to checkers consisting of
only Razor flip-flops or only a soft SSC. However, we argue that, as far as re-
liability is concerned, the combination of double sampling with the soft SSC
is not optimal. Thus, Sec. 6.2.3 proposes a variation on the checker
architecture introduced in Sec. 6.2.1, whereby double sampling used without any
error correction capability is combined with a soft SSC. We show that the resulting
checker further enhances the robustness to timing errors, while incurring a
minimal hardware cost. Finally, Sec. 6.2.4 discusses the hardware complexity of
the combined checkers and Sec. 6.2.5 summarises how to optimally combine
double sampling and soft SSC.
6.2.1 Razor Flip-Flops Combined with Codes
Fig. 6.2 depicts a checker combining Razor flip-flops with a soft SSC. The
combined checker is organised as a 3-stage pipeline (encoding-transmission-
decoding). The basic motivation is to combine the two checkers serially, so
that the soft SSC validates the data output by the Razor flip-flops while the
latter correct errors in the range of moderate error rates where the reliability
of the soft SSC is poor.
The binary signal crc_error indicates an error in the decoding. The binary
signal rzr_error indicates the detection of at least one timing error among the N
Razor flip-flops sampling the link output. Because the decoding occurs after the
Razor flip-flops, the error signals from the decoder and the Razor flip-flops are
easily combined: the error signal output along with the decoded data is the
OR of crc_error and rzr_error.
As Fig. 6.2 shows, the phase of the decoder is kept unchanged whenever a
timing error is detected because the data corrected by Razor flip-flops is output
one cycle after the error is reported. Freezing the phase of the decoder during a
cycle has two consequences on the encoder. First, its phase should be frozen as
well, since the two phases must stay synchronised. In turn, as the encoder phase
is kept unchanged, the encoder output must also be held during a cycle. If not,
two consecutive data pieces would be encoded with the same phase. We point
out that the synchronisation of the encoder and decoder phase as performed in
Fig. 6.2 poses a significant limitation on the possible values of the parameter Td.
Indeed, the enable signal feeding the encoder and encoder phase flip-flops (line
drawn in the bottom of Fig. 6.2) needs to be de-asserted in the same cycle in
which the rzr_error signal is asserted. This translates into the following timing
constraint (neglecting the set-up and hold time of flip-flops):
Td + tp ≤ Tc, (6.4)
where tp designates the time for the enable signal feeding the encoding logic
to propagate backward from the decoder side. The latter equation necessarily
constrains Td to relatively small values. Later in Sec. 6.2.3, we introduce a
checker that combines double sampling flip-flops with soft SSC without
correcting timing errors. Among others, one advantage of such a checker is that
Td is not restricted to small values due to a relation such as Eq. (6.4).
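As a rough worked instance of Eq. (6.4), one may plug in the values used later in
the simulations of Sec. 6.2.2.1 (Tc = 2ns and Td = 0.3 Tc). Combining these values
with Eq. (6.4) is an illustration only; tp denotes here, as in Eq. (6.4), the
backward propagation time of the enable signal:

```latex
T_d + t_p \le T_c
\;\Longrightarrow\;
t_p \le T_c - T_d = 2\,\mathrm{ns} - 0.3 \times 2\,\mathrm{ns} = 1.4\,\mathrm{ns}.
```

Under these assumed values, any increase in Td directly reduces the time left for
the enable signal to reach the encoder.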
The top diagram of Fig. 6.3 illustrates how a timing error is detected, cor-
rected by the shadow latch, and finally validated by the soft SSC. The timing
error occurs in cycle 1, caused by the late arrival of D2 at the Razor flip-flops
input. In the same cycle, D1 is validated by the decoding logic and is output
without error in cycle 2. In cycle 2, the rzr_error signal is asserted, because
the shadow latch samples D2 that has finally arrived. The decoding logic also
asserts its error signal because it now sees D1 with the wrong phase. In cycle
3, D2 is output by the Razor flip-flop to the decoding logic, which returns the
rzr_error signal to low. Moreover, the phase has been kept unchanged to allow
for the additional cycle needed for D2 to arrive. As a result, the decoding logic
validates D2 that is output without error in cycle 4.
The bottom diagram of Fig. 6.3 shows how the combined checker behaves
in the presence of a short-path error. The error occurs in cycle 1. The shadow
latch is corrupted by D2 that arrives too early, which triggers the assertion
of the Razor error signal. In turn, the assertion of the rzr_error signal causes
the encoder output and the phase to keep their value in cycle 2. Moreover, in
the same cycle, the shadow latch provides the value D2 to the decoding logic,
causing the rzr_error signal to return to 0. Because the phase has been kept to
1, it does not match with D2 and thus the crc_error signal is asserted. Next,
in cycle 3, D2 is still present at the decoder input. The phase has changed to
0. Therefore, the decoding logic validates D2 that is finally output in cycle 4.
Short-path errors are intrinsic to double sampling. Hence, the combined
checker cannot eliminate them. However, as shown in Fig. 6.3, detected and
corrected timing errors do not cause the same behavior as short-path errors:
the error signals from Razor and CRC errors are not asserted in the same order.
Due to this fact, short-path errors can still be diagnosed (i.e., distinguished from
timing errors).
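The diagnosis suggested above can be sketched in a few lines of Python. This is
only an illustration under the assumption, read from the description of Fig. 6.3,
that a corrected timing error asserts rzr_error and crc_error in the same cycle,
whereas a short-path error asserts rzr_error first and crc_error one cycle later.
The function name diagnose and the trace format are hypothetical.

```python
def diagnose(trace):
    """Classify the first error event in a per-cycle trace.

    trace is a list of (rzr_error, crc_error) pairs, one pair per cycle.
    Assumed signatures (from the description of Fig. 6.3):
      - timing error:     rzr_error and crc_error asserted in the same cycle;
      - short-path error: rzr_error alone, then crc_error alone one cycle later.
    """
    for cycle, (rzr, crc) in enumerate(trace):
        if rzr and crc:
            return "timing"
        if rzr:
            # Look at the next cycle (treated as error-free if the trace ends).
            following = trace[cycle + 1] if cycle + 1 < len(trace) else (False, False)
            if following[1] and not following[0]:
                return "short-path"
            return "unknown"
        if crc:
            # Error seen by the decoder only: not a Razor-related event.
            return "crc-only"
    return "none"


print(diagnose([(False, False), (True, True), (False, False)]))
print(diagnose([(True, False), (False, True), (False, False)]))
```

Only the relative order of the two error signals is used, so such a diagnosis
would not need any information beyond what the checker already outputs.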
Besides the detection of timing and short-path errors, in-order data delivery
is also required. As such, the combined checker does not ensure in-sequence
data delivery since (i) when a timing error is corrected, the encoder output is
frozen for a cycle, thus causing the data at its input to be disregarded, and
(ii) an error detected only by the decoder triggers a retransmission (which is
equivalent to flushing the pipeline). The task of delivering only correct and
in-sequence data pieces is performed by an ARQ controller, such as the one
presented in Chapter 4.
Figure 6.3: Timing diagram of a timing error (top) and of a short-path error
(bottom).
6.2.2 Robustness to Timing Errors
We have contrasted qualitatively the detection capabilities of Razor flip-flops
and soft SSC in Sec. 6.1. We now thoroughly compare the timing error detection
capabilities of the combined checker with those of Razor flip-flops alone and of
a soft SSC alone. First, we explain the simulation set-up. Next, we give the
comparison results.
6.2.2.1 Experimental Set-Up
We have simulated in VHDL a system consisting of a FIFO, an encoder, a
bus terminated by Razor or double sampling flip-flops, a decoder, and an ARQ
controller. Each simulation has been run until 500,000 randomly generated
eight-bit data pieces were transferred in sequence. For large word error rates,
the actual number of word transfers is much larger due to frequent
retransmissions (up to 10^7 transfers).
The simulation set-up is as follows. We apply the capacitive delay model to
express the delay tp through a bit line of the link as:
$$t_p = \frac{C_L}{k_m} \cdot \frac{v_{ch}}{(v_{ch} - v_{th})^2}, \qquad (6.5)$$
with km the transistor transconductance, CL the line capacitance, and vth the
device threshold voltage. We refrain from using more complex delay models for
two reasons: first, the lumped capacitance model is widely used in the design
tools, e.g., by those of Cadence. Second, we study the reliability of the checkers
for any error rate from 0 to 100%. In that sense, the presented results are
independent of the actual relation between the link supply voltage and the bit
error rate. Like in Chapter 3, the ratio CL/km is denoted by α and is modelled
by a Gamma random variable to account for the variability in the line delay.
The coefficient vch/(vch − vth)^2 on the right-hand side of Eq. (6.5) is considered
as deterministic. The bit line delay is thus a random variable parametrised by
the supply voltage vch.
The bus is modelled with non-synthesisable VHDL and introduces random
timing errors. That is, during simulated word transfers, we generate for each bit
line a random value for the delay tp through the line. We then determine
for each line whether a timing error occurs (i.e., when tp ≥ Tc) and, if that is
the case, whether the error can be detected and corrected by the Razor flip-flop
(i.e., when tp ≤ Tc + Td).
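The per-line error generation described above can be sketched in Python rather
than VHDL. This is a minimal illustration, not the simulator used in the thesis:
the numerical values are those quoted in this section for the normal variance
scenario, the Gamma variable is parametrised directly through the mean and
standard deviation of the delay at the nominal voltage, each line is drawn
independently, and all function names are hypothetical.

```python
import random

V_DD, V_TH = 1.2, 0.2        # nominal and threshold voltages (V)
T_C = 2.0                    # cycle time (ns)
T_D = 0.3 * T_C              # delay between main and shadow sampling (ns)
MU_TP, SIGMA_TP = 1.0, 0.1   # mean and std of the line delay at V_DD (ns)


def voltage_factor(v_ch):
    """Deterministic coefficient v_ch / (v_ch - v_th)^2 of Eq. (6.5)."""
    return v_ch / (v_ch - V_TH) ** 2


def sample_delay(v_ch, rng):
    """Draw one bit-line delay t_p (ns) at supply voltage v_ch."""
    shape = (MU_TP / SIGMA_TP) ** 2          # Gamma shape from mean and std
    scale = SIGMA_TP ** 2 / MU_TP            # Gamma scale at V_DD
    t_nominal = rng.gammavariate(shape, scale)
    # Rescale the nominal delay by the ratio of the voltage coefficients.
    return t_nominal * voltage_factor(v_ch) / voltage_factor(V_DD)


def classify_bit(t_p):
    """Outcome for one line: no error, corrected by Razor, or undetected."""
    if t_p < T_C:
        return "ok"
    if t_p <= T_C + T_D:
        return "corrected"
    return "undetected"


def transfer_word(v_ch, rng, n_bits=8):
    """Independent outcome for each of the n_bits lines of one word transfer."""
    return [classify_bit(sample_delay(v_ch, rng)) for _ in range(n_bits)]


rng = random.Random(0)
print(transfer_word(1.2, rng))   # nominal voltage
print(transfer_word(0.6, rng))   # strongly sub-critical voltage
```

Rescaling a delay drawn at the nominal voltage is equivalent to drawing the ratio
CL/km from a Gamma distribution and multiplying it by the voltage coefficient,
since the latter is deterministic.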
The delays tp through different bit lines are modelled as independent and
identically distributed. As the voltage is lowered, the bit error rate rises identi-
cally on all lines but each line remains statistically independent from the others.
Fig. 6.4 describes how we model a bus terminated with Razor flip-flops.
The model is only used to generate timing errors and mimic the capability
of Razor flip-flops to correct them. A Razor flip-flop affected by an undetected
timing error outputs the previously sampled data, even during a correction cycle
triggered by (at least) one other Razor flip-flop that has detected an error. The
ith bit of the signal error_fact indicates the presence of a timing error on the ith
input bit (i.e., enc_data(i)). This timing error is detected by the ith Razor
flip-flop if and only if the ith bit of the signal rzr_error (i.e., rzr_error(i))
is asserted. The two N-bit error signals error_fact and rzr_error are generated
randomly at each cycle.
Figure 6.4: Model of an N-bit bus terminated by Razor flip-flops.
During the simulations, we record the detected and residual word errors for
each checker. For the combined checker, errors are reported to the controller
if either (i) rzr_error = 1 and crc_error = 0, i.e., the decoder has validated a
word after correction by Razor flip-flops, or (ii) rzr_error = 1 and crc_error = 1,
i.e., the decoder has invalidated a word corrected by Razor flip-flops, or (iii)
rzr_error = 0 and crc_error = 1, i.e., the decoder has detected a word error
that no Razor flip-flop detected. Furthermore, a residual word error happens
in either of the two following cases. Either all Razor flip-flops are affected by
an undetected timing error and the decoder does not detect the resulting word
error. Or, at least one Razor flip-flop does not detect a timing error and at least
another one reports an error and the resulting word error is not detected by the
decoder. All the possible situations are depicted in Fig. 6.5.
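The bookkeeping described above can be summarised by the following Python sketch
for the combined checker. It is an illustration only: the class name Tally, its
fields and the flag output_corrupted (true when the word handed to the decoder
differs from the transmitted one) are hypothetical. Note that a word can be
counted as both reported and residual, as discussed in Sec. 6.1.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    reported: int = 0   # word errors reported to the controller
    residual: int = 0   # corrupted words delivered without a CRC error

    def record(self, rzr_error, crc_error, output_corrupted):
        # Cases (i), (ii) and (iii): any asserted error signal is a report.
        if rzr_error or crc_error:
            self.reported += 1
        # A corrupted word validated by the decoder is a residual error,
        # whether or not some Razor flip-flop raised rzr_error.
        if output_corrupted and not crc_error:
            self.residual += 1

    def rho(self):
        """Reliability ratio of Eq. (6.3); None while nothing was reported."""
        if self.reported == 0:
            return None
        return self.residual / self.reported


tally = Tally()
tally.record(True, False, False)    # correct correction by Razor
tally.record(False, True, True)     # undetected by Razor, caught by the CRC
tally.record(True, False, True)     # wrong correction validated by the CRC
tally.record(False, False, True)    # undetected by both
print(tally.reported, tally.residual, tally.rho())
```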
We model a link manufactured in a 130nm CMOS technology with the fol-
lowing parameters: the nominal voltage vdd is 1.2V, the threshold voltage vth
is 0.2V, and the cycle time Tc amounts to 2ns. In addition, Td = 0.3 Tc, as it
has been reported for busses [Kaul et al., 2005]. The values of the parameters
a and b characterising the Gamma random variable are chosen so that, under
the nominal voltage vdd, the typical delay µtp is 1ns and the standard deviation
σtp is 0.1ns. We do not know the actual failure mode of the link, i.e., the
bit error rate as a function of voltage and frequency. Indeed, it depends on
many complex factors—which is actually a reason why worst-case design is so
conservative. As a result, we have simulated two opposite noise conditions by
assigning different values to the parameter σtp . The first scenario (σtp = 0.1ns
under the nominal conditions) models a steep transition from a low error rate
to a high error rate. This scenario is called normal variance. On the contrary,
the second scenario models a more shallow transition from low to high error
rates. It is representative of extreme variations in silicon characteristics and is
characterised by σtp = 0.25ns at the nominal operating conditions. We refer to
this scenario as large variance. For illustration, while the probability of a timing
error is about 10^−15 with the normal variance, it amounts to nearly 0.07% under
the large variance scenario.
Figure 6.5: All possible error outcomes for each of the 3 checkers (soft SSC, Razor
flip-flops, and combined).
We point out that both voltage and frequency are fixed during each simulation.
Across simulations, however, the voltage is varied from 0.6V to 1.3V. One
simulation is performed for each voltage value.
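The timing error probabilities quoted for the two scenarios, and their evolution
with voltage, can be checked with a short Python sketch. It relies on an
assumption that is not stated in the text: for both scenarios the Gamma shape
(µtp/σtp)^2 is an integer (100 and 16), so the tail probability has the closed
form of an Erlang distribution. The function names are hypothetical.

```python
import math

V_DD, V_TH = 1.2, 0.2   # nominal and threshold voltages (V)
T_C = 2.0               # cycle time (ns)
MU_TP = 1.0             # typical line delay at V_DD (ns)


def voltage_factor(v_ch):
    """Deterministic coefficient v_ch / (v_ch - v_th)^2 of Eq. (6.5)."""
    return v_ch / (v_ch - V_TH) ** 2


def bit_error_rate(v_ch, sigma_tp, threshold=T_C):
    """P(t_p >= threshold) for a Gamma delay with integer shape (Erlang)."""
    shape = round((MU_TP / sigma_tp) ** 2)
    scale = (sigma_tp ** 2 / MU_TP) * voltage_factor(v_ch) / voltage_factor(V_DD)
    lam = threshold / scale
    # Erlang tail: sum of Poisson(lam) probabilities for k = 0 .. shape - 1,
    # evaluated in the log domain to avoid overflow.
    tail = sum(math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
               for k in range(shape))
    return min(1.0, tail)


for sigma in (0.1, 0.25):
    for v_ch in (0.6, 0.8, 1.0, 1.2, 1.3):
        p = bit_error_rate(v_ch, sigma)
        print(f"sigma_tp={sigma} ns, v_ch={v_ch} V: P(t_p >= T_c) = {p:.3g}")
```

Passing threshold=T_C + 0.3 * T_C instead of the default gives, under the same
assumptions, the probability that a timing error escapes correction by the
shadow latch.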
In the simulation, we compute the reliability metric ρ defined in Eq. (6.3)
by dividing the total number of residual word errors by the total number of
detected word errors. The computed ratio also represents the average number
of undetected errors before a detected error occurs.
6.2.2.2 Comparison Results
We compare the following checkers: RZR, i.e., 8 Razor flip-flops; CRC3, i.e., the
3-bit CRC generated by the polynomial x^3 + x + 1; RZR+CRC1, i.e., a minimal
combined checker augmenting RZR with an alternating-phase parity check; and
RZR+CRC3, which combines RZR and CRC3.
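For readers unfamiliar with the codes involved, the following Python sketch
computes a plain CRC with the generator x^3 + x + 1 (and, with the generator
x + 1, a single parity bit). It deliberately leaves out the alternating phase
of the soft SSC, which is defined elsewhere in the thesis, so it illustrates
only the spatial part of the code. The function names are hypothetical.

```python
GEN_CRC3 = 0b1011   # x^3 + x + 1
GEN_CRC1 = 0b11     # x + 1, i.e. a single (even) parity bit


def poly_mod(value, gen):
    """Remainder of the GF(2) polynomial division of value by gen."""
    gen_degree = gen.bit_length() - 1
    while value.bit_length() > gen_degree:
        # Cancel the leading term of value with a shifted copy of gen.
        value ^= gen << (value.bit_length() - 1 - gen_degree)
    return value


def crc_encode(data, gen=GEN_CRC3):
    """Append the CRC bits to data, giving a codeword divisible by gen."""
    n_check = gen.bit_length() - 1
    return (data << n_check) | poly_mod(data << n_check, gen)


def crc_check(codeword, gen=GEN_CRC3):
    """True when the codeword is consistent with the generator."""
    return poly_mod(codeword, gen) == 0


codeword = crc_encode(0b10110010)
print(bin(codeword), crc_check(codeword), crc_check(codeword ^ 0b100))
```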
Fig. 6.6 plots the reliability metric ρ of these four checkers and the word
error rate as a function of the timing bit error rate, under both the normal
(top) and large (bottom) variance scenarios.
As expected, Fig. 6.6 shows that the combined checkers always outperform
RZR. Moreover, the addition of a single parity bit RZR+CRC1 results in a
significant increase in reliability with respect to the RZR checker.
When comparing the RZR+CRC with CRC3 checkers in Fig. 6.6, we observe
that the region of poor reliability has shrunk and has been shifted to larger bit
error rate values. It follows that the RZR+CRC checkers offer a larger range
of bit error rate available to inform reliably the operating point controller that
the link is not operative (note that, when the RZR+CRC checker becomes un-
reliable, the word error rate has already reached 100%). Again, the minimal
combined checker (RZR+CRC1) increases reliability very significantly: under
the normal variance scenario, residual errors for this checker could not be mea-
sured until the word error rate reaches 100%.
While the combined checkers significantly outperform the CRC3 checker
for all bit error rates less than approximately 0.7, the single soft SSC checker
remains the most reliable beyond these values. Indeed, its residual error rate
tends rapidly to zero as the bit error rate further increases because the phase of
most (or all of) the bits input to the decoder does not match the phase of the
latter. As a result, most of the errors, or even all of them, are detected.
As Fig. 6.6 indicates, this beneficial phenomenon is not as pronounced for the
combined checkers. Specifically, the reliability curve of the combined checkers
features a plateau; yet, compared to the one of the CRC3 checker, the plateau
is shifted to the right (i.e., towards larger bit error rates). It follows that the
single soft SSC checker CRC3 remains the most reliable under very large bit
error rates. Fig. 6.7 provides an explanation. Without Razor flip-flops, the
bit error rate at the input of the decoder is obviously the same as the link bit
error rate εt = P(tp ≥ Tc). On the contrary, the shadow latch of a Razor
flip-flop corrects some errors. Therefore, the bit error rate effectively seen by
the decoder of the combined checker is P(tp ≥ Tc + Td), which is less than the
former quantity. As a result, when comparing a single soft SSC checker with
a combined checker under the same operating points, the effect of Razor
flip-flops is to reduce the bit error rate measured at the input of the decoder of
the combined checker. If, in addition, the same soft SSC is used for both checkers,
then the residual error rate of each checker is obtained by sampling the same
curve (residual error rate as a function of bit error rate, such as obtained in
Sec. 5.5.2) under different bit error rates. However, the detected error rate of
the combined checker exceeds in any case the one of the single soft SSC checker.
Figure 6.6: The ratio ρ = εres/εrep as a function of bit error rate under normal
variance (top) and large variance (bottom). The curves are extended with vertical
dotted lines at the first and last simulated points where residual errors could be
measured. The maximum unreliability of RZR+CRC1 is comparable to CRC3;
however, it occurs at significantly larger bit error rates (and thus smaller voltages).
Figure 6.7: Bit error rate as seen by the soft SSC with and without Razor flip-flops.
Without Razor flip-flops, the bit error rate at the decoder input is the bit error
rate of the link. With Razor flip-flops, the bit error rate at the decoder input is
less than the bit error rate of the link, since some errors are corrected.
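The reduction in the bit error rate seen by the decoder can be written out as a
one-line inequality. The symbols ε_link and ε_dec are introduced here purely for
illustration and are not part of the notation of the thesis:

```latex
\varepsilon_{\mathrm{dec}} = P(t_p \ge T_c + T_d)
\;\le\; P(t_p \ge T_c) = \varepsilon_{\mathrm{link}},
\qquad \text{since } T_d \ge 0 \text{ implies }
\{t_p \ge T_c + T_d\} \subseteq \{t_p \ge T_c\}.
```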
We proceed now by analysing whether operating the decoder of the combined
checker under a bit error rate lower than the one of the link is beneficial. To do
so, we refer to Fig. 5.13 that plots the residual word error rate of a soft SSC as
a function of the bit error rate, assuming a timing error channel. We point out
that the residual word error rate is not a monotonic function of the timing bit
error rate: as the latter increases, the residual word error rate first increases,
then reaches a plateau, and eventually decreases to zero. Because of this
non-monotonicity, it is possible that a decrease in bit error rate (due
to the correction capabilities of Razor flip-flops) causes an increased residual
word error rate. In such a case, correction by Razor flip-flops is detrimental
to reliability. Because the values of the word error rate for which the residual
error rate reaches the plateau are fairly high (as confirmed in Figures 5.13 and
6.6), we can draw the informal conclusion that the correction capabilities of Razor
flip-flops are only beneficial under small word error rates. The next section
develops this assertion in more depth.
6.2.3 Giving Up Correction?
The combined checker described so far features aggressive error correction ca-
pabilities since the data held by the shadow latch is always considered as a
reference even in situations of large error rate where that may not be the case.
While error correction capabilities are efficient and safe under small error rates
(i.e., when only a few timing errors corrupt a word), they are defective under
large error rates because (i) the data pieces held by the shadow latches of some
Razor flip-flops—and thus used for correction—may be corrupted themselves,
and (ii) the decoder of the combined checker is subject to a smaller bit error
rate than the link, which, in this situation, decreases the chance of detecting an
error.
These remarks reveal a trade-off between reliability and the correction ca-
pabilities of the combined checker. In order to further improve reliability, one
could design a different combined checker combining double sampling flip-flops
used solely for the purpose of error detection with a soft SSC. That is, compared
to the RZR+CRC checker, the shadow latch is replaced by a shadow flip-flop
whose output is not used to correct timing errors—even though it samples some
time after the main flip-flop. The detection of a word error—either by one of
the double sampling flip-flops and/or by the decoder of the soft SSC—triggers
the retransmission of the entire word. Such a design option favours reliability
at the expense of error correction capabilities.
The high level structure of such a checker is depicted in Fig. 6.8. The
error signal is asserted whenever a timing error is detected either by the double
sampling flip-flops or by the decoder. The assertion of the error signal triggers
a retransmission. The ARQ controller is a very basic entity that de-asserts the
in_sequence signal during 2 cycles after the detection of a timing error. The
main difference with respect to Fig. 6.2 is that the two checkers are
not combined serially, but in parallel: indeed, the double sampling flip-flops and
the decoder of the soft SSC are fed with the same data pieces (but at different
time instants). Another difference is that there is no need to synchronise the
encoder and decoder phase, since the double sampling flip-flops do not provide
corrected data one cycle after the timing error is detected.
Fig. 6.9 illustrates the operation of this checker in the presence of a timing
error. In cycle 2, the main flip-flop output is corrupted because it still holds D1.
On the contrary, the shadow flip-flop has sampled D2 correctly. As a result, the
ff_error signal is sampled high. At the same time, the crc_error signal is asserted
since D1 does not match the phase of the decoding logic. The error signal is asserted
accordingly in cycle 3. However, contrary to the checker depicted in Fig. 6.2,
a whole cycle is available for requesting a retransmission. That is, Td is not
constrained by a relation such as the one expressed in Eq. (6.4) and may thus
assume larger values. Finally, the ARQ controller de-asserts the in_sequence
signal during cycles 4 and 5 until D2 is correctly retransmitted.
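The behaviour of the error and in_sequence signals in this example can be
mimicked with a small Python sketch. It only reproduces what the text states
(error is the OR of the two checker outputs, and in_sequence is de-asserted
during the 2 cycles following an asserted error); the function names and the
list-based trace format are hypothetical, and the sketch is not the synthesised
controller.

```python
def combined_error(ff_error, crc_error):
    """Error signal of the parallel checker: OR of both detectors."""
    return ff_error or crc_error


def in_sequence_signal(error, hold_cycles=2):
    """Per-cycle in_sequence value given the per-cycle error signal.

    in_sequence is low during the hold_cycles cycles that follow a cycle
    in which error is asserted, and high otherwise.
    """
    in_sequence = []
    remaining = 0
    for err in error:
        in_sequence.append(remaining == 0)
        if remaining > 0:
            remaining -= 1
        if err:
            remaining = hold_cycles
    return in_sequence


# Cycles 1 to 6 of the example: error is asserted in cycle 3 only.
error = [False, False, combined_error(True, True), False, False, False]
print(in_sequence_signal(error))
```

With this trace, in_sequence is low exactly in cycles 4 and 5, as in the example
discussed above.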
Figure 6.8: A data link using a checker combining double sampling flip-flops with
a soft SSC. Timing errors are not corrected. Error recovery (by retransmission) and
in-sequence data delivery are ensured by an ARQ controller such as the one described
in Sec. 6.2.1.
Figure 6.9: Timing diagram describing the operation of the checker depicted in
Fig. 6.8 in the presence of a timing error. Note that, contrary to the top diagram of
Fig. 6.3, the phase of the encoder and decoder is not affected by the timing error.
We denote by DSFF+CRCn a checker such as described in Fig. 6.8, i.e.,
combining double sampling flip-flops (without error correction) with an n-bit
alternating-phase CRC. It can be shown easily that, if submitted to the
same data inputs, the DSFF+CRCn checker has a reliability ratio (defined
in Eq. (6.3)) not larger than the one of the CRCn or RZR+CRCn checkers.
To show this claim, we compare first the DSFF+CRCn and CRCn checkers.
Because the decoder in both checkers is fed with exactly the same data, it
follows that (i) any error detected by the CRCn checker is also detected by
the DSFF+CRCn checker, and (ii) any error undetected by the DSFF+CRCn
checker is also undetected by the CRCn checker. Therefore, the reliability ratio
of the DSFF+CRCn checker is less than or equal to the one of the CRCn checker.
We compare now the DSFF+CRCn and RZR+CRCn checkers. Clearly, any er-
ror detected by the Razor flip-flops is also detected by the double sampling
flip-flops. Moreover, any error that is undetected by the Razor flip-flops but de-
tected by the CRC is, as well, undetected by the double sampling flip-flops and
detected by the CRC. Thus, the number of detected errors of the DSFF+CRCn
checker is at least the number of detected errors of the RZR+CRCn checker.
On the other hand, the RZR+CRCn generates more residual errors than the
DSFF+CRCn checker. Indeed, a residual error with the former either went
undetected by both the Razor flip-flops and the CRC, or has been detected
and wrongly corrected by the Razor flip-flops and then validated by the CRC.
The first case also causes a residual error with the DSFF+CRCn checker. On
the contrary, the second case causes a retransmission since the error would be
detected by the double sampling flip-flops. As a result, the reliability ratio of
the DSFF+CRCn checker is not larger than the one of the RZR+CRCn checker.
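Both comparisons rest on the same elementary step, which can be made explicit.
The symbols N_res and N_rep (introduced here for illustration) denote the numbers
of residual and reported errors observed on the same data inputs, A stands for
the DSFF+CRCn checker and B for either of the other two checkers:

```latex
N_{\mathrm{res}}^{A} \le N_{\mathrm{res}}^{B}
\quad\text{and}\quad
N_{\mathrm{rep}}^{A} \ge N_{\mathrm{rep}}^{B} > 0
\;\Longrightarrow\;
\rho^{A} = \frac{N_{\mathrm{res}}^{A}}{N_{\mathrm{rep}}^{A}}
\le \frac{N_{\mathrm{res}}^{B}}{N_{\mathrm{rep}}^{A}}
\le \frac{N_{\mathrm{res}}^{B}}{N_{\mathrm{rep}}^{B}} = \rho^{B}.
```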
We have simulated under the same conditions as in Sec. 6.2.2 the reliability
ratio of the DSFF+CRC1 checker, i.e., a checker consisting of double sampling
flip-flops combined with a parity alternating-phase code. Fig. 6.10 plots the
reliability ratio of the DSFF+CRC1 checker and compares it with the ratios
plotted in Fig. 6.6. As expected, the DSFF+CRC1 checker outperforms all
other checkers under any error rate.
Overall, giving up correction capabilities offers significant benefits.
• Higher reliability. As discussed and verified in Fig. 6.10, double
sampling flip-flops combined with a single alternating-phase parity bit stand
out as the most reliable checker. Furthermore, there is another
phenomenon that increases reliability under small error rates. Because the
shadow flip-flop output is not used as a reference to correct errors, the
effect of short-path errors on reliability is limited to false alarms. On
the contrary, the combined checker using Razor flip-flops may output cor-
rupted data in case of a short-path error and compromise thus reliability.
• Tolerance to soft errors. Several techniques mitigate the effects of soft
errors thanks to double sampling flip-flops. They exploit the fact that soft
errors have a limited effect in time by introducing a delay between either
the clock or the data signals fed into the flip-flops. While Razor flip-flops are
unable to recover from soft errors—since the output of the shadow latch
is not guaranteed to be correct—double sampling flip-flops can. Indeed,
they only indicate a mismatch between the data held in each of them.
Retransmitting the corrupted data recovers from the soft error.
• Low hardware overhead. The high complementarity of the timing
redundancy added by the DSFF+CRC checker makes it possible to minimise the
codec circuitry and the wiring overhead, while not compromising reliability.
Moreover, the area and power overhead incurred by the shadow
flip-flops can be amortised as follows. In a recent work devoted to soft
error recovery with double-sampling flip-flops, Mitra et al. propose to
use scan flip-flops (i.e., additional flip-flops used for testing and debugging
purposes and thus unused during normal operation of the chip) as
shadow flip-flops [2005]. This idea may also be applied to the double
sampling flip-flops in the combined checker proposed in this section.
Figure 6.10: The ratio ρ = εres/εrep as a function of bit error rate under normal
variance (top) and large variance (bottom). In both scenarios, the DSFF+CRC1
checker outperforms all other checkers, except the RZR+CRC3 checker in a limited
range of error rate.
Hybrid approaches where correction capabilities are dynamically disabled can
also be envisaged. That is, correction is enabled when the data input to
the decoder is taken from the shadow flip-flop. Conversely, it is disabled when
the decoder is fed with data sampled by the main flip-flop. One possibility
consists in disabling the correction capabilities whenever the number of timing
errors exceeds a given threshold (possibly a single timing error). While error
recovery of such a checker would be more efficient than only retransmitting,
its hardware overhead would be increased by the additional circuit indicating
when a single or a few bits of a register are set to 1. Another possibility would
be to disable correction capabilities only when the supply voltage of the self-
calibrating link is lowered to a tentative new level. Otherwise, during normal
operation—in fact, when the error rate is expected to remain low—correction
is enabled and safely exploited. Dynamically enabling error correction opens a
new research direction as far as recovery of on-chip transfer errors is concerned
and compares interestingly with hybrid ARQ schemes recently proposed [Murali
et al., 2005].
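The first hybrid option (disabling correction when too many flip-flops report a
timing error) can be sketched as follows in Python. This is a speculative
illustration of the idea and not a design proposed in the thesis: the function
names, the integer encoding of the per-bit error flags and the default threshold
of one error are all assumptions.

```python
def correction_enabled(rzr_error_bits, threshold=1):
    """True while the number of flagged flip-flops does not exceed threshold.

    rzr_error_bits is an integer whose bit i is the error flag of the ith
    double sampling flip-flop.
    """
    return bin(rzr_error_bits).count("1") <= threshold


def select_decoder_input(main_data, shadow_data, rzr_error_bits, threshold=1):
    """Choose the word fed to the decoder and the recovery action."""
    if rzr_error_bits == 0:
        return main_data, "none"            # no timing error detected
    if correction_enabled(rzr_error_bits, threshold):
        return shadow_data, "correct"       # correction: use the shadow value
    return main_data, "retransmit"          # too many errors: detection only


print(select_decoder_input(0b1010, 0b1110, 0b0100))
print(select_decoder_input(0b1010, 0b1110, 0b0110))
```

The count of set bits corresponds to the additional circuit mentioned above,
which indicates whether only a single bit or a few bits of the error register
are set to 1.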
6.2.4 Hardware Complexity of the Combined Checkers
The combined checkers—RZR+CRC and DSFF+CRC—are both organised as
a 3-stage pipeline (encoding-transmission-decoding). This fact in itself does not
constitute an additional hardware overhead because coding techniques requiring
a similar organisation are already deployed over on-chip links, as mentioned in
Sec. 4.4.
We argue that the hardware overhead incurred by the DSFF+CRC checker
is minimal. First, the area overhead caused by double sampling flip-flops can be
mitigated by using scan flip-flops for the purpose of double sampling [Mitra et
al., 2005]. Second, we have shown in Sec. 6.2.3 that a single alternating-phase
parity bit protecting 8 information bits offers a high robustness to timing errors
over the whole range of bit error rates. Such a checker only adds a single redundant
wire and the minimal codec circuitry needed to compute a parity. Moreover,
the amount of redundancy (1/8 or 12.5%) is smaller than or comparable to
the one added by error control techniques traditionally deployed over on-chip
buses, e.g., [McNairy and Soltis, 2003]. The codec circuitry is also less complex
as the DSFF+CRC checker does not perform error correction. We also point
out that the hardware overhead of the RZR+CRC or single soft SSC checker is
larger than the one incurred by DSFF+CRC, because more redundant bits are
required for the same reliability level.
Finally, we have synthesised an ARQ controller for the RZR+CRC checker
(in fact, the most complex configuration) and found that it consists of only 6
flip-flops and about 20 gates. Actually, a very similar controller is required in
any system that uses codes for error detection.
6.2.5 Double Sampling and/or Codes: Conclusions
Overall, we recommend the checkers combining Razor or double sampling flip-
flops with an alternating-phase parity check because they offer a sufficient level
of reliability, while incurring a minimal overhead in wiring and codec circuitry.
This high level of robustness is explained by the addition of both intra-cycle
(i.e., double sampling) and extra-cycle (with the phase bit) time redundancy.
As we have shown in Fig. 6.6, the RZR+CRC1 checker is more reliable than
Razor flip-flops under all error rates. Moreover, compared to the single soft
SSC checker, the RZR+CRC1 checker offers a larger range of bit error rate to
reliably report transfer errors to the operating point controller. The RZR+CRC
checker also lessens the impact of error recovery on latency since retransmissions
are triggered only in situations that would have resulted in an undetected error
had Razor flip-flops been used as a single checker.
Then, we have investigated the possibility of using double sampling flip-flops
for the sole purpose of error detection, error recovery being performed by
retransmission. This option is very promising because the resulting checker (i) is
more robust to timing errors than any other checker architecture, (ii) is resistant
to soft errors, (iii) has a low hardware cost, and (iv) contrary to RZR+CRC
checkers, does not necessarily require a small delay between the clocks feeding
the main and shadow flip-flops (as the constraint of Eq. (6.4) expresses).
A remarkable fact is that the soft SSC validating the data output by the Razor
or double-sampling flip-flops need not include complex spatial redundancy
that incurs a large wiring overhead. The reliability ratio plotted in Fig. 6.6
for the RZR+CRC1 checker confirms that the addition of a single alternating-
parity bit increases reliability significantly compared to the RZR checker. The
RZR+CRC3 checker is certainly even more reliable; however the increase in re-
liability is not as significant compared to the RZR+CRC1 checker. The reason
is that the reliability of the combined checker relies on the soft SSC only from
the point on where some Razor flip-flops do not detect timing errors any more,
which happens with large bit error rates. Under such conditions, detecting tim-
ing errors does not require complex spatial redundancy, but relies mainly on
the notion of phase. This phenomenon is even more acute for the DSFF+CRC1
checker.
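The interplay between intra-cycle and extra-cycle time redundancy can be
illustrated with a minimal Python sketch. It is not the simulation model used
in this chapter: it assumes that all wires of a word share a single arrival time,
that the check bit is the parity of the data XORed with a phase bit toggling
every cycle, and that the checker tracks the expected phase locally. The names
`encode`, `sample`, and `check` are hypothetical.

```python
def parity(x: int) -> int:
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(x).count("1") & 1

def encode(data: int, phase: int):
    """Codeword = (data, check bit); the check bit is parity(data) XOR phase."""
    return data, parity(data) ^ phase

def sample(prev_word, new_word, arrival, period, shadow_delay):
    """Main and shadow samples under the single-arrival-time assumption.

    A sampler that fires before the new word has arrived still holds the
    word of the previous cycle.
    """
    main = new_word if arrival <= period else prev_word
    shadow = new_word if arrival <= period + shadow_delay else prev_word
    return main, shadow

def check(main, shadow, expected_phase):
    """Return (double-sampling error, phase-parity error)."""
    data, check_bit = main
    ds_error = main != shadow
    code_error = (parity(data) ^ check_bit) != expected_phase
    return ds_error, code_error

prev_word = encode(0b10110010, phase=0)  # word of the previous cycle
new_word = encode(0b10110011, phase=1)   # word of the current cycle

results = {}
for label, arrival in [("on time", 0.9), ("late", 1.1), ("very late", 1.5)]:
    main, shadow = sample(prev_word, new_word, arrival,
                          period=1.0, shadow_delay=0.2)
    results[label] = check(main, shadow, expected_phase=1)
    print(label, results[label])
```

In this toy model, the double-sampling comparison only covers arrivals that
fall inside the shadow window, whereas the phase check still fires once both
samplers have missed the transition.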
We proceed as follows. In Sec. 6.3, we answer the question whether compar-
ing checkers requires some knowledge about the operating point control policy.
Finally, Sec. 6.4 expands on the opportunity of relaxing coding requirements
when codes are used in combination with Razor flip-flops, and opens perspec-
tives towards self-calibrating computation.
6.3 Comparing Checkers With Operating Point Usage
When comparing the reliability metric ρ of two checkers, it could be that one
of them is more reliable on a given operating point, while the converse occurs
on another operating point. Actually, the reliability comparison of the checkers
CRC3 and RZR+CRC1 in Fig. 6.6 illustrates this case. Nevertheless, we could
recommend the checker RZR+CRC1 because any practical control policy settles
mainly on operating points where the combined checker is more reliable than the
CRC3 checker. Therefore, a decision can be made without assuming a particular
distribution of the operating point usage (i.e., the fraction of time a particular
operating point is used).
[Figure 6.11 (plot not reproduced): reliability metric ρ versus bit error rate for
checker A and checker B at three operating points ε1, ε2, and ε3, with two
operating point usage distributions splitting usage into 10%, 30%, and 60%.
Scenario 1: overall, checker A is more reliable. Scenario 2: overall, checker B is
more reliable.]
Figure 6.11: Hypothetical comparison of two checkers. With the operating point
distribution of scenario 1, checker A is more reliable than checker B because the
circuit is mostly used under error rate ε3. On the contrary, the distribution of
scenario 2 uses mainly the error rate ε1 where checker B is more reliable than
checker A.
However, knowledge of the operating point usage may be required to
determine which of two checkers is more reliable. We mean by operating point
usage the fraction of the time the operating point is selected (used) by the
controller. For example, let us consider the situation depicted in Fig. 6.11. In
a situation where the third operating point (characterised by a bit error rate ε3)
is mostly used, checker A is overall more reliable than checker B. On the contrary,
a different control policy using mostly the first operating point (characterised
by a bit error rate ε1) favours checker B.
We denote by $Q$ the number of possible operating points and by $u_i$ the usage
of the $i$th operating point. By definition, the operating point usage satisfies
$\sum_{i=1}^{Q} u_i = 1$. In order to discriminate the checkers, the value of the
reliability metric ρ of a given operating point should be weighted by its usage
frequency. The resulting reliability metric, which we call the effective reliability
metric and denote by $\rho_{\mathrm{eff}}$, is computed as
\[
  \rho_{\mathrm{eff}} = \sum_{i=1}^{Q} \rho_i \, u_i, \tag{6.6}
\]
with $\rho_i$ the reliability metric of the $i$th operating point as defined in Eq. (6.3).
The effective reliability metric can thus be interpreted as the average (over all
operating points used) number of residual errors per detected error.
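As an illustration of Eq. (6.6), the following Python sketch evaluates the
effective reliability metric for two checkers under two usage distributions, in
the spirit of Fig. 6.11. All numerical values are hypothetical and chosen only to
show that the ranking can flip with the usage; the function name
`effective_reliability` is likewise an assumption.

```python
def effective_reliability(rho, usage):
    """Usage-weighted sum of per-operating-point reliability metrics."""
    if len(rho) != len(usage):
        raise ValueError("rho and usage must have the same length")
    if abs(sum(usage) - 1.0) > 1e-9:
        raise ValueError("operating point usage must sum to 1")
    return sum(r * u for r, u in zip(rho, usage))

# Hypothetical reliability metrics at three operating points (eps1, eps2, eps3).
rho_a = [1e-3, 1e-4, 1e-5]  # checker A: best at eps3
rho_b = [1e-5, 1e-4, 1e-3]  # checker B: best at eps1

# Hypothetical usage distributions.
usage_1 = [0.3, 0.1, 0.6]   # mostly eps3
usage_2 = [0.6, 0.3, 0.1]   # mostly eps1

for name, usage in [("scenario 1", usage_1), ("scenario 2", usage_2)]:
    eff_a = effective_reliability(rho_a, usage)
    eff_b = effective_reliability(rho_b, usage)
    better = "A" if eff_a < eff_b else "B"
    print(f"{name}: rho_eff(A)={eff_a:.3g}, rho_eff(B)={eff_b:.3g}, "
          f"checker {better} is more reliable")
```

A lower value is better, since the metric counts residual errors per detected
error.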
This metric makes it possible to compare the reliability of checkers, even in
situations like the one depicted in Fig. 6.11: a checker A is more reliable than a
checker B if and only if $\rho_{\mathrm{eff},A} < \rho_{\mathrm{eff},B}$. Yet, this relation can be deduced
without knowledge of the operating point usage if either $\rho_{i,A} < \rho_{i,B}$ for all
$i = 1, \ldots, Q$ (such as when comparing RZR with RZR+CRC1) or if the latter
relation fails only for operating points $j$ such that $u_j \approx 0$ for all practical
control policies (such as when comparing CRC3 with RZR+CRC1).
[Figure 6.12 (diagram not reproduced): the error model ε = Fct(V, F) maps the
operating points v1, v2, v3 to bit error rates ε1, ε2, ε3; simulation or analytical
derivations yield the reported error rate εrep and the residual error rate εres,
hence the reliability metric ρ = εres / εrep; the operating point control policy
yields the operating point usage ui; both are combined into ρeff = Σ ui ρi.]
Figure 6.12: Successive steps required for the computation of the effective
reliability metric ρeff.
Evaluating Eq. (6.6) requires both the quantities ρi and ui. The ratios ρi
are derived by simulation, as performed in Sec. 6.2.2, while the operating point
usage can be obtained by a statistical analysis of the control policy. Since
the latter specifies how operating points are adjusted as a function of detected
errors, a statistical analysis is possible whenever the reported error probability
εrep is known for each operating point. In turn, these probabilities can be
obtained from reliability simulations such as the ones performed in Sec. 6.2.2.
Fig. 6.12 describes the consecutive steps required to evaluate the effective
reliability metric. Even though we did not encounter a practical case justifying the
computation of the effective reliability metric, we illustrate in Example 12 the
method described in Fig. 6.12.
[Figure 6.13 (diagram not reproduced): chain of states v1, v2, ..., vQ-1, vQ with
upward transitions p1,2, p2,3, ..., pQ-1,Q, downward transitions p2,1, p3,2, ...,
pQ,Q-1, and self-loops p1,1, p2,2, ..., pQ-1,Q-1, pQ,Q.]
Figure 6.13: The error rate tracking controller modelled as a discrete time Markov
chain. The probability of requesting voltage level j given that the current voltage
level is i is denoted by pi,j.
Example 12. (comparison of the effective reliability of two checkers).
We would like to compare the reliability of the checkers CRC3 and
RZR+CRC1 under the normal variance scenario defined in Sec. 6.2.2 and
assuming a voltage control policy that consists in tracking a target word error
rate (1% in this example). The voltage controller records the number of reported
errors every W cycles. The voltage level is incremented whenever the number of
reported errors exceeds a threshold Th and is decremented whenever the number
of reported errors is less than another threshold Tl.
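The update rule of such a controller can be sketched in a few lines of Python.
This is only an illustration of the policy described above: the function names
are hypothetical, voltage levels are indexed from 0, and saturating at the lowest
and highest levels is an assumption that the text does not spell out.

```python
def next_voltage_level(level, reported_errors, t_high, t_low, num_levels):
    """One control decision, taken after a window of W cycles."""
    if reported_errors > t_high and level < num_levels - 1:
        return level + 1          # too many errors: raise the voltage
    if reported_errors < t_low and level > 0:
        return level - 1          # very few errors: lower the voltage
    return level                  # otherwise keep the current level

def run_controller(error_flags, window, level, t_high, t_low, num_levels):
    """Apply the rule to consecutive full windows of per-cycle error flags."""
    history = [level]
    for start in range(0, len(error_flags) - window + 1, window):
        count = sum(error_flags[start:start + window])
        level = next_voltage_level(level, count, t_high, t_low, num_levels)
        history.append(level)
    return history

# Toy trace: two windows full of reported errors, then two error-free windows.
trace = [1] * 10 + [0] * 10
history = run_controller(trace, window=5, level=1,
                         t_high=3, t_low=1, num_levels=3)
print(history)
```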
We choose a Markovian control scheme. The control algorithm is a discrete
time Markov chain, as shown in Fig. 6.13. The time step of the Markov chain
model amounts to $W$ cycles, which corresponds to the time instants where the
voltage control decision is taken. The probability $p_{i,j}$ of requesting voltage level
$j$ given that the current voltage level is $i$ is computed as follows. At each cycle,
an error is reported to the controller with probability $\varepsilon_{\mathrm{rep}}$. Therefore, using the
fact that errors occurring in different cycles are statistically independent, the
probability of increasing the voltage is, for $i = 1, \ldots, Q-1$,
\[
  p_{i,i+1} = \sum_{k=T_h}^{W} \binom{W}{k} \, \varepsilon_{\mathrm{rep},i}^{k} \cdot (1 - \varepsilon_{\mathrm{rep},i})^{W-k},
\]
while, for $i = 2, \ldots, Q$, the probability of decreasing the voltage is
\[
  p_{i,i-1} = \sum_{k=0}^{T_l} \binom{W}{k} \, \varepsilon_{\mathrm{rep},i}^{k} \cdot (1 - \varepsilon_{\mathrm{rep},i})^{W-k},
\]
where $\varepsilon_{\mathrm{rep},i}$ is the probability of a reported error in voltage level $i$. The
probability of keeping the voltage unchanged is computed as
\[
  p_{i,i} = 1 - p_{i,i-1} - p_{i,i+1},
\]
because voltage levels can only be incremented, decremented, or kept unchanged.
This also implies that $p_{i,j} = 0$ whenever $|i - j| > 1$.
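The binomial sums above can be evaluated numerically as in the following Python
sketch. The summation bounds follow the equations as written (k = Th, ..., W and
k = 0, ..., Tl). Working in the log domain is an implementation choice made here
to avoid overflow for large W, and the function names are hypothetical. The
example call uses W = 5000, Th = 70, Tl = 30, and the reported error rate of the
RZR+CRC1 checker at 1.0V from Table 6.1.

```python
import math

def log_binomial_pmf(k, n, p):
    """log of C(n, k) p^k (1 - p)^(n - k), valid for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binomial_sum(k_low, k_high, n, p):
    """Probability that a Binomial(n, p) count lies in [k_low, k_high]."""
    return math.fsum(math.exp(log_binomial_pmf(k, n, p))
                     for k in range(k_low, k_high + 1))

def transition_probabilities(eps_rep, window, t_high, t_low):
    """Return (p_down, p_stay, p_up) for one interior voltage level."""
    p_up = binomial_sum(t_high, window, window, eps_rep)
    p_down = binomial_sum(0, t_low, window, eps_rep)
    p_stay = max(0.0, 1.0 - p_up - p_down)  # guard against rounding below 0
    return p_down, p_stay, p_up

p_down, p_stay, p_up = transition_probabilities(6.3e-3, 5000, 70, 30)
print(f"p_down={p_down:.3f}, p_stay={p_stay:.3f}, p_up={p_up:.3g}")
```

With a mean of 31.5 reported errors per window, this level is left almost only
downwards, which is consistent with the controller oscillating between two
adjacent voltage levels.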
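Let $P$ be the $Q \times Q$ matrix defined by $P = (p_{i,j})_{i,j=1,\ldots,Q}$. The column vector
$U$ consisting of the steady state usage probabilities satisfies
\[
  U = P^{\mathsf{T}} U,
\]
where the transpose accounts for the fact that row $i$ of $P$ holds the transition
probabilities out of voltage level $i$. The steady state usage probabilities are the
unique solution of the latter equation satisfying $\sum_i u_i = 1$.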
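A minimal Python sketch of the steady state computation is given below. It
assumes that row i of the transition matrix holds the probabilities of leaving
level i, so that the usage vector is a left fixed point of the matrix, and it
uses plain power iteration rather than a linear solver. The transition
probabilities are hypothetical and the function names are assumptions.

```python
def build_chain(p_down, p_up):
    """Tridiagonal transition matrix; row i = transitions out of level i."""
    q = len(p_up)
    matrix = [[0.0] * q for _ in range(q)]
    for i in range(q):
        down = p_down[i] if i > 0 else 0.0      # lowest level cannot go down
        up = p_up[i] if i < q - 1 else 0.0      # highest level cannot go up
        if i > 0:
            matrix[i][i - 1] = down
        if i < q - 1:
            matrix[i][i + 1] = up
        matrix[i][i] = 1.0 - down - up
    return matrix

def steady_state(matrix, iterations=2000):
    """Power iteration: repeatedly propagate the usage vector by one step."""
    q = len(matrix)
    usage = [1.0 / q] * q
    for _ in range(iterations):
        usage = [sum(usage[i] * matrix[i][j] for i in range(q))
                 for j in range(q)]
    return usage

# Hypothetical three-level controller.
p_down = [0.0, 0.3, 0.8]
p_up = [0.9, 0.1, 0.0]
usage = steady_state(build_chain(p_down, p_up))
print([round(u, 4) for u in usage])
```

For a chain that only moves between adjacent levels, the result can be
cross-checked with the balance condition u_i p_{i,i+1} = u_{i+1} p_{i+1,i}.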
(a) RZR+CRC1
vch [V]   0.85          0.90          0.95          1.0           1.05          1.10
ρ         2.0 · 10^-4   0             0             0             0             0
εrep      0.48          0.33          8.0 · 10^-2   6.3 · 10^-3   2.3 · 10^-4   1.6 · 10^-6
u         0             0             0.30          0.70          0             0
ρeff      5.0 · 10^-17
(b) CRC3
vch [V]   0.85          0.90          0.95          1.0           1.05          1.10
ρ         0.10          3.4 · 10^-2   4.1 · 10^-3   4.0 · 10^-4   0             0
εrep      0.87          0.57          0.10          7.8 · 10^-3   3.0 · 10^-4   1.6 · 10^-6
u         0             0             0.06          0.94          0             0
ρeff      6.4 · 10^-4
Table 6.1: Voltage (vch), reliability metric (ρ), reported error rate (εrep), operating
point usage (u), and effective reliability metric (ρeff) for the RZR+CRC1 (a) and
the CRC3 (b) checkers.
Using the reported error probabilities obtained in Sec. 6.2.2, we have computed
the operating point usage probabilities assuming a controller characterised
as follows. The voltage is updated every 5000 cycles (i.e., W = 5000, so that the
target count of reported errors is 50). The voltage is increased whenever more
than 70 errors are reported (i.e., Th = 70) and decreased whenever fewer than 30
errors are reported (i.e., Tl = 30). The link frequency is fixed (500MHz), while
the simulated voltages range from 0.6V to 1.1V in 50mV increments. Table 6.1
summarises the figures obtained, focusing on voltages larger than or equal to
0.85V. The numerical comparison of the effective reliability metric clearly
confirms the advantage of the RZR+CRC1 checker over the CRC3 checker. The
explanation is straightforward: the steady state analysis of the control policy
reveals that, essentially, the only voltages used are 0.95V and 1.0V, where no
residual errors could be measured for the combined checker.
We proceed by presenting prospective ideas on how to combine Razor flip-flops
with codes to build checkers for computing elements, e.g., adders.
6.4 Towards Self-Calibrating Computation
As emphasised in Sec. 6.2.5, the high robustness of the combined checkers results
from the combination of double sampling flip-flops with codes (possibly quite
basic ones) that include time redundancy, e.g., the notion of phase. Expanding
on this observation, we now consider the combination of double sampling
flip-flops with codes on computing elements.
Two main differences stand out compared to communication. The first
difference bears on the error model, or more precisely, the lack thereof. Regarding
communication, our fault model (i.e., the failure of bit transitions) has led to
a particular communication channel: the timing error channel. In our opinion,
it is unclear whether developing an equivalent channel for, e.g., an adder
is feasible. The failure modes of a computing element are more complex than
the mere failure of bit transitions. The concept of bit error rate is hard to
correlate with a physical event such as, for communication, the fact that the
propagation time through the link exceeds the cycle time. Furthermore, given
a particular transition vector, the timing error channel makes it possible to
determine all possible channel outputs, which is valuable in order to develop a
coding scheme. Again, we see no such equivalent that could list all possible
outputs of a computing element operated at sub-critical voltage. Finally,
computing elements may be implemented in a wider variety of forms than data
links. Thus, different implementations are very likely to behave differently at
sub-critical voltage.
[Figure 6.14 (diagram not reproduced): block diagram with the labels encoder,
computing element, input codeword, output codeword, checker (verification of
an input-output property), and error.]
Figure 6.14: An input-output property verified by the computing element under
correct operation is used as a checking principle. Without loss of generality, a single
input, single output computing element can be considered.
The second difference is that, unlike a link, a computing element transforms
its input. That is, the set of possible outputs is, in general, different
from the set of possible inputs. As a result, the coding scheme deployed
must be “preserved” by the computing element. Because communication
does not intentionally transform data, the input-output property verified
in that case is identity. The checking principle used in computing elements is
more general, as described in Fig. 6.14. Two requirements bear on the
input-output relation under correct operation: (i) it is verified for input
codewords, and (ii) it is not verified for inputs that are not codewords. When
these requirements are met, the computing element is said to be code
disjoint [Lala, 2001]. There exist simple input-output properties for the parity
and Berger codes under all operations performed by an ALU [Nicolaidis, 2003;
Jien-Chung et al., 1992].
[Figure 6.15 (diagram not reproduced): N-bit adder with inputs A, B, and cin and
outputs S (sum), C (internal carries), and cout; parity encoders compute the parity
of inputs and the parity of outputs, and a parity check compares the predicted
alternating parity with the parity of outputs to raise the error signal.]
Figure 6.15: An adder with an alternating-parity prediction scheme. A and B are
the two parity-encoded operands. S and C are respectively the sum and internal
carries.
We now give specific examples of the checker architecture depicted in
Fig. 6.14 by focusing the rest of the discussion on adders. We denote respectively
by $A$, $B$, $C$, and $S$ the two input operands, the internal carries, and the sum of
an $N$-bit adder. $C$ contains the carry-in $c_{\mathrm{in}}$ and the $N-1$ internal carries.
Moreover, we denote respectively by $p(A)$, $p(B)$, $p(C)$, and $p(S)$ the parity
of the inputs, internal carries, and sum. The bitwise relation between $A$, $B$,
$C$, and $S$ is $S_i = A_i \oplus B_i \oplus C_i$, with $1 \le i \le N$ and $C_1 = c_{\mathrm{in}}$. It follows that
$p(S) = p(A) \oplus p(B) \oplus p(C)$. By reordering the terms, we obtain the following
input-output relation: $p(A) \oplus p(B) = p(C) \oplus p(S)$. That is, one can consider
the adder as a single element with a $2N$-bit input, $A \mid B$, producing a $2N$-bit
output $C \mid S$, where $\cdot \mid \cdot$ denotes the concatenation operator. From this point
of view, the adder preserves parity, i.e., $p(A \mid B) = p(C \mid S)$.
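The parity relation can be checked exhaustively on a small adder, as in the
following Python sketch. It models a ripple-carry adder bit by bit, which is
only one possible implementation; bit positions are numbered from 0 in the code
(position 0 corresponds to i = 1 in the text), and the function names are
hypothetical.

```python
def parity(x: int) -> int:
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(x).count("1") & 1

def ripple_carry_add(a, b, cin, n):
    """Return (s, carries, cout) of an n-bit ripple-carry addition.

    Bit i of `carries` is the carry entering position i, so that
    `carries` holds the carry-in and the n - 1 internal carries.
    """
    s = 0
    carries = 0
    carry = cin
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        carries |= carry << i
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return s, carries, carry

def parity_is_preserved(a, b, cin, n):
    """Check p(A) xor p(B) == p(C) xor p(S)."""
    s, carries, _ = ripple_carry_add(a, b, cin, n)
    return (parity(a) ^ parity(b)) == (parity(carries) ^ parity(s))

N = 4
holds = all(parity_is_preserved(a, b, cin, N)
            for a in range(1 << N)
            for b in range(1 << N)
            for cin in (0, 1))
print("parity relation holds for all inputs:", holds)
```

Flipping a single bit of the sum or of the carries breaks the relation, whereas
flipping two bits restores it, which is the usual limitation of a single parity
check.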
Fig. 6.15 is a natural translation of Fig. 4.6 in the context of an adder.
The encoding also includes the notion of phase. In Example 2, we have mentioned
that LEDR alternates the parity of transmitted codewords. Likewise,
the parity predicted in the encoding of Fig. 6.15 alternates at each new addition
between the parity $p(A) \oplus p(B)$ and its opposite. While such a scheme
prevents the same adder output from being delivered twice in case of over-aggressive
operation, it is expected to have a poor detection capability under moderate bit
error rates, because any even number of bit errors leaves the parity unchanged.
Moreover, multiple-bit errors are likely because errors affecting the
carry propagate. Two non-mutually exclusive options are available to increase
the reliability of this scheme. The first one consists in using Razor flip-flops to
increase the detection of timing errors, as already discussed in Sec. 6.2.1 for the
case of a link. The second option consists in increasing the error detection
capability of the code, at the cost of a larger hardware overhead.
On the one hand, the addition of Razor flip-flops is a natural translation of
Fig. 6.2 to an adder, as depicted in Fig. 6.16. The resulting checker is expected
to be more robust than the one described in Fig. 6.15, since its reliability relies
on the single parity check only under error rates large enough to cause some or all
Razor flip-flops not to detect errors any more. This is the same reason as already
invoked in Sec. 6.2.5 to justify why the coding requirements may be relaxed when
combined with Razor flip-flops. Another expectation is that the robustness of
the combined checker will hold for different possible implementations of the
adder.
[Figure 6.16 (diagram not reproduced): the adder of Fig. 6.15 (inputs A, B, and
cin; outputs S, C, and cout; parity encoders; predicted parity; parity check) with
Razor flip-flops (Rzr FF) with enable inputs (EN) producing an additional razor
error signal.]
Figure 6.16: An adder combining an alternating-parity prediction scheme with
Razor flip-flops. A and B are the two parity-encoded operands. S and C are
respectively the sum and internal carries.
On the other hand, we see two ways of improving reliability of the coding
scheme.
1. Including more than a single parity check. Parity is preserved by addition
as long as it is computed on consecutive bits. Thus, several parity checks
can be computed; their supports need not be disjoint.
2. Using other codes preserved by addition, such as the Berger code or low-cost
residue codes where the redundant bits are computed as the remainder
of the division of the information bits by $2^n - 1$, e.g., 3 or 7.
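The second option can be illustrated with a low-cost residue check modulo 3
(i.e., n = 2). The Python sketch below assumes that the checked result is the
full sum including the carry-out, so that no correction term is needed; the
function names are hypothetical.

```python
def residue(x: int, n: int) -> int:
    """Remainder of x divided by 2^n - 1 (low-cost residue)."""
    return x % ((1 << n) - 1)

def predicted_sum_residue(a: int, b: int, n: int) -> int:
    """Residue of a + b predicted from the operand residues only."""
    return (residue(a, n) + residue(b, n)) % ((1 << n) - 1)

def sum_is_consistent(a: int, b: int, total: int, n: int) -> bool:
    """Check the observed adder output against the predicted residue."""
    return residue(total, n) == predicted_sum_residue(a, b, n)

n = 2                      # modulus 2^2 - 1 = 3
a, b = 45, 27
total = a + b              # full sum, carry-out included
print(sum_is_consistent(a, b, total, n))
print(sum_is_consistent(a, b, total ^ (1 << 4), n))  # one flipped sum bit
```

A single flipped bit changes the sum by a power of two, which is never a
multiple of 2^n - 1, so it always alters the residue; some multiple-bit errors
can still cancel out modulo 2^n - 1.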
6.5 Conclusions
The main achievement of this chapter is to propose novel checker architectures
for a self-calibrating on-chip link that feature unmatched robustness to timing
errors with minimal codec circuitry and wiring overhead. In Sec. 6.1 and Ap-
pendix D, we have emphasised the strong complementarity of checkers based
on double sampling flip-flops and soft self-synchronising codes (SSC). Next, ex-
ploiting this observation, we have proposed two different checker architectures
combining double sampling flip-flops with soft SSC.
The first architecture (denoted by RZR+CRC) combines Razor flip-flops
with a soft SSC and features
• a higher reliability than each of its counterparts,
• efficient error correction capabilities requiring retransmissions only when
Razor flip-flops as a single checker would have caused residual errors, and
• a low hardware overhead incurred by the redundant wires and the codec
circuitry, since the detection capabilities of the soft SSC are only exercised
under large bit error rates, where the notion of phase (not complex spatial
redundancy) mostly matters.
However, we have shown that, under large bit error rates, the error correction
capabilities of Razor flip-flops interfere negatively with the soft SSC: in fact, the
single soft SSC checker remains more reliable under large error rates. Although,
all in all, the range of error rates ensuring reliable operation is larger for the
RZR+CRC checker than for the soft SSC checker, we have developed a modified
version of the combined checker where the shadow latch in a Razor flip-flop
is replaced by a shadow flip-flop whose output is used for the sole purpose of
detecting timing errors. In this latter version, the redundancy brought by double
sampling flip-flops is combined optimally (from a reliability standpoint) with
the timing redundancy of the soft SSC because the detection capabilities of
each checker are exploited without any negative interference, contrary to the
RZR+CRC checker. We have verified by simulation that the checker combining
double sampling flip-flops with a single alternating-phase parity bit outperforms
all other checkers.
The checker combining double sampling flip-flops with soft SSC achieves
the goal formulated in Sec. 1.3, namely the separation of reliability concerns
from performance and power objectives. On the one hand, reliability concerns
are confined to the checker even without making any worst-case assumptions
about the link error rate. On the other hand, the operating point controller
adjusts voltage as a function of reported errors and operating frequency as a
function of workload requirements. As a practical consequence, the combined checker
enables a larger variety of voltage control policies without risking infringements
to reliability. Because of its high robustness to timing errors under any possible
bit error rate, the checker does not constrain the operating point controller to
avoid weak spots where reliability is not ensured. Equivalently, the novel checker
architecture enables a reliability-agnostic control of the link operating points,
which is a property neither the single soft SSC checker nor Razor flip-flops enjoy.
Next, we have elaborated on the question of comparing the reliability of
checkers for self-calibrating circuits. Since the checker provides information to
both the end-user (i.e., the entity using the circuit outputs) and the operating
point controller, both residual and reported errors need to be taken into account.
There is not necessarily a simple relation between these two quantities: e.g.,
residual and reported errors are not exclusive with Razor flip-flops. We have
thus proposed a reliability metric specific to checkers for self-calibrating circuits,
which accounts for both residual and reported errors. Then, using this metric,
we have characterised in Sec. 6.3 situations where the comparison of two checkers
does require information about the operating point usage. We have illustrated
the procedure required to compute this reliability metric in the particular case
of a voltage controller tracking a target error rate.
Finally, expanding on the opportunity to combine low overhead codes with
double sampling flip-flops, we have given prospective checker architectures for
an adder. Due to the lack of models on the failure of computing elements at
sub-critical voltage, future work in this direction should focus on the experi-
mental validation of such checkers, such as the approach followed by Roberts et
al. [2005].
Chapter 7
Conclusion
First, we summarise our achievements taking especially into account the goal
introduced in Chapter 1. In Sec. 7.2, we mention potential areas that could
benefit from the checkers we have proposed. Next, Sec. 7.3 mentions a few items
that we consider worth investigating in the short-term. Finally, we conclude by
discussing long-term perspectives opened by this work.
7.1 Achievements
We say up-front that, through the work presented in Chapter 4, this thesis has
pioneered the use of digital self-calibrating techniques to set the operating points
of an on-chip link without relying on worst-case characterisation [Worm et al.,
2002; 2005]. Having demonstrated the feasibility of a self-calibrating on-chip
link, this thesis focuses primarily on the checkers deployed in such circuits.
The main achievement is thus the development of checker architectures meet-
ing the goal stated in Chapter 1, which we recall as follows:
• The checker should demonstrate a high reliability under any possible tim-
ing error rate. Although we quantify reliability using a particular error
model (namely, the timing error channel), the requirement on the checker
to operate reliably over the whole range of bit error rate makes it possible
to design without any worst-case characterisation of the link error rate.
This requirement is key for a self-calibrating design that may operate
temporarily under error rates as large as 100%; yet, and especially under
this condition, reliable operation of the checker should be guaranteed.
• The overhead of the checker should be limited in terms of redundant wires
added to the link and additional encoding and decoding circuitry.
The checkers proposed in Sec. 6.2 meet these objectives. They offer an un-
matched level of robustness to timing errors because they combine two very com-
plementary checkers, namely Razor or double sampling flip-flops adding intra-
cycle timing redundancy with CRC-based alternating-phase encoding adding
extra-cycle timing redundancy. Due to the strong complementarity of the added
timing redundancy, such combined checkers do not require complex and costly
spatial redundancy to operate reliably. We have verified by simulation that a
single alternating-phase parity check validating the output of Razor or double
sampling flip-flops offers a level of reliability sufficient to protect 8-bit data
under any possible timing error rate up to 100%. In addition, we have shown that,
interestingly, the error correction capabilities of double sampling flip-flops com-
bined with soft self-synchronising codes are detrimental to reliability under large
bit error rates. As a result, we have proposed in Sec. 6.2.3 a checker architecture
combining double sampling flip-flops—without any correction capability—with
codes, which combines optimally the robustness of each individual checker.
As stated in the introduction, such low-overhead and highly robust combined
checkers constitute key building blocks in that they enable a reliability-agnostic
control of the on-chip link. As a result, the operating point controller is neither
constrained nor complicated by the need to avoid weak spots of the checker.
Simple control policies can be designed to efficiently exploit power-performance
trade-offs.
A second achievement of this thesis is to study comprehensively the self-
synchronisation properties of synchronous encoding schemes. Based on the ob-
servation that operation at sub-critical voltage causes data to be sampled while
some individual bit transitions have not yet completed, we have defined tim-
ing errors and their associated communication channel. We have exploited the
formal definition of timing errors to characterise hard self-synchronising encod-
ings that enjoy the property of detecting all possible timing errors. We have
bounded the minimum amount of wiring overhead required to ensure the hard
self-synchronising property. The bound is achieved by differential encoding with
unordered codes of minimum redundancy, i.e., using codes known to be optimal
in asynchronous communication. Next, we have studied self-synchronisation for
a particular type of encoding (referred to as symbol-invariant) that alternates
different dictionaries in time. Focusing on symbol-invariant encodings that
alternate only two dictionaries, we have conjectured (and verified by an exhaustive
search for several symbol sizes) the optimality of LEDR (defined in Example 2).
That is, we claim that no other hard self-synchronising alternating-phase en-
coding uses bandwidth more efficiently than LEDR. Because LEDR does not
consist of unordered codes of minimum redundancy, the conjecture shows that,
depending on the considered symbol sequencing rule, maximising the bandwidth
efficiency of hard self-synchronising encodings does not necessarily lead to un-
ordered codes of minimum redundancy. Therefore, one important contribution
is to specify and limit the scope of optimality of the latter encoding and give new
properties characterising hard self-synchronisation—as well as the new optimal
encodings—valid out of its scope of optimality.
Moreover, we have investigated the possibility of reducing the wiring overhead
beyond the limit needed to ensure hard self-synchronisation. Although the
resulting encoding schemes no longer detect all possible timing errors,
they still detect many of them and have a lower wiring overhead. Accordingly,
we have called these schemes soft self-synchronising. In the particular case of
LEDR, reducing the wiring overhead below its original amount (i.e., 100%) has
led to a novel encoding scheme: CRC-based alternating-phase encoding. This
novel soft self-synchronising encoding features unique detection capabilities to-
wards both timing and additive errors, as shown in Fig. 5.16. Regarding the
reduction in the wiring overhead of LEDR, we have bounded and approximated
accurately the induced loss in reliability, quantifying thus the trade-off between
wiring overhead and self-synchronisation properties.
Finally, we argue that timing errors are as relevant as additive errors to
derive reliability figures of standard error correcting codes used over on-chip
links. Indeed, crosstalk—a phenomenon raising significant concerns about the
reliability of deep sub-micron on-chip links—causes both timing and additive
errors. Timing errors result from delayed bit transitions, while additive errors
are caused by spurious bit transitions (modelled with a binary symmetric chan-
nel). Therefore, Eqs. (5.10), (5.11), and (5.12) bounding and approximating
the residual error rate of linear codes over the timing error channel complement
interestingly the well-known formula of Eq. (5.16) valid for additive errors.
7.2 Applications
The checkers proposed in this thesis can be deployed over on-chip links where
variability in electrical parameters compromises reliable and efficient operation.
We stress that self-calibration of on-chip links can be easily deployed because
most of the required building blocks (essentially, the support for voltage scaling
and the codec circuitry) already exist, and are even widespread in some
particular applications. The fact that codes are implemented over on-chip links
demonstrates that many applications tolerate transfer variations caused by the
recovery from run-time transfer errors. In fact, this feature is very desirable as
future technologies will be increasingly error-prone.
Moreover, our approach integrates seamlessly with dynamic voltage and fre-
quency scaling (DVFS) techniques: we propose to control frequency and voltage
in a decoupled manner due to the fact that reliability concerns are confined to
the checker. As a result, this approach leverages workload prediction meth-
ods (such as history-based) that determine the operating frequency in DVFS
capable systems.
The network-on-chip design paradigm—a communication-centric approach
where a customisable interconnect glues several heterogeneous pre-designed
cores—suggests interesting applications for self-calibrating interconnects. In-
deed, many such instances use point-to-point links that interconnect computing
cores and memories across the whole chip in several cycles. Although network-
on-chip embodiments exploit regularity in order to reduce variability, we believe
that even regular communication architectures will exhibit a sufficient amount
of variability to justify self-calibrating techniques such as the one we propose.
7.3 Future Work
Failure of Circuits at Sub-Critical Voltage
A natural continuation of the work presented in this thesis is to study experi-
mentally how a typical on-chip link fails as its supply voltage is reduced. This
knowledge would bring a very complementary insight with respect to the analyt-
ical models that we have derived. Moreover, it would enable further refinements
of the proposed checkers. Important parameters of the checker combining dou-
ble sampling flip-flops with codes (such as the delay between the main clock and
the clock feeding the shadow latch or the amount of code redundancy) can be
adapted as a function of the rate at which the link fails, i.e., how steep the actual
transition from operating to non-operating is in practice. For example, a
very steep transition would favour codes with little redundancy, since mostly
the phase matters to detect timing errors under large error rates.
The need to characterise experimentally the failure of computing elements
is even more stringent due to the lack of any analytical model in this case.
Roberts has already explored this research direction by studying the failure
of a multiplier implemented in an FPGA [Roberts et al., 2005]. His work has
pointed out inter- and intra-die process variation. In addition, such experimental
studies can be performed for synthetic (e.g., random) as well as realistic data
sets. Possible data dependencies in the critical path may thus be emphasised.
In our opinion, similar work is required to validate the architectures proposed
for self-calibrating adders, e.g., the one depicted in Fig. 6.16.
GALS Designs
GALS (Globally Asynchronous Locally Synchronous) is a broadly-defined design
paradigm where locally synchronous islands communicate with each other asyn-
chronously [Chapiro, 1984]. The motivation behind GALS designs is to avoid
the distribution of a global clock across the whole chip, to be modular, while
at the same time leveraging synchronous design tools because only communi-
cation between synchronous islands is performed asynchronously. GALS sys-
tems rely on wrappers—referred to as port controllers—to ensure data integrity
during transfers between two locally synchronous islands. Port controllers are
asynchronous finite state machines that pause (i.e., stretch) the clocks of both
synchronous islands while data is being transferred. Applied in a GALS design,
soft self-synchronising codes constitute an alternative and attractive solution
to traditional asynchronous codes in order to indicate when the data at the
receiver end can be safely sampled. Thus, bandwidth could be used more ef-
ficiently during data transfers. Although the encoder and decoder logic would
be fed by clocks that are not synchronised, the fact that both clocks are paused
during the whole transfer ensures consistency between the phase bits generated
individually by the encoder and decoder.
7.4 Perspectives
In the near future, we will see chips integrating several billions of transistors that
exhibit extreme static and dynamic variations (e.g., in speed or leakage). Han-
dling such sources of variation by worst-case characterisation will be impractical.
As a result, today’s design technique, namely worst-case design, is fundamen-
tally incompatible with the features chips will soon exhibit. Therefore, we see
an urgent need for alternative techniques such as self-calibration.
Although at a very early stage, checkers such as the ones we propose consti-
tute key building blocks required by self-calibrating on-chip links. As such, they
contribute to the general trend of building a variety of circuits reactive to their
operating conditions. This includes environmental factors such as temperature,
workload, internal features (e.g., leakage current), application-specific data (e.g.,
dependencies that Razor flip-flops can exploit) and context (e.g., reconfiguring
for a particular environment).
On the one hand, the need for alternative solutions such as reactive circuits
is growing because worst-case design poses increasingly large difficulties. On
the other hand, a limiting factor in the development of such circuits stems
from the necessity to keep design complexity as low as possible. For example,
dynamic frequency scaling for the purpose of thermal management is complex
as it renders performance hard to predict: a chip should not fail to operate fast
enough simply because it is running hot.
We believe that the work presented in this thesis and the techniques
mentioned below are worth developing further because they span the
whole design depth and thus explore complementary directions.
• Self-calibration at the device level. Kim et al. have provided several
embodiments, for example, the self-calibration of a keeper based on actual
leakage current measurement [2005], or the reduction of hold failure in
nano-scaled SRAM [Ghosh et al., 2006]. These techniques target very
specific physical phenomena where electrical parameter variations render
worst-case design impractical.
• Self-calibration of circuit operating points. These techniques in-
clude the work presented in this thesis for an on-chip link [Worm et al.,
2005], as well as the techniques based on Razor flip-flops [Austin et al.,
2004; Kaul et al., 2005; Das et al., 2006]. Such techniques target the
detection and correction of timing errors.
• Algorithmic noise tolerance. This technique exploits application-
specific properties to provide error tolerance [Shanbhag, 2002; Shim et
al., 2004]. While application-specific, it is able to recover from any kind
of errors—timing or additive.
Through the checkers proposed in this thesis, we take a first step
towards design techniques that do not rely at all on worst-case characterisation
of the designed circuit. In this respect, we propose a more radical paradigm
shift than “better than worst-case” designs [Austin et al., 2005]. Although much
remains to be done before self-calibrating techniques reach a mature level, we
hope that, in the end, the checkers we have proposed will help in establishing
alternatives to worst-case design.
Appendix A
Residual Error Rate of Linear
Codes Over the Timing Error
Channel
In order to derive the undetected error probability, we start from Eq. (5.8)
and focus on the sum
\[
\sum_{\substack{t \in C \\ t \ge \tilde{e}}} P(\tilde{e} \mid t). \tag{A.1}
\]
An exact value seems difficult to obtain. As a result, we will bound and
approximate this sum. First, we evaluate the summand by showing how the
probability $P(\tilde{e} \mid t)$ is related to the realisation of the error
process $e_k$. Since $\tilde{e} = e \cdot t$, whenever $t_i = 0$, we always
have $\tilde{e}_i = 0$. When $t_i = 1$, $\tilde{e}_i = 1$ with probability
$\varepsilon_t$ and $\tilde{e}_i = 0$ with probability $(1 - \varepsilon_t)$.
We denote by $w(v)$ the weight of a binary vector $v$, i.e., the number of bits
equal to 1. By definition, there are $w(t)$ bits of $t$ equal to 1; thus, we
obtain that, for any $\tilde{e}$ such that $\tilde{e} \le t$,
\[
P(\tilde{e} \mid t) = \varepsilon_t^{w(\tilde{e})} (1 - \varepsilon_t)^{w(t) - w(\tilde{e})}. \tag{A.2}
\]
Plugging Eq. (A.2) into Eq. (5.8), we obtain
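\[
\varepsilon_{res} = \frac{1}{2^K} \sum_{\tilde{e} \in C \setminus \{\vec{0}\}} \;
\sum_{\substack{t \in C \\ t \ge \tilde{e}}}
\varepsilon_t^{w(\tilde{e})} (1 - \varepsilon_t)^{w(t) - w(\tilde{e})}. \tag{A.3}
\]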
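For a code small enough to enumerate, the double sum of Eq. (A.3) can be evaluated directly. The following is a minimal sketch, assuming the (5, 3) code listed in Table A.1 as test case; the function names and the numerical value of the timing error probability are illustrative choices, not part of the derivation.

```python
from itertools import product

def codewords():
    """(5, 3) systematic code of Table A.1: x4 = x1 ^ x2, x5 = x2 ^ x3."""
    return [(a, b, c, a ^ b, b ^ c) for a, b, c in product((0, 1), repeat=3)]

def weight(v):
    """Number of bits equal to 1."""
    return sum(v)

def covers(t, e):
    """True when t >= e componentwise, i.e. t belongs to Sup(e)."""
    return all(ti >= ei for ti, ei in zip(t, e))

def residual_error_rate(code, k, eps_t):
    """Brute-force evaluation of the double sum of Eq. (A.3)."""
    total = 0.0
    for e in code:
        if weight(e) == 0:          # the all-zero vector is excluded
            continue
        for t in code:
            if covers(t, e):
                total += eps_t ** weight(e) * (1 - eps_t) ** (weight(t) - weight(e))
    return total / 2 ** k

if __name__ == "__main__":
    print(residual_error_rate(codewords(), 3, 0.1))
```

Such an enumeration costs on the order of 4^K operations, which is why bounds and approximations are needed for codes of practical size.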
For convenience, we introduce the following notation.
Definition 15. (the set Sup(e˜)). Let C be a (N, K) linear code and e˜ be a
codeword of C. The set Sup(e˜) is defined as the set of all codewords of C that
are larger than or equal to e˜:
Sup(e˜) = {t ∈ C | t ≥ e˜}.
We call the bit positions i of e˜ where e˜i = 1 the “forced to 1” bit positions,
because e˜i = 1 ⇒ ti = 1 for any vector t ∈ Sup(e˜).
In order to compute the sum of Eq. (A.1), we exploit the fact that the
encoding is systematic, which means that the first K bits of any codeword are
independent of each other. As a result, for every possible assignment of the first
K bits, there exists a codeword that has the assigned bit values in its first K
bit positions. We illustrate this property in the following example that shows
how a simpler sum than the one of Eq. (A.1) can be bounded.
Example 13. (computation of the sum $\sum_{t \in C} (1 - \varepsilon_t)^{w(t)}$).
Consider the computation of the sum $\sum_{t \in C} (1 - \varepsilon_t)^{w(t)}$
for a linear and systematic (N, K) code C. We decompose the weight of every
codeword $t \in C$ into the weight of the information bits and the weight of
the redundant bits:
\[
w(t) = w_I(t) + w_R(t),
\]
where $w_I(t)$ (respectively $w_R(t)$) is the number of 1s in the information
(respectively redundant) part of the codeword $t$. On the one hand, we clearly
have $0 \le w_R(t) \le N - K$. On the other hand, we know that there are exactly
$\binom{K}{i}$ codewords such that $w_I(t) = i$. As the first K bits can take
any of the $2^K$ possible assignments of the information bits, we write
\[
\sum_{t \in C} (1 - \varepsilon_t)^{w(t)} =
\sum_{i=0}^{K} \sum_{\substack{t \in C \\ w_I(t) = i}} (1 - \varepsilon_t)^{i + w_R(t)}.
\]
Since $0 \le 1 - \varepsilon_t \le 1$, a larger exponent yields a smaller term.
Using the bounds on $w_R(t)$, we thus obtain directly two bounds
\[
\sum_{t \in C} (1 - \varepsilon_t)^{w(t)} \ge
\sum_{i=0}^{K} \binom{K}{i} (1 - \varepsilon_t)^{i + N - K},
\]
\[
\sum_{t \in C} (1 - \varepsilon_t)^{w(t)} \le
\sum_{i=0}^{K} \binom{K}{i} (1 - \varepsilon_t)^{i}.
\]
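The two binomial expressions of Example 13 can be checked numerically on a small code. The sketch below assumes the (5, 3) code of Table A.1 and an arbitrary value of the timing error probability; the function names are illustrative. Because 1 − εt lies between 0 and 1, the expression with the larger exponent i + N − K is the smaller of the two.

```python
from itertools import product
from math import comb

def codewords():
    """(5, 3) systematic code of Table A.1: x4 = x1 ^ x2, x5 = x2 ^ x3."""
    return [(a, b, c, a ^ b, b ^ c) for a, b, c in product((0, 1), repeat=3)]

def weight_sum(code, eps_t):
    """Sum over all codewords t of (1 - eps_t) ** w(t)."""
    return sum((1 - eps_t) ** sum(t) for t in code)

def binomial_expressions(n, k, eps_t):
    """Return the two binomial sums obtained with w_R(t) = N - K and w_R(t) = 0."""
    q = 1 - eps_t
    all_redundant_ones = sum(comb(k, i) * q ** (i + n - k) for i in range(k + 1))
    no_redundant_ones = sum(comb(k, i) * q ** i for i in range(k + 1))
    return all_redundant_ones, no_redundant_ones

if __name__ == "__main__":
    low, high = binomial_expressions(5, 3, 0.1)
    print(low, weight_sum(codewords(), 0.1), high)
```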
We would like to apply the very same procedure to compute the sum of
Eq. (A.1). To do so, we have to account for the constraint t ≥ e˜, which imposes
ti = 1 for each bit position i such that e˜i = 1. In addition, the code C imposes
N −K dependence relations between the N bits. The difficulty is that we have
to include in the sum only the N-tuples that (i) have a 1 at each bit position
that is forced to 1 by e˜ and (ii) still satisfy the N − K dependence relations.
Fig. A.1 illustrates these constraints. Before showing how this question can be
dealt with, we formalise the definition of “dependence-free” bit position.
Definition 16. ((maximum) set of free bit positions for a set S). Let
$S$ be a set of vectors in $\{0, 1\}^N$ and $E$ be an ordered subset of bit
position indices: $E \subseteq \{1, ..., N\}$. For any vector $s \in S$, let
$s_E$ be the vector of $\{0, 1\}^{|E|}$ that is obtained by keeping the entries
$s_i$ whose indices $i$ belong to $E$, i.e., $s_{E,i} = s_i \;\; \forall i \in E$.
We define $E$ as a set of free bit positions if and only if
for all $a \in \{0, 1\}^{|E|}$, there exists $s \in S$ such that $s_E = a$.
The set $E$ is maximum if there exists no other set of free bit positions,
$F \ne E$, with $E \subset F$.

[Figure A.1 (diagram not reproduced): the vectors $\tilde{e}$ and $t$, each split
into information bits and redundant bits, linked by the code constraints.]
Figure A.1: Information bits of $\tilde{e}$ that are 1 impose that the bits at the
same position in the codeword $t$ are also 1, as indicated by the two leftmost
vertical arrows. Similarly, the redundant bits of $\tilde{e}$ that are 1 impose
that the corresponding bit positions are 1 for each element of Sup($\tilde{e}$).
This constraint is not necessarily met by every assignment of the unconstrained
information bit positions marked by an “X”.
In other words, a set E of bit position indices is a set of free bit positions
for a set S if and only if, for all possible assignments of the bit positions with
indices in E, there exist elements of S having the assigned values. We call the
bit positions belonging to a free set of bit positions the free bit positions. We
illustrate the definition with an example.
Example 14. (set of free bit positions for a code).
Consider a linear and systematic (N = 5, K = 3) code C, where the first re-
dundant bit is the sum of the first two information bits and the second redundant
bit is the sum of the last two information bits. This code is given in Table A.1.
Suppose that e˜ = (1, 0, 0, 1, 0), we obtain Sup(e˜) = {(1, 0, 0, 1, 0); (1, 0, 1, 1, 1)}.
By visual inspection of Sup(e˜), we find two sets of free bit positions: E = {3}
and E = {5}. The maximum cardinality of a set of free bit positions is
therefore 1. It cannot be more, since Sup(e˜) contains two codewords. Now,
if e˜ = (0, 0, 1, 0, 1), we find: Sup(e˜) = {(0, 0, 1, 0, 1); (1, 0, 1, 1, 1)}, yielding
E = {1} and E = {4} as maximum sets of free bit positions. However, if
e˜ = (0, 1, 0, 1, 1), we realize that Sup(e˜) only contains e˜ itself, so that the set of
free bit positions is empty.
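The visual inspection of Example 14 can be reproduced by exhaustive search. The following sketch assumes the (5, 3) code of Table A.1, uses 1-based bit positions as in the text, and returns the non-empty free sets of largest cardinality; the function names are illustrative, and the search is exponential in N, so it is only meant for very small codes.

```python
from itertools import combinations, product

def codewords():
    """(5, 3) systematic code of Table A.1: x4 = x1 ^ x2, x5 = x2 ^ x3."""
    return [(a, b, c, a ^ b, b ^ c) for a, b, c in product((0, 1), repeat=3)]

def sup(code, e):
    """Sup(e): all codewords t with t >= e componentwise."""
    return [t for t in code if all(ti >= ei for ti, ei in zip(t, e))]

def is_free(vectors, positions):
    """Definition 16: every assignment of the chosen positions must occur."""
    seen = {tuple(v[p] for p in positions) for v in vectors}
    return len(seen) == 2 ** len(positions)

def largest_free_sets(vectors, n):
    """Non-empty free sets of largest cardinality, as sets of 1-based indices."""
    for size in range(n, 0, -1):
        found = [{p + 1 for p in pos}
                 for pos in combinations(range(n), size)
                 if is_free(vectors, pos)]
        if found:
            return found
    return []   # no non-empty set of free bit positions

if __name__ == "__main__":
    code = codewords()
    for e in [(1, 0, 0, 1, 0), (0, 0, 1, 0, 1), (0, 1, 0, 1, 1)]:
        print(e, sup(code, e), largest_free_sets(sup(code, e), 5))
```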
Let us assume that we know a maximum set of free bit positions for Sup(e˜)
(we defer the question of how to find it). We call the set M . We point out that
x1 x2 x3 x4 x5
0 0 0 0 0
0 0 1 0 1
0 1 0 1 1
0 1 1 1 0
1 0 0 1 0
1 0 1 1 1
1 1 0 0 1
1 1 1 0 0
Table A.1: A linear (N = 5, K = 3) code.
M is not uniquely defined, as shown in Example 14; however, |M | is unique
because M is a maximum set of free bit positions. We distinguish the following
types of bit positions within the set Sup(e˜):
(i) The positions of the bits forced to 1. There are w(e˜) such bit positions.
(ii) Knowing (i), we determine the free bit positions of Sup(e˜) (i.e., the bit
positions with indices in M).
(iii) The remaining bit positions, i.e., the bit positions which are not included
in (i) and (ii).
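Based on this decomposition and for every $t \in \mathrm{Sup}(\tilde{e})$, we write
\[
w(t) = w(\tilde{e}) + w_F(t) + w_{rem.}(t), \tag{A.4}
\]
where $w_F(t)$ designates the number of 1s in the bit positions with indices in
$M$ and where $w_{rem.}(t)$ is the number of 1s in the remaining bit positions
(iii). We now follow exactly the same procedure as in Example 13. That is, we
decompose the sum as a function of the weight assigned to the set of free bit
positions:
\[
\begin{aligned}
\sum_{t \in \mathrm{Sup}(\tilde{e})} (1 - \varepsilon_t)^{w(t)}
&= (1 - \varepsilon_t)^{w(\tilde{e})} +
\sum_{\substack{t \in C \\ t > \tilde{e}}} (1 - \varepsilon_t)^{w(t)} \\
&= (1 - \varepsilon_t)^{w(\tilde{e})} +
\sum_{i=1}^{|M|} \sum_{\substack{t \in C;\, t > \tilde{e} \\ w_F(t) = i}}
(1 - \varepsilon_t)^{w(\tilde{e}) + i + w_{rem.}(t)}.
\end{aligned} \tag{A.5}
\]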
The difference with Example 13 is that we do not know how to evaluate easily
|M |, because we sum only over those codewords that are larger than e˜. As a
result, we only bound the cardinality of this set and give later in Lemma 4 an
expression for an upper and a lower bound, denoted respectively by Mh and
Ml. We obtain trivially from Eq. (A.4)
w(t) ≥ w(e˜) + wF(t), (A.6)
and
w(t) ≤ w(e˜) + wF(t) + N − w(e˜)− |M |
= wF(t) + N − |M |. (A.7)
Lower bounding |M | by Ml and using the upper bound of Eq. (A.7), we obtain
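\[
\begin{aligned}
\sum_{t \in \mathrm{Sup}(\tilde{e})} (1 - \varepsilon_t)^{w(t)}
&\ge (1 - \varepsilon_t)^{w(\tilde{e})} +
\sum_{i=1}^{M_l} \sum_{\substack{t \in C;\, t > \tilde{e} \\ w_F(t) = i}}
(1 - \varepsilon_t)^{N + i - M_l} \\
&= (1 - \varepsilon_t)^{w(\tilde{e})} +
\sum_{i=1}^{M_l} \binom{M_l}{i} (1 - \varepsilon_t)^{N + i - M_l} \\
&= (1 - \varepsilon_t)^{w(\tilde{e})} +
(1 - \varepsilon_t)^{N - M_l} \left( (2 - \varepsilon_t)^{M_l} - 1 \right),
\end{aligned} \tag{A.8}
\]
where, in the first equality, we have exploited the fact that there are exactly
$\binom{M_l}{i}$ codewords $t$ of Sup($\tilde{e}$) such that $w_F(t) = i$, and,
in the last equality, we have used the binomial formula:
\[
(1 + x)^Q = \sum_{i=0}^{Q} \binom{Q}{i} x^i,
\]
with $Q$ an integer.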
Similarly, we upper bound |M | by Mh and combine with the lower bound of
Eq. (A.6) to get the following upper bound on the sum of Eq. (A.1):
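\[
\begin{aligned}
\sum_{t \in \mathrm{Sup}(\tilde{e})} (1 - \varepsilon_t)^{w(t)}
&\le \sum_{i=0}^{M_h} \sum_{\substack{t \in C;\, t \ge \tilde{e} \\ w_F(t) = i}}
(1 - \varepsilon_t)^{w(\tilde{e}) + i} \\
&= (1 - \varepsilon_t)^{w(\tilde{e})} \sum_{i=0}^{M_h} \binom{M_h}{i} (1 - \varepsilon_t)^{i} \\
&= (1 - \varepsilon_t)^{w(\tilde{e})} (2 - \varepsilon_t)^{M_h}.
\end{aligned} \tag{A.9}
\]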
The derivation of Ml and Mh is given in the following lemma proven at the end
of this appendix.
Lemma 4. (cardinality of a maximum set of free bit positions for
Sup($\tilde{e}$)). Let $C$ be a (N, K) systematic linear code and let $C^{\perp}$
be the (N, N−K) code orthogonal to $C$. Let $\tilde{e}$ be a codeword of $C$ and
$M$ be a maximum set of free bit positions for Sup($\tilde{e}$). Then, we have
the bounds
\[
|M| \ge M_l =
\begin{cases}
K - w(\tilde{e}) & \text{if } w(\tilde{e}) \le K, \\
0 & \text{otherwise,}
\end{cases} \tag{A.10}
\]
and, if $C$ satisfies Hypothesis 1,
\[
|M| \le M_h =
\begin{cases}
K - w(\tilde{e}) & \text{if } w(\tilde{e}) \le d(C^{\perp}) - 1, \\
K - d(C^{\perp}) + 1 & \text{if } d(C^{\perp}) \le w(\tilde{e}) \le N - K + d(C^{\perp}) - 1, \\
N - w(\tilde{e}) & \text{if } w(\tilde{e}) \ge N - K + d(C^{\perp}),
\end{cases} \tag{A.11}
\]
with $d(C^{\perp})$ denoting the minimum distance of $C^{\perp}$.
As a corollary of this lemma, |M | = K −w(e˜) whenever w(e˜) ≤ d (C⊥)− 1.
The lower bound is not specific to a certain code, whereas the upper bound
requires some information specific to the code, namely the minimum distance
of the code C⊥.
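Inserting the values of $M_l$ in Eq. (A.8) yields directly the lower bound
\[
\sum_{t \in \mathrm{Sup}(\tilde{e})} (1 - \varepsilon_t)^{w(t)} \ge
\begin{cases}
(1 - \varepsilon_t)^{w(\tilde{e})}
\left( 1 + (1 - \varepsilon_t)^{N-K} \left( (2 - \varepsilon_t)^{K - w(\tilde{e})} - 1 \right) \right)
& \text{if } w(\tilde{e}) \le K, \\
(1 - \varepsilon_t)^{w(\tilde{e})} & \text{otherwise.}
\end{cases} \tag{A.12}
\]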
We obtain a lower bound on εres by plugging Eq. (A.12) in Eq. (A.3):
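\[
\begin{aligned}
\varepsilon_{res} &\ge \frac{1}{2^K} \sum_{i=1}^{K}
\sum_{\substack{\tilde{e} \in C \\ w(\tilde{e}) = i}}
\varepsilon_t^{\,i} \left( 1 + (1 - \varepsilon_t)^{N-K} \left( (2 - \varepsilon_t)^{K-i} - 1 \right) \right)
+ \frac{1}{2^K} \sum_{i=K+1}^{N}
\sum_{\substack{\tilde{e} \in C \\ w(\tilde{e}) = i}} \varepsilon_t^{\,i} \\
&= \frac{1}{2^K} \left\{ \sum_{i=1}^{K} A_i \, \varepsilon_t^{\,i}
\left( 1 + (1 - \varepsilon_t)^{N-K} \left( (2 - \varepsilon_t)^{K-i} - 1 \right) \right)
+ \sum_{i=K+1}^{N} A_i \, \varepsilon_t^{\,i} \right\},
\end{aligned} \tag{A.13}
\]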
with Ai the number of codewords of weight i.
In order to avoid heavy notations, we do not develop the value of Mh needed
to upper bound εres. We plug Eq. (A.9) into Eq. (A.3), which yields
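\[
\begin{aligned}
\varepsilon_{res} &\le \frac{1}{2^K} \sum_{i=1}^{N}
\sum_{\substack{\tilde{e} \in C \\ w(\tilde{e}) = i}}
\varepsilon_t^{\,i} \, (2 - \varepsilon_t)^{M_h(i)} \\
&= \frac{1}{2^K} \sum_{i=1}^{N} A_i \, \varepsilon_t^{\,i} \, (2 - \varepsilon_t)^{M_h(i)},
\end{aligned} \tag{A.14}
\]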
with Mh depending on w(e˜) as indicated in Eq. (A.11).
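In Eq. (A.13), the exponent $N - K$ upper bounds the weight of the redundant
bits of $t$. Approximating the latter quantity to
$\left\lfloor \frac{N-K}{2} \right\rfloor$ yields the following approximation
of $\varepsilon_{res}$:
\[
\varepsilon_{res} \cong \frac{1}{2^K} \left\{ \sum_{i=1}^{K} A_i \, \varepsilon_t^{\,i}
\left( 1 + (1 - \varepsilon_t)^{\left\lfloor \frac{N-K}{2} \right\rfloor}
\left( (2 - \varepsilon_t)^{K-i} - 1 \right) \right)
+ \sum_{i=K+1}^{N} A_i \, \varepsilon_t^{\,i} \right\}. \tag{A.15}
\]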
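Eqs. (A.13), (A.14) and (A.15) only need the weight distribution of the code, so they are cheap to evaluate. The sketch below is one possible transcription; the function names, the dictionary representation of the weights A_i, and the numerical inputs are illustrative assumptions. The upper bound additionally presumes that the supplied code satisfies Hypothesis 1, which the code does not check. For the (5, 3) code of Table A.1, the weights are A_2 = 2, A_3 = 4, A_4 = 1, and enumerating its orthogonal code by hand gives a minimum distance of 3.

```python
def m_high(w, n, k, d_perp):
    """Upper bound M_h of Eq. (A.11) for an error vector of weight w."""
    if w <= d_perp - 1:
        return k - w
    if w <= n - k + d_perp - 1:
        return k - d_perp + 1
    return n - w

def lower_bound(a, n, k, eps_t, redundancy_exponent=None):
    """Eq. (A.13); a maps a weight i to the number A_i of codewords."""
    r = n - k if redundancy_exponent is None else redundancy_exponent
    total = 0.0
    for i in range(1, n + 1):
        term = a.get(i, 0) * eps_t ** i
        if i <= k:
            term *= 1 + (1 - eps_t) ** r * ((2 - eps_t) ** (k - i) - 1)
        total += term
    return total / 2 ** k

def approximation(a, n, k, eps_t):
    """Eq. (A.15): same form as (A.13) with exponent floor((N - K) / 2)."""
    return lower_bound(a, n, k, eps_t, redundancy_exponent=(n - k) // 2)

def upper_bound(a, n, k, d_perp, eps_t):
    """Eq. (A.14); valid only if the code satisfies Hypothesis 1 (not checked)."""
    return sum(a.get(i, 0) * eps_t ** i * (2 - eps_t) ** m_high(i, n, k, d_perp)
               for i in range(1, n + 1)) / 2 ** k

if __name__ == "__main__":
    weights = {2: 2, 3: 4, 4: 1}       # (5, 3) code of Table A.1
    print(lower_bound(weights, 5, 3, 0.1),
          approximation(weights, 5, 3, 0.1),
          upper_bound(weights, 5, 3, 3, 0.1))
```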
To conclude, we prove Lemma 4.
Proof. (Lemma 4). Given a codeword $\tilde{e}$, we construct a sequence of
$w_R(\tilde{e}) + 1$ vectors, $\{\tilde{e}^i\}_{i=0,...,w_R(\tilde{e})}$ with
$\tilde{e}^{w_R(\tilde{e})} = \tilde{e}$, that enables us to bound the cardinality
of a maximum set of free bit positions for
Sup$\left(\tilde{e}^{w_R(\tilde{e})}\right)$. Before defining the sequence, we
introduce some needed definitions.
We decompose the weight of $\tilde{e}$ into two contributions:
$w(\tilde{e}) = w_I(\tilde{e}) + w_R(\tilde{e})$, where $w_I(\tilde{e})$ is the
weight of the information bits of $\tilde{e}$ and $w_R(\tilde{e})$ is the weight
of the redundant bits of $\tilde{e}$. For convenience, we denote by $I$ the set
of information bit positions, i.e., $I = \{1, ..., K\}$, and by $R$ the set of
redundant bit positions, i.e., $R = \{1, ..., N - K\}$. Moreover, we define the
following two sets of bit positions.
Definition 17. (wrong information bit positions).
\[
W_I = \{ l \in I \mid \tilde{e}_l = 1 \}.
\]
Definition 18. (wrong redundant bit positions).
\[
W_R = \{ r \in R \mid \tilde{e}_{K+r} = 1 \}.
\]
We order the elements of $W_R$ arbitrarily:
$W_R = \{ r_1, ..., r_{w_R(\tilde{e})} \}$ and use this ordering to designate
the parity check constraints associated with the redundant bit positions $r_i$,
$i = 1, ..., w_R(\tilde{e})$, i.e., the $i$th parity check constraint is the one
defined by $\bigoplus_{l=1}^{K} x_l \cdot p_{l,r_i} = x_{K+r_i} = 1$.
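The initial vector of the sequence, $\tilde{e}^0$, contains 1 only in the
information bit positions $j$ where $\tilde{e}_j = 1$:
\[
\tilde{e}^0_j = 1 \text{ for all } j \in W_I, \text{ and} \quad
\tilde{e}^0_j = 0 \text{ for all } j \in \{1, ..., N\} \setminus W_I.
\]
The vector $\tilde{e}^i$, $1 \le i \le w_R(\tilde{e})$, is constructed from
$\tilde{e}^{i-1}$ as follows:
\[
\tilde{e}^i_j = \tilde{e}^{i-1}_j \text{ for all } j \in \{1, ..., N\} \setminus \{K + r_i\},
\text{ and} \quad \tilde{e}^i_{K+r_i} = 1.
\]
As $\tilde{e}^{i-1}_{K+r_i} = 0$, we observe that $\tilde{e}^i$ differs from
$\tilde{e}^{i-1}$ only in bit position $K + r_i$. Clearly, the last vector of
this sequence is $\tilde{e}^{w_R(\tilde{e})} = \tilde{e}$. Since
$w_R(\tilde{e}^0) = 0$, $I \setminus W_I$ is a maximum set of free bit positions
for Sup$(\tilde{e}^0)$. Then, at each iteration $i$, $1 \le i \le w_R(\tilde{e})$,
the $i$th parity check imposes
\[
\bigoplus_{j=1}^{K} x_j \cdot p_{j,r_i} = 1, \tag{A.16}
\]
which we can rewrite as
\[
\bigoplus_{j \in W_I} x_j \cdot p_{j,r_i} \oplus
\bigoplus_{j \in I \setminus W_I} x_j \cdot p_{j,r_i} = 1. \tag{A.17}
\]
Because for all $x \in \mathrm{Sup}(\tilde{e}^i)$ and for all bit positions
$j \in W_I$, $x_j \ge \tilde{e}^i_j = \tilde{e}_j = 1$, it holds that
\[
\bigoplus_{j \in W_I} x_j \cdot p_{j,r_i} =
\bigoplus_{j \in W_I} \tilde{e}_j \cdot p_{j,r_i} = 1.
\]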
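Therefore, we obtain
\[
\bigoplus_{j \in I \setminus W_I} x_j \cdot p_{j,r_i} = 0
\quad \text{for all } x \in \mathrm{Sup}(\tilde{e}^i). \tag{A.18}
\]
We call Eq. (A.18) the equation associated with parity check $r_i$. Any codeword
$x \in \mathrm{Sup}(\tilde{e}^{w_R(\tilde{e})})$ satisfies $w_R(\tilde{e})$
equations such as Eq. (A.18). It follows that the maximum number $|M|$ of free
bit positions for Sup$(\tilde{e}^{w_R(\tilde{e})})$ is the maximum number of
variables $x_j$, $j \in I \setminus W_I$, that can be assigned freely while
satisfying the following system of $w_R(\tilde{e})$ equations:
\[
\begin{aligned}
\bigoplus_{j \in I \setminus W_I} x_j \cdot p_{j,r_1} &= 0, \\
\bigoplus_{j \in I \setminus W_I} x_j \cdot p_{j,r_2} &= 0, \\
&\;\;\vdots \\
\bigoplus_{j \in I \setminus W_I} x_j \cdot p_{j,r_{w_R(\tilde{e})}} &= 0.
\end{aligned} \tag{A.19}
\]
Let $T$ be the number of linearly independent equations in the system (A.19).
Then, $|M|$ is the difference between the number of information bits not affected
by an error, $K - w_I(\tilde{e})$, and $T$:
\[
|M| = K - w_I(\tilde{e}) - T. \tag{A.20}
\]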
Obviously, T ≤ wR(e˜). Hence Eq. (A.20) yields
|M | ≥ max (K − w(e˜), 0) , (A.21)
which is Eq. (A.10).
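Eq. (A.20) turns the search for a maximum set of free bit positions into a rank computation over GF(2). The following is a minimal sketch, assuming the (5, 3) code of Table A.1, whose generator matrix is (I_K | P) with the matrix P given in the code; the helper names are illustrative. T is obtained by Gaussian elimination on the system (A.19), restricted to the information positions outside W_I.

```python
# Parity part P of G = (I_K | P) for the code of Table A.1:
# row j = information bit x_{j+1}, column r = redundant bit x_{K+r+1}.
P = [[1, 0],
     [1, 1],
     [0, 1]]

def gf2_rank(rows):
    """Rank over GF(2) of a list of equal-length 0/1 rows."""
    rows = [r[:] for r in rows]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def free_positions_count(e, p):
    """|M| = K - w_I(e) - T, with T the rank of the system (A.19)."""
    k = len(p)
    free_info = [j for j in range(k) if e[j] == 0]            # I \ W_I
    wrong_red = [r for r in range(len(p[0])) if e[k + r] == 1]  # W_R
    system = [[p[j][r] for j in free_info] for r in wrong_red]
    return len(free_info) - gf2_rank(system)

if __name__ == "__main__":
    for e in [(1, 0, 0, 1, 0), (0, 0, 1, 0, 1), (0, 1, 0, 1, 1)]:
        print(e, free_positions_count(e, P))
```

For a codeword of this small code, the result can be cross-checked against the size of Sup, which contains 2 to the power |M| codewords.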
In order to obtain an upper bound on $|M|$, we need to lower bound $T$. We
proceed as follows. First, we observe that if $T = w_R(\tilde{e})$, all equations
in the system (A.19) are independent and thus $|M| = K - w(\tilde{e})$. From now
on, we assume $T < w_R(\tilde{e})$. Let $i_0$ be the index of the first parity
check for which the associated equation is linearly dependent on the ones
associated with the previous parity checks. That is, the equation
$\bigoplus_{j \in I \setminus W_I} x_j \cdot p_{j,r_{i_0}} = 0$ is a linear
combination of the equations associated with parity checks
$r_{q_1}, ..., r_{q_Q}$ for some indices $q_1, ..., q_Q$, with $Q \le i_0 - 1$.
Let $\hat{C}$ be the systematic code defined by the generator matrix
$\hat{G} = \left( I_K \;\; \hat{P} \right)$, where $I_K$ is the identity matrix
of size $K$ and where the matrix $\hat{P}$ is identical to $P$ except in the
column $r_{i_0}$, i.e., $\hat{p}_r = p_r$ for all $r \in R \setminus \{r_{i_0}\}$.
The column $r_{i_0}$ of $\hat{P}$ is obtained as
\[
\hat{p}_{r_{i_0}} = p_{r_{i_0}} \oplus p_{r_{q_1}} \oplus ... \oplus p_{r_{q_Q}}. \tag{A.22}
\]
Because $\hat{C}$ is obtained from $C$ by a so-called elementary operation
(namely, the column $r_{i_0}$ of $\hat{P}$ is a linear combination of some
columns of $P$), the two codes are said to be equivalent, that is, they have the
same coding properties. In particular, their minimum distance is the same.
Because $P$ is full rank, so is $\hat{P}$. As a result, the column $r_{i_0}$ of
$\hat{P}$, $\hat{p}_{r_{i_0}}$, is not the zero vector. Let $\rho$ be the number
of 1s in this column: $\rho = \sum_{j=1}^{K} \hat{p}_{j,r_{i_0}}$. Since the
parity check $i_0$ is linearly dependent on parity checks $r_{q_1}, ..., r_{q_Q}$,
Eq. (A.22) imposes that $\hat{p}_{j,r_{i_0}} = 0$ for all
$j \in I \setminus W_I$. It follows that
\[
\rho \le w_I(\tilde{e}). \tag{A.23}
\]
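Let $\hat{H}$ be the parity check matrix of $\hat{C}$:
$\hat{H} = \left( \hat{P}^T \;\; I_{N-K} \right)$. $\hat{H}$ is identical to the
parity check matrix $H$ of $C$, except the row $\hat{h}_{r_{i_0}}$:
\[
\hat{h}_r = h_r \text{ for all } r \in R \setminus \{r_{i_0}\}, \qquad
\hat{h}_{r_{i_0}} = h_{r_{i_0}} \oplus h_{r_{q_1}} \oplus ... \oplus h_{r_{q_Q}}.
\]
Since the rows of $\hat{H}$ are codewords of $\hat{C}^{\perp}$, the minimum
distance of $\hat{C}^{\perp}$ is upper bounded by the number of 1s in the row
$\hat{h}_{r_{i_0}}$ of $\hat{H}$. Therefore,
\[
d\left(\hat{C}^{\perp}\right) \le w(\hat{h}_{r_{i_0}}) = \rho + 1 + Q.
\]
In the last equality, we have used the fact that $\hat{h}_{r_{i_0}}$ has exactly
$\rho$ 1s in the first $K$ bit positions and exactly $1 + Q$ in the remaining
$N - K$ bit positions. Because, in addition, $Q \le i_0 - 1$, we obtain
\[
d\left(\hat{C}^{\perp}\right) \le \rho + i_0. \tag{A.24}
\]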
Combining Eqs. (A.23) and (A.24), we get
\[
d\left(\hat{C}^{\perp}\right) \le w_I(\tilde{e}) + i_0.
\]
Now, the number $T$ of linearly independent equations in Eq. (A.19) is at least
$i_0 - 1$, by definition of $i_0$. Thus, $i_0 \le T + 1$ and
\[
d\left(\hat{C}^{\perp}\right) \le w_I(\tilde{e}) + T + 1. \tag{A.25}
\]
By adding $w_R(\tilde{e})$ on both sides of Eq. (A.25) and reordering the terms,
we finally obtain
\[
w_R(\tilde{e}) - T \le w(\tilde{e}) - d\left(\hat{C}^{\perp}\right) + 1.
\]
Since the codes $\hat{C}^{\perp}$ and $C^{\perp}$ are equivalent, their minimum
distance is the same. Thus
\[
w_R(\tilde{e}) - T \le w(\tilde{e}) - d\left(C^{\perp}\right) + 1. \tag{A.26}
\]
By definition of $T$, $w_R(\tilde{e}) - T$ is the number of linearly dependent
equations in Eq. (A.19). From Eq. (A.26), we deduce that, if
$w(\tilde{e}) \le d(C^{\perp}) - 1$, all equations in Eq. (A.19) are linearly
independent and therefore $|M| = K - w(\tilde{e})$.
In addition, if $w(\tilde{e}) \ge d(C^{\perp})$, we have the following two upper
bounds. First, $N - w(\tilde{e})$ is a trivial upper bound on $|M|$ because
$w(\tilde{e})$ bit positions are forced to 1 and thus at most the
$N - w(\tilde{e})$ remaining ones can constitute a set of free bit positions.
Secondly, we have deduced that $|M| = K - d(\hat{C}^{\perp}) + 1$ if
$w(\tilde{e}) = d(\hat{C}^{\perp}) - 1$. Since $|M|$ is a decreasing function of
$w(\tilde{e})$, we also have $|M| \le K - d(C^{\perp}) + 1$ whenever
$w(\tilde{e}) \ge d(\hat{C}^{\perp})$. Combining these two upper bounds yields
$|M| \le \min\left(K - d(C^{\perp}) + 1,\; N - w(\tilde{e})\right)$. Finally, the
overall upper bound as a function of $w(\tilde{e})$ reads
\[
|M| \le
\begin{cases}
K - w(\tilde{e}) & \text{if } w(\tilde{e}) \le d(C^{\perp}) - 1, \\
K - d(C^{\perp}) + 1 & \text{if } d(C^{\perp}) \le w(\tilde{e}) \le N - K + d(C^{\perp}) - 1, \\
N - w(\tilde{e}) & \text{if } w(\tilde{e}) \ge N - K + d(C^{\perp}).
\end{cases} \tag{A.27}
\]
Appendix B
Residual Error Rate of
Alternating-Phase Encoding With
Linear Codes Over the Timing
Error Channel
We proceed as in Appendix A. We consider a (N + 1, K + 1) code C and recall that the same assumptions as the ones stated in Sec. 5.5.1
are made: C is linear, systematic, and its codewords are equally likely to be
transmitted. Furthermore, we assume that Hypothesis 1 is verified. We point
out that the symbol K still denotes the number of information bits, excluding
thus the alternating-phase bit. Without loss of generality, we can assume that
the alternating-phase bit is located at the least significant bit, i.e., x1 is the
alternating-phase bit of a codeword x ∈ C. We first introduce the following
notations.
Definition 19. ($\mathrm{LSB}_0(.)$ and $\mathrm{LSB}_1(.)$). Let $S$ be a set of
vectors of $\{0, 1\}^N$, for a certain integer $N$. We define $\mathrm{LSB}_0(S)$
(respectively $\mathrm{LSB}_1(S)$) as the subset of vectors of $S$ which have 0
(respectively 1) as the least significant bit:
\[
\mathrm{LSB}_i(S) = \{ s \in S \mid s_1 = i \}; \quad i \in \{0, 1\}.
\]
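In order to compute $\varepsilon_{res}$, we take the same starting point as in
Eq. (5.5):
\[
\varepsilon_{res} = \sum_{\tilde{e} \in C \setminus \{\vec{0}\}} \;
\sum_{\substack{t \in C \\ t \ge \tilde{e}}} P(\tilde{e} \mid t) \, P(t). \tag{B.1}
\]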
The alternating-phase encoding imposes two constraints:
• The alternating-phase bit is not transmitted and hence is never corrupted
by an error. Consequently, e˜1 = 0 for all e˜. Henceforth, the error vector e˜
belongs to LSB0 (C).
• t1 = 1, since the least significant bits of two consecutive codewords are
always different. Hence, the transition vector t belongs to LSB1 (C).
By linearity of the code C and due to the codeword equiprobability as-
sumption, the transition vector t is uniformly distributed over LSB1 (C),
i.e.,
\[
P(t) =
\begin{cases}
\dfrac{1}{|\mathrm{LSB}_1(C)|} = \dfrac{1}{2^K} & \text{if } t \in \mathrm{LSB}_1(C), \\
0 & \text{otherwise.}
\end{cases}
\]
In the last equation, we have used the fact that half of the codewords of the
(N + 1, K + 1) code C have 1 as least significant bit. Plugging the obtained
expression into Eq. (B.1) yields
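\[
\begin{aligned}
\varepsilon_{res} &= \sum_{\tilde{e} \in \mathrm{LSB}_0(C) \setminus \{\vec{0}\}} \;
\sum_{t \in \mathrm{LSB}_1(\mathrm{Sup}(\tilde{e}))} P(t) \, P(\tilde{e} \mid t) \\
&= \frac{1}{2^K} \sum_{\tilde{e} \in \mathrm{LSB}_0(C) \setminus \{\vec{0}\}} \;
\sum_{t \in \mathrm{LSB}_1(\mathrm{Sup}(\tilde{e}))} P(\tilde{e} \mid t).
\end{aligned} \tag{B.2}
\]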
Now, in order to evaluate P (e˜ | t), we make the same reasoning as the one made
to obtain Eq. (A.2). The only difference is that, deterministically, the realisation
of the error process e is 0 for the least significant bit (i.e., e1 = 0), since the
alternating-phase bit is not transmitted. As a result, we remove a factor 1− εt
from Eq. (A.2) and obtain
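\[
\begin{aligned}
\varepsilon_{res} &= \frac{1}{2^K} \sum_{\tilde{e} \in \mathrm{LSB}_0(C) \setminus \{\vec{0}\}} \;
\sum_{t \in \mathrm{LSB}_1(\mathrm{Sup}(\tilde{e}))} P(\tilde{e} \mid t) \\
&= \frac{1}{2^K} \sum_{\tilde{e} \in \mathrm{LSB}_0(C) \setminus \{\vec{0}\}}
\varepsilon_t^{w(\tilde{e})}
\sum_{t \in \mathrm{LSB}_1(\mathrm{Sup}(\tilde{e}))} (1 - \varepsilon_t)^{w(t) - 1 - w(\tilde{e})}.
\end{aligned} \tag{B.3}
\]
We focus now on bounding the sum
\[
\sum_{t \in \mathrm{LSB}_1(\mathrm{Sup}(\tilde{e}))} (1 - \varepsilon_t)^{w(t) - 1 - w(\tilde{e})}. \tag{B.4}
\]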
Like in Appendix A, we bound the size of a maximum set of free bit positions
for the set LSB1 (Sup(e˜)) (the proof of Lemma 5 is given at the end of this
appendix).
Lemma 5. (maximum set of free bit positions for LSB1 (Sup(e˜))). Let
C be a (N + 1, K + 1) systematic linear code satisfying Hypothesis 1 (stated in
Appendix A) and let e˜ be a codeword of C such that e˜1 = 0. The upper bound
of Eq. (A.11) also holds for the set LSB1 (Sup(e˜)).
Contrary to Lemma 4, we have only derived an upper bound since
LSB1 (Sup(e˜)) may be an empty set.
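The fact that LSB1(Sup(e˜)) may be empty is easy to observe on a toy case. The sketch below reuses the (5, 3) code of Table A.1 purely for illustration, reading it as a (N + 1, K + 1) code with N = 4, K = 2 and x1 as the alternating-phase bit; this reading is an assumption made for the example, not a code studied in the thesis.

```python
from itertools import product

def codewords():
    """Code of Table A.1, read here as (N + 1, K + 1) with x1 the phase bit."""
    return [(a, b, c, a ^ b, b ^ c) for a, b, c in product((0, 1), repeat=3)]

def lsb(vectors, bit):
    """LSB_bit(S): vectors of S whose least significant bit x1 equals bit."""
    return [v for v in vectors if v[0] == bit]

def sup(code, e):
    """Sup(e): all codewords t with t >= e componentwise."""
    return [t for t in code if all(ti >= ei for ti, ei in zip(t, e))]

if __name__ == "__main__":
    code = codewords()
    for e in lsb(code, 0):
        if any(e):                       # skip the all-zero vector
            print(e, lsb(sup(code, e), 1))
```

In this toy case, only e˜ = (0, 0, 1, 0, 1) has a non-empty set LSB1(Sup(e˜)); for the other non-zero error vectors the set is empty, so no non-trivial lower bound on the number of free bit positions can hold in general.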
We continue by upper bounding the summand of Eq. (B.4). To do so, we
lower-bound w(t), t ∈ LSB1 (Sup(e˜)). We write
w(t) ≥ 1 + w(e˜) + wF(t), (B.5)
with the first term of the RHS being the 1 in the least significant bit position,
the second term the required 1 in bit positions where e˜ has 1 and the last term
the 1 in the free bit positions of LSB1 (Sup(e˜)). Moreover, we know that any
two codewords of the (N + 1, K + 1) code C differ in at least d bit positions,
where d is the minimum distance of the code. In particular, any codeword
t ∈ LSB1 (Sup(e˜)) differs from e˜ in at least d bit positions. Because, in addition,
t ≥ e˜, there exist at least d bit positions j such that tj = 1 and e˜j = 0.
Therefore, the following holds:
w(t) ≥ w(e˜) + d. (B.6)
By combining Eqs. (B.5) and (B.6), we obtain
w(t)− w(e˜)− 1 ≥ max (d− 1, wF(t)) . (B.7)
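Introducing the bounds of Lemma 5 and of Eq. (B.7) in Eq. (B.4) yields
\[
\begin{aligned}
\sum_{t \in \mathrm{LSB}_1(\mathrm{Sup}(\tilde{e}))} (1 - \varepsilon_t)^{w(t) - w(\tilde{e}) - 1}
&= \sum_{\substack{t \in \mathrm{LSB}_1(\mathrm{Sup}(\tilde{e})) \\ w_F(t) = 0}}
(1 - \varepsilon_t)^{w(t) - w(\tilde{e}) - 1}
+ \sum_{l=1}^{M_h} \sum_{\substack{t \in \mathrm{LSB}_1(\mathrm{Sup}(\tilde{e})) \\ w_F(t) = l}}
(1 - \varepsilon_t)^{w(t) - w(\tilde{e}) - 1} \\
&\le (1 - \varepsilon_t)^{d-1} + \sum_{l=1}^{M_h} \binom{M_h}{l} (1 - \varepsilon_t)^{l} \\
&= (1 - \varepsilon_t)^{d-1} + \left( (2 - \varepsilon_t)^{M_h} - 1 \right),
\end{aligned} \tag{B.8}
\]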
where Mh depends on w(e˜) and is defined in Eq. (A.11). We obtain an up-
per bound on εres by combining Eqs. (B.3) and (B.8). Again, for the sake of
readability, we do not develop further the value of Mh and obtain
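\[
\begin{aligned}
\varepsilon_{res} &= \frac{1}{2^K} \sum_{\tilde{e} \in \mathrm{LSB}_0(C) \setminus \{\vec{0}\}}
\varepsilon_t^{w(\tilde{e})}
\sum_{t \in \mathrm{LSB}_1(\mathrm{Sup}(\tilde{e}))} (1 - \varepsilon_t)^{w(t) - w(\tilde{e}) - 1} \\
&\le \frac{1}{2^K} \sum_{i=1}^{N}
\sum_{\substack{\tilde{e} \in \mathrm{LSB}_0(C) \\ w(\tilde{e}) = i}}
\varepsilon_t^{\,i} \left( (1 - \varepsilon_t)^{d-1} + \left( (2 - \varepsilon_t)^{M_h(i)} - 1 \right) \right) \\
&= \frac{1}{2^K} \sum_{i=1}^{N} A_i \, \varepsilon_t^{\,i}
\left( (1 - \varepsilon_t)^{d-1} + \left( (2 - \varepsilon_t)^{M_h(i)} - 1 \right) \right),
\end{aligned} \tag{B.9}
\]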
where Ai is the number of weight-i codewords in the (N, K) code C, and where
d is the minimum distance of the (N + 1, K + 1) code.
We obtain an approximation of $\varepsilon_{res}$ as follows. First, we
approximate the quantity $w(t) - w(\tilde{e})$ for
$t \in \mathrm{LSB}_1(\mathrm{Sup}(\tilde{e}))$, which is the number of bit
positions $i$ such that $t_i = 1$ and $\tilde{e}_i = 0$. We write
$w(t) - w(\tilde{e}) = w_I(t) - w_I(\tilde{e}) + w_R(t) - w_R(\tilde{e})$
and approximate $w_I(t) - w_I(\tilde{e})$ to $1 + w_F(t)$. Since
$w_R(t) - w_R(\tilde{e})$ is an integer between 0 and $N - K$, we approximate the
latter quantity to the arithmetic average over the interval,
$\left\lfloor \frac{N-K}{2} \right\rfloor$, which yields
\[
w(t) - w(\tilde{e}) \cong 1 + w_F(t) + \left\lfloor \frac{N-K}{2} \right\rfloor.
\]
This approximation leads to
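\[
\varepsilon_{res} \cong \frac{1}{2^K} \sum_{i=1}^{K} A_i \, \varepsilon_t^{\,i}
\left\{ (1 - \varepsilon_t)^{d-1} +
(1 - \varepsilon_t)^{\left\lfloor \frac{N-K}{2} \right\rfloor}
\left( (2 - \varepsilon_t)^{K-i} - 1 \right) \right\}, \tag{B.10}
\]
with $A_i$ the number of weight-$i$ codewords in the (N, K) code C and $d$ the
minimum distance of the (N + 1, K + 1) code.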
To conclude, we prove Lemma 5.
Proof. (Lemma 5).
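We proceed as in the proof of Lemma 4. In particular, we consider the
system of $w_R(\tilde{e})$ equations constraining the bit positions that do not
belong to $W_I$:
\[
\begin{aligned}
x_1 \cdot p_{1,r_1} \oplus \bigoplus_{j \in I \setminus (W_I \cup \{1\})} x_j \cdot p_{j,r_1} &= 0, \\
x_1 \cdot p_{1,r_2} \oplus \bigoplus_{j \in I \setminus (W_I \cup \{1\})} x_j \cdot p_{j,r_2} &= 0, \\
&\;\;\vdots \\
x_1 \cdot p_{1,r_{w_R(\tilde{e})}} \oplus
\bigoplus_{j \in I \setminus (W_I \cup \{1\})} x_j \cdot p_{j,r_{w_R(\tilde{e})}} &= 0.
\end{aligned} \tag{B.11}
\]
The number of free bit positions for $\mathrm{LSB}_1(\mathrm{Sup}(\tilde{e}))$ is
the maximum number of variables that can be assigned freely while satisfying the
system of Eq. (B.11) and such that $x_1 = 1$. We thus rewrite the system of
Eq. (B.11) as
\[
\begin{aligned}
\bigoplus_{j \in I \setminus (W_I \cup \{1\})} x_j \cdot p_{j,r_1} &= p_{1,r_1}, \\
\bigoplus_{j \in I \setminus (W_I \cup \{1\})} x_j \cdot p_{j,r_2} &= p_{1,r_2}, \\
&\;\;\vdots \\
\bigoplus_{j \in I \setminus (W_I \cup \{1\})} x_j \cdot p_{j,r_{w_R(\tilde{e})}} &= p_{1,r_{w_R(\tilde{e})}}.
\end{aligned} \tag{B.12}
\]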
If the non-homogeneous system (B.12) has a solution (which may not be the
case if the equations associated with parity checks r1, ..., rwR(e˜) require x1 = 0),
then the number of its solutions is the same as that of the homogeneous system
(A.19). As a result, |M | can be upper bounded by Eq. (A.11) as well. However,
there exists no lower bound other than the trivial one (namely, 0) because the
system (B.12) may have no solution.
Appendix C
Residual Error Rate of the Berger
Code Over the Binary Symmetric
Channel
We compute the residual error rate of differential encoding using the Berger
code over the BSC. Let $C$ be a Berger code with $K$ information bits (the
Berger code requires $K + 1$ to be a power of two). The Berger code is
systematic; we denote by $Q = \log_2(K + 1)$ the number of redundant bits. Thus,
we write for all $c \in C$, $c = (c_I \mid c_R)$, where $c_R \in \{0, 1\}^Q$ is
the $Q$-bit redundant vector of the codeword $c$ computed from its information
part $c_I \in \{0, 1\}^K$. We partition $C$ into $K + 1$ subsets:
$C = \bigcup_{i=0}^{K} C_i$ with $C_i = \{ c \in C \mid w(c_I) = i \}$. By
definition of the Berger code, $c_R = B(K - i)$ for all $c \in C_i$, where
$B(k)$, $k \in \{0, ..., K\}$, is the binary value of the integer $k$ coded on
$Q$ bits. We denote by $t \in C$ the transition vector, and by $\tilde{t}$ the
sampled transition vector: $\tilde{t} = t \oplus e$, with $e$ the additive error
vector. The residual error rate is given by
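\[
\varepsilon_{res,a} = \sum_{\tilde{t} \in C \setminus \{t\}} P(\tilde{t}), \tag{C.1}
\]
with $P(\tilde{t})$ the probability of occurrence for $\tilde{t}$. By
conditioning on the code subset $C_i$ to which the transition vector $t$
belongs, we get
\[
\begin{aligned}
\varepsilon_{res,a} &= \sum_{j=0}^{K} \sum_{\tilde{t} \in C_j \setminus \{t\}} P(\tilde{t}) \\
&= \sum_{i=0}^{K} \sum_{j=0}^{K} \sum_{\tilde{t} \in C_j \setminus \{t\}}
P(\tilde{t} \mid t \in C_i) \, P(t \in C_i) \\
&= \sum_{i=0}^{K} \frac{\binom{K}{i}}{2^K} \sum_{j=0}^{K}
\sum_{\tilde{t} \in C_j \setminus \{t\}} P(\tilde{t} \mid t \in C_i),
\end{aligned} \tag{C.2}
\]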
where we have used the codeword equiprobability assumption in the last
equality. We focus now on the sum
$\sum_{\tilde{t} \in C_j \setminus \{t\}} P(\tilde{t} \mid t \in C_i)$ and
compute it in the following lemma.
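Lemma 6. Denote by $F(i, j, \varepsilon_a)$ the sum
$\sum_{\tilde{t} \in C_j \setminus \{t\}} P(\tilde{t} \mid t \in C_i)$. We have
\[
F(i, j, \varepsilon_a) = \varepsilon_a^{w(B(i) \oplus B(j))}
(1 - \varepsilon_a)^{Q - w(B(i) \oplus B(j))} \cdot
\begin{cases}
T_1 & \text{if } j > i, \\
T_2 & \text{if } j = i, \\
T_3 & \text{if } j < i,
\end{cases}
\]
where $Q = \log_2(K + 1)$ and $T_1$, $T_2$, $T_3$ are defined respectively in
Eqs. (C.3), (C.4), and (C.5):
\[
T_1 = \sum_{k=0}^{\min(i, K-j)} \binom{i}{k} \binom{K-i}{j-i+k} \cdot
\varepsilon_a^{j-i+2k} (1 - \varepsilon_a)^{K-(j-i+2k)}, \tag{C.3}
\]
\[
T_2 = \sum_{k=1}^{\min(i, K-i)} \binom{i}{k} \binom{K-i}{k} \cdot
\varepsilon_a^{2k} (1 - \varepsilon_a)^{K-2k}, \tag{C.4}
\]
and
\[
T_3 = \sum_{k=0}^{\min(j, K-i)} \binom{i}{i-j+k} \binom{K-i}{k} \cdot
\varepsilon_a^{i-j+2k} (1 - \varepsilon_a)^{K-(i-j+2k)}. \tag{C.5}
\]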
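The closed form of Lemma 6 can be compared with a direct enumeration on a small Berger code. The following sketch assumes K = 3 information bits (hence Q = 2 check bits) and an arbitrary additive error probability; the function names and the most-significant-bit-first ordering of the check bits are illustrative choices. The enumeration sums, over all codewords of C_j other than t, the BSC probability of the error pattern that turns t into that codeword.

```python
from itertools import product
from math import comb, log2

def berger_codeword(info):
    """Append B(K - w(info)): the number of 0s in info, coded on Q bits."""
    k = len(info)
    q = round(log2(k + 1))
    zeros = k - sum(info)
    check = tuple((zeros >> (q - 1 - b)) & 1 for b in range(q))
    return tuple(info) + check

def info_term(i, j, k, eps):
    """T1, T2 or T3 of Eqs. (C.3)-(C.5), selected by the order of i and j."""
    q = 1 - eps
    if j > i:
        return sum(comb(i, m) * comb(k - i, j - i + m)
                   * eps ** (j - i + 2 * m) * q ** (k - (j - i + 2 * m))
                   for m in range(0, min(i, k - j) + 1))
    if j == i:
        return sum(comb(i, m) * comb(k - i, m) * eps ** (2 * m) * q ** (k - 2 * m)
                   for m in range(1, min(i, k - i) + 1))
    return sum(comb(i, i - j + m) * comb(k - i, m)
               * eps ** (i - j + 2 * m) * q ** (k - (i - j + 2 * m))
               for m in range(0, min(j, k - i) + 1))

def f_closed_form(i, j, k, eps):
    """F(i, j, eps) as stated in Lemma 6."""
    q_bits = round(log2(k + 1))
    dist = bin(i ^ j).count("1")          # w(B(i) xor B(j))
    return eps ** dist * (1 - eps) ** (q_bits - dist) * info_term(i, j, k, eps)

def f_brute_force(i, j, k, eps):
    """Enumerate C_j \\ {t} for one fixed t in C_i."""
    t = berger_codeword((1,) * i + (0,) * (k - i))
    total = 0.0
    for info in product((0, 1), repeat=k):
        cand = berger_codeword(info)
        if sum(info) != j or cand == t:
            continue
        d = sum(a != b for a, b in zip(t, cand))
        total += eps ** d * (1 - eps) ** (len(t) - d)
    return total

if __name__ == "__main__":
    print(f_closed_form(1, 2, 3, 0.2), f_brute_force(1, 2, 3, 0.2))
```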
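Proof. The redundant part of any $\tilde{t} \in C_j$ can equivalently be written
as $\tilde{t}_R = B(j) \oplus \vec{1}$. As a result, the redundant part of any
$\tilde{t} \in C_j$ differs from the redundant part of $t$ in exactly
$w(B(i) \oplus B(j))$ bit positions. We thus write, for any $\tilde{t} \in C_j$,
\[
P(\tilde{t} \mid t \in C_i) = P(\tilde{t}_I \mid t \in C_i) \cdot
\varepsilon_a^{w(B(i) \oplus B(j))} (1 - \varepsilon_a)^{Q - w(B(i) \oplus B(j))},
\]
so that
\[
\sum_{\tilde{t} \in C_j \setminus \{t\}} P(\tilde{t} \mid t \in C_i) =
\varepsilon_a^{w(B(i) \oplus B(j))} (1 - \varepsilon_a)^{Q - w(B(i) \oplus B(j))} \cdot
\sum_{\tilde{t} \in C_j \setminus \{t\}} P(\tilde{t}_I \mid t \in C_i).
\]
Now, we work on the sum
\[
\sum_{\tilde{t} \in C_j \setminus \{t\}} P(\tilde{t}_I \mid t \in C_i).
\]
By definition of $C_i$ and $C_j$, we get
\[
\sum_{\tilde{t} \in C_j \setminus \{t\}} P(\tilde{t}_I \mid t \in C_i) =
\sum_{\substack{\tilde{t}_I \in \{0,1\}^K \setminus \{t_I\} \\ w(\tilde{t}_I) = j}}
P(\tilde{t}_I \mid w(t_I) = i).
\]
Since $\tilde{t}_I = t_I \oplus e_I$, we rewrite the last sum as
\[
\sum_{\substack{\tilde{t}_I \in \{0,1\}^K \setminus \{t_I\} \\ w(\tilde{t}_I) = j}}
\varepsilon_a^{w(\tilde{t}_I \oplus t_I)} \cdot (1 - \varepsilon_a)^{K - w(\tilde{t}_I \oplus t_I)}.
\]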
We consider first the case $i = j$. Let $u \in \{0, 1\}^K \setminus \{t_I\}$ be a vector such that
$w(u) = i$, and let $k$ be the number of information bit positions $p$ such that $t_I^p = 1$
and $u^p = 0$. Clearly, we have $k \le i$. Because $w(u) = w(t_I) = i$, there exist
$k$ other information bit positions $q$ such that $t_I^q = 0$ and $u^q = 1$. As a result,
$k \le K - i$. For a given value of $k$, there are exactly $\binom{i}{k} \cdot \binom{K-i}{k}$ vectors
$\tilde{t}_I \ne t_I$ such that $w(\tilde{t}_I \oplus t_I) = 2k$, with $1 \le k \le \min(i, K - i)$. As a result, we
can write
\[
\sum_{\tilde{t} \in C_i \setminus \{t\}} P(\tilde{t}_I \mid w(t_I) = i)
= \sum_{k=1}^{\min(i, K-i)} \binom{i}{k} \binom{K-i}{k} \cdot \varepsilon_a^{2k} (1 - \varepsilon_a)^{K-2k}.
\]
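By a similar reasoning, we obtain, for $i < j$,
\[
\sum_{\tilde{t} \in C_j \setminus \{t\}} P(\tilde{t}_I \mid w(t_I) = i)
= \sum_{k=0}^{\min(i, K-j)} \binom{i}{k} \binom{K-i}{j-i+k} \cdot \varepsilon_a^{j-i+2k} (1 - \varepsilon_a)^{K-(j-i+2k)},
\]
and, finally, for $i > j$,
\[
\sum_{\tilde{t} \in C_j \setminus \{t\}} P(\tilde{t}_I \mid w(t_I) = i)
= \sum_{k=0}^{\min(j, K-i)} \binom{i}{i-j+k} \binom{K-i}{k} \cdot \varepsilon_a^{i-j+2k} (1 - \varepsilon_a)^{K-(i-j+2k)}.
\]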
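As an informal sanity check of Lemma 6, the closed forms (C.3) to (C.5) can be compared against a brute-force enumeration for a small K. The Python sketch below is illustrative only and is not part of the original derivation. All function names are hypothetical, K is assumed to be of the form 2^Q − 1 so that Q = log2(K + 1) is an integer, and the information part of t is taken, without loss of generality, as i ones followed by K − i zeros.

```python
from itertools import product
from math import comb


def info_part_sum(i, j, K, eps):
    """T1, T2 or T3 of Lemma 6, selected by the sign of j - i."""
    if j > i:    # T1, Eq. (C.3)
        terms = [(comb(i, k) * comb(K - i, j - i + k), j - i + 2 * k)
                 for k in range(0, min(i, K - j) + 1)]
    elif j == i:  # T2, Eq. (C.4); k starts at 1 because t itself is excluded
        terms = [(comb(i, k) * comb(K - i, k), 2 * k)
                 for k in range(1, min(i, K - i) + 1)]
    else:        # T3, Eq. (C.5)
        terms = [(comb(i, i - j + k) * comb(K - i, k), i - j + 2 * k)
                 for k in range(0, min(j, K - i) + 1)]
    # Each term: (number of vectors) * eps^(flipped bits) * (1 - eps)^(intact bits)
    return sum(n * eps ** d * (1 - eps) ** (K - d) for n, d in terms)


def lemma6_F(i, j, K, eps):
    """F(i, j, eps) of Lemma 6, including the factor for the redundant part."""
    Q = (K + 1).bit_length() - 1     # Q = log2(K + 1), assuming K = 2^Q - 1
    w = bin(i ^ j).count("1")        # w(B(i) xor B(j))
    return eps ** w * (1 - eps) ** (Q - w) * info_part_sum(i, j, K, eps)


def residual_error_rate(K, eps):
    """Eq. (C.2) under the codeword equiprobability assumption."""
    return sum(comb(K, i) / 2 ** K * sum(lemma6_F(i, j, K, eps) for j in range(K + 1))
               for i in range(K + 1))


def brute_force_info_part_sum(i, j, K, eps):
    """Enumerate all K-bit vectors of weight j other than t_I."""
    t_info = (1,) * i + (0,) * (K - i)
    total = 0.0
    for u in product((0, 1), repeat=K):
        if u == t_info or sum(u) != j:
            continue
        d = sum(a != b for a, b in zip(u, t_info))  # Hamming distance to t_I
        total += eps ** d * (1 - eps) ** (K - d)
    return total


if __name__ == "__main__":
    K, eps = 7, 0.1
    worst = max(abs(info_part_sum(i, j, K, eps) - brute_force_info_part_sum(i, j, K, eps))
                for i in range(K + 1) for j in range(K + 1))
    print("largest closed-form vs brute-force gap:", worst)
    print("residual error rate:", residual_error_rate(K, eps))
```

The enumeration visits 2^K vectors for every pair (i, j), so it is only practical for small K; the closed forms are what one would use for realistic word widths.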
Appendix D
Qualitative Comparison of Razor
Flip-Flops and Codes
This appendix shows the complementarity of Razor flip-flops and soft SSC
by qualitatively comparing their detection capabilities in the delay-voltage
plane. That is, we express the residual and detected error probabilities of each
checker as a function of the link supply voltage $v_{ch}$ and cycle time $T_c$. To do so,
we use the bit and word error rate models introduced in Chapter 3, which we
need to express the metrics of interest.
Using the lumped capacitance wire model, the delay through a bit line can
be expressed as
\[
t_p = \frac{C_L}{k_m} \cdot \frac{v_{ch}}{(v_{ch} - v_{th})^2},
\]
with $k_m$ the transistor transconductance, $C_L$ the line capacitance, and $v_{th}$ the
device threshold voltage.
In order to model the variability of $t_p$, we describe the ratio $C_L / k_m$, which
we denote by $\alpha$, by a Gamma random variable with parameters $a$ and $b$:
$\alpha \sim \Gamma(a, b)$. The two parameters $a$ and $b$ are determined from the mean
and standard deviation of $t_p$, as expressed in Eq. (3.2). We use a Gamma
distribution because it models the line delay more accurately than a Gaussian
distribution. For example, a Gamma distribution takes only positive values,
unlike a Gaussian.
As defined in Eq. (6.1), a timing error occurs when $t_p$ exceeds the sampling
period $T_c$. By definition of $\alpha$, $t_p > T_c$ if and only if
\[
\alpha > T_c \, \frac{(v_{ch} - v_{th})^2}{v_{ch}}.
\]
In order to avoid heavy notation, we denote the deterministic factor
$(v_{ch} - v_{th})^2 / v_{ch}$ by $\Delta_v$. We thus write
\[
\varepsilon_t = P(t_p > T_c) = P(\alpha > T_c \cdot \Delta_v) = 1 - F(T_c \cdot \Delta_v \mid a, b),
\]
where $F(\cdot \mid a, b)$ is the cumulative distribution function of $\alpha$:
\[
F(x \mid a, b) = \frac{1}{b^a \gamma(a)} \int_0^x t^{a-1} e^{-t/b} \, dt,
\]
with $\gamma(\cdot)$ the Gamma function.
We model errors occurring on different bit lines as independent and identically
distributed. For example, when the voltage is lowered, the probability of a
timing error increases identically on all bit lines. Yet, the distributions remain
independent, modelling the independence of the noise sources affecting them.
Hence, the probability that at least one timing error occurs in an $N$-bit word
is
\[
\varepsilon_w = 1 - (1 - \varepsilon_t)^N. \tag{D.1}
\]
Now, we express the residual and reported error probabilities of Razor flip-flops,
and first derive the residual error probability. Eq. (6.2) characterises residual
bit errors. Accordingly, the residual bit error probability is
\[
\begin{aligned}
\varepsilon^{\mathrm{SL}}_{b,\mathrm{res}} &= P(t_p \notin [T_d; T_c + T_d]) \\
&= F(T_d \cdot \Delta_v \mid a, b) + 1 - F((T_c + T_d) \cdot \Delta_v \mid a, b).
\end{aligned}
\tag{D.2}
\]
We obtain the residual word error probability, denoted $\varepsilon^{\mathrm{SL}}_{\mathrm{res}}$, by observing that
Razor flip-flops process each data bit individually: a residual word error occurs
as soon as at least one bit line suffers a residual error. That is,
\[
\varepsilon^{\mathrm{SL}}_{\mathrm{res}} = 1 - \left(1 - \varepsilon^{\mathrm{SL}}_{b,\mathrm{res}}\right)^N. \tag{D.3}
\]
In addition, we are interested in the probability that an error is reported
to the controller. A bit error is reported by a Razor flip-flop either in case
of a short-path error (i.e., whenever $t_p \le T_d$) or in case of a detected timing
error (i.e., whenever $T_c \le t_p \le T_c + T_d$). Let $\varepsilon^{\mathrm{SL}}_{b,\mathrm{rep}}$ be the probability that a bit
error is reported by a Razor flip-flop. We obtain
\[
\varepsilon^{\mathrm{SL}}_{b,\mathrm{rep}} = F(T_d \cdot \Delta_v \mid a, b) + F((T_c + T_d) \cdot \Delta_v \mid a, b) - F(T_c \cdot \Delta_v \mid a, b).
\]
The error signal reported to the controller is obtained by OR-ing the error
signals of the individual Razor flip-flops. The probability that an error is
reported is thus given by
\[
\varepsilon^{\mathrm{SL}}_{\mathrm{rep}} = 1 - \left(1 - \varepsilon^{\mathrm{SL}}_{b,\mathrm{rep}}\right)^N. \tag{D.4}
\]
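Eqs. (D.2) to (D.4) translate directly into a few lines of code. The sketch below is a minimal Python illustration, assuming a caller-supplied function `cdf` that plays the role of F(· | a, b). The function names and the exponential placeholder distribution used in the example are assumptions made for illustration; they are not the Gamma model of the text.

```python
from math import exp, expm1, log1p


def at_least_one(p_bit, n_bits):
    """Return 1 - (1 - p_bit)**n_bits, the form shared by Eqs. (D.3) and (D.4).

    expm1/log1p keep precision when p_bit is very small.
    """
    if p_bit >= 1.0:
        return 1.0
    return -expm1(n_bits * log1p(-p_bit))


def razor_bit_residual(cdf, t_c, t_d, dv):
    """Eq. (D.2): probability that t_p falls outside [T_d, T_c + T_d]."""
    return cdf(t_d * dv) + 1.0 - cdf((t_c + t_d) * dv)


def razor_bit_reported(cdf, t_c, t_d, dv):
    """Per-bit reported probability: short-path error or detected timing error."""
    return cdf(t_d * dv) + cdf((t_c + t_d) * dv) - cdf(t_c * dv)


def placeholder_cdf(x):
    """Hypothetical exponential CDF standing in for F(. | a, b)."""
    return 1.0 - exp(-x) if x > 0.0 else 0.0


if __name__ == "__main__":
    t_c, t_d, dv, n_bits = 2.0, 0.6, 1.0, 32   # arbitrary illustration values
    res_bit = razor_bit_residual(placeholder_cdf, t_c, t_d, dv)
    rep_bit = razor_bit_reported(placeholder_cdf, t_c, t_d, dv)
    print("residual word error probability:", at_least_one(res_bit, n_bits))
    print("reported word error probability:", at_least_one(rep_bit, n_bits))
```

Passing the distribution as a function keeps the three formulas independent of how F is evaluated, so any Gamma CDF routine can be substituted for the placeholder.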
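We now derive the reported error probability for soft SSC. No error is reported
to the controller if and only if no error occurs, or an error occurs and goes
undetected. These two events are disjoint, so the probability that no error is
reported is $(1 - \varepsilon_w) + \varepsilon^{\mathrm{SSC}}_{\mathrm{res}}$, and the probability that an error is reported is thus
\[
\varepsilon^{\mathrm{SSC}}_{\mathrm{rep}} = \varepsilon_w - \varepsilon^{\mathrm{SSC}}_{\mathrm{res}}. \tag{D.5}
\]
The word error rate $\varepsilon_w$ is obtained from Eq. (D.1), while the residual error rate
$\varepsilon^{\mathrm{SSC}}_{\mathrm{res}}$ can be approximated by the expression given in Eq. (5.14).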
For illustration, we describe a 0.13 µm technology as follows.
• The nominal operating point is $V_{dd} = 1.2$ V and $T_c = 2$ ns.
• $\mu_{t_p} = 1$ ns, $\sigma_{t_p} = 0.1$ ns under nominal conditions.
• $v_{th} = 0.2$ V.
Furthermore, $T_d = 0.3 \, T_c$, as has been reported for busses [Kaul et al., 2005].
According to the values given to $\mu_{t_p}$ and $\sigma_{t_p}$, and under the nominal conditions,
the timing error probability per bit is $\varepsilon_t \approx 10^{-15}$.
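The order of magnitude quoted above can be reproduced with a short calculation. The Python sketch below is an illustration under stated assumptions, not the procedure of the thesis. Since Eq. (3.2) is not restated in this appendix, the Gamma parameters are fitted here by the method of moments, a = (µtp/σtp)² and b = µtp · ∆v,nom / a, which gives an integer shape a = 100. For an integer shape, the Gamma survival function reduces to the Erlang sum exp(−z) Σ_{k<a} z^k / k! with z = x/b, so only the standard library is needed. All function and constant names are hypothetical, and the 32-bit word width is borrowed from the caption of Fig. D.1.

```python
from math import exp, expm1, fsum, lgamma, log, log1p

V_TH = 0.2                   # threshold voltage (V)
V_NOM, TC_NOM = 1.2, 2.0     # nominal supply voltage (V) and cycle time (ns)
MU_TP, SIGMA_TP = 1.0, 0.1   # nominal delay mean and standard deviation (ns)


def delta_v(v_ch):
    """Deterministic factor (v_ch - v_th)^2 / v_ch."""
    return (v_ch - V_TH) ** 2 / v_ch


# Assumed method-of-moments fit, used as a stand-in for Eq. (3.2).
SHAPE_A = round((MU_TP / SIGMA_TP) ** 2)        # a = 100
SCALE_B = MU_TP * delta_v(V_NOM) / SHAPE_A      # b, so that E[t_p] = MU_TP at V_NOM


def gamma_sf(x, a=SHAPE_A, b=SCALE_B):
    """P(alpha > x) = 1 - F(x | a, b), valid for integer shape a (Erlang case)."""
    if x <= 0.0:
        return 1.0
    z = x / b
    # Evaluate each term in the log domain to avoid overflow of z**k and k!.
    terms = (exp(k * log(z) - z - lgamma(k + 1)) for k in range(a))
    return min(1.0, fsum(terms))


def timing_error_prob(v_ch, t_c):
    """eps_t = P(t_p > T_c) = P(alpha > T_c * delta_v)."""
    return gamma_sf(t_c * delta_v(v_ch))


def word_error_prob(v_ch, t_c, n_bits):
    """Eq. (D.1), evaluated stably for very small eps_t."""
    eps_t = timing_error_prob(v_ch, t_c)
    return 1.0 if eps_t >= 1.0 else -expm1(n_bits * log1p(-eps_t))


if __name__ == "__main__":
    print("eps_t at the nominal point:", timing_error_prob(V_NOM, TC_NOM))
    print("eps_w for a 32-bit word   :", word_error_prob(V_NOM, TC_NOM, 32))
    print("eps_t at 0.8 V and 2 ns   :", timing_error_prob(0.8, TC_NOM))
```

The survival function is computed directly rather than as 1 − F, because a probability of this size would be lost to rounding when subtracted from one in double precision.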
Figure D.1: Word error rate $\varepsilon_w$ (top), residual error rate $\varepsilon_{\mathrm{res}}$ (middle) and reported
error rate $\varepsilon_{\mathrm{rep}}$ (bottom) of Razor flip-flops (left) and alternating-phase code (right)
with the 8-bit CRC generated by the polynomial $x^8 + x^2 + x + 1$ for 32-bit information.
We have plotted in Fig. D.1 the word error rate (i.e., Eq. (D.1)), the residual
error rate (i.e., Eqs. (D.3) and (5.14)), and the reported error rate (i.e.,
Eqs. (D.4) and (D.5)) of shadow latches and a soft SSC in the voltage-delay
plane. The top graphs of Fig. D.1 depict the word error rate. Areas of zero and
100% error rate are clearly separated. The middle left graph of Fig. D.1 shows
that shadow latches are completely unreliable under a large error rate. On the
contrary, the soft SSC checker operates reliably except in a narrow stripe where
the error rate is moderate (top right of Fig. D.1). Next, by comparing the two
bottom graphs of Fig. D.1, we observe that (i) soft SSC provides the controller
with a consistent error landscape (no error is reported under conservative
operating points, and errors are reported under over-aggressive operation), and
(ii) Razor flip-flops falsely report errors under conservative operating points (due
to short-path errors), while they do not report any error under over-aggressive
operation. As a result, Razor flip-flops require worst-case assumptions about
the error rate to make sure they are kept in the constrained but safe operating
range where timing errors can be reliably corrected and short-path errors are
avoided. There is no such requirement for soft SSC: operation in the whole
voltage-delay plane is possible. Reliability is ensured by limiting the probability
of visiting unsafe operating points, which constrains the operating point control
policy.
Bibliography
[Albassam et al., 1991] Sulaiman Albassam, Bella Bose, and Ramarathnam
Venkatesan. Burst and unidirectional error detecting codes. Proceedings of
the IEEE International Symposium on Information Theory, page 142, June
1991.
[Arnold, 1990] Allen Arnold. Probability, Statistics and Queuing Theory with
Computer Science Applications. Academic Press, San Diego, Calif., 1990.
[Austin et al., 2004] Todd Austin, David Blaauw, Trevor Mudge, and Krisztián
Flautner. Making typical silicon matter with Razor. Computer, 37(3):57–65,
March 2004.
[Austin et al., 2005] Todd Austin, Valeria Bertacco, David Blaauw, and Trevor
Mudge. Opportunities and challenges for better than worst-case design. In
Proceedings of the Asia and South Pacific Design Automation Conference,
January 2005.
[Baicheva et al., 1998] Tsonka Baicheva, Stefan Dodunekov, and Peter Kaza-
kov. On the cyclic redundancy-check codes with 8-bits redundancy. Computer
Communications, 21(11):1030–33, August 1998.
[Bainbridge et al., 2003] John Bainbridge, William B. Toms, David Edwards,
and Steve Furber. Delay-insensitive, point-to-point interconnect using m-of-n
codes. In Proceedings of the 9th International Symposium on Asynchronous
Circuits and Systems, pages 132–40, Vancouver, May 2003.
[Berger, 1961] Jay M. Berger. A note on error detecting codes for asymmetric
channels. Information and Control, 4(1):68–71, March 1961.
[Bertozzi et al., 2002] Davide Bertozzi, Luca Benini, and Giovanni De Micheli.
Low power error resilient encoding for on-chip data buses. In Proceedings of
the Design, Automation and Test in Europe Conference and Exhibition, pages
102–9, Paris, March 2002.
[Blunno et al., 2004] Ivan Blunno, Jordi Cortadella, Alex Kondratyev, Luciano
Lavagno, Kelvin Lwin, and Christos P. Sotiriou. Handshake protocols for
de-synchronization. In Proceedings of the 10th International Symposium on
Asynchronous Circuits and Systems, pages 149–158, 2004.
[Borkar et al., 2004] Shekhar Borkar, Tanay Karnik, and Vivek De. Design and
reliability challenges in nanometer technologies. In Proceedings of the 41st
Design Automation Conference, page 75, San Diego, Calif., June 2004.
[Borkar, 2005] Shekhar Borkar. Designing reliable systems from unreliable com-
ponents: The challenges of transistors variability and degradation. IEEE
Micro, 25(6):10–16, November–December 2005.
[Bose and Lin, 1985] Bella Bose and Der Jei Lin. Systematic unidirectional
error-detecting codes. IEEE Transactions on Computers, C-34(11):1026–32,
November 1985.
[Bose and Rao, 1982] Bella Bose and Thammavaram Rao. Theory of unidirec-
tional error correcting/detecting codes. IEEE Transactions on Computers,
C-31(6):521–30, June 1982.
[Bose, 1991] Bella Bose. On unordered codes. IEEE Transactions on Comput-
ers, C-40(2):125–31, February 1991.
[Brooks and Martonosi, 2001] David Brooks and Margaret Martonosi. Dynamic
thermal management for high-performance microprocessors. In Proceedings
of the 7th International Symposium on High-Performance Computer Archi-
tecture, pages 171–182, January 2001.
[Chapiro, 1984] Daniel M. Chapiro. Globally-Asynchronous Locally Syn-
chronous Systems. Ph.D. thesis, Stanford University, Stanford, Calif., 1984.
[Cortadella et al., 2004] Jordi Cortadella, Alex Kondratyev, Luciano Lavagno,
and Christos P. Sotiriou. Coping with the variability of combinational logic
delays. In Proceedings of the International Conference on Computer Aided
Design, pages 505–08, San Jose, Calif., October 2004.
[Dally and Poulton, 1998] William J. Dally and John W. Poulton. Digital Sys-
tems Engineering. Cambridge University Press, Cambridge, 1998.
[Das et al., 2006] Shidhartha Das, David Roberts, Seokwoo Lee, Sanjay Pant,
David Blaauw, Todd Austin, Krisztián Flautner, and Trevor Mudge. A self-tuning
DVS processor using delay-error detection and correction. IEEE Journal of
Solid-State Circuits, 41(4):792–804, April 2006.
[Dean et al., 1991] Mark E. Dean, Ted E. Williams, and David L. Dill. Efficient
self-timing with level-encoded two-phase dual-rail (LEDR). In Proceedings
of the 1991 University of California/Santa Cruz Conference on Advanced
Research in VLSI, pages 55–70. MIT Press, March 1991.
[Flautner et al., 2001] Krisztián Flautner, Steve Reinhardt, and Trevor Mudge.
Automatic performance setting for dynamic voltage scaling. In Proceedings
of the 7th Conference on Mobile Computing and Networking, pages 260–71,
Rome, July 2001.
[Freiman, 1962] C. V. Freiman. Optimal error detecting codes for asymmetric
binary channels. Information and Control, 5(1):64–71, March 1962.
[Gaujal et al., 2005] Bruno Gaujal, Nicolas Navet, and Cormac Walsch.
Shortest-path algorithms for real-time scheduling of FIFO tasks with minimal
energy use. ACM Transactions on Embedded Computing Systems (TECS),
4(4):907–33, November 2005.
[Ghosh et al., 2006] Swaroop Ghosh, Saibal Mukhopadhyay, Keejong Kim, and
Kaushik Roy. Self-calibration technique for reduction of hold failures in low-
power nano-scaled SRAM. In Proceedings of the 43rd Design Automation
Conference, pages 971–976, San Francisco, Calif., July 2006.
[Gorshe and Bose, 1996] Steven S. Gorshe and Bella Bose. A self-checking ALU
design with efficient codes. In 14th IEEE VLSI Test Symposium (VTS’96),
pages 157–161, Princeton, New Jersey., April 1996.
[Gutnik and Chandrakasan, 1997] Vadim Gutnik and Anantha P. Chan-
drakasan. Embedded power supply for low-power DSP. IEEE Transactions on
Very Large Scale Integration (VLSI) Systems, VLSI-5(4):425–35, December
1997.
[Han et al., 2005] Jie Han, Jianbo Gao, Yan Qi, Pieter Jonker, and José Fortes.
Toward hardware-redundant, fault-tolerant logic for nanoelectronics. IEEE
Design and Test of Computers, 22(4):328–339, July–August 2005.
[Hegde and Shanbhag, 2000] Rajamohana Hegde and Naresh R. Shanbhag.
Toward achieving energy efficiency in presence of deep submicron noise.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, VLSI-
8(4):379–91, August 2000.
[Jien-Chung et al., 1992] Jien-Chung Lo, Suchai Thanawastien, and Michael
Nicolaidis. An SFS Berger check prediction ALU and its application to
self-checking processor designs. IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, 11(4):525–540, April 1992.
[Karl et al., 2006] Eric Karl, David Blaauw, Dennis Sylvester, and Trevor
Mudge. Reliability modeling and management in dynamic microprocessor-
based systems. In Proceedings of the 43rd Design Automation Conference,
pages 1057–1060, San Francisco, Calif., July 2006.
[Kaul et al., 2005] Himanshu Kaul, Dennis Sylvester, David Blaauw, Trevor
Mudge, and Todd Austin. DVS for on-chip bus designs based on timing er-
ror correction. In Proceedings of the Design, Automation and Test in Europe
Conference and Exhibition, pages 80–85, Munich, March 2005.
[Kim et al., 2005] Chris H. Kim, Steven Hsu, Ram Krishnamurthy, Shekhar
Borkar, and Kaushik Roy. Self calibrating circuit design for variation tolerant
VLSI systems. In Proceedings of the 11th IEEE International On-Line Testing
Symposium, pages 100–5, Saint Raphaël, France, July 2005.
[Koopman and Chakravarty, 2004] Philip Koopman and Tridib Chakravarty.
Cyclic redundancy code (CRC) polynomial selection for embedded networks. In
2004 International Conference on Dependable Systems and Networks (DSN
2004), pages 145–154, Florence, Italy, June 2004.
[Lala, 2001] Parag K. Lala, editor. Self-checking and fault-tolerant digital de-
sign. Morgan Kaufmann Publishers Inc., San Francisco, Calif., 2001.
[Li et al., 2006] Hai Li, Yiran Chen, Kaushik Roy, and Cheng-Kok Koh. SAVS:
a self-adaptive variable supply-voltage technique for process-tolerant and
power-efficient multi-issue superscalar processor design. In Proceedings of
the Asia and South Pacific Design Automation Conference, pages 158–163,
January 2006.
[MacWilliams and Sloane, 1977] Florence Jessie MacWilliams and Neil J. A.
Sloane. The Theory of Error-Correcting Codes, volume 19 of Mathematical
Library. North-Holland, Amsterdam, 1977.
[Martin et al., 2002] Steven M. Martin, Krisztián Flautner, Trevor Mudge, and
David Blaauw. Combined dynamic voltage scaling and adaptive body biasing
for lower power microprocessors under dynamic workloads. In Proceedings of
the International Conference on Computer Aided Design, pages 721–25, San
Jose, Calif., November 2002.
[McNairy and Soltis, 2003] Cameron McNairy and Don Soltis. Itanium 2 pro-
cessor microarchitecture. IEEE Micro, 23(2):44–55, March–April 2003.
[Mitra et al., 2005] Subhasish Mitra, Norbert Seifert, Ming Zhang, Quan Shi,
and Kee S. Kim. Robust system design with built-in soft-error resilience.
IEEE Computer, 38(2):43–52, 2005.
[Murali et al., 2005] Srinivasan Murali, Theo Theocharides, Narayanan Vi-
jaykrishnan, Mary Jane Irwin, Luca Benini, and Giovanni De Micheli. Anal-
ysis of error recovery schemes for networks on chips. IEEE Design and Test
of Computers, 22(5):434–442, September–October 2005.
[Nicolaidis, 1999] Michael Nicolaidis. Time-redundancy based soft-errors toler-
ance to rescue nanometer technology. In 17th IEEE VLSI Test Symposium,
pages 86–94, April 1999.
[Nicolaidis, 2003] Michael Nicolaidis. Carry checking/parity prediction adders
and ALUs. IEEE Transactions on Very Large Scale Integration (VLSI) Sys-
tems, VLSI-11(1):121–128, February 2003.
[Nicolaidis, 2005] Michael Nicolaidis. Design for soft error mitigation. IEEE
Transactions on Device and Materials Reliability, 5(3):405–18, September
2005.
[Nowka et al., 2002] Kevin J. Nowka, Gary D. Carpenter, Eric W. MacDonald,
Hung C. Ngo, Bishop C. Brock, Koji I. Ishii, Tuyet Y. Nguyen, and Jeffrey L.
Burns. A 32-bit PowerPC System-on-a-Chip with support for dynamic voltage
scaling and dynamic frequency scaling. IEEE Journal of Solid-State Circuits,
37(11):1441–47, November 2002.
[Proakis, 2000] John G. Proakis. Digital Communications. McGraw-Hill, New
York, 2000.
[Rabaey et al., 2003] Jan M. Rabaey, Anantha Chandrakasan, and Borivoje
Nikolić. Digital Integrated Circuits. Prentice Hall, Upper Saddle River, N.J.,
second edition, 2003.
[Roberts et al., 2005] David Roberts, Todd Austin, David Blaauw, and Trevor
Mudge. Error analysis for the support of robust voltage scaling. In Proceedings
of 6th IEEE International Symposium on Quality Electronic Design, pages
65–70, March 2005.
[Rossi et al., 2005a] Daniele Rossi, André K. Nieuwland, Atul Katoch, and Cecilia
Metra. Exploiting ECC redundancy to minimize crosstalk impact. IEEE
Design and Test of Computers, 22(1):57–70, January–February 2005.
[Rossi et al., 2005b] Daniele Rossi, André K. Nieuwland, Atul Katoch, and Cecilia
Metra. New ECC for crosstalk impact minimization. IEEE Design and
Test of Computers, 22(4):340–348, July–August 2005.
[Saggese et al., 2005a] Giacinto Paolo Saggese, Anoop Vetteth, Zbigniew
Kalbarczyk, and Ravishankar K. Iyer. Microprocessor sensitivity to failures:
Control vs execution and combinational vs sequential logic. In 2005 Interna-
tional Conference on Dependable Systems and Networks (DSN 2005), pages
760–769, Yokohama, Japan, July 2005.
[Saggese et al., 2005b] Giacinto Paolo Saggese, Nicholas J. Wang, Zbigniew
Kalbarczyk, Sanjay J. Patel, and Ravishankar K. Iyer. An experimental
study of soft errors in microprocessors. IEEE Micro, 25(6):30–39, November–
December 2005.
[Sakiyama et al., 1999] Shiro Sakiyama, Jun Kajiwara, Masayoshi Kinoshita,
Katsuji Satomi, Katsuhiro Ohtani, and Akira Matsuzawa. An on-chip high-
efficiency and low-noise DC/DC converter using divided switches with current
control technique. In IEEE International Solid-State Circuits Conference,
Digest of Technical Paper, pages 156–57, San Francisco, Calif., February 1999.
[Shanbhag, 2002] Naresh R. Shanbhag. Reliable and energy-efficient digital sig-
nal processing. In Proceedings of the 39th Design Automation Conference,
pages 830–835, June 2002.
[Shang et al., 2003] Li Shang, Li-Shiuan Peh, and Niraj K. Jha. Dynamic volt-
age scaling with links for power optimization of interconnection networks. In
Proceedings of the 9th International Symposium on High-Performance Com-
puter Architecture, pages 91–102, Anaheim, Calif., February 2003.
[Shim et al., 2004] Byonghyo Shim, Srinivasa R. Sridhara, and Naresh R.
Shanbhag. Reliable low-power digital signal processing via reduced preci-
sion redundancy. IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, 12(5):497–510, May 2004.
[Sinha and Chandrakasan, 2001] Amit Sinha and Anantha P. Chandrakasan.
Dynamic voltage scheduling using adaptive filtering of workload traces. In
Proceedings of the 14th International Conference on VLSI Design, pages 221–
226, Jan 2001.
[Sokolov et al., 2005] Danil Sokolov, Julian Murphy, Alexander Bystrov, and
Alex Yakovlev. Design and analysis of dual-rail circuits for security applica-
tions. IEEE Transactions on Computers, 54(4):449–60, April 2005.
[Stratakos, 1998] Anthony J. Stratakos. High-Efficiency Low-Voltage DC-DC
Conversion for Portable Applications. Ph.D. thesis, University of California,
Berkeley, Calif., 1998.
[Thaker et al., 2005] Darshan D. Thaker, Francois Impens, Isaac L. Chuang,
Rajeevan Amirtharajah, and Frederic T. Chong. Recursive TMR: Scaling
fault tolerance in the nanoscale era. IEEE Design and Test of Computers,
22(4):298–305, July–August 2005.
[Toprak and Leblebici, 2005] Zeynep Toprak and Yusuf Leblebici. A Low-Power
Adaptive Bias/Clock Generator for Fine-Grained Voltage and Frequency
Scaling in Multi-Core Systems. WSEAS TRANSACTIONS on SYSTEMS,
4(12):2390–2397, 2005.
[Varshavsky, 1990] Victor I. Varshavsky, editor. Self-Timed Control of Concur-
rent Processes. Kluwer Academic, Dordrecht, The Netherlands, 1990.
[Verhoeff, 1988] Tom Verhoeff. Delay-insensitive codes—an overview. Dis-
tributed Computing, 3(1):1–8, January 1988.
[Victor and Keutzer, 2001] Bret Victor and Kurt Keutzer. Bus encoding to
prevent crosstalk delay. In Proceedings of the International Conference on
Computer Aided Design, pages 57–63, San Jose, Calif., November 2001.
[Von Neumann, 1956] John Von Neumann. Probabilistic logics and the synthesis
of reliable organisms from unreliable components. Automata Studies,
Eds. C. E. Shannon and J. McCarthy, Princeton University Press, pages 43–98,
1956.
[Walrand and Varaiya, 2000] Jean Walrand and Pravin Varaiya. High-
Performance Communication Networks. Morgan Kaufmann, San Mateo,
Calif., second edition, 2000.
[Weste and Eshraghian, 1993] Neil H. E. Weste and Kamran Eshraghian. Prin-
ciples of CMOS VLSI Design. VLSI System Series. Addison-Wesley, Reading,
Mass., second edition, 1993.
[Worm et al., 2002] Frédéric Worm, Paolo Ienne, Patrick Thiran, and Giovanni
De Micheli. An adaptive low-power transmission scheme for on-chip networks.
In Proceedings of the 15th International Symposium on System Synthesis,
pages 92–100, Kyoto, October 2002.
[Worm et al., 2005] Frédéric Worm, Paolo Ienne, Patrick Thiran, and Giovanni
De Micheli. A robust self-calibrating transmission scheme for on-chip net-
works. IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
VLSI-13(1):126–39, January 2005.
[Zhang et al., 2000] Hui Zhang, Varghese George, and Jan M. Rabaey. Low-
swing on-chip signaling techniques: Effectiveness and robustness. IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, VLSI-
8(3):264–72, June 2000.
[Zhang et al., 2002] Yan Zhang, John Lach, Kevin Skadron, and Mircea R.
Stan. Odd/even bus invert with two-phase transfer for buses with coupling.
In Proceedings of the International Symposium on Low Power Electronics and
Design, pages 80–83, Monterey, Calif., August 2002.
Curriculum Vitae
Frédéric Worm was born in 1977 in Geneva, Switzerland. He obtained a master's
degree in Communication Systems in 2001 from École Polytechnique Fédérale de
Lausanne (EPFL). He spent the last two years of his master's studies at Eurécom,
Sophia-Antipolis, France, and obtained a Diplôme d'Études Approfondies in Networks
and Distributed Systems from Université de Nice et Sophia-Antipolis in 2001.
The same year, he joined the Processor Architecture Laboratory at EPFL and
started working on his PhD thesis under the joint supervision of Professor Paolo
Ienne and Professor Patrick Thiran.
His research interests include bus encoding techniques, VLSI design robust
to electrical parameter variations, and Network-on-chip architectures.
Publications
• Frédéric Worm, Patrick Thiran, and Paolo Ienne. Designing robust checkers
in the presence of massive timing errors. In Proceedings of 12th IEEE
International On-Line Testing Symposium, pages 281-86, Lake of Como,
Italy, July 2006.
• Frédéric Worm, Patrick Thiran, Giovanni De Micheli, and Paolo Ienne.
Self-calibrating Networks-on-Chip. In Proceedings of the IEEE International
Symposium on Circuits and Systems, pages 2361-64, Kobe, Japan,
May 2005.
• Frédéric Worm, Patrick Thiran, and Paolo Ienne. A unified coding framework
for delay-insensitivity. In Proceedings of the 11th International Symposium
on Asynchronous Circuits and Systems, New York, March 2005.
• Frédéric Worm, Paolo Ienne, Patrick Thiran, and Giovanni De Micheli.
A robust self-calibrating transmission scheme for on-chip networks. IEEE
Transactions on Very Large Scale Integration (VLSI), 13(1), January 2005.
• Frédéric Worm, Paolo Ienne, Patrick Thiran, and Giovanni De Micheli.
On-Chip self-calibrating communication techniques robust to electrical
parameter variations. IEEE Design and Test of Computers, 21(6),
November-December 2004.
• Frédéric Worm, Paolo Ienne, and Patrick Thiran. Soft self-synchronising
codes for self-calibrating communication. In Proceedings of the International
Conference on Computer Aided Design, San Jose, Calif., November
2004.
• Frédéric Worm, Paolo Ienne, Patrick Thiran, and Giovanni De Micheli.
An adaptive low-power transmission scheme for on-chip networks. In Proceedings
of the 15th International Symposium on System Synthesis, Kyoto,
October 2002.
• Patrick Thiran, Jean-Yves Le Boudec, and Frédéric Worm. Network calculus
applied to optimal smoothing. In Proceedings of the 20th IEEE
Infocom Conference, Anchorage, April 2001.
