Combined Time and Information Redundancy for SEU-Tolerance in Energy-Efficient Real-Time Systems by Ejlali, Ali et al.
TVLSI-00230-2005.R1 
 
Abstract— Recently the trade-off between energy consumption 
and  fault-tolerance  in  real-time  systems  has  been  highlighted. 
These works have focused on dynamic voltage scaling (DVS) to 
reduce  dynamic  energy  dissipation  and  on  time  redundancy  to 
achieve  transient-fault  tolerance.  While  the  time  redundancy 
technique exploits the available slack time to increase the fault-
tolerance by performing recovery executions, DVS exploits slack 
time  to  save  energy.  Therefore  we  believe  there  is  a  resource 
conflict  between  the  time-redundancy  technique  and  DVS.  The 
first  aim  of  this  paper  is  to  propose  the  usage  of  information 
redundancy  to  solve  this  problem.  We  demonstrate  through 
analytical and experimental studies that it is possible to achieve 
both  higher  transient  fault-tolerance  (tolerance  to  single  event 
upsets (SEU)) and less energy using a combination of information 
and  time  redundancy  when  compared  with  using  time 
redundancy alone. The second aim of this paper is to analyze the 
interplay  of  transient-fault  tolerance  (SEU-tolerance)  and 
adaptive body biasing (ABB) used to reduce static leakage energy, 
which has not been addressed in previous studies. We show that 
the same technique (i.e. the combination of time and information 
redundancy) is applicable to ABB-enabled systems and provides 
more advantages than time redundancy alone. 
 
Index  Terms—Embedded  systems,  energy  efficiency,  fault 
tolerance, Single event upsets 
 
I.  INTRODUCTION 
REAL-TIME embedded systems that are employed in
defense, space, and consumer applications often have both 
energy  constraints  and  fault-tolerance  requirements.  To 
address the energy consumption issue, dynamic voltage scaling 
(DVS)  has  been  effectively  used.  It  reduces  the  dynamic 
energy consumption by decreasing the operational frequency 
and supply voltage [4,5]. On the other hand, the time-redundancy
technique (i.e. rollback-recovery) has been commonly used to
achieve tolerance to transient faults (e.g. single event upsets)
in real-time embedded systems [3,7]. The time-redundancy
technique is popular since it is cost-effective, and fewer
resources (hardware/software) are wasted on tolerating
transient faults as compared to other fault-tolerance techniques
[25,27]. This technique uses slack time in the system schedule 
to  improve  transient-fault  tolerance  by  performing  recovery 
executions  whenever  faulty  runs  occur.  The  number  of 
possible recovery executions depends on the available slack 
time. Since DVS also requires slack time, there is a resource 
conflict between DVS and the time-redundancy technique on 
slack time which is a limited resource, i.e. if more slack time is 
given to DVS to save more energy, less slack time is left for 
transient-fault tolerance, and vice versa. 
DVS  not  only  reduces  the  slack  time  available  to  time-
redundancy-based fault-tolerance, but also increases the rate of 
transient faults or Single Event Upsets (SEU) (Bit-flips due to 
the impact of particles on flip-flops). Indeed, it was reported 
that the rate of SEUs increases exponentially as supply voltage 
decreases [3,8,9,10]. Traditionally, SEUs were regarded as a
major concern only for space applications. However, recently,
SEUs have become the major source of concern even at the 
ground  level  due  to  the  continuing  technology  shrinkage 
[10,12]. Unfortunately, packaging cannot be effectively used 
to  shield  against  SEUs  [13,14],  since  the  chip  and  the 
packaging materials themselves emit alpha particles that can 
cause SEUs. Also SEUs can be caused by neutrons which can 
easily penetrate through packages [8,10]. 
The  energy  consumption  of  a  VLSI  system  can  be 
subdivided  into  two  main  components:  dynamic  energy  and 
leakage energy. Until recently, dynamic energy has been the 
main  source  of  energy  consumption.  However,  in  deep-
submicron CMOS, the technology shrinkage causes transistor 
subthreshold leakage current to increase exponentially which 
results in a corresponding increase in leakage energy, so that 
the  leakage  energy  becomes  comparable  to  the  dynamic 
energy. Hence, it is essential to use techniques to manage the 
leakage energy [21,26]. Adaptive body bias (ABB) has been 
shown to be an effective technique to reduce leakage power 
Combined Time and Information Redundancy 
for SEU-Tolerance in Energy-Efficient  
Real-Time Systems 
Alireza Ejlali, Bashir M. Al-Hashimi, Senior Member, IEEE, Marcus T. Schmitz,                           
Paul Rosinger, and Seyed Ghassem Miremadi, Member, IEEE 
Manuscript received July 29, 2005; revised January 11, 2006. This work 
is supported in part by the EPSRC, U.K., under grants GR/S95770, and 
EP/C512804.  
A.  Ejlali  is  with  the  Department  of  Computer  Engineering,  Sharif 
University of Technology, P. O. Box: 11365-9517, Tehran, Iran (email: 
ejlali@ce.sharif.edu). 
B.  M.  Al-Hashimi,  is  with  the  School  of  Electronics  and  Computer 
Science,  University  of  Southampton,  Southampton  SO17  1BJ,  U.K. 
(corresponding author: Tel: +44 (0)238059324; Fax +44 (0)2380592901; 
e-mail: bmah@ecs.soton.ac.uk). 
M.  T.  Schmitz  was  with  the  School  of  Electronics  and  Computer 
Science, University of Southampton, Southampton SO17 1BJ, U.K. He is 
now with Robert Bosch GmbH, Germany.  
(email:  marcus.schmitz@de.bosch.com). 
P. Rosinger is with the School of Electronics and Computer Science, 
University  of  Southampton,  Southampton  SO17  1BJ,  U.K.  (e-mail: 
pmr@ecs.soton.ac.uk). 
S. G. Miremadi is with the Department of Computer Engineering, Sharif 
University of Technology, P. O. Box: 11365-9517, Tehran, Iran (email: 
miremadi@sharif.edu).
[28] by tuning the threshold voltage of the transistors, reverse 
body bias (Vsb < 0) to reduce sub-threshold leakage in standby 
mode and forward body bias (Vsb > 0) to improve performance 
in active mode. To implement ABB in practice, generator
circuits supplying the body bias voltages are required [29].
Nevertheless, this biasing also affects the frequency at which 
the circuit operates and therefore influences the slack time [6]. 
That is, a problem similar to the one discussed above for DVS 
exists also for ABB, i.e. there is a resource conflict between 
ABB  and  the  time-redundancy  technique  on  slack  time. 
Furthermore, it has been shown that ABB can worsen the SEU 
rate by 36% [11]. 
As  opposed  to  the  previous  works  [1,2,3,31]  on  fault-
tolerant DVS-enabled real-time systems which focused on time 
redundancy, in this paper we propose the usage of information 
redundancy  in  fault-tolerant  DVS-enabled  and ABB-enabled 
systems.  Since  both  DVS  and  ABB  require  slack  time, 
information redundancy is used to decouple the fault-tolerance 
from the slack time and hence to provide more slack time to 
DVS and ABB without degrading the fault-tolerance capability 
of the system. To the best of our knowledge, this paper is the 
first attempt that addresses energy management through DVS 
and ABB and SEU-tolerance through information-redundancy 
in  conjunction.  Also,  this  paper  is  the  first  attempt  that 
considers the energy/fault-tolerance trade-off in ABB-enabled 
systems. It should be noted that the aim of the paper is not to 
propose  any  new  fault-tolerance  or  energy  management 
technique,  rather  to  identify  appropriate  fault-tolerance  and 
energy management techniques among the existing ones which 
are more suitable to be used together. This is necessary since 
the continuing diversity of embedded systems applications
requires that such systems exhibit both reliability and energy
efficiency. Towards this, we have evaluated the fault-
tolerance/energy trade-off, for a given deadline, of two existing
fault tolerance techniques when employed in real-time
embedded systems to improve their reliability to SEU faults.
Our study shows that a combination of time and information 
redundancy  has  less  interference  with  energy  management 
techniques  (i.e.  ABB  and  DVS)  as  compared  to  time 
redundancy alone (which has been the focus of the previous 
works [1,2,3]). 
The  rest  of  the  paper  is  organized  as  follows.  Section  II 
discusses  the  related  works.  Section  III  presents  the  system 
fault-tolerance and energy models. Section IV compares the 
fault-tolerance  and  the  energy  consumption  of  the  proposed 
approach (which uses both time-redundancy and information-
redundancy) and the conventional approach (which solely uses 
time-redundancy), using the models presented in Section III 
and  experimental  results.  Finally,  Section  V  concludes  the 
paper. 
 
II.  RELATED WORKS 
The trade-off problem between fault-tolerance and energy 
consumption  in  DVS-enabled real-time systems has recently 
been  highlighted  [3]  and  become  subject  to  investigations 
[1,2,31]. Non-uniform checkpoint placement policies for the 
combined purpose of conserving energy and providing fault-
tolerance have been proposed in [1]. The technique proposed 
in [2] uses an adaptive check-pointing scheme to achieve fault-
tolerance and energy saving in a unified manner. An integrated 
approach for achieving fault tolerance and energy savings in 
fixed-priority  real-time  embedded  systems  has  been 
investigated in [31]. Although all these techniques [1,2,31] are 
effective  in  achieving  fault  tolerance,  the  obtained  energy 
savings are limited due to the fact that the time redundancy 
requires  slack  time  –  slack  time  that  otherwise  could  be 
exploited through DVS to reduce the energy consumption. In 
the  context  of  leakage  energy  reduction,  although  there  are 
some  works  on  the  impact  of  body  biasing  on  SEU  rate 
[11,16,17],  these  works  do  not  consider  any  fault  tolerance 
technique  and  the  interplay  of  ABB  and  fault-tolerance 
techniques has not been studied. 
In addition to the aforementioned works which consider the 
energy/fault-tolerance  trade-off  in  DVS-enabled  real-time 
systems, recently there have been some other reported works 
which are not directly related to this paper (here we focus on 
ABB-enabled  and  DVS-enabled  systems);  however  they 
provide further evidence that the energy/fault-tolerance trade-
off  exists  [15,18,19].  Using  circuit-level  simulation,  [19] 
shows that in small circuits (4-bit counters) the transient fault 
tolerance and power dissipation are at odds. In the context of 
on-chip communication, [18] analyzes the impact of redundant 
bus  coding  on  the  energy/reliability  trade-off,  and  [15] 
proposes a dynamic voltage swing approach to optimize the 
energy consumption of a reliable communication scheme. 
 
III.  SYSTEM MODELS 
In  this  paper,  we  will  compare and analyze two types of 
fault-tolerant  energy-aware  real-time  systems,  defined  as 
follows: 
(a) Conventional R system: This represents a fault-tolerant 
energy aware system which uses pure rollback-recovery, i.e., 
the  conventional  approach  based  on  time-redundancy  (Fig. 
1a). In this system, whenever transient faults (i.e. SEUs) occur 
during the task execution, a recovery execution (re-execution) 
of the same task is required [3,7]. All the systems which use 
rollback recovery have some error detection mechanisms (e.g. 
control  flow  checking  techniques,  consistency  check,  etc. 
[25,27]). Using these mechanisms when a system detects that 
the  results  generated  by  a  task  are  in  error,  the  system  re-
executes the task [27]. Also, in these systems highly reliable 
memory  units  are  required  [27].  This  is  necessary  for  the 
correct operation of rollback-recovery. For example if an error 
occurs in the memory area which contains the task code, all 
recovery executions will be faulty since they will re-execute 
the  same  erroneous  code  (for  more  information  on  the 
requirements of rollback-recovery please refer to [27]). 
As an example, Fig. 1a shows a possible scenario which
can  occur  in  the  conventional  R  system.  As  shown  in  this 
figure, during the original task execution three SEUs (2 SEUs 
in the same clock cycle and 1 single SEU) cause a faulty run, 
hence necessitating a recovery execution (recovery execution 
1). Such executions have to be performed until a non-faulty 
run happens (e.g. recovery execution 2 in Fig. 1a). In order to 
achieve a certain degree of fault-tolerance it is necessary to 
reserve some system time for recovery executions (slack time 
for recoveries), while the remaining slack time until the task 
deadline D can be exploited via DVS and ABB to reduce the 
system’s energy dissipation. 
(b) Proposed RI system: These fault-tolerant energy aware 
systems  use  both  rollback-recovery  and  information 
redundancy [25], i.e., the fault-tolerance is achieved through 
recovery executions as well as through redundant information 
that can be used to correct faults during execution (i.e. without 
necessitating  a  re-execution).  Consider  Fig.  1b,  which 
demonstrates this approach using the same SEUs as in Fig. 1a. 
As we can observe, whenever one SEU occurs during a single 
clock cycle (first and third faults in Fig. 1), the resulting error 
can be corrected by some additional hardware which is used 
for  information  redundancy.  Faults  that  require  a  recovery 
execution occur only if two or more SEUs happen during a 
single  clock  cycle  (for  instance,  second  fault  during  the 
original  execution  in  Fig.  1b).  Accordingly,  the  number  of 
necessary recoveries is reduced leaving more exploitable slack 
time to DVS and ABB. 
Suppose a task and its recoveries run at the same frequency 
f. Let N be the number of clock cycles which are needed to 
execute the task, D be the deadline (in seconds), and ρf be the 
probability  of  having a faulty run. Then, the task execution 
time  is  N/f  seconds,  and  the  amount  of  total  slack  time  is 
D-(N/f). The first recovery execution is only required when the 
original execution fails, hence the probability to run the first 
recovery is ρf. The second recovery execution is only required 
when both the original and the first recovery executions fail, 
hence the probability to run the second recovery is $\rho_f^{2}$.
Similarly, the ith recovery will be executed with probability
$\rho_f^{i}$. Thus, the expected time required for executing K
recoveries is:

$$T_{rec-K} = \frac{N}{f} \sum_{i=1}^{K} \rho_f^{i} \qquad (1)$$
 
Therefore the slack time which is left for DVS and ABB is: 
 
$$T_{Energy} = D - \frac{N}{f} - T_{rec-K} = D - \frac{N}{f} - \frac{N}{f} \sum_{i=1}^{K} \rho_f^{i} = D - \frac{N}{f} \sum_{i=0}^{K} \rho_f^{i} \qquad (2)$$
 
DVS and ABB can use the slack time TEnergy to save energy. 
It can be seen from Eq. (2) that as ρf (i.e. the probability of 
having a faulty run) decreases, TEnergy increases. Note that in 
the RI system, the usage of information redundancy decreases 
ρf,  so  that  TEnergy  increases  and  more  slack  time  becomes 
available to save energy (compare Fig. 1a and Fig. 1b). 
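
As a rough numerical illustration of Eqs. (1) and (2), the following Python sketch computes the expected recovery time and the slack left for DVS and ABB. The function names and the numbers in the demo (cycle count, frequency, deadline, values of ρf) are hypothetical and chosen only for illustration; they are not taken from the experiments of this paper.

```python
def expected_recovery_time(n_cycles, freq_hz, rho_f, k_recoveries):
    """Eq. (1): expected time of K recoveries, (N/f) * sum_{i=1..K} rho_f**i."""
    exec_time = n_cycles / freq_hz
    return exec_time * sum(rho_f ** i for i in range(1, k_recoveries + 1))


def energy_slack(deadline_s, n_cycles, freq_hz, rho_f, k_recoveries):
    """Eq. (2): slack left for DVS/ABB, D - (N/f) * sum_{i=0..K} rho_f**i."""
    exec_time = n_cycles / freq_hz
    return deadline_s - exec_time * sum(rho_f ** i for i in range(k_recoveries + 1))


if __name__ == "__main__":
    # Hypothetical task: 1e6 cycles at 100 MHz (10 ms), 50 ms deadline, K = 2.
    N, f, D, K = 1_000_000, 100e6, 0.05, 2
    for rho_f in (0.1, 0.01):  # a lower rho_f models the effect of information redundancy
        print(rho_f, expected_recovery_time(N, f, rho_f, K), energy_slack(D, N, f, rho_f, K))
```

In this sketch a smaller ρf always yields a larger TEnergy for fixed N, f, D and K, which is the effect the RI system relies on.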
Information  redundancy  in  the  proposed  RI  system  is 
obtained  by  adding  some  additional  hardware  to  the 
conventional  circuit,  as  shown  in  Fig.  1c.  This  hardware 
comprises  a  parity  generator  (produces  parity  bits,  e.g. 
overlapping parity bits [25]), flip-flops to store the parity bits, 
and  a  single  bit  error  corrector  which  restores  the  affected 
registers  to  the  original  content  as  long  as  only  one  bit  is 
corrupted. We will demonstrate in Section IV that the extra 
energy  associated  with  the  additional  hardware  can  be 
overcompensated  by  DVS  and  ABB  (because  of  the  TEnergy 
increase), i.e. the RI systems can yield higher energy savings 
when compared to the conventional R systems. 
To clarify the hardware required for information redundancy 
in the proposed RI system, consider a 4-bit register which has 
been  protected  using  overlapping  parity  technique.  In  the 
overlapping  parity  technique,  3  parity  bits  are  required  to 
protect 4 bits of information. Each parity bit is generated from 
a  subset  of  the  data  bits,  called  parity  group.  For  example, 

[Figure 1 panels: (a) conventional R system timeline with original execution, recovery executions 1 and 2, slack for recoveries, and slack for energy saving before the deadline; (b) RI system timeline with original execution, recovery execution 1, corrected SEUs, slack for recoveries, and slack for energy saving before the deadline; (c) original register with parity generator (combinational), flip-flops for parity bits, and single bit error correction (combinational).]

Fig. 1.  a) Conventional system (denoted by R), b) Proposed system (denoted by RI), c) Information redundancy hardware

assuming that the 4 original data bits are stored in flip-flops 
D0 through D3 and the 3 parity bits are stored in flip-flops P0 
through P2, Table I shows the 3 parity groups associated to the 
parity bits. As shown in this table, the parity groups overlap in 
such  a  manner  that  each data bit appears in more than one 
parity group. The concept of overlapping parity is to assign 
each bit to a unique combination of parity bits [25], so that if a 
SEU  occurs  in  any  one  bit  (either  data  or  parity),  the 
combination of the parity bits which detect the error is unique. 
For  example,  it  can  be  seen  from  Table  I  that  data  bit  D2 
contributes to the generation of parity bits P2, and P1, hence 
when bit D2 is in error (because of a SEU in flip-flop D2) the 
unique combination of the parity bits which detect this error is 
{P2,  P1},  i.e.  parity  bits  P2  and  P1  detect  the  error 
simultaneously. However, if a SEU occurs in any bit (either 
data or parity) other than D2, the combination of the affected 
parities will not be {P2, P1} and parities other than P2 and P1 
will be affected. Table II shows the combination of the parity 
bits which detect SEUs. 
 
TABLE I 
PARITY GROUPS AND THEIR ASSOCIATED PARITY BITS 
Parity group    Parity bit
D3 D2 D1        P2
D3 D2 D0        P1
D3 D1 D0        P0
 
TABLE II 
COMBINATION OF THE PARITIES WHICH DETECT A SEU 
SEU location  Parities affected 
D3  P2 P1 P0 
D2  P2 P1 –– 
D1  P2 –– P0 
D0  –– P1 P0 
P2  P2 –– –– 
P1  –– P1 –– 
P0  –– –– P0 
–– denotes unaffected parities 
 
The single error correction circuitry, shown in Fig. 1c, can 
detect and locate a SEU because each SEU affects the parity 
bits  in  a  unique  manner  (as  shown  in  Table  II).  Once  the 
location  of  the  erroneous  bit  is  known,  the  error  can  be 
corrected  by  simply  complementing  the  output  of  the 
erroneous flip-flop. For example, when parity bits P2 and P0 
are affected, bit D1 is erroneous and the output of flip-flop D1 
should be complemented (Table II). The output of the single 
error correction circuitry maintains the correct value until the 
next clock edge, on which the erroneous bit is removed when 
new data is clocked in. Note that the single error correction 
circuitry cannot correct errors, if more than one SEU occurs in 
the  protected  registers  during  a  clock  cycle.  For  example, 
when data bits D1 and D0 are erroneous, parity bits P2 and P1 
detect the error (Table I). However, the combination {P2, P1} 
is  assigned  to  bit  D2  (Table  II),  hence  the  single  error 
correction  circuitry  complements  bit  D2,  instead  of 
complementing the erroneous bits D1 and D0. 
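
The behaviour described above can be sketched in software. The following Python model uses the parity groups of Table I and the data rows of Table II; it is only an illustrative model of the logic, not the hardware implementation, and it assumes even parity (the parity polarity is not stated in the text). All function and variable names are hypothetical.

```python
# Parity groups from Table I: each parity bit covers three data bits.
PARITY_GROUPS = {
    "P2": ("D3", "D2", "D1"),
    "P1": ("D3", "D2", "D0"),
    "P0": ("D3", "D1", "D0"),
}
DATA_BITS = ("D3", "D2", "D1", "D0")

# Data rows of Table II: the set of affected parities that locates each data bit.
SYNDROME_TO_BIT = {
    frozenset(p for p, group in PARITY_GROUPS.items() if d in group): d
    for d in DATA_BITS
}


def generate_parity(data):
    """Parity generator: one even-parity bit per group (even parity is an assumption)."""
    return {p: sum(data[d] for d in group) % 2 for p, group in PARITY_GROUPS.items()}


def affected_parities(data, stored_parity):
    """Parity bits whose stored value disagrees with the value recomputed from the data."""
    fresh = generate_parity(data)
    return frozenset(p for p in PARITY_GROUPS if fresh[p] != stored_parity[p])


def corrected_output(data, stored_parity):
    """Complement the data bit located by the affected parities, if there is one.

    A single affected parity (an SEU in a parity flip-flop) or no affected
    parity leaves the data bits unchanged.
    """
    out = dict(data)
    bit = SYNDROME_TO_BIT.get(affected_parities(data, stored_parity))
    if bit is not None:
        out[bit] ^= 1
    return out


if __name__ == "__main__":
    original = {"D3": 1, "D2": 0, "D1": 1, "D0": 1}
    parity = generate_parity(original)
    single = dict(original, D1=0)                   # one SEU in D1
    print(corrected_output(single, parity) == original)
    double = dict(original, D1=0, D0=0)             # SEUs in both D1 and D0
    print(sorted(affected_parities(double, parity)))
    print(corrected_output(double, parity))
```

In the double-error case the model reproduces the miscorrection discussed above: the affected parities point to D2, so D2 is complemented while D1 and D0 stay erroneous.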
It should be noted that if an error (SEU) occurs directly in 
one of the parity flip-flops, it will have no impact on the data 
bits read out from the register. This is because when a SEU 
occurs in a parity flip-flop, only the corresponding parity bit is 
affected. However, the error correction circuitry inverts a data 
bit only when at least two parities are affected (Note that each 
data bit has been assigned to more than one parity bit). The 
penalty for using overlapping parity on 4 bits of information is 
high; 3 parity bits are required for the 4 bits of information. 
However,  as  the  number  of  information  bits  increases,  the 
number of parity bits required becomes a smaller percentage of 
the  number  of  actual  information  bits.  For  example,  only  7 
parity bits are adequate for protecting a 64-bit register [25]. It 
will be shown in Table III, Section IV-B, that the proposed RI 
system is still effective, even when considering the imposed 
hardware overheads. 
A.  Fault-tolerance assessment 
The correctness of a real-time system depends not only on 
the logical correctness of computation, but also on the time 
which the application takes to complete successfully. Hence, to 
measure the fault-tolerance of a real-time system, we need to 
consider both the tolerance to computation faults as well as the 
capability to meet deadlines (timely completion). Note that in 
soft  real  time  systems,  occasionally  missing  deadlines  has 
negligible effects, however it is still incorrect (i.e. incorrect but 
negligible). 
While  SEUs  can  cause  computation  faults,  lowering  the 
performance (speed), required by DVS and ABB, increases the 
application execution time and hence in the case of excessive 
performance reduction, it can cause missed deadlines. Also, in 
the  time-redundancy  technique  [3,7],  the  use  of  rollback 
executions to tolerate SEUs requires time and when a faulty 
run occurs the task can be re-executed only if it does not result 
in  missing  the  deadline  (i.e.  there  is  enough  slack  time 
available). From this discussion, it can be seen why in addition 
to computation faults, timely completion should be taken into 
account  when  assessing  the  fault-tolerance  of  real-time 
systems.  To  do  this,  the  related  literature  has  used  the 
following metrics to measure the fault-tolerance of real-time 
systems:  [2,31]  have  used  "the  likelihood  of  timely  task 
completion in the presence of faults", and [3,20] have used 
"the probability to complete the application correctly within its 
deadline in the presence of faults". It can be seen that these 
two  definitions  are  equivalent  and  both  of  them  consider 
timely  task  completion  in  the  presence  of  faults.  However, 
from a terminology point of view, the term "performability" is 
used in [3,20] to refer to this definition, while [2,31] do not 
use the term "performability". 
It  should  be  noted  that  for  hard  real-time  systems,  it  is 
important to guarantee timeliness in worst-case scenarios (for 
example in scheduling the tasks), however from the reliability 
point of view, it is not possible to claim that a task will be 
completed  correctly  within  its  deadline.  For  example,  in  a 
hard real-time system, even when the probability of having
errors is very low, errors can consecutively occur in the
original and recovery executions, so that the task cannot be
completed within its deadline (the probability of this happening
is very low but it is not zero). It can be stated that "in hard real 
time systems, we require that a task finishes correctly within its 
deadline with a very high probability". In other words, based 
on  the  above  mentioned  performability  definition,  it  can  be 
stated  that  "in  hard  real  time  systems  we  require  a  high 
performability (i.e. very near to 1)". For soft real-time systems, 
although missed deadlines will not cause catastrophes, we still 
require  that  a  task  can  be  completed  within  its  deadline. 
However, the probability of finishing the task within its deadline
could be less than what is required for hard real time systems.
In other words, in soft real-time systems we require a lower 
performability as compared to hard real-time systems. 
Using the performability criterion, this section presents an 
analysis for both the conventional R and proposed RI systems. 
In  this  section,  first  we  consider  the  system  operational 
frequency, since it determines the performance (speed) of the 
system, which has an important impact on the performability. 
Then  we  consider  SEU  rate  which  is  another  factor  with 
important influence on performability, since SEUs cause faults 
in computation results. Finally using the analytical models of 
operational  frequency  and  SEU  rate,  we  develop  the 
performability  models  for  both  the  conventional  R  and 
proposed RI systems. 
1)  Operational  frequency:  In  DVS-enabled  systems, 
reducing  the  supply  voltage  of  a  digital  circuit  requires  the 
reduction of the frequency in order to ensure correct operation. 
Similarly,  in  ABB-enabled  systems,  reducing  the  body  bias 
voltage  requires  the  reduction  of  the  frequency.  Analytical 
models  for  the  impact  of  ABB  and  DVS  on  the  system 
operational  frequency  have  been  developed  in  [21].  In  this 
section, we use the same models to formulate the operational 
frequency of the conventional R and proposed RI systems. 
When the conventional R system runs at supply voltage VR, 
and body bias voltage (applied between the body and source of 
transistors) Vbs R the operational frequency can be expressed as 
[21]: 
   
$$f_R(V_R, V_{bs\,R}) = (L_{d\,R} K_6)^{-1} \left[ (1 + K_1) V_R + K_2 V_{bs\,R} - V_{th1} \right]^{\alpha} \qquad (3)$$
 
where Ld R is the logic depth of the critical path, Vth1, K1, K2, 
and K6 are constants for given process technology, and α is a 
measure  of  velocity  saturation  whose  value  has  been 
approximated to be 1 [21]. 
This paper proposes the usage of information redundancy, 
which  requires  some  extra  hardware  logic  to  process  the 
redundant  information.  Suppose  that  because  of  the  extra 
hardware logic, the depth of the critical path of the proposed 
RI  system  is  KC  times  the  depth  of  the  critical  path  of  the 
conventional R system, i.e. 
$L_{d\,RI} = K_C \cdot L_{d\,R}$, then the operational
frequency of the proposed RI system is:

$$f_{RI}(V_{RI}, V_{bs\,RI}) = (K_C L_{d\,R} K_6)^{-1} \left[ (1 + K_1) V_{RI} + K_2 V_{bs\,RI} - V_{th1} \right]^{\alpha} \qquad (4)$$
 
where  VRI  and  Vbs  RI  are  the  supply  voltage  and  body  bias 
voltage in the proposed RI system respectively. 
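
A minimal Python sketch of the frequency models of Eqs. (3) and (4) is given below, assuming the form f = [(1+K1)V + K2·Vbs − Vth1]^α / (Ld·K6). The technology constants used in the demo are placeholder values chosen only to make the sketch runnable; they are not the values of [21] or of this paper, and the function names are hypothetical.

```python
def frequency(v_dd, v_bs, logic_depth, k1, k2, k6, v_th1, alpha=1.0):
    """Eq. (3): f = ((1 + K1)*Vdd + K2*Vbs - Vth1)**alpha / (Ld * K6)."""
    drive = (1.0 + k1) * v_dd + k2 * v_bs - v_th1
    if drive <= 0.0:
        raise ValueError("supply/body-bias voltages too low for this model")
    return drive ** alpha / (logic_depth * k6)


def frequency_ri(v_dd, v_bs, logic_depth_r, k_c, **tech):
    """Eq. (4): same model with the critical path lengthened by the factor K_C."""
    return frequency(v_dd, v_bs, k_c * logic_depth_r, **tech)


if __name__ == "__main__":
    # Placeholder technology constants (assumptions, for illustration only).
    tech = dict(k1=0.05, k2=0.15, k6=5e-12, v_th1=0.25)
    f_r = frequency(1.0, 0.0, logic_depth=40, **tech)
    f_ri = frequency_ri(1.0, 0.0, logic_depth_r=40, k_c=1.1, **tech)
    print(f_r, f_ri, f_ri / f_r)
```

At equal supply and body bias voltages the RI system is slower by exactly the factor KC in this model, so the extra logic depth must be paid for out of the slack gained by avoiding recoveries.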
2)  SEU  rate:  SEU  rate  is  the  average  number  of  SEUs, 
occurring in a system, per unit of time (e.g. second, hour). It 
has been observed that supply voltage (DVS) has an important 
influence  on  SEU  rate, so that as supply voltage decreases, 
SEU  rate  increases  exponentially  [8,9].  In  fact,  SEU  rate 
increases  about  1-2  orders  of  magnitude  as  supply  voltage 
decreases by 1V [8,9]. Also, it has been reported that body 
biasing techniques (ABB), used to reduce leakage power, can 
worsen SEU rate by 36% in flip-flops [11]. 
To analyze the impact of combined dynamic voltage scaling 
and adaptive body biasing on the SEU rate of flip-flops, we 
have used SPICE-based fault injection experiments. In these 
experiments,  faults  were  injected  to flip-flops similar to the 
flip-flops  used  in  [11]  (in  [11]  SEU  rate  measurements  are 
performed by subjecting the flip-flops to accelerated alpha and 
neutron fluxes). Fig. 2 shows the scheme of these flip-flops. 
The simulations were carried out using a 0.25 µm CMOS
technology. Faults were injected using current sources,
which can accurately represent the electrical impact of the
particle  strikes.  Similar  approaches  have  been  used  in prior 
works  [10,17,19].  The  injected  current  caused  by  a  particle 
strike is [10]: 
 
$$I_{Inj}(t) = \frac{2}{T \sqrt{\pi}} \cdot \sqrt{\frac{t}{T}} \cdot e^{-t/T} \qquad (5)$$
 
[Figure 2 schematic; signal labels: din, qout, CLK, CLKb, VDD, VSS, VPB, VNB]
 
Fig. 2.  Body controlled flip-flop [11] 
 
An SEU occurs if collected charge Q (caused by a particle 
strike) exceeds critical charge QCRIT of a circuit node. In other 
words, QCRIT can be defined as the minimum charge collected 
due to a particle strike that can cause a SEU [10]. It has been 
shown that there is an exponential relationship between SEU 
rate and QCRIT [10], i.e. 
 
$$\lambda \propto e^{-Q_{CRIT}} \qquad (6)$$

On the other hand QCRIT can be derived using eq. 7 [30]. 
 
$$Q_{CRIT} = \int_{0}^{T_F} I_d(t) \, dt \qquad (7)$$
 
where Id is the drain current induced by the charged particle, 
and  TF  is  the  flipping  time  which  defines  the  irreversibility 
point after which the feedback mechanism of the flip-flop will 
take over to continue the flipping process. SPICE simulations 
were used to measure the flipping time TF which can be used 
to calculate QCRIT. Fig. 3 depicts the experimental results and 
shows the impact of supply voltage Vdd and body bias voltage 
Vbs variations on the critical charge QCRIT. In this figure, three 
curves are plotted for three different body bias voltages. Each 
curve illustrates how QCRIT changes as Vdd changes. 
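
To illustrate how a flipping time translates into a charge via Eq. (7), the sketch below numerically integrates a current pulse of the form I(t) = q·2/(T√π)·√(t/T)·e^(−t/T) from 0 to TF. Several things here are assumptions made only for illustration: that the drain current Id follows this pulse shape, the scaling parameter q_total (with q_total = 1 the pulse carries unit charge over all time), and all names and numbers. The actual QCRIT values in this paper come from SPICE simulations, not from this calculation.

```python
import math


def injected_current(t, tau, q_total=1.0):
    """Pulse q_total * 2/(tau*sqrt(pi)) * sqrt(t/tau) * exp(-t/tau); zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    return q_total * 2.0 / (tau * math.sqrt(math.pi)) * math.sqrt(t / tau) * math.exp(-t / tau)


def collected_charge(t_flip, tau, q_total=1.0, steps=100_000):
    """Trapezoidal estimate of the integral of the pulse from 0 to t_flip (cf. Eq. 7)."""
    h = t_flip / steps
    total = 0.5 * (injected_current(0.0, tau, q_total) + injected_current(t_flip, tau, q_total))
    total += sum(injected_current(i * h, tau, q_total) for i in range(1, steps))
    return total * h


if __name__ == "__main__":
    tau = 1.0  # time constant in arbitrary units (hypothetical)
    for t_flip in (0.5, 1.0, 2.0, 50.0):
        print(t_flip, collected_charge(t_flip, tau))
```

The charge collected up to TF grows monotonically with TF and approaches q_total for TF much larger than the time constant.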
 
 
Fig. 3.  Impact of Vdd and Vbs variations on QCRIT. 
 
Three interesting observations can be made from Fig. 3: 
1) It can be seen from this figure that regardless of Vbs there is 
a linear relationship between QCRIT and Vdd, i.e. 
 
$$Q_{CRIT} \propto V_{dd} \quad \text{or} \quad Q_{CRIT} = C_1 \cdot V_{dd} + C_2 \qquad (8)$$
 
2) It can be seen from this figure that when Vbs changes, the 
line is shifted up or down; however the slope of the line is 
almost the same. This means that C2 is a function of Vbs, but C1 
(line slope) is not a function of Vbs. Therefore, we can rewrite 
eq. 8 as: 
 
$$Q_{CRIT} = C_1 \cdot V_{dd} + C_2(V_{bs}) \qquad (9)$$
 
3) The impact of Vdd on QCRIT is much more significant than 
the impact of Vbs. For example, when Vbs is constant and equal 
to 0, if one reduces Vdd by 1.5V (from 3.3 to 1.8), the critical 
charge will be reduced by about 11.5fC. However, when Vdd is 
constant and equal to 3.3V, if one reduces Vbs by 2V (from 0 
to -2), the critical charge will be reduced by about 3fC. This 
result is in agreement with the conclusions reached in [8,9,11], 
i.e. while variations in Vdd change the SEU rate by several
orders of magnitude (e.g. multiplied by a factor of 10, 100,
1000, or …) [8,9], variations in Vbs change the SEU rate only
by a factor of about 1.36. It should be noted that although
ABB does not have a major impact on SEU rate (as compared
to DVS), it still has an important impact on the system fault
tolerance (Section III-A-3). This is because when ABB is used 
to reduce energy consumption, it uses slack time and leaves 
less slack time for the recovery executions. 
 
As  mentioned  previously,  SEU  rate  is  exponentially 
proportional to QCRIT, therefore SEU rate can be expressed as: 
 
$$\lambda(V_{dd}, V_{bs}) = C_3 \cdot 10^{C_2(V_{bs})} \cdot 10^{C_1 \cdot V_{dd}} \qquad (10)$$
 
In order to show eq. 10 in a more suitable shape, let VMAX be 
the maximum supply voltage, and S be the voltage value that 
when supply voltage decreases by it, the SEU rate increases by 
one  order  of  magnitude.  Then  eq.  10  can  be  rewritten  as 
follows: 
 
$$\lambda(V_{dd}, V_{bs}) = C_4 \cdot 10^{C_2(V_{bs})} \cdot 10^{\frac{V_{MAX} - V_{dd}}{S}} \qquad (11)$$
 
It should be noted that Eq. 11 is obtained from Eq. 10, just
by defining new constants S and VMAX, i.e. $C_1 = -1/S$ and
$C_3 = C_4 \cdot 10^{V_{MAX}/S}$. Also, Eq. 11 can be rewritten as follows:

$$\lambda(V_{dd}, V_{bs}) = \lambda_0(V_{bs}) \cdot 10^{\frac{V_{MAX} - V_{dd}}{S}} \qquad (12)$$
 
where λ0(Vbs) is the SEU rate corresponding to Vdd=VMAX. In
this paper it is assumed that the SEU rate increases by one order
of magnitude as the supply voltage decreases by 1V (a reasonable
assumption based on the data in [8,9]), hence S=1V. Also, it is
assumed that λ0(0)=10^-6 FPS (faults per second), i.e. the SEU
rate at Vdd=VMAX and Vbs=0 (a reasonable assumption based on
the data in [3]). Although this assumption about the SEU rate
λ0(0) is reasonable for typical environments [3], the SEU rate
varies between environments, so we analyze the impact of SEU
rate variations on both the proposed RI and conventional R
systems in Section IV-C. As mentioned previously, it has been
shown that ABB can worsen the SEU rate by 36% [11]. Therefore,
it is assumed that λ0(VbsMIN)=1.36⋅λ0(0), where VbsMIN is the
minimum value of Vbs.
The use of information redundancy requires some extra
flip-flops to store the redundant bits. However, as the number of
flip-flops increases, the rate at which the flip-flops are hit
by particles increases linearly [19]. Suppose that, because of
the redundant bits, the number of flip-flops in the proposed
RI system is KFF times the number of flip-flops in the
conventional R system; then the SEU rate of the proposed RI
system is:
 
$$\lambda_{RI}(V_{dd}, V_{bs}) = K_{FF} \cdot \lambda_0(V_{bs}) \cdot 10^{\frac{V_{MAX} - V_{dd}}{S}} \tag{13}$$
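
The SEU rate model of Eqs. 12 and 13 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, VMAX is left as a caller-supplied argument, and the default Vbs_MIN of -1 V is borrowed from the case study in Section IV-A. The interpolation of λ0(Vbs) between the two given points relies on λ0 being exponential in Vbs, as Eq. 11 implies.

```python
LAMBDA0_NOMINAL = 1e-6  # lambda_0(0) in faults per second, as assumed in the text
S = 1.0                 # volts per decade of SEU-rate increase, as assumed in the text
ABB_FACTOR = 1.36       # lambda_0(Vbs_MIN) / lambda_0(0), taken from [11]


def lambda0(v_bs, v_bs_min=-1.0):
    """SEU rate at Vdd = VMAX for a given body bias.

    Eq. 11 makes lambda_0 exponential in Vbs, so the two anchor points
    lambda_0(0) and lambda_0(Vbs_MIN) determine it (assumption of this sketch).
    """
    return LAMBDA0_NOMINAL * ABB_FACTOR ** (v_bs / v_bs_min)


def seu_rate(v_dd, v_bs, v_max, k_ff=1.0, v_bs_min=-1.0):
    """Eq. 12 when k_ff = 1 (R system), Eq. 13 when k_ff = K_FF (RI system)."""
    return k_ff * lambda0(v_bs, v_bs_min) * 10.0 ** ((v_max - v_dd) / S)
```

For instance, with a hypothetical VMAX of 3.3 V, `seu_rate(2.3, 0.0, 3.3)` is one decade above λ0(0), and passing `k_ff=2.0` doubles the rate as in Eq. 13.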
 
3) Performability model: It has been observed that the time
instants at which radiation particle hits occur follow a
Poisson process [12]. Consequently, the Poisson distribution has
been commonly used to model the rate of particle-induced
faults (i.e. SEUs) [2,3,12]. In the conventional R system, based
on the Poisson distribution, the probability of having no SEU
during a given clock cycle is:
 
$$P_{0_R} = e^{-\frac{\lambda_R(V_R, V_{bs_R})}{f_R(V_R, V_{bs_R})}} \tag{14}$$
 
Therefore, in the conventional R system, the probability of 
having a faulty run (at least one SEU during one of the clock 
cycles) of the task is: 
 
$$\rho_{f_R} = 1 - P_{0_R}^{\,N} = 1 - e^{-\frac{\lambda_R(V_R, V_{bs_R}) \cdot N}{f_R(V_R, V_{bs_R})}} \tag{15}$$
 
where N is the number of clock cycles needed to execute the
task. Since the time required for one execution of the task is
$t_{exe} = N / f_R(V_R, V_{bs_R})$, the maximum number of
possible recoveries is:
 
$$k_{f_R} = \left\lfloor \frac{D}{t_{exe}} \right\rfloor - 1 = \left\lfloor \frac{D \cdot f_R(V_R, V_{bs_R})}{N} \right\rfloor - 1 \tag{16}$$
 
where D is the deadline (in seconds). Based on Eq. (15) and
Eq. (16), the performability of the conventional R system is:
 
$$R_{f_R} = 1 - \rho_{f_R}^{\,k_{f_R}+1} = 1 - \left(1 - e^{-\frac{\lambda_R(V_R, V_{bs_R}) \cdot N}{f_R(V_R, V_{bs_R})}}\right)^{\left\lfloor \frac{D \cdot f_R(V_R, V_{bs_R})}{N} \right\rfloor} \tag{17}$$
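
As a minimal sketch of how Eqs. 15 to 17 might be evaluated numerically, the Python below works with the unreliability 1 - R rather than the performability itself, because values such as 1-10^-40 cannot be distinguished from 1 in double precision. The function names are hypothetical, and the SEU rate and frequency are assumed to be supplied by the caller for the chosen voltage setting.

```python
import math


def faulty_run_prob_r(seu_rate, freq, n_cycles):
    """Eq. 15: probability that one run of the task sees at least one SEU."""
    # 1 - exp(-x) computed as -expm1(-x), which stays accurate for tiny x.
    return -math.expm1(-seu_rate * n_cycles / freq)


def max_recoveries(deadline, freq, n_cycles):
    """Eq. 16: number of recovery executions that fit before the deadline."""
    return math.floor(deadline * freq / n_cycles) - 1


def log10_unreliability_r(seu_rate, freq, n_cycles, deadline):
    """log10(1 - R) from Eq. 17, i.e. log10 of rho ** (k + 1); needs seu_rate > 0."""
    k = max_recoveries(deadline, freq, n_cycles)
    if k < 0:
        raise ValueError("the task cannot complete even once before the deadline")
    rho = faulty_run_prob_r(seu_rate, freq, n_cycles)
    return (k + 1) * math.log10(rho)
```

A returned value of about -40 corresponds to a performability of roughly 1-10^-40.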
 
In the proposed RI system, based on Poisson distribution, 
the probability of having no SEU during a given clock cycle is: 
 
$$P_{0_{RI}} = e^{-\frac{\lambda_{RI}(V_{RI}, V_{bs_{RI}})}{f_{RI}(V_{RI}, V_{bs_{RI}})}} \tag{18}$$
 
and the probability of having exactly one SEU in the clock 
cycle is: 
 
$$P_{1_{RI}} = \frac{\lambda_{RI}(V_{RI}, V_{bs_{RI}})}{f_{RI}(V_{RI}, V_{bs_{RI}})} \cdot e^{-\frac{\lambda_{RI}(V_{RI}, V_{bs_{RI}})}{f_{RI}(V_{RI}, V_{bs_{RI}})}} \tag{19}$$
 
Hence, the probability of having a faulty run in the proposed 
RI system can be expressed as: 
 
$$\rho_{f_{RI}} = 1 - \left(P_{0_{RI}} + P_{1_{RI}}\right)^N = 1 - \left(1 + \frac{\lambda_{RI}(V_{RI}, V_{bs_{RI}})}{f_{RI}(V_{RI}, V_{bs_{RI}})}\right)^N \cdot e^{-\frac{\lambda_{RI}(V_{RI}, V_{bs_{RI}}) \cdot N}{f_{RI}(V_{RI}, V_{bs_{RI}})}} \tag{20}$$
 
Note that, as mentioned in Section III, the proposed RI
system has a faulty run if more than one SEU (at least two
SEUs) occurs during a clock cycle. Based on Eq. 20, the
performability of the proposed RI system is:
 
$$R_{f_{RI}} = 1 - \rho_{f_{RI}}^{\,k_{f_{RI}}+1} = 1 - \left[1 - \left(1 + \frac{\lambda_{RI}(V_{RI}, V_{bs_{RI}})}{f_{RI}(V_{RI}, V_{bs_{RI}})}\right)^N \cdot e^{-\frac{\lambda_{RI}(V_{RI}, V_{bs_{RI}}) \cdot N}{f_{RI}(V_{RI}, V_{bs_{RI}})}}\right]^{\left\lfloor \frac{D \cdot f_{RI}(V_{RI}, V_{bs_{RI}})}{N} \right\rfloor} \tag{21}$$
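
Eqs. 20 and 21 are numerically delicate because λ/f is typically far below machine epsilon, so the product (1 + λ/f)^N · e^(-λN/f) rounds to exactly 1 if evaluated directly. The following Python sketch, with hypothetical function names, works in the log domain and uses a short Taylor series for ln(1 + x) - x; the series cut-off of 1e-4 is an arbitrary choice made for this illustration. The caller is assumed to pass the RI system's own SEU rate (Eq. 13) and operating frequency.

```python
import math


def faulty_run_prob_ri(seu_rate, freq, n_cycles):
    """Eq. 20: probability that some clock cycle of a run sees two or more SEUs."""
    x = seu_rate / freq  # expected number of SEUs per clock cycle
    if x < 1e-4:
        # ln(1 + x) - x from its Taylor series; the direct difference
        # cancels catastrophically when x is near machine epsilon.
        log_ok_cycle = -x**2 / 2 + x**3 / 3 - x**4 / 4
    else:
        log_ok_cycle = math.log1p(x) - x
    # 1 - ((1 + x) * exp(-x)) ** N, computed as -expm1(N * log_ok_cycle).
    return -math.expm1(n_cycles * log_ok_cycle)


def log10_unreliability_ri(seu_rate, freq, n_cycles, deadline):
    """log10(1 - R) from Eq. 21; needs seu_rate > 0."""
    runs = math.floor(deadline * freq / n_cycles)  # k + 1 executions in total
    if runs < 1:
        raise ValueError("the task cannot complete even once before the deadline")
    return runs * math.log10(faulty_run_prob_ri(seu_rate, freq, n_cycles))
```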
 
Eq. (17) and Eq. (21) will be used in Section IV to compare
the performabilities of the conventional R system (based on
time redundancy only) and the proposed RI system (i.e. the
proposed approach based on the combination of time and
information redundancy). It is important to note that the
performability of both the conventional R system and the
proposed RI system increases with increasing supply voltage
and body bias voltage (and consequently increasing
operational frequency). This is due to two reasons: a) more
recovery executions can be performed within the task deadline,
and b) the system is less susceptible to SEUs at higher supply
and body bias voltages. However, the performability of the RI
system is in general better than that of the R system when the
same supply and body bias voltages are used. This is because
the additional information redundancy in the RI system,
which does not require slack time for any recovery execution,
covers one SEU per clock cycle, hence leaving more slack
time for recoveries. This aspect will be clarified in Section IV.
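
One rough way to see the size of this advantage is to expand Eqs. 15 and 20 to leading order. The expansion below is an illustration rather than part of the model; it writes $x_R = \lambda_R / f_R$ and $x_{RI} = \lambda_{RI} / f_{RI}$ for the expected number of SEUs per clock cycle and assumes $N x \ll 1$ in both systems:

$$\rho_{f_R} = 1 - e^{-N x_R} \approx N x_R, \qquad \rho_{f_{RI}} = 1 - e^{N\left[\ln(1 + x_{RI}) - x_{RI}\right]} \approx \frac{N x_{RI}^2}{2}$$

Under that assumption the probability of a faulty run is first order in x for the R system but second order for the RI system, so constant-factor penalties in $\lambda_{RI}$ (through KFF) or in $f_{RI}$ change $\rho_{f_{RI}}$ far less than the information redundancy reduces it.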
B.  Energy consumption model 
The  energy  consumption  per  cycle  of the conventional R 
system is [21]: 
 
$$E_{cyc_R} = \underbrace{C_{eff} V_R^2}_{\text{Dynamic Energy}} + \underbrace{\frac{L_{g_R}}{f_R(V_R, V_{bs_R})} \left( V_R K_3 e^{K_4 V_R} e^{K_5 V_{bs_R}} + |V_{bs_R}| I_j \right)}_{\text{Static Energy}} \tag{22}$$
 
where Ceff is the average switched capacitance per cycle for the 
whole  circuit,  LgR  is  the  number  of  the  logic  gates  in  the 
circuit,  K3,  K4  and  K5  are  constant  parameters  and  Ij  is  the 
current due to junction leakage. 
As mentioned in Section III, in the proposed RI system
some extra hardware logic is needed to process the redundant
information. Suppose that, because of the extra hardware, the
number of gates in the proposed RI system is Ka times the
number of gates in the conventional R system, i.e. LgRI=Ka⋅LgR.
Let Ceff_extra be the average switched capacitance per cycle for
this extra hardware logic. Then the energy consumption (per
cycle) of the proposed RI system is:
 
$$E_{cyc_{RI}} = (C_{eff} + C_{eff\_extra}) V_{RI}^2 + \frac{K_a \cdot L_{g_R}}{f_{RI}(V_{RI}, V_{bs_{RI}})} \left( V_{RI} K_3 e^{K_4 V_{RI}} e^{K_5 V_{bs_{RI}}} + |V_{bs_{RI}}| I_j \right) \tag{23}$$
 
As mentioned in Section III, both the conventional R and 
the  proposed  RI  systems  use  rollback-recovery,  i.e.  after  a 
faulty  run  the  task  has  to  be  re-executed.  Such  recovery 
executions  consume  energy,  just like the original execution. 
Therefore,  to  analyze  the  energy  consumption  of  the 
conventional R and proposed RI systems, the expected value 
of  energy  consumption  should  be  considered.  The  expected 
energy consumption is [3]: 
 
$$EE = N \cdot E_{cyc} \cdot \sum_{i=0}^{k_f} \rho_f^{\,i} = N \cdot E_{cyc} \cdot \frac{1 - \rho_f^{\,k_f+1}}{1 - \rho_f} \tag{24}$$
 
where Ecyc is given either by Eq. (22) or Eq. (23), depending 
on which system type is considered. According to Eqs. (22)-
(24), if the conventional R system and the proposed RI system 
operate  at  the  same  supply  and  body  bias  voltages,  the  RI 
system  will  show  higher  energy  consumption  than  the  R 
system. However, it is important to note that the RI system has 
a much better performability than the R system at the same 
voltage setting, so that it is possible (see Section IV) to lower 
the supply voltage and body bias voltage of the RI system via 
DVS and ABB to achieve less energy dissipation than the R 
system,  even  though  the  RI  system  still  provides  better 
performability than the R system. 
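
The energy model of Eqs. 22 to 24 can be sketched as follows. The constants are the Crusoe values quoted in Section IV-A; the function names are hypothetical, the operating frequency is assumed to be computed elsewhere and passed in, and taking the absolute value of Vbs in the junction-leakage term is an assumption of this sketch that keeps the term non-negative under reverse bias.

```python
import math

# Crusoe constants quoted in Section IV-A.
C_EFF = 1.11e-9                   # average switched capacitance per cycle (F)
L_G = 4e6                         # number of logic gates in the R system
K3, K4, K5 = 3.0e-9, 1.63, 3.65   # leakage fitting constants
I_J = 2.40e-10                    # junction leakage current (A)


def energy_per_cycle(v_dd, v_bs, freq, k_a=1.0, c_eff_extra=0.0):
    """Eq. 22 with the defaults, Eq. 23 with k_a = Ka and c_eff_extra = Ceff_extra."""
    dynamic = (C_EFF + c_eff_extra) * v_dd**2
    # abs(v_bs) is an assumption of this sketch (see the lead-in).
    leakage_power_per_gate = (
        v_dd * K3 * math.exp(K4 * v_dd) * math.exp(K5 * v_bs) + abs(v_bs) * I_J
    )
    static = k_a * L_G / freq * leakage_power_per_gate
    return dynamic + static


def expected_energy(e_cyc, n_cycles, rho_f, k_f):
    """Eq. 24 in closed form; requires rho_f < 1."""
    return n_cycles * e_cyc * (1.0 - rho_f ** (k_f + 1)) / (1.0 - rho_f)
```

With `k_a=2.0` and `c_eff_extra=C_EFF` the same function gives the per-cycle energy of an RI system with 100% hardware and switching overhead, which at equal voltages is necessarily higher than that of the R system.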
 
IV.  EXPERIMENTAL AND ANALYTICAL RESULTS 
In this section we validate the efficiency and applicability of 
the  proposed  combined  time  and  information  redundancy 
approach as compared to the time-redundancy approach. For 
this purpose we have performed a Crusoe processor case study 
as  well  as  some  experiments  using  several  ITC’99 
benchmarks.  Section  IV-A  compares  the  performability  and 
energy dissipation of the conventional R and the proposed RI 
systems  based  on  the  Crusoe  processor.  Section  IV-B 
investigates  the  influence  of  hardware  overhead  on  the 
suitability  of  the  proposed  approach  and  presents  synthesis 
results to clarify the typical hardware overhead. Section IV-C 
studies the impact of the SEU rate on the proposed approach. 
A.  Case study: Crusoe processor 
This section demonstrates that it is possible to achieve both 
higher  performability  and  less  energy  consumption  using  a 
combination of information and time redundancy techniques 
(proposed  RI  system)  when  compared  to  using  time 
redundancy alone (conventional R System). We use as a case 
study  a  Transmeta  Crusoe  processor  implemented  in  0.18µ 
CMOS  technology,  for  which  implementation-relevant 
parameters are given in [21,22]. These parameters comprise 
the  following  constants  needed  for  the  evaluation  of 
performability  and  energy  (Eqs.  (3),  (4),  and  (14)-(24)): 
K1=0.053, K2=0.140, K3=3.0⋅10^-9, K4=1.63, K5=3.65,
K6=51⋅10^-12, Vth1=0.359 V, Ceff=1.11⋅10^-9 F, Ld=37, Lg=4⋅10^6,
Ij=2.40⋅10^-10 A. As an example, a task with N=3⋅10^6 clock
cycles and a deadline at D=20 ms is considered here. This task
has a worst-case execution time of N/f(Vdd,Vbs)=4.2 ms, when 
Vdd=1.6V and |Vbs|=0V. It should be noted that these values for 
execution time and deadline are considered only as an example 
which is used as a case study to plot the trade-off graphs. For 
this example, the deadline allows 3 recovery executions of the 
whole task at Vdd=1.6 V and |Vbs|=0 V. Furthermore, for the RI 
system we assume a hardware overhead as well as increased 
switching activity of 100% (i.e. Ka=2, KFF=2, Ceff_extra=Ceff), 
and  a  critical  path  depth  increase  of  10%  (KC=1.1).  This 
assumption will be examined in Section IV-B. 
Using  the  analytical  models  developed  in  Section  III,  we 
analyze the energy/performability trade-off in the conventional 
R and proposed RI systems when: a) DVS is used, and b) both 
DVS and ABB are simultaneously used. 
 
Fig. 4.  Energy/Performability trade-off in DVS-enabled systems
(energy consumption in mJ versus performability, with points
labeled by supply voltage Vdd). The conventional R system
trade-off graph is obtained from Eqs. 17 and 24; the proposed RI
system trade-off graph is obtained from Eqs. 21 and 24.
 
1) Energy/performability trade-off in DVS-enabled systems: 
Fig.  4  shows  how  the  energy  consumption  and  the 
performability of the conventional R and proposed RI systems 
change when DVS is used (supply voltage Vdd changes and 
body  bias  voltage  is  constant  |Vbs|=0 V). In this figure, the 
curve of the conventional R system is an energy/performability 
trade-off  graph  obtained  from  eqs.  17  and  24.  Also,  the 
energy/performability  trade-off  graph  of  the  proposed  RI 
system has been obtained from eqs. 21 and 24. It can be seen 
from  this  figure  that  in  both  systems  we  can  improve  the
performability  (fault-tolerance)  by  increasing  the  supply 
voltage; however this increases the energy consumption of the 
system. 
As shown in Fig. 4, when the supply voltage of the proposed 
RI  system  increases  from  1.4V  to  1.5V,  the  performability 
does  not  increase  considerably.  However,  when  the  supply 
voltage  increases  from  1.5V  to  1.6V,  the  performability 
abruptly increases. This is because when the supply voltage 
increases from 1.4V to 1.5V the number of possible recovery 
executions remains the same (the performability has a small 
improvement  because  of  the  SEU  rate  reduction),  however 
when  the  supply  voltage  changes  from  1.5V  to  1.6V,  the 
operational frequency reaches the level sufficient to have one 
more  recovery  execution,  which  leads  to  an  abrupt 
improvement in performability. A similar pattern is observed 
for the conventional R system when, for example, the supply 
voltage changes from 1.2V to 1.4V and from 1.4V to 1.6V. 
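
The step-like behavior described above follows from the floor operation in Eq. 16: shortening the execution time only helps once another whole execution fits before the deadline. A small Python sketch of this effect is given below; the function name is hypothetical, and only the 4.2 ms execution time comes from the case study, while the other two values are made up for illustration.

```python
import math


def recoveries_for_exec_time(deadline, t_exe):
    """Eq. 16 written in terms of the execution time t_exe = N / f."""
    return math.floor(deadline / t_exe) - 1


# Execution times in seconds for a 20 ms deadline. Only 4.2 ms is taken
# from the text (Vdd = 1.6 V, Vbs = 0 V); 5.6 ms and 5.1 ms are hypothetical.
for t_exe in (5.6e-3, 5.1e-3, 4.2e-3):
    k = recoveries_for_exec_time(20e-3, t_exe)
    print(f"t_exe = {t_exe * 1e3:.1f} ms -> {k} recovery executions")
```

Going from 5.6 ms to 5.1 ms leaves the number of recoveries unchanged, whereas reaching 4.2 ms allows one more, which is the kind of jump visible in the trade-off curves.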
It can be seen from Fig. 4 that the curve of the proposed RI 
system is below the curve of the conventional R system. This 
leads to an interesting conclusion: 
•  When  the  DVS  technique  is  employed,  it  is  possible  to 
achieve both higher fault-tolerance and less energy using 
the  proposed  RI  system  when  compared  to  the 
conventional  R  system.  We  clarify  this  by means of the 
following examples: 
1) Suppose we require a performability higher than 1-10^-40
(this performability is very close to 1, which means that we
require a hard real-time system). As can be seen in Fig. 4,
to meet this requirement we can use the conventional R
system with the supply voltage Vdd=2.4 V. However, if we
use the proposed RI system with the supply voltage
Vdd=1.3 V, we achieve the required performability as
well as about 43% energy saving. In fact, compared to the
conventional R system at Vdd=2.4 V, the proposed RI
system can provide both higher performability and
lower energy consumption at the same time if we apply the
supply voltages 1.4 V, 1.5 V, or 1.6 V (Fig. 4).
2) Suppose we require a maximum energy consumption of
10 mJ. As can be seen in Fig. 4, to meet this requirement
we can use the conventional R system with the supply
voltage Vdd=1.6 V, which leads to a performability of
1-10^-20. However, if we use the proposed RI system with
the supply voltage Vdd=1.1 V, we meet the energy
constraint and at the same time achieve a better
performability (i.e. 1-10^-30) than the conventional R system.
 
2) Comparison of the R and RI systems and simultaneous 
DVS  and  ABB:  Using  the  analytical  models  developed  in 
Section III, Fig. 5 shows how the energy consumption and the 
performability of the conventional R and proposed RI systems 
change when DVS and ABB are used simultaneously. In this 
figure, for each system (R and RI) two curves are plotted for 
two different body bias voltages, i.e. Vbs=0, and Vbs=Vbs_MIN= 
-1 V. Each curve illustrates the energy/performability trade-off 
when the supply voltage Vdd changes. As shown in Fig. 5, the
curves of the proposed RI system are below the curves of the 
conventional  R  system.  An  interesting  observation  can  be 
made from Fig. 5: 
•  When both the DVS and ABB techniques are employed,
for the same constraint on system fault-tolerance
(performability) the proposed RI system offers lower
energy consumption than the conventional R system. For
example, if we require a performability of more than
1-10^-40, as can be seen in Fig. 5 we can use one of the
following combinations:
1) Conventional R (Vdd, Vbs)=(2.4,0)     
2) Conventional R (Vdd, Vbs)=(2.4,-1) 
3) Proposed RI (Vdd, Vbs)=(1.4,0)     
4) Proposed RI (Vdd, Vbs)=(1.4,-1) 
However, if we use the combination 4, i.e. proposed RI 
(Vdd,  Vbs)=(1.4,-1),  we  will  achieve  the  required 
performability as well as the least energy consumption. 
 
Fig. 5.  Energy/Performability trade-off in DVS and ABB enabled systems
(energy consumption in mJ versus performability; curves for the
conventional R and proposed RI systems at Vbs=0V and Vbs=-1V, with
points labeled by supply voltage Vdd)
 
B.  Hardware overhead 
Although  the previous analysis has been carried out for the 
Crusoe processor, most of the parameters (Section IV-A) are 
independent from the Crusoe design and are only dependent on 
the used technology. In fact, the only parameters that depend 
on the Crusoe processor are, i) number of the gates and flip-
flops,  ii)  average  switched  capacitance,  and  iii)  depth  of 
critical  path.  The  hardware  overhead,  which  is  required  to 
process  the  redundant  information,  influences  these  three 
parameters.  In  order  to  examine  the  assumptions  made  in 
Section IV-A about the hardware overhead value, and to study 
the impact of the overhead on the efficiency of the proposed 
approach, we have regenerated the plots of Fig. 5 in Fig. 6 for 
different  parameters  settings,  i.e. critical path increase (KC), 
hardware  overhead  (Ka  and  KFF)  and  switching  activity 
(switched capacitance) overhead (Ceff_extra). 
As we can observe from Fig. 6a, if the RI system hardware
overhead as well as the switching activity are assumed to be
50% higher than in the original R system and the critical path
increase is assumed to be 4%, then the proposed RI system proves
advantageous in terms of both fault-tolerance (performability)
and energy dissipation. With increasing critical path (up to
10%), hardware and switching overheads (up to 200%), the
energy consumption and performability of the proposed RI
system become closer to those of the conventional R system
(Fig. 6a to 6d); however, the proposed RI system still provides
better performability and energy dissipation.
To provide insight into the critical path, hardware and
switching activity overhead required for typical circuit designs,
we have carried out some synthesis experiments using four
circuits from the ITC'99 benchmarks and Synopsys Design
Compiler. The benchmarks which have been used are
benchmarks b12 through b15. These benchmarks are: 80386
processor (subset), Viper processor (subset), 1 player game,
and sensor interfaces. Some of the other ITC'99 benchmarks
are so small that they can be regarded as simple
components (such as b1 and b2). Also, the other ITC'99
benchmarks include several copies of benchmarks b15 and b14
(such as b16 and b17). We have used the most appropriate
benchmarks among the ITC'99 benchmarks (such as
processors which can be used in real-time applications).
The experiments were performed for the unmodified circuits
(representing the R systems) as well as for the modified
circuits (based on the overlapping parity method [25]) that
included the extra hardware for the redundant information
(representing the RI systems). To apply the overlapping parity
technique, the flip-flops of the system are divided into
registers (with different sizes) and each register is replaced
with a corresponding SEU-tolerant register (see Section III).
This process has been performed manually.
 
Fig. 6.  Impact of information redundancy hardware (energy
consumption in mJ versus performability; curves for the
conventional R and proposed RI systems at Vbs=0V and Vbs=-1V,
with points labeled by supply voltage Vdd):
(a) Ceff_extra=0.5 Ceff, Ka=1.5, KFF=1.5, KC=1.04
(b) Ceff_extra=Ceff, Ka=2, KFF=2, KC=1.06
(c) Ceff_extra=1.5 Ceff, Ka=2.5, KFF=2.5, KC=1.08
(d) Ceff_extra=2 Ceff, Ka=3, KFF=3, KC=1.10
 
After synthesis, the total number of signal transitions was 
used as a criterion to analyze the average switched capacitance 
and,  hence,  the  dynamic  energy  consumption.  It  should  be 
noted that the hardware overhead also accounts for the static 
energy  overhead  (see  Section  III-B).  Table  III  shows  the 
experimental  results.  As  shown  in  this  table,  the  performed 
experiments  indicate  a  hardware  overhead  of 42% to 173% 
and a switching activity overhead of 59% to 161%. Also, it has 
been found that the critical path length increase is less than 
7%.  Note  that  for  such  overheads  the  proposed  RI  system 
yields better results in terms of energy and performability (Fig. 
6).  Overall,  the  experiments  presented  in  this  section  have 
shown that the proposed RI systems offer advantages in terms 
of  energy  and  performability  over  conventional  R  systems. 
This is particularly the case if the hardware overhead for the
additional information redundancy can be kept below 200% 
(Fig. 6). 
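
The gate overhead figures reported for the four benchmarks can be reproduced from the gate counts in the overhead table. The short Python sketch below does this; the dictionary layout and function name are hypothetical, and the overhead is taken relative to the original gate count, which is an assumption that happens to match the reported values to one decimal place.

```python
# (gates without, gates with information redundancy), from the overhead table.
GATES = {
    "b12": (1129, 2693),
    "b13": (388, 1060),
    "b14": (10658, 15211),
    "b15": (9017, 16146),
}


def overhead_percent(original, modified):
    """Overhead of the modified circuit relative to the original, in percent."""
    return 100.0 * (modified - original) / original


gate_overhead = {name: round(overhead_percent(*g), 1) for name, g in GATES.items()}
# Ka in Eq. 23 is the ratio of gate counts, LgRI / LgR.
k_a = {name: modified / original for name, (original, modified) in GATES.items()}
```

The resulting Ka values range from about 1.43 (b14) to about 2.73 (b13), which is the range covered by the panels of Fig. 6.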
C.  Impact of SEU rate 
So far, we have assumed that λ0(0)=10^-6 FPS (Section
III-A-2). However, the SEU rate depends on the application
environments and hence it is worthwhile to study the impact of 
the SEU rate on the efficiency of the proposed RI system. To 
do this, we have regenerated the plots of Fig. 5 in Fig. 7 for 
different SEU rates. Here, it is assumed that: switching activity 
overhead=100%  (Ceff_extra=Ceff),  hardware  overhead=100% 
(Ka=2,KFF=2), and critical path increase=10% (KC=1.10). 
It can be seen from Fig. 7 that the proposed RI system
proves more advantageous than the conventional R system
when the SEU rate is larger. We clarify this by means of the
following example: Suppose we require a performability of
more than 1-10^-40. To achieve this level of performability:
•  When λ0(0)=10^-9 FPS (Fig. 7a), we can use the
conventional R system at (Vdd, Vbs) = (1.8V, 0V) and the
proposed RI system at (Vdd, Vbs) = (1V, 0V). At these
voltage settings, the proposed RI system offers about
22% energy saving as compared to the conventional R
system.
•  When λ0(0)=10^-6 FPS (Fig. 7b), we can use the
conventional R system at (Vdd, Vbs) = (2.4V, -1V) and the
proposed RI system at (Vdd, Vbs) = (1.4V, -1V). At these
voltage settings, the proposed RI system offers about
42% energy saving as compared to the conventional
R system.
•  When λ0(0)=10^-3 FPS (Fig. 7c), we can use the
conventional R system at (Vdd, Vbs) = (3.2V, -1V) and the
proposed RI system at (Vdd, Vbs) = (1.8V, 0V). At these
voltage settings, the proposed RI system offers about
44% energy saving as compared to the conventional R
system.
In short, with the performability constraint of 1-10^-40, as the
SEU rate increases from λ0(0)=10^-9 FPS to λ0(0)=10^-3 FPS, the
energy saving of the proposed RI system over the conventional
R system increases from 22% to 44%.
 
Fig. 7.  Impact of SEU rate (energy consumption in mJ versus
performability; curves for the conventional R and proposed RI
systems at Vbs=0V and Vbs=-1V, with points labeled by supply
voltage Vdd): (a) λ0=10^-9 FPS, (b) λ0=10^-6 FPS, (c) λ0=10^-3 FPS
 
TABLE II
OVERHEAD OF THE INFORMATION REDUNDANCY TECHNIQUE ON ITC'99 BENCHMARKS

            Original circuit    Circuit with
            without inform.     information
            redundancy          redundancy        Overhead (%)
Benchmark   #Gates    #FF       #Gates    #FF     Gates    FF      Critical path   Switching activity
B12         1129      121       2693      149     138.5    18.8    6.6             144.0
B13         388       53        1060      72      173.2    26.4    6.1             161.4
B14         10658     245       15211     297     42.7     17.6    3.9             59.7
B15         9017      449       16146     554     79.1     19.0    6.4             79.2
V.  CONCLUSION 
High fault-tolerance against transient faults (SEUs) and low
energy consumption are key objectives in the design of
real-time embedded systems. There exist effective energy saving
techniques such as DVS and ABB and mature fault-tolerance
techniques which can be used to achieve these objectives.
However, careful consideration is needed to achieve both
objectives simultaneously, since it has been shown that these
two objectives are at odds, i.e. the usage of fault-tolerance
techniques increases energy dissipation and the usage of
energy-saving techniques reduces system reliability.
This paper has intended to contribute to the effort of finding 
suitable  fault  tolerance  techniques,  to  be  used  with  systems 
that employ energy management techniques. It is not intended 
to provide any new fault-tolerance or energy saving technique. 
Toward  this  goal,  this  paper  has  presented  the  first 
investigation  into  the  usage  of  information  redundancy  in 
DVS-enabled  and  ABB-enabled  systems.  Experimental  and 
analytical studies have shown that the use of a combination of
information-redundancy  and  rollback-recovery  in  DVS-
enabled and ABB-enabled real-time systems can significantly 
improve  the  system’s  fault-tolerance  as  well  as  energy 
dissipation, when compared to the real-time systems that rely 
solely  on  rollback-recovery,  even  when  considering  the 
imposed  hardware  overheads.  Since  the  SEU  rate  varies  in 
different  environments,  the  impact  of  the  SEU  rate  on  the 
suitability of the proposed approach has been analyzed. The 
analysis  has  shown  that  as  the  SEU  rate  increases,  the 
proposed  system  (based  on  the  combination of information-
redundancy and rollback-recovery) proves more advantageous 
in terms of energy consumption than the conventional system 
(sole rollback recovery). 
 
VI.  ACKNOWLEDGEMENT 
The authors are thankful to the reviewers for their insightful
comments, which improved the contribution and clarity of the
work.
 
REFERENCES 
[1]  R.  Melhem,  D.  Mosse,  and  E.  Elnozahy,  "The  interplay  of  power 
management  and  fault  recovery  in  real-time  systems,"  IEEE  Trans. 
Computers, vol. 53, no. 2, pp. 217-231, 2004. 
[2]  Y. Zhang and K. Chakrabarty, "Dynamic adaptation for fault tolerance
and power management in embedded real-time systems," ACM Trans.
Embedded Computing Systems, vol. 3, no. 2, pp. 336-360, 2004.
[3]  D. Zhu, R. Melhem, and D. Mosse, "The Effects of Energy Management 
on Reliability in Real-Time Embedded Systems," in Proc. Intl. Conf. 
Computer Aided Design (ICCAD 2004), pp. 35-40, 2004. 
[4]  M. T. Schmitz, B. M. Al-Hashimi, and P. Eles, System-Level Design 
Techniques for Energy-Efficient Embedded Systems, Kluwer Academic 
Publisher, 2004. 
[5]  T. D. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, "A 
dynamic voltage scaled microprocessor system," IEEE Journal of Solid-
State Circuits, vol. 35, no. 11, pp. 1571-1580, 2000. 
[6]  C. H. Kim and K. Roy, "Dynamic VTH scaling scheme for active leakage 
power  reduction,"  in  Design,  Automation  and  Test  in  Europe 
Conference and Exhibition, 2002.Proceedings, pp. 163-167, 2002. 
[7]  F. Liberato, R. Melhem, and D. Mosse, "Tolerance to multiple transient 
faults  for  aperiodic  tasks  in  hard  real-time  systems,"  IEEE  Trans. 
Computers, vol. 49, no. 9, pp. 906-914, 2000. 
[8]  N. Seifert, D. Moyer, N. Leland, and R. Hokinson, "Historical trend in
alpha-particle induced soft error rates of the Alpha microprocessor," in
Proc. 39th Annual IEEE Intl. Reliability Physics Symp., pp. 259-265,
2001.
[9]  P.Hazucha, C.Svensson, and S.A.Wender, "Cosmic-ray soft error rate 
characterization of a standard 0.6-µm CMOS process," IEEE Journal of 
Solid-State Circuits, vol. 35, no. 10, pp. 1422-1429, 2000. 
[10]  P.Hazucha and C.Svensson, "Impact of CMOS Technology Scaling on 
the  Atmospheric  Neutron  Soft  Error  Rate,"  IEEE  Trans.  Nuclear 
Science, vol. 47, no. 6, pp. 2586-2594, 2000. 
[11]  T.Karnik, J.Tschanz, B.Bloechel, P.Hazucha, P.Armstrong, S.Narendra, 
A.Keshavarzi,  K.Soumyanath,  G.Dermer,  J.Maiz,  S.Borkar,and  V.De, 
"Impact of Body Bias on Alpha- and Neutron-Induced Soft Error Rates 
of  Flip-flops",  in  Digest  of  Technical  Papers,  2004  Symp.  On  VLSI 
Circuits, pp. 324-325, 2004.  
[12]  P. A. Ferreyra, C. A. Marques, R. T. Ferreyra, and J. P. Gaspar, "Failure 
map  functions  and  accelerated  mean  time  to  failure  tests:  new 
approaches for improving the reliability estimation in systems exposed 
to single event upsets," IEEE Trans. Nuclear Science, vol. 52, no. 1, pp. 
494-500, 2005. 
[13]  P. E. Dodd, M. R. Shaneyfelt, J. R. Schwank, and G. L. Hash, "Neutron-
induced latchup in SRAMs at ground level," in Proc. 41st Annual IEEE 
Intl. Reliability Physics Symp., pp. 51-55, 2003. 
[14]  P.E.Dodd  and  L.W.Massengill,  "Basic  Mechanisms  and  Modeling of 
Single-Event Upset in Digital Microelectronics," IEEE Trans. Nuclear 
Science, vol. 50, no. 3, pp. 583-602, 2003. 
[15]  F.  Worm,  P.  Ienne,  P.  Thiran,  and  G.  DeMicheli,  "A  Robust  Self-
Calibrating  Transmission  Scheme  for  On-Chip  Networks,"  IEEE 
Trans.Very Large Scale Integration (VLSI) Systems, vol. 13, no. 1, pp. 
126-139, 2005. 
[16]  V. Degalahal, N. Vijaykrishnan, and M. J. Irwin, "Analyzing Soft Errors 
in Leakage Optimized SRAM Design," in Proc.16th Intl. Conf. VLSI 
Design, pp. 227-233, 2003. 
[17]  V. Degalahal, R. Ramanarayanan, N. Vijaykrishnan, Y. Xie, and M. J. 
Irwin, "The Effect of Threshold Voltages on the Soft Error Rate," in 
Proc. 5th Intl. Symp. Quality Electronic Design (ISQED), pp. 503-508, 
2004. 
[18]  D. Bertozzi, L. Benini, and G. DeMicheli, "Error Control Schemes for 
On-Chip  Communication  Links:  The  Energy–Reliability  Tradeoff," 
IEEE  Trans.  Computer-Aided  Design  of  Integrated  Circuits  and 
Systems, vol. 24, no. 6, pp. 818-831, 2005. 
[19]  A.  Maheshwari,  W.  Burleson,  and  R.  Tessier,  "Trading  off  transient 
fault tolerance and power consumption in deep submicron (DSM) VLSI 
circuits," IEEE Trans. Very Large Scale Integration (VLSI) Systems, 
vol. 12, no. 3, pp. 299-311, 2004. 
[20]  K.  M.  Kavi,  Hee  Yong  Youn,  B.  Shirazi,  and  A.  R.  Hurson,  "A 
performability model for soft real-time systems," in Proc.of the 27th 
Hawaii Intl. Conf. System Sciences (HICSS), pp. 571-579, 1994. 
[21]  S.  M.  Martin,  K.  Flautner,  T.  Mudge,  and  D.  Blaauw,  "Combined 
dynamic  voltage  scaling  and  adaptive  body  biasing  for  lower  power 
microprocessors under dynamic workloads," in Proc. IEEE/ACM Intl. 
Conf. Computer Aided Design (ICCAD 2002), pp. 721-725, 2002. 
[22]  "TM5400/TM5600 Data Book," TRANSMETA, 2000. 
[23]  S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, "Robust system 
design with built-in soft-error resilience," Computer, vol. 38, no. 2, pp. 
43-52, 2005. 
[24]  J. Gaisler, "A portable and fault-tolerant microprocessor based on the 
SPARC v8 architecture," in Proc. Intl. Conf. Dependable Systems and 
Networks (DSN 2002), pp. 409-415, 2002. 
[25]  B.W. Johnson, Design and Analysis of Fault Tolerant Digital Systems, 
Boston, MA: Addison-Wesley, 1989. 
[26]  A.  Andrei,  M.  Schmitz,  P.  Eles,  Z.  Peng,  B.M.  Al  Hashimi, 
“Simultaneous  Communication  and  Processor  Voltage  Scaling  for 
Dynamic  and  Leakage  Energy  Reduction  in  Time-Constrained 
Systems”,  in  Proc.  IEEE/ACM    Intl.  Conf.  Computer  Aided  Design 
(ICCAD 2004), pp. 362-369, 2004. 
[27]  D.K.Pradhan, Fault-Tolerant Computer System Design, Prentice Hall, 
1996.
[28]  M. Miyazaki, G. Ono, and K. Ishibashi, "A 1.2-GIPS/W microprocessor
using speed-adaptive threshold-voltage CMOS with forward bias", IEEE
Journal of Solid-State Circuits, vol. 37, pp. 210-217, Feb. 2002.
[29]  M.  Sumita,  "High  resolution  body  bias  techniques  for  reducing  the 
impacts of leakage current and parasitic bipolar," in Proc. Intl. Symp. 
Low Power Electronics Design (ISLPED), pp. 203-208, 2005. 
[30]  D.  Lambert,  J.  Baggio,  V.  Ferlet-Cavrois,  O.  Flament,  F.  Saigne,  B. 
Sagnes,  N.  Buard,  and  T.  Carrière,  "Neutron-Induced  SEU  in  Bulk 
SRAMs  in  Terrestrial  Environment:  Simulations  and  Experiments," 
IEEE Trans. Nuclear Science, vol. 51, no. 6, 2004. 
[31]  Y. Zhang and K. Chakrabarty, "A unified approach for fault tolerance 
and dynamic power management in real-time embedded systems," IEEE 
Trans.  Computer-Aided  Design  of  Integrated  Circuits  and 
Systems, vol. 25, pp. 111-125, Jan. 2006. 
 
 
 
Alireza Ejlali received the M.Sc. and Ph.D. 
degrees in computer engineering from Sharif 
University  of  Technology,  Tehran,  Iran,  in 
1999  and  2006  respectively.  From  2005  to 
2006,  he  was  working  on  the  trade-off 
between fault tolerance and energy efficiency 
as  a  visiting  researcher  in  the  Electronic 
Systems  Design  Group,  University  of 
Southampton, UK. 
His  research  interests  include  dependability 
evaluation, fault injection, and fault tolerant 
embedded systems.  
   
 
 
 
 
Bashir  M.  Al-Hashimi  (M’99-SM’01) 
received  the  B.Sc.  degree  (with  1st-class 
classification)  in  Electrical  and  Electronics 
Engineering from the University of Bath, UK, 
in  1984  and  the  Ph.D.  degree  from  York 
University,  UK,  in  1989.  Following  this  he 
worked  in  the  semiconductor  industry 
designing  integrated  circuits  for  signal 
processing  applications,  developing  CAD 
tools for simulation and synthesis of analogue 
and  digital  circuits.  In  1999,  he  joined  the 
School of Electronics and Computer Science, 
Southampton  University,  UK,  where  he  is 
currently a Professor of Computer Engineering. He has authored one book on 
SPICE  simulation  (CRC  Press,  1995)  and  coauthored  two  books,  Power 
Constrained  Testing  of  VLSI  circuits  (Springer,  2002),  and  System-Level 
Design Techniques for Energy-Efficient Embedded Systems (Springer, 2004). 
Recently, he edited the book, System-on-Chip: Next Generation Electronics 
(IEE Press, 2006). He has published over 160 papers in journals and refereed 
conference proceedings. His current research and teaching interests include 
low-power system-level design, system-on-chip test, and VLSI CAD.      
Prof. Al-Hashimi is a Fellow of the IEE and a Senior Member of the IEEE. He 
is  the  Editor-in-Chief  of  the  IEE  Proceedings:  Computers  and  Digital 
Techniques,  an  editor  of  the  Journal  of  Electronic  Testing:  Theory  and 
Applications (JETTA), and a member of the editorial boards of the Journal of 
Low Power Electronics and the Journal of Embedded Computing. He is the 
General  Chair  of  the  11th  IEEE  European  Test  Symposium  (Southampton 
2006) and the General Chair of DATE Friday Workshops (2005 and 2006). 
He coauthored the paper on low-power BIST for RTL data paths that received 
the James Beausang Best Paper Award at the 2000 IEEE International Test 
Conference.  
 
 
 
 
Marcus T. Schmitz received the Dipl. Ing. 
(FH)  degree  in  electrical  engineering  from 
the University of Applied Science Koblenz, 
Germany, in 1999, and the Ph.D. degree in 
electronics  from  the  University  of 
Southampton, United Kingdom, in 2003.   
He has been working on the development of 
off-line  and  on-line  voltage  scaling 
techniques  for  embedded  systems  as  a 
postdoctoral  researcher  in  the  Embedded 
Systems  Laboratory,  Linköping  University, 
Linköping, Sweden, as well as the Electronic 
Systems  Design  Group,  University  of 
Southampton,  UK.  Following  these 
appointments, he joined Robert Bosch GmbH, Germany, in 2005, where he is 
currently  involved  in  the  design  of  electronic  engine  control  units.    His 
research interests include system-level co-design, application-driven design 
methodologies,  energy-efficient  system  design,  and  reconfigurable 
architectures.    Dr.  Schmitz  has  published  several  papers  in  the  area  of 
dynamic voltage scaling for heterogeneous embedded systems and he has co-
authored  the  book  "System-Level  Design  Techniques  for  Energy-Efficient 
Embedded Systems" (Kluwer Academic, 2004).    
 
 
 
Paul  Rosinger  received  the  B.Sc.  in 
Computer  Science  from  the  Technical 
University  of  Timisoara,  Romania,  in 
1999,  and  the  Ph.D.  in  Electronics  and 
Computer Science from the University of 
Southampton, United  Kingdom,  in  2003. 
He is now a postdoctoral research fellow at 
the University of Southampton. His current 
research interests include testing of digital 
systems, low power embedded systems and 
reconfigurable architectures. 
 
 
 
 
Seyed Ghassem Miremadi received the M.Sc. degree in 
Applied Physics and Electrical Engineering 
from Linköping Institute of Technology and 
the  Ph.D.  degree  in  Computer  Engineering  from 
Chalmers  University  of  Technology, 
Sweden, in 1984 and 1995, respectively. He 
is  an  Associate  Professor  of  Computer 
Engineering  at  Sharif  University  of 
Technology. As fault-tolerant computing is 
his specialty, he initiated the "Dependable 
Systems Laboratory" at Sharif University in 
1996 and has chaired the Laboratory since 
then.  The  research  laboratory  has 
participated in several research projects which have led to several scientific 
articles, conference papers and technical reports. Dr. Miremadi and his group 
have done research in Physical, Simulation-Based and Software-Implemented 
Fault Injection, Dependability Evaluation Using HDL Models, Fault-Tolerant 
Embedded Systems and Fault Tree Analysis.  
Dr. Miremadi was the Education Director (1997-1998) and the Head (1998-
2002) of the Computer Engineering Department at Sharif University, and has 
been the Research Director of the department since 2002.  
He is a member of the IEEE Computer Society, the IEEE Reliability Society, and 
the Computer Society of Iran. 
 
  
 