SERAD: Soft Error Resilient Asynchronous Design using a Bundled Data
  Protocol by Aketi, Sai Aparna et al.
Sai Aparna Aketi†, Smriti Gupta†, Huimei Cheng∗, Joycee Mekie† and Peter A. Beerel∗‡
Abstract—The risk of soft errors due to radiation continues to
be a significant challenge for engineers trying to build systems
that can handle harsh environments. Building systems that
are Radiation Hardened by Design (RHBD) is the preferred
approach, but existing techniques are expensive in terms of
performance, power, and/or area. This paper introduces a novel
soft-error resilient asynchronous bundled-data design template,
SERAD, which uses a combination of temporal and spatial
redundancy to mitigate Single Event Transients (SETs) and
upsets (SEUs). SERAD uses Error Detecting Logic (EDL) to
detect SETs at the inputs of sequential elements and correct
them via re-sampling. Because SERAD only pays the delay
penalty in the presence of an SET, which rarely occurs, its
average performance is comparable to the baseline synchronous
design. We tested the SERAD design using a combination of
Spice and Verilog simulations and evaluated its impact on area,
frequency, and power on an open-core MIPS-like processor using
an NCSU 45nm cell library. Our post-synthesis results show
that the SERAD design consumes less than half the area of
Triple Modular Redundancy (TMR), exhibits significantly
less performance degradation than Glitch Filtering (GF), and
consumes no more total power than the baseline unhardened
design.
I. INTRODUCTION
Designing electronic components that can sustain harsh
environmental conditions in space and military applications
has been an important field of research for many decades [1],
[2]. Prolonged bombardment of heavy-ions, protons, neutrons
and other particles on electronic equipment and circuits can
result in permanent damage or soft errors that cause tempo-
rary failures [3], [4]. When high-energy neutrons (present in
terrestrial cosmic radiations) or alpha particles (that originate
from impurities in the packaging materials) strike a sensitive
node in a CMOS circuit, they generate a dense local track of
additional electron-hole pairs in the substrate. This additional
charge is collected by the drain of an OFF transistor and can
result in a transient voltage pulse [5].
Because these events are somewhat rare, typical models
assume that they can be analyzed in isolation as a Single
Event Transient (SET), but as technology scales, multiple event
transients are also becoming relevant. A transient event can be
altered as it propagates along a path through a combinational
circuit. In particular, an event may be masked by three
mechanisms, i.e., logical masking, electrical masking, and
temporal masking. In addition, the event may be attenuated or
propagated and can result in an early or late edge or a dynamic
hazard [5]. When the transient is finally latched, a Single
Event Upset (SEU) is created. Traditionally, transients that
affect memory elements were considered more significant than
those that strike combinational logic, but as technology has
scaled, strikes in the combinational logic are also becoming
important [6].
†S. A. Aketi, S. Gupta, and J. Mekie are with the Department
of Electrical Engineering, Indian Institute of Technology Gandhinagar
(IITGN), Gandhinagar, Gujarat 382355, India (email: saketi@purdue.edu;
smriti.gupta@mtech2016.iitgn.ac.in; joycee@iitgn.ac.in)
∗H. Cheng and P. A. Beerel are with the Ming Hsieh Dept. of Elec.
and Comp. Eng., University of Southern California, Los Angeles, CA, USA
(email: huimeich@usc.edu; pabeerel@usc.edu)
‡P. A. Beerel also consults for Galois, Inc in the area of asynchronous
design.
To circumvent the effects of these types of radiation, nu-
merous Radiation Hardened by Design (RHBD) techniques
have been proposed in the literature, the most common being
the Triple Modular Redundancy (TMR) [7], Guarded Dual
Modular Redundancy (GDMR) [8], gate sizing [9]–[12], and
Glitch Filtering (GF) [13]–[15]. RHBD cell libraries have also
been developed for both asynchronous and synchronous de-
signs [16], [17]. One study demonstrated the benefits of using
asynchronous bundled-data latch-based design techniques to
reduce the area, delay, and power penalty associated with a
RHBD cell library [17]. However, radiation-hardened libraries
are cumbersome to design, typically generations behind state-
of-the-art unhardened versions, and relatively expensive in
terms of cell area, power consumption, and cell delay.
This paper explores the possibility of building an efficient
SET-resilient technique that uses standard cell libraries and
pays a performance penalty only when an SET occurs. The
proposed approach is inspired by an asynchronous bundled-
data template that is timing-resilient [18], exhibiting high
performance when no timing errors occur and gracefully
slowing down in the presence of timing errors. However,
unlike timing-resilient designs that can assume that the timing
of signals is governed by a notion of worst-case delay, SET-
resilient circuits must account for the fact that SETs can occur
at any time. This means the existing timing resilient design
cannot be simply adopted and new SET-aware asynchronous
protocols and circuits must be developed.
Our proposed template that achieves this goal is called
SERAD, which stands for soft error resilient asynchronous
design using a bundled-data protocol. Like its timing-resilient
cousin, it uses standard single-rail combinational logic and
error-detecting latches [19]. However, its handshaking protocol
and control logic are completely redesigned to use a combi-
nation of temporal and spatial redundancy to mitigate SETs.
The use of single-rail combinational logic yields an area and
power efficient design, and the novel error-detecting logic and
control structure makes it performance efficient as it delays
the pipeline only when an SET occurs. Given that soft-errors
are quite rare [20], the average performance of SERAD is thus
essentially the same as its non-SET-resilient counterpart.
arXiv:2001.04039v1 [cs.AR] 13 Jan 2020
The remainder of this paper is organized as follows. Sec-
tion II describes related research to make synchronous and
asynchronous techniques resilient to soft errors. Section III
then presents a detailed description of the proposed SERAD
template describing how it combines spatial and temporal
redundancy to ensure SET resilience. Section IV presents
our experimental results that consist of analog and digital
simulations that validate SERAD’s SET resilience as well as a
case study on an open-core MIPS-like processor that quantifies
SERAD’s relative area, frequency, and power compared to
alternative approaches. Finally, Section V summarizes the
paper and describes opportunities for future work.
II. RELATED WORK
The gold standard approach to making a typical syn-
chronous system shown in Fig. 1(a) resilient to soft-errors is
TMR. As illustrated in Fig. 1(b), this uses three copies of
the combinational logic and flip-flops (FFs) and an additional
voting structure to pick the answer common to at least two
copies. When applied to a large system, this approach is
complicated by the need to re-synchronize the blocks after
an event occurs. Eaton et al. addressed this issue by applying
this concept to the design of FFs that sample the data at three
distinct time steps, each separated from the next by at least the
maximum SET pulse width [21]. This is reliable but has high
area, performance, and power penalties. Dual-mode and dual-
rail versions use two copies of the logic followed by a Muller
C-element that waits until both copies agree before firing
[8], [21] and are also costly. Alternatively, gates and storage
elements can be sized to minimize the risk of soft-errors [10]–
[12]. Unfortunately, the increase in size needed for the gates
is technology dependent and expected to grow as technology
scales. In addition, numerous researchers have focused on
making storage elements and memory arrays SET-tolerant,
the latter using error-detecting codes [22]–[24]. Our template
leverages these techniques as well. Other researchers have
proposed to add glitch filters to the output of combinational
logic blocks and thereby mask SETs before they can be latched
[13], [14], as illustrated in Fig. 1(c). However, the glitch
filter effectively increases the latency of the combinational
logic and does not combat a late edge that can be caused
by an SET. Therefore, to use this approach, the clock period
must be increased by at least twice the maximum width of an
SET. Several experimental efforts have reported glitch widths
ranging from tens of picoseconds to over 1 ns [25],
suggesting that the performance penalties can be impractical
for high-speed designs. An important point is that
in all these schemes the performance penalty is uniform in that
it increases the critical path of the design, thereby decreasing
its maximum clock frequency. Therefore, for designs with little
timing slack, the performance penalty will exist in every cycle
of operation independent of whether a soft-error occurs or not.
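As a rough illustration of the uniform penalty described above, the following Python sketch computes the best-case clock frequency of a glitch-filtered design when the clock period grows by exactly twice the maximum SET width (the text gives this as a lower bound on the increase). The function name and the numeric values are hypothetical and are not taken from the paper.

```python
def glitch_filter_max_frequency_mhz(base_period_ns: float,
                                    max_set_width_ns: float) -> float:
    """Best-case clock frequency (MHz) after adding a glitch filter.

    Assumes the period grows by exactly 2x the maximum SET width,
    which is the minimum increase stated in the text.
    """
    if base_period_ns <= 0 or max_set_width_ns < 0:
        raise ValueError("period must be positive and SET width non-negative")
    hardened_period_ns = base_period_ns + 2.0 * max_set_width_ns
    return 1e3 / hardened_period_ns


# Hypothetical 2 ns baseline period (500 MHz) and three SET widths
# spanning the reported range of tens of picoseconds to about 1 ns.
for width_ns in (0.05, 0.5, 1.0):
    freq = glitch_filter_max_frequency_mhz(2.0, width_ns)
    print(f"SET width {width_ns} ns -> at most {freq:.1f} MHz")
```

Because the penalty is added to every cycle, the slowdown applies whether or not an SET actually occurs.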
There have also been several works that have tried to lever-
age asynchronous design as a basis for radiation hardening.
They can be divided into two domains - one domain is based
on using dual-rail (DR) or quasi-delay-insensitive (QDI) logic
and the other is based on bundled-data design. The former
covers works based on many existing asynchronous tem-
plates, including weak-conditioned half-buffers [26], dual-rail
minterms [27], pre-charged half-buffers [28], null-convention
logic [29], [30], and others [31]. In each case, the researchers
add circuitry to traditional DR/QDI circuit templates to make
them more resilient to soft-errors. However, the base circuitry
often requires at least two times more transistors than standard
single-rail synchronous logic [32], incurring large penalties in
silicon area and leakage current when compared to standard
synchronous design. Moreover, the return-to-zero nature of
these templates often implies higher switching activities than
synchronous single-rail counterparts which results in higher
dynamic power. While these asynchronous design styles are
attractive because they require few timing assumptions to
ensure correctness and can thus work in harsh environments,
their costs may be prohibitive for many applications.
In contrast, bundled-data asynchronous circuits use standard
combinational logic and sequential elements. They differ from
their synchronous counterparts only in that instead of a global
clock, asynchronous circuits are used to generate local clock
trees. The result is a design that is similar to (if not better
than) synchronous designs in area and power consumption but
requires timing assumptions to ensure correctness. Delay lines
are needed as part of the asynchronous circuits to ensure
setup and hold constraints are satisfied. As these can be
difficult to design optimally they are often made to be post-
silicon programmable. The sequential elements can be latch
or FF based, latch-based typically being more flexible and
lower power [17]. Making these designs radiation hardened
has often focused on the unique aspects of the asynchronous
nature - the asynchronous control logic [33], [34]. One of the
simplest approaches is dual-modular redundancy with guard
gates [35]. This doubles the size of the control logic, but
given the control logic is a small fraction of the overall
chip area, the penalties are quite manageable. SETs in the
combinational logic, however, must be handled through a
combination of gate-sizing, spatial redundancy, or temporal
redundancy, similar to other synchronous designs. SERAD is
also based on bundled-data but is, to the best of our knowledge,
the first to be inspired by a timing-resilient form of bundled-
data design.
III. SERAD TEMPLATE
The proposed SERAD template, illustrated in Fig. 2, uses
single-rail combinational logic, error detecting logic (EDL),
delay lines, DICE latches, and a novel SERAD controller.
SERAD mitigates SETs that originate in the sequential elements
of the pipeline, i.e., the latches, by using DICE latches. SERAD
prevents SETs that originate in the combinational logic from
propagating from one pipeline stage to the next using a special
form of temporal redundancy. Any SET which appears at the
input of a pipeline latch when it is transparent is identified
as an error by the EDL and is mitigated by stalling the
pipeline until the data is re-sampled. In particular, the SERAD
controllers communicate with each other using a soft error
resilient handshaking protocol that can ensure any SET of
width less than a predefined value σ is mitigated.
Fig. 1: Block diagram of original flop-based circuit and two well-known SET-resilient variants. (a) Synchronous template; (b) TMR template; (c) Glitch-filter template.
Fig. 2: Block diagram of proposed SERAD template
Before explaining this protocol in detail, however, we first
describe the delay notations critical to a SERAD design.
• τ : Maximum pulse width of an SET that will be detected
and mitigated. It is chosen such that the probability of
occurrence of an SET of width greater than τ in the given
technology is negligible.
• φ: Minimum pulse-width required by the DICE latch to
correctly sample data.
• σ: max(φ, τ).
• ∆: Worst case delay of the combinational logic or critical
path delay including the propagation delay through one
DICE latch.
• δ = ∆ − σ. This defines the required delay of the delay
line between controllers, as illustrated in Fig. 2, ignoring
the overhead of the EDL and control circuit which will
be discussed in Section III-E.
• y: Time required to close and re-open a DICE latch.
Fig. 3: Timing diagram of the proposed SERAD template
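The relationships among these delay parameters can be sketched in a few lines of Python. This is only an illustrative calculation: the class and field names are hypothetical, the numeric values are invented, and rejecting a negative delay line is our assumption rather than a rule stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeradDelays:
    tau: float        # max SET pulse width to be detected and mitigated
    phi: float        # min pulse width the DICE latch needs to sample data
    big_delta: float  # worst-case logic delay, including one DICE latch

    @property
    def sigma(self) -> float:
        """sigma = max(phi, tau)."""
        return max(self.phi, self.tau)

    @property
    def delay_line(self) -> float:
        """delta = Delta - sigma, ignoring EDL and control overhead."""
        delta = self.big_delta - self.sigma
        if delta < 0:
            # Assumption: a negative delay line is not realizable.
            raise ValueError("Delta must be at least sigma")
        return delta


# Hypothetical values in nanoseconds.
d = SeradDelays(tau=0.3, phi=0.2, big_delta=2.0)
print(d.sigma, d.delay_line)
```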
A. Soft Error Resilient Handshaking Protocol
Fig. 3 illustrates the expected behavior of the CLK signals
associated with two instructions flowing through a four-stage
SERAD pipeline. Instruction 1 launches from Stage 1. An SET
that occurs in the combinational logic path between Stage 1
and Stage 2 is detected at the Stage 2 latch. The rising edge
of Stage 3’s CLK signal is nominally scheduled to occur δ
time units after Stage 2’s CLK closes for the first time, shown
as the dotted gray region. However, because an SET occurs,
Stage 2’s latch closes and re-opens, giving
Instruction 1 a total of (δ + 2σ + y) time units to pass from
Stage 2 to Stage 3, where y is the time taken to close the
latch, sample the error, and reopen the latch. Subsequently,
Instruction 2 does not encounter an SET in Stage 2,
which allows Stage 3’s CLK signal to rise δ time units after
Stage 2’s CLK falls.
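To make the timing budget concrete, the sketch below evaluates the stage-to-stage time for a given number of re-samples. The one-re-sample case reproduces the (δ + 2σ + y) figure above, and the zero-re-sample case follows from δ = ∆ − σ. Extending the formula to more than one re-sample is our extrapolation, and the function name and values are hypothetical.

```python
def stage_transfer_time(delta: float, sigma: float, y: float,
                        resamples: int = 0) -> float:
    """Time for an instruction to pass from one stage to the next.

    resamples = 0 -> delta + sigma (the nominal case, equal to Delta)
    resamples = 1 -> delta + 2*sigma + y (the single-SET case in the text)
    resamples > 1 -> assumed to add (sigma + y) per extra re-sample
    """
    if resamples < 0:
        raise ValueError("resamples must be non-negative")
    return delta + (resamples + 1) * sigma + resamples * y


# Hypothetical values in nanoseconds.
nominal = stage_transfer_time(delta=1.7, sigma=0.3, y=0.1)
with_set = stage_transfer_time(delta=1.7, sigma=0.3, y=0.1, resamples=1)
print(f"nominal {nominal:.2f} ns, one SET {with_set:.2f} ns")
```

Under these assumptions the penalty of each detected SET is σ + y, and it is paid only in the cycle in which the SET occurs.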
Fig. 4: SERAD single-rail controller on which the sized GDMR
controller is based.
Note that if the data at the input of the latch is stable when
the CLK is high as well as during the subsequent hold time
of the latch, then the latched data is error free. This interval
is precisely when the EDL checks for an error and is defined
as the SET Filtering Window (SFW). Its length is σ plus the
hold time of the DICE latch, where σ, as indicated above,
is the maximum of φ and τ . This ensures that the SFW is
large enough to safely catch the SETs. The fact that the error
detection window also includes the latches’ hold time is unique
to SET-resilient design and is motivated in Section III-D.
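The SFW check can be illustrated with an idealized Python model that flags an error whenever the latch input changes inside the window. The model ignores all EDL propagation delays, treats the window boundaries as inclusive (an assumption), and uses hypothetical function names and values.

```python
from typing import Iterable, Tuple


def sfw_interval(clk_rise: float, sigma: float, hold: float) -> Tuple[float, float]:
    """SET Filtering Window: CLK high time (sigma) plus the latch hold time."""
    return clk_rise, clk_rise + sigma + hold


def edl_flags_error(transition_times: Iterable[float], clk_rise: float,
                    sigma: float, hold: float) -> bool:
    """True if any data transition falls inside the SFW (idealized EDL)."""
    start, end = sfw_interval(clk_rise, sigma, hold)
    return any(start <= t <= end for t in transition_times)


# Hypothetical times in nanoseconds: CLK rises at 1.0, sigma = 0.3, hold = 0.05.
print(edl_flags_error([1.1], 1.0, 0.3, 0.05))       # transition inside the SFW
print(edl_flags_error([0.9, 1.4], 1.0, 0.3, 0.05))  # transitions outside the SFW
```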
B. SERAD Controller Design
The SERAD controller is based on the single-rail controller
shown in Fig. 4 and is illustrated in Fig. 5. It is designed
using a combination of GDMR [36] and gate sizing [9]
techniques and we call it a sized GDMR SERAD controller.
The SETs on the internal nodes of the controller are mitigated
via spatial redundancy and guard gates while the SETs on
the output nodes are mitigated using gate-sizing. In particular,
the feedback loops in the single-rail controller are cut and the
resultant combinational logic is duplicated to introduce spatial
redundancy. Each pair of redundant outputs is passed through
a sized guard gate. The guard gates are sized such that any
SET on the output node with a pulse width less than or equal
to σ is mitigated. Similarly, any SET of pulse width less than
or equal to σ on the internal or input nodes is filtered out by
the guard gates. For this work, we have used the gate-sizing
technique proposed by Zhou et al. [9], in which the
authors calculate the inverter width needed to remove the effect
of an SET by reducing its amplitude. For a given LET (linear
energy transfer) value, the equivalent amount of charge (Q)
that can be deposited is first computed. Needless to say, Q
is process dependent. The design is sized (i.e. the width is
increased) to increase the node capacitance until the resulting
SET pulse is sufficiently attenuated in amplitude [9]. We have
implemented the gate-sizing algorithm proposed in [9] to size
the guard-gate for radiation hardening. We note here that our
sized guard gate is about 10× larger than a minimum sized
guard gate. Further, sizing the guard gate also motivates the
preceding circuits to be appropriately sized. We observe that
the latency of the controller with sized guard gates is 6% larger
than that of the controller with minimum-sized guard gates.
Fig. 5: Sized GDMR SERAD controller
The single-rail controller is implemented as a burst-mode
state machine [37] and synthesized using the 3D tool [38].¹
We then implemented the GDMR SERAD controller by dupli-
cating the single rail controllers, and combining their outputs
through the sized guard gates. The non-reset input signals to
the first rail of the controller are L.req1, R.ack1, Err and Corr
and the output signals are CLK, L.ack1 and R.req1. The input
signals to the second rail of the controller are L.req2, R.ack2,
Err and Corr and the output signals are CLK, L.ack2 and
R.req2. The output rails are independent but the input signals
to the two rails are dependent. Most of the inputs are driven by
the output of neighbouring controllers which are error-free as
the final output nodes are sized. The exception is the signals
Err and Corr that are driven from the SERAD EDL which is
discussed in Section III-C.
Fig. 6 shows the burst-mode state machine specification for
a single-rail controller of a normal pipeline stage. The behavior
of the machine can be summarized as follows.
• On deactivating reset signal, the controller sends an
acknowledgement to the previous stage (moving from
state 0 to state 1).
• On receiving a request from previous stage and acknowl-
edgement from next stage, the controller sets the clock
signal high, waits for the EDL error signal to reset, and
waits for the done signal to go high via the illustrated
asymmetric delay line (moving from state 1 to 2 to 3).
• As soon as the done signal goes high, the clock is
driven low and the controller waits to receive the error
information (in state 4).
• If Err goes high, indicating the occurrence of an SET,
the clock is driven high and the controller loops back to
state 3.
• If Corr goes high, the controller sends a request to the
next stage and an acknowledgement to the previous stage,
signaling that the latched data is error free (state 6).
Due to the two-phase nature of the machine the states 1-5 are
replicated in states 6-9 with the alternate signal transitions.
Note that the controller for other pipeline stages is similar,
but is designed to generate an output request after reset (an
initial token on reset). This type of controller is called a token
controller [39] and generates the needed initial state to the
system, avoiding deadlock right after reset.
¹Note that the reset signal rst is added to the specification to ensure that
the state machine initializes properly.
Fig. 6: Burst-mode state machine of single-rail SERAD controller
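The state-machine behavior summarized above can be approximated by a small behavioral model. The Python sketch below is a simplification under stated assumptions: it follows a single data token through one normal-stage controller, collapses the intermediate wait for the EDL error signal to reset, and ignores the two-phase signal polarities and all delays. The function name and the trace strings are hypothetical.

```python
from typing import Iterable, List


def controller_trace(edl_verdicts: Iterable[str]) -> List[str]:
    """Trace the actions of one controller handling one data token.

    edl_verdicts holds the EDL outcome after each sampling attempt,
    either "err" (SET detected) or "corr" (latched data is clean).
    """
    trace = ["rst released: ack previous stage"]        # state 0 -> 1
    trace.append("L.req and R.ack received: CLK high")  # state 1 -> 2 -> 3
    for verdict in edl_verdicts:
        trace.append("Done high: CLK low, wait for EDL")  # state 3 -> 4
        if verdict == "err":
            trace.append("Err high: CLK high (re-sample)")  # back to state 3
        elif verdict == "corr":
            trace.append("Corr high: req next stage, ack previous stage")
            return trace                                   # state 6
        else:
            raise ValueError(f"unknown EDL verdict: {verdict!r}")
    raise ValueError("EDL never reported Corr; token was not forwarded")


# One SET followed by a clean re-sample.
for step in controller_trace(["err", "corr"]):
    print(step)
```

In this model each "err" verdict costs exactly one additional CLK pulse, which mirrors the re-sampling loop between states 3 and 4.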
The synthesized logic expressions of the single-rail con-
troller are given below. The 3D synthesis tool analyzes the
state space of the specification and guarantees that these
expressions are free of static and dynamic hazards. In some
cases, this synthesis process requires the introduction of state
variables, such as the signal z below. Each expression
effectively describes the next-state function of one output signal,
which can be implemented with combinational logic, but in many
cases these output signals must be fed back as inputs to the
controller.
Rreq = rst.(Corr.Lack + Done.Lack + z.Lack + z.Corr.Done)
Lack = rst.(Corr.Lack + Done.Lack + z.Lack + z.Corr.Done)
clk = rst.(Err.Done + clk.Done + z.Lreq.Rack + Lreq.Rack.Done.z)
z = rst.(Done.Lack + z.Corr + z.Lack + z.clk.Err)
Note that some delays were added to the controller feedback
wires to avoid essential hazards. In addition, other delays were
added to some of the primary outputs to avoid violations of
the fundamental mode assumption [40].
C. SERAD Error Detection Logic
As illustrated in Fig. 7, the EDL consists of transition
detectors, asymmetric C-elements, and a Q-Flop [41]. For our
design, we implemented the EDL using Transition Detecting
Fig. 7: Block diagram of SERAD error detecting logic
Time Borrowing (TDTB) latches proposed in [19]. The input
to the DICE latch, Data, is fed to a transition detector con-
sisting of an XOR gate and delay element δp. The asymmetric
C-element remembers errors detected by the transition detector
during the high phase of either CLK or Sample bar. The
outputs of the C-elements are ORed together and sampled
by a Q-flop. The Q-flop produces dual-rail outputs Err and
Corr which are both initially 0. After Sample rises, the Q-flop
samples its input. If its input is 1, i.e., if there was an SET,
it raises Err; otherwise it raises Corr. It resets both outputs
when Sample falls.
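The Q-flop's dual-rail behavior described above can be captured by a minimal behavioral model. The Python sketch below is an assumption-laden abstraction: it is purely logical, does not model metastability or any delays, and the class and method names are hypothetical.

```python
from typing import Tuple


class QFlop:
    """Behavioral model of the dual-rail Q-flop outputs (Err, Corr)."""

    def __init__(self) -> None:
        self.err = 0
        self.corr = 0
        self._prev_sample = 0

    def update(self, sample: int, d: int) -> Tuple[int, int]:
        """Apply new Sample and input values and return (Err, Corr)."""
        if sample and not self._prev_sample:
            # Rising edge of Sample: raise exactly one of the two rails.
            self.err, self.corr = (1, 0) if d else (0, 1)
        elif not sample:
            # Sample low: both rails are reset.
            self.err = self.corr = 0
        # While Sample stays high, the outputs hold their values.
        self._prev_sample = sample
        return self.err, self.corr


q = QFlop()
print(q.update(sample=1, d=1))  # SET was recorded: Err rail rises
print(q.update(sample=0, d=0))  # Sample falls: both rails reset
print(q.update(sample=1, d=0))  # clean data: Corr rail rises
```

Because exactly one rail rises after each sampling event, the controller can simply wait for either Err or Corr, which is what makes the stall-until-resolved behavior possible.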
There are two small delay lines, typically made of a chain
of inverters, that help ensure setup/hold conditions of the Q-
flop are generally met. In particular, Sample is a sufficiently
delayed inverted version of CLK that ensures that the C-
element’s output propagates to the Q-flop before Sample goes
high, thereby adhering to the Q-flop’s setup time. Sample bar
is used as an input to the C-element to ensure the C-element
output resets only after the Q-flop samples the data, thereby
adhering to the Q-flop’s hold time. The explicit constraints on
these delays are further explained in Section III-E.
Notice that if there is an SET on any of the nodes within
the EDL, the EDL may produce a false-alarm that causes the
controller to re-sample the latches’ inputs. It is important to
recognize that this causes a degradation in performance but
no logical error. For this reason, most of the EDL need not
incur the overhead of being made SET tolerant. In particular,
only the Q-flop in the EDL is sized so that its outputs are
immune to radiation strikes, guaranteeing that even in the
presence of an SET, Corr and Err can never both be high
and the asynchronous control logic has clean inputs.
We also note that more recent EDL designs can be adopted
that reduce the overhead of error detection [42], [43]. In
particular, [43] proposed implementing multi-bit transition-
detectors that significantly reduce the amortized area and
power overheads.
D. Metastability Analysis
There are two sources of metastability that must be considered
in this design: the error-detection logic and
the datapath latches. First, metastability can occur in the
EDL because of an internal SEU in the C-element or
a race between its inputs. Fortunately, the Q-flop contains a
special metastability filter [41] which guarantees that the Q-flop
outputs will remain zero until any internal metastability
is resolved. Second, metastability can occur in the datapath.
Under normal operation, the circuit is timed to ensure that
the inputs of the latches satisfy the setup and hold time
that is relative to the falling edge of the CLK. However, an
SET may violate this condition causing a latch to go into
metastability. For this reason the SFW is set to the time
period for which the clock is high plus the hold time of the
latch. Consequently, any SET that causes a latch to go into
metastability will be detected and mitigated by re-opening and
closing the latch, essentially re-sampling the data. This is in
contrast to its timing-resilient inspiration [18] which, relying
on worst-case timing analysis, does not re-sample the data
but instead mitigates timing violations only by slowing down
downstream pipeline stages.
Notice also that the circuitry that detects an SET can
itself experience metastability if the SET occurs at the
very beginning or end of the SFW. The benefit of the Q-
flop is that it isolates the controllers from the metastability
and, on average, resolves any metastability very quickly. The
SERAD controller simply stalls until one of the dual-rail error
signals is asserted (which happens only after metastability
is resolved). In contrast, the only approach to resolve this
synchronously that we are aware of requires waiting a worst-
case delay of one to two clock cycles for metastability to
resolve with a reasonable mean-time-between-failures. Unfor-
tunately, one cannot decide whether to stall a pipeline one
to two clock cycles after the potential metastability event
occurs. For this reason most timing resilient synchronous
designs either have ignored metastability [44], resulting in
poor mean-time-between-failure [45], or have relied on flush-
and-replay logic in which the entire pipeline is flushed of
data and replayed (e.g., [46]). Consequently, although some
synchronous timing resilient schemes are also resilient to some
SETs (as suggested in, for example, [47]), none provide as
comprehensive resilience as SERAD.
E. Timing Constraints
This section explains the timing constraints associated with
SERAD. To simplify this discussion we denote the delay
components of the error detecting logic as follows.
• The propagation delay of the XOR gate, i.e., from D to
X of the EDL, is denoted δ_{xor-pd}.
• The output pulse width of the XOR gate is denoted
δ_{X-pw}.
• The C-element’s rising and falling propagation delays
are denoted δ_{C-pullup} and δ_{C-pulldown}, respectively. For
simplicity, we assume these delays are larger than the
C-element’s minimum pulse width requirements.
• The propagation delay of the OR tree between the
C-elements and Q-flop is denoted δ_{or-tree}.
• The Q-flop’s setup and hold times are denoted δ_{Q-setup}
and δ_{Q-hold}, respectively.
The output pulse of the XOR gate should be sufficiently wide so
that it is captured by the C-element. This constraint can be
approximated as
δ_{X-pw} ≥ δ_{C-pullup}    (1)
As illustrated in Fig. 7, a small compensation delay δ_{comp}
exists between the CLK and Y signals. It is implemented
with an asymmetric delay line for which δ_{comp-r} denotes the
delay after a rising input and δ_{comp-f} denotes the delay after
a falling input. Using this delay line, the path delay from
CLK to the rising transition of Y and the delay from Data to
X are matched. This ensures that the C-element registers the
change in Data as soon as the associated latches go transparent
and remains sensitized until after the hold time of the latches
expires. Assuming an ideal clock-tree, the constraints the delay
line must adhere to are as follows.
δ_{comp-r} = δ_{xor-pd} + δ_{X-pw}    (2)
δ_{comp-f} = δ_{xor-pd} + δ_{X-pw} + δ_{DICE-hold}    (3)
The Q-flop also has setup and hold constraints that should
be observed (in the absence of metastability) to optimize
performance.
1) Setup Constraint: The Q-flop samples its input value
when Sample goes high. As discussed earlier, to properly
catch all desired SETs, Sample should go high only after the
completion of the hold period of the latches, i.e. when Y goes
low, and the error signal propagates to the Q-flop inputs.
δ_{Su} ≥ δ_{C-pullup} + δ_{or-tree} + δ_{Q-setup}    (4)
δ_{Su} is the lower bound of the falling delay of the delay line
between Sample and Y.
Fig. 8: Verilog simulations showing (a) the resiliency of SERAD controllers to SETs; (b) SETs at the dual rail input of
controllers; (c) the working of SERAD during SETs; (d) the working of SERAD during consecutive SETs.
2) Hold Constraint: The Q-flop input should be stable for
its own hold time after the positive edge of Sample.
δ_{Q-hold} ≤ δ_{H} + δ_{C-pulldown} + δ_{or-tree},    (5)
where δ_{H} is the minimum rising delay of the delay line
between Sample and Sample bar. Fortunately, this constraint
is easily met given reasonable physical design. It is thus not
required during synthesis but must be verified during the final
post-physical-design sign-off procedure.
The EDL and controller delays also impose a lower bound
on the non-overlap period δ_{no} between the two clocks of
consecutive pipeline stages, as follows:
δ_{no} ≥ δ_{comp-r} + δ_{Su} + δ_{Q-pd} + δ_{ctrl},    (6)
where δ_{Q-pd} is the Sample-to-Corr delay of the Q-flop and
δ_{ctrl} is the control logic delay from the Corr signal of one
pipeline stage to the rising edge of the CLK signal of the next
pipeline stage. If the worst-case delay of the combinational
logic ∆ is smaller than δ_{no} + σ, this control overhead dictates
the minimum cycle time of the design. Otherwise, the control
overhead can be completely hidden by using a smaller delay
line δ between pipeline stages.
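The setup, hold, and non-overlap constraints (4)-(6) lend themselves to a simple numerical check. The following Python sketch is illustrative only: the function names are hypothetical and the delay values (in picoseconds) are invented, not measured from the SERAD implementation.

```python
def min_setup_delay(c_pullup: int, or_tree: int, q_setup: int) -> int:
    """Right-hand side of Eq. (4): lower bound on the Y-to-Sample delay."""
    return c_pullup + or_tree + q_setup


def hold_constraint_met(q_hold: int, d_h: int, c_pulldown: int,
                        or_tree: int) -> bool:
    """Eq. (5): the Q-flop input stays stable for its hold time."""
    return q_hold <= d_h + c_pulldown + or_tree


def min_non_overlap(comp_r: int, d_su: int, q_pd: int, ctrl: int) -> int:
    """Right-hand side of Eq. (6): lower bound on the non-overlap period."""
    return comp_r + d_su + q_pd + ctrl


def control_overhead_hidden(big_delta: int, d_no: int, sigma: int) -> bool:
    """True when Delta >= d_no + sigma, so control overhead is hidden."""
    return big_delta >= d_no + sigma


# Hypothetical delays in picoseconds.
d_su = min_setup_delay(c_pullup=40, or_tree=60, q_setup=30)
d_no = min_non_overlap(comp_r=70, d_su=d_su, q_pd=50, ctrl=120)
print(d_su, d_no)
print(hold_constraint_met(q_hold=20, d_h=15, c_pulldown=35, or_tree=60))
print(control_overhead_hidden(big_delta=2000, d_no=d_no, sigma=300))
```

A check of this kind is most useful at sign-off, since the text notes that the hold constraint in particular is verified only after physical design.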
IV. EXPERIMENTAL RESULTS
The proposed radiation-hardened design template was tested
and evaluated using both digital and analog test benches.
The digital test benches used the Verilog digital simulator
QuestaSim to demonstrate correct logical behavior of the
template despite the presence of SETs. For these simulations,
digital SETs were forcibly injected in the design. Spice-like
analog simulations within the Cadence Virtuoso tool suite were
used to demonstrate the effectiveness of gate sizing and the
impact of SETs that occur in the EDL. For these analog
simulations, SETs were modeled using double exponential
current pulses whose specifications were obtained using TCAD
simulations.
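For readers unfamiliar with the double exponential model, the Python sketch below evaluates the commonly used form I(t) = Q/(τ_f − τ_r) · (exp(−t/τ_f) − exp(−t/τ_r)), whose integral over time equals the deposited charge Q. We assume this standard form here; the text does not spell out the exact expression or parameters used in the simulations, and the charge and time constants below are hypothetical rather than the TCAD-derived values.

```python
import math


def double_exp_current(t: float, q: float, tau_fall: float,
                       tau_rise: float) -> float:
    """Double exponential SET current (A) at time t (s).

    q is the deposited charge (C); tau_fall > tau_rise > 0 are the
    collection and track-establishment time constants (s).
    """
    if not (tau_fall > tau_rise > 0):
        raise ValueError("require tau_fall > tau_rise > 0")
    if t < 0:
        return 0.0
    scale = q / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


# Hypothetical strike: 100 fC, 200 ps fall and 50 ps rise time constants.
Q, TAU_F, TAU_R = 100e-15, 200e-12, 50e-12
for t_ps in (0, 50, 100, 200, 500):
    i_ua = double_exp_current(t_ps * 1e-12, Q, TAU_F, TAU_R) * 1e6
    print(f"t = {t_ps:3d} ps: I = {i_ua:.1f} uA")
```

In a circuit simulator, a pulse of this shape is typically attached as a current source at the struck node.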
A. Verilog Simulations
Case I: Fig. 8(a) shows an SET in the controller that does
not propagate to the final outputs. An internal node (n31) has
an SET that causes the output of one rail, clk1, to have an
error, but the final output CLK is error-free because the error
is filtered by the sized guard gate.
Case II: Fig. 8(b) shows that an SET at the input of the
dual-rail controllers does not result in error at any of the final
outputs. In particular, it shows that an SET at input R.ack1
does not cause an error in any of the outputs, CLK, R.req1,
R.req2, L.ack1 and L.ack2.
Case III: Fig. 8(c) shows what happens when an SET in
the combinational logic propagates to the input of the latch
while the clock is high. The latch, controlled by CLK, is closed
and reopened to re-sample the data, marked as “re-sampling” in
Fig. 8(c). Fig. 8(d) shows an interesting sub-case in which two
SETs happen at the input of the latch in consecutive cycles. Here,
SERAD re-samples the data twice before it samples the correct
data.
B. Spice Simulations
Case IV: A particle strike occurs at the controller output.
The output does not show a glitch because the guard gates
are properly sized. In particular, the simulations of the sized
guard gate in Fig. 9(a) show two cases: (i) Case-A shows
the double-exponential SET current pulse [15] applied at the
output of the sized guard-gate which does not propagate. (ii)
Case-B indicates an SET at the input of the guard-gate which
also does not affect the output.
Case V: Fig. 9(b) illustrates the case when a particle strike
occurs at one of the outputs of the Q-flop (Err). Since the
Q-flop is properly sized, the output of the Q-flop does not
glitch.
Case VI: Fig. 9(c) illustrates that a soft-error in the EDL
can be latched by the Q-flop and lead to a false error. This
happens when an SET occurs in the error detecting logic
that is detected by the Q-flop which consequently asserts its
Err signal. The controller conservatively interprets this as an
indication of an SET in the combinational logic. The controller
thus unnecessarily re-samples “stable” data.
We have also simulated the SERAD design under PVT
variations, consisting of three process corners (FF, SS, TT),
10% voltage variations, and temperatures ranging from −40°C
to +70°C. We found that process variations cause the controller delay
to vary by about 8% around TT and that, across all corners,
SET pulse widths vary by about 2.6%. We observed that the
SERAD controllers and EDL have sufficient margins to remain
effective against SETs even under PVT variations.
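A PVT sweep of this kind is typically organized as a cross product of corners. The sketch below enumerates one plausible set; the nominal supply of 1.1 V, the reading of "10%" as ±10% around nominal, and the intermediate 25°C point are assumptions made for illustration and are not stated in the text.

```python
from itertools import product

PROCESS_CORNERS = ("FF", "TT", "SS")

VDD_NOMINAL = 1.1  # assumed nominal supply in volts (hypothetical)
VOLTAGES = tuple(round(VDD_NOMINAL * scale, 3) for scale in (0.9, 1.0, 1.1))

TEMPERATURES_C = (-40, 25, 70)  # end points from the text; 25 is an assumed mid point

# Every combination of process, voltage, and temperature to simulate.
corners = list(product(PROCESS_CORNERS, VOLTAGES, TEMPERATURES_C))

for process, vdd, temp_c in corners:
    print(f"{process}  {vdd:.2f} V  {temp_c:+d} C")
print(f"{len(corners)} corners in total")
```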
Fig. 9: Spice simulations showing (a) resiliency of a sized guard-gate to an SET in the SERAD controller; (b) resiliency of
a sized Q-flop when a particle strike occurs at one of its outputs (Err); (c) false error at the output of a sized
Q-flop when a particle strike occurs at the input of the Q-flop.
C. Plasma Case Study
We also functionally validated the proposed template by
applying it to a radiation-hardened processor. In particular, we used
a 3-stage Plasma [48] processor, an OpenCore MIPS CPU,
synthesized using a leading commercial synthesis tool and
the NCSU 45nm open-source cell library. We compare our
post-synthesized SERAD Plasma processor with three other
variants:
• Sync Plasma: An unhardened synchronous implementation
of the Plasma processor used as a comparison baseline.
• Glitch-Filter Plasma: A radiation-hardened synchronous
implementation of Plasma where glitch-filtering [15] is
used to mitigate glitches in the datapath. This technique
has less area and power overhead than triple modular
redundancy at the expense of a performance penalty equal
to twice the delay of a worst-case SET pulse.
• TMR Plasma: A radiation-hardened synchronous imple-
mentation of Plasma using the conventional technique
of triple modular redundancy [7]. This technique incurs
large area and power overheads but does not incur a
significant performance penalty.
Power consumption is calculated at 286 MHz (the maximum
frequency achievable by all four compared variants) using
the signal activity obtained from running the “count” and
“pi” programs that are included in the Plasma open-source
distribution.2
More specifically, a combination of Python and TCL scripts
was used to convert the original synchronous design into the
radiation-hardened variants. Custom cells were made for the
guard gate and the DICE FFs and latches. The glitch-filter version
requires an SET filter [15] at the input of each FF, so all FFs
in this design are replaced with DICE FFs with SET filters.
For the TMR version, the combinational logic and FFs are
triplicated and voters are added to each triplet of FFs.
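The voter added to each triplet of FFs computes a 2-of-3 majority. A minimal software sketch of that function is shown below; it models only the voting logic on integer bit vectors and is not the cell used in the TMR Plasma netlist. The stored value and the flipped bit are arbitrary illustrative choices.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant copies."""
    return (a & b) | (b & c) | (a & c)


stored = 0b1011_0010
copies = [stored, stored, stored]

# Model an upset that flips one bit in one of the three copies.
copies[1] ^= 0b0000_1000

voted = majority(*copies)
print(f"corrupted copy: {copies[1]:08b}, voted output: {voted:08b}")
```

Because each bit is voted independently, a single corrupted copy is always outvoted, while upsets in the same bit of two copies would not be masked.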
2 The “count” program computes $\sum_{i=1}^{\infty} 3^i$ and displays each term in the
series in its word form, and the “pi” program numerically computes the value
of pi. The “CoreMark” program contains implementations of the following
algorithms: list processing (find and sort), matrix manipulation (common
matrix operations), state machine processing (to determine if an input stream
contains valid numbers), and a CRC (cyclic redundancy check).
Similarly, the synchronous Plasma design is converted into
a SERAD design using a semi-automated CAD flow similar to
the timing-resilient “Blade” flow presented in [18].
The Blade flow involves automatically synthesizing the RTL
to gates using standard FFs and using a custom TCL script to
replace the FFs with two-phase master-slave latches, re-time
the slave latches, replace timing-critical latches with error-
detecting latches, and automatically add the EDL and Blade
controllers to manage the local clocks. Our SERAD flow is
different in that all latches are replaced with DICE latches
(rather than only the timing-critical latches) and that the added
EDL and SERAD controllers (described in Sections III-B &
III-C) are different than those used for Blade [18]. As with
the Blade flow, the SERAD flow is applicable to any RTL
synchronous specification.
Table I shows the maximum performance and associated
area of the four variants of Plasma using the synchronous
design as a baseline. According to Table I, the area of the SERAD
Plasma design is about 81% higher than the baseline but comparable
to the Glitch-Filter Plasma and less than half of the TMR
design. This is because the added relative cost of the error
detecting logic (EDL) and control logic is not large, and
the total cost of DICE latches is not significantly more than
that of the DICE FFs (despite having twice as many latches
as FFs). Notice that the Glitch-Filter Plasma has the highest
performance degradation due to the additional delay of twice
the maximum SET pulse width [5]. It is important to emphasize
that this cost is fixed and thus is more prominent for
high-frequency designs.
Design type                Max. Freq.  Area Comb  Area Seq  Area Total  Area Incr. (%)
SERAD Plasma               333         55784      25000     80784       80.6
Sync Plasma [48]           340         33083      11656     44739       0
Glitch-Filter Plasma [15]  286         57867      23439     81306       81.7
TMR Plasma [7]             329         143679     34969     178648      299
TABLE I: Maximum frequency (MHz) and area (µm2) comparison of the four design variants
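To illustrate why a fixed glitch-filter delay weighs more heavily on faster designs, the sketch below derives a rough per-cycle penalty from the Table I frequencies and applies it to other clock rates. Attributing the entire period difference between Sync Plasma and Glitch-Filter Plasma to the filter is a simplifying assumption, and the 100 MHz and 1000 MHz designs are hypothetical.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a frequency given in MHz."""
    return 1e3 / freq_mhz


# Maximum frequencies from Table I.
SYNC_MHZ = 340.0
GLITCH_FILTER_MHZ = 286.0

# Assumption: the whole period difference is the fixed filter penalty.
penalty_ns = period_ns(GLITCH_FILTER_MHZ) - period_ns(SYNC_MHZ)


def filtered_freq_mhz(base_mhz: float, fixed_penalty_ns: float) -> float:
    """Frequency after adding a fixed delay to every clock period."""
    return 1e3 / (period_ns(base_mhz) + fixed_penalty_ns)


print(f"implied fixed penalty: {penalty_ns:.3f} ns")
for base in (100.0, 340.0, 1000.0):  # 100 and 1000 MHz are hypothetical designs
    slowed = filtered_freq_mhz(base, penalty_ns)
    print(f"{base:6.0f} MHz -> {slowed:6.1f} MHz ({100 * (1 - slowed / base):.1f}% slower)")
```

Under this assumption the same absolute delay costs a small fraction of the period at low frequency and a much larger fraction at high frequency.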
Table II, Table III, and Table IV compare the power
consumption among the four design variants at the clock
frequency of 286MHz using the “count,” “pi,” and “CoreMark”
programs, respectively.3
Design type                Comb   Seq    Total  Incr. (%)
SERAD Plasma               2.57   4.17   6.74   -23.7
Sync Plasma [48]           2.12   6.71   8.83   0
Glitch-Filter Plasma [15]  2.97   10.2   13.2   49.5
TMR Plasma [7]             7.69   18.34  26     194.4
TABLE II: Power (mW) comparison of the four design variants
with a 286MHz clock running the “count” program
Design type                Comb   Seq    Total  Incr. (%)
SERAD Plasma               2.54   4.82   7.36   -25.1
Sync Plasma [48]           2.93   6.88   9.82   0
Glitch-Filter Plasma [15]  4.91   10.46  15.37  56.5
TMR Plasma [7]             11.26  18.64  29.9   204.5
TABLE III: Power (mW) comparison of the four design
variants with a 286MHz clock running the “pi” program
Interestingly, the SERAD design is
comparable to the unhardened baseline in terms of performance
and, somewhat counter-intuitively, is estimated to consume
lower power. In particular, as detailed in the three power
comparison tables, SERAD consumes relatively low sequential
power. This is a result of three factors. First, compared to
the default FF design in the unhardened version, there is no
local buffer in a DICE latch, which reduces clock switching
power. The impact of this is particularly significant in this
case study because of the relatively low switching activity on
the data signals. Interestingly, this type of cell design has
been independently proposed in error-detecting latch-based
designs because the removed local clock buffers can be more
efficiently compensated during physical design by properly ad-
justing the clock tree [42]. Second, the input capacitance of the
clock pin in the default FF is larger than in the DICE latch. In
our library, the input capacitance is 8fF for the FF versus 5fF
for the DICE latch. Lastly, the latch-based design yields lower
switching activity at the sequential elements by eliminating
unnecessary glitches. For our particular experiment, the data
pin of the DICE latches has an average activity factor of 2.0%
compared to 3.7% for the data pin of FFs in the unhardened
design. In addition, we note that the increase in area
is due largely to the error detection logic, which does not switch
in normal operation. This further explains why, despite
being larger, SERAD shows lower power consumption than the other
three variants (operating at the same frequency). However, we
do see about a 5.9× increase in leakage power compared
to Sync Plasma. In particular, the fraction of total power due
to leakage in our SERAD design grew to 22.6%. This is because
the SERAD design is significantly larger and because our
DICE latches have not been optimized for leakage.
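The relative power figures and the energy argument can be checked with simple arithmetic. The sketch below uses the total power values reported for the “count” program at 286MHz; the variable names are introduced here for illustration, and small differences from the printed percentages can arise because the totals are already rounded.

```python
FREQ_HZ = 286e6  # common clock frequency used for all variants

# Total power (mW) for the "count" program, as reported in Table II.
total_power_mw = {
    "SERAD Plasma": 6.74,
    "Sync Plasma": 8.83,
    "Glitch-Filter Plasma": 13.2,
    "TMR Plasma": 26.0,
}


def percent_increase(power_mw: float, baseline_mw: float) -> float:
    """Relative change in power versus the baseline, in percent."""
    return 100.0 * (power_mw - baseline_mw) / baseline_mw


def energy_per_cycle_pj(power_mw: float, freq_hz: float = FREQ_HZ) -> float:
    """Average energy per clock cycle in pJ (power divided by frequency)."""
    return power_mw * 1e-3 / freq_hz * 1e12


baseline = total_power_mw["Sync Plasma"]
for name, power in total_power_mw.items():
    print(f"{name:22s} {percent_increase(power, baseline):+7.1f}%  "
          f"{energy_per_cycle_pj(power):6.2f} pJ/cycle")
```

Since every variant runs at the same frequency and performs the same work per cycle, the ratio of energies per cycle equals the ratio of powers, which is why the power comparison doubles as an energy comparison.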
3 Because the four variants compute exactly the same operation across every
cycle, running all variants at the same frequency also provides a direct and
fair comparison of energy consumption.
Design type                Comb   Seq    Total  Incr. (%)
SERAD Plasma               2.59   4.74   7.34   -12.8
Sync Plasma [48]           2.26   6.16   8.42   0
Glitch-Filter Plasma [15]  3.05   6.58   9.63   14.3
TMR Plasma [7]             8.26   18.43  26.69  216.9
TABLE IV: Power (mW) comparison of the four design variants
with a 286MHz clock running the “CoreMark” program
While this power analysis comparison is promising, it is
also important to recognize that these results do not take
into account the power of the clock tree that is designed
during and accounted for after physical design. In particular,
two-phase latch-based designs require two clock trees which
may present an additional power overhead. Such latch-based
designs, however, have built-in hold margins and thus require
fewer hold buffers (also added during physical design) than
their FF-based counterparts. The combined impact of these
effects is process, cell-library, and design dependent. Given that
efficient clock tree synthesis for asynchronous designs is
an ongoing research and engineering challenge (see, e.g.,
[49]), we estimated this overhead by placing and routing
synchronous FF and two-phase latch-based Plasma designs.
We found that the clock tree power increased by 12%, representing
approximately a 2% increase in overall power.
Lastly, it is important to note that these results do not take
into account the ability of the asynchronous design to track
process, voltage, and temperature variations. Because the delay
of the control circuits is generally positively correlated with the
delay of the combinational logic, smaller margins are needed
in the delay lines than in synchronous clock periods. Although
difficult to quantify, this leads to increased performance and/or
increased yield [50], [51], the degree of which is dependent on
the relative amount of local versus global variation, whether
speed binning is employed, and whether chips are allowed to
vary in performance due to transient voltage or temperature
variations.
V. CONCLUSIONS
This paper presents a design template for soft-error resilient
asynchronous bundled-data design called SERAD that uses a
novel combination of spatial and temporal redundancy to become
resilient to SETs. The SERAD design template has been
validated in a 45nm technology using a combination of Spice
and Verilog simulations. The resulting area, performance, and
power of the design template have been evaluated on an open-core
MIPS-like processor. Compared with the unhardened
synchronous version, the post-synthesis SERAD design is
81% larger, exhibits negligible performance degradation, and
is estimated to consume lower power. It consumes less than
half of the area of the TMR design and is significantly faster
than the glitch-filtering-based design, making it a promising
approach for radiation hardening.
There are two areas of future work that can improve the
benefits of SERAD and expand its applicability. First, we
can improve the performance of SERAD by leveraging its
inherent timing-resilience by modifying its control to also
recover from uncommonly long combinational delays. In
particular, if the latch transparency window is started early,
near-critical combinational delays will be detected as errors.
They will trigger the error detecting logic and be mitigated
via re-sampling. This should cost negligible area but lead to
significant further performance improvements over unhardened
synchronous designs. Second, we propose to design radiation-
hardened asynchronous-synchronous clock domain crossing
modules that can surround a SERAD design, enabling its use
as a drop-in replacement for latency-insensitive synchronous
modules.
ACKNOWLEDGEMENTS
We would like to acknowledge the help of Dr. Dylan Hand
in porting the CoreMark program to Plasma. This work is
supported in part by a grant received from the Ministry of
Electronics and Information Technology (MEITY), Govern-
ment of India for a Special Manpower Development Project
for Chips to System Design (SMDP-C2SD) and by
Science and Engineering Research Board
(SERB) grant CRG/2018/005013.
REFERENCES
[1] V. Ferlet-Cavrois, L. W. Massengill, and P. Gouker, “Single event
transients in digital CMOS - a review,” IEEE Trans. Nucl. Sci., vol. 60,
no. 3, pp. 1767–1790, June 2013.
[2] N. N. Mahatme, S. Jagannathan, T. D. Loveless, L. W. Massengill, B. L.
Bhuva, S. J. Wen, and R. Wong, “Comparison of combinational and
sequential error rates for a deep submicron process,” IEEE Trans. Nucl.
Sci., vol. 58, no. 6, pp. 2719–2725, Dec 2011.
[3] G. R. Hopkinson, “Radiation effects in a CMOS active pixel sensor,”
IEEE Trans. Nucl. Sci., vol. 47, no. 6, pp. 2480–2484, Dec 2000.
[4] N. N. Mahatme, B. Bhuva, N. Gaspard, T. Assis, Y. Xu, P. Marcoux,
M. Vilchis, B. Narasimham, A. Shih, S. J. Wen, R. Wong, N. Tam,
M. Shroff, S. Koyoma, and A. Oates, “Terrestrial SER characterization
for nanoscale technologies: A comparative study,” in Proc. 2015 IEEE
International Reliability Physics Symp. (IRPS), April 2015, pp. 4B.4.1–
4B.4.7.
[5] M. Hosseinabady, P. Lotfi-Kamran, G. D. Natale, S. D. Carlo, A. Benso,
and P. Prinetto, “Single-event upset analysis and protection in high speed
circuits,” in Proc. 2006 Eleventh IEEE European Test Symp. (ETS), Jun
2006, pp. 29–34.
[6] N. N. Mahatme, N. J. Gaspard, T. Assis, S. Jagannathan, I. Chatterjee,
T. D. Loveless, B. L. Bhuva, L. W. Massengill, S. J. Wen, and R. Wong,
“Impact of technology scaling on the combinational logic soft error rate,”
in Proc. 2014 IEEE International Reliability Physics Symp. (IRPS), June
2014, pp. 5F.2.1–5F.2.6.
[7] R. E. Lyons and W. Vanderkulk, “The use of triple-modular redundancy
to improve computer reliability,” in IBM Journal of Research and
Development, vol. 6, no. 2, April 1962, pp. 200–209.
[8] R. Kaur, N. Surana, and J. Mekie, “Guarded dual rail logic for soft
error tolerant standard cell library,” in Proc. 2016 16th European Conf.
on Radiation and Its Effects on Components and Systems (RADECS),
Sept 2016, pp. 1–4.
[9] Q. Zhou and K. Mohanram, “Gate sizing to radiation harden combi-
national logic,” in IEEE Trans. Comput.-Aided Design Integr. Circuits
Syst., vol. 25, no. 1, Dec 2006, pp. 155 – 166.
[10] R. R. Rao, D. Blaauw, and D. Sylvester, “Soft error reduction in combi-
national logic using gate resizing and flipflop selection,” in Proc. 2006
IEEE/ACM International Conf. on Computer Aided Design (ICCAD),
Nov 2006, pp. 502–509.
[11] N. Miskov-Zivanov and D. Marculescu, “MARS-C: modeling and re-
duction of soft errors in combinational circuits,” in Proc. 2006 43rd
ACM/IEEE Design Automation Conf. (DAC), July 2006, pp. 767–772.
[12] M. A. Sabet, B. Ghavami, and M. Raji, “A scalable solution to soft error
tolerant circuit design using partitioning-based gate sizing,” IEEE Trans.
Rel., vol. 66, no. 1, pp. 245–256, March 2017.
[13] P. Mongkolkachit and B. Bhuva, “Design technique for mitigation of
alpha-particle-induced single-event transients in combinational logic,”
IEEE Trans. Device Mater. Rel., vol. 3, no. 3, pp. 89–92, Sept 2003.
[14] R. Naseer and J. Draper, “The DF-dice storage element for immunity
to soft errors,” in Proc. 48th Midwest Symp. on Circuits and Systems
(MWSCAS), Aug 2005, pp. 303–306 Vol. 1.
[15] A. Balasubramanian, B. Bhuva, J. Black, and L. Massengill, “RHBD
techniques for mitigating effects of single-event hits using guard-gates,”
in IEEE Trans. Nucl. Sci., vol. 52, no. 6, Dec 2005, pp. 2531 – 2535.
[16] A. Stabile, V. Liberali, and C. Calligaro, “Design of a rad-hard library
of digital cells for space applications,” in Proc. 2008 15th IEEE
International Conf. on Electronics, Circuits and Systems (ICECS), Aug
2008, pp. 149–152.
[17] D. J. Barnhart, T. Vladimirova, M. N. Sweeting, and K. S. Stevens,
“Radiation hardening by design of asynchronous logic for hostile
environments,” IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1617–
1628, May 2009.
[18] D. Hand, M. M. Trevisan, H.-H. Huang, D. Chen, F. Butzke, Z. Li,
M. Gibiluka, M. Breuer, N. L. V. Calazans, and P. A. Beerel, “Blade -
a timing violation resilient asynchronous template,” in Proc. 2015 21st
International Symp. on Asynchronous Circuits and Systems (ASYNC),
2015, pp. 21–28.
[19] K. Bowman, J. Tschanz, N. S. Kim, J. Lee, C. Wilkerson, S. Lu,
T. Karnik, and V. De, “Energy-efficient and metastability-immune re-
silient circuits for dynamic variation tolerance,” IEEE J. Solid-State
Circuits, vol. 44, no. 1, pp. 49–63, Jan 2009.
[20] A. Dixit and A. Wood, “The impact of new technology on soft error
rates,” in Proc. 2011 IEEE International Reliability Physics Symp.
(IRPS), April 2011, pp. 5B.4.1–5B.4.7.
[21] D. G. Mavis and P. H. Eaton, “Soft error rate mitigation techniques
for modern microcircuits,” in Proc. 2002 IEEE International Reliability
Physics Symp. (IRPS), 2002, pp. 216–225.
[22] T. Calin, M. Nicolaidis, and R. Velazco, “Upset hardened memory design
for submicron CMOS technology,” IEEE Trans. Nucl. Sci., vol. 43, no. 6,
pp. 2874 – 2878, Dec 1996.
[23] K. Kobayashi, K. Kubota, M. Masuda, Y. Manzawa, J. Furuta, S. Kanda,
and H. Onodera, “A low-power and area-efficient radiation-hard redun-
dant flip-flop, DICE ACFF, in a 65nm thin-BOX FD-SOI,” IEEE Trans.
Nucl. Sci., vol. 61, no. 4, pp. 1881–1888, Aug 2014.
[24] S. Mitra, M. Zhang, N. Seifert, T. Mak, and K. S. Kim, “Soft error
resilient system design through error correction,” in Proc. 2006 IFIP
International Conf. on Very Large Scale Integration (VLSI-SOC), Oct
2006, pp. 332–337.
[25] P. Eaton, J. Benedetto, D. Mavis, K. Avery, M. Sibley, M. Gadlage, and
T. Turflinger, “Single event transient pulsewidth measurements using a
variable temporal latch technique,” IEEE Trans. Nucl. Sci., vol. 51, no. 6,
pp. 3365–3368, Dec 2004.
[26] J. Pontes, N. Calazans, and P. Vivet, “H2A: A hardened asynchronous
network on chip,” in Proc. 2013 26th Symp. on Integrated Circuits and
Systems Design (SBCCI), Sept 2013, pp. 1–6.
[27] Y. Monnet, M. Renaudin, and R. Leveugle, “Hardening techniques
against transient faults for asynchronous circuits,” in Proc. 2005 11th
IEEE International On-Line Testing Symp. (IOLTS), July 2005, pp. 129–
134.
[28] W. Jang and A. J. Martin, “SEU-tolerant QDI circuits [quasi delay-
insensitive asynchronous circuits],” in Proc. 2005 11th IEEE Interna-
tional Symp. on Asynchronous Circuits and Systems (ASYNC), March
2005, pp. 156–165.
[29] W. Kuang, P. Zhao, J. S. Yuan, and R. F. DeMara, “Design of asyn-
chronous circuits for high soft error tolerance in deep submicrometer
CMOS circuits,” IEEE Trans. VLSI Syst., vol. 18, no. 3, pp. 410–422,
March 2010.
[30] J. Di, “A framework on mitigating single event upset using delay-
insensitive asynchronous circuits,” in Proc. 2007 IEEE Region 5 Tech-
nical Conf., April 2007, pp. 354–357.
[31] W. Friesenbichler and A. Steininger, “Soft error tolerant asynchronous
circuits based on dual redundant four state logic,” in Proc. 2009 12th
Euromicro Conf. on Digital System Design, Architectures, Methods and
Tools (DSD), Aug 2009, pp. 100–107.
[32] P. A. Beerel, G. D. Dimou, and A. M. Lines, “Proteus: An ASIC flow
for GHz asynchronous designs,” IEEE Des. Test. Comput., vol. 28, no. 5,
pp. 36–51, 2011.
[33] S. R. Naqvi, J. Lechner, and A. Steininger, “Protection of Muller-
pipelines from transient faults,” in Proc. 15th International Symp. on
Quality Electronic Design (ISQED), March 2014, pp. 123–131.
[34] I. A. Danilov, M. S. Gorbunov, A. I. Shnaider, A. O. Balbekov, Y. B.
Rogatkin, and S. G. Bobkov, “DICE-based Muller C-elements for soft
error tolerant asynchronous ICs,” in Proc. 2016 16th European Conf. on
Radiation and Its Effects on Components and Systems (RADECS), Sept
2016, pp. 1–4.
[35] S. Almukhaizim, F. Shi, E. Love, and Y. Makris, “Soft-error tolerance
and mitigation in asynchronous burst-mode circuits,” IEEE Trans. VLSI
Syst., vol. 17, no. 7, pp. 869–882, July 2009.
[36] A. Sai Aparna, J. Mekie, and H. Shah, “Single-error hardened and
multiple-error tolerant guarded dual modular redundancy technique,” in
Proc. 2018 31st International Conf. on VLSI Design and 2018 17th
International Conf. on Embedded Systems (VLSID), Jan 2018.
[37] R. Fuhrer, B. Lin, and S. Nowick, “Symbolic hazard-free minimization
and encoding of asynchronous finite state machines,” in Proc. IEEE
International Conf. on Computer Aided Design (ICCAD), Nov 1995,
pp. 604–611.
[38] K. Yun, D. Dill, and S. Nowick, “Synthesis of 3D asynchronous state
machines,” in Proc. 1992 IEEE International Conf. on Computer Design:
VLSI in Computers & Processors (ICCD), Oct 1992, pp. 346–350.
[39] P. Beerel, R. Ozdag, and M. Ferreti, A Designer’s Guide to Asynchronous
VLSI. Cambridge University Press, 2010.
[40] S. M. Nowick and D. L. Dill, “Exact two-level minimization of hazard-
free logic with multiple-input changes,” IEEE Trans. Comput.-Aided
Design Integr. Circuits Syst., vol. 14, pp. 986–997, 1992.
[41] F. Rosenberger, C. Molnar, T. Chaney, and T.-P. Fang, “Q-modules:
internally clocked delay-insensitive modules,” IEEE Trans. Comput.,
vol. 37, no. 9, pp. 1005–1018, Sep 1988.
[42] S. Kim, J. P. Cerqueira, and M. Seok, “A 450mV timing-margin-free
waveform sorter based on body swapping error correction,” in Proc.
IEEE Symp. on VLSI Circuits (VLSI-Circuits), 2016, pp. 1–2.
[43] W. Hua, R. N. Tadros, and P. A. Beerel, “Low area, low power,
robust, highly sensitive error detecting latch for resilient architectures,”
in Proc. 2016 International Symp. on Low Power Electronics and Design
(ISLPED), 2016, pp. 16–21.
[44] M. Fojtik, D. Fick, Y. Kim, N. Pinckney, D. Harris, D. Blaauw, and
D. Sylvester, “Bubble razor: An architecture-independent approach to
timing-error detection and correction,” in Proc. 2012 IEEE International
Solid-State Circuits Conf. Digest of Technical Papers (ISSCC), Feb 2012,
pp. 488–490.
[45] S. Beer, M. Cannizzaro, J. Cortadella, R. Ginosar, and L. Lavagno,
“Metastability in better-than-worst-case designs,” in Proc. 2014 20th
IEEE International Symp. on Asynchronous Circuits and Systems
(ASYNC), 2014, pp. 101–102.
[46] S. Das, C. Tokunaga, S. Pant, W.-H. Ma, S. Kalaiselvan, K. Lai, D. Bull,
and D. Blaauw, “Razor II: In situ error detection and correction for PVT
and SER tolerance,” IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 32–
48, Jan 2009.
[47] D. Bull, S. Das, K. Shivashankar, G. Dasika, K. Flautner, and D. Blaauw,
“A power-efficient 32 bit ARM processor using timing-error detection
and correction for transient-error tolerance and adaptation to PVT
variation,” IEEE J. Solid-State Circuits, vol. 46, no. 1, pp. 18–31, Jan
2011.
[48] Plasma CPU, 2014. Available: http://opencores.org/project,plasma.
[49] G. Gimenez, A. Cherkaoui, G. Cogniard, and L. Fesquet, “Static
timing analysis of asynchronous bundled-data circuits,” in Proc. 2018
24th IEEE International Symp. on Asynchronous Circuits and Systems
(ASYNC), 2018, pp. 110–118.
[50] J. Cortadella, M. Lupon, A. Moreno, A. Roca, and S. S. Sapatnekar,
“Ring oscillator clocks and margins,” in Proc. 2016 22nd IEEE Inter-
national Symp. on Asynchronous Circuits and Systems (ASYNC), May
2016, pp. 19–26.
[51] Y. Zhang, H. Zha, V. Sahir, H. Cheng, and P. A. Beerel, “Test margin and
yield in bundled data and ring-oscillator based designs,” in Proc. 2017
23rd IEEE International Symp. on Asynchronous Circuits and Systems
(ASYNC), May 2017, pp. 85–93.
Sai Aparna Aketi received her B.Tech degree in
Electrical Engineering from the Indian Institute of Technology
Gandhinagar in 2018. This work was started
when Aparna Aketi was interning at the University
of Southern California as part of the IUSSTF-Viterbi
Program in 2017. She is currently pursuing
her doctoral degree under the guidance of Prof.
Kaushik Roy at the Centre for Brain Inspired Computing
(C-BRIC), Purdue University. Her current research
interests include a variety of topics in explainable,
robust and energy-efficient deep learning.
Smriti Gupta received her B.Tech degree in Elec-
tronics & Communication Engineering from Insti-
tute of Engineering & Technology Lucknow, In-
dia in 2016 and her M.Tech degree in Electrical
Engineering from Indian Institute of Technology
Gandhinagar in 2018. She is currently working as a
Senior Engineer in MediaTek Bangalore, India. Her
interests include high performance and low power
VLSI designs.
Huimei Cheng is a Ph.D. student in the
Ming Hsieh Department of Electrical and Computer
Engineering at the University of Southern California.
She received her B.S. degree at Nanjing University
of Information & Technology (China) in 2014, and
her M.S. degree from USC in 2016. Upon gradua-
tion, she worked at Synopsys in R&D Prime Time
team conducting research on pessimism reduction
in crosstalk. She is a student member of IEEE. Her
research interests include a variety of topics in CAD
and asynchronous VLSI design.
Joycee Mekie received her Ph.D. degree in Electrical
Engineering from IIT Bombay in 2009, and
received her Bachelors and Masters in Electrical
Engineering from M. S. University of Baroda in
1997 and 1999, respectively. She joined as Assistant
Professor in the Electrical Engineering Department
at IIT Gandhinagar in 2009. She is a recipient of
the prestigious Young Faculty Research Fellowship
(YFRF) from Ministry of Electronics and Infor-
mation Technology under the Visvesvaraya PhD
scheme. She has served on the technical program
committee of several conferences, including ASYNC and VLSID, and is a
reviewer for several journals, including TCASI, TCASII and TCAD. Her
research interests include Approximate computing, Circuits for space appli-
cations, Asynchronous systems, Energy-efficient memory design, Computer
architecture and Network-on-chip architectures.
Peter A. Beerel received his B.S.E. degree in
Electrical Engineering from Princeton University,
Princeton, NJ, in 1989 and his M.S. and Ph.D.
degrees in Electrical Engineering from Stanford Uni-
versity, Stanford, CA, in 1991 and 1994, respec-
tively. Professor Beerel is currently a Full Professor
and Associate Chair of the Computer Engineering
Division of the Ming Hsieh Electrical and Com-
puter Engineering Department at the University of
Southern California. He co-founded TimeLess De-
sign Automation to commercialize an asynchronous
ASIC flow in 2008 and sold the company in 2010 to Fulcrum Microsystems
which was bought by Intel in 2011. His interests include a variety of topics
in CAD, VLSI, and Machine Learning. He is a Senior Member of the IEEE.
