A Reconfigurable Programmable Logic Block for a Multi-Style Asynchronous
  FPGA resistant to Side-Channel Attacks by Hoogvorst, Philippe et al.
arXiv:0809.3942v1 [cs.CR] 23 Sep 2008
Abstract
Side-channel attacks are efficient attacks against cryptographic de-
vices. They use only quantities observable from outside, such as the du-
ration and the power consumption.
Attacks against synchronous devices using electric observations are
facilitated by the fact that all transitions occur simultaneously with some
global clock signal.
Asynchronous control removes this synchronization and therefore makes
it more difficult for the attacker to isolate interesting intervals. In
addition the coding of data in an asynchronous circuit is inherently more
difficult to attack.
This article describes the Programmable Logic Block of an asynchronous
FPGA resistant against side-channel attacks. Additionally it can imple-
ment different styles of asynchronous control and of data representation.
1 Introduction
Side-channel attacks (SCA) have been put forward mainly by Paul Kocher
et al. in 1996 in [22]. This first description of a SCA explained how the mere
observation of the duration of computations could allow an attacker to retrieve
the secret key. The attack was then improved and extended to other cryptosys-
tems [38, 6, 12, 49].
In 1999 Kocher et al. described what they called “DPA”1 [32]. This new
attack used the power consumption instead of the duration but yielded the
same result: the retrieval of the secret key. The process of this latter attack
is relatively simple: a large number of cryptographic operations are monitored
and the cipher text is stored together with the electric consumption. Guesses
are then made about some parts of the secret key, and are confirmed or refuted
by statistical processing of the data. Other attacks against various
cryptosystems were based on this method [5, 42, 18].
Countermeasures soon appeared to protect systems based on a strong
algebraic structure [2, 20, 47, 25]. In contrast, the protection of symmetric
cryptosystems often consisted in introducing some randomization, either in the
computing process or in the power consumption, to prevent the statistical
processing of the acquired data. However “counter-countermeasures” also
appeared [19]. Some other protection schemes were designed [26, 10].
An interesting and apparently efficient countermeasure is the WDDL2 [46]
which duplicates each signal in the circuit so that whatever the value is, one
of the lines will toggle. This countermeasure was enhanced by an improved
routing of related signals [11], which reduces the differences between the power
consumptions of a ’1’ and a ’0’.
Asynchronous circuits, the history of which dates back to 1950, are
nowadays increasingly considered as a viable alternative to classical synchronous
designs. Indeed they feature some very useful properties such as flexibility,
robustness, high speed and low power. This article brings another good reason to
consider asynchronous designs: a greater resistance against side-channel attacks.
1 Differential Power Analysis.
2 Wave Dynamic Differential Logic.
Applications of asynchronous ASICs and FPGAs are beginning to
appear both in the academic world [28, 29, 44] and in industry [1].
At the same time synchronous circuits are suffering from problems arising
from the distribution of the clock signal through the IC and the excessive power
consumption (and thus dissipation!).
As an asynchronous circuit has no centralized clock, the problems associated
with the clock distribution, clock skew and power consumption do not exist. In
addition these circuits offer advantages such as:
• average-time performance,
• lower electromagnetic radiation,
• better robustness towards variations of the power voltage,
• better robustness towards fabrication process variations [31],
• better composability and modularity because of the simple handshake
interfaces and the local timing [39] and
• better scrambling of the side-channel information [30, 16, 41].
Asynchronous circuits thus seem to be a viable alternative which would remove
these limiting factors [35].
Due to these advantages, there has been a resurgence of interest in
asynchronous design, especially in the reprogrammable field. There have been
several recent successful design projects such as ASPRO-216 [36], an AES
crypto-processor [4], many Philips designs targeting low power [3, 21], projects
focused on designing an asynchronous FPGA from a synchronous one, like
MONTAGE [14] and PGA-STC [27], or targeting asynchronous application-specific
FPGAs, either locally synchronous, like GALSA [9] and STACC [33], or completely
asynchronous, like PAPA [43, 45], and other recent works [15, 8, 7, 17, 24].
PGA-STC was developed to implement two-phase bundled-data systems such as
micro pipelines, GALSA for massively parallel computing architecture, STACC
for reconfigurable computation and PAPA was mainly created and optimized
for pipe-lined processes.
This article describes the design of the PLB3 of a new asynchronous FPGA
with security as the main requirement, even at the expense of performance.
Indeed in the particular case of cryptography performance is second to security
even if it cannot be ignored. The FPGA must be able to implement various
styles of asynchronous protocols and different representations of data so as to
enable comparisons between these representations and protocols with respect to
their ability to thwart side-channel attacks.
3 Programmable Logic Block.
Section 2 describes the representation of data and the different asynchronous
protocols used in the FPGA. We also discuss their suitability for trusted
computing. Section 3 shows the construction of the PLB to implement the 4-phase
protocol using both binary and ternary representations of data. Section 4 shows
the necessary additions to the PLB to accommodate the 2-phase protocols. Sec-
tion 5 shows how the FPGA is programmed. Finally section 6 concludes the
article.
2 Asynchronous Representation of Signals
As opposed to synchronous data, whose validity is guaranteed by the timing
of some global “clock” signal, the asynchronous computations are synchronized
by the availability of data and, when necessary, by a Request/Acknowledge
handshake signalling.
A formal description of delay insensitive representation of data can be found
in [48]. In the Quasi-Delay Insensitive (QDI) protocols the request is carried
by the data itself. This yields a reliable design, independent of the
routing.
The data are transmitted together with the availability information; thus
a logic signal (a “signal” for short) must be represented by more than a single
electrical signal (a “wire” for short)4. In this article, a wire is able to take
one of two values, which we denote 0 and 1 regardless of their actual electric
implementation.
In order to avoid glitches, a sufficient condition is that given a signal S
represented by n wires, the transmission of a new value of S must consist in
exactly one of the n wires changing its electrical state. This means that the
number of wires is greater than or equal to the number of the states of S. As
silicon and routing are precious resources, the number of wires representing a
given signal will thus be equal to the number of possible values of this signal.
The most frequently used kind of signal is the binary signal, which carries
a {‘0’, ‘1’} information. Such a signal is encoded with 2 wires. This
representation is called “Dual-Rail” or “1-out-of-2”. However ternary signals,
which carry a {‘0’, ‘1’, ‘2’} information, can also be considered. Such a signal
is represented by 3 wires and one speaks of a “1-out-of-3” representation. This
representation is more compact than the 1-out-of-2 for arithmetic: 6 wires
in 1-out-of-2 represent three 1-out-of-2 signals, which can take 8 valid values,
compared to two 1-out-of-3 signals, which can take 9 valid values. However, due
to the greater complexity of gates in the 1-out-of-3 representation, binary
signals are preferred most of the time.
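The comparison between the two representations can be checked by counting code words. The short Python sketch below does so; the function name valid_values is ours and is introduced only for illustration.

```python
def valid_values(wires_per_signal: int, total_wires: int) -> int:
    """Number of valid combined values carried by `total_wires` wires
    grouped into 1-out-of-n signals of `wires_per_signal` wires each.
    The all-zero state of a signal is not counted as a value."""
    if total_wires % wires_per_signal != 0:
        raise ValueError("wires must split evenly into signals")
    signals = total_wires // wires_per_signal
    return wires_per_signal ** signals


# Six wires: three dual-rail signals versus two 1-out-of-3 signals.
print(valid_values(2, 6))  # 2**3 = 8
print(valid_values(3, 6))  # 3**2 = 9
```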
An asynchronous design may need additional signals, which are specialized
for synchronisation. These signals carry no data information and can thus be
coded on a single wire. They will be referred to as Acknowledge signals. The
inputs of the gates which receive such a signal will be denoted Sin and the
outputs driving these signals will be called Sout.
4 If one could work with non-standard electrical levels, a {−5 V, 0 V, +5 V} representation
on a single wire per signal would be acceptable in some cases but we shall restrict ourselves
in the following pages to standard CMOS levels: Vdd and Vss.
2.1 Asynchronous Protocols
There are two main families of QDI asynchronous communication protocols,
which differ by the nature of the signalling information: the 2-phase protocols
and the 4-phase protocols.
2.1.1 4-Phase Protocol
Under a 4-phase protocol, valid values of a signal are separated by a special
value, denoted Ω. The transmission of a value x from an emitter to a receiver
proceeds as follows:
Step   Emitter           Receiver
1      sends x     −→
2                  ←−    acknowledges x
3      sends Ω     −→
4                  ←−    acknowledges Ω
For instance, if a signal S is represented by n wires (S0, S1, . . . , Sn−1), the
Ω value will be implemented as the n-tuple (0, 0, . . . , 0) while the value i will
be represented by (0, . . . , 0, Si = 1, 0, . . . , 0).
This particular kind of 4-phase protocol is named “WCHB”5 in [37, Sec. 2.3.1]
and is known as DPL6 in the secure computing community [34].
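As an illustration, the succession of wire states seen on a link under this 4-phase protocol can be sketched in Python. The helper names encode, spacer and four_phase_stream are ours, and the acknowledges of steps 2 and 4 are simply assumed to arrive between two successive states.

```python
from typing import Iterator, Sequence, Tuple


def encode(value: int, n: int) -> Tuple[int, ...]:
    """1-out-of-n code word of `value`: wire `value` is 1, all others 0."""
    if not 0 <= value < n:
        raise ValueError("value out of range")
    return tuple(int(i == value) for i in range(n))


def spacer(n: int) -> Tuple[int, ...]:
    """The Ω value: all n wires at 0."""
    return (0,) * n


def four_phase_stream(values: Sequence[int], n: int) -> Iterator[Tuple[int, ...]]:
    """Wire states on the link: every valid value is followed by Ω."""
    for v in values:
        yield encode(v, n)  # step 1: the emitter sends x
        yield spacer(n)     # step 3: the emitter sends Ω


print(list(four_phase_stream([1, 0], n=2)))
# [(0, 1), (0, 0), (1, 0), (0, 0)]
```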
2.1.2 2-Phase Protocols
Under a 2-phase protocol, no special value is used to separate valid ones. The
transmission of a value x from an emitter E to a receiver R proceeds as follows:
Step   Emitter           Receiver
1      sends x     −→
2                  ←−    acknowledges x
In this article we will describe the implementations of two 2-phase protocols:
2-phase-edge protocol:
a signal S, which can take n values, is represented by n wires and the
arrival of a new value i is signalled by wire i toggling 0 → 1 or 1 → 0.
Note that the instantaneous values of the wires are not significant under
this protocol: only the toggles are significant.
5 Weak Condition Half Buffer.
6 Dual-Rail Precharge Logic.
2-phase-ledr protocol:
a signal S is represented by two wires: Sd and Sr. The arrival of a new
value x is signalled by one of Sd and Sr toggling 0 → 1 or 1 → 0 and the
value is given by Sd.
Note that the requirement that any change of the value of the signal
be implemented by the toggling of exactly 1 wire limits the 2-phase-ledr
protocol to binary signals.
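The two encodings can be compared on a short sequence of values. The following Python sketch assumes that all wires start at 0 (the reset state used later in this article); the function names ledr_stream and edge_stream are ours.

```python
from typing import Iterable, List, Tuple


def ledr_stream(bits: Iterable[int]) -> List[Tuple[int, int]]:
    """Successive (Sd, Sr) wire states, starting from the state (0, 0)."""
    d, r = 0, 0
    states = []
    for b in bits:
        if b != d:
            d ^= 1  # the new value differs: toggle the data wire
        else:
            r ^= 1  # the same value again: toggle the other wire
        states.append((d, r))
    return states


def edge_stream(values: Iterable[int], n: int) -> List[Tuple[int, ...]]:
    """Successive wire states under 2-phase-edge: value i toggles wire i."""
    wires = [0] * n
    states = []
    for v in values:
        wires[v] ^= 1
        states.append(tuple(wires))
    return states


print(ledr_stream([1, 1, 0, 0]))    # [(1, 0), (1, 1), (0, 1), (0, 0)]
print(edge_stream([1, 1, 0], n=2))  # [(0, 1), (0, 0), (1, 0)]
```

In both cases exactly one wire changes between two successive states. Under 2-phase-ledr the first wire always holds the current value, whereas under 2-phase-edge the value can only be recovered by comparing successive states.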
Remark 1 The 4-phase protocol can be considered as a 2-phase protocol in
which every “valid” value is followed by an Ω dummy value and in which the
gates return to the Ω value as soon as all inputs have received the Ω value.
The 2-phase protocols are thus inherently twice as fast as the 4-phase ones. This
is especially important in an FPGA, in which the routing delays are often the
limiting factor of the speed of the system. However, even if twice as fast, they
lead to much more complex gates than the 4-phase ones.
2.2 Initialization of the System
At the initial time of the system’s operation, all gates must be reset to a known,
deterministic value. (This is also true for synchronous systems even if some
flip-flops sometimes need no initialization.)
The requirement of a known, deterministic value does not impose any specific
value on the wires. However the simplest initialization, which we shall use in
this article, consists in initializing all gates so that all wires are set to 0.
The consequence of this initialization is that the Hamming weight of any
signal is 0 just after reset, so its parity is even.
The relevant property just after RESET is thus that:
• under a 4-phase protocol an Ω value is thus output by all gates and
• under a 2-phase protocol the parity of the Hamming weight of the outputs
of any gate is 0.
2.3 Request Signalling
The Request event is coded into the data of the QDI protocol itself; a request
corresponds to a change of one of the wires encoding the signal. A gate will be
ready to perform its computation when each of its inputs has received a request
and when all gates using its output have acknowledged the last value sent.
If performance were the major requirement this would not be true: for
instance, an AND gate could perfectly well output a ‘0’ as soon as one of its
inputs has received a ‘0’. But such an early evaluation would occur only when
some input(s) receive a ‘0’ and never when all receive ‘1’. This difference in
timing could potentially leak some information about the computations being
performed to a malevolent observer. Thus such “early evaluation” will never be
allowed in a secured circuit and computations will always be performed upon the
rendez-vous of all data and Acknowledge inputs.
As the arrival of a new value is always signalled by a single wire changing
value, the parity of the Hamming weight of any signal changes each time a new
value is transmitted.
Under a two-phase protocol, a gate will be ready to compute its output when
all its inputs show a parity opposed to the current output parity.
Under a 4-phase protocol a gate is ready to compute as soon as each input
has left the Ω state. As Ω is coded as (0, 0, . . . , 0) it has an even parity, while
any valid value, signalled by a single wire at 1, has an odd parity. The behaviour
of the gates under a 4-phase protocol is thus coherent with that of the gates
under 2-phase protocols. This will be useful for the design of the FPGA.
2.4 Acknowledge Signalling
The Acknowledge signal consists of a single wire, carrying a { Ω, ack } under
a 4-phase protocol or an { odd, even } “phase” information, under a 2-phase
protocol.
Given the “parity” property of the signals, the Acknowledge signal is
computed as the XOR of all wires carrying the output signal. An OR gate would
be enough under a 4-phase protocol; indeed it is easy to show that the OR and
the XOR functions are identical on the allowed domain of values of the wires
under a 4-phase protocol.
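The equality of OR and XOR on the allowed domain can be verified exhaustively for small signal widths. The Python sketch below does this; the helper name allowed_4phase is ours.

```python
from functools import reduce
from itertools import product
from operator import or_, xor


def allowed_4phase(n: int):
    """Allowed states of an n-wire signal under the 4-phase protocol:
    the all-zero Ω value and the n one-hot valid values."""
    return [w for w in product((0, 1), repeat=n) if sum(w) <= 1]


for n in (2, 3, 4):
    for w in allowed_4phase(n):
        assert reduce(or_, w) == reduce(xor, w)

# Outside the allowed domain the two functions differ, e.g. on (1, 1):
print(reduce(or_, (1, 1)), reduce(xor, (1, 1)))  # 1 0
```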
This signal is sent by a given gate to those which drive its inputs. When
the output of a gate S is sent to more than one gate, D1, D2,..., a rendez-vous
is computed to combine the synchronization signals coming from the Di into a
single signal, fed to S.
2.5 C-Element
The C-element is the gate which implements the rendez-vous of signals. It has
an arbitrary number p of input wires, denoted I1, I2,. . . ,Ip, and a single output
Z, whose equations are:
\[
Z =
\begin{cases}
1 & \text{if } I_1 = I_2 = \cdots = I_p = 1,\\
0 & \text{if } I_1 = I_2 = \cdots = I_p = 0,\\
Z & \text{otherwise.}
\end{cases}
\tag{1}
\]
Eq. 2 shows an equivalent form of Eq. 1.
\[
Z = \bigl(Z \wedge (I_1 \vee \cdots \vee I_p)\bigr) \vee (I_1 \wedge \cdots \wedge I_p),
\tag{2}
\]
where ∧ and ∨ are respectively the AND and OR operators.
Fig. 1 depicts the implementation of a C-element derived from Eq. 2, using
a multiplexer (MUX), which we use in our FPGA.
In an FPGA the C-element can be implemented in many ways. A p-input
C-element can be implemented in a (p + 1)-input lut, provided the output of the
lut can be fed back to one of the inputs.
Figure 1: C-element implemented with a MUX.
If Z = 0, the MUX selects the AND gate, which will output 1 if and only if
∀i ∈ [1, p], Ii = 1. When this condition becomes true, the output of the MUX
becomes 1 and the output of the OR is selected to be sent to Z instead of the
one of the AND. As ∀i, Ii = 1 ⇒ ∃i : Ii ≠ 0, the output is stable at 1. The
output remains 1 until all inputs are back to 0. Mutatis mutandis the same
proof shows that the output of the gate comes back to 0 when all inputs are 0
and that this value is stable until all inputs are 1 again. Thus the gate correctly
implements the rendez-vous with no glitch.
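The equivalence of Eq. 1 and Eq. 2 can also be checked by brute force. The Python sketch below compares the two forms for every input combination; the function names are ours, and the previous value of Z is passed explicitly as the first argument.

```python
from itertools import product


def c_element_eq1(z: int, inputs) -> int:
    """Eq. 1: follow the inputs when they all agree, otherwise hold Z."""
    if all(i == 1 for i in inputs):
        return 1
    if all(i == 0 for i in inputs):
        return 0
    return z


def c_element_eq2(z: int, inputs) -> int:
    """Eq. 2: Z = (Z AND (I1 OR ... OR Ip)) OR (I1 AND ... AND Ip)."""
    return int((z and any(inputs)) or all(inputs))


# Exhaustive comparison for p = 1..4 inputs and both values of Z.
for p in range(1, 5):
    for z in (0, 1):
        for inputs in product((0, 1), repeat=p):
            assert c_element_eq1(z, inputs) == c_element_eq2(z, inputs)
```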
2.6 Asynchronous Computation & Security
2.6.1 Timing Attack
As each gate always waits for every input to be ready before computing its
result, the duration of the computations is independent of the data. However a
dependency can be generated if the lengths of the wires xi which implement a
signal x are different, thus generating different propagation times for each value
of x.
Thus the following necessary condition must hold: for any pair of gates
(S,R), connected by a signal x, composed of wires (x0, x1, . . . , xp):
• under the 4-phase protocol, the propagation time from S to R of the
transition from Ω to any valid value and of the transition from any valid
value to Ω must be independent of the value;
• under a 2-phase protocol, the rising and falling times of any output wire
must be equal and independent from the former and next value of the
signal.
As the condition must be fulfilled by any signal routed through the FPGA,
this implies that:
• in any routing channel, all wires must have the same length and the same
capacitance with respect to Vdd or Vss,
• for any pair of wires in two routing channels connected by a switchbox, the
propagation time through the switchbox must be the same for all possible
pairs,
• for any input of a PLB, the propagation time from the network to the
processing elements must be uniform,
• for any output, the propagation time to the routing network must be
uniform.
If all these conditions are satisfied and if all PLBs process information at the
same speed, the timing attack [23, 38, 6, 12, 49] is impossible.
2.6.2 Measurement of Power Consumption
Under the 4-phase protocol, two valid values are separated in time by an Ω value,
implemented as all wires at 0. The transition from Ω to a valid value i consists
of a rising edge 0 → 1 of wire i and the return to Ω is the opposite falling
transition.
In order to thwart power analysis attacks, the power consumption must be the
same for the rising edge of any of the wires xi which compose a signal x, and
also for their falling edges. This condition implies that the lengths of the xi
through the routing network be the same.
The necessary conditions to thwart the timing attack are also necessary here
but, in addition, the resistances of the output transistors must be equal.
3 4-Phase Protocol
This protocol is the simplest of all three because the instantaneous values of the
wires composing any signal are sufficient to determine the value of this signal.
We will implement the gates with:
• from 1 to 6 inputs, including the Sin signals, and
• from 2 to 4 outputs, not including the Sout signals.
3.1 Encoding of Signals
Though it is not the only possible one, we shall use the encoding of Eq. 3 for
a signal x in the rest of this article:
\[
\begin{aligned}
(x_0, x_1) = (0, 0) &: \quad x = \Omega,\\
(x_0, x_1) = (1, 0) &: \quad x = \text{‘0’},\\
(x_0, x_1) = (0, 1) &: \quad x = \text{‘1’},\\
(x_0, x_1) = (1, 1) &: \quad \text{forbidden state.}
\end{aligned}
\tag{3}
\]
The occurrence of the “(1, 1)” forbidden state will always signal either a
malfunction or an attack against the system. Fig. 2 depicts the succession
of values on a signal X , represented by 2 wires (x1, x0), and, when present,
the associated transmissions of the ACK signal by the receiver back to the
transmitter.
Figure 2: 4-Phase Protocol.
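A receiver can decode Eq. 3 and flag the forbidden state at the same time. The following Python sketch is one possible way of writing this; the function name decode_dual_rail and the use of an exception to report the forbidden state are our choices.

```python
def decode_dual_rail(x0: int, x1: int) -> str:
    """Decode a 4-phase dual-rail signal according to Eq. 3."""
    if (x0, x1) == (0, 0):
        return "Ω"
    if (x0, x1) == (1, 0):
        return "0"
    if (x0, x1) == (0, 1):
        return "1"
    # (1, 1) never occurs in normal operation: malfunction or attack.
    raise ValueError("forbidden state (1, 1)")


print([decode_dual_rail(*w) for w in [(0, 0), (1, 0), (0, 0), (0, 1)]])
# ['Ω', '0', 'Ω', '1']
```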
3.2 1-out-of-2, 2-input Gates
Let f(x, y) : F2 × F2 → F2 be a two-variable Boolean function. Its output is a
1-out-of-2 signal represented by two wires O1 and O0. We denote respectively
f1(x, y) and f0(x, y) the functions computing the values of each wire.
Fig. 3 depicts the minimal structure of a PLB necessary to implement in
the most general way a gate with 2 binary inputs. Three signals enter the gate:
2 data signals x and y, respectively implemented by the (x0, x1) and (y0, y1)
pairs of wires, and Sin, the synchronization signal.
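The output value (O) is implemented by two 6 → 1 lut, respectively
computing the O0 and O1 wires. Eq. 4 shows the equations of the outputs.
\[
\begin{aligned}
O_1 &=
\begin{cases}
f_1(x, y) & \text{if } (x \neq \Omega) \wedge (y \neq \Omega) \wedge (S_{in} = 0),\\
0 & \text{if } (x = \Omega) \wedge (y = \Omega) \wedge (S_{in} = 1),\\
O_1 & \text{otherwise,}
\end{cases}\\
O_0 &=
\begin{cases}
f_0(x, y) & \text{if } (x \neq \Omega) \wedge (y \neq \Omega) \wedge (S_{in} = 0),\\
0 & \text{if } (x = \Omega) \wedge (y = \Omega) \wedge (S_{in} = 1),\\
O_0 & \text{otherwise,}
\end{cases}\\
S_{out} &= O_0 \oplus O_1.
\end{aligned}
\tag{4}
\]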
The “memory effect” implied by Eq. 4 is implemented by sending each of
O0 and O1 to an input of the lut which drives it. Thus the minimal practical
size for the lut is 64 bits, which can implement any 6-bit → 1-bit function. As
there are two output bits the minimal size of the PLB is 2 lut.
Even if an OR gate would be enough, the Sout signal is computed by a XOR
gate (see Section 2.4).
As the inputs to the lut are the same, with the exception of the feedback
wires, there can be a single connection box to the routing network, which
halves the total size of the connection boxes. Fig. 3 shows the minimal
structure of the PLB, which allows 2-input gates with synchronization to be
implemented.
Figure 3: Minimal PLB for 4-phase, 2-input gates.
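To make the programming of such a lut concrete, the Python sketch below derives the 64 configuration bits of one output wire from Eq. 4. It is only a sketch under assumptions: the order of the address bits (x0, x1, y0, y1, Sin, feedback, least significant bit first) is hypothetical, the function name lut_contents is ours, and an input in the forbidden state (1, 1) is treated as "not valid", so that the output is held.

```python
from typing import Callable, List


def lut_contents(f: Callable[[int, int], int], rail: int) -> List[int]:
    """64-entry truth table of one output wire of the gate of Eq. 4
    (O0 if rail == 0, O1 if rail == 1)."""
    table = []
    for addr in range(64):
        # Hypothetical address bit order, LSB first.
        x0, x1, y0, y1, s_in, fb = ((addr >> k) & 1 for k in range(6))
        x_valid, y_valid = x0 ^ x1, y0 ^ y1        # exactly one wire at 1
        x_omega, y_omega = not (x0 or x1), not (y0 or y1)
        if x_valid and y_valid and s_in == 0:
            out = int(f(x1, y1) == rail)           # by Eq. 3, the value of x is x1
        elif x_omega and y_omega and s_in == 1:
            out = 0                                # return to Ω
        else:
            out = fb                               # memory effect: hold
        table.append(out)
    return table


and_o1 = lut_contents(lambda a, b: a & b, rail=1)
and_o0 = lut_contents(lambda a, b: a & b, rail=0)
print(and_o1[10], and_o0[10])  # x = '1', y = '1', Sin = 0: prints 1 0
```

Address 10 corresponds to x1 = y1 = 1 with all other bits at 0, so the AND gate drives O1 to 1 and leaves O0 at 0.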
Remark 2 In Fig. 3, each wire of x and y is loaded with exactly the same
number of inputs, as is necessary to achieve the indiscernibility of signals for
a malevolent observer.
3.3 1-out-of-2, 3-input Gates
Eq. 4 can be immediately modified into Eq. 5 to add a third input term z and
the new equation shows that we need a 7-input lut with one feedback.
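\[
\begin{aligned}
O_1 &=
\begin{cases}
f_1(x, y, z) & \text{if } (x \neq \Omega) \wedge (y \neq \Omega) \wedge (z \neq \Omega) \wedge (S_{in} = 0),\\
0 & \text{if } (x = \Omega) \wedge (y = \Omega) \wedge (z = \Omega) \wedge (S_{in} = 1),\\
O_1 & \text{otherwise,}
\end{cases}\\
O_0 &=
\begin{cases}
f_0(x, y, z) & \text{if } (x \neq \Omega) \wedge (y \neq \Omega) \wedge (z \neq \Omega) \wedge (S_{in} = 0),\\
0 & \text{if } (x = \Omega) \wedge (y = \Omega) \wedge (z = \Omega) \wedge (S_{in} = 1),\\
O_0 & \text{otherwise,}
\end{cases}\\
S_{out} &= O_0 \oplus O_1.
\end{aligned}
\tag{5}
\]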
As the 3-input gates need 6 inputs for a 3-variable function, they cannot be
implemented in the structure of Fig. 3, in which each 6 → 1 lut has 5 inputs
from the routing network and 1 feedback input.
As it is not realistic to use two 7 → 1 lut because of the number of
programming points (2×128 bits), we separate the rendez-vous + computation
function from the memory function and introduce a specific component: the
memory point.
Fig. 4 depicts the memory point, which consists of a pair of C-elements,
together with a XOR gate, which computes the Sout signal. Two MUX, under
control of a single programming point, allow the C-elements to be bypassed.
This bypass will be useful when implementing the 2-phase protocols.
Fig. 5 depicts the schematic of the 3-input 1-out-of-2 gate. The ancillary
“return to Ω” function is implemented by a specialized 6-input OR gate while
the 6 → 1 lut are programmed to compute the rendez-vous and the functions
f0(x, y, z) and f1(x, y, z).
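The behaviour of this gate can be sketched in Python. The model below is our reading of the structure, consistent with the description given in Section 4.2.4: each C-element of the memory point receives the output of one lut and the output of the 6-input OR gate. All six input wires carry data, so no Sin is modelled, and the class and helper names are ours.

```python
from typing import Callable, Sequence, Tuple

Signal = Tuple[int, int]  # dual-rail signal (x0, x1), encoded as in Eq. 3


def c_element(z: int, inputs: Sequence[int]) -> int:
    """Rendez-vous of Eq. 1."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return z


class ThreeInputGate:
    """Behavioural model: two lut, one 6-input OR, one memory point."""

    def __init__(self, f: Callable[[int, int, int], int]) -> None:
        self.f = f
        self.o = [0, 0]  # (O0, O1), reset to Ω

    def step(self, x: Signal, y: Signal, z: Signal) -> Tuple[Tuple[int, int], int]:
        wires = (*x, *y, *z)
        or6 = int(any(wires))  # 0 only when every input is Ω
        if all(sum(s) == 1 for s in (x, y, z)):
            v = self.f(x[1], y[1], z[1])  # rendez-vous reached: compute
            lut = [int(v == 0), int(v == 1)]
        else:
            lut = [0, 0]  # at least one input is still Ω
        self.o = [c_element(self.o[i], (lut[i], or6)) for i in (0, 1)]
        s_out = self.o[0] ^ self.o[1]
        return (self.o[0], self.o[1]), s_out


ONE, ZERO, OMEGA = (0, 1), (1, 0), (0, 0)
gate = ThreeInputGate(lambda a, b, c: a ^ b ^ c)  # sum bit of a full adder
print(gate.step(ONE, ONE, ONE))        # ((0, 1), 1)
print(gate.step(OMEGA, ONE, ONE))      # ((0, 1), 1): output held
print(gate.step(OMEGA, OMEGA, OMEGA))  # ((0, 0), 0): back to Ω
```

The output returns to Ω only once all three inputs have returned to Ω, which is the rendez-vous behaviour required of a secured gate.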
Figure 4: Memory Point.
Figure 5: Binary 3-input gate with 4-phase protocol.
Figure 6: Structure of PLB needed to implement a ternary 2-input gate.
Remark 3 Note that this is a much better use of the lut than the one implied by
Fig. 3, in which all bits corresponding to the feedback input set to 1 are filled
with ’1’ to implement the inclusive OR of all 4 input bits.
The 4-lut PLB can implement two independent 3-input, 1-out-of-2
functions, for example a full adder.
Remark 4 The wiring depicted by Fig. 5 can handle any gate the inputs of
which sum up to 6 wires (for example: one Sin + one 1-out-of-2 input + one
1-out-of-3 input; two Sin + two 1-out-of-2 inputs; etc.).
Remark 5 The feed-back and the associated MUX at the inputs of lut could
be removed. However they will be useful later for the implementation of the
2-phase-ledr protocol.
3.4 1-out-of-3, 2-input Gates
Just like the 1-out-of-2, 3-input gates, the 1-out-of-3, 2-input gates need 6
inputs but they need three outputs, each of them equipped with a memory point.
Strictly speaking, a 1-out-of-3 gate needs three lut, each of them implementing
one of the functions Oi = fi(x, y), i = 0, 1, 2.
However as most of the gates in a design will still be binary, the PLB
features four 6 → 1 lut. One of them will remain unused and filled with 0 when
implementing a 1-out-of-3 gate. The computation of the Sout signal needs some
specialized hardware. Fig. 6 depicts the new PLB needed for a 2-input
1-out-of-3 gate, with the supplementary devices in a grey rectangle:
• a MUX, controlled by a programming point, which allows the PLB to be used
either as two separate gates, each with 2 binary inputs and a binary output,
or as a single combined gate and
• a single XOR gate which computes the XOR of all four outputs of the
memory points.
For the same reason of compatibility with the binary gates, the inputs to
the pairs of lut are split into two groups. The load to each of the 12 input
wires is exactly the same, thus equalizing the power consumptions of all possible
transitions on inputs.
Remark 6 The OR gates which compute the “return to Ω” signal are not
grouped but will output the same value, as their inputs are the same.
3.5 Conclusion as for the 4-Phase Protocol
In order to implement 2- and 3-input gates under the 4-phase protocol, the
PLB must at least consist of four 6 → 1 lut, named L0, L1, L2 and L3. One
input of each lut can be replaced with a feedback signal equal to the output
pin.
The schematic depicted in Fig. 6 is general: it can implement any gate
with:
• inputs consisting of any combination of 6 wires or less, including the Sin
signals, and
• outputs consisting of any combination of 4 wires, not counting the Sout
signals: 2 binary outputs, with separate Sout signals, 1 ternary output
with a single acknowledge-out signal or 1 quaternary output with an Sout
signal.
4 2-Phase Protocols
4.1 Phase of a Signal
Under the 2-phase protocols valid values of a signal are not separated by “Ω”
markers. However, as the arrival of a new value (possibly identical to the
preceding one) is indicated by the toggling of exactly one wire, the parity of the
Hamming weight of the wires which represent a signal toggles at each new data.
In the following pages, the phase of the signal X, denoted “φ(X)”, is by
definition the parity of the Hamming weight of the wires representing X.
Remark 7 For Acknowledge signals, which consist of a single wire, the phase
is equal to the value of the wire itself. The name of an Acknowledge signal A
will thus be used instead of φ(A).
Figure 7: Transmission of a Signal and the acknowledge under the 2-phase-ledr
protocol.
At the beginning of the computation, all wires are set to a known value.
2-phase protocols require that, after initialization and before any computation
is started, the parities of all signals be the same, say even. A simple way of
ensuring this even parity is to initialize all wires to 0.
As the phase of a signal toggles with every new valid value, a given gate is
ready to compute its output when the phases of all “data” signals at its inputs
are the same and different from the current phase of the output, and when the
phase of the Sin signal, if present, is the same as the current output phase.
After the gate has performed its computation, the phase of its outputs
becomes the common one of the data inputs and thus the Sout signal toggles.
4.2 2-Phase, LEDR Protocol
This protocol is referred to as “level-encoded dual-rail”, or LEDR [13].
4.2.1 Transmission of a Signal
Fig. 7 shows the transmission protocol of the successive values of a signal,
together with the acknowledge signal. One can see that:
• a signal X is represented by two wires: the “data wire”: (Xd) and the
“repeat” wire: (Xr);
• each time a value is sent, exactly one wire toggles;
• the value of the signal X is the value of the Xd wire, thus the arrival of
a new value, different from the preceding one, is signalled by the toggling
of Xd;
• the arrival of a new value, identical to the preceding one, is signalled
by Xr toggling; thus the instantaneous value of Xr is irrelevant, only its
toggles are significant.
Remark 8 The 2-phase-ledr protocol is restricted to binary signals. Otherwise,
the transition between two values would imply that more than a single wire toggle.
Figure 8: 2-input Gate under the 2-phase-ledr protocol.
4.2.2 Binary 2-input Gates
Let f(x, y) : F2 × F2 → F2 be a two-variable Boolean function. The inputs are
represented by 4 wires: xd, xr, yd and yr, to which a synchronization signal,
Sin, may be added, and the output signal O is represented by two wires: Od and
Or, together with an acknowledge output Sout.
The equations of the output wires are:
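\[
\begin{aligned}
O_d &=
\begin{cases}
f(x_d, y_d) & \text{if } (\varphi(x) = 0) \wedge (\varphi(y) = 0) \wedge (S_{in} = 1),\\
f(x_d, y_d) & \text{if } (\varphi(x) = 1) \wedge (\varphi(y) = 1) \wedge (S_{in} = 0),\\
O_d & \text{otherwise,}
\end{cases}\\
O_r &=
\begin{cases}
f(x_d, y_d) & \text{if } (\varphi(x) = 0) \wedge (\varphi(y) = 0) \wedge (S_{in} = 1),\\
f(x_d, y_d) & \text{if } (\varphi(x) = 1) \wedge (\varphi(y) = 1) \wedge (S_{in} = 0),\\
O_r & \text{otherwise,}
\end{cases}\\
S_{out} &= O_d \oplus O_r.
\end{aligned}
\tag{6}
\]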
Eq. 6 shows that each of (Od, Or) is a function of 6 variables:
• two input data signals, represented by 4 wires,
• one Sin signal, represented by a single wire and
• one feed-back signal, also 1 wire.
These functions can be implemented in the same hardware as the corre-
sponding gate under the 4-phase protocol. Fig. 8 shows the assignment of the
wires.
The hardware elements which are not used to implement this gate are rep-
resented in dashed lines:
• the 6th input to the 6 → 1 lut, which is replaced by the feed-back,
• the 6-input OR gate,
• the memory element, which is programmed as “transparent” using its
internal programming point (See Fig. 4).
Note that, unlike in the case of the 4-phase protocol, here the Sout value
must be computed by a XOR gate.
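A behavioural Python sketch of this gate is given below. It uses the firing conditions of Eq. 6 and takes Od = f(xd, yd). For Or it makes an explicit assumption: Or is chosen so that the phase of the output becomes the common phase of the data inputs, as Section 4.1 requires, that is Or = Od ⊕ φ(x). The class and function names are ours.

```python
from typing import Callable, Tuple

Ledr = Tuple[int, int]  # (data wire, repeat wire)


def phase(s: Ledr) -> int:
    """φ(S): parity of the Hamming weight of the two wires."""
    return s[0] ^ s[1]


class LedrGate2:
    def __init__(self, f: Callable[[int, int], int]) -> None:
        self.f = f
        self.od, self.o_r = 0, 0  # reset: all wires at 0, even phase

    def step(self, x: Ledr, y: Ledr, s_in: int) -> Tuple[int, int, int]:
        px, py = phase(x), phase(y)
        # Firing conditions of Eq. 6: both data inputs in the same phase
        # and Sin in the opposite one.
        if px == py and s_in != px:
            self.od = self.f(x[0], y[0])
            self.o_r = self.od ^ px  # assumption: forces φ(O) = φ(x)
        s_out = self.od ^ self.o_r
        return self.od, self.o_r, s_out


gate = LedrGate2(lambda a, b: a & b)
print(gate.step((1, 0), (1, 0), s_in=0))  # (1, 0, 1): fires
print(gate.step((1, 1), (0, 0), s_in=0))  # (1, 0, 1): waits for the acknowledge
print(gate.step((1, 1), (0, 0), s_in=1))  # (0, 0, 0): fires
```

With this choice exactly one of Od and Or toggles each time the gate fires, and Sout toggles with it.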
4.2.3 3-input Gates
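Let f(x, y, z) : F2 × F2 × F2 → F2 be a three-variable Boolean function. Eq. 7
shows the expressions of the output wires.
\[
\begin{aligned}
O_d &=
\begin{cases}
f(x, y, z) & \text{if } (\varphi(x) = 1) \wedge (\varphi(y) = 1) \wedge (\varphi(z) = 1) \wedge (S_{in} = 0),\\
f(x, y, z) & \text{if } (\varphi(x) = 0) \wedge (\varphi(y) = 0) \wedge (\varphi(z) = 0) \wedge (S_{in} = 1),\\
O_d & \text{otherwise,}
\end{cases}\\
O_r &=
\begin{cases}
f(x, y, z) & \text{if } (\varphi(x) = 1) \wedge (\varphi(y) = 1) \wedge (\varphi(z) = 1) \wedge (S_{in} = 0),\\
f(x, y, z) & \text{if } (\varphi(x) = 0) \wedge (\varphi(y) = 0) \wedge (\varphi(z) = 0) \wedge (S_{in} = 1),\\
O_r & \text{otherwise,}
\end{cases}\\
S_{out} &= O_d \oplus O_r.
\end{aligned}
\tag{7}
\]
Eq. 7 shows that each of Od and Or is a function of 7 input variables and
thus cannot be implemented in a 6 → 1 lut.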
4.2.4 Practical Implementation
Under the 4-phase protocol the outputs were set back to 0 by the rendez-vous of
the 0 coming from the lut and the 0 coming from the 6-input OR gate. Under
the 2-phase protocol an OR gate cannot express the “return to 0” condition.
Therefore the wiring of Fig. 5 is modified according to Fig. 9.
Two MUX, controlled by a programming point, are added, which allow the
6-input OR gate to be replaced by the two other 6 → 1 lut of the PLB. This way,
each of Od and Or is now a rendez-vous of the outputs of 2 lut:
Od = rendez-vous(L0, L2)
Or = rendez-vous(L1, L3)
Figure 9: Implementation of the 3-input gate under the 2-phase protocol.
Eq. 8 shows the programming of lut L0 and L2 and Eq. 9 shows the
programming of lut L1 and L3.
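\[
\begin{aligned}
L_0 &=
\begin{cases}
f(x_d, y_d, z_d) & \text{if } (\varphi(x) = 0) \wedge (\varphi(y) = 0) \wedge (\varphi(z) = 0) \wedge (S_{in} = 1),\\
f(x_d, y_d, z_d) & \text{if } (\varphi(x) = 1) \wedge (\varphi(y) = 1) \wedge (\varphi(z) = 1) \wedge (S_{in} = 0),\\
0 & \text{otherwise,}
\end{cases}\\
L_2 &=
\begin{cases}
f(x_d, y_d, z_d) & \text{if } (\varphi(x) = 0) \wedge (\varphi(y) = 0) \wedge (\varphi(z) = 0) \wedge (S_{in} = 1),\\
f(x_d, y_d, z_d) & \text{if } (\varphi(x) = 1) \wedge (\varphi(y) = 1) \wedge (\varphi(z) = 1) \wedge (S_{in} = 0),\\
1 & \text{otherwise.}
\end{cases}
\end{aligned}
\tag{8}
\]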
When the conditions for a transition are fulfilled, L0 and L2 have the same
value. Thus the rendez-vous occurs and Od takes its new value. Otherwise
L0 = 0 and L2 = 1, the C-element within the memory element has different
values on its inputs and Od is locked.
\[
\begin{aligned}
L_1 &=
\begin{cases}
f(x_d, y_d, z_d) & \text{if } (\varphi(x) = 0) \wedge (\varphi(y) = 0) \wedge (\varphi(z) = 0) \wedge (S_{in} = 1),\\
f(x_d, y_d, z_d) & \text{if } (\varphi(x) = 1) \wedge (\varphi(y) = 1) \wedge (\varphi(z) = 1) \wedge (S_{in} = 0),\\
0 & \text{otherwise,}
\end{cases}\\
L_3 &=
\begin{cases}
f(x_d, y_d, z_d) & \text{if } (\varphi(x) = 0) \wedge (\varphi(y) = 0) \wedge (\varphi(z) = 0) \wedge (S_{in} = 1),\\
f(x_d, y_d, z_d) & \text{if } (\varphi(x) = 1) \wedge (\varphi(y) = 1) \wedge (\varphi(z) = 1) \wedge (S_{in} = 0),\\
1 & \text{otherwise.}
\end{cases}
\end{aligned}
\tag{9}
\]
Mutatis mutandis the same demonstration shows the validity of Or.
Figure 10: Global Structure of a 2-input 2-Phase-Edge gate.
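The locking mechanism obtained by pairing two lut through a C-element can be illustrated with a few lines of Python. In this sketch the firing condition and the new wire value are given directly as inputs rather than computed from the phases, and the names lut_pair and trace are ours.

```python
from typing import Iterable, List, Tuple


def c_element(z: int, a: int, b: int) -> int:
    """Two-input rendez-vous: follow the inputs when they agree, else hold."""
    return a if a == b else z


def lut_pair(fire: bool, value: int) -> Tuple[int, int]:
    """Outputs of the two lut driving one memory point (Eq. 8 or Eq. 9):
    both give the new wire value when the firing condition holds,
    otherwise the first gives 0 and the second gives 1."""
    return (value, value) if fire else (0, 1)


def trace(events: Iterable[Tuple[bool, int]], od: int = 0) -> List[int]:
    """Successive values of the output wire for a list of (fire, value)."""
    history = []
    for fire, value in events:
        l0, l2 = lut_pair(fire, value)
        od = c_element(od, l0, l2)
        history.append(od)
    return history


print(trace([(True, 1), (False, 0), (True, 0), (False, 1)]))  # [1, 1, 0, 0]
```

When the condition does not hold, the two lut disagree and the C-element keeps the previous value, whatever the data inputs are.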
4.2.5 Conclusion on the 2-Phase, LEDR Protocol
Apart from the shaded area in Fig. 9, the 2-phase-ledr protocol needs the same
resources as the 4-phase protocol.
As for security, all inputs to the gates have an equal load but the value of a
signal X is directly the value of a single wire, xd. This is a potential security
risk, which will have to be investigated as soon as the ICs have been delivered.
4.3 2-Phase, Edge Protocol
Signals under the 2-phase-edge protocol can take an arbitrary number of values.
Binary signals are represented by 2 wires, ternary signals are represented by 3
wires, etc. However the complexity of the gates is quadratic in the number of
wires per signal. Thus the use of this protocol is in practice limited to binary
signals. The complexity of the gates is also quadratic in the number of inputs.
Again this limits in practice the number of inputs to 2. In the sequel signals are
binary and a signal X is thus represented by 2 wires: (x0, x1).
The coding of the signals relies exclusively on the toggling of wires: the
instantaneous values of the wires are always irrelevant. This means that the
current state of 4 wires has to be stored. Thus even for a 2-input gate all four
lut of the PLB will have to be used.
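The toggling-based coding can be made concrete with a short Python sketch.
It is an illustration only: the function names encode and decode are
hypothetical, and the initial wire state (0, 0) is an assumption. A bit v is
sent by toggling wire xv, and the receiver recovers it by comparing the new
wire state with the stored previous one.

```python
def encode(bits, wires=(0, 0)):
    """Return the successive (x0, x1) states when sending bits.

    Sending value v toggles wire x_v; the other wire is left untouched.
    """
    state = list(wires)
    trace = []
    for v in bits:
        state[v] ^= 1
        trace.append(tuple(state))
    return trace


def decode(trace, wires=(0, 0)):
    """Recover the bits from successive wire states.

    Only the wire that changed matters, never its absolute level, so the
    previous state must be remembered.
    """
    prev = tuple(wires)
    bits = []
    for state in trace:
        changed = [k for k in (0, 1) if state[k] != prev[k]]
        if len(changed) != 1:
            raise ValueError("exactly one wire must toggle per data item")
        bits.append(changed[0])
        prev = state
    return bits
```

For the sequence 1, 1, 0, 1 the value ‘1’ is carried first by a rising and
then by a falling edge of x1, which shows why the level of a wire alone
carries no information.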
Figure 11: 2×2-dw.
4.3.1 Structure of a Gate
The global structure of a 2-input gate under the 2-phase-edge protocol is
depicted in Fig. 10. The operation of the gate is divided into three steps:
Detection: waits for an edge on Ai and one on Bj and toggles the
corresponding Ci,j,
Computation: toggles If(i,j), and
Synchronization: toggles Of(i,j) and Sout if and only if Sin has toggled since
the last data output.
Detection: 2x2-decision wait The detection and the decoding of the input
data are performed by the circuitry known as the “2x2-decision wait” or,
shorter, the “2×2-dw”. The circuitry, shown in Fig. 11, works as follows:
1. assume an initial state such that, for each C-element, the inputs are equal,
(as this is the initial state, with all wires set to 0, the recurrence can start),
2. after an input value i ∈ {0, 1} has arrived on input port A and an input
value j ∈ {0, 1} on input port B, Ai and Bj have toggled (double-thickness
continuous lines)
3. at this point:
- one input to Ci,1−j has toggled ⇒ Ci,1−j is unchanged,
- one input to C1−i,j has toggled ⇒ C1−i,j is unchanged,
- both inputs to Ci,j have toggled ⇒ Ci,j toggles,
4. the new value of Ci,j is sent to the next stage and to the appropriate
XOR gates to cancel the unwanted toggling at the inputs of Ci,1−j and C1−i,j
(double-thickness dashed lines),
5. all four C-elements now have their inputs identical, which was the initial
situation, and Ci,j has toggled, indicating to the next stage that:
• both input ports A and B have received new data,
• the data just arrived on A was i and
• the data just arrived on B was j.
Each of the Ci,j can be expressed as:
Ci,j = rendez-vous(Ai ⊕ Ci,1−j, Bj ⊕ C1−i,j) (10)
Each of the Ci,j is a 5-term expression depending on:
• three feedback lines: Ci,j (itself), Ci,1−j and C1−i,j and
• two input lines: Ai and Bj.
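The behaviour of the decision-wait can be checked with a small Python
simulation of Eq. 10. This is a sketch under assumptions, not the actual
circuit: the class name DecisionWait2x2 and its methods are hypothetical,
all wires start at 0, and the four C-elements are updated synchronously
until a fixed point is reached, which ignores real gate delays.

```python
from itertools import product


def c_element(a, b, prev):
    """Rendez-vous: copy the inputs when they agree, otherwise hold."""
    return a if a == b else prev


class DecisionWait2x2:
    def __init__(self):
        self.A = [0, 0]                      # wires A0, A1
        self.B = [0, 0]                      # wires B0, B1
        self.C = {ij: 0 for ij in product((0, 1), repeat=2)}

    def _settle(self):
        """Iterate Eq. 10 on all four C-elements until nothing changes."""
        while True:
            new = {}
            for (i, j), prev in self.C.items():
                x = self.A[i] ^ self.C[(i, 1 - j)]
                y = self.B[j] ^ self.C[(1 - i, j)]
                new[(i, j)] = c_element(x, y, prev)
            if new == self.C:
                return
            self.C = new

    def send(self, i, j):
        """Value i arrives on port A and value j on port B.

        Returns the list of C-elements whose output toggled.
        """
        before = dict(self.C)
        self.A[i] ^= 1
        self.B[j] ^= 1
        self._settle()
        return [k for k in self.C if self.C[k] != before[k]]
```

In this model each call to send(i, j) toggles exactly Ci,j, whatever the
history of previous data, which matches steps 1 to 5 above.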
Though the expression would fit in a 6 ↦ 1 lut, the feedback from one
lut to the other would have to be routed through the general routing network,
which has the following drawbacks:
• it consumes routing resources,
• the timings of the feedbacks will differ between the feedback of a
lut to itself (which is routed inside the PLB) and the others, which are
routed outside. This could be an attack point;
• the 2-input gate will always need 2 PLB: one for the 2×2-dw and one
for the computation itself.
Therefore, these feedbacks have been added to the PLB, as shown in Fig.
12. As in the preceding figures, the black triangles at the inputs of the lut
are MUX controlled by programming points, which are denoted by “[./.]” in Eq.
11.
With these notations the equations of the four 6 ↦ 1 lut are:
\[
\begin{aligned}
L_0 &= \mathrm{lut}\left([I'_0/L_0],\ [I'_1/L_1],\ [I'_2/L_2],\ [I'_3/L_3],\ I'_4,\ I'_5\right)\\
L_1 &= \mathrm{lut}\left([I'_0/L_0],\ [I'_1/L_1],\ [I'_2/L_2],\ [I'_3/L_3],\ I'_4,\ I'_5\right)\\
L_2 &= \mathrm{lut}\left([I''_0/L_0],\ [I''_1/L_1],\ [I''_2/L_2],\ [I''_3/L_3],\ I''_4,\ I''_5\right)\\
L_3 &= \mathrm{lut}\left([I''_0/L_0],\ [I''_1/L_1],\ [I''_2/L_2],\ [I''_3/L_3],\ I''_4,\ I''_5\right)
\end{aligned}
\tag{11}
\]
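To implement the 2×2-dw the input lines are assigned as in Eq. 12 and
depicted by Fig. 13:
\[
\begin{array}{llllll}
I'_0 \to \mathrm{NC} & I'_1 \to \mathrm{NC} & I'_2 \to B_1 & I'_3 \to B_0 & I'_4 \to A_0 & I'_5 \to A_1\\
I''_0 \to B_1 & I''_1 \to B_0 & I''_2 \to \mathrm{NC} & I''_3 \to \mathrm{NC} & I''_4 \to A_0 & I''_5 \to A_1
\end{array}
\tag{12}
\]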
in which “NC” means “not connected” and Eq. 13 shows the interconnection
of the feedbacks needed to implement the 2×2-dw.
\[
\begin{aligned}
L_0 = C_{0,0} &= \mathrm{lut}(C_{0,0},\ C_{0,1},\ C_{1,0},\ B_0,\ A_0,\ A_1)\\
L_1 = C_{0,1} &= \mathrm{lut}(C_{0,0},\ C_{0,1},\ B_1,\ C_{1,1},\ A_0,\ A_1)\\
L_2 = C_{1,0} &= \mathrm{lut}(C_{0,0},\ B_0,\ C_{1,0},\ C_{1,1},\ A_0,\ A_1)\\
L_3 = C_{1,1} &= \mathrm{lut}(B_1,\ C_{0,1},\ C_{1,0},\ C_{1,1},\ A_0,\ A_1)
\end{aligned}
\tag{13}
\]
Figure 12: PLB with all feedbacks for the 2-phase-edge protocol.
Figure 13: Wiring used to implement the 2×2-decision-wait.
Figure 14: 2×1-decision-wait.
Remark 9 A1 is useless to compute C0,0 and C0,1 and A0 is useless to compute
C1,0 and C1,1. The reason why these inputs are connected to the network but
ignored in the programming of the lut is that B0 and B1 are connected twice
from the network to the PLB and that the loads on this network must be identical
for both variables.
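As an illustration of Remark 9, the contents of lut L0 can be enumerated in
Python from Eq. 10 and Eq. 13. This is a sketch under assumptions: the names
c00_next and lut_contents are hypothetical, the argument order follows the
first line of Eq. 13, and the ordering of the 64 configuration bits is an
arbitrary choice that need not match the real bitstream format.

```python
from itertools import product


def c00_next(c00, c01, c10, b0, a0, a1):
    """Next value of C0,0 = rendez-vous(A0 xor C0,1, B0 xor C1,0).

    a1 is wired to the lut but deliberately ignored (Remark 9).
    """
    x = a0 ^ c01
    y = b0 ^ c10
    return x if x == y else c00       # hold the fed-back value otherwise


def lut_contents(fn, n_inputs=6):
    """Truth table of fn over all input combinations, first input as MSB."""
    return [fn(*bits) for bits in product((0, 1), repeat=n_inputs)]


table = lut_contents(c00_next)
```

The table has 64 entries and is identical for A1 = 0 and A1 = 1, so A1 loads
the routing network without influencing the output.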
Computation & synchronization The 2×2-dw stage provides a decoded
output: Ci,j toggles if data i and j have arrived on inputs A and B
respectively.
Computing the outputs is then straightforward: each of the O1 and O0 outputs
is the XOR of the relevant Ci,j. Some examples:
Gate   O1                     O0
AND    C1,1                   C0,0 ⊕ C0,1 ⊕ C1,0
NAND   C0,0 ⊕ C0,1 ⊕ C1,0     C1,1
OR     C1,1 ⊕ C0,1 ⊕ C1,0     C0,0
NOR    C0,0                   C1,1 ⊕ C0,1 ⊕ C1,0
XOR    C0,1 ⊕ C1,0            C0,0 ⊕ C1,1
NXOR   C0,0 ⊕ C1,1            C0,1 ⊕ C1,0
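The rows of this table follow one rule: O1 is the XOR of the Ci,j for which
the gate evaluates to 1, and O0 the XOR of those for which it evaluates to
0. A minimal Python sketch of that rule is given here; the names GATES,
output_terms and outputs are hypothetical and only serve the illustration.

```python
from itertools import product

# Boolean functions of the gates listed in the table.
GATES = {
    "AND": lambda a, b: a & b,
    "NAND": lambda a, b: 1 - (a & b),
    "OR": lambda a, b: a | b,
    "NOR": lambda a, b: 1 - (a | b),
    "XOR": lambda a, b: a ^ b,
    "NXOR": lambda a, b: 1 - (a ^ b),
}


def output_terms(gate):
    """Split the four (i, j) pairs into the terms of O1 and of O0."""
    pairs = list(product((0, 1), repeat=2))
    o1 = [p for p in pairs if gate(*p) == 1]
    o0 = [p for p in pairs if gate(*p) == 0]
    return o1, o0


def outputs(gate, c):
    """Compute (O1, O0) from the current values c[(i, j)] of the Ci,j."""
    o1_terms, o0_terms = output_terms(gate)
    o1 = 0
    for p in o1_terms:
        o1 ^= c[p]
    o0 = 0
    for p in o0_terms:
        o0 ^= c[p]
    return o1, o0
```

For example, output_terms(GATES["AND"]) gives C1,1 for O1 and C0,0, C0,1,
C1,0 for O0, as in the first row of the table.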
Synchronization The synchronization is performed by a device called
“2×1-decision-wait” (or, shorter: 2×1-dw). Fig. 14 depicts the schematic of the
2×1-dw.
The 2×1-dw works as follows:
1. In the initial state, the following relations hold: O1 = I1, O0 = I0 and
Sin = ¬(O0 ⊕ O1), which imply J1 ≠ I1 and J0 ≠ I0 (because J1 =
Sin ⊕ O0 = ¬(O0 ⊕ O1) ⊕ O0 = ¬O1 = ¬I1, idem for J0);
2. Assume Ii toggles and thus becomes equal to Ji: the C-element transmits
the common value of its inputs to Oi,
3. as Oi toggles, J1−i toggles too and becomes equal to O1−i.
4. until Sin toggles, we have I0 = J0 and I1 = J1: even if one of the inputs
toggles, the C-elements will remain stable;
5. when Sin toggles, J0 and J1 toggle and the system is back in the initial
state.
Figure 15: Wiring used to implement the 2×1-decision-wait.
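The five steps of the 2×1-dw can be replayed with a short Python model. It
is a sketch under assumptions: the class name DecisionWait2x1 and its
methods are hypothetical, J1 = Sin ⊕ O0 and J0 = Sin ⊕ O1 are taken from the
text, and the initial value Sin = 1 is assumed so that Ji differs from Ii
when all other wires are 0. Real gate delays are ignored.

```python
class DecisionWait2x1:
    def __init__(self):
        self.I = [0, 0]       # inputs I0, I1 from the computation stage
        self.O = [0, 0]       # outputs O0, O1 of the two C-elements
        self.s_in = 1         # assumed initial acknowledge value

    def _j(self, i):
        """J_i = Sin xor O_(1-i), the second input of C-element i."""
        return self.s_in ^ self.O[1 - i]

    def _settle(self):
        """Let each C-element copy its inputs when they agree."""
        changed = True
        while changed:
            changed = False
            for i in (0, 1):
                j = self._j(i)
                if self.I[i] == j and self.O[i] != j:
                    self.O[i] = j
                    changed = True

    def toggle_input(self, i):
        """A new data item toggles input wire I_i."""
        self.I[i] ^= 1
        self._settle()

    def toggle_ack(self):
        """The next stage acknowledges by toggling Sin."""
        self.s_in ^= 1
        self._settle()
```

In this model a first toggle on I1 passes to O1 immediately, a second data
item arriving before the acknowledge is held back, and it is released as
soon as Sin toggles.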
If one wants to combine the computation stage with the 2×1-dw, it cannot
be done in 2 lut.
This is not because of the complexity of the functions: each of O1 and O0 is a
function of 2 feedbacks, 1 Sin and at most 3 Ci,j, at least if one does not want
to implement trivial, constant functions.
However, the set of 2 lut together would need 2 feedbacks, 1 Sin and 4
Ci,j, which is one more than the number of available wires. Thus we must use
a full PLB.
If we use the full PLB, the memory element will provide the necessary
C-element and the lut become purely combinatorial. The inputs will be assigned
following Eq. 14, as depicted in Fig. 15:
\[
(I'_0,\ I'_1,\ I'_2,\ I'_3) = (C_{0,0},\ C_{0,1},\ C_{1,0},\ C_{1,1})
\quad\text{and}\quad I''_4 = S_{in}
\tag{14}
\]
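Then the lut are programmed as in Eq. 15:
\[
\begin{aligned}
L_0 &= f^1(C_{0,0},\ C_{0,1},\ C_{1,0},\ C_{1,1})\\
L_1 &= f^0(C_{0,0},\ C_{0,1},\ C_{1,0},\ C_{1,1})\\
L_2 &= O_0 \oplus S_{in}\\
L_3 &= O_1 \oplus S_{in}
\end{aligned}
\tag{15}
\]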
Figure 16: FIFO memory for programming.
4.3.2 Conclusion on the 2-Phase, Edge Protocol
The 2-phase-edge protocol is difficult to implement in an FPGA without special
hardware added to the PLB: it takes two PLB to implement a single 2-input
gate.
However this protocol has advantages as for security because the instantaneous
value of the wires is not significant in itself. For instance ‘1’ is represented
alternately by the rising and the falling edge of a given wire. An attacker
trying DPA, for instance, would have to exhibit the difference between the
average consumption of both edges on wire ‘1’ and the same average on wire ‘0’.
5 Programming the FPGA
The FPGA can be partially programmed: it is divided into square blocks, each
of which can be programmed separately from the others.
The programming chain is a set of asynchronous FIFO memories. An elementary
stage of these FIFO is depicted in Fig. 16.
At RESET time, all C-elements are set to zero by a general RESET wire.
Then the programming bits are fed to the FIFO, separated by Ω values. The
last stage of each FIFO is particular: the Acknowledge signal is controlled by
an external pin. During the programming of the block, the Acknowledge signal
is held low. This way the programming bits are stacked in the FIFO and the
FPGA becomes functional.
If a partial reconfiguration is wanted, the chosen blocks are cleared by
allowing the Acknowledge signal of their last stage to acknowledge the value in
the last stage. Then the FIFO is activated again until all bits have gone
through it. At this point, the Acknowledge signal is blocked again and the FIFO
is ready to receive a new set of configuration bits.
During the configuration of the FPGA, all outputs of PLB are kept at 0 to
avoid short-circuits. The PLB are programmed first, while all switchboxes are
left in an isolation mode. Then the switchboxes are programmed to connect
the newly reconfigured part to the still working part. It is the
designer’s responsibility to ensure that the new part can create no conflict with
the existing part.
6 Conclusion
We have presented the programmable logic block of an asynchronous FPGA,
which is oriented towards security rather than performance. In particular we
have chosen not to implement one of the advantages of an asynchronous design,
which usually allows computing in average-case time: early evaluation. This
choice is deliberate, as early evaluation is a security risk [40].
The FPGA can accommodate various sizes of data as well as various styles of
asynchronous control, thus making it possible for the end user to design mixed
styles of logic, depending on the application requirements. Incidentally, this
FPGA is also a valuable prototype that allows comparisons between
styles of asynchronous protocols.
A silicon implementation is being manufactured and will be used for intensive
testing. The resistance of the various protocols against SCA will be evaluated.
In particular the strict link under the 2-phase-ledr protocol between the value
of a signal X and that of its xd wire will decide whether this protocol is
suitable at all for a secure implementation.
References
[1] Achronix semiconductors. http://www.achronix.com/.
[2] Elena Trichina & Antonio Bellaza. Implementation of elliptic curve cryp-
tography with built-in counter measures against side channel attacks. In
LNCS – CHES 2002, volume 2523, pages 98–113, 2002.
[3] K.v. Berkel, R. Burgress, J. Kessels, A. Peeters, M. Roncken, and F. Schalij.
A fully-asynchronous low-power error corrector for dcc player. IEEE Jour-
nal of Solid-State Circuits, 29:1429–1439, 1994.
[4] F. Bouesse, M. Renaudin, A. Witon, and F. Germain. A clock-less low-
voltage aes crypto-processor. In European Solid State Circuits Conference
(ESSCIRC 2005), pages 12–16, Grenoble, September 2005.
[5] Bert den Boer & Kerstin Lemke & Guntram Wicke. A dpa attack against
the modular reduction within a crt implementation of rsa. In LNCS –
CHES 2002, volume 2523, pages 228–243, 2002.
[6] Jean-F. Dhem, F. Koeune, P.-A. Leroux, P. Mestre, J.-
J. Quisquater, and J.-L. Willems. A practical implementa-
tion of the timing attack. In CARDIS, pages 167–182, 1998.
http://citeseer.nj.nec.com/dhem98practical.html.
[7] Laurent Fesquet, Bertrand Folco, Mathieu Steiner, and Marc Renaudin.
State-holding in look-up tables: application to asynchronous logic. In VLSI-
SoC 2006, pages 12–17, Nice, France, October 16-18 2006. IEEE.
[8] Laurent Fesquet, Jérôme Quartana, and Marc Renaudin. Asynchronous
systems on programmable logic. In Reconfigurable Communication-centric
SoCs (ReCoSoC), pages 105–112, Montpellier, France, June 27-29 2005.
[9] B. Gao. A globally asynchronous locally synchronous configurable array
architecture for algorithm embeddings. PhD thesis, University of Edinburgh,
December 1996.
[10] Jovan Dj. Golic. Multiplicative masking, power analysis of AES.
http://citeseer.nj.nec.com/529351.html.
[11] Sylvain Guilley, Philippe Hoogvorst, Yves Mathieu, and Renaud Pacalet.
The “backend duplication” method. In CHES-2005, volume 3659 of LNCS,
pages 383–397. Springer-Verlag, August 2005.
[12] G. Hachez, F. Koeune, and J. J. Quisquater. Timing attack: what can be
achieved by powerful adversary. In A. M. Barbé et al., editor, 20th Symp.
on Information Theory in the Benelux, pages 63–70, Haasrode (B), 27-
28 1999. Werkgemeenschap Informatie- en Communicatietheorie, Enschede
(NL). http://citeseer.nj.nec.com/hachez99timing.html.
[13] Daniel H. Linder & James C. Harden. Phased logic: Supporting the
synchronous design paradigm with delay-insensitive circuitry. IEEE
Transactions on Computers, 45(9):1031–1044, September 1996.
[14] Scott Hauck, Gaetano Borriello, and Carl Ebeling. Montage: An fpga for
synchronous and asynchronous circuits. In 2nd International Workshop on
Field-Programmable Logic and Applications, Vienna, August 1992.
[15] Quoc Thai Ho, J.-B. Rigaud, L. Fesquet, M. Renaudin, and R. Rolland.
Implementing asynchronous circuits on lut based fpgas. In 12th Interna-
tional Conference on Field Programmable Logic and Applications (FPL),
pages 36–46, Montpellier (La Grande-Motte), France, September 2002.
[16] Philippe Hoogvorst, Sylvain Guilley, Sumanta Chaudhuri, Jean-Luc Dan-
ger, Alin Razafindraibe, Taha Beyrouthy, Laurent Fesquet, and Marc Re-
naudin. A Reconfigurable Cell for a Multi-Style Asynchronous FPGA.
pages 15–22, June 2007. ReCoSoC, Montpellier, France.
[17] N. Huot, H. Dubreuil, L. Fesquet, and M. Renaudin. Fpga architecture for
multi-style asynchronous logic. In Design Automation and Test in Europe
(DATE), pages 32–33, München, Germany, March 7-11 2005.
[18] K. Itoh, T. Izu, and M. Takenaka. Address-bit differential power analysis
on cryptographic schemes ok-ecdh, ok-ecdsa. In LNCS, pages 129–143,
2002.
[19] Jean-Sébastien Coron and Louis Goubin. On Boolean, arithmetic
masking against differential power analysis. LNCS, 1965:231–??, 2001.
http://citeseer.nj.nec.com/coron00boolean.html.
[20] Kouichi Itoh & jun Yajima & Masahiko Takenaka & Naoya Torii. Dpa
countermeasures by improving the window method. In LNCS – CHES
2002, volume 2523, pages 303–317, 2002.
[21] J. Kessels and P. Marston. Designing asynchronous standby circuits for a
low-power pager. In Third International Symposium on Advanced Research
in Asynchronous Circuits and Systems, pages 257–267, 1997.
[22] P. Kocher, J. Jaffe, and B. Jun. Timing attacks on implementations of
diffie-hellman, rsa , dsa , and other systems. In Proceedings of CRYPTO’96,
volume 1109 of LNCS, pages 104–113. Springer, 1996.
[23] Paul C. Kocher, Joshua Jaffe, and Benjamin Jun. Timing attacks on
implementations of diffie-hellman, rsa, dss, other systems. LNCS, 1109:104–
113, 1996. http://citeseer.nj.nec.com/kocher96timing.html.
[24] M. Renaudin L. Fesquet. A programmable logic architecture for prototyp-
ing clockless circuits. In Field Programmable Logic (FPL), pages 293–298,
Tampere, Finland, August 24-26 2005.
[25] P.-Y. Liardet and N. P. Smart. Preventing spa/dpa in ecc systems using
the jacobi form. In LNCS – CHES 2001, volume 2162, pages 391–401, 2001.
[26] Jacques Patarin Louis Goubin. Des, differential power analysis - the Du-
plication method. http://citeseer.nj.nec.com/364562.html.
[27] Kapilan Makeswaran and Venkatesh Akella. Pga-stc : Programmable gate
array for implementing self-timed circuits. International Journal of
Electronics, 84:255–267, 1998.
[28] A.J. Martin, S.M Burns, T.K. Lee, D. Borkovic, and P.J. Hazewindus.
The first asynchronous microprocessor: The test results. In Computer
Architecture News, volume 17, pages 95–98, 1989.
[29] A.J. Martin, S.M. Burns, T.K. Lee, D. Borkovic, and P.J.Hazewindus.
Advanced Research in VLSI, chapter The design of an asynchronous micro-
processor, pages 351–373. MIT Press, 1989.
[30] S. Moore, R. Anderson, P. Cunningham, R. Mullins, and G. Taylor.
Improving smart card security using self-timed circuits. In ASYNC’02, pages
211–218, April 2002. Manchester, United Kingdom.
[31] L.S. Nielsen, C. Niessen, and C.H. van Berkel. Low-power operation us-
ing self-timed circuits and adaptive scaling of the supply voltage. IEEE
Transactions on VLSI Systems, 2:391–397, 1994.
[32] B. Jun P. Kocher, J. Jaffe. Differential power analysis: Leaking secrets.
In Proceedings of CRYPTO’99, volume 1666 of LNCS, pages 388–397.
Springer-Verlag, 1999.
[33] Robert Payne. Self Timed Field Programmable Gate Array Architectures.
PhD thesis, University of Edinburgh, 1997.
[34] Thomas Popp and Stefan Mangard. Masked dual-rail pre-charge logic:
Dpa-resistance without routing constraints. In CHES, volume 3659 of
LNCS, pages 172–186, 2005.
[35] M. Renaudin. Asynchronous circuits and systems: a promising design al-
ternative. Microelectronic Engineering, 54:133–149, 2000.
[36] M. Renaudin, P. Vivet, and F. Robin. Aspro-216: a standard cell q.d.i. 16-
bit risc asynchronous microprocessor. In IEEE, editor, Proc. of the Fourth
International Symposium on Advanced Research in Asynchronous Circuits
and Systems, pages 22–31, 1998.
[37] Jean-Baptiste Rigaud. Spécification de Bibliothèques pour la Synthèse de
Circuits Asynchrones. PhD thesis, http://www.inpg.fr/, December 2002.
[38] Werner Schindler. A timing attack against rsa with the chinese remainder
theorem. In LNCS – CHES 2000, volume 1965, pages 109–124, 2000.
[39] Jens Sparsø and Steve Furber, editors. Principles of Asynchronous Circuit
Design: A Systems Perspective. Kluwer Academic Publishers, Boston /
Dordrecht / London, 2001.
[40] Daisuke Suzuki and Minoru Saeki. Security Evaluation of DPA Counter-
measures Using Dual-Rail Pre-charge Logic Style. In CHES, volume 4249 of
LNCS, pages 255–269, 2006. http://dx.doi.org/10.1007/11894063_21.
[41] Taha Beyrouthy and Alin Razafindraibe and Laurent Fesquet and Marc
Renaudin and Sumanta Chaudhuri and Sylvain Guilley and Philippe
Hoogvorst and Jean-Luc Danger. A Novel Asynchronous e-FPGA Architec-
ture for Security Applications. Dec 2007. FPT’07, Kokurakita, Kitakyushu,
Japan.
[42] Kouichi Itoh & Tetsuya Uzu & Masahiko Takenaka. Address-bit differential
power analysis of cryptographic schemes ok-ecdh, ok-ecdsa. In LNCS –
CHES 2002, volume 2523, pages 129–143, 2002.
[43] John Teifel and Rajit Manohar. Programmable asynchronous pipeline
arrays. In International Workshop on Field-Programmable Logic and
Applications, Lisbon, Portugal, September 2003.
[44] John Teifel and Rajit Manohar. An asynchronous dataflow fpga architec-
ture. IEEE Transactions on Computers, 53(11):1376–1392, 2004.
[45] John Teifel and Rajit Manohar. Highly pipelined asynchronous fpgas. In
12th ACM International Symposium on Field-Programmable Gate Arrays,
Monterey, CA, February 2004.
[46] Tiri, Hwang, Lai, and Yang Schaumont Verbauwhede. Prototype ic with
wddl, differential routing – dpa resistance assessment. In J.R. Rao and
B. Sunar, editors, LNCS – CHES 2005, volume LNCS 3659, pages 354–365, 2005.
[47] Marc Joye & Christophe Tymen. Protections against differential analysis
for elliptic curve cryptography —, algebraic approach —. In LNCS – CHES
2001, volume 2162, pages 377–390, 2001.
[48] T. Verhoeff. Delay-insensitive codes – an overview. Distrib. Comput., 3:1–8, 1988.
[49] J. Quisquater W. Schindler, F. Koeune. Unleashing the full power of timing
attack, 2001.
