Abstract-The main challenge when implementing cryptographic algorithms in hardware is to protect them against attacks that directly target the device. Malevolent adversaries customarily employ two strategies: observation attacks and differential perturbation attacks, respectively called SCA and DFA in the abundant scientific literature on this topic. Numerous research efforts have been devoted to defeating either SCA or DFA. However, few publications deal with concomitant protection against both threats. The current consensus is to devise algorithmic countermeasures to DFA and subsequently to synthesize the DFA-protected design with a DPA-resistant CAD flow. In this article, we argue that this approach is the best neither in terms of performance nor of relevance. Notably, the contribution of this paper is to demonstrate that the strongest SCA countermeasure known so far, namely the family of dual-rail with precharge logic styles that do not evaluate early, happens, surprisingly, to be almost natively immune to most DFAs. Therefore, unexpected two-in-one solutions against SCA and DFA do exist and deserve closer attention, because they combine simplicity with efficiency. In particular, we illustrate a logic style, called WDDL without early evaluation (WDDL w/o EE), and a design flow that realizes in practice one possible combined DPA and DFA countermeasure especially suited for reconfigurable hardware.
I. INTRODUCTION
Embedded systems that contain cryptographic modules are becoming commonplace with the generalization of privacy, authentication and integrity in digital communications. Cryptographic hardware is very resource-consuming because it relies on complex operations needed to prevent illegitimate users from spying on, impersonating or altering the communications. Therefore, many studies focus on the optimization of cryptographic blocks. In parallel, new threats (not of a cryptanalytic nature) have emerged: it has been suggested and demonstrated that an attacker can break the logical security conveyed by the cryptography by merely observing or perturbing it on the physical layer. The common point between these two attack strategies is their aim: to defeat the security by retrieving secret elements (such as keys) from which the security features stem.
On the one hand, observation attacks are also known as side-channel attacks (SCAs [1]), in that they exploit a physical leakage of the device to gain information about its internal secrets. On the other hand, perturbation attacks consist in altering the state of the device so as to retrieve faulted outputs that, together with nominal outputs, can disclose or negate relationships between the secret bits normally concealed in the hardware; these attacks are referred to as differential fault analyses (DFAs [2], [3]). The main strength of SCAs is their stealth. As they are virtually impossible to detect, an adequate countermeasure must be active each time the cryptographic engine is in use. In contrast, the first prerequisite for a DFA to be successful is to actually modify the device's state. A detection strategy can thus be enforced to check the integrity of the device's operations. However, carefully checking all the components of an embedded system is very tedious and error-prone. In addition, even if all sensitive data are carefully monitored for integrity, fault coverage remains an issue. Indeed, while detecting a single-bit error is easy using simple parity codes, the detection of multiple errors is more difficult to address. In general, the complexity of the detection logic grows exponentially with the fault multiplicity, which quickly becomes prohibitive in practical applications.
A device can be claimed tamper-resistant only if it is protected, at least to some extent, against both SCA and DFA simultaneously. It must be noted that the effort to invest in protection depends on the threat. To be successful against an unprotected device, the best attacks known so far require some thousands of side-channel traces (SCAs) [1] but only a couple of faults (DFAs) [2], [3]. As a consequence, the need for protection is more stringent against DFA than against SCA. This asymmetry is one reason why the countermeasures against DFA and SCA are nowadays studied separately: this partitioning makes it possible for a design team to tune the countermeasure efficiency according to the urgency of the threat, while keeping the flexibility to combine them at the final stage of integration. Another reason why countermeasures against DFA and SCA are considered independently is linked to the state of the art in defense. The protection against DFA is naturally achieved at the algorithmic level, with the introduction of redundancy in data representation and processing. However, effective protection against SCA is more subtle, since it requires the removal of any source of leakage through physical side channels. Therefore, the widespread methodology consists in using dedicated logic gates along with ad hoc backend steps. As we know how to resist DFA before logic synthesis and how to resist DPA after synthesis, it is implicitly considered obvious that the protections against DFA and DPA should be built one on top of the other.
In this article, we argue that this methodology is neither natural nor efficient. Basically, we show that a class of strong countermeasures against SCA, namely all variants of dual-rail with precharge logic (DPL) styles which do not suffer from early evaluation (EE), are already protected against the state-of-the-art fault injection techniques. Thus, by subsuming the individual problems of securing a design against SCA and against DFA into a single problem, we arrive at an original solution that is economical in resources because it addresses both the SCA and the DFA threats. In addition, we show that the countermeasure becomes more efficient as the fault multiplicity increases, which is a property out of reach of traditional protections based on coding theory.
Some previous works have already attempted to provide joint countermeasures against SCA and DFA, but by relying on specific features of FPGAs. For instance, the two papers [5], [6] show how to resist SCA and DFA when dynamic partial reconfiguration is available. In our article, we achieve the same result even on low-cost FPGAs that cannot be reconfigured at run-time.
The rest of the article is organized as follows. Section II presents the DPL protection against SCA, and motivates the preference for DPL without EE. In Section III, the protection potential of DPL (w/ or w/o EE) against DFA is explained. Section IV presents a methodology for mapping this protection into FPGAs, and details its performance in terms of resource usage. Finally, conclusions are discussed in Section V.
II. DUAL-RAIL WITH PRECHARGE LOGIC STYLES AGAINST SCAS
The goal of a protection against SCAs is to prevent any attacker from retrieving any information about any internal bit. Various solutions have been proposed to address this requirement. Side-channel masking consists in making the activity of sensitive bits random by rewriting the algorithm in such a way that those variables depend on an external entropy source. Side-channel hiding adds redundant logic so as to end up with a constant activity when sensitive bits are manipulated. Each solution has its own pros and cons; some logic styles, based on "masked DPL gates", even mix the two for improved security. Still, the comparison between these protection options is beyond the scope of this article.
In this article, we focus on the hiding styles. Indeed, as will be made clear in Sec. III, those styles combine harmoniously with DFA protection, whereas masking styles do not, as demonstrated in [7]. Information hiding at the bit level can be achieved by a large variety of ad hoc encodings and protocols. However, the most convenient ones rely on a so-called dual-rail with precharge representation. Every bit a involved in the algorithm is actually mapped onto a couple of wires, named (a_F, a_T), and called the 'false' and 'true' halves of the dual-rail variable a. The couple (a_F, a_T) alternates between two kinds of values:
1) (0, 0) or (1, 1), called NULL0 or NULL1, and designated as a NULL token, playing the role of spacer, and
2) (1, 0) or (0, 1), called VALID0 or VALID1, and designated as a VALID token, carrying the value of a.
One DPL computation alternates NULL and VALID tokens, with the remarkable property that exactly one bit toggle occurs in each transition. A pair of gates (f_F, f_T) respects the DPL convention if:
• It propagates the NULL values, i.e., if all the inputs are NULL, then (f_F, f_T) is also NULL.
• It propagates the VALID values, i.e., if all the inputs are VALID, then (f_F, f_T) is also VALID.
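As an illustration of the token encoding and of the two conditions above, here is a minimal Python sketch. The pair ordering (false half, true half), the helper names, and the choice of an AND gate pair (OR on the false halves, AND on the true halves) as the example are our assumptions for illustration; the check is done for one spacer type at a time.

```python
from itertools import product

# Dual-rail pairs are written (false half, true half); this ordering is an assumption.
NULL0, NULL1 = (0, 0), (1, 1)
VALID0, VALID1 = (1, 0), (0, 1)


def is_null(pair):
    return pair in (NULL0, NULL1)


def is_valid(pair):
    return pair in (VALID0, VALID1)


def toggles(before, after):
    """Number of wires that change between two successive tokens."""
    return sum(x != y for x, y in zip(before, after))


def and_pair(a, b):
    """Hypothetical example gate pair: f_F ORs the false halves, f_T ANDs the true halves."""
    return (a[0] | b[0], a[1] & b[1])


def respects_dpl_convention(gate, spacer=NULL0):
    """Check both conditions for a two-input gate pair and a single spacer type."""
    null_ok = is_null(gate(spacer, spacer))
    valid_ok = all(
        is_valid(gate(a, b)) for a, b in product((VALID0, VALID1), repeat=2)
    )
    return null_ok and valid_ok


if __name__ == "__main__":
    for spacer in (NULL0, NULL1):
        for token in (VALID0, VALID1):
            print(spacer, "->", token, "toggles:", toggles(spacer, token))
    print("AND pair respects the convention:", respects_dpl_convention(and_pair))
```

Whatever the spacer and the VALID token, exactly one of the two wires toggles, which is the property the hiding countermeasure relies on.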
Wave dynamic differential logic (WDDL [8]) was the first logic style to implement these conditions. WDDL has the nice property of being separable, meaning that f_F (resp. f_T) depends only on the false (resp. the true) half of the inputs. However, some other properties have been added afterwards to ensure a secure operation of WDDL. First of all, it has been noticed that on the way from all NULL to all VALID values, glitches could occur if the functions (f_F, f_T) were not positive [9]. Afterwards, many authors noticed concomitantly that the evaluation time depends on the input values [10], [11]. An up-to-date list of known DPL styles used as side-channel information hiding countermeasures is given in Tab. I. The salient features of these logic styles are briefly described below:
• WDDL is the least complex DPL style because it is separable, which makes it possible to reduce the overhead of each dual network.
• MDPL adds some logic on top of WDDL to randomly swap the logic interconnect pairs, with a view to balancing the routing mismatches. Indeed, this problem is not addressed directly by WDDL but is left to the layout designer [18], [19].
• iMDPL fixes the leakage conveyed by data-dependent evaluation and precharge times in WDDL and MDPL.
• DRSL combines masking and early evaluation protection, and is optimized to be compact using one standard ASIC cell (OAI222) and all RSL [20] , [21] gates.
• STTL is a non-masked improvement of the WDDL style that is free of early evaluation. STTL is however not balanced in structure, as WDDL, and is limited in speed by the slow validation path, which is by design longer than the path of the data signal pairs. This limitation seriously impedes the throughput of STTL. Finally, we underline that STTL requires the routing of three wires per logical signal.
• SecLib is a non-masked computation style that fixes the EE issue and features a balanced structure. To be exhaustive, we should also mention NCL (Null Convention Logic), which is a generalization of SecLib, albeit devoid of any balancing effort.
• WDDL w/o EE is a logic style dedicated to FPGAs that removes the EE without computing a rendezvous. Instead, each functional half gate receives both the true and the false inputs, and decides to output the VALID value only when all the inputs are VALID. This behavior can be achieved by a purely combinatorial gate, as depicted in Tab. II. The detailed rationale behind the "WDDL w/o EE" style is the following:
  - The gate outputs NULL{0,1} when the inputs are NULL{0,1} or transitioning from this value.
  - The gate outputs VALID only when all the inputs are VALID.
  - In case of inconsistent values w.r.t. the DPL convention, the gate outputs an arbitrary NULL value.
This logic does not evaluate early by design, and propagates errors: if any input is stuck at NULL or if the input is out of specification, then the output always remains NULL too. In addition, this logic does not generate glitches even if the functionality is not positive, and can be inverting. Therefore, the synthesis is more optimized than for plain WDDL.
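The three rules of the "WDDL w/o EE" rationale can be sketched behaviourally as follows. This is only a model under stated assumptions, not the gate of Tab. II (which is not reproduced here): the function names are hypothetical, pairs are written (false half, true half), and the "arbitrary NULL" returned on inconsistent inputs is taken to be NULL0.

```python
NULL0, NULL1 = (0, 0), (1, 1)
VALID0, VALID1 = (1, 0), (0, 1)  # pairs are (false half, true half); assumed ordering


def decode(pair):
    """Return 0 or 1 for a VALID token, None for a NULL token."""
    if pair == VALID0:
        return 0
    if pair == VALID1:
        return 1
    return None


def gate_no_ee(func, a, b, arbitrary_null=NULL0):
    """Behavioural sketch of a two-input gate without early evaluation.

    func is the single-rail Boolean function (it may be inverting).
    """
    inputs = (a, b)
    values = [decode(p) for p in inputs]
    if None not in values:
        # Rule 2: VALID output only when all the inputs are VALID.
        return VALID1 if func(*values) else VALID0
    nulls = {p for p in inputs if p in (NULL0, NULL1)}
    if len(nulls) == 1:
        # Rule 1: inputs are NULL{0,1} or still leaving that spacer.
        return nulls.pop()
    # Rule 3: NULL0 mixed with NULL1 is out of specification.
    return arbitrary_null


if __name__ == "__main__":
    nand = lambda x, y: 1 - (x & y)
    print(gate_no_ee(nand, VALID1, VALID1))  # both inputs VALID
    print(gate_no_ee(nand, NULL0, VALID0))   # one input still NULL0
    print(gate_no_ee(nand, NULL0, NULL1))    # inconsistent spacers
```

Note that the second call returns a NULL token although a NAND with one input at 0 is already decidable: this is precisely the absence of early evaluation.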
III. POTENTIAL OF DPL W/O EE FOR PROTECTION AGAINST DFAS

A. Fault Model
We assume in the sequel that multiple faults can be generated locally (by means of a laser or an electromagnetic injection [22]), but that they are uncorrelated with one another.
B. Early Evaluation Prevention and Faults Transformations
This article is based on [23], which has already shown that WDDL is immune against multiple asymmetric faults such as those caused by setup violations. Basically, the idea is that asymmetric faults turn a VALID token into a NULL one. The NULL token can propagate up to the outputs, being even amplified. However, the NULL wave propagation acts as an eraser, which means that the outputs have eventually lost any information about the faulted values. A parallel is drawn in [23] between asymmetrical faults and the logical propagation of the 'U' value in the 9-valued type std_ulogic of VHDL (IEEE standard number 1076). We add in this paper that all dual-rail with precharge logics (DPLs) are actually protected against setup violation attacks. Indeed, they never disclose the faulty result in the presence of a setup violation. Instead, they have two different kinds of behavior:
1) WDDL and MDPL compute results given the inputs, and propagate NULL spacers for the outputs whose values are not decidable. This is the logic behavior of 'U' in VHDL. One could say that faults in these logics are recessive w.r.t. VALID values.
2) iMDPL, DRSL, STTL, SecLib and WDDL w/o EE propagate the NULL on the fault fanout, even if a VALID value could have been deduced. This is the logic behavior of 'X' in VHDL. Continuing the former phenotypic metaphor, faults in this second class of logics are dominant, or rather contaminating, as their propagation is indeed an unexpected avalanche effect.
The implication is that DPL in itself does not provide good protection against symmetrical faults. As a matter of fact, it can filter out a NULL (see Fig. 1(a)) and generate a faulted VALID from NULL tokens (see Fig. 1(b)). In contrast, the DPL styles that are EE-free propagate the NULL unconditionally; this feature is even part and parcel of the WDDL w/o EE specification. Additionally, the NULL (behaving like an 'X') always absorbs other VALID faults, as shown in Tab. 2.
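The two failure modes of an EE-prone gate, and their absence in an EE-free gate, can be reproduced on a single AND gate. The sketch below is illustrative only: Fig. 1 is not reproduced here, so the specific input tokens are our choice, and `no_ee_and` is a hypothetical behavioural stand-in for any EE-free style (it returns one of the NULL inputs whenever an input is not VALID).

```python
NULL0, NULL1 = (0, 0), (1, 1)
VALID0, VALID1 = (1, 0), (0, 1)  # pairs are (false half, true half); assumed ordering
NULLS, VALIDS = (NULL0, NULL1), (VALID0, VALID1)


def wddl_and(a, b):
    """WDDL AND gate: OR on the false halves, AND on the true halves."""
    return (a[0] | b[0], a[1] & b[1])


def no_ee_and(a, b):
    """EE-free AND sketch: VALID only if both inputs are VALID, otherwise a NULL."""
    if a in VALIDS and b in VALIDS:
        return wddl_and(a, b)
    return a if a in NULLS else b


if __name__ == "__main__":
    # A NULL input is filtered out: the other input alone decides the result.
    print("WDDL   NULL0 AND VALID0 ->", wddl_and(NULL0, VALID0))
    # The NULL is kept only when the result is not decidable.
    print("WDDL   NULL0 AND VALID1 ->", wddl_and(NULL0, VALID1))
    # Two inconsistent spacers yield a VALID token out of nothing.
    print("WDDL   NULL1 AND NULL0  ->", wddl_and(NULL1, NULL0))
    # The EE-free gate answers NULL in all three situations.
    for a, b in ((NULL0, VALID0), (NULL0, VALID1), (NULL1, NULL0)):
        print("no EE ", a, "AND", b, "->", no_ee_and(a, b))
```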
C. Propagation of NULL Values Through Substitution Boxes
The fault propagation in logics with EE explodes in substitution boxes (sboxes). The average number of NULL tokens at the output of various sboxes when one or several NULL tokens of the same type (either NULL0 or NULL1) are at the input has been computed in Tab. III for any logic style subject to EE, such as WDDL or MDPL.
In DPL w/o EE, the propagation is also independent of the implementation. It is also more straightforward, as it does not depend on the data: the propagation through a gate occurs iff the output depends on the given input. This is the case for all nontrivial gates. Notably, any fault, even a single one, on the input of an sbox corrupts the entire sbox output: the propagation is maximal.
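To make the contrast concrete, the sketch below estimates the average number of NULL output bits of a small sbox under a simple functional model. All of it is assumption on our side: the 4-bit table is only an illustrative bijection (its values are those of the PRESENT cipher sbox, which is not discussed in this paper), an output bit is counted as NULL with EE when it cannot be decided from the VALID input bits alone, and without EE every output bit is assumed to depend on every input bit. The figures it prints are therefore not those of Tab. III.

```python
from itertools import combinations, product

# Illustrative 4-bit bijection (PRESENT sbox values); an assumption, not from the paper.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N_BITS = 4


def null_outputs_with_ee(sbox, null_positions, valid_bits):
    """Count the output bits that stay undecided when some input bits are NULL.

    valid_bits maps each VALID input position to its value.
    """
    images = []
    for fill in product((0, 1), repeat=len(null_positions)):
        x = 0
        for pos, bit in valid_bits.items():
            x |= bit << pos
        for pos, bit in zip(null_positions, fill):
            x |= bit << pos
        images.append(sbox[x])
    # An output bit is decidable iff it is the same for every completion.
    return sum(len({(y >> j) & 1 for y in images}) > 1 for j in range(N_BITS))


def average_null_outputs_with_ee(sbox, k):
    """Average over all choices of k NULL inputs and all values of the others."""
    total, count = 0, 0
    for null_positions in combinations(range(N_BITS), k):
        others = [p for p in range(N_BITS) if p not in null_positions]
        for bits in product((0, 1), repeat=len(others)):
            total += null_outputs_with_ee(sbox, null_positions, dict(zip(others, bits)))
            count += 1
    return total / count


def null_outputs_without_ee(k):
    """Without EE, a single NULL input is assumed to reach every output bit."""
    return N_BITS if k > 0 else 0


if __name__ == "__main__":
    for k in range(N_BITS + 1):
        print(k, average_null_outputs_with_ee(SBOX, k), null_outputs_without_ee(k))
```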
D. Analysis of the DFA Protection of the Proposed Logic
Single-bit faults are inefficient against DPL because they turn a VALID datum into a NULL token, which propagates and leads to an unexploitable error since it hides the faulted value. This is the typical scenario described in paper [23]. Highly multiple faults randomly generate a large quantity of NULL values along with some more unlikely but devastating bit-flips. However, as NULL values are systematically propagated, they proliferate very quickly after traversing a few layers of combinatorial logic. And as they have the useful property of contaminating VALID values, including the risky coherent bit-flips (simultaneous faulty transitions 0 → 1 and 1 → 0 within one dual-rail couple), they block the propagation of these bit-flips, ideally before they reach the algorithm output. This absorption property becomes more efficient as the number of NULL values generated by the multiple faults increases. Therefore, the only way to inject a harmful fault is to stress the circuit enough to obtain multiple faults, yet without creating so many faults that they cannot escape absorption during their percolation towards the outputs. Fortunately, in this opportunity window of low stress (generation of 2, 3, or at most 4 errors because of the high diffusion of cryptographic algorithms), efficient coding schemes can be used to supplement the DPL w/o EE protection.
To be more accurate, we present a simple model that supports our assertion. Let us consider a dual-rail circuit that is attacked with a perturbation focused on 2n wires, with an intensity sufficient to cause m ≤ 2n simultaneous faults. We also make the hypothesis, optimistic for the attacker, that the m faults are equidistributed over the 2n wires, and that the flips are truly symmetrical, i.e. a flip to 0 is as likely as a flip to 1. Those conditions model a worst case from the defense viewpoint, because they foster coherent bit-flips liable to turn a VALID value into a faulted VALID* one, by means of two opposite flips on the two wires of the same dual-rail couple. To further simplify the model, we also assume that the attacked block has a perfect diffusion: in practice, this is not exactly true for one round of an algorithm, but for at least two of them (and exactly two in the case of AES). Nevertheless, it helps to grasp the idea of the proof more intuitively, without introducing overcomplicated considerations. Therefore, for a fault to successfully propagate through the round, not a single NULL shall be generated. Otherwise, the NULL wave catches the fault, because of the perfect diffusion, as already depicted in Fig. 2. The first observation is that for VALID faults to be generated, m must be even. Indeed, they are generated by pairs. If, on the contrary, m is odd, then at least one NULL (bit-flip of one wire in a pair) is generated, leading to the absorption of the VALID fault. Then, a VALID fault is generated iff, given a first fault, a second one occurs on the paired wire. For m = 2 faults, this happens with probability 1/(2n − 1). For more faults, the generation of solely paired faults consists in always pairing the remaining faults. Then, the probability to generate at least one VALID fault that survives until the output is equal to:
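The expression is not legible in this copy. Under the model just stated (m distinct faulted wires drawn uniformly among the 2n wires, success only if they fill exactly m/2 complete dual-rail couples), one closed form can be written as below. It is our reconstruction, not necessarily the authors' notation; it was chosen because it gives 1/(2n − 1) for m = 2 and satisfies the symmetry p(2n, m) = p(2n, 2n − m) quoted in the text.

```latex
p(2n, m) =
\begin{cases}
\dfrac{\dbinom{n}{m/2}}{\dbinom{2n}{m}} & \text{if } m \text{ is even},\\[2ex]
0 & \text{if } m \text{ is odd}.
\end{cases}
```

The numerator counts the ways to choose which m/2 of the n couples are fully hit, and the denominator counts all the ways to choose m faulted wires among 2n. For m = 2 this gives n / (n(2n − 1)) = 1/(2n − 1).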
This probability becomes very small from a multiplicity of 4 onwards, as m increases up to n (see footnote 2). This is to be contrasted with schemes involving a coding with error detection. They are basically able to detect:
• all the faults of multiplicity up to the error detection capability r (see footnote 3), but
• only a ratio of 1 − 1/2^r of the faults for m > r.
Figure 3 compares the rate of successful fault injection as a function of the multiplicity, for a set of wires with n = 8, respectively for the proposed scheme based on DPL w/o EE and for a classical integrity check with a linear code detecting r = 2 bits of error.
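The comparison can be reproduced numerically with a few lines of Python. This is a sketch under assumptions: `p_dpl` uses a closed form that we derived from the stated model (all m faulted wires must fill exactly m/2 couples among n), which matches 1/(2n − 1) for m = 2 and the symmetry p(2n, m) = p(2n, 2n − m); n is taken as the number of dual-rail couples; and `p_code` simply encodes the two bullets above. It does not claim to reproduce the exact curves of Fig. 3.

```python
from math import comb


def p_dpl(n, m):
    """Probability that m faulted wires among 2n fill exactly m/2 dual-rail couples."""
    if m % 2:
        return 0.0  # an odd number of faults always leaves at least one NULL
    return comb(n, m // 2) / comb(2 * n, m)


def p_code(r, m):
    """Rate of undetected faults for a code with detection capability r."""
    return 0.0 if m <= r else 1.0 / 2 ** r


if __name__ == "__main__":
    n, r = 8, 2
    for m in range(1, 2 * n + 1):
        print(f"m={m:2d}  DPL w/o EE: {p_dpl(n, m):.6f}  code: {p_code(r, m):.2f}")
```

In this model the coding scheme stays at a constant success rate of 1/2^r once m exceeds r, whereas the DPL w/o EE rate drops sharply for intermediate multiplicities and only rises again when nearly all the wires are faulted.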
The authors would like to insist that this is the first time that a countermeasure against DFA proves efficient even in the context of a large number of faults. As a matter of fact, usual schemes, based on spatio-temporal redundancy or coding, can be defeated with high probability if the number of faults is greater than the detection capacity. The implementations using DPL w/o EE take advantage of three properties that all contribute to destroying the VALID faults:
1) faults are very likely to alter only one wire in a pair, especially if the stress is badly localized, thus creating many more NULL tokens than wrong VALID pairs,
2) because of the protection against EE, NULL values win against VALID ones, thereby hiding in particular VALID fault propagation,
3) as the algorithms implement cryptography, they have a high diffusion, which helps the NULL values meet (and thus eat) the possibly faulted VALID values still alive.
Footnote 2: When m is too large, starting from n, the probability increases, because of the property p(2n, m) = p(2n, 2n − m).
Footnote 3: Faults of multiplicity m ≤ r mutate a code word into a non-code word.
IV. CAD FLOW FOR THE PROPOSED COUNTER-MEASURE
As in every digital system, cryptographic coprocessors can be separated into control and datapath. The datapath contains the operations related to the secret key. Thus, to ensure the security of the design, it is sufficient to secure the datapath only. A design flow to implement a cryptographic coprocessor on an FPGA is shown in Fig. 4. Since DPL designs are redundant by nature, we have to use a customised tool for processing. The goal of this synthesis is to remove the unnecessary logic redundancy while keeping the redundancy needed for the DPL style. This cannot be achieved by a standard design flow. An ASIC synthesizer is used to synthesize the design with a library containing only those gates which respect the DPL style constraints. Then the output netlist is processed using a custom tool which converts a single-rail netlist into a DPL netlist. The controller is then connected to the datapath using a wrapper. Thereafter, a legacy FPGA vendor tool performs synthesis, mapping, placement and routing for the whole design on the FPGA. Although the design flow is shown for Altera FPGAs, it has also been tested successfully on Xilinx FPGAs.
As stated earlier, to secure a design against SCA and DFA we can use a DPL style which is free from EE. WDDL is the DPL style most suited for FPGA designs, but it is prone to EE. In [23], the authors implement a WDDL design in FPGAs using a library containing four-input functions which are positive in nature. We use the same methodology in this paper. To make WDDL protected against EE, we limit the library to two-input gates, implemented as per Tab. II.
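One way to see why two-input gates suit a four-input look-up table is that each half gate of the EE-free style reads both halves of both inputs, i.e. four wires. The sketch below builds the 16-entry truth table of each half for a given Boolean function. It is an illustration under assumptions, not the content of Tab. II: the function name, the LUT input ordering (a_F, a_T, b_F, b_T from least to most significant bit) and the choice of NULL0 as the "arbitrary NULL" are ours.

```python
def half_gate_lut4(func, half, arbitrary_null=(0, 0)):
    """Truth table of one output half (0 = false half, 1 = true half).

    Entry index = a_F + 2*a_T + 4*b_F + 8*b_T (assumed bit order).
    """
    table = []
    for index in range(16):
        a_f, a_t, b_f, b_t = [(index >> i) & 1 for i in range(4)]
        if a_f != a_t and b_f != b_t:
            # Both inputs are VALID: the true half carries the logical value.
            value = func(a_t, b_t)
            out = (1 - value, value)  # (false half, true half)
        else:
            # At least one NULL input: forward the spacer, never a VALID token.
            nulls = {p for p in ((a_f, a_t), (b_f, b_t)) if p[0] == p[1]}
            out = nulls.pop() if len(nulls) == 1 else arbitrary_null
        table.append(out[half])
    return table


if __name__ == "__main__":
    and2 = lambda x, y: x & y
    print("AND false half:", half_gate_lut4(and2, 0))
    print("AND true half: ", half_gate_lut4(and2, 1))
```

Because the table is filled from the decoded values, the same construction works unchanged for inverting or non-positive functions such as NAND or XOR.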
We have applied these syntheses to an AES [24] datapath in the Stratix family of Altera. More precisely, we used an EP1S25B672C7 device. Table IV summarizes the area of an unprotected datapath, and of the same datapath protected with an EE-prone logic (namely WDDL) and with an EE-free logic (namely WDDL w/o EE). Both protected designs are embedded in EveSoC [25], and run at a similar maximal frequency (27.24 vs 27.36 MHz).
The implementation size of the "WDDL w/o EE" style is only slightly greater than that of the original "WDDL".
V. CONCLUSION
This paper shows that, in addition to increasing the resistance against SCAs, the DPL styles also help resist DFAs. Indeed, single faults turn a VALID token into a NULL one, which conceals the value that the (sensitive) data had before corruption. The DPL styles that protect against the EE side-channel analysis ensure in addition that the NULL propagation contaminates all the data it crosses in the combinatorial logic cones. Thus, in the case of multiple faults, both VALID faults and NULL tokens are generated, but the NULL tokens destroy the VALID faults before they reach the algorithm's outputs. Therefore, we show for the first time that an SCA counter-measure is, as such, already an excellent counter-measure against DFA.
We also introduce WDDL w/o EE, a simple logic style that enhances the plain WDDL style by making it EE-free and having it avoid the propagation of non-VALID inputs. In addition, the synthesis of WDDL w/o EE is efficient because even inverting and non-positive functions are allowed. We provide a mapping of this new logic into LUT4-based FPGAs.
