Abstract-The security of cryptographic implementations relies not only on the quality of the algorithm but also on the countermeasures deployed to thwart attacks that aim at disclosing the secret. These attacks can take advantage of leakages of the secret through the power consumption or the electromagnetic radiation, also called "side channels". This is for instance the case of Differential Power Analysis (DPA) and Correlation Power Analysis (CPA). Fault injection is another threatening attack type, which targets specific nets with a view to changing their value. The major principle for fighting side-channel attacks consists in making the power consumption constant. The masking method allows the designer to obtain a power consumption that has a constant mean and a variance given by a random variable. Another approach is the hiding method, which consists in generating a constant power consumption by using Dual-rail with Precharge Logic (DPL). This paper presents an overview of the various logic styles that have been promoted in the last six years, with an emphasis on their relative advantages and drawbacks.
I. INTRODUCTION
Many modern cryptographic algorithms are theoretically robust and immune from practical cryptanalysis in the "black box" model. However, some methods can be deployed to break the security by attacking the physical implementation of virtually any algorithm. These attacks can be mounted by merely observing or perturbing the targeted system. Observing the activity of the system and its correlation with potential guesses can yield sensitive information. Such attacks are better known as Side Channel Attacks (SCAs). When a device is perturbed such that it yields a non-nominal output, this faulty output, together with the expected output, can lead to the secret key. Such attacks are called Differential Fault Analyses (DFAs [1], [2]).
The advantage of SCAs is that the system is made to operate in its comfort zone. In such conditions, it is difficult to detect that some device may be observing the activity of the target. To defeat SCAs efficiently, the counter-measures have to be implemented at the logic level. Dual-rail Precharge Logic (DPL) is a class of counter-measures that aims at making the device activity constant and independent of the data processed.
In this paper, we propose an overview of the main DPL styles with a focus on their vulnerabilities against both the Side-Channel and the Fault attacks.
The rest of the article is organized as follows. Section II presents the DPL counter-measure at the logical level and the major vulnerabilities incurred by the backend. Section III describes how these vulnerabilities are addressed by the logic styles already evaluated in either FPGA or ASIC. Section IV explains how some optimizations and original solutions can be found using specific technologies. A synoptic comparison between the known logic styles is drawn in Section V. Finally, conclusions and perspectives are discussed in Section VI.
II. DPL PRINCIPLE, BUILT-IN DFA RESISTANCE, AND LATENT SIDE-CHANNEL VULNERABILITIES

A. Information Hiding Rationale
The aim of dual-rail with precharge logic (DPL) is to hide the internal activity of the circuit from a prospective attacker. If every update of a sensitive variable occurs with a constant activity, it is likely that all side-channels are constant as well. Therefore, the quantity measurable by an attacker is independent of any secret value. The DPL protocol consists of two phases: precharge and evaluation. The precharge phase makes it possible to start new computations from a known electrical state. It thus prevents unexpected transitions between two computation steps. In dual-rail signaling, each Boolean variable is conveyed by two wires: NULL = (0, 0) or (1, 1) while in precharge and VALID ∈ {(0, 1), (1, 0)} while in evaluation. Therefore, with the (0, 0) spacer, every evaluation consists in the transition of exactly one wire ((0, 0) → (0, 1) or (0, 0) → (1, 0)). If the design is adequately balanced, an attacker cannot discern which transition actually occurred.
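The precharge and evaluation protocol described above can be illustrated with a short Python sketch. It assumes the (0, 0) NULL spacer; the helper names (`encode`, `toggles_per_cycle`, `single_rail_toggles`) are illustrative and do not come from the original article. The sketch only counts wire transitions at the logical level and ignores every physical effect discussed later.

```python
NULL = (0, 0)  # precharge spacer (assumption: the (0, 0) convention)

def encode(bit):
    """Dual-rail VALID encoding (true wire, false wire) of a Boolean value."""
    return (1, 0) if bit else (0, 1)

def hamming(p, q):
    """Number of wires that differ between two dual-rail states."""
    return sum(x != y for x, y in zip(p, q))

def toggles_per_cycle(bit):
    """Wire transitions during evaluation (NULL -> VALID) and precharge (VALID -> NULL)."""
    valid = encode(bit)
    return hamming(NULL, valid), hamming(valid, NULL)

def single_rail_toggles(prev_bit, bit):
    """Unprotected single-rail reference: the toggle count depends on the data."""
    return int(prev_bit != bit)

if __name__ == "__main__":
    for b in (0, 1):
        print("dual-rail bit", b, "toggles (eval, precharge):", toggles_per_cycle(b))
    for prev, cur in ((0, 0), (0, 1), (1, 0), (1, 1)):
        print("single-rail", prev, "->", cur, "toggles:", single_rail_toggles(prev, cur))
```

In this model, the dual-rail toggle count is the same for both values of the bit, whereas the single-rail count depends on the previous and current values.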
B. DPL Built-in DFA Resistance
Single-bit faults are inefficient against DPL because they turn a VALID datum into a NULL token, which propagates and leads to a non-exploitable error since it hides the faulted value. This is the typical scenario described in the seminal paper [3], which introduces the intrinsic immunity of DPL against some classes of DFA.
Highly multiple faults ((1, 0) ↔ (0, 1)) randomly generate a large quantity of NULL values along with some less likely but devastating bit-flips. However, as NULL values are systematically propagated, they proliferate very quickly after traversing a few layers of combinatorial logic. As they have the nice property of contaminating VALID values, including the risky coherent bit-flips (simultaneous 0 → 1 and 1 → 0 in one dual-rail couple), they jam the propagation of these bit-flips, hopefully before they reach the algorithm output. This absorption property is all the more efficient as the number of NULL values generated by the multiple faults is high. Therefore, the only way to inject a poisonous fault is to stress the circuit enough to cause multiple faults, yet without creating so many faults that none of them has a chance to escape absorption during its percolation towards the outputs.
C. Vulnerabilities w.r.t. Side-Channel Attacks
Although perfectly sound at the logical level, DPL ends up being concretely implemented in physical devices. Now, the logical description of DPL ignores any notion of timing and capacitance.
Regarding the timing, two unbalanced behaviors can occur. On the way from the precharge to the evaluation, and vice versa, there can exist: 1) spurious transitions, referred to as glitches, that negate the hypothesis of constant activity, and 2) timing-dependent evaluation. Also, the implicit hypothesis that all bit toggles are equivalent does not hold, for at least three reasons: either the place-and-route tool has not balanced the pairs, or the manufacturing process is variable, or the attacker possesses a probe able to preferentially measure the signal emanating from one wire of a pair.
Finally, we warn that second-order mismatches shall also be considered. Indeed, the glitches concern one net, whereas the timing-dependent evaluation and the unbalance are relative quantities within a dual-rail pair. Cross-coupling is another issue, still unexploited but probably also endangering the security of DPL designs.
Section III studies how these latent flaws have been addressed by some existing DPL logics, whereas Section IV illustrates technology-dependent optimizations or innovative solutions.
III. DPL FAMILIES BASED ON STANDARD CELLS
A. WDDL
Wave Dynamic Differential Logic (WDDL [4]) meets all the logical constraints of a DPL. The initial state is propagated by a wave of (0, 0) couples through the netlist thanks to the exclusive use of positive gates. The fact that exactly one half of the gates evaluate results from the duality between the true and false networks. This duality also makes WDDL especially area-efficient: each gate receives only one half of the dual-rail signals. Put differently, WDDL is a separable logic, where the instances of each dual network are not subject to a doubling overhead in terms of fan-in. In addition, the positivity of WDDL ensures the absence of glitches in the complete netlist. Notice that WDDL with gates that propagate the NULL spacer without being positive is easily broken in practice, as explained in [5]. However, as shown in [6], [7], WDDL is prone to early evaluation and early precharge. The Early Evaluation (EE) effect comes from the difference of delay between two input variables of the same gate. Fig. 1 illustrates the EE flaw when variable a is in advance of variable b. In this case, the instant at which the output switches depends on the data.
Moreover, the dual networks are not necessarily balanced, since the transistor structures of x → f(x) and of its dual x → ¬f(¬x) differ. Those two issues have made possible some attacks on WDDL circuits, as described for instance by the authors of WDDL themselves in an ASIC [8] or independently in an FPGA [9]. Therefore, either incremental improvements or radically novel strategies have shown up.
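The early evaluation effect described in this subsection can be reproduced with a minimal logical model. The sketch below assumes a WDDL AND gate built from an AND on the true half and an OR on the false half, zero gate delay, and hypothetical arrival times `t_a` and `t_b` for the two inputs; the function names are illustrative only.

```python
NULL = (0, 0)  # precharge spacer (assumption: the (0, 0) convention)

def encode(bit):
    """Dual-rail VALID encoding (true wire, false wire)."""
    return (1, 0) if bit else (0, 1)

def wddl_and(a, b):
    """WDDL AND gate: AND on the true half, OR on the false (dual) half."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t & b_t, a_f | b_f)

def evaluation_time(a_bit, b_bit, t_a, t_b):
    """Instant at which the output leaves NULL, given input arrival times."""
    for t in sorted({0.0, t_a, t_b}):
        a = encode(a_bit) if t >= t_a else NULL
        b = encode(b_bit) if t >= t_b else NULL
        if wddl_and(a, b) != NULL:
            return t
    return None

if __name__ == "__main__":
    # Input a arrives at t = 1.0, input b arrives later at t = 3.0.
    for a_bit in (0, 1):
        for b_bit in (0, 1):
            print(a_bit, b_bit, evaluation_time(a_bit, b_bit, 1.0, 3.0))
```

With a arriving first, the output evaluates as soon as a is valid when a = 0, but waits for b when a = 1: the evaluation date is data-dependent even though the number of toggles is constant.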
B. MDPL
Masked Dual-rail with Precharge Logic (MDPL [10]) is an attempt to fix the routing unbalance of WDDL. The assumption is that, in some conditions, it can be difficult to constrain a router to balance the differential interconnect. Indeed, the two solutions available in the literature, namely the fat wire [11] and the backend duplication [12] methods, apply primarily to ASICs. The transposition to FPGA is possible, albeit with less fine-grained control over the result [13]. For this reason, MDPL proposes to swap the true and the false routes randomly, so as to become independent of the fatal routing unbalance. By the same token, it makes up for the structural unbalance of the dual pair of gates. The only gates involved in the logic are majority functions, both for the true and the false networks. Nonetheless, MDPL fails to provide a solution to the early evaluation and early precharge of WDDL.
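The role of the majority function in a masked dual-rail gate can be checked exhaustively. The sketch below assumes the usual formulation of a masked AND, in which each signal x is carried as x ⊕ m and the gate computes the majority of the two masked inputs and the mask on each rail; the helper names are illustrative and not taken from [10].

```python
from itertools import product

def maj(x, y, z):
    """3-input majority function."""
    return (x & y) | (y & z) | (z & x)

def mask(bit, m):
    """Dual-rail encoding (true, false) of bit XOR m."""
    v = bit ^ m
    return (v, 1 - v)

def mdpl_and(a_m, b_m, m):
    """Masked AND: one majority gate per rail (assumed gate structure)."""
    (a_t, a_f), (b_t, b_f), (m_t, m_f) = a_m, b_m, m
    return (maj(a_t, b_t, m_t), maj(a_f, b_f, m_f))

def check_all_inputs():
    """Verify functionality and rail complementarity for every (a, b, m)."""
    for a, b, m in product((0, 1), repeat=3):
        q_t, q_f = mdpl_and(mask(a, m), mask(b, m), (m, 1 - m))
        if (q_t ^ m) != (a & b) or q_f != 1 - q_t:
            return False
    return True

if __name__ == "__main__":
    print("masked AND correct for all inputs:", check_all_inputs())
    print("precharge propagates:", mdpl_and((0, 0), (0, 0), (0, 0)))
```

Because the majority function is self-dual, the same cell serves both rails. Note that the output becomes valid as soon as two of the three inputs agree, which is consistent with the early evaluation weakness mentioned above.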
C. DRSL
The primary focus of Dual-rail Random Switching Logic (DRSL [14]) is to make the evaluation and the precharge dates data-independent. For this reason, one pairwise unanimity gate¹ computes the validity of all inputs before allowing the gate to deliver any result, thus avoiding the EE flaw. Conversely, the unanimity gate makes it possible for the overall DRSL logic to always anticipate the precharge. This optimization is indeed relevant, since the value to be computed while coming back to precharge is data-independent (the NULL token). However, in the original design of DRSL, the functions are not required to be positive. The example of the AND function is sketched in Fig. 2(a). Hence the presence of data-dependent glitches in the return to precharge phase.
We carried out an extensive simulation of the DRSL AND gate when it returns to precharge. Table I shows the situation where the mask is the fastest to return to NULL. More precisely, we assume that m returns to precharge first, followed by a and b in this order, which we abbreviate as t_{m→NULL} < t_{a→NULL} < t_{b→NULL}. It happens that the DRSL AND gate glitches iff a ⊕ b = 1, irrespective of the mask value. Notice that it could have been anticipated that the glitching property does not depend on m if the mask is particular (e.g. the fastest signal) and a and b are equivalent. Indeed, when m = 0, there will be a glitching pattern for the DRSL AND gate computing in the direct convention, whereas when m = 1, the glitching pattern will correspond to a complemented interpretation of the functional signals a and b. As, by design of DRSL, the attacker cannot tell the difference between a transition occurring on a true or a false wire, the glitches will be observed without any distinction for each value of m. As a return to NULL with a glitch consists of three transitions (one functional plus two non-functional), whereas a return to NULL without a glitch consists of a single transition, a correlation of the traces with the value a ⊕ b will yield a peak. Assuming that a ⊕ b is sensitive and predictable, this correlation is a means to test hypotheses.
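The correlation argument above can be illustrated numerically. The sketch below is not a gate-level simulation of DRSL: it simply assumes the leakage model stated in the text (three transitions when a ⊕ b = 1, one transition otherwise) and adds Gaussian noise with a hypothetical standard deviation `noise_sigma`. All names and parameter values are assumptions made for the illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n_traces=2000, noise_sigma=1.0, seed=0):
    """Correlate a modeled return-to-NULL activity with a XOR b and with the mask."""
    rng = random.Random(seed)
    leak, hyp_xor, hyp_mask = [], [], []
    for _ in range(n_traces):
        a, b, m = (rng.getrandbits(1) for _ in range(3))
        transitions = 3 if a ^ b else 1  # leakage model taken from the text
        leak.append(transitions + rng.gauss(0.0, noise_sigma))
        hyp_xor.append(a ^ b)
        hyp_mask.append(m)
    return pearson(leak, hyp_xor), pearson(leak, hyp_mask)

if __name__ == "__main__":
    rho_xor, rho_mask = simulate()
    print("correlation with a XOR b:", round(rho_xor, 3))
    print("correlation with mask m :", round(rho_mask, 3))
```

Under this model, the hypothesis a ⊕ b correlates clearly with the simulated activity, whereas the mask value does not, which matches the observation that the glitch occurs irrespective of m.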
We also studied other orderings of the transitions. In the cases where t_{a→NULL} < t_{m→NULL} < t_{b→NULL} or t_{a→NULL} < t_{b→NULL} < t_{m→NULL}, the DRSL gate also features glitches when b ⊕ m = 1, irrespective of the value of variable a. But given that m is an unknown quantity, these glitches do not convey any information about the value of b. The glitches are thus innocuous in these cases. There is however a possible flaw if b is known (e.g. it is a primary input, such as one bit of the plaintext). In this case, the value of the cryptoprocessor-wide mask can be estimated by classifying the traces according to their intensity, as in [15]. However, the situation where the mask is the fastest to return to NULL is the most likely, for at least two compelling reasons:
1) As the mask is global (shared by all the protected gates), it is amplified and therefore propagates very fast, in a similar way to a clock signal.
2) Also, the mask is directly available at one register's output, whereas the data signals can traverse many other DRSL instances prior to arriving at the gate's inputs.
¹ The pairwise unanimity Boolean gate checks that every dual-rail input pair is valid before the gate delivers a result.
Two solutions can be imagined to patch the glitching problem of DRSL. The first one consists in adding buffers to delay the signals so as to balance the paths within the DRSL gate. Indeed, the glitches described previously have the duration of the propagation time of the OAI222 (inverted unanimity gate). If the race condition between the fan-in of the RSL NAND gate and the OAI222 is balanced, the glitches can be suppressed. Another option consists in implementing DRSL in positive logic, as shown in Fig. 2(b). This solution has a cost in CMOS logic, because inverting gates are smaller than non-inverting ones (which are realized in practice by the composition of an inverting gate with an inverter [16]). However, this is not constraining in FPGA.
A loss in area is nonetheless expected, as the functionality can only consist of positive gates, thereby limiting the degree of freedom of the logic synthesizers. In this case, the new logic, that we name DRSL+, consists in MDPL augmented with a synchronization by a unanimity cell. The equation for the AND gate becomes:
We draw the reader's attention to the fact that the proposed DRSL+ gate is not an implementation-level correction of DRSL. Instead, DRSL+ really changes the functionality of DRSL (i.e. the Boolean equations for (q_T, q_F) are not the same). It is straightforward to check that the DRSL AND gate is not positive, whereas equation (1) is, given that only Boolean AND (·) and OR (+) functions are involved in the expression. Surprisingly enough, this correction comes with hardly any significant overhead. Indeed, the DRSL+ style does not forbid the use of inverting CMOS standard cells. As a matter of fact, the Boolean functionality of DRSL+ can be mapped entirely with standard cells, which is not the case for the original DRSL, where the pr, a, b, m → a · b + b · c + c · a + pr function (RSL-NAND) is not a standard cell. In standard cells, equation (1) can be simplified as:
, idem for q_F, with PAOI2 being the inverted majority gate.
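One plausible reading of "MDPL augmented with a synchronization by a unanimity cell" can be sketched in Python as follows. This is an assumption made for illustration and does not claim to reproduce equation (1): the gate `drsl_plus_and` below gates the majority function of each rail with a unanimity term built as an AND of per-pair ORs, which uses only AND and OR operators. The helper `is_positive` checks monotonicity exhaustively.

```python
from itertools import product

def maj(x, y, z):
    """3-input majority function."""
    return (x & y) | (y & z) | (z & x)

def unanimity(*pairs):
    """1 iff every dual-rail pair has left the (0, 0) NULL state."""
    u = 1
    for t, f in pairs:
        u &= (t | f)
    return u

def drsl_plus_and(a, b, m):
    """Hypothetical positive masked AND, synchronized by a unanimity term."""
    u = unanimity(a, b, m)
    return (u & maj(a[0], b[0], m[0]), u & maj(a[1], b[1], m[1]))

def is_positive(gate):
    """True iff raising any single input wire never lowers an output wire."""
    for wires in product((0, 1), repeat=6):
        low = gate(wires[0:2], wires[2:4], wires[4:6])
        for i in range(6):
            if wires[i] == 0:
                raised = wires[:i] + (1,) + wires[i + 1:]
                high = gate(raised[0:2], raised[2:4], raised[4:6])
                if any(h < l for h, l in zip(high, low)):
                    return False
    return True

if __name__ == "__main__":
    print("positive:", is_positive(drsl_plus_and))
    # a = 0 is already valid, b is still NULL: the output stays NULL.
    print("no early evaluation:", drsl_plus_and((0, 1), (0, 0), (0, 1)))
```

In this model the output remains NULL as long as one input pair is NULL, and the gate is positive, so no glitch can be produced by a monotonic wave of inputs.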
The first solution (delay balancing with buffers) is somewhat risky, unless the delays are chosen in a conservative manner. The second solution (positive logic), as such, still suffers from a slight bias due to intra-cell early precharge. Let us consider, to simplify the explanation, the DRSL+ AND gate of equation (1) without mask. Without loss of generality, we assume a is faster than b. If a = 1 is known and b is unknown, we can retrieve the value of b by correlation because the evaluation date depends on b. As shown in Fig. III-C, the output needs to pass either successively through two gates or through only one. This slight unbalance can be fixed by stacking the two solutions on top of each other, since this reduces the identified skew in the output.
Another attack against DRSL is presented in [17]. Actually, this attack puts forward a vulnerability that is common to all masked DPL styles. The idea is that the masking of the gates makes it possible to compensate for the routing unbalance. However, the mask signal is itself differential and therefore unbalanced. As it is not balanced (since this is the hypothesis when resorting to masked DPL), it paradoxically opens the door to an attack on itself.
D. STTL
Secure Triple Track Logic (STTL [18]) eludes any glitching risk by waiting to evaluate and to precharge until all the inputs are either valid or NULL. This incurs useless delays in the return to precharge phase, which is however only detrimental to performance, not to security. The main drawback of STTL is the requirement to route one synchronization signal slower than the dual-rail pair, while guaranteeing a balanced routing within the dual-rail pair. However, the known methods to balance signals (fat wire and backend duplication) operate on a full netlist, and are therefore difficult to adapt to heterogeneous netlists, in which single-ended and dual-rail signals are mixed.
E. BCDL
Balanced Cell-based Differential Logic (BCDL) improves on STTL by accelerating the precharge phase, thanks to a global signal. As the BCDL design allows the precharge step to be shortened, BCDL can compute about 80% faster than DRSL because the precharge is global. This possibility is depicted in Fig. 4.
Furthermore, as the global signal is, by design, faster than the data signals, BCDL is free from the flaw identified in DRSL. Additionally, BCDL is a truly differential logic. BCDL and STTL can be seen as equivalent at the netlist level: the input synchronization logic (either C-elements or unanimity gates) can be factored in STTL, to the detriment of the systematic addition of a third routing resource. Refer to Fig. 5 for an illustration.
F. WDDL Variants
Some variants of WDDL have also been devised to ease the balancing of the WDDL networks. However, as already explained in the subsection devoted to MDPL, it is known that balancing the WDDL interconnect does not solve the early evaluation (EE) inherent to this logic. Nevertheless, we introduce them here because some of these logics have unexpected positive side-effects on their security w.r.t. the EE.
Figure 4. Timing optimization in the DPL protocol when the precharge is anticipated, illustrated on one half of a DPL circuit consisting of one two-stage register followed by one combinatorial function f.
1) DWDDL: Double WDDL (DWDDL) is introduced in [19] to counter-balance one unbalanced network with a dummy dual one. Although this solution is sound in theory, other efforts have been deployed to reduce the overhead associated with the further duplication of hardware in DWDDL. In [13], the design of a substitution box (sbox) in WDDL, similar to the WDDL in BDD-style presented in [20], allows for a separation between the true and false halves, thus allowing for a copy-and-paste of the two halves, which are thus guaranteed to have the same backend.
2) WDDL with Divided backend duplication: Divided backend duplication [21] attempts to go one step further: it is applicable to any kind of logic (not only the sboxes) and has roughly the same overhead as WDDL, while being completely separable. Basically, the true/false separation is achieved by preventing the inversions from being replaced by a crossing of the dual wires. However, in order to keep the propagation of the NULL state during precharge, the inverters can be inhibited when in precharge: they are implemented as XNORs with the precharge signal. This alteration comes at the cost of introducing two vulnerabilities. If the precharge is concomitant with the clock, then glitches will occur due to races between the signals in a non-positive logic. If the precharge is asserted in a separate clock period, then the precharge state no longer guarantees a constant number of toggles at the evaluation stage.
3) IWDDL: Finally, Isolated WDDL (IWDDL) [22] is a different strategy to separate a WDDL netlist. Here, inverters are kept but potential glitches are stopped by systematically inserting one register after each of them. This strategy is expensive in terms of area and requires a redesign of the controller. Additionally, the design becomes much more pipelined, which requires much higher clock frequencies to maintain an acceptable throughput. However, the benefit of this approach is that it also stops the propagation of the EE wave. Apart from the very poor performance of IWDDL, this method is very strong from a pure security standpoint. Only one point is questionable: does the complete separation of the netlist not open the door to well-located EMA attacks, which can selectively record the activity of only one half of the netlist, thus defeating the constant activity property? This issue is all the more pressing as the netlist is much larger in IWDDL than in WDDL, because of the large quantity of registers added for the pipeline.
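The behavior of an inverter implemented as an XNOR with the precharge signal can be sketched as follows. The polarity is an assumption: the hypothetical signal `prech` is taken to be 1 during precharge and 0 during evaluation, with the (0, 0) spacer.

```python
def xnor(x, y):
    """2-input XNOR on bits."""
    return 1 - (x ^ y)

def gated_inverter(x, prech):
    """Buffer when prech = 1 (precharge), inverter when prech = 0 (evaluation)."""
    return xnor(x, prech)

def plain_inverter(x):
    """Ordinary inverter, shown for comparison."""
    return 1 - x

if __name__ == "__main__":
    # Precharge: a wire at 0 must stay at 0 to propagate the NULL state.
    print("precharge, gated :", gated_inverter(0, prech=1))
    print("precharge, plain :", plain_inverter(0))
    # Evaluation: the cell inverts.
    print("evaluation, x = 0:", gated_inverter(0, prech=0))
    print("evaluation, x = 1:", gated_inverter(1, prech=0))
```

The cell is not a positive function of its inputs (raising x lowers the output when prech = 0), which is why races between x and the precharge signal may create glitches, as stated above.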
Figure 6. Schematic of the QDI secured AND gate (left) and its internal 3OR architecture (right).
4) WDDL w/o EE: WDDL w/o EE is a logic style dedicated to FPGA that removes the EE without computing a rendezvous. Instead, each functional half gate receives the true and false inputs, and decides to output the VALID value only when all the inputs are VALID. This behavior can be achieved by a purely combinatorial gate [23]. The detailed rationale behind the "WDDL w/o EE" style is the following:
• The gate outputs NULL{0,1} when the inputs are NULL{0,1} or transitional from this value.
• The gate outputs VALID only when all the inputs are VALID.
• In case of inconsistent values w.r.t. the DPL convention, the gate outputs an arbitrary NULL value.
This logic does not evaluate early by design, and propagates errors: if any input is stuck at NULL or if the input is out of specification, then the output always remains NULL too. In addition, this logic does not generate glitches even if the functionality is not positive, and it can be inverting. Therefore, the synthesis is more optimized than for plain WDDL.
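The three rules listed above can be captured by a small behavioral model. The sketch below is only a functional description, not the combinatorial implementation of [23]; the name `gate_without_ee` and the choice of (0, 0) as the "arbitrary" NULL returned for inconsistent inputs are assumptions.

```python
NULL0, NULL1 = (0, 0), (1, 1)

def is_valid(pair):
    """A dual-rail pair is VALID when its two wires differ."""
    return pair[0] != pair[1]

def gate_without_ee(f, inputs):
    """Behavioral model of a dual-rail gate that never evaluates early.

    f      : Boolean function on the true-rail bits, returning 0 or 1
    inputs : list of dual-rail pairs (true wire, false wire)
    """
    if all(is_valid(p) for p in inputs):
        q = f(*(t for t, _ in inputs))
        return (q, 1 - q)
    nulls = {p for p in inputs if not is_valid(p)}
    if nulls == {NULL1}:
        return NULL1  # inputs are NULL1 or transitioning from it
    # Inputs are NULL0 or transitioning from it; an inconsistent mix of
    # NULL0 and NULL1 also lands here (arbitrary choice for this sketch).
    return NULL0

if __name__ == "__main__":
    and2 = lambda a, b: a & b
    nand2 = lambda a, b: 1 - (a & b)  # inverting functions are allowed
    print(gate_without_ee(and2, [(1, 0), (1, 0)]))
    print(gate_without_ee(and2, [(0, 1), (0, 0)]))
    print(gate_without_ee(nand2, [(1, 0), (1, 0)]))
```

In the second call, input a = 0 would be enough for a WDDL AND gate to evaluate, but this model keeps the output NULL until b is valid as well.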
IV. TECHNOLOGICAL SPECIFIC DPL STYLES
A. Full Custom Optimizations
In 2002, Kris Tiri introduces the "Sense Amplifier Based Logic" (SABL) style [24], [25], whose aim is to make the power consumption independent of both the logic values and the sequence of the data. It is therefore the first DPL proposal. Its principle consists in combining Differential and Dynamic Logic (DDL), as in the "Dynamic Cascode Voltage Switch Logic" (DCVSL) style, while fixing the second-order asymmetry in the gate (especially for complex logic functions) due to parasitic capacitance [26]. This makes it possible to decorrelate the power consumption from the inputs. In 2006, Marco Bucci et al. [27] show that the balance of DPL gates can be improved by adding a systematic discharge after the evaluation. The resulting computations are thus based on a ternary pace: (1) pre-charge, (2) evaluation and (3) post-discharge. When applied to SABL, simulations reveal a gain of two orders of magnitude in terms of balance.
SecLib is a full-custom logic style, depicted in Fig. 6, introduced in 2004 by Sylvain Guilley et al. [28]. This logic is based on quasi-delay-insensitive asynchronous primitives that are balanced to provide constant evaluation and precharge time and dissipation. Specially crafted transistor-level symmetry grants SecLib a higher level of resistance to attacks than WDDL, albeit at a high cost in terms of silicon area [29], [30], [31].
In [32] , Loïc Duflot et al. describe an optimization for SecLib. The core idea, detailed by Fabien Germain [33] , is to balance the computation thanks to conflict logic after an input configuration decoding stage.
In 2005, SABL and "Dynamic Current Mode Logic" (DyCML) [34] are compared by François Macé et al. [35]. In DyCML, only one of the output nodes is discharged during the precharge phase. This leads to better performance, such as a reduction of the power-delay product by 80 % and of the power consumption by 50 %. In addition, DyCML is assessed to be more resistant to DPA than SABL.
Recently, Francesco Regazzoni et al. explore the resistance of "MOS Current Mode Logic" (MCML) against DPA [36] , [37] up to simulated attacks. Preliminary results show that MCML has a strong potential for protecting circuits.
B. Asynchronous Logic
Some asynchronous logic styles operate in a DPL mode. If the netlist and its layout are additionally balanced, asynchronous styles can be candidates for secure computing [38], [39], [40]. In addition to the protection against side-channel attacks, asynchronous logics are also more tolerant to environmental variations, which makes them inherently more difficult to attack with fault injection.
C. Reversible Differential Logic
Reversible logic is a means to compute without losing energy at any step. This implies that, at any moment of the computation, the operations may be reversed. Two precursors in this field of research were T. Toffoli [41] and E. Fredkin [42].
They proved that the concept of reversible computing was indeed physically realizable, provided that the function to implement is logically reversible. Basically, they demonstrated that any bijection can be mapped onto a reversible physical system. However, two difficult issues were left uncovered by their works: 1) a generic synthesis method for arbitrary bijections, as well as an algorithm to provide the most compact netlist, is still to be found, and 2) an integrable electronic system suitable for the implementation of reversible logic is lacking. Indeed, the only concrete example illustrating Toffoli and Fredkin's work was the famous albeit impractical "Billiard Ball" model, which cannot extend to the thousands of interactions required by our modern computational needs. The first question has received some answers [43], [44]. The second point has been covered by some researchers, for instance in [45], where the authors describe some implementations in CMOS for representative reversible logic gates.
V. DPL STYLES COMPARISON
Table II draws up a comparison of the main DPL styles, in terms of principle, design constraints and performance, highlighting most of the known advantages (masking, synchronization) and drawbacks (primitives and back-end constraints, and technological bias) of such counter-measures.
Notes to Table II: MAJ(a, b, c) ≐ a · b + b · c + c · a. ‡ n_i is the maximum number of inverters amongst all combinatorial paths.
Masking allows to greatly reduce the technological bias, but also results in a significant increase of area. As a matter of fact, it requires at least a transformation of 2-input operations into a 3-input majority function (MDPL) or into a 4-input RSL gate (DRSL).
Synchronization on both precharge and evaluation is mandatory to avoid glitches and early propagation effects.
Primitive constraints induce a higher complexity, by reducing the set of usable functions (as in WDDL, where only positive functions are allowed), or by binding the designer to use specific functions that can be more area-consuming or slower than basic ones (SecLib, MDPL, DRSL).
Back-end constraints generate extra design work, as the P/R stage has to meet specific requirements to achieve a good balance between the T and F networks. They can also cause a loss of performance, as in STTL, where the synchronization signal must be manually made slower than the others, by adding delay elements between gates, in order to ensure that it always switches last.
Technological bias corresponds to the imbalance between the True and False networks. It encompasses the load, interconnect and CMOS structure differences. This is a significant source of information leakage, and must therefore be as low as possible to ensure a perfectly secure counter-measure.
VI. CONCLUSIONS
In this article, we presented the different DPL logic styles that aim at hiding the activity of cryptoprocessors in order to thwart side-channel attacks. Although DPL is an elegant way to obtain secure implementations, flaws exist at the logical and physical levels. The different logic styles are more or less able to counteract these negative effects, but often at the cost of a higher complexity or a more demanding back-end design. This paper gives an understanding of the main DPL styles and draws a comparison between them in order to help weigh their pros and cons. Research on new DPL styles is still active, with the aim of improving the robustness while keeping a good compromise with complexity and performance requirements.
