Abstract-In this paper, we present and evaluate a hardware implementation of the PRESENT block cipher secured against both side-channel analysis and fault attacks (FAs). The side-channel security is provided by the first-order threshold implementation masking scheme of the serialized PRESENT proposed by Poschmann et al. For the FA resistance, we employ the Private Circuits II countermeasure presented by Ishai et al. at Eurocrypt 2006, which we tailor to resist arbitrary 1-bit faults. We perform a side-channel evaluation using state-of-the-art leakage detection tests, quantify the resource overhead of the Private Circuits II countermeasure, subject the implementation to established differential FAs against the PRESENT block cipher, and reflect on the structural resistance of the countermeasure. This paper provides detailed instructions on how to achieve a secure Private Circuits II implementation for the data path as well as the control logic.
I. INTRODUCTION
THE presence of symmetric cryptography in embedded systems is necessary to achieve confidentiality and integrity for both the users and their data. Whereas modern ciphers offer strong, computationally unbreakable security on the algorithmic level, many of their implementations are susceptible to physical attacks. Physical attacks exploit characteristics of the implementation itself in order to retrieve secret keys with much more ease compared with cryptanalytic or brute-force attacks.
Two well-known classes of physical attacks are side-channel analysis (SCA) and fault attacks (FAs). In SCA, a passive adversary observes, e.g., the time duration [1], the power consumption [2], or the electromagnetic emanation [3], [4] of the cryptosystem in an attempt to obtain information about sensitive values of the computed algorithm. Popular approaches to mount such attacks are differential power analysis (DPA) [2] and correlation power analysis [5]. FAs, on the other hand, require an active adversary, that is, the adversary tampers with the device and exploits its behavior afterward [6]. An attack that illustrates the power of FAs is differential fault analysis (DFA) [7]. SCA and FA can be combined to form even more powerful attacks [8]-[12].
The authors are with KU Leuven, Leuven, Belgium, and also with imec, 3001 Leuven, Belgium (e-mail: thomas.decnudde@kuleuven.be; svetla.nikova@kuleuven.be).
Digital Object Identifier 10.1109/TVLSI.2017.2713483
Countermeasures are required to harden embedded systems against such real-world attacks. Similar to the classification of the attacks, countermeasures have been predominantly researched separately. For SCA, a popular and widely implemented countermeasure is masking [13] , [14] . It provides provable security [15] by randomizing the computed intermediates on the algorithmic level in order to decorrelate secret values from their side channels. Countermeasures against FAs generally rely on some form of redundancy. Either a given computation is repeated (time redundancy) or the same operation is computed in parallel (area redundancy) in order to check whether a fault was injected [16] . Appending error correction or detection codes to the intermediate values forms another option for increasing the system's security [17] . Yet another approach is to rely on shielding countermeasures, e.g., shielding parts of an integrated circuit (IC) to prevent optical attacks [18] . In case tampering is detected, an alarm can be triggered to withhold the faulty ciphertexts from being output to prevent DFA attacks.
An interesting approach toward provable resistance against both passive and active attacks is Private Circuits II (PC-II) [19] . It is an extension of Private Circuits (PC-I) [or the Ishai, Sahai and Wagner (ISW) masking scheme] [20] , on which it relies to obtain protection from passive attacks.
A practical drawback of the ISW algorithm is that its security relies on ideal gates, i.e., gates that evaluate only once per clock cycle and in the right order. Satisfying the ideal gate requirement in CMOS logic is costly and failing to do so leads to a deterioration in the security of the masking scheme through glitches and early evaluation [21] . As an alternative to ISW, the threshold implementations (TIs) masking scheme [22] - [24] has gained in popularity for hardware applications, since it does not require ideal gates.
A. Related Work
While Private Circuits [20] and Private Circuits II [19] were introduced years ago, their most notable implementation-related aspects have only recently been published. We provide a brief overview of these advances.
A first move toward a PC-II implementation was made by Rakotomalala et al. [25] and presented at RECONFIG 2015. In summary, the authors implement the PC-II countermeasure with tamper resistance against reset attacks on a field-programmable gate array (FPGA). Their design is based on a manually coded, fully combinational PC-I implementation followed by a PC-II encoding using FPGA-specific primitives. In addition, an assessment of the security against critical path violations is provided. The underlying PC-I implementation has, however, been shown to contain security flaws [26], the reasons for which were pointed out by Roy et al. [27]. Undesired circuit optimization by tools in the design flow, glitches, and early propagation of signal values all have a noticeable impact on the security of masked implementations. By registering the output of every gate, a fully sequential implementation of Private Circuits was established as a secure but expensive solution.
At CRYPTO 2015, Reparaz et al. [28] presented a strategy for secure Private Circuits implementations in the presence of glitches. By inheriting the so-called noncompleteness property from TI, their strategy provides a cheaper and more efficient PC-I implementation compared with the fully sequential realization of [27] .
A different approach toward combined SCA and FA security was presented by Schneider et al. [29] at CRYPTO 2016. By combining TIs with error detection, the lightweight LED cipher [30] was implemented to achieve the first-order SCA resistance in combination with a fault detection capability that exceeds duplication at a very low increase in area cost.
B. Contribution
This paper builds upon and extends our work presented at FDTC 2016 [31], where we compared PC-I and TI in an equal and fair setting to obtain a novel and more efficient approach toward PC-II. Due to limitations with respect to the size of the targeted FPGA, no practical PC-II implementation was possible, and instead, that work focused primarily on a comparative study between PC-I and TI as a base for a PC-II implementation. In addition, a theoretical description was offered on how to securely implement Private Circuits II in FPGAs.
In this extended version, we apply our initial theoretical results in practice on a larger FPGA platform, where the PC-II implementation benefits from larger lookup tables (LUTs). Our results are applied to the lightweight PRESENT block cipher [32] to achieve combined first-order SCA security and resistance against arbitrary 1-bit fault injection, i.e., a 1-bit set, reset, or toggle attack. Since the PC-II implementation can be accommodated in the newly targeted FPGA, we advance the previous work by performing the security evaluation, confirming that the total PC-II design indeed offers SCA resistance. In addition, we provide implementation costs of our design in gate equivalents using an open-source library, allowing for convenient comparison with other implementations. Another novel addition is an extended security analysis that evaluates the success of established DFA attacks on PRESENT. Moreover, we discuss how to further increase the FA resistance of our PC-II implementation by relying on structural and topological aspects of the back-end placement and routing steps. While applying the PC-II extensions over TI, we detail every step taken to provide a guide for future PC-II implementations.
C. Paper Organization
In Section II, we give the necessary theoretical background with respect to the PRESENT algorithm, the TIs masking scheme, PC-II, and leakage detection. We implement the TI of PRESENT as in [33] and give the result of its leakage detection test in Section III, before we apply PC-II in Section IV. We discuss the implementation cost and the security of our resulting implementation against known DFAs in Section V. We draw conclusions and propose directions for future work in Section VI.
II. PRELIMINARIES
We now give an overview of the required background theory. Due to the vast information given in the original Private Circuit works [19] , [20] , we only provide the information that we use throughout this paper. We start by providing our notation conventions and definitions.
Using the standard convention that a dth-order SCA attack exploits the dth-order statistical moment of the shares, a sensitive value can be split into at least d + 1 new values, or shares, such that all shares are required to reconstruct the original value. If the reconstruction operation is the Boolean addition, this process is referred to as Boolean masking. This way, a (d + 1)th-order DPA attack needs to be mounted to break the dth-order masked implementation. Since the number of measurements needed for a successful attack increases exponentially with the masking order, one typically guarantees security only up to a certain order.
Lowercase letters represent the sensitive values in $GF(2^m)$. These elements are then split into $s_{in}$ shares $\mathbf{a} = (a_1, a_2, \ldots, a_{s_{in}})$ using Boolean masking. The sharing is said to be uniform if $s_{in} - 1$ shares are assigned from a uniform random distribution and the remaining share is chosen to satisfy $a_{s_{in}} = a \oplus a_1 \oplus \cdots \oplus a_{s_{in}-1}$.
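As a minimal sketch of the uniform Boolean masking described above, the following Python fragment splits a value into shares and reconstructs it. The function names `share` and `reconstruct` and the default 4-bit width are illustrative assumptions, not part of the original design.

```python
import secrets
from functools import reduce
from operator import xor

def share(value, n_shares, width=4):
    """Split `value` (an element of GF(2^width)) into n_shares Boolean shares.

    The first n_shares - 1 shares are drawn uniformly at random; the last
    share is chosen so that the XOR of all shares equals `value`.
    """
    shares = [secrets.randbits(width) for _ in range(n_shares - 1)]
    shares.append(reduce(xor, shares, value))
    return shares

def reconstruct(shares):
    """Recombine Boolean shares by XORing them together."""
    return reduce(xor, shares, 0)

if __name__ == "__main__":
    masked = share(0xA, 3)
    print(masked, hex(reconstruct(masked)))
```

A first-order design in the sense of this paper would use at least d + 1 = 2 shares; the three shares in the usage line match the three input shares mentioned for the PRESENT-TI.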
Boolean addition (XOR) of a and b is denoted by a ⊕ b, their bitwise multiplication (AND) by ab, and the inversion of a is denoted by ¬a.
We adopt a modified d-probing adversary model that includes glitches. In the d-probing model with glitches, the adversary is allowed to probe d wires in a circuit per clock cycle and observes all values that occur on those wires during a computation, even temporary or intermediate ones. We modify the d-probing adversary of [20] slightly by not allowing adaptive probes, i.e., we do not allow the attacker to move probes within a clock cycle or across clock cycle boundaries. While this adapted model might appear to reduce the security, it has been shown to be more realistic in hardware scenarios. Moreover, the ability of our adversary to observe all intermediate values on a wire results in a more flexible model, which keeps the security of our implementation unaffected by effects such as glitches in CMOS.
A. PRESENT Block Cipher
The PRESENT symmetric key block cipher [32] is designed with the heavy constraints on area and performance of lightweight hardware applications in mind. As a result, it is aggressively optimized for hardware environments and forms an ideal candidate for Internet-of-Things applications. It was made an ISO standard in 2012. Its block length equals 64 bits and key lengths of 80 and 128 bits are supported, referred to as PRESENT-80 and PRESENT-128, respectively, of which PRESENT-80 is recommended for lightweight applications and is the target of our implementation. The PRESENT cipher iterates through 31 rounds followed by a final key whitening stage. Each round consists of a round key addition and a substitution-permutation network. The permutation layer is a bitwise rewiring governed by $p_{out}(i) = p_{in}(16i \bmod 63)$ and comes at no cost in hardware. The substitution layer applies a 4-bit S-box $S : GF(2^4) \rightarrow GF(2^4)$ on each nibble of the state registers. We refer to the original work for the full details [32].
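For orientation, the round structure can be sketched as an unprotected software reference model of PRESENT-80. This is only an illustrative model under the bit-numbering convention of the specification [32], in which state bit i is moved to position 16i mod 63 (bit 63 stays in place); it is not the serialized hardware architecture evaluated in this paper, and the function names are assumptions.

```python
# Unprotected PRESENT-80 reference model (software sketch, not the TI design).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
MASK80 = (1 << 80) - 1

def sbox_layer(state):
    """Apply the 4-bit S-box to each of the 16 nibbles of the 64-bit state."""
    out = 0
    for n in range(16):
        out |= SBOX[(state >> (4 * n)) & 0xF] << (4 * n)
    return out

def p_layer(state):
    """Bitwise rewiring: bit i moves to position 16*i mod 63 (bit 63 is fixed)."""
    out = 0
    for i in range(64):
        dest = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << dest
    return out

def update_key(key, round_counter):
    """PRESENT-80 key schedule step on the 80-bit key register."""
    key = ((key << 61) | (key >> 19)) & MASK80               # rotate left by 61
    key = (SBOX[key >> 76] << 76) | (key & ((1 << 76) - 1))  # S-box on top nibble
    return key ^ (round_counter << 15)                       # add round counter

def present80_encrypt(plaintext, key):
    """31 rounds followed by the final key whitening."""
    state = plaintext
    for rnd in range(1, 32):
        state ^= key >> 16          # round key = 64 leftmost key bits
        state = p_layer(sbox_layer(state))
        key = update_key(key, rnd)
    return state ^ (key >> 16)

if __name__ == "__main__":
    print(f"{present80_encrypt(0, 0):016X}")
```

With the all-zero key and all-zero plaintext, this model should reproduce the test vector 5579C1387B228445 published with the cipher specification.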
B. Threshold Implementations
The TIs masking technique achieves provable security against the dth-order DPA attacks. By making minimal assumptions on the underlying hardware, security can be attained at any chosen order d even in the presence of glitches [24] .
The transformation of a circuit C to a dth-order TI [22]-[24], [34] C' is obtained as follows.
1) Input Encoding: Each sensitive value is split into $s_{in} \geq d + 1$ shares using uniform Boolean masking. The outputs of the component functions should be constructed to satisfy uniformity, just as the inputs do. Several ways to achieve this property have been proposed [33]-[38].
3) Output Decoding: The output value is reconstructed from the output shares by unmasking $c = \bigoplus_i c_i$.
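To illustrate why uniformity of the output shares has to be checked explicitly, the sketch below uses the widely cited three-share first-order sharing of a single AND gate from the TI literature. It is an assumption chosen for illustration and not a component of the PRESENT design: each output share omits one input share, the outputs recombine correctly, yet the output sharing is not uniformly distributed.

```python
from collections import Counter
from itertools import product

def ti_and(a, b):
    """Three-share sharing of c = a AND b; each output omits one input share."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    c1 = (a2 & b2) ^ (a2 & b3) ^ (a3 & b2)  # independent of share 1
    c2 = (a3 & b3) ^ (a1 & b3) ^ (a3 & b1)  # independent of share 2
    c3 = (a1 & b1) ^ (a1 & b2) ^ (a2 & b1)  # independent of share 3
    return (c1, c2, c3)

def sharings(bit):
    """All three-share Boolean sharings of a single bit."""
    return [(s1, s2, bit ^ s1 ^ s2) for s1, s2 in product((0, 1), repeat=2)]

def is_correct():
    """The XOR of the output shares equals a AND b for every input sharing."""
    return all(
        (c[0] ^ c[1] ^ c[2]) == (a & b)
        for a, b in product((0, 1), repeat=2)
        for sa in sharings(a) for sb in sharings(b)
        for c in [ti_and(sa, sb)]
    )

def is_uniform(a, b):
    """Uniform means each of the 4 valid output sharings occurs 16 / 4 = 4 times."""
    dist = Counter(ti_and(sa, sb) for sa in sharings(a) for sb in sharings(b))
    return all(dist[s] == 4 for s in sharings(a & b))

if __name__ == "__main__":
    print("correct:", is_correct())
    print("uniform for a = b = 0:", is_uniform(0, 0))
```

In a design where such outputs feed another nonlinear function, uniformity would have to be restored, for instance through one of the techniques cited above.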
C. Private Circuits II
Private Circuits II is an extension of the Private Circuits masking scheme that adds protection against an adversary capable of modifying values anywhere in a circuit, on a chosen number of individual wires between logical gates [19] . In contrast to other FA countermeasures [39] , no part of the circuit needs to be completely free from tampering.
Private Circuits II comes in two styles: 1) tamper resistance against an unbounded number of adaptive reset-only wire faults and 2) tamper resistance against a bounded number e of arbitrary wire faults (set, reset, or toggle) per clock cycle. In both constructions, an (optionally infective [40]) circuit is achieved, which resists both FAs and SCA attacks. Two transformations are carried out to transform a circuit C to a tamper resistant circuit C': one for the circuit itself and one for the data. The starting point is a side-channel resistant circuit, which we achieve through TIs instead of the originally proposed Private Circuits [31].
Since we protect the PRESENT block cipher against any type of fault, we limit our overview to the more general PC-II construction.
1) Tamper Resistance Against General Attacks on Wires:
In this model, the adversary can set the values in any of the circuit wires to 0 or 1, as well as toggle its value.
A limit e on the number of newly targeted wires per clock cycle is imposed, but the attacker is allowed to release the perturbations without restrictions on the number or time of release. Hence, persistent or permanent faults are only counted on their introduction in the circuit.
To achieve the tamper resistance, a circuit is first transformed into a dth-order SCA resistant circuit. Afterward, a 2de repetition encoding is applied to all data values and the gates are replaced by the so-called gadgets operating on these encoded values.
These actions are formalized as follows.
Fully infective behavior within one clock cycle is achieved by following the structure shown in Fig. 1, where the error cascading gadget is listed in Table I. We recall that this stage is optional [19].
4) Output Decoding: The final output can be decoded by ignoring all but the first of the 2de output wires.
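The encoding and gadget behavior can be sketched with a small behavioral model for the case of wire pairs. The model rests on assumptions: a valid pair is 00 or 11, the invalid signal is represented as 01, and the AND gadget outputs the invalid signal whenever one of its inputs is invalid. The gadget truth tables of the original work are not reproduced here, and all names are illustrative.

```python
INVALID = (0, 1)  # assumed wire-pair representation of the invalid signal

def encode(bit):
    """Repetition encoding of one bit on a wire pair."""
    return (bit, bit)

def is_valid(pair):
    return pair[0] == pair[1]

def decode(pair):
    """Output decoding: ignore all but the first wire."""
    return pair[0]

def and_gadget(a, b):
    """Behavioral AND gadget: propagate the invalid signal on invalid inputs."""
    if is_valid(a) and is_valid(b):
        return encode(a[0] & b[0])
    return INVALID

def inject(pair, wire, kind):
    """Apply a set, reset, or toggle fault to one wire of a pair."""
    wires = list(pair)
    if kind == "set":
        wires[wire] = 1
    elif kind == "reset":
        wires[wire] = 0
    else:  # toggle
        wires[wire] ^= 1
    return tuple(wires)

def single_fault_outcomes():
    """Classify the gadget output for every single-wire fault on its inputs."""
    outcomes = set()
    for a in (0, 1):
        for b in (0, 1):
            for target in ("a", "b"):
                for wire in (0, 1):
                    for kind in ("set", "reset", "toggle"):
                        ea, eb = encode(a), encode(b)
                        if target == "a":
                            ea = inject(ea, wire, kind)
                        else:
                            eb = inject(eb, wire, kind)
                        out = and_gadget(ea, eb)
                        if out == encode(a & b):
                            outcomes.add("correct")
                        elif not is_valid(out):
                            outcomes.add("invalid")
                        else:
                            outcomes.add("wrong-but-valid")
    return outcomes

if __name__ == "__main__":
    print(sorted(single_fault_outcomes()))
```

Under these assumptions, a single-wire fault either has no effect or yields the invalid signal; it never produces a valid encoding of a wrong value.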
D. T-Test-Based Leakage Detection
One way to test for potential side-channel leakages, which might lead to successful key retrieval attacks in cryptographic systems, is based on the t-test statistic [41]-[43]. This method is convenient due to its independence of an underlying leakage model while still being sensitive enough to uncover a wide range of potential problems. After acquiring a sufficient number of power consumption traces, the traces are divided in two sets, A and B, based on an intermediate value in the computation. Throughout this paper, we employ the nonspecific t-test, which fixes the input for one of the sets (we choose to fix the input message to zero) while randomizing the input message for the other set. For testing leakage in the first-order statistical moment, the t-test statistic is calculated samplewise on the two sets A and B as
$$t = \frac{\bar{T}_A - \bar{T}_B}{\sqrt{\frac{s_A^2}{N_A} + \frac{s_B^2}{N_B}}}$$
where $\bar{T}_i$, $s_i^2$, and $N_i$ are the sample mean, sample variance, and sample size of the set $T_i$, $i \in \{A, B\}$, respectively. If no t-test value exceeds a certain confidence threshold ±C, no relation between the processed intermediate value and the mean instantaneous power consumption can be found. When the t-test statistic exceeds ±C, the power consumption and the processed intermediate values are related in a statistically significant way with a confidence level related to C, making the device potentially vulnerable to first-order SCA attacks. Throughout this paper, we set the confidence threshold to ±C = ±4.5, which corresponds to a 99.999% certainty of the concluded outcome.
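A minimal sketch of the samplewise first-order test is given below, assuming the standard Welch form of the t statistic and traces stored as equal-length lists of samples. The function names and the tiny example data are hypothetical and are not measurements from this work.

```python
from math import sqrt
from statistics import mean, variance

THRESHOLD = 4.5  # confidence threshold C used throughout the paper

def welch_t(set_a, set_b):
    """Samplewise t statistic between two sets of traces.

    Each set is a list of traces; each trace is a list of samples.
    Assumes at least two traces per set and a nonzero pooled variance
    at every sample point.
    """
    t_values = []
    for col_a, col_b in zip(zip(*set_a), zip(*set_b)):
        numerator = mean(col_a) - mean(col_b)
        denominator = sqrt(variance(col_a) / len(col_a)
                           + variance(col_b) / len(col_b))
        t_values.append(numerator / denominator)
    return t_values

def leaks(t_values, threshold=THRESHOLD):
    """True if any sample point exceeds the confidence threshold."""
    return any(abs(t) > threshold for t in t_values)

if __name__ == "__main__":
    fixed = [[1.0, 5.0], [2.0, 5.5], [3.0, 6.0]]    # hypothetical set A
    random_ = [[2.0, 5.0], [4.0, 5.5], [6.0, 6.0]]  # hypothetical set B
    t = welch_t(fixed, random_)
    print(t, leaks(t))
```

For higher-order tests the traces would first be preprocessed, for example by centering and squaring for the second order, before the same statistic is applied.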
III. MASKING PROCESS
In this section, we reproduce and evaluate the TI of PRESENT proposed by Poschmann et al. [33] in order to assert a sound, SCA resistant basis for our PC-II implementation. We implement the first-order secure PRESENT with both a masked state and key, and test the result with state-of-the-art leakage detection tests.
A. Masking PRESENT With Threshold Implementations
The round key addition and permutation of the cipher can be performed on each share independently due to the linearity of these operations. The S-box operation is correspondingly harder to mask due to its nonlinearity. We therefore direct our attention to masking the nonlinear PRESENT S-box, which performs the substitution S(x) given in Table II.
A compact way of sharing the PRESENT S-box is proposed by Poschmann et al. [33]. It decomposes the S-box S(x) (of algebraic degree three) into two functions g = G(x) and f = F(x) (each of algebraic degree two) such that S(x) = F(G(x)). The S-box and its decomposed functions G and F are given in Table II. In order to guarantee the uniformity at the input of F, the evaluation of the nonlinear functions G and F needs to be separated by registers. The total S-box is then computed in two clock cycles. The algebraic normal forms of $(g_3, g_2, g_1, g_0)$ and $(f_3, f_2, f_1, f_0)$ are listed in the following, where $x_3$, $g_3$, and $f_3$ represent the most significant bits:
\begin{align*}
g_3 &= x_2 \oplus x_1 \oplus x_0 \\
g_2 &= 1 \oplus x_2 \oplus x_1 \\
g_1 &= 1 \oplus x_3 \oplus x_1 \oplus x_2 x_0 \oplus x_1 x_0 \\
g_0 &= 1 \oplus x_0 \oplus x_3 x_2 \oplus x_3 x_1 \oplus x_2 x_1 \\
f_3 &= g_2 \oplus g_1 \oplus g_0 \oplus g_3 g_0 \\
f_2 &= g_3 \oplus g_1 g_0 \\
f_1 &= g_2 \oplus g_1 \oplus g_3 g_0 \\
f_0 &= g_1 \oplus g_2 g_0
\end{align*}
For the shared versions of these equations, we refer to the Appendix.
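The decomposition S(x) = F(G(x)) can be checked exhaustively over all 16 inputs. The sketch below assumes the standard PRESENT S-box table; the algebraic normal form used for G is the one obtained by collapsing the shared equations of the Appendix into unshared form, and the form used for F is the quadratic function that completes the decomposition. Both should be treated as assumptions to be compared against Table II.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def bits(value):
    """Return (bit0, bit1, bit2, bit3) of a 4-bit value; bit3 is the MSB."""
    return tuple((value >> i) & 1 for i in range(4))

def pack(b3, b2, b1, b0):
    return (b3 << 3) | (b2 << 2) | (b1 << 1) | b0

def G(x):
    """First quadratic stage of the decomposition (assumed ANF)."""
    x0, x1, x2, x3 = bits(x)
    g3 = x2 ^ x1 ^ x0
    g2 = 1 ^ x2 ^ x1
    g1 = 1 ^ x3 ^ x1 ^ (x2 & x0) ^ (x1 & x0)
    g0 = 1 ^ x0 ^ (x3 & x2) ^ (x3 & x1) ^ (x2 & x1)
    return pack(g3, g2, g1, g0)

def F(g):
    """Second quadratic stage of the decomposition (assumed ANF)."""
    g0, g1, g2, g3 = bits(g)
    f3 = g2 ^ g1 ^ g0 ^ (g3 & g0)
    f2 = g3 ^ (g1 & g0)
    f1 = g2 ^ g1 ^ (g3 & g0)
    f0 = g1 ^ (g2 & g0)
    return pack(f3, f2, f1, f0)

def decomposition_holds():
    """Check S(x) = F(G(x)) for every 4-bit input."""
    return all(F(G(x)) == SBOX[x] for x in range(16))

if __name__ == "__main__":
    print("G:", [f"{G(x):X}" for x in range(16)])
    print("F:", [f"{F(x):X}" for x in range(16)])
    print("S = F(G(x)):", decomposition_holds())
```

Because G and F each have algebraic degree two, either stage can be shared with three shares at first order, which is what motivates splitting the cubic S-box in the first place.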
B. Testing PRESENT-TI With Leakage Detection Tests
We perform the evaluation of the security by loading the resulting implementation onto a SAKURA-G board [44] . The SAKURA-G (Side-Channel Attack User Reference Architecture) provides an SCA evaluation environment based around two Xilinx Spartan-6 FPGAs: an XC6SLX75 to hold our cryptographic implementations and an XC6SLX9 for controlling the communication between the board, the measurement PC, and other equipment. During synthesis, we select the Xilinx "Keep Hierarchy" option to avoid optimizations over boundaries of the individual shares. We keep all other synthesis and implementation options to their default value.
We provide a stable clock of 3 MHz to the FPGAs and measure the instantaneous power consumption as the voltage drop over a 1-Ω resistor between the ground lines of the crypto FPGA core and the board. We acquire the power traces with a Tektronix DPO 7254C oscilloscope at a sample rate of 1 GS/s. We submit our designs to t-test-based leakage detection [41]-[43], since it provides a very powerful way to ascertain side-channel resistance.
To show that the security of our design uniquely comes from a sound masked implementation, we go through the following steps. We first bias the uniformity of the input shares by turning OFF the random number generator (RNG), fixing one of the three initial input shares to the plaintext, while setting the remaining shares to zero. In that case, we expect that the leakage detection test will reveal t-values beyond the confidence interval of ±4.5. If this expectation is fulfilled, we assume that the soundness of the test setup is asserted and proceed with the evaluation by turning on the RNG of our designs. The lack of bias in uniformity, and thus the correct application of all properties of the masking schemes, is then exclusively responsible for any increase in SCA resistance.
For the PRESENT TI, this gives the following security evaluation.
1) RNG Off:
The result of the leakage detection tests for the PRESENT-TI with biased masks is shown in Fig. 2 . With 20k traces, the t-value goes beyond the confidence interval of ±4.5, and we can conclude that our measurement setup is sound.
2) RNG On: The result of the leakage detection test on PRESENT-TI with the activated RNG is shown in Fig. 2. Leaks are present in the second-order t-test, while no first-order t-values fall outside the confidence interval. Our PRESENT-TI implementation achieves the targeted first-order security with 100 million traces and provides us with a solid foundation for a PC-II implementation.
Fig. 2. Masked PRESENT-TI, from top to bottom: average power consumption trace of 1.5 rounds of a masked encryption, first-order t-test with biased masks using 20k traces, first-order t-test with uniform masks using 100M traces, and second-order t-test with uniform masks using 100M traces.
IV. APPLYING PC-II
We proceed by applying the remaining PC-II transformations on the PRESENT-TI. While our model assumes an attacker to only inject a 1-bit fault on a wire (e = 1), the explained process can be extended to any number of faulty bit injections.
A. Encoding
Transforming every wire into a wire pair is straightforward. All wires and registers are simply duplicated. In contrast to masking, where only the data-dependent logic and registers must be protected, this duplication has to be applied to all wires and registers, including the ones responsible for the control of the data flow. If their protection is overlooked, intermediate states from which the key can be trivially retrieved can be made to prematurely appear at the outputs through a careful fault injection. This is exemplified in Fig. 3 : activating the ready signal at the start of an encryption would make a key retrieval straightforward for an attacker.
B. Gate Transformation
After all wires and registers are encoded, we transform all gates in the circuit to their respective PC-II gadgets. In addition to the gates of the shared S-box, permutation, and round key addition, this transformation has to include the internal gates of the two adders, the multiplexers, and the comparators of the design as well. An example of this can be found in the structure of the PRESENT-TI control logic, which is shown in Fig. 4. The internal signals of, e.g., the multiplexers (shown in Fig. 5) have to undergo the encoding and the gate transformation too.
In Section II, it was noted that reversible NOT gates are required for PC-II in the general attack model. The reason for this is described in the original work [19] and boils down to being able to consider the NOT gates as part of atomic AND gates inside PC-II gadgets. We can omit the need for reversible NOT gates in our FPGA implementation by packing the logic of each single PC-II gadget inside individual six-input, dual-output LUTs of the Spartan-6 FPGA. With the original assumption that attacks are only performed on wires, the design remains a valid PC-II implementation as a whole LUT can be considered atomic. For standard cell ICs, this can be achieved by placing the atomic standard cells related to the same gadget adjacently and keeping their connections on the lowest routing layers.
We achieve this by expressing all operations in the hardware description language code of the PRESENT-TI using the following atomic gates: AND, XAND, OR, NOR, XOR, XNOR, and NOT gates. Since all these functions have at most four inputs and two outputs in their encoded form, we map these to the six-input, dual-output LUTs of the Spartan-6 FPGA using the Xilinx LUT_MAP constraint. This is an alternative to hard coding the LUT functionality as was done in the PC-II implementation of Rakotomalala et al. [25].
Fig. 6. Masked PC-II protected PRESENT, from top to bottom: average power consumption trace of 1.5 rounds of a masked encryption, first-order t-test with biased masks using 20k traces, first-order t-test with uniform masks using 100M traces, and second-order t-test with uniform masks using 100M traces.
C. Error Cascading
Error cascades are nonlinear gadgets that forward the input unless an invalid encoding is detected at one of or both the inputs. In that case, both its encoded outputs are made invalid (⊥).
We provide a toy example to explain the effect of the error cascading stage on the SCA security. Assume that we have two shares of a 1-bit value $s = s_1 \oplus s_2$, such that $s_1 = s \oplus r$ and $s_2 = r$, where r is a random bit drawn from a uniform distribution. These two values are passed through an error cascade, of which the function is given in Table I. Its underlying circuit is of the form OR of ANDs of all the shares. This can be seen by observing its four outputs, which contain product terms such as $s_{1,1} \neg s_{1,0} s_{2,1} \neg s_{2,0}$ that combine the encoded wires of both shares.
For the error cascading stage to be effective, it should pass all pairs of wires, as shown in Fig. 1 . This creates a combinational circuit that nonlinearly relates all shares, and invalidates the noncompleteness property. When using CMOS, glitching and early propagation of values can cause the circuit to potentially leak. Since this stage is optional [19] , we will omit its implementation.
D. Leakage Detection
The resulting implementation needs to satisfy the first-order SCA security. To test this claim, we follow the same approach as with our previous security evaluations for the PRESENT-TI.
1) RNG Off:
The result of the leakage detection tests for the biased PC-II protected PRESENT is shown in Fig. 6 . With 20k traces, the t-value goes beyond the confidence interval of ±4.5 and we can again conclude that our measurement setup is sound.
2) RNG On: The results of the leakage detection tests on the PC-II protected PRESENT with the activated RNG are shown in Fig. 6. Again, clear leaks are present in the second-order t-test, while no first-order t-values fall outside the confidence interval. The PC-II protected PRESENT implementation achieves the targeted first-order SCA security with 100 million traces.
E. Fault Attack Simulation
As mentioned in the example in Section IV-A, an attractive target for an attacker to inject a fault on is the ready signal. The ready signal guards the intermediate states from appearing prematurely at the output and activates the outputs only when the cipher has finished the right number of rounds. By validating this signal at the start of an encryption using a fault injection, intermediate states of the cipher become observable and cryptanalysis becomes feasible.
We now simulate this fault injection on the ready signal of the unprotected and PC-II protected PRESENT implementations. Fig. 7 shows the simulation traces of several signals of the unprotected PRESENT design. The clk signal represents the system clock, the start signal activates the start of an encryption, and the ready signal indicates when the output is available. We show the decoded input, key, and output shares by decoded_inp, decoded_key, and decoded_out, respectively. The ready signal in this example is stuck-at-1, i.e., every intermediate state is observable at the output of the cipher. By knowing the plaintext and observing the first intermediate result, the key can be obtained without much effort in an unprotected implementation. In Fig. 8 , we show the same simulation applied on the PC-II protected PRESENT. Since the ready signal and the bits of the state values pass through an AND gadget (see Fig. 3 ), their decoded output values become zero when a fault is present at their inputs. The invalid signal 01 on the encoded ready signal is detected, and the countermeasure succeeds in making the injected fault unexploitable.
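The effect of the ready-signal fault can be mimicked with a small behavioral model of the output gating. The sketch assumes that each output bit is the AND of a state bit and the ready signal, that wires are encoded as pairs, and that the invalid signal is 01 and decodes to zero. The 4-bit state and all function names are hypothetical.

```python
INVALID = (0, 1)  # assumed wire-pair representation of the invalid signal

def encode(bit):
    return (bit, bit)

def decode(pair):
    return pair[0]  # ignore all but the first wire

def and_gadget(a, b):
    """Behavioral AND gadget that propagates the invalid signal."""
    both_valid = a[0] == a[1] and b[0] == b[1]
    return encode(a[0] & b[0]) if both_valid else INVALID

def unprotected_output(state_bits, ready):
    """Plain output gating: state AND ready on single wires."""
    return [s & ready for s in state_bits]

def protected_output(state_bits, ready_pair):
    """Output gating through AND gadgets on encoded wire pairs."""
    return [decode(and_gadget(encode(s), ready_pair)) for s in state_bits]

intermediate_state = [1, 0, 1, 1]  # hypothetical intermediate state bits

# Unprotected design: ready (logically 0) is stuck-at-1, the state leaks.
leaked = unprotected_output(intermediate_state, ready=1)

# Protected design: ready is 0 and one wire of its pair is stuck-at-1.
blocked = protected_output(intermediate_state, (0, 1))

# Fault-free protected design at the end of the encryption (ready = 1).
released = protected_output(intermediate_state, encode(1))

if __name__ == "__main__":
    print("unprotected, faulted:", leaked)
    print("protected, faulted:  ", blocked)
    print("protected, no fault: ", released)
```

In this model the faulted protected design outputs zeros for every bit, while the fault-free design still releases the state once ready is validly asserted.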
Similarly, attacks on the data that are covered by our fault model will not succeed. Fig. 9 shows a correct and faulty encryption, where the fault is generated by flipping a bit in the first S-box lookup of the penultimate round. Fig. 10 shows this same 1-bit fault injected in the PC-II protected PRESENT. While not all ciphertext bits are zeroized due to the absent error cascading stage, the injected fault is detected and invalid values are propagated through parts of the circuit, resulting in several bits turning zero.
V. DISCUSSION
A. Area Cost
Table III lists the area costs of the individual components of both the TI and PC-II implementation of PRESENT. Table IV lists the area costs of the individual PC-II gadgets that we use in our PC-II design. All area estimations are obtained with the open-source library [45].
When going from a first-order SCA resistant TI to a first-order SCA resistant PC-II implementation resisting 1-bit faults, the area increases by a factor of 8.75. The bulk of this factor originates from the increase in area of the gadgets compared with their corresponding gates, since duplicating the registers only doubles their area cost.
B. Increased Resistance Against Differential Fault Analysis
We now evaluate the increased resistance against two DFA attacks on PRESENT: 1) a DFA on the key schedule introduced by Wang and Wang [46] and 2) a DFA on the internal state introduced by Bagheri et al. [47] .
1) DFA on the PRESENT Key Schedule: The attack on the key schedule uses a fault model where 4-bit random faults are injected in either the 30th or 31st round key. Once an attacker obtains (on average) 64 pairs of correct and faulty ciphertexts, the secret key can be retrieved with a complexity of $2^{29}$.
Since our implementation only provides protection against 1-bit faults, a successful key retrieval is possible when the attacker's power is outside the model. Some increased DFA resistance is, however, obtained from our implementation. First, an attacker now has to inject an 8-bit fault instead of a 4-bit fault and needs to make sure the 8-bit fault is an encoded version of the 4-bit fault; otherwise, the circuit will start to infectively erase values by propagating the invalid signal ⊥*. From a theoretical point of view, only $2^4$ of the $2^8$ 8-bit values lead to a useful fault injection. As a result, the fault injection procedure will require $2^4$ times more effort, leading to 1024 pairs of correct and faulty ciphertexts being needed on average for a successful key retrieval.
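The counting argument can be reproduced by enumeration. The sketch assumes that a 4-bit value is carried on four wire pairs and that an injected 8-bit pattern is useful to the attacker only if both wires of every pair receive the same value; the function name is illustrative.

```python
from itertools import product

def count_valid_encoded_patterns(nibble_bits=4):
    """Count wire patterns in which both wires of each pair carry the same value."""
    valid = 0
    total = 0
    for pattern in product((0, 1), repeat=2 * nibble_bits):
        total += 1
        pairs = zip(pattern[0::2], pattern[1::2])
        if all(first == second for first, second in pairs):
            valid += 1
    return valid, total

valid, total = count_valid_encoded_patterns()
effort_factor = total // valid       # 2^8 / 2^4
unprotected_pairs = 64               # average for the key schedule DFA [46]
protected_pairs = unprotected_pairs * effort_factor

if __name__ == "__main__":
    print(valid, total, effort_factor, protected_pairs)
```

The enumeration only models random 8-bit injections; an attacker with precise control over both wires of every pair would not pay this factor.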
2) DFA on the PRESENT State Registers: Bagheri et al. [47] present two DFAs on the PRESENT state registers. The first uses a fault on a single bit of the intermediate state at the start of the S-box layer of the last round. This attack leads to a retrieval of the last subkey with an average of 48 correct and faulty ciphertext pairs.
The first attack using the single bit fault falls within the bounds of what our implementation can secure and will always fail, as shown in Fig. 10. The second attack will require the 4-bit random fault to be extended to a restricted 8-bit fault similar to the condition of the previously studied DFA on the key schedule. The fault injection procedure for this second attack to succeed will require $2^4$ times more effort than the unprotected version, requiring on average 192 correct and faulty ciphertext pairs for a key retrieval.
C. Further Increased Resistance Against Differential Fault Analysis
The resistance against DFA can be further increased in the back-end design process. This is illustrated by the routing configurations of the wires shown in Fig. 11 . Encoded wires lying adjacently in the same layer can lead to easier fault injection than when the wires are separated in different metal layers. We leave the investigation of the effect of placement and routing on FA resistance for future work.
VI. CONCLUSION
In this paper, we implemented and evaluated a Private Circuits II implementation of the PRESENT block cipher based on the serialized TI of Poschmann et al. [33]. We obtained an implementation that resists combined first-order side-channel attacks and single bit FAs. The area cost compared with a side-channel resistant circuit increases by a factor of 8.75 and mainly originates from the complexity of the gadgets. While this cost is significant, our design benefits from an increase in attacker effort to mount differential FAs that fall outside our security model. In addition to the data path, the control logic is secured, leading to a protection against trivial attacks, e.g., revealing intermediate state values at the output of the circuits by injecting a fault in a well-chosen signal. Our implementation was subjected to state-of-the-art leakage detection tests, which it passed with power consumption traces corresponding to 100 million encryptions.
The Private Circuits II countermeasure is shown to be expensive. It is a succession of two separate transformations on a circuit: the first obtains SCA resistance and the second adds FA resistance. Turning this separation of transformations into a more integrated approach while designing a combined SCA and FA countermeasure is therefore a promising way to realize a less costly implementation, e.g., by drawing on established techniques from the field of secure multiparty computation.
As additional future work, we propose the investigation of the influence of placement and routing on FA resistance. We additionally propose the investigation of applying PC-II to an SCA resistant circuit that depends on an RNG, since our considered PRESENT-80 implementation does not consume any randomness during its execution.
APPENDIX
The algebraic normal forms of the shared functions G(x) and F(x) from Poschmann et al. [33] are listed in the following. A first subscript index represents the share and a second subscript indicates the bit of that share, with index 3 representing the most significant bit
\begin{align*}
G_1(x_2, x_3) &= (g_{1,3}, g_{1,2}, g_{1,1}, g_{1,0}) \\
g_{1,3} &= x_{2,2} \oplus x_{2,1} \oplus x_{2,0} \\
g_{1,2} &= 1 \oplus x_{2,2} \oplus x_{2,1} \\
g_{1,1} &= 1 \oplus x_{2,3} \oplus x_{2,1} \oplus x_{2,2} x_{2,0} \oplus x_{2,2} x_{3,0} \oplus x_{3,2} x_{2,0} \oplus x_{2,1} x_{2,0} \oplus x_{2,1} x_{3,0} \oplus x_{3,1} x_{2,0} \\
g_{1,0} &= 1 \oplus x_{2,0} \oplus x_{2,3} x_{2,2} \oplus x_{2,3} x_{3,2} \oplus x_{3,3} x_{2,2} \oplus x_{2,3} x_{2,1} \oplus x_{2,3} x_{3,1} \oplus x_{3,3} x_{2,1} \oplus x_{2,2} x_{2,1} \oplus x_{2,2} x_{3,1} \oplus x_{3,2} x_{2,1} \\
G_2(\cdot) &= (g_{2,3}, g_{2,2}, g_{2,1}, g_{2,0})
\end{align*}
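The correctness and noncompleteness of such a three-share sharing can be checked exhaustively. In the sketch below, the first component follows the equations listed for G_1. The second and third components are not legible in this copy and are assumed here to be cyclic rotations of the first over the share indices, with the constant 1 kept only in G_1. This is an assumption and should be checked against Poschmann et al. [33].

```python
from itertools import product

def bit(value, i):
    return (value >> i) & 1

def G(x):
    """Unshared G, obtained by collapsing the shared equations (assumption)."""
    x0, x1, x2, x3 = (bit(x, i) for i in range(4))
    g3 = x2 ^ x1 ^ x0
    g2 = 1 ^ x2 ^ x1
    g1 = 1 ^ x3 ^ x1 ^ (x2 & x0) ^ (x1 & x0)
    g0 = 1 ^ x0 ^ (x3 & x2) ^ (x3 & x1) ^ (x2 & x1)
    return (g3 << 3) | (g2 << 2) | (g1 << 1) | g0

def quad(u, v, p, q):
    """Part of the product u*v computable from shares p and q only."""
    return (u[p] & v[p]) ^ (u[p] & v[q]) ^ (u[q] & v[p])

def g_component(k, shares):
    """Component k (0-based) of the shared G; it never touches share k."""
    p, q = (k + 1) % 3, (k + 2) % 3
    # x[i][s] is bit i of share s
    x = [[bit(s, i) for s in shares] for i in range(4)]
    c = 1 if k == 0 else 0  # constant term only in the first component
    g3 = x[2][p] ^ x[1][p] ^ x[0][p]
    g2 = c ^ x[2][p] ^ x[1][p]
    g1 = c ^ x[3][p] ^ x[1][p] ^ quad(x[2], x[0], p, q) ^ quad(x[1], x[0], p, q)
    g0 = (c ^ x[0][p] ^ quad(x[3], x[2], p, q)
          ^ quad(x[3], x[1], p, q) ^ quad(x[2], x[1], p, q))
    return (g3 << 3) | (g2 << 2) | (g1 << 1) | g0

def shared_g_is_correct():
    """XOR of the three components equals G(x) for every sharing of every x."""
    for x, s1, s2 in product(range(16), repeat=3):
        shares = [s1, s2, x ^ s1 ^ s2]
        out = 0
        for k in range(3):
            out ^= g_component(k, shares)
        if out != G(x):
            return False
    return True

if __name__ == "__main__":
    print("shared G correct:", shared_g_is_correct())
```

Noncompleteness is visible in the index choice: component k reads only shares (k + 1) mod 3 and (k + 2) mod 3, so no single component depends on all three shares.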
