Abstract-Cryptocircuits can be attacked by third parties using differential power analysis (DPA), which uses power consumption dependence on data being processed to reveal critical information. To protect security devices against this issue, differential logic styles with (almost) constant power dissipation are widely used. However, to use such circuits effectively for secure applications, it is necessary to eliminate any security flaw in the form of memory effects that could leak information. This paper proposes a design methodology to improve the pull-down logic configuration of secure differential gates by redistributing the charge stored in internal nodes, thus removing memory effects that represent a significant threat to security. To evaluate the methodology, it was applied to the design of AND/NAND and XOR/XNOR gates in a 90 nm technology, adopting the sense amplifier based logic (SABL) style for the pull-up network. The proposed solutions leak less information than typical SABL gates, increasing security by at least two orders of magnitude and with negligible performance degradation. A simulation-based DPA attack on the Sbox9 cryptographic module used in the Kasumi algorithm, implemented with complementary metal-oxide-semiconductor, SABL and proposed gates, was performed. The results obtained illustrate that the number of measurements needed to disclose the key increased by much more than one order of magnitude when using our proposal. This paper also discusses how the effectiveness of DPA attacks is influenced by operating temperature and details how to ensure energy-secure operation in the new proposals.
I. INTRODUCTION

IN THE current information and communication technology based world, security is a major concern. Privacy is considered an important personal right. Commonly used devices like smart cards and other embedded devices need encryption technology to guarantee security. Encryption security is typically based on mathematically secure algorithms, designed to produce a ciphertext from a plaintext that cannot be mathematically attacked [1]. But even when such theoretical security is achieved, the physical implementation of the encryption algorithm leaks side-channel information that can be used by an attacker to reveal the secret key [2]-[6]. The physical implementations of cryptographic devices therefore have to be carefully considered.
Side channel attacks (SCAs) on cryptographic devices use certain physical information such as power consumption, time delay, or electromagnetic radiation to find the secret key [2] - [5] . SCAs can be noninvasive and usually require minimal equipment, hence they are easy to carry out [6] . Of all SCAs, differential power analysis (DPA) [4] , [6] - [8] , is one of the most powerful for its simplicity and effectiveness. DPA attacks are based on the well known fact that dynamic power consumption in a logic circuit is dependent on the data being processed by the device.
Thus, an attacker can obtain the secret key by measuring the power supply current of a cryptographic device while it is performing an encryption, and by statistically analyzing the measured power traces. Nanometric technologies with a drastic increase in leakage power are also vulnerable to similar leakage-associated attacks [9]-[11].
Since the first DPA attack on smart cards in 1999 [4], dozens of countermeasures have been proposed to deal with this type of intrusion. They are shown in Fig. 1. The earliest methods of combatting DPA, such as the incorporation of random power consuming operations [12] and the introduction of random delays [13], among others [14], proved generally ineffective, since they only slightly increase the number of measurements to disclosure (MTD) required to recover the secret key [6].
To maximize DPA attack prevention, numerous methods based on protecting cryptosystems at algorithm level have been presented, with some noteworthy solutions being based on duplication [15]-[17]. However, algorithm-based security techniques are very specific and difficult to automate, due to their heavy dependence on the specific cryptographic algorithm.
On the other hand, circuit-level countermeasures are more generic, since they are not constrained to one specific cryptographic algorithm. Once a practical method has been found, designers need worry no more about the security of implementations for a specific algorithm, and this makes automatic design feasible. This type of solution falls into two categories: gate-level masked circuits and complementary circuits.
Masking at gate level is analyzed in [18], and some implementations of masked gate circuits are also described in [12], [19]. For instance, in the random-switching-logic (RSL) implementation [19], a random signal is used to equalize output transition probability. The main weakness of these masking methods lies in the strict timing required. For example, output transitions of logic gates are dependent on input signals when glitches exist [20], and some cases have been reported of successful attacks on masked hardware implementations with glitches [21], [22].
The other circuit-level category, sometimes known as hiding [6] or complementary techniques, is the implementation of a logic circuit with power consumption theoretically independent of the data being processed, as proposed in [23]-[29]. The design of this kind of secure cell has been an ongoing obsession in the crypto-community, because such cells can be used for the hardware implementation of any kind of cryptographic algorithm for either public-key or private-key cryptosystems, regardless of the specific application. There are several approaches to creating hiding countermeasures at circuit level with complementary coding and data-independent power consumption. Those based on adiabatic logic, such as [30], offer relevant low-power security features, but adiabatic designs require precise timing (at least four supply-clock phases) and still need further development. To maximize hiding effects for security purposes using more conventional logic styles, dual-rail with precharge logic (DPL) families have been proposed to ensure that one computation is performed in every clock cycle with exactly the same transition probability for every input condition [31], [32].
In DPL, each Boolean variable $a$ is represented by a pair of wires $(a_t, a_f)$. When $a$ is valid, $a_t = \overline{a_f}$ and $a = a_t$. Signal $a$ is not valid if $a_t = a_f$. Every computation comprises one precharge (where $a$ is invalid), followed by one evaluation (where $a$ is valid), $a_t$ and $a_f$ being complementary. Whatever the input situation, exactly one and only one of $a_t$ and $a_f$ will toggle. The number of toggles per clock cycle is thus constant, as should be the overall power consumption.
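The constant-toggle property described above can be checked with a minimal Python sketch. It assumes that the invalid (precharge) state is the all-zero pair and that each Boolean is carried on a "true" wire and a "false" wire; the function names are illustrative and do not come from the paper.

```python
# Minimal sketch of dual-rail precharge encoding (assumption: precharge = (0, 0)).

PRECHARGE = (0, 0)  # invalid state: both wires hold the same value


def dual_rail(bit):
    """Encode a Boolean as a (true_wire, false_wire) pair of complementary values."""
    return (bit, 1 - bit)


def toggles(before, after):
    """Count the wires that change value between two wire pairs."""
    return sum(b != a for b, a in zip(before, after))


def toggles_per_cycle(bit):
    """Toggles in one full cycle: precharge -> evaluation -> precharge."""
    evaluated = dual_rail(bit)
    return toggles(PRECHARGE, evaluated) + toggles(evaluated, PRECHARGE)


# The toggle count is the same whichever logic value is evaluated.
counts = [toggles_per_cycle(b) for b in (0, 1)]
print(counts)
```

Under these assumptions both logic values produce the same number of wire toggles per cycle, which is the property a DPL family relies on to keep switching activity independent of the data.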
Considering the physical design technique used for DPL, there are two main categories: those based on standard-cell implementations, and those requiring full-custom implementations.
The classic example of standard-cell based implementation is the so-called wave dynamic differential logic (WDDL) [33]. In WDDL, the precharge value propagates from the inputs to the outputs, like a wave. Its major advantage is the use of a standard-cell flow, which facilitates the synthesis process. The main flaws are the early evaluation bias [32], which is difficult to mitigate, and glitch generation if WDDL is not implemented using positive functions. Some improvements to WDDL have been reported, such as MDPL [26], iMDPL [34], STTL [35], and BCDL [36]. In [37], WDDL was duplicated in such a way that the True and False networks were inverted between the two WDDL instances. Basically, standard-cell based secure gates are easily incorporated into FPGA implementations of cryptocircuits, but they usually offer lower security levels. Even a careful back-end process in ASIC does not ensure total security. For a complete description, and experimental results obtained by attacking a WDDL AES ASIC, see [38]. No further reference to this type of implementation will be made in this paper.
Turning to full-custom solutions, SecLib (Secure Library [39]) combines security countermeasures at protocol, architecture and back-end levels. At protocol level, the computations are divided into two steps: computation of one iteration, and the reinitialization of all the nets so that the circuit is ready to start a new computation afresh, for instance with all the nets in the same electrical state. At the architectural level, the resynchronization capabilities of quasi-delay-insensitive logic disable anticipated evaluation by synchronizing the inputs. Gate timing is thus independent of the data. Finally, the desired symmetry is acquired by having a fully balanced cell layout. Although very powerful, SecLib is an integral solution that is not easy to apply in standard synthesis processes, due to the requirements imposed by specific protocols and architectures.
The other full-custom solutions are those based on differential logic styles. It has been reported that, if carefully designed, placed and routed, these solutions provide the best results. The most relevant proposals have been Dynamic Current Mode Logic (DyCML) [40], Low-Swing Current Mode Logic (LSCML) [41], Sense Amplifier Based Logic (SABL) [23], Three-Phased Dual-Rail Pre-charge Logic (TDPL) [42], and Delay-Based Dual-Rail Precharge Logic (DDPL) [43]. Differential circuits exploit their inherent symmetry to ensure similar consumptions for "0" and "1" evaluations, since both the true and complemented outputs are generated simultaneously. Fig. 2 shows a simplified scheme for a dynamic differential logic style. Such logic styles comprise a differential pull-down network (DPDN) performing the logic function and a differential pull-up network (DPUN) working in alternate precharge and evaluation phases. They provide both the true and the complemented output in every clock cycle, with only one charging event, but care must be taken to ensure that a fixed amount of charge is used in every transition. TDPL and DDPL are insensitive to unbalanced routing capacitances [43], but TDPL needs a third clock phase [42], and DDPL needs a timing codification scheme requiring precise delay generation [43]. SABL [23] is a differential logic style that meets both requirements: it has one charge event and uniform capacitance charges. SABL achieves better results because its internal structure suppresses the influence of internal capacitances better than the reduced output swing used by DyCML and LSCML [27], [45]. Table I summarizes the features and performances of full-custom secure DPL techniques. Average energy and deviation in energy are taken from [27], [40]-[43]. Data are normalized to SABL results. The main drawbacks limiting the usage of the techniques for secure applications are also included in the table.
From the information given in Table I, it is clear that in energy-secure systems, SABL gates provide the best trade-off in hardware resources, power and security, especially if balanced outputs are provided. Although DDPL and TDPL can provide a higher security level, their overhead in complexity makes SABL the recommended reference technique.
Although the above-mentioned methods for complementary circuits are intended to prevent DPA completely, they still leak side-channel information. Even when ensuring DPA resistance and taking into account loading capacitances, symmetric routing and process-voltage variations, it has been demonstrated that SABL logic, and hence all the differential styles, still leak some information due to the asymmetric charging of the internal pull-down nodes [44]. Independently of the back-end implementation of crypto-hardware, the aim of this work is to present a methodology for optimizing the design of the DPDN for differential logic gates capable of implementing fully secured cryptographic blocks using any DPUN approach. The main contributions of this paper are as follows.
• A new methodology for optimizing DPDN, eliminating stored charges in internal nodes, and avoiding harmful memory effects.
• Evaluation of the aforementioned DPDN structures with regard to energy-secure operation and the security flaws that arise.
• Application of the methodology to improved AND/NAND and XOR/XNOR SABL gate designs, which serve as a test bed.
• Implementation of a case study (Sbox9) using different AND and XOR gates as an application example.
• Simulated DPA attack on Sbox9 implemented using complementary metal-oxide-semiconductor (CMOS) standard cells, SABL cells, and the improved SABL gates, to demonstrate the improvement in security resulting from our proposal.
• Evaluation of the effect of temperature variations on the weaknesses of energy-secure cells. It will be demonstrated how a DPA attack carried out at appropriate temperatures creates severe security flaws.

The work presented in this paper is partially based on [44], where the initial idea of removing the charge in internal nodes of the DPDN in a differential XOR gate was presented. This work improves on the proposal in [44] in three respects: 1) a novel complete methodology for removing internal charges in any gate of any differential logic style is presented, and its suitability for secure implementations is proven by designing and simulating different digital gates; 2) simulation-based DPA attacks are performed on the substitution box of the Kasumi algorithm to assess the proposals; and 3) the effect of temperature variation on the security of the proposals against DPA attacks is analyzed.
The results can easily be extrapolated to other hardware-based cryptographic systems (either public- or private-key cryptosystems including any sbox, block or stream cipher, etc.), since the improvement in security was obtained using basic building blocks.
The paper is organized as follows. Section II describes the basic configuration of dynamic dual-rail structures, focusing on their characteristics in relation to DPA applications. Section III reveals the structure of the proposed DPDN for generic differential logic gates. Specific considerations for the AND/NAND and XOR/XNOR gates and some simulation results evaluating the improvement in the DPDN proposals are presented in Section IV. Section V shows the design and results of a simulated DPA attack on an Sbox9 using different approaches. Section VI analyzes the impact of temperature variation on the effectiveness of the DPA attacks, and finally conclusions and future work are presented in Section VII.
II. DPA RESISTANT DIFFERENTIAL LOGIC GATES
The basic circuit configuration of a differential DPL cell is shown in Fig. 2. The DPDN in Fig. 2(a) is mostly implemented with NMOS transistors connected to the bottom clocked NMOS transistor. Without loss of generality, a DPA-resistant gate can also be constructed with the logic function implemented in the DPUN with PMOS transistors and a clocked PMOS transistor on the top, as shown in Fig. 2(b). For simplicity, we will use the scheme in Fig. 2(a).
As mentioned, SABL [23] fulfils all the requirements for DPA resistance. Fig. 3 shows the DPUN structure for SABL. SABL operates as follows: the clocked PMOS transistors (T6 and T8) in the DPUN are ON in the precharge phase (clk = 0), setting both outputs to "0". In the evaluation phase (clk = 1), the sources of the NMOS transistors T7 and T10 are grounded through a discharge path in the DPDN and the switching action of transistor M, which is always ON, making the logic function at the output dependent on input values. The specific features that make SABL resistant to DPA are 1) the presence of the clocked bottom transistor T11, 2) full symmetry in DPUN, and 3) outputs of DPDN not connected to the gate of output inverters (T1/T2 and T3/T4). This last feature makes SABL superior to other simpler symmetric differential structures [23], [27].
Even using an appropriate logic style such as SABL in the DPUN, the DPDN must be fully symmetrical when implementing the logic, regardless of input values. Here, symmetry means that all the paths from the DPDN outputs to ground must have the same transistor count and the same equivalent resistance and capacitance in every node. To make the DPDN effective against DPA attacks, it should therefore have the same number of NMOS series transistors and similar resistance in every path. The gate will then operate with a constant delay (RC value), regardless of the specific input values. Fig. 4 shows the typical implementation of the AND/NAND and XOR/XNOR DPDN reported in literature [45]. Note how the AND/NAND DPDN has been modified, with the addition of dummy transistors controlled by and , to achieve the maximum symmetry [45].
Even with a fully symmetric DPDN, information could be leaked if the evaluation of specific data leaves an indelible fingerprint that can be exploited by an attacker. To guarantee maximum DPA protection, therefore, any kind of memory effect should be removed. In both DPDNs in Fig. 4, for every input condition, the charge stored in the internal nodes (n1 and n2) should be equalized, because these nodes may store information about the previous state. Let us consider the implementation of the XOR/XNOR DPDN shown in Fig. 4(b). Regardless of the input values, nodes qXor and qXnor always see the same capacitance, and the output is always discharged through two series-connected NMOS transistors. However, the values stored in internal nodes n1 and n2 in the precharge phase will depend on the value of input B in the previous evaluation, as demonstrated in [44]. There will thus be a difference in power consumption depending on whether or not the value of B is the same in two consecutive evaluations, as can be seen in the waveforms in Fig. 5, obtained from a simulation of an XOR/XNOR SABL gate implemented in a 90 nm technology.
For example, the voltage value of node n1 in the precharge phase is higher when input B was "0" in the previous evaluation than when it was "1". Hence, the power consumption of the next evaluation depends on the previous value, giving rise to an undesired memory effect: power consumption will be smaller when input B maintains its previous value than when it changes. Similar results can be obtained for the internal nodes of the AND/NAND implementation, as shown in Fig. 6. When using these two DPDNs to perform the AND/NAND and XOR/XNOR logic functions, power consumption therefore varies slightly depending on the state of the internal nodes of the DPDN. In this regard, there exists a severe security hole in the transition from precharge to evaluation, which could enable a potential attacker to observe the supply current traces in the vicinity of that transition.
For any n-input precharge-based secure gate, it has traditionally been considered by the research community [46] that "00" values restored in the outputs during the precharge phase minimize the number of possible input transitions with potentially different power/ground current traces for the following evaluation phase to $2^n$. For a two-input gate, the different (and unique) transitions considered in inputs A and B from precharge to evaluation were 00 → 00, 00 → 01, 00 → 10, and 00 → 11. Considering the above-mentioned mechanism, it is clear that the evaluation prior to the precharge phase influences the power/ground supply current of the next evaluation if the memory effect is not removed. The number of possible transitions with potentially different power/ground current traces rises to 16 (four possible transitions prior to each precharge for each of the four cases considered above), as opposed to the four transitions traditionally considered, making any potential attack much easier. The number of different possible transitions for an n-input gate is $2^{2n}$. In the next section, it will be shown how the DPDN can be provided with additional security features to operate in a memoryless fashion, in such a way that all internal nodes are charged/discharged in each precharge phase.
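The counting argument can be reproduced with a short Python sketch. It simply enumerates input values, pairing each evaluation with the evaluation that preceded the precharge when a memory effect is assumed; the function name and the use of `None` for "previous value irrelevant" are illustrative choices, not part of the paper.

```python
from itertools import product


def distinguishable_transitions(n_inputs, memory_effect):
    """List (previous_evaluation, next_evaluation) pairs that may give distinct traces.

    Without memory effect only the next evaluation matters, so the previous
    evaluation is recorded as None.
    """
    values = list(product("01", repeat=n_inputs))
    if memory_effect:
        return [(prev, nxt) for prev in values for nxt in values]
    return [(None, nxt) for nxt in values]


memoryless_2 = len(distinguishable_transitions(2, memory_effect=False))
with_memory_2 = len(distinguishable_transitions(2, memory_effect=True))
with_memory_3 = len(distinguishable_transitions(3, memory_effect=True))
print(memoryless_2, with_memory_2, with_memory_3)
```

For a two-input gate the enumeration gives four cases without memory effect and sixteen with it, matching the figures quoted in the text; for three inputs the count with memory effect grows to 64.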
III. OPTIMIZATION METHODOLOGY FOR DPDN
To prevent the undesired effect described above, we propose a technique for matching the charge in internal nodes during the precharge phase. This can be achieved in two main ways: 1) by recycling the charge and equalizing it by distributing it between the internal nodes and 2) by charging/discharging all the internal nodes to the same final value. In both cases, it suffices to add specific transistors that are in the ON state only during precharge. Initially, the same depth was considered for both branches of the DPDN. If the logic function allows different branch lengths, dummy transistors must be added in the same way as for the AND/NAND gate in Fig. 4(a) in order to improve symmetry.
Single-Switch Solution (P):
In any DPDN implementation for a generic differential logic function, the intermediate nodes in the same depth level are tied together through a switch that is ON during the precharge phase (clk = 0), setting an equal voltage in nodes in the same level. The overhead associated with this solution is one switch for each transistor level in the DPDN except for the first one, which generates the true and the complemented output. In the SABL structure, these are interconnected with the intermediate Vdd-gated NMOS transistor that is always ON. For an N-depth DPDN, therefore, the overhead is N-1 switches. Considering ideal switches, this solution ensures accurate charge distribution during precharge and does not leak any information. From a practical point of view, since a CMOS switch needs one PMOS and one NMOS transistor, as well as clk and its complement $\overline{clk}$, the associated overhead is very high, especially in SABL solutions where only a single phase clk is needed. The generation of a global or local $\overline{clk}$ becomes impractical, and so a one-transistor switch represents a good trade-off between complexity and security. A PMOS transistor that is ON in the precharge phase therefore provides the most feasible solution. A generic scheme for a single-switch solution is shown in Fig. 7.
Dual-Switch Solution (2P):
The intermediate nodes in the DPDN implementation are tied to supply/ground rails with independent switches during precharge, forcing exactly the same voltage in all nodes. Each DPDN level except for the first one, which generates the true and the complemented output, needs exactly one pair of switches. In the SABL structure, these are interconnected with the intermediate Vdd-gated NMOS transistor that is always ON. Thus, for an N-depth DPDN, the overhead is 2(N-1) switches. As with the single-switch configuration, the only feasible solution uses PMOS switches that are ON during precharge, connected to Vdd. Any other solution has important drawbacks: NMOS switches need to be controlled by an unavailable complemented clock signal, PMOS switches are not suitable for GND connection because of their limited conduction of "0", and CMOS switches are too expensive to implement. A generic scheme for a dual-switch solution is shown in Fig. 8.
Let us consider an AND/NAND SABL gate to show the feasibility of the two proposed solutions. The simulation results in Fig. 6 show the undesired memory effect in the original SABL gate, designed in a 90 nm technology. The schematic for the single-switch solution applied to this gate is shown in Fig. 9 and the corresponding simulation results in Fig. 10. In each precharge phase, the PMOS switch T1 turns ON and equalizes the intermediate voltage in internal nodes n1 and n2. Fig. 11 shows the schematic for the dual-switch solution applied to the AND/NAND DPDN implemented in the same 90 nm technology. Here, two PMOS transistors T1 and T2 connected to internal nodes n1 and n2 are gated by the clock signal. In each precharge phase, when clk = 0, the PMOS transistors T1 and T2 are turned ON, setting an equal voltage value (Vdd) in nodes n1 and n2, as shown in the simulation results in Fig. 12.
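The behavior of the two solutions during precharge can be illustrated with an idealized Python sketch. It assumes ideal switches, equal node capacitances by default, and normalized voltages (Vdd = 1.0); the residual voltages used in the example are hypothetical and are not taken from the simulations in the paper.

```python
def single_switch(v1, v2, c1=1.0, c2=1.0):
    """Ideal switch between n1 and n2: charge sharing leaves both at one voltage."""
    shared = (c1 * v1 + c2 * v2) / (c1 + c2)
    return shared, shared


def dual_switch(v1, v2, vdd=1.0):
    """Two ideal switches to the supply rail: both nodes are forced to Vdd."""
    return vdd, vdd


def switch_overhead(depth, switches_per_level):
    """Added switches for a DPDN of the given depth (the first level needs none)."""
    return switches_per_level * (depth - 1)


# Two hypothetical histories leaving mirrored residual voltages on (n1, n2).
history_a = (0.75, 0.25)
history_b = (0.25, 0.75)

single_a, single_b = single_switch(*history_a), single_switch(*history_b)
dual_a, dual_b = dual_switch(*history_a), dual_switch(*history_b)
print(single_a, single_b, dual_a, dual_b)
print(switch_overhead(2, 1), switch_overhead(2, 2))
```

In this idealized model both histories end the precharge phase in the same state for either solution, so the next evaluation starts from identical initial conditions. For a depth-2 DPDN such as the AND/NAND example, the overhead is one switch for the single-switch solution and two for the dual-switch solution.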
These two feasible solutions prevent the above-mentioned memory effect and ensure that all evaluations start in the same initial conditions. A priori, the main drawbacks would be slight increases in 1) area, 2) power consumption during the precharge phase, and 3) delay in the evaluation phase. In return, a significant improvement in the security of the gate is expected, since closer power consumption and delay values can be achieved for different input data.
The proposed methodology affects only the DPDN block, and it can therefore be used in any DPUN proposal with similar benefits to those obtained for SABL. 
IV. IMPLEMENTATION AND SIMULATION RESULTS
To evaluate the effectiveness of the two techniques, the proposed methodology was applied to the design of two-input AND/NAND and XOR/XNOR SABL gates in a 90 nm technology. Although simple gates, they are the most commonly used in current cryptohardware implementations. However, the proposals can be easily applied and their effectiveness verified with more complex gates and other differential structures.
Classic and proposed AND/NAND (XOR/XNOR) SABL gates were designed in CADENCE using TSMC 90 nm ( V) technology. They were simulated with SPECTRE under nominal conditions, i.e., typical model for transistors, nominal Vdd and C. Monte-Carlo simulations considering process-temperature-voltage variations were conducted, producing results equivalent to those typically obtained in this type of implementation in terms of security. The inputs and outputs of the gate being tested passed through gates of the same style, the nominal clock frequency being 100 MHz. The input patterns were such that all possible combinations could occur, and the power consumption for the 16 possible situations described in Section II was measured. Table II shows the simulation results for power consumption in the evaluation and precharge phases for both implementations of AND/NAND (XOR/XNOR) SABL: the one with the classic DPDN and the one with the improved DPDN. To quantify the DPA-resistance of the cell, we measured the minimum energy value (Min), the maximum energy value (Max), the mean energy and the standard deviation for all the transitions. Energy per cycle has been computed as shown in (1) and (2).
From these values, we obtained the normalized energy deviation (NED) and the normalized standard deviation (NSD), according to (3) and (4), respectively. NED and NSD are usually conceived as an indirect measurement of DPA resistance [23], [29], [42], [43]

$$\mathrm{NED} = \frac{\max(E) - \min(E)}{\max(E)} \tag{3}$$

$$\mathrm{NSD} = \frac{\sigma_E}{\overline{E}} \tag{4}$$

Since all these magnitudes are only indicative of DPA resistance, further DPA attacks were needed to evaluate the security of the proposed solutions. However, it makes no sense to perform a DPA attack on a single gate, so NED and NSD are taken as the security measurement of the gate. The greater the difference between the maximum and minimum values (highest NED and NSD), the higher the cell's vulnerability to DPA attacks. Table II also shows average energy in the evaluation and precharge phases, total average energy, delay in the evaluation phase and the number of transistors per cell.
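As an illustration of how these metrics can be computed from simulation data, the following Python sketch uses the usual definitions of NED (spread between maximum and minimum energy, normalized to the maximum) and NSD (standard deviation normalized to the mean). The rectangle-rule energy estimate, the use of the population standard deviation, and the energy values themselves are assumptions made for the example, not data from Table II.

```python
from statistics import mean, pstdev


def energy_per_cycle(current_samples, dt, vdd):
    """Rectangle-rule estimate of Vdd times the integral of the supply current."""
    return vdd * sum(current_samples) * dt


def ned(energies):
    """Normalized energy deviation: (max - min) / max."""
    return (max(energies) - min(energies)) / max(energies)


def nsd(energies):
    """Normalized standard deviation: sigma / mean (population sigma assumed)."""
    return pstdev(energies) / mean(energies)


# Hypothetical per-transition energies (arbitrary units) for two cells.
balanced_cell = [10.0, 10.0, 10.0, 10.0]
leaky_cell = [8.0, 10.0, 12.0, 10.0]

print(ned(balanced_cell), nsd(balanced_cell))
print(ned(leaky_cell), nsd(leaky_cell))
```

A perfectly balanced cell gives NED = NSD = 0, while any data-dependent spread in energy raises both values, which is why lower figures are read as better DPA resistance.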
To quantify the DPA-resistance of the cells, let us consider NED and NSD in the evaluation phase, as shown in Table II. Power consumption in the evaluation phase has a greater dependence on the data being processed because in the precharge phase nodes n1 and n2 are floating. For AND/NAND cells, the double-switch solution did not improve the NED and NSD values of the classic cell. However, the single-switch proposal improved NED by a factor of 0.67, and NSD by a factor of 0.54 in comparison with the classic-style solution. In the XOR/XNOR cells, the results for the double-switch proposal improved NED by a factor of 0.03 and NSD by a factor of 0.02. The single-switch proposal for the XOR/XNOR gate also provided better results than the classic cell, reducing NED by a factor of 0.76, and NSD by a factor of 0.78.
The main drawbacks of the proposed solutions were increases in total power consumption and area, and delay degradation. The AND/NAND cells increased power consumption by a factor of around 1.17 for the double-switch proposal and 1.03 for the single-switch proposal. For the XOR/XNOR double-switch and single-switch proposals, power consumption increased by factors of 1.28 and 1.02, respectively. In all the proposals, delay degradation was less than 18% compared with the classic cell. The overhead in hardware resources, measured as the number of transistors, was 5.5% (11%) for the single-switch (double-switch) solution. These overheads are negligible compared to the improvement in security achieved (up to several orders of magnitude).
In view of the results reported in Table II, it can be concluded that the best choices for the AND/NAND and XOR/XNOR gates are, respectively, the single-switch and double-switch proposals. The simulation results show an improvement in security, with lower NED and NSD values. Since NED and NSD only indicate the DPA resistance of the gates, a DPA attack was simulated to assess the security of our proposals. This is described in the next section.
V. CASE STUDY: DPA RESISTANT SBOX9
Although the NED and NSD values of the proposals provide a good idea of their robustness against DPA attacks, the definitive way to measure security is to carry out a DPA attack. To this end, such an attack was simulated.
A DPA attack aims to recover the secret key of a cryptographic device for which the input patterns and the cryptographic algorithm are known. The subject of this case study was the 9-bit substitution box, henceforth called Sbox9, used in the Kasumi algorithm [47], a block enciphering system. It is widely recognized that the Sbox is the main source of significant leakage signals, and hence a prime target for DPA attacks on block-cipher circuits [6], [48]. Sbox9 can easily be implemented with only two-input AND and two-input XOR gates [49]. The designed circuit scheme is shown in Fig. 13. Its inputs are a 9-bit input pattern and a 9-bit key; the input data of the Sbox9 is obtained by an XOR operation between the input pattern and the key; the output is a 9-bit data word; and the current in the Sbox9 is measured during encryption.
To evaluate the improvement produced by our proposals, three different Sbox9s were designed, using: 1) CMOS gates, 2) classic SABL gates, and 3) proposed AND/NAND and XOR/XNOR SABL gates. For a realistic and fair comparison, practical clock networks were considered to evaluate circuit operating security under identical conditions. The Sbox9s, with 84 AND/NAND and 95 XOR/XNOR gates, were designed in CADENCE using TSMC 90 nm ( V) technology. They were simulated with SPECTRE under nominal conditions, i.e., typical model for transistors, nominal Vdd and C. The simulation environment was set with maximum simulation resolution, capturing data every 10 ps, with a clock frequency of 500 MHz, and without noise sources in order to ensure the best possible conditions for an attacker. It is expected that the results obtained can be extrapolated to experimental results on chips or to simulations under more realistic conditions, such as PVT variation, particular routing, layout non-idealities, etc. Since only the DPDN is modified, variations in security values should affect all proposals in the same way.

Fig. 14 shows the DPA attack flow [6]. First, an intermediate result of the executed algorithm had to be chosen. In this specific case, it was the output of the Sbox9, since this is a value dependent on the input pattern (known) and the key (unknown). Power consumption of the Sbox9 then had to be measured during encryption. To do so, a large number of known input patterns were generated and stored in an array. During each encryption run, a power consumption trace was measured, thereby obtaining a matrix of measured traces with one row per input pattern generated and one column per point measured per trace. Hypothetical intermediate values, computed for every key hypothesis, were then mapped into hypothetical power consumption values using the Hamming Distance (HD) model, which describes the power consumption of CMOS circuits better than other models like the Hamming Weight (HW) model when the attacked device is known.
In the HD model, the hypothetical power consumption of a transition is directly proportional to the number of bits that change in that transition. Finally, the hypothetical power values were compared with the measured power traces by correlating the matrix of hypothetical power values with the matrix of measured traces. The result of this correlation was a matrix with one row per key hypothesis and one column per measured point. The key with the maximum correlation would be the correct key. In the reference proposal the security flaw was in the neighborhood of the precharge to evaluation transition, where the correlation takes maximum values, but in the proposed solutions this security flaw disappeared.
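The correlation step of such an attack can be sketched in a few lines of Python. This is a toy model under loudly stated assumptions, not the attack run in the paper: the 4-bit substitution table is a hypothetical stand-in for Sbox9, each "trace" is reduced to a single sample, the leakage is simulated as the Hamming distance between consecutive S-box outputs plus Gaussian noise, and all names are illustrative.

```python
import random

# Hypothetical 4-bit substitution table standing in for Sbox9 (NOT the Kasumi S-box).
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def hypothetical_power(patterns, key):
    """HD model: bits that change between consecutive S-box outputs."""
    outs = [TOY_SBOX[p ^ key] for p in patterns]
    return [hamming_distance(prev, cur) for prev, cur in zip(outs, outs[1:])]


def simulate_traces(patterns, key, noise, rng):
    """Assumed leakage: one sample per encryption, HD value plus Gaussian noise."""
    return [h + rng.gauss(0.0, noise) for h in hypothetical_power(patterns, key)]


def dpa_attack(patterns, traces):
    """Correlate every key hypothesis with the traces; return best guess and scores."""
    scores = {k: pearson(hypothetical_power(patterns, k), traces) for k in range(16)}
    return max(scores, key=scores.get), scores


rng = random.Random(1)
TRUE_KEY = 0xB
patterns = [rng.randrange(16) for _ in range(1001)]
traces = simulate_traces(patterns, TRUE_KEY, noise=0.5, rng=rng)
recovered, scores = dpa_attack(patterns, traces)
print(hex(recovered), round(scores[TRUE_KEY], 3))
```

With leakage that follows the HD model this closely, the hypothesis for the true key should show the highest correlation. A well-balanced differential cell aims to remove exactly this data dependence, so that no hypothesis stands out.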
The DPA attack was carried out in MATLAB R2009a after extracting the power traces from a SPECTRE simulation of the Sbox9 under the applied input patterns.
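The attack flow described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the 4-bit table `TOY_SBOX` is a hypothetical stand-in for Sbox9, the traces are synthetic and noise-free (one sample leaks the HD value, the others are unrelated random numbers), and all function names are assumptions introduced for illustration only.

```python
import random

# Hypothetical 4-bit substitution table standing in for Sbox9 (not the Kasumi table).
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def hypothetical_power(patterns, key):
    """HD between consecutive S-box outputs for one key hypothesis."""
    outputs = [TOY_SBOX[p ^ key] for p in patterns]
    return [hamming_distance(prev, cur) for prev, cur in zip(outputs, outputs[1:])]


def simulate_traces(patterns, key, points, leak_point, rng):
    """Noise-free synthetic traces: one sample leaks the HD, the rest are unrelated."""
    traces = []
    for leak in hypothetical_power(patterns, key):
        trace = [rng.random() for _ in range(points)]
        trace[leak_point] = float(leak)
        traces.append(trace)
    return traces


def correlation_matrix(patterns, traces, n_keys=16):
    """Rows: key hypotheses. Columns: trace points."""
    columns = [list(col) for col in zip(*traces)]
    return [[pearson(hypothetical_power(patterns, key), col) for col in columns]
            for key in range(n_keys)]


def rank_keys(corr):
    """Return (best key, peak |correlation| per key)."""
    peaks = [max(abs(c) for c in row) for row in corr]
    best = max(range(len(peaks)), key=lambda k: peaks[k])
    return best, peaks
```

In this sketch, calling `simulate_traces` with a secret key and then `rank_keys(correlation_matrix(patterns, traces))` gives the correct key a correlation of essentially 1 at the leaking sample, which mirrors the noise-free simulation conditions chosen to favor the attacker. With a toy table, a wrong key may occasionally correlate equally well, so the sketch should not be read as a security evaluation.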
The first attack carried out was on the Sbox9 CMOS. A total of 1250 random input patterns were applied, obtaining the results in Fig. 15, which shows the correlation versus the number of input patterns for a random key (Key1). As a security metric, the number of measurements to disclosure (MTD) proposed in [50] was used, since it has been adopted in numerous studies of DPA-resistant systems [11], [51]. MTD is the minimum number of input patterns required to obtain the correct key of a cryptographic device. For the DPA attack with Key1, the MTD measured for the Sbox9 CMOS was 145. Fig. 16 shows the results of the attack on the Sbox9 SABL. The same 1250 input patterns as for the Sbox9 CMOS were applied, and the measured MTD increased to 344, about 2.37 times that of the vulnerable CMOS implementation. Finally, the Sbox9 with the proposed gates was attacked. It can be seen in Fig. 17 that even 10 000 input patterns are not enough to reveal the correct key, and nothing in the curves suggests that the key is about to be disclosed.
Since this last attack requires one day and 53 min of electrical simulation plus 28 min of MATLAB processing, increasing the number of input patterns further is infeasible. The result nevertheless demonstrates that the security of the proposed Sbox9, designed with the AND/NAND (P) and XOR/XNOR (2P) proposals, is much higher than that of the classic SABL.
The results of the simulation-based DPA attacks on the three different Sbox9s are reported in Table III. For each Sbox9, the table gives the number of transistors used in the design, the number of input patterns used in the attack, the measured MTD for Key1, and the average MTD. The average MTD was calculated from attacks with eight different random keys on each Sbox9 implementation. As expected, CMOS logic was found to be unsuitable for cryptographic applications because its MTD is lower than that of classic SABL and much lower than that of our proposal. Based on the results obtained in Section IV, it can be said that the NED and NSD values predict the improved security of our proposals compared with classic SABL cells. From the results in Table III, an improvement of at least one order of magnitude in security is strictly verified, although the NED and NSD values obtained for the single gates, and the lack of evidence of the correct-key curve emerging in Fig. 17, predict an improvement of several orders of magnitude.
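The MTD metric used above can be sketched in a few lines of Python. This is one possible reading of the definition, assumed here for illustration: the MTD is taken as the smallest number of input patterns from which the correct key keeps the highest peak correlation for all larger counts evaluated. The function name and the data layout (`peak_curves[i][k]` is the peak correlation of key `k` when `counts[i]` patterns are used) are hypothetical.

```python
def measurements_to_disclosure(peak_curves, counts, correct_key):
    """Smallest pattern count after which the correct key stays ranked first.

    Returns None if the correct key is not on top at the last count evaluated,
    i.e., the key is not disclosed with the available input patterns.
    """
    mtd = None
    for n, peaks in zip(counts, peak_curves):
        best_rival = max(v for k, v in enumerate(peaks) if k != correct_key)
        if peaks[correct_key] > best_rival:
            if mtd is None:
                mtd = n  # start of a run in which the correct key leads
        else:
            mtd = None  # a wrong key caught up, so discard the earlier crossing
    return mtd
```

A return value of `None` corresponds to the situation reported for the proposed Sbox9, where the key is not retrieved with the input patterns applied. The ratio quoted for Key1 follows directly from the measured values: 344/145 is approximately 2.37.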
VI. DPA ATTACK UNDER TEMPERATURE VARIATIONS
In the previous sections, we have shown how the security flaw caused by the memory effect in internal nodes of the DPDN was removed. In energy-secure systems, such weaknesses should be detected in advance in order to guarantee adequate security levels.
Furthermore, we have also noticed that additional security flaws may be caused by variations in the temperature of the cryptographic device during encryption. This phenomenon should be carefully studied in modern energy/temperature-secure cryptographic devices in order to protect them against DPA attacks. Basically, as will be demonstrated, temperature variation during DPA attacks modifies the number of input patterns needed to recover the correct key. To identify the temperature range in which the vulnerability of the proposal is minimum, and its strength therefore maximum, DPA attacks were simulated at different temperatures.
The effect of temperature on the operation of CMOS-based electronic circuits has been extensively studied in the literature [52], [53]. Two main temperature-related effects have been reported at the transistor level: mobility degradation and a decrease in threshold voltage, producing a noticeable reduction in operating speed as the temperature rises. Power consumption also increases with rising temperature, and leakage power in particular increases exponentially [10]. However, the effect of operating temperature on the security of cryptohardware has not been investigated in depth, except for leakage power analysis [11]. Thermal attacks have traditionally been based on increasing or decreasing the temperature until the device fails [54], and are considered fault injection attacks. The scope of this section is quite different. Here, the circuits were attacked while fully operative, the objective being to analyze the vulnerability of our proposals when subjected to temperature variations. In [55], it is claimed that vulnerability increases with rising temperature, because variation in power consumption decreases. However, the multiple dependences of the related CMOS parameters on temperature make it necessary to simulate full-scale attacks in order to reach definitive conclusions.
The simulations were performed on the Sbox9 using the same setup and test conditions described in Section V, over a temperature range that caused no malfunction in the cryptographic device, i.e., from 0 °C to 85 °C. Table IV shows the results of the simulated DPA attacks at the different temperatures. In each attack, the same 2000 input patterns were applied. For each temperature, Table IV gives the MTD, the maximum correlation value (Max. Corr.), the difference between the maximum correlation value and the second maximum value (Diff. Corr.), and the maximum correlation point of the DPA attack for the CMOS and classic SABL solutions with a random key. The Max. Corr. and Diff. Corr. metrics illustrate more accurately how temperature affects correlation values. The maximum correlation point obtained for the classic SABL Sbox9 and the CMOS Sbox9 increased with temperature in both cases. Because the maximum correlation point varies in time, a preliminary study had to be made before the attack to confirm that the resolution of the acquired data was adequate. No results are included for the proposed Sbox9 with the enhanced AND/NAND and XOR/XNOR SABL gates, since these proposals offer maximum security for all the temperatures selected (the keys are not retrieved). The results for the classic SABL Sbox9 at 0 °C and 5 °C are not shown because the DPA attacks were unsuccessful at these temperatures.

Fig. 18 plots MTD, Max. Corr., and Diff. Corr. versus temperature. The CMOS Sbox9 clearly presents similar MTD values regardless of temperature. This means that the attack can be effective at any temperature, so no conclusions can be drawn from the MTD with regard to attack or defense. On the other hand, the Max. Corr. and Diff. Corr. values increase with temperature, meaning that the correct-key peak grows with temperature relative to the incorrect keys.
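The per-temperature quantities listed for Table IV can be derived from a correlation matrix as in the following Python sketch. It assumes, for illustration only, that the matrix has one row per key hypothesis and one column per trace point; the function name and the returned dictionary keys are hypothetical and do not come from the original work.

```python
def table_metrics(corr):
    """Max. Corr., Diff. Corr., and maximum correlation point from a correlation matrix.

    corr[k][j] is the correlation of key hypothesis k at trace point j.
    """
    peaks = []
    for row in corr:
        # Trace point at which this key hypothesis correlates most strongly
        point = max(range(len(row)), key=lambda j: abs(row[j]))
        peaks.append((abs(row[point]), point))
    # Rank key hypotheses by their peak correlation, highest first
    ranked = sorted(range(len(peaks)), key=lambda k: peaks[k][0], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return {
        "key": best,
        "max_corr": peaks[best][0],
        "diff_corr": peaks[best][0] - peaks[runner_up][0],
        "max_point": peaks[best][1],
    }
```

A larger `diff_corr` means the best-ranked key stands out more clearly from the runner-up, which is why this margin complements the MTD when comparing temperatures.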
In contrast, the behavior of the classic SABL Sbox9, shown in Fig. 19, varied markedly with temperature. Security was considerably reduced in several temperature windows. More specifically, temperatures from 15 °C to 35 °C and from 60 °C to 85 °C made the cryptocircuit more vulnerable to DPA attacks, while at temperatures lower than 10 °C and from 40 °C to 50 °C system security increased. The optimal temperature range was below 10 °C, where the DPA attack using 2000 input patterns was unsuccessful.

Fig. 18. MTD, maximum correlation value (Max. Corr.), and difference between the maximum correlation value and the second maximum value (Diff. Corr.) for the DPA attack on the Sbox9 CMOS under temperature variations.

Fig. 19. MTD, maximum correlation value (Max. Corr.), and difference between the maximum correlation value and the second maximum value (Diff. Corr.) for the DPA attack on the Sbox9 classic SABL under temperature variations.
These results are quite interesting, but further work is required to explain them theoretically and to make them predictable from design characteristics. It seems clear that temperature variation can have opposing effects on security characteristics, as has been demonstrated for Physical Unclonable Functions [56] and True Random Number Generators [57], where postprocessing units are needed to increase security.
In view of these results, the circuit can be protected by intentionally cooling or heating it so that its operating temperature lies in a less vulnerable range. To this end, the selected proposal must be evaluated to detect its vulnerability windows.
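Detecting vulnerability windows from a temperature sweep can be sketched as follows in Python. The sketch assumes that each simulated temperature is associated with either an MTD or `None` when the attack failed with the patterns applied, and it groups consecutive sampled temperatures at which the key was disclosed within a chosen pattern budget. The function name, the `budget` parameter, and the data layout are hypothetical.

```python
def vulnerable_windows(mtd_by_temp, budget):
    """Group consecutive sampled temperatures where the key is disclosed within budget.

    mtd_by_temp maps temperature (in degrees C) to the MTD, or None if the attack failed.
    Returns a list of (first_temp, last_temp) tuples.
    """
    windows = []
    start = prev = None
    for temp in sorted(mtd_by_temp):
        mtd = mtd_by_temp[temp]
        if mtd is not None and mtd <= budget:
            if start is None:
                start = temp  # open a new vulnerable window
            prev = temp
        elif start is not None:
            windows.append((start, prev))  # close the current window
            start = None
    if start is not None:
        windows.append((start, prev))
    return windows
```

Temperatures outside every returned window are candidates for the operating range targeted by intentional cooling or heating, subject to the resolution of the temperature sweep.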
VII. CONCLUSION AND FUTURE WORK
This paper has presented a methodology for improving the DPDN of differential logic gates used in cryptographic applications. Two new mechanisms were presented to remove charge in the pull-down network of a differential gate and eliminate the memory effect. Both of them, the single-switch solution and the double-switch solution, can be used in any differential structure for security applications. Using our configuration, the DPA resistance of the gate was improved, with minimum performance degradation.
The applicability of the proposed methodology was demonstrated by designing two-input AND/NAND and XOR/XNOR gates. In a first design phase, two metrics, NED and NSD, were used to quantify the potential DPA resistance of the proposed gates. Using the results of the single-gate simulations shown in Table V, we were able to choose the best proposal for the AND/NAND gate (single-switch solution) and the XOR/XNOR gate (double-switch solution) for further analysis. As can be seen, neither the power degradation nor the delay penalty was significant, making the proposals suitable for use in low-power applications.
To evaluate the DPA resistance of the proposed gates, part of a cryptographic algorithm was developed: the Sbox9 of the Kasumi algorithm. The Sbox9 was designed with three different strategies: 1) CMOS gates, 2) classic SABL gates, and 3) the proposed AND/NAND (P) and XOR/XNOR (2P) SABL gates. After simulating DPA attacks, we obtained the following results, as expected: CMOS logic is not suitable for cryptographic applications, and the classic SABL Sbox9 is more resistant to DPA attacks than the CMOS variant but considerably more vulnerable than our proposal. The DPA attacks showed that our proposal increases DPA resistance by much more than one order of magnitude with respect to the classic SABL Sbox9.
A new study was carried out to determine how security varies in cryptosystems when a DPA attack is performed under temperature variations. To detect the security flaws caused by temperature variations, DPA attacks at different temperatures were simulated for the CMOS, classic SABL, and proposed Sbox9s. The results obtained indicate that CMOS circuits are vulnerable regardless of temperature, whereas classic SABL cryptocircuits operating at temperatures lower than 10 °C are far more secure. Cooling the circuit intentionally can therefore help to protect it against DPA attacks. No results are shown for the DPA attack with temperature variations on the proposed Sbox9 with the enhanced AND/NAND and XOR/XNOR SABL gates, because the proposals offer maximum security for all the temperatures selected (the keys are not retrieved).
As future work, the proposed methodology will be applied to the implementation of different Sboxes and of block or stream ciphers. Comparison with other strategies, not necessarily based on full-custom solutions, will enable the selection of the best proposals for secure cryptosystems. In such comparisons, PVT variations, interconnects, and nonsymmetric layouts will play an important role. Finally, a thermal-aware mechanism with on-chip temperature sensing, able to cool or heat the cryptoprocessor, will be developed so that the device dynamically operates at the most secure temperature.
