ABSTRACT Side channel attacks have become a major threat to hardware systems. Most modern digital IC designs utilize sequential elements, which dominate the information leakage. This paper reports the first unified analysis and comprehensive comparison of known secure flip-flop (FF) circuits. We present a device-level analysis of the information leakage from these FFs and propose several evaluation metrics to quantify their security. We show that simulated power analysis (PA) attacks that utilize the information captured by these metrics at the gate level extract more information at the module level.
I. INTRODUCTION
Digital systems that process or store personal information are liable to Side Channel Analysis (SCA) attacks [1], [2]. SCA attacks exploit information associated with the physical implementation of the hardware to extract private information. One of the most powerful side channel attacks is Power Analysis (PA) [2], [3]. PA attacks exploit the correlation between the processed data and the current dissipated by the device to extract the cryptographic key used for encryption. In model-based PA attacks, the attacker computes a hypothesized dissipated current, which reflects the current induced by the expected logical transitions of signals in the circuit, and then correlates it with the measured power supply current. Because the current samples are very noisy, correlation-based attack methodologies must rely on statistics. A successful attack therefore depends on the attacker's ability to correlate multiple hypotheses with the current samples taken at the times the corresponding data are processed.
Correlation-based [4] PA attack procedures can be divided into several stages, as detailed in [2] and [3]. Before these stages, the attacker performs the necessary preprocessing step of segmenting the large set of collected power measurements and synchronizing the segments with the hypothesized currents. The efficiency of the PA attack depends almost exclusively on this synchronization.
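The correlation step described above can be sketched as follows. This is a minimal illustration, not the attack procedure of [2]-[4]: it assumes the traces are already segmented and synchronized, uses a Hamming-distance hypothesis for a single flip-flop (1 if the stored bit toggles, 0 otherwise), and replaces real measurements with synthetic Gaussian noise plus a small data-dependent component at one sample. The function names and all numeric values are hypothetical.

```python
import numpy as np

def hd_hypothesis(d_old, d_new):
    """Hamming-distance power model for one flip-flop: 1.0 if the bit toggles, else 0.0."""
    return (np.asarray(d_old) ^ np.asarray(d_new)).astype(float)

def correlate_per_sample(hypothesis, traces):
    """Pearson correlation between a hypothesis vector (n_traces,) and every
    time sample of synchronized traces (n_traces, n_samples)."""
    h = hypothesis - hypothesis.mean()
    x = traces - traces.mean(axis=0)
    return (h @ x) / np.sqrt((h @ h) * (x * x).sum(axis=0))

rng = np.random.default_rng(0)
n_traces, n_samples, leak_sample = 2000, 50, 20
d_old = rng.integers(0, 2, n_traces)
d_new = rng.integers(0, 2, n_traces)

# Synthetic, already segmented and synchronized traces: unit Gaussian noise plus
# a small current proportional to the HD at a single sample (illustrative values).
traces = rng.normal(0.0, 1.0, (n_traces, n_samples))
traces[:, leak_sample] += 0.3 * hd_hypothesis(d_old, d_new)

rho = correlate_per_sample(hd_hypothesis(d_old, d_new), traces)
print("largest |rho| at sample", int(np.argmax(np.abs(rho))))
```

With the correct hypothesis, the sample with the largest |rho| coincides with the time at which the data are processed. If the segments were misaligned, the data-dependent component would be spread over several sample positions and the correlation peak would shrink, which is why the synchronization step matters.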
Most modern digital IC designs are implemented in the synchronous design style, which has become very popular mainly because of its design simplicity. In conventional synchronous designs, a single clock drives many design modules, so many vectors in the design are sampled simultaneously. This makes synchronous designs very vulnerable to PA attacks. The Globally Asynchronous Locally Synchronous (GALS) design style [5]-[7] is considered attractive from the hardware security perspective. In GALS designs, each local module is synchronized by a local clock signal, but communication between different local modules takes place asynchronously. In this case, only the signals of a local module are sampled by the same clock, so the level of security of GALS designs is thought to be higher than that of synchronous designs.
Both synchronous and GALS designs utilize sequential elements, which sample groups of signals in synchronization with a clock. This makes the design of secured sequential elements a key challenge [3], [8]. Sequential elements are typically constructed from a larger number of transistors than basic combinational elements. Thus, their operation draws large currents and leaves a substantial power-profile signature.
The hardware security problem is considered a multidisciplinary problem. Flip-flops are a key component in any hardware system; thus, their contribution to the system's immunity to side channel attacks (such as power attacks) should be taken into account across levels and disciplines, from circuit designers through cryptographers to system engineers.
Many sequential elements have been proposed over the years to optimize electrical properties such as energy, area, performance and reliability [9]-[11]. The increased interest in the security characteristics of sequential elements (mainly their sensitivity to PA attacks) has resulted in proposals for several topologies of these elements. These solutions aim to weaken the correlation between input transitions (or input states) and the current dissipated by these elements. Flip-flop (FF) circuits are used almost exclusively by digital system designers and are supported by design-automation tools. As expected, FFs have therefore been the most widely researched sequential elements in terms of PA immunity, and they are the focus of this paper. For example, the Sense Amplifier Based Logic (SABL) [12] and the Improved SABL [13]-[15] flip-flops have a symmetric transistor-level scheme and operation intended to consume equal energy/current for all transitions (similar in concept to the dynamic current mode logic based flip-flop, DyCML [30]). The Secured Detect D-FF [16] is designed to identify states in which the output does not switch and, in these states, to trigger a dummy flip-flop that consumes the ''switching'' energy. The delayed detection mechanism based FF (denoted by DelayedFB DFF) [17] and the Three Phase Dual Rail Logic (TDPL) based FF [18], [19] are implemented using the concepts of dynamic logic and utilize a unique timing scheme to precharge and/or discharge the stored energy so that each computation consumes a constant energy.
Although a variety of secured sequential elements have been proposed in the literature, their security properties have not been thoroughly evaluated and compared using the same evaluation environment and metrics. This manuscript aims to provide a solid evaluation environment for the PA security characteristics of sequential elements. In addition, we evaluate previously reported security metrics at the gate level and propose improved metrics.
The contributions of this work are as follows:
• A unified comparison of known secured FFs.
• A circuit-level analysis of the weaknesses of the secured FFs.
• Presentation of several evaluation metrics that are shown to better quantify the security of the secured FFs.
• Identification of the soft spots of various security-oriented FFs, each associated with the security metrics that expose it.
• Simulated PA attacks that utilize the information captured by the proposed metrics at the gate level are shown to extract more information at the module level.
The remainder of this manuscript is organized as follows: Section II provides a short background on related work. The evaluation setup used to examine and compare information leakage is detailed in Section III. In Section IV we discuss several known metrics for evaluating the information leakage of these devices and propose several new evaluation metrics. Section V analyzes the device-level information leakage mechanisms of the sequential elements under consideration. A discussion and examination of these metrics based on the device-level analysis follows in Section VI. Section VII shows that module-level PA attacks that utilize the information captured by the proposed gate-level metrics are more efficient, and Section VIII concludes the manuscript.
II. A SECURITY PERSPECTIVE ON KNOWN SECURED FLIP-FLOPS
Flip-flops (FFs) are typically (and almost exclusively in standard libraries) constructed from master-slave latches [9]. A conventional static latch is composed of a back-to-back inverter pair [20] (cross-coupled structure) and additional control signals (e.g. clock) and circuitry. The back-to-back pair, which is the main reason for the robust operation of a static latch, is responsible for its ''differential'' nature; that is, each latch stores both a data bit and its complement. In conventional FFs the ''no-change'' and ''change'' states can be differentiated, which makes them vulnerable to power attacks.
Due to the resistive or capacitive imbalance between the nodes in the non-ideal (physical) world, each of the two ''change'' states (0→1 and 1→0) can also be distinguished by a PA attack. This imbalance can be caused by different device sizes, imbalanced routing, physical mismatch or process variations.
In this section, we present a short security-oriented review of related work and previously proposed secured FF implementations.
A. SECURED DETECT FLIP FLOP (DETECT-FF)
The Detect-FF structure [16], shown in Fig. 1(a), aims to achieve a data-independent current by duplicating the main flip-flop (resulting in FF1 and FF2) and adding a detector-generator unit. The role of the detector-generator is to identify whether FF1 has switched and, if it has not, to trigger the switching operation of FF2. This scheme ensures that exactly one FF (FF1 or FF2) switches in each cycle. This functionality makes the Detect-FF dynamic and differential; however, the main pitfall of this architecture is that the detector-generator unit responds differently to different inputs, and in doing so draws a current that leaks information on the manipulated input data. Later in this manuscript we show under which circumstances this information leak is substantial and discuss its causes.
B. SENSE AMPLIFIER BASED LOGIC FLIP FLOP (SABL-FF)
SABL logic gates [12] are based on a sense amplifier circuit, as shown in Fig. 1(b). A basic sense amplifier is sensitive to the voltage difference between its inputs and amplifies this difference (positive and negative differences result in '0' and '1' values at the output, respectively). In SABL gates, the inputs are differential (D and D̄) and the architecture is symmetric in layout and operation. In the ideal case, this symmetry makes the current independent of the data across ''switching'' transitions. SABL gates also utilize two clocked transistors that precharge the differential outputs in every clock cycle. Therefore, the circuit is both differential and dynamic (circuits with both properties are typically referred to as Dual-Rail Precharge, DRP [17]).
In fact, the SABL-FF architecture employs an SR-latch connected to the sense amplifier outputs (highlighted in grey in the figure). The SR-latch stores the FF state in a cross-coupled NAND structure (N1 and N2 in Fig. 1(b)). The clocked precharge transistors trigger an update of the SR-latch state when clock='0', and the sense amplifier then reacts to the clock transition to logical '1'. It is important to note that even though the SR-latch implementation is symmetric (differential) and its inputs are dynamically precharged, the concatenation of the sense amplifier and the SR-latch is not dynamic: it reacts differently to an input change and to a no-change.
This is a significant drawback in the context of security, as detailed in Section V.
C. IMPROVED SABL-FF
The Improved SABL-FF is based on the SABL-FF with two additions:
• An additional transistor is placed between the outputs of the differential pair, as shown in Fig. 1(c), similar to the StrongARM FF circuit [13], [14]. This transistor discharges both internal nodes, int1 and int2, during evaluation⁴ (clock = '1'). That is, when the n-MOS footer transistor (ft) is on, both internal nodes are discharged (one to '0' and the other to the n-MOS threshold voltage, V_th). This mechanism ensures that both differential internal nodes are discharged and precharged in every cycle, so no information from a capacitive imbalance of these nodes can be exploited. However, because one of the nodes discharges only to V_th and not to 0 V, the efficiency of this mechanism is undermined.
• The SR-latch is replaced by an improved SR-latch that is designed (by changing device sizing) to produce smaller data-dependent latch currents than in the SABL-FF, as discussed later.
Up to this point, all the FFs discussed are compatible with a standard synchronous system operated by a single clock signal. Below we discuss FFs that are compatible only with dynamic-logic flavors (e.g. np-Domino [20], NORA [21], Domino [22], DML [23], [24]) that employ signals with unique timing control. In these FFs the data signals (D and its complement) are bound to specific precharge and/or discharge periods within the clock cycle. Note that constructing systems that follow these strict timing diagrams is more complex.
⁴ Evaluation is the phase in which the circuit settles on the desired logical value after a charge or discharge phase (typically denoted by precharge or discharge).
As discussed above, designing an ideal dynamic and differential circuit is challenging: the SABL-FF's SR-latch operation reveals information about the data, and even in the Improved SABL-FF, the SR-latch and the added internal transistor induce currents that are still data dependent. Likewise, the asymmetry of the Detect-FF's detector-generator unit leaks information on all possible output transitions.
D. DELAYEDFB-DFF
The DelayedFB-DFF [17] is based on the SABL-FF structure with two main differences:
• Two delay elements (buffers) are added in the feedback loops of the differential pair (as shown in Fig. 1(e)). These elements make the FF tolerant to imbalanced differential-input transitions (variations in transition slopes, arrival delays, etc.). The principle of operation is that an input change triggers the cross-coupled pair only after an additional delay. Hence, if the duration of the input change is shorter than the added delay, the internal nodes affected by the inputs are already stable when the cross-coupled pair reacts. Thus, the same current is drawn from the power supply even in the presence of imbalance.
• The outputs of the DelayedFB-DFF are discharged to '0' (or precharged to '1', if an output inverter is added) during the precharge phase. During the evaluation phase, only one of the outputs is charged (or discharged). This contrasts with the SABL-FF: although its internal nodes are precharged in every cycle, its outputs are not.
E. THREE PHASE DUAL RAIL FLIP FLOP (TDPL-FF)
The operation of the TDPL logic family [18], [19] is somewhat more complex than that of two-phase dynamic logic (i.e. precharge and evaluation phases). TDPL circuits utilize dynamic logic gates; however, they operate in three phases: precharge, followed by evaluation, followed by discharge.
The TDPL-FF architecture uses two TDPL inverters, at its input and output, connected through a slightly modified SR-latch that stores the data, as shown in Fig. 1(d).
The main difference from the two-phase methodologies discussed above is the ability to tolerate differential output imbalances. In the physical (non-ideal) world, differential outputs can be affected by imbalances between resistive and/or capacitive networks. In two-phase timing schemes, this implies that the instantaneous current and/or total energy differ between transitions. The three-phase TDPL timing diagram ensures that both differential output nodes are precharged and discharged in every clock cycle. Therefore, the total energy consumed per clock cycle is not affected by capacitance imbalance. However, the instantaneous current still shows data dependency due to the imbalance in resistance.
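A back-of-the-envelope sketch of this energy argument follows. It assumes ideal switching, in which charging a capacitance C from 0 to V_DD draws C·V_DD² from the supply and discharging draws nothing; short-circuit currents, internal nodes and the resistive imbalance (which still affects the instantaneous current) are ignored. The rail capacitances are hypothetical.

```python
VDD = 1.2                     # supply voltage [V], as in the 65 nm setup of Section III
C_RAIL = (2.0e-15, 2.5e-15)   # hypothetical loads of the two differential output rails [F]

def energy_two_phase(d: int) -> float:
    """Supply energy per cycle of a two-phase (precharge/evaluate) dual-rail gate.
    Evaluation discharges rail d, so the next precharge recharges only that rail."""
    return C_RAIL[d] * VDD ** 2

def energy_three_phase(d: int) -> float:
    """Supply energy per cycle of a TDPL-style gate. Evaluation discharges rail d,
    the discharge phase empties the other rail, and precharge recharges both,
    so the result does not depend on d."""
    return (C_RAIL[0] + C_RAIL[1]) * VDD ** 2

for d in (0, 1):
    print(d, energy_two_phase(d), energy_three_phase(d))
```

In the two-phase case the per-cycle energy reveals which rail was discharged whenever the rail loads differ, whereas the three-phase schedule always recharges the total rail capacitance.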
Note that the two TDPL inverters (Fig. 1(d)) are controlled by different control signals; namely, the evaluation phases of the two inverters are complementary, and the corresponding precharge and discharge phases precede and follow (respectively) the evaluation phase of each gate (in Fig. 1(d) the control signals of the output TDPL inverter are marked by '*' to denote the difference).
III. GENERIC EVALUATION ENVIRONMENT FOR FFS
To claim that solution A provides more security than solution B, a generic testing environment must be defined in which their effectiveness in concealing information is tested and compared under the same setup, equivalent conditions and the same metrics. In what follows we describe such a testing environment and its rationale.
As mentioned above, a fabricated FF differs from an ideal one because of variations in driving strengths, physical delays, local and global process variations, noise, etc. These effects cause mismatches between the differential signals and their associated devices, which degrade the security the FF provides. The following generic environment mimics realistic operating conditions and supports the following factors:
1) IMBALANCE IN LOAD CAPACITANCES OF THE FF'S DIFFERENTIAL OUTPUTS
Many secured FFs have complementary outputs that must be assigned equal loads. In practice, because of different load gates, fan-outs and routing imbalance, the load on each output can differ. Hence, the sensitivity of the FF to this imbalance must be evaluated.
2) DELAY MISMATCH BETWEEN THE TWO COMPLEMENTARY INPUTS (DENOTED BY INVERTED INPUT DELAY)
In general, in circuits with more than one input, the arrival time of each input can differ due to process-voltage-temperature (PVT) variations, glitches, path delays (gates and routing), etc. The same applies to differential inputs.
3) IMBALANCE IN INPUT SLOPES
The voltage transition slope of each node in a design depends on many factors. This includes the logical gates in the path leading to the node, the physical parameters of the wires and loads, different Fan-Outs, etc. The generic evaluation environment allows for a characterization with a set of slopes, S, per technology.
4) DATA CHANGE AT INPUTS DURING DIFFERENT CLOCK STATES
The behavior of a FF depends on the clock state. Clearly this results in a different current signature if the input data changes while the clock is at '0' or at '1'.
To illustrate the impact of these imbalance factors, Fig. 2 shows current measurements of two SABL-FF designs using this environment. Fig. 2(a) presents the influence of the delay between the two complementary inputs (denoted by Δt) on the measured current. Fig. 2(b) shows the impact of imbalance in the differential load capacitance (denoted by ΔC_out). Illustrative sets of imbalance factors for a 65 nm bulk technology with a supply voltage of 1.2 V are listed in Table I; clearly, the sets depend on the technology under evaluation. All simulations were conducted using an analog simulation tool (Cadence Virtuoso) on the post-layout parasitic-extracted designs and were further post-processed in Matlab.
A generic setup that enables FF evaluation as a function of these imbalances is illustrated in Fig. 3. Three emulation units were connected to the units under test (UUTs). The input generator generates all the differential input transitions, each of which is repeated with all specified slopes (set S) and all inverted input delays (set Δt_inv). The differential capacitance generator unit provides a set of differential load capacitances (ΔC). In all experiments, the power supply current was measured on a dedicated resistor and stored for further processing in Matlab.
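The stimulus sweep performed by the input generator and the differential capacitance generator can be sketched as a Cartesian product of the imbalance sets. The set values below are hypothetical placeholders (the actual 65 nm values are those of Table I), and the Stimulus record is an illustrative name rather than part of the described setup; each resulting stimulus would correspond to one simulation whose supply current is then post-processed.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical imbalance sets (placeholders, not the values of Table I).
TRANSITIONS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (D_old, D_new)
SLOPES_PS = [20.0, 50.0, 100.0]                 # input-slope set S (transition times)
DT_INV_PS = [-20.0, 0.0, 20.0]                  # inverted-input-delay set Δt_inv
DELTA_C_FF = [-1.0, 0.0, 1.0]                   # differential load-imbalance set ΔC
CLOCK_STATES = [0, 1]                           # clock level during the data change

@dataclass(frozen=True)
class Stimulus:
    d_old: int
    d_new: int
    slope_ps: float
    dt_inv_ps: float
    delta_c_ff: float
    clock_state: int

def build_stimuli():
    """Enumerate every combination of data transition and imbalance factors."""
    return [
        Stimulus(d_old, d_new, slope, dt_inv, delta_c, clk)
        for (d_old, d_new), slope, dt_inv, delta_c, clk
        in product(TRANSITIONS, SLOPES_PS, DT_INV_PS, DELTA_C_FF, CLOCK_STATES)
    ]

stimuli = build_stimuli()
print(len(stimuli))  # 4 * 3 * 3 * 3 * 2 = 216
```

Enumerating the full product keeps the comparison between FFs fair: every topology sees exactly the same set of transitions and imbalance conditions.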
IV. EVALUATION METRICS
This section reviews known metrics for gate-level security evaluation and presents two new metrics. In particular, we address the instantaneous variance, NED_1, NSD_1 and NV_1 metrics used and discussed in [12], [14], [18], [19] and [25].
A. KNOWN METRICS
The gate-level security of FFs is usually evaluated by the variance of the dissipated current over different data transitions or by simplified versions of this metric, for example the Normalized Variance (NV_1), the Normalized Standard Deviation (NSD_1) and the Normalized Energy Deviation (NED_1), as evaluated in [12], [14], [18], [19] and [25]. For all metrics, higher values correspond to higher information leakage.
The NED_1 and NSD_1 metrics utilize information from the current consumed during the whole clock period; that is, the instantaneous current is integrated over the whole clock period prior to the analysis. These metrics reduce the amount of information to be stored and processed; however, the integration filters out valuable instantaneous information.
These metrics are defined as follows:
• The NED_1 metric is a function of the maximum and minimum energy (E_max, E_min) over all possible data transitions and reflects the normalized difference between the two. Thus, it disregards the probability distribution of these values. Formally,
$$NED_1 = \frac{E_{max} - E_{min}}{E_{max}},$$
where the random variable E stands for the computation energy over one clock period T:
$$E = \int_0^{T} V_{DD} \, I_{DD}(t) \, dt.$$
• The NSD_1 is the standard deviation of the energy normalized by the mean energy:
$$NSD_1 = \frac{\sigma_E}{\mu_E},$$
where σ_E and µ_E are the standard deviation and mean of the energy E, respectively.
• The instantaneous NV_1: unlike NED_1 and NSD_1, the normalized current variance (NV_1) metric relates to the point in time at which the variance of the instantaneous current is maximal. It utilizes information from the instantaneous current traces and finds the time sample at which the normalized variance is maximal (hence the term instantaneous). Formally,
$$NV_1 = \max_{t} \frac{\sigma_I^2(t)}{\mu_I^2(t)},$$
where σ_I(t) and µ_I(t) are the standard deviation and mean of the instantaneous current at time sample t, respectively. Although the NV_1 metric better reflects the information leakage, it has a drawback: it does not distinguish between the leakage of different transitions. In what follows we introduce two alternative metrics that evaluate the information leakage more accurately.
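A minimal NumPy sketch of the three known metrics, computed from a matrix of supply-current traces (one row per trace, one column per time sample), is given below. It assumes uniform sampling with step dt, approximates the energy integral with a rectangle rule, uses population statistics, and takes NV_1 as the maximum over t of σ_I²(t)/µ_I²(t). Skipping samples whose mean current is essentially zero is an added safeguard against division by zero, not part of the definition. The traces and amplitudes are synthetic.

```python
import numpy as np

def cycle_energy(traces: np.ndarray, vdd: float, dt: float) -> np.ndarray:
    """E = V_DD * integral of I_DD(t) over one clock period, one value per trace.
    traces has shape (n_traces, n_samples); the integral uses a rectangle rule."""
    return vdd * traces.sum(axis=1) * dt

def ned_1(energies: np.ndarray) -> float:
    """Normalized energy deviation: (E_max - E_min) / E_max."""
    return float((energies.max() - energies.min()) / energies.max())

def nsd_1(energies: np.ndarray) -> float:
    """Normalized standard deviation: sigma_E / mu_E (population statistics, ddof=0)."""
    return float(energies.std() / energies.mean())

def nv_1(traces: np.ndarray) -> float:
    """Assumed form of the instantaneous normalized variance:
    max over t of sigma_I(t)^2 / mu_I(t)^2. Samples whose mean current is
    (numerically) zero are skipped to avoid dividing by zero."""
    mu = traces.mean(axis=0)
    var = traces.var(axis=0)
    valid = np.abs(mu) > 1e-12 * np.abs(mu).max()
    return float(np.max(var[valid] / mu[valid] ** 2))

# Synthetic example: one current pulse per data transition, scaled by a
# hypothetical transition-dependent amplitude (arbitrary units, dt = 1 ps).
t = np.arange(300)
pulse = np.exp(-((t - 100) / 15.0) ** 2) + 0.05
amplitudes = np.array([1.0, 1.6, 1.5, 1.0])   # 0->0, 0->1, 1->0, 1->1 (assumed)
traces = amplitudes[:, None] * pulse[None, :]

energies = cycle_energy(traces, vdd=1.2, dt=1e-12)
print(ned_1(energies), nsd_1(energies), nv_1(traces))
```

Because NED_1 and NSD_1 operate on one integrated value per trace, a leak confined to a short time window can be diluted by the integration, whereas NV_1 keeps the time axis and picks the most informative sample.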
B. NEW METRICS
Below we introduce two alternative flavors of the metrics described above.
To better highlight the differences between the methods, we attach a subscript to the name of each metric (e.g. NED_1, NED_state, NV_HD). Subscript 1 indicates that the metric is computed over the whole set of current trace vectors.
In the first metric, we divide the currents into four groups, each associated with a specific data-transition state. The set of states is S = {0 → 1, 0 → 0, 1 → 0, 1 → 1}. For each s ∈ S, the average trace Ī_s is computed over all current traces corresponding to this transition. The three metrics are then computed with respect to the four average traces; for example,
$$NED_{state} = \frac{\max_{s \in S} E_s - \min_{s \in S} E_s}{\max_{s \in S} E_s}, \qquad E_s = \int_0^{T} V_{DD} \, \bar{I}_s(t) \, dt.$$
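The second metric is based on the Hamming Distance (HD) model. It divides the currents into two disjoint groups according to the HD between the previous and new data (HD = 0 for the transitions 0 → 0 and 1 → 1, HD = 1 for 0 → 1 and 1 → 0) and computes two average currents, Ī_HD0 and Ī_HD1. The metrics are then computed; for example,
$$NED_{HD} = \frac{\max(E_{HD0}, E_{HD1}) - \min(E_{HD0}, E_{HD1})}{\max(E_{HD0}, E_{HD1})},$$
where E_HD0 and E_HD1 are obtained by integrating V_DD Ī_HD0(t) and V_DD Ī_HD1(t), respectively, over the clock period.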
The grouping prior to the metric computation emphasizes cases where each input transition produces a unique current pattern or where groups of transitions leak different information. This is similar to DPA grouping [3] or to templating different groups [3], [26]; however, here it is done at the gate level to quantify leakage sensitivities. In Section VI these metrics are evaluated and their efficiency is examined for different FF circuit topologies.
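The grouping step can be sketched as follows: traces are averaged per transition state (keyed here by 2·D_old + D_new) or per Hamming distance (D_old XOR D_new), and NED is then evaluated on the energies of the group-mean traces. Only NED is shown; NSD and NV can be applied to the group means in the same way. The per-transition amplitudes, the noise level and the helper names are hypothetical.

```python
import numpy as np

def group_means(traces, keys, groups):
    """Average the traces (n_traces, n_samples) that share each group key."""
    keys = np.asarray(keys)
    return np.stack([traces[keys == g].mean(axis=0) for g in groups])

def ned(energies):
    """Normalized energy deviation (E_max - E_min) / E_max."""
    return float((energies.max() - energies.min()) / energies.max())

def ned_grouped(traces, keys, groups, vdd, dt):
    """NED evaluated on the (rectangle-rule) energies of the group-mean traces."""
    means = group_means(traces, keys, groups)
    return ned(vdd * means.sum(axis=1) * dt)

rng = np.random.default_rng(2)
n_traces = 2000
d_old = rng.integers(0, 2, n_traces)
d_new = rng.integers(0, 2, n_traces)

# Hypothetical pulse amplitude per transition, indexed by 2*D_old + D_new:
# 0->0, 0->1, 1->0, 1->1.
amp = np.array([1.00, 1.30, 1.25, 1.05])
t = np.arange(300)
pulse = np.exp(-((t - 100) / 15.0) ** 2) + 0.05
state_key = 2 * d_old + d_new
traces = amp[state_key][:, None] * pulse + rng.normal(0.0, 0.05, (n_traces, t.size))

ned_state = ned_grouped(traces, state_key, groups=[0, 1, 2, 3], vdd=1.2, dt=1e-12)
ned_hd = ned_grouped(traces, d_old ^ d_new, groups=[0, 1], vdd=1.2, dt=1e-12)
print(ned_state, ned_hd)
```

In this toy setup NED_HD comes out smaller than NED_state, because averaging 0→0 with 1→1 (and 0→1 with 1→0) hides the differences within each HD group, while the state grouping preserves them.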
V. SECURITY ANALYSIS -TRANSISTOR LEVEL
Next, we analyze the mechanisms of information leakage at the transistor-level of the FFs described in Section II. For each of these FFs, we provide a waveform showing where in time the information leak takes place and explain the transistor level mechanisms that trigger them. Clearly, understanding the soft-spots of the FFs enables a more robust evaluation of their information leakage and provides opportunities for designing secured circuits.
A. C²MOS FF
The C²MOS-FF [see Fig. 5(b)] has never been used for security applications, since its current dissipation is highly data dependent; that is, each input transition is associated with a distinct current pattern. However, as discussed below, the C²MOS-FF is an important building block of some secured FFs. In addition, C²MOS-FF-based architectures are widely used in standard cell libraries and can therefore serve as a reference point for non-secured FFs [9]. For these reasons, it is briefly discussed in this subsection. Fig. 5(a) shows the current waveforms of all possible data transitions⁵ over all imbalance factors discussed in Section III. The upper plot shows the case where the data change occurs while the clock is at '1'. The clock toggles every 0.3 ns, starting from logical '1', and the data change occurs at t ∈ [0, 0.3] ns. The figure marks the points of interest (POIs) in time where a large current variance is captured (denoted by numbered circles 1-3). The C²MOS-FF scheme is shown in Fig. 5(b); this figure is used to examine the transistor-level mechanisms associated with the high-variance POIs. To simplify the presentation, the devices are denoted by circled capital letters in Fig. 5(b).
⁵ Note that the term Data change in the figures refers to all possible data transitions; that is, the set {D_old, D_new} = {i, j}; i, j ∈ {0, 1}.
Next, we elaborate on the POIs and their associated information leakage mechanisms.
POI-(1) At this POI the input changes while clock='1' (the first clocked inverter, I, is ''closed''). The input change induces a supply bounce on V_DD through the gate-source coupling capacitance of the upper p-MOS (denoted by A). A rising 0→1 (falling 1→0) input induces a negative (positive) supply current, as shown in the figure. POI-(1) also exhibits many curves surrounding a central lobe, which are due to the set of different input slopes (S): a faster input transition results in a higher current amplitude.
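The dependence of the POI-(1) current on the input slope can be illustrated with the displacement-current relation i = C·dV/dt for a linear input ramp. This sketch assumes a single lumped coupling capacitance for device A and a hypothetical set of transition times; the sign follows the convention of the text (a rising input yields a negative supply current). It is a first-order estimate, not a model of the full C²MOS-FF.

```python
VDD = 1.2                                      # supply voltage [V]
C_A = 0.5e-15                                  # hypothetical lumped coupling capacitance of device A [F]
TRANSITION_TIMES = [20e-12, 50e-12, 100e-12]   # hypothetical input transition times [s]

def coupling_supply_current(c, v_start, v_end, t_transition):
    """First-order supply current injected through a coupling capacitance by a
    linear input ramp: i = -C * dV/dt (negative for a rising input)."""
    return -c * (v_end - v_start) / t_transition

for tr in TRANSITION_TIMES:
    rising = coupling_supply_current(C_A, 0.0, VDD, tr)
    falling = coupling_supply_current(C_A, VDD, 0.0, tr)
    print(f"t_tr = {tr * 1e12:.0f} ps: rising {rising * 1e6:+.1f} uA, falling {falling * 1e6:+.1f} uA")
```

The amplitude scales inversely with the transition time, which is why each slope in S produces its own curve around the central lobe at POI-(1).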
POI-(2) At 0.3 ns, the clock changes to '0', the master latch becomes transparent (the clocked inverter, I, becomes transparent) and the data propagate. A rising data change induces a falling voltage on node B. This triggers a positive supply current due to the coupling capacitance of the clocked inverter (III) and the charging of the slow feedback-inverter output (both denoted by C). On the other hand, a falling data change induces charging of node B. The current signatures in these two cases clearly differ, resulting in a substantial data dependency.
POI-(3) When the clock rises (t = 0.6 ns), the data propagate through the second latch to the outputs Q and Q̄. Rising data induce a fast charge of node D, whereas falling data result in charging of node E, but only after a delay due to the feedback. Clearly, the current varies for the different values of the load capacitance set, ΔC, provided by the evaluation setup.
POI-(4) At POI-(4) the data change during the low phase of the clock (the master latch is transparent). This case involves a combination of the mechanisms of POI-(1) and POI-(2) associated with the specific data change.
As expected, the C²MOS-FF leaks substantial information on the processed data during all phases of operation.
B. SABL-FF
Fig. 6(a-b) shows the current traces and the scheme of the SABL-FF circuit. In what follows, we elaborate on the POIs of this circuit and emphasize their security weaknesses.
POI-(1) The first POI is associated with an input data change during the precharge state (clock='0') of the FF. During the precharge state, the drain capacitances of the input transistors (int1 and int2) are charged to '1'. A rising input signal injects current into the power supply through the gate-drain coupling capacitances of the input n-MOS transistors, whereas a falling input draws current from the power supply. In the ideal case, the total current from the power supply sums to zero. However, with imbalanced slopes (S) and input arrival times (Δt_inv), the total current is data dependent, as clearly seen in the figure.
POI-(2) The second POI refers to input changes during the evaluation state (clock='1'). In this case, int1 and int2, which were precharged during the preceding precharge state, are already stable in one of the states ({0,1} or {1,0}). If the inputs change, int1 and int2 retain their current state, and the cross-coupled pair does not switch. Ideally, the voltage changes across the drain coupling capacitances of the input transistors do not lead to a net current draw. However, as in POI-(1), with imbalanced slopes (S) and input arrival times (Δt_inv), the total current is data dependent.
POI-(3) Immediately after the rising edge of the clock, one of the precharged feedback inverters switches (depending on the data). If the new state differs from the state of the previous clock cycle, the SR-latch reacts (i.e. data-dependently). A very large current peak emerges when a switch occurs, compared to the small current peak in the no-change case. Note that in the no-change case all curves are superimposed on each other, whereas switching of the outputs spreads the set of curves (due to the output capacitance imbalance, ΔC, which is exposed when the SR-latch switches).
POI-(4) The fourth POI relates to an input data change before the precharge state of the FF, i.e. before evaluation ends. The voltage change over the drain capacitance of transistor (a) in Fig. 6(b) causes the data-dependent current. When D='1' during evaluation, the input transistor associated with D and the transistor above int2 are open. During precharge, the drain capacitance of (a) then charges through the right branch, associated with D. On the other hand, when D changes from '1' to '0', the transistor associated with D closes and the transistor associated with D̄ opens. In this case, the current flows through the left branch, which is triggered only after the precharge of the right branch opens the int1 transistor. This time-consuming mechanism results in a slower response, as shown in the figure (denoted by HD=1), and causes a significant difference between the change and no-change states.
C. STRONG SABL-FF
The Strong SABL-FF circuit (the Improved SABL-FF described in Section II(c)) has many similarities to the SABL-FF circuit. Although the Strong SABL-FF presents a significant improvement at POI-(3)⁶, where the non-dynamic activity is less damaging thanks to the improved SR-latch design, its current signatures at POI-(1), POI-(2) and POI-(4) behave very similarly to those of the ''classic'' SABL-FF. An additional bridging transistor of the Strong SABL-FF, denoted by (b) in Fig. 6(c), results in reduced information leakage at all POIs, as discussed in Section II(c).
⁶ The imbalance in the output capacitance, ΔC, still produces a data-dependent current.
D. DETECT-DFF
In contrast to the protected FFs discussed above, the Detect-DFF architecture is asymmetric in structure (see Section II(a)). This asymmetry induces a distinct current pattern for each of the four data transitions. The POIs of the Detect-DFF, shown in Fig. 7, are as follows:
POI-(1) In the Detect-FF circuit, while the clock is at '1', the second latch of the complementary TG C²MOS FF is transparent. On the one hand, when HD=1 the FF denoted by (a) switches its outputs Q1 and Q̄1. In the case of a load-capacitance mismatch (C and C + ΔC), this induces two different current patterns (for the '0'→'1' and '1'→'0' switches). On the other hand, when HD=0 the FF denoted by (c) switches its outputs Q2 and Q̄2, which are prone to additional load-capacitance mismatches. In turn, this yields two additional current patterns.
POI-(2) The Detect-DFF is not dual rail in the traditional sense, since it does not have differential inputs; therefore, it has no Δt_inv sensitivity. However, different input slopes (S) do affect the current signature. When clock='0', all data changes propagate through the Detect-Unit differently, which induces different data-dependent currents, each exhibiting slightly different variations because of the different slopes. When clock='1', the Detect-Unit is disabled; however, the data flow through the Master-FF (denoted by (a)), and the standard C²MOS sensitivities are visible. It is important to note that the data-dependent currents in this case (clock='1') are more distinct than in the C²MOS design, because more transistors are connected to the data (D) signal and one of the outputs of the Master-FF (a) is connected to the Detect-Unit in a capacitance-imbalanced fashion.
POI-(3) While clock='0', the Detect-Unit operates. This unit is affected by both the current and the previous input data, because both the inputs and the outputs of the FFs are connected to it. In the Detect-Unit scheme, there are four different paths from D to D'', each triggered by a different data change. The difference between POI-(3) and POI-(4) is that when D changes (or does not), additional current is drawn (or not) by the path from D through D to the Detect-Unit.
E. DELAYEDFB-DFF
Fig. 8 shows the current traces and the scheme of the DelayedFB-DFF circuit. In this subsection, the POIs of this circuit are described.
POI-(1-2) As in the SABL FF, POI-(1) and POI-(2) describe the effect of S and t_inv on the DelayedFB-DFF current during the precharge phase (clock = '0'). These effects are due to the coupling capacitances of the differential input transistors. During the precharge phase (POI-(2)), the capacitance between the two transistors, denoted by (B), is charged. Two distinct current waveforms can clearly be seen in POI-(2). The first relates to the case where D = '0' at the beginning of the evaluation phase (which implies that (B) was charged) and changes to '1' while the circuit is still in evaluation. In this case, node (B) is discharged without affecting the output. When the circuit enters the precharge phase (the clock changes to '0'), node (B) is charged only after the added delay-buffer (denoted by d), whose output drives the gate of the upper transistor of (B), switches to '1'.
The second waveform is associated with the case where D = '1' at the beginning of the evaluation phase and does not change during the entire evaluation. In this case, the upper transistor of (B) is already conducting, and node (B) charges immediately without waiting for the delay-buffer. Therefore, POI-(2) shows two distinct current patterns, distinguished by whether or not the data changed during the evaluation.
POI-(3) During precharge, both outputs discharge. Although the internal-node coupling capacitances are symmetric because of the circuit's symmetric structure, the differential output capacitance is asymmetric. Therefore, different currents are induced depending on which of the output nodes is discharged.
POI-(4) This POI is associated with the impact of ΔC during evaluation, while one of the outputs rises.
F. TDPL-FF
As discussed in Section II(e), each element of the TDPL FF operates in three phases that are specific to that element. The TDPL FF scheme is shown in Fig. 9(b). The input TDPL inverter (a) operates with precharge, evaluation and discharge phases. The output inverter (c) operates with a complementary evaluation phase and its corresponding precharge* and discharge* phases [18]-[19].
The main POIs of the TDPL-FF are listed below:
POI-(1) The data inputs change during the discharge phase. In this case, the supply voltage is disconnected from the TDPL inverter (a), so the current is independent of the input slopes and of t_inv. This feature eliminates the coupling effects caused by input-data changes that were visible at this point in all the other FFs presented above. However, as can be seen at POI-(4), information leakage associated with input-data changes still exists: if the data change during the precharge phase, the supply voltage is connected, and the set of slopes S and t_inv affect the dissipated current.
POI-(2) Like the SABL FF, the TDPL FF uses an SR-latch (b) to store the data. As discussed above, the SR-latch current differs between the case with a data change and the case without one, thereby revealing information.
POI-(3) During the precharge* phase of the inverter (c), the outputs are charged and affected by ΔC, leading to information leakage.
VI. SECURITY ANALYSIS -METRICS EVALUATION
In this section, the secured FFs are compared using the metrics described in Section IV. The NED, NSD and NV were calculated for all three grouping methodologies. For example, in Fig. 6(a), which shows POI-(3) of the SABL architecture, the HD grouping corresponds to cases where the difference between data change and no change was significant. For the Detect-DFF, the State grouping indicates that the current is unique for each input transition (e.g., POI-(4) in Fig. 7(a)). Fig. 10 depicts three equipotential radar plots, one for each metric, in which each axis (corner) represents one of the three groupings. The smaller the values (i.e., the closer to the origin), the higher the security and the less information is captured. Table II summarizes the results: for each metric, the worst-case (Max) value over all groupings (one triangle curve in the plots) is listed. The grouping with the maximal value is the best tactic for an attacker.
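As a hedged illustration of how one corner of a radar plot and the worst-case (Max) entry of Table II could be computed, the sketch below assumes the NED and NSD definitions commonly used in the secure-logic literature, NED = (E_max - E_min) / E_max and NSD = std(E) / mean(E), evaluated over the per-group mean energies of one grouping. The exact Section IV definitions may differ, NV is omitted, only two of the three groupings are shown, and the helper names and energy values are hypothetical.

```python
"""Sketch: per-grouping leakage metrics and the worst-case (Max) entry.

ASSUMPTION: NED and NSD follow common literature definitions; the paper's
Section IV definitions may differ. All names and numbers are hypothetical.
"""
from statistics import mean, pstdev


def ned(energies):
    """Normalized energy deviation: (E_max - E_min) / E_max."""
    e_max, e_min = max(energies), min(energies)
    return (e_max - e_min) / e_max


def nsd(energies):
    """Normalized standard deviation: population std / mean."""
    return pstdev(energies) / mean(energies)


def worst_case(per_grouping):
    """Return the (grouping, value) pair with the maximal metric value.

    per_grouping maps a grouping name (e.g. "HD", "State") to its metric
    value, i.e. one corner of a radar-plot triangle for a single FF.
    """
    return max(per_grouping.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Hypothetical mean energies (fJ) per group; NOT data from the paper.
    groups = {
        "HD": [10.0, 10.4],                 # no data change vs. data change
        "State": [10.0, 10.1, 10.5, 10.3],  # one group per input transition
    }
    ned_per_grouping = {g: ned(e) for g, e in groups.items()}
    print(worst_case(ned_per_grouping))
```

Taking the maximum across groupings mirrors the text: the grouping with the largest metric value is the one an attacker would choose.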
The C²MOS FF emerged as more sensitive than all the other candidates for almost all metrics (Fig. 10). The Detect-DFF, which was shown above to be highly sensitive to the State analysis, exhibited relatively high State sensitivity in the NV metric, almost reaching the sensitivity of the C²MOS FF. As expected, FFs that use a (non-differential) SR-latch showed higher HD and State metric sensitivities than their fully differential counterparts (e.g., the DelayedFB-DFF). In asymmetric transistor-level architectures such as the Detect-DFF and C²MOS, each input change results in a unique current pattern; therefore, these designs stand out in the NV State analysis shown in Fig. 10. Note, however, that the Detect-DFF is designed to consume the same amount of energy regardless of the processed data. Consequently, the NED and NSD metrics capture much less of the Detect-DFF's leakage than the NV metric does, as shown in Fig. 10.
Note that the State results exceed the HD results only where the current leaks state-dependent information, as shown for the Detect-DFF, Strong-SABL and TDPL FFs.
Overall, the TDPL FF emerged as less secure than the Strong-SABL FF. This can be attributed to its use of a non-secured (non-dynamic) SR-latch (as discussed in Section IV(3)).
Although the Detect-DFF leaks more information than the other designs, it is the only FF that is compatible with standard CMOS design and does not require dual-rail I/O (or dual-rail coding).
VII. SECURITY ANALYSIS -IMMUNITY TO POWER ATTACKS
This section presents the results of model-based CPA attacks, which validate the first-order information leakage observations (Section IV) and the proposed evaluation metric results (Section VI). Each attack was run with a different current model that used information from the gate-level characterization (and grouping). This was done to show that correct use of the State- and HD-dependent information obtained from a gate-level examination can increase the module-level attack success rate. In some ways, CPA attacks updated with different State weights resemble template attacks, in which the current of each state of a module is templated [3]. However, unlike template attacks, these models require neither special knowledge nor an already-cracked device, but merely a characterization of the standard-cell library primitives (FFs).
To perform these attacks, a simplified 4-bit Add_Key_SBOX DUT module was constructed, based on the 4-bit SBOX discussed in [27] and [28]. The 4-bit input plaintext (d) was XORed with a 4-bit key and passed through the SBOX. The SBOX output was sampled by a group of four FFs, which was the target of the attack.
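A behavioral model of the DUT clarifies which values the attacked FFs hold before and after each clock edge. The sketch below assumes, purely as a stand-in, the 4-bit PRESENT S-box; the SBOX of [27], [28] may differ, and the function names are hypothetical.

```python
"""Behavioral sketch of the 4-bit Add_Key_SBOX DUT.

ASSUMPTION: SBOX is the 4-bit PRESENT S-box, used only as a stand-in for
the SBOX of [27], [28]. Function names are hypothetical.
"""
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def add_key_sbox(d, k):
    """XOR the 4-bit plaintext d with the 4-bit key k, then apply the SBOX."""
    if not (0 <= d < 16 and 0 <= k < 16):
        raise ValueError("d and k must be 4-bit values")
    return SBOX[d ^ k]


def ff_transition(d_prev, d_next, k):
    """4-bit values held by the four output FFs before and after a clock edge."""
    return add_key_sbox(d_prev, k), add_key_sbox(d_next, k)


if __name__ == "__main__":
    key = 0x3  # hypothetical secret key
    # All 16x16 input transitions for this key.
    transitions = [ff_transition(a, b, key) for a in range(16) for b in range(16)]
    print(len(transitions), transitions[:4])
```

Enumerating `ff_transition(a, b, k)` over all `a, b` in `range(16)` yields the 16×16 input transitions for one key.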
All 16×16 possible input transitions were injected into the design, and the resulting currents were divided into groups. The standard CPA procedure was adjusted to maximize the attack success rate by taking into account the characteristics of the FFs, according to the analysis in the previous section. That is, the radar plots from the previous section were used to select the grouping type that would reveal the most information about the secret key.
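The two hypothesis models can be sketched as follows. The HD model counts the bits of the 4-bit FF state that toggle, whereas the State model sums per-FF weights indexed by the (old bit, new bit) pair, a simplified stand-in for the gate-level characterization. The `STATE_WEIGHT` values are hypothetical, the SBOX is again assumed to be the 4-bit PRESENT S-box, and all function names are illustrative.

```python
"""Sketch of HD- and State-based current hypotheses for one key guess.

ASSUMPTIONS: SBOX is a stand-in (4-bit PRESENT S-box); STATE_WEIGHT is a
hypothetical per-FF characterization table, not data from the paper.
"""
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

# Hypothetical relative current of one FF for each (old, new) bit transition.
STATE_WEIGHT = {(0, 0): 0.1, (0, 1): 1.0, (1, 0): 0.8, (1, 1): 0.2}


def hd_hypothesis(q_old, q_new):
    """Hamming distance between consecutive 4-bit FF states."""
    return bin(q_old ^ q_new).count("1")


def state_hypothesis(q_old, q_new):
    """Sum of per-FF State weights over the four output FFs."""
    return sum(STATE_WEIGHT[((q_old >> i) & 1, (q_new >> i) & 1)]
               for i in range(4))


def hypotheses(inputs, k_guess, model):
    """Hypothesized current for each input pair (d_prev, d_next) under k_guess."""
    out = []
    for d_prev, d_next in inputs:
        q_old = SBOX[d_prev ^ k_guess]
        q_new = SBOX[d_next ^ k_guess]
        out.append(model(q_old, q_new))
    return out


if __name__ == "__main__":
    pairs = [(a, b) for a in range(16) for b in range(16)]
    hd = hypotheses(pairs, 0x3, hd_hypothesis)
    st = hypotheses(pairs, 0x3, state_hypothesis)
    print(hd[:5], [round(v, 2) for v in st[:5]])
```

The State model reduces to a weighted HD model only when the (0, 1) and (1, 0) weights are equal and the no-change weights are zero; unequal weights are what let it separate transitions that the HD model lumps together.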
The analysis of the three radar plots shows that the instantaneous current provides more information than an averaging analysis. Therefore, the attacks were based on the maximal correlation over the whole clock period (an intra-cycle instantaneous attack).
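The intra-cycle instantaneous attack can be sketched in pure Python: for every key guess, the Pearson correlation between the hypothesis vector and the current at each time sample is computed, and the peak absolute correlation over the clock period is kept. Using the absolute value and the data layout `traces[n][t]` (current of trace n at time sample t) are assumptions of this sketch, and the function names are hypothetical.

```python
"""Intra-cycle instantaneous CPA sketch (pure Python).

ASSUMPTIONS: traces[n][t] is the current of trace n at time sample t; the
peak is taken over |rho|. Names are hypothetical.
"""
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no measurable linear dependence
    return sxy / sqrt(sxx * syy)


def max_correlation(hyp, traces):
    """Peak |rho| between the hypothesis and the current over all time samples."""
    n_samples = len(traces[0])
    return max(abs(pearson(hyp, [tr[t] for tr in traces]))
               for t in range(n_samples))


def cpa(hyp_by_key, traces):
    """Map each key guess to its peak intra-cycle correlation."""
    return {k: max_correlation(h, traces) for k, h in hyp_by_key.items()}
```

The key guess with the largest returned value is the attacker's best candidate; plotting `pearson(hyp, column_t)` for every `t` gives correlation-versus-time curves of the kind shown in Fig. 11.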
The correlation values computed for the CPA attacks are shown in Fig. 11. For the Detect-DFF, the plot contains 16 curves, one for each key hypothesis. The values indicate the instantaneous correlation between the hypothesized current and the measured current. The correct hypothesis appears as the bold black curve, and all other hypotheses are shown in light gray. The upper plot was derived from a CPA conducted with an HD-based current hypothesis model, and the lower plot with the modeled State-based current hypothesis. The attack with the HD model extracts substantial information around 650 ps, which is associated with the embedded C²MOS FFs, whereas the attack with the State-based model extracts substantial information around 150 ps, which is associated with the State-sensitive Detect-Unit. Similar temporal sensitivities emerged for the other designs.
After examining correlation versus time, the relative correlation ratio (CR) was derived as a function of the number of traces (samples) collected, as shown in Fig. 12. A CR larger than 1 implies a successful attack. The figure shows two examples: a circuit embedded with Strong-SABL FFs and one embedded with Detect-DFFs (Fig. 12(a) and (b), respectively). In Fig. 12(a), the CR is plotted for the HD- and State-based models (left and right). Both CR curves cross CR = 1 with as few as 28 current traces, and their maximum values are quite close. This is reasonable, since the Strong-SABL devices show HD-dominated leakage and the State analysis does not provide substantial additional information. In contrast, the Detect-DFF (Fig. 12(b)) shows more State-dependent information, manifested in the fact that its CR crosses 1 with as few as 10 traces, compared to 31 with the HD model. Table III summarizes the maximum CR and the CR = 1 crossing point for all designs with the HD- and State-based hypotheses. The C²MOS FF was the most sensitive design, whereas the DelayedFB-DFF exhibited the most secure characteristics: it could not be attacked with the HD model, and with the State model it had the smallest CR and lower correlation values.
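The exact CR definition is not reproduced in this section. The sketch below assumes CR is the correct key's peak correlation divided by the largest peak correlation among the wrong keys, which makes CR > 1 equivalent to the correct key ranking first; the crossing point is taken as the smallest trace count at which CR first exceeds 1. The function names are hypothetical.

```python
"""Correlation-ratio (CR) sketch.

ASSUMPTION: CR = peak correlation of the correct key / highest peak
correlation among wrong keys. The paper's exact definition may differ.
"""


def correlation_ratio(corr_by_key, correct_key):
    """CR for one trace count, given each key guess's peak correlation."""
    best_wrong = max(c for k, c in corr_by_key.items() if k != correct_key)
    if best_wrong == 0:
        return float("inf")  # every wrong key shows zero correlation
    return corr_by_key[correct_key] / best_wrong


def first_crossing(cr_curve):
    """Smallest trace count n with CR(n) > 1, or None if CR never exceeds 1.

    cr_curve is a list of (n_traces, CR) pairs sorted by n_traces.
    """
    for n, cr in cr_curve:
        if cr > 1.0:
            return n
    return None
```

A stricter variant would require CR to stay above 1 for all larger trace counts, so that a transient crossing caused by noise is not counted as a successful attack.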
VIII. CONCLUSION
Sequential elements dominate the information leakage of synchronous hardware systems. Because they are governed by a global clock signal, they make it feasible to synchronize measurements, which is a prerequisite for statistical side-channel analysis attacks. This manuscript presented a unified analysis framework and a comprehensive comparison of known secure flip-flop circuits. An in-depth investigation of the device-level information leakage of these FFs was provided, supplemented by important insights. In addition, several evaluation metrics were proposed to quantify the security of these elements. Finally, simulated power analysis attacks that exploit the information evaluated by the proposed metrics at the gate level were shown to extract more information at the module level.
