Abstract: Masking is one of the most commonly used countermeasures against Side-Channel Attacks (SCAs). It is built on a security framework, such as the ISW framework, and ensures theoretical security through secret sharing.
Introduction
In the past decade, Side-Channel Attacks (SCAs) have become a threatening analysis method for cipher implementations [1] [2] [3] [4]. Masking schemes are an algorithm-level countermeasure against SCA [5] [6] [7] [8]. The goal of a masking scheme is to make the physical power consumption of a cryptographic device independent of the intermediate values of the cryptographic algorithm, which is achieved by randomizing these intermediate values. Ishai et al. [9] proposed a formal theoretical framework (known as the ISW framework) and gave security proofs for the circuits. Because of the uncontrolled glitches that occur in masked circuits, hardware masking implementations face many challenges. Most existing SCAs concentrate on the leakage of registers; however, in a practical cipher chip, the leakage of the combinational circuit also accounts for a large proportion of the total leakage.
Existing power analysis of combinational circuits can be divided into two types. The first analyzes the power consumption caused by glitches occurring in the combinational circuit [10] [11] [12] [13]. Starting in 2005, several papers pointed out that masking schemes can be unsafe in practical CMOS designs even though they are provably secure [11, 12]. This is because a CMOS circuit generates leakage caused by glitches, which can account for 20%-70% of the dynamic switching power [13]. Glitches are a critical component of the switching activity of signals in typical Register Transfer Level (RTL) circuits [10]. Mangard et al. [11] theoretically investigated the influence of glitches on masked CMOS gates and verified it experimentally with SPICE simulation.
The second type of analysis constructs a leakage model of the whole combinational logic circuit [12, 14]. Suzuki et al. [14] investigated the leakage model for the combinational circuit and proposed static and dynamic leakage models for CMOS devices. Mangard et al. [12] proposed the Toggle Count (TC) model in 2005, which was applied to the analysis of the power leakage of combinational circuits. Because of the leakage caused by glitches, masking circuits can be successfully attacked with TC models that are derived from a simulation with glitches based on the back-annotated netlist. The original Verilog or netlist file is required. Through logic simulation, the TC model is then constructed by counting all transitions in the whole circuit, which means that the TC model describes the sum of the power consumption over a period of time instead of the instantaneous power consumption. Moradi et al. [15] proposed an improved TC model called the Enhanced Toggle Count (ETC) model. The effectiveness of the ETC model was verified by simulation results, and a comparison of the ETC and TC models showed an improvement of 16% in the similarity to an analog simulation result.
Simulation results have demonstrated the effectiveness of the TC method, and several researchers have tried to characterize the difference between simulations and physical devices for Field-Programmable Gate Array (FPGA) power simulation [16] [17] [18]. Most of them focused on improving the accuracy of power simulation with Electronic Design Automation (EDA) tools. However, they improved the accuracy with respect to the reconfiguration overhead, accelerator area, performance tradeoff, and idle power consumption [18], none of which is related to the sensitive data. Therefore, the existing work on power simulation cannot be used to analyze the leakage of combinational circuits.
Moradi et al. [19] investigated a software implementation of the Rotating S-Box Masking (RSM) scheme and proposed an approach to detect side-channel leakages in cryptographic implementations. That paper shows that a generic verification method for practical masking implementations is necessary. This study takes the TC method as the entry point for analyzing the leakage of combinational circuits in masking designs. Moreover, it pays particular attention to the leakage information obtained from different simulations and to identifying the leakage of the design phase and the implementation phase separately.
First, the differences among logic-level simulations are investigated and the effectiveness of TC models derived from different simulation levels is discussed. Next, the original TC model is classified into a Non-glitch Toggle Count (NTC) model and a Glitch Toggle Count (GTC) model according to the existence of glitches. Furthermore, a new evaluation method is proposed to detect the leakage from two phases, namely the design phase and the implementation phase of the masking scheme. Finally, an SDF-based improvement is proposed that can significantly increase the effectiveness of the GTC model.
The rest of this paper is structured as follows. Section 2 introduces notations, measurement setups, and the Correlation Power Analysis (CPA) distinguisher considered throughout the paper. Section 3 introduces the background of TC models and discusses several problems with them. Moreover, the different levels of simulation and the characteristics of the corresponding TC models are clarified in that section. Section 4 introduces the whole flow of the proposed TC analysis method. Sections 5 and 6 compare the theoretical effectiveness of different TC models. In addition, several experiments with TC models are conducted to evaluate the leakages of the design phase and the implementation phase of the masking scheme separately. The conclusions are stated in Section 7.
Preliminaries
To make the paper self-contained, this section introduces notations and provides background information about the measurement setups considered throughout the paper. In addition, the CPA distinguisher that was utilized by the original TC model is introduced.
Notations
Calligraphic letters, such as $\mathcal{X}$, denote finite sets. The corresponding capital letter $X$ denotes a random variable over $\mathcal{X}$, while the lowercase letter $x$ denotes a particular element of $\mathcal{X}$. The variance of $X$ is denoted by $\mathrm{Var}(X)$. The Pearson correlation coefficient between $X$ and $Y$ is denoted by $\rho_{X,Y}$ and measures the linear interdependence between $X$ and $Y$. The covariance between $X$ and $Y$ is denoted by $\mathrm{Cov}(X, Y)$.
Measurement setups
In the following sections, the SASEBO-GII platform equipped with a Xilinx Virtex5 FPGA chip [20] , which is specially designed for research on hardware security, such as SCA attacks, was used to conduct the practical experiments.
In all the experiments, the SCA traces were collected by means of a KEYSIGHT InfiniiVision DSOX3034A digital oscilloscope at a sampling rate of 2 GHz and a bandwidth limit of 20 MHz to reduce the environmental noise.
To construct the TC models, the design was simulated by using the ISE Simulator (ISIM), which is a logic simulator on the Xilinx ISE platform. Four levels of simulation are available in ISIM: behavioral simulation, post-translate simulation, post-map simulation, and post-route simulation.
CPA distinguisher
The proposed TC analysis method is based on the CPA distinguisher, which is utilized by the original TC model [3]. In this case, the method starts by constructing a leakage model $L(X)$ for the target intermediate variable $X$. This model corresponds to the leakages associated with the different values of $X$. Then the estimator calculates the correlation, $\rho_{P(t),L(X)}$, between the physical power consumption collected from the measurement devices, $P(t)$, and the modeled leakages from the leakage model (e.g., the TC model), $L(X)$, to find the real leakages.
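As an illustration of the distinguisher, the following minimal Python sketch computes $\rho_{P(t),L(X)}$ for every time sample. The function names `pearson` and `correlation_trace` and the toy data are assumptions made for this example; they are not taken from the original TC tooling.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(vx * vy)

def correlation_trace(model, traces):
    """Correlate the modeled leakage L(X) with every time sample of P(t).

    model  : one modeled leakage value per trace
    traces : list of traces, each a list of power samples
    """
    n_samples = len(traces[0])
    return [pearson(model, [tr[t] for tr in traces]) for t in range(n_samples)]

# Toy data (hypothetical): sample 1 follows the model linearly, sample 0 is constant.
model = [0, 1, 2, 3]
traces = [[5.0, 1.0], [5.0, 3.0], [5.0, 5.0], [5.0, 7.0]]
rho = correlation_trace(model, traces)
```

In an attack, one such correlation trace would be computed per key hypothesis, and the hypothesis with the highest peak would be taken as the key candidate.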
TC Model

Introduction of TC model
Mangard et al. [12] proposed the TC model in 2005. The TC model is a type of leakage model that is constructed on the premise of acquiring the design details about the combinational circuit, which may require the back-annotated netlist file of several parts of the device. In general, the TC model is constructed after calculating the number of transitions that occur in the combinational circuit and the registers.
The TC model is defined as follows:

$$P(t) = \sum_{i=1}^{m} g_i(t) \qquad (1)$$

where $P(t)$ denotes the physical power consumption at time sample $t$; $m$ denotes the number of internal signals; and $g_i(t)$ is the sum of transitions occurring in the $i$-th signal during the period $[t, t+\varepsilon]$, where $\varepsilon$ is determined by the delay of the longest path in the combinational circuit [15]. In a real-world scenario, the attacker must simulate the combinational circuit for all possible values. If the register has an eight-bit width, the attacker has to simulate the circuit $256 \times 255$ times and accumulate the number of transitions in each simulation separately. Then the attacker can utilize the TC model to perform a CPA attack.
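The counting step can be sketched in Python as follows, assuming that the model value is the sum of the per-signal toggle counts inside the window $[t, t+\varepsilon]$. The event-list representation, the signal names, and the helper names `count_toggles` and `toggle_count` are hypothetical; a real flow would read the transitions from the logic simulator output.

```python
def count_toggles(events, t_start, t_end):
    """Count value changes of one signal inside [t_start, t_end].

    events: time-ordered list of (time, value) pairs; the first pair
    only sets the initial value and is never counted as a toggle.
    """
    toggles = 0
    prev = None
    for time, value in events:
        if prev is not None and value != prev and t_start <= time <= t_end:
            toggles += 1
        prev = value
    return toggles

def toggle_count(signals, t_start, epsilon):
    """Sum of the toggle counts g_i(t) over all signals in [t, t + epsilon]."""
    return sum(count_toggles(ev, t_start, t_start + epsilon)
               for ev in signals.values())

# Hypothetical simulation result for one input transition.
signals = {
    "n1": [(0, 0), (3, 1)],                  # one normal transition
    "n2": [(0, 0), (2, 1), (4, 0), (6, 1)],  # final transition plus one glitch pulse
    "n3": [(0, 1)],                          # never toggles
}
tc = toggle_count(signals, t_start=0, epsilon=10)

# Number of ordered input transitions for an eight-bit register.
N_SIMULATIONS = 256 * 255
```

Here the glitch pulse on `n2` contributes two extra toggles, which is exactly the kind of activity that a glitch-free simulation would not report.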
Effectiveness of TC-Correlation between modeled leakage and physical samples
Studies have found that side-channel leakage exists in the masked CMOS implementations due to glitches [12] . Moreover, the glitches occur in combinational circuits due to the delay of logic gates. Several factors affect the delay of the logic gates, such as the size of the combinational circuit and the temperature of the physical device. Though most of the glitches occur in sequence, the influence of glitches on power consumption is added accumulatively and appears as a peak in power consumption traces [15] . Mangard et al. [12] utilized this character and proposed the TC model.
To construct a TC model, the DPA-Contest V2 design [21] was simulated by using the ISIM in the Xilinx ISE platform. Figure 1a shows the TC model constructed by behavioral simulation and Fig. 1b shows the physical power trace. To decrease the noise, the physical traces with the same intermediate value were averaged. In this way, 256 averaged traces were acquired, each containing 4000 sample points. Hence, the traces were represented as a matrix, $T$, of size $256 \times 4000$. The Pearson correlation coefficient between the TC model constructed by the behavioral simulation and each column of the matrix $T$ was calculated, and the result is shown in Fig. 1c. The absolute value of the correlation begins to reach a maximum when the sample point is approximately 1220, which is the time when the signals of the combinational circuit begin to toggle.
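The averaging step can be sketched as follows. The function name `average_by_value` and the tiny example data are assumptions for illustration; in the experiment described above there would be 256 intermediate values and 4000 samples per trace.

```python
def average_by_value(values, traces, n_values=256):
    """Average all traces that share the same intermediate value.

    values : intermediate value (0 .. n_values - 1) of each trace
    traces : list of traces, each a list of power samples
    Returns one averaged trace per value, or None if a value never occurred.
    """
    n_samples = len(traces[0])
    sums = [[0.0] * n_samples for _ in range(n_values)]
    counts = [0] * n_values
    for v, trace in zip(values, traces):
        counts[v] += 1
        row = sums[v]
        for t, p in enumerate(trace):
            row[t] += p
    return [
        [s / counts[v] for s in sums[v]] if counts[v] else None
        for v in range(n_values)
    ]

# Tiny hypothetical example with two possible values and two samples per trace.
T = average_by_value([0, 1, 0], [[1.0, 2.0], [10.0, 20.0], [3.0, 4.0]], n_values=2)
```

Each row of the result corresponds to one row of the matrix $T$, and the model is then correlated with each column.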
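[Figure 1: (a) behavioral TC model of the S-Box, (b) physical power trace, (c) correlation between the two, with the point marked where the signals in the combinational circuit begin to toggle.]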
Toggle power consumption of combinational circuit
The toggle power consumption of the combinational circuit is the core part of TC models. Since the TC model is derived from simulations, the number of transitions is fixed when the input and the design of the circuit are both fixed. However, the number of transitions in the real circuit can hardly be fixed, because it is affected by many factors, such as the input-output path delay, the port delay, the size of the combinational circuit, and the temperature of the physical device. For example, assuming a linear ramp at the output of the gate, for an input-output path delay, $d$, and a glitch duration, $w_i$, at the gate input, the glitch duration, $w_o$, at the output of the gate, which influences the number of glitches, can be approximated as follows [22]:
The TC model was constructed with digital simulators, and the transition power of registers and logic gates was assumed to be identical. When the total number of transitions in the circuit was counted as the simulated dynamic power, the number of transitions in the combinational circuit exceeded that in the registers. In other words, the modeled leakage of the combinational circuit accounted for the larger proportion. Evidently, the effectiveness of the TC method depends on the difference between the transitions in the physical device and those at the simulation levels. Existing studies of the TC method have not addressed this difference. Because different simulation levels reveal the leakage of the countermeasures, studying them helps to explain the cause of the TC leakage in the physical circuit and to propose secure countermeasures in both the design and implementation phases.
Different levels of simulation
TC models can be derived from different levels of logic simulation. Both the analog and logic simulations are used to simulate the power consumption of a digital circuit, and the result of the analog simulation is more precise than the logic simulation [23] . Since the TC model is derived from the logic simulation, the accuracy of logic simulations is the core problem in this paper. Logic simulations are based on netlists of the digital circuit that usually contain back-annotated information about the propagation delay. Naturally, the precision of the logic simulation depends on the accuracy of the back-annotated information [23] . The ISIM is a logic simulator equipped with the ISE platform, and four levels of simulations are available in ISIM, including behavioral simulation, post-translate simulation, post-map simulation, and post-route simulation. The original Verilog design is needed when performing behavioral simulation, and the netlist file that is generated after the design has been synthesized and translated is required when performing the post-translate simulation. Post-map simulation is available after the netlist has been mapped to the physical device. After the netlist has been placed and routed, a post-route simulation is available. In addition, the input-output path delay is provided when performing a post-map simulation. The input-output path delay and the port delay are both provided when performing the post-route simulation. Theoretically, the post-route simulation is the most accurate. However, the accuracies of the last two levels of simulations may be decreased due to the effect of noise in the physical device. The differences among the four levels of simulations are shown in Table 1 .
TC Analysis Method
To analyze the leakage of a masking scheme in the design phase and the implementation phase separately, the TC models are classified as NTC or GTC, depending on whether the simulation describes the quantity and distribution of glitches. Among the four levels of simulation, the propagation delay is not provided when performing the behavioral and post-translate simulations. Since the propagation delay is a necessary condition for predicting glitches, the results of the behavioral and post-translate simulations contain only normal transitions. However, the propagation delay is provided when performing the post-map and post-route simulations, and the corresponding simulation results contain both normal transitions and glitches. In conclusion, the behavioral and post-translate simulations belong to the NTC model, while the post-map and post-route simulations belong to the GTC model.
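The classification can be summarized in a small lookup, sketched below in Python. The dictionary layout and the function name `tc_model_class` are assumptions for illustration; the delay information per level follows the description of the four ISIM simulation levels given in the text.

```python
# Delay information available at each simulation level, as described in the text.
SIMULATION_LEVELS = {
    "behavioral":     {"path_delay": False, "port_delay": False},
    "post-translate": {"path_delay": False, "port_delay": False},
    "post-map":       {"path_delay": True,  "port_delay": False},
    "post-route":     {"path_delay": True,  "port_delay": True},
}

def tc_model_class(level):
    """Return 'GTC' if the level has delay information (glitches can appear), else 'NTC'."""
    info = SIMULATION_LEVELS[level]
    return "GTC" if info["path_delay"] or info["port_delay"] else "NTC"

classes = {level: tc_model_class(level) for level in SIMULATION_LEVELS}
```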
A vulnerable masking scheme may contain leakage in the design phase or in the implementation phase. Leakage in the design phase is caused by a defect of the original scheme, as in the MAS-Box scheme [5]; thus, the NTC model can be utilized to investigate design phase leakage. Leakage in the implementation phase is caused by a weakness of a particular implementation method, as in the MOS-Box scheme [6]; thus, the GTC model can be utilized to investigate implementation phase leakage. A masking scheme without leakage in the design phase may still leak because of a weak implementation.
Before introducing the whole flow of the TC analysis method, significant definitions utilized throughout the paper are listed below.
Definition 1 (Leakage) After calculating the correlation, $\rho_{P(t),L(X)}$, between the physical power consumption, $P(t)$, and the modeled leakages (calculated by leakage models), $L(X)$, the circuit contains leakages if $\rho > \rho'$ (where $\rho$ is the correlation corresponding to the correct key, and $\rho'$ is the highest correlation among all correlations corresponding to the wrong keys).
Definition 2 (NTC leakage and GTC leakage) If the leakage is detected by the NTC model (the GTC model), it belongs to the NTC leakage (the GTC leakage).
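Definition 1 translates into a short check, sketched here in Python. The function name `has_leakage`, the dictionary of per-key correlations, and the numerical values are hypothetical; the sketch assumes that one correlation value (for example, the peak over time) has already been computed for each key hypothesis.

```python
def has_leakage(corr_by_key, correct_key):
    """Definition 1: leakage exists if the correct key's correlation
    exceeds the highest correlation among all wrong keys."""
    rho = corr_by_key[correct_key]
    rho_wrong = max(c for k, c in corr_by_key.items() if k != correct_key)
    return rho > rho_wrong

# Hypothetical correlation values for three key hypotheses.
leaky = has_leakage({0x2B: 0.31, 0x00: 0.12, 0x7F: 0.18}, correct_key=0x2B)
clean = has_leakage({0x2B: 0.10, 0x00: 0.12, 0x7F: 0.18}, correct_key=0x2B)
```

Running the same check once with an NTC model and once with a GTC model yields the two leakage flags of Definition 2.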
The whole flow of the TC analysis method is as follows:
( 
If the circuit contains NTC leakage and GTC leakage simultaneously, the masking scheme has defects in the design phase, and the Verilog design needs to be modified. If the circuit contains GTC leakage only, the masking scheme has defects in the implementation phase, and the implementation method needs to be modified.
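The decision rule stated above can be written as a small function. This is a sketch only: the function name `diagnose` and the returned strings are assumptions, and the case of NTC leakage without GTC leakage is not covered by the rule in the text, so it is reported as such rather than interpreted.

```python
def diagnose(ntc_leakage, gtc_leakage):
    """Map the two leakage flags to the phase that needs to be fixed."""
    if ntc_leakage and gtc_leakage:
        return "design phase defect: modify the Verilog design"
    if gtc_leakage:
        return "implementation phase defect: modify the implementation method"
    if ntc_leakage:
        return "not covered by the stated rule"
    return "no leakage detected by either model"

result_both = diagnose(True, True)
result_gtc_only = diagnose(False, True)
```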
TC Analysis on the Design Phase of Masking
TC model from non-glitch simulations
Non-glitch simulations include the behavioral simulation and the post-translate simulation. The difference between them is that the post-translate TC model contains extra normal transitions that come from the logic units derived from synthesizing and translating. Next, the effectiveness of the behavioral TC model and the post-translate TC model is compared theoretically. Variable $L_t$ is defined as the modeled leakage derived from the post-translate simulation, and $L_b$ as the modeled leakage derived from the behavioral simulation.
The physical power consumption, $P_a$, contains $P_{ta}$ and $P_{to}$. Variable $P_{ta}$ denotes the physical power consumption derived from the normal transitions of signals in the post-translate netlist, and $P_{to}$ denotes the remaining physical power consumption, apart from $P_{ta}$. Thus, $P_a = P_{ta} + P_{to}$ and $P_{ta} = C \cdot L_t$, where $C$ is a constant. $L_b$ and $P_{to}$ are independent, and $L_t$ and $P_{to}$ are also independent.
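The Pearson correlation coefficient $\rho_{L_b,P_a}$ of $L_b$ and $P_a$ can be calculated as follows:

$$\rho_{L_b,P_a} = \frac{\mathrm{Cov}(L_b, P_a)}{\sqrt{\mathrm{Var}(L_b)\,\mathrm{Var}(P_a)}} = \frac{C \cdot \mathrm{Cov}(L_b, L_t)}{\sqrt{\mathrm{Var}(L_b)\,\mathrm{Var}(P_a)}} \qquad (3)$$

Then, $\rho_{L_t,P_a}$ of $L_t$ and $P_a$ can be calculated as follows:

$$\rho_{L_t,P_a} = \frac{\mathrm{Cov}(L_t, P_a)}{\sqrt{\mathrm{Var}(L_t)\,\mathrm{Var}(P_a)}} = \frac{C \cdot \mathrm{Var}(L_t)}{\sqrt{\mathrm{Var}(L_t)\,\mathrm{Var}(P_a)}} \qquad (4)$$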
According to Eqs. (3) and (4), the theoretical relation between $\rho_{L_b,P_a}$ and
According to the latter experiments, L b L 0 t has a direct relationship with a unique feature of the logic circuit design.
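One consequence of the stated assumptions ($P_a = C \cdot L_t + P_{to}$, with $P_{to}$ independent of both modeled leakages) is that $\mathrm{Cov}(L_b, P_a) = C \cdot \mathrm{Cov}(L_b, L_t)$, so that $\rho_{L_b,P_a} = \rho_{L_b,L_t} \cdot \rho_{L_t,P_a}$ and therefore $|\rho_{L_b,P_a}| \le |\rho_{L_t,P_a}|$. This product form is derived here from the assumptions and is not necessarily the form in which the original relation was written. The following Python sketch checks it numerically on synthetic data; the decomposition of $L_t$ into $L_b$ plus independent extra transitions, the distributions, and all constants are assumptions chosen only for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(0)  # fixed seed so the run is repeatable
N, C = 20000, 2.0       # sample count and scale constant (arbitrary toy values)

L_b = [rng.randint(0, 8) for _ in range(N)]     # behavioral toggle counts
extra = [rng.randint(0, 8) for _ in range(N)]   # transitions added by synthesis (assumed independent)
L_t = [b + e for b, e in zip(L_b, extra)]       # post-translate toggle counts
P_to = [rng.gauss(0.0, 5.0) for _ in range(N)]  # remaining power, independent of the models
P_a = [C * t + o for t, o in zip(L_t, P_to)]    # P_a = C * L_t + P_to

rho_b = pearson(L_b, P_a)    # behavioral model against the power
rho_t = pearson(L_t, P_a)    # post-translate model against the power
rho_bt = pearson(L_b, L_t)   # similarity of the two models
```

When the two models are almost perfectly correlated, as reported for SLCD, the product form predicts nearly identical effectiveness; when synthesis adds many extra transitions, the behavioral model is expected to be weaker.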
CLCD vs. SLCD
In practical Logic Circuit Design (LCD), the behavioral simulation counts the wire signals, input signals, and output signals as the units of transitions [24]. The post-translate simulation is an extension of the behavioral simulation. Each signal in the original design is expressed as a Basic Logic Unit (BLU), and the number of input ports of a BLU is limited. If a signal cannot be expressed with a single BLU, several BLUs and additional output signals are needed. Therefore, when the original design is CLCD, more BLUs and output signals are added in the post-translate simulation.
Definition 3 (CLCD and SLCD) Let $\mathrm{BLU}(I_n, O)$ denote a BLU with an $n$-bit input port and a one-bit output port. Let $y$ ($y \in Y$) be a variable in the Verilog design that is located after the ASSIGN statement [24], and let $Y$ be the set of all such $y$. $y = F$
Instances of CLCD and SLCD
In Xilinx Virtex5, the BLU of the combinational circuits is the LUT. The maximum number of input ports of a LUT is six, so one LUT can perform a computation over $\mathrm{GF}(2^6)$. As shown in Fig. 2a, the design is CLCD when the number of inputs of a signal is more than six; in that case, an extra signal and a LUT3 are added after synthesis. However, the extra signal and the LUT3 are not added if the design is SLCD, as shown in Fig. 2b.
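The distinction can be illustrated with a simple fan-in check, sketched below in Python. This reading of the criterion (a design is CLCD as soon as one signal depends on more than six inputs) is an interpretation of the text; the function name `classify_design`, the signal names, and the fan-in numbers are hypothetical.

```python
LUT_INPUTS = 6  # maximum number of LUT input ports stated for Virtex5

def classify_design(fan_in_by_signal, max_inputs=LUT_INPUTS):
    """Return 'CLCD' if some signal needs more inputs than one LUT offers, else 'SLCD'."""
    if any(n > max_inputs for n in fan_in_by_signal.values()):
        return "CLCD"
    return "SLCD"

# Hypothetical fan-in tables: every S-Box output bit depends on all 8 input bits.
sbox_like = {"y%d" % i: 8 for i in range(8)}
two_level = {"y0": 4, "y1": 6, "y2": 3}

kind_sbox = classify_design(sbox_like)
kind_two_level = classify_design(two_level)
```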
Experiment of TC method in SLCD
This experiment is based on the Xilinx Virtex5 FPGA platform. The CLCD is the AES S-Box, while the SLCD is a two-level combinational circuit with 8 inputs and 8 outputs. Four TC models can be acquired from the different levels of simulation, and the value of each TC model is shown in Fig. 2a. All correlation coefficients between each pair of the four models are more than 0.98. Therefore, when performing CPA attacks in SLCD with each TC model, the results are almost identical. In these experiments, the correct key can be recovered when the number of traces reaches 34 000.
From Fig. 3 and the theoretical result in Eq. (5), the effectiveness of the behavioral TC model and that of the post-translate TC model are shown to be identical in SLCD.
Experiment of TC method in CLCD
The CLCD is represented by the design of the AES S-Box, and the values of the different TC models are shown in Fig. 4. To compare the TC model and the physical trace more intuitively, a linear transformation was applied to the TC model. The transformed post-translate TC model and the physical trace are shown in Fig. 5, where a negative correlation can be found; the value of the correlation is $-0.36$.
Two CPA attacks were performed on the physical power consumption traces with the two TC models, and their effectiveness was compared. According to Eq. (5), the post-translate TC model is more effective than the behavioral TC model in CLCD, which agrees with the experimental result.
Leakage in the design phase
MAS-Box masking scheme
The MAS-Box masking scheme was proposed by Akkar and Giraud [5], and it is a first-order multiplicative masking scheme. Since multipliers with one operand at zero usually require less power than in all other cases, the multiplicative masking scheme is vulnerable to the Zero-Value (ZV) model [25].
Experiment results of the NTC model
In this experiment, 300 000 traces were collected and each of them contained 4000 points. Because the MAS-Box scheme is CLCD, the post-translate TC model was selected to represent the NTC model, and the post-route model represented the GTC model as a control. The post-translate simulation and the post-route simulation were performed to analyze the masking scheme, and the transitions were counted when the input of the combinational circuit changed from 0 to the unmasked S-Box input. Furthermore, the correlations between the physical traces and the two TC models were calculated separately and are shown in Fig. 7, where the black trace indicates the correlation of the correct key. Evidently, in both cases the highest peak appears in the black trace. Leakage that occurs in the design phase persists through the implementation phase. Thus, the leakage can be detected by both the NTC model and the GTC model.
TC Analysis on the Implementation Phase of Masking
TC model from glitch simulations
The glitch simulations include the post-map simulation and the post-route simulation. The post-map simulation takes the input-output path delay into account, while the post-route simulation takes both the input-output path delay and the port delay into account. In the real circuit, the port delay is widespread and accounts for a significant proportion of the total delay. The difference between the two Standard Delay Format (SDF) [26] files used by the post-map simulation and the post-route simulation is shown in Table 2. The port delays of the post-map simulation are all equal to zero, while the port delays of the post-route simulation are constant values that were estimated by the EDA tool.
Table 2 SDFs used by the post-map simulation and the post-route simulation.
Next, the effectiveness of the post-map TC model and the post-route TC model is compared theoretically. The TC models can be considered as the modeled leakage. The modeled leakages derived from the post-map simulation and the post-route simulation cannot be totally identical to the physical power consumption due to glitches. The post-map modeled leakage is $L_m = L_t + L_{mg}$, where $L_{mg}$ is the modeled leakage derived from the glitches of the post-map simulation. The post-route modeled leakage is $L_r = L_t + L_{rg}$, where $L_{rg}$ is the modeled leakage derived from the glitches of the post-route simulation. The physical power consumption is $P_a = P_{ta} + P_{to}$.
Variables $L_{mg}$ and $L_t$ are independent, $L_{rg}$ and $L_t$ are independent, and $L_t$ and $P_{to}$ are independent.
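First, $\rho_{L_{mg},P_{to}}$ and $\rho_{L_{rg},P_{to}}$ can be calculated as follows:

$$\rho_{L_{mg},P_{to}} = \frac{\mathrm{Cov}(L_{mg}, P_{to})}{\sqrt{\mathrm{Var}(L_{mg})\,\mathrm{Var}(P_{to})}} \qquad (7)$$

$$\rho_{L_{rg},P_{to}} = \frac{\mathrm{Cov}(L_{rg}, P_{to})}{\sqrt{\mathrm{Var}(L_{rg})\,\mathrm{Var}(P_{to})}} \qquad (8)$$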
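The routed path length causes the difference in the propagation delay, and the routed path length has a uniform distribution. Thus, the variances of the glitches derived from the post-map simulation and the post-route simulation can be assumed to be equal:

$$\mathrm{Var}(L_{mg}) = \mathrm{Var}(L_{rg}) \qquad (9)$$

The port delays in the post-map SDF file are all equal to zero; however, in the physical device, they cannot be zero. Thus, the quantity and distribution of the glitches generated by the post-map simulation differ significantly from those in the physical device. The post-route simulation is performed after the design has been placed and routed, so, noise aside, the port delays used by the post-route simulation equal those in the physical device. Moreover, noise aside, the quantity and distribution of the glitches derived from the propagation delay equal those in the physical device. Thus,

$$\rho_{L_{mg},P_{to}} < \rho_{L_{rg},P_{to}} \qquad (10)$$

From Eqs. (7)-(10),

$$\mathrm{Cov}(L_{mg}, P_{to}) < \mathrm{Cov}(L_{rg}, P_{to}) \qquad (11)$$

With $P_{ta} = C \cdot L_t$ and the independence assumptions above, $\rho_{L_m,P_a}$ and $\rho_{L_r,P_a}$ can be calculated as follows:

$$\rho_{L_m,P_a} = \frac{C \cdot \mathrm{Var}(L_t) + \mathrm{Cov}(L_{mg}, P_{to})}{\sqrt{\left(\mathrm{Var}(L_t) + \mathrm{Var}(L_{mg})\right)\mathrm{Var}(P_a)}} \qquad (12)$$

$$\rho_{L_r,P_a} = \frac{C \cdot \mathrm{Var}(L_t) + \mathrm{Cov}(L_{rg}, P_{to})}{\sqrt{\left(\mathrm{Var}(L_t) + \mathrm{Var}(L_{rg})\right)\mathrm{Var}(P_a)}} \qquad (13)$$

From Eq. (9) and Eqs. (11)-(13),

$$\rho_{L_m,P_a} < \rho_{L_r,P_a} \qquad (14)$$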
Two CPA attacks with different TC models were performed and their effectiveness was compared. The physical traces of the AES S-Box were acquired on the SASEBO-GII with a sample rate of $2 \times 10^{9}$ s$^{-1}$. The guessing entropy was utilized to evaluate the two attacks, and the results are shown in Fig. 8. The guessing entropy of the post-route TC attack is lower than that of the post-map TC attack. This means that the post-route TC model is more effective than the post-map TC model, which agrees with the theoretical result (Eq. (14)).
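For reference, guessing entropy can be estimated as in the following Python sketch, which takes it to be the average rank of the correct key over repeated attacks. This is a common definition, but whether the experiments above use a 1-based rank or a logarithmic variant is not stated, so the convention here is an assumption, as are the function names and the toy scores.

```python
def key_rank(scores, correct_key):
    """1-based rank of the correct key when keys are ordered by descending score."""
    target = scores[correct_key]
    better = sum(1 for k, s in scores.items() if k != correct_key and s > target)
    return 1 + better

def guessing_entropy(experiments, correct_key):
    """Average rank of the correct key over several independent attacks."""
    ranks = [key_rank(scores, correct_key) for scores in experiments]
    return sum(ranks) / len(ranks)

# Two hypothetical attacks over three key hypotheses; the correct key is 0.
experiments = [
    {0: 0.9, 1: 0.1, 2: 0.2},  # correct key ranked first
    {0: 0.3, 1: 0.5, 2: 0.1},  # correct key ranked second
]
ge = guessing_entropy(experiments, correct_key=0)
```

A lower value means that fewer key candidates have to be tested before the correct one is reached, which is why the model with the lower guessing entropy is considered more effective.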
Leakage in the implementation phase
The MOS-Box masking scheme was proposed by Oswald et al. [6] , and works with combinations of additive and multiplicative masks. It is secure against ZV attacks, but the MOS-Box scheme is not secure against TC attacks due to the glitches. Thus, GTC models (e.g., the post-route TC model) can be used to perform CPA attacks successfully.
Mangard et al. [12] utilized the post-route TC model to successfully attack the MAS-Box masking scheme, and they demonstrated that the correct key was recovered before the number of traces reached 30 000. The leakage of the MOS-Box scheme is caused by glitches, and the behavioral and post-translate simulations are performed without glitches, so the MOS-Box scheme cannot be successfully attacked by using NTC models. Thus, the MOS-Box scheme has no design phase leakage, which is different from the MAS-Box scheme. However, the weakness in the implementation phase can be found by GTC models (e.g., the post-route TC model) due to glitches.
Improved GTC model based on SDF
Since there is a gap between the simulated propagation delay and the real delay, the method that utilizes the GTC model to detect the leakage of the masking schemes still has issues with accuracy, which may lead to an incomplete evaluation that misses some leakage. An improvement scheme for GTC models based on SDF that can increase the accuracy of GTC models significantly is therefore proposed. The basic flow of this method is shown in Fig. 9 .
An experiment on the AES S-Box was conducted to verify the effectiveness of this improved scheme. In the physical device, the input-output path delay and the port delay can be affected by environmental factors. Since all gates and wires are identical at the chip level, it can be assumed that environmental factors have the same impact on every propagation delay. Therefore, the value of the propagation delay was modified by adding a constant, $C$, in the SDF file that was used by the post-route TC model. In the S-Box design used for this study, after the completion of all steps in Fig. 9, it can be found that $\rho > \rho'$ if $C \in [42, 47]$ with 50 000 traces, where $\rho$ is the correlation corresponding to the correct key, and $\rho'$ is the highest correlation among all correlations corresponding to the 255 wrong keys.
Fig. 9 The flow of the improved GTC model.
The result is shown in Fig. 10 , where the black trace corresponds to the correlation of the correct key. Evidently, the TC model with an adjusted SDF was significantly more effective than the original TC model. The effectiveness will improve with an increasing number of traces.
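The search for a suitable constant can be organized as a simple sweep, sketched below in Python. The callback `evaluate` is hypothetical: it stands for regenerating the post-route TC model with the shifted delays and running the CPA attack, and it returns the correct-key correlation together with the highest wrong-key correlation. The toy callback, the delay values, and the offset range are invented for illustration and do not reproduce the experiment.

```python
def shift_delays(delays, offset):
    """Add the same constant to every propagation delay."""
    return [d + offset for d in delays]

def find_working_offsets(delays, offsets, evaluate):
    """Return the offsets for which the correct key wins (rho > rho_wrong)."""
    working = []
    for c in offsets:
        rho, rho_wrong = evaluate(shift_delays(delays, c))
        if rho > rho_wrong:
            working.append(c)
    return working

def toy_evaluate(delays):
    """Stand-in for simulation plus CPA: succeeds only in a narrow delay window."""
    rho = 0.30 if 145 <= delays[0] <= 150 else 0.05
    return rho, 0.10

working = find_working_offsets([100, 200], range(40, 51), toy_evaluate)
```

In practice each call to the evaluation step involves a full logic simulation, so the range of candidate offsets would be kept as small as the delay uncertainty allows.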
Conclusion
Masking is one of the most important countermeasures against SCA, yet both its design and implementation phases may contain security weaknesses. To the best of our knowledge, the existing analytical methods challenge the security of masking schemes in these two phases separately. Security frameworks are often used to validate the theoretical security of masking schemes. However, this cannot guarantee the practical security of all implementations.
In this paper, a new method based on TC models was proposed to verify the security of masking schemes in both the design and implementation phases. By analyzing the differences among the levels of logic simulation, it was confirmed that NTC models can be used in the design phase and GTC models can be used in the implementation phase. Therefore, the weaknesses found with NTC models indicate defects of masking schemes in the design phase, while the weaknesses found with GTC models correspond to problems in the implementation phase. Furthermore, an improved GTC scheme was proposed that has a smaller gap with the real design. It is worth noting that both models are based on logic simulation, which has a distinct advantage in efficiency and can be accepted as a practical evaluation tool in the design of masking schemes.
[24] D. Thomas 
