Abstract-Energy efficiency and security are critical requirements for computing at edge nodes. Unrolled architectures for lightweight cryptographic algorithms have been shown to be energy-efficient, providing higher performance while meeting resource constraints. Hardware implementations of unrolled datapaths have also been shown to be resistant to side channel analysis (SCA) attacks due to a reduction in signal-to-noise ratio (SNR) and an increased complexity in the leakage model. This paper demonstrates optimal leakage models and an improved CFA attack which makes it feasible to extract first-order side-channel leakages from combinational logic in the initial rounds of unrolled datapaths. Several leakage models, targeting initial rounds, are explored and a 1-bit Hamming weight (HW) based leakage model is shown to be an optimal choice. Additionally, multi-band narrow bandpass filtering in conjunction with correlation frequency analysis (CFA) is demonstrated to improve SNR by up to 4×, attributed to the removal of the misalignment effect in combinational logic and to signal isolation. The improved CFA attack is performed on side channel signatures acquired for 7-round unrolled SIMON datapaths, implemented on a Sakura-G (Xilinx Spartan-6, 45nm) FPGA platform, and a 24× reduction in minimum-traces-to-disclose (MTD) for revealing 80% of the key bits is demonstrated with respect to conventional time-domain correlation power analysis (CPA). Finally, the proposed method is successfully applied to a fully-unrolled datapath for PRINCE and a parallel round-based datapath for the Advanced Encryption Standard (AES) algorithm to demonstrate its general applicability.
I. INTRODUCTION
The recent shift in the paradigm from mainframe and personal computing to computing at edge nodes has led to an exponential growth in interconnected internet-of-things (IoT) and internet-of-everything (IoE) devices. In a typical IoT environment, IoT devices sense, process and communicate information to a mobile device or to enterprise servers over a wireless network. To ensure integrity, confidentiality and security of sensitive information against malicious adversaries, communication is usually encrypted with implementations in software/hardware at the edge node. Due to severe resource constraints for IoT devices, the encryption hardware must be compact, consume little energy/power and be able to meet system performance/throughput requirements. This gave rise to lightweight cryptography with several compact block ciphers, such as Midori [1], PRINCE [2], SIMON and SPECK [3], proposed over the last few years. Although these algorithms are proven secure against cryptanalysis attacks, their hardware implementations tend to leak information in several physical side channels, e.g., timing, power, electromagnetic (EM) emanations, acoustic signatures etc. Differential Power Analysis (DPA) [4] or Correlation Power Analysis (CPA) [5] of these side channel signatures can reveal the secret encryption key used during the encryption process.
To reduce/eliminate leakage through side channels, several hiding and masking-based countermeasures have been proposed [6] [7] [8] [9]. Round unrolling was proposed as a simple countermeasure against power-based side channel attacks (PSCA) for the Data Encryption Standard (DES) [10], SIMON [11, 12] and PRINCE [13], and is applicable to other lightweight ciphers as well. Although a generic countermeasure, it is particularly suited to lightweight ciphers due to their less complex round functions and large number of rounds. Traditionally, both software and hardware implementations of cryptographic algorithms are pipelined, with intermediate states stored in sequential registers after computation. Since registered values match the algorithmic values, hypothetical leakage models can be easily derived based on registered intermediate states.
However, when cryptographic rounds are fully/partially unrolled, these registered intermediate states can no longer be targeted due to deeper diffusion of the keys through the datapath and key expansion, requiring an adversary to target intermediate states computed at the intermediate nodes of the combinational logic. Extraction of side channel leakage from a combinational logic is difficult due to the reasons described below:
(i) Inability to find deterministic states to compute a good leakage model, as intermediate states computed at the intermediate nodes of combinational logic do not necessarily match the targeted algorithmic states. (ii) High glitch activity (a function of logic depth and therefore degree of unrolling), leading to poor SNR. (iii) Misalignment (in time) of measured power traces, as the intermediate states are computed at different instants based on the signal arrival times (a function of logic structure and the input vectors).
Introducing misalignment into the power traces has often been used as a countermeasure to SCA. Data-dependent and Random Delay Insertion (RDI) countermeasures have been shown to reduce information leakage by desynchronizing power traces, thereby reducing correlation between current consumption and the computed intermediate state [14, 15]. The misalignment introduced with RDI-based countermeasures affects the temporal shift of time-domain signals due to a composition of legitimate and dummy executions. However, such countermeasures have been shown to be successfully mitigated using alignment techniques in both the time and frequency domains [16]. On the other hand, the misalignment inherent in unrolled datapaths depends on the logic structure organization and on the input vector, which results in different delay distributions of targeted bits in combinational logic as shown in [14]. Success of single and multi-bit CPA attacks in the time domain is dependent on these delay distributions. Therefore, unless an adversary has complete and accurate knowledge of the combinational logic structure, it is very difficult to derive a good leakage model based on targeted algorithmic states. When the deterministic part of the leakage differs from the leakage model used by the adversary, leakage models based on linear regression have been shown to provide a better attack efficiency [17].
However, for SIMON, due to an absence of non-linear substitution functions, 1-bit intermediate states can be targeted with a reduced key complexity without the need for linear regression. Similarly, single-bit leakage models perform much better for highly noisy measurements, as demonstrated in [17] with respect to DPA. In this work, we show that single-bit leakage models with reduced complexity are an optimal choice for round unrolled SIMON with respect to the CPA-based distinguisher. However, for PRINCE and AES, which have non-linear substitution functions, 1-bit leakage models have the same key complexity as the multi-bit leakage models, and are therefore not an optimal choice [18].
Correlation frequency analysis (CFA) has been previously shown to improve CPA attack efficiency in the presence of RDI or randomized clock-based countermeasures [16, 19]. Due to the presence of noise in the measured signatures, frequency components of leakage signals (expected to have a small amplitude) are isolated from frequency components of noise (expected to have a large amplitude) with bandpass filters in the time domain to eliminate adverse effects of spectral leakage [20]. We show that filtering the measured signatures in the time domain using multiple narrow bandpass filters, each isolating a specific frequency band, followed by windowed FFT, improves the effectiveness of CFA attacks for unrolled datapaths by removing the data-dependent misalignment of power samples and improving SNR. The key contributions of this work are described below:
(i) We demonstrate that single-bit HW-based leakage models on round unrolled datapaths of 128-bit SIMON outperform multi-bit leakage models with respect to CPA.
(ii) We present an improved CFA attack with multi-band narrow bandpass filtering employed in the time domain followed by windowed FFT to isolate the signal and improve SNR for increased attack efficiency in the frequency domain.
(iii) The proposed leakage models and CFA attack are applied to 7- and 10-round unrolled implementations of SIMON-128 and fully-unrolled implementations of PRINCE-64. All key bits are successfully recovered with a small number of measurements, demonstrating that unrolled datapaths have very high exploitable leakage. Additionally, the same CFA attack is applied to a round-reuse-based 128-bit parallel implementation of AES-128 with HW-based models, and leakage from the combinational SBOX output is successfully extracted, demonstrating general applicability of the proposed CFA attack to extract side channel leakage from any combinational logic.
The rest of the paper is organized as follows: Section II reviews encryption algorithms employed in this work and related literature; section III presents the experimental setup and side channel analysis methodology; section IV presents SNR and CFA attack results for SIMON-128; section V applies the same methodology and demonstrates successful CFA attacks for fully-unrolled PRINCE-64 and AES-128; and section VI concludes the paper.
II. BACKGROUND

A. SIMON, PRINCE and AES Algorithms
1) Lightweight Ciphers - SIMON and PRINCE: SIMON and SPECK were two ciphers proposed by NSA, which can be optimized for hardware and software implementations. SIMON is a block cipher built on a Feistel network, available in different configurations, each providing a different level of mathematical security [3]. Encryption and decryption operations are based on a round function. A typical round function for SIMON-128 with a block size of 64-bits is depicted in Fig. 1(a). It consists of one AND gate and three XOR gates. Encryption in the i-th round can be expressed using Eq. 1:
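As a minimal sketch of the round function described above, the following Python follows the publicly specified SIMON round from [3] (left rotations by 1, 8 and 2, one AND and three XORs). The 64-bit word size and the function names are assumptions made for illustration; they are not taken from the authors' implementation.

```python
def rotl(x, r, n=64):
    """Rotate the n-bit word x left by r positions."""
    mask = (1 << n) - 1
    return ((x << r) | (x >> (n - r))) & mask

def simon_f(x, n=64):
    """Key-independent part of the round: (S^1 x AND S^8 x) XOR S^2 x."""
    return (rotl(x, 1, n) & rotl(x, 8, n)) ^ rotl(x, 2, n)

def simon_round(x, y, k, n=64):
    """One Feistel round on the word pair (x, y) with round key k.

    Returns the new (left, right) pair; the old left word becomes the
    new right word.
    """
    return y ^ simon_f(x, n) ^ k, x

def simon_round_inverse(x_new, y_new, k, n=64):
    """Undo simon_round, recovering the previous (left, right) pair."""
    return y_new, x_new ^ simon_f(y_new, n) ^ k
```

Because the structure is a Feistel network, `simon_round_inverse(*simon_round(x, y, k), k)` returns `(x, y)` for any inputs, which is a convenient sanity check when unrolling several rounds.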
PRINCE was proposed at ASIACRYPT 2012 by Borghoff et al. [2] as a lightweight block cipher optimized for low-latency operation. It possesses a reflection property whereby encryption and decryption can be performed using the same hardware resources. A 64-bit plaintext and a 128-bit key are used for encryption. The overall architecture of the cipher is shown in Fig. 1(b). The entire key is split into 2 parts, k0 and k1 (64 bits each):
k0' is derived from k0 by a transformation as shown below:
k0 and k0' are used as pre-whitening and post-whitening keys respectively. Encryption is completed in 10 fully unrolled rounds. The first 5 rounds perform SBOX, linear operation (M'), and ShiftRow operations. The last 5 rounds perform inverse SBOX, linear and inverse ShiftRow operations. Each round has a unique round constant except the middle layer which has only SBOX, inverse SBOX and linear operations. 
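The key split and the k0' derivation mentioned above can be sketched as follows. This follows the published PRINCE specification [2], in which k0 is the upper half of the 128-bit key and k0' is k0 rotated right by one bit XORed with k0 shifted right by 63 bits; the function names are illustrative assumptions.

```python
MASK64 = (1 << 64) - 1

def prince_split_key(key128):
    """Split the 128-bit key into (k0, k1); k0 is the upper 64 bits."""
    k0 = (key128 >> 64) & MASK64
    k1 = key128 & MASK64
    return k0, k1

def prince_k0_prime(k0):
    """Post-whitening key: (k0 rotated right by 1) XOR (k0 >> 63)."""
    ror1 = ((k0 >> 1) | (k0 << 63)) & MASK64
    return ror1 ^ (k0 >> 63)
```

For example, `prince_k0_prime(1)` is `0x8000000000000000`, since the single set bit wraps around to the top position and the shifted term is zero.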
B. Related Work
Lightweight ciphers are designed for specific applications but may sometimes fail to meet all target requirements due to inherent trade-offs in security, flexibility, side-channel and fault-attack resistance. Serial and parallel datapath implementations of SIMON on FPGA have been shown to be susceptible to CPA attacks [21, 22]. Round unrolling was first proposed as a simple architectural countermeasure on hardware implementations of the Data Encryption Standard (DES) algorithm [10], where a fully-unrolled implementation was shown to be resistant to CPA and Mutual Information Analysis (MIA) attacks. More recently, A. Singh et al. have shown that unrolled implementations of 128-bit SIMON (to the 6th degree) improve power side-channel attack resistance. Their CPA attack targeted the round 3 output (combinational) with a 1b Hamming weight (HW) leakage model but did not succeed even with 500K traces, showing an inability to extract leakage in the time domain [23]. The author in [12] demonstrated a general Degenerate Grouping Power Attack (DGPA) on different configurations of SIMON and SPECK and showed a successful attack with up to the 4th degree of unrolling (r=4). However, they showed that with a sufficient degree of unrolling (r>=9), the DGPA attack is computationally infeasible to carry out even with state-of-the-art computational resources.
Since its introduction, a significant amount of work has been done on the side channel leakage characteristics of PRINCE. CPA attacks have shown a successful recovery of the correct key in a round-based implementation of PRINCE on a SASEBO-G FPGA where the same hardware was re-used iteratively in a loop to compute each round of the encryption operation [24] . A similar analysis was shown for a DPA attack as well. Both these works were based on the hamming distance (HD) leakage model targeting the first-round output of the encryption block. A fully unrolled architecture of PRINCE has also been analyzed by [13] where a point-of-interest (POI) selection using a Welch's t-test was performed on the measured power traces prior to HD (with respect to back-to-back plaintext) based CPA analysis. The results showed an improved number of key nibbles recovered with up to 250K traces.
The success of CPA attacks improves with pre-processing of acquired power traces, especially filtering. Filters such as matched filters [25], comb filters [26], and band-pass frequency-domain filtering (near the clock frequency) have been shown to improve the SNR of power traces. In [27], the impact of linear FIR filters with optimized filter coefficients on time and frequency domain CPA is demonstrated. I. Levi et al. have demonstrated the impact of filters of different widths on a pseudo-asynchronous design style (a countermeasure that reduces information leakage by using data dependencies to generate dynamic clock sampling) [28].
III. PROPOSED LEAKAGE MODELS AND IMPROVED CFA METHODOLOGY
This section describes leakage models, improved CFA methodology and experimental setup to measure side channel activity to validate the hypotheses.
A. Proposed Leakage Models
The success of any side channel attack not only depends on the chosen distinguisher (CPA, DPA, mutual information analysis -MIA, etc), but also on the leakage model. For unrolled datapaths with deeper key diffusion, leakage from sequential registers cannot be targeted, thus requiring modeling of intermediate states within the combinational logic, which is not deterministic. The success of attacks, thus, depends on how well the modeled intermediate variable (modeled leakage) matches the true intermediate variable (true leakage). Single-/multi-bit intermediate variables can be targeted to create single-/multi-bit leakage models based on the corresponding key dependencies of the target. For encryption algorithms with a non-linear substitution operation, the key dependency generally remains unchanged, irrespective of the number of target bits chosen. For example, the key dependencies at any SBOX output for PRINCE and AES are the same regardless of the number of bits chosen for the target intermediate variable.
It has been demonstrated that for such cases, all bits in the intermediate variable that correspond to the respective key dependencies need to be chosen for better attack efficiency. In contrast, SIMON does not contain non-linear substitution functions, which leads to a reduced key complexity (2-bit key dependency for 1 bit of the output at round 2). Additionally, when there is a mismatch between modeled leakage and true leakage, multi-bit HW/HD leakage models can be improved with linear regression (linear combination of weighted bits). In such a case, a single-bit leakage model (DPA) is also demonstrated to perform better than multi-bit models (but not as well as linear regression) [17].
Considering the reduced key-dependency, CPA (equivalent to normalized DPA) with 1-bit leakage models in SIMON are expected to provide better results without the need for linear regression (only 1-bit targeted; different weights don't affect the output of CPA). HW based models are preferred, compared to HD, because the previous state in a combinational node is unknown due to high glitching activities and large number of transitions prior to reaching the final state.
B. Improved CFA attack
CFA attacks are usually employed to break time-domain misalignment-based countermeasures (random delay insertion, clock randomization, globally asynchronous locally synchronous (GALS) designs, etc). In the case of unrolled datapaths, the arrival times of signals at different nodes of combinational logic are input-vector-dependent, as opposed to registered sequential circuits. Large noise and high glitch activity reduce the efficiency of CFA attacks on combinational logic. In this regard, we attempt to improve CFA by employing multiple time-domain narrow bandpass (5MHz) filters. In contrast to conventional techniques, we filter the entire power trace (5MHz to 105MHz) into 5MHz bands to isolate signal from noise in the time domain and eliminate the effect of spectral leakage (side lobes of noise frequency leaking into nearby signal frequency with small amplitude), while also improving SNR. Then, windowing in the frequency domain of each of these 5MHz bands further enhances leakage extraction. This overall methodology improves the attack, minimizing MTD, and recovers all bytes.
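The band-splitting step described above can be sketched in plain Python as below. This is only an illustration under stated assumptions: the paper does not give the filter design, so a Hamming-windowed sinc FIR with a hypothetical `numtaps=201` is used, and the "frequency-domain window" is interpreted as evaluating the DFT magnitude at a chosen set of frequencies. All function names are illustrative.

```python
import cmath
import math

def bandpass_taps(f_lo, f_hi, fs, numtaps=201):
    """Hamming-windowed sinc band-pass FIR taps for [f_lo, f_hi] in Hz."""
    mid = (numtaps - 1) / 2
    taps = []
    for i in range(numtaps):
        t = i - mid
        if t == 0:
            h = 2 * (f_hi - f_lo) / fs
        else:
            # Difference of two ideal low-pass responses (f_hi minus f_lo).
            h = (math.sin(2 * math.pi * f_hi / fs * t)
                 - math.sin(2 * math.pi * f_lo / fs * t)) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (numtaps - 1))
        taps.append(h * window)
    return taps

def fir_filter(trace, taps):
    """Time-domain filtering: centered, same-length convolution."""
    half = len(taps) // 2
    out = []
    for n in range(len(trace)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = n + half - k
            if 0 <= j < len(trace):
                acc += h * trace[j]
        out.append(acc)
    return out

def filter_bank(trace, fs, f_start=5e6, f_stop=105e6, width=5e6):
    """Split one trace into consecutive narrow bands, keyed by (lo, hi)."""
    bands = {}
    f = f_start
    while f < f_stop:
        bands[(f, f + width)] = fir_filter(trace, bandpass_taps(f, f + width, fs))
        f += width
    return bands

def spectrum_window(trace, fs, freqs):
    """DFT magnitude of the trace evaluated only at the given frequencies."""
    mags = []
    for f in freqs:
        acc = sum(x * cmath.exp(-2j * math.pi * f * i / fs)
                  for i, x in enumerate(trace))
        mags.append(abs(acc))
    return mags
```

In an attack flow, each band returned by `filter_bank` would be passed through `spectrum_window`, and the correlation with the leakage model would then be computed per frequency point instead of per time sample, so that a data-dependent time shift of the leakage no longer spreads it across samples.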
C. Experimental Setup for Validation
The unrolled architectures for SIMON-128 and PRINCE-64 and the round-based parallel datapath architecture for AES-128 were synthesized and mapped on the Sakura-G FPGA platform [Fig. 2(a)]. Randomly generated plaintext vectors and a fixed secret key are loaded from a C# application running on a PC to the control FPGA through a USB interface. The target design is mapped onto the main FPGA (Spartan-6, 45nm technology), which performs encryption. An on-board LNA (350MHz bandwidth, 10× gain) amplifies the power signatures which are measured during the encryption. An internal trigger signal, generated by the main FPGA, triggers a Tektronix DSO5204 oscilloscope (2GHz BW, 10GS/s sampling rate, sampled at 1GS/s), and the samples are read from the scope using a MATLAB script [Fig. 2(b)].
Traces were captured with a channel bandwidth of 250MHz. The frequency of operation for all the designs was 24MHz. All the acquired signatures were filtered using a wide band pass filter (15-35MHz) and several narrow bandpass filters (5MHz band, ranging from 5MHz to 105MHz) [ Fig. 3] . Fig. 4(a) plots the raw measured power signatures for 7-round unrolled SIMON-128, fully-unrolled PRINCE-64 and parallel AES-128. FFT of the measured power shows a high peak at the design clock frequency (24MHz) and its harmonics [ Fig. 4(b) ]. Several low frequency peaks are also observed. Fig. 4(c) plots the filtered (with a 15-35MHz bandpass filter) waveforms for the measured power signatures.
Metrics used to characterize and compare leakage models and the proposed improved CFA attack are described below:
1) Signal to Noise Ratio (SNR): The captured traces are represented using the leakage model
$L = \varepsilon\,\phi(x) + L_0 + N(0, \sigma^2)$ (4)
where ε is the leakage conveyed by a one-bit toggle and is the signal, ϕ(x) is the leakage model related to the plaintext and the key, L0 is the average circuit power due to activity of other parts of the design, and N(0, σ²) is additive white Gaussian noise (AWGN). Signal (ε) can be modeled as the covariance between power traces and leakage model [29], which is given by:
Fig. 3. CPA attack methodology performed on power traces for multiple bands in 5MHz intervals followed by windowed FFT. The best attack shows the smallest minimum traces to disclosure (MTD), and the corresponding band is the highest leaking band.
where M is the leakage model and n is the number of bits in the intermediate variable. The SNR with this leakage model is then given as: (6)
2) Success Rate (SR) for CPA/CFA: All 64 bits of the lower key for SIMON-128 were analyzed with respect to the CPA/CFA attack. The attack efficiency for the time/freq. domain attacks with different leakage models was compared with respect to the number of bits recovered with an increasing number of measurements, namely the success rate (SR) [17]. Additionally, minimum-traces-to-disclose (MTD) is also used to show a successful attack for the chosen bits. (7)
IV. EXTRACTING LEAKAGE FROM UNROLLED
This section explores different leakage models targeting the initial few rounds, presents SNR and applies the improved CFA attack on 7- and 10-round unrolled datapaths for SIMON-128 to extract side channel information leakages.
A. Comparison of Leakage Models with respect to SNR
For CPA/CFA attacks, an adversary must model leakage generated due to switching of intermediate nodes. Switching activity, and therefore power consumption, is generally modeled with Hamming weight (HW) or Hamming distance (HD) based models. Although HD models dynamic power consumption more closely, it requires knowledge of the underlying hardware implementation to observe state transitions. On the other hand, HW can be generated based only on an algorithmic understanding of the block cipher operation, which benefits an adversary without knowledge of the underlying hardware architecture.
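The difference between the two models can be made concrete with a short sketch; the function names are illustrative. HW needs only the current value of the intermediate variable, whereas HD also needs the value the node held before, which is what ties it to the hardware implementation.

```python
def hamming_weight(value):
    """Number of 1 bits in the intermediate value."""
    return bin(value).count("1")

def hamming_distance(previous, current):
    """Number of bits that toggle when a node goes from previous to current."""
    return hamming_weight(previous ^ current)
```

For instance, `hamming_weight(0b1011)` is 3, while `hamming_distance(0b1100, 0b1010)` is 2 because only the two middle bits change.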
To understand the impact of leakage models, several HW-based leakage models are evaluated, including the 1b HW leakage model described in the previous section, for 7-round unrolled SIMON. We have evaluated 1-bit (1b), 2-bit (2b), 3-bit (3b) and 4-bit (4b) HW leakage models at the round 2 output, and a 1-bit (1b) HW model at the round 3 output. Increasing the bit-width further or targeting the round 4 output significantly increases the key dependencies and complexity of the leakage model. SNR is computed for each leakage model against the number of measurements and is shown in Fig. 6(a). Signal strength is evaluated using the correlation between the actual power trace and the leakage model for the correct key guess. Noise is modeled as a Gaussian random variable. SNR also depends on the number of bits used to compute HW. A reduction in SNR is observed as bit-width is increased in HW-based leakage models. Also, SNR degrades for the 1b HW leakage model at the round 3 output. This is attributed to an increase in depth of the combinational logic leading to higher glitch activity. We see that the 2b HW leakage model gives the highest SNR. SNR is also evaluated in multiple filter bands for the 1b HW leakage model as shown in Fig. 6(b). It is observed that narrow bandpass filtering in the frequency domain improves SNR by at least 4× compared to the wide band (15MHz to 35MHz) around the clock frequency. It also shows that frequency domain analysis improves SNR over time-domain analysis in most bands. Moreover, leakage is present not only in the clock band region, but in its harmonics as well as in some low-frequency bands.
B. Successful Key Recovery with Proposed Leakage Models and Improved CFA attack
1) 7-round Unrolled SIMON-128: A. Singh et al. have shown a 6-round unrolled SIMON-128 implementation to be resistant to power side-channel attacks, with no key recovery possible even with 500K traces [23]. The authors targeted the flop output and showed the infeasibility of modeling power due to increased key dependencies. However, 6-round unrolled SIMON can be easily attacked with respect to known-ciphertext attacks, as the last clock cycle only performs 2 rounds of operation (SIMON-128 has 68 rounds and each cycle in a 6-round unrolled datapath executes 6 rounds, with the last cycle executing only 2 rounds). Therefore, we attack a 7-round unrolled SIMON to demonstrate leakage extraction in combinational circuits of unrolled datapaths, as it is advantageous over the 6-round unrolled datapath presented in [23] with respect to both known-plaintext and known-ciphertext attacks.
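The round-count argument above is simple arithmetic and can be checked with a short sketch (the helper names are illustrative): with 68 rounds, a 6-round unrolled datapath leaves only 2 rounds in its final cycle, whereas a 7-round unrolled datapath leaves 5.

```python
def cycles_needed(total_rounds, unroll):
    """Clock cycles required when `unroll` rounds execute per cycle."""
    return -(-total_rounds // unroll)  # ceiling division

def last_cycle_rounds(total_rounds, unroll):
    """Rounds executed in the final clock cycle of the unrolled datapath."""
    remainder = total_rounds % unroll
    return remainder if remainder else unroll
```

A shallow final cycle matters because a known-ciphertext adversary only has to model that many rounds of combinational logic working backwards from the output.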
We have performed CPA and CFA using several HW based leakage models, considering different number of bits at round 2 and round 3 output. For 1-bit HW leakage model, 1 bit of the intermediate state at the output of round 2 (R2) is considered to generate key dependencies and the leakage model. We have evaluated all possible 1-bit leakage models for revelation of all 64 bits of the lower key. Altogether, 64 leakage models were generated which could recover 64 bits of the lower key. Each leakage model had a 2-bit key dependency (total of 4 key guesses). Fig. 5 shows the attack location, which is the output of round 2 and round 3. Key dependencies for the highest leaking bit are described in the equations below.
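To illustrate why a single round-2 output bit carries only a 2-bit key dependency, the sketch below derives the prediction from the standard SIMON round function [3] and runs a toy time-domain CPA on simulated leakage. It is an assumption-laden illustration, not the authors' model: the bit indexing, the 64-bit word size, the Gaussian noise level and all names are hypothetical. Bit j of the round-2 output depends non-linearly on the first round key only through bits j-1 and j-8 (the AND inputs); the remaining key bits enter by XOR and only flip the sign of the correlation, so guesses are ranked by absolute correlation.

```python
N = 64  # word size in bits (assumed)

def rotl(x, r):
    return ((x << r) | (x >> (N - r))) & ((1 << N) - 1)

def simon_f(x):
    return (rotl(x, 1) & rotl(x, 8)) ^ rotl(x, 2)

def bit(x, i):
    """Bit i of x, with the index taken modulo the word size."""
    return (x >> (i % N)) & 1

def predict_bit(x0, y0, j, a, b):
    """Round-2 output bit j (up to a key-dependent constant flip)
    for the guesses a = k0[j-1] and b = k0[j-8]."""
    u = y0 ^ simon_f(x0)  # round-1 output before the key XOR
    and_term = (bit(u, j - 1) ^ a) & (bit(u, j - 8) ^ b)
    return bit(x0, j) ^ and_term ^ bit(u, j - 2)

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def cpa_rank(plaintexts, leakage, j):
    """Return the best (a, b) guess and the |correlation| of all 4 guesses."""
    scores = {}
    for a in (0, 1):
        for b in (0, 1):
            model = [predict_bit(x0, y0, j, a, b) for x0, y0 in plaintexts]
            scores[(a, b)] = abs(pearson(model, leakage))
    return max(scores, key=scores.get), scores
```

Repeating `cpa_rank` for each of the 64 bit positions covers every bit of the first round key, which is consistent with the 64 one-bit leakage models mentioned above.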
The plots for time-domain CPA and freq. domain CFA are shown in Fig. 7(a, b), where the correct key guess shows a distinct peak compared to other key guesses. Fig. 7(c, d) plots the MTD against the number of measurements in both the time and frequency domains for the highest leaking band (5-10MHz). These results show 1-bit leakage models to be optimal in revealing the correct key.
To demonstrate the effectiveness of the proposed improved CFA attack, we have evaluated the SR against the number of measurements targeting all 64 bits of the lower key, using up to 1 million measurements. Fig. 8 summarizes the SR with different leakage models against the number of measurements. MTD for 80% SR is used as a metric to compare different leakage models and post-processing methods. The 1bHW R2 based leakage model reveals 80% of the bits with only 41K measurements, while the 2bHW R2 and 3bHW R2 based leakage models at the round 2 output require 381K and 801K measurements respectively. The 1bHW based leakage model at the round 3 output reveals 80% of the bits with 781K traces. The 4bHW leakage model at the round 2 output could not recover 80% of the bits with 1 million measurements. Therefore, the 1bHW leakage model at the round 2 output is the optimal choice for leakage models. However, when we look at the SNR results, the 2bHW leakage model at the round 2 output shows the highest SNR but a poorer MTD for 80% SR. This could be attributed to the modeling of intermediate states. In unrolled datapaths, the states that are computed at intermediate nodes of the combinational logic are difficult to predict (with only the final state of the combinational logic at the end of the unrolled rounds known) and depend on the logic structure. Therefore, multi-bit leakage models may not be a good choice, as these bits may not be computed together. 1-bit leakage models, thus, act as a better distinguisher [30]. Multi-bit leakage models may provide a better result when linear regression-based analysis is used with different contributions from each bit [31].
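The "MTD for 80% SR" metric used above can be sketched as follows, assuming each key bit has an individual MTD (or `None` if it was never disclosed within the measurement budget). The function names and the per-bit representation are assumptions for illustration; the numbers in the usage note are made up and are not measurement results.

```python
def success_rate(mtd_per_bit, n_traces):
    """Fraction of key bits disclosed using at most n_traces measurements."""
    disclosed = sum(1 for m in mtd_per_bit if m is not None and m <= n_traces)
    return disclosed / len(mtd_per_bit)

def mtd_at_success_rate(mtd_per_bit, target=0.8):
    """Smallest trace count at which the success rate reaches target.

    Returns None if the target is never reached.
    """
    for n in sorted(m for m in mtd_per_bit if m is not None):
        if success_rate(mtd_per_bit, n) >= target:
            return n
    return None
```

With hypothetical per-bit values `[10, 20, 30, 40, None]`, the success rate after 30 traces is 0.6 and `mtd_at_success_rate` returns 40 for an 80% target, while a 100% target returns `None` because one bit is never recovered.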
SR for SCA in the time domain and frequency domain for 3 different leakage models is shown in Fig 9. Clearly, SR for time-domain CPA attacks for all 3 considered leakage models is lower than CFA attacks. CPA also fails to reveal 80% of bits with 1000K traces, except for 1bHW R2 leakage model, where the MTD is quite high (781K). On the other hand, CFA reveals 80% of the bits with all the 3 leakage models.
The advantage of narrow bandpass filtering techniques is demonstrated with respect to MTD for 80% SR plotted in Fig. 10. The highest leaking band (5MHz-10MHz) for narrow bandpass filtering is compared with the wide band (15MHz-35MHz). The CFA attack in the wide band reveals 80% of the bits with 241K traces, but the time-domain attack has a poor SR (45%) even with 1000K traces. It also indicates that some bytes show leakage in bands outside the wide band around the clock frequency, which cannot be exploited with 15MHz-35MHz wide band filters.
2) 10-round Unrolled SIMON-128: S. Cavanaugh has demonstrated a Degenerate Grouping Power Attack (DGPA), a kind of partition power analysis, on SIMON 64/128 and has shown the infeasibility of mounting the attack on architectures with a degree of unrolling greater than 9 [12]. The methodology proposed in this paper targets leakage from the initial few rounds and should therefore remain applicable to a 10-round unrolled architecture. We have evaluated the side-channel resistance of a round unrolled implementation to the 10th degree, and successfully demonstrated a CFA attack revealing 80% of the bits with 40K traces. The plot shown in Fig. 11 is based on a 1bHW leakage model in round 2. In comparison, the time-domain attack could also reveal 80% of the bits but has a higher MTD of 580K.
V. EXTRACTING LEAKAGE FROM UNROLLED PRINCE-64 AND PARALLEL AES-128
To demonstrate the general applicability of the improved CFA attack, we have evaluated the power side channel attack (PSCA) resistance of the fully unrolled PRINCE block cipher and a round-based parallel datapath for AES.
For the fully unrolled architecture of PRINCE, both HW and HD based leakage models were analyzed. The absence of a pipeline makes the encryption engine a purely combinational logic block. The attack location for the HD model is the output of the first round SBOX, and the HD is between the output of the first round SBOX and the input register. For AES, the 128-bit parallel datapath implementation uses round-based logic and reuses the same hardware for 11 rounds to complete the encryption. Since each round output is registered, switching activity can be easily modeled using HD based leakage models. PSCA evaluation of parallel AES has been demonstrated in many prior works, where the leakage model is generated using the HD between the output at the last round and the 2nd to last round. Here, the methodology of targeting non-registered intermediate states, as described in Section III, is used to demonstrate a PSCA attack with an HW based leakage model, demonstrating general exploitability of leakage from combinational circuits.
A. Fully-unrolled PRINCE-64
The PRINCE architecture, with the target location for mounting CFA, is shown in Fig. 12(a). The output of the SBOX in round 1, R1, was targeted to generate the required leakage model. Prior to the first round, the plaintext is XOR-ed with k0 followed by k1 (RC0=0 and can be ignored in the power model). Therefore, a hypothesis can be made for a 64-bit key, K, where K = k0 ⊕ k1. This K is subsequently XOR-ed with the 64-bit plaintext before being input into the SBOX of R1. The SBOX, being a 4-bit non-linear operation, creates a 4-bit key dependency for every nibble of its output. We targeted each nibble of the output and generated the HD based leakage model utilizing the key dependencies represented by the following sets of equations:
where βi and βi-1 are SBOX outputs in the current and previous states respectively, corresponding to successive plaintext inputs. The HW is calculated simply based on the number of 1 bits in the current state of the output and is represented by: (10)
For our analysis, we only focus on the extraction of the key K. However, once K is determined, k1 can be easily targeted by keeping K constant, making a hypothesis for k1 and targeting the output of the SBOX of round 2. Here we have shown the successful recovery of all the nibbles of K within 1000K measurements. MTD required to recover 80% of bits is expressed as a percentage of the total number of nibbles. Success rate against number of measurements, for HW and HD leakage models for CPA and CFA, is depicted in Fig. 13(a). The 80% SR data shows the overall best-case MTD to be 280K with HD based leakage models. In contrast, only 13 out of 16 nibbles were recovered using traditional time-domain CPA even after 1000K measurements.
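The nibble-wise HW and HD models described in this subsection can be sketched as below. The SBOX table is the one published in the PRINCE specification [2]; the function names are illustrative, and the HD variant follows the description above (SBOX outputs for successive plaintexts under the same key guess). Here `key_guess` stands for one nibble of the combined key K.

```python
# PRINCE 4-bit SBOX from the cipher specification [2].
PRINCE_SBOX = [0xB, 0xF, 0x3, 0x2, 0xA, 0xC, 0x9, 0x1,
               0x6, 0x7, 0x8, 0x0, 0xE, 0x5, 0xD, 0x4]

def hamming_weight(value):
    return bin(value).count("1")

def sbox_out(pt_nibble, key_guess):
    """Round-1 SBOX output for one plaintext nibble and a 4-bit guess of K."""
    return PRINCE_SBOX[(pt_nibble ^ key_guess) & 0xF]

def hw_model(pt_nibble, key_guess):
    """HW model: number of 1 bits at the SBOX output."""
    return hamming_weight(sbox_out(pt_nibble, key_guess))

def hd_model(prev_pt_nibble, pt_nibble, key_guess):
    """HD model: bits toggling at the SBOX output between two successive
    plaintexts, both evaluated under the same key guess."""
    return hamming_weight(sbox_out(prev_pt_nibble, key_guess)
                          ^ sbox_out(pt_nibble, key_guess))
```

Each nibble has 16 possible key guesses, so a correlation attack evaluates 16 model vectors per nibble and 16 nibbles in total to cover the 64-bit K.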
B. Parallel AES-128
The AES-128 datapath highlighting the target location is depicted in Fig. 12(b). We have generated an 8bHW leakage model, which matches the output bit-width of the SBOX. To generate a leakage model, the input plaintext and the input key (K) are each divided into 16 bytes. This is followed by the AddRoundKey operation, which is essentially an XOR of the input plaintext and the key, and the result feeds the targeted 1st round SBOX. HW at the output of the SBOX can be described as: (11)
CPA and CFA are performed on all 16 SBOX outputs. Therefore, 16 HW leakage models are analyzed, which cover all 128 key bits. The side channel analysis methodology is applied, and the minimum MTD, the corresponding frequency domain window, and the highest leaking band are derived from PSCA analysis. MTD required to reveal 80% of the bits with CPA and CFA are 60K and 40K respectively. SR vs. number of measurements for the 8-bit HW leakage model is plotted in Fig. 13(b) (see also Table 2). Even though the improved CFA attack reveals all bits for AES-128, the advantage of using narrow bandpass filtering over unfiltered or wide bandpass is not significant. This could be attributed to a better match between modeled and true leakage, possible because no logic optimization happens between the SBOX and Mix-Column operations.
Table 3 compares the proposed leakage models and improved CFA attack with existing works on attack methods on unrolled architectures of lightweight ciphers. Both [10] and [11] could not reveal any key bits with 100K and 500K measurements respectively. The authors in [12] have highlighted that when more than 9 rounds are unrolled for 128-bit SIMON, the DGPA attack is computationally infeasible. In comparison, our work shows that all bits of unrolled SIMON-128 can be revealed with a small number of measurements, regardless of the number of rounds unrolled.
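The 8-bit HW model at the first-round SBOX output used for AES-128 in this subsection can be sketched as follows. The SBOX is computed from its standard definition (multiplicative inverse in GF(2^8) followed by the affine transform) rather than from a lookup table; the function names are illustrative and the code is not taken from the authors' flow.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def aes_sbox(x):
    """AES SubBytes for one byte: field inverse, then the affine transform."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):  # x^254 is the multiplicative inverse of x
            inv = gf_mul(inv, x)
    s = inv
    for r in range(1, 5):  # XOR with the inverse rotated left by 1..4 bits
        s ^= ((inv << r) | (inv >> (8 - r))) & 0xFF
    return s ^ 0x63

def hw_model(pt_byte, key_guess):
    """HW of the first-round SBOX output for one plaintext byte and key guess."""
    return bin(aes_sbox(pt_byte ^ key_guess)).count("1")
```

Each of the 16 byte positions is attacked independently with 256 key guesses, which is how 16 such models cover all 128 key bits.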
C. Summary and Discussion
We observe that the proposed 1-bit HW leakage models offer the best attack efficiency, with a significant reduction in attack efficiency when employing multi-bit leakage models. Linear regression with multi-bit leakage models is expected to improve the attack efficiency because the contribution from all the bits is not expected to be the same for unrolled datapaths, based on the mismatch between modeled and true leakage. However, this work only focuses on first-order attacks on unrolled architectures rather than a comparison of different attack methods. Future work will investigate applying linear regression for multi-bit leakage models as well as applying the improved CFA attack to the fully-unrolled implementations presented for DES in [10]. Additionally, a mutual information analysis (MIA)-based distinguisher, template attacks and higher order DPA could also be explored to see if attack efficiency can be further improved.
Fig. 13. Success rate vs. number of measurements for time-domain and frequency domain correlation attack for (a) PRINCE-64 for HD and HW power models and (b) AES-128 when combinational SBOX output is targeted instead of sequential registers.
VI. CONCLUSION
This paper demonstrated optimal leakage models and an improved CFA attack to extract leakage from combinational logic of unrolled architectures of lightweight ciphers. Through side-channel analysis, performed on a Sakura-G FPGA for SIMON, PRINCE and AES implementations, we showed the applicability of the approach to any degree of unrolling. Leakage model selection and the impact of narrow bandpass filtering were studied to reduce the minimum-traces-to-disclose for key recovery. Through the analysis performed on 7-round unrolled SIMON using SNR and SR, we showed the 1bHW leakage model to be the optimal choice with respect to improved CFA, with the lowest MTD (41K) for 80% SR. Furthermore, a 24× improvement is achieved with the improved CFA attack when compared with conventional time-domain CPA.
General applicability of the proposed methods is also demonstrated by successfully recovering all key bits from fully unrolled implementation of PRINCE-64 and parallel AES-128.
VII. ACKNOWLEDGEMENT
This material is based on work supported by Semiconductor Research Corporation through Texas Analog Center of Excellence (#2810.002).
