Abstract
Galois Field arithmetic blocks are key components in many security applications, such as Elliptic Curve Cryptography (ECC) and the S-Boxes of the Advanced Encryption Standard (AES) cipher. This paper introduces a novel hardware intellectual property (IP) protection technique that obfuscates arithmetic functions over Galois Fields (GF), focusing specifically on the obfuscation of GF multiplication, which underpins complex GF arithmetic and elliptic curve point arithmetic functions. Obfuscating GF multiplication circuits is important because the choice of irreducible polynomial has a great impact on the performance of the hardware design, and because significant effort is spent on finding an optimum irreducible polynomial for a given field, which can give one company a competitive advantage over another.
Introduction
Due to the increasing cost of integrated circuit (IC) design and manufacturing, it becomes more important to protect the intellectual property (IP) of an IC against reverse engineering (RE). Despite the complexity of modern ICs, the implementation details can be extracted by RE techniques once a circuit is fabricated and released to market [1] . This has given rise to a number of academic works and commercial products focused on using obfuscation to make designs more difficult to reverse engineer. In this work, we focus on protecting the IP of Galois Field arithmetic circuits that are commonly used in cryptography.
A Galois field (GF) is a number system with a finite number of elements. Galois Field arithmetic is used extensively in many security applications such as Elliptic Curve Cryptography (ECC) and the Advanced Encryption Standard (AES). The basic arithmetic functions are GF addition and multiplication, and more advanced GF arithmetic functions, such as GF division and elliptic curve point addition and multiplication [3], are derived from those two [2]. An irreducible polynomial $P(x)$ is required for constructing GF multiplication, and the choice of polynomial has a significant impact on the performance of the GF multiplier.
Intel Research showed that GF($2^4$) (4-bit) multipliers implemented in the same 22nm CMOS technology with two different $P(x)$ differ by 40% in area and 9% in delay on their nanoAES chip [4]. The costs vary with the choice of polynomial because different polynomials require different numbers of XOR operations in the critical path and in the entire design; as the bit-width increases, the performance of the multipliers varies even more strongly across different polynomials [5]. Moreover, recent work shows that it is possible to reverse engineer the irreducible polynomial of a post-synthesized GF($2^m$) multiplier of up to 571 bits [5] by analyzing the algebraic signatures extracted from the gate-level netlist. This shows that the choice of polynomial should not be considered secret unless steps are taken to obfuscate it. Given the significant effort spent on finding the optimum $P(x)$ in industrial designs [4], preventing $P(x)$ from being reverse engineered becomes important. This work introduces the first obfuscation methodology over Galois Fields that obfuscates GF multiplication with multiple irreducible polynomials, and furthermore shows that the performance overhead per obfuscated function is small. The obfuscation approach is based on analyzing the mathematical properties of finite field arithmetic to identify the maximum common logic between multiplications with different $P(x)$. The resulting multiplier performs multiplication with a certain $P(x)$, denoted the true function, which is known only to the designers. An attacker should be unable to use reverse engineering to distinguish the true function from the obfuscated functions, which are all valid GF multiplications in the same finite field as the true function.
Background

Galois Field Principle
A Galois field (GF) is a number system with a finite number of elements and two main arithmetic operations, addition and multiplication; other operations such as division can be derived from those two [2]. A Galois field with $p$ elements is denoted GF($p$). A prime field GF($p$), where $p$ is a prime number, is a finite field consisting of the integers $\{0, 1, \dots, p-1\}$, with addition and multiplication performed modulo $p$. A binary extension field, denoted GF($2^m$) (or $\mathbb{F}_{2^m}$), is a finite field with $2^m$ elements. Unlike in prime fields, however, the operations in extension fields are not computed modulo $2^m$. Instead, in one possible representation (called polynomial basis), each element of GF($2^m$) is a polynomial with $m$ terms (degree at most $m-1$) and coefficients in GF(2), and arithmetic is performed modulo an irreducible polynomial $P(x)$. The addition of field elements is the addition of polynomials, with coefficients computed in GF(2). Multiplication of field elements is performed modulo an irreducible polynomial $P(x)$ of degree $m$ with coefficients in GF(2). The irreducible polynomial $P(x)$ is analogous to the prime number $p$ in prime fields GF($p$). Extension fields are used in many cryptography applications, such as AES and ECC.
Galois Field multiplication is performed by multiplication modulo an irreducible polynomial that defines the finite field. An irreducible polynomial is a polynomial that cannot be factored into nontrivial polynomials over the same field [6]. For example, over GF(2), $x^2+x+1$ is an irreducible polynomial but $x^2+1$ is not, since $x^2+1=(x+1)(x+1)$. An example of GF($2^4$) (4-bit) multiplication is shown in Figure 1, with irreducible polynomial $P(x)=x^4+x^3+1$. As with addition and subtraction, the inputs and outputs of multiplication are binary expressions.
For example, $A = [a_3\, a_2\, a_1\, a_0]$, where $a_0$ is the least significant bit and $a_3$ is the most significant bit. The multiplication is performed in two stages: 1) adding the partial products and 2) reducing over GF($2^4$) with $P(x)$. The partial products are generated as in integer multiplication, using AND operations. Since additions in the field are XOR operations, the sum of the partial products ($s_i$ in Figure 1) is generated using a series of XORs.
The sum of partial products is then reduced modulo the irreducible polynomial. As mentioned previously, the binary expression corresponds to the coefficients of the polynomial expression; thus, $s_i$ is the coefficient of $x^i$ in the polynomial representation. According to the modular addition rule, $(s_0 + s_1 x + \dots + s_6 x^6) \bmod P(x) = [(s_0 \bmod P(x)) + (s_1 x \bmod P(x)) + \dots + (s_6 x^6 \bmod P(x))] \bmod P(x)$. The GF multiplication can be constructed as follows:
• $\sum_{i=0}^{3} s_i x^i \bmod P(x) = s_0 + s_1 x + s_2 x^2 + s_3 x^3$, denoted as $P_0$.
• $P_1 = s_4 x^4 \bmod P(x) = s_4 + s_4 x^3$.
• $P_2 = s_5 x^5 \bmod P(x) = s_5 + s_5 x + s_5 x^3$.
• $P_3 = s_6 x^6 \bmod P(x) = s_6 + s_6 x + s_6 x^2 + s_6 x^3$.
• Hence, $Z = P_0 + P_1 + P_2 + P_3$, that is, $z_0 = s_0 + s_4 + s_5 + s_6$, $z_1 = s_1 + s_5 + s_6$, $z_2 = s_2 + s_6$, and $z_3 = s_3 + s_4 + s_5 + s_6$.
Figure 1. GF($2^4$) multiplication with $P(x) = x^4 + x^3 + 1$: partial products $a_i b_j$ and their column sums $s_0, \dots, s_6$.
Galois Field addition and multiplication are the basic GF arithmetic operations used to implement advanced arithmetic functions such as division and elliptic-curve point addition and multiplication [3] for cryptography applications. However, GF addition is performed with one bit-vector XOR regardless of the irreducible polynomial, which means that obfuscation with multiple irreducible polynomials cannot be applied to a stand-alone GF adder. Thus, this work focuses on multiplication obfuscation over GF($2^m$).
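The two-stage multiplication described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware generator used in this work: the function name `gf2m_multiply` and the bit-mask encoding of polynomials (bit i holds the coefficient of x^i) are assumptions made for the example.

```python
def gf2m_multiply(a: int, b: int, poly: int, m: int) -> int:
    """Multiply a and b in GF(2^m); poly holds the bits of P(x), including x^m."""
    # Stage 1: XOR the shifted partial products (carry-less multiplication).
    # Bit k of s corresponds to s_k, the coefficient of x^k.
    s = 0
    for i in range(m):
        if (b >> i) & 1:
            s ^= a << i
    # Stage 2: reduce the (2m-1)-bit sum modulo P(x), highest term first.
    for k in range(2 * m - 2, m - 1, -1):
        if (s >> k) & 1:
            s ^= poly << (k - m)
    return s


P0 = 0b11001  # x^4 + x^3 + 1
P1 = 0b10011  # x^4 + x + 1

# x * x^3 = x^4, which reduces to x^3 + 1 under P0 and to x + 1 under P1.
print(format(gf2m_multiply(0b0010, 0b1000, P0, 4), "04b"))
print(format(gf2m_multiply(0b0010, 0b1000, P1, 4), "04b"))
```

The same inputs give different products under the two polynomials, which is the property the obfuscation relies on: both results are valid GF($2^4$) multiplications.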
Irreducible Polynomials
In general, there are various irreducible polynomials that can be used for a given field size, each resulting in a different multiplication result. The number of irreducible polynomials increases as $m$ increases. The irreducible polynomials that exist for degrees $m=\{2,3,4,5\}$ are listed in Table 1. For constructing efficient arithmetic functions over GF($2^m$), the irreducible polynomial is typically chosen to be a trinomial, $x^m+x^a+1$, or a pentanomial, $x^m+x^a+x^b+x^c+1$ [7]. It is furthermore required that $m$ and $a$ be chosen such that $m-a \geq m/2$ [8].
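For small degrees, the set of irreducible polynomials can be enumerated by brute force. The following sketch uses plain trial division over GF(2); the helper names `poly_mod` and `irreducible_polys` and the bit-mask encoding (bit i is the coefficient of x^i) are assumptions of this example, and the method is only practical for small $m$.

```python
def poly_mod(a: int, b: int) -> int:
    """Remainder of a divided by b, both polynomials over GF(2) as bit masks."""
    deg_b = b.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_b:
        # Cancel the leading term of a with a shifted copy of b.
        a ^= b << (a.bit_length() - 1 - deg_b)
    return a


def irreducible_polys(m: int) -> list[int]:
    """All irreducible polynomials of degree m over GF(2), by trial division."""
    found = []
    for p in range(1 << m, 1 << (m + 1)):
        # A reducible polynomial of degree m has a factor of degree <= m // 2.
        divisors = range(2, 1 << (m // 2 + 1))
        if all(poly_mod(p, d) != 0 for d in divisors):
            found.append(p)
    return found


for m in (2, 3, 4, 5):
    print(m, [format(p, "b") for p in irreducible_polys(m)])
```

For $m=4$ this yields the bit patterns 10011, 11001, and 11111, that is, $x^4+x+1$, $x^4+x^3+1$, and $x^4+x^3+x^2+x+1$.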
Given degree $m$, the multiplications constructed with different irreducible polynomials are functionally different but are in the same field. For example, for $m=4$, using $P_1(x)=x^4+x+1$ instead of $P_0(x)=x^4+x^3+1$ produces a different GF multiplication function in GF($2^4$). The difference appears in the process of reducing the sum of the partial products modulo the irreducible polynomial. For example, when using polynomial $P_0(x)$, $s_6$ is required for all the output bits since $x^6 \bmod P_0(x)$ is equal to $x^3+x^2+x+1$, which needs four XOR operations. However, when using polynomial $P_1(x)$, $s_6$ is only required for $z_2$ and $z_3$ since $x^6 \bmod P_1(x)$ is equal to $x^3+x^2$, which needs only two XOR operations. This explains why the choice of irreducible polynomial affects the delay and area of GF multipliers.
The goal of our approach is to implement a multiplier with multiple obfuscated functions, such that only designers and authorized users know the true function (Figure 2). The signals P are switches that will be physically implemented as constant inputs using camouflaged standard cells. One important observation is that most of the logic of GF multipliers using different irreducible polynomials remains the same. The main differences are in the logic that reduces the sum of the partial products. This means that the overhead of implementing such an obfuscated GF($2^m$) multiplier is much smaller than that of synthesizing the model in Figure 2, which is the main motivation for this work.
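The difference between the two reductions can be checked with a short script. This is a sketch under the assumption that polynomials are stored as bit masks (bit i is the coefficient of x^i); the helper name `x_power_mod` is introduced here for illustration only.

```python
def x_power_mod(k: int, poly: int) -> int:
    """Return x^k mod P(x) over GF(2); polynomials are stored as bit masks."""
    m = poly.bit_length() - 1
    r = 1
    for _ in range(k):
        r <<= 1              # multiply by x
        if (r >> m) & 1:     # degree reached m: subtract (XOR) P(x)
            r ^= poly
    return r


P0 = 0b11001  # x^4 + x^3 + 1
P1 = 0b10011  # x^4 + x + 1

for name, poly in (("P0", P0), ("P1", P1)):
    r = x_power_mod(6, poly)
    # Bit j of the remainder is set when s_6 contributes to output bit z_j.
    outputs = [f"z{j}" for j in range(4) if (r >> j) & 1]
    print(name, format(r, "04b"), "s6 feeds", outputs)
```

Under $P_0(x)$ the remainder has four terms, so $s_6$ feeds every output bit, while under $P_1(x)$ it feeds only $z_2$ and $z_3$.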
Camouflaged Standard Cell
Gate-level camouflaging techniques rely on using camouflaged standard cells in the fabricated integrated circuits. The camouflaged standard cells are designed as independent standard cells in the technology library. Mostly, the camouflaged cells are used to introduce dummy functionalities. During technology mapping in the design flow, the original functionality of the circuit is camouflaged by partially mapping the circuit with the camouflaged standard cells. These camouflaged cells are designed by changing the layout of the cell with dummy contacts [9] , or by changing doping of the transistors [10] . With such camouflaged cells, designing circuits with constant inputs camouflaged becomes possible. For example, a 2-NAND is proposed to implement camouflaged constant one/zero by modifying the doping of different transistors [11] . A variant of dopant-programmable cells is to build components in a dual-Vt process technology such that inferring the correct component functions would require identification of which devices use high and low thresholds [12] . To minimize cost, it is often desirable to protect a circuit by camouflaging only a small subset of the gates [13] . However, in emerging technologies, it can be more difficult to infer function from structure [14] , and a reverse engineer may thus need to consider all gates as camouflaged. An overview of physical mechanisms for obfuscation is given by Vijayakumar et al. [15] . In this work, the switches P shown in Figure 2 are mapped with such camouflaged standard cells.
Attacker Model
The attacker model in this work is similar to the attacker model for reverse engineering circuits with camouflaged gates, which was first given by Rajendran [16]. The logic function implemented by a camouflaged circuit should remain hard to discover when the attacker has knowledge of all non-camouflaged gates and can apply inputs to the circuit and observe outputs. Techniques from oracle-guided synthesis [17] have recently been used in SAT-based attacks to reverse engineer gate camouflaging [18] and logic encryption [19], and have been improved with incremental SAT solving [20] and approximate deobfuscation by relaxing the conjunctions of SAT formulas [21]. Given the known capabilities and limitations of oracle-guided SAT-based attacks, several countermeasures have been developed, such as introducing AND-trees [22], protecting the minterms of the specification [23], and introducing dummy combinational loops [24]. Moreover, a formal verification technique based on computer algebraic methods [25, 26] has been shown to be able to reverse engineer the irreducible polynomial even when the GF circuit is considered a black box and the encodings of the primary inputs and outputs are unknown [27].
We assume that the attackers know all the irreducible polynomials of a given field GF($2^m$). The attackers also have access to the physical implementation. We assume that they can obtain the gate-level netlist and identify the camouflaged standard cells that implement the constant inputs using reverse engineering techniques. The attackers aim to find the true irreducible polynomial used in the multiplier blocks in a large design. To reverse engineer the true irreducible polynomial with this knowledge, the attackers have to 1) guess the values of the camouflaged signals implemented with the camouflaged standard cells, and 2) prove the equivalence between the implementation and the reverse engineered version. Section 4.3 evaluates the obfuscation strength using SAT-based attack techniques and the recently proposed approach based on Binary Decision Diagrams (BDDs) [28].
Approach
The flow of the proposed obfuscation approach is shown in Figure 3. The inputs include 1) the degree (bit-width) $m$ of the GF multiplier, 2) a set of irreducible polynomials in GF($2^m$), 3) the indication of the true function, and 4) the irreducible polynomials that will be obfuscated in the resulting multiplier. In Figure 3, the true function is multiplication modulo $P_0(x)$, and multiplications modulo $\{P_1(x), P_2(x), \dots, P_n(x)\}$ are the obfuscated functions. The designer and authorized users know which irreducible polynomial is used for constructing the field. The approach proceeds in three steps:
• Initialize a GF($2^m$) multiplier with the irreducible polynomial that the designer wants to implement in the design. This requires a function that generates the multiplication structure (e.g., Figure 1) for any irreducible polynomial. Since the partial products and the sum of partial products are identical for irreducible polynomials of the same degree, this function reduces to producing the structure that reduces the sum of the partial products.
• Generate and minimize the extra logic for adding obfuscated functions, i.e., multiplications modulo $\{P_1(x), P_2(x), \dots, P_n(x)\}$. Based on our observation, the only changes needed to add obfuscated functions are in the logic that reduces the sum of partial products modulo different irreducible polynomials. Thus, this logic can be produced by comparing the reduction structures. The output is an updated reduction structure. This function is applied iteratively to generate the obfuscation logic for $n$ polynomials.
• Produce the obfuscated multiplier. This process generates the obfuscated multiplier by combining the partial product generator, the addition of partial products, and the reduction structure created by the previous step. The output of this process will be the input of the design flow, which produces the gate-level netlist and layouts.
Multiplication Structure Generation
The algorithm for generating the GF multiplication structure, including the partial products, the addition of partial products, and the reduction structure, is shown in Algorithm 1. Algorithm 1 is illustrated using another GF($2^4$) multiplication, with polynomial $x^4+x+1$. The first two functions are trivial and produce the same result as multiplication using $x^4+x^3+1$ (see Figure 1).
Figure 4. a), b) GF($2^4$) multiplication structures $M_0$, $M_1$, with irreducible polynomials $P_0$ and $P_1$, generated by structureGen; c) obfuscated multiplication structure, in which the true function is implemented with $P_0$.
The reduction structure is modeled as a matrix in this work. Thus, an $m$-by-$m$ matrix is initialized with all 0s (line 11). The first row of the matrix is filled with $\{s_i, 0 \leq i \leq m-1\}$. Note that the polynomial modulo function is not applied to those terms, because the result of $x^i$ modulo a polynomial of degree $m$ is always $x^i$ if $i<m$ (lines 2-4). To determine the rest of the structure, $\{x^i, m \leq i \leq 2m-2\}$ modulo $P(x)$ is then processed (line 16). The results are shown in Equations 1-3:
$$x^4 \bmod (x^4+x+1) = x+1 \tag{1}$$
$$x^5 \bmod (x^4+x+1) = x^2+x \tag{2}$$
$$x^6 \bmod (x^4+x+1) = x^3+x^2 \tag{3}$$
The $r$th row of the matrix ($2 \leq r \leq m$) is filled with $s_{m-2+r}$ by checking the terms present in the remainder $R$ of $x^{m-2+r} \bmod (x^4+x+1)$ (lines 7-9).
Algorithm 1 structureGen(P(x)): Generate Mult Structure
For example, since $x^6 \bmod (x^4+x+1) = x^3+x^2$ (Eq. 3), the entries of the fourth row that correspond to $x^3$ and $x^2$ are filled with $s_6$. The rest of the elements in $M$ remain 0. Applying Algorithm 1 to the two irreducible polynomials $P_0(x)$ and $P_1(x)$ over GF($2^4$) produces the results denoted $M_0$ and $M_1$ in Figure 4 (a) and (b).
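The matrix construction described above can be modeled as follows. This is a hedged sketch rather than a transcription of Algorithm 1: the function name `structure_gen`, the string encoding of matrix entries, and the column order (column 1 is $z_{m-1}$, column $m$ is $z_0$, chosen to agree with the positions quoted in Example 1) are assumptions of this example.

```python
def structure_gen(poly: int) -> list[list[str]]:
    """Build the m-by-m reduction structure for P(x), given as a bit mask."""
    m = poly.bit_length() - 1
    matrix = [["0"] * m for _ in range(m)]
    # Row 1: s_{m-1} ... s_0 pass through unchanged (x^i mod P(x) = x^i for i < m).
    for c in range(m):
        matrix[0][c] = f"s{m - 1 - c}"
    rem = 1 << (m - 1)  # start from x^{m-1}
    for r in range(1, m):
        rem <<= 1                # multiply by x
        if (rem >> m) & 1:
            rem ^= poly          # reduce modulo P(x)
        # rem now equals x^{m-1+r} mod P(x); place s_{m-1+r} where it has a term.
        for c in range(m):
            if (rem >> (m - 1 - c)) & 1:
                matrix[r][c] = f"s{m - 1 + r}"
    return matrix


for name, poly in (("M0", 0b11001), ("M1", 0b10011)):
    print(name)
    for row in structure_gen(poly):
        print("  ", row)
```

Each column lists the terms that are XORed to form one output bit, so the number of non-zero entries in a column reflects the XOR cost of that bit.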
Obfuscation
This section introduces our approach to generating the obfuscation logic for the obfuscated multiplier. The procedure identifies the logic that differs between the original reduction structures and produces a new structure with minimum extra logic. The notation used in this section is as follows:
• $z^m_n$ is the $n$th output bit of the multiplier implemented with $P_m(x)$.
• $z^{m_0,m_1}_n$ is the $n$th output bit of the obfuscated multiplier, obfuscated with the functions of $P_{m_0}(x)$ and $P_{m_1}(x)$. The true polynomial of this multiplier is $P_{m_0}(x)$.
• $\delta^r_c$ is the obfuscation term in the obfuscated reduction structure at the $r$th row and $c$th column.
To obtain the difference between the reduction structures, an element-wise XOR is applied to the two matrices $M_0$ and $M_1$ generated by structureGen. The resulting matrix is denoted $M$. The positions of the non-zero elements in $M$, $\{(r_i, c_j), 1 \leq i, j \leq m\}$, indicate the differences. The obfuscated reduction structure is first created by copying $M_0$. The elements at positions $(r_i, c_j)$ are then replaced by $\delta^{r_i}_{c_j}$. The function of $\delta$ is defined by Equation 4. In the actual hardware implementation, $p$ is a constant known to the designers and authorized users, called the dummy switch. The dummy switch is implemented during technology mapping using the camouflaged standard cells introduced in Section 2.3.
Example 1: We illustrate the obfuscation process using the two GF($2^4$) multiplications shown in Figure 4. The resulting multiplier performs multiplication with $P_0(x)$ and is obfuscated with the multiplication with $P_1(x)$. $M$ is created by XORing $M_0$ and $M_1$, and includes seven non-zero elements at positions $(r_i, c_j) = \{(2,1), (3,1), (3,2), (2,3), (4,3), (3,4), (4,4)\}$. Thus, seven terms $\delta^{r_i}_{c_j}$ are required for this obfuscation. The obfuscation terms of Figure 4(c) are shown in Equation 4. The obfuscated multiplication structure is first created by copying $M_0$, and is then updated by replacing the elements at $(r_i, c_j) = \{(2,1), (3,1), (3,2), (2,3), (4,3), (3,4), (4,4)\}$ with the corresponding $\delta$ terms.
The rest of the logic in the obfuscated multiplier remains the same as in any GF($2^4$) multiplier.
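The comparison step of Example 1 can be reproduced with a small script. The two matrices below are written out by hand from $x^k \bmod P_0(x)$ and $x^k \bmod P_1(x)$ for $k = 4, 5, 6$, with columns ordered $z_3, z_2, z_1, z_0$. The exact form of $\delta$ is an assumption of this sketch: each differing entry is replaced by a term that selects the true entry when the switch $p$ is 1 and the dummy entry when $p$ is 0. The function name `obfuscate` is likewise hypothetical.

```python
# Reduction structures for P0 = x^4 + x^3 + 1 and P1 = x^4 + x + 1.
M0 = [["s3", "s2", "s1", "s0"],
      ["s4", "0",  "0",  "s4"],
      ["s5", "0",  "s5", "s5"],
      ["s6", "s6", "s6", "s6"]]
M1 = [["s3", "s2", "s1", "s0"],
      ["0",  "0",  "s4", "s4"],
      ["0",  "s5", "s5", "0"],
      ["s6", "s6", "0",  "0"]]


def obfuscate(m_true, m_dummy, switch="p"):
    """Copy m_true and insert a delta term wherever the two structures differ."""
    size = len(m_true)
    result = [row[:] for row in m_true]
    positions = []
    for r in range(size):
        for c in range(size):
            t, d = m_true[r][c], m_dummy[r][c]
            if t == d:
                continue
            positions.append((r + 1, c + 1))  # 1-indexed (row, column)
            # Assumed delta: true entry gated by p, dummy entry gated by NOT p.
            terms = []
            if t != "0":
                terms.append(f"{t}&{switch}")
            if d != "0":
                terms.append(f"{d}&~{switch}")
            result[r][c] = " | ".join(terms)
    return result, positions


structure, diff = obfuscate(M0, M1)
print(len(diff), diff)
for row in structure:
    print(row)
```

The script reports seven differing positions, matching the set listed in Example 1. In this two-function case every $\delta$ has one zero operand, so each term collapses to a single AND.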
An iterative obfuscation approach is applied to generate obfuscated multipliers with more than two functions. When performing obfuscation with three or more functions, the designer must choose the irreducible polynomial for the true function (e.g., $P_0(x)$), and also choose the order of obfuscation among the other functions. For example, consider a scenario in which the designer wants to design an obfuscated GF($2^4$) multiplier with three functions to replace the multiplier block in the ECC hardware, with one more irreducible polynomial $P_2(x)=x^4+x^3+x^2+x+1$ (cf. Table 1). The true polynomial is $P_0(x)$. Let $M_0$, $M_1$, and $M_2$ be the multiplication structures of $P_0(x)$, $P_1(x)$, and $P_2(x)$. If the order is $P_1(x) \rightarrow P_2(x)$, our approach first generates the intermediate obfuscated structure $M$ from inputs $M_0$ and $M_1$, and then generates the finalized design by obfuscating $M$ with $M_2$. We can see that 1) in order to obfuscate $n+1$ functions, the number of obfuscation iterations is $n$; 2) the maximum number of functions in one GF($2^m$) multiplier is limited by the total number of irreducible polynomials of degree $m$. For this iterative approach, the size of the final multiplier is affected by the order of obfuscations. This occurs because the total number and complexity of the obfuscation terms ($\delta$) vary across the different orders. This is explored further in Section 4.2.
Optimization
Two optimization techniques are introduced to reduce the overhead of the obfuscation approach, 1) early constant propagation and 2) obfuscation term reduction.
Early constant propagation
It turns out that a large number of the generated obfuscation terms may contain zero entries, in which case those terms can be reduced from AND-OR logic into simple AND functions.
Obfuscation term reduction
Two types of reduction are introduced: a) merging equivalent $\delta$ terms. For example, $\delta^3_1$ and $\delta^3_4$ will be merged since they have the same functionality; b) reducing non-equivalent $\delta$ terms. Two $\delta$ terms in the same column can be merged if one is $s_x \cdot p$ and the other is $s_y \cdot \bar{p}$, with $x \neq y$. For example, in Figure 4(c), in the third column ($z_1$), $\delta^2_3$ and $\delta^4_3$ can be replaced by $\delta_{reduce} = s_6 \cdot p + s_4 \cdot \bar{p}$. This removes one term in the third column, which replaces one XOR function for $z_1$ with one OR function. The replacement is valid because the two terms are gated by complementary values of the switch and so are never 1 at the same time, and it is beneficial because XOR is a more complex Boolean function than OR.
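The validity of the second reduction can be confirmed exhaustively. The check below assumes that the two merged terms are gated by complementary values of the switch ($p$ and its complement), so that at most one of them is active; the function names are hypothetical and introduced only for this sketch.

```python
from itertools import product


def xor_equals_or_when_complementary() -> bool:
    """Check s6*p XOR s4*(not p) == s6*p OR s4*(not p) for all inputs."""
    for s4, s6, p in product((0, 1), repeat=3):
        term_true = s6 & p            # active only when p = 1
        term_dummy = s4 & (1 - p)     # active only when p = 0
        if (term_true ^ term_dummy) != (term_true | term_dummy):
            return False
    return True


def xor_equals_or_when_same_gate() -> bool:
    """Same check with both terms gated by p; the merge is not valid here."""
    for s4, s6, p in product((0, 1), repeat=3):
        if ((s6 & p) ^ (s4 & p)) != ((s6 & p) | (s4 & p)):
            return False
    return True


print(xor_equals_or_when_complementary())
print(xor_equals_or_when_same_gate())
```

The first check passes and the second fails (for $s_4 = s_6 = p = 1$), which shows that the merge depends on the two terms being mutually exclusive.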
Experimental results
The proposed approach is evaluated by creating obfuscated Galois Field multipliers with various numbers of viable GF($2^m$) multiplications. The designs generated by our approach are mapped using the open source synthesis tool ABC [29], with a 14nm technology library. The bit-width $m$ varies from 8 to 256. The irreducible polynomials are obtained from [30]. The runtime of generating obfuscated multipliers is not reported in this section because all runtimes are less than one second. In Table 2, we can see that obfuscating 16 functions for $m=\{64,128,256\}$ requires 60%, 30%, and 20% area overhead, respectively. The corresponding delay overheads are 36%, 30%, and 25%. On average, the cost of adding an extra obfuscated function is 1.8% in area and delay. The number of obfuscated functions for GF($2^8$) is limited to 8 because there are only eight primitive irreducible polynomials in this field.
Design Cost Analysis
To further analyze the design cost, we evaluate the total area and delay overhead with the number of obfuscated functions from 2 to 32, with m=8, 64, 128, and 256. The x-axis shows the number of obfuscated functions, and the y-axis represents the overhead of area/delay. In Figure 5 , we can see that:
• the area overhead increases almost linearly with the number of functions; on average, the cost of adding an extra obfuscated function is 1.8% in area and delay.
• given the same number of obfuscated functions, the area overhead and the delay overhead decrease as $m$ (the bit-width of the multiplier) increases. For example, the area and delay overheads of obfuscating eight functions for a GF($2^8$) multiplier are 186% and 33%; for GF($2^{256}$), they are 8% and 22%. This shows that our approach is particularly well suited to obfuscating large Galois Field arithmetic applications.
• the overheads occasionally decrease when the number of obfuscated functions increases. This is because the obfuscation terms $\delta$ that are introduced may become don't-care logic, which helps the technology mapping process improve the results [31].
Order of Obfuscations
As mentioned in Section 3.2, the size of the obfuscated multiplier is affected by the order of the iterative obfuscations. The main reason is that with different orders, the number of $\delta$ terms and their complexity can be very different. An exhaustive permutation study over GF($2^8$) is shown in Figure 6 to demonstrate the impact of the obfuscation order. All possible eight-function obfuscated GF($2^8$) multipliers are generated by the proposed approach, where each order corresponds to one permutation of $\{P_0, P_1, \dots, P_7\}$. Thus, the total number of designs in Figure 6 is $8!=40320$. The results are collected using ABC with the 14nm technology library. The x-axis shows the area/delay, and the y-axis shows the number of designs in a given range of area/delay. The area varies from 1800 to 3300, and the delay ranges from 125 to 170. We can see that the order of obfuscations has a great impact on the design cost of the obfuscated multipliers. Compared with these results, the design obtained with the order used in Table 2 (labeled "Table 1" in Figure 6), with area=2786 and delay=148.64, can be further improved by exploring the choice of order. Future work will focus on finding good orders for efficient obfuscation using machine learning.
Evaluation of Attacks
We apply the SAT-based attack technique using the two publicly released tools [19, 32]. The inputs to the tools are Verilog designs with extra syntax for defining the de-camouflaging problems. We develop a set of camouflaged GF circuits using the proposed approach, including 8-bit, 12-bit, 16-bit, and 32-bit GF functions. Each of these circuits includes four camouflaged GF functions. Regarding the BDD approach [28], we measure the performance of constructing the BDDs of the camouflaged circuits using the same CUDD package [33]. The results are shown in Table 3. With only three dummy functions, the SAT-based attack techniques cannot obtain the true function within 12 hours for bit-widths of 16 and above. BDD construction also fails at 16 bits due to memory explosion. Note that cryptographic applications such as ECC can have large GF operators.
Conclusion
In this paper, we introduce an obfuscation approach over Galois Fields, focusing mainly on obfuscating GF multiplications. Our approach generates GF multipliers with multiple irreducible polynomials obfuscated, to prevent the actual irreducible polynomial from being reverse engineered. A complete design methodology is developed and evaluated with a set of GF multipliers, with up to 32 functions obfuscated. The results show that our approach can obfuscate GF multipliers with low overhead in design performance. We also evaluate the strength of obfuscation over Galois Fields using SAT-based and BDD-based techniques. Future work will focus on leveraging machine learning algorithms to search for the best obfuscation order(s).
