This paper presents a system for the automatic generation of Galois-field (GF) arithmetic circuits, named the GF Arithmetic Module Generator (GF-AMG). The proposed system employs a graph-based circuit description called the GF Arithmetic Circuit Graph (GF-ACG). First, we present an extension of the GF-ACG to handle GF(p^m) (p ≥ 3) arithmetic circuits, which can be efficiently implemented by multiple-valued logic circuits as well as by conventional binary circuits. We then show the validity of the generation system through the experimental design of GF(p^m) multipliers for different values of p. In addition, we empirically evaluate the performance of three types of GF(2^m) multipliers and of typical GF(p^m) multipliers (p ≥ 3) generated by our system. The results confirm that the proposed system can generate a variety of GF parallel multipliers, including practical multipliers over GF(p^m) with extension degrees greater than 128.
Introduction
Applications of error-correcting codes (ECCs) and cryptography based on arithmetic operations over Galois fields (GFs) are rapidly proliferating as reliable and secure communication becomes increasingly important [2]. These operations are increasingly implemented in hardware in embedded devices such as cell phones and radio-frequency identification (RFID) chips, and the performance of the arithmetic circuits has a significant impact on the effectiveness and security of the entire system. In addition, some applications of elliptic-curve and pairing-based cryptography employ GF arithmetic circuits of characteristic greater than 2 (i.e., GF(p^m), where p is a prime and m is a natural number), which are inherently non-binary. For example, some pairing-based cryptosystems use GFs with p = 3 [2]-[6], and hyperelliptic curves over GFs with p = 5 or 7 are useful for the efficient implementation of pairing-based cryptography [7], [8].
Most such arithmetic circuits have been designed at the lowest logic level by designers whose expertise in GF arithmetic is specialized for a particular type of application. Conventional hardware description languages (HDLs) do not provide high-level data structures, arithmetic operations, or formulae over Galois fields. Moreover, conventional high-level synthesis techniques have difficulty describing GF arithmetic circuits because the mapping of an arithmetic operation over a GF varies with the basis and the modular polynomial, even when the operation is represented by the same operator. Owing to the large variety of GF multipliers, designers are forced to describe the behavior of a GF multiplier with lowest-level expressions even when using high-level synthesis tools. Furthermore, complete functional verification of hand-designed GF arithmetic circuits is much harder than that of integer arithmetic circuits, because many GF operations in ECCs and cryptography use operands longer than 64 bits to achieve sufficient error-correction capability and resistance to cryptanalytic attacks, respectively. Even for statistical verification by Monte Carlo simulation, generating test patterns for GF operations is more difficult than for integer operations. To address these problems, we present a system for the automatic generation of GF arithmetic circuits, named the GF Arithmetic Module Generator (GF-AMG). Given a circuit specification, the system generates the corresponding HDL description, whose function is completely and formally verified. The previous version of the system [1] focused on the generation of GF(2^m) and GF(3^m) parallel multipliers for various modular polynomials. The basic idea of GF-AMG is to use a graph-based representation of GF arithmetic circuits, called the GF Arithmetic Circuit Graph (GF-ACG) [9], as the internal data structure.
We can represent an arbitrary GF arithmetic circuit by a GF-ACG and verify it formally even if the operand bit length (i.e., the extension degree) is greater than 128. The verified GF-ACG is then mapped into the corresponding HDL description, which can be implemented in either multiple-valued logic or conventional binary logic.
In this paper, we first present an extension of the GF-ACG for describing arithmetic circuits over GF(p^m), which are needed in hardware for elliptic-curve and pairing-based cryptography such as the examples above, and for mapping such circuits into multiple-valued logic circuits as well as binary logic circuits. The basic idea of the extension is to add, as new nodes at the lowest level, an encoding function from a p-valued variable to one or more R-valued variables (p ≥ R), where R is determined by the implementation logic. The encoding function is given by algebraic equations. We then present our GF-AMG framework based on the extended GF-ACG and describe our experimental evaluation of the system. Whereas only a preliminary evaluation was reported for the previous version [1], this paper presents an extension and a further evaluation of the proposed system. First, we extend our system to support a wider variety of GF(p^m) multipliers, whereas the previous version supported only p = 2 and 3. We then demonstrate the validity of GF-AMG through the experimental design and verification of GF(p^m) (p = 2, 3, 5, 7, and 11) parallel multipliers. The results show that the proposed system can generate a verified GF(11^256) multiplier in about five minutes. In addition, we show the generation results of typical GF(2^m) parallel multipliers based on three arithmetic algorithms, as well as those of typical GF(p^m) (p ≥ 3) multipliers implemented in binary logic.
Galois-Field Arithmetic Circuit Graph

Definition of Galois-Field Arithmetic Circuit Graph
Figure 1 shows an overview of the GF-ACG. A GF-ACG G is defined as (N, E), where N is a set of nodes and E is a set of directed edges. A node represents an arithmetic circuit by its functional assertion and internal structure. A directed edge represents the flow of data between nodes and defines their data dependency. We assume that every node is connected to at least one edge.
A node n ∈ N is defined by (F, G′), where F is the functional assertion given as a set of equations over GFs (GF equations) and G′ is the internal structure given as a smaller GF-ACG. A node at the lowest level of abstraction has no internal structure and is thus described as (F, nil). A functional assertion is represented as a relation E_l = E_r, where E_l and E_r are the output and input expressions, respectively, and each expression consists of variables, constants, or two or more expressions combined by the arithmetic operators +, −, and ×.
A directed edge e ∈ E is defined as (src, dest, x), where src and dest are the start and end nodes, respectively, and x is the variable representing an element of a GF. If either src or dest is nil, the edge represents an external input or output of the GF-ACG. Each variable x is associated with a Galois field, defined as GF = (B, C, IP), where B is the basis, C is the coefficient vector, and IP is the irreducible polynomial. More precisely, B, C, and IP are given as
Fig. 1 Overview of Galois-field arithmetic circuit graph.
where β is the indeterminate element (i.e., a root of the irreducible polynomial), m is the degree of the field extension, C_i is the coefficient set of degree i (i ∈ Z, 0 ≤ i ≤ m − 1), and c_i is the i-th element of the coefficient set C_i. Here, γ_i = β^i if the GF is represented by a polynomial basis (PB), and γ_i = α^{p^i} if it is represented by a normal basis (NB), where α = β^n and p is the characteristic. IP = nil if the GF is a prime field. Thus, the above description can handle both prime and extension fields. Let h (0 ≤ h ≤ m − 1) and l (0 ≤ l ≤ h) be the most and least significant degrees, respectively. A variable is then represented as x = (GF, (h, l)), where the ordered pair (h, l) is called the degree range. Using this notation, we can refer to any specific variable x_i of degree i.
A variable can be decomposed into an expression of sub-variables at a lower level of abstraction. Let x be a variable and x_i (l ≤ i ≤ h) be a lower-level variable. We have two types of decomposition nodes, whose functions are given as
Equation (4) indicates that
On the other hand, Eq. (5) indicates that x ∈ GF(p^m) is divided into a number of variables over the prime field [i.e., x_i ∈ GF(p)].
We also have two types of composition nodes, whose inputs and outputs are the inverses of those of the decomposition nodes above. Using the decomposition and composition nodes, we can change the level of abstraction of an edge representation. Note that these nodes are implemented by wiring alone and have no internal structure.
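To make these definitions concrete, the following Python sketch models a field GF = (B, C, IP), a variable x = (GF, (h, l)), a node (F, G′), an edge (src, dest, x), and a GF-ACG (N, E) as plain data classes. All class and attribute names, the string form of functional assertions, and the simplification of B and C to a basis tag and a characteristic are illustrative assumptions, not GF-AMG's internal format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class GaloisField:
    """Simplified stand-in for GF = (B, C, IP)."""
    p: int                                # characteristic
    m: int                                # extension degree
    basis: str = "PB"                     # "PB" (gamma_i = beta^i) or "NB"
    ip: Optional[tuple[int, ...]] = None  # IP coefficients, degree 0 first; None models nil


@dataclass(frozen=True)
class Variable:
    name: str
    gf: GaloisField
    degree_range: tuple[int, int]         # (h, l) with 0 <= l <= h <= m - 1

    def __post_init__(self) -> None:
        h, l = self.degree_range
        if not 0 <= l <= h <= self.gf.m - 1:
            raise ValueError(f"invalid degree range {(h, l)} for m = {self.gf.m}")


@dataclass
class Node:
    name: str
    assertions: list[str]                 # F: functional assertions "El = Er"
    structure: Optional[ACG] = None       # G': lower-level GF-ACG; None models nil


@dataclass(frozen=True)
class Edge:
    src: Optional[str]                    # None (nil): external input
    dest: Optional[str]                   # None (nil): external output
    var: Variable


@dataclass
class ACG:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def external_inputs(self) -> list[Variable]:
        return [e.var for e in self.edges if e.src is None]

    def external_outputs(self) -> list[Variable]:
        return [e.var for e in self.edges if e.dest is None]

    def every_node_connected(self) -> bool:
        # The definition assumes every node has at least one edge.
        touched = {e.src for e in self.edges} | {e.dest for e in self.edges}
        return all(name in touched for name in self.nodes)


# Usage: one lowest-level GF(3) multiplier node with assertion z = x*y.
gf3 = GaloisField(p=3, m=1)
x, y, z = (Variable(n, gf3, (0, 0)) for n in "xyz")
mul = Node("n0", ["z = x*y"])
g = ACG({"n0": mul}, [Edge(None, "n0", x), Edge(None, "n0", y), Edge("n0", None, z)])
```

Keeping the internal structure as an optional nested `ACG` mirrors the (F, nil) convention for lowest-level nodes.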
The above GF-ACG can also represent any binary logic circuit. A logic variable is defined as a variable over a GF whose coefficient set is limited to the zero element "0" and the unit element "1". Any binary logic operation can be represented with pseudo-logic equations, such as and(a, b) = ab. Note that the idempotent law is given as one of the functional assertions of the corresponding node (i.e., a = a^2 and b = b^2). Thus, the original GF-ACG can represent any GF(2^m) arithmetic circuit in binary logic. The arithmetic circuits given by GF-ACGs are verified by a formal verification method using a Gröbner basis and a polynomial reduction technique [9].
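The role of the idempotent law can be illustrated with a small Python sketch that represents Boolean gates as GF(2) polynomials and checks logic identities symbolically. Only and(a, b) = ab comes from the text; the XOR, OR, and NOT forms are standard GF(2) identities, and the helper names are hypothetical. This is a toy polynomial normal form, not the Gröbner-basis verifier of [9].

```python
from itertools import product

# Boolean functions as GF(2) polynomials. A polynomial is a frozenset of
# monomials; each monomial is a frozenset of variable names (0/1 coefficients).
ONE = frozenset({frozenset()})


def var(name):
    return frozenset({frozenset({name})})


def padd(f, g):
    # Addition over GF(2): equal monomials cancel (symmetric difference).
    return f ^ g


def pmul(f, g):
    out = set()
    for s in f:
        for t in g:
            # The union of variable sets applies the idempotent law a^2 = a;
            # ^= adds the resulting monomial modulo 2.
            out ^= {s | t}
    return frozenset(out)


def AND(f, g):
    return pmul(f, g)                    # and(a, b) = ab


def XOR(f, g):
    return padd(f, g)                    # a + b


def OR(f, g):
    return padd(padd(f, g), pmul(f, g))  # a + b + ab


def NOT(f):
    return padd(ONE, f)                  # 1 + a


def evaluate(f, env):
    """Evaluate polynomial f for a 0/1 assignment env."""
    return sum(all(env[v] for v in mono) for mono in f) % 2


a, b = var("a"), var("b")
# Absorption a(a + b + ab) = a holds only because a^2 reduces to a.
assert AND(a, OR(a, b)) == a
# De Morgan's law as a polynomial identity over GF(2).
assert NOT(AND(a, b)) == OR(NOT(a), NOT(b))
```

Without the union step (that is, without a = a^2), the absorption check would leave a spurious a^2 term and fail.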
Extension to GF(p^m) Arithmetic Circuits
An extension of the GF-ACG is presented for describing GF(p^m) arithmetic circuits so that they can be implemented in multiple-valued logic as well as in binary logic. In the original GF-ACG, the mapping from a GF variable to a logic variable at the lowest level of description is given implicitly [i.e., 0 and 1 in GF(2) are mapped to 0 and 1 in binary logic, respectively] because the original GF-ACG focuses only on GF(2^m) arithmetic circuits and their binary implementations; hence, no additional procedure is needed for this mapping. To describe and verify nodes with GF(p) (p ≥ 2) variables and their R-valued (R ≥ 2) implementations, however, we need an explicit mapping at the lowest level of abstraction. We therefore provide a mapping function, called an encoding function, that transforms GF(p) variables into R-valued logic variables, in the form of a functional assertion (i.e., a GF equation) of the lowest-level nodes.
We first describe an encoding function that transforms GF(p) variables into binary logic variables. Each GF variable in C_i (the coefficient set of degree i) is encoded by at least ⌈log_2 |C_i|⌉ logic variables. Table 1 gives examples of such mappings, showing encodings of (a) GF(2) = {0, 1} and (b) GF(3) = {0, 1, 2} into binary logic variables. Note that for characteristic p > 2, any encoding is possible, including non-minimum-length encodings. Such an encoding can be represented by a specific equation, referred to as an encoding equation. Let x and L_j (0 ≤ j ≤ k − 1) be a GF variable over GF(p) and a logic variable used for encoding, respectively, and let α = (α_{k−1}, ..., α_1, α_0) be a k-bit logic value. The general form of the encoding equation is then given as
where f(α) is the GF value corresponding to α, and L_j^{α_j} is the j-th literal, defined as
For example, the encoding equation for Table 1 (a) is given as x = L_0, and that for Table 1 (b) is obtained from the general form in the same manner. Figure 2 shows GF-ACGs for a 2-input multiplier over GF(3) implemented in binary logic [10], where the node in Fig. 2 (a) corresponds to the shaded part in Fig. 2 (b). This indicates that node n_0 has an internal structure consisting of the lower-level nodes in the corresponding shaded part. Table 2 shows the nodes, GFs, and variables used in Fig. 2. Nodes n_1, n_2, and n_9 in Fig. 2 (b) perform the mapping between GF variables and logic variables: n_1 and n_2 translate GF variables into logic variables, while n_9 translates logic variables into GF variables. Note that the functional assertions of such nodes require equations that represent unused inputs; in this example, one such equation expresses that the logic value (1, 1) is not used. Thus, any GF(p^m) arithmetic circuit can be implemented by binary logic circuits in a uniform manner.
Table 2 Nodes, Galois fields and variables in Fig. 2 (b).
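The general form described above (a sum of f(α) times a product of literals) can be sketched in Python by expanding it symbolically over GF(p). We assume the natural reading in which the literal L_j^{α_j} is L_j when α_j = 1 and 1 − L_j when α_j = 0, and, for illustration only, the GF(3) code assignment 0 → 00, 1 → 01, 2 → 10; Table 1 (b) is not reproduced here and may differ. All function names are hypothetical.

```python
from itertools import product


def poly_mul(f, g, p):
    """Multiply polynomials over GF(p) in binary variables L_j.

    A polynomial is a dict {frozenset of indices j: coefficient}; taking the
    union of index sets applies L_j^2 = L_j, which holds because L_j is 0 or 1.
    """
    out = {}
    for s, c in f.items():
        for t, d in g.items():
            key = s | t
            out[key] = (out.get(key, 0) + c * d) % p
    return {k: v for k, v in out.items() if v}


def literal(j, bit, p):
    # L_j if the bit is 1, otherwise its complement 1 - L_j over GF(p).
    if bit:
        return {frozenset({j}): 1}
    return {frozenset(): 1, frozenset({j}): p - 1}


def minterm(code, p):
    """Product of literals that is 1 exactly for code (L_{k-1}, ..., L_0)."""
    k = len(code)
    term = {frozenset(): 1}
    for j in range(k):
        term = poly_mul(term, literal(j, code[k - 1 - j], p), p)
    return term


def encoding_polynomial(table, p):
    """x = sum over used codes alpha of f(alpha) * (product of literals)."""
    total = {}
    for code, value in table.items():
        for s, c in minterm(code, p).items():
            total[s] = (total.get(s, 0) + value * c) % p
    return {s: c for s, c in total.items() if c}


def unused_code_equations(table, p):
    """Minterms of unused codes; each must equal 0 as a functional assertion."""
    k = len(next(iter(table)))
    return [minterm(code, p) for code in product((0, 1), repeat=k)
            if code not in table]


def min_bits(p):
    return (p - 1).bit_length()   # ceil(log2 p) logic variables


# Hypothetical GF(3) encoding (L1, L0): 0 -> 00, 1 -> 01, 2 -> 10.
GF3_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 2}
print(encoding_polynomial(GF3_TABLE, 3))      # represents x = L0 + 2*L1
print(unused_code_equations(GF3_TABLE, 3))    # represents L0*L1 = 0
```

Under this assumed code assignment, the L0·L1 terms cancel modulo 3, leaving x = L0 + 2L1, and the unused code (1, 1) yields the constraint L0·L1 = 0.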
Next, we describe an extension of the above encoding equation to R-valued implementations. Table 3 shows examples of the mapping of (a) GF(3) = {0, 1, 2} and (b) GF(5) = {0, 1, 2, 3, 4} into ternary logic. Let x and L_j (0 ≤ j ≤ k − 1) be a GF variable over GF(p) and an R-valued logic variable used for encoding, respectively, and let α = (α_{k−1}, ..., α_1, α_0) be a k-digit R-valued logic value. The encoding equation is then given by Eqs. (8) and (9), both of which are based on arithmetic operations over GF(p). For example, the encoding equation for Table 3 (a) is given by x = L_0, and that for Table 3 (b) is derived from Eqs. (8) and (9) in the same way.
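Eqs. (8) and (9) are not reproduced here. As one way to build an R-valued encoding using only GF(p) arithmetic, the following Python sketch uses the indicator 1 − (L_j − α_j)^{p−1}, which by Fermat's little theorem equals 1 when L_j = α_j and 0 otherwise, provided R ≤ p. This indicator form is our assumption and not necessarily the form of Eqs. (8) and (9); the GF(5) code assignment below is hypothetical.

```python
from itertools import product


def indicator(value, a, p):
    """[value == a] as the GF(p) expression 1 - (value - a)^(p-1).

    By Fermat's little theorem this is 1 if value == a and 0 otherwise,
    provided both lie in {0, ..., p-1} (i.e., the radix R <= p).
    """
    return (1 - pow(value - a, p - 1, p)) % p


def encode(digits, table, p):
    """Evaluate x = sum_alpha f(alpha) * prod_j [L_j == alpha_j] over GF(p)."""
    x = 0
    for code, value in table.items():
        term = value
        for digit, a in zip(digits, code):
            term = term * indicator(digit, a, p) % p
        x = (x + term) % p
    return x


def unused_codes(table, radix):
    """R-valued codes with no GF(p) value; these need extra assertions."""
    k = len(next(iter(table)))
    return [c for c in product(range(radix), repeat=k) if c not in table]


# Table 3 (a)-style identity mapping of GF(3) onto one ternary digit L0.
GF3_TERNARY = {(a,): a for a in range(3)}
# Hypothetical GF(5) mapping onto two ternary digits (L1, L0): x = 3*L1 + L0.
GF5_TERNARY = {(v // 3, v % 3): v for v in range(5)}
```

Because each product of indicators is nonzero for exactly one code, the sum reproduces the mapping table on used codes and evaluates to 0 on unused ones, which is why unused codes still require separate functional assertions.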
Galois-Field Arithmetic Module Generator
This section presents an automatic generation system, the Galois-Field Arithmetic Module Generator (GF-AMG), for producing GF parallel multipliers. The system employs the extended GF-ACG to produce multiplier modules whose functions are completely verified at the algorithmic level.
System Framework
Figure 3 is a block diagram of GF-AMG. It consists of (i) the GF-ACG Code Synthesizer, (ii) the GF-ACG Verifier, and (iii) the ACG-to-HDL Translator. The GF-ACG Code Synthesizer generates GF-ACG code according to the user's design specification, which includes the characteristic, multiplication algorithm, modular polynomial, and logic type. Table 4 lists the characteristics, multiplication algorithms, modular polynomial degrees, and logic types supported by the GF-AMG system. The GF-ACG Verifier then formally verifies the generated GF-ACG code using a Gröbner basis and a polynomial reduction technique, following the procedure given in [9]. Finally, the ACG-to-HDL Translator translates the verified GF-ACG code into equivalent Verilog-HDL code using Algorithm 1: given a GF-ACG G, it recursively extracts from G the set of relations of internal edges at the lowest level of abstraction and then translates these relations into the corresponding HDL format by one-to-one mapping.
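Algorithm 1 itself is not reproduced in this section. As a rough illustration of its two steps (recursive extraction of lowest-level relations, then one-to-one mapping to HDL), the following Python sketch flattens a hypothetical node hierarchy and emits one Verilog continuous assignment per relation. The `Node` class, the string form of relations, and the GF(2^2) example with IP x^2 + x + 1 are our assumptions; the emitted text is plain Verilog but not GF-AMG's actual output format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical GF-ACG node: a leaf carries an (output, expression) relation."""
    name: str
    relation: Optional[tuple[str, str]] = None
    children: list[Node] = field(default_factory=list)


def lowest_level_relations(node: Node) -> list[tuple[str, str]]:
    """Recursively collect the relations of the lowest-level nodes."""
    if not node.children:
        return [node.relation] if node.relation else []
    relations = []
    for child in node.children:
        relations.extend(lowest_level_relations(child))
    return relations


def to_verilog(top: Node, inputs: list[str], outputs: list[str]) -> str:
    relations = lowest_level_relations(top)
    internal = sorted({out for out, _ in relations} - set(outputs))
    lines = [f"module {top.name}({', '.join(inputs + outputs)});"]
    lines += [f"  input {s};" for s in inputs]
    lines += [f"  output {s};" for s in outputs]
    lines += [f"  wire {s};" for s in internal]
    # One-to-one mapping: each lowest-level relation becomes one assign.
    lines += [f"  assign {out} = {expr};" for out, expr in relations]
    lines.append("endmodule")
    return "\n".join(lines)


def leaf(out: str, expr: str) -> Node:
    return Node(out, (out, expr))


# GF(2^2) multiplier with IP x^2 + x + 1, split into a PPG and an accumulator.
ppg = Node("ppg", children=[leaf(f"t{i}{j}", f"a{i} & b{j}")
                            for i in range(2) for j in range(2)])
acc = Node("acc", children=[leaf("z0", "t00 ^ t11"),
                            leaf("z1", "t01 ^ t10 ^ t11")])
top = Node("gf4_mul", children=[ppg, acc])
print(to_verilog(top, ["a0", "a1", "b0", "b1"], ["z0", "z1"]))
```

Because every lowest-level relation maps to exactly one `assign`, the generated netlist stays structurally identical to the verified graph.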
Generation of GF(p^m) Parallel Multipliers
This section focuses on the design and generation of GF(p^m) parallel multipliers in GF-AMG. For the conventional design of GF(2^m) parallel multipliers, which are also generated by GF-AMG, see [11], [12], and [13]. Let x, y ∈ GF(p^m) be the inputs and let z ∈ GF(p^m) be the output. The multiplication over GF(p^m) is first divided into the following two functions:
where
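is the i-th partial product. We then consider the internal structure of the nodes given by Eqs. (10) and (11) to obtain their hierarchical GF-ACG description. Each w_i is given by Eq. (12), which is computed using GF(p) multipliers, additive inverters, and 2-input 1-output adders over GF(p).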
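Eqs. (10)-(12) are not reproduced here. As a hedged sketch of one common polynomial-basis formulation consistent with the split into a partial product generator and an accumulator, the following Python code assumes w_i = y_i (x β^i mod IP) and z = Σ w_i. Reducing x β^i by the monic IP is what introduces subtractions over GF(p). The function names and the coefficient-list representation (degree 0 first) are our assumptions.

```python
def times_beta(a, ip, p):
    """Return a * beta mod IP over GF(p).

    a: m coefficients (degree 0 first); ip: m + 1 coefficients of a monic IP.
    Uses beta^m = -(ip[0] + ip[1] beta + ... + ip[m-1] beta^(m-1)).
    """
    top = a[-1]
    shifted = [0] + a[:-1]
    return [(s - top * ip[i]) % p for i, s in enumerate(shifted)]


def partial_products(x, y, ip, p):
    """PPG: w_i = y_i * (x * beta^i mod IP) for 0 <= i <= m - 1."""
    ws, xb = [], list(x)
    for yi in y:
        ws.append([(yi * c) % p for c in xb])
        xb = times_beta(xb, ip, p)
    return ws


def accumulate(ws, p):
    """Accumulator: coefficient-wise sum of the partial products over GF(p)."""
    z = [0] * len(ws[0])
    for w in ws:
        z = [(a + b) % p for a, b in zip(z, w)]
    return z


def gf_mul(x, y, ip, p):
    return accumulate(partial_products(x, y, ip, p), p)


# GF(3^4) with IP beta^4 + beta + 2 (irreducible over GF(3)):
# beta * beta^3 = beta^4 = 1 + 2*beta.
IP = [2, 1, 0, 0, 1]
print(gf_mul([0, 1, 0, 0], [0, 0, 0, 1], IP, 3))  # -> [1, 2, 0, 0]
```

The m partial products are independent of one another, so in hardware they can be generated in parallel and summed by a tree of GF(p) adders.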
For example, Fig. 4 shows the GF-ACGs for the GF(3^4) parallel multiplier at the top four levels of abstraction. Table 5 shows the corresponding nodes, GFs, and GF variables; the decomposition and composition nodes are omitted from Table 5. The nodes in Fig. 4 (a), (b), and (c) correspond to the shaded parts in Fig. 4 (b), (c), and (d), respectively. The 2nd-level nodes "Partial Product Generator" (PPG) and "GF Accumulator" (GFA) in Fig. 4 (b) have the functional assertions given by Eqs. (10) and (11), respectively. The 3rd-level nodes "PPGi" in Fig. 4 (c) have the functional assertion given by Eq. (12). The nodes "GFAi" in Fig. 4 (c) are 2-input 1-output adders over GF(3^4) that constitute the "Accumulator". In addition, the nodes in Fig. 4 (d) are GF(3) arithmetic circuits, which are described as in Fig. 2 for the binary implementation case. Thus, the GF-ACGs of a GF(p^m) parallel multiplier are represented hierarchically.
Table 5 Nodes, Galois fields and variables in Fig. 4.
Algorithm 2 shows the procedure for synthesizing GF(p^m) multipliers. Given a design specification (i.e., an irreducible polynomial and an implementation logic), the algorithm generates a GF-ACG. The function "Degree" in Line 2 obtains the degree of the irreducible polynomial, according to which the internal structure is generated recursively. The function "GetEquation" in Line 4 obtains the equation of each "PPGi" expression given by Eq. (12). The functions "CountSubOperator" and "CountAddOperator" count the numbers of "−" and "+" operators in the equation, respectively. Using these numbers and the degree, we generate the 4th-level GF-ACGs for GF(p) arithmetic circuits. The functions "GenerateGFpMultiplier", "GenerateGFpAdditiveInv", and "GenerateGFpAdder" return GF-ACGs for GF(p) multipliers, additive inverters, and adders, respectively. Their internal structures are determined by the given logic L. If L is binary logic, the internal structure is given by the netlist of the GF(p) multiplier, additive inverter, or adder, designed in a manner similar to Fig. 2. Otherwise, the internal structure is given as nil so that designers can use their own custom logic cells. The function "GeneratePPGi" in Line 16 generates the "PPGi" expressions (0 ≤ i ≤ d − 1) from the 4th-level GF-ACGs, in which the GF(p) adders are arranged as a tree. The function "GeneratePPG" in Line 18 generates a GF-ACG for the "Partial Product Generator" from the 3rd-level GF-ACGs of "PPGi". Similarly, the "Accumulator" is generated from the 3rd-level GF-ACGs of "GFAi", each consisting of d GF-ACGs of GF(p) adders; in the "Accumulator", the d − 1 "GFAi" expressions are arranged as a tree. Finally, the function "GenerateMultiplier" in Line 26 generates a GF-ACG for the GF(p^m) multiplier from the 2nd-level GF-ACGs. The HDL code generated for GF(p^m) multipliers can be used not only for a binary implementation but also for an R-valued implementation [GF(2^m) multipliers, by contrast, are implemented only in binary logic]. For a binary implementation, the HDL code can be passed to the standard back-end design flow, including logic synthesis and placement and routing (P&R) with a standard cell library. For an R-valued implementation, the HDL code would be implemented by technology mapping with a custom-made library of R-valued logic cells. Thus, GF-AMG generates verified HDL code for both multiple-valued logic and binary logic.
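The text states that GF(p) adders, and the d − 1 "GFAi" nodes, are arranged as a tree but does not specify its shape. Assuming a balanced pairwise arrangement (our assumption), the following Python sketch builds such a tree from d operands, which uses d − 1 two-input adders and has depth ⌈log_2 d⌉, and evaluates the resulting netlist over GF(p). The signal names `s0`, `s1`, ... are hypothetical.

```python
def build_adder_tree(operands):
    """Arrange 2-input adders as a balanced tree over the given signals.

    Returns (root signal, list of (out, in1, in2) adders, tree depth).
    """
    level = [(name, 0) for name in operands]   # (signal, depth)
    adders = []
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            (a, da), (b, db) = level[i], level[i + 1]
            out = f"s{len(adders)}"
            adders.append((out, a, b))
            nxt.append((out, max(da, db) + 1))
        if len(level) % 2:
            nxt.append(level[-1])    # an odd signal passes to the next level
        level = nxt
    root, depth = level[0]
    return root, adders, depth


def evaluate_tree(adders, values, p):
    """Evaluate the adder netlist over GF(p), given the operand values."""
    env = dict(values)
    for out, a, b in adders:
        env[out] = (env[a] + env[b]) % p
    return env


root, adders, depth = build_adder_tree([f"x{i}" for i in range(5)])
print(root, len(adders), depth)   # prints: s3 4 3
```

Each adder reduces the number of pending signals by one, so the adder count is fixed at d − 1 whatever the tree shape; balancing only affects the delay.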
As an example, Fig. 5 shows a schematic of a GF(3^8) multiplier generated by GF-AMG, where each lowest-level component is an arithmetic circuit over GF(3). This multiplier can be implemented in ternary logic by realizing each such component as a ternary logic circuit.
Experimental Generation
The performance of our system was evaluated through the experimental generation of GF(p^m) parallel multipliers. We first generated a set of GF(p^m) parallel multipliers of typical degrees. The generation was performed with the open-source computer algebra system Risa/Asir [14] under Linux on a PC with a 3.00-GHz Intel Xeon E5450 processor and 32 GB of RAM. Table 6 shows the generation times, consisting of the GF-ACG synthesis, verification, and GF-ACG-to-HDL translation times, for each of the degrees investigated. Using our method, we achieved complete verification even for a 1024-bit multiplier over GF(11^256). Note that the verification time sometimes decreases even for a larger p; this is because the computation time of algebraic operations in the software used in the experiment depends on machine conditions such as concurrently executing processes. For comparison, to evaluate the advantage of the verifier, we also performed Verilog-XL simulation using the corresponding HDL descriptions. With this method, we were unable to complete the simulation of GF(3^10) or larger multipliers because the simulation time increases exponentially with the extension degree. As described in the Introduction, GFs with characteristic at most seven have been used so far for pairing-based cryptography. Thus, the experimental results suggest that our system is sufficient for such applications.
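As a rough illustration of why simulation does not scale, assume (the text does not say so explicitly) that complete functional simulation enumerates all input pairs. A 2-input GF(p^m) multiplier then needs p^{2m} patterns; for GF(3^10), this is 3^20 = 3,486,784,401. The function name below is hypothetical.

```python
def exhaustive_patterns(p, m):
    """Input pairs (x, y) needed to simulate a 2-input GF(p^m) multiplier exhaustively."""
    return p ** (2 * m)


for p, m in [(2, 8), (3, 4), (3, 8), (3, 10), (11, 256)]:
    n = exhaustive_patterns(p, m)
    # Report the size as a digit count; 11^512 is too large to convert to float.
    print(f"GF({p}^{m}): {len(str(n))}-digit number of patterns")
```

The pattern count grows exponentially in m, whereas the algebraic verification cost depends on the size of the circuit description rather than on the number of input combinations.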
We then generated a set of GF(2^m) parallel multipliers based on three types of multiplication algorithm in order to assess their area and delay. Figure 6 shows the (a) area and (b) delay of the three types of GF(2^m) multiplier for different values of m, where the horizontal axis indicates the extension degree. The results confirm that the Mastrovito and Full-Tree multipliers have the advantage in area and delay, respectively. The Massey-Omura multiplier, a typical multiplication algorithm using a normal basis (NB), did not show any advantage in area or delay over the other two. However, Massey-Omura is useful for more sophisticated NB arithmetic circuits; for example, efficient exponentiation circuits can be designed based on an NB, since squaring in an NB is performed only by wiring. Table 7 shows the performance of GF(p^m) multipliers implemented in binary logic for different characteristics and degrees. For the GF(p^m) multipliers with p = 5, 7, and 11, we implemented the GF(p) arithmetic circuits (i.e., the adders, multipliers, and constant multipliers over GF(p)) using the corresponding lookup tables. Synopsys Design Compiler could not synthesize the GF(7^128) and GF(11^128) multipliers under our experimental conditions because of memory overflow, probably because the circuit areas (i.e., the numbers of logic gates) of these multipliers are too large (about 6 M gates). Nevertheless, the results suggest that designers can generate a variety of practical GF multipliers from given design specifications using the proposed GF-AMG system.
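The remark that NB squaring needs only wiring can be checked on a toy field. The following Python sketch uses GF(2^3) with IP x^3 + x^2 + 1 (our choice, not one of the generated multipliers), whose root β is a normal element, and confirms for all eight elements that squaring in the NB {β, β^2, β^4} equals a cyclic shift of the coordinates. It does not implement a Massey-Omura multiplier.

```python
from itertools import product

M = 3
IP = 0b1101   # x^3 + x^2 + 1, irreducible over GF(2); bit i = coefficient of beta^i


def pb_mul(a, b):
    """Multiply two GF(2^3) elements in polynomial-basis bit form."""
    r = 0
    for i in range(M):
        if (b >> i) & 1:
            r ^= a << i
    for d in range(2 * M - 2, M - 1, -1):   # reduce degrees 4 and 3
        if (r >> d) & 1:
            r ^= IP << (d - M)
    return r


# Normal basis {beta, beta^2, beta^4}; beta is a normal element for this IP.
NB = [0b010]
for _ in range(M - 1):
    NB.append(pb_mul(NB[-1], NB[-1]))


def nb_to_pb(coeffs):
    """Convert NB coordinates (c_0, c_1, c_2) on beta^(2^i) to PB bit form."""
    r = 0
    for c, g in zip(coeffs, NB):
        if c:
            r ^= g
    return r


def nb_square(coeffs):
    """Squaring in an NB: a cyclic shift of the coordinates (pure wiring)."""
    return (coeffs[-1],) + tuple(coeffs[:-1])


for c in product((0, 1), repeat=M):
    a = nb_to_pb(c)
    assert nb_to_pb(nb_square(c)) == pb_mul(a, a)
```

The shift works because squaring is linear in characteristic 2 and maps β^(2^i) to β^(2^(i+1)), with β^(2^3) = β closing the cycle.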
Conclusion
In this paper, we have presented a system named GF-AMG for the automatic generation of GF parallel multipliers that uses a graph-based circuit description called GF-ACG. We first extended the GF-ACG to GF(p^m) (p ≥ 2) arithmetic and R-valued (R ≥ 2) implementations. We then showed the system framework of GF-AMG, in which the generated HDL code is completely verified by a formal verification method. We also evaluated the performance of GF-AMG through the experimental generation of GF(p^m) multipliers. In particular, we demonstrated that the extended GF-ACG allows us to generate GF(p^m) multipliers that can be implemented in multiple-valued logic.
The system described here will be available at our website [15] , as depicted in Fig. 7 . Designers can submit their specification on the request page [ Fig. 7 (a) ] and then receive the generated HDL code from the download page [ Fig. 7 (b) ].
Rei Ueno received the B.E. degree in Information Engineering and the M.S. degree in Information Sciences from Tohoku University, Sendai, Japan, in 2013 and 2015, respectively. He is currently pursuing a doctoral degree at Tohoku University. Since 2016, he has been a research fellow of the Japan Society for the Promotion of Science (JSPS). His research interests include arithmetic circuits, cryptographic implementations, formal verification, and hardware security.
