Abstract: Galois field computations abound in many applications, such as cryptography, error correction codes, and signal processing, among many others. Multiplication usually lies at the core of such Galois field computations and is one of the most complex operations; hardware implementations of such multipliers become very expensive. Therefore, there have been efforts to reduce the design complexity by decomposing the Galois field $GF(2^k)$ as $GF((2^m)^n)$, where $k = m \times n$. Such a decomposition introduces a hierarchical abstraction, lifting the ground field from $GF(2)$ (bit level) to $GF(2^m)$ (word level), and thus simplifies the design. This paper addresses the formal verification problem of such multipliers designed over $GF((2^m)^n)$, using an approach based on computer algebra and algebraic geometry. To prove that the composite field multiplier implementation matches the original specification, we hierarchically formulate the verification problem using Hilbert's Nullstellensatz over Galois fields. A Gröbner basis engine is employed as the underlying computational framework. Experiments are performed with various variable/term orders to demonstrate the efficacy of our approach. We can verify the correctness of up to 1024-bit multipliers, whereas SAT/SMT-based approaches are infeasible.
I. INTRODUCTION
Galois field theory is extensively applied in many applications, such as elliptic curve cryptography, error correction codes, digital signal processing, VLSI testing, etc. Therefore, dedicated hardware and software implementations of Galois field arithmetic abound. Multiplication usually lies at the core of most Galois field applications. The high complexity of multiplication has led researchers to derive efficient and sophisticated implementations over Galois fields. One such effort has been towards the design of composite field multipliers, where the Galois field $GF(2^k)$ is decomposed as $GF((2^m)^n)$ for a non-prime $k = m \cdot n$, and the multiplication is then performed over $GF((2^m)^n)$. The decomposition introduces a hierarchy (modularity) in the design by lifting the ground field from $GF(2)$ (bits) to $GF(2^m)$ (words). This results in impressive area and delay savings for multiplier circuits over large finite fields [1] [2] [3].
This work is sponsored in part by a grant from NSF #CCF-546859.
Custom designs for such complicated architectures introduce the potential for errors/bugs in the implementation. Due to their large size (the datapath size can be 512 bits or larger in many cryptographic schemes), verification of such designs becomes infeasible with contemporary techniques, such as SAT/SMT/BDD-based methods.
This paper addresses the implementation verification of such composite field multipliers over $GF((2^m)^n)$. Our approach requires that the circuit (decomposition) hierarchy be known. This information is available, as the field decomposition is derived from the original specification and the circuit is custom designed using a top-down design methodology, thus preserving the hierarchy.
Problem Statement: We are given a word-level specification $Z = A \cdot B \pmod{P(x)}$ over $GF(2^k)$, a composite field multiplier implementation over $GF((2^m)^n)$, where $k = m \cdot n$, and the circuit hierarchy. Our objective is to prove whether the circuit correctly computes the multiplication $A \cdot B \pmod{P(x)}$ for all values of the inputs $A, B$; otherwise, we have to find a counter-example that excites the bug.
Contributions of this paper:
• We formulate the verification problem using a computer-algebra/algebraic-geometry based approach. In particular, we utilize Hilbert's Nullstellensatz (suitably applied over Galois fields) to verify the circuit implementation. A Gröbner basis engine (SINGULAR [4]) is used as the underlying computational framework.
• Based on the circuit architecture over $GF((2^m)^n)$, we apply a hierarchical verification methodology: first we verify the correctness of the lower-level building blocks (adders and multipliers) over the ground field $GF(2^m)$; subsequently we verify the overall multiplication at the higher level over the composite field $GF((2^m)^n)$.
• It is known that the efficiency of Gröbner basis computations depends on the chosen term ordering. Therefore, we present empirical results for various term orderings, among which the lex order achieves the highest efficiency for our Gröbner basis computations.
• While our approach is efficient in proving the correctness of composite field multipliers for up to 1024-bit circuits ($GF(2^{1024}) = GF((2^{32})^{32})$), it is incapable of identifying bugs in the implementation. In such cases, we incorporate SMT solvers into our framework so that bugs can be efficiently found in flawed designs.
II. PRELIMINARIES
To put our verification problem into perspective, this section briefly reviews concepts about Galois fields, modular multipliers over $GF(2^k)$, and composite field decomposition. Good references for these topics are [5] for Galois field theory, [6] for modular multiplication, and [1] [2] [3] for composite fields and arithmetic circuit design over such fields.
A finite field, also called a Galois field, is a field with a finite number of elements. Finite fields are denoted as $\mathbb{F}_q$ or $GF(q)$, with $q = p^k$; we use these notations interchangeably. All field operations are performed modulo a primitive polynomial $P(x) \in \mathbb{F}_p[x]$ of degree $k$, which is always given a priori, and the coefficients are reduced modulo $p$. For circuit implementations, $p$ is often chosen as 2, which implies that the coefficients of all polynomials are in $\mathbb{F}_2 = \{0, 1\}$.
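We will illustrate multiplication over $GF(2^k)$ through Example II.1.
Example II.1: Let us consider the field $GF(2^4)$. We take as inputs
$$A = a_3x^3 + a_2x^2 + a_1x + a_0, \qquad B = b_3x^3 + b_2x^2 + b_1x + b_0,$$
along with the primitive polynomial $P(x) = x^4 + x^3 + 1$. We have to perform the multiplication $Z(x) = A \times B \pmod{P(x)}$. The coefficients $\{a_0, \ldots, a_3\}$ of $A$ and $\{b_0, \ldots, b_3\}$ of $B$ are in $\mathbb{F}_2 = \{0, 1\}$, so we can perform this multiplication as a carry-free shift-and-add of partial products. In polynomial form, the unreduced product is $S = s_6x^6 + s_5x^5 + \cdots + s_1x + s_0$, where
$$s_0 = a_0b_0, \quad s_1 = a_0b_1 + a_1b_0, \quad s_2 = a_0b_2 + a_1b_1 + a_2b_0, \quad s_3 = a_0b_3 + a_1b_2 + a_2b_1 + a_3b_0,$$
$$s_4 = a_1b_3 + a_2b_2 + a_3b_1, \quad s_5 = a_2b_3 + a_3b_2, \quad s_6 = a_3b_3.$$
Here the multiply "⋅" and add "+" operations are performed modulo 2, so they can be implemented in a circuit using AND and XOR gates. Note that unlike integer multipliers, there are no carry chains in the design, as the coefficients are always reduced modulo $p = 2$. However, the result is yet to be reduced modulo the primitive polynomial $P(x) = x^4 + x^3 + 1$. Using $x^4 \equiv x^3 + 1$, $x^5 \equiv x^3 + x + 1$ and $x^6 \equiv x^3 + x^2 + x + 1 \pmod{P(x)}$, the final output of the circuit, denoted by $Z(x) = z_3x^3 + z_2x^2 + z_1x + z_0$, is:
$$z_0 = s_0 + s_4 + s_5 + s_6, \quad z_1 = s_1 + s_5 + s_6, \quad z_2 = s_2 + s_6, \quad z_3 = s_3 + s_4 + s_5 + s_6.$$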
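The bit-level equations of Example II.1 can be cross-checked against a word-level shift-and-reduce multiplier. The sketch below assumes that elements of $GF(2^4)$ are encoded as 4-bit integers (bit $i$ is the coefficient of $x^i$); `gf_mul` and `bit_level_mul` are hypothetical helper names, not part of any tool used in this paper. Exhaustive comparison like this is only possible for tiny fields, which is precisely why it does not scale to cryptographic word sizes.

```python
# Hypothetical helpers: elements of GF(2^4) are 4-bit ints, bit i = coefficient of x^i.
P = 0b11001  # P(x) = x^4 + x^3 + 1
K = 4

def gf_mul(a: int, b: int) -> int:
    """Word-level product A*B mod P(x): carry-free shift-and-add, then reduce."""
    s = 0
    for i in range(K):
        if (b >> i) & 1:
            s ^= a << i              # addition in F_2[x] is XOR (no carries)
    for d in range(2 * K - 2, K - 1, -1):
        if (s >> d) & 1:             # cancel x^d by adding x^(d-4) * P(x)
            s ^= P << (d - K)
    return s

def bit_level_mul(a: int, b: int) -> int:
    """AND/XOR equations of Example II.1: s_0..s_6, then z_0..z_3."""
    ai = [(a >> i) & 1 for i in range(K)]
    bi = [(b >> i) & 1 for i in range(K)]
    s = [0] * (2 * K - 1)
    for i in range(K):
        for j in range(K):
            s[i + j] ^= ai[i] & bi[j]    # partial product a_i b_j feeds s_{i+j}
    z0 = s[0] ^ s[4] ^ s[5] ^ s[6]
    z1 = s[1] ^ s[5] ^ s[6]
    z2 = s[2] ^ s[6]
    z3 = s[3] ^ s[4] ^ s[5] ^ s[6]
    return z0 | (z1 << 1) | (z2 << 2) | (z3 << 3)

# x * x^3 = x^4 = x^3 + 1
assert gf_mul(0b0010, 0b1000) == 0b1001
# Exhaustive equivalence check over all 256 input pairs (feasible only for tiny k).
assert all(gf_mul(a, b) == bit_level_mul(a, b) for a in range(16) for b in range(16))
```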
For a given prime $p$ and any positive integer $k$, there exists a field of order $p^k$, and it is unique up to isomorphism. This implies that $GF(2^k)$ is isomorphic to $GF((2^m)^n)$ when $k = m \cdot n$, and due to this isomorphism, it is possible to derive one field representation from the other. Construction of composite fields is described in detail in [1] and [3]; here we briefly present the field decomposition concepts over $GF((2^m)^n)$ through Example II.2.
Example II.2:
As an example, let us reconsider the field $GF(2^4)$ and decompose it as $GF((2^2)^2)$, i.e., $m = n = 2$. An element $A \in GF((2^2)^2)$ is then represented as a polynomial of degree less than 2 whose two coefficients $A_0, A_1$ are elements of the ground field $GF(2^2)$. Each operation in Fig. 1 internally represents a 2-bit operation: × represents 2-bit multiplication and + represents 2-bit addition over the ground field $GF(2^2)$. The output $Z$, with ground-field coefficients $Z_0$ and $Z_1$, is the result of the multiplication. In the figure, the element (10) is the bit-vector representation of the primitive root of the ground field.
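To make the decomposition concrete, the sketch below models a multiplier over $GF((2^2)^2)$ in the spirit of Fig. 1. The polynomials behind Fig. 1 are not given in the text, so this is a sketch under stated assumptions: the ground field $GF(2^2)$ is built with $x^2 + x + 1$, the element (10) denotes $x$, and the extension uses the irreducible polynomial $Q(y) = y^2 + y + (10)$. The names `gf4_mul`, `composite_mul` and `OMEGA` are hypothetical.

```python
# Ground field GF(2^2) with p(x) = x^2 + x + 1; elements are 2-bit ints.
def gf4_mul(a: int, b: int) -> int:
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    if r & 0b100:          # reduce x^2 -> x + 1
        r ^= 0b111
    return r

OMEGA = 0b10  # assumed: (10) = x, a primitive root of GF(2^2)

def composite_mul(A, B):
    """Multiply A = A1*y + A0 and B = B1*y + B0 modulo Q(y) = y^2 + y + OMEGA.

    Every '^' is a 2-bit ground-field addition and every gf4_mul call
    is a 2-bit ground-field multiplication.
    """
    A1, A0 = A
    B1, B0 = B
    hi = gf4_mul(A1, B1)                      # coefficient of y^2
    mid = gf4_mul(A1, B0) ^ gf4_mul(A0, B1)   # coefficient of y
    lo = gf4_mul(A0, B0)                      # constant coefficient
    # y^2 = y + OMEGA (characteristic 2), so fold the y^2 term down
    Z1 = mid ^ hi
    Z0 = lo ^ gf4_mul(OMEGA, hi)
    return (Z1, Z0)

ELEMENTS = [(a1, a0) for a1 in range(4) for a0 in range(4)]
ONE = (0, 1)
```

Because $Q(y)$ has no root in $GF(2^2)$, every nonzero element has a multiplicative inverse, so the 16 pairs form a field isomorphic to $GF(2^4)$.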
For our verification problem, we are given the circuit netlist and the hierarchy information of the composite field multiplier. We have to prove or disprove that the circuit implementation over $GF((2^m)^n)$ correctly implements the equivalent computation $Z(x) = A \cdot B \pmod{P(x)}$ over $GF(2^k)$, where $k = m \cdot n$.
III. RELATED PREVIOUS WORK
Contemporary graph-based canonical DAG representations of Boolean functions, such as BDDs [7], OKFDDs [8], BMDs [9], etc., are ill-suited for such modular multiplication applications, particularly over large finite fields. This verification problem is also very hard for SAT solvers, as they have to exhaustively search a large solution space. Contemporary Satisfiability Modulo Theories (SMT) solvers employ a mixture of theories for reasoning; however, none of them employs polynomial equation solving over Galois fields (which is a very hard problem in itself). Therefore, we use the QF_BV (quantifier-free fixed-size bit-vector) theory of SMT solvers to conduct experiments on Galois field multipliers. As shown in Table I, neither SAT nor SMT solvers can prove equivalence beyond 16-bit multipliers.
The work of [10] uses composite field decomposition to verify properties of circuits over Galois fields. Instead of verifying circuits over $GF(2^k)$ directly, [10] verifies the circuits over $GF((2^m)^n)$. Their approach is OKFDD-based and therefore shares the limitations described above; it can verify at most 16-bit multipliers.
The theorem-proving approach of [11] also verifies a Galois field $GF(2^k)$ implementation against a given specification. The authors devise a decision procedure based on polynomial division, variable elimination, term rewriting, etc., and demonstrate a correctness proof of a sub-block of a Reed-Solomon decoder. Their approach requires the correctness criterion to be independent of the field size $2^k$. If this condition is not satisfied, their approach requires a decision procedure over $\mathbb{F}_{2^k}$, which limits its scalability.
Recently, symbolic computer algebra techniques have attracted the attention of the verification community. In [12], [13] and [14], an approach is proposed to determine whether a multivariate polynomial vanishes over the ring $\mathbb{Z}_{2^k}$. However, this approach is effective only when a word-level representation is available, which limits its capability in verifying bit-level implementations. Alternatively, [15] describes a method based on Gröbner basis theory over the ring $\mathbb{Z}_{2^k}$ to verify circuits at the arithmetic bit level (ABL) [16]. That work formulates the verification problem as an equivalent variety subset problem and then conducts a normal form computation. Based on results from [12], [13] and [14], a vanishing polynomial test is then performed on the normal form: if the normal form is a vanishing polynomial, the correctness of the design is proved. In [17], it is further shown that the expensive vanishing polynomial test can be omitted by formulating the problem in a suitable quotient ring.
However, these methods model the verification constraints over the ring $\mathbb{Z}_{2^k}$, as opposed to our problem domain $GF(2^k)$.
IV. VERIFICATION SETUP AND MODELING USING POLYNOMIAL CONSTRAINTS
Based on the circuit architecture over the composite field $GF((2^m)^n)$, our verification setup follows a hierarchical methodology: first we verify the correctness of the lower-level building blocks (adders and multipliers, as shown in Fig. 1) over the ground field $GF(2^m)$; once the correctness of each building block is verified, each block is represented by its word-level specification polynomial, e.g., a ground-field multiplier block by a polynomial of the form $Z = A \cdot B \pmod{P(x)}$ over $GF(2^m)$. Subsequently we verify the overall multiplication at the higher level over $GF((2^m)^n)$. The verification strategy is the same at both levels except for their distinct specifications. Therefore, below we only describe our approach over the ground field $GF(2^m)$. We verify the ground-field adders and multipliers against their specification polynomials $Z = A \cdot B \pmod{P(x)}$ (or $Z = A + B \pmod{P(x)}$) over $GF(2^m)$; thus a miter is constructed to verify each sub-block over the ground field $GF(2^m)$. The equivalence verification setup is shown in Fig. 2. Given a specification and an implementation, we want to prove that, for all possible inputs, the implementation is functionally equivalent to the specification. The specification is given at the word level (in polynomial form) as follows:
$$A = \sum_{i=0}^{m-1} a_i\alpha^i, \qquad B = \sum_{i=0}^{m-1} b_i\alpha^i, \qquad Z_S = A \cdot B \pmod{P(\alpha)},$$
where $a_i, b_i \in \mathbb{F}_2$ are primary inputs, $\alpha$ is a root of $P(x)$, $A, B \in \mathbb{F}_{2^m}$ symbolically represent the word-level inputs, and $Z_S \in \mathbb{F}_{2^m}$ is the symbolic representation of the output of the specification.
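In the circuit implementation, all bitwise operations AND, OR, NOT and XOR can be modeled using polynomial algebra over $\mathbb{F}_2$ ($\subset \mathbb{F}_{2^m}$) by the following mapping, where $a, b \in \mathbb{F}_2$:
$$\neg a \mapsto 1 + a, \qquad a \wedge b \mapsto a \cdot b, \qquad a \vee b \mapsto a + b + a \cdot b, \qquad a \oplus b \mapsto a + b.$$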
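The gate-to-polynomial mapping above can be checked against the Boolean truth tables directly. This is a minimal sketch; the dictionaries `POLY` and `BOOL` are illustrative names only, and all polynomial arithmetic is done on integers reduced modulo 2.

```python
from itertools import product

# Polynomial models of Boolean gates over F_2 (integer arithmetic reduced mod 2).
POLY = {
    "NOT": lambda a: (1 + a) % 2,
    "AND": lambda a, b: (a * b) % 2,
    "OR":  lambda a, b: (a + b + a * b) % 2,
    "XOR": lambda a, b: (a + b) % 2,
}

# Reference Boolean semantics.
BOOL = {
    "NOT": lambda a: int(not a),
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

def mapping_is_exact() -> bool:
    """True if every polynomial model agrees with its gate on all of F_2."""
    ok = all(POLY["NOT"](a) == BOOL["NOT"](a) for a in (0, 1))
    for gate in ("AND", "OR", "XOR"):
        ok &= all(POLY[gate](a, b) == BOOL[gate](a, b)
                  for a, b in product((0, 1), repeat=2))
    return ok
```

Note that OR needs the extra product term $a \cdot b$: without it, $1 + 1$ would evaluate to 0 in $\mathbb{F}_2$.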
Therefore, we can "parse" the implementation into polynomial form. Now we can generate (multivariate) polynomial equations for the entire setup shown in Fig. 2 over $GF(2^m)$ [and similarly over $GF((2^m)^n)$], i.e., the specification polynomials, the implementation polynomials, and the polynomial encoding the inequality ($\neq$) between the specification and implementation outputs.
Subsequently, we can use computer algebra and algebraic geometry to reason whether or not solutions exist to this polynomial system.
V. VERIFICATION AS A NULLSTELLENSATZ PROOF
We describe some results from Computer Algebra and Algebraic Geometry to reason about the solutions (variety) to the polynomial equations (ideal) [18] .
A. Theory Background
Let $\mathbb{F}$ be any field, and let $\mathbb{F}[x_1, \ldots, x_d]$ denote the ring of multivariate polynomials with coefficients in $\mathbb{F}$. Suppose that we are given a set of polynomials $f_1, f_2, \ldots, f_s \in \mathbb{F}[x_1, \ldots, x_d]$ and that we wish to find solutions to the polynomial system $f_1 = f_2 = \cdots = f_s = 0$. The set of polynomials generates an ideal, denoted by $J = \langle f_1, f_2, \ldots, f_s \rangle$. The set of all solutions to a given system of polynomial equations is called the variety, and is denoted by
$$V(J) = V(f_1, \ldots, f_s) = \{(a_1, \ldots, a_d) \in \mathbb{F}^d : f_i(a_1, \ldots, a_d) = 0,\ 1 \le i \le s\}.$$
In general, the variety can be infinite, a finite non-empty set, or empty (no solutions). For our problem, we have to determine whether or not the variety of the polynomial system corresponding to Fig. 2 is empty. Let $\mathbb{F}^d = \{(a_1, \ldots, a_d) : a_1, \ldots, a_d \in \mathbb{F}\}$ denote the $d$-dimensional affine space over $\mathbb{F}$. Any single point is a variety of some polynomial system: e.g., $(a_1, \ldots, a_d)$ is the variety of $x_1 - a_1 = \cdots = x_d - a_d = 0$.
Moreover, finite unions and finite intersections of varieties are also varieties. Let $U = V(f_1, \ldots, f_s)$ and $W = V(g_1, \ldots, g_t)$. Then:
• $U \cap W = V(f_1, \ldots, f_s, g_1, \ldots, g_t)$
• $U \cup W = V(f_i \cdot g_j : 1 \le i \le s,\ 1 \le j \le t)$
For a finite field $\mathbb{F}_q$, the affine space $\mathbb{F}_q^d$ is a finite set of points, so it can be construed as a finite union of (single-point) varieties. Therefore, $\mathbb{F}_q^d$ is itself the variety of some system of polynomial equations. We will use this concept to solve our problem. Note that the variety depends not just on the given system of polynomial equations, but rather on the ideal generated by the polynomials.
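Over a finite field these set operations can be checked by brute force. The sketch below is restricted to $\mathbb{F}_2$ so that all points can be enumerated; the polynomials $x + y$ and $xy + 1$ are our own illustrative choices, and `variety` is a hypothetical helper that treats each polynomial as a Python callable evaluated modulo 2.

```python
from itertools import product

def variety(polys, nvars):
    """Brute-force V(polys) over F_2: points where every polynomial is 0 mod 2."""
    return {pt for pt in product((0, 1), repeat=nvars)
            if all(f(*pt) % 2 == 0 for f in polys)}

f = [lambda x, y: x + y]          # U = V(x + y)
g = [lambda x, y: x * y + 1]      # W = V(xy + 1)
U, W = variety(f, 2), variety(g, 2)

# Intersection: pool both generator sets.
intersection = variety(f + g, 2)
# Union: all pairwise products f_i * g_j (default args bind each pair).
products = [lambda x, y, fi=fi, gj=gj: fi(x, y) * gj(x, y) for fi in f for gj in g]
union = variety(products, 2)

# A single point (1, 0) is the variety of x - 1 = y = 0.
point = variety([lambda x, y: x - 1, lambda x, y: y], 2)
```

The union rule works because a product vanishes at a point exactly when one of its factors does, which holds in any field.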
Any ideal may have many different bases. Consequently, different sets of generators can represent the same ideal and, therefore, the same variety. Furthermore, some generating sets may be "better" than the original, providing a better representation of the ideal and of the corresponding variety. One such basis is the Gröbner basis [19], which has many nice properties that allow one to reason about the variety of an ideal.
To derive a Gröbner basis, one relies on variants of Buchberger's algorithm, precise details of which can be found in [18] [20]. The Gröbner basis computation begins with an initial set of polynomials $F$ as the basis to be reduced. A monomial ordering is fixed to ensure that polynomials are manipulated in a consistent manner. Buchberger's algorithm then takes pairs of polynomials in the basis and combines them into "S-polynomials" to cancel leading terms. The S-polynomial of $f$ and $g$ is defined as:
$$Spoly(f, g) = \frac{L}{lt(f)} \cdot f - \frac{L}{lt(g)} \cdot g,$$
where $L = LCM(lm(f), lm(g))$, $lm(f)$ is the leading monomial (leading power product) of $f$, and $lt(f)$ denotes the leading term of $f$. The S-polynomial is then reduced by all elements of the current basis $G$ to a remainder $r$, denoted as $Spoly(f, g) \xrightarrow{G}_{+} r$. Multivariate polynomial division is used for this reduction step. If $r \neq 0$, it is added to the basis. This process is repeated for all unique pairs of polynomials, including those created by newly added elements, until no new polynomials are generated, ultimately constructing the Gröbner basis.
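As a small worked instance of the S-polynomial step (our own illustration, not taken from the paper's examples), take $f = xy + 1$ and $g = x^2 + y$ in $\mathbb{F}[x, y]$ under lex order with $x > y$, so that $lm(f) = xy$ and $lm(g) = x^2$:

$$L = LCM(xy, x^2) = x^2y,$$
$$Spoly(f, g) = \frac{x^2y}{xy}(xy + 1) - \frac{x^2y}{x^2}(x^2 + y) = x^2y + x - x^2y - y^2 = x - y^2.$$

Neither monomial of $x - y^2$ is divisible by $xy$ or $x^2$, so the remainder $r = x - y^2$ is nonzero and Buchberger's algorithm adds it to the basis.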
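Therefore, given an ideal $J \subseteq \mathbb{F}[x_1, \ldots, x_d]$, the Weak Nullstellensatz states that, over an algebraically closed field $\overline{\mathbb{F}}$, $V_{\overline{\mathbb{F}}}(J) = \emptyset$ if and only if $1 \in J$, i.e., if and only if the reduced Gröbner basis of $J$ is $\{1\}$.
Example V.1: Suppose we are given a specification of a circuit over $\mathbb{F}_2$:
And the corresponding implementation:
The inequality $Z_1 \neq Z_2$ is formulated as:
$$t \cdot (Z_1 + Z_2) + 1 = 0,$$
where $t$ is an auxiliary variable. $Z_1$ and $Z_2$ are symbolically different but computationally equivalent over $\mathbb{F}_2$. So, according to the Weak Nullstellensatz, the Gröbner basis of the above polynomials is expected to be $\{1\}$, whereas the computed Gröbner basis is not $\{1\}$. The reason is that the Weak Nullstellensatz characterizes solutions over the algebraic closure $\overline{\mathbb{F}}_q$, where the miter may be satisfiable even though it has no solution over $\mathbb{F}_q$ itself. So, to restrict the variety (solutions) to $\mathbb{F}_q^d$, we can intersect $V_{\overline{\mathbb{F}}_q}(J)$ with $\mathbb{F}_q^d$. In other words:
$$V_{\mathbb{F}_q}(J) = V_{\overline{\mathbb{F}}_q}(J) \cap \mathbb{F}_q^d.$$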
Recall that $\mathbb{F}_q^d$ is a finite set of points, and therefore it is the variety of some ideal $J_0$. We now have to identify this ideal $J_0$. Remarkably, this ideal can be easily described using the "vanishing polynomials" of the Galois field [5]: in any Galois field $\mathbb{F}_q$, $x^q = x$ for all $x \in \mathbb{F}_q$. Therefore $x^q - x = 0$ for all $x \in \mathbb{F}_q$. We will call the polynomials $x^q - x$ the "vanishing polynomials" of a Galois field. Further, we will denote by $J_0$ the "ideal of vanishing polynomials", where $J_0 = \langle x_1^q - x_1, \ldots, x_d^q - x_d \rangle$.
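The vanishing-polynomial property can be confirmed numerically for the field of Example II.1. This minimal sketch reuses the assumed 4-bit encoding of $GF(2^4)$ with $P(x) = x^4 + x^3 + 1$; `gf_mul` and `gf_pow` are hypothetical helper names.

```python
P, K = 0b11001, 4   # GF(2^4) with P(x) = x^4 + x^3 + 1
Q = 1 << K          # field size q = 16

def gf_mul(a: int, b: int) -> int:
    """Product in GF(2^4): shift-and-add, then reduce modulo P(x)."""
    r = 0
    for i in range(K):
        if (b >> i) & 1:
            r ^= a << i
    for d in range(2 * K - 2, K - 1, -1):
        if (r >> d) & 1:
            r ^= P << (d - K)
    return r

def gf_pow(a: int, e: int) -> int:
    """Square-and-multiply exponentiation in GF(2^4)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

# x^q - x vanishes on every element of GF(16) ...
vanishes_everywhere = all(gf_pow(x, Q) == x for x in range(Q))
# ... while x^2 - x vanishes only on the subfield GF(2) = {0, 1}.
roots_of_x2_minus_x = [x for x in range(Q) if gf_pow(x, 2) == x]
```

The same check with exponent 4 picks out exactly the four elements of the subfield $GF(2^2) \subset GF(2^4)$.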
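Putting it all together:
Theorem V.2: Let $J = \langle f_1, \ldots, f_s \rangle \subseteq \mathbb{F}_q[x_1, \ldots, x_d]$ and $J_0 = \langle x_1^q - x_1, \ldots, x_d^q - x_d \rangle$. Then
$$V_{\mathbb{F}_q}(J) = V_{\overline{\mathbb{F}}_q}(J) \cap V_{\overline{\mathbb{F}}_q}(J_0) = V_{\overline{\mathbb{F}}_q}(J + J_0),$$
and therefore $V_{\mathbb{F}_q}(J) = \emptyset$ if and only if $1 \in GB(J, J_0)$, the Gröbner basis of $J + J_0$.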
Therefore, to find whether or not the variety of an ideal $J$ is empty over the given Galois field $\mathbb{F}_q$, we can compute the Gröbner basis of $\langle J, J_0 \rangle$ (i.e., $GB(J, J_0)$) and test whether $1 \in GB(J, J_0)$. If indeed $1 \in GB(J, J_0)$, then the circuit implementation is bug-free (the miter is infeasible). Otherwise, there is definitely a bug in the circuit over the given field $\mathbb{F}_q$. Now let us revisit Example V.1: the vanishing polynomials $x^2 - x$, one for each variable $x$ of the example (the last being $Z_2^2 - Z_2$), are appended to the original ideal, and the resulting Gröbner basis is computed as $\{1\}$, which is consistent with Theorem V.2.
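The effect of appending vanishing polynomials can be reproduced with a toy Gröbner basis engine. The sketch below is a minimal, unoptimized Buchberger loop over $\mathbb{F}_2$ with lex order; it is not SINGULAR and uses no pair-elimination criteria. Because the polynomials of Example V.1 are not reproduced in the text, it uses a hypothetical miter of the same character: a specification $Z_1 = a$ and an implementation $Z_2 = a^2$, which differ symbolically but agree on $\mathbb{F}_2$, joined by $t(Z_1 + Z_2) + 1$.

```python
from itertools import combinations

# Toy Buchberger over GF(2), lex order. A monomial is an exponent tuple
# (variable 0 is the largest); a polynomial is a frozenset of monomials,
# since every nonzero GF(2) coefficient is 1 and "+" is symmetric difference.

def lead(p):
    return max(p)  # Python tuple comparison = lex order

def divides(m, n):
    return all(a <= b for a, b in zip(m, n))

def quotient(n, m):
    """Monomial n / m (assumes m divides n)."""
    return tuple(b - a for a, b in zip(m, n))

def times(p, m):
    return frozenset(tuple(a + b for a, b in zip(t, m)) for t in p)

def spoly(f, g):
    lf, lg = lead(f), lead(g)
    lcm = tuple(max(a, b) for a, b in zip(lf, lg))
    return times(f, quotient(lcm, lf)) ^ times(g, quotient(lcm, lg))

def reduce(p, G):
    """Multivariate division of p by G; returns the remainder."""
    p, r = set(p), set()
    while p:
        m = lead(p)
        for g in G:
            if divides(lead(g), m):
                p ^= times(g, quotient(m, lead(g)))  # cancels m
                break
        else:
            p.remove(m)  # no leading monomial divides m: move it to the remainder
            r.add(m)
    return frozenset(r)

def buchberger(F):
    G = [frozenset(f) for f in F if f]
    pairs = list(combinations(range(len(G)), 2))
    while pairs:
        i, j = pairs.pop()
        r = reduce(spoly(G[i], G[j]), G)
        if r:  # nonzero remainder: schedule its pairs, then add it
            pairs.extend((k, len(G)) for k in range(len(G)))
            G.append(r)
    return G

# Hypothetical miter, variables ordered t > z1 > z2 > a.
def mono(t=0, z1=0, z2=0, a=0):
    return (t, z1, z2, a)

ONE = mono()
miter = [
    frozenset({mono(z1=1), mono(a=1)}),                   # spec:  z1 + a
    frozenset({mono(z2=1), mono(a=2)}),                   # impl:  z2 + a^2
    frozenset({mono(t=1, z1=1), mono(t=1, z2=1), ONE}),   # t*(z1 + z2) + 1
]
vanishing = [frozenset({mono(**{v: 2}), mono(**{v: 1})}) for v in ("t", "z1", "z2", "a")]

def proves_equivalence(G):
    """1 is in the Groebner basis, i.e. the miter has no solution."""
    return frozenset({ONE}) in G
```

Here `proves_equivalence(buchberger(miter))` is False, because any $a \notin \{0, 1\}$ in the algebraic closure satisfies the miter, while `proves_equivalence(buchberger(miter + vanishing))` is True, mirroring Theorem V.2.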
Unfortunately, the vanishing polynomials $x^q - x$ introduce a practical problem. Most computer algebra tools (Gröbner basis engines) have a restriction on the largest degree of variables in the system. For our work, we use the SINGULAR [4] tool, which requires the degree of a variable ($q$ in $x^q$) to be $< 2^{16}$. In cryptography, one encounters very large fields where $q = 2^{256}$ or higher. So we need to find alternate ways to account for the high-degree polynomials $x^q - x$ in $J_0$. To address this practical issue, we derive the following result.
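Theorem V.3: Let $A \in GF((2^m)^n)$ be represented as $A = \sum_{i=0}^{n-1} a_i\beta^i$, where $a_i \in GF(2^m)$, $\beta \in GF(2^k)$, and $q = 2^k = (2^m)^n$. Then
$$A^q - A = \sum_{i=0}^{n-1}\left(a_i^{2^m} - a_i\right)\beta^i = 0.$$
Proof: Since $a_i \in GF(2^m)$ and $GF(2^m) \subset GF(2^k)$, we have $a_i \in GF(2^k)$. Also, in a field of characteristic 2 we have the following result [5]:
$$(x + y)^{2^j} = x^{2^j} + y^{2^j} \quad \text{for all } j \ge 0.$$
Subsequently, we have:
$$A^q = \left(\sum_{i=0}^{n-1} a_i\beta^i\right)^q = \sum_{i=0}^{n-1} a_i^q\,\beta^{iq}.$$
Now we have $\beta^q = \beta$ and $a_i^q = a_i$, since both are elements of $GF(2^k)$, so that
$$A^q - A = \sum_{i=0}^{n-1}\left(a_i^q - a_i\right)\beta^i = 0.$$
Together with $a_i^{2^m} - a_i = 0$ for every $a_i \in GF(2^m)$, we conclude:
$$A^q - A = \sum_{i=0}^{n-1}\left(a_i^{2^m} - a_i\right)\beta^i = 0. \qquad \blacksquare$$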
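Theorem V.3 can be sanity-checked numerically on the small composite field $GF((2^2)^2)$, where $q = 16$ and $2^m = 4$. The sketch assumes the same hypothetical construction as before (ground field $GF(2^2)$ with $x^2 + x + 1$, extension by $Q(y) = y^2 + y + x$); it only illustrates that the degree-16 word-level constraint and the degree-4 ground-field constraints hold on the same elements, not how SINGULAR handles them.

```python
# Composite field GF((2^2)^2): ground field GF(4) with x^2 + x + 1,
# extension by the (assumed) irreducible Q(y) = y^2 + y + W, W = x.
def gf4_mul(a: int, b: int) -> int:
    r = (a if b & 1 else 0) ^ ((a << 1) if b & 2 else 0)
    return r ^ 0b111 if r & 0b100 else r   # reduce x^2 -> x + 1

W = 0b10

def cmul(A, B):
    """(a1*y + a0)(b1*y + b0) mod Q(y), using y^2 = y + W."""
    (a1, a0), (b1, b0) = A, B
    hi = gf4_mul(a1, b1)
    return (gf4_mul(a1, b0) ^ gf4_mul(a0, b1) ^ hi,
            gf4_mul(a0, b0) ^ gf4_mul(W, hi))

def cpow(A, e):
    """A^e in the composite field, by repeated multiplication."""
    R = (0, 1)
    for _ in range(e):
        R = cmul(R, A)
    return R

def gpow(a, e):
    """a^e in the ground field GF(4)."""
    r = 1
    for _ in range(e):
        r = gf4_mul(r, a)
    return r

ELEMENTS = [(a1, a0) for a1 in range(4) for a0 in range(4)]
word_level_ok = all(cpow(A, 16) == A for A in ELEMENTS)                       # A^q = A
ground_level_ok = all(gpow(a1, 4) == a1 and gpow(a0, 4) == a0 for a1, a0 in ELEMENTS)  # a^(2^m) = a
```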
From Theorem V.3, the constraints corresponding to vanishing polynomials of large degree can be equivalently replaced by vanishing expressions whose variables come from the ground field. This has allowed us to verify circuits over very large 1024-bit fields: the vanishing polynomial over $\mathbb{F}_{2^{1024}}$, which cannot be expressed in most computer algebra systems, can be replaced by $\sum_{i=0}^{63}(a_i^{2^m} - a_i) \cdot \beta^i$, where $a_0, \ldots, a_{63}$ are the ground-field coefficients of the word and $\beta^i$ are the corresponding basis elements. Therefore, the restriction on the largest degree of variables is eliminated. Together with Theorem V.3, Theorem V.2 now provides a solution to our verification instance. We extract the implementation equations from the circuit, and we are given the symbolic polynomial for the specification. Next, a miter is created by combining the implementation and the specification constraints. Subsequently, we derive a polynomial system for our verification miter, which corresponds to the ideal $J$. We append the corresponding vanishing polynomials $J_0$ over the ground field $\mathbb{F}_{q'}$, where $q'$ is the size of the ground field. The Gröbner basis $GB(J, J_0)$ is then computed. If $1 \in GB(J, J_0)$, then there is no solution within the Galois field $\mathbb{F}_q$, which implies that the circuit implementation matches the specification. Otherwise, $V(J, J_0) \neq \emptyset$ in $\mathbb{F}_q$, and there is definitely at least one bug in the circuit implementation.
Note that this approach can only detect the presence or absence of bugs. In the presence of bugs, this approach cannot provide a counter-example that excites the bug. However, our experiments show that in the presence of bugs, SMT solvers can easily provide the counter-example.
VI. EXPERIMENTAL RESULTS
Using the concepts presented in the previous sections, particularly Theorem V.2 and Theorem V.3, we have conducted experiments to hierarchically verify Mastrovito multiplier [6] implementations against the specification $Z(x) = A \cdot B \pmod{P(x)}$. Our experiments are conducted on a desktop with a 2.40 GHz CPU and 2 GB of memory running 64-bit Linux.
A. Evaluation of SAT/SMT/BDD
To evaluate different SAT/SMT/BDD methods, we encode our multiplier implementations into different formats: CNF for SAT solvers, SMT-LIB for SMT solvers, and BLIF for BDDs. Specifically, we use the theory of fixed-size bit-vectors (QF_BV) to model the circuit constraints for SMT solvers. The results in Table I show that BDDs and SAT/SMT solvers cannot prove correctness beyond 16-bit multipliers. The time-out limit is set to 10 hours.
B. Low level verification
The time-out limit for verification of the designs over $GF(2^m)$ is set to 1 hour. Experiments for low-level multiplier designs over $GF(2^m)$ are shown in Table II. Adder verification is trivially done in less than 1 second, even for 32-bit circuits. Compared to the SAT/SMT/BDD-based approaches shown in Table I, our approach achieves at least an order of magnitude runtime improvement. Note that in our proposed method, the Mastrovito multiplier implementation is verified against a word-level (polynomial) specification. However, when we use SAT and BDDs for verification, we have to bit-blast the specification $A(x) \cdot B(x) \pmod{P(x)}$ to the bit level. For SMT solvers, the specification is represented as bit-vector expressions. Therefore, our method has an inherent advantage in that it maintains the high-level abstraction whenever possible.
Impact of Variable and Term Orderings:
The efficiency of Gröbner basis engines depends significantly on the variable and term orderings used to systematically manipulate the polynomials. We have experimented with lexicographic (lex), degree-lexicographic (deglex) and degree-reverse-lexicographic (degrevlex) term orderings for verification. Results are shown in Table II for the lp (lex), dp (degrevlex) and Dp (deglex) term orders. We partition the variables into three categories: primary inputs (PI), intermediate variables (IM) and primary outputs (PO). It can be observed that the efficiency of our approach depends heavily on the variable order as well as on the term ordering. For example, for 32-bit multiplier verification, the runtime varies from 545.28 s to more than 3600 s. The empirically best variable order found is "PI > IM > PO", as shown in Table II. For this variable order, the lex term order outperforms all other term orderings, and we can verify 32-bit multipliers in less than 10 minutes.
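The three term orderings differ only in how they rank monomials. The sketch below expresses each as a Python sort key on exponent tuples, where tuple position 0 is the largest variable (so a "PI > IM > PO" variable order would simply place primary-input variables first). The correspondence to SINGULAR's lp, Dp and dp follows the text; the key functions themselves are illustrative, not SINGULAR code.

```python
# Monomials are exponent tuples; index 0 is the largest variable.

def lex_key(m):
    """lex (SINGULAR lp): compare exponents variable by variable."""
    return m

def deglex_key(m):
    """deglex (SINGULAR Dp): total degree first, ties broken by lex."""
    return (sum(m), m)

def degrevlex_key(m):
    """degrevlex (SINGULAR dp): total degree first; on a tie, the monomial
    with the SMALLER exponent in the last differing variable is larger."""
    return (sum(m), tuple(-e for e in reversed(m)))

# x*y^5*z^2, x^4*y*z^3, x, y^2 with x > y > z
monos = [(1, 5, 2), (4, 1, 3), (1, 0, 0), (0, 2, 0)]
for name, key in [("lp", lex_key), ("Dp", deglex_key), ("dp", degrevlex_key)]:
    print(name, sorted(monos, key=key, reverse=True))
```

Under lex, $x$ outranks $y^2$ regardless of degree, whereas both degree-based orders rank $y^2$ above $x$; and $xy^5z^2$ versus $x^4yz^3$ (equal degree) is ordered differently by deglex and degrevlex.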
C. High level verification
The time-out limit for verification of high-level designs is set to 24 hours. Table III shows the runtime of high-level design verification over $GF((2^m)^n)$ for varying word size $k = m \cdot n$. Other contemporary approaches, including BDD/SMT/SAT-based methods, fail to verify even 32-bit designs, while our approach can verify such designs of up to 1024 bits over $GF((2^m)^n)$. Based on the empirical results of low-level verification, we choose the lex term ordering with the variable order "PI > IM > PO" to compute the Gröbner basis at the high level.
D. Incorporation of SMT Solvers
Our method becomes infeasible when there is a bug in the design. The reason is that, in our approach, the Gröbner basis computation stops as soon as it detects $1 \in GB(J, J_0)$. On the other hand, if $1 \notin GB(J, J_0)$, a large number of polynomials is generated by the Gröbner basis engine, which often results in memory explosion.
We introduced a bug into the design by arbitrarily swapping a wire (variable) $x_i$ with $x_j$, for some $i \neq j$. In such cases, $1 \notin GB(J, J_0)$, and the Gröbner basis computation runs for 5 hours without terminating until memory is exhausted, even for an 8-bit multiplier. These bugs are also hard to find for SAT/BDD-based techniques, which can only find bugs in circuits of up to 16 bits. Fortunately, SMT solvers can find bugs quickly. We introduce 5 different bugs into both high-level and low-level designs and conduct verification using SMT solvers [21]; the results are shown in Table IV, where the reported time is the average over 5 different experimental runs. Of these solvers, Yices outperforms all the others. The time-out limit here is set to 2 hours.
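The bug-injection experiment can be mimicked at toy scale. The sketch below swaps two partial-product (AND-gate) wires in a bit-level $GF(2^4)$ multiplier and searches the miter for a counter-example; the exhaustive loop stands in for an SMT solver and is only feasible for tiny fields. All names (`spec_mul`, `impl_mul`, `find_counterexample`) and the choice of swapped wires are hypothetical.

```python
from itertools import product

P, K = 0b11001, 4  # GF(2^4), P(x) = x^4 + x^3 + 1

def spec_mul(a, b):
    """Word-level specification A*B mod P(x)."""
    r = 0
    for i in range(K):
        if (b >> i) & 1:
            r ^= a << i
    for d in range(2 * K - 2, K - 1, -1):
        if (r >> d) & 1:
            r ^= P << (d - K)
    return r

def impl_mul(a, b, swap=None):
    """Bit-level netlist; `swap` (an injected bug) exchanges two AND-gate wires."""
    pp = {(i, j): ((a >> i) & 1) & ((b >> j) & 1) for i in range(K) for j in range(K)}
    if swap:
        u, v = swap
        pp[u], pp[v] = pp[v], pp[u]
    s = [0] * (2 * K - 1)
    for (i, j), bit in pp.items():
        s[i + j] ^= bit                 # partial product a_i b_j feeds s_{i+j}
    z = [s[0] ^ s[4] ^ s[5] ^ s[6], s[1] ^ s[5] ^ s[6], s[2] ^ s[6], s[3] ^ s[4] ^ s[5] ^ s[6]]
    return sum(bit << i for i, bit in enumerate(z))

def find_counterexample(swap=None):
    """Exhaustive miter search: first (A, B) where implementation and spec differ."""
    for a, b in product(range(1 << K), repeat=2):
        if impl_mul(a, b, swap) != spec_mul(a, b):
            return (a, b)
    return None
```

Not every swap is a bug: exchanging $a_0b_1$ with $a_1b_0$ leaves the circuit correct because both wires feed the same XOR, whereas exchanging $a_0b_0$ with $a_0b_1$ is already exposed by the inputs $A = B = 1$.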
VII. CONCLUSIONS
This paper has targeted the implementation verification of hierarchically designed composite Galois field multipliers. Decomposing the Galois field $GF(2^k)$ as $GF((2^m)^n)$ introduces a hierarchical abstraction. We formulate the verification problem as a Nullstellensatz proof, using a Gröbner basis engine. First we verify the low-level adders and multipliers over $GF(2^m)$, and then we verify the high-level interconnections between these blocks over $GF((2^m)^n)$. Using our approach, we can verify the correctness of up to 1024-bit multipliers over $GF((2^{32})^{32})$, where contemporary SAT/SMT-based techniques fail. Moreover, we present empirical results on efficient variable orders, with which we achieve high efficiency for Gröbner basis computations. However, our approach is infeasible for finding bugs, whereas SMT solvers fare very well at bug-catching. We are now investigating how to combine Gröbner basis and SMT-based approaches for more robust verification.
All times are given in seconds. MO = memory out (2 GB).
