Doctor of Philosophy by Pruss, Tim
WORD-LEVEL ABSTRACTION FROM COMBINATIONAL
CIRCUITS USING ALGEBRAIC GEOMETRY
by
Tim Pruss
A dissertation submitted to the faculty of
The University of Utah
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Department of Electrical and Computer Engineering
The University of Utah
August 2015
Copyright © Tim Pruss 2015
All Rights Reserved
THE UNIVERSITY OF UTAH GRADUATE SCHOOL
STATEMENT OF DISSERTATION APPROVAL
The dissertation of Tim Pruss
has been approved by the following supervisory committee members:
Priyank Kalla, Chair (Date Approved: 5/11/15)
Ganesh Gopalakrishnan, Member (Date Approved: 5/11/15)
Chris J. Myers, Member (Date Approved: 5/11/15)
Kenneth Stevens, Member (Date Approved: 5/11/15)
Rongrong Chen, Member (Date Approved: 5/11/15)
and by Gianluca Lazzi, Chair/Dean of
the Department/College/School of Electrical and Computer Engineering
and by David Kieda, Dean of The Graduate School.
ABSTRACT
Abstraction plays an important role in digital design, analysis, and verification, as
it allows for the refinement of functions through different levels of conceptualization.
This dissertation introduces a new method to compute a symbolic, canonical, word-level
abstraction of the function implemented by a combinational logic circuit. This abstraction
provides a representation of the function as a polynomial $Z = \mathcal{F}(A)$ over the Galois field
$\mathbb{F}_{2^k}$, expressed over the k-bit input to the circuit, A. This representation is easily utilized
for formal verification (equivalence checking) of combinational circuits.
The approach to abstraction is based upon concepts from commutative algebra and
algebraic geometry, notably the theory of Gröbner bases. It is shown that the polynomial
$\mathcal{F}(A)$ can be derived by computing a Gröbner basis of the polynomials corresponding to
the circuit, using a specific elimination term order based on the circuit's topology. However,
computing Gröbner bases using elimination term orders is infeasible for large circuits.
To overcome these limitations, this work introduces an efficient symbolic computation to
derive the word-level polynomial. The presented algorithms exploit i) the structure of the
circuit, ii) the properties of Gröbner bases, iii) characteristics of Galois fields $\mathbb{F}_{2^k}$, and iv)
modern algorithms from symbolic computation.
A custom abstraction tool is designed to efficiently implement the abstraction pro-
cedure. While the concept is applicable to any arbitrary combinational logic circuit, it
is particularly powerful in verification and equivalence checking of hierarchical, custom-
designed and structurally dissimilar Galois field arithmetic circuits. In most applications,
the field size and the datapath size k in the circuits are very large, up to 1024 bits. The
proposed abstraction procedure can exploit the hierarchy of the given Galois field arithmetic
circuits. Our experiments show that, using this approach, our tool can abstract and verify
Galois field arithmetic circuits up to 1024 bits in size. Contemporary techniques fail to
verify these types of circuits beyond 163 bits and cannot abstract a canonical representation
beyond 32 bits.
Dedicated to all of my family, friends, instructors, and everyone who has encouraged me
and helped me get to where I am today. You’ve all played a monumental role in my life.
Thank you to the professors who encouraged me to pursue my PhD. Special thanks to
Priyank; it’s been great working alongside you.
CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
CHAPTERS
1. INTRODUCTION
1.1 Hardware Design and Verification Overview
1.2 Formal Verification
1.3 Importance of Word-level Abstraction
1.4 Dissertation Objective, Motivation, and Contributions
1.4.1 Motivating Application
1.4.2 Dissertation Contributions
1.5 Dissertation Organization
2. PREVIOUS WORK
2.1 Canonical Decision Diagrams
2.2 Word-Level Techniques in RTL Synthesis and Verification
2.3 Combinational Equivalence Checking
2.4 Verification of Galois Field Circuits
2.5 Verification of Integer Arithmetic Circuits Using Gröbner Bases
2.6 Polynomial Interpolation in Symbolic Computation
2.7 Concluding Remarks
3. GALOIS FIELDS PRELIMINARIES AND APPLICATION IN HARDWARE DESIGN
3.1 Rings, Fields, and Polynomials
3.2 Galois Fields
3.2.1 Containment of Galois Fields
3.2.2 Polynomial Interpolation Over Galois Fields
3.3 Hardware Implementations of Arithmetic Operations Over Galois Fields
3.3.1 Montgomery Multipliers
3.3.2 Circuit Designs Over Composite Fields
3.3.3 Applications to Elliptic Curve Cryptography
4. COMPUTER ALGEBRA FUNDAMENTALS
4.1 Monomials, Polynomials, and Term Orderings
4.2 Varieties and Ideals
4.3 Gröbner Bases
4.4 Elimination Theory
4.5 Hilbert’s Nullstellensatz
4.6 Concluding Remarks
5. WORD-LEVEL ABSTRACTION OF COMBINATIONAL CIRCUITS
5.1 Problem Statement
5.2 Circuit Polynomial Modeling
5.3 Abstraction Formulation
5.4 Experimental Results: Validation of the Approach
5.5 Conclusions
6. OVERCOMING GRÖBNER BASIS COMPLEXITY FOR ABSTRACTION
6.1 Improving the Abstraction Approach
6.2 Improving Polynomial Division Using F4-style Reduction
6.3 Reducing Bit-Level Inputs
6.3.1 Symbolically Computing the Bit-Level Mapping
6.4 Overall Approach
6.5 Complexity Analysis
6.6 Conclusions
7. GENERALIZING THE APPROACH TO ARBITRARY COMBINATIONAL CIRCUITS
7.1 Circuits with Varying Input and Output Sizes
7.2 Composite Field Arithmetic Circuits
7.2.1 Design of Composite Field Multipliers
7.2.2 Abstraction of Composite Field Multipliers
7.3 Conclusion
8. IMPLEMENTATION OF THE CUSTOM ABSTRACTION SOFTWARE AND EXPERIMENTAL RESULTS
8.1 Data Structures and Algorithms
8.1.1 Galois Field Elements
8.1.2 Rings and Monomials Over Galois Fields
8.1.3 Polynomials and Polynomial Division
8.2 Abstraction Tool Flow and Results
8.3 Limitations of the Abstraction Approach
8.4 Conclusions
9. CONCLUSIONS AND FUTURE WORK
9.1 Future Work
9.1.1 Hardware Acceleration
9.1.2 Integration with CAD Tools
9.1.3 Polynomial Reductions using Data-Structures
9.1.4 Application to Sequential Circuit Verification
9.1.5 Application to Formal Software Verification
9.1.6 Application to Integer Arithmetic Circuits
APPENDIX: REPRESENTATIONS OF BASE FIELD ELEMENTS OVER EXTENSION FIELDS
REFERENCES
LIST OF FIGURES
1.1 Typical hardware design flow
1.2 Equivalence checking as applied to the hardware design flow
1.3 A miter of two circuits
1.4 Circuit with k-bit input A and k-bit output Z. Abstraction to be derived as $Z = \mathcal{F}(A)$
2.1 The black-box or the algebraic circuit representation
3.1 Containment of fields: $\mathbb{F}_2 \subset \mathbb{F}_4 \subset \mathbb{F}_{16}$
3.2 4-bit adder over $\mathbb{F}_{2^4}$
3.3 Mastrovito multiplier over $\mathbb{F}_{2^4}$
3.4 Montgomery multiplier over $\mathbb{F}_{2^k}$
3.5 4-bit composite multiplier designed over $\mathbb{F}_{(2^2)^2}$
3.6 Point addition over an elliptic curve (R = P + Q)
5.1 Derive the abstraction $Z = \mathcal{F}(A)$
5.2 A 2-bit multiplier over $\mathbb{F}_{2^2}$
6.1 A buggy 2-bit multiplier over $\mathbb{F}_{2^2}$
6.2 F4-style polynomial reduction on a matrix for Example 6.5
7.1 Circuit with varying word sizes
7.2 4-bit composite multiplier designed over $\mathbb{F}_{(2^2)^2}$
8.1 Object structure of a Galois field element
8.2 Logic comparisons between similarly structured circuits
LIST OF TABLES
3.1 Bit-vector, exponential and polynomial representation of elements in $\mathbb{F}_{2^4} = \mathbb{F}_2[x] \pmod{x^4 + x^3 + 1}$
5.1 Run-time of Gröbner basis computation of Mastrovito multipliers in SINGULAR using abstraction term order >
8.1 Steps to derive the inverse of $\alpha^6 + \alpha^4 + \alpha + 1$
8.2 Abstraction of Mastrovito multipliers
8.3 Abstraction of flat Montgomery multipliers
8.4 Abstraction of Montgomery blocks
8.5 Abstraction of bug-free Mastrovito multipliers over $\mathbb{F}_{(2^m)^n}$
8.6 Statistics of designs over $\mathbb{F}_{(2^m)^n}$
8.7 Bug-catching between a golden-model Mastrovito and buggy Montgomery circuit
8.8 Equivalence checking between a golden-model Mastrovito and a bug-free Montgomery circuit
CHAPTER 1
INTRODUCTION
There is an ever-increasing need for secure communication within information tech-
nology. Security of sensitive information relies more and more heavily on encryption
methodologies implemented in hardware by cryptographic circuits. One of the most prominent
of these methodologies is Elliptic Curve Cryptography (ECC), which provides more
strength per encryption bit than other encryption methodologies. The main building blocks
of ECC hardware implementations are fast, custom-built Galois field arithmetic circuits.
These circuits are notoriously hard to verify, yet their correctness is vitally important in
critical applications. In [1], for example, it is shown that a bug in the hardware could
lead to the full leakage of the secret cryptographic key, which could compromise the entire
system. Thus, formal verification is imperative in Electronic Design Automation (EDA)
when dealing with cryptographic circuits.
To facilitate this verification, it is highly desirable to obtain a word-level representation
of the datapath of the ECC arithmetic block from its bit-level implementation. Ideally, this
abstraction should be canonical, as this allows it to be directly applicable to equivalence
checking. Such a canonical, word-level abstraction of the Galois field arithmetic block
would not only make it easier to verify and reason about the cryptographic system as a
whole, but also enable the use of higher level abstraction and synthesis tools. As arithmetic
circuits are custom-designed, often modularly, using Galois field arithmetic blocks, the
abstraction should also exploit the hierarchical nature of the circuitry. Due to the modular
circuit structure, abstraction of each arithmetic block becomes the key in verification of the
full circuit. Practical applications of ECC dictate a datapath of at least 163 bits, and up
to 571 bits, as designated by the National Institute of Standards and Technology (NIST).
However, abstraction of Galois field arithmetic circuits has been infeasible for datapaths
beyond 16 bits.
This dissertation proposes an algebraic-geometry-based approach to abstract canonical,
word-level representations of bit-level Galois field arithmetic circuits. The approach is
able to abstract representations for circuits up to 571 bits in size, which is the largest NIST
standard for datapath size in ECC. Verification of circuits for which this abstraction has
been computed is shown to be trivial; thus, the focus is on deriving the abstraction quickly
and efficiently.
1.1 Hardware Design and Verification Overview
The typical design flow of a hardware system, as shown in Fig. 1.1, starts with a
hardware system specification, which describes the necessary functions and parameters
that the system must perform and adhere to. The specification is typically modeled using
a transaction-level model (TLM), which describes communication details between large
circuit modules. The TLM is then translated into a register-transfer-level (RTL) description,
which is composed of abstracted, interconnected circuit blocks that compose the entire
system. RTL is typically implemented in hardware description languages (HDL) such as
Verilog and VHDL, which are the most popular choices in the industry. Next, the RTL
is optimized and converted into a netlist, i.e., a large collection of small physical blocks
(MOSFET, Boolean logic gates, etc.) and the inter-connections (wires) between them.
Lastly, the netlist is further optimized and then mapped onto a physical space on a chip,
which is then sent off for fabrication. This entire design flow is automated by computer-
aided design (CAD) tools.
When moving from one abstraction level of the hardware design process to the next, an
important issue arises: how can one ensure that the functionality of the optimized design
matches the original specification? Bugs in hardware design which are not caught early can have
costly effects later, such as the need for a redesign. Bugs in arithmetic circuits can be
especially catastrophic. One infamous example is the 1994 floating point division (FDIV)
bug that affected the Intel Pentium chip [2], and subsequently cost the company $475
million because it was discovered after the chip’s release. In another more fatal case,
during the Gulf war, an American Patriot Missile battery failed to intercept an incoming
Figure 1.1: Typical hardware design flow.
consequences, there has been extensive work in the field of hardware verification to find and
eliminate bugs prior to fabrication.
The two main methodologies used in hardware verification are simulation and formal
verification. Simulation checks correctness by applying exhaustive assignments to the
circuit inputs and verifying correctness of the output. This ensures that the circuit performs
as designed under all possible inputs. Such exhaustive testing is quite effective for smaller
circuits. However, as the size of the circuit increases, it becomes computationally infeasible
to simulate all possible test vectors. This is the case with Galois field arithmetic circuits,
which are commonly very large in real-world applications. Often for such large circuits,
simulations of a smaller and more manageable subset of test vectors are employed to catch
bugs. While these tests can increase confidence in the correctness of the design, they do not
guarantee correctness, since not every data flow of the design has been analyzed.
1.2 Formal Verification
Instead of simulating input vectors, formal verification utilizes mathematical theory to
reason about the correctness of hardware designs. Formal verification has two main forms:
property checking and equivalence checking.
Property checking (or property verification) verifies that a design satisfies certain given
properties. Property checking is done mainly in the form of theorem proving, model
checking, or approaches which combine the two.
1. Theorem proving [4] requires the existence of mathematical descriptors of the specifi-
cation and implementation of the circuit. Theorem provers apply mathematical rules
to these descriptors to derive new properties of the specification. In this way, the tool
can reduce a proof goal to simpler subgoals, which can be automatically verified.
However, generating the initial proof-goal requires extensive guidance from the user,
so there is an overall lack of automation in theorem proving.
2. Model checking [5] is an approach to verifying finite-state systems where specifi-
cation properties are modeled as a system of logic formulas. The design is then
traversed to check if the properties hold. If the design is found to violate a particular
property, a counter-example is generated that exercises the incorrect behavior in the
design. Such counter-examples allow the designer to trace the behavior and find
where the error in the design lies. Modern model checking techniques use the result
to automatically refine the system and perform further checking. These tools are
typically automated, and thus have found widespread use in CAD tool suites.
Equivalence checking verifies that two different representations of a circuit design have
equivalent functionality. An example of equivalence checking as it applies to the hardware
design flow is shown in Fig. 1.2.
Figure 1.2: Equivalence checking as applied to the hardware design flow.
1. Graph-based techniques construct a canonical graph representation, such as a binary
decision diagram (BDD) or one of its many variants, of each circuit. A linear com-
parison is then conducted to determine whether the two graphs are isomorphic. Since
the graph representation is canonical, the graphs of the two circuits will be equivalent
if and only if the circuits perform the same function.
2. Satisfiability techniques construct a miter of the two circuits, typically in a graph
such as an and-inverter graph (AIG). A miter is a combination of the two circuits
with one bit-level output, which is in the “1” state only when the outputs of the circuits
differ for the same input, as shown in Fig. 1.3. A satisfiability (SAT) tool [6]
is then employed to simplify the graph and find a solution to the miter, i.e., find
an input for which the miter output is “1.” If a solution is found, this solution acts
Figure 1.3: A miter of two circuits.
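The miter construction can be sketched in a few lines. This is a minimal illustration under stated assumptions: the three two-input circuits are hypothetical, and an exhaustive search over all inputs stands in for the SAT tool, which is feasible only for very small circuits.

```python
from itertools import product


def xor_impl_a(a, b):
    # XOR built as (a AND NOT b) OR (NOT a AND b).
    return (a and not b) or (not a and b)


def xor_impl_b(a, b):
    # XOR built as (a OR b) AND NOT (a AND b).
    return (a or b) and not (a and b)


def buggy_impl(a, b):
    # Deliberately wrong: computes OR instead of XOR.
    return a or b


def miter(c1, c2, a, b):
    # Single-bit miter output: True exactly when the two circuits disagree.
    return bool(c1(a, b)) != bool(c2(a, b))


def find_counterexample(c1, c2, n_inputs=2):
    # Exhaustive search used here in place of a SAT solver (an assumption
    # made for brevity, not the technique used by real tools).
    for bits in product([False, True], repeat=n_inputs):
        if miter(c1, c2, *bits):
            return bits
    return None


if __name__ == "__main__":
    print(find_counterexample(xor_impl_a, xor_impl_b))
    print(find_counterexample(xor_impl_a, buggy_impl))
```

An input returned by `find_counterexample` drives the miter output to “1”; a result of `None` means no such input exists and the two circuits are equivalent.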
Certain formal verification methods use computer-algebra and algebraic geometry tech-
niques based on mathematical theories. Unlike SAT-based verification, modern algebraic
geometry techniques do not explicitly solve the constraints to find a solution; rather, they
reason about the presence or absence of solutions, or explore the geometry of the solutions.
These methods [7] [8] [9] transform the circuit design into a polynomial system. Typically,
this system of polynomials is then used to compute a Gröbner basis [10]. Computation
of Gröbner bases allows for the easy deduction of important properties of a polynomial
system, such as the presence or absence of solutions. These properties are then leveraged
to perform verification. Unfortunately, such a computation has been shown to be doubly
exponential in the worst case, and thus these methods have not been practical for real-world
applications. However, recent breakthroughs in computer-algebra hardware verification
have shown that it is possible to overcome the complexity of this computation while still
utilizing the beneficial properties of Gröbner bases [11].
1.3 Importance of Word-level Abstraction
Most formal verification techniques can benefit from word-level abstractions of the
circuits they verify. Abstraction is defined as state-space reduction, i.e., abstraction reduces
state-space by mapping the set of states of a system to a smaller set of states. Because the
new representation contains fewer states, it is easier to comprehend and thus easier to use.
Word-level abstraction focuses specifically on abstracting a word-level representation of
a circuit out of a bit-level representation. For example, a bit-level representation of an
integer multiplier is represented by a collection of Boolean inputs and outputs, whereas a
word-level abstraction hides the underlying logic and represents the circuit as two integer
inputs and one integer output, e.g., Z = A·B. As the bit-size of the multiplier increases, the
logical implementation of the multiplier grows (typically exponentially) while the word-
level abstraction stays the same.
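The contrast between the two representations can be seen in a small sketch. The gate-level netlist below is one possible 2-bit unsigned multiplier, written for illustration; all function names are assumptions and do not come from the dissertation.

```python
from itertools import product


def bit_level_mul2(a0, a1, b0, b1):
    """Gate-level 2-bit unsigned multiplier built from AND and XOR gates."""
    p00 = a0 & b0          # partial products
    p10 = a1 & b0
    p01 = a0 & b1
    p11 = a1 & b1
    z0 = p00
    z1 = p10 ^ p01         # half adder: sum bit
    carry = p10 & p01      # half adder: carry bit
    z2 = p11 ^ carry
    z3 = p11 & carry
    return z0, z1, z2, z3  # least-significant bit first


def word_level_mul(A, B):
    """Word-level abstraction of the same circuit: Z = A * B."""
    return A * B


def to_word(bits):
    # Interpret a tuple of bits (least-significant first) as an integer.
    return sum(bit << i for i, bit in enumerate(bits))


def agree_everywhere():
    # Compare the netlist against the word-level model on all 16 inputs.
    for a0, a1, b0, b1 in product((0, 1), repeat=4):
        A, B = to_word((a0, a1)), to_word((b0, b1))
        if to_word(bit_level_mul2(a0, a1, b0, b1)) != word_level_mul(A, B):
            return False
    return True
```

The bit-level function must be rewritten with more gates for every increase in operand width, whereas `word_level_mul` is unchanged.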
Word-level abstractions have a wide variety of applications in formal verification. Theorem
proving techniques can leverage abstraction as an automatic decision procedure or as
a canonical reduction engine. For example, since RTL is composed of circuit blocks that
represent the underlying circuit, RTL verification methods can exploit abstractions of these
blocks. This is seen in the following RTL verification methods:
• Model checking [12], where an approximation abstraction of RTL blocks is generated
and then refined.
• Graph-based equivalence checking [13] [14], where abstraction methods are used to
generate a canonical word-level graph representation of the circuit.
• Satisfiability-based equivalence checking [15], where abstractions are used to iden-
tify symmetries and similarities in order to minimize the amount of logic that is sent
to the SAT tool.
Other equivalence checking techniques that employ abstractions include satisfiability
modulo theory (SMT) techniques [16] [17], which are similar to SAT except they operate on
higher-level data structures (integers, reals, bit vectors, etc.), as well as constraint solving
techniques [18] [19]. In general, RTL equivalence checking approaches would ideally
maintain a high-level of abstraction while still retaining sufficient lower-level functional
details (such as bit-vector size, precision, etc.) [20].
Word-level hardware abstractions also have applications in RTL and datapath syn-
thesis [21] [22] [23]. Abstractions of circuits allow for design reuse, which allows for
tool-automated synthesis of larger circuit blocks. Since hardware design specifications
tend to be word-level, synthesis tools can use these larger circuit blocks to generate and
optimize the datapaths and create the RTL of the system. Thus, in order for a circuit to be
used by these automated synthesis tools, its word-level abstraction must be known.
Finally, abstractions can also be applied to detect malicious modifications to a circuit,
potentially inserted as a hardware trojan horse. Hardware trojans, a relatively new security
concern in the hardware industry, use certain techniques to add incorrect behavior to a
design. This behavior is only activated under certain rare circumstances that only the
malicious designer has knowledge of. The behavior is purposely hidden and is very difficult
to encounter during simulation of the design. A manufactured chip with a subsystem that
contains a hardware trojan could compromise the entire system in which it is used. In
some hardware trojan cases, formal verification techniques may be applied to catch a bug
in a design and provide a counter-example which exercises the error. However, it can be
difficult to tell whether the bug in the design was introduced intentionally or not. On the
other hand, word-level abstractions of bit-level circuits effectively reverse-engineer the true
function implemented by the circuit, which could be used to determine the designer’s true
intention.
1.4 Dissertation Objective, Motivation, and Contributions
This dissertation focuses on abstracting a canonical, word-level representation of hard-
ware (bit-level) implementations of combinational circuits. The proposed technique is a full
abstraction solution that can be applied to any arbitrary acyclic combinational circuit. It is
particularly efficient when applied to Galois field arithmetic circuits. Using this technique,
if the abstraction of the circuit’s implementation and its specification are found, they can
be easily compared to determine equivalence. Implementation of a custom software tool,
developed to compute the abstractions, is also described.
1.4.1 Motivating Application
The motivation for this work comes from applications of Galois field arithmetic circuits
in elliptic curve cryptography (ECC) hardware systems. The main operations of
encryption, decryption, and authentication in ECC rely on operations performed on elliptic
curves, which are implemented in hardware as polynomial functions over Galois fields. To
be applicable in real-world situations, ECC datapaths should be at least 163 bits
wide, which is the minimum NIST standard, up to a recommended size of 571-bit operand
widths. Many non-ECC cryptosystems have datapaths on the order of 1000 bits.
A Galois field arithmetic circuit with a datapath size of k is built as a Boolean function
$\mathbb{B}^k \to \mathbb{B}^k$. This function is mapped to an operation $f : \mathbb{F}_{2^k} \to \mathbb{F}_{2^k}$ over the Galois field
$\mathbb{F}_{2^k}$. These circuits are custom-built, modular systems that cannot be synthesized due to
their complex nature. Thus, formal verification is needed to ensure they operate correctly.
Recent computer-algebra based formal verification techniques have been able to perform
verification of Galois field arithmetic circuits with a datapath size up to 163 bits [11].
Word-level abstractions of Galois field arithmetic circuits could be used to further improve
these formal verification techniques to allow for verification of larger circuits, as well
as provide the other benefits of word-level abstraction. However, there is currently no
technique for computing word-level abstractions of Galois field circuits of any practical
size.
While the motivation comes from the need to verify Galois field arithmetic circuits,
the presented approach can be generalized to be applicable to any combinational acyclic
circuit. Any such circuit with a k-bit input A and a k-bit output Z, such as the one shown in
Fig. 1.4, computes $f : \mathbb{B}^k \to \mathbb{B}^k$ and can thus be analyzed as the function $f : \mathbb{F}_{2^k} \to \mathbb{F}_{2^k}$.
Over $\mathbb{F}_{2^k}$ this function can be represented as the polynomial $Z = \mathcal{F}(A)$. This is trivially
generalized when there are multiple k-bit inputs $A_1, A_2, \dots, A_i$, i.e., $Z = \mathcal{F}(A_1, \dots, A_i)$.
Now assume the word-size of the input differs from the output, that is, the circuit computes
$f : \mathbb{B}^m \to \mathbb{B}^n$ for $m \neq n$. This can be represented as a function over Galois fields as
$f : \mathbb{F}_{2^m} \to \mathbb{F}_{2^n}$. This function can be analyzed over the field $\mathbb{F}_{2^k}$ such that $\mathbb{F}_{2^k} \supset \mathbb{F}_{2^m}$ and
$\mathbb{F}_{2^k} \supset \mathbb{F}_{2^n}$, where $k = \mathrm{LCM}(m, n)$.
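As an illustration of why such a polynomial exists, the sketch below interpolates the polynomial for a small circuit over the field with four elements. It is not the abstraction algorithm developed in this dissertation: it enumerates every field element, which is practical only for very small k. The irreducible polynomial x^2 + x + 1, the bit-swap circuit, and all function names are assumptions chosen for the example. It relies on the standard fact that, in a field of characteristic 2 with q elements, 1 + (A + c)^(q-1) equals 1 when A = c and 0 otherwise.

```python
K = 2
Q = 1 << K      # number of field elements, q = 2^k
MOD = 0b111     # x^2 + x + 1, an assumed irreducible polynomial for k = 2


def gf_mul(a, b):
    """Multiply two field elements given as k-bit vectors."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & Q:          # reduce as soon as the degree reaches k
            a ^= MOD
    return r


def poly_add(p, q):
    """Add two polynomials in A (coefficient lists, lowest degree first)."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [x ^ y for x, y in zip(p, q)]


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] ^= gf_mul(pi, qj)
    return out


def interpolate(f):
    """Return F with F(A) = sum over c of f(c) * (1 + (A + c)^(q-1))."""
    result = [0]
    for c in range(Q):
        term = [1]
        for _ in range(Q - 1):
            term = poly_mul(term, [c, 1])      # multiply by (A + c)
        indicator = poly_add([1], term)
        result = poly_add(result, [gf_mul(f(c), coef) for coef in indicator])
    return result


def poly_eval(p, a):
    acc = 0
    for coef in reversed(p):                   # Horner's rule
        acc = gf_mul(acc, a) ^ coef
    return acc


def bit_swap(a):
    # Hypothetical 2-bit circuit that swaps its two input bits.
    return ((a & 1) << 1) | (a >> 1)
```

For the bit-swap circuit, `interpolate(bit_swap)` yields the coefficient list `[0, 0, 2, 0]`, i.e., Z = αA^2, where α is the element encoded by the bit-vector 10.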
Figure 1.4: Circuit with k-bit input A and k-bit output Z. Abstraction to be derived as
$Z = \mathcal{F}(A)$.
1.4.2 Dissertation Contributions
To solve the problem of word-level abstraction, this dissertation proposes a full solution
consisting of three main contributions.
1. A theory for finding the word-level abstraction from a bit-level circuit over Galois
fields is created. The given bit-level circuit implementation is modeled as a system
of polynomials over the field. This theory is derived using techniques from computer-
algebra, notably the theory of Gröbner bases [24].
2. Using this theory, new algorithms based on symbolic computation are developed to
derive the word-level abstraction. The algorithms are designed to be applicable to
industry-size arithmetic circuits over Galois fields [25] [26]. A complexity analysis
of the algorithmic approach is also presented. Furthermore, the approach is also
generalized to make it applicable to arbitrary combinational circuits. Finally, we
show how the approach can be used to exploit the hierarchical structure of large
Galois field multipliers designed over composite fields.
3. A custom software tool implementation of the algorithmic approach is described,
including an analysis of efficient data structures designed for this purpose [27].
Experiments show that the proposed solution can abstract canonical, word-level, polynomial
representations of Galois field arithmetic circuits up to 1024 bits in size, while other
contemporary approaches are infeasible beyond 32-bit designs.
1.5 Dissertation Organization
The rest of this dissertation is organized as follows. Chapter 2 reviews previous ap-
plicable work and highlights their drawbacks with respect to the canonical, word-level
abstraction problem. Chapter 3 describes the properties of Galois fields, F2k , and explains
the process of constructing them. It also describes how to design arithmetic circuits over
such fields, their complexities, and the role of these circuits in elliptic curve cryptography.
Chapter 4 provides a theoretical background of computer algebra and Gröbner bases and
explains their application to Galois fields. Chapter 5 describes an approach to abstract
word-level polynomial representations of combinational circuits using a Gröbner basis
computation. Chapter 6 improves on this word-level abstraction approach to make it
applicable to much larger circuits. Chapter 7 generalizes the abstraction approach to make
it applicable to circuits with varying operand word-lengths. It also describes how the
approach can take advantage of the hierarchy of arithmetic circuits designed over composite
fields. Chapter 8 describes the implementation details of a custom abstraction tool and
gives experimental results of abstracting large Galois field multiplier circuits. Chapter 9




This chapter covers previous work in the area of canonical representations of functions,
word-level abstractions and their application to design verification. Since the application
of our approach is targeted towards formal equivalence verification, modern combinational
equivalence checking techniques are also reviewed. Finally, formal verification techniques
using computer algebra, algebraic geometry, and polynomial interpolation are also consid-
ered.
2.1 Canonical Decision Diagrams
Canonical representations of Boolean functions have been the subject of extensive in-
vestigation for logic synthesis and design verification. The reduced ordered binary decision
diagram (ROBDD) [28] was the first significant contribution in this area. Efficient implementation
of ROBDDs as a software package [29] allowed for efficient formal verification
of combinational and sequential circuits. ROBDDs represent a Boolean function as an
implicit set of points on a canonical directed acyclic graph (DAG). Manipulation of Boolean
functions can then be carried out as composition operations on their respective DAGs. The
decomposition principle behind BDDs is Shannon's expansion, i.e.,
\[ f(x, y, \dots) = x f_x + x' f_{x'} \tag{2.1} \]
where f_x = f(x = 1) and f_{x'} = f(x = 0) denote the positive and negative cofactors
of f w.r.t. x, respectively. Motivated by the success of BDDs, variants of Shannon's
decomposition principle (Davio, Reed-Muller, etc.) were explored to develop other
functional decision diagrams. For example, the AND-OR-NOT logic based Shannon's
expansion is transformed into an AND-XOR logic based decomposition, termed
Davio's decomposition:
\begin{align}
f(x, y, \dots) &= x f_x + x' f_{x'} \tag{2.2} \\
               &= x f_x \oplus x' f_{x'} \tag{2.3} \\
               &= x f_x \oplus (1 \oplus x) f_{x'} \tag{2.4} \\
               &= f_{x'} \oplus x (f_x \oplus f_{x'}) \tag{2.5}
\end{align}
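The equivalence of the Shannon and Davio forms can be checked exhaustively on a small function. The sketch below uses a three-input majority function as an arbitrary stand-in (it is not taken from the dissertation) and compares both expansions with respect to x on every input assignment.

```python
from itertools import product

def f(x: int, y: int, z: int) -> int:
    """Arbitrary example function (an assumption for illustration): 3-input majority."""
    return (x & y) | (y & z) | (x & z)

def shannon(x: int, y: int, z: int) -> int:
    """x * f_x + x' * f_x' using AND, OR, NOT."""
    return (x & f(1, y, z)) | ((1 - x) & f(0, y, z))

def davio(x: int, y: int, z: int) -> int:
    """f_x' XOR x * (f_x XOR f_x') using AND, XOR only."""
    return f(0, y, z) ^ (x & (f(1, y, z) ^ f(0, y, z)))

# Both expansions must reproduce f on all 8 input assignments.
all_agree = all(f(*v) == shannon(*v) == davio(*v) for v in product((0, 1), repeat=3))
print(all_agree)
```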
Decision diagrams based on such decompositions include functional decision diagrams
(FDDs) [30], algebraic decision diagrams (ADDs) [31], multi-terminal binary decision
diagrams (MTBDDs) [32], and their hybrid edge-valued counterparts, hybrid decision di-
agrams (HDDs) [33], and edge-valued binary decision diagrams (EVBDDs) [34]. While
these are referred to as Word-Level Decision Diagrams [13], the decomposition is still
point-wise, binary, with respect to each Boolean variable. These representations do not
serve the purpose of word-level abstraction from bit-level representations.
Binary moment diagrams (BMDs) [35], and their derivatives, extended multiplicative
BMDs (K*BMDs) [36] and multiplicative power hybrid decision diagrams (*PHDDs) [37],
depart from the Boolean decomposition and perform the decomposition of a linear function
based on its two moments. BMDs provide a compact representation for integer arithmetic
circuits such as multipliers and squarers. However, these are inapplicable to word-level
abstraction of modulo-arithmetic circuits over Galois fields.
Taylor expansion diagrams (TEDs) [38] are a word-level canonical representation of a
polynomial expression, based on the Taylor’s series expansion of a polynomial. However,
they do not represent a polynomial function canonically. For example, f_1 = 0 and f_2 =
2x^2 − 2x (mod 4) both represent the zero function over
Z_4, but they are symbolically different polynomials and have nonisomorphic TED
DAGs. While [39] and [40] provide canonical representations of polynomial functions,
they do so over finite integer rings Z_{2^k} and not over Galois fields F_{2^k}.
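The claim that 2x^2 − 2x is a second representation of the zero function over the integers modulo 4 is easy to confirm by evaluating it at all four points. The function name `f2` below simply mirrors the text.

```python
def f2(x: int) -> int:
    """Evaluate 2x^2 - 2x in the ring of integers modulo 4."""
    return (2 * x * x - 2 * x) % 4

# A nonzero polynomial expression that nevertheless evaluates to 0 everywhere.
values = [f2(x) for x in range(4)]
print(values)
```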
MODDs [41] [42] are a DAG representation of the characteristic function of a circuit
over Galois fields F_{2^k}. MODDs come very close to satisfying our requirements as a canonical
word-level representation that can be employed over Galois fields, as they essentially
interpolate a polynomial from the characteristic function. However, MODDs do not scale
well for large circuits, because every node in the DAG can have up to k
children and the normalization operations are very complicated. They also
suffer from the size explosion problem during intermediate computations. They are known
to be infeasible in representing functions over 32-bit operand words.
2.2 Word-Level Techniques in RTL Synthesis
and Verification
Other attempts to derive high-level representations of functions, along with associated
decision procedures, can be found in the rich domain of formal model checking [43] [44],
theorem proving [14], bit-vector SMT-solvers [16] [45] [46] [47], automated decision pro-
cedures for Presburger arithmetic [48] [49], algebraic manipulation techniques [50], or the
ones based on term rewriting [51], etc. In [52], a word-level linear function over integers is
found from bit-level component descriptions if such a function exists. Polynomial, integer
and other nonlinear representations have also been researched: difference decision dia-
grams (DDDs) [53] [54], interval diagrams [55], interval analysis using polynomials [56],
etc. Most of these have found application in constraint satisfaction for simulation-based
validation: [57] [58] [15] [59] [60] [47]. Among these, [59] [60] [47] have been used
to solve integer modular arithmetic on linear expressions — a different application from
representing finite field modulo-arithmetic on polynomials in a canonical form.
2.3 Combinational Equivalence Checking
The verification problem addressed in this dissertation is a manifestation of the com-
binational equivalence checking (CEC) problem, where the specification (polynomial) and
the implementation (circuit) are custom-designed, structurally very dissimilar circuits. To
make use of contemporary gate-level CEC tools, we can take the specification circuit
(“golden model”) and check its equivalence against the implementation circuit. Canonical
decision diagrams (BDDs [28] and their word-level variants [13]), and-invert-graph (AIG)
based reductions [61] [62], circuit-SAT solvers [6], etc., are among the many techniques
that can be employed for this CEC. When one circuit is synthesized from the other, this
problem can be efficiently solved using AIG-based reductions (e.g., the ABC tool [63])
and circuit-SAT solvers (e.g., CSAT [6]). Synthesized circuits generally contain many
subcircuit equivalences which AIG and CSAT based tools can identify and exploit for
verification. However, when the circuits are functionally equivalent but structurally very
dissimilar (e.g., Mastrovito [64] versus Montgomery implementations [65] of Galois field
circuits), none of the contemporary techniques, including ABC and CSAT, offer a prac-
tical solution. Automatic formal verification of large custom-designed modulo-arithmetic
circuits largely remains unsolved today.
This verification problem is very hard for SAT solvers and also for quantifier-free bit-
vector (QF-BV) theory based SMT-solvers, due to the large circuit size, and the presence of
AND-XOR-SHIFT structures. Similarly, the Cryptol tool-set [66] also employs AIG-based
reductions (SAT-sweeping) and SAT/SMT-solvers for verification of crypto-protocols. For
applications where AIGs/SAT/SMT-techniques fail, the Cryptol tool-set also does not de-
liver. As shown in [11], none of the BDDs, SAT, SMT solvers, nor the ABC tool can prove
design equivalence beyond 16-bit circuits.
In [67], the authors present a method for verification of integer multipliers using a
data-flow approach. This work abstracts a polynomial function of the given multiplier and
then solves the network flow problem using algebraic techniques. However, the abstraction
is solely bit-level, and is thus not applicable to deriving a word-level representation of a
given design.
2.4 Verification of Galois Field Circuits
Symbolic computer algebra techniques have been employed for formal verification of
circuits over Z_{2^k} and also over Galois fields F_{2^k}. The work of [68] shows how to use
Gröbner basis techniques to count the zeros of an ideal J over F_q (i.e., count V_{F_q}(J)).
The authors then follow up with an approach for quantifier elimination over Galois fields
F_q [69]. However, computing a Gröbner basis is computationally expensive. While these
works present the proper theory and algorithms, improving the efficiency of the Gröbner
basis computation is not addressed. This is also the case with other general verification
techniques using Gröbner bases [7] [9] [70], etc.
In [71] [72] [73], the authors present the BLUEVERI tool from IBM for verification of
Galois field circuits for error correcting codes against an algorithmic spec. The implemen-
tation consists of a set of (predesigned and verified) circuit blocks that are interconnected to
form the error correcting system. The spec is given as a set of design constraints on a “check
file.” Their objective is to prove the equivalence of the implementation against this check
file. They model the verification instance as a data-flow graph, represent each subcircuit
block with its known (word-level) polynomial over F_q, and formulate the verification problem
using the Weak Nullstellensatz, i.e., they check whether the variety of the algebraic system
“spec ≠ implementation” is empty.
Their main contributions are: i) a “term re-writing” to specify the algorithmic description
using polynomials (ideal); and ii) integrating an AIG-style [61] Boolean solver with their
word-level decision procedure, with lazy signal computations and Boolean reasoning. For
final verification, the polynomial system is given to a computer algebra tool (SINGULAR
[74]) to compute a reduced Gröbner basis. However, improvements to the core Gröbner
basis computational engine are not the subject of their work.
In [11] [75] [76] [77] [78], Lv et al. present computer algebra techniques for formal
verification of Galois field arithmetic circuits. Given a specification polynomial f , and
a circuit C, they formulate the verification problem as an ideal membership test using
the Strong Nullstellensatz and Gröbner bases. In [76], the authors show that for any
combinational circuit, there exists a term order >1 that renders the set of polynomials of the
circuit itself a Gröbner basis, and this term order can be easily derived by performing
a topological traversal of the circuit. By exploiting this term order, verification can be
significantly scaled to 163-bit (NIST-specified) cryptography circuits. In contrast to the
work of [11], we are not given a specification polynomial. Instead, given the circuit C, we
want to derive (extract) the word-level specification f . In our work, we borrow and further
build upon the results of [68] [69] [76] [11].
2.5 Verification of Integer Arithmetic Circuits Using
Gröbner Bases
Symbolic computer algebra techniques have been used for verification of integer arith-
metic circuits [79]. The paper [79] addresses verification of finite precision integer datapath
circuits using the concepts of Gröbner bases over the ring Z_{2^k}. This work models the
circuit constraints by way of arithmetic-bit-level (ABL) polynomials ({G}), and formulates
the verification test as an equivalent variety subset problem. This problem is solved by
deriving a term order that already makes {G} a Gröbner basis, then computing a normal
form f of the specification g w.r.t. {G}. Circuit correctness is established by testing
whether or not f is a vanishing polynomial over Z_{2^k} [80]. In [81], the authors further
show that the vanishing polynomial test can be omitted by formulating the problem directly
over Q := Z_{2^k}[X]/⟨x^2 − x : x ∈ X⟩. However, in these works, the problem requires
that the word-level abstraction of the circuit be known, whereas our approach derives this
abstraction polynomial.
2.6 Polynomial Interpolation in Symbolic Computation
The problem of polynomial interpolation is a fundamental problem in symbolic and
algebraic computing that finds application in modular algorithms, such as the GCD compu-
tation and polynomial factorization. The problem is stated as follows: Given n distinct data
points x_1, ..., x_n, and their evaluations at these points y_1, ..., y_n, interpolate a polynomial
F(X) of degree n − 1 (or less) such that F(x_i) = y_i for 1 ≤ i ≤ n. Let t be the
number of nonzero terms in F and let T be the total number of possible terms. When
t/T << 1, the polynomial F is sparse; otherwise it is dense. Much of the work in polynomial
interpolation addresses sparse interpolation using the “black-box” model (also called the
algebraic circuit model) as shown in Fig. 2.1.
Let F be a multivariate polynomial in n variables {x1, . . . , xn}, with t nonzero terms
(0 < t < T ), represented with a black-box B. On input (x1, . . . , xn), the black-box
evaluates yi = F(x1, . . . , xn). Given also a degree bound d on F , the goal is to interpolate
the polynomial F with a minimum number of probes to the black-box. The early work of
Zippel [82] and Ben-Or/Tiwari [83] require O(ndt) and O(T log n) probes, respectively, to
the black-box. These bounds have since been improved significantly; the recent algorithm
of [84] interpolates with O(nt) probes.
Our problem of polynomial abstractions of Galois field circuits falls into the category
of dense interpolation, as we require a polynomial that describes the function at each of
the q points of the field F_q. Newton's interpolation technique, with the black-box model,
bounds the number of probes by (d + 1)^n, which exhibits very high complexity. In
the logic synthesis area, the work of [85] investigates dense interpolation. Due to this
high complexity, their approach is feasible only for applications over small fields, e.g.,
computing Reed-Muller forms for multivalued logic over F2.
Figure 2.1: The black-box or the algebraic circuit representation.
For our problem, we can also employ the black-box model by replacing the black-box
(algebraic circuit) by the given circuit C; then every probe of the black-box would corre-
spond to a simulation of the circuit. However, as we desire a polynomial representation of
the entire function over the Galois field, exhaustive simulation would be required, which is
infeasible.
2.7 Concluding Remarks
For the problem of word-level, canonical, polynomial abstractions of Galois field arith-
metic circuits over F2k , previous related work is either inapplicable or only applicable
to circuits no larger than 32 bits in size. Therefore, we propose a symbolic approach to
polynomial interpolation from a circuit using the Gröbner basis computation. However, a
Gröbner basis computation is prohibitively expensive; thus, we propose
further improvements to this approach by deriving a smaller subset of computations based
on a Gröbner basis analysis. These improvements allow for abstractions of flattened Galois
field circuits up to 571 bits, which is the largest NIST standard for ECC, or up to 1024 bits
when a hierarchy is given. Furthermore, we propose applications of this approach to allow
for formal verification of flattened Galois field circuits up to 1024 bits, where current
techniques are only applicable for circuits up to 163 bits.
CHAPTER 3
GALOIS FIELDS PRELIMINARIES AND
APPLICATION IN HARDWARE DESIGN
This chapter provides a mathematical background for understanding Galois fields and
explains how to design Galois field arithmetic circuits. We first introduce the mathemat-
ical concepts of groups, rings, fields, and polynomials. We then apply these concepts to
create Galois field arithmetic functions and explain how to map them to a Boolean circuit
implementation. The material is drawn from [86] [87] [88] for Galois field concepts,
from [64] [89] [65] [90] [91] for hardware design over Galois fields, and from previous work by
Lv [11].
3.1 Rings, Fields, and Polynomials
Definition 3.1 An abelian group is a set S with a binary operation '+' which satisfies the
following properties:
• Closure Law: For every a, b ∈ S, a + b ∈ S.
• Associative Law: For every a, b, c ∈ S, (a + b) + c = a + (b + c).
• Commutativity: For every a, b ∈ S, a + b = b + a.
• Additive Identity: There is an identity element 0 ∈ S such that for all a ∈ S, a + 0 = a.
• Additive Inverse: If a ∈ S, then there is an element a^{−1} ∈ S such that a + a^{−1} = 0.
The set of integers Z forms an abelian group under the addition operation.
Definition 3.2 Given a set R with two binary operations, '+' and '·', and element 0 ∈ R,
the system R is called a commutative ring with unity if the following properties hold:
• R forms an abelian group under the '+' operation with additive identity element 0.
• Multiplicative Distributive Law: For all a, b, c ∈ R, a · (b+ c) = a · b+ a · c
• Multiplicative Associative Law: For every a, b, c ∈ R, a · (b · c) = (a · b) · c
• Multiplicative Commutative Law: For every a, b ∈ R, a · b = b · a
• Identity Element: There exists an element 1 ∈ R such that for all a ∈ R, a · 1 = a =
1 · a
For the purpose of this dissertation, any time we refer to a ring, we are specifically
referring to a commutative ring with unity. Two common examples of such rings are
the set of integers, Z, and the set of rational numbers, Q. Note that while both of these
examples are rings with an infinite number of elements, the number of elements in a ring
can also be finite.
Definition 3.3 The modular number system with base n is a set of nonnegative integers Z_n =
{0, 1, ..., n − 1}, with the two operations + and · satisfying the properties below:
(a+ b) (mod n) ≡ ((a (mod n)) + (b (mod n))) (mod n)
(a · b) (mod n) ≡ ((a (mod n)) · (b (mod n))) (mod n)
(−a) (mod n) ≡ (n− a) (mod n)
Example 3.1 The set Z8 = {0, 1, . . . , 7} denotes the modular number system with base 8.
Examples of some operations performed (mod 8) are:
3 + 6 = 9 (mod 8) = 1
3 · 6 = 18 (mod 8) = 2
(−3) = 8− 3 (mod 8) = 5
The modular number system Z_n = {0, 1, ..., n − 1}, where n is a positive integer,
forms a ring. Since this type of ring contains a finite number of elements n, it is termed
a finite integer ring, where addition and multiplication are computed modulo n.
In hardware applications, arithmetic over k-bit vectors manifests itself as algebra over the
finite integer ring Z_{2^k}, where the k-bit vector represents integer values from {0, ..., 2^k − 1}.
Example 3.2 Consider the following arithmetic circuit:
This circuit takes two 4-bit inputs, A and B, and computes a 4-bit sum C. Since A, B,
and C are all bit-vectors of size 4, the addition computation this circuit performs is modulo
2^4. Hence, this circuit exemplifies arithmetic computations over the ring Z_{2^4}.
Some examples of possible inputs and outputs of the circuit:
Addition over Z_{2^4}         Boolean Circuit Implementation
5 + 8 = 13 (mod 16) = 13      A = 0101, B = 1000 → C = 1101
10 + 9 = 19 (mod 16) = 3      A = 1010, B = 1001 → C = 0011
12 + 4 = 16 (mod 16) = 0      A = 1100, B = 0100 → C = 0000
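The rows above can be reproduced by masking an ordinary integer sum to four bits, which is what discarding the carry-out of a 4-bit adder amounts to. The helper name `add4` is hypothetical and used only for illustration.

```python
def add4(a: int, b: int) -> int:
    """Add two 4-bit values and keep only the low 4 bits, i.e. (a + b) mod 2^4."""
    return (a + b) & 0xF

# The three input pairs from the table above.
for a, b in [(5, 8), (10, 9), (12, 4)]:
    c = add4(a, b)
    print(f"A = {a:04b}, B = {b:04b} -> C = {c:04b} ({c})")
```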
Definition 3.4 Let R be a ring. A polynomial over R in the indeterminate x is an expression
of the form:
\[ a_0 + a_1 x + a_2 x^2 + \cdots + a_k x^k = \sum_{i=0}^{k} a_i x^i, \quad \forall a_i \in R. \tag{3.1} \]
The constants a_i are the coefficients and k is the degree of the polynomial. For example,
8x^3 + 6x + 1 is a polynomial in x over Z, with coefficients 8, 6, and 1 and degree 3.
Definition 3.5 The set of all polynomials in the indeterminate x with coefficients in the
ring R forms a ring of polynomials R[x]. Similarly, R[x1, x2, · · · , xn] represents the ring
of multivariate polynomials with coefficients in R.
For example, Z_{2^4}[x] stands for the set of all polynomials in x with coefficients in Z_{2^4}.
8x^3 + 6x + 1 is an instance of a polynomial contained in Z_{2^4}[x].
Definition 3.6 A field F is a commutative ring with unity, where every nonzero element in
F has a multiplicative inverse; i.e., ∀a ∈ F − {0}, ∃â ∈ F such that a · â = 1.
A field is defined as a ring with one extra condition: the presence of a multiplicative
inverse for all nonzero elements. Therefore, a field must be a ring while a ring is not
necessarily a field. For example, the set Z_{2^k} = {0, 1, ..., 2^k − 1} forms a finite ring.
However, Z_{2^k} is not a field because not every element in Z_{2^k} has a multiplicative inverse.
In the ring Z_{2^3}, for instance, the element 5 has an inverse (5 · 5 (mod 8) = 1) but the
element 4 does not.
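A brute-force search makes the missing inverses concrete. The following sketch (the function name `inverse_mod` is an assumption, not from the text) lists, for every nonzero element of the integers modulo 8, either its multiplicative inverse or `None`.

```python
def inverse_mod(a: int, n: int):
    """Return b with a * b = 1 (mod n), or None if a has no multiplicative inverse."""
    for b in range(1, n):
        if (a * b) % n == 1:
            return b
    return None

# Only the odd elements are invertible modulo 8, so the ring is not a field.
inverses = {a: inverse_mod(a, 8) for a in range(1, 8)}
print(inverses)
```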
The main concept of field theory is field extensions. The idea behind a field extension
is to take a base field and construct a larger field that contains the base field as well as
satisfies additional properties. For example, the set of real numbers R forms a field; one
common extension of R is the set of complex numbers C = R(i). Every element of C can
be represented as a+ b · i where a, b ∈ R, hence C is a two-dimensional extension of R.
Like rings, fields can also contain either an infinite or a finite number of elements. In
this dissertation we focus on finite fields, also known as Galois fields, and the construction
of their field extensions.
3.2 Galois Fields
Galois fields, also known as finite fields, find widespread applications in many areas of
electrical engineering and computer science such as error-correcting codes, elliptic curve
cryptography, digital signal processing, testing of VLSI circuits, among others. In this
dissertation, we specifically focus on their application to elliptic curve cryptography as
Galois field arithmetic circuits. This section describes the relevant Galois field concepts
[86] [87] [88] and hardware arithmetic designs over such fields [64] [89] [65] [90] [91].
Definition 3.7 A Galois field, denoted F_q, is a field with a finite number of elements, q. The
number of elements q of the Galois field is a power of a prime integer, i.e., q = p^k, where p
is a prime integer, and k ≥ 1. Thus a Galois field can also be denoted as F_{p^k}.
Fields of the form F_{p^k} are called Galois extension fields. We are specifically interested
in extension fields of type F_{2^k}, where k > 1. These are extensions of the binary field F_2.
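Example 3.3 Addition and multiplication operations over F_2:
 +  | 0  1          ·  | 0  1
 0  | 0  1          0  | 0  0
 1  | 1  0          1  | 0  1
Notice that addition over F_2 is a Boolean XOR operation, because it is computed
modulo 2. Similarly, multiplication over F_2 is a Boolean AND operation.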
Algebraic extensions of the binary field F_2 are generally termed binary extension
fields F_{2^k}. Whereas elements in F_2 can only represent 1 bit, elements in F_{2^k} represent a k-bit
vector. This allows them to be widely used in digital hardware applications. In order to
construct a Galois field of the form F_{2^k}, an irreducible polynomial is required:
Definition 3.8 A polynomial P(x) ∈ F_2[x] is irreducible if P(x) is nonconstant with
degree k and cannot be factored into a product of polynomials of lower degree in F_2[x].
An irreducible polynomial P(x) of degree k ≥ 2 therefore has no roots in F_2, i.e.,
∀a ∈ F_2, P(a) ≠ 0. For degree 2 or 3 the converse also holds, since any factorization
must then contain a linear factor. For example, x^2 + x + 1 is an irreducible
polynomial over F_2 because it has no roots in F_2, i.e., (0)^2 + (0) + 1 = 1 ≠ 0 and
(1)^2 + (1) + 1 = 1 ≠ 0 over F_2. For higher degrees the absence of roots is necessary
but not sufficient: x^4 + x^2 + 1 = (x^2 + x + 1)^2 has no roots in F_2 yet is reducible.
Irreducible polynomials exist for any degree ≥ 2 in F_2[x].
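Irreducibility over the binary field can be tested mechanically by trial division. The sketch below is an illustrative assumption about how one might do this, not the dissertation's tooling: a polynomial is encoded as an integer whose bit i is the coefficient of x^i, and the helper names are hypothetical. It also shows that for degree 4 a polynomial may have no roots in the binary field and still factor.

```python
def poly_mod(a: int, b: int) -> int:
    """Remainder of a divided by b over the binary field (bit i = coefficient of x^i)."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)  # cancel the current leading term of a
    return a

def has_root_in_f2(p: int) -> bool:
    """P(0) is the constant term; P(1) is the parity of the number of terms."""
    p_at_0 = p & 1
    p_at_1 = bin(p).count("1") % 2
    return p_at_0 == 0 or p_at_1 == 0

def is_irreducible(p: int) -> bool:
    """Trial division by every polynomial of degree 1 up to deg(p) // 2."""
    deg = p.bit_length() - 1
    if deg < 1:
        return False
    return all(poly_mod(p, d) != 0 for d in range(2, 1 << (deg // 2 + 1)))

# x^2 + x + 1, x^4 + x^3 + 1, and x^4 + x^2 + 1 = (x^2 + x + 1)^2
for p in (0b111, 0b11001, 0b10101):
    print(f"{p:b}: root = {has_root_in_f2(p)}, irreducible = {is_irreducible(p)}")
```

The third polynomial has no root yet is reported as reducible, which is why the root test alone is only conclusive for degrees 2 and 3.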
Given an irreducible polynomial P(x) of degree k in the polynomial ring F_2[x], we can
construct a binary extension field F_{2^k} ≡ F_2[x] (mod P(x)). Let α be a root of P(x), i.e.,
P(α) = 0. Since P(x) is irreducible over F_2[x], α ∉ F_2. Instead, α is an element in F_{2^k}.
Any element A ∈ F_{2^k} can therefore be represented as:
\[ A = \sum_{i=0}^{k-1} (a_i \cdot \alpha^i) = a_0 + a_1 \cdot \alpha + \cdots + a_{k-1} \cdot \alpha^{k-1} \]
where a_i ∈ F_2 are the coefficients and P(α) = 0.
To better understand this field extension, compare it to another commonplace
field extension C, the set of complex numbers. C is an extension of the field of real
numbers R with an additional element i = √−1, an imaginary root of x^2 + 1, which has
no root in R. Thus any element A ∈ C can be represented as:
\[ A = \sum_{j=0}^{1} (a_j \cdot i^j) = a_0 + a_1 \cdot i \tag{3.2} \]
where a_j ∈ R are coefficients. Similarly, F_{2^k} is an extension of F_2 with an additional
element α, which is the “imaginary root” of an irreducible polynomial P in F_2[x].
Every element A ∈ F_{2^k} has a degree less than k because A is always computed modulo
P(x), which has degree k. Thus, A (mod P(x)) can be of degree at most k − 1 and at
least 0. For this reason, the field F_{2^k} can be viewed as a k-dimensional vector space over
F_2. The equivalent bit-vector representation for element A is:
\[ A = (a_{k-1} a_{k-2} \cdots a_0) \tag{3.3} \]
Example 3.4 A 4-bit Boolean vector (a_3 a_2 a_1 a_0) can be represented over F_{2^4} as:
\[ a_3 \cdot \alpha^3 + a_2 \cdot \alpha^2 + a_1 \cdot \alpha + a_0 \tag{3.4} \]
For instance, the Boolean vector 1011 is represented as the element α^3 + α + 1.
Example 3.5 Let us construct F_{2^4} as F_2[x] (mod P(x)), where P(x) = x^4 + x^3 + 1 ∈
F_2[x] is an irreducible polynomial of degree k = 4. Let α be a root of P(x), i.e., P(α) = 0.
Any element A ∈ F_2[x] (mod x^4 + x^3 + 1) has a representation of the type: A =
a_3 x^3 + a_2 x^2 + a_1 x + a_0 (degree < 4) where the coefficients a_3, ..., a_0 are in F_2 = {0, 1}.
Since there are only 16 such polynomials, we obtain 16 elements in the field F_{2^4}. Each
element in F_{2^4} can then be viewed as a 4-bit vector over F_2. Each element also has an
exponential α representation. All three representations are shown in Table 3.1.
We can compute the polynomial representation from the exponential representation.
Since every element is computed (mod P(α)) = (mod α^4 + α^3 + 1), we compute the
element α^4 as
\[ \alpha^4 \pmod{\alpha^4 + \alpha^3 + 1} = -\alpha^3 - 1 = \alpha^3 + 1 \tag{3.5} \]
Recall that all coefficients of F_{2^4} are in F_2, where −1 = +1 modulo 2. The next element α^5
can be computed as
\[ \alpha^5 = \alpha^4 \cdot \alpha = (\alpha^3 + 1) \cdot \alpha = \alpha^4 + \alpha = \alpha^3 + \alpha + 1 \tag{3.6} \]
Then α^6 can be computed as α^5 · α, and so on.
Table 3.1: Bit-vector, exponential, and polynomial representation of elements in F_{2^4} =
F_2[x] (mod x^4 + x^3 + 1)
a3a2a1a0  Exponential  Polynomial         a3a2a1a0  Exponential  Polynomial
0000      0            0                  1000      α^3          α^3
0001      1            1                  1001      α^4          α^3 + 1
0010      α            α                  1010      α^10         α^3 + α
0011      α^12         α + 1              1011      α^5          α^3 + α + 1
0100      α^2          α^2                1100      α^14         α^3 + α^2
0101      α^9          α^2 + 1            1101      α^11         α^3 + α^2 + 1
0110      α^13         α^2 + α            1110      α^8          α^3 + α^2 + α
0111      α^7          α^2 + α + 1        1111      α^6          α^3 + α^2 + α + 1
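The exponential column of the table can be regenerated by repeatedly multiplying by α and reducing with α^4 = α^3 + 1. The sketch below assumes the bitmask encoding a3a2a1a0 used in the table; the names `times_alpha` and `power_table` are hypothetical.

```python
P = 0b11001  # x^4 + x^3 + 1 as a bitmask (bit i = coefficient of x^i)
K = 4        # degree of P, giving a field with 2^4 = 16 elements

def times_alpha(a: int) -> int:
    """Multiply the field element a by alpha and reduce modulo P."""
    a <<= 1
    if a >> K:   # a degree-4 term appeared: replace alpha^4 by alpha^3 + 1
        a ^= P
    return a

def power_table() -> dict:
    """Map each exponent e in 0..14 to the bit-vector of alpha^e."""
    table, a = {}, 1
    for e in range(2**K - 1):
        table[e] = a
        a = times_alpha(a)
    return table

for e, bits in power_table().items():
    print(f"alpha^{e:<2} = {bits:04b}")
```

Because the fifteen powers are pairwise distinct, the run also confirms that this particular P(x) is primitive.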
An irreducible polynomial can also be a primitive polynomial.
Definition 3.9 A primitive polynomial P(x) is a polynomial with coefficients in F_2, which
has a root α ∈ F_{2^k} such that {0, 1 (= α^{2^k − 1}), α, α^2, ..., α^{2^k − 2}} is the set of all elements in
F_{2^k}, where α is a primitive element of F_{2^k}.
A primitive polynomial is guaranteed to generate all distinct elements of a finite field
F_{2^k} as powers of its root α, while an irreducible polynomial has no such guarantee. Often, there exists more than
one irreducible polynomial of degree k. In such cases, any degree-k irreducible polynomial
can be used for field construction. For example, both x^3 + x + 1 and x^3 + x^2 + 1 are
irreducible in F_2[x] and either one can be used to construct F_{2^3}. This is due to the following:
Theorem 3.1 There exists a unique field F_{p^k}, for any prime p and any positive integer k.
Thm. 3.1 implies that Galois fields with the same number of elements are isomorphic
to each other up to the labeling of the elements.
Thm. 3.2 provides an important property for investigating solutions to polynomial
equations in F_q.
Theorem 3.2 [Generalized Fermat's little theorem] Given a Galois field F_q, each element
A ∈ F_q satisfies:
\[ A^q \equiv A, \qquad A^q - A \equiv 0 \tag{3.7} \]
We can extend Thm. 3.2 to polynomials in F_q[x] as follows:
Definition 3.10 Let x^q − x be a polynomial in F_q[x]. Every element A ∈ F_q is a solution
to x^q − x = 0. Therefore, x^q − x always vanishes in F_q. Such polynomials are called
vanishing polynomials of the field F_q.
Example 3.6 Given F_{2^2} = {0, 1, α, α + 1} with P(x) = x^2 + x + 1, where P(α) = 0:
0^{2^2} = 0
1^{2^2} = 1
α^{2^2} = α (mod α^2 + α + 1)
(α + 1)^{2^2} = α + 1 (mod α^2 + α + 1)
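The same vanishing property can be checked exhaustively in a larger field. The sketch below assumes the 16-element field built from x^4 + x^3 + 1 with elements encoded as 4-bit integers; `gf_mul` and `gf_pow` are hypothetical helper names implementing shift-and-add multiplication with reduction.

```python
P, K = 0b11001, 4  # field of 2^4 elements built from x^4 + x^3 + 1

def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements: carry-less shift-and-add, reducing modulo P."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if a >> K:          # degree reached K: subtract (XOR) P
            a ^= P
    return r

def gf_pow(a: int, e: int) -> int:
    """Raise a to the e-th power by repeated multiplication."""
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

q = 1 << K
# x^q - x evaluates to 0 at every element, i.e. A^q = A for all A.
vanishes_everywhere = all(gf_pow(a, q) ^ a == 0 for a in range(q))
print(vanishes_everywhere)
```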
3.2.1 Containment of Galois Fields
A Galois field F_q can be fully contained within a larger field F_{q^k}. That is, F_q ⊂ F_{q^k}.
For example, Fig. 3.1 shows the containment of the fields F_2 ⊂ F_4 ⊂ F_16. It is easy to see
that since F_4 = F_{2^2}, it contains F_2. Likewise, F_16 = F_{4^2} = F_{2^4} contains F_4 and F_2. The
elements {0, 1, α, ..., α^14} designate F_16. Of these, {0, 1, α^5, α^10} create F_4. From these,
only {0, 1} exist in F_2.
Figure 3.1: Containment of fields: F_2 ⊂ F_4 ⊂ F_16.
The elements of F_4 can be found by computing successive powers of α^5:
\[ (\alpha^5)^1 = \alpha^5, \quad (\alpha^5)^2 = \alpha^{10}, \quad (\alpha^5)^3 = \alpha^{15} = 1 \tag{3.8} \]
The only elements that are generated in this recurrence are {1, α^5, α^10}. Every field contains
{0, 1}, so the elements {0, 1, α^5, α^10} form F_4. Let P(x) = x^4 + x^3 + 1 be the
primitive polynomial used to generate F_{2^4} = F_16. A primitive polynomial of degree 2 used
to generate F_{2^2} = F_4 can be found as follows:
\begin{align}
(x + \alpha^5) \cdot (x + \alpha^{10}) \bmod P(x)
  &= x^2 + (\alpha^{10} + \alpha^5) x + \alpha^{15} \bmod P(x) \notag \\
  &= x^2 + x + 1 \tag{3.9}
\end{align}
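The four-element subfield can also be located computationally as the set of elements fixed by raising to the fourth power. The sketch below assumes the 16-element field built from x^4 + x^3 + 1 with 4-bit integer encoding (so α^5 is 1011 and α^10 is 1010); the helper names are hypothetical.

```python
P, K = 0b11001, 4  # field of 16 elements built from x^4 + x^3 + 1

def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication with reduction modulo P."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> K:
            a ^= P
    return r

def gf_pow(a: int, e: int) -> int:
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

# Elements of the 4-element subfield are exactly the solutions of x^4 = x.
subfield = [a for a in range(1 << K) if gf_pow(a, 4) == a]
print([f"{a:04b}" for a in subfield])

# Coefficients of (x + alpha^5)(x + alpha^10): the sum and the product of the roots.
a5, a10 = 0b1011, 0b1010
print(a5 ^ a10, gf_mul(a5, a10))
```

Both printed coefficients equal 1, matching x^2 + x + 1.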
Theorem 3.3 F_{2^n} ⊂ F_{2^m} iff n | m, i.e., if n divides m.
Therefore:
• F_2 ⊂ F_{2^2} ⊂ F_{2^4} ⊂ F_{2^8} ⊂ ...
• F_2 ⊂ F_{2^3} ⊂ F_{2^9} ⊂ F_{2^27} ⊂ ...
• F_2 ⊂ F_{2^5} ⊂ F_{2^25} ⊂ F_{2^125} ⊂ ..., and so on
Definition 3.11 The algebraic closure of the Galois field F_{2^k}, denoted \overline{F_{2^k}}, is the union of
all fields F_{2^n} such that k | n.
3.2.2 Polynomial Interpolation Over Galois Fields
In the construction of digital circuits, arbitrary mappings between two bit-vectors of
size k can be constructed. Each such mapping generates a function f : B^k → B^k. As every
k-bit vector can be construed as an element in F_{2^k} (as shown in the previous section), every
such function also corresponds to a function over a Galois field: f : F_{2^k} → F_{2^k}.
Definition 3.12 A function f : R^d → R over a ring R is considered a polynomial function
if there exists a polynomial F ∈ R[x_1, ..., x_d] such that F(x_1, ..., x_d) = f(x_1, ..., x_d).
Theorem 3.4 From [88]: Let F_q be a Galois field of q elements where q is a power of a
prime integer. Given any function f : F_q → F_q, there exists a polynomial F ∈ F_q[x] such
that f(a) = F(a), for all a ∈ F_q. Thus, every function f : F_q → F_q is a polynomial
function.
Thus, since every function over a Galois field, f : F_{2^k} → F_{2^k}, is a polynomial
function, every mapping between two bit-vectors of size k is a polynomial function over
F_{2^k}. Furthermore, every such polynomial can be derived using Lagrange interpolation.
Theorem 3.5 [Lagrange interpolation] Given a set of k data points of a function f,
(x_0, f(x_0)), ..., (x_{k−1}, f(x_{k−1}))
where no two x_i ∈ {x_0, ..., x_{k−1}} are the same element, the polynomial representation of
f is
\[ \mathcal{F}(x) = \sum_{i=0}^{k-1} \frac{\prod_{j \neq i} (x - x_j)}{\prod_{j \neq i} (x_i - x_j)} \cdot f(x_i) \tag{3.10} \]
By applying Lagrange interpolation over every element in the Galois field F_{2^k}, we can
derive the polynomial representation F of any function f : F_{2^k} → F_{2^k}. Furthermore, F is
a polynomial of degree at most 2^k − 1 in x and F(a) = f(a) for all a ∈ F_{2^k}.
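A direct, unoptimized implementation of Lagrange interpolation over a small binary field is sketched below. It assumes the 8-element field built from x^3 + x + 1 with elements encoded as 3-bit integers; the sample mapping `f` and all helper names are hypothetical. Subtraction coincides with addition (XOR) in characteristic 2, so each factor (x − x_j) is written as (x + x_j).

```python
K, P = 3, 0b1011  # field of 8 elements built from x^3 + x + 1

def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication with reduction modulo P."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> K:
            a ^= P
    return r

def gf_inv(a: int) -> int:
    """Inverse of a nonzero element as a^(q-2), using a^(q-1) = 1."""
    r = 1
    for _ in range((1 << K) - 2):
        r = gf_mul(r, a)
    return r

def times_linear(poly: list, c: int) -> list:
    """Multiply poly (coefficients, lowest degree first) by the factor (x + c)."""
    out = [0] * (len(poly) + 1)
    for i, coef in enumerate(poly):
        out[i + 1] ^= coef
        out[i] ^= gf_mul(coef, c)
    return out

def lagrange(f) -> list:
    """Coefficients (lowest degree first) of the polynomial agreeing with f on the field."""
    q = 1 << K
    result = [0] * q
    for xi in range(q):
        num, den = [1], 1
        for xj in range(q):
            if xj != xi:
                num = times_linear(num, xj)
                den = gf_mul(den, xi ^ xj)
        scale = gf_mul(f(xi), gf_inv(den))
        for d, coef in enumerate(num):
            result[d] ^= gf_mul(coef, scale)
    return result

def poly_eval(coeffs: list, x: int) -> int:
    """Horner evaluation in the field."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc

f = lambda a: (a + 3) % 8          # arbitrary bit-vector mapping used as sample data
coeffs = lagrange(f)
print(all(poly_eval(coeffs, a) == f(a) for a in range(8)))
```

This visits all q points for each of the q basis polynomials, which illustrates why exhaustive interpolation does not scale to large k.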
While every function over a Galois field is a polynomial function, not every function
over the integer ring Z is a polynomial function.
Example 3.7 Let A = {a2, a1, a0} and Z = {z2, z1, z0} be 3-bit vectors. Thus, A and
Z ∈ B3. Consider the following function:
f : Z[2 : 0] = A[2 : 0] >> 1
f is a bit-vector right shift operation on A. This function can be analyzed as a mapping
over different forms: B^3 → B^3, Z_8 → Z_8, and F_{2^3} → F_{2^3}. These mappings from A to Z
are:
{a2a1a0} ∈ B^3   A ∈ Z_8   A ∈ F_{2^3}      →   {z2z1z0} ∈ B^3   Z ∈ Z_8   Z ∈ F_{2^3}
000              0         0                →   000              0         0
001              1         1                →   000              0         0
010              2         α                →   001              1         1
011              3         α + 1            →   001              1         1
100              4         α^2              →   010              2         α
101              5         α^2 + 1          →   010              2         α
110              6         α^2 + α          →   011              3         α + 1
111              7         α^2 + α + 1      →   011              3         α + 1
f : Z_8 → Z_8 is not a polynomial function (this can be verified using the results
of [92] [93] [94]). However, f : F_{2^3} → F_{2^3} is a polynomial function. By applying
Lagrange's interpolation formula to f over every element in F_{2^3}, we obtain the
following polynomial function: Z = (α^2 + 1)A^4 + (α^2 + 1)A^2, where P(α) = α^3 + α + 1 = 0.
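The interpolation in Example 3.7 can be reproduced with a short script. The following is only a sketch under our own encoding assumptions: field elements are 3-bit integers whose bit i is the coefficient of α^i, and the helper names (gf_mul, gf_inv, poly_mul, lagrange) are illustrative, not part of any tool used in this work.

```python
# F_{2^3} with P(x) = x^3 + x + 1; an element is a 3-bit integer (bit i = coefficient of alpha^i).
K, P = 3, 0b1011

def gf_mul(a, b):
    """Multiply two field elements: shift-and-add with reduction modulo P."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> K:              # degree reached K: subtract (XOR) P
            a ^= P
    return r

def gf_inv(a):
    """Inverse of a nonzero element, computed as a^(2^K - 2)."""
    r = 1
    for _ in range(2**K - 2):
        r = gf_mul(r, a)
    return r

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] ^= gf_mul(pi, qj)
    return out

def lagrange(f):
    """Coefficients (lowest degree first) of the polynomial interpolating f on all of F_{2^K}."""
    n = 2**K
    coeffs = [0] * n
    for xi in range(n):
        num, den = [1], 1
        for xj in range(n):
            if xj != xi:
                num = poly_mul(num, [xj, 1])    # (x - xj) equals (x + xj) in characteristic 2
                den = gf_mul(den, xi ^ xj)      # (xi - xj) equals xi XOR xj
        scale = gf_mul(f(xi), gf_inv(den))
        for d, c in enumerate(num):
            coeffs[d] ^= gf_mul(scale, c)
    return coeffs

# Right shift by one bit, as in Example 3.7.
coeffs = lagrange(lambda a: a >> 1)
print(coeffs)   # the entries at degrees 2 and 4 are 0b101, i.e. alpha^2 + 1
```

With this encoding, the only nonzero coefficients are those of A^2 and A^4, both equal to α^2 + 1, which agrees with the polynomial stated in the example.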
Since every function over F2k is a polynomial function, the functional mapping of a
Galois field arithmetic circuit over F2k must exist in polynomial form. Construction of
these arithmetic circuits is described next.
3.3 Hardware Implementations of Arithmetic Operations
Over Galois Fields
There are two main applications of hardware implementations of Galois field arithmetic.
In the first case, Galois field arithmetic computations, such as ADD or MUL,
are implemented in hardware, and algorithms are then implemented in software (e.g.,
cryptoprocessors [95] [96]). In other cases, the entire design can be implemented in
hardware, such as a one-shot Reed-Solomon encoder-decoder chip [97] [98], or point
multiplication circuitry [99] used in elliptic curve cryptosystems. Therefore, there has been
extensive research in efficient hardware design of primitive arithmetic computations over
Galois fields. In this section, we describe the design principles of such circuits with focus
on their architecture and verification complexity.
Addition in F2k is performed by correspondingly adding the polynomials together and
reducing the coefficients of the result modulo 2.
Example 3.8 Given A = α^3 + α^2 + 1 = (1101) and B = α^2 + 1 = (0101) in F_{2^4},
A + B = (α^3 + α^2 + 1) + (α^2 + 1) = (α^3) + (α^2 + α^2) + (1 + 1) = α^3 = (1000).
Effectively, the addition operation is only performed on the coefficients, which are in
F_2. As addition over F_2 is an XOR operation, constructing an addition circuit over
F_{2^k} is trivial: it consists of only k XOR gates. A 4-bit adder over F_{2^4} is shown
in Fig. 3.2.
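As a quick illustration of the bitwise nature of this operation, the sketch below models a field element as an integer whose bit i is the coefficient of α^i (our own encoding assumption; the name gf_add is illustrative) and repeats Example 3.8.

```python
def gf_add(a, b):
    """Addition in F_{2^k}: coefficient-wise addition modulo 2, i.e. a bitwise XOR."""
    return a ^ b

A = 0b1101                # alpha^3 + alpha^2 + 1
B = 0b0101                # alpha^2 + 1
total = gf_add(A, B)      # alpha^3
print(format(total, "04b"))
```

Since every element is its own additive inverse, the same function also performs subtraction.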
Multiplication Z = A × B (mod P (x)) in F2k conceptually consists of two steps.
In the first step, the multiplication A × B is performed. In the second step, the result is
reduced modulo the irreducible polynomial P (x). This multiplication procedure is shown
in Example 3.9.
Figure 3.2: 4-bit adder over F_{2^4}.
Example 3.9 Consider the field F_{2^4} with the irreducible polynomial P(x) = x^4 + x^3 + 1
and P(α) = 0. We take as inputs: A = a0 + a1·α + a2·α^2 + a3·α^3 and
B = b0 + b1·α + b2·α^2 + b3·α^3. We have to perform the multiplication Z = A × B (mod P(x)). The
coefficients of A = {a0, . . . , a3}, B = {b0, . . . , b3} are in F_2 = {0, 1}. This multiplication
is performed as follows:
                            a3      a2      a1      a0
  ×                         b3      b2      b1      b0
  ----------------------------------------------------
                            a3·b0   a2·b0   a1·b0   a0·b0
                    a3·b1   a2·b1   a1·b1   a0·b1
            a3·b2   a2·b2   a1·b2   a0·b2
    a3·b3   a2·b3   a1·b3   a0·b3
  ----------------------------------------------------
    s6      s5      s4      s3      s2      s1      s0
The result Sum = s0 + s1·α + s2·α^2 + s3·α^3 + s4·α^4 + s5·α^5 + s6·α^6, where
s0 = a0·b0
s1 = a0·b1 + a1·b0
s2 = a0·b2 + a1·b1 + a2·b0
s3 = a0·b3 + a1·b2 + a2·b1 + a3·b0
s4 = a1·b3 + a2·b2 + a3·b1
s5 = a2·b3 + a3·b2
s6 = a3·b3
Here the multiply “·” and add “+” operations are performed modulo 2, so they can
be implemented in a circuit using AND and XOR gates, respectively. Note that unlike
integer multipliers, there are no carry-chains in the design, as the coefficients are always
reduced modulo 2. However, the result is yet to be reduced modulo the primitive polynomial
P(x) = x^4 + x^3 + 1. This reduction rewrites every power α^d with d ≥ k = 4 as a
polynomial in α of degree less than 4.
  α^3   α^2   α     1
  s3    s2    s1    s0
  s4    0     0     s4     s4·α^4 (mod P(α)) = s4·(α^3 + 1)
  s5    0     s5    s5     s5·α^5 (mod P(α)) = s5·(α^3 + α + 1)
  s6    s6    s6    s6     s6·α^6 (mod P(α)) = s6·(α^3 + α^2 + α + 1)
  ---------------------
  z3    z2    z1    z0
The final result (output) of the circuit is: Z = z0 + z1·α + z2·α^2 + z3·α^3, where
z0 = s0 + s4 + s5 + s6; z1 = s1 + s5 + s6; z2 = s2 + s6; z3 = s3 + s4 + s5 + s6.
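The two steps of Example 3.9 can be checked against a generic shift-and-add multiplication. The sketch below is illustrative only: the bit-list interface and the function names (mastrovito_4bit, gf16_mul, to_bits, from_bits) are our own assumptions, and the partial sums s_k are formed as the XOR of all a_i·b_j with i + j = k.

```python
def mastrovito_4bit(a, b):
    """a, b: bit lists [a0, a1, a2, a3]. Returns [z0, z1, z2, z3] for P(x) = x^4 + x^3 + 1."""
    # Step 1: polynomial product; s[k] is the XOR of all a[i] AND b[j] with i + j = k.
    s = [0] * 7
    for i in range(4):
        for j in range(4):
            s[i + j] ^= a[i] & b[j]
    # Step 2: reduction modulo P, using the output equations of Example 3.9.
    z0 = s[0] ^ s[4] ^ s[5] ^ s[6]
    z1 = s[1] ^ s[5] ^ s[6]
    z2 = s[2] ^ s[6]
    z3 = s[3] ^ s[4] ^ s[5] ^ s[6]
    return [z0, z1, z2, z3]

def gf16_mul(x, y, p=0b11001):
    """Reference multiplication on 4-bit integers (bit i = coefficient of alpha^i)."""
    r = 0
    for i in range(4):
        if (y >> i) & 1:
            r ^= x << i
    for d in (6, 5, 4):             # cancel the terms of degree 6, 5, 4 in turn
        if (r >> d) & 1:
            r ^= p << (d - 4)
    return r

def to_bits(v):
    return [(v >> i) & 1 for i in range(4)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

# Exhaustive comparison over all 256 input pairs.
agree = all(
    from_bits(mastrovito_4bit(to_bits(x), to_bits(y))) == gf16_mul(x, y)
    for x in range(16) for y in range(16)
)
print(agree)
```

Each AND in step 1 and each XOR in steps 1 and 2 corresponds to one gate of the multiplier, which is why no carry logic appears.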
The above multiplier design is called the Mastrovito multiplier [64], which is the most
straightforward way to design a multiplier over F2k . A logic circuit for a 4-bit Mastrovito
multiplier over Galois field F24 is illustrated in Fig. 3.3.
Modular multiplication is at the heart of many public-key cryptosystems, such as elliptic
curve cryptography (ECC) [100]. Due to the very large field size (and hence the datapath
width) used in these cryptosystems, the above Mastrovito multiplier architecture is
inefficient, especially when exponentiation and repeated multiplications are performed. Therefore,
efficient hardware and software implementations of modular multiplication algorithms are
used to overcome the complexity of such operations. One such algorithm, on which we focus,
is the Montgomery reduction [89] [65].
Figure 3.3: Mastrovito multiplier over F24 .
3.3.1 Montgomery Multipliers
Montgomery reduction (MR) computes:
G = MR(A, B) = A·B·R^{−1} (mod P(x)) (3.11)
where A, B are k-bit inputs, R = α^k, R^{−1} is the multiplicative inverse of R in F_{2^k}, and P(x) is
the irreducible polynomial for F_{2^k}. Since Montgomery reduction cannot directly compute
A·B, we need to precompute A·R and B·R, as shown in Fig. 3.4.
Each MR block in Fig. 3.4 represents a Montgomery reduction step, which is a hard-
ware implementation of the algorithm shown in Algorithm 1.
The design of Fig. 3.4 is not efficient for computing A·B (mod P(x)) when compared
to the Mastrovito implementation. However, when these multiplications are performed
repeatedly, such as in iterative squaring, the Montgomery approach speeds up the
computation. As shown in [90], the critical path delay and gate counts of a squarer designed
Figure 3.4: Montgomery multiplier over F_{2^k}.
Algorithm 1: Montgomery Reduction Algorithm [65]
Input: A(x), B(x) ∈ F_{2^k}; irreducible polynomial P(x).
Output: G(x) = A(x)·B(x)·x^{−k} (mod P(x)).
G(x) := 0
for (i = 0; i ≤ k − 1; ++i) do
    G(x) := G(x) + A_i·B(x)    /* A_i is the ith bit of A */
    G(x) := G(x) + G_0·P(x)    /* G_0 is the lowest bit of G */
    G(x) := G(x)/x             /* Right shift 1 bit */
end
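Algorithm 1 translates almost line by line into bit operations. The sketch below is a software model only, under our own encoding assumption (integers whose bit i is the coefficient of α^i) and with illustrative names. The four-block arrangement in montgomery_multiply is our reading of how A·R and B·R are precomputed and how the factor R is removed again; it is an assumption, not a transcription of Fig. 3.4.

```python
def montgomery_reduce(a, b, p, k):
    """Algorithm 1: returns a * b * x^(-k) mod p over F_2."""
    g = 0
    for i in range(k):
        if (a >> i) & 1:    # G := G + A_i * B
            g ^= b
        if g & 1:           # G := G + G_0 * P, which clears the lowest bit
            g ^= p
        g >>= 1             # G := G / x
    return g

def gf_mul(a, b, p, k):
    """Reference shift-and-add multiplication modulo p."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> k) & 1:
            a ^= p
    return r

K, P = 4, 0b11001           # F_{2^4} with P(x) = x^4 + x^3 + 1
R = 0b1001                  # alpha^4 mod P = alpha^3 + 1
R2 = gf_mul(R, R, P, K)     # R^2, used to move operands into Montgomery form

def montgomery_multiply(a, b):
    ar = montgomery_reduce(a, R2, P, K)      # A * R
    br = montgomery_reduce(b, R2, P, K)      # B * R
    abr = montgomery_reduce(ar, br, P, K)    # A * B * R
    return montgomery_reduce(abr, 1, P, K)   # A * B

ok = all(montgomery_multiply(a, b) == gf_mul(a, b, P, K)
         for a in range(16) for b in range(16))
print(ok)
```

The conversions into and out of Montgomery form are paid once, so a chain of multiplications or squarings only needs one montgomery_reduce call per step.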
3.3.2 Circuit Designs Over Composite Fields
The Galois field F_{2^k} is a k-dimensional vector space over the subfield F_2. If k = m·n,
the field F_{2^k} can be decomposed as F_{(2^m)^n}. Such a field representation is called a composite
field, and it is constructed as an n-dimensional extension of the subfield F_{2^m}. The subfield
F_{2^m} is called the ground field. Note that we have F_2 ⊂ F_{2^m} ⊂ F_{(2^m)^n}.
A Galois field arithmetic circuit over F_{2^k} can thus be composed as a circuit over F_{(2^m)^n}
if k = m·n. Since the ground field is F_{2^m}, this composite field circuit is composed of blocks
of m-bit multipliers and adders, along with m-bit buses that act as the inputs and outputs
of these blocks. An F_{2^4} Galois field multiplier designed over the composite field F_{(2^2)^2} is
shown in Fig. 3.5. Design methodologies of these circuits are examined more closely in
Chapter 7.
3.3.3 Applications to Elliptic Curve Cryptography
Elliptic Curve Cryptography (ECC) is one of the most influential applications of Galois
fields. ECC is an approach to public-key (or asymmetric-key) cryptography based on the
algebraic structure of elliptic curves over Galois fields. Due to the complex nature of these
Figure 3.5: 4-bit composite multiplier designed over F_{(2^2)^2}.
while providing the same level of security [101]. The main operations of encryption,
decryption and authentication in ECC rely on point multiplications.
Point multiplication involves a series of additions and doublings of points on the elliptic
curve. A drawback of traditional point multiplication is that each point addition and
doubling requires a multiplicative inverse operation over Galois fields, the computation of
which is costly. Modern methods, however, represent the points in projective coordinate
systems [99], which eliminates the need for a multiplicative inverse operation by
replacing it with addition and multiplication operations over Galois fields. This has increased
the efficiency of point multiplication operations, but it has also increased the need for fast,
custom hardware designs of Galois field arithmetic.
In-depth analysis of elliptic curve theory is beyond the scope of this dissertation.
Instead, we will look at some examples of point addition and point doubling to give a general
idea of the operations involved in ECC and how they apply to Galois field arithmetic.
Our experiments use custom Galois field arithmetic designs based on the López-Dahab (LD)
coordinate system [102], so these examples will use the same coordinate system.
Example 3.10 Consider point addition in a LD projective coordinate system, as seen in
Fig. 3.6.
Given an elliptic curve: Y^2 + XYZ = X^3·Z + a·X^2·Z^2 + b·Z^4 over F_{2^k}, where X, Y, Z
are k-bit vectors that are elements in F_{2^k} and, similarly, a, b are constants from the field.
Let P + Q = R represent point addition over the elliptic curve. P = (X1, Y1, Z1) and
Q = (X2, Y2, 1) are given. Then R = (X3, Y3, Z3) can be computed as follows:
A = Y2·Z1^2 + Y1
B = X2·Z1 + X1
C = Z1·B
D = B^2·(C + a·Z1^2)
Z3 = C^2
E = A·C
X3 = A^2 + D + E
F = X3 + X2·Z3
G = X3 + Y2·Z3
Y3 = E·F + Z3·G
Figure 3.6: Point addition over an elliptic curve (R=P+Q).
Example 3.11 Consider point doubling in a LD projective coordinate system. Given an
elliptic curve: Y^2 + XYZ = X^3·Z + a·X^2·Z^2 + b·Z^4. Let 2(X1, Y1, Z1) = (X3, Y3, Z3), then
X3 = X1^4 + b·Z1^4
Z3 = X1^2·Z1^2
Y3 = b·Z1^4·Z3 + X3·(a·Z3 + Y1^2 + b·Z1^4)
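To give a feel for how point doubling decomposes into Galois field multiplications, squarings and additions, the sketch below evaluates the LD doubling formulas X3 = X1^4 + b·Z1^4, Z3 = X1^2·Z1^2, Y3 = b·Z1^4·Z3 + X3·(a·Z3 + Y1^2 + b·Z1^4) over the toy field F_{2^4}. Everything specific here is an assumption made for illustration: the field size (practical curves use k of 163 or more), the curve constants a = b = 1, and the function names.

```python
K, P = 4, 0b11001    # toy field F_{2^4} with P(x) = x^4 + x^3 + 1 (assumption)

def mul(a, b):
    """Field multiplication: shift-and-add with reduction modulo P."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> K:
            a ^= P
    return r

def sq(a):
    return mul(a, a)

def on_curve(X, Y, Z, a, b):
    """Check Y^2 + XYZ = X^3 Z + a X^2 Z^2 + b Z^4 (addition is XOR)."""
    lhs = sq(Y) ^ mul(mul(X, Y), Z)
    rhs = mul(mul(sq(X), X), Z) ^ mul(a, mul(sq(X), sq(Z))) ^ mul(b, sq(sq(Z)))
    return lhs == rhs

def ld_double(X1, Y1, Z1, a, b):
    """Point doubling in LD projective coordinates."""
    bZ4 = mul(b, sq(sq(Z1)))
    X3 = sq(sq(X1)) ^ bZ4
    Z3 = mul(sq(X1), sq(Z1))
    Y3 = mul(bZ4, Z3) ^ mul(X3, mul(a, Z3) ^ sq(Y1) ^ bZ4)
    return X3, Y3, Z3

a, b = 1, 1          # hypothetical curve constants
points = [(x, y, 1) for x in range(16) for y in range(16) if on_curve(x, y, 1, a, b)]
doubled = [ld_double(X, Y, Z, a, b) for (X, Y, Z) in points]
print(all(on_curve(X, Y, Z, a, b) for (X, Y, Z) in doubled))
```

No inversion appears anywhere in ld_double, which is the motivation for projective coordinates given above.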
In the above examples, polynomial multiplication and squaring operations can be
implemented in hardware using Montgomery reductions over Galois fields F_{2^k}. In practical
applications, the field size k of F_{2^k} is 163 or larger. However, there are no word-level
abstraction techniques applicable to circuits of such size, so hardware implementations of
Galois field arithmetic circuits cannot benefit from the many advantages of abstraction.
Thus, we propose a computer-algebra approach to word-level polynomial abstractions of
Galois field arithmetic circuits. Recent computer-algebra formal verification techniques
[11] have been able to verify these circuits up to 163 bits. We propose an application of
our abstraction approach to improve these techniques. These improvements allow us to
perform formal verification of these circuits up to 571 bits. These proposals are described
in detail in subsequent chapters.
CHAPTER 4
COMPUTER ALGEBRA FUNDAMENTALS
This chapter reviews fundamental concepts of commutative and computer algebra that
are used in this work. Specifically, this chapter covers monomial ordering, polynomial
ideals and varieties, and the computation of Gröbner bases. It also overviews elimination
theory as well as Hilbert’s Nullstellensatz theorems and how they apply to Galois fields.
The results of these theorems are used in polynomial abstraction and formal verification of
Galois field circuits and are discussed in subsequent chapters. The material of this chapter
is drawn mostly from the textbooks [103] [10] and previous work by Lv [11].
4.1 Monomials, Polynomials, and Term Orderings
Definition 4.1 A monomial in variables x1, x2, · · · , xd is a product of the form:
x1^{α1} · x2^{α2} · · · xd^{αd}, (4.1)
where αi ≥ 0, i ∈ {1, · · · , d}. The total degree of the monomial is α1 + · · · + αd.
Thus, x^2·y is a monomial in variables x, y with total degree 3. For simplicity, we will
henceforth denote a monomial x1^{α1} · x2^{α2} · · · xd^{αd} as x^α, where α = (α1, · · · , αd) is a vector
of size d of integers ≥ 0, i.e., α ∈ Z^d_{≥0}.
Definition 4.2 A multivariate polynomial f in variables x1, x2, . . . , xd with coefficients
in a field K is a finite linear combination of monomials:
f = Σ_α aα · x^α, aα ∈ K
The set of all polynomials in x1, x2, . . . , xd with coefficients in field K is denoted by
K[x1, x2, . . . , xd]. Thus, f ∈ K[x1, x2, . . . , xd].
1. We refer to the constant aα ∈ K as the coefficient of the monomial aα·x^α.
2. If aα ≠ 0, we call aα·x^α a term of f.
As an example, 2x^2 + y is a polynomial with two terms 2x^2 and y, with 2 and 1 as
coefficients, respectively. In contrast, x + y^{−1} is not a polynomial because the exponent of
y is less than 0.
Since a polynomial is a sum of its terms, these terms have to be arranged unambiguously
so that they can be manipulated in a consistent manner. Therefore, we need to establish a
concept of term ordering (also called monomial ordering). A term ordering, represented
by >, defines how terms in a polynomial are ordered.
Definition 4.3 Let T^d = {x^α : α ∈ Z^d_{≥0}} be the set of all monomials in x1, . . . , xd. A
monomial order > on T^d is a total well-ordering satisfying:
• For all x^α, x^β ∈ T^d, x^α and x^β are comparable
• For any x^α ∈ T^d with x^α ≠ 1, x^α > 1
• For all x^α, x^β, x^γ ∈ T^d, x^α > x^β ⇒ x^α · x^γ > x^β · x^γ
Term orderings are total orders: any two distinct terms are comparable, and constant terms
come last in the ordering. A total order ensures that there is no ambiguity with respect to
where a term is found in the term ordering. Total orderings for monomials come in different
forms, notably lexicographic ordering (lex) and its variants: degree-lexicographic ordering
(deglex) and degree-reverse-lexicographic ordering (degrevlex).
A lexicographic ordering (lex) is a total ordering > such that variables in the terms are
lexicographically ordered, i.e., simply based on when the variables appear in the ordering.
Higher variable-degrees take precedence over lower degrees for equivalent variables (e.g.,
a^3 > a^2 due to a · a · a > a · a · 1).
Definition 4.4 Let x1 > x2 > · · · > xd be in lexicographic order. Also let α =
(α1, . . . , αd); β = (β1, . . . , βd) ∈ Z^d_{≥0}. Then, we have:
x^α > x^β ⇐⇒ starting from the left, the first coordinates αi, βi that are different satisfy αi > βi (4.2)
A degree-lexicographic ordering (deglex) is a total-ordering > such that the total
degree of a term takes precedence over the lexicographic ordering. A degree-reverse-
lexicographic ordering (degrevlex) is the same as a deglex ordering, however, terms are
lexed in reverse.
Definition 4.5 Let x1 > x2 > · · · > xd be in degree lexicographic order. Also let
α = (α1, . . . , αd); β = (β1, . . . , βd) ∈ Z^d_{≥0}. Then, we have:
x^α > x^β ⇐⇒ Σ_{i=1}^{d} αi > Σ_{i=1}^{d} βi, or
Σ_{i=1}^{d} αi = Σ_{i=1}^{d} βi and x^α > x^β w.r.t. lex order (4.3)
Definition 4.6 Let x1 > x2 > · · · > xd be in degree reverse lexicographic order. Also
let α = (α1, . . . , αd); β = (β1, . . . , βd) ∈ Z^d_{≥0}. Then, we have:
x^α > x^β ⇐⇒ Σ_{i=1}^{d} αi > Σ_{i=1}^{d} βi, or
Σ_{i=1}^{d} αi = Σ_{i=1}^{d} βi and the first coordinates
αi, βi from the right, which are different, satisfy αi < βi (4.4)
Applying these term orderings, we have the following relations, where a > b > c.
lex: a^2b > a^2 > abc > ab > ac^2 > ac > b^2c > b^2 > bc^3 > 1 (4.5)
deglex: bc^3 > a^2b > abc > ac^2 > b^2c > a^2 > ab > ac > b^2 > 1 (4.6)
degrevlex: bc^3 > a^2b > abc > b^2c > ac^2 > a^2 > ab > b^2 > ac > 1 (4.7)
The difference between the lex and two deg- orderings is obvious, while the difference
between the two degree-based orderings can be seen by considering from which direction
the term is lexed, e.g., a · c · c > b · b · c (deglex, left-to-right) versus b · b · c > a · c · c
(degrevlex, right-to-left).
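The three orderings can be expressed as sort keys on exponent vectors, which makes the relations (4.5) to (4.7) easy to reproduce. The sketch below is only an illustration; the dictionary of monomials and the key function names are our own, with exponents listed in the order (a, b, c) to encode a > b > c.

```python
# Exponent vectors (deg_a, deg_b, deg_c) of the monomials compared in (4.5)-(4.7).
monomials = {
    "a2b": (2, 1, 0), "a2": (2, 0, 0), "abc": (1, 1, 1), "ab": (1, 1, 0),
    "ac2": (1, 0, 2), "ac": (1, 0, 1), "b2c": (0, 2, 1), "b2": (0, 2, 0),
    "bc3": (0, 1, 3), "1": (0, 0, 0),
}

def lex_key(e):
    # Compare exponents from the left; the larger exponent wins.
    return e

def deglex_key(e):
    # Total degree first, ties broken by lex.
    return (sum(e), e)

def degrevlex_key(e):
    # Total degree first; ties broken from the right, where the SMALLER exponent wins.
    return (sum(e), tuple(-x for x in reversed(e)))

def ordered(key):
    """Monomial names sorted from greatest to least under the given ordering."""
    return sorted(monomials, key=lambda name: key(monomials[name]), reverse=True)

for name, key in [("lex", lex_key), ("deglex", deglex_key), ("degrevlex", degrevlex_key)]:
    print(name, " > ".join(ordered(key)))
```

Negating the reversed exponent vector is what turns "smaller exponent from the right" into a key that can be sorted in the same direction as the other two.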
Example 4.1 Let f = 2x^2yz + 3xy^3 − 2x^3. The effects of different term orderings on f
are:
• lex x > y > z: f = −2x^3 + 2x^2yz + 3xy^3
• deglex x > y > z: f = 2x^2yz + 3xy^3 − 2x^3
• degrevlex x > y > z: f = 3xy^3 + 2x^2yz − 2x^3
Definition 4.7 The leading term is the first term in a term-ordered polynomial. Likewise,
the leading coefficient is the coefficient of the leading term. Finally, the leading monomial
is the leading term without its coefficient. We use the following notation:
lt(f): Leading Term (4.8)
lc(f): Leading Coefficient (4.9)
lm(f): Leading Monomial (4.10)
tail(f) = f − lt(f) (4.11)
Example 4.2
f = 3a^2b + 2ab + 4bc (4.12)
lt(f) = 3a^2b (4.13)
lc(f) = 3 (4.14)
lm(f) = a^2b (4.15)
tail(f) = 2ab + 4bc (4.16)
Polynomial division is an operation over polynomials that is dependent on the imposed
monomial ordering. Dividing a polynomial f by another polynomial g cancels the leading
term of f to derive a new polynomial.
Definition 4.8 Let K be a field and let f, g ∈ K[x1, x2, . . . , xd] be polynomials over the
field. Polynomial division of f by g computes a remainder r as:
r = f − (lt(f)/lt(g)) · g (4.17)
This division is denoted f \xrightarrow{g} r. If lt(f)/lt(g) is nonzero, i.e., lt(g) divides lt(f),
then f is considered divisible by g, i.e., g | f.
Notice that if g ∤ f, that is, if f is not divisible by g, then the division operation gives r = f.
Example 4.3 Over R[x, y, z], set the lex term order x > y > z. Let f = −2x^3 + 2x^2yz +
3xy^3 and g = x^2 + yz. Since
lt(f)/lt(g) = −2x^3/x^2 = −2x
is nonzero, g | f. The division, f \xrightarrow{g} r, is computed as:
r = f − (lt(f)/lt(g)) · g = −2x^3 + 2x^2yz + 3xy^3 − (−2x · (x^2 + yz))
= −2x^3 + 2x^2yz + 3xy^3 − (−2x^3 − 2xyz) = 2x^2yz + 3xy^3 + 2xyz (4.20)
Notice that the division cancels the leading term of f.
4.2 Varieties and Ideals
In computer-algebra based formal verification, it is often necessary to analyze the
presence or absence of solutions to a given system of constraints. In our applications,
these constraints are polynomials and their solutions are modeled as varieties.
Definition 4.9 Let K be a field, and let f1, . . . , fs ∈ K[x1, x2, . . . , xd]. We call V(f1, . . . , fs)
the affine variety defined by f1, . . . , fs:
V(f1, . . . , fs) = {(a1, . . . , ad) ∈ K^d : fi(a1, . . . , ad) = 0, ∀i, 1 ≤ i ≤ s} (4.21)
V(f1, . . . , fs) ⊆ K^d is the set of all solutions in K^d of the system of equations:
f1(x1, . . . , xd) = · · · = fs(x1, . . . , xd) = 0
Example 4.4 Given R[x, y], V(x^2 + y^2) is the set of all elements that satisfy x^2 + y^2 = 0
over R^2. So V(x^2 + y^2) = {(0, 0)}. Similarly, in R[x, y], V(x^2 + y^2 − 1) = {all points on
the circle x^2 + y^2 − 1 = 0}. Note that varieties depend on which field we are operating on. For the
same polynomial x^2 + 1, we have:
• In R[x], V(x^2 + 1) = ∅.
• In C[x], V(x^2 + 1) = {±i}.
The above example shows that the variety can be infinite, finite (nonempty set) or
empty. Note that any finite set of points is a variety. Likewise, any variety over F_q is
finite (or empty). Consider the points {(a1, . . . , ad) : a1, . . . , ad ∈ F_q} in F_q^d. Any
single point is a variety of some polynomial system: e.g., (a1, . . . , ad) is the variety of
x1 − a1 = x2 − a2 = · · · = xd − ad = 0. Finite unions and finite intersections of
varieties are also varieties.
Example 4.5 Let U = V (f1, . . . , fs) and W = V (g1, . . . , gt) in Fq. Then:
• U ∩W = V (f1, . . . , fs, g1, . . . , gt)
• U ∪W = V (figj : 1 ≤ i ≤ s, 1 ≤ j ≤ t)
One important distinction we need to make about varieties is that a variety depends not
just on the given system of polynomial equations, but rather on the ideal generated by the
polynomials.
Definition 4.10 A subset I ⊂ K[x1, x2, . . . , xd] is an ideal if it satisfies:
• 0 ∈ I
• I is closed under addition: x, y ∈ I ⇒ x+ y ∈ I
• If x ∈ K[x1, x2, . . . , xd] and y ∈ I , then x · y ∈ I and y · x ∈ I
An ideal is generated by its basis or generators.
Definition 4.11 Let f1, f2, . . . , fs be polynomials of the ring K[x1, x2, . . . , xd]. Let I be the
ideal generated by f1, f2, . . . , fs. Then:
I = 〈f1, . . . , fs〉 = {h1f1 + h2f2 + . . . + hsfs : h1, . . . , hs ∈ K[x1, . . . , xd]}
The polynomials f1, . . . , fs are called the basis (or generators) of the ideal I, and
correspondingly I is denoted as I = 〈f1, f2, . . . , fs〉.
Example 4.6 The set of even integers, which is a subset of the ring of integers Z, forms an
ideal of Z. This can be seen from the following:
• 0 belongs to the set of even integers.
• The sum of two even integers x and y is always an even integer.
• The product of any integer x with an even integer y is always an even integer.
Example 4.7 Given R[x, y], I = 〈x, y〉 is an ideal containing all polynomials generated
by x and y, such as x^2 + y and x + x·y. J = 〈x^2, y^2〉 is an ideal containing all polynomials
generated by x^2 and y^2, such as x^2 + y^3 and x^10 + x^2·y^2. Notice that J ⊂ I because every
polynomial generated by J can be generated by I; but I ≠ J because x + y belongs to I
and not to J.
The same ideal may have many different bases. For instance, it is possible to have
different sets of polynomials {f1, . . . , fs} and {g1, . . . , gt} that may generate the same
ideal, i.e., 〈f1, . . . , fs〉 = 〈g1, . . . , gt〉. Since variety depends on the ideal, these sets of
polynomials have the same solutions.
Proposition 4.1 If f1, . . . , fs and g1, . . . , gt are bases of the same ideal in F[x1, . . . , xd],
so that 〈f1, . . . , fs〉 = 〈g1, . . . , gt〉, then V (f1, . . . , fs) = V (g1, . . . , gt).
Example 4.8 Consider the two bases F1 = {2x^2 + 3y^2 − 11, x^2 − y^2 − 3} and F2 =
{x^2 − 4, y^2 − 1}. These two bases generate the same ideal, i.e., 〈F1〉 = 〈F2〉. Therefore,
they represent the same variety, i.e.,
V(F1) = V(F2) = {(±2, ±1)} (4.22)
Ideals and their varieties are a key part of computer-algebra based formal verifica-
tion. A given hardware design can be transformed into a set of polynomials over a field,
f1, . . . , fs ∈ F (we showed how this is done for Galois field arithmetic circuits in the





Using algebra, it is possible to derive new equations from the original system. The ideal
〈f1, . . . , fs〉 provides a way of analyzing such consequences of a system of polynomials.
Example 4.9 Given two equations in R[x, y, z]:
x = z + 1
y = x^2 + 1
we can eliminate x to obtain a new equation:
y = (z + 1)^2 + 1 = z^2 + 2z + 2
Let f1, f2, h ∈ R[x, y, z] be polynomials based on these equations:
f1 = x − z − 1 = 0
f2 = y − x^2 − 1 = 0
h = y − z^2 − 2z − 2 = 0
If I is the ideal generated by f1 and f2, i.e., I = 〈f1, f2〉, then we find h ∈ I as follows:
g1 = x + z + 1
g2 = 1
h = g1 · f1 + g2 · f2 = y − z^2 − 2z − 2
where g1, g2 ∈ R[x, y, z]. Thus, we call h a member of the ideal I.
Let K be any field and let a = (a1, . . . , ad) ∈ Kd be a point, and f ∈ K[x1, . . . , xd] be
a polynomial. We say that f vanishes on a if f(a) = 0, i.e., a is in the variety of f .
Definition 4.12 For any variety V of K^d, the ideal of polynomials that vanish on V, called
the vanishing ideal of V, is defined as I(V) = {f ∈ K[x1, . . . , xd] : ∀a ∈ V, f(a) = 0}.
Proposition 4.2 If a polynomial f vanishes on a variety V , then f ∈ I(V ).
Example 4.10 Let ideal J = 〈x^2, y^2〉. Then V(J) = {(0, 0)}. All polynomials in J will
obviously agree with the solution and vanish on this variety. However, the polynomials
x, y are not in J but they also vanish on this variety. Therefore, I(V(J)) is the set of all
polynomials that vanish on V(J), and the polynomials x, y are members of I(V(J)).
Definition 4.13 Let J ⊂ K[x1, . . . , xd] be an ideal. The radical of J is defined as
√J = {f ∈ K[x1, . . . , xd] : ∃m ∈ N, f^m ∈ J}.
Example 4.11 Let J = 〈x^2, y^2〉 ⊂ K[x, y]. Note neither x nor y belongs to J, but they
belong to √J. Similarly, x·y ∉ J, but since (x·y)^2 = x^2·y^2 ∈ J, x·y ∈ √J.
When J = √J, then J is said to be a radical ideal. Moreover, I(V) is a radical
ideal. By analyzing the ideal J, generated by a system of polynomials derived from a
hardware design, its variety V(J), and the ideal of polynomials that vanish over this variety,
I(V(J)), we can reason about the existence of certain properties of the design. To check
for the existence of a property, we formulate the property as a polynomial and then perform
an ideal membership test to determine if this polynomial is contained within the ideal
I(V(J)). A Gröbner basis provides a decision procedure for performing this test, which
is described in the following section. A future section focuses on Hilbert’s Nullstellensatz,
which describes the properties of the ideal of a variety, I(V(J)).
4.3 Gröbner Bases
As mentioned earlier, different polynomial sets may generate the same ideal. Some
of these generating sets may be a better representation of the ideal, and thus provide
more information and insight into the properties of the ideal. One such ideal representation
is a Gröbner basis, which has a number of important properties that can solve numerous
polynomial decision questions:
• Presence or absence of solutions (varieties)
• Dimension of the varieties
• Ideal membership of a polynomial
In essence, a Gröbner basis (once reduced, as defined later in this section) is a canonical
representation of an ideal. There are many equivalent definitions of Gröbner bases, so we
start with the definition that best describes their properties.
Definition 4.14 A set of nonzero polynomials G = {g1, . . . , gt}, which generate the ideal
I = 〈g1, . . . , gt〉, is called a Gröbner basis for I if and only if for all f ∈ I where f ≠ 0,
there exists a gi ∈ G such that lm(gi) divides lm(f).
G = GröbnerBasis(I) ⇐⇒ ∀f ∈ I : f ≠ 0, ∃gi ∈ G : lm(gi) | lm(f) (4.23)
The foundation for computing the Gröbner basis of an ideal was laid out by Buchberger
[104]. Given a set of polynomials F = {f1, . . . , fs} that generate ideal I = 〈f1, . . . , fs〉,
Buchberger gives an algorithm to compute a Gröbner basis G = {g1, . . . , gt}. This
algorithm relies on the notions of S-polynomials and polynomial reduction.
Definition 4.15 For f, g ∈ K[x1, . . . , xd], an S-polynomial Spoly(f, g) is defined as:
Spoly(f, g) = (L/lt(f)) · f − (L/lt(g)) · g (4.24)
where L = lcm(lt(f), lt(g)).
Note that lcm denotes least common multiple.
Definition 4.16 The reduction of a polynomial f by another polynomial g to a reduced
form r is denoted:
f \xrightarrow{g}_+ r
Similarly, f \xrightarrow{F}_+ r
represents the reduced polynomial r resulting from f as reduced by a set of nonzero
polynomials F = {f1, . . . , fs}. The polynomial r is considered reduced if r = 0 or no
term in r is divisible by any lm(fi), fi ∈ F.
The reduction process f \xrightarrow{F}_+ r, of dividing a polynomial f by a set of polynomials
F, can be modeled as repeated long-division of f by each of the polynomials in F until no
further reductions can be made. The result of this process is then r. This reduction process
is shown in Algorithm 2.
The reduction algorithm keeps canceling the leading terms of polynomials until no
more leading terms can be canceled. So the key step is p = p − (lt(p)/lt(fi)) · fi, as
the following example shows.
Example 4.12 Given f = y^2 − x and f1 = y − x in Q[x, y] with deglex: y > x, perform
f \xrightarrow{f1}_+ r:
1. f = y^2 − x: f − (lt(f)/lt(f1)) · f1 = y^2 − x − (y^2/y) · (y − x) = y·x − x
2. f = y·x − x: f − (lt(f)/lt(f1)) · f1 = y·x − x − (y·x/y) · (y − x) = x^2 − x
3. f = x^2 − x: no more operations possible, so r = x^2 − x
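The steps of Example 4.12 can be replayed with a small dictionary-based model of polynomials. This is a sketch under our own assumptions: a polynomial is a dict from exponent tuples (deg_y, deg_x) to coefficients, the ordering is deglex with y > x, and the function names are illustrative. Only the remainder is computed; the quotients are not tracked.

```python
from fractions import Fraction

def deglex_key(m):
    # Total degree first, then lex on (deg_y, deg_x), which encodes y > x.
    return (sum(m), m)

def lt(p):
    """Leading (monomial, coefficient) of a nonzero polynomial."""
    m = max(p, key=deglex_key)
    return m, p[m]

def sub_scaled(p, c, m, q):
    """Return p - c * x^m * q, dropping terms that cancel."""
    out = dict(p)
    for mq, cq in q.items():
        e = tuple(a + b for a, b in zip(m, mq))
        out[e] = out.get(e, 0) - c * cq
        if out[e] == 0:
            del out[e]
    return out

def reduce_poly(f, F):
    """Remainder of f on division by the ordered list F."""
    p, r = dict(f), {}
    while p:
        m, c = lt(p)
        for g in F:
            mg, cg = lt(g)
            if all(a >= b for a, b in zip(m, mg)):       # lt(g) divides lt(p)
                quot = tuple(a - b for a, b in zip(m, mg))
                p = sub_scaled(p, Fraction(c) / cg, quot, g)
                break
        else:                                             # nothing divides: move lt(p) to r
            r[m] = c
            del p[m]
    return r

f = {(2, 0): 1, (0, 1): -1}      # y^2 - x
f1 = {(1, 0): 1, (0, 1): -1}     # y - x
r = reduce_poly(f, [f1])
print(r)                         # exponents (0, 2) and (0, 1): x^2 - x
```

The for/else construct mirrors the division flag of the reduction procedure: the else branch runs only when no polynomial in F has a leading term dividing lt(p).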
Algorithm 2: Polynomial Reduction
Input: f, f1, . . . , fs
Output: r, a1, . . . , as, such that f = a1 · f1 + · · · + as · fs + r.
a1 = a2 = · · · = as = 0; r = 0;
p := f;
while p ≠ 0 do
    i = 1;
    divisionmark = false;
    while i ≤ s && divisionmark = false do
        if fi can divide p then
            ai = ai + lt(p)/lt(fi);
            p = p − lt(p)/lt(fi) · fi;
            divisionmark = true;
        else
            i = i + 1;
        end
    end
    if divisionmark = false then
        r = r + lt(p);
        p = p − lt(p);
    end
end
Algorithm 3: Buchberger’s Algorithm
Input: F = {f1, . . . , fs}, such that I = 〈f1, . . . , fs〉, and term order >
Output: G = {g1, . . . , gt}, a Gröbner basis of I
G := F;
repeat
    G′ := G;
    for each pair {fi, fj}, i ≠ j in G′ do
        Spoly(fi, fj) \xrightarrow{G′}_+ r;
        if r ≠ 0 then
            G := G ∪ {r};
        end
    end
until G = G′;
With the notions of S-polynomials and polynomial reduction in place, we can now
present Buchberger’s algorithm for computing Gröbner bases [104]. Note that a fixed
monomial (term) ordering is required for a Gröbner basis computation to ensure that
polynomials are manipulated in a consistent manner.
Buchberger’s algorithm takes pairs of polynomials (fi, fj) in the basis G and combines
them into “S-polynomials” (Spoly(fi, fj)) to cancel leading terms. The S-polynomial is
then reduced (divided) by all elements of G to a remainder r, denoted as
Spoly(fi, fj) \xrightarrow{G}_+ r. This process is repeated for all unique pairs of polynomials,
including those created by newly added elements, until no new polynomials are generated,
ultimately constructing the Gröbner basis.
Example 4.13 Consider the ideal I ⊂ Q[x, y], I = 〈f1, f2〉, where f1 = yx − y, f2 =
y^2 − x. Assume a degree-lexicographic term ordering with y > x is imposed.
First, we need to compute Spoly(f1, f2) = y · f1 − x · f2 = x^2 − y^2. Then we conduct a
polynomial reduction x^2 − y^2 \xrightarrow{f2} x^2 − x \xrightarrow{f1} x^2 − x. Let f3 = x^2 − x. Then G is
updated as {f1, f2, f3}. Next we compute Spoly(f1, f3) = 0. So there is no new polynomial generated.
Similarly, we compute Spoly(f2, f3) = x·y^2 − x^3, followed by
x·y^2 − x^3 \xrightarrow{f1} y^2 − x^3 \xrightarrow{f2} x − x^3 \xrightarrow{f3}_+ 0.
Again, no polynomial is generated. Finally, G = {f1, f2, f3}.
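The computation of Example 4.13 can be replayed in a few lines. The sketch below is a simplified pair-queue variant of Buchberger's algorithm, not the repeat-until formulation of Algorithm 3, and rests on our own assumptions: polynomials are dicts from exponent tuples (deg_y, deg_x) to coefficients, the ordering is deglex with y > x, and all function names are illustrative.

```python
from fractions import Fraction

def deglex_key(m):
    return (sum(m), m)           # total degree, then lex with y > x

def lt(p):
    m = max(p, key=deglex_key)
    return m, p[m]

def sub_scaled(p, c, m, q):
    """Return p - c * x^m * q, dropping terms that cancel."""
    out = dict(p)
    for mq, cq in q.items():
        e = tuple(a + b for a, b in zip(m, mq))
        out[e] = out.get(e, 0) - c * cq
        if out[e] == 0:
            del out[e]
    return out

def reduce_poly(f, F):
    """Remainder of f on division by the ordered list F."""
    p, r = dict(f), {}
    while p:
        m, c = lt(p)
        for g in F:
            mg, cg = lt(g)
            if all(a >= b for a, b in zip(m, mg)):
                p = sub_scaled(p, Fraction(c) / cg, tuple(a - b for a, b in zip(m, mg)), g)
                break
        else:
            r[m] = c
            del p[m]
    return r

def spoly(f, g):
    """Spoly(f, g) = (L / lt(f)) * f - (L / lt(g)) * g with L the lcm of the leading monomials."""
    (mf, cf), (mg, cg) = lt(f), lt(g)
    L = tuple(max(a, b) for a, b in zip(mf, mg))
    left = sub_scaled({}, Fraction(-1) / cf, tuple(a - b for a, b in zip(L, mf)), f)
    return sub_scaled(left, Fraction(1) / cg, tuple(a - b for a, b in zip(L, mg)), g)

def buchberger(F):
    G = [dict(f) for f in F]
    pairs = [(i, j) for j in range(len(G)) for i in range(j)]
    while pairs:
        i, j = pairs.pop()
        r = reduce_poly(spoly(G[i], G[j]), G)
        if r:                                    # nonzero remainder: extend the basis
            pairs += [(k, len(G)) for k in range(len(G))]
            G.append(r)
    return G

f1 = {(1, 1): 1, (1, 0): -1}     # yx - y
f2 = {(2, 0): 1, (0, 1): -1}     # y^2 - x
G = buchberger([f1, f2])
print(G[2])                      # the added element: x^2 - x
```

Every S-polynomial formed after f3 = x^2 − x is added reduces to zero, so the loop stops with three basis elements, as in the example.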
When computing a Gröbner basis, it is important to note that if lt(fi) and lt(fj) have no
common variables, the S-poly reduction step in Buchberger’s algorithm,
Spoly(fi, fj) \xrightarrow{G′}_+ r, will produce r = 0.
To see this, let lt(f) and lt(g) have no common variables, so that
L = lcm(lt(f), lt(g)) = lt(f) · lt(g). Then:
Spoly(f, g) = (L/lt(f)) · f − (L/lt(g)) · g
= (lt(f) · lt(g)/lt(f)) · f − (lt(f) · lt(g)/lt(g)) · g = lt(g) · f − lt(f) · g
Thus, every monomial in Spoly(f, g) is divisible by either lt(f) or lt(g), so computing
Spoly(f, g) \xrightarrow{f,g}_+ r will give r = 0.
As mentioned previously, a Gröbner basis gives a decision procedure to test for
polynomial membership in an ideal. This is explained in the following theorem.
Theorem 4.1 [Ideal membership test] Let G = {g1, · · · , gt} be a Gröbner basis for an
ideal I ⊂ K[x1, · · · , xd] and let f ∈ K[x1, . . . , xd]. Then f ∈ I if and only if the remainder
on division of f by G is zero.
In other words,
f ∈ I ⇐⇒ f \xrightarrow{G}_+ 0 (4.25)
Example 4.14 Consider Example 4.13. Let f = y^2x − x be another polynomial. Note
that f = y·f1 + f2, so f ∈ I. If we divide f by f1 first and then by f2, we will obtain a
zero remainder. However, since the set {f1, f2} is not a Gröbner basis, dividing f by f2
first and then by f1 gives the reduction f \xrightarrow{f2} x^2 − x \xrightarrow{f1} x^2 − x ≠ 0,
which does not lead to a zero remainder. However, if we compute the Gröbner basis G of I,
G = {x^2 − x, yx − y, y^2 − x}, dividing f by polynomials in G in any order will always lead
to the zero remainder. Therefore, one can decide ideal membership unequivocally using the
Gröbner basis.
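The order dependence described in Example 4.14 can be observed directly. The sketch below reuses the same hypothetical dictionary model of polynomials (exponent tuples (deg_y, deg_x), deglex with y > x); the function names are our own.

```python
from fractions import Fraction
from itertools import permutations

def deglex_key(m):
    return (sum(m), m)           # total degree, then lex with y > x

def lt(p):
    m = max(p, key=deglex_key)
    return m, p[m]

def sub_scaled(p, c, m, q):
    """Return p - c * x^m * q, dropping terms that cancel."""
    out = dict(p)
    for mq, cq in q.items():
        e = tuple(a + b for a, b in zip(m, mq))
        out[e] = out.get(e, 0) - c * cq
        if out[e] == 0:
            del out[e]
    return out

def reduce_poly(f, F):
    """Remainder of f on division by the ordered list F (earlier entries are tried first)."""
    p, r = dict(f), {}
    while p:
        m, c = lt(p)
        for g in F:
            mg, cg = lt(g)
            if all(a >= b for a, b in zip(m, mg)):
                p = sub_scaled(p, Fraction(c) / cg, tuple(a - b for a, b in zip(m, mg)), g)
                break
        else:
            r[m] = c
            del p[m]
    return r

f1 = {(1, 1): 1, (1, 0): -1}     # yx - y
f2 = {(2, 0): 1, (0, 1): -1}     # y^2 - x
f3 = {(0, 2): 1, (0, 1): -1}     # x^2 - x
f = {(2, 1): 1, (0, 1): -1}      # y^2 x - x, a member of I = <f1, f2>

print(reduce_poly(f, [f1, f2]))  # empty dict: zero remainder
print(reduce_poly(f, [f2, f1]))  # x^2 - x: nonzero, although f is in I

# With the Groebner basis {f1, f2, f3}, every division order gives zero.
remainders = [reduce_poly(f, list(order)) for order in permutations([f1, f2, f3])]
print(all(rem == {} for rem in remainders))
```

The remainder with respect to {f2, f1} is exactly the element f3 that Buchberger's algorithm adds, which is why adding it removes the order dependence.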
A Gröbner basis is not a canonical representation of an ideal, but a reduced Gröbner
basis is. To compute a reduced Gröbner basis, we first must compute a minimal Gröbner
basis.
Definition 4.17 A minimal Gröbner basis for a polynomial ideal I is a Gröbner basis G
for I such that
• lc(gi) = 1, ∀gi ∈ G
• ∀gi ∈ G, lt(gi) ∉ 〈lt(G − {gi})〉
A minimal Gröbner basis is a Gröbner basis such that all polynomials have a leading
coefficient of 1 and no leading term of any element in G divides the leading term of another
element in G. Given a Gröbner basis G, a minimal Gröbner basis can be computed as follows:
1. Normalize every gi ∈ G, i.e., gi = gi/lc(gi)
2. For gi, gj ∈ G where i ≠ j, remove gi from G if lt(gj) | lt(gi), i.e., remove every
polynomial in G whose leading term is divisible by the leading term of some other
polynomial in G
A minimal Gröbner basis can then be further reduced.
Definition 4.18 A reduced Gröbner basis for a polynomial ideal I is a Gröbner basis
G = {g1, . . . , gt} such that:
• lc(gi) = 1, ∀gi ∈ G
• ∀gi ∈ G, no monomial of gi lies in 〈lt(G − {gi})〉
G is a reduced Gröbner basis when no monomial of any element in G is divisible by the
leading term of another element. This reduction is achieved as follows:
Definition 4.19 Let H = {h1, . . . , ht} be a minimal Gröbner basis. Apply the following
reduction process:
• h1 \xrightarrow{G1}_+ g1, where g1 is reduced w.r.t. G1 = {h2, . . . , ht}
• h2 \xrightarrow{G2}_+ g2, where g2 is reduced w.r.t. G2 = {g1, h3, . . . , ht}
• h3 \xrightarrow{G3}_+ g3, where g3 is reduced w.r.t. G3 = {g1, g2, h4, . . . , ht}
...
• ht \xrightarrow{Gt}_+ gt, where gt is reduced w.r.t. Gt = {g1, g2, g3, . . . , gt−1}
Then G = {g1, . . . , gt} is a reduced Gröbner basis.
Subject to the given term order >, such a reduced Gröbner basis G = {g1, . . . , gt} is a
unique canonical representation of the ideal, as given by Proposition 4.3 below.
Proposition 4.3 [10] Let I ≠ {0} be a polynomial ideal. Then, for a given monomial
ordering, I has a unique reduced Gröbner basis.
Gröbner basis computation depends on the Spoly computation, which in turn depends
on the leading terms of polynomials. Thus, different monomial orderings can result in
different Gröbner basis computations for the same ideal. Computation using a degrevlex
ordering tends to be least difficult, while lex ordering tends to be computationally complex.
However, lex ordering used in the computation of a Gröbner basis is an elimination
ordering; that is, the polynomials contained in the resulting Gröbner basis have successively
eliminated variables in the ordering. This is the topic of elimination theory, which is
described in the following section.
4.4 Elimination Theory
Elimination theory uses elimination orderings to systematically eliminate variables
from a system of polynomial equations.
Definition 4.20 Let I be an ideal in K[x_1, . . . , x_k]. The i-th elimination ideal I_i is the
ideal of K[x_{i+1}, . . . , x_k] defined by
I_i = I ∩ K[x_{i+1}, . . . , x_k] (4.26)
The elimination ideal I_i has eliminated all the variables x_1, . . . , x_i, i.e., it only contains
polynomials with variables in x_{i+1}, . . . , x_k. We can generate elimination ideals by
computing Gröbner bases using elimination orderings.
Theorem 4.2 [Elimination theorem] Let I be an ideal in K[x_1, . . . , x_k] and let G be a
Gröbner basis of I with respect to the lex order (elimination order) x_1 > x_2 > · · · > x_k.
Then, for every 0 ≤ i ≤ k,
G_i = G ∩ K[x_{i+1}, . . . , x_k] (4.27)
is a Gröbner basis of the i-th elimination ideal I_i.
This can be better visualized using the following example.
Example 4.15 Given the following equations in ℝ[x, y, z]
x^2 + y + z = 1
x + y^2 + z = 1
x + y + z^2 = 1
let I be the ideal generated by these equations:
I = ⟨x^2 + y + z − 1, x + y^2 + z − 1, x + y + z^2 − 1⟩
The Gröbner basis for I with respect to lex order x > y > z is found to be G =
{g_1, g_2, g_3, g_4} where
g_1 = x + y + z^2 − 1
g_2 = y^2 − y − z^2 + z
g_3 = 2yz^2 + z^4 − z^2
g_4 = z^6 − 4z^4 + 4z^3 − z^2
Notice that while g_1 has variables in ℝ[x, y, z], g_2 and g_3 only have variables in ℝ[y, z]
and g_4 only has variables in ℝ[z]. Thus, G_1 = G ∩ ℝ[y, z] = {g_2, g_3, g_4} and G_2 =
G ∩ ℝ[z] = {g_4}.
Also notice that since g4 only contains variable z, and since g4 = 0, a solution for z
can be obtained. This solution can then be applied to g2 and g3 to obtain solutions for y,
and so on.
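The back-substitution described above can be sketched numerically. The following illustration is an assumption-laden sketch rather than part of the original example: it searches only for integer roots in a small window (the window `range(-3, 4)` is chosen arbitrarily), so the irrational roots of g_4 are not enumerated. It solves g_4 for z, then g_2 and g_3 for y, then g_1 for x, and checks each triple against the original system.

```python
def g1(x, y, z): return x + y + z**2 - 1
def g2(y, z): return y**2 - y - z**2 + z
def g3(y, z): return 2*y*z**2 + z**4 - z**2
def g4(z): return z**6 - 4*z**4 + 4*z**3 - z**2


def original_system(x, y, z):
    """The three equations of Example 4.15."""
    return (x**2 + y + z == 1) and (x + y**2 + z == 1) and (x + y + z**2 == 1)


window = range(-3, 4)  # assumed search window; integer roots only

solutions = []
for z in window:
    if g4(z) != 0:          # g4 involves z alone
        continue
    for y in window:
        if g2(y, z) != 0 or g3(y, z) != 0:   # g2, g3 involve y and z
            continue
        x = 1 - y - z**2    # g1 is linear in x
        solutions.append((x, y, z))

all_valid = all(original_system(*s) for s in solutions)
print(sorted(solutions), all_valid)
```

Each triple found this way satisfies all three original equations, which is what the elimination theorem leads us to expect.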
Elimination theory provides the basis for our abstraction approach.
4.5 Hilbert’s Nullstellensatz
In this section, we further describe some correspondence between ideals and varieties
in the context of algebraic geometry. The celebrated results of Hilbert’s Nullstellensatz
establish these correspondences.
Definition 4.21 A field K is an algebraically closed field if every polynomial in one variable
with degree at least 1, with coefficients in K, has a root in K.
In other words, any nonconstant polynomial in K[x] always has at least one
root in K. Every field K is contained in an algebraically closed field $\overline{K}$. For example, the
field of real numbers ℝ is not an algebraically closed field, because x^2 + 1 = 0 has no
root in ℝ. However, x^2 + 1 = 0 has roots in the field of complex numbers ℂ, which is an
algebraically closed field. In fact, ℂ is the algebraic closure of ℝ. Every algebraically closed
field is an infinite field.
The strong Nullstellensatz establishes the correspondence between radical ideals and varieties.
Theorem 4.3 [The strong Nullstellensatz [10]] Let K be an algebraically closed field, and
let J be an ideal in K[x_1, . . . , x_d]. Then, we have $I(V_K(J)) = \sqrt{J}$.
The strong Nullstellensatz takes a special form over Galois fields F_q. Recall the notion
of vanishing polynomials over Galois fields from the previous chapter: for every element
A ∈ F_q, A^q − A = 0; hence the polynomial x^q − x in F_q[x] vanishes over F_q. Thus, if
J_0 = ⟨x^q − x⟩ is the ideal generated by the vanishing polynomial, V(J_0) = F_q. Similarly,
over F_q[x_1, . . . , x_d], J_0 is ⟨x_1^q − x_1, . . . , x_d^q − x_d⟩ and V(J_0) = (F_q)^d.
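The vanishing property can be checked exhaustively for a small field. The sketch below assumes F_4 built with P(x) = x^2 + x + 1 and encodes an element as an integer whose bit j is the coefficient of α^j; the helper names `gf_mul` and `gf_pow` are hypothetical and not taken from the original text.

```python
def gf_mul(a, b, k=2, poly=0b111):
    """Multiply in F_{2^k}; poly encodes P(x), default x^2 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if (a >> k) & 1:         # degree reached k: reduce modulo P(x)
            a ^= poly
    return result


def gf_pow(a, e, k=2, poly=0b111):
    """Raise a to the e-th power by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a, k, poly)
    return result


q = 4
# x^q - x vanishes on F_q exactly when A^q == A for every element A.
vanishes = all(gf_pow(A, q) == A for A in range(q))
print(vanishes)
```

The same check works for larger k if `k` and `poly` are set to match the chosen irreducible polynomial.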
Definition 4.22 Given two ideals I_1 = ⟨f_1, . . . , f_s⟩ and I_2 = ⟨g_1, . . . , g_t⟩, the sum of
ideals is I_1 + I_2 = ⟨f_1, . . . , f_s, g_1, . . . , g_t⟩.
Theorem 4.4 [Strong Nullstellensatz over F_q] For any Galois field F_q, let J ⊂ F_q[x_1, . . . , x_d]
be any ideal and let J_0 = ⟨x_1^q − x_1, . . . , x_d^q − x_d⟩ be the ideal of all vanishing polynomials. Let
V_{F_q}(J) denote the variety of J over F_q. Then, I(V_{F_q}(J)) = J + J_0.
The proof rests on the following observations:
1. $\sqrt{J + J_0} = J + J_0$. That is, J + J_0 is a radical ideal.
2. V_{F_q}(J) = V_{F_q}(J + J_0).
3. Due to (2), I(V_{F_q}(J)) = I(V_{F_q}(J + J_0)). By Thm. 4.3, this is equivalent to $\sqrt{J + J_0}$.
Finally, due to (1), this is equivalent to J + J_0.
4.6 Concluding Remarks
Our approach to word-level abstraction of Galois field arithmetic circuits applies concepts
of polynomial ideals, varieties, Gröbner bases, and elimination theory to abstract a
word-level representation of the circuit. This approach is described in the next chapter.
However, a Gro¨bner basis computation is prohibitively expensive; thus we propose im-




In this chapter, we introduce our approach to abstract word-level canonical representations
of combinational circuits using methods based on computer algebra and algebraic
geometry. Given a bit-level implementation of a combinational, acyclic circuit C that
implements some unknown function f : F_{2^k}^n → F_{2^k}, where Z is the k-bit output and
A_1, . . . , A_n are the k-bit inputs, the task is to find the canonical, word-level representation implemented
by C, Z = F(A_1, . . . , A_n); that is, to find a canonical representation of the polynomial F
in terms of A_1, . . . , A_n. For example, a combinational circuit with one word-level k-bit
input A and k-bit output Z, which computes Z = F(A) over F_{2^k}, is shown in Fig. 5.1.
By modeling the arithmetic circuit as a polynomial system in F_{2^k}[x_1, x_2, · · · , x_d], the
abstraction problem can be solved using a Gröbner basis computation with an abstraction
term order.
5.1 Problem Statement
• Given a gate-level combinational, acyclic circuit C with n word-level k-bit inputs,
A_1, . . . , A_n ∈ F_{2^k} and one k-bit output Z.
• Pick a primitive, irreducible polynomial P(x) over F_2[x] of degree k to construct
F_{2^k} (these polynomials are known). Let P(α) = 0, where α ∈ F_{2^k} is a root of the
irreducible polynomial P.
• The bit-level primary inputs of the circuit are denoted {a^i_0, a^i_1, . . . , a^i_{k−1}}, for 1 ≤
i ≤ n; the bit-level primary outputs are {z_0, . . . , z_{k−1}} = Z. Note that all a^i_j, z_j ∈ F_2
for 0 ≤ j < k.
• Find the word-level polynomial function F over F_{2^k} computed by C in the form of
Z = F(A_1, . . . , A_n).
Figure 5.1: Derive the abstraction Z = F(A).
In order for a combinational circuit C to compute a function f over the Galois field
F2k , C must have any number of k-bit word-level inputs and one k-bit word-level output.
Since every function over F2k is a polynomial function, C has a word-level polynomial
representation over F2k . The goal is to derive this word-level polynomial representation F
computed by a given combinational circuit over a given F2k .
For the purpose of explaining the proposed abstraction approach, this chapter explores
its application to Galois field multiplier circuits, which are described in Chapter 3, as they
form the core of most cryptographic computations in ECC and are notoriously hard to
verify. In this case, P (x) is chosen to be the same primitive polynomial over which the
circuit was designed.
Example 5.1 Consider our problem statement as it applies to a multiplier circuit over F_{2^k}.
The specification of the circuit is unknown.
• Given the Galois field F_{2^k} and the corresponding irreducible polynomial P(x). Let
P(α) = 0.
• Given a gate-level combinational circuit. The bit-level primary inputs of the circuit
are {a_0, . . . , a_{k−1}, b_0, . . . , b_{k−1}}, and the bit-level primary outputs are {z_0, . . . , z_{k−1}};
thus all a_i, b_i, z_i ∈ F_2.
• A and B denote the k-bit word-level inputs and Z is the k-bit word-level output.
Therefore, A = a_0 + a_1 α + · · · + a_{k−1} α^{k−1}, B = b_0 + b_1 α + · · · + b_{k−1} α^{k−1}, and
Z = z_0 + z_1 α + · · · + z_{k−1} α^{k−1} with A, B, Z ∈ F_{2^k}.
• Find the word-level polynomial function F that this circuit implements over F_{2^k}. The
polynomial must be in the form of Z = F(A, B). Since, in this case, the circuit is a
multiplier, the resulting polynomial will be Z = A · B.
5.2 Circuit Polynomial Modeling
Given a gate-level implementation of a circuit, we map each gate-level Boolean oper-
ator in the circuit (NOT , AND, OR, XOR) to a polynomial over F2 using the following
one-to-one mapping over B→ F2 :
NOT : ¬a → a + 1 (mod 2)
AND : a ∧ b → a · b (mod 2)
OR : a ∨ b → a + b + a · b (mod 2)
XOR : a ⊕ b → a + b (mod 2) (5.1)
where a, b ∈ F2 = {0, 1}. Note that the equation c = F(a, b) is written in polynomial form
as c−F(a, b) = c+ F(a, b) = 0, as −1 ≡ +1 (mod 2).
Example 5.2 Consider the equation with Boolean operators:
z = a⊕ (b ∨ c)
This equation modeled over F2 is:
z + a+ b+ c+ b · c = 0
Notice that the left-hand side expression is a polynomial in F_2[a, b, c, z] ⊂ F_{2^k}[a, b, c, z].
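Because the mapping of Eqn. (5.1) is over a two-element domain, it can be confirmed by enumerating truth tables. The following is a small sketch under that assumption; the dictionary `gates` and the flags `mapping_ok` and `example_ok` are names introduced here for illustration only.

```python
from itertools import product

# Each entry pairs the Boolean gate with its polynomial model from Eqn. (5.1),
# evaluated modulo 2. NOT ignores its second argument.
gates = {
    "NOT": (lambda a, b: 1 - a, lambda a, b: (a + 1) % 2),
    "AND": (lambda a, b: a & b, lambda a, b: (a * b) % 2),
    "OR":  (lambda a, b: a | b, lambda a, b: (a + b + a * b) % 2),
    "XOR": (lambda a, b: a ^ b, lambda a, b: (a + b) % 2),
}

mapping_ok = all(
    boolean(a, b) == poly(a, b)
    for boolean, poly in gates.values()
    for a, b in product((0, 1), repeat=2)
)

# Example 5.2: z = a XOR (b OR c) holds exactly when
# z + a + b + c + b*c = 0 (mod 2).
example_ok = all(
    (z == (a ^ (b | c))) == ((z + a + b + c + b * c) % 2 == 0)
    for a, b, c, z in product((0, 1), repeat=4)
)
print(mapping_ok, example_ok)
```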
Secondly, we model the k-bit word-level inputs and the k-bit word-level output of the
given circuit as polynomial expressions in F_{2^k} as shown in the problem statement. If the
k-bit word-level output of the circuit is denoted Z, which is composed of bit-level outputs
z_0, . . . , z_{k−1}, the corresponding equation is:
Z = z_0 + z_1 α + · · · + z_{k−1} α^{k−1}
Once again, since −Z = +Z (mod 2), this equation is modeled as:
Z + z_0 + z_1 α + · · · + z_{k−1} α^{k−1} = 0
Likewise, for all word-level inputs A_1, . . . , A_n we have
A_1 + a^1_0 + a^1_1 α + · · · + a^1_{k−1} α^{k−1} = 0
...
A_n + a^n_0 + a^n_1 α + · · · + a^n_{k−1} α^{k−1} = 0
Overall, a combinational circuit composed of s Boolean gates with n k-bit inputs,
A_1, . . . , A_n, and one k-bit output Z, is modeled as a polynomial system over F_{2^k} as follows:
Bit-level circuit constraints:
f_1(x_1, x_2, · · · , x_d) = 0
f_2(x_1, x_2, · · · , x_d) = 0
...
f_s(x_1, x_2, · · · , x_d) = 0
Word-level designation:
f_Z : Z + z_0 + z_1 · α + · · · + z_{k−1} · α^{k−1} = 0
f_{A_1} : A_1 + a^1_0 + a^1_1 · α + · · · + a^1_{k−1} · α^{k−1} = 0
...
f_{A_n} : A_n + a^n_0 + a^n_1 · α + · · · + a^n_{k−1} · α^{k−1} = 0
Example 5.3 Consider a 2-bit multiplier over F_{2^2} with P(x) = x^2 + x + 1, given in Fig. 5.2.
Variables a_0, a_1, b_0, b_1 are primary inputs, z_0, z_1 are primary outputs, and c_0, c_1, c_2, c_3, r_0
are intermediate variables.














Figure 5.2: A 2-bit multiplier over F_{2^2}.
c0 = a0 ∧ b0
c1 = a0 ∧ b1
c2 = a1 ∧ b0
c3 = a1 ∧ b1
r0 = c1 ⊕ c2
z0 = c0 ⊕ c3
z1 = r0 ⊕ c3
With the mapping rules given in Eqn. (5.1), the above equations are transformed into
the following polynomials:
c0 + a0 · b0
c1 + a0 · b1
c2 + a1 · b0
c3 + a1 · b1
r0 + c1 + c2
z0 + c0 + c3
z1 + r0 + c3
Therefore, our overall polynomial system is:
Bit-level circuit constraints:
f_1 : c_0 + a_0 · b_0
f_2 : c_1 + a_0 · b_1
f_3 : c_2 + a_1 · b_0
f_4 : c_3 + a_1 · b_1
f_5 : r_0 + c_1 + c_2
f_6 : z_0 + c_0 + c_3
f_7 : z_1 + r_0 + c_3
Word-level designation:
f_A : A + a_0 + a_1 · α
f_B : B + b_0 + b_1 · α
f_Z : Z + z_0 + z_1 · α
Let S be the system of polynomials {f_1, . . . , f_s, f_{A_1}, . . . , f_{A_n}, f_Z} ⊂ F_{2^k}[x_1, . . . , x_d, Z, A_1, . . . , A_n], derived from
the hardware implementation of the Galois field arithmetic circuit over F_{2^k}. This circuit
performs some unknown function f over F_{2^k} in the form of Z = F(A_1, . . . , A_n), where Z
is the k-bit output and A_1, . . . , A_n are the k-bit inputs. The polynomial representation of F
over F_{2^k} is thus:
f_F : Z + F(A_1, . . . , A_n)
Since f_F is ultimately derived from the circuit implementation, it agrees with the
solution to the system of polynomials S = 0, i.e.:
f_1 = · · · = f_s = f_{A_1} = · · · = f_{A_n} = f_Z = 0
Thus, if we let J = ⟨f_1, . . . , f_s, f_{A_1}, . . . , f_{A_n}, f_Z⟩ be the ideal generated by S, f_F vanishes
on the variety V_{F_{2^k}}(J). Therefore, due to Proposition 4.2, f_F must be contained in the ideal
of polynomials that vanish on this variety, f_F ∈ I(V_{F_{2^k}}(J)).
By applying the strong Nullstellensatz over F_{2^k} (Thm. 4.4), I(V_{F_{2^k}}(J)) = J + J_0, where J_0
is the ideal generated by all vanishing polynomials in F_{2^k}. Recall that a vanishing polynomial
in F_q[x] is x^q − x, which equals x^q + x in characteristic 2. In our case, the bit-level variables x_1, . . . , x_d take values in F_2 and the
word-level variables A_1, . . . , A_n, Z take values in F_{2^k}. Thus, for F_{2^k}[x_1, . . . , x_d, A_1, . . . , A_n, Z]:
J_0 = ⟨x_1^2 + x_1, . . . , x_d^2 + x_d, A_1^{2^k} + A_1, . . . , A_n^{2^k} + A_n, Z^{2^k} + Z⟩
The generators of the ideal sum J + J_0 are simply the combination of the generators of
J and the generators of J_0.
Example 5.4 Let us reconsider Example 5.3. First, polynomials are extracted from the
circuit implementation as shown in Example 5.3. These polynomials generate the ideal J.
Together with the generators of the ideal of vanishing polynomials J_0, the following polynomials are the
generators of J + J_0 for the multiplier circuit.
Bit-level circuit constraints (⊂ J):
f_1 : c_0 + a_0 · b_0
f_2 : c_1 + a_0 · b_1
f_3 : c_2 + a_1 · b_0
f_4 : c_3 + a_1 · b_1
f_5 : r_0 + c_1 + c_2
f_6 : z_0 + c_0 + c_3
f_7 : z_1 + r_0 + c_3
Word-level designation (⊂ J):
f_A : A + a_0 + a_1 · α
f_B : B + b_0 + b_1 · α
f_Z : Z + z_0 + z_1 · α
Vanishing polynomials (J_0):
a_0^2 − a_0, a_1^2 − a_1, b_0^2 − b_0, b_1^2 − b_1
c_0^2 − c_0, c_1^2 − c_1, c_2^2 − c_2, c_3^2 − c_3
r_0^2 − r_0, z_0^2 − z_0, z_1^2 − z_1
A^4 − A, B^4 − B, Z^4 − Z
The variety V_{F_q}(J) is the set of all consistent assignments to the nets (signals) in the
circuit C. If we project this variety onto the word-level input and output variables of the
circuit C, we essentially generate the function F implemented by the circuit. Projection of
varieties from d-dimensional space F_q^d onto a lower dimensional subspace F_q^{d−l} is equivalent
to eliminating l variables from the corresponding ideal. This can be done by computing
a Gröbner basis of the ideal with an elimination ordering, as described in the elimination
theorem (Thm. 4.2). Thus, we can find the polynomial f_F : Z + F(A_1, . . . , A_n) by
computing the Gröbner basis of J + J_0 using the proper elimination ordering.
The proposed elimination order for abstraction is defined as the abstraction term
order.
Definition 5.1 Given a circuit C, let x_1, . . . , x_d denote all the bit-level variables, let A_1, . . . , A_n
denote the k-bit word-level inputs, and let Z denote the k-bit word-level output. Using the
partial variable order {x_1, . . . , x_d} > Z > {A_1, . . . , A_n}, where any refinement of the order
will do, impose a lex term order > on the polynomial ring R = F_q[x_1, . . . , x_d, Z, A_1, . . . , A_n].
This elimination term order > is defined as the abstraction term order. The relative
ordering among x_1, . . . , x_d is not important and can be chosen arbitrarily. Likewise, the
relative ordering among A_1, . . . , A_n is also unimportant.
Theorem 5.1 [Abstraction theorem] Using the setup and notations above, compute a Gröbner
basis G of ideal (J + J_0) using the abstraction term order >. Then:
(i) For every word-level input A_i, G must contain the vanishing polynomial A_i^q − A_i as
the only polynomial with A_i as its only variable;
(ii) G must contain a polynomial of the form Z + G(A_1, . . . , A_n); and
(iii) Z + G(A_1, . . . , A_n) is such that F(A_1, . . . , A_n) = G(A_1, . . . , A_n), ∀ A_1, . . . , A_n ∈ F_q.
Proof.
(i) A_i^q − A_i is a given generator of J_0. A_1, . . . , A_n are also the last variables in the
abstraction term order. Moreover, A_i is an input to the circuit, so A_i is an independent
variable. As a result, G_{d+1} = G ∩ F_{2^k}[A_1, . . . , A_n] = {A_1^q − A_1, . . . , A_n^q − A_n}.
(ii) Since f_F : Z + F(A_1, . . . , A_n) is a polynomial representation of the function of the
circuit, Z + F(A_1, . . . , A_n) ∈ J + J_0, as described above. Therefore, according to
the definition of a Gröbner basis, the leading term of Z + F(A_1, . . . , A_n) (which is
Z) should be divisible by the leading term of some polynomial g_i ∈ G. The only
way lt(g_i) can divide Z is when lt(g_i) = Z itself. Moreover, due to our abstraction
(lex) term order, Z > A_1 > · · · > A_n; so this polynomial must be of the form
Z + G(A_1, . . . , A_n).
(iii) As Z = F(A_1, . . . , A_n) represents the function of the circuit, Z + F(A_1, . . . , A_n) ∈
J + J_0. Moreover, V(J + J_0) ⊆ V(Z + F(A_1, . . . , A_n)). By projecting this variety
V(J + J_0) onto the coordinates corresponding to (A_1, . . . , A_n, Z), we obtain the
graph of the function (A_1, . . . , A_n) ↦ F(A_1, . . . , A_n) from F_{2^k}^n → F_{2^k}. Since
Z + G(A_1, . . . , A_n) is an element of the Gröbner basis of J + J_0, V(J + J_0) ⊆ V(Z +
G(A_1, . . . , A_n)). Therefore, Z = G(A_1, . . . , A_n) gives the same function as Z =
F(A_1, . . . , A_n), for all A_i ∈ F_{2^k}.
As a consequence of the abstraction theorem, computing a Gröbner basis G of J + J_0
using the abstraction term order finds a polynomial of the form Z + G(A_1, . . . , A_n) in the
Gröbner basis, such that Z = G(A_1, . . . , A_n) is a polynomial representation of the circuit.
However, if the Gröbner basis is not reduced, it is possible to obtain multiple polynomials
in G of the form Z + G_1(A_1, . . . , A_n), Z + G_2(A_1, . . . , A_n), . . . , all of which correspond
to the same function.
Corollary 5.1 By computing a reduced Gröbner basis G_r of J + J_0, G_r will contain one
and only one polynomial of the form Z + G(A_1, . . . , A_n), such that Z = G(A_1, . . . , A_n)
is the unique, minimal, canonical representation of the function F implemented by the
circuit.
Proof. Any function f : F_{2^k}^d → F_{2^k} has a unique canonical representation as a polynomial
P_f ∈ F_{2^k}[x_1, . . . , x_d] such that all its nonzero monomials are of the form x_1^{i_1} · · · x_d^{i_d} where
0 ≤ i_j ≤ q − 1, for all j = 1, . . . , d.
Let J_0 be the ideal of all polynomials that vanish over F_{2^k}[x_1, . . . , x_d]. The generators
of J_0 are polynomials of the form x_i^{2^{q_i}} − x_i, where q_i is the datapath size of x_i.
These generators also form a reduced Gröbner basis for J_0. This implies that the elements
A_h^{2^k} − A_h, 1 ≤ h ≤ n, will have to be part of the reduced Gröbner basis of J + J_0.
Corollary 1.8.6 in [10] shows that the obtained element F(A_1, . . . , A_n) is reduced
modulo A_h^{2^k} − A_h, 1 ≤ h ≤ n. Thus, the polynomial representation of F in the reduced
Gröbner basis is the unique canonical representation.
Example 5.5 Consider the 2-bit multiplier of Example 5.4, for which we have already generated
J + J_0. We apply the abstraction term order >, i.e., a lex order with “bit-level variables”
> “Output Z” > “Inputs A, B.”
When we compute the reduced Gröbner basis, G_r, of J + J_0 with respect to this
ordering, G_r = {g_1, . . . , g_14}:
g_1 : B^4 + B
g_2 : b_0 + b_1 α + B
g_3 : a_0 + a_1 α + A
g_4 : c_0 + c_1 α + c_2 α + c_3 (α + 1) + Z
g_5 : r_0 + c_1 + c_2
g_6 : z_0 + c_0 + c_3
g_7 : z_1 + r_0 + c_3
g_8 : Z + A · B
g_9 : b_1 + B^2 + B
g_10 : a_1 + A^2 + A
g_11 : c_3 + a_1 · b_1
g_12 : c_2 + a_1 · b_1 α + a_1 · B
g_13 : c_1 + a_1 · b_1 α + b_1 A
g_14 : A^4 + A
g_8 = Z + A · B is the canonical, word-level polynomial representing the function
performed by the multiplier Z = A · B.
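The claim that this circuit realizes Z = A · B can be cross-checked by exhaustive simulation, which is feasible only because the example is tiny. The sketch below is not the Gröbner basis method of this chapter; it simply simulates the gates of Example 5.3 and compares against an independent shift-and-reduce multiplication in F_4. The names `gf_mul`, `circuit`, and `mismatches` are introduced here for illustration.

```python
def gf_mul(a, b, k=2, poly=0b111):
    """Multiply in F_{2^k}; bit j of an int is the coefficient of alpha^j."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> k) & 1:      # reduce modulo P(x) = x^2 + x + 1
            a ^= poly
    return result


def circuit(a0, a1, b0, b1):
    """Gate-level model of the 2-bit multiplier from Example 5.3."""
    c0, c1, c2, c3 = a0 & b0, a0 & b1, a1 & b0, a1 & b1
    r0 = c1 ^ c2
    z0 = c0 ^ c3
    z1 = r0 ^ c3
    return z0, z1


mismatches = []
for A in range(4):
    for B in range(4):
        z0, z1 = circuit(A & 1, A >> 1, B & 1, B >> 1)
        Z = z0 | (z1 << 1)    # Z = z0 + z1 * alpha
        if Z != gf_mul(A, B):
            mismatches.append((A, B))
print(mismatches)  # an empty list means the circuit agrees with A * B
```

Simulation of this kind grows as 2^{2k} in the word size, which is one reason a symbolic abstraction is needed for realistic datapaths.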
Consolidating our results, the proposed abstraction approach is described as follows:
1. Given a bit-level implementation of a Galois field arithmetic circuit C over a given
F2k , with k-bit output Z and k-bit inputs A1, . . . , An.
2. C performs some unknown function Z = F(A1, . . . , An).
3. Model the circuit as a system of polynomials {f_1, . . . , f_s} ⊂ F_{2^k}[x_1, . . . , x_d, Z, A_1,
. . . , A_n] as described above and let J be the ideal generated by these polynomials.
4. Let J_0 be the ideal generated by all vanishing polynomials of F_{2^k}.
5. By computing a reduced Gröbner basis G_r of ideal J + J_0 using the abstraction term
order, the word-level polynomial Z + F(A_1, . . . , A_n) will be found in G_r.
5.4 Experimental Results: Validation of the Approach
Our experiments take, as inputs, Mastrovito [64] multiplier circuits of various word
sizes k. Each multiplier performs the polynomial function Z = F(A, B) = A · B over
a Galois field, where Z is the k-bit output and A and B are the k-bit inputs. We extract
the polynomials of the Boolean gate-level operators, which generate J, and the vanishing polynomials, which generate J_0. Then we compute the
reduced Gröbner basis G_r of J + J_0 with respect to the abstraction term order >. The resulting
G_r contains the polynomial Z + A · B.
The experiments were implemented as scripts in the computer algebra tool SINGULAR [74],
which provides functionality for polynomial computations over rings and fields, including
a number of efficient polynomial algorithms. The ring F_{2^k}[x_1, . . . , x_f, Z, A, B]
is defined with the abstraction ordering, using the same primitive polynomial P(X) (“minpoly”
in SINGULAR) that was used to design the Galois field multiplier. Ideals J and
J_0 were provided using their generating polynomials, and the reduced Gröbner basis
computation of J + J_0 was performed using the “slimgb” command. The results are shown
in Table 5.1.
The experiments were conducted on a 64-bit Ubuntu machine running a 2.4GHz processor
with 8GB of memory. We applied our approach to abstract the canonical, polynomial
representation of Mastrovito multipliers of various sizes. Our machine was unable to
complete the Gröbner basis computation for multipliers beyond 40-bit word inputs.
5.5 Conclusions
The above approach is guaranteed to find a canonical, word-level representation of the
function F performed by a circuit C over F_{2^k}. However, the Gröbner basis computation is
prohibitively complex for circuits of practical sizes. The next chapter proposes a method to
overcome the complexity of the Gröbner basis computation in order to make this abstraction
approach scalable.
Table 5.1: Run-time of Gröbner basis computation of Mastrovito multipliers in SINGULAR
using abstraction term order >
Word Size (k)    Number of Polynomials (d)    Computation Time (minutes)
16               1,871                        2.4
24               3,135                        12
32               5,549                        22.6
40               8,587                        266
48               12,327                       NA (Out of Memory)
CHAPTER 6
OVERCOMING GRÖBNER BASIS COMPLEXITY
FOR ABSTRACTION
Computing a Gröbner basis is prohibitively expensive for large circuits. The approach
from the last chapter is limited to small circuits, with datapaths no larger than 40 bits.
A full Gröbner basis computation results in numerous polynomials, but the abstraction
approach “searches” for only one polynomial (Z + G(A)) in the basis. This motivates
an investigation into whether it is possible to guide a sequence of $Spoly(f, g) \xrightarrow{J + J_0}_+ r$
computations to arrive at the desired word-level polynomial. This chapter describes this
smaller subset of computations, which are derived from a Gröbner basis analysis, to find
the word-level polynomial of the function performed by a given circuit. The improved
approach can abstract canonical word-level representations of circuits up to 571 bits,
corresponding to the largest NIST-specified ECC standard.
6.1 Improving the Abstraction Approach
Consider the word-level abstraction problem formulation from Chapter 5. J is the ideal
generated by all polynomials derived from the circuit implementation and J_0 is the ideal of
all the vanishing polynomials of every variable in the ring. The computation of the reduced
Gröbner basis of J + J_0 over F_q has the following known complexity [68]:
Theorem 6.1 Let J + J_0 = ⟨f_1, . . . , f_s, x_1^q − x_1, . . . , x_d^q − x_d⟩ ⊂ F_q[x_1, . . . , x_d] be an ideal.
The time and space complexity of Buchberger’s algorithm to compute a Gröbner basis of
J + J_0 is bounded by q^{O(d)}.
In our case q = 2^k, and when k and d are large, this complexity makes abstraction
infeasible.
Recall that Buchberger’s algorithm [104] for computing Gröbner bases depends on the
S-polynomial computation $Spoly(f, g) \xrightarrow{G}_+ r$, where
Spoly(f, g) = (L / lt(f)) · f − (L / lt(g)) · g
L = lcm(lt(f), lt(g))
A new polynomial is added to the basis when the remainder of the Spoly reduction, r,
is nonzero.
Notice that our approach searches for only one polynomial f_F : Z + F(A_1, . . . , A_n),
and it does so by computing the entire reduced Gröbner basis, G = {g_1, . . . , g_m}, and
finding f_F ∈ {g_1, . . . , g_m}. This motivates us to investigate whether it is possible to
guide a sequence of $Spoly(f, g) \xrightarrow{J + J_0}_+ r$ computations to arrive at the desired word-level
polynomial, without considering other polynomials in the generating set.
Numerous improvements have been introduced to improve the efficiency of Buch-
berger’s algorithm. One of these is the product criterion, the results of which we exploit for
our approach.
Lemma 6.1 [Product criterion [105]] Let F be any field, and f, g ∈ F[x_1, · · · , x_d] be polynomials.
If the equality lm(f) · lm(g) = LCM(lm(f), lm(g)) holds, then $Spoly(f, g) \xrightarrow{G}_+ 0$.
The above result states that when the leading monomials of f, g are relatively prime,
Spoly(f, g) always reduces to 0 modulo G. In this case, Spoly(f, g) need not be
considered in Buchberger’s algorithm, and thus the computation is avoided. Recall that
in the abstraction term order (Definition 5.1), we have “circuit variables x_1, . . . , x_d” >
“word-level output” > “word-level inputs,” where the relative ordering among x_1, . . . , x_d
is not important. This ordering is now further refined to exploit the product criterion.
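The test in Lemma 6.1 only needs leading monomials, so it is cheap to apply before any S-polynomial is formed. The following sketch represents monomials as exponent tuples; the function names and the four-variable ordering (c0, c1, a0, b0) are assumptions made for this illustration.

```python
def monomial_lcm(m, n):
    """Least common multiple of two monomials given as exponent tuples."""
    return tuple(max(a, b) for a, b in zip(m, n))


def monomial_product(m, n):
    """Product of two monomials given as exponent tuples."""
    return tuple(a + b for a, b in zip(m, n))


def product_criterion(lm_f, lm_g):
    """True when lm(f)*lm(g) == lcm(lm(f), lm(g)), i.e. they are coprime."""
    return monomial_product(lm_f, lm_g) == monomial_lcm(lm_f, lm_g)


# Exponent tuples over the variables (c0, c1, a0, b0).
lm_f1 = (1, 0, 0, 0)   # c0, leading monomial of f1 : c0 + a0*b0
lm_f2 = (0, 1, 0, 0)   # c1, leading monomial of f2 : c1 + a0*b1
lm_v = (2, 0, 0, 0)    # c0^2, leading monomial of c0^2 - c0

skip_f1_f2 = product_criterion(lm_f1, lm_f2)  # coprime: pair can be skipped
skip_f1_v = product_criterion(lm_f1, lm_v)    # share c0: criterion does not apply
print(skip_f1_f2, skip_f1_v)
```

The second pair is exactly the kind of pair handled separately by the result on vanishing polynomials later in this section.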
Given an acyclic combinational circuit, an ordering can be applied to the bit-level
variables, {x1, . . . , xd} ∈ F2, based on their topological position in the circuit. In a reverse
topological ordering, the output variable of the gate will always come earlier in the ordering
than any of its input variables.
Definition 6.1 Refined abstraction term order (RATO) >r: Given a circuit C, apply
a reverse topological ordering to the bit-level variables {x1, . . . , xd}. Then, impose a
lex term order >r on Fq[x1, . . . , xd, Z, A1, . . . , An] with “circuit variables ordered reverse
topologically” > “output word-level variable” > “input word-level variables.”
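A reverse topological order can be derived mechanically from a netlist. The sketch below uses Python's standard `graphlib` module on the 2-bit multiplier of Example 5.3; the `netlist` dictionary and the variable names are assumptions for illustration. Any valid reverse topological order qualifies as a RATO, so the order printed here may differ from a hand-chosen one.

```python
from graphlib import TopologicalSorter  # available from Python 3.9

# Gate netlist of the 2-bit multiplier: gate output -> gate inputs.
netlist = {
    "c0": ["a0", "b0"], "c1": ["a0", "b1"],
    "c2": ["a1", "b0"], "c3": ["a1", "b1"],
    "r0": ["c1", "c2"], "z0": ["c0", "c3"], "z1": ["r0", "c3"],
}

# static_order() lists inputs before the gates they feed; reversing it
# places every gate output ahead of all of its inputs.
forward = list(TopologicalSorter(netlist).static_order())
bit_order = forward[::-1]

# RATO: bit-level variables, then word-level output, then word-level inputs.
rato = bit_order + ["Z", "A", "B"]

position = {v: i for i, v in enumerate(rato)}
valid = all(position[out] < position[i]
            for out, ins in netlist.items() for i in ins)
print(" > ".join(rato), valid)
```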
Let F be the set of polynomials that generate J , i.e., J = 〈F 〉. When RATO is applied,
we find that all bit-level circuit constraint polynomials in F have leading terms that are
relatively prime to each other. Since we are using a reverse topological variable ordering
with lex term ordering, these polynomials are in the form of fi = xi + tail(fi), where xi
is the output of a gate fi, and thus the following proposition from Lv’s work [11] can be
applied.
Proposition 6.1 Let C be any arbitrary combinational circuit. Let {x_1, . . . , x_d} denote the
set of all variables (signals) in the circuit, i.e., the primary input, intermediate and primary
output variables. Perform a reverse topological traversal of the circuit and order the
variables such that x_i > x_j if x_i appears earlier in the reverse topological order. Impose a
lex term order to represent the Boolean expression for each gate as a polynomial f_i; then
f_i = x_i + tail(f_i). Then the set of all polynomials {f_1, . . . , f_s} forms a Gröbner basis, as
lt(f_i) and lt(f_j) for i ≠ j are relatively prime.
Example 6.1 Consider the 2-bit multiplier from Example 5.4. With RATO applied, the
bit-level circuit constraint polynomials in F , where J = 〈F 〉, are:
f1 : c0 + a0 · b0
f2 : c1 + a0 · b1
f3 : c2 + a1 · b0
f4 : c3 + a1 · b1
f5 : r0 + c1 + c2
f6 : z0 + c0 + c3
f7 : z1 + r0 + c3
The leading terms of f1, . . . , f7 are relatively prime to each other.
Let F_0 be the set of polynomials that generate J_0, i.e., J_0 = ⟨F_0⟩. In F ∪ F_0, for every
polynomial f_i = x_i + tail(f_i) in F there is a vanishing polynomial v_i = x_i^2 + x_i in F_0;
f_i and v_i have leading terms that are not relatively prime. In this case, [11] shows that
$Spoly(x_i + tail(f_i), x_i^{2^k} - x_i) \xrightarrow{J, J_0}_+ r$ always produces r = 0, and thus can be excluded
from the Gröbner basis computation.
Theorem 6.2 Let q = 2^k, and let F_q[x_1, . . . , x_d] be a ring on which we have a reverse
topological lex order. Let I be a subset of {1, . . . , d}. For all i ∈ I, let f_i = x_i + P_i (where
P_i = tail(f_i)) such that all indeterminates x_j that appear in P_i satisfy x_i > x_j. Then the
set G = {f_i : i ∈ I} ∪ {x_1^q − x_1, . . . , x_d^q − x_d} is a Gröbner basis.
The proof is given in [11] and is reproduced here:
Proof. Given a system of polynomials derived from a circuit over F_q[x_1, . . . , x_d], where
{x_1, . . . , x_d} are bit-level variables, apply a reverse topological lex ordering to {x_1, . . . , x_d}.
Let x_i be the output of a Boolean logic gate for some 1 ≤ i ≤ d. Let f = x_i + P_i be the
polynomial derived from this logic gate and g = x_i^q − x_i be the vanishing polynomial of
x_i. Then Spoly(f, g) = x_i^{q−1} f − g = x_i^{q−1} P_i + x_i. In what follows, it is important to note
that the indeterminates appearing in P_i are all less than x_i over the given ordering.
Reducing by f = x_i + P_i, and recalling that signs are immaterial in characteristic 2, we get
x_i^{q−1} P_i + x_i − x_i^{q−2} P_i (x_i + P_i) = x_i^{q−2} P_i^2 + x_i, and then
x_i^{q−2} P_i^2 + x_i − x_i^{q−3} P_i^2 (x_i + P_i) = x_i^{q−3} P_i^3 + x_i. Continuing in this fashion, we
get x_i P_i^{q−1} + x_i − P_i^{q−1} (x_i + P_i) = x_i + P_i^q, and finally x_i + P_i^q − (x_i + P_i) = P_i^q − P_i.
Hence,
$x_i^{q-1} P_i + x_i \xrightarrow{x_i + P_i} x_i^{q-2} P_i^2 + x_i \xrightarrow{x_i + P_i} x_i^{q-3} P_i^3 + x_i \xrightarrow{x_i + P_i} \cdots \xrightarrow{x_i + P_i} P_i^q + x_i \xrightarrow{x_i + P_i} P_i^q - P_i$.
Over the finite field F_q, P_i^q − P_i is a vanishing polynomial. Therefore, P_i^q − P_i ∈
I(V(J_0)) = ⟨x_1^q − x_1, . . . , x_d^q − x_d⟩. Due to the product criterion (Lemma 6.1), G_0 =
{x_1^q − x_1, . . . , x_d^q − x_d} is a Gröbner basis. Therefore $P_i^q - P_i \xrightarrow{G_0}_+ 0$.
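The reduction chain in this proof can be replayed on a concrete gate. The sketch below assumes the bit-level case q = 2 and the single gate polynomial f : c0 + a0·b0. Polynomials over F_2 are stored as sets of exponent tuples in the variable order (c0, a0, b0), so the lex order is ordinary tuple comparison. The helper names (`add`, `mul_mono`, `spoly`, `normal_form`) are hypothetical and the division routine is a bare-bones version written for this illustration.

```python
def add(p, q):
    """Sum over F_2: a monomial present in both operands cancels."""
    return p ^ q

def mul_mono(p, m):
    """Multiply polynomial p by the monomial m."""
    return frozenset(tuple(a + b for a, b in zip(e, m)) for e in p)

def lt(p):
    """Leading monomial under lex (tuple comparison)."""
    return max(p)

def divides(m, n):
    return all(a <= b for a, b in zip(m, n))

def spoly(f, g):
    L = tuple(max(a, b) for a, b in zip(lt(f), lt(g)))
    cf = tuple(a - b for a, b in zip(L, lt(f)))
    cg = tuple(a - b for a, b in zip(L, lt(g)))
    return add(mul_mono(f, cf), mul_mono(g, cg))

def normal_form(p, G):
    """Divide p by the list G; return the remainder."""
    remainder = frozenset()
    while p:
        m = lt(p)
        for g in G:
            if divides(lt(g), m):
                quotient = tuple(a - b for a, b in zip(m, lt(g)))
                p = add(p, mul_mono(g, quotient))
                break
        else:  # no leading term divides m: move m to the remainder
            remainder = add(remainder, frozenset([m]))
            p = add(p, frozenset([m]))
    return remainder

f = frozenset({(1, 0, 0), (0, 1, 1)})      # c0 + a0*b0
v_c = frozenset({(2, 0, 0), (1, 0, 0)})    # c0^2 + c0
v_a = frozenset({(0, 2, 0), (0, 1, 0)})    # a0^2 + a0
v_b = frozenset({(0, 0, 2), (0, 0, 1)})    # b0^2 + b0

s = spoly(f, v_c)                          # c0*a0*b0 + c0
r = normal_form(s, [f, v_c, v_a, v_b])
print(sorted(s), sorted(r))
```

The remainder is the empty set, i.e., the zero polynomial, which matches the statement of Theorem 6.2 for this one gate.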
Due to RATO, there exists a polynomial f_{z_i} ∈ F, which is the polynomial derived from
a Boolean logic gate, where z_i is the first variable in the ordering for some 0 ≤ i < k. That
is, f_{z_i} : z_i + tail(f_{z_i}).
Proposition 6.2 Over RATO, the polynomial pair (f_Z, f_{z_i}) is the only critical pair at the
start of the Gröbner basis computation of J + J_0, where f_{z_i} is the polynomial derived from
the gate.
Proof. Due to Thm. 6.2 and the product criterion, a critical pair must come from a
word-level designation polynomial, {f_Z, f_{A_1}, . . . , f_{A_n}}, and a polynomial derived from
a Boolean logic gate. The leading terms of the polynomials {f_{A_1}, . . . , f_{A_n}} are bit-level
inputs to the circuit and thus are not the outputs of any gate. Thus, the only critical pair is
(f_Z, f_{z_i}), where z_i is the first variable in the ordering and is thus the leading monomial of
f_Z.
Thus, the first computation of the Gröbner basis is guaranteed to be $Spoly(f_Z, f_{z_i}) \xrightarrow{F, F_0}_+ r$.
This computation has the following interesting property.
Proposition 6.3 Normalize f_Z, i.e., f_Z = f_Z / LC(f_Z). Then
$Spoly(f_Z, f_{z_i}) \xrightarrow{F, F_0}_+ r$ is equivalent to $f_Z \xrightarrow{F - \{f_Z\}, F_0}_+ r$ (6.1)
Proof. Assuming both f_{z_i} and f_Z are normalized, i.e., LC(f_{z_i}) = LC(f_Z) = 1, they are
both of the form z_i + P for some polynomial P. Let f = z_i + P_f and g = z_i + P_g represent
f_{z_i} and f_Z, respectively.
(i) For Spoly(f, g), L = lcm(lt(f), lt(g)) = z_i, so
Spoly(f, g) = (L / lt(f)) · f − (L / lt(g)) · g
= (z_i / z_i) · f − (z_i / z_i) · g
= f − g
= f + g
Thus $Spoly(f_Z, f_{z_i}) \xrightarrow{F, F_0}_+ r$ is equivalent to $f + g \xrightarrow{F, F_0}_+ r$.
(ii) For $f_Z \xrightarrow{F - \{f_Z\}, F_0}_+ r$, since the leading term of f_Z is z_i, the only polynomial in the
set {F − {f_Z}, F_0} that can perform the first division is f_{z_i}. Again, denote f_Z as f
and f_{z_i} as g. According to the reduction algorithm, the result r of the division step $f \xrightarrow{g} r$ is:
r = f − (lt(f) / lt(g)) · g
= f − (z_i / z_i) · g
= f − g
= f + g (6.3)
Thus $f_Z \xrightarrow{F - \{f_Z\}, F_0}_+ r$ is equivalent to $(f + g) \xrightarrow{F - \{f_Z\}, F_0}_+ r$.
(iii) Consider $(f + g) \xrightarrow{F, F_0}_+ r$, where f = z_i + P_f and g = z_i + P_g. Then f + g = 2z_i + P_g + P_f =
P_g + P_f no longer contains z_i as the leading monomial. As a consequence of RATO,
as reduction proceeds the remainder will never again contain z_i, since z_i is the first
variable in the ordering. Since the leading monomial of f_Z is z_i, f_Z will never be used
for reduction. Therefore $(f + g) \xrightarrow{F, F_0}_+ r$ is equivalent to $(f + g) \xrightarrow{F - \{f_Z\}, F_0}_+ r$.
Thus, $Spoly(f_Z, f_{z_i}) \xrightarrow{F, F_0}_+ r$ is equivalent to $f_Z \xrightarrow{F - \{f_Z\}, F_0}_+ r$.
Thus, $f_Z \xrightarrow{F - \{f_Z\}, F_0}_+ r$ is the first computational step of the abstraction. The polynomial
remainder r will not contain any bit-level variable corresponding to the output of any
gate in the design; i.e., primary output bits and intermediate variables of the circuit do not
appear in r. To prove this, assume that a nonprimary input variable x_j appears in a monomial
term m_j in r. Since there always exists a polynomial f_j such that f_j = x_j + tail(f_j),
lt(f_j) divides monomial m_j and m_j can be canceled. Therefore, all such terms m_j with
nonprimary input bit-level variables can be eliminated.
Two cases need to be considered:
1. Remainder r only contains word-level variables: word-level output Z and the word-
level inputs A1, . . . , An. Since RATO is lex with Z > {A1, . . . , An}, the remainder
r is the desired canonical polynomial representation, Z + F(A1, . . . , An).
2. Remainder r contains both the bit-level primary input variables, as well as the word-
level variables.
Example 6.2 Again, consider the 2-bit multiplier from Example 6.1. RATO for this exam-
ple is
z1 > z0 > r0 > c0 > c1 > c2 > c3 > a0 > a1 > b0 > b1 > Z > A > B (6.4)
F + F0 with RATO applied is:
Bit-level circuit constraints (⊂ J):
f1 : c0 + a0 · b0
f2 : c1 + a0 · b1
f3 : c2 + a1 · b0
f4 : c3 + a1 · b1
f5 : r0 + c1 + c2
f6 : z0 + c0 + c3
f7 : z1 + r0 + c3
Word-level designation (⊂ J):
fA : a0 + a1 · α + A
fB : b0 + b1 · α + B
fZ : z1 · α + z0 + Z
Vanishing polynomials (J0):
$a_0^2 - a_0,\ a_1^2 - a_1,\ b_0^2 - b_0,\ b_1^2 - b_1$
$c_0^2 - c_0,\ c_1^2 - c_1,\ c_2^2 - c_2,\ c_3^2 - c_3$
$r_0^2 - r_0,\ z_0^2 - z_0,\ z_1^2 - z_1$
$A^4 - A,\ B^4 - B,\ Z^4 - Z$
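Notice that the leading monomial of fZ is z1, which is also the leading monomial of f7. Minimize fZ by dividing it by its leading coefficient (over $\mathbb{F}_{2^2}$, $\alpha^{-1} = \alpha + 1$): $f_{Zmin} = f_Z/\alpha = z_1 + z_0 \cdot (\alpha + 1) + Z \cdot (\alpha + 1)$. By computing the S-polynomial of $f_{Zmin}$ and f7 and reducing it:
\[ Spoly(f_{Zmin}, f_7) \xrightarrow{F,\,F_0}_+ r \]
the remainder r is Z + A · B.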
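The result of Example 6.2 can be cross-checked by exhaustive simulation. The following is a minimal Python sketch, not the tool used in this work: the helper names `gf4_mul`, `multiplier_bits`, and `circuit_matches_word_level` are hypothetical, and elements of $\mathbb{F}_{2^2}$ are assumed to be encoded as 2-bit integers (bit 0 is the coefficient of 1, bit 1 the coefficient of α) with $P(x) = x^2 + x + 1$.

```python
from itertools import product

def gf4_mul(x, y):
    """Multiply two elements of GF(4), modulus x^2 + x + 1 (bit mask 0b111)."""
    result = 0
    for _ in range(2):
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if x & 0b100:          # degree reached 2: reduce by the modulus
            x ^= 0b111
    return result

def multiplier_bits(a0, a1, b0, b1):
    """Gate-level model of the 2-bit multiplier (polynomials f1..f7)."""
    c0, c1, c2, c3 = a0 & b0, a0 & b1, a1 & b0, a1 & b1
    r0 = c1 ^ c2
    z0 = c0 ^ c3
    z1 = r0 ^ c3
    return z0, z1

def circuit_matches_word_level():
    """True if Z = z0 + z1*alpha equals A*B for every input assignment."""
    for a0, a1, b0, b1 in product((0, 1), repeat=4):
        z0, z1 = multiplier_bits(a0, a1, b0, b1)
        A, B, Z = a0 | (a1 << 1), b0 | (b1 << 1), z0 | (z1 << 1)
        if Z != gf4_mul(A, B):
            return False
    return True

print(circuit_matches_word_level())
```

Simulation of this kind only scales to very small word sizes; it is shown here solely as a sanity check on the symbolic remainder.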
Example 6.3 Now consider a 2-bit multiplier which has a bug. The output lines, z0 and
z1, have been swapped, as shown in Fig. 6.1.
The polynomials in F + F0 are the same as in Example 6.2 except for the following
changes:
f6 : z1 + c0 + c3
f7 : z0 + r0 + c3
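fZ : z1 + z0 · α + Z
Figure 6.1: A buggy 2-bit multiplier over $\mathbb{F}_{2^2}$.
Reducing the S-polynomial of fZ and the circuit polynomial that shares its leading monomial, modulo F and F0, gives remainder r : a1 · b1α + a1 · Bα + b1 · Aα + Z + A · B, which is the same result as computing $f_Z \xrightarrow{F-\{f_Z\},\,F_0}_+ r$.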
6.2 Improving Polynomial Division Using F4-style
Reduction
The most intensive computational step in our proposed improvement is that of polynomial division $f_Z \xrightarrow{F-\{f_Z\},\,F_0}_+ r$. When the circuit C is very large, the polynomial set {F − {fZ}, F0} also becomes extremely large. This division procedure then becomes the bottleneck in our abstraction approach. In principle, this reduction can be performed using contemporary computer-algebra systems, e.g., the SINGULAR [74] tool, which is widely used within the verification community [79] [81] [76]. In our work, we have also performed experiments with SINGULAR. However, as in any general-purpose computer algebra tool, the data structures are not specifically optimized for circuit verification problems. Moreover, SINGULAR also limits the number of variables (d) that it can accommodate in the system to d < 32767; this limits its application to large circuits. Recent symbolic computation techniques [11] have shown improvements from employing the concept of F4-style polynomial reduction [106]. Therefore, to further improve our approach, we exploit this relatively recent concept, which implements polynomial division using row reductions on a matrix, to develop a custom verification tool that performs this reduction efficiently.
Faugère's F4 approach [106] presents a new algorithm to compute a Gröbner basis. It uses the same mathematical principles as Buchberger's algorithm. However, instead of computing and reducing one S-polynomial at a time, it computes many S-polynomials in one step and reduces them simultaneously using sparse linear algebra on a matrix (triangulation). We can use this efficient reduction technique to perform our reduction, $f_Z \xrightarrow{F-\{f_Z\},\,F_0}_+ r$, by representing and solving it on a matrix. First, let us consider the following example that demonstrates the main concepts behind the reduction approach of F4.
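Example 6.4 Consider the lex term order with x > y > z on the ring $\mathbb{Q}[x, y, z]$. Given $F = \{f_1 = 2x^2 + y,\ f_2 = 3xy^2 - xy,\ f_3 = 4y^3 - 1\}$, consider one step of Buchberger's algorithm: $Spoly(f_1, f_2) \xrightarrow{f_1, f_2, f_3}_+ r$. We have $Spoly(f_1, f_2) = \frac{1}{3}x^2y + \frac{1}{2}y^3 = f_4$. The reduction $Spoly(f_1, f_2) \xrightarrow{f_1, f_2, f_3}_+ \left(-\frac{1}{6}y^2 + \frac{1}{8}\right)$ is done as follows. Since $lt(f_1) \mid lt(f_4)$, $f_4 \xrightarrow{f_1} h$ is computed as:
\[ h = f_4 - \frac{lt(f_4)}{lt(f_1)} f_1 = f_4 - \frac{1}{6} y f_1 = \frac{1}{2}y^3 - \frac{1}{6}y^2 \]
Now, $lt(f_2)$ does not divide any term in $h$, but $lt(f_3) \mid lt(h)$, so $h \xrightarrow{f_3} r$:
\[ r = h - \frac{lt(h)}{lt(f_3)} f_3 = h - \frac{1}{8} f_3 = -\frac{1}{6}y^2 + \frac{1}{8} \]
This reduction procedure can also be simulated on a matrix using Gaussian elimination. The reduction above requires the computation of $\frac{1}{6} y f_1$ and $\frac{1}{8} f_3$. Instead of computing these multiples one at a time, we can generate all the monomials required in the reduction process, i.e., the monomials of $f_4$, $yf_1$, $f_3$, and set up the problem of cancellation of terms as Gaussian elimination on a matrix. Monomials of $f_4$, $yf_1$, $f_3$ are, respectively, $\{x^2y, y^3\}$, $\{x^2y, y^2\}$, $\{y^3, 1\}$. Let the rows of a matrix M correspond to polynomials $[f_4, yf_1, f_3]$, and columns correspond to all the monomials (in lex order) $[x^2y, y^3, y^2, 1]$. Then the matrix M shows the representation of these polynomials, where the entry M(i, j) is the coefficient of the monomial of column j present in the polynomial of row i.
\[
M = \begin{array}{c|cccc}
      & x^2y & y^3 & y^2 & 1 \\ \hline
f_4   & \frac{1}{3} & \frac{1}{2} & 0 & 0 \\
yf_1  & 2 & 0 & 1 & 0 \\
f_3   & 0 & 4 & 0 & -1
\end{array}
\]
Now, reducing M to a row echelon form using Gaussian elimination gives:
\[
M = \begin{array}{c|cccc}
      & x^2y & y^3 & y^2 & 1 \\ \hline
f_4   & \frac{1}{3} & \frac{1}{2} & 0 & 0 \\
h = f_4 - \frac{1}{6}yf_1 & 0 & \frac{1}{2} & -\frac{1}{6} & 0 \\
r = h - \frac{1}{8}f_3 & 0 & 0 & -\frac{1}{6} & \frac{1}{8}
\end{array}
\]
The last row corresponds to the polynomial $-\frac{1}{6}y^2 + \frac{1}{8}$, which is equal to the reduction result r obtained before.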
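The row operations of Example 6.4 can be replayed with exact rational arithmetic. The following is a minimal sketch under stated assumptions: the function `reduce_row` is a hypothetical helper (it is not part of F4 or of any library), and it assumes the reducer rows are supplied in decreasing order of their leading monomials so that a single pass suffices.

```python
from fractions import Fraction as Fr

# Columns are the monomials [x^2*y, y^3, y^2, 1] in lex order (x > y > z).
f4   = [Fr(1, 3), Fr(1, 2), Fr(0), Fr(0)]    # Spoly(f1, f2)
y_f1 = [Fr(2),    Fr(0),    Fr(1), Fr(0)]    # y * f1
f3   = [Fr(0),    Fr(4),    Fr(0), Fr(-1)]

def reduce_row(target, reducers):
    """Cancel entries of `target` using the leading entry of each reducer row."""
    row = list(target)
    for red in reducers:
        lead = next(j for j, c in enumerate(red) if c != 0)
        if row[lead] != 0:
            factor = row[lead] / red[lead]   # e.g. 1/6 for y*f1, 1/8 for f3
            row = [t - factor * c for t, c in zip(row, red)]
    return row

r = reduce_row(f4, [y_f1, f3])
print(r)   # coefficients of r over the columns [x^2*y, y^3, y^2, 1]
```

The two factors computed inside the loop are exactly the multipliers 1/6 and 1/8 that label the rows of the reduced matrix.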
This approach generates all the monomial terms that are required in the division process, and the coefficients required for cancellation of terms are accounted for by elementary row reductions in the subsequent Gaussian elimination. Based on the above concepts, a matrix can be constructed for our problem: $f_Z \xrightarrow{F-\{f_Z\},\,F_0}_+ r$.
Definition 6.2 Let L = [f1, . . . , fm] be a list of m polynomials. Let ML be an ordered list
of monomials of elements of L and let n be the number of elements in ML. Define M as
the m × n matrix that associates the polynomials of L to rows and monomials of ML to
columns. Entry in row i, column j is the coefficient of the jth element of ML in fi.
Algorithm 4 describes our procedure to generate the matrix M of polynomials corresponding to our reduction procedure. The main idea is to set up the rows and columns of the matrix in a way that polynomial division can be subsequently performed by applying Gaussian elimination on M. In the algorithm, the set of polynomials {F − {fZ}, F0} = {f1, . . . , fs} corresponds to the circuit constraints, and RATO is imposed on the polynomials. The output word-level polynomial fZ is to be reduced w.r.t. {f1, . . . , fs}. Initially, L = {fZ} is inserted as the first row of the matrix and ML constitutes the (ordered) list of monomials of fZ. Then, in every iteration i, a polynomial fk ∈ {f1, . . . , fs} is identified such that lm(fk) divides the ith monomial (mon) of ML; this is to enable cancellation of the corresponding monomial term. The computation $L := L \cup \frac{mon}{lm(f_k)} \cdot f_k$ in the for-loop generates the polynomials required for reduction.
Algorithm 4: Generating the Matrix for Polynomial Reduction
Input: fZ, {F − {fZ}, F0} as {f1, . . . , fs}, RATO >r
Output: Remainder r of $f_Z \xrightarrow{f_1,\dots,f_s}_+ r$
    /* L = set of polynomials, rows of M */
    L := {fZ};
    /* ML = the set of monomials, columns of M */
    ML := {monomials of fZ};
    for (i = 1; i ≤ number of monomials in ML; i++) do
        mon := the ith monomial of ML;
        if some fk ∈ {f1, . . . , fs} satisfies lm(fk) | mon then
            /* add polynomial fk to L as a new row in M */
            L := L ∪ (mon / lm(fk)) · fk;
            /* add monomials to ML as new columns in M */
            ML := ML ∪ {monomials of (mon / lm(fk)) · fk};
        end
    end
    Gaussian elimination on M;
    return r = last row of M;
The list ML is updated to include the monomials of $\frac{mon}{lm(f_k)} \cdot f_k$. Finally, the loop terminates when all monomials of ML have been analyzed. The loop is guaranteed to terminate once mon contains only word-level variables, as no polynomial in {f1, . . . , fs} has a leading term that contains a word-level variable under RATO.
Using the set L as rows and ML as columns, a matrix M is constructed and Gaussian elimination is applied to reduce it to row-echelon form. The last row in the reduced matrix corresponds to the reduction result r. Let us describe the approach using an example.
Example 6.5 Consider the reduction related to the abstraction of the F22 multiplier circuit
from Example 6.2. The word-level output designation polynomial fZ is z1α + z0 + Z, and
the circuit polynomials are
f1 : a0 + a1α + A
f2 : b0 + b1α +B
f3 : r0 + a0b1 + a1b0
f4 : z0 + a0b0 + a1b1
f5 : z1 + r0 + a1b1
Here $P(x) = x^2 + x + 1$, and $P(\alpha) = 0$. We have to compute $f_Z \xrightarrow{f_1,\dots,f_5}_+ r$. Note that, for simplicity, variables c0, c1, c2, c3 from Example 5.3 have been substituted by functions on primary inputs. Impose RATO on the polynomials as follows:
z1 > z0 > r0 > a0 > a1 > b0 > b1 > Z > A > B (6.5)
The algorithm constructs the matrix as follows:
1. Initialization: L = {fZ} = {z1α + z0 + Z}, ML = {z1, z0, Z}, i = 1, mon = z1 (ith monomial of ML).
2. Iteration 1: Identify a polynomial fk ∈ {f1, . . . , fs} s.t. lm(fk) | mon. Clearly, fk = f5 = z1 + r0 + a1b1. Then, $L = L \cup \frac{mon}{lm(f_k)} \cdot f_k = L \cup f_5$. Therefore, L = {fZ, f5} and ML = {z1, z0, r0, a1b1, Z}, i = 2 and mon = z0.
3. Iteration 2: fk = f4 = z0 + a0b0 + a1b1 because lm(f4) | mon. Therefore, L = L ∪ f4 and ML = {z1, z0, r0, a0b0, a1b1, Z}, i = 3, mon = r0.
4. Iteration 3: fk = f3 = r0 + a0b1 + a1b0 as lm(f3) | mon. Therefore, L = L ∪ f3 and ML = {z1, z0, r0, a0b0, a0b1, a1b0, a1b1, Z}, i = 4, mon = a0b0.
5. Iteration 4: fk = f1 = a0 + a1α + A because lm(f1) | mon. Then $L = L \cup \frac{a_0b_0}{a_0} \cdot f_1 = L \cup b_0 \cdot f_1$ = {fZ, f5, f4, f3, b0f1} and ML = {z1, z0, r0, a0b0, a0b1, a1b0, a1b1, b0A, Z}.
6. Continuing in this fashion . . .
7. Iteration 8: L = {fZ, f5, f4, f3, b0f1, b1f1, a1f2, Af2},
ML = {z1, z0, r0, a0b0, a0b1, a1b0, a1b1, a1B, b0A, b1A, Z, AB},
i = 9, mon = AB.
8. Iteration 9: Since mon = AB contains only the word-level inputs, no polynomial in F has a leading term that can cancel mon, so the loop terminates. The matrix M can be constructed using L as rows and ML as columns.
Fig. 6.2a shows the matrix M, and its subsequent Gaussian elimination is shown in Fig. 6.2b. The last row of the reduced matrix corresponds to the reduction $f_Z \xrightarrow{f_1,\dots,f_s}_+ r$, where r = Z + A · B.
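The remainder of Example 6.5 can be reproduced with a small term-rewriting sketch. This is an illustration under stated assumptions, not the F4-style engine itself: it cancels one term at a time instead of building the matrix M, it assumes every divisor is monic with a single-variable leading monomial (true for f1, . . . , f5), and the names `mono`, `reduce_poly`, `DIVISORS`, and `F_Z` are hypothetical. Elements of $\mathbb{F}_{2^2}$ are encoded as 2-bit integers.

```python
# RATO from Eqn. (6.5): earlier variables are larger in the term order.
ORDER = ["z1", "z0", "r0", "a0", "a1", "b0", "b1", "Z", "A", "B"]
RANK = {v: i for i, v in enumerate(ORDER)}
ALPHA = 2   # GF(4) as 2-bit ints: 0, 1, alpha = 2, alpha + 1 = 3

def gf4_mul(x, y):
    """Multiply in GF(4), modulus x^2 + x + 1."""
    result = 0
    for _ in range(2):
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if x & 0b100:
            x ^= 0b111
    return result

def mono(*names):
    """A monomial is a tuple of variable names sorted by RATO rank."""
    return tuple(sorted(names, key=RANK.__getitem__))

# Each divisor: (leading variable, polynomial as {monomial: coefficient}).
DIVISORS = [
    ("a0", {mono("a0"): 1, mono("a1"): ALPHA, mono("A"): 1}),            # f1
    ("b0", {mono("b0"): 1, mono("b1"): ALPHA, mono("B"): 1}),            # f2
    ("r0", {mono("r0"): 1, mono("a0", "b1"): 1, mono("a1", "b0"): 1}),   # f3
    ("z0", {mono("z0"): 1, mono("a0", "b0"): 1, mono("a1", "b1"): 1}),   # f4
    ("z1", {mono("z1"): 1, mono("r0"): 1, mono("a1", "b1"): 1}),         # f5
]
F_Z = {mono("z1"): ALPHA, mono("z0"): 1, mono("Z"): 1}

def find_reducible(r, divisors):
    """Return (monomial, leading variable, divisor) for some reducible term."""
    for m in r:
        for lead, f in divisors:
            if lead in m:
                return m, lead, f
    return None

def reduce_poly(poly, divisors):
    """Rewrite terms until no leading variable of a divisor remains."""
    r = dict(poly)
    while True:
        hit = find_reducible(r, divisors)
        if hit is None:
            return r
        m, lead, f = hit
        coeff = r[m]
        rest = list(m)
        rest.remove(lead)                      # cofactor mon / lm(fk)
        for m2, c2 in f.items():               # r := r + coeff * rest * f (char 2)
            new = mono(*rest, *m2)
            value = r.get(new, 0) ^ gf4_mul(coeff, c2)
            if value:
                r[new] = value
            else:
                r.pop(new, None)

print(reduce_poly(F_Z, DIVISORS))
```

Because every leading monomial is a distinct single variable, the remainder does not depend on the order in which terms are cancelled, which is why this simplified loop agrees with the matrix-based reduction.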
M =
                z1  z0  r0  a0b0  a0b1  a1b0  a1b1  a1B  b0A  b1A  Z  AB
  fZ            α   1   0   0     0     0     0     0    0    0    1  0
  f5            1   0   1   0     0     0     1     0    0    0    0  0
  f4            0   1   0   1     0     0     1     0    0    0    0  0
  f3            0   0   1   0     1     1     0     0    0    0    0  0
  b0f1          0   0   0   1     0     α     0     0    1    0    0  0
  b1f1          0   0   0   0     1     0     α     0    0    1    0  0
  a1f2          0   0   0   0     0     1     α     1    0    0    0  0
  Af2           0   0   0   0     0     0     0     0    1    α    0  1
(a) Matrix M generated by Algorithm 4
M =
                z1  z0  r0  a0b0  a0b1  a1b0  a1b1  a1B  b0A  b1A  Z  AB
  fZ            α   1   0   0     0     0     0     0    0    0    1  0
  αf5 − row1    0   1   α   0     0     0     α     0    0    0    1  0
  f4 − row2     0   0   α   1     0     0     α+1   0    0    0    1  0
  αf3 − row3    0   0   0   1     α     α     α+1   0    0    0    1  0
  b0f1 − row4   0   0   0   0     α     0     α+1   0    1    0    1  0
  αb1f1 − row5  0   0   0   0     0     0     0     0    1    α    1  0
  Af2 − row6    0   0   0   0     0     0     0     0    0    0    1  1
(b) M reduced to row echelon form via Gaussian elimination
Figure 6.2: F4-style polynomial reduction on a matrix for Example 6.5.
6.3 Reducing Bit-Level Inputs
When the remainder r contains only word-level variables, the problem of word-level abstraction is solved. Thus, the focus now is to efficiently obtain a word-level abstraction when r also contains bit-level input variables. This requires a mapping from each bit-level input variable $\{a_0, \dots, a_{k-1}\} \in \mathbb{F}_2$ to the word-level input variable $A \in \mathbb{F}_{2^k}$:
\begin{align*}
a_0 &= \mathcal{F}_{a_0}(A) \\
&\ \,\vdots \tag{6.6} \\
a_{k-1} &= \mathcal{F}_{a_{k-1}}(A)
\end{align*}
where each $\mathcal{F}_{a_i}$ is some function of A, which needs to be derived. Here, we present the derivation when $\{a_0, \dots, a_{k-1}\} \in \mathbb{F}_2$ and $A \in \mathbb{F}_{2^k}$. However, this result is applicable from any field $\mathbb{F}_q$ to any extension field $\mathbb{F}_{q^k}$, i.e., when $\{a_0, \dots, a_{k-1}\} \in \mathbb{F}_q$ and $A \in \mathbb{F}_{q^k}$. This generalized derivation is presented in Appendix A.
These mappings from $\{a_0, \dots, a_{k-1}\}$ to A in Eqn. (6.6) are represented as polynomial functions $f_{a_0}, \dots, f_{a_{k-1}}$ in the following form:
\begin{align*}
f_{a_0} &: a_0 + \mathcal{F}_{a_0}(A) \\
&\ \,\vdots \tag{6.7} \\
f_{a_{k-1}} &: a_{k-1} + \mathcal{F}_{a_{k-1}}(A)
\end{align*}
Due to RATO, $\{a_0, \dots, a_{k-1}\} > A$, thus the leading terms of $f_{a_0}, \dots, f_{a_{k-1}}$ are $a_0, \dots, a_{k-1}$, respectively. Let $F_a = \{f_{a_0}, \dots, f_{a_{k-1}}\}$. Then computing $r \xrightarrow{F_a,\,F_0}_+ r_w$ ensures that the new remainder $r_w$ contains only word-level variables. In other words, $r_w$ must be in the form $Z + \mathcal{F}(A)$ and is thus the word-level polynomial representation of the circuit.
Over $\mathbb{F}_{2^k}$, $A = a_0 + a_1\alpha + \cdots + a_{k-1}\alpha^{k-1}$. To compute $A^2$, a special property of Galois fields, dealing with powers of elements, can be applied.
Lemma 6.2 From [86]: Let $\alpha_1, \dots, \alpha_t$ be any elements in $\mathbb{F}_{p^k}$. Then
\[ (\alpha_1 + \alpha_2 + \cdots + \alpha_t)^{p^i} = \alpha_1^{p^i} + \alpha_2^{p^i} + \cdots + \alpha_t^{p^i} \tag{6.8} \]
for all integers i ≥ 1.
Lemma 6.2 can be applied to compute $A^2$:
\[ A^2 = a_0^2 + a_1^2\alpha^2 + \cdots + a_{k-1}^2\alpha^{2(k-1)} \tag{6.9} \]
Since each $a_i \in \mathbb{F}_2$, for 0 ≤ i < k, $a_i^2 = a_i$. This is applied to find the final form for $A^2$:
\[ A^2 = a_0 + a_1\alpha^2 + \cdots + a_{k-1}\alpha^{2(k-1)} \tag{6.10} \]
Similarly, $A^4$ can be derived as $(A^2)^2$:
\[ A^4 = a_0 + a_1\alpha^4 + \cdots + a_{k-1}\alpha^{4(k-1)} \tag{6.11} \]
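Equations (6.10) and (6.11) can be checked numerically for a small field. The following sketch assumes k = 3 and the primitive polynomial $P(x) = x^3 + x + 1$ (an arbitrary choice made for illustration); the helper names `gf_mul`, `gf_pow`, `word`, and `frobenius_holds` are hypothetical.

```python
from itertools import product

K, POLY = 3, 0b1011     # GF(2^3) from P(x) = x^3 + x + 1 (assumed choice)
ALPHA = 0b010

def gf_mul(x, y):
    """Shift-and-add multiplication in GF(2^K), reducing by POLY."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if (x >> K) & 1:
            x ^= POLY
    return result

def gf_pow(x, e):
    """x raised to the non-negative integer power e."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, x)
    return result

def word(bits, scale):
    """Return sum_i bits[i] * alpha^(scale * i)."""
    total = 0
    for i, bit in enumerate(bits):
        if bit:
            total ^= gf_pow(ALPHA, scale * i)
    return total

def frobenius_holds():
    """Check A^2 and A^4 against Eqns. (6.10) and (6.11) for every bit vector."""
    for bits in product((0, 1), repeat=K):
        A = word(bits, 1)
        if gf_pow(A, 2) != word(bits, 2) or gf_pow(A, 4) != word(bits, 4):
            return False
    return True

print(frobenius_holds())
```

The check succeeds because squaring is additive in characteristic 2 and each bit satisfies $a_i^2 = a_i$, which are exactly the two facts used in the derivation above.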
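Deriving $A^{2^j}$ in this manner for all 0 ≤ j < k gives a system of k equations. These equations can be represented in matrix form, $\mathbf{A} = M\mathbf{a}$, where $\mathbf{A} = [A, A^2, \dots, A^{2^{k-1}}]^T$, $\mathbf{a} = [a_0, a_1, \dots, a_{k-1}]^T$, and M is a k × k matrix:
\[
\begin{bmatrix} A \\ A^2 \\ A^{2^2} \\ \vdots \\ A^{2^{k-1}} \end{bmatrix}
=
\begin{bmatrix}
1 & \alpha & \alpha^2 & \dots & \alpha^{k-1} \\
1 & \alpha^2 & \alpha^4 & \dots & \alpha^{(k-1)\cdot 2} \\
1 & \alpha^{2^2} & \alpha^{2\cdot 2^2} & \dots & \alpha^{(k-1)\cdot 2^2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \alpha^{2^{k-1}} & \alpha^{2\cdot 2^{k-1}} & \dots & \alpha^{(k-1)\cdot 2^{k-1}}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{k-1} \end{bmatrix}
\tag{6.12}
\]
Note that M is a matrix of constants and $\mathbf{A}$ and $\mathbf{a}$ are vectors of variables. However, by interpreting $\mathbf{a}$ as a vector of unknowns, and M and $\mathbf{A}$ as constants, $F_a$ can be derived by solving Eqn. (6.12) using Gaussian elimination. This system of equations also has a special structure that can be exploited to further simplify the abstraction procedure.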
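Definition 6.3 Consider a system of n linear equations and n unknowns, $x_1, \dots, x_n$, expressed in matrix form as $M\mathbf{x} = \mathbf{b}$:
\[
\begin{bmatrix}
m_{11} & m_{12} & \dots & m_{1n} \\
m_{21} & m_{22} & \dots & m_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
m_{n1} & m_{n2} & \dots & m_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
\tag{6.13}
\]
Cramer's rule states that if $|M| \neq 0$, the system has the unique solution
\[ x_i = \frac{|M_i|}{|M|}, \quad 1 \leq i \leq n \tag{6.14} \]
where $M_i$ is M with the i-th column replaced with vector $\mathbf{b}$:
\[
M_i =
\begin{bmatrix}
m_{11} & m_{12} & \dots & m_{1\,i-1} & b_1 & m_{1\,i+1} & \dots & m_{1n} \\
m_{21} & m_{22} & \dots & m_{2\,i-1} & b_2 & m_{2\,i+1} & \dots & m_{2n} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
m_{n1} & m_{n2} & \dots & m_{n\,i-1} & b_n & m_{n\,i+1} & \dots & m_{nn}
\end{bmatrix}
\tag{6.15}
\]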
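Definition 6.4 Let $V(x_1, \dots, x_n)$ denote a square n × n matrix of the form
\[
V(x_1, \dots, x_n) =
\begin{bmatrix}
1 & x_1 & x_1^2 & \dots & x_1^{n-1} \\
1 & x_2 & x_2^2 & \dots & x_2^{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \dots & x_n^{n-1}
\end{bmatrix}
\tag{6.16}
\]
where elements of each row are presented in a geometric progression. Then $V(x_1, \dots, x_n)$ is a Vandermonde matrix, the determinant of which can be computed as:
\[ |V(x_1, \dots, x_n)| = \prod_{1 \leq i < j \leq n} (x_j - x_i) \tag{6.17} \]
This determinant is nonzero if each $x_i \in \{x_1, \dots, x_n\}$ is a distinct element.
Notice that M in Eqn. (6.12) is a square Vandermonde matrix of the form:
\[ V(\alpha, \alpha^2, \dots, \alpha^{2^{k-1}}) \tag{6.18} \]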
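Lemma 6.3 The determinant of M as in Eqn. (6.12) is nonzero.
Proof. Applying Eqn. (6.17) to M gives
\[ |M| = \prod_{0 \leq i < j \leq k-1} (\alpha^{2^j} - \alpha^{2^i}) \tag{6.19} \]
Since $\mathbb{F}_{2^k}$ is constructed from a primitive polynomial, every $\alpha^i$ is a distinct element for $0 \leq i < 2^k - 1$. Thus, |M| is nonzero as it is a product of nonzero elements.
Since |M| is nonzero, Cramer's rule can be applied to derive an equation for every $a_i$,
\[ a_i = \frac{|M_i|}{|M|} \tag{6.20} \]
where $M_i$ is M with the column $\{\alpha^i, \alpha^{i \cdot 2}, \dots, \alpha^{i \cdot 2^{k-1}}\}^T$ replaced by $\mathbf{A}$.
\[
M_i =
\begin{bmatrix}
1 & \alpha & \alpha^2 & \dots & \alpha^{i-1} & A & \alpha^{i+1} & \dots & \alpha^{k-1} \\
1 & \alpha^2 & \alpha^4 & \dots & \alpha^{(i-1)\cdot 2} & A^2 & \alpha^{(i+1)\cdot 2} & \dots & \alpha^{(k-1)\cdot 2} \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
1 & \alpha^{2^{k-1}} & \alpha^{2 \cdot 2^{k-1}} & \dots & \alpha^{(i-1)\cdot 2^{k-1}} & A^{2^{k-1}} & \alpha^{(i+1)\cdot 2^{k-1}} & \dots & \alpha^{(k-1)\cdot 2^{k-1}}
\end{bmatrix}
\tag{6.21}
\]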
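Lemma 6.4 Over $\mathbb{F}_{2^k}$, |M| = 1.
Proof. From Eqn. (6.19),
\[ |M| = \prod_{0 \leq i < j \leq k-1} (\alpha^{2^j} - \alpha^{2^i}) \tag{6.22} \]
Since −1 = 1 over $\mathbb{F}_{2^k}$, this is rewritten as
\[ |M| = \prod_{0 \leq i < j \leq k-1} (\alpha^{2^j} + \alpha^{2^i}) \tag{6.23} \]
Squaring both sides and applying Lemma 6.2 to each factor gives
\begin{align*}
|M|^2 &= \prod_{0 \leq i < j \leq k-1} (\alpha^{2^j} + \alpha^{2^i})^2 \tag{6.24} \\
      &= \prod_{0 \leq i < j \leq k-1} (\alpha^{2^{j+1}} + \alpha^{2^{i+1}}) \tag{6.25}
\end{align*}
When j = k − 1, the product term is in the form $(\alpha^{2^k} + \alpha^{2^{i+1}})$. Since $\alpha^{2^k} = \alpha$ over $\mathbb{F}_{2^k}$, this term is equivalent to $(\alpha^{2^{i+1}} + \alpha)$. The factors of $|M|^2$ are therefore the factors of |M| in a different order. This gives the property:
\[ |M|^2 = |M| \tag{6.26} \]
$|M| \in \mathbb{F}_{2^k}$, and only two elements of $\mathbb{F}_{2^k}$ satisfy Eqn. (6.26): 0 and 1. From Lemma 6.3, $|M| \neq 0$. So |M| = 1.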
The proof can be further explained with the help of an example.
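Example 6.6 Over $\mathbb{F}_{2^3}$:
\begin{align*}
A   &= a_0 + a_1\alpha + a_2\alpha^2 \\
A^2 &= a_0 + a_1\alpha^2 + a_2\alpha^4 \\
A^4 &= a_0 + a_1\alpha^4 + a_2\alpha^8 \tag{6.27}
\end{align*}
In matrix form:
\[
\begin{bmatrix} A \\ A^2 \\ A^4 \end{bmatrix}
=
\begin{bmatrix}
1 & \alpha & \alpha^2 \\
1 & \alpha^2 & \alpha^4 \\
1 & \alpha^4 & \alpha^8
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
\tag{6.28}
\]
Since M is a Vandermonde matrix of the form $V(\alpha, \alpha^2, \alpha^4)$, its determinant is found by applying Eqn. (6.17).
\[ |M| = (\alpha^4 - \alpha^2) \cdot (\alpha^4 - \alpha) \cdot (\alpha^2 - \alpha) \tag{6.29} \]
Over any $\mathbb{F}_{2^k}$, −1 = 1, so Eqn. (6.29) is rewritten as
\[ |M| = (\alpha^4 + \alpha^2) \cdot (\alpha^4 + \alpha) \cdot (\alpha^2 + \alpha) \tag{6.30} \]
Note that |M| is nonzero since it is a product of nonzero terms. Now compute $|M|^2$ while applying Lemma 6.2:
\begin{align*}
|M|^2 &= [(\alpha^4 + \alpha^2) \cdot (\alpha^4 + \alpha) \cdot (\alpha^2 + \alpha)]^2 \\
      &= (\alpha^8 + \alpha^4) \cdot (\alpha^8 + \alpha^2) \cdot (\alpha^4 + \alpha^2) \tag{6.31}
\end{align*}
Over $\mathbb{F}_{2^3}$, $\alpha^8 = \alpha$, so Eqn. (6.31) is further simplified.
\[ |M|^2 = (\alpha + \alpha^4) \cdot (\alpha + \alpha^2) \cdot (\alpha^4 + \alpha^2) \tag{6.32} \]
Notice that $|M|^2 = |M|$. Since $|M| \neq 0$, |M| must equal 1, as no other element of $\mathbb{F}_{2^3}$ can satisfy this condition. Indeed, evaluating Eqn. (6.30) and minimizing the result based on the primitive polynomial, P(x), that was used to construct $\mathbb{F}_{2^3}$ will always give the result 1 regardless of which P(x) is chosen.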
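The claim that |M| = 1 regardless of the chosen polynomial can be spot-checked by evaluating the product form of the determinant. The sketch below is illustrative only: the list `FIELDS` of primitive polynomials (given as bit masks) is an assumed sample for k = 2, . . . , 6, and the function names are hypothetical.

```python
# (k, bit mask of a primitive polynomial of degree k); an assumed sample.
FIELDS = [
    (2, 0b111),       # x^2 + x + 1
    (3, 0b1011),      # x^3 + x + 1
    (3, 0b1101),      # x^3 + x^2 + 1
    (4, 0b10011),     # x^4 + x + 1
    (5, 0b100101),    # x^5 + x^2 + 1
    (6, 0b1000011),   # x^6 + x + 1
]

def gf_mul(x, y, k, poly):
    """Shift-and-add multiplication in GF(2^k), reducing by poly."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if (x >> k) & 1:
            x ^= poly
    return result

def vandermonde_det(k, poly):
    """|M| = prod_{0 <= i < j < k} (alpha^(2^j) + alpha^(2^i)), with -1 = 1."""
    conj = [0b10]                                   # alpha
    for _ in range(k - 1):
        conj.append(gf_mul(conj[-1], conj[-1], k, poly))   # next power alpha^(2^j)
    det = 1
    for j in range(k):
        for i in range(j):
            det = gf_mul(det, conj[j] ^ conj[i], k, poly)
    return det

for k, poly in FIELDS:
    print(k, bin(poly), vandermonde_det(k, poly))
```

For k = 3 both polynomials reproduce the hand calculation of Example 6.6, where the three factors multiply to $\alpha^7 = 1$.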
Applying Lemma 6.4 to Eqn. (6.20) gives the equation for ai,
ai = |Mi| (6.33)
The determinant |Mi| can be computed symbolically as described below.
6.3.1 Symbolically Computing the Bit-Level Mapping
The refinement of the determinant |Mi| uses fundamental symmetric polynomials.
Definition 6.5 For $x_1, \dots, x_n$ and 0 ≤ j ≤ n, let $S_j(x_1, \dots, x_n)$ be the j-th fundamental symmetric polynomial in $\{x_1, \dots, x_n\}$:
\[ S_j(x_1, \dots, x_n) = \sum_{i_1 < \cdots < i_j} x_{i_1} x_{i_2} \cdots x_{i_j} \tag{6.34} \]
Informally, Sj is the sum of all unique monomials of exactly j variables, with no
variable having an exponent greater than 1.
Example 6.7 The possible fundamental symmetric polynomials over {x1, x2, x3} are:
S0(x1, x2, x3) = 1
S1(x1, x2, x3) = x1 + x2 + x3
S2(x1, x2, x3) = x1x2 + x1x3 + x2x3
S3(x1, x2, x3) = x1x2x3
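Proposition 6.4 Let $V_i(x_1, \dots, x_n)$, 0 ≤ i ≤ n, be a square Vandermonde-like matrix derived similarly to $V(x_1, \dots, x_n)$ but with the column $\{x_1^i, \dots, x_n^i\}^T$ skipped and a column $\{x_1^n, \dots, x_n^n\}^T$ appended to the end:
\[
V_i(x_1, \dots, x_n) =
\begin{bmatrix}
1 & x_1 & x_1^2 & \dots & x_1^{i-1} & x_1^{i+1} & \dots & x_1^n \\
1 & x_2 & x_2^2 & \dots & x_2^{i-1} & x_2^{i+1} & \dots & x_2^n \\
\vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \dots & x_n^{i-1} & x_n^{i+1} & \dots & x_n^n
\end{bmatrix}
\tag{6.35}
\]
It is known that
\[ |V_i(x_1, \dots, x_n)| = |V(x_1, \dots, x_n)| \cdot S_{n-i}(x_1, \dots, x_n) \tag{6.36} \]
Expanding $|M_i|$ of Eqn. (6.21) along the column that contains the powers of A, each minor is the determinant of a Vandermonde-like matrix, to which Eqn. (6.36) applies:
\begin{align*}
|M_i| &= \sum_{j=0}^{k-1} (-1)^{(i+j)} A^{2^j} \, |V_{i+1}(\alpha, \dots, \alpha^{2^{(j-1)}}, \alpha^{2^{(j+1)}}, \dots, \alpha^{2^{(k-1)}})| \tag{6.37} \\
      &= \sum_{j=0}^{k-1} (-1)^{(i+j)} A^{2^j} \cdot |V(\alpha, \dots, \alpha^{2^{(j-1)}}, \alpha^{2^{(j+1)}}, \dots, \alpha^{2^{(k-1)}})| \\
      &\qquad\qquad \cdot S_{n-1-i}(\alpha, \dots, \alpha^{2^{(j-1)}}, \alpha^{2^{(j+1)}}, \dots, \alpha^{2^{(k-1)}}) \tag{6.38}
\end{align*}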
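The identity of Eqn. (6.36) can be checked numerically over the integers. The following sketch is an illustration only: it uses a brute-force Leibniz determinant that is practical just for tiny matrices, the sample points `XS` are arbitrary, and the names `sym`, `det`, and `vandermonde_like` are hypothetical.

```python
from itertools import combinations, permutations
from math import prod

def sym(j, xs):
    """j-th fundamental symmetric polynomial S_j evaluated at xs (Definition 6.5)."""
    return sum(prod(c) for c in combinations(xs, j))

def det(matrix):
    """Determinant by the Leibniz formula (fine for very small matrices)."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[a] > perm[b] for a in range(n) for b in range(a + 1, n))
        term = prod(matrix[r][perm[r]] for r in range(n))
        total += -term if inversions % 2 else term
    return total

def vandermonde_like(xs, skip):
    """V_skip(xs): powers 0..n of each x with the power `skip` left out."""
    n = len(xs)
    exps = [e for e in range(n + 1) if e != skip]
    return [[x ** e for e in exps] for x in xs]

XS = (2, 3, 5)                                  # arbitrary distinct sample points
V = vandermonde_like(XS, len(XS))               # skipping power n gives V itself
for i in range(len(XS) + 1):
    lhs = det(vandermonde_like(XS, i))
    rhs = det(V) * sym(len(XS) - i, XS)
    print(i, lhs, rhs)
```

For these sample points each printed row shows the left and right sides of Eqn. (6.36) side by side; the identity itself is a polynomial identity, so the same relation is what the text applies over $\mathbb{F}_{2^k}$.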
6.4 Overall Approach
The entirety of the word-level abstraction approach for a circuit with k-bit input A and
k-bit output Z is summarized as follows:
1. Given a combinational circuit C, with word-level k-bit inputs A and word-level
output Z, select a primitive polynomial P (x) of degree k and construct F2k .
2. Perform a reverse-topological traversal of C to find RATO, {x1 > x2 > · · · >
xd > Z > A}, where {x1, . . . , xd} are bit-level variables with xi appearing earlier in
traversal than xj if i < j.
3. Derive the bit-level polynomials {f1, . . . , fs} from C. These will be in the form
fi : xi + TAIL(fi) where xi is the output of a Boolean logic gate.
4. Compose the word-level polynomials that correspond to bit-level and word-level
input and output:
$f_A : a_0 + a_1\alpha + \cdots + a_{k-1}\alpha^{k-1} + A$ (6.39)
$f_Z : z_0 + z_1\alpha + \cdots + z_{k-1}\alpha^{k-1} + Z$ (6.40)
5. Compute the reduction $f_Z \xrightarrow{f_1,\dots,f_s,\,f_A}_+ r$.
6. If r does not contain bit-level variables, then r is the word-level abstraction of C over
F2k . Otherwise, continue to step 7.
7. Compute Fa = {fa0 , . . . , fak−1} as
fa0 : a0 + |M0|
...
fak−1 : ak−1 + |Mk−1| (6.41)
where each |Mi| for 0 ≤ i < k is given by Eqn. (6.38).
8. Compute $r \xrightarrow{F_a,\,F_0}_+ r_w$. Then $r_w$ is the word-level abstraction of C over $\mathbb{F}_{2^k}$.
This approach can be easily extended to circuits with multiple word-level inputs as well as
circuits with varying word-sizes amongst the word-level inputs and output.
Example 6.8 Consider, again, the buggy example shown in Example 6.3, corresponding
to a buggy version of the multiplier circuit of Fig. 5.2. We already found
r = (α)a1b1 + (α + 1)a1B + b1A+ Z + (α + 1)AB (6.42)
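Since r contains the bit-level variable a1, find $f_{a_1} : a_1 + |M_1|$. In this example, $f_A : a_0 + a_1\alpha + A$, so k = 2 and
\[
\begin{bmatrix} A \\ A^2 \end{bmatrix}
=
\begin{bmatrix} 1 & \alpha \\ 1 & \alpha^2 \end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \end{bmatrix}
\tag{6.43}
\]
\[
|M_1| = \begin{vmatrix} 1 & A \\ 1 & A^2 \end{vmatrix} = A^2 - A = A^2 + A \tag{6.44}
\]
\[ f_{a_1} : a_1 + A^2 + A \tag{6.45} \]
As r also contains the bit-level input b1, the polynomial $f_{b_1}$ is also required. Since $f_B : b_0 + b_1\alpha + B$ is isomorphic to $f_A$, $f_{b_1}$ can be derived by performing the corresponding substitutions in $f_{a_1}$.
\[ f_{b_1} : b_1 + B^2 + B \tag{6.46} \]
Now, computing $r \xrightarrow{f_{a_1},\,f_{b_1}}_+ r_w$ finds
\[ r_w = Z + (\alpha)A^2B^2 + A^2B + (\alpha + 1)AB^2 + (\alpha + 1)AB \tag{6.47} \]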
which is indeed the polynomial representation of the buggy circuit.
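The substitution step of Example 6.8 can be checked by evaluation over all of $\mathbb{F}_{2^2}$. The sketch below is a hedged illustration: it does not model the circuit of Fig. 6.1, it only confirms that the mapping of Eqn. (6.45) recovers the bit a1 and that substituting it into Eqn. (6.42) yields the same function as Eqn. (6.47). The helper names are hypothetical, and field elements are encoded as 2-bit integers with $P(x) = x^2 + x + 1$.

```python
from itertools import product

ALPHA, ALPHA_PLUS_1 = 2, 3     # GF(4) as 2-bit ints

def gf4_mul(x, y):
    """Multiply in GF(4), modulus x^2 + x + 1."""
    result = 0
    for _ in range(2):
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if x & 0b100:
            x ^= 0b111
    return result

def upper_bit(A):
    """Eqn. (6.45): a1 = A^2 + A (and likewise b1 = B^2 + B)."""
    return gf4_mul(A, A) ^ A

def r_after_substitution(A, B):
    """Tail of Eqn. (6.42) (everything except Z) with a1, b1 replaced."""
    a1, b1 = upper_bit(A), upper_bit(B)
    return (gf4_mul(ALPHA, gf4_mul(a1, b1))
            ^ gf4_mul(ALPHA_PLUS_1, gf4_mul(a1, B))
            ^ gf4_mul(b1, A)
            ^ gf4_mul(ALPHA_PLUS_1, gf4_mul(A, B)))

def rw_tail(A, B):
    """Tail of Eqn. (6.47) (everything except Z)."""
    A2, B2 = gf4_mul(A, A), gf4_mul(B, B)
    return (gf4_mul(ALPHA, gf4_mul(A2, B2))
            ^ gf4_mul(A2, B)
            ^ gf4_mul(ALPHA_PLUS_1, gf4_mul(A, B2))
            ^ gf4_mul(ALPHA_PLUS_1, gf4_mul(A, B)))

print(all(upper_bit(A) == A >> 1 for A in range(4)))
print(all(r_after_substitution(A, B) == rw_tail(A, B)
          for A, B in product(range(4), repeat=2)))
```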
6.5 Complexity Analysis
The worst case complexity of abstracting a combinational circuit over F2k [x1, . . . , xn]
using the proposed approach is now analyzed. For simplicity, a generic circuit with a
k-bit input A and a k-bit output Z is examined. No assumptions are made about the type
of internal gates of the circuit; any gate can have an arbitrary number of inputs and an
arbitrary representation over $\mathbb{F}_{2^k}$.
The first step of the abstraction is the reduction
\[ f_z \xrightarrow{F-\{f_z\},\,F_0}_+ r \tag{6.48} \]
\[ \text{where } f_z : z_0 + z_1\alpha + \cdots + z_{k-1}\alpha^{k-1} + Z \tag{6.49} \]
where $\{z_0, \dots, z_{k-1}\}$ are the bit-level outputs, F is the set of polynomials derived from the circuit, and F0 is the set of vanishing polynomials.
Lemma 6.5 No intermediate polynomial $r_i$ during the reduction process $f_z \xrightarrow{F-\{f_z\},\,F_0}_+ r$ will ever contain a variable with a degree larger than 1.
Proof. Any intermediate result can contain three types of variables:
• Bit-level variables: Any intermediate division that results in a polynomial containing a variable $x \in \mathbb{F}_2$ with degree higher than 1 is immediately divided by $x^2 + x \in F_0$.
• Z: As reduction proceeds, the term Z is never modified since no polynomial other than $f_Z$ contains the variable Z and $f_Z$ is never used during the reduction process.
• A: This term is contained in $f_A : a_0 + a_1\alpha + \cdots + a_{k-1}\alpha^{k-1} + A$. Notice that $LT(f_A) = a_0 \in \mathbb{F}_2$; since an intermediate polynomial will never contain the variable $a_0$ with a degree higher than 1, reducing by $f_A$ will not create a variable A with degree higher than 1.
Lemma 6.6 Let C · M be a term where C is a coefficient in $\mathbb{F}_{2^k}$ and M is a monomial with variables in $\{x_1, \dots, x_n\}$. Since the maximum degree of any variable of any intermediate polynomial $r_i$ is 1, the maximum number of terms in any $r_i$ is $2^n$. Similarly, the maximum number of terms of any polynomial in F is $2^n$, since these polynomials are also guaranteed not to have any variables with degree greater than 1.
The order in which polynomials in F and F0 are used to divide r is based on RATO. Each division process divides the leading term of the intermediate polynomial.
Lemma 6.7 Each term in an intermediate result $r_i$ is reduced at most once.
Proof. Assume the division is being computed by some f ∈ F. This division is computed as $\frac{LT(r_i)}{LT(f)} \cdot f + r_i$. Since $LT\left(\frac{LT(r_i)}{LT(f)} \cdot f\right) = LT(r_i)$, the leading terms are cancelled. The leading term of the resulting polynomial is strictly smaller than $LT(r_i)$ in the ordering. As every subsequent division produces a leading term strictly smaller than the last, $LT(r_i)$ never appears again. Thus, each term is divided at most once.
Lemma 6.8 Since the maximum number of terms in any resulting intermediate polynomial $r_i$ is $2^n$ and each term is divided at most once, the maximum number of divisions computed during the reduction is $2^n$.
Assume that a monomial multiplication and monomial addition can be computed in constant time. Each division is computed as $\frac{LT(r_i)}{LT(f)} \cdot f + r_i$. Here, f can have at most $2^n$ terms. $\frac{LT(r_i)}{LT(f)}$ is computed in constant time. Then, $\left(\frac{LT(r_i)}{LT(f)}\right) \cdot f$ requires at most $2^n$ monomial multiplications. As the result from a monomial multiplication may contain variables with degrees higher than 1, each monomial has its variables minimized, for a maximum of $2^n$ minimizations. Finally, this result is added to $r_i$ using a maximum of $2^n$ monomial additions. Thus each division uses at most $2^n + 2^n + 2^n + 1$ monomial operations.
Lemma 6.9 The complexity of the reduction $f_z \xrightarrow{F-\{f_z\},\,J_0}_+ r$ is $O(2^{2n})$.
Proof. This is computed as the maximum number of divisions multiplied by the maximum number of monomial additions/multiplications per division.
\[ 2^n \cdot (2^n + 2^n + 2^n + 1) = 2^{2n} + 2^{2n} + 2^{2n} + 2^n < 4 \cdot 2^{2n} = O(2^{2n}) \tag{6.50} \]
The next step is to derive the bit-level to word-level mapping $F_a$. Without any optimizations, this can be derived by computing Gaussian elimination of a k × k matrix. The worst case arithmetic complexity here is $O(k^3)$. Furthermore, this is done in parallel with the reduction. As k is much smaller than n, the $O(2^{2n})$ reduction easily absorbs the complexity of deriving $F_a$, i.e., $O(2^{2n}) \gg O(k^3)$.
The last step is computing the final reduction $r \xrightarrow{F_a,\,F_0}_+ r_w$.
Lemma 6.10 Any intermediate result $r_j$ during the final reduction $r \xrightarrow{F_a,\,F_0}_+ r_w$ will contain at most $2^{2k} + 1$ terms.
Proof. Any intermediate polynomial will only contain the variables $\{a_0, \dots, a_{k-1}, Z, A\}$. From Lemma 6.5, all variables $\{a_0, \dots, a_{k-1}\}$ can be degree 1 at most. Hence, there can be at most $2^k$ terms containing only variables $\{a_0, \dots, a_{k-1}\}$. The variable A can have a degree of at most $2^k - 1$, as any higher degree is immediately divided by $\{A^{2^k} + A\} \in J_0$. Thus there can be at most $2^k$ terms containing only the variable A, which means that there are $2^k \cdot 2^k = 2^{2k}$ terms containing variables in $\{a_0, \dots, a_{k-1}, A\}$. The variable Z is only found in 1 term: Z. This term is never divided and never modified since the only polynomial within the set of divisors that contains the variable Z is $\{Z^{2^k} + Z\} \in J_0$. So the maximum number of monomials in an intermediate result is $2^{2k} + 1$.
Lemma 6.11 Due to Lemma 6.7, each term will be divided at most once. Z is never divided, so at most $2^{2k}$ divisions are computed.
In each division, $\frac{LT(r_j)}{LT(f)} \cdot f + r_j$, f has at most $2^k + 1$ terms, since it is of the form $a_i + \mathcal{F}(A)$. Thus, there is 1 monomial division and at most $2^k + 1$ monomial multiplications. As before, the degree of each resulting monomial needs to be minimized, which performs a maximum of $2^k + 1$ minimizations. Finally, at most $2^k + 1$ monomial additions are computed when adding this result to $r_j$. Thus, the total number of monomial operations per division is:
\[ 1 + (2^k + 1) + (2^k + 1) + (2^k + 1) = 3 \cdot 2^k + 4 \tag{6.51} \]
Lemma 6.12 The complexity of the final substitution, which is computed as the reduction $r \xrightarrow{F_a,\,F_0}_+ r_w$, is $O(2^{3k})$.
Proof. This is computed as the maximum number of divisions multiplied by the maximum number of monomial operations per division.
\[ (2^{2k}) \cdot (3 \cdot 2^k + 4) = 3 \cdot 2^{3k} + 2^{2k+2} < 3 \cdot 2^{3k+2} + 2^{3k+2} = 2^{3k+4} = O(2^{3k}) \tag{6.52} \]
Theorem 6.3 The worst-case complexity for the abstraction algorithm is
\[ O(2^{2n}) + O(2^{3k}) \tag{6.53} \]
The first reduction is the main bottleneck in the abstraction computation since n is much
larger than k.
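The operation counts behind Lemma 6.9 and Lemma 6.12 can be tabulated to see how quickly the first term dominates. This is a small arithmetic sketch; the function names `first_reduction_ops` and `final_reduction_ops` are hypothetical, and the sample values of n and k are arbitrary.

```python
def first_reduction_ops(n):
    """Worst-case monomial operations of the first reduction, Eqn. (6.50)."""
    return 2**n * (2**n + 2**n + 2**n + 1)

def final_reduction_ops(k):
    """Worst-case monomial operations of the final substitution, Eqn. (6.52)."""
    return 2**(2 * k) * (3 * 2**k + 4)

# Arbitrary sample sizes: n variables in the circuit, k-bit words.
for n, k in [(8, 2), (16, 4), (32, 8)]:
    print(n, k, first_reduction_ops(n), final_reduction_ops(k))
```

Both counts stay below the bounds used in the proofs, $4 \cdot 2^{2n}$ and $2^{3k+4}$ respectively, for every n, k ≥ 1.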
6.6 Conclusions
This chapter proposed an approach that solves the problem of word-level abstraction
from bit-level circuits using symbolic computation. Using an improved ordering (RATO)
a reduction is computed to derive a polynomial, r, which contains only bit-level inputs
{a0, . . . , ak−1} and the word-level datapaths A and Z. This reduction is efficiently com-
puted using an F4-style reduction engine. Next, an equation of the form ai = Fai(A),
which maps each bit-level input to its corresponding word-level representation, is computed
using a binomial expansion over F2k . Substituting each ai variable in r by Fai(A) gives
rw, which is the canonical, word-level, polynomial representation of the circuit. Finally, a
complexity analysis of the overall approach is provided.
The abstraction approach is directly applicable only to circuits with equivalent datapath
sizes amongst the inputs and output, i.e., every word input and output is of size k. The next
chapter generalizes the abstraction approach to be applicable to any arbitrary combinational
circuit.
CHAPTER 7
GENERALIZING THE APPROACH TO ARBITRARY
COMBINATIONAL CIRCUITS
The abstraction approach presented in the previous chapter is directly applicable only when the word sizes of the operands of the given circuit are the same, k bits in size. In this case, the circuit computes a function $\mathbb{F}_{2^k}^n \to \mathbb{F}_{2^k}$, where n is the number of word-level inputs, and the abstraction is thus analyzed over $\mathbb{F}_{2^k}$. When the input and output sizes
vary, the analysis must be performed over an overarching field. This chapter shows how
to suitably modify the approach for abstracting word-level representations of circuits with
varying input and output sizes.
As an application of this generalization, design and verification methodologies for
composite field multipliers over F2k are also explored. These designs decompose the
field F2k to F(2m)n , where k = m · n. Internally, the circuit is then composed of an
n-interconnection of m-bit multipliers and adders over F2m . This chapter describes how
this hierarchy can be exploited by the abstraction approach.
7.1 Circuits with Varying Input and Output Sizes
When the word sizes of the inputs and output of the circuit vary, the functionality of the circuit must be analyzed over an encompassing field. Given a circuit with a word-level input A and word-level output Z, let t be the bit size of A and u be the bit size of Z. Input A can be represented as an element of the field $\mathbb{F}_{2^t}$; likewise, Z is an element of $\mathbb{F}_{2^u}$. Thus, this circuit computes some function $f : \mathbb{F}_{2^t} \to \mathbb{F}_{2^u}$. If t ≠ u, there is no guarantee that A can be represented over $\mathbb{F}_{2^u}$ or, conversely, Z over $\mathbb{F}_{2^t}$. Thus, the analysis must be performed over some $\mathbb{F}_{2^k}$ such that $\mathbb{F}_{2^t} \subset \mathbb{F}_{2^k}$ and $\mathbb{F}_{2^u} \subset \mathbb{F}_{2^k}$. That is, the function is mapped to a larger field, $\mathbb{F}_{2^k} \to \mathbb{F}_{2^k}$. The smallest such field $\mathbb{F}_{2^k}$ is constructed where k = LCM(t, u).
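The choice of k can be sketched in a few lines. The helper names `encompassing_degree` and `embeds` are hypothetical; the sketch relies on the standard fact that $\mathbb{F}_{2^t}$ is a subfield of $\mathbb{F}_{2^k}$ exactly when t divides k.

```python
from functools import reduce
from math import gcd

def encompassing_degree(*word_sizes):
    """Smallest k such that GF(2^t) embeds in GF(2^k) for every word size t."""
    return reduce(lambda a, b: a * b // gcd(a, b), word_sizes)

def embeds(t, k):
    """GF(2^t) is a subfield of GF(2^k) exactly when t divides k."""
    return k % t == 0

k = encompassing_degree(3, 2)     # a 3-bit input and a 2-bit output
print(k, embeds(3, k), embeds(2, k))
```

When t divides k, $2^t - 1$ also divides $2^k - 1$, so the multiplicative group of the smaller field sits inside that of the larger one.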
Given the primitive polynomial P(x) of degree k, let α be the primitive element of $\mathbb{F}_{2^k}$, i.e., P(α) = 0. Let β be some primitive element of $\mathbb{F}_{2^t}$. Then the polynomial $f_A$ is denoted
\[ f_A : a_0 + a_1\beta + \cdots + a_{t-1}\beta^{t-1} + A \tag{7.1} \]
Since the analysis is over $\mathbb{F}_{2^k}$, β must be mapped directly to α. Here, a generalized result from [107] gives the relation
\[ \beta = \alpha^{(2^k - 1)/(2^t - 1)} \tag{7.2} \]
All powers of β in $f_A$ are replaced by their representation in α. Similarly, let γ be the primitive element of $\mathbb{F}_{2^u}$, such that
\[ f_Z : z_0 + z_1\gamma + \cdots + z_{u-1}\gamma^{u-1} + Z \tag{7.3} \]
A mapping from γ to α is found in the same way and applied to $f_Z$. All polynomials in the ideal J now have coefficients in $\mathbb{F}_{2^k}$. The polynomials in $J_0$ derived from A and from Z are modified to reflect their corresponding fields:
\[ A^{2^t} + A = 0 \qquad Z^{2^u} + Z = 0 \tag{7.4} \]
Now, the reduction procedure $f_Z \xrightarrow{F-\{f_Z\},\,F_0}_+ r$ is computed normally over $\mathbb{F}_{2^k}$.
Some changes still need to be made to functionally map each $\{a_0, \dots, a_{t-1}\}$ to A over $\mathbb{F}_{2^k}$. This is accomplished by first computing the bit-level to word-level mapping over $\mathbb{F}_{2^t}$ using the same approach described in Chapter 6, which provides the polynomials $f_{a_0}, \dots, f_{a_{t-1}}$ with coefficients in β. Then, map each β coefficient in $F_a = \{f_{a_0}, \dots, f_{a_{t-1}}\}$ to α using Eqn. (7.2). Polynomials of $F_a$ are now in $\mathbb{F}_{2^k}$ and the final substitution $r \xrightarrow{F_a,\,F_0}_+ r_w$ is computed as normal.
This allows the word-level abstraction of any combinational circuit with one word-level
input of any size and one word-level output. In the case of multiple word-level inputs, let
k be the LCM of all the bit-sizes of the inputs and outputs. Then the abstraction proceeds
as normal.
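Example 7.1 Consider the circuit shown in Fig. 7.1. The input, A, is 3 bits wide while the output, Z, is 2 bits. Thus, $A \in \mathbb{F}_{2^3}$ and $Z \in \mathbb{F}_{2^2}$. Let β be the primitive element of $\mathbb{F}_{2^3}$ and γ be the primitive element of $\mathbb{F}_{2^2}$, i.e., $A = a_0 + a_1\beta + a_2\beta^2$ and $Z = z_0 + z_1\gamma$. Thus, the circuit computes the function $\mathbb{F}_{2^3} \to \mathbb{F}_{2^2}$. Since LCM(2, 3) = 6, $\mathbb{F}_{2^2} \subset \mathbb{F}_{2^6}$ and $\mathbb{F}_{2^3} \subset \mathbb{F}_{2^6}$. So this function is mapped to $\mathbb{F}_{2^6} \to \mathbb{F}_{2^6}$. Choose $P(X) = X^6 + X + 1$ as the irreducible polynomial to construct $\mathbb{F}_{2^6}$, where P(α) = 0. Then β and γ can be represented in terms of α using Eqn. (7.2):
\[ \beta = \alpha^{(2^6 - 1)/(2^3 - 1)} = \alpha^9 \qquad \gamma = \alpha^{(2^6 - 1)/(2^2 - 1)} = \alpha^{21} \tag{7.5} \]
So the word-level polynomials are:
\begin{align*}
f_A &: a_0 + a_1\alpha^9 + a_2\alpha^{18} + A \\
f_Z &: z_0 + z_1\alpha^{21} + Z \tag{7.6}
\end{align*}
The rest of the polynomials in F are derived from the circuit:
\begin{align*}
f_1 &: z_0 + s_0 + s_1 & f_2 &: z_1 + s_1 + s_2 & f_3 &: s_0 + a_0 \cdot a_1 \\
f_4 &: s_1 + a_1 \cdot a_2 & f_5 &: s_2 + a_0 \cdot a_2 \tag{7.7}
\end{align*}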
where the RATO ordering is:








Figure 7.1: Circuit with varying word sizes.
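F0 is defined as before, except with a change to the polynomials derived from word-level variables A and Z:
\begin{align*}
f_6 &: z_0^2 + z_0 & f_7 &: z_1^2 + z_1 & f_8 &: s_0^2 + s_0 \\
f_9 &: s_1^2 + s_1 & f_{10} &: s_2^2 + s_2 & f_{11} &: a_0^2 + a_0 \\
f_{12} &: a_1^2 + a_1 & f_{13} &: a_2^2 + a_2 & f_{14} &: Z^4 + Z \\
f_{15} &: A^8 + A \tag{7.9}
\end{align*}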
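Computing $f_Z \xrightarrow{F-\{f_Z\},\,F_0}_+ r$ gives:
\begin{align*}
r = {} & (\alpha^2 + \alpha) \cdot a_1 \cdot a_2 + a_1 \cdot A + (\alpha^4 + \alpha^3) \cdot a_1 \\
       & + (\alpha^5 + \alpha^4 + \alpha^3 + \alpha + 1) \cdot a_2 \cdot A + (\alpha^5 + \alpha^4 + \alpha^2 + \alpha) \cdot a_2 + Z \tag{7.10}
\end{align*}
This result must be further reduced by $f_{a_1} : a_1 + \mathcal{F}_{a_1}(A)$ and $f_{a_2} : a_2 + \mathcal{F}_{a_2}(A)$. First, the mapping is computed over $\mathbb{F}_{2^3}$ in terms of β, using the matrices
\[
M_1 = \begin{bmatrix} 1 & A & \beta^2 \\ 1 & A^2 & \beta^4 \\ 1 & A^4 & \beta^8 \end{bmatrix}
\qquad
M_2 = \begin{bmatrix} 1 & \beta & A \\ 1 & \beta^2 & A^2 \\ 1 & \beta^4 & A^4 \end{bmatrix}
\tag{7.11}
\]
Computing the determinants in $f_{a_1} : a_1 + |M_1|$ and $f_{a_2} : a_2 + |M_2|$ gives:
\begin{align*}
f_{a_1} &: a_1 + A^4 \cdot (\beta^4 + \beta^2) + A^2 \cdot (\beta^8 + \beta^2) + A \cdot (\beta^8 + \beta^4) \\
f_{a_2} &: a_2 + A^4 \cdot (\beta^2 + \beta) + A^2 \cdot (\beta^4 + \beta) + A \cdot (\beta^4 + \beta^2) \tag{7.12}
\end{align*}
Replacing all β in $f_{a_1}$ and $f_{a_2}$ with $\alpha^9$ gives their proper form over $\mathbb{F}_{2^6}$.
\begin{align*}
f_{a_1} &: a_1 + A^4 \cdot (\alpha^4 + \alpha^3 + 1) + A^2 \cdot (\alpha^4 + \alpha^2 + \alpha + 1) + A \cdot (\alpha^3 + \alpha^2 + \alpha) \\
f_{a_2} &: a_2 + A^4 \cdot (\alpha^4 + \alpha^2 + \alpha + 1) + A^2 \cdot (\alpha^3 + \alpha^2 + \alpha) + A \cdot (\alpha^4 + \alpha^3 + 1) \tag{7.13}
\end{align*}
Finally, computing the reduction $r \xrightarrow{f_{a_1},\,f_{a_2},\,F_0}_+ r_w$ gives the word-level polynomial abstraction of the circuit.
\begin{align*}
r_w : {} & Z + A^6(\alpha^2 + \alpha) + A^5(\alpha^4 + \alpha^3 + \alpha) + A^4(\alpha^2 + \alpha) \\
         & + A^3(\alpha^4 + \alpha^3 + \alpha^2) + A^2(\alpha^4 + \alpha^3 + \alpha^2) + A(\alpha^4 + \alpha^3 + \alpha)
\end{align*}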
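The field embedding used in Example 7.1 can be checked numerically. The sketch below is an illustration under stated assumptions: elements of $\mathbb{F}_{2^6}$ are encoded as 6-bit integers built from $P(X) = X^6 + X + 1$, the helper names are hypothetical, and only the coefficients of $f_{a_1}$ in Eqns. (7.12) and (7.13) are examined.

```python
from itertools import product

K, POLY = 6, 0b1000011      # GF(2^6) from P(X) = X^6 + X + 1
ALPHA = 0b10

def gf_mul(x, y):
    """Shift-and-add multiplication in GF(2^6), reducing by POLY."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if (x >> K) & 1:
            x ^= POLY
    return result

def gf_pow(x, e):
    """x raised to the non-negative integer power e."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, x)
    return result

BETA = gf_pow(ALPHA, 9)      # Eqn. (7.5)
GAMMA = gf_pow(ALPHA, 21)

def coeffs_a1():
    """Coefficients of A^4, A^2, A in f_a1, Eqn. (7.12), expressed via alpha."""
    b = lambda e: gf_pow(BETA, e)
    return b(4) ^ b(2), b(8) ^ b(2), b(8) ^ b(4)

def recover_a1(A):
    """Evaluate F_a1(A) = c4*A^4 + c2*A^2 + c1*A."""
    c4, c2, c1 = coeffs_a1()
    return gf_mul(gf_pow(A, 4), c4) ^ gf_mul(gf_pow(A, 2), c2) ^ gf_mul(A, c1)

print([bin(c) for c in coeffs_a1()])
for a0, a1, a2 in product((0, 1), repeat=3):
    A = a0 ^ (a1 * BETA) ^ (a2 * gf_pow(BETA, 2))
    print((a0, a1, a2), recover_a1(A))
```

In this encoding the three coefficients are the bit patterns of $\alpha^4 + \alpha^3 + 1$, $\alpha^4 + \alpha^2 + \alpha + 1$, and $\alpha^3 + \alpha^2 + \alpha$, matching Eqn. (7.13).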
This result allows the abstraction to be computed over fields of different sizes. An
important application of this result is that of modularly-designed composite field arithmetic
circuits. These circuits compute operations over very large fields (e.g., k = 1024) by
combining operations over smaller subfields (e.g., k = 32). The abstraction approach can
be efficiently applied to these types of circuits by exploiting the hierarchy found in these
designs.
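The word-level polynomial of the example above can be cross-checked by brute force. The following is a minimal Python sketch, not the reduction-based method of this chapter: it enumerates the eight bit-level inputs, maps them to word-level values using β = α^9 and γ = α^21, and interpolates a polynomial over the eight points of the embedded subfield. All function and variable names are illustrative assumptions.

```python
# Brute-force cross-check of the example: F_{2^6} built with P(x) = x^6 + x + 1.
K, POLY = 6, 0b1000011

def gf_mul(a, b):
    """Multiply two elements of F_{2^6} stored as 6-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> K) & 1:          # degree reached K: subtract P(x)
            a ^= POLY
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

ALPHA = 0b10
BETA = gf_pow(ALPHA, 9)           # generates the copy of F_{2^3}
GAMMA = gf_pow(ALPHA, 21)         # generates the copy of F_{2^2}

def circuit(a0, a1, a2):
    """Gate-level behaviour described by f1..f5 of the example."""
    s0, s1, s2 = a0 & a1, a1 & a2, a0 & a2
    return s0 ^ s1, s1 ^ s2       # (z0, z1)

def samples():
    """(A, Z) word-level pairs for all eight bit-level inputs."""
    out = []
    for bits in range(8):
        a0, a1, a2 = bits & 1, (bits >> 1) & 1, (bits >> 2) & 1
        z0, z1 = circuit(a0, a1, a2)
        A = a0 ^ (a1 * BETA) ^ (a2 * gf_mul(BETA, BETA))
        Z = z0 ^ (z1 * GAMMA)
        out.append((A, Z))
    return out

def interpolate(points, q=8):
    """Coefficients c[0..q-1] of F with F(A) = Z on every point.

    Uses the indicator 1 + (X + a)^(q-1) of X == a on a field with q elements;
    in characteristic 2 with q - 1 = 7, (X + a)^7 = sum_i X^i a^(7-i).
    """
    coeffs = [0] * q
    for a, z in points:
        if a == 0:
            coeffs[0] ^= z
        for i in range(1, q):
            coeffs[i] ^= gf_mul(z, gf_pow(a, q - 1 - i))
    return coeffs

def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):    # Horner's rule
        acc = gf_mul(acc, x) ^ c
    return acc

COEFFS = interpolate(samples())   # COEFFS[i] is the coefficient of A^i
```

Here `COEFFS[6]` evaluates to `0b000110`, that is α^2 + α, and `COEFFS[1]` to `0b011010`, that is α^4 + α^3 + α, in agreement with the A^6 and A terms of r_w. This enumeration is exponential in the input width, so it is only useful as a sanity check on small examples.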
7.2 Composite Field Arithmetic Circuits
A Galois field multiplier over F2k can be composed over the composite field F(2m)n
where k = m · n [108]. Similarly to how F2k is a k-dimensional extension of the subfield
F2, F(2m)n is also an n-dimensional extension of F2m . A composite field multiplier lifts
the ground field from F2 to F2m and computes the multiplication over F2k as a collection of
operations over F2m . Thus, a composite field multiplier over F2k is composed internally as
a collection of multipliers and adders over F2m .
This hierarchy found in composite field multipliers can be exploited by the proposed
abstraction approach. Abstraction of these types of multipliers is composed of two steps:
1. Compute the canonical word-level polynomial representation of each F2m multiplier
and adder.
• These abstractions are independent of one another. Thus, they are computed in
parallel.
• In the case of an adder, the abstraction is trivial.
2. Compute the overall abstraction of the F2k multiplier.
The first step utilizes the proposed abstraction approach over F2m with no changes. Once
these word-level abstractions are known, they replace the gate-level implementations for
the final word-level abstraction of the multiplier over F2k .
7.2.1 Design of Composite Field Multipliers
The following adapts principles of composite fields from [108] and explains how they
are applied to construct Galois field multipliers. Consider the element A ∈ F2k and its
representation over F(2m)n . Let α be the primitive element of F2k and let γ be the primitive
element of F(2m)n . Then any element A ∈ F2k is represented as:
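\[ A = a_0 + a_1\alpha + \cdots + a_{k-1}\alpha^{k-1}, \quad a_i \in \mathbb{F}_2 \qquad (7.14) \]
This same element $A \in \mathbb{F}_{(2^m)^n}$ is represented as:
\[ A = A_0 + A_1\gamma + \cdots + A_{n-1}\gamma^{n-1}, \quad A_i \in \mathbb{F}_{2^m} \qquad (7.15) \]
Let $\beta$ be the primitive element of $\mathbb{F}_{2^m}$. Then each $A_i$ is represented as
\[ A_i = a_{i0} + a_{i1}\beta + \cdots + a_{i(m-1)}\beta^{m-1}, \quad a_{ij} \in \mathbb{F}_2 \qquad (7.16) \]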
Since there always exists a unique field with pk elements, the field F2k is isomorphic
to the field F(2m)n . Due to this, γ = α. Furthermore, β can be derived from α using Eqn.
(7.2) since F2m ⊂ F2k , i.e., β = αw for some w. Thus, all that is required to construct a
composite field multiplier over F(2m)n is the primitive polynomial P (x), which generates
F2k , with P (α) = 0, where β is known.
In order to lift the ground field F2 to F2m , the variables {a00, . . . , a{n−1}{m−1}} must
be derived in terms of {a0, . . . , ak−1}. Equating the representation of A ∈ F2k with the
representation of A ∈ F(2m)n from Eqns. (7.14) and (7.15) gives the following:
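\[ a_0 + a_1\alpha + \cdots + a_{k-1}\alpha^{k-1} = \sum_{i=0}^{n-1} A_i\,\alpha^{i} = \sum_{i=0}^{n-1}\Big(\sum_{j=0}^{m-1} a_{ij}\,\beta^{j}\Big)\alpha^{i} = \sum_{i=0}^{n-1}\Big(\sum_{j=0}^{m-1} a_{ij}\,\alpha^{wj}\Big)\alpha^{i} \qquad (7.17) \]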
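Here, every $A_0, \dots, A_{n-1}$ is replaced by its representation over $\mathbb{F}_{2^m}$. Analyzing the coefficients gives $\{a_0, \dots, a_{k-1}\}$ in terms of $\{a_{00}, \dots, a_{(n-1)(m-1)}\}$. This mapping can be represented in matrix form:
\[ (a_0, a_1, \dots, a_{k-1})^T = T \cdot (a_{00}, a_{01}, \dots, a_{(n-1)(m-1)})^T \qquad (7.18) \]
where $T$ is a $k$ by $k$ matrix consisting of elements in $\mathbb{F}_2$. Inverting $T$ gives the reverse mapping, which expresses $\{a_{00}, \dots, a_{(n-1)(m-1)}\}$ in terms of $\{a_0, \dots, a_{k-1}\}$.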
Example 7.2 An example composite field multiplier F(22)2 , which computes a multiplica-
tion over F24 , is shown in Fig. 7.2. Notice that, after the transformation, all additions and
multiplications are computed over the base field F22 .
Let P (x) = x4 + x3 + 1 and P (α) = 0. Representation of element A ∈ F(22)2 is:
A = A0 + A1 · α (7.19)
Representation of $A_0, A_1$ in $\mathbb{F}_{2^2}$ is:
\[ A_0 = a_{00} + a_{01}\cdot\beta, \qquad A_1 = a_{10} + a_{11}\cdot\beta \qquad (7.20) \]
Figure 7.2: 4-bit composite multiplier designed over F(22)2 .
where aij ∈ F2. From Eqn. (7.2), β = α5. Now, A0, A1 can be substituted into A as
follows:
A = a00 + a01 · α5 + (a10 + a11 · α5) · α (7.21)
Since P (x) = x4 + x3 + 1 with P (α) = 0,
A (mod P (α)) = a00+a01+a11+(a01+a10+a11) ·α+a11 ·α2+(a01+a11) ·α3 (7.22)
The same element A ∈ F24 is represented as:
A = a0 + a1 · α + a2 · α2 + a3 · α3 (7.23)
Since Eqns. (7.22) and (7.23) represent the same element, we can match the coefficients of
the polynomials to obtain:
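\[ a_0 = a_{00} + a_{01} + a_{11}, \quad a_1 = a_{01} + a_{10} + a_{11}, \quad a_2 = a_{11}, \quad a_3 = a_{01} + a_{11} \]
In matrix form, this mapping and its inverse are:
\[ \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix} =
\begin{pmatrix} 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} a_{00} \\ a_{01} \\ a_{10} \\ a_{11} \end{pmatrix}, \qquad
\begin{pmatrix} a_{00} \\ a_{01} \\ a_{10} \\ a_{11} \end{pmatrix} =
\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix} \qquad (7.24) \]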
Thus, A is represented in F(22)2 as:
A = A0 + A1 · α
A0 = a00 + a01 · α5
A1 = a10 + a11 · α5
a00 = a0 + a3
a01 = a2 + a3
a10 = a1 + a3
a11 = a2
B is similarly represented in F(22)2 .
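The basis-conversion matrix of this example can be derived mechanically. The sketch below is an illustrative Python fragment, assuming only P(x) = x^4 + x^3 + 1 and β = α^5 from the example: it expands each α^(wj+i) modulo P(x) to obtain the columns of T and then inverts T over F_2 by Gauss-Jordan elimination. The helper names are hypothetical.

```python
K, POLY = 4, 0b11001              # F_{2^4} with P(x) = x^4 + x^3 + 1
M, N = 2, 2                       # composite field F_{(2^M)^N}
W = (2**K - 1) // (2**M - 1)      # beta = alpha^W; here W = 5

def alpha_pow(e):
    """alpha^e reduced modulo P(x), as a K-bit integer (bit r = coeff of alpha^r)."""
    v = 1
    for _ in range(e):
        v <<= 1
        if (v >> K) & 1:
            v ^= POLY
    return v

def build_T():
    """Rows give a_r in terms of (a00, a01, a10, a11): a_ij contributes alpha^(W*j + i)."""
    cols = [alpha_pow(W * j + i) for i in range(N) for j in range(M)]
    return [[(col >> r) & 1 for col in cols] for r in range(K)]

def invert_gf2(mat):
    """Gauss-Jordan inversion over F_2; raises StopIteration if mat is singular."""
    n = len(mat)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

T = build_T()
T_INV = invert_gf2(T)
```

Reading off the rows of `T_INV` gives a00 = a0 + a3, a01 = a2 + a3, a10 = a1 + a3, and a11 = a2, as listed above.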
7.2.2 Abstraction of Composite Field Multipliers
The first step in abstracting the word-level polynomial representation of the composite
field multiplier is to abstract every internal F2m computational block. In the case of a F2m
adder block ({Z0, Z1, C4} from Fig. 7.2), this abstraction is trivial, as the adder is just a
bit-wise XOR computation. In the case of a multiplier block ({C0, C1, C2, C3, C5, C6} from
Fig. 7.2), the abstraction approach presented in the previous chapter is directly applicable.
Each abstraction can be computed independently, so these computations can be performed
in parallel. This set of abstracted polynomials, F , generates the ideal J .
Once the word-level abstraction of each F2m subblock is known, the final abstraction of
the entire design can be computed using only word-level variables. Set a RATO ordering.
Then reduce the word-level polynomial by $F + F_0$:
\[ f_Z: Z_0 + Z_1\alpha + \cdots + Z_{n-1}\alpha^{n-1} + Z \qquad (7.25) \]
\[ f_Z \xrightarrow{F + F_0}_+ r \qquad (7.26) \]
Here, F is the set of word-level polynomial abstractions from each F2m block and F0 is the
set of vanishing polynomials. Every variable X , apart from the word-level inputs A and B
and word-level output Z, is an element of F2m , so its corresponding vanishing polynomial
is X2m +X . Computing the reduction gives remainder r containing the elements
A0, A1, . . . , An−1, B0, B1, . . . , Bn−1, Z (7.27)
Similarly, the mapping from {A0, . . . , An−1} to A now needs to be derived (along with the
mapping from {B0, . . . , Bn−1} to B). That is, the next step is to find
FA = {FA0 , FA1 , . . . , FAn−1}
where FAi = Ai + FAi(A) (7.28)
FA is derived from the polynomial
A = A0 + A1α + · · ·+ An−1αn−1 (7.29)
Compute $A^{2^m}$ as follows:
\[ A^{2^m} = A_0^{2^m} + A_1^{2^m}\alpha^{2^m} + \cdots + A_{n-1}^{2^m}\alpha^{2^m(n-1)}
= A_0 + A_1\alpha^{2^m} + \cdots + A_{n-1}\alpha^{2^m(n-1)} \qquad (7.30) \]
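Here, the property $A_i^{2^m} = A_i$ is exploited. Continually raise this result to the power $2^m$ to obtain $A^{2^{jm}}$ for all $0 \le j < n$. This gives a system of $n$ equations in $n$ unknowns $\{A_0, \dots, A_{n-1}\}$. As before, this system of equations can be represented in matrix form, $\mathbf{A} = \mathbf{M}\mathbf{a}$, where $\mathbf{A} = \{A, A^{2^m}, \dots, A^{2^{(n-1)m}}\}^T$, $\mathbf{M}$ is an $n$ by $n$ matrix of coefficients in $\mathbb{F}_{2^k}$, and $\mathbf{a} = \{A_0, \dots, A_{n-1}\}^T$:
\[ \begin{pmatrix} A \\ A^{2^m} \\ \vdots \\ A^{2^{(n-1)m}} \end{pmatrix} =
\begin{pmatrix}
1 & \alpha & \cdots & \alpha^{n-1} \\
1 & \alpha^{2^m} & \cdots & \alpha^{(n-1)2^m} \\
\vdots & \vdots & \ddots & \vdots \\
1 & \alpha^{2^{(n-1)m}} & \cdots & \alpha^{(n-1)2^{(n-1)m}}
\end{pmatrix}
\begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_{n-1} \end{pmatrix} \qquad (7.31) \]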
The matrix $\mathbf{M}$ is the Vandermonde matrix $V(\alpha, \alpha^{2^m}, \dots, \alpha^{2^{(n-1)m}})$. Since $\alpha$ is a primitive element, all elements $\alpha, \dots, \alpha^{2^k-1}$ are unique, where $k = m \cdot n$. Thus, all elements $\alpha, \alpha^{2^m}, \dots, \alpha^{2^{(n-1)m}}$ are unique, so $|\mathbf{M}| \ne 0$. Then, Cramer's rule can be applied to derive each $F_{A_i}$ as
\[ F_{A_i} = A_i + \frac{|\mathbf{M}_i|}{|\mathbf{M}|} \qquad (7.32) \]
where $\mathbf{M}_i$ is $\mathbf{M}$ with the $i$-th column replaced by $\mathbf{A}$. Here, $|\mathbf{M}|$ is not guaranteed to be equal to 1, so it must be computed. $F_B = \{F_{B_0}, \dots, F_{B_{n-1}}\}$ is similarly derived. Computing $r \xrightarrow{F_A,\, F_B,\, J_0}_+ r_w$ gives $r_w: Z + \mathcal{F}(A, B)$, which is the canonical word-level polynomial abstraction of the composite field design.
Example 7.3 Consider the composite field multiplier over $\mathbb{F}_{(2^2)^2}$ from Example 7.2. Abstracting every multiplier and adder over the base field $\mathbb{F}_{2^2}$ gives the following polynomials:
\[ \begin{array}{lll}
f_1: Z_0 + C_6 + C_2 & f_2: Z_1 + C_5 + C_4 & f_3: C_6 + \alpha^5 C_3 \\
f_4: C_5 + \alpha^5 C_3 & f_5: C_4 + C_1 + C_0 & f_6: C_3 + A_1 \cdot B_1 \\
f_7: C_2 + A_0 \cdot B_0 & f_8: C_1 + A_1 \cdot B_0 & f_9: C_0 + A_0 \cdot B_1
\end{array} \qquad (7.33) \]
Set the following RATO ordering:
Z0 > Z1 > C6 > C5 > C4 > C3 > C2 > C1 > C0
> A0 > A1 > B0 > B1 > Z > A > B (7.34)
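The vanishing polynomials are:
\[ \begin{array}{lll}
f_{10}: Z_0^4 + Z_0 & f_{11}: Z_1^4 + Z_1 & f_{12}: C_6^4 + C_6 \\
f_{13}: C_5^4 + C_5 & f_{14}: C_4^4 + C_4 & f_{15}: C_3^4 + C_3 \\
f_{16}: C_2^4 + C_2 & f_{17}: C_1^4 + C_1 & f_{18}: C_0^4 + C_0 \\
f_{19}: A_0^4 + A_0 & f_{20}: A_1^4 + A_1 & f_{21}: B_0^4 + B_0 \\
f_{22}: B_1^4 + B_1 & f_{23}: Z^{16} + Z & f_{24}: A^{16} + A \\
f_{25}: B^{16} + B & &
\end{array} \qquad (7.35) \]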
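Then $F = \{f_1, \dots, f_9\}$ and $F_0 = \{f_{10}, \dots, f_{25}\}$. Here, $f_Z: Z_0 + Z_1\alpha + Z$. Computing $f_Z \xrightarrow{F,\, F_0}_+ r$ gives:
\[ r = A_0 \cdot B_0 + \alpha A_0 \cdot B_1 + \alpha A_1 \cdot B_0 + \alpha^2 A_1 \cdot B_1 + Z \qquad (7.36) \]
Next, $F_A$ is derived. From $A = A_0 + A_1\alpha$ and $A^4 = A_0 + A_1\alpha^4$, the matrices are:
\[ M = \begin{pmatrix} 1 & \alpha \\ 1 & \alpha^4 \end{pmatrix}, \qquad
M_0 = \begin{pmatrix} A & \alpha \\ A^4 & \alpha^4 \end{pmatrix}, \qquad
M_1 = \begin{pmatrix} 1 & A \\ 1 & A^4 \end{pmatrix} \qquad (7.37) \]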
After minimizing by the primitive polynomial P (x) = x4+x3+1, the determinants of these
matrices are:
|M| = α3 + α + 1; |M0| = αA4 + (α3 + 1)A; |M1| = A4 + A (7.38)
Now, FA0 and FA1 are derived:
\[ F_{A_0} = A_0 + \frac{|M_0|}{|M|} = A_0 + (\alpha^3+\alpha^2+1)A^4 + (\alpha^3+\alpha^2)A \qquad (7.39) \]
\[ F_{A_1} = A_1 + \frac{|M_1|}{|M|} = A_1 + (\alpha^3+\alpha)A^4 + (\alpha^3+\alpha)A \qquad (7.40) \]
Since $F_B = \{F_{B_0}, F_{B_1}\}$ is derived from $B = B_0 + B_1\alpha$, which has the same form as $A = A_0 + A_1\alpha$, $F_B$ can be derived from $F_A$ by substituting corresponding variables.
\[ F_{B_0} = B_0 + (\alpha^3+\alpha^2+1)B^4 + (\alpha^3+\alpha^2)B \qquad (7.41) \]
\[ F_{B_1} = B_1 + (\alpha^3+\alpha)B^4 + (\alpha^3+\alpha)B \qquad (7.42) \]
Finally, computing $r \xrightarrow{F_A,\, F_B,\, F_0}_+ r_w$ gives the word-level abstraction of the circuit.
\[ r_w: Z + A \cdot B \qquad (7.43) \]
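The polynomials of Example 7.3 can be checked numerically. The following Python sketch is an exhaustive test over all subfield inputs, not part of the abstraction procedure. It assumes P(x) = x^4 + x^3 + 1 and the block polynomials f1 to f9, and confirms that the expressions in (7.39) and (7.40) recover A0 and A1 from A and that the block network computes A·B. All names are illustrative.

```python
K, POLY = 4, 0b11001              # F_{2^4} with P(x) = x^4 + x^3 + 1

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> K) & 1:
            a ^= POLY
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

ALPHA = 0b0010
BETA = gf_pow(ALPHA, 5)                       # generator of F_{2^2} inside F_{2^4}
F4 = [0, 1, BETA, gf_mul(BETA, BETA)]         # the four subfield elements

def recover_A0(A):
    """(alpha^3 + alpha^2 + 1) A^4 + (alpha^3 + alpha^2) A, from Eqn. (7.39)."""
    return gf_mul(0b1101, gf_pow(A, 4)) ^ gf_mul(0b1100, A)

def recover_A1(A):
    """(alpha^3 + alpha) (A^4 + A), from Eqn. (7.40)."""
    return gf_mul(0b1010, gf_pow(A, 4) ^ A)

def composite_multiply(A0, A1, B0, B1):
    """Block-level network f1..f9 of the example, evaluated in F_{2^4}."""
    C0, C1 = gf_mul(A0, B1), gf_mul(A1, B0)
    C2, C3 = gf_mul(A0, B0), gf_mul(A1, B1)
    C4 = C1 ^ C0
    C5 = gf_mul(BETA, C3)
    C6 = gf_mul(BETA, C3)
    Z0, Z1 = C6 ^ C2, C5 ^ C4
    return Z0 ^ gf_mul(Z1, ALPHA)             # Z = Z0 + Z1 * alpha

def verify():
    for A0 in F4:
        for A1 in F4:
            A = A0 ^ gf_mul(A1, ALPHA)
            if recover_A0(A) != A0 or recover_A1(A) != A1:
                return False
            for B0 in F4:
                for B1 in F4:
                    B = B0 ^ gf_mul(B1, ALPHA)
                    if composite_multiply(A0, A1, B0, B1) != gf_mul(A, B):
                        return False
    return True
```

Calling `verify()` exercises all 256 combinations of (A0, A1, B0, B1).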
7.3 Conclusion
This chapter described how to generalize the abstraction approach to any arbitrary
combinational circuit. This method works best when the derived operand-width k, from
the LCM of the word-lengths of all inputs and output, is not very large, for instance, when
the output size is a multiple of the input sizes. When all word-sizes are relatively prime, k
is a product of all sizes, which can make the analysis overly bulky.
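The growth of k can be illustrated with a few lines of Python. This is a minimal sketch with hypothetical helper names: `operand_width` returns the LCM of the word sizes, and `subfield_exponent` returns the exponent w for which α^w generates the embedded subfield of size 2^m, as used in Eqn. (7.5).

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def operand_width(word_sizes):
    """k such that every F_{2^m}, m in word_sizes, is a subfield of F_{2^k}."""
    return reduce(lcm, word_sizes)

def subfield_exponent(k, m):
    """w with beta = alpha^w generating F_{2^m} inside F_{2^k} (requires m | k)."""
    if k % m != 0:
        raise ValueError("F_{2^m} is a subfield of F_{2^k} only when m divides k")
    return (2**k - 1) // (2**m - 1)
```

For a 3-bit input and a 2-bit output this gives k = 6 with exponents 9 and 21, while pairwise coprime sizes such as 5, 7, and 9 already give k = 315.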
An abstraction technique for exploiting the hierarchy of composite field multiplier
circuits over F(2m)n was also examined. The approach abstracts all internal multipliers
and adders over F2m in parallel and uses these abstractions to compute the final abstraction
completely over word-level variables. Experimental results for abstractions of composite
field multipliers using a custom-built tool are presented in the next chapter.
CHAPTER 8
IMPLEMENTATION OF THE CUSTOM
ABSTRACTION SOFTWARE AND
EXPERIMENTAL RESULTS
The abstraction procedure can be fully scripted using the computer algebra tool SINGULAR. However, SINGULAR has limitations that make abstraction of large circuits impossible. This is due to:
• A limit on the number of ring variables allowed. As of SINGULAR release 4.0.1, a
ring cannot be declared with more than 32,767 variables. This limits the number of
gates that can be present in the design.
• A limit on the size of an exponent n of a variable x: n < 2^32.
• Large memory usage and slow computation time. SINGULAR uses a
dense-distributive structure for polynomials, which is a poor representation of sparse
polynomials over rings with many variables.
The limit on the size of exponents prohibits an abstraction beyond 32-bit circuits, as larger
circuits require manipulating word-level variables with exponents larger than 2^32. Even if
this limitation is overcome, SINGULAR uses an enormous amount of memory. For instance,
preliminary experiments show that using SINGULAR to compute the initial reduction of a
163-bit Mastrovito multiplier uses 41.6 GB of memory! Thus, deriving abstractions using
SINGULAR is infeasible on desktop workstations.
In order to overcome these limitations, a custom C++ tool is developed to compute
the word-level abstraction of circuits quickly and efficiently. This chapter describes the
implementation details of this tool.
8.1 Data Structures and Algorithms
Computing a word-level abstraction requires the representation and manipulation of
polynomials in F2k [x1, . . . , xd] over a lex ordered ring. Intermediate polynomials can
become very large during the abstraction procedure, so the data structure to represent them
is designed to conserve memory while allowing for fast manipulation during the reduction
procedures. The backbone of the tool is a custom library composed of three main sections:
1. Galois field elements
2. Monomials and rings
3. Polynomials and division procedures
These are built on top of each other. The starting point is the custom Galois field element
section, which facilitates the construction of Galois field elements over F2k .
8.1.1 Galois Field Elements
The Galois field section of the library is initialized by parsing a given primitive poly-
nomial P (x) of degree k, which constructs F2k . Any element C ∈ F2k can be represented
in the form
C = ck−1 · αk−1 + ck−2 · αk−2 + · · ·+ c2 · α2 + c1 · α + c0 (8.1)
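where $c_0, \dots, c_{k-1} \in \mathbb{F}_2$ and $\alpha$ is the primitive element. This structure is stored as an array of bytes in which each bit holds one coefficient $c_i$, as shown in Fig. 8.1.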
Figure 8.1: Object structure of a Galois field element.
Addition between two elements
C = ck−1 · αk−1 + · · ·+ c2 · α2 + c1 · α + c0 (8.2)
D = dk−1 · αk−1 + · · ·+ d2 · α2 + d1 · α + d0 (8.3)
is simply a combination of like terms
C +D = (ck−1 + dk−1) · αk−1 + · · ·+ (c2 + d2) · α2 + (c1 + d1) · α + (c0 + d0) (8.4)
Since addition over F2 is computed as a bit-wise XOR, the library’s Galois field element
structure allows addition to be trivially performed as a byte-wise XOR operation. Further-
more, this structure makes it easy to check if a given element is equal to 0 or 1, which is
used in deciding when a term is to be removed (0) and when a division can be ignored (1).
During library initialization, the elements αk, αk+1, . . . , α2k−2 are precomputed and
cached. First, αk is derived directly from the given primitive polynomial,
P (x) = xk + ck−1 · xk−1 + · · ·+ c1 · x+ 1 (8.5)
for {c1, . . . , ck−1} ∈ F2. Since P (α) = 0,
αk = ck−1 · αk−1 + · · ·+ c1 · α + 1 (8.6)
To compute αk+1, first compute a 1-bit left shift of αk.
αk+1 = ck−1 · αk + ck−2 · αk−1 + · · ·+ c1 · α2 + α (8.7)
This element can contain the term αk, which must be minimized by the primitive polyno-
mial. Thus, if ck−1 is 1, the ck−1αk term is removed and the minimized form of αk is added.
This derives the minimized form for αk+1. Computation continues in this fashion (shift by
1, minimize if needed) until all αk, αk+1, . . . , α2k−2 have been derived. These elements are
later used during the multiplication procedure.
Example 8.1 Given the primitive polynomial P (x) = x4 + x3 + 1, initialize the library by
computing α4, α5, and α6. Here, k = 4, and each element of F24 can be represented as
c3 · α3 + c2 · α2 + c1 · α + c0 (8.8)
which is stored as one byte in the following form.
0 0 0 0 c3 c2 c1 c0 (8.9)
Notice that there are 4 leading bits that are unused; these are always set to 0.
P (α) = α4 + α3 + 1 = 0. Hence, α4 = α3 + 1, which is stored as
c3 c2 c1 c0
α4 = 0 0 0 0 1 0 0 1
(8.10)
To compute α5, take the α4 element and shift the result left 1 bit. The c3 term is dropped
since the leading 4 bits are always 0.
0 0 0 0 0 0 1 0 (8.11)
Then, since c3 was 1, add α4.
0 0 0 0 0 0 1 0
+ 0 0 0 0 1 0 0 1
0 0 0 0 1 0 1 1
(8.12)
This gives α5 = α3 + α + 1. Similarly, α6 is derived as
0 0 0 0 0 1 1 0
+ 0 0 0 0 1 0 0 1
0 0 0 0 1 1 1 1
(8.13)
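The initialization step of Example 8.1 can be sketched in Python as follows. This is an illustrative fragment, not the C++ library itself: it stores an element as an integer whose bit i holds c_i, and the function name is an assumption.

```python
def precompute_high_powers(poly, k):
    """Return {e: alpha^e reduced modulo P(x)} for e = k .. 2k-2.

    poly encodes P(x) including its x^k bit, e.g. 0b11001 for x^4 + x^3 + 1.
    """
    mask = (1 << k) - 1
    alpha_k = poly & mask                 # alpha^k = P(alpha) without the alpha^k term
    table = {k: alpha_k}
    cur = alpha_k
    for e in range(k + 1, 2 * k - 1):
        carry = (cur >> (k - 1)) & 1      # coefficient c_{k-1} before the shift
        cur = (cur << 1) & mask           # shift left by one, dropping the alpha^k bit
        if carry:
            cur ^= alpha_k                # add the minimized form of alpha^k
        table[e] = cur
    return table
```

For P(x) = x^4 + x^3 + 1, `precompute_high_powers(0b11001, 4)` returns `{4: 0b1001, 5: 0b1011, 6: 0b1111}`, matching the bit patterns derived above.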
Multiplication requires temporarily increasing the size of the byte-array to store the
intermediate result, which can contain terms up to $\alpha^{2(k-1)}$:
\[ c_{2k-2}\alpha^{2k-2} + \cdots + c_k\alpha^{k} + c_{k-1}\alpha^{k-1} + \cdots + c_2\alpha^2 + c_1\alpha + c_0 \qquad (8.14) \]
This result needs to be reduced modulo the given primitive polynomial. Each α term with an
exponent of k or larger is replaced by its minimized equivalent, which was computed during
initialization. That is, for i ≥ k, each term for which $c_i = 1$ is removed and the minimized
form of $\alpha^i$ is added in.
Example 8.2 Consider again the setup for F24 from Example 8.1. Compute the product of
the following two elements:
α3 + α2 + 1 (8.15)
α2 + α (8.16)
These elements are stored, respectively, as
0 0 0 0 1 1 0 1 (8.17)
0 0 0 0 0 1 1 0 (8.18)
The intermediate result is computed using the basic shift-and-add procedure.
              1 1 0 1
    x         0 1 1 0
    -----------------
              0 0 0 0
            1 1 0 1
          1 1 0 1
    +   0 0 0 0
    -----------------
      0 0 1 0 1 1 1 0
(8.19)
The intermediate result of the multiplication is α5 +α3 +α2 +α, which needs to be further
minimized. The value of α5 was determined during initialization to be α3 + α + 1. The α5
term from the intermediate result is removed and the minimized form is added.
0 0 0 0 1 1 1 0
+ 0 0 0 0 1 0 1 1
0 0 0 0 0 1 0 1
(8.20)
So the minimized result of the product is α2 + 1.
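The multiplication procedure of Example 8.2 can be sketched as below. This is a minimal Python illustration under the same integer encoding as before (bit i holds c_i); the helper names are assumptions and the table is rebuilt here so that the block is self-contained.

```python
def high_powers(poly, k):
    """Minimized forms of alpha^k .. alpha^(2k-2), built by shift and reduce."""
    mask = (1 << k) - 1
    table, cur = {}, poly & mask
    for e in range(k, 2 * k - 1):
        table[e] = cur
        carry = (cur >> (k - 1)) & 1
        cur = (cur << 1) & mask
        if carry:
            cur ^= poly & mask
    return table

def gf_multiply(a, b, k, table):
    """Shift-and-add product followed by substitution of the cached high powers."""
    wide = 0
    for i in range(k):
        if (b >> i) & 1:
            wide ^= a << i                    # unreduced product, degree up to 2k-2
    result = wide & ((1 << k) - 1)            # terms of degree below k stay as they are
    for e in range(k, 2 * k - 1):
        if (wide >> e) & 1:
            result ^= table[e]                # replace alpha^e by its minimized form
    return result

TABLE = high_powers(0b11001, 4)               # P(x) = x^4 + x^3 + 1
PRODUCT = gf_multiply(0b1101, 0b0110, 4, TABLE)
```

`PRODUCT` is `0b0101`, that is α^2 + 1, as in Example 8.2.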
Division of two Galois field elements, $C = B/A$, requires finding the multiplicative
inverse of the divisor: $C = B \cdot A^{-1}$. To find the inverse, the library implements the extended
Euclidean algorithm over $\mathbb{F}_{2^k}$, depicted in Algorithm 5. The algorithm requires a nonminimized
representation of the element $P(\alpha)$, so the size of the object is temporarily increased
to allow the storage of the $\alpha^k$ bit. The function DIV returns the quotient and remainder
of a Euclidean division; that is, DIV(A, B) returns {Q, R}, where A = B · Q + R. This
procedure is described in Algorithm 6; here, DEG returns the highest degree of a given
element in $\mathbb{F}_{2^k}$, e.g., DEG($\alpha^4 + \alpha^3 + 1$) returns 4.
Example 8.3 Given $P(x) = x^8 + x^4 + x^3 + x + 1$, which generates $\mathbb{F}_{2^8}$, find $A^{-1}$ where
$A = \alpha^6 + \alpha^4 + \alpha + 1$. Table 8.1 shows the steps Algorithm 5 goes through to find the
inverse.
Algorithm 5: Inverse of an Element Over F_{2^k}
Input: M := P(α) where P(x) was used to generate F_{2^k}, A ∈ F_{2^k}
Output: A^{-1} over F_{2^k}
{Q_0, Q_1} := {0, 0};
{R_0, R_1} := {M, A};
{U_0, U_1} := {0, 1};
i := 1;
while R_i ≠ 1 do
    if R_i == 0 then
        ERROR: No inverse exists
    end
    {Q_{i+1}, R_{i+1}} := DIV(R_{i−1}, R_i);
    U_{i+1} := (Q_{i+1} · U_i) + U_{i−1};
    i := i + 1;
end
return U_i;
Table 8.1: Steps to derive the inverse of α^6 + α^4 + α + 1
i    Q_i          R_i                        U_i
0    0            α^8 + α^4 + α^3 + α + 1    0
1    0            α^6 + α^4 + α + 1          1
2    α^2 + 1      α^2                        α^2 + 1
3    α^4 + α^2    α + 1                      α^6 + α^2 + 1
4    α + 1        1                          α^7 + α^6 + α^3 + α
Algorithm 6: DIV (Euclidean Division Over F_{2^k})
Input: A, B ∈ F_{2^k}
Output: {Q, R} such that A = B · Q + R
{Q, R} := {0, A};
while DEG(R) ≥ DEG(B) do
    S := α^(DEG(R) − DEG(B));
    Q := Q + S;
    R := R + S · B;
end
return {Q, R};
The derived inverse is A−1 = α7 + α6 + α3 + α. Correctness can be checked by
computing A · A−1 and verifying that the result is 1 (mod P (x)).
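Algorithms 5 and 6 translate almost line for line into the following Python sketch. It is an illustration rather than the library's C++ code; elements are integers whose bit i holds the coefficient of α^i, P(α) is passed with its α^k bit set, and the function names are assumptions.

```python
def deg(x):
    """Highest set bit position; deg(0) is -1."""
    return x.bit_length() - 1

def poly_divmod(a, b):
    """Euclidean division over F_2[x] (Algorithm 6): returns (q, r) with a = b*q + r."""
    q, r = 0, a
    while deg(r) >= deg(b):
        shift = deg(r) - deg(b)
        q ^= 1 << shift
        r ^= b << shift
    return q, r

def clmul(a, b):
    """Carry-less (polynomial) product without modular reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf_inverse(a, poly):
    """Extended Euclidean inverse of a modulo poly (Algorithm 5)."""
    r_prev, r_cur = poly, a
    u_prev, u_cur = 0, 1
    while r_cur != 1:
        if r_cur == 0:
            raise ZeroDivisionError("no inverse exists")
        q, r_next = poly_divmod(r_prev, r_cur)
        u_prev, u_cur = u_cur, clmul(q, u_cur) ^ u_prev
        r_prev, r_cur = r_cur, r_next
    return u_cur
```

With `poly = 0x11B` (x^8 + x^4 + x^3 + x + 1) and `a = 0x53` (α^6 + α^4 + α + 1), `gf_inverse` returns `0xCA`, that is α^7 + α^6 + α^3 + α, as in Example 8.3.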
8.1.2 Rings and Monomials Over Galois Fields
A monomial M over the ring F2k [x1, . . . , xd] is a power-product of variables from the
ring along with a coefficient C ∈ F2k .
\[ M = C \cdot x_1^{e_1} \cdot x_2^{e_2} \cdots x_d^{e_d}; \quad e_i \ge 0 \qquad (8.21) \]
Ring variables can either be bit-level (representing a single wire within a circuit) or word-level (representing a word input or output). If $x_i$ is a bit-level variable, then $x_i \in \mathbb{F}_2$; thus it has the property $x_i^2 = x_i$, so its exponent $e_i \in \{0, 1\}$. If the variable is word-level, then $e_i < 2^k$ due to the property $x_i^{2^k} = x_i$.
Lex ordering is the only monomial ordering used for abstraction, and hence it is the only
ordering implemented in the tool. Ring variables are added as strings, one at a time, along
with an argument stating whether the variable is bit-level or word-level. Each variable is
given a unique unsigned integer id, which is continuously incremented with each added
variable. Thus, ids of two variables can be compared to quickly distinguish which variable
appears earlier in the ordering.
Three static objects are created during initialization of the ring:
• strToId — a map of each variable name (string) to its id (unsigned int)
• idToStr — a map of each id (unsigned int) to its variable name (string)
• wordSet — a set of ids (unsigned int) of all variables that are word-level
Here, “map” and “set” are classes of the standard C++ library. It is important to note that
a C++ “set” is a container of unique, ordered elements (this property is exploited later).
The “strToId” map object is used when constructing monomials to quickly find the id of a
parsed variable name. The “idToStr” map object allows a monomial object to be printed
to the user. The “wordSet” object is used to quickly check whether a given variable is
word-level, which determines how it is handled during monomial operations.
Once the ring has been initialized, monomials can be generated and manipulated. In-
ternally, all monomial variables are manipulated using their ids. Each monomial object
contains the following:
• coef — a Galois field object (as described in the previous subsection)
• idSet — a set of ids (unsigned int) of all variables in the monomial
• idToExp — a map of variable ids (unsigned int) to their exponents (BigUnsigned)
During monomial creation, string variable names are parsed and mapped to their corre-
sponding ids, which are then added to the set. The exponent map is only filled for variables
that are word-level. As most variables are bit-level in a circuit, most monomials will have
a completely empty map. Since exponents can be much larger than what can be stored
in a primitive data structure, each exponent is stored as a BigUnsigned object of the open
source library BigInt [109]. This is a library that provides basic functionality for signed
and unsigned integers of unbounded size.
Monomial comparison is required for proper monomial ordering, which is necessary for
implementation of polynomial procedures such as reduction. The comparison procedure
compares the id sets of two monomials, one variable at a time. If the variables differ, the
smaller id appears earlier in the ordering. If they are the same, exponents are checked if the
variable is word-level. This procedure is shown in Algorithm 7.
Multiplication of two monomials is the main function of this portion of the tool. First,
the two Galois field objects are multiplied together using the previously described method.
Then, the two sets of ids are merged together. Since sets can only contain unique values,
duplicates are discarded; this is done automatically using the standard set::insert operation.
In the common case, both monomials only contain bit-level variables and the multiplication
is then complete. If there are word-level variables in the monomials, the mapped exponents
of each such variable are added together and then minimized if the resulting
exponent is ≥ 2^k.
Algorithm 7: Monomial Comparison
Input: Monomials M1 and M2
Output: < 0 if M2 > M1, > 0 if M1 > M2, 0 if M1 == M2
id1 := M1.idSet.begin();
id2 := M2.idSet.begin();
while id1 ≠ ∅ && id2 ≠ ∅ do
    if id1 ≠ id2 then
        return id2 − id1;
    end
    if id1 ∈ wordSet then
        if M1.idToExp[id1] ≠ M2.idToExp[id2] then
            return M1.idToExp[id1] − M2.idToExp[id2];
        end
    end
    id1++; id2++;
end
if id1 == ∅ && id2 == ∅ then
    return 0;
end
if id1 == ∅ then
    return −1;
end
return 1;
Example 8.4 Consider again the setup for $\mathbb{F}_{2^4}$ from Example 8.1. Construct the ring
$\mathbb{F}_{2^4}[a, b, c, Z]$ with the lex ordering a > b > c > Z, where {a, b, c} are bit-level variables
and Z is a word-level variable. The initialized monomial static library objects are:
strToId      idToStr      wordSet
“a” → 0      0 → “a”      {3}
“b” → 1      1 → “b”
“c” → 2      2 → “c”
“Z” → 3      3 → “Z”
(8.22)
Let M1 and M2 be the following monomials:
\[ M_1 = (\alpha^3 + \alpha^2 + 1)\,abZ^{10}, \qquad M_2 = (\alpha^2 + \alpha)\,bcZ^{7} \qquad (8.23) \]
These are stored internally by the tool as:
      coef               idSet        idToExp
M1    0 0 0 0 1 1 0 1    {0, 1, 3}    3 → 10     (8.24)
M2    0 0 0 0 0 1 1 0    {1, 2, 3}    3 → 7      (8.25)
It is easy to see that M1 > M2 in the given ordering, since the first element of “idSet” in M1
is 0 while in M2 it is 1. Multiplying M1 by M2 is computed by first multiplying the Galois
field elements (coef) together (as shown in Example 8.2). Then, the two sets (idSet) are
merged (union). Notice that, although variable b appears in both monomials, b^2 = b due
to it being a bit-level variable. This is handled automatically by the set class (duplicates
thrown out). Finally, the two corresponding exponents of variable Z (idToExp) are added
together. Since this new exponent of Z is 17, and 17 ≥ 2^4, the exponent is minimized by the
property Z^16 = Z, which gives Z^17 = Z^2. The product is stored as:
         coef               idSet           idToExp
M1·M2    0 0 0 0 0 1 0 1    {0, 1, 2, 3}    3 → 2
(8.26)
So M1 · M2 = (α^2 + 1)abcZ^2.
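The monomial product of Example 8.4 can be sketched in Python as follows. This is an illustrative model only: dictionaries and built-in sets stand in for the C++ objects, the field names mirror coef, idSet, and idToExp from the text, and `minimize_exponent` is a hypothetical helper that applies Z^(2^k) = Z.

```python
K, POLY = 4, 0b11001              # F_{2^4} with P(x) = x^4 + x^3 + 1
WORD_SET = {3}                    # id of the word-level variable Z

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> K) & 1:
            a ^= POLY
    return r

def minimize_exponent(e):
    """Reduce an exponent e >= 2^K using x^(2^K) = x; exponents below 2^K are kept."""
    q = 2**K
    return e if e < q else (e - 1) % (q - 1) + 1

def mono_mul(m1, m2):
    ids = m1["idSet"] | m2["idSet"]           # set union drops duplicates, so b*b = b
    exps = {}
    for var in ids & WORD_SET:                # only word-level variables carry exponents
        total = m1["idToExp"].get(var, 0) + m2["idToExp"].get(var, 0)
        exps[var] = minimize_exponent(total)
    return {"coef": gf_mul(m1["coef"], m2["coef"]), "idSet": ids, "idToExp": exps}

M1 = {"coef": 0b1101, "idSet": {0, 1, 3}, "idToExp": {3: 10}}
M2 = {"coef": 0b0110, "idSet": {1, 2, 3}, "idToExp": {3: 7}}
PRODUCT = mono_mul(M1, M2)
```

`PRODUCT` holds coef `0b0101`, idSet `{0, 1, 2, 3}`, and idToExp `{3: 2}`, which corresponds to (α^2 + 1)abcZ^2. Note that the exponent 2^K − 1 is kept as is, since x^(2^K − 1) is not equal to 1 at x = 0.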
Monomial division is a procedure mainly used during polynomial reduction. Given two
monomials M1 and M2, compute M1/M2. That is, find a monomial M3 such that M1 = M2 · M3.
Monomial division is described in Algorithm 8. The division procedure loops over all
variables in M2 and removes them from M1 if they are bit-level. For word-level variables,
exponents are subtracted from each other. If a variable exists in M2 but not in M1, or if the
exponent of a word-level variable in M2 is larger than in M1, then M2 does not divide M1 and
the division fails (Algorithm 8 returns NULL). Finally, the Galois field elements are divided by each other.
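Algorithm 8: Monomial Division
Input: Monomials M1 and M2
Output: M3 such that M1 = M2 · M3, or NULL if M2 does not divide M1
M3 := M1;
foreach id ∈ M2.idSet do
    if id ∉ M3 then
        return NULL;
    end
    if id ∉ wordSet then
        M3.idSet.erase(id);
    else
        if M3.idToExp[id] < M2.idToExp[id] then
            return NULL;
        end
        M3.idToExp[id] := M3.idToExp[id] − M2.idToExp[id];
        if M3.idToExp[id] == 0 then
            M3.idSet.erase(id);
        end
    end
end
M3.coef := M1.coef / M2.coef;
return M3;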
8.1.3 Polynomials and Polynomial Division
With the monomial structure defined, a polynomial is simply a C++ vector of monomial
objects. These monomial objects are ordered by the given ring ordering, which is imposed
at all times, using monomial comparisons.
Polynomial addition is computed by simply merging the vector lists of two polynomials
together, since the two polynomial vectors are already sorted. If two monomials are found
to be equal, their Galois field coefficients are added together and the resulting monomial
added to the sum if the new coefficient is not 0.
Multiplication of a polynomial by a monomial, P2 = M1 · P1 is detailed in Algorithm
9. Each monomial in P1 is iteratively multiplied by M1 to derive a temporary monomial
Mtemp, which is added to P2. Let Mlast denote the last monomial in P2 at any given time.
Due to the ordering, typically Mlast ≥ Mtemp during the procedure. In cases where it is
not, which only happens when exponents have been minimized, Mtemp falls not far earlier
than Mlast. Thus, to add Mtemp to P2, Mtemp is compared to monomials in P2 in reverse
order.
Multiplication of polynomials, P3 = P1 · P2, is computed as numerous monomial-by-
polynomial multiplications. Each monomial in P1 is multiplied by the entire polynomial
P2 to derive a temporary polynomial Ptemp. Then, Ptemp is added to a growing P3. Order
is maintained by these subprocedures, so no further ordering logic is needed.
Example 8.5 Assume the environment has been set up over the ring F24 [a, b, c, Z] as in
Example 8.4. Let P1 and P2 be the following polynomials
\[ P_1 = (\alpha)ab + bZ^3, \qquad P_2 = abc + ab + b \qquad (8.27) \]
P1 + P2 is computed by merging the polynomials together. Two monomials exist with the
same order, (α)ab in P1 and ab in P2, so here only the coefficients are merged.
\[ P_1 + P_2 = abc + (\alpha + 1)ab + bZ^3 + b \qquad (8.28) \]
P1 · P2 is computed by taking each monomial of P1 and multiplying it by P2. The first
temporary polynomial generated is:
\[ (\alpha)ab \cdot P_2 = (\alpha)ab \cdot (abc + ab + b) = (\alpha)abc + (\alpha)ab + (\alpha)ab = (\alpha)abc \qquad (8.29) \]
Notice that, since two equivalent terms were generated, their coefficients were added together,
creating (0) · ab, so this term was removed. The second multiplication is
\[ bZ^3 \cdot P_2 = bZ^3 \cdot (abc + ab + b) = abcZ^3 + abZ^3 + bZ^3 \qquad (8.30) \]
Finally, these two polynomials are added together to obtain the final result.
\[ P_1 \cdot P_2 = abcZ^3 + (\alpha)abc + abZ^3 + bZ^3 \qquad (8.31) \]
Polynomial reduction is the main procedure computed by the abstraction tool. To reduce
a polynomial P1 by polynomial P2, one reduction step is computed as
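Algorithm 9: Multiplication of a Polynomial by a Monomial
Input: Monomial M1, Polynomial P1.
Output: M1 · P1
P2 := ∅;
foreach Monomial Mp ∈ P1 do
    Mtemp := Mp · M1;
    foreach Monomial Mp2 ∈ P2 in reverse order do
        if Mp2 > Mtemp then
            insert Mtemp after Mp2;
            break;
        end
        if Mp2 == Mtemp then
            Mp2.coef += Mtemp.coef;
            if Mp2.coef == 0 then
                remove Mp2 from P2;
            end
            break;
        end
    end
    if Mtemp was not placed then
        insert Mtemp at the front of P2;
    end
end
return P2;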
\[ P_1 \xrightarrow{P_2} P_1 + \frac{LT(P_1)}{LT(P_2)} \cdot P_2 \qquad (8.32) \]
where LT is the first monomial object of the given polynomial. Multiplying by $\frac{LT(P_1)}{LT(P_2)}$ modifies $P_2$ so
that it has the same leading term as P1. Thus, when the two polynomials are added together,
the leading terms are cancelled out. Note that reduction is only possible when LT (P1) is
divisible by LT (P2). However, one reduction step may not be sufficient to compute a full
reduction, i.e., it is possible that the resulting polynomial could be reduced further by P2.
One can reapply the reduction steps until no more reductions are possible. A more efficient
method is to collect all monomials in P1 that are divisible by LT(P2), add the results of each
division to Ptemp, and then compute P1 + Ptemp · P2. Due to the ordering, if LT(P2) > Mi
where Mi is the i-th monomial in P1, then all Mj ∈ P1 for j ≥ i are not divisible by
LT(P2). Thus, the divisions are performed in monomial order and stopped as soon as this
condition holds. The overall procedure is described in Algorithm 10. Polynomial reduction
makes use of monomial division, monomial-by-polynomial multiplication, and polynomial
addition, which were all detailed previously.
Example 8.6 Consider again the F24 [a, b, c, Z] setup from Example 8.5 with
\[ P_1 = (\alpha)ab + bZ^3, \qquad P_2 = abc + ab + b \qquad (8.33) \]
Algorithm 10: Polynomial Reduction
Input: Polynomial P1, Polynomial P2.
Output: r where P1 --P2-->+ r
Ptemp := ∅;
foreach Monomial Mp ∈ P1 do
    if LT(P2) > Mp then
        break;
    end
    Mdiv := Mp / LT(P2);
    if Mdiv ≠ NULL then
        Ptemp := Ptemp + Mdiv;
    end
end
return P1 + (Ptemp · P2);
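Compute $P_2 \xrightarrow{P_1}_+ r$. The leading term of $P_1$ is $(\alpha)ab$. The first monomial division is:
\[ \frac{abc}{(\alpha)ab} = (\alpha^3 + \alpha^2)c \qquad (8.34) \]
Note that over this field, $(\alpha)^{-1} = (\alpha^3 + \alpha^2)$. The next monomial division is:
\[ \frac{ab}{(\alpha)ab} = (\alpha^3 + \alpha^2) \qquad (8.35) \]
The remaining monomial, $b$, is smaller than $LT(P_1) = (\alpha)ab$ in the ordering, so no further divisions are attempted,
so $P_{temp} = (\alpha^3 + \alpha^2)c + (\alpha^3 + \alpha^2)$. The final reduction is then computed as
\[ \begin{aligned}
P_2 + (P_{temp} \cdot P_1)
&= abc + ab + b + ((\alpha^3 + \alpha^2)c + (\alpha^3 + \alpha^2)) \cdot ((\alpha)ab + bZ^3) \\
&= abc + ab + b + (abc + ab + (\alpha^3 + \alpha^2)bcZ^3 + (\alpha^3 + \alpha^2)bZ^3) \\
&= (\alpha^3 + \alpha^2)bcZ^3 + (\alpha^3 + \alpha^2)bZ^3 + b
\end{aligned} \qquad (8.37) \]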
Notice that the two monomials of P2 that were divisible by LT (P1) are cancelled out.
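The reduction of Example 8.6 can be reproduced with a short Python sketch. This is a simplified model, not the tool's data structure: instead of sorted vectors of monomial objects, a polynomial is a dictionary that maps an exponent tuple (e_a, e_b, e_c, e_Z) to its coefficient, and Python's tuple comparison plays the role of the lex ordering a > b > c > Z. All names are illustrative.

```python
K, POLY = 4, 0b11001                      # F_{2^4} with P(x) = x^4 + x^3 + 1
IS_WORD = (False, False, False, True)     # a, b, c are bit-level; Z is word-level

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> K) & 1:
            a ^= POLY
    return r

def gf_inv(a):
    """a^(2^K - 2) is the inverse of a nonzero element."""
    r = 1
    for _ in range(2**K - 2):
        r = gf_mul(r, a)
    return r

def mono_mul(e1, e2):
    out = []
    for x, y, word in zip(e1, e2, IS_WORD):
        s = x + y
        if word and s >= 2**K:
            s = (s - 1) % (2**K - 1) + 1  # uses Z^(2^K) = Z
        elif not word:
            s = min(s, 1)                 # uses x^2 = x
        out.append(s)
    return tuple(out)

def mono_div(e1, e2):
    """Exponents of e1 / e2, or None when e2 does not divide e1."""
    if any(x < y for x, y in zip(e1, e2)):
        return None
    return tuple(x - y for x, y in zip(e1, e2))

def poly_add(p, q):
    out = dict(p)
    for mono, coef in q.items():
        coef ^= out.pop(mono, 0)          # coefficients add by XOR
        if coef:
            out[mono] = coef              # zero terms are dropped
    return out

def poly_mul_mono(p, mono, coef):
    out = {}
    for mp, cp in p.items():
        out = poly_add(out, {mono_mul(mp, mono): gf_mul(cp, coef)})
    return out

def reduce_by(p1, p2):
    """One pass of Algorithm 10: returns p1 + Ptemp * p2."""
    lt = max(p2)                          # leading monomial under lex order
    lt_inv = gf_inv(p2[lt])
    ptemp = {}
    for mono in sorted(p1, reverse=True):
        if lt > mono:
            break                         # later monomials cannot be divisible
        quotient = mono_div(mono, lt)
        if quotient is not None:
            ptemp[quotient] = gf_mul(p1[mono], lt_inv)
    result = dict(p1)
    for mono, coef in ptemp.items():
        result = poly_add(result, poly_mul_mono(p2, mono, coef))
    return result

P1 = {(1, 1, 0, 0): 0b0010, (0, 1, 0, 3): 1}               # (alpha)ab + bZ^3
P2 = {(1, 1, 1, 0): 1, (1, 1, 0, 0): 1, (0, 1, 0, 0): 1}   # abc + ab + b
```

`reduce_by(P2, P1)` returns the three terms bcZ^3 and bZ^3, each with coefficient `0b1100` (α^3 + α^2), and b with coefficient 1. A hash map loses the cheap ordered merge that the sorted-vector representation provides, so this sketch is only meant to make the arithmetic of the example easy to replay.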
8.2 Abstraction Tool Flow and Results
The tool takes the circuit as input and applies the approach presented in Chapter 6 to
derive the polynomial representation of the circuit. The most computationally intensive
procedures in the approach are
1. Initial reduction of the word-level polynomial, $f_Z \xrightarrow{F - \{f_Z\},\, F_0}_+ r$
2. Derivation of the polynomials for the second reduction procedure, $F_A = \{a_0 = \mathcal{F}_0(A), a_1 = \mathcal{F}_1(A), \dots\}$
3. Computation of the second reduction (substitution), $r \xrightarrow{F_A,\, F_0}_+ Z + \mathcal{F}(A)$
Of these, the first two are computed in parallel as they are independent of each other. Step
1 orders the polynomials in J + J0 by their monomial order and reduces fZ in that order.
In our experiments, this step typically takes longer than step 2.
All experiments are run on a 64-bit Linux desktop with a 3.5 GHz Intel Core i7
quad-core CPU and 16 GB of RAM. Table 8.2 depicts the time and memory required to
derive the polynomial abstraction from bug-free and buggy Mastrovito multiplier circuits
using our custom tool. These circuits are provided as bit-blasted/flattened gate-level netlists.
They compute Z = A · B over some field $\mathbb{F}_{2^k}$, so the analysis is performed
over this same field, abstracting this word-level representation. The bug introduced is
a swapping of two output nodes of the given circuit, ensuring that the effect propagates
down during the reduction process. Similarly, Table 8.3 depicts the results for abstracting
flattened Montgomery multipliers.
Montgomery multipliers are typically designed hierarchically, as shown in Fig. 3.4. If
the hierarchy is known, it can be exploited by computing the abstraction of each MR block
in parallel, as shown in Table 8.4. In this table, “BLK A” and “BLK B” denote the input
MR blocks, “BLK Mid” denotes the middle block and “BLK Out” is the output block.
While each block is an MR block, some have been simplified by constant-propagation,
hence they have different sizes. First, a polynomial is extracted for each MR block (gate-
level to word-level abstraction), and then the approach is re-applied at word-level to derive
the input-output relation (solved trivially in < 1 second). Our approach can extract the
word-level polynomial for up to 571-bit circuits!
Table 8.2: Abstraction of Mastrovito multipliers. Time given in seconds, memory given in
MB. TO = 3 days (259,200 seconds).
Size (k)              163      233      283       409       571
# of Gates            153K     167K     399K      508K      1.6M
Time (s), Bug Free    1,443    1,913    11,116    17,848    192,032
Time (s), Buggy       1,487    2,106    11,606    20,263    204,194
Max Memory (MB)       213      269      561       845       2,855
Table 8.3: Abstraction of flat Montgomery multipliers. Time given in seconds, memory
given in MB. TO = 3 days (259,200 seconds).
Size (k)              163      233       283     409     571
# of Gates            184K     329K      488K    1.0M    1.97M
Time (s), Bug Free    6,897    63,805    TO      TO      TO
Time (s), Buggy       6,961    64,009    TO      TO      TO
Max Memory (MB)       153      325       505     971     2,240
Table 8.4: Abstraction of Montgomery blocks. Time given in seconds, memory is given in
MB. TO = 3 days (259,200 seconds).
Circuit Size (k)           163    233    283      409      571
# of Gates
  Blk A                    33K    55K    82K      168K     330K
  Blk B                    33K    55K    82K      168K     330K
  Blk Mid                  85K    163K   241K     502K     980K
  Blk Out                  32K    54K    81K      168K     328K
Time (s), Bug Free
  Blk A                    25     142    330      1,322    5,371
  Blk B                    25     141    329      1,335    5,241
  Blk Mid                  73     408    883      4,471    19,942
  Blk Out                  24     140    321      1,338    5,532
Time (s), Buggy
  Blk A                    26     142    331      1,323    5,372
  Blk B                    26     141    330      1,336    5,421
  Blk Mid                  111    580    1,411    6,829    37,804
  Blk Out                  25     141    322      1,339    5,539
Max Mem Per Blk (MB)       80     168    254      538      1,129
To abstract a word-level representation of the composite field multiplier over
$\mathbb{F}_{(2^m)^n}$ (Fig. 7.2), we first apply our approach to abstract a word-level
representation of each m-bit block. In the case of an adder, this abstraction is trivially
computed in 1 second. In the case of a multiplier, refer to our experimental results for
Mastrovito multipliers for comparison (Table 8.2), as these are designed as m-bit
Mastrovito blocks. Each abstraction can be computed independently. Once these word-level
abstractions are known, the final abstraction over $\mathbb{F}_{(2^m)^n}$ is performed
using only word-level variables. The results of this final word-level abstraction of buggy
and bug-free multipliers over composite fields are shown in Table 8.5. These results depend
on the total number of m-bit multiplier and adder blocks in the design, which are given in
Table 8.6.
The above experiments also demonstrate that we can perform equivalence checking
between Mastrovito (golden model) and Montgomery multiplier (implementation) circuits,
by deriving a canonical polynomial (Z1, Z2) from each circuit independently and then
checking if Z1 = Z2, for up to 571-bit circuits. Our experiments have shown that contem-
porary approaches (BDDs, SAT, SMT, and AIG/ABC) show some success in bug-catching
for these kinds of circuits (particularly ABC). These experiments are shown in Table 8.7.
Note that ABC uses random simulation for FRAIGing (functionally reducing AIGs), which
can catch a bug early if it is lucky. However, full equivalence checking using any of these
techniques failed even for 16-bit circuits, as shown in Table 8.8.
Table 8.5: Abstraction of bug-free Mastrovito multipliers over F(2m)n . Time is given in
seconds. Memory is given in MB. TO = more than 24 hours = 86,400 seconds. Note that




k = m·n = 128
m      n      Bug Free   Buggy    Mem
2      64     1          1        4
4      32     1          1        2
8      16     1          1        2
16     8      1          1        2
32     4      1          1        2
64     2      1          1        3

k = m·n = 256
m      n      Bug Free   Buggy    Mem
2      128    15         15       23
4      64     2          2        4
8      32     1          1        3
16     16     1          1        2
32     8      1          1        2
64     4      1          1        2
128    2      1          1        2

k = m·n = 512
m      n      Bug Free   Buggy    Mem
2      256    406        408      90
4      128    53         53       25
8      64     8          8        4
16     32     2          2        4
32     16     1          1        3
64     8      1          1        3
128    4      1          1        2
256    2      1          1        2

k = m·n = 1024
m      n      Bug Free   Buggy    Mem
2      512    11,883     12,050   414
4      256    1,520      1,536    106
8      128    209        211      29
16     64     38         37       10
32     32     10         10       5
64     16     4          4        3
128    8      2          2        3
256    4      1          1        3
512    2      1          1        3
Table 8.6: Statistics of designs over F(2m)n .
n 2 4 8 16 32
# of F2m Multipliers 6 36 168 720 2976
# of F2m Adders 3 27 147 675 2883
Table 8.7: Bug-catching between a golden-model Mastrovito and buggy Montgomery
circuit. Time given in seconds. TO = 3 days (259,200 seconds).

Circuit Size (k)   64       163    233    283      409     571
ABC                1        32     6      96       217     401
Lingeling          1        8      362    12,728   3,323   23,298
Picosat            15,235   TO     TO     TO       TO      TO
Boolector          4        30     41     105      152     19,113
CVC4               2        11     64     8,660    280     TO
Z3                 1        12     55     10,169   335     TO
Yices              1        6      7      618      578     11,568

Table 8.8: Equivalence checking between a golden-model Mastrovito and a bug-free
Montgomery circuit. TO = 3 days (259,200 seconds).

8.3 Limitations of the Abstraction Approach
Our tools and approach perform very favorably for $\mathbb{F}_{2^k}$ multiplier circuits and
other functions designed over fields. These types of circuits are based on AND-XOR gate logic.
Thus, the polynomials derived during the reduction procedures are very sparse. Since the
complexity of the algorithm heavily depends on the density of the polynomials, the worst case
is avoided in such designs. In the case of bugs within these circuits, these polynomials
increase in size, but are still easily manageable. This is why the approach is applicable to
very large Galois field multipliers, both buggy and bug-free.
However, for random logic, especially logic containing chains of OR gates, the polyno-
mial becomes very dense. As a result, the algorithm begins to encounter the computational
worst-case. This is best shown with a small example.
Figure 8.2: Logic comparisons between similarly structured circuits. (a) Small XOR logic
circuit. (b) Small OR logic circuit.

Example 8.7 Consider the circuit in Fig. 8.2a, which performs a 4-input XOR function.
Due to RATO, the monomial ordering of the variables is z > f > e > d > c > b > a.
Thus, z will be reduced in terms of the rest of the variables. The polynomials derived from
the design are:

$$f_1 : z + f + d \qquad f_2 : f + e + c \qquad f_3 : e + b + a$$

The reduction procedure $z \xrightarrow{f_1, f_2, f_3}_+ r$ will be computed as follows:

1. $z \xrightarrow{z + f + d} f + d$
2. $(f + d) \xrightarrow{f + e + c} e + d + c$
3. $(e + d + c) \xrightarrow{e + b + a} d + c + b + a$

In each reduction, the output gate variable is removed and one copy of each input variable
is added, leaving a sparse polynomial. Now consider the same circuit with the XOR gates
replaced by OR gates, as shown in Fig. 8.2b.
The monomial ordering stays the same, but the polynomials derived from each gate
have changed:

$$f_1 : z + fd + f + d \qquad f_2 : f + ec + e + c \qquad f_3 : e + ba + b + a$$

The reduction procedure $z \xrightarrow{f_1, f_2, f_3}_+ r$ is now computed as:

1. $z \xrightarrow{z + fd + f + d} fd + f + d$
2. $(fd + f + d) \xrightarrow{f + ec + e + c} f + edc + ed + dc + d$;
   $(f + edc + ed + dc + d) \xrightarrow{f + ec + e + c} edc + ed + ec + e + dc + d + c$
3. $(edc + ed + ec + e + dc + d + c) \xrightarrow{e + ba + b + a}_+ dcba + dcb + dca + dba + dc + db + da + d + cba + cb + ca + c + ba + b + a$

Each pass removes an output variable of the gate, but replaces it with two instances of each
input variable. This increases the density of the resulting polynomial exponentially.
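The contrast in Example 8.7 can be reproduced with a few lines of code. The following is only an illustrative sketch, not the tool described in this chapter: it assumes bit-level variables are idempotent ($x^2 = x$), represents a polynomial over $\mathbb{F}_2$ as a Python set of monomials (each a `frozenset` of variable names), and replaces the reduction by direct substitution of each gate output, which yields the same remainder for this example. All function names are hypothetical.

```python
from itertools import product

def mono(*names):
    """A monomial is the set of variables it contains (x*x = x is assumed)."""
    return frozenset(names)

def poly_mul(p, q):
    """Multiply two polynomials over F2; equal monomials cancel in pairs."""
    out = set()
    for m1, m2 in product(p, q):
        out ^= {m1 | m2}
    return out

def substitute(poly, var, expr):
    """Replace every occurrence of `var` in `poly` by the polynomial `expr`."""
    out = set()
    for m in poly:
        if var in m:
            out ^= poly_mul({m - {var}}, expr)
        else:
            out ^= {m}
    return out

def xor_gate(x, y):
    return {mono(x), mono(y)}                 # x + y

def or_gate(x, y):
    return {mono(x, y), mono(x), mono(y)}     # xy + x + y

# Gate netlist of Fig. 8.2 in reduction order: (output, (input, input)).
NETLIST = [("z", ("f", "d")), ("f", ("e", "c")), ("e", ("b", "a"))]

def abstract(gate):
    """Express z over the primary inputs and return the resulting polynomial."""
    r = {mono("z")}
    for out, (x, y) in NETLIST:
        r = substitute(r, out, gate(x, y))
    return r

xor_terms = len(abstract(xor_gate))   # a + b + c + d
or_terms = len(abstract(or_gate))     # every nonempty product of a, b, c, d
print(xor_terms, or_terms)
```

For the three-gate chain, the XOR version ends with 4 monomials while the OR version ends with all 15 nonempty products of the four inputs; extending `NETLIST` by further OR gates roughly doubles the term count per gate.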
8.4 Conclusions
This chapter examined the data structures and algorithms implemented in a custom
software abstraction tool. Abstraction of Galois field circuits using the custom tool has
significantly better performance compared to using SINGULAR scripts. With it, we can
abstract circuits up to 1024 bits, buggy or bug-free. Although a bug will generally
substantially inflate the size of the resulting polynomial abstraction, the tool is not greatly
hindered by the presence of bugs in a circuit. However, for random logic, especially OR-based
logic, the size of the polynomials tends to grow exponentially during the abstraction. Thus,
the approach is infeasible for circuits with OR-gate chains.
CHAPTER 9
CONCLUSIONS AND FUTURE WORK
A combinational circuit with $k$ inputs and $k$ outputs implements a Boolean function
$f : \mathbb{B}^k \to \mathbb{B}^k$, where $\mathbb{B} = \{0, 1\}$. The function can also be
construed as a mapping $f : \mathbb{F}_{2^k} \to \mathbb{F}_{2^k}$, where $\mathbb{F}_{2^k}$
denotes the Galois field of $2^k$ elements. A circuit with differing input and output sizes
computes $f : \mathbb{B}^m \to \mathbb{B}^n$, which can be represented as a function over
Galois fields $f : \mathbb{F}_{2^m} \to \mathbb{F}_{2^n}$. This circuit can also be analyzed as
the function $f : \mathbb{F}_{2^k} \to \mathbb{F}_{2^k}$, where
$\mathbb{F}_{2^k} \supset \mathbb{F}_{2^m}$ and $\mathbb{F}_{2^k} \supset \mathbb{F}_{2^n}$.
Every function $f$ over $\mathbb{F}_{2^k}$ is a polynomial function, i.e., there exists a
unique, minimal, canonical polynomial $\mathcal{F}$ that describes $f$. This dissertation
presented novel techniques based on computer algebra and algebraic geometry to derive the
canonical (word-level) polynomial representation from the circuit as $Z = \mathcal{F}(A)$
over $\mathbb{F}_{2^k}$, where $A$ and $Z$ denote, respectively, the input and output
bit-vectors of the circuit.
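The claim that every function over $\mathbb{F}_{2^k}$ has a unique canonical polynomial can be checked by brute force on a tiny field. The sketch below is not the Gröbner-basis method of this dissertation; it swaps in plain interpolation over $\mathbb{F}_4$, using the characteristic-2 identity $\mathcal{F}(X) = \sum_{c} f(c)\,\bigl(1 + (X + c)^{q-1}\bigr)$. The field size, the irreducible polynomial $x^2 + x + 1$, and all function names are assumptions made for illustration.

```python
K = 2              # assumed field F_{2^K} = F_4
POLY = 0b111       # assumed irreducible polynomial x^2 + x + 1
Q = 1 << K

def gf_mul(a, b):
    """Multiply two field elements encoded as K-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << K):
            a ^= POLY
    return r

def poly_mul_linear(p, c):
    """Multiply p(X) by (X + c); p[d] is the coefficient of X^d."""
    out = [0] * (len(p) + 1)
    for d, coef in enumerate(p):
        out[d + 1] ^= coef
        out[d] ^= gf_mul(coef, c)
    return out

def interpolate(table):
    """Return coefficients of the degree < Q polynomial matching table[c] = f(c)."""
    F = [0] * Q
    for c in range(Q):
        ind = [1]
        for _ in range(Q - 1):
            ind = poly_mul_linear(ind, c)     # (X + c)^(Q-1)
        ind[0] ^= 1                           # indicator: 1 at X = c, 0 elsewhere
        for d in range(Q):
            F[d] ^= gf_mul(table[c], ind[d])
    return F

def evaluate(F, x):
    """Horner evaluation of F at the field element x."""
    acc = 0
    for coef in reversed(F):
        acc = gf_mul(acc, x) ^ coef
    return acc

# The truth table of squaring recovers the polynomial X^2.
square_table = [gf_mul(c, c) for c in range(Q)]
square_poly = interpolate(square_table)

# An arbitrary bit-level function (here a permutation of F_4) is also a polynomial.
arbitrary_table = [3, 0, 1, 2]
arbitrary_poly = interpolate(arbitrary_table)
```

Interpolation enumerates all $2^k$ points, so it only serves to illustrate existence and uniqueness on small fields; it says nothing about how the abstraction is computed for large circuits.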
A theory for word-level polynomial abstraction of bit-level circuits over Galois fields
is first developed. This theory is derived using techniques from computer algebra, notably
the theory of Gröbner bases. However, due to the computational complexity of computing a
Gröbner basis, the solution is not scalable to large designs. In order to overcome these
limitations, new symbolic computational algorithms are developed and refined. The algorithms
employ techniques from the binomial expansion over $\mathbb{F}_{2^k}$ and F4-style reduction
and can exploit hierarchy in a given circuit. Finally, an efficient implementation of the
algorithmic approach is presented.
Experiments show that the proposed approach works exceptionally well for abstracting
word-level Galois field arithmetic circuits. It has been shown that the approach can abstract
and verify these types of circuits with up to 1024-bit datapaths. Other contemporary
techniques cannot verify these types of circuits beyond 163 bits and fail to abstract them
beyond 32 bits.
However, in cases of random logic circuits, the abstraction approach can generate
high-degree polynomials:
$$X^{q-1} + X^{q-2} + \cdots \tag{9.1}$$
In these cases, the polynomials derived during the computation are dense, and the
computational complexity of manipulating such polynomials makes abstraction infeasible.
9.1 Future Work
Due to the modular nature of the proposed solution, there are many potential future
research directions that can be explored.
9.1.1 Hardware Acceleration
The first reduction step, $f_Z \xrightarrow{F - f_{z_i},\, F_0}_+ r$, is the most computationally complex part of
the proposed abstraction approach. This reduction could be implemented using a hardware
accelerator. Significant speed-ups have been observed in GPU implementations of circuit
simulation algorithms [110]. Furthermore, this work has shown cases where multiple,
independent abstractions need to be computed at the same time, such as when abstracting
a word-level representation of a composite field multiplier.
These abstractions can be computed in parallel with one another, and this parallelism
could then be exploited using a GPU. Furthermore, our approach to compute the abstrac-
tions uses an F4-style reduction procedure, which performs many complex computations
over a large matrix. Operations over matrices can be suitably implemented using a GPU.
Lastly, the substitution by ai = F(A) is trivially parallelized. Thus, further study is
proposed to implement word-level abstraction on a general purpose GPU.
9.1.2 Integration with CAD Tools
The proposed canonical word-level abstraction approach is a full, self-contained so-
lution. It can thus be integrated into other CAD tools. There are direct applications of
word-level abstractions to design synthesis. For instance, the approach can compute a
functional decomposition of a logic, or it could be used in high-level RTL synthesis. Since
the derived abstraction is canonical, it can also be used in verification engines such as SMT
solvers. The abstraction approach most efficiently handles AND/XOR logic, so it could
be used to complement approaches in the mentioned tools that are efficient over AND/OR
logic.
9.1.3 Polynomial Reductions using Data-Structures
The abstraction approach handles chains of OR gates poorly due to their representation
as polynomials over Galois fields. Other polynomial-based tools [111] have shown that it
can be beneficial to represent polynomials internally as decision diagrams. Thus, it is
worthwhile to explore whether it is possible to implement the algorithmic approach in a
different data-structure that is better-suited for handling this type of logic. One candidate
data-structure is the And-Invert-Graph, as this structure efficiently handles OR gates. The
widely-used tool ABC [63] provides a very efficient, flexible, and open-source implemen-
tation of the AIG data structure.
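Recall that a one-step reduction of the polynomial $f$ by polynomial $g$, $f \xrightarrow{g} r$,
is computed as:
$$r = f - \frac{LT(f)}{LT(g)} \cdot g \tag{9.2}$$
Over Boolean circuits, $\mathbb{B} \equiv \mathbb{F}_2$. Since the leading coefficient of any
polynomial over $\mathbb{B}$ is 1 (so the leading term equals the leading monomial), and
$-1 \equiv +1$, then
$$f - \left(\frac{LT(f)}{LT(g)} \cdot g\right) \equiv f + \left(\frac{LM(f)}{LM(g)} \cdot g\right) \tag{9.3}$$
Furthermore, over $\mathbb{F}_2$ the $+$ operator acts as an XOR operation $\oplus$,
while the $\cdot$ operator acts as an AND operation $\wedge$. $\frac{LM(f)}{LM(g)}$ can be
computed as a division of cubes. So one step of the reduction procedure can be computed as the
following AND/XOR operation:
$$r = f \oplus \left(\frac{LM(f)}{LM(g)} \wedge g\right) \tag{9.5}$$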
Thus, we propose an investigation into implementing the algorithms presented in this
dissertation over AIGs.
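To make the AND/XOR view of Eqn. (9.5) concrete, the sketch below performs one reduction step on polynomials stored as sets of cubes. It is a hypothetical illustration only, not an AIG implementation: cubes are `frozenset`s of variable names, bit-level variables are assumed idempotent, the lexicographic order z > f > e > d > c > b > a of Example 8.7 is hard-coded, and the names `cube`, `leading_monomial`, and `reduce_step` are invented for this example.

```python
ORDER = ["z", "f", "e", "d", "c", "b", "a"]   # assumed order: z > f > ... > a

def cube(*names):
    """A cube (monomial) is the set of variables it contains."""
    return frozenset(names)

def lex_key(c):
    """Exponent vector of a cube; a larger tuple is a larger monomial under lex."""
    return tuple(int(v in c) for v in ORDER)

def leading_monomial(poly):
    return max(poly, key=lex_key)

def reduce_step(f, g):
    """One step r = f XOR ((LM(f) / LM(g)) AND g), or None if LM(g) does not divide LM(f)."""
    lm_f, lm_g = leading_monomial(f), leading_monomial(g)
    if not lm_g <= lm_f:
        return None
    quotient = lm_f - lm_g                    # division of cubes
    shifted = set()
    for c in g:
        shifted ^= {quotient | c}             # AND of the quotient cube with each cube of g
    return f ^ shifted                        # XOR of the two polynomials

# First two steps of the OR-gate reduction in Example 8.7.
f1 = {cube("z"), cube("f", "d"), cube("f"), cube("d")}
f2 = {cube("f"), cube("e", "c"), cube("e"), cube("c")}
r1 = reduce_step({cube("z")}, f1)             # fd + f + d
r2 = reduce_step(r1, f2)                      # f + edc + ed + dc + d
```

Symmetric difference on sets plays the role of XOR here, which is why duplicate cubes cancel automatically; a graph-based structure would need an equivalent cancellation mechanism.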
9.1.4 Application to Sequential Circuit Verification
Sequential Galois field arithmetic circuits over F2k take k-bit inputs and produce a
k-bit result after k clock cycles of operation. Formal verification of sequential arithmetic
circuits with large datapath sizes is beyond the capabilities of contemporary verification
techniques. To address this problem, we described a verification method in [26] that uses
the presented abstraction approach to implicitly unroll the sequential arithmetic circuit
over multiple (k) clock-cycles. The resulting function computed by the state-registers of
the circuit is represented canonically as a multivariate word-level polynomial over F2k .
While directly applicable to sequential Galois field arithmetic circuits, this work needs to
be further generalized in order to make it applicable to any sequential state machine.
9.1.5 Application to Formal Software Verification
Computer algebra techniques based on Gröbner basis theory have been used in formal
software verification [70]. In that work, a Gröbner basis computation is used to derive loop
invariants. However, the derived invariants are not bit-precise, so not every invariant that
is computed can be applied to the verification. As our approach maintains the input-output
relationship in the abstraction, it could be applied to find bit-precise invariants.
9.1.6 Application to Integer Arithmetic Circuits
The abstraction approach derives a word-level representation of circuits over Galois
fields, $\mathbb{F}_{2^k}$. In order to expand its usability, we ask whether it is possible to
apply concepts from this approach to abstract word-level representations of circuits over
integer rings, $\mathbb{Z}_{2^k}$. As any function over a Galois field $\mathbb{F}_{2^k}$ is a
polynomial function, there exists a polynomial that describes the word-level function of a
given circuit over $\mathbb{F}_{2^k}$. However, not every function over an integer ring
$\mathbb{Z}_{2^k}$ is a polynomial function. Thus, a single polynomial that describes the
function of a circuit over $\mathbb{Z}_{2^k}$ is not guaranteed to exist. Even though there
may not exist a single polynomial that describes the entire function over $\mathbb{Z}_{2^k}$,
elimination theory and Gröbner bases still apply over this ring. Thus, it may be possible to
modify the theory and implementation of our word-level abstraction approach in order to
abstract a set of word-level polynomials over $\mathbb{Z}_{2^k}$.
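The gap between functions and polynomial functions over $\mathbb{Z}_{2^k}$ is easy to observe for $k = 2$. The following brute-force sketch (the function name is hypothetical) enumerates every polynomial of degree less than $n$ over $\mathbb{Z}_n$ and collects the distinct functions they compute. Restricting to degree less than $n$ loses nothing, because the monic polynomial $x(x-1)\cdots(x-n+1)$ vanishes on all of $\mathbb{Z}_n$.

```python
from itertools import product

def polynomial_functions(n):
    """Set of truth tables (f(0), ..., f(n-1)) realized by polynomials over Z_n."""
    tables = set()
    for coeffs in product(range(n), repeat=n):        # all polynomials of degree < n
        table = tuple(
            sum(c * x**d for d, c in enumerate(coeffs)) % n
            for x in range(n)
        )
        tables.add(table)
    return tables

over_z2 = polynomial_functions(2)    # Z_2 is the field F_2
over_z4 = polynomial_functions(4)    # Z_4 is a ring, not a field

print(len(over_z2), "of", 2**2, "functions over Z_2 are polynomial")
print(len(over_z4), "of", 4**4, "functions over Z_4 are polynomial")
```

Over $\mathbb{Z}_4$ only 64 of the 256 functions are polynomial. For instance, any polynomial satisfies $f(2) - f(0) \equiv 2c_1 \pmod 4$, so a function with $f(0) = 0$ and $f(2) = 1$ cannot be represented by a single polynomial.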
APPENDIX
REPRESENTATIONS OF BASE FIELD ELEMENTS
OVER EXTENSION FIELDS
Consider any Galois field $\mathbb{F}_q$ and a k-bit extension of this field,
$\mathbb{F}_{q^k}$. This extension is created using a primitive, irreducible polynomial
$P(x)$ of degree $k$ over $\mathbb{F}_q[x]$. Any element $A \in \mathbb{F}_{q^k}$ can be
represented as
$$A = a_0 + a_1\alpha + \cdots + a_{k-1}\alpha^{k-1} \tag{A.1}$$
where $a_0, \ldots, a_{k-1} \in \mathbb{F}_q$ and $\alpha$ is the primitive element of
$\mathbb{F}_{q^k}$, i.e., $P(\alpha) = 0$. The goal is to derive the polynomial functions
$a_i = \mathcal{F}(A)$ for all $0 \le i < k$. Note that $\mathbb{F}_q$ could itself be an
extension field of a base field $\mathbb{F}_p$ where $p$ is a prime and $q = p^l$ for some
$l \ge 1$.








Examine what happens when Eqn. (A.1) is raised to the power $q$. The following lemma
is applied.
Lemma A.1 [86] Let $\alpha_1, \ldots, \alpha_t$ be any elements in $\mathbb{F}_{p^k}$. Then
$$(\alpha_1 + \alpha_2 + \cdots + \alpha_t)^{p^i} = \alpha_1^{p^i} + \alpha_2^{p^i} + \cdots + \alpha_t^{p^i} \tag{A.4}$$
for all integers $i \ge 0$.
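Corollary A.1 Since $q = p^l$ for some $l \ge 1$, Lemma A.1 is applicable to
$\mathbb{F}_{q^k}$. Let $\alpha_1, \ldots, \alpha_t$ be any elements in $\mathbb{F}_{q^k}$. Then
$$(\alpha_1 + \alpha_2 + \cdots + \alpha_t)^{q^i} = \alpha_1^{q^i} + \alpha_2^{q^i} + \cdots + \alpha_t^{q^i} \tag{A.5}$$
for all integers $i \ge 0$.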








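Applying Corollary A.1 to Eqn. (A.1) raised to the power $q^i$ gives:
$$A^{q^i} = a_0^{q^i} + a_1^{q^i}\alpha^{q^i} + \cdots + a_{k-1}^{q^i}\alpha^{(k-1)q^i} \tag{A.6}$$
Then, applying Eqn. (A.2) to this result gives:
$$A^{q^i} = a_0 + a_1\alpha^{q^i} + \cdots + a_{k-1}\alpha^{(k-1)q^i} \tag{A.7}$$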
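In this way, $k$ unique equations are derived, $\{A, A^q, \ldots, A^{q^{k-1}}\}$, along with
$k$ unknowns, $\{a_0, \ldots, a_{k-1}\}$. These equations can be represented in matrix form,
$\mathbf{A} = \mathbf{M}\mathbf{a}$, where $\mathbf{A} = [A, A^q, \ldots, A^{q^{k-1}}]^T$,
$\mathbf{M}$ is a $k$ by $k$ matrix of coefficients $\in \mathbb{F}_{q^k}$, and
$\mathbf{a} = [a_0, \ldots, a_{k-1}]^T$:
$$\begin{bmatrix} A \\ A^q \\ \vdots \\ A^{q^{k-1}} \end{bmatrix} =
\begin{bmatrix}
1 & \alpha & \alpha^2 & \cdots & \alpha^{k-1} \\
1 & \alpha^q & \alpha^{2q} & \cdots & \alpha^{(k-1)q} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \alpha^{q^{k-1}} & \alpha^{2q^{k-1}} & \cdots & \alpha^{(k-1)q^{k-1}}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{k-1} \end{bmatrix} \tag{A.8}$$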
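Treat $\mathbf{a}$ as a vector of unknowns, $\mathbf{M}$ and $\mathbf{A}$ as constants. This
system of equations can be solved using Gaussian elimination. However, this system also has a
special structure that can be exploited. $\mathbf{M}$ is a $k$ by $k$ Vandermonde matrix of
the form $V(\alpha^{q^0}, \alpha^{q^1}, \ldots, \alpha^{q^{k-1}})$. The determinant of this
Vandermonde matrix is:
$$|\mathbf{M}| = \prod_{0 \le i < j \le k-1} \left(\alpha^{q^j} - \alpha^{q^i}\right) \tag{A.9}$$
Since the elements $\{\alpha, \alpha^q, \ldots, \alpha^{q^{k-1}}\}$ are distinct:
$$|\mathbf{M}| \ne 0 \tag{A.10}$$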




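Hence the system has a unique solution, which by Cramer's rule is:
$$a_i = \frac{|\mathbf{M}_i|}{|\mathbf{M}|} \tag{A.11}$$
where $\mathbf{M}_i$ is $\mathbf{M}$ with the column
$\{\alpha^i, \alpha^{iq}, \ldots, \alpha^{iq^{k-1}}\}^T$ replaced by $\mathbf{A}$.
$$\mathbf{M}_i = \begin{bmatrix}
1 & \alpha & \cdots & \alpha^{i-1} & A & \alpha^{i+1} & \cdots & \alpha^{k-1} \\
1 & \alpha^q & \cdots & \alpha^{(i-1)q} & A^q & \alpha^{(i+1)q} & \cdots & \alpha^{(k-1)q} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \alpha^{q^{k-1}} & \cdots & \alpha^{(i-1)q^{k-1}} & A^{q^{k-1}} & \alpha^{(i+1)q^{k-1}} & \cdots & \alpha^{(k-1)q^{k-1}}
\end{bmatrix} \tag{A.12}$$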




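Expanding $|\mathbf{M}_i|$ along the column that contains $\mathbf{A}$ gives:
$$|\mathbf{M}_i| = \sum_{j=0}^{k-1} (-1)^{i+j} A^{q^j} \left|V_{i+1}\left(\alpha, \ldots, \alpha^{q^{j-1}}, \alpha^{q^{j+1}}, \ldots, \alpha^{q^{k-1}}\right)\right| \tag{A.13}$$
where $V_i(x_1, \ldots, x_n)$ is the Vandermonde matrix $V(x_1, \ldots, x_n)$ with the $i$-th
column skipped and an extra column added to the end.
$$V_i(x_1, \ldots, x_n) = \begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{i-2} & x_1^{i} & \cdots & x_1^{n} \\
1 & x_2 & x_2^2 & \cdots & x_2^{i-2} & x_2^{i} & \cdots & x_2^{n} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^{i-2} & x_n^{i} & \cdots & x_n^{n}
\end{bmatrix} \tag{A.14}$$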
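The determinant of $V_i$ is known to the linear algebra community to be
$$|V_i(x_1, \ldots, x_n)| = |V(x_1, \ldots, x_n)| \cdot S_{n-i}(x_1, \ldots, x_n) \tag{A.15}$$
where $S_i(x_1, \ldots, x_n)$ is the $i$-th fundamental symmetric polynomial in
$\{x_1, \ldots, x_n\}$. Thus,
$$a_i = \frac{1}{|\mathbf{M}|} \sum_{j=0}^{k-1} \Big[ (-1)^{i+j} A^{q^j}
\cdot \left|V\left(\alpha, \ldots, \alpha^{q^{j-1}}, \alpha^{q^{j+1}}, \ldots, \alpha^{q^{k-1}}\right)\right|
\cdot S_{n-1-i}\left(\alpha, \ldots, \alpha^{q^{j-1}}, \alpha^{q^{j+1}}, \ldots, \alpha^{q^{k-1}}\right) \Big] \tag{A.16}$$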
Lastly, this work has discovered the following proposition for the determinant |M|.
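Proposition A.1 The determinant $|\mathbf{M}|$ has the property
$$|\mathbf{M}|^q = (-1)^{k-1}|\mathbf{M}| \tag{A.17}$$
Proof. From Eqn. (A.9) and Corollary A.1,
$$|\mathbf{M}|^q = \Big[\prod_{0 \le i < j \le k-1} \left(\alpha^{q^j} - \alpha^{q^i}\right)\Big]^q \tag{A.19}$$
$$= \prod_{0 \le i < j \le k-1} \left(\alpha^{q^{j+1}} - \alpha^{q^{i+1}}\right) \tag{A.20}$$
When $j = k - 1$, the product term is of the form $(\alpha^{q^k} - \alpha^{q^{i+1}})$. Since
$\alpha^{q^k} = \alpha$ over $\mathbb{F}_{q^k}$, this term is equivalent to
$-(\alpha^{q^{i+1}} - \alpha)$. There are $k - 1$ such terms, one for each $0 \le i \le k - 2$,
and every other factor of Eqn. (A.20) already appears in Eqn. (A.9). This gives the property:
$$|\mathbf{M}|^q = (-1)^{k-1}|\mathbf{M}| \tag{A.21}$$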
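The construction of this appendix can be exercised numerically on a small case. The sketch below is an illustration under stated assumptions, not part of the original derivation: it fixes $q = 2$, $k = 3$, and the primitive polynomial $P(x) = x^3 + x + 1$, encodes field elements as 3-bit integers with $\alpha$ = `0b010`, and recovers each $a_i$ from $A$ by Cramer's rule with a naive cofactor-expansion determinant. All names are hypothetical.

```python
K = 3
POLY = 0b1011      # assumed primitive polynomial x^3 + x + 1 over F_2
ALPHA = 0b010      # alpha, a root of POLY

def gf_mul(a, b):
    """Multiply two elements of F_{2^K} encoded as K-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << K):
            a ^= POLY
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def gf_inv(a):
    return gf_pow(a, (1 << K) - 2)            # a^(2^K - 2) = a^(-1) for a != 0

def det(m):
    """Cofactor expansion along the first row; signs vanish in characteristic 2."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for c, entry in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total ^= gf_mul(entry, det(minor))
    return total

# Row i of M is [1, alpha^(q^i), alpha^(2 q^i), ..., alpha^((k-1) q^i)] with q = 2.
M = [[gf_pow(ALPHA, j * 2**i) for j in range(K)] for i in range(K)]
DET_M = det(M)

def coefficients(A):
    """Recover [a_0, ..., a_{k-1}] from A via a_i = |M_i| / |M|."""
    rhs = [gf_pow(A, 2**i) for i in range(K)]           # [A, A^q, ..., A^(q^(k-1))]
    inv = gf_inv(DET_M)
    out = []
    for i in range(K):
        M_i = [row[:i] + [rhs[r]] + row[i + 1:] for r, row in enumerate(M)]
        out.append(gf_mul(det(M_i), inv))
    return out
```

With this encoding the recovered coefficients are exactly the bits of $A$. In characteristic 2, Proposition A.1 reduces to $|\mathbf{M}|^2 = |\mathbf{M}|$, so the nonzero determinant must equal 1, which `DET_M` confirms for this field.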
REFERENCES
[1] E. Biham, Y. Carmeli, and A. Shamir, “Bug Attacks,” in Proceedings on Advances
in Cryptology, pp. 221–240, 2008.
[2] T. R. Nicely, “Pentium FDIV Flaw.” http://www.trnicely.net/pentbug/pentbug.html.
[Online; accessed May-2015].
[3] D. N. Arnold, “The Patriot Missile Failure.” http://www.ima.umn.edu/~arnold/disasters/patriot.html.
[Online; accessed May-2015].
[4] Z. Manna and A. Pnueli, The Temporal Logic of Reactive and Concurrent Systems.
Springer-Verlag, First ed., 1991.
[5] E. Clarke, O. Grumberg, and D. Peled, Model Checking. The MIT Press, Cam-
bridge, MA, 1999.
[6] F. Lu, L. Wang, K. Cheng, and R. Huang, “A Circuit SAT Solver with Signal
Correlation Guided Learning,” in IEEE Design, Automation and Test in Europe,
pp. 892–897, 2003.
[7] G. Avrunin, “Symbolic Model Checking using Algebraic Geometry,” in Computer
Aided Verification Conference, pp. 26–37, 1996.
[8] C. Condrat and P. Kalla, “A Gröbner Basis Approach to CNF formulae Preprocessing,”
in International Conference on Tools and Algorithms for the Construction and
Analysis of Systems, pp. 618–631, 2007.
[9] Y. Watanabe, N. Homma, T. Aoki, and T. Higuchi, “Application of Symbolic Com-
puter Algebra to Arithmetic Circuit Verification,” in IEEE International Conference
on Computer Design, pp. 25–32, October 2007.
[10] W. W. Adams and P. Loustaunau, An Introduction to Gröbner Bases. American
Mathematical Society, Providence, RI, 1994.
[11] J. Lv, Scalable Formal Verification of Finite Field Arithmetic Circuits using Com-
puter Algebra Techniques. PhD thesis, Univ. of Utah, Aug. 2012.
[12] H. Jain, D. Kroening, N. Sharygina, and E. Clarke, “Word Level Predicate Abstraction
and Refinement Techniques for Verifying RTL Verilog,” in Design Automation
Conf., 2005.
[13] S. Horeth and R. Drechsler, “Formal Verification of Word-level Specifications,” in
IEEE Design, Automation and Test in Europe, pp. 52–58, 1999.
[14] L. Arditi, “*BMDs can Delay the use of Theorem Proving for Verifying Arithmetic
Assembly Instructions,” in Proc. Formal Methods in CAD (Srivas, ed.), Springer-
Verlag, 1996.
[15] Z. Zeng, P. Kalla, and M. Ciesielski, “LPSAT: A Unified Approach to RTL Satisfia-
bility,” in Proc. DATE, 2001.
[16] R. Brummayer and A. Biere, “Boolector: An Efficient SMT Solver for Bit-Vectors
and Arrays,” in TACAS 09, Volume 5505 of LNCS, Springer, 2009.
[17] R. Bryant, D. Kroening, J. Ouaknine, S. Seshia, O. Strichman, and B. Brady, “Deciding
Bit-Vector Arithmetic with Abstraction,” in Proc. TACAS, pp. 358–372, 2007.
[18] D. Babic and M. Musuvathi, “Modular Arithmetic Decision Procedure,” Tech. Rep.
TR-2005-114, Microsoft Research, 2005.
[19] N. Tew, P. Kalla, N. Shekhar, and S. Gopalakrishnan, “Verification of Arithmetic
Datapaths using Polynomial Function Models and Congruence Solving,” in Proc.
Intl. Conf. on Computer-Aided Design (ICCAD), pp. 122–128, 2008.
[20] A. Gupta, “Formal Hardware Verification Methods: A Survey,” Formal Methods in
System Design, vol. 1, pp. 151–238, 1992.
[21] J. Smith and G. DeMicheli, “Polynomial Methods for Component Matching and Ver-
ification,” in Proceedings of the IEEE/ACM International Conference on Computer-
Aided Design, 1998.
[22] J. Smith and G. DeMicheli, “Polynomial Methods for Allocating Complex Compo-
nents,” in IEEE Design, Automation and Test in Europe, 1999.
[23] A. Peymandoust and G. DeMicheli, “Application of Symbolic Computer Algebra
in High-Level Data-Flow Synthesis,” IEEE Transactions CAD, vol. 22, no. 9,
pp. 1154–1165, 2003.
[24] T. Pruss, P. Kalla, and F. Enescu, “Word-Level Abstraction from Bit-Level Circuits
using Gröbner Basis,” in International Workshop on Logic and Synthesis, 2013.
[25] T. Pruss, P. Kalla, and F. Enescu, “Equivalence Verification of Large Galois Field
Arithmetic Circuits using Word-Level Abstraction via Gröbner Bases,” in Design
Automation Conference, 2014.
[26] X. Sun, P. Kalla, T. Pruss, and F. Enescu, “Formal Verification of Sequential Galois
Field Arithmetic Circuits Using Algebraic Geometry,” in Design, Automation & Test
in Europe, pp. 1623–1628, 2015.
[27] T. Pruss, P. Kalla, and F. Enescu, “Efficient Symbolic Computation for Word-Level
Abstraction from Combinational Circuits for Verification over Galois Fields,” IEEE
Transactions on CAD (in review), 2015.
[28] R. E. Bryant, “Graph Based Algorithms for Boolean Function Manipulation,” IEEE
Transactions on Computers, vol. C-35, pp. 677–691, August 1986.
[29] K. Brace, R. Rudell, and R. Bryant, “Efficient Implementation of a BDD Package,”
in DAC, pp. 40–45, 1990.
[30] R. Drechsler, A. Sarabi, M. Theobald, B. Becker, and M. Perkowski, “Efficient Rep-
resentation and Manipulation of Switching Functions based on Ordered Kronecker
Functional Decision Diagrams,” in Design Automation Conference, pp. 415–419,
1994.
[31] I. Bahar, E. A. Frohm, C. M. Gaona, G. D. Hachtel, E. Macii, A. Pardo, and
F. Somenzi, “Algebraic Decision Diagrams and their Applications,” in Proceedings
of the IEEE/ACM International Conference on Computer-Aided Design, pp. 188–
191, Nov. 1993.
[32] E. M. Clarke, K. L. McMillan, X. Zhao, M. Fujita, and J. Yang, “Spectral Transforms
for Large Boolean Functions with Applications to Technology Mapping,” in DAC,
pp. 54–60, 1993.
[33] E. M. Clarke, M. Fujita, and X. Zhao, “Hybrid Decision Diagrams - Overcoming the
Limitation of MTBDDs and BMDs,” in Proceedings of the IEEE/ACM International
Conference on Computer-Aided Design, pp. 159–163, 1995.
[34] Y. T. Lai, M. Pedram, and S. B. Vrudhula, “FGILP: An ILP Solver based on Function
Graphs,” in ICCAD, pp. 685–689, 1993.
[35] R. E. Bryant and Y. A. Chen, “Verification of Arithmetic Functions with Binary
Moment Diagrams,” in Proceedings of Design Automation Conference, pp. 535–541,
1995.
[36] R. Drechsler, B. Becker, and S. Ruppertz, “The K*BMD: A Verification Data
Structure,” IEEE Design & Test of Computers, vol. 14, no. 2, pp. 51–59, 1997.
[37] Y. A. Chen and R. E. Bryant, “*PHDD: An Efficient Graph Representation for
Floating Point Verification,” in Proc. ICCAD, 1997.
[38] M. Ciesielski, P. Kalla, and S. Askar, “Taylor Expansion Diagrams: A Canonical
Representation for Verification of Data-Flow Designs,” IEEE Transactions on Com-
puters, vol. 55, no. 9, pp. 1188–1201, 2006.
[39] N. Shekhar, Equivalence Verification of Arithmetic Datapaths using Finite Ring
Algebra. PhD thesis, Univ. of Utah, Dept. of Electrical and Computer Engineering,
Aug. 2007.
[40] B. Alizadeh and M. Fujita, “Modular Datapath Optimization and Verification based
on Modular-HED,” IEEE Transactions CAD, pp. 1422–1435, Sept. 2010.
[41] A. Jabir and D. Pradhan, “MODD: A New Decision Diagram and Representation for
Multiple Output Binary Functions,” in IEEE Design, Automation and Test in Europe,
2004.
[42] A. Jabir, D. Pradhan, T. Rajaprabhu, and A. Singh, “A Technique for Representing
Multiple Output Binary Functions with Applications to Verification and Simulation,”
IEEE Transactions on Computers, vol. 56, no. 8, pp. 1133–1145, 2007.
[43] R. K. Brayton, G. D. Hachtel, A. Sangiovanni-Vincentelli, F. Somenzi, A. Aziz,
S.-T. Cheng, S. Edwards, S. Khatri, Y. Kukimoto, A. Pardo, S. Qadeer, R. Ranjan,
S. Sarwary, G. Shiple, S. Swamy, and T. Villa, “VIS: A System for Verification and
Synthesis,” in Computer Aided Verification, 1996.
[44] K. L. McMillan, Symbolic Model Checking. Kluwer Academic Publishers, Norwell,
MA, 1993.
[45] C. Barrett and C. Tinelli, “CVC3,” in Computer Aided Verification Conference,
pp. 298–302, Springer, July 2007.
[46] L. de Moura and N. Bjørner, “Z3: An Efficient SMT Solver,” in International
Conference on Tools and Algorithms for the Construction and Analysis of Systems,
vol. 4963, Springer, 2008.
[47] C. W. Barrett, D. L. Dill, and J. R. Levitt, “A Decision Procedure for bit-Vector
Arithmetic,” in DAC, June 1998.
[48] H. Enderton, A Mathematical Introduction to Logic. Academic Press, New York,
1972.
[49] T. Bultan, R. Gerber, and C. League, “Verifying Systems with Integer Constraints
and Boolean Predicates: A Composite Approach,” in Proc. Int’l. Symp. on Software
Testing and Analysis, 1998.
[50] S. Devadas, K. Keutzer, and A. Krishnakumar, “Design Verification and Reachabil-
ity Analysis using Algebraic Manipulation,” in Proc. ICCD, 1991.
[51] Z. Zhou and W. Burleson, “Equivalence Checking of Datapaths Based on Canonical
Arithmetic Expressions,” in DAC, 1995.
[52] P. Dasgupta, P. P. Chakrabarti, A. Nandi, S. Krishna, and A. Chakrabarti, “Ab-
straction of Word-level Linear Arithmetic Functions from Bit-level Component
Descriptions,” in Proc. Design, Automation and Test in Europe, pp. 4–8, 2001.
[53] J. Møller, J. Lichtenberg, H. R. Andersen, and H. Hulgaard, “Difference Decision
Diagrams,” in Computer Science Logic, The IT University of Copenhagen, Den-
mark, Sept 1999.
[54] J. Møller and J. Lichtenberg, “Difference Decision Diagrams,” Master’s thesis,
Department of Information Technology, Technical University of Denmark, Building
344, DK-2800 Lyngby, Denmark, aug 1998.
[55] K. Strehl, “Interval Diagrams: Increasing Efficiency of Symbolic Real-Time Verifi-
cation,” in Intl. Conf. on Real Time Computing Systems and Applications, 1999.
[56] P. Sanchez and S. Dey, “Simulation-Based System-Level Verification using Polyno-
mials,” in High-Level Design Validation & Test Workshop, HLDVT, 1999.
[57] G. Ritter, “Formal Verification of Designs with Complex Control by Symbolic
Simulation,” in Advanced Research Working Conf. on Correct Hardware Design
and Verification Methods (CHARME) (S. V. LCNS, ed.), 1999.
[58] F. Fallah, S. Devadas, and K. Keutzer, “Functional Vector Generation for HDL
models using Linear Programming and 3-Satisfiability,” in Proc. DAC, ’98.
[59] R. Brinkmann and R. Drechsler, “RTL-Datapath Verification using Integer Linear
Programming,” in Proc. ASP-DAC, 2002.
[60] C. Y. Huang and K. T. Cheng, “Using Word-Level ATPG and Modular Arithmetic
Constraint Solving Techniques for Assertion Property Checking,” IEEE Trans. CAD,
vol. 20, pp. 381–391, 2001.
[61] A. Kuehlmann, V. Paruthi, F. Krohm, and M. K. Ganai, “Robust Boolean Reasoning
for Equivalence Checking and Functional Property Verification,” IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, pp. 1377–
1394, Nov. 2006.
[62] A. Mishchenko, S. Chatterjee, R. Brayton, and N. Een, “Improvements to Combina-
tional Equivalence Checking,” in Proc. Intl. Conf. on CAD (ICCAD), pp. 836–843,
2006.
[63] R. Brayton and A. Mishchenko, “ABC: An Academic Industrial-Strength Verifica-
tion Tool,” in Computer Aided Verification, vol. 6174, pp. 24–40, Springer, Berkeley,
CA, 2010.
[64] E. Mastrovito, “VLSI Designs for Multiplication Over Finite Fields GF(2m),” Lec-
ture Notes in Computer Science, vol. 357, pp. 297–309, 1989.
[65] C. Koc and T. Acar, “Montgomery Multiplication in GF(2k),” Designs, Codes and
Cryptography, vol. 14, pp. 57–69, Apr. 1998.
[66] L. Erkök, M. Carlsson, and A. Wick, “Hardware/Software Co-verification of
Cryptographic Algorithms using Cryptol,” in Proc. Formal Methods in CAD (FMCAD),
pp. 188–191, 2009.
[67] M. Ciesielski, W. Brown, D. Liu, and A. Rossi, “Function Extraction from Arith-
metic Bit-level Circuits,” ISVLSI, 2014.
[68] S. Gao, “Counting Zeros over Finite Fields with Gröbner Bases,” Master’s thesis,
Carnegie Mellon University, 2009.
[69] S. Gao, A. Platzer, and E. Clarke, “Quantifier Elimination over Finite Fields with
Gröbner Bases,” in Intl. Conf. Algebraic Informatics, 2011.
[70] S. Sankaranarayanan, H. B. Sipma, and Z. Manna, “Non-linear Loop Invariant
Generation using Gröbner Bases,” SIGPLAN Not., vol. 39, no. 1, pp. 318–329, 2004.
[71] L. Lastras-Montaño, P. Meany, E. Stephens, B. Trager, J. O’Conner, and L. Alves,
“A New Class of Array Codes for Memory Storage,” in Proc. Information Theory
and Applications Workshop, pp. 1–10, 2011.
[72] L. Lastras, A. Lvov, B. Trager, S. Winograd, V. Paruthi, A. El-Zhein, R. Shadowen,
and G. Janssen, “New Formal Verification Techniques for Algorithms over Finite
Fields.” Presented at Intl. Workshop on Information Theory and Applications.
Abstract of the paper available at: http://ita.ucsd.edu/workshop/12/talks, 2012.
[73] A. Lvov, L. Lastras-Montaño, V. Paruthi, R. Shadowen, and A. El-Zein, “Formal
Verification of Error Correcting Circuits using Computational Algebraic Geometry,”
in Proc. Formal Methods in Computer-Aided Design (FMCAD), pp. 141–148, 2012.
[74] W. Decker, G.-M. Greuel, G. Pfister, and H. Schönemann, “SINGULAR 4-0-2: A
Computer Algebra System for Polynomial Computations.” http://www.singular.uni-kl.de.
[Online; accessed May-2015].
[75] J. Lv, P. Kalla, and F. Enescu, “Efficient Gröbner Basis Reductions for Formal
Verification of Galois Field Arithmetic Circuits,” IEEE Transactions CAD, vol. 32,
pp. 1409–1420, Sept. 2013.
[76] J. Lv, P. Kalla, and F. Enescu, “Efficient Groebner Basis Reductions for Formal
Verification of Galois Field Multipliers,” in IEEE Design, Automation and Test in
Europe, 2012.
[77] J. Lv, P. Kalla, and F. Enescu, “Verification of Composite Galois Field Multipliers
over GF((2m)n) using Computer Algebra Techniques,” in IEEE High-Level Design
Validation and Test Workshop, pp. 136–143, 2011.
[78] J. Lv, P. Kalla, and F. Enescu, “Formal Verification of Galois Field Multipliers using
Computer Algebra,” in 25th IEEE International Conference on VLSI Design, 2012.
[79] O. Wienand, M. Wedler, D. Stoffel, W. Kunz, and G.-M. Greuel, “An Algebraic
Approach to Proving Data Correctness in Arithmetic Datapaths,” in Computer Aided
Verification Conference, pp. 473–486, 2008.
[80] N. Shekhar, P. Kalla, and F. Enescu, “Equivalence Verification of Polynomial
Datapaths using Ideal Membership Testing,” IEEE Transactions on CAD, vol. 26,
pp. 1320–1330, July 2007.
[81] E. Pavlenko, M. Wedler, D. Stoffel, W. Kunz, A. Dreyer, F. Seelisch, and G.-M.
Greuel, “STABLE: A New QBF-BV SMT Solver for Hard Verification Problems
Combining Boolean Reasoning with Computer Algebra,” in IEEE Design, Automa-
tion and Test in Europe Conference, pp. 155–160, 2011.
[82] R. Zippel, “Probabilistic Algorithms for Sparse Interpolation,” in Proc. Symp. Sym-
bolic and Algebraic Computation, pp. 216–226, 1979.
[83] M. Ben-Or and P. Tiwari, “A Deterministic Algorithm for Sparse Multivariate
Polynomial Interpolation,” in Proc. Symp. Theory of Computing, pp. 301–309, 1988.
[84] S. Javadi and M. Monagan, “On Sparse Polynomial Interpolation over Finite Fields,”
in Intl. Symp. Symbolic and Algebraic Computing, 2010.
[85] Z. Zilic and Z. Vranesic, “A Deterministic Multivariate Interpolation Algorithm for
Small Finite Fields,” IEEE Trans. Computers, vol. 51, Sept. 2002.
[86] R. J. McEliece, Finite Fields for Computer Scientists and Engineers. Kluwer
Academic Publishers, Norwell, MA, 1987.
[87] S. Roman, Field Theory. Springer, Providence, RI, 2006.
[88] R. Lidl and H. Niederreiter, Finite Fields. Cambridge University Press, Cambridge,
MA, 1997.
[89] P. Montgomery, “Modular Multiplication Without Trial Division,” Mathematics of
Computation, vol. 44, pp. 519–521, Apr. 1985.
[90] H. Wu, “Montgomery Multiplier and Squarer for a Class of Finite Fields,” IEEE
Transactions On Computers, vol. 51, May 2002.
[91] M. Knežević, K. Sakiyama, J. Fan, and I. Verbauwhede, “Modular Reduction in
GF(2^n) Without Pre-Computational Phase,” in Proceedings of the International
Workshop on Arithmetic of Finite Fields, pp. 77–87, 2008.
[92] D. Singmaster, “On Polynomial Functions (mod m),” J. Number Theory, vol. 6,
pp. 345–352, 1974.
[93] Z. Chen, “On Polynomial Functions from Z_n to Z_m,” Discrete Math., vol. 137,
no. 1-3, pp. 137–145, 1995.
[94] Z. Chen, “On Polynomial Functions from Z_{n_1} × Z_{n_2} × · · · × Z_{n_r} to Z_m,”
Discrete Math., vol. 162, no. 1-3, pp. 67–76, 1996.
[95] ST Microelectronics, Secure MCU with 32-bit ARM SC300 CPU, SWP Interface,
NESCRYPT Cryptoprocessor and High-density Flash Memory. ST33F1M
Datasheet, 2013, Rev 4.
[96] K. Kobayashi, Studies on Hardware Assisted Implementation of Arithmetic
Operations in Galois Field. PhD thesis, Nagoya University, Japan, 2009.
[97] S. Morioka and Y. Katayama, “Design Methodology for a One-shot Reed-Solomon
Encoder and Decoder,” in IEEE International Conference on Computer Design,
pp. 60–67, 1999.
[98] Y. Lee, K. Sakiyama, L. Batina, and I. Verbauwhede, “Elliptic-Curve-Based Security
Processor for RFID,” IEEE Transactions on Computers, vol. 57, pp. 1514–1527,
Nov. 2008.
[99] D. Hankerson, J. Hernandez, and A. Menezes, “Software Implementation of Elliptic
Curve Cryptography over Binary Fields,” in Cryptographic Hardware and Embedded
Systems CHES 2000 (Ç. K. Koç and C. Paar, eds.), vol. 1965 of Lecture Notes
in Computer Science, pp. 1–24, Springer, Berlin, Germany, 2000.
[100] V. Miller, “Use of Elliptic Curves in Cryptography,” in Lecture Notes in Computer
Sciences, (New York, NY, USA), pp. 417–426, Springer-Verlag, New York, NY,
1986.
[101] B. A. Forouzan, Cryptography and Network Security. McGraw-Hill, Columbus,
OH, 2008.
[102] J. López and R. Dahab, “Improved Algorithms for Elliptic Curve Arithmetic in
GF(2^n),” in Proceedings of the Selected Areas in Cryptography, pp. 201–212,
Springer-Verlag, London, UK, 1999.
[103] D. Cox, J. Little, and D. O’Shea, Ideals, Varieties, and Algorithms: An Introduction
to Computational Algebraic Geometry and Commutative Algebra. Springer, New
York, NY, 2007.
[104] B. Buchberger, Ein Algorithmus zum Auffinden der Basiselemente des
Restklassenringes nach einem nulldimensionalen Polynomideal. PhD thesis, University of
Innsbruck, 1965.
[105] B. Buchberger, “A Criterion for Detecting Unnecessary Reductions in the
Construction of Gröbner Bases,” in EUROSAM, 1979.
[106] J.-C. Faugère, “A New Efficient Algorithm for Computing Gröbner Bases (F4),”
Journal of Pure and Applied Algebra, vol. 139, pp. 61–88, June 1999.
[107] B. Sunar, E. Savas, and Ç. K. Koç, “Constructing Composite Field Representations for
Efficient Conversion,” IEEE Transactions on Computers, vol. 52, pp. 1391–1398,
November 2003.
[108] C. Paar, Efficient VLSI Architecture for Bit-Parallel Computation in Galois Fields.
PhD thesis, University of Essen, Germany, 1994.
[109] M. McCutchen, “C++ Big Integer Library.” https://mattmccutchen.net/bigint/.
[Online; accessed May-2015].
[110] Z. Feng, Z. Zeng, and P. Li, “Parallel On-Chip Power Distribution Network Analysis
on Multicore GPU Platforms,” IEEE Transactions VLSI, 2011.
[111] M. Brickenstein and A. Dreyer, “PolyBoRi: A Framework for Gröbner Basis
Computations with Boolean Polynomials,” Journal of Symbolic Computation, vol. 44,
pp. 1326–1345, September 2009.
