Highly Automated Formal Verification of Arithmetic Circuits by Sayed-Ahmed, Amr
Highly Automated
Formal Verification of
Arithmetic Circuits
PhD thesis
AMR SAYED-AHMED

Amr Sayed-Ahmed University of Bremen
A Dissertation Submitted in Partial Fulfillment
of the Requirements for the Degree of
Doctor of Engineering
- Dr.-Ing. -
University of Bremen, Faculty 3
Department of Mathematics and Computer Science
Bremen, December 2016
Supervisory Committee
1. Prof. Dr. Rolf Drechsler (University of Bremen, Germany)
2. Prof. Dr. Christoph Scholl (University of Freiburg, Germany)

To my Son Abdoallah

ACKNOWLEDGMENTS
I would like to start by thanking my advisor Rolf Drechsler for his continuous support and guidance. I deeply appreciate his trust in me to be part of his research group, which allowed me to learn about and research various interesting topics. I also take this opportunity to express my gratitude to my former advisor Hossam Fahmy, who set me on the path to becoming a researcher; I learned valuable lessons from him that gave me the strength to complete my PhD journey. This thesis would not have been possible without the help of my co-authors, Daniel Große, Ulrich Kühne, and Mathias Soeken, who supported me with their academic knowledge, valuable feedback, and many insightful discussions and suggestions. I would also like to thank my committee members and my external examiner Christoph Scholl for their time and their constructive comments.
Words cannot express how grateful I am to all members of my family; special thanks to my parents Magda and Abdelfatah as well as my wife Gylan for their constant encouragement and endless patience. Furthermore, the last four years would not have been even half as enjoyable without all my friends in Germany; nice discussions with Nahla Galal, Nabila Abdessaied, Judith Peters, and Ahmed AbdelMonem Mohamed have contributed immensely to my personal development.
—Amr Sayed-Ahmed, December 2016

CONTENTS
1 Introduction
2 Background
2.1 Circuit Modeling
2.1.1 Boolean Function
2.1.2 And Inverter Graph
2.1.3 Conjunctive Normal Form
2.1.4 Decision Diagrams
2.1.5 Multivariate Polynomials
2.2 Boolean Reasoning
2.2.1 Boolean Satisfiability
2.2.2 Binary Decision Diagrams
2.2.3 Symbolic Computation
2.3 Formal Verification of Arithmetic Circuits
2.3.1 Multiplier Architectures
2.3.2 Floating-Point Specification
2.3.3 Equivalence Checking
2.3.4 Theorem Proving
3 Recurrence Relations: Scalable Verification of Multipliers
3.1 Equivalence Checking Based on Recurrence Relations
3.2 Checking Partial Product Approach
3.2.1 Basic Notions
3.2.2 Overview of the Approach
3.2.3 Mathematical Formulations
3.2.4 Implementation
3.2.5 Discussion
3.2.6 Limitations
3.3 Experimental Results
3.3.1 Equivalence Checking Results
3.3.2 Fault Injection
3.4 Summary and Future Work
4 Symbolic Computation for Verifying Complex Multipliers
4.1 Boolean Ring versus Binary Galois Field
4.2 Verification Complexity of Sum Carry Networks
4.3 Problem of Vanishing Monomials
4.4 Logic Reduction within Model Rewriting
4.4.1 Logic Reduction
4.4.2 Rewriting Schemes
4.4.3 Overall Algorithm
4.4.4 Discussion
4.5 Ideal Membership Testing
4.6 Specification Polynomial
4.7 Experimental Results
4.8 Summary and Future Work
5 Equivalence Checking of Floating-Point Multipliers Using Gröbner Bases
5.1 Algebraic Combinational Equivalence Checking
5.2 Reverse Engineering of Data-path Units
5.2.1 Model Rewriting
5.2.2 Identifying Boundaries of Data-path Units
5.2.3 Abstracting Data-path Units
5.3 Arithmetic Sweeping
5.3.1 Generating Relationship Polynomials
5.3.2 Testing Membership of Internal Relations
5.4 Efficient Polynomial Representation
5.4.1 Different Decompositions
5.5 Experimental Results
5.6 Summary and Future Work
6 Conclusions
Bibliography
1
INTRODUCTION
Verifying the functional correctness of integrated circuits is essential to providing high-quality systems. Traditionally, industrial designs are validated by simulation, often using specialized test case generators [45, 51, 84, 104] to target specific areas. While such approaches are efficient at exposing bugs, they are inherently incomplete and cannot achieve full coverage, i.e., the evaluation of all input combinations. In many cases, even this intelligent dynamic simulation leaves considerable doubt about the correctness of integrated circuits. This has motivated the development of formal verification techniques, which provide full functional verification coverage and prove the consistency of circuits with their functional specification.
Microprocessors are used in safety-critical systems such as airplanes, nuclear reactor controllers, and medical instruments. Hence, the formal verification of microprocessors plays a vital role. There are also commercial arguments for formally verifying microprocessors, because of the cost of a recall, redesign, and refabrication. A well-known example is the FDIV bug in Intel's Pentium processor [89], which cost nearly half a billion dollars in 1995. Since this famous bug, considerable research effort has been spent on developing automated and formal techniques that can prove the correctness of microprocessor designs beyond mere testing.
Formal Verification in Hardware Design
Formal verification methods are applied in different phases of a circuit's design flow; they benefit the design process significantly by uncovering design errors early. Typically, the design process proceeds in several roughly defined phases: from the specification (conceptual design), through Transaction Level Modeling (TLM) and the Register Transfer Level (RTL), to the synthesis of a gate-level netlist, and finally a structural layout as the input to the manufacturing process. The design specification defines the functionality of the design and its nonfunctional requirements, such as clock rate or power consumption. This specification is implemented manually or automatically, resulting in the RTL description. Hardware Description Languages (HDLs) such as VHDL and Verilog allow the description of the design at RTL. TLM supports an early evaluation of the overall performance of the design, where components can easily be replaced before work on the actual RTL implementation begins. Automatic synthesis tools are used to obtain a gate-level netlist from the RTL description. To manufacture the final circuit, the gate-level netlist is further transformed into a structural layout by placing and wiring the netlist gates with highly optimized tools.
Formal verification constructs rigorous mathematical proofs of the correctness of the Design Under Verification (DUV). This thesis focuses on designs that are described at RTL or gate level. Because of the complex functionality of such designs, the formal proofs are hard, if not impossible, to carry out by hand; typically, they are performed with the help of automated software tools. This motivates the development of software-assisted verification methods, which formulate the specification of the DUV as proof obligations and verify that the DUV meets these obligations via an algorithmic proof. This proof is performed by answering decision problems with ‘yes’ or ‘no’ via decision procedures. A naïve proof would formulate the verification problem as a single decision problem. In many cases this is computationally infeasible, since the complexity of the decision problem exceeds the capabilities of state-of-the-art decision procedures. This motivates formal verification techniques for complex industrial designs that derive, manually or mechanically, proofs consisting of sets of simpler problems that can be answered by classical decision procedures.
Theorem proving, equivalence checking, and model checking are the state-of-the-art techniques for formally verifying microprocessor designs. Interactive theorem proving [49, 52] (or theorem proving for short) can simply be thought of as a formal proof that is checked by a computer. Both the creation and the checking of this mathematical proof are aided by software tools called theorem provers, such as HOL-Light [48], Coq [29], PVS [75], and ACL2 [59]. These tools provide trusted mathematical knowledge in the form of a large number of basic theories and lemmas. To check a proof asserting the correctness of the DUV with respect to the specification, a theorem prover applies powerful deductive reasoning based on its mathematical knowledge and on the lemmas that build the proof. The correctness of each lemma in the proof is checked relative to some subset of previously proved lemmas. Such a check becomes too difficult if there are not enough previously proved lemmas to support it. In this case, a skilled human verifier provides additional supporting lemmas based on a deep understanding of the DUV, which requires considerable knowledge, skill, and practice.
In contrast to theorem proving, equivalence checking and model checking can be fully automated and do not require human intervention. Equivalence checking asks whether two descriptions of a design are equivalent. Concretely, it checks whether a DUV and an already proved correct implementation of its specification have the same (or equivalent) behavior. The two compared designs are modeled using the same description method and combined into one design, called a miter. A miter is constructed by adding an equivalence function that performs a pairwise comparison between the corresponding outputs of the two given designs. The equivalence function takes the outputs of the compared designs as inputs and produces a single output, which is the output of the miter. The output of the miter is one iff at least one pair of outputs differs. If the miter output is proved to be constant zero, the compared designs are equivalent, i.e., they produce identical output values under any input assignment. Automated detection of internal functional equivalences [65, 72] is used to decompose this verification proof into a set of simple problems. As a consequence, equivalence checking is applicable only if the compared designs have a fair degree of structural and functional similarity, such that enough internal equivalences can be detected.
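The miter construction described above can be sketched in a few lines. This is a minimal illustration under our own assumptions. The two designs are a hypothetical gate-level ripple-carry adder and a word-level reference adder. The check that the miter output is constant zero is done by exhaustive enumeration, which stands in for the SAT-based or CEC engines used in practice and is feasible only for very small widths.

```python
from itertools import product

N = 3  # operand width; exhaustive enumeration is only feasible for tiny widths

def ripple_carry_adder(a_bits, b_bits):
    """Hypothetical gate-level design: N+1 output bits, least significant first."""
    out, carry = [], 0
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out + [carry]

def reference_adder(a_bits, b_bits):
    """Word-level specification: integer addition converted back to N+1 bits."""
    a = sum(bit << i for i, bit in enumerate(a_bits))
    b = sum(bit << i for i, bit in enumerate(b_bits))
    return [((a + b) >> i) & 1 for i in range(N + 1)]

def miter(a_bits, b_bits):
    """XOR each pair of corresponding outputs and OR the results: 1 iff some pair differs."""
    out = 0
    for y, z in zip(ripple_carry_adder(a_bits, b_bits), reference_adder(a_bits, b_bits)):
        out |= y ^ z
    return out

def check_equivalence():
    """Return an input on which the miter outputs 1, or None if it is constant zero."""
    for bits in product((0, 1), repeat=2 * N):
        a_bits, b_bits = list(bits[:N]), list(bits[N:])
        if miter(a_bits, b_bits):
            return a_bits, b_bits
    return None

print(check_equivalence())  # None: the two adders are equivalent
```

Returning the distinguishing input, rather than just a Boolean, mirrors how equivalence checkers report a counterexample when the designs differ.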
Model checking addresses the question of how to verify that a given sequential design satisfies a temporal property. The DUV is modeled as a Finite State Machine (FSM). Each state of the FSM is identified with a Boolean assignment to a set of state variables. The specification is formulated as a set of properties, each of which is a formula in a temporal logic such as Computation Tree Logic (CTL) [20] or Linear Time Logic (LTL) [80]. Approaches based on (incomplete) Bounded Model Checking (BMC) [6] are efficient at finding counterexamples that falsify a given property within a bounded number of steps. However, they cannot prove that a temporal property is satisfied in all reachable states. Since 2003, model checking based on interpolation [69] has shown its advantages and is currently considered one of the most valuable complete formal verification methods. It derives an over-approximation of the forward image of the initial states of the FSM and iterates this image operation to compute an over-approximation of all reachable states. Because of the approximation, it becomes computationally feasible to prove that a given property holds for an unbounded number of steps. Nowadays, interpolation-based model checking is complemented by IC3 [9, 10], which is currently the most powerful algorithm for hardware model checking. IC3 maintains a sequence of stepwise over-approximating sets of states. Because of the approximation, it encounters states that lead to a violation of the given property even though they are unreachable from the initial states. IC3 works by iteratively learning lemmas that demonstrate why these states cannot be reached within a bounded number of steps. This automated construction of inductive verification proofs makes model checking of safety properties more powerful.
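The bounded search behind BMC can be illustrated with an explicit-state sketch. It assumes a hypothetical toy FSM (a 3-bit counter that wraps around) and the property that the counter never reaches 5. Real BMC encodes the k-step unrolling as a SAT formula instead of enumerating states, but the outcome means the same thing. A returned trace is a genuine counterexample, while the absence of one within k steps proves nothing about deeper states.

```python
from collections import deque

def bounded_check(init_states, next_states, prop, k):
    """Breadth-first search over all paths of at most k transitions.

    Returns a trace (list of states) ending in a state that violates `prop`,
    or None if no violation is reachable within k steps. None does NOT prove
    the property for deeper states.
    """
    queue = deque((s, [s]) for s in init_states)
    visited = set(init_states)
    while queue:
        state, trace = queue.popleft()
        if not prop(state):
            return trace
        if len(trace) - 1 == k:      # bound reached: do not unroll further
            continue
        for nxt in next_states(state):
            if nxt not in visited:   # BFS reaches each state first at its minimal depth
                visited.add(nxt)
                queue.append((nxt, trace + [nxt]))
    return None

def counter_step(s):
    """Hypothetical FSM: a 3-bit counter that increments and wraps around."""
    return [(s + 1) % 8]

def never_five(s):
    """Safety property: the counter value is never 5."""
    return s != 5

print(bounded_check([0], counter_step, never_five, 3))  # None: no violation within 3 steps
print(bounded_check([0], counter_step, never_five, 5))  # [0, 1, 2, 3, 4, 5]
```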
The problems decomposed by verification techniques are described by formulas of logical theories such as propositional logic, bit vectors, and linear arithmetic. For every decidable theory, there is a decision procedure that terminates with a correct answer to the decision problem for given formulas of this theory. Boolean Satisfiability (SAT) solvers are decision procedures for propositional logic that are used intensively by formal techniques. Given a Boolean formula, a SAT solver [37] decides whether the formula is satisfiable, i.e., whether there exists an assignment of its variables under which the formula evaluates to true. If it is satisfiable, the solver also reports a satisfying assignment. The success of SAT solvers can largely be attributed to three abilities. They learn from wrong assignments, they prune large search spaces quickly, and they focus first on important variables, i.e., variables whose correct assignment simplifies the problem significantly. Satisfiability Modulo Theories (SMT) [34, 73] is a generalization of SAT that determines the satisfiability of Boolean combinations of formulas in further first-order theories via tailored decision procedures. For example, an SMT solver can decide the satisfiability of formulas over the data-path operations of a microprocessor at the word level rather than at the bit level.
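To make the search-and-propagate idea concrete, here is a minimal DPLL-style SAT procedure. It assumes CNF input in the DIMACS convention, where positive and negative integers denote literals. It implements only unit propagation and chronological backtracking. The clause learning and variable-selection heuristics that the text credits for the success of modern solvers are deliberately left out, so this is a sketch rather than a competitive solver.

```python
def dpll(clauses, assignment=None):
    """Decide satisfiability of a CNF formula.

    clauses: list of clauses, each a list of non-zero ints (DIMACS style:
    3 stands for x3, -3 for NOT x3). Returns a (possibly partial) satisfying
    assignment {variable: bool}, or None if the formula is unsatisfiable.
    Variables missing from a returned assignment may take either value.
    """
    if assignment is None:
        assignment = {}
    # Simplify under the current partial assignment.
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            continue                      # clause already satisfied
        remaining = [lit for lit in clause if abs(lit) not in assignment]
        if not remaining:
            return None                   # every literal false: conflict, backtrack
        simplified.append(remaining)
    if not simplified:
        return assignment                 # all clauses satisfied
    # Unit propagation: a clause with a single free literal forces that literal.
    for clause in simplified:
        if len(clause) == 1:
            lit = clause[0]
            return dpll(simplified, {**assignment, abs(lit): lit > 0})
    # Decision: branch on the first free variable, trying True, then False.
    var = abs(simplified[0][0])
    for value in (True, False):
        result = dpll(simplified, {**assignment, var: value})
        if result is not None:
            return result
    return None

cnf = [[1, 2], [-1, 2], [-2, 3]]   # (x1 or x2) and (not x1 or x2) and (not x2 or x3)
print(dpll(cnf))                   # {1: True, 2: True, 3: True}
print(dpll([[1], [-1]]))           # None: x1 and not x1 is unsatisfiable
```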
Besides SAT/SMT solvers, another intensively used proof engine is based on Reduced Ordered Binary Decision Diagrams (ROBDDs, or BDDs for short) [16]. BDDs are a highly useful graph-based data structure for manipulating Boolean formulas. The main feature of this representation is its canonicity, which enables a trivial tautology procedure for answering decision problems. For a fixed variable order, two formulas are functionally equivalent iff their BDD representations are isomorphic. One implication of canonicity is that all formulas that always evaluate to true have the same BDD (a single node labeled ‘1’) and all unsatisfiable formulas also have the same BDD (a single node labeled ‘0’). Thus, two formulas of completely different size can both be unsatisfiable, and their BDD representations are then identical: a single node labeled ‘0’. As a consequence, satisfiability can be checked in constant time for a given BDD using this tautology procedure. However, building the BDD for a given formula can take exponential space and time, even if the result is a single node. For some important functions, such as multiplication, constructing the BDD is infeasible due to its exponential size. For such arithmetic functions, word-level decision diagrams such as Multiplicative Binary Moment Diagrams (*BMDs) [18] have been introduced, which offer canonical representations for large multipliers. Unfortunately, *BMDs can grow exponentially for functions that are easily represented by BDDs. For this reason, hybrid decision diagrams such as HDDs [24] and K*BMDs [30] have been proposed. HDDs and K*BMDs are supersets of *BMDs and, in all cases, allow representations that are canonical and as efficient as BDDs, in particular for decision problems that combine data-path and control logic functions. However, building word-level decision diagrams for large gate-level circuits is computationally hard, since the diagrams may grow exponentially during construction.
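The canonicity argument can be demonstrated with a tiny ROBDD sketch. The class and method names are hypothetical. The construction simply enumerates all cofactors by Shannon expansion, so it is exponential by design; practical BDD packages build diagrams with an apply operation instead. What the sketch does show is that, with a unique table and the reduction rule, equivalent formulas land on the same node and every contradiction collapses to the terminal 0.

```python
class BDD:
    """Minimal reduced ordered BDD manager with a unique table (hash-consing).

    Nodes are integers: 0 and 1 are the terminals; other ids index self.nodes.
    Variable order is fixed as x0 < x1 < ... < x(num_vars - 1).
    """

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = {}   # id -> (var, low, high)
        self.unique = {}  # (var, low, high) -> id
        self.next_id = 2

    def mk(self, var, low, high):
        if low == high:                 # reduction rule: drop redundant tests
            return low
        key = (var, low, high)
        if key not in self.unique:      # sharing rule: reuse isomorphic nodes
            self.unique[key] = self.next_id
            self.nodes[self.next_id] = key
            self.next_id += 1
        return self.unique[key]

    def build(self, f, var=0, partial=()):
        """Build the BDD of predicate f (on a tuple of bits) by Shannon expansion.

        Enumerates all 2^n cofactors, so it only serves to illustrate canonicity.
        """
        if var == self.num_vars:
            return int(bool(f(partial)))
        low = self.build(f, var + 1, partial + (0,))
        high = self.build(f, var + 1, partial + (1,))
        return self.mk(var, low, high)

bdd = BDD(3)
f1 = bdd.build(lambda x: (x[0] and x[1]) or (x[0] and x[2]))
f2 = bdd.build(lambda x: x[0] and (x[1] or x[2]))
contradiction = bdd.build(lambda x: x[0] and not x[0])
print(f1 == f2, contradiction)  # True 0
```

Because both formulas are built in one shared manager, checking their equivalence reduces to comparing two node identifiers.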
Thesis Contributions
Verification of Floating-Point Units (FPUs) is usually among the top checklist items for any microprocessor. FPUs are data-intensive designs, so achieving complete coverage through dynamic simulation is impossible. For example, verifying a single operation with two 64-bit operands requires $2^{128}$ input combinations, and many lifetimes would have to be spent to validate this operation completely. Formal verification is the only way to achieve complete coverage of such designs. One noteworthy challenge is developing a fully automated technique that proves a floating-point design consistent with the IEEE standard for floating-point arithmetic (IEEE Std 754-2008) [50]. Theorem provers have been applied extensively to verify properties of floating-point designs. Although a lot of automation has been added and floating-point libraries have been created to avoid repeating proofs, the theorem proving methodology still requires an enormous amount of manual effort [90]. The paper by Jacobi et al. [54] presents the most highly automated work to date; however, it skips the hardest part to verify, namely multiplication. To arrive at a fully automated technique, this thesis first verifies the functions of the fundamental elements of floating-point designs independently. Among these elements, the multiplier has turned out to be the toughest part to verify. Word-level decision diagrams such as *BMDs suffer from an exponential blow-up in size during the construction of the diagram from bit-level formulas. Techniques based on SAT and SMT solvers fail to check the correctness of large scale multiplier circuits in practical time. The most successful technique to date is based on reverse engineering the circuit into an Arithmetic Bit-Level (ABL) representation [96]. It extracts adder structures from gate-level netlists and builds full adder networks. However, such adder networks cannot be built for all multiplier architectures, nor for incorrect multipliers.
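The scale of the $2^{128}$ example can be checked with quick arithmetic. The simulation throughput of $10^9$ input vectors per second is an assumed, optimistic figure, not a number from the text.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
VECTORS_PER_SECOND = 10**9   # assumed, optimistic simulation throughput

combinations = 2**128        # two 64-bit operands
years = combinations / VECTORS_PER_SECOND / SECONDS_PER_YEAR
print(f"{combinations:.3e} input combinations, about {years:.2e} years")
# 3.403e+38 input combinations, about 1.08e+22 years
```

Even a thousandfold faster simulator would leave the result many orders of magnitude beyond any practical budget.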
This dissertation investigates the problems of two distinct formal verification techniques for verifying large scale multiplier circuits and proposes two approaches to overcome some of these problems. The first technique is equivalence checking based on recurrence relations [85]. The second is the symbolic computation technique, which is based on the theory of Gröbner bases [87]. This investigation demonstrates that approaches based on symbolic computation are more scalable and more robust than state-of-the-art equivalence checking techniques for the verification of arithmetic circuits. Based on this conclusion, the thesis leverages the symbolic computation technique to verify floating-point designs. It proposes a new algebraic equivalence checking technique [86]. In contrast to classical combinational equivalence checking [66, 72], the proposed technique can check the equivalence of two circuits that have different architectures of arithmetic units as well as control logic parts, e.g., floating-point multipliers.
The following gives brief overviews of the three techniques that have been proposed to verify large scale multipliers as well as floating-point multipliers in a fully automated manner.
Recurrence Relations: Scalable Verification of Multipliers
State-of-the-art equivalence checking techniques [65] cannot deal with implementations that have few internal equivalences. This problem occurs especially for arithmetic circuits, where one function can be implemented in many different ways [96]. For this reason, equivalence checking based on recurrence relations employs arithmetic properties of the multiplier function in order to build a miter with many internal equivalences. The first approach in this context was proposed by Fujita [42]. It is based on the fact that any function satisfying the recurrence relation (X + 1) · Y = X · Y + Y, together with the base case 0 · Y = 0, is a multiplication. However, the original approach by Fujita does not scale, and it cannot verify multipliers larger than 16 bits.
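The recurrence-based characterization can be checked on small operands. The following sketch assumes a hypothetical multiplier model given as a Python function. It tests the base case and the recurrence exhaustively modulo $2^{2N}$. It illustrates the property only and is not Fujita's miter-based procedure. By induction on X, any function passing both checks equals X · Y for all N-bit operands.

```python
N = 4                        # operand width in bits; exhaustive checking only works for small N
MASK = (1 << 2 * N) - 1      # products are 2N bits wide

def shift_add_multiplier(x, y):
    """Hypothetical multiplier model: accumulate the shifted partial products x * y_i * 2^i."""
    acc = 0
    for i in range(N):
        if (y >> i) & 1:
            acc += x << i
    return acc & MASK

def buggy_multiplier(x, y):
    """Hypothetical faulty design that ignores the most significant bit of y."""
    return (x * (y & ((1 << (N - 1)) - 1))) & MASK

def satisfies_recurrence(f):
    """Check f(0, Y) = 0 and f(X + 1, Y) = f(X, Y) + Y (mod 2^(2N)) for all N-bit X, Y."""
    for y in range(1 << N):
        if f(0, y) != 0:
            return False
        for x in range((1 << N) - 1):          # keep X + 1 within N bits
            if f(x + 1, y) != (f(x, y) + y) & MASK:
                return False
    return True

print(satisfies_recurrence(shift_add_multiplier))  # True
print(satisfies_recurrence(buggy_multiplier))      # False
```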
To increase the scalability of this technique, we propose the Checking Partial Product (CPP) approach [85], which decomposes the verification process of the multiplier into a series of simpler cases. In every case, the generation and the addition of one partial product of the multiplier under verification are checked using a recurrence equation. As shown in Figure 1, a given circuit netlist is first decomposed into small parts based on information deduced about the partial products of the circuit, which is assumed to be a multiplier. The second step, as shown in the figure, is the construction of a miter for every decomposed part based on a recurrence relation. Finally, all created miters are checked independently by the Combinational Equivalence Checking (CEC) approach [72], as shown in the last block of Figure 1. The compared parts within each miter differ only slightly and propagate carries in similar ways, which allows fast equivalence checking regardless of the multiplier size. As the multiplier size increases, the number of cases increases, while the complexity of checking one case remains almost the same. Our approach is able to verify a multiplier at the gate level without any information about its high-level specification or the internal structure of the netlist. The experiments show that the proposed approach can verify multipliers of up to 128 bits. However, the approach is not applicable to the verification of Booth recoding multipliers and optimized multipliers. The case splitting scheme of the approach assumes that the partial products are independent of each other, which is not the case for Booth recoding multipliers and optimized multipliers.
Figure 1: Checking Partial Product Approach (circuit netlist → partial products decomposition → recurrence miters → CEC → equivalence/inconsistency)
Figure 2: Flow of Symbolic Computation Technique (circuit netlist N → Gröbner bases modeling → G; G and specification polynomial pr → membership testing (IMT) → equivalence/inconsistency)
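The case splitting idea can also be sketched, under strong simplifying assumptions. The hypothetical multiplier model below exposes the running sum after each partial product has been added. The actual CPP approach, by contrast, has to deduce the partial products from an unstructured netlist and checks each case with a recurrence miter and a CEC engine. Here each case is checked by plain enumeration, purely to show how the per-partial-product cases compose into a proof about the whole product.

```python
N = 4  # operand width in bits

def stage_outputs(a, b):
    """Hypothetical multiplier model exposing internal running sums:
    S[0] = 0 and S[i + 1] = S[i] + (partial product i) * 2^i."""
    s = [0]
    for i in range(N):
        pp = a if (b >> i) & 1 else 0      # partial product i: a AND b_i
        s.append(s[-1] + (pp << i))
    return s

def check_stage(i, stages):
    """One independent case: does stage i add exactly a * b_i * 2^i to the running sum?"""
    for a in range(1 << N):
        for b in range(1 << N):
            s = stages(a, b)
            if s[i + 1] != s[i] + ((a * ((b >> i) & 1)) << i):
                return False
    return True

def check_multiplier(stages):
    """If S[0] = 0 and every stage is correct, then S[N] = sum_i a * b_i * 2^i = a * b."""
    base_ok = all(stages(a, b)[0] == 0 for a in range(1 << N) for b in range(1 << N))
    return base_ok and all(check_stage(i, stages) for i in range(N))

def faulty_stages(a, b):
    """Hypothetical bug: the running sum after partial product 2 has its LSB flipped."""
    s = stage_outputs(a, b)
    s[3] ^= 1
    return s

print(check_multiplier(stage_outputs))                    # True
print([check_stage(i, faulty_stages) for i in range(N)])  # [True, True, False, False]
```

A failing case also localizes the fault to the stages around the faulty partial product, which is one benefit of splitting the check.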
Symbolic Computation for Verifying Complex Multipliers
The symbolic computation technique [23, 39, 68, 77, 99, 100] reduces the verification problem to testing the membership of a specification polynomial in the ideal generated by a set of multivariate polynomials that models a circuit netlist. It solves the verification problem using an algebraic decision procedure called Ideal Membership Testing (IMT). As illustrated in Figure 2, the IMT takes two inputs: the specification polynomial $p_r$ of the circuit function, and a set of polynomials in the form of a Gröbner basis $G = \{g_1, \ldots, g_s\}$ modeling the circuit netlist $N$. The IMT procedure answers the question of whether the circuit netlist satisfies its specification by repeatedly dividing $p_r$ by the polynomials of $G$, denoted $p_r \xrightarrow{G}_{+} r$, where $r$ is the remainder of dividing $p_r$ by $G$. The division steps are repeated until no term in $r$ is divisible by the leading term of any polynomial in $G$. If $r = 0$, the circuit satisfies the specification and the equivalence is proved; otherwise, an inconsistency between the model $G$ and the specification is reported.
Figure 3: Symbolic Computation for Multipliers (circuit netlist N → Gröbner bases modeling → G → model rewriting, the proposed contribution → G′; G′ and specification polynomial pr → membership testing → equivalence/inconsistency)
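A small sketch can make the division step concrete. It assumes the Boolean-ring style of modeling over the integers. Gate polynomials are $z - xy$ for AND and $z - (x + y - 2xy)$ for XOR, and the idempotence $x^2 = x$ is built into the monomial representation. Under a term order in which every gate output is greater than its inputs, dividing by a gate polynomial amounts to substituting the gate output by its tail. The remainder can therefore be computed by substituting gate outputs in reverse topological order. The 2×2-bit multiplier netlist and all helper names are hypothetical.

```python
from collections import defaultdict

# A polynomial maps each monomial (a frozenset of variable names) to an integer coefficient.
# Using sets for monomials builds in x * x = x, which holds for every Boolean signal.

def add(p, q, scale=1):
    """Return p + scale * q."""
    r = defaultdict(int, p)
    for m, c in q.items():
        r[m] += scale * c
    return {m: c for m, c in r.items() if c != 0}

def mul(p, q):
    """Return p * q, merging repeated variables (x * x = x)."""
    r = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[m1 | m2] += c1 * c2
    return {m: c for m, c in r.items() if c != 0}

def var(name):
    return {frozenset([name]): 1}

def AND(x, y):
    """Tail of the gate polynomial z - x*y."""
    return mul(var(x), var(y))

def XOR(x, y):
    """Tail of the gate polynomial z - (x + y - 2*x*y)."""
    return add(add(var(x), var(y)), AND(x, y), -2)

def substitute(p, v, tail):
    """One division step by the gate polynomial v - tail: replace v by tail in p."""
    r = {}
    for m, c in p.items():
        if v in m:
            r = add(r, mul({m - {v}: c}, tail))
        else:
            r = add(r, {m: c})
    return r

def reduce_spec(spec, gates):
    """Divide by all gate polynomials, from the outputs back to the primary inputs."""
    for out, tail in reversed(gates):
        spec = substitute(spec, out, tail)
    return spec

# Hypothetical 2x2-bit multiplier netlist in topological order: (gate output, tail).
GATES = [
    ("z0", AND("a0", "b0")),
    ("p01", AND("a0", "b1")),
    ("p10", AND("a1", "b0")),
    ("p11", AND("a1", "b1")),
    ("z1", XOR("p01", "p10")),
    ("c1", AND("p01", "p10")),
    ("z2", XOR("p11", "c1")),
    ("z3", AND("p11", "c1")),
]

def multiplier_spec():
    """p_r = (z0 + 2 z1 + 4 z2 + 8 z3) - (a0 + 2 a1) * (b0 + 2 b1)."""
    out_word = {}
    for i in range(4):
        out_word = add(out_word, var(f"z{i}"), 2**i)
    a_word = add(var("a0"), var("a1"), 2)
    b_word = add(var("b0"), var("b1"), 2)
    return add(out_word, mul(a_word, b_word), -1)

print(reduce_spec(multiplier_spec(), GATES))  # {}: remainder r = 0, the netlist is correct
```

Replacing the XOR that drives `z2` with an AND gate would leave a nonzero remainder, which is how the procedure reports an inconsistency.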
In the case of integer arithmetic, the IMT procedure suffers from an exponential increase in the size of the intermediate polynomial during the division (reduction) process. This is caused by the nonlinear terms that model the carry chains propagated within bit-level implementations of integer multipliers. To improve the scalability of the technique, a rewriting step is inserted, as shown in Figure 3, to derive a new set of Gröbner basis polynomials $G' = \{g'_1, \ldots, g'_s\}$ from the algebraic model of the circuit $G = \{g_1, \ldots, g_t\}$. This makes the verification of a limited class of integer multipliers feasible. Model rewriting allows the early cancellation of shared terms in the polynomial representation, which effectively circumvents the blow-up within the IMT procedure. However, enhancing the technique by rewriting alone is not sufficient to verify multipliers with complex architectures such as Parallel Prefix Adders (PPAs) or Booth recoding. The main reason, as we identified, is the accumulation of vanishing monomials, i.e., monomials that always evaluate to zero.
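A half adder gives a minimal example of a vanishing monomial; this illustration is our own and is not taken from the text. Model the sum as $s = a \oplus b$ and the carry as $c = a \wedge b$, and use $x^2 = x$ for Boolean variables:

```latex
\begin{aligned}
s &= a + b - 2ab, \qquad c = ab,\\
s \cdot c &= (a + b - 2ab)\,ab = a^2 b + a b^2 - 2a^2 b^2\\
          &= ab + ab - 2ab = 0 \qquad (\text{using } a^2 = a,\ b^2 = b).
\end{aligned}
```

The sum and carry of the same half adder can never both be 1, so the monomial $s \cdot c$ contributes nothing to the function. It can nevertheless accumulate in intermediate polynomials unless it is recognized and removed.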
We propose an algebraic algorithm that enables the verification of a large class of multiplier circuits, including both basic and parallel multiplier architectures. Based on the observation that vanishing monomials accumulate, a novel rewriting scheme is proposed to reveal these monomials. In particular, the algorithm uses structural knowledge of the circuit netlist to identify and remove vanishing monomials early, before the IMT process starts. Thus, the approach can verify complex multiplier circuits of up to 128 bits in practical time.
Figure 4: Abstracted Flow of ACEC (circuit netlists N1 and N2 → Gröbner modeling G1, G2 → combined model G → reverse engineering → G′, Gword → arithmetic sweeping → Gsimple → membership testing of output relationships → equivalence/inconsistency)
Equivalence Checking of Floating-Point Multipliers Using Gröbner Bases
Not every circuit specification $p_r$ can be represented by one polynomial in a canonical and abstract form over $\mathbb{Z}_{2^n}$. Motivated by this fundamental problem, we are interested in equivalence checking, i.e., we want to prove the functional equivalence of two circuits. This can be done as follows. Assume that the two circuits checked for equivalence represent the functions $f_1(x_1, \ldots, x_n) = (y_1, \ldots, y_m)$ and $f_2(x_1, \ldots, x_n) = (z_1, \ldots, z_m)$ and are given as two sets of polynomials $G_1$ and $G_2$. Each polynomial $z_j - y_j$ ($1 \le j \le m$) formulates the equivalence of one output bit. We test the membership of each such polynomial in the ideal generated by the combined model $G = G_1 \cup G_2$. This naïve method does not scale. During the recursive reduction (division) process performed by the IMT procedure, the internal variables in the polynomial set $G$ cause a tremendous overhead, which can only be resolved once the primary input variables $x_i$ appear in the polynomials.
This problem can be circumvented if internal equivalences between the two circuits are known, since they put internal variables of the two circuits into relation. Conceptually, this is similar to SAT sweeping [65]. As a consequence, $G$ is simplified, which ultimately avoids a blow-up of the polynomials during reduction. The difficulty lies in finding internal equivalences. To solve this problem, we propose reverse engineering techniques. First, expected arithmetic word-level components such as multipliers and adders are detected in the circuit using structural signatures. Then, the proposed arithmetic sweeping uses the I/O boundaries of the detected word-level components to prove internal equivalences and circumvent blow-ups during division. To further reduce the verification runtime of the divisions, we propose a decomposition algorithm that allows more compact and semi-canonical representations of different implementations of the same function.
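The candidate-detection step of SAT sweeping, to which the text compares this idea, can be sketched with bit-parallel random simulation. Signals that agree on many random input patterns are grouped as candidate equivalences, which must then be proved by a decision procedure. This sketch shows only the generic simulation-based idea. It is not the structural-signature and I/O-boundary method of arithmetic sweeping proposed here. The two small netlists and their signal names are hypothetical.

```python
import random

WIDTH = 64                 # random input patterns simulated in parallel, one per bit
MASK = (1 << WIDTH) - 1

def simulate(netlist, inputs):
    """Bit-parallel simulation; netlist maps signal -> (op, fan-ins) in topological order."""
    values = dict(inputs)
    for sig, (op, args) in netlist.items():
        v = [values[x] for x in args]
        if op == "AND":
            values[sig] = v[0] & v[1]
        elif op == "OR":
            values[sig] = v[0] | v[1]
        elif op == "XOR":
            values[sig] = v[0] ^ v[1]
        elif op == "NOT":
            values[sig] = ~v[0] & MASK
        else:
            raise ValueError(f"unknown gate {op}")
    return values

# Two hypothetical implementations of (x AND y) XOR z sharing the primary inputs.
COMBINED = {
    "t1": ("AND", ["x", "y"]), "o1": ("XOR", ["t1", "z"]),       # circuit 1
    "nx": ("NOT", ["x"]), "ny": ("NOT", ["y"]),                  # circuit 2 (De Morgan)
    "n": ("OR", ["nx", "ny"]), "t2": ("NOT", ["n"]), "o2": ("XOR", ["t2", "z"]),
}

def candidate_classes(netlist, input_names, seed=2016):
    """Group internal signals by simulation signature; groups of size > 1 are candidates."""
    rng = random.Random(seed)
    inputs = {name: rng.getrandbits(WIDTH) for name in input_names}
    values = simulate(netlist, inputs)
    classes = {}
    for sig in netlist:
        classes.setdefault(values[sig], []).append(sig)
    return [group for group in classes.values() if len(group) > 1]

print(candidate_classes(COMBINED, ["x", "y", "z"]))
```

Truly equivalent signals such as `t1`/`t2` and `o1`/`o2` always share a signature. Signals that merely agree on all sampled patterns could also be grouped, which is why each candidate still needs a proof.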
The result is a new Algebraic Combinational Equivalence Checking (ACEC) technique based on Gröbner bases, shown in Figure 4. It combines the two Gröbner basis sets $G_1$ and $G_2$, which model the compared netlists $N_1$ and $N_2$, into one Gröbner basis model $G$. Then it applies the two main algorithms of the ACEC, reverse engineering and arithmetic sweeping, as illustrated in Figure 4. The reverse engineering algorithm rewrites the model $G$ into a new Gröbner basis $G'$, identifies arithmetic functions in $G'$, and abstracts them to word-level polynomials, from which it builds a word-level model $G_{word}$. The arithmetic sweeping algorithm leverages the polynomials of $G_{word}$ and $G'$ to deduce and prove equivalence relationships between internal variables of $G'$. Merging the internal variables of $G'$ that are proved equivalent yields a simplified Gröbner basis $G_{simple}$. Finally, as shown at the end of Figure 4, the ACEC uses the IMT procedure to check whether the output relationships hold in the simplified model $G_{simple}$. If all output relationships are satisfied, the equivalence between $N_1$ and $N_2$ is proved; otherwise, nonequivalence is reported. In contrast to classical combinational equivalence checking [66, 72], the ACEC can check the equivalence of two circuits that contain different architectures of arithmetic units, e.g., multipliers and adders, as well as control logic parts. Our experimental evaluation demonstrates the applicability of our algebraic equivalence checking approach to several optimized floating-point multipliers that cannot be verified by other fully automated proof techniques.
Outline
The dissertation is based on several peer-reviewed publications, which are listed individually for each of the three main chapters:
Chapter 2 provides the background required to keep this dissertation self-contained; it reviews definitions and notations from Boolean reasoning, formal verification, and the symbolic computation technique.
Chapter 3 revisits equivalence checking based on recurrence relations and proposes a scalable verification approach for bit-level multiplier circuits. The chapter is based on the publication:
Recurrence Relations Revisited: Scalable Verification of Bit Level Multiplier Circuits
Amr Sayed-Ahmed, Ulrich Kühne, Daniel Große, and Rolf Drechsler
IEEE Annual Symposium on VLSI (ISVLSI), 2015, 1–6.
Chapter 4 enhances the symbolic computation technique to verify complex architectures of integer multipliers. It is based on the following publication, which was nominated for the best paper award at DATE 2016:
Formal Verification of Integer Multipliers by Combining Gröbner Bases with Logic Reduction
Amr Sayed-Ahmed, Daniel Große, Ulrich Kühne, Mathias Soeken, and Rolf Drechsler
Design, Automation and Test in Europe (DATE), 2016, 1048–1053.
Chapter 5 introduces a new algebraic equivalence checking technique for formally verifying floating-point circuits. The chapter is based on the following publication:
Equivalence Checking using Gröbner Bases
Amr Sayed-Ahmed, Daniel Große, Mathias Soeken, and Rolf Drechsler
Int’l Conf. on Formal Methods in CAD (FMCAD), 2016, 169–176.
Chapter 6 concludes the dissertation and provides directions for possible future work.

2
BACKGROUND
To keep this work self-contained, this chapter briefly provides the basics of the main state-of-the-art formal verification techniques for arithmetic circuits. It also introduces the theoretical background of the symbolic computation technique, which is the main focus of this thesis.
The chapter looks at the formal techniques leveraged for Boolean reasoning from two aspects. The first aspect is the methodology for modeling the verification problem, i.e., the circuit under verification and its specification. The second aspect is the algorithms that manipulate the obtained models to perform automated reasoning. Accordingly, the first part of the chapter presents various circuit modeling methods. Then the manipulation algorithms of three formal Boolean reasoning techniques are introduced, and each technique is related to the respective modeling method.
The last part of the chapter briefly reviews the theoretical background, capabilities, and drawbacks of formal verification techniques for arithmetic circuits. It also covers the specification of floating-point multiplier circuits as described in the IEEE standard for floating-point arithmetic [50], as well as various types of multiplier architectures.
2.1 Circuit Modeling
Circuits are hardware implementations of Boolean functions; they are mainly classified as combinational or sequential circuits. While combinational circuits consist only of combinational logic, sequential circuits integrate combinational logic with memory elements such as flip-flops. Typically, combinational logic is composed of the standard gates NOT, AND, OR, and XOR. A netlist of combinational logic gates can be thought of as a directed acyclic graph whose wires carry values from the set {0, 1}. These values are processed by gates computing specified Boolean functions. The following defines Boolean functions and presents different efficient methods for modeling and manipulating the combinational logic of a circuit.
14 background
2.1.1 Boolean Function
$\mathbb{B} = \{0, 1\}$ is the set of Boolean values. A Boolean variable takes a value from this set.
Definition 1 (Boolean function). A multi-output Boolean function $F: \mathbb{B}^n \to \mathbb{B}^m$, with $n, m \in \mathbb{N}$, maps $n$ inputs to $m$ outputs. It can be represented as a tuple $F = (f_1, \dots, f_m)$, where $f_i: \mathbb{B}^n \to \mathbb{B}$ is a single-output function for each $i \in \{1, \dots, m\}$. Hence $F(X) = (f_1(X), \dots, f_m(X))$ over a finite set of Boolean variables $X = \{x_1, x_2, \dots, x_n\}$. The functions $f_i(X)$ are called primary outputs, and the variables in $X$ are called primary inputs.
Definition 2 (Integer-valued function). Some multi-output Boolean functions $F: \mathbb{B}^n \to \mathbb{B}^m$, such as arithmetic functions, are encoded as integer-valued functions $F: \mathbb{B}^n \to \mathbb{Z}_{2^m}$, which map $n$ Boolean inputs to an integer in $\mathbb{Z}_{2^m}$.
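As a small illustration of Definitions 1 and 2, the Python sketch below writes the full adder used throughout this chapter as a multi-output function F = (s, co) and checks that reading its outputs as a binary number (least significant bit first) gives the integer-valued function s + 2co = x1 + x2 + x3. The helper names `full_adder` and `encode` are hypothetical and chosen only for this example.

```python
from itertools import product


def full_adder(x1: int, x2: int, x3: int) -> tuple[int, int]:
    """Multi-output Boolean function F: B^3 -> B^2 with F = (s, co)."""
    s = x1 ^ x2 ^ x3
    co = (x1 & x2) | (x1 & x3) | (x2 & x3)
    return s, co


def encode(outputs: tuple[int, ...]) -> int:
    """Read an output bit-vector (LSB first) as an unsigned integer in Z_{2^m}."""
    return sum(bit << i for i, bit in enumerate(outputs))


# The integer-valued view of the full adder is the sum of its inputs.
for x in product((0, 1), repeat=3):
    assert encode(full_adder(*x)) == sum(x)
```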
Boolean functions of circuits can be modeled by different representations, such as truth tables, Decision Diagrams (DDs) [16], And Inverter Graphs (AIGs) [65], Conjunctive Normal Forms (CNFs) [98], or multivariate polynomials [68, 93]. The last of these is based on the Boolean ring, a concept from symbolic computation, whereas the others use Boolean algebra to represent Boolean functions efficiently.
Definition 3 (Boolean Algebra). A Boolean algebra is formally defined as a set of Boolean variables x1, x2, x3, . . . , three logic operations AND ∧, OR ∨, and NOT ¬, and two distinct elements 0 and 1, such that the following properties hold:
Idempotent: x1 ∧ x1 = x1 ∨ x1 = x1
Complementation: x1 ∧ ¬x1 = 0
x1 ∨ ¬x1 = 1
Identities: x1 ∧ 1 = x1 ∨ 0 = x1
x1 ∧ 0 = 0
x1 ∨ 1 = 1
Commutative: x1 ∧ x2 = x2 ∧ x1
x1 ∨ x2 = x2 ∨ x1
Associative: x1 ∧ (x2 ∧ x3) = (x1 ∧ x2) ∧ x3
x1 ∨ (x2 ∨ x3) = (x1 ∨ x2) ∨ x3
Distributive: x1 ∧ (x2 ∨ x3) = (x1 ∧ x2) ∨ (x1 ∧ x3)
x1 ∨ (x2 ∧ x3) = (x1 ∨ x2) ∧ (x1 ∨ x3)
Absorption: x1 ∨ (x1 ∧ x2) = x1 ∧ (x1 ∨ x2) = x1
2.1 circuit modeling 15
Boolean algebra allows any Boolean function to be written as a formula. For example, the following common functions can be expressed as:
XOR: x1 ⊕ x2 ≡ (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2)
Implication: x1 ⇒ x2 ≡ ¬x1 ∨ x2
Equivalence: x1 ↔ x2 ≡ ¬(x1 ⊕ x2)
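The laws of Definition 3 and the derived operators above can be checked exhaustively over {0, 1}. The following Python sketch does so with bitwise operators on 0/1 integers; it also checks the distributive laws, which hold in every Boolean algebra. The function names are illustrative only.

```python
from itertools import product


def and_(a: int, b: int) -> int:
    return a & b


def or_(a: int, b: int) -> int:
    return a | b


def not_(a: int) -> int:
    return 1 - a


# Laws of a Boolean algebra, checked for every assignment of x1, x2, x3.
for x1, x2, x3 in product((0, 1), repeat=3):
    assert and_(x1, x1) == or_(x1, x1) == x1                            # idempotent
    assert and_(x1, not_(x1)) == 0 and or_(x1, not_(x1)) == 1           # complementation
    assert and_(x1, 1) == or_(x1, 0) == x1                              # identities
    assert and_(x1, 0) == 0 and or_(x1, 1) == 1
    assert and_(x1, x2) == and_(x2, x1) and or_(x1, x2) == or_(x2, x1)  # commutative
    assert and_(x1, and_(x2, x3)) == and_(and_(x1, x2), x3)             # associative
    assert or_(x1, or_(x2, x3)) == or_(or_(x1, x2), x3)
    assert or_(x1, and_(x1, x2)) == and_(x1, or_(x1, x2)) == x1         # absorption
    assert and_(x1, or_(x2, x3)) == or_(and_(x1, x2), and_(x1, x3))     # distributive
    assert or_(x1, and_(x2, x3)) == and_(or_(x1, x2), or_(x1, x3))


# Derived operators written as Boolean-algebra formulas.
def xor(a: int, b: int) -> int:
    return or_(and_(a, not_(b)), and_(not_(a), b))


def implies(a: int, b: int) -> int:
    return or_(not_(a), b)


def equiv(a: int, b: int) -> int:
    return not_(xor(a, b))
```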
Definition 4 (Boolean Ring). A Boolean ring is a set on which multiplication (·), addition (+), and subtraction (−) are defined over its Boolean elements and satisfy certain basic rules. The Boolean ring is given by the integers modulo 2, denoted $\mathbb{Z}_2$, whose elements take only the two Boolean values 0 and 1.
For Boolean variables x1, x2, x3, . . . in the set of the Boolean ring, the following rules are satisfied:
Additive inverse: x1 + (−x1) = 0
Identities: x1 · 1 = x1 + 0 = x1
Commutative: x1 + x2 = x2 + x1
x1 · x2 = x2 · x1
Associative: x1 + (x2 + x3) = (x1 + x2) + x3
x1 · (x2 · x3) = (x1 · x2) · x3
Distributive: x1 · (x2 + x3) = x1 · x2 + x1 · x3
Over the Boolean ring, Boolean functions can be represented as multivariate polynomials whose roots are the truth assignments of the functions. This modeling method is explained in Subsection 2.1.5.
2.1.2 And Inverter Graph
Definition 5 (And Inverter Graph (AIG)). An AIG is a Boolean directed acyclic graph composed only of two-input AND gates and inverters (NOT gates). The AIG of a circuit is derived by factoring its gates into AND and OR gates and then converting the OR gates into ANDs and inverters using De Morgan's rule ¬(x1 ∨ x2) ≡ (¬x1 ∧ ¬x2), i.e., x1 ∨ x2 ≡ ¬(¬x1 ∧ ¬x2).
Example 1. Consider the full adder circuit shown in Figure 5, where x1, x2, and x3 are primary inputs, s and co are primary outputs, and v1, v2, v3, and v4 are internal variables.
Figure 5: Full Adder Circuit (gates g1 to g6; inputs x1, x2, x3; outputs s, co; internal signals v1 to v4)
The AIG model of the circuit is obtained by factoring the XOR gates and converting the ORs as follows:
Gate g1: co = v4 ∨ v3   becomes   co = ¬(¬v4 ∧ ¬v3)
Gate g2: s = v1 ⊕ x3   becomes   s = ¬(v1 ∧ x3) ∧ ¬(¬v1 ∧ ¬x3)
Gate g3: v4 = v2 ∧ x3   becomes   v4 = v2 ∧ x3
Gate g4: v3 = x1 ∧ x2   becomes   v3 = x1 ∧ x2
Gate g5: v2 = x1 ∨ x2   becomes   v2 = ¬(¬x1 ∧ ¬x2)
Gate g6: v1 = x1 ⊕ x2   becomes   v1 = ¬(x1 ∧ x2) ∧ ¬(¬x1 ∧ ¬x2)
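To make Example 1 concrete, the Python sketch below evaluates the gate-level netlist of Figure 5 in topological order as well as the AND/inverter form derived above, and checks exhaustively that both agree and satisfy s + 2co = x1 + x2 + x3. The function names are hypothetical; the helper `inv` plays the role of a complemented edge.

```python
from itertools import product


def inv(a: int) -> int:
    """Inverter, i.e., a complemented edge in the AIG."""
    return 1 - a


def netlist(x1: int, x2: int, x3: int) -> tuple[int, int]:
    """Full adder of Figure 5, gates evaluated in topological order."""
    v1 = x1 ^ x2      # g6: XOR
    v2 = x1 | x2      # g5: OR
    v3 = x1 & x2      # g4: AND
    v4 = v2 & x3      # g3: AND
    s = v1 ^ x3       # g2: XOR
    co = v4 | v3      # g1: OR
    return s, co


def aig_form(x1: int, x2: int, x3: int) -> tuple[int, int]:
    """Same gates rewritten with two-input ANDs and inverters only."""
    v1 = inv(x1 & x2) & inv(inv(x1) & inv(x2))
    v2 = inv(inv(x1) & inv(x2))
    v3 = x1 & x2
    v4 = v2 & x3
    s = inv(v1 & x3) & inv(inv(v1) & inv(x3))
    co = inv(inv(v4) & inv(v3))
    return s, co


for x in product((0, 1), repeat=3):
    assert netlist(*x) == aig_form(*x)
    s, co = aig_form(*x)
    assert s + 2 * co == sum(x)          # full adder: s + 2co = x1 + x2 + x3
```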
AIGs consist of specific types of nodes: two-input AND nodes, primary input (PI) nodes, and primary output (PO) nodes. Primary input nodes have no incoming edges. Inverters are not counted as nodes of the AIG; they are represented as complemented edges. Figure 6 shows an AIG for the full adder of Example 1: the AND nodes are drawn as circles and the complemented edges as dashed arrows; x1, x2, and x3 are input nodes, while s and co are output nodes.
Different manipulation methods, such as structural hashing and rewriting [71], are applied to the AIG model to minimize the number of ANDs and inverters. Structural hashing ensures during the construction of the AIG that no two AND nodes have identical pairs of incoming edges, as can be seen in Figure 6. Rewriting iteratively selects AIG subgraphs rooted at a node and replaces them with smaller precomputed subgraphs that preserve the functionality of the root node. The subgraphs are collapsed by refactoring their Boolean expressions [12] or by balancing them using algebraic tree-height reduction [25].
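Structural hashing can be sketched as a hash table keyed by the ordered pair of input edges of each AND node. The minimal Python class below is only an illustration, not the data structure of any specific tool; it assumes an AIGER-like literal encoding in which node k is referenced by the literal 2k and its complemented edge by 2k + 1, with literal 0 denoting constant false. Building the full adder of Example 1 through it shares the node x1 ∧ x2 between v1 and v3, and the node ¬x1 ∧ ¬x2 between v1 and v2, so only eight AND nodes are created.

```python
class AIG:
    """Minimal AIG with structural hashing (illustrative sketch).

    Assumed literal encoding (AIGER-like): node k has literal 2*k and its
    complemented edge is 2*k + 1; literal 0 is constant false, 1 is true.
    """

    def __init__(self):
        self.num_nodes = 1      # node 0 is the constant
        self.inputs = set()     # literals of primary inputs
        self.ands = {}          # node -> (left literal, right literal)
        self.strash = {}        # (left, right) -> literal of existing AND node

    def new_input(self):
        lit = 2 * self.num_nodes
        self.num_nodes += 1
        self.inputs.add(lit)
        return lit

    @staticmethod
    def neg(lit):
        return lit ^ 1          # toggle the complement bit

    def and_(self, a, b):
        if a > b:               # canonical operand order, so a & b == b & a
            a, b = b, a
        if a == 0 or a == self.neg(b):
            return 0            # x & 0 = 0 and x & ~x = 0
        if a == 1 or a == b:
            return b            # 1 & x = x and x & x = x
        if (a, b) not in self.strash:   # structural hashing
            lit = 2 * self.num_nodes
            self.num_nodes += 1
            self.ands[lit // 2] = (a, b)
            self.strash[(a, b)] = lit
        return self.strash[(a, b)]

    def or_(self, a, b):        # De Morgan: a | b = ~(~a & ~b)
        return self.neg(self.and_(self.neg(a), self.neg(b)))

    def xor(self, a, b):        # factored XOR: ~(a & b) & ~(~a & ~b)
        return self.and_(self.neg(self.and_(a, b)),
                         self.neg(self.and_(self.neg(a), self.neg(b))))

    def value(self, lit, env):
        """Evaluate a literal; env maps input literals to 0/1."""
        node = lit // 2
        if node == 0:
            v = 0
        elif 2 * node in self.inputs:
            v = env[2 * node]
        else:
            left, right = self.ands[node]
            v = self.value(left, env) & self.value(right, env)
        return v ^ (lit & 1)


# Full adder of Example 1 built through the hashed constructor.
aig = AIG()
x1, x2, x3 = aig.new_input(), aig.new_input(), aig.new_input()
v1 = aig.xor(x1, x2)
v2 = aig.or_(x1, x2)
v3 = aig.and_(x1, x2)           # hashed: reuses the node created inside xor
v4 = aig.and_(v2, x3)
s = aig.xor(v1, x3)
co = aig.or_(v4, v3)
```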
Figure 6: AIG for Full Adder
2.1.3 Conjunctive Normal Form
Definition 6 (Conjunctive Normal Form). A formula (single-output Boolean function) is in Conjunctive Normal Form (CNF) if it is a conjunction of disjunctions of literals, i.e., it has the form $\bigwedge_i \big(\bigvee_j l_{ij}\big)$, where $l_{ij}$ is the $j$-th literal in the $i$-th clause. A literal is either a Boolean variable or its negation, and a clause is a disjunction of literals.
The transformation of circuit gates into CNF is done via Tseitin's encoding [98]. For each logic gate, several clauses are added that constrain the value of the gate's output variable according to the function of the gate. For the basic gates, the clauses modeling their functions are:
xo = ¬x1 ⟹ (xo ∨ x1) ∧ (¬xo ∨ ¬x1)
xo = x1 ∧ x2 ⟹ (¬xo ∨ x1) ∧ (¬xo ∨ x2) ∧ (xo ∨ ¬x1 ∨ ¬x2)
xo = x1 ∨ x2 ⟹ (xo ∨ ¬x1) ∧ (xo ∨ ¬x2) ∧ (¬xo ∨ x1 ∨ x2)
xo = x1 ⊕ x2 ⟹ (¬xo ∨ x1 ∨ x2) ∧ (xo ∨ ¬x1 ∨ x2) ∧ (xo ∨ x1 ∨ ¬x2) ∧ (¬xo ∨ ¬x1 ∨ ¬x2).
The CNF of a complete circuit is the conjunction of the CNFs of all its gates. Since each gate contributes a constant number of clauses, the transformation into CNF increases the size of the representation only linearly. Moreover, dedicated decision procedures, namely SAT solvers, are designed to work efficiently on CNF models (see Subsection 2.2.1).
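The gate clauses above can be written down mechanically. The following Python sketch uses DIMACS-style integer literals (a positive integer v stands for a variable and −v for its negation; this encoding is an assumption of the example) and checks by enumeration that each clause set is satisfied exactly when the output equals the gate function. It then conjoins the gate CNFs of the full adder of Figure 5 and confirms that the resulting CNF has exactly one model per input assignment.

```python
from itertools import product


# Tseitin clauses for the basic gates (o = output, a/b = inputs).
def cnf_not(o, a):
    return [[o, a], [-o, -a]]


def cnf_and(o, a, b):
    return [[-o, a], [-o, b], [o, -a, -b]]


def cnf_or(o, a, b):
    return [[o, -a], [o, -b], [-o, a, b]]


def cnf_xor(o, a, b):
    return [[-o, a, b], [o, -a, b], [o, a, -b], [-o, -a, -b]]


def satisfies(cnf, value):
    """value maps variable -> 0/1; every clause needs one true literal."""
    return all(any(value[abs(l)] == (l > 0) for l in clause) for clause in cnf)


# Each gate CNF holds exactly when o equals the gate function.
for o, a, b in product((0, 1), repeat=3):
    v = {1: o, 2: a, 3: b}
    assert satisfies(cnf_not(1, 2), v) == (o == 1 - a)
    assert satisfies(cnf_and(1, 2, 3), v) == (o == a & b)
    assert satisfies(cnf_or(1, 2, 3), v) == (o == a | b)
    assert satisfies(cnf_xor(1, 2, 3), v) == (o == a ^ b)

# Full adder of Figure 5: x1..x3 = 1..3, v1..v4 = 4..7, s = 8, co = 9.
FULL_ADDER_CNF = (cnf_xor(4, 1, 2) + cnf_or(5, 1, 2) + cnf_and(6, 1, 2)
                  + cnf_and(7, 5, 3) + cnf_xor(8, 4, 3) + cnf_or(9, 7, 6))
models = [bits for bits in product((0, 1), repeat=9)
          if satisfies(FULL_ADDER_CNF, dict(enumerate(bits, start=1)))]
```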
2.1.4 Decision Diagrams
Figure 7: BDD for Full Adder (left: output co; right: output s)
Decision Diagrams (DDs) are directed acyclic graphs built by ordering the primary inputs of a function and recursively decomposing the function according to this order. DDs such as Binary Decision Diagrams (BDDs) [15, 16] and Multiplicative Binary Moment Diagrams (*BMDs) [17] are popular data structures, since they offer canonical representations of Boolean functions together with efficient manipulation methods, which makes checking functional properties such as satisfiability and equivalence straightforward.
Figure 8: *BMD for 3-bit Multiplier
Figure 9: Decomposition Rule of *BMD (a vertex labeled $x_i$ with the child functions $f|_{x_i=0}$ and $f|_{x_i=1} - f|_{x_i=0}$, so that $f = f|_{x_i=0} + x_i(f|_{x_i=1} - f|_{x_i=0})$)
BDDs represent Boolean functions $f: \mathbb{B}^n \to \mathbb{B}^m$ based on Shannon's decomposition rule $f = \neg x_i f|_{x_i=0} \vee x_i f|_{x_i=1}$. The term $f|_{x_i=1}$ is the positive cofactor of $f$ with respect to the variable $x_i$, i.e., the function that results when the value one is assigned to $x_i$. Similarly, $f|_{x_i=0}$ denotes the negative cofactor of $f$, obtained for $x_i = 0$. For *BMDs, the positive Davio rule $f = f|_{x_i=0} + x_i(f|_{x_i=1} - f|_{x_i=0})$ is the basis for representing a Boolean function at the word level as an integer-valued function $f: \mathbb{B}^n \to \mathbb{Z}_{2^m}$. The positive Davio rule decomposes $f$ into the negative cofactor $f|_{x_i=0}$ and the function $(f|_{x_i=1} - f|_{x_i=0})$, which is called the linear moment of $f$.
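Both decomposition rules can be checked numerically. The Python sketch below computes cofactors by fixing one argument of a function (the helper `cofactor` is hypothetical) and verifies Shannon's rule on the carry output co of a full adder and the positive Davio rule on the integer-valued function x1 + x2 + x3, whose linear moment with respect to every variable is the constant 1.

```python
from itertools import product


def cofactor(f, i, value):
    """Return the cofactor f|_{x_i=value} as a new function of all arguments."""
    def restricted(*xs):
        xs = list(xs)
        xs[i] = value
        return f(*xs)
    return restricted


def co(x1, x2, x3):            # Boolean carry output of a full adder
    return (x1 & x2) | (x1 & x3) | (x2 & x3)


def add3(x1, x2, x3):          # integer-valued view s + 2co
    return x1 + x2 + x3


for xs in product((0, 1), repeat=3):
    for i, xi in enumerate(xs):
        neg, pos = cofactor(co, i, 0)(*xs), cofactor(co, i, 1)(*xs)
        # Shannon: f = (not x_i and f|x_i=0) or (x_i and f|x_i=1)
        assert co(*xs) == ((1 - xi) & neg) | (xi & pos)
        f0, f1 = cofactor(add3, i, 0)(*xs), cofactor(add3, i, 1)(*xs)
        moment = f1 - f0                # linear moment w.r.t. x_i
        # Positive Davio: f = f|x_i=0 + x_i * (f|x_i=1 - f|x_i=0)
        assert add3(*xs) == f0 + xi * moment
        assert moment == 1
```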
As shown in Figure 7 and Figure 8, DD graphs consist of non-terminal vertices (drawn as circles) and terminal vertices (drawn as squares) labeled by the possible values of the represented function. Each non-terminal vertex is labeled by a primary input variable of the function and has exactly two children. The directed edges to these children are called the low edge and the high edge and are drawn dashed and solid, respectively. A non-terminal vertex labeled $x_i$ represents a function $f$ that is decomposed with respect to $x_i$, using a decomposition rule, into two child functions represented by its child vertices. For BDDs, the child functions are the negative and positive cofactors of $f$, while for *BMDs, as shown in Figure 9, $f$ is decomposed into its negative cofactor and its linear moment.
Example 2. Consider the BDD representations of a full adder shown in Figure 7, where x1, x2, and x3 are primary inputs, while s = x1 ⊕ x2 ⊕ x3 and co = (x1 ∧ x2) ∨ (x1 ∧ x3) ∨ (x2 ∧ x3) are primary outputs.
The variables are ordered as x1 > x2 > x3. Based on this order, the function of the output co is first decomposed with respect to x1 by Shannon's rule into $co = \neg x_1\, co|_{x_1=0} \vee x_1\, co|_{x_1=1}$, where $co|_{x_1=0} = x_2 \wedge x_3$ and $co|_{x_1=1} = x_2 \vee x_3$ are the resulting cofactors. In Figure 7, this decomposition is represented by a vertex labeled x1 whose two children represent the child functions. Both children are labeled by the variable x2, which is used next: the negative cofactor $co|_{x_1=0} = \neg x_2\, co|_{x_1=0,x_2=0} \vee x_2\, co|_{x_1=0,x_2=1}$ is decomposed into $co|_{x_1=0,x_2=0} = 0$ and $co|_{x_1=0,x_2=1} = x_3$, and the positive cofactor $co|_{x_1=1}$ is decomposed into $co|_{x_1=1,x_2=0} = x_3$ and $co|_{x_1=1,x_2=1} = 1$. Finally, x3 decomposes the child functions resulting from the two previous steps into 0 and 1. The BDD for the output function s, shown on the right side of Figure 7, is built in the same way.
*BMDs support multiplicative edge weights: the weight on an edge multiplies the child function it points to. With this data structure, compact diagrams can be built for multipliers of large bit width, whereas BDD-based representations of multipliers have exponential size.
Example 3. As shown in Figure 8, a *BMD can represent a 3-bit multiplier $Z = \sum_{i=0}^{2} 2^i x_i \cdot \sum_{i=0}^{2} 2^i y_i$ by a compact diagram. The chosen order is x2 > x1 > x0 > y2 > y1 > y0.
The function $Z = 4x_2 \cdot \sum_{i=0}^{2} 2^i y_i + 2x_1 \cdot \sum_{i=0}^{2} 2^i y_i + x_0 \cdot \sum_{i=0}^{2} 2^i y_i$ is decomposed with respect to x2 by the positive Davio rule as $Z = Z|_{x_2=0} + x_2(Z|_{x_2=1} - Z|_{x_2=0})$. The negative cofactor is $Z|_{x_2=0} = 2x_1 \cdot \sum_{i=0}^{2} 2^i y_i + x_0 \cdot \sum_{i=0}^{2} 2^i y_i$, while the high edge is labeled with the weight 4, the coefficient of the linear moment $Z|_{x_2=1} - Z|_{x_2=0} = 4 \sum_{i=0}^{2} 2^i y_i$. Decomposing these child functions recursively in the same way builds the diagram shown in Figure 8.
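The decomposition in Example 3 can be checked by brute force. The Python sketch below (the function `Z` is only an illustrative stand-in for the multiplier) confirms for all 64 input combinations that the linear moment with respect to x2 equals 4 times the word Y = 4y2 + 2y1 + y0, which is why the high edge of the root carries the weight 4, and that the positive Davio rule reconstructs Z.

```python
from itertools import product


def Z(x2, x1, x0, y2, y1, y0):
    """3-bit unsigned multiplier as an integer-valued function (Example 3)."""
    return (4 * x2 + 2 * x1 + x0) * (4 * y2 + 2 * y1 + y0)


for bits in product((0, 1), repeat=6):
    x2, rest = bits[0], bits[1:]                   # rest = (x1, x0, y2, y1, y0)
    y_word = 4 * rest[2] + 2 * rest[3] + rest[4]
    neg_cofactor = Z(0, *rest)                     # Z|_{x2=0}
    moment = Z(1, *rest) - Z(0, *rest)             # linear moment w.r.t. x2
    assert moment == 4 * y_word                    # edge weight 4 times sum 2^i y_i
    assert Z(*bits) == neg_cofactor + x2 * moment  # positive Davio rule
```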
Because some functions can be represented efficiently only by *BMDs while others are easily represented by BDDs [30], hybrid DDs such as HDDs [24] and K*BMDs [30] have been proposed to model more diverse types of functions within one diagram. They support mixing different decomposition types, i.e., the decompositions are not all performed by a single rule as in BDDs and *BMDs. Instead, each variable decomposes the function using one of three decomposition rules: Shannon, positive Davio, or negative Davio $f = f|_{x_i=1} + (1 - x_i)(f|_{x_i=0} - f|_{x_i=1})$. Choosing a proper variable ordering and assigning the appropriate decomposition type to every variable are key to modeling a given function efficiently with hybrid DDs. However, finding such good choices, as well as restricting the edge weights so that the graph is canonical, are difficulties that limit the applications of hybrid DDs.
2.1.5 Multivariate Polynomials
Symbolic computation provides the theory of Gröbner bases, which can model a circuit as a set of Boolean polynomials. This subsection gives an overview of how the theory is used in practice for circuit modeling; the theoretical part is deferred to Subsection 2.2.3.
Definition 7. A Boolean polynomial $p = c_1 M_1 + \dots + c_t M_t$ is a finite sum of terms, where each term is the product of a coefficient $c_i$ and a power product $M_i$ of variables from a set of $n$ Boolean variables $\{x_1, \dots, x_n\}$, called a monomial. The coefficients are integers, $c_i \in \mathbb{Z}$ for all $i \neq 1$; only the leading coefficient $lc(p) = c_1$ is restricted to $\{-1, 1\}$.
The monomials of a polynomial are ordered according to a monomial ordering ≺ such that $M_1 \succ \dots \succ M_t$. The leading term of the polynomial is $lt(p) = c_1 M_1$, the leading monomial is $lm(p) = M_1$, and the leading coefficient is $lc(p) = c_1$. The thesis denotes $tail(p) = p - lt(p) = c_2 M_2 + \dots + c_t M_t$.
A set of Boolean polynomials $P = \{p_1, \dots, p_s\}$ belongs to the Boolean polynomial ring $\mathbb{Z}_2(x_1, \dots, x_n)$, where $\mathbb{Z}_2$ is the Boolean ring (see Definition 4). Within this polynomial ring, the polynomials $\langle x_i^2 - x_i \rangle$ are added to keep the variables $x_i$ in the Boolean domain: the solutions of the polynomial equation $x_i^2 - x_i = 0$ are $x_i \in \{0, 1\}$, which restricts $x_i$ to Boolean values. In practice, these polynomials take effect by reducing $x_i^{\alpha_i}$ to $x_i$ whenever the degree of $x_i$ exceeds one during any computational step. For example, the monomial $x_1^2 x_2^3 x_3$ equals $x_1 x_2 x_3$ over the Boolean polynomial ring.
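One simple way to realize the effect of the polynomials $x_i^2 - x_i$ in software is to store each monomial as a set of variables, so that multiplying two monomials takes the union of their variable sets. The Python sketch below uses a representation chosen only for this example (not the data structure of a particular tool); it reproduces the reduction of $x_1^2 x_2^3 x_3$ to $x_1 x_2 x_3$ and shows that $(x_1 + x_2)^2 = x_1 + x_2 + 2x_1x_2$ over the Boolean polynomial ring.

```python
from collections import defaultdict


# A Boolean polynomial as {monomial: integer coefficient}; a monomial is a
# frozenset of variable names, so x_i^k collapses to x_i automatically.
def mono(*names):
    return frozenset(names)


def mul(p, q):
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[m1 | m2] += c1 * c2        # union of variable sets: x * x = x
    return {m: c for m, c in out.items() if c != 0}


def add(p, q):
    out = defaultdict(int, p)
    for m, c in q.items():
        out[m] += c
    return {m: c for m, c in out.items() if c != 0}


x1, x2, x3 = ({mono(v): 1} for v in ("x1", "x2", "x3"))
p = x1
for factor in (x1, x2, x2, x2, x3):
    p = mul(p, factor)                     # builds x1^2 * x2^3 * x3
```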
For modeling circuits, the monomial order follows the reverse topological order of the variables of the modeled circuit. Logic gates of a circuit are modeled as polynomials and signals as Boolean variables. The modeling follows the Boolean function of each gate; for the basic Boolean functions, the polynomial representations are as follows:
NOT: xo = ¬x1 ⟹ −xo − x1 + 1
AND: xo = x1 ∧ x2 ⟹ −xo + x1x2
OR: xo = x1 ∨ x2 ⟹ −xo − x1x2 + x1 + x2
XOR: xo = x1 ⊕ x2 ⟹ −xo − 2x1x2 + x1 + x2
MUX: xo = (x1 ∧ x2) ∨ (¬x1 ∧ x3) ⟹ −xo + x1x2 − x1x3 + x3.
Each Boolean function is modeled such that the output variable xo is described in terms of the input variables {x1, x2, x3}. The solutions of the polynomial equations of these Boolean functions correspond to the truth assignments of the functions.
Example 4. For the NOT function, the solutions (roots) of its polynomial are the pairs (xo = 0, x1 = 1) and (xo = 1, x1 = 0), which are exactly the truth assignments of the function.
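The correspondence between polynomial roots and truth assignments can be confirmed exhaustively. The Python sketch below evaluates each gate polynomial listed above at all 0/1 points (xo, x1, x2, x3) and checks that its roots are exactly the rows of the gate's truth table; the table name `GATES` is an assumption of this example.

```python
from itertools import product

# name: (polynomial p(xo, x1, x2, x3), Boolean function xo = f(x1, x2, x3))
GATES = {
    "NOT": (lambda xo, a, b, c: -xo - a + 1,             lambda a, b, c: 1 - a),
    "AND": (lambda xo, a, b, c: -xo + a * b,             lambda a, b, c: a & b),
    "OR":  (lambda xo, a, b, c: -xo - a * b + a + b,     lambda a, b, c: a | b),
    "XOR": (lambda xo, a, b, c: -xo - 2 * a * b + a + b, lambda a, b, c: a ^ b),
    "MUX": (lambda xo, a, b, c: -xo + a * b - a * c + c, lambda a, b, c: b if a else c),
}


def roots(poly):
    """All 0/1 points (xo, x1, x2, x3) at which the polynomial evaluates to zero."""
    return {pt for pt in product((0, 1), repeat=4) if poly(*pt) == 0}


for name, (poly, func) in GATES.items():
    truth = {(func(a, b, c), a, b, c) for a, b, c in product((0, 1), repeat=3)}
    assert roots(poly) == truth, name
```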
By ordering each variable of the model according to its reverse topological level in the circuit, every polynomial takes the form $p_i := -x_i + tail(p_i)$, where $x_i$ is the gate's output variable and $tail(p_i)$ consists of terms over the gate's input variables that describe the function implemented by the gate. In this form, the leading monomial of each polynomial is its own output variable, which is distinct for every gate; hence all leading monomials of the model are relatively prime, which is the main condition for representing a set of polynomials as a Gröbner basis (see Subsection 2.2.3).
Example 5. Consider the full adder circuit shown in Figure 5, which implements the function s + 2co = x1 + x2 + x3. Its algebraic model is:
g1 := −co − v4v3 + v4 + v3
g2 := −s − 2v1x3 + v1 + x3
g3 := −v4 + v2x3
g4 := −v3 + x1x2
g5 := −v2 − x1x2 + x1 + x2
g6 := −v1 − 2x1x2 + x1 + x2.
Ordering the polynomial variables in the reverse topological order of the circuit yields co > s > v4 > v3 > v2 > v1 > x3 > x2 > x1. Under this order, the leading monomials of all polynomials are relatively prime; e.g., the leading monomial of g1 is co, which is relatively prime to all other leading monomials. The extracted algebraic model is therefore a Gröbner basis.
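The relative-primality argument of Example 5 can be reproduced mechanically. The Python sketch below stores the model as dictionaries from monomials (sets of variable names) to integer coefficients, assumes a lexicographic monomial order induced by co > s > v4 > v3 > v2 > v1 > x3 > x2 > x1, and checks that the leading monomials of g1 to g6 are the single output variables, each with coefficient −1, and that no two of them share a variable. The names `ORDER`, `MODEL`, and `lex_key` are assumptions of this example.

```python
ORDER = ["co", "s", "v4", "v3", "v2", "v1", "x3", "x2", "x1"]   # largest first
RANK = {v: len(ORDER) - i for i, v in enumerate(ORDER)}


def M(*names):
    return frozenset(names)


# Algebraic model of Example 5 as {monomial: integer coefficient}.
MODEL = {
    "g1": {M("co"): -1, M("v4", "v3"): -1, M("v4"): 1, M("v3"): 1},
    "g2": {M("s"): -1, M("v1", "x3"): -2, M("v1"): 1, M("x3"): 1},
    "g3": {M("v4"): -1, M("v2", "x3"): 1},
    "g4": {M("v3"): -1, M("x1", "x2"): 1},
    "g5": {M("v2"): -1, M("x1", "x2"): -1, M("x1"): 1, M("x2"): 1},
    "g6": {M("v1"): -1, M("x1", "x2"): -2, M("x1"): 1, M("x2"): 1},
}


def lex_key(monomial):
    """Lex comparison key for a multilinear monomial: its variable ranks, descending."""
    return tuple(sorted((RANK[v] for v in monomial), reverse=True))


def leading_monomial(poly):
    return max(poly, key=lex_key)


LMS = {name: leading_monomial(p) for name, p in MODEL.items()}
names = sorted(LMS)
# Relatively prime multilinear monomials share no variable.
RELATIVELY_PRIME = all(not (LMS[a] & LMS[b])
                       for i, a in enumerate(names) for b in names[i + 1:])
```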
Figure 10: DPLL Algorithm (Decision, BCP, Conflict Analysis, and Backtracking; a full assignment without conflict yields SAT, a conflict at decision level dl = 0 yields UNSAT, and a conflict at dl > 0 leads to backtracking)
More mathematical details about the theory of Gröbner bases and symbolic computation are given in Subsection 2.2.3.
2.2 boolean reasoning
For the formal verification of circuits, there are mainly three proof techniques used for efficient Boolean reasoning on problems derived from circuits. Traditionally, Satisfiability (SAT) and Decision Diagrams (DDs), in particular BDDs, are used intensively to solve different formal verification problems. However, they cannot efficiently handle problems that require solving nonlinear arithmetic constraints, which can be solved by the symbolic computation technique, as demonstrated in this thesis. This section first gives the main concepts of SAT and BDDs and then describes the concepts and notations of the symbolic computation technique used in the thesis.
2.2.1 Boolean Satisfiability
The Boolean satisfiability problem is to find an assignment that satisfies a set of constraints. It has practical and theoretical importance in many applications, which has led to a vast amount of research on developing powerful SAT solvers.
Definition 8 (Assignment). Given a Boolean function $f(x_1, \dots, x_n)$, an assignment $\alpha = (a_1, \dots, a_n)$ to $f$ maps each primary input $x_i$ to a Boolean value $a_i \in \{0, 1\}$. The assignment is full if all primary inputs are assigned, and partial otherwise.
Definition 9 (Boolean Satisfiability Problem (SAT)). Given a single-output Boolean function (formula) $f(x_1, \dots, x_n): \mathbb{B}^n \to \mathbb{B}$, the Boolean satisfiability problem asks whether there exists an assignment $\alpha = (a_1, \dots, a_n)$ to the primary inputs of $f$ under which the formula evaluates to true ($f(\alpha) = 1$). If no such assignment exists, $f$ is unsatisfiable.
Given a Boolean formula f, a SAT solver decides whether f is satisfiable and reports a satisfying assignment, or it proves that f is unsatisfiable. SAT solvers typically operate on formulas in CNF (see Subsection 2.1.3), since every formula can be converted in linear time into an equisatisfiable CNF (e.g., via Tseitin's encoding), and the Davis-Putnam-Logemann-Loveland (DPLL) algorithm [28] works efficiently on this form. To find a satisfying assignment, most SAT solvers build on the DPLL algorithm, which searches by deciding the value of a Boolean variable, propagating the implications of this decision, and backtracking in the case of a conflict. This search can be thought of as traversing, with backtracking, the binary tree of the search space, in which internal nodes represent partial assignments and leaves represent full assignments. Each decision is associated with a decision level, which is its depth in the binary decision tree, i.e., the number of decisions made so far.
The DPLL algorithm performs its steps based on the status of the CNF clauses under the current assignment. A clause is satisfied if at least one of its literals is satisfied, conflicting if all of its literals are assigned but none is satisfied, unit if it is not satisfied and all but one of its literals are assigned, and unresolved otherwise. The DPLL algorithm as presented in [105] consists of four main steps, as shown in Figure 10:
1. Decision: an unassigned variable is chosen and assigned a value. If there are no more variables to assign, the solver declares the problem satisfiable and reports a satisfying assignment. There are numerous heuristics for making these decisions, and each decision opens a new decision level.
2. Boolean constraint propagation (BCP): based on the decision taken in the previous step, the unit clause rule is applied repeatedly until either a conflict is encountered or there are no more implications. The rule applies to a unit clause, i.e., a clause with a single unassigned literal, and assigns to that literal the value that makes the clause true.
3. Conflict analysis: this step returns a decision level that the solver uses for backtracking. If a conflict is detected at decision level 0, the solver has proved that the formula is unsatisfiable; otherwise, it backtracks to the decision level given by the conflict analysis. In addition to computing the backtracking level and detecting unsatisfiability, this step adds new constraints to the search in the form of learned clauses that prevent the same conflict from occurring again; this approach is called conflict-based learning (commonly known as conflict-driven clause learning).
4. Backtracking: based on the decision level (dl) returned by the conflict analysis, all variable assignments at decision levels larger than dl are erased.
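The four steps can be illustrated with a deliberately simplified solver. The Python sketch below implements decision, Boolean constraint propagation, and chronological backtracking on DIMACS-style integer clauses; it omits conflict analysis and clause learning, so on a conflict it simply tries the other value of the most recent decision. It is an illustration only, not the algorithm of [105] or of any production SAT solver.

```python
def unit_propagate(clauses, assignment):
    """BCP: apply the unit clause rule until no more implications.

    Mutates `assignment` (var -> bool); returns False on a conflicting clause.
    """
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    unassigned.append(lit)
                elif val == (lit > 0):
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return False                     # conflicting clause
            if len(unassigned) == 1:             # unit clause: imply the literal
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return True


def dpll(clauses, assignment=None):
    """Return a satisfying assignment {var: bool}, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    if not unit_propagate(clauses, assignment):
        return None                              # conflict: caller backtracks
    variables = {abs(lit) for clause in clauses for lit in clause}
    free = sorted(v for v in variables if v not in assignment)
    if not free:
        return assignment                        # full assignment, no conflict: SAT
    var = free[0]                                # naive decision heuristic
    for value in (True, False):                  # decide; on failure try the other value
        result = dpll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None                                  # both values failed: backtrack further
```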
For more details about SAT/SMT solvers, the interested reader is referred
to the book [64].
2.2.2 Binary Decision Diagrams
The canonicity of BDDs allows decision problems such as satisfiability and equivalence to be solved in constant time once the reduced BDDs are given, since two functions are identical iff their reduced BDDs are identical; in particular, a function is unsatisfiable iff its reduced BDD is the 0-terminal. A BDD is a canonical representation under two conditions:
1. The BDD is reduced, i.e., the reduction rules have been applied until neither of them is applicable.
2. The variables appear in the same order x1 < x2 < · · · < xn on each path from the root node to a terminal node (the BDD is ordered).
In its simplest form, a BDD is a binary decision tree in which each variable appears exactly once on every path from the root to a leaf. Such a representation has the same size as the truth table of the function, since every root-to-leaf path corresponds to an assignment in the truth table. The BDD can be reduced from such a tree to a unique representation by the following reduction rules, which are applied repeatedly as long as possible:
1. Reduction #1 merges isomorphic subtrees, i.e., subtrees whose roots are labeled with the same variable and have the same low and high children.
2. Reduction #2 removes redundant nodes, i.e., nodes whose value does not affect the paths that go through them. In this case, both edges of the redundant node point to the same child node, and the node is removed by redirecting its incoming edges to this child.
Both the size of a BDD and its canonical form depend strongly on the variable ordering, since different orders yield different BDDs for the same function. Therefore, to obtain identical (canonical) BDDs for different implementations of a given function, their reduced BDDs must be built under a fixed variable ordering. Moreover, for some functions, such as adders, the reduced BDD has a polynomial number of nodes under one variable ordering but an exponential number under another. Efficient heuristic algorithms based on both static and dynamic ordering have been proposed for finding a good variable ordering for a given function [81].
However, building a canonical decision diagram directly for a given function may take exponential space and time, even if the final diagram has a bounded number of nodes. To avoid such a blow-up, the BDD is not created directly for the whole function; instead, it is composed recursively from the BDDs of its subexpressions. The algorithm performing this recursive composition is known as the Synthesis (also called Apply) algorithm. Efficient implementations typically use a recursive synthesis algorithm [8] based on the if-then-else operator (ITE). ITE is the Boolean function $ite(f, g, h) = (f \wedge g) \vee (\neg f \wedge h)$, which can express the main Boolean operators between BDDs as follows:
¬f = ite(f, 0, 1)    f ∧ g = ite(f, g, 0)
f ∨ g = ite(f, 1, g)    f ⊕ g = ite(f, ¬g, g).
For the BDDs of three functions f, g, and h, let x be the topmost variable among their root nodes. The BDD of ite(f, g, h) is then constructed by recursively applying
$ite(f, g, h) = ite\big(x,\ ite(f|_{x=1}, g|_{x=1}, h|_{x=1}),\ ite(f|_{x=0}, g|_{x=0}, h|_{x=0})\big)$.
Starting with the topmost variable x, this identity is applied recursively to all variables in the order in which they appear in the respective BDDs (f, g, and h must have compatible orderings for the algorithm to work). Since this recursion creates many isomorphic subgraphs, the sizes of the resulting BDDs are kept small by applying reduction rule #1 as nodes are created.
Figure 11: Ideal Membership Testing (IMT) (the equivalence relationship $p_r$ is recursively divided by the Gröbner bases model $G = \{g_1, \dots, g_s\}$; a remainder $r = 0$ indicates equivalence, whereas $r \neq 0$ indicates an inconsistency)
A tautology procedure based on BDDs solves decision problems by representing their functions as reduced BDDs under a fixed variable ordering. These problems can be classified mainly into two types:
1. Equivalence testing between the reduced BDDs of two functions f and h. This test is very easy, since it suffices to check whether the roots of f and h are the same node, which can be done in constant time.
2. The satisfiability problem, i.e., finding an assignment α for which f(α) = 1. This is done by a simple depth-first search on the reduced BDD of f that looks for a path from the root of f to the 1-sink; the values assigned to the variables along such a path form the assignment α (variables not on the path may take arbitrary values). If there is no such path, f is unsatisfiable.
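The ITE-based synthesis and both decision procedures can be combined in a small sketch. The Python class below is a hypothetical, minimal package, not the API of any existing BDD library: it keeps a unique table for reduction rule #1, skips redundant nodes for reduction rule #2, caches ITE results, tests equivalence by comparing node identifiers, and finds a satisfying path by following non-0 edges to the 1-terminal. Variable index 0 is assumed to be the topmost variable.

```python
class BDD:
    """Minimal reduced ordered BDD package built on ITE (illustrative sketch).

    Node ids are integers; 0 and 1 are the terminal nodes. Variable i is
    ordered above variable j whenever i < j.
    """

    def __init__(self, num_vars):
        # node id -> (variable, low child, high child); terminals use num_vars
        self.nodes = {0: (num_vars, None, None), 1: (num_vars, None, None)}
        self.unique = {}    # (var, low, high) -> node id (reduction #1)
        self.computed = {}  # (f, g, h) -> result of ite(f, g, h)

    def mk(self, var, low, high):
        if low == high:                         # reduction #2: redundant test
            return low
        key = (var, low, high)
        if key not in self.unique:              # reduction #1: reuse isomorphic node
            node = len(self.nodes)
            self.nodes[node] = key
            self.unique[key] = node
        return self.unique[key]

    def var(self, i):
        return self.mk(i, 0, 1)

    def _cofactor(self, f, x, value):
        var, low, high = self.nodes[f]
        if var != x:                            # f does not test x at its root
            return f
        return high if value else low

    def ite(self, f, g, h):
        if f == 1:                              # terminal cases
            return g
        if f == 0:
            return h
        if g == h:
            return g
        if g == 1 and h == 0:
            return f
        key = (f, g, h)
        if key not in self.computed:
            x = min(self.nodes[u][0] for u in (f, g, h))    # topmost variable
            high = self.ite(*(self._cofactor(u, x, 1) for u in (f, g, h)))
            low = self.ite(*(self._cofactor(u, x, 0) for u in (f, g, h)))
            self.computed[key] = self.mk(x, low, high)
        return self.computed[key]

    def neg(self, f):
        return self.ite(f, 0, 1)

    def conj(self, f, g):
        return self.ite(f, g, 0)

    def disj(self, f, g):
        return self.ite(f, 1, g)

    def xor(self, f, g):
        return self.ite(f, self.neg(g), g)

    def sat_path(self, f):
        """Return one satisfying partial assignment {var: 0/1}, or None if f == 0."""
        if f == 0:
            return None
        path = {}
        while f != 1:
            var, low, high = self.nodes[f]
            # In a reduced BDD only terminal 0 denotes the constant-0 function,
            # so any child other than 0 still leads to the 1-terminal.
            if high != 0:
                path[var], f = 1, high
            else:
                path[var], f = 0, low
        return path
```

For example, building the carry of a full adder once as a majority formula and once from the gates of Figure 5 returns the same node identifier, which is exactly the constant-time equivalence test described above.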
The interested reader is referred to [32] for more details about decision dia-
grams.
2.2.3 Symbolic Computation
Symbolic computation offers an algebraic decision procedure named Ideal Membership Testing (IMT), which can answer questions about the correctness of equivalence relationships. As shown in Figure 11, IMT takes two inputs: 1) the circuit model as a Gröbner basis $G = \{g_1, \dots, g_{\hat{s}}\}$ (see Subsection 2.1.5), and 2) a multivariate polynomial $p_r$ describing the equivalence relationship between two or more variables of the circuit model. IMT tests whether the polynomial $p_r$ lies in the ideal generated by the Gröbner basis $G$ by reducing (dividing) $p_r$ with respect to $G$. If the remainder $r$ obtained by dividing $p_r$ by $G$ is zero, IMT proves that the circuit satisfies the equivalence relationship; otherwise, $r$ is a symbolic polynomial over the primary inputs of the circuit model, and IMT reports that the relationship is not satisfied by the model.
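The IMT procedure for the full adder of Example 5 can be sketched as follows. Because every gate polynomial has the form −x + tail with leading coefficient −1, dividing a term c·M that contains x by it amounts to replacing x by its tail, so the sketch reduces the specification polynomial $p_r = 2co + s - (x_1 + x_2 + x_3)$ gate by gate in the order co > s > v4 > v3 > v2 > v1. The representation and function names are assumptions of this Python example. The correct circuit yields the remainder 0, whereas replacing the OR gate g1 by an AND gate leaves the non-zero remainder $-2x_1x_2 - 2x_1x_3 - 2x_2x_3 + 6x_1x_2x_3$ over the primary inputs.

```python
from collections import defaultdict


def M(*names):
    return frozenset(names)


def mul(p, q):
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[m1 | m2] += c1 * c2          # x_i^2 = x_i
    return {m: c for m, c in out.items() if c}


def add(p, q):
    out = defaultdict(int, p)
    for m, c in q.items():
        out[m] += c
    return {m: c for m, c in out.items() if c}


def reduce_by_gate(p, x, tail):
    """Divide p by the gate polynomial g = -x + tail: every term c*M with x in M
    becomes c*(M without x)*tail, i.e., x is replaced by its gate function."""
    result = {}
    for m, c in p.items():
        term = mul({m - {x}: c}, tail) if x in m else {m: c}
        result = add(result, term)
    return result


def imt(spec, gates):
    """gates: (output variable, tail polynomial) in reverse topological order."""
    r = spec
    for x, tail in gates:
        r = reduce_by_gate(r, x, tail)
    return r                                 # {} means remainder r = 0


FULL_ADDER = [   # tails of g1..g6 from Example 5, ordered co > s > v4 > v3 > v2 > v1
    ("co", {M("v4", "v3"): -1, M("v4"): 1, M("v3"): 1}),
    ("s", {M("v1", "x3"): -2, M("v1"): 1, M("x3"): 1}),
    ("v4", {M("v2", "x3"): 1}),
    ("v3", {M("x1", "x2"): 1}),
    ("v2", {M("x1", "x2"): -1, M("x1"): 1, M("x2"): 1}),
    ("v1", {M("x1", "x2"): -2, M("x1"): 1, M("x2"): 1}),
]

# Specification polynomial p_r = 2co + s - (x1 + x2 + x3)
SPEC = {M("co"): 2, M("s"): 1, M("x1"): -1, M("x2"): -1, M("x3"): -1}
remainder = imt(SPEC, FULL_ADDER)
```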
Because IMT is applied to Gröbner bases, it is essential to show that the modeling method introduced in Subsection 2.1.5 yields a Gröbner basis model for a given circuit. Modeling every logic gate in the circuit as one polynomial yields a set of Boolean polynomials $P = \{p_1, \dots, p_s\} \subset \mathbb{Z}_2(x_1, \dots, x_n)$. The set of all solutions of the polynomial equations $p_1(x_1, \dots, x_n) = \dots = p_s(x_1, \dots, x_n) = 0$ is called the affine variety $V(p_1, \dots, p_s)$. The affine variety is not only the set of solutions of the given polynomials $P$; it is in fact the set of solutions of the whole ideal generated by them. The ideal $I = \langle P \rangle = \{\sum_{i=1}^{s} h_i \cdot p_i : h_i \in \mathbb{Z}_2(x_1, \dots, x_n)\}$ is generated by $P$, and $P$ is called a basis (set of generators) of $I$. The ideal $I$ may have many other bases, i.e., different sets of polynomials that generate the same ideal. One of these bases is a Gröbner basis $G = \{g_1, \dots, g_{\hat{s}}\}$, for which $V(G) = V(I)$.
Buchberger (1965) [19] introduced the algorithmic theory of Gröbner bases, which are primarily defined for ideals in a polynomial ring $K[x_1, \dots, x_n]$ over a field $K$ [26]. To apply the theory of Gröbner bases to polynomial rings over a ring, various approaches have been proposed [3, 57, 58, 95] that extend its basic definitions and concepts. Recently, there has been renewed interest in extending the theory to more types of rings [41, 44, 83].
Definition 10 (Ring). A ring is a set with two operations, addition and multiplication, satisfying additive and multiplicative associativity, additive commutativity, left and right distributivity, and the existence of an additive identity and additive inverses. A commutative ring also satisfies multiplicative commutativity.
Definition 11 (Field). A field $K$ is a commutative ring with unity in which every element except 0 has a multiplicative inverse (i.e., $\forall x \in K \setminus \{0\}, \exists \hat{x} \in K$ such that $x \cdot \hat{x} = 1$).
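The integers modulo $2^m$, $\mathbb{Z}_{2^m}$ (for $m > 1$), form a commutative ring but not a field, since not every non-zero element of $\mathbb{Z}_{2^m}$ has a multiplicative inverse. For example, $\mathbb{Z}_4$ is not a field because the element 2 has no inverse: $2 \cdot a \bmod 4 \in \{0, 2\}$ for every $a \in \mathbb{Z}_4$, so $2 \cdot a = 1$ is impossible.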
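The invertibility argument can be checked directly. The short Python sketch below (the function name `units` is illustrative) lists the non-zero elements of $\mathbb{Z}_n$ that have a multiplicative inverse: for $\mathbb{Z}_2$ every non-zero element qualifies, whereas for $\mathbb{Z}_4$, $\mathbb{Z}_8$, and $\mathbb{Z}_{16}$ the even elements do not.

```python
def units(n: int) -> list[int]:
    """Non-zero elements of Z_n that have a multiplicative inverse modulo n."""
    return [a for a in range(1, n) if any(a * b % n == 1 for b in range(n))]


for m in range(1, 5):
    n = 2 ** m
    is_field = units(n) == list(range(1, n))   # every non-zero element invertible
    print(f"Z_{n}: units = {units(n)}, field = {is_field}")
```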
As shown in Subsection 2.1.5, to model designs implementing integer-valued functions (see Definition 2) as multivariate polynomials, the variables of the polynomials are Boolean while the coefficients are in the integer ring $\mathbb{Z}$. Therefore, the ideals of those polynomials lie in the Boolean ring $\mathbb{Z}_2(x_1, \dots, x_n) = \mathbb{Z}[x_1, \dots, x_n]/\langle x_1^2 - x_1, \dots, x_n^2 - x_n \rangle$, where the ideal $\langle x_i^2 - x_i \rangle$ restricts the values of $x_i$ to the set {0, 1}. In [41], the ring $\mathbb{Z}[x_1, \dots, x_n]/I_a$ was investigated, where $I_a \subseteq \mathbb{Z}[x_1, \dots, x_n]$ is an ideal. That work proposed a restriction on $I_a$ that gives a necessary and sufficient condition for all ideals in $\mathbb{Z}[x_1, \dots, x_n]/I_a$ to be isomorphic to integer lattices, so that Gröbner basis representations of those ideals can be derived as for fields. The restriction is that the Gröbner basis $G_a$ of $I_a$ consists of monic polynomials, one for every variable $x_i$.
Definition 12 (Monic Polynomial). A polynomial is called monic if it has the form $x^{\alpha_t} + c_{t-1}x^{\alpha_{t-1}} + \dots + c_1 x^{\alpha_1} + c_0$, where $\alpha_i \in \mathbb{N}$, i.e., the coefficient of its leading term $x^{\alpha_t}$ is equal to one.
Theorem 1 ([41]). Let $I_a \subseteq \mathbb{Z}[x_1, \dots, x_n]$ be a non-zero ideal, and let $G_a$ be a Gröbner basis of $I_a$ with respect to a monomial ordering ≺. If the polynomials of $G_a$ are monic, then the theory of Gröbner bases can be applied to ideals of the ring $\mathbb{Z}[x_1, \dots, x_n]/I_a$.
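Since the ideal $\langle x_i^2 - x_i \rangle$ satisfies the condition of Theorem 1 (each generator $x_i^2 - x_i$ is monic, and there is one for every variable $x_i$), Gröbner bases can be derived for ideals in the Boolean ring $\mathbb{Z}_2(x_1, \dots, x_n)$.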
To compute a Gröbner basis $G = \{g_1, \dots, g_{\hat{s}}\}$ of an ideal $I = \langle p_1, \dots, p_s \rangle$, a Gröbner basis algorithm constructs $G$ in a finite number of steps, computing in every step an S-polynomial and its remainder $\mathrm{Spoly}(p, g) \xrightarrow{G}_{+} r$. For fields, Buchberger's algorithm [19] is applied, while for rings such as $\mathbb{Z}_2(x_1, \dots, x_n)$ the algorithm proposed in [58] computes a Gröbner basis using S-polynomials for rings. S-polynomials are computed repeatedly until 1) every $p_i$ reduces to zero with respect to $G$, and 2) the S-polynomial of every pair of polynomials in $G$ reduces to zero with respect to $G$.
Definition 13 (S-polynomial for Fields [26]). The S-polynomial of two polynomials $p$ and $g$ in a polynomial set $P$ is the combination $\mathrm{Spoly}(p, g) = \frac{L}{lt(p)} \cdot p - \frac{L}{lt(g)} \cdot g$, where $L$ is the least common multiple $LCM(lm(p), lm(g))$. Note that $\mathrm{Spoly}(p, g)$ cancels the leading terms of $p$ and $g$; if the remainder $r$ obtained in $\mathrm{Spoly}(p, g) \xrightarrow{P}_{+} r$ is non-zero, its leading term is new, i.e., not divisible by any leading term in $P$.
Definition 14 (S-polynomial for Rings [58]). For rings, the S-polynomial is the combination $\mathrm{Spoly}(p, g) = \frac{L}{lm(p)} \cdot p - \frac{L}{lm(g)} \cdot q_c \cdot g$, where $lc(p) = q_c \cdot lc(g) + r_c$, i.e., $q_c$ is the quotient and $r_c$ the remainder of dividing $lc(p)$ by $lc(g)$.
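Definition 13 can be illustrated over the rationals. The Python sketch below is a toy written for this example, not a Gröbner basis package: polynomials are dictionaries from exponent tuples to `Fraction` coefficients, variables are listed largest first so that Python's tuple comparison realizes the lexicographic order, and `spoly` cancels the leading terms exactly as in the definition. For $p = x^2y - x$ and $g = 2xy^2 - y$ with $x > y$, it returns $-\tfrac{1}{2}xy$.

```python
from fractions import Fraction


# Polynomials over Q: {exponent tuple: coefficient}, variables ordered
# largest first, so max() on exponent tuples picks the lex-leading monomial.
def leading_term(p):
    m = max(p)
    return m, p[m]


def times_term(p, mono, coef):
    """Multiply p by the term coef * x^mono."""
    return {tuple(i + j for i, j in zip(m, mono)): c * coef for m, c in p.items()}


def subtract(p, q):
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) - c
    return {m: c for m, c in out.items() if c != 0}


def spoly(p, g):
    """S-polynomial for fields: L/lt(p) * p - L/lt(g) * g with L = LCM(lm(p), lm(g))."""
    (mp, cp), (mg, cg) = leading_term(p), leading_term(g)
    L = tuple(max(i, j) for i, j in zip(mp, mg))
    up = tuple(i - j for i, j in zip(L, mp))      # monomial part of L/lt(p)
    ug = tuple(i - j for i, j in zip(L, mg))      # monomial part of L/lt(g)
    return subtract(times_term(p, up, Fraction(1) / cp),
                    times_term(g, ug, Fraction(1) / cg))


# Example with variables (x, y), x > y: p = x^2*y - x, g = 2*x*y^2 - y
p = {(2, 1): Fraction(1), (1, 0): Fraction(-1)}
g = {(1, 2): Fraction(2), (0, 1): Fraction(-1)}
```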
The modeling method used in this thesis models the circuit directly as a Gröbner basis and thus avoids the computationally expensive Gröbner basis algorithm. Nevertheless, the S-polynomial is of interest, since it allows testing whether a given set of polynomials forms a Gröbner basis.
Lemma 1. Given a finite set $G \subset \mathbb{Z}_2(x_1, \dots, x_n)$, suppose that $p, g \in G$ satisfy $LCM(lm(p), lm(g)) = lm(p) \cdot lm(g)$, i.e., the leading monomials of $p$ and $g$ are relatively prime. Then $\mathrm{Spoly}(p, g) \xrightarrow{G}_{+} 0$ [26].
According to Lemma 1, a given polynomial set is a Gröbner basis if the leading monomials of all polynomials in the set are pairwise relatively prime. By combining this lemma with the affine variety concept of an ideal, the thesis defines the Gröbner bases of an ideal as follows:
Definition 15 (Gröbner Bases). A finite subset $G = \{g_1, \ldots, g_{\hat{s}}\}$ of an ideal I is said to be a Gröbner basis of I wrt. a monomial order ≺ if V(G) = V(I) and all leading monomials in G are relatively prime.
In [68], choosing the monomial order ≺ to follow the reverse topological order of the circuit has been shown to be effective: it models the circuit as a Gröbner basis in linear time. Because of this monomial order, the cost of the Gröbner bases algorithm is bypassed, and using the IMT procedure becomes computationally feasible.
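Why the reverse topological order yields a Gröbner basis can be seen on a small example. The following Python sketch builds gate polynomials of the form −z + tail for a full adder and checks that every leading monomial is the gate's output variable, so the leading monomials are pairwise relatively prime (Lemma 1). The encoding (monomials as sets of variable names, which assumes Boolean variables with x·x = x), the gate polynomials chosen for AND/OR/XOR, and the full-adder netlist are illustrative assumptions, not the thesis's tool.

```python
# Sketch (hypothetical encoding): integer polynomials in Boolean variables,
# stored as {frozenset_of_variables: int_coefficient}. Set-based monomials
# build in x * x = x for Boolean variables.

def var(v):
    return {frozenset([v]): 1}

def add(*ps):
    r = {}
    for p in ps:
        for m, c in p.items():
            r[m] = r.get(m, 0) + c
    return {m: c for m, c in r.items() if c != 0}

def mul(p, q):
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = m1 | m2
            r[m] = r.get(m, 0) + c1 * c2
    return {m: c for m, c in r.items() if c != 0}

def scale(p, k):
    return {m: k * c for m, c in p.items()}

def gate_poly(kind, z, a, b):
    """Return -z + tail(a, b), the polynomial encoding z = kind(a, b)."""
    A, B = var(a), var(b)
    tails = {
        "AND": lambda: mul(A, B),
        "OR": lambda: add(A, B, scale(mul(A, B), -1)),
        "XOR": lambda: add(A, B, scale(mul(A, B), -2)),
    }
    return add(scale(var(z), -1), tails[kind]())

def leading_monomial(p, rank):
    """Lex order on multilinear monomials; a higher rank means a larger variable."""
    return max(p, key=lambda m: tuple(sorted((rank[v] for v in m), reverse=True)))

# Full adder, gates listed in topological order: (kind, output, input1, input2).
gates = [("XOR", "t1", "a", "b"), ("AND", "t2", "a", "b"),
         ("XOR", "s", "t1", "cin"), ("AND", "t3", "t1", "cin"),
         ("OR", "c", "t2", "t3")]
order = ["a", "b", "cin"] + [z for _, z, _, _ in gates]
rank = {v: i for i, v in enumerate(order)}   # gates closer to the outputs are larger

model = [gate_poly(*g) for g in gates]
lms = [leading_monomial(g, rank) for g in model]
pairwise_coprime = all(not (m1 & m2) for i, m1 in enumerate(lms) for m2 in lms[i + 1:])
```

Each leading term is −z with coefficient −1, which also matches the restriction of leading coefficients to {−1, 1} used for strong Gröbner bases.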
A given ideal may have different Gröbner bases, and one basis can be reduced wrt. a monomial ordering to another. The thesis defines G as a strong Gröbner basis of the ideal I if, for the leading term of any non-zero p ∈ I, there exists a polynomial g ∈ G such that lt(g) divides this term. Note that over fields any non-zero coefficient is invertible, so it is easy to verify that every Gröbner basis is a strong Gröbner basis. This does not hold in general for rings; hence the utilized modeling method restricts the leading coefficients lc(g_i) of all g_i ∈ G to the set {−1, 1}, as introduced in Subsection 2.1.5. Strong Gröbner bases enable the polynomial division operation required by the IMT.
Definition 16 (Polynomial Division). A polynomial division of two polynomials p and g, denoted $p \xrightarrow{g}_{+} r$, is performed as $r = p - \frac{c \cdot M}{\mathrm{lt}(g)} \cdot g$: if a non-zero term c · M of p is divisible by the leading term of g, then p reduces to r modulo g. Similarly, p can be reduced (divided) wrt. a set of polynomials G to obtain a remainder r, denoted $p \xrightarrow{G}_{+} r$, such that no term in r is divisible by the leading term of any polynomial in G.
This form of polynomial division is not applicable if G is not a strong Gröbner basis: in that case lc(g) may differ from ±1, and since lt(g) = lc(g) · lm(g), the quotient c · M / lt(g), and therefore the coefficients of the resulting polynomial r, may not be integers.
The polynomial division $p \xrightarrow{g}_{+} r$ can be seen as the substitution of a variable x in p with the tail terms of g using a rewrite rule, where x is also the leading monomial of g (lm(g) = x).
Definition 17 (Rewrite Rule). Let g = −lm(g) + tail(g). The rewrite rule corresponding to g substitutes lm(g) = x in a polynomial p ∈ Z2(x1, . . . ,xn) by tail(g), which is denoted as lm(g) → tail(g). If g is a monomial, then the right-hand side of its rule is 0. The thesis refers to applying the rewrite rule as the substitution of variable x.
Example 6. Let p := x4x3 + x1 and g := −x4 + x2x1. Then $r = p - \frac{x_4x_3}{-x_4} \cdot g = p + x_3 \cdot g = x_3x_2x_1 + x_1$, where the polynomial division substitutes x4 in p with x2x1.
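The rewrite-rule view of division in Example 6 fits in a few lines of Python. This is a sketch under stated assumptions: polynomials are dictionaries from sets of variable names to integer coefficients (which bakes in x·x = x for Boolean variables), and `substitute` is a hypothetical helper. Because lt(g) = −x here, $p - \frac{c \cdot M}{-x} \cdot g$ equals p with x replaced by tail(g), which is exactly what the function computes.

```python
# Hypothetical encoding: {frozenset_of_variables: int_coefficient}; set-based
# monomials assume Boolean variables (x * x = x).

def substitute(p, x, tail):
    """Apply the rewrite rule x -> tail to p, i.e. divide p by g = -x + tail."""
    r = {}
    for m, c in p.items():
        if x in m:
            rest = m - {x}
            for mt, ct in tail.items():
                mm = rest | mt                 # multiply the remaining monomial by a tail term
                r[mm] = r.get(mm, 0) + c * ct
        else:
            r[m] = r.get(m, 0) + c             # terms without x are unchanged
    return {m: c for m, c in r.items() if c != 0}

# Example 6: p = x4*x3 + x1 and g = -x4 + x2*x1, so the rule is x4 -> x2*x1.
p = {frozenset({"x4", "x3"}): 1, frozenset({"x1"}): 1}
tail_g = {frozenset({"x2", "x1"}): 1}
r = substitute(p, "x4", tail_g)   # represents x3*x2*x1 + x1
```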
In the IMT procedure, shown in Algorithm 1, given a specification (or relationship) polynomial pr and a circuit model in the form of a Gröbner basis G, pr is divided in every iteration by some polynomial g ∈ G. The division (substitution) iterations are executed in a certain order, the substitution order. This order is crucial for cancelling the nonlinear terms before the intermediate polynomials blow up in size. In [23, 39], the substitution order follows the reverse topological order of the circuit variables and additionally takes the fanouts of the gates into account: variables that have the same level and depend on common inputs (fanouts) must follow each other in the substitution.
Example 7. Following Example 5, the extracted algebraic model is a Gröbner
basis, hence the ideal membership testing of pr can be applied. The substitution
order follows the reverse topological order of the circuit:
pr
g1−−−−→+ −s+ 2x4x3 − 2x4 − 2x3 + x3 + x2 + x1
g2−−−−→+ 2x4x3 − 2x4 − 2x3 + 2x1x3 − x1 + x2 + x1
g3−−−−→+ 2x3x2x3 − 2x3 − 2x2x3 + 2x1x3 − x1 + x2 + x1
g4−−−−→+ 2x2x3x2x1 − 2x2x3 + 2x1x3 − x1 − 2x2x1 + x2 + x1
g5−−−−→+ 2x1x3 − x1 + 4x3x2x1 − 2x3x1 − 2x3x2 − 2x1x2 + x2 + x1 g6−−−−→+ 0.
Since the ﬁnal division result is 0, it is proven that the circuit under veriﬁcation
satisﬁes the function speciﬁcation pr.
Algorithm 1 IMT Procedure
Require: Equivalence relationship polynomial pr, circuit polynomials G = {g1, g2, . . . , gs}
Ensure: Remainder r is equal to zero (for a correct circuit)
1: V ← OrderedPolynomialVariables(pr, G) {substitution ordering}
2: r ← pr
3: for i in 0 to |V| − 1 do
4:   if V[i] ∉ PrimaryInputs then
5:     Choose gt ∈ G such that lm(gt) = V[i]
6:     $r \xrightarrow{g_t}_{+} r$
7:   end if
8: end for
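Algorithm 1 can be sketched in Python for a gate-level full adder. Assumptions: the set-based polynomial encoding from the previous sketches (Boolean variables, x·x = x), a hypothetical full-adder netlist, and a substitution order that is simply the reverse of the gate list (no fanout-aware refinement as in [23, 39]). The specification is pr = 2c + s − (a + b + cin). A remainder of zero proves the circuit correct, while an injected bug leaves a non-zero remainder.

```python
# Sketch of Algorithm 1 (IMT) with a hypothetical set-based polynomial encoding.

def var(v):
    return {frozenset([v]): 1}

def add(*ps):
    r = {}
    for p in ps:
        for m, c in p.items():
            r[m] = r.get(m, 0) + c
    return {m: c for m, c in r.items() if c != 0}

def mul(p, q):
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = m1 | m2                      # Boolean idempotence x * x = x
            r[m] = r.get(m, 0) + c1 * c2
    return {m: c for m, c in r.items() if c != 0}

def scale(p, k):
    return {m: k * c for m, c in p.items()}

def substitute(p, x, tail):
    """Rewrite rule x -> tail (division by g = -x + tail)."""
    return add({m: c for m, c in p.items() if x not in m},
               *[scale(mul({m - {x}: 1}, tail), c) for m, c in p.items() if x in m])

def tail_of(kind, a, b):
    A, B = var(a), var(b)
    if kind == "AND":
        return mul(A, B)
    if kind == "OR":
        return add(A, B, scale(mul(A, B), -1))
    if kind == "XOR":
        return add(A, B, scale(mul(A, B), -2))
    raise ValueError(kind)

def imt(spec, gates, primary_inputs):
    """Reduce spec by every gate polynomial, visiting variables in reverse topological order."""
    rules = {z: tail_of(kind, a, b) for kind, z, a, b in gates}
    V = [z for _, z, _, _ in reversed(gates)] + list(primary_inputs)
    r = spec
    for v in V:
        if v not in primary_inputs:          # primary inputs have no defining polynomial
            r = substitute(r, v, rules[v])
    return r

inputs = ["a", "b", "cin"]
gates = [("XOR", "t1", "a", "b"), ("AND", "t2", "a", "b"),
         ("XOR", "s", "t1", "cin"), ("AND", "t3", "t1", "cin"),
         ("OR", "c", "t2", "t3")]
pr = add(scale(var("c"), 2), var("s"),
         scale(var("a"), -1), scale(var("b"), -1), scale(var("cin"), -1))

remainder = imt(pr, gates, inputs)                       # {} means r = 0
buggy = [("AND", "s", "t1", "cin") if g[1] == "s" else g for g in gates]
buggy_remainder = imt(pr, buggy, inputs)                 # non-zero polynomial
```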
The thesis also leverages another property of the Gröbner bases theory to simplify and preprocess the ideal of the circuit model. For a given ideal (set of polynomials), the theory offers a canonical representation of the ideal called the reduced Gröbner basis.
Definition 18 (Reduced Gröbner Bases). A reduced Gröbner basis is a Gröbner basis G for a polynomial ideal I such that, for all g_i ∈ G, no term in g_i is divisible by the leading term lt(g_j) for any j ≠ i.
Lemma 2. Let I ≠ {0} be a polynomial ideal. Then, for a given monomial ordering ≺, I has a unique reduced Gröbner basis [26].
In Chapter 5, the thesis uses the uniqueness property of reduced Gröbner bases for computing canonical polynomials and for checking the equality of ideals. As a consequence of Lemma 2, once reduced Gröbner bases can be computed efficiently for two given ideals, it follows immediately that the ideals are equal if and only if they have the same reduced Gröbner basis. The reduced Gröbner basis can be computed by eliminating variables of the given Gröbner basis according to a specific substitution order. By eliminating variables recursively, new Gröbner bases are generated until an ideal that satisfies the conditions of the reduced Gröbner basis is derived.
Example 8. Consider two Gröbner bases G1 and G2 modeling the same Boolean function $x_o = x_1 \oplus x_2 \oplus x_3$, where
$G_1 = \{-\hat{v}_1 - 2x_2x_1 + x_2 + x_1,\ -x_o - 2\hat{v}_1x_3 + \hat{v}_1 + x_3\}$ and
$G_2 = \{-v_1 - x_2x_1 + x_1,\ -v_2 - x_2x_1 + x_2,\ -v_3 - v_2v_1 + v_2 + v_1,\ -v_4 - v_3x_3 + v_3,\ -v_5 - v_3x_3 + x_3,\ -x_o - v_5v_4 + v_5 + v_4\}$.
Eliminating (substituting) the variable $\hat{v}_1$ by its rewrite rule from G1 leads to the reduced Gröbner basis
$rG_1 = \{-x_o + 4x_1x_2x_3 - 2x_3x_2 - 2x_3x_1 - 2x_2x_1 + x_3 + x_2 + x_1\}$,
while eliminating the variables of G2 according to the order v5 > v4 > v3 > v2 > v1 derives the reduced Gröbner basis
$rG_2 = \{-x_o + 4x_1x_2x_3 - 2x_3x_2 - 2x_3x_1 - 2x_2x_1 + x_3 + x_2 + x_1\}$.
Because rG1 = rG2, the equality of the two ideals is proved.
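Example 8 can be replayed mechanically. The following Python sketch eliminates $\hat{v}_1$ from G1 and v5, v4, v3, v2, v1 from G2 by repeated substitution and checks that both results coincide with rG1. The set-based encoding (Boolean variables, x·x = x), the name `vh1` for $\hat{v}_1$, and the helpers `P`, `substitute`, and `eliminate` are illustrative assumptions.

```python
# Sketch: computing the reduced Gröbner bases of Example 8 by variable elimination.

def P(*terms):
    """Build a polynomial from (coefficient, 'space separated variables') pairs."""
    r = {}
    for c, vs in terms:
        m = frozenset(vs.split())            # '' yields the constant monomial
        r[m] = r.get(m, 0) + c
    return {m: c for m, c in r.items() if c != 0}

def substitute(p, x, tail):
    """Rewrite rule x -> tail, with Boolean idempotence x * x = x."""
    r = {}
    for m, c in p.items():
        if x in m:
            for mt, ct in tail.items():
                mm = (m - {x}) | mt
                r[mm] = r.get(mm, 0) + c * ct
        else:
            r[m] = r.get(m, 0) + c
    return {m: c for m, c in r.items() if c != 0}

def eliminate(root, rules, order):
    """Substitute the variables in the given order into the output polynomial."""
    r = root
    for v in order:
        r = substitute(r, v, rules[v])
    return r

# G1: vh1 stands for the hatted variable v1.
g1_rules = {"vh1": P((-2, "x2 x1"), (1, "x2"), (1, "x1"))}
g1_root = P((-1, "xo"), (-2, "vh1 x3"), (1, "vh1"), (1, "x3"))

# G2: tails of the rewrite rules for v1, ..., v5 and the output polynomial.
g2_rules = {
    "v1": P((-1, "x2 x1"), (1, "x1")),
    "v2": P((-1, "x2 x1"), (1, "x2")),
    "v3": P((-1, "v2 v1"), (1, "v2"), (1, "v1")),
    "v4": P((-1, "v3 x3"), (1, "v3")),
    "v5": P((-1, "v3 x3"), (1, "x3")),
}
g2_root = P((-1, "xo"), (-1, "v5 v4"), (1, "v5"), (1, "v4"))

rG1 = eliminate(g1_root, g1_rules, ["vh1"])
rG2 = eliminate(g2_root, g2_rules, ["v5", "v4", "v3", "v2", "v1"])
expected = P((-1, "xo"), (4, "x1 x2 x3"), (-2, "x3 x2"), (-2, "x3 x1"),
             (-2, "x2 x1"), (1, "x3"), (1, "x2"), (1, "x1"))
```

Comparing the two dictionaries is the equality test enabled by Lemma 2.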
2.3 formal verification of arithmetic circuits
Formal verification of floating-point and integer arithmetic has been the subject of extensive investigation in academia and industry, covering both the correctness of the computation and the control aspects. Verification of arithmetic circuits is expensive because the hardware is large and complex and the computation is inherently difficult. Because of this, most, if not all, recently proposed works decompose the problem manually or through specific case-splitting approaches and apply different techniques to verify the decomposed units. These methodologies require high-level expertise and a thorough understanding of the design. There is a large body of work from Intel outlining methodologies that combine automated model checking and theorem proving to verify FPUs. As a paradigmatic example of this combination, a divider circuit is verified by constructing an inductive proof with a theorem prover to show that certain invariants are maintained, while the circuit-level details are checked by a model checker [55, 76]. The most recent exposition of these works [61] requires tedious, implementation-specific manual effort to complete the verification process. In a similar fashion, AMD leverages the ACL2 theorem prover and a model checker to formally verify FPUs with manually guided proofs [88]. A more highly automated methodology from IBM [54] combines case splitting, multiplier isolation, and automatic equivalence checking to make the verification of fused-multiply-add FPUs tractable.
In the following, the thesis briefly describes the multiplier architectures that have been verified with the techniques proposed in this thesis, as well as the main specifications of FPUs as defined in the latest IEEE Standard for floating-point arithmetic [50]. It also classifies formal verification methodologies for arithmetic circuits into two directions: 1) methodologies that use equivalence checking, and 2) methodologies that combine theorem proving with automated checkers.
2.3.1 Multiplier Architectures
The multiplication function involves two basic operations: generating and accumulating partial products. There are three types of high-speed multipliers [62]: sequential, parallel, and array multipliers. The thesis focuses on parallel and array multipliers since they are the most common types. The parallel multiplier generates partial products in parallel and accumulates them using a fast multi-operand adder, while the array multiplier consists of identical cells, typically full adders or carry-save adders, that are connected in a way that reduces the delay of the longest path in the circuit (logic depth).
To design a high-speed multiplier, the designer reduces the number of partial products and/or accelerates their accumulation. This is done by optimizing at least one of the three main parts of the multiplier circuit:
1. The first part is the generation of partial products. They can be produced simply as the logical AND of the multiplicand with each bit of the multiplier. However, Booth’s algorithm, in particular radix-4 modified Booth recoding, is used to reduce the number of partial products.
2. The second is the multi-operand adder tree, which reduces the partial products to two output bit vectors. There are many types of trees that make a trade-off between the overall delay and the wiring complexity (number of needed wires or tracks) of the circuit. For instance, the Wallace tree is known for its optimal computation time but suffers from a large number of wires; in contrast, the balanced delay tree requires fewer wiring tracks but has a higher overall delay than the Wallace tree.
3. The third part is the last-stage two-operand adder, which takes the outputs of the adder tree and generates the output bit vector of the multiplier circuit. The straightforward implementation of the final adder is the ripple carry adder, which can be thought of as an array of full adders where the carry-out of the ith full adder is fed to the carry-in of the (i+1)th full adder. Another group of adders with less delay than the ripple carry adder is called Parallel Prefix Adders (PPAs), since these adders reduce the time required to determine the carry bits. Types of PPAs differ in logic depth, gate fan-out, and area.
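The three-part decomposition above can be illustrated with a bit-level Python model. This is only a sketch: it uses simple AND partial products (SP) and accumulates them row by row with ripple-carry adders, which is functionally correct but is not a Wallace, Dadda, or compressor tree; all function names are hypothetical.

```python
# Sketch: simple partial products + row-wise accumulation + ripple-carry adders.
# Bit vectors are LSB-first lists of 0/1 integers.

def full_adder(a, b, c):
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout

def ripple_carry_add(x_bits, y_bits):
    """Add two equal-length bit vectors; the carry-out of bit i feeds bit i + 1."""
    out, carry = [], 0
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]

def partial_products(x, y, n):
    """Simple partial products: row j is (x AND y_j), to be shifted left by j."""
    return [[((x >> i) & 1) & ((y >> j) & 1) for i in range(n)] for j in range(n)]

def multiply(x, y, n):
    """n x n -> 2n-bit product built from the three parts described above."""
    width = 2 * n
    acc = [0] * width
    for j, pp in enumerate(partial_products(x, y, n)):
        shifted = ([0] * j + pp + [0] * width)[:width]
        acc = ripple_carry_add(acc, shifted)[:width]   # the product always fits in 2n bits
    return sum(bit << k for k, bit in enumerate(acc))
```

Replacing the row-wise accumulation with a tree of carry-save adders and the final `ripple_carry_add` with a prefix adder would change the delay, not the computed function.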
The thesis categorizes integer multiplier architectures according to 1) the type of the partial products generator, 2) the partial products accumulator, and 3) the last-stage adder. In the experiments, the verified integer multiplier circuits are combinations of the following:
1. Two types of partial products generators, namely Simple Partial Products (SP) and Booth Partial Products (BP).
2. Multiple types of partial products accumulators, namely Array (AR), Wallace Tree (WT), (4,2) Compressor Tree (CT), Redundant Binary Addition Tree (RT), and Dadda Tree (DT).
3. The chosen types of the last-stage adder are the Ripple Carry Adder (RC) as well as the PPAs Carry Look-Ahead Adder (CL), Brent-Kung Adder (BK), Kogge-Stone Adder (KS), and Han-Carlson Adder (HC).
These multiplier benchmarks are named according to their architecture features. For example, a circuit with simple partial products, a Wallace tree as partial products accumulator, and a ripple carry adder as last-stage adder is labeled SP-WT-RC. The HDL descriptions of the integer multiplier benchmarks are generated using the online tool Arithmetic Module Generator [2].
2.3.2 Floating-Point Speciﬁcation
Floating-point (FP) numbers overcome the limitation of fixed-point integers by providing a vast range of numbers. Fixed-point representations have a fixed window of expressible numbers and cannot represent very large and very small numbers at once. In contrast, FP maps the infinite range of real numbers to a finite subset with limited precision. Despite its range, FP has a serious drawback: the represented numbers are inexact, since they are approximations. Because there are many possibilities to approximate a real number, the IEEE standard for FP was introduced. The standard [50] is the most common representation for real numbers in state-of-the-art microprocessors. It provides a standard method for computation with FP numbers that yields the same result regardless of the implementation: given the same input data, the results of the computation, including errors and error conditions, are identical.
Definition 19 (Binary Floating-Point). A binary floating-point number is defined by a triple (s, ex, m) with sign bit s ∈ {0, 1}, exponent $ex \in \mathbb{Z}$, and significand (mantissa) $m \in \mathbb{R}_{\geq 0}$. The value of the number is calculated as $(-1)^{s} \cdot 2^{ex} \cdot m$.
The standard determines a set of finite binary FP numbers representable within a particular format by the integer parameters: p = the number of bits in the significand m (precision), emax = the maximum exponent ex, and emin = 1 − emax = the minimum exponent ex. Within each format, ex is any integer in the range emin ≤ ex ≤ emax, and m is a number represented by a bit string of the form $b_0.b_1b_2\cdots b_{p-1}$ where $b_i \in \mathbb{B}$; therefore, 0 ≤ m < 2.
Table 1: Binary Format Parameters

| Parameter | binary16 | binary32 | binary64 | binary{k}, k ≥ 128 |
|---|---|---|---|---|
| k | 16 | 32 | 64 | multiple of 32 |
| p | 11 | 24 | 53 | $k - \mathrm{round}(4 \cdot \log_2 k) + 13$ |
| emax | 15 | 127 | 1023 | $2^{k-p-1} - 1$ |
Representations of FP numbers in the binary interchange formats are encoded in k bits allocated as follows: 1 bit for the sign, w bits for the biased exponent E = ex + bias, and t = p − 1 bits for the trailing significand field $b_1b_2\cdots b_{p-1}$; the leading bit of the significand b0 is implicitly encoded in the biased exponent E. The values of w, t, and bias for binary formats are calculated from k and p as w = k − p, t = p − 1, and $e_{max} = \mathit{bias} = 2^{w-1} - 1$. The standard defines binary formats of widths 16, 32, 64, and 128 bits, and in general of any multiple of 32 bits of at least 128 bits. The parameters k, p, and emax for every format width are shown in Table 1.
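The encoding rules above can be checked against a real binary32 encoding using Python's `struct` module. The sketch below derives w, t, and bias from k and p as stated in the text, evaluates the Table 1 precision formula, and splits a value into its sign, biased exponent, and trailing significand field. It only handles normal numbers (0 < E < 2^w − 1), where the implicit bit b0 is 1; the helper names are hypothetical.

```python
import math
import struct

def format_params(k, p):
    """w, t, and bias (= emax) of a binary interchange format."""
    w = k - p
    t = p - 1
    bias = 2 ** (w - 1) - 1
    return w, t, bias

def precision_for_width(k):
    """p from Table 1 for k >= 128 (k a multiple of 32)."""
    return k - round(4 * math.log2(k)) + 13

def decode_binary32(x):
    """Split x (rounded to binary32) into (sign, biased exponent E, trailing field)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    w, t, _ = format_params(32, 24)
    sign = bits >> (w + t)
    E = (bits >> t) & ((1 << w) - 1)
    trailing = bits & ((1 << t) - 1)
    return sign, E, trailing

def value_of_normal(sign, E, trailing, k=32, p=24):
    """(-1)^s * 2^(E - bias) * b0.b1...b(p-1) with the implicit bit b0 = 1."""
    _, t, bias = format_params(k, p)
    m = 1 + trailing / 2 ** t
    return (-1) ** sign * 2.0 ** (E - bias) * m

fields = decode_binary32(-6.5)   # -6.5 = -1.101b * 2^2, so E = 2 + 127 = 129
```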
According to the standard, every operation on FP numbers is performed as if it first produced an intermediate result that is correct to infinite precision, and then rounded this intermediate result according to the given rounding mode to fit in a destination binary format. In other words, if the result of an arithmetic operation does not fit in the precision p or the range of the exponent ex, the result is rounded according to the given rounding mode. The standard requires the following four rounding modes for binary FP:
1. roundTiesToEven delivers the FP number nearest to the infinitely precise result; if the two nearest FP numbers are equally near, the one with an even least significant bit is delivered;
2. roundTowardPositive delivers the FP number closest to and no less than
the inﬁnitely exact result;
3. roundTowardNegative delivers the FP number closest to and no greater
than the inﬁnitely exact result;
4. roundTowardZero delivers the FP number closest to and no greater in
magnitude than the inﬁnitely exact result.
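The four rounding modes can be stated precisely with exact rational arithmetic. The following Python sketch rounds an exact `Fraction` to p significant bits under each mode. It assumes an unbounded exponent range, so overflow, underflow, and subnormals are deliberately ignored; the function name and mode strings are illustrative.

```python
from fractions import Fraction
import math

MODES = ("roundTiesToEven", "roundTowardPositive",
         "roundTowardNegative", "roundTowardZero")

def round_to_precision(x, p, mode):
    """Round the exact rational x to p significant bits (unbounded exponent range)."""
    x = Fraction(x)
    if x == 0:
        return x
    a = abs(x)
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1                              # now 2**e <= |x| < 2**(e+1)
    ulp = Fraction(2) ** (e - p + 1)        # weight of the last significand bit b_(p-1)
    q = x / ulp                             # exact scaled value, |q| in [2**(p-1), 2**p)
    lo = math.floor(q)                      # neighbour toward -infinity
    rem = q - lo
    if rem == 0:
        n = lo                              # exactly representable
    elif mode == "roundTiesToEven":
        half = Fraction(1, 2)
        n = lo + 1 if rem > half or (rem == half and lo % 2 == 1) else lo
    elif mode == "roundTowardPositive":
        n = lo + 1
    elif mode == "roundTowardNegative":
        n = lo
    elif mode == "roundTowardZero":
        n = lo if x > 0 else lo + 1
    else:
        raise ValueError(mode)
    return n * ulp

# 11/8 = 1.011b lies exactly halfway between 1.01b and 1.10b at p = 3.
results = [round_to_precision(Fraction(11, 8), 3, m) for m in MODES]
```

For 11/8 at p = 3, roundTiesToEven and roundTowardPositive give 3/2, while roundTowardNegative and roundTowardZero give 5/4.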
The thesis evaluates the performance of its proposed techniques by verifying integer multipliers and FP multipliers. The FP multiplication operation computes C = A · B for two FP operands $A = (-1)^{s_a} \cdot 2^{e_a} \cdot m_a$ and $B = (-1)^{s_b} \cdot 2^{e_b} \cdot m_b$, where s_a denotes the sign, e_a the exponent, and m_a the significand including the implicit bit of the operand A (similarly for B and C). The operation can be defined as $s_c = s_a \oplus s_b$ and $2^{e_c} \cdot m_c = \mathrm{RND}(2^{e_a + e_b} \cdot m_a \cdot m_b)$, where RND is the round function that fits the exact result $2^{e_a + e_b} \cdot m_a \cdot m_b$ into the FP number $2^{e_c} \cdot m_c$.
Such an arithmetic specification can be formulated mathematically for theorem proving and model checking [53, 70], or as an HDL-based reference model for equivalence checking, as in [54] and in this thesis.
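As a software counterpart of such a reference model, the following Python sketch implements the specification $s_c = s_a \oplus s_b$, $2^{e_c} \cdot m_c = \mathrm{RND}(2^{e_a+e_b} \cdot m_a \cdot m_b)$ for binary64 with roundTiesToEven, and compares it with Python's native float multiplication. Assumptions: finite non-zero operands, normal results (no overflow, underflow, or subnormals), and IEEE binary64 floats in the Python runtime. The helper names are hypothetical, and this is not an HDL model.

```python
from fractions import Fraction
import math

P = 53  # binary64 precision p

def rnd_ties_to_even(x, p=P):
    """Round a positive exact rational x to p significant bits (unbounded exponent)."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1                                  # now 2**e <= x < 2**(e+1)
    ulp = Fraction(2) ** (e - p + 1)
    q = x / ulp
    n = math.floor(q)
    rem = q - n
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return n * ulp

def decompose(x):
    """(s, e, m) with x = (-1)^s * 2^e * m and 1 <= m < 2, for finite non-zero x."""
    s = 1 if math.copysign(1.0, x) < 0 else 0
    frac, exp = math.frexp(abs(x))              # abs(x) = frac * 2**exp, 0.5 <= frac < 1
    return s, exp - 1, Fraction(frac) * 2

def fp_mul(a, b):
    """Reference FP multiplication following the specification in the text."""
    sa, ea, ma = decompose(a)
    sb, eb, mb = decompose(b)
    sc = sa ^ sb                                # s_c = s_a XOR s_b
    exact = Fraction(2) ** (ea + eb) * ma * mb  # infinitely precise intermediate result
    mag = rnd_ties_to_even(exact)               # RND(...)
    return -float(mag) if sc else float(mag)

pairs = [(0.1, 0.3), (1.5, -2.75), (1e10, 3.3e-7), (-7.0, -0.125), (1 / 3, 3.0)]
agree = all(fp_mul(a, b) == a * b for a, b in pairs)
```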
2.3.3 Equivalence Checking
Equivalence checking verifies that two designs with the same corresponding inputs always produce equal corresponding outputs under all input assignments. Equivalence checking is performed by building a so-called miter, in which the two compared designs are modeled using the same description method and combined in one design. A miter is constructed by adding 2-input XOR gates on top of corresponding outputs of the compared designs and connecting the outputs of these XOR gates to a large OR gate. The miter has a single output, the output of the OR gate, which evaluates to one exactly under those input assignments that cause a mismatch in at least one output pair; the designs are equivalent if and only if this output is constantly zero.
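The miter construction can be illustrated on two 4-bit adders. In the following Python sketch, the miter output is the OR over the XORs of corresponding output bits, and exhaustive simulation stands in for the SAT or BDD reasoning used by real equivalence checkers, which is only feasible for tiny widths. The two adder implementations and the injected fault are hypothetical examples.

```python
from itertools import product

N = 4

def bits(v, n):
    """LSB-first bit list of v."""
    return [(v >> i) & 1 for i in range(n)]

def ripple_adder(x, y):
    """Reference implementation (RI): ripple-carry adder with n + 1 output bits."""
    out, c = [], 0
    for a, b in zip(x, y):
        out.append(a ^ b ^ c)
        c = (a & b) | (c & (a ^ b))
    return out + [c]

def gp_adder(x, y):
    """DUV: carries from generate/propagate signals, c_(i+1) = g_i | (p_i & c_i)."""
    g = [a & b for a, b in zip(x, y)]
    p = [a | b for a, b in zip(x, y)]
    c = [0]
    for i in range(len(x)):
        c.append(g[i] | (p[i] & c[i]))
    return [x[i] ^ y[i] ^ c[i] for i in range(len(x))] + [c[-1]]

def buggy_adder(x, y):
    """DUV with an injected fault on output bit 2."""
    out = ripple_adder(x, y)
    out[2] ^= x[0] & y[3]
    return out

def miter(ri, duv, x, y):
    """Single miter output: OR over the XORs of corresponding output bits."""
    out = 0
    for r, d in zip(ri(x, y), duv(x, y)):
        out |= r ^ d
    return out

def check_equivalence(ri, duv, n=N):
    """Return a counterexample (x, y) that sets the miter output to 1, or None."""
    for xv, yv in product(range(2 ** n), repeat=2):
        if miter(ri, duv, bits(xv, n), bits(yv, n)):
            return xv, yv
    return None
```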
Modern equivalence checking tools are capable of verifying designs with millions of gates in very short times. The great success of equivalence checking is based on exploiting structural similarities between the two compared circuits. Structurally similar circuits contain many internal nodes implementing equivalent functions. These internal equivalences, sometimes called cut points [65], are deduced automatically and leveraged efficiently to decompose the verification problem into smaller ones [11, 65, 72]. This process is typically performed over the AIG representation of the miter.
Checking the equivalence between two combinational designs, or two sequential designs with the same state encodings, is called Combinational Equivalence Checking (CEC). Another type of equivalence checking, Sequential Equivalence Checking (SEC) [5, 38], compares two sequential designs with no one-to-one correspondence between some or all of their states. Generally, SEC requires an analysis of the sequential behavior of the compared designs and therefore comes with greater computational expense than CEC.
Neither type of equivalence checking can handle designs that have few internal equivalences. This problem occurs especially for arithmetic circuits, since one arithmetic function can be implemented in many different ways [96], in particular integer multipliers implemented at the gate level. Because of this, there is no general automated solution that can be applied to all types of circuits. The thesis classifies equivalence checking approaches into two groups based on the types of the checked circuits: 1) integer multipliers, and 2) FPUs.
2.3.3.1 Integer Multipliers
Without any information on the high-level structure of the netlists, most equivalence checkers fail to verify multiplication circuits. Several approaches have been proposed in the past to address this problem. However, most of these approaches are not scalable, cannot be applied to all types of multiplier architectures, or fail if the circuit does not represent a correct multiplier function. The existing approaches can be divided into three main categories: 1) decision diagrams, 2) structural methods, and 3) approaches based on special arithmetic properties of the realized function.
Decision diagrams provide a canonical representation for netlists of the design
under veriﬁcation (DUV) and its reference implementation (RI). The equiva-
lence checker exploits this property to check that the resulting decision diagrams
are identical. Unfortunately, many popular data structures like BDDs [15] have
an exponential size for multiplication. *BMDs [17] can represent the word level
multiplier function in a compact way. However, the exponential blowup can
still occur for incorrect designs or during the construction of the *BMD from a
bit-level circuit [47].
The second direction in the classification is based on structural methods. An approach in this direction has been proposed in [96]. It extracts adder structures from bit-level netlists and builds full adder networks. This approach has two main drawbacks: first, it makes assumptions about the internal structure of the circuits that are not fulfilled by all multiplier architectures [97]; second, it fails to build the full adder networks if the circuit does not represent a correct multiplier, leading to an inconclusive result. Nevertheless, this technique achieves promising results: it verifies a 48-bit multiplier in about 30 minutes.
The third direction exploits arithmetic properties of the multiplier function to build a miter with many internal equivalences. The first approach in this context was proposed by Fujita [42]. It is based on the fact that any function satisfying the recurrence relation (X + 1) · Y = X · Y + Y, together with the base case 0 · Y = 0, is a multiplication. However, the original approach by Fujita does not scale: it cannot verify multipliers larger than 16 bits. A related approach [21] is based on case splitting by forcing a bit of one of the multiplier operands to be zero and summing the partial products that belong to this bit outside the multiplier. The problem with this idea is that the similarity of the nets inside the miter structure depends on the order of the partial products of the DUV; therefore, this approach only works for specific implementations of multipliers.
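The logical argument behind the recurrence-based approach can be demonstrated in Python: if a 2n-bit function f satisfies f(0, Y) = 0 and f(X + 1, Y) = f(X, Y) + Y, then by induction on X it equals X · Y. The sketch checks both conditions exhaustively for a hypothetical 4-bit gate-level multiplier and for a buggy variant with one partial-product bit removed. It illustrates only the induction argument; it does not build Fujita's miter, and exhaustive enumeration does not scale.

```python
def gate_level_multiplier(x, y, n, drop=None):
    """Hypothetical DUV: AND partial products accumulated row by row with full adders.

    drop=(i, j) removes the partial-product bit x_i AND y_j to model a design bug.
    """
    width = 2 * n
    acc = [0] * width
    for j in range(n):
        pp = [0] * width
        for i in range(n):
            if (i, j) != drop:
                pp[i + j] = ((x >> i) & 1) & ((y >> j) & 1)
        carry = 0
        for k in range(width):                  # one ripple-carry row per partial product
            a, b = acc[k], pp[k]
            acc[k] = a ^ b ^ carry
            carry = (a & b) | (carry & (a ^ b))
    return sum(bit << k for k, bit in enumerate(acc))

def buggy_multiplier(x, y, n):
    return gate_level_multiplier(x, y, n, drop=(n - 1, n - 1))

def satisfies_recurrence(f, n):
    """Check f(0, Y) = 0 and f(X + 1, Y) = f(X, Y) + Y on 2n-bit outputs."""
    mask = (1 << (2 * n)) - 1
    for y in range(2 ** n):
        if f(0, y, n) != 0:
            return False
        for x in range(2 ** n - 1):
            if f(x + 1, y, n) != (f(x, y, n) + y) & mask:
                return False
    return True
```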
2.3.3.2 FPUs
In [54, 102], two similar combinational equivalence checking methodologies have been proposed to verify FP fused-multiply-add (FMA) units; they are also applicable to FP adders and multipliers. FMA computes the function A · B + C on FP operands A, B, and C, while the multiplication A · B and the addition A + B are computed using the FMA as A · B + 0 and A · 1 + B, respectively. In addition to the multiplier block, the shifters of the FMA circuit are major challenges for state-of-the-art equivalence checkers. The FMA contains two shifters: the alignment shifter, which aligns the addend C to the product A · B, and the normalization shifter, which eliminates leading zeros in the intermediate result before rounding. Each of these building blocks leads to exponential runtimes with SAT and exponential sizes with BDDs.
In [54], the RTL implementation of the FPU is compared against a simple behavioral model described in an HDL, while in [102] it is compared against a C/C++ model. In both methodologies, the overall equivalence checking problem is split into subproblems to circumvent the difficulties posed by the shifters: in each subproblem, the shift amounts are restricted to small ranges, causing the shifters to collapse into simple wires. To circumvent the difficulties posed by the multiplier, it is removed from the cone of influence of the compared FPUs and verified independently by one of the approaches described in the previous subsection.
In [63], SEC is leveraged to verify the control aspects of an FPU, whereas CEC can verify only data-path aspects, as in [54, 102]. This is achieved by 1) assuming that the FPU produces the right result without accounting for control aspects such as interactions between instructions and resource conflicts, and 2) comparing the FPU against itself under different conditions. The RI of the miter is an instance of the FPU that is given a single random instruction in an empty pipeline, while the DUV is another instance of the FPU that is fed the same instruction as part of a sequence of random instructions. The miter checks only the results of the given instruction between RI and DUV. This setup enables the control features in the DUV to be invoked, and if the given instruction interferes with the execution of other instructions, the miter reports a mismatch. This special setup allows leveraging the structural similarities between the RI and the DUV (they are just two instances of the same design) to make SEC applicable where it otherwise would not be.
2.3.4 Theorem Proving
Typically, formal verification of FP circuits has been performed by combining theorem proving with automated checking techniques such as model checking and equivalence checking. In the following, a brief description of theorem proving is given; then the integration of these formal verification techniques to verify FP circuits is presented.
Theorem proving [49, 52] constructs a manually guided inductive formal proof of the correctness of the DUV. This proof is assisted by theorem provers, which provide libraries of many basic theories and lemmas, e.g., a bit-vector library. They can also integrate features such as an expressive specification language, a functional programming language, and powerful deductive techniques. These features allow expressing the specification as a set of lemmas, modeling the DUV as a composition of recursive functions, and asserting the correctness of the DUV with respect to the specification lemmas by deductive reasoning. A proof of a specification lemma (property) is obtained by combining a set of previously proved lemmas that together imply the desired property. Each lemma holds relative to some subset of previously proved lemmas; this prior knowledge is used to prove the new lemma. A given lemma usually focuses on one aspect of the design. Typically, early lemmas describe variable domains and properties of local data structures of the design, while later lemmas address more global aspects of the design. Using the prior information makes the later lemmas fairly easy to prove, since it reduces the proof complexity. When it is too difficult to prove a lemma directly, a skilled human verifier typically searches for additional supporting lemmas that make it feasible to prove the new one.
The strength of combining theorem proving with model checking and equivalence checking lies in using automated checkers to hide the bit-level details of the circuit and to resolve the control and scheduling constraints of the design, while theorem proving overcomes the scalability limitation of automated checking by enabling the construction of a high-level mathematical proof; this matters because most FPU specifications, including the IEEE Standard, are based on real numbers, not Booleans. This combination bypasses the scalability drawback of bit-level proof solvers such as BDDs and SAT, which are used within automated checking techniques. However, this combination still requires manual interaction and a deep understanding of the FPU design to construct a tedious mathematical proof that can be checked by a theorem prover. Because of this human intervention, such a verification process is also prone to unintentional human faults in the constructed proof. This concept has been used for more than two decades [52, 56, 74, 76, 79, 82, 88, 90, 91] for verifying FPUs and integer multipliers. An implementation of the concept is the formal verification framework used at Centaur [91], which ties together the ACL2 theorem prover [59] with non-commercial automated tools: the ABC equivalence checker [13], the MINISAT solver [37], and the IC3 model checker (PDR) [36], while the verification engine is built upon AIG and BDD symbolic models.
Another example illustrating this concept is the approach proposed by IBM [79] to verify the RTL of Booth integer multipliers. The multiplier takes two operands X and Y and, after n cycles, produces two bit vectors Sum and Carry such that Sum + Carry = X · Y. The correctness of this design is encoded by the following ACL2 theorem:
(defthm multiplier-correct
  (implies (and (integerp n)
                (<= 7 n))
           (equal (bv+ Sum Carry)
                  (bv (* (bv-val X) (bv-val Y)) L))))
The theorem states that the product of X and Y equals the sum of Sum and Carry. Because the correct output of the multiplier starts to stream out after 7 cycles of initialization and pipeline filling, the condition (<= 7 n) is added. The operation bv+ returns a bit vector representing the sum of its two arguments. The function bv-val returns the integer value of a bit vector, while (bv v L) returns a bit vector of length L representing the value v.
The overall proof strategy is based on decomposing this final theorem into simple properties that can be checked automatically by a model checker, while the theorem prover is used to prove the consistency between the decomposed properties and the final theorem. First, the final theorem is reduced to two major lemmas: 1) the correctness of the Booth encoder, and 2) the correctness of the subsequent compression stages. The Booth encoder lemma states that the sum of all the bit vectors coming out of the Booth encoder equals the product of the input operands. The second lemma states that the sum is preserved through the compression tree. Combining these two major lemmas proves the correctness of the multiplier.
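The two major lemmas can be mirrored at word level in Python, though not in ACL2 and not as IBM's actual models. The sketch assumes unsigned n-bit operands, standard radix-4 modified Booth digits $d_j = -2y_{2j+1} + y_{2j} + y_{2j-1}$ (with $y_{-1} = 0$), L = 2n-bit two's complement vectors, and a simple chain of 3:2 carry-save adders as the compression tree. Lemma 1 is checked as "the Booth vectors sum to X · Y modulo 2^L", and Lemma 2 as "compression preserves that sum", so Sum + Carry = X · Y follows.

```python
def booth_digits(y, n):
    """Radix-4 modified Booth digits d_j in {-2, ..., 2} of an unsigned n-bit y."""
    def bit(i):
        return (y >> i) & 1 if i >= 0 else 0
    # n // 2 + 1 digits, so the (zero) sign bit of y is covered
    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1) for j in range(n // 2 + 1)]

def booth_vectors(x, y, n, L):
    """Booth encoder output: L-bit two's complement vectors d_j * x * 4^j."""
    mask = (1 << L) - 1
    return [(d * x << (2 * j)) & mask for j, d in enumerate(booth_digits(y, n))]

def csa(a, b, c, L):
    """3:2 carry-save adder: a + b + c == s + carry (mod 2^L)."""
    mask = (1 << L) - 1
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & mask, carry & mask

def compress(vectors, L):
    """Reduce the Booth vectors to two L-bit vectors (Sum, Carry)."""
    vs = list(vectors)
    while len(vs) > 2:
        a, b, c = vs.pop(), vs.pop(), vs.pop()
        vs.extend(csa(a, b, c, L))
    while len(vs) < 2:
        vs.append(0)
    return vs[0], vs[1]

def check_lemmas(n):
    """Exhaustively check Lemma 1 and Lemma 2 for n-bit operands."""
    L = 2 * n
    mod = 1 << L
    for x in range(2 ** n):
        for y in range(2 ** n):
            vs = booth_vectors(x, y, n, L)
            if sum(vs) % mod != x * y:            # Lemma 1: Booth encoder
                return False
            s, c = compress(vs, L)
            if (s + c) % mod != x * y:            # Lemma 2 and the final theorem
                return False
    return True
```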
As these major lemmas are described over integer operations while the multiplier implementation is described at the bit level, three ACL2 models of the Booth encoder are created: a high-level model, a low-level model, and a bit model. The main difference between the low-level model and the bit model is that the low-level model uses arithmetic and propositional logic operations, while the bit model is constructed purely of propositional logic operations. To verify that the DUV satisfies the major Booth lemma, the theorem prover first verifies the high-level model against this lemma; then it proves the equivalence between the high-level and low-level models; finally, it compares the low-level model against the bit model. By performing these steps, the major Booth lemma is decomposed into simple lemmas (properties) that build the bit model, where each property states the correctness of generating one Booth vector. Thereby, it is feasible to check independently, using the model checker, that the DUV satisfies each property of the bit model.
To verify the implementation of the compression tree, which sums the generated Booth vectors into two bit vectors, the theorem prover proves its major lemma equivalent to a set of properties that describe the function of each carry-save adder in the compression tree, while each individual property is checked against the DUV by the model checker in practical time.
At IBM, the entire verification effort to perform this proof for a specific multiplier design required about a month: 21 eight-hour work days of human effort from a single experienced ACL2 user. About one-third of this time was spent finding properties that could be verified by the model checker and writing the necessary configuration files. The remaining two-thirds of the time was spent developing the necessary ACL2 proof, while the proof itself takes about 50 minutes to run.
3
RECURRENCE RELATIONS: SCALABLE VERIFICATION OF MULTIPLIERS
Although a lot of effort has been spent on verifying arithmetic designs, it is still a problem with no general, robust, automated solution. One major challenge is verifying large-scale multiplier circuits. For this purpose, this chapter revisits the idea of using functional properties of the multiplication function, which can be expressed by recurrence equations. Instead of proving the equivalence of the implementation and a specification, the verification task is then to show that the implementation satisfies the recurrence equation. We propose an approach that makes this verification scalable for large multiplier circuits. Based on a combined add/multiply recurrence equation, we can make efficient use of case splitting wrt. the partial products of the multiplier. As a result, the verification problem is split into simpler cases such that only a small part of the multiplier is checked in every case, thereby avoiding redundant checks among the cases. In every decomposed case, the generation and the addition of one partial product of the multiplier are checked. Since the multiplier is an addition tree that adds all partial products, checking the correctness of generating and adding every partial product by the proposed case splitting verifies the complete multiplier. As the multiplier size increases, the number of cases increases, while the complexity of checking one case remains almost the same. The proposed approach is able to verify a multiplier at the gate level without any information about its high-level specification or the internal structure of the netlist. In addition, it is a general technique that can be applied to various architectures. Overall, it allows verifying large-scale multipliers in practical time. The experiments show that the proposed approach is able to verify a 128-bit multiplier.
In the summary, the main contributions of this chapter are:
1. Explaining the theoretical background of equivalence checking based on
recurrence relations, which has not been discussed in detail before.
2. Enhancing the scalability of an equivalence checking technique that does
not require a golden reference, whereas such references are not available
in many practical cases.
43
44 recurrence relations: scalable verification of multipliers
3. Proposing a new automated decomposition method for the veriﬁcation
problem of the multiplier which overcomes the exponential complexity of
this problem.
3.1 equivalence checking based on recurrence relations
In classical equivalence checking, the Design Under Verification (DUV) and the Reference Implementation (RI) are combined into one circuit, called a miter, by XOR-ing every output pin of the DUV with the respective output pin of the RI and computing the OR of these XOR functions. An efficient miter should contain many similar internal nets, so that the equivalence checking problem can be partitioned into less complex sub-problems.
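The miter construction can be illustrated with a small Python sketch. It is an illustration only, under the assumption of tiny operand widths: circuits are modelled as functions from operand values to an output bit-vector, and the miter is evaluated exhaustively, whereas an equivalence checker proves symbolically that the miter output is constantly zero. The names `miter_output`, `equivalent`, and `shift_add` are hypothetical.

```python
from typing import Callable

Circuit = Callable[[int, int], int]  # maps operand values (X, Y) to an output bit-vector

def miter_output(f: Circuit, g: Circuit, x: int, y: int, width: int) -> int:
    """XOR each pair of corresponding output pins, then OR all XOR results."""
    a, b = f(x, y), g(x, y)
    out = 0
    for i in range(width):
        out |= ((a >> i) & 1) ^ ((b >> i) & 1)
    return out

def equivalent(f: Circuit, g: Circuit, n: int, width: int) -> bool:
    """f and g are equivalent iff the miter output is 0 for every n-bit input pair."""
    return all(miter_output(f, g, x, y, width) == 0
               for x in range(1 << n) for y in range(1 << n))

def shift_add(x: int, y: int) -> int:
    """Stand-in DUV: a 4-bit shift-and-add multiplier."""
    acc = 0
    for i in range(4):
        if (x >> i) & 1:
            acc += y << i
    return acc

print(equivalent(shift_add, lambda x, y: x * y, n=4, width=8))  # True
```

Here the plain Python product plays the role of a golden reference RI, which is exactly the ingredient the ECRC technique below avoids.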
In this section, we review an equivalence checking technique that builds the RI from the DUV itself, referred to as Equivalence Checking based on Recurrence Relations (ECRC). The technique needs neither a golden reference model nor any knowledge about the internal architecture of the DUV. At the same time, it yields a miter with a suitable number of internal equivalences. The ECRC technique is used to verify functions that can be defined by recurrence relations. These functions are also known as primitive recursive (p.r.) functions [27]. In the following, we briefly discuss p.r. functions before showing how to exploit recurrence relations for equivalence checking. Finally, we give an overview of Fujita's approach [42], which leverages equivalence checking based on a recurrence relation.
The p.r. functions are defined by the following theorem, as stated in [27]:
Theorem 2. Given p.r. functions $g: \mathbb{N}^k \to \mathbb{N}$ and $h: \mathbb{N}^{k+2} \to \mathbb{N}$, there is a unique $f: \mathbb{N}^{k+1} \to \mathbb{N}$ satisfying:
$$f(0, Y) = g(Y)$$
$$f(X + 1, Y) = h(f(X, Y), X, Y)$$
for all $Y \in \mathbb{N}^k$ and $X \in \mathbb{N}$.
Based on this theorem, a function f is uniquely defined as a p.r. function if it can be constructed from other p.r. functions g and h by composition and primitive recursion. Let $Y = (Y_0, Y_1, \dots, Y_{k-1})$ with $Y_i \in \mathbb{N}$, and let $X \in \mathbb{N}$. The basic p.r. functions are: 1) the constant zero $C(Y) = 0$, 2) the projection function $P_i(Y) = Y_i$, and 3) the successor function $S(X) = X + 1$. Well-known examples of p.r. functions are addition and multiplication. The addition function $Add(X, Y)$ is defined according to Theorem 2 by the relations $Add(0, Y) = Y$ and $Add(X + 1, Y) = S(Add(X, Y))$. Note that the first relation is given in terms of a projection function, while the second uses the successor function; both are basic p.r. functions. This implies that Add is a p.r. function as well. Based on this, the multiplication function $Mult(X, Y) = X \cdot Y$ is uniquely defined by $Mult(0, Y) = 0$ together with $Mult(X + 1, Y) = Add(Mult(X, Y), Y)$; hence, it is also a p.r. function.
[Figure 12: Miter of Multiplier using Fujita's Approach. LHS: an incrementer in front of the operand X of one DUV copy. RHS: a second DUV copy followed by an n-bit adder that adds Y to its outputs. Compare logic produces the miter output.]
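The primitive recursion scheme of Theorem 2 and the definitions of Add and Mult can be mirrored directly in code. The following Python sketch assumes k = 1 (a single extra argument Y); `prim_rec`, `succ`, `proj`, and `zero` are illustrative names, not part of the original work.

```python
from typing import Callable

def prim_rec(g: Callable[..., int], h: Callable[..., int]) -> Callable[..., int]:
    """Primitive recursion: f(0, *Y) = g(*Y), f(X + 1, *Y) = h(f(X, *Y), X, *Y)."""
    def f(x: int, *ys: int) -> int:
        acc = g(*ys)                  # f(0, Y)
        for i in range(x):            # unfold the recursion x times
            acc = h(acc, i, *ys)      # f(i + 1, Y) computed from f(i, Y)
        return acc
    return f

def succ(v: int) -> int:              # successor S(X) = X + 1
    return v + 1

def proj(y: int) -> int:              # projection P_0(Y) = Y_0 (here k = 1)
    return y

def zero(y: int) -> int:              # constant zero C(Y) = 0
    return 0

# Add(0, Y) = Y,  Add(X + 1, Y) = S(Add(X, Y))
add = prim_rec(proj, lambda acc, x, y: succ(acc))
# Mult(0, Y) = 0, Mult(X + 1, Y) = Add(Mult(X, Y), Y)
mult = prim_rec(zero, lambda acc, x, y: add(acc, y))

print(add(3, 4), mult(3, 4))  # 7 12
```

Note that `mult` only ever calls `add`, which only ever calls `succ`: each function is built from previously defined p.r. functions, exactly as in the argument above.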
The uniqueness of a p.r. function defined in this way gives rise to the ECRC technique. Given some implementation F of a p.r. function f, we can check that F correctly realizes f by checking that it satisfies the recurrence relations of f. In the case of the multiplication function, if we succeed in showing that F satisfies the relations $F(0, Y) = 0$ and $F(X + 1, Y) = Add(F(X, Y), Y)$ for all inputs X and Y, this implies that F is indeed a multiplication. For most p.r. functions, checking the first relation is relatively easy, while the second relation is verified by building a miter according to the recurrence relation and using standard equivalence checking techniques to show that both sides of the equation are equal for arbitrary inputs. Checking both relations is referred to as the ECRC technique.
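A minimal sketch of the ECRC idea for multiplication, assuming a 4-bit candidate implementation that can be simulated exhaustively. The candidate F is never compared against a reference multiplier, only against copies of itself through the two relations. `array_mult`, `buggy_mult` (with its injected bug), and `ecrc_check` are hypothetical examples.

```python
def array_mult(x: int, y: int, n: int = 4) -> int:
    """Candidate DUV: AND partial products x_i * y_j summed at weight 2^(i + j)."""
    return sum((((x >> i) & 1) & ((y >> j) & 1)) << (i + j)
               for i in range(n) for j in range(n))

def buggy_mult(x: int, y: int, n: int = 4) -> int:
    """Same DUV with a hypothetical bug: output bit 3 is flipped whenever X == 5."""
    z = array_mult(x, y, n)
    return z ^ (1 << 3) if x == 5 else z

def ecrc_check(F, n: int = 4) -> bool:
    """Check F(0, Y) = 0 and F(X + 1, Y) = F(X, Y) + Y for all n-bit operands."""
    ys = range(1 << n)
    initial = all(F(0, y) == 0 for y in ys)
    # X + 1 must still fit into n bits, so X ranges over 0 .. 2^n - 2.
    step = all(F(x + 1, y) == F(x, y) + y
               for x in range((1 << n) - 1) for y in ys)
    return initial and step

print(ecrc_check(array_mult), ecrc_check(buggy_mult))  # True False
```

In hardware, the `F(x, y) + y` term corresponds to the extra adder on the right-hand side of the miter, and `F(x + 1, y)` to an incrementer in front of the DUV.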
An example of ECRC has been proposed by Fujita in [42]. The approach verifies a multiplier circuit by checking the equivalence of both sides of the recurrence relation
$$Mult(X + 1, Y) = Mult(X, Y) + Y.$$
The resulting miter structure is shown in Figure 12. An implementation of the Left Hand Side (LHS) of the equation is built by adding an increment circuit to the first operand X of the DUV, while the Right Hand Side (RHS) is implemented by adding the second operand Y to the output Z. The equivalence checker compares the two circuits in order to prove that the DUV is indeed a multiplier.
Although the two sides use the same circuit for multiplication, they have different inputs: (X + 1, Y) on the LHS and (X, Y) on the RHS. Therefore, the equivalence checker is not able to find enough internal equivalence points. Fujita overcomes this problem by splitting the checking process into a series of sub-problems. The case splitting is based on the least significant bit $x_m$ of X that has the value zero in the increment function $X + 1 = (2^{n-1}x_{n-1} + 2^{n-2}x_{n-2} + \dots + 2x_1 + x_0) + 1$. Let $x_m$ be this lowest zero bit of X. Then all lower bits $x_i$, $0 \le i \le m - 1$, are equal to 1, and incrementing X by one inverts $x_m$ from zero to one and inverts all these lower bits from one to zero, e.g.,
$$(2^{n-1}x_{n-1} + \dots + 2^{m+1}x_{m+1} + 2^m \cdot 0 + 2^{m-1} + \dots + 4 + 2 + 1) + 1 = 2^{n-1}x_{n-1} + \dots + 2^{m+1}x_{m+1} + 2^m.$$
Fujita's approach leverages this property of the increment function to split the equivalence checking problem into simpler cases: in each case m, the increment circuit is replaced by simple inverters for the m + 1 lower bits of X, together with assigning one to all inputs $x_i$ for $i < m$ and zero to the variable $x_m$; the higher bits $x_j$ for $j > m$ are not modified. This simplification yields internal equivalence points, since every case m creates a similarity between the upper parts of the LHS and the RHS of the miter, where most of the input bits to these upper parts (i.e., $x_i$ for $i > m$) are the same on both sides. However, for multipliers larger than 16 bits, this simplification of the increment circuit does not lead to sufficient similarities to enable a scalable verification.
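The case split can be checked in a few lines of Python. The sketch below (an illustration with hypothetical helper names, not Fujita's implementation) confirms that, in each case m, inverting the m + 1 lower bits realizes the increment, and that the cases together cover every X whose increment fits into n bits.

```python
def case_input(upper: int, m: int) -> int:
    """Operand X of case m: x_m = 0, x_0 .. x_{m-1} = 1, upper bits x_{m+1}.. free."""
    return (upper << (m + 1)) | ((1 << m) - 1)

def invert_low_bits(x: int, m: int) -> int:
    """Replacement for the incrementer in case m: invert x_0 .. x_m."""
    return x ^ ((1 << (m + 1)) - 1)

def check_case_split(n: int) -> bool:
    seen = []
    for m in range(n):
        for upper in range(1 << (n - m - 1)):
            x = case_input(upper, m)
            if invert_low_bits(x, m) != x + 1:   # inverters must realize X + 1
                return False
            seen.append(x)
    # The cases partition every X whose increment fits into n bits (X < 2^n - 1).
    return sorted(seen) == list(range((1 << n) - 1))

print(check_case_split(8))  # True
```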
In the next section, we introduce a new approach which increases the scala-
bility of the ECRC technique to verify multipliers larger than 16 bits.
3.2 checking partial product approach
The ECRC has a very useful feature that classical equivalence checking techniques lack: it does not assume that a golden reference is available to compare against the DUV. Moreover, it does not require structural knowledge about the DUV to find a suitable reference that shares many structural similarities with the DUV. For these reasons, we are interested in enhancing the scalability of the ECRC. We propose an approach that exploits the features of the ECRC for the verification of large integer multipliers.
In the following, after some basic definitions of the multiplication function and the combined multiply-add function, we give an overview of the proposed approach. Then, we present its theoretical part. Finally, its implementation is introduced.
3.2.1 Basic Notions
We denote an integer multiplier as $Z = X \cdot Y$, where X and Y are the integer operands and Z is the integer result of the multiplier. These integer variables are represented as vectors of Boolean variables such that $X = \sum_{i=0}^{n-1} 2^i x_i$, $Y = \sum_{i=0}^{n-1} 2^i y_i$, and $Z = \sum_{i=0}^{2n-1} 2^i z_i$, where n is the size of each operand of the multiplication function and $2^i$ is the weight of each Boolean variable. The multiplier $Z = X \cdot Y$ can be expressed on the bit level as:
$$\sum_{i=0}^{2n-1} 2^i z_i = \sum_{i=0}^{n-1} 2^i x_i \cdot \sum_{i=0}^{n-1} 2^i y_i.$$
Typically, the multiplication of two operands is performed by generating partial products, which are then summed using an addition tree as shown in Figure 13.
The partial product is a bitwise multiplication $pp_{ci,ri} = y_{ci-ri} \cdot x_{ri}$, where ri and ci are the row and column indices of the partial product in the addition tree. The partial product $pp_{ci,ri}$ has the weight $2^{ci}$, which is the product of the weights of $x_{ri}$ and $y_{ci-ri}$; e.g., Figure 13 shows the partial product $pp_{4,4} = y_0 \cdot x_4$, which has the weight $2^4$. For an n-bit multiplier, we define
$$pp_{ci,ri} = \begin{cases} y_{ci-ri} \cdot x_{ri} & 0 \le ci - ri \le n - 1 \\ 0 & \text{otherwise.} \end{cases}$$
A partial products generator is the part of the multiplier that produces partial products by multiplying each bit of the first operand by all bits of the second operand, such that the sum of the generated partial products satisfies the equation
$$\sum_{ci=0}^{2n-2} \Big(2^{ci} \cdot \sum_{ri=0}^{n-1} pp_{ci,ri}\Big) = \sum_{i=0}^{n-1} 2^i x_i \cdot \sum_{i=0}^{n-1} 2^i y_i.$$
The addition tree is the part of the multiplier that compresses the partial products to generate the multiplier result Z. The function of the addition tree is defined by the equation
$$\sum_{i=0}^{2n-1} 2^i z_i = \sum_{ci=0}^{2n-2} \Big(2^{ci} \cdot \sum_{ri=0}^{n-1} pp_{ci,ri}\Big).$$
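As a quick sanity check of these definitions, the following Python sketch (illustrative; `pp` and `generator_sum` are hypothetical names) implements $pp_{ci,ri}$ including the zero case and confirms the generator equation exhaustively for a 4-bit multiplier.

```python
def pp(ci: int, ri: int, x: int, y: int, n: int) -> int:
    """pp_{ci,ri} = y_{ci-ri} * x_{ri} if 0 <= ci - ri <= n - 1, otherwise 0."""
    j = ci - ri
    if not 0 <= j <= n - 1:
        return 0
    return ((y >> j) & 1) & ((x >> ri) & 1)

def generator_sum(x: int, y: int, n: int) -> int:
    """Sum over columns ci = 0 .. 2n-2 of 2^ci times the column's partial products."""
    return sum((1 << ci) * sum(pp(ci, ri, x, y, n) for ri in range(n))
               for ci in range(2 * n - 1))

n = 4
print(all(generator_sum(x, y, n) == x * y
          for x in range(1 << n) for y in range(1 << n)))  # True
```

For instance, `pp(4, 4, x, y, 8)` evaluates $y_0 \cdot x_4$, the highlighted partial product of the 8-bit example.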
Note that $pp_{ci,ri}$ is equal to zero if $ci - ri < 0$ or $ci - ri > n - 1$. The tree has two indices, ri and ci, where ri is the index of the tree rows and ci is the index of the tree columns. As shown in Figure 13, partial products are ordered in the tree such that those of the same weight $2^{ci}$ belong to the same column ci. The output bit $z_{ci}$ also belongs to column ci, since it has the weight $2^{ci}$. This description allows us to define the addition tree as a collection of sets, where each set $CE_{ci} = \{pp_{ci,0}, pp_{ci,1}, \dots, pp_{ci,n-1}\}$ consists of the partial products of column ci in the tree, and each set $CE_{ci}$ is related to the output variable $z_{ci}$.
The addition tree equation defines the addition of the elements in $CE_{ci}$ as well as the function of its related output variable $z_{ci}$. The addition of the partial products in each set $CE_{ci}$ generates a sum bit and carry bits, which are propagated to the partial products of the next column $CE_{ci+1}$. Because of these carry bits, the elements of $CE_{ci}$ cannot be summed independently of the carry bits propagated from adding the elements of the previous columns $CE_{ci-i}$, for $0 < i \le ci$. Summing the partial products in $CE_{ci}$ and the carry bits propagated from the previous column yields the output bit $z_{ci}$ and new carry bits for the next column ci + 1. We denote the integer quantity of these carry bits as $CO_{ci+1}$, and the quantity of carry bits that propagate into column ci as $CO_{ci}$; e.g., in Figure 13, $CO_6$ is an integer variable representing the value of the carry bits that propagate into column 6, and $CO_7$ is the value of the carry bits that are generated in column 6 and propagate into column 7. Based on this description, we formulate the addition tree equation of column ci as
$$z_{ci} + 2 \cdot CO_{ci+1} = CO_{ci} + \sum_{ri=0}^{n-1} pp_{ci,ri}. \qquad (1)$$
[Figure 13: Addition Tree of 8-bit Multiplier. The operands x_7 ... x_0 and y_7 ... y_0 produce eight rows (ROW INDEX (ri)) of partial products y_j x_i, arranged so that products of equal weight share a column (COLUMN INDEX (ci)) above the outputs z_15 ... z_0; the partial product pp_{4,4} and the carry quantities CO_6 and CO_7 are highlighted.]
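Equation (1) can be evaluated column by column in software: each column total is split into its parity $z_{ci}$ and the carry quantity $CO_{ci+1}$. The following Python sketch is an arithmetic model of an ideal addition tree, not of any particular netlist; the names are hypothetical.

```python
def pp(ci: int, ri: int, x: int, y: int, n: int) -> int:
    """pp_{ci,ri} = y_{ci-ri} * x_{ri} inside the tree, 0 outside (as in the text)."""
    j = ci - ri
    return ((y >> j) & 1) & ((x >> ri) & 1) if 0 <= j <= n - 1 else 0

def column_add(x: int, y: int, n: int):
    """Evaluate Equation (1) column by column; returns (z bits, CO values)."""
    z, co = [], [0]                       # CO_0 = 0: nothing enters column 0
    for ci in range(2 * n):
        total = co[ci] + sum(pp(ci, ri, x, y, n) for ri in range(n))
        z.append(total & 1)               # z_ci: parity of the column total
        co.append(total >> 1)             # CO_{ci+1}, so total = z_ci + 2 * CO_{ci+1}
    return z, co

z, co = column_add(13, 11, 4)
print(sum(b << i for i, b in enumerate(z)), co[-1])  # 143 0
```

The final carry $CO_{2n}$ is always zero because $X \cdot Y < 2^{2n}$, so the bits $z_0, \dots, z_{2n-1}$ alone encode the product.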
So far, we have presented some definitions related to the multiplication function. In the following, we prove that not only the multiplication function is a p.r. function, but also its partial functions that generate and add partial products. This allows us to verify the generation and addition of every partial product in a multiplier using the ECRC technique (see Section 3.1). Verifying a multiplier then boils down to checking the whole addition tree. For this purpose, we introduce the combined multiply-add (CMA) function, since it can define the generation and the addition of a partial product, and we show that the CMA is a p.r. function; therefore, its implementation can be verified using the ECRC.
Definition 20 (Combined Multiply-Add (CMA)). The combined multiply-add function $CMA(x_0, x_1, Y) = x_0 \cdot x_1 + Y$ combines the bitwise multiplication function and the addition function, where Y is an integer quantity and $x_0$ and $x_1$ are Boolean variables.
Lemma 3. The CMA function $CMA(x_0, x_1, Y)$ is a p.r. function.
Proof. Based on Theorem 2, the CMA function is defined by the relations:
$$CMA(0, x_1, Y) = Y$$
$$CMA(x_0 + 1, x_1, Y) = Add(CMA(x_0, x_1, Y), x_1).$$
The first relation is a projection, while the addition function Add of the second relation is known to be a p.r. function. This implies that the CMA function is itself a p.r. function. □
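Because $x_0$ is Boolean, the recurrence relation of the CMA function only has to be instantiated for the step from $x_0 = 0$ to $x_0 = 1$. The small Python sketch below (hypothetical names, with Add modelled as integer addition) checks both relations for a range of Y values.

```python
def cma(x0: int, x1: int, y: int) -> int:
    """CMA(x0, x1, Y) = x0 * x1 + Y with Boolean x0, x1 and integer Y."""
    return x0 * x1 + y

def add(a: int, b: int) -> int:
    return a + b

def cma_relations_hold(max_y: int = 64) -> bool:
    for x1 in (0, 1):
        for y in range(max_y):
            if cma(0, x1, y) != y:                          # CMA(0, x1, Y) = Y
                return False
            # x0 is Boolean, so the recurrence is only needed for the step 0 -> 1
            if cma(0 + 1, x1, y) != add(cma(0, x1, y), x1):
                return False
    return True

print(cma_relations_hold())  # True
```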
3.2.2 Overview of the Approach
As demonstrated in the previous subsection, a multiplier can be defined as an addition tree of partial products that are generated by bitwise multiplications. Based on that, we propose an approach that applies case splitting to check, in each case, the generation of only one partial product as well as the correctness of adding this partial product to the other partial products of the multiplier. We refer to this partial product as the Partial Product Under Verification (PPV). The combination of the generation and the addition of the PPV can be defined using the CMA function from Definition 20. Since the CMA function is a p.r. function, its implementation can be checked using the ECRC.
The approach applies the ECRC to the cone that is influenced by generating the PPV and adding it to the other partial products. This PPV cone can be described using the addition tree and the addition tree equation defined in Subsection 3.2.1. As stated there, the PPV belongs to a set $CE_{cv}$ of a column cv in the addition tree, and the function of the related output $z_{cv}$ is formulated by the addition tree equation. Based on this description, we define the PPV cone as the logic gates in the functional support of $z_{cv}$ and $z_{cv+1}$ that represent the generation and the addition of the following partial products:
1. Those in the sets $CE_{cv-i}$, for $0 < i \le cv$, which produce carry bits that are added to the PPV.
2. All partial products in the set $CE_{cv}$, including the PPV itself, since they are involved in the addition functions that together produce the output $z_{cv}$.
3. The partial products of the set $CE_{cv+1}$, which receive the carry bits resulting from adding the PPV to the other partial products.
The approach decomposes the verification into $n^2$ cases, which corresponds to the number of partial products of an n-bit multiplier. In every case, it extracts the PPV cone and applies equivalence checking based on the CMA recurrence relation. This case split leads to small differences between the compared implementations of the constructed miters; thereby, the approach avoids redundant checks between the cases and keeps the effort for each case almost independent of the multiplier size. In the following, we refer to this approach as the Checking Partial Product (CPP) approach.
3.2.3 Mathematical Formulations
In each splitting case, the CPP approach aims to verify the combination of generating and adding the PPV, which is a CMA function $x_0 \cdot x_1 + Y$. The approach verifies this function by extracting its implementation (the PPV cone) from the DUV and checking the consistency of the cone with the CMA relations. To formulate a mathematical expression for the PPV cone, consider Equation (1). Since the PPV is a partial product, it can be expressed as $pp_{cv,rv} = x_{rv} \cdot y_{cv-rv}$, where cv and rv are the indices of the PPV, which range over the column index ci and the row index ri, respectively. The PPV belongs to a column of the addition tree whose output function $z_{cv}$ is formulated by the addition tree equation
$$z_{cv} + 2 \cdot CO_{cv+1} = CO_{cv} + \sum_{ri=0,\, ri \ne rv}^{n-1} pp_{cv,ri} + pp_{cv,rv}. \qquad (2)$$
As can be seen from Equation (2), adding the PPV contributes to the carry $CO_{cv+1}$ that propagates into column cv + 1. Therefore, the addition of the PPV influences column cv + 1, whose addition tree equation is
$$z_{cv+1} + 2 \cdot CO_{cv+2} = CO_{cv+1} + \sum_{ri=0}^{n-1} pp_{cv+1,ri}. \qquad (3)$$
Because the structural relations between the addition tree equations are summations, the PPV cone is formulated by adding Equation (2), which describes the cone of $z_{cv}$, to Equation (3), which describes the cone of $z_{cv+1}$. Since column cv + 1 has twice the weight of column cv, Equation (3) is multiplied by 2 before it is added to Equation (2), which yields the PPV cone equation
$$z_{cv} + 2 \cdot z_{cv+1} + 4 \cdot CO_{cv+2} = CO_{cv} + \sum_{ri=0,\, ri \ne rv}^{n-1} pp_{cv,ri} + \sum_{ri=0}^{n-1} 2 \cdot pp_{cv+1,ri} + pp_{cv,rv}. \qquad (4)$$
Note that the term $2 \cdot CO_{cv+1}$, which appears on both sides of the sum, cancels out in Equation (4). We refer to the integer quantity that is added to the PPV as
$$Q_{rv} = CO_{cv} + \sum_{ri=0,\, ri \ne rv}^{n-1} pp_{cv,ri} + \sum_{ri=0}^{n-1} 2 \cdot pp_{cv+1,ri},$$
and we substitute the PPV term $pp_{cv,rv}$ by $x_{rv} \cdot y_{cv-rv}$, which reformulates Equation (4) as
$$z_{cv} + 2 \cdot z_{cv+1} + 4 \cdot CO_{cv+2} = x_{rv} \cdot y_{cv-rv} + Q_{rv}. \qquad (5)$$
The right-hand side of Equation (5) is the mathematical expression of the PPV cone. Note that the PPV cone is a bitwise multiplication followed by an addition; therefore, it is an implementation of the CMA function. By replacing the variable $x_0$ of the CMA relations (Definition 20) with $x_{rv}$, the variable $x_1$ with $y_{cv-rv}$, and the variable Y with $Q_{rv}$, we apply the CMA relations to the PPV cone, so that Equation (5) becomes
$$z_{cv} + 2 \cdot z_{cv+1} + 4 \cdot CO_{cv+2} = CMA(x_{rv}, y_{cv-rv}, Q_{rv}).$$
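The derivation from Equation (1) to Equation (5) can be cross-checked numerically. The Python sketch below (an ideal column-wise addition tree with hypothetical helper names) computes the carry quantities $CO_{ci}$, builds $Q_{rv}$ as defined above, and confirms Equation (5) for every PPV of a 3-bit multiplier and all inputs.

```python
def pp(ci: int, ri: int, x: int, y: int, n: int) -> int:
    j = ci - ri
    return ((y >> j) & 1) & ((x >> ri) & 1) if 0 <= j <= n - 1 else 0

def column_trace(x: int, y: int, n: int):
    """z[ci] and co[ci] (= CO_ci) from Equation (1), column by column."""
    z, co = [], [0]
    for ci in range(2 * n):
        total = co[ci] + sum(pp(ci, ri, x, y, n) for ri in range(n))
        z.append(total & 1)
        co.append(total >> 1)
    return z, co

def eq5_holds(x: int, y: int, n: int, cv: int, rv: int) -> bool:
    """Check z_cv + 2 z_{cv+1} + 4 CO_{cv+2} = x_rv * y_{cv-rv} + Q_rv."""
    z, co = column_trace(x, y, n)
    q = (co[cv]
         + sum(pp(cv, ri, x, y, n) for ri in range(n) if ri != rv)
         + sum(2 * pp(cv + 1, ri, x, y, n) for ri in range(n)))
    lhs = z[cv] + 2 * z[cv + 1] + 4 * co[cv + 2]
    ppv = ((x >> rv) & 1) * ((y >> (cv - rv)) & 1)   # x_rv * y_{cv-rv}
    return lhs == ppv + q

n = 3
print(all(eq5_holds(x, y, n, rv + j, rv)
          for x in range(1 << n) for y in range(1 << n)
          for rv in range(n) for j in range(n)))  # True
```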
The initial relation of the PPV is $CMA(0, y_{cv-rv}, Q_{rv}) = Q_{rv}$, obtained by assigning zero to $x_{rv}$. Since the value of $Q_{rv}$ depends on the values of other partial products of the multiplier, checking the initial relation for every partial product separately is not trivial. The approach overcomes this problem by checking the initial relations of all partial products together. Because each initial relation requires $x_{rv} = 0$, all initial relations are checked together by assigning zero to all bits $x_i$, which implies $X = \sum_{i=0}^{n-1} 2^i x_i = 0$. At X = 0, the approach trivially checks that the result of the DUV is equal to zero, since $0 \cdot Y = 0$. Thus, checking the initial CMA relations together amounts to checking the initial relation of the multiplier.
After verifying the initial relation, the recurrence relation of the PPV is checked:
$$CMA(x_{rv} + 1, y_{cv-rv}, Q_{rv}) = CMA(x_{rv}, y_{cv-rv}, Q_{rv}) + y_{cv-rv}. \qquad (6)$$
The approach checks this recurrence relation for every PPV cone using the ECRC. It builds a miter that compares an implementation of the left side of the recurrence relation against another implementation representing the right side. The implementation details of the miter are explained in the next subsection.
By checking the generation and the addition of all partial products in the multiplier, the approach either confirms the consistency of the DUV with the multiplication function or determines the part of the DUV (the PPV cone) that contains a bug.
3.2.4 Implementation
Implementing the recurrence relation of Equation (6) as it is in every splitting case causes redundant checks. The operation $x_{rv} + 1$ propagates a carry bit of value one to the bits of X with higher weights, i.e., $x_{rv+1}, x_{rv+2}, \dots, x_{n-1}$. Since these bits are incremented in other splitting cases anyway, it is redundant to implement the operation $x_{rv} + 1$ as an addition. To simplify the implementation of the equivalence checking and remove these redundant checks, the approach assigns zero to the bit $x_{rv}$; in this case, $x_{rv} + 1$ does not generate a carry bit. Therefore, the one-bit adder $x_{rv} + 1$ on the left side of Equation (6) is implemented by the XOR function $x_{rv} \oplus 1$, i.e., the inversion $\bar{x}_{rv}$. Because of this optimization, no carry propagates from $x_{rv} + 1$ to higher bits, and building the equivalence checking miter becomes easier. This optimization does not affect the soundness of the verification process, since for $x_{rv} = 0$ both values of the PPV are still checked: $pp_{cv,rv} = 1 \cdot y_{cv-rv}$ in the implementation of the left side of Equation (6) and $pp_{cv,rv} = 0 \cdot y_{cv-rv}$ in the implementation of the right side. Thus, this assignment does not affect the coverage of the approach.
We implemented our approach by integrating it into the tool ABC presented in [72]. The approach creates $n^2$ splitting cases. In every case, it builds a miter circuit based on Equation (6) that checks the generation and the addition of one partial product, the PPV, by performing the following steps:
1. It extracts the PPV cone by collecting the gates that are connected to the output bits $z_{cv}$ and $z_{cv+1}$, since these gates represent the function $z_{cv} + 2 \cdot z_{cv+1}$, which results from adding the PPV to the other partial products. This task is called partial products decomposition and is shown on the left side of Figure 14.
2. It searches the extracted cone for the gate that generates the PPV; in addition, it makes two copies of the cone and assigns zero to the bit $x_{rv}$ in both copies. The variable $x_2$ in the example of Figure 15 (upper and lower part) is an instance of $x_{rv}$. We refer to this step as identifying the PPV, shown in the upper part of Figure 14.
3. Next, it creates the left side (LHS) of the miter: as shown in Figure 14, it takes one copy of the PPV cone and explicitly inverts the bit $x_{rv}$ at the input of the identified PPV gate. An instance of this step is shown in the upper part of Figure 15, where the bit $x_2$ of the partial product $x_2 \cdot y_2$ is inverted.
4. To create the right side (RHS), which is also shown in Figure 14, the outputs $z_{cv}$ and $z_{cv+1}$ of the second copy of the PPV cone are added to the input $y_{cv-rv}$ using an external 2-bit adder. This can be seen in the lower part of Figure 15, where the input bit $y_2$ is added to the output bits $z_4 + 2 \cdot z_5$.
5. The construction of the recurrence miter is completed, as shown in Figure 14, with the usual comparison logic: XOR-ing the two output bits of the left side with the two output bits of the external adder on the right side and OR-ing the results.
[Figure 14: Detailed Flow of CPP Approach. Circuit Netlist → Partial Products Decomposition → PPV Cones → Recurrence Miters → CEC → Equivalence / Inconsistency; each recurrence miter is built from a PPV cone by Identifying PPV Gate, Create LHS, Create RHS, and Create Miter.]
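The miter of steps 1 to 5 can be imitated with a behavioural model. The Python sketch below is a simplification under explicit assumptions: the DUV is modelled arithmetically instead of as a gate netlist, the PPV cone is represented by the output bits $z_{cv}$ and $z_{cv+1}$ of the whole product, and equivalence is checked by exhaustive simulation instead of CEC. It also illustrates the fault localization discussed in Subsection 3.2.5: a hypothetical fault that turns one partial-product AND into an OR makes exactly the case of that PPV non-equivalent. All names (`duv`, `cpp_case`, `failing_cases`) are hypothetical.

```python
from typing import List, Optional, Tuple

Pos = Tuple[int, int]  # (i, j): partial product x_i * y_j, located in column i + j

def duv(x: int, y: int, n: int, ppv: Optional[Pos] = None,
        fault: Optional[Pos] = None) -> int:
    """Behavioural stand-in for the DUV: all partial products summed at weight 2^(i+j).
    ppv:   the PPV gate reads NOT x_i instead of x_i (left side of the miter).
    fault: hypothetical bug, that partial product is built with OR instead of AND."""
    total = 0
    for i in range(n):
        for j in range(n):
            xi, yj = (x >> i) & 1, (y >> j) & 1
            if ppv == (i, j):
                xi ^= 1                               # explicit inverter on x_rv
            bit = (xi | yj) if fault == (i, j) else (xi & yj)
            total += bit << (i + j)
    return total

def cpp_case(n: int, rv: int, j: int, fault: Optional[Pos] = None) -> bool:
    """One splitting case for the PPV x_rv * y_j: compare z_cv + 2 z_{cv+1} of both sides."""
    cv = rv + j
    for x in range(1 << n):
        if (x >> rv) & 1:
            continue                                  # the approach fixes x_rv = 0
        for y in range(1 << n):
            lhs = (duv(x, y, n, ppv=(rv, j), fault=fault) >> cv) & 3
            # RHS: unmodified copy, then the external 2-bit adder adds y_j
            rhs = (((duv(x, y, n, fault=fault) >> cv) & 3) + ((y >> j) & 1)) & 3
            if lhs != rhs:
                return False
    return True

def failing_cases(n: int, fault: Optional[Pos] = None) -> List[Pos]:
    """Run all n^2 independent cases; non-equivalent ones localize the fault."""
    return [(rv, j) for rv in range(n) for j in range(n)
            if not cpp_case(n, rv, j, fault)]

print(failing_cases(3))                 # []
print(failing_cases(3, fault=(1, 2)))   # [(1, 2)]
```

Since the cases share no state, `failing_cases` could equally dispatch them to parallel workers; the sketch runs them serially for simplicity.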
[Figure 15: Inputs of Miter for Checking pp_{4,2} = y_2 · x_2. Upper part (LHS): the 6-bit partial-product array with x_2 = 0, where the PPV is generated as y_2 · x̄_2 and the outputs are z'_4, z'_5. Lower part (RHS): the same array with x_2 = 0, where y_2 is added to the outputs z_4, z_5.]
Finally, as shown in the last block of Figure 14, the constructed miters are given independently to the Combinational Equivalence Checking (CEC) engine of ABC, which reports either equivalence or an inconsistency between the compared sides of each miter.
As the experimental evaluation demonstrates, the CPP approach verifies multipliers of various architectures with significantly lower runtimes.
3.2.5 Discussion
Miters constructed based on the recurrence relation of Equation (6) offer a large number of internal equivalences, so that equivalence checking techniques like rewriting and fraiging [13] succeed in proving the equivalence (or non-equivalence) of many cases without resorting to the SAT solver. This explains the effectiveness of the CPP approach, even when it is applied to multipliers larger than 32 bits.
[Figure 16: Search of CPP Approach for pp_{4,2} = y_2 · x_2. The 6-bit partial-product array with the z4 cone and the z3 cone marked.]
The reasons for the high efficiency of the constructed miters are:
1. The lower parts of both sides of the miter, which produce the carry $CO_{cv}$, are structurally identical.
2. The compared implementations differ only in the generation of the PPV and in the way $CO_{cv+1}$ propagates. On the left side, $CO_{cv+1}$ is added to the elements of $CE_{cv+1}$ entirely through the PPV cone, while on the right side, $CO_{cv+1}$ is separated into two quantities: one quantity is added using the logic gates of the PPV cone, whereas the second quantity is added through the external adder. This difference is small, since it is restricted to the part of the PPV cone that adds $CO_{cv+1}$ to the partial products of the set $CE_{cv+1}$.
A further useful feature of the CPP approach is its ability to identify the location of a fault, if one exists in the DUV. Once the two sides of the miter of a PPV cone are proved to be non-equivalent, it follows that the fault lies in the gates that belong to this PPV cone and not in the previously checked cones. This feature helps in debugging incorrect multipliers, because it determines a narrow boundary for a set of gates of which at least one is faulty. Moreover, the constructed miters can be checked in parallel or serially, since the cases are independent of each other. We use parallelism for multipliers larger than 32 bits.
Figure 17: Wallace Tree Accumulation
3.2.6 Limitations
However, the approach suffers from two main limitations that restrict its applicability to specific architectures. The first limitation is related to the dependency between the gates that generate partial products, while the second stems from an assumption about the propagation of carry bits within the multiplier. In the following, we explain these two limitations in detail. We call the first one the partial products dependency limitation and the second one the carry propagation assumption limitation.
Partial Products Dependency Limitation
To clarify this limitation, consider Step 2 of the approach, which searches for the gate of the partial product $pp_{cv,rv}$. The search is done by comparing the gates of the cone of $z_{cv}$ with those of the cone of $z_{cv-1}$. The gate sought by the approach belongs to the cone of $z_{cv}$, is not in the cone of $z_{cv-1}$, and has the input bit $x_{rv}$. Figure 16 shows an instance of this search, where the gate that generates the partial product $pp_{4,2}$ is identified since it is in the cone of $z_4$ but not in the cone of $z_3$, and it has the input variable $x_2$. After finding the gate of the PPV, the approach inverts the input bit $x_{rv}$ of this gate. This inversion is not guaranteed to affect only the partial product $pp_{cv,rv}$; it may also affect other partial products, which is the case for multipliers based on Booth recoding and for optimized multipliers. Thus, if the implementation of the partial products in a multiplier does not compute each product independently (as an AND gate), the CPP approach fails to handle such a multiplier.
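The cone comparison used to locate the PPV gate can be sketched on a toy netlist. The Python code below assumes a hand-written 2-bit array multiplier whose partial products are plain AND gates, which is exactly the situation in which the search is reliable; the netlist, gate names, and helper functions are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical 2-bit array multiplier: gate name -> (operator, fan-in signals).
NETLIST: Dict[str, Tuple[str, List[str]]] = {
    "pp_x0y0": ("AND", ["x0", "y0"]),
    "pp_x1y0": ("AND", ["x1", "y0"]),
    "pp_x0y1": ("AND", ["x0", "y1"]),
    "pp_x1y1": ("AND", ["x1", "y1"]),
    "z0": ("BUF", ["pp_x0y0"]),
    "z1": ("XOR", ["pp_x1y0", "pp_x0y1"]),
    "c1": ("AND", ["pp_x1y0", "pp_x0y1"]),   # carry from column 1 into column 2
    "z2": ("XOR", ["pp_x1y1", "c1"]),
    "z3": ("AND", ["pp_x1y1", "c1"]),
}

def cone(signal: str) -> Set[str]:
    """All gates in the transitive fan-in of a signal (primary inputs excluded)."""
    seen: Set[str] = set()
    stack = [signal]
    while stack:
        s = stack.pop()
        if s in NETLIST and s not in seen:
            seen.add(s)
            stack.extend(NETLIST[s][1])
    return seen

def find_ppv_gate(cv: int, rv: int) -> List[str]:
    """Gates in cone(z_cv) but not in cone(z_{cv-1}) that read x_rv directly."""
    lower = cone(f"z{cv - 1}") if cv > 0 else set()
    return sorted(g for g in cone(f"z{cv}") - lower if f"x{rv}" in NETLIST[g][1])

print(find_ppv_gate(1, 1))  # ['pp_x1y0'], i.e. pp_{1,1} = y_0 * x_1
```

With Booth recoding, a single gate reading $x_{rv}$ would feed several partial products, so inverting its input would no longer isolate one PPV.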
Carry Propagation Assumption Limitation
The second limitation is related to the assumption that the carry bits generated by adding $pp_{cv,rv}$ ($CO_{cv+1}$) propagate to the other partial products of the multiplier through a specific path. On this path, these carry bits are fed as inputs to be added to the elements of the next column ($CE_{cv+1}$), resulting in another group of carry bits ($CO_{cv+2}$). Because of this assumption, the approach checks in each splitting case only the correctness of the propagation of the $CO_{cv+1}$ bits to the $CE_{cv+1}$ elements. It skips the check of the propagation of the carry bits $CO_{cv+1+i}$ through the elements of the sets $CE_{cv+1+i}$, for all $0 < i \le 2n - cv - 2$, since these propagations are tested in later splitting cases.
This assumption is valid for classical multiplier architectures such as those based on a Wallace tree or a Dadda tree. As shown in Figure 17, a Wallace tree accumulates the partial products of the addition tree into two arrays of bits, where partial products and resulting bits are symbolized as dots. The accumulation is performed by adding two or three elements of a column at a time, where each addition results in two bits: a sum bit, which replaces the added elements in the column, and a carry bit, which propagates to the next column. This implies that all carry bits generated by adding the elements of a column are fed to the next column. Therefore, carry bits in a Wallace tree propagate along paths that are consistent with the carry propagation assumption.
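The column-wise carry flow assumed here can be illustrated with a simplified Wallace-style reduction in Python. As an assumption for brevity, it applies only full adders (3:2 counters) greedily per stage, whereas real Wallace and Dadda trees also use half adders and specific schedules; the function name `wallace_rows` is hypothetical. Every carry produced while compressing column c is placed in column c + 1, and the two remaining rows still sum to X · Y.

```python
from typing import List, Tuple

def wallace_rows(x: int, y: int, n: int) -> Tuple[int, int]:
    """Compress the partial-product columns of x * y until each column holds <= 2 bits."""
    width = 2 * n + 1                       # spare top column; it only ever holds 0-valued bits
    cols: List[List[int]] = [[] for _ in range(width)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((x >> i) & 1) & ((y >> j) & 1))
    while max(len(c) for c in cols) > 2:
        nxt: List[List[int]] = [[] for _ in range(width)]
        for c, bits in enumerate(cols):
            k = 0
            while len(bits) - k >= 3:       # full adder (3:2 counter) inside column c
                a, b, d = bits[k], bits[k + 1], bits[k + 2]
                nxt[c].append(a ^ b ^ d)    # sum bit stays in column c
                if c + 1 < width:           # carry bit goes to column c + 1 only
                    nxt[c + 1].append((a & b) | (a & d) | (b & d))
                k += 3
            nxt[c].extend(bits[k:])         # leftover bits pass to the next stage
        cols = nxt
    row0 = sum(col[0] << c for c, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << c for c, col in enumerate(cols) if len(col) > 1)
    return row0, row1

r0, r1 = wallace_rows(13, 11, 4)
print(r0 + r1)  # 143
```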
However, the assumption is not valid for multiplier architectures in which some carry bits of $CO_{cv+1}$ propagate without being added to the elements of $CE_{cv+1}$. In this case, the correctness of the generation and propagation of these carry bits is not checked by the CPP approach. Consequently: 1) the approach cannot verify architectures that do not satisfy the carry propagation assumption, and 2) it can only find faults of a certain class, which excludes faults that invalidate this assumption.
3.3 experimental results
Our approach is built on top of the ABC tool [13]. ABC compiles the miter circuit into an And-Inverter Graph (AIG) and applies structural reduction techniques like fraiging and rewriting. These techniques effectively reduce the size of the AIG if the miter has many similar internal nodes. At the backend of the equivalence checking procedure, the reduced AIG is converted to Conjunctive Normal Form (CNF), and the resulting instance is given to MiniSAT [37].
Table 2: Runtimes for Verification of Multipliers
Benchmark   I/O bits   CPP [h:m:s]   Fujita [h:m:s]
SP-AR-RC    16/32      00:01:23      00:00:31
SP-WT-CL    16/32      00:00:46      00:09:08
SP-WT-BK    16/32      00:00:52      00:10:05
SP-CT-BK    16/32      00:00:43      00:17:56
SP-AR-RC    32/64      02:34:40      11:09:18
SP-WT-CL    32/64      00:15:12      TO
SP-DT-HC    32/64      00:21:14      TO
SP-CT-BK    32/64      00:21:20      TO
SP-AR-RC    48/96      20:32:12      TO
SP-WT-CL    48/96      01:29:36      TO
SP-DT-HC    48/96      03:53:17      TO
SP-CT-BK    48/96      01:20:00      TO
SP-AR-RC    64/128     94:37:20      TO
SP-WT-CL    64/128     05:46:40      TO
SP-CT-BK    64/128     05:31:44      TO
SP-AR-RC    128/255    TO            TO
SP-CT-BK    128/255    78:11:12      TO
All experiments have been carried out on an Intel(R) Core(TM) i5-3320M CPU (2.6 GHz, 16 GByte) running Linux. For the experiments, the multipliers are given as Verilog RTL code. The synthesis of the designs to gate-level netlists has been done using the Yosys Open Synthesis Suite [101] and ABC.
We conducted two types of experiments. The first demonstrates that the approach checks large multipliers in practical time, while the second shows its capability of discovering bugs.
3.3.1 Equivalence Checking Results
We compare the runtimes of our CPP approach against Fujita's approach. In Fujita's original approach, equivalence is checked using BDDs. Here, for a fair comparison, we use Fujita's approach with ABC as the backend for equivalence checking.
The first column of Table 2 shows the name of the circuit. The second column gives the number of input and output bits. The next two columns provide the runtimes. Note that the runtimes of Fujita's approach include only the verification of the lower n bits (not the full 2n output bits). The timeout (TO in the table) was set to 100 hours. Note also that with a naive miter construction (one big miter) followed by the ABC command CEC, all benchmarks timed out after 100 hours.
The experiments show that the verification time of the CPP approach depends not only on the size of the multiplier circuit but also on the type of partial product accumulator. The circuits with a Wallace tree or (4,2) compressors are verified in less time than those with an array accumulator. As can be seen, our approach verifies the correctness of multipliers of up to 128 bits in practical time (SP-CT-BK; the 128-bit array multiplier SP-AR-RC timed out). Fujita's approach, in contrast, already times out at 32 bits for the complex tree-based architectures.
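The runtime comparison can be made concrete by converting the h:m:s entries of Table 2 to seconds. The following is a small sketch that only re-expresses the published numbers. The names `seconds`, `both_finished`, and `ratios` are illustrative and not part of the original tooling. Keep in mind that Fujita's runtimes cover only the lower n output bits, so the ratios compare unequal amounts of work.

```python
def seconds(hms):
    """Convert an h:m:s entry of Table 2 to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s

# Table 2 rows where neither approach timed out: (CPP, Fujita).
both_finished = {
    ("SP-AR-RC", 16): ("00:01:23", "00:00:31"),
    ("SP-WT-CL", 16): ("00:00:46", "00:09:08"),
    ("SP-WT-BK", 16): ("00:00:52", "00:10:05"),
    ("SP-CT-BK", 16): ("00:00:43", "00:17:56"),
    ("SP-AR-RC", 32): ("02:34:40", "11:09:18"),
}

# Ratio > 1 means CPP was faster on that benchmark.
ratios = {key: seconds(fujita) / seconds(cpp)
          for key, (cpp, fujita) in both_finished.items()}

for (name, bits), ratio in ratios.items():
    print(f"{name} {bits}-bit: Fujita/CPP runtime ratio {ratio:.2f}")
```

Among these rows, the 16-bit array multiplier (SP-AR-RC) is the only case in which Fujita's approach is faster.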
3.3.2 Fault Injection
In order to demonstrate the ability to discover bugs, we applied our approach to faulty designs created by automatic fault injection. The faults are inverters injected into the AIG representation of the netlist. We applied the CPP approach to different copies of each netlist, where each copy contains a single fault. The approach succeeded in discovering all injected bugs. The results are summarized in Table 3. The first two columns describe the type of the multiplier architecture (as explained in the previous section) and the bit width, respectively. The third column gives the size (number of nodes) of the AIG. The number of performed runs is given in the fourth column, and the average runtime needed to discover the bug is given in the last column.
For all 16-bit architectures, we systematically covered the whole AIG by injecting two faults for each node. For the larger designs, random gates were chosen uniformly distributed over the netlist. For the 32-bit multipliers, the runtimes vary between fractions of a second and several minutes, and usually more than half of the bugs are discovered in less than a second.

Table 3: Results for Models with Injected Faults
Benchmark   Input/Output bits   |AIG|   #Faults   Avg. runtime
SP-AR-RC    16/32               2126    4252      2.56 s
SP-WT-CL    16/32               2988    5976      1.80 s
SP-CT-BK    16/32               2201    4402      0.45 s
SP-AR-RC    32/64               14196   710       27.05 s
SP-WT-CL    32/64               12741   319       59.72 s
SP-CT-BK    32/64               9173    459       10.00 s

To summarize, the results show that our approach works well for bug hunting as well as for the verification of correct multiplier designs.
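The fault-injection procedure can be sketched on a toy scale. The example below uses a hand-built AIG of a 2x2 multiplier. It creates one faulty copy per injected inverter and tests each copy. Several parts are assumptions rather than the thesis's setup:
- The netlist, the `(source, inverted)` fanin encoding, and all function names are hypothetical.
- "Two faults per node" is interpreted as inverting each of the two fanin edges of an AND node.
- Exhaustive simulation against `x * y` stands in for the CPP equivalence check.

```python
from itertools import product

def xor_nodes(prefix, a, b):
    """Three AND nodes; node prefix+'o' = ~(a & ~b) & ~(~a & b) (XNOR), so XOR is its complement."""
    return [(prefix + "p", (a, False), (b, True)),
            (prefix + "q", (a, True), (b, False)),
            (prefix + "o", (prefix + "p", True), (prefix + "q", True))]

# Hypothetical 2x2 multiplier: AND nodes (name, fanin0, fanin1), fanin = (source, inverted?).
NODES = [
    ("p00", ("x0", False), ("y0", False)),
    ("p10", ("x1", False), ("y0", False)),
    ("p01", ("x0", False), ("y1", False)),
    ("p11", ("x1", False), ("y1", False)),
    *xor_nodes("a", "p10", "p01"),              # z1 = p10 xor p01
    ("c1", ("p10", False), ("p01", False)),     # carry into column 2
    *xor_nodes("b", "p11", "c1"),               # z2 = p11 xor c1
    ("z3", ("p11", False), ("c1", False)),      # z3 = p11 and c1
]
OUTPUTS = [("p00", False), ("ao", True), ("bo", True), ("z3", False)]

def simulate(nodes, env):
    """Evaluate the AIG for one input assignment and return the output word."""
    val = dict(env)
    for name, (s0, inv0), (s1, inv1) in nodes:  # nodes are listed in topological order
        val[name] = (val[s0] ^ inv0) & (val[s1] ^ inv1)
    return sum((val[n] ^ inv) << k for k, (n, inv) in enumerate(OUTPUTS))

def matches_spec(nodes):
    """Exhaustive check against 2-bit multiplication (stand-in for the CPP check)."""
    return all(simulate(nodes, {"x0": x0, "x1": x1, "y0": y0, "y1": y1})
               == (x0 + 2 * x1) * (y0 + 2 * y1)
               for x0, x1, y0, y1 in product((0, 1), repeat=4))

def inject_faults(nodes):
    """Yield one faulty copy per inverted fanin edge, i.e. two faults per AND node."""
    for idx, (name, f0, f1) in enumerate(nodes):
        for pin in (0, 1):
            fanins = [f0, f1]
            src, inv = fanins[pin]
            fanins[pin] = (src, not inv)        # the injected inverter
            faulty = list(nodes)
            faulty[idx] = (name, fanins[0], fanins[1])
            yield faulty

faulty_copies = list(inject_faults(NODES))
detected = sum(not matches_spec(faulty) for faulty in faulty_copies)
print(len(NODES), len(faulty_copies), detected)
```

An injected inverter that does not change the circuit function cannot be exposed by any equivalence check. For this reason, `detected` may in general be smaller than the number of faulty copies.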
3.4 summary and future work
Verification of bit-level multipliers still has no general automated solution. In this chapter, we verify multipliers at the gate level using the ECRC technique, which requires neither information about the high-level design of the multipliers nor a golden reference for the comparison. We have developed an approach that can verify multiplier circuits of up to 128 bits. The approach is based on functional properties of the multiplication function that can be expressed as recurrence equations, together with a new case splitting scheme. As a consequence, enough similarities remain for the equivalence check of each case. Overall, the approach increases the scalability of equivalence checking to larger multipliers. However, it cannot be applied to all types of multiplier architectures, e.g., Booth-recoded multipliers. Also, it can only detect faults that belong to a certain class; faults that break an assumption about the structure of the multiplier may not be discovered by the approach.
In future work, an investigation should be conducted to overcome the limitation of the CPP approach by rewriting the gate netlists of multipliers to isolate partial products that rely on each other. Furthermore, the ECRC technique could be leveraged for the verification of other functions that can be defined by recurrence relations, such as Fused Multiply-Add ($X_o = X_1 \cdot X_2 + X_3$), where $X_1$, $X_2$, and $X_3$ are integer inputs and $X_o$ is the integer output of the function.

4
SYMBOLIC COMPUTATION FOR VERIFYING COMPLEX MULTIPLIERS
Formal verification utilizing symbolic computation has demonstrated the ability to formally verify large Galois field arithmetic circuits [68] and basic architectures of integer arithmetic circuits [23, 39]. The technique models the circuit as Gröbner basis polynomials. It then reduces the polynomial equation of the circuit specification with respect to the polynomial model using Ideal Membership Testing (IMT). However, during polynomial reduction by the IMT, the technique suffers from an exponential blow-up in the size of the polynomials. This happens in particular when it verifies Parallel Prefix Adders (PPAs) and Booth-recoded multipliers.

In this chapter, we analyze the computational complexity of verifying integer multipliers by the symbolic computation technique. Moreover, we address the reasons for the exponential blow-up that occurs with complex integer multipliers. We introduce an algorithm that allows the technique to be applied to a large class of multiplier circuits, including basic and parallel multiplier architectures. The goal of this algorithm is to rewrite the Gröbner basis model of a multiplier circuit so that its bit-level description is replaced by a network of adder cells. Based on our observation and previous observations [22, 60, 96], the verification problem is polynomially bounded in both space and time when it is applied to multipliers described as full adder networks.

To circumvent the exponential complexity of verifying bit-level multipliers, we propose a model rewriting algorithm. It converts any kind of multiplier architecture into what we call Sum Carry Networks (SCNs), which are networks of adder cells connected by sum and carry signals. In contrast to full adder networks, which are limited to half and full adders, SCNs consist of adders with an arbitrary number of input bits. The IMT procedure reduces SCNs, as well as full adder networks, with respect to their specification polynomials at polynomial complexity. Therefore, rewriting bit-level multiplier models into SCNs before the IMT starts its reduction circumvents the exponential blow-up in the number of terms during that reduction.

To perform the rewriting step, as shown in Figure 18, the proposed model rewriting algorithm successively executes two novel schemes, XOR rewriting and common rewriting, on models of bit-level multipliers. It also applies a logic reduction rule within the XOR rewriting scheme. The algorithm lifts bit-level models into SCNs and, concurrently, identifies and removes (by the logic reduction rule) redundant terms that evaluate to zero. These terms are distributed within models of complex multiplier architectures. If such vanishing terms are not eliminated early, their number increases exponentially during the reduction process, making the verification task computationally infeasible.

Besides introducing the model rewriting algorithm, this chapter presents substitution rules that enhance the computational performance of the IMT (shown on the right side of Figure 18) when verifying large-scale complex multipliers. As observed by [23, 39], a proper substitution order is crucial to circumvent the blow-up within the IMT. Restricting the IMT to a fixed substitution order (see Subsection 2.2.3) makes the technique applicable only to a specific multiplier architecture. The proposed substitution rules enable the IMT to find a proper substitution order, which improves the runtime of the IMT and expands its applicability. The experiments show that the enhanced symbolic computation technique can verify a large class of multiplier circuits of up to 128 bits. The main contributions of this chapter can be summarized as follows:
1. Comparing models of integer multipliers over the Boolean ring (as used in this thesis) with binary Galois field models with respect to the complexity of the verification problem.
2. Observing and justifying that the verification complexity of integer multipliers expressed as SCNs is polynomial in space.
3. Determining the reason for the inefficiency of the symbolic computation technique when verifying complex multipliers consisting of Booth partial products and PPAs.
4. Observing that rewriting as an explicit step before calling the IMT procedure can circumvent blow-ups during the reduction of polynomials, since it rewrites models of complex multiplier architectures into SCNs. For this purpose, rewriting schemes and a logic reduction rule have been proposed to remove vanishing monomials (monomials that always evaluate to zero) that appear in the algebraic models of complex integer multipliers.
5. Supporting the IMT procedure by substitution rules that allow it to find a proper substitution order regardless of the multiplier architecture.
6. Adding modulo $2^{2n}$ to the specification of n-bit integer multipliers, which is crucial to match the specification with multipliers that consist of Booth partial products or redundant binary addition trees.
Figure 18: Enhanced Symbolic Computation for Verifying Multipliers (flow: circuit netlist N → Gröbner bases modeling → G → model rewriting [XOR rewriting & logic reduction → common rewriting] → G′ → Ideal Membership Testing (IMT) against the specification pr → equivalence/inconsistency)
4.1 boolean ring versus binary galois field
In the literature, two methods have been proposed to model bit-level circuits as Gröbner bases. The first models the circuit by polynomials in the Boolean ring (Z2), as in this thesis. The second models the circuit in the binary Galois field (GF2), as in [14, 68, 78]. What distinguishes GF2 from the ring Z2 is that all field operations are performed modulo an irreducible primitive polynomial and the coefficients are reduced modulo two. Hence, all coefficients over GF2 take values from the set {0, 1}, so that −1 = +1 and 2 mod 2 = 0. Every Boolean logic gate in the circuit is mapped to a polynomial function over GF2 as follows:
NOT: $x_o = \neg x_1 \implies x_o + x_1 + 1 \bmod 2$
AND: $x_o = x_1 \wedge x_2 \implies x_o + x_1x_2 \bmod 2$
OR: $x_o = x_1 \vee x_2 \implies x_o + x_1x_2 + x_1 + x_2 \bmod 2$
XOR: $x_o = x_1 \oplus x_2 \implies x_o + x_1 + x_2 \bmod 2$
MUX: $x_o = (x_1 \wedge x_2) \vee (\neg x_1 \wedge x_3) \implies x_o + x_1x_2 + x_1x_3 + x_3 \bmod 2$.
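The gate mappings above can be confirmed by exhaustive evaluation. For every input assignment, $x_o + P(x)$ must vanish modulo 2. The sketch below encodes each gate and its GF2 polynomial as Python lambdas and also checks the Z2 form of XOR used in the next paragraph. The dictionary `GATES` and the helper `gf2_mapping_holds` are illustrative names only.

```python
from itertools import product

# (gate semantics, GF2 polynomial P) such that x_o + P(inputs) = 0 (mod 2).
GATES = {
    "NOT": (lambda x1: 1 - x1,                   lambda x1: x1 + 1),
    "AND": (lambda x1, x2: x1 & x2,              lambda x1, x2: x1 * x2),
    "OR":  (lambda x1, x2: x1 | x2,              lambda x1, x2: x1 * x2 + x1 + x2),
    "XOR": (lambda x1, x2: x1 ^ x2,              lambda x1, x2: x1 + x2),
    "MUX": (lambda x1, x2, x3: x2 if x1 else x3, lambda x1, x2, x3: x1 * x2 + x1 * x3 + x3),
}

def gf2_mapping_holds(name):
    """True if x_o + P(x) is even for every Boolean input assignment of the gate."""
    gate, p = GATES[name]
    arity = gate.__code__.co_argcount
    return all((gate(*bits) + p(*bits)) % 2 == 0
               for bits in product((0, 1), repeat=arity))

gf2_ok = {name: gf2_mapping_holds(name) for name in GATES}

# Over Z2 (integer coefficients) XOR needs the nonlinear term -2*x1*x2, and equality is exact.
xor_z2_ok = all((a ^ b) == -2 * a * b + a + b for a, b in product((0, 1), repeat=2))
print(gf2_ok, xor_z2_ok)
```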
In this section, we discuss an argument claiming that modeling integer multiplier circuits over GF2 reduces the computational complexity of the verification problem compared to modeling over Z2. The argument rests on the fact that models of integer circuits over GF2 are free of nonlinear terms whose coefficients are multiples of two, which appear extensively in Z2 models. For example, the XOR gate is modeled over GF2 as $x_o = x_1 + x_2 \bmod 2$. This polynomial lacks the nonlinear term $-2x_2x_1$ that appears in the polynomial of the XOR function over Z2, $x_o = -2x_2x_1 + x_1 + x_2$. Hence, the argument claims that verifying GF2 models is computationally cheaper.

In the following, we show that this argument does not hold. On the contrary, the computational complexity of verifying integer multipliers modeled over GF2 is exponential. The same verification problem becomes tractable with the algorithms proposed in this chapter, which operate on Z2 models and cannot be applied to GF2 models.
The significant capability that Z2 offers and GF2 lacks is the ability to check jointly the correctness of Boolean functions that depend cumulatively on each other, since such functions can be encoded into one integer function. Integer adder and multiplier circuits implement multi-output Boolean functions given as a tuple $F = (f_0, \dots, f_m)$. Here $f_i$ is a single-output Boolean function for each $i \in \{0, \dots, m\}$, and $F$ is an integer-valued function related to the functions of its tuple by the equation $F = \sum_{i=0}^{m} 2^i \cdot f_i$. Sophisticated algorithms, such as those proposed in this thesis, take this relationship into account to make the verification problem of integer circuits tractable. In fact, this relationship can only be considered if the output Boolean functions are modeled in the ring Z2; it is lost if the functions are modeled over GF2.
Lemma 4. For an integer-valued function $F$ implemented as a tuple of single-output Boolean functions $F = (f_0, \dots, f_m)$, the relationship $F = \sum_{i=0}^{m} 2^i \cdot f_i$ that relates these Boolean functions cannot be considered during their verification process over GF2.
Proof. Since the Boolean functions $f_i$ are modeled over GF2, their relationship $F = \sum_{i=0}^{m} 2^i \cdot f_i$ must also be modeled over GF2 to be considered in the verification process; otherwise, the process is not sound. In GF2, coefficients are reduced modulo two. Therefore, $\left(F = \sum_{i=0}^{m} 2^i \cdot f_i\right) \bmod 2$ is rewritten to $F \bmod 2 = f_0$, an equation that includes only one output Boolean function, $f_0$. This proves that the relationship among the outputs of an integer function cannot be utilized in the field GF2. ∎
Because of Lemma 4, output Boolean functions modeled over GF2 can only be checked independently of each other. To illustrate this concept, consider the example of a half adder circuit.
Example 9. A 2-bit adder (half adder) is implemented by two Boolean functions: one for the sum $s = x_1 \oplus x_2$ and one for the carry output $co = x_1 \wedge x_2$, where $x_1$ and $x_2$ are Boolean inputs.
Over Z2, the 2-bit adder is modeled by two polynomials:
$$g_1 := -s - 2x_1x_2 + x_1 + x_2, \qquad g_2 := -co + x_1x_2.$$
The output functions $s$ and $co$ can be encoded into the integer function $F = 2co + s = x_1 + x_2$, which yields the functional specification polynomial $pr := -2co - s + x_1 + x_2$. To verify the 2-bit adder, the IMT is invoked to test whether the polynomial set $\{g_1, g_2\}$ satisfies the specification $pr$. This is done by iteratively substituting $co$ and $s$ in $pr$.

First, $co$ is replaced by $x_1x_2$, the tail term of $g_2$, according to the rewrite rule (see Definition 17). This substitution (division) yields the remainder $r := -2x_1x_2 - s + x_1 + x_2$. Second, $s$ is replaced by the tail terms of $g_1$, $(-2x_1x_2 + x_1 + x_2)$, resulting in the new remainder
$$r := -2x_1x_2 - (-2x_1x_2 + x_1 + x_2) + x_1 + x_2 = -2x_1x_2 + 2x_1x_2 - x_1 - x_2 + x_1 + x_2 = 0.$$
All terms of the final remainder cancel each other. Since the remainder is zero, equivalence between the 2-bit adder and its functional specification is established.
In the case of GF2, the 2-bit adder is modeled by the polynomials:
$$\hat{g}_1 := s + x_1 + x_2 \bmod 2, \qquad \hat{g}_2 := co + x_1x_2 \bmod 2.$$
As discussed before, the functional specification cannot be described by one polynomial that relates the two outputs $s$ and $co$. Instead, it is modeled by two polynomials: $\widehat{pr}_1 := s + x_1 + x_2 \bmod 2$ to validate the sum, and $\widehat{pr}_2 := co + x_1x_2 \bmod 2$ for the carry output. The IMT is invoked twice, to compare $\widehat{pr}_1$ against $\hat{g}_1$ and $\widehat{pr}_2$ against $\hat{g}_2$. It is easy to see that the IMT returns a zero remainder in both cases, which proves the correctness of the 2-bit adder circuit.
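The Z2 reduction of Example 9 can be reproduced with a few lines of code, assuming a simple polynomial representation. The sketch stores a polynomial as a dictionary from monomials to integer coefficients. Each monomial is a `frozenset` of variable names, so $x \cdot x = x$ holds automatically for Boolean variables. The helper names `poly`, `add`, `mul`, and `substitute` are hypothetical and do not refer to any particular computer algebra system. The last lines also illustrate Lemma 4: reducing the coefficients of $pr$ modulo 2 removes the $co$ term altogether.

```python
from collections import defaultdict

def poly(*terms):
    """Polynomial from (coeff, var, ...) tuples: {frozenset(vars): coeff}; frozenset() is the constant."""
    p = defaultdict(int)
    for coeff, *names in terms:
        p[frozenset(names)] += coeff
    return {m: c for m, c in p.items() if c != 0}

def add(p, q, scale=1):
    """Return p + scale * q, dropping zero coefficients."""
    r = defaultdict(int, p)
    for m, c in q.items():
        r[m] += scale * c
    return {m: c for m, c in r.items() if c != 0}

def mul(p, q):
    """Product of two polynomials; monomials are united because x * x = x for Boolean x."""
    r = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[m1 | m2] += c1 * c2
    return {m: c for m, c in r.items() if c != 0}

def substitute(p, var, tail):
    """One rewrite step: replace every occurrence of `var` in p by the polynomial `tail`."""
    rest = {m: c for m, c in p.items() if var not in m}
    hit = {m - {var}: c for m, c in p.items() if var in m}
    return add(rest, mul(hit, tail))

# Half adder over Z2: tails of g2 := -co + x1x2 and g1 := -s - 2x1x2 + x1 + x2.
tail_co = poly((1, "x1", "x2"))
tail_s = poly((-2, "x1", "x2"), (1, "x1"), (1, "x2"))
pr = poly((-2, "co"), (-1, "s"), (1, "x1"), (1, "x2"))     # pr := -2co - s + x1 + x2

r1 = substitute(pr, "co", tail_co)   # -2x1x2 - s + x1 + x2
r2 = substitute(r1, "s", tail_s)     # every term cancels
print(r2)                            # {}

# Lemma 4: modulo 2 the coefficient -2 of co vanishes, so co drops out of the specification.
pr_mod2 = {m for m, c in pr.items() if c % 2}
print(frozenset({"co"}) in pr_mod2)  # False
```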
So far, we have demonstrated that the output Boolean functions of an integer function must be checked independently as long as they are modeled in the field GF2. But why is this a problem for the verification process? In the following, we answer this question by proving that the computational complexity of verifying these Boolean functions individually is exponential.
A Boolean function $f_i$ given by a canonical representation can be verified by the IMT procedure in time linear in the size of that representation. In that case, the verification process reduces to subtracting two canonical representations, one of $f_i$ and one of its specification. A GF2 model of $f_i$ is canonical if it is represented by one polynomial that maps the primary outputs directly to the primary inputs of $f_i$ without any internal variables. Hence, the complexity of verifying a function $f_i$ by the IMT can be measured by the size of the polynomial that canonically models this function.

To estimate the verification complexity of integer adders and multipliers, we first address the canonical representations of arrays of 3-bit adders (full adders). These arrays are combined to build what are called array adders and array multipliers (see Subsection 2.3.1). We exploit the symmetry of such array architectures to derive simple proofs about the addition and multiplication functions, which can afterwards be generalized to other architectures.

Figure 19: Array of Full Adders (cells with inputs $x_i$, $y_i$, sum outputs $s_i$, and carries $co_i$ rippling from the carry input $cn$)
Consider the array of adder cells shown in Figure 19. Every 3-bit adder in the array computes its carry output with a 3-input OR function, namely the OR of the three pairwise ANDs of its inputs. The output of every OR, i.e., an internal carry, is fed into two AND functions of the next cell. Their outputs are in turn inputs to that cell's OR, building a chain of OR and AND functions. This chain generates the internal carry bits. Every internal carry $co_{i-1}$ is an input to the output Boolean function $s_i$ and to the function of the next internal carry $co_i$. Thus, describing $s_i$ by one canonical polynomial over GF2 requires eliminating the successive internal carry variables $co_j$ for all $j \le i-1$. In the following, we prove that, because of these eliminations, the canonical polynomial of the output Boolean function $s_n$ of an array of $n+1$ adder cells has size $O(2^{n+1})$. Hence, the complexity of verifying the array is exponential.
Lemma 5. For an integer function $F$ of a 3-bit adder array implemented as a tuple of Boolean functions $F = (s_0, \dots, s_n)$ with the set of primary inputs $\{x_0, \dots, x_n, y_0, \dots, y_n, cn\}$ and modeled over GF2, the complexity of verifying the output Boolean function $s_n$ is bounded in space by $O(2^{n+1})$.
Proof. The integer function $F$ is mapped to GF2 by the set of polynomials:
$$co_j = (x_j \wedge y_j) \vee (x_j \wedge co_{j-1}) \vee (y_j \wedge co_{j-1}) \implies g_{2j+1} := co_j + co_{j-1}y_j + co_{j-1}x_j + y_jx_j \bmod 2$$
$$s_j = x_j \oplus y_j \oplus co_{j-1} \implies g_{2j} := s_j + co_{j-1} + y_j + x_j \bmod 2$$
$$co_0 = (x_0 \wedge y_0) \vee (x_0 \wedge cn) \vee (y_0 \wedge cn) \implies g_1 := co_0 + cn\,y_0 + cn\,x_0 + y_0x_0 \bmod 2$$
$$s_0 = x_0 \oplus y_0 \oplus cn \implies g_0 := s_0 + cn + y_0 + x_0 \bmod 2,$$
where $0 < j \le n$ and the variables $\{co_0, \dots, co_n\}$ model the internal carry bits. To represent the function $s_n$ canonically, a polynomial mapping $s_n$ to the primary inputs of $F$ is required. It is obtained by backward substitutions using the rewrite rule, starting from the polynomial whose leading variable is $s_n$, namely $g_{2n} := s_n + co_{n-1} + y_n + x_n \bmod 2$. The first substitution replaces the variable $co_{n-1}$ in $g_{2n}$ by the tail terms of the polynomial $g_{2n-1} := co_{n-1} + co_{n-2}y_{n-1} + co_{n-2}x_{n-1} + y_{n-1}x_{n-1} \bmod 2$, denoted as
$$g_{2n} \xrightarrow{g_{2n-1}} r_1 := s_n + co_{n-2}y_{n-1} + co_{n-2}x_{n-1} + y_{n-1}x_{n-1} + y_n + x_n \bmod 2,$$
where $r_1$ is the resulting remainder polynomial. The next substitution
$$r_1 \xrightarrow{g_{2n-3}} r_2 := s_n + co_{n-3}y_{n-2}y_{n-1} + co_{n-3}x_{n-2}y_{n-1} + y_{n-2}x_{n-2}y_{n-1} + co_{n-3}y_{n-2}x_{n-1} + co_{n-3}x_{n-2}x_{n-1} + y_{n-2}x_{n-2}x_{n-1} + y_{n-1}x_{n-1} + y_n + x_n \bmod 2$$
replaces the carry variable $co_{n-2}$ in $r_1$ by the tail terms of the polynomial $g_{2n-3} := co_{n-2} + co_{n-3}y_{n-2} + co_{n-3}x_{n-2} + y_{n-2}x_{n-2} \bmod 2$ to obtain a new remainder polynomial $r_2$. These substitutions are iterated until the final remainder depends only on the primary inputs of $F$, in other words, until all successive internal carry variables $co$ have been substituted. Since the number of internal carry variables is $n$, the same number of iterative substitutions is executed, denoted as
$$g_{2n} \xrightarrow{g_{2n-1}} r_1 \xrightarrow{g_{2n-3}} r_2 \xrightarrow{g_{2n-5}} r_3 \xrightarrow{g_{2n-7}} \cdots r_{n-2} \xrightarrow{g_3} r_{n-1} \xrightarrow{g_1} r_n.$$
To calculate the size of the final remainder $r_n$, we prove that the number of terms $|r_i|$ of a remainder $r_i$ equals $2^i + |r_{i-1}|$ for $1 < i \le n$. Here $r_i$ is obtained from the previous remainder $r_{i-1}$ by the substitution iteration of index $i$. Each iteration $i$ eliminates an internal carry variable $co_{n-i}$ from the terms of the polynomial $r_{i-1}$. It replaces $co_{n-i}$ by the three terms $(co_{n-i-1}y_{n-i} + co_{n-i-1}x_{n-i} + y_{n-i}x_{n-i})$ to obtain the next remainder $r_i$. Hence, $|r_i|$ exceeds $|r_{i-1}|$ by twice the number of terms that include $co_{n-i}$, which can be formulated as $|r_i| = 2 \cdot st_{i-1} + |r_{i-1}|$, where $st_{i-1}$ is the number of terms of $r_{i-1}$ that contain the variable $co_{n-i}$.

Two of the three terms that replace $co_{n-i}$ include the next carry variable $co_{n-i-1}$. Therefore, $st_i$ (the number of terms of $r_i$ that include $co_{n-i-1}$) is twice the number of terms of $r_{i-1}$ that contain $co_{n-i}$, i.e., $st_{i-1}$. From this observation, another relation is deduced: $st_i = 2 \cdot st_{i-1} = 2^2 \cdot st_{i-2} = \cdots = 2^i \cdot st_0$. The first substitution starts from the polynomial $r_0 = g_{2n}$, which has only one term containing $co_{n-1}$. Therefore, $st_0 = 1$ and hence $st_i = 2^i$. Consequently,
$$|r_i| = 2 \cdot st_{i-1} + |r_{i-1}| = 2^i + |r_{i-1}|.$$
Based on this equation and $|r_1| = 6$,
$$|r_n| = 2^n + |r_{n-1}| = 2^n + 2^{n-1} + |r_{n-2}| = 2^n + 2^{n-1} + \cdots + 2^2 + |r_1| = (2^n + 2^{n-1} + \cdots + 2^2 + 2 + 1) + 3 = (2^{n+1} - 1) + 3 = 2^{n+1} + 2.$$
Since $|r_n| = 2^{n+1} + 2$, the complexity of obtaining a canonical representation of the function $s_n$ is bounded in space by $O(2^{n+1})$. Therefore, the computational complexity of verifying the Boolean function $s_n$ is also bounded by $O(2^{n+1})$. This completes the proof.

Based on Lemma 4 and Lemma 5, the complexity of verifying an array of 3-bit adders with $n+1$ output bits modeled over GF2 is bounded by $O(2^{n+1})$. Independently of the architecture of an integer adder, its canonical representation over GF2 is the same as that of an array of 3-bit adders, since both implement the same function. Therefore, an integer adder has the same exponential verification complexity as an array of 3-bit adders.

In the case of an integer multiplier, the architecture can be thought of as successive arrays of 3-bit adders that compress $n$ arrays of partial products into one array. The longest 3-bit adder array (the last-stage adder) has $2n-1$ outputs. Because verifying this longest array is exponential in space, $O(2^{2n})$, the verification complexity of the array multiplier modeled over GF2 is also exponential. It follows that the verification of a multiplier over GF2 is exponential regardless of the multiplier architecture. All architectures implement the same function and therefore have the same canonical representation, which is of exponential size.
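Lemma 5 and the argument above can be checked on small instances. The sketch below stores GF2 polynomials as sets of monomials, so addition is the symmetric difference of sets. It eliminates the carries exactly as in the proof and counts the terms of the canonical polynomial of $s_n$. It also confirms by exhaustive evaluation that the resulting polynomial really computes sum bit $n$. For the multiplier, it computes the canonical GF2 form of a product bit, i.e. its algebraic normal form (ANF), with the standard fast Möbius transform. This transform is a swapped-in technique for illustration, not the thesis's procedure. All function names are hypothetical. For the multiplier, the script prints ANF sizes without asserting a growth law.

```python
from itertools import product

def gf2_mul(p, q):
    """Product of GF(2) polynomials stored as sets of monomials (frozensets); x * x = x."""
    r = set()
    for m1 in p:
        for m2 in q:
            r ^= {m1 | m2}                      # coefficients are mod 2: equal monomials cancel
    return r

def gf2_substitute(p, var, tail):
    """Replace `var` by the polynomial `tail`; GF(2) addition is symmetric difference."""
    rest = {m for m in p if var not in m}
    hit = {m - {var} for m in p if var in m}
    return rest ^ gf2_mul(hit, tail)

def canonical_sum_bit(n):
    """Remainder r_n of Lemma 5: g_2n with co_{n-1}, ..., co_0 eliminated (n >= 1)."""
    M = lambda *v: frozenset(v)
    r = {M(f"s{n}"), M(f"co{n - 1}"), M(f"y{n}"), M(f"x{n}")}
    for j in range(n - 1, -1, -1):
        cin = "cn" if j == 0 else f"co{j - 1}"
        tail = {M(cin, f"y{j}"), M(cin, f"x{j}"), M(f"y{j}", f"x{j}")}
        r = gf2_substitute(r, f"co{j}", tail)
    return r

def eval_gf2(p, env):
    return sum(all(env[v] for v in m) for m in p) % 2

def sum_bit_ok(n):
    """Check that r_n without the s_n term equals bit n of x + y + cn for all inputs."""
    p = canonical_sum_bit(n) - {frozenset({f"s{n}"})}
    names = [f"x{k}" for k in range(n + 1)] + [f"y{k}" for k in range(n + 1)] + ["cn"]
    for bits in product((0, 1), repeat=len(names)):
        env = dict(zip(names, bits))
        x = sum(env[f"x{k}"] << k for k in range(n + 1))
        y = sum(env[f"y{k}"] << k for k in range(n + 1))
        if eval_gf2(p, env) != ((x + y + env["cn"]) >> n) & 1:
            return False
    return True

def anf_size(n, bit):
    """Monomial count of the GF(2) canonical form of product bit `bit` of an n x n multiplier."""
    table = [(((v & ((1 << n) - 1)) * (v >> n)) >> bit) & 1 for v in range(1 << (2 * n))]
    for i in range(2 * n):                      # fast Moebius transform: truth table -> ANF
        for v in range(1 << (2 * n)):
            if (v >> i) & 1:
                table[v] ^= table[v ^ (1 << i)]
    return sum(table)

print([len(canonical_sum_bit(n)) for n in range(1, 9)])   # [6, 10, 18, ...] = 2^(n+1) + 2
print([anf_size(n, n - 1) for n in range(2, 7)])
```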
In the following sections, we show that, in contrast to GF2, the verification complexity of the multiplier becomes polynomial with sophisticated algorithms executed on the Boolean ring models of multipliers.
4.2 verification complexity of sum carry networks
Verification of integer multipliers at the gate level suffers from exponential complexity in space and time. This holds in particular if the verification is performed by classical solvers such as SAT and BDDs, or if the multipliers are modeled in GF2, as demonstrated in the previous section. However, previous observations [22, 60, 96] have shown that the verification problem is polynomially bounded in both space and time as long as it is applied to multipliers described as a network of full adders. Based on this observation, we propose a rewriting algorithm that implicitly lifts bit-level descriptions of a large class of multiplier architectures into networks of adder cells. These networks are named Sum Carry Networks (SCNs), and their cells have an arbitrary number of input bits. Before introducing this rewriting algorithm, however, we analyze the complexity of the IMT procedure for verifying SCNs modeled over the Boolean ring.
Definition 21 (Sum Carry Networks). Boolean ring models that map the outputs $f_i(X)$ of integer-valued functions $F = (f_0, \dots, f_n)$ to their input set $X = \{x_0, x_1, \dots, x_{n-1}\}$ through adder cell functions $F_a = (s, co_0, \dots, co_m)$ are named sum carry networks (SCNs). $s$ and $co$ are Boolean variables that denote the sum and carry outputs of adder cells, respectively.
The significant feature of SCNs is that they circumvent the exponential blow-up in the number of nonlinear terms distributed within models of integer-valued functions $F$. These nonlinear terms model the propagation of carry bits within $F$; in this thesis, we name them carry terms. When the functions $F$ are described as networks of adder cells, the carry terms are revealed only within the polynomials of adder cells. This allows the carry terms to be cancelled before their number blows up exponentially. The adder cells of SCNs are not restricted to a specific size such as a full adder; they have an arbitrary number of input bits, which allows rewriting a large class of bit-level multiplier architectures into SCNs.
Deﬁnition 22 (Carry Terms). Carry terms are those nonlinear terms that
model internal carry bits propagated among Boolean output functions of adder
cells.
According to our observation, these carry bits are modeled as nonlinear terms distributed among the polynomials of the Boolean ring models of adder cells. They are revealed as long as each output function of $F_a$ is expressed by one polynomial. These carry terms share monomials, while their coefficients have opposite signs and are multiples of each other.
Example 10. Consider the model of a 3-bit adder cell (full adder). It consists of one polynomial that maps the output sum $s$ to the primary inputs $\{x_1, x_2, x_3\}$ of the 3-bit adder and another polynomial that models the function of the carry output $co$:
$$g_1 := -co - 2x_3x_2x_1 + x_3x_2 + x_3x_1 + x_2x_1$$
$$g_0 := -s + 4x_3x_2x_1 - 2x_3x_2 - 2x_3x_1 - 2x_2x_1 + x_3 + x_2 + x_1.$$
The carry terms revealed between these two polynomials are their nonlinear terms. They share the monomials of the set $\{x_3x_2x_1, x_3x_2, x_3x_1, x_2x_1\}$, and their coefficients are multiples of each other with opposite signs: each carry term of $g_0$ has $-2$ times the coefficient of the corresponding term of $g_1$.
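The two polynomials of Example 10 can be checked against the full-adder truth table. The check also confirms the stated relation between the carry terms. In the sketch below, monomials are `frozenset`s of variable names. `CO_TAIL` and `S_TAIL` are the tails of $g_1$ and $g_0$, i.e. the expressions for $co$ and $s$. These names and the helper functions are illustrative only.

```python
from itertools import product

M = lambda *v: frozenset(v)
# Tails of g1 := -co + ... and g0 := -s + ... from Example 10: {monomial: coefficient}.
CO_TAIL = {M("x1", "x2", "x3"): -2, M("x3", "x2"): 1, M("x3", "x1"): 1, M("x2", "x1"): 1}
S_TAIL = {M("x1", "x2", "x3"): 4, M("x3", "x2"): -2, M("x3", "x1"): -2, M("x2", "x1"): -2,
          M("x3"): 1, M("x2"): 1, M("x1"): 1}

def evaluate(p, env):
    """Evaluate a Boolean-ring polynomial for a 0/1 assignment of its variables."""
    return sum(c for m, c in p.items() if all(env[v] for v in m))

def matches_full_adder():
    """co must equal the majority (total // 2) and s the parity (total % 2) of the inputs."""
    for bits in product((0, 1), repeat=3):
        env = dict(zip(("x1", "x2", "x3"), bits))
        total = sum(bits)
        if evaluate(CO_TAIL, env) != total // 2 or evaluate(S_TAIL, env) != total % 2:
            return False
    return True

# Carry terms: the nonlinear monomials shared by both polynomials.
carry_monomials = [m for m in CO_TAIL if len(m) > 1]
ratios = {S_TAIL[m] / CO_TAIL[m] for m in carry_monomials}
print(matches_full_adder(), ratios)   # True {-2.0}
```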
So far, we have introduced the concept of SCNs and their feature of revealing carry terms within the polynomials of adder cells. To address the advantages of SCNs in reducing the verification complexity of integer functions, we start by analyzing the complexity of verifying the 3-bit adder array shown in Figure 19 when it is modeled as an SCN. This subsequently allows us to justify that the verification complexity of an n-bit integer multiplication function implemented as an SCN is bounded in space by $O(n^2)$.
Lemma 6. For a function $F$ of a 3-bit adder array implemented as a tuple of Boolean functions $F = (s_0, \dots, s_n)$ with the set of primary inputs $\{x_0, \dots, x_n, y_0, \dots, y_n, cn\}$ and modeled as an SCN, let $s_i$ and $co_i$ be the pair of Boolean variables representing the sum and carry output of a 3-bit adder in the array $F$, where $0 \le i \le n$. The complexity of verifying the SCN model of $F$ with respect to its polynomial specification
$$pr := -2^{n+1}co_n - \sum_{k=0}^{n} 2^k s_k + \sum_{k=0}^{n} 2^k x_k + \sum_{k=0}^{n} 2^k y_k + cn$$
is bounded in space by $O(n)$ if the variables of each pair $(co_i, s_i)$ are substituted consecutively.
Proof. The integer function $F$ is modeled as an SCN by the set of polynomials:
$$g_{2j+1} := -co_j - 2co_{j-1}y_jx_j + co_{j-1}y_j + co_{j-1}x_j + y_jx_j$$
$$g_{2j} := -s_j + 4co_{j-1}y_jx_j - 2co_{j-1}y_j - 2co_{j-1}x_j - 2y_jx_j + co_{j-1} + y_j + x_j$$
$$g_1 := -co_0 - 2cn\,y_0x_0 + cn\,y_0 + cn\,x_0 + y_0x_0$$
$$g_0 := -s_0 + 4cn\,y_0x_0 - 2cn\,y_0 - 2cn\,x_0 - 2y_0x_0 + cn + y_0 + x_0,$$
for $0 < j \le n$. The verification is performed by the IMT procedure through backward substitutions in the specification polynomial $pr$. The process starts by replacing the variable $co_n$ with the tail terms of the polynomial $g_{2n+1} := -co_n - 2co_{n-1}y_nx_n + co_{n-1}y_n + co_{n-1}x_n + y_nx_n$, denoted as
$$pr \xrightarrow{g_{2n+1}} r_1 := -2^n s_n + 2^{n+2}co_{n-1}y_nx_n - 2^{n+1}co_{n-1}y_n - 2^{n+1}co_{n-1}x_n - 2^{n+1}y_nx_n - \sum_{k=0}^{n-1} 2^k s_k + \sum_{k=0}^{n-1} 2^k x_k + \sum_{k=0}^{n-1} 2^k y_k + cn + 2^n y_n + 2^n x_n,$$
where $r_1$ is the resulting remainder. In the next substitution
$$r_1 \xrightarrow{g_{2n}} r_2 := -2^n co_{n-1} - \sum_{k=0}^{n-1} 2^k s_k + \sum_{k=0}^{n-1} 2^k x_k + \sum_{k=0}^{n-1} 2^k y_k + cn,$$
$s_n$ in $r_1$ is replaced by the tail terms of the polynomial $g_{2n} := -s_n + 4co_{n-1}y_nx_n - 2co_{n-1}y_n - 2co_{n-1}x_n - 2y_nx_n + co_{n-1} + y_n + x_n$. This yields a new polynomial $r_2$ in which the carry terms cancel each other.
From these two substitutions, the size of each remainder can be tracked:
- The substitution process starts from the polynomial $pr$ of size $|pr| = 3(n+1) + 2$.
- The first remainder $r_1$ has size $|r_1| = |pr| + 3$, since one term ($-2^{n+1}co_n$) has been replaced by four terms.
- The second substitution replaces $s_n$ in $r_1$. The carry terms revealed in the polynomial $g_{2n}$ of $s_n$ and in $r_1$ cancel each other, so $r_2$ is six terms smaller than $r_1$, i.e., $|r_2| = |pr| - 3$. Four of the cancelled terms are carry terms; the other two are the linear terms $x_n$ and $y_n$.

Consider the iterative substitutions
$$pr \xrightarrow{g_{2n+1}} r_1 \xrightarrow{g_{2n}} r_2 \xrightarrow{g_{2n-1}} r_3 \cdots r_{2n+1} \xrightarrow{g_0} r_{2n+2}.$$
They are restricted to the substitution order that consecutively replaces the output variables of the pair $(co_i, s_i)$ of every adder cell. Under this order, the carry terms revealed in the polynomials of every pair cancel each other linearly. No further substitutions into these carry terms are needed, which would otherwise cause an exponential blow-up in the sizes of the intermediate remainders. After the variables of each pair $(co_i, s_i)$ have been eliminated, the resulting remainder has size $|r_{2n-2i+2}| = |pr| - 3(n-i+1)$. This is always less than the maximum size $|pr| + 3 = 3(n+1) + 5$, which is the size of the first remainder $r_1$. Thus, under the substitution order that requires consecutive substitutions of the variables in each pair $(co_i, s_i)$, the verification complexity of an SCN of a 3-bit adder array is bounded in space by $O(n)$. ∎
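The bound of Lemma 6 can be observed directly by running the backward substitution on the SCN of an $(n+1)$-cell adder array. The sketch uses the same order as the proof: $co_n, s_n, co_{n-1}, s_{n-1}, \dots$. It records the size of every intermediate remainder. It assumes the same dictionary-of-`frozenset` representation as a plain Python stand-in for a computer algebra system. `adder_cell_tails` and `verify_ripple_scn` are hypothetical names. The expected outputs follow from the size argument in the proof, i.e. a maximum of $3(n+1)+5$ terms reached at $r_1$.

```python
from collections import defaultdict

def poly(*terms):
    p = defaultdict(int)
    for coeff, *names in terms:
        p[frozenset(names)] += coeff
    return {m: c for m, c in p.items() if c != 0}

def add(p, q, scale=1):
    r = defaultdict(int, p)
    for m, c in q.items():
        r[m] += scale * c
    return {m: c for m, c in r.items() if c != 0}

def mul(p, q):
    r = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[m1 | m2] += c1 * c2              # x * x = x for Boolean variables
    return {m: c for m, c in r.items() if c != 0}

def substitute(p, var, tail):
    rest = {m: c for m, c in p.items() if var not in m}
    hit = {m - {var}: c for m, c in p.items() if var in m}
    return add(rest, mul(hit, tail))

def adder_cell_tails(a, b, c):
    """Tails of g_co := -co + ... and g_s := -s + ... for a 3-bit adder with inputs a, b, c."""
    co = poly((-2, a, b, c), (1, a, b), (1, a, c), (1, b, c))
    s = poly((4, a, b, c), (-2, a, b), (-2, a, c), (-2, b, c), (1, a), (1, b), (1, c))
    return co, s

def verify_ripple_scn(n):
    """IMT on the SCN of Lemma 6; returns (final remainder, sizes of all remainders)."""
    pr = poly((-(2 ** (n + 1)), f"co{n}"), (1, "cn"),
              *[(-(2 ** k), f"s{k}") for k in range(n + 1)],
              *[(2 ** k, f"x{k}") for k in range(n + 1)],
              *[(2 ** k, f"y{k}") for k in range(n + 1)])
    r, sizes = pr, [len(pr)]
    for j in range(n, -1, -1):                   # cells from the most significant one down
        cin = "cn" if j == 0 else f"co{j - 1}"
        co_tail, s_tail = adder_cell_tails(cin, f"y{j}", f"x{j}")
        r = substitute(r, f"co{j}", co_tail)     # the pair (co_j, s_j) is substituted consecutively
        sizes.append(len(r))
        r = substitute(r, f"s{j}", s_tail)
        sizes.append(len(r))
    return r, sizes

for n in (3, 7, 15):
    remainder, sizes = verify_ripple_scn(n)
    print(n, remainder == {}, max(sizes))        # 3 True 17 / 7 True 29 / 15 True 53
```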
Lemma 7. For the multiplication function $F$ implemented as a tuple of Boolean functions $F = (z_0, \dots, z_{2n-1})$ with the set of primary inputs $\{x_0, \dots, x_{n-1}, y_0, \dots, y_{n-1}\}$ and modeled by an SCN, let the SCN model of $F$ consist of arrays $S_i = (s_{i,0}, s_{i,1}, \dots, s_{i,ln_i})$ built from adder cells $Fa_{i,w} = (s_{i,w}, co_{i,w})$. The complexity of verifying the SCN model of $F$ with respect to its polynomial specification
$$pr := -\sum_{k=0}^{2n-1} 2^k z_k + \sum_{k=0}^{n-1} 2^k x_k \cdot \sum_{k=0}^{n-1} 2^k y_k$$
is bounded in space by $O(n^2)$, as long as the variables in the list $\{co_{i,0}, s_{i,0}, co_{i,1}, s_{i,1}, \dots, co_{i,ln_i}, s_{i,ln_i}\}$ of each array $S_i$ are substituted consecutively according to the order $co_{i,ln_i} > s_{i,ln_i} > co_{i,ln_i-1} > s_{i,ln_i-1} > \cdots > co_{i,0} > s_{i,0}$.
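The claim of Lemma 7 can be explored experimentally on small multipliers. The sketch below builds a simple shift-and-add array multiplier from 2-bit and 3-bit adder cells. This construction is our own and does not reproduce the exact array indexing used in the proof. As in the operand sets of the proof, partial products $x_jy_i$ enter the cells directly as monomials. The IMT then substitutes the cells in reverse creation order, which is a reverse topological order, replacing each cell's pair $(co, s)$ consecutively. The script prints whether the remainder is zero and the largest intermediate remainder. It does not assert a particular size. All helper names are hypothetical.

```python
from collections import defaultdict
from itertools import count

def add(p, q, scale=1):
    r = defaultdict(int, p)
    for m, c in q.items():
        r[m] += scale * c
    return {m: c for m, c in r.items() if c != 0}

def mul(p, q):
    r = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[m1 | m2] += c1 * c2                  # x * x = x for Boolean variables
    return {m: c for m, c in r.items() if c != 0}

def substitute(p, var, tail):
    rest = {m: c for m, c in p.items() if var not in m}
    hit = {m - {var}: c for m, c in p.items() if var in m}
    return add(rest, mul(hit, tail))

def var(name):
    return {frozenset([name]): 1}

def half_adder(a, b):
    """(carry tail, sum tail) of a 2-bit adder cell whose inputs are polynomials a, b."""
    ab = mul(a, b)
    return ab, add(add(a, b), ab, -2)

def full_adder(a, b, c):
    """(carry tail, sum tail) of a 3-bit adder cell: majority and parity of a, b, c."""
    pairs = add(add(mul(a, b), mul(a, c)), mul(b, c))
    abc = mul(mul(a, b), c)
    total = add(add(a, b), c)
    return add(pairs, abc, -2), add(add(total, pairs, -2), abc, 4)

def array_multiplier_scn(n):
    """Shift-and-add array multiplier as an SCN: returns (cells, output bit polynomials)."""
    pp = lambda i, j: {frozenset([f"x{j}", f"y{i}"]): 1}    # partial product x_j * y_i
    cells, ids = [], count()
    acc = [pp(0, j) for j in range(n)]                       # bit k has weight 2^k
    for i in range(1, n):                                    # add row i, shifted by i
        out, carry = acc[:i], None
        for k in range(i, max(len(acc), i + n)):
            ins = [acc[k]] if k < len(acc) else []
            if k - i < n:
                ins.append(pp(i, k - i))
            if carry is not None:
                ins.append(carry)
            if len(ins) == 1:                                # single bit: pass it through
                out.append(ins[0])
                continue
            m = next(ids)
            co_tail, s_tail = (half_adder if len(ins) == 2 else full_adder)(*ins)
            cells.append((f"co{m}", f"s{m}", co_tail, s_tail))
            out.append(var(f"s{m}"))
            carry = var(f"co{m}")
        if carry is not None:
            out.append(carry)
        acc = out
    return cells, acc

def imt(n, cells, outputs):
    """Reduce pr = X*Y - sum 2^k z_k, cell by cell in reverse topological order."""
    x = {frozenset([f"x{j}"]): 2 ** j for j in range(n)}
    y = {frozenset([f"y{i}"]): 2 ** i for i in range(n)}
    r = mul(x, y)
    for k, z in enumerate(outputs):
        r = add(r, z, -(2 ** k))
    largest = len(r)
    for co, s, co_tail, s_tail in reversed(cells):
        r = substitute(r, co, co_tail)                       # (co, s) of one cell, consecutively
        largest = max(largest, len(r))
        r = substitute(r, s, s_tail)
        largest = max(largest, len(r))
    return r, largest

for n in range(2, 7):
    remainder, largest = imt(n, *array_multiplier_scn(n))
    print(n, remainder == {}, largest)
```

Multilinear polynomials with integer coefficients are canonical for functions of Boolean inputs. A faulty cell (for instance, one with its sum and carry tails swapped) therefore leaves a nonzero remainder.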
Proof. The multiplication function can be implemented as successive arrays of adders that sum its partial products and generate the outputs of the multiplier, a so-called array multiplier. It can be modeled by the following polynomials:
$$p_0 := -\sum_{k=0}^{2n-1} 2^k z_k + \Big(2y_0x_1 + \sum_{k=2}^{2n-2} 2^k s_{1,k}\Big) + \Big(\sum_{k=0}^{n-1} 2^k y_k x_0 + \sum_{k=1}^{n-1} 2^{k+n-1} y_{n-1} x_k\Big),$$
$$p_j := -\sum_{k=j+1}^{2n-j-1} 2^k s_{j,k} + \Big(2^{j+1} y_0 x_{j+1} + \sum_{k=j+2}^{2n-j-2} 2^k s_{j+1,k}\Big) + \Big(\sum_{k=1}^{n-j-1} 2^{k+j} y_k x_j + \sum_{k=j+1}^{n-1} 2^{k-j+n-1} y_{n-j-1} x_k\Big),$$
$$p_{n-2} := -\sum_{k=n-1}^{n+1} 2^k s_{n-2,k} + \big(2^{n-1} y_0 x_{n-1}\big) + \big(2^{n-1} y_1 x_{n-2} + 2^n y_1 x_{n-1}\big),$$
for $1 \le j \le n-3$. Every polynomial $p_i$ is a canonical representation of an array of 3-bit adders $S_i = (s_{i,0}, s_{i,1}, \dots, s_{i,ln_i})$, implemented by a set of polynomials $G_i = \{g_{i,0}, g_{i,1}, \dots, g_{i,2ln_i+1}\}$ as shown in Lemma 6, where $ln_i + 1$ is the length of the array $S_i$. The arrays are ordered from the outputs to the inputs of the multiplier (reverse topological order). The operand sets of the arrays are as follows:
- $S_0$ is the first array in this order and also the longest. It is represented canonically by the polynomial $p_0$ with $2n$ outputs ($ln_0 = 2n-1$). Its two operand sets are $\{0, y_0x_1, s_{1,2}, s_{1,3}, \dots, s_{1,2n-2}\}$ and $\{y_0x_0, y_1x_0, \dots, y_{n-1}x_0, y_{n-1}x_1, \dots, y_{n-1}x_{n-1}\}$.
- $S_{n-2}$ is the last and shortest 3-bit adder array and is described by the polynomial $p_{n-2}$. It has three outputs and receives two operand sets, $\{0, y_0x_{n-1}\}$ and $\{y_1x_{n-2}, y_1x_{n-1}\}$.
- The arrays $S_j$ in between have the operand sets $\{y_0x_{j+1}, s_{j+1,j+2}, \dots, s_{j+1,2n-j-2}\}$ and $\{y_1x_j, \dots, y_{n-j-1}x_j, y_{n-j-1}x_{j+1}, \dots, y_{n-j-1}x_{n-1}\}$.

The multiplier is constructed by feeding the outputs of each array $S_j$ into the inputs of the array $S_{j-1}$.
To verify a SCN of a multiplier, the IMT procedure divides the multiplier specification polynomial $p_r$ by the sets $G_i$. By dividing by each set $G_i$, the variables of the array $S_i$, i.e., the outputs of its adder cells in the list $\{co_{i,0}, s_{i,0}, co_{i,1}, s_{i,1}, \dots, co_{i,ln_i}, s_{i,ln_i}\}$, are substituted. Since every polynomial $p_i$ is a canonical representation of the set $G_i$ of an array $S_i$, iterative divisions by $G_i$ can be thought of as one division by $p_i$. Therefore, with $r_0 := p_r$, we express the iterative divisions $r_i \xrightarrow{G_i}_{+} r_{i+1}$, restricted to substituting the variables of $S_i$ consecutively, by the single division $r_i \xrightarrow{p_i} r_{i+1}$.
The first division of $p_r$ by $G_0$ of the array $S_0$ is formulated as
$$p_r \xrightarrow{p_0} r_1 := \sum_{k=0}^{n-1} 2^k x_k \cdot \sum_{k=0}^{n-1} 2^k y_k - \Big(2y_0x_1 + \sum_{k=2}^{2n-2} 2^k s_{1,k}\Big) - \Big(\sum_{k=0}^{n-1} 2^k y_k x_0 + \sum_{k=1}^{n-1} 2^{k+n-1} y_{n-1} x_k\Big) = \sum_{k=1}^{n-1} 2^k x_k \cdot \sum_{k=0}^{n-2} 2^k y_k - 2y_0x_1 - \sum_{k=2}^{2n-2} 2^k s_{1,k},$$
which can be thought of as subtracting $p_0$ from $p_r$. The next group of substitution iterations divides by the set $G_1$ of the array $S_1$, i.e., it subtracts the polynomial $p_1$ from $r_1$, which is described as
$$r_1 \xrightarrow{p_1} r_2 := \sum_{k=1}^{n-1} 2^k x_k \cdot \sum_{k=0}^{n-2} 2^k y_k - 2y_0x_1 - 2^2 y_0x_2 - \sum_{k=3}^{2n-3} 2^k s_{2,k} - \sum_{k=1}^{n-2} 2^{k+1} y_k x_1 - \sum_{k=2}^{n-1} 2^{k+n-2} y_{n-2} x_k = \sum_{k=2}^{n-1} 2^k x_k \cdot \sum_{k=0}^{n-3} 2^k y_k - 2^2 y_0 x_2 - \sum_{k=3}^{2n-3} 2^k s_{2,k}.$$
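These groups of iterative substitutions can be expressed by
$$p_r \xrightarrow{p_0} r_1 \xrightarrow{p_1} r_2 \cdots \xrightarrow{p_{n-3}} r_{n-2} := \sum_{k=n-2}^{n-1} 2^k x_k \cdot \sum_{k=0}^{1} 2^k y_k - 2^{n-2} y_0 x_{n-2} - \sum_{k=n-1}^{n+1} 2^k s_{n-2,k}$$
$$\xrightarrow{p_{n-2}} r_{n-1} := \sum_{k=n-2}^{n-1} 2^k x_k \cdot \sum_{k=0}^{1} 2^k y_k - 2^{n-2} y_0 x_{n-2} - 2^{n-1} y_0 x_{n-1} - 2^{n-1} y_1 x_{n-2} - 2^n y_1 x_{n-1} = 0,$$
terminating with a zero remainder.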
As proved in Lemma 6, the complexity of dividing by the set $G_i$ of an array of adder cells is bounded in space by $O(ln_i + 1)$, as long as the variables of every pair $(co_{i,w}, s_{i,w})$ are substituted consecutively. This condition holds if the variables of each $S_i$ are substituted consecutively according to the order $co_{i,ln_i} > s_{i,ln_i} > co_{i,ln_i-1} > s_{i,ln_i-1} > \dots > co_{i,0} > s_{i,0}$. Under this substitution order, the complexity of dividing by any set $G_i$ is bounded by $O(2n)$, since the longest array $S_0$ has $ln_0 = 2n - 1$. By subtracting the polynomials $p_i$ from $p_r$, which has the size $n^2 + n$, in reverse topological order (from outputs to inputs), every resulting remainder $r_i$ becomes smaller than $p_r$, with $|r_i| = |p_r| - 2n \cdot i + i^2 - i$. Because $|r_i| < |p_r|$ for all $i$ and the complexity of dividing by any $G_i$ is bounded by $O(2n)$, the size of a remainder after any substitution step stays within $O(n^2)$. Hence, the complexity of verifying a multiplier modeled as a SCN is bounded in space by $O(n^2)$ under a specific substitution order. □
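The telescoping argument of the proof can be checked mechanically for small n. The Python sketch below builds $p_r$ and the array polynomials $p_0, \dots, p_{n-2}$ (taking the inputs of $S_0$ to be the outputs $s_{1,k}$ of $S_1$, so that consecutive arrays chain), subtracts them in reverse topological order, and records the number of monomials of each remainder. Variable names such as `s1,2` are ad hoc, and the sketch counts monomials of the fully expanded remainders, so its numbers need not coincide with the size measure $|r_i|$ used in the proof; what it illustrates is that the remainders shrink and the last one is zero.

```python
from collections import defaultdict


def array_multiplier_polys(n):
    """Build p_r and the array polynomials p_0, ..., p_{n-2} of the proof (n >= 3).

    A polynomial is a dict mapping a frozenset of variable names (a monomial)
    to an integer coefficient. The variable names are illustrative only.
    """
    def new():
        return defaultdict(int)

    def add_term(p, c, *vs):
        p[frozenset(vs)] += c

    x, y = (lambda k: f"x{k}"), (lambda k: f"y{k}")
    s = lambda i, k: f"s{i},{k}"

    pr = new()
    for k in range(2 * n):
        add_term(pr, -2**k, f"z{k}")
    for i in range(n):
        for j in range(n):
            add_term(pr, 2**(i + j), x(i), y(j))

    p0 = new()
    for k in range(2 * n):                          # outputs z_k
        add_term(p0, -2**k, f"z{k}")
    add_term(p0, 2, y(0), x(1))
    for k in range(2, 2 * n - 1):                   # inputs s_{1,k}, k = 2..2n-2
        add_term(p0, 2**k, s(1, k))
    for k in range(n):                              # row x_0
        add_term(p0, 2**k, y(k), x(0))
    for k in range(1, n):                           # column y_{n-1}
        add_term(p0, 2**(k + n - 1), y(n - 1), x(k))
    polys = [p0]

    for j in range(1, n - 2):                       # 1 <= j <= n-3
        pj = new()
        for k in range(j + 1, 2 * n - j):           # outputs s_{j,k}
            add_term(pj, -2**k, s(j, k))
        add_term(pj, 2**(j + 1), y(0), x(j + 1))
        for k in range(j + 2, 2 * n - j - 1):       # inputs s_{j+1,k}
            add_term(pj, 2**k, s(j + 1, k))
        for k in range(1, n - j):                   # row x_j
            add_term(pj, 2**(k + j), y(k), x(j))
        for k in range(j + 1, n):                   # column y_{n-j-1}
            add_term(pj, 2**(k - j + n - 1), y(n - j - 1), x(k))
        polys.append(pj)

    last = new()                                    # p_{n-2}
    for k in range(n - 1, n + 2):
        add_term(last, -2**k, s(n - 2, k))
    add_term(last, 2**(n - 1), y(0), x(n - 1))
    add_term(last, 2**(n - 1), y(1), x(n - 2))
    add_term(last, 2**n, y(1), x(n - 1))
    polys.append(last)
    return pr, polys


def remainder_sizes(n):
    """Subtract p_0, ..., p_{n-2} from p_r in order; return the monomial count of each remainder."""
    pr, polys = array_multiplier_polys(n)
    r = dict(pr)
    sizes = []
    for p in polys:
        for m, c in p.items():
            r[m] = r.get(m, 0) - c
        r = {m: c for m, c in r.items() if c != 0}
        sizes.append(len(r))
    return sizes


print(remainder_sizes(5))   # [22, 13, 6, 0]
```

For this expanded representation the sizes follow $(n-j)^2 - 3$ for $r_{j+1}$ with $0 \le j \le n-3$, before the final division produces the zero remainder.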
Since the adder cells in SCNs are not limited to 3-bit adders, the integer multiplication function can also be implemented with other types of adder cells, such as 5-bit adders, which have three outputs. In such cases, the outputs in the tuple of adder cell functions $Fa = (co_0, \dots, co_m, s)$ must be substituted consecutively to cancel the carry terms revealed within the polynomials of these adder cells.
Example 11. A 5-bit adder cell (e.g., a 4:2 compressor) Fa = (co0, co1, s) has five inputs {x1, x2, x3, x4, x5} and three outputs. When it is described by three polynomials, one for every output, the carry terms are revealed as follows:
g2 : −co1 −2x3x2x1 + x3x2 + x3x1 + x2x1
g1 : −co0 −8x5x4x3x2x1 + 4x5x4x3x2 + 4x5x4x3x1 + 4x5x4x2x1 + 4x5x3x2x1
+4x4x3x2x1 − 2x5x4x3 − 2x5x4x2 − 2x5x4x1 − 2x5x3x2 − 2x5x3x1 − 2x5x2x1
−2x4x3x2 − 2x4x3x1 − 2x4x2x1 + x5x4 + x5x3 + x5x2 + x5x1 + x4x3 + x4x2
+x4x1
g0 : −s +16x5x4x3x2x1 − 8x5x4x3x2 − 8x5x4x3x1 − 8x5x4x2x1 − 8x5x3x2x1
−8x4x3x2x1 + 4x5x4x3 + 4x5x4x2 + 4x5x4x1 + 4x5x3x2 + 4x5x3x1 + 4x5x2x1
+4x4x3x2 + 4x4x3x1 + 4x4x2x1 + 4x3x2x1 − 2x5x4 − 2x5x3 − 2x5x2 − 2x5x1
−2x4x3 − 2x4x2 − 2x4x1 − 2x3x2 − 2x3x1 − 2x2x1 + x5 + x4 + x3 + x2 + x1 .
The carry terms are the nonlinear monomials, each of which occurs in at least two of these polynomials. By substituting the variables (co0, co1, s) in the specification polynomial pr := −2co0 − 2co1 − s + x1 + x2 + x3 + x4 + x5 during the verification of the 5-bit adder, these carry terms cancel each other linearly, resulting in a zero remainder.
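Example 11 can be reproduced with a small multilinear polynomial helper. The sketch assumes the common two-full-adder structure of a 4:2 compressor, co1 = MAJ(x1, x2, x3) and co0 = MAJ(x1 ⊕ x2 ⊕ x3, x4, x5), which yields exactly the polynomials of co1, co0, and s listed above. The helper names are illustrative; Boolean idempotence ($x \cdot x = x$) is modeled by taking the union of variable sets when multiplying monomials.

```python
from collections import defaultdict


def clean(p):
    return {m: c for m, c in p.items() if c != 0}


def var(name):
    return {frozenset([name]): 1}


def add(*ps):
    out = defaultdict(int)
    for p in ps:
        for m, c in p.items():
            out[m] += c
    return clean(out)


def scale(p, k):
    return {m: k * c for m, c in p.items()}


def mul(p, q):
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[m1 | m2] += c1 * c2       # Boolean variables: x * x = x
    return clean(out)


def XOR(a, b):                             # a + b - 2ab
    return add(a, b, scale(mul(a, b), -2))


def MAJ(a, b, c):                          # ab + ac + bc - 2abc
    return add(mul(a, b), mul(a, c), mul(b, c), scale(mul(mul(a, b), c), -2))


x = {i: var(f"x{i}") for i in range(1, 6)}
t = XOR(XOR(x[1], x[2]), x[3])             # internal sum of the first full adder
co1 = MAJ(x[1], x[2], x[3])                # polynomial of co1
co0 = MAJ(t, x[4], x[5])                   # polynomial of co0
s = XOR(XOR(t, x[4]), x[5])                # polynomial of s (5-input parity)

# p_r = -2co0 - 2co1 - s + x1 + ... + x5 after substituting co0, co1 and s
remainder = add(scale(co0, -2), scale(co1, -2), scale(s, -1), *x.values())
print(len(co1), len(co0), len(s), remainder)   # 4 22 31 {}
```

All 22 nonlinear monomials of co0 and the 4 of co1 are matched by monomials of s with coefficient ratio −2, which is why the remainder vanishes without any nonlinear term surviving.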
Since Lemma 7 assumes that the multiplier is constructed only from 3-bit adder cells, we extend it to multipliers that contain different adder cells with arbitrary numbers of inputs.
Lemma 8. For the multiplication function F implemented as a tuple of Boolean functions $F = (z_0, \dots, z_{2n-1})$ with the set of primary inputs $\{x_0, \dots, x_{n-1}, y_0, \dots, y_{n-1}\}$ and modeled by a SCN, let the SCN model of F consist of arrays $S_i = (s_{i,0}, s_{i,1}, \dots, s_{i,ln_i})$ which are built from adder cells $Fa_{i,w} = (co_{i,w,0}, co_{i,w,1}, \dots, co_{i,w,m}, s_{i,w})$ of arbitrary sizes. Then the complexity of verifying the SCN model of F wrt. its polynomial specification
$$p_r := -\sum_{k=0}^{2n-1} 2^k z_k + \sum_{k=0}^{n-1} 2^k x_k \cdot \sum_{k=0}^{n-1} 2^k y_k$$
is bounded in space by $O(n^2)$, as long as the variables in the list $\{co_{i,0,0}, \dots, co_{i,0,m}, s_{i,0}, co_{i,1,0}, \dots, co_{i,1,m}, s_{i,1}, \dots, co_{i,ln_i,0}, \dots, co_{i,ln_i,m}, s_{i,ln_i}\}$ of each array $S_i$ are substituted consecutively according to the order $co_{i,ln_i,m} > \dots > co_{i,ln_i,0} > s_{i,ln_i} > co_{i,ln_i-1,m} > \dots > co_{i,ln_i-1,0} > s_{i,ln_i-1} > \dots > co_{i,0,m} > \dots > co_{i,0,0} > s_{i,0}$.
Proof. The substitution order $co_{i,ln_i,m} > \dots > co_{i,ln_i,0} > s_{i,ln_i} > co_{i,ln_i-1,m} > \dots > co_{i,ln_i-1,0} > s_{i,ln_i-1} > \dots > co_{i,0,m} > \dots > co_{i,0,0} > s_{i,0}$ obliges the verification process to substitute the outputs of each adder cell $Fa_{i,w} = (co_{i,w,0}, \dots, co_{i,w,m}, s_{i,w})$ in F consecutively. This bounds the verification complexity of each adder array $S_i$ by O(n), since the carry terms that cause the exponential complexity are canceled under this substitution order. Hence, as in Lemma 7, the complexity of verifying an integer multiplier modeled as a SCN is bounded in space by $O(n^2)$, regardless of the size of the compound adder cells. □
In summary, rewriting the bit-level models of integer functions into SCNs is crucial to circumvent the exponential complexity of verifying these functions; otherwise, the carry terms are only eliminated after they have been reduced to the input variables, which leads to an exponential increase in the number and size of the nonlinear terms. The rewriting into SCNs is effective for the verification of basic multiplier architectures, i.e., multipliers with simple partial product generators and ripple carry adders in the last addition stage. However, the IMT procedure fails to verify more complex architectures even if they are modeled as SCNs, because their models contain, besides carry terms, redundant nonlinear terms called vanishing monomials. In the next section, we explain this limitation, which forms the basis of our proposed model rewriting algorithm that lifts bit-level multipliers into SCNs and, at the same time, removes vanishing monomials from the SCN models.
4.3 problem of vanishing monomials
Simplifying the circuit model by rewriting it into a SCN is not sufficient for the verification of integer multipliers that contain parallel adders or Booth recoding. The main reason is vanishing monomials, i.e., monomials that always evaluate to zero, which appear in every algebraic model of these complex multipliers. Unfortunately, the IMT cannot cancel these vanishing monomials until they have been expressed in terms of primary input variables. Some of the vanishing monomials have the property that representing them by input variables increases the number of intermediate monomials exponentially, making the computation infeasible. In this section and the following one, we illustrate the vanishing monomials limitation with two examples, a parallel prefix adder (PPA) and a Booth partial product cell, and show how to overcome this problem by a new rewriting scheme enhanced by logic reduction.
Example 12. Consider the following circuit model of a 3-bit PPA:¹
s3 = c2 =⇒ g1 := −s3 + c2
c2 = d2 ∨ (e2 ∧ d1) ∨ (e2 ∧ e1 ∧ d0) =⇒ g2 := −c2 + e2d2e1d1d0 − e2e1d1d0
−e2d2e1d0 − e2d2d1 + e2e1d0 + e2d1 + d2
s2 = e2 ⊕ c1 =⇒ g3 := −s2 − 2c1e2 + c1 + e2
c1 = d1 ∨ (e1 ∧ d0) =⇒ g4 := −c1 − e1d1d0 + e1d0 + d1
s1 = e1 ⊕ c0 =⇒ g5 := −s1 − 2c0e1 + c0 + e1
c0 = d0 =⇒ g6 := −c0 + d0
s0 = e0 =⇒ g7 := −s0 + e0
e2 = x2 ⊕ y2 =⇒ g8 := −e2 − 2y2x2 + y2 + x2
d2 = x2 ∧ y2 =⇒ g9 := −d2 + y2x2
e1 = x1 ⊕ y1 =⇒ g10 := −e1 − 2y1x1 + y1 + x1
d1 = x1 ∧ y1 =⇒ g11 := −d1 + y1x1
e0 = x0 ⊕ y0 =⇒ g12 := −e0 − 2y0x0 + y0 + x0
d0 = x0 ∧ y0 =⇒ g13 := −d0 + y0x0.
Here, $s_i$ is the sum bit and $c_i$ is the carry bit, and for every pair of input bits $x_i, y_i$ there is a generation bit $d_i$ and a propagation bit $e_i$. The vanishing monomials in this model are those that contain both $e_i$ and $d_i$ for the same index i: e2d2e1d1d0, e2e1d1d0, e2d2e1d0, and e2d2d1 in g2, and e1d1d0 in g4. As an example, consider the vanishing monomial e1d1d0 of polynomial g4. Substituting e1 and d1 in this monomial yields
$$e_1d_1d_0 \xrightarrow{g_{10}} -2d_1d_0y_1x_1 + d_1d_0y_1 + d_1d_0x_1 \xrightarrow{g_{11}} -2d_0y_1x_1 + d_0y_1x_1 + d_0y_1x_1 = 0.$$
Clearly, the IMT can easily cancel this monomial. However, the corresponding monomial in the representation of the highest carry of an n-bit adder is $e_{n-1} \cdots e_2 e_1 d_1 d_0$. This follows from modeling the carry bit as
$$c_{n-1} = d_{n-1} \lor (e_{n-1} \land d_{n-2}) \lor (e_{n-1} \land e_{n-2} \land d_{n-3}) \lor \dots \lor (e_{n-1} \land e_{n-2} \land \dots \land e_2 \land d_1) \lor (e_{n-1} \land e_{n-2} \land \dots \land e_2 \land e_1 \land d_0).$$
By substituting in this monomial according to the order $e_{n-1} > d_{n-1} > \dots > e_0 > d_0$, the number of vanishing monomials increases from 1 to $3^{n-1}$ monomials with a maximum size of 2n variables. Consider another vanishing monomial, e2d2e1d0 of polynomial g2; the corresponding vanishing monomial for the carry bit $c_{n-1}$ is $e_{n-1}d_{n-1}e_{n-2} \cdots e_2e_1d_0$. By substituting in this monomial with the order $e_0 > d_0 > \dots > e_{n-1} > d_{n-1}$, which differs from the previous one, the number of intermediate vanishing monomials increases to about $3^{n-1}$ with a maximum size of 2n variables. From these two examples, we conclude that it is hard to find a substitution order that cancels all vanishing monomials before they blow up.
¹ Please recall that parallel prefix adders are typically found in the last stage of parallel multipliers.
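The growth for the first order can be reproduced by expanding $e_{n-1} \cdots e_2 e_1 d_1 d_0$ with $e_i = x_i + y_i - 2x_iy_i$ and $d_i = x_iy_i$ (the generate/propagate definitions of Example 12) in the order $e_{n-1} > d_{n-1} > \dots > e_0 > d_0$. The following sketch, with illustrative helper names, tracks the largest intermediate polynomial; the cancellation only happens once $d_1$ is substituted, after all $3^{n-1}$ monomials have been generated.

```python
from collections import defaultdict


def substitute(poly, v, tail):
    """Replace variable v by polynomial tail in every monomial (x * x = x)."""
    out = defaultdict(int)
    for mono, c in poly.items():
        if v in mono:
            for m2, c2 in tail.items():
                out[(mono - {v}) | m2] += c * c2
        else:
            out[mono] += c
    return {m: c for m, c in out.items() if c != 0}


def blowup(n):
    """Expand e_{n-1}...e_2 e_1 d_1 d_0 in the order e_{n-1} > d_{n-1} > ... > e_0 > d_0.

    Returns the largest number of intermediate terms and the final polynomial.
    """
    model = {}
    for i in range(n):
        x, y = f"x{i}", f"y{i}"
        model[f"e{i}"] = {frozenset([x]): 1, frozenset([y]): 1, frozenset([x, y]): -2}
        model[f"d{i}"] = {frozenset([x, y]): 1}
    poly = {frozenset([f"e{i}" for i in range(1, n)] + ["d1", "d0"]): 1}
    peak = len(poly)
    for i in reversed(range(n)):
        for v in (f"e{i}", f"d{i}"):
            if any(v in m for m in poly):
                poly = substitute(poly, v, model[v])
                peak = max(peak, len(poly))
    return peak, poly


print(blowup(5))   # (81, {})
```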
The experimental results of [99] conﬁrm the problem of the IMT with parallel
adders. Their results show that the technique cannot verify Kogge-Stone adders
with more than 6 bits. To summarize our observations, the core problem that we need to solve is the occurrence of a large number of vanishing monomials that lead to an exponential blow-up when reduced to primary input variables.
4.4 logic reduction within model rewriting
This section presents the integration of a logic reduction rule within model
rewriting that consists of two rewriting schemes. The proposed solution elimi-
nates vanishing monomials before they cause a blow-up and rewrites bit-level
models into SCNs.
4.4.1 Logic Reduction
To overcome the limitation caused by vanishing monomials during the IMT,
we propose to apply logic reduction during the rewriting of the Gröbner basis
model in order to remove vanishing monomials before their blow-up. Looking
again at Example 12, it is easy to see that the monomial e1d1d0 can be removed
when considering that the variable e1 is the XOR of x1 and y1, and the variable d1 is the AND of x1 and y1. Based on this structural knowledge of the circuit model, we can conclude that the monomial always evaluates to zero, since (x ⊕ y) · (x ∧ y) = 0 for all x and y; this fact can therefore be used to remove terms from polynomials. If, e.g., f1 = x ⊕ y and f2 = x ∧ y, any term incorporating both f1 and f2 can be removed. We refer to this as the XOR-AND vanishing rule. By keeping track of the original gate function and the input variables associated with each variable, we can efficiently search for monomials that satisfy the XOR-AND vanishing rule. Applying this rule removes all the vanishing monomials of the parallel adder model shown in Example 12 and avoids the high computation cost of the IMT.
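A minimal sketch of the XOR-AND vanishing rule, assuming the structural knowledge is available as a table that maps each variable to its gate type and its input pair (the table `GATES` and all helper names are illustrative, not part of the original tool). Applied to the tails of g2 and g4 of Example 12, it removes exactly the monomials that contain both $e_i$ and $d_i$.

```python
# Hypothetical gate table for Example 12: variable -> (gate type, input pair).
GATES = {f"e{i}": ("XOR", frozenset({f"x{i}", f"y{i}"})) for i in range(3)}
GATES.update({f"d{i}": ("AND", frozenset({f"x{i}", f"y{i}"})) for i in range(3)})


def P(*terms):
    """Polynomial {frozenset(vars): coeff} from (coeff, 'v1 v2 ...') pairs."""
    return {frozenset(vs.split()): c for c, vs in terms}


def vanishes(mono, gates):
    """True if mono contains an XOR and an AND variable over the same inputs."""
    def inputs(kind):
        return {gates[v][1] for v in mono if v in gates and gates[v][0] == kind}
    return not inputs("XOR").isdisjoint(inputs("AND"))


def xor_and_rule(poly, gates):
    """Drop every monomial that the XOR-AND vanishing rule proves to be zero."""
    return {m: c for m, c in poly.items() if not vanishes(m, gates)}


g2_tail = P((1, "e2 d2 e1 d1 d0"), (-1, "e2 e1 d1 d0"), (-1, "e2 d2 e1 d0"),
            (-1, "e2 d2 d1"), (1, "e2 e1 d0"), (1, "e2 d1"), (1, "d2"))
g4_tail = P((-1, "e1 d1 d0"), (1, "e1 d0"), (1, "d1"))

print(sorted(" ".join(sorted(m)) for m in xor_and_rule(g4_tail, GATES)))
# ['d0 e1', 'd1']
```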
We have published this idea in [87]. However, the XOR-AND vanishing rule requires structural knowledge of the circuit model, so we propose to generalize this rule in the sense that no structural knowledge is needed and the rule is not restricted to the product of AND and XOR functions: the generalized rule can reduce the product of arbitrary functions.
Let X be a set of variables, and let f1 and f2 be two Boolean functions over the variables X1 ⊆ X and X2 ⊆ X, respectively, with X1 ∩ X2 ≠ ∅. If f2 evaluates to true for exactly one assignment, f1 may simplify to a constant value when its common variables are assigned according to that assignment. To illustrate the concept, consider the multiplexer function f1(x1, x2, x3) = x1x3 − x2x3 + x2 and f2(x1, x2) = x1x2. Clearly, f2 = 1 only if x1 = x2 = 1, and f1(1, 1, x3) = 1. Therefore, we conclude that f1f2 = f2, and we can simplify polynomials accordingly. For a polynomial g := −x4x5f1f2x6 + x4x5f2x6, applying this rule to the monomial x4x5f1f2x6 simplifies it to x4x5f2x6, and the polynomial g evaluates to g := −x4x5f2x6 + x4x5f2x6 = 0. We call this reduction rule the one assignment rule.
An approach to apply the one assignment rule is as follows:
1. Search for monomials in the algebraic model that contain two variables representing functions f1 and f2 which share some of their inputs, such that f2 has exactly one satisfying assignment.
2. Reduce f1 after assigning to the shared inputs the values that evaluate f2 to one.
3. If f1 is equal to zero or one, rewrite the monomial by substituting f1 with its value.
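Steps 2 and 3 can be sketched by brute-force enumeration, which stays cheap under the eight-input limit motivated in the following paragraph. In this sketch, f1 and f2 are Python callables over 0/1 assignments; how the actual implementation represents and locates these functions is not specified in the text, so the interface below is an assumption.

```python
from itertools import product

MAX_INPUTS = 8   # applicability limit of the rule


def assignments(names):
    """All 0/1 assignments to the given variable names."""
    for bits in product((0, 1), repeat=len(names)):
        yield dict(zip(names, bits))


def one_assignment_value(f1, vars1, f2, vars2):
    """Constant value of f1 under f2's unique satisfying assignment, else None.

    f1 and f2 are callables over {name: 0/1}; vars1 and vars2 list their inputs.
    A result of 1 means f1*f2 = f2, and a result of 0 means f1*f2 = 0.
    """
    if not set(vars1) & set(vars2) or len(set(vars1) | set(vars2)) > MAX_INPUTS:
        return None
    sat = [a for a in assignments(vars2) if f2(a)]
    if len(sat) != 1:                    # f2 must have exactly one satisfying assignment
        return None
    free = [v for v in vars1 if v not in sat[0]]
    values = {f1({**sat[0], **a}) for a in assignments(free)}
    return values.pop() if len(values) == 1 else None


# Multiplexer example: f1 = x1x3 - x2x3 + x2 (x3 selects x1 or x2) and f2 = x1x2
mux = lambda a: a["x1"] * a["x3"] - a["x2"] * a["x3"] + a["x2"]
and2 = lambda a: a["x1"] * a["x2"]
print(one_assignment_value(mux, ["x1", "x2", "x3"], and2, ["x1", "x2"]))   # 1
```

With f1 = x1 ⊕ x2 and the same f2, the function returns 0, which recovers the XOR-AND vanishing rule as a special case.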
The one assignment rule supersedes the XOR-AND vanishing rule, since it is not restricted to specific Boolean functions and does not require structural knowledge. However, because of its wide applicability, many unnecessary searches for monomials that satisfy the rule are performed. To solve this problem, we limit the application of the rule to functions f with at most eight inputs. This number is chosen based on the observation that the rule is applied within a rewriting scheme (as shown in the next subsection) which rewrites bit-level multipliers into adder cells, where the largest adder cell has eight inputs. This restriction might be refined when the one assignment rule is used in other applications.
4.4.2 Rewriting Schemes
Although the correspondence between gates in the circuit and variables in the polynomials is given, rewriting the model is crucial to reveal vanishing monomials together with carry terms by lifting the bit-level description of the circuit into a SCN. Without rewriting the model:
1. if no substitution is applied, one may see neither the monomials that contain variables satisfying the one assignment rule nor the carry terms, and
2. if arbitrary substitutions are applied, a blow-up may occur within the rewriting process because some variables of these nonlinear terms are substituted.
Both cases prevent the application of the one assignment rule as well as the cancellation of carry terms against each other during the IMT procedure. Consequently, we integrate two rewriting schemes that reveal vanishing monomials in addition to carry terms within models of SCNs.
XOR rewriting
We propose the XOR rewriting scheme, which preserves the input/output variables of XOR gates. It is carried out after the Gröbner basis modeling of the circuit and before applying the IMT procedure, and it performs the following steps:
1. Store in a list V all variables that are input or output variables of an XOR gate, or primary inputs and primary outputs. Variables that are outputs of XOR gates and given as inputs to other XOR gates are excluded from the list.
2. Rewrite the model by substituting variables using the rewrite rule (see Definition 17) such that the model depends only on variables in V. After each substitution, the one assignment rule is applied as described above.
As shown in Figure 20, the result of XOR rewriting is a set of polynomials that model chains of XOR gates, together with other polynomials that model Boolean functions (combinations of arbitrary logic gates with multiple inputs and one output) given as inputs to the XOR chains. In the figure, the cloud symbols refer to combinations of arbitrary logic gates, x are primary inputs, and xo are primary outputs.
Figure 20: Schematic of Circuit Model after XOR Rewriting
Example 13. Continuing Example 12, the algebraic model of the parallel adder after XOR rewriting is as follows:
s3 = c2 =⇒ g1 := −s3 + c2
c2 = d2 ∨ (e2 ∧ d1) ∨ (e2 ∧ e1 ∧ d0) =⇒ g2 := −c2 + e2e1x2y2x1y1x0y0 −
e2e1x1y1x0y0 − e2e1x2y2x0y0 − e2x2y2x1y1 + e2e1x0y0 + e2x1y1 + x2y2
s2 = e2 ⊕ c1 =⇒ g3 := −s2 − 2c1e2 + c1+ e2
c1 = d1 ∨ (e1 ∧ d0) =⇒ g4 := −c1−e1x1y1x0y0 + e1x0y0 + x1y1
s1 = e1 ⊕ c0 =⇒ g5 := −s1 − 2c0e1 + c0 + e1
c0 = d0 =⇒ g6 := −c0 + x0y0
s0 = e0 =⇒ g7 := −s0 + e0
e2 = x2 ⊕ y2 =⇒ g8 := −e2 − 2y2x2 + y2 + x2
e1 = x1 ⊕ y1 =⇒ g10 := −e1 − 2y1x1 + y1 + x1
e0 = x0 ⊕ y0 =⇒ g12 := −e0 − 2y0x0 + y0 + x0
XOR rewriting has eliminated all variables that are not inputs or outputs of chains of XOR gates; therefore, besides the primary inputs and outputs, it keeps only the variables in the list {e0, e1, e2, c0, c1, c2}. This lifts the bit-level description into polynomials that mainly express the functions of the sum and carry variables of the adder cells within the circuit, and it exposes the same vanishing monomials as the model of Example 12, now expressed over primary inputs (e.g., e1x1y1x0y0 in g4).
We have also observed that XOR rewriting efficiently reveals the vanishing monomials of the Booth partial product cell, as demonstrated by the following example.
Figure 21: Booth Partial Product Cell
Example 14. Figure 21 shows a Booth partial product cell, which is a building block of many efficient multiplier circuits. The cell consists of two main parts. The first part is Booth's encoder with output variables a1 and a2. The second part is the generator, which generates the partial product PPj,i. Note that a1 and a2 are used to generate further partial product bits, so they are considered multiple-fanout variables. Overall, the fanout variables of the circuit are a1, a2, PPj,i, while the XOR variables are a1, a3, PPj,i (recall that the inputs of the XOR gates, here y2j, y2j−1, and y2j+1, are also stored in V). The algebraic model of the circuit after XOR rewriting is:
g1 := −PPj,i − 2a3y2j+1 + a3 + y2j+1
g2 := −a3 + a1y2j+1y2jxixi−1 + a1y2j+1y2j−1xixi−1 − a1y2j+1xixi−1 + a1xi −
y2j+1y2jxi−1 − y2j+1y2j−1xi−1 + y2j+1xi−1 + y2jy2j−1xi−1 − a1y2jy2j−1xixi−1
g3 := −a1 − 2y2jy2j−1 + y2j + y2j−1.
The vanishing monomial a1y2jy2j−1xixi−1 appears in g2. It contains a1, which is the XOR of y2j and y2j−1, and the product y2jy2j−1, which is the AND of y2j and y2j−1.
Common rewriting
To reveal carry terms, a rewriting scheme named fanout rewriting [39] has been proposed based on the fanouts of the circuit gates, such that the model terms depend only on shared variables. This dependency increases the chance of exposing terms with common monomials, which is the main feature of carry terms. Fanout rewriting is performed in two steps: 1) it finds the gates that have multiple fanouts and stores the corresponding output variables in a list, and 2) it substitutes all variables that are not in this list, such that the model depends only on fanouts, primary inputs, and primary outputs. This rewriting reveals carry terms that cancel each other during the IMT without exponential complexity. XOR rewriting does not provide this positive effect, which makes the verification inefficient if only XOR rewriting is applied. Hence, we propose to carry out a further rewriting after XOR rewriting, called common rewriting, which is similar to fanout rewriting. Common rewriting exposes carry terms by making the polynomials depend on shared variables: it rewrites the model obtained from XOR rewriting such that the polynomials depend only on variables that are used in more than one polynomial. This is very similar to fanout rewriting, but since we are no longer working on the original circuit model, one cannot strictly speak of fanout variables.
Example 15. Continuing Examples 12 and 13, the algebraic model of the parallel adder after removing vanishing monomials is as follows:
g1 := −s3 + c2
g2 := −c2 + e2e1x0y0 + e2x1y1 + x2y2
g3 := −s2 − 2c1e2 + c1+ e2
g4 := −c1 + e1x0y0 + x1y1
g5 := −s1 − 2c0e1 + c0 + e1
g6 := −c0 + x0y0
g7 := −s0 + e0
g8 := −e2 − 2y2x2 + y2 + x2
g10 := −e1 − 2y1x1 + y1 + x1
g12 := −e0 − 2y0x0 + y0 + x0.
Since the variables in the list {e0, c0, c1, c2} are not used as inputs of more than one polynomial in the model, common rewriting eliminates these variables, deriving the following new model:
g1 := −s3 + e2e1x0y0 + e2x1y1 + x2y2
g3 := −s2 −2e2e1x0y0 − 2e2x1y1 +e1x0y0 + x1y1 + e2
g5 := −s1 −2e1x0y0 +x0y0 + e1
g7 := −s0 −2x0y0 + y0 + x0
g8 := −e2 −2x2y2 + y2 + x2
g10 := −e1 −2x1y1 + y1 + x1.
The model resulting from common rewriting reveals carry terms, i.e., monomials shared between polynomials, such as e2e1x0y0 and e2x1y1, which occur in both g1 and g3 and cancel each other during the IMT.
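The variable selection of common rewriting can be sketched on the model of Example 15 before common rewriting. The sketch keeps the primary inputs and outputs plus every variable that occurs in the tails of at least two polynomials; coefficients are omitted because only occurrences matter, and the data layout and function name are illustrative assumptions.

```python
from collections import Counter

# Model of Example 15 before common rewriting: leading variable -> tail monomials
MODEL = {
    "s3": ["c2"],
    "c2": ["e2 e1 x0 y0", "e2 x1 y1", "x2 y2"],
    "s2": ["c1 e2", "c1", "e2"],
    "c1": ["e1 x0 y0", "x1 y1"],
    "s1": ["c0 e1", "c0", "e1"],
    "c0": ["x0 y0"],
    "s0": ["e0"],
    "e2": ["y2 x2", "y2", "x2"],
    "e1": ["y1 x1", "y1", "x1"],
    "e0": ["y0 x0", "y0", "x0"],
}
PRIMARY = {"s0", "s1", "s2", "s3"} | {f"{v}{i}" for v in "xy" for i in range(3)}


def common_rewriting_variables(model, primary):
    """Keep primary I/O and variables occurring in the tails of at least two polynomials."""
    uses = Counter(v for tail in model.values()
                   for v in set().union(*(set(m.split()) for m in tail)))
    return primary | {v for v, n in uses.items() if n > 1}


print(sorted(set(MODEL) - common_rewriting_variables(MODEL, PRIMARY)))
# ['c0', 'c1', 'c2', 'e0']
```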
4.4.3 Overall Algorithm
Both XOR rewriting and common rewriting follow two steps: identifying a set of variables and then substituting all remaining variables. Hence, the rewriting can be described by a generalized algorithm, named Gröbner Rewriting (GB-Rew), shown in Algorithm 2. It substitutes the variables that are not in V using the rewrite rule (see Definition 17). Additionally, within GB-Rew, monomials are removed from the model using the one assignment rule after every substitution.
The rewriting is performed by substituting the variables of every single polynomial in the model. The polynomials are considered in reverse order of their leading monomial variables, where the variables are ordered according to the reverse topological order of the circuit, as explained in Subsection 2.2.3. For example, for a model of two polynomials g1 := x1 + tail(g1) and g2 := x2 + tail(g2) with the monomial ordering x2 > x1, the polynomial g1 is considered first.
Within a single polynomial, the substitution order of the variables affects the time performance of the rewriting. The substitution order is chosen according to the number of terms in the tails of the variables' polynomials. For example, assume that a single polynomial gs has a term x1x2; to rewrite gs, x1 and x2 have to be substituted using the rewrite rule. Given their polynomials g1 := x1 + tail(g1) and g2 := x2 + tail(g2), variable x1 is substituted before x2 if the number of terms n1 in tail(g1) is smaller than the number of terms n2 in tail(g2). If the two substitutions cancel some intermediate terms (i.e., the number of terms of gs increases by less than n1 · n2 after the two substitutions), following the proposed order reduces the maximum number of terms in the intermediate forms of the polynomial gs.
After ﬁnishing the model rewriting and removing vanishing monomials, all
polynomials whose leading monomial variables are not in the variables list V
will be removed, because they have been substituted during rewriting.
The overall rewriting algorithm is carried out before calling the IMT procedure, by successively executing the XOR rewriting and common rewriting schemes to obtain a SCN model from the bit-level description of the multiplier. This is illustrated by Algorithm 3 and is referred to as model rewriting.
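A compact sketch of GB-Rew (Algorithm 2) and of its use in model rewriting (Algorithm 3), run on the parallel adder of Example 12. Assumptions: polynomials are dicts from monomials to coefficients; polynomials are processed in reverse order of their leading variables; within a polynomial, the variable with the smallest tail is substituted first; and, instead of a general one assignment rule, `prune` hard-codes the XOR/AND facts of this adder. The variable lists are taken from Examples 13 and 15 rather than computed.

```python
from collections import defaultdict


def P(*terms):
    """Polynomial {frozenset(vars): coeff} from (coeff, 'v1 v2 ...') pairs."""
    return {frozenset(vs.split()): c for c, vs in terms}


def substitute(poly, v, tail):
    """Rewrite rule: replace v by its tail in every monomial (x * x = x)."""
    out = defaultdict(int)
    for mono, c in poly.items():
        if v in mono:
            for m2, c2 in tail.items():
                out[(mono - {v}) | m2] += c * c2
        else:
            out[mono] += c
    return {m: c for m, c in out.items() if c != 0}


def prune(poly):
    """Stand-in for the one assignment rule on this example only:
    e_i = x_i XOR y_i and d_i = x_i AND y_i, so e_i*d_i = 0 and e_i*x_i*y_i = 0."""
    def vanishes(mono):
        return any(v[0] == "e" and (f"d{v[1:]}" in mono
                                    or {f"x{v[1:]}", f"y{v[1:]}"} <= mono)
                   for v in mono)
    return {m: c for m, c in poly.items() if not vanishes(m)}


def gb_rew(model, keep):
    """Rewrite every tail until it only uses variables in keep (sketch of GB-Rew).

    model maps leading variables to tails, listed from outputs to inputs, so the
    reversed iteration handles the lowest leading variables first.
    """
    cur = dict(model)
    for lead in reversed(list(model)):
        r = prune(cur[lead])
        while pending := {v for m in r for v in m} - keep:
            # substitute first the variable whose own tail has the fewest terms
            v = min(pending, key=lambda w: (len(cur[w]), w))
            r = prune(substitute(r, v, cur[v]))
        cur[lead] = r
    return {v: t for v, t in cur.items() if v in keep}   # drop substituted polynomials


# Example 12 (leading variable -> tail), from outputs to inputs
MODEL = {
    "s3": P((1, "c2")),
    "c2": P((1, "e2 d2 e1 d1 d0"), (-1, "e2 e1 d1 d0"), (-1, "e2 d2 e1 d0"),
            (-1, "e2 d2 d1"), (1, "e2 e1 d0"), (1, "e2 d1"), (1, "d2")),
    "s2": P((-2, "c1 e2"), (1, "c1"), (1, "e2")),
    "c1": P((-1, "e1 d1 d0"), (1, "e1 d0"), (1, "d1")),
    "s1": P((-2, "c0 e1"), (1, "c0"), (1, "e1")),
    "c0": P((1, "d0")),
    "s0": P((1, "e0")),
}
for i in (2, 1, 0):
    MODEL[f"e{i}"] = P((-2, f"y{i} x{i}"), (1, f"y{i}"), (1, f"x{i}"))
    MODEL[f"d{i}"] = P((1, f"y{i} x{i}"))

IO = {f"{p}{i}" for p in "xy" for i in range(3)} | {f"s{i}" for i in range(4)}
xor_model = gb_rew(MODEL, IO | {"e0", "e1", "e2", "c0", "c1", "c2"})   # list of Example 13
final = gb_rew(xor_model, IO | {"e1", "e2"})                           # list of Example 15
print(sorted((" ".join(sorted(m)), c) for m, c in final["s2"].items()))
# [('e1 e2 x0 y0', -2), ('e1 x0 y0', 1), ('e2', 1), ('e2 x1 y1', -2), ('x1 y1', 1)]
```

The first call reproduces the model of Example 15 without vanishing monomials, and the second call yields its model after common rewriting.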
4.4.4 Discussion
So far, we have demonstrated that the proposed integration of the XOR rewriting scheme and the common rewriting scheme implicitly lifts the bit-level description into a SCN by exposing carry terms and purifying the model of vanishing monomials; the remaining question is why this integration is capable of doing so. The proposed rewriting schemes exploit a common property of different multiplier architectures: they are composed of adder cells, such as 3-bit or 5-bit adders, which generate sum and carry bits, where the sum bit is the output of a chain of XOR gates, while the architectures differ in how they implement the carry functions. For a given gate netlist, XOR rewriting detects chains of XOR gates and assumes that each chain computes the sum bit of an adder component. For an adder cell with n inputs $x_i$, XOR rewriting eliminates the internal variables of the cell; consequently, it describes the function of the sum bit s and the functions of the output carry bits $co_j$ by polynomials that map them directly to the input variables $x_i$. Substituting only these internal variables reveals vanishing monomials in the polynomials describing the carry functions and, at the same time, protects these monomials from further substitutions that may cause a blow-up. Applying common rewriting after XOR rewriting then reveals the carry terms that are shared among sum and carry polynomials. Thus, the combination of these two rewriting schemes succeeds in rewriting a large class of bit-level multiplier architectures into SCN models.
Algorithm 2 Gröbner Basis Rewriting (GB-Rew)
Require: Variables V, Circuit Model G
Ensure: Model Gn rewritten wrt. V
1: for gi ∈ G do {in reverse order of leading monomials}
2: lv ← lm(gi)
3: r ← gi − lv
4: while Vars(r) ⊈ V do
5: Choose vt ∈ Vars(r) \ V
6: Choose gt ∈ G such that lm(gt) = vt
7: r ← Spoly(r, gt)
8: r ← OneAssignment-Rule(r)
9: end while
10: gi ← r + lv
11: end for
12: Gn ← UpdateModel(G, V) {Remove polynomials whose leading terms are not in V}
13: return Gn
Algorithm 3 Model Rewriting
Require: Specification Polynomial pspec, Circuit Model G
Ensure: Circuit Model G
1: V ← XORRewritingVariables(G)
2: G ← GB-Rew(V, G)
3: V ← CommonRewritingVariables(G)
4: G ← GB-Rew(V, G)
5: return G
Algorithm 4 IMT with Substitution Rules
Require: Specification polynomial pr, circuit polynomials G = {g1, g2, . . . , gs}
Ensure: Remainder r
1: V ← OrderedPolynomialVariables(pr, G) {Reverse Topological Order}
2: r ← pr
3: x ← SelectingVariable(r, V) {Searching for a variable satisfying one of the substitution rules}
4: while x ∉ PrimaryInputs do
5: Choose gt ∈ G such that lm(gt) = x
6: $r \xrightarrow{g_t}_{+} r$
7: x ← SelectingVariable(r, V)
8: end while
9: return r
Since the computational complexity of verifying SCNs is polynomial under a specific substitution order, as proved by Lemma 8, the proposed model rewriting algorithm is the first stage toward fully automated verification of bit-level multiplier architectures. The second stage is an efficient IMT procedure that applies a substitution order which averts an exponential blow-up in the size of the verification problem. In the next section, we introduce substitution rules that enable the IMT to perform this task.
4.5 ideal membership testing
The IMT can be thought of as an algorithm that, in every iteration, is given a polynomial $r_i$ and a set of variables $V = \{x_0, x_1, \dots, x_n\}$ related to a set of polynomials G (the Gröbner basis model). The task of the IMT is to substitute in $r_i$ one of the variables $x \in V$ by its corresponding polynomial $g \in G$ using the rewrite rule, yielding a new polynomial $r_{i+1}$ which, together with V, forms the input of the next iteration. In each iteration i, the IMT decides which variable is best to substitute, thereby avoiding a potential blow-up in the sizes of the resulting polynomials in later iterations. The IMT terminates when no further substitutions can be performed, i.e., when the resulting polynomial depends only on primary inputs of the model. Under the assumption that there is a substitution order that solves the verification problem in polynomial space, the challenge of the IMT procedure is to find this substitution order.
For the verification problem of the multiplier, [23, 39] give the IMT a fixed substitution order based on the reverse topological order of the circuit, which restricts variables that have the same level and depend on common inputs to be substituted consecutively. However, this given substitution order does not solve the problem effectively for complex multipliers, since it is not always the optimal order. Based on our analysis of the verification of multiplier SCNs in Section 4.2, we support this fixed substitution order with substitution rules that enable the IMT procedure to make substitution decisions that are consistent with Lemma 8. To guarantee a tractable verification process for SCNs of multipliers, Lemma 8 stipulates to substitute consecutively 1) the output variables of each adder cell in the SCN and 2) the outputs of each subset of adder cells that builds an array; i.e., for two successive adder cells in an array of a SCN with tuples $Fa = (co_0, \dots, co_m, s)$ and $\hat{Fa} = (\hat{co}_0, \dots, \hat{co}_m, \hat{s})$, the optimal substitution order is $co_0 > \dots > co_m > s > \hat{co}_0 > \dots > \hat{co}_m > \hat{s}$. Finding this optimal substitution order is feasible if the boundaries of the arrays and of their constituent adder cells are known. Unfortunately, this is not the case for SCNs generated by the model rewriting algorithm, since the boundaries of their components are not identified. Furthermore, we have observed that for some complex multiplier architectures, the model rewriting algorithm cannot reveal all carry terms within the adder cells of the SCN. This means that the input given to the IMT is not always a pure SCN, which implies that some nonlinear terms will not be canceled without eliminating at least one variable from such terms. To overcome these problems, we propose substitution rules that aim to deduce the optimal substitution order without knowledge about the boundaries of a given non-pure SCN, since the fixed substitution order of [23] is not efficient for complex multipliers even after rewriting their models into SCNs. This fixed order is applicable only to multipliers consisting mainly of full adder arrays, where the outputs of each full adder have the same level and depend on common inputs; thus, the fixed order substitutes the outputs of each full adder consecutively, as stipulated by Lemma 8.
Our proposed substitution rules that qualify the IMT to ﬁnd the proper
substitution order for non-pure SCNs are as follows:
1. Rule #1 obliges the IMT to substitute ﬁrst variables in linear terms (terms
with a single variable) once they are exposed in a polynomial ri before
4.5 ideal membership testing 89
those in nonlinear terms. In case that a variable is included in nonlinear
terms as well as a linear term, only the linear term will be eliminated.
2. Rule #2 is applied when there are no variables satisfying rule #1. It prior-
itizes substituting a variable xs over other variables of nonlinear terms in
ri. If there is a nonlinear term t1 which is the product of variables in the
set X1, where xs ∈ X1, and there is another nonlinear term t2 with set of
variables X2, such that X1 = X2 ∪ {xs} (i.e., the diﬀerence between the
two sets is only xs).
3. Rule #3 applies when neither Rule #1 nor Rule #2 is applicable. It gives a higher priority to a variable xs ∈ X1 of a nonlinear term t1 if there is another nonlinear term t2 with variable set X2 such that X1 − {xs} ⊂ X2 (i.e., all variables of t1 except xs also occur in t2, and t2 additionally has variables that are not in X1).
4. If the IMT has to choose between more than one variable that all satisfy
one of the previous rules, Rule #4 selects the highest variable in the
reverse topological order. Rule #4 is also applied when there are no more
variables satisfying any of the previous rules.
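The four rules can be read as a selection function over the monomials of the current remainder. The Python sketch below is one possible encoding under our own assumptions, not the thesis implementation. A polynomial is a dictionary from a frozenset of variable names (a monomial, since variables are idempotent over the Boolean ring) to an integer coefficient. `order` lists the non-input variables in reverse topological order, and "highest" is read as "earliest in that list". Rule #3 additionally requires xs ∉ X2. The function name `select_variable` is hypothetical.

```python
def select_variable(r, order, inputs):
    """Return the next variable to substitute in remainder r, or None.

    r      -- dict: frozenset of variable names -> integer coefficient
    order  -- non-input variables in reverse topological order
    inputs -- primary inputs, which are never substituted
    """
    rank = {v: i for i, v in enumerate(order)}
    cand = {v for m in r for v in m if v not in inputs}
    if not cand:
        return None

    def highest(vs):  # Rule #4: earliest variable in the reverse topological order
        return min(vs, key=lambda v: rank.get(v, len(rank)))

    # Rule #1: variables that appear as linear (single-variable) terms
    rule1 = {v for m in r if len(m) == 1 for v in m} & cand
    if rule1:
        return highest(rule1)

    nonlinear = [m for m in r if len(m) > 1]
    # Rule #2: t1 = xs * t2, i.e. X1 = X2 ∪ {xs}
    rule2 = {next(iter(x1 - x2)) for x1 in nonlinear for x2 in nonlinear
             if x2 < x1 and len(x1 - x2) == 1} & cand
    if rule2:
        return highest(rule2)

    # Rule #3: X1 - {xs} is a proper subset of X2 (assumed: xs not in X2)
    rule3 = {xs for x1 in nonlinear for x2 in nonlinear if x1 != x2
             for xs in x1 if xs not in x2 and (x1 - {xs}) < x2} & cand
    if rule3:
        return highest(rule3)

    # No rule applies: fall back to the ordering alone (Rule #4)
    return highest(cand)


def fs(*vs):
    return frozenset(vs)

# Rule #2 fires: u*v*w and v*w differ only in u
print(select_variable({fs("u", "v", "w"): 2, fs("v", "w"): -1}, ["w", "v", "u"], set()))  # u
```

Rule #1 is checked first because adder-cell outputs show up as linear terms. The set-based encoding keeps the subset tests of Rules #2 and #3 to one line each.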
To discuss the efficiency of these rules, note that the outputs of adder cells are expressed by linear terms. For example, the tuple of adder cell outputs (s, co0, ..., con) is represented in a polynomial ri as −2con − ··· − 2co0 − s. Rule #1 exploits this feature to identify the output variables of adder cells and substitute them first. Combining Rule #1 with Rule #4 prevents the variables of an array of adder cells Ŝ from being eliminated before all variables of the previous array S have been eliminated, since the reverse topological order separates the variables of Ŝ and S into two groups. This combination therefore enables the IMT to make substitution decisions that are consistent with Lemma 8.
Rules #2 and #3 deal with the nonlinear terms that appear because the model given to the IMT is not a pure SCN. They increase the chance that such terms cancel each other by preventing the elimination of variables shared among them. Two terms t1 and t2 cancel each other when they are products of the same set of variables X and their coefficients differ only in sign.
The rules are applied by modifying the original IMT algorithm (see Algorithm 1). As shown in Algorithm 4, the variables in the list V of the model G are sorted in reverse topological order. In each iteration, the IMT then uses the current remainder r and the ordered list V to select a variable x ∈ V that satisfies one of the substitution rules. The algorithm terminates when all variables of r are primary inputs.
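The modified loop can be sketched in Python as follows. This is our illustration, not the thesis implementation. Polynomials use a dictionary encoding (frozenset monomial to integer coefficient), and each model polynomial −v + tail is stored as `model[v] = tail`. For brevity, the `pick` step applies only the reverse-topological fallback (Rule #4); Rules #1 to #3 would refine that single step. The names `substitute` and `imt` and the 2×2 multiplier are hypothetical.

```python
def add_term(p, mono, coeff):
    """Add coeff * mono to polynomial p in place, dropping zero coefficients."""
    total = p.get(mono, 0) + coeff
    if total:
        p[mono] = total
    else:
        p.pop(mono, None)


def substitute(r, var, tail):
    """One division step: replace var by tail; the set union realises x*x = x."""
    out = {}
    for mono, coeff in r.items():
        if var in mono:
            rest = mono - {var}
            for t_mono, t_coeff in tail.items():
                add_term(out, rest | t_mono, coeff * t_coeff)
        else:
            add_term(out, mono, coeff)
    return out


def imt(spec, model, order):
    """Reduce spec wrt. the model; an empty result means a zero remainder."""
    r = dict(spec)
    while True:
        present = {v for mono in r for v in mono}
        # Rule #4 only: first variable of r in reverse topological order
        pick = next((v for v in order if v in present), None)
        if pick is None:  # only primary inputs are left
            return r
        r = substitute(r, pick, model[pick])


def M(*vs):
    return frozenset(vs)

# Hypothetical 2x2 array multiplier; model[v] is the tail of -v + tail
model = {
    "z3": {M("c1", "x1", "y1"): 1},                                 # x1y1 AND c1
    "z2": {M("x1", "y1"): 1, M("c1"): 1, M("c1", "x1", "y1"): -2},  # x1y1 XOR c1
    "z1": {M("x1", "y0"): 1, M("x0", "y1"): 1, M("x0", "x1", "y0", "y1"): -2},
    "z0": {M("x0", "y0"): 1},
    "c1": {M("x0", "x1", "y0", "y1"): 1},                           # x1y0 AND x0y1
}
order = ["z3", "z2", "z1", "z0", "c1"]  # reverse topological order
spec = {M("z3"): -8, M("z2"): -4, M("z1"): -2, M("z0"): -1,
        M("x0", "y0"): 1, M("x0", "y1"): 2, M("x1", "y0"): 2, M("x1", "y1"): 4}
print(imt(spec, model, order))  # {}
```

A zero (empty) remainder means that the specification lies in the ideal of the model. A wrong gate polynomial leaves a nonzero remainder in the primary inputs.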
In summary, we have equipped the IMT procedure with substitution rules that are independent of structural knowledge about the circuit. The rules enhance both the applicability and the runtime performance of the IMT. The next section discusses the second input of the IMT, namely the specification polynomial.
4.6 specification polynomial
In the symbolic computation technique over a Boolean ring, a multiplier is verified by reducing the polynomial $p_r := \sum_{i=0}^{2n-1} -2^i s_i + \left(\sum_{i=0}^{n-1} 2^i x_i\right) \cdot \left(\sum_{i=0}^{n-1} 2^i y_i\right)$ wrt. the algebraic model of the multiplier circuit. However, we have observed that this formulation of pr does not mathematically match all multiplier models. The problem arises for specific types of multipliers, such as those with Booth partial products or redundant binary addition trees. It also arises in the verification of partial multipliers, where not all 2n outputs are verified but only the least significant outputs s0, ..., sm with m < 2n − 1. The problem manifests itself as nonlinear terms in the final remainder whose coefficients have absolute values larger than $2^m$, where sm is the most significant output under verification. The following example illustrates this problem.
Example 16. Consider the model of a part of a multiplier (the three least significant output bits), verified against the specification polynomial $p_r := -4z_2 - 2z_1 - z_0 + 4y_2x_0 + 4y_1x_1 + 4x_2y_0 + 2y_1x_0 + 2x_1y_0 + y_0x_0$. The model of this multiplier part is as follows:
$z_2 = c_1 \oplus (y_2 \wedge x_0) \oplus (y_1 \wedge x_1) \oplus (y_0 \wedge x_2) \implies g_4 := -z_2 - 8c_1y_2x_2y_1x_1y_0x_0 + 4c_1y_2y_1x_1x_0 + 4c_1x_2x_1y_1y_0 + 4c_1y_2x_2y_0x_0 + 4y_2x_2y_1x_1y_0x_0 - 2c_1y_2x_0 - 2c_1y_1x_1 - 2c_1x_2y_0 - 2y_2y_1x_1x_0 - 2y_2x_2y_0x_0 - 2x_2y_1x_1y_0 + c_1 + y_2x_0 + y_1x_1 + y_0x_2$
$c_1 = (y_1 \wedge x_0) \wedge (y_0 \wedge x_1) \implies g_3 := -c_1 + y_1x_1y_0x_0$
$z_1 = (y_1 \wedge x_0) \oplus (y_0 \wedge x_1) \implies g_2 := -z_1 - 2y_1x_1y_0x_0 + y_1x_0 + y_0x_1$
$z_0 = y_0 \wedge x_0 \implies g_1 := -z_0 + y_0x_0$
The verification process is performed by iterative divisions:
$p_r \xrightarrow{g_4} r_1 := 32c_1y_2x_2y_1x_1y_0x_0 - 16c_1y_2y_1x_1x_0 - 16c_1x_2x_1y_1y_0 - 16c_1y_2x_2y_0x_0 - 16y_2x_2y_1x_1y_0x_0 + 8c_1y_2x_0 + 8c_1y_1x_1 + 8c_1x_2y_0 + 8y_2y_1x_1x_0 + 8y_2x_2y_0x_0 + 8x_2y_1x_1y_0 - 4c_1 - 2z_1 - z_0 + 2y_1x_0 + 2x_1y_0 + y_0x_0$
$\xrightarrow{g_3} r_2 \xrightarrow{g_2} r_3 \xrightarrow{g_1} r_4 := -8y_2y_1x_1y_0x_0 - 8x_2y_1x_1y_0x_0 + 8y_1x_1y_0x_0 + 8y_2y_1x_1x_0 + 8y_2x_2y_0x_0 + 8x_2y_1x_1y_0$.
The final remainder $r_4$ is not equal to zero. It consists of nonlinear terms whose coefficients have absolute values larger than $2^2$. Therefore, the specification does not match the model of the partial multiplier.
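The remainder of Example 16 can be reproduced mechanically. The Python sketch below is our own illustration, and helper names such as `bxor` and `substitute` are hypothetical. It builds each gate polynomial from the Boolean operations, using a ⊕ b = a + b − 2ab and a ∧ b = ab over the Boolean ring. It then performs the four divisions by substituting z2, c1, z1 and z0 in that order. The final remainder is nonzero, and every one of its coefficients is a multiple of 8.

```python
def padd(*polys):
    """Sum of polynomials (dict: frozenset monomial -> int); zero terms are dropped."""
    out = {}
    for p in polys:
        for m, c in p.items():
            total = out.get(m, 0) + c
            if total:
                out[m] = total
            else:
                out.pop(m, None)
    return out


def pmul(p, q):
    """Product over the Boolean ring: the monomial union realises x*x = x."""
    return padd(*({m1 | m2: a * b} for m1, a in p.items() for m2, b in q.items()))


def scale(p, k):
    return {m: k * c for m, c in p.items()}


def var(*names):
    return {frozenset(names): 1}


def bxor(a, b):
    return padd(a, b, scale(pmul(a, b), -2))  # a xor b = a + b - 2ab


def substitute(r, v, tail):
    """Divide r by the polynomial -v + tail, i.e. replace v by tail."""
    out = {m: c for m, c in r.items() if v not in m}
    for m, c in r.items():
        if v in m:
            out = padd(out, scale(pmul({m - {v}: 1}, tail), c))
    return out


# Gate polynomials of Example 16, in the order of the divisions g4, g3, g2, g1
model = [
    ("z2", bxor(bxor(bxor(var("c1"), var("y2", "x0")), var("y1", "x1")), var("y0", "x2"))),
    ("c1", pmul(var("y1", "x0"), var("y0", "x1"))),
    ("z1", bxor(var("y1", "x0"), var("y0", "x1"))),
    ("z0", var("y0", "x0")),
]
spec = padd(scale(var("z2"), -4), scale(var("z1"), -2), scale(var("z0"), -1),
            scale(var("y2", "x0"), 4), scale(var("y1", "x1"), 4),
            scale(var("x2", "y0"), 4), scale(var("y1", "x0"), 2),
            scale(var("x1", "y0"), 2), var("y0", "x0"))

r = spec
for v, tail in model:
    r = substitute(r, v, tail)
for m, c in r.items():
    print(c, "*".join(sorted(m)))
```

Combining like terms, the sketch yields six monomials with coefficients ±8, matching r4 above.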
For redundant multipliers, such as those that include Booth recoding or redundant trees, the accumulation trees of these n-bit multipliers are fed not only with partial products but also with other input bits [62]. Mathematically, these trees would therefore produce more than 2n outputs, but only the 2n least significant ones are considered multiplier outputs. This implies that only parts of the addition trees of redundant multipliers are involved in the verification process, while the remaining outputs are discarded. Thus, verifying the 2n outputs of an n-bit redundant multiplier has the same problem as verifying a partial multiplier.
To overcome this problem, we propose adding a modulo $2^{m+1}$ operation to the specification of integer multipliers, such that the specification matches both partial multipliers and redundant multipliers. Modulo $2^{m+1}$ is performed by removing from the remainders those terms whose coefficients are multiples of $2^{m+1}$. Accordingly, the specification polynomial is formulated as:
$p_r := \sum_{i=0}^{m} -2^i s_i + \left(\sum_{i=0}^{n-1} 2^i x_i\right) \cdot \left(\sum_{i=0}^{n-1} 2^i y_i\right) \bmod 2^{m+1}$.
Example 17. Continuing Example 16, we modify the specification polynomial to $p_r := -4z_2 - 2z_1 - z_0 + 4y_2x_0 + 4y_1x_1 + 4x_2y_0 + 2y_1x_0 + 2x_1y_0 + y_0x_0 \bmod 8$. The remainders of the iterative divisions are then as follows:
$p_r \xrightarrow{g_4} r_1 := (32c_1y_2x_2y_1x_1y_0x_0 - 16c_1y_2y_1x_1x_0 - 16c_1x_2x_1y_1y_0 - 16c_1y_2x_2y_0x_0 - 16y_2x_2y_1x_1y_0x_0 + 8c_1y_2x_0 + 8c_1y_1x_1 + 8c_1x_2y_0 + 8y_2y_1x_1x_0 + 8y_2x_2y_0x_0 + 8x_2y_1x_1y_0 - 4c_1 - 2z_1 - z_0 + 2y_1x_0 + 2x_1y_0 + y_0x_0) \bmod 8 = -4c_1 - 2z_1 - z_0 + 2y_1x_0 + 2x_1y_0 + y_0x_0$
$\xrightarrow{g_3} r_2 \xrightarrow{g_2} r_3 \xrightarrow{g_1} r_4 := 0$.
The divisions terminate with a zero remainder, which means that the modified specification matches the partial multiplier.
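The effect of the modulo operation can be checked with a similar sketch. Under our own assumptions, the hypothetical function `verify` drops every term whose coefficient is a multiple of the given modulus after each division step. Dropping such terms early is harmless, because later substitutions only multiply coefficients by integers, so a multiple of $2^{m+1}$ stays a multiple of $2^{m+1}$. The helpers and the model repeat the setting of Example 16.

```python
def padd(*polys):
    """Sum of polynomials (dict: frozenset monomial -> int); zero terms are dropped."""
    out = {}
    for p in polys:
        for m, c in p.items():
            total = out.get(m, 0) + c
            if total:
                out[m] = total
            else:
                out.pop(m, None)
    return out


def pmul(p, q):
    """Product over the Boolean ring (monomial union realises x*x = x)."""
    return padd(*({m1 | m2: a * b} for m1, a in p.items() for m2, b in q.items()))


def scale(p, k):
    return {m: k * c for m, c in p.items()}


def var(*names):
    return {frozenset(names): 1}


def bxor(a, b):
    return padd(a, b, scale(pmul(a, b), -2))


def substitute(r, v, tail):
    out = {m: c for m, c in r.items() if v not in m}
    for m, c in r.items():
        if v in m:
            out = padd(out, scale(pmul({m - {v}: 1}, tail), c))
    return out


def verify(spec, model, modulus=None):
    """Divide spec by the model in order; optionally reduce coefficients mod modulus."""
    r = spec
    for v, tail in model:
        r = substitute(r, v, tail)
        if modulus:
            r = {m: c for m, c in r.items() if c % modulus}  # drop multiples of modulus
    return r


model = [
    ("z2", bxor(bxor(bxor(var("c1"), var("y2", "x0")), var("y1", "x1")), var("y0", "x2"))),
    ("c1", pmul(var("y1", "x0"), var("y0", "x1"))),
    ("z1", bxor(var("y1", "x0"), var("y0", "x1"))),
    ("z0", var("y0", "x0")),
]
spec = padd(scale(var("z2"), -4), scale(var("z1"), -2), scale(var("z0"), -1),
            scale(var("y2", "x0"), 4), scale(var("y1", "x1"), 4),
            scale(var("x2", "y0"), 4), scale(var("y1", "x0"), 2),
            scale(var("x1", "y0"), 2), var("y0", "x0"))

print(len(verify(spec, model)))  # 6 terms remain without the modulo
print(verify(spec, model, 8))    # {}
```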
4.7 experimental results
The symbolic computation technique enhanced by the algorithms proposed in this chapter is named SC-LR, since it integrates logic reduction with symbolic computation. SC-LR and the algorithm of [39] (SC-FO) have been implemented in C++. The experiments have been carried out on an Intel(R) Core(TM) i5-3320M CPU (2.6 GHz, 16 GByte) running Linux. For the experiments, the multipliers are given as Verilog RTL code. The designs were synthesized to gate-level netlists using Yosys [101].

Table 4: Verification Results for SP Multipliers (runtimes in h:m:s)
Benchmark   I/O bits   Commercial   CPP [85]   SC-FO [39]   SC-LR
SP-AR-RC    16/32      00:00:01     00:01:23   00:00:01     00:00:02
SP-WT-CL    16/32      00:00:01     00:00:46   TO           00:00:05
SP-RT-KS    16/32      00:00:43     -          TO           00:00:17
SP-CT-BK    16/32      00:00:59     00:00:43   TO           00:00:04
SP-AR-RC    32/64      00:00:11     02:34:40   00:00:09     00:00:21
SP-WT-CL    32/64      00:00:06     00:15:12   TO           00:03:27
SP-DT-HC    32/64      00:00:09     00:21:14   TO           00:02:05
SP-CT-BK    32/64      TO           00:21:20   TO           00:01:35
SP-AR-RC    64/128     00:02:52     94:37:20   00:02:56     00:07:40
SP-WL-CL    64/128     00:00:36     05:46:40   TO           02:18:34
SP-RT-KS    64/128     TO           -          TO           02:51:12
SP-CT-BK    64/128     TO           05:31:44   TO           00:47:48
SP-AR-RC    128/256    01:03:34     TO         00:48:03     02:08:51
SP-CT-BK    128/256    TO           78:11:12   TO           14:03:33
To evaluate the practical runtime of SC-LR in verifying multipliers with different architectures, we apply it to verify n-bit multipliers against the specification equation:
$\sum_{i=0}^{2n-1} -2^i s_i + \left(\sum_{i=0}^{n-1} 2^i x_i\right) \cdot \left(\sum_{i=0}^{n-1} 2^i y_i\right) \bmod 2^{2n}$.
In Tables 4 and 5, we compare the runtimes of the proposed technique SC-LR against our re-implementation of SC-FO [39], the algorithm presented in Chapter 3 (referred to as the Checking Partial Product (CPP) approach), and the equivalence checker of the commercial tool OneSpin (with its multiplier options enabled). The first column of Tables 4 and 5 shows the name of the circuit. The second column gives the number of input and output bits. The next four columns provide the runtimes.

Table 5: Verification Results for BP Multipliers (runtimes in h:m:s)
Benchmark   I/O bits   Commercial   CPP [85]   SC-FO [39]   SC-LR
BP-AR-RC    16/32      00:00:14     -          TO           00:00:02
BP-WT-CL    16/32      00:00:16     -          TO           00:00:09
BP-RT-KS    16/32      00:00:18     -          TO           00:00:17
BP-CT-BK    16/32      00:00:13     -          TO           00:00:06
BP-AR-RC    32/64      TO           -          TO           00:00:17
BP-WT-CL    32/64      TO           -          TO           00:04:46
BP-RT-KS    32/64      TO           -          TO           00:05:36
BP-CT-BK    32/64      TO           -          TO           00:02:20
BP-AR-RC    64/128     TO           -          TO           00:05:06
BP-WT-CL    64/128     TO           -          TO           03:03:48
BP-DT-HC    64/128     TO           -          TO           00:58:44
BP-CT-BK    64/128     TO           -          TO           00:37:53
BP-AR-RC    128/256    TO           -          TO           01:29:10
BP-CT-BK    128/256    TO           -          TO           15:14:49

The timeout (TO in the tables) has been set to 100 hours. For the CPP approach, "-" indicates that CPP cannot be applied to the corresponding multiplier architecture. The experimental results clearly demonstrate the advantage of the proposed enhancement. The other approaches can sometimes verify the correctness of multipliers with simple partial products (Table 4). For the complex parallel architectures (Table 5), however, only our enhanced technique solves the verification problem once the instances reach relevant sizes. As can be seen, we are able to verify correctness for input widths of up to 128 bits.
Please note that all benchmarks time out after 100 hours when performing
veriﬁcation using a naive miter construction (one big miter; ABC [13] using
command ‘cec’).
Table 6 shows some statistics about SC-LR. The columns give the circuit name, the number of circuit bits, the number of vanishing monomials canceled by the one assignment rule (#CVM), the runtime of the IMT after model rewriting, and finally statistics on the rewritten model. The model statistics columns show the number of polynomials (#P), the number of monomials (#M), the maximum size of a polynomial wrt. its number of monomials (#MP), and the maximum size of a monomial wrt. its number of variables (#VM).

Table 6: Statistics for Verification of Multipliers by SC-LR
Benchmark   I/O bits   #CVM     IMT (h:m:s)   #P      #M       #MP   #VM
BP-WT-CL    32/64      39651    00:00:40      1965    18186    142   65
BP-RT-KS    32/64      42000    00:00:38      1989    23341    200   69
SP-DT-HC    32/64      15842    00:00:23      3011    18267    124   63
SP-CT-BK    32/64      4480     00:00:37      2702    37137    256   62
BP-WT-CL    64/128     325377   00:10:47      7180    71473    331   129
BP-DT-HC    64/128     134367   00:06:45      6491    70635    260   130
SP-RT-KS    64/128     290053   00:13:08      13106   95314    376   131
SP-CT-BK    64/128     22228    00:07:09      10676   148381   274   124
SP-CT-BK    128/256    106970   01:25:53      42016   592715   530   252

The results in Table 6 show that multipliers with carry-lookahead adders or Kogge-Stone adders have the largest number of vanishing monomials and therefore the longest execution times. Furthermore, comparing the IMT column with the total runtimes in Tables 4 and 5 shows that SC-LR spends most of its execution time rewriting the circuit model.
A further remarkable result is that for the 64-bit BP-WT-CL, the number of vanishing monomials is 325377, while the size of its model after rewriting is 71473 monomials. Hence, the number of these redundant terms (vanishing monomials) is about 4.5 times the size of the overall model. This explains why removing vanishing monomials has such a major influence on reducing the complexity of the verification problem.
4.8 summary and future work
In this chapter, we have presented innovative ideas which enhance the sym-
bolic computation technique for veriﬁcation of complex parallel multiplier ar-
chitectures.
First, we have analyzed the computational complexity of verifying multipliers
described as SCNs, which is proved to be polynomial in space under a speciﬁc
substitution order.
Then, a model rewriting algorithm has been proposed to lift bit-level representations of complex parallel multipliers into SCNs. The algorithm is based on efficiently canceling the vanishing monomials distributed within multiplier models before they blow up exponentially during the IMT. It rewrites the algebraic model of the multiplier based on the XOR gates of the netlist so that it reveals monomials satisfying the one assignment rule. This makes removing these vanishing monomials straightforward.
Also, the IMT has been equipped with new substitution rules to make better decisions, avoiding unnecessary exponential blow-up in space during the recursive divisions performed by the IMT procedure.
Finally, experimental results have demonstrated the efficiency of the enhanced technique. All complex parallel multipliers were verified within seconds to about 15 hours (for 128 bits). For 32 bits and beyond, all other approaches either reached the timeout limit of 100 hours or were not applicable.
Directions for future work include:
1. Sophisticated approaches to extract exact XOR information from a netlist. These are motivated by the fact that the XOR rewriting scheme depends on structural information of the netlist, such as XOR gates, which is not always available.
2. Upgrading the IMT procedure with new heuristics for making substitution decisions and exploiting successful ideas from the field of satisfiability solvers (SAT/SMT), such as effective learning approaches.
3. Heavily optimized multipliers remain major challenges for all formal verification techniques, including the enhanced symbolic computation technique, since SCNs cannot be constructed from such optimized netlists. Investigating this problem is necessary for the automated verification of circuits that incorporate such multipliers.

5
EQUIVALENCE CHECKING OF FLOATING-POINT
MULTIPLIERS USING GRÖBNER BASES
Boolean reasoning based on Gröbner bases (available in symbolic computation packages) offers a robust mechanism for verifying arithmetic circuits at the gate level (see, e.g., [39, 78, 87]). Its power has recently been demonstrated in formally verifying a large class of bit-level multipliers, as shown in the previous chapter. Motivated by this success of the symbolic computation technique in the formal verification of large gate-level multipliers, in this chapter we propose an algebraic equivalence checking technique for circuits that contain both complex arithmetic components and control logic. Such circuits pose major challenges for existing proof techniques, including reasoning techniques based on symbolic computation, and no satisfactory solution has been presented yet. Toward solving this problem, we propose Algebraic Combinational Equivalence Checking (ACEC), a technique that uses symbolic computation to reason over circuits that combine data-path and control logic. To the best of our knowledge, this is the first fully automated technique to formally verify binary floating-point circuits without any kind of case splitting or other manual effort.
A naive algebraic equivalence checker for such circuits models the two compared circuits as Gröbner bases over the Boolean ring and combines them into a single algebraic model. It then checks the equivalence of corresponding outputs of the two circuits by testing their membership in the combined model, as shown in Figure 22. The problem with this setting is that the sizes of the resulting remainders blow up exponentially during the recursive divisions performed by ideal membership testing (IMT). Since the IMT does not scale in this setting, we propose reverse engineering to identify the boundaries of arithmetic components and to abstract them to canonical representations. Furthermore, we propose arithmetic sweeping, which utilizes the abstracted components to find and prove internal bit and word equivalences between both circuits. ACEC integrates the reverse engineering and arithmetic sweeping algorithms to decompose the verification problem automatically, which avoids the exponential blow-up of the problem size. We demonstrate the applicability of ACEC by checking the equivalence of a floating-point multiplier (including the full IEEE-754 rounding scheme) against several optimized and diversified implementations that cannot be verified by other proof techniques.

Figure 22: Naive Equivalence Checking Setting (circuit netlists N1 and N2 are translated by Gröbner modeling into G1 and G2, which form the combined model G; membership testing of the output relationships wrt. G reports equivalence or inconsistency).
5.1 algebraic combinational equivalence checking
Given two circuits C1 and C2 that represent the functions f1(x1, ..., xn) = (y1, ..., ym) and f2(x1, ..., xn) = (z1, ..., zm), respectively, our aim is to show the equivalence of C1 and C2, i.e., (y1, ..., ym) = (z1, ..., zm) for all x1, ..., xn. We propose to solve this problem using algebraic computation methods. The specification of C1 and C2 may be unknown, or it may not be expressible in a canonical and abstract form over $\mathbb{Z}_{2^n}$. Therefore, we cannot use previous work [39, 78, 87] that performs ideal membership testing with respect to a given specification.
Instead, we propose to represent C1 and C2 as polynomial sets G1 and G2 and to combine them into a single model G = G1 ∪ G2. We then formulate the problem as testing the membership of relations between variables of C1 and C2 wrt. G. An obvious choice for such a relation is the equivalence of output signals yi = zi, which can be expressed by the polynomial yi − zi = 0. However, reducing such a polynomial wrt. G causes a tremendous overhead, since substituting all internal variables of G1 and G2 blows up the sizes of the resulting remainders.
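To make the output relationship concrete, here is a tiny Python sketch. It uses our own toy example, not one from the thesis. C1 computes y = a ⊕ b with a single XOR gate, and C2 computes the same function with AND, OR and NOT gates through internal signals n1, n2, n3. The polynomial y − z is reduced by the union G = G1 ∪ G2 in reverse topological order. The dictionary encoding and the function names are assumptions, and the empty monomial `frozenset()` stands for the constant 1.

```python
def add_term(p, mono, coeff):
    """Add coeff * mono to p in place, dropping zero coefficients."""
    total = p.get(mono, 0) + coeff
    if total:
        p[mono] = total
    else:
        p.pop(mono, None)


def substitute(r, var, tail):
    """Replace var by tail (Boolean ring: monomials are sets, so x*x = x)."""
    out = {}
    for mono, coeff in r.items():
        if var in mono:
            for t_mono, t_coeff in tail.items():
                add_term(out, (mono - {var}) | t_mono, coeff * t_coeff)
        else:
            add_term(out, mono, coeff)
    return out


def reduce_by(poly, model, order):
    """One pass in reverse topological order eliminates every internal variable."""
    r = dict(poly)
    for v in order:
        if any(v in mono for mono in r):
            r = substitute(r, v, model[v])
    return r


def M(*vs):
    return frozenset(vs)  # M() is the constant monomial 1

G1 = {"y": {M("a"): 1, M("b"): 1, M("a", "b"): -2}}   # y = a XOR b
G2 = {"n1": {M("a", "b"): 1},                          # n1 = a AND b
      "n2": {M("a"): 1, M("b"): 1, M("a", "b"): -1},   # n2 = a OR b
      "n3": {M(): 1, M("n1"): -1},                     # n3 = NOT n1
      "z": {M("n2", "n3"): 1}}                         # z = n2 AND n3
G = {**G1, **G2}                                       # G = G1 ∪ G2
order = ["y", "z", "n3", "n2", "n1"]
print(reduce_by({M("y"): 1, M("z"): -1}, G, order))    # {}
```

Here the reduction is trivial, but every internal signal of both circuits has to be substituted. On large circuits, this substitution is exactly what causes the blow-up described above.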
To overcome this problem, we suggest finding internal equivalences, i.e., polynomials that express the equivalence of two internal signals in G1 and G2. Reducing these polynomials wrt. G causes a smaller overhead and simplifies G. This technique is similar to SAT sweeping in combinational equivalence checking [66], and we call it arithmetic sweeping in the following. Arithmetic sweeping works as follows: for each internal variable v1 in G1, we search for an equivalent variable v2 in G2, i.e., v1 and v2 represent the same function wrt. the primary inputs. We call such a pair (v1, v2) a bit equivalence. Once it is established, v2 can be substituted by v1 in all polynomials. For some internal variables, we will not be able to prove equivalence to another variable. These variables are eliminated by substitution, expressing them in terms of the proven bit-equivalent variables in their transitive fan-in.
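The text does not state how candidate pairs (v1, v2) are generated. A common choice in SAT sweeping, which we assume here purely for illustration, is to group signals by their simulation signatures. The sketch below simulates all input patterns exhaustively, which is feasible only for tiny examples. A practical flow would use random patterns and would still have to prove each candidate, e.g. by reducing v1 − v2 to zero. The half-adder circuits and the function names are hypothetical.

```python
from itertools import product


def evaluate(poly, env):
    """Integer value of a polynomial for one 0/1 assignment env."""
    return sum(c for m, c in poly.items() if all(env[v] for v in m))


def signatures(model, inputs):
    """Simulate all input patterns; model is a list of (variable, tail) pairs
    in topological order, so every tail only uses already computed values."""
    sigs = {v: [] for v, _ in model}
    for bits in product((0, 1), repeat=len(inputs)):
        env = dict(zip(inputs, bits))
        for v, tail in model:
            env[v] = evaluate(tail, env)
            sigs[v].append(env[v])
    return {v: tuple(s) for v, s in sigs.items()}


def candidate_bit_equivalences(model1, model2, inputs):
    """Pairs (v1, v2) whose simulation signatures coincide."""
    by_signature = {}
    for v2, sig in signatures(model2, inputs).items():
        by_signature.setdefault(sig, []).append(v2)
    return [(v1, v2) for v1, sig in signatures(model1, inputs).items()
            for v2 in by_signature.get(sig, [])]


def M(*vs):
    return frozenset(vs)

# Two hypothetical half-adder implementations over the inputs a, b
ha1 = [("s1", {M("a"): 1, M("b"): 1, M("a", "b"): -2}),  # XOR gate
       ("c1", {M("a", "b"): 1})]                          # AND gate
ha2 = [("n1", {M("a", "b"): 1}),                          # AND
       ("n2", {M("a"): 1, M("b"): 1, M("a", "b"): -1}),   # OR
       ("n3", {M(): 1, M("n1"): -1}),                     # NOT
       ("s2", {M("n2", "n3"): 1})]                        # AND
print(candidate_bit_equivalences(ha1, ha2, ["a", "b"]))   # [('s1', 's2'), ('c1', 'n1')]
```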
However, performing arithmetic sweeping on the overall combined model G is not scalable. First, the number of candidates for bit equivalences is too large. Second, checking the equivalence of a pair of variables with a large transitive fan-in may be too difficult. To circumvent this problem, we first apply reverse engineering with two main goals: i) extracting arithmetic word-level components and abstracting them to canonical polynomials; ii) partitioning the circuit models G1 and G2 into smaller parts. The algorithm works as follows. First, we try to find an instance of an arithmetic word-level component in both G1 and G2 and abstract each instance to a canonical polynomial. If this succeeds, we obtain an input boundary and an output boundary for the component in G1 and in G2. The pairs of input boundaries and of output boundaries are candidates for word equivalences. Given these candidates, we perform arithmetic sweeping only in the transitive fan-in of the input boundaries. Suppose this ultimately proves that the input boundaries are equivalent, and the abstracted polynomials of the two arithmetic components found by reverse engineering are proven equivalent. Then we can merge the transitive fan-ins of the two output boundaries in G, making the model significantly smaller.
The overall flow of ACEC is shown in Figure 23. It starts by modeling the two compared netlists N1 and N2 and combining them into one Gröbner basis model G over the Boolean ring. It then successively performs the two main algorithms of ACEC, reverse engineering and arithmetic sweeping, yielding a simplified model of G named Gsimple. Finally, as shown at the end of Figure 23, ACEC tests the consistency between the output relationships and the simplified model Gsimple. The middle of Figure 23 presents the reverse engineering algorithm. It rewrites the model G to lift concealed bit-level arithmetic components into sum carry networks (SCNs), and then exploits features of SCNs to identify the boundaries of arithmetic components in the rewritten Gröbner basis model G′. The last task of reverse engineering is to abstract the found SCNs into word-level polynomials using Gaussian elimination and to store them in a word-level model Gword. After reverse engineering, the arithmetic sweeping algorithm is invoked, as shown in Figure 23. Arithmetic sweeping leverages the polynomials obtained in Gword and G′ to deduce and prove equivalence relationships between internal variables of G′. Merging the internal variables of G′ that are proven equivalent leads to the simplified model Gsimple.
Figure 23: Flow of ACEC (netlists N1 and N2 are translated by Gröbner modeling into G1 and G2 and combined into the model G; reverse engineering performs model rewriting of G into G′ and extracts arithmetic units into Gword; arithmetic sweeping generates relationships from G′ and Gword, tests the membership of these internal relationships, and simplifies the model into Gsimple; finally, membership testing of the output relationships against Gsimple reports equivalence or inconsistency).
Details on the algorithms of our ACEC are explained in the remaining sec-
tions. To summarize, the main contributions of this chapter are:
1. Utilizing symbolic computation for the combinational equivalence check-
ing of bit-level circuits.
2. Identifying boundaries of arithmetic components that exist within a larger
bit-level circuit together with abstracting their functions to canonical
word-level descriptions using a new reverse engineering algorithm.
3. Partitioning the circuits into smaller parts based on boundaries of the
extracted word-level components.
4. Proposing arithmetic sweeping which leverages the given arithmetic infor-
mation to ﬁnd (and prove) internal equivalences.
5. Oﬀering eﬃcient polynomial representations for the control logic functions
based on functional decomposition.
5.2 reverse engineering of data-path units
The key step of ACEC is finding arithmetic components by reverse engineering, in order to guide arithmetic sweeping in decomposing the verification problem. Reverse engineering extracts data-path components and abstracts them to canonical word-level polynomials. The propagation of carry bits between internal nets of data-path units is a property that can be used to locate such units, and the proposed reverse engineering algorithm exploits this property to extract data-path units from the combined model G. According to our observation, these carry bits are modeled as carry terms (see Definition 22) distributed among the polynomials of G. Carry terms are nonlinear terms that can be distinguished by their shared monomials and by their coefficients, which are multiples of each other with opposite signs. However, carry terms are hidden within bit-level models. They are exposed only when bit-level descriptions of arithmetic components are rewritten into Sum Carry Networks (SCNs, see Definition 21). This rewriting is performed by a model rewriting algorithm based on the principles explained in Section 4.4. The algorithm is applied to the combined model G to describe arithmetic functions as SCNs, revealing carry terms among the polynomials of the adder cells of the SCNs. Given the rewritten model G′, the reverse engineering algorithm uses the carry terms to identify the boundaries of each SCN in G′, isolating each set of polynomials that represents an SCN. The final step of reverse engineering is to derive one canonical polynomial for each identified SCN using Gaussian elimination.
In contrast to the proposed algorithm, recent reverse engineering algorithms [94, 103] for extracting arithmetic word-level components from gate-level netlists are not applicable to designs with non-arithmetic combinational logic attached to the outputs.
In the following, we introduce the model rewriting algorithm in Subsec-
tion 5.2.1, then the two main tasks of the reverse engineering algorithm to iden-
tify and abstract data-path units are provided in Subsections 5.2.2 and 5.2.3.
5.2.1 Model Rewriting
The proposed reverse engineering algorithm leverages the model rewriting algorithm (see Section 4.4) to expose a specific feature that distinguishes arithmetic functions from control logic functions when both are combined and implemented as a sea of logic gates (netlist). Model rewriting is able to reveal carry terms among the polynomials that model data-path units. Revealing these carry terms enables our reverse engineering algorithm to identify the boundaries of different architectures of large multipliers and adders hidden in a given netlist. The rewriting algorithm successively executes two rewriting schemes, named XOR rewriting and common rewriting. The first scheme, XOR rewriting, combines knowledge of the circuit gates with the algebraic model. It rewrites the model such that it depends only on the input and output variables of chains of XOR gates, while all other variables are substituted. The second scheme, common rewriting, rewrites the model obtained from XOR rewriting such that it depends only on variables that are used in more than one polynomial. As presented in Section 4.4, the rewriting algorithm is capable of lifting bit-level models of a large class of arithmetic architectures into SCNs that reveal carry terms. This makes it feasible for the reverse engineering algorithm to identify the polynomial set of each SCN that models a data-path unit.
However, the rewriting algorithm has been designed under the assumption that its input is a bit-level description of an arithmetic circuit. Applying it to control logic causes a blow-up in the number of model terms. Control logic usually does not contain XOR gates, and consequently most of its variables are substituted. To take advantage of the rewriting algorithm for circuits that contain both data-path and control logic, we identify the control logic part of a circuit by its multiplexers (MUXes). We then prevent XOR rewriting and common rewriting from substituting the input and output variables of MUXes. This guarantees that both schemes are applied only to the data-path logic. For a given bit-level model G that consists of control and data-path logic, the modified rewriting algorithm generates a new model G′. Its polynomials describe the functions of chains of XORs, of MUXes, and of cones of gates bounded by the inputs and outputs of XORs and MUXes. Thus, the reverse engineering algorithm obtains a rewritten model G′ that describes data-path components as SCNs, while the polynomials of the control logic do not grow exponentially.
5.2.2 Identifying Boundaries of Data-path Units
Given the rewritten model G′, the next task of the reverse engineering algorithm is to identify all SCNs within G′ using the feature of carry terms. Carry terms appear only among the polynomials of adder cells, which are connected through their sum and carry variables to build SCNs. Based on this structure, the algorithm identifies the sets of polynomials that represent SCNs as follows:
1. First, the algorithm groups the polynomials of G′ that share carry terms into sets. A polynomial joins the set aGi that models an adder cell i if it shares a carry term with another polynomial in aGi. For example, the polynomials g0 and g1 belong to the same adder cell set if one of them has the term −x0x1 and the other has the term 2x0x1.
2. Second, after building the adder cell sets aGi, the algorithm merges those that are connected to each other into a larger set nGj representing an SCN. To this end, it relates each adder cell aGi to a set of input variables Xi and a set of output variables Xoi, which represent the inputs and outputs of the adder cell. This exploits the fact that the variable of the leading monomial of a Gröbner basis polynomial is the output of the Boolean function modeled by this polynomial. The algorithm then builds nGj based on the rule that $\{aG_i, aG_{i+1}\} \subseteq nG_j$ if $X_{i+1} \cap Xo_i \neq \emptyset$. In other words, two cells belong together if some outputs of the adder cell aGi are inputs of the adder cell aGi+1.
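The two grouping steps can be sketched in Python as follows. The encoding is our own assumption. Each polynomial of G′ is stored as a pair of its leading (output) variable and its remaining terms. A carry term is taken to be a shared nonlinear monomial whose coefficients have opposite signs and divide each other. Polynomials that share no carry term are not treated as adder cells, and all function names are hypothetical. The data are the six polynomials of the 3-bit ripple carry adder used in Example 18.

```python
from itertools import combinations


def P(terms):
    """Polynomial from {'c0 y1 x1': 4, ...}: frozenset monomial -> coefficient."""
    return {frozenset(k.split()): c for k, c in terms.items()}


def shares_carry_term(p, q):
    """Shared nonlinear monomial with opposite-sign coefficients dividing each other."""
    for m in p.keys() & q.keys():
        a, b = p[m], q[m]
        if len(m) > 1 and a * b < 0 and (a % b == 0 or b % a == 0):
            return True
    return False


def components(n, edges):
    """Connected components of an undirected graph on nodes 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def identify_scns(polys):
    """polys: list of (output variable, remaining terms) of the rewritten model."""
    # Step 1: adder cells are groups of polynomials linked by carry terms
    edges = [(i, j) for i, j in combinations(range(len(polys)), 2)
             if shares_carry_term(polys[i][1], polys[j][1])]
    cells = [c for c in components(len(polys), edges) if len(c) > 1]
    io = []
    for cell in cells:
        outs = {polys[k][0] for k in cell}
        ins = {v for k in cell for m in polys[k][1] for v in m} - outs
        io.append((ins, outs))
    # Step 2: cells whose outputs feed another cell's inputs form one SCN
    links = [(a, b) for a, b in combinations(range(len(cells)), 2)
             if io[a][1] & io[b][0] or io[b][1] & io[a][0]]
    scns = [sorted(k for a in comp for k in cells[a])
            for comp in components(len(cells), links)]
    return cells, io, scns


rca = [  # (leading variable, remaining terms) of g1 ... g6
    ("s0", P({"y0 x0": -2, "y0": 1, "x0": 1})),
    ("c0", P({"y0 x0": 1})),
    ("s1", P({"c0 y1 x1": 4, "c0 y1": -2, "c0 x1": -2, "y1 x1": -2,
              "c0": 1, "y1": 1, "x1": 1})),
    ("c1", P({"c0 y1 x1": -2, "c0 y1": 1, "c0 x1": 1, "y1 x1": 1})),
    ("s2", P({"c1 y2 x2": 4, "c1 y2": -2, "c1 x2": -2, "y2 x2": -2,
              "c1": 1, "y2": 1, "x2": 1})),
    ("c2", P({"c1 y2 x2": -2, "c1 y2": 1, "c1 x2": 1, "y2 x2": 1})),
]
cells, io, scns = identify_scns(rca)
print(cells, scns)
```

Union-find keeps both steps close to linear in the number of detected links. The pairwise scans are quadratic and would need an index on shared monomials for large models.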
Example 18. To illustrate how SCNs are identified, consider a 3-bit ripple carry adder whose polynomials are:
$g_6 := -c_2 - 2c_1y_2x_2 + c_1y_2 + c_1x_2 + y_2x_2$
$g_5 := -s_2 + 4c_1y_2x_2 - 2c_1y_2 - 2c_1x_2 - 2y_2x_2 + c_1 + y_2 + x_2$
$g_4 := -c_1 - 2c_0y_1x_1 + c_0y_1 + c_0x_1 + y_1x_1$
$g_3 := -s_1 + 4c_0y_1x_1 - 2c_0y_1 - 2c_0x_1 - 2y_1x_1 + c_0 + y_1 + x_1$
$g_2 := -c_0 + y_0x_0$
$g_1 := -s_0 - 2y_0x_0 + y_0 + x_0$
To identify this set of polynomials as an SCN, the reverse engineering algorithm first assigns polynomials that share carry terms to adder cell sets. It readily deduces three sets: $aG_1 = \{g_1, g_2\}$, $aG_2 = \{g_3, g_4\}$, and $aG_3 = \{g_5, g_6\}$. Second, it relates each adder cell to its inputs and outputs: $aG_1$ is related to $X_1 = \{x_0, y_0\}$ and $Xo_1 = \{c_0, s_0\}$, $aG_2$ to $X_2 = \{x_1, y_1, c_0\}$ and $Xo_2 = \{c_1, s_1\}$, and $aG_3$ to $X_3 = \{x_2, y_2, c_1\}$ and $Xo_3 = \{c_2, s_2\}$. Since $X_2 \cap Xo_1 = \{c_0\}$ and $X_3 \cap Xo_2 = \{c_1\}$, the set of polynomials that builds the SCN is $nG = aG_1 \cup aG_2 \cup aG_3 = \{g_1, g_2, g_3, g_4, g_5, g_6\}$.
In addition to collecting the polynomials of each SCN in a set nG, the reverse engineering algorithm determines the input and output boundaries of the extracted SCNs. It extracts this information from the inputs Xi and outputs Xoi of the adder cell sets aGi ∈ nG. A variable x ∈ Xoi that is not used as an input of another adder cell ($x \notin X_k$ for all $k > i$) is identified as an output variable of the SCN and stored in the set nZ, which represents the output boundary. Similarly, a variable x is inserted into the set nX of the input boundary if $x \in X_i$ and $x \notin Xo_k$ for all $k < i$.
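Given the input and output sets of the adder cells of one SCN, both boundaries reduce to set differences. This Python sketch assumes that the cells of an SCN are connected acyclically. Under that assumption, "not used by a later cell" and "not produced by an earlier cell" coincide with "not used or produced by any other cell". The data are the adder cells of Example 18, and `scn_boundaries` is a hypothetical name.

```python
def scn_boundaries(cells):
    """cells: (inputs Xi, outputs Xoi) of the adder cells of one SCN.
    Returns (nX, nZ): the input boundary and the output boundary."""
    all_inputs = set().union(*(ins for ins, _ in cells))
    all_outputs = set().union(*(outs for _, outs in cells))
    n_x = all_inputs - all_outputs   # inputs not produced by any other cell
    n_z = all_outputs - all_inputs   # outputs not consumed by any other cell
    return n_x, n_z


# Adder cells aG1, aG2, aG3 of the ripple carry adder in Example 18
cells = [({"x0", "y0"}, {"c0", "s0"}),
         ({"x1", "y1", "c0"}, {"c1", "s1"}),
         ({"x2", "y2", "c1"}, {"c2", "s2"})]
n_x, n_z = scn_boundaries(cells)
print(sorted(n_x), sorted(n_z))
```

The internal carries c0 and c1 appear on both sides, so they drop out of both boundaries, while c2 remains as an SCN output.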
So far, we have presented the ﬁrst task of the reverse engineering algorithm
to identify boundaries of data-path units by building sets of extracted SCNs.
Another task of the algorithm is presented in the following subsection, which
is handling each SCN as an independent algebraic ideal to abstract it to one
canonical polynomial using Gaussian elimination.
5.2.3 Abstracting Data-path Units
Expressing a function by a canonical representation allows checking the equivalence of two different implementations of the function in linear time. This is why abstracting data-path units to canonical polynomials is crucial for ACEC. The proposed reverse engineering algorithm performs this task on the extracted SCNs, exploiting the fact that an SCN can be reduced without exponential space complexity. Lemma 6 and Lemma 8 have proved that the reduction of SCNs wrt. specification polynomials is bounded by linear complexity for adders and by polynomial complexity for multipliers under a specific substitution order. The reverse engineering algorithm deploys Gaussian elimination to reduce SCNs into what are called reduced Gröbner bases (see Definition 18), which are canonical polynomials for SCNs. The reduction of SCNs by Gaussian elimination has a computational complexity similar to that established in Lemma 6 and Lemma 8. Unlike the IMT, however, it does not require a specification polynomial.
The idea of leveraging Gaussian elimination algorithm has been proposed by
[35, 40] to replace Buchberger’s algorithm [19] for computing Gröbner bases. In
the context of reverse engineering, Gaussian elimination is utilized for another
application which is the reduction of a given SCN into one canonical polyno-
mial. For this propose, Gaussian elimination algorithm has been modiﬁed in
the sense that it performs only this speciﬁc task. The modiﬁed version is called
Gaussian Network Reduction (GNR). It performs iteratively three steps:
1. Select from the set nG of an SCN two polynomials g and ĝ containing terms t and t̂, respectively, such that 1) t and t̂ have the same monomial, and 2) the absolute values of their coefficients c_t and c_t̂ are equal or multiples of each other. The highest priority for the selection is given to the pair of polynomials that has the largest number of such terms.
2. Assuming |c_t| ≥ |c_t̂|, calculate the scalar c_q = −c_t/c_t̂, multiply ĝ by c_q, and add the result to g. This cancels the term t against t̂ and derives a new polynomial h.
3. Update the set nG by removing the polynomials g and ĝ and inserting the new polynomial h, i.e., nG = (nG \ {g, ĝ}) ∪ {h}. The GNR algorithm terminates when nG consists of only one polynomial.
Example 19. To illustrate the GNR algorithm, consider again the set nG of
a 3-bit ripple carry adder:
g6 := −c2 −2c1y2x2 + c1y2 + c1x2 + y2x2
g5 := −s2 +4c1y2x2 − 2c1y2 − 2c1x2 − 2y2x2 + c1 + y2 + x2
g4 := −c1 −2c0y1x1 + c0y1 + c0x1 + y1x1
g3 := −s1 +4c0y1x1 − 2c0y1 − 2c0x1 − 2y1x1 + c0 + y1 + x1
g2 := −c0 +y0x0
g1 := −s0 −2y0x0 + y0 + x0
The given SCN model contains polynomials with common monomials, such as g6 and g5, which share the monomials c1y2x2, c1y2, c1x2, and y2x2. The same structural property holds for the polynomials g4, g3 and for g2, g1, respectively. The GNR algorithm first selects the polynomials g6 and g5. To cancel their common terms, it multiplies g6 by 2 and adds
106 equivalence checking of floating-point multipliers using gröbner bases
it to g5. The result is the polynomial h1 := −2c2 − s2 + c1 + y2 + x2, which represents a full adder function (step two of the algorithm). The third step updates the set nG to
h1 := −2c2 − s2 + c1 + y2 + x2
g4 := −c1 −2c0y1x1 + c0y1 + c0x1 + y1x1
g3 := −s1 +4c0y1x1 − 2c0y1 − 2c0x1 − 2y1x1 + c0 + y1 + x1
g2 := −c0 +y0x0
g1 := −s0 −2y0x0 + y0 + x0
Applying the same steps to the other related polynomials yields a further full adder polynomial h2 := −2c1 − s1 + c0 + y1 + x1 and a half adder polynomial h3 := −2c0 − s0 + y0 + x0, resulting in the updated nG:
h1 := −2c2 − s2 + c1 + y2 + x2
h2 := −2c1 − s1 + c0 + y1 + x1
h3 := −2c0 − s0 + y0 + x0
Performing the GNR steps again on these three adder polynomials cancels their shared terms and yields a reduced Gröbner basis. The algorithm multiplies h1 by 2 and adds it to h2. The result is h4 := −4c2 − 2s2 − s1 + 2y2 + 2x2 + c0 + y1 + x1. Finally, the reduced Gröbner basis polynomial h5 := −8c2 − 4s2 − 2s1 − s0 + 4y2 + 4x2 + 2y1 + 2x1 + y0 + x0 is derived by multiplying h4 by 2 and adding it to h3, which cancels the shared monomial c0.
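The three GNR steps can be prototyped as follows. This Python sketch is our own illustration, assuming multilinear polynomials with integer coefficients stored as dictionaries that map monomials (frozensets of variable names) to coefficients; the helpers `poly`, `shared_terms`, `combine`, and `gnr` are hypothetical names. Ties in step 1 are broken by list position, so the pairs of Example 19 are processed in a different order than in the text, but the final polynomial is again h5.

```python
from itertools import combinations


def poly(*terms):
    """Build {monomial: coeff} from (coeff, 'v1*v2*...') pairs; '1' is the constant."""
    p = {}
    for c, mono in terms:
        key = frozenset() if mono == "1" else frozenset(mono.split("*"))
        p[key] = p.get(key, 0) + c
    return {m: c for m, c in p.items() if c != 0}


def shared_terms(g, gh):
    """Step 1: monomials in both g and gh whose coefficients divide one another."""
    return [m for m, c in g.items()
            if m in gh and (c % gh[m] == 0 or gh[m] % c == 0)]


def combine(g, gh, m):
    """Step 2: cancel monomial m via h = g + c_q * gh with c_q = -c_t / c_t^."""
    if abs(g[m]) < abs(gh[m]):
        g, gh = gh, g                 # g holds the coefficient of larger magnitude
    cq = -g[m] // gh[m]               # exact, since gh[m] divides g[m]
    h = dict(g)
    for mono, c in gh.items():
        h[mono] = h.get(mono, 0) + cq * c
    return {mono: c for mono, c in h.items() if c != 0}


def gnr(polys):
    """Step 3: replace the best pair by its combination until one polynomial is left."""
    polys = list(polys)
    while len(polys) > 1:
        best = None
        for i, j in combinations(range(len(polys)), 2):
            common = shared_terms(polys[i], polys[j])
            if common and (best is None or len(common) > best[0]):
                best = (len(common), i, j, common[0])
        if best is None:
            raise ValueError("no pair of polynomials shares a cancellable term")
        _, i, j, m = best
        h = combine(polys[i], polys[j], m)
        polys = [p for k, p in enumerate(polys) if k not in (i, j)] + [h]
    return polys[0]


# The SCN of the 3-bit ripple carry adder from Example 19.
nG = [
    poly((-1, "s0"), (-2, "y0*x0"), (1, "y0"), (1, "x0")),                          # g1
    poly((-1, "c0"), (1, "y0*x0")),                                                 # g2
    poly((-1, "s1"), (4, "c0*y1*x1"), (-2, "c0*y1"), (-2, "c0*x1"), (-2, "y1*x1"),
         (1, "c0"), (1, "y1"), (1, "x1")),                                          # g3
    poly((-1, "c1"), (-2, "c0*y1*x1"), (1, "c0*y1"), (1, "c0*x1"), (1, "y1*x1")),   # g4
    poly((-1, "s2"), (4, "c1*y2*x2"), (-2, "c1*y2"), (-2, "c1*x2"), (-2, "y2*x2"),
         (1, "c1"), (1, "y2"), (1, "x2")),                                          # g5
    poly((-1, "c2"), (-2, "c1*y2*x2"), (1, "c1*y2"), (1, "c1*x2"), (1, "y2*x2")),   # g6
]
h5 = gnr(nG)
print(h5[frozenset({"c2"})], h5[frozenset({"y2"})], len(h5))  # -8 4 10
```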
Three questions could be asked about the proposed algorithm: Does the GNR always produce one polynomial? Is the resulting polynomial a canonical representation of the given SCN? What is the computational complexity of the GNR algorithm? In the following, we answer these questions with three lemmas.
Lemma 9. Let nG = {g1, · · · , gm} denote the polynomial set of an SCN given to the GNR. Then the algorithm always derives a single polynomial from nG.
Proof. The construction of nG guarantees that each subset of polynomials aGi ⊂ nG shares at least one monomial with another subset of polynomials aGj ⊂ nG, regardless of the sizes of these subsets (see Subsection 5.2.2). Besides this, the GNR algorithm iteratively reduces each pair of related polynomials (i.e., polynomials that share monomials) into one polynomial. Because of the construction of nG, during the execution of the algorithm there is always a pair of related polynomials hi and hj that can be reduced to one polynomial: hi and hj are derived from reducing the subsets aGi and aGj, respectively, which share at least one monomial; these shared monomials remain between hi and hj, so the GNR can replace both of them by a new polynomial. By replacing all related pairs of polynomials, the GNR always terminates with a single polynomial. □
Lemma 10. Let nG be a set that has been reduced by the GNR algorithm to a single polynomial nĜ = {g}. Then g is the unique canonical representation of the function f modeled by nG wrt. a monomial order ≺.
Proof. Based on Lemma 2, every ideal has a unique reduced Gröbner basis. The GNR has reduced the set of polynomials nG (ideal) to only one polynomial, nĜ = {g}. Since nĜ contains a single polynomial, no term in g is divisible by the leading term of any other polynomial in nĜ. This means that Definition 18 of the reduced Gröbner basis holds for nĜ; therefore, g is a canonical abstracted representation, under the monomial order ≺, of the function f implemented by the set nG. □
Lemma 11. Let nt denote the total number of terms of all polynomials in the set nG = {g1, · · · , gm} that represents an SCN of an integer adder or of an integer multiplier, and let Z = {z0, · · · , zm}, X = {x0, · · · , xn−1} and Y = {y0, · · · , yn−1} represent the outputs and inputs of the function modeled by the SCN, respectively. The reduction of nG by the GNR algorithm is bounded in space by a linear complexity for adders and by a polynomial complexity for multipliers.
Proof. Since each iteration of the GNR algorithm removes at least two terms (t and t̂) from the total number of terms nt, the space required by each step is always bounded by O(nt); that is, the size of each resulting polynomial does not exceed nt, regardless of the type of the given set of polynomials. For integer adders and integer multipliers modeled as SCNs, the set of polynomials reveals all carry terms, which are simply canceled by the GNR algorithm. Therefore, the n-bit addition function is reduced to the single polynomial
$h := -\sum_{k=0}^{n} 2^k z_k + \sum_{k=0}^{n-1} 2^k x_k + \sum_{k=0}^{n-1} 2^k y_k$
whose size is bounded by O(n), while for the n-bit multiplication the resulting polynomial
$h := -\sum_{k=0}^{2n-1} 2^k z_k + \left(\sum_{k=0}^{n-1} 2^k x_k\right) \cdot \left(\sum_{k=0}^{n-1} 2^k y_k\right)$
is of size bounded by O(n²). □
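The two word-level polynomials in the proof of Lemma 11 can be built and sanity-checked with the following Python sketch (our own illustration; all helper names are hypothetical). It counts their terms, 3n + 1 for the adder and n² + 2n for the multiplier, matching the O(n) and O(n²) bounds, and confirms by exhaustive simulation for n = 3 that each polynomial vanishes whenever z holds the true sum or product.

```python
from itertools import product


def expand_product(a, b):
    """Multiply two polynomials {monomial: coeff}; Boolean variables satisfy x*x = x."""
    out = {}
    for ma, ca in a.items():
        for mb, cb in b.items():
            m = ma | mb
            out[m] = out.get(m, 0) + ca * cb
    return {m: c for m, c in out.items() if c != 0}


def word(prefix, width):
    """The word 2^(width-1)*v_(width-1) + ... + v_0 as a polynomial."""
    return {frozenset([f"{prefix}{k}"]): 2 ** k for k in range(width)}


def add_poly(n):
    """-sum_{k=0}^{n} 2^k z_k + sum_{k<n} 2^k x_k + sum_{k<n} 2^k y_k."""
    h = {m: -c for m, c in word("z", n + 1).items()}
    h.update(word("x", n))        # x, y and z monomials are all distinct
    h.update(word("y", n))
    return h


def mul_poly(n):
    """-sum_{k=0}^{2n-1} 2^k z_k + (sum_{k<n} 2^k x_k) * (sum_{k<n} 2^k y_k)."""
    h = {m: -c for m, c in word("z", 2 * n).items()}
    h.update(expand_product(word("x", n), word("y", n)))
    return h


def evaluate(p, env):
    """Evaluate p under a 0/1 assignment env."""
    return sum(c * all(env[v] for v in m) for m, c in p.items())


def bits(prefix, value, width):
    return {f"{prefix}{k}": (value >> k) & 1 for k in range(width)}


n = 3
A, M = add_poly(n), mul_poly(n)
for x, y in product(range(2 ** n), repeat=2):
    env = {**bits("x", x, n), **bits("y", y, n)}
    assert evaluate(A, {**env, **bits("z", x + y, n + 1)}) == 0
    assert evaluate(M, {**env, **bits("z", x * y, 2 * n)}) == 0
print(len(A), len(M))  # 10 15, i.e. 3n + 1 and n^2 + 2n for n = 3
```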
The canonical polynomial of each extracted arithmetic component is stored in a set of word polynomials Gword, together with its set of input variables nX and its set of output variables nZ. In the next section, we present how the arithmetic sweeping algorithm uses the extracted arithmetic information to simplify the rewritten model G′.
5.3 arithmetic sweeping
Arithmetic sweeping aims to find internal equivalences in order to avoid the prohibitive runtime of recursive divisions in the IMT. Of course, once candidates for internal equivalence have been identified, their equivalence still has to be proved (which is also done using the IMT procedure). Hence, to gain an overall benefit, we need i) promising candidates and ii) moderate runtimes for the equivalence proofs. The proposed arithmetic sweeping reaches both goals as follows.
For i), the reverse engineering step provides arithmetic components, from which we generate promising candidates based on their I/O boundaries. The algorithm uses these I/O boundaries to partition the variables of the combined model G′ into groups. Simulation deduces word equivalence (wE; for details see below) candidates between outputs of the arithmetic components. For every nominated wE, the model variables are partitioned into two groups: one containing the transitive fan-in variables of the input boundaries of the wE, and the other containing the internal variables of the two related arithmetic components. Deducing internal bit equivalences (bE; see below) only between variables of the same group increases the likelihood that a candidate pair is actually equivalent.
For ii), the equivalence proofs become feasible for several reasons. Arithmetic sweeping generates two types of relationships: bit equivalence (bE) pairs and word equivalence (wE) pairs. A bE describes the equivalence of a pair of variables (v, v̂) and is formulated by the polynomial g := −v + v̂. A wE polynomial is formulated as g := Z − Ẑ for the word pair candidate (Z, Ẑ), where $Z = 2^m z_m + \cdots + z_0$ and $\hat{Z} = 2^m \hat{z}_m + \cdots + \hat{z}_0$. For each arithmetic component, an abstract canonical polynomial has been determined in the reverse engineering step. The major advantage over SAT sweeping is that internal equivalences are proved by dividing wE polynomials wrt. the abstracted polynomials. To do this, the word model Gword obtained from the reverse engineering algorithm is modified as follows: for every abstracted polynomial $-2^m z_m - \cdots - z_0 + f(x_0, \ldots, x_n)$, an integer word variable Z is created and the polynomial is replaced by $-Z + f(x_0, \ldots, x_n)$. The polynomial $-Z + 2^m z_m + \cdots + z_0$ is used to interpret the equivalence between two output words Z and Ẑ, as shown in Lemma 12 of Subsection 5.3.2. To summarize, dividing wE polynomials wrt. abstracted polynomials has a major influence on the performance of the technique: it avoids the exhaustive cost of searching for equivalences between internal variables of the data-path units, which usually have a large number of non-equivalent variables in their transitive fan-ins.
5.3.1 Generating Relationship Polynomials
The choice of relationship candidates is a central problem for any equivalence checking technique. ACEC draws on the simulation approach of [43] and on the extracted data-path polynomials to deduce bit and word relationships. Four steps are performed to generate relationship polynomials: i) nominating wE polynomials, ii) classifying the model variables into groups, iii) generating bE polynomials, and finally iv) sorting the wE and bE polynomials in a relationship list.
Based on a global simulation with a fixed number of input vectors over the primary inputs of G′, word relationships between the output words of the data-path polynomials are deduced. Two words form a wE polynomial if their integer values are equal under all simulated assignments.
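Word-relationship nomination by simulation can be sketched as below. This is a hypothetical Python illustration, not the ACEC code: each output word of a data-path polynomial is modeled as a function from a primary-input bit vector to an integer, a fixed number of random vectors stands in for the global simulation, and words with identical value signatures are paired as wE candidates. The off-by-one variant in the usage example differs on every vector and is therefore never nominated.

```python
import random


def nominate_word_equivalences(words, num_inputs, num_vectors=64, seed=0):
    """Return wE candidate pairs: words whose integer values agree on every vector.

    `words` maps a word name to a function from an input-bit tuple to an integer
    (a hypothetical stand-in for simulating the output words of G').
    """
    rng = random.Random(seed)
    vectors = [tuple(rng.randint(0, 1) for _ in range(num_inputs))
               for _ in range(num_vectors)]
    classes = {}
    for name, f in words.items():
        signature = tuple(f(v) for v in vectors)
        classes.setdefault(signature, []).append(name)
    # Every pair inside one class becomes a candidate polynomial Z - Z^.
    return [(a, b) for names in classes.values()
            for i, a in enumerate(names) for b in names[i + 1:]]


def val(bits):
    """Unsigned integer value of a little-endian bit tuple."""
    return sum(bit << k for k, bit in enumerate(bits))


words = {
    "Z_dpu1": lambda v: val(v[:2]) * val(v[2:]),       # a * b
    "Z_dpu2": lambda v: val(v[2:]) * val(v[:2]),       # b * a, another implementation
    "Z_dpu3": lambda v: val(v[:2]) * val(v[2:]) + 1,   # off-by-one variant
}
print(nominate_word_equivalences(words, num_inputs=4))  # [('Z_dpu1', 'Z_dpu2')]
```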
The approach classifies the variables of G′ into groups according to the wE polynomials. Each wE polynomial defines two groups: the first consists of all transitive fan-in variables of the wE, and the second contains the internal variables bounded by the input and output variables of the wE.
Example 20. To illustrate this idea, consider a model with four extracted data-path units (DPU1, DPU2, DPU3, and DPU4), as shown in Figure 24. The simulation nominates two wEs: one relates the output words of DPU1 and DPU2, the other those of DPU3 and DPU4. The approach classifies the model variables into five groups: i) the transitive fan-in variables of DPU1 and DPU2, ii) the internal variables of DPU1 and DPU2, iii) the transitive fan-in variables of the wE between DPU3 and DPU4, iv) their internal variables, and v) the remaining variables of C1 and C2, which are not classified into any of the other groups.
The classified groups of G′ and the global simulation are used to determine, for every model variable vi, a set of variables φi: we have vj ∈ φi if vi and vj take the same Boolean value under each simulated input assignment and belong to the same classified group. Finally, bE polynomials between vi and the other variables of φi are generated; we call them the bE polynomials of the variable vi.
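The construction of the sets φi can be sketched in Python by bucketing variables on their classified group together with their simulation signature; every pair inside a bucket yields a bE candidate −vi + vj. The dictionaries passed in (`signatures`, `group_of`) are an assumed encoding of the simulation results and the classification, chosen only for illustration.

```python
def bit_equivalence_candidates(signatures, group_of):
    """Return bE candidates (vi, vj), i.e. polynomials -vi + vj.

    signatures: variable -> tuple of simulated Boolean values, one per input vector
    group_of:   variable -> id of its classified group
    (both are a hypothetical encoding of the simulation and classification results)
    """
    buckets = {}
    for v, sig in signatures.items():
        buckets.setdefault((group_of[v], sig), []).append(v)
    pairs = []
    for members in buckets.values():
        for i, v in enumerate(members):
            # Pair v only with later members of phi_v to avoid duplicate pairs.
            pairs.extend((v, w) for w in members[i + 1:])
    return pairs


signatures = {"a": (0, 1, 1, 0), "b": (0, 1, 1, 0), "c": (0, 1, 1, 0), "d": (1, 1, 0, 0)}
group_of = {"a": 1, "b": 1, "c": 2, "d": 1}
pairs = bit_equivalence_candidates(signatures, group_of)
print(pairs)  # [('a', 'b')]: c matches the signature but lies in another group
```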
Figure 24: Schematic of a Model with Word Relationships (blocks DPU1, DPU2, DPU3, DPU4, C1 Netlist, C2 Netlist, and two transitive fan-in regions)
After classifying the model variables and generating the wE and bE relationships, these nominated relationships are sorted topologically wrt. the circuit and their leading variables. The sorting procedure aims to test a wE polynomial only after testing all bE polynomials of the variables in its transitive fan-in group. First, the wE polynomials are sorted topologically. Next, the procedure iterates over the wE polynomials and inserts into the list, for every wE, i) the bE polynomials of the variables in the transitive fan-in group of this wE, ii) the wE polynomial itself, and then iii) the bE polynomials of the variables in its internal group. Finally, the bE polynomials of the remaining variables, which are not included in any group related to a wE, are inserted at the end of the list.
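The ordering of the relationship list can be sketched as follows. This Python fragment is our own illustration with a hypothetical data model: wE candidates are assumed to be already topologically sorted, and each wE maps to the variables of its fan-in group and of its internal group; every variable is visited once, so bE polynomials are not duplicated when groups overlap. The usage mirrors Example 20 with two wEs and one unclassified variable r.

```python
def build_relationship_list(wes, be_of, fanin_group, internal_group, all_vars):
    """Order relationship polynomials as described above (hypothetical data model).

    wes:            wE candidates, already in topological order
    be_of:          variable -> list of its bE polynomials
    fanin_group:    wE -> variables of its transitive fan-in group
    internal_group: wE -> internal variables of its two data-path units
    all_vars:       all model variables, for the unclassified remainder
    """
    order, seen = [], set()

    def add_bes(variables):
        for v in variables:
            if v not in seen:
                seen.add(v)
                order.extend(be_of.get(v, []))

    for we in wes:
        add_bes(fanin_group[we])       # i) bE polynomials of the fan-in group
        order.append(we)               # ii) the wE polynomial itself
        add_bes(internal_group[we])    # iii) bE polynomials of internal variables
    add_bes(all_vars)                  # remaining variables not related to any wE
    return order


order = build_relationship_list(
    wes=["wE(DPU1,DPU2)", "wE(DPU3,DPU4)"],
    be_of={"a": ["bE(a)"], "i1": ["bE(i1)"], "b": ["bE(b)"], "i2": ["bE(i2)"], "r": ["bE(r)"]},
    fanin_group={"wE(DPU1,DPU2)": ["a"], "wE(DPU3,DPU4)": ["b"]},
    internal_group={"wE(DPU1,DPU2)": ["i1"], "wE(DPU3,DPU4)": ["i2"]},
    all_vars=["a", "i1", "b", "i2", "r"],
)
print(order)
# ['bE(a)', 'wE(DPU1,DPU2)', 'bE(i1)', 'bE(b)', 'wE(DPU3,DPU4)', 'bE(i2)', 'bE(r)']
```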
5.3.2 Testing Membership of Internal Relations
During the testing of internal relationships, the IMT algorithm is invoked to divide every polynomial pr from the relationship list: if pr is a bE polynomial, the division is done wrt. G′; otherwise, it is performed wrt. Gword. Based on the remainder of dividing pr, the approach either merges or eliminates variables of pr in the models G′ and Gword.
The merging decision is taken when the remainder of dividing pr is zero: the approach merges every two variables of pr that are proved to be functionally equivalent into one variable. If pr is a wE polynomial, the equivalence of its bit variables follows from the following lemma.
Lemma 12. Let $Z = 2^m z_m + \cdots + z_0$ and $\hat{Z} = 2^m \hat{z}_m + \cdots + \hat{z}_0$ be two equivalent integer words. If Z and Ẑ use the same number system and this number system is not redundant, then the bit variables z_i and ẑ_i with the same weights (coefficients) are equivalent.
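Why Lemma 12 needs a non-redundant number system can be seen with a small exhaustive check. In this Python sketch (our own illustration; `is_redundant` is a hypothetical name), a word is a vector of bits with fixed weights. With plain binary weights, no two distinct bit vectors encode the same integer, so equal words force equal bits; with a carry-save style encoding that has two bits per weight, an assumed example of a redundant system, equal integer values no longer determine the individual bits.

```python
from itertools import product


def is_redundant(weights, digits=(0, 1)):
    """True if two distinct digit vectors with these weights encode the same integer."""
    seen = set()
    for vec in product(digits, repeat=len(weights)):
        value = sum(d * w for d, w in zip(vec, weights))
        if value in seen:
            return True   # equal words, different bits: Lemma 12 does not apply
        seen.add(value)
    return False


binary = [1, 2, 4, 8]              # Z = 8z3 + 4z2 + 2z1 + z0 (non-redundant)
carry_save = [1, 1, 2, 2, 4, 4]    # a sum bit and a carry bit per weight (redundant)
print(is_redundant(binary), is_redundant(carry_save))  # False True
```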
Merging reduces the number of polynomials in G′: the merged variables are treated as new primary inputs, so the polynomials of their transitive fan-in variables are removed from G′.
Example 21. Continuing with the previous model example, according to the sorted relationship list, the first group to be tested consists of the bE polynomials between corresponding fan-in variables of DPU1 and DPU2, followed by the wE polynomial that describes the equivalence between the output word variables of DPU1 and DPU2, as shown in Figure 25. Figure 26 shows the simplified model obtained after merging the variables that have been proved equivalent by testing the bE and wE polynomials of DPU1 and DPU2. The original model is simplified by removing the polynomials that model DPU1 and DPU2 as well as those that model their transitive fan-in variables. To avoid redundant divisions, the remaining bE polynomials that test already merged variables are removed from the relationship list.
Figure 25: Bit and Word Relationships Testing (a bE polynomial −v + v̂ is divided wrt. G′ and a wE polynomial −Z + Ẑ wrt. Gword, each yielding a remainder r)
A variable of the model that has no functional equivalence is eliminated by substituting it with the non-leading terms of its defining polynomial, which are functions of variables that have already been proved bit-equivalent. The elimination decision is taken for a variable vi of pr if the remainder of dividing pr is not zero and no untested bE or wE polynomials related to vi remain in the list. These eliminations facilitate the division of subsequent relationships: they increase the number of input variables shared among the polynomials of G′, which simplifies the division of pr wrt. G′. For example, dividing pr := −vi + vj wrt. a model containing the polynomials g1 := −vi + x1x2 + x3 and g2 := −vj + x1x2 + x3 reduces to a subtraction, and the remainder of the division is x1x2 + x3 − x1x2 − x3 = 0.
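The division in this example can be mimicked by plain substitution, since every model polynomial has the form −v + tail(v) with leading variable v. The Python sketch below is our own illustration of that reduction for multilinear polynomials (x·x = x), not the IMT implementation; the model is assumed to list leading variables in reverse topological order, and the helper names are hypothetical.

```python
def poly(*terms):
    """Build {monomial: coeff} from (coeff, 'v1*v2*...') pairs; '1' is the constant."""
    p = {}
    for c, mono in terms:
        key = frozenset() if mono == "1" else frozenset(mono.split("*"))
        p[key] = p.get(key, 0) + c
    return {m: c for m, c in p.items() if c != 0}


def substitute(p, var, tail):
    """Replace the Boolean variable var in p by the polynomial tail (x*x = x)."""
    out = {}
    for mono, c in p.items():
        if var in mono:
            rest = mono - {var}
            for m2, c2 in tail.items():
                m = rest | m2
                out[m] = out.get(m, 0) + c * c2
        else:
            out[mono] = out.get(mono, 0) + c
    return {m: c for m, c in out.items() if c != 0}


def reduce_by_model(p, model):
    """Reduce p by model polynomials -v + tail(v), given as {v: tail}, by substitution."""
    for var, tail in model.items():
        p = substitute(p, var, tail)
    return p


# g1 := -vi + x1x2 + x3 and g2 := -vj + x1x2 + x3 (only the tails are stored)
model = {"vi": poly((1, "x1*x2"), (1, "x3")),
         "vj": poly((1, "x1*x2"), (1, "x3"))}
pr = poly((-1, "vi"), (1, "vj"))
print(reduce_by_model(pr, model))  # {}: remainder 0, so vi and vj can be merged
```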
5.4 efficient polynomial representation
The polynomial is the heart of the algebraic computation technique, and an efficient representation of polynomials has a major impact on the performance of any algebraic algorithm. So far, only arithmetic functions have been implemented by canonical and compact representations; but what about the other Boolean functions in the model G′? Unfortunately, some Boolean functions cannot be implemented by multivariate polynomials of moderate size. To circumvent the exponential size of the polynomials representing these Boolean functions, we propose a decomposition method that significantly reduces the number of terms of polynomials. The decomposition method also offers semi-canonical representations, which make it feasible to check the equivalence between two different implementations of the same Boolean function.
Figure 26: Schematic of a Simplified Model (remaining blocks DPU3, DPU4, C1 Netlist, and C2 Netlist)
5.4.1 Diﬀerent Decompositions
Inspired by decomposition types of decision diagrams [24, 32], we enhance
representations of polynomials by considering two decomposition types:
f = f|x=0 + x(f|x=1 − f|x=0) positive Davio (pD)
f = f|x=1 + (1 − x)(f|x=0 − f|x=1) negative Davio (nD),
where x denotes a Boolean variable and the cofactors f|x=0 and f|x=1 are combined with addition, subtraction, and multiplication operations.
Our observation is that polynomials have typically been represented by the pD decomposition only. This prevents a compact representation of some Boolean functions, such as a chain of OR gates, although these functions can be canonically defined without exponential size if another decomposition type, such as nD, is considered for their variables.
Example 22. Consider a 4-input OR function f(x0, x1, x2, x3). Its polynomial representation that follows only pD is f = x0 + x1 + x2 + x3 − x0x1 − x0x2 − x0x3 − x1x2 − x1x3 − x2x3 + x0x1x2 + x0x1x3 + x0x2x3 + x1x2x3 − x0x1x2x3. By decomposing f using nD for all of its variables, it becomes f = 1 − x̄0x̄1x̄2x̄3, where x̄i = 1 − xi. For an n-bit OR function, a polynomial that follows pD consists of 2^n − 1 terms, while the nD polynomial has only two terms.
Other Boolean functions that can be implemented efficiently by combining pD and nD decompositions are equality and inequality conditions with respect to constants, as illustrated by the following examples.
Example 23. Consider the equality function f = (8x3 + 4x2 + 2x1 + x0 == (1000)₂), where f is equal to one when x0 = 0, x1 = 0, x2 = 0 and x3 = 1. The polynomial representation of the function that follows only pD is f = x3 − x3x0 − x3x1 − x3x2 + x3x0x1 + x3x0x2 + x3x1x2 − x0x1x2x3. By decomposing f using nD for the variables {x0, x1, x2} and pD for the variable x3, f is implemented as f = x̄0x̄1x̄2x3. For an n-bit equality function, a polynomial with only one term can represent the function.
Example 24. Consider the inequality function f = (8x3 + 4x2 + 2x1 + x0 > (1000)₂), where f is equal to one when x3 = 1 and at least one of x0, x1, x2 is equal to one. The polynomial representation of the function that follows only pD is f = x3x0 + x3x1 + x3x2 − x3x0x1 − x3x0x2 − x3x1x2 + x0x1x2x3. By decomposing f using nD for the variables {x0, x1, x2} and pD for the variable x3, f is implemented as f = x3 − x̄0x̄1x̄2x3.
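The term counts in Examples 22 to 24 can be reproduced by expanding the nD literals x̄i = 1 − xi into pure pD form. The Python sketch below is an illustrative helper of our own (the names `expand_literals` and `pd_terms` are hypothetical); a literal written `~x` stands for x̄. It prints the pD sizes 15, 8, and 7 of the OR, equality, and inequality functions, whose mixed pD/nD forms have two, one, and two terms, respectively.

```python
def expand_literals(literals):
    """Expand a product of literals into pD form; '~x' stands for 1 - x."""
    result = {frozenset(): 1}
    for lit in literals:
        if lit.startswith("~"):
            factor = {frozenset(): 1, frozenset([lit[1:]]): -1}   # 1 - x
        else:
            factor = {frozenset([lit]): 1}                        # x
        new = {}
        for m1, c1 in result.items():
            for m2, c2 in factor.items():
                m = m1 | m2                                       # x*x = x
                new[m] = new.get(m, 0) + c1 * c2
        result = {m: c for m, c in new.items() if c != 0}
    return result


def pd_terms(const, products):
    """pD form of const + sum(sign * product of literals), given as (sign, literals)."""
    total = {frozenset(): const} if const else {}
    for sign, literals in products:
        for m, c in expand_literals(literals).items():
            total[m] = total.get(m, 0) + sign * c
    return {m: c for m, c in total.items() if c != 0}


or4 = pd_terms(1, [(-1, ["~x0", "~x1", "~x2", "~x3"])])              # 1 - ~x0~x1~x2~x3
eq8 = pd_terms(0, [(1, ["~x0", "~x1", "~x2", "x3"])])                # ~x0~x1~x2 x3
gt8 = pd_terms(0, [(1, ["x3"]), (-1, ["~x0", "~x1", "~x2", "x3"])])  # x3 - ~x0~x1~x2 x3
print(len(or4), len(eq8), len(gt8))  # 15 8 7
```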
Representing a Boolean function with a moderate size significantly reduces the number of addition, subtraction, and multiplication operations, which considerably enhances the performance of any symbolic computation algorithm. In particular, for the IMT procedure, most of the terms within the division process cancel each other before any further substitution is made, and the computational cost of each division step is reduced since fewer terms are involved.
To apply different decompositions, we add to the Gröbner basis model a negated version v̄i of each variable vi, together with the polynomial g := −v̄i − vi + 1. As known from the field of decision diagrams, the choice of the decomposition type and of the variable order plays a key role for the size of the diagram. In the context of multivariate polynomials, we fix the variable order to the reverse topological order and we propose
an approach to determine the Decomposition Type (DT) of each variable. As the main goal of applying different decompositions is to reduce the number of polynomial terms, the DT decision is based on this factor and on the structure of the circuit.
For this purpose, we modify the circuit modeling explained in Subsection 2.1.5 as follows:
NOT: xo = ¬x1 ⇒ −xo + x̄1
AND: xo = x1 ∧ x2 ⇒ −xo + x1x2
OR: xo = x1 ∨ x2 ⇒ −xo + 1 − x̄1x̄2
XOR: xo = x1 ⊕ x2 ⇒ −xo − 2x1x2 + x1 + x2
MUX: xo = (x1 ∧ x2) ∨ (¬x1 ∧ x3) ⇒ −xo + x1x2 − x1x3 + x3,
such that the DT of the input variables of inverters and OR gates is nD, while for AND, XOR, and MUX gates it is pD. As this modeling shows, one variable may have more than one DT in the model. During model rewriting (see Subsection 5.2.1), a polynomial g is rewritten by substituting one of its variables vi; as a result, another variable vj in g may end up with different decomposition types, i.e., both vj and its negation v̄j occur in the same polynomial g. In this case, we unify the DT of vj by choosing the one that achieves the highest reduction of the size of g. If n variables have different DTs within the same polynomial, the number of possible DT combinations for these variables is 2^n; e.g., for a polynomial with two variables v1 and v2, the possible combinations are (v1, v2), (v1, v̄2), (v̄1, v2), and (v̄1, v̄2). Trying all combinations to find the best representation leads to a prohibitive runtime, because the decomposition algorithm is called 2^n times. To bypass this problem, we propose a polynomial decomposition approach that takes the decomposition decision for every variable independently of the others. This restriction on choosing DTs significantly accelerates the search for compact polynomial representations. The polynomial decomposition approach works as follows:
1. During model rewriting, the approach detects any rewritten polynomial g with one or more variables vj that have non-unified DTs.
2. Following the reverse topological order, one variable v is selected. Given the two versions v and v̄, the polynomial g is rewritten by replacing v̄ with 1 − v.
3. To find the proper DT for v, the decomposition algorithm WLD, which was designed for decision diagrams [31], is called. The input given to WLD
is the polynomial g after parsing it into a K*BMD, and WLD is asked to decompose the K*BMD of g wrt. v once as pD and once as nD. The resulting K*BMDs are parsed back into multivariate polynomials, and the smaller one, gv, is picked and replaces the original polynomial, g = gv.
4. Steps two and three are repeated iteratively until all variables of g have unified decomposition types.
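The per-variable decision of the polynomial decomposition approach can be sketched as follows. This Python code is our own simplified stand-in: instead of decomposing a K*BMD with WLD, it directly builds the pD form (substituting v̄ = 1 − v) and the nD form (substituting v = 1 − v̄) of the polynomial and keeps the one with fewer terms. Literals `x` and `~x` denote x and x̄; the helper names and the example polynomial, a 3-input OR written with mixed literals, are assumptions for illustration.

```python
def substitute(p, lit, replacement):
    """Replace literal lit in every monomial of p by the polynomial replacement."""
    out = {}
    for mono, c in p.items():
        if lit in mono:
            rest = mono - {lit}
            for m2, c2 in replacement.items():
                m = rest | m2
                out[m] = out.get(m, 0) + c * c2
        else:
            out[mono] = out.get(mono, 0) + c
    return {m: c for m, c in out.items() if c != 0}


def unify(p, var):
    """Give var a single decomposition type in p, keeping the smaller of pD and nD."""
    neg = "~" + var
    pd = substitute(p, neg, {frozenset(): 1, frozenset([var]): -1})   # ~v -> 1 - v
    nd = substitute(p, var, {frozenset(): 1, frozenset([neg]): -1})   # v -> 1 - ~v
    return pd if len(pd) <= len(nd) else nd


def decompose(p, order):
    """Unify the DTs of the variables of p one at a time, in the given order."""
    for var in order:
        p = unify(p, var)
    return p


def evaluate(p, env):
    """Evaluate p under a 0/1 assignment; the literal '~x' takes the value 1 - x."""
    def value(lit):
        return 1 - env[lit[1:]] if lit.startswith("~") else env[lit]
    return sum(c * all(value(lit) for lit in m) for m, c in p.items())


M = frozenset
# f = a + ~a*b + ~a*~b*c: a 3-input OR whose variables occur both as v and as ~v
f = {M({"a"}): 1, M({"~a", "b"}): 1, M({"~a", "~b", "c"}): 1}
result = decompose(f, ["a", "b", "c"])
print(len(f), len(result))  # 3 2
```

Because each variable is decided greedily and independently, intermediate sizes can grow: on this input the polynomial goes from three to four terms after deciding a, then to three and finally two terms, ending in the compact nD form 1 − āb̄c̄.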
So far, we have shown that the variables of a polynomial can be decomposed wrt. the pD or nD decomposition types. One may ask why the Shannon (S) decomposition f = (1 − x)f|x=0 + xf|x=1 has not also been applied to multivariate polynomials, although it is useful for implementing Boolean functions such as those composed of XORs or MUXes. The reason for excluding Shannon decomposition is that it changes the representation of the XOR function from −xo − 2x1x2 + x1 + x2 to −xo + x̄1x2 + x1x̄2. This description hides the nonlinear term −2x1x2; therefore, the carry terms of adder cells, which are nonlinear by definition, are hidden as well. Without these carry terms, the reverse engineering algorithm (see Section 5.2) is no longer able to extract arithmetic functions. Thus, on the one hand, Shannon decomposition is useful for representing functions such as shifters and comparators, which are rich in XORs and MUXes; on the other hand, it cannot be applied to polynomials modeling integer-valued arithmetic functions. To cope with this problem, we have designed a modified version of the polynomial decomposition approach that permits Shannon decomposition; however, this modified version is enabled only after the reverse engineering algorithm has run. The modified polynomial decomposition is applied during the testing phase of the arithmetic sweeping algorithm (see Subsection 5.3.2). During this phase, variables that have no equivalences are eliminated from the model G′, resulting in polynomials that may contain variables with non-unified DTs. The variables of these polynomials are decomposed wrt. one of the three decomposition types, Shannon, nD, or pD, using the modified polynomial decomposition approach.
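The effect on carry terms can be seen on a half adder. In this Python sketch (our own illustration; `add` is a hypothetical helper), the carry polynomial −c + x1x2 is combined with the sum polynomial as 2·carry + sum. With the pD XOR model, the monomial x1x2 cancels and the half adder relation −2c − s + x1 + x2 appears, the same shape as h3 in Example 19; with the Shannon form −s + x̄1x2 + x1x̄2, no monomial of the sum matches x1x2, so nothing cancels.

```python
def add(p, q, k=1):
    """Return p + k*q for polynomials {monomial: coeff}."""
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + k * c
    return {m: c for m, c in out.items() if c != 0}


M = frozenset
g_c = {M({"c"}): -1, M({"x1", "x2"}): 1}                                   # -c + x1x2 (AND)
g_s_pd = {M({"s"}): -1, M({"x1", "x2"}): -2, M({"x1"}): 1, M({"x2"}): 1}   # pD XOR
g_s_shannon = {M({"s"}): -1, M({"~x1", "x2"}): 1, M({"x1", "~x2"}): 1}     # Shannon XOR

print(len(add(g_s_pd, g_c, 2)))       # 4: -2c - s + x1 + x2, the carry term cancels
print(len(add(g_s_shannon, g_c, 2)))  # 5: x1x2 has no partner, nothing cancels
```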
The question now is whether this solution allows shifters and comparators to be represented efficiently. This can be explained by considering the model rewriting algorithm, which prohibits the elimination of variables that model the outputs of XORs and MUXes. In the rewritten model G′, the Boolean output functions of shifters and comparators are therefore described by sets of polynomials modeling their constituent XORs and MUXes. In this state, G′ is given to the arithmetic sweeping algorithm, which tries to find equivalences between internal variables of shifters and comparators. If the outputs and internal variables of such functions are proved to have equivalences, their descriptions in G′ as polynomials of XORs and MUXes are sufficient to simplify the model; otherwise, arithmetic sweeping eliminates the variables that have no equivalences. At this point, the modified polynomial decomposition approach plays its role by providing polynomials of moderate size that model shifters and comparators after the non-equivalent variables have been eliminated.
Figure 27: Compared FP Multiplier Circuits (left-hand side: simple multiplier, EXP adder, and normalize & round unit; right-hand side: complex multiplier, EXP adder, and optimized normalize & round unit; inputs ea, eb, fa, fb; outputs fp, ep and f̂p, êp)
5.5 experimental results
ACEC is implemented in C++. We compared it with the equivalence checkers of the ABC tool [13] and of a commercial tool (OneSpin). The experiments were carried out on an Intel(R) Core(TM) i5-3320M CPU (2.6 GHz, 16 GByte) running Linux.
We applied ACEC to the problem of verifying floating-point (FP) multipliers. We scaled and modified the structure of the FP multiplier unit of the OpenCores design module DOUBLE-FPU [67] to build dissimilar FP instances. As shown in Figure 27, the compared circuits have different multiplier architectures, and their control logic units are optimized differently. The multiplier units are generated using the online tool Arithmetic Module Generator [2]. The generated circuits were synthesized from Verilog to gate-level netlists using Yosys [101].
Table 7: Runtimes for Checking FP Multipliers Equivalences
Multiplier FP operand Commercial ABC ACEC
Architecture # bits # bits (h:m:s) (h:m:s) (h:m:s)
SP-CT-BK 16 00:08:50 TO 00:01:42
SP-WT-CH 16 00:09:08 TO 00:01:44
SP-CT-BK 24 TO TO 00:17:49
SP-WT-CH 24 TO TO 00:25:58
SP-CT-BK 32 TO TO 02:24:01
SP-WT-CH 32 TO TO 03:41:43
SP-CT-BK 64 TO TO TO
SP-WT-CH 64 TO TO TO
Table 7 reports the runtimes for checking the equivalence of different FP multipliers against the same reference circuit. The reference consists of a simple multiplier (SP-AR-RC) and an unoptimized normalize and round unit, whereas the compared circuits contain complex multipliers and round units that are optimized using the Yosys option share¹. The first column of Table 7 shows the multiplier architecture. The second and third columns give the number of bits of an FP operand of the circuit, in addition to the size of its significand and its exponent according to the IEEE standard. The next three columns provide the runtimes. The timeout (TO in the table) is set to 24 hours. The experimental results clearly demonstrate the advantage of ACEC in verifying circuits that include data-path and control logic: while the commercial tool verifies correctness only up to 16 bits and ABC times out in all cases, ACEC is able to verify the correctness of a single-precision binary floating-point multiplier (32 bits).
Table 8 shows statistics about the algorithms of ACEC for checking the equivalence of the FP multiplier instances that contain the multiplier architecture SP-WT-CH. For the reverse engineering algorithm, it shows the runtime of rewriting the combined model G, the runtime of extracting and abstracting data-path units, and the number of extracted units. These results show that the reverse engineering algorithm extracts more data-path unit candidates than expected: for two combined FP multipliers, six data-path units should be extracted, namely two significand multipliers, two exponent adders, and two incrementers in the rounding stages. The results also show that reverse engineering takes more than 65% of the total runtime of ACEC.
For arithmetic sweeping, Table 8 gives the total number of variables of the combined model, the number of proved equivalences between variables of the two compared circuits, and the time spent by the sweeping algorithm. The results demonstrate that the variables of G′ that are functionally equivalent to each other account for less than 45% of the total number of variables.
Further, the table presents the number of terms saved by the decomposition of polynomials, the number of effective (Eff.) calls of the decomposition algorithm wrt. the total number of calls (effective calls are those that save polynomial terms), and the number of terms canceled by the reduction rule applied during the model rewriting algorithm.

Table 8: Statistics of ACEC for FP Multipliers
Reverse Engineering
# bits  Model Rewriting (h:m:s)  Extract & Abstract (h:m:s)  # Data-path Units
16  00:00:46  00:00:23  21
24  00:11:56  00:10:04  23
32  00:32:50  02:10:30  23
Arithmetic Sweeping
# bits  # Variables of G′  # Proved Equivalences  Runtime (h:m:s)
16  1888  401  00:00:27
24  4440  666  00:03:36
32  5889  854  00:58:04
Efficient Polynomial Representation
# bits  Decomposition: # Reduced Terms  Decomposition: # Eff./Total Calls  Logic Reduction: # Canceled Terms
16  2400  514/3477  2916
24  9732  1013/8684  17153
32  16317  1345/12477  36390

¹ It merges shareable resources into a single resource. A SAT solver is used to determine whether two resources are shareable.
5.6 summary and future work
In this chapter, we have presented a new algebraic equivalence checking technique for circuits that combine data-path and control logic. The technique utilizes a new reverse engineering algorithm to extract and abstract arithmetic components from the combined Gröbner basis representation of the compared circuits. Based on the input and output boundaries of the abstracted components, the proposed arithmetic sweeping deduces fewer but more promising candidates for bit and word equivalences between the compared circuits. The technique circumvents the blow-up in the number of polynomial terms during the utilized algorithms by offering different types of decompositions for polynomials. Experimental results demonstrated the efficiency of our technique for the equivalence checking of large floating-point multipliers that cannot be verified with existing Boolean combinational equivalence checking techniques.
Directions for future work include:
1. Investigating the canonization of control logic: ACEC can bring these units only into a semi-canonical form, and equivalence checking techniques based on BDDs and SAT are still more powerful for reasoning about control logic circuits.
2. Managing membership testing for non-equivalent circuits, for which the number of IMT calls increases, causing an exponential increase of the overall runtime of ACEC. Also, eliminating internal variables that have no equivalences may cause a blow-up in the size of the combined model.
3. Extending the capabilities of the proposed reverse engineering approach, which cannot handle data-path circuits whose output words use a redundant number system.
4. The assumption that MUXes appear only in the control logic is not always valid; more robust approaches to distinguish the control logic from the data-path logic are required.

6
CONCLUSIONS
This thesis resolves a hard problem that lies beyond the capabilities of
state-of-the-art formal verification techniques, even though it has been the
subject of extensive investigation in academia as well as industry for almost
two decades. The research in this thesis has aimed to propose a fully automated
technique for the formal verification of large-scale bit-level arithmetic
circuits, in particular, binary floating-point circuits. In fact, automatic
formal verification of such circuits is unachievable using commercial as well as
academic formal verification tools. To perform this task, outstanding experts are
employed who exert an enormous amount of manual effort to understand the design
and then decompose the verification problem into solvable cases. That is why the
goal of this thesis is significantly important: it helps to provide high-quality
systems that involve arithmetic circuits in a shorter time and at a lower cost.
To come up with such a fully automated solution, the thesis has first addressed
the hardest arithmetic unit to verify, namely the multiplier. Chapter 3 proposes
the CPP approach, which copes with the exponential complexity of verifying
integer multipliers by automatically decomposing the verification problem into
non-complex cases that are solved by standard equivalence checking tools. The
approach not only resolves the exponential complexity of the problem, but is
also more highly automated than standard equivalence checking, in the sense that
it is given no information about the high-level description of the multiplier’s
design, nor even a golden reference against which the multiplier is compared.
Because the CPP approach is not applicable to all multiplier architectures, the
thesis investigates another formal verification technique based on concepts from
the field of symbolic computation. Chapter 4 introduces sophisticated algorithms
that boost the capabilities of the symbolic computation technique in order to
verify a large class of multiplier architectures, including those that cannot be
handled by the CPP approach. The enhanced technique is considered the most robust
solution for the verification of large-scale bit-level multipliers. Moreover, the
major contribution of this chapter is that it draws the attention of the
verification research community to algebraic decision procedures for formal
verification purposes, rather than classical solvers such as SAT/SMT.
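To illustrate the kind of algebraic reasoning referred to here, the following minimal Python sketch verifies a gate-level full adder against its word-level specification 2c + s = a + b + cin by backward rewriting: gate outputs are replaced by their polynomial models in reverse topological order, and a zero remainder means that the circuit satisfies the specification. It assumes the standard integer polynomial models of Boolean gates (AND: xy, OR: x + y - xy, XOR: x + y - 2xy) and applies x·x = x by keeping every monomial multilinear. It is a toy illustration, not the enhanced technique of Chapter 4; the netlist, signal names, and helpers such as `substitute` and `backward_rewrite` are hypothetical.

```python
from collections import defaultdict

# A polynomial is a dict mapping a monomial (frozenset of variable names) to an
# integer coefficient. Using sets keeps monomials multilinear, i.e. x*x = x.

def add(p, q):
    r = defaultdict(int)
    for m, c in list(p.items()) + list(q.items()):
        r[m] += c
    return {m: c for m, c in r.items() if c != 0}

def scale(p, k):
    return {m: k * c for m, c in p.items() if k * c != 0}

def mul(p, q):
    r = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[m1 | m2] += c1 * c2  # set union implements x*x = x
    return {m: c for m, c in r.items() if c != 0}

def var(name):
    return {frozenset([name]): 1}

# Integer polynomial models of Boolean gates (signals take values in {0, 1}).
def AND(x, y):
    return mul(x, y)

def OR(x, y):
    return add(add(x, y), scale(mul(x, y), -1))

def XOR(x, y):
    return add(add(x, y), scale(mul(x, y), -2))

def substitute(p, v, g):
    """Replace every occurrence of variable v in p by the polynomial g."""
    result = {}
    for m, c in p.items():
        term = mul({m - {v}: c}, g) if v in m else {m: c}
        result = add(result, term)
    return result

def backward_rewrite(spec, netlist):
    """Substitute gate outputs in reverse topological order; {} means verified."""
    for out, gate_poly in reversed(netlist):
        spec = substitute(spec, out, gate_poly)
    return spec

a, b, cin = var("a"), var("b"), var("cin")
t1, t2, t3 = var("t1"), var("t2"), var("t3")

# Hypothetical gate-level full adder, listed in topological order.
full_adder = [
    ("t1", XOR(a, b)),
    ("s", XOR(t1, cin)),
    ("t2", AND(a, b)),
    ("t3", AND(t1, cin)),
    ("c", OR(t2, t3)),
]

# Word-level specification: 2*c + s - (a + b + cin) must vanish on the circuit.
spec = add(add(scale(var("c"), 2), var("s")), scale(add(add(a, b), cin), -1))

print(backward_rewrite(spec, full_adder))  # {}
```

Replacing the carry gate by an AND, for instance, leaves the nonzero remainder -2ab - 2a·cin - 2b·cin + 4ab·cin, which exposes the bug. On realistic multipliers, the intermediate polynomials of such a naive rewriting can grow very large, which is the kind of term blow-up the techniques in this thesis are designed to avoid.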
The great success of the algebraic decision procedure provided by the enhanced
symbolic computation, as shown in Chapter 4, was the key motivation to leverage
this algebraic procedure in a fully automated verification technique for binary
floating-point circuits, in particular, floating-point multipliers. For this
purpose, Chapter 5 offers the ACEC technique, which is able to extract from a
given netlist of a floating-point circuit the information necessary to decompose
the verification problem, without any kind of manual intervention. To perform
this task, a reverse engineering algorithm has been proposed to identify and
abstract the arithmetic components of the netlist. The abstracted information is
then used by the arithmetic sweeping algorithm to guide the decomposition of the
overall problem. By proposing the ACEC technique, the thesis attains its
challenging and long-term goal of verifying large-scale combinational arithmetic
circuits in a fully automated manner.
Furthermore, the thesis can be extended to handle further problems that require
solving nonlinear arithmetic constraints described at the bit level. The results
of the thesis make it clear that bit-level solvers such as SAT and BDDs, which
express problems in propositional logic, are not a panacea for all types of
bit-level constraints, particularly nonlinear arithmetic constraints. In
contrast, algebraic solvers that manipulate polynomial representations of such
constraints are more robust and more scalable. In future work, an integration of
the SAT solver and the algebraic solver should be considered, in which the
propositional logic part of the problem is handled by the SAT solver, while the
algebraic solver is leveraged to solve the arithmetic part. At the same time, the
two solvers have to exchange their results in order to come up with a solution
for the overall problem. Such a use of the algebraic solver motivates its future
enhancement by effective learning approaches, such as the clause learning
deployed in DPLL-based SAT solvers, where polynomials could be learned instead of
clauses. Such improvements would expand the scope of algebraic solvers beyond
digital circuits to further applications, e.g., verifying numerical programs as
well as calculating their approximation errors [4, 33, 46], and analyzing
cryptographic algorithms [1, 7, 92].
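To make the idea of learning polynomials instead of clauses concrete, here is a small worked correspondence, assuming every variable ranges over {0, 1}. A clause is violated exactly when all of its literals are 0, so it can be written as a product that must vanish; conversely, a single polynomial equation can capture a relation that needs several clauses. This is only an illustration of the encoding, not a description of any existing learning scheme.

```latex
% A clause as a vanishing product (x_i \in \{0,1\})
(x_1 \lor \lnot x_2 \lor x_3) \;\Longleftrightarrow\; (1 - x_1)\, x_2\, (1 - x_3) = 0

% One polynomial equation for an XOR relation
x_3 = x_1 \oplus x_2 \;\Longleftrightarrow\; x_1 + x_2 - 2x_1x_2 - x_3 = 0
```

The XOR relation, for instance, requires four clauses in conjunctive normal form, (¬x1 ∨ ¬x2 ∨ ¬x3), (x1 ∨ x2 ∨ ¬x3), (x1 ∨ ¬x2 ∨ x3), and (¬x1 ∨ x2 ∨ x3), whereas one polynomial equation suffices.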
A further direction for future work is to modify sequential equivalence checking
as well as model checking techniques to leverage the arithmetic information
provided by a reverse engineering algorithm, similar to the one used by the ACEC
technique, for verifying sequential circuits whose transition functions
incorporate arithmetic components, e.g., multipliers. This development would also
allow fully automated verification of circuits such as floating-point division
and floating-point square root.
BIBLIOGRAPHY
[1] J. Almeida, M. Barbosa, G. Barthe, and F. Dupressoir. “Certiﬁed computer-
aided cryptography: Eﬃcient provably secure machine code from high-
level implementations.” In: ACM SIGSAC Conference on Computer &
Communications Security. 2013, pp. 1217–1230.
[2] Arithmetic Module Generator Based on ACG. available at
http://www.aoki.ecei.tohoku.ac.jp/arith/. 2016.
[3] M. Aschenbrenner. “Ideal membership in polynomial rings over the in-
tegers.” In: Journal of the American Mathematical Society 17.2 (2004),
pp. 407–441.
[4] E. Barr, T. Vo, V. Le, and Z. Su. “Automatic detection of ﬂoating-point
exceptions.” In: ACM SIGPLAN-SIGACT Symposium on Principles of
Programming Languages. Vol. 48. 1. 2013, pp. 549–560.
[5] J. Baumgartner, H. Mony, M. Case, J. Sawada, and K. Yorav. “Scalable
conditional equivalence checking: An automated invariant-generation
based approach.” In: Int’l Conf. on Formal Methods in CAD. 2009,
pp. 120–127.
[6] A. Biere, A. Cimatti, E. Clarke, and Y. Zhu. “Symbolic model checking
without BDDs.” In: International Conference on Tools and Algorithms
for the Construction and Analysis of Systems. 1999, pp. 193–207.
[7] B. Blanchet. “Security protocol veriﬁcation: Symbolic and computational
models.” In: International Conference on Principles of Security and Trust.
2012, pp. 3–29.
[8] K. Brace, R. Rudell, and R. Bryant. “Eﬃcient implementation of a BDD
package.” In: Design Automation Conf. 1991, pp. 40–45.
[9] A. Bradley. “SAT based model checking without unrolling.” In: Interna-
tional Workshop on Veriﬁcation, Model Checking, and Abstract Interpre-
tation. 2011, pp. 70–87.
[10] A. Bradley. “Understanding IC3.” In: International Conference on The-
ory and Applications of Satisﬁability Testing. 2012, pp. 1–14.
[11] D. Brand. “Veriﬁcation of large synthesized designs.” In: International
Conference on Computer-Aided Design. 1993, pp. 534–537.
[12] R. K. Brayton. “The decomposition and factorization of Boolean ex-
pressions.” In: IEEE International Symposium on Circuits and Systems.
1982.
[13] R. Brayton and A. Mishchenko. “ABC: An academic industrial-strength
veriﬁcation tool.” In: Computer Aided Veriﬁcation. 2010, pp. 24–40.
[14] M. Brickenstein and A. Dreyer. “PolyBoRi: A framework for Gröbner-
basis computations with Boolean polynomials.” In: Journal of Symbolic
Computation 44.9 (2009), pp. 1326–1345.
[15] R. E. Bryant. “Symbolic Boolean manipulation with ordered binary-
decision diagrams.” In: ACM Computing Surveys 24.3 (1992), pp. 293–
318.
[16] R. Bryant. “Graph-based algorithms for Boolean function manipulation.”
In: IEEE Transactions on Computers C-35.8 (1986), pp. 677–691.
[17] R. Bryant and Y. Chen. “Veriﬁcation of arithmetic circuits with binary
moment diagrams.” In: Design Automation Conf. 1995, pp. 535–541.
[18] R. Bryant and Y. Chen. “Veriﬁcation of arithmetic circuits using bi-
nary moment diagrams.” In: International Journal on Software Tools
for Technology Transfer 3.2 (2001), pp. 137–155.
[19] B. Buchberger. “Bruno Buchberger’s PhD thesis 1965: An algorithm
for ﬁnding the basis elements of the residue class ring of a zero di-
mensional polynomial ideal.” In: Journal of Symbolic Computation 41.3
(2006), pp. 475–511.
[20] J. Burch, E. Clarke, K. McMillan, and D. Dill. “Sequential circuit ver-
iﬁcation using symbolic model checking.” In: Design Automation Conf.
1990, pp. 46–51.
[21] Y. T. Chang and K. T. Cheng. “Self-referential veriﬁcation of gate-level
implementations of arithmetic circuits.” In: Design Automation Conf.
2002, pp. 311–316.
[22] J. Chen and Y. Chen. “Equivalence checking of integer multipliers.” In:
ASP Design Automation Conf. 2001, pp. 169–174.
[23] M. Ciesielski, C. Yu, D. Liu, and W. Brown. “Veriﬁcation of gate-level
arithmetic circuits by function extraction.” In: Design Automation Conf.
2015, 52:1–52:6.
[24] E. Clarke, M. Fujita, and X. Zhao. “Hybrid decision diagrams.” In: In-
ternational Conference on Computer-Aided Design. 1995, pp. 159–163.
[25] J. Cortadella. “Timing-driven logic bi-decomposition.” In: IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems 22.6
(2003), pp. 675–685.
[26] D. Cox, J. Little, and D. O’Shea. Ideals, Varieties, and Algorithms.
Springer, 1997.
[27] N. Cutland. Computability: An Introduction to Recursive Function Theory.
Cambridge University Press, 1980.
[28] M. Davis, G. Logemann, and D. Loveland. “A machine program for
theorem-proving.” In: Communications of the ACM 5.7 (1962), pp. 394–
397.
[29] G. Dowek, A. Felty, H. Herbelin, G. Huet, C. Murthy, C. Parent, C.
Paulin-Mohring, and B. Werner. The Coq proof assistant: User’s guide:
Version 5.6. Tech. rep. TR 134. INRIA, 1992.
[30] R. Drechsler, B. Becker, and S. Ruppertz. “The K* BMD: a veriﬁcation
data structure.” In: IEEE Design & Test 14.2 (1997), pp. 51–59.
[31] R. Drechsler, M. Herbstritt, and B. Becker. “Grouping heuristics for
word-level decision diagrams.” In: IEEE International Symposium on
Circuits and Systems. 1999, pp. 411–414.
[32] R. Drechsler and D. Sieling. “Binary decision diagrams in theory and
practice.” In: International Journal on Software Tools for Technology
Transfer 3.2 (2001), pp. 112–136.
[33] V. D’Silva, D. Kroening, and G. Weissenbacher. “A survey of automated
techniques for formal software verification.” In: IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems 27.7 (2008),
pp. 1165–1178.
[34] B. Dutertre and L. De Moura. “A fast linear-arithmetic solver for
DPLL(T).” In: Computer Aided Verification. 2006, pp. 81–94.
[35] C. Eder and J. Perry. “F5C: a variant of Faugere’s F5 algorithm with
reduced Gröbner bases.” In: Journal of Symbolic Computation 45.12
(2010), pp. 1442–1458.
[36] N. Een, A. Mishchenko, and R. Brayton. “Eﬃcient implementation of
property directed reachability.” In: Int’l Conf. on Formal Methods in
CAD. 2011, pp. 125–134.
[37] N. Een and N. Sörensson. “An extensible SAT solver.” In: Theory and
Applications of Satisﬁability Testing. Vol. 2919. 2004, pp. 502–518.
[38] C. Van Eijk. “Sequential equivalence checking based on structural
similarities.” In: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems 19.7 (2000), pp. 814–819.
[39] F. Farahmandi and B. Alizadeh. “Gröbner basis based formal veriﬁcation
of large arithmetic circuits using Gaussian elimination and cone-based
polynomial extraction.” In: Microprocessors and Microsystems 39.2 (2015),
pp. 83–96.
[40] J. Faugere. “A new eﬃcient algorithm for computing Gröbner bases
(F4).” In: Journal of Pure and Applied Algebra 139.1 (1999), pp. 61–88.
[41] M. Francis and A. Dukkipati. “Reduced Gröbner bases and Macaulay–
Buchberger basis theorem over Noetherian rings.” In: Journal of Sym-
bolic Computation 65 (2014), pp. 1–14.
[42] M. Fujita. “Veriﬁcation of arithmetic circuits by comparing two similar
circuits.” In: Computer Aided Veriﬁcation. 1996, pp. 159–168.
[43] E. Goldberg, M. Prasad, and R. Brayton. “Using SAT for combinational
equivalence checking.” In: Design, Automation and Test in Europe. 2001,
pp. 114–121.
[44] G. Greuel, F. Seelisch, and O. Wienand. “The Gröbner basis of the
ideal of vanishing polynomials.” In: Journal of Symbolic Computation
46.5 (2011), pp. 561–570.
[45] E. Guralnik, M. Aharoni, A. J. Birnbaum, and A. Koyfman. “Simulation-
based verification of floating-point division.” In: IEEE Transactions on
Computers 60.2 (2011), pp. 176–188.
[46] L. Haller, A. Griggio, M. Brain, and D. Kroening. “Deciding ﬂoating-
point logic with systematic abstraction.” In: Int’l Conf. on Formal Meth-
ods in CAD. 2012, pp. 131–140.
[47] K. Hamaguchi, A. Morita, and S. Yajima. “Eﬃcient construction of bi-
nary moment diagrams for verifying arithmetic circuits.” In: Interna-
tional Conference on Computer-Aided Design. 1995, pp. 78–82.
[48] J. Harrison. The HOL Light Manual (1.1). Tech. rep. University of Cam-
bridge Computer Laboratory, 1998.
[49] J. Harrison. “A machine-checked theory of ﬂoating point arithmetic.” In:
International Conference on Theorem Proving in Higher Order Logics.
1999, pp. 113–130.
[50] “IEEE standard for ﬂoating-point arithmetic.” In: IEEE Std 754-2008
(2008), pp. 1–70.
[51] J. Bergeron. Writing Testbenches using SystemVerilog. Springer Science
& Business Media, 2007.
[52] C. Jacobi. “Formal veriﬁcation of a fully IEEE compliant ﬂoating point
unit.” PhD thesis. Universität des Saarlandes, 2002.
[53] C. Jacobi and C. Berg. “Formal verification of the VAMP floating
point unit.” In: Formal Methods in System Design 26.3 (2005), pp. 227–
266.
[54] C. Jacobi, K. Weber, V. Paruthi, and J. Baumgartner. “Automatic for-
mal veriﬁcation of fused-multiply-add FPUs.” In: Design, Automation
and Test in Europe. 2005, pp. 1298–1303.
[55] R. Kaivola and M. Aagaard. “Divider circuit veriﬁcation with model
checking and theorem proving.” In: International Conference on Theo-
rem Proving in Higher Order Logics. 2000, pp. 338–355.
[56] R. Kaivola and N. Narasimhan. “Formal veriﬁcation of the Pentium®
4 ﬂoating-point multiplier.” In: Design, Automation and Test in Europe.
2002, pp. 20–27.
[57] A. Kandri-Rody and D. Kapur. “Computing a Gröbner basis of a poly-
nomial ideal over a Euclidean domain.” In: Journal of Symbolic Compu-
tation 6.1 (1988), pp. 37–57.
[58] D. Kapur and Y. Cai. “An algorithm for computing a Gröbner basis of
a polynomial ideal over a ring with zero divisors.” In: Mathematics in
Computer Science 2.4 (2009), pp. 601–634.
[59] M. Kaufmann, J. S. Moore, and R. S. Boyer. A computational logic for
applicative Common Lisp (ACL2). available at
http://www.cs.utexas.edu/users/moore/acl2/. 2016.
[60] M. Keim, R. Drechsler, B. Becker, M. Martin, and P. Molitor. “Polyno-
mial formal veriﬁcation of multipliers.” In: Formal Methods in System
Design 22.1 (2003), pp. 39–58.
[61] A. KiranKumar, A. Gupta, and R. Ghughal. “Symbolic trajectory
evaluation: The primary validation vehicle for next generation Intel®
processor graphics FPU.” In: Int’l Conf. on Formal Methods in CAD. 2012,
pp. 149–156.
[62] I. Koren. Computer Arithmetic Algorithms. Universities Press.
[63] U. Krautz, V. Paruthi, A. Arunagiri, S. Kumar, S. Pujar, and T. Babin-
sky. “Automatic veriﬁcation of ﬂoating point units.” In: Design Automa-
tion Conf. 2014, pp. 1–6.
[64] D. Kroening and O. Strichman. Decision Procedures: An Algorithmic
Point of View. Springer Science & Business Media, 2008.
[65] A. Kühlmann and F. Krohm. “Equivalence checking using cuts and
heaps.” In: Design Automation Conf. 1997, pp. 263–268.
[66] A. Kühlmann, V. Paruthi, F. Krohm, and M. K. Ganai. “Robust Boolean
reasoning for equivalence checking and functional property verification.”
In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems 21.12 (2002), pp. 1377–1394.
[67] D. Lundgren. Double Precision Floating Point Core Verilog. available at
http://opencores.org/project,double_fpu. 2016.
[68] J. Lv, P. Kalla, and F. Enescu. “Eﬃcient Gröbner basis reductions for
formal verification of Galois field multipliers.” In: Design, Automation
and Test in Europe. 2012, pp. 899–904.
[69] K. McMillan. “Interpolation and SAT-based model checking.” In: Com-
puter Aided Veriﬁcation. 2003, pp. 1–13.
[70] P. Miner. “Defining the IEEE-854 floating-point standard in PVS.”
Tech. rep. TM-110167. NASA Langley Research Center, 1995.
[71] A. Mishchenko, S. Chatterjee, and R. Brayton. “DAG-aware AIG rewriting:
a fresh look at combinational logic synthesis.” In: Design Automation
Conf. 2006, pp. 532–535.
[72] A. Mishchenko, S. Chatterjee, R. Brayton, and N. Een. “Improvements
to combinational equivalence checking.” In: International Conference on
Computer-Aided Design. 2006, pp. 836–843.
[73] L. De Moura and N. Bjørner. “Z3: An eﬃcient SMT solver.” In: Inter-
national conference on Tools and Algorithms for the Construction and
Analysis of Systems. 2008, pp. 337–340.
[74] J. O’Leary, R. Kaivola, and T. Melham. “Relational STE and theorem
proving for formal veriﬁcation of industrial circuit designs.” In: Int’l Conf.
on Formal Methods in CAD. 2013, pp. 97–104.
[75] S. Owre, J. Rushby, and N. Shankar. “PVS: A prototype veriﬁcation
system.” In: International Conference on Automated Deduction. 1992,
pp. 748–752.
[76] J. O’Leary, X. Zhao, R. Gerth, and C. Seger. “Formally verifying IEEE
compliance of ﬂoating-point hardware.” In: Intel Technology Journal 3.1
(1999), pp. 1–14.
[77] E. Pavlenko, M. Wedler, D. Stoﬀel, W. Kunz, A. Dreyer, F. Seelisch,
and G. Greuel. “STABLE: A new QF-BV SMT solver for hard veriﬁca-
tion problems combining Boolean reasoning with computer algebra.” In:
Design, Automation and Test in Europe. 2011, pp. 1–6.
[78] T. Pruss, P. Kalla, and F. Enescu. “Efficient symbolic computation for
word-level abstraction from combinational circuits for verification over
finite fields.” In: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems 35.7 (2016), pp. 1206–1218.
[79] E. Reeber and J. Sawada. “Combining ACL2 and an automated veri-
ﬁcation tool to verify a multiplier.” In: International Workshop on the
ACL2 Theorem Prover and its Applications. 2006, pp. 63–70.
[80] K. Rozier. “Linear temporal logic symbolic model checking.” In: Com-
puter Science Review 5.2 (2011), pp. 163–203.
[81] R. Rudell. “Dynamic variable ordering for ordered binary decision dia-
grams.” In: International Conference on Computer-Aided Design. 1993,
pp. 42–47.
[82] D. Russinoff, M. Kaufmann, E. Smith, and R. Sumners. “Formal
verification of floating-point RTL at AMD using the ACL2 theorem prover.”
2005.
[83] Y. Sato, S. Inoue, A. Suzuki, K. Nabeshima, and K. Sakai. “Boolean
Gröbner bases.” In: Journal of Symbolic Computation 46.5 (2011), pp. 622–
632.
[84] A. Sayed-Ahmed, H. Fahmy, and U. Kühne. “Veriﬁcation of the deci-
mal ﬂoating-point square root operation.” In: European Test Symposium.
2014, pp. 1–2.
[85] A. Sayed-Ahmed, U. Kühne, D. Große, and R. Drechsler. “Recurrence
relations revisited: Scalable veriﬁcation of bit level multiplier circuits.”
In: IEEE Annual Symposium on VLSI. 2015, pp. 1–6.
[86] A. Sayed-Ahmed, D. Große, M. Soeken, and R. Drechsler. “Equivalence
checking using Gröbner bases.” In: Int’l Conf. on Formal Methods in
CAD. 2016, pp. 169–176.
[87] A. Sayed-Ahmed, D. Große, U. Kühne, M. Soeken, and R. Drechsler.
“Formal veriﬁcation of integer multipliers by combining Gröbner basis
with logic reduction.” In: Design, Automation and Test in Europe. 2016,
pp. 1048–1053.
[88] P. Seidel. “Formal veriﬁcation of an iterative low-power x86 ﬂoating-
point multiplier with redundant feedback.” In: International Workshop
on the ACL2 Theorem Prover and its Applications. 2011, pp. 70–83.
[89] H. Sharangpani and M. Barton. Statistical analysis of ﬂoating point ﬂaw
in the Pentium processor. Tech. rep. Intel Corporation, 1994.
[90] A. Slobodová. “Challenges for formal veriﬁcation in industrial setting.”
In: International Workshop on Formal Methods for Industrial Critical
Systems. 2006, pp. 1–22.
[91] A. Slobodová, J. Davis, S. Swords, and W. Hunt. “A ﬂexible formal ver-
iﬁcation framework for industrial scale validation.” In: Formal Methods
and Models for Codesign (MEMOCODE). 2011, pp. 89–97.
[92] E. Smith and D. Dill. “Automatic formal veriﬁcation of block cipher im-
plementations.” In: Int’l Conf. on Formal Methods in CAD. 2008, pp. 1–
7.
[93] J. Smith and G. De Micheli. “Polynomial circuit models for component
matching in high-level synthesis.” In: IEEE Transactions on Very Large
Scale Integration (VLSI) Systems 9.6 (2001), pp. 783–800.
[94] M. Soeken, B. Sterin, R. Drechsler, and R. Brayton. “Simulation graphs
for reverse engineering.” In: Int’l Conf. on Formal Methods in CAD. 2015,
pp. 152–159.
[95] S. Stifter. “A generalization of reduction rings.” In: Journal of Symbolic
Computation 4.3 (1987), pp. 351–364.
[96] D. Stoffel and W. Kunz. “Equivalence checking of arithmetic circuits
on the arithmetic bit level.” In: IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems 23.5 (2004), pp. 586–597.
[97] D. Stoﬀel, E. Karibaev, I. Kufareva, and W. Kunz. Advanced Formal
Veriﬁcation. Ed. by R. Drechsler. Kluwer Academic Publishers, 2004.
[98] G. Tseitin. “On the complexity of proofs in propositional logics.” In:
Seminars in Mathematics. Vol. 8. 1970, pp. 466–483.
[99] Y. Watanabe, N. Homma, T. Aoki, and T. Higuchi. “Application of
symbolic computer algebra to arithmetic circuit veriﬁcation.” In: Int’l
Conf. on Comp. Design. 2007, pp. 25–32.
[100] O. Wienand, M. Wedler, D. Stoﬀel, W. Kunz, and G. M. Greuel. “An al-
gebraic approach for proving data correctness in arithmetic data paths.”
In: Computer Aided Veriﬁcation. 2008, pp. 473–486.
[101] C. Wolf. Yosys Open Synthesis Suite. available at
http://www.clifford.at/yosys/. 2016.
[102] B. Xue, P. Chatterjee, and S. K. Shukla. “Simpliﬁcation of C-RTL equiv-
alent checking for fused multiply add unit using intermediate models.”
In: ASP Design Automation Conf. 2013, pp. 723–728.
[103] C. Yu and M. Ciesielski. “Automatic word-level abstraction of datapath.”
In: IEEE International Symposium on Circuits and Systems. 2016.
[104] J. Yuan, C. Pixley, and A. Aziz. Constraint-Based Veriﬁcation. Springer
Science & Business Media, 2006.
[105] L. Zhang and S. Malik. “Validating SAT solvers using an independent
resolution-based checker: Practical implementations and other applica-
tions.” In: Design, Automation and Test in Europe. 2003, pp. 10880–
10885.
