Formal Verification of Integer Multipliers by Combining Gröbner Basis with Logic Reduction by Sayed-Ahmed, Amr et al.
Formal Verification of Integer Multipliers by
Combining Gröbner Basis with Logic Reduction
Amr Sayed-Ahmed¹, Daniel Große¹,², Ulrich Kühne¹, Mathias Soeken¹,³, Rolf Drechsler¹,²
¹ Faculty of Mathematics and Computer Science, University of Bremen, Germany
² Cyber-Physical Systems, DFKI GmbH, Bremen, Germany
³ Integrated Systems Laboratory (LSI), EPFL, Switzerland
{asahmed,grosse,ulrichk,msoeken,drechsle}@informatik.uni-bremen.de
Abstract: Formal verification utilizing symbolic computer algebra has demonstrated the ability to formally verify large Galois field arithmetic circuits and basic architectures of integer arithmetic circuits. The technique models the circuit as Gröbner basis polynomials and reduces the polynomial equation of the circuit specification wrt. the polynomial model. However, during the Gröbner basis reduction, the technique suffers from an exponential blow-up in the size of the polynomials if it is applied to parallel adders and recoded multipliers. In this paper, we address the reasons for this blow-up and present an approach that allows the technique to be applied to basic and complex parallel architectures of multipliers. The approach is based on applying a logic reduction rule during Gröbner basis rewriting. The rule uses structural circuit information to remove terms that evaluate to zero before they blow up. The experiments show that the approach is applicable to multipliers of up to 128 bits.
I. INTRODUCTION
Verifying arithmetic circuits is hard. Since the famous FDIV
bug in Intel’s Pentium processor [1], a lot of effort has been
spent developing automated and formal techniques which can
prove the correctness of a design beyond mere testing. Among
the basic operations, especially multiplication has turned out
to be a tough nut to crack. Decision diagrams—such as
BDDs or *BMDs—suffer from exponential blow-up. Boolean
Satisﬁability (SAT) techniques and also Satisﬁability Modulo
Theories (SMT) solvers fail to verify multiplier circuits of larger
scale. Theorem provers can be used but require an enormous
amount of manual effort and expert knowledge.
The most successful techniques to date are based on reverse engineering an arithmetic bit-level (ABL) representation of the circuit [2] and, more recently, on computer algebra techniques applied to polynomial representations [3]–[8]. The latter techniques reduce the verification problem to membership testing of the specification polynomial in the ideal spanned by the circuit polynomials. The foundation for these techniques is Gröbner basis reduction. While these algebraic techniques have been applied successfully to large Galois field arithmetic circuits [5] and ABL networks [3], [4], the verification of integer arithmetic on the gate netlist is limited by the exponential blow-up of the polynomial representation during Gröbner basis reduction.
The work in [7] improves the scalability by rewriting the polynomial model of the circuit, making the verification of a limited class of integer multipliers feasible. The technique allows the early cancellation of shared subterms in the polynomial representation, which effectively prevents the blow-up during Gröbner basis reduction. However, the technique does not scale for multipliers using parallel architectures such as parallel prefix adders (PPAs) or Booth recoding. The main reason, as identified by us, is the accumulation of vanishing monomials, i.e., monomials that always evaluate to zero. The problem is that these vanishing monomials cannot be identified locally using the approach of [7].
This work was supported in part by the German Research Foundation (DFG) within the Reinhart Koselleck project DR 287/23-1, by the University of Bremen's graduate school SyDe, funded by the German Excellence Initiative, and by the German Federal Ministry of Education and Research (BMBF) within the project EffektiV under contract no. 01IS13022E.
In this work, we present an algebraic technique that enables the verification of a large class of multiplier circuits, including both basic and parallel multiplier architectures. Based on the observation of accumulating vanishing monomials, we propose a novel rewriting scheme. In particular, the technique makes use of structural knowledge of the circuit netlist in order to identify vanishing monomials early in the Gröbner basis reduction process. Using our technique, we can verify complex multiplier circuits of up to 128 bits in practical time.
The contributions of this work are:
1) Determining the reason for the inefficiency of applying the computer algebra technique with state-of-the-art algorithms to integer multipliers consisting of Booth partial products and PPAs.
2) Observing that rewriting as an explicit step in the membership testing algorithm is capable of circumventing blow-ups in the Gröbner basis reduction.
3) Proposing rewriting schemes based on logic reduction to remove vanishing monomials that appear in the reduction when verifying complex parallel integer multipliers.
II. PRELIMINARIES
Verification using computer algebra is based on modeling the circuit under verification as Gröbner basis polynomials $G = \{g_1, \dots, g_s\}$ and testing the membership of the specification polynomial $p_{spec}$ in the ideal generated by $G$. The membership testing is done by reducing (dividing) $p_{spec}$ wrt. $G$. In the following, we introduce common notation and definitions and the ideal membership problem based on [9]. We then explain the membership testing algorithm as implemented by [6], [7] to verify large basic integer arithmetic circuits.
A. Notation and Deﬁnitions
For a polynomial ring $K[x_1, \dots, x_n]$ in $n$ variables, a monomial $M = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$ is a power product of the variables $x_1, \dots, x_n$, where $\alpha_i \ge 0$. A polynomial $p = c_1 M_1 + \dots + c_t M_t$ is a finite sum of terms, where each term is the product of a coefficient $c_i$ and a monomial $M_i$. The monomials of a polynomial are ordered according to a monomial ordering '>' such that $M_1 > \dots > M_t$. The leading term of the polynomial is $lt(p) = c_1 M_1$, the leading monomial is $lm(p) = M_1$, and the leading coefficient is $lc(p) = c_1$. The set of variables appearing in polynomial $p$ is denoted by $Vars(p)$.
978-3-9815370-6-2/DATE16/ ©2016 EDAA
For a set of polynomials $P = \{p_1, \dots, p_s\} \subseteq K[x_1, \dots, x_n]$, the affine variety $V(p_1, \dots, p_s)$ is the set of all solutions of the polynomial equations $p_1(x_1, \dots, x_n) = \dots = p_s(x_1, \dots, x_n) = 0$. The ideal $I = \langle P \rangle = \{\sum_{i=1}^{s} h_i \cdot p_i : h_i \in K[x_1, \dots, x_n]\}$ is generated by this set of polynomials $P$, and we call $P$ a basis (set of generators) of the ideal $I$. The ideal $I$ may have many other bases. These bases are different representations of the ideal generated by $P$. One of these bases is called a Gröbner basis $G = \{g_1, \dots, g_{\hat{s}}\}$, for which $V(G) = V(I)$.
A polynomial reduction method named S-polynomial is designed to cancel the leading terms of two polynomials and is used by algorithms to reduce or divide Gröbner basis polynomials.
Definition 1: The S-polynomial of polynomials $p$ and $g$ in a polynomial set $P$ is the combination $Spoly(p, g) = \frac{L}{lt(p)} \cdot p - \frac{L}{lt(g)} \cdot g$, where $L$ is the least common multiple $LCM(lm(p), lm(g))$. Note that $Spoly(p, g)$ cancels the leading terms of $p$ and $g$; the remainder $r$ obtained in $Spoly(p, g) \xrightarrow{P}_{+} r$ gives a new leading term.
To compute the Gröbner basis $G = \{g_1, \dots, g_{\hat{s}}\}$ for an ideal $I = \langle p_1, \dots, p_s \rangle$, Buchberger's algorithm constructs $G$ in a finite number of steps by applying $Spoly(p, g) \xrightarrow{G}_{+} r$ in every step. A Gröbner basis has been computed if $Spoly(p, g) \xrightarrow{G}_{+} 0$ for all pairs $p, g \in G$.
Lemma 1: Given a finite set $G \subset K[x_1, \dots, x_n]$, suppose that we have $p, g \in G$ such that $LCM(lm(p), lm(g)) = lm(p) \cdot lm(g)$. In other words, the leading monomials of $p$ and $g$ are relatively prime. Then $Spoly(p, g) \xrightarrow{G}_{+} 0$ [9].
According to Lemma 1, a given polynomial set is a Gröbner basis if the leading monomials of all polynomials in the set are relatively prime. By combining this lemma with the affine variety concept of an ideal, we define the Gröbner basis of an ideal as follows:
Definition 2: A finite subset $G = \{g_1, \dots, g_s\}$ of an ideal $I$ is said to be a Gröbner basis of $I$ wrt. a monomial order if $V(G) = V(I)$ and all leading monomials in $G$ are relatively prime.
A given ideal may have different Gröbner bases, where one basis can be reduced wrt. a monomial ordering to other bases. These bases can be reduced again to a canonical representation of the ideal that is called the reduced Gröbner basis.
The ability of the Gröbner basis to reveal the properties of the ideal makes it possible to solve the ideal membership problem in an algorithmic fashion. The ideal membership algorithm decides whether a given polynomial $p$ lies in the ideal $I = \langle p_1, \dots, p_s \rangle$ by combining the Gröbner basis with a division algorithm. First, it finds a Gröbner basis $G$ for $I$. Then it applies a division algorithm to check that the remainder $r$ on dividing $p$ by $G$ is equal to zero. The division is denoted $p \xrightarrow{G}_{+} r$.
B. Membership Testing Algorithm
The computer algebra technique verifies a circuit based on the ideal membership algorithm using an algorithm called membership testing (MT) that consists of the following four steps:
1) Model the circuit as Gröbner basis polynomials $G = \{g_1, \dots, g_s\}$ and the specification as a polynomial $p_{spec}$.
2) Rewrite the Gröbner basis $G$ to a new Gröbner basis $G_n$ that has fewer variables.
3) Reduce (divide) $p_{spec}$ wrt. $G_n$, denoted $p_{spec} \xrightarrow{G_n}_{+} r$, where $r$ is the remainder of dividing $p_{spec}$ by $G_n$. Repeat this step until no term in $r$ is divisible by the leading term of any polynomial in $G_n$.
4) If $r = 0$, the circuit satisfies the specification; otherwise, a mismatch between the algebraic model $G_n$ and the specification equation is announced.
The second step (rewriting) is not required for the soundness of the algorithm. However, as we show later in the paper, it is crucial for the application to large integer circuits. In the following, each step of the MT algorithm is explained in detail. We explain Step 3 before Step 2 since it is easier to follow.
Step 1: Modeling a circuit as Gröbner Basis: The MT algorithm uses an algebraic model of the circuit. Logic gates are modeled by polynomials and signals as Boolean variables. The polynomials of basic Boolean gates are
$z = \neg a \Rightarrow g := -z + 1 - a$
$z = a \wedge b \Rightarrow g := -z + ab$
$z = a \vee b \Rightarrow g := -z + a + b - ab$
$z = a \oplus b \Rightarrow g := -z + a + b - 2ab$.
Each logic gate is modeled in a way that the gate output variable $z$ is described in terms of the gate input variables $a, b$. The polynomial $x^2 - x$ should be added to the model for each variable to enforce the Boolean domain. In practice, the ideal polynomials $\langle x^2 - x \rangle$ are replaced by reducing $x^k$ to $x$ every time its degree becomes greater than one during any computational step. For example, the monomial $x_1^2 x_2^3 x_3$ is equal to $x_1 x_2 x_3$ in the Boolean domain.
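As a quick sanity check of the gate polynomials above, the following minimal Python sketch evaluates each tail polynomial over all Boolean inputs and compares it with the gate's truth table. It also shows the exponent reduction $x^k \to x$ on a monomial stored as a variable-to-exponent map. The function names (`tail_not`, `gates_match`, `reduce_exponents`) and the monomial representation are illustrative assumptions, not part of the original work.

```python
from itertools import product


# Tail polynomials of the gate models: the gate output z equals the tail.
def tail_not(a):
    return 1 - a


def tail_and(a, b):
    return a * b


def tail_or(a, b):
    return a + b - a * b


def tail_xor(a, b):
    return a + b - 2 * a * b


def gates_match():
    """Check every tail polynomial against the Boolean gate on all 0/1 inputs."""
    ok = all(tail_not(a) == int(not a) for a in (0, 1))
    for a, b in product((0, 1), repeat=2):
        ok = ok and tail_and(a, b) == (a & b)
        ok = ok and tail_or(a, b) == (a | b)
        ok = ok and tail_xor(a, b) == (a ^ b)
    return ok


def reduce_exponents(monomial):
    """Boolean-domain reduction: x**k becomes x for every k >= 1.

    The monomial is given as a dict {variable name: exponent}.
    """
    return {var: 1 for var, exp in monomial.items() if exp >= 1}


if __name__ == "__main__":
    print(gates_match())
    print(reduce_exponents({"x1": 2, "x2": 3, "x3": 1}))
```

Because the check is exhaustive over the Boolean domain, it confirms the polynomials only for 0/1 values, which is exactly the domain enforced by $x^2 - x$.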
By ordering each variable of the model according to
its reverse topological level in the circuit, the generated
polynomials satisfy Def. 2 by construction. Every polynomial
will be of the form pi := xi + tail(pi), where xi is the gate’s
output variable and tail(pi) are terms consisting of the gate’s
input variables, describing the function implemented by the gate.
According to this polynomial form, all the leading monomials
of the model will be relatively prime.
Example 1: Consider the full adder circuit implementing the function $s_i + 2c_i = a_i + b_i + c_{i-1}$ shown in Fig. 1. Its algebraic model is
$g_1 := -c_i - x_4x_3 + x_4 + x_3$
$g_2 := -s_i - 2x_1c_{i-1} + x_1 + c_{i-1}$
$g_3 := -x_4 + x_2c_{i-1}$
$g_4 := -x_3 + a_ib_i$
$g_5 := -x_2 - a_ib_i + a_i + b_i$
$g_6 := -x_1 - 2a_ib_i + a_i + b_i$
The specification polynomial is $p_{spec} := -2c_i - s_i + c_{i-1} + b_i + a_i$. Ordering the polynomial variables in the reverse topological order of the circuit yields $c_i > s_i > x_4 > x_3 > x_2 > x_1 > c_{i-1} > b_i > a_i$. Following this order, the leading monomials of all polynomials will be relatively prime. E.g., the leading monomial of $g_1$ is $c_i$, and it is prime relative to all other leading monomials. According to Def. 2, the extracted algebraic model is therefore a Gröbner basis.
Modeling the circuit directly as Gröbner basis polynomials avoids Buchberger's algorithm and makes it computationally feasible to verify large arithmetic circuits using the MT algorithm. In [5], it is applied on Galois field arithmetic. In case of integer arithmetic, the Gröbner basis reduction suffers from an exponential blow-up in the number of intermediate monomials during the division (reduction) process, because of nonlinear polynomial terms that model the carry chains. These nonlinear terms do not occur in case of Galois field arithmetic.
2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)
[Figure: gate-level full adder with inputs ai, bi, ci−1, gates g1 to g6, internal signals x1 to x4, and outputs si, ci]
Fig. 1. A simple full adder.
Algorithm 1 Gröbner Basis Reduction (GB reduction)
Input: Specification polynomial pspec, circuit polynomials G = {g1, g2, . . . , gs}
Output: Remainder of Gröbner basis reduction r
V ← OrderedPolynomialVariables(pspec, G) /* Substitution ordering */
r ← pspec
for i in 0 to |V| − 1 do
    if V[i] ∉ PrimaryInputs then
        Choose gt ∈ G such that lm(gt) = V[i]
        r ← Spoly(r, gt)
    end if
end for
Step 3: Gröbner Basis Reduction: The reduction in Step 3 of the MT algorithm (referred to as GB reduction in the remainder) is performed according to Algorithm 1. Given a specification polynomial $p_{spec}$ and a circuit model in the form of a Gröbner basis $G$, $p_{spec}$ is divided in every iteration by some polynomial $g \in G$ using the S-polynomial. The division can be seen as substituting the variables in $p_{spec}$ with the corresponding tail terms of the respective polynomials in $G$. For example, given $p_{spec} := x_4x_3 + x_1$ and a polynomial $g := -x_4 + x_2x_1$, then $Spoly(p_{spec}, g) = \frac{x_4x_3}{x_4x_3} p_{spec} - \frac{x_4x_3}{-x_4} g = x_3x_2x_1 + x_1$, where the S-polynomial substitutes $x_4$ in $p_{spec}$ with $x_2x_1$. The division (substitution) iterations are executed according to a certain order, the substitution order. This order is crucial for canceling the carry terms of integer arithmetic before the intermediate monomials blow up. In [6], [7], the substitution order follows the reverse topological order of the circuit variables and additionally takes the fanouts of the gates into account: variables that have the same level and depend on common inputs (fanouts) must follow each other in the substitution.
Following Example 1, the extracted algebraic model is a Gröbner basis, therefore the GB reduction can be applied. As the full adder circuit has no gates with multiple fanouts, the substitution order will follow only the reverse topological order of the circuit:
$p_{spec}$
$\xrightarrow{g_1} -s_i + 2x_4x_3 - 2x_4 - 2x_3 + c_{i-1} + b_i + a_i$
$\xrightarrow{g_2} 2x_4x_3 - 2x_4 - 2x_3 + 2x_1c_{i-1} - x_1 + b_i + a_i$
$\xrightarrow{g_3} 2x_3x_2c_{i-1} - 2x_3 - 2x_2c_{i-1} + 2x_1c_{i-1} - x_1 + b_i + a_i$
$\xrightarrow{g_4} 2x_2c_{i-1}b_ia_i - 2x_2c_{i-1} + 2x_1c_{i-1} - x_1 - 2b_ia_i + b_i + a_i$
$\xrightarrow{g_5} 2x_1c_{i-1} - x_1 + 4c_{i-1}b_ia_i - 2c_{i-1}a_i - 2c_{i-1}b_i - 2a_ib_i + b_i + a_i$
$\xrightarrow{g_6} 0$
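The reduction chain of the full adder can be reproduced with a few lines of Python. The sketch below is a minimal illustration, not the authors' C++ implementation: it treats each division step as a substitution of the leading variable by the tail of its gate polynomial (the view taken in the text above), and it stores polynomials as dictionaries from sets of variables to integer coefficients so that $x \cdot x = x$ holds automatically. The helper names `add`, `mul`, `substitute`, `poly`, and `gb_reduce` are assumptions made for this example, and `c` stands for $c_{i-1}$.

```python
def add(p, q):
    """Return p + q; a polynomial is a dict {frozenset of variables: coefficient}."""
    r = dict(p)
    for mono, coef in q.items():
        r[mono] = r.get(mono, 0) + coef
        if r[mono] == 0:
            del r[mono]
    return r


def mul(p, q):
    """Return p * q in the Boolean domain (monomials are sets, so x * x = x)."""
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r = add(r, {m1 | m2: c1 * c2})
    return r


def substitute(p, var, tail):
    """Replace every occurrence of var in p by the polynomial tail."""
    r = {}
    for mono, coef in p.items():
        if var in mono:
            r = add(r, mul({mono - {var}: coef}, tail))
        else:
            r = add(r, {mono: coef})
    return r


def poly(*terms):
    """Build a polynomial from (coefficient, "v1 v2 ...") pairs."""
    r = {}
    for coef, names in terms:
        r = add(r, {frozenset(names.split()): coef})
    return r


# Example 1 written as "leading variable = tail", in substitution order g1..g6.
MODEL = [
    ("ci", poly((-1, "x4 x3"), (1, "x4"), (1, "x3"))),   # g1
    ("si", poly((-2, "x1 c"), (1, "x1"), (1, "c"))),     # g2
    ("x4", poly((1, "x2 c"))),                           # g3
    ("x3", poly((1, "a b"))),                            # g4
    ("x2", poly((-1, "a b"), (1, "a"), (1, "b"))),       # g5
    ("x1", poly((-2, "a b"), (1, "a"), (1, "b"))),       # g6
]
SPEC = poly((-2, "ci"), (-1, "si"), (1, "c"), (1, "b"), (1, "a"))


def gb_reduce(spec, model):
    """Substitute the model variables in order; also record the remainder sizes."""
    r, sizes = spec, []
    for var, tail in model:
        r = substitute(r, var, tail)
        sizes.append(len(r))
    return r, sizes


if __name__ == "__main__":
    remainder, sizes = gb_reduce(SPEC, MODEL)
    print(remainder, sizes)
```

An empty dictionary as the final remainder corresponds to $r = 0$. The recorded sizes make it easy to watch how the number of intermediate monomials evolves, which is the quantity that blows up for larger integer circuits.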
Step 2: Rewriting the Gröbner Basis: The bottleneck of the MT algorithm is the GB reduction, which may cause a blow-up in the number of monomials in the remainder. Rewriting the Gröbner basis can avoid such a blow-up, but the rewriting typically depends on the types of circuits under verification. In [7], a rewriting scheme called fanout rewriting has been proposed based on the fanouts of the circuit gates, such that the model terms depend only on shared variables. This dependency increases the chance of canceling common terms during the GB reduction. The rewriting is performed in two steps: 1) it finds the gates that have multiple fanouts and stores the corresponding output variables in a list; 2) it substitutes all variables that are not in this list, such that the model depends only on fanouts, primary inputs, and primary outputs. This fanout rewriting and the substitution ordering of the GB reduction make it possible to cancel carry terms before they blow up, as shown in the following example.
Example 2: Consider a 3-bit ripple carry adder implementing the function $\sum_{i=0}^{3} 2^i s_i = \sum_{i=0}^{2} 2^i (a_i + b_i)$. Since only the carry signals have multiple fanouts, after fanout rewriting, all polynomials depend only on carry variables $c_i$, inputs, and outputs. The model is
$s_3 = c_2 \Rightarrow g_1 := -s_3 + c_2$
$c_2 = (a_2 \wedge b_2) \vee (a_2 \wedge c_1) \vee (b_2 \wedge c_1) \Rightarrow g_2 := -c_2 - 2c_1b_2a_2 + c_1b_2 + c_1a_2 + b_2a_2$
$s_2 = a_2 \oplus b_2 \oplus c_1 \Rightarrow g_3 := -s_2 + 4c_1b_2a_2 - 2c_1b_2 - 2c_1a_2 - 2b_2a_2 + c_1 + b_2 + a_2$
$c_1 = (a_1 \wedge b_1) \vee (a_1 \wedge c_0) \vee (b_1 \wedge c_0) \Rightarrow g_4 := -c_1 - 2c_0b_1a_1 + c_0b_1 + c_0a_1 + b_1a_1$
$s_1 = a_1 \oplus b_1 \oplus c_0 \Rightarrow g_5 := -s_1 + 4c_0b_1a_1 - 2c_0b_1 - 2c_0a_1 - 2b_1a_1 + c_0 + b_1 + a_1$
$c_0 = a_0 \wedge b_0 \Rightarrow g_6 := -c_0 + b_0a_0$
$s_0 = a_0 \oplus b_0 \Rightarrow g_7 := -s_0 - 2b_0a_0 + b_0 + a_0$
The specification polynomial is $p_{spec} := -8s_3 - 4s_2 - 2s_1 - s_0 + 4b_2 + 4a_2 + 2b_1 + 2a_1 + b_0 + a_0$. Rewriting the model wrt. circuit fanouts yields polynomials $g_2$ and $g_3$ that have common monomials ($c_1b_2a_2$, $c_1b_2$, $c_1a_2$, and $b_2a_2$; highlighted in the original typeset example). These carry monomials cancel each other during the reduction of $p_{spec}$ wrt. the rewritten model following the substitution order $s_3 > c_2 > s_2 > c_1 > s_1 > c_0 > s_0 > b_2 > b_1 > b_0 > a_2 > a_1 > a_0$. Similar cancellation occurs for the common monomials of polynomials $g_4, g_5$ and of polynomials $g_6, g_7$, respectively. Without rewriting, these carry terms will only be eliminated after reducing them to the input variables, which leads to an exponential increase in the number and size of monomials. Therefore, rewriting is required to enable verification for large integer circuits.
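The cancellation in Example 2 can be made explicit for the most significant bit position. The following worked steps are our own expansion of the example, using only $g_1$, $g_2$, $g_3$ and the terms of $p_{spec}$ that involve bit 2:

```latex
\begin{align*}
-8s_3 &\xrightarrow{g_1} -8c_2
       \xrightarrow{g_2} 16c_1b_2a_2 - 8c_1b_2 - 8c_1a_2 - 8b_2a_2,\\
-4s_2 &\xrightarrow{g_3} -16c_1b_2a_2 + 8c_1b_2 + 8c_1a_2 + 8b_2a_2
       - 4c_1 - 4b_2 - 4a_2,\\
-8s_3 - 4s_2 + 4b_2 + 4a_2 &\longrightarrow -4c_1 .
\end{align*}
```

All monomials shared by $g_2$ and $g_3$ cancel as soon as $s_2$ is substituted, and a single linear carry term $-4c_1$ remains. The same pattern repeats one position lower: $-4c_1 - 2s_1 + 2b_1 + 2a_1$ reduces to $-2c_0$ via $g_4$ and $g_5$, and $-2c_0 - s_0 + b_0 + a_0$ reduces to $0$ via $g_6$ and $g_7$.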
The simple rewriting described above is effective for the
veriﬁcation of basic multiplier architectures, i.e., multipliers
with simple partial products generators and ripple carry adders
in the last addition stage, but it fails to verify more complex
architectures. In the next section, we will provide an explanation
of this limitation, which forms the basis of our improved
veriﬁcation technique.
III. PROBLEM STATEMENT
If the membership testing algorithm with the existing rewriting schemes of Section II is used, the verification of integer multipliers that consist of parallel adders or Booth recoding is infeasible. The main reason is the presence of vanishing monomials (monomials that always evaluate to zero), which appear in every algebraic model of these complex multipliers. Unfortunately, the GB reduction cannot cancel these vanishing monomials before substituting them with input variables. Some of the vanishing monomials have the property that representing them by input variables increases the number of intermediate monomials exponentially, which makes the computation infeasible. In this and the following section, we illustrate the vanishing monomials limitation with two examples, a parallel adder and a Booth partial product cell, and show how to overcome this problem by a new rewriting scheme enhanced by logic reduction.
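Example 3: Consider a circuit model of a 3-bit PPA:¹
$s_3 = c_2 \Rightarrow g_1 := -s_3 + c_2$
$c_2 = D_2 \vee (X_2 \wedge D_1) \vee (X_2 \wedge X_1 \wedge D_0) \Rightarrow g_2 := -c_2 + X_2D_2X_1D_1D_0 - X_2X_1D_1D_0 - X_2D_2X_1D_0 - X_2D_2D_1 + X_2X_1D_0 + X_2D_1 + D_2$
$s_2 = X_2 \oplus c_1 \Rightarrow g_3 := -s_2 - 2c_1X_2 + c_1 + X_2$
$c_1 = D_1 \vee (X_1 \wedge D_0) \Rightarrow g_4 := -c_1 - X_1D_1D_0 + X_1D_0 + D_1$
$s_1 = X_1 \oplus c_0 \Rightarrow g_5 := -s_1 - 2c_0X_1 + c_0 + X_1$
$c_0 = D_0 \Rightarrow g_6 := -c_0 + D_0$
$s_0 = X_0 \Rightarrow g_7 := -s_0 + X_0$
$X_2 = a_2 \oplus b_2 \Rightarrow g_8 := -X_2 - 2b_2a_2 + b_2 + a_2$
$D_2 = a_2 \wedge b_2 \Rightarrow g_9 := -D_2 + b_2a_2$
$X_1 = a_1 \oplus b_1 \Rightarrow g_{10} := -X_1 - 2b_1a_1 + b_1 + a_1$
$D_1 = a_1 \wedge b_1 \Rightarrow g_{11} := -D_1 + b_1a_1$
$X_0 = a_0 \oplus b_0 \Rightarrow g_{12} := -X_0 - 2b_0a_0 + b_0 + a_0$
$D_0 = a_0 \wedge b_0 \Rightarrow g_{13} := -D_0 + b_0a_0$.
Here, $s_i$ is the sum bit, $c_i$ is the carry bit, and for every pair of input bits $a_i, b_i$ there is a generation bit $D_i$ and a propagation bit $X_i$. The vanishing monomials in this model (colored red in the original typeset example) are those that contain both $X_i$ and $D_i$ for the same $i$: $X_2D_2X_1D_1D_0$, $X_2X_1D_1D_0$, $X_2D_2X_1D_0$, and $X_2D_2D_1$ in $g_2$, and $X_1D_1D_0$ in $g_4$. As an example, consider the vanishing monomial $X_1D_1D_0$ of polynomial $g_4$. Substituting $X_1$ and $D_1$ in this monomial yields $X_1D_1D_0 \xrightarrow{g_{10}} -2D_1D_0b_1a_1 + D_1D_0b_1 + D_1D_0a_1 \xrightarrow{g_{11}} -2D_0b_1a_1 + D_0b_1a_1 + D_0b_1a_1 = 0$. It is clear that the GB reduction can easily cancel this monomial. However, the corresponding monomial in the representation of the highest carry (the counterpart of $g_2$) in an $n$-bit adder is $X_{n-1} \cdots X_2X_1D_1D_0$. This follows from the circuit model $c_{n-1} = D_{n-1} \vee (X_{n-1} \wedge D_{n-2}) \vee (X_{n-1} \wedge X_{n-2} \wedge D_{n-3}) \vee \dots \vee (X_{n-1} \wedge X_{n-2} \wedge \dots \wedge X_2 \wedge D_1) \vee (X_{n-1} \wedge X_{n-2} \wedge \dots \wedge X_2 \wedge X_1 \wedge D_0)$.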
By substituting in this monomial according to the order $X_{n-1} > D_{n-1} > \dots > X_0 > D_0$, the number of vanishing monomials increases from 1 to $3^{n-1}$ monomials with a maximum size of $2n$ variables. Consider another vanishing monomial, $X_2D_2X_1D_0$ of polynomial $g_2$. The corresponding vanishing monomial for the carry bit $c_{n-1}$ is $X_{n-1}D_{n-1}X_{n-2} \cdots X_2X_1D_0$. By substituting in this monomial with a different order than the previous one, namely $X_0 > D_0 > \dots > X_{n-1} > D_{n-1}$, the number of intermediate vanishing monomials increases to about $3^{n-1}$ with a maximum size of $2n$ variables. From these two examples, we conclude that it is hard to find a substitution order that cancels all vanishing monomials before they blow up. The experimental results of [8] confirm the problem of the symbolic computer algebra approach with parallel adders. Their results show that the technique cannot verify Kogge-Stone adders with more than 6 bits.
¹ Please recall that parallel prefix adders are typically found in the last stage of parallel multipliers.
Concluding our observations above, the core problem that we
need to solve is the occurrence of a large number of vanishing
monomials that lead to an exponential blow-up when reduced
to the input variables.
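The growth described above can be observed on small instances. The following Python sketch is an illustration under our own assumptions (dictionary-based Boolean polynomials, hypothetical helper names): it starts from the single vanishing monomial $X_{n-1} \cdots X_1 D_1 D_0$, substitutes the propagate and generate variables in the order $X_{n-1} > D_{n-1} > \dots > X_0 > D_0$, and records the largest number of monomials seen before everything cancels.

```python
def add(p, q):
    """Return p + q; a polynomial is a dict {frozenset of variables: coefficient}."""
    r = dict(p)
    for mono, coef in q.items():
        r[mono] = r.get(mono, 0) + coef
        if r[mono] == 0:
            del r[mono]
    return r


def mul(p, q):
    """Return p * q in the Boolean domain (monomials are sets, so x * x = x)."""
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r = add(r, {m1 | m2: c1 * c2})
    return r


def substitute(p, var, tail):
    """Replace every occurrence of var in p by the polynomial tail."""
    r = {}
    for mono, coef in p.items():
        if var in mono:
            r = add(r, mul({mono - {var}: coef}, tail))
        else:
            r = add(r, {mono: coef})
    return r


def xor_tail(a, b):
    """Tail of X = a XOR b, i.e. -2ab + a + b."""
    return {frozenset({a, b}): -2, frozenset({a}): 1, frozenset({b}): 1}


def and_tail(a, b):
    """Tail of D = a AND b, i.e. ab."""
    return {frozenset({a, b}): 1}


def peak_size(n):
    """Reduce X_{n-1}...X_1 D_1 D_0 to primary inputs.

    Returns (largest intermediate number of monomials, final number of monomials).
    """
    start = frozenset([f"X{i}" for i in range(1, n)] + ["D1", "D0"])
    r, peak = {start: 1}, 1
    for i in range(n - 1, -1, -1):  # order X_{n-1} > D_{n-1} > ... > X_0 > D_0
        r = substitute(r, f"X{i}", xor_tail(f"a{i}", f"b{i}"))
        peak = max(peak, len(r))
        r = substitute(r, f"D{i}", and_tail(f"a{i}", f"b{i}"))
        peak = max(peak, len(r))
    return peak, len(r)


if __name__ == "__main__":
    for bits in range(2, 8):
        print(bits, peak_size(bits))
```

The final count is always zero, since the monomial vanishes, but the cancellation only happens once $D_1$ is reached, after every $X_i$ has already tripled the number of monomials.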
IV. LOGIC REDUCTION REWRITING
This section describes the main contributions of our work. We show how to integrate logic reduction into Gröbner basis rewriting and present two rewriting schemes that combine the elimination of vanishing monomials before they can cause a blow-up with the advantages of the existing fanout rewriting technique.
A. Logic Reduction
To overcome the limitation caused by vanishing monomials during GB reduction, we propose to apply logic reduction during the rewriting of the Gröbner basis model (Step 2 of the membership testing algorithm in Section II-B), in order to remove vanishing monomials before their blow-up. Looking again at Example 3, it is easy to see that the monomial $X_1D_1D_0$ can be removed when considering that the variable $X_1$ is the XOR of $a_1, b_1$, and the variable $D_1$ is the AND of $a_1, b_1$. Based on this structural knowledge of the circuit model, we can conclude that the monomial always evaluates to zero since $(a \oplus b) \cdot (a \wedge b) = 0$ for all $a$ and $b$. We refer to this as XOR-AND vanishing rule.
By keeping track of the original gate function and the
input variables associated to each variable, we can effectively
search for monomials that satisfy the XOR-AND vanishing rule.
Applying this rule will remove all the vanishing monomials of
the parallel adder model shown in Example 3, and will avoid
the high computation cost of the GB reduction.
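A minimal Python sketch of the XOR-AND vanishing rule is given below. It assumes that each variable is annotated with its gate type and input pair, kept in a dictionary called `GATES` here; this bookkeeping structure, the helper `term`, and the function name `xor_and_rule` are illustrative choices and not taken from the original implementation. The rule drops every monomial that contains an XOR output and an AND output driven by the same two inputs.

```python
def term(names):
    """Build a monomial (a set of variables) from a space-separated string."""
    return frozenset(names.split())


def xor_and_rule(p, gates):
    """Remove monomials containing both the XOR and the AND of the same inputs.

    p     : dict {monomial: coefficient}
    gates : dict {variable: (gate type, frozenset of the gate's two inputs)}
    """
    kept = {}
    for mono, coef in p.items():
        xor_inputs = {gates[v][1] for v in mono if v in gates and gates[v][0] == "xor"}
        and_inputs = {gates[v][1] for v in mono if v in gates and gates[v][0] == "and"}
        if not (xor_inputs & and_inputs):  # no common input pair: keep the term
            kept[mono] = coef
    return kept


# Structural information for the propagate/generate signals of Example 3.
GATES = {}
for i in range(3):
    GATES[f"X{i}"] = ("xor", term(f"a{i} b{i}"))
    GATES[f"D{i}"] = ("and", term(f"a{i} b{i}"))

# Polynomials g2 and g4 of Example 3.
G2 = {
    term("c2"): -1, term("X2 D2 X1 D1 D0"): 1, term("X2 X1 D1 D0"): -1,
    term("X2 D2 X1 D0"): -1, term("X2 D2 D1"): -1,
    term("X2 X1 D0"): 1, term("X2 D1"): 1, term("D2"): 1,
}
G4 = {term("c1"): -1, term("X1 D1 D0"): -1, term("X1 D0"): 1, term("D1"): 1}

if __name__ == "__main__":
    print(xor_and_rule(G2, GATES))
    print(xor_and_rule(G4, GATES))
```

Applied to $g_2$, the rule leaves $-c_2 + X_2X_1D_0 + X_2D_1 + D_2$; applied to $g_4$, it leaves $-c_1 + X_1D_0 + D_1$. The check is purely structural, so its cost is independent of how large the monomial would become after substitution down to the inputs.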
B. Rewriting Schemes
Although the correspondence of gates in the circuits to
variables in the polynomials is given, the XOR-AND vanishing
rule cannot directly be applied to the circuit obtained from
Step 1 in the MT algorithm:
1) if no substitution is applied, one may not see monomials
that contain both the XOR and AND of two common
input variables, and
2) if arbitrary substitution is applied, internal XOR gates
may be substituted.
Both cases prohibit the application of the XOR-AND vanishing
rule. Consequently, substitution must be performed in an order
that substitutes gates by preserving the monomials that represent
XOR gates. Note that fanout rewriting does not preserve the
XOR gates.
XOR rewriting: We propose an XOR rewriting scheme,
carried out as step 2 in the MT algorithm, which performs the
following steps:
1) Store all variables in a list V that refer to either input
and output variables of an XOR gate or to primary inputs
and primary outputs.
2) Rewrite the model using the S-polynomial method such
that the model depends only on variables in V . After
each substitution, apply the XOR-AND vanishing rule.
Common rewriting: As discussed in Section II-B, the fanout
rewriting increases the chance that common subterms can be
canceled during GB reduction. This positive effect is reduced
by XOR rewriting, making the veriﬁcation inefﬁcient if only
XOR rewriting is applied. Hence, we propose to carry out a
further rewriting called common rewriting, which is similar
to fanout rewriting, after XOR rewriting. Common rewriting
simpliﬁes the task of GB reduction by making the polynomials
depend on shared variables. It rewrites the model obtained
from XOR rewriting such that the polynomials depend only
on variables that are used in more than one polynomial.²
Another part of the multiplier that shows the effectiveness of the proposed XOR rewriting in revealing vanishing monomials is the Booth partial product cell. Although every cell has only one vanishing monomial, canceling it later by GB reduction causes a blow-up. Without early cancellation, every vanishing monomial propagates through the carry chains of the multiplier and grows as it is multiplied with further variables. Canceling these monomials by GB reduction is therefore computationally expensive.
C. Overall Algorithm
Both XOR rewriting and common rewriting follow two steps: identifying a set of variables and then substituting all remaining variables. Hence, the rewriting can be explained by a generalized algorithm, called Gröbner Basis Rewriting (GB-Rew), illustrated in Algorithm 2. It substitutes the variables that are not in $V$ using the S-polynomial method. Additionally, in XOR rewriting, monomials are removed from the model using the XOR-AND vanishing rule after every substitution.
The algorithm considers the polynomials in reverse order of their leading monomial variables. For example, for a model of two polynomials $g_1 := x_1 + tail(g_1)$ and $g_2 := x_2 + tail(g_2)$ with monomial ordering³ $x_2 > x_1$, the polynomial $g_1$ will be considered first.
The substitution order of the variables is chosen according
to the number of terms in the tail part of their polynomials.
For example, for two polynomials g1 := x1 + tail(g1) and
g2 := x2 + tail(g2), variable x1 is substituted before x2 if
the number of terms in tail(g1) is smaller than the number of
terms in tail(g2).
After finishing the model rewriting and removing vanishing monomials, all polynomials whose leading monomial variables are not in the variables list $V$ and which are not primary output variables will be removed, since they have been substituted during rewriting.
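The rewriting procedure described above can be sketched in Python as follows. This is a simplified illustration under several assumptions of ours: polynomials are stored as "leading variable = tail" with dictionary-based Boolean monomials, the model is passed in input-to-output order (the reverse order of leading variables), the substitution candidate is the variable with the fewest tail terms, and the gate annotations in `GATES` as well as the variable set `KEEP` are written by hand for a fragment of Example 3. In particular, $c_1$ is placed in `KEEP` on the assumption that it feeds the XOR gate of $s_2$, which is not part of the fragment.

```python
def add(p, q):
    """Return p + q; a polynomial is a dict {frozenset of variables: coefficient}."""
    r = dict(p)
    for mono, coef in q.items():
        r[mono] = r.get(mono, 0) + coef
        if r[mono] == 0:
            del r[mono]
    return r


def substitute(p, var, tail):
    """Replace var in p by tail (Boolean domain: monomials are sets)."""
    r = {}
    for mono, coef in p.items():
        if var in mono:
            for m2, c2 in tail.items():
                r = add(r, {(mono - {var}) | m2: coef * c2})
        else:
            r = add(r, {mono: coef})
    return r


def term(names):
    return frozenset(names.split())


def xor_and_rule(p, gates):
    """Drop monomials containing the XOR and the AND of the same input pair."""
    kept = {}
    for mono, coef in p.items():
        xors = {gates[v][1] for v in mono if v in gates and gates[v][0] == "xor"}
        ands = {gates[v][1] for v in mono if v in gates and gates[v][0] == "and"}
        if not (xors & ands):
            kept[mono] = coef
    return kept


def gb_rew(keep, model, gates):
    """Rewrite the model so that all tails depend only on variables in keep.

    model: list of (leading variable, tail), ordered from inputs to outputs.
    """
    tails = dict(model)
    for lead, _ in model:
        r = tails[lead]
        while True:
            pending = sorted(v for v in set().union(*r) - keep if v in tails)
            if not pending:
                break
            vt = min(pending, key=lambda v: len(tails[v]))  # fewest tail terms first
            r = xor_and_rule(substitute(r, vt, tails[vt]), gates)
        tails[lead] = r
    # Drop polynomials whose leading variable is not in keep.
    return {v: t for v, t in tails.items() if v in keep}


# Fragment of Example 3 (bits 0 and 1), hand-written for illustration.
MODEL = [
    ("D0", {term("a0 b0"): 1}),
    ("D1", {term("a1 b1"): 1}),
    ("X1", {term("a1 b1"): -2, term("a1"): 1, term("b1"): 1}),
    ("c0", {term("D0"): 1}),
    ("c1", {term("X1 D1 D0"): -1, term("X1 D0"): 1, term("D1"): 1}),
    ("s1", {term("c0 X1"): -2, term("c0"): 1, term("X1"): 1}),
]
GATES = {"X1": ("xor", term("a1 b1")), "D1": ("and", term("a1 b1")),
         "D0": ("and", term("a0 b0"))}
KEEP = {"a0", "b0", "a1", "b1", "X1", "c0", "c1", "s1"}

if __name__ == "__main__":
    for lead, tail in gb_rew(KEEP, MODEL, GATES).items():
        print(lead, tail)
```

In this run the vanishing monomial of $c_1$ is removed right after $D_0$ is substituted, while $X_1$ and $D_1$ are still both visible. A different substitution order could replace $D_1$ first and hide the pattern from this simple rule, which illustrates why the choice of preserved variables and substitution order matters.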
The overall rewriting scheme that is carried out as Step 2 in the MT algorithm is the sequential execution of XOR rewriting and common rewriting using the Gröbner basis rewriting. This is also illustrated by Algorithm 3 and is referred to as logic reduction rewriting.
² This is very similar to fanout rewriting, but since we are no longer working on the original circuit model, one cannot strictly speak of fanout variables.
³ Note that the variables are ordered according to the reverse topological order of the circuit, as explained in Section II.
Algorithm 2 Gröbner Basis Rewriting (GB-Rew)
Input: Variables V, Circuit Model G
Output: Model Gn rewritten wrt. V
for gi ∈ G do /* in reverse order of leading monomials */
    lv ← lm(gi)
    r ← gi − lv
    while Vars(r) ⊈ V do
        Choose vt ∈ Vars(r) \ V
        Choose gt ∈ G such that lm(gt) = vt
        r ← Spoly(r, gt)
        r ← XORAND-Rule(r)
    end while
    gi ← r + lv
end for
Gn ← UpdateModel(G, V) /* Remove polynomials whose leading terms are not in V */
return Gn
Algorithm 3 Logic Reduction Rewriting
Input: Specification Polynomial pspec, Circuit Model G
Output: Circuit Model G
V ← XORRewritingVariables(G)
G ← GB-Rew(V, G)
V ← CommonRewritingVariables(G)
G ← GB-Rew(V, G)
return G
V. EXPERIMENTAL RESULTS
The MT algorithm with logic reduction rewriting (MT-LR) and with fanout rewriting (MT-FO) [7] have been implemented in C++. The experiments have been carried out on an Intel(R) Core(TM) i5-3320M CPU (2.6 GHz, 16 GByte) running Linux. For the experiments, we generated different multiplier architectures using the online tool Arithmetic Module Generator [10]. The multipliers are given as Verilog RTL code. The designs were synthesized to gate level netlists using Yosys [11].
To evaluate the practical run-time of the MT-LR algorithm in verifying multipliers with different architectures, we apply it to verify $n$-bit multipliers against the specification equation
$$\sum_{i=0}^{2n-1} -2^i s_i + \sum_{i=0}^{n-1} 2^i a_i \cdot \sum_{i=0}^{n-1} 2^i b_i \mod 2^{2n}.$$
This is done by dividing the polynomial $p_{spec} := \sum_{i=0}^{2n-1} -2^i s_i + \sum_{i=0}^{n-1} 2^i a_i \cdot \sum_{i=0}^{n-1} 2^i b_i$ wrt. the algebraic model of the multiplier and calculating $r \leftarrow r \bmod 2^{2n}$ by removing from the division remainder $r$ the terms whose coefficients are a multiple of $2^{2n}$. Please note that we propose the idea of adding modulo $2^{2n}$ to the specification of integer multipliers such that the specification matches multipliers that consist of Booth partial products or redundant binary addition trees. The remainder of dividing the specification equation without modulo $2^{2n}$ wrt. the algebraic models of these multipliers is not equal to zero; it has terms with coefficients that are multiples of $2^{2n}$.
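The modulo step described above amounts to a simple filter on the remainder. The following Python sketch shows one way to express it, assuming the remainder is stored as a dictionary from monomials to integer coefficients; the function name `drop_multiples` and the example remainder are hypothetical and only serve to illustrate the operation.

```python
def drop_multiples(remainder, n):
    """Remove the terms whose coefficient is a multiple of 2**(2*n).

    remainder : dict {monomial (tuple of variable names): integer coefficient}
    n         : operand width of the multiplier in bits
    """
    modulus = 1 << (2 * n)
    return {mono: coef for mono, coef in remainder.items() if coef % modulus != 0}


# Hypothetical remainder for a 2-bit multiplier (modulus 2**4 = 16).
example = {("a1", "b1", "b0"): 32, ("a1", "a0"): -16, ("b0",): 3}

if __name__ == "__main__":
    print(drop_multiples(example, 2))
```

If the filtered remainder is empty, the circuit matches the specification modulo $2^{2n}$. Any surviving term, such as the one with coefficient 3 in the made-up example, would indicate a mismatch.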
The multiplier architectures of the benchmarks [10] are categorized according to 1) the type of the partial products generator, 2) the partial products accumulator, and 3) the last stage adder. In our experiments, we use two types of partial products generators, namely simple partial products (SP) and Booth partial products (BP). The types of partial products accumulators are array (AR), Wallace tree (WT), (4,2) compressor tree (CT), redundant binary addition tree (RT), and Dadda tree (DT). Finally, the chosen types of the last stage adder are ripple carry adder (RC), carry look-ahead adder (CL), Brent-Kung adder (BK), Kogge-Stone adder (KS), and Han-Carlson adder (HC). In the following, the benchmarks are named according to their architecture features. Please note that all benchmarks time out after 100 hours when performing verification using a naive miter construction (one big miter; ABC [12] using command 'cec').
TABLE I
VERIFICATION RESULTS FOR SIMPLE PARTIAL PRODUCTS MULTIPLIERS
Benchmark  I/O bits  Commercial  CPP [13]  MT-FO [7]  MT-LR
                     (h:m:s)     (h:m:s)   (h:m:s)    (h:m:s)
SP-AR-RC   16/32     00:00:01    00:01:23  00:00:01   00:00:02
SP-WT-CL   16/32     00:00:01    00:00:46  TO         00:00:05
SP-RT-KS   16/32     00:00:43    N/A       TO         00:00:17
SP-CT-BK   16/32     00:00:59    00:00:43  TO         00:00:04
SP-AR-RC   32/64     00:00:11    02:34:40  00:00:09   00:00:21
SP-WT-CL   32/64     00:00:06    00:15:12  TO         00:03:27
SP-DT-HC   32/64     00:00:09    N/A       TO         00:02:05
SP-CT-BK   32/64     TO          00:21:20  TO         00:01:35
SP-AR-RC   64/128    00:02:52    94:37:20  00:02:56   00:07:40
SP-WL-CL   64/128    00:00:36    05:46:40  TO         02:18:34
SP-RT-KS   64/128    TO          N/A       TO         02:51:12
SP-CT-BK   64/128    TO          05:31:44  TO         00:47:48
SP-AR-RC   128/256   01:03:34    TO        00:48:03   02:08:51
SP-CT-BK   128/256   TO          78:11:12  TO         14:03:33
TABLE II
VERIFICATION RESULTS FOR BOOTH PARTIAL PRODUCTS MULTIPLIERS
Benchmark  I/O bits  Commercial  CPP [13]  MT-FO [7]  MT-LR
                     (h:m:s)     (h:m:s)   (h:m:s)    (h:m:s)
BP-AR-RC   16/32     00:00:14    -         TO         00:00:02
BP-WT-CL   16/32     00:00:16    -         TO         00:00:09
BP-RT-KS   16/32     00:00:18    -         TO         00:00:17
BP-CT-BK   16/32     00:00:13    -         TO         00:00:06
BP-AR-RC   32/64     TO          -         TO         00:00:17
BP-WT-CL   32/64     TO          -         TO         00:04:46
BP-RT-KS   32/64     TO          -         TO         00:05:36
BP-CT-BK   32/64     TO          -         TO         00:02:20
BP-AR-RC   64/128    TO          -         TO         00:05:06
BP-WT-CL   64/128    TO          -         TO         03:03:48
BP-DT-HC   64/128    TO          -         TO         00:58:44
BP-CT-BK   64/128    TO          -         TO         00:37:53
BP-AR-RC   128/256   TO          -         TO         01:29:10
BP-CT-BK   128/256   TO          -         TO         15:14:49
In Tables I and II, we compare the run-times of the proposed
algorithm MT-LR against our re-implementation of MT-FO [7],
a recent work combining recurrence relations and SAT-based
equivalence checking [13], referred to as the Checking Partial
Product (CPP) approach, and the equivalence checker of the
commercial tool OneSpin (after enabling multiplier options). The
first column of Tables I and II shows the name of the circuit.
The second column gives the number of input and output
bits. The next four columns provide the run-times. The time-out
(TO in the tables) has been set to 100 hours. For the CPP
approach, “-” indicates that CPP cannot be used for
multipliers consisting of Booth partial products. Finally, N/A
means that the result is not available in the respective paper.
The experimental results clearly demonstrate the advantage of
our approach. While for multipliers with simple partial products
(Table I) the other approaches can sometimes verify the
correctness, for the complex parallel architectures (Table II)
only our approach solves the verification problem for instances
larger than 16/32 bits. As can be seen, we are able to verify
the correctness for up to 128 bits.
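The comparison for the Booth multipliers can be tallied directly from the table entries. The sketch below copies the Commercial, MT-FO, and MT-LR columns of Table II and counts how many of the 14 benchmarks each tool finishes within the time-out; the record layout and the helper name `solved` are illustrative choices, not part of the original tool flow. CPP is omitted because it is not applicable to Booth partial products.

```python
TIMEOUT = "TO"

# (benchmark, I/O bits, Commercial, MT-FO, MT-LR), copied from Table II
TABLE_II = [
    ("BP-AR-RC", "16/32",   "00:00:14", "TO", "00:00:02"),
    ("BP-WT-CL", "16/32",   "00:00:16", "TO", "00:00:09"),
    ("BP-RT-KS", "16/32",   "00:00:18", "TO", "00:00:17"),
    ("BP-CT-BK", "16/32",   "00:00:13", "TO", "00:00:06"),
    ("BP-AR-RC", "32/64",   "TO",       "TO", "00:00:17"),
    ("BP-WT-CL", "32/64",   "TO",       "TO", "00:04:46"),
    ("BP-RT-KS", "32/64",   "TO",       "TO", "00:05:36"),
    ("BP-CT-BK", "32/64",   "TO",       "TO", "00:02:20"),
    ("BP-AR-RC", "64/128",  "TO",       "TO", "00:05:06"),
    ("BP-WT-CL", "64/128",  "TO",       "TO", "03:03:48"),
    ("BP-DT-HC", "64/128",  "TO",       "TO", "00:58:44"),
    ("BP-CT-BK", "64/128",  "TO",       "TO", "00:37:53"),
    ("BP-AR-RC", "128/256", "TO",       "TO", "01:29:10"),
    ("BP-CT-BK", "128/256", "TO",       "TO", "15:14:49"),
]

# Column index of each tool inside a row tuple
COLUMNS = {"Commercial": 2, "MT-FO": 3, "MT-LR": 4}


def solved(rows, column: int) -> int:
    """Count the rows whose entry in `column` is a run-time, not a time-out."""
    return sum(1 for row in rows if row[column] != TIMEOUT)


if __name__ == "__main__":
    for tool, column in COLUMNS.items():
        print(f"{tool}: {solved(TABLE_II, column)} of {len(TABLE_II)} solved")
```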
TABLE III
STATISTICS FOR VERIFICATION OF MULTIPLIERS BY MT-LR
Benchmark  I/O bits  #CVM    GB reduction  #P     #M      #MP  #VM
                             (h:m:s)
BP-WT-CL   32/64     39651   00:00:40      1965   18186   142  65
BP-RT-KS   32/64     42000   00:00:38      1989   23341   200  69
SP-DT-HC   32/64     15842   00:00:23      3011   18267   124  63
SP-CT-BK   32/64     4480    00:00:37      2702   37137   256  62
BP-WT-CL   64/128    325377  00:10:47      7180   71473   331  129
BP-DT-HC   64/128    134367  00:06:45      6491   70635   260  130
SP-RT-KS   64/128    290053  00:13:08      13106  95314   376  131
SP-CT-BK   64/128    22228   00:07:09      10676  148381  274  124
SP-CT-BK   128/256   106970  01:25:53      42016  592715  530  252
Table III shows some statistics about the MT-LR algorithm.
The columns give the circuit name, the number of circuit bits,
the number of vanishing monomials that are canceled by the
XOR-AND rule (#CVM), the run-time of the GB reduction after
logic reduction rewriting, and finally statistics on the model
after rewriting. The model statistics columns show the number of
polynomials (#P), the number of monomials (#M), the maximum
size of a polynomial wrt. its number of monomials (#MP),
and the maximum size of a monomial wrt. its number of variables
(#VM). The results of Table III show that multipliers with
carry-lookahead adders or Kogge-Stone adders have the
largest number of vanishing monomials and therefore the largest
execution time. Moreover, it can be seen that the GB reduction
accounts for only a fraction of the total execution time of the
MT-LR algorithm. Most of the time is spent in rewriting the
circuit model by logic reduction rewriting.
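The share of the GB reduction in the total MT-LR run-time can be read off by combining the MT-LR columns of Tables I and II with the GB reduction column of Table III. The following sketch does this arithmetic for three of the benchmarks; the helper names `to_seconds` and `gb_share` are illustrative, and the numbers are copied from the tables.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:m:s' run-time string, as used in the tables, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (benchmark, I/O bits, total MT-LR time from Table I/II, GB reduction time from Table III)
ROWS = [
    ("BP-WT-CL", "32/64",   "00:04:46", "00:00:40"),
    ("BP-WT-CL", "64/128",  "03:03:48", "00:10:47"),
    ("SP-CT-BK", "128/256", "14:03:33", "01:25:53"),
]


def gb_share(total: str, gb: str) -> float:
    """Fraction of the total MT-LR run-time spent in the GB reduction."""
    return to_seconds(gb) / to_seconds(total)


if __name__ == "__main__":
    for name, bits, total, gb in ROWS:
        print(f"{name} {bits}: GB reduction share = {gb_share(total, gb):.1%}")
```

For these rows the GB reduction stays well below half of the total run-time, so the remainder is attributable to the logic reduction rewriting step.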
VI. CONCLUSION
In this paper, we have presented a new algorithm which
extends the computer algebra technique to the verification of
complex parallel multiplier architectures. The algorithm is based
on canceling vanishing monomials efficiently before they blow up
during Gröbner basis reduction. Our approach rewrites
the algebraic model of the multiplier based on the XOR gates of
the multiplier netlist and searches for monomials that satisfy the
XOR-AND vanishing rule. Experimental results demonstrated
the efficiency of our approach, i.e., for all complex parallel
multipliers we verified the correctness within seconds to 15
hours (for 128 bits), while all other approaches reached the
time-out limit of 100 hours and gave no result.
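As a brief reminder of why such monomials vanish, the following worked derivation assumes the usual integer-polynomial gate models for an XOR gate and an AND gate driven by the same Boolean inputs $x$ and $y$ (the symbols $p$ and $g$ are chosen here for illustration), together with the Boolean idempotence $x^2 = x$, $y^2 = y$:

```latex
\begin{align*}
  p &= x \oplus y = x + y - 2xy, \qquad g = x \wedge y = xy, \\
  p \cdot g &= (x + y - 2xy)\,xy \\
            &= x^2 y + x y^2 - 2 x^2 y^2 \\
            &= xy + xy - 2xy = 0 .
\end{align*}
```

Any monomial containing the product $p \cdot g$ therefore evaluates to zero for all Boolean inputs and can be removed before it is expanded.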
REFERENCES
[1] H. P. Sharangpani and M. L. Barton, “Statistical analysis of floating point flaw in
the Pentium processor (1994),” Intel, Tech. Rep., Nov. 1994.
[2] E. Pavlenko, M. Wedler, D. Stoffel, O. Wienand, E. Karibaev, and W. Kunz,
“Modeling of custom-designed arithmetic components in ABL normalization,” in
FDL, 2008, pp. 124–129.
[3] O. Wienand, M. Wedler, D. Stoffel, W. Kunz, and G. M. Greuel, “An algebraic
approach for proving data correctness in arithmetic data paths,” in CAV, 2008, pp.
473–486.
[4] E. Pavlenko, M. Wedler, D. Stoffel, W. Kunz, A. Dreyer, F. Seelisch, and G. Greuel,
“STABLE: A new QF-BV SMT solver for hard verification problems combining Boolean
reasoning with computer algebra,” in DATE, March 2011, pp. 1–6.
[5] J. Lv, P. Kalla, and F. Enescu, “Efficient Gröbner basis reductions for formal
verification of Galois field multipliers,” in DATE, 2012, pp. 899–904.
[6] M. Ciesielski, C. Yu, D. Liu, and W. Brown, “Verification of gate-level arithmetic
circuits by function extraction,” in DAC, 2015, pp. 52:1–52:6.
[7] F. Farahmandi and B. Alizadeh, “Gröbner basis based formal verification of
large arithmetic circuits using Gaussian elimination and cone-based polynomial
extraction,” MICPRO, vol. 39, no. 2, pp. 83–96, 2015.
[8] Y. Watanabe, N. Homma, T. Aoki, and T. Higuchi, “Application of symbolic
computer algebra to arithmetic circuit verification,” in ICCD, 2007, pp. 25–32.
[9] D. Cox, J. Little, and D. O’Shea, Ideals, Varieties, and Algorithms. Springer,
1997.
[10] “Arithmetic module generator based on ACG,” available at
http://www.aoki.ecei.tohoku.ac.jp/arith/, 2015.
[11] C. Wolf, “Yosys open synthesis suite,” available at http://www.clifford.at/yosys/,
2015.
[12] R. Brayton and A. Mishchenko, “ABC: An academic industrial-strength
verification tool,” in CAV, 2010, pp. 24–40.
[13] A. Sayed-Ahmed, U. Kühne, D. Große, and R. Drechsler, “Recurrence relations
revisited: Scalable verification of bit level multiplier circuits,” in ISVLSI, 2015, pp.
1–6.
