Word-level Symbolic Trajectory Evaluation by Chakraborty, Supratik et al.
Word-level Symbolic Trajectory Evaluation
Supratik Chakraborty^1, Zurab Khasidashvili^2, Carl-Johan H. Seger^3,
Rajkumar Gajavelly^1, Tanmay Haldankar^1, Dinesh Chhatani^1, and
Rakesh Mistry^1
^1 IIT Bombay, India (*)
^2 Intel IDC, Haifa, Israel
^3 Intel, Portland OR, USA
Abstract. Symbolic trajectory evaluation (STE) is a model checking
technique that has been successfully used to verify industrial designs.
Existing implementations of STE, however, reason at the level of bits,
allowing signals to take values in {0, 1, X}. This limits the amount of
abstraction that can be achieved, and presents inherent limitations to
scaling. The main contribution of this paper is to show how much more
abstract lattices can be derived automatically from RTL descriptions,
and how a model checker for the general theory of STE instantiated
with such abstract lattices can be implemented in practice. This gives us
the first practical word-level STE engine, called STEWord. Experiments
on a set of designs similar to those used in industry show that STEWord
scales better than word-level BMC and also bit-level STE.
1 Introduction
Symbolic Trajectory Evaluation (STE) is a model checking technique that grew
out of multi-valued logic simulation on the one hand, and symbolic simulation
on the other hand [2]. Among various formal verification techniques in use today,
STE comes closest to functional simulation and is among the most successful
formal verification techniques used in the industry. In STE, specifications take the
form of symbolic trajectory formulas that mix Boolean expressions and the tem-
poral next-time operator. The Boolean expressions provide a convenient means
of describing different operating conditions in a circuit in a compact form. By
allowing only the most elementary of temporal operators, the class of properties
that can be expressed is fairly restricted as compared to other temporal logics
(see [3] for a nice survey). Nonetheless, experience has shown that many impor-
tant aspects of synchronous digital systems at various levels of abstraction can
be captured using this restricted logic. For example, it is quite adequate for ex-
pressing many of the subtleties of system operation, including clocking schemas,
pipelining control, as well as complex data computations [11,7,6].
In return for the restricted expressiveness of STE specifications, the STE
model checking algorithm provides significant computational efficiency. As a
result, STE can be applied to much larger designs than any other model checking
technique. For example, STE is routinely used in the industry today to carry
out complete formal input-output verification of designs with several hundred
thousand latches [7,6]. Unfortunately, this still falls short of providing an
automated technique for formally verifying modern system-on-chip designs, and
there is clearly a need to scale up the capacity of STE even further.
(*) R. Gajavelly, T. Haldankar and D. Chhatani contributed to this work when they
were in IIT Bombay.
arXiv:1505.07916v1 [cs.LO] 29 May 2015
The first approach that was pursued in this direction was structural decom-
position. In this approach, the user must break down a verification task into
smaller sub-tasks, each involving a distinct STE run. After this, a deductive
system can be used to reason about the collections of STE runs and verify
that they together imply the desired property of the overall design [5]. In the-
ory, structural decomposition allows verification of arbitrarily complex designs.
However, in practice, the difficulty and tedium of breaking down a property into
small enough sub-properties that can be verified with an STE engine limits the
usefulness of this approach significantly. In addition, managing the structural
decomposition in the face of rapidly changing RTL limits the applicability of
structural decomposition even further.
A different approach to increase the scale of designs that can be verified is
to use aggressive abstraction beyond what is provided automatically by cur-
rent STE implementations. If we ensure that our abstract model satisfies the
requirements of the general theory of STE, then a property that is verified on
the abstract model holds on the original model as well. Although the general
theory of STE allows a very general circuit model [10], all STE implementations
so far have used a three-valued circuit model. Thus, every bit-level signal is al-
lowed to have one of three values: 0, 1 or X, where X represents “either 0 or 1”.
This limits the amount of abstraction that can be achieved. The main contri-
bution of this paper is to show how much more abstract lattices can be derived
automatically from RTL descriptions, and how the general theory of STE can
be instantiated with this lattice to give a practical word-level STE engine that
provides significant gains in capacity and efficiency on a set of benchmarks.
Operationally, word-level STE bears similarities with word-level bounded
model checking (BMC). However, there are important differences, the most sig-
nificant one being the use of X-based abstractions on slices of words, called
atoms, in word-level STE. This allows a wide range of abstraction possibilities,
including a combination of user-specified and automatic abstractions – often a
necessity for complex verification tasks. Our preliminary experimental results
indicate that by carefully using X-based abstractions in word-level STE, it is
indeed possible to strike a good balance between accuracy (cautious propagation
of X) and performance (liberal propagation of X).
The remainder of the paper is organized as follows. We discuss how words
in an RTL design can be split into atoms in Section 2. Atoms form the basis
of abstracting groups of bits. In Section 3, we elaborate on the lattice of values
that this abstraction generates, and Section 4 presents a new way of encoding
values of atoms in this lattice. We also discuss how to symbolically simulate
RTL operators and compute least upper bounds using this encoding. Section 5
presents an instantiation of the general theory of STE using the above lattice, and
discusses an implementation. Experimental results on a set of RTL benchmarks
are presented in Section 6, and we conclude in Section 7.
2 Atomizing words
In bit-level STE [2,11], every variable is allowed to take values from {0, 1, X},
where X denotes “either 0 or 1”. The ordering of information in the values 0,
1 and X is shown in the lattice in Fig. 1, where a value lower in the order has
“less information” than one higher up in the order. The element ⊤ is added
to complete the lattice, and represents an unachievable over-constrained value.
Tools that implement bit-level STE usually use dual-rail encoding to reason
about ternary values of variables. In dual-rail encoding, every bit-level variable v
is encoded using two binary variables $v_0$ and $v_1$. Intuitively, $v_i$ indicates whether v
can take the value i, for i in {0, 1}. Thus, 0, 1 and X are encoded by the valuations
(1, 0), (0, 1) and (1, 1), respectively, of $(v_0, v_1)$. By convention, $(v_0, v_1) = (0, 0)$
denotes ⊤. An undesired consequence of dual-rail encoding is the doubling of
binary variables in the encoded system. This can pose serious scalability issues
when verifying designs with wide datapaths, large memories, etc. Attempts to
scale STE to large designs must therefore raise the level of abstraction beyond
that of individual bits.
[Fig. 1. Ternary lattice over the values X, 0, 1 and ⊤]
In principle, one could go to the other extreme, and run
STE at the level of words as defined in the RTL design. This
requires defining a lattice of values of words, and instantiating
the general theory of STE [10] with this lattice. The difficulty
with this approach lies in implementing it in practice. The
lattice of values of an m-bit word, where each bit in the word
can take values in {0, 1, X}, is of size at least $3^m$. Symbolically
representing values from such a large lattice and reasoning
about them is likely to incur overheads similar to that incurred in bit-level STE.
Therefore, STE at the level of words (as defined in the RTL design) does not
appear to be a practical proposition for scaling.
The idea of splitting words into sub-words for the purpose of simplifying
analysis is not new (see e.g. [4]). An aggressive approach to splitting (an ex-
treme example being bit-blasting) can lead to proliferation of narrow sub-words,
making our technique vulnerable to the same scalability problems that arise with
dual-rail encoding. Therefore, we adopt a more controlled approach to splitting.
Specifically, we wish to split words in such a way that we can speak of an entire
sub-word having the value X without having to worry about which individual
bits in the sub-word have the value X. Towards this end, we partition every
word in an RTL design into sub-words, which we henceforth call atoms, such
that every RTL statement (except a few discussed later) that reads or updates
a word either does so for all bits in an atom, or for no bit in an atom. In other
words, no RTL statement (except the few discussed at the end of this section)
reads or updates an atom partially.
Some details of atomization. To formalize the notion of atoms, let w be a
word of width m in an RTL design C. Let 0 denote the least significant bit
position and m − 1 denote the most significant bit position of w. For integer
constants p, q such that 0 ≤ p ≤ q ≤ m − 1, we say that the sub-word of w from
bit position p to q is a slice of w, and denote it by w[q : p]. Let AbsSel(w, q, p) be an
abstract selection operator that either reads or writes the slice w[q : p]. Concrete
instances of AbsSel are commonly used in RTL designs, e.g. in the System-Verilog
statement c[4:1] = a[10:7] + b[5:2]. We say that AbsSel(w, q, p) induces an
atomization of w, as shown in Table 1, where $Atoms_w$ denotes the set of atoms
into which w is partitioned.
Condition               | $Atoms_w$
q < m − 1 and p > 0     | {w[m − 1 : q + 1], w[q : p], w[p − 1 : 0]}
q < m − 1 and p = 0     | {w[m − 1 : q + 1], w[q : 0]}
q = m − 1 and p > 0     | {w[m − 1 : p], w[p − 1 : 0]}
q = m − 1 and p = 0     | {w[m − 1 : 0]}
Table 1. Computing atoms induced by AbsSel(w, q, p)
Given atomizations $Atoms^{(1)}_w$ and $Atoms^{(2)}_w$, we define their coarsest
refinement to be the atomization in which $w[m_1 : m_1]$ and $w[m_2 : m_2]$ belong to the
same atom iff they belong to the same atom in both $Atoms^{(1)}_w$ and $Atoms^{(2)}_w$.
For every word w[m − 1 : 0] in the RTL design, we maintain a working set,
$WSetAtoms_w$, of atoms. Initially, $WSetAtoms_w$ is set to {w[m − 1 : 0]}.
For every concrete instance of AbsSel applied on w in an RTL statement, we
compute $Atoms_w$ using Table 1, and determine the coarsest refinement of $Atoms_w$
and $WSetAtoms_w$. The working set $WSetAtoms_w$ is then updated to the coarsest
refinement thus computed. The above process is then repeated for every RTL
statement in the design.
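The working-set procedure described above can be sketched in Python as follows. This is only an illustrative sketch, not the STEWord implementation: atoms are represented as (hi, lo) pairs of bit positions, and the function names `atoms_induced`, `coarsest_refinement` and `atomize` are hypothetical names introduced here.

```python
def atoms_induced(m, q, p):
    """Atoms of an m-bit word induced by AbsSel(w, q, p), as (hi, lo) pairs.

    Follows the four cases of the table: an upper atom exists only if
    q < m - 1, and a lower atom exists only if p > 0.
    """
    atoms = []
    if q < m - 1:
        atoms.append((m - 1, q + 1))
    atoms.append((q, p))
    if p > 0:
        atoms.append((p - 1, 0))
    return atoms


def coarsest_refinement(m, atoms1, atoms2):
    """Coarsest common refinement of two atomizations of an m-bit word.

    Two bits stay in the same atom iff neither atomization places a
    boundary between them, so it suffices to take the union of the lower
    boundaries of both atomizations.
    """
    cuts = sorted({lo for _, lo in atoms1} | {lo for _, lo in atoms2})
    uppers = cuts[1:] + [m]
    # Return atoms from most significant to least significant.
    return [(up - 1, lo) for lo, up in zip(cuts, uppers)][::-1]


def atomize(m, selections):
    """Working-set computation: start from the whole word and refine it
    once for every slice (q, p) that some RTL statement reads or writes."""
    working_set = [(m - 1, 0)]
    for q, p in selections:
        working_set = coarsest_refinement(m, working_set, atoms_induced(m, q, p))
    return working_set
```

For instance, an 8-bit word that is written through the slice [4:1] and read through the slice [3:0] gives `atomize(8, [(4, 1), (3, 0)])`, which evaluates to the atoms [7:5], [4:4], [3:1] and [0:0].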
The above discussion leads to a fairly straightforward algorithm for identi-
fying atoms in an RTL design. We illustrate this on a simple example below.
Fig. 2(a) shows a System-Verilog code fragment, and Fig. 2(b) shows an atom-
ization of words, where the solid vertical bars represent the boundaries of atoms.
Note that every System-Verilog statement in Fig. 2(a) either reads or writes all
bits in an atom, or no bit in an atom. Since we wish to reason at the granu-
larity of atoms, we must interpret word-level reads and writes in terms of the
corresponding atom-level reads and writes. This can be done either by modifying
the RTL, or by taking appropriate care when symbolically simulating the RTL.
For simplicity of presentation, we show in Fig. 2(c) how the code fragment in
Fig. 2(a) would appear if we were to use only the atoms identified in Fig. 2(b).
Note that no statement in the modified RTL updates or reads a slice of an atom.
However, a statement may be required to read a slice of the result obtained by
applying an RTL operator to atoms (see, for example, Fig. 2(c) where we read a
slice of the result obtained by adding concatenated atoms). In our implementa-
tion, we do not modify the RTL. Instead, we symbolically simulate the original
RTL, but generate the expressions for various atoms that would result from
simulating the modified RTL.
(a) System-Verilog code fragment:

    reg [3:0] x;
    reg [7:0] y;
    reg [7:0] z;
    reg [3:0] w;
    ...
    z[4:1] = x + y[5:2];
    w = z[3:0] + y[3:0];
    ...

(b) Atomization of words (atom boundaries by bit position, 7 down to 0):

    x: [3:0]
    y: [7:6] [5:4] [3:2] [1:0]
    z: [7:5] [4:4] [3:1] [0:0]
    w: [3:0]

(c) The same fragment expressed over atoms ({ , } denotes concatenation):

    reg [3:0] x;
    reg [1:0] y_1_0; reg [1:0] y_3_2;
    reg [1:0] y_5_4; reg [1:0] y_7_6;
    reg z_0_0; reg [2:0] z_3_1;
    reg z_4_4; reg [2:0] z_7_5;
    reg [3:0] w;
    ...
    z_4_4 = (x + {y_5_4, y_3_2})[3:3];
    z_3_1 = (x + {y_5_4, y_3_2})[2:0];
    w = ({z_3_1, z_0_0} + {y_3_2, y_1_0});
    ...

Fig. 2. Illustrating atomization
Once the boundaries of all atoms
are determined, we choose to disre-
gard values of atoms in which some
bits are set to X, and the others are
set to 0 or 1. This choice is justified
since all bits in an atom are read or
written together. Thus, either all bits
in an atom are considered to have val-
ues in {0, 1}, or all of them are consid-
ered to have the value X. This implies
that values of an m-bit atom can be
encoded using m + 1 bits, instead of
using 2m bits as in dual-rail encoding.
Specifically, we can associate an addi-
tional “invalid” bit with every m-bit
atom. Whenever the “invalid” bit is
set, all bits in the atom are assumed to have the value X. Otherwise, all bits are
assumed to have values in {0, 1}. We show later in Sections 4.1 and 4.2 how the
value and invalid bit of an atom can be recursively computed from the values
and invalid bits of the atoms on which it depends.
Memories and arrays in an RTL design are usually indexed by variables
instead of by constants. This makes it difficult to atomize memories and arrays
statically, and we do not atomize them. Similarly, if a design has a logical shift
operation, where the amount of shift is specified by a variable, it is difficult
to statically identify subwords that are not split by the shift operation. We
ignore all such RTL operations during atomization, and instead use extensional
arrays [12] to model and reason about them. Section 4.2 discusses the modeling
of memory/array reads and writes in this manner.
3 Lattice of atom values
Recall that the primary motivation for atomizing words is to identify the right
granularity at which an entire sub-word (atom) can be assigned the value X
without worrying about which bits in the sub-word have the value X. Therefore,
an m-bit atom a takes values from the set
$\{\underbrace{0 \cdots 00}_{m \text{ bits}}, \ldots, \underbrace{1 \cdots 11}_{m \text{ bits}}, X\}$, where X is
a single abstract value that denotes an assignment of X to at least one bit of a.
Note the conspicuous absence of values like 0X1 · · · 0 in the above set. Fig. 3(a)
shows the lattice of values for a 3-bit atom, ordered by information content.
The ⊤ element is added to complete the lattice, and represents an unachievable
over-constrained value. Fig. 3(b) shows the lattice of values of the same atom if
we allow each bit to take values in {0, 1, X}. Clearly, the lattice in Fig. 3(a) is
shallower and sparser than that in Fig. 3(b).
[Fig. 3. Atom-level and bit-level lattices. (a) Atom-level lattice of a 3-bit atom,
with elements X, 000, 001, ..., 111 and ⊤: a shallow and sparse lattice (height 2,
10 elements). (b) Bit-level lattice of the same atom, with elements XXX, XX0, ...,
111 and ⊤⊤⊤ (which subsumes ⊤01, ⊤1⊤, ...): a deep and dense lattice (height 4,
28 elements).]
Consider an m-bit word w that has been partitioned into non-overlapping
atoms of widths $m_1, \ldots, m_r$, where $\sum_{j=1}^{r} m_j = m$. The lattice of values of w
is given by the product of r lattices, each corresponding to the values of an
atom of w. For convenience of representation, we simplify the product lattice by
collapsing all values that have at least one atom set to ⊤ (and therefore represent
unachievable over-constrained values), to a single ⊤ element. It can be verified
that the height of the product lattice (after the above simplification) is given by
$r + 1$, the total number of elements in it is given by
$\prod_{j=1}^{r} \left(2^{m_j} + 1\right) + 1$, and the
number of elements at level i from the bottom is given by
$\sum_{S \subseteq \{1, \ldots, r\},\, |S| = i} \; \prod_{j \in S} 2^{m_j}$, where
$0 < i \le r$. It is not hard to see from these expressions that atomization using
few wide atoms (i.e., small values of r and large values of $m_j$) gives shallow and
sparse lattices compared to atomization using many narrow atoms (i.e., large
values of r and small values of $m_j$). The special case of a bit-blasted lattice (see
Fig. 3(b)) is obtained when $r = m$ and $m_j = 1$ for every $j \in \{1, \ldots, m\}$.
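The size and height expressions can be checked on small cases with a short Python sketch. It assumes the product in the size formula ranges over the r atoms, and it computes level sizes by direct enumeration over subsets of atoms that hold non-X values; the function names are hypothetical and chosen only for this illustration.

```python
from itertools import combinations
from math import prod


def lattice_height(widths):
    """Height of the simplified product lattice: one level per atom, plus top."""
    return len(widths) + 1


def lattice_size(widths):
    """Each atom contributes its 2^w concrete values plus X; the single
    collapsed top element is added once at the end."""
    return prod(2 ** w + 1 for w in widths) + 1


def level_size(widths, i):
    """Number of elements in which exactly i atoms hold a non-X value."""
    return sum(prod(2 ** w for w in chosen) for chosen in combinations(widths, i))
```

With a single 3-bit atom, `lattice_size([3])` is 10 and `lattice_height([3])` is 2; with three 1-bit atoms, `lattice_size([1, 1, 1])` is 28 and `lattice_height([1, 1, 1])` is 4, in agreement with the element counts and heights reported for the two lattices of Fig. 3.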
Using a sparse lattice is advantageous in symbolic reasoning since we need
to encode a small set of values. Using a shallow lattice helps in converging fast
when computing least upper bounds – an operation that is crucially needed
when performing symbolic trajectory evaluation. However, making the lattice
of values sparse and shallow comes at the cost of losing precision of reasoning.
By atomizing words based on their actual usage in an RTL design, and by
abstracting values of atoms wherein some bits are set to X and the others are
set to 0 or 1, we strike a balance between depth and density of the lattice of
values on one hand, and precision of reasoning on the other.
4 Symbolic simulation with invalid-bit encoding
As mentioned earlier, an m-bit atom can be encoded with m + 1 bits by as-
sociating an “invalid bit” with the atom. For notational convenience, we use
val(a) to denote the value of the m bits constituting atom a, and inv(a) to de-
note the value of its invalid bit. Thus, an m-bit atom a is encoded as a pair
(val(a), inv(a)), where val(a) is a bit-vector of width m, and inv(a) is of Boolean
type. Given (val(a), inv(a)), the value of a is given by ite(inv(a),X, val(a)), where
“ite” denotes the usual “if-then-else” operator. For clarity of exposition, we call
this encoding “invalid-bit encoding”. Note that invalid-bit encoding differs from
dual-rail encoding even when m = 1. Specifically, if a 1-bit atom a has the value
X, we can use either (0, true) or (1, true) for (val(a), inv(a)) in invalid-bit encod-
ing. In contrast, there is a single value, namely (a0, a1) = (1, 1), that encodes the
value X of a in dual-rail encoding. We will see in Section 4.2 how this degree of
freedom in invalid-bit encoding of X can be exploited to simplify the symbolic
simulation of word-level operations on invalid-bit-encoded operands, and also to
simplify the computation of least upper bounds.
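As a small illustration of invalid-bit encoding, the following Python sketch decodes a pair (val(a), inv(a)) into an atom value and compares the number of bits needed with dual-rail encoding. The sentinel `X` and the function names are assumptions made for this sketch only.

```python
X = "X"  # hypothetical sentinel standing for the abstract value X


def decode(val, inv):
    """Value of an atom from its invalid-bit encoding: ite(inv, X, val)."""
    return X if inv else val


def invalid_bit_width(m):
    """Bits needed for an m-bit atom under invalid-bit encoding."""
    return m + 1


def dual_rail_width(m):
    """Bits needed for an m-bit atom under dual-rail encoding."""
    return 2 * m
```

Both `decode(0, True)` and `decode(1, True)` yield X, which is the degree of freedom noted above: when the invalid bit is set, the value component is irrelevant.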
Symbolic simulation is a key component of symbolic trajectory evaluation. In
order to symbolically simulate an RTL design in which every atom is invalid-bit
encoded, we must first determine the semantics of word-level RTL operators with
respect to invalid-bit encoding. Towards this end, we describe below a generic
technique for computing the value component of the invalid-bit encoding of the
result of applying a word-level RTL operator. Subsequently, we discuss how the
invalid-bit component of the encoding is computed.
4.1 Symbolically simulating values
Let op be a word-level RTL operator of arity k, and let res be the result of
applying op on $v_1, v_2, \ldots, v_k$, i.e., $res = op(v_1, v_2, \ldots, v_k)$. For each i in {1, ..., k},
suppose the bit-width of operand $v_i$ is $m_i$, and suppose the bit-width of res is
$m_{res}$. We assume that each operand is invalid-bit encoded, and we are interested
in computing the invalid-bit encoding of a specified slice of the result, say
res[q : p], where $0 \le p \le q \le m_{res} - 1$. Let
$\langle op \rangle : \{0,1\}^{m_1} \times \cdots \times \{0,1\}^{m_k} \to \{0,1\}^{m_{res}}$
denote the RTL semantics of op. For example, if op denotes 32-bit unsigned
addition, then $\langle op \rangle$ is the function that takes two 32-bit operands and returns
their 32-bit unsigned sum. The following lemma states that val(res[q : p]) can
be computed if we know $\langle op \rangle$ and $\mathrm{val}(v_i)$, for every i ∈ {1, ..., k}. Significantly,
we do not need $\mathrm{inv}(v_i)$ for any i ∈ {1, ..., k} to compute val(res[q : p]).
Lemma 1. Let $v = \left(\langle op \rangle(\mathrm{val}(v_1), \mathrm{val}(v_2), \ldots, \mathrm{val}(v_k))\right)[q : p]$. Then val(res[q : p])
is given by v, where $res = op(v_1, v_2, \ldots, v_k)$.
Proof. By definition of invalid-bit encoding, if inv(res[q : p]) is true, the value of
val(res[q : p]) does not matter. Hence, we focus on the case where inv(res[q : p]) is
false. By definition, in this case, res[q : p] has a value in $\{0,1\}^{q-p+1}$. If the invalid
bits of all operands $v_i$ are false, then
$\left(\langle op \rangle(\mathrm{val}(v_1), \mathrm{val}(v_2), \ldots, \mathrm{val}(v_k))\right)[q : p]$
clearly computes the value of val(res[q : p]). Otherwise, suppose $\mathrm{inv}(v_i)$ = true
for some i ∈ {1, ..., k}. By definition of invalid-bit encoding, $v_i$ can have any
value in $\{0,1\}^{m_i}$. However, since inv(res[q : p]) is false, it must be the case that
val(res[q : p]) has a well-defined value in $\{0,1\}^{q-p+1}$, regardless of what value $v_i$
takes in $\{0,1\}^{m_i}$. Therefore, we can set the value of $v_i$ to $\mathrm{val}(v_i)$ without affecting
the value of res[q : p]. By repeating this argument for all $v_i$ such that $\mathrm{inv}(v_i)$ is
true, we see that
$\left(\langle op \rangle(\mathrm{val}(v_1), \mathrm{val}(v_2), \ldots, \mathrm{val}(v_k))\right)[q : p]$ gives val(res[q : p]).
Lemma 1 tells us that when computing val(res[q : p]), we can effectively as-
sume that invalid-bit encoding is not used. This simplifies symbolic simulation
with invalid-bit encoding significantly. Note that this simplification would not
have been possible had we not had the freedom to ignore val(res[q : p]) when
inv(res[q : p]) is true.
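Lemma 1 can be illustrated with a short Python sketch over concrete (non-symbolic) values. The value component of a result slice is obtained by applying the ordinary RTL semantics to the value components of the operands and then slicing; invalid bits play no role. The helper names `val_of_slice` and `add32` are hypothetical, and bit-vectors are modelled as non-negative Python integers.

```python
def val_of_slice(op_semantics, operand_vals, q, p):
    """Value component of res[q:p], computed as in Lemma 1.

    op_semantics plays the role of <op>; operand_vals are val(v_1), ..., val(v_k).
    Invalid bits of the operands are deliberately not consulted.
    """
    full = op_semantics(*operand_vals)
    return (full >> p) & ((1 << (q - p + 1)) - 1)


def add32(a, b):
    """RTL semantics of 32-bit unsigned addition (result truncated to 32 bits)."""
    return (a + b) & 0xFFFFFFFF
```

For example, `val_of_slice(add32, [0x12, 0x34], 7, 4)` extracts bits 7 down to 4 of the sum 0x46, which is 4.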
4.2 Symbolically simulating invalid bits
We now turn to computing inv(res[q : p]). Unfortunately, computing inv(res[q :
p]) precisely is difficult and involves operator-specific functions that are often
complicated. We therefore choose to approximate inv(res[q : p]) in a sound man-
ner with functions that are relatively easy to compute. Specifically, we allow
inv(res[q : p]) to evaluate to true (denoting res[q : p] = X) even in cases where
a careful calculation would have shown that op(v1, v2, . . . vk) is not X. How-
ever, we never set inv(res[q : p]) to false if any bit in res[q : p] can take the
value X in a bit-blasted evaluation of res. Striking a fine balance between the
precision and computational efficiency of the sound approximations is key to
building a practically useful symbolic simulator using invalid-bit encoding. Our
experience indicates that simple and sound approximations of inv(res[q : p]) can
often be carefully chosen to serve our purpose. While we have derived templates
for approximating inv(res[q : p]) for res obtained by applying all word-level
RTL operators that appear in our benchmarks, we cannot present all of them
in detail here due to space constraints. We present below a discussion of how
inv(res[q : p]) is approximated for a subset of important RTL operators. Impor-
tantly, we use a recursive formulation for computing inv(res[q : p]). This allows
us to recursively compute invalid bits of atoms obtained by applying complex
sequences of word-level operations to a base set of atoms.
Word-level addition. Let $+_m$ denote an m-bit addition operator. Thus, if a
and b are m-bit operands, $a +_m b$ generates an m-bit sum and a 1-bit carry.
Let the carry generated after adding the least significant r bits of the operands
be denoted $carry_r$. We discuss below how to compute sound approximations of
inv(sum[q : p]) and $\mathrm{inv}(carry_r)$, where 0 ≤ p ≤ q ≤ m − 1 and 1 ≤ r ≤ m.
It is easy to see that the value of sum[q : p] is completely determined by
a[q : p], b[q : p] and $carry_p$. Therefore, we can approximate inv(sum[q : p]) as
follows:
$$\mathrm{inv}(sum[q : p]) = \mathrm{inv}(a[q : p]) \lor \mathrm{inv}(b[q : p]) \lor \mathrm{inv}(carry_p)$$
To see why the above approximation is sound, note that if all of inv(a[q : p]),
inv(b[q : p]) and $\mathrm{inv}(carry_p)$ are false, then a[q : p], b[q : p] and $carry_p$ must
have non-X values. Hence, there is no uncertainty in the value of sum[q : p] and
inv(sum[q : p]) = false. On the other hand, if any of inv(a[q : p]), inv(b[q : p]) or
$\mathrm{inv}(carry_p)$ is true, there is uncertainty in the value of sum[q : p].
The computation of $\mathrm{inv}(carry_p)$ (or $\mathrm{inv}(carry_r)$) is interesting, and deserves
special attention. We identify three cases below, and argue that $\mathrm{inv}(carry_p)$ is
false in each of these cases. In the following, 0 denotes the p-bit constant 00 · · · 0.
1. If $(\mathrm{inv}(a[p-1 : 0]) \lor \mathrm{inv}(b[p-1 : 0]))$ = false, then both inv(a[p − 1 : 0])
and inv(b[p − 1 : 0]) must be false. Therefore, there is no uncertainty in the
values of either a[p − 1 : 0] or b[p − 1 : 0], and $\mathrm{inv}(carry_p)$ = false.
2. If $(\lnot \mathrm{inv}(a[p-1 : 0]) \land (\mathrm{val}(a[p-1 : 0]) = 0))$, then the least significant p
bits of val(a) are all 0. Regardless of val(b), it is easy to see that in this case,
$\mathrm{val}(carry_p) = 0$ and $\mathrm{inv}(carry_p)$ = false.
3. This is the symmetric counterpart of the case above, i.e.,
$(\lnot \mathrm{inv}(b[p-1 : 0]) \land (\mathrm{val}(b[p-1 : 0]) = 0))$.
We now approximate $\mathrm{inv}(carry_p)$ by combining the conditions corresponding to
the three cases above. In other words,
$$\begin{aligned}
\mathrm{inv}(carry_p) = {} & \left(\mathrm{inv}(a[p-1 : 0]) \lor \mathrm{inv}(b[p-1 : 0])\right) \\
& \land \left(\mathrm{inv}(a[p-1 : 0]) \lor (\mathrm{val}(a[p-1 : 0]) \ne 0)\right) \\
& \land \left(\mathrm{inv}(b[p-1 : 0]) \lor (\mathrm{val}(b[p-1 : 0]) \ne 0)\right)
\end{aligned}$$
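The approximations for addition can be sketched in Python over concrete values. This sketch makes several assumptions that are not in the text: each operand is given as an integer value together with a list of (hi, lo) ranges of its atoms whose invalid bit is set, the invalid bit of a slice is taken to be true iff the slice overlaps such an atom, and the carry into bit position 0 is taken to be a known 0. All function names are hypothetical.

```python
def inv_slice(invalid_atoms, q, p):
    """True iff slice [q:p] overlaps an atom whose invalid bit is set.

    invalid_atoms is an iterable of (hi, lo) bit ranges (assumed representation).
    """
    return any(lo <= q and p <= hi for hi, lo in invalid_atoms)


def low_bits(val, p):
    """Integer value of the least significant p bits of val."""
    return val & ((1 << p) - 1)


def inv_carry(a_val, a_invalid, b_val, b_invalid, p):
    """Approximation of inv(carry_p) combining the three cases in the text."""
    if p == 0:
        return False  # assumption of this sketch: no carry into bit 0
    ia = inv_slice(a_invalid, p - 1, 0)
    ib = inv_slice(b_invalid, p - 1, 0)
    return ((ia or ib)
            and (ia or low_bits(a_val, p) != 0)
            and (ib or low_bits(b_val, p) != 0))


def inv_sum(a_val, a_invalid, b_val, b_invalid, q, p):
    """Approximation of inv(sum[q:p])."""
    return (inv_slice(a_invalid, q, p)
            or inv_slice(b_invalid, q, p)
            or inv_carry(a_val, a_invalid, b_val, b_invalid, p))
```

As a usage example, take 8-bit operands where a is fully valid with its low nibble equal to 0, and the low nibble of b is X. Then `inv_carry(0x50, [], 0x23, [(3, 0)], 4)` is false by the second case, so the upper nibble of the sum remains valid even though the lower nibble does not.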
Word-level division. Let $\div_m$ denote an m-bit division operator; this is among
the most complicated word-level RTL operators for which we have derived an
approximation of the invalid bit. If a and b are m-bit operands, $a \div_m b$ generates
an m-bit quotient, say quot, and an m-bit remainder, say rem. We wish to
compute inv(quot[q : p]) and inv(rem[q : p]), where 0 ≤ p ≤ q ≤ m − 1. We assume
that if inv(b) is false, then b ≠ 0; the case of $a \div_m b$ with (val(b), inv(b)) = (0, false)
leads to a “divide-by-zero” exception, and is assumed to be handled separately.
The following expressions give sound approximations for inv(quot[q : p]) and
inv(rem[q : p]). In these expressions, we assume that i is a non-negative integer
such that $2^i \le \mathrm{val}(b) < 2^{i+1}$.
$$\begin{aligned}
\mathrm{inv}(quot[q : p]) &= \mathrm{ite}(\mathrm{inv}(b),\ temp_1,\ temp_2), \text{ where} \\
temp_1 &= \mathrm{inv}(a) \lor (\mathrm{val}(a[m-1 : p]) \ne 0), \\
temp_2 &= \mathrm{ite}(\mathrm{val}(b) = 2^i,\ temp_3,\ (i < p) \lor \mathrm{inv}(a[m-1 : p])), \\
temp_3 &= (p + i \le m - 1) \land \mathrm{inv}(a[\min(q + i, m - 1) : p + i]) \\
\mathrm{inv}(rem[q : p]) &= \mathrm{inv}(b) \lor \mathrm{ite}(\mathrm{val}(b) = 2^i,\ (i > p) \land \mathrm{inv}(a[\min(q, i - 1) : p]),\ i \ge p)
\end{aligned}$$
Note that the constraint $2^i \le \mathrm{val}(b) < 2^{i+1}$ in the above formulation refers
to a fresh variable i that does not appear in the RTL. We will see later in
Section 5 that a word-level STE problem is solved by generating a set of word-
level constraints, every satisfying assignment of which gives a counter-example
to the verification problem. We add constraints like $2^i \le \mathrm{val}(b) < 2^{i+1}$ in the
above formulation, to the set of word-level constraints generated for an STE
problem. This ensures that every assignment of i in a counterexample satisfies
the required constraints on i.
To see why the above approximations for inv(quot[q : p]) and inv(rem[q : p])
are sound, first consider the case where inv(b) = true. Since we are unsure of
the value of the divisor, not much can be said about the remainder. So, we set
inv(rem[q : p]) to true. The situation is slightly better for the quotient. If we
know that inv(a) = false, then since the quotient of integer division is never
larger than the dividend, we can infer that quot[q : p] = 0 if a[m − 1 : p] = 0.
Clearly, in this case inv(quot[q : p]) = false. In all other sub-cases of inv(b) = true,
we set inv(quot[q : p]) to true.
If inv(b) = false, we know that b has a value in $\{0,1\}^m$, but not 0. Representing
bit vectors by their integer representations, let i ∈ {0, ..., m − 1} be such
that $2^i \le \mathrm{val}(b) < 2^{i+1}$. We consider two sub-cases below.
– $\mathrm{val}(b) = 2^i$: In this case, $a \div_m b$ effectively shifts a right by i bit positions, and
the least significant i bits of a form the remainder. Therefore, val(quot[q : p])
is a[q + i : p + i] if q + i ≤ m − 1, is a[m − 1 : p + i] padded to the left with
q − m + i + 1 0s if q + i > m − 1 ≥ p + i, and is 0 if p + i > m − 1. It follows
that if p + i > m − 1, then val(quot[q : p]) = 0 and inv(quot[q : p]) = false.
Otherwise, inv(quot[q : p]) = inv(a[k : p + i]), where k = min(q + i, m − 1). It
is easy to see that val(rem[q : p]) is a[q : p] if i > q, is a[i − 1 : p] padded with
q − i + 1 0s to the left if q ≥ i > p, and is 0 if i ≤ p. By similar reasoning, if
i ≤ p, then inv(rem[q : p]) = false; otherwise, inv(rem[q : p]) = inv(a[k : p]),
where k = min(q, i − 1).
– $2^i < \mathrm{val}(b) < 2^{i+1}$: In this case, we show below that if i ≥ p, then
inv(quot[q : p]) can be approximated by inv(a[m − 1 : p]). If i < p, then
inv(rem[q : p]) = false. In all other cases, we approximate inv(quot[q : p])
and inv(rem[q : p]) by true.
To see why the above approximations are sound, note that val(a) can be
written as $a_1 \cdot 2^p + a_2$, where $a_1$ and $a_2$ are the integer representations of
a[m − 1 : p] and a[p − 1 : 0], respectively. Clearly, $0 \le a_2 < 2^p$. Considering
quotients and remainders on division by val(b), suppose $a_1 = k_1 \cdot \mathrm{val}(b) + r_1$
and $a_2 = k_2 \cdot \mathrm{val}(b) + r_2$, where $0 \le r_1, r_2 < \mathrm{val}(b)$ and $k_1, k_2 \ge 0$. Suppose
further that $2^p \cdot r_1 + r_2 = k_3 \cdot \mathrm{val}(b) + r_3$, where $0 \le r_3 < \mathrm{val}(b)$ and $k_3 \ge 0$.
It is an easy exercise to see that the quotient of dividing val(a) by val(b) is
$2^p \cdot k_1 + k_2 + k_3$, and the remainder is $r_3$. Thus, $\mathrm{val}(quot) = 2^p \cdot k_1 + k_2 + k_3$
and $\mathrm{val}(rem) = r_3$. We discuss what happens when i ≥ p and when i + 1 ≤ p.
• If i ≥ p, then $\mathrm{val}(b) > 2^i \ge 2^p > a_2$. Since $\mathrm{val}(b) > a_2$, we have $k_2 = 0$
and $r_2 = a_2 < 2^p$. It follows that $quot = 2^p \cdot k_1 + k_3$. If $k_3 < 2^p$,
then quot[q : p] depends only on $k_1$, which in turn, depends only on
a[m − 1 : p] and val(b). Therefore, inv(quot[q : p]) can be approximated
by inv(a[m − 1 : p]).
We now show that $k_3$ is indeed strictly less than $2^p$. Since
$2^p \cdot r_1 + r_2 = k_3 \cdot \mathrm{val}(b) + r_3$, rearranging terms, we get
$k_3 \cdot \mathrm{val}(b) - 2^p \cdot r_1 = r_2 - r_3$.
If possible, let $k_3 = 2^p + d$, where d ≥ 0. Substituting for $k_3$, we get
$2^p \cdot (\mathrm{val}(b) - r_1) + d \cdot \mathrm{val}(b) = r_2 - r_3$. Since $\mathrm{val}(b) > r_1$, the left hand side
of the above equation is at least as large as $2^p$, while the right hand side
is at most $r_2$, which, in turn, is less than $2^p$. This gives a contradiction,
and therefore, $k_3 < 2^p$.
• If i < p, we have $rem = r_3 < \mathrm{val}(b) < 2^{i+1} \le 2^p$. Therefore,
val(rem[q : p]) = 0, and inv(rem[q : p]) = false.
The above analysis yields the sound approximations for inv(quot[q : p]) and
inv(rem[q : p]) discussed above.
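The arithmetic facts used in this argument can be checked exhaustively for small word widths. The following Python sketch is a brute-force sanity check, not part of the method itself; the function name `check_division_decomposition` is hypothetical. It enumerates all m-bit dividends, all non-zero m-bit divisors and all positions p, and asserts the decomposition of the quotient and remainder together with the two sub-case claims for divisors that are not powers of two.

```python
def check_division_decomposition(m):
    """Exhaustively check the quotient/remainder decomposition for m-bit words.

    Returns the number of (a, b, p) triples examined; raises AssertionError
    if any claim fails.
    """
    checked = 0
    for b in range(1, 2 ** m):
        i = b.bit_length() - 1              # 2^i <= b < 2^(i+1)
        for p in range(m):
            for a in range(2 ** m):
                a1, a2 = a >> p, a & ((1 << p) - 1)
                k1, r1 = divmod(a1, b)
                k2, r2 = divmod(a2, b)
                k3, r3 = divmod((r1 << p) + r2, b)
                # Decomposition of quotient and remainder.
                assert a // b == (k1 << p) + k2 + k3
                assert a % b == r3
                if b != (1 << i):           # sub-case 2^i < b < 2^(i+1)
                    if i >= p:
                        assert k2 == 0 and k3 < (1 << p)
                        # quot[m-1:p] is determined by a[m-1:p] and b.
                        assert (a // b) >> p == k1
                    else:
                        # rem < 2^p, so all bits of rem from p upward are 0.
                        assert (a % b) >> p == 0
                checked += 1
    return checked
```

Calling `check_division_decomposition(4)` examines every triple for 4-bit words; larger widths can be tried at a cost that grows as roughly m · 4^m.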
If-then-else statements. Consider a conditional assignment statement “if
(BoolExpr) then x = Exp1; else x = Exp2;”. Symbolically simulating this
statement gives x = ite(BoolExpr,Exp1,Exp2). The following gives a sound ap-
proximation of inv(x[q : p]).
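$$\begin{aligned}
\mathrm{inv}(x[q : p]) &= \mathrm{ite}(\mathrm{inv}(\mathit{BoolExpr}),\ temp_1,\ temp_2), \text{ where} \\
temp_1 &= \mathrm{inv}(\mathit{Exp1}[q : p]) \lor \mathrm{inv}(\mathit{Exp2}[q : p]) \lor (\mathrm{val}(\mathit{Exp1}[q : p]) \ne \mathrm{val}(\mathit{Exp2}[q : p])) \\
temp_2 &= \mathrm{ite}(\mathrm{val}(\mathit{BoolExpr}),\ \mathrm{inv}(\mathit{Exp1}[q : p]),\ \mathrm{inv}(\mathit{Exp2}[q : p]))
\end{aligned}$$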
To see why the above approximation of inv(x[q : p]) is sound, let x =
ite(BoolExpr, Exp1, Exp2), where BoolExpr is a Boolean expression, and Exp1 and
Exp2 are expressions of the same type as x. To compute inv(x[q : p]), we note that
if inv(BoolExpr) = false, then inv(x[q : p]) is simply
ite(val(BoolExpr), inv(Exp1[q : p]), inv(Exp2[q : p])). However, if inv(BoolExpr) = true,
then the value of BoolExpr could be 1 (denoting true) or 0 (denoting false).
Interestingly, if both inv(Exp1[q : p]) and inv(Exp2[q : p]) are false (i.e., neither
Exp1[q : p] nor Exp2[q : p] is X), and if val(Exp1[q : p]) = val(Exp2[q : p]), then
regardless of the value of BoolExpr, we have inv(x[q : p]) = false. This is
formalized in the approximation for inv(x[q : p]) mentioned above.
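Over concrete values, the approximation for conditional assignments can be sketched in Python as below. The function name `inv_ite` and the flat argument list are assumptions of this sketch; the value and invalid-bit arguments stand for the components of BoolExpr, Exp1[q : p] and Exp2[q : p].

```python
def inv_ite(cond_val, cond_inv, e1_val, e1_inv, e2_val, e2_inv):
    """Approximation of inv(x[q:p]) for x = ite(BoolExpr, Exp1, Exp2)."""
    if cond_inv:
        # Condition is X: the result is known only if both branches are
        # valid and agree on the selected slice.
        return e1_inv or e2_inv or (e1_val != e2_val)
    # Condition is known: the invalid bit of the selected branch is used.
    return e1_inv if cond_val else e2_inv
```

For example, `inv_ite(1, True, 5, False, 5, False)` is false: the condition is unknown, but both branches yield the same valid value.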
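Bit-wise logical operations. Let $\lnot_m$ and $\land_m$ denote bit-wise negation and
conjunction operators respectively, for m-bit words. If a, b, c and d are m-bit
words such that $c = \lnot_m a$ and $d = a \land_m b$, it is easy to see that the following give
sound approximations of inv(c) and inv(d).
$$\begin{aligned}
\mathrm{inv}(c[q : p]) = {} & \mathrm{inv}(a[q : p]) \\
\mathrm{inv}(d[q : p]) = {} & \left(\mathrm{inv}(a[q : p]) \lor \mathrm{inv}(b[q : p])\right) \\
& \land \left(\mathrm{inv}(a[q : p]) \lor (\mathrm{val}(a[q : p]) \ne 0)\right) \\
& \land \left(\mathrm{inv}(b[q : p]) \lor (\mathrm{val}(b[q : p]) \ne 0)\right)
\end{aligned}$$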
The invalid bits of other bit-wise logical operators (like disjunction, xor, nor,
nand, etc.) can be obtained by first expressing them in terms of ¬m and ∧m and
then using the above approximations.
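A concrete-value sketch of these approximations in Python is given below, including disjunction derived by rewriting a ∨ b as ¬(¬a ∧ ¬b). The function names and the explicit `width` parameter are assumptions of this sketch; the arguments stand for the value and invalid-bit components of the slices a[q : p] and b[q : p].

```python
def inv_not(a_inv):
    """Negation preserves the invalid bit of its operand."""
    return a_inv


def inv_and(a_val, a_inv, b_val, b_inv):
    """Approximation of the invalid bit of a bit-wise conjunction.

    The result is valid if both operands are valid, or if either operand
    is valid and equal to 0 (the conjunction is then 0 regardless).
    """
    return ((a_inv or b_inv)
            and (a_inv or a_val != 0)
            and (b_inv or b_val != 0))


def inv_or(a_val, a_inv, b_val, b_inv, width):
    """Disjunction via a | b = ~(~a & ~b), reusing inv_not and inv_and."""
    mask = (1 << width) - 1
    return inv_not(inv_and(~a_val & mask, inv_not(a_inv),
                           ~b_val & mask, inv_not(b_inv)))
```

Under this derivation, a disjunction is reported valid when one operand is valid and all ones, mirroring the all-zeros case for conjunction.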
Memory/array reads and updates. Let A be a 1-dimensional array, i be an
index expression, x be a variable, and Exp be an expression of the base type
of A. On symbolically simulating the RTL statement “x = A[i];”, we update
the value of x to read(A, i), where the read operator is as in the extensional the-
ory of arrays (see [12] for details). Similarly, on symbolically simulating the RTL
statement “A[i] = Exp”, we update the value of array A to update(Aorig, i,Exp),
where Aorig is the (array-typed) expression for A prior to simulating the state-
ment, and the update operator is as in the extensional theory of arrays.
Since the expression for a variable or array obtained by symbolic simulation
may now have read and update operators, we must find ways to compute sound
approximations of the invalid bit for expressions of the form inv(read(A, i)[q : p]).
Note that since A is an array, the symbolic expression for A is either (i) Ainit, i.e.
the initial value of A at the start of symbolic simulation, or (ii) update(A′, i′,Exp′)
for some expressions A′, i′ and Exp′, where A′ has the same array-type as A, i′ has
an index type, and Exp′ has the base type of A. For simplicity of exposition, we
assume that all arrays are either completely initialized or completely uninitialized
at the start of symbolic simulation. The invalid bit in case (i) is then easily seen
to be true if Ainit denotes an uninitialized array, and false otherwise. In case (ii),
let v denote read(A, i). The invalid bit of v[q : p] can then be approximated as:
inv(v[q : p]) = inv(i) ∨ inv(i′) ∨ ite(val(i) = val(i′), inv(Exp′[q : p]), temp), where
temp = inv(read(A′, i)[q : p]).
To see why the above expression gives a sound approximation of inv(v[q : p]),
note that if either i or i′ is X (i.e. the corresponding invalid bit is true), we
conservatively set inv(read(update(A′, i′, Exp′), i)[q : p]) to true. If neither i nor i′ is X,
there are two cases to consider.
– If val(i) = val(i′), then read(update(A′, i′, Exp′), i) = Exp′. Hence, the required
invalid bit is inv(Exp′[q : p]).
– If val(i) ≠ val(i′), then read(update(A′, i′, Exp′), i) = read(A′, i). Hence, the
required invalid bit is inv(read(A′, i)[q : p]).
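The recursion over nested update expressions can be sketched as follows. The classes `Init` and `Update` and the function `read_inv` are hypothetical names introduced for illustration; indices and stored values are concrete (val, inv) pairs, slices are ignored, and case (i) is implemented exactly as stated in the text (it depends only on whether the initial array is initialized).

```python
from dataclasses import dataclass
from typing import NamedTuple, Union


class IB(NamedTuple):
    """Invalid-bit encoded value; `val` is meaningful only when `inv` is False."""
    val: int
    inv: bool


@dataclass(frozen=True)
class Init:
    """Case (i): the initial array, either fully initialized or fully uninitialized."""
    initialized: bool


@dataclass(frozen=True)
class Update:
    """Case (ii): update(array, index, value)."""
    array: "ArrayExpr"
    index: IB
    value: IB


ArrayExpr = Union[Init, Update]


def read_inv(a: ArrayExpr, i: IB) -> bool:
    """Sound approximation of inv(read(a, i))."""
    if isinstance(a, Init):
        return not a.initialized
    if i.inv or a.index.inv:
        return True               # either index is X: be conservative
    if i.val == a.index.val:
        return a.value.inv        # the read hits the most recent write
    return read_inv(a.array, i)   # the read falls through to the older array


# A[1] = X; A[2] = 9; on an initialized array.
arr = Update(Update(Init(True), IB(1, False), IB(7, True)), IB(2, False), IB(9, False))
```

With this `arr`, reading index 2 is valid, reading index 1 is X (an X value was stored there), and reading index 0 falls through to the initialized array.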
If the RTL design has multi-dimensional arrays, we simply treat them as ar-
rays of arrays, and apply the same reasoning as above. For example, if B is
a two-dimensional array, the RTL statement “B[i][j] = Exp;” updates the
symbolic value of array B to update(Borig, i, update(read(Borig, i), j, Exp)), where
Borig is the symbolic expression for B prior to simulating the RTL statement.
Similarly, the RTL statement “x = B[i][j];” updates the symbolic value of x
to read(read(B, i), j).
Shift operations. We discuss below the left-shift operation; the case of the
right-shift operation can be analyzed similarly. A shift operation can specify ei-
ther a constant number of bit positions to shift, or a variable number of positions
to shift. We analyze these two cases separately since shifting by a variable num-
ber of positions does not allow us to statically identify the operand’s bit-slices of
interest. In either case, we assume that a left shift operation pads 0s in the least
significant shifted positions. Let ≪k denote a unary left-shift operator of the first
kind, where k is a positive integer constant, and let ≪ denote a binary left-shift
operator of the second kind. Let a, b, c, d be m-bit words such that b = ≪k a
and c = a ≪ d. For simplicity of presentation, we assume no wrap-around in
shifting; the case of wrap-around can be analyzed in a similar way. The follow-
ing equations give sound approximations of inv(b[q : p]) and inv(c[q : p]), where
0 ≤ p ≤ q ≤ m− 1.
inv(b[q : p]) = ite(p ≥ k, inv(a[q − k : p − k]), temp), where
temp = ite(q ≥ k, inv(a[q − k : 0]), false)    (1)
inv(c[q : p]) = inv(a[q : 0]) ∧ (inv(d) ∨ (val(d) ≤ q))    (2)
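Equation (1) can be illustrated on a concrete operand. In this sketch, which covers only the constant-shift case, we assume (this is not from the text) that the invalidity of a is tracked per bit in a list `bit_inv` with index 0 as the least significant bit, and that a slice is X whenever any of its bits is X. The names `slice_inv` and `shl_const_inv` are hypothetical.

```python
from typing import List


def slice_inv(bit_inv: List[bool], q: int, p: int) -> bool:
    """Assumed model of inv(a[q : p]): some bit in positions p..q is X."""
    return any(bit_inv[p:q + 1])


def shl_const_inv(bit_inv_a: List[bool], k: int, q: int, p: int) -> bool:
    """inv(b[q : p]) for b = a shifted left by the constant k, as in equation (1)."""
    if p >= k:
        # The whole slice comes from a[q - k : p - k].
        return slice_inv(bit_inv_a, q - k, p - k)
    if q >= k:
        # The low part is padded with 0s; the rest comes from a[q - k : 0].
        return slice_inv(bit_inv_a, q - k, 0)
    # The slice lies entirely in the zero-padded region.
    return False


# 8-bit operand in which only bit 2 is X, shifted left by k = 3.
a_inv = [False, False, True, False, False, False, False, False]
```

Bit 2 of a lands at bit 5 of b, so `shl_const_inv(a_inv, 3, 5, 5)` is true, while any slice inside b[2 : 0] is valid because those positions are padded with 0s.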
4.3 Computing least upper bounds
Let a = (val(a), inv(a)) and b = (val(b), inv(b)) be invalid-bit encoded elements
in the lattice of values for an m-bit atom. We define c = lub(a, b) as follows.
(a) If ¬inv(a) ∧ ¬inv(b) ∧ (val(a) ≠ val(b)), then c = ⊤.
(b) Otherwise, inv(c) = inv(a) ∧ inv(b) and val(c) = ite(inv(a), val(b), val(a)) (or
equivalently val(c) = ite(inv(b), val(a), val(b))).
Note the freedom in defining val(c) in case (b) above. This freedom comes from
the observation that if inv(c) = true, the value of val(c) is irrelevant. Furthermore,
if the condition in case (a) is not satisfied and if both inv(a) and inv(b) are false,
then val(b) = val(c). This allows us to simplify the expression for val(c) on-the-fly
by replacing it with val(b), if needed.
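The definition translates directly into a few lines of code. The sketch below works on concrete (val, inv) pairs; the type `IB`, the sentinel `TOP` and the function `lub` are hypothetical names, and ⊤ is represented by a distinct Python object rather than by an encoding in the lattice.

```python
from typing import NamedTuple, Union


class IB(NamedTuple):
    """Invalid-bit encoded value; `val` is meaningful only when `inv` is False."""
    val: int
    inv: bool


class _Top:
    """Sentinel standing for the over-constrained lattice element ⊤."""
    def __repr__(self) -> str:
        return "TOP"


TOP = _Top()


def lub(a: IB, b: IB) -> Union[IB, _Top]:
    """Least upper bound of two invalid-bit encoded values."""
    # Case (a): both values are known and they disagree.
    if not a.inv and not b.inv and a.val != b.val:
        return TOP
    # Case (b): X is the bottom element, so a known value wins over X.
    return IB(val=b.val if a.inv else a.val, inv=a.inv and b.inv)
```

For instance, `lub(IB(3, False), IB(0, True))` is the known value 3, whereas `lub(IB(3, False), IB(4, False))` is `TOP`.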
5 Word-level STE
In this section, we briefly review the general theory of STE [10] instantiated
to the lattice of values of atoms. An RTL design C consists of inputs, outputs
and internal words. We treat bit-level signals as 1-bit words, and uniformly talk
of words. Every input, output and internal word is assumed to be atomized
as described in Section 2. Every atom of bit-width m takes values from the
set {0, ..., 2^m − 1, X}, where constant bit-vectors have been represented by their
integer values. The values themselves are ordered in a lattice as discussed in
Section 3. Let ≤m denote the ordering relation and ⊔m denote the lub operator
in the lattice of values for an m-bit atom. The lattice of values for a word is the
product of lattices corresponding to every atom in the word. Let A denote the
collection of all atoms in the design, and let D denote the collection of values
of all atoms in A. A state of the design is a mapping s : A → D ∪ {⊤} such that
if a ∈ A is an m-bit atom, then s(a) is a value in the set {0, ..., 2^m − 1, X, ⊤}.
Let S denote the set of all states of the design. Clearly S forms a lattice – one
that is isomorphic to the product of lattices corresponding to the atoms in A.
Given a design C, let TrC : S → S define the transition function of C.
Thus, given a state s of C at time t, the next state of the design at time t + 1
is given by TrC(s). To model the behavior of a design over time, we define a
sequence of states as a mapping σ : N → S, where N denotes the set of natural
numbers. A trajectory for a design C is a sequence σ such that for all t ∈ N,
TrC(σ(t)) ⊑ σ(t + 1). Given two sequences σ1 and σ2, we abuse notation and say
that σ1 ⊑ σ2 iff for every t ∈ N, σ1(t) ⊑ σ2(t).
The general trajectory evaluation logic of Seger and Bryant [10] can be in-
stantiated to words as follows. A trajectory formula is a formula generated by
the grammar ϕ ::= a is val | ϕ and ϕ | P → ϕ | Nϕ, where a is an atom of C,
val is a non-X, non-⊤ value in the lattice of values for a, and P is a quantifier-
free formula in the theory of bit-vectors. Formulas like P in the grammar above
are also called guards in STE parlance.
Following Seger et al. [2,11], the defining sequence of a trajectory formula ψ
given the assignment φ, denoted [ψ]^φ, is defined inductively as follows. Here, b
denotes an arbitrary m-bit atom in A and t ∈ N.
– [a is val]^φ(t)(b) ≜ val if t = 0 and both a, b denote the same m-bit atom,
and is X otherwise.
– [ψ1 and ψ2]^φ(t)(b) ≜ [ψ1]^φ(t)(b) ⊔m [ψ2]^φ(t)(b)
– [P → ψ]^φ(t)(b) ≜ [ψ]^φ(t)(b) if φ |= P, and is X otherwise.
– [Nψ]^φ(t)(b) ≜ [ψ]^φ(t − 1)(b) if t ≠ 0, and is X otherwise.
Similarly, the defining trajectory of ψ with respect to a design C, denoted [[ψ]]^φ_C,
can be defined as follows.
– [[ψ]]^φ_C(0) ≜ [ψ]^φ(0)
– [[ψ]]^φ_C(t + 1) ≜ [ψ]^φ(t + 1) ⊔ TrC([[ψ]]^φ_C(t)) for every t ∈ N.
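The two inductive definitions can be evaluated directly for a fixed assignment φ. The sketch below is a toy interpreter under our own assumptions: trajectory formulas are nested tuples, guards are Python predicates over φ, lattice values are integers or the strings "X" and "TOP", and the one-register design `tr_register` is a hypothetical example, not one of the benchmark designs.

```python
X, TOP = "X", "TOP"  # assumed encodings of the lattice elements X and ⊤


def lub(u, v):
    """Least upper bound in the lattice X ⊑ 0, ..., 2^m − 1 ⊑ ⊤."""
    if u == X:
        return v
    if v == X:
        return u
    return u if u == v else TOP


def defseq(psi, phi, t, b):
    """Defining sequence of psi under phi, at time t, for atom b."""
    kind = psi[0]
    if kind == "is":                      # ("is", atom, value)
        _, a, val = psi
        return val if (t == 0 and a == b) else X
    if kind == "and":                     # ("and", psi1, psi2)
        return lub(defseq(psi[1], phi, t, b), defseq(psi[2], phi, t, b))
    if kind == "guard":                   # ("guard", predicate, psi)
        return defseq(psi[2], phi, t, b) if psi[1](phi) else X
    if kind == "next":                    # ("next", psi)
        return defseq(psi[1], phi, t - 1, b) if t != 0 else X
    raise ValueError(f"unknown formula kind: {kind}")


def deftraj(psi, phi, tr, atoms, t):
    """State of the defining trajectory of psi w.r.t. design `tr` at time t."""
    state = {b: defseq(psi, phi, 0, b) for b in atoms}
    for step in range(1, t + 1):
        driven = tr(state)
        state = {b: lub(defseq(psi, phi, step, b), driven[b]) for b in atoms}
    return state


def tr_register(state):
    """Hypothetical design: `out` latches `in`; the next `in` is undriven."""
    return {"in": X, "out": state["in"]}


drive = ("guard", lambda phi: phi["g"] == 1, ("is", "in", 5))
clash = ("and", ("is", "in", 5), ("next", ("is", "out", 6)))
```

Under φ = {g: 1}, the formula `drive` places 5 on `in` at time 0, and the design carries it to `out` at time 1. The formula `clash` additionally asserts 6 on `out` at time 1, which conflicts with the driven value 5 and yields ⊤.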
In symbolic trajectory evaluation, we are given an antecedent Ant and a con-
sequent Cons in trajectory evaluation logic. We are also given a quantifier-free
formula Constr in the theory of bit-vectors with free variables that appear in the
guards of Ant and/or Cons. We wish to determine if for every assignment φ that
satisfies Constr, we have [Cons]^φ ⊑ [[Ant]]^φ_C.
5.1 Implementation
We have developed a tool called STEWord that uses symbolic simulation with
invalid-bit encoding and SMT solving to perform STE. Each antecedent and
consequent tuple has the format (g, a, vexpr, start, end), where g is a guard, a
is the name of an atom in the design under verification, vexpr is a symbolic
expression over constants and guard variables that specifies the value of a, and
start and end denote time points such that end ≥ start + 1.
An antecedent tuple (g, a, vexpr, t1, t2) specifies that given an assignment φ
of guard variables, if φ |= g, then atom a is assigned the value of expression
vexpr, evaluated on satisfying assignments of φ, for all time in {t1, ..., t2 − 1}.
If, however, φ ⊭ g, atom a is assigned the value X for all time in {t1, ..., t2 − 1}.
If a is an input atom, the antecedent tuple effectively specifies how it is driven
from time t1 through t2 − 1. Using invalid-bit encoding, the above semantics
is easily implemented by setting inv(a) to ¬g and val(a) to vexpr from time t1
through t2 − 1. If a is an internal atom, the defining trajectory requires us to
compute the lub of the value driven by the circuit on a and the value specified by
the antecedent for a, at every time point in {t1, ..., t2 − 1}. The value driven by
the circuit on a at any time is computed by symbolic simulation using invalid-
bit encoding, as explained in Sections 4.1 and 4.2. The value driven by the
antecedent can also be invalid-bit encoded, as described above. Therefore, the
lub can be computed as described in Section 4.3. If the lub is not ⊤, val(a) and
inv(a) can be set to the value and invalid-bit, respectively, of the lub. In practice,
we assume that the lub is not ⊤ and proceed as above. The conditions under
which the lub evaluates to ⊤ are collected separately, as described below. The
values of all atoms that are not specified in any antecedent tuple are obtained
by symbolically simulating the circuit using invalid-bit encoding.
If the lub computed above evaluates to ⊤, we must set atom a to an unachievable
over-constrained value. This is called antecedent failure in STE parlance.
In our implementation, we collect the constraints (condition for case (a) in Sec-
tion 4.3) under which antecedent failure occurs for every antecedent tuple in a
set AntFail. Depending on the mode of verification, we do one of the following:
– If the disjunction of formulas in AntFail is satisfiable, we conclude that there
is an assignment of guard variables that leads to an antecedent failure. This
can then be viewed as a failed run of verification.
– We may also wish to check if [Cons]^φ ⊑ [[Ant]]^φ_C only for assignments φ that
do not satisfy any formula in AntFail. In this case, we conjoin the negation
of every formula in AntFail to obtain a formula, say NoAntFail, that defines
all assignments φ of interest.
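The antecedent-failure condition for a single tuple and time point can be sketched as follows. This is not the STEWord implementation: in place of an SMT solver, the sketch enumerates all assignments of a small guard variable, and the circuit behaviour, the guard and the names `assignments` and `ant_fail` are hypothetical.

```python
from itertools import product


def assignments(names, width):
    """Enumerate every assignment of `width`-bit values to the guard variables."""
    for values in product(range(1 << width), repeat=len(names)):
        yield dict(zip(names, values))


def ant_fail(g, vexpr, circ_val, circ_inv, phi):
    """Case (a) of Section 4.3 for the antecedent value (vexpr, ¬g) and the
    circuit-driven value (circ_val, circ_inv) under the assignment phi."""
    return g(phi) and not circ_inv(phi) and vexpr(phi) != circ_val(phi)


# Hypothetical internal atom: the circuit drives (u + 1) mod 4, and X when u = 3.
circ_val = lambda phi: (phi["u"] + 1) % 4
circ_inv = lambda phi: phi["u"] == 3
# Hypothetical antecedent tuple: guard u != 0, value expression 2.
g = lambda phi: phi["u"] != 0
vexpr = lambda phi: 2

all_phi = list(assignments(["u"], 2))
# Assignments satisfying the disjunction of formulas in AntFail.
failing = [phi for phi in all_phi if ant_fail(g, vexpr, circ_val, circ_inv, phi)]
# Assignments satisfying NoAntFail.
no_ant_fail = [phi for phi in all_phi if phi not in failing]
```

In this example the only failing assignment is u = 2, where the circuit drives the known value 3 while the antecedent demands 2. In the first verification mode this would be reported as a failed run; in the second, the check is restricted to `no_ant_fail`.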
A consequent tuple (g, a, vexpr, t1, t2) specifies that given an assignment φ
of guard variables, if φ |= g, then atom a must have its invalid bit set to false
and value set to vexpr, evaluated on satisfying assignments of φ, for all time in
{t1, ..., t2 − 1}. If φ ⊭ g, a consequent tuple imposes no requirement on the value
of atom a. Suppose that at time t, a consequent tuple specifies a guard g and a
value expression vexpr for an atom a. Suppose further that (val(a), inv(a)) gives
the invalid-bit encoded value of this atom at time t, as obtained from symbolic
simulation. Checking whether [Cons]^φ(t)(a) ⊑ [[Ant]]^φ_C(t)(a) for all assignments
φ reduces to checking the validity of the formula
g → (¬inv(a) ∧ (vexpr = val(a))).
Let us call this formula OK_{a,t}. Let T denote the set of all time points
specified in all consequent tuples, and let A denote the set of all atoms of the
design. The overall verification goal then reduces to checking the validity of the
formula OK ≜ ⋀_{t∈T, a∈A} OK_{a,t}. If we wish to focus only on assignments φ that
do not cause any antecedent failure, our verification goal is modified to check
the validity of NoAntFail → OK. In our implementation, we use Boolector [1],
a state-of-the-art solver for bit-vectors and the extensional theory of arrays, to
check the validity (or satisfiability) of all formulas OK generated by STEWord.
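The validity check for one consequent tuple at one time point can be illustrated without a solver. The sketch below replaces the SMT query with exhaustive enumeration over a small guard variable, which is only feasible for toy bit-widths; the simulated atom, the guards and the names `ok` and `counterexamples` are hypothetical.

```python
from itertools import product


def ok(g, vexpr, val_a, inv_a, phi):
    """The formula g → (¬inv(a) ∧ (vexpr = val(a))) evaluated under phi."""
    return (not g(phi)) or ((not inv_a(phi)) and vexpr(phi) == val_a(phi))


def counterexamples(g, vexpr, val_a, inv_a, names, width):
    """All assignments of the guard variables that falsify the formula."""
    bad = []
    for values in product(range(1 << width), repeat=len(names)):
        phi = dict(zip(names, values))
        if not ok(g, vexpr, val_a, inv_a, phi):
            bad.append(phi)
    return bad


# Hypothetical simulation result for atom a: value (u + 1) mod 4, X when u = 3.
val_a = lambda phi: (phi["u"] + 1) % 4
inv_a = lambda phi: phi["u"] == 3
vexpr = lambda phi: phi["u"] + 1

# Consequent guarded by u < 3: holds for every assignment.
guarded = counterexamples(lambda phi: phi["u"] < 3, vexpr, val_a, inv_a, ["u"], 2)
# Unguarded consequent: violated where the atom is X.
unguarded = counterexamples(lambda phi: True, vexpr, val_a, inv_a, ["u"], 2)
```

The guarded consequent is valid (`guarded` is empty), while the unguarded one has the counterexample u = 3, where the simulated atom is X.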
6 Experiments
We used STEWord to verify properties of a set of System-Verilog word-level
benchmark designs. Bit-level STE tools are often known to require user guidance
with respect to problem decomposition and variable ordering (for BDD-based
tools), when verifying properties of designs with moderate to wide datapaths.
Similarly, BMC tools need to introduce a fresh variable for each input in each
time frame when the value of the input is unspecified. Our benchmarks were in-
tended to stress bit-level STE tools, and included designs with control and data-
path logic, where the width of the datapath was parameterized. Our benchmarks
were also intended to stress BMC tools by providing relatively long sequences
of inputs that could either be X or a specified symbolic value, depending on a
symbolic condition. In each case, we verified properties that were satisfied by the
system and those that were not. For comparative evaluation, we implemented
word-level bounded model checking as an additional feature of STEWord itself.
Below, we first give a brief description of each design, followed by a discussion
of our experiments.
Design 1: Our first design was a three-stage pipelined circuit that read
four pairs of k-bit words in each cycle, computed the absolute difference of
each pair, and then added the absolute differences with a current running sum.
Alternatively, if a reset signal was asserted, the pipeline stage that stored the sum
was reset to the all-zero value, and the addition of absolute differences of pairs of
inputs started afresh from the next cycle. In order to reduce the stage delays in
the pipeline, the running sum was stored in a redundant format and carry-save-
adders were used to perform all additions/subtractions. Only in the final stage
was the non-redundant result computed. In addition, the design made extensive
use of clock gating to reduce its dynamic power consumption – a characteristic
of most modern designs and that significantly complicates formal verification.
Because of the non-trivial control and clock gating, the STE verification required
a simple datapath invariant. Furthermore, in order to reduce the complexity in
specifying the correctness, we broke down the overall verification goal into six
properties, and verified these properties using several datapath widths.
Design 2: Our second design was a pipelined serial multiplier that read two
k-bit inputs serially from a single k-bit input port, multiplied them and made the
result available on a 2k-bit wide output port in the cycle after the second input
was read. The entire multiplication cycle was then re-started afresh. By asserting
and de-asserting special input flags, the control logic allowed the circuit to wait
indefinitely between reading its first and second inputs, and also between reading
its second input and making the result available. We verified several properties
of this circuit, including checking whether the result computed was indeed the
product of two values read from the inputs, whether the inputs and results were
correctly stored in intermediate pipeline stages for various sequences of asserting
and de-asserting of the input flags, etc. In each case, we tried the verification
runs using different values of the bit-width k.
Design 3: Our third design was an implementation of the first stage in a typ-
ical digital camera pipeline. The design is fed the output of a single CCD/CMOS
sensor array whose pixels have different color filters in front of them in a Bayer
mosaic pattern [8]. The design takes these values and performs a “de-mosaicing”
of the image, which basically uses a fairly sophisticated interpolation technique
(including edge detection) to estimate the missing color values. The challenge
here was not only verifying the computation, which entailed adding a fairly large
number of scaled inputs, but also verifying that the correct pixel values were
used. In fact, most non-STE based formal verification engines will encounter
difficulty with this design since the final result depends on several hundreds of
8-bit quantities.
Design 4: Our fourth design is a more general version of Design 3 that
takes as input a stream of values from a single sensor with a mosaic filter having
alternating colors, and produces an interpolated red, green and blue stream as
output. Here, we verify 36 different locations on the screen, which translates to
36 different locations in the input stream. Analyzing this example with BMC
requires providing new inputs every cycle for over 200 cycles, leading to a blow-
up in the number of variables used.
For each benchmark design, we experimented with a bug-free version, and
with several buggy versions. For bit-level verification, we used both a BDD-based
STE tool [11] and a propositional SAT-based STE tool [9]; specifically, the tool
Forte was used for bit-level STE. We also ran word-level BMC to verify the same
properties.
In all our benchmarks, we found that Forte and STEWord successfully verified
the properties within a few seconds when the bitwidth was small (8 bits). How-
ever, the running time of Forte increased significantly with increasing bit-width,
and for bit-widths of 16 and above, Forte could not verify the properties without
serious user intervention. In contrast, STEWord required practically the same
time to verify properties of circuits with wide datapaths, as was needed to verify
properties of the same circuits with narrower datapaths, and required no user
intervention. In fact, the word-level SMT constraints generated for a circuit with
a narrow datapath are almost identical to those generated for a circuit with a
wider datapath, except for the bit-widths of atoms. This is not surprising, since
once atomization is done, symbolic simulation is agnostic to the widths of var-
ious atoms. An advanced SMT solver like Boolector is often able to exploit the
word-level structure of the final set of constraints and solve it without resorting
to bit-blasting.
The BMC experiments involved adding a fresh variable in each time frame
when the value of an input was not specified or conditionally specified. This
resulted in a significant blow-up in the number of additional variables, espe-
cially when we have long sequences of conditionally driven inputs. This in turn
adversely affected SMT-solving time, causing BMC to timeout in some cases.
To illustrate how the verification effort with STEWord compared with the
effort required to verify the same property with a bit-level BDD- or SAT-based
STE tool, and with word-level BMC, we present a sampling of our observations
in Table 2, where no user intervention was allowed for any tool. Here “-” indicates
more than 2 hours of running time, and all times are on an Intel Xeon 3GHz CPU,
using a single core. In the column labeled “Benchmark”, Designi-Pj corresponds
to verifying property j (from a list of properties) on Design i. The column labeled
“Word-level latches (# bits)” gives the number of word-level latches and the
total number of bits in those latches for a given benchmark. The column labeled
“Cycles of Simulation” gives the total number of time-frames for which STE and
BMC were run. The column labeled “Atom Size (largest)” gives the largest size
of an atom after our atomization step. Clearly, atomization did not bit-blast all
words, allowing us to reason at the granularity of multi-bit atoms in STEWord.
Benchmark             STEWord  Forte (BDD / SAT)  BMC    Word-level latches (# bits)  Cycles of Simulation  Atom Size (largest)
Design1-P1 (32 bits)  2.38s    - / -              3.71s  14 latches (235 bits wide)   12                    31
Design1-P1 (64 bits)  2.77s    - / -              4.53s  14 latches (463 bits wide)   12                    64
Design2-P2 (16 bits)  1.56s    - / -              1.50s  4 latches (96 bits wide)     6                     32
Design2-P2 (32 bits)  1.65s    - / -              1.52s  4 latches (128 bits wide)    6                     64
Design3-P3 (16 bits)  24.06s   - / -              -      54 latches (787 bits wide)   124                   16
Design4-P4 (16 bits)  56.80s   - / -              -      54 latches (787 bits wide)   260                   16
Design4-P4 (32 bits)  55.65s   - / -              -      54 latches (1555 bits wide)  260                   32
Table 2. Comparing verification effort (time) with STEWord, Forte and BMC
Our experiments indicate that when a property is not satisfied by a circuit,
Boolector finds a counterexample quickly due to powerful search heuristics imple-
mented in modern SMT solvers. BDD-based bit-level STE engines are, however,
likely to suffer from BDD size explosion in such cases, especially when the bit-
widths are large. In cases where there are long sequences of conditionally driven
inputs (e.g., Design 4), BMC performs worse than STEWord, presumably
because of the added complexity of solving constraints with a significantly larger
number of variables. In other cases, the performance of BMC is comparable
to that of STEWord. An important observation is that the abstractions intro-
duced by atomization and by approximations of invalid-bit expressions do not
cause STEWord to produce conservative results in any of our experiments. Thus,
STEWord strikes a good balance between accuracy and performance. Another
interesting observation is that for correct designs and properties, SMT solvers
(all we tried) sometimes fail to verify the correctness (by proving unsatisfiability
of a formula). This points to the need for further developments in SMT solving,
particularly for proving unsatisfiability of complex formulas. Overall, our exper-
iments, though limited, show that word-level STE can be beneficial compared
to both bit-level STE and word-level BMC in real-life verification problems.
We are currently unable to make the binaries or source of STEWord publicly
available due to a part of the code being proprietary. A web-based interface to
STEWord, along with a usage document and the benchmarks reported in this
paper, is available at http://www.cfdvs.iitb.ac.in/WSTE/
7 Conclusion
Increasing the level of abstraction from bits to words is a promising approach to
scaling STE to large designs with wide datapaths. In this paper, we proposed a
methodology and presented a tool to achieve this automatically. Our approach
lends itself to a counterexample guided abstraction refinement (CEGAR) frame-
work, where refinement corresponds to reducing the conservativeness in invalid-
bit expressions, and to splitting existing atoms into finer bit-slices. We intend to
build a CEGAR-style word-level STE tool as part of future work.
Acknowledgements. We thank Taly Hocherman and Dan Jacobi for their help
in designing a System-Verilog symbolic simulator. We thank Ashutosh Kulkarni
and Soumyajit Dey for their help in implementing and debugging STEWord.
References
1. R. Brummayer and A. Biere. Boolector: An efficient SMT solver for bit-vectors
and arrays. In Proc. of TACAS, pages 174–177, 2009.
2. R. E. Bryant and C.-J. H. Seger. Formal verification of digital circuits using
symbolic ternary system models. In Proc. of CAV, pages 33–43, 1990.
3. E. A. Emerson. Temporal and modal logic. In Handbook of Theoretical Computer
Science, pages 995–1072. Elsevier, 1995.
4. P. Johannsen. Reducing bitvector satisfiability problems to scale down design sizes
for RTL property checking. In Proc. of HLDVT, pages 123–128, 2001.
5. R. B. Jones, J. W. O’Leary, C.-J. H. Seger, M. Aagaard, and T. F. Melham.
Practical formal verification in microprocessor design. IEEE Design & Test of
Computers, 18(4):16–25, 2001.
6. R. Kaivola, R. Ghughal, N. Narasimhan, A. Telfer, J. Whittemore, S. Pandav,
A. Slobodová, C. Taylor, V. Frolov, E. Reeber, and A. Naik. Replacing Testing with
Formal Verification in Intel Core™ i7 Processor Execution Engine Validation. In
Proc. of CAV, pages 414–429, 2009.
7. V. M. A. KiranKumar, A. Gupta, and R. Ghughal. Symbolic trajectory evaluation:
The primary validation vehicle for next generation Intel® processor graphics FPU.
In Proc. of FMCAD, pages 149–156, 2012.
8. H. S. Malvar, H. Li-Wei, and R. Cutler. High-quality linear interpolation for
demosaicing of Bayer-patterned color images. In Proc. of ICASSP, volume 3,
pages 485–488, 2004.
9. J.-W. Roorda and K. Claessen. A new SAT-based algorithm for symbolic trajectory
evaluation. In Proc. of CHARME, pages 238–253, 2005.
10. C.-J. H. Seger and R. E. Bryant. Formal verification by symbolic evaluation of
partially-ordered trajectories. Formal Methods in System Design, 6(2):147–189,
1995.
11. C.-J. H. Seger, R. B. Jones, J. W. O’Leary, T. F. Melham, M. Aagaard, C. Barrett,
and D. Syme. An industrially effective environment for formal hardware verifica-
tion. IEEE Trans. on CAD of Integrated Circuits and Systems, 24(9):1381–1405,
2005.
12. A. Stump, C. W. Barrett, and D. L. Dill. A decision procedure for an extensional
theory of arrays. In Proc. of LICS, pages 29–37. IEEE Computer Society, 2001.
