Effective use of Boolean satisfiability procedures in the formal verification of superscalar and VLIW microprocessors  by Velev, Miroslav N & Bryant, Randal E
Journal of Symbolic Computation 35 (2003) 73–106
Effective use of Boolean satisfiability procedures
in the formal verification of superscalar and VLIW
microprocessors
Miroslav N. Velev (a,∗), Randal E. Bryant (b)
(a) School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
(b) Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Received 15 October 2001; accepted 17 July 2002
Abstract
We compare SAT-checkers and decision diagrams on the evaluation of Boolean formulae
produced in the formal verification of both correct and buggy versions of superscalar and VLIW
microprocessors. The microprocessors are described in a high-level hardware description language,
based on the logic of equality with uninterpreted functions and memories (EUFM). The formal
verification is done with Burch and Dill’s correctness criterion, using flushing to map the state of
the implementation processor to the state of the specification. The EUFM correctness formula is
translated to an equivalent Boolean formula by exploiting the property of positive equality, and
using the automatic tool EVC. We identify the SAT-checkers Chaff and BerkMin as significantly
outperforming the rest of the SAT tools when evaluating the Boolean correctness formulae. We
examine ways to enhance the performance of Chaff and BerkMin by variations when generating
the Boolean formulae. We reassess optimizations we developed earlier to speed up the formal
verification. © 2003 Elsevier Science Ltd. All rights reserved.
1. Introduction
In the past few years, SAT-checkers have made dramatic improvements in both
their speed and capacity. We compare 31 of them with decision diagrams, namely binary
decision diagrams (BDDs) (Bryant, 1986, 1992) and Boolean expression diagrams
(BEDs) (Williams, 2000), as well as with ATPG tools (Hamzaoglu and Patel, 1999;
Tafertshofer et al., 2000), when used as Boolean satisfiability (SAT) procedures in the
formal verification of microprocessors. The comparison is based on two benchmark suites,
each consisting of 101 Boolean formulae generated in the verification of one correct
and 100 buggy versions of the same design: a superscalar and a VLIW microprocessor,
respectively. Unlike existing benchmark suites, e.g. ISCAS85 (Brglez and Fujiwara, 1985)
and ISCAS89 (Brglez et al., 1989), which are collections of circuits that have nothing in
common, each of our suites is based on a single design and hence provides a point for
consistent comparison of different evaluation methods.
∗ Corresponding author.
E-mail addresses: mvelev@ece.gatech.edu (M.N. Velev), Randy.Bryant@cs.cmu.edu (R.E. Bryant).
doi:10.1016/S0747-7171(02)00091-3
The correctness condition that we use is expressed in a decidable subset of First-Order
Logic (Burch and Dill, 1994). That allows it either to be checked directly with
a customized decision procedure, such as SVC,1 or to be translated to an equivalent
Boolean formula (Velev and Bryant, 1999) that can be evaluated with SAT engines for
either proving correctness or finding a counterexample. The latter approach can directly
benefit from improvements in the SAT tools.
We identify Chaff (Moskewicz et al., 2001; Zhang et al., 2001) and BerkMin
(Goldberg and Novikov, 2002) as the most efficient SAT-checkers for the second verification
strategy. Chaff and BerkMin significantly outperform BDDs and the SAT-checker DLM-3
(Shang and Wah, 1998), the previous most efficient SAT procedures for, respectively,
correct and buggy processors. We reevaluate optimizations we developed earlier to enhance
the performance of BDDs and DLM-3, and conclude that many of them are no longer
crucial on the same benchmark suites. This study allows us to eliminate conservative
approximations that can lead to false negative results—a source of annoyance for users.
Our initial research was on developing an efficient memory model (EMM) for abstracting
memory arrays in symbolic ternary simulation at the bit level (Velev and Bryant,
1998a). The ternary value X, representing a don’t-care condition and encoded symbolically,
allows us to dramatically reduce the number of symbolic vectors that need to be
simulated. Additionally, it gives us a way to express ambiguity in signal values, a property
that we exploited to model violations in the setup and hold time requirements for
memory inputs, as well as to represent the uncertainty of memory output delays that can
range between a minimum and a maximum value.
The EMM dynamically introduces consistent initial state for accessed symbolic
addresses, and that allowed us to use read-only EMMs to abstract bit-level combinational
functional units (Velev and Bryant, 1998b). However, the presence of feedback loops in
pipelined processors (e.g. as introduced by the forwarding logic or the register file) resulted
in impossible-to-satisfy variable-ordering constraints when BDDs were used to evaluate the
Boolean correctness formulae. By restricting the style for defining processors, while still
being able to model the same features, we obtained correctness formulae where most of the
word-level values appear only in positive (not negated) equality comparisons. This structure of
the formulae allowed us to treat such word-level values as distinct constants, thus dramatically
pruning the solution space, while still performing exhaustive formal proofs. We called
this property positive equality (Bryant et al., 2001), and showed that it results in orders of
magnitude speedup (Velev and Bryant, 1999).
Next, we demonstrated that the same modeling techniques can be used to define and
formally verify single-issue pipelined, and dual-issue superscalar processors that implement
exceptions, branch prediction, and multicycle functional units (Velev and Bryant, 2000). A
more complex VLIW processor, imitating the Intel Itanium (Sharangpani and Arora,
2000) in features such as predicated execution, advanced loads, and speculative register
remapping, was then formally verified (Velev, 2000a). A method to automatically
abstract memory arrays, whose correct operation is enforced by the interaction of
forwarding and stalling logic, resulted in an order of magnitude speedup of the BDD-based
evaluation of the correctness formula (Velev, 2001). A significant breakthrough
occurred with the development of the SAT-checker Chaff, as reported in our earlier study
(Velev and Bryant, 2001a). The current paper gives more details of that work, and presents
additional experimental results.
1 Stanford validity checker (SVC) is available from: http://sprout.Stanford.EDU/SVC.
The rest of the paper is organized as follows. Section 2 presents the background of
high-level modeling and formal verification of microprocessors. Section 3 describes our
microprocessor benchmarks used in the experiments. Section 4 lists the compared SAT
procedures, explains the translation of the Boolean correctness formulae to CNF format,
and presents results showing that only two SAT tools—Chaff and BerkMin—scale for
our complex benchmarks. Then, we explore ways to efficiently use these two SAT-
checkers. Section 5 studies the impact of variations when generating and evaluating the
Boolean correctness formulae. Section 6 compares two ways to encode word-level equality
comparisons in the correctness formulae. Section 7 evaluates the benefits of decomposing
the correctness criterion. Section 8 studies the usefulness of conservative approximations
and positive equality. Section 9 concludes the paper, and prioritizes the optimizations that
help Chaff and BerkMin.
2. Background
The formal verification is done by correspondence checking, i.e. comparison of the
single-issue pipelined, or superscalar, or VLIW implementation processor against a non-pipelined
specification processor, by using Burch and Dill’s (1994) flushing technique. The correctness
criterion is expressed as a formula in the logic of equality with uninterpreted functions
and memories (EUFM), also proposed by Burch and Dill (1994), and states that all architectural
state elements in the processor should be updated in synchrony by either 0 or 1,
or up to k instructions after each clock cycle, where k is the maximum number of instructions
that the design can fetch in a clock cycle. The correctness formula is then translated
to an equivalent Boolean formula by the automatic tool EVC (Velev and Bryant, 2001b)
that exploits the properties of positive equality (Bryant et al., 2001), the e_ij encoding
(Goel et al., 1998), and a number of conservative approximations. The resulting Boolean
correctness formula should be a tautology (or, equivalently, its complement should be
unsatisfiable) in order for the processor to be correct, and can be evaluated by any SAT
procedure.
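The flushing-based criterion can be illustrated on a toy example. The sketch below is not the EUFM-based flow of the paper: it assumes a hypothetical two-stage accumulator machine over a four-value data domain, and it checks the criterion by exhaustive enumeration rather than symbolically. All names (`good_step`, `buggy_step`, `flush`, `check`) are illustrative assumptions, and k = 1 because the toy design fetches one instruction per cycle.

```python
from itertools import product

MOD = 4          # tiny data domain so that exhaustive enumeration is feasible
BUBBLE = None    # marks an empty pipeline latch

def good_step(state, instr):
    """Implementation step: commit the latched instruction, then latch the new one."""
    acc, latch = state
    if latch is not BUBBLE:
        acc = (acc + latch) % MOD
    return (acc, instr)

def buggy_step(state, instr):
    """Injected bug: the latched instruction is squashed whenever a new one is fetched."""
    acc, latch = state
    if latch is not BUBBLE and instr is BUBBLE:
        acc = (acc + latch) % MOD
    return (acc, instr)

def flush(step, state):
    """Feed bubbles until the latch is empty; return the architectural state."""
    while state[1] is not BUBBLE:
        state = step(state, BUBBLE)
    return state[0]

def spec_step(acc, instr):
    """Non-pipelined specification: one instruction completes per step."""
    return (acc + instr) % MOD

def check(step):
    """Flushing criterion with k = 1: the specification advances by 0 or 1 instructions."""
    latches = [BUBBLE] + list(range(MOD))
    for acc, latch, instr in product(range(MOD), latches, range(MOD)):
        start = (acc, latch)
        impl_side = flush(step, step(start, instr))   # step, then flush
        spec0 = flush(step, start)                    # flush, then 0 spec steps
        spec1 = spec_step(spec0, instr)               # flush, then 1 spec step
        if impl_side not in (spec0, spec1):
            return False
    return True
```

Calling `check(good_step)` accepts the correct toy pipeline, while `check(buggy_step)` rejects the version that drops a pending instruction.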
The syntax of EUFM (Burch and Dill, 1994) includes terms and formulae (see Fig. 1).
Terms are used to abstract word-level values of data, register identifiers, memory
addresses, as well as the entire states of memory arrays. A term can be an uninterpreted
function (UF) applied to a list of argument terms; a term variable (that can be viewed as
an UF symbol without arguments); or an ITE operator selecting between two argument
terms based on a controlling formula, such that ITE (formula, term1, term2) will evaluate
to term1 when formula = true and to term2 when formula = false. The syntax for terms
can be extended to model memories by means of the interpreted functions read and write
(Burch and Dill, 1994; Velev, 2001). Function read takes two terms, serving as memory
state and address, respectively, and returns a term for the data at that address. Function
write takes three terms (memory state, address, and data) and returns a term for the new
memory state after the update. The two functions satisfy the forwarding property of the
memory semantics: a read returns the data written by the last write, if their addresses are
equal, or the data from the previous memory state otherwise. The initial state of memory
is abstracted with a term variable.
Fig. 1. Syntax of the logic of EUFM.
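The forwarding property of read and write can be sketched as follows. This is a minimal illustration under one fixed interpretation of address equality, not the symbolic treatment used in the paper; the class `Write`, the function `read`, and the `equal` oracle are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Write:
    """Memory state obtained by writing `data` at `addr` into the state `mem`."""
    mem: object   # previous memory state: another Write, or the name of the initial state
    addr: str
    data: str

def read(mem, addr, equal):
    """Return the data term stored at `addr`.

    `equal(a, b)` is an assumed oracle that decides whether two address terms
    are equal under the interpretation being considered.
    """
    while isinstance(mem, Write):
        if equal(mem.addr, addr):
            return mem.data           # forwarded from the most recent matching write
        mem = mem.mem                 # otherwise consult the previous memory state
    return f"read({mem}, {addr})"     # initial state: the read stays uninterpreted

# Two writes on top of an initial state abstracted by the term variable "m0".
m1 = Write("m0", "a1", "d1")
m2 = Write(m1, "a2", "d2")
distinct_names_differ = lambda x, y: x == y
```

With `distinct_names_differ`, `read(m2, "a1", ...)` skips the write to `a2` and returns `d1`; under an interpretation where all addresses are equal, the same read returns `d2`, the data of the last write.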
Formulae are used to model the control path of a microprocessor, as well as to express
the correctness condition. A formula can be an uninterpreted predicate (UP) applied to
a list of argument terms; a propositional variable (that can be viewed as an UP symbol
without arguments); an ITE operator selecting between two argument formulae based on a
controlling formula; or an equation (equality comparison) of two terms. Formulae can be
negated, conjuncted, or disjuncted. We will refer to both terms and formulae as expressions.
UFs and UPs are used to abstract away the implementation details of functional units
by replacing them with “black boxes” that satisfy no particular properties other than that of
functional consistency—equal combinations of expressions at the inputs of the UF (or UP)
produce equal output values. Then, it no longer matters whether the original functional unit
is an adder or a multiplier etc., as long as the same UF (or UP) is used to replace it in both
the implementation and the specification processor. We assume that the functional units
and memories are formally verified separately.
2.1. Example high-level pipelined microprocessor
The above abstraction techniques are illustrated with the 3-stage pipelined processor
shown in Fig. 2. The three stages are instruction fetch and decode (IFD), Execute (EX),
and Write-Back (WB). For illustration purposes, the processor can execute register–register
instructions only. Uninterpreted functions ALU, and +4 are used to abstract, respectively,
the ALU, and the adder for incrementing the program counter (PC). The register file is
abstracted with functions read and write, such that signal WB_RegWrite is used as the
condition for performing writes. In other words, if wb_regwrite is a symbolic expression
for the value of that signal, then the new Register File state will be ITE (wb_regwrite,
write (prev_state, wb_destreg, wb_result), prev_state), where prev_state is an expression for
the previous Register File state; wb_destreg and wb_result are expressions for the values
of signals WB_DestReg and WB_Result, respectively, and serve as the address and data
arguments of the write operation. All the word-level values (register identifiers, opcodes,
data operands, ALU result, and PC) are modeled as terms. The opcode, Op, specifies the
operation to be performed by the ALU. We will assume that the processor does not execute
self-modifying code, which allows us to represent the (read-only) instruction memory,
InstrMemory, as a collection of UFs and UPs that take the PC as argument and abstract the
fetching and decoding of the corresponding field of a new instruction. The processor has
forwarding logic, situated in EX, only for the second data operand. Data hazards for the first
data operand are avoided by the stalling logic in IFD, so that the dependent instruction is
delayed in IFD until the result it needs is written back to the Register File. The Register File
is assumed to be write-before-read, which is modeled by synchronizing its updates with a
phase clock that precedes the phase clock controlling the updates of pipeline latch IFD_EX.
These modeling details are expressed in a high-level hardware description language that is
accepted by the term-level symbolic simulator TLSim (Velev and Bryant, 2001b).
Fig. 2. Block diagram of a 3-stage pipelined processor.
The correct behavior is defined by a non-pipelined specification processor that is built
from the same UFs, UPs, and architectural state elements (PC and Register File in the
example) as the pipelined implementation (see Fig. 3). This design is the instruction set
architecture (ISA) put together in a single-cycle model. The processor fetches, executes,
and completes one new instruction on every clock cycle. Because of its simplicity, it is easy
to define correctly. Furthermore, it will be extremely easy to formally verify: the action
of every instruction type can be checked directly against its expected behavior, as defined
in the ISA.
Fig. 3. Block diagram of the non-pipelined specification processor.
Note that by applying all of the abstractions, we get a much more general pipelined
processor than the original, such that the functional units are only functionally consistent,
but do not satisfy any other properties of their original implementations. However, proving
the correctness of such abstract processors is much easier. If the pipeline is correct, it will
work properly for any functionally consistent implementation of the logic that is abstracted
with UFs or UPs.
2.2. Overview of translating the EUFM correctness formula to equivalent Boolean
formula
In order to translate the EUFM correctness formula to an equivalent Boolean formula,
we need to eliminate the UFs and UPs in a way that their property of functional consistency
is enforced, as well as to encode the term-level equality comparisons with Boolean
formulae such that the property of transitivity of equality is satisfied.
Two possible ways to eliminate UFs and UPs, while enforcing their property of
functional consistency, are Ackermann constraints (Ackermann, 1954), and nested ITEs
(Velev and Bryant, 1999; Bryant et al., 2001). The Ackermann scheme replaces each UF
(UP) application in the EUFM formula F with a new term variable (propositional
variable), and then adds external constraints for functional consistency. For example, the
UF application f (a1, b1) will be replaced by a new term variable c1, another application
of the same UF, f (a2, b2), will be replaced by a new term variable c2. Then, the resulting
EUFM formula F′ will be extended as
[(a1 = a2) ∧ (b1 = b2) ⇒ (c1 = c2)] ⇒ F′.
In the nested-ITE scheme, the first application of the UF above will still be replaced by a
new term variable c1. However, the second will be replaced by
ITE((a2 = a1) ∧ (b2 = b1), c1, c2),
where c2 is a new term variable. A third, f (a3, b3), will be replaced by
ITE((a3 = a1) ∧ (b3 = b1), c1, ITE((a3 = a2) ∧ (b3 = b2), c2, c3)),
where c3 is a new term variable, and so on. UPs are eliminated similarly by using new
Boolean variables, instead of new term variables.
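The nested-ITE scheme can be sketched as a small generator of replacement terms. This is only an illustration of the pattern described above, not the implementation used in EVC; the function name `eliminate_uf_nested_ite` and the textual output format are assumptions, and `&` stands for ∧ in the generated strings.

```python
def eliminate_uf_nested_ite(applications, var_prefix="c"):
    """Replace the applications of one UF by nested-ITE terms.

    `applications` lists the argument tuples of the UF in the order in which
    the applications are processed. The i-th result selects the variable of the
    first earlier application with pairwise equal arguments, and otherwise
    falls through to a fresh variable.
    """
    results = []
    for i, args in enumerate(applications):
        term = f"{var_prefix}{i + 1}"              # fresh variable for this application
        # Wrap from the most recent earlier application outwards to the first one.
        for j in range(i - 1, -1, -1):
            cond = " & ".join(
                f"({x} = {y})" for x, y in zip(args, applications[j])
            )
            term = f"ITE({cond}, {var_prefix}{j + 1}, {term})"
        results.append(term)
    return results

terms = eliminate_uf_nested_ite([("a1", "b1"), ("a2", "b2"), ("a3", "b3")])
```

For the three applications f(a1, b1), f(a2, b2), f(a3, b3), the resulting `terms` list reproduces the three replacement terms given in the text.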
Positive equality allows the identification of two types of terms in the structure of an
EUFM formula: those that appear in only positive equations (p-equations) and are
called p-terms (for positive terms), and those that appear in both positive and negative
equations and are called g-terms (for general terms). A negative equation is one that
appears under an odd number of negations, or as part of the controlling formula for an ITE
operator. The efficiency from exploiting positive equality is due to the observation that the
truth of an EUFM formula under a maximally diverse interpretation of the p-terms implies
the truth of the formula under any interpretation. A maximally diverse interpretation is one
where the equality comparison of a term variable with itself evaluates to true, that of a
p-term variable with a syntactically distinct term variable evaluates to false, and that of a
g-term variable with a syntactically distinct g-term variable could be either true or false
and can be encoded either with a Boolean variable (Goel et al., 1998) or with a Boolean
function (Pnueli et al., 1999)—details of these encodings are presented in Section 6. We
call the equality comparison of two syntactically distinct g-term variables a g-equation.
Evaluating the EUFM correctness formula under a maximally diverse interpretation results
in dramatic simplifications, and thus in orders of magnitude speedup.
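The treatment of equations between term variables under a maximally diverse interpretation can be sketched as below. The sketch assumes that the classification into p-term and g-term variables has already been done, and it does not add the transitivity constraints that an encoding with Boolean variables also needs; `encode_equation` and the `e_x_y` naming are hypothetical.

```python
def encode_equation(x, y, p_vars, e_vars):
    """Encode the equation x = y between two term variables.

    p_vars: set of variables classified as p-term variables (assumed given).
    e_vars: dict that collects one Boolean variable name per g-equation.
    Returns True, False, or the name of a Boolean variable.
    """
    if x == y:
        return True                       # a variable always equals itself
    if x in p_vars or y in p_vars:
        return False                      # p-term variables are treated as distinct constants
    key = tuple(sorted((x, y)))           # g-equation: order of the operands is irrelevant
    return e_vars.setdefault(key, f"e_{key[0]}_{key[1]}")

e_vars = {}
p_vars = {"p1", "p2"}
```

Only equations between two syntactically distinct g-term variables produce Boolean variables, which is where the reduction in problem size comes from.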
As a result, the EUFM correctness formula is translated to an equivalent Boolean
formula that has to be a tautology in order for the EUFM correctness formula to be valid.
The Boolean formula can be evaluated with any SAT procedure—see Section 4.
3. Microprocessor benchmarks
We base our comparison of SAT procedures on a set of high-level microprocessors:
• 1 × DLX-C (Velev and Bryant, 1999): a single-issue 5-stage pipelined DLX, as
described by Hennessy and Patterson (2002);
• 2 × DLX-CC (Velev and Bryant, 1999): a dual-issue superscalar DLX, which is an
extended version of a processor verified by Burch (1996);
• 2 × DLX-CC-MC-EX-BP (Velev and Bryant, 2000): a version of 2 × DLX-CC with
multicycle functional units, exceptions, and branch prediction;
• 9VLIW-MC-BP (Velev, 2000a): a 9-wide VLIW processor that imitates the
Intel Itanium (Intel Corporation, 1999; Sharangpani and Arora, 2000) in speculative
features such as predicated execution, speculative register remapping, advanced
loads, and branch prediction.
The single-issue pipelined processor, 1 × DLX-C, has five stages: Fetch, Decode,
Execute, Memory, and Write-Back. The design can execute seven instruction types:
register–register ALU instructions, register-immediate ALU instructions, loads, stores,
branches, jumps, and nops. A nop only increments the PC, but does not modify other
architectural state elements. Branches do not have delay slots, i.e. an instruction that
immediately follows a branch is completed only if the branch is not taken. The processor
is biased for branch-not-taken, and continues to fetch instructions that sequentially follow
a branch (i.e. instructions from the path when the branch is not taken) until the branch
is resolved in the Execute stage. Then, if the branch is taken, as can be checked in the
Memory stage, the three speculatively fetched sequential instructions that are in the Fetch,
Decode, and Execute stages are squashed (canceled), and the PC is updated with the target
of the branch. Read-After-Write hazards—due to pending updates of the Register File by
instructions that are in the Memory and Write-Back stages, when a dependent instruction
is in the Execute stage—are resolved by forwarding of the data values from the Memory
and Write-Back stages to the inputs of the functional units in the Execute stage. However,
the processor does not have a forwarding path from the output of the Data Memory in the
Memory stage to the Execute stage in order to satisfy a data dependency when a load
gets data from memory and that value is used in the Execute stage by the instruction
immediately following the load. Although such a forwarding path is feasible, it will likely
lengthen the clock cycle in order to allow a signal to propagate through the Data Memory,
the forwarding logic, and then the ALU in the Execute stage, thus slowing all instructions
only to satisfy a data dependency in the infrequent case of a load immediately followed by
a dependent instruction. Instead, commercial pipelined processors and our design adopt an
alternative solution that avoids such data hazards by stalling the dependent instruction in
the Decode stage, when the load providing a data operand is in the Execute stage. That
means that the dependent instruction stays in the Decode stage during the next clock cycle,
while the load is allowed to advance to the Memory stage, and a bubble (a combination
of control bits that will not modify any architectural state element) is inserted in the
Execute stage. A cycle later, the dependent instruction is allowed to advance to the Execute
stage, while the load will have gotten the data from the Memory stage and will be in
the Write-Back stage, so that the data dependency can be satisfied by the forwarding
path from the Write-Back to the Execute stage. This mechanism, preventing data hazards
in the case of a load immediately followed by a dependent instruction, is called a load
interlock (Hennessy and Patterson, 2002). A pipelined processor should be able to handle
any combination of hazards that might occur between instructions in the pipeline stages.
We assume that the processor does not execute self-modifying code, which allows us to
model the (read-only) Instruction Memory in the Fetch stage and the Data Memory in the
Memory stage as separate memories. Otherwise, the processor has to be extended with a
mechanism to reexecute instructions that get modified by store instructions still in later
pipeline stages; this mechanism is similar to the one for correcting branch mispredictions,
e.g. as implemented in 2 × DLX-CC-MC-EX-BP.
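The load interlock condition described above can be sketched as a predicate over the instructions in the Decode and Execute stages. This is a simplified illustration, not the control logic of 1 × DLX-C; the `Instr` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Instr:
    """Hypothetical decoded instruction; None in a stage stands for a bubble."""
    is_load: bool = False
    dest: Optional[int] = None       # destination register, if any
    srcs: Tuple[int, ...] = ()       # source registers

def must_stall(decode: Optional[Instr], execute: Optional[Instr]) -> bool:
    """Load interlock: stall Decode if it depends on a load that is in Execute."""
    if decode is None or execute is None:
        return False
    return (
        execute.is_load
        and execute.dest is not None
        and execute.dest in decode.srcs
    )

load_r1 = Instr(is_load=True, dest=1, srcs=(4,))
alu_r1 = Instr(dest=1, srcs=(2, 3))
use_r1 = Instr(dest=5, srcs=(1, 2))
```

A dependency on an ALU instruction in Execute does not stall, since forwarding from the Memory and Write-Back stages covers that case; only a dependency on a load does.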
The dual-issue superscalar implementation, 2 × DLX-CC, consists of two 1 × DLX-C
pipelines, and can fetch up to two sequential instructions per clock cycle. Now there are
two load interlock conditions per instruction in the Decode stage, since each of these two
instructions has to be checked for data dependencies on the two possible loads in the
Execute stage. If the first instruction in Decode gets stalled, the second is also stalled.
Additionally, the second instruction is stalled if it has a data dependency on the first.
Fig. 4. Block diagram of the VLIW architecture that was formally verified.
2 × DLX-CC-MC-EX-BP extends 2 × DLX-CC with multicycle functional units, exceptions,
and branch prediction. Each of the Instruction Memory, the two ALUs in the Execute
stage, and the Data Memory can take multiple cycles to produce a result, and can raise an
exception. The Fetch stage has an abstraction of a branch predictor that predicts both the
direction (taken or not-taken) and the target of a newly fetched branch, but only the target
of a newly fetched jump, since jumps are always taken. Based on these predictions, the PC
is speculatively updated, such that when the actual branch/jump outcome is known in the
Memory stage, special logic corrects mispredictions by squashing the speculatively fetched
instructions.
The VLIW benchmark, 9VLIW-MC-BP (see Fig. 4), is far more complex than any
other processor that has been formally verified previously in an automatic way. It has a
fetch engine that supplies the execution engine with a packet of nine instructions, with
no Read-After-Write dependencies between any two of them. Each of these instructions
is already matched with one of nine execution pipelines of four stages: four integer
pipelines, two of which can perform both integer and floating-point memory accesses;
two floating-point pipelines; and three branch-address computation pipelines. Data values
are stored in four register files: integer, floating-point, predicate, and branch-address.
In addition to these four register files, the architectural state consists of a PC, a Data
Memory, as well as two state elements from Intel’s 64-bit architecture, IA-64 (Intel Corporation, 1999;
Sharangpani and Arora, 2000): a current frame marker (CFM) that is used for register
remapping, and an advanced load address table (ALAT) that is used to implement advanced
loads. Every instruction is predicated with a qualifying predicate register identifier, such
that the result of that instruction affects architectural state only when the qualifying
predicate register has value 1. The two floating-point ALUs, as well as the Instruction and
Data Memories, can each take multiple cycles for computing a result or completing a fetch,
respectively. There can be up to 42 instructions in flight. An extended version,
9VLIW-MC-BP-EX, which also implements exceptions, was later designed as described in Section 7.
For both 2 × DLX-CC-MC-EX-BP and 9VLIW-MC-BP, we created 100 incorrect
versions by injecting bugs into the designs. The bugs were variants of actual
errors made in the design of the correct versions, as well as variants of errors
detected in Intel microprocessors (Bentley, 2001; Jones, 2002), and in academic
processors (Van Campenhout et al., 1998, 2000). The injected bugs included:
• Omitted inputs to logic gates. For example, a speculatively fetched instruction is
not squashed when a preceding branch is mispredicted, or a load interlock does not
account for all cases when a value loaded from memory will be used by a dependent
instruction.
• Incorrect inputs to logic gates, functional units, or memories. For example, an input
with the same name but a different index. Such bugs were detected in the formal
verification of an IA-32 instruction-length decoder in an actual Intel processor, as
discussed by Jones (2002, pp. 85–86). Bentley (2001) similarly lists typos and
cut-and-paste errors in a category of “Goof” bugs, detected in the Intel Pentium 4
microprocessor.
• Incorrect types of logic gates. For example, an AND gate instead of an OR gate, as
was the case in an actual Intel processor bug described by Jones (2002, p. 85).
• Lack of a mechanism to correct a speculative update of an architectural state element,
if the speculation turns out to be wrong. For example, the PC is updated speculatively,
based on a prediction for the direction and target of a newly fetched branch, but there
is no logic to update the PC with the correct branch target if the prediction happens to
be wrong. Similarly, when designing 9VLIW-MC-BP, a bug was inadvertently made
in that the CFM could be updated speculatively by instructions along the predicted
path after a branch, but there was no mechanism to restore the correct CFM state if
the branch was mispredicted.
Hence, the variations introduced were not completely random, as done in other efforts
to generate benchmark suites (Iwama et al., 1992; Mitchell et al., 1992; Iwama and Hino,
1994; Harlow and Brglez, 2001), but reflected realistic scenarios for errors that can be
made when designing high-level microprocessors. The bugs were spread over the entire
designs, and occurred either as single or multiple errors.
4. Comparison of SAT procedures
We evaluated 31 SAT-checkers. Nine of them were complete (i.e. could prove
unsatisfiability), were based on the Davis–Putnam–Logemann–Loveland (DPLL) algorithm
(Davis et al., 1962), and implemented learning:
• SATO.3.2.1 (Zhang, 1997);
• GRASP (Marques-Silva and Sakallah, 1999; Marques-Silva, 1999),2 implementing
non-chronological backtracking, was used both with a single strategy and in a mode
with restarts, randomization, and recursive learning (Baptista and Marques-Silva,
2000);
2 GRASP is available from: http://vinci.inesc.pt/~jpms/grasp.
• CGRASP (Marques-Silva and e Silva, 1999),3 a version of GRASP that exploits
structural information;
• rel_sat.1.0, and rel_sat.2.1 (Bayardo and Schrag, 1997);4
• rel_sat_rand.1.0 (Gomes et al., 2000);4
• Chaff (Moskewicz et al., 2001; Zhang et al., 2001),5 implementing lazy Boolean
constraint propagation, conflict-based relevance-limited learning, restarts,
randomization, and decision heuristics guided by recent conflict clauses;
• SIMO.2.0 (Copty et al., 2001);6 and
• BerkMin version 62 (Goldberg and Novikov, 2002), extending the ideas from Chaff
with decision heuristics and database management procedures that attempt to satisfy
the most recently deduced conflict clauses.
Eight SAT-checkers were also complete and based on the DPLL algorithm, but did not
have learning:
• satz, and satz.v213 (Li and Anbulagan, 1997);4
• satz-rand.v4.6 (Gomes et al., 2000);4
• eqsatz.v20 (Li, 2000);
• posit (Freeman, 1995);4
• ntab (Crawford and Auton, 1996);4 and
• ASAT, and C-SAT (Dubois et al., 1993).
Seven SAT-checkers were incomplete (i.e. could not prove unsatisfiability):
• DLM-2, and DLM-3 (Shang and Wah, 1998), as well as DLM-2000 (Wu and Wah,
1999), all based on global random search and discrete Lagrangian Multipliers as
a mechanism to not only get the search out of local minima, but also steer it toward
a global minimum, i.e. toward a satisfying assignment;
• GSAT.v41 (Selman and Kautz, 1993);4
• WalkSAT.v37 (Selman et al., 1996);4
• CLS (Prestwich, 2000); and
• UnitWalk (Hirsch and Kojevnikov, 2001),7 based on local search guided by unit
clause elimination.
And seven other SAT-checkers, based on different methods:
• QSAT (Plaisted et al., 2002), and QBF (Rintanen, 1999), both targeted to quantified
Boolean formulae;
• ZRes (Chatalic and Simon, 2000), combining zero-suppressed BDDs (Minato, 1996,
2001) with the original (Davis and Putnam, 1960) procedure;
3 CGRASP is available from: http://vinci.inesc.pt/~lgs/cgrasp.
4 The SAT-checker is available from: http://www.satlib.org/solvers.html.
5 We used the version mChaff with parameter file cherry 032301 (Moskewicz, 2001).
6 SIMO.2.0 is available from: http://frege.mrg.dist.unige.it/star/sim/home.html.
7 UnitWalk is available from: http://logic.pdmi.ras.ru/~arist/UnitWalk.
Fig. 5. Translation of Boolean operators to CNF: (a) ∧; (b) ∨; (c) ITE; and (d) ¬.
• BSAT, and IS-USAT, both using BDDs and exploiting the properties of unate Boolean
functions (Kalla et al., 2000);
• Prover, a commercial SAT-checker using Stålmarck’s method (Stålmarck, 1989);
and
• HeerHugo (Groote and Warners, 2000), a publicly available SAT-checker that also
uses Stålmarck’s method.
Additionally, we experimented with two ATPG tools—ATOM (Hamzaoglu and Patel,
1999), and TIP (Tafertshofer et al., 2000)—used to test the output of a benchmark for being
stuck-at-0, thus triggering the justification of value 1 at the output, and turning the ATPG
tool into a SAT-checker.
We also used BDDs (Bryant, 1986, 1992), and BEDs (Williams, 2000). The latter is not
a canonical representation of Boolean functions, but was shown by Williams et al. (2000)
to be extremely efficient when formally verifying multipliers.
The translation to the CNF format (Johnson and Trick, 1993), used as input to most
SAT-checkers, was done after inserting a negation at the top of the Boolean correctness
formula that has to be a tautology in order for the processor to be correct. If the formula is
indeed a tautology, then its negation will be constantly false, and a complete SAT-checker
will be able to prove unsatisfiability. Otherwise, a satisfying assignment for the negation
will be a counterexample.
In translating to CNF, we introduced a new auxiliary Boolean variable for every ∧, ∨, or
ITE operator in the Boolean correctness formula, and then imposed disjunctive constraints
(clauses) that the value of a variable for an operator must be consistent with the values
of the variables for the operands of that operator, given its semantics—see Fig. 5(a)–(c).
Negations (¬) do not generally require introducing new variables and clauses. Instead,
we can represent the value of a negation by using the complement of the variable for its
Fig. 6. Translation of a Boolean formula (a) to a CNF formula (b) by replacing internal negations with the
complements of their arguments, when we want to find a falsifying assignment for the original output z. Variables
a, b, c, d, e, and f are primary variables since they represent inputs to the formula. Variables x, y, and z are
auxiliary variables and are introduced to represent the values of operators other than negations. The top negation
with output w, also an auxiliary variable, and the last constraint “∧w” were added so that a satisfying assignment
for the CNF formula (b) will be a falsifying assignment for the original output z.
argument. For example, rather than introducing variables a′ and y′ in Fig. 6(a), we use
negated versions of variables a and y, respectively, in Fig. 6(b), thus reducing the number
of variables and clauses in the CNF formula. Only the negation inserted at the top of the
original Boolean correctness formula was explicitly represented with clauses, e.g. those
restricting variable w to be the negation of variable z in Fig. 6. All clauses were conjuncted
together, including a constraint that the top-level formula (the negation of the original
Boolean correctness formula)—represented by variable w in Fig. 6—must be true. The
same translation of Boolean formulae to CNF format was used by Larrabee (1992), except
that negations were explicitly represented with clauses (see Fig. 5(d)). When generating the
Boolean correctness formula in EVC (Velev and Bryant, 2001b), we hashed the expressions
and kept only one copy of isomorphic operators. This significantly reduced the size of the
correctness formula, as well as the number of CNF variables and clauses. The variables in
the original Boolean correctness formula will be called primary Boolean variables.
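The translation just described can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the implementation in EVC: the tuple-based formula representation, the class name CnfBuilder, and the signed-integer literal convention (a negative number is the complement of a variable) are all choices made here for the example. The sketch introduces one auxiliary variable per ∧, ∨, or ITE operator, represents internal negations by complemented literals, hashes isomorphic operators, and adds an explicit top-level negation w that is required to be true.

```python
from itertools import product


class CnfBuilder:
    """Hypothetical helper: operator-by-operator CNF translation with hashing."""

    def __init__(self):
        self.num_vars = 0
        self.clauses = []   # each clause is a list of non-zero ints
        self.var_of = {}    # primary variable name -> CNF variable
        self.cache = {}     # structural hashing: operator node -> literal

    def new_var(self):
        self.num_vars += 1
        return self.num_vars

    def lit(self, f):
        """Return a signed literal whose value equals formula f."""
        if isinstance(f, str):                      # primary variable
            if f not in self.var_of:
                self.var_of[f] = self.new_var()
            return self.var_of[f]
        op, *args = f
        if op == "not":                             # no new variable or clause
            return -self.lit(args[0])
        key = (op, *[self.lit(a) for a in args])
        if key in self.cache:                       # reuse isomorphic operator
            return self.cache[key]
        ins = key[1:]
        o = self.new_var()                          # auxiliary output variable
        if op == "and":
            self.clauses += [[-o, i] for i in ins] + [[o] + [-i for i in ins]]
        elif op == "or":
            self.clauses += [[o, -i] for i in ins] + [[-o] + list(ins)]
        elif op == "ite":
            c, t, e = ins
            self.clauses += [[-c, -t, o], [-c, t, -o], [c, -e, o], [c, e, -o]]
        else:
            raise ValueError(op)
        self.cache[key] = o
        return o

    def assert_negation(self, f):
        """Add clauses for w = not f, plus the unit clause requiring w."""
        z = self.lit(f)
        w = self.new_var()
        self.clauses += [[w, z], [-w, -z], [w]]
        return w


def satisfiable(num_vars, clauses):
    """Brute-force check, suitable only for tiny examples."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False


# A tautology: its negation must be unsatisfiable.
builder = CnfBuilder()
builder.assert_negation(("or", "a", ("not", "a")))
tautology_confirmed = not satisfiable(builder.num_vars, builder.clauses)
```

The brute-force satisfiable function stands in for a real SAT-checker only to keep the example self-contained; a satisfying assignment of the generated clauses corresponds to a falsifying assignment of the original formula.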
We ran the experiments on a 336 MHz Sun4 with 4 GB of physical memory. For the
BDD-based runs, we used the BDD package CUDD version 2.3.0 (Somenzi, 2001),8 and
the sifting dynamic variable reordering heuristic (Rudell, 1993). In BED evaluations,9
we experimented with converting the final BED into a BDD with both the up_one()
and up_all() functions (Williams, 2000) by employing four different variable ordering
heuristics—variants of the depth-first and fanin heuristics (Malik et al., 1988)—that were
the most efficient when verifying multipliers (Williams et al., 2000).
The SAT procedures that scaled for the 100 buggy variants of 2×DLX-CC-MC-EX-BP
are listed in Table 1. The rest of the SAT solvers had trouble even with the single-issue
processor, 1 × DLX-C (whose CNF correctness formula had 776 variables, and 3725
clauses), or could not scale for its dual-issue version, 2 × DLX-CC (1516 CNF variables,
8 CUDD is available from: http://vlsi.colorado.edu/∼fabio.
9 Based on BED package version 2.5, available from: http://www.it-c.dk/research/bed.
Table 1
Comparison of SAT procedures on 100 buggy versions of 2 × DLX-CC-MC-EX-BP

SAT procedure          % Satisfiable in
                       <24 s     <240 s    <2400 s
Chaff                  100       100       100
BerkMin                97        100       100
DLM-3                  51        82        98
DLM-2                  50        84        97
UnitWalk               45        81        98
CGRASP                 44        49        68
QSAT                   33        47        52
SATO                   22        30        69
GRASP                  14        21        24
rel_sat.1.0            13        17        22
WalkSAT                10        16        27
rel_sat_rand           10        19        29
SIMO                   7         14        16
CLS                    5         8         10
GRASP with restarts    4         11        14
rel_sat.2.1            3         58        97
DLM-2000               2         24        66
BDDs                   2         2         3
eqsatz                 1         5         7
and 12 812 clauses) that does not implement exceptions, multicycle functional units, and
branch prediction. For example, Prover could not solve the Boolean formula for correct-
ness of 2×DLX-CC within 24 h, as reported in our earlier work (Velev and Bryant, 1999).
The SAT-checker Chaff had the best performance, finding a satisfying assignment for
each benchmark in less than 24 s (indeed, less than 23.24 s). It was closely followed by
BerkMin that solved 97 instances in less than 24 s each, and required less than 29 s for
each of the other three benchmarks. We ran the rest of the SAT procedures for 240 and
2400 s, one and two orders of magnitude more than Chaff. DLM-3 and DLM-2 were third
and fourth, respectively, but could solve only half the instances within the time limit of 24 s.
UnitWalk and CGRASP solved 45 and 44 instances, respectively, in 24 s for each, followed
by QSAT with 33 of the benchmarks under 24 s. The rest of the SAT procedures, including
BDDs, performed significantly worse. DLM-2000 is slower than DLM-3 and DLM-2 because
of extensive analysis before each decision.
The 100 buggy variants of 2 × DLX-CC-MC-EX-BP are available as benchmark
suite SSS-SAT.1.0 (Velev, 2000b), and have been used for SAT experiments by many
researchers. Lynce et al. (2001) present the SAT-checker Quest0.5, built on top of
GRASP and based on restarts and random backtracking. They report that Quest0.5 took
292 s to solve the 100 buggy variants of 2×DLX-CC-MC-EX-BP on their computer, while
Chaff required 84 s, i.e. was approximately 3.5 times faster. Janssen (2001) describes a
pointerless BDD package that required less memory than CUDD version 2.3.0 when run on
the benchmarks in suite SSS.1.0 (Velev, 1999), consisting of 48 variants of 2 × DLX-CC.
His BDD package was up to three times faster on eight of the benchmarks, but required
comparable CPU time for the rest. He does not present results for the more challenging
100 buggy variants of 2 × DLX-CC-MC-EX-BP.
When verifying the correct version of 2 × DLX-CC-MC-EX-BP (4583 CNF variables,
and 41 704 clauses), BerkMin was the fastest, requiring 15 s, followed by Chaff with
22 s. BDDs took 2635 s (Velev and Bryant, 2000), while QSAT took 14 h and 37 min. CGRASP,
SATO, GRASP, and GRASP used in a mode with restarts, randomization, and recursive
learning did not finish in 24 h.
We then compared Chaff and BerkMin on the 100 buggy VLIW designs: Chaff was
better in 73 cases; BerkMin was faster by at least 60 s on only seven benchmarks. For
Chaff, the minimum time per benchmark was 3.7 s and the maximum 180.4 s, as compared
to a minimum of 8.7 s and a maximum of 151.4 s for BerkMin. The average time was
32.5 s for Chaff, and 43.6 s for BerkMin. The variations in the times to find a satisfying
assignment, i.e. to detect bugs, can be explained by the fact that the single or multiple
errors in a buggy design affect a different number of architectural state elements and
under different conditions than the errors in another buggy design. Generally, errors that
affect fewer architectural state elements and under rare conditions are harder to detect. The
SAT-checker DLM-3 did not complete five of the VLIW benchmarks in 3600 s. The CNF
formulae from verification of the 100 buggy VLIW designs are available as benchmark
suite VLIW-SAT.1.0 (Velev, 2000b).
The correct 9VLIW-MC-BP had a CNF formula with 20 093 variables, and 179 492
clauses. Chaff took 759 s to prove the unsatisfiability of that formula, while BerkMin
required 224 s. In the original experiments, BDDs took 31.5 h (Velev, 2000a).
While preparing the final version of this paper, we learned of another proprietary SAT-
checker (Pilarski and Hu, 2002), recently developed at Synopsys, Inc., and also extending
the ideas from Chaff. Pilarski and Hu (2002) report that their SAT solver is 2.4 times faster
than zChaff10 on the 100 buggy superscalar benchmarks (SSS-SAT.1.0), 3.3 times faster
on the 100 buggy VLIW benchmarks (VLIW-SAT.1.0), and more than six times faster on
the unsatisfiable CNF instances from correct designs (FVP-UNSAT.1.0) that include the
VLIW processor. However, we found zChaff to be slower than mChaff, which is used in
this paper. We could not obtain Pilarski and Hu’s SAT-checker in order to directly compare
it with Chaff (i.e. mChaff) and BerkMin.
Fig. 7 compares Chaff and BDDs on the 100 buggy VLIW designs, such that Chaff
is evaluating only one monolithic correctness criterion, while BDDs evaluate 16 weak
and simpler correctness criteria in parallel (Velev, 2000a)—see Section 7. The assumption
is that there are enough computing resources for parallel runs of the verification tool
EVC (Velev and Bryant, 2001b) that can directly use BDDs, instead of saving the formula
in CNF format. As soon as one of these parallel runs finds a counterexample, we
terminate the rest, and consider the minimum time as the verification time. As shown,
the difference between BDDs and Chaff is up to four orders of magnitude. In a different
body of work—bounded model checking—Clarke et al. (2001), Bjesse et al. (2001) and
Copty et al. (2001) also report that SAT-checkers performed much better than BDDs.
10 Available from http://www.ee.princeton.edu/∼chaff/software.php.
Fig. 7. Comparison of Chaff (with one run) and BDDs (with 16 parallel runs) on the 100 buggy versions of
9VLIW-MC-BP. The benchmarks are sorted in ascending order of their times for the BDD-based experiments.
Applying the script simplify (Marques-Silva, 2000) in order to algebraically simplify
the CNF formula for one of the buggy VLIW designs required more than 47 000 s, while
Chaff took only 14 s to find a satisfying assignment without simplifications. This is not
surprising, given that the buggy VLIW designs had CNF formulae of up to 25 000 vari-
ables, and up to 450 000 clauses. Another simplifier, presented by Brafman (2001), took
130 s to process the CNF formula, but did not speed up Chaff. We also tried the MINCE
heuristic (Aloul et al., 2001) that uses a min-cut linear placement algorithm in order to
statically rename the CNF variables in a way that reduces the cutwidth of the formula.
MINCE took 3203 s, and the resulting CNF formula required an almost doubled CPU time
for a solution by Chaff. Finally, Wang et al. (2001) propose another algorithm for comput-
ing a CNF variable ordering that reduces the cutwidth. They ran it on the nine satisfiable
CNF instances in benchmark suite SSS.1.0a (Velev, 1999)—formulae generated in the for-
mal verification of buggy variants of 2×DLX-CC. Their algorithm could not complete five
of the instances within a time limit of 10 000 s for each, and solved the other four after more
than 5700 s total. In contrast, Chaff solves these nine instances in 26 s total. Therefore,
attempts to preprocess CNF formulae prior to SAT-checking did not yield improvement.
Hence, based on experiments with two benchmark suites, each consisting of one
correct high-level processor and 100 buggy variants of the same design, we identified
Chaff and BerkMin as the most efficient SAT procedures for solving satisfiable CNF
instances generated in the formal verification of incorrect processors, with BerkMin
being significantly faster on unsatisfiable instances from formal verification of the correct
designs. As observed in our earlier work (Velev and Bryant, 2001a), the breakthrough
occurred with Chaff, which was more than two orders of magnitude faster than the other
SAT solvers available at the time. BerkMin was created later and further develops the ideas
from Chaff, but is a proprietary SAT solver that is not publicly available.11 How do such
powerful SAT-checkers change the frontier of possibilities? The rest of the paper examines
ways to increase the productivity in microprocessor formal verification by using Chaff
and BerkMin as the back-end SAT procedures.
5. Impact of variations in eliminating UFs and UPs
When translating the EUFM correctness formula to an equivalent Boolean formula, we
can apply the following two structural variations:
• Early reduction of p-equations. In eliminating UFs and UPs and enforcing
functional consistency with nested ITEs (see Section 2.2), the translation algorithm
introduces equations between argument terms in order to control the nested ITEs.
Although such equations are implicitly negated for the case when they select the
else-term of one of these nested ITEs, we can still treat them as p-equations as
long as each of their argument terms has only p-term variables in its support
(Velev and Bryant, 1999; Bryant et al., 2001). This results in replacing the UFs and
UPs with lookup tables that map each unique combination of symbolic expressions
at the inputs of the UFs or UPs to a corresponding new term variable or Boolean
variable, respectively, for the output value. The elimination of UFs and UPs is done
recursively, starting from the leaves of the EUFM correctness formula. Then, when
an application of a UF or UP is eliminated, the expressions for its input terms will
consist of only nested ITEs that select one from a set of supporting term variables. If
the terms on both sides of an equation have disjoint supports of p-term variables, then
the two compared terms will not be equal under a maximally diverse interpretation,
and their equation can be replaced with the constant false. This is done in the
final step of the translation algorithm (Velev and Bryant, 1999). However, an early
reduction of such equations will result in a different (but equivalent) structure of the
Boolean correctness formula, i.e. in a different (but equivalent) CNF formula to be
evaluated by SAT-checkers.
• Eliminating UPs with Ackermann constraints. Using Ackermann constraints
(Ackermann, 1954) to enforce the functional consistency of eliminated UFs and UPs,
as discussed in Section 2.2, yields a negated equation for the new variables, c1 and
c2, that replace the original UF or UP applications:
[(a1 = a2) ∧ (b1 = b2) ⇒ (c1 = c2)] ⇒ F′,
which is equivalent to:
[(a1 = a2) ∧ (b1 = b2) ∧ ¬(c1 = c2)] ∨ F′.
The negated equation for the new variables c1 and c2 means that they cannot be
p-terms—something that we want to avoid in order to exploit the computational
efficiency of positive equality. Therefore, Ackermann constraints should not be
11 We thank E. Goldberg for releasing BerkMin to us.
Table 2
Maximum and average times for finding satisfying assignments in the formal verification of the 100 buggy VLIW
designs when structural and parameter variations were used: “base” means no structural variations; “ER” stands
for early reductions of p-equations; “AC” for Ackermann constraints in eliminating UPs; “base1”, “base2”, and
“base3” mean no structural variations when generating the Boolean correctness formula, but variations of Chaff’s
command parameters as explained in the text

SAT-checker       Variations for each tool     Comment           Parallel   Maximum    Average
                                               (runs per tool)   runs       time (s)   time (s)
Chaff             Base                         –                 1          180.4      32.5
BerkMin           Base                         –                 1          151.4      43.6
Chaff, BerkMin    Base                         1                 2          138.3      22.3
Chaff             Base, ER, AC, ER + AC        –                 4          74.9       14.4
BerkMin           Base, ER, AC, ER + AC        –                 4          62.0       20.3
Chaff, BerkMin    Base, ER                     2                 4          132.0      17.1
Chaff, BerkMin    Base, AC                     2                 4          61.3       17.0
Chaff, BerkMin    Base, ER + AC                2                 4          68.2       15.1
Chaff             Base, base1, base2, base3    –                 4          176.8      15.0
used to eliminate UFs whose results appear only in positive equations. However,
Ackermann constraints can be used to eliminate UPs—then the negated equations
will be over Boolean variables, and that is not a problem when using positive
equality. Hence, Ackermann constraints can be used instead of nested ITEs to
eliminate UPs.
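The equivalence stated in the Ackermann-constraint item above, and the fact that the equation over the new variables ends up negated, can be checked mechanically. The following Python sketch treats each of the three equations and F′ as an independent Boolean value and enumerates all 16 combinations; the function names are hypothetical and chosen here only for illustration.

```python
from itertools import product


def implies(p, q):
    return (not p) or q


def ackermann_form(eq_a, eq_b, eq_c, f_prime):
    # [(a1 = a2) AND (b1 = b2) => (c1 = c2)] => F'
    return implies(implies(eq_a and eq_b, eq_c), f_prime)


def expanded_form(eq_a, eq_b, eq_c, f_prime):
    # [(a1 = a2) AND (b1 = b2) AND NOT (c1 = c2)] OR F'
    return (eq_a and eq_b and not eq_c) or f_prime


# The two forms agree on every combination of truth values.
equivalent = all(
    ackermann_form(*values) == expanded_form(*values)
    for values in product([False, True], repeat=4)
)
```

In the expanded form the value of (c1 = c2) appears only under a negation, which is why c1 and c2 cannot be treated as p-terms when they are term variables, and why the construction is harmless when they are Boolean variables replacing UP applications.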
In order to exploit the above structural variations, we can run parallel copies of the
formal verification tool flow, all of them applied to the same design, and each using a
different structural variation or combination thereof. Then, one satisfying assignment is
enough to detect a bug, i.e. we take the minimum of the run times as the time to find the
bug.
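As a small illustration of how the parallel runs are scored, the sketch below takes the per-benchmark minimum over several variations and then reports the maximum and average of those minima, which is how the multi-run rows of a table such as Table 2 can be read. The function name and the three-benchmark timing data are made up for the example and are not measurements from the paper.

```python
def portfolio_times(times_by_variation):
    """times_by_variation maps a variation name to per-benchmark times (s).

    All lists must cover the same benchmarks in the same order.
    Returns (per-benchmark minimum, maximum of minima, average of minima).
    """
    runs = list(times_by_variation.values())
    best = [min(per_benchmark) for per_benchmark in zip(*runs)]
    return best, max(best), sum(best) / len(best)


# Hypothetical timings for three benchmarks and two variations.
example = {
    "base": [10.0, 40.0, 5.0],
    "ER": [20.0, 8.0, 6.0],
}
best, worst_case, average = portfolio_times(example)
```

In an actual parallel setup the remaining runs would be terminated as soon as one run finds a satisfying assignment, so the minimum is also the wall-clock time to detect the bug.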
As Table 2 shows, when each of Chaff and BerkMin was used in four parallel runs—
one base run without structural variations and three runs with such variations (i.e. base,
ER, AC, ER + AC, as explained in the table)—the average time was reduced by a
factor of 2—from 32.5 to 14.4 s for Chaff, and from 43.6 to 20.3 s for BerkMin.
Similarly, the maximum time was reduced by a factor of 2.5 for both tools—from 180.4
to 74.9 s for Chaff, and from 151.4 to 62.0 s for BerkMin. We also ran Chaff with
three parameter variations in the base configuration file, cherry 032301, as suggested by
Moskewicz (2001): (1) the restart period was increased from 2000 to 3000; (2) the restart
period was increased from 2000 to 4000; and (3) the randomness at restart was increased
from 3 to 10. The results of these three runs with parameter variations, combined with the
base run, are summarized in the last row of Table 2. The reduction in the average time was
comparable to that achieved in any of the other experiments with four parallel runs where
Chaff was used. BerkMin was released to us without the option to vary its command
parameters. However, when verifying the correct VLIW processor, structural variations
slowed both Chaff and BerkMin, while parameter variations slowed Chaff.
Fig. 8. The eij encoding of g-equations. (a) An EUFM formula F, where g-term variables g1, g2, g3, and g4 are
compared for equality in a cycle of length four; (b) the equality comparison graph between g1, g2, g3, and g4, where an
edge indicates an equality comparison; (c) the triangulated equality comparison graph, with one extra edge g2–g4
added, and eij variables assigned to the edges; (d) transitivity of equality constraints for the two triangles of the
graph in (c).
Therefore, although structural or parameter variations can speed up the detection of
bugs if resources are available for parallel runs of the tool flow, that was not critical for
the 100 buggy VLIW designs, since the maximum and average times for the base runs
with either Chaff or BerkMin were already so low. Neither structural nor parameter variations
accelerated the verification of the correct VLIW processor, regardless of the SAT-checker.
6. Impact of g-equation encodings
When translating the EUFM correctness formula to an equivalent Boolean formula, we
can encode the g-equations with one of the following two schemes:
• The eij encoding. The equation gi = gj, where gi and gj are g-term variables, is
replaced by a new Boolean variable eij (Goel et al., 1998). Transitivity of equality,
i.e. the property (gi = gj) ∧ (gj = gk) ⇒ (gi = gk), has to be enforced additionally,
e.g. by triangulating the equality comparison graph of the eij variables that affect
the final Boolean formula and then enforcing transitivity for each of the resulting
triangles, as done in our sparse transitivity scheme (Bryant and Velev, 2002); see
Fig. 8 for an example. The triangulation is done iteratively, in a greedy manner, such
that at each step: nodes of degree 1 and their single edges are removed, since such
nodes are not part of cycles for which transitivity of equality has to hold; the node
of the smallest degree n ≥ 2 is found; up to n − 1 extra edges are added, if they do
not exist already, in order to form n − 1 triangles with the node’s edges (e.g. edge
g2–g4 is added in Fig. 8(c) in order to triangulate the two edges, g1–g2 and g1–g4,
of node g1); the node and its edges are removed, and the procedure is applied to
the remaining nodes by considering the newly added edges; finally, the original and
the extra edges are put together to form the triangulated equality comparison graph.
Although not every correct microprocessor requires transitivity for its correctness
Fig. 9. The small-domain encoding of g-equations, applied to the equality comparison graph in Fig. 8 (b) with
the greedy strategy of assigning a characteristic constant to the unprocessed node of highest degree. Ties are
broken randomly. A circled node, e.g. g1 in (a), means that the node is currently being processed, i.e. assigned
a characteristic constant. The same constant is also added to the sets of constants for the nodes that can be
reached via a path of edges starting from the currently processed node. After a node is processed, its edges are
removed. An empty node, e.g. g1 in (b), means that the node has already been processed. (a) Node g1 was chosen
randomly, since all nodes have degree 2 initially. (b) Node g3 was chosen, since it has the highest degree, 2.
(c) Node g4 was chosen randomly from the unprocessed nodes g2 and g4 of degree 0. (d) The only unprocessed
node, g2, was assigned a characteristic constant. (e) Based on the constants in its set, each g-term variable gi is
assigned a nested-ITE expression that is controlled by new indexing variables xik , and evaluates to one of these
constants, given an assignment to the variables xik . (f) Each edge in the equality comparison graph is labeled
with a Boolean formula, encoding the conditions when the two g-term variables at the ends of the edge will
simultaneously evaluate to a common constant, i.e. will be equal.
proof, that property is needed in order to avoid false negatives for buggy designs or
for processors that do need transitivity.
• The small-domain encoding. Every g-term variable is assigned a set of constants
that it can take on in a way that allows it to be either equal to or different from
any other g-term variable with which it can be transitively compared for equality
(Pnueli et al., 1999)—see Fig. 9(a)–(d). If there are N constants in the set for a g-
term variable, those can be indexed with ⌈log2(N)⌉ new Boolean variables that
will be used to control nested ITEs selecting a mapping of the g-term variable to
a constant in the set—see Fig. 9(e). For example, g-term variable g2 is assigned a
set of three constants {c1, c2, c4} in Fig. 9(d), so that we can introduce two indexing
variables, x21 and x22, and form the expression ITE(x21, c1, ITE(x22, c2, c4)) that
will be used to replace g2. Then, two g-term variables will be equal if their indexing
variables simultaneously select the same common constant—see Fig. 9(f). Hence,
Table 3
Comparison of the eij and small-domain encodings on the 100 buggy versions of 9VLIW-MC-BP, using both
Chaff and BerkMin. The experiments with one run of the tool flow are without structural variations. The
experiments with four runs also include a run with early reduction of p-equations, another with Ackermann
constraints used to eliminate UPs, and a fourth run with both of these transformations (see Section 5)

SAT-checker   Parallel   g-equation encoding
              runs       eij                      Small-domain
                         Maximum     Average      Maximum     Average
                         time (s)    time (s)     time (s)    time (s)
Chaff         1          180.4       32.5         594.0       100.4
Chaff         4          74.9        14.4         338.4       35.2
BerkMin       1          151.4       43.6         245.0       85.0
BerkMin       4          62.0        20.3         226.5       56.7
g-term variables in a cycle can be equal if they simultaneously evaluate to the same
common constant, so that transitivity of equality is automatically enforced in this
encoding. Depending on the structure of the equality comparison graph, the small-domain
encoding might introduce fewer primary Boolean variables than the eij
encoding. That would mean a smaller search space. However, now many g-equations
will get replaced by a Boolean formula: a disjunction of conjuncts, each consisting
of many Boolean variables or their complements, and encoding the possibility that
two g-term variables evaluate to the same common constant. In contrast, in the eij
encoding, a g-equation always gets replaced by a single Boolean variable.
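The greedy triangulation used with the eij encoding can be sketched in Python as follows. This is an illustration under stated assumptions rather than the sparse transitivity implementation of Bryant and Velev (2002): ties between nodes of equal degree are broken by name, and the sketch connects every pair of neighbours of the eliminated node, which is the standard minimum-degree fill-in and coincides with the n − 1 extra edges described above when n = 2 but may add more edges for larger n. The function names are hypothetical.

```python
from itertools import combinations


def triangulate(edges):
    """Greedy minimum-degree triangulation of an equality comparison graph.

    edges: iterable of pairs of node names.
    Returns (all_edges, triangles); all_edges includes the extra edges.
    """
    def norm(u, v):
        return tuple(sorted((u, v)))

    all_edges = {norm(u, v) for u, v in edges}
    adj = {}
    for u, v in all_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triangles = set()

    def remove(node):
        for nb in adj.pop(node):
            adj[nb].discard(node)

    while adj:
        # Nodes of degree 0 or 1 cannot lie on a cycle, so drop them first.
        low = [n for n in sorted(adj) if len(adj[n]) in (0, 1)]
        if low:
            remove(low[0])
            continue
        node = min(sorted(adj), key=lambda n: len(adj[n]))
        for a, b in combinations(sorted(adj[node]), 2):
            if b not in adj[a]:               # extra edge, i.e. a new eij variable
                adj[a].add(b)
                adj[b].add(a)
                all_edges.add(norm(a, b))
            triangles.add(tuple(sorted((node, a, b))))
        remove(node)
    return all_edges, triangles


def transitivity_clauses(triangles):
    """Three clauses per triangle; a literal is (is_positive, edge)."""
    clauses = []
    for a, b, c in sorted(triangles):
        e = [(a, b), (b, c), (a, c)]
        for i in range(3):
            x, y, z = e[i], e[(i + 1) % 3], e[(i + 2) % 3]
            clauses.append([(False, x), (False, y), (True, z)])
    return clauses


# The cycle of length four from Fig. 8(b).
cycle = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g1", "g4")]
all_edges, triangles = triangulate(cycle)
clauses = transitivity_clauses(triangles)
```

On the four-node cycle the sketch adds the single extra edge g2–g4 and produces two triangles, in line with Fig. 8(c) and (d).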
We compared the two encodings on the 100 buggy VLIW designs (see Table 3). When
using Chaff with a single run of the tool flow, the eij encoding (used for the experiments
before this section) resulted in three times faster detection of bugs: the maximum and
average times were 180.4 and 32.5 s, compared to 594.0 and 100.4 s with the small-domain
encoding. Constraints for transitivity of equality were included when using the
eij encoding. When four parallel runs with structural variations were employed (base, ER,
AC, ER + AC; see Section 5), the eij encoding was again faster, by a factor of about 2.4 on
average and 4.5 in the worst case, with maximum and average times of 74.9 and 14.4 s,
compared to 338.4 and 35.2 s with the small-domain encoding. BerkMin was also faster
with the eij encoding. Fig. 10 shows a detailed plot of BerkMin’s performance on one run
with each encoding: the eij encoding resulted in faster detection of bugs for 87 of the 100 designs.
When verifying the correct 9VLIW-MC-BP, the eij encoding required more than twice
as many primary Boolean variables as the small-domain encoding, but half the CPU time
for SAT-checking with Chaff: 2615 primary Boolean variables (2353 of them being eij
variables) and 759 s of CPU time, compared with 1152 primary Boolean variables (890 of
them being indexing variables) and 1479 s of CPU time with the small-domain encoding.
BerkMin took 224 s with the eij encoding, but 418 s with the small-domain encoding.
Again, constraints for transitivity of equality were included in the formula generated with
the eij encoding.
Fig. 10. Comparison of the eij and small-domain encodings on the 100 buggy versions of 9VLIW-MC-BP, using
BerkMin and one run of the tool flow without structural variations. The benchmarks are sorted in ascending order
of their times with the small-domain encoding. BerkMin was used for this plot, because BerkMin performed
better than Chaff on one run with the small-domain encoding (see Table 3).
We also compared the two encodings on correct designs that do require transitivity
of equality for their correctness proofs—superscalar processors with out-of-order execu-
tion that can execute register–register, and load instructions. Because instructions are dis-
patched when they do not have Write-After-Write (in addition to Write-After-Read and
Read-After-Write) dependencies (Hennessy and Patterson, 2002) on instructions that are
earlier in the program order but are stalled due to data dependencies, transitivity of equal-
ity is required in proving the equality of the final states of the Register File reached after
the implementation and the specification sides of the commutative correctness diagram.
As shown in Table 4, the small-domain encoding introduces fewer primary Boolean
variables (one-fourth of those required by the eij encoding for the 6-wide design) but
results in approximately 50% more CNF variables and 10–20% more CNF clauses, and
thus in longer CPU times with either SAT-checker (see Table 5). BerkMin was an order
of magnitude faster than Chaff on the 4-, 5-, and 6-wide designs, due to BerkMin’s
heuristics that are fine-tuned for CNF formulae derived from deeply nested expressions
(Goldberg, 2002)—the case in these benchmarks. The CNF formulae from the out-of-order
processors are available as benchmark suite FVP-UNSAT.2.0 (Velev, 2000b).
The efficiency of the eij encoding can be explained by the impact of g-equations on
the instruction flow, and hence on the correctness formula. Such equations determine
forwarding and stalling conditions, based on equality comparisons of register identifiers,
as well as instruction squashing conditions for correcting branch mispredictions, based on
equality comparisons of actual and predicted branch targets. Therefore, g-equations affect
the branching behavior in instruction execution. A single Boolean variable, introduced
Table 4
Statistics for the eij and small-domain encodings when verifying correct out-of-order superscalar processors.
These designs require transitivity of equality for their correctness proofs. The results are listed as a function of
the processor issue width

Issue    g-equation encoding
width    eij                                   Small-domain
         Primary      CNF          CNF         Primary      CNF          CNF
         Boolean      variables    clauses     Boolean      variables    clauses
         variables                             variables
2        139          925          8 213       81           1 294        9 803
3        308          2 577        33 270      127          3 780        39 475
4        553          5 525        96 480      194          8 362        112 636
5        857          10 113       240 892     249          15 647       275 581
6        1243         17 186       528 962     304          26 738       590 832
Table 5
CPU time to prove unsatisfiability when verifying correct out-of-order superscalar processors. These designs
require transitivity of equality for their correctness proofs. Both Chaff and BerkMin were run on the same
correctness formula, which was generated without structural variations. The results are listed as a function of the
processor issue width

Issue    g-equation encoding
width    eij                       Small-domain
         Chaff       BerkMin       Chaff       BerkMin
         time (s)    time (s)      time (s)    time (s)
2        3.9         1.6           7.3         1.7
3        46          15            49          19
4        653         65            1 049       99
5        1 381       154           1 864       255
6        68 896      1957          132 428     3206
in the eij encoding, naturally fits the purpose of accounting for both cases: that the
equality comparison is either true or false. Transitivity of equality is enforced by automatic
application of the unit-clause rule implemented in SAT-checkers: if there is a single
unassigned literal in a CNF clause, with the rest of the literals being false, then the
unassigned literal has to get value true in order for the clause to be satisfied. Such an
assignment is called an implication. Hence, as soon as two variables, eij and ejk, in
triangle eij–ejk–eki become true, the third variable eki in that triangle will be assigned
value true, due to the imposed transitivity constraint (¬eij ∨ ¬ejk ∨ eki), where both ¬eij
and ¬ejk will be already false. Transitivity of equality is similarly enforced on cycles of
any length (Bryant and Velev, 2002), since cycles longer than three are triangulated with
new eij variables.
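The unit-clause rule and its effect on a transitivity triangle can be made concrete with a short Python sketch. It is a naive illustration, not the propagation engine of Chaff or BerkMin; the function name and the numbering of the variables (1 for eij, 2 for ejk, 3 for eki) are assumptions made for the example.

```python
def unit_propagate(clauses, assignment):
    """Apply the unit-clause rule until no more implications follow.

    clauses: list of lists of non-zero ints (negative = complemented variable).
    assignment: dict variable -> bool; an extended copy is returned.
    Returns None if some clause has all of its literals false (a conflict).
    """
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                var = abs(lit)
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == (lit > 0):
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None                      # every literal is false
            if len(unassigned) == 1:             # unit clause: an implication
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment


# Transitivity constraints for one triangle: 1 = eij, 2 = ejk, 3 = eki.
triangle = [[-1, -2, 3], [-2, -3, 1], [-1, -3, 2]]
implied = unit_propagate(triangle, {1: True, 2: True})
```

With eij and ejk set to true, the first clause becomes unit and a single implication assigns eki to true, which is the behaviour described in the paragraph above.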
The small-domain encoding requires more CNF variables and more implications to
enforce transitivity of equality. In the example cycle of length 4 in Fig. 9(f), we will need
to introduce auxiliary Boolean variables f23 and f34 to represent the values of the Boolean
functions that encode the equality comparisons (g2 = g3) and (g3 = g4), respectively.
Also, since each of these formulae is a disjunction of two conjuncts, we will need an
auxiliary variable for the output of each conjunct, for a total of six auxiliary Boolean
variables, in addition to the five indexing variables: x21, x22, x31, x41, and x42. Hence, the
small-domain encoding will require 11 CNF variables to encode the equality comparisons
(no additional constraints are needed to enforce transitivity) in the example cycle of
length 4. In contrast, the eij encoding will introduce five CNF variables: one for each
of the original four equality comparisons, and another for the triangulating edge (g2 = g4)
that was added to enforce transitivity (see Fig. 8(c)). Therefore, given an assignment of
values to three CNF variables representing the outputs of three of the g-equations in the
cycle of length 4, the small-domain encoding will require up to eight implications (one
for each of the other eight CNF variables related with the cycle) to enforce transitivity
of equality, as compared to at most two implications with the ei j encoding, where each
triangle may trigger an implication.
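The variable count for the ei j encoding of a cycle can be reproduced with a small sketch. It assumes the example cycle is g1, g2, g3, g4 and uses a simple fan triangulation from one node; the actual tool may choose chords differently, and the function names are hypothetical.

```python
def triangulate_cycle(nodes):
    """Fan-triangulate a cycle given as an ordered list of node names.

    Returns (edges, chords, triangles); all chords start at nodes[0].
    """
    n = len(nodes)
    edges = [(nodes[i], nodes[(i + 1) % n]) for i in range(n)]
    hub = nodes[0]
    chords = [(hub, nodes[i]) for i in range(2, n - 1)]
    triangles = [(hub, nodes[i], nodes[i + 1]) for i in range(1, n - 1)]
    return edges, chords, triangles


def transitivity_clauses(triangles):
    """Number one Boolean variable per edge and emit three clauses per triangle."""
    var = {}

    def e(a, b):
        # The same variable is used for (a, b) and (b, a).
        return var.setdefault(frozenset((a, b)), len(var) + 1)

    clauses = []
    for a, b, c in triangles:
        x, y, z = e(a, b), e(b, c), e(c, a)
        clauses += [[-x, -y, z], [-y, -z, x], [-z, -x, y]]
    return var, clauses


# Starting the fan at g2 yields the chord (g2 = g4) mentioned in the text.
edges, chords, triangles = triangulate_cycle(["g2", "g3", "g4", "g1"])
variables, clauses = transitivity_clauses(triangles)
```

For the cycle of length 4 this gives four edge variables plus one chord variable, two triangles, and six transitivity clauses, so at most one implication per triangle is needed once three of the cycle edges are assigned.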
With the ei j encoding, the number of Boolean variables, encoding equality and
transitivity constraints for a cycle, does not depend on the number of outside edges
(i.e. equality comparisons) that are not part of a cycle and that are connected to nodes in
the cycle. On the other hand, with the small-domain encoding, outside edges might result
in more constants being added to the sets for all or some of the nodes in the cycle. Those
constants will come from outside nodes (i.e. g-term variables) that are not part of the cycle,
but that can be transitively compared for equality with each node in the cycle. Such extra
constants might result in extra indexing variables and conjuncts in the disjunctive formulae
encoding the equality comparisons in the cycle, thus increasing both the complexity of
those formulae and the number of implications required to enforce transitivity. Also, if
the same additional constant gets included in the sets of all g-term variables in a cycle—
e.g. due to an outside node that gets assigned a characteristic constant before the nodes
in the cycle, such that this node is transitively connected with the nodes in the cycle—
then the nodes in the cycle can be equal in more than one way, which will increase the
number of implications required to enforce transitivity. Hence, the small-domain encoding
enumerates all mappings of g-term variables to a sufficient set of distinct constants, thus
introducing more information than actually required to solve the problem.
An additional source of inefficiency in the small-domain encoding is that an fi j variable,
representing the output of a function encoding the equality gi = g j , can be true for
many assignments to its supporting indexing variables, and can be false for many other
assignments to those variables. Hence, it is possible that portions of the search space
(e.g. where fi j = true) will be revisited multiple times. Although learning, employed in
both Chaff and BerkMin, can reduce or even eliminate such revisits, both SAT-checkers
age the learned clauses and periodically discard old learned clauses—hence the possibility
to revisit portions of the search space. Note that each feasible assignment to ei j variables
(i.e. assignment not violating transitivity of equality) is a feasible assignment to fi j
variables, except that it can be justified with many possible assignments to the indexing
variables. Therefore, multiple branches in the formula could be revisited for what would
be just one visit with the ei j encoding. As a result of all these factors, the ei j encoding is
more efficient than the small-domain encoding.
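The point that one value of fi j can be justified by many assignments to the indexing variables can be illustrated by brute-force counting. The constant sets below are hypothetical and are not taken from the paper's figures; the sketch only counts how many mappings of two g-term variables to constants make their equality true or false.

```python
from itertools import product


def count_justifications(constants_i, constants_j):
    """Count mappings of (g_i, g_j) to constants that make f_ij true and false."""
    equal = sum(1 for a, b in product(constants_i, constants_j) if a == b)
    total = len(constants_i) * len(constants_j)
    return equal, total - equal


# Hypothetical small domains: both g-term variables range over three constants.
true_ways, false_ways = count_justifications(["c1", "c2", "c3"], ["c1", "c2", "c3"])
```

With the ei j encoding, each of these outcomes corresponds to a single value of one Boolean variable, whereas here the same outcome is reachable through several distinct assignments that a SAT-checker may explore separately.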
In a different application, encoding constraint satisfaction problems as SAT
instances, Hoos (1999) similarly found that better performance is achieved with an
encoding that introduces more primary Boolean variables, but results in conceptually
simpler search spaces.

Table 6
CPU time (in seconds) to detect an error in the 100 buggy versions of 9VLIW-MC-BP, using Chaff or BerkMin.
Each SAT-checker was run in parallel on up to 16 weak correctness criteria, stopping as soon as one of the runs
finds a satisfying assignment that triggers a bug

Parallel   Chaff                          BerkMin
runs       Minimum   Maximum   Average    Minimum   Maximum   Average
1          3.7       180.4     32.5       8.7       151.4     43.6
8          0.3       31.3      4.1        2.2       32.7      8.5
16         0.2       17.5      2.8        2.3       18.6      6.3
In their decision procedure based on the small-domain encoding, Pnueli et al. (1999)
use Ackermann constraints to eliminate all UFs, including those that appear only in
p-equations in the EUFM correctness formula. Thus, when enforcing functional consis-
tency, they introduce negated equations for the new term variables that replace such UFs
(as discussed in Section 5), turning these term variables into g-terms, whose equations will
have to be encoded with Boolean functions. In contrast, by exploiting positive equality and
the nested-ITE scheme for eliminating UFs, we treat the new term variables as p-terms, thus
reducing the number of g-equations that have to be encoded with new Boolean variables.
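The difference between the two UF-elimination schemes can be sketched symbolically. The code below builds plain strings for three applications of an uninterpreted function f; the names `vf1`, `vf2`, ... for the new term variables and the string syntax are assumptions for illustration, not EVC output.

```python
def ackermann(args):
    """Ackermann's scheme: replace f(a_i) by vf_i and add pairwise consistency constraints."""
    n = len(args)
    fresh = [f"vf{i + 1}" for i in range(n)]
    constraints = [
        f"({args[i]} = {args[j]}) => ({fresh[i]} = {fresh[j]})"
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return fresh, constraints


def nested_ite(args):
    """Nested-ITE scheme: replace f(a_i) by a chain of ITEs over earlier arguments."""
    terms = []
    for i, arg in enumerate(args):
        term = f"vf{i + 1}"
        for j in range(i - 1, -1, -1):
            term = f"ITE({arg} = {args[j]}, vf{j + 1}, {term})"
        terms.append(term)
    return terms


fresh, constraints = ackermann(["a1", "a2", "a3"])
ite_terms = nested_ite(["a1", "a2", "a3"])
```

In the Ackermann version the new variables appear in equations such as (vf1 = vf2) inside the consistency constraints, which is what turns them into g-terms; in the nested-ITE version they appear only as ITE data inputs, so they can stay p-terms.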
7. Impact of decomposing the correctness criterion
The correctness criterion can be evaluated with one monolithic computation:
( f0,1 ∧ f0,2 ∧ · · · ∧ f0,N ) ∨ · · · ∨ ( fk,1 ∧ fk,2 ∧ · · · ∧ fk,N ) = true,
where fl,m is a Boolean formula checking whether memory element m is updated by
l instructions, 0 ≤ l ≤ k, given the fetch width k of the processor. Formulae fl,m
are produced after translating a corresponding EUFM formula to a Boolean formula by
exploiting positive equality. However, the evaluation can be decomposed (Velev, 2000a)
by selecting a set of disjoint window functions wl , one for each index l, where wl consists
of either just one of the functions fl,m or a conjunction of several of them with the same
index l, such that w0 ∨ · · · ∨ wk = true, and then proving that wl ⇒ fl,i for each l and
each i such that fl,i is not used in forming wl . That results in a set of smaller computations
(weak correctness criteria), each depending on only a subset of the formulae fl,m in the
monolithic computation. However, proving all of these weak criteria is sufficient to imply
that the monolithic criterion is true, without actually evaluating it. That resulted in a factor
of 4 reduction in the CPU time for the BDD-based evaluation of the correct 9VLIW-MC-
BP (Velev, 2000a). Note that when proving correctness with multiple parallel runs by using
decomposition, we need to wait until all of them complete, taking the maximum CPU time
as the verification time. All experiments in this section use the ei j encoding of g-equations.
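The soundness of the decomposition can be checked on a toy instance by exhaustive enumeration. This is only a sketch: the formulae fl,m below are hypothetical functions of two Boolean inputs (with k = 1 and N = 2), and the window functions are chosen by hand as w0 = f0,1 and w1 = f1,1.

```python
from itertools import product


def is_valid(formula, num_vars):
    """True if the formula evaluates to true under every assignment."""
    return all(formula(*bits) for bits in product([False, True], repeat=num_vars))


# Hypothetical formulae f[l][m] over inputs (a, b).
f = [
    [lambda a, b: not a, lambda a, b: (not a) or b],  # f_{0,1}, f_{0,2}
    [lambda a, b: a, lambda a, b: a or b],            # f_{1,1}, f_{1,2}
]
window = [f[0][0], f[1][0]]  # w_l is taken to be f_{l,1}


def monolithic(a, b):
    """Disjunction over l of the conjunction of all f_{l,m}."""
    return any(all(g(a, b) for g in row) for row in f)


def weak_criteria():
    """The window coverage check plus one implication w_l => f_{l,i} per unused f_{l,i}."""
    criteria = [lambda a, b: any(w(a, b) for w in window)]
    for l, row in enumerate(f):
        for g in row:
            if g is not window[l]:
                criteria.append(lambda a, b, w=window[l], g=g: (not w(a, b)) or g(a, b))
    return criteria


decomposed_ok = all(is_valid(c, 2) for c in weak_criteria())
monolithic_ok = is_valid(monolithic, 2)
```

Each weak criterion mentions only the window function and at most one other formula, which is why the individual checks are smaller than the monolithic one; here both approaches agree that the toy criterion is valid.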
The benefits of decomposition when verifying the 100 buggy versions of 9VLIW-MC-BP
with Chaff and BerkMin are shown in Table 6. Using Chaff and running eight
weak correctness criteria, the maximum CPU time is reduced from 180.4 to 31.3 s and
the average from 32.5 to 4.1 s, while running 16 weak criteria results in a maximum
of 17.5 s and an average of 2.8 s. The performance of BerkMin is very similar on
these benchmarks. While the achieved reductions are not critical for the present set of
benchmarks, decomposition might become important for detecting bugs in more complex
designs.

Table 7
Effect of decomposing the correctness condition when detecting four actual bugs in the design of
9VLIW-MC-BP-EX, using Chaff and BerkMin. The experiments with one run are based on a monolithic
correctness criterion

Bugs while designing   Parallel   Maximum primary     Chaff CPU   BerkMin CPU
9VLIW-MC-BP-EX         runs       Boolean variables   time (s)    time (s)
Bug1                   1          5127                16.2        65.0
                       20         4926                10.2        15.4
Bug2                   1          5400                12.2        50.0
                       20         5043                10.9        16.4
Bug3                   1          3500                29.3        53.0
                       22         3106                18.3        5.4
Bug4                   1          3500                108.4       153.0
                       22         3106                39.5        22.0
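The policy used for the parallel runs, where the first satisfying assignment ends a bug hunt while a proof of correctness needs every run to finish, can be sketched as follows. The brute-force `brute_force_sat` function is a stand-in for a real SAT-checker such as Chaff or BerkMin, and the clause lists are hypothetical; only the control flow is the point.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product


def brute_force_sat(clauses, num_vars):
    """Stand-in SAT-checker: return a satisfying assignment as a tuple, or None."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses):
            return bits
    return None


def check_weak_criteria(cnf_list, num_vars):
    """Run one check per weak criterion in parallel.

    Returns the first counterexample found, or None when every CNF
    formula is unsatisfiable (i.e. all weak criteria hold).
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(brute_force_sat, cnf, num_vars) for cnf in cnf_list]
        for future in as_completed(futures):
            model = future.result()
            if model is not None:
                for other in futures:
                    other.cancel()  # best effort; already running checks are not interrupted
                return model
    return None
```

For a buggy design the elapsed time is governed by the fastest run that finds a counterexample; for a correct design the loop only ends after the slowest run, which matches taking the maximum CPU time as the verification time.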
Both Chaff and BerkMin had sufficient capacity to verify an extension of 9VLIW-
MC-BP that implements exceptions, yielding the processor 9VLIW-MC-BP-EX. The
Instruction Memory, the ALUs, and the Data Memory could each raise an exception.
The exception conditions were stored in three new architectural state elements—one for
each of the exception sources. The PC of the instruction that raises an exception was
stored in another new architectural state element, the Exception PC (EPC). The design
also implemented a return-from-exception instruction that transfers the value of the EPC
to the PC, allowing the program execution to resume after a software exception handler
fixes the cause of the exception.
Four bugs were generated inadvertently when creating 9VLIW-MC-BP-EX, but were
detected in 12.2 to 108.4 s by Chaff when run on a monolithic correctness criterion—see
Table 7. BerkMin was consistently slower than Chaff when using a monolithic correctness
criterion, but was faster when detecting Bugs 3 and 4 with 22 weak correctness criteria
checked in parallel.
Table 8 shows the effect of decomposition when verifying correct designs. Using eight
weak correctness criteria to verify 9VLIW-MC-BP resulted in approximately a factor of 2
speedup with both Chaff and BerkMin. Doubling the weak correctness criteria to 16
produced another factor of 2 speedup for BerkMin, but a smaller speedup for Chaff.
When verifying the more complex 9VLIW-MC-BP-EX, using 11 weak correctness criteria
resulted in a factor of 2 speedup for both SAT-checkers, but 22 weak correctness criteria
produced a negligible speedup for Chaff and slightly lengthened the run time for BerkMin.
Hence, extensive decomposition has diminishing returns for complex designs, but does
help reduce the CPU time. While Chaff was usually faster when detecting bugs in the four
incorrect variants of 9VLIW-MC-BP-EX (see Table 7), especially when using a monolithic
correctness criterion, BerkMin was approximately three times faster than Chaff when
verifying the two correct designs in Table 8.

Table 8
Effect of decomposing the correctness condition when verifying the correct versions of 9VLIW-MC-BP and its
extension with exceptions, 9VLIW-MC-BP-EX, using Chaff and BerkMin. The experiments with one run are
based on a monolithic correctness criterion

Processor        Parallel   Maximum primary     Chaff CPU   BerkMin CPU
                 runs       Boolean variables   time (s)    time (s)
9VLIW-MC-BP      1          3108                759         224
                 8          2273                349         134
                 16         2273                264         63
9VLIW-MC-BP-EX   1          3587                1094        347
                 11         3243                519         167
                 22         3175                473         173
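The speedup factors quoted for Table 8 can be recomputed directly from its CPU times. The sketch below only restates the table's numbers; the dictionary layout and the function name are illustrative choices.

```python
# CPU times in seconds from Table 8: {processor: {parallel runs: (Chaff, BerkMin)}}.
TABLE_8 = {
    "9VLIW-MC-BP": {1: (759, 224), 8: (349, 134), 16: (264, 63)},
    "9VLIW-MC-BP-EX": {1: (1094, 347), 11: (519, 167), 22: (473, 173)},
}


def speedups(times_by_runs):
    """Speedup of each decomposed configuration relative to the monolithic run."""
    base_chaff, base_berkmin = times_by_runs[1]
    return {
        runs: (base_chaff / chaff, base_berkmin / berkmin)
        for runs, (chaff, berkmin) in times_by_runs.items()
        if runs != 1
    }


for processor, times in TABLE_8.items():
    for runs, (chaff_x, berkmin_x) in speedups(times).items():
        print(f"{processor}, {runs} runs: Chaff {chaff_x:.2f}x, BerkMin {berkmin_x:.2f}x")
```

The ratios make the diminishing returns visible: for 9VLIW-MC-BP-EX, going from 11 to 22 weak criteria changes the Chaff time only from 519 to 473 s and slightly increases the BerkMin time from 167 to 173 s.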
8. Impact of conservative approximations and positive equality
We previously used conservative approximations, such as:
• Translation boxes. These are dummy UFs or UPs with one input (Velev and Bryant,
2000) that are manually inserted before the inputs of architectural state elements in
both the implementation and the specification processor. Such UFs or UPs result in
common subexpression substitution, and could produce simpler Boolean correctness
formulae.
• Automatically abstracted memories. The interpreted functions read and write are
abstracted automatically (Velev, 2000a, 2001) with completely general UFs that do
not satisfy the forwarding property of the memory semantics.
These conservative approximations have the potential to speed up the verification of
correct designs, but might result in false negatives requiring manual intervention and
analysis. When such optimizations were not used in the verification of the correct 9VLIW-
MC-BP-EX, Chaff took 914 s to prove the unsatisfiability of the CNF formula, compared
to 660 s with the optimizations; BerkMin took 969 s (longer than Chaff), compared to
275 s with the optimizations. In both cases, the verification was done with the ei j encoding
and monolithic evaluation of the correctness criterion. Hence, the overhead is insignificant,
compared with the time to manually identify false negatives that might result from the
optimizations.
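The forwarding property that abstracted memories give up can be stated concretely: reading an address that was just written returns the written data, and reading any other address returns what the original memory held. The sketch below contrasts an interpreted memory with uninterpreted stand-ins; the dictionary model, the tuple terms, and all function names are assumptions for illustration and do not reflect how EVC represents memories. Syntactic comparison of terms is used as a crude stand-in for "provably equal".

```python
def write(memory, addr, data):
    """Interpreted write: return a new memory with data stored at addr."""
    updated = dict(memory)
    updated[addr] = data
    return updated


def read(memory, addr):
    """Interpreted read: the stored value, or a symbolic initial value."""
    return memory.get(addr, ("init", addr))


def write_uf(memory, addr, data):
    """Abstracted write: an uninterpreted term with no memory semantics."""
    return ("write_uf", memory, addr, data)


def read_uf(memory, addr):
    """Abstracted read: an uninterpreted term with no memory semantics."""
    return ("read_uf", memory, addr)


def forwarding_holds(read_fn, write_fn, memory, addrs, data):
    """Check read(write(m, a, d), b) == (d if a == b else read(m, b)) for all a, b."""
    return all(
        read_fn(write_fn(memory, a, data), b) == (data if a == b else read_fn(memory, b))
        for a in addrs
        for b in addrs
    )


interpreted_ok = forwarding_holds(read, write, {}, [0, 1], "d")
abstracted_ok = forwarding_holds(read_uf, write_uf, "m0", [0, 1], "d")
```

Because the abstracted functions are completely general, a correct design whose proof relies on forwarding can fail to verify, which is the source of the false negatives mentioned above.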
We then evaluated the benefits of exploiting positive equality, given the extremely
efficient SAT-checkers Chaff and BerkMin. To obtain the formulae without positive
equality, we introduced an ei j Boolean variable for the equality comparison of two
syntactically distinct p-term variables, as done originally by Goel et al. (1998), instead of
treating such p-term variables as not equal. The results are listed in Table 9.

Table 9
Time for satisfiability checking with and without positive equality. Unless specified, the time is measured in
seconds. The experiments were run on a 336 MHz Sun4 with 4 GB of physical memory

Processor                      Chaff                          BerkMin
                               Positive   No positive         Positive   No positive
                               equality   equality            equality   equality
1 × DLX-C-buggy                0.13       17                  0.02       2
1 × DLX-C                      0.19       9177                0.07       229
2 × DLX-CC-MC-EX-BP-buggy      12         9409                4          2816
2 × DLX-CC-MC-EX-BP            22         >24 h               15         >24 h
9VLIW-MC-BP-buggy              5          Out of memory       10         >24 h
9VLIW-MC-BP                    759        Out of memory       224        >24 h
As Table 9 shows, when verifying the first three benchmarks, positive equality resulted
in up to four orders of magnitude speedup for Chaff, and in up to three orders of magnitude
speedup for BerkMin. When verifying the last three benchmarks that are much more
complex, and when we did not use positive equality, Chaff did not complete in 24 h for
2× DLX-CC-MC-EX-BP, and ran out of memory (given the available 4 GB) for 9VLIW-
MC-BP-buggy and 9VLIW-MC-BP; BerkMin did not finish in 24 h for each of these three
benchmarks. In contrast, with positive equality, Chaff needed 27 MB of memory for the
satisfiable CNF formula from 9VLIW-MC-BP-buggy, and 241 MB for the unsatisfiable
CNF formula from 9VLIW-MC-BP.
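The "orders of magnitude" figures can be checked against the three Table 9 rows where both configurations finished. The sketch below only restates those table entries; the dictionary layout and names are illustrative.

```python
import math

# (with positive equality, without positive equality) in seconds, from Table 9.
TABLE_9 = {
    "1 x DLX-C-buggy": {"Chaff": (0.13, 17), "BerkMin": (0.02, 2)},
    "1 x DLX-C": {"Chaff": (0.19, 9177), "BerkMin": (0.07, 229)},
    "2 x DLX-CC-MC-EX-BP-buggy": {"Chaff": (12, 9409), "BerkMin": (4, 2816)},
}


def orders_of_magnitude(with_pe, without_pe):
    """Base-10 logarithm of the speedup obtained from positive equality."""
    return math.log10(without_pe / with_pe)


max_orders = {
    solver: max(orders_of_magnitude(*row[solver]) for row in TABLE_9.values())
    for solver in ("Chaff", "BerkMin")
}
```

The largest ratio for each SAT-checker comes from 1 × DLX-C: 9177 s against 0.19 s for Chaff, and 229 s against 0.07 s for BerkMin.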
Therefore, positive equality is still the main reason for the efficiency of our tool flow
when formally verifying complex microprocessors. The CNF formulae generated without
positive equality are available as benchmark suite NPE-1.0 (Velev, 2002).
9. Conclusions
We found the SAT-checkers Chaff (Moskewicz et al., 2001; Zhang et al., 2001) and
BerkMin (Goldberg and Novikov, 2002) to be the most efficient for evaluating Boolean
formulae generated in the formal verification of both correct and buggy microproces-
sors, dramatically outperforming 29 SAT-checkers, two ATPG tools, and two decision
diagrams—BDDs (Bryant, 1986, 1992) and BEDs (Williams, 2000). The microproces-
sors were described in a high-level hardware description (Velev and Bryant, 2001b) based
on the logic of EUFM, proposed by Burch and Dill (1994). The formal verification was
done with Burch and Dill’s correctness criterion, using flushing of the implementation
processor to map its state to the state of the specification. The EUFM correctness
formula was translated to an equivalent Boolean formula by exploiting positive equality
(Bryant et al., 2001), and using the automatic tool EVC (Velev and Bryant, 2001b).
Reassessing various optimizations that can be applied when generating the Boolean for-
mulae for the microprocessor correctness, we conclude that the single most important step
is exploiting positive equality. Without it, neither Chaff nor BerkMin would have scaled
for realistic superscalar and VLIW microprocessors with exceptions, multicycle functional
units, branch prediction, and other speculative features. BerkMin was consistently faster
on unsatisfiable CNF formulae from complex correct designs, since BerkMin was devel-
oped after Chaff and was better optimized for CNF formulae derived from expressions
with many levels (Goldberg, 2002).
Exploiting the ei j encoding (Goel et al., 1998) of g-equations resulted in a speedup of
2 for the base VLIW processor, 9VLIW-MC-BP, compared to the small-domain encoding
(Pnueli et al., 1999) when verifying correct designs, and consistently performed better on
buggy versions. Although the ei j encoding introduces more than twice as many primary
Boolean variables for our benchmarks, it results in fewer CNF variables and fewer CNF clauses
than the small-domain encoding, and produces a conceptually simpler search space—
with each ei j Boolean variable naturally encoding the equality between a pair of g-term
variables. Transitivity of equality is enforced with fewer implications than in the case
of the small-domain encoding. In contrast, the small-domain encoding enumerates all
mappings of g-term variables to a sufficient set of distinct constants, thus introducing more
information than actually required to solve the problem. This results in the potential to
revisit portions of the search space, for what would be just one visit with the ei j encoding.
Conservative approximations, such as manually inserted translation boxes (Velev and
Bryant, 2000) or automatically abstracted memories (Velev, 2000a, 2001), are not as
essential to the fast verification of correct VLIW and dual-issue superscalar processors
when using Chaff or BerkMin, as these optimizations were when using BDDs—
previously the most efficient SAT procedure for correct designs.
Decomposing the evaluation of the Boolean correctness formula (Velev, 2000a), by
evaluating many simpler formulae in parallel, resulted in a speedup of up to 3.5 times
for the base VLIW processor, but in a speedup of 2 for its version with exceptions.
Decomposition consistently accelerated the generation of counterexamples of buggy
microprocessors.
Structural variations in generating the Boolean correctness formulae—early reductions
of p-equations, and using Ackermann constraints for eliminating UPs—as well as varia-
tions of Chaff’s command parameters (we would not vary BerkMin’s command parame-
ters) accelerated the detection of bugs, although no single variation performed best. Again,
the assumption is that we can run several parallel copies of the tool flow. Neither structural
nor parameter variations accelerated the verification of the correct base VLIW processor.
Algebraic simplifications (Marques-Silva, 2000; Brafman, 2001), or renaming the
CNF variables in order to minimize the cutwidth of the formulae (Aloul et al.,
2001; Wang et al., 2001) did not result in speedups, due to the complexity of the CNF
formulae generated in the formal verification of realistic microprocessors.
To conclude, we showed that Chaff and BerkMin can easily handle complex CNF
formulae that are produced in microprocessor formal verification without applying con-
servative transformations. Such transformations were previously needed in BDD-based
evaluations, but have the potential to result in false negatives, taking extensive human effort
to analyze. We identified the optimizations that help increase the performance of Chaff
and BerkMin on realistic dual-issue superscalar and VLIW designs—positive equality,
combined with the ei j encoding. Helpful, but not essential, are decomposed evaluation of
the Boolean correctness formulae, and use of structural/parameter variations in multiple
parallel runs. Our study will increase the productivity of microprocessor design engineers
and shorten the time-to-market for VLIW and DSP architectures that constitute a signifi-
cant portion of the microprocessor market (Tennenhouse, 2000). The benchmarks used in
this paper are available as Velev (2000b, 2002).
Acknowledgement
This research was supported by the Semiconductor Research Corporation under con-
tract 00-DC-684. A preliminary version of the paper was published as Velev and Bryant
(2001a).
References
Ackermann, W., 1954. Solvable Cases of the Decision Problem. North-Holland, Amsterdam.
Aloul, F.A., Markov, I.L., Sakallah, K.A., 2001. Faster SAT and smaller BDDs via common func-
tion structure. International Conference on Computer-aided Design (ICCAD’01). pp. 443–448.
Baptista, L., Marques-Silva, J.P., 2000. Using randomization and learning to solve hard real-world
instances of satisfiability. Principles and Practice of Constraint Programming (CP’00). Available
from: http://sat.inesc.pt/∼jpms.
Bayardo, R.J., Schrag, R., 1997. Using CSP look-back techniques to solve real world SAT instances.
14th National Conference on Artificial Intelligence (AAAI’97). pp. 203–208.
Bentley, B., 2001. Validating the Intel Pentium 4 microprocessor. 38th Design Automation
Conference (DAC’01). pp. 244–248.
Bjesse, P., Leonard, T., Mokkedem, A., 2001. Finding bugs in an alpha microprocessor using
satisfiability solvers. In: Berry, G., Comon, H., Finkel, A. (Eds.), Computer-aided Verification
(CAV’01), LNCS, vol. 2102. Springer, pp. 454–464.
Brafman, R.I., 2001. A simplifier for propositional formulas with many binary clauses. International
Joint Conference on Artificial Intelligence (IJCAI’01). pp. 515–520.
Brglez, F., Bryan, D., Kozminski, K., 1989. Combinational profiles of sequential benchmark
circuits. International Symposium on Circuits and Systems (ISCAS’89). Available from:
http://cb1.ncsu.edu/benchmarks.
Brglez, F., Fujiwara, H., 1985. A neutral netlist of 10 combinational benchmark cir-
cuits. International Symposium on Circuits and Systems (ISCAS’85). Available from:
http://cb1.ncsu.edu/benchmarks.
Bryant, R.E., 1986. Graph-based algorithms for Boolean function manipulation. IEEE Trans.
Comput. C-35, 677–691.
Bryant, R.E., 1992. Symbolic Boolean manipulation with ordered binary-decision diagrams. ACM
Comput. Surv. 24, 293–318.
Bryant, R.E., German, S., Velev, M.N., 2001. Processor verification using efficient reductions of the
logic of uninterpreted functions to propositional logic. ACM Trans. Comput. Logic (TOCL) 2,
93–134. Available from: http://www.ece.cmu.edu/∼mvelev.
Bryant, R.E., Velev, M.N., 2002. Boolean satisfiability with transitivity constraints. ACM Trans.
Comput. Logic (TOCL) 3. Available from: http://www.ece.cmu.edu/∼mvelev.
Burch, J.R., 1996. Techniques for verifying superscalar microprocessors. 33rd Design Automation
Conference (DAC’96). pp. 552–557.
Burch, J.R., Dill, D.L., 1994. Automated verification of pipelined microprocessor control.
In: Dill, D.L. (Ed.), Computer-aided Verification (CAV’94), LNCS, vol. 818. Springer, pp. 68–80.
Chatalic, P., Simon, L., 2000. Multi-resolution on compressed sets of clauses. 12th International
Conference on Tools with Artificial Intelligence (ICTAI’00). pp. 2–10.
Clarke, E., Biere, A., Raimi, R., Zhu, Y., 2001. Bounded model checking using satisfiability solving.
J. Formal Methods Syst. Design (FMSD) 19, 7–34.
Copty, F., Fix, L., Fraer, R., Giunchiglia, E., Kamhi, G., Tacchella, A., Vardi, M.Y., 2001. Benefits
of bounded model checking at an industrial setting. In: Berry, G., Comon, H., Finkel, A. (Eds.),
Computer-aided Verification (CAV’01), LNCS, vol. 2102. Springer, pp. 436–453.
Crawford, J.M., Auton, L.D., 1996. Experimental results on the crossover point in random 3SAT. In:
Hogg, T., Huberman, B.A., Williams, C. (Eds.), Frontiers in Problem Solving: Phase Transitions
and Complexity, Artificial Intelligence 81, pp. 31–57.
Davis, M., Logemann, G., Loveland, D., 1962. A machine program for theorem proving. Commun.
ACM 5, 394–397.
Davis, M., Putnam, H., 1960. A computing procedure for quantification theory. J. ACM 7, 201–215.
Dubois, O., Andre, P., Boufkhad, Y., Carlier, J., 1993. Can a very simple algorithm be efficient
for SAT. In: Johnson, D.S., Trick, M.A. (Eds.), The Second DIMACS Implementation Challenge,
DIMACS Series in Discrete Mathematics and Theoretical Computer Science. Available from:
ftp://dimacs.rutgers.edu/pub/challenge/satisfiability/contributed/dubois.
Freeman, J.W., 1995. Improvements to Propositional Satisfiability Search Algorithms, Ph.D. Thesis,
Department of Computer and Information Science, University of Pennsylvania.
Goel, A., Sajid, K., Zhou, H., Aziz, A., Singhal, V., 1998. BDD based procedures for a theory
of equality with uninterpreted functions. In: Hu, A.J., Vardi, M.Y. (Eds.), Computer-aided
Verification (CAV’98), LNCS, vol. 1427. Springer, pp. 244–255.
Goldberg, E., 2002. Personal communication.
Goldberg, E., Novikov, Y., 2002. BerkMin: a fast and robust sat-solver. Design, Automation, and Test
in Europe (DATE’02). pp. 142–149.
Gomes, C.P., Selman, B., Crato, N., Kautz, H.A., 2000. Heavy-tailed phenomena in satisfiability
and constraint satisfaction problems. J. Autom. Reasoning 24, 67–100.
Groote, J.F., Warners, J.P., 2000. The propositional formula checker Heer-Hugo. J. Autom. Reason-
ing 24, 101–125.
Hamzaoglu, I., Patel, J.H., 1999. New techniques for deterministic test pattern generation. J. Electron.
Test., Theory Appl. 15, 63–73.
Harlow, J.E., Brglez, F., 2001. Design of experiments and evaluation of BDD ordering heuristics. Int.
J. Softw. Tools Technology Transfer (STTT) 3, 193–206.
Hennessy, J.L., Patterson, D.A., 2002. Computer Architecture: A Quantitative Approach, 3rd ed.
Morgan Kaufmann Publishers, San Francisco, CA.
Hirsch, E.A., Kojevnikov, A., 2001. Solving Boolean satisfiability using local search guided by unit
clause elimination. Principles and Practice of Constraint Programming (CP’01). pp. 605–609.
Hoos, H.H., 1999. SAT-encodings, search space structure, and local search performance. Interna-
tional Joint Conference on Artificial Intelligence (IJCAI’99). pp. 296–302.
Intel Corporation, 1999. IA-64 application developer’s architecture guide. Available from:
http://developer.intel.com/design/ia-64/architecture.htm.
Iwama, K., Abeta, H., Miyano, E., 1992. Random generation of satisfiable and unsatisfiable CNF
predicates. In: Van Leeuwen, J. (Ed.), Information Processing 92, vol. 1, Algorithms, Software,
Architecture. Elsevier Science Publishers BV, pp. 322–328.
Iwama, K., Hino, K., 1994. Random generation of test instances for logic optimizers. 31st Design
Automation Conference (DAC’94). pp. 430–434.
Janssen, G., 2001. Design of a pointerless BDD package. 10th International Workshop on Logic &
Synthesis (IWLS’01).
Johnson, D.S., Trick, M.A. (Eds.), 1993. The second DIMACS implementation challenge.
DIMACS Series in Discrete Mathematics and Theoretical Computer Science. Available from:
http://dimacs.rutgers.edu/challenges.
Jones, R.B., 2002. Symbolic Simulation Methods for Industrial Formal Verification. Kluwer Aca-
demic Publishers, Boston.
Kalla, P., Zeng, Z., Ciesielski, M.J., Huang, C., 2000. A BDD-based satisfiability infrastructure using
the unate recursive paradigm. Design, Automation and Test in Europe (DATE’00). pp. 232–236.
Larrabee, T., 1992. Test pattern generation using Boolean satisfiability. IEEE Trans. Comput.-aided
Des. Integr. Circuits Syst. 11, 4–15.
Li, C.M., 2000. Integrating equivalency reasoning into Davis–Putnam procedure. 17th National
Conference on Artificial Intelligence (AAAI’00). pp. 291–296.
Li, C.M., Anbulagan, 1997. Heuristics based on unit propagation for satisfiability problems.
International Joint Conference on Artificial Intelligence (IJCAI’97). pp. 366–371.
Lynce, I., Baptista, L., Marques-Silva, J.P., 2001. Stochastic systematic search algorithms for
satisfiability. LICS Workshop on Theory and Applications of Satisfiability Testing (LICS-SAT).
Malik, S., Wang, A.R., Brayton, R.K., Sangiovanni-Vincentelli, A., 1988. Logic verification
using binary decision diagrams in a logic synthesis environment. International Conference
on Computer-aided Design (ICCAD’88). pp. 6–9.
Marques-Silva, J.P., 1999. The impact of branching heuristics in propositional satisfiability
algorithms. 9th Portuguese Conference on Artificial Intelligence (EPIA). Available from:
http://sat.inesc.pt/∼jpms.
Marques-Silva, J.P., 2000. Algebraic simplification techniques for propositional satisfiability.
Principles and Practice of Constraint Programming (CP’00). pp. 537–542. Available from:
http://sat.inesc.pt/∼jpms.
Marques-Silva, J.P., e Silva, L.G., 1999. Algorithms for satisfiability in combinational circuits based
on backtrack search and recursive learning. 12th Symposium on Integrated Circuits and Systems
Design (SBCCI’99). pp. 192–195. Available from: http://sat.inesc.pt/∼jpms.
Marques-Silva, J.P., Sakallah, K.A., 1999. GRASP: a search algorithm for propositional satisfiability.
IEEE Trans. Comput. 48, 506–521.
Minato, S.-I., 1996. Binary Decision Diagrams and Applications for VLSI CAD. Kluwer Academic
Publishers, Boston.
Minato, S.-I., 2001. Zero-suppressed BDDs and their applications. Int. J. Softw. Tools Technology
Transfer (STTT) 3, 156–170.
Mitchell, D., Selman, B., Levesque, H., 1992. Hard and easy distributions of SAT problems. 10th
National Conference on Artificial Intelligence (AAAI’92). pp. 459–465.
Moskewicz, M.W., 2001. Personal communication.
Moskewicz, M.W., Madigan, C.F., Zhao, Y., Zhang, L., Malik, S., 2001. Engineering a highly
efficient SAT solver. 38th Design Automation Conference (DAC’01). pp. 530–535.
Pilarski, S., Hu, G., 2002. SAT with partial clauses and back-leaps. 39th Design Automation
Conference (DAC’02). pp. 743–746.
Plaisted, D.A., Biere, A., Zhu, Y., 2002. A satisfiability procedure for quantified Boolean formulae.
In: Hammer, P. (Ed.), Discrete Applied Mathematics, Renesse Special Issue Devoted to the
International Symposium on Theory and Applications of Satisfiability Testing (SAT’00).
Pnueli, A., Rodeh, Y., Shtrichman, O., Siegel, M., 1999. Deciding equality formulas by small-domain
instantiations. In: Halbwachs, N., Peled, D. (Eds.), Computer-aided Verification (CAV’99),
LNCS, vol. 1633. Springer, pp. 455–469.
Prestwich, S.D., 2000. Stochastic local search in constrained spaces. Practical Application of
Constraint Technology and Logic Programming (PACLP’00). pp. 27–39.
Rintanen, J., 1999. Improvements to the evaluation of quantified Boolean formulae. International
Joint Conference on Artificial Intelligence (IJCAI’99). pp. 1192–1197.
Rudell, R., 1993. Dynamic variable ordering for ordered binary decision diagrams. International
Conference on Computer-aided Design (ICCAD’93). pp. 42–47.
Selman, B., Kautz, H., 1993. Domain-independent extensions to GSAT: solving large structured
satisfiability problems. International Joint Conference on Artificial Intelligence (IJCAI’93).
pp. 290–295.
Selman, B., Kautz, H., Cohen, B., 1996. Local search strategies for satisfiability testing. DIMACS
Series in Discrete Mathematics and Theoretical Computer Science 26, 521–532.
Shang, Y., Wah, B.W., 1998. A discrete Lagrangian-based global-search method for solving satis-
fiability problems. J. Global Optimization 12, 61–99. Available from: http://manip.crhc.uiuc.edu.
Sharangpani, H., Arora, K., 2000. Itanium processor microarchitecture. IEEE Micro 20 (5), 24–43.
Somenzi, F., 2001. Efficient manipulation of decision diagrams. Int. J. Softw. Tools Technology
Transfer (STTT) 3, 171–181.
Stålmarck, G., 1989. A System for Determining Propositional Logic Theorems by Applying Values
and Rules to Triplets that are Generated from a Formula, Swedish Patent No. 467076 (approved
1992), U.S. Patent No. 5276897 (1994), European Patent No. 0403454 (1995).
Tafertshofer, P., Ganz, A., Antreich, K.J., 2000. GRAINE—an implication GRaph-bAsed engINE
for fast implication, justification and propagation. IEEE Trans. CAD 19, 907–927.
Tennenhouse, D., 2000. Proactive computing. Commun. ACM 43, 43–50.
Van Campenhout, D., Al-Asaad, H., Hayes, J.P., Mudge, T., Brown, R.B., 1998. High-level design
verification of microprocessors via error modeling. ACM Trans. Des. Autom. Electronic Syst. 3,
581–599.
Van Campenhout, D., Mudge, T., Hayes, J.P., 2000. Collection and analysis of microprocessor design
errors. IEEE Des. Test Comput. 17, 51–60.
Velev, M.N., 1999. Benchmark suites, SSS. 1.0, SSS. 1.0a. Available from:
http://www.ece.cmu.edu/∼mvelev.
Velev, M.N., 2000a. Formal verification of VLIW microprocessors with speculative execution.
In: Emerson, E.A., Sistla, A.P. (Eds.), Computer-aided Verification (CAV’00), LNCS, vol. 1855.
Springer, pp. 296–311. Available from: http://www.ece.cmu.edu/∼mvelev.
Velev, M.N., 2000b. Benchmark suites SSS-SAT. 1.0, VLIW-SAT. 1.0, FVP-UNSAT. 1.0, and FVP-
UNSAT. 2.0. Available from: http://www.ece.cmu.edu/∼mvelev.
Velev, M.N., 2001. Automatic abstraction of memories in the formal verification of superscalar
microprocessors. In: Margaria, T., Yi, W. (Eds.), Tools and Algorithms for the Construction and
Analysis of Systems (TACAS’01), LNCS, vol. 2031. Springer, pp. 252–267. Available from:
http://www.ece.cmu.edu/∼mvelev.
Velev, M.N., 2002. Benchmark suite NPE-1.0. Available from: http://www.ece.cmu.edu/∼mvelev.
Velev, M.N., Bryant, R.E., 1998a. Incorporating timing constraints in the efficient memory model
for symbolic ternary simulation. International Conference on Computer Design (ICCD’98).
pp. 400–406. Available from: http://www.ece.cmu.edu/~mvelev.
Velev, M.N., Bryant, R.E., 1998b. Bit-level abstraction in the verification of pipelined
microprocessors by correspondence checking. In: Gopalakrishnan, G., Windley, P. (Eds.), Formal
Methods in Computer-aided Design (FMCAD’98), LNCS, vol. 1522. Springer, pp. 18–35. Available from:
http://www.ece.cmu.edu/~mvelev.
Velev, M.N., Bryant, R.E., 1999. Superscalar processor verification using efficient reductions of the
logic of equality with uninterpreted functions to propositional logic. In: Pierre, L., Kropf, T.
(Eds.), Correct Hardware Design and Verification Methods (CHARME’99), LNCS, vol. 1703.
Springer, pp. 37–53. Available from: http://www.ece.cmu.edu/~mvelev.
Velev, M.N., Bryant, R.E., 2000. Formal verification of superscalar microprocessors with
multicycle functional units, exceptions, and branch prediction. 37th Design Automation Conference
(DAC’00). pp. 112–117. Available from: http://www.ece.cmu.edu/~mvelev.
Velev, M.N., Bryant, R.E., 2001a. Effective use of Boolean satisfiability procedures in the formal
verification of superscalar and VLIW microprocessors. 38th Design Automation Conference
(DAC’01). pp. 226–231. Available from: http://www.ece.cmu.edu/~mvelev.
Velev, M.N., Bryant, R.E., 2001b. EVC: a validity checker for the logic of equality with
uninterpreted functions and memories, exploiting positive equality and conservative transformations.
In: Berry, G., Comon, H., Finkel, A. (Eds.), Computer-aided Verification (CAV’01), LNCS,
vol. 2102. Springer, pp. 235–240. Available from: http://www.ece.cmu.edu/~mvelev.
Wang, D., Clarke, E., Zhu, Y., Kukula, J., 2001. Using cutwidth to improve symbolic simulation
and Boolean satisfiability. IEEE International High Level Design Validation and Test Workshop
(LDVT’01).
Williams, P.F., 2000. Formal Verification Based on Boolean Expression Diagrams. Ph.D. Thesis.
Department of Information Technology, Technical University of Denmark, Lyngby, Denmark.
Available from: http://www.it-c.dk/research/bed.
Williams, P.F., Biere, A., Clarke, E.M., Gupta, A., 2000. Combining decision diagrams and
SAT procedures for efficient symbolic model checking. In: Emerson, E.A., Sistla, A.P. (Eds.),
Computer-aided Verification (CAV’00), LNCS, vol. 1855. Springer, pp. 124–138. Available from:
http://www.it-c.dk/research/bed.
Wu, Z., Wah, B.W., 1999. Solving hard satisfiability problems: a unified algorithm based on discrete
Lagrange multipliers. 11th IEEE International Conference on Tools with Artificial Intelligence
(ICTAI’99). pp. 210–217. Available from: http://manip.crhc.uiuc.edu.
Zhang, H., 1997. SATO: an efficient propositional prover. In: International Conference on Automated
Deduction (CADE’97), LNAI, vol. 1249. Springer, pp. 272–275.
Zhang, L., Madigan, C.F., Moskewicz, M.W., Malik, S., 2001. Efficient conflict driven learning in a
Boolean satisfiability solver. International Conference on Computer-aided Design (ICCAD’01),
pp. 279–285.
