Low-latency SAT Solving on Multicore Processors with Priority Scheduling and XOR Partitioning by Stephen M. Plaza et al.
Stephen M. Plaza, Igor L. Markov†, Valeria Bertacco
EECS Department, University of Michigan, Ann Arbor, MI
†Synplicity, Inc., Sunnyvale, CA
{splaza, imarkov, valeria}@umich.edu
ABSTRACT
As multicore processors become prevalent, computational method-
ologies for decision-making, combinatorial optimization, optimal
design, and formal verification must adapt to better utilize available
CPU resources. We propose new techniques to exploit the
power offered by upcoming shared-memory multicore/multi-CPU
architectures to boost the performance of solvers for fundamental
NP-hard problems, such as Boolean and Pseudo-Boolean SAT. We
develop an algorithmic paradigm for parallel solvers centered on
1) a scheduling strategy to reduce the average latency for solv-
ing batches of instances of varying complexity, and 2) a novel,
balanced decomposition of a SAT instance's search space among
multiple threads. These techniques are implemented in a software
library that provides parallel-solving services to user applications.
Evaluation on an eight-core workstation shows significantly reduced
latency for solving multiple SAT instances in parallel, as well as
greater CPU utilization.
1. INTRODUCTION
Modern CPUs and SoCs exhibit greater functional complexity,
fueling numerous challenges in design and verification. With many
related tasks shown NP-hard, even small increases in design size
may require a disproportionate increase in computing resources.
Managing this fast-growing demand of resources has always been
a challenge in the EDA field, and has often set the limit of how
complex a design could be tackled. Hence, the opportunity provided
by the recent offering of chip multicore architectures, which
can execute multiple threads simultaneously on a shared memory
platform, is not one to pass on. However, it currently remains unclear
how leading-edge algorithms for key problems can be mapped
onto such architectures to ensure powerful speed-ups. Our work addresses
precisely this issue for the Boolean satisfiability problem.
Boolean satisfiability is one of the best-known NP-complete
problems with numerous applications in EDA [16]. Recent ad-
vances in DPLL-based SAT solvers [16, 17] allow many practical
problems to be solved quickly, and have facilitated their widespread
deployment in the industry. Fundamental verification techniques,
such as equivalence checking [8] and model checking [4], make ex-
tensive use of SAT solvers to bypass the often prohibitive memory
requirements of BDD-based strategies. However, the performance
of state-of-the-art SAT solvers varies widely and unpredictably on
problem instances, with some instances being intractable. Also, a
slight change in algorithm parameters can sometimes affect the runtime
of a solver on the same instance by orders of magnitude.
Figure 1: High-level flow of our parallel SAT methodology. We
introduce a scheduler for completing a batch of SAT instances
of varying complexity and a light-weight parallel strategy for
handling hard SAT instances.
Intrinsically parallel tasks, such as multimedia processing, may
achieve N times speed-up by using N cores (assuming that sufficient
memory bandwidth is available and that cache coherency is
not a bottleneck). However, combinatorial optimization and search
problems, such as SAT-solving and integer linear programming,
are much harder to parallelize. The straightforward solution,
processing different branches of a given decision in parallel, often
fails miserably in practice because such branches are not independent
in leading-edge solvers that rely on recursive learning.
The recent View from Berkeley project [3] designates branch and
bound techniques as one of thirteen core computational categories
for which parallel algorithms must be developed. In this work, we
propose new techniques to parallelize state-of-the-art SAT solving.
In addition to formal verification, state-of-the-art design optimizations,
such as SAT sweeping in logic synthesis [29, 19], SAT-based
technology mapping for FPGAs [13], and logic restructuring [20],
require solving multiple SAT instances. These SAT instances are currently found
in the bottlenecks of key EDA algorithms, and solving them in
parallel offers a chance to speed up a broad range of EDA tools.
The observation that these instances exhibit varying complexity is
not only curious, but also leads to improved parallel solving techniques.
In addition to threads dedicated to solving SAT instances,
domain-specific threads may formulate and simplify new SAT instances,
as well as interpret collected solutions. Shared-memory
systems and multicore CPUs are particularly amenable to such par-
allelization strategies.
We first introduce a novel architecture for scheduling and solving
multiple instances of hard problems, such as those arising in
SAT, on shared-memory and multicore CPUs, as illustrated in Figure 1.
The first problem we address is the scheduling of M SAT
instances on N processors when M > N. Take, for example, the case
N = 1. If runtimes are known for each instance in advance,
then scheduling instances in increasing order of runtime
guarantees the best batch latency, which we define as the sum of
completion times of all instances from the beginning of the batch.
In other words, a long-running job will not delay numerous small
jobs. Scheduling for N processors, without a priori runtime information,
is harder, and our paper is the first to address this problem.
Furthermore, many applications generate SAT instances dynamically
rather than in batches; the technique we propose handles
this case as well.
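To make the N = 1 case concrete, the following minimal Python sketch computes the batch latency (sum of completion times) for a given execution order and checks by brute force that the shortest-first order is optimal. The runtimes are hypothetical values chosen only for illustration, and `batch_latency` is a helper name introduced here, not a function of the library described in this paper.

```python
from itertools import permutations

def batch_latency(runtimes):
    """Sum of completion times when jobs run back to back on one core."""
    elapsed = 0
    total = 0
    for r in runtimes:
        elapsed += r      # this job completes at the running total
        total += elapsed
    return total

runtimes = [30, 2, 7, 1]  # hypothetical runtimes, in seconds

# Shortest-first order versus the best order found by exhaustive search.
shortest_first = batch_latency(sorted(runtimes))
best = min(batch_latency(p) for p in permutations(runtimes))

print(batch_latency(runtimes))  # arrival order: 141
print(shortest_first, best)     # 54 54
```

In the arrival order the 30-second job delays every other job, which is exactly the effect the scheduler tries to avoid when runtimes are not known in advance.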
The need to parallelize a single SAT instance arises primarily
when no other instances remain to keep all available cores busy.
In a large verification environment, this is most likely to happen
with only the hardest SAT instances, which gives us an additional
assumption that can be used by parallel SAT algorithms. In our
framework, complex jobs can run sequentially for some time before
a decision to parallelize them is made.
In this paper we achieve two goals: 1) the minimization of the
average latency for solving a collection of SAT problems while
ensuring maximum resource utilization and 2) the minimization
of runtime for large problem instances by exploiting parallel re-
sources. To reduce the average latency of a collection of SAT in-
stances, we introduce a novel scheduling algorithm that can utilize
a predicted distribution of SAT runtimes, and emphasizes syner-
gies between time-slicing and batch scheduling. We achieve a 20%
average latency improvement while increasing utilization of par-
allel resources. To reduce the runtime for single large instances,
we consider a novel partitioning scheme based on adding XOR
constraints that evenly divides the search space of SAT instances
independent of their underlying structure. We exploit a theoretical result
from [21] on randomized polynomial-time algorithms, where
adding a limited number of random XOR constraints to a SAT instance
can reduce a SAT instance with multiple solutions to a SAT
instance with one solution. Our work is the first to apply this result
to search-space partitioning in multicore SAT solving, circumventing
a major pitfall common in parallel SAT solver algorithms:
unbalanced partitioning [24]. We further observe that search-space
partitioning is best performed when the random restart frequency is
low, which occurs after initially solving the problem sequentially.
We validate our work by extensive experiments on an eight-core
system and improve resource utilization by 60.5% over prior work
that uses solver portfolios. Our contributions are as follows:
1. An approach to scheduling multiple SAT problem instances
in a way that minimizes the average latency.
2. A novel partitioning of the SAT search space using XOR constraints
which produces sub-problems of similar complexity
and evenly partitions the solution space.
3. A lightweight SAT parallelization strategy that can be easily
adapted to improve the performance of any state-of-the-art
DPLL-based solver.
4. An open-ended parallel SAT-solving methodology that combines
our lightweight parallelization strategy with other heuristics
to improve the overall resource allocation.
In Section 2, we survey previous work on parallel SAT solving.
Section 3 introduces our scheduling algorithm for handling multi-
ple SAT instances of varying complexity in a parallel setting. We
discuss the limitations of previous parallel SAT research in Section
4. In Section 5, we propose a partitioning strategy that provides
search-space division along with our strategy for parallelization.
We analyze the effectiveness of our approach in Section 6 and con-
clude in Section 7.
2. PRELIMINARIES
For a Boolean formula F in conjunctive normal form (CNF), the
SAT problem requires (i) choosing an assignment for a set of variables
V that satisfies F, or (ii) confirming that no such assignment
exists. The basic approach to solving SAT is a backtracking framework
referred to as DPLL [6]. Several innovations such as
non-chronological backtracking, conflict-driven learning, and decision
heuristics greatly improve upon this approach [16, 17, 25].
First, we outline previous efforts on improving SAT performance
in a parallel setting. Then we explain how leading-edge SAT solvers
often exhibit characteristic long-tail runtime distributions, which
we exploit later in our work.
2.1 Previous Approaches to Parallel SAT
Parallel SAT solving strategies have explored coarse-grain or
fine-grain parallelization. Fine-grain parallelization strategies target
Boolean Constraint Propagation (BCP), which accounts for the
largest percentage of runtime for most SAT solvers. In BCP, each
variable assignment is checked against all relevant clauses and any
implications are propagated. BCP can be parallelized by dividing
the clause database among n different solvers so that the BCP computation
time of each solver is approximately 1/n of the original.
Coarse-grain parallelization strategies typically involve assigning a SAT
solver to different parts of the search space.
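As a rough illustration of what BCP computes, the sketch below applies the unit-clause rule until no further implications exist or a clause is falsified. It is a deliberately naive version that rescans every clause; production solvers use watched-literal schemes instead. The function name `unit_propagate` and the DIMACS-style integer encoding of literals are assumptions made for this example.

```python
def unit_propagate(clauses, assignment):
    """Apply the unit-clause rule repeatedly.

    clauses: list of clauses, each a list of non-zero ints (-3 means NOT x3).
    assignment: dict mapping variable -> bool (a partial assignment).
    Returns ("conflict" or "ok", extended assignment).
    """
    assignment = dict(assignment)  # do not modify the caller's dict
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var = abs(lit)
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == (lit > 0):
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return "conflict", assignment  # every literal is false
            if len(unassigned) == 1:
                lit = unassigned[0]            # forced (implied) literal
                assignment[abs(lit)] = lit > 0
                changed = True
    return "ok", assignment

# (NOT x1 OR x2), (NOT x2 OR x3), (NOT x3 OR NOT x1 OR x4) with x1 = True
status, implied = unit_propagate([[-1, 2], [-2, 3], [-3, -1, 4]], {1: True})
print(status, implied)
```

In a fine-grain scheme, each solver would run this loop over its own share of `clauses` and forward every newly implied literal to the others, which is where the communication cost discussed next comes from.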
Fine-grain parallelization. The performance of fine-grain parallelization
depends on the partitioning of clauses among the solvers,
where an ideal partition ensures an even distribution of BCP costs
while minimizing the implications that need to be communicated
between solvers. This strategy also requires low-latency inter-solver
communication, meaning that contention for system locks as
implemented on general microprocessors could degrade performance.
Therefore, fine-grain parallelization has been examined on
specialized architectures [27] that can minimize any communication
bottlenecks. Also, in [28, 1], significant parallelization was
exploited by mapping a SAT instance to an FPGA and allowing
BCP to evaluate several clauses simultaneously. The flexibility and
scalability of this approach are limited because each instance needs
to be compiled to a specific architecture and conflict-driven learning
is difficult to implement effectively.
Coarse-grain parallelization. The runtime of an individual problem
can also be improved in a parallel setting by using a solver
portfolio [10], where multiple SAT heuristics are executed in par-
allel and the fastest heuristic determines the runtime for the prob-
lem. A solver portfolio is also one way of countering the variabil-
ity that backtrack-style SAT solvers experience on many practical
SAT instances [11]. Because one heuristic may perform better than
another on certain types of problems, one can reduce the risk of
choosing the wrong heuristic by running both. Although paral-
lelization here consists of running multiple versions of the same
problem simultaneously, if the runtime difference between these
heuristics is significant, a solver portfolio can yield runtime improvements.
However, using a portfolio solver does not guarantee high re-
source utilization as each heuristic may perform similarly on any
given instance or one heuristic may dominate the others. The primary
limitation of solver portfolios is that there is no good mechanism
to coordinate the efforts of these heuristics and the randomness
inherent to them. To better coordinate efforts, other approaches
consider analyzing different parts of the search space in parallel
[18, 24, 5, 14]. If the parts of the search space are disjoint, the so-
lution to the problem can be determined through the processing of
these parts in isolation. However, in practice, the similarities that
often exist between different parts of the search space mean that
redundant analysis is done across the different parts. To counter
this, the authors in [14] develop an approach to explore disjoint
parts of the search space where large shared memory in a multicore
system is used to transfer learned information between them. The
approach considers dividing the problem instance using different
variable assignments called guiding paths, as originally described
in [24]. One major limitation of this type of search space partition-
ing is that poor partitions can produce complicated sub-problems
with widely varying structure and complexity.
The benefits of learning between solvers working on different
parts of the search space in parallel suggest potential super-linear
improvement. However, the improvements achieved by current
strategies seem more consistent with the inherent variability of solving
many real-world SAT problems and the effect of randomization
on reducing this variability. Through intelligent randomization
strategies, sequential solvers can often avoid complicated parts of
the search space and outperform their parallel counterparts.
2.2 Runtime Variability in SAT Solving
While DPLL SAT solvers typically struggle on randomly gener-
ated instances, most practical SAT instances possess regular struc-
ture and can be solved much faster. However, it has been observed
that many practical instances experience exponential runtime variability
[11] when using backtrack-style SAT solvers without random
restarting. In particular, many instances exhibit heavy-tail behavior,
meaning that the runtime variance of solving a SAT instance
when sampling from existing competitive algorithms is exponential.
DEFINITION 1. For a random variable $X$, a heavy-tail probability
distribution occurs when $\Pr[X > x] \propto x^{-\alpha}$ as $x \to \infty$ for
$0 < \alpha < 2$.
If the cumulative probability does not converge to 1 quickly enough,
the distribution will have a heavy tail. More specifically, the variance
of $X$ is $\infty$, and when $\alpha < 1$ the mean is also $\infty$. In analyzing
SAT performance, $X$ denotes the number of backtracks (this correlates
with the difficulty of solving the problem) required to solve a
given instance. Also, since the maximum runtime is exponential,
the bounded heavy tail produces variance that is actually exponential
in the number of backtracks.
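The divergence of the moments can be checked directly with the tail-integral formulas for a nonnegative random variable. The short derivation below is a sketch under the added assumption that $\Pr[X > x] \ge C x^{-\alpha}$ for some constants $C > 0$ and $x_0 > 0$ and all $x \ge x_0$, which is what the proportionality in Definition 1 provides for large $x$.

```latex
% Assumption: X >= 0 and \Pr[X > x] >= C x^{-\alpha} for all x >= x_0 > 0.
\begin{align*}
E[X]   &= \int_0^\infty \Pr[X > x]\,dx
        \;\ge\; C \int_{x_0}^\infty x^{-\alpha}\,dx
        \;=\; \infty \quad \text{when } \alpha \le 1, \\
E[X^2] &= \int_0^\infty 2x\,\Pr[X > x]\,dx
        \;\ge\; 2C \int_{x_0}^\infty x^{1-\alpha}\,dx
        \;=\; \infty \quad \text{when } \alpha \le 2.
\end{align*}
```

Hence the second moment, and with it the variance, is unbounded over the whole range $0 < \alpha < 2$, while the mean stays finite only for $\alpha > 1$.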
Effective random restarting strategies, which are now extensively
used in DPLL-based solvers and involve a worst-case polynomial
number of restarts, can eliminate heavy-tail behavior [11] and also
target hard problems which have fat tails ($\alpha > 0$) that are not
heavy. Intuitively, random restarts prevent a solver from getting
stuck in a difficult part of the search space. Portfolio strategies [10]
offer similar benefits because each heuristic explores different parts
of the search space. Furthermore, each heuristic can utilize multiple
restarting strategies, which in turn can produce more improvement.
Backdoor variables. In [22], it was observed that many common
problems possess a small backdoor set. A backdoor for a
SAT instance is a variable set that under some assignment produces
a sub-problem solvable in polynomial time. This occurs when the
remaining problem can be solved by a linear-time 2-SAT algorithm.
DEFINITION 2. Given a Boolean formula $F(V)$, variables $B \subseteq V$,
and a variable assignment $A_B \in \{0,1\}^{|B|}$, $B$ is a backdoor if
$\exists A_B\,[F_{A_B} \in P \wedge F_{A_B} \neq 0]$
In other words, if assigning a set of variables results in an instance
with a satisfying assignment that can be solved in polynomial time,
the set forms a backdoor.
DEFINITION 3. Given a Boolean formula $F(V)$, a set of variables
$B \subseteq V$ is a strong backdoor if $\forall A_B\,[F_{A_B} \in P]$.
For unsatisfiable instances, this would require exploration of $2^{|B|}$
combinations and a total runtime of $2^{|B|} P(F_{A_B})$ where $P(F_{A_B})$
is the runtime of the polynomial algorithm under a given assignment.
Empirical evaluation in [22] suggests that many practical
problems have $|B| \propto \lg(|V|)$, resulting in a total runtime of $|V| P(F_{A_B})$
if the backdoor set is known. Although determining this set is not
always computationally feasible, decision heuristics like VSIDS,
which favor variable assignments that decide the problem quickly,
implicitly look for such sets. It was also explained in [22] that randomly
generated instances have considerably larger backdoors of
around 30% of $|V|$. Efficiently determining a backdoor, explicitly
or implicitly, is critical for the performance of a SAT solver.
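Definition 3 can be tested by brute force on small formulas. The sketch below enumerates all $2^{|B|}$ assignments to a candidate set and checks that every residual formula falls into a polynomial-time class. Taking that class to be 2-SAT (every remaining clause has at most two literals) is an assumption made for this example, as are the helper names `simplify` and `is_strong_backdoor_to_2sat` and the integer encoding of literals.

```python
from itertools import product

def simplify(clauses, assignment):
    """Residual CNF under a partial assignment; None if a clause is falsified."""
    residual = []
    for clause in clauses:
        kept = []
        satisfied = False
        for lit in clause:
            var = abs(lit)
            if var in assignment:
                if assignment[var] == (lit > 0):
                    satisfied = True
                    break
            else:
                kept.append(lit)
        if satisfied:
            continue
        if not kept:
            return None  # trivially unsatisfiable under this assignment
        residual.append(kept)
    return residual

def is_strong_backdoor_to_2sat(clauses, backdoor):
    """True if every assignment to `backdoor` leaves clauses of width <= 2."""
    for values in product([False, True], repeat=len(backdoor)):
        residual = simplify(clauses, dict(zip(backdoor, values)))
        if residual is not None and any(len(c) > 2 for c in residual):
            return False
    return True

clauses = [[1, 2, 3], [-1, 2, 4], [1, -3, -4]]  # small hypothetical instance
print(is_strong_backdoor_to_2sat(clauses, [1]))  # True
print(is_strong_backdoor_to_2sat(clauses, [2]))  # False
```

The enumeration itself costs $2^{|B|}$ iterations, which is why a backdoor of size $\lg(|V|)$ keeps the overall work polynomial.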
3. SCHEDULING SAT INSTANCES
OF VARYING DIFFICULTY
Given M different SAT instances and an N-threaded machine,
we desire to solve them in a way that satisfies the following:
$$\min\left(\sum_{m=1}^{M} T_c(m)\right) \quad \text{where } \forall t:\ S_t \le N \qquad (1)$$
where $T_c(m)$ is the completion time for problem m and $S_t$ is the
number of instances being solved during a particular time-slice t. Note,
when N = 1, this formulation considers the case of only a single
available thread of execution. Ideally, the completion time $T_c$ for
the final instance $m_f$ for N threads should be N times smaller than for
N = 1 to fully utilize the parallel resources.
Optimizing the objective above subject to resource constraints
can lead to a schedule that minimizes the total latency for finishing
all the SAT instances. Assuming that incoming SAT instances are
independent and equally important to solve, minimizing latency is a
way to ensure feedback from as many problem instances as quickly
as possible. This may unblock the largest number of client threads
waiting for results. In the case where the runtime for each m is
approximately equal, optimizing the latency objective is trivial as
the SAT problems can be solved in any order. However, as shown
in Figure 2, SAT instances can experience a wide variance
in runtime. In particular, by analyzing the distribution of runtimes
from the SAT 2003 competition [12], which contains several benchmark
suites, we observe that most instances either finish in the first
5 minutes or time out after 64 minutes. An optimal schedule for an
N-threaded machine involves scheduling problems in increasing
order of complexity on each thread. Unfortunately, predicting
actual runtimes beforehand is not possible; therefore, we will discuss
other predictive strategies for handling this limitation.
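The objective in Equation 1 can be evaluated for a concrete schedule with a few lines of Python. The sketch below assumes a simple greedy policy in which each job, taken in list order, starts on whichever thread becomes idle first; `total_latency` and the runtimes (in minutes) are hypothetical and serve only to contrast an arbitrary order with the increasing-runtime order.

```python
import heapq

def total_latency(runtimes, n_threads):
    """Sum of completion times when jobs start in list order on idle threads."""
    free_at = [0] * n_threads  # time at which each thread becomes idle
    heapq.heapify(free_at)
    total = 0
    for r in runtimes:
        start = heapq.heappop(free_at)  # earliest available thread
        finish = start + r
        total += finish
        heapq.heappush(free_at, finish)
    return total

runtimes = [64, 1, 64, 2, 3, 1]  # hypothetical mix of easy and hard instances

print(total_latency(runtimes, 2))          # arrival order: 331
print(total_latency(sorted(runtimes), 2))  # increasing runtime: 144
```

Because at most `n_threads` jobs are ever running, the constraint $S_t \le N$ holds by construction.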
Figure 2: The number of SAT instances solved in a given amount
of time. The timeout is 64 minutes.
Because the distribution of runtimes is very uneven, it is possible
that in addition to large accumulated latency, random scheduling
could result in some threads completing execution well after
previous ones, leading to poor resource utilization. To even out
execution, we can leverage thread schedulers present in most operating
systems, which perform time-slicing. Through time-slicing,
problems with small runtimes will still finish early; however,
longer instances will be evened out and finish around the same time
as each other, thus increasing accumulated latency.
Our solution involves using the distribution of SAT runtimes to
predict a time threshold where the remaining percentage of SAT
problems unsolved will likely be of high complexity. From Figure 2,
we see that this threshold is approximately 5 minutes, although we
also explore other techniques that do not depend on predictive
distributions. Before this threshold, we perform time-sliced
scheduling over all the problems and after the threshold we increase
the thread priority for N instances so that they run in batch mode.
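The interplay between the two phases can be illustrated with a small single-core model. The sketch below is an idealization and not the scheduler implemented in our library: it assumes perfect time-slicing (all unfinished jobs receive equal CPU shares, with no context-switch cost) until `threshold`, after which the surviving jobs run to completion one at a time in arrival order. The function name `hybrid_latency` and the runtimes are hypothetical.

```python
def hybrid_latency(runtimes, threshold):
    """Sum of completion times: time-slice until `threshold`, then batch mode."""
    by_length = sorted(range(len(runtimes)), key=lambda i: runtimes[i])
    clock = 0.0
    total = 0.0
    served = 0.0      # CPU time each unfinished job has received so far
    finished = set()
    # Phase 1: under equal sharing, jobs finish in order of increasing runtime.
    for rank, i in enumerate(by_length):
        active = len(runtimes) - rank
        finish_time = clock + (runtimes[i] - served) * active
        if finish_time > threshold:
            # Share the core until the threshold, then stop time-slicing.
            served += max(0.0, threshold - clock) / active
            clock = max(clock, threshold)
            break
        clock = finish_time
        served = runtimes[i]
        finished.add(i)
        total += clock
    # Phase 2: batch mode, survivors run one after another in arrival order.
    for i, r in enumerate(runtimes):
        if i not in finished:
            clock += r - served
            total += clock
    return total

jobs = [10, 1, 10]  # hypothetical runtimes: one easy job, two hard ones
print(hybrid_latency(jobs, 0))             # pure batch mode: 42.0
print(hybrid_latency(jobs, float("inf")))  # pure time-slicing: 45.0
print(hybrid_latency(jobs, 3))             # time-slice, then batch: 36.0
```

In this toy example the short job escapes early thanks to time-slicing, while switching to batch mode keeps the two long jobs from finishing at the same late time.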
To further reduce the average latency, we can ensure that jobs
requiring large memory resources that negatively impact system
performance have low priority. Despite the increasing amount of
shared memory, large SAT instances could cause a bottleneck due
to memory contention. To counteract this, we can assign low priority
to the largest X jobs so that the available system memory is
greater than mem(M - X), the memory required by the remaining jobs.
During the time-slice phase of the scheduler, we can temporarily
lower the priority of different sets of Y jobs so that each large job
from X can be analyzed for a segment of time. In such a way,
we ensure that each job receives resources, that contention is minimized,
and that the largest number of jobs have the chance of finishing
early. If large jobs remain after the time-slicing mode finishes,
we can divide the remaining jobs so that batches can execute with
total memory consumption within system resources. However, in
our experiments, the instances we consider have low memory profiles
in general. Also, the growth of a SAT instance due to conflict-driven
learning tends to be gradual with respect to the initial size
because of efficient memory management found in state-of-the-art
SAT solvers.
Figure 3: The percentage of total restarts for each minute of
execution for a random distribution of SAT instances.
Although not implemented in this paper, scheduling can be based
on runtime estimates generated from progress meters found in some
SAT solvers [2]. The thread priority for easier instances can be
increased in this manner. As a simpler predictive model, we con-
sider random restart frequency or the percentage of restarts done
each minute. In Figure 3, we show a distribution of restarts over
a random slice of benchmarks. It reveals an exponential decay in
frequency, which can be used as a guide to lower thread priority.
When few restarts occur, there are fewer opportunities to quickly arrive
at a solution due to a better variable order.
As another extension, we consider the case where jobs occur on
the y. In this situation, new jobs will be given priorities equal
to those of the current highest priority jobs (whether in time-slice
or batch mode) so they will enter time-slice mode. This ensures
that old jobs receive resources while maximizing resources for new
jobs, which often complete before the time threshold.
4. CURRENT PARALLEL SAT SOLVERS
Previous efforts at parallelizing algorithms for solving random
SAT instances have been effective as in [15], but random instances
are not common in EDA applications whose problems exhibit structure.
For such instances, [14] represents the state of the art where
a novel implementation was proposed to exploit shared memory
to enable efficient learning between solvers running on different
threads. In this section, we describe some pitfalls with this ap-
proach, along with some limitations of portfolio solvers.
As previously mentioned, search space partitioning using guiding
paths, as in [14], is limited because the division may be unbalanced.
This division limits the effectiveness of random
restarts by prescribing initial assignments to each solver running in
parallel. However, addressing this problem by undoing the initial
assignments for a thread after each random restart appears to undermine
the benefits of partitioning. The partition itself may also
produce subproblems that require very different runtimes to search.
Furthermore, learning between threads is not always an effective
means of boosting performance. As discussed in [26], using 1-UIP
learnts is often more effective at improving the solver's performance
than using minimally-sized learnts. This counter-intuitive
result suggests that parallel schemes for learning, which often use
the size of learnts as a filtering mechanism, may not necessarily
boost the performance of a particular thread of execution.
Implementing these parallelization strategies requires careful se-
lection of a successful sequential solver. Choosing a poor heuristic
for parallelization will still lead to poor performance, especially in
a portfolio where it consistently underperforms compared to other
heuristics. Furthermore, the underlying heuristics implemented in
most successful SAT solvers are finely tuned, which requires careful
and time-consuming development of parallel optimizations. The
slightest perturbation to the quality of the sequential algorithm caused
by parallelization, such as excessive learning between threads, can
significantly degrade runtime performance. For example, learning
increases the size of the clause database which increases the cost of
Boolean constraint propagation. Furthermore, decision heuristics,
like VSIDS, are guided by the learning that is performed. Learning
can therefore steer the decision heuristic to complicated parts of the
search space.
Portfolio solvers are advantageous because there is little implementation
overhead, and the risk of performing very poorly on an
instance that has highly variable runtime is minimal. However, this
approach requires that the heuristics have different performance
characteristics on different types of instances. As larger computing
systems become available, it will be increasingly difficult to find
larger collections of different heuristics. Furthermore, even where
orders-of-magnitude improvements are possible, some instances
may show no improvement, resulting in small overall speed-up.
5. SOLVING INDIVIDUAL HARD INSTANCES
IN PARALLEL
In this section, we propose an algorithmic methodology that uti-
lizes available resources to reduce the runtime of hard instances.
We overcome the limitations described previously by introducing
a novel approach to dividing the search space that allows for more
flexible random restarts. Furthermore, our approach can be easily
adopted by any state-of-the-art DPLL-based solver.
5.1 Search-space Partitioning
using XOR Constraints
We propose our strategy for partitioning the search space evenly.
First we elaborate on the theoretical underpinnings of adding XOR
constraints to a SAT instance and then reveal its significance for
evenly dividing a search space.
Reducing search-space through XOR constraints. To more
evenly partition the search space, we extend the work in solution-space
reduction using a result described in [21]. It was explained
that adding the following XOR constraint to F probabilistically
reduces its solution space by approximately 1/2:
$$F \wedge (x_1 \oplus x_2 \oplus \cdots \oplus x_j \oplus 1) \qquad (2)$$
where the $x_j$ represent randomly chosen variables in F (the probability
of choosing a variable is 1/2). We define the resulting formula as
$F_{even}$, denoting that assignments to the $x_j$ variables must have even
parity to satisfy $F_{even}$. We can define $F_{odd}$ as:
$$F_{odd} = F \wedge (x_1 \oplus x_2 \oplus \cdots \oplus x_j \oplus 0) \qquad (3)$$
$\{F_{even}, F_{odd}\}$ denotes a disjoint partition when the set of $x_j$ variables
is the same for $F_{even}$ and $F_{odd}$. More formally:
DEFINITION 4. A disjoint partition exists when (1) $F = F_{even} \vee F_{odd}$,
(2) $F_{even} \wedge F_{odd} = 0$, and (3) the $x_j$ set is the same for $F_{even}$
and $F_{odd}$.
This partition generates two sub-problems that can be assigned
to different solvers. These partitions can be recursively divided by
adding more XOR constraints. As a generalization of the result explained
in [21], each XOR constraint probabilistically divides the
number of assignment combinations for V variables roughly in half
to $2^{|V|-1}$. This means that the constraint probabilistically divides
the search space in half with the hope of balancing the workload
between different solvers.
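Definition 4 can be checked exhaustively on a toy formula. The sketch below enumerates all assignments of a small CNF and splits its satisfying assignments by the parity of a chosen variable set, so that the two lists correspond to the models of $F_{even}$ and $F_{odd}$. The brute-force enumeration, the helper names, and the example clauses are assumptions for illustration; they are only practical for a handful of variables.

```python
from itertools import product

def satisfies(clauses, assignment):
    """True if every clause has at least one literal made true."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

def xor_partition(clauses, num_vars, xor_vars):
    """Split the models of a CNF by the parity of the variables in xor_vars."""
    even, odd = [], []
    for bits in product([False, True], repeat=num_vars):
        assignment = dict(zip(range(1, num_vars + 1), bits))
        if not satisfies(clauses, assignment):
            continue
        ones = sum(assignment[v] for v in xor_vars)
        (even if ones % 2 == 0 else odd).append(assignment)
    return even, odd

clauses = [[1, 2], [-1, 3]]  # (x1 OR x2) AND (NOT x1 OR x3)
even, odd = xor_partition(clauses, 3, xor_vars=[2, 3])
print(len(even), len(odd))   # 2 2
```

Every model lands in exactly one of the two lists, which mirrors conditions (1) and (2) of Definition 4. The even split seen here is not guaranteed for every choice of `xor_vars`; the balance holds only probabilistically.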
However, in practice, simply adding large XOR-constraints is
inadequate for reducing the search space because it will not cause a
conflict until all of the $x_j$ variables are assigned, which is approximately
$|V|/2$ variables. In other words, this large constraint divides
the search space evenly, but it is ineffective at restricting the search
until after nearly all assignments have been made. We now explain
how smaller XOR constraints can be derived that still achieve the
same theoretical guarantees.
Connection between backdoors and randomized reductions.
As an example of how we can add smaller constraints, consider a
combinational circuit D with m inputs. This circuit can be converted
to a SAT instance $D^C$ with V variables where the set of solutions
determined by the assignments to the primary inputs is M
where $|M| = 2^m$. Therefore, the set of solutions $S_{D^C} \in 2^{|V|} \leftrightarrow M$.
In other words, any assignment $A_M \in 2^m$ results in exactly
one solution. According to Definition 3, M is a strong backdoor
for $D^C$. By restricting the set of variables $x_j$ to variables in M,
we can construct a partition that gives the same probabilistic guarantees
as the original formulation, but produces a smaller XOR
constraint while generating conflicts earlier if these variables are
decided on first. Namely, one XOR constraint roughly divides the
solution space relative to M, which correspondingly divides $S_{D^C}$
roughly in half due to the bijective relationship.
5.2 Light-weight Parallel SAT
For a general SAT instance, we can restrict XOR-constraints to
involve only backdoor variables, which typically restricts the variables
to a much smaller subset. By adding an XOR constraint involving
these backdoor variables B, we can cut the search space
roughly in half to $2^{|B|-1}$.
Evenly partitioning the search space in our multi-threaded
SAT framework. Because computing the smallest backdoor set
explicitly is not always feasible, we use, as an approximation, highly
ranked variables determined by selection heuristics in modern DPLL-based
solvers like VSIDS. Since [22] observed that many backdoor
sets have cardinality $\log(|V|)$, we choose $x_j$ from the top $\log(|V|)$
variables when producing XOR constraints. To generate variable
rankings, we run a SAT solver for a certain amount of time before
generating these XOR constraints.
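The selection step can be sketched as follows. The code assumes the solver exposes a per-variable activity score (for example, VSIDS counters) as a plain dictionary; that interface, the names `top_vars` and `choose_xor_vars`, the base-2 logarithm, and the decision to re-draw when the random subset is empty are all assumptions made for this example rather than details of our implementation.

```python
import math
import random

def top_vars(activity, num_vars):
    """Return about lg(num_vars) variables with the highest activity scores."""
    k = max(1, math.floor(math.log2(num_vars)))
    ranked = sorted(activity, key=activity.get, reverse=True)
    return ranked[:k]

def choose_xor_vars(candidates, rng):
    """Keep each candidate with probability 1/2; re-draw if none survive."""
    while True:
        chosen = [v for v in candidates if rng.random() < 0.5]
        if chosen:
            return chosen

# Hypothetical activity scores for an 8-variable instance.
activity = {1: 0.1, 2: 5.0, 3: 2.5, 4: 0.0, 5: 7.2, 6: 1.0, 7: 0.3, 8: 3.3}

candidates = top_vars(activity, num_vars=8)
print(candidates)  # [5, 2, 8]
xor_vars = choose_xor_vars(candidates, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the choice reproducible, which helps when comparing partitions across runs.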
Algorithm. In Figure 4, we introduce our algorithm for using
XOR partitioning to improve the performance of SAT in a parallel
environment. psat_solve is called like a normal SAT solver and
takes as a parameter the number of random restarts performed before
partitioning the problem. This allows very simple instances
to be completed sequentially and trains the SAT solver so that
good variables are chosen for partitioning. When partitioning is
required, we add an XOR constraint involving the top $\lg(|N|)$ variables
through add_xor_constraints. Because the XOR constraint
is typically small, we do not require a specialized XOR constraint
representation as in [9]. We then spawn two threads and wait
for their results. Notice that the threaded mode uses the same infrastructure
as the sequential mode with only a few minor changes.
To maintain an even division of work between the two threads, we
ensure that the partition variables part_vars are ranked high by
increasing their rank after restarting. Because multiple variables
are used to drive the partitioning constraint, there is more flexibility
in the search procedure than having an exact guiding path.
Finally, in the DPLL search function, we share learnts between
threads when conflicts occur in a manner similar to [14] to facilitate
quick search-space pruning. We expect that our partitioning,
however, will produce sub-problems with similar characteristics,
thereby making our inter-thread learning more powerful. If one
thread finishes unsatisfiable, we do not repartition the problem. We
have observed that constant repartitioning hinders the effectiveness
of the underlying sequential algorithm to solve instances. In practice,
we observe that the even partitioning results in threads that
compute for a similar amount of time.
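For small variable sets, an XOR constraint can be expressed directly as ordinary clauses. The sketch below uses the standard expansion that forbids every assignment of the wrong parity, producing $2^{k-1}$ clauses for $k$ variables. The function name `xor_to_cnf` and the integer encoding of literals are assumptions for this example; parity 0 corresponds to the constraint in $F_{even}$ and parity 1 to the one in $F_{odd}$.

```python
from itertools import product

def xor_to_cnf(variables, parity):
    """Clauses that force XOR(variables) == parity, where parity is 0 or 1."""
    clauses = []
    for bits in product([0, 1], repeat=len(variables)):
        if sum(bits) % 2 != parity:
            # Forbid this assignment: some variable must differ from it.
            clauses.append([-v if b else v for v, b in zip(variables, bits)])
    return clauses

odd_clauses = xor_to_cnf([1, 2], parity=1)
print(odd_clauses)                                # [[1, 2], [-1, -2]]
print(len(xor_to_cnf([1, 2, 3, 4], parity=0)))    # 8
```

The exponential growth in clause count is harmless when only about $\lg(|V|)$ variables take part, which is consistent with not needing a dedicated XOR representation.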
Even Division of Solution Space. We can also exploit the
theoretical evenness of our division and note that the solutions
to the SAT instance are evenly distributed among the sub-problems.
Therefore, if one sub-problem is found to be unsatisfiable, we can
estimate that the other sub-problems have no solutions or very few.
This result could be used to guide the selection of a portfolio of
solvers on-the-fly.
Parallel SAT Methodology. The previous algorithm for paral-
lelizing SAT is an effective and lightweight strategy for improving
the runtime of a sequential solver. Since the sequential solver could
perform poorly on certain classes of benchmarks, we run our par-
allel solver in a portfolio to minimize this risk. Also, in accordance
with our priority-based scheduling, we do not partition the instance
until the batch-mode time threshold is reached. Similarly, we can
partition the instance when the restart frequency is low. In this
way, we reserve parallel computation for only the hard problems
and avoid deterministically partitioning the search space when the
variable rankings change more frequently. Therefore, because the
top variables will be fairly consistent between random restarts after
several minutes of sequential solving, we simplify our procedure
in Figure 4 to not increase the ranking of variables chosen for the
partitioning. Furthermore, we can also choose fewer partition
variables so that smaller XOR constraints are produced.
bool psat_solve(CNF cnf, int passes, Mode mod = seq, Lit assump) {
    static Var part_vars;
    initialize_assumps(assump);
    while (not done() && (passes-- > 0 || mod != seq)) {
        if (mod == parallel) increase_rank(part_vars);
        random_restart();
        result = DPLL_search(mod);
    }
    if (mod == seq && not done()) {
        part_vars = top_vars();
        add_xor_constraints(cnf, part_vars);
        thread(cnf, , parallel, neg);
        thread(cnf, , parallel, pos);
        while (wait) {
            if (SAT) return SAT;
            else if (num_threads == 0) return UNSAT;  // no thread still running
        }
    }
    return result;
}
bool DPLL_search(Mode mod) {
    while (true) {
        propagate();
        if (conflict) {
            analyze_conflict();
            if (top_level_conflict) return UNSAT;
            backtrack();
            if (mod == parallel) parallel_learn_backtrack();
        }
        else if (satisfied) return SAT;
        else decide();
    }
}
Figure 4: Parallel SAT Algorithm.
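The control flow of Figure 4 can be sketched in runnable form under
strong simplifying assumptions: a plain recursive DPLL stands in for
MiniSAT, the sequential warm-up, restarts, variable ranking, and learnt
sharing are all omitted, and the partition variables are supplied by the
caller. All function names below are hypothetical. Because of Python's
global interpreter lock, this shows the structure only and should not be
expected to yield a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product

def xor_to_cnf(variables, parity):
    """CNF clauses forcing XOR(variables) == parity (DIMACS-style literals)."""
    return [[-v if b else v for v, b in zip(variables, bits)]
            for bits in product([0, 1], repeat=len(variables))
            if sum(bits) % 2 != parity]

def simplify(cnf, lit):
    """Assign lit = true: drop satisfied clauses, remove falsified literals."""
    return [[l for l in clause if l != -lit]
            for clause in cnf if lit not in clause]

def dpll(cnf):
    """Return True iff cnf is satisfiable (plain DPLL, no clause learning)."""
    if not cnf:
        return True
    if any(len(clause) == 0 for clause in cnf):
        return False
    for clause in cnf:                      # unit propagation
        if len(clause) == 1:
            return dpll(simplify(cnf, clause[0]))
    lit = cnf[0][0]                         # naive decision heuristic
    return dpll(simplify(cnf, lit)) or dpll(simplify(cnf, -lit))

def psat_solve(cnf, part_vars):
    """Solve both XOR halves concurrently; SAT if either half is SAT."""
    halves = [cnf + xor_to_cnf(part_vars, parity) for parity in (0, 1)]
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(dpll, half) for half in halves]
        for finished in as_completed(futures):
            if finished.result():
                return True                 # one satisfiable half suffices
    return False                            # both halves are UNSAT

print(psat_solve([[1, 2], [-1, 3]], [1, 2, 3]))
print(psat_solve([[1, 2], [-1, 2], [1, -2], [-1, -2]], [1, 2]))
```

As in Figure 4, the instance is reported unsatisfiable only after every
thread has finished without finding a solution.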
6. EMPIRICAL VALIDATION
We consider benchmarks from the SAT 2003 Competition [12]
from the handmade and industrial categories which feature
several different suites. The runtime of each benchmark is profiled
using MiniSAT 2 [7] on a 4 processor dual-core Opteron system
clocked at 1GHz with 16 GB of memory running the Fedora 8 SMP
OS. We set a timeout for each benchmark at 64 minutes and cre-
ated a distribution of runtimes over the entire suite. Our results
indicate that most benchmarks finish in either less than 1 minute
or over 1 hour. This highlights the wide variance in runtime per-
formance motivating our proposed methodology. Statistics for the
benchmarks as well as the runtime distributions can be found in
Table 1 and Figure 2 respectively.
SAT suite    #SAT   #UNSAT   #TimeOut     #total   time
                             (> 64 min)            (min)
handmade       48       90        215        353   13779
industrial     19       33         48        100    3160
Table 1: MiniSAT 2 results on the SAT 2003 benchmark suite.
6.1 Effective Scheduling of SAT Instances
We rst consider the upper-bound of resource utilization by ex-
ecuting several problems concurrently in the ideal case where each
benchmark is roughly of the same complexity. Here, we consider
only small benchmarks from the suite previously analyzed and we
show how a multi-threaded machine can effectively be used so
that n threads result in approximately n-times speed-up. The per-
formance increase is ideal because each thread has an indepen-
dent sub-problem thus being perfectly parallelizable. However, this
analysis, shown in Table 2, is vital in showing that if n independent
problems are derivable, a corresponding speed-up is possible. Per-
fect parallelization is not seen in this table due to the slight variation
in runtime complexity for the different instances. In the following
paragraphs, we show our results for solving a set of instances with a
potentially wide variance in runtime.
#threads   runtime(min)   speed-up
1          67             1
2          34             1.98
4          18             3.72
8          10             6.70
Table 2: Running MiniSAT on a set of benchmarks using different
numbers of threads.
Figure 5: The number of SAT instances solved (up to the time-
out) in a given amount of time by considering three different
scheduling schemes for an 8-threaded machine. Our priority
scheme gives the best average latency, which is 20% better than
batch mode and 29% better than time-slice mode.
Scheduling SAT problems with varying complexity. To test
a parallel methodology under a realistic distribution of runtimes,
we randomly select a subset of benchmarks, with a total runtime of
roughly 32 hours, that follows the distribution seen in Figure 2. We
show the performance of a non-ideal methodology denoted by the
line batch mode in Figure 5 that schedules the SAT problems
as a batch of jobs to our 8-threaded machine. Although the total
runtime for all the problems is around 4 hours, we note that sev-
eral easily executed problems are not scheduled until much later.
In particular, we notice that the latency for each job is not
minimized, especially for smaller instances. Using the operating system
to schedule threads results in the time-slice mode line.
Notice that although several easy instances finish early, the latency
for harder instances increases over batch mode. In our priority
mode, we transition to batch-mode by adjusting thread priorities
after a time threshold is reached. Notice that the area below this
line is smaller, indicating better latency, according to our crite-
rion. Here, we achieve a 20% improvement in average latency over
batch mode and 29% improvement over time-slice mode.
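To make the latency criterion concrete, the sketch below computes the
average completion time of a batch under simple first-come list
scheduling, using hypothetical runtimes. It does not implement our
priority scheme, which adjusts thread priorities at runtime and does not
know runtimes in advance. The shortest-first ordering appears only as an
idealized reference showing how much latency is lost when easy instances
wait behind hard ones.

```python
import heapq

def batch_latencies(runtimes, workers):
    """FIFO batch scheduling: each job starts on the next free worker.

    Returns the completion time (latency measured from t = 0) of every
    job, in submission order.
    """
    free_at = [0.0] * workers      # min-heap of worker-available times
    heapq.heapify(free_at)
    latencies = []
    for runtime in runtimes:
        start = heapq.heappop(free_at)
        finish = start + runtime
        latencies.append(finish)
        heapq.heappush(free_at, finish)
    return latencies

def average(values):
    return sum(values) / len(values)

# Hypothetical batch (minutes): two hard instances submitted ahead of easy ones.
jobs = [60, 60, 1, 1, 1, 1]
print(average(batch_latencies(jobs, 2)))          # 61.0
print(average(batch_latencies(sorted(jobs), 2)))  # about 21.7
```

The total wall-clock time is nearly identical in both orders; only the
average latency differs.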
Figure 5 shows wall-clock time; however, we have observed that
the system time is insignificant for each strategy (< 2 minutes).
This is due, in part, to the efficiency of the OS scheduler along with
the relatively small memory profile required for the random slice
of 55 instances considered.
6.2 Solving Individual Hard Problems
Ultimately, fast verication turn-around may require faster so-
lution of individual hard SAT instances. Solvers such as SatZilla
[23] try to exploit the fact that some solvers perform better on cer-
tain classes of SAT problems than others. By carefully assigning
different solvers to each instance, one can improve runtime com-
pared to using any one solver. In the parallel setting, the choice
can be simplied by running until one of them completes. How-
ever, unlike the single-threaded portfolio variant, it is desirable
that the improved runtime is comparable to the extra computingresources that are used. Although super-linear runtime improve-
ment over the baseline of MiniSAT is possible due to the high vari-
ability of performance of different approaches on a given problem
instance, it is important that consistent improvements are achieved
that actually exploit available computational resources. In our fol-
lowing analysis, we consider SAT instances from the handmade
and industrial categories to more realistically reect the types
of instances seen inpractice. Wechoose a subset of instances where
MiniSAT requires signicant computation ( 1 hr).
        heuristic type & num. instances heuristic solves first
Solver Portfolio   MiniSAT Vers.   w/MiraXT         w/pMiniSAT
MiniSAT      6     m1   3          MiniSAT      6   pMiniSAT     5
Mira1T       0     m2   2          MiraXT       1   Mira1T       1
HaifaSat     1     m3   1          -            -   -            -
Jerusat1.3   1     m4   1          Jerusat1.3   0   Jerusat1.3   1
march_ks     0     m5   1          march_ks     0   march_ks     0
picosat      2     m6   2          picosat      2   picosat      2
rsat         0     m7   1          rsat         2   rsat         1
zchaff       2     m8   1          zchaff       2   zchaff       2
time(min)    321   326             335              200
speed-up     1.67  1.65            1.60             2.69
%util        20.9  20.6            20.0             33.6
Table 3: Using 8 threads of computation with a portfolio of
solvers to handle hard SAT instances.
        heuristic & # solved
MiniSAT      7     pMiniSAT     8
picosat      2     picosat      2
zchaff       2     zchaff       2
Jerusat1.3   1     -            -
time(min)    359   218
speed-up     1.50  2.46
%util        37.4  61.6
Table 4: Using 4 threads of computation with a portfolio of
solvers to handle hard SAT instances.
Table 3 shows the speed-up achieved by running multiple heuris-
tics simultaneously where we consider different solver portfolios.
In the last two columns, we highlight the improvement to CPU
utilization achieved by incorporating our lightweight parallel al-
gorithm. The total runtime without parallelization for MiniSAT is
537 min. The rst, third, fth, and seventh columns list different
heuristics organized in a portfolio. We report the number of hard
instances that a particular heuristic solves the fastest. The rst col-
umn shows a collection of state-of-the-art SAT solvers. Notice that
even though MiniSAT is not the best in all cases, the speed-up on
8 cores is pretty small at 1:7 meaning that only 20:9% of the 8-
times ideal speed-up is realized. The third column shows a portfo-
lio of different versions of MiniSAT given by MiniSAT Vers.
produced by adjusting several tunable knobs such as: restart fre-
quency, variable decay rate, and decision heuristic. These results
reveal similarly poor utilization where neither randomness nor dif-
ferent heuristics achieve high utilization. We then tried running
MiraXT [14] with two threads but did not see additional speed-up
in the portfolio (one heuristic is removed from the original
portfolio to account for the extra thread required by MiraXT).
Because its performance was dominated by MiniSAT, parallelizing
this solver is ineffective at increasing utilization. Furthermore, the
results reported in [14] consider only two threads with speed-up
much smaller than 2. Additionally, we have observed that their
heavyweight approach to partitioning and learning has diminishing
returns when considering more threads.
By incorporating our parallel version of MiniSAT, pMiniSAT,
described in Figure 4, in the solver portfolio, we are able to achieve
significant speed-up and 60.5% higher utilization of the 8 threads
of execution compared to the best solver portfolio (33.6% versus
20.9% in Table 3). Furthermore, in Table 4 we show that our utilization
numbers are better when considering only 4 threads. This indicates the
limitation of large solver portfolios, illustrating that our lightweight
approach to parallelization can be beneficial for achieving greater
utilization by applying it across multiple, different heuristics.
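The speed-up and utilization figures in Tables 3 and 4 follow directly
from the reported runtimes and the 537-minute single-threaded MiniSAT
baseline. The short check below reproduces them; the function names are
ours, chosen for illustration.

```python
BASELINE_MIN = 537  # MiniSAT alone without parallelization (from the text)

def speedup(time_min):
    """Speed-up over the single-threaded MiniSAT baseline."""
    return BASELINE_MIN / time_min

def utilization(time_min, threads):
    """Fraction of the ideal `threads`-times speed-up actually realized."""
    return speedup(time_min) / threads

# Runtimes taken from Table 3 (8 threads) and Table 4 (4 threads).
rows = [("solver portfolio, 8 threads", 321, 8),
        ("w/pMiniSAT, 8 threads", 200, 8),
        ("solver portfolio, 4 threads", 359, 4),
        ("w/pMiniSAT, 4 threads", 218, 4)]
for label, minutes, threads in rows:
    print(f"{label}: speed-up {speedup(minutes):.2f}, "
          f"util {100 * utilization(minutes, threads):.1f}%")

# Relative gain of the pMiniSAT portfolio over the best plain portfolio.
print(f"{100 * (321 / 200 - 1):.1f}%")  # 60.5%
```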
6.3 Partitioning Strategies
We compared our XOR-based partitioning to partitioning with a
single guiding variable, which is a special case of guiding paths. In
Figure 6, we reveal the effectiveness of using XOR constraints for
achieving balanced workloads that evenly distribute solutions
between the threads. Figure 6a gives the percentage of satisfiable
problem instances, out of 16 instances examined, where the first
thread that completes has at least one solution. We compare the
single variable partitioning with XOR constraints of size 2 to 4 and
consider parallelization using 2, 4, and 8 threads concurrently.
Notice, in the 2-thread case, 100% of the threads that finish first are
satisfiable using XORs of size 4, compared to only 75% using one
variable. In general, this experiment reveals that our partitioning
is more effective at distributing solutions, which can be exploited
in adjusting the portfolio of heuristics considered dynamically. We
expect even better performance in application domains where the
number of solutions is much greater than the number of available
threads of computation.
Figure 6b shows the runtime balance between 2, 4, and 8 threads.
We examined different partitioning strategies on a set of 29
unsatisfiable problem instances and calculated the normalized standard
deviation, which is the standard deviation of thread runtime divided
by the average runtime. We disable learning for this experiment to
more accurately analyze how the search space is partitioned. For
the single variable partitioning for two threads, the normalized
standard deviation is 0.35, compared to a much smaller 0.22 for
XOR-based partitioning with 4 variables. In general, we notice close to
a 2-times improvement in the runtime deviation between the single
variable strategy and the 4 variable XOR when considering different
numbers of threads.
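The balance metric itself is simple to compute. The sketch below uses
the population standard deviation, which is an assumption on our part
since the text does not say whether the population or sample form is
meant, and the thread runtimes are hypothetical.

```python
from statistics import mean, pstdev

def normalized_stddev(thread_runtimes):
    """Standard deviation of per-thread runtime divided by the mean runtime.

    Uses the population standard deviation (an assumption for this sketch).
    A value of 0 means perfectly balanced threads.
    """
    return pstdev(thread_runtimes) / mean(thread_runtimes)

# Hypothetical per-thread runtimes (minutes) for one instance on 2 threads.
print(normalized_stddev([10, 30]))  # unbalanced split
print(normalized_stddev([18, 22]))  # well-balanced split
```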
7. CONCLUSIONS
The ubiquity of SAT-solving in EDA and its growing adoption
for new applications necessitate methodologies that can exploit
parallel resources. However, the inherent variability and complexity
of SAT instances challenge state-of-the-art strategies to
efficiently utilize these resources.
In this work we have empirically evaluated portfolios of SAT
solvers, perhaps the most promising approach so far for parallel
SAT on up to four cores. The results hint that portfolios are
unlikely to scale to large numbers of processors because their per-
formance on each particular benchmark type tends to be dominated
by only one or two solvers, rendering the remaining solvers essen-
tially useless. Indeed, when 100 cores are available, it is unlikely
that there will be 100 very different but equally competitive SAT
solvers available.
To better address workload balance in parallel SAT, we proposed
a two-part strategy for utilizing parallel processing more effectively.
First, we introduced a scheduling algorithm that incorporates
previous runtime distributions over a set of SAT instances to
reduce average latency by 20% relative to batch scheduling. Since several
instances require prohibitive amounts of runtime, we proposed a
lightweight parallel SAT algorithm that effectively partitions the
search space after initially running the solver in sequential mode.
We observe that our partitioning results in roughly 50% better run-time
balance than simply choosing one splitting variable. Our partitioning
strategy enables us to improve resource utilization over solver
portfolios by 60.5%. By incorporating our parallel solving strategies
on several different SAT solvers, solver portfolios can be further
improved because the randomness vital for efficiently solving
SAT instances is better coordinated.
Figure 6: a) The percentage of satisfiable instances where the first
thread that completes finds a satisfying assignment. b) The
standard deviation of runtime between threads. Using XOR constraints
as opposed to splitting one variable can significantly improve
load balance and more evenly distribute solutions among threads.
In addition to sheer necessity for parallel SAT, successful
techniques open a new avenue in the design of future parallel EDA
tools. Rather than custom-developing parallel techniques for vari-
ous existing optimizations, one can look for reductions to (parallel)
SAT. Even in the cases where reductions to SAT were not justified
on a single processor, a highly-optimized generic parallel-SAT li-
brary may grow increasingly competitive as the number of cores on
a chip increases beyond several dozen.
8. REFERENCES
[1] M. Abramovici, J. DeSousa, and D. Saab, A massively-parallel
easily-scalable satisfiability solver using reconfigurable hardware,
DAC, pp. 684-690, 1999.
[2] F. Aloul, B. Sierawski, and K. Sakallah, Satometer: how much have
we searched?, TCAD, pp. 995-1004, 2003.
[3] K. Asanovic, R. Bodik, B. Catanzaro, J. Gebis, P. Husbands, K.
Keutzer, D. Patterson, W. Plishker, J. Shalf, S. Williams, and K.
Yelick, The landscape of parallel computing research: a view from
Berkeley, ERL Technical Report, Berkeley.
[4] A. Biere, A. Cimatti, E. Clarke, and Y. Zhu, Symbolic model
checking without BDDs, TACAS, pp. 193-207, 1999.
[5] W. Chrabakh and R. Wolski, GraDSAT: a parallel SAT solver for
the grid, UCSB Comp. Sci. TR, 2003.
[6] M. Davis, G. Logemann, and D. Loveland, A machine program for
theorem proving, Comm. of ACM, pp. 394-397, 1962.
[7] N. Een and N. Sorensson, An extensible SAT-solver, SAT '03,
(http://www.cs.chalmers.se/Cs/Research/FormalMethods/MiniSat/).
[8] E. Goldberg, M. Prasad, and R. Brayton, Using SAT for
combinational equivalence checking, DATE, pp. 114-121, 2001.
[9] C. Gomes, W. Hoeve, A. Sabharwal, and B. Selman, Counting CSP
solutions using generalized XOR constraints, AAAI, pp. 204-209,
2007.
[10] C. Gomes and B. Selman, Algorithm portfolios, AI, pp. 43-62,
2001.
[11] C. Gomes, B. Selman, K. McAloon, and C. Tretkoff,
Randomization in backtrack search: exploiting heavy-tailed profiles
for solving hard scheduling problems, AIPS, 1998.
[12] H. Hoos and T. Stützle, SATLIB: an online resource for research on
SAT, SAT, pp. 283-292, 2000.
[13] Y. Hu, V. Shih, R. Majumdar, and L. He, Exploiting symmetry in
SAT-based Boolean matching for heterogeneous FPGA technology
mapping, ICCAD, pp. 350-353, 2007.
[14] M. Lewis, T. Schubert, and B. Becker, Multithreaded SAT solving,
ASP-DAC, pp. 926-932, 2007.
[15] P. Manolios and Y. Zhang, Implementing survey propagation on
graphics processing units, SAT, pp. 311-324, 2006.
[16] J. Marques-Silva and K. Sakallah, GRASP: A search algorithm for
propositional satisfiability, IEEE Trans. Comp, pp. 506-521, 1999.
[17] M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik,
Chaff: engineering an efficient SAT solver, DAC, pp. 530-535,
2001.
[18] F. Okushi, Parallel cooperative propositional theorem proving,
Annals of Mathematics and AI, pp. 59-85, 1999.
[19] S. Plaza, K.-H Chang, I. Markov, and V. Bertacco, Node mergers in
the presence of don't cares, ASP-DAC '06, pp. 414-419.
[20] S. Plaza, I. Markov, and V. Bertacco, Optimizing non-monotonic
interconnect using functional simulation and logic restructuring, to
appear in ISPD'08.
[21] L. Valiant and V. Vazirani, NP is as easy as detecting unique
solutions, Theor. Comput. Sci., pp. 85-93, 1986.
[22] R. Williams, C. Gomes, and B. Selman, Backdoors to typical case
complexity, IJCAI, 2003.
[23] L. Xu, F. Hutter, H. Hoos, and K. Leyton-Brown, SATzilla-07: the
design and analysis of an algorithm portfolio for SAT, CP, 2007.
[24] H. Zhang, M.P. Bonacina, and J. Hsiang, PSATO: a distributed
propositional prover and its application to quasigroup problems,
Jrnl of Symb Comp, pp. 1-18, 1996.
[25] H. Zhang, SATO: an efcient propositional prover, CADE, pp.
272-275, 1997.
[26] L. Zhang, C. Madigan, M. Moskewicz, and S. Malik, Efcient
conict driven learning in Boolean satisability, ICCAD, pp.
279-285, 2001.
[27] Y. Zhao, M. Moskewicz, C. Madigan, and S. Malik, Accelerating
Boolean satisability through application specic processing, ISSS,
pp. 244-249, 2001.
[28] P. Zhong, M. Martonosi, P. Ashar, and S. Malik, Using congurable
computing to accelerate Boolean satisability, TCAD, pp. 861-868,
1999.
[29] Q. Zhu, N. Kitchen, A. Kuehlmann, and A. Sangiovanni-Vincentelli,
SAT sweeping with local observability don't cares, DAC '06, pp.
229-234.