Cache Performance Study of Portfolio-Based Parallel CDCL SAT Solvers by Asín, Roberto et al.
arXiv:1309.3187v1 [cs.DC] 12 Sep 2013
Cache Performance Study of Portfolio-Based
Parallel CDCL SAT Solvers
Roberto Asín*, Juan Olate**, and Leo Ferres**
*Department of Computer Engineering, Faculty of Engineering, Universidad Católica
de la Santísima Concepción, Chile, rasin@ucsc.cl
**Department of Computer Science, Faculty of Engineering, Universidad de
Concepción, Chile, {juanolate,lferres}@udec.cl
September 24, 2018
Abstract
Parallel SAT solvers are becoming mainstream. Their performance has
made them win the past two SAT competitions consecutively, and they are in
the limelight of research and industry. The problem is that it is not known
exactly what is needed to make them perform even better; that is, how to
make them solve more problems in less time. It is also not known how
well they scale in massive multi-core environments which, predictably, is
the scenario of upcoming hardware. In this paper we show that cache
contention is a main culprit of the slowdown in scalability, and provide
empirical results showing that, for some types of search, physically sharing
the clause database between threads is beneficial.
Keywords: Satisfiability; Parallel SAT solving; Parallel processing;
Shared-memory SMP
1 Introduction
In this paper, we study the effect of cache performance on the scalability of
parallel solvers of the satisfiability problem in hierarchical-memory, symmetric
multi-processing systems; systems with more than one processor that share
memory. We find this topic important for three reasons:
First, in recent years, parallel SAT solvers (henceforth, pSATs) have been
performing at the top of the SAT Competitions1 (in 2011, all three wall-clock
time winners of the competition were parallel solvers). Also in 2011, pSATs
and sequential SAT solvers were grouped into a single competition track, which
shows the widespread interest in pSATs by research and industry. This appeal
stems in part from the inherently interesting properties of parallel algorithms,
but also from the community's need to do better in new application
domains and to handle even larger and more complex CNF formulas in shorter
times, taking advantage of modern hardware.
1http://baldur.iti.uka.de/sat-race-2010/, http://www.satcompetition.org/
Second, instead of increasing clock performance, chip manufacturers are
investing heavily in multicore architectures to improve performance and lower
power consumption (AMD released the 8-core Opteron 3260 EE in late 2011,
and Intel did the same with the Xeon E5-2650, and its low power version, the
Xeon E5-2650L in early 2012). As Herb Sutter put it, “the free lunch is over”
[Sut05], and by this he meant that software in general will not be getting any
faster by simply relying on faster processor clocks, but only by scaling well
in multicore systems.
Finally, modern memory architectures are not flat Processor↔RAM
architectures, but a hierarchy of fast-but-small to slower-but-large memories,
with latencies and capacities ranging from 0.5ns access to the 32KB L1
(first level) cache, to tens of nanoseconds for megabyte-sized memories like the
L3 cache (usually the last level, LL cache), to 100ns access times for
gigabyte-sized memories such as main-memory DDR RAM. Hierarchical memory
architectures have a strong impact on the performance of sequential software
(e.g., in a row-major representation of an n×n matrix, memory transfers for a
row-wise scan are in the order of n²/B, where B is the number of elements per
cache line, while memory transfers for a column-wise scan are in the order of
n²). This impact is
equal or bigger in the case of parallel processes since, for most architectures, the
cores in the system share some level of cache memory.
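The row-versus-column example can be made concrete with a small counting model. The sketch below is only illustrative: it assumes a toy cache that holds a single line, and the names count_line_transfers, n and line_elems are ours, not taken from any cited work. Real caches hold many lines and prefetch, so only the orders of magnitude carry over.

```python
def count_line_transfers(n, line_elems, order):
    """Count cache-line loads when scanning an n x n row-major matrix.

    Toy model (an assumption): the cache holds exactly one line, so a load
    happens whenever an access falls in a different line than the previous one.
    """
    if order == "row":
        indices = ((r, c) for r in range(n) for c in range(n))
    else:  # column-wise scan
        indices = ((r, c) for c in range(n) for r in range(n))
    loads = 0
    current_line = None
    for r, c in indices:
        line = (r * n + c) // line_elems  # row-major address -> line number
        if line != current_line:
            loads += 1
            current_line = line
    return loads


n, line_elems = 64, 8
print(count_line_transfers(n, line_elems, "row"),      # n*n / line_elems loads
      count_line_transfers(n, line_elems, "column"))   # n*n loads
```

With n = 64 and 8 elements per line, the row-wise scan loads each line once, while the column-wise scan reloads a line on every access.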
Given the three reasons above, in this paper we are concerned with the
following questions: how do pSATs scale in hierarchical-memory multicore
architectures? What is the effect of cache performance? Our case study is the
winner of the 2011 SAT Competition, plingeling, a portfolio-based pSAT solver
[Bie11]. The first experiment tested a modified plingeling on a 40-core
machine (four ten-core processors), varying the number of threads. This instance
of plingeling was modified so that every worker thread would perform the same
search (i.e. same strategies, same starting parameters and no lemma
exchange). This allowed us to measure the impact of running p threads on the
same physical CPU. In Figure 1, we see how the modified plingeling's
performance decays sharply (around 30% on average, up to a maximum of ∼200%)
when several solver instances are executed on the same processor. This
happens even though, in principle, instances do not logically share any resources
other than the common process address space. We would instead have
expected all instances to perform similarly, plus or minus a small time
fraction. This is in fact what happens when plingeling is run on four different
cores of four different chips, whether on different machines or on the same one.
In order to find the reason for the performance decay, we have developed a
simple portfolio-based SAT solver (which we have called AzuDICI) that allows
us both to replicate the behavior of plingeling and to experiment with
alternative scenarios (such as physically sharing information among threads)
in order to take measures towards the improvement of its cache performance.
It is important to highlight that our SAT solver does not compare to current
high-performance, state-of-the-art solvers such as plingeling itself. AzuDICI
serves as a useful tool to test and analyze the behavior of portfolio-based pSAT
solvers.
[Figure 1 plot. x-axis: # of threads (1 to 10); y-axis: Performance decay (%),
0 to 250. Legend: aaai10-planning-ipc5-pathways-13-step17, E02F22,
grid-strips-grid-y-3.035-NOTKNOWN,
hwmcc10-timeframe-expansion-k45-pdtvissoap1-tseitin,
hwmcc10-timeframe-expansion-k50-pdtpmsns2-tseitin, md5_48_3,
q_query_3_L150_coli.sat, q_query_3_slp-synthesis-aes-top30,
q_query_3_traffic_b_unsat, UCG-15-10p1.]
Figure 1: Performance decay of plingeling when run over 10 cores on a single
processor using several standard benchmarks (their long names are in the
legend).
The paper is structured as follows, highlighting our contributions in the
relevant sections: in Section 2 we introduce the topic; that is, the general
problem of SAT solving and sequential SAT solving (Section 2.1) and
parallel SAT solving (Section 2.2). Section 3 shows the cache performance of
plingeling, a state-of-the-art portfolio-based SAT solver, when varying the
number of executing threads. We show how cache contention significantly (and
negatively) impacts the performance of portfolio-based parallel SAT solvers. We
conclude there that pSAT solvers do not scale satisfactorily in shared-memory
parallel architectures. In Section 4, we introduce AzuDICI, a “simple”
portfolio-based pSAT solver that implements several levels of physical clause
sharing. We also report
the results of several experiments that compare the cache performance of these
configurations, showing how physical clause sharing significantly helps certain
configurations (when all threads are executing the same search), while little
or no help is observed when the threads of a pSAT solver perform different
searches. Finally, Section 5 closes the paper with some discussion of the future
of parallel SAT solving, situates our work in the context of what has been done
so far, and points the way for future work.
2 Preliminaries
2.1 SAT and (Sequential) SAT solvers
Let V be a fixed finite set of propositional variables. If v ∈ V, then v and ¬v
are literals of V. The negation of a literal l, written ¬l, denotes ¬v if l is v, and
v if l is ¬v. A clause is a disjunction of literals l1 ∨ ... ∨ ln. A (CNF) formula is
a conjunction of one or more clauses C1 ∧ ... ∧ Cn. A (partial truth) assignment
M is a set of literals such that {v, ¬v} ⊆ M for no v. A literal l is true in M if
l ∈ M, is false in M if ¬l ∈ M, and is undefined in M otherwise. A clause C
is true in M if at least one of its literals is true in M. It is false in M if all its
literals are false in M, and is undefined in M otherwise. A formula F is true
in M, or satisfied by M, if all its clauses are true in M. In that case, M is a
model of F. If F has no models then it is unsatisfiable.
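These definitions translate almost directly into code. The following minimal sketch assumes a representation that the text does not prescribe: a variable v is a positive integer, the literal ¬v is the integer -v, a clause is a list of literals, and an assignment M is a set of literals. The function names are ours.

```python
TRUE, FALSE, UNDEF = "true", "false", "undefined"


def lit_value(lit, M):
    """Value of a literal under the partial assignment M (a set of literals)."""
    if lit in M:
        return TRUE
    if -lit in M:
        return FALSE
    return UNDEF


def clause_value(clause, M):
    """True if some literal is true, false if all are false, else undefined."""
    values = [lit_value(l, M) for l in clause]
    if TRUE in values:
        return TRUE
    if all(v == FALSE for v in values):
        return FALSE
    return UNDEF


def is_model(formula, M):
    """M is a model of the formula if every clause is true in M."""
    return all(clause_value(c, M) == TRUE for c in formula)


F = [[1, -2], [2, 3]]   # (v1 or not v2) and (v2 or v3)
print(is_model(F, {1, 3}), clause_value(F[0], {-1}), clause_value(F[0], {-1, 2}))
```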
The problem we are interested in is the SAT problem: given a formula F , to
decide whether there exists a model of F or not. Since there exists a polynomial
transformation (see [Tse68]) from any arbitrary formula to an equisatisfiable
CNF one, we will assume w.l.o.g. that F is in CNF.
A program that solves this problem is called a SAT solver. The
Conflict-Driven Clause-Learning (CDCL) algorithm is nowadays at the basis of most
state-of-the-art SAT solvers [AS09, Bie10, SNC09]. This algorithm has, at its
roots, the very simple DPLL algorithm [DLL62]. Thanks to work done mainly
in [JS97, MMZ+01, MSS99, ZS96, ES04a, Bie08], CDCL has evolved into an
algorithm that allows modern SAT solvers to handle formulas of millions of
variables and clauses. Algorithm 1 sketches the CDCL algorithm.
Algorithm 1: CDCL algorithm
Input : Formula F = {C1, ..., Cm}
Output: SAT or UNSAT
status := UNDEF;
model := {};
dl := 0;
while status == UNDEF do
    (conflict, model) := BCP(model, F);
    while conflict ≠ NULL do
        if dl == 0 then
            return UNSAT;
        lemma := CONFLICT ANALYSIS(conflict, model, F);
        F := F ∧ lemma;
        dl := LARGEST DL OF FALSE LITS(lemma, model);
        model := BACKJUMP TO DL(dl, model);
        (conflict, model) := BCP(model, F);
    if status == UNDEF then
        dec := DECIDE(model, F);
        dl := dl + 1;
        if dec == 0 then status := SAT;
        else model := model ∪ {dec};
return status
Basically, the CDCL algorithm is a backjumping search algorithm that
incrementally builds a partial assignment M over iterations of the DECIDE and
BCP (Boolean Constraint Propagation) procedures, returning SAT if M becomes
a model of F or UNSAT if no such model exists.
The DECIDE procedure corresponds to a branching step of the search and
applies when the unit propagation procedure cannot set any further literal to
true (see below). When no inference can be done about which literals should
be true in M, a literal l_dl is “guessed” and added to M in order to continue the
search. Each time a new decision literal is added to M the decision level dl of
the search is increased and we say that all literals in M after l_dl and before
l_(dl+1) belong to decision level dl. For further reading about the DECIDE
procedure, we refer to [MMZ+01, ES04b, PD07].
The BCP procedure applies when an assignment M falsifies all literals
but one of an undefined clause C. If l is the undefined literal in C, then, in
order for M to become a model of F, l must be added to M for C to be satisfied.
This condition is tested for every clause of the formula, and the procedure ends
when there is no literal left to add to M or when it finds a false clause. In
the first case,
BCP returns an updated model containing all such propagations and the search
continues. In the second case it returns the falsified clause, which we call a
conflicting clause.
Since BCP usually takes about 90% of the total running time of a typical
modern SAT solver, many implementation techniques have been proposed to
make it more efficient. The algorithm that is implemented in most current state-
of-the-art SAT-solvers is known as the two-watched literal scheme [MMZ+01].
The underlying idea is that no clause with more than one literal will generate a
unit propagation or become conflicting if at least two of its literals are undefined
or, at least one of them is true. Hence, for each clause C, either C is true or
the algorithm makes sure that two undefined literals exist. For this purpose,
two non-false literals are watched in every clause. For every literal we keep a
list of the clauses where it is being watched. As soon as some literal l becomes
false in the assignment, we visit every clause C in its watch list. If the other
watched literal l′ of C is true, then C is satisfied and the invariant is preserved.
Otherwise we must find a non-false literal different from l and l′ to watch. If
we do not succeed and l′ is false, the clause is conflicting; if we do not succeed
and l′ is unassigned, we unit propagate l′. For a detailed review on BCP and
how different implementations perform in hierarchical (cache) memory we refer
to [ZM04].
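The scheme just described can be sketched as follows. This is a minimal illustration, not the implementation of any solver cited here. It assumes that literals are signed integers, that M is a set of true literals, that every clause has at least two literals, and that the two watched literals are kept in positions 0 and 1 of each clause (a common convention). The names WatchedClauses, on_false and propagate are ours.

```python
from collections import defaultdict


class WatchedClauses:
    """Clauses of length >= 2; watched literals sit in positions 0 and 1."""

    def __init__(self, clauses):
        self.clauses = [list(c) for c in clauses]
        self.watches = defaultdict(list)  # literal -> indices of clauses watching it
        for i, c in enumerate(self.clauses):
            self.watches[c[0]].append(i)
            self.watches[c[1]].append(i)

    def on_false(self, lit, M):
        """Visit the watch list of `lit`, which has just become false in M.

        Adds unit-propagated literals to M. Returns (units, conflict), where
        conflict is a conflicting clause or None.
        """
        units, keep, conflict = [], [], None
        pending = self.watches[lit]
        for pos, i in enumerate(pending):
            c = self.clauses[i]
            if c[0] == lit:                  # move the false watch to position 1
                c[0], c[1] = c[1], c[0]
            other = c[0]
            if other in M:                   # clause is true: invariant holds
                keep.append(i)
                continue
            for k in range(2, len(c)):       # look for a non-false replacement
                if -c[k] not in M:
                    c[1], c[k] = c[k], c[1]
                    self.watches[c[1]].append(i)
                    break
            else:                            # no replacement found
                keep.append(i)
                if -other in M:              # other watch is false: conflict
                    conflict = list(c)
                    keep.extend(pending[pos + 1:])
                    break
                M.add(other)                 # other watch is undefined: unit
                units.append(other)
        self.watches[lit] = keep
        return units, conflict


def propagate(wc, M, lit):
    """Make `lit` true and propagate; returns a conflicting clause or None."""
    M.add(lit)
    queue = [lit]
    while queue:
        units, conflict = wc.on_false(-queue.pop(), M)
        if conflict is not None:
            return conflict
        queue.extend(units)
    return None
```

Note that nothing has to be undone in the watch lists on backtracking: removing literals from M can only turn false literals into undefined ones, so two non-false watches remain valid.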
Since many problems have a large percentage of binary clauses, many SAT
solvers represent the set of binary clauses as a graph of implications. For each
literal l, a list of implied literals (literals that must be true whenever l is in
M) is stored. For example, the clause l1 ∨ l2 is stored by adding l2 to the
implication list of ¬l1 and l1 to the implication list of ¬l2. Thus, BCP with
binary clauses becomes very efficient since, to calculate the unit propagations
of a literal l ∈ M, it suffices to go through its implication list and add every
literal of the list to M. If a given literal l′ of the list is false, then a
conflicting binary clause ¬l ∨ l′ has been found and the second condition for
BCP termination applies.
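A minimal sketch of such implication lists, under the same assumed representation (signed integers for literals, a set M of true literals), could look like this. The class and method names are ours.

```python
from collections import defaultdict


class BinaryImplications:
    def __init__(self):
        self.implied = defaultdict(list)  # literal -> literals it implies

    def add_clause(self, l1, l2):
        """Store l1 v l2: (not l1) implies l2, and (not l2) implies l1."""
        self.implied[-l1].append(l2)
        self.implied[-l2].append(l1)

    def propagate(self, lit, M):
        """Propagate `lit` (already in M) through the implication lists.

        Newly implied literals are added to M and propagated in turn.
        Returns a conflicting binary clause (as a pair) or None.
        """
        queue = [lit]
        while queue:
            l = queue.pop()
            for l2 in self.implied.get(l, ()):
                if -l2 in M:            # implied literal is false: conflict
                    return (-l, l2)     # the falsified clause (not l) v l2
                if l2 not in M:
                    M.add(l2)
                    queue.append(l2)
        return None
```

Each propagation step only reads one contiguous list, which is what makes this representation attractive from a cache point of view.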
If BCP finds a conflicting clause, then two possibilities apply. If the
decision level of the search is zero (i.e. no decisions have been made) then
the CDCL procedure returns UNSAT. Otherwise, a CONFLICT
ANALYSIS procedure is called. This procedure analyzes the cause of the
conflict (i.e. determines which decisions have led to the conflict) and returns
a new clause (which we call a lemma) that is entailed by the original formula.
Then, the algorithm backjumps to an earlier decision level dl′, namely
the highest dl′ < dl among the false literals in the lemma, and propagates
with it. CONFLICT ANALYSIS works in such a way that, when backjumping
and propagating with the lemma, the original conflict is avoided. The lemmas
learned at CONFLICT ANALYSIS time are usually added to the formula in
order to avoid similar conflicts and can also be deleted (when the formula is
too big and they are no longer needed). For details of CONFLICT ANALYSIS,
we refer to [MSS99, ZMMM01] and for lemma deletion heuristics to [BS97,
GN02, AS09]. For a complete review of this algorithm as well as proofs of its
termination and soundness we refer to [NOT06].
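The overall loop of Algorithm 1 can be illustrated with a deliberately simplified, runnable sketch. Two simplifications are assumptions of ours and not what state-of-the-art solvers do: BCP naively rescans every clause instead of using watched literals, and conflict analysis learns the negation of all current decisions (the simplest valid lemma) rather than a 1-UIP clause, so the backjump always goes back exactly one level. Literals are signed integers, and the names bcp and cdcl are ours.

```python
def bcp(formula, trail):
    """Naive BCP: rescan all clauses until nothing changes.

    Appends propagated literals to `trail`; returns a conflicting clause or None.
    """
    true_lits = set(trail)
    changed = True
    while changed:
        changed = False
        for clause in formula:
            if any(l in true_lits for l in clause):
                continue                       # clause already true
            undefined = [l for l in clause if -l not in true_lits]
            if not undefined:
                return clause                  # every literal is false
            if len(undefined) == 1:            # unit clause: propagate
                trail.append(undefined[0])
                true_lits.add(undefined[0])
                changed = True
    return None


def cdcl(formula, num_vars):
    """Returns a model (set of literals) or None if the formula is UNSAT."""
    formula = [list(c) for c in formula]
    trail = []        # assigned literals, in assignment order
    decisions = []    # decisions[i] is the decision literal of level i + 1
    marks = []        # trail length just before each decision
    while True:
        conflict = bcp(formula, trail)
        while conflict is not None:
            if not decisions:                  # conflict at decision level 0
                return None
            # Simplified conflict analysis: the decisions cannot all hold.
            formula.append([-d for d in decisions])
            # The lemma has one literal per level, so jump back one level.
            del trail[marks.pop():]
            decisions.pop()
            conflict = bcp(formula, trail)     # the lemma is now unit
        true_lits = set(trail)
        dec = next((v for v in range(1, num_vars + 1)
                    if v not in true_lits and -v not in true_lits), 0)
        if dec == 0:                           # nothing left to decide: SAT
            return true_lits
        marks.append(len(trail))
        decisions.append(dec)
        trail.append(dec)
```

For example, cdcl([[1, 2], [-1, 3], [-2, -3]], 3) returns a model, while the four binary clauses over two variables that exclude every assignment yield None.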
2.2 Parallel SAT solvers
Parallel SAT solvers are not as mature as sequential ones and it is still not clear
which path to follow when designing and implementing such new solvers. In this
section we will briefly present the two main approaches used in parallel SAT
solvers for shared memory architectures2. We mainly classify the parallel SAT
solvers for shared memory architectures into two categories: portfolio-approach
and search-space splitting solvers.
The main idea behind portfolio approach solvers is the fact that different
strategies/parameters of CDCL sequential solvers, or even different kinds of
sequential solvers, perform better for different families of SAT problems. In
sequential CDCL SAT solving, for example, there exist several parameters/strategies
related to the algorithm's heuristics for restarting, deciding or cleaning the
clause database. Taking this into consideration, a portfolio approach is very
straightforward: run a group of sequential solvers in different threads, each with
different parameters and/or different strategies. This idea can be easily
extrapolated to other non-CDCL SAT solvers. The time the portfolio-based parallel
solver takes to solve the problem is the time of the fastest thread in the
group of solvers running in parallel. Solvers of this kind differ in
whether the clause database is physically shared [KK11] or
each thread has its own database. If the second approach is taken,
it is possible to implement the solver so that its threads exchange
lemmas according to several policies: aggressively [HJS09], selectively [Bie10],
or with no communication between threads at all [Rou11].
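The control structure of a portfolio (start several differently configured workers, report the first answer, interrupt the rest) can be sketched as below. The worker here is a hypothetical stand-in that merely waits for a configured time instead of searching; the names worker and run_portfolio, and the configuration tuples, are ours.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def worker(name, work_seconds, stop):
    """Hypothetical stand-in for one sequential solver configuration."""
    deadline = time.monotonic() + work_seconds
    while time.monotonic() < deadline:
        if stop.is_set():          # another worker already finished
            return None
        time.sleep(0.001)
    return name                    # "found an answer"


def run_portfolio(configs):
    """Run all configurations in parallel; return the first finisher's result."""
    stop = threading.Event()
    with ThreadPoolExecutor(max_workers=len(configs)) as pool:
        futures = [pool.submit(worker, name, secs, stop) for name, secs in configs]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        stop.set()                 # interrupt the remaining workers
        return next(iter(done)).result()


print(run_portfolio([("restart-heavy", 0.5), ("default", 0.02), ("no-cleanup", 1.0)]))
```

The total running time is that of the fastest worker plus the short time the others need to notice the stop flag.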
Search space splitting solvers do not run different solvers in parallel, but
run one solving instance that splits the search space into disjoint subspaces. A
common strategy to divide the search space is to use guiding paths [ZBH96].
A guiding path is a partial assignment M in F, which restricts the search
space of the SAT problem. A solver that divides its search space with guiding
paths assigns a guiding path to each thread, and the thread solves F under the
partial assignment M of that path. Once a thread finishes searching a guiding
path with no success, it can request another to keep searching (we refer to
[SLB09] for further explanations).
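One simple, static way to obtain disjoint subspaces is to fix the signs of k chosen variables in all 2^k possible ways. The sketch below shows only this special case; it is an assumption made for illustration, since guiding paths as used in the cited solvers are typically created and handed out dynamically during search. The function name is ours.

```python
from itertools import product


def guiding_paths(split_vars):
    """All sign combinations over `split_vars`, each a partial assignment.

    The 2^k paths are pairwise disjoint and together cover the search space.
    """
    return [frozenset(v if positive else -v for v, positive in zip(split_vars, signs))
            for signs in product((True, False), repeat=len(split_vars))]


for path in guiding_paths([1, 2]):
    print(sorted(path))
```

Each thread would then solve F under one of these partial assignments and request another path when it finishes without success.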
Both parallelization strategies (portfolio approach and search-space-splitting
approach) have been, and currently are, applied to shared memory parallel
computers (e.g. [Bie10]) as well as to distributed memory ones (e.g. [SLB09]). For
a further review on shared memory parallel SAT solving, we refer to [MML12].
In what follows, we focus on solvers implementing the portfolio approach
since this has reported the best results in recent papers and competitions.
2.3 Hardware and tools description
Since 2005, Intel has produced chips with more than one physical processing
core in order to speed up execution by sidestepping the difficulty of producing
chips with faster clocks. In these new machines, each core has private small
memories (called L1 cache, and sometimes L2) and progressively bigger (but
slower) shared memories (usually L2 and L3). These effectively constitute a
“memory hierarchy” which needs to be kept coherent to give the illusion of a
single shared memory. For instance, the two machines used to
run the tests were:
• Machine K: a dual-processor 6-core Intel Xeon CPU (E5645) running at
2.40GHz, with a total of 12 physical cores. Hyperthreading was disabled.
The computer runs Linux 3.0.0-15-server, in 64-bit mode.
• Machine I: a quad-processor 10-core Intel Xeon CPU (E7-4860) running
at 2.27GHz, for a total of 40 physical cores. Hyperthreading was enabled,
but we never ran our code on more than 40 cores (cores 0-39), taking care
that each process was always bound to an unassigned core. Each core has one
processor unit (PU), with separate L1d (32KB) and L2 (256KB) caches.
They share a 30MB L3 cache. Main memory is 256GB. The computer
runs Linux 2.6.18-194, in 64-bit mode.
2A review of SAT solving in distributed memory architectures is out of the
scope of this paper.
We used two testing computers for technical reasons. Since we were not the
administrators of machine I, we did not have access to the tools we needed for
some of the experiments. On the other hand, Machine I had more than three
times the number of physical cores of Machine K, providing stronger results. In
any case, in those experiments where results could be compared (those which
measured relative time, for example), we made the effort to compare them, and
their behavior was (in relative terms) always the same, rendering the results
generalizable.
3 Cache performance without physical clause sharing
The plingeling program is a portfolio-based pSAT solver that has won the
parallel-track of both the 2010 SAT Race3 and the 2011 SAT Competition4.
Briefly, when plingeling is called, it launches several worker threads (operat-
ing system threads, such as POSIX threads) that differ in their random seeds,
some heuristic values and the intensity of some formula preprocessing meth-
ods. Each worker performs its individual search separately and, whenever one
of them finds a solution, it is reported and the other workers are interrupted.
Regarding information sharing, each of plingeling's workers maintains its own
clause database, and they only exchange the unit clauses they find during
search.
In this section, we report our experimentation with this state-of-the-art
solver and carefully analyze its cache performance and the effect of that
cache behaviour on the overall performance of the solver.
3http://baldur.iti.uka.de/sat-race-2010/
4http://satcompetition.org/
3.1 Modified plingeling
One of the advantages we assume of parallel computing is that the more cores
we add, the better performance we will obtain. This should be also true for
plingeling, since the only difference of adding more threads (assuming we have
one thread per core) is that we will have a greater variety of solver strategies
trying to solve the same problem, and also some logical clause sharing among
threads. These are all valid assumptions in theory, but our empirical results
show that increasing the number of threads also carries a considerable decrease
in performance for portfolio solvers like plingeling due to cache misses. In
what follows, we go into detail.
Multicore shared memory systems have their cores sharing the same last
level cache (LLC) memory. The last level cache in modern machines holds
a few megabytes and is usually not enough to hold all the data required by a SAT
instance. Therefore, there will inevitably be some communication between the
LLC and the main memory. The time cost of communication between the CPU
and the LLC is much lower than between the CPU and the main memory,
so we would like to keep data transfers from main memory to a bare minimum.
Portfolio SAT solvers that only share clauses logically have to keep a
complete database of clauses for each thread's use. So as we add more threads, the
solver needs more memory. But since usually all cores (in the same
chip) share the same LLC, all threads have a lower chance of finding their
data in the LLC as we add more threads. In this scenario, what we would
expect to observe, and do observe in our experiments, is a considerable decrease
in performance when adding threads, simply because we incur more LLC
misses when the amount of data manipulated by different threads
increases. This negative performance impact usually goes unnoticed in
this type of solver, because different threads implement different SAT solving
strategies, so the solving time mostly depends on the fastest solving thread,
masking the negative performance impact of copying the clause database in
each thread.
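The argument can be illustrated with a toy simulation of a shared cache. Everything in the sketch is an assumption made for illustration: the cache is fully associative with LRU replacement, each thread repeatedly sweeps its whole clause database in order, and threads interleave round-robin. The access pattern of a real solver is far less regular, so only the qualitative effect is meaningful. The function name and parameters are ours.

```python
from collections import OrderedDict


def llc_miss_rate(threads, db_lines, cache_lines, sweeps=20, shared=False):
    """Miss rate of one LRU cache shared by `threads` sweeping workers.

    With shared=False every thread touches its own private copy of the
    database; with shared=True all threads touch the same lines.
    """
    cache = OrderedDict()
    misses = accesses = 0
    for _ in range(sweeps):
        for line in range(db_lines):
            for t in range(threads):            # round-robin interleaving
                key = line if shared else (t, line)
                accesses += 1
                if key in cache:
                    cache.move_to_end(key)      # mark as most recently used
                else:
                    misses += 1
                    cache[key] = True
                    if len(cache) > cache_lines:
                        cache.popitem(last=False)   # evict least recently used
    return misses / accesses


for p in (1, 2, 4):
    print(p, llc_miss_rate(p, db_lines=100, cache_lines=256))
print("shared", llc_miss_rate(4, db_lines=100, cache_lines=256, shared=True))
```

In this model the miss rate stays low while the combined private copies fit in the cache and jumps once they do not, whereas a physically shared database keeps the working set independent of the number of threads.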
For experimentation purposes, we modified plingeling in such a way that
each thread did exactly the same search. For this, we initialized each thread
with the same random seed, heuristic values and clause database. Furthermore,
in order to keep them searching in the same way, we disabled clause sharing
between threads (which in plingeling corresponds to disabling the exchange
of found units) and ensured that cleanup policies and algorithms were the same.
In these experiments, we would expect, theoretically, that adding more threads
would have no impact on the solving time, because all cores would be
making exactly the same search with their own data. However, in practice, we found
that the performance decay of having ten threads spread over ten cores ranged
from about 21% (1.21 times slower) to 200% (3 times slower) of the total time
one thread would take (Figure 1). This behavior may be
due to several factors in modern SMP architectures. However, shared resources
(such as the caches, communication and/or synchronization, or main memory)
are the main suspects. To find out, we ran another experiment
where a plingeling instance performing the same search with four threads was
executed on different physical CPU chips (rather than cores), and, to compare,
we ran the same experiment on four cores of the same CPU chip.
[Figure 2 plots. Both panels: x-axis: # of threads (1 to 4); y-axis: Performance
decay (%), 0 to 50.]
(a) plingeling same chip
(b) plingeling different chip
Figure 2: Modified plingeling performance decay
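The decay percentages quoted above are consistent with reading performance decay as the extra time relative to the single-thread run, so that 21% corresponds to 1.21 times slower and 200% to 3 times slower. The text does not state the formula explicitly, so the definition below is our inference, and the function names are ours.

```python
def performance_decay(t_single, t_multi):
    """Extra running time, as a percentage of the single-thread time."""
    return 100.0 * (t_multi - t_single) / t_single


def slowdown_factor(decay_percent):
    """How many times slower a run with the given decay is."""
    return 1.0 + decay_percent / 100.0


print(performance_decay(100.0, 300.0), slowdown_factor(200.0))
```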
As can be seen in Figure 2b5, executing the solver on different CPU chips
does not impact performance, while executing it on the same CPU chip incurs
a significant performance decay. According to the results above, the only shared
resource that could impact performance when run on one chip is the LLC (Last
Level Cache). To effectively measure the involvement of the LLC, we used the
perf tool6. The perf tool is an abstraction over the hardware counters
of the different CPU chips, integrated in the Linux kernel, that gives access to
profiling information on retired instructions, branch mispredictions and, in
particular for our purposes, the percentage of last-level cache misses
(LLC-load-misses). As Figure 3 strongly suggests, the performance decay observable in
Figure 1 is due to several solving threads thrashing the cache and thus wasting
much more time retrieving their individual data from main memory than their
single-threaded counterparts.
3.2 plingeling scalability
In this section, we provide an overview of how plingeling behaves at a larger
scale. So far, thanks to the experiment above, we know that the more threads
we add, the more cache misses impact performance negatively.
However, we also know that adding threads also adds new (and possibly successful)
strategies. This results in a trade-off between cache contention and
portfolio-approach benefits. To find out where the trade-off equilibrium lies,
we ran the
original plingeling over 208 standard benchmarks taken from past SAT Races
and Competitions (see link above), varying the number of threads from one to
ten on a single chip with 10 physical cores. These 208 benchmarks are the newly
reported industrial/application benchmarks of the 2009 and 2011
SAT Competitions and the SAT Race 2010.
5This is Figure 1, showing only four cores.
6perf.wiki.kernel.org. The perf utility was the reason why we could not use
machine I: it was only added in kernel version 2.6.31.
[Figure 3 plot. x-axis: # of threads (1 to 6); y-axis: LLC miss %, 0 to 60.]
Figure 3: LLC statistics for modified plingeling
Figure 4 and Table 1 show that up until the fifth thread, scalability is good,
but from then on, the number of solved problems and the total time reach a
plateau. This means that plingeling cannot scale up with the number of cores
sharing an LLC. It is important to notice that executing the same ten-thread
solver on four different physical CPU chips solves more problems in less time,
while executing a 40-thread solver (10 threads per chip) behaves worse than
the four-threaded single-chip version. This effectively means that sharing cache
among threads has a negative impact on the overall behavior of modern
portfolio-approach-based parallel SAT solvers.
4 Cache Performance with physical clause sharing
4.1 AzuDICI
It is clear that the problem with cache misses is tightly related to the way
in which BCP is implemented [ZM04] and, therefore, to the way in which
clauses and watches are programmed. Since plingeling keeps a separate
clause database for each thread, it is possible that sharing data could improve
cache performance, because we would have a smaller amount of total data to
propagate with. Based on other parallel SAT solvers that share their clause
database, mainly SArTagnan and MiraXT, we decided to implement AzuDICI, a
basic CDCL SAT solver with the purpose of improving the BCP performance
in portfolio-based pSAT solvers. Three versions of AzuDICI were implemented:
one in which each thread keeps a separate clause database; another that,
as in MiraXT, shares all clauses physically; and a hybrid one that only shares
the binary implication lists.
Threads          # Problems solved    Total time (s)
1                113                  101399
2                121                  95745
3                119                  93854
4                122                  90412
5                124                  87953
6                124                  89506
7                127                  87416
8                124                  88434
9                124                  88931
10               125                  89003
4 in 4 CPUs      126                  88092
10 in 4 CPUs7    129                  85224
40 in 4 CPUs     123                  92387
Table 1: Scalability
4.1.1 The general structure of AzuDICI
AzuDICI8 is a standard CDCL solver based on plingeling, barcelogic and
MiraXT. In particular, AzuDICI implements binary implication lists for
propagation with binary clauses, and the two-watched literal scheme [ZM04] for
BCP with clauses of more than two literals. AzuDICI also implements the 1-UIP
algorithm [MSS99, MMZ+01] for conflict analysis, the lemma simplification
algorithm used in
PicoSAT, Luby restarts [LSZ93], a policy for lemma cleaning that keeps only
binary and ternary lemmas, and more than four-literal lemmas that have
participated in a conflict since the last cleanup. Finally, AzuDICI also
incorporates the EVSIDS heuristic for branching literal decisions [ES04b].
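As an illustration of one of these components, the Luby sequence behind Luby restarts (1, 1, 2, 1, 1, 2, 4, ...) can be computed as follows. How AzuDICI turns the sequence into restart intervals is not described here; multiplying each term by a fixed number of conflicts, as in the usage line, is a common choice and an assumption of ours, as is the value 32.

```python
def luby(i):
    """i-th term (1-based) of the Luby sequence: 1, 1, 2, 1, 1, 2, 4, ..."""
    k = 1
    while (1 << k) - 1 < i:       # smallest k with i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:         # i = 2^k - 1: the term is 2^(k-1)
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)   # otherwise repeat the earlier prefix


UNIT = 32  # hypothetical number of conflicts per Luby unit
print([UNIT * luby(i) for i in range(1, 8)])
```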
4.1.2 Shared-none
This version works like plingeling: it does not share any clause physically. Each
thread keeps its own independent database of clauses and propagates with it.
8You can find the latest implementation of AzuDICI at
https://github.com/leoferres/azu.
[Figure 4 plot. x-axis: # of threads (1 to 10, plus one point for ten threads
in 4 chips); y-axis: Total solving time (x10^3 seconds), 82 to 102. Point labels
give the number of problems solved: 113, 121, 119, 122, 124, 124, 127, 124, 124
and 125 for 1 to 10 threads, 129 in 4 chips, and 123 with 40 threads in 4 chips.]
Figure 4: Total solving time per number of threads.
Note that the fact that we are not sharing data physically does not mean threads
cannot share information. They could, for example, share unit clauses through
message passing between threads, just as plingeling does. We are interested
in measuring the impact of sharing data physically on cache performance, and
not the benefits for the search of sharing information itself.
4.1.3 Shared-bins
The Shared-bins version shares the binary implication lists. All threads have
access to the same physical data, and they can all read and modify this structure.
Figure 5 is a schematization of our binary implication list structure. We have an
array of binary lists, one for each literal. A binary list is basically two pointers,
one to the first binary node and another to the last binary node associated with
that list. A binary node is an array of literals that also has a pointer to another
binary node. The number of literals a binary node can hold depends on the
size of the cache line we are working with; it has as many literals as a
cache line can hold. The literals implied by the literal associated with a binary
list are the ones in the binary node referenced by that binary list's first
pointer and in the subsequently referenced binary nodes.
When a thread wants to add the clause {li, lj}, it must look for the binary
list associated with ¬li and go to the last node linked to that binary list. If
there is enough space in that node to add another literal, then it adds lj. If the
node is full, then it must create a new node with the lj literal, insert it at the
end of the linked list of nodes and update the binary list's last-node pointer. It
does the same for the binary list of ¬lj, to which it adds li.
[Figure 5 diagram: content not recoverable from the extracted text.]
Figure 5: Binary clause database
To ensure consistency of data when multiple threads are inserting, each
binary list has a lock. If a thread is inserting a new implied literal, it first
locks the binary list where it is inserting and then proceeds to insert. If
another thread wants to insert in the same binary list, it must wait until
the lock is released. Since adding binary clauses is not frequent, and two
threads inserting in the same binary list at the same time is even less frequent,
the contention
that these locks generate is unnoticeable in our experimental results.
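The structure and its insertion procedure can be sketched as below. This is an illustration of the description above, not AzuDICI's actual code: Python objects do not map to cache lines, so the node capacity only mimics the layout. The value of LITS_PER_NODE assumes a 64-byte line holding an 8-byte next pointer and 4-byte literals, and all class names are ours.

```python
import threading

LITS_PER_NODE = 14  # assumption: (64-byte line - 8-byte pointer) / 4-byte literal


class BinaryNode:
    def __init__(self):
        self.lits = []      # holds at most LITS_PER_NODE literals
        self.next = None    # next node of the same binary list


class BinaryList:
    def __init__(self):
        self.first = None
        self.last = None
        self.lock = threading.Lock()   # one lock per binary list

    def append(self, lit):
        with self.lock:                # writers to the same list are serialized
            if self.last is None or len(self.last.lits) == LITS_PER_NODE:
                node = BinaryNode()    # last node is missing or full
                if self.last is None:
                    self.first = node
                else:
                    self.last.next = node
                self.last = node
            self.last.lits.append(lit)

    def __iter__(self):
        node = self.first
        while node is not None:
            yield from list(node.lits)
            node = node.next


class SharedBinaryDatabase:
    def __init__(self, num_vars):
        # One binary list per literal; literals are signed integers.
        self.lists = {l: BinaryList()
                      for v in range(1, num_vars + 1) for l in (v, -v)}

    def add_clause(self, li, lj):
        """Add li v lj: lj goes to the list of not-li, li to the list of not-lj."""
        self.lists[-li].append(lj)
        self.lists[-lj].append(li)
```

Readers traverse a list without taking the lock in this sketch, which mirrors the fact that only insertion is described as locked.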
4.1.4 Shared-all
In the shared-bins solver we had no need to modify the usual two watched literal
scheme used in propagation. This is not the case for this version, which physically
shares n-ary clauses. To implement the two watched literal scheme,
it is necessary to keep track of which literals are being watched in each n-ary clause.
For instance, a typical implementation in a sequential solver
watches the first two literals of the clause, typically moving literals within the
clause when the watches change. In portfolio-based pSAT solvers that
physically share the clauses, such changes to the clause are not feasible, since
several threads may be accessing the same clause and each thread of
the portfolio may be watching different literals of it. Instead, we
have used an approach similar to that of MiraXT, where each thread keeps
track of the literals it watches in each clause. Figure 6 shows schematically
how each worker thread relates to the n-ary clause database. Each SAT
solver thread has a vector of pointers to thread clauses called watches, and each
literal present in the SAT problem has an associated position in this watches
vector. A thread clause has two watched literals (WL0 and WL1), two pointers
to other thread clauses (NW0 and NW1) and a pointer to an actual n-ary
clause in the n-ary clause database. WL0 and WL1 keep track of the literals being
watched by the thread for a given n-ary clause. NW0 and NW1 point to the
next thread clauses in which WL0 and WL1, respectively, are also being watched.
The n-ary clause also has a flag for each worker thread that identifies which
threads are using that clause for propagation.

Figure 6: The thread clause database and n-ary clause database

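
The relation between thread clauses and shared n-ary clauses can be illustrated with a small sketch. It is a simplification under stated assumptions rather than the AzuDICI data structure itself: the names `ThreadClause`, `Worker`, `attach` and `watchers` are hypothetical, the watches vector is modeled as a Python dictionary, shared clauses are immutable tuples, and new thread clauses are pushed at the head of each watch list.

```python
class ThreadClause:
    """Per-thread view of one shared n-ary clause."""

    def __init__(self, clause, wl0, wl1):
        self.clause = clause    # pointer to the shared, never-modified clause
        self.wl = [wl0, wl1]    # WL0 and WL1: literals this thread watches
        self.nw = [None, None]  # NW0 and NW1: next thread clause watching WL0 / WL1


class Worker:
    """One solver thread with its private watches structure."""

    def __init__(self):
        self.watches = {}  # literal -> first ThreadClause in which it is watched

    def attach(self, clause, i=0, j=1):
        """Start watching clause[i] and clause[j] without touching the clause."""
        tc = ThreadClause(clause, clause[i], clause[j])
        for slot in (0, 1):
            lit = tc.wl[slot]
            tc.nw[slot] = self.watches.get(lit)  # link to the previous head
            self.watches[lit] = tc
        return tc

    def watchers(self, lit):
        """Walk the chain of thread clauses in which lit is watched."""
        out, tc = [], self.watches.get(lit)
        while tc is not None:
            out.append(tc)
            slot = 0 if tc.wl[0] == lit else 1
            tc = tc.nw[slot]
        return out
```

Two workers can attach the same tuple with different watched positions, for instance `a.attach(c)` and `b.attach(c, 1, 2)`. Each worker then sees the clause in its own watch lists while the shared clause itself is never reordered.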
To insert a new n-ary clause, we first check whether the clause already
exists in the database. If it does not, we create the n-ary clause, set the flag
of the current thread to true and add the clause to the database. If it does
exist, we just set the corresponding thread flag of the n-ary clause to true. The
“insert” procedure is locked so that two different threads cannot insert at the
same time. In our experiments we have not noticed any considerable overhead
caused by this lock. In fact, profiling shows that the time spent in the “insert”
function is negligible relative to the total running time of the program, with or
without locks.
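
A minimal sketch of this insert procedure is given below. The text does not say how the existence check is performed, so the hash lookup on the set of literals is an assumption of the example, as are the names `SharedClause`, `NaryClauseDB` and `used_by`.

```python
import threading


class SharedClause:
    """An n-ary clause stored once, with one usage flag per worker thread."""

    def __init__(self, literals, num_threads):
        self.literals = tuple(literals)
        self.used_by = [False] * num_threads


class NaryClauseDB:
    def __init__(self, num_threads):
        self.num_threads = num_threads
        self._index = {}               # assumed lookup: frozenset of literals -> clause
        self._lock = threading.Lock()  # a single lock around the whole insert

    def insert(self, literals, thread_id):
        key = frozenset(literals)
        with self._lock:
            clause = self._index.get(key)
            if clause is None:
                # The clause is new: create it and add it to the database.
                clause = SharedClause(literals, self.num_threads)
                self._index[key] = clause
            # In both cases, mark the clause as used by the inserting thread.
            clause.used_by[thread_id] = True
            return clause

    def __len__(self):
        return len(self._index)
```

If thread 0 inserts `[1, -2, 3]` and thread 2 later inserts `[3, 1, -2]`, both calls return the same `SharedClause` object and only its flags differ from the first insertion.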
To find out whether sharing the clause database was beneficial to the same-
chip portfolio-based solver, we ran AzuDICI in three different versions (sharing
all the clauses, sharing none of the clauses and sharing only binary clauses) on
eight problems using one to six threads. Each run was repeated five times to
mitigate potential system noise.
In the following subsections we present the results of two experiments
measuring cache misses. In Section 4.2 we ran the three versions of AzuDICI
with each thread executing the same search, which resembles the experiments
done using the modified plingeling (see Section 3.1). In Section 4.3, the three
versions of AzuDICI were used to test the canonical operation of SAT solvers,
where each thread executes a different search.

Dataset               Version      T2    T3    T4    T5    T6
manol-pipe-c10b       Shared-None  1.16  1.33  1.46  1.58  1.66
                      Shared-Bin   1.14  1.28  1.40  1.50  1.58
                      Shared-All   1.11  1.24  1.33  1.44  1.53
manol-pipe-c6bid_i    Shared-None  1.17  1.32  1.43  1.54  1.61
                      Shared-Bin   1.15  1.29  1.39  1.48  1.55
                      Shared-All   1.12  1.23  1.33  1.41  1.53
manol-pipe-c6nidw_i   Shared-None  1.18  1.33  1.44  1.54  1.61
                      Shared-Bin   1.16  1.30  1.39  1.48  1.55
                      Shared-All   1.12  1.22  1.32  1.41  1.52
manol-pipe-c7idw      Shared-None  1.19  1.33  1.42  1.50  1.56
                      Shared-Bin   1.15  1.27  1.36  1.41  1.48
                      Shared-All   1.15  1.24  1.31  1.38  1.47
manol-pipe-cha05-113  Shared-None  1.14  1.28  1.41  1.53  1.61
                      Shared-Bin   1.12  1.24  1.35  1.45  1.53
                      Shared-All   1.11  1.22  1.29  1.38  1.47
anbul-dated-5-15-u    Shared-None  1.26  1.49  1.67  1.81  1.91
                      Shared-Bin   1.24  1.45  1.60  1.73  1.81
                      Shared-All   1.15  1.39  1.50  1.64  1.74
ibm-2002-31_1r3-k30   Shared-None  1.10  1.22  1.29  1.37  1.42
                      Shared-Bin   1.10  1.21  1.28  1.35  1.39
                      Shared-All   1.09  1.16  1.23  1.30  1.34
post-c32s-gcdm16-22   Shared-None  1.09  1.17  1.23  1.30  1.33
                      Shared-Bin   1.08  1.17  1.22  1.28  1.30
                      Shared-All   1.06  1.15  1.20  1.26  1.29

Table 2: Performance decay, given as running time relative to the T1 running
time (T1 = 1), for each AzuDICI version (Shared-None, Shared-Bin and
Shared-All) and number of threads (T).

4.2 Same search experiments
For this experiment, AzuDICI was modified to carry out the same search in each
thread (i.e., there is no lemma sharing among threads). For each AzuDICI
version, we measured the time needed to solve each benchmark and the percentage
of LLC misses. The results are shown in Tables 2 and 3 and in Figures 7 and 8.
Notice that the datasets we chose for this experiment were influenced by the
early state of development of AzuDICI. Our solver is not optimized, and has
been implemented for the purposes of experimentation. Thus, the datasets are
generally “easier” so that AzuDICI can solve them. Conversely, we do not run
plingeling on the benchmarks we used for AzuDICI, because plingeling solves them
so fast that scalability cannot be reliably measured. In other words, we have
divided the whole dataset of problems into “easy” and “hard” problems. We
have operationalized “easy” problems as those that AzuDICI can solve in
five to fifteen minutes, while plingeling takes less than one minute.
“Hard” problems, in turn, are those that plingeling takes between five and
fifteen minutes to solve. We use hard problems in AzuDICI for the different search
experiments with a timeout of 15 minutes, and we do not use easy problems in
plingeling since, for the latter, results would be tainted by system noise.
Besides, given the nature of CDCL solvers, the size of the clause database
increases with execution time, and it is with this growth that cache contention
manifests itself most evidently.
Table 2 and Figure 7 show performance decay relative to the one-thread (T1=1)
setting. Although performance worsens as we add more threads, shared-all
performs consistently better than shared-binary, which in turn performs better
(perhaps less noticeably) than shared-none. This is due to the different levels
of non-replication and physical sharing of the clause database.

Figure 7: Average running time for each AzuDICI version and number of threads
(x-axis: # of threads; y-axis: performance decay (%); series: same-search
averages for AzuDICI-Shared-All, AzuDICI-Shared-None and AzuDICI-Shared-Binaries)

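
As a worked example of how the relative running times of Table 2 translate into a decay percentage, the following sketch averages the six-thread (T6) column over the eight datasets for each version. It assumes that decay is computed as (relative time − 1) × 100 and that averaging the per-dataset ratios is a fair summary; the paper does not state the exact procedure used for Figure 7.

```python
# T6 column of Table 2, one value per dataset, in table order.
T6_RELATIVE_TIME = {
    "Shared-None": [1.66, 1.61, 1.61, 1.56, 1.61, 1.91, 1.42, 1.33],
    "Shared-Bin":  [1.58, 1.55, 1.55, 1.48, 1.53, 1.81, 1.39, 1.30],
    "Shared-All":  [1.53, 1.53, 1.52, 1.47, 1.47, 1.74, 1.34, 1.29],
}


def mean_decay_percent(relative_times):
    """Average slowdown over T1, in percent (assumed definition of decay)."""
    mean = sum(relative_times) / len(relative_times)
    return (mean - 1.0) * 100.0


decay = {version: mean_decay_percent(times)
         for version, times in T6_RELATIVE_TIME.items()}

for version, value in decay.items():
    print(f"{version}: about {value:.0f}% slower than T1 at six threads")
```

Under these assumptions the average decay at six threads is roughly 59% for shared-none, 52% for shared-binary and 49% for shared-all, which matches the ordering described above.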
In aggregate, comparing the performance of AzuDICI as shown in Figures
7 and 8, it is evident that the solver performs best in the shared-all setting,
followed by the shared-binary and finally by the shared-none setting, in both
running time and cache misses. Thus, our implementation shows that physically
sharing the clause database helps to reduce cache contention in the rather
artificial case of same search.
4.3 Different search experiments
Although the previous experiments suggest that physically sharing the clause
database between threads may improve the cache performance of the
solvers, the results do not necessarily generalize to a full-featured SAT solver.
Threads carrying out the same search are more likely to access the same data,
whereas threads carrying out different searches are not necessarily accessing
the same data at the same time.
To find out how real portfolio-based SAT solvers implementing different
levels of physical clause sharing behave, we ran AzuDICI in the same three
versions as before (shared-all, shared-none and shared-binary) on the eight
problems introduced in Section 3.1 using one to six threads and a 5-minute
timeout (we therefore did not include a running time table and graph). The
timeout was used because different searches among threads
may result in different (potentially better) strategies, affecting execution time
and rendering search behavior effectively incomparable. Each run was repeated
five times to mitigate potential system noise.

Figure 8: Average cache misses for each AzuDICI version and number of threads
(x-axis: # of threads; y-axis: LLC miss %; series: same-search averages for
AzuDICI-Shared-All, AzuDICI-Shared-None and AzuDICI-Shared-Binaries)

There are a few things to notice about Table 4. First, there are overall fewer
cache misses in the shared-all setting, with the exception of file grid-...-3.035-NOTKNOWN.
This file is particular in that there is almost no cache contention. This may
be because binary clauses account for practically all of its clauses (98%).
Due to the special data structures used for propagation with binary
clauses, the propagation computation does not incur noticeable cache penalties.
It is also interesting to notice that the difference between shared-binary
and shared-all when compared to shared-none (see Figure 9) is not as large as
in the previous section (see Figure 8, and Table 3 for details).
From these data, we may conclude that physically sharing the whole clause
database does not seem to significantly improve the cache performance of the
portfolio-based pSAT solvers.
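
The narrowing gap between shared-none and shared-all can be checked with simple arithmetic on the six-thread (T6) columns of Tables 3 and 4. The sketch below is only an illustration: averaging the per-dataset LLC miss percentages is an assumption of the example and is not necessarily how the averages in Figures 8 and 9 were obtained.

```python
# T6 LLC miss percentages, one value per dataset, in table order.
SAME_SEARCH_T6 = {  # Table 3
    "Shared-None": [48.8, 54.7, 56.3, 52.6, 44.4, 63.2, 57.8, 53.0],
    "Shared-All":  [29.2, 43.3, 43.7, 36.4, 24.2, 49.2, 42.5, 39.5],
}
DIFFERENT_SEARCH_T6 = {  # Table 4
    "Shared-None": [34.1, 62.6, 1.0, 59.5, 54.0, 48.2, 66.0, 47.7, 59.8, 49.8],
    "Shared-All":  [29.0, 60.4, 1.4, 56.0, 50.6, 45.7, 63.6, 42.1, 53.8, 55.2],
}


def mean(values):
    return sum(values) / len(values)


def gap(table):
    """Mean LLC miss % of shared-none minus that of shared-all, in points."""
    return mean(table["Shared-None"]) - mean(table["Shared-All"])


same_gap = gap(SAME_SEARCH_T6)
different_gap = gap(DIFFERENT_SEARCH_T6)
print(f"same search gap: {same_gap:.1f} points")
print(f"different search gap: {different_gap:.1f} points")
```

With these numbers, shared-all saves about 15 percentage points of LLC misses over shared-none in the same search setting, but only about 2.5 points when the threads carry out different searches.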
5 Conclusions and Future Work
We showed that the impact of threads accessing the shared caches is very
significant and that it negatively affects the scalability of portfolio-based pSAT
solvers. We believe this is an important topic for further advancing knowledge
on parallel SAT solvers.
When it comes to whether physically sharing the clause database among
threads is advantageous, we conclude that it is not yet clear whether sharing
the clause database in the way we have proposed has a significant effect on
the running time of the solver. On the one hand, physically
sharing data should, in general, be beneficial for parallel programs (if for no
other reason, just to save space). On the other hand, implementing solvers that
physically share the clause database is non-trivial, and prone to increasing
complexity and running time. Interestingly, the difficulties are not related, as is
usual with these kinds of systems, to synchronization among the threads, for
instance, but rather stem from the BCP mechanism. This is because each thread
must keep track of its own watches (which, as far as is known, cannot be shared
in a portfolio approach).

Dataset               Version      T1         T2         T3         T4         T5         T6
manol-pipe-c10b       Shared-None  2.6(2.4)   15.7(0.6)  27.2(0.1)  36.4(0.2)  43.2(0.1)  48.8(0.0)
                      Shared-Bin   3.5(5.0)   13.5(2.9)  21.0(3.6)  27.5(2.9)  31.2(1.7)  38.6(0.8)
                      Shared-All   3.4(4.8)   9.6(8.2)   15.4(5.1)  17.7(5.3)  21.2(4.4)  29.2(2.5)
manol-pipe-c6bid_i    Shared-None  11.8(0.3)  27.2(0.1)  37.5(0.1)  44.8(0.1)  50.3(0.0)  54.7(0.0)
                      Shared-Bin   12.5(0.2)  24.0(2.6)  32.6(2.0)  38.6(1.2)  43.3(2.7)  49.0(0.5)
                      Shared-All   12.1(0.4)  18.0(7.1)  21.1(0.1)  25.7(3.2)  28.3(1.0)  43.3(1.3)
manol-pipe-c6nidw_i   Shared-None  13.2(0.2)  29.0(0.1)  39.3(0.1)  46.5(0.0)  52.1(0.1)  56.3(0.0)
                      Shared-Bin   14.0(0.6)  26.2(2.3)  35.4(1.4)  41.2(2.4)  44.1(0.6)  50.3(0.5)
                      Shared-All   13.5(0.3)  19.5(6.4)  22.6(0.1)  26.4(0.1)  29.5(0.1)  43.7(1.8)
manol-pipe-c7idw      Shared-None  13.7(0.6)  30.1(0.1)  38.7(0.1)  44.6(0.1)  49.0(0.0)  52.6(0.1)
                      Shared-Bin   14.8(0.4)  24.6(3.7)  30.8(3.3)  35.8(2.5)  34.4(1.6)  42.0(2.0)
                      Shared-All   15.3(0.4)  22.4(6.2)  22.9(0.1)  25.2(0.1)  27.2(0.1)  36.4(4.5)
manol-pipe-cha05-113  Shared-None  1.8(2.3)   13.2(1.1)  23.2(0.6)  31.8(0.3)  38.6(0.1)  44.4(0.0)
                      Shared-Bin   2.4(4.2)   10.6(1.4)  17.4(1.8)  23.7(1.0)  27.1(1.2)  34.6(0.5)
                      Shared-All   2.5(6.0)   8.5(0.8)   13.0(8.2)  15.3(5.9)  16.8(2.4)  24.2(2.8)
anbul-dated-5-15-u    Shared-None  8.6(0.6)   28.4(0.2)  42.9(0.1)  52.3(0.1)  58.9(0.0)  63.2(0.0)
                      Shared-Bin   8.9(0.6)   27.7(0.1)  41.1(0.3)  50.2(0.1)  55.8(0.2)  61.0(0.1)
                      Shared-All   6.5(9.1)   15.0(9.1)  28.9(7.9)  33.9(3.8)  37.9(2.0)  49.2(1.7)
ibm-2002-31_1r3-k30   Shared-None  13.2(1.2)  28.7(0.2)  39.9(0.0)  48.0(0.0)  53.7(0.1)  57.8(0.0)
                      Shared-Bin   13.0(0.3)  28.2(0.2)  39.0(0.1)  47.1(0.0)  52.4(0.3)  56.3(0.2)
                      Shared-All   11.8(2.5)  23.3(0.8)  25.1(4.1)  31.7(4.8)  31.9(3.1)  42.5(2.1)
post-c32s-gcdm16-22   Shared-None  13.8(0.7)  27.4(0.2)  37.1(0.0)  43.9(0.1)  49.0(0.1)  53.0(0.0)
                      Shared-Bin   13.8(0.1)  25.4(2.7)  34.4(0.8)  41.2(0.3)  44.0(1.0)  47.6(0.9)
                      Shared-All   13.2(0.3)  19.8(9.0)  27.3(3.2)  31.0(1.8)  30.7(3.6)  39.5(1.5)

Table 3: LLC misses (%) in the same search experiments for each AzuDICI
version (Shared-None, Shared-Bin and Shared-All) and number of threads (T).
Numbers in parentheses are standard deviations in %.

The relevance of cache-efficient algorithms to the performance of CDCL
SAT-solvers has been known since at least 2003, with Zhang and Malik’s paper
[ZM04]. In this work we set out to update and measure this influence in
portfolio-based parallel CDCL SAT-solvers. The design of the AzuDICI shared-all
version is quite similar to that of the (pa)MiraXT solver [SLB09]. To the best
of our knowledge, the first portfolio-based pSAT solver that physically shares
the clause database is SArTagnan [KK11]. An in-depth survey of parallel CDCL
SAT-solvers can be found in [MML12].
As future work we plan to continue the development of AzuDICI,
incorporating further enhancements such as formula preprocessing and variable
elimination, among others, so as to make it a competitive parallel SAT solver.
Regarding cache performance, we also plan to implement our solver with
compact data structures so that more information can fit in the cache.

Dataset                              Version      T1         T2         T3         T4         T5         T6
aaai10-...-step17                    Shared-None  3.0(5.2)   7.5(0.6)   14.0(0.3)  21.5(0.3)  27.3(0.2)  34.1(0.1)
                                     Shared-Bin   2.9(1.4)   6.4(3.8)   13.9(2.7)  17.8(3.0)  25.5(1.1)  31.2(1.5)
                                     Shared-All   2.8(0.9)   5.8(2.0)   12.3(1.8)  16.5(1.4)  22.6(0.2)  29.0(1.3)
E02F22                               Shared-None  31.8(0.4)  44.0(0.1)  48.5(0.0)  53.8(0.0)  58.0(0.0)  62.6(0.1)
                                     Shared-Bin   26.8(0.2)  40.9(5.1)  47.5(3.4)  54.7(1.5)  58.9(2.3)  62.3(1.9)
                                     Shared-All   30.9(0.1)  40.2(2.7)  48.7(3.1)  52.8(1.9)  57.3(1.6)  60.4(2.2)
grid-...-3.035-NOTKNOWN              Shared-None  0.2(3.5)   0.5(1.4)   0.5(0.9)   0.7(2.1)   0.8(2.9)   1.0(7.0)
                                     Shared-Bin   0.2(8.5)   0.7(4.3)   0.9(1.4)   1.0(3.9)   1.1(3.4)   1.4(2.2)
                                     Shared-All   0.3(1.1)   0.3(1.4)   0.6(4.2)   0.9(4.4)   1.1(4.0)   1.4(7.7)
hwmcc10-...-k45-pdtvissoap1-tseitin  Shared-None  22.0(0.4)  33.6(0.1)  44.2(0.1)  50.6(0.1)  54.9(0.0)  59.5(0.0)
                                     Shared-Bin   25.3(0.6)  34.5(0.4)  42.1(0.5)  48.3(0.2)  53.3(0.1)  56.8(0.3)
                                     Shared-All   25.1(0.1)  34.1(0.6)  41.8(0.3)  48.3(0.3)  52.6(0.2)  56.0(0.3)
hwmcc10-...-k50-pdtpmsns2-tseitin    Shared-None  12.8(0.3)  27.0(0.0)  35.9(0.1)  44.8(0.0)  49.3(0.1)  54.0(0.0)
                                     Shared-Bin   14.0(0.6)  26.0(0.9)  35.8(0.7)  40.9(1.3)  46.7(0.2)  49.8(0.5)
                                     Shared-All   15.0(0.3)  26.8(1.3)  36.2(0.3)  41.9(0.1)  46.6(0.4)  50.6(0.3)
md5_48_3                             Shared-None  12.1(0.5)  23.3(0.1)  31.9(0.2)  39.1(0.1)  43.6(0.1)  48.2(0.0)
                                     Shared-Bin   13.4(0.3)  23.0(2.8)  32.0(0.2)  37.0(0.2)  41.8(2.1)  46.1(0.1)
                                     Shared-All   14.3(0.2)  23.5(0.3)  32.2(0.3)  37.0(0.1)  41.8(0.2)  45.7(0.2)
q_query_3_L150_coli.sat              Shared-None  23.1(0.8)  39.0(0.2)  50.2(0.0)  58.2(0.1)  62.1(0.0)  66.0(0.0)
                                     Shared-Bin   35.8(0.1)  43.0(1.2)  52.8(1.5)  57.9(0.3)  62.3(0.8)  66.8(0.3)
                                     Shared-All   34.5(0.3)  36.6(2.8)  50.4(1.3)  56.0(0.5)  61.2(0.7)  63.6(0.5)
slp-synthesis-aes-top30              Shared-None  11.5(0.7)  20.4(0.2)  29.3(0.1)  36.8(0.1)  42.3(0.1)  47.7(0.1)
                                     Shared-Bin   12.2(0.2)  17.8(0.8)  26.2(0.5)  32.1(0.2)  37.8(0.4)  42.9(0.3)
                                     Shared-All   13.2(0.2)  18.3(0.2)  26.3(0.2)  31.7(0.4)  37.4(0.3)  42.1(0.2)
traffic_b_unsat                      Shared-None  4.4(3.4)   17.4(0.7)  31.9(0.3)  44.3(0.1)  52.1(0.0)  59.8(0.0)
                                     Shared-Bin   4.3(1.8)   16.7(1.2)  31.4(0.2)  41.7(0.2)  51.4(0.1)  58.7(0.0)
                                     Shared-All   4.0(3.5)   14.6(0.3)  28.4(0.5)  37.5(0.4)  46.3(0.2)  53.8(0.1)
UCG-15-10p1                          Shared-None  21.4(0.6)  31.8(0.1)  40.6(0.1)  43.6(0.1)  46.9(0.1)  49.8(0.1)
                                     Shared-Bin   20.7(0.7)  31.7(3.0)  40.9(2.4)  46.2(2.0)  53.5(0.8)  55.5(0.6)
                                     Shared-All   22.6(0.6)  34.8(1.8)  42.8(2.8)  49.6(2.1)  51.6(1.7)  55.2(0.5)

Table 4: LLC misses (%) in the different search experiments for each AzuDICI
version (Shared-None, Shared-Bin and Shared-All) and number of threads (T).
Numbers in parentheses are standard deviations in %.

Figure 9: Average cache misses for each AzuDICI version and number of threads
(x-axis: # of threads; y-axis: LLC miss (%); series: different-search averages
for AzuDICI-Shared-All, AzuDICI-Shared-None and AzuDICI-Shared-Binaries)

Acknowledgments
We would like to thank several people and institutions for their help in producing
this paper, either in time, resources or both. First, we would like to thank
the Barcelogic group for sharing the code of a past Barcelogic SAT-solver and
for their support while writing AzuDICI. Likewise, we would like to thank Armin
Biere for open-sourcing his plingeling code, and Enric Rodriguez-Carbonell of the
Technical University of Catalunya for (conscientiously!) proofreading an earlier
draft of this paper. Finally, we would like to thank Paul Steinberg at Intel,
and the staff at the Intel ManyCore Testing Lab, where we carried out the
experiments. Roberto Asín acknowledges the partial support from Fondecyt
through Project No. 11121220.
References
[AS09] Gilles Audemard and Laurent Simon. Predicting learnt clauses
quality in modern sat solvers. In Craig Boutilier, editor, IJCAI,
pages 399–404, 2009.
[Bie08] Armin Biere. Picosat essentials. JSAT, 4(2-4):75–97, 2008.
[Bie10] Armin Biere. Lingeling, Plingeling, PicoSAT and PrecoSAT at SAT
Race 2010. Technical report, Institute for Formal Models and Ver-
ification, Johannes Kepler University, 2010.
[Bie11] Armin Biere. Lingeling and friends at the SAT Competition 2011.
Technical Report 11-1, FMV Reports Series, Institute for Formal
Models and Verification, Johannes Kepler University, Linz, Austria,
March 2011.
[BS97] Roberto J. Jr. Bayardo and Robert C. Schrag. Using CSP look-
back techniques to solve real-world SAT instances. In Proceed-
ings of the Fourteenth National Conference on Artificial Intelligence
(AAAI’97), pages 203–208, Providence, Rhode Island, 1997.
[DLL62] M. Davis, G. Logemann, and D. Loveland. A Machine Program for
Theorem-Proving. Communications of the ACM, CACM, 5(7):394–
397, 1962.
[ES04a] N. Eén and N. Sörensson. An Extensible SAT-solver. In
E. Giunchiglia and A. Tacchella, editors, 6th International Confer-
ence on Theory and Applications of Satisfiability Testing, SAT’03,
volume 2919 of Lecture Notes in Computer Science, pages 502–518.
Springer, 2004.
[ES04b] N. Eén and N. Sörensson. An Extensible SAT-solver. In
E. Giunchiglia and A. Tacchella, editors, 6th International Confer-
ence on Theory and Applications of Satisfiability Testing, SAT’03,
volume 2919 of Lecture Notes in Computer Science, pages 502–518.
Springer, 2004.
[GN02] E. Goldberg and Y. Novikov. BerkMin: A fast and robust SAT-
solver. In Design, Automation, and Test in Europe (DATE ’02),
pages 142–149, 2002.
[HJS09] Youssef Hamadi, Saïd Jabbour, and Lakhdar Sais. Manysat: a
parallel sat solver. JSAT, 6(4):245–262, 2009.
[JS97] Roberto J. Bayardo Jr. and Robert Schrag. Using csp look-back
techniques to solve real-world sat instances. In AAAI/IAAI, pages
203–208, 1997.
[KK11] Stephan Kottler and Michael Kaufmann. Sartagnan - a parallel
portfolio sat solver with lockless physical clause sharing. In Prag-
matics of SAT, 2011.
[LSZ93] Michael Luby, Alistair Sinclair, and David Zuckerman. Optimal
speedup of las vegas algorithms. Inf. Process. Lett., 47(4):173–180,
1993.
[MML12] Ruben Martins, Vasco M. Manquinho, and Inês Lynce. Parallel
search for maximum satisfiability. AI Commun., 25(2):75–95, 2012.
[MMZ+01] M. W. Moskewicz, C. F. Madigan, Y. Zhao, L. Zhang, and S. Ma-
lik. Chaff: Engineering an Efficient SAT Solver. In 38th Design Au-
tomation Conference, DAC’01, pages 530–535. ACM Press, 2001.
[MSS99] Joao Marques-Silva and Karem A. Sakallah. Grasp: A search
algorithm for propositional satisfiability. IEEE Trans. Comput.,
48(5):506–521, 1999.
[NOT06] R. Nieuwenhuis, A. Oliveras, and C. Tinelli. Solving SAT and SAT
Modulo Theories: From an abstract Davis–Putnam–Logemann–
Loveland procedure to DPLL(T). Journal of the ACM, JACM,
53(6):937–977, 2006.
[PD07] Knot Pipatsrisawat and Adnan Darwiche. Rsat 2.0: Sat solver de-
scription. Technical Report D–153, Automated Reasoning Group,
Computer Science Department, UCLA, 2007.
[Rou11] Olivier Roussel. Description of ppfolio. Technical report, CRIL,
Centre de Recherche en Informatique de Lens, 2011.
[SLB09] Tobias Schubert, Matthew D. T. Lewis, and Bernd Becker. Pami-
raxt: Parallel sat solving with threads and message passing. JSAT,
6(4):203–222, 2009.
[SNC09] Mate Soos, Karsten Nohl, and Claude Castelluccia. Extending sat
solvers to cryptographic problems. In Oliver Kullmann, editor,
SAT, volume 5584 of Lecture Notes in Computer Science, pages
244–257. Springer, 2009.
[Sut05] Herb Sutter. The free lunch is over: A fundamental turn toward
concurrency in software. Dr. Dobbs Journal, 30(3):202–210, 2005.
[Tse68] G. S. Tseitin. On the Complexity of Derivation in the Propositional
Calculus. Zapiski nauchnykh seminarov LOMI, 8:234–259, 1968.
[ZBH96] Hantao Zhang, Maria Paola Bonacina, and Jieh Hsiang. Psato: a
distributed propositional prover and its application to quasigroup
problems. J. Symb. Comput., 21(4):543–560, 1996.
[ZM04] L. Zhang and S. Malik. Cache Performance of SAT Solvers: A Case
Study for Efficient Implementation of Algorithms. In E. Giunchiglia
and A. Tacchella, editors, 6th International Conference on Theory
and Applications of Satisfiability Testing, SAT’03, volume 2919 of
Lecture Notes in Computer Science, pages 287–298. Springer, 2004.
[ZMMM01] L. Zhang, C. F. Madigan, M. W. Moskewicz, and S. Malik. Effi-
cient conflict driven learning in a Boolean satisfiability solver. In
Int. Conf. on Computer-Aided Design (ICCAD’01), pages 279–285,
2001.
[ZS96] H. Zhang and M. E. Stickel. An efficient algorithm for unit prop-
agation. In Proceedings of the Fourth International Symposium on
Artificial Intelligence and Mathematics (AI-MATH’96), Fort Laud-
erdale (Florida USA), 1996.
