Abstract-This paper reports on an innovative approach for solving satisfiability problems for propositional formulas in conjunctive normal form (SAT) by creating a logic circuit that is specialized to solve each problem instance on field programmable gate arrays (FPGAs). This approach has become feasible due to recent advances in reconfigurable computing and has opened up an exciting new research field in algorithm design. SAT is an important subclass of constraint satisfaction problems, which can formalize a wide range of application problems.
I. INTRODUCTION

Recently, due to advances in field programmable gate array (FPGA) technologies [4], users can create original logic circuits and electronically reconfigure them. Furthermore, users are able to describe their designs in a hardware description language (HDL) and obtain logic circuits by using existing high-level logic synthesis technologies [5], [6]. These recent hardware technologies enable users to rapidly create logic circuits specialized to solve each problem instance. We call this problem solving approach the Reconfigurable Computing approach.
A constraint satisfaction problem (CSP) is a general framework that can formalize various application problems, and many theoretical and experimental studies have been performed on these problems [7]. In particular, a satisfiability problem for propositional formulas in conjunctive normal form (SAT) is an important subclass of CSP. This problem was the first computational task shown to be NP-complete [8].
In this paper, we report on an innovative approach for solving SAT using the reconfigurable computing approach to create a logic circuit that is specialized to solve each problem instance on FPGAs. After the authors presented an initial report on this approach [1], various research efforts following this line have been carried out [9]-[15]. Consequently, solving SAT using FPGAs has become a vital research area.
Manuscript received February 28, 2000; revised July 15, 2000. This paper is an extended version of the authors' previous conference papers [1]-[3].
T. Suyama is with Research and Development Center, NTT West, Tokyo, Japan (e-mail: t.suyama@rdc.west.ntt.co.jp).
M. Yokoo and H. Sawada are with NTT Communication Science Laboratories, Kyoto, Japan (e-mail: yokoo@cslab.kecl.ntt.co.jp; sawada@cslab.kecl.ntt.co.jp).
A. Nagoya is with NTT Network Innovation Laboratories, Kanagawa, Japan (e-mail: nagoya@exa.onlab.ntt.co.jp).
Publisher Item Identifier S 1063-8210(01)00713-2.
In the remainder of the paper, we briefly describe the problem definition in Section II and the reconfigurable computing approach in Section III. Then, we present in detail the developed algorithms, which are suitable for implementation on a logic circuit. These algorithms fall into two groups: algorithms using static variable ordering (Section IV) and algorithms using dynamic variable ordering (Section V). We show how these algorithms can be implemented on FPGAs in Section VI. Next, we present evaluation results obtained with software simulation and an actual implementation in Section VII. We discuss related work in Section VIII. Finally, we conclude in Section IX.
II. PROBLEM DEFINITION
A satisfiability problem for propositional formulas in conjunctive normal form (SAT) can be defined as follows. A Boolean variable $x_i$ is a variable equal to either true or false (represented as 1 or 0, respectively). The value assignment of one variable is called a literal. A clause is a disjunction of literals, e.g., $(x_1 \lor \overline{x_2} \lor x_3)$. Given a set of clauses $C_1, C_2, \ldots, C_m$ and variables $x_1, x_2, \ldots, x_n$, the satisfiability problem is to determine if the formula $C_1 \land C_2 \land \cdots \land C_m$ is satisfiable, i.e., to determine whether there exists an assignment of values to the variables such that the above formula is true.
In this paper, if the formula is satisfiable, we assume that we need to find all or a fixed number of solutions, i.e., the combinations of variable values that satisfy the formula. Most of the existing algorithms for solving SAT problems aim to find only one solution. Although this setting corresponds to the original problem definition, some application problems, such as visual interpretation tasks [16] and diagnosis tasks [17] , require finding all or multiple solutions. Furthermore, since finding all or multiple solutions is usually much more difficult than finding only a single solution, solving the problem by special-purpose hardware is a productive approach. Therefore, in this paper we set our goal to finding all or multiple solutions.
In the following sections, for simplicity we restrict our attention to 3-SAT problems, i.e., the number of literals in each clause is three. Any general SAT problem instance can be transformed into a 3-SAT problem instance by introducing auxiliary variables. 
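The transformation mentioned above can be illustrated with the standard clause-splitting construction. The following is a minimal sketch, not the authors' tool: the function name `to_3sat` and the DIMACS-style integer encoding of literals are assumptions made for this example, and clauses shorter than three literals are left unchanged rather than padded.

```python
from itertools import count


def to_3sat(clauses, num_vars):
    """Split every clause longer than three literals using fresh auxiliary variables.

    Literals are nonzero ints (DIMACS style, an assumption of this sketch):
    +i means x_i, -i means NOT x_i. Shorter clauses are kept unchanged.
    Returns the new clause list and the new number of variables.
    """
    fresh = count(num_vars + 1)              # generator of unused variable indices
    out = []
    for clause in clauses:
        lits = list(clause)
        while len(lits) > 3:
            y = next(fresh)
            out.append([lits[0], lits[1], y])    # (l1 OR l2 OR y)
            lits = [-y] + lits[2:]               # (NOT y OR l3 OR ... OR lk)
        out.append(lits)
    return out, next(fresh) - 1
```

Each split adds one auxiliary variable and one clause, so a clause with k > 3 literals becomes k - 2 clauses of three literals each, and the result is satisfiable exactly when the original clause set is.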
III. RECONFIGURABLE COMPUTING
In this section, we describe our reconfigurable computing (RC) approach. RC systems are hardware systems whose logical configuration can be changed to solve a problem or to carry out an application quickly. These systems are realized with FPGAs and logic synthesis systems. Conventional RC systems are reconfigured for each target application. In contrast, our RC system is reconfigured for each instance of the application problem.
One might argue that it is natural to increase a system's speed by implementing an algorithm on hardware, and that there is no significant reason for research on such an approach. This argument is not correct, since the operations that can be directly performed by hardware are rather limited. If we perform a complicated operation by iterating a number of simple operations, the performance is similar to that of general-purpose computers.
To obtain significant speed increases by utilizing hardware, we need new and quite different methodologies for designing and implementing algorithms. For example, when implementing an efficient search algorithm on general-purpose computers, we often need to avoid duplicated computations by using a carefully designed data structure to represent a state. More specifically, in solving a 3-SAT problem, we can maintain an integer vector with one element per clause (the initial value of each element is 3). When a variable value is determined, we subtract one from the element of every clause that is reduced by this assignment, i.e., every clause in which a literal becomes false. With such a data structure, radically changing a state is not desirable because its bookkeeping becomes too costly.
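The software bookkeeping described in this paragraph can be sketched as follows. This is only an illustration under assumptions: the names `assign` and `undo`, the DIMACS-style literal encoding, and the explicit undo list are hypothetical and not taken from the paper.

```python
def assign(remaining, clauses, var, value):
    """Record x_var = value in the per-clause counter vector.

    remaining[j] starts at 3 for a 3-SAT clause and is decremented whenever
    the assignment makes one literal of clause j false. The indices of the
    touched clauses are returned so the change can be undone on backtracking.
    """
    touched = []
    for j, clause in enumerate(clauses):
        for lit in clause:
            if abs(lit) == var and (lit > 0) != bool(value):
                remaining[j] -= 1        # one fewer literal can satisfy clause j
                touched.append(j)
    return touched


def undo(remaining, touched):
    """Restore the counters changed by a previous call to assign."""
    for j in touched:
        remaining[j] += 1
```

A counter that reaches 1 signals a unit clause and a counter that reaches 0 signals an empty clause. The need to record and replay `touched` on every backtrack is the bookkeeping cost that makes large jumps in the state expensive in software.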
On the other hand, if we have enough hardware resources, all clauses can be checked in one clock cycle without using such a data structure. Since some kinds of computations can be performed very quickly by hardware, we can get more freedom in the design of algorithms. We believe that this approach brings a very exciting new dimension to algorithm design.
A logic circuit that solves a specific SAT problem is synthesized by the procedure depicted in Fig. 1. First, a text file that describes a SAT problem is analyzed by a C program called "SFL generator." This program generates a behavioral hardware description specific to the given problem in an HDL called SFL (Structured Function description Language), developed by NTT. Then, a logic synthesis system analyzes the description and synthesizes a netlist, which describes the logic circuit structure. We use a system called PARTHENON [5], [6], which was also developed by NTT. PARTHENON integrates a description language, a simulator, and a logic synthesizer, which synthesizes logic circuits from a behavioral hardware description written in HDL. In general, scheduling and allocation are difficult steps in high-level synthesis. In our system, users design the finite state machine themselves in order to help synthesis. Finally, the FPGA Mapper of the FPGA system generates FPGA mapping data from the netlist.
IV. ALGORITHMS WITH STATIC VARIABLE ORDERING
In this section, we present several algorithms with static variable ordering. In these algorithms, we check the complete assignment of variable values, i.e., we represent one combination of all variable values as an $n$-digit binary value. Assuming that variable $x_i$'s value is $v_i$, a combination of value assignments is represented by an $n$-digit binary value $v_n v_{n-1} \cdots v_1$, in which the value of the $i$th digit (counted from the lowest digit) represents the value of $x_i$. We call the combination of all variable values a state.
A. Exhaustive Algorithm
The most straightforward algorithm with static variable ordering is an exhaustive algorithm as shown in Fig. 2. In this algorithm, the state is incremented from 0 to $2^n - 1$. For each state, the algorithm checks whether all clauses are satisfied. If all clauses are satisfied, the state is recorded as a solution. Obviously, this algorithm is very inefficient since it must check all $2^n$ states.
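A software rendering of the exhaustive algorithm may make the state encoding concrete. This is a minimal sketch, assuming DIMACS-style integer literals and the hypothetical function names `clause_satisfied` and `exhaustive`; it is not the circuit of Fig. 2.

```python
def clause_satisfied(clause, state):
    """Check one clause against a state.

    Digit i of `state` (counted from 1 at the lowest bit) holds the value of
    x_i. Literal +i is true when that digit is 1, literal -i when it is 0.
    """
    return any(((state >> (abs(lit) - 1)) & 1) == (lit > 0) for lit in clause)


def exhaustive(clauses, n):
    """Enumerate every n-digit state and record those satisfying all clauses."""
    solutions = []
    for state in range(2 ** n):
        if all(clause_satisfied(c, state) for c in clauses):
            solutions.append(state)
    return solutions
```

In hardware, the inner `all(...)` corresponds to checking every clause in the same clock cycle, while the outer loop over states remains sequential.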
B. Backtracking Algorithm
When some clauses are not satisfied, instead of incrementing $x_1$'s digit, we can increment the lowest digit that is included in these unsatisfied clauses; thus the number of searched states can be reduced (Fig. 3). More specifically, for each unsatisfied clause $C_j$, we choose its lowest digit $l_j$ and increment the digit $\max_j l_j$. Changing only digits below $\max_j l_j$ cannot satisfy the clause that attains this maximum, so the skipped states contain no solution. The algorithm obtained after this improvement is very similar to the backtracking algorithm [18] where the order of the variable selection is statically determined.
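The jump rule can be sketched in software as follows. This example rests on one reading of the text, namely that the digit to increment is the largest, over all unsatisfied clauses, of each clause's lowest digit, and that incrementing a digit clears all lower digits. The names `search_with_jumps` and `clause_satisfied` and the integer literal encoding are assumptions of the sketch.

```python
def clause_satisfied(clause, state):
    # Digit i (1 = lowest bit) of `state` holds x_i; +i / -i are the literals.
    return any(((state >> (abs(lit) - 1)) & 1) == (lit > 0) for lit in clause)


def search_with_jumps(clauses, n):
    """Static-order search that skips states which cannot be solutions.

    Returns the list of solution states and the number of states visited.
    """
    solutions, visited = [], 0
    state = 0
    while state < 2 ** n:
        visited += 1
        unsat = [c for c in clauses if not clause_satisfied(c, state)]
        if not unsat:
            solutions.append(state)
            state += 1
            continue
        # Lowest digit of each unsatisfied clause, then the largest of those.
        p = max(min(abs(lit) for lit in c) for c in unsat)
        # Increment digit p and clear every digit below it.
        state = ((state >> (p - 1)) + 1) << (p - 1)
    return solutions, visited
```

The skip is safe because every state between the current one and the jump target agrees with the current state on all digits at position p or higher, and the clause that determined p mentions only such digits.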
C. Forward-Checking Algorithm
If we check not only the current value (0 or 1) but also another value concurrently, we can reduce the number of searched states. For example, assume that there exist variables and clauses . The initial state does not satisfy . If we increment s digit and change s value to 1, then and are not satisfied. If we perform the check for the case that s value is 1, we can confirm that incrementing s digit is useless. In this case, if is 0, is not satisfied, and the second lowest digit in is . If is 1, and are not satisfied, and the second lowest digit in is , while the second lowest digit in is . Therefore, we can conclude that at least s digit must be changed to satisfy all clauses and that changing digits lower than is useless (Fig. 4) .
D. Unit Resolution
Another procedure that greatly contributes to the efficiency of forward checking is to assign the variable value immediately if the variable has only one value consistent with the variables that have already been assigned values. This procedure is called unit resolution [19] .
In order to perform a similar procedure in this algorithm, for each variable , we define a value called unit . If unit , there exists only one possible value for , which is consistent with the upper digit variables, and the second lowest digit in the clause that is constraining is s digit. When there exist multiple possible values, unit . We use this information to calculate the digit to increment.
For example, in the initial state of the problem described above, has only one consistent value 1 by . Therefore, we set unit to 4 (since is the second lowest digit in ) and change s value to 1. . This state satisfies all of the three clauses (Fig. 5 ).
V. ALGORITHMS WITH DYNAMIC VARIABLE ORDERING HEURISTICS
By introducing forward-checking and unit resolution, the performance of the obtained algorithm becomes equivalent to the basic Davis-Putnam procedure [19] . However, this algorithm turns out to be inefficient since the variable ordering is static (except for unit-resolution). To solve a large-scale problem within a reasonable amount of time, we need to introduce dynamic variable ordering heuristics.
As the number of variables increases, the time required to solve the problem grows exponentially. However, because the growth rate is lower with dynamic variable ordering, dynamic ordering is preferable for large-scale problems. This is well known from studies of constraint satisfaction problems [20].
In this section, we first describe the outline of an algorithm with dynamic variable ordering heuristics, then explain how to implement such an algorithm on FPGAs, and finally describe two dynamic variable ordering heuristics (EUP heuristic and MOMs heuristic).
A. Outline of the Algorithm
The outline of the algorithm with a dynamic variable ordering heuristic can be described as follows.
Let $P$ be the input SAT problem instance and $L$ be the working list containing problem instances, where $L$ is initialized as $\{P\}$. A problem instance is represented as a pair of a set of value assignments and a set of clauses that are not yet satisfied.
1) If $L$ is empty, then $P$ is unsatisfiable, stop the algorithm. Otherwise, select the first element $P'$ from $L$ and remove $P'$ from $L$.
2) If $P'$ contains an empty clause, then $P'$ is unsatisfiable, go to 1).
3) If all clauses in $P'$ are satisfied, print the value assignments of $P'$ as one solution, go to 1).
4) If $P'$ contains a unit clause (a clause with only one variable), then set this variable to the value that satisfies the clause, simplify the clauses of $P'$ and go to 2).
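The outline above can be sketched as a working-list procedure in software. The step list in this section ends at the unit-clause step, so the branching step in the code below is an assumption: it splits on the first variable of the first open clause, with no heuristic. The names `simplify` and `dp_all_solutions` and the integer literal encoding are likewise hypothetical.

```python
def simplify(clauses, lit):
    """Make literal `lit` true: drop satisfied clauses, shorten the others."""
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]


def dp_all_solutions(clauses):
    """Return (possibly partial) assignments that satisfy every clause."""
    work = [({}, [list(c) for c in clauses])]    # list of (assignment, open clauses)
    solutions = []
    while work:                                  # step 1: take the first element
        assignment, open_clauses = work.pop(0)
        while True:
            if any(len(c) == 0 for c in open_clauses):    # step 2: empty clause
                break
            if not open_clauses:                          # step 3: all satisfied
                solutions.append(assignment)
                break
            unit = next((c[0] for c in open_clauses if len(c) == 1), None)
            if unit is not None:                          # step 4: unit clause
                assignment = {**assignment, abs(unit): unit > 0}
                open_clauses = simplify(open_clauses, unit)
                continue
            # Assumed branching step: try both values of one open variable.
            v = abs(open_clauses[0][0])
            for lit in (-v, v):
                work.append(({**assignment, v: lit > 0}, simplify(open_clauses, lit)))
            break
    return solutions
```

Variables missing from a returned assignment are unconstrained, so one partial assignment may stand for several complete solutions.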
B. Implementation of the Algorithm with Dynamic Variable Ordering
In most cases, such a backtracking tree search algorithm is implemented by using a stack. However, implementing a stack is not appropriate since having a large memory is difficult in a logic circuit. Even if we can manage to implement a large memory, sequential accesses to the memory can become a bottleneck in algorithm execution. We avoid the overhead of sequential accesses to a large memory by assigning separate registers for each variable. More specifically, there exists a register for each variable that records the depth of the search tree where the variable value is determined. This information is used for backtracking.
On the other hand, one merit of using a logic circuit is that all constraints (clauses) can be checked simultaneously. In order to make use of this advantage, we change the algorithm so that if there exist multiple unit clauses, multiple variable values are determined at the same time.
We are going to describe the details of the algorithm. We first define concepts and terms used in the algorithm. We represent the fact that is true as .
• Each variable is associated with the value depth , which represents the depth of the search tree where the variable value is determined.
• where determined is 0, set the value to , which is specified by the clause. Set determined to 1, branch to 0, and depth to current depth. Go to 1). 4) Branching: Otherwise, select best pos using some variable ordering heuristic. Branching end: Set that is specified by best pos to 0, determined to 1, branch to 1 and depth to current depth . Set current depth to current depth . Go to 1). 5) Backtracking:
5.1 If current depth is 1, stop the algorithm.
5.2 Otherwise, for each variable $x_i$: if depth($x_i$) = current depth and branch($x_i$) = 0, set determined($x_i$) to 0 and depth($x_i$) to 0; if depth($x_i$) = current depth and branch($x_i$) = 1, set $x_i$ to 1, branch($x_i$) to 0, and depth($x_i$) to current depth $-$ 1.
5.3 Set current depth to current depth $-$ 1, go to 1).
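The backtracking step can be pictured as an update over per-variable registers, with no stack. The sketch below is an interpretation, not the authors' circuit: the comparison operators and the branch values tested in each case are assumptions about the procedure, and the function name `backtrack` and the list-based registers are hypothetical.

```python
def backtrack(value, determined, branch, depth, current_depth):
    """One backtracking step over per-variable registers (modified in place).

    Returns the new current depth, or None when the search is finished.
    """
    if current_depth == 1:                   # 5.1: nothing left to undo
        return None
    for i in range(len(value)):              # 5.2: per-variable, parallel in hardware
        if depth[i] != current_depth:
            continue
        if branch[i] == 0:
            # Assumed case: value fixed by a unit clause, or by a branch whose
            # other value was already tried; release the variable.
            determined[i], depth[i] = 0, 0
        else:
            # Assumed case: branch variable with an untried value; flip it to 1
            # and treat it as determined one level up.
            value[i], branch[i], depth[i] = 1, 0, current_depth - 1
    return current_depth - 1                 # 5.3
```

Because each variable only compares its own depth register with the current depth, the loop body has no dependency between variables, which is what allows the update to be done for all variables at once.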
C. Algorithm with EUP Heuristic
The branching heuristic called experimental unit propagation (EUP) was shown to be very effective [21]. In short, when selecting a variable to branch on using this heuristic, we experimentally set each unassigned variable's value to 0 and to 1 (the experimental unit propagation procedure). We select the variable that causes the maximum number of unit propagations.
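A software sketch of the EUP selection follows. The text does not say how the counts for the two trial values are combined, so summing them is an assumption of this example, as are the names `propagate_count` and `eup_select`, the integer literal encoding, and the choice to ignore conflicts during the trial.

```python
def propagate_count(clauses, assignment, var, val):
    """Count the variables forced by unit propagation after trying var = val."""
    trial = dict(assignment)
    trial[var] = val
    forced = 0
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(trial.get(abs(l)) == (l > 0) for l in clause):
                continue                      # clause already satisfied
            free = [l for l in clause if abs(l) not in trial]
            if len(free) == 1:                # unit clause: its value is forced
                trial[abs(free[0])] = free[0] > 0
                forced += 1
                changed = True
    return forced


def eup_select(clauses, assignment, variables):
    """Pick the unassigned variable whose two trials force the most variables."""
    def score(v):
        return (propagate_count(clauses, assignment, v, False)
                + propagate_count(clauses, assignment, v, True))
    open_vars = [v for v in variables if v not in assignment]
    return max(open_vars, key=score)
```

The trial propagation is the same operation as unit resolution in the main search, which is consistent with the remark that the two procedures can share hardware.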
One advantage of using this heuristic is that the logic circuit for implementing it is very similar to the logic circuit for the main tree search procedures; thus, we can share the hardware resources between these routines.
D. Algorithm with MOMs Heuristic
The other variable ordering heuristic used for branching is the Maximum Occurrences in clauses of Minimum Size (MOMs) heuristic. In this heuristic, we simply count the occurrences of each variable in binary clauses and choose the variable that appears most often in binary clauses. Although this heuristic is very simple, it has been reported to be very effective [18], [22] in improving the efficiency of the Davis-Putnam procedure. Compared with the EUP heuristic, performing this heuristic in software is easy, but implementing it on FPGAs requires more hardware resources than those needed for the EUP. We can consider that this heuristic performs a kind of approximation of the EUP heuristic, since if a variable occurs often in binary clauses, we can expect that assigning a value to this variable will cause many unit resolutions.
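The counting step of the MOMs heuristic can be sketched as below. This is an illustration under assumptions: a binary clause is taken to mean a clause that is not yet satisfied and has exactly two unassigned literals, ties are broken arbitrarily, and the name `moms_select` and the integer literal encoding are hypothetical.

```python
from collections import Counter


def moms_select(clauses, assignment):
    """Return the variable occurring most often in open binary clauses."""
    counts = Counter()
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue                          # clause already satisfied
        free = [l for l in clause if abs(l) not in assignment]
        if len(free) == 2:                    # binary clause under this assignment
            counts.update(abs(l) for l in free)
    return counts.most_common(1)[0][0] if counts else None
```

In software this is a single pass over the clauses. In a circuit, each variable needs its own counter and a comparison network to find the maximum, which fits the observation that MOMs costs more hardware than EUP.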
VI. HARDWARE IMPLEMENTATION
In this section, we explain how to implement the algorithm described in Section V-C on FPGAs, since this algorithm turns out to be the most efficient. The algorithm can be straightforwardly represented as a finite state machine.
Since there are many similarities between the main procedure and the EUP procedure, hardware can be shared between them. The hardware operates in only one of the two modes at any given time: either the main procedure mode or the EUP procedure mode. To distinguish these two modes, a flag called eup is used. If eup is 0, the algorithm is in the main procedure mode, and if eup is 1, the algorithm is in the EUP procedure mode. The evaluation, unit, and backtrack states are used both in the main procedure mode and in the EUP procedure mode. However, the behavior of these states differs slightly according to the value of eup.
The conditions of clauses are calculated and integrated in the evaluation state and the next state is determined. The condition of each clause can be calculated in parallel with other clauses. For example, we create a logic circuit that is equivalent to the following logic formula to check whether a clause is not-satisfied:
where $\oplus$ is Exclusive-OR.
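One plausible form of such a check is sketched below. It is an assumption rather than the formula used in the paper: a clause is treated as not-satisfied when every one of its variables is determined and its value differs from the value the literal asks for, which is where the exclusive-OR enters. The name `clause_not_satisfied` and the (index, wanted value) encoding are hypothetical.

```python
def clause_not_satisfied(clause, value, determined):
    """Return 1 when every literal of the clause is determined and false.

    clause: list of (variable index, wanted value) pairs, e.g. (2, 0) for
    the literal NOT x_2. value and determined hold 0/1 flags per variable.
    """
    result = 1
    for var, wanted in clause:
        # value XOR wanted is 1 exactly when the literal is false.
        result &= determined[var] & (value[var] ^ wanted)
    return result
```

The expression is a pure AND of per-literal terms, so in a circuit each clause becomes a small combinational block and all clauses can be evaluated in the same clock cycle.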
In the backtracking state, for each variable $x_i$, its depth and current depth are compared, and the variable values, determined, depth, etc. are changed. These procedures can be done in parallel for each variable. If current depth = 1, the algorithm is terminated. These procedures require one clock cycle.
In the unit state, several variable values are determined by unit clauses. These procedures are simple and can be executed within the same clock cycle of the evaluation state.
When the current state moves from the evaluation state to the branching state, eup is set to 1, and eup is 1 until it goes back to the main procedure mode.
The high-level hardware description language SFL can handle the finite state machine representation, and the LSI CAD system PARTHENON can automatically generate an RT-level hardware description. The state transitions of the finite state machine are shown in Fig. 6 . Fig. 7 shows the hardware block diagram. In the part, the value and parameters of each variable are stored. In the evaluation state, they are sent to the part, and then the condition of each clause is checked. The connections between the part and the part represent clauses in which variables are included. The condition of each clause is checked concurrently. Fig. 8 is a part of the description of the part written in HDL (SFL), where & is logical AND and is logical OR. In this case, and are checked concurrently, and then the results are outputted.
The condition of each clause is sent to the part, and then the next step is determined by the integration of all conditions according to the algorithm. For example, if not-satisfied clause exists, the next step is backtracking.
In the part, the behavior shown in 4) of main procedure is done. In 4), clauses are evaluated by assigning the values of variables experimentally. For this behavior, eup is set to 1, and variables values are assigned one-by-one. Then, the state is changed to the evaluation state, and the evaluation is done. In the evaluation state, the behavior is changed by the value of eup. After all variables are evaluated, eup is set to 0, the values are assigned as shown in the branching end, and then return to Main mode.
In the part, the backtracking is done. The part is shared by Main mode and EUP mode. After the mode is distinguished by the value of eup, its behavior is changed. In the part, the behavior is done as shown in 3) of the main procedure. In the part, the values are outputted as the solution in the case of 2) of the main procedure.
VII. EVALUATION
A. Simulation
We first evaluated the efficiency of the developed algorithms by software simulation. We use hard random 3-SAT problems as examples. Each clause is generated by randomly selecting three variables, and each of the variables is given the value 0 or 1 (false or true) with a 50% probability. The number of clauses divided by the number of variables is called clause density, and the value 4.3 has been identified as a critical value that produces particularly difficult problems [23] .
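A generator for such instances can be sketched as follows. This is not the authors' generator: the function name `random_3sat`, the seeding interface, and the choice to draw three distinct variables per clause are assumptions of the example.

```python
import random


def random_3sat(num_vars, density=4.3, seed=None):
    """Generate a random 3-SAT instance with about density * num_vars clauses.

    Literals are DIMACS-style ints: +i for x_i, -i for NOT x_i.
    """
    rng = random.Random(seed)
    num_clauses = round(density * num_vars)
    clauses = []
    for _ in range(num_clauses):
        # Assumption: the three variables of a clause are distinct.
        variables = rng.sample(range(1, num_vars + 1), 3)
        # Each variable appears positively or negatively with probability 0.5.
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses
```

With 400 variables and the default density, this produces 1720 clauses.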
In Fig. 9, we show a log-scale plot of the required time averaged over 100 problems, assuming the clock rate is 10 MHz. If several chips are used to implement the problem, the clock speed would be reduced by the connections between chips. However, the capacity of an FPGA chip is increasing very rapidly, so we assume that a problem instance of this size can be implemented on a single chip in the very near future. Since a randomly generated 3-SAT problem tends to have a very large number of solutions when it is solvable, we terminate each execution after the first 100 solutions are found to finish the simulation within a reasonable amount of time.
In Fig. 9 , we show the results of the algorithm that introduces forward-checking and unit-resolution with static variable ordering (D&P), the algorithm that uses the MOMs heuristic (D&P with MOM), and the algorithm that uses the EUP heuristic for variable ordering (D&P with EUP). It is obvious that dynamic variable ordering is indispensable to solving large-scale problems in a reasonable amount of time. We can see that the D&P with EUP can solve a hard random 3-SAT problem with 400 variables within 1.6 min at a clock rate of 10 MHz. In addition, we can see that the search tree of EUP grows at the rate of , where is the number of variables, whereas the search tree of MOMs grows at the rate of . This result shows that the D&P with EUP is more efficient with a larger number of variables.
Furthermore, we show the results for AIM benchmark problems [24] with 128 variables and 256 clauses on FPGAs. These problem instances are unsolvable and known to be very difficult. They can be actually implemented on an FPGA chip. Table I shows the required numbers of states and the times needed to solve these problems when the clock rate is 10.0 MHz. For comparison, we also show the CPU time of POSIT [22], a very sophisticated SAT solver that utilizes the MOMs heuristic. These programs run on a Sun Ultra 30 Model 300 (UltraSPARC-II 296 MHz). We can see that EUP on FPGAs runs faster than these programs.
Of course, this comparison is not very fair as we do not consider the time required to generate the logic circuit. Currently, generating a logic circuit from a problem description takes 1 h. The bottleneck routine is the FPGA Mapper of the FPGA system, which generates FPGA mapping data from the netlist. In addition, many factors affect compile times, e.g., the depth of synthesis optimization, the synthesis method for the FPGA, the target FPGA structure, the FPGA mapping tools, and the machine power available for synthesis and mapping. Accordingly, the compile time can vary widely. In this paper, we compiled the entire circuit for each problem. However, this routine can be highly optimized for SAT problems since many parts of a logic circuit are common to all problem instances. Therefore, compile time can be dramatically reduced, and the time required to generate a logic circuit can become negligible for larger-scale problems.
B. Current Implementation Status
We use an ALTERA FLEX10K250 FPGA chip to implement the algorithm. The FLEX10K250 has 12 160 Logic Cells (LCs), and its typical usable gate count ranges from 149 k to 310 k. We have actually implemented a 3-SAT problem, "aim-128-2_0-no-.cnf", as previously described, with 11 042 LCs. The utilization rate of the LCs was 90%. We were able to run this circuit at a clock rate of 10.0 MHz. In addition, we have successfully mapped "aim-200-1_6-yes1-1.cnf" with 200 variables and 320 clauses, but the circuit is divided into 21 FLEX10K chips. Many chips are used because the circuit requires a lot of wiring resources. In this case, since the number of pins is restricted, wiring resources are insufficient. As a result, the total utilization rate of the LCs was 13%. This situation will be improved by increasing the number of gates per chip and by a dynamic reconfiguration technique. Fig. 10 shows the number of gates required for "aim-{50, 100, 200}-1_6-yes1-1.cnf". The figure shows the number of gates when the circuit is built from primitive gates. Note that these circuits are the initial ones synthesized from HDL descriptions. If these circuits are optimized, the number of gates can be further reduced while keeping the same trend. The number of gates required by EUP is approximately 30% less than that required by MOM. This is because almost all of the circuits of the EUP can be shared with the main procedure. On the other hand, the MOM algorithm requires additional circuits for branching.
VIII. RELATED WORK
The authors presented the first result of this approach in [1] . After this report, various research projects following this line have been carried out [9] - [15] .
One of the major differences between the other approaches and ours is the method used for variable ordering. While the other approaches solve a problem instance using static variable ordering, our approach uses dynamic variable ordering. Dynamic variable ordering is indispensable for solving large-scale problems, as described in Section V.
In [10] , a hill-climbing algorithm is implemented on FPGAs. Although hill-climbing algorithms are very efficient and can solve very large-scale problems, the completeness of the algorithm, i.e., always finding a solution if one exists or terminating if no solution exists, cannot be guaranteed.
In [9] , a PODEM-based algorithm [25] is implemented on FPGAs. This algorithm is more suitable for testing multi-level logic circuits than for solving generic SAT problem instances. In [11] , a similar method to [9] is used, but this method tries to optimize the total time, including the time for generating the logic circuit.
In [14], a massively parallel fine-grain satisfier architecture is introduced. This architecture accelerates SAT solving on reconfigurable hardware.
In [12] and [13], a tree search algorithm based on the Davis-Putnam procedure is implemented on FPGAs. A specialized logic circuit is used to perform a more powerful resolution method than unit resolution. This resolution method seems very effective for solving problem instances that are particularly difficult in software implementation. However, in this method, the variable ordering for branching is statically determined.
IX. CONCLUSION
This paper presented results on solving SAT problems using FPGAs. In this approach, a logic circuit specific to each problem instance is created on FPGAs. We developed a series of algorithms that are suitable for implementation on a logic circuit.
Simulation results showed that the best method, which utilizes the EUP, can solve a hard random 3-SAT problem with 400 variables within 1.6 min at a clock rate of 10 MHz. Furthermore, we have actually implemented a benchmark problem with 128 variables that can run at 10 MHz.
We are currently refining the implementation of the algorithm on FPGAs and performing various evaluations on implemented logic circuits.
