An FPGA-based clause evaluator for boolean satisfiability problems is presented in which a customised bitstream is generated directly from the problem specification, avoiding the need for resynthesis. For a 50-variable, 80-clause problem, reconfiguration time improved by three orders of magnitude over the standard approach.
Introduction
There has been considerable recent interest in the application of field programmable gate arrays (FPGAs) as accelerators for solving constraint satisfaction problems (CSPs) and, in particular, the boolean satisfiability (SAT) problem. The SAT problem is a CSP in which the constraints are represented by a boolean function of $m$ binary variables, $F(x_0, x_1, \ldots, x_{m-1})$, in product-of-sums form. Each sum term is a clause, $C_i$, and is the sum of single literals, where a literal is a variable or its negation. The SAT problem is concerned with finding a variable assignment that makes $F = 1$ (satisfiable) or proving that $F = 0$ for every assignment (unsatisfiable). An important component of any SAT machine is a clause evaluator, which evaluates the clauses $C_0, \ldots, C_{n-1}$ under different variable assignments. The inputs to the clause evaluator are the variable assignments and the outputs are the evaluations of the clauses. SAT solving systems can be designed with a fixed circuit that performs the search and a clause evaluator that is customised for each set of constraints [1].
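As a point of reference for what a clause evaluator computes, the following is a minimal software sketch. The integer encoding of literals (+k for variable x_{k-1}, -k for its negation, as in DIMACS files) and the function names are assumptions of this sketch, not part of the design described here.

```python
from typing import List, Sequence

# Assumed encoding: a clause is a list of non-zero integers, where +k
# denotes variable x_{k-1} and -k denotes its negation.
Clause = List[int]


def evaluate_clauses(clauses: Sequence[Clause],
                     assignment: Sequence[bool]) -> List[bool]:
    """Return one boolean per clause: True if at least one literal holds."""
    results = []
    for clause in clauses:
        satisfied = any(assignment[abs(lit) - 1] == (lit > 0)
                        for lit in clause)
        results.append(satisfied)
    return results


def formula_value(clauses: Sequence[Clause],
                  assignment: Sequence[bool]) -> bool:
    """F is the product (AND) of all clause evaluations."""
    return all(evaluate_clauses(clauses, assignment))


# Example: F = (x0' + x2 + x5)(x1 + x3') over six variables.
example = [[-1, 3, 6], [2, -4]]
print(evaluate_clauses(example, [True, False, False, False, False, False]))
print(formula_value(example, [False] * 6))
```

The hardware described in this paper produces the same per-clause outputs, but for all clauses in parallel.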
Most previous implementations of SAT approached the problem by using a computer program to generate a customised clause evaluation circuit for a particular SAT problem [2, 3]. The design is then synthesised, placed and routed (P&R) to produce a bitstream which is downloaded to the FPGA, which in turn is used to search for a solution to the original SAT problem. This approach requires a complete iteration of the synthesis and P&R cycle for each new set of constraints and can take several hours for a large design, precluding its use in real-time systems. Recently, runtime reconfigurable systems have been employed to address this problem, modifying the bitstream in a problem-specific fashion without requiring resynthesis [1, 4]. To the best of our knowledge, all prior runtime reconfigurable systems have used Xilinx XC6200 series devices, for which the manner in which the bitstream relates to the hardware of the device is documented. However, XC6200 devices have been discontinued by Xilinx and also have very small logic capacity (the largest reported runtime reconfigurable system supports only 13 variables and 29 clauses [4]).
In this paper, we present an architecture for a clause evaluator using industry standard Xilinx XC4000 series devices [5] in which the bitstream is directly generated from the constraint problem. It has the advantage over previous designs in that it (1) does not require the synthesis and P&R steps and (2) supports XC4000 series devices. The clause evaluator does not have any restrictions on the number of literals in a clause. Furthermore, the same architecture can be used for the recently announced Xilinx Virtex devices, which have much larger capacity and a documented bitstream format [6].
Figure 1 shows a block diagram of the clause evaluator. It contains an array of configurable logic blocks (CLBs), the logic primitives of Xilinx XC4000 devices [5]. Each CLB is configured as two 16 × 1 RAM memories and produces two outputs on different rows, as illustrated in the figure. The inputs to the clause evaluator are 50 bits corresponding to the variables and the outputs are the 80 clause evaluations.
Clause Evaluator
Each row of the array in Figure 1 corresponds to two clauses, whose outputs appear on the two wires immediately above and below the CLB. Each half CLB in the row has its address lines connected to four consecutive input variables. The output of the half CLB is the evaluation of the part of the sum term that involves the input variables to which it is connected. The RAM outputs are connected to the row line through open drain buffers, implementing the sum term as a wired-AND (which is equivalent to an active low wired-OR operation). Note also that a pullup resistor is connected to each row.
As an example, for the clause $C_0 = \bar{x}_0 + x_2 + x_5$, the 1st column CLB of Figure 1 implements $\bar{x}_0 + x_2$ (as a lookup table) and the 2nd column CLB implements $x_5$. If one or more literals evaluate to a logical true (in our example, this corresponds to $x_0$ being false or $x_2$ being true or $x_5$ being true), the corresponding CLB will drive the row low, asserting the (active low) output.
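The lookup-table contents for this example can be sketched in software. The following is an illustration only, under stated assumptions: address line i of the RAM in column c is taken to carry variable x_{4c+i}, a stored 0 is taken to mean "drive the row low", and literals use the DIMACS-style integer encoding (+k for x_{k-1}, -k for its negation). The names `lut_contents` and `row_line` are hypothetical.

```python
from typing import List, Sequence

VARS_PER_LUT = 4  # each 16 x 1 RAM has four address lines


def lut_contents(clause: List[int], column: int) -> int:
    """16-bit RAM contents for the half CLB in `column` of the clause's row.

    Bit `addr` is 0 when some literal of the clause in this column is true
    for that address (row driven low), and 1 otherwise (row released).
    """
    base = column * VARS_PER_LUT
    local = {}  # local address-line index -> required polarity
    for lit in clause:
        var = abs(lit) - 1
        if base <= var < base + VARS_PER_LUT:
            local[var - base] = lit > 0
    bits = 0
    for addr in range(16):
        hit = any(bool((addr >> i) & 1) == pol for i, pol in local.items())
        if not hit:
            bits |= 1 << addr
    return bits


def row_line(clause: List[int], assignment: Sequence[bool],
             n_columns: int) -> int:
    """Wired-AND of all RAM outputs on the row; 0 means clause satisfied."""
    level = 1  # the pullup holds the line high unless a buffer pulls it low
    for col in range(n_columns):
        addr = 0
        for i in range(VARS_PER_LUT):
            var = col * VARS_PER_LUT + i
            if var < len(assignment) and assignment[var]:
                addr |= 1 << i
        level &= (lut_contents(clause, col) >> addr) & 1
    return level


c0 = [-1, 3, 6]  # C0 = x0' + x2 + x5
print(hex(lut_contents(c0, 0)), hex(lut_contents(c0, 1)))
```

Under these assumptions the first column holds 0x0a0a and the second 0x3333, while a column containing no literal of the clause holds 0xffff and so never pulls the row low.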
All the components and routing were placed into predefined locations and routed automatically by the Xilinx Epic Editor from a script created by a C program. The interconnect for the inputs and outputs of the clause evaluator is implemented using longlines [5], which are intended for high fan-out signals that are distributed over long distances.
As the bitstream format for XC4000 series devices is not documented, the mapping between RAM contents and the bitstream was determined by using a program to produce designs with known patterns in each RAM, compiling the design to a bitstream using the standard Xilinx tools and then finding the pattern in the resulting bitstream. A table of the starting positions of all the RAMs in the FPGA's bitstream was thus compiled. Using this table, another C program can configure the contents of the memories in the bitstream directly from a SAT problem specification in the standard DIMACS benchmark format [7].
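The pattern-search idea can be illustrated with a small sketch. It is not the authors' program and does not model the real XC4000 bitstream: it assumes, purely for illustration, that each RAM's 16 bits appear as two contiguous, byte-aligned bytes and that no checksum needs updating. The marker scheme and all function names are hypothetical.

```python
from typing import Dict


def marker_for(ram_index: int) -> bytes:
    """A distinct, recognisable 16-bit pattern per RAM (hypothetical scheme)."""
    return (0xA000 | ram_index).to_bytes(2, "big")


def locate_rams(bitstream: bytes, n_rams: int) -> Dict[int, int]:
    """Build the table of starting positions by searching for each marker."""
    offsets = {}
    for i in range(n_rams):
        marker = marker_for(i)
        first = bitstream.find(marker)
        # Reject markers that are absent or that match more than once.
        if first < 0 or bitstream.find(marker, first + 1) >= 0:
            raise ValueError(f"marker for RAM {i} is missing or ambiguous")
        offsets[i] = first
    return offsets


def patch_rams(template: bytes, offsets: Dict[int, int],
               contents: Dict[int, int]) -> bytes:
    """Write new 16-bit RAM contents into a copy of the template bitstream."""
    out = bytearray(template)
    for ram, value in contents.items():
        pos = offsets[ram]
        out[pos:pos + 2] = value.to_bytes(2, "big")
    return bytes(out)


# Synthetic stand-in for a compiled bitstream containing two marked RAMs.
fake = b"\x00\x00\x00" + marker_for(0) + b"\x11" + marker_for(1) + b"\x00"
table = locate_rams(fake, 2)
patched = patch_rams(fake, table, {0: 0x0A0A, 1: 0x3333})
print(table, patched.hex())
```

The search is done once per device layout; afterwards only the patching step is needed for each new problem instance, which is why no synthesis or P&R run is required.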
Results
The clause evaluator was tested on a DIMACS 3-SAT (i.e. each clause has 3 literals) benchmark problem (aim-50-1_6-yes1-1) [7] with 50 variables and 80 clauses. On a Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 270MHz), the time required to generate the bitstream for this problem was 0.7 seconds. Using the same UltraSPARC-IIi 270MHz machine, we also produced a VHDL description of the clause evaluator and performed synthesis (407 seconds) and place and route (660 seconds), giving a total implementation time of 1067 seconds. Thus the runtime reconfigurable version is approximately 1500 times faster (1067 s against 0.7 s), a three orders of magnitude improvement over the resynthesis approach.
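Benchmark instances of this kind are distributed as DIMACS CNF text files. A minimal reader, given here as a sketch rather than as the program used in this work, might look as follows; the function name `parse_dimacs` is hypothetical, and the sketch ignores the clause count declared in the header.

```python
from typing import List, Tuple


def parse_dimacs(text: str) -> Tuple[int, List[List[int]]]:
    """Parse DIMACS CNF text into (number of variables, list of clauses).

    Lines starting with 'c' are comments, the 'p cnf <vars> <clauses>' line
    is the header, and each clause is a run of literals terminated by 0.
    """
    n_vars = 0
    clauses: List[List[int]] = []
    current: List[int] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("c"):
            continue
        if line.startswith("p"):
            fields = line.split()
            if len(fields) != 4 or fields[1] != "cnf":
                raise ValueError(f"unexpected problem line: {line!r}")
            n_vars = int(fields[2])
            continue
        for token in line.split():
            lit = int(token)
            if lit == 0:
                clauses.append(current)  # 0 terminates the clause
                current = []
            else:
                current.append(lit)
    return n_vars, clauses


demo = "c tiny example\np cnf 6 2\n-1 3 6 0\n2 -4 0\n"
print(parse_dimacs(demo))
```

The resulting clause lists are the information from which the RAM contents of each row are derived.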
The resulting runtime configurable implementation (shown in Figure 2) required 520 CLBs, approximately 1/4 of the resources of a Xilinx XC4062XL device. The Xilinx implementation tools report a worst-case delay of 40 ns. This implementation was successfully tested at 25 MHz on a single XC4062XL chip of an Annapolis Micro Systems Wildforce board. Profiling analysis of a software implementation of GSAT (version 41) by Selman and Kautz [8] showed that it required 3.1 µs per clause evaluation. Note that in this program, variable flips are done in an intelligent fashion, with only the clauses affected by a variable flip being recomputed. Thus, at one evaluation per 40 ns clock cycle, the FPGA implementation of the clause evaluator is 77 times faster than the software implementation.
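The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that the array has one CLB column per four variables and one CLB row per two clauses, and that the FPGA completes one evaluation per clock cycle at 25 MHz; the device capacity of 2304 CLBs for the XC4062XL is taken from the manufacturer's data sheet and is not stated in the text above.

```python
import math

N_VARS, N_CLAUSES = 50, 80

# Resource estimate: each CLB holds two 16 x 1 RAMs serving two clause rows,
# and each RAM covers four consecutive variables.
columns = math.ceil(N_VARS / 4)      # 13 CLB columns
rows = N_CLAUSES // 2                # 40 CLB rows
clbs = columns * rows                # 520 CLBs
XC4062XL_CLBS = 2304                 # assumed device capacity (48 x 48 array)
utilisation = clbs / XC4062XL_CLBS   # roughly 0.23

# Execution speedup: 3.1 us in software against one 40 ns cycle at 25 MHz.
period_ns = 1e3 / 25.0
exec_speedup = 3.1e3 / period_ns

# Reconfiguration speedup: synthesis plus P&R against direct generation.
resynthesis_s = 407 + 660
reconfig_speedup = resynthesis_s / 0.7

print(clbs, round(utilisation, 2), exec_speedup, round(reconfig_speedup))
```

These values agree with the 520 CLBs, the factor of 77 and the three orders of magnitude quoted in the text.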
Conclusion
An architecture for a runtime reconfigurable clause evaluator which generates a customised circuit for a particular problem instance was reported. Distributed RAM devices in a field programmable gate array (FPGA) were utilised to customise the circuit by directly changing the bitstream of the FPGA. This approach showed a 1500 times speedup in reconfiguration over resynthesis from an HDL and a 77 times improvement in execution speed over an optimised software implementation. We envisage this tech- 
