Abstract: Efficient solutions to nondeterministic polynomial-time (NP)-complete problems would significantly benefit both science and industry. However, such problems are intractable on digital computers based on the von Neumann architecture, which creates the need for alternative approaches. Recently, a deterministic, continuous-time dynamical system (CTDS) was proposed [1] to solve a representative NP-complete problem, Boolean Satisfiability (SAT). This solver shows polynomial analog time-complexity on even the hardest benchmark k-SAT (k ≥ 3) formulas, but at an energy cost through exponentially driven auxiliary variables. This paper presents a novel analog hardware SAT solver, AC-SAT, which implements the CTDS using novel analog circuit design ideas. AC-SAT is intended to be used as a coprocessor and is programmable for handling different problem specifications. It is especially effective for solving hard k-SAT problem instances that are challenging for algorithms running on digital machines. Furthermore, with its modular design, AC-SAT can readily be extended to solve larger problems, while the size of the circuit grows linearly with the product of the number of variables and the number of clauses. The circuit is designed and simulated based on a 32-nm CMOS technology. Simulation Program with Integrated Circuit Emphasis (SPICE) simulation results show speedup factors of ∼10^4 on even the hardest 3-SAT problems, when compared with a state-of-the-art SAT solver on digital computers. As an example, for hard problems with N = 50 variables and M = 212 clauses, solutions are found within a few nanoseconds to a few hundred nanoseconds.
and neuromorphic computing) is more imperative than ever. While quantum computing is a promising avenue, it is far from being brought to practical reality, with many challenges still to be faced in both physics and engineering. Neuromorphic computing systems, e.g., cellular neural networks (CNNs) [3]-[5] and IBM's TrueNorth [6], have been shown to be promising alternatives for solving a range of problems in, e.g., sensory processing (vision and pattern recognition) and robotics. Analog mixed-signal information processing systems such as CNNs can offer extremely power/energy-efficient solutions to some problems that are costly to solve on digital computers [7]. Such systems have received increasing attention in recent years (see [8]-[10]), including parallel analog implementations (see [11]).
In analog computing [12], the algorithm (representing the "software") is a dynamical system, often expressed in the form of differential equations running in continuous time over real numbers, and its physical implementation (the "hardware") is any physical system, such as an analog circuit, whose behavior is described by the corresponding dynamical system. The equations of the dynamical system are designed such that the solutions to problems appear as attractors of the dynamics, and the output of the computation is the set of convergent states to those attractors [13]. Although it has been shown that systems of ordinary differential equations can simulate any Turing machine [3], [14], [15], and hence are computationally universal, they have not yet gained widespread popularity because designing such systems is problem specific and usually difficult. However, if an efficient analog engine can be designed to solve nondeterministic polynomial-time (NP)-complete problems, then according to the Cook-Levin theorem [16], it would help solve efficiently all problems in the NP class, as well as benefit a very large number of applications in both science and engineering.
In this paper, we consider designing analog circuits for solving a representative NP-complete problem, the Boolean Satisfiability (SAT) problem. SAT is quintessential to many electronic design automation problems, and is also at the heart of many decision, scheduling, error-correction, and security applications. Here, we focus on k-SAT, which is well known to be NP-complete for k ≥ 3 [16]. The currently best known deterministic sequential discrete algorithm that exploits some properties of the search space has a worst case complexity of O(1.473^N) [17]. Other algorithms are based on heuristics, and while they may perform well on some SAT formula classes, there are always other formulas on which they take exponentially long times or get stuck indefinitely. Some of the better known SAT solvers include Zchaff [18], MiniSat [19], RSat [20], WalkSAT [21], focused record-to-record travel [22], and Focused Metropolis Search [23].
They typically consist of decision, deduction, conflict analysis, and other functions [24] that employ the capability of digital computers to assign values to literals, conduct Boolean constraint propagation (BCP), and backtrack conflicts [25] , [26] .
A number of hardware-based SAT solvers have been proposed in the past. FPGA-based solutions have been investigated to accelerate the BCP part found in all "chaff-like" modern SAT solvers [27]-[29]. Speedups of anywhere between 3X and 38X have been reported when comparing these FPGA-based solvers with MiniSat [19], a well-known, high-performance software solver. A custom digital integrated-circuit (IC)-based SAT solver, which implements a variant of the GRASP SAT algorithm and accelerates traversal of the implication graph and conflict clause generation, has been introduced in [30] and [31]. A speedup of ∼10^3 over MiniSat was reported based on simulation with extrapolation. These hardware-based approaches still leave considerable room for improvement, since the algorithms that these accelerators are based on are designed for digital computers and, thus, can typically be expected to achieve only limited speedup.
Recently, an analog SAT solver circuit was introduced in [32] using the theoretical proposal from [33] based on the CNN architecture. However, the theory in [33] has exponential analog-time complexity and, thus, is much less efficient than the SAT solver from [1] , which forms the basis for this paper. Furthermore, the circuit from [32] seems to have been implemented only for a 4 × 4 problem size, and no hardware simulation and comparison results were reported.
Mostafa et al. [11] propose a distributed mixed (analog and digital) algorithm that is implementable on VLSI devices. It is based on a heuristic method combined with stochastic search, drawing on the natural incommensurability of analog oscillators. Assuming P ≠ NP, in order to have efficient, polynomially scaling solution times, one would require exponentially many computing elements, i.e., exponentially scaling hardware resources. However, the method in [1] trades time cost for energy cost, which in practical terms is preferable to massive amounts of hardware resources. It is quite possible that, from an engineering point of view, the ideal approach combines both types of tradeoffs: time versus energy and time versus hardware (distributed). The heuristic stochastic search in [11] is effectively a simulated annealing method that implies high exponential runtimes for worst case formulas. In contrast, the analog approach in [1] is fully deterministic, extracts maximum information about the solution embedded implicitly within the system of clauses, and can efficiently solve the hardest benchmark SAT problems, at an energetic cost [1].
Here, we propose a novel analog hardware SAT solver, referred to as AC-SAT. AC-SAT is based on the deterministic continuous-time dynamical system (CTDS) in the form of coupled ordinary differential equations presented in [1]. As mentioned above, this system finds SAT solutions in analog polynomial time, however, at the expense of auxiliary variables that can grow exponentially when needed (see [1], [34] for details). Though this CTDS is an incomplete solver, it does minimize the number of unsatisfied clauses when there are no solutions, and thus it is also a MaxSAT solver. The overall design of AC-SAT is programmable and modular; thus, it can readily solve any SAT problem of size equal to or less than what is imposed by the hardware limitations, and can also be easily extended to solve larger problems. Moreover, to avoid resource-costly implementations of the complex differential equations in the CTDS, we introduce a number of novel analog circuit implementation ideas that lead to a much smaller amount of hardware than straightforward implementations, while preserving the critical deterministic behavioral properties of the CTDS equations.
We have validated our design through Simulation Program with Integrated Circuit Emphasis (SPICE) simulations. Our simulations show that AC-SAT can significantly outperform MiniSat (being tens of thousands of times faster), with the latter running on the latest high-performance digital processors. For hard SAT problems with 50 variables and over 200 clauses, compared with the projected performance of a possible custom hardware implementation based on a recent FPGA solver [29], AC-SAT offers a speedup of more than ∼600X.
Monte Carlo simulations further demonstrate that AC-SAT is robust against device variations.
In the rest of this paper, we first review the basic CTDS theory and some of its variants in Section II. Section III introduces the overall AC-SAT design. In Section III-D, we present two alternative designs for a specific component in AC-SAT. Section IV first discusses simulation-based validation of AC-SAT, compares the different component designs, and then summarizes performance results for AC-SAT with respect to a software implementation of the CTDS SAT solver and MiniSat. Finally, we conclude this paper in Section V.
II. BACKGROUND
Solving a k-SAT problem means finding an assignment to N Boolean variables x_i ∈ {0, 1}, i = 1, ..., N, such that they satisfy a given propositional formula F. F in conjunctive normal form (CNF) is expressed as the conjunction of
It is easy to see that clause C_m is satisfied iff K_m = 0. Defining a "potential energy" function
where a_m > 0 are auxiliary variables, one can see that all the clauses are satisfied iff V = 0. Thus, the SAT problem can be reformulated as a search in s for the global minima of V (since the condition V ≥ 0 always applies). If the auxiliary variables a_m are kept constant, then for most hard problems any hill-descending deterministic algorithm [which evolves the variables s_i(t)] would eventually become stuck in local minima of V and not find solutions. To avoid this, the auxiliary variables are endowed with a time dependence coupled to the analog clause functions K_m. Ercsey-Ravasz and Toroczkai [1] proposed
in which (3) describes a gradient descent on V, and (4) is an exponential growth driven by the level of non-SAT in K_m (which also guarantees that a_m(t) > 0 at all times). Equation (3) can be rewritten as
where
For the auxiliary variables a_m, the formal solution to (4) is
and thus the expression (2) of V is dominated by those K_m terms that have been unsatisfied for the longest time during the dynamics, resulting in an analog version of a focused search-type [23] dynamics. Also note that the system (3), (4) is not unique; however, it is simple from a theoretical point of view and incorporates the necessary ingredients for solving arbitrary SAT problems, owing to the exponentially accelerated auxiliary variables. For details on the performance of the algorithm, the reader is referred to [1].
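For reference, the relations numbered (1)-(7) can be written compactly as below. This is a reconstruction, not a quotation: it follows the CTDS formulation of [1] and the properties stated in the text (C_m is satisfied iff K_m = 0, V ≥ 0, gradient descent in s, exponential growth of a_m). The conventions s_i ∈ [−1, 1], c_{m,i} ∈ {−1, 0, +1} (0 when x_i does not appear in C_m), and the 2^{−k} normalization are assumptions of this sketch.

```latex
\begin{align}
K_m(\mathbf{s}) &= 2^{-k}\prod_{i=1}^{N}\bigl(1 - c_{m,i}\,s_i\bigr), \tag{1}\\
V(\mathbf{s},\mathbf{a}) &= \sum_{m=1}^{M} a_m\,K_m(\mathbf{s})^{2}, \tag{2}\\
\dot{s}_i &= -\frac{\partial V}{\partial s_i}, \qquad i = 1,\dots,N, \tag{3}\\
\dot{a}_m &= a_m\,K_m, \qquad m = 1,\dots,M, \tag{4}\\
\dot{s}_i &= \sum_{m=1}^{M} 2\,a_m\,c_{m,i}\,K_{m,i}\,K_m, \tag{5}\\
K_{m,i} &= 2^{-k}\prod_{j\neq i}\bigl(1 - c_{m,j}\,s_j\bigr), \tag{6}\\
a_m(t) &= a_m(0)\,\exp\!\left(\int_{0}^{t} K_m\bigl(\mathbf{s}(\tau)\bigr)\,d\tau\right). \tag{7}
\end{align}
```

Under these assumptions, (5) follows from (2) and (3) because ∂K_m/∂s_i = −c_{m,i} K_{m,i}, and (7) follows from (4) by separation of variables.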
It is important to observe that while the scaling of the analog time t to find solutions is polynomial, in hardware implementations the a_m variables represent voltages or currents, and thus the energetic resources needed to find solutions may become exponential for hard formulas, which is, of course, necessary assuming P ≠ NP. However, the a_m variables do not need to grow exponentially all the time and without limit, as in (4), and for that reason the form (4) is not ideal for physical implementations. The challenge is then to find other variants that still significantly outperform digital algorithms yet are feasible in terms of physical implementation and cost.
Note that such systems as ours essentially convert time costs into energy costs.
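The dynamics discussed in this section can be explored numerically. The following is a minimal sketch, not the solver of [1] or the AC-SAT circuit: it assumes the clause function K_m = 2^{−k} ∏ (1 − c_{m,i} s_i) from [1], uses a plain forward-Euler step, and clips s_i to [−1, 1]. The function names, the dictionary clause encoding, the step size, and the stopping rule are all hypothetical choices made for illustration.

```python
def clause_terms(clause, s):
    """Return (K_m, {i: K_{m,i}}) for one clause.

    `clause` maps a variable index i to c_{m,i} in {+1, -1}
    (assumed encoding: +1 for x_i, -1 for its negation).
    """
    scale = 2.0 ** (-len(clause))
    factors = {i: 1.0 - c * s[i] for i, c in clause.items()}
    K_m = scale
    for f in factors.values():
        K_m *= f
    K_mi = {}
    for i in clause:
        prod = scale
        for j, f in factors.items():
            if j != i:
                prod *= f
        K_mi[i] = prod
    return K_m, K_mi


def is_satisfied(clauses, s):
    """Check the Boolean reading x_i = (s_i > 0) against every clause."""
    return all(any(c * s[i] > 0 for i, c in cl.items()) for cl in clauses)


def solve_ctds(clauses, s0, dt=0.01, max_steps=20000):
    """Forward-Euler integration of ds_i/dt = sum_m 2 a_m c_mi K_mi K_m
    and da_m/dt = a_m K_m. Returns (s, a, t) or None if no solution is
    reached within max_steps."""
    s = list(s0)
    a = [1.0] * len(clauses)          # a_m(0) > 0
    for step in range(max_steps):
        if is_satisfied(clauses, s):
            return s, a, step * dt
        ds = [0.0] * len(s)
        da = [0.0] * len(a)
        for m, cl in enumerate(clauses):
            K_m, K_mi = clause_terms(cl, s)
            for i, c in cl.items():
                ds[i] += 2.0 * a[m] * c * K_mi[i] * K_m
            da[m] = a[m] * K_m
        # Keep the analog variables inside the hypercube [-1, 1]^N.
        s = [min(1.0, max(-1.0, si + dt * d)) for si, d in zip(s, ds)]
        a = [am + dt * d for am, d in zip(a, da)]
    return None
```

A call such as `solve_ctds([{0: 1, 1: -1, 2: 1}, {0: -1, 1: 1, 2: 1}], [0.1, -0.2, -0.3])` integrates a two-clause formula; because the a_m grow only while their clause is unsatisfied, the longest-unsatisfied clauses dominate the drive on s, which is the focused-search behavior described above.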
Here we introduce another form with the help of time delays, which still keeps the focused nature of the search dynamics but allows the a_m's to decrease as well when the corresponding clauses are (nearly) satisfied
with a_m(0) > 0, and δ_m(0) = 0, ∀m
where the delay functions δ_m(t) ∈ [0, t] determine the history window of the K_m(s(t)) trajectory that has an impact on the variation of a_m. The formal solution to (8) is
Clearly, the case δ_m(t) = t corresponds to (4), while δ_m(t) = 0 recovers the case of constant a_m values, which corresponds to the naive energy minimization case. One approach to choosing δ_m(t) is to set it to a small value initially and double it every time the dynamics is stuck or hits an upper threshold (set, e.g., by a maximum allowed voltage value). This typically requires only a few iterations. Other delay functions are being investigated. It is important to note that, in this time-delayed form, the decrease of the a_m associated with a satisfied clause reduces that clause's relative weight in (5) and thus increases the weights of the other clauses in the focused search space, enhancing the driving capability of unsatisfied clauses.
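The effect of the history window can be illustrated with a short sketch. It assumes that the formal solution has the windowed form a_m(t) = a_m(0) exp(∫ K_m dτ) with the integral taken over [t − δ, t], which is consistent with the two limiting cases named above (δ_m(t) = t and δ_m(t) = 0) but is an assumption of this example. The sampled trajectory, the rectangle-rule integration, and the helper names are hypothetical.

```python
import math


def windowed_a(K_samples, dt, delta, a0=1.0):
    """a(t_n) = a0 * exp(integral of K over [t_n - delta, t_n]).

    K_samples[n] is K_m on [n*dt, (n+1)*dt); the integral uses the
    rectangle rule through a running prefix sum.
    """
    window = int(round(delta / dt))       # samples inside the window
    prefix = [0.0]
    for K in K_samples:
        prefix.append(prefix[-1] + K * dt)
    trace = []
    for n in range(len(prefix)):
        lo = max(0, n - window)           # window cannot start before t = 0
        trace.append(a0 * math.exp(prefix[n] - prefix[lo]))
    return trace


def next_delta(delta, stuck_or_saturated):
    """Doubling schedule: widen the window only when the dynamics is
    stuck or an auxiliary variable hits its upper threshold."""
    return 2.0 * delta if stuck_or_saturated else delta
```

With a trajectory in which K_m drops to zero once the clause is satisfied, the windowed a_m returns to a_m(0) after one window length, whereas the full-history version keeps its peak value.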
III. SYSTEM DESIGN
In this section, we present AC-SAT, our proposed analog SAT solver circuit based on the CTDS theory in Section II. Though it is possible to implement the CTDS equations digitally, the hardware would be much more costly in terms of area, power, and performance. Thus, we opt for an analog implementation that also bears affinity with the operations in the CTDS. Our circuit design aims to keep the hardware solver configurable and modular while keeping the circuit simple and power efficient. These considerations require careful design of the overall architecture and some modifications to the algorithm itself, which will be elaborated later in this paper.
A direct implementation of (4), (5), and (6) would incur large hardware costs. Instead, here we present implementations of the signal dynamics circuit (SDC) and the auxiliary variable circuit (AVC) that are much more resource-efficient than the direct approach. Below, we elaborate the design of the three circuit components (the SDC, the AVC, and the digital verification circuit, DVC) using the 3-SAT problem (i.e., three nonzero c_{m,j}'s for each clause) as an example. AC-SAT for any k-SAT problem can be designed following the same principle.
A. Signal Dynamics Circuit
The SDC contains an array of analog elements that realize the dynamics specified by (5) and (6) . Though it is possible to implement the multiplications and voltage controlled current source (VCCS) in (5) and (6) straightforwardly based on operational amplifiers, such implementations can be rather costly. We introduce several novel circuit design ideas to implement the dynamics in (5) and (6) . We will show that the accuracy of the circuit is sufficient for the type of dynamical systems being considered here.
Given a 3-SAT problem with N variables, the SDC uses an array of N analog elements, referred to as s_i elements, for evaluating the s_i (i = 1, ..., N) signals. Fig. 2(a) shows the conceptual design of the s_i element that realizes (5). The s_i element contains a capacitor C connected to M branch blocks (where M is the total number of clauses in the 3-SAT problem), an analog inverter, an inverted Schmitt trigger, and a digital inverter. The voltage across capacitor C, i.e., V_i, and the output of the analog inverter, \bar{V}_i, represent the analog values of signals s_i and −s_i, respectively.
To see why the s_i element in Fig. 2(a) can be used to evaluate (5), let us denote the current from each branch block as I_{m,i}. Then, we have
Comparing (5) with (11), we see that the s_i element in Fig. 2(a) precisely realizes (5) if we have
In order to design a branch block to satisfy (12) 
Referring to (13) (respectively, −1). Based on the above observations, let us examine the conceptual design of the branch block in Fig. 2(b). Specifically, the branch block contains two switches and four tunable resistive elements. (The resistive elements here are used to simplify the drawing, and the details about their design will be described later.)
[Fig. 3 caption, partially recovered: (b) circuit implementation of the resistive elements in the green box in Fig. 3(a); (c) circuit implementation for the switch as well as R_{a_m} and R_{m,i}, i.e., the elements in the red box in Fig. 3(a).]
If the values of R_{a_m}, R_{m,i}, R_{m,i_2}, and R_{m,i_3} are chosen properly, the I_{m,i} value derived from the branch block would have the same properties as identified for D_{m,i} above. The actual realization of the four resistive elements in Fig. 2(b) is given in Fig. 3. The implementations of R_{m,i_2} and R_{m,i_3} are the same, and the one for R_{m,i_2} is shown in Fig. 3(b). Consider the R_{m,i_2} block. The two terminals of the transmission gate formed by transistors M_p and M_n correspond to the terminals of R_{m,i_2}. The gate terminals of M_p and M_n are connected to V_{i_2} and \bar{V}_{i_2} via four additional transmission gates controlled by Q_{c_{m,i_2}} and \bar{Q}_{c_{m,i_2}}. It can be derived that this realization of R_{m,i_2} exhibits the desired properties outlined above for D_{m,i} in (13). The SPICE simulation results depicting the relationship between the resistance value and V_i are given in Fig. 4(a). For example, assuming that c_{m,i_2} is 1, i.e., Q_{c_{m,i_2}} = V_DD [corresponding to the red line in Fig. 4(a)], the gates of M_n and M_p are connected to the \bar{V}_{i_2} and V_{i_2} signals. When V_{i_2} is close to V_DD [see Fig. 4(a)] and \bar{V}_{i_2} is close to GND, both M_n and M_p are OFF, R_{m,i_2} has a very large value (around 200 kΩ), and I_{m,i} is close to zero. This means that clause C_m has no impact on the variation of s_i, which is exactly the desired behavior. On the other hand, if s_{i_2} is not satisfied, then as it gets closer to its target (i.e., +1), the magnitude of I_{m,i} decreases because R_{m,i_2} increases, as can be seen from the increase in the resistance value as V_i gets close to V_DD. The blue line in Fig. 4(a) corresponds to the case where c_{m,i_2} = −1, and its behavior can be explained in the same way.
The circuit block for implementing the switch controlled by Q_{c_{m,i}} and the two resistive elements R_{m,i} and R_{a_m} is shown in Fig. 3 Fig. 5(a). The digital signals are then sent to the DVC to check if a solution has been found. The inverted Schmitt trigger circuit exhibits hysteresis in its transfer curve, as seen from the simulation result in Fig. 5(b), and hence can perform analog-to-digital conversion with minimal noise impact. Putting all the above discussions together, one can conclude that the SDC correctly implements the system dynamics defined by (3).
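The role of hysteresis in the analog-to-digital conversion can be illustrated with a small behavioral model. This is only a sketch of a generic inverting Schmitt trigger followed by a digital inverter; the threshold voltages, the class name, and the helper below are hypothetical and are not taken from the circuit in Fig. 5.

```python
class InvertedSchmittTrigger:
    """Behavioral model of an inverting comparator with hysteresis."""

    def __init__(self, v_low=0.4, v_high=0.6, initial_out=1):
        # Hypothetical switching thresholds (in volts) for a 1-V supply.
        self.v_low = v_low
        self.v_high = v_high
        self.out = initial_out

    def update(self, v_in):
        if v_in >= self.v_high:
            self.out = 0          # inverting: high input drives output low
        elif v_in <= self.v_low:
            self.out = 1
        # Between the thresholds the previous output is held.
        return self.out


def digitize(voltages):
    """Return the digital value after the trailing digital inverter."""
    trigger = InvertedSchmittTrigger()
    return [1 - trigger.update(v) for v in voltages]
```

Because the output only changes when the input crosses the far threshold, small noise around the midpoint does not toggle the digital value that is passed on for verification.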
B. Auxiliary Variable Circuits
As pointed out in Section II, the auxiliary variables a_m, as defined in (4), are used to help keep the gradient descent search from becoming stuck in nonsolution attractors. The a_m signal follows an exponential growth driven by the level of non-SAT in clause C_m. A direct way to implement an exponential function is through an operational amplifier (op-amp), which we present below. Note that we have realized the analog version of (4) in a resource-efficient manner, similar to the implementation in the SDC, to avoid costly multiplications and VCCS implementations.
The AVC contains an array of M a_m elements, where M is the maximum number of clauses in a given problem that the AVC can handle.
The Rs in Fig. 6(a) are tunable resistive elements implemented by transmission gates, which have a circuit topology similar to that shown in the green box in Fig. 3. The circuit in Fig. 6(a) exactly realizes the exponential growth specified in (4) up to an upper bound on V_{a_m}, i.e., the op-amp's supply voltage. Fig. 6(b) plots an example of V_{a_m} growth over time before and after the associated signals s_i get satisfied. After EN is set to 1 (i.e., the switch is closed), V_{a_m} starts to grow exponentially, following the differential equation in (16). According to Fig. 4(b), as V_{a_m} increases, the resistance value of R_{m,i}||R_{a_m} drops, leading to a larger current I_{m,i} in the corresponding branch block in Fig. 2(b), which is consistent with (12). This current, together with the other currents associated with V_i in Fig. 2(a), contributes to the variation of V_i, which is specified in (11). There are two cases that may stop the evolution of V_{a_m}, which are as follows.
1) As stated above, if any one of the three analog signals in clause C_m is satisfied, the current paths in Fig. 6(a) are cut off, and V_{a_m} stops at a certain voltage. This indicates that V_{a_m} has finished its utility as an auxiliary variable to drive the corresponding clause to the satisfied state.
2) If V_{a_m} reaches its upper bound before any of the three variables in the corresponding clause is satisfied, the circuit stops evolving since the V_{a_m} value is unable to drive this yet unsatisfied clause any more. This reduces the effectiveness of avoiding being stuck in a nonsolution attractor during the gradient descent search process. The upper bound on V_{a_m} imposes a physical limitation on the hardware realization of the CTDS theory (see footnote 2).
Although the AVC design given in Fig. 6 realizes the exponential growth, it requires M op-amps, resulting in a large area and power consumption. There exist other ways to achieve exponential signal growth; e.g., circuits with positive feedback often have exponential growth in certain ranges. Besides the exponential growth implementation, we will introduce alternative AVC designs in Section III-D.
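The two stopping cases can be made concrete with a behavioral sketch of a single V_{a_m} node. It is not a model of the op-amp circuit in Fig. 6: the growth law dV/dt = rate · V, the forward-Euler update, the parameter values, and the function name are assumptions chosen only to show growth that is either frozen when the clause becomes satisfied (case 1) or clamped at the supply bound (case 2).

```python
def grow_v_am(v0, rate, dt, steps, v_max, satisfied_at=None):
    """Trace of V_am under exponential growth with a supply clamp.

    satisfied_at: step index from which the clause counts as satisfied
    (current path cut off, so V_am holds its value); None means the
    clause never becomes satisfied.
    """
    v = v0
    trace = [v]
    for n in range(steps):
        clause_satisfied = satisfied_at is not None and n >= satisfied_at
        if not clause_satisfied:
            # Exponential growth, clamped at the upper bound (case 2).
            v = min(v_max, v + dt * rate * v)
        trace.append(v)               # case 1: value is simply held
    return trace
```

In the clamped case the trace flattens at v_max while the clause is still unsatisfied, which is the situation in which the auxiliary variable can no longer help the search escape a nonsolution attractor.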
C. Digital Verification and Interface Circuits
The goal of the DVC is to determine if a solution (the set of s_i's) to the given problem has been found within a user-specified time bound. The DVC is implemented readily through the use of an array of 3M XOR gates and an array of M NAND gates, as shown in Fig. 7. The input to the DVC is the digital representation of the s_i's and −s_i's, i.e., Q_{s_i} and \bar{Q}_{s_i}, from the SDC. Each NAND gate corresponds to a clause, and its inputs correspond to the literals present in the clause. Note that in the DVC, we only include those c_{m,i}'s whose values are +1 (represented by logic "1") and −1 (represented by logic "0"). The outputs of the DVC are analog values Q_{C_m}, for clauses C_m, and an indicator signal, which is set to 1 if the circuit finds a solution and otherwise remains at 0. The DVC is an asynchronous circuit, and its output constantly records whether a solution has been found. By setting a time bound T, the DVC regards the problems whose solutions are found within T as satisfiable; the rest are considered either unsatisfiable or not solvable within the allotted time. Note that for problem instances where no solutions are found in the given time bound, our approach does not provide a formal proof of unsatisfiability (as our algorithm is incomplete). However, our solver is a MaxSAT solver, because it does not use any assumptions about the solvability of the formula and minimizes the number of unsatisfied clauses within the allotted resources or time. Theoretical analysis of the performance of the solver as a MaxSAT solver is out of the scope of this paper and will be presented elsewhere.
2 As discussed in Section II, (3) and (4) are not unique, and the effect of the maximum voltage limitation depends on the equations themselves. For example, Section II introduces an alternative, delay-based formulation for a_m in (8), which allows a_m to decrease when the corresponding clause is satisfied. This delay-based formulation of a_m postpones reaching the a_m upper bound. The implementation of (8) for the op-amp-based approach is currently under development.
It is easy to see that all three components, SDC, AVC, and DVC, are modular and programmable. By modular, we mean that the basic elements in each circuit can be repeated for different problem sizes (i.e., the number of variables N and the number of clauses M). By programmable, we mean that any k-SAT problem instance can be solved by the same SDC, AVC, and DVC implementation as long as the problem size is less than or equal to the hardware specification.
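The XOR/NAND verification logic of the DVC can be mirrored in software. The sketch below is one consistent reading of the description, not a netlist of Fig. 7: it assumes that each XOR compares a digitized variable with the stored bit for c_{m,i} (1 for +1, 0 for −1), so that the XOR output is 0 exactly when the literal is true, and that the indicator is the AND of all clause outputs. The function name and data layout are hypothetical.

```python
def dvc(clauses, q):
    """Return (q_c, indicator) for digital variable values q.

    clauses: list of clauses, each a list of (var_index, c_bit) pairs,
             with c_bit = 1 for c = +1 and c_bit = 0 for c = -1.
    q:       list of digitized variables Q_s in {0, 1}.
    """
    q_c = []
    for clause in clauses:
        # XOR is 0 when the variable matches the required polarity.
        xors = [q[i] ^ c_bit for i, c_bit in clause]
        # NAND is 0 only if every literal is false (clause unsatisfied).
        q_c.append(0 if all(xors) else 1)
    indicator = 1 if all(q_c) else 0
    return q_c, indicator
```

For the formula (x1 ∨ ¬x2 ∨ x3) ∧ (¬x1 ∨ x2 ∨ ¬x3), `dvc([[(0, 1), (1, 0), (2, 1)], [(0, 0), (1, 1), (2, 0)]], [1, 1, 0])` checks the assignment x1 = 1, x2 = 1, x3 = 0 clause by clause.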
Below, we briefly describe the I/O interface between the CPU and AC-SAT. AC-SAT is used as a coprocessor, similarly to other reconfigurable coprocessors (such as dynamically reconfigurable FPGAs). To facilitate configuration, AC-SAT can be augmented with an on-chip reconfiguration memory as well as a simple controller. Based on the problem description (given in CNF), the CPU writes the configuration information to the memory. The controller then uses the memory contents to set the respective switches in the SDC, AVC, and DVC components.
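As an illustration of the configuration step, the sketch below turns a CNF description into the c_{m,i} matrix that would select the switch settings. It assumes the widely used DIMACS CNF text format as the problem description, which the text above does not specify; the function name and the row-per-clause layout of the returned matrix are hypothetical.

```python
def parse_dimacs(text):
    """Parse DIMACS CNF text into (N, M, c), with c[m][i] in {+1, -1, 0}.

    c[m][i] = +1 if x_{i+1} appears in clause m, -1 if its negation
    appears, and 0 if the variable is absent from the clause.
    """
    n_vars = 0
    clauses = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("c"):      # blank or comment line
            continue
        if line.startswith("p"):                  # header: p cnf N M
            n_vars = int(line.split()[2])
            continue
        for token in line.split():
            literal = int(token)
            if literal == 0:                      # 0 terminates a clause
                clauses.append(current)
                current = []
            else:
                current.append(literal)
    c = [[0] * n_vars for _ in clauses]
    for m, clause in enumerate(clauses):
        for literal in clause:
            c[m][abs(literal) - 1] = 1 if literal > 0 else -1
    return n_vars, len(clauses), c
```

A problem fits a given AC-SAT instance only if the returned N and M do not exceed the numbers of s_i elements and a_m elements available in hardware.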
D. Alternative AVC Designs
The op-amp-based AVC described in Section III-B realizes an exponentially growing a_m variable, aiming to address hard SAT problems (some SAT instances with constraint density α = M/N ≈ 4.25) within its physical limitations. However, for application-type SAT problems, i.e., those that are not specially designed to be very hard, exponential growth for a_m is not always necessary. Below we describe two alternative circuit designs to implement an a_m function that has a saturating, (1 − e^{−qt})-type growth (up to a constant prefactor on the exponential term) toward a saturation value. In the remainder, we will refer to this version of a_m growth as the "simpler version." It is important to note that, as the circuit in Fig. 9 does not realize the exponential growth specified in (4), it can indeed get captured in nonsolution attractors indefinitely for some very hard formulas. However, we have found that even for many hard problems, it works more efficiently than the op-amp-based a_m element (with the same threshold value) in finding solutions for smaller size problems (as long as they are solvable), and the dynamics only rarely gets stuck. We will discuss this aspect further in the evaluation section via simulation results.
Similar to the op-amp-based AVC, in the simpler AVC design in Fig. 9, some V_{a_m}'s may reach V_DD before the CTDS converges to a solution. One way to alleviate this physical limitation is to increase the range of V_{a_m}. However, such an approach has its limitations in practical circuits (e.g., the limited supply voltage allowed). This, in fact, is a fundamental limitation due to the NP-hardness of 3-SAT. Nonetheless, it is possible to improve the V_{a_m} driving capability in the CTDS and increase the size of the hard problems that can be solved with the same physical range of V_{a_m}. Below, we discuss an alternative implementation of the simpler a_m element to demonstrate that it is worthwhile to investigate different implementations of the AVC.
Recall that the delay function δ_m(t) in (8) helps a_m keep relevant information from a limited range of the trajectory's past history instead of the entire history. We consider combining the simpler a_m element with this time-delayed form, and choose δ_m(t) = δ (meaning that we are integrating over a fixed time window of length δ). The corresponding a_m element is shown in Fig. 10. Capacitor C is charged to V_DD through three tunable resistive elements and discharged to GND through the other three resistive elements. The first-order differential equation of V_{a_m} can be written as
The six resistive elements are implemented by transmission gates similar to those for R_{m,i_2} in Fig. 3. Specifically,
IV. EVALUATION
In this section, we present our evaluation study of AC-SAT. We first describe the basic functional validation and then discuss the robustness of AC-SAT against device variations. We finally compare the performance of AC-SAT with a state-of-the-art digital solver.
A. Functional Validation
We have built our proposed analog SAT solver, AC-SAT, at the transistor level in HSPICE based on the 32-nm CMOS predictive technology model [35]. All the circuit components use V_DD = 1 V. To achieve sufficient driving capability, the minimum transistor size is set to W = 1 μm and L = 40 nm, while actual transistor sizes are selected according to their specific roles. For logic gates, the transistor sizes are chosen to ensure equal pull-up and pull-down strength. For the branch block in Fig. 3 and R_{m,i_3} being 64, 4, and 4, respectively. The sizes of the transistors in other circuits are determined in a similar fashion. Note that the transistor sizes shown above are just a lower bound for the technology model that we are using. The absolute values of the transistor sizes are not critical (the equations of the solver are dimensionless), and other transistor sizes should also work as long as their relative sizes are close to the ones that we have shown.
To demonstrate that AC-SAT indeed behaves as specified by the CTDS dynamics in (3) and (4), we examine the waveforms of signals s_i and a_m. Fig. 11 shows three sets of s_i and a_m waveforms from a 3-SAT problem instance having 50 variables and 212 clauses: Fig. 11(a) for the op-amp-based a_m implementation (realizing the exponential, e^{qt}-type a_m growth), Fig. 11(b) for the simpler a_m implementation [realizing the saturating, (1 − e^{−qt})-type a_m growth], and Fig. 11(c) for the time-delayed simpler a_m implementation. For all three designs, AC-SAT successfully finds a solution after a certain time, as indicated by the vertical dashed lines. Note that AC-SAT determines whether a solution is found via the DVC. As can be seen from the s_i trajectories, the s_i signals stabilize (i.e., converge) after a solution is found. Comparing the a_m trajectories in the three designs, one can see that the a_m's grow most rapidly in the op-amp-based design due to the exponential growth function, while some of the a_m's (the ones corresponding to the satisfied clauses) in the time-delayed implementation decrease after they reach their peak magnitude, just as predicted by (8).
B. Scaling Considerations
Besides functionality, the impact of interconnect parasitics on the circuit is another important consideration toward practical and modular designs of the solver. As the circuit size increases [i.e., O(MN)], for each variable array element in the SDC, the total parasitic capacitance from the branch blocks (Fig. 2) increases linearly with the number of branch blocks M, namely the maximum number of clauses that the solver can handle. Given a problem instance, if variable x is involved in y clauses (y < M), then the y branch blocks associated with x are active, while all other branch blocks are turned off. However, all the branch blocks contribute parasitic capacitance to the dynamical evolution of the variable x. To investigate the impact of parasitic capacitance, we have conducted a number of simulations of the solver circuit with various numbers of branch blocks (i.e., M) in the SDC, namely 100, 500, 1000, 5000, and 10000 branch blocks for each variable array element. We used the circuits to solve various problem instances with 10, 20, and 30 variables, and evaluated the time to find a solution. The simulation results in Fig. 12 demonstrate that as the solver circuit becomes larger, the solver still functions correctly but takes longer to find solutions due to the larger parasitic capacitance.
Another issue due to interconnect scaling is the capacitance value associated with the AVC elements. As the parasitic capacitance associated with variable signals increases with the number of branch blocks, the dynamic evolution of the variable signals becomes slower due to RC charging. As a consequence, the AVC element, whose internal capacitance (i.e., contributed by the two capacitors in the effective and thus lead to the solver not able to find a solution. Therefore, it is critical to increase the values of the capacitors in the AVC elements as the circuit size increases. A basic approach is to choose the capacitance value such that the RC time constant of the V_{a_m}'s is comparable to or smaller than the RC time constant of the variable signals. As the parasitic capacitance of the SDC increases linearly with the number of branch blocks, the capacitance in the AVC element should also scale proportionally.
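The scaling argument can be summarized with a first-order RC estimate. The sketch below is a back-of-the-envelope model, not a result from the SPICE simulations: the lumped effective resistance, the per-block parasitic capacitance, and both function names are hypothetical, and the only facts taken from the text are that the node capacitance grows linearly with the number of branch blocks and that the AVC capacitance should scale proportionally.

```python
def node_time_constant(r_eff, c_node, c_par_per_block, num_blocks):
    """First-order RC time constant of one s_i node.

    The node capacitance is the designed capacitor plus one parasitic
    contribution per attached branch block (active or not).
    """
    return r_eff * (c_node + c_par_per_block * num_blocks)


def scaled_avc_capacitance(c_avc_ref, m_ref, m_new):
    """Scale the AVC capacitor in proportion to the number of branch
    blocks, starting from a reference design with m_ref blocks."""
    return c_avc_ref * (m_new / m_ref)
```

For example, with the hypothetical values r_eff = 1 kΩ, c_node = 1 fF, and 0.01 fF of parasitic capacitance per block, going from 100 to 10000 branch blocks raises the node time constant from about 2 ps to about 101 ps in this model.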
C. Device Variation Study
After validating that AC-SAT can indeed solve SAT problems correctly, we further investigate the robustness of AC-SAT against device variations. Typical analog circuits can be rather sensitive to device variations if not designed well. However, AC-SAT has two unique advantages in this respect. First, the circuit itself does not rely on device matching. Second, the CTDS has been shown in theory to be robust against noise [36]. To demonstrate the robustness of our proposed AC-SAT system, we conducted Monte Carlo simulations with respect to transistor size variations for randomly chosen 3-SAT problems. Specifically, we let the transistor widths follow a Gaussian distribution with standard deviation (σ_W/W) of 0.05 μm/√(W × L) for all transistor widths, which is an acceptable variance distribution for the 32-nm technology node [37]. In other words, the solver circuit is simulated with the Monte Carlo method considering 5% transistor width variations. For each problem, 100 Monte Carlo runs were performed. Fig. 13 shows the waveforms of one a_m(t) signal and one s_i(t) signal plus the output of the DVC for one problem instance over the 100 Monte Carlo simulations. As can be seen from the signal trajectories, the signals evolve consistently across the Monte Carlo simulations, and the results demonstrate the robustness of the circuit. Moreover, since analog circuits generally use mature technology nodes (e.g., 180 and 90 nm), we in fact validated our design in a relatively aggressive way. The circuit is expected to perform much better under mature technologies, whose variations would be much smaller than 5%.
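The sampling step of such a Monte Carlo study can be sketched as follows. This is only an illustration under stated assumptions: it takes the 5% figure quoted above as the relative standard deviation of every transistor width, the nominal widths and function names are hypothetical, and the actual experiments were run in SPICE rather than with this script.

```python
import random


def sample_widths(nominal_widths, rel_sigma=0.05, rng=random):
    """Draw one set of perturbed transistor widths.
    Each width is Gaussian around its nominal value with a standard
    deviation of rel_sigma times that nominal value (assumed 5%)."""
    return [rng.gauss(w, rel_sigma * w) for w in nominal_widths]


def monte_carlo_runs(nominal_widths, runs=100, seed=0):
    """Generate `runs` independent width sets; each set would parameterize
    one circuit simulation in a full Monte Carlo flow."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [sample_widths(nominal_widths, rng=rng) for _ in range(runs)]


if __name__ == "__main__":
    nominal = [64e-9, 128e-9, 256e-9]  # hypothetical nominal widths in meters
    samples = monte_carlo_runs(nominal)
    print(len(samples), samples[0])
```

Each sampled width set corresponds to one of the 100 runs per problem; the circuit simulation itself is outside the scope of this sketch.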
To get a better comparison between the different designs, we performed Monte Carlo simulations on both the (Note that AC-SAT did not solve all the problems because of the physical voltage limit we imposed.) These results indicate that, as expected, within the same physical constraints, AC-SAT based on the exponentially growing a_m is more effective than AC-SAT with the (1 − 2e^{−qt})-type a_m growth. Note that the exponential-growth a_m circuit implemented with an op-amp consumes more area and energy, while the simpler a_m circuit trades solver capability for lower area and energy.
D. Performance Comparisons
To further investigate the effectiveness of AC-SAT, we compare the simpler a_m-based AC-SAT design with: 1) a software program that solves the system (3), (4) using an adaptive fifth-order Cash-Karp Runge-Kutta method and 2) the software MiniSat solver [19]. Both software programs run on the same digital computer. We randomly generated 5000 hard (α = 4.25) 3-SAT problems, comprising 1000 instances for each problem size N = 10, 20, 30, 40, and 50. The same initial conditions are applied whenever appropriate. Table I summarizes the average time needed to find solutions for each problem size. The AC-SAT column reports the analog/physical times taken by AC-SAT. The CTDS and MiniSat columns report the CPU times of the two software implementations, respectively. (To be fair, only the times for solved problems are included for all three methods.) Observe that the times in the CTDS column increase nearly exponentially as the problem size increases. This is natural, since the numerical integration happens on a digital Turing machine, and in order to ensure the preset accuracy in computing the chaotic trajectory, the Runge-Kutta algorithm has to perform a very large number of window-refining discretization steps. As seen from the data in Table I, AC-SAT demonstrates average speedup factors of ∼10^5 to ∼10^6 over software CTDS and ∼10^4 over MiniSat. AC-SAT is also very competitive compared with existing hardware-based approaches. For example, a recent work [29] reported a CPU+FPGA-based MiniSat solver achieving ∼4× performance improvement over CPU-based MiniSat. Since ASIC implementations typically achieve a maximum of 10× performance improvement over their FPGA counterparts [38], AC-SAT would still provide ∼600× or higher speedup compared with a projected ASIC version of the FPGA design in [29].
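A benchmark set of this kind can be produced with a uniform random generator at fixed constraint density. The sketch below is an assumption about how such instances may be generated, not the generator used for Table I: the function names are hypothetical, literals follow the signed-integer (DIMACS-style) convention, and M is obtained by rounding αN down, which reproduces M = 212 for N = 50 as quoted in the abstract. No filtering for satisfiability is performed here.

```python
import math
import random


def random_ksat(num_vars, alpha=4.25, k=3, rng=None):
    """Uniform random k-SAT instance with floor(alpha * num_vars) clauses.
    Each clause has k distinct variables, each negated with probability 1/2.
    A literal is a nonzero int: +i means x_i, -i means NOT x_i."""
    rng = rng or random.Random()
    num_clauses = math.floor(alpha * num_vars)
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), k)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses


def is_satisfied(clauses, assignment):
    """Check a candidate solution; assignment maps variable index -> bool."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


if __name__ == "__main__":
    rng = random.Random(42)
    for n in (10, 20, 30, 40, 50):
        instance = random_ksat(n, rng=rng)
        print(n, len(instance))
```

The `is_satisfied` helper is the kind of check any of the three solvers' outputs could be validated against before its time is counted.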
We do not directly compare with the custom digital IC in [30] since our simulation-based system cannot solve the large problems considered in [30]. (Note that the total solving times reported in [30] are extrapolated instead of directly obtained from simulation.) It is reported in [30] that an average speedup of ∼10^3× over CPU-based MiniSat is obtained. In contrast, AC-SAT achieves ∼10^4× speedup over CPU-based MiniSat.
Readers may be concerned with the complexity of the analog hardware design as well as other issues such as noise. It is important to note that the analog solver core is modular and consists of arrays with the same topology. Furthermore, the CTDS theory has been shown to be robust against noise [36]. AC-SAT is programmable, which means that different problem instances can be programmed or mapped to the AC-SAT circuit. AC-SAT is also modular, implying that: 1) it can be more easily extended to construct a larger solver and 2) multiple AC-SAT components can be used to solve the same problem instance by providing different initial conditions, hence allowing a larger space to be searched simultaneously.
The current implementation of AC-SAT, however, does have some limitations. In particular, while the modular structure allows possible expansion to solve problems with larger numbers of variables and clauses, it can only address problems whose clauses have no more than a given number k of literals (k = 3 here). One way to solve problems with longer clauses is to use the host processor to convert k-SAT problems (where k > 3) to 3-SAT problems, which can be done in polynomial time [16]. How to directly tackle such challenges in hardware is left for future work.
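The host-side conversion mentioned above can be sketched with the standard textbook reduction, which splits each long clause by chaining fresh auxiliary variables. This is a minimal illustration, not code from the AC-SAT toolchain: the function name and the signed-integer literal encoding are assumptions, and clauses with fewer than three literals are passed through unchanged rather than padded.

```python
def to_3sat(clauses, num_vars):
    """Convert clauses with more than 3 literals into an equisatisfiable
    set of 3-literal clauses by introducing fresh auxiliary variables.
    A literal is a nonzero int: +i means x_i, -i means NOT x_i.
    Returns (new_clauses, new_num_vars)."""
    out = []
    next_var = num_vars
    for clause in clauses:
        lits = list(clause)
        while len(lits) > 3:
            next_var += 1                       # fresh auxiliary variable y
            out.append((lits[0], lits[1], next_var))
            lits = [-next_var] + lits[2:]       # remainder becomes (NOT y OR ...)
        out.append(tuple(lits))
    return out, next_var


if __name__ == "__main__":
    # One 5-literal clause over x1..x5 becomes three 3-literal clauses.
    print(to_3sat([(1, 2, 3, 4, 5)], num_vars=5))
```

A clause with k > 3 literals yields k − 2 clauses and k − 3 auxiliary variables, so the converted instance grows only linearly in the total clause length, at the cost of occupying more variable and clause slots on the solver.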
V. CONCLUSION
We presented a proof-of-principle analog system, AC-SAT, based on the CTDS in [1] to solve 3-SAT problems. The design can be readily extended to general k-SAT problems. AC-SAT is modular and programmable, and can be used as a SAT solver coprocessor. In this implementation, the circuit size grows polynomially [O(N^2)] as the problem size increases. Three different design alternatives were proposed and verified for implementing the auxiliary variable dynamics required by the CTDS. Detailed SPICE simulation results show that AC-SAT can indeed solve SAT problems efficiently and tolerates device variations well. Compared with other SAT solvers, AC-SAT can achieve ∼10^4× speedup over MiniSat running on a state-of-the-art digital processor, and can offer over 600× speedup over a projected digital ASIC implementation of MiniSat.
Regarding the practical use of a hardware solver, we note that there are instances in the SAT contests that take a very long time (e.g., days or even months) to solve. The long (and exponentially growing) running time is due not only to the size of the problems, but also to their hardness. It has been demonstrated that when the constraint density (M/N) of a problem instance is between 4 and 5 (for 3-SAT), the problem can be very hard and take exponentially growing time for current software solvers to find a solution. Our work, together with its theoretical basis, however, provides a means to trade time for energy in order to speed up computations. With the circuit-friendly theory and proof-of-principle hardware implementation, we can solve hard SAT problems much faster than with software solvers on digital machines, albeit at the expense of other resources such as energy (voltage and/or current values). Such tradeoffs are desirable for certain time-sensitive problems.
The CTDS equations (especially the dynamics for the auxiliary variables) and their analog implementations are not unique. It is quite possible that better forms and implementations exist. The fact that our proof-of-principle circuit implementations significantly outperform state-of-the-art solvers on digital computers is an indication that analog hardware SAT solvers have great potential as application-specific processors for discrete optimization. As future work, we will further investigate alternative implementations of the auxiliary variable dynamics as well as methods to handle problem instances that do not fit on a given hardware implementation, e.g., through problem decomposition. Moreover, we will explore other methods that can, in principle, solve SAT problems even more efficiently, e.g., by combining clause learning (handled by a digital processor) with our analog solver.
