# **SAT-Based Circuit Local Improvement**

**Alexander S. Kulikov**
Steklov Mathematical Institute at St. Petersburg, Russian Academy of Sciences, Russia
St. Petersburg State University, Russia

**Danila Pechenev**
St. Petersburg State University, Russia

**Nikita Slezkin**
St. Petersburg State University, Russia

## Abstract

Finding the exact circuit size of a function is notoriously hard. Whereas modern computers and algorithmic techniques allow one to find a circuit of size seven in the blink of an eye, searching for a circuit of size thirteen may take more than a week. One of the reasons for this behavior is that the search space is enormous: the number of circuits of size s is $s^{\Theta(s)}$, and the number of Boolean functions on n variables is $2^{2^n}$.

In this paper, we explore the following natural heuristic idea for decreasing the size of a given circuit: go through all its subcircuits of moderate size and check whether any of them can be improved by reducing this question to SAT. This may be viewed as a local search approach: we search for a smaller circuit in a ball around a given circuit. Through this approach, we prove new upper bounds on the circuit size of various symmetric functions. We also demonstrate that some upper bounds that were proved by hand decades ago can nowadays be found automatically in a few seconds.

**2012 ACM Subject Classification** Theory of computation $\rightarrow$ Circuit complexity

**Keywords and phrases** circuits, algorithms, complexity theory, SAT, SAT solvers, heuristics

**Digital Object Identifier** 10.4230/LIPIcs.MFCS.2022.67

**Related Version** Full Version: https://arxiv.org/abs/2102.12579 [15]

**Supplementary Material** Software (Source Code): https://github.com/alexanderskulikov/circuit\_improvement

**Funding** Alexander S. Kulikov: Supported by Russian Science Foundation (18-71-10042).

**Acknowledgements** We are grateful to anonymous reviewers for many useful suggestions.
## 1 Boolean Circuits

A Boolean straight line program of size r for input variables $(x_1, \ldots, x_n)$ is a sequence of r instructions, where each instruction $g \leftarrow h \circ k$ applies a binary Boolean operation $\circ$ to two operands h and k, each of which is either an input bit or the result of a previous instruction. If m instructions are designated as outputs, the straight line program computes a function $\{0,1\}^n \to \{0,1\}^m$ in a natural way. We denote the set of all such functions by $B_{n,m}$ and let $B_n = B_{n,1}$. For a Boolean function $f \colon \{0,1\}^n \to \{0,1\}^m$, we denote by $\operatorname{size}(f)$ the minimum size of a straight line program computing f. A Boolean circuit is the graph of a program: for every instruction $g \leftarrow h \circ k$, there is a node g with two incoming directed edges from the nodes h and k. Figure 1 gives an example for the function $\mathrm{SUM}_n \colon \{0,1\}^n \to \{0,1\}^l$ that computes the binary representation of the sum of n bits:
$$\mathrm{SUM}_n(x_1, \ldots, x_n) = (w_0, w_1, \ldots, w_{l-1}) \colon \quad \sum_{i=1}^n x_i = \sum_{i=0}^{l-1} 2^i w_i,$$
where $l = \lceil \log_2(n+1) \rceil$. This function transforms n bits of weight 0 into l bits of weights $0, 1, \ldots, l-1$.

© Alexander S. Kulikov, Danila Pechenev, and Nikita Slezkin; licensed under Creative Commons License CC-BY 4.0. 47th International Symposium on Mathematical Foundations of Computer Science (MFCS 2022). Editors: Stefan Szeider, Robert Ganian, and Alexandra Silva; Article No. 67; pp. 67:1–67:15. Leibniz International Proceedings in Informatics (LIPIcs), Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

**Figure 1** Optimal size straight line programs and circuits for SUM<sub>2</sub> and SUM<sub>3</sub>. These two circuits are known as *half adder* and *full adder*. The straight line programs are given in Python so that it is particularly easy to verify their correctness.
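As an illustration, here is a minimal sketch of the standard half adder (size 2) and full adder (size 5) written as Python straight line programs, one binary operation per line. The intermediate names `a`, `b`, `c` are our own, and the programs drawn in Figure 1 may order or label the gates differently.

```python
from itertools import product


def sum2(x1, x2):
    """Half adder: a straight line program of size 2 for SUM_2."""
    w0 = x1 ^ x2
    w1 = x1 & x2
    return w0, w1


def sum3(x1, x2, x3):
    """Full adder: a straight line program of size 5 for SUM_3."""
    a = x1 ^ x2
    w0 = a ^ x3
    b = x1 & x2
    c = a & x3
    w1 = b | c
    return w0, w1


# Exhaustive correctness check: the outputs are the binary digits of the sum.
for x1, x2 in product(range(2), repeat=2):
    w0, w1 = sum2(x1, x2)
    assert x1 + x2 == w0 + 2 * w1

for x1, x2, x3 in product(range(2), repeat=3):
    w0, w1 = sum3(x1, x2, x3)
    assert x1 + x2 + x3 == w0 + 2 * w1
```

Counting the lines with an operator gives the sizes 2 and 5 directly.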
For example, the program for SUM<sub>3</sub> from Figure 1 can be verified with just three lines of code:

```
from itertools import product

# sum3 is the straight line program for SUM_3 from Figure 1
for x1, x2, x3 in product(range(2), repeat=3):
    w0, w1 = sum3(x1, x2, x3)
    assert x1 + x2 + x3 == w0 + 2 * w1
```

Determining size(f) requires proving lower bounds: to show that size(f) > $\alpha$, one needs to prove that no circuit of size at most $\alpha$ computes f. Known lower bounds are far from satisfactory: the strongest known lower bound for a function family in NP is $(3+1/86)n - o(n)$ [7]. Here, by a function family we mean an infinite sequence of functions $\{f_n\}_{n=1}^{\infty}$ where $f_n \in B_n$.

Even proving lower bounds for specific functions (rather than function families) is difficult. Brute force approaches quickly become impractical: $|B_n| = 2^{2^n}$, hence already for n = 6 one cannot simply enumerate all functions from $B_n$; also, the number of circuits of size s is $s^{\Theta(s)}$, hence checking all circuits of size s takes reasonable time only for small values of s. Knuth [11] found the exact circuit size of all functions from $B_4$ and $B_5$. Finding the exact value of size(f) for $f \in B_6$ is already a difficult computational task for modern computers and techniques.

One approach is to translate the statement "there exists a circuit of size s computing f" into a Boolean formula and pass it to a SAT solver. If the formula is satisfiable, one decodes a circuit from a satisfying assignment; otherwise, one gets a (computer-generated) proof of the lower bound size(f) > s. This circuit synthesis approach was proposed by Kojevnikov et al. [13] and has since been used in various circuit synthesis programs (abc [1], mockturtle [24], sat-chains [10]). State-of-the-art SAT solvers are surprisingly efficient: they handle various practically important problems (with millions of variables) and have even helped to resolve open problems in mathematics [2].
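The brute-force estimates above can be made concrete with a crude count. The sketch below uses the simplest counting model (our own, for illustration): instruction i, counting from 0, chooses one of the 16 binary operations and an ordered pair of operands among the n inputs and the i earlier instructions. This overcounts, since many programs compute the same function, but for fixed n the product is already of order $s^{\Theta(s)}$.

```python
from math import prod


def num_functions(n: int) -> int:
    """|B_n| = 2^(2^n): one output bit for each of the 2^n inputs."""
    return 2 ** (2 ** n)


def num_programs(n: int, s: int) -> int:
    """Upper bound on the number of straight line programs with n inputs and
    s instructions: instruction i picks one of 16 binary operations and an
    ordered pair of operands among the n + i values available so far."""
    return prod(16 * (n + i) ** 2 for i in range(s))


for n in range(1, 7):
    print(f"|B_{n}| has {len(str(num_functions(n)))} decimal digits")

for s in (7, 13):
    print(f"n = 6, s = {s}: at most {num_programs(6, s):.2e} programs")
```

For instance, $|B_6| = 2^{64}$ already has 20 decimal digits, which is why exhaustive enumeration of $B_6$ is hopeless.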
Still, even for small values of n and s, finding a circuit of size s for a function from $B_n$ is difficult for SAT solvers. We demonstrate the limits of this approach on the counting functions
$$\mathrm{MOD}_n^{m,r}(x_1,\ldots,x_n) = [x_1+\cdots+x_n \equiv r \bmod m]$$
(here, [·] is the Iverson bracket: [S] equals 1 if S is true and 0 otherwise). Using SAT solvers, Knuth [12, solution to exercise 480] found $\operatorname{size}(\mathrm{MOD}_n^{3,r})$ for all $3 \leq n \leq 5$ and all $0 \leq r \leq 2$. Generalizing the found values, he made the following conjecture:
$$\operatorname{size}(\mathrm{MOD}_n^{3,r}) = 3n - 5 - [(n+r) \equiv 0 \bmod 3] \text{ for all } n \ge 3 \text{ and } r. \tag{1}$$
He was also able to prove (using SAT solvers) that $\operatorname{size}(\mathrm{MOD}_6^{3,0}) = 12$ and wrote: "The case n = 6 and $r \neq 0$, which lies tantalizingly close to the limits of today's solvers, is still unknown." Knuth also describes various symmetry breaking heuristics and shows which of them give a significant speedup. Haaswijk et al. [8] show another way of speeding up the SAT-based approach to circuit synthesis: first, generate all possible circuit topologies; then, for each topology, use a SAT solver to check whether one can assign Boolean operations to the gates so that the resulting circuit computes the given function. To summarize, our current abilities for checking whether there exists a Boolean circuit of size s are roughly the following: for $s \le 6$, this can be done in a few seconds; for $7 \le s \le 12$, this can (sometimes) be done in a few days; for $s \ge 13$, this is out of reach.

### 1.1 New Results

In this paper, we explore the limits of the following natural idea: given a circuit, try to reduce its size by improving (using SAT solvers, for example) the size of its subcircuits of size seven.
This is a kind of local search approach: we cannot go through the whole space of all circuits, but we can at least search in a neighborhood of a given circuit. This allows us to work with circuits consisting of many gates. As a result of our experiments, we show several circuits for which this approach leads to improved upper bounds.

- We support Knuth's conjecture (1) for $\mathrm{MOD}_n^{3,r}$ by proving the matching upper bound:
$$\operatorname{size}(\mathrm{MOD}_n^{3,r}) \leq 3n - 5 - [(n+r) \equiv 0 \bmod 3] \text{ for all } n \geq 3 \text{ and } r.$$
This slightly improves the previously known upper bound $\operatorname{size}(\mathrm{MOD}_n^{3,r}) \leq 3n-4$ by Demenkov et al. [4]. To prove Knuth's conjecture, one also needs to prove a matching lower bound on $\operatorname{size}(\mathrm{MOD}_n^{3,r})$. The strongest currently known lower bound for $\operatorname{size}(\mathrm{MOD}_n^{3,r})$ is $2.5n-O(1)$, due to Stockmeyer [25].
- We present improvements for size(SUM<sub>n</sub>) for various small n and show that some of these circuits and their parts can be used as building blocks to design efficient circuits for other functions in a semiautomatic fashion. In particular, we show that a part of an optimal circuit for SUM<sub>5</sub> can be used to build optimal circuits of size 2.5n for MOD<sub>n</sub><sup>4,r</sup> [25] and the best known circuits of size 4.5n + o(n) for SUM<sub>n</sub> [4]. In turn, an efficient circuit for SUM<sub>5</sub> can be found in a few seconds if one starts from a standard circuit for SUM<sub>5</sub> composed of two full adders and one half adder.
- We design new circuits for the threshold function, defined as follows:
$$\mathrm{THR}_n^k(x_1,\ldots,x_n) = [x_1 + \cdots + x_n \ge k].$$
The best known upper bounds for THR are the following: $\operatorname{size}(\mathrm{THR}_n^k) \leq kn + o(n)$ for $2 \leq k \leq 4$ [5] (see also [26, 6.2, Theorem 2.3]), and $\operatorname{size}(\mathrm{THR}_n^k) \leq 4.5n + o(n)$ for $k \geq 5$ [4].
We get the following improvement: $\operatorname{size}(\mathrm{THR}_n^k) \leq (4.5 - 2^{2-\lceil \log_2 k \rceil})n + o(n)$ for $4 \leq k = O(1)$. In particular, $\operatorname{size}(\mathrm{THR}_n^4) \leq 3.5n + o(n)$ and $\operatorname{size}(\mathrm{THR}_n^k) \leq 4n + o(n)$ for $5 \leq k \leq 8$.

The improved upper bounds are obtained in a semiautomatic fashion: first, we automatically improve a given small circuit with a fixed number of inputs using SAT solvers; then, we generalize it to every input size. For some function families, the second step is already known (for example, given a small circuit for $\mathrm{SUM}_5$, it is not difficult to use it as a building block to design an efficient circuit for $\mathrm{SUM}_n$ for every n; see Section 3.1), though in general it still needs to be done manually.

### 1.2 Related Work

The approach we use in this paper follows the SAT-based local improvement method (SLIM): to improve an existing discrete structure, one goes through all its substructures of a size accessible to a SAT solver. SLIM has been applied successfully to the following structures: branchwidth [16], treewidth [6], treedepth [20], Bayesian network structure learning [21], and decision tree learning [22].

## 2 Program: Feature Overview and Evaluation

The program is implemented in Python. We give a high-level overview of its main features below. All the code shown below can be found in the file tutorial.py at [3]. One may run it after installing a few Python modules. Alternatively, one may run the Jupyter notebook tutorial.ipynb in the cloud (without installing anything) by clicking the "Colab" badge on the repository page [3].

### 2.1 Manipulating Circuits

This is done through the Circuit class. One can load and save circuits as well as print and draw them. A readable layout of a circuit is produced by the pygraphviz module [19]. The program also contains some built-in circuits that can be used as building blocks.
The following sample code constructs a circuit for SUM<sub>5</sub> out of two full adders and one half adder. This construction is shown in Figure 2(a). Then, the circuit is verified via the check\_sum\_circuit method. Finally, the circuit is drawn. As a result, one gets a picture similar to the one in Figure 2(b).

```
# Circuit, add_sum2, add_sum3, and check_sum_circuit are provided by the program (see tutorial.py)
circuit = Circuit(input_labels=['x1', 'x2', 'x3', 'x4', 'x5'])
x1, x2, x3, x4, x5 = circuit.input_labels
a0, a1 = add_sum3(circuit, [x1, x2, x3])  # full adder: x1 + x2 + x3 = a0 + 2*a1
b0, b1 = add_sum3(circuit, [a0, x4, x5])  # full adder: a0 + x4 + x5 = b0 + 2*b1
w1, w2 = add_sum2(circuit, [a1, b1])      # half adder on the two bits of weight 1
circuit.outputs = [b0, w1, w2]
check_sum_circuit(circuit)
circuit.draw('sum5')
```

### 2.2 Finding Efficient Circuits

The class <code>CircuitFinder</code> allows one to check whether there exists a circuit of the required size for a given Boolean function. For example, one may discover the full adder as follows. (The function <code>sum\_n</code> returns the list of the $\lceil \log_2(n+1) \rceil$ bits of the binary representation of the sum of n bits.)

**Figure 2** (a) A schematic circuit for SUM<sub>5</sub> composed of two full adders and one half adder. (b) The corresponding circuit of size 12. (c) An improved circuit of size 11.

This is done by encoding the task as a CNF formula and invoking a SAT solver (via the pysat module [9]). The reduction to SAT is described in [13]. Basically, one translates the statement "there exists a circuit of size s computing a given function $f \colon \{0,1\}^n \to \{0,1\}^m$" into CNF-SAT. To do this, one introduces many auxiliary variables: for example, for every $x \in \{0,1\}^n$ and every $1 \le i \le s$, one uses a variable that is responsible for the value of the i-th gate on the input x. As mentioned in the introduction, the limits of applicability of this approach (for finding a circuit of size s) are roughly the following: for $s \le 6$, it usually works in less than a minute; for $7 \le s \le 12$, it may already take up to several hours or days; for $s \ge 13$, it becomes almost impractical.
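For very small sizes, the same existence question can be answered without any SAT solver, which also shows why one is needed beyond that. The sketch below deliberately swaps the SAT encoding for plain exhaustive search: iterative deepening over straight line programs, with each truth table stored as an integer bitmask over the $2^n$ inputs. It is our own illustration, not part of CircuitFinder, and it assumes n ≥ 2; in pure Python it becomes slow beyond three or four gates.

```python
from itertools import combinations


def var_table(n, i):
    """Truth table of input x_{i+1} as an integer: bit t holds its value on input number t."""
    return sum(1 << t for t in range(2 ** n) if (t >> i) & 1)


def apply_op(op, a, b, full):
    """Apply the binary operation with 4-bit code op (bit 2p+q holds op(p, q))
    to the truth tables a and b, on all 2^n rows at once."""
    na, nb = full ^ a, full ^ b
    result = 0
    if op & 1:
        result |= na & nb
    if op & 2:
        result |= na & b
    if op & 4:
        result |= a & nb
    if op & 8:
        result |= a & b
    return result


def min_circuit_size(n, targets, max_size):
    """Smallest s <= max_size such that some straight line program with s binary
    instructions over x_1, ..., x_n computes every truth table in targets;
    None if there is none. Assumes n >= 2."""
    full = (1 << (2 ** n)) - 1
    inputs = [var_table(n, i) for i in range(n)]

    def search(values, remaining):
        if all(t in values for t in targets):
            return True
        if remaining == 0:
            return False
        # Operands are two distinct earlier values; trying all 16 operations
        # covers both operand orders.
        for a, b in combinations(values, 2):
            for op in range(16):
                g = apply_op(op, a, b, full)
                if g in values:
                    continue  # a gate repeating an existing value is useless
                if search(values + [g], remaining - 1):
                    return True
        return False

    for s in range(max_size + 1):
        if search(inputs, s):
            return s
    return None


x1, x2 = var_table(2, 0), var_table(2, 1)
print(min_circuit_size(2, [x1 ^ x2, x1 & x2], 3))  # half adder (SUM_2): 2
print(min_circuit_size(3, [0b10000001], 3))  # MOD_3^{3,0}, i.e. [x1 = x2 = x3]: 3
```

The second call agrees with conjecture (1), which gives $3 \cdot 3 - 5 - 1 = 3$ for n = 3 and r = 0. The number of programs explored grows multiplicatively with every extra gate, which is exactly the blow-up that the SAT-based approach mitigates.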
The running time may vary a lot for inputs of the same length. In particular, it usually takes much longer to prove that the required circuit does not exist (by proving that the corresponding formula is unsatisfiable). Table 1 reports the running time of this approach for several functions.

**Table 1** The running time of CircuitFinder on various Boolean functions.

| function | circuit size | status | time (sec.) |
|---------------|--------------|--------|-------------|
| $SUM_5$ | 12 | SAT | 141.4 |
| $SUM_5$ | 11 | SAT | 337.8 |
| $MOD_4^{3,0}$ | 7 | SAT | 0.2 |
| $MOD_4^{3,0}$ | 6 | UNSAT | 1178.8 |
| $MOD_4^{3,1}$ | 7 | SAT | 0.2 |
| $MOD_4^{3,1}$ | 6 | UNSAT | 1756.5 |
| $MOD_4^{3,2}$ | 6 | SAT | 0.2 |
| $MOD_4^{3,2}$ | 5 | UNSAT | 12.6 |
| $MOD_5^{3,0}$ | 10 | SAT | 90.1 |
| $MOD_5^{3,1}$ | 9 | SAT | 50.1 |
| $MOD_5^{3,2}$ | 10 | SAT | 74.3 |

### 2.3 Improving Circuits

The method <code>improve\_circuit</code> goes through all subcircuits of a given size of a given circuit and checks, via <code>find\_circuit</code>, whether any of them can be replaced by a smaller subcircuit computing the same function. For example, applying this method to the circuit from Figure 2(b) gives the circuit from Figure 2(c) in a few seconds.

Table 2 shows the time taken by improve\_circuit to improve some of the known circuits for SUM, MOD<sup>3</sup>, and THR<sup>4</sup>. For SUM, we start from known circuits of size about 5n (composed of full adders and half adders). For MOD<sup>3</sup>, we start from circuits of size 3n-4 presented by Demenkov et al. [4]. For THR<sup>4</sup>, we start from circuits of size about 5n (we first compute SUM<sub>n</sub> and then compare the resulting $\log n$-bit integer to 4).

**Table 2** The running time of improve\_circuit on various Boolean functions.

| function | circuit size | time (sec.) |
|--------------------|---------------------|-------------|
| $SUM_5$ | $12 \to 11$ | 6.7 |
| $SUM_7$ | $20 \to 19$ | 5.8 |
| $MOD_6^{3,0}$ | $15 \to 14$ | 17.0 |
| $MOD_6^{3,1}$ | $15 \to 14$ | 17.2 |
| $MOD_6^{3,2}$ | $14 \to 13$ | 16.7 |
| $MOD_7^{3,0}$ | $17 \to 16$ | 31.3 |
| $MOD_7^{3,1}$ | $17 \to 16$ | 33.6 |
| $MOD_7^{3,2}$ | $16 \to 15$ | 30.5 |
| $\mathrm{THR}_5^4$ | $23 \to 10$ | 38.6 |
| $\mathrm{THR}_6^4$ | $28 \to 14$ | 42.1 |
| $\mathrm{THR}_7^4$ | $31 \to 17$ | 43.8 |
| $\mathrm{THR}_8^4$ | $40 \to 22$ | 55.1 |

## 3 New Circuits

In this section, we present new circuits for symmetric functions found with the help of the program. A function $f(x_1, \ldots, x_n)$ is called *symmetric* if its value depends only on $\sum_{i=1}^n x_i$. Symmetric functions are among the most basic Boolean functions:

- to specify an arbitrary Boolean function from $B_n$, one needs to write down its truth table of length $2^n$; symmetric functions allow for a more compact representation: it is enough to specify n+1 bits (one for each of the n+1 values of $\sum_{i=1}^n x_i$);
- the circuit complexity of almost all functions of n variables is exponential ($\Theta(2^n/n)$), whereas any symmetric function can be computed by a circuit of linear size (O(n)).

Despite the simplicity of symmetric functions, we still do not know what optimal circuits look like for most of them. Below, we present new circuits for some of these functions.

### 3.1 Sum Function

The SUM function is a fundamental symmetric function: for any symmetric $f \in B_n$, $\operatorname{size}(f) \le \operatorname{size}(\mathrm{SUM}_n) + o(n)$. The reason for this is that any function from $B_n$ can be computed by a circuit of size $O(2^n/n)$ by the results of Muller [18] and Lupanov [17].
This allows one to compute any symmetric $f(x_1, \ldots, x_n) \in B_n$ as follows: first, compute $\mathrm{SUM}_n(x_1, \ldots, x_n)$ using $\operatorname{size}(\mathrm{SUM}_n)$ gates; then, compute the resulting bit, which is a function of the $l = \lceil \log_2(n+1) \rceil$ output bits of $\mathrm{SUM}_n$, using $O(2^{l}/l) = O(n/\log n) = o(n)$ gates. For the same reason, any lower bound $\operatorname{size}(f) \ge \alpha$ for a symmetric function $f \in B_n$ implies the lower bound $\operatorname{size}(\mathrm{SUM}_n) \ge \alpha - o(n)$. Currently, we know the following bounds for $\mathrm{SUM}_n$: $2.5n - O(1) \le \operatorname{size}(\mathrm{SUM}_n) \le 4.5n + o(n)$. The lower bound is due to Stockmeyer [25], and the upper bound is due to Demenkov et al. [4].

A circuit for SUM<sub>n</sub> can be constructed from circuits for SUM<sub>k</sub> for some small k. For example, using full and half adders as building blocks, one can compute SUM<sub>n</sub> (for any n) by a circuit of size 5n as follows. Start from n bits $(x_1, \ldots, x_n)$ of weight 0. While there are three bits of the same weight k, replace them by two bits of weights k and k+1 using a full adder. This way, one gets at most two bits of each weight $0, 1, \ldots, l-1$ ($l = \lceil \log_2(n+1) \rceil$) using at most 5(n-l) gates (as each full adder reduces the number of bits by one). To leave exactly one bit of each weight, it suffices to use at most l half or full adders (o(n) gates). Let us denote the size of the resulting circuit by s(n). The first row of Table 3 shows the values of s(n) for some $n \le 15$ (see (28) in [11]).

**Table 3** The first row gives the size s(n) of a circuit for $\mathrm{SUM}_n$ composed of half and full adders; the second row shows known upper bounds for $\operatorname{size}(\mathrm{SUM}_n)$ (all of them were known before our work, see (28) in [11]).
| n | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 15 |
|----------------|---|---|---|----|-----------|-----------|-----------|-----------|-----------|-----------|
| s(n) | 2 | 5 | 9 | 12 | 17 | 20 | 26 | 29 | 34 | 55 |
| $\operatorname{size}(\mathrm{SUM}_n)$ | 2 | 5 | 9 | 11 | $\leq 16$ | $\leq 19$ | $\leq 25$ | $\leq 27$ | $\leq 32$ | $\leq 53$ |

In a similar fashion, one can get the upper bound (see Theorem 1 in [14])
$$\operatorname{size}(\mathrm{SUM}_n) \le \frac{\operatorname{size}(\mathrm{SUM}_k)}{k - \lceil \log_2(k+1) \rceil} \cdot n + o(n). \tag{2}$$
This motivates the search for efficient circuits for $\mathrm{SUM}_k$ for small values of k. The bottom row of Table 3 gives upper bounds that we were able to find using the program. The table shows that the first value where s(n) is not optimal is n = 5. The best upper bound for $\mathrm{SUM}_n$ given by (2) is 4.75n + o(n), obtained for k = 7 (as $19/(7-3) = 4.75$). The upper bound for k = 15 is 53n/11 + o(n), which is worse than the previous one. But if it turned out that $\operatorname{size}(\mathrm{SUM}_{15}) \leq 52$, it would give a better upper bound.

The found circuits for $\mathrm{SUM}_n$ for $n \leq 15$ do not allow us to improve the strongest known upper bound $\operatorname{size}(\mathrm{SUM}_n) \leq 4.5n + o(n)$ due to Demenkov et al. [4]. Below, we present several interesting observations on the found circuits.

**Figure 3** (a) Two consecutive SUM<sub>3</sub> blocks. (b) The MDFA block. (c) The highlighted part of the optimal circuit for SUM<sub>5</sub> computes MDFA.

#### 3.1.1 Best Known Upper Bound for the SUM Function

The optimal circuit of size 11 for SUM<sub>5</sub> shown in Figure 2(c) can be used to get the upper bound 4.5n + o(n) for size(SUM<sub>n</sub>) (though not through (2) directly). To do this, consider the two consecutive SUM<sub>3</sub> circuits shown in Figure 3(a). They compute a function $\mathrm{DFA}(x_1, x_2, x_3, x_4, x_5) = (b_0, b_1, a_1)$ (for *double full adder*) such that, for every $x_1, \ldots, x_5 \in \{0, 1\}$, $x_1 + \cdots + x_5 = b_0 + 2(b_1 + a_1)$.
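The defining identity of DFA can be checked exhaustively over all 32 inputs. The sketch below models each SUM<sub>3</sub> block by a size-5 full adder; the helper names `full_adder` and `dfa` are our own, and we assume the same wiring as in the SUM<sub>5</sub> example of Section 2.1 (the second block receives $a_0, x_4, x_5$), which matches the outputs $(b_0, b_1, a_1)$ named in the text.

```python
from itertools import product


def full_adder(p, q, r):
    """SUM_3 as a size-5 straight line program: returns (low bit, carry)."""
    t = p ^ q
    return t ^ r, (p & q) | (t & r)


def dfa(x1, x2, x3, x4, x5):
    """Double full adder: two consecutive SUM_3 blocks, 2 * 5 = 10 gates."""
    a0, a1 = full_adder(x1, x2, x3)
    b0, b1 = full_adder(a0, x4, x5)
    return b0, b1, a1


# x1 + ... + x5 = b0 + 2 * (b1 + a1) on every input
for xs in product(range(2), repeat=5):
    b0, b1, a1 = dfa(*xs)
    assert sum(xs) == b0 + 2 * (b1 + a1)
```

Note that DFA leaves two bits of weight 1, namely $b_1$ and $a_1$; this is what makes it natural to pass pairs of equal-weight bits between blocks.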
Figure 3(a) shows that $\operatorname{size}(\mathrm{DFA}) \leq 10$. One can construct a similar block, called MDFA (for *modified double full adder*), such that
$$\mathrm{MDFA}(x_1 \oplus x_2, x_2, x_3, x_4, x_4 \oplus x_5) = (b_0, a_1, a_1 \oplus b_1),$$
see Figure 3(b). The fact that MDFA uses the encoding $(p, p \oplus q)$ for pairs of bits (p, q) allows one to use it recursively to compute $\mathrm{SUM}_n$. As the original construction is presented in [4], below we give only a sketch.

1. Compute $x_2 \oplus x_3, x_4 \oplus x_5, \dots, x_{n-1} \oplus x_n$ (n/2 gates).
2. Apply at most n/2 MDFA blocks (no more than 4n gates).
3. The last MDFA block outputs two bits: a and $a \oplus b$. Since $a + b = (a \oplus b) + 2(a \wedge b)$, one needs $a \oplus b$ and $a \wedge b$ instead. The former is already available; for the latter, it suffices to apply the operation $x > y = (x \wedge \overline{y})$: $a \wedge b = a > (a \oplus b)$.

Figure 4 shows an example for n = 17. The MDFA block was constructed by Demenkov et al. [4] in a semiautomatic manner. And it turns out that MDFA is just a subcircuit of the optimal circuit for $\mathrm{SUM}_5$! See Figure 3(c).

#### 3.1.2 Best Known Circuits for SUM with New Structure

For many upper bounds from the bottom row of Table 3, we found circuits with the following interesting structure: the first thing the circuit computes is $x_1 \oplus x_2 \oplus \cdots \oplus x_n$; moreover, the variables $x_2, \ldots, x_n$ are used only for this. This is best illustrated by an example, see Figure 5.

**Figure 4** A circuit computing SUM<sub>17</sub> composed of MDFA blocks.

**Figure 5** Optimal circuits computing $\mathrm{SUM}_n$ for n = 3, 4, 5 with a specific structure: every input, except for $x_1$, has out-degree one.

These circuits can be found using the following code. It demonstrates two new useful features: fixing gates and forbidding wires between some pairs of gates.
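As a separate check on the first row of Table 3, the sizes s(n) of the adder-based construction from the beginning of Section 3.1 can be recomputed by simulating it on bit counts alone. The sketch below is our own illustration rather than the program's code. It assumes one particular scheduling: weights are processed from lowest to highest, full adders (5 gates) are applied while a weight has three or more bits, and then at most one half adder (2 gates) is applied to that weight. With this scheduling, it reproduces the values of s(n) listed in Table 3.

```python
FULL_ADDER_SIZE = 5  # gates in SUM_3 (full adder)
HALF_ADDER_SIZE = 2  # gates in SUM_2 (half adder)


def adder_circuit_size(n: int) -> int:
    """Gates used by the full/half-adder circuit for SUM_n, simulated on the
    number of bits of each weight (lowest weight first)."""
    bits = {0: n}  # weight -> number of bits of that weight
    gates = 0
    weight = 0
    while weight in bits:
        # Full adder: three bits of weight k -> one bit of weight k, one of weight k+1.
        while bits[weight] >= 3:
            bits[weight] -= 2
            bits[weight + 1] = bits.get(weight + 1, 0) + 1
            gates += FULL_ADDER_SIZE
        # Half adder: two bits of weight k -> one bit of weight k, one of weight k+1.
        if bits[weight] == 2:
            bits[weight] -= 1
            bits[weight + 1] = bits.get(weight + 1, 0) + 1
            gates += HALF_ADDER_SIZE
        weight += 1
    return gates


print([adder_circuit_size(n) for n in (2, 3, 4, 5, 6, 7, 8, 9, 10, 15)])
# [2, 5, 9, 12, 17, 20, 26, 29, 34, 55]
```

Each weight ends with exactly one bit, so the remaining bits are the outputs $w_0, \ldots, w_{l-1}$. Comparing the printed list with the second row of Table 3 shows where SAT-based improvement starts to pay off (n = 5).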
#### 3.1.3 Optimal Circuits for Counting Modulo 4

The optimal circuit for SUM<sub>5</sub> can be used to construct the optimal circuit of size 2.5n + O(1) for $\mathrm{MOD}_n^{4,r}$ due to Stockmeyer [25] (recall that $\mathrm{MOD}_n^{4,r}(x_1,\ldots,x_n) = [x_1+\cdots+x_n \equiv r \bmod 4]$). To do this, note that the circuit in Figure 2(c) contains a subcircuit of size 9 that computes the two least significant bits $(w_0, w_1)$ of $x_1+\cdots+x_5$ (one removes the gates $g_5$ and $w_2$). To compute $x_1+\cdots+x_n \bmod 4$, one first applies $\frac{n}{4}$ such blocks and then computes the parity of the resulting bits of weight 1 (every block takes four fresh inputs as well as the bit of weight 0 from the previous block). This suffices, since the sum equals the weight-0 bit of the last block plus twice the number of weight-1 bits that equal 1, so its second bit modulo 4 is their parity. The total size is $9 \cdot \frac{n}{4} + \frac{n}{4} = 2.5n$. Thus, the block that Stockmeyer constructed by hand in 1977 to compute $\mathrm{MOD}_n^{4,r}$ can nowadays be found automatically in a few seconds.

### 3.2 Modulo-3 Function

In [13], Kojevnikov et al. presented circuits of size 3n + O(1) for $\mathrm{MOD}_n^{3,r}$ (for any r). Later, Knuth [12, solution to exercise 480] analyzed their construction and proved the upper bound 3n-4. Also, by finding the exact values of $\operatorname{size}(\mathrm{MOD}_n^{3,r})$ for all $3 \le n \le 5$ and all $0 \le r \le 2$, he made conjecture (1). Using our program, we proved the conjectured upper bound for all n.

▶ Theorem 1. For all $n \ge 3$ and all $r \in \{0, 1, 2\}$,
$$\operatorname{size}(\mathrm{MOD}_n^{3,r}) \le 3n - 5 - [(n+r) \equiv 0 \bmod 3].$$

**Proof.** As in [13], we construct the required circuit out of constant size blocks. Schematically, the circuit is a chain of blocks $\mathrm{IN}_k, \mathrm{MID}_3, \ldots, \mathrm{MID}_3, \mathrm{OUT}_l^r$, and the n input bits are fed to the blocks from above. What is passed from block to block (from left to right) is a pair of bits $(r_0, r_1)$ encoding the current remainder c modulo 3 as follows: if c = 0, then $(r_0, r_1) = (0, 0)$; if c = 1, then $(r_0, r_1) = (0, 1)$; if c = 2, then $r_0 = 1$.
The first block $\mathrm{IN}_k$ takes the first k input bits and computes the remainder of their sum modulo 3. It is followed by a number of MID<sub>3</sub> blocks, each of which takes the current remainder and three new input bits and computes the new remainder. Finally, the block $\mathrm{OUT}_l^r$ takes the remainder and the last l input bits and outputs $\mathrm{MOD}_n^{3,r}$. The integers k and l take values in $\{2, 3, 4\}$ and $\{1, 2, 3\}$, respectively. Their exact values depend on r and n mod 3, as described below. The theorem follows from the following upper bounds on the circuit size of the functions just introduced: $\operatorname{size}(\mathrm{IN}_2) \leq 2$, $\operatorname{size}(\mathrm{IN}_3) \leq 5$, $\operatorname{size}(\mathrm{IN}_4) \leq 7$, $\operatorname{size}(\mathrm{MID}_3) \leq 9$, $\operatorname{size}(\mathrm{OUT}_2^0) \leq 5$, $\operatorname{size}(\mathrm{OUT}_1^1) \leq 2$, $\operatorname{size}(\mathrm{OUT}_3^2) \leq 8$. The corresponding circuits are presented in [15] as straightforward Python code that verifies their correctness. (The presented code proves the mentioned upper bounds by providing explicit circuits. We have also verified that no smaller circuits exist, meaning that the inequalities above are in fact equalities.) Table 4 shows how to combine the blocks to get a circuit computing $\mathrm{MOD}_n^{3,r}$ of the required size.

**Table 4** Choosing the parameters k, m, l depending on $n \bmod 3$ and r. The circuit is composed of blocks as follows: $\mathrm{IN}_k + m \times \mathrm{MID}_3 + \mathrm{OUT}_l^r$. For each pair $(n \bmod 3, r)$, we show three things: the triple (k, m, l); the sizes of two blocks, $\operatorname{size}(\mathrm{IN}_k)$ and $\operatorname{size}(\mathrm{OUT}_l^r)$; and the size of the resulting circuit, computed as $s = \operatorname{size}(\mathrm{IN}_k) + 9m + \operatorname{size}(\mathrm{OUT}_l^r)$.
For example, the top left cell is read as follows: when r = 0 and n = 3t, we set k = 4, m = t - 2, l = 2; the resulting circuit is then $\mathrm{IN}_4 + (t - 2) \times \mathrm{MID}_3 + \mathrm{OUT}_2^0$; since $\operatorname{size}(\mathrm{IN}_4) = 7$ and $\operatorname{size}(\mathrm{OUT}_2^0) = 5$, the size of the circuit is 7 + 9(t - 2) + 5 = 9t - 6 = 3n - 6. There are three corner cases that are not well-defined, as they require the number of MID blocks to be negative (m = t - 2 < 0): (n = 3, r = 0), (n = 3, r = 2), and (n = 4, r = 2).

| | n = 3t | n = 3t + 1 | n = 3t + 2 |
|-------|---------------------------|---------------------------|---------------------------|
| r = 0 | (4, t-2, 2), (7, 5), 3n-6 | (2, t-1, 2), (2, 5), 3n-5 | (3, t-1, 2), (5, 5), 3n-5 |
| r = 1 | (2, t-1, 1), (2, 2), 3n-5 | (3, t-1, 1), (5, 2), 3n-5 | (4, t-1, 1), (7, 2), 3n-6 |
| r = 2 | (3, t-2, 3), (5, 8), 3n-5 | (4, t-2, 3), (7, 8), 3n-6 | (2, t-1, 3), (2, 8), 3n-5 |

### 3.3 Threshold Function

Recall that $\mathrm{THR}_n^2(x_1,\ldots,x_n) = [x_1 + \cdots + x_n \ge 2]$.

▶ Theorem 2. For any $k \ge 4$,
$$\operatorname{size}(\mathrm{THR}_{n}^{k}) \le (4.5 - 2^{2 - \lceil \log_{2} k \rceil})(n + 2^{\lceil \log_{2} k \rceil} - k) + o(n).$$

**Proof.** For a sequence of 2m formal variables $y_1, z_1, \ldots, y_m, z_m$, consider the function $g \in B_{2m}$ that takes $y_1, y_1 \oplus z_1, y_2, y_2 \oplus z_2, \ldots, y_m, y_m \oplus z_m$ as input and outputs $\mathrm{THR}^2_{2m}(y_1, z_1, \ldots, y_m, z_m)$. Note that $\mathrm{THR}^2_{2m}(y_1, z_1, \ldots, y_m, z_m) = 1$ iff there is a pair containing two 1's or there are two pairs each containing at least one 1. If no pair contains two 1's, then the pair $(y_i, z_i)$ contains a 1 iff $y_i \oplus z_i = 1$. Hence, $\mathrm{THR}^2_{2m}(y_1, z_1, \ldots, y_m, z_m) = 1$ iff there exists $1 \leq i \leq m$ such that $y_i = z_i = 1$, or $\mathrm{THR}^2_m(y_1 \oplus z_1, \ldots, y_m \oplus z_m) = 1$. The condition $y_i = z_i = 1$ can be computed from $y_i$ and $y_i \oplus z_i$ using a single binary gate: $(y_i \wedge z_i) = (y_i \wedge \overline{(y_i \oplus z_i)})$.
Thus,
$$g(y_1, y_1 \oplus z_1, \dots, y_m, y_m \oplus z_m) = \mathrm{THR}_m^2(y_1 \oplus z_1, \dots, y_m \oplus z_m) \vee \bigvee_{i=1}^m \bigl(y_i \wedge \overline{(y_i \oplus z_i)}\bigr).$$
Now, $\operatorname{size}(\mathrm{THR}_m^2) \leq 2m + o(m)$, as shown by Dunne [5]. Also, clearly,
$$\operatorname{size}\Bigl(\bigvee_{i=1}^{m} \bigl(y_i \wedge \overline{(y_i \oplus z_i)}\bigr)\Bigr) \leq 2m-1.$$
Thus (counting one more gate for the final disjunction),
$$\operatorname{size}(g) \le 4m + o(m). \tag{3}$$
To construct a circuit for $\mathrm{THR}_n^k$, first consider the case $k = 2^t$, where $t \ge 2$ is an integer. Apply t - 1 layers of MDFAs (as in Figure 4). This takes
$$\frac{n}{2} + n \sum_{i=1}^{t-1} 2^{2-i} = (4.5 - 2^{3-t})n$$
gates (the term n/2 accounts for the initial XORs, and the i-th layer uses $n/2^{i+1}$ MDFA blocks of 8 gates each). As a result, we get bits $w_0, \ldots, w_{t-2}, a_1, a_1 \oplus b_1, \ldots, a_m, a_m \oplus b_m$, where $m = n/2^t$, such that
$$x_1 + \dots + x_n = w_0 + 2w_1 + \dots + 2^{t-2}w_{t-2} + 2^{t-1}(a_1 + b_1 + \dots + a_m + b_m).$$
Note that $w_0 + 2w_1 + \cdots + 2^{t-2}w_{t-2} < 2^{t-1}$. Hence,
$$[x_1 + \dots + x_n \ge 2^t] = [a_1 + b_1 + \dots + a_m + b_m \ge 2].$$
Thus, it remains to compute the function g given the 2m bits $a_1, a_1 \oplus b_1, \ldots, a_m, a_m \oplus b_m$. By (3), this takes $4m + o(m) = 2^{2-t}n + o(n)$ gates. The total size of the constructed circuit is
$$(4.5 - 2^{3-t} + 2^{2-t})n + o(n) = (4.5 - 2^{2-t})n + o(n).$$
Now, assume that $2^{t-1} < k < 2^t$ (hence $\lceil \log_2 k \rceil = t$). Clearly,
$$[x_1 + \dots + x_n \ge k] = [(2^t - k) + x_1 + \dots + x_n \ge 2^t].$$
By the previous argument, there exists a circuit C computing $\mathrm{THR}_{n+(2^t-k)}^{2^t}$ of size
$$(4.5 - 2^{2-t})(n + (2^t - k)) + o(n) = (4.5 - 2^{2-\lceil \log_2 k \rceil})(n + 2^{\lceil \log_2 k \rceil} - k) + o(n).$$
By replacing arbitrary $(2^t - k)$ inputs of C by 1's, one gets a circuit computing $\mathrm{THR}_n^k$.

▶ Corollary 3. For $4 \le k = O(1)$, $\operatorname{size}(\mathrm{THR}_n^k) \le (4.5 - 2^{2-\lceil \log_2 k \rceil})n + o(n)$.
In particular, $\operatorname{size}(\mathrm{THR}_n^4) \le 3.5n + o(n)$ and $\operatorname{size}(\mathrm{THR}_n^k) \le 4n + o(n)$ for $5 \le k \le 8$.

We conclude by presenting an example of a reasonably small circuit that our program fails to improve, although a better circuit is known. Figures 6 and 7 show circuits of size 31 and 29 for $\mathrm{THR}_{12}^2$. The two circuits are structurally very different, so our program is not able to find out that the circuit of size 31 is suboptimal. The code below shows how one can construct the two circuits in the program.

```
# naive circuit for THR^2_12 (Figure 6)
c = Circuit(input_labels=[f'x{i}' for i in range(1, 13)], gates={})
c.outputs = add_naive_thr2_circuit(c, c.input_labels)
c.draw('thr2naive')

# Dunne's construction on a 3 x 4 table of inputs (Figure 7)
c = Circuit(input_labels=[f'x{i}' for i in range(1, 13)], gates={})
c.outputs = add_efficient_thr2_circuit(c, c.input_labels, 3, 4)
c.draw('thr2efficient')
```

**Figure 6** A circuit of size 31 for $\mathrm{THR}_{12}^2$. … bit among n input bits; then, it remains to compute the disjunction of the remaining n-1 bits to check whether there is at least one 1 among them. In general, this leads to a circuit of size 3n-5.

**Figure 7** A circuit of size 29 for $\mathrm{THR}_{12}^2$: (a) block structure and (b) gate structure. It implements a clever trick by Dunne [5]. Organize the 12 input bits into a $3 \times 4$ table. Compute the disjunctions $r_1, r_2, r_3$ of the rows and the disjunctions $c_1, c_2, c_3, c_4$ of the columns. Then, there are at least two 1's among $x_1, \ldots, x_{12}$ if and only if there are at least two 1's among either $r_1, r_2, r_3$ or $c_1, c_2, c_3, c_4$. This allows one to proceed recursively. In general, this leads to a circuit of size 2n + o(n). (Sergeev [23] recently showed that the monotone circuit size of $\mathrm{THR}_n^2$ is $2n + \Theta(\sqrt{n})$.)

## 4 Further Directions

We focus mainly on proving asymptotic upper bounds for symmetric function families (that is, upper bounds that hold for every input size). A natural next step is to apply the program to specific circuits that are used in practice.
It would also be interesting to extend the program so that it is able to discover the circuit from Figure 7. Finally, it would be interesting to generalize the circuits for $\mathrm{SUM}_n$ presented in Section 3.1.2.

#### References

- 1 https://github.com/berkeley-abc/abc.
- 2 Joshua Brakensiek, Marijn Heule, John Mackey, and David Narváez. The resolution of Keller's conjecture. In Nicolas Peltier and Viorica Sofronie-Stokkermans, editors, Automated Reasoning, 10th International Joint Conference, IJCAR 2020, Paris, France, July 1–4, 2020, Proceedings, Part I, volume 12166 of Lecture Notes in Computer Science, pages 48–65. Springer, 2020. doi:10.1007/978-3-030-51074-9\_4.
- 3 https://github.com/alexanderskulikov/circuit\_improvement.
- 4 Evgeny Demenkov, Arist Kojevnikov, Alexander S. Kulikov, and Grigory Yaroslavtsev. New upper bounds on the Boolean circuit complexity of symmetric functions. *Inf. Process. Lett.*, 110(7):264–267, 2010. doi:10.1016/j.ipl.2010.01.007.
- 5 Paul E. Dunne. Techniques for the analysis of monotone Boolean networks. PhD thesis, University of Warwick, 1984.
- 6 Johannes Klaus Fichte, Neha Lodha, and Stefan Szeider. SAT-based local improvement for finding tree decompositions of small width. In Serge Gaspers and Toby Walsh, editors, Theory and Applications of Satisfiability Testing, SAT 2017, 20th International Conference, Melbourne, VIC, Australia, August 28–September 1, 2017, Proceedings, volume 10491 of Lecture Notes in Computer Science, pages 401–411. Springer, 2017. doi:10.1007/978-3-319-66263-3\_25.
- 7 Magnus Gausdal Find, Alexander Golovnev, Edward A. Hirsch, and Alexander S. Kulikov. A better-than-3n lower bound for the circuit complexity of an explicit function. In Irit Dinur, editor, IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9–11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pages 89–98. IEEE Computer Society, 2016. doi:10.1109/FOCS.2016.19.
- 8 Winston Haaswijk, Alan Mishchenko, Mathias Soeken, and Giovanni De Micheli. SAT based exact synthesis using DAG topology families. In *Proceedings of the 55th Annual Design Automation Conference, DAC 2018, San Francisco, CA, USA, June 24–29, 2018*, pages 53:1–53:6. ACM, 2018. doi:10.1145/3195970.3196111.
- 9 Alexey Ignatiev, António Morgado, and João Marques-Silva. PySAT: A Python toolkit for prototyping with SAT oracles. In Olaf Beyersdorff and Christoph M. Wintersteiger, editors, Theory and Applications of Satisfiability Testing, SAT 2018, 21st International Conference, SAT 2018, Held as Part of the Federated Logic Conference, FLoC 2018, Oxford, UK, July 9–12, 2018, Proceedings, volume 10929 of Lecture Notes in Computer Science, pages 428–437. Springer, 2018. doi:10.1007/978-3-319-94144-8\_26.
- 10 http://www-cs-faculty.stanford.edu/~knuth/programs.html.
- 11 Donald E. Knuth. The Art of Computer Programming, Volume 4, Fascicle 0: Introduction to Combinatorial Algorithms and Boolean Functions (Art of Computer Programming). Addison-Wesley Professional, 1st edition, 2008.
- 12 Donald E. Knuth. The Art of Computer Programming, Volume 4, Fascicle 6: Satisfiability. Addison-Wesley Professional, 1st edition, 2015.
- 13 Arist Kojevnikov, Alexander S. Kulikov, and Grigory Yaroslavtsev. Finding efficient circuits using SAT-solvers. In Oliver Kullmann, editor, Theory and Applications of Satisfiability Testing, SAT 2009, 12th International Conference, SAT 2009, Swansea, UK, June 30–July 3, 2009. Proceedings, volume 5584 of Lecture Notes in Computer Science, pages 32–44. Springer, 2009. doi:10.1007/978-3-642-02777-2\_5.
- 14 Alexander S. Kulikov. Improving circuit size upper bounds using SAT-solvers. In Jan Madsen and Ayse K. Coskun, editors, 2018 Design, Automation & Test in Europe Conference & Exhibition, DATE 2018, Dresden, Germany, March 19–23, 2018, pages 305–308. IEEE, 2018. doi:10.23919/DATE.2018.8342026.
- 15 Alexander S. Kulikov, Danila Pechenev, and Nikita Slezkin. SAT-based circuit local improvement. CoRR, abs/2102.12579, 2021. arXiv:2102.12579.
- 16 Neha Lodha, Sebastian Ordyniak, and Stefan Szeider. A SAT approach to branchwidth. *ACM Trans. Comput. Log.*, 20(3):15:1–15:24, 2019. doi:10.1145/3326159.
- 17 Oleg Lupanov. A method of circuit synthesis. Izvestiya VUZov, Radiofizika, 1:120–140, 1959.
- 18 David E. Muller. Complexity in electronic switching circuits. *IRE Transactions on Electronic Computers*, EC-5:15–19, 1956.
- 19 https://pygraphviz.github.io/.
- 20 Vaidyanathan Peruvemba Ramaswamy and Stefan Szeider. MaxSAT-based postprocessing for treedepth. In Helmut Simonis, editor, Principles and Practice of Constraint Programming, 26th International Conference, CP 2020, Louvain-la-Neuve, Belgium, September 7–11, 2020, Proceedings, volume 12333 of Lecture Notes in Computer Science, pages 478–495. Springer, 2020. doi:10.1007/978-3-030-58475-7\_28.
- 21 Vaidyanathan Peruvemba Ramaswamy and Stefan Szeider. Turbocharging treewidth-bounded Bayesian network structure learning. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2–9, 2021, pages 3895–3903. AAAI Press, 2021. URL: https://ojs.aaai.org/index.php/AAAI/article/view/16508.
- 22 André Schidler and Stefan Szeider. SAT-based decision tree learning for large data sets. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2–9, 2021, pages 3904–3912. AAAI Press, 2021. URL: https://ojs.aaai.org/index.php/AAAI/article/view/16509.
- 23 Igor Sergeev. On monotone circuit complexity of threshold Boolean functions. *Diskretnaya Matematika*, 32:81–109, 2020. doi:10.4213/dm1547.
- 24 Mathias Soeken, Heinz Riener, Winston Haaswijk, Eleonora Testa, Bruno Schmitt, Giulia Meuli, Fereshte Mozafari, and Giovanni De Micheli. The EPFL logic synthesis libraries, November 2019. arXiv:1805.05121v2.
- 25 Larry J. Stockmeyer. On the combinational complexity of certain symmetric Boolean functions. Mathematical Systems Theory, 10:323–336, 1977. doi:10.1007/BF01683282.
- 26 Ingo Wegener. The complexity of Boolean functions. Wiley-Teubner, 1987. URL: http://ls2-www.cs.uni-dortmund.de/monographs/bluebook/.