We propose novel algorithms for design and design space exploration. The designs computed by these algorithms are compositions of function types specified in component libraries. Our algorithms reduce the design problem to quantified satisfiability and use advanced solvers to find solutions that represent useful systems.
Introduction
Design is a next frontier in artificial intelligence. Providing algorithms and tools for conceiving novel designs benefits many areas such as analog and digital chip design, software development, mechanical design, and systems engineering. Human designers will be assisted in better navigating complex trade-offs such as speed versus number of transistors versus heat dissipation in an Integrated Circuit (IC). Users will choose from a richer base of trade-offs and this will lead to dramatic improvements in micro-electronics and computing.
Computation, representation, and tools have improved tremendously over the last decades, so one can now consider systematic enumeration of the design space. This paper provides a novel encoding scheme for efficient exploration of the design space of digital circuits.
The algorithms presented in this paper are more computationally intensive compared to heuristic search (Hansen and Zhou, 2007) and genetic algorithms (Miller et al., 2000) but provide sound and complete enumeration of the design space. Our algorithms exhaustively "prove" that certain designs can or cannot be made of k components where components are drawn from an arbitrary component library.
Traditional books on digital design, for example, teach the construction of a full-subtractor with seven components (Maini, 2007) and we found one with five gates only. The five and seven component version of the subtractor will have the same number of transistors but there are other technologies (such as 3-D, or quantum) where the five-component version will have smaller footprint and faster propagation times.
As a special case, the circuit generation algorithm presented in this paper reduces to circuit minimization, but its performance should not be compared to that of other optimization algorithms such as Quine-McCluskey (McCluskey, 1956) or Espresso (Brayton et al., 1984). To illustrate the generality of our approach, we have used it to design a reversible quantum circuit of minimal size (Nielsen and Chuang, 2010).
Modern satisfiability (SAT) theory (Biere et al., 2009) is widely used in research and in industry. There are SAT solvers that can solve industrial problems with millions of variables (Järvisalo et al., 2012). The algorithms in this paper enumerate circuit designs by solving Quantified Boolean Formulas (QBFs). QBF satisfiability is a generalization of satisfiability of propositional formulas in which universal and existential quantifiers are allowed. The QBFs that our algorithms generate are of interest to designers of quantified satisfiability (QSAT) algorithms, as there is always a need for benchmarks with practical applications (Janota et al., 2016).
The algorithms of this paper are validated on an extensive benchmark of combinational circuits with more than seventy successful experiments. We have designed generators of combinational circuits of various sizes, such as adders, multipliers, and multiplexers. These circuits are the basic building blocks of Arithmetic Logic Units and Field Programmable Gate Arrays (FPGAs). In addition, we consider four digital Integrated Circuits (ICs) from the well-known 74XXX family. We show that our QBF-based circuit generation algorithm is multiple orders of magnitude faster than a graph-based generate-and-test algorithm.
Design Generation and Exploration
Technical designs materialize from requirements, specifications, and the designers' experience. The design process is iterative, with versions continuously being improved and refined. Incomplete designs often do not meet the requirements, and designers "debug" and fix them. Designers often create multiple alternatives for the users and builders to choose from. The latter process is called design exploration.
A design is typically specified in some kind of requirements. Depending on the design domain, the requirements can be a mechanical blueprint, an electrical diagram, algorithmic pseudo-code or human readable text. To automate the generation and enumeration of designs, which is the main goal of this paper, we need some formal specification of a function or a design itself. Figure 1 illustrates the design generation process. The process is usually supported by Computer Aided Design (CAD) tools, Artificial Intelligence (AI), and combinatorial optimization algorithms. In some cases it is possible to consider the whole design space and completely exhaust the search. Complete algorithms for design and design exploration are the subject of this paper.
The information flow in solving a design problem is shown in Figure 2. The component library (basis) is specified as a set of Boolean functions. An automated procedure then generates a regular fabric of configurable components and topological interconnections (wires). The configurable fabric is appended to the user requirements, which are also specified as a Boolean circuit or a Boolean function. The result is a miter: a formula that checks for Boolean function equivalence. The miter formula is fed to a QBF solver, which computes a certificate that contains the configuration of the fabric. The final design is constructed from the certificate of the miter formula. There is only one computationally intensive step in generating a design: solving the QBF miter formula. Deciding the satisfiability of a QBF is relevant to both satisfiability and game theory and is a prototypical PSPACE-complete problem (Garey and Johnson, 1990).
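Consider an arbitrary QBF formula:

$$Q_1 x_1\, Q_2 x_2 \ldots Q_n x_n\; \phi(x_1, x_2, \ldots, x_n),$$

where $Q_1, Q_2, \ldots, Q_n$ are either existential (∃) or universal (∀) quantifiers. It can be decided whether a formula is true by iteratively "unpeeling" the outermost quantifier until no quantifiers remain. If we condition on the value of the variable bound by the first quantifier, we have:

$$A = Q_2 x_2 \ldots Q_n x_n\; \phi(\bot, x_2, \ldots, x_n), \qquad B = Q_2 x_2 \ldots Q_n x_n\; \phi(\top, x_2, \ldots, x_n).$$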
The formula is then reduced to A ∧ B if $Q_1$ is ∀ and to A ∨ B if $Q_1$ is ∃. This process of recursive formula evaluation resembles a game in which alternating quantifier types force the solver to switch between looking for a primal and a dual solution of the formula ϕ. The recursive procedure suggested above is inefficient. Modern QBF solvers use advanced search methods based on DPLL (Ayari and Basin, 2002), knowledge compilation such as OBDD (Coste-Marquis et al., 2005), conflict learning, and even machine learning (Samulowitz and Memisevic, 2007). Some QBF solvers cater to a subclass of QBF formulas such as 2-QBF, where there is only one switch between existential and universal quantifiers.
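The recursive "unpeeling" procedure can be sketched in a few lines of Python. This is only an illustration of the naive evaluation described above, not the method used by any of the cited solvers; the names `eval_qbf`, `prefix`, and `matrix` are hypothetical, and the quantifier tags "A" and "E" are an assumed encoding of ∀ and ∃.

```python
def eval_qbf(prefix, matrix, env=None):
    """Decide a closed prenex QBF by peeling off the outermost quantifier.

    prefix: list of (quantifier, variable) pairs, "A" = forall, "E" = exists
    matrix: function mapping a dict {variable: bool} to bool
    """
    env = dict(env or {})
    if not prefix:
        return matrix(env)
    (quantifier, var), rest = prefix[0], prefix[1:]
    a = eval_qbf(rest, matrix, {**env, var: False})  # branch A: variable set to false
    b = eval_qbf(rest, matrix, {**env, var: True})   # branch B: variable set to true
    return (a and b) if quantifier == "A" else (a or b)


# The matrix x <-> not y, evaluated under two quantifier orders.
differs = lambda e: e["x"] != e["y"]
forall_exists = eval_qbf([("A", "x"), ("E", "y")], differs)
exists_forall = eval_qbf([("E", "y"), ("A", "x")], differs)
```

The two calls differ only in the order of the quantifiers, which is enough to change the truth value. Both branches are always explored, so the running time is exponential in the number of variables.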
Looking deeper, the QBF solving process resembles the high-level generate and test process of design. Although it is not trivial to reduce design generation and exploration to solving a QBF, in this paper we manage to do that and use the advances in QBF solving to discover novel circuits or circuit topologies.
Fundamental Concepts
Definitions 1-3 are directly adopted from Vollmer (2013) and formally introduce the notions of a Boolean function and a Boolean circuit.
Notice that, while in Vollmer (2013) a Boolean function has a single output, we do not have this restriction. Another difference is that we do not use function families, i.e., all our objects are finite.
Some common Boolean functions are negation (¬), disjunction (∨), conjunction (∧), exclusive or (⊕), implication (→), and equivalence (↔). This paper uses infix, as opposed to prefix, notation throughout. For example, p ∨ q is used instead of ∨(p, q).
We also use equivalence (↔) instead of the equal sign (=) to specify Boolean functions. The function output is on the left while the inputs are on the right. For example, the Boolean function r = p ∨ q is written as r ↔ p ∨ q. When there are multiple outputs, we give a formula for each one of them. Figure 3 shows the Boolean function f ↔ (x ∧ ¬y) ∨ (¬x ∧ y) as a tree. Notice that only the leaf nodes are variables while all non-leaf nodes are operators.
Definition 2 (Basis). A basis B is defined as a finite set of Boolean functions.
Later in this section we will discuss the fine differences between a Boolean circuit and a Boolean function as the two concepts are similar in many ways. One of the most important differences is that circuits use bases while functions do not. A basis B can be thought of as the elementary unit of sharing or as an abstract component library. Unlike in the real world, though, each basis function can be used infinitely many times and all functions in a basis have the same cost. Figure 4 shows a basis consisting of typical unary and binary Boolean functions.
Figure 4: The standard basis (recovered gate labels: Buffer, AND, OR, XOR).

Figure 5 shows bases with multi-input/multi-output components. Figure 5a shows a basis consisting of two multi-output functions. They implement the Fredkin and the Toffoli gates (Fredkin and Toffoli, 1982; Toffoli, 1980). These gates, also known as CSWAP and CCNOT gates, have applications in reversible and quantum computing. Figure 5b shows a basis that contains one component only: a one-bit comparator. Sorting networks are made of chains of comparators. Proving lower bounds on the number of comparators necessary for building a k-input sorting network is an ongoing challenge (Codish et al., 2014). The methods described in this paper provide novel means for the optimal design and analysis of sorting networks.
It is possible to construct an "if-then-else" basis from the function shown in Figure 6 and the two Boolean constants (⊤ and ⊥). If a circuit uses this basis and there is no fan-in or fan-out, then the problem of synthesizing minimal Binary Decision Diagrams (Akers, 1978) can be cast as circuit design. It is also possible to work with higher-level components. In the design of an Arithmetic-Logic Unit (ALU), for example, one can consider a basis extending the standard gates with multi-bit adders, multipliers, barrel shifters, etc.
Definition 3 (Boolean Circuit). Given a basis B, a Boolean circuit C over B is defined as $C = \langle V, E, \alpha, \beta, \chi, \omega \rangle$, where $\langle V, E \rangle$ is a finite directed acyclic graph, $\alpha : E \to \mathbb{N}$ is an injective function, $\beta : V \to B \cup \{\star\}$, $\chi : V \to \{x_1, x_2, \ldots, x_n\} \cup \{\star\}$, and $\omega : V \to \{y_1, y_2, \ldots, y_m\} \cup \{\star\}$. The following conditions must hold:
[...] Boolean function (i.e., a Boolean constant) in B;
3. For every $i$, $1 \le i \le n$, there is exactly one node $v \in V$ such that $\chi(v) = x_i$;
4. For every $i$, $1 \le i \le m$, there is exactly one node $v \in V$ such that $\omega(v) = y_i$.
The function α determines the ordering of the edges that go into a Boolean function when the ordering matters (such as in implication). The function α is not necessary if B consists of symmetric functions only.
The function β determines the type of each node in the circuit: a function in the basis B. The function χ specifies the set of input nodes $\{x_1, x_2, \ldots, x_n\}$.
The function ω specifies the set of output nodes $\{y_1, y_2, \ldots, y_m\}$. A node v is non-output, or computational, if χ(v) = ⋆ and ω(v) = ⋆. Figure 7 shows a simple and frequently used circuit that adds the two one-bit binary numbers $i_1$ and $i_2$ and a carry input bit $c_i$. The output is found in the sum bit Σ and in the carry output $c_o$. Notice that there are two identical subcircuits in Figure 7. These are the two half-adders.
[Figure 7: full-adder circuit; the two subcircuits are labeled "Half-Adder".]

The circuits shown in Figure 7 and Figure 8 use the standard basis. They are used as running examples for the rest of the paper.
The main difference between Boolean functions and circuits is that function sharing is only supported in circuits. It is possible and straightforward to convert a circuit to an equivalent Boolean function, but the number of operators in the Boolean function is often larger than the number of gates in the circuit. The full-adder shown in Figure 7, for example, requires at least six operators:

$$\Sigma \leftrightarrow (i_1 \oplus i_2) \oplus c_i, \qquad c_o \leftrightarrow (i_1 \wedge i_2) \vee ((i_1 \oplus i_2) \wedge c_i).$$
The XOR gate that adds i 1 and i 2 is used both in calculating the sum Σ and the carry-output bit c o . In some pathological cases, the blow-up can be exponential. The other direction is trivial: all Boolean functions are also circuits.
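To make the effect of sharing concrete, the sketch below contrasts a five-gate full-adder circuit with a six-operator formula pair and checks both against the arithmetic specification. The gate arrangement is the textbook two-half-adder design and is assumed to match Figure 7; all function names are illustrative.

```python
from itertools import product


def full_adder_circuit(i1, i2, ci):
    """Five gates: the first XOR is computed once and reused."""
    h = i1 ^ i2        # XOR gate, shared by sum and carry
    s = h ^ ci         # XOR gate, sum output
    c1 = i1 & i2       # AND gate
    c2 = h & ci        # AND gate
    co = c1 | c2       # OR gate, carry output
    return s, co


def full_adder_formulas(i1, i2, ci):
    """Tree-shaped formulas: i1 XOR i2 is written twice, six operators in total."""
    s = (i1 ^ i2) ^ ci
    co = (i1 & i2) | ((i1 ^ i2) & ci)
    return s, co


def full_adder_spec(i1, i2, ci):
    """Arithmetic specification: (sum bit, carry bit) of i1 + i2 + ci."""
    total = i1 + i2 + ci
    return total & 1, total >> 1


all_inputs = list(product((0, 1), repeat=3))
circuit_ok = all(full_adder_circuit(*x) == full_adder_spec(*x) for x in all_inputs)
formula_ok = all(full_adder_formulas(*x) == full_adder_spec(*x) for x in all_inputs)
```

Both versions compute the same function; the only difference is whether the intermediate value `h` is named once or recomputed.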
Sometimes we would like to talk about how the nodes in a circuit are connected, without concerning ourselves with the exact function of each node. This is referred to as the topology of a circuit.
Definition 4 (Topology). Given a circuit $C = \langle V, E, \alpha, \beta, \chi, \omega \rangle$, the topology of C is defined as the sub-tuple $G = \langle V, E, \chi, \omega \rangle$ of C.
The graph in Figure 9 shows the topology of the full-adder circuit shown in Figure 7. There are three types of nodes: the input nodes $i_1$ and $i_2$, the internal nodes that correspond to gates, and the output nodes Σ and $c_o$. The purpose of this paper is to provide algorithms for circuit synthesis and not for Boolean function synthesis. We focus on circuits because they are readily implemented on silicon and in most cases require a small number of transistors.
Component Selection Problems and the Universal Component Cell
Suppose we are given a basis B, a topology $G = \langle V, E, \chi, \omega \rangle$, and a requirements circuit ψ. The purpose of our first algorithm is, given B, G, and ψ, to create a circuit ϕ such that ϕ ≡ ψ.
Consider the full-adder from Figure 7 as the requirements circuit ψ. Obtaining the topology G from ψ is trivial, as the circuit topology is a sub-tuple of the circuit (see Definition 4). Let B be the standard basis shown in Figure 4. Given that the requirements circuit itself uses B, there exists at least one full-adder that uses the standard basis: the requirements circuit ψ itself. It is the trivial solution. We will see that there also exist multiple non-trivial solutions. Figure 10 shows an alternative, non-trivial implementation ϕ of the full-adder ψ with gates different from the ones in Figure 7. Instead of using two AND-gates, two XOR-gates, and an OR-gate, the alternative implementation consists of two identical subsystems, each one containing an OR-gate and an XNOR-gate. The final carry output bit is computed by an AND-gate.
Figure 10: An alternative implementation of a full adder
We can think of the circuit shown in Figure 10 as a symmetric equivalent of the circuit shown in Figure 7. In what follows, we present an algorithm that computes and counts these symmetric circuit alternatives. This algorithm, based on QBF, is surprisingly efficient. We will see in the empirical results of Section 8 that circuits implementing common arithmetic and logical operations have many "deep" symmetries.
Problem 1 (Component Selection Problem). Given a basis B, a topology $G = \langle V, E, \chi, \omega \rangle$, and requirements ψ, construct a circuit $\phi = \langle V, E, \alpha, \beta, \chi, \omega \rangle$ such that ϕ ≡ ψ.
Problem 1 is concerned with finding the type of each component in ϕ, that is, with automatically specifying the functions α and β. The corresponding design exploration problem is to count all possible circuit implementations.
Problem 2 (Counting Component Selection Configurations). Given a basis B, a topology $G = \langle V, E, \chi, \omega \rangle$, and requirements ψ, count the number of distinct circuits $\phi = \langle V, E, \alpha, \beta, \chi, \omega \rangle$ such that ϕ ≡ ψ.
A naïve approach to solving Problems 1 and 2 is to consider all possible combinations of component types. There is, of course, the need to perform an equivalence check for each combination of components and there are exponentially many combinations. Equivalence checking is a coNP-hard problem but it is often easy in practice (Matsunaga, 1996) . The problem of equivalence checking has been largely solved either by using compilation to Ordered Binary Decision Diagrams (OBDDs) as proposed by Bryant (1986) or through resolution methods (Marques-Silva and Glass, 1999) . Despite the practical ease of equivalence checking, solving any instance of Problem 1 would still require an exponential number of coNP-hard calls.
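The naive approach can be illustrated on the full-adder topology. The sketch below enumerates every assignment of gate types to the five internal nodes and keeps those that are equivalent to the specification. The six-gate `BASIS` is an assumption made for the example (the exact content of the standard basis is given only in Figure 4), and the wiring is the usual two-half-adder topology.

```python
from itertools import product

# Assumed basis of two-input gates, chosen for illustration only.
BASIS = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "XOR":  lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR":  lambda a, b: 1 - (a | b),
    "XNOR": lambda a, b: 1 - (a ^ b),
}


def run_topology(types, i1, i2, ci):
    """Evaluate the fixed full-adder topology with the given gate types."""
    g1, g2, g3, g4, g5 = (BASIS[t] for t in types)
    h = g1(i1, i2)    # first half-adder, "sum" position
    c1 = g2(i1, i2)   # first half-adder, "carry" position
    s = g3(h, ci)     # second half-adder, "sum" position
    c2 = g4(h, ci)    # second half-adder, "carry" position
    return s, g5(c1, c2)


def spec(i1, i2, ci):
    """Requirements: (sum bit, carry bit) of i1 + i2 + ci."""
    total = i1 + i2 + ci
    return total & 1, total >> 1


inputs = list(product((0, 1), repeat=3))
# One equivalence check (over all 8 input vectors) per candidate configuration.
solutions = [types for types in product(BASIS, repeat=5)
             if all(run_topology(types, *x) == spec(*x) for x in inputs)]
```

With five nodes and six gate types there are 6^5 = 7776 candidates, each requiring an equivalence check; `len(solutions)` is the answer to the counting problem for this small instance. The exponential growth of the candidate set is what motivates the QBF encoding.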
The main idea behind our approach for solving Problems 1 and 2 is the universal component cell.
The Universal Component Cell
The universal component cell is a Boolean circuit that can be configured to perform as any of the functions in a basis B. It is shown in Figure 11. All algorithms presented in this paper construct the cells dynamically, depending on the content of the basis B. The main idea behind the universal component cell is to multiplex all inputs and demultiplex all outputs. The circuit shown in Figure 11 routes all input and output wires depending on its configuration. The configuration is a binary value assigned to a vector of selector lines S. The number of selector inputs is $|S| = \lceil \log_2 n \rceil$, where n is the number of distinct component types. Figure 12b shows a multiplexer of variable size and Figure 12a shows a demultiplexer of variable size. When constructing the cells, we take special care if the components in B have different numbers of inputs and outputs and if
The special care is that we augment the miter circuit with gates that "disable" these hanging wires.
Suppose there are n alternative gates and $|S| = \lceil \log_2 n \rceil$ selector lines. Both the demultiplexer shown in Figure 12a and the multiplexer shown in Figure 12b need n multi-input AND-gates and |S| inverters. All AND-gates have |S| + 1 inputs. The multiplexer also uses an OR-gate with n inputs. The space complexity of both circuits is $O(|S| \times n)$ when multi-input gates are realized with ladders of two-input ones.
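The multiplexer structure described above can be simulated directly. The following sketch is a simplified software model of a universal cell for two-input, one-output gates: it evaluates every candidate gate and multiplexes the outputs, rather than routing inputs and outputs separately as in Figure 11. The function names and the three-gate basis are assumptions made for the example.

```python
from math import ceil, log2


def selector_width(n_types):
    """Number of selector lines needed to address n_types components."""
    return ceil(log2(n_types)) if n_types > 1 else 0


def mux(selectors, data):
    """n-way multiplexer: one (|S|+1)-input AND term per data line, then an OR."""
    out = 0
    for index, d in enumerate(data):
        term = d
        for bit_pos, s in enumerate(selectors):
            wanted = (index >> bit_pos) & 1
            term &= s if wanted else 1 - s   # selector line or its inverter
        out |= term
    return out


def universal_cell(components, selectors, a, b):
    """Evaluate every candidate gate and let the selectors pick one output."""
    return mux(selectors, [gate(a, b) for gate in components])


BASIS = [lambda a, b: a & b, lambda a, b: a | b, lambda a, b: a ^ b]  # AND, OR, XOR
width = selector_width(len(BASIS))
# Selector value 2 (least significant bit first: [0, 1]) picks the third gate, XOR.
as_xor = [universal_cell(BASIS, [0, 1], a, b) for a in (0, 1) for b in (0, 1)]
as_and = [universal_cell(BASIS, [0, 0], a, b) for a in (0, 1) for b in (0, 1)]
```

In this model an unused selector code (here, value 3) yields a constant 0 output; how such codes are handled in the actual encoding is not specified in the text.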
An Efficient QBF-Based Algorithm
In what follows we reduce Problem 1 to finding a satisfiable solution of a QBF problem. Most QBF solvers, in addition to determining whether a given QBF is satisfiable, also compute a partial certificate: an assignment to the variables in the outermost quantifier that satisfies or invalidates the formula. We use this assignment for constructing the solution of our problem. The circuit whose partial certificate is a solution of Problem 1 is shown in Figure 13. The two subcircuits shown in Figure 13 illustrate the concept of a miter (Brand, 1993). The miter is constructed from the requirements circuit ψ and an interconnected topology of universal cells. The topology structure $\langle V, E \rangle$ is given as an input (we will relax this assumption in Sec. 6).
The miter is used for equivalence checking. The basic idea of a miter is to pairwise tie all inputs and outputs of the two circuits together and to check for satisfiability. The resulting inputs are $X = \{x_1, x_2, \ldots, x_n\}$ and the outputs are $Y = \{y_1, y_2, \ldots, y_m\}$.
The subcircuit on the left side of Figure 13 has universal component cells only. The selector lines of all universal component cells make up the variable set S. The solution of Problem 1 is an assignment to all S-variables. All internal variables of the requirements circuit ψ and all internal variables of the universal component cells go in the variable set Z.
Algorithm 1: ConfigurationCounter(B, ψ)
Input:  B, set of Boolean functions, basis
        $\psi = \langle V, E, \alpha, \beta, \chi, \omega \rangle$, Boolean circuit, requirements
Output: count, integer, number of configurations
The circuit that contains the universal cells and the requirements is constructed by the CreateMiter subroutine of Algorithm 1. The function copies the target circuit ψ under a new name φ, ties together each pair of corresponding primary inputs and outputs, and replaces all components in φ with universal cells. Each universal cell switches between components in B.
The QBF solvers we use accept a Conjunctive Normal Form (CNF) in Prenex Normal Form (PNF). The augmented formula miter is converted to CNF by the CircuitToCNF subroutine.
The method we use for circuit to CNF conversion introduces one ancillary Tseitin variable (Tseitin, 1983) per circuit gate and uses the naïve approach of Forbus and de Kleer (1993). Each gate is converted to CNF by iteratively applying De Morgan's laws and distributing disjunction over conjunction. Efficient circuit to CNF conversion is an open problem and an active area of research (Manolios and Vroon, 2007), and it greatly affects the performance of the subsequent satisfiability checking.
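As an illustration of the per-gate encoding, the sketch below lists the clauses that result for an AND gate and an XOR gate once an output variable has been introduced for the gate. Literals are signed integers in the DIMACS style, which is an assumed representation; the helper names are hypothetical, and the clause sets are the standard ones rather than output copied from the authors' CircuitToCNF subroutine.

```python
from itertools import product


def tseitin_and(out, a, b):
    """Clauses for out <-> (a AND b)."""
    return [[-out, a], [-out, b], [out, -a, -b]]


def tseitin_xor(out, a, b):
    """Clauses for out <-> (a XOR b)."""
    return [[-out, a, b], [-out, -a, -b], [out, -a, b], [out, a, -b]]


def satisfied(clauses, value):
    """value maps a variable number to True/False."""
    return all(any(value[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


# Check both encodings against the gate semantics on all eight assignments.
A, B, OUT = 1, 2, 3
and_ok = all(
    satisfied(tseitin_and(OUT, A, B), {A: a, B: b, OUT: o}) == (o == (a and b))
    for a, b, o in product((False, True), repeat=3)
)
xor_ok = all(
    satisfied(tseitin_xor(OUT, A, B), {A: a, B: b, OUT: o}) == (o == (a != b))
    for a, b, o in product((False, True), repeat=3)
)
```

Each gate contributes a constant number of clauses, so the CNF grows linearly with the number of gates in the miter.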
The typical miter approach uses XOR gates to compare outputs. The two functions are different if and only if the miter is satisfiable. This is dual to using XNOR gates and checking for validity. It is also possible, instead of using output XNOR gates, to simply tie each pair of corresponding output wires together. A wired OR whose output is left floating is logically equivalent to an XNOR gate connected to ⊤. This achieves the same result but uses 2 × |OUT| fewer variables.
Brute-Force Circuit Enumeration
One can think of Boolean circuit synthesis as having two aspects: (i) coming up with a topology G and (ii) determining the type of each node in G. Algorithm 1 solves only (ii). Arguably, (i) is the more difficult part, and in general both (i) and (ii) must be solved simultaneously. In this section we combine Algorithm 1 with an exhaustive search over all possible topologies of a certain size.
Circuit design is an optimization problem: the objective is to minimize some property such as primary input to output propagation time or power (if the circuit is implemented electrically). The optimization criterion depends on the use-case. The main goal of this paper is to minimize the complexity of the circuit, i.e., the number of components.
Problem 3 (Optimal Circuit Design). Given a basis B and a target circuit ψ, compute a circuit $\phi = \langle V, E, \alpha, \beta, \chi, \omega \rangle$ such that ϕ ≡ ψ and no other circuit over B that is equivalent to ψ has fewer components than ϕ.
A circuit topology itself has two aspects: (i) how components are connected with each other and (ii) how components interface with the outside world in terms of primary inputs and outputs (χ and ω, respectively). This gives rise to a class of graphs that have three types of nodes: (i) primary inputs X, (ii) primary outputs Y, and (iii) internal nodes Z. It is assumed that each primary input node $x \in X$ is connected to one or more internal nodes $Z' \subseteq Z$. Each primary output $y \in Y$ is connected to a distinct internal node or a primary input $r \in X \cup Z$. Our first approach to solving Problem 3 is to exhaustively enumerate all possible topologies up to a certain size. The topology that has the fully-connected graph is denoted as K. A fully-connected topology where the primary inputs, outputs, and internal nodes are partitioned is denoted as $K_{|X|,|Y|,|Z|}$, where |X| is the number of primary inputs, |Y| is the number of primary outputs, and |Z| is the number of internal nodes. A circuit topology of size $|V| = |X| + |Y| + |Z|$ is always a subgraph of $K_{|X|,|Y|,|Z|}$. We can skip circuit topologies where two primary outputs are tied together.
Algorithm 2: ExhaustiveSearch(ψ)
Input:  B, set of Boolean functions, basis
        $\psi = \langle V, E, \alpha, \beta, \chi, \omega \rangle$, Boolean circuit, requirements
Output: count, integer, number of circuits
The number of circuit topologies of a certain size grows rapidly. The number of directed edges in $K_{m,n,k}$ is $|E| = mk + nk + k(k - 1) = k(m + n + k - 1)$. This results in a total of $2^{|E|}$ subsets. Consider the topology of the full adder with three primary inputs and two primary outputs. The first six elements of the series $|2^{T_{3,k,2}}|$ are $2^{5}$, $2^{12}$, $2^{21}$, $2^{32}$, $2^{45}$, and $2^{60}$.
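The edge-count formula is easy to check numerically. The short sketch below evaluates it for the full-adder case (three primary inputs, two primary outputs) with one to six internal nodes; `edge_count` is an illustrative helper name.

```python
def edge_count(m, n, k):
    """Directed edges in the fully connected topology with m primary inputs,
    n primary outputs, and k internal nodes."""
    return m * k + n * k + k * (k - 1)


# Exponents of the number of candidate edge subsets for k = 1..6 internal nodes.
exponents = [edge_count(3, 2, k) for k in range(1, 7)]
subset_counts = [2 ** e for e in exponents]
```

Here `exponents` evaluates to `[5, 12, 21, 32, 45, 60]`, so already at six internal nodes there are 2^60 edge subsets to consider before any isomorphism reduction.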
Algorithm 2 is the simplest method for circuit enumeration. It is guaranteed to terminate as there are upper-bounds for the number of components and for the number of subgraphs of K. Algorithm 2 is also guaranteed to generate a design if all components in ψ correspond to Boolean functions in the basis B.
Algorithm 2 computes designs of minimal size. The reason for that is that first all topologies with one internal node are tried, then all topologies with two nodes, etc.
Algorithm 2 solves Problem 1 for each candidate topology $\langle V', E', \chi, \omega \rangle$. The number of invocations of the QBF solver can be significantly reduced if we consider non-isomorphic graphs only. There is no analytic approach to enumerating all non-isomorphic graphs of size k; this is a research problem in its own right. We use the algorithms of McKay and Piperno (2014), the world leaders in graph enumeration.
Full Reduction to QSAT
The brute-force algorithm of Sec. 5 is too slow. It is possible to encode the whole circuit generation, both component selection and topology generation, as a single QBF satisfiability problem. The difficulty of generating a circuit is then left entirely to the QBF solver. The approach is shown in Algorithm 3.
Algorithm 3: ReduceQSAT(B, ψ, n)
Input:  B, set of Boolean functions, basis
        ψ, Boolean circuit, requirements
        n, integer, maximal number of components
Output: count, integer, number of circuits
Similar to Algorithm 2, Algorithm 3 first tries candidate circuits with one component, then with two, and so on, until an equivalent circuit is discovered. The CreateCellArray subroutine in line 3 adds k universal component cells as described in Sec. 4. All inputs of the k components are in $X_c$ and all outputs are in $Y_c$. The selector variables for the types of components are in $S_c$.
Next, the two nested loops in line 3 create a configurable component matrix by placing tristate buffers in a two-dimensional grid. A tristate buffer is a gate that acts like a switch. In propositional logic, it is modeled as the formula s → (o ↔ i), where s is the selector switch, o is the output, and i is the input. The concept and the name are borrowed from electronics, where a tristate buffer can either act as a regular buffer or have its output switched to a high-impedance mode.
The fully connected graph of tristate buffers is represented as an adjacency matrix (as opposed to an adjacency list; Sedgewick, 2002) that takes $n^2$ space, where n is the number of nodes in the topology. Notice that there is a row in the matrix for each of the universal component cells' outputs and for each primary input. Similarly, there is a column for each input of a component and for each of the primary outputs. The matrix of tristate buffers, which models the Boolean circuit used for obtaining the topology graph, is shown in Figure 15. As above, each tristate buffer is modeled as the propositional formula s → (o ↔ i), where s is the enable signal, i is the input, and o is the output.
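The tristate-buffer formula s → (o ↔ i) translates into two clauses, (¬s ∨ ¬o ∨ i) and (¬s ∨ o ∨ ¬i). The sketch below builds these clauses with signed-integer literals (an assumed, DIMACS-like representation) and checks them against the formula on every assignment; the helper names are illustrative.

```python
from itertools import product


def tristate_clauses(s, i, o):
    """CNF of s -> (o <-> i): when enabled, the output must equal the input."""
    return [[-s, -o, i], [-s, o, -i]]


def satisfied(clauses, value):
    """value maps a variable number to True/False."""
    return all(any(value[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


S, I, O = 1, 2, 3
clauses = tristate_clauses(S, I, O)
agrees = all(
    satisfied(clauses, {S: s, I: i, O: o}) == ((not s) or (o == i))
    for s, i, o in product((False, True), repeat=3)
)
```

When the enable variable is false both clauses are satisfied regardless of the input and output values, which is the propositional counterpart of the high-impedance mode.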
The circuit shown in Figure 15 combined with k component cells is not enough for a QDPLL solver to find a digital circuit. The reason is in the "don't care" variables. There is a trivial solution which has all tristate buffers disabled or in "high impedance". No primary input is then connected to a primary output. The φ circuit has no wires. This configuration satisfies the QBF formula but does not produce a "valid" circuit. The same situation occurs when the tristate buffers in the connectivity matrix are connected in such a way that inputs and outputs in φ are disconnected. This situation can occur due to wire loops. The component types, then, do not matter. To solve this problem Algorithm 3 imposes extra constraints on the connectivity matrix.
To ensure a valid circuit with each QDPLL call, the MakeConstraints subroutine of Algorithm 3 appends the following constraints to the formula φ:
Cyclic Graph ($T_1$): The tristate buffers above the main diagonal of the component connectivity matrix are all disabled. This constraint imposes a strict ordering on the components and ensures that the outputs of each component are connected to the inputs of a successor component only. Instead of disabling all tristate buffers it is easier and more efficient to simply remove them.
Direct Primary Input to Output ($T_2$): These constraints disable tristate buffers in the lower-right corner block of the connectivity matrix. They ensure that there are no primary inputs connected directly to primary outputs.
Hanging Component Output ($T_3$): An "at-least-one" constraint is added for each row of the connectivity matrix. Each constraint is converted to a single CNF clause. The $T_3$ constraints ensure that there are no floating component outputs.
Hanging Component Input ($T_4$): There is an "at-least-one" constraint for each column of the connectivity matrix. This ensures that there are no floating component inputs.
Clashing Component Outputs ($T_5$): There is an "at-most-one" constraint for each column of the connectivity matrix. The $T_5$ constraints ensure that no two component outputs are tied together. This does not prevent an output from forking out via a wired-or junction (a row-wise "at-most-one" constraint would).
Unbalanced Universal Component Cell Ports ($T_6$): The universal component cell, introduced in Sec. 4, does not necessarily combine components with the same number of inputs and outputs. Consider a basis B = {¬, ∧}. The universal component cell will have two inputs and an output. When the multiplexers and demultiplexers are configured to choose the inverter, there will be a hanging input (it does not matter which one). The $T_6$ constraints prevent other components or primary inputs from being connected to this hanging input.

The constraint types fall into two categories. The first one consists of $T_1$, $T_3$, $T_4$, and $T_6$ and ensures the validity of the resulting circuit. The second consists of $T_2$ and $T_6$ and improves the performance of the QDPLL search.

Figure 17 shows the result of running Algorithm 3 with the standard basis and the full-subtractor shown in Figure 8 as a target. The generated circuit has only five components while the target one has seven. This is a substantial saving. Another circuit designed by Algorithm 3 is the reversible adder/subtractor shown in Figure 18. This circuit, using one CCNOT and three CSWAP gates, has two constant inputs and two garbage outputs ($u_1$ and $u_2$). The five-input sorting network shown in Figure 19 is computed by Algorithm 3, configured with a basis containing a comparator only. Figure 20 shows a classical full-adder implemented with NAND-gates only. The design is the classical one in which the two identical half-adder subsystems are visible. There also exists a full-adder design implemented with NOR-gates only. It has the same topology as the one shown in Figure 20.
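The "at-least-one" and "at-most-one" constraints on the connectivity matrix can be generated mechanically. The sketch below is a minimal illustration under stated assumptions: each tristate buffer gets one enable variable, numbered row by row; an "at-least-one" group becomes a single clause, as described for $T_3$; and "at-most-one" uses the pairwise encoding, which is a choice made for the example and not necessarily the one used in MakeConstraints. The removal of buffers required by $T_1$ and $T_2$ is not modeled.

```python
from itertools import combinations


def at_least_one(literals):
    """One clause: some enable variable in the group is true."""
    return [list(literals)]


def at_most_one(literals):
    """Pairwise encoding: no two enable variables in the group are both true."""
    return [[-a, -b] for a, b in combinations(literals, 2)]


def connectivity_variables(n_rows, n_cols, first_var=1):
    """One enable variable per tristate buffer, numbered row by row."""
    return [[first_var + r * n_cols + c for c in range(n_cols)]
            for r in range(n_rows)]


def row_and_column_clauses(matrix):
    """At-least-one per row (as in T3) and per column (as in T4)."""
    clauses = []
    for row in matrix:
        clauses += at_least_one(row)
    for col in zip(*matrix):
        clauses += at_least_one(col)
    return clauses


matrix = connectivity_variables(2, 3)      # [[1, 2, 3], [4, 5, 6]]
clauses = row_and_column_clauses(matrix)
pairwise = at_most_one(matrix[0])
```

The pairwise "at-most-one" encoding needs a quadratic number of clauses per group, which is acceptable for the small matrices considered here.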
Computational Complexity
Historically, building arguments about the complexity of problems like the one presented in this paper, has been difficult. Problem 1, for example, resembles diagnostics and we could attempt a reduction from logic-based abduction, the complexity of which has been found by Eiter and Gottlob (1995) . The more interesting problem, however, is circuit generation (see Problem 3), and there, we use almost a reformulation of the Minimum Equivalent Expression (MEE) problem.
In our formulation the basis is specified as an input to the computational problem, which brings additional difficulty to the complexity argument.
Proof. For a Boolean formula ϕ with n literals, there exists an O(n) reduction from the Minimum Equivalent Expression (MEE) problem over the signature {∨, ∧, ¬}. The MEE problem is classified as L22 in the polynomial-time hierarchy compendium (Schaefer and Umans, 2002) and is shown to be in $\Sigma_2^P$ by Buchfuhrer and Umans (2011).
The MEE problem asks if, given a Boolean formula ϕ and a constant k there exists a formula ψ for which ψ ≡ ϕ, and |ψ| < k. The number of literals in ψ is denoted as |ψ|. The circuit generation problem concerns generation of circuits with a minimal number of components. For a basis B = {¬, ∧, ∨}, the number of literals in the Boolean formula equivalent to the generated circuit is equal to the number of literals.
Theorem 2 (Complexity of Circuit Generation). The unrestricted circuit generation problem is either in $\Sigma_2^P$ or in $\Sigma_3^P$.
Proof. The lower bound on the worst-case complexity comes from Theorem 1. The upper bound comes from Algorithm 3, as it solves the problem by reducing it to an ∃∀∃ QBF.
Reduction from a problem with known complexity to unrestricted circuit design seems difficult. Theorem 2 places the worst-case complexity of the problem in one of two levels. We expect the problem to be in $\Sigma_2^P$ because further restricting the basis does not make the problem easier. Having a basis with a NAND-gate only is equivalent to DNF minimization, which is also in $\Sigma_2^P$ (Umans, 2001). Solving a $\Sigma_2^P$ problem with a reduction to a $\Sigma_3^P$ problem may seem suboptimal, but our goal is to apply QBF in practice to VHDL and Verilog design, and we are interested in the average complexity of industrial problems. There are many useful applications of SAT solvers where the problems are solved by unit propagation only.
One possible approach to solving the unrestricted circuit generation problem with a $\Sigma_2^P$ reduction is to construct a ∀∃ QBF formula, look for non-satisfiability, and extract the circuit design from the partial certificate. This approach, however, requires additional and less intuitive constraints.
Experiments
All algorithms in this article are implemented as Python modules. We have compared two award-winning (Janota et al., 2016) QBF solvers: RAReQS (Janota et al., 2012) and DepQBF (Lonsing and Egly, 2017) . Both are written in pure C. The QCNF input to RAReQS and DepQBF has been preprocessed with Bloqqer (Biere et al., 2011) . The preprocessing step works by eliminating unnecessary clauses and variables. It performs several other optimizations as well. This gives significant speed-up.
All experiments were run on a small cluster with eight nodes. Each node has two Intel® Xeon® E5520 CPUs. Each CPU contains four cores with two threads per core and runs at a clock frequency of 2.26 GHz. Each node is equipped with 16 GiB of RAM.
Target Circuit Benchmarks
We have experimented with two sets of benchmark circuits. The first one is a scalable synthetic set of combinational arithmetic circuits (see Table 1 ). The size of each of the eight synthetic circuits, described in Table 1 , can be varied by setting a parameter n. Each variable-size circuit shares the same topology. The carry and borrow mechanisms of adders and subtractors, for example, have bus-like topology, while the adder networks of the multipliers resemble two-dimensional meshes.
| Name | Description | Role of the independent parameter n |
|---|---|---|
| n-add | Full-adder | Number of inputs in one of the addends, carry input is not counted |
| n-sub | Full-subtractor | Number of inputs in the subtrahend, borrow input is not counted |
| n-mux | Multiplexer | Number of input bits to be multiplexed, selectors are not counted |
| n-demux | Demultiplexer | Number of output bits |
| n-cmp | Comparator | Number of bits in one of the terms |
| n-shift | Barrel-shifter | Number of input bits to be shifted, selectors are not counted |
| n-vecadd | Multi-operand adder | Number of input bits to be added |
| n-mul | Multiplier | Number of input bits in the multiplicand |

Table 1: n-bit synthetic circuits

Table 2 gives the number of primary inputs and outputs, and the number of components as a function of the size parameter n. Some of the circuits use a proxy parameter k to avoid the use of logarithms.
Notes

The adder, shown in Figure 21a, and the subtractor, shown in Figure 21b, are both ripple-carry. Due to the long propagation of the carry, they are not used in the design of modern ICs. Used as a target design, and with a sufficiently fast QBF solver, Algorithm 3 should be able to enumerate all parallel adders and subtractors. An example of a real-world four-bit adder with carry look-ahead design is the 74283 IC, which is discussed later. The multiplexer and demultiplexer architectures are the same as in Figure 12. They can be generated for an arbitrarily sized input/output word.
The n-bit comparator, shown in Figure 22 , uses n XNOR gates to check for equality, and inverters and AND-gates to check for "greater than". The "less than" signal is derived from the other two outputs with the help of an OR-gate and another inverter.
Barrel-shifters are used for shifting or rotating the bits in a bit-word and have important applications in the design of Floating-Point Units (FPUs) and cryptography cores. Figure 23 shows a variable-size barrel-shifter. It shifts the input word to the right, losing the least-significant bits.
The barrel-shifter shown in Figure 23 uses a cascade of multiplexers with two inputs and one output. The amount of shifting is specified as a binary number on the selector lines $s_1, s_2, \ldots, s_n$. The total number of multiplexers is $2^n \times n$. There are some multiplexers with an input tied to ground on each column of the array shown in Figure 23. We have $2^{n-1}$ such multiplexers per column, where n is the column number. Each such multiplexer loses an AND-gate and an OR-gate. This reduces the number of gates as accounted for in Table 2. All multiplexers of a barrel-shifter reuse the same n inverters. The inverters are not shown in Figure 23.

The n-vecadd circuit, shown in Figure 24, adds n single-bit numbers. A digital circuit that implements multi-operand addition is useful as a stand-alone circuit and also has application in multipliers (Wallace, 1964). Multi-operand addition of single-bit numbers is also known as bit-counting or binary vector addition. Applications of satisfiability to optimization use bit-counting for implementing "at-least-k" or "at-most-k" constraints (Fu and Malik, 2006).
The multi-operand adder is implemented as a chain of multi-operand full-adders (see Figure 24a). Each full-adder adds one bit to a binary number and consists of k half-adders, where k equals the number of bits necessary for representing the binary number (see Figure 24a). The full-adders can be implemented without a carry-out bit, which saves one AND-gate. The multi-operand adder uses full-adders of increasing size. The first adder has one input, the second and third have two inputs, the next four have three inputs, etc.
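The chained construction can be checked with a small behavioural simulation. The sketch below is an illustration only, not the netlist of Figure 24: it models each stage as a chain of half-adders that adds one bit to a running count, and keeps the optional carry-out only when the count may need one more bit. The function names and the little-endian bit order are assumptions of this sketch.

```python
def half_adder(a, b):
    """Return (sum, carry) of two bits: one XOR-gate and one AND-gate."""
    return a ^ b, a & b


def half_adder_chain(count_bits, x):
    """Add the single bit x to a little-endian binary number.

    The carry ripples through one half-adder per bit of the number.
    Returns the new bits and the carry-out of the last half-adder.
    """
    out = []
    carry = x
    for b in count_bits:
        s, carry = half_adder(b, carry)
        out.append(s)
    return out, carry


def vecadd(inputs):
    """Count the ones in `inputs` with a ladder of half-adder chains.

    Returns the count as a little-endian list of bits.
    """
    acc = []  # running count, least-significant bit first
    for i, x in enumerate(inputs, start=1):
        acc, carry = half_adder_chain(acc, x)
        # After i inputs the count is at most i. The carry-out is kept
        # only when i needs more bits than the chain currently has;
        # otherwise it is always 0 and the final AND-gate can be dropped.
        if len(acc) < i.bit_length():
            acc.append(carry)
    return acc


if __name__ == "__main__":
    bits = vecadd([1, 0, 1, 1, 0, 1])
    print(bits, sum(b << j for j, b in enumerate(bits)))
```

The chain widths grow only when the number of inputs seen so far reaches a power of two, which mirrors the "increasing size" of the stages described above.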
This particular implementation of a multi-operand adder has no application in digital electronics due to the long propagation time from primary inputs to outputs, but it is useful in constraint programming. The chained multi-operand adder can be used as a target circuit to allow the automatic discovery of advanced topologies such as the ones in Wallace or Dadda trees (Wallace, 1964; Dadda, 1965).

Figure 25 shows the architecture of a variable-size multiplier that implements the standard "pen and paper" method. The multiplier consists of two subsystems: an array of AND-gates that computes partial products (see Figure 25a) and a network of adders that sums the partial products (see Figure 25b).

Table 3 shows the second set of circuits. These circuits come from reverse-engineered netlists of real-world ICs (Hansen et al., 1999). The 74XXX circuits can be chained together into larger Arithmetic Logic Units (ALUs).

[Figure: (a) A half-adder chain with an optional carry-out bit; (b) a ladder of half-adder chains for multi-operand addition]

Figure 26 shows the average QDPLL performance for the synthetic benchmark QBF problems generated by Alg. 1. The time-to-solution depends on the topology of the target circuit and, to some extent, on the choice of the QDPLL solver.
The plots in Figure 26 have logarithmic vertical axes to accommodate the exponential time-to-solution. Contrary to our intuition, the multiplier circuit is not the most difficult one and the full-adder is not the easiest. The performance is best for the demultiplexer, no matter which QBF solver has been used. In general, the QDPLL performance is better for large fan-outs. This can be explained with less back-tracking when there are more outputs.
Circuit Generation
The performance of Alg. 3 is summarized in Table 5. The table columns are the same as in Table 4. The scalability follows trends similar to those in Figure 26. This leads us to conclude that the gate selection part of the QSAT search is important for the overall speed of the circuit search.

RAReQS always performed better than DepQBF. This is due to the specific structure of the QBF reductions: a large outermost existential variable group, a small number of universally quantified variables and, again, a large innermost group of existentially quantified variables.
Let us denote by κ the ratio of the number of innermost variables to the number of clauses in the CNF reduction of a problem. We notice that for the 74XXX circuits shown in Table 4, the variable-to-clause ratio is between 4.18 and 4.4. For the synthetic experiments shown in Figure 26, it is between 2.78 and 4.13, and for the ones in Table 5, we measure 4.13 ≤ κ ≤ 4.65. We are aware of SAT-like phase transitions (Russell and Norvig, 2003) in QBF (Cadoli et al., 1998), and given the suspiciously close values of κ to 4.3 and the fact that most of the QSAT solvers use DPLL search, we think that there are similar phenomena in play. As a result, we will focus our future work on finding encodings and constraints that avoid problematic values of κ.
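A ratio of this kind can be read directly off a solver input file. The sketch below is a hedged illustration, not the measurement script used for the experiments: it assumes the instance is in QDIMACS format with one clause per line, takes the innermost group to be the last quantifier line of the prefix, and computes κ as stated in the text (innermost variables divided by clauses). The function name `innermost_ratio` is hypothetical.

```python
def innermost_ratio(qdimacs_text):
    """Ratio of innermost-block variables to clauses of a QDIMACS instance.

    Assumptions of this sketch: comment lines start with 'c', the problem
    line with 'p', quantifier lines with 'e' or 'a', and every remaining
    non-empty line holds exactly one clause terminated by 0.
    """
    innermost = []
    n_clauses = 0
    for line in qdimacs_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] in ("c", "p"):
            continue
        if tokens[0] in ("e", "a"):
            # Each quantifier line overwrites the previous one, so after
            # the prefix has been read this holds the innermost block.
            innermost = [t for t in tokens[1:] if t != "0"]
        else:
            n_clauses += 1
    if n_clauses == 0:
        raise ValueError("instance has no clauses")
    return len(innermost) / n_clauses


if __name__ == "__main__":
    # Tiny hand-written instance with an exists-forall-exists prefix.
    sample = """c tiny example
p cnf 4 2
e 1 0
a 2 0
e 3 4 0
-1 2 3 0
1 -3 4 0
"""
    print(innermost_ratio(sample))
```

Computing the ratio on the preprocessed file rather than on the raw encoding may give different values, since preprocessing removes clauses and variables.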
Related Work
Circuit design is related to diagnostic reasoning (de Kleer and Williams, 1987) . Consider Problem 1 and Algorithm 1. The target circuit ψ can be thought of as an observation. Instead of augmenting ψ to create φ, as done in Algorithm 1, we can augment the buggy system description. The failure modes are "mistaken gate identity", for example, the modeler has used an AND-gate in place of an OR-gate. Algorithm 1 then computes minimal changes in the system description that explain the observed circuit.
The General Diagnostic Engine (GDE) of de Kleer and Williams (1987) can diagnose wiring errors and generate topology. When the problem is reduced to QBF, however, it is easier to avoid "don't cares" by universally quantifying the primary inputs. Combined with the "connect to successor components only" constraint (see Sec. 6), our approach is more efficient in avoiding loops and exploring the design space.
Some of the motivation for our work comes from Arthur and Polak (2006). The authors of this work show that the evolutionary design of a multi-bit adder takes significantly fewer steps than anticipated. This "ease" made us attempt a complete algorithm on a seemingly very difficult problem.
The problem of circuit synthesis was first introduced by Roth and Karp (1962). The authors use a very early computer, an IBM 7090, to solve decomposition problems of four variables in approximately ten minutes. For larger problems they propose a heuristic that sacrifices the completeness of the algorithm. Our QBF algorithm, on the other hand, could solve problems of more than 30 variables. This was, of course, done on computers that are orders of magnitude faster, but we expect that the difficulty of the synthesis/decomposition problems is at least in the second level of the polynomial hierarchy (Stockmeyer, 1977). Another distinct advantage of our algorithm is that the synthesis/decomposition is in terms of multi-output Boolean functions, while the paper of Roth and Karp (1962) supports single-output functions only.
The use of the ∃∀∃-quantified miter has been proposed for FPGA synthesis (Ling et al., 2005) . This paper, however, addresses the component placement problem only and does not consider wiring, routing, and topology. Our paper demonstrates that the combined placement/routing problem can also be solved with a single QBF call and, thus, we have provided a fully automatic solution to the circuit synthesis problem.
Discussion
Modern digital designs such as the Pentium CPUs have millions of components. All algorithms in this paper are far from being able to synthesize and enumerate such designs. Large Integrated Circuits (ICs), however, are far from being optimal at the top-level. Companies that make digital circuits integrate subsystems with the designer of each subsystem focusing on the integrity and optimality of his or her own subsystem. This results in globally suboptimal designs that also have bugs, vulnerabilities and inefficiencies.
The problems we have defined are of industrial interest and create a benchmark that is useful in the QBF competition (Janota et al., 2016). If accepted, the benchmark will help the QBF community to create faster QBF solvers that have practical application. This can be achieved by exploiting the structure of the circuit design problems.
We can, at any time, sacrifice completeness and turn the algorithms proposed in this paper into heuristic or stochastic ones. The easiest way to do that is to replace the complete QBF search with a stochastic one (Gent et al., 2003).
The algorithms in this paper can be adapted to analog designs and designs with state. The electronic designs that pose the biggest challenge and are of significant practical and theoretical interest are hybrid. It is possible for our synthesis algorithms to work on analog designs by using QBF modulo theory solvers. These are similar to satisfiability modulo theory solvers (Barrett and Tinelli, 2018) but do not exist at the time of writing. The theories can be Ordinary Differential Equations (ODEs) or Differential Algebraic Equations (DAEs). Similarly, the algorithms of this paper can work for geometric and physical designs with QBF modulo Partial Differential Equations (PDEs).
Conclusion
This paper proposes a novel and generic solution to the problem of circuit design and exploration. The problem of generating a circuit that is equivalent to a goal is solved similarly to how electronic and logic designers solve it: first the components are chosen and placed, and second they are connected with wires. We have given empirical evidence that the complexity of the problem is determined, to a large extent, by the component selection part.
We have proposed a reduction to QBF for solving a difficult problem. We believe that this is the first practical sound and complete algorithm for circuit design and enumeration. The built-in heuristics, compilation and learning in the QBF solvers give us several orders of magnitude improvement over a baseline graph generation algorithm.
Our method is more generic than anything proposed in the literature as it considers arbitrary component libraries, such as ones consisting of reversible gates.
