We investigate the algorithmic properties of circuits of bounded treewidth. Here the treewidth of a circuit C is defined as the treewidth of the underlying undirected graph of C, after the vertices corresponding to input gates have been removed. Thus, boolean formulae correspond to circuits of treewidth 1.
• Our first main result is an algorithm for counting the number of satisfying assignments of circuits with n input gates, treewidth ω, and at most s · n gates. The running time of our algorithm is 2
) , which for formulae instantiates to $2^{n(1-1/O(s))}$. This is the first algorithm to achieve exponential speed-up over brute force for the satisfiability of linear size circuits with treewidth bounded by a constant greater than 1. For treewidth 1, i.e., boolean formulae, our algorithm significantly outperforms the previously fastest $2^{n(1-1/O(s^2))}$ time satisfiability algorithm by Santhanam [30].
• Our second main result is an algorithm for True Quantified Boolean Circuit Satisfiability for circuits of treewidth ω, in which every input gate has fan-out at most s. The running time of our algorithm is 2
) . Our algorithm is the first to achieve exponential speed-up over brute force for such circuits. Indeed, even for quantified boolean formulae where every variable appears at most s times, the previously best known algorithm by Santhanam [30] has running time $2^{n(1-1/O(f(s)\cdot\log n))}$.
Introduction
Satisfiability testing is both a canonical NP-complete problem [10, 23] and one of the most successful general approaches to solving real-world constraint satisfaction problems. Optimized cnf-sat heuristics are used to address a variety of combinatorial search problems successfully in practice, such as circuit and protocol design verification. Algorithms for the satisfiability problem are also central to complexity theory, as demonstrated by Williams [34] , who showed that any improvement (by even a superpolynomial factor compared to exhaustive search) for the satisfiability problem for general circuits implies circuit lower bounds. Furthermore he has successfully used the connection to prove superpolynomial size bounds for ACC 0 circuits using a novel satisfiability algorithm for ACC 0 circuits, solving a long standing open problem [35] . This raises the questions: For which circuit models do improved (over exhaustive search by at least a superpolynomial factor) satisfiability algorithms exist? How does the amount of improvement over exhaustive search relate to the expressive power of the model (and hence to lower bounds)? Can satisfiability heuristics for stronger models than cnf be useful for real-world instances?
A considerable amount of research has gone into devising improved algorithms for checking the satisfiability of various circuit models up to depth o(log n/ log log n) (see, for example, [25, 31, 29, 28, 32, 5, 20, 26, 17, 35, 18, 6, 19, 9, 24] ), although the question of improved algorithms for checking the satisfiability of linear size bounded depth threshold circuits is still unresolved. The improvements range from superpolynomial factors to exponential factors depending on the circuit model, size and depth. For unrestricted depth circuits, the progress is minimal. Improved algorithms have only been known for formulas over Boolean and the complete basis [30, 33, 8] .
Our general goal is to move beyond formulas to obtain improved algorithms for unrestricted depth circuits. It is a tantalizing question whether linear size circuits of unrestricted depth (even of logarithmic depth) admit satisfiability algorithms with a superpolynomial speedup over exhaustive search, the possibility of which has consequences for superlinear circuit lower bounds. All the known improved algorithms for satisfiability are such that the 'improvement' decreases exponentially in depth so that we get no speedup over exhaustive search when the depth is at least c log n for all sufficiently large c. In this paper, we consider unrestricted depth circuits with restricted treewidth as natural generalizations of formulas (circuits with treewidth 1) and obtain improved satisfiability algorithms for such circuits.
Treewidth is also a natural parameter to measure the structural complexity of a circuit. It should be noted that Alekhnovich and Razborov [1] and Allender et al. [2] have previously investigated the time and space complexity of cnf-sat from the perspective of the treewidth of the underlying graph. For a cnf φ with n input variables and m clauses, they define the treewidth ω of φ as the treewidth of the clause-variable graph of φ. Alekhnovich and Razborov [1] showed that the satisfiability of such a cnf can be decided in time $2^{O(\omega)}|\varphi|^{O(1)}$ and space $2^{O(\omega)}|\varphi|^{O(1)}$. Allender et al. improved the space bound to $|\varphi|^{O(1)}$ with a corresponding time bound of $2^{O(\omega)\log|\varphi|}|\varphi|^{O(1)}$. These results also apply to circuits, as a circuit with m gates and n inputs of treewidth ω can be converted to a CNF with m input gates and treewidth O(ω) with a 1-1 correspondence between the satisfying assignments. This shows that satisfiability for such circuits can be checked in time $O(\mathrm{poly}(m)\,2^{O(\omega)})$ where the definition of the treewidth includes input gates. While these results shed light on the connection between the structural properties of a cnf and the complexity of its satisfiability, treewidth ω as defined can be as large as n for cnf and thus these results do not offer any improvements even for cnf-sat.
In this paper, we define the treewidth ω of a circuit as the treewidth of the circuit after the input gates are removed. According to our relaxed definition of treewidth, the treewidth of cnfs and formulas is at most 1. We make modest progress towards the general goal by showing improved algorithms for the satisfiability of such circuits, whereas previously such improvements were only known for formulas. In particular, we prove the following results in this paper.
• Our first main result (Theorem 1) is an algorithm for counting the number of satisfying assignments of circuits with n input gates, treewidth ω, and at most s·n gates. The running time of our algorithm is 2
) , which for formulae instantiates to $2^{n(1-1/O(s))}$. This is the first algorithm to achieve exponential speed-up over brute force for the satisfiability of linear size circuits with treewidth bounded by a constant greater than 1. For treewidth 1, i.e., boolean formulae, our algorithm significantly outperforms the previously fastest $2^{n(1-1/O(s^2))}$ time satisfiability algorithm by Santhanam [30].
• Our second main result (Theorem 2) is an algorithm for True Quantified Boolean Circuit Satisfiability for circuits of treewidth ω, in which every input gate has fan-out at most s. The running time of our algorithm is 2
• Utilizing the structural properties of low treewidth circuits which helped us obtain improved exponential-time algorithms for satisfiability, we also show that the number of wires of any constant treewidth circuit that computes the majority function must be super-linear (Theorem 3).
Remarks. The requirement of Theorem 1 that $|N^+_C(I(C))| \le s \cdot n$ is weaker than demanding that the gate-size of C is at most s · n. For formulas we have that ω = 1. Thus, for formulas, and more generally for circuits where the treewidth ω of C − I(C) is constant, Theorem 1 yields a factor $2^{n/O(s)}$ improvement over exhaustive search. This is an exponential improvement over $O(2^n)$ for circuits of linear wire-size, and a sub-exponential improvement over $O(2^n)$ for $s = O(n^\delta)$ with δ < 1, i.e., circuits of sub-quadratic gate-size. The sub-exponential improvement over $O(2^n)$ for circuits of sub-quadratic wire-size holds all the way up to ω = o(log n), i.e., sub-logarithmic treewidth.
Our algorithms all use exponential space, which limits their practical applicability. However, while the time usage of our algorithms is close to $2^n$, the space usage of our algorithms is approximately $2^{n/2}$. Thus, the space used is about the square root of the time, which means that time is just as likely to become a bottleneck as space is.
As stated, the algorithms of Theorems 1 and 2 and the lower bound of Theorem 3 apply to unbounded fan-in circuits over the De Morgan basis. However, any circuit C of fan-in d, treewidth ω, and gate-size s · n over the arbitrary Boolean basis can be converted into a circuit $C'$ of treewidth ω · d and gate-size $2^d \cdot s \cdot n$ computing the same function. Hence the conclusions of Theorems 1, 2 and 3 also hold for constant fan-in circuits over the arbitrary Boolean basis (at the cost of a constant factor reduction in the savings for Theorems 1 and 2).
It would be interesting to see how close to optimal our savings of 1/O(s) in the exponent for formula satisfiability are. For formulas in CNF with gate-size s · n, there are algorithms that achieve savings 1/O(log s) in the exponent [12]. Hence perhaps one could hope for savings of this form even for formula satisfiability. Nevertheless, even an algorithm with savings $1/O(s^\delta)$ for δ < 1 would definitely be of interest.
Methods. In 1966, Neciporuk [27] proved an $\Omega(n^2/\log n)$ size lower bound for formulas computing the element distinctness function over the complete binary basis. The main ingredient of this proof is an upper bound on the number of sub-functions of a function computable by a size-bounded formula. More concretely, let $f : \{0,1\}^n \to \{0,1\}$ be a function. For X + Y = n we can split the input into two parts and think of f as a function from $\{0,1\}^X \times \{0,1\}^Y$ to {0, 1}. Then each $y \in \{0,1\}^Y$ naturally defines a subfunction $f_y : \{0,1\}^X \to \{0,1\}$ where $f_y(x) = f(x, y)$. For general functions f each $y \in \{0,1\}^Y$ could result in a different subfunction $f_y$. However, if the function f is structured one could hope that many different choices for $y \in \{0,1\}^Y$ will result in the function $f_y$ being the same. This is precisely what Neciporuk proved, provided that f is computed by a formula, and that the input gates corresponding to x do not have a large fan-out.
Specifically, if f is computed by a formula C (that is, a circuit C that becomes a tree after removing the input gates I(C)), and the total fan-out of the input gates corresponding to X is at most p, then the number of different subfunctions $f_y$ that can be obtained by selecting different $y \in \{0,1\}^Y$ is at most $2^{O(p)}$. In a recent paper, de Oliveira Oliveira [13] generalized Neciporuk's bound to circuits of bounded treewidth. In particular, de Oliveira Oliveira proved that if C − I(C) has treewidth ω instead of being a tree, then the number of different subfunctions $f_y$ that can be obtained by selecting different $y \in \{0,1\}^Y$ is at most $2^{2^{O(\omega)} p}$.
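The notion of a subfunction can be made concrete by brute force on a tiny example. The following is a minimal sketch, not part of the paper: `subfunctions` and `toy_formula` are hypothetical names, and the toy formula is chosen only so that the inputs in X have total fan-out p = 2.

```python
from itertools import product

def subfunctions(f, nx, ny):
    """Return the set of distinct truth tables of f_y(x) = f(x, y) over all y."""
    tables = set()
    for y in product((0, 1), repeat=ny):
        # The truth table of f_y, listed in a fixed order of x.
        table = tuple(f(x, y) for x in product((0, 1), repeat=nx))
        tables.add(table)
    return tables

# Hypothetical toy formula: (x0 AND (y0 OR y1)) OR (x1 AND y2 AND y3).
# Each of the two X-inputs feeds into exactly one gate.
def toy_formula(x, y):
    return int((x[0] and (y[0] or y[1])) or (x[1] and y[2] and y[3]))

distinct = subfunctions(toy_formula, nx=2, ny=4)
print(len(distinct), "distinct subfunctions for", 2 ** 4, "choices of y")
```

In this example the 16 choices of y collapse to a handful of subfunctions, because y influences the formula only through the two values (y0 OR y1) and (y2 AND y3).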
The bounds of Neciporuk and de Oliveira Oliveira suggest the following approach to counting the number of satisfying assignments of a circuit, provided that the input circuit C has treewidth ω after the input gates have been removed. First, find a partitioning of the input gates of C into X and Y such that $|X| = \varepsilon \cdot n$ for a constant $\varepsilon < 1/4$, but the total number p of wires leading out of X is small enough that $2^{2^{O(\omega)} p} \le 2^{n/2}$. Such a partitioning can be found provided the circuit C is small enough. We count the number of pairs x and y such that C(x, y) = 1. We do this in two stages: go over every possible y, and for every y count the number of choices for x such that $C_y(x) = 1$. From the upper bound of de Oliveira Oliveira we know that the number of different subfunctions $C_y$ that the algorithm will encounter is upper bounded by $2^{n/2}$. When we encounter a subfunction $C_y$ we have not seen before, we try all possibilities for x and return the number of choices for x that resulted in C(x, y) = 1. When we encounter a subfunction $C_y$ that we have seen before, we do not need to try all possibilities for x; we can just return the same number of satisfying assignments as we did the last time we saw this subfunction. There are $2^{n(1-\varepsilon)}$ choices for y. For at most $2^{n/2}$ of these choices the subfunction $C_y$ has not been previously encountered. For each of these choices we spend $2^{\varepsilon n}$ time going over all choices of x. Thus the total running time is upper bounded by $2^{n(1-\varepsilon)} + 2^{n/2} \cdot 2^{\varepsilon n}$ up to polynomial factors, and the first term dominates because $\varepsilon < 1/4$.
The above argument has a serious gap: how do we check efficiently whether we have seen a subfunction $C_y$ before, without evaluating it on all inputs x? Indeed, if C is unsatisfiable then all subfunctions would evaluate to 0 for every input, but determining this amounts to solving the satisfiability problem for each choice of y! We overcome this problem by noticing that the bounds of Neciporuk and de Oliveira Oliveira can be made constructive in the following sense. We construct an efficiently computable "hash function" that given C and y outputs a "hash value" between 1 and $2^{2^{O(\omega)} p}$. If y and $y'$ produce the same hash value, then $C_y = C_{y'}$.
Armed with such a hash function the algorithm can easily be made to work by replacing the question "has this subfunction been encountered before?" with "has a subfunction with the same hash value been encountered before?". This is precisely the approach taken in our algorithm for counting the number of satisfying assignments of a linear size bounded treewidth circuit, encapsulated in Theorem 1.
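The counting strategy with a hash function can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of Theorem 1: `count_sat_memo`, `toy_formula` and `toy_key` are hypothetical names, and `toy_key` is a hand-written stand-in for the hash function, valid only for this particular toy formula.

```python
from itertools import product

def count_sat_memo(f, nx, ny, key):
    """Count pairs (x, y) with f(x, y) = 1, reusing counts across equal keys.

    Assumption: key(y) == key(y2) implies that f(., y) and f(., y2) are the
    same function of x. The caller is responsible for this guarantee.
    """
    cache = {}              # hash value -> number of x with f(x, y) = 1
    total = 0
    brute_force_runs = 0    # how often all x had to be enumerated
    for y in product((0, 1), repeat=ny):
        k = key(y)
        if k not in cache:
            brute_force_runs += 1
            cache[k] = sum(f(x, y) for x in product((0, 1), repeat=nx))
        total += cache[k]
    return total, brute_force_runs

def toy_formula(x, y):
    return int((x[0] and (y[0] or y[1])) or (x[1] and y[2] and y[3]))

def toy_key(y):
    # y affects toy_formula only through these two bits.
    return (int(y[0] or y[1]), int(y[2] and y[3]))

total, runs = count_sat_memo(toy_formula, 2, 4, toy_key)
print(total, "satisfying assignments;", runs, "brute-force enumerations over x")
```

The inner enumeration over x runs once per distinct hash value instead of once per y, which is where the savings in the running time analysis come from.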
The algorithm for quantified circuit satisfiability (Theorem 2) follows the same approach. The only important difference is that now we have to pick a partition of the input gates into X and Y such that all of Y is quantified before all of X. For this reason our algorithm only works for the more restricted setting where all input gates have bounded fan-out, as opposed to the average fan-out being bounded.
Both of our algorithms are based on decompositions of the input circuit into · n parts, controlling the interaction between each of the parts and the input gates. Our circuit size lower bound for the Majority n function (Theorem 3) is based on the observation that, for circuits of linear wire-size and constant treewidth (of C − I(C)), the decomposition can be "scaled up" to a decomposition where the number of parts is constant, the number of wires between different parts is (essentially) constant, and each part depends only on n/2 of the input gates.
If C computes a symmetric function f this decomposition leads to a constant-cost multiparty communication protocol for f in the Number on the Forehead (NOF) model. Here each part of the decomposition corresponds to a player in the communication protocol. For Majority n , such a protocol is known not to exist [7] . Together these two facts yield Theorem 3, that linear wire-size circuits C with tw(C − I(C)) = O(1) can not compute the Majority n function.
Preliminaries
Graph notation. In this paper we deal with graphs that are simple, finite, and either directed or undirected. Given a graph G we will refer by E(G) to the edge set E of G and by V (G) to the vertex set V of G. For vertices u, v in V (G) we will use notation uv for the ordered pair (u, v). Thus, if G is an undirected graph with an edge between u and v then this edge is denoted both by uv and by vu. If G is a directed graph then uv denotes an edge with its head in v and tail in u. For any non-empty subset W ⊆ V , the subgraph of G induced by W is denoted by G[W ] and for ease of notation G − W is used for the induced subgraph G[V \ W ]. For a directed graph G the underlying undirected graph is a graph with vertex set V (G) and an (undirected) edge uv between vertices u and v whenever at least one of uv and vu is an edge in the original graph G.
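For an undirected graph G, the neighborhood of a vertex v is $N_G(v) = \{u \in V(G) : uv \in E(G)\}$, and the degree of v is $|N_G(v)|$. For a vertex set S, $N_G(S)$ denotes the set of vertices outside S that have a neighbor in S. For a directed graph G, the in-neighborhood and out-neighborhood of v are $N^-_G(v) = \{u : uv \in E(G)\}$ and $N^+_G(v) = \{u : vu \in E(G)\}$, the in-degree and out-degree of v are $|N^-_G(v)|$ and $|N^+_G(v)|$, and the degree of v in a directed graph is its degree in the underlying undirected graph. In a directed graph a sink is a vertex of out-degree 0, and a source is a vertex of in-degree 0.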
A walk in a graph G is a sequence $v_1, v_2, \ldots, v_t$ of vertices such that $v_i v_{i+1} \in E(G)$ for every $1 \le i < t$.
Notice that this definition works both in directed and in undirected graphs, but in directed graphs the edges are required to point in the direction of the walk. A path in a graph G is a walk in which every vertex appears at most once, and a cycle in G is a walk where every vertex appears at most once, except for the first and last vertex, which are the same. A graph is acyclic if it does not contain any cycle. Directed acyclic graphs are commonly referred to as DAGs. An undirected graph is connected if there is a path between every pair of vertices. A tree is an undirected graph that is both connected and acyclic.
Every DAG G admits a topological ordering [14], which is an ordering of the vertices $\sigma : V(G) \to \{1, \ldots, |V(G)|\}$ such that for every edge uv ∈ E(G), σ(u) < σ(v). Hence, every DAG has at least one source and at least one sink (the vertices u and v with σ(u) = 1 and σ(v) = |V(G)|, respectively).
Treewidth. A tree decomposition of an undirected graph G is a pair (T, χ) consisting of a tree T and a function $\chi : V(T) \to 2^{V(G)}$ satisfying the following properties. For every uv ∈ E(G), $\{u, v\} \subseteq \chi(b)$ for some b ∈ V(T); and for every vertex v ∈ V(G) the set {u ∈ V(T) : v ∈ χ(u)} is non-empty and induces a connected subtree of T. The elements of the range {χ(v) : v ∈ V(T)} of χ are called bags of T. In a rooted tree-decomposition (T, χ) the tree T is rooted, and the root vertex of T is denoted by r(T). The width of a tree decomposition (T, χ) of G is denoted by tw(T, χ, G) and is defined to be the maximum over vertices u ∈ V(T) of |χ(u)| − 1. In other words, the width of a tree decomposition is the maximum size of a bag, minus one. The treewidth of G is denoted by tw(G) and is defined to be the minimum over all tree decompositions (T, χ) of G of tw(T, χ, G). We extend the function χ to vertex sets in the following way: for a subset S of V(T), $\chi(S) = \bigcup_{v \in S} \chi(v)$. For a directed graph G, a tree decomposition of G is a tree decomposition of the underlying undirected graph. Similarly, the treewidth of G is the treewidth of the underlying undirected graph.
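The two conditions in the definition can be checked mechanically. The sketch below is illustrative only: the function names and the dictionary encoding of bags are assumptions, and the code takes for granted that `tree_edges` really forms a tree on the keys of `bags`.

```python
def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check the edge condition and the connected-subtree condition.

    `bags` maps each tree node to a set of graph vertices.
    """
    # Every graph edge must be contained in some bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    adj = {t: set() for t in bags}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    # The tree nodes whose bags contain v must be non-empty and connected.
    for v in vertices:
        nodes = {t for t, bag in bags.items() if v in bag}
        if not nodes:
            return False
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            t = stack.pop()
            for s in adj[t]:
                if s in nodes and s not in seen:
                    seen.add(s)
                    stack.append(s)
        if seen != nodes:
            return False
    return True

def width(bags):
    """Maximum bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1

# A 4-cycle a-b-c-d-a with a valid decomposition into two bags.
cycle_vertices = ['a', 'b', 'c', 'd']
cycle_edges = [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'a')]
good_bags = {0: {'a', 'b', 'c'}, 1: {'a', 'c', 'd'}}
# Covers every edge, but the bags containing 'a' are not connected.
bad_bags = {0: {'a', 'b'}, 1: {'b', 'c'}, 2: {'c', 'd'}, 3: {'d', 'a'}}
print(is_tree_decomposition(cycle_vertices, cycle_edges, [(0, 1)], good_bags))
```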
Circuits. A boolean circuit (or just circuit) is a directed acyclic graph C in which every vertex other than the sources is labeled with one of the symbols ∧, ∨, or ¬. Every gate labeled ¬ has in-degree exactly equal to 1, and out-degree at most 1. The vertices of C are called gates, with the sources of C called input gates, and the sinks of C called output gates. Gates that are neither input nor output gates are called normal gates. The sets of input gates, output gates and normal gates of a circuit C are referred to as I(C), O(C) and N(C) respectively.
A gate labeled ∧, ∨, or ¬ is called an and-gate, or-gate or not-gate, respectively. We say that a gate u feeds into a gate v if uv ∈ E(C). The fan-in of a gate is its in-degree, and the fan-out is its out-degree. The fan-in and fan-out of a circuit C is the maximum fan-in and fan-out of its gates. In the description of circuits we will often use the following short-hand: when we say that a gate u feeds negatively into a gate v, we mean that there is a not gate w such that u feeds into w and w feeds into v.
We will identify 0 with the meaning "false" and 1 with the meaning "true". An evaluation of a circuit C is an assignment of a value in {0, 1} to each gate of C such that: for every and-gate assigned 1 every gate feeding into it is assigned 1, for every and-gate assigned 0 at least one gate feeding into it is assigned 0, for every or-gate assigned 1 at least one gate feeding into it is assigned 1, for every or-gate assigned 0 every gate feeding into it is assigned 0, for every not-gate assigned 1 the unique gate feeding into it is assigned 0, and for every not-gate assigned 0 the unique gate feeding into it is assigned 1. For every assignment of values to the input gates of C there is exactly one evaluation of C that assigns precisely these values to the input gates.
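The unique evaluation extending an input assignment can be computed gate by gate. The following is a minimal sketch under an assumed encoding (a dictionary from gate names to a label and a list of gates feeding in); neither the encoding nor the name `evaluate` comes from the paper.

```python
def evaluate(circuit, inputs):
    """Return the evaluation (gate -> 0/1) that extends the input assignment.

    `circuit` maps each gate to (label, gates feeding into it), where label
    is one of 'in', 'and', 'or', 'not'. `inputs` assigns 0/1 to input gates.
    """
    value = dict(inputs)

    def val(g):
        if g not in value:
            label, preds = circuit[g]
            vals = [val(p) for p in preds]
            if label == 'and':
                value[g] = int(all(vals))
            elif label == 'or':
                value[g] = int(any(vals))
            elif label == 'not':
                value[g] = 1 - vals[0]
            else:
                raise ValueError(f"input gate {g!r} has no assigned value")
        return value[g]

    for g in circuit:
        val(g)
    return value

# Exclusive-or of x and y; each not-gate has fan-out 1.
xor = {
    'x': ('in', []), 'y': ('in', []),
    'nx': ('not', ['x']), 'ny': ('not', ['y']),
    'a': ('and', ['x', 'ny']), 'b': ('and', ['nx', 'y']),
    'o': ('or', ['a', 'b']),
}
table = {(x, y): evaluate(xor, {'x': x, 'y': y})['o']
         for x in (0, 1) for y in (0, 1)}
print(table)
```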
Given a circuit C we can define a function $f : 2^{I(C)} \to 2^{O(C)}$ as follows. Given a subset X of I(C), consider the unique evaluation of C that assigns 1 to the input gates in X and 0 to the other input gates, and let $O'$ be the subset of O(C) gates assigned 1 by this evaluation; then $f(X) = O'$. We will abuse notation and denote by $C : 2^{I(C)} \to 2^{O(C)}$ the function f computed by the circuit C. If the elements of I(C) and O(C) are ordered as $\{i_1, i_2, \ldots, i_n\}$ and $\{o_1, o_2, \ldots, o_m\}$ respectively then the function C can also be thought of as a function from $\{0,1\}^n$ to $\{0,1\}^m$. For a circuit C with n input gates and 1 output gate, a satisfying assignment is an assignment $x \in \{0,1\}^n$ such that C(x) = 1.
For a function $f : 2^{I(C)} \to 2^{O(C)}$, a subset X of I(C) and a subset R of X we define the restriction of f by R to be the function $f_R : 2^{I(C) \setminus X} \to 2^{O(C)}$ defined as follows. For every $P \subseteq I(C) \setminus X$, $f_R(P) = f(P \cup R)$. The set R is sometimes provided as an assignment r : X → {0, 1} where r(x) = 1 if x ∈ R and r(x) = 0 otherwise, and we will use $f_r$ as shorthand for $f_R$. Most commonly this notation will be used in the context of a function C computed by a circuit.
We may assume that in addition to the input gates, there are two input gates where one is set to 1 and the other is set to 0. Every evaluation of the circuit assigns 1 to the gate 1 and 0 to the gate 0. We will refer to these as constant gates. Throughout the algorithm, by forcing a gate g to 0 (or to 1) we mean removing all wires feeding into g and adding a new wire from 0 (or 1) into g. Note that if g is an input gate then after g is forced to 0 or 1, g is no longer an input gate. For a circuit C, gate g and value v ∈ {0, 1} we will refer by $C_{g \leftarrow v}$ to the circuit obtained from C by forcing g to v. This notation can be extended to forcing a set of gates according to an assignment to all of them. If X is a subset of the gates of C and r : X → {0, 1} then $C_{X \leftarrow r}$ refers to the circuit obtained from C by forcing each gate g in X to r(g). Note that if X is a set of input gates then the circuit $C_{X \leftarrow r}$ computes the function $C_r$.
The gate-size of a circuit is the total number of gates, excluding the not-gates. The wire-size of a circuit is the total number of wires, that is |E(C)|. A circuit C has linear gate-size (or wire-size) if the gate-size (wire-size) is upper bounded by O(n) where n = |I(C)|. Similarly, quadratic gate-size (or wire-size) means that gate-size (wire-size) is upper bounded by O(n 2 ). In this paper, when we refer to the treewidth of a circuit C, we will always mean the treewidth of the underlying graph of C − I(C), unless explicitly specified otherwise.
Our paper differs slightly from existing literature in how circuits are defined and how their size is measured. However the differences are purely notational. More concretely we demand that every not-gate has fan-out 1, while this is typically not required. On the other hand we do not count the not-gates towards the size of the circuit, while these gates are usually counted. Notice that a not-gate with fan-out x can easily be replaced by x not-gates of fan-out 1. This will at most double the total number of wires and gates (if not-gates are counted) and not change the number of gates other than not-gates. Further, this operation can not increase the treewidth of the circuit, because on the underlying undirected graph it is a contraction operation, followed by a series of edge subdivisions (see [11, 14] for a definition of these graph operations).
Satisfiability Algorithms
In this section we will prove our main algorithmic results. We will assume that we have been given as input a circuit C with n input gates and a single output gate o, together with a tree decomposition of C − I(C) of width at most ω. We will assume without loss of generality that the output gate o is an and-gate. If it is not, we may add a new and-gate and make o feed into the new gate, making the new gate the new output gate.
Splitting gates. The first ingredient both of our algorithms and of our lower bounds is a simple "splitting" operation on gates. Given the circuit C and a normal gate h of C, splitting h results in a circuit $C'$ which is identical to C, with the following differences. The gate h is replaced by 4 gates $h^{in}$, $h^{out}$, $h^L$, and $h^R$. The gate $h^{in}$ is an input gate, $h^{out}$ is a gate of the same type as h, and $h^L$ and $h^R$ are both or-gates. Further, $h^{in}$ feeds into $h^L$ and feeds negatively into $h^R$. On the other hand $h^{out}$ feeds negatively into $h^L$ and feeds into $h^R$. Both $h^L$ and $h^R$ feed directly into the output gate. Finally, every wire uh ∈ E(C) is replaced by the wire $u h^{out}$ in $C'$. Every wire hu ∈ E(C) is replaced by the wire $h^{in} u$ in $C'$. This concludes the construction of $C'$.
Notice that $h^{in}$ feeds into all the out-neighbors of h, while $h^{out}$ is being fed into by all the in-neighbors of h. Thus the naming convention seems counter-intuitive, and it is tempting to swap the names of $h^{in}$ and $h^{out}$. However this would result in $h^{out}$ being an input gate, which also would not be ideal.
Let e : V(C) → {0, 1} be an evaluation of C, and define a function $e' : V(C') \to \{0, 1\}$ by setting $e'(g) = e(g)$ for every gate $g \in V(C') \setminus \{h^{in}, h^{out}, h^L, h^R\}$, $e'(h^{in}) = e'(h^{out}) = e(h)$ and $e'(h^L) = e'(h^R) = 1$. It is easy to verify that $e'$ is an evaluation of $C'$. Similarly, let $e'$ be an evaluation of $C'$ that assigns 1 to the output gate. It follows that both $h^L$ and $h^R$ are assigned 1, which in turn means that $h^{in}$ and $h^{out}$ must both be assigned 0 or both be assigned 1. Let e : V(C) → {0, 1} be the function such that for every gate g ∈ V(C) \ {h} we have $e(g) = e'(g)$ and $e(h) = e'(h^{out})$. It is easy to verify that e is an evaluation of C.
The circuit $C'$ has one additional input gate $h^{in}$ in addition to the inputs I(C). Hence $C'$ computes a function from $\{0,1\}^{n+1}$ to {0, 1}, where we associate the last coordinate of the input to the value assigned to $h^{in}$. This leads to the following observation.
Observation 1. For every $x \in \{0,1\}^n$ such that C(x) = 1 there exists exactly one y ∈ {0, 1} such that $C'(x, y) = 1$. For every $x \in \{0,1\}^n$ and y ∈ {0, 1}, if $C'(x, y) = 1$ then C(x) = 1.
We will often need to split all the gates h ∈ H for a set H of normal gates in C. From the definition of splitting it follows that the order in which we split the gates in H does not matter: the resulting circuit $C'$ is the same. The circuit $C'$ has m = |H| additional input gates $h^{in}_1, \ldots, h^{in}_m$ in addition to the inputs I(C). Hence $C'$ computes a function from $\{0,1\}^{n+m}$ to {0, 1}, where we associate the m last coordinates of the input to the values assigned to $h^{in}_1, \ldots, h^{in}_m$. Together with Observation 1 this leads to the following lemma.
Lemma 1. Let $C'$ be the circuit obtained from C by splitting a set H of m normal gates. For every $x \in \{0,1\}^n$ such that C(x) = 1 there exists exactly one $y \in \{0,1\}^m$ such that $C'(x, y) = 1$. For every $x \in \{0,1\}^n$ and $y \in \{0,1\}^m$, if $C'(x, y) = 1$ then C(x) = 1.
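The splitting operation and the statement of Observation 1 can be checked by brute force on a small circuit. This is a sketch under assumptions: the dictionary encoding, the gate-name suffixes, and the helpers `evaluate`, `split` and `witnesses` are all hypothetical, and the code requires every input gate to be assigned a value.

```python
from itertools import product

def evaluate(circuit, inputs, out):
    """Value of gate `out`; `circuit` maps gate -> (label, predecessors)."""
    value = dict(inputs)

    def val(g):
        if g not in value:
            label, preds = circuit[g]
            vals = [val(p) for p in preds]
            if label == 'not':
                value[g] = 1 - vals[0]
            else:
                value[g] = int(all(vals) if label == 'and' else any(vals))
        return value[g]

    return val(out)

def split(circuit, h, out):
    """Split gate h as described above; returns the new circuit and h_in."""
    label, preds = circuit[h]
    h_in, h_out, h_l, h_r = h + '_in', h + '_out', h + '_L', h + '_R'
    # Wires leaving h now leave h_in.
    new = {g: (lab, [h_in if p == h else p for p in ps])
           for g, (lab, ps) in circuit.items() if g != h}
    new[h_in] = ('in', [])
    new[h_out] = (label, list(preds))       # wires entering h now enter h_out
    new[h + '_notout'] = ('not', [h_out])
    new[h + '_notin'] = ('not', [h_in])
    new[h_l] = ('or', [h_in, h + '_notout'])
    new[h_r] = ('or', [h + '_notin', h_out])
    out_label, out_preds = new[out]
    new[out] = (out_label, out_preds + [h_l, h_r])
    return new, h_in

# Toy circuit with and-gate output 'o'; the gate 'h' has fan-out 2.
C = {
    'x1': ('in', []), 'x2': ('in', []), 'x3': ('in', []),
    'h': ('or', ['x1', 'x2']),
    'nx3': ('not', ['x3']),
    'g': ('and', ['x3', 'h']),
    'k': ('and', ['nx3', 'h']),
    'top': ('or', ['g', 'k']),
    'o': ('and', ['top']),
}
C_split, h_in = split(C, 'h', 'o')

def witnesses(x):
    """All values y for h_in with C'(x, y) = 1."""
    a = dict(zip(('x1', 'x2', 'x3'), x))
    return [y for y in (0, 1) if evaluate(C_split, {**a, h_in: y}, 'o')]
```

For every input x, the number of witnesses y equals C(x): exactly one when C(x) = 1 and none otherwise.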
Propeller Decompositions. The second ingredient of our algorithms and lower bounds is a decomposition lemma for graphs of bounded treewidth. A propeller decomposition of an undirected graph G is a partition of V(G) into disjoint vertex sets $H, B_1, B_2, \ldots, B_t$ such that for every i ≤ t, $N(B_i) \subseteq H$. The set H is called the hub of the decomposition, and the sets $B_1, B_2, \ldots, B_t$ are called blades. The width of the propeller decomposition is defined as $\max_i |N(B_i)|$. A propeller decomposition of a directed graph is simply a propeller decomposition of the underlying undirected graph. The following lemma is a variation of Fomin et al. [16, Lemma 5] (see an extended version [15] for the proof) with essentially the same proof, which we include for completeness.
Lemma 2. There exists a linear time algorithm that given a graph G, a tree decomposition (T, χ) of G of width ω, and a vertex set Z ⊆ V(G) computes a propeller decomposition $H, B_1, B_2, \ldots, B_t$ of width at most 2(ω + 1), such that t ≤ 2|Z|, Z ⊆ H, and |H| ≤ 2|Z|(ω + 1).
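The definition of a propeller decomposition is easy to verify directly. The sketch below only checks a given decomposition and reports its width; it does not implement the algorithm of Lemma 2. The name `propeller_width` and the adjacency-dictionary encoding are assumptions.

```python
def propeller_width(adj, hub, blades):
    """Return the width if (hub, blades) is a propeller decomposition, else None.

    `adj` maps every vertex of an undirected graph to the set of its neighbors.
    """
    hub = set(hub)
    blades = [set(b) for b in blades]
    parts = [hub] + blades
    # The hub and the blades must partition the vertex set.
    if sum(len(p) for p in parts) != len(adj):
        return None
    if set().union(*parts) != set(adj):
        return None
    width = 0
    for blade in blades:
        # N(B): vertices outside the blade with a neighbor inside it.
        boundary = set().union(*(adj[v] for v in blade)) - blade
        if not boundary <= hub:
            return None
        width = max(width, len(boundary))
    return width

# The path a-b-c-d-e with hub {c} and blades {a, b} and {d, e}.
path = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b', 'd'},
        'd': {'c', 'e'}, 'e': {'d'}}
print(propeller_width(path, {'c'}, [{'a', 'b'}, {'d', 'e'}]))
```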
Towards the proof of Lemma 2 we need the notion of least common ancestor-closure in trees. For a rooted tree T and vertex set M in V(T) the least common ancestor-closure (LCA-closure) LCA-closure(M) is obtained by the following process. Initially, set $M' = M$. Then, as long as there are vertices x and y in $M'$ whose least common ancestor w is not in $M'$, add w to $M'$. When the process terminates, output $M'$ as the LCA-closure of M. The following (folklore) lemma summarizes two basic properties of LCA closures (see [15] for a proof).
Lemma 3. Let T be a rooted tree, M ⊆ V(T), and $M' = $ LCA-closure(M). Then $|M'| \le 2|M|$, and for every connected component Q of $T \setminus M'$, $|N_T(Q)| \le 2$.
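The closure process described above can be written out directly. This is a naive sketch for illustration, not the linear-time procedure needed in Lemma 2; the parent-pointer encoding of the rooted tree and the names `lca` and `lca_closure` are assumptions.

```python
from itertools import combinations

def lca(parent, x, y):
    """Least common ancestor in a rooted tree given by parent pointers."""
    ancestors = set()
    while x is not None:
        ancestors.add(x)        # a vertex counts as its own ancestor
        x = parent[x]
    while y not in ancestors:
        y = parent[y]
    return y

def lca_closure(parent, marked):
    """Repeatedly add missing least common ancestors until none is missing."""
    closure = set(marked)
    changed = True
    while changed:
        changed = False
        for x, y in combinations(list(closure), 2):
            w = lca(parent, x, y)
            if w not in closure:
                closure.add(w)
                changed = True
                break           # restart the scan with the enlarged set
    return closure

# Complete binary tree on 7 nodes, rooted at 0.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
marked = {3, 4, 5}
closure = lca_closure(parent, marked)
print(sorted(closure))
```

In this example the closure adds the nodes 1 and 0, so its size stays within twice the number of marked nodes.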
With Lemma 3 in hand we are ready to prove Lemma 2.
Proof of Lemma 2. For every v ∈ Z, mark a node u in V(T) such that v ∈ χ(u), and let $M'$ be the set of marked nodes. We have that $|M'| \le |Z|$. Set M = LCA-closure($M'$). By Lemma 3, $|M| \le 2|M'| \le 2|Z|$. Let $Q_1, Q_2, \ldots, Q_\ell$ be the connected components of T \ M. By Lemma 3 we have that for every $i \le \ell$, $|N_T(Q_i)| \le 2$.
Without loss of generality all components $Q_1, Q_2, \ldots, Q_t$ have exactly 2 neighbors in T while all components $Q_{t+1}, Q_{t+2}, \ldots, Q_\ell$ have one neighbor. Contracting each component $Q_i$ to a single vertex leaves a tree where the vertices corresponding to the components $Q_1, Q_2, \ldots, Q_\ell$ have degree 1 or 2. Hence t ≤ |M| − 1, and for each component $Q_i$ with 1 neighbor in M there is a component $Q_j$ with 2 neighbors in M such that $N_T(Q_i) \subseteq N_T(Q_j)$. Build $Q'_1, Q'_2, \ldots, Q'_t$ as follows. Initially, for every i ≤ t, set $Q'_i = Q_i$. Then, for every j from t + 1 to $\ell$ add $Q_j$ to the lowest indexed $Q'_i$ such that $N_T(Q_j) \subseteq N_T(Q_i)$. It follows that $M, Q'_1, \ldots, Q'_t$ is a partition of V(T), and that each $Q'_i$ has $|N_T(Q'_i)| \le 2$.
Define $H = \bigcup_{u \in M} \chi(u)$ and for each 1 ≤ i ≤ t set $B_i = \bigcup_{u \in Q'_i} \chi(u) \setminus H$. Since every vertex of G appears in a bag of the tree-decomposition, $H, B_1, \ldots, B_t$ forms a partition of V(G). By construction we have that for every i, $N(B_i) \subseteq H$. Furthermore, since $|N_T(Q'_i)| \le 2$ we have $|N(B_i)| \le 2(\omega + 1)$. Finally, the choice of M implies Z ⊆ H and, since |M| ≤ 2|Z|, we have that |H| ≤ 2|Z|(ω + 1). It is easy to implement a procedure that computes $H, B_1, \ldots, B_t$ in this way in linear time.
Algorithmic Version of the Oliveira-Neciporuk Subfunction Bound
We are now ready to prove our main technical result -a constructive bound on the number of subfunctions of a function computable by a circuit of bounded treewidth.
Lemma 4. There exists an algorithm A that takes as input a circuit C with n input gates and gate-size m, a tree decomposition (T, χ) of C − I(C) of width ω, a partition of I(C) into X ∪ R and an assignment r : R → {0, 1}, runs in time $O(4^\omega \cdot m)$, and outputs a string $A(C, (T, \chi), X, r) \in \{0,1\}^\ell$, where $\ell \le 6(|N^+_C(X)| + 1) \cdot (\omega + 1) \cdot 2^{2(\omega+1)}$. Furthermore, for any two assignments r and $r'$ to R such that $A(C, (T, \chi), X, r) = A(C, (T, \chi), X, r')$, the functions $C_r$ and $C_{r'}$ are equal.
Proof. We first describe the algorithm A. The algorithm first applies Lemma 2 to the underlying undirected graph of C − I(C) and with the vertex set $Z = N^+_C(X) \cup \{o\}$, where o is the output gate of C. Lemma 2 yields a propeller decomposition $H, B_1, B_2, \ldots, B_t$ of C − I(C) of width at most 2(ω + 1) such that the hub H has size at most $2(|N^+_C(X)| + 1)(\omega + 1)$, $N^+_C(X) \cup \{o\} \subseteq H$, and the number t of blades is at most $2(|N^+_C(X)| + 1)$. Next the algorithm splits all the gates in H \ {o} and obtains a new circuit $C'$ with $m' = |H| - 1$ input gates in addition to I(C). By Lemma 1 we have that for every $x \in \{0,1\}^n$ such that C(x) = 1 there exists exactly one $y \in \{0,1\}^{m'}$ such that $C'(x, y) = 1$. Furthermore, for every $x \in \{0,1\}^n$ and $y \in \{0,1\}^{m'}$, if $C'(x, y) = 1$ then C(x) = 1.
We now summarize the structural properties of the circuit $C'$. Each gate h ∈ H \ {o} in the circuit C corresponds to 4 gates $h^{in}$, $h^{out}$, $h^L$ and $h^R$ in $C'$. We define $H^{in} = \{h^{in} : h \in H\}$, $H^{out} = \{h^{out} : h \in H\}$, $H^L = \{h^L : h \in H\}$ and $H^R = \{h^R : h \in H\}$. In the circuit $C'$ we have that N …
The algorithm A constructs the circuits $C^i$ for every i ≤ t. For every i ≤ t it then uses the circuit $C^i$ to evaluate $C^i_r(P)$ for every $P \subseteq I_i$ and records the output $C^i_r(P)$ as a string of 0's and 1's of length 2(ω + 1). In the last phase of the algorithm, for each gate g in $H^{out} \cup \{o\}$ it records a 1 or a 0 as follows. If g is an and-gate the algorithm records a 0 for g if at least one gate in R feeding into g is assigned 0 by r, or if at least one gate in R feeding negatively into g is assigned 1 by r. Otherwise the algorithm records 1 for g. If g is an or-gate the algorithm records a 1 for g if at least one gate in R feeding into g is assigned 1 by r, or if at least one gate in R feeding negatively into g is assigned 0 by r. Otherwise the algorithm records 0 for g. This concludes the description of the algorithm A.
In total the algorithm records at most $2(\omega + 1) \cdot 2^{2(\omega+1)}$ bits for each blade of the propeller decomposition, plus $|H^{out}| + 1 = |H| \le 2(|N^+_C(X)| + 1)(\omega + 1)$ bits at the end of the procedure. The number of blades in the propeller decomposition is at most $2(|N^+_C(X)| + 1)$. Hence the total number of bits recorded by the algorithm is at most $6(|N^+_C(X)| + 1) \cdot (\omega + 1) \cdot 2^{2(\omega+1)}$.
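For reference, the bit count of the algorithm A can be written out as follows. This is a worked restatement of the quantities given in the text (number of blades, bits per blade, bits for the last phase), checked against the bound on $\ell$ in Lemma 4; it is a sketch and not taken verbatim from the paper.

```latex
\begin{align*}
\ell
  &\le \underbrace{2\bigl(|N^+_C(X)| + 1\bigr)}_{\text{number of blades}}
       \cdot \underbrace{2(\omega + 1)\, 2^{2(\omega+1)}}_{\text{bits per blade}}
     + \underbrace{2\bigl(|N^+_C(X)| + 1\bigr)(\omega + 1)}_{\text{bits for the last phase}} \\
  &= \bigl(|N^+_C(X)| + 1\bigr)(\omega + 1)\bigl(4 \cdot 2^{2(\omega+1)} + 2\bigr) \\
  &\le 6\,\bigl(|N^+_C(X)| + 1\bigr)(\omega + 1)\, 2^{2(\omega+1)},
\end{align*}
```

where the last step uses $2 \le 2 \cdot 2^{2(\omega+1)}$.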
It remains to prove that for any two assignments $r$ and $r'$ to $R$ such that $A(C, (T, \chi), X, r) = A(C, (T, \chi), X, r')$, the functions $C_r$ and $C_{r'}$ are identical. Let $X' \subseteq X$ be such that $C_r(X') = 1$; by symmetry it suffices to show that $C_{r'}(X') = 1$. By Lemma 1 there exists an evaluation $f$ of $C'$ that assigns 1 to every element in $X'$, 0 to every element in $X \setminus X'$, agrees with $r$ on $R$, and assigns 1 to $o$. Consider now the evaluation $f'$ of $C'$ that assigns 1 to every element in $X'$, 0 to every element in $X \setminus X'$, agrees with $r'$ on $R$, and agrees with $f$ on $H_{in}$. We first prove that $f$ and $f'$ agree on all gates in $H_{out}$. Let $g$ be an arbitrarily chosen or-gate in $H_{out}$. We will prove that if $f$ assigns 1 to $g$, then $f'$ also assigns 1 to $g$, and if $f'$ assigns 1 to $g$, then $f$ also assigns 1 to $g$.
If $f$ assigns 1 to $g$ then there exists a gate $g'$ such that either $f$ assigns 1 to $g'$ and $g'$ feeds positively into $g$, or $f$ assigns 0 to $g'$ and $g'$ feeds negatively into $g$. The gate $g'$ can either be in $X$, in $R$, in $H_{in}$, or in $B_i$ for some $i \le t$. If $g'$ is in $X$ or $H_{in}$ then $f$ and $f'$ agree on $g'$ (by choice of $f'$) and hence $f'$ also assigns 1 to $g$. If $g'$ is in $R$ then the algorithm $A$ when run on $(C, (T, \chi), X, r)$ recorded 1 for $g$ in the last phase. Hence $A$ when run on $(C, (T, \chi), X, r')$ also recorded 1 for $g$ in the last phase. Hence there exists a gate $g''$ in $R$ such that $r'$ assigns 1 to $g''$ and $g''$ feeds positively into $g$, or $r'$ assigns 0 to $g''$ and $g''$ feeds negatively into $g$. We conclude that also in this case $f'$ assigns 1 to $g$. Finally, if $g'$ is in $B_i$, then set $P$ to be the subset of $N^-_{C'}(B_i) \setminus R$ that $f$ assigns 1. We have that $g \in C^i_r(P)$, in other words $C^i_r(P)$ outputs 1 for $g$. Since the recordings of the algorithm for $C^i_r(P)$ and $C^i_{r'}(P)$ are the same, it follows that $C^i_{r'}(P)$ also outputs 1 for $g$. But this means that there is some gate $g'' \in B_i$ such that $f'$ assigns 1 to $g''$ and $g''$ feeds positively into $g$, or $f'$ assigns 0 to $g''$ and $g''$ feeds negatively into $g$. Hence $f'$ assigns 1 to $g$. The proof that if $f'$ assigns 1 to $g$ then $f$ also assigns 1 to $g$ is identical.
Consider now an arbitrarily chosen and-gate $g$ in $H_{out}$. An identical argument to the one above shows that if $f$ assigns 0 to $g$, then $f'$ also assigns 0 to $g$, and if $f'$ assigns 0 to $g$, then $f$ also assigns 0 to $g$. Hence $f$ and $f'$ agree on all gates in $H_{out}$.
Since $f$ assigns 1 to $o$ it follows that $f$ assigns 1 to all gates in $H_L \cup H_R$. Thus, for every $h \in H$ we have that $f$ assigns the same value to $h_{in}$ and $h_{out}$. Since $f$ and $f'$ agree both on $h_{in}$ and $h_{out}$ it follows that $f'$ also assigns 1 to all gates in $H_L \cup H_R$. Finally we need to argue that $f'$ assigns 1 to $o$. Since all gates in $H_L \cup H_R$ are 1 this follows from an argument identical to the one for and-gates in $H_{out}$. This concludes the proof.
Counting Satisfying Assignments
Lemma 5. There is an algorithm that takes as input a circuit $C$ with $n$ input gates and gate-size $m$, such that at most $s \cdot n$ wires are incident to $I(C)$, and $\omega = tw(C - I(C)) = o(\log n)$, runs in time $2^{n(1-\varepsilon)} m^{O(1)}$, where
and outputs the number of satisfying assignments to C.
Proof. First, using Bodlaender's algorithm [3], the algorithm computes a tree decomposition $(T, \chi)$ of $C - I(C)$ of width at most $\omega$, in time $2^{O(\omega^3)} m = 2^{o(n)} m$. Then, the algorithm picks a subset $X$ of $I(C)$ of size $\varepsilon \cdot n$ such that $|N^+_C(X)| \le \varepsilon \cdot s \cdot n$. Such a set exists by a simple averaging argument. For example, $X$ can be chosen to be the set of the first $\varepsilon n$ gates of $I(C)$ when the gates are ordered by their fan-out in increasing order. Then, the algorithm sets $R = I(C) \setminus X$, and proceeds to iterate over every assignment $r : R \to \{0, 1\}$. For each $r$ the algorithm computes the number $c_r$ of satisfying assignments to $C_r$ as follows. The algorithm first applies Lemma 4 to $r$, and computes a string $q \in \{0, 1\}^{\ell}$, where $\ell \le 6(|N^+_C(X)| + 1) \cdot (\omega + 1) \cdot 2^{2(\omega+1)}$. It then looks up whether the string $q$ has been output in a previous iteration for some other assignment $r'$ to $R$. If this is the case, then by Lemma 4 we have that $c_r = c_{r'}$, so the algorithm records this and proceeds to the next iteration. If the string $q$ has not been previously encountered, the algorithm computes $c_r$ by trying all possible assignments to $X$ and stores the pair $(q, c_r)$. After iterating over all choices of $r$ the algorithm outputs $\sum_r c_r$ as the total number of satisfying assignments to $C$. Since each satisfying assignment to $C$ corresponds to exactly one satisfying assignment to exactly one circuit $C_r$, the correctness of the algorithm follows.
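The averaging argument for the choice of $X$ can be illustrated with a minimal Python sketch. The function name `pick_low_fanout_inputs`, the dictionary representation of fan-outs, and the example values are assumptions made for illustration only. Taking the $k$ gates of smallest fan-out out of $n$ gives total fan-out at most a $k/n$ fraction of the number of wires leaving $I(C)$, and $|N^+_C(X)|$ is at most the number of wires leaving $X$.

```python
import math

def pick_low_fanout_inputs(fanout, eps):
    """Return the floor(eps * n) input gates of smallest fan-out.

    `fanout` maps each input gate (hypothetical names) to its fan-out.
    """
    n = len(fanout)
    k = math.floor(eps * n)
    # Order the gates by fan-out, increasing, and keep the first k.
    ordered = sorted(fanout, key=lambda g: fanout[g])
    return ordered[:k]

# Hypothetical example: four input gates with 12 outgoing wires in total.
fanout = {"a": 1, "b": 5, "c": 2, "d": 4}
X = pick_low_fanout_inputs(fanout, 0.5)
wires_from_X = sum(fanout[g] for g in X)
```

In this example the selected gates carry 3 of the 12 wires, below the bound of $0.5 \cdot 12 = 6$ given by averaging.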
We now turn to the running time analysis. For each of the $2^{n(1-\varepsilon)}$ choices of $r$ the algorithm applies Lemma 4 once, which takes $n^{O(1)}$ time since $\omega = o(\log n)$. Furthermore, in at most $2^{\ell}$ of the iterations (once for each distinct $q$), the algorithm spends $2^{\varepsilon n}$ time to compute $c_r$. In all other iterations, computation of $c_r$ takes time $O(n)$ as it requires a single lookup in an exponentially large table (or binary search tree) that stores the pairs $(q, c_r)$. Thus the total time is upper bounded by $2^{(1-\varepsilon)n} n^{O(1)} + 2^{\ell} 2^{\varepsilon n} n^{O(1)}$. Since $\ell \le 6(s \cdot \varepsilon \cdot n + 1) \cdot (\omega + 1) \cdot 4^{\omega+1}$ we have that $2^{\ell} 2^{\varepsilon n} \le 2^{(n/2)+o(n)}$. Hence the running time is upper bounded by $2^{(1-\varepsilon)n} n^{O(1)}$, as claimed.
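The iteration over assignments $r$ with a lookup table keyed by the recorded string can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of Lemma 4: the circuit is modelled as a Python function, and `signature` is a hypothetical stand-in for the string $q$, assumed to have the property that equal signatures imply equal restricted functions.

```python
from itertools import product

def count_satisfying(circuit, num_r, num_x, signature):
    """Count satisfying assignments, brute-forcing X once per distinct signature."""
    table = {}             # maps signature q -> c_r
    total = 0
    brute_force_runs = 0   # number of iterations that enumerated all of X
    for r in product((0, 1), repeat=num_r):
        q = signature(r)
        if q not in table:
            # First time q is seen: try all assignments to X.
            table[q] = sum(circuit(r, x) for x in product((0, 1), repeat=num_x))
            brute_force_runs += 1
        total += table[q]
    return total, brute_force_runs

# Toy circuit (assumption): depends on r only through (r[0] OR r[1]).
def toy_circuit(r, x):
    return int((r[0] or r[1]) and (x[0] != x[1]))

def toy_signature(r):
    return (r[0] or r[1],)

result = count_satisfying(toy_circuit, 2, 2, toy_signature)
```

Here only two of the four assignments to $R$ trigger the exhaustive enumeration over $X$; the other two are answered by table lookup, mirroring the $2^{\ell}$ term in the analysis.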
Observe that the requirement in Lemma 5 that at most $s \cdot n$ wires are incident to $I(C)$ is weaker than demanding that the wire-size of $C$ is at most $s \cdot n$. Next we strengthen Lemma 5 to circuits of bounded gate-size, rather than wire-size. In a circuit $C$ where the treewidth of $C - I(C)$ is bounded by $\omega$, the number of wires in $C - I(C)$ is within a factor $\omega$ of the number of gates. However the number of wires from $I(C)$ to $N^+_C(I(C))$ can be as large as quadratic in $|N^+_C(I(C))|$. We obtain the strengthening of Lemma 5 to bounded gate-size by a standard fan-out reducing procedure for the input gates.
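Theorem 1. There is an algorithm that takes as input a circuit $C$ with $n$ input gates and gate-size $m$, such that $|N^+_C(I(C))| \le s \cdot n$, and $\omega = tw(C - I(C)) = o(\log n)$, runs in time $2^{n(1-\varepsilon/10)} m^{O(1)}$, and outputs the number of satisfying assignments to $C$.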
Proof. First, using Bodlaender's algorithm [3], the algorithm computes a tree decomposition $(T, \chi)$ of $C - I(C)$ of width at most $\omega$, in time $2^{O(\omega^3)} m = 2^{o(n)} m$. From now on, we assume that the decomposition $(T, \chi)$ is given as input. The algorithm is recursive, and proceeds as follows. If no input gate (not counting the constant gates) has fan-out more than $6 \cdot s$ the algorithm applies Lemma 5 to solve the instance. Otherwise the algorithm picks an input gate $g$ with fan-out at least $6 \cdot s$. It then calls itself recursively to count the number of satisfying assignments where $g$ is set to 1, and the number of satisfying assignments where $g$ is set to 0.
Before making the recursive call the algorithm performs the following simplification procedure. When $g$ is set to 1 then, for every or-gate $g'$ that $g$ feeds positively into we force $g'$ to 1. For every and-gate $g'$ that $g$ feeds negatively into we force $g'$ to 0. When $g$ is set to 0 then, for every or-gate $g'$ that $g$ feeds negatively into we force $g'$ to 1. For every and-gate $g'$ that $g$ feeds positively into we force $g'$ to 0. After the simplification procedure the algorithm removes the gate $g$ and all wires leading out of $g$.
It is easy to see that the simplification procedure produces circuits where the number of satisfying assignments to the simplified circuit is equal to the number of satisfying assignments to C where the input gate g is set to 1 or 0 respectively. The correctness of the algorithm follows from this fact, together with the correctness of the algorithm of Lemma 5. We now proceed with the running time analysis.
Consider a node in the recursion tree from which the algorithm makes two recursive calls, one with the input gate $g$ set to 1 and one with $g$ set to 0. For any gate $g'$ that $g$ feeds into, $g'$ will be forced in exactly one of the two recursive calls. Indeed, if $g'$ is an and-gate and $g$ feeds positively into $g'$ then $g'$ is forced when $g$ is set to 0. If $g'$ is an and-gate and $g$ feeds negatively into $g'$ then $g'$ is forced when $g$ is set to 1. If $g'$ is an or-gate and $g$ feeds positively into $g'$ then $g'$ is forced when $g$ is set to 1. Finally, if $g'$ is an or-gate and $g$ feeds negatively into $g'$ then $g'$ is forced when $g$ is set to 0.
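The case analysis above can be checked mechanically. The following minimal Python sketch encodes the forcing rule of the simplification procedure; the function name `forced_value` and the string encoding of gate types are assumptions made for illustration.

```python
def forced_value(gate_type, positive, input_value):
    """Value forced on an and/or gate by one fixed input, or None if not forced.

    gate_type: "and" or "or"; positive: True if the wire is not negated;
    input_value: the value (0 or 1) assigned to the input gate.
    """
    # Value that actually arrives at the gate along this wire.
    literal = input_value if positive else 1 - input_value
    if gate_type == "or" and literal == 1:
        return 1   # one true input forces an or-gate to 1
    if gate_type == "and" and literal == 0:
        return 0   # one false input forces an and-gate to 0
    return None

# For every gate type and wire polarity, count the branches that force the gate.
forcing_branches = {
    (t, p): sum(forced_value(t, p, v) is not None for v in (0, 1))
    for t in ("and", "or") for p in (True, False)
}
```

Each of the four combinations of gate type and polarity is forced in exactly one of the two branches, which is the property used to define heavy edges.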
Hence, in at least one of the two recursive calls, at least $3 \cdot s$ gates in $N^+_C(I(C))$ are forced. For each internal node of the recursion tree, pick exactly one edge corresponding to a recursive call where at least $3 \cdot s$ gates in $N^+_C(I(C))$ are forced, and call this edge a heavy edge. Since every internal node in the recursion tree has exactly one heavy and one light (not heavy) edge coming out of it, each leaf of the recursion tree is uniquely identified by the length of the root to leaf path, and the positions in the path of the heavy edges. In a leaf node for which the root to leaf path has length $\ell$, the number of input gates is $n - \ell$ and the total number of wires leading out from the input gates is at most $(n - \ell) \cdot 4 \cdot s$. Further, if the root to leaf path has at least $h$ heavy edges, then at most $s \cdot n - 3 \cdot s \cdot h$ normal gates in $C$ are being fed into from the non-constant input gates. It follows that $h \le n/3$. Consequently, the running time of the algorithm is bounded by
For ≥ 9n/10 the terms of the sum are upper bounded by
For ≤ 9n/10 the sum is upper bounded by 2 · 2 (n− ) ( 
. Hence the total running time is upper bounded by $2^{n(1-\varepsilon/10)} m^{O(1)}$, as claimed.
Quantified Boolean Circuit SAT
Informally, a quantified circuit is simply a quantified boolean formula in prenex normal form where the formula that is used to evaluate the predicate once the variables have been instantiated is replaced by a circuit. Formally, a quantified circuit is a circuit C with n input gates equipped with a bijection σ : {1, . . . , n} → I(C) and a quantifier sequence q 1 , q 2 , . . . , q n where each q i is a symbol in {∀, ∃}.
We now give an inductive definition for what it means for a quantified circuit to evaluate to 1 or evaluate to 0. A quantified circuit with a single input gate $g$ and quantifier $q_1 = \exists$ evaluates to 1 if $C(\{g\}) = 1$ or $C(\emptyset) = 1$. A quantified circuit with a single input gate $g$ and quantifier $q_1 = \forall$ evaluates to 1 if $C(\{g\}) = C(\emptyset) = 1$. Otherwise the quantified circuit evaluates to 0. A quantified circuit with $n \ge 2$ input gates with $q_1 = \exists$ evaluates to 1 if at least one of the two quantified circuits $C_{\sigma(1) \leftarrow 1}$ or $C_{\sigma(1) \leftarrow 0}$ with quantifier sequence $q_2, \ldots, q_n$ evaluates to 1. A quantified circuit with $n \ge 2$ input gates with $q_1 = \forall$ evaluates to 1 if both quantified circuits $C_{\sigma(1) \leftarrow 1}$ and $C_{\sigma(1) \leftarrow 0}$ with quantifier sequence $q_2, \ldots, q_n$ evaluate to 1. Otherwise the quantified circuit evaluates to 0. We will sometimes say that the quantified circuit is "true" or "false" instead of saying it evaluates to 1 or 0 respectively.
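The inductive definition translates directly into a recursive evaluator. The following Python sketch is illustrative only: it models the circuit as a function on a tuple of bits listed in the order given by $\sigma$, and encodes the quantifiers as the characters "E" (for $\exists$) and "A" (for $\forall$); both encodings are assumptions.

```python
def evaluate_quantified(circuit, quantifiers, prefix=()):
    """Evaluate a quantified circuit by branching on the inputs in quantifier order."""
    if len(prefix) == len(quantifiers):
        # All input gates are instantiated: evaluate the circuit itself.
        return bool(circuit(prefix))
    # Branch on the next input gate being set to 1 and to 0.
    branches = (
        evaluate_quantified(circuit, quantifiers, prefix + (b,)) for b in (1, 0)
    )
    if quantifiers[len(prefix)] == "E":
        return any(branches)   # existential: at least one branch is true
    return all(branches)       # universal: both branches are true

# Toy circuit (assumption): exclusive or of its two inputs.
xor = lambda v: v[0] != v[1]
forall_exists = evaluate_quantified(xor, "AE")
exists_forall = evaluate_quantified(xor, "EA")
```

The two calls show that the quantifier order matters: for every first bit there is a second bit that differs from it, but no single first bit differs from both values of the second.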
In the Quantified Boolean Circuit SAT problem the input is a quantified circuit $(C, \sigma, q_1, \ldots, q_n)$. The task is to determine whether the quantified circuit is true or false. The recursive definition of evaluating the circuit naturally gives rise to a $2^n \cdot n^{O(1)}$ time algorithm for the problem. Here we give an algorithm that is significantly faster provided that $C - I(C)$ has bounded treewidth, and that every gate in $I(C)$ has bounded fan-out. The proof of Theorem 2 closely follows the proof of Lemma 5; the key difference is that the subset $X$ of input gates that are used to generate the propeller decomposition is no longer chosen greedily, but set to $\{\sigma(\lceil n(1-\varepsilon) \rceil), \sigma(\lceil n(1-\varepsilon) \rceil + 1), \ldots, \sigma(n)\}$ instead.
Theorem 2.
There is an algorithm that takes as input a quantified circuit $(C, \sigma, q_1, \ldots, q_n)$ with $n$ input gates and gate-size $m$, such that each input gate has fan-out at most $s$, and $\omega = tw(C - I(C)) = o(\log n)$, runs in time $2^{n(1-\varepsilon)} m^{O(1)}$, where
and outputs whether (C, σ, q 1 , . . . , q n ) is true.
Proof. First, using Bodlaender's algorithm [3], the algorithm computes a tree decomposition $(T, \chi)$ of $C - I(C)$ of width at most $\omega$, in time $2^{O(\omega^3)} m = 2^{o(n)} m$. Then, the algorithm sets $X \subseteq I(C)$ to be $\{\sigma(\lceil n(1-\varepsilon) \rceil), \sigma(\lceil n(1-\varepsilon) \rceil + 1), \ldots, \sigma(n)\}$. Observe that $|N^+_C(X)| \le \varepsilon \cdot s \cdot n$. Then, the algorithm sets $R = I(C) \setminus X$, and commences with the algorithm for evaluating quantified circuits that follows directly from the definition. However, every time the algorithm arrives at a recursive call when the set of remaining input gates is exactly $X$ it proceeds as follows.
Let $r$ be the assignment that is forced to $R$ in this recursive call. The algorithm needs to return $c_r$, which is defined to be 0 or 1 according to whether the quantified circuit $(C_{R \leftarrow r}, \sigma, q_{\lceil n(1-\varepsilon) \rceil}, q_{\lceil n(1-\varepsilon) \rceil + 1}, \ldots, q_n)$ evaluates to 0 or to 1. The algorithm first applies Lemma 4 to $r$, and computes a string $s \in \{0, 1\}^{\ell}$, where $\ell \le 6(|N^+_C(X)| + 1) \cdot (\omega + 1) \cdot 2^{2(\omega+1)}$. It then looks up whether the string $s$ has been output in a previous recursive call for some other assignment $r'$ to $R$. If it has, then by Lemma 4 we have that $c_r = c_{r'}$, since $C_r = C_{r'}$ and whether the considered quantified circuit evaluates to 0 or 1 depends only on the function computed by the circuit $C_{R \leftarrow r}$ and not on the (structure of the) circuit itself. In this case, the algorithm returns $c_r = c_{r'}$.
If the string s has not been previously encountered, the algorithm computes c r using the algorithm implicit in the definition of evaluating quantified circuits. It then stores that s has been previously encountered, and that when s was encountered the algorithm returned c r . Then the algorithm returns c r . The correctness of the algorithm follows directly from the definition of evaluation of quantified circuits together with Lemma 4.
We now turn to the running time analysis. For each of the $2^{n(1-\varepsilon)}$ choices of $r$ the algorithm applies Lemma 4 once, which takes $m^{O(1)}$ time since $\omega = o(\log n)$. Furthermore, in at most $2^{\ell}$ of the recursive calls (the ones where $s$ has not been previously encountered) the algorithm spends $2^{\varepsilon n}$ time to compute $c_r$. In all other recursive calls this is done in time $O(n)$ by a single lookup in an exponentially large table (or binary search tree) storing all previously encountered strings. Thus the total time is upper bounded by $2^{(1-\varepsilon)n} m^{O(1)} + 2^{\ell + \varepsilon n}$. Since $\ell \le 6(s \cdot \varepsilon \cdot n + 1) \cdot (\omega + 1) \cdot 4^{\omega+1}$ we have that $2^{\ell + \varepsilon n} \le 2^{(n/2)+o(n)}$. Hence the running time is upper bounded by $2^{(1-\varepsilon)n} m^{O(1)}$, as claimed.
Notice that the only place where we used that the fan-out of all input gates is at most $s$ is to bound $|N^+_C(X)|$. Hence Theorem 2 also applies to all circuits where the total fan-out of the $\varepsilon \cdot n$ gates that are quantified last is at most $\varepsilon \cdot s \cdot n$.
Lower Bounds
In this section we will prove that constant treewidth circuits with a linear number of wires cannot compute the $\mathrm{Majority}_n$ function. The proof hinges on the classic result by Chandra, Furst and Lipton [7] that in the Number On The Forehead (NOF) model of communication complexity, $\mathrm{Majority}_n$ does not admit a constant cost communication protocol with a constant number of players. We will prove that every symmetric function computable by a constant treewidth circuit with a linear number of wires does admit such a protocol. Together the two results yield the desired lower bound for $\mathrm{Majority}_n$.
Theorem 3. For all constants $s$ and $\omega$ there is no circuit $C$ that computes $\mathrm{Majority}_n$ such that at most $s \cdot n$ wires lead out of $I(C)$ and the treewidth of $C - I(C)$ is at most $\omega$.
The Number on the Forehead Model
We now introduce the notions from communication complexity that we will use. For in-depth treatment, see [22]. In the Number On The Forehead (NOF) model there are $k$ players that together have to compute a function $f : \{0, 1\}^n \to \{0, 1\}$. The players know $n$ and the function $f$ to be computed prior to deciding on the protocol. Once they have decided on a protocol, an input $x \in \{0, 1\}^n$ is given, and the players together have to output $f(x)$.
Each player sees all bits of $x$ except a subset of roughly $n/k$ "anti-private" bits that are seen by everyone else except her. In particular, for every player $i$ there is a set $Q_i \subseteq \{1, \ldots, n\}$ of size $\lfloor n/k \rfloor$ or $\lceil n/k \rceil$ such that player $i$ has access to the input bits $\{1, \ldots, n\} \setminus Q_i$. The sets $Q_1, \ldots, Q_k$ form a partition of $\{1, \ldots, n\}$, and the players know this partition.
Communication between the players happens by broadcast, i.e., when a player transmits a message all the other players receive the message. The cost of sending the message is the length of the message. Any one of the players may output the function value. There are two variants of the Number On The Forehead model: the best-partition and the worst-partition version. In the best-partition model it is sufficient that there exists a partition $Q_1, \ldots, Q_k$ such that for every input $x \in \{0, 1\}^n$ the protocol successfully computes $f(x)$. In the worst-partition model the protocol has to successfully compute $f(x)$ for every input $x$ and every partition $Q_1, \ldots, Q_k$.
A function $f : \{0, 1\}^n \to \{0, 1\}$ is symmetric if $f(x)$ only depends on the number of input bits set to 1. The function $\mathrm{Majority}_n : \{0, 1\}^n \to \{0, 1\}$ outputs 1 if at least $n/2$ bits are set to 1. Hence $\mathrm{Majority}_n$ is symmetric. For symmetric functions there is no difference between the best-partition and the worst-partition models. Our lower bounds for $\mathrm{Majority}_n$ rely on the following result.
Proposition 1 ([7]). For all constants $c$ and $k$, $\mathrm{Majority}_n$ does not admit a best-partition NOF protocol with cost $c$ and $k$ players.
One-Way Communication Protocols
In a one-way communication protocol with k players the i'th player has access to a subset S i ⊆ {1, . . . n} of the input bits. The access size of the protocol is max i |S i |. Each bit is seen by at least one player, and the players know the family S 1 , . . . , S k .
Each player gets to send a single message to a center. The center receives the messages from all the players, but has no access to the input. The center then has to determine and output the function value. None of the players other than the center can see the messages of any of the other players. The cost of the communication protocol is the total number of bits sent from the players to the center.
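To make the model concrete, here is a minimal Python sketch of a one-way protocol for the parity function, which is not a function considered in this paper and is used purely as an illustration. Each player sends one bit (the parity of the input bits it can see) and the center combines the messages without looking at the input. The function name and the assumption that the access sets partition the input positions are hypothetical choices for this example.

```python
from itertools import product

def parity_one_way(x, access_sets):
    """One-way protocol for parity; access_sets must partition the positions of x."""
    # Each player sees only its own bits and sends a single bit to the center.
    messages = [sum(x[i] for i in s) % 2 for s in access_sets]
    # The center has no access to x: it only combines the received messages.
    return sum(messages) % 2

access_sets = [[0, 1], [2, 3]]   # two players, access size 2
agrees_everywhere = all(
    parity_one_way(x, access_sets) == sum(x) % 2
    for x in product((0, 1), repeat=4)
)
```

The cost of this protocol is one bit per player, independent of the input length; the results of this section show that no protocol of this kind with constant cost exists for majority under the stated restrictions.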
We will only consider best-partition one-way communication protocols where there has to exist a collection $S_1, \ldots, S_k$ of sets (each $S_i$ of cardinality at most the access size) such that for every $x$ the protocol successfully computes $f(x)$ given that each player $i$ has access to the bits in $S_i$. Note that $S_1, \ldots, S_k$ does not have to be a partition of $\{1, \ldots, n\}$; nevertheless we refer to this model as the best-partition model for a nomenclature consistent with the one for the NOF model. We start by relating one-way communication protocols for $\mathrm{Majority}_n$ and communication protocols in the NOF model.
Lemma 6. For all constants $k$ and $c$, if $\mathrm{Majority}_n$ admits a best-partition one-way communication protocol with $k$ players, access size $n/2$, and cost $c$, then $\mathrm{Majority}_n$ admits a best-partition NOF protocol with $k$ players and cost $c$.
Proof. Suppose that Majority n admits a best-partition one-way communication protocol with k players, access size n/2, and cost c. We then give a best-partition NOF protocol with cost c and k players.
Let $S_1, \ldots, S_k \subseteq \{1, \ldots, 5n\}$ be the access sets of the players for computing $\mathrm{Majority}_{5n}$ in the one-way protocol. Each $S_i$ has size at most $5n/2$. Select $Q_1, \ldots, Q_k \subseteq \{1, \ldots, 5n\}$ such that for every $i$, $|Q_i| \in \{\lfloor n/k \rfloor, \lceil n/k \rceil\}$, $Q_i$ is disjoint from $S_i$, $\sum_i |Q_i| = n$ and the sets $Q_i$ are pairwise disjoint. Such a collection $Q_1, \ldots, Q_k$ exists because it can be picked greedily: first select $\lfloor n/k \rfloor$ or $\lceil n/k \rceil$ elements disjoint from $S_1$ and put them into $Q_1$. Then select $\lfloor n/k \rfloor$ or $\lceil n/k \rceil$ elements from $\{1, \ldots, 5n\} \setminus (S_2 \cup Q_1)$ and put them into $Q_2$. In general, in step $i$ pick $\lfloor n/k \rfloor$ or $\lceil n/k \rceil$ elements disjoint from $S_i \cup Q_1 \cup \ldots \cup Q_{i-1}$ and insert them into $Q_i$. Since $|S_i \cup Q_1 \cup \ldots \cup Q_{i-1}| \le 5n/2 + n \le 7n/2$, at least $3n/2$ elements remain available, so this will always be possible. Because $\mathrm{Majority}_{5n}$ is a symmetric function, without loss of generality $Q_1, \ldots, Q_k$ forms a partition of $\{1, \ldots, n\}$. We give a NOF protocol for $\mathrm{Majority}_n$ with partition $Q_1, \ldots, Q_k$.
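The greedy selection can be sketched in Python as follows. The function name `greedy_forehead_sets`, the choice to give the larger size to the first $n \bmod k$ players, and the example access sets are assumptions made for illustration; the paper only requires that the sizes are $\lfloor n/k \rfloor$ or $\lceil n/k \rceil$ and sum to $n$.

```python
import math

def greedy_forehead_sets(access_sets, n):
    """Pick pairwise disjoint Q_i inside {1..5n} with Q_i disjoint from S_i."""
    k = len(access_sets)
    used = set()
    result = []
    for i, s_i in enumerate(access_sets):
        # The first n mod k players get ceil(n/k) elements, the rest floor(n/k).
        size = math.ceil(n / k) if i < n % k else n // k
        free = [e for e in range(1, 5 * n + 1) if e not in s_i and e not in used]
        q_i = free[:size]
        assert len(q_i) == size   # guaranteed when every |S_i| <= 5n/2
        used.update(q_i)
        result.append(q_i)
    return result

# Hypothetical access sets for n = 4, k = 2 (each of size 10 = 5n/2).
S = [set(range(1, 11)), set(range(11, 21))]
Q = greedy_forehead_sets(S, 4)
```

In this example the first player's forehead set is taken from the positions it cannot see in the one-way protocol, and likewise for the second player.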
On input $x \in \{0, 1\}^n$, for every $i \le k$, player $i$ simulates the behavior of player $i$ in the one-way protocol for $\mathrm{Majority}_{5n}$ on input $(x, y)$ where $y$ is a string of $2n$ 1's followed by $2n$ 0's. Note that $\mathrm{Majority}_n(x) = \mathrm{Majority}_{5n}(x, y)$. Further, player $i$ of the NOF protocol can simulate the behavior of player $i$ in the one-way protocol because $Q_i$ and $S_i$ are disjoint. In the NOF model, communication happens by broadcast, so player 1 can act as the center of the one-way protocol for $\mathrm{Majority}_{5n}$, and output the value of $\mathrm{Majority}_{5n}(x, y) = \mathrm{Majority}_n(x)$. The cost of the NOF protocol for $\mathrm{Majority}_n$ is the same as the cost of the one-way protocol for $\mathrm{Majority}_{5n}$, which is a constant $c$ independent of $n$. This concludes the proof.
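The padding identity $\mathrm{Majority}_n(x) = \mathrm{Majority}_{5n}(x, y)$ can be verified exhaustively for small $n$. The sketch below uses the definition from the text (output 1 if at least half of the bits are 1); the function names are assumptions. If $x$ has $a$ ones, then $(x, y)$ has $a + 2n$ ones out of $5n$ bits, and $2(a + 2n) \ge 5n$ holds exactly when $2a \ge n$.

```python
from itertools import product

def majority(bits):
    """Return 1 if at least half of the bits are set to 1."""
    return int(2 * sum(bits) >= len(bits))

def padded(x):
    """Append 2n ones followed by 2n zeros to a string x of length n."""
    n = len(x)
    return tuple(x) + (1,) * (2 * n) + (0,) * (2 * n)

identity_holds = all(
    majority(x) == majority(padded(x))
    for n in range(1, 7)
    for x in product((0, 1), repeat=n)
)
```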
One-Way Communication Protocols from Bounded Treewidth Circuits
We now prove the main technical ingredient of our lower bound: that every function computable by a linear wire-size bounded treewidth circuit admits a one-way protocol with a constant number of players, constant cost, and access size $n/2$.
Lemma 7. Let $C$ be a circuit with $n$ input gates, such that at most $s \cdot n$ wires are incident to $I(C)$ and $tw(C - I(C)) \le \omega$ for a positive integer $\omega$. Then, for every $d$, the function $C : 2^{I(C)} \to 2^{\{o\}}$ admits a one-way multi-party communication protocol with at most $8 \cdot s \cdot d$ players, each seeing at most $n/d$ bits of the input, and each sending at most $12 \cdot (\omega + 1) \cdot 4^{\omega}$ bits to the center.
Towards the proof of Lemma 7 we will first give a propeller decomposition lemma similar to Lemma 2, but decomposing the graph into "equally sized" blades rather than trying to exclude some fixed set from the blades. Similar arguments commonly appear in graph algorithms, see e.g. [4, 21]. In the following, when given a weight function $w : V(G) \to \mathbb{N}$ on the vertices of a graph $G$, we will write $w(S)$ for a subset $S$ of $V(G)$, meaning $w(S) = \sum_{v \in S} w(v)$.
Lemma 8. There exists a linear time algorithm that given a graph $G$, a weight function $w : V(G) \to \mathbb{Z}^+$, a tree decomposition $(T, \chi)$ of $G$ of width $\omega$ and an integer $r$, computes a propeller decomposition $H, B_1, B_2, \ldots, B_t$ of width at most $2(\omega + 1)$, such that $t \le 5r$, $|H| \le 2r(\omega + 1)$, and for every $i \le t$, $w(B_i) \le w(V(G))/r$.
Proof. We will consider the tree decomposition as rooted with root $r$. It is well known (see e.g. [11]) that given a tree decomposition $(T, \chi)$ of a graph $G$ one can in polynomial time construct a rooted tree decomposition $(T', \chi')$ of $G$ of the same width, such that every node $u$ in $T'$ has at most two children (neighbors that are further away from the root). We will therefore assume that in $T$ every node has at most two children. For every vertex $u \in V(T)$ we define $T_u$ to be the subtree of $T$ rooted at $u$. We perform the following marking procedure.
Initially $M = \emptyset$ and $w' : V(G) \to \mathbb{N}$ is equal to $w$. Then, if such a vertex exists, pick a lowermost vertex $u \in V(T)$ such that $w'(\chi(V(T_u))) \ge w(V(G))/r$. Add $u$ to $M$ and, for every $v \in \chi(V(T_u))$, set $w'(v)$ to 0. When no vertex $u$ with $w'(\chi(V(T_u))) \ge w(V(G))/r$ exists, terminate the marking procedure. Note that every time a vertex $u$ is added to $M$, $w'(V(G))$ is decreased by at least $w(V(G))/r$. Thus the marking procedure terminates after at most $r$ rounds, with $|M| \le r$. Set $M' = \text{LCA-closure}(M)$. By Lemma 3, $|M'| \le 2|M| \le 2r$.
Let $Q_1, Q_2, \ldots, Q_t$ be the connected components of $T \setminus M'$. By Lemma 3 we have that for every $i \le t$, $|N_T(Q_i)| \le 2$. Further, since $T$ has maximum degree 3 it follows that $t \le 2|M'| + 1 \le 5r$.
Define $H = \chi(M')$ and for each $1 \le i \le t$ set $B_i = \chi(Q_i) \setminus H$. Since every vertex of $G$ appears in a bag of the tree decomposition, $H, B_1, \ldots, B_t$ forms a partition of $V(G)$. By construction we have that for every $i$, $N(B_i) \subseteq H$. Furthermore, since $|N_T(Q_i)| \le 2$ we have $|N(B_i)| \le 2(\omega + 1)$. Finally, the marking procedure implies that for every $i \le t$, $w(B_i) \le w(V(G))/r$. This concludes the proof.
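The marking step of this proof can be sketched in Python as follows. This is an illustration under stated assumptions rather than the linear time algorithm of the lemma: it covers only the marking procedure (not the LCA-closure or the construction of the blades), the tree is given by a hypothetical `children` dictionary, and the set of vertices below each node is recomputed naively. Processing the nodes so that children come before parents ensures that a marked node is always a lowermost heavy one.

```python
def mark_heavy_nodes(children, bags, weight, root, r):
    """Return the marked nodes M of a rooted tree decomposition, lowermost first."""
    total = sum(weight.values())
    threshold = total / r
    residual = dict(weight)          # plays the role of w'
    # Preorder traversal; its reverse lists every node after all its descendants.
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))
    order.reverse()
    subtree_vertices = {}
    marked = []
    for u in order:
        verts = set(bags[u])
        for c in children.get(u, []):
            verts |= subtree_vertices[c]
        subtree_vertices[u] = verts
        # Mark u if the residual weight below it reaches the threshold.
        if total > 0 and sum(residual[v] for v in verts) >= threshold:
            marked.append(u)
            for v in verts:
                residual[v] = 0
    return marked

# Hypothetical path decomposition of the path a-b-c-d, unit weights, r = 2.
children = {0: [1], 1: [2]}
bags = {0: ["a", "b"], 1: ["b", "c"], 2: ["c", "d"]}
weight = {"a": 1, "b": 1, "c": 1, "d": 1}
M = mark_heavy_nodes(children, bags, weight, root=0, r=2)
```

In the example the threshold is $4/2 = 2$: the leaf bag is marked first, its vertices are zeroed, and the root is marked once the remaining weight below it reaches the threshold again, so $|M| = 2 \le r$.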
Proof of Lemma 7. We will assume without loss of generality that every gate of C has at most one wire feeding into it from I(C). To justify this assumption, suppose C does not have this property. For every wire uv with u in I(C) we will make a new or-gate g and replace the wire uv with the wires ug and gv. Call the resulting circuit C * . Clearly C and C * compute the same function. Furthermore, since adding leaves to a graph does not increase treewidth (except possibly from 0 to 1), the treewidth of C * − I(C * ) is also at most ω. Additionally, the number of wires feeding out of I(C * ) is also at most s · n. Hence we may prove Lemma 7 for C * , and the same conclusion will hold for C. Thus, from now on we assume that every gate of C has at most one wire feeding into it from I(C).
Next, define a weight function $w$ on the gates of $C - I(C)$ where the weight of every gate $g$ is $|N^-_C(g) \cap I(C)|$. In other words, the weight of a gate is the number of wires feeding into the gate from the input gates. From the assumption on the number of wires feeding out of $I(C)$ it follows that the total weight of all gates is at most $s \cdot n$. We then apply Lemma 8 on $C - I(C)$ with this weight function, and $r = d \cdot s$. This results in a propeller decomposition $H, B_1, \ldots, B_t$ of width at most $2(\omega + 1)$, such that $t \le 5 \cdot d \cdot s$, $|H| \le 2 \cdot d \cdot s(\omega + 1)$, and for every $i \le t$, $w(B_i) \le n/d$. If the output gate $o$ is not in $H$, but rather in some $B_i$, then remove $o$ from $B_i$ and add $o$ to $H$. This will increase the size of $H$ to at most $2 \cdot d \cdot s(\omega + 1) + 1$ and the width of the decomposition to at most $2(\omega + 1) + 1$. The circuit $C'$ is obtained from $C$ by splitting all the gates in $H \setminus \{o\}$. We abuse notation and will refer by $I(C)$ both to the input gates of $C$ and to the input gates of $C'$ that are copies of gates of $C$.
We now summarize the structural properties of the circuit C . Each gate h ∈ H \ {o} in the circuit C corresponds to 4 gates h in , h out , h L and h R in C . We define H in = {h in : h ∈ H \{o}}, H out = {h out : h ∈ H \ {o}}, H L = {h L : h ∈ H \ {o}} and H R = {h R : h ∈ H \ {o}}. In the circuit C we have that |H out | = |H in | ≤ 2(|N 
) ∪ Q and sends the result to the center. Since there are at most $2^{2(\omega+1)}$ choices for $Q$, and for each choice of $Q$ the result can be encoded in at most $2(\omega + 1) + 1$ bits, the total number of bits player $i$ transmits to the center is at most $12 \cdot (\omega + 1) \cdot 4^{\omega}$. The remaining players transmit to the center the value of the gates that they have access to. Each of these players transmits at most $\omega + 1$ bits.
The center receives the messages from all of the players, and then, for every $Y \subseteq H_{out}$, can evaluate the circuit $C'(X \cup Y)$ without the knowledge of $X$, because the messages of the players describe precisely the output of all of the sub-circuits $P_i$ and the value of all gates in $I(C)$ feeding directly into $H_{out} \cup \{o\}$. If there exists a $Y$ such that the center concludes that $C'(X \cup Y)$ evaluates to 1, it outputs that $C(X) = 1$. Otherwise it outputs that $C(X) = 0$.
Proof of Theorem 3
We are now in a position to put everything together and prove Theorem 3.
Proof. Suppose $\mathrm{Majority}_n$ had a circuit $C$ such that at most $s \cdot n$ wires lead out of $I(C)$ and the treewidth of $C - I(C)$ is at most $\omega$. By Lemma 7 with $d = 2$, $\mathrm{Majority}_n$ has a one-way protocol with constant cost, a constant number of players, and access size $n/2$. From Lemma 6 it then follows that $\mathrm{Majority}_n$ has a NOF protocol with a constant number of players and constant cost. This contradicts Proposition 1, which states that such a protocol does not exist.
