Let ACC • THR be the class of constant-depth circuits comprised of AND, OR, and MODm gates (for some constant m > 1), with a bottom layer of gates computing arbitrary linear threshold functions. This class of circuits can be seen as a "midpoint" between ACC (where we know nontrivial lower bounds) and depth-two linear threshold circuits (where nontrivial lower bounds remain open).
size is computable in $2^{n-n^{\varepsilon}}$ time (where $\varepsilon > 0$ depends on the depth and modulus of the circuit).
• NEXP does not have quasi-polynomial size ACC • THR circuits, and NEXP does not have quasi-polynomial size ACC • SYM circuits. Nontrivial size lower bounds were not known even for AND • OR • THR circuits.
• Every 0-1 integer linear program with n Boolean variables and s linear constraints is solvable in $2^{n-\Omega(n/((\log M)(\log s)^5))} \cdot \mathrm{poly}(s, n, M)$ time with high probability, where M upper bounds the bit complexity of the coefficients. (For example, 0-1 integer programs with weights in $[-2^{\mathrm{poly}(n)}, 2^{\mathrm{poly}(n)}]$ and poly(n) constraints can be solved in $2^{n-\Omega(n/\log^6 n)}$ time.) Impagliazzo, Paturi, and Schneider [IPS13] recently gave an algorithm for Õ(n) constraints; ours is the first asymptotic improvement over exhaustive search for up to subexponentially many constraints.
We also present an algorithm for evaluating depth-two linear threshold circuits (a.k.a. THR • THR) with exponential weights and $2^{n/24}$ size on all $2^n$ input assignments, running in $2^n \cdot \mathrm{poly}(n)$ time. This is evidence that non-uniform lower bounds for THR • THR are within reach.
INTRODUCTION
Recall that in the non-uniform Boolean circuit model, one designs an infinite family of logical circuits $\{C_n\}$, one for each input length n, in order to recognize a given binary language $L \subseteq \{0,1\}^*$. This model is notoriously powerful, even when the size of $C_n$ is bounded from above by a fixed polynomial in n, defining the complexity class P/poly. With polynomial size circuits, one can already "compute" some undecidable languages, such as $L = \{1^n \mid \text{the } n\text{th Turing machine halts on blank tape}\}$. Nevertheless, it is strongly believed that NP ⊄ P/poly, meaning that for even modestly-sized instances of NP-complete problems, the sizes of computations on such instances must be inevitably gigantic. However, knowledge of P/poly is rather poor, due to the "infinite" nature of the model: it is open if the huge complexity class nondeterministic exponential time (NEXP) is contained in P/poly. The containment of NEXP in P/poly would imply that problems verifiable with exponentially-long witnesses could be efficiently "solved" with small circuits. The possibility looks obviously absurd, but we do not know at present how to rule it out.
In recent years, it has been demonstrated that the existence of nontrivial circuit-analysis algorithms is closely linked to the NEXP versus P/poly problem. For instance, Impagliazzo, Kabanets, and Wigderson [IKW02] showed that NEXP ⊄ P/poly follows if there is a $2^{n^{o(1)}}$ time algorithm that can approximate a given circuit's acceptance probability to within 1/10. They also proved a partial converse, in that NEXP ⊄ P/poly implies a certain kind of derandomization. Subsequent work [Wil10] strengthened the algorithms-to-lower-bounds implication, proving that a similar algorithm which (for every k) runs in $2^{n-\omega(\log n)}$ time on all n-input $n^k$-size circuits still implies NEXP ⊄ P/poly. A variant of this implication (for circuit satisfiability algorithms) was combined with a satisfiability algorithm for a restricted circuit class called ACC, implying that NEXP does not have polynomial-size ACC circuits [Wil11b]. Recently, it was shown that NEXP ⊄ P/poly is equivalent to establishing a "weak" form of natural proofs [Wil13], building on Impagliazzo et al. In particular, NEXP ⊄ P/poly if and only if there is a "constructive" property of Boolean functions that is "useful" against P/poly.¹ To continue progress on circuit lower bounds for NEXP, it is imperative to understand algorithms for analyzing circuits, such as algorithms for circuit satisfiability, evaluating a circuit on all $2^n$ inputs, and approximating the acceptance probability of a circuit.²
In this paper, we make this sort of algorithmic progress for circuits with arbitrary linear threshold gates: such a gate outputs 1 if and only if a certain linear inequality $\sum_i w_i x_i \ge t$ is true, where $w_i, t \in \mathbb{Z}$ are weights and $x_i \in \{0,1\}$ are inputs to the gate. Linear threshold functions have been studied for decades, coinciding with research on neural networks [MP69, Mur71]. Low-depth linear threshold circuits are powerful: many basic functions in arithmetic, algebra, and cryptography are known to be implementable with only constant-depth linear threshold circuits [RT92, SBKH93, SP94, MT99, NR04]. In terms of lower bounds for such circuits, very weak questions remain major open problems: for example, is all of NEXP solvable with polynomial-size depth-two linear threshold circuits with exponential-size weights?³
Depth-two circuits correspond to multilayer perceptrons with only one hidden layer. Despite considerable study in neural networks and deep learning, we still lack understanding of the power of depth-two circuits.
In this paper, we report some new progress on understanding the power of linear threshold gates.
ACC with threshold gates.
Let ACC • THR denote the class of circuits consisting of AND, OR, $\mathrm{MOD}_m$ gates for some constant m, and linear threshold gates, with unbounded fan-in and constant depth, such that the inputs of all linear threshold gates connect directly to the circuit's input variables. Let SYM • ACC • THR be the class of circuits where the output gate computes an arbitrary symmetric function, and its inputs connect to the outputs of ACC • THR circuits. We show that such circuits can be very efficiently evaluated on all $2^n$ inputs, even if they are of $2^{n^{o(1)}}$ size.
THEOREM 1.1. Given a SYM • ACC • THR circuit with n inputs and $2^{n^{o(1)}}$ size, we can produce its outputs on all $2^n$ inputs in $2^n \cdot \mathrm{poly}(n)$ time. More generally, such a circuit of size s can be evaluated on all inputs in $2^n \cdot \mathrm{poly}(\log s, n) + 2^{O(\log s)^c}$ time, for some c ≥ 1 depending on the depth of the circuit and the modulus m of its $\mathrm{MOD}_m$ gates.
The proof of Theorem 1.1 also carries through for SYM • ACC • SYM, where the bottom layer gates compute arbitrary symmetric functions (i.e., functions which only depend on the number of true inputs) of $2^{n^{o(1)}}$ wires. This algorithm can be used to count the number of satisfying assignments to ACC • THR circuits.
¹ The natural proofs barrier [RR97] states that if such a property is also "large" (true of a large fraction of functions) then strong cryptographic pseudorandom generators do not exist. Hence, assuming strong crypto, NEXP lower bounds must somehow confront the framework of natural proofs but sidestep the "large" condition.
² There are several recent surveys on these issues [Wil11a, San12, Coh13, Oli13].
³ Note that for thresholds with polynomially-bounded weights, depth-two lower bounds are known; however depth-three lower bounds are still open. The survey of Razborov [Raz92] is still relatively current on these points.
Faster integer linear programming.
Building on Theorem 1.1, we also give a new method for solving 0-1 integer linear programs (ILP). In FOCS'13, Impagliazzo, Paturi, and Schneider [IPS13] showed that for each c > 1, there is a δ < 1 such that 0-1 ILP with cn constraints can be solved in $2^{\delta n}$ time.
Depth-two linear threshold circuit evaluation.
We take an important step towards depth-two linear threshold circuit (a.k.a. THR • THR) lower bounds for the case of exponential weights, by giving an efficient algorithm for evaluating such circuits on all possible assignments.
THEOREM 1.5. Let k > 1. Given a depth-two $2^{n/24}$-size linear threshold circuit C with integer weights in $[-2^{n^k}, 2^{n^k}]$, we can evaluate C on all $2^n$ input assignments in $2^n \cdot \mathrm{poly}(n^k)$ time.
Theorem 1.5 follows from a more general result showing that any sufficiently large "combinatorial rectangle" of inputs can be evaluated in poly(n) amortized time per input. Noting that a similar statement for evaluating ACC circuits forms the heart of the proof of NEXP ⊄ ACC [Wil11b], Theorem 1.5 suggests that large complexity classes (such as NEXP) cannot have small depth-two linear threshold circuits. However, we do not yet know how to turn Theorem 1.5 into depth-two linear threshold lower bounds.
Prior work
Considerable effort has been expended in proving lower bounds against circuits with linear threshold gates. Here we will provide some major highlights, in addition to the work already mentioned.
It will help to introduce a little (standard) notation. Define MAJ, AND, OR, THR, and SYM to be the class of one-gate circuits corresponding to MAJORITY, AND, OR, linear threshold, and symmetric functions, respectively, with "free" NOT gates that can appear after the output or on the input wires to the gate. (Recall that a symmetric Boolean function's output only depends on the number of true inputs.) For classes of circuits $\mathcal{C}$ and $\mathcal{D}$, define $\mathcal{C}$ • $\mathcal{D}$ to be the class of circuits formed by taking a circuit $C \in \mathcal{C}$, and feeding the outputs of circuits from $\mathcal{D}$ as inputs to C. That is, $\mathcal{C}$ • $\mathcal{D}$ is simply the composition of circuits from $\mathcal{C}$ and $\mathcal{D}$, with the circuits from $\mathcal{D}$ receiving the input and the circuit from $\mathcal{C}$ giving the output. We will equate the size of a circuit with the number of wires, i.e., the number of directed arcs in the DAG defining the circuit. This is an important measure for circuits with symmetric gates, as the number of wires governs the size of the symmetric function representation.
Much work on depth-two threshold lower bounds has focused on lower bounds for inner product modulo 2, i.e., $\mathrm{IP2}(x_1, \dots, x_n, y_1, \dots, y_n) = \sum_i x_i \cdot y_i \bmod 2$.
Note that IP2 is easy for ACC (being a MOD2 of AND gates). In groundbreaking work, Hajnal et al. [HMP+93] proved that every MAJ • MAJ circuit requires $2^{\Omega(n)}$ gates to compute IP2. They also showed MAJ • SYM circuits can be efficiently simulated by MAJ • MAJ circuits, so small MAJ • SYM circuits also cannot compute IP2. Nisan [Nis94] [HP13] have shown an intriguing reduction: superpolynomial-size THR • THR lower bounds for a function f would follow from superlogarithmic lower bounds on the 3-party NOF unbounded-error communication complexity of f.
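To make the definition of IP2 and its "MOD2 of AND gates" description concrete, the following is a minimal Python sketch. The function names `ip2` and `ip2_truth_table` are illustrative and not from the paper, and the MOD2 gate is taken here to output the parity of its inputs (conventions for which residue a MOD gate accepts differ).

```python
from itertools import product

def ip2(x, y):
    """Inner product modulo 2 of two equal-length 0/1 tuples."""
    assert len(x) == len(y)
    # Bottom layer: one AND gate per coordinate pair.
    and_outputs = [xi & yi for xi, yi in zip(x, y)]
    # Top gate: parity of the AND outputs (assumed MOD2 convention).
    return sum(and_outputs) % 2

def ip2_truth_table(n):
    """2^n x 2^n matrix with rows indexed by x and columns by y, in lexicographic order."""
    points = list(product((0, 1), repeat=n))
    return [[ip2(x, y) for y in points] for x in points]
```

Viewing IP2 as a matrix in this way is the representation used by the rank-based lower bound arguments discussed in this section.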
Comparison and Intuition
It is instructive to discuss how this paper's approach relates to prior work on depth-two threshold lower bounds. A certain popular approach [FKL + 01, Lok08, She09, RS10] applies ingredients from Fourier analysis of Boolean functions, linear algebra, communication complexity, discrepancy theory, etc. In particular, these works follow the general scheme:
1. Define a notion of "relaxed rank" of a $2^{n/2} \times 2^{n/2}$ Boolean matrix C. Intuitively, if C has "relaxed rank" r, then there are $2^{n/2} \times r$ and $r \times 2^{n/2}$ matrices A and B such that the entries of A·B correspond to the entries of C in a direct way.
2. Show that every function $f : (\{0,1\}^{n/2} \times \{0,1\}^{n/2}) \to \{0,1\}$ computable with a "small" $\mathcal{C}$ circuit has "small relaxed rank" when construed as a $2^{n/2} \times 2^{n/2}$ Boolean matrix.
3. Find a family of functions $g_n : (\{0,1\}^{n/2} \times \{0,1\}^{n/2}) \to \{0,1\}$, construed as $2^{n/2} \times 2^{n/2}$ Boolean matrices, which requires "high relaxed rank" asymptotically.
Together, these steps prove that the family $g := \{g_n\}$ cannot have "small" $\mathcal{C}$ circuits.
To prove ACC • THR circuit lower bounds, we define a generalized rank notion we call the symmetric rank, informally measuring how efficiently a 0-1 matrix M can be decomposed into a sum of rank-one matrices such that, after applying a fixed symmetric function to each entry of the sum, we obtain the matrix M. Combining several elements from previous work, we show that for a Boolean matrix representing the truth table of a SYM • ACC • THR circuit of size s, its symmetric rank is $O(2^{\log^c s})$ for some constant c ≥ 1, depending on the depth d and modulus m of the $\mathrm{MOD}_m$ gates in the circuit. Moreover, given such a circuit we can efficiently compute a low-rank decomposition.
However, we do not know how to use existing methods to prove that an explicit function g has high symmetric rank. Instead, we take a more computational approach that still exploits the low symmetric rank property. The idea is that, if we can efficiently compute a low-rank decomposition from a given circuit, then the circuit's truth table can be obtained faster than evaluating the circuit on all its inputs one-by-one. This in turn suggests that these circuits possess considerable structure that make them unsuitable for simulating very complex functions, such as those in NEXP.
Suppose we are given a SYM • ACC • THR circuit C of size s with n inputs. Let M be a $2^{n/2} \times 2^{n/2}$ matrix defining the function computed by C. First we show how, given any such C, we can compute $2^{n/2} \times 2^{\log^c s}$ and $2^{\log^c s} \times 2^{n/2}$ matrices A and B (and a symmetric function f) giving a symmetric rank decomposition of M, in $2^{n/2} \cdot 2^{O(\log^c s)}$ time. By multiplying A and B and applying f to each entry of the output matrix, we can obtain M. When s is sufficiently small, a rectangular matrix multiplication of Coppersmith [Cop82] can be applied to compute the product of A and B, and the final matrix M is obtained in poly(n) time per entry. Hence, given a SYM • ACC • THR circuit C of size $2^{n^{o(1)}}$, we can evaluate C on all its $2^n$ inputs in only $2^n \cdot \mathrm{poly}(n)$ time. This fast evaluation algorithm is combined with prior work [Wil10, Wil11b] along with some new tricks to exhibit a $g := \{g_n\}$ ∈ NEXP which does not have quasipolynomial-size ACC • THR circuits.
Our evaluation algorithm for depth-two threshold circuits (Theorem 1.5) also uses Coppersmith's rectangular matrix multiplication as a subroutine, but the rest of the algorithm is rather different from the evaluation algorithm for SYM • ACC • THR. We reduce the problem of efficiently evaluating a depth-two threshold circuit on many inputs to a special type of matrix multiplication. Namely, for two matrices A and B over the integers, we compute a "weighted" matrix product whose (i, j) entry is
$\sum_k w_k \cdot \mathrm{LEQ}(A[i,k], B[k,j])$,
where LEQ(x, y) is a Boolean-valued function equal to 1 if and only if x ≤ y, and the $w_k$'s are arbitrary integer weights given as parameters to the problem. We show how Coppersmith's algorithm can be combined with a mild brute force search to efficiently compute a rectangular matrix product of the above form.
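As a point of reference, the weighted product described above can be computed directly from its definition. The sketch below is the naive cubic-time baseline, not the Coppersmith-based algorithm of the paper; the names `leq` and `weighted_threshold_product` are chosen for illustration only.

```python
def leq(x, y):
    """LEQ(x, y) = 1 if and only if x <= y."""
    return 1 if x <= y else 0

def weighted_threshold_product(A, B, w):
    """Naive evaluation: entry (i, j) is sum_k w[k] * LEQ(A[i][k], B[k][j]).

    A is an (n x d) integer matrix, B is (d x n), and w holds d integer weights.
    """
    d = len(w)
    assert all(len(row) == d for row in A) and len(B) == d
    n_cols = len(B[0])
    return [[sum(w[k] * leq(A[i][k], B[k][j]) for k in range(d))
             for j in range(n_cols)]
            for i in range(len(A))]
```

This takes time proportional to the product of the three dimensions, which is the cost the paper's algorithm is designed to beat when the inner dimension is small.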
ACC WITH A LAYER OF THRESHOLD GATES
The main theorem of this section is:
REMINDER OF THEOREM 1.1 Given a SYM • ACC • THR circuit with n inputs and $2^{n^{o(1)}}$ size, we can produce its outputs on all $2^n$ inputs in $2^n \cdot \mathrm{poly}(n)$ time. More generally, such a circuit of size s can be evaluated on all inputs in $2^n \cdot \mathrm{poly}(\log s, n) + 2^{O(\log s)^c}$ time, for some c ≥ 1 depending on the depth of the circuit and the modulus m of its $\mathrm{MOD}_m$ gates.
Depth reduction.
The first stage of the proof is to convert an arbitrary SYM • ACC • THR circuit C of size s into a depth-two circuit C′ of symmetric gates, i.e., a SYM • SYM circuit. The size of the depth-two circuit will be O(2
The following paragraphs give the proof of Lemma 2.1. Let C be a SYM • ACC • THR circuit with inputs $x_1, \dots, x_n$, size s, depth d, and $\mathrm{MOD}_m$ gates, for constants d > 2 and m > 1. In the proof, several constants arise; we will denote all of them by the same constant b > 1 which is assumed to be the maximum of these quantities.
The first step in Lemma 2.1 is to translate the THR layer of C into a SYM layer, by absorbing some of its complexity into the ACC part. Without loss of generality, we can assume that the weights of all threshold gates in C have absolute value at most $2^{bn \log_2 n}$ [MTT61, Mur71]. (Every THR function is equivalent to one with weights of bit-complexity at most $bn \log_2 n$.) Maciel and Therien [MT98] provided several fairly tight low-depth circuits for various tasks. We need:
THEOREM 2.1 ([MT98], THEOREM 3.3). Addition of n distinct n-bit numbers can be performed with polynomial-size AND • OR • SYM circuits. Furthermore, the circuits can be constructed in polynomial time.
We can therefore replace every THR gate of C with an $\mathrm{AC}^0$ • MAJ circuit, as follows. Fix a threshold gate of C, with weights $w_{i_1}, \dots, w_{i_t}$ for t ≤ n, computing $\sum_{j=1}^{t-1} w_{i_j} x_{i_j} \ge w_{i_t}$ for some $i_j \in \{1, \dots, n\}$. Note $|w_{i_j}| \le 2^{bn \log_2 n}$ for j = 1, ..., t. Set $W = bn \log_2 n$.
Let
. This can be compared to the value $w_{i_t}$ with an $\mathrm{AC}^0$ circuit, using the fact that the "less-than-or-equal-to" comparison of two integers can be performed in $\mathrm{AC}^0$ [CSV84]. We now have an $\mathrm{AC}^0$ • SYM circuit D of size $\mathrm{poly}(W, t) \le n^b$ computing the given threshold gate. Applying this construction to each threshold gate in the THR layer of C, we obtain a SYM • ACC • SYM circuit C′ of size at most $s \cdot n^b$.
The next step of Lemma 2.1 is to convert the SYM • ACC part into a SYM • AND circuit, using a reduction of Beigel-Tarui [BT94] (with important details on constructibility filled in by Allender- c ) (where the subscript on the AND denotes the fan-in of each AND gate). For simplicity of notation, let $t = (\log(s \cdot n^b))^c$ in the following.
Extending a trick of Beigel [Bei94] to symmetric gates, we can convert every $\mathrm{AND}_t$ • SYM subcircuit of C′ with $n^b$ wires into a single SYM gate with $O(n^{b \cdot t})$ wires. Let $S_1(x_1, \dots, x_n) \wedge \cdots \wedge S_t(x_1, \dots, x_n)$ be one such subcircuit, where $S_i$ denotes the ith symmetric gate. In particular, for i = 1, ..., t, let $f_i : \mathbb{Z} \to \{0, 1\}$ be such that $f_i(\sum_{j=1}^{n} c_{i,j} x_j) = S_i(x_1, \dots, x_n)$, where $c_{i,j}$ denotes the number of copies of $x_j$ that feed into $S_i$.
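Define the linear form $L(x_1, \dots, x_n) = \sum_{i=1}^{t} 2^{(i-1) \cdot b \log n} \cdot \sum_{j=1}^{n} c_{i,j} x_j$.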
For any Boolean assignment to the $x_j$'s, the number encoded by the linear form $L(x_1, \dots, x_n)$ is an integer encoded in $O(t \cdot b \log n)$ bits. By construction, the bit representation of this integer contains, for every i = 1, ..., t, the number of wires input to $S_i$ which are set true, as a string of $(b \log n)$ bits. Therefore, from the linear form $L(x_1, \dots, x_n)$ we can easily infer whether all $S_i(x_1, \dots, x_n)$ output 1 or not, and hence output the value of $S_1 \wedge \cdots \wedge S_t$.
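The packing idea in this paragraph can be illustrated with a short Python sketch. It is a toy under stated assumptions: `c[i][j]` is the number of wires from $x_j$ into $S_i$, `fs[i]` is the function $f_i$, and the block width is computed from the largest fan-in rather than fixed at $b \log n$ bits. The names `pack_counts` and `and_of_symmetric` are hypothetical.

```python
def pack_counts(c, x, block_bits):
    """Value of the linear form: block i (of width block_bits) holds sum_j c[i][j] * x[j]."""
    return sum(sum(cij * xj for cij, xj in zip(row, x)) << (i * block_bits)
               for i, row in enumerate(c))

def and_of_symmetric(c, fs, x):
    """Evaluate S_1 AND ... AND S_t by decoding the single integer L(x)."""
    # Wide enough that the largest possible count cannot carry into the next block.
    block_bits = max(sum(row) for row in c).bit_length()
    value = pack_counts(c, x, block_bits)
    mask = (1 << block_bits) - 1
    # Read the count of true wires for each gate from its block and apply f_i to it.
    return int(all(fs[i]((value >> (i * block_bits)) & mask) for i in range(len(c))))
```

Since the output depends only on the integer value of the linear form, the whole AND of symmetric gates becomes a single symmetric gate once each $x_j$ is given the appropriate number of copies.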
To implement this linear form with a single SYM gate, for all j = 1, . . . , n we put 
Low symmetric rank decomposition.
Next, we prove that the truth table of any SYM • SYM circuit C of t wires and n inputs represents a $2^{n/2} \times 2^{n/2}$ matrix of symmetric rank at most poly(t), and this rank decomposition can be efficiently computed. For given matrices A and B over the integers, let A · B denote their matrix product over the integers. Let $M \in \{0, 1\}^{m \times n}$. We define the symmetric rank of M to be the minimum r ∈ N such that there are matrices $A \in \{0, 1\}^{m \times r}$, $B \in \{0, 1\}^{r \times n}$ and a function $f : \mathbb{Z} \to \{0, 1\}$ satisfying $M[i, j] = f((A \cdot B)[i, j])$ for all i, j. We call the triple (A, B, f) a symmetric rank decomposition of M. The symmetric rank is similar to the typical notion of rank, except for the additional function f providing a "filter" from arbitrary integers back to {0, 1}. This filter function could potentially lead to smaller rank decompositions than the typical notion. However, note the symmetric rank of M is not necessarily at most (for instance) the rank of M over R, because A and B are required to have Boolean entries.
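The definition can be checked mechanically on small matrices. The sketch below uses illustrative names (`filtered_product`, `is_symmetric_rank_decomposition`) and verifies a candidate triple (A, B, f) by multiplying over the integers and applying the filter entrywise.

```python
def filtered_product(A, B, f):
    """Apply f entrywise to the integer matrix product A * B."""
    r = len(B)
    return [[f(sum(A[i][k] * B[k][j] for k in range(r)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def is_symmetric_rank_decomposition(M, A, B, f):
    """True if (A, B, f) satisfies M[i][j] = f((A * B)[i][j]) for all i, j."""
    return filtered_product(A, B, f) == M

# Example: the 2 x 2 XOR matrix. Row i of A is (i, 1) and column j of B is (1, j),
# so (A * B)[i][j] = i + j, and the parity filter recovers XOR.
XOR = [[0, 1], [1, 0]]
A = [[0, 1], [1, 1]]
B = [[1, 1], [0, 1]]
parity = lambda v: v % 2
```

Here `is_symmetric_rank_decomposition(XOR, A, B, parity)` holds, while replacing the filter by the identity would fail on the entry where the integer product equals 2.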
For simplicity let n be even, and let $z_1, \dots, z_{2^{n/2}}$ be the list of all $2^{n/2}$ strings of n/2 bits in lexicographical order. For a circuit C with n inputs, define the truth table matrix $M_C$ to be the $2^{n/2} \times 2^{n/2}$ matrix with $M_C[i, j]$ equal to the output of $C(z_i, z_j)$.
LEMMA 2.2. Given a SYM • SYM circuit C with t wires and n inputs, its truth table matrix $M_C$ has symmetric rank $O(t^3)$, and a symmetric rank decomposition of $M_C$ can be computed from C in $2^{n/2} \cdot \mathrm{poly}(t)$ time.
PROOF. For simplicity we assume n is even; the case of odd n will be apparent. Index the input variables of C by x1, . . . , xn. Let g1, . . . , gs be an indexing of the gates of C on the bottom layer (closest to the inputs) and let g denote the output gate of C. (Note that s ≤ t.) Let f : {0, 1, . . . , s} → {0, 1} be the symmetric function of gate g : for all a ∈ {0, 1, . . . , s}, f (a) = b if and only if a true inputs make g output b.
We shall show how to efficiently construct matrices A and B with the appropriate properties. Let $z_1, \dots, z_{2^{n/2}}$ be the list of all n/2-bit strings in lexicographical order, in the following. For every pair $(a, b) \in \{0, 1, \dots, t\}^2$ such that a + b ≤ t, let $S_{a,b} \subseteq \{g_1, \dots, g_s\}$ denote the subset of gates $g_j$ such that a + b true inputs make gate $g_j$ output 1.
The matrices A and B to be constructed show that the symmetric rank of $M_C$ is at most
$r = \sum_{a,b \in \{0,1,\dots,t\} : a+b \le t} |S_{a,b}|$.
In other words, each pair (a, b) will add $|S_{a,b}|$ additional components to the rows of A and the columns of B.
For i = 1, . . . , 2 n/2 , the ith row of A and ith column of B are defined as follows. For every pair (a, b), allocate |S a,b | additional components for the rows of A and columns of B.
For j = 1, ..., $|S_{a,b}|$, put a 1 in the jth additional component of the ith row of A if and only if there are a true wires going into the jth gate of $S_{a,b}$ when the input variables $x_1, \dots, x_{n/2}$ are given assignment $z_i$. That is, the jth component is 1 if and only if the contribution (from the first half of variables) to the overall sum for the jth gate is a.
Similarly, for j = 1, ..., $|S_{a,b}|$, put a 1 in the jth additional component of the ith column of B if and only if there are b true wires going into the jth gate of $S_{a,b}$, when the input variables $x_{n/2+1}, \dots, x_n$ are given assignment $z_i$.
Note that each entry of A and B can be determined in poly(t) time.
For every fixed (a, b), the product of two jth components for the ith row of A and the kth column of B is either 0 or 1, and the product is 1 if and only if:
• the sum of true inputs into the jth gate of $S_{a,b}$ from the inputs $(x_1, \dots, x_{n/2})$ equals a when the inputs $(x_1, \dots, x_{n/2})$ are assigned $z_i$,
• the sum of true inputs into the same gate from the inputs $x_{n/2+1}, \dots, x_n$ equals b when $x_{n/2+1}, \dots, x_n$ are assigned $z_k$, and
• the jth gate outputs 1 when its sum of true inputs equals a + b.
It follows that the inner product of the ith row of A and the kth column of B equals the total number $N_{i,k}$ of true wires going into the output gate of C on the variable assignment $(x_1, \dots, x_n) \mapsto (z_i, z_k)$. By definition, $f(N_{i,k})$ equals the output of C on that variable assignment.
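The construction in this proof can be exercised on small circuits. The following Python sketch assumes a hypothetical encoding of a SYM • SYM circuit: each bottom gate is a pair `(mult, h)`, where `mult[v]` is the number of wires from variable v into the gate and `h(count)` returns the gate's 0/1 output, and each bottom gate feeds the output gate `f` exactly once. The function names are illustrative.

```python
from itertools import product

def decompose(n, gates):
    """Build the 0/1 matrices A and B of the construction, for even n."""
    half = n // 2
    zs = list(product((0, 1), repeat=half))
    t = sum(sum(mult) for mult, _ in gates)  # wires entering the bottom layer
    # One column per (a, b, gate) with a + b <= t such that the gate accepts a + b.
    columns = [(a, b, mult) for a in range(t + 1) for b in range(t + 1 - a)
               for mult, h in gates if h(a + b) == 1]
    left = lambda mult, z: sum(m * bit for m, bit in zip(mult[:half], z))
    right = lambda mult, z: sum(m * bit for m, bit in zip(mult[half:], z))
    A = [[int(left(mult, z) == a) for a, b, mult in columns] for z in zs]
    B = [[int(right(mult, z) == b) for z in zs] for a, b, mult in columns]
    return A, B

def truth_table(n, gates, f):
    """Reference truth table matrix, evaluating the circuit one input at a time."""
    zs = list(product((0, 1), repeat=n // 2))
    def circuit(x):
        return f(sum(h(sum(m * bit for m, bit in zip(mult, x))) for mult, h in gates))
    return [[circuit(zi + zk) for zk in zs] for zi in zs]

def filtered_product(A, B, f):
    """Apply f entrywise to the integer matrix product A * B."""
    return [[f(sum(A[i][k] * B[k][j] for k in range(len(B))))
             for j in range(len(B[0]))] for i in range(len(A))]
```

For any circuit in this encoding, `filtered_product(A, B, f)` should agree with `truth_table(n, gates, f)`; the sketch makes no attempt at the running time stated in the lemma.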
We need one more lemma to complete the proof of Theorem 1.1:
LEMMA 2.3. For all sufficiently large N, and α ≤ .172, multiplication of an $N \times N^{\alpha}$ matrix with an $N^{\alpha} \times N$ matrix can be done in $N^2 \cdot \mathrm{poly}(\log N)$ arithmetic operations, over any field with $O(2^{\mathrm{poly}(\log N)})$ elements.
Compute the product of A and B in $2^n \cdot \mathrm{poly}(\log s, n)$ time, using Lemma 2.3. Finally, evaluate function f on all entries of the matrix product. This can be done by numerically sorting the entries, replacing each entry v by f(v), then inverting the sorted order, in time $2^n \cdot \mathrm{poly}(\log s, n) + 2^{O(\log s)^c}$. When $s \le 2^{n^{o(1)}}$, the runtime is $2^n \cdot \mathrm{poly}(n)$. □
Counting satisfying assignments to ACC of linear thresholds
The evaluation algorithm of Theorem 1.1 is quite powerful, substantially extending the class of circuits for which we can perform non-trivial circuit analysis.
REMINDER OF THEOREM 1.2 For every m > 1 and d > 0, there is an ε > 0 such that counting satisfying assignments to ACC • THR circuits of size $2^{n^{\varepsilon}}$, depth d, and modulus m can be done in $2^{n-n^{\varepsilon}}$ time.
PROOF. For all k ∈ N and for i = 1, ..., 2k, define a symmetric function $\mathrm{Bit}^k_i$ with $2^{2k}$ inputs as follows: $\mathrm{Bit}^k_i$ outputs the ith bit of the sum of its input bits. Suppose we are given an ACC • THR circuit C of size s and n inputs, and we wish to count its satisfying assignments. Let $\ell < n/2$ be a parameter to set later. For every assignment $A_j \in \{0, 1\}^{2\ell}$ to the last $2\ell$ inputs of C, make a copy of C with the assignment $A_j$ plugged into those $2\ell$ inputs, calling this copy $C_{A_j}$. Note that each $C_{A_j}$ has (the same) $n - 2\ell$ inputs $x_1, \dots, x_{n-2\ell}$. For every $i = 1, \dots, 2\ell$, define
$B_i(x_1, \dots, x_{n-2\ell}) := \mathrm{Bit}^{\ell}_i(C_{A_1}(x_1, \dots, x_{n-2\ell}), \dots, C_{A_{2^{2\ell}}}(x_1, \dots, x_{n-2\ell}))$.
Each function $B_i$ can be implemented in $s' = 2^{2\ell} \cdot s$ size, as a SYM • ACC • THR circuit. Applying Theorem 1.1, $B_i$ can be evaluated on all of its $2^{n-2\ell}$ possible assignments in time $2^{n-2\ell} \cdot \mathrm{poly}(n) + 2^{\mathrm{poly}(\log s')} \le 2^{n-2\ell} \cdot \mathrm{poly}(n) + 2^{\mathrm{poly}(\ell + \log s)}$.
The above for-loop over all i produces $2\ell \cdot 2^{n-2\ell}$ bits: for each of the $2^{n-2\ell}$ partial assignments to $n - 2\ell$ variables, we learn the number (in $2\ell$ bits) of partial assignments on the other $2\ell$ variables which result in satisfaction. The number of all satisfying assignments is obtained by simply summing all $2\ell$-bit numbers obtained from the $2^{n-2\ell}$ assignments, in $2^{n-2\ell} \cdot \mathrm{poly}(\ell)$ time.
Letting $\ell = n^{\varepsilon}/2$ for sufficiently small ε > 0, we have a $2^{n-n^{\varepsilon}}$ time algorithm.
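The structure of this counting argument can be sketched in Python on toy instances. Here `circuit` is any Python function on n-bit tuples standing in for an ACC • THR circuit, `ell` plays the role of the parameter set to $n^{\varepsilon}/2$ at the end of the proof, and the outer loop stands in for the batch evaluation of Theorem 1.1. One assumption differs from the text: the sketch reads one extra bit per count so that the value $2^{2\ell}$ itself is representable.

```python
from itertools import product

def count_sat_via_bits(circuit, n, ell):
    """Count satisfying assignments by reading the bits of per-prefix counts."""
    tails = list(product((0, 1), repeat=2 * ell))   # assignments A_j to the last 2*ell inputs
    num_bits = 2 * ell + 1                          # counts lie in [0, 2^(2*ell)]

    def bit_function(i):
        # B_i(x): the i-th bit of the number of copies C_{A_j} that accept x.
        # This is a symmetric function of the outputs of the copies.
        return lambda x: (sum(circuit(x + a) for a in tails) >> i) & 1

    total = 0
    for x in product((0, 1), repeat=n - 2 * ell):
        # Reassemble the count for this prefix from its bits, then accumulate.
        total += sum(bit_function(i)(x) << i for i in range(num_bits))
    return total
```

In the paper, each $B_i$ is evaluated on all prefixes at once by the fast evaluation algorithm; the nested loops here only mirror which bits are computed and how they are summed.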
Non-uniform ACC • THR lower bounds
We now turn to the main application of the evaluation algorithm:
REMINDER OF THM 1.3 NEXP does not have non-uniform ACC• THR circuits of quasi-polynomial size.
To set the context, let us discuss the prior connection between known circuit satisfiability algorithms and circuit lower bounds.
DEFINITION 2.1. Let $\mathcal{C}$ be a circuit class. $\mathcal{C}$ is said to be typical if, given any circuit D from one of the classes $\mathcal{C}$ • $\mathcal{C}$, AND • $\mathcal{C}$, OR • $\mathcal{C}$, NOT • $\mathcal{C}$, an equivalent $D' \in \mathcal{C}$ can be produced in poly(size(D)) time.
That is, C is typical if it is efficiently closed under composition, unbounded fan-in AND, OR, and negations. Most well-studied circuit classes have this property.
From prior work, we know there are connections between the existence of good SAT algorithms for typical circuit classes, and lower bounds against those classes:
THEOREM 2.3 ([Wil11b]). Let $\mathcal{C}$ be typical. Suppose for every c ≥ 1, there is an ε > 0 and an algorithm for satisfiability of $\mathcal{C}$ circuits running in time $O(2^{n-n^{\varepsilon}})$ on circuits with n inputs and $n^{\log^c n}$ size. Then NEXP does not have quasi-polynomial size $\mathcal{C}$ circuits.
For example, the proof that NEXP ⊄ ACC follows from giving a faster-than-exhaustive-search ACC satisfiability algorithm, noting that ACC is typical, and applying Theorem 2.3. This theorem cannot be directly applied to a class such as ACC • THR, because it is not known whether ACC • THR • ACC • THR can be efficiently simulated with ACC • THR. However, by modifying the argument of Theorem 2.3 and using an algorithm for counting SAT assignments, we can extend the theorem to circuits with a very weak closure property.
DEFINITION 2.2. Let $\mathcal{C}$ be a circuit class. We say $\mathcal{C}$ is weakly closed under AND if, given the AND of two circuits of $\mathcal{C}$, an equivalent circuit in $\mathcal{C}$ can be produced in polynomial time.
Weak closure under AND is satisfied by strictly more circuit classes than the property of being typical. To give an example, any class of the form SYM • · · · is weakly closed under AND, because an AND of t SYM gates with s wires can be collapsed into a single symmetric gate with $O(s^t)$ wires (as seen in the proof of Lemma 2.1). However, classes like SYM • SYM are not known to be efficiently closed under composition or unbounded fan-in AND/OR, hence Theorem 2.3 does not apply to such classes. We prove:
THEOREM 2.4. Let $\mathcal{C}$ be weakly closed under AND. Suppose for every c ≥ 1, there is an ε > 0 and an algorithm for counting the satisfying assignments of $\mathcal{C}$ circuits in time $O(2^{n-n^{\varepsilon}})$ on circuits with n inputs and $n^{\log^c n}$ size. Then NEXP does not have quasipolynomial size $\mathcal{C}$ circuits.
Note that Theorem 1.3 (the ACC • THR lower bound) follows immediately from Theorem 2.4 and the counting algorithm of Theorem 1.2. It is our hope that Theorem 2.4 may be applicable in the future to depth-two classes, such as SYM • SYM and depth-two exact threshold circuits [HP10]: a nontrivial counting SAT algorithm for one of these classes would entail new lower bounds. We survey what is needed to conclude $\mathcal{C}$ lower bounds in the proof of Theorem 2.3, and show that the new hypothesis can be used for these needs.
The idea is to show that NEXP ⊂ $\mathcal{C}$ and the hypothesis imply that every $L \in \mathrm{NTIME}[2^n]$ can be simulated in nondeterministic $2^n/n$ time, contradicting the nondeterministic time hierarchy [Ž83]. In particular, the assumptions imply that the NEXP-complete problem SUCCINCT 3SAT on circuits of AND/OR/NOT with fan-in two, n inputs, and poly(n) size can be nondeterministically solved in $O(2^{n-n^{\varepsilon}})$ time, which is also provably false [Wil11a]. Recall that SUCCINCT 3SAT is the problem: given an AND/OR/NOT circuit C of fan-in two, does the truth table of C encode a satisfiable 3-CNF formula? That is, SUCCINCT 3SAT is a "compressed" version of the 3SAT problem.
Suppose we are given an (arbitrary) circuit C of size s and wish to determine if it is a yes-instance of SUCCINCT 3SAT. Assuming NEXP has quasipolynomial-size circuits, it is proved that for every C encoding a satisfiable 3-CNF F , there is a quasipolynomial-size circuit D which succinctly encodes a satisfying assignment for F : for all i, D(i) outputs the value of variable xi in the satisfying assignment. Our "fast" nondeterministic algorithm for SUCCINCT 3SAT guesses this circuit D, and uses it to construct a circuit E with n inputs and n log c n size for some c, which is unsatisfiable if and only if D encodes a satisfying assignment to the formula F encoded by C.
Assuming NEXP has quasipolynomial-size $\mathcal{C}$ circuits and that there is an $O(2^{n-n^{\varepsilon}})$ time algorithm for $\mathcal{C}$ satisfiability, it is proved that there is a nondeterministic algorithm A running in 2
time which, given an AND/OR/NOT circuit E of fan-in two with size s and n inputs, outputs an equivalent E′ of $s^{\log^c s}$ size from the class $\mathcal{C}$ on at least one nondeterministic branch (and prints no on other branches). Running this algorithm A, obtaining E′, then running the $\mathcal{C}$ satisfiability algorithm on E′, we nondeterministically determine that C is a yes-instance of SUCCINCT 3SAT in $2^{n-\Omega(n^{\varepsilon})}$ time.
Now assume C is weakly closed under AND. The point where closure properties are relevant is precisely in the argument that the nondeterministic algorithm A exists. In fact, if our hypothesis and the assumption that NEXP has quasipolynomial-size C circuits implies such an algorithm, it can be observed that the rest of the proof carries over without modification. We now construct such an algorithm A.
The algorithm A starts by guessing a $\mathcal{C}$ circuit E′ of $n^{\log^c n}$ size which takes as input a pair $(x, g) \in \{0, 1\}^n \times \{0, 1\}^{\log(\mathrm{size}(E))}$ (where size(E) is simply the size of the circuit E), and outputs 1 if and only if the gate g in E outputs 1 when E is given the input x. (Such an E′ exists, assuming P has quasi-polynomial size $\mathcal{C}$ circuits.) Now we need to verify that for every gate g with index ranging from 1, 2, ..., size(E), E′(x, g) outputs precisely what gate g of E(x) outputs, on all x. Each gate g is either an input, an AND of two previous gates $g_1$ and $g_2$, an OR of two previous gates $g_1$ and $g_2$, or a NOT of a previous gate $g_1$.
To aid this verification, we show how to efficiently check for arbitrary $\mathcal{C}$ circuits G and H whether G(x) = H(x) for all inputs x, using an algorithm for counting SAT assignments. Let #SAT(C) be the number of satisfying assignments to a circuit C. Observe that G(x) = H(x) for all x if and only if #SAT(G) = #SAT(H) = #SAT(G ∧ H). (Note the third quantity can be efficiently computed, assuming $\mathcal{C}$ is weakly closed under AND.) Moreover, G(x) = ¬H(x) for all x if and only if #SAT(G) + #SAT(H) = $2^n$ and #SAT(G ∧ H) = 0. Therefore, by counting SAT assignments, we have algorithms checking whether G is equivalent to H, and whether G is equivalent to the negation of H, both running in time $O(2^{n-n^{\varepsilon}})$.
We claim that the verification problem for E′ can be reduced to a number of calls to the above kinds of checks. First, nondeterministically guess a circuit $E'_{\mathrm{not}}$, intended to satisfy $E'_{\mathrm{not}}(x, g) = \neg E'(x, g)$ for all x and g. Verifying this condition can be done by counting SAT assignments, as described above.
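The two counting-based checks described here can be sketched as follows. Circuits are modeled as Python functions from n-bit tuples to {0, 1}, and `count_sat` is a brute-force stand-in for the assumed fast counting algorithm; all names are illustrative.

```python
from itertools import product

def count_sat(circuit, n):
    """Brute-force #SAT (placeholder for the hypothesized counting algorithm)."""
    return sum(circuit(x) for x in product((0, 1), repeat=n))

def conj(G, H):
    """The AND of two circuits; the text obtains this via weak closure under AND."""
    return lambda x: G(x) & H(x)

def equivalent(G, H, n):
    """G(x) = H(x) for all x  iff  #SAT(G) = #SAT(H) = #SAT(G AND H)."""
    return count_sat(G, n) == count_sat(H, n) == count_sat(conj(G, H), n)

def complementary(G, H, n):
    """G(x) = NOT H(x) for all x  iff  #SAT(G) + #SAT(H) = 2^n and #SAT(G AND H) = 0."""
    return (count_sat(G, n) + count_sat(H, n) == 2 ** n
            and count_sat(conj(G, H), n) == 0)
```

Note that comparing #SAT(G) with #SAT(H) alone is not enough: two different functions can have the same number of satisfying assignments, which is why the count for the conjunction is also needed.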
Checking E′ is correct on the input gates of E means that for all i = 1, ..., n, $E'(x_1, \dots, x_n, i) = x_i$. Both $E'(x_1, \dots, x_n, i)$ and $I(x_1, \dots, x_n) = x_i$ are $\mathcal{C}$ circuits, hence their equivalence can be verified by #SAT calls. Checking a NOT gate g of E with input gate $g_1$ is equivalent to checking that $E'_{\mathrm{not}}(x, g_1) = E'(x, g)$ on all x. Checking an AND gate g of two previous gates $g_1$ and $g_2$ amounts to checking that $E'(x, g) = E'(x, g_1) \wedge E'(x, g_2)$ on all x. To do this, compute $G_{\mathrm{and}}(x) := E'(x, g_1) \wedge E'(x, g_2)$ (assuming $\mathcal{C}$ is weakly closed under AND), then check $G_{\mathrm{and}}(x) = E'(x, g)$ for all x. Finally, for an OR gate g with inputs $g_1$ and $g_2$, we want to check that $E'(x, g) = E'(x, g_1) \vee E'(x, g_2)$ on all x. This is equivalent to $\neg E'(x, g) = ((\neg E'(x, g_1)) \wedge (\neg E'(x, g_2)))$ for all x. This can be checked by forming $G_{\mathrm{or}}(x) := E'_{\mathrm{not}}(x, g_1) \wedge E'_{\mathrm{not}}(x, g_2)$, then checking that $G_{\mathrm{or}}(x) = E'_{\mathrm{not}}(x, g)$ for all x.
On a circuit E with s ≤ n^{log^c n} gates, the above procedure runs
When it concludes, we know that for all gates g and all x, E′(x, g) outputs the correct value.
The circuit output by A simply evaluates E′(x, g*), where g* is the output gate of E. □
EFFICIENT EVALUATION OF DEPTH-TWO LINEAR THRESHOLD CIRCUITS
Finally, we show a strong sense in which depth-two threshold circuits are weak, by giving a fast algorithm for evaluating such circuits on many assignments in batch. The general theorem is:
THEOREM 3.1. Given a depth-two linear threshold circuit C with 2k inputs and at most n^{1/12} gates, with weights on the bottom layer of absolute value at most W_b and weights on the output gate of absolute value at most W_o, and given two sets A, B ⊆ {0, 1}^k where |A| = |B| = n, we can evaluate C on all n^2 points in A × B using n^2 · poly(log W_o, log n) + n^{1+1/12} · poly(log n, log W_b) time.
The following is immediate from Theorem 3.1:
REMINDER OF THEOREM 1.5 Let k > 1. Given a depth-two 2^{n/24}-size linear threshold circuit C with integer weights in the
, we can evaluate C on all 2^n input assignments in 2^n · poly(n^k) time.
While the proof of Theorem 3.1 also ultimately depends on Coppersmith's rectangular matrix multiplication, the rest of the algorithm is rather different from the evaluation algorithm of Theorem 1.1.
PROOF OF THEOREM 3.1. We reduce the evaluation task to a special kind of matrix multiplication, then combine Coppersmith's matrix multiplication with a mild brute force to expedite the matrix multiply.
Define LEQ : Z × Z → {0, 1} to output 1 on (a, b) if and only if a ≤ b. Given a vector w = (w_1, ..., w_d) ∈ Z^d, and given two matrices M and N which are n × d and d × n, define their w-weighted threshold product to be
(M ⊛_w N)[i, j] := Σ_{k=1}^{d} w_k · LEQ(M[i, k], N[k, j]).
We shall show that the w-weighted threshold product of an n × n^{1/12} matrix and an n^{1/12} × n matrix can be computed in essentially n^2 · poly(log n) time (with some additional but negligible overhead in terms of the weights). Let us postpone this algorithm for the moment, and first show how to embed the evaluation problem into the weighted threshold product.
Let C be a depth-two circuit of size s, with the 2k input variables x_1, ..., x_k, y_1, ..., y_k. Let w_1, ..., w_s be the weights of the top threshold gate of C, and let ℓ_1, t_1, ..., ℓ_s, t_s be the corresponding linear forms and threshold values from the bottom layer of threshold gates: that is, the output of LEQ(t_i, ℓ_i) is multiplied by w_i in the output gate. Without loss of generality, we may assume that all weights w_i are multiplied by the output of some threshold gate at the bottom layer (there are at most n wires from the input directly to the output gate, and they can be replaced by O(n) dummy gates at the bottom layer with wires to the output gate). Let A = {A_1, ..., A_n} ⊆ {0, 1}^k and B = {B_1, ..., B_n} ⊆ {0, 1}^k. We partition each linear form ℓ_j on the bottom layer into two sums 
This is precisely the value of the linear form in the output gate of C, when x_1, ..., x_k are given the assignment A_i and y_1, ..., y_k are assigned B_j. The truth table of C on A × B can be recovered by simply checking which entries in (M ⊛_w N) exceed the output gate's threshold.
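The embedding can be checked numerically on a tiny circuit. The sketch below is one way to realize it, under assumptions not fixed by the text here: each bottom gate is given as a triple (u_x, u_y, t) with linear form u_x·x + u_y·y and threshold t, the gate fires when t is at most the value of the form, and the threshold is folded into the left matrix. All names are hypothetical.

```python
from itertools import product

def leq(a, b):
    """LEQ(a, b) = 1 iff a <= b."""
    return 1 if a <= b else 0

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def threshold_product(w, M, N):
    """Naive w-weighted threshold product:
    P[i][j] = sum_k w[k] * LEQ(M[i][k], N[k][j])."""
    return [[sum(w[k] * leq(M[i][k], N[k][j]) for k in range(len(w)))
             for j in range(len(N[0]))] for i in range(len(M))]

def embed(bottom, A, B):
    """Assumed embedding: gate (ux, uy, t) fires on (a, b) iff
    t <= ux.a + uy.b, i.e. iff  t - ux.a <= uy.b.
    So row i of M holds t - ux.A_i and column j of N holds uy.B_j."""
    M = [[t - dot(ux, a) for (ux, uy, t) in bottom] for a in A]
    N = [[dot(uy, b) for b in B] for (ux, uy, t) in bottom]
    return M, N

def output_form_directly(bottom, w, a, b):
    """Value of the output gate's linear form, computed gate by gate."""
    return sum(wj * leq(t, dot(ux, a) + dot(uy, b))
               for wj, (ux, uy, t) in zip(w, bottom))

# Toy instance with k = 2 (illustrative numbers only).
bottom = [((1, -2), (3, 1), 2), ((0, 4), (-1, -1), 0), ((2, 2), (2, 2), 5)]
w = [5, -3, 2]
A = list(product((0, 1), repeat=2))
B = list(product((0, 1), repeat=2))

M, N = embed(bottom, A, B)
P = threshold_product(w, M, N)
agree = all(P[i][j] == output_form_directly(bottom, w, A[i], B[j])
            for i in range(len(A)) for j in range(len(B)))
print(agree)
```

The point of the split is that M depends only on A and N only on B, so both matrices are built once and reused for all |A| · |B| pairs.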
Next, we shall show how to compute a weighted threshold matrix product efficiently. Let δ be a parameter, and let M and N be n × n^δ and n^δ × n matrices, respectively. The first step is to reduce the weights significantly. For all k = 1, ..., n^δ, let S_k be a list of all entries in the kth column of M, plus the kth row of N. Sort S_k, obtaining a ranking of 2n items, and replace each entry in the kth column of M and the kth row of N by their rank in the sorted list S_k. This step reduces the domains of M and N to {1, ..., 2n}, and the w-weighted threshold matrix product remains the same: all inequalities M[i, k] ≤ N[k, j] are preserved. Note this step takes n^{1+δ} · poly(log n, log W_b) time.
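The rank-reduction step can be sketched as follows. This is a minimal illustration, assuming equal values share a rank (so that comparisons between equal entries are preserved); the function name `rank_reduce` and the sample matrices are hypothetical.

```python
def rank_reduce(M, N):
    """Replace every entry of column k of M and row k of N by its rank
    among the combined entries S_k. Equal values receive equal ranks,
    so M[i][k] <= N[k][j] holds before iff it holds after."""
    n_rows, d, n_cols = len(M), len(N), len(N[0])
    M2 = [row[:] for row in M]
    N2 = [row[:] for row in N]
    for k in range(d):
        column = [M[i][k] for i in range(n_rows)]
        # Sorted distinct values of S_k; rank r+1 for the r-th smallest.
        values = sorted(set(column + N[k]))
        rank = {v: r + 1 for r, v in enumerate(values)}
        for i in range(n_rows):
            M2[i][k] = rank[M[i][k]]
        for j in range(n_cols):
            N2[k][j] = rank[N[k][j]]
    return M2, N2

# Entries with large magnitude shrink to ranks in {1, ..., 2n}.
M = [[10**9, -7], [3, 3]]
N = [[3, 10**9], [0, -7]]
M2, N2 = rank_reduce(M, N)

preserved = all((M[i][k] <= N[k][j]) == (M2[i][k] <= N2[k][j])
                for i in range(2) for k in range(2) for j in range(2))
print(M2, N2, preserved)
```

After this step the cost of comparing two entries no longer depends on the bit length of the original weights.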
In order to reduce this matrix computation to a standard matrix multiplication, we perform two strategies with different advantages. (The reduction is inspired by work of Matousek [Mat91] on computing dominances in high dimensions.) Let s ∈ {1, . . . , n} be a parameter. Partition each sorted list S k into t = n/s contiguous buckets T1, . . . , Tt, where each bucket Ti contains at most s entries. (For all i < j, the largest entry in Ti is at most the smallest entry in Tj.)
Start with an n × n output matrix P that is all zeroes. For every
The above algorithm runs in time O(n · n^δ · s · log W_o + MM(n, n^{1+δ}/s, n) · poly(log W_o)),
where MM(a, b, c) is the running time for multiplying a × b and b × c matrices. If we set n^{1+δ}/s = n^{0.172}, then Coppersmith's algorithm (Lemma 2.3) can be applied to the second term of the running time, implementing it in n^2 · poly(log n) time. Under this setting, s = n^δ · n^{0.828} and the first term of the running time is n^{1+2δ+0.828}. Setting δ = 0.086 > 1/12, the first term becomes n^2 (note that s = n^{0.914}). □
It is easy to see that, since the above algorithm actually evaluates the linear form at the output gate of a depth-two threshold circuit, we can also efficiently evaluate large SYM • THR circuits.
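The bucketing and the two terms of the running time suggest the following shape for the computation. This is a hedged sketch rather than the exact procedure: the function names, the tie-breaking rule (entries of M sort before equal entries of N), and the plain loops standing in for Coppersmith's rectangular matrix multiplication are all assumptions made for illustration.

```python
def threshold_product_naive(w, M, N):
    """P[i][j] = sum_k w[k] * [M[i][k] <= N[k][j]], by brute force."""
    return [[sum(w[k] for k in range(len(w)) if M[i][k] <= N[k][j])
             for j in range(len(N[0]))] for i in range(len(M))]

def threshold_product_bucketed(w, M, N, s):
    """Same product, split into (1) brute force inside buckets of size
    at most s and (2) one ordinary matrix product across buckets."""
    n_rows, d, n_cols = len(M), len(w), len(N[0])
    P = [[0] * n_cols for _ in range(n_rows)]
    t = (n_rows + n_cols + s - 1) // s           # buckets per sorted list
    m_bucket = [[0] * d for _ in range(n_rows)]  # bucket of M[i][k]
    n_bucket = [[0] * n_cols for _ in range(d)]  # bucket of N[k][j]
    for k in range(d):
        # Sort S_k; side 0 (from M) precedes side 1 (from N) on ties, so an
        # M entry in a later bucket than an N entry is strictly larger.
        items = sorted([(M[i][k], 0, i) for i in range(n_rows)] +
                       [(N[k][j], 1, j) for j in range(n_cols)])
        for start in range(0, len(items), s):
            bucket = items[start:start + s]
            b = start // s
            for _, side, idx in bucket:
                if side == 0:
                    m_bucket[idx][k] = b
                else:
                    n_bucket[k][idx] = b
            # Strategy 1: compare pairs that share a bucket directly.
            for vm, side_m, i in bucket:
                if side_m != 0:
                    continue
                for vn, side_n, j in bucket:
                    if side_n == 1 and vm <= vn:
                        P[i][j] += w[k]
    # Strategy 2: pairs in different buckets. M1 is n x (d*t) and marks the
    # bucket of each M entry; N1 is (d*t) x n and carries w[k] whenever the
    # N entry lies in a strictly later bucket. P += M1 * N1.
    cols = [(k, b) for k in range(d) for b in range(t)]
    M1 = [[1 if m_bucket[i][k] == b else 0 for (k, b) in cols]
          for i in range(n_rows)]
    N1 = [[w[k] if n_bucket[k][j] > b else 0 for j in range(n_cols)]
          for (k, b) in cols]
    for i in range(n_rows):
        for c in range(len(cols)):
            if M1[i][c]:
                for j in range(n_cols):
                    P[i][j] += N1[c][j]
    return P

# Small instance with ties (illustrative numbers only).
w = [3, -2]
M = [[1, 5], [4, 4], [2, 0]]
N = [[4, 1, 2], [5, 0, 4]]
expected = threshold_product_naive(w, M, N)
print(all(threshold_product_bucketed(w, M, N, s) == expected
          for s in (1, 2, 4, 10)))
```

Larger s shifts work toward the brute-force comparisons and shrinks the inner dimension of the matrix product, which is the trade-off balanced by the choice of s in the analysis.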
