The best known circuit lower bounds against unrestricted circuits remained around 3n for several decades. Moreover, the only known technique for proving lower bounds in this model, gate elimination, is inherently limited to proving lower bounds of less than 5n. In this work, we suggest a first non-gate-elimination approach for obtaining circuit lower bounds. Namely, we prove that every (unbounded-depth) circuit of size s can be expressed as an OR of $2^{s/3.9}$ 16-CNFs. While this structural result does not immediately lead to new lower bounds, it suggests a new avenue of attack on them.
Introduction

Circuits
Boolean circuits are a natural model for computing Boolean functions. A circuit corresponds to a simple straight-line program where every instruction performs a binary Boolean operation on two operands, each of which is either an input variable or the result of some previous instruction. The structure of this program is extremely simple: no loops, no conditional statements. Still, we have no example of a function from P (or even NP, or even $E^{NP}$) that requires at least 3.1n binary instructions to compute (let alone superlinear or superpolynomial size). This is in sharp contrast with the fact that finding such a function non-constructively is easy. For this, one compares the number $2^{2^n}$ of different functions of n variables with the number of programs of a fixed size. One then concludes, and this was done by Shannon [Sha49] some seventy years ago, that a random function on n variables has circuit size $\Omega(2^n/n)$ with probability 1 − o(1). This bound was later proven to be tight by Lupanov [Lup59]: any function can be computed by a circuit of size about $2^n/n$.
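The counting comparison above can be sketched numerically. The following is a minimal illustration, not the argument of [Sha49] itself: it assumes the crude over-count $(16(n+s)^2)^s$ for the number of circuits with s binary gates (16 operations and an ordered pair of predecessors per gate), and the function names are hypothetical.

```python
import math

def log2_circuit_count_upper(n: int, s: int) -> float:
    """log2 of a crude upper bound on the number of circuits with s gates.

    Assumption (not from the text): every gate picks one of the 16 binary
    operations and an ordered pair of predecessors among the n inputs and
    the s gates, so there are at most (16 * (n + s)**2)**s circuits.
    """
    return s * math.log2(16 * (n + s) ** 2)

def counting_lower_bound(n: int) -> int:
    """Smallest s for which the over-count reaches the 2**(2**n) functions.

    With fewer gates there are fewer circuits than functions, so some
    function of n variables needs at least this many gates.
    """
    s = 1
    while log2_circuit_count_upper(n, s) < 2 ** n:
        s += 1
    return s

if __name__ == "__main__":
    for n in (8, 10, 12):
        print(n, counting_lower_bound(n), 2 ** n / n)
```

The printed values grow at the rate $2^n/n$ up to a constant factor, which is all this crude count is meant to show.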
Constant-Depth Circuits
Another natural and simple model of computation is bounded-depth circuits, which correspond to highly parallelizable computations. In this paper, we focus on depth-2 circuits of the form $\mathrm{AND} \circ \mathrm{OR}$ (i.e., CNFs) and depth-3 circuits of the form $\mathrm{OR} \circ \mathrm{AND} \circ \mathrm{OR}$ (i.e., ORs of CNFs). The usual assumption is that the inputs of the circuit are variables and their negations, and the fan-in of the gates is unbounded. Such circuits are much more structured and are therefore easier to analyze and, in particular, to prove lower bounds for. For example, it is easy to show that the minimal number of clauses in a CNF computing the parity function of n bits is equal to $2^{n-1}$, which gives an optimal lower bound on the size of depth-2 circuits. However, already for depth 3 there is again a large gap between known lower and upper bounds: whereas it was shown by Lupanov [Lup61] that the minimum depth-3 circuit size of a random function on n variables is $\Theta(2^n/n)$, the best known lower bound for an explicit function is $2^{\Omega(\sqrt{n})}$ [Hås86, HJP93, PPZ97, Bop97, PPSZ05, MW17].
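The upper-bound half of the parity example can be checked by brute force. The sketch below builds the canonical CNF with one full-width clause per even-weight assignment and verifies it for small n; the clause encoding and function names are illustrative assumptions, and the matching lower bound is not verified here.

```python
from itertools import product

def parity_cnf(n):
    """CNF for the parity of n bits with 2**(n-1) clauses.

    A literal is a pair (i, sign) that is satisfied when x[i] == sign.
    For every assignment a of even weight we add the unique clause of
    width n that is falsified by a and by no other assignment.
    """
    clauses = []
    for a in product((0, 1), repeat=n):
        if sum(a) % 2 == 0:
            clauses.append(tuple((i, 1 - a[i]) for i in range(n)))
    return clauses

def eval_cnf(clauses, x):
    """Evaluate a CNF (list of clauses) on the assignment x."""
    return all(any(x[i] == sign for i, sign in clause) for clause in clauses)

if __name__ == "__main__":
    n = 5
    cnf = parity_cnf(n)
    ok = all(eval_cnf(cnf, x) == (sum(x) % 2 == 1)
             for x in product((0, 1), repeat=n))
    print(len(cnf), ok)
```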
Much stronger lower bounds are known, however, for depth-3 circuits where the fan-in of the gates that are close to the inputs is bounded by k. Namely, for any $k = O(\sqrt{n})$, a $2^{n/k}$ lower bound is proven by Paturi, Pudlák, and Zane [PPZ97] for the parity function, and a lower bound of $2^{\mu_k n/(k-1)}$ for k ≥ 3 and some constants $\mu_k > 1$ was proven in [PPSZ05] for a BCH code. For example, [PPSZ05] gives a lower bound of $2^{0.612n}$ when the bottom fan-in of the circuit is k = 3, and a lower bound of $2^{n/10}$ for the bottom fan-in k = 16. For the case of bottom fan-in k = 2, even a $2^{n-o(n)}$ lower bound is known [PSZ97].
Calabro, Impagliazzo, and Paturi [CIP06] construct a family of $2^{O(n^2)}$ functions most of which require depth-3 circuits of size $2^{n-o(n)}$. Santhanam and Srinivasan [SS12] improve on this by constructing such a family of functions of size $2^{f(n)}$ for every $f(n) = \omega(n \log n)$.
Valiant's Depth Reduction
Remarkably, the classical result of Valiant from the 70's relates the three computational models described above. Using a depth reduction for DAGs [EGS75], Valiant [Val77] shows that in any circuit of size cn and depth d, for every integer k, one can remove $2ckn/\log d$ wires, such that the resulting circuit has depth $d/2^k$. Valiant concludes that if the depth of the resulting circuit is non-trivial ($d/2^k < \log n$), then a lower bound on depth-3 circuits implies a lower bound against the original circuit model. This way, Valiant shows that any circuit of size O(n) and depth O(log n) can be converted into an $\mathrm{OR} \circ \mathrm{AND} \circ \mathrm{OR}$ circuit with the fan-in of the output gate at most $2^{O(n/\log\log n)}$ and the fan-in of OR-gates fed by the inputs at most $n^{O(1)}$. Hence, by exhibiting an explicit function that has no depth-3 circuit with these restrictions, one immediately gets that this function cannot be computed by circuits of linear size and logarithmic depth. Unfortunately, the known lower bounds on depth-3 circuits (see Subsection 1.4) are still too far from the ones required for this reduction.
In the same paper, Valiant introduced the notion of matrix rigidity (a similar notion was independently introduced by Grigoriev [Gri76] ) and related it to the size of linear circuits of logarithmic depth using ideas similar to those described above. Alas, the known lower bounds on matrix rigidity are also far from being able to give new lower bounds on the size of linear circuits of logarithmic depth.
Motivating Example
For Valiant's depth reduction, one can have d/2 k < log n (and non-trivial number of removed edges This counter-example shows that the graph-theoretic approach to circuit depth reduction cannot give non-trivial results for unrestricted circuits.
In this paper, we overcome this difficulty by presenting a counterpart of Valiant's depth reduction that works for circuits of unrestricted depth. Our depth reduction takes into account not only the underlying graph of a circuit, but also the functions computed by the circuit gates. We give more details in the next subsection, and here we provide a simple example showing that such a reduction is possible in principle.
By a formula we mean a circuit where each gate has out-degree exactly 1 (for the ease of exposition, here we consider formulas where gates can compute arbitrary binary Boolean operations). Here we show that a circuit of size, say, 2.7n can be computed by an OR of $2^{0.9n}$ formulas of small size (2.7n). Since we know an almost quadratic lower bound [Nec66] on formula size, we could hope to find a function which cannot be computed by an OR of fewer than $2^n$ linear-size formulas.
Lemma 1.1. A circuit C of size s can be computed by an OR of $2^{\lceil s/3 \rceil}$ formulas each of size at most s.
Proof. For s ≤ 3, we just transform a circuit into a single formula of the same size. Now, assume that s > 3 and proceed by induction. If C is a formula, then no transformation is needed. Otherwise take the topologically first gate G of out-degree at least 2. Note that G is computed by a formula (because all previous gates have out-degree 1). Let us denote the size of this formula by t = s(G). Consider two circuits $C_0$ and $C_1$ that compute the same function as C on inputs $\{x \in \{0,1\}^n : G(x) = 0\}$ and $\{x \in \{0,1\}^n : G(x) = 1\}$, respectively. Note that $s(C_0), s(C_1) \le s - t - 2 \le s - 3$ since in both $C_0$ and $C_1$ one can remove the subcircuit computing the gate G and at least two successors of G. This is because G computes a constant on both parts of the considered partition of the Boolean hypercube, and all gates in the subcircuit of G are needed to compute G only (as G is computed by a formula). Now, note that
$C(x) = ([G(x) = 0] \wedge C_0(x)) \vee ([G(x) = 1] \wedge C_1(x))$.
Using the induction hypothesis for $C_0$ and $C_1$, we rewrite C as an OR of at most $2^{\lceil (s-3)/3 \rceil + 1} \le 2^{\lceil s/3 \rceil}$ formulas of size (s − t − 2) + (t + 1) < s.
This result would imply a circuit lower bound of 3n − o(n) for any function that has correlation at most $2^{-n+o(n)}$ with all formulas of linear size. While we do know functions that have exponentially small correlation $2^{-\varepsilon n}$ with formulas of linear size [San10, KLP12, ST13, KRT13, Tal14, IK17], none of them gives a bound of $2^{-n+o(n)}$. Actually, there is an inherent limitation for this approach. By Parseval's identity, every Boolean function has a Fourier coefficient of absolute value at least $2^{-n/2}$. This implies that the correlation of this function with the corresponding parity function is at least $2^{-n/2}$ (and this is essentially the tight correlation with small formulas for a random function). Every parity function can be computed by a circuit of size ≤ n; thus, Lemma 1.1 would only be able to prove circuit lower bounds of 1.5n.
Therefore, in order to prove stronger circuit lower bounds, we need to improve both parameters: the constant 3 in the exponent, and the class of formulas we reduce circuits to. In the following subsection, we describe a reduction that achieves this: it reduces a circuit to an OR of $2^{\lceil s/3.9 \rceil}$ formulas each of which is a 16-CNF.
Our Contribution
The main contributions of this paper are counterparts of Valiant's reduction for unrestricted circuits. They are summarized 2 in Table 1. We highlight several important properties of the presented depth reduction techniques below.
Table 1 (columns: improving known lower bound; to lower bound; implies new lower bound). Notation: a (d, m, s)-disperser is a function that is not constant on any subset of the Boolean hypercube of size at least s that is defined as the set of common roots of at most m polynomials of degree at most d; Cor(f, d) is the correlation of f with polynomials of degree d; $R_M(r)$ is the row-rigidity of M for the rank r, i.e., the smallest row-sparsity of a matrix A such that rank(M ⊕ A) ≤ r.
Easier to achieve. In order to get a new circuit lower bound through Valiant's depth reduction, one needs to achieve a qualitative improvement of known lower bounds for depth-3 circuits or matrix rigidity (see the table). In a sense, Valiant's result states that an asymptotic improvement in one direction implies an asymptotic improvement in the other one. In contrast, our depth reduction shows that quantitative improvements of known lower bounds imply modest improvements of circuit lower bounds. Thus, the improvements required by our reductions are probably easier to achieve.
Unrestricted depth. As already mentioned, Valiant's reduction works for circuits of logarithmic depth. Our reductions work for circuits of any depth, though they are only meaningful when the circuit has modest linear size.
Not just graph-based. Graph-theoretic approaches (e.g., Valiant's technique) are inherently limited to circuits of logarithmic depth. We are able to work with unrestricted depth since our approach is based not just on the underlying graph of a circuit, but also on the actual computations happening inside the circuit.
2 For example, the second line of the table says that a lower bound of $2^{n-o(n)}$ against depth-3 circuits would give a lower bound of 3.9n. On the other hand, a lower bound of $2^{0.8n}$ would lead to an elementary proof of a lower bound of 3.1n.
No case analysis. Unlike gate elimination proofs of circuit lower bounds, our depth reductions contain almost no case analysis.
No known limitations. The strongest known lower bounds for circuits, as well as most of the previous lower bounds, are proven by the gate elimination technique. This technique is known to be too weak to prove a 5n lower bound [GHKK18]. The corresponding limitation does not apply to the depth reduction presented in this paper. We remark that this limitation also does not apply to the approach based on efficient SAT-algorithms [Wil13, BSV14, JMV15].
New structural questions. On the way to proving new lower bounds, we also study structural results on converting small circuits into ORs of k-CNFs that have curious connections to various properties of k-CNFs (such as those guaranteed by the Satisfiability Coding Lemma [PPZ97, PPSZ05] and the Sparsification Lemma [IPZ01, CIP06]).
Hierarchy of pseudorandom objects. We also show that improvements on the known constructions of pseudorandom objects (dispersers for varieties, functions with small correlation with low degree polynomials, and rigid matrices) immediately imply stronger circuit lower bounds via presented depth reductions (and significant improvements on these constructions imply strong bounds via Valiant's depth reduction).
Definitions
Unrestricted Circuits
Let $B_{n,m}$ be the set of all Boolean functions $f : \{0,1\}^n \to \{0,1\}^m$ and let $B_2 = B_{2,1}$. A circuit is a directed acyclic graph that has n nodes of in-degree 0 labeled with $x_1, \dots, x_n$ that are called input gates. All other nodes are called internal gates, have in-degree 2, and are labeled with operations from $B_2$. Some m gates are also marked as output gates. Such a circuit computes a function from $B_{n,m}$ in a natural way. The size s(C) of a circuit C is its number of internal gates. This definition extends naturally to functions: s(f) is the smallest size of a circuit computing the function f. The depth of a gate G is the maximum number of edges (also called wires) on a path from an input gate to G. The depth of a circuit is the maximum depth of its gates. By $s_{\log n}(f)$ we denote the smallest size of a circuit of depth O(log n) computing f.
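The definition above can be made concrete with a small sketch. The encoding is an assumption chosen for illustration (it is not notation from the paper): a gate is a triple (op, a, b), where op is the 4-bit truth table of an operation from $B_2$ and a, b are indices of earlier nodes; nodes 0..n-1 are the inputs.

```python
from itertools import product

# Truth tables of some operations from B_2: bit number 2*u + v of the
# constant is the value of the operation on the inputs (u, v).
XOR, AND, OR = 0b0110, 0b1000, 0b1110

def evaluate(n, gates, x):
    """Values of all nodes (inputs first, then gates in order) on input x."""
    vals = list(x)
    for op, a, b in gates:
        vals.append((op >> (2 * vals[a] + vals[b])) & 1)
    return vals

def size(gates):
    """Size of the circuit: the number of internal gates."""
    return len(gates)

def depth(n, gates):
    """Maximum number of wires on a path from an input to any gate."""
    d = [0] * n
    for _, a, b in gates:
        d.append(1 + max(d[a], d[b]))
    return max(d)

def truth_table(n, gates):
    """Outputs of the last gate over all 2**n inputs."""
    return [evaluate(n, gates, x)[-1] for x in product((0, 1), repeat=n)]

if __name__ == "__main__":
    # Parity of three bits with n - 1 = 2 gates.
    parity3 = [(XOR, 0, 1), (XOR, 3, 2)]
    print(size(parity3), depth(3, parity3), truth_table(3, parity3))
```

Storing each operation as a truth table keeps all 16 operations of $B_2$ available without a separate case per gate type.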
A circuit is called linear if it consists of ⊕ gates only. The corresponding circuit size measure is denoted by s ⊕ .
Unrestricted circuits are usually drawn with input gates at the top so by a top gate of a circuit we mean a gate that is fed by two variables.
Series-Parallel Circuits
A labeling of a directed acyclic graph G = (V, E) is a function ℓ : V → N such that for every edge (u, v) ∈ E one has ℓ(u) < ℓ(v). A graph/circuit G is called series-parallel if there exists a labeling ℓ such that for no two edges (u, v),
The corresponding circuit complexity measure is s sp .
Depth-3 Circuits
Unlike unrestricted circuits, depth-3 circuits are usually drawn the other way around, i.e., with the output gate at the top. In this paper, we focus on $\mathrm{OR} \circ \mathrm{AND} \circ \mathrm{OR}$ circuits, i.e., ORs of CNFs. We will use subscripts to indicate the fact that the fan-in of a particular layer is bounded. Namely, an $\mathrm{OR}_p \circ \mathrm{AND}_q \circ \mathrm{OR}_r$ circuit is an OR of at most p CNFs each of which contains at most q clauses and at most r literals in every clause. Since the gates of a depth-3 circuit are allowed to have an unbounded fan-in, it is natural to define the size of such a circuit as its number of wires. It is not difficult to see that for k = O(1) the size of an $\mathrm{OR} \circ \mathrm{AND} \circ \mathrm{OR}_k$ circuit is equal to the fan-in of its output gate up to a polynomial factor in n. By $s_3^k(f)$ we denote the smallest size of an $\mathrm{OR} \circ \mathrm{AND} \circ \mathrm{OR}_k$ circuit computing f.
Rigidity
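We say that a matrix $M \in \{0,1\}^{m \times n}$ is s-sparse if each row of M contains at most s non-zero elements. The rigidity of a matrix $M \in \{0,1\}^{m \times n}$ for the rank parameter r is the minimum sparsity of a matrix $A \in \{0,1\}^{m \times n}$ such that $\mathrm{rank}_{\mathbb{F}_2}(M \oplus A) \le r$:
$R_M(r) = \min\{ s : \text{there is an } s\text{-sparse } A \in \{0,1\}^{m \times n} \text{ with } \mathrm{rank}_{\mathbb{F}_2}(M \oplus A) \le r \}$.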
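For very small matrices the definition can be evaluated directly. The brute-force sketch below is only an illustration (its running time is exponential, and the function names are hypothetical); rows are stored as integer bitmasks.

```python
from itertools import combinations, product

def rank_f2(rows):
    """Rank over F_2 of a matrix whose rows are given as integer bitmasks."""
    basis = []  # reduced rows with pairwise distinct leading bits
    for r in rows:
        for b in basis:
            # XOR with b only when that clears b's leading bit in r.
            r = min(r, r ^ b)
        if r:
            basis.append(r)
    return len(basis)

def row_rigidity(M, r):
    """Smallest s such that flipping at most s entries in every row of M
    gives a matrix of rank at most r over F_2 (exhaustive search)."""
    m, n = len(M), len(M[0])
    rows = [sum(bit << j for j, bit in enumerate(row)) for row in M]
    for s in range(n + 1):
        # All row patterns of A with at most s non-zero entries.
        masks = [sum(1 << j for j in cols)
                 for w in range(s + 1)
                 for cols in combinations(range(n), w)]
        for choice in product(masks, repeat=m):
            if rank_f2([row ^ a for row, a in zip(rows, choice)]) <= r:
                return s
    return n

if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print([row_rigidity(identity, r) for r in range(4)])
```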
Depth Reductions
In this section, we present new depth reductions for circuits with unrestricted depth. First, we present the classical depth reduction results by Valiant [Val77] . This result applied to linear circuits gives the following theorem.
Let $M \in \{0,1\}^{m \times n}$ be a matrix. For every c, ε > 0 there exists δ > 0 such that if a linear circuit C of size cn and depth c log n computes Mx for every $x \in \{0,1\}^n$, then
If C is a series-parallel linear circuit of size cn and unbounded depth, then $R_M(\varepsilon n) \le \delta$.
Linear Circuits
In this subsection, we deal with linear circuits, i.e., circuits consisting of ⊕ gates only. For technical reasons, we assume that there are n + 1 input gates of a linear circuit: x 1 , . . . , x n as well as 0. For a matrix M ∈ {0, 1} m×n , we say that a linear circuit computes the linear transformation M x (or just the matrix M itself), if some m gates of the circuit are labeled as outputs and the i-th output computes the linear sum of the subset of n input variables specified by the i-th row of M (in particular, if a row of M has at most one 1, then the corresponding output label is placed on the corresponding input gate). When the m output gates of C are specified, for x ∈ {0, 1} n , we treat C(x) as the vector of output values. Then, C computes M if C(x) = M x for all x ∈ {0, 1} n . We say that a linear circuit C computing M is optimal if no other circuit of smaller size computes M .
The main result of this subsection asserts that matrices computable by small circuits are not too rigid. The contrapositive of this statement is: to get an improved lower bound on the size of linear circuits, it suffices to construct a matrix with good rigidity parameters.
Theorem 3.3. Let $M \in \{0,1\}^{m \times n}$ and let C be a linear circuit of size s computing M. Then $R_M(\lfloor s/4 \rfloor) \le 16$.
Proof. If s < 16 or the depth of C is at most 4, then each output depends on at most 16 variables. Hence M is 16-sparse and the theorem statement holds. Consider this as the base case of induction on s. For the induction step, we assume further that C is optimal (if it is not, the statement holds just by the induction hypothesis).
We now "normalize" the circuit C. Namely, our goal is to show that the matrix M can be decomposed into the sum A ⊕ B where the matrix A is 16-sparse and the matrix B has rank at most ⌊s/4⌋. Now, if C has an output gate H of depth at most 4 (recall that the depth of a gate is the maximum number of wires on a path from the gate to an input gate), then H computes a linear function that depends on at most 16 input variables. This, in turn, means that the corresponding row of M has at most 16 ones. Consider now the matrix $M_H$ resulting from M by removing the corresponding row. It is not difficult to see that $R_{M_H}(\lfloor s/4 \rfloor) \le 16$ implies $R_M(\lfloor s/4 \rfloor) \le 16$. Indeed, assume that $M_H = A_H \oplus B_H$ where $A_H$ is 16-sparse and $\mathrm{rank}(B_H) \le \lfloor s/4 \rfloor$. To get the same decomposition for M, we add to $M_H$ and $A_H$ the removed row and we add the all-zero row to $B_H$. Clearly, the resulting matrix A is 16-sparse and the rank of the resulting matrix B does not change. Thus, in the following, we assume that C has no output gates of depth at most 4.
Claim 3.4. Let C be an optimal linear circuit computing M ∈ {0, 1} m×n such that s(C) ≥ 16, and no output gate of C has depth smaller than 5. Then C contains a gate G such that there exists a linear circuit C ′ computing M ′ ∈ {0, 1} m×n (i.e., of exactly the same size as M ) such that
Consider the circuit C′ and the matrix M′ provided by Claim 3.4. Let $g \in \{0,1\}^{1 \times n}$ be a characteristic vector of the linear function computed at the gate G by the circuit C: G(x) = gx. We know that gx = 0 implies (M ⊕ M′)x = 0. Hence (M ⊕ M′) is either zero or defines exactly the same linear subspace as g: M ⊕ M′ = tg for a vector $t \in \{0,1\}^{m \times 1}$.
Case 1: there exists a gate in C that has depth at least 2 and at most 4 and has out-degree at least 2. Call it G, call its predecessors B and C, and call two of its successors D and E, see Figure 3.1 (in this and the following figures we write the out-degrees of some of the gates near them). The circuit C′ is obtained from C by simplifying C using G = 0. Indeed, the gate G is not needed in C′. Also, B(x) = C(x) for all $x \in \{0,1\}^n$ where G(x) = 0. At least one of B and C must be an internal gate (otherwise G would have depth 1), let it be C. Since C computes the same function as B, it may be removed from C′: we remove it and replace every wire of the form C → H by a new wire B → H. Note that neither G nor C is an output gate. Now, we show that both D and E can also be removed. Let us focus on the gate D (for E it is shown similarly) and call its other predecessor F. Since G = 0, the gate D computes the same function as F. This means that one may remove D: we remove it and replace every wire D → H by a wire F → H. If D happens to be an output gate, we move the corresponding output label from D to F.
Figure 3.1. Case 1: under G = 0, the gate G is removed, B is replaced by C, D and E are replaced by their other predecessors.
Figure. Case 2: under G = 0, the gates B, C, and G are removed whereas E is replaced by F.
Case 2: all gates of depth at least 2 and at most 4 have out-degree exactly 1 in C. Take a gate G of depth 4 and follow back its longest path to an input:
Let also E be the successor of G. Note that the gates B and C have out-degree 1. This essentially means that in C they are used for computing the gate G only. This, in turn, means that under G = 0 one removes G, B, and C (none of them is an output). Also, the gate E is replaced by the other input F of E (F ≠ B, C, G since C is optimal).
Remark [PP06, Lok09] . While the current lower bounds on the outer dimension of explicit matrices do not lead to new circuit lower bounds, it would be interesting to study their applications in this context.
General Circuits
In this section, we study the following natural question: given a circuit 3 , what is the smallest $\mathrm{OR} \circ \mathrm{AND} \circ \mathrm{OR}_k$ circuit computing the same function? To this end, we introduce the following notation. For an integer k ≥ 2, we define α(k) as the infimum of all values α such that any circuit of size s can be rewritten as an $\mathrm{OR}_{2^{\alpha s}} \circ \mathrm{AND} \circ \mathrm{OR}_k$ circuit.
For proving upper bounds on α(k) it will be convenient to consider the following class of circuits. Let $\mathrm{OR}_p \circ \mathrm{AND}_q \circ C(r)$ be a class of circuits with an output OR that is fed by at most p ANDs of at most q circuits of size at most r.
Theorem 3.7. A circuit of size s can be computed as:
Note that any circuit of size r depends on at most r + 1 variables and hence can be written as an (r + 1)-CNF with at most $2^r$ clauses. This implies that an $\mathrm{OR}_p \circ \mathrm{AND}_q \circ C(r)$ circuit can be easily converted into an $\mathrm{OR}_p \circ \mathrm{AND}_{q 2^r} \circ \mathrm{OR}_{r+1}$ circuit. This way, we get the following corollary from Theorem 3.7.
3 In this section we consider functions with one output, but these results can be trivially generalized to the multioutput case.
Proof of Theorem 3.7. Both parts are proven in a similar fashion. We proceed by induction on s. The base case is when s is small. We then just have an $\mathrm{OR}_1 \circ \mathrm{AND}_1 \circ C(s)$ circuit.
For the induction step we take a gate G of C and consider two circuits $C_0$ and $C_1$ where $C_i$ computes the same as C on all inputs $\{x \in \{0,1\}^n : G(x) = i\}$. We may assume that both $C_i$'s have the smallest possible size among all such circuits. Since $C_i$ can be obtained from C by removing the gate G (as it computes the constant i on the corresponding subset of the Boolean hypercube), we conclude that $s(C_i) < s$. This allows us to proceed by induction. Assume that by the induction hypothesis $C_i$ is guaranteed to be expressible as an $\mathrm{OR}_{p_i} \circ \mathrm{AND}_{q_i} \circ C(r_i)$ circuit. We use the following identity to convert C into the required circuit:
$C(x) = ([G(x) = 0] \wedge C_0(x)) \vee ([G(x) = 1] \wedge C_1(x))$. (1)
Assume that the subcircuit of C computing the gate G has at most t gates. We claim that $[G(x) = i] \wedge C_i$ can be written as an $\mathrm{OR}_{p_i} \circ \mathrm{AND}_{q_i+1} \circ C(\max\{r_i, t\})$ circuit. For this, we just feed a new circuit computing G to every AND gate. Plugging this into (1) gives an
$\mathrm{OR}_{p_0+p_1} \circ \mathrm{AND}_{\max\{q_0, q_1\}+1} \circ C(\max\{r_0, r_1, t\})$ (2)
circuit for computing C. Below, we provide details specific to each of the two items from the theorem statement. In particular, we estimate the parameters $p_i$'s, $q_i$'s, $r_i$'s, and t and plug them into (2).
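The case split on the value of a gate G used in the induction step can be checked exhaustively on a toy circuit. This is a sketch under assumptions: the gate encoding (4-bit truth tables, node indices) and the names fix_gate and check_split are hypothetical, and $C_i$ is obtained here simply by replacing G with the constant i rather than by also deleting the gates that become redundant.

```python
from itertools import product

XOR, AND, OR = 0b0110, 0b1000, 0b1110   # truth tables, bit 2*u + v
CONST = {0: 0b0000, 1: 0b1111}          # constant-0 and constant-1 gates

def evaluate(gates, x):
    """Values of all nodes (inputs first, then gates in order) on input x."""
    vals = list(x)
    for op, a, b in gates:
        vals.append((op >> (2 * vals[a] + vals[b])) & 1)
    return vals

def fix_gate(gates, j, i):
    """Copy of the circuit in which gate number j outputs the constant i."""
    fixed = list(gates)
    _, a, b = gates[j]
    fixed[j] = (CONST[i], a, b)
    return fixed

def check_split(n, gates, j):
    """Check C(x) = OR_i ([G(x) = i] AND C_i(x)) for G = gate j, and that
    at most one of the two branches evaluates to 1 on every input."""
    for x in product((0, 1), repeat=n):
        vals = evaluate(gates, x)
        g, out = vals[n + j], vals[-1]
        branches = [int(g == i) & evaluate(fix_gate(gates, j, i), x)[-1]
                    for i in (0, 1)]
        if out != (branches[0] | branches[1]) or sum(branches) > 1:
            return False
    return True

if __name__ == "__main__":
    # Gate 0 (node 3) has out-degree 2: it feeds gates 1 and 2.
    circuit = [(XOR, 0, 1), (AND, 3, 2), (OR, 3, 4)]
    print(all(check_split(3, circuit, j) for j in range(len(circuit))))
```

Because the two guards [G(x) = 0] and [G(x) = 1] are mutually exclusive, the OR of the two branches never has both inputs equal to 1.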
1. The base case is s = 1. Then C consists of a single gate and can be expressed as an $\mathrm{OR}_1 \circ \mathrm{AND}_1 \circ C(1)$ circuit. For the induction step, assume that s ≥ 2 and take a gate A that depends on two variables. Let G = A, hence t = 1. The gate A must have at least one successor (otherwise C can be replaced by a circuit with fewer than s gates). Clearly, A and its successors are not needed in the $C_i$'s. Hence, by the induction hypothesis $p_i \le 2^{\frac{s-2}{2}+1}$, $q_i \le \frac{s-2}{2} + 1$, $r_i \le 1$. Plugging this into (2) gives the desired result.
2. Take a gate A that is fed by two variables x and z and has the maximum distance to an output. If its distance to output is at most 4, then s(C) ≤ 15 and we just rewrite it as an OR 1 • AND 1 • C(15) circuit. This is the base case. Assume now that the distance from A to the output gate is at least 5. In the analysis below, we always "follow" the longest path from A to the output. This allows us to conclude that any such path is long enough and hence each gate considered has positive out-degree (i.e., is not an output). Moreover, each gate on this path cannot depend on too many variables. Denote the variables that feed A by x and z and let B be a successor of A on the longest path to the output.
In the five cases below, we show that we can always find a gate G such that C(G) ≤ 15 and both $s(C_0)$ and $s(C_1)$ are small enough. In particular, $s(C_0), s(C_1) \le s - 4$ works for us. See Figure 2 for an illustration of the five cases. For a gate G, by out(G) we denote the out-degree of G.
Case 1: out(B) = 1. Let C be the successor of B.
Case 1.1: out(C) = 1. Let E be the successor of C. Let G = E. In the $C_i$'s, one removes B, C (as they were only needed to compute E that is now a constant), E, and the successors of E.
Case 1.2: out(C) ≥ 2. Let G = C. In the $C_i$'s, one removes B, C, and the successors of C. ⌉ . This is smaller than 2
Remark 3.9. It is not difficult to see that the output OR gate can be replaced by a SUM gate over the integers. In other words, for any $x \in \{0,1\}^n$, at most one of the subcircuits feeding the OR gate may evaluate to 1. This holds because we always consider two mutually exclusive cases: G = 0 or G = 1.
Properties of α(k)
We start by observing a lower bound on α(k).
Lemma 3.10. For any integer k ≥ 2, α(k) ≥ 1/k.
Proof. Let $\oplus_n$ denote the parity function of n inputs. It has $2^{n-1}$ inputs where it is equal to 1 and all these inputs are isolated, that is, the Hamming distance between any pair of them is at least 2. As proven by Paturi, Pudlák, and Zane [PPZ97], any k-CNF has at most $2^{n(1-1/k)}$ isolated satisfying assignments. This implies that $\oplus_n$ cannot be computed by an OR of fewer than $2^{n/k-1}$ k-CNFs. Since $s(\oplus_n) = n - 1$, this implies that
$\alpha(k) \cdot (n - 1) \ge n/k - 1$.
Since this must hold for arbitrarily large n, α(k) ≥ 1/k.
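The two ingredients of this proof can be sanity-checked for small n. The sketch below verifies by brute force that the ones of parity are pairwise at Hamming distance at least 2, and evaluates the exponent of the resulting bound; the function names are illustrative, and the $2^{n(1-1/k)}$ bound of [PPZ97] is taken as given rather than verified.

```python
from itertools import combinations, product

def parity_ones_are_isolated(n):
    """Return (number of ones of parity, whether they are pairwise at
    Hamming distance at least 2)."""
    ones = [x for x in product((0, 1), repeat=n) if sum(x) % 2 == 1]
    isolated = all(sum(a != b for a, b in zip(x, y)) >= 2
                   for x, y in combinations(ones, 2))
    return len(ones), isolated

def log2_min_or_fanin(n, k):
    """log2 of 2**(n-1) / 2**(n*(1-1/k)), i.e. n/k - 1: each k-CNF can
    cover at most 2**(n*(1-1/k)) of the 2**(n-1) isolated ones."""
    return (n - 1) - n * (1 - 1 / k)

if __name__ == "__main__":
    print(parity_ones_are_isolated(6))
    # Ratio (n/k - 1) / (n - 1) approaches 1/k as n grows.
    for n in (30, 300, 3000):
        print(n, log2_min_or_fanin(n, 3) / (n - 1))
```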
Thus, we know the exact value of α(2) = 1/2. This immediately implies a circuit lower bound of 2n − o(n) for BCH codes. Indeed, it was shown in [PSZ97] that when the bottom fan-in is restricted to k = 2, then BCH codes require depth-3 circuits of size $2^{n-o(n)}$. And, since α(2) = 1/2, they must have circuit complexity at least 2n − o(n).
One can use techniques from Theorem 3.7 to prove an upper bound of $\alpha(3) \le \frac{\log_2 3}{4}$. Thus, we know that $\frac{1}{3} \le \alpha(3) \le \frac{\log_2 3}{4} < 0.3963$.
We conjecture that the upper bound on α(3) is tight. One way to prove this would be to find the $s_3^3$ complexity of the inner product function: $\mathrm{IP}(x_1, \dots, x_n) = x_1 x_2 \oplus x_3 x_4 \oplus \cdots \oplus x_{n-1} x_n$. In particular, if the upper bound shown in the next lemma is tight, then $\alpha(3) = \frac{\log_2 3}{4}$.
Lemma 3.11.
Proof.
1. The function IP is known to be a disperser for projections for dimension d = −o(n) lower bound on s 2 3 is proven by [PSZ97] . The upper bound follows from the fact that IP(x 1 , . . . , x n ) = 1 iff there is an odd number of 1's among
Hence,
It remains to note that each [p i = c] can be expressed as a 2-CNF because p i depends on two variables.
2. The lower bound is a direct consequence of the lower bound $s_3^3(\oplus_n) \ge 2^{n/3}$ (by substituting every second input of IP by 1, one gets the function $\oplus_{n/2}$).
For the upper bound, note that IP(x 1 , . . . , x n ) = 1 iff there is an odd number of 1's among
To compute IP by a depth-3 circuit, we go through all possible $2^{n/4-1}$ values of $p_1, \dots, p_{n/4}$ such that an odd number of them is equal to 1:
Now, we show that [p i = 0] can be written as a single 3-CNF, whereas [p i = 1] can be expressed as an OR of two 3-CNFs. W.l.o.g. assume that i = 1. The clauses of a 3-CNF expressing [p i = 0] should reject all assignments to x 1 , x 2 , x 3 , x 4 ∈ {0, 1} where IP(x 1 , x 2 , x 3 , x 4 ) = 1. In all such assignments, one of the two monomials (x 1 x 2 and x 3 x 4 ) is equal to 0 whereas the other one is equal to 1. Hence, one needs to write down a set of clauses rejecting the following four partial assignments: {x 1 = 0, x 3 = x 4 = 1}, {x 2 = 0, x 3 = x 4 = 1}, {x 1 = x 2 = 1, x 3 = 0}, {x 1 = x 2 = 1, x 4 = 0}. Thus,
[p 1 (x 1 , x 2 , x 3 , x 4 ) = 0] ≡ (x 1 ∨¬x 3 ∨¬x 4 )∧(x 2 ∨¬x 3 ∨¬x 4 )∧(¬x 1 ∨¬x 2 ∨x 3 )∧(¬x 1 ∨¬x 2 ∨x 4 ) .
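The displayed 3-CNF can be verified over all 16 assignments. A minimal check, assuming $p_1(x_1, x_2, x_3, x_4) = x_1 x_2 \oplus x_3 x_4$ as in the surrounding argument (the helper names are illustrative):

```python
from itertools import product

def p1(x1, x2, x3, x4):
    """Inner product of one block of four variables: x1*x2 XOR x3*x4."""
    return (x1 & x2) ^ (x3 & x4)

def cnf_p1_is_zero(x1, x2, x3, x4):
    """The four clauses from the text, each rejecting one partial assignment."""
    return bool((x1 or not x3 or not x4)
                and (x2 or not x3 or not x4)
                and (not x1 or not x2 or x3)
                and (not x1 or not x2 or x4))

# The CNF accepts exactly the assignments on which p1 evaluates to 0.
all_agree = all(cnf_p1_is_zero(*x) == (p1(*x) == 0)
                for x in product((0, 1), repeat=4))

if __name__ == "__main__":
    print(all_agree)
```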
In turn, to express [p 1 = 1] as an OR of two 3-CNFs we consider both assignments to x 1 :
It remains to note that each of [x 2 ⊕ x 3 x 4 = 0] and [x 3 x 4 = 1] can be written as a 3-CNF.
, and R i are 3-CNFs. One may then expand (3) as follows:
The fan-in of the resulting OR-gate is
Open Problem 3.1. Determine s 3 3 (IP).
Besides finding the exact values of α(k), it would be interesting to find out whether every circuit of linear size can be computed by a non-trivial depth 3 circuit with constant bottom fan-in.
Open Problem 3.2. Prove or disprove: for any constant c, any circuit of size cn can be computed as an
circuit where δ(c) > 0.
For example, we can consider one of the classes where we know linear upper bounds on circuit complexity. For any symmetric function f (i.e., a function whose value depends only on the sum over integers of the input bits) we know that s(f) ≤ 4.5n + o(n) [DKKY10] and that $s_3^2(f) \le \mathrm{poly}(n) \cdot 1.5^n$ [PSZ97]. For 0 < x < 1, let $H(x) = -x \log x - (1 - x) \log(1 - x)$ be the binary entropy function. Generalizing the result of [PSZ97], one gets an upper bound $s_3^k(f) \le 2^{\beta(k) n}$, where
For every fixed k, it is trivial to find this maximum and β(k). In particular, we have $s_3^3(f) \le \mathrm{poly}(n) \cdot (4/3)^n$, and
n for any symmetric function f . 4
Since in our depth reduction results we always get k-CNFs with a small linear number of clauses, it is interesting to study the expressiveness of an OR of an exponential number of such k-CNFs. Let us define α(k, c) as the infimum of all values α such that any circuit of size at most cn can be computed as an $\mathrm{OR}_{2^{\alpha n}} \circ \mathrm{AND}_{cn} \circ \mathrm{OR}_k$. We can upper bound the rate of convergence of α(k, c) using the following width reduction result for CNF-formulas [Sch05, CIP06].
Theorem 3.12 ( [Sch05, CIP06] ). For any constant 0 < ε ≤ 1 and a function C : N → N, any CNF formula f with n variables and n · C(n) clauses can be expressed as f = OR t i=1 f i , where t ≤ 2 εn and each f i is a k-CNF formula with at most Cn clauses, where
For our applications, we are interested in α(k, c) for small fixed c. Since for every c, α(k, c) is a non-increasing bounded sequence, we let α(∞, c) = lim k→∞ α(k, c). Then Theorem 3.12 implies
Applications
In this section, we state formally the results that are presented in the last three row-blocks of Table 1. Namely, we show that improving the parameters of the known explicit constructions of the following pseudorandom objects implies circuit lower bounds via the depth reduction techniques presented in the previous section:
• functions that are not constant on any large algebraic variety in $\{0,1\}^n$ defined by polynomials of small degree (such functions are called dispersers);
• functions that agree with any polynomial of small degree on roughly half of the points in $\{0,1\}^n$;
• matrices that are far from matrices of small rank.
4 Here by an upper bound on $s_3^\infty$ we denote the expression $(\lim_{k \to \infty} 2^{\beta(k)})^n$.
For comparison, we also show what these tools give when applied to Valiant's reductions.
Dispersers
In this section we show that dispersers for algebraic varieties over F 2 cannot be computed by small circuits. We note that dispersers for varieties of degree one have been used for proving lower bounds on unrestricted circuits [DK11, FGHK16] , and it is known that an explicit construction of a disperser for varieties of degree two would slightly improve the known circuit bounds [GK16] . Now we show that dispersers for varieties of degree 16 will give new circuit lower bounds via a new simple method.
it is a set of common roots of at most m polynomials of degree at most d:
We will make use of the Sparsification Lemma first proven by Impagliazzo, Paturi and Zane [IPZ01]. The dependence of C on k was later improved in [CIP06]. (And this is essentially tight by [MRW05].)
Theorem 4.1 (Corollary 1 in [IPZ01], Section 6 in [CIP06]). For all ε > 0 and positive k, there exists C such that any k-CNF formula f with n variables can be expressed as $f = \mathrm{OR}_{i=1}^{t} f_i$, where $t \le 2^{\varepsilon n}$ and each $f_i$ is a k-CNF formula with at most Cn clauses, where
Now we are ready to state the main result of this section.
Theorem 4.2. Let $f \colon \{0,1\}^n \to \{0,1\}$ be a function with $|f^{-1}(1)| \ge |f^{-1}(0)|$ and $\varepsilon > 0$ be a constant. $^5$
• If f is an (16, 1.
• If $f$ is an $(n^{\varepsilon}, \infty, 2^{n - \omega(n/\log\log n)})$-disperser, then $s_{\log}(f) = \omega(n)$.
Proof.
5 If $|f^{-1}(1)| < |f^{-1}(0)|$, one can consider the negation of $f$, since taking negations does not change the disperser parameters.
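• From Theorem 3.7, we know that if $f$ is computable by a circuit of size $s$, then $f$ is also computable by a circuit $C \in \mathrm{OR}_{2^{s/3.9}} \circ \mathrm{AND}_{s/3} \circ C(15)$. Let $t = 2^{s/3.9}$, and let $f_1, \dots, f_t \colon \{0,1\}^n \to \{0,1\}$ be the $t$ functions computed in the gates of the AND level of $C$.
Each $f_i$ is an $\mathrm{AND}_{s/3} \circ C(15)$, that is, a set of common roots of $s/3$ polynomials of degree 16 (recall that over $\mathbb{F}_2$ every monomial is multilinear; hence a circuit of size 15, which reads at most 16 variables, computes a polynomial of degree at most 16). Since $f$ is a disperser for varieties of size $2^{\varepsilon n}$ defined by $s/3$ polynomials of degree 16, and $f$ is constant on each $f_i^{-1}(1)$, we have $|f_i^{-1}(1)| \le 2^{\varepsilon n}$ for every $i$. Now, (4) implies that $s/3.9 \ge n - \varepsilon n - 1$, since the at least $2^{n-1}$ ones of $f$ must be covered by $t = 2^{s/3.9}$ sets of size at most $2^{\varepsilon n}$ each.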
• The proofs of items (2)-(4) of this theorem follow the same pattern, so we only present the proof of the second item. Assume, towards a contradiction, that an $(\omega(1), O(n), 2^{(1-\varepsilon)n})$-disperser $f$ can be computed by a series-parallel circuit of size $cn$. From Theorem 3.1, such a circuit can be expressed as a circuit $C \in \mathrm{OR}_{2^{\varepsilon n/3}} \circ \mathrm{AND} \circ \mathrm{OR}_k$ for $k = k(c, \varepsilon)$. By Theorem 4.1, each $k$-CNF computed by the AND gates of $C$ can be replaced by an OR of $2^{\varepsilon n/3}$ $k$-CNFs with $Cn$ clauses each, where $C = C(\delta, \varepsilon)$. Let $t = 2^{2\varepsilon n/3}$, and let $f_1, \dots, f_t \colon \{0,1\}^n \to \{0,1\}$ be the $t$ $k$-CNFs with $Cn$ clauses whose OR computes $f$. Now we have that each $f_i$ is an $\mathrm{AND}_{Cn} \circ \mathrm{OR}_k$, that is, a set of common roots of $Cn$ polynomials of degree $k$ (each computing an $\mathrm{OR}_k$). From the disperser property of $f$, we have that each $f_i$ covers at most $2^{(1-\varepsilon)n}$ ones of $f$. Therefore, in order to cover all $\ge 2^{n-1}$ ones of $f$, $t$ must be at least $2^{\varepsilon n - 1}$, which for all sufficiently large $n$ contradicts the definition $t = 2^{2\varepsilon n/3}$.
We remark that in the first item of Theorem 4.2, even dispersers for varieties defined by 1.3(1 − ε)n functions of 16 variables (rather than all polynomials of degree 16) will suffice for proving a lower bound.
In order to prove a new circuit lower bound against unrestricted circuits, it suffices to construct a $(16, 1.05n, 2^{0.2n})$-disperser. There are known constructions of dispersers for constant-degree varieties over large fields [Dvi12, BSG12, LZ18]. For $\mathbb{F}_2$, a long line of work achieved almost optimal dispersers for degree $d = 1$ varieties, which are not constant on sets of size $2^{(\log n)^c}$ for a constant $c$ [Li16]. Also, the known constructions can handle large varieties of large degrees [Rem16], or smaller varieties of size $2^{\alpha n}$ of constant degree (for a constant $\alpha$) [LZ18]. On the other hand, the result of Cohen and Tal [CT15, Theorem 5], together with an efficient construction of affine dispersers from [Li16], gives an explicit construction of a $(16, n/(\log n)^c, 2^{o(n)})$-disperser (it handles varieties of the desired size, but only those defined by fewer polynomials). Thus, although the currently known constructions do not suffice for proving new lower bounds, they are tantalizingly close to the ones needed for a simple proof of circuit lower bounds via Theorem 3.7.
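To see why these particular parameters would be enough, the counting step from the proof of Theorem 4.2 can be instantiated as follows. This is only a sketch under two assumptions that are not spelled out in the surviving text: that a $(d, m, s)$-disperser is a function that is not constant on any variety of size greater than $s$ defined by at most $m$ polynomials of degree at most $d$, and that $f$ has at least $2^{n-1}$ ones.

```latex
% Assumption: f is a (16, 1.05n, 2^{0.2n})-disperser computed by a circuit of size s <= 3.15n,
% and f_1, ..., f_t (t = 2^{s/3.9}) are the AND-level functions from Theorem 3.7.
\begin{align*}
  \frac{s}{3} \le 1.05n
    &\;\Longrightarrow\; |f_i^{-1}(1)| \le 2^{0.2n} \quad \text{for all } 1 \le i \le t, \\
  2^{n-1} \le |f^{-1}(1)| \le \sum_{i=1}^{t} |f_i^{-1}(1)| \le 2^{s/3.9 + 0.2n}
    &\;\Longrightarrow\; s \ge 3.9\,(0.8n - 1) = 3.12n - 3.9 .
\end{align*}
```

If instead $s > 3.15n$, the bound $s \ge 3.12n - 3.9$ holds trivially, so under these assumptions it holds in either case.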
We conclude this section with a simple counting argument showing that a random function is a disperser with great parameters. 
Thus, a random function is a $(d, k, s)$-disperser with probability at least $1 - o(1)$.
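For very small $n$ the disperser property can be checked exhaustively, which may help to make the definition concrete. The sketch below assumes that a $(d, m, s)$-disperser is a function that is not constant on any variety of size greater than $s$ defined by at most $m$ polynomials of degree at most $d$ over $\mathbb{F}_2$; the names `eval_poly`, `variety`, `all_polys` and `is_disperser` are hypothetical helpers introduced only for this illustration. The search is doubly exponential in $n$ and is not meant as a construction.

```python
from itertools import combinations, product

def eval_poly(monomials, x):
    """Evaluate an F_2 polynomial given as a tuple of monomials,
    each monomial being a tuple of variable indices (() is the constant 1)."""
    return sum(all(x[i] for i in mono) for mono in monomials) % 2

def variety(polys, n):
    """All points of {0,1}^n that are common roots of the given polynomials."""
    return [x for x in product((0, 1), repeat=n)
            if all(eval_poly(p, x) == 0 for p in polys)]

def all_polys(n, d):
    """Enumerate every multilinear polynomial of degree at most d in n variables."""
    monos = [c for k in range(d + 1) for c in combinations(range(n), k)]
    for mask in product((0, 1), repeat=len(monos)):
        yield tuple(m for m, bit in zip(monos, mask) if bit)

def is_disperser(f, n, d, m, s):
    """Assumed definition: f is non-constant on every variety of size > s
    defined by at most m polynomials of degree at most d."""
    polys = list(all_polys(n, d))
    for count in range(1, m + 1):
        for system in combinations(polys, count):
            points = variety(system, n)
            if len(points) > s and len({f(x) for x in points}) == 1:
                return False
    return True

# Toy examples on three variables.
f = lambda x: (x[0] & x[1]) ^ x[2]
parity = lambda x: x[0] ^ x[1] ^ x[2]
print(is_disperser(f, 3, 1, 1, 3), is_disperser(parity, 3, 1, 1, 3))
```

In this toy setting, parity fails already for a single linear equation because it is constant on the hyperplane $x_0 \oplus x_1 \oplus x_2 = 0$, whereas `f` is non-constant on every affine hyperplane.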
Correlation with Polynomials
In this section we show that a function that has small correlation with low-degree polynomials has high circuit complexity. We show this by using a known connection between correlation with polynomials and dispersers for varieties.
Definition 4.3. For two functions $f, g \colon \{0,1\}^n \to \{0,1\}$, we define their correlation as
$\mathrm{Cor}(f, g) = \left| \Pr_x[f(x) = g(x)] - \Pr_x[f(x) \ne g(x)] \right|$, where $x$ is drawn uniformly at random from $\{0,1\}^n$.
By $\mathrm{Cor}(f, d)$ we denote the correlation of a function $f$ with polynomials of degree $d$:
$\mathrm{Cor}(f, d) = \max_{g} \mathrm{Cor}(f, g)$, where the maximum is taken over all polynomials $g$ of degree at most $d$. We use the fact that small correlation with polynomials of degree $d$ implies small correlation with products of polynomials of degree $d$, and, as a consequence, a disperser for varieties of degree $d$.
Proof. Consider a variety $S = \{x \in \{0,1\}^n : q_1(x) = \dots = q_k(x) = 0\}$, where each $q_i \colon \{0,1\}^n \to \{0,1\}$ is a non-constant polynomial of degree at most $d$. Let $g(x) = \prod_{i=1}^{k} (q_i(x) \oplus 1)$ be the indicator function of $S$, and from the Fourier expansion we have
Now note that for any $S \subseteq \{1, \dots, k\}$,
because $\bigoplus_{i \in S} q_i(x)$ is a polynomial of degree at most $d$ and $\mathrm{Cor}(f, d) \le \varepsilon$. Now
In particular, for any variety $S$ of size $|S| > \varepsilon 2^n$, $f(x)$ is not constant. Now Theorem 4.2 and Lemma 4.4 imply the following result.
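Theorem 4.5. Let $f \in B_n$ and $\varepsilon > 0$ be a constant.
• If $\mathrm{Cor}(f, 16) \le 2^{-n(1-\varepsilon)}$, then $s(f) \ge 3.9(1 - \varepsilon)n - 4$.
• If $\mathrm{Cor}(f, \omega(1)) \le 2^{-\varepsilon n}$, then $s_{sp}(f) = \omega(n)$.
• If $\mathrm{Cor}(f, 2^{(\log n)^{1-o(1)}}) \le 2^{-\varepsilon n}$, then $s_{\log}(f) = \omega(n)$.
• If $\mathrm{Cor}(f, n^{\varepsilon}) \le 2^{-\omega(n/\log\log n)}$, then $s_{\log}(f) = \omega(n)$.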
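The quantity $\mathrm{Cor}(f, d)$ can be computed by brute force for toy sizes, which gives a feel for the hypotheses of Theorem 4.5. The sketch below assumes the standard definition $\mathrm{Cor}(f, g) = |\Pr[f = g] - \Pr[f \ne g]|$ (the displayed formula did not survive in this text), and the helper names `correlation` and `cor_with_degree` are hypothetical. It enumerates all polynomials of degree at most $d$, so it is feasible only for a handful of variables.

```python
from itertools import combinations, product

def correlation(f, g, n):
    """Assumed definition: |Pr[f(x) = g(x)] - Pr[f(x) != g(x)]| over uniform x."""
    points = list(product((0, 1), repeat=n))
    agree = sum(f(x) == g(x) for x in points)
    return abs(2 * agree - len(points)) / len(points)

def cor_with_degree(f, n, d):
    """Maximum correlation of f with any F_2 polynomial of degree at most d."""
    monos = [c for k in range(d + 1) for c in combinations(range(n), k)]
    best = 0.0
    for mask in product((0, 1), repeat=len(monos)):
        chosen = [m for m, bit in zip(monos, mask) if bit]
        # g evaluates the polynomial whose monomials are listed in `chosen`
        g = lambda x, chosen=chosen: sum(all(x[i] for i in m) for m in chosen) % 2
        best = max(best, correlation(f, g, n))
    return best

# Toy example: the inner product function on four variables.
ip = lambda x: (x[0] & x[1]) ^ (x[2] & x[3])
print(cor_with_degree(ip, 4, 1), cor_with_degree(ip, 4, 2))
```

The inner product function is itself a polynomial of degree 2, so its correlation with degree-2 polynomials is 1, while its correlation with every affine function is much smaller.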
Rigidity
In order to prove super-linear circuit lower bounds for log-depth circuits via Valiant's reduction, one needs to construct matrices $M$ with rigidity $R_M\left(\frac{\delta n}{\log\log n}\right) > n^{\varepsilon}$ or rigidity $R_M(\varepsilon n) > 2^{(\log n)^{1-\delta}}$ for some constant $\varepsilon > 0$ and every constant $\delta > 0$. For super-linear lower bounds for series-parallel circuits, one needs to find matrices with rigidity $R_M(\varepsilon n) > \delta$. Also, Razborov [Raz89] proved that rigidity $R_M\left(2^{(\log\log n)^{\delta}}\right) > \frac{n}{2^{(\log\log n)^{\varepsilon}}}$ gives a language that does not belong to the polynomial hierarchy for communication complexity. The best known explicit lower bound on rigidity for every $r$ is $R(r) \ge \Omega\left(\frac{n}{r} \log \frac{n}{r}\right)$ [Fri93, PV91, SSS97, Lok09]. $^6$ Thus, for new bounds via Valiant's reduction (or Razborov's reduction for communication complexity), one needs to improve the known bounds asymptotically.
In order to get new circuit lower bounds via Theorem 3.3, we need to find a matrix $M \in \{0,1\}^{n \times n}$ with rigidity $R_M(0.75n) > 16$ (or a rectangular matrix $M \in \{0,1\}^{m \times n}$ for $m \ge n$ which is rigid for higher rank: $R_M\left(\frac{n}{2} + \frac{m}{4}\right) > 16$). There are several explicit constructions of matrices having rigidity $R(\varepsilon n) > 16$ for some constant $\varepsilon$ [Fri93, PV91, SSS97, Lok09]. Valiant [Val77] showed that a random matrix $M \in \{0,1\}^{n \times n}$ has rigidity $R(r) \ge \frac{(n-r)^2 - 2n - \log n}{\log(2n^2)}$ for any $r < n - \sqrt{2n + \log n}$. In particular, $R_M(n - 2\sqrt{n}) \gg 16$ for a random matrix $M$. As for explicit constructions, Pudlák and Vavřín [PV91] found the exact value of rigidity (for every rank $r$) of the upper triangular matrix $T_n \in \{0,1\}^{n \times n}$. In particular, they showed that $R\left(\frac{n}{65}\right) > 16$. A matrix which is rigid for larger values of rank (at the price of having more outputs) was given in [PR94] and [JS13, Theorem 3.36]: a generator matrix $M \in \{0,1\}^{m \times n}$ of a linear code with relative distance $\delta > 0$ has, for any $r \le n/16$, rigidity $R_M(r) \ge \frac{\delta n \log(n/r)}{8(r + \log(n/r))}$.
We now show that using the ideas from [Fri93, SSS97] , one can improve this constant, but this is still not sufficient for getting new bounds using Theorem 3.3.
Recall that $H(x) = -x \log x - (1 - x) \log(1 - x)$ for $0 < x < 1$, and that the generator matrix $M \in \{0,1\}^{m \times n}$ of a code can always be transformed such that the first $n$ rows of $M$ form the identity matrix.
Proof. We will show that for every 16-sparse matrix $B$, $\mathrm{rank}(A \oplus B) > \alpha n \cdot H\left(\frac{\delta(1 - \alpha)}{2\alpha(1 - \alpha)R + 32\alpha}\right) - o(n)$.
First we take the $\alpha n$ sparsest columns of $B$. By Markov's inequality, each of them has at most $\frac{16m}{(1 - \alpha)n}$ non-zero entries. Let $A', B', M' \in \{0,1\}^{m \times \alpha n}$ be the submatrices of $A$, $B$, and $M$ corresponding to this set of $\alpha n$ columns. For a vector $x \in \{0,1\}^n$, let $|x|$ be the number of non-zero elements in it.
Since $M$ generates a code with relative distance $\delta$, we have that for every non-zero $x \in \{0,1\}^n$, $|Mx| \ge \delta m$. From $Mx = \begin{pmatrix} I \\ A \end{pmatrix} x = \begin{pmatrix} x \\ Ax \end{pmatrix}$, we have that $|Ax| \ge \delta m - |x|$. Since this holds for every non-zero $x$, including $x$ with zeros in all coordinates not in $A'$, we get that for every non-zero $x \in \{0,1\}^{\alpha n}$, $|A'x| \ge \delta m - |x|$.
6 There is also a semi-explicit construction due to Goldreich and Tal [GT16]. This construction uses $O(n)$ random bits and has rigidity $R(r) \ge \Omega\left(\frac{n^2}{r^2 \log n}\right)$ for every $r \ge \sqrt{n}$. This bound is better than the known explicit bounds for $r = o\left(\frac{n}{\log n \log\log n}\right)$.
Now we only consider non-zero $x \in \{0,1\}^{\alpha n}$ with exactly $k = \beta n$ ones where $\beta =$
Let us consider Justesen's code [Jus72], [MS77, Chapter 10, §11, Theorem 12]. For $\delta = 0.077$, we have an efficient construction of a linear code with rate $R = 0.15$. In Lemma 4.6, we set $\alpha = 0.182$ and get that this matrix is rigid for rank $r > \frac{n}{64}$, beating the bound from [PV91] (at the price of having $m - n = n(1/R - 1)$ outputs).
If we take the concatenation of a Reed-Solomon code (as the outer code) and an optimal linear inner code, then for every $\delta$ we can construct in polynomial time a code with relative distance $\delta$ matching the Zyablov bound (see, e.g., the discussion in [ABN+92]): $R = \max_{\delta \le \mu \le 0.5} (1 - H(\mu)) \left(1 - \frac{\delta}{\mu}\right)$.
In particular, if we take such a code with $\delta = 0.49$, then in the Zyablov bound we set $\mu = 0.493$ and get $R \approx 8 \cdot 10^{-7}$. Now we set $\alpha = 0.252$ in Lemma 4.6, and get rigidity for rank as high as $r > \frac{n}{15}$ (at the price of having too many outputs).
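The two parameter settings above can be checked numerically. The sketch below plugs them into the rank bound $\alpha n \cdot H\left(\frac{\delta(1-\alpha)}{2\alpha(1-\alpha)R + 32\alpha}\right)$ as it appears in the proof (the $o(n)$ term is ignored), and into the Zyablov expression for a fixed $\mu$. The function names `binary_entropy`, `rank_fraction` and `zyablov_rate` are hypothetical and exist only for this illustration.

```python
from math import log2

def binary_entropy(x):
    """H(x) = -x log x - (1 - x) log(1 - x), logarithms base 2, for 0 < x < 1."""
    return -x * log2(x) - (1 - x) * log2(1 - x)

def rank_fraction(delta, rate, alpha):
    """Coefficient c such that the bound in the proof reads rank > c * n - o(n)."""
    ratio = delta * (1 - alpha) / (2 * alpha * (1 - alpha) * rate + 32 * alpha)
    return alpha * binary_entropy(ratio)

def zyablov_rate(delta, mu):
    """Value of (1 - H(mu)) * (1 - delta / mu) for one fixed mu (no maximization)."""
    return (1 - binary_entropy(mu)) * (1 - delta / mu)

# Justesen code parameters from the text.
justesen = rank_fraction(delta=0.077, rate=0.15, alpha=0.182)
# Concatenated code at the Zyablov bound with delta = 0.49 and mu = 0.493.
rate = zyablov_rate(0.49, 0.493)
zyablov = rank_fraction(delta=0.49, rate=rate, alpha=0.252)

# n divided by the rank bound, for comparison with n/64 and n/15.
print(1 / justesen, 1 / zyablov, rate)
```

Under these assumptions the first ratio comes out slightly above 64 and the second slightly below 15, in line with the values quoted in the text up to rounding.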
Open Problems
In this section we give a short summary of pseudorandom objects which would lead to new circuit lower bounds via depth reductions described in Section 3.
Open Problem 4.1. Prove that $\mathrm{E}^{\mathrm{NP}}$ contains a language $f$ having one of the following properties:
• $f$ cannot be computed by an $\mathrm{OR}_{2^{0.2n}} \circ \mathrm{AND}_{n \cdot 2^{15}} \circ \mathrm{OR}_{16}$.
• $f$ is a disperser for varieties of size at least $2^{0.2n}$ defined by $1.05n$ polynomials each of which depends on at most 16 variables (and, thus, has degree at most 16).
• $f$ has correlation at most $2^{-0.2n}$ with polynomials of degree 16.
• $f$ is a linear function defined by a matrix $M \in \{0,1\}^{n \times n}$ of rigidity $R_M(0.8n) > 16$ (that is, in order to decrease the rank of $M$ to $0.8n$, one has to change more than 16 elements in some row of $M$).
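The row-rigidity notion used in the last item can be made concrete with an exhaustive search on tiny matrices. The sketch below assumes that $R_M(r)$ is the smallest $t$ such that changing at most $t$ entries in every row of $M$ can bring its rank over $\mathbb{F}_2$ down to at most $r$; the helpers `rank_gf2` and `row_rigidity` are hypothetical names, and the search is exponential in the matrix size.

```python
from itertools import product

def rank_gf2(rows):
    """Rank over F_2 of a matrix whose rows are given as integer bitmasks."""
    basis = []
    for row in rows:
        for b in basis:
            row = min(row, row ^ b)  # clears the leading bit of b if it is set in row
        if row:
            basis.append(row)
    return len(basis)

def row_rigidity(matrix, r):
    """Smallest t such that changing at most t entries per row gives rank <= r."""
    m, n = len(matrix), len(matrix[0])
    rows = [int("".join(map(str, row)), 2) for row in matrix]
    for t in range(n + 1):
        # every possible per-row change pattern with at most t flipped entries
        masks = [mask for mask in range(1 << n) if bin(mask).count("1") <= t]
        for change in product(masks, repeat=m):
            if rank_gf2([a ^ b for a, b in zip(rows, change)]) <= r:
                return t
    return n

# Toy examples: the 3x3 upper triangular matrix and the 3x3 identity.
T3 = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print([row_rigidity(T3, r) for r in range(4)], row_rigidity(I3, 0))
```

The identity matrix illustrates a non-rigid matrix: one change per row already reduces it to the zero matrix, even though its rank is full.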
