We study correlation bounds and pseudorandom generators for depth-two circuits that consist of a SYM-gate (computing an arbitrary symmetric function) or THR-gate (computing an arbitrary linear threshold function) that is fed by S AND gates. Such circuits were considered in early influential work on unconditional derandomization of Luby, Veličković, and Wigderson [LVW93], who gave the first non-trivial PRG with seed length $2^{O(\sqrt{\log(S/\varepsilon)})}$ that ε-fools these circuits.
Introduction
Depth-2 circuits which have a SYM or THR gate at the output and AND gates (of arbitrary fan-in) adjacent to the input variables are central objects of interest in concrete complexity, lying at the boundary of our understanding for many benchmark problems such as lower bounds, learning, and pseudorandomness. The class of SYM • AND circuits (also known as SYM$^+$ circuits) has received much attention even in the restricted case of polylog(n) bottom fan-in because of the well-known connection with the complexity class $\mathrm{ACC}^0$ [Yao90, BT94, CP16], a connection that is at the heart of Williams's breakthrough circuit lower bound [Wil11] showing $\mathrm{NEXP} \not\subseteq \mathrm{ACC}^0$. Another well-studied subclass, corresponding to the special case where the SYM gate computes the parity of its inputs, is the class of S-sparse polynomials over $\mathbb{F}_2$, which have been intensively studied in a wide range of contexts such as learning [SS96, Bsh97, BM02], approximation and interpolation [Kar89, GKS90, RB91], deterministic approximate counting [EK89, KL93, LVW93], and property testing [DLM+07, DLM+10]. Turning to THR gates (which compute an arbitrary linear threshold function of their inputs) as the top gate, the class of THR • AND circuits of size S is easily seen to contain the class of S-sparse polynomial threshold functions over $\{0,1\}^n$. This class, and special cases of it such as low-degree polynomial threshold functions, has also been intensively studied in complexity theory, learning theory, and derandomization; see e.g. [MP68, Gol97, KP98, KKMS08, Pod09, MZ10, DKN10, DOSW11, Kan12, DS14] and many other works. In this work we focus on pseudorandom generators for these {SYM, THR} • AND circuits.
In 1993 Luby, Veličković, and Wigderson [LVW93] gave the first pseudorandom generators for these depth-2 circuits:

Theorem 1 ([LVW93]). There is a PRG with seed length $2^{O(\sqrt{\log(S/\varepsilon)})}$ that ε-fools the class of size-S SYM • AND circuits over $\{0,1\}^n$. The same is true for the class of size-S THR • AND circuits.

As we shall discuss in detail below, this result was subsequently extended in various ways by different authors, but prior to the present work no strict improvement of Theorem 1 was known for the class of circuits that it addresses. The main contribution of the present work is an exponential improvement of Theorem 1's dependence on ε, giving the first strict improvement of the [LVW93] seed length:
Theorem 2 (Our main result). There is a PRG with seed length $2^{O(\sqrt{\log S})} + \mathrm{polylog}(1/\varepsilon)$ that ε-fools the class of size-S SYM • $\mathrm{AC}^0$ circuits. The same is true for THR • $\mathrm{AC}^0$ circuits.
Theorem 2 improves on a result of Viola [Vio07] which, building on [LVW93], gave a PRG with seed length $2^{O(\sqrt{\log(S/\varepsilon)})}$ for size-S SYM • $\mathrm{AC}^0$ circuits. The [Vio07] PRG combines correlation bounds against SYM • $\mathrm{AC}^0$ circuits with the Nisan-Wigderson "hardness versus randomness" paradigm, which yields pseudorandom generators from correlation bounds; we similarly prove Theorem 2 by establishing improved correlation bounds and using the Nisan-Wigderson paradigm.
Near-optimal hardness-to-randomness conversion. A major theme in computational complexity over the last several decades, dating back to the seminal works of [Sha81, Yao82, BM84, Nis91, NW94], has been that computational hardness can be converted into pseudorandomness. This insight is at the heart of essentially all unconditional pseudorandom generators, and motivates the goal of understanding when and how this conversion can be carried out in a quantitatively optimal manner. With this perspective in mind, we observe that the dependence on ε in Theorem 2 is optimal up to polynomial factors, and as we discuss in Section 1.3, achieving better dependence on S even for the special case of {SYM, THR} • AND circuits would require groundbreaking new lower bounds against low-degree $\mathbb{F}_2$ polynomials and $\mathrm{ACC}^0$ circuits. Hence Theorem 2 achieves a near-optimal hardness-to-randomness conversion for {SYM, THR} • $\mathrm{AC}^0$ circuits; the seed length of our PRG is essentially the best possible given current state-of-the-art correlation bounds and circuit lower bounds.
The exponential improvement in 1/ε over [LVW93]'s seed length translates immediately into significantly improved deterministic approximate counting and deterministic search algorithms for {SYM, THR} • $\mathrm{AC}^0$ circuits, two basic algorithmic tasks in unconditional derandomization (see e.g. [AW85] for formal definitions of these tasks and a discussion of how PRGs yield deterministic algorithms for them).
In the rest of this introduction we provide background and context for our results and explain the main ingredients that underlie them.
Prior PRGs and correlation bounds for {SYM, THR} • $\mathrm{AC}^0$
As mentioned above, the first results on PRGs for SYM • AND circuits were given in early influential work of Luby, Veličković, and Wigderson [LVW93], who constructed a PRG that ε-fools size-S SYM • AND circuits over n variables with seed length $2^{O(\sqrt{\log(S/\varepsilon)})}$. The work of [LVW93] employed ideas similar to those in the "hardness versus randomness" paradigm of [NW94], which subsequently came to be well understood as a versatile technique for constructing pseudorandom generators from correlation bounds.
A number of years later, with the [NW94] framework in hand, Viola [Vio07] made the useful observation that correlation bounds against the larger class of SYM • AND • OR circuits translate to PRGs for SYM • AND circuits in a "black-box" manner via [NW94] , and the same is true when the top gate is THR instead of SYM. (Informally, the [NW94] translation "costs" two layers of depth: with typical parameter settings, it yields PRGs for a class C from correlation bounds against C • ANY log n circuits, where an ANY t gate computes an arbitrary t-variable Boolean function. By rewriting the ANY log n gate as a CNF, it is possible to collapse the two adjacent layers of AND gates, yielding Viola's observation.) Roughly speaking, in this translation from correlation bounds against C • ANY log n to PRGs that ε-fool C,
• the larger the C • ANY log n circuits for which the correlation bound holds, the better (smaller) is the PRG's seed length for fooling size-S functions in C; and
• the smaller the advantage over random guessing that the correlation bound establishes, the better (smaller) is the PRG's seed length's dependence on the fooling parameter ε.
Motivated by this template, [Vio07] established $n^{-\Omega(\log n)}$ correlation bounds against SYM • $\mathrm{AC}^0$ circuits of size $n^{\Omega(\log n)}$. This translates (see Appendix A) into a PRG with seed length $2^{O(\sqrt{\log(S/\varepsilon)})}$ for size-S SYM • $\mathrm{AC}^0$ circuits over $\{0,1\}^n$, matching the seed length achieved by [LVW93] but for a larger class of circuits (and also with a simpler and more modular proof). While [Vio07] does

[Table: circuit type, circuit size S, correlation bound, PRG seed length.]

Subsequent work of [LS11] established a strong correlation bound of $\exp(-\Omega(n^{1-o(1)}))$ against SYM • $\mathrm{AC}^0$ and a correlation bound of $\exp(-\Omega(n^{1/2-o(1)}))$ against THR • $\mathrm{AC}^0$, but in both cases only for such circuits of size $n^{O(\log\log n)}$. Via the Nisan-Wigderson framework [NW94] this translates into a PRG with seed length $2^{O(\log S/\log\log S)} + \mathrm{polylog}(1/\varepsilon)$ for size-S {SYM, THR} • $\mathrm{AC}^0$ circuits over $\{0,1\}^n$; while this is a very good dependence on ε, it comes at the cost of a significantly worse dependence on the circuit size S. Thus both the seed length and correlation bounds of [LS11]

The technical heart of our main result is a new exponential correlation bound against {SYM, THR} • $\mathrm{AC}^0$ circuits of size $n^{\Omega(\log n)}$:
Theorem 3. There is an absolute constant τ > 0 and an explicit poly(n)-time computable function $H : \{0,1\}^n \to \{0,1\}$ with the following property: for any constant d, for n sufficiently large it is the case that for any n-variable circuit C of size $n^{\tau \log n}$ and depth d with a SYM or THR gate at the top, we have
$$\Pr_{x \leftarrow \{0,1\}^n}[C(x) = H(x)] \le \frac{1}{2} + \exp(-\Omega(n^{0.499})).$$
Theorem 3 strictly improves on the correlation bound provided by Theorem 4 of [Vio07], as it establishes correlation bounds for the same class of $n^{\Omega(\log n)}$-size circuits, but gives a much smaller $\exp(-\Omega(n^{0.499}))$ upper bound on the correlation rather than $n^{-\Omega(\log n)}$. As described in Appendix A, our PRG result for {SYM, THR} • $\mathrm{AC}^0$ (Theorem 2) follows directly from Theorem 3 via the Nisan-Wigderson framework. In Section 1.4 we give an overview of the ideas that underlie our new correlation bound.
Correlation bounds and PRGs for constant-depth circuits with multiple SYM or THR gates. The main correlation bound and PRG of [Vio07] are actually for $n^{c_d \log n}$-size depth-d circuits with $c_d (\log n)^2$ many SYM gates, and similarly the main result of [LS11] is a correlation bound for constant-depth circuits with $n^{1-o(1)}$ many SYM gates or $n^{1/2-o(1)}$ many THR gates. Our results similarly extend to constant-depth circuits with multiple SYM or THR gates. Our most general correlation bound is the following:

Theorem 4. There is an absolute constant τ > 0 and an explicit poly(n)-time computable function $H : \{0,1\}^n \to \{0,1\}$ with the following property: for any constant d, for n sufficiently large, any n-variable circuit C of size $n^{\tau \log n}$ and depth d containing $n^{0.249}$ many SYM or THR gates (the circuit is allowed to contain both types of gates) satisfies
Via the Nisan-Wigderson framework, Theorem 4 immediately yields the following, which is our most general PRG result: √ log(S/ε)) for size-S constant-depth circuits that contain O((log S) 2 ) many SYM or THR gates. We prove Theorem 4 in Appendix B.
1.3 Barriers to further progress: correlation bounds for F 2 polynomials and ACC 0 lower bounds
In this section we outline why achieving better dependence on S will require groundbreaking new correlation bounds or circuit lower bounds.

1.4 The high-level structure of our correlation bound argument
We recall the "bottom-up" approach to proving correlation bounds via the method of random restrictions. This approach dates back to the classic correlation bounds between Parity and $\mathrm{AC}^0$ of [Ajt83] and [Hås86]; in particular, the relevant prior works of [Vio07, LS11] also operate within this framework. Fix a hard function H, and let F be any function belonging to a given class $\mathcal{F}$ of Boolean functions (in our case $\mathcal{F}$ is the class of {SYM, THR} • $\mathrm{AC}^0$ circuits of size $n^{\Omega(\log n)}$). Our goal is to show that F has small correlation with H, i.e. that $\Pr_{x \leftarrow \{0,1\}^n}[F(x) = H(x)] \le \frac{1}{2} + \alpha$ for some small α, where x is uniform over $\{0,1\}^n$. This can be achieved by designing a fair distribution R over random restrictions that satisfies the following two competing requirements. (A distribution R over restrictions is said to be fair if first drawing a restriction ρ ← R and then filling in all *'s with independent uniform values from {0, 1} results in a uniform random string from $\{0,1\}^n$.)
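The fairness condition can be checked exactly on a small example. The following is a minimal sketch, assuming (for illustration only) the product distribution that stars each coordinate independently with probability p and otherwise sets it to a uniform bit; the function name `completion_distribution` and the parameters n = 3, p = 1/4 are hypothetical choices, not taken from the paper.

```python
from fractions import Fraction
from itertools import product

def completion_distribution(n, p):
    """Exact distribution of the string obtained by drawing a restriction with
    star-probability p per coordinate and filling the stars with uniform bits."""
    weight = {"*": p, "0": (1 - p) / 2, "1": (1 - p) / 2}
    dist = {}
    for rho in product("01*", repeat=n):
        pr_rho = Fraction(1)
        for symbol in rho:
            pr_rho *= weight[symbol]
        stars = [i for i, symbol in enumerate(rho) if symbol == "*"]
        # Each of the 2^{#stars} fillings is equally likely.
        for fill in product("01", repeat=len(stars)):
            x = list(rho)
            for i, bit in zip(stars, fill):
                x[i] = bit
            key = "".join(x)
            dist[key] = dist.get(key, Fraction(0)) + pr_rho / 2 ** len(stars)
    return dist

# Illustrative parameters (assumed): 3 variables, star-probability 1/4.
dist = completion_distribution(3, Fraction(1, 4))
```

Every one of the 8 strings receives probability exactly 1/8 here, which is what fairness requires of this particular product distribution.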
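(1) Approximator (F) simplifies: With high probability $1 - \gamma_{\mathrm{SL}}$ over ρ ← R, F "collapses" when it is hit by ρ, meaning that $F \restriction \rho \in \mathcal{F}_{\mathrm{simple}}$ for some class $\mathcal{F}_{\mathrm{simple}} \subseteq \mathcal{F}$. Looking ahead, in our case
$$\mathcal{F}_{\mathrm{simple}} = \{\mathrm{SYM}, \mathrm{THR}\} \circ \mathrm{AND}_k, \qquad k = 0.0005 \log m,$$
where $m \approx \sqrt{n}$ and $\mathrm{AND}_k$ denotes the class of fan-in k AND gates. A collapse to this $\mathcal{F}_{\mathrm{simple}}$ is useful for us because there are efficient multiparty communication protocols for functions computable by $\mathcal{F}_{\mathrm{simple}}$ (due to [HG91] when the top gate is SYM and to [Nis93] when it is THR).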
(2) Target (H) retains structure: With high probability $1 - \gamma_{\mathrm{target}}$ over ρ ← R, the restricted hard function H ↾ ρ "retains structure", in the sense that it has small correlation with every function in $\mathcal{F}_{\mathrm{simple}}$. In our case our notion of structure will be that H ↾ ρ "contains a perfect copy of" the generalized inner product function:
$$\mathrm{GIP}_{m,k+1}(x) = \bigoplus_{i=1}^{m} \bigwedge_{j=1}^{k+1} x_{i,j},$$
where m and k are the same m and k as above.
Suppose we have such a fair distribution R over random restrictions satisfying (1) and (2) above. The remaining step in the argument is to show the following:

(3) For any ρ such that both of the above happen (approximator simplifies and the hard function retains structure), F ↾ ρ and H ↾ ρ have small correlation, i.e. they agree on at most a $\frac{1}{2} + \gamma_{\mathrm{corr}}$ fraction of all inputs. As in the previous works of [Vio07, LS11], the fact that $\mathcal{F}_{\mathrm{simple}}$ and $\mathrm{GIP}_{m/2,k+1}$ have small correlation follows from a celebrated theorem of Babai, Nisan, and Szegedy [BNS92] lower bounding the multiparty communication complexity of $\mathrm{GIP}_{m,k+1}$.
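It is straightforward to see that items (1)-(3) above establish a correlation bound of
$$\Pr_{x \leftarrow \{0,1\}^n}[F(x) = H(x)] \le \frac{1}{2} + \gamma_{\mathrm{SL}} + \gamma_{\mathrm{target}} + \gamma_{\mathrm{corr}}.$$
The goal is therefore to carry out the above with $\max\{\gamma_{\mathrm{SL}}, \gamma_{\mathrm{target}}, \gamma_{\mathrm{corr}}\}$ as small as possible for the class $\mathcal{F}$ of {SYM, THR} • $\mathrm{AC}^0_d$ circuits of as large a size as possible. As indicated earlier, for any constant d and circuits of size up to $s = n^{\tau \log n}$ we achieve $\max\{\gamma_{\mathrm{SL}}, \gamma_{\mathrm{target}}, \gamma_{\mathrm{corr}}\} = \exp(-\Omega(n^{0.499}))$.
After giving some technical preliminaries in Section 2, we upper bound $\gamma_{\mathrm{SL}}$, $\gamma_{\mathrm{target}}$, and $\gamma_{\mathrm{corr}}$ in Sections 3, 4, and 5 respectively.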
How this work differs from [Vio07, LS11]: improved depth reduction
A simple observation (due to [HM04] ) that is used in both [Vio07, LS11] and in our work as well is the fact that a symmetric function of depth-k decision trees can be simulated by a (different) symmetric function of width-k AND's, and likewise for a threshold function of depth-k decision trees. (See Fact 3.3 for a precise statement.) Consequently we can think of F simple as {SYM, THR} • DT k rather than {SYM, THR} • AND k (where DT k denotes the class of decision trees of depth k), and for depth reduction it suffices to prove that a family of s many AC 0 circuits collapses to a family of small-depth decision trees with high probability under a random restriction. This is exactly what is shown by switching lemmas.
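The simulation of decision trees by AND gates can be illustrated concretely. The sketch below assumes a hypothetical encoding of a decision tree as nested tuples `(variable, subtree_if_0, subtree_if_1)` with integer leaves; it lists the root-to-1-leaf paths as AND terms over literals and compares the number of trees that output 1 with the number of AND terms that fire. The three example trees are made up for illustration.

```python
from itertools import product

def one_paths(tree, path=()):
    """List the root-to-leaf paths ending at a 1-leaf. Each path is a tuple of
    (variable index, required bit) pairs, i.e. one AND of at most depth-many literals."""
    if tree in (0, 1):
        return [path] if tree == 1 else []
    var, low, high = tree
    return one_paths(low, path + ((var, 0),)) + one_paths(high, path + ((var, 1),))

def evaluate(tree, x):
    """Evaluate a decision tree on input x by following the queried bits."""
    while tree not in (0, 1):
        var, low, high = tree
        tree = high if x[var] else low
    return tree

def term_true(term, x):
    """An AND term fires iff every literal on the path is satisfied."""
    return all(x[var] == bit for var, bit in term)

# Hypothetical depth-2 decision trees over 3 variables.
trees = [
    (0, (1, 0, 1), (2, 1, 0)),
    (1, 0, (2, 1, 1)),
    (2, (0, 1, 0), 1),
]
terms = [one_paths(t) for t in trees]

def weight_via_trees(x):
    return sum(evaluate(t, x) for t in trees)

def weight_via_ands(x):
    return sum(term_true(term, x) for ts in terms for term in ts)
```

Because the paths of a single tree are mutually exclusive, the two counts agree on every input, so a symmetric gate reading the AND terms sees the same Hamming weight as one reading the trees. For a threshold gate the analogous step would give each AND term the weight of its tree.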
The loss in the previous works of [Vio07, LS11] is due to the switching lemmas they use and the limitations of these switching lemmas. [Vio07] uses the standard [Hås86] switching lemma:
Theorem 5 (Håstad's switching lemma). Let F be computed by a depth-2 circuit with bottom fan-in w. Then
$$\Pr_{\rho \leftarrow R_p}[F \restriction \rho \text{ is not computed by a depth-}t\text{ decision tree}] \le (5pw)^t.$$
This failure probability of $(5pw)^t$ cannot be made exponentially small in our setting: since correlation bounds strong enough to be useful for the [NW94] framework are not known for SYM • $\mathrm{AND}_{\omega(\log n)}$ (see "Open Question 1" of [Vio09a]), the value of t has to be taken to be at most k = O(log n), and moreover p certainly has to be ≫ 1/n (since taking p = 1/n would leave only a constant number of coordinates alive, and H ↾ ρ would not "retain structure" in the sense of containing a copy of $\mathrm{GIP}_{m/2,k+1}$). Indeed, [Vio07] applies Theorem 5 with $p = n^{-\Theta(1)}$ in order to make the failure probability as small as $n^{-\Omega(\log n)}$, and this is why [Vio07] only achieves quasi-polynomial correlation bounds $n^{-\Omega(\log n)}$.
Faced with this obstacle, instead of using the standard [Hås86] switching lemma, [LS11] reverts to the earlier "multi-switching lemma" of [Ajt83] which applies to a collection of depth-2 circuits rather than a single such circuit. The [Ajt83] multi-switching lemma, stated below, does achieve exponentially small failure probability, but is only able to handle collections of n O(log log n) many k-DNFs, for k = O(log log n). Recall that a restriction tree T is like a decision tree except that leaves do not have labels associated with them (so each root-to-leaf path is a restriction). The distribution µ T corresponds to the distribution over restrictions obtained by making a random walk from the root of T .
Theorem 6 (Ajtai's switching lemma [Ajt83] ). Let F = {F 1 , . . . , F s } be a family of s many DNFs over x 1 , . . . , x n , each of width k. For any t ≥ 1, there is a restriction tree T of height at most nk(log s)/(log n) t such that
Hence [LS11] achieves exponentially small correlation bounds (the main point of their paper), but only against circuits of size n O(log log n) .
The key new ingredient that we employ in this work is a recent powerful multi-switching lemma from [Hås14]. (We note that [IMP12] gives an essentially equivalent multi-switching lemma which we could also use.) Roughly speaking, the [Hås14] multi-switching lemma, whose precise statement we defer to Section 3 as it is somewhat involved, achieves an exponentially small failure probability (like Ajtai's multi-switching lemma) while yielding a significantly more drastic simplification than Ajtai's multi-switching lemma (recall the doubly-exponential-in-k dependence on the junta size in Theorem 6). This quantitative improvement in depth reduction translates into our stronger correlation bounds.
Relation to [ST18]
We close this introduction by discussing the connection between this paper and recent concurrent work of the authors [ST18]. The high-level approaches of the two papers are fairly different: unlike the current paper, [ST18] does not use the Nisan-Wigderson hardness-versus-randomness paradigm (and does not establish any new correlation bounds); instead it establishes a derandomized version of the [Hås14] multi-switching lemma and combines this with other ingredients to obtain its final PRG in a manner reminiscent of [AW85, TX13].
The results of the two papers are also incomparable (briefly, [ST18] obtains significantly shorter seed length for significantly more restricted classes of functions). The first main result of [ST18] is an ε-PRG for the class of size-S depth-d $\mathrm{AC}^0$ circuits with seed length $\log(S)^{d+O(1)} \cdot \log(1/\varepsilon)$. This is incomparable to the most closely related result of the present paper (Corollary 1.1, which gives a $2^{O(\sqrt{\log S})} + \mathrm{polylog}(1/\varepsilon)$ seed length PRG for $\mathrm{AC}^0$ circuits augmented with polynomially many SYM or THR gates), since the [ST18] result gives a significantly better seed length but for the significantly more limited class of "un-augmented" constant-depth circuits (indeed, the [ST18] result does not apply to $\mathrm{AC}^0$ circuits augmented even with a single SYM or THR gate). The second main result of [ST18] is an ε-PRG for the class of S-sparse $\mathbb{F}_2$ polynomials with seed length $2^{O(\sqrt{\log S})} \cdot \log(1/\varepsilon)$. Here too the seed length of [ST18] is shorter than that of the current paper (giving the optimal log(1/ε) dependence on ε as opposed to the $(\log(1/\varepsilon))^{4.01}$ of the current paper), but the result of [ST18] only holds for S-sparse $\mathbb{F}_2$ polynomials, which are a very restricted case of the {SYM, THR} • $\mathrm{AC}^0_d$ circuits which are handled in the current paper.
Preliminaries
We use bold font like x, ρ, etc. to denote random variables.
We write "size-S AC 0 d " to denote the class of circuits of depth d consisting of at most S unbounded fan-in AND/OR gates with variables and negated variables as the inputs (we include these literals in the gate count).
Pseudorandomness.
Equivalently, we say that Gen D is a δ-PRG for C with seed length r.
Restrictions. A restriction ρ of variables $x_1, \dots, x_n$ is an element of $\{0,1,*\}^n$. Given a function $f(x_1, \dots, x_n)$ and a restriction ρ, we write f ↾ ρ to denote the function obtained by fixing $x_i$ to ρ(i) if ρ(i) ∈ {0, 1} and leaving $x_i$ unset if ρ(i) = *. For two restrictions $\rho, \rho' \in \{0,1,*\}^n$, their composition, denoted $\rho\rho' \in \{0,1,*\}^n$, is the restriction defined by
$$(\rho\rho')(i) = \begin{cases} \rho(i) & \text{if } \rho(i) \in \{0,1\}, \\ \rho'(i) & \text{otherwise.} \end{cases}$$
We write $R_p$ to denote the standard distribution over random restrictions with *-probability p, i.e. ρ drawn from $R_p$ is a random string in $\{0,1,*\}^n$ obtained by independently setting each coordinate to * with probability p and to each of 0, 1 with probability $\frac{1-p}{2}$.
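These definitions translate directly into code. The following is a minimal sketch in which a restriction is a Python list over `0`, `1`, and the marker `"*"`; the helper names are hypothetical, and the composition rule assumes the usual convention that the first restriction takes precedence on the coordinates it fixes.

```python
import random

STAR = "*"

def apply_restriction(f, rho):
    """Return (g, live), where live lists the starred coordinates of rho and
    g(y) evaluates f after fixing the other coordinates as rho prescribes."""
    live = [i for i, v in enumerate(rho) if v == STAR]
    def restricted(y):
        x = list(rho)
        for i, bit in zip(live, y):
            x[i] = bit
        return f(x)
    return restricted, live

def compose(rho, rho_prime):
    """Composition: keep rho where it is fixed, otherwise defer to rho_prime."""
    return [rp if r == STAR else r for r, rp in zip(rho, rho_prime)]

def sample_R_p(n, p, rng):
    """Draw a restriction: each coordinate is * with probability p, else a uniform bit."""
    return [STAR if rng.random() < p else rng.randrange(2) for _ in range(n)]

# Example: restrict parity on 4 variables.
parity = lambda x: sum(x) % 2
rho = [1, STAR, STAR, 0]
g, live = apply_restriction(parity, rho)
```

Here `g` is a function of the two live variables (coordinates 1 and 2), and it equals their parity with the output flipped, since the fixed coordinates contribute a single 1.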
Multiparty communication complexity
We recall a celebrated lower bound of Babai, Nisan, and Szegedy [BNS92] on the multi-party "number on forehead" (NOF) communication complexity of the generalized inner product function:
$$\mathrm{GIP}_{m,k+1}(x) = \bigoplus_{i=1}^{m} \bigwedge_{j=1}^{k+1} x_{i,j}.$$
Theorem 7 ([BNS92]). There is a partition of the $m \cdot (k+1)$ inputs of $\mathrm{GIP}_{m,k+1}$ into k+1 blocks such that the following holds: Let P be a (k+1)-party randomized NOF communication protocol exchanging at most $\frac{1}{10}\left(m/4^{k+1} - \log(1/\gamma_{\mathrm{comm}})\right)$ bits of communication and computing a Boolean function f with error $\gamma_{\mathrm{err}}$ (meaning that on every input x the protocol outputs the correct value f(x) with probability at least $1 - \gamma_{\mathrm{err}}$). Then
The connection between SYM • AND k circuits and (k + 1)-party communication complexity is due to the following simple but influential observation of Håstad and Goldmann:
Fact 2.1 ([HG91]). Let f : {0, 1} n → {0, 1} be a Boolean function computed by a size-s SYM•AND k circuit. Then for any partition of the n inputs of f into k + 1 blocks, there is a deterministic NOF (k + 1)-party communication protocol that computes f using O(k log s) bits of communication.
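The idea behind this protocol can be simulated directly. In the sketch below (an illustration under assumed conventions, not the authors' presentation), party j sees every block except block j; each AND gate of fan-in at most k misses at least one of the k+1 blocks, so it is assigned to a party who sees all of its inputs, and each party announces how many of its gates evaluate to 1. The circuit, the partition `block_of`, and the top gate `sym` are hypothetical examples.

```python
import math
from itertools import product

def nof_protocol(and_gates, sym, block_of, num_blocks, x):
    """Simulate the counting protocol; return (output, bits communicated)."""
    assigned = [[] for _ in range(num_blocks)]
    for gate in and_gates:
        touched = {block_of[i] for i in gate}
        # Fan-in <= k < num_blocks, so some block is untouched by this gate.
        free = next(b for b in range(num_blocks) if b not in touched)
        assigned[free].append(gate)
    # Each party sends a count in {0, ..., s}, which fits in ceil(log2(s + 1)) bits.
    bits_per_message = math.ceil(math.log2(len(and_gates) + 1))
    total = 0
    for party, gates in enumerate(assigned):
        # Party `party` does not see block `party`, and none of its gates read it.
        assert all(block_of[i] != party for g in gates for i in g)
        total += sum(all(x[i] for i in g) for g in gates)
    return sym(total), num_blocks * bits_per_message

# Hypothetical instance: k = 2, so 3 parties; 6 inputs split into 3 blocks.
block_of = [0, 0, 1, 1, 2, 2]
gates = [[0, 2], [1, 4], [3, 5], [0, 1], [2], [4, 5]]
sym = lambda count: int(count % 3 == 1)
results = {x: nof_protocol(gates, sym, block_of, 3, x)
           for x in product((0, 1), repeat=6)}
```

The communication in this simulation is (k+1) messages of about log s bits each, in line with the O(k log s) bound of Fact 2.1. Negated literals are omitted here only to keep the sketch short.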
For THR • AC 0 circuits we use an analogous result from [Nis93] on the (k + 1)-party randomized γ-error communication complexity of THR • AND k circuits:
Theorem 8 ([Nis93]). Let $f : \{0,1\}^n \to \{0,1\}$ be a Boolean function computed by a THR • $\mathrm{AND}_k$ circuit. Then for any partition of the n inputs of f into k + 1 blocks, there is a randomized NOF (k + 1)-party communication protocol that computes f with error $\gamma_{\mathrm{err}}$ using $O(k^3 \log n \log(n/\gamma_{\mathrm{err}}))$ bits of communication.
Ingredient (1): Simplifying the approximator
The main result of this section is the following:
Lemma 3.1. Let F be any {SYM, THR} • $\mathrm{AC}^0_d$ circuit of size $s = n^{\tau \log n}$. There is a fair distribution R over restrictions $\rho \in \{0,1,*\}^n$ such that the following holds: With probability $1 - \gamma_{\mathrm{SL}} = 1 - \exp(-\Omega_d(\sqrt{n/\log n}))$ over the draw of ρ ← R, it is the case that F ↾ ρ belongs to the class $\mathcal{F}_{\mathrm{simple}}$ = {SYM, THR} • $\mathrm{AND}_k$, where $k := 0.0005 \log m$.
The recent "multi-switching lemma" of [Hås14] is the main technical tool we use to establish Lemma 3.1. To state the [Hås14] lemma we need some terminology. Let $\mathcal{G}$ be a family of Boolean functions. A restriction tree T is said to be a common ℓ-partial restriction tree (RT) for $\mathcal{G}$ if every $g \in \mathcal{G}$ can be expressed as T with depth-ℓ decision trees hanging off its leaves. (Equivalently, for every $g \in \mathcal{G}$ and root-to-leaf path π in T, we have that g ↾ π is computed by a depth-ℓ decision tree.)

Theorem 9 ([Hås14] multi-switching lemma). Let $\mathcal{F} = \{F_1, \dots, F_s\}$ be a collection of depth-2 circuits with bottom fan-in w. Then for any t ≥ 1,
Theorem 9 is the main tool we use to simplify any {SYM, THR} • AC 0 circuit down to an F simple -circuit. Conceptually, we think of this transformation as being done in three steps:
1. (Main step) Apply a random restriction ρ ′ ← R p to convert a {SYM, THR} • AC 0 circuit into a decision tree with a {SYM, THR} • DT circuit at each leaf.
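2. Observing that {SYM, THR} • DT ≡ {SYM, THR} • AND, this is equivalent to a decision tree with a {SYM, THR} • AND circuit at each leaf.
3. Trim the bottom fan-in of the {SYM, THR} • AND circuit at each leaf down to k by restricting additional variables.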
In the rest of this section we describe each of these steps in detail and thereby prove Lemma 3.1.
First (main) step. If g is a Boolean function and C is a class of circuits, we say that g is computed by a (d, C)-decision tree if g is computed by a decision tree of depth d (with a single Boolean variable at each internal node as usual) in which each leaf is labeled by a function from C. We require the following corollary of Theorem 9:
Corollary 3.2. Let G be any Boolean function and let $\mathsf{G}$ be a gate computing G, and let F be a $\mathsf{G}$ • $\mathrm{AC}^0_d$ circuit of size s. Then for $p = \frac{1}{48}(48 \log s)^{-(d-1)}$ and any t ≥ 1,
Proof. We may assume without loss of generality that the depth-(d+1) circuit F is layered, meaning that for any gate g it contains, every directed path from an input variable to g has the same length (converting an unlayered circuit to a layered one increases its size only by a factor of d, which is negligible for our purposes). Let s i denote the number of gates in layer i (at distance i from the inputs), so
We begin by trimming the bottom fan-in of F : applying Theorem 9 with F being the s 1 many bottom layer gates of F (viewed as depth-2 circuits of bottom fan-in w = 1) and p 0 := 1/48, we get that
Let F (0) be any good outcome of the above, a (t, G • AC 0 (depth d, bottom fan-in log s))-decision tree. Note that there are at most 2 t many AC 0 (depth d, fan-in log s) circuits at the leaves of the depth-t decision tree. Applying Theorem 9 to each of them with p 1 := 1/(48 log s) (and the 't' of Theorem 9 being 2t) and taking a union bound over all 2 t many of them, we get that
Repeat with $p_2 = \dots = p_{d-1} := 1/(48 \log s)$, each time invoking Theorem 9 with its 't' being one more than the current depth of the decision tree. The claim then follows by summing the $s_1 2^{-t}, s_2 2^{-t}, \dots, s_d 2^{-t}$ failure probabilities over all d stages and the fact that
Second step: From {SYM, THR} • DT to {SYM, THR} • AND. We recall the following fact from [HM04] : Fact 3.3. Every SYM s •DT log s function (resp. THR s •DT log s ) can be computed by a SYM s 2 •AND log s (resp. THR s 2 • AND log s ) circuit.
(This is an easy consequence of the fact that any decision tree may be viewed as a DNF whose terms correspond to the paths to 1-leaves, and that this DNF has the property that any input assignment makes at most one term true.) Applying Fact 3.3 and choosing $t = m/2^{d+1}$ in Corollary 3.2 (where $m = \Theta(\sqrt{n/\log n})$ will be defined precisely in the next section), we get the following special case of Corollary 3.2:
(1)
Third step: Trimming to reduce bottom fan-in. The {SYM, THR} • AND circuits hanging off the leaves of our decision tree have bottom fan-in at most log s, but we will need them to have fan-in at most k in order to invoke the [BNS92] lower bound later. At each leaf ℓ we achieve this smaller fan-in by identifying a set (call it S ℓ ) of additional variables and restricting them in all possible ways; we argue that every fixing of the variables in S ℓ gives the desired upper bound of k on the bottom-AND fan-in. We use a probabilistic argument to establish the existence of the desired set S ℓ (this is important because in the next section we will need each S ℓ to satisfy an additional property, and the probabilistic argument makes it easy to achieve this). Let us write "L ⊆ q X" to indicate that L is a subset of X that is randomly chosen by independently including each element of X with probability q. We will use the following easy result: 
Recall that s = n τ log n where τ > 0 is a small absolute constant to be specified later and that k = 0.0005 log n. We set
where the last inequality holds for a suitably small choice of the constant τ . Observe that q is chosen so as to ensure
Fix T to be an (m/2, {SYM s 2 , THR s 2 } • AND log s )-decision tree as given by Corollary 3.4. At
is the subset of variables that are fixed on the root-to-ℓ path in T . By Fact 3.5 and (2), at each leaf ℓ it is the case that with probability at least 1 − 1/s over the random draw of L(ℓ), every extension of the root-to-ℓ path in T that additionally fixes all the variables in S ℓ collapses the {SYM, THR} • AND log s circuit that was at ℓ in T down to a {SYM, THR} • AND k circuit. We say that such an outcome of L(ℓ) is a good outcome (we will refer back to this notion in the next section).
In summary, the above discussion establishes Lemma 3.1, where the fair distribution R corresponds to (a) first drawing ρ ′ ← R p , (b) then walking down a random root-to-leaf path π in the resulting depth-(m/2) decision tree given by Corollary 3.4, (c) and then finally, at the resulting leaf ℓ, choosing a random assignment to the variables in the set S ℓ that corresponds to L(ℓ), where L(ℓ) is a good outcome of the random variable
(Note that the randomness over L(ℓ) is not part of the random draw of ρ ← R; all we require is the existence of a good L(ℓ).)
Based on our discussion thus far each L(ℓ) may be fixed to be any good outcome of L(ℓ); we will give an additional stipulation on L(ℓ) in Remark 10.
4 Ingredient (2) (target retains structure): GIP • PAR under random restrictions
Our hard function will be the generalized inner product function composed with parity:
$$\mathrm{RW}_{m,k,r}(x) = \bigoplus_{i=1}^{m} \bigwedge_{j=1}^{k+1} \bigoplus_{t=1}^{r} x_{i,j,t}.$$
This function was introduced by Razborov and Wigderson [RW93] to show $n^{\Omega(\log n)}$ lower bounds against depth-3 threshold circuits with AND gates at the bottom layer. We will set $m = r = \sqrt{n/(k+1)}$ (recall that k = 0.0005 log m).
Note that $m = r = \Theta(\sqrt{n/\log n})$ and $k = \Theta(\log n)$. Given parameters m′, k′, r′, we say that a function $g : \{0,1\}^n \to \{0,1\}$ contains a perfect copy of $\mathrm{RW}_{m',k',r'}$ if there is a restriction κ such that, up to a renaming of variables,
$$g \restriction \kappa = b \oplus \bigoplus_{i=1}^{m'} \bigwedge_{j=1}^{k'+1} \Big( b_{i,j} \oplus \bigoplus_{t=1}^{r'} x_{i,j,t} \Big)$$
for some bits $b, b_{i,j}$. Roughly speaking, the motivation behind augmenting GIP with a layer of parities is to ensure that RW is resilient to random restrictions (i.e. that RW ↾ ρ "remains complex", containing a copy of GIP with high probability after a suitable random restriction). In our setting we need that RW is resilient to a random restriction ρ ← R for the fair distribution R from Lemma 3.1; we establish this in the rest of this section.
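The resilience of the parity layer can be seen in a small computation. The sketch below assumes the standard reading of the hard function as a parity of ANDs of parities, with hypothetical helper names and a made-up restriction: as long as one variable in each bottom parity block stays alive, the restricted function is GIP on the live variables, up to the offset bits contributed by the fixed variables.

```python
from itertools import product

def gip(x):
    """x[i][j] is a bit; return the parity over i of the AND over j."""
    return sum(all(row) for row in x) % 2

def rw(x):
    """x[i][j] is a list of r bits; apply GIP to the parities of the blocks."""
    return gip([[sum(block) % 2 for block in row] for row in x])

def restricted_rw(fixed, y):
    """Evaluate RW when block (i, j) has one live variable y[i][j] and its
    remaining variables are fixed to the bits listed in fixed[i][j]."""
    x = [[[y[i][j]] + fixed[i][j] for j in range(len(fixed[i]))]
         for i in range(len(fixed))]
    return rw(x)

# Hypothetical restriction with m = 2, k + 1 = 2, r = 3: two of the three
# variables in every parity block are fixed, one stays alive.
fixed = [[[1, 0], [1, 1]],
         [[0, 0], [0, 1]]]
offsets = [[sum(block) % 2 for block in row] for row in fixed]

def shifted_gip(y):
    """GIP on the live variables, each XORed with its block's offset bit."""
    return gip([[y[i][j] ^ offsets[i][j] for j in range(2)] for i in range(2)])
```

On all 16 settings of the live variables `restricted_rw(fixed, y)` agrees with `shifted_gip(y)`, which is the sense in which a restriction that spares one variable per block leaves a copy of GIP behind.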
Proposition 4.1. Consider the space of formal variables of RW m,k,r : {0, 1} n → {0, 1}:
Proof. This follows directly from a standard multiplicative Chernoff bound and a union bound over
Recall that a random restriction ρ ′ ← R p can be thought of as being sampled by first drawing K ⊆ p X and setting ρ i to * for each i ∈ K, and then setting the coordinates of ρ ′ in X \K according to a uniform random draw from {0, 1} X\K . Proposition 4.1 and the definition of RW thus yield the following:
where r ′ = pr 2
with failure probability at most
Note that
where the inequality uses the fact that d is a constant and the fact that s = n O(log n) ; we will use this later. Corollary 4.2 states that with very high probability over ρ ′ ← R p , the function RW m,k,r ↾ ρ ′ "does not simplify too much"; however we need RW m,k,r to "not simplify too much" under a full random restriction drawn from R (recall the discussion at the end of Section 3). We proceed to establish this.
Fix any outcome ρ ′ of ρ ′ ← R p such that (i) the conclusion of Corollary 3.4 holds (i.e. F ↾ ρ ′ is computed by a (m/2, {SYM s 2 , THR s 2 } • DT log s )-decision tree, which we call T ), and (ii) the conclusion of Corollary 4.2 holds (i.e. RW m,k,r ↾ ρ ′ contains a perfect copy of RW m,k,r ′ ). (A random ρ ′ ← R p is such an outcome with probability at least 1 − γ SL − γ target .) For ease of notation let us write RW ′ to denote RW m,k,r ↾ ρ ′ . Fix any path π that reaches a leaf ℓ in T. (Note that a random choice of such a path corresponds to part (b) in the random draw of ρ ← R, recalling the discussion at the end of Section 3.) Since |π| ≤ m/2, we have that the set A ℓ := {i ∈ [m] : π i,j,t = * for all j ∈ [k + 1] and all t} has cardinality at least m − |π| ≥ m/2. In words, at least m/2 of the m many depth-2 subcircuits of RW ′ are completely "untouched" by π. For part (c) of the draw from R, recall that the set L(ℓ) could be taken to be any good outcome of L(ℓ), and that a random L(ℓ) ⊆ q [n] is good with probability at least 1 − 1/s. By the same Chernoff bound argument as the one in Proposition 4.1, we have that
recalling that |A ℓ | ≤ m, k = Θ(log n), q ≥ n −0.01 and r ′ > n 0.49 . Since 1−1/s+1−exp(−Ω(n 0.48 )) > 1, there must exist a good outcome L(ℓ) of L(ℓ) such that for the corresponding S ℓ , every restriction ρ trim fixing precisely the variables in S ℓ is such that RW ′ ↾ πρ trim contains a perfect copy of
Having r ′′ ≥ 1 is crucial for us because, together with |A ℓ | ≥ m/2, it means that RW m,k,r ′′ contains a perfect copy of GIP m/2,k+1 (i.e. by possibly restricting and renaming some variables of RW m,k,r ′′ and possibly negating the result, we obtain a function identical to GIP m/2,k+1 ).
Remark 10. We refine the definition of R to require that in (c) it use an L(ℓ) as specified above at each leaf ℓ.
Summarizing, the above discussion establishes that RW m,k,r "retains structure" with high probability under a random ρ ← R. The formal statement of this result (incorporating also Lemma 3.1) is as follows:
Lemma 4.3. Let F be any {SYM, THR} • AC 0 d circuit of size s = n τ log n . The fair distribution R over restrictions ρ ∈ {0, 1, * } n from Lemma 3.1 satisfies the following: With probability 1 − γ SL − γ target over a draw of ρ ← R, both of the following hold:
(ii) RW m,k,r ↾ ρ contains a perfect copy of GIP m/2,k+1 .
5 Bounding the correlation between the approximator and target post-restriction
With Lemma 4.3 in hand it is a simple matter to finish the argument. Fix any outcome ρ of ρ ← R such that F ↾ ρ and RW m,k,r ↾ ρ satisfy (i) and (ii) of Lemma 4.3. Applying either Fact 2.1 or Theorem 8 (depending on whether the top gate of F is SYM or THR), along with the lower bound of Theorem 7, we get that
where
This gives ingredient (3) as described in Section 1.4. Recalling the discussion at the start of Section 1.4, Theorem 3 follows from Lemma 4.3 and (5).
A Applying the [NW94] paradigm to obtain pseudorandom generators from correlation bounds
A function f is said to be (s, τ)-hard for a circuit class C if every circuit C ∈ C of size at most s has Pr_x[f(x) = C(x)] ≤ 1/2 + τ, where x is a uniform random input string. If this holds then we say that f gives a correlation bound of τ against C-circuits of size s.
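For very small input lengths the hardness condition can be evaluated exhaustively. The following is a minimal sketch, not part of the paper's argument: the function names are hypothetical, the "circuit class" is just an explicit finite list of Python callables, and circuit size is ignored.

```python
from itertools import product

def agreement(f, c, r):
    """Fraction of inputs x in {0,1}^r on which f(x) == c(x)."""
    agree = sum(1 for x in product([0, 1], repeat=r) if f(x) == c(x))
    return agree / 2 ** r

def is_tau_hard_against(f, candidates, r, tau):
    """True if every candidate agrees with f on at most a 1/2 + tau fraction of inputs."""
    return all(agreement(f, c, r) <= 0.5 + tau for c in candidates)

def parity(x):
    """XOR of all input bits."""
    return sum(x) % 2

# Toy candidate class (an assumption for illustration): constants and single variables.
toy_class = [lambda x: 0, lambda x: 1,
             lambda x: x[0], lambda x: x[1], lambda x: x[2]]
```

With these definitions, `is_tau_hard_against(parity, toy_class, 3, 0.0)` holds, since 3-bit parity agrees with each constant and each single variable on exactly half of the inputs.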
Given a quadruple (m, r, ℓ, s) of non-negative integers, a family F = {T_1, . . . , T_s} of r-element subsets of [m] is said to be an (m, r, ℓ, s)-design if for any two distinct subsets T_i, T_j ∈ F we have |T_i ∩ T_j| ≤ ℓ.
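As a concrete illustration, the design property can be verified directly on a small family. The sketch below assumes the standard Nisan-Wigderson condition that any two distinct sets intersect in at most ℓ elements. The checker name and the choice of the Fano plane as an example are ours, with ground set {0, ..., m − 1} in place of [m].

```python
from itertools import combinations

def is_design(family, m, r, ell):
    """Check that `family` consists of r-element subsets of {0, ..., m-1}
    whose pairwise intersections have size at most ell (assumed condition)."""
    sets = [frozenset(t) for t in family]
    universe = set(range(m))
    if any(len(t) != r or not t <= universe for t in sets):
        return False
    return all(len(a & b) <= ell for a, b in combinations(sets, 2))

# The seven lines of the Fano plane: 3-element subsets of a 7-element ground
# set, any two of which meet in exactly one point.
FANO_LINES = [
    {0, 1, 2}, {0, 3, 4}, {0, 5, 6},
    {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5},
]
```

Under this reading, `FANO_LINES` is a (7, 3, 1, 7)-design, while it fails the check with ℓ = 0 because the lines are not pairwise disjoint.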
An ANY t gate is a gate that takes in t inputs and computes an arbitrary function from {0, 1} t to {0, 1}.
We recall the Nisan-Wigderson [NW94] translation from correlation bounds to PRGs:
Theorem 11 (The Nisan-Wigderson generator). Fix a circuit class C and let m, r, ℓ, s ∈ ℕ be positive parameters with m ≥ r ≥ ℓ. Given an explicit f : {0,1}^r → {0,1} that is (s · 2^ℓ, ε/s)-hard for C • ANY_{log ℓ} and an explicit (m, r, ℓ, s)-design, there is an explicit PRG G : {0,1}^m → {0,1}^s that ε-fools size-s circuits in C. (Hence for s ≥ n, by taking the first n output bits of G there is an explicit PRG mapping {0,1}^m to {0,1}^n that ε-fools size-s n-variable circuits in C.)
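The generator in this theorem has a simple shape in the standard [NW94] construction: the i-th output bit is f applied to the seed bits indexed by the i-th design set. The sketch below shows only that shape. The design (the Fano plane) and the stand-in function (3-bit parity, which is of course not hard for the classes considered here) are placeholders chosen for illustration, and the function names are hypothetical.

```python
def nw_generator(seed, design, f):
    """Map an m-bit seed to len(design) output bits.

    Output bit i is f evaluated on the seed restricted to the coordinates
    in the i-th design set, read in increasing index order.
    """
    return [f([seed[j] for j in sorted(t)]) for t in design]

def parity(bits):
    """Placeholder for the hard function f; NOT actually hard."""
    return sum(bits) % 2

# Placeholder design: the lines of the Fano plane, a family of seven
# 3-element subsets of {0, ..., 6} with pairwise intersections of size 1.
FANO_LINES = [
    {0, 1, 2}, {0, 3, 4}, {0, 5, 6},
    {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5},
]

output = nw_generator([1, 0, 0, 0, 0, 0, 0], FANO_LINES, parity)
# Only the three lines through point 0 see the single 1 in the seed.
# output == [1, 1, 1, 0, 0, 0, 0]
```

In this toy instance the generator does not stretch the seed (m = s = 7). The stretch in the theorem comes from designs with many more sets than seed bits.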
The existence of explicit designs is well known, in particular we recall the following:
Lemma A.1 (Problem 3.2 of [Vad12]). There is a deterministic algorithm which, for any r, s ∈ ℕ,
It is straightforward to verify that s · 2^ℓ ≤ r^{c_d log log r} and ε/s ≥ exp(−r^{1−o(1)}). By Lemma A.1 there is an explicit (m, r, ℓ, s)-design with m = O(r^2/ℓ) = 2^{O(log s/ log log s)} · (log(1/ε))^{2+o(1)}, so applying the Nisan-Wigderson generator, we get that for s ≥ n, there is an explicit PRG G : {0,1}^m → {0,1}^n with seed length m = 2^{O(log s/ log log s)} + (log(1/ε))^{2+o(1)} that ε-fools n-variable size-s circuits in
[LS11] show that the same function f is (r^{c_d log log r}, exp(−r^{1/2−o(1)}))-hard for THR • AC^0_d; a similar analysis to the above gives an explicit PRG G : {0,1}^m → {0,1}^n with seed length m = 2^{O(log s/ log log s)} + (log(1/ε))^{4+o(1)} that ε-fools n-variable size-s circuits in THR • AC^0_d.
A PRG from our Theorem 3: Proof of Theorem 2. Theorem 3 gives an explicit f : {0,1}^r → {0,1} and τ > 0 such that for all d, f is (r^{τ log r}, exp(−r^{0.499}))-hard for {SYM, THR} • AC^0_d. This time we choose ℓ = log s, r = 2
We have s · 2^ℓ ≤ r^{τ log r} and ε/s ≥ exp(−r^{0.499}), so we get that for s ≥ n, there is an explicit
A PRG from our Theorem 4: Proof of Corollary 1.1. Finally, Theorem 4 gives an explicit f : {0,1}^r → {0,1} and τ > 0 such that for all d, f is (r^{τ log r}, exp(−r^{0.499}))-hard for the class of depth-d circuits over {0,1}^r that contain r^{0.249} many SYM or THR gates. We choose ℓ, r as above, so similar to the above, we get that there is an explicit PRG G : {0,1}^m → {0,1}^n with seed length m = O(r^2/ℓ) = 2^{O(√(log s))} + (log(1/ε))^{4.01} that ε-fools n-variable size-s depth-d circuits with at most 2^{c√(log s)} many SYM or THR gates.
B Proof of Theorem 4: Handling multiple SYM and THR gates
We prove Theorem 4 via a slight variant of Theorem 3 and an argument from [LS11] (a related argument appears in a somewhat different form in [Vio07] ). The variant of Theorem 3, stated as Theorem 12 below, is proved by combining ingredients (1), (2) and (3) as in Section 1.4, but now with the aim of proving a correlation bound against ANY u •{SYM, THR}•AC 0 d circuits rather than {SYM, THR} • AC 0 d circuits (where here and throughout this appendix we take u := n 0.249 ). As we describe at the end of this section, once this correlation bound against ANY u • {SYM, THR} • AC 0 d is in place, the extension to circuits with n 0.249 many SYM or THR gates directly follows using an argument from [LS11] .
In more detail we have: (throughout the following the values of m, k, r are as they were before)
Lemma B.1 (Lemma 3.1 analogue). Fix u := n 0.249 and let F be an ANY u • {SYM, THR} • AC 0 d circuit where each of the u {SYM, THR}•AC 0 d subcircuits of F has size at most s = n τ log n . There is a fair distribution R over restrictions ρ ∈ {0, 1, * } n such that the following holds: With probability 1 − γ SL = 1 − exp(−Ω d ( n/ log n)) over the draw of ρ ← R, it is the case that F ↾ ρ belongs to the class F simple, u := ANY u • {SYM, THR} • AND k .
The proof is almost identical to that of Lemma 3.1, with ANY_u • {SYM, THR} taking the place of {SYM, THR} throughout the argument. Now in Corollary 3.2 the gate G corresponds to ANY_u • {SYM, THR} (rather than to just {SYM, THR} as earlier) and the total circuit size of F is us rather than s (leading to us · 2^{−t} rather than s · 2^{−t} on the RHS of the Corollary 3.2 bound), but this is swallowed up by the slack in the inequalities leading to (1).
Lemma B.2 (Lemma 4.3 analogue). Fix u := n^{0.249} and let F be an ANY_u • {SYM, THR} • AC^0_d circuit where each {SYM, THR} • AC^0_d subcircuit has size at most s = n^{τ log n}. The fair distribution R over restrictions ρ ∈ {0, 1, *}^n from Lemma B.1 satisfies the following: With probability 1 − γ_SL − γ_target over a draw of ρ ← R, both of the following hold:
The proof, using Lemmas B.1 and B.2, is virtually identical to the proof of Theorem 3 using Lemmas 3.1 and 4.3. The only difference is that we use the obvious extensions of Fact 2.1 and Theorem 8 to ANY_u • SYM • AND_k circuits and ANY_u • THR • AND_k circuits respectively; these extensions are stated for completeness below.
Fact B.3 (Fact 2.1 analogue). Let f : {0,1}^n → {0,1} be a Boolean function computed by a size-s ANY_u • SYM • AND_k circuit. Then for any partition of the n inputs of f into k + 1 blocks, there is a deterministic NOF (k + 1)-party communication protocol that computes f using u · O(k log s) bits of communication.
Theorem 13 (Theorem 8 analogue). Let f : {0,1}^n → {0,1} be a Boolean function computed by an ANY_u • THR • AND_k circuit. Then for any partition of the n inputs of f into k + 1 blocks, there is a randomized NOF (k + 1)-party communication protocol that computes f with error γ_err using u · O(k^3 log n log(n/γ_err)) bits of communication.
Finally, the correlation bound Theorem 4 follows from Theorem 12 exactly as Theorem 6 of [LS11] follows from Lemma 3 of that paper.
