We show that proving exponential lower bounds on depth four arithmetic circuits implies exponential lower bounds for unrestricted depth arithmetic circuits. In other words, for exponential sized circuits additional depth beyond four does not help.
Introduction
The permanent, by virtue of being complete for #P [22], occupies a central position in the study of the complexity of counting problems. Its illustrious sibling, the determinant, is comparatively easy, being complete for GapL, a complexity class housed within $NC^2$ [4, 19, 21, 25].
* This research was partially supported by J. C. Bose fellowship FLW/DST/CS/20060225.
The difference between the computational complexity of the permanent and the determinant has been among the most intriguing mathematical questions of our times. While we know the determinant is easy, it has been very difficult to prove any non-trivial lower bounds against the permanent.
In reality, a variety of lower bounds have been proved in restricted settings. Jerrum and Snir [8], with a more recent improvement by Raz and Yehudayoff [16], show that any monotone circuit computing the permanent requires exponential size. Nisan and Wigderson [14] show that any depth three circuit computing the $2d$-th symmetric polynomial requires size $\left(\frac{n}{4d}\right)^{\Omega(d)}$. Shpilka and Wigderson [18] show that any depth three circuit computing the permanent or the determinant over the rationals requires quadratic size. Grigoriev and Razborov [6, 7] show that any depth three arithmetic circuit over a finite field computing the permanent or the determinant requires exponential size. Raz [15] shows that every multilinear formula computing the permanent or the determinant requires superpolynomial size. All of these proofs already require mathematically intricate machinery.
Another path to potential lower bounds was discovered by Kabanets and Impagliazzo [9]. Ever since they showed a remarkable connection between efficient polynomial identity testing (PIT) and arithmetic circuit lower bounds, identity testing has invited closer scrutiny. For example, Kayal and Saxena [12], Saxena [17], and Karnin and Shpilka [11] show that certain restricted depth three circuits admit deterministic polynomial time identity testers. Also, Dvir, Shpilka, and Yehudayoff [5] show how the Kabanets and Impagliazzo results can be extended to bounded-depth circuits.
Interestingly, a number of the above results are restricted to depth three circuits. Why not four or five or six? The reason, and we show it in this paper, is that crossing the chasm at depth four is as hard as the general case!
Depth Four Chasm:
If a polynomial $P(x_1, \ldots, x_n)$ of degree $d$ with $d = O(n)$ can be computed by an arithmetic circuit of size $2^{o(d + d \log \frac{n}{d})}$, it can also be computed by a depth four arithmetic circuit of size $2^{o(d + d \log \frac{n}{d})}$ with multiplication gates of fanin $o(d)$.
Notice that the polynomial $P$ can trivially be computed by an arithmetic circuit of size $2^{O(d + d \log \frac{n}{d})}$ (and depth two). The main result has implications for identity testing as well.
Identity Testing Chasm:
If there is a complete black-box derandomization of Identity Testing for depth four circuits with multiplication gates of small fanin, then the general Identity Testing problem can be deterministically solved in $n^{O(\log n)}$ time.
We wish to point out that the Boolean setting behaves quite differently as compared to the arithmetic setting due to the existence of two additional axioms: (1) $fg + g = g$ and (2) $g^2 = g$. These axioms result in strong cancellative properties and they actually demonstrate that degree in a Boolean setting is not a "natural" primary resource in the sense that it can be traded for time. For example, insisting on polynomial size and polynomial degree circuits puts the languages in LOGCFL but if the constraint on degree is removed, one captures all of P. Intuitively, degree is not a very good resource measure as we may get small sized circuits for a language if we lift the constraint on its degree.
In contrast, higher degree terms in the arithmetic setting can never cancel lower degree ones, and it follows that a circuit computing a degree $d$ polynomial can be replaced by another circuit all of whose intermediate polynomials are of degree $\le d$ at the cost of a polynomial increase in size [23]. This is not to mean that depth reduction results such as ours are not possible in Boolean settings. Indeed, Valiant [24] shows how a monotone circuit of size $n$ and depth $O(\log n)$ can be converted to a depth three monotone circuit of size $2^{O(n / \log \log n)}$.
The Chasm at Depth Four
It is easy to see that depth four circuits are more powerful than depth three. For example, consider the problem of computing the determinant over a finite field $F$. We know, by [7], that depth three circuits computing the determinant over $F$ require exponential size. We now observe that the determinant over $F$ can be computed by depth four arithmetic circuits of size $2^{o(n)}$.
We will start with a problem well known to be computationally equivalent to the determinant: matrix powering [3]. Matrix powering is the problem of raising an $n \times n$ matrix to the $n$-th power, where each entry of the matrix is either $-1$, $0$, or $1$.
The proof is simple. We break the matrix chain of $n$ matrices into $\sqrt{n}$ equal sections. In each section, we can compute the $(i,j)$-th entry of the resulting matrix as a sum of products; each product being a multiplication of $\sqrt{n}$ entries. It is easy to see that the number of such products, and hence the fan-in into the plus gate, is bounded by $n^{\sqrt{n}}$.
At the end of this phase, we are left with $\sqrt{n}$ matrices; one for each section. The $(i,j)$-th entry of the resulting matrix can similarly be written as a sum of products. Again, a product would be $\sqrt{n}$ long and the sum would be over all possible $n^{\sqrt{n}}$ products.
Overall, this results in a depth 4 circuit of size $n^{O(\sqrt{n})}$ for matrix powering, and hence for the determinant.
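The two-phase construction above can be checked on a small instance. The following is a minimal sketch, assuming $n$ is a perfect square; the function names (`chain_entry`, `power_depth_four`, `power_direct`) are illustrative and not from the paper. Each call to `chain_entry` corresponds to one plus gate fed by product gates, so composing two such phases gives the sum-product-sum-product shape.

```python
from itertools import product
from math import isqrt


def chain_entry(mats, i, j):
    """Entry (i, j) of mats[0] * ... * mats[-1], written as one sum of products."""
    n = len(mats[0])
    total = 0
    # One product per choice of intermediate indices: n^(len(mats) - 1) terms.
    for mid in product(range(n), repeat=len(mats) - 1):
        path = (i, *mid, j)
        term = 1
        for k, m in enumerate(mats):
            term *= m[path[k]][path[k + 1]]
        total += term
    return total


def chain_product(mats):
    """All entries of the chain product, each as an explicit sum of products."""
    n = len(mats[0])
    return [[chain_entry(mats, i, j) for j in range(n)] for i in range(n)]


def power_depth_four(a):
    """a^n for an n x n matrix a; this sketch assumes n is a perfect square."""
    n = len(a)
    r = isqrt(n)
    assert r * r == n, "sketch assumes n is a perfect square"
    # Bottom two layers: each of the r sections multiplies r copies of a.
    sections = [chain_product([a] * r) for _ in range(r)]
    # Top two layers: multiply the r section matrices, again as sums of products.
    return chain_product(sections)


def power_direct(a, e):
    """Reference implementation: repeated matrix multiplication."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        result = [[sum(result[i][k] * a[k][j] for k in range(n))
                   for j in range(n)] for i in range(n)]
    return result
```

For a $4 \times 4$ matrix, each section multiplies two matrices, so every plus gate has $4^{1} = 4$ incoming products, well within the $n^{\sqrt{n}}$ bound.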
Theorem 2.1
The determinant of an $n \times n$ matrix with integer $n$-bit entries can be computed by depth 4 arithmetic circuits of size $n^{O(\sqrt{n})}$.
We now generalize the above observation to any arithmetic circuit of subexponential size. In this paper, we use subexponential size to mean circuits of size $2^{o(n)}$.
Let $P(x_1, \ldots, x_n)$ be a polynomial of total degree $d$. We restrict our attention to the case when $d = O(n)$.¹ $P$ can be written as a sum of at most $\binom{n+d}{d}$ products. Hence it can always be computed by a depth two circuit of size $2^{O(d + d \log \frac{n}{d})}$.
¹ If the degree is $\omega(n)$ then the bounds we get are weaker, and in any case the permanent has sublinear degree.
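The monomial count used here is easy to verify by enumeration. The sketch below (function name `count_monomials` is illustrative) counts monomials of total degree at most $d$ in $n$ variables and compares the result with $\binom{n+d}{d}$; the standard estimate $\binom{m}{k} \le (em/k)^k$ then gives the exponent $O(d + d \log \frac{n}{d})$.

```python
from itertools import combinations_with_replacement
from math import comb, e


def count_monomials(n, d):
    """Count monomials of total degree at most d in n variables by enumeration."""
    # Pad the n variables with one dummy symbol standing for the constant 1,
    # so each monomial of degree <= d is exactly one multiset of size d.
    symbols = range(n + 1)
    return sum(1 for _ in combinations_with_replacement(symbols, d))


def binomial_upper_bound(n, d):
    """The standard estimate C(n + d, d) <= (e (n + d) / d)^d."""
    return (e * (n + d) / d) ** d
```

For example, `count_monomials(5, 3)` and `comb(8, 3)` agree, and both lie below `binomial_upper_bound(5, 3)`.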
Lemma 2.2 For any $n$ and $k(n)$ such that $k = O(n)$, $\binom{n+k}{k} = 2^{O(k + k \log \frac{n}{k})}$.
Proof. Use Stirling's formula for factorial: $m! = \sqrt{2\pi m}\,(m/e)^m\,(1 + o(1))$.
Suppose $P$ is computed by an arithmetic circuit $C$ of size $M = 2^{o(d + d \log \frac{n}{d})}$. We can assume, without loss of generality, that all the intermediate polynomials computed inside the circuit $C$ have degree bounded by $d$ [23]. In [20, 2] it is shown that $C$ can be transformed to a circuit $D$ of degree $d$, size $M^{O(1)}$ and depth $O(\log d)$ with multiplication gates of fan-in two. We do a careful analysis of the transformation in [2] to obtain a circuit $D$ with more structural properties. In particular, we will be interested in getting good bounds on the degree of the gates.
The circuit $D$ we construct will be a strictly alternating circuit of size $S = M^{O(1)}$, where $M$ is the size of the original circuit. The addition gates of $D$ have unbounded fan-in, while multiplication gates of $D$ have fan-in bounded by 6.
The degree of polynomials computed at each gate satisfies the following properties:
• the output gate degree is d,
• degree of any child of an addition gate is the same as the degree of the gate,
• all children of a multiplication gate have degree at most half of the degree of the gate.
It follows that the depth of the circuit $D$ is at most $2 \log d$. We now indicate how to construct such a circuit.
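The depth bound follows from the three degree properties by a short calculation. As a sketch, write $\delta_j$ (a symbol introduced here for illustration) for the largest degree of any gate reached from the output after crossing $j$ multiplication layers, and assume that gates of degree below 1 are inputs:

```latex
\delta_0 = d, \qquad
\delta_j \le \tfrac{1}{2}\,\delta_{j-1}
\;\Longrightarrow\;
\delta_j \le \frac{d}{2^{j}} .
% Addition gates preserve degree and multiplication gates at least halve it.
% A gate of degree >= 1 below j multiplication layers forces d / 2^j >= 1,
% i.e. j <= \log_2 d. With strictly alternating layers, the total depth is
% therefore at most 2 \log_2 d.
```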
Construction
As a first step, we ensure that $C$ is a layered circuit with alternating levels of plus and mult gates. Also, we will ensure the fan-in at every multiplication gate is 2. Finally, we rearrange the children of each multiplication gate so that the degree of the left child is smaller than or equal to the degree of the right child.
A proof tree rooted at gate g of circuit C is a subcircuit obtained as follows:
• start with the subcircuit in C that has gate g at the top and computes the polynomial associated with gate g,
• for every addition gate in this subcircuit, retain only one input to the gate while deleting the remaining input lines,
• for any multiplication gate in the subcircuit, retain both the inputs to it.
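To make the definition concrete, here is a minimal sketch that enumerates the proof trees of a toy circuit. The encoding is an assumption of this sketch, not the paper's: gates are nested tuples `("var", name)`, `("add", c1, c2, ...)` and `("mul", left, right)`, so the circuit is a formula rather than a general DAG. Each proof tree yields one monomial, and the monomials sum to the polynomial computed at the gate.

```python
def proof_tree_monomials(g):
    """Yield one monomial (a sorted tuple of variable names) per proof tree at g."""
    kind = g[0]
    if kind == "var":
        yield (g[1],)
    elif kind == "add":
        # An addition gate keeps exactly one of its inputs.
        for child in g[1:]:
            yield from proof_tree_monomials(child)
    else:  # "mul": a multiplication gate keeps both of its inputs.
        for left in proof_tree_monomials(g[1]):
            for right in proof_tree_monomials(g[2]):
                yield tuple(sorted(left + right))


def evaluate(g, env):
    """Evaluate the toy circuit at the point given by env (name -> value)."""
    kind = g[0]
    if kind == "var":
        return env[g[1]]
    if kind == "add":
        return sum(evaluate(c, env) for c in g[1:])
    return evaluate(g[1], env) * evaluate(g[2], env)


def evaluate_monomials(monomials, env):
    """Evaluate a sum of monomials at the point given by env."""
    total = 0
    for mono in monomials:
        term = 1
        for v in mono:
            term *= env[v]
        total += term
    return total
```

For the gate computing $(x + y)(x + z)$, the four proof trees give the monomials $x^2$, $xz$, $xy$ and $yz$, and evaluating their sum agrees with evaluating the circuit directly.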
It is easy to see that a proof tree rooted at g computes a monomial of the polynomial computed at g and this polynomial equals the sum of all such monomials. For every gate g in C, define [g] to stand for the polynomial computed at gate g. For every pair of gates g and h in C, let [ 
It is easy to see that
Let g be a mult gate with children g L and g R as left and right children respectively. Then, if the right most path from g to h has only plus gates then [ 
Otherwise, for a fixed right most path from g to h, there must exist a unique intermediate mult gate, say p (with children p L and p R ) along the right most path connecting g and h such that
Of course, several right most paths could exist between g and h and we have no way of pinpointing only them. Therefore we sum over all possible gates p, satisfying the above condition.
Let us now analyze the three terms in the product.
Clearly
We ([g, h] ). To get around this, we apply the depth reduction algorithm once more to
for certain gates q, q L and q R (q L and q R are children of q and degree of q satisfies the bounds as above). By our analysis, for the troublesome left child we now h] ). Of course, the bound holds easily for the q and q R as well and therefore, we have:
where p, q satisfy the appropriate degree constraints. This completes the description of circuit D.
By introducing dummy plus gates in the circuit, we can ensure that plus and mult gates alternate in $D$. Thus we get a fan-in 6 multiplication circuit $D$ with depth at most $2 \log d$ (of which at most $\log d$ layers are of mult gates) and size $M^{O(1)}$. All the properties that we had listed of circuit $D$ are satisfied. Let $S$ be the size of the circuit $D$, $S = M^{O(1)}$.
We construct a depth 4 circuit $E$ from $D$. Choose any $\ell$ such that $\ell \le \frac{d + d \log \frac{n}{d}}{\log S}$ and $\ell = \omega(1)$. Set $t = \frac{1}{2} \log_6 \ell$. Cut the circuit $D$ in two halves: the top one has exactly $t$ mult layers with the last layer being of mult gates, and the rest of the layers belong to the bottom half. Let $g_1, \ldots, g_k$ ($k \le S$) be the output gates of the bottom half. We can view the top half as computing a polynomial in $k$ new variables, say, $y_1, \ldots, y_k$. Let this polynomial be $P_0(y_1, \ldots, y_k)$. Let the polynomial computed at the gate $g_i$ be $P_i(x_1, \ldots, x_n)$ for $1 \le i \le k$. The polynomial computed by the circuit $D$ equals $P_0(P_1(x_1, \ldots, x_n), \ldots, P_k(x_1, \ldots, x_n))$.
We now obtain an upper bound on the degrees of all these polynomials. Since the top half has exactly $t$ mult layers and each mult gate has fanin bounded by 6, the degree of $P_0$ is bounded by $6^t$. Since the degree goes down by at least a factor of two across mult layers, the degree of $P_i$ is bounded by $d / 2^t$.
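As a short worked calculation, assuming the parameter choice $t = \frac{1}{2} \log_6 \ell$ for a slowly growing $\ell = \omega(1)$, the two degree bounds evaluate as follows:

```latex
\deg P_0 \;\le\; 6^{t} \;=\; 6^{\frac{1}{2}\log_6 \ell} \;=\; \sqrt{\ell},
\qquad
\deg P_i \;\le\; \frac{d}{2^{t}}
        \;=\; \frac{d}{\ell^{\frac{1}{2}\log_6 2}}
        \;=\; o(d)
\quad (1 \le i \le k).
% The second identity uses 2^{\log_6 \ell} = \ell^{\log_6 2}; since \ell is
% unbounded, 2^t is unbounded as well, which is what makes d / 2^t = o(d).
```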
Express each $P_i$, $0 \le i \le k$, as a sum of products, thus each requiring a depth two circuit to compute. Patching together these circuits, we get a depth four circuit computing the polynomial computed by $D$. Let this circuit be $E$. Let us calculate the size of $E$.
Lemma 2.3 Polynomial $P_0$ can be written as a sum of at most $\binom{S + 6^t}{6^t}$ products, each of fanin $\le 6^t$. Polynomials $P_i$, $1 \le i \le k$, can be written as a sum of at most $\binom{n + d/2^t}{d/2^t}$ products, each of fanin $\le d/2^t$.
Proof. The number of monomials on $n$ variables and degree $k$ is $\binom{n+k}{k}$. The lemma now follows from the degree bound on each polynomial and the number of variables they are defined on.
Therefore, the size of circuit $E$ is bounded by $\binom{S + 6^t}{6^t} + S \cdot \binom{n + d/2^t}{d/2^t} = 2^{o(d + d \log \frac{n}{d})}$.
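The patching step can be illustrated on a toy instance. In the sketch below, a polynomial in sum-of-products form is a dictionary from exponent tuples to coefficients; this representation and the function names are assumptions made for illustration. Evaluating the inner polynomials and feeding their values to the outer one mirrors the four layers (sum, product, sum, product), and `monomial_bound` is the count from the proof above.

```python
from math import comb, prod


def eval_sum_of_products(poly, point):
    """Evaluate a polynomial given as {exponent tuple: coefficient} at a point."""
    return sum(c * prod(v ** e for v, e in zip(point, exps))
               for exps, c in poly.items())


def eval_depth_four(p0, inner, x):
    """Evaluate P0(P1(x), ..., Pk(x)) with every polynomial in sum-of-products form."""
    y = [eval_sum_of_products(p, x) for p in inner]  # bottom two layers
    return eval_sum_of_products(p0, y)               # top two layers


def monomial_bound(num_vars, degree):
    """Number of monomials of degree at most `degree` in `num_vars` variables."""
    return comb(num_vars + degree, degree)
```

With $P_1 = x_1 + x_2$, $P_2 = x_1 x_2$ and $P_0 = y_1^2 + y_2$, the composition is $(x_1 + x_2)^2 + x_1 x_2$; at the point $(2, 3)$ both the depth four evaluation and the direct formula give 31.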
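Therefore, we have the following theorem.
Theorem 2.4 Let $P(x_1, \ldots, x_n)$ be a polynomial of degree $d = O(n)$ over the field $F$. If there exists an arithmetic circuit of size $2^{o(d + d \log \frac{n}{d})}$ for $P$, then there exists a depth 4 arithmetic circuit of size $2^{o(d + d \log \frac{n}{d})}$ for $P$. Further, the fanin of second layer mult gates is bounded by $\ell(n)$, where $\ell$ is any sufficiently slowly growing function in $\omega(1)$, and the fanin of bottom layer mult gates is bounded by $o(d)$.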
For multilinear polynomials, we have the following corollary.
Corollary 2.5 Let $P(x_1, \ldots, x_n)$ be a multilinear polynomial over the field $F$. If there exists an arithmetic circuit of size $2^{o(n)}$ for $P$, then there exists a depth 4 arithmetic circuit of size $2^{o(n)}$ for $P$.
When the multilinear polynomial is specialized to the permanent we get the following.
Corollary 2.6 If every depth 4 arithmetic circuit for Permanent requires exponential size, then every arithmetic circuit for Permanent requires exponential size.
Black-box Derandomization of Identity Testing
An arithmetic circuit of size n is a low degree circuit if the polynomial computed by the circuit has degree ≤ n. Low degree Identity Testing is the problem of testing if a given low degree circuit is zero. In this section, we relate the black-box derandomization of depth four Identity Testing to low degree Identity Testing. A black-box derandomization of low degree Identity Testing problem can be defined as follows (it is a restriction of the definition given in [1] to low degree circuits).
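Definition 3.1 Let $F$ be a field. Let $\mathcal{C}$ be a class of low degree arithmetic circuits over $F$. A function $f : \mathbb{N} \to (F[y])^*$ is an $s(n)$-pseudorandom generator against $\mathcal{C}$ if:
• $f(n) = (p^n_1(y), \ldots, p^n_n(y))$ is computable in time polynomial in $s(n)$ and each $p^n_j$ is of degree bounded by $s(n)$.
• For any arithmetic circuit $C \in \mathcal{C}$ of size $n$ computing a polynomial of $n$ variables over $F$: $C(x_1, \ldots, x_n) = 0$ iff $C(p^n_1(y), \ldots, p^n_n(y)) = 0$.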
Given an $s(n)$-pseudorandom generator $f$ against $\mathcal{C}$, one can solve the Identity Testing problem (for circuits from the class $\mathcal{C}$) deterministically in time $s^{O(1)}(n)$ by simply plugging in the polynomial $p^n_j$ for $x_j$ and evaluating the resulting (univariate) polynomial. A complete derandomization is obtained when $s(n)$ is a polynomial in $n$. We call such generators optimal pseudorandom generators.
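The plug-in test can be sketched with a deliberately simple generator. The code below swaps in the Kronecker substitution $p_j(y) = y^{(n+1)^{j-1}}$, which is a valid generator for circuits of degree at most $n$ over the integers but has exponential degree, so it is far from optimal and is not the generator the text hypothesizes. The circuit is treated as a black box (a Python callable), and all function names are illustrative.

```python
def kronecker_exponents(n):
    """Exponents e_j with p_j(y) = y^(e_j); distinct monomials of degree <= n
    in n variables are mapped to distinct powers of y."""
    return [(n + 1) ** j for j in range(n)]


def hitting_set(n):
    """Points (p_1(t), ..., p_n(t)) for t = 1, ..., 1 + n * s(n)."""
    exps = kronecker_exponents(n)
    s = exps[-1]        # degree bound s(n) of the generator polynomials
    bound = n * s       # degree bound of the composed univariate polynomial
    return [tuple(t ** e for e in exps) for t in range(1, bound + 2)]


def is_zero_blackbox(circuit, n):
    """Decide whether a degree <= n circuit on n inputs is identically zero.

    A nonzero univariate polynomial of degree <= n * s(n) has at most that many
    roots, so it is nonzero at one of the 1 + n * s(n) distinct points tried.
    """
    return all(circuit(*point) == 0 for point in hitting_set(n))
```

For instance, with $n = 3$ the test evaluates the circuit at 49 points; it accepts $(x_1 + x_2)^2 - x_1^2 - 2 x_1 x_2 - x_2^2$ as zero and rejects $x_1 x_2 - x_2 x_3$.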
Remark: At first glance, the above definition of pseudorandom generators may appear different from the one in the Boolean setting. Borrowing from the Boolean setting, one can define an $s(n)$-hitting set generator against arithmetic circuits via a function $g : \mathbb{N} \times \mathbb{N} \to F^*$, $g(n, t) = (a^t_1, a^t_2, \ldots, a^t_n)$, such that for any circuit $C$ of size $n$ on $n$ inputs, $C$ computes a nonzero polynomial iff there exists a $t$, $1 \le t \le s(n)$, such that $C(a^t_1, \ldots, a^t_n) \ne 0$. It is, however, straightforward to see that the two definitions are equivalent: let $p^n_i(y)$ be the polynomial of degree $\le s(n)$ such that $p^n_i(t) = a^t_i$ for all $1 \le t \le s(n)$. This gives a pseudorandom generator of our definition. Conversely, let $f$ be an $s(n)$-pseudorandom generator of our definition. Define $g(n, t) = (p^n_1(t), \ldots, p^n_n(t))$ for $1 \le t \le 1 + n s(n)$. Then $g$ is a $(1 + n s(n))$-hitting set generator. To see this, note that if for a circuit $C$ of degree $n$, $C(p^n_1(y), \ldots, p^n_n(y)) \ne 0$, then $C(g(n, t)) \ne 0$ for some $t \le 1 + n s(n)$, since $C(p^n_1(y), \ldots, p^n_n(y))$ is a non-zero polynomial of degree $\le n \cdot s(n)$.
Theorem 3.2 Consider the class of depth 4 arithmetic circuits over $F$ with fanin of second layer mult gates bounded by $O(\ell(n))$ (for any unbounded function $\ell$) and the fanin of bottom layer mult gates bounded by $O(\log n)$. If there is an optimal pseudorandom generator against this class of circuits then the low degree Identity Testing problem over $F$ can be solved deterministically in time $n^{O(\log n)}$.
Proof. Let $f$ be an optimal pseudorandom generator against the class of depth 4 circuits over $F$ defined above. It was shown in [1] that such a pseudorandom generator yields a family of multilinear polynomials $\{q_m\}_{m \ge 1}$ such that $q_m$ is over $m$ variables, is computable in time $2^{O(m)}$, and requires depth 4 circuits of size $2^{\Omega(m)}$, with fanins of second and bottom layer mult gates bounded by $O(\ell(2^m))$ and $O(m)$ respectively, to compute. By Theorem 2.4, polynomial $q_m$ requires exponential sized circuits (of any depth) to compute. Now, we can construct an algorithm that derandomizes low degree Identity Testing over $F$ in time $n^{O(\log n)}$ using the family $\{q_m\}$, as shown by the lemma below.
Lemma 3.3 Let $\{q_m\}_{m \ge 1}$ be a multilinear polynomial family over $F$ computable in exponential time and that cannot be computed by subexponential sized arithmetic circuits. Then the low degree Identity Testing problem over $F$ can be solved in time $n^{O(\log n)}$.
Proof. The proof is along the lines of the proof of Lemma 7.6 in [9]. Let $C$ be any circuit over $F$ of size $n$ computing a polynomial of degree $\le n$. We wish to test if $C$ computes the zero polynomial. Let $S_1, S_2, \ldots, S_n$ be subsets of $[1, c \log n]$ (for a suitable constant $c$) such that $|S_i| = d \log n$ (for a suitable $d < c$) and $|S_i \cap S_j| \le \log n$ (for $i \ne j$). This family of sets is the Nisan-Wigderson design [13] and can be efficiently constructed. For a tuple of variables $(x_1, x_2, \ldots, x_{c \log n})$, denote by $(x_1, x_2, \ldots, x_{c \log n})_{S_i}$ the tuple obtained by retaining only those variables whose indices occur in $S_i$ (the variables are always arranged in increasing order of index). Without loss of generality, we can assume that $C$ has $n$ inputs $z_1, \ldots, z_n$. Replace $z_i$ by $p_i = q_{d \log n}((x_1, x_2, \ldots, x_{c \log n})_{S_i})$ for each $i$. We now claim that if $C$ is zero after substitution then it is zero without substitution as well.
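The set family used above can be illustrated by a brute-force greedy search. This is only a sketch for tiny parameters and is not the efficient construction of [13]; the function name and its parameters are illustrative. It scans candidate subsets in lexicographic order and keeps each one whose overlap with every set chosen so far is small enough.

```python
from itertools import combinations


def greedy_design(universe_size, set_size, max_overlap, count):
    """Return `count` subsets of range(universe_size), each of size `set_size`,
    with pairwise intersections of size at most `max_overlap`."""
    chosen = []
    for candidate in combinations(range(universe_size), set_size):
        candidate = frozenset(candidate)
        if all(len(candidate & s) <= max_overlap for s in chosen):
            chosen.append(candidate)
            if len(chosen) == count:
                return chosen
    raise ValueError("no design with these parameters found by greedy search")
```

Taking $\log n = 2$, $c = 4$ and $d = 2$ corresponds to the call `greedy_design(8, 4, 2, 4)`: four subsets of an 8-element universe, each of size 4, pairwise sharing at most 2 elements.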
Suppose not. So $C(z_1, \ldots, z_n) \ne 0$ and $C(p_1, \ldots, p_n) = 0$. Then there must exist an index $j$ such that $C(p_1, \ldots, p_j, z_{j+1}, \ldots, z_n) = 0$ and $C(p_1, \ldots, p_{j-1}, z_j, \ldots, z_n) \ne 0$. Randomly fix values of the variables $z_{j+1}, \ldots, z_n$ as well as the $x_i$'s not occurring in the polynomial $p_j$ in the last circuit. The circuit will still compute a non-zero polynomial with high probability. Fix values for the above variables that keep the circuit nonzero. Now replace each $p_i$, $i < j$, by a sum of products form. Since all but $\log n$ variables of $p_i$ are fixed, the size of this form is bounded by $n$. After replacement, we get a circuit of size $\le n^2$ over variables $(x_1, \ldots, x_{c \log n})_{S_j}$ and $z_j$ that is non-zero but becomes zero on substituting $z_j$ by $p_j$. Hence $z_j - p_j$ divides the polynomial computed by the new circuit. We now use the multivariate polynomial factorization algorithm [10] to compute this factor. The circuit computing the factor has size $n^e$ for some constant $e$ independent of $d$. This gives us a circuit of size $n^e + n^2$ that computes polynomial $p_j$, which is $q_{d \log n}$. Choosing a suitable $d$ yields a contradiction with the hardness of $q_{d \log n}$.
Therefore, if $C$ was non-zero to start with, it continues to be non-zero even after the substitution. Now express $C$ as a sum of products using brute force. Since $C$ after substitution computes a degree $O(n \log n)$ polynomial over $O(\log n)$ variables, it will have at most $n^{O(\log n)}$ terms. This gives an $n^{O(\log n)}$ time algorithm for testing if $C$ is zero.
Theorem 3.2 is suboptimal. It is an interesting open question to improve it to obtain a polynomial time algorithm instead of $n^{O(\log n)}$.
