# Towards Optimal Depth Reductions for Syntactically Multilinear Circuits 

Mrinal Kumar<br>University of Toronto, Canada https://mrinalkr.bitbucket.io/ mrinalkumar08@gmail.com<br>Rafael Oliveira<br>University of Toronto, Canada<br>http://www.cs.utoronto.ca/~rafael/<br>rafael@cs.toronto.edu<br>Ramprasad Saptharishi<br>Tata Institute of Fundamental Research https://www.tcs.tifr.res.in/~ramprasad/<br>ramprasad@tifr.res.in


#### Abstract

We show that any $n$-variate polynomial computable by a syntactically multilinear circuit of size poly $(n)$ can be computed by a depth-4 syntactically multilinear ( $\Sigma \Pi \Sigma \Pi)$ circuit of size at $\operatorname{most} \exp (O(\sqrt{n \log n}))$. For degree $d=\omega(n / \log n)$, this improves upon the upper bound of $\exp (O(\sqrt{d} \log n))$ obtained by Tavenas [14] for general circuits, and is known to be asymptotically optimal in the exponent when $d<n^{\varepsilon}$ for a small enough constant $\varepsilon$. Our upper bound matches the lower bound of $\exp (\Omega(\sqrt{n \log n}))$ proved by Raz and Yehudayoff [12], and thus cannot be improved further in the exponent. Our results hold over all fields and also generalize to circuits of small individual degree.

More generally, we show that an $n$-variate polynomial computable by a syntactically multilinear circuit of size poly $(n)$ can be computed by a syntactically multilinear circuit of product-depth $\Delta$ of size at most $\exp \left(O\left(\Delta \cdot(n / \log n)^{1 / \Delta} \cdot \log n\right)\right)$. It follows from the lower bounds of Raz and Yehudayoff [12] that in general, for constant $\Delta$, the exponent in this upper bound is tight and cannot be improved to $o\left((n / \log n)^{1 / \Delta} \cdot \log n\right)$.


2012 ACM Subject Classification Theory of computation $\rightarrow$ Circuit complexity
Keywords and phrases arithmetic circuits, multilinear circuits, depth reduction, lower bounds
Digital Object Identifier 10.4230/LIPIcs.ICALP.2019.78
Category Track A: Algorithms, Complexity and Games
Related Version A full version of the paper is available at https://arxiv.org/abs/1902.07063.
Funding Mrinal Kumar: A part of this work was done during the postdoctoral stay at Harvard, during the lower bounds semester at Simons Institute for the Theory of Computing, Berkeley and while visiting TIFR, Mumbai.
Rafael Oliveira: Part of this work was done while visiting the Simons Institute for the Theory of Computing.
Ramprasad Saptharishi: Research supported by Ramanujan Fellowship of DST.
Acknowledgements We are extremely thankful to Ben Rossman, who pointed us towards this question, and for many stimulating discussions at various stages of this work. We also thank Shubhangi Saraf, Amir Shpilka and Ben Lee Volk for many helpful conversations. Mrinal also thanks Prahladh Harsha for accommodating him in his apartment for a part of the visit to TIFR, where a part of this paper was written.

© Mrinal Kumar, Rafael Mendes de Oliveira, and Ramprasad Saptharishi;
licensed under Creative Commons License CC-BY
46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Editors: Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi; Article No. 7 pp. 78:1-78:15

## 1 Introduction

An algebraic circuit over a field $\mathbb{F}$ and variables $\mathbf{x}=\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ is a directed acyclic graph whose internal vertices (called gates) are labeled as either + (sum) or $\times$ (product), and leaves (vertices of indegree zero) are labeled by the variables in $\mathbf{x}$ or constants from $\mathbb{F}$. The gates of outdegree zero in a circuit are called its output gates. Algebraic circuits give a natural and succinct representation for multivariate polynomials; analogous to the way Boolean circuits give a succinct representation of Boolean functions. We refer the reader to the excellent survey of Shpilka and Yehudayoff [13] for an introduction to the area of algebraic circuit complexity. One of the main protagonists in the results in this paper will be the class of syntactically multilinear circuits which we now define.

- Definition 1 (Syntactically Multilinear Circuits). An algebraic circuit $C$ is said to be syntactically multilinear if at every product gate $v$ in $C$ with inputs $u_{1}, u_{2}, \ldots, u_{t}$, the set of variables in the sub-circuits rooted at $u_{i}$ are pairwise disjoint from each other.

The size of an algebraic circuit is the number of edges in it, and its depth is the length of the longest path from an output gate to a leaf. Intuitively, the size of a circuit is an indicator of the time complexity of computing the polynomial, and its depth indicates how fast the polynomial can be computed in parallel.

We now introduce a sequence of fundamental structural results for algebraic circuits, that are collectively called depth reductions; this is the main focus of this paper.

## Depth Reductions

In a beautiful, surprising and influential work, Valiant et al. [15] showed that every polynomial family which is efficiently computable by an algebraic circuit is also efficiently computable in parallel. Formally, they showed the following theorem.

- Theorem 2 ([15]). There is an absolute constant $c \in \mathbb{N}$ such that the following is true. If $P$ be an n-variate homogeneous polynomial of degree d over any field $\mathbb{F}$ which can be computed by an algebraic circuit $C$ of size $s$, then $P$ can be computed by an algebraic circuit $C^{\prime}$ (of unbounded fan-in) of depth $c \log d$ and size $(s n d)^{c}$.

In particular, the theorem says that every polynomial family of polynomially bounded (in $n$ ) degree that is computable by a circuit of size $\operatorname{poly}(n)$ and arbitrary depth, is also efficiently computable by a circuit of size poly $(\log n)$ and depth $O(\log n)$.

In a remarkable extension of Theorem 2, Agrawal and Vinay [1] showed that one can parallelize algebraic circuits even more (reducing the depth to a constant), at the cost of a larger (a non-trivial subexponential factor) blow up in the circuit size. The version of their theorem stated below is due to Tavenas [14], who optimized the parameters further.

- Theorem 3 ([1, 8, 14]). There is an absolute constant $c \in \mathbb{N}$ such that the following is true. If $P$ is an $n$-variate homogeneous polynomial of degree $d$ over any field $\mathbb{F}$ which can be computed by an algebraic circuit $C$ of size $s$, then $P$ can be computed by a homogeneous $\Sigma \Pi \Sigma \Pi$ algebraic circuit $C^{\prime}$ of size $(s n d)^{c \sqrt{d}}$.

Here, a $\Sigma \Pi \Sigma \Pi$ circuit is an algebraic circuit with four layers of alternating sum and product gates with the top layer being a sum layer. Throughout this paper, when we say a depth- 4 circuit, we mean a $\Sigma \Pi \Sigma \Pi$ circuit.

We note that while Theorem 3 as stated above reduces a homogeneous circuit of arbitrary depth to a homogeneous circuit of depth-4, but it easily follows from the proof that the depth reduction preserves syntactic restrictions. That is, if we start with a syntactically multilinear
and homogeneous circuit, the resulting depth-4 circuit is also syntactically multilinear and homogeneous. This statement will be of particular interest as we study depth reductions for syntactically multilinear circuits in this paper.

## On the optimality of reductions to depth-4

An immediate consequence of Theorem 2 and Theorem 3 is that strong enough lower bounds for algebraic circuits of bounded depth imply superpolynomial lower bounds for general algebraic circuits. Thus, the questions of proving lower bounds for bounded depth circuits, and that of understanding if the parameters in Theorem 3 can be improved further seem to be of fundamental interest. In the last few years, we have had significant progress on both these fronts. Following a long line of work starting with a work of Kayal [7] and Gupta et al. [5], we now know extremely good lower bounds for homogeneous depth-4 circuits.

- Theorem 4 (Kumar and Saraf [9]). There exists a polynomial family $\left\{f_{n}\right\}$, where $f_{n}$ is a homogeneous n-variate polynomial of degree $d=n^{\varepsilon}$, for an absolute constant $\varepsilon>0$, such that $f_{n}$ is computable by an algebraic circuit of size $\operatorname{poly}(n)$, but any homogeneous depth- 4 circuit computing $f_{n}$ has size $n^{\Omega(\sqrt{d})}$.

Moreover, the family $\left\{f_{n}\right\}$ is computable by a syntactically multilinear circuit of polynomial size.

If we allow the hard polynomial to be explicit but not necessarily have small circuits, then upper bound on the degree $d$ in the above theorem can be increased to as large as $n^{1-\varepsilon}$ for any constant $\varepsilon>0 .{ }^{1}$ Thus, in general, the exponent in the upper bound on the size of the depth- 4 circuit obtained in Theorem 3 cannot be improved asymptotically. In fact, the theorem shows that we cannot even expect such an improvement for syntactically multilinear circuits in the setting when the degree $d$ is sufficiently smaller than the number of variables $n$. A natural question here is to understand if Theorem 3 is also asymptotically tight in the exponent when the degree is larger. The following result of Raz and Yehudayoff goes a long way towards answering this question.

- Theorem 5 ([12]). There is a family of multilinear polynomials $\left\{f_{n}\right\}$ such that, for every $n$, the polynomial $f_{n}$ is an $n$-variate degree $d=\Theta(n)$ polynomial that can be computed by a syntactically multilinear circuit of size poly $(n)$, but any multilinear circuit of depth- 4 computing $f_{n}$ has size $n^{\Omega(\sqrt{n / \log n}) \text {. }}$

More generally, for any constant $\Delta$, any syntactically multilinear circuit of product-depth ${ }^{2}$ $\Delta$ computing $f_{n}$ must have size $n^{\Omega\left((n / \log n)^{1 / \Delta}\right)}$.

For depth-4 circuits (or $\Delta=2$ ), asimilar result was proved by Hegde and Saha [6] for the more general ${ }^{3}$ class of circuits called multi- $k$-ic circuits, where the formal degree of any variable in the circuit is bounded by a parameter $k$ (formally defined in Definition 17).

- Theorem 6 ([6]). There is an explicit family $\left\{f_{n}\right\}$ of n-variate multilinear polynomials of degree $d=\Theta(n)$ such that, for every $k \leq(n \log n)^{0.9}$, any multi- $k$-ic circuit of depth- 4 computing $f_{n}$ has size at least $n^{\Omega(\sqrt{n /(k \log n)})}$.

[^0]Thus, Theorem 5 and Theorem 6 shows that the exponent $\sqrt{d}$ in the exponent in Theorem 3 cannot be replaced by $o(\sqrt{n / \log n})$. Thus, in the regime when $d=\Theta(n)$, there is a gap of $\sqrt{\log n}$ between the known lower bounds and what is potentially achievable via depth reduction. Raz and Yehudayoff [12] also observe that using their techniques, the lower bound cannot be improved to $n^{\omega(\sqrt{n / \log n})}$. Our main motivation for this work was to bridge this gap. In the light of Theorem 4, we believed the upper bound of $n^{O(\sqrt{d})}$ in Theorem 3 to be right bound for multilinear circuits for all $d$, and had hoped to improve the lower bound in Theorem 5 to $n^{\Omega(\sqrt{n})}$.

However, as we discuss next, the correct exponent for depth reduction to depth- 4 in the high degree regime turns out to be $\sqrt{n / \log n}$. In addition to being surprising, this also offers a potentially viable approach to the question of proving superpolynomial lower bounds for syntactically multilinear circuits by extending Theorem 4 to the high degree regime. We now state our results and discuss the connections to multilinear circuit lower bounds.

### 1.1 Results

We start by stating our main theorems.

- Theorem 7. Let $C$ be a multi-k-ic circuit of size s computing a polynomial in $n$ variables. Then, there is a multi-k-ic $\Sigma \Pi \Sigma \Pi$ circuit $C^{\prime}$ of size $s{ }^{O}\left(\sqrt{\frac{k n}{\log s}}\right)$ computing the same polynomial.

The ideas in the proof of Theorem 7 generalize to give the following statement about reduction to depth $\Delta$ circuits for any constant $\Delta$.

- Theorem 8. Let $C$ be a multi-k-ic circuit of size $s$ computing a polynomial in $n$ variables. Then, there is a multi-k-ic $(\Sigma \Pi)^{\Delta}$ circuit $C^{\prime}$ computing the same polynomial whose size is at most

$$
s^{O\left(\Delta \cdot(n k / \log s)^{1 / \Delta}\right)} .
$$

Thus, for $s=\operatorname{poly}(n), k=o(\log s)$ and $n \geq d \geq \omega\left(\frac{k n}{\log s}\right)$, the exponents in the upper bounds in Theorem 7 are asymptotically better than that in Theorem 3. An immediate consequence of Theorem 7 is the following corollary.

- Corollary 9. Let $\left\{f_{n}\right\}$ be an explicit family of multilinear polynomials, such that $f_{n}$ is an $n$ variate polynomial of degree $d=\omega(n / \log n)$, and any multilinear $\Sigma \Pi \Sigma \Pi$ circuit computing $f_{n}$ has size at least $n^{\Omega(\sqrt{d})}$. Then, $\left\{f_{n}\right\}$ requires superpolynomial size syntactically multilinear circuits.

The corollary is of interest since by Theorem 4, we know $n^{\Omega(\sqrt{d})}$ lower bounds for homogeneous multilinear $\Sigma \Pi \Sigma \Pi$ circuits, when $d=n^{\varepsilon}$. Thus extending these bounds so that they hold for higher degree polynomials will imply superpolynomial lower bounds for multilinear circuits. The current best lower bound known for multilinear circuits is a nearly quadratic lower bound in a recent work of Alon et al. [3]. The standard technique for proving lower bounds for multilinear models is via the rank of the partial derivative matrix under a random partition of variables (due to Raz [10]). This has been useful in almost all of the known lower bounds for multilinear models, such as super polynomial lower bounds for multilinear formulas [10], exponential lower bounds for constant depth multilinear circuits [12] as well as the currently known superlinear and nearly quadratic lower bounds for multilinear circuits [11, 3]. However, this technique is too weak to yield even super-cubic lower bounds for syntactically multilinear circuits. Thus, currently we do not even have potential approaches to proving superpolynomial
lower bounds for multilinear circuits. In the light of this, it certainly seems worth exploring if the partial derivative based methods used in the proof of Theorem 4 can be extended to work for multilinear polynomials whose degree $d=\omega(n / \log n)$ is high. As far as we understand, there does not seem to be strong evidence one way or the other about this.

For multi- $k$-ic circuits, we do not even know superpolynomial lower bounds for formulas or even constant depth formulas. Based on the discussion above, Theorem 7 does seem to offer a potentially viable approach to prove these lower bounds.

Finally, we note again that the upper bound on the size of the depth-4 circuit obtained in Theorem 7 cannot be further improved asymptotically in the exponent as Theorem 5 shows.

### 1.2 Proof Overview

We focus on giving an outline of the proof of Theorem 7 for the multilinear case (or $k=1$ ). The proof follows the strategy of the proof of Theorem 3 with some key differences, which we point out as we go along. There are two main steps and we now give a sketch of both of them.

## Balancing a syntactically multilinear circuit

For this step, the key notion is that of a balanced circuit. We say that a circuit $C$ is balanced with respect to a potential function $\Phi: C \rightarrow \mathbb{N}$ (e.g. degree, number of variables), if the fan-in of every product $g$ in $C$ is a constant, and $\Phi(g) \geq 2 \Phi(h)$ for every child $h$ of $g$. In the proof of Theorem 3, the authors essentially use the results of Valiant et al. [15] to balance a homogeneous circuit with the potential function $\Phi$ being the formal degree of a gate. For our proof, we show that a syntactically multilinear circuit can in fact be balanced with the potential function being the number of variables in the sub-circuit rooted at a gate. Our proof of this part involves the machinery of gate quotients and frontier decompositions developed by Valiant et al. in their original proof, although there are some crucial differences which require some non-trivial (albeit simple) insights.

One such challenge stems from the fact that in a homogeneous circuit, the formal degree of any two children of a product gate is the same and equal to the formal degree of the parent, whereas the children might depend on very different (even completely disjoint) sets of variables. To get around this, our notion of frontier is different from that of Valiant et al [15]. In [15], frontier is defined with respect to vertices, whereas we define frontier with respect to edges. As a consequence, our frontier decomposition statements are slightly different from those in [15], although they continue to have a natural semantic meaning. This is detailed in Section 5.

## Reduction to depth-4 from a balanced circuit

In the second part of our proof, we show that any balanced syntactically multilinear circuit of size $s$ computing a polynomial in $n$ variables can be depth reduced to a syntactically multilinear depth- 4 circuit of size $s^{O(\sqrt{n / \log n})}$. The proof is along the lines of the proof of the analogous statement in the homogeneous (non-multilinear) setting by Chillara et al. [4]. The high level idea of the proof is the following : in a balanced circuit $C$, the polynomial computed at any gate $g$ can be written as a sum of product of terms, where the product fan-in is a constant, the sum fan-in is upper bounded by the size of the circuit, and the number of variables in any of the terms is at most half of the number of variables in $g$. Moreover, each of the terms is a polynomial computed by a gate in $C$, so this decomposition can be recursively applied. We apply this decomposition repeatedly till every term in the sum of products expression of the output depends on at most $t$ variables. We argue that the
sum fan-in of this sum of products expression is at most $s^{O(n / t)}$. Now, we expand each of the terms (which is a multilinear polynomial) as a sum of multilinear monomials in $t$ variables. Thus, the total size of the $\Sigma \Pi \Sigma \Pi$ circuit obtained is $2^{t} \cdot s^{O(n / t)}$ which is $s^{O(\sqrt{n / \log s})}$ for $t=\sqrt{n \log s}$.

In the proof of the analogous statement for homogeneous non-multilinear circuits, at the end of the repeated applications of the decomposition, each of the terms is of degree at most $t$. Thus, a sum of product expansion of each such term has size $\binom{n}{t}$, and so the total size of the $\Sigma \Pi \Sigma \Pi$ circuit obtained is $n^{t} \cdot s^{O(n / t)}$, which for $s=\operatorname{poly}(n)$ is minimized for $t=\sqrt{n}$ and equals $s^{O(\sqrt{n})}$. This explains the gain in the size obtained by Theorem 7 .

## 2 Preliminaries

In this section, we describe the notion of parse-trees and gate quotients which are crucial to our proof and set up some of the machinery we need for the proof.

### 2.1 Parse-trees and quotients

- Definition 10 (Parse-trees). Let $C$ be an algebraic circuit. For any $u_{0} \in C$, a parse-tree $T$ rooted at $u_{0}$ is a subcircuit of $C$ that satifies the following properties:
- the node $u_{0} \in T$,
- if $u \in T$ is a multiplication gate of $C$ with $u=v_{1} \times v_{2}$, then $v_{1}, v_{2}$ are also in $T$,
- if $u \in T$ is an addition gate of $C$ with $u=v_{1}+v_{2}$, then exactly one of $v_{1}$ or $v_{2}$ is in $T$. Any such sub-circuit computes just a monomial, and this shall be called the value the parse-tree. Although the parse-tree defined above need not be a tree, it shall unfolded to a tree.

If $T$ is a parse-tree rooted at $u$, and $v$ is a node that appears on its right-most path, then the tree $T^{\prime}$ obtained by replacing $v$ only on the right-most path by a leaf labelled 1 is said to be a $v$-snipped parse-tree rooted at $u$.

- Definition 11 (Var operator). For any nodes $u \in C$, we denote by $\operatorname{Var}(u)$ the vector $\left(d_{1}, \ldots, d_{n}\right) \in \mathbb{N}_{\geq 0}^{n}$ where $d_{i}$ is the maximum $x_{i}$-degree over all parse-trees rooted at $u$.

Similarly, for any pair of nodes $u, v \in C$, we denote by $\operatorname{Var}(u: v)$ the vector $\left(d_{1}, \ldots, d_{n}\right)$ where $d_{i}$ is the maximum $x_{i}$-degree over all $v$-snipped parse-tree rooted at $u$.

We shall also define $\left|\left(d_{1}, \ldots, d_{n}\right)\right|=\sum d_{i}$.
For a syntactically multilinear circuit $C$, note that $|\operatorname{Var}(g)|$ for any gate $g \in C$ is precisely the number of distinct variables in the sub-circuit rooted at $g$.

- Remark 12. Throughout this discussion, we will assume that the circuit is right heavy. This means that for every multiplication gate, $w=w_{L} \times w_{R}, \operatorname{Var}\left(w_{R}\right) \geq \operatorname{Var}\left(w_{L}\right)$. Note that this is without loss of generality, since left and right are merely labels that we can assign arbitrarily to the children of every gate in the circuit.
- Definition 13 (Gate Quotient). For every two gates $u, v$ in $C$, the gate quotient of $u$ with respect to $v$, denoted by $[u: v]$ is defined inductively as follows.
- If $u=v$, then $[u: v]=1$.
- If $u=u_{1}+u_{2}$, then $[u: v]=\left[u_{1}: v\right]+\left[u_{2}: v\right]$.
- If $u=u_{L} \times u_{R}$, then $[u: v]=\left[u_{L}\right]\left[u_{R}: v\right]$.
- If $v$ does not appear in the subcircuit rooted at $u$, then $[u: v]=0$.

Lemma 14. Let $u, v \in C$. Then, the polynomial $[u]$ is the sum of values of all parse-trees rooted at $u$. Furthermore, the polynomial $[u: v]$ is the sum of the values of all the $v$-snipped parse-trees $T$ that are rooted at $u$.

The above lemma is almost folklore and a proof of it can be seen in the work of Allender et al. [2].

### 2.2 Syntactic restrictions on parse-trees

We remark that throughout this paper, by degree, we mean the syntactic or formal degree, which could be much larger than the actual or semantic degree. The following observation records some basic properties of the Var operator.

- Observation 15. Let $C$ be any algebraic circuit. Then,
- $\operatorname{Var}(u)$ is monotonically non-increasing as $u$ moves towards the leaves. That is, if $u$ is an ancestor of $v$, then ever coordinate of $\operatorname{Var}(u)$ is at least as large as the corresponding coordinate in $\operatorname{Var}(v)$.
Similarly, for any fixed $v$, the vector $\operatorname{Var}(u: v)$ is monotonically non-increasing as $u$ moves towards the leaves.
- For any multiplication gate $u=u_{1} \times u_{2}$, we have $\operatorname{Var}(u)=\operatorname{Var}\left(u_{1}\right)+\operatorname{Var}\left(u_{2}\right)$. Similarly for any $v$, we have $\operatorname{Var}(u: v)=\operatorname{Var}\left(u_{1}\right)+\operatorname{Var}\left(u_{2}: v\right)$.
- For any addition gate $u=u_{1}+u_{2}$, we have $\operatorname{Var}(u)=\max \left(\operatorname{Var}\left(u_{1}\right), \operatorname{Var}\left(u_{2}\right)\right)$, the coordinate-wise max of the two vectors. Similarly for any $v, \operatorname{Var}(u: v)=\max \left(\operatorname{Var}\left(u_{1}\right.\right.$ : $\left.v), \operatorname{Var}\left(u_{2}: v\right)\right)$.

Proof. The proofs immediately follow from the definitions.

For two vectors $\mathbf{v}_{1}, \mathbf{v}_{2} \in \mathbb{N}_{\geq 0}^{n}$, we shall say $\mathbf{v}_{1} \preceq \mathbf{v}_{2}$ if each coordinate of $\mathbf{v}_{1}$ is at most the corresponding coordinate in $\mathbf{v}_{2}$.

- Observation 16. Suppose $u \in C$ and $w$ is a node in $C$ such that there is some parse-tree rooted at $u$ with $w$ appearing on its rightmost path. Then,

$$
\operatorname{Var}(u: w)+\operatorname{Var}(w) \preceq \operatorname{Var}(u) .
$$

Similarly, suppose $w$ is a node in $C$ such that there is some $v$-parse-tree rooted at $u$ with $w$ appearing on its rightmost path. Then,

$$
\operatorname{Var}(u: w)+\operatorname{Var}(w: v) \preceq \operatorname{Var}(u: v)
$$

Proof. The proof is straightforward; we just give the proof of the second equation. Fix a coordinate $i$. If $d_{i}=(\operatorname{Var}(u: w))_{i}$ then there is some $w$-snipped parse-tree $T_{i}$ rooted at $u$ whose $x_{i}$-degree equals $d_{i}$. Similarly if $e_{i}=(\operatorname{Var}(w: v))_{i}$, then there is some $v$-snipped parse-tree $T_{i}^{\prime}$ rooted at $w$ whose $x_{i}$-degree is $e_{i}$. Clearly the gluing of $T_{i}$ and $T_{i}^{\prime}$ obtained by replacing the snipped vertex $w$ in $T_{i}$ with the tree $T_{i}^{\prime}$ is a $v$-snipped parse-tree rooted at $u$ with $x_{i}$-degree $d_{i}+e_{i}$. Therefore $d_{i}+e_{i} \leq(\operatorname{Var}(u: v))_{i}$ and the claim follows.

- Definition 17 (Syntactically multilinear and multi- $k$-ic circuits). $A$ circuit $C$ is said to be syntactically multilinear if $\operatorname{Var}(u) \in\{0,1\}^{n}$ for all $u \in C$.
$A$ circuit $C$ is said to be syntactically multi- $k$-ic if $\operatorname{Var}(u) \in\{0,1, \ldots, k\}^{n}$ for all $u \in C$.


## 3 Frontier edges and quotient

- Definition 18 (Frontier edges). For a circuit $C$, an edge between two gates $g_{1}, g_{2}$ (where $g_{1}$ is the parent) is said to be an $m$-frontier edge (for a parameter $m$ ) if

$$
\left|\operatorname{Var}\left(g_{1}\right)\right| \geq m \text { and }\left|\operatorname{Var}\left(g_{2}\right)\right|<m .
$$

We use $\mathcal{F}_{m}^{\times}$to denote the set of all $m$-frontier edges $\left(g_{1}, g_{2}\right)$ where $g_{1}$ is a multiplication gate and $g_{2}$ is its right child. We use $\mathcal{F}_{m}^{+}$to denote those where $g_{1}$ is an addition gate.

Furthermore, if $v \in C$ is a fixed gate, we shall say that $\left(g_{1}, g_{2}\right)$ is an $m$-frontier edge with respect to $v$ if

$$
\left|\operatorname{Var}\left(g_{1}: v\right)\right| \geq m \text { and }\left|\operatorname{Var}\left(g_{2}: v\right)\right|<m .
$$

We will use $\mathcal{F}_{m, v}^{\times}$to denote the set of all edges $\left(g_{1}, g_{2}\right)$ that are $m$-frontier edges with respect to $v$ where $g_{1}$ is a multiplication gate (and $g_{2}$ is its right child), and $\mathcal{F}_{m, v}^{+}$to denote those where $g_{1}$ is an addition gate (and $g_{2}$ is any child).

## 4 Decomposition via gate quotients

In this section, we prove the following lemma, which is the key technical observation needed for our proofs.

- Lemma 19. Let $u, v$ be gates in an algebraic circuit $C$ with $|\operatorname{Var}(u)| \geq m$ and $|\operatorname{Var}(v)|<m$. Then,

$$
\begin{align*}
{[u] } & =\sum_{(w, z) \in \mathcal{F}_{m}^{\times}}[u: w] \cdot\left[w_{L}\right] \cdot[z]+\sum_{(w, z) \in \mathcal{F}_{m}^{+}}[u: w] \cdot[z]  \tag{1}\\
{[u: v] } & =\sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}[u: w] \cdot\left[w_{L}\right] \cdot[z: v]+\sum_{(w, z) \in \mathcal{F}_{m, v}^{+}}[u: w] \cdot[z: v] \tag{2}
\end{align*}
$$

We shall give an informal sketch using the concept of parse-trees. A complete formal proof of the lemma can be found in Appendix A. For any $u, v$, we have that $[u: v]$ is the sum of all $v$-snipped parse-trees rooted at $u$. For any parse-tree, since $|\operatorname{Var}(u)| \geq m$ and $|\operatorname{Var}(v)|<m$ and $\operatorname{Var}(\cdot)$ is a monotonically non-increasing function as we move towards the leaves, there must be a unique edge $(w, z) \in \mathcal{F}_{m, v}^{\times} \cup \mathcal{F}_{m, v}^{+}$on its right-most path such that $|\operatorname{Var}(w)| \geq m$ and $|\operatorname{Var}(z)|<m$.

If $(w, z) \in \mathcal{F}_{m, v}^{\times}$, then $w=w_{L} \times z$ is a multiplication gate. Therefore, the sum of the values of all $v$-snipped parse-trees with $w$ (and hence the edge $(w, z)$ ) on its rightmost path is exactly $[u: w][w: v]=[u: w]\left[w_{L}\right][z: v]$.

If $(w, z) \in \mathcal{F}_{m, v}^{+}$, then $w=w_{1}+z$ is an addition gate. Then, $[u: w] \cdot[w: v]$ is the sum of all $v$-snipped parse-trees with $w$ on its rightmost path and $[u: w][w: v]=[u: w]\left[w_{1}\right.$ : $v]+[u: w][z: v]$. Each $v$-snipped parse-tree with $w$ on its rightmost path either has $\left(w, w_{1}\right)$ on the rightmost path or $(w, z)$. The term $[u: w]\left[w_{1}: v\right]$ is precisely the sum of the values of such $^{4}$ parse-trees with $\left(w, w_{1}\right)$ on its rightmost path, and $[u: w][z: v]$ is precisely the sum of the values of those parse-trees with $(w, z)$ on its rightmost path.

[^1]Since the rightmost path of any $v$-snipped parse-tree rooted at $u$ has a unique edge $(w, z) \in \mathcal{F}_{m, v}^{\times} \cup \mathcal{F}_{m, v}^{+}$, summing over all such potential edges gives

$$
[u: v]=\sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}[u: w] \cdot\left[w_{L}\right] \cdot[z: v]+\sum_{(w, z) \in \mathcal{F}_{m, v}^{+}}[u: w] \cdot[z: v] .
$$

## 5 Balancing syntactically multilinear circuits

In this section, we prove the following theorem.

- Theorem 20. Suppose $C$ is an algebraic circuit of size $s$. Then, there is a circuit $C^{\prime}$ of size poly(s) computing the same polynomial with the following structural properties.
- all addition gates in $C^{\prime}$ have fan-in $O\left(s^{4}\right)$,
- all multiplication gates in $C^{\prime}$ have fan-in at most 5 ,
- for any multiplication gate $g \in C^{\prime}$, any child $h$ of $g$ satisfies $|\operatorname{Var}(h)| \leq|\operatorname{Var}(g)| / 2$.

Furthermore, if $C$ is syntactically multi-k-ic, then so is $C^{\prime}$.
Proof. Without loss of generality, we may assume that the circuit is right-heavy in the sense that for every multiplication gate $u=u_{1} \times u_{2}$ we have $\left|\operatorname{Var}\left(u_{2}\right)\right| \geq\left|\operatorname{Var}\left(u_{1}\right)\right|$. We shall build a new circuit $C^{\prime}$ that computes all $[u: v]^{\prime}$ 's and $[u]^{\prime}$ 's for gates $u, v \in C$ using the equations in Lemma 19.

We shall assume inductively that we have already computed all $[w]$ 's with $|\operatorname{Var}(w)|<t$ and also all $[w, v]$ with $|\operatorname{Var}(w, v)|<t$. Suppose $u \in C$ such that $|\operatorname{Var}(u)|=t$. Using (1) from Lemma 19 with $m=t / 2$ we have

$$
[u]=\sum_{(w, z) \in \mathcal{F}_{m}^{\times}}[u: w] \cdot\left[w_{L}\right] \cdot[z]+\sum_{(w, z) \in \mathcal{F}_{m}^{+}}[u: w] \cdot[z] .
$$

By Observation 16, $|\operatorname{Var}(w)| \geq t / 2$ implies that $|\operatorname{Var}(u: w)| \leq t / 2$. Furthermore, $|\operatorname{Var}(z)| \leq$ $t / 2$ by the choice of the frontier edge and $\left|\operatorname{Var}\left(w_{L}\right)\right| \leq t / 2$ since $C$ is right-heavy. This allows us to compute all nodes of the form $[u]$ with $|\operatorname{Var}(u)| \leq t$.

If $u, v \in C$ such that $|\operatorname{Var}(u: v)|=t$. Using (2) from Lemma 19 with $m=t / 2$, we have

$$
[u: v]=\sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}[u: w] \cdot\left[w_{L}\right] \cdot[z: v]+\sum_{(w, z) \in \mathcal{F}_{m, v}^{+}}[u: w] \cdot[z: v] .
$$

We can restrict the edges in the RHS to only those edges $(w, z)$ that is present in at least one $v$-snipped parse-tree rooted at $u$ (if not, this edge's contribution to the RHS is zero). Therefore by Observation 16, $\operatorname{Var}(w: v)+\operatorname{Var}(u: w) \preceq \operatorname{Var}(u: v)$ and therefore we have $|\operatorname{Var}(u: w)| \leq t / 2$. Furthermore, by the choice of the frontier, we also have $|\operatorname{Var}(z: v)| \leq t / 2$. The non-trivial case is $\operatorname{Var}\left(w_{L}\right)$ which could in principle be large but again $\operatorname{Var}\left(w_{L}\right) \preceq \operatorname{Var}(w: v) \preceq \operatorname{Var}(u: v)$ as any parse-tree rooted $w_{L}$ is a sub-tree of a $v$-snipped tree rooted at $u$. Since we have already computed all gates $[w]$ with $\operatorname{Var}(w) \leq t$, we can write

$$
\begin{aligned}
{[u: v] } & =\sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}[u: w] \cdot\left[w_{L}\right] \cdot[z: v]+\sum_{(w, z) \in \mathcal{F}_{m, v}^{+}}[u: w] \cdot[z: v] \\
& =\sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}[u: w] \cdot\left(\sum_{(p, q) \in \mathcal{F}_{m_{w}}^{\times}}\left[w_{L}: p\right] \cdot\left[p_{L}\right] \cdot[q]+\sum_{(p, q) \in \mathcal{F}_{m_{w}}^{+}}\left[w_{L}: p\right] \cdot[q]\right) \cdot[z: v]
\end{aligned}
$$

$$
+\sum_{(w, z) \in \mathcal{F}_{m, v}^{+}}[u: w] \cdot[z: v]
$$

where $m_{w}=\operatorname{Var}\left(w_{L}\right) / 2$.
The required structural properties of $C^{\prime}$ are readily seen from the above construction.

## 6 Reduction to depth four from balanced circuits

We now show how to reduce a balanced circuit to a depth- 4 circuit. This would complete the proof of our main theorem. We shall use the notation $\Sigma \Pi(\Sigma \Pi)_{t}$ to refer to $\Sigma \Pi \Sigma \Pi$ circuits computing polynmomials of the form

$$
F=\sum_{i} \prod_{j} Q_{i j}
$$

with $\left|\operatorname{Var}\left(Q_{i j}\right)\right| \leq t$.
The proof of this part follows the outline of a similar argument in Chillara et al. [4] of reducing to depth- 4 from a balanced circuit. However, there are some differences: our potential is $|\operatorname{Var}(\cdot)|$ and not the degree (as is usually the case). Since this potential function also falls as we go from a sum $(+)$ gate to its children, we need one more simple observation in our argument to bound the number of steps in the recursion in the proof. We now provide the details.

- Lemma 21. Let $C$ be a multi-k-ic circuit of size $s$ such that every multiplication gate $g$ in $C$ has fan-in at most 5 and for every child $h$ of $g$ in $C$, $\operatorname{Var}(h) \leq \operatorname{Var}(g) / 2$.

Then, for any positive integer $0 \leq t \leq k n$, there is an equivalent multi- $k-i c \Sigma \Pi(\Sigma \Pi)_{t}$ circuit $C^{\prime}$ that computes the same polynomial, with the following properties:

- the top fan-in of $C^{\prime}$ is at most $s^{O(k n / t)}$,
- the size of $C^{\prime}$ is at most $2^{t} \cdot s^{O(k n / t)}$,
- each of the $(+)$-gates closer to the leaves compute polynomials that are computed by gates in $C$.

Proof. Since $C$ is balanced, with product fan-in at most 5 , every gate $g$ in $C$ can be written as

$$
\begin{equation*}
g=\sum_{i=1}^{s} \prod_{j=1}^{5} g_{i, j} \tag{3}
\end{equation*}
$$

where each $g_{i, j}$ is also computed by a gate in the circuit $C,\left|\operatorname{Var}\left(g_{i, j}\right)\right| \leq|\operatorname{Var}(g)| / 2$. With this notation, (3) applied on the root of $C$ says that $C$, which is a syntactically multi- $k$-ic circuit, can be trivially written as a $\Sigma \Pi(\Sigma \Pi)_{k n / 2}$. A natural idea would be to apply (3) on the $g_{i, j}$ 's until we get a $\Sigma \Pi(\Sigma \Pi)_{t}$ circuit. All that is needed is to bound the number of summands (or the top fan-in of the resulting $\Sigma \Pi(\Sigma \Pi)_{t}$ circuit) at the end of this process. Observe that for every $i \in\{1,2, \ldots, s\}$, we could have that $\left|\operatorname{Var}\left(\prod_{j=1}^{5} g_{i, j}\right)\right|$ is much smaller than $|\operatorname{Var}(g)|$ itself. To handle this, we shall pretend that

We will view the process as a tree in the natural way. The root of the tree corresponds to the root of the circuit, and all other nodes in the tree correspond to products of addition gates in $C$. The children of a node in the tree correspond to the summands in the sum of product representation of that node obtained by expanding one of its factors according to (3). The leaves of this tree are products of addition gates $\prod g_{i}^{\prime}$ such that $\left|\operatorname{Var}\left(g_{i}^{\prime}\right)\right| \leq t$ for
each factor $g_{i}^{\prime}$. The tree has a branching factor of at most $s$, hence it suffices to get a bound on the depth of the tree to get a bound on the number of leaves which would be the top fan-in of the $\Sigma \Pi(\Sigma \Pi)_{t}$ representation.

Let $g \prod_{\ell} w_{\ell}$ be an internal node in the tree with $|\operatorname{Var}(g)|>t$. After applying (3) on $g$, we get

$$
g\left(\prod_{\ell} w_{\ell}\right)=\sum_{i=1}^{s}\left(\prod_{j=1}^{5} g_{i, j} \cdot \prod_{\ell} w_{\ell}\right)
$$

We now consider two cases.

- $\left\lvert\, \begin{aligned} & \operatorname{Var}\left(\prod_{j=1}^{\mathbf{5}} \boldsymbol{g}_{\boldsymbol{i}, j}\right) \mid<\mathbf{3} \boldsymbol{t} / \mathbf{4}: \text { In this case }, \\ & \operatorname{Var}\left(\prod_{j=1}^{5} g_{i, j} \cdot \prod_{\ell} w_{\ell}\right)\left|\leq\left|\operatorname{Var}\left(g \cdot \prod_{\ell} w_{\ell}\right)\right|-t / 4 .\right.\end{aligned}\right.$
- $\left|\operatorname{Var}\left(\prod_{\boldsymbol{j}=\mathbf{1}}^{\mathbf{5}} \boldsymbol{g}_{\boldsymbol{i}, \boldsymbol{j}}\right)\right| \geq \mathbf{3} \boldsymbol{t} \mathbf{4}:$ Since $\operatorname{Var}(g) \succeq \operatorname{Var}\left(g_{i, 1} \cdots g_{i, 5}\right)=\operatorname{Var}\left(g_{i, 1}\right)+\cdots+\operatorname{Var}\left(g_{i, 5}\right)$ and $\left|\operatorname{Var}\left(g_{i, j}\right)\right| \leq t / 2$, it follows that the number of factors $h$ in $\prod_{j=1}^{5} g_{i, j} \cdot \prod_{\ell} w_{\ell}$ with $|\operatorname{Var}(h)| \geq t / 16$ is at least one more than the number of such factors in $g \cdot \prod_{\ell} w_{\ell}$. This is because besides the factor $g_{i, j}$ with largest $\left|\operatorname{Var}\left(g_{i, j}\right)\right|$, the other four factors together must contribute at least $(3 t / 4)-(t / 2)=(t / 4)$ to $\left|\operatorname{Var}\left(g_{i, 1} \cdots g_{i, 5}\right)\right|$ and hence at least one of them must have $\left|\operatorname{Var}\left(g_{i, k}\right)\right| \geq t / 16$.
Thus, in any edge of the tree, either $|\operatorname{Var}(\cdot)|$ decreases by $t / 4$ or the number of factors with $|\operatorname{Var}(\cdot)| \geq t / 16$ increases by one. The root node $g_{0}$ has $\left|\operatorname{Var}\left(g_{0}\right)\right| \leq k n$. Hence, the depth of the tree is bounded by $(16+4)(k n / t)=O(n k / t)$. Therefore, $C$ can be computed by a syntactically multi- $k$-ic $\Sigma \Pi(\Sigma \Pi)_{t}$ circuit of top fan-in at most $s^{O(n k / t)}$.

To get the bound on the overall size of the $\Sigma \Pi(\Sigma \Pi)_{t}$ circuit, we need to bound the sparsity of the polynomials computed by bottom two layers. Note that if $\operatorname{Var}(f)=\left(d_{1}, \ldots, d_{n}\right)$, then $f$ can have at most $\prod\left(1+d_{i}\right)$ monomials. Since $2^{x} \geq 1+x$ for all positive integers $x$, it follows that $|\operatorname{Var}(f)| \leq t$ implies that $f$ has at most $2^{t}$ monomials. Therefore, the total size of the $\Sigma \Pi(\Sigma \Pi)_{t}$ circuit is $2^{t} \cdot s^{O(k n / t)}=2^{O\left(t+\frac{k n \log s}{t}\right)}$.

From Theorem 20 and setting $t=\sqrt{k n \log s}$ in Lemma 21, we get Theorem 7 restated below.

- Theorem 7. Let $C$ be a multi-k-ic circuit of size $s$ computing a polynomial in $n$ variables. Then, there is a multi-k-ic $\Sigma \Pi \Sigma \Pi$ circuit $C^{\prime}$ of size $s^{O\left(\sqrt{\frac{k n}{\log s}}\right)}$ computing the same polynomial.


### 6.1 Reduction to higher depths

We now prove Theorem 8 which shows that similar savings can be obtained in depth reductions to larger depth.

Theorem 8. Let $C$ be a multi-k-ic circuit of size s computing a polynomial in $n$ variables. Then, there is a multi-k-ic $(\Sigma \Pi)^{\Delta}$ circuit $C^{\prime}$ computing the same polynomial whose size is at most

$$
s^{O\left(\Delta \cdot(n k / \log s)^{1 / \Delta}\right) .}
$$

Proof of Theorem 8. We shall assume, without loss of generality, that the circuit $C$ is balanced (by applying Theorem 20 if necessary). The proof follows via repeated applications of Lemma 21.

Applying Lemma 21 with $t=n k /(n k / \log s)^{1 / \Delta}$, we obtain a $\Sigma \Pi(\Sigma \Pi)_{t}$ circuit $C^{\prime}$ of the form

$$
C^{\prime}=\sum_{i=1}^{s^{\prime}} \prod_{j} g_{i j}
$$

with $s^{\prime}=s^{O\left((k n / \log s)^{1 / \Delta}\right)}$ and $\left|\operatorname{Var}\left(g_{i j}\right)\right| \leq t$ for all $i, j$. Furthermore, since each $g_{i j}$ being a polynomial computed by a gate in $C$, they are computable by multi- $k$-ic circuits of size at most $s$. By induction, each $g_{i j}$ has a multi- $k$-ic $(\Sigma \Pi)^{\Delta-1}$ circuit of size at most

$$
s^{O\left((\Delta-1) \cdot(t / \log s)^{1 /(\Delta-1)}\right)}=s^{O\left((\Delta-1) \cdot(n k / \log s)^{1 / \Delta}\right)} .
$$

Replacing each $g_{i j}$ by this circuit, we obtain a $(\Sigma \Pi)^{\Delta}$ circuit of size at most

$$
s^{\prime} \cdot s^{O\left((\Delta-1) \cdot(n k / \log s)^{1 / \Delta}\right)}=s^{O\left(\Delta \cdot(k n / \log s)^{1 / \Delta}\right)}
$$

## 7 Open problems

The most interesting question that comes out of this work is to prove a lower bound of $n^{\omega(\sqrt{n / \log n})}$ for syntactically multilinear circuits of depth-4 for an explicit polynomial. A natural and first approach to this could be to understand if the shifted partials based methods can prove a lower bound of $n^{\Omega(\sqrt{d})}$ for homogeneous depth- 4 circuits for a polynomial family with degree $d=\omega(n / \log n)$.

Another question of interest would be to understand the correct exponent for the depth reduction results to depth- 4 (and also to higher depth) for various regimes of the degree $d$. From [9], we know that for $d=O\left(n^{\varepsilon}\right)$ for a small enough constant $\varepsilon, \sqrt{d}$ is the correct exponent, whereas for $d$ being nearly $n$, the results in this paper and those of Raz and Yehudayoff [12] show that the correct exponent is $\sqrt{n / \log n}$. But we do not understand this phenomenon for other values of $d$.

## References

1 Manindra Agrawal and V. Vinay. Arithmetic Circuits: A Chasm at Depth Four. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2008), pages 67-75, 2008. doi:10.1109/FOCS.2008. 32.
2 Eric Allender, Jia Jiao, Meena Mahajan, and V. Vinay. Non-Commutative Arithmetic Circuits: Depth Reduction and Size Lower Bounds. Theoretical Computer Science, 209(1-2):47-86, 1998. doi:10.1016/S0304-3975 (97) 00227-2.
3 Noga Alon, Mrinal Kumar, and Ben Lee Volk. Unbalancing Sets and an Almost Quadratic Lower Bound for Syntactically Multilinear Arithmetic Circuits. In CCC 2018, pages 1-16, 2018. arXiv:1708.02037. doi:10.4230/LIPIcs.CCC.2018.11.

4 Suryajith Chillara, Mrinal Kumar, Ramprasad Saptharishi, and V. Vinay. The Chasm at Depth Four, and Tensor Rank : Old results, new insights. CoRR, 2016. arXiv:1606.04200.
5 Ankit Gupta, Pritish Kamath, Neeraj Kayal, and Ramprasad Saptharishi. Approaching the Chasm at Depth Four. Journal of the $A C M, 61(6): 33: 1-33: 16,2014$. Preliminary version in the 28th Annual IEEE Conference on Computational Complexity (CCC 2013). doi: 10.1145/2629541.

6 Sumant Hegde and Chandan Saha. Improved Lower Bound for Multi- $r$-ic Depth Four Circuits as a Function of the Number of Input Variables. Proceedings of Indian National Science Academy, 83(4):907-922, 2017. doi:10.16943/ptinsa/2017/49224.

7 Neeraj Kayal. An exponential lower bound for the sum of powers of bounded degree polynomials. In Electronic Colloquium on Computational Complexity (ECCC)TR12-081, 2012. URL: http://eccc.hpi-web.de/report/2012/081/.
8 Pascal Koiran. Arithmetic Circuits: The Chasm at Depth Four Gets Wider. Theoretical Computer Science, 448:56-65, 2012. doi:10.1016/j.tcs.2012.03.041.
9 Mrinal Kumar and Shubhangi Saraf. On the power of homogeneous depth 4 arithmetic circuits. In Proceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2014), pages 364-373, 2014. doi:10.1109/FOCS.2014.46.
10 Ran Raz. Multi-Linear Formulas for Permanent and Determinant are of Super-Polynomial Size. J. ACM, 56(2):8:1-8:17, 2009. Preliminary version in the 36th Annual ACM Symposium on Theory of Computing (STOC 2004). doi:10.1145/1502793.1502797.
11 Ran Raz, Amir Shpilka, and Amir Yehudayoff. A lower bound for the size of syntactically multilinear arithmetic circuits. SIAM J. Comput., 38(4):1624-1647, 2008. Preliminary version in the 48 th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007). doi:10.1137/070707932.
12 Ran Raz and Amir Yehudayoff. Lower Bounds and Separations for Constant Depth Multilinear Circuits. Computational Complexity, 18(2):171-207, 2009. Preliminary version in the 23rd Annual IEEE Conference on Computational Complexity (CCC 2008). doi:10.1007/s00037-009-0270-8.
13 Amir Shpilka and Amir Yehudayoff. Arithmetic Circuits: A survey of recent results and open questions. Foundations and Trends in Theoretical Computer Science, 5:207-388, March 2010. doi:10.1561/0400000039.
14 Sébastien Tavenas. Improved bounds for reduction to depth 4 and depth 3. Inf. Comput., 240:2-11, 2015. Preliminary version in the 38th Internationl Symposium on the Mathematical Foundations of Computer Science (MFCS 2013). doi:10.1016/j.ic.2014.09.004.
15 Leslie G. Valiant, Sven Skyum, S. Berkowitz, and Charles Rackoff. Fast Parallel Computation of Polynomials Using Few Processors. SIAM J. Comput., 12(4):641-644, 1983. Preliminary version in the 6th Internationl Symposium on the Mathematical Foundations of Computer Science (MFCS 1981). doi:10.1137/0212043.

## A Proof of Lemma 19

Proof of Lemma 19. The proof shall proceed by induction on the height of $u$ (leaves are at height 0 ). We shall present the proof of (2); the proof of (1) is analogous.

Case 1: $u=u_{L} \times u_{R}$. For any $w$, we have that $[u: w]=1$ if $u=w$, and $[u: w]=$ $\left[u_{1}\right] \cdot\left[u_{2}: w\right]$ whenever $u \neq w$. In particular, since $|\operatorname{Var}(v)|<m \leq|\operatorname{Var}(u)|$ the LHS is $[u: v]=\left[u_{L}\right] \cdot\left[u_{R}: v\right]$.
If $\left|\operatorname{Var}\left(u_{R}\right)\right| \geq m$, then for any $(w, z) \in \mathbb{F}_{m, v}^{+}$or $\mathcal{F}_{m, v}^{\times}$we have $w \neq u$. Inducting on $u_{R}$,

$$
\begin{aligned}
\text { LHS } & =\left[u_{L}\right] \cdot\left[u_{R}: v\right] \\
& =\left[u_{L}\right] \cdot\left(\sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}\left[u_{R}: w\right] \cdot\left[w_{L}\right] \cdot[z: v]+\sum_{(w, z) \in \mathcal{F}_{m, v}^{+}}\left[u_{R}: w\right] \cdot[z: v]\right) \\
& =\sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}\left[u_{L}\right] \cdot\left[u_{R}: w\right] \cdot\left[w_{L}\right] \cdot[z: v]+\sum_{(w, z) \in \mathcal{F}_{m, v}^{+}}\left[u_{L}\right] \cdot\left[u_{R}: w\right] \cdot[z: v] \\
& =\sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}[u: w] \cdot\left[w_{L}\right] \cdot[z: v]+\sum_{(w, z) \in \mathcal{F}_{m, v}^{+}}[u: w] \cdot[z: v]=\text { RHS. }
\end{aligned}
$$

On the other hand, if $\left|\operatorname{Var}\left(u_{R}\right)\right|<m$ then $[u: w]=0$ for any $w \neq u$ with $|\operatorname{Var}(w)| \geq m$. Hence,

$$
\begin{aligned}
\text { RHS } & =\sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}[u: w] \cdot\left[w_{L}\right] \cdot[z: v]+\sum_{(w, z) \in \mathcal{F}_{m, v}^{+}}[u: w] \cdot[z: v] \\
& =[u: u] \cdot\left[u_{L}\right] \cdot\left[u_{R}: v\right]=[u: v]=\text { LHS. }
\end{aligned}
$$

Case 2: $u=u_{1}+u_{2}$. For any $w$, we have that $[u: w]=1$ if $u=w$, and $[u: w]=\left[u_{1}:\right.$ $w]+\left[u_{2}: w\right]$ whenever $u \neq w$. In particular, since $|\operatorname{Var}(v)|<m \leq|\operatorname{Var}(u)|$ the LHS is $[u: v]=\left[u_{1}: v\right]+\left[u_{2}: v\right]$.
Since $u$ is a + gate, $\left(u, u_{j}\right) \notin \mathcal{F}_{m, v}^{\times}$for any $j$. If $\left|\operatorname{Var}\left(u_{j}\right)\right|<m$ for some $j$, then the edge $\left(u, u_{j}\right) \in \mathcal{F}_{m, v}^{+}$. Hence,

$$
\begin{aligned}
\text { RHS } & =\sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}[u: w] \cdot\left[w_{L}\right] \cdot[z: v]+\sum_{(w, z) \in \mathcal{F}_{m, v}^{+}}[u: w] \cdot[z: v] \\
& =:
\end{aligned} T_{1}+\quad T_{2}
$$

In $T_{1}$, since every $(w, z) \in \mathcal{F}_{m, v}^{\times}$has $w \neq u$ we have

$$
\begin{aligned}
T_{1} & :=\sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}\left(\sum_{i}\left[u_{i}: w\right]\right) \cdot\left[w_{L}\right] \cdot[z: v] \\
& =\sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}\left(\sum_{i:\left|\operatorname{Var}\left(u_{i}\right)\right| \geq m}\left[u_{i}: w\right]\right) \cdot\left[w_{L}\right] \cdot[z: v] \quad\left(\text { since }\left[u_{j}: w\right]=0 \text { if }\left|\operatorname{Var}\left(u_{j}\right)\right|<m\right) \\
& =\sum_{i:\left|\operatorname{Var}\left(u_{i}\right)\right| \geq m} \sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}\left[u_{i}: w\right] \cdot\left[w_{L}\right] \cdot[z: v] .
\end{aligned}
$$

As for the other term, it can be written as

$$
\begin{aligned}
& T_{2}:=\sum_{\substack{(w, z) \in \mathcal{F}_{m, v}^{+}}}[u: w] \cdot[z: v] \\
&=\sum_{\substack{(w, z) \in \mathcal{F}_{m, v}^{+} \\
w \neq u}}[u: w] \cdot[z: v]+\sum_{j:\left|\operatorname{Var}\left(u_{j}\right)\right|<m}[u: u] \cdot\left[u_{j}: v\right] \\
&=\sum_{\substack{(w, z) \in \mathcal{F}_{m, v}^{+} \\
w \neq u}}\left(\sum_{i}\left[u_{i}: w\right]\right) \cdot[z: v]+\sum_{j:\left|\operatorname{Var}\left(u_{j}\right)\right|<m}\left[u_{j}: v\right] \\
&=\sum_{\substack{(w, z) \in \mathcal{F}_{m}^{+} \\
w \neq u}}\left(\sum_{i:\left|\operatorname{Var}\left(u_{i}\right)\right| \geq m}\left[u_{i}: w\right]\right) \cdot[z: v]+\sum_{j:\left|\operatorname{Var}\left(u_{j}\right)\right|<m}\left[u_{j}: v\right] \\
&\left.=\sum_{\substack{i:\left|\operatorname{Var}\left(u_{i}\right)\right| \geq m}}\left[u_{i}: w\right] \cdot[z: v]\right)+\mathcal{F}_{m, v}^{+} \\
& \sum_{j:\left|\operatorname{Var}\left(u_{j}\right)\right|<m}\left[u_{j}: v\right] .
\end{aligned}
$$

The last equality holds because $\left[u_{i}: u\right]=0$. Putting it together,

$$
\begin{aligned}
\text { RHS } & =T_{1}+T_{2} \\
& =\sum_{i:\left|\operatorname{Var}\left(u_{i}\right)\right| \geq m}\left(\sum_{(w, z) \in \mathcal{F}_{m, v}^{\times}}\left[u_{i}: w\right] \cdot\left[w_{L}\right] \cdot[z: v]+\sum_{(w, z) \in \mathcal{F}_{m, v}^{+}}\left[u_{i}: w\right] \cdot[z: v]\right)
\end{aligned}
$$

$$
\begin{aligned}
& +\sum_{j:\left|\operatorname{Var}\left(u_{j}\right)\right|<m}\left[u_{j}: v\right] \\
= & \sum_{i:\left|\operatorname{Var}\left(u_{i}\right)\right| \geq m}\left[u_{i}: v\right]+\sum_{j:\left|\operatorname{Var}\left(u_{j}\right)\right|<m}\left[u_{j}: v\right] \quad \text { (induction) } \\
= & {[u: v]=\text { LHS. } }
\end{aligned}
$$


[^0]:    1 Though this is not explicitly mentioned in these results, the proofs can be extended to this regime of parameters.
    2 Also referred to as a syntactically multilinear $(\Sigma \Pi)^{\Delta}$ circuit.
    3 A multilinear circuit is a multi- $k$-ic circuit for $k=1$.

[^1]:    ${ }^{4} v$-snipped parse-trees rooted at $u$ that have $w$ on its rightmost path

