Quasi-polynomial Hitting Sets for Circuits with Restricted Parse Trees by Saptharishi, Ramprasad & Tengse, Anamay
arXiv:1709.03068v2 [cs.CC] 26 Oct 2017
Quasi-polynomial Hitting Sets for Circuits
with Restricted Parse Trees
Ramprasad Saptharishi* Anamay Tengse†
Tata Institute of Fundamental Research, Mumbai, India
{ramprasad, tengse.anamay}@tifr.res.in
May 18, 2019
Abstract
We study the class of non-commutative Unambiguous circuits or Unique-Parse-Tree (UPT) cir-
cuits, and a related model of Few-Parse-Trees (FewPT) circuits (which were recently introduced by
Lagarde, Malod and Perifel [LMP16] and Lagarde, Limaye and Srinivasan [LLS17]) and give the fol-
lowing constructions:
• An explicit hitting set of quasipolynomial size for UPT circuits,
• An explicit hitting set of quasipolynomial size for FewPT circuits (circuits with constantly many
parse tree shapes),
• An explicit hitting set of polynomial size for UPT circuits (of known parse tree shape), when a
parameter of preimage-width is bounded by a constant.
The above three results extend the results of [AGKS15], [GKST15] and [GKS16] to the setting
of UPT circuits, and hence also generalize their results in the commutative world from read-once
oblivious algebraic branching programs (ROABPs) to UPT set-multilinear circuits.
The main idea is to study shufflings of non-commutative polynomials, which can then be used to
prove suitable depth reduction results for UPT circuits and thereby allow a careful translation of the
ideas in [AGKS15], [GKST15] and [GKS16].
1 Introduction
The field of algebraic complexity deals with classifying multivariate polynomials based on their hardness.
Typically, the complexity of a polynomial is measured by the size of the smallest circuit computing it (an
arithmetic circuit is a directed acyclic graph made up of internal nodes that are labelled with + or × and
leaves labelled with variables or constants from the field). The central question in this field is to construct
*Research supported by Ramanujan Fellowship of DST.
†Supported by a fellowship of the DAE.
an explicit family of polynomials ({Perm_n} is the top candidate) that requires large arithmetic circuits to
compute it. This is also called the “VP vs VNP” question (named after Valiant [Val79]), and is thought of as
an algebraic analogue of the “P vs NP” question.
So far, the best lower bound we have for general arithmetic circuits computing an n-variate degree d
polynomial is a barely super-linear $\Omega(n \log d)$ lower bound by Baur and Strassen [BS83]. Recent research
has focused on proving lower bounds for restricted classes of circuits, either by bounding the depth of
such circuits or by imposing other syntactic restrictions. One such syntactic restriction is to consider
non-commutative circuits, where we assume that the underlying variables $x_1, \ldots, x_n$ do not commute. In
the non-commutative model, there is an inherent order in which elements are multiplied, and this restricts
the way monomials can be computed ($xy \neq yx$ here, and hence $x^2 + 2xy + y^2 \neq (x+y)^2 =
x^2 + xy + yx + y^2$). It is therefore natural to expect that it should be easier to prove lower bounds in this
model.
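To make the non-commutative inequality above concrete, here is a small illustrative sketch (Python is used only for illustration; the helper name `expand_power` is hypothetical) that expands $(x+y)^2$ while keeping every monomial as an ordered word, so that xy and yx stay distinct.

```python
from collections import Counter
from itertools import product


def expand_power(variables, k):
    """Expand (v_1 + ... + v_n)**k without letting the variables commute.

    Each monomial is kept as a word (a tuple of variable names), so the
    words ("x", "y") and ("y", "x") are counted separately.
    """
    return Counter(product(variables, repeat=k))


square = expand_power(["x", "y"], 2)  # (x + y)^2 in the non-commutative ring
binomial = Counter({("x", "x"): 1, ("x", "y"): 2, ("y", "y"): 1})  # x^2 + 2xy + y^2

print(dict(square))
# {('x', 'x'): 1, ('x', 'y'): 1, ('y', 'x'): 1, ('y', 'y'): 1}
print(square == binomial)  # False: xy and yx are different words
```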
Nisan [Nis91] introduced the non-commutative model, specifically non-commutative algebraic
branching programs (ABPs). In his seminal paper, he showed that the non-commutative versions of the
determinant and permanent polynomials (among others) require exponential-size non-commutative
ABPs to compute them. In fact, using his technique, one can even reconstruct the smallest non-
commutative ABP given just oracle access to the polynomial (cf. [KS06])! Although we have exponential
lower bounds for non-commutative ABPs, we do not have any non-trivial lower bounds for non-
commutative circuits. Hrubeš, Wigderson and Yehudayoff [HWY10] presented an approach via sum-
of-squares lower bounds, but it has not yet yielded any non-trivial lower bounds for the class of general
non-commutative circuits.
Limaye, Malod and Srinivasan [LMS16] extended Nisan’s lower bound to non-commutative skew cir-
cuits, which are circuits where every multiplication gate has at most one child that is a non-leaf. Lagarde,
Malod and Perifel [LMP16] initiated the study of non-commutative unambiguous circuits, or Unique
Parse Tree (UPT) circuits. These circuits and their generalizations are the main models of study in this paper.
Arvind and Raja [AR16] also studied lower bounds for various subclasses of commutative set-multilinear
circuits, including analogues of UPT and FewPT circuits. They proved lower bounds for UPT and FewPT
set-multilinear circuits, and also for other subclasses of set-multilinear circuits called narrow set-multilinear
circuits and interval set-multilinear circuits; the lower bound for the latter assumes the sum-of-squares
conjecture of Hrubeš, Wigderson and Yehudayoff [HWY10].
1.1 The model of study
A parse tree of a circuit is obtained by starting at the root, and at every + gate choosing exactly one
child, and at every × gate choosing all its children (formally defined in Definition 2.1). Informally, a
parse tree of a circuit is a certificate of the computation of a monomial in the circuit. Lagarde, Malod
and Perifel [LMP16] introduced a subclass of non-commutative circuits called Unique Parse Tree (UPT)
circuits or unambiguous circuits, where all parse trees of the circuit have the same shape (formally defined
in Definition 2.2). The class of non-commutative UPT circuits subsumes the class of non-commutative
ABPs, as any ABP can be expressed as a left-skew circuit. A related model of set-depth-∆ formulas,
studied by Agrawal, Saha and Saxena [ASS13], is a subclass of UPT circuits where the underlying
parse trees are extremely regular¹.
Lagarde, Malod and Perifel [LMP16] extended the techniques of Nisan [Nis91] to give exponential
lower bounds for UPT circuits. Subsequently, Lagarde, Limaye and Srinivasan [LLS17] extended the lower
bounds to the class of circuits with parse trees of not-too-many shapes (at most $2^{o(n)}$ shapes).
In Figure 1, (a) is an example of a UPT circuit with (b) being the underlying parse tree shape; (c) is an
example of a circuit with two distinct parse tree shapes.
[Figure 1 diagrams (a), (b), (c): circuits of + and × gates; (a) and (c) take inputs x1, x2, x3, x4, and (b) is the parse tree shape of (a).]
Figure 1: Examples of circuits with restricted parse trees
1.2 Polynomial identity testing
A Polynomial Identity Test (PIT) is an algorithm that, given a circuit as input, checks whether the circuit
computes the zero polynomial. The standard Ore-DeMillo-Lipton-Schwartz-Zippel lemma [Ore22,
DL78, Sch80, Zip79] provides a simple randomized algorithm, but the goal is to construct an efficient
deterministic PIT. A stronger notion is black-box PIT, where we are only given evaluation access to the
circuit. Hence, a black-box PIT is essentially equivalent to constructing a hitting set, i.e., a set H of points
(or matrices, in the case of non-commutative polynomials) such that every non-zero polynomial from the
class of interest is guaranteed to evaluate to a non-zero value on some element a ∈ H.
PITs that use the structure of the circuit are called white-box PITs.
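For contrast with the deterministic goal, the following is a minimal sketch of the randomized black-box test given by the Ore-DeMillo-Lipton-Schwartz-Zippel lemma, for commuting variables only (non-commutative black boxes would have to be evaluated on matrices, which is not shown). It assumes a black box that returns evaluations modulo a prime; the prime $2^{61}-1$, the number of trials and all function names are illustrative choices, not taken from the paper.

```python
import random


def probably_zero(black_box, num_vars, degree, prime=2**61 - 1, trials=20, rng=None):
    """Randomized black-box identity test based on the Schwartz-Zippel lemma.

    `black_box(point, prime)` is assumed to return the polynomial's value at
    `point` modulo `prime`. A non-zero polynomial of total degree `degree`
    vanishes at a uniformly random point of F_prime^num_vars with probability
    at most degree / prime, so one non-zero evaluation proves non-zeroness.
    """
    if rng is None:
        rng = random.Random()
    for _ in range(trials):
        point = [rng.randrange(prime) for _ in range(num_vars)]
        if black_box(point, prime) % prime != 0:
            return False  # witness found: definitely not the zero polynomial
    # A non-zero polynomial reaches this line with probability <= (degree / prime) ** trials.
    return True


def identity(p, q):
    # (y1 + y2)^2 - (y1^2 + 2 y1 y2 + y2^2): identically zero for commuting variables
    return ((p[0] + p[1]) ** 2 - (p[0] ** 2 + 2 * p[0] * p[1] + p[1] ** 2)) % q


def not_identity(p, q):
    # y1 y2 - 1: a non-zero polynomial of degree 2
    return (p[0] * p[1] - 1) % q


rng = random.Random(0)
print(probably_zero(identity, 2, 2, rng=rng))      # True
print(probably_zero(not_identity, 2, 2, rng=rng))  # False, except with negligible probability
```

The error is one-sided: the sketch never reports a zero polynomial as non-zero.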
The task of constructing efficient PITs is intimately connected to the task of proving lower bounds
[HS80, KI04, Agr05]. Once we have a lower bound for a class $\mathcal{C}$, it is natural to ask whether we can
also construct efficient PITs for that class. Raz and Shpilka [RS05] gave the first deterministic polynomial-time
white-box PIT for the class of non-commutative ABPs. Forbes and Shpilka [FS13] gave a quasipolynomial
($n^{O(\log n)}$) size hitting set for non-commutative ABPs. This was achieved by studying a natural commutative
analogue of non-commutative ABPs, namely the class of Read-Once Oblivious Algebraic
Branching Programs (ROABPs), where the variables are read in a “known order”.
¹ The formula is levelled, and all nodes at a level have the same fan-in.
The class of ROABPs is interesting in its own right owing to its connection with the “RL vs L” ques-
tion. In fact, many of the hitting set constructions for ROABPs have been inspired by Nisan’s [Nis92]
pseudorandom generator for RL (which has seed length $O(\log^2 n)$). As mentioned earlier, Forbes and
Shpilka gave a hitting set of size $n^{O(\log n)}$ for polynomial-size ROABPs when the order in which the
variables are read is known. Agrawal, Gurjar, Korwar and Saxena [AGKS15] presented a different hitting
set for the class of commutative ROABPs that does not need to know the order in which the variables are read.
Subsequently, Gurjar, Korwar, Saxena and Thierauf [GKST15] studied polynomials that can be computed
as a sum of constantly many ROABPs (of possibly different orders) and presented a polynomial-time
white-box PIT, and also a quasipolynomial-time black-box PIT, for this class.
Lagarde, Malod and Perifel [LMP16], besides presenting lower bounds for non-commutative UPT
circuits, also gave a polynomial-time white-box PIT for this class. This was extended by Lagarde, Limaye
and Srinivasan [LLS17] to a white-box algorithm for non-commutative circuits with constantly many
parse tree shapes (analogous to the result of [GKST15]). The question of constructing black-box PITs was
left open by them, and we answer it in this paper.
1.3 Our results
Polynomial Identity Testing
Our main results are hitting sets for the class of polynomials computed by UPT circuits and related
classes.
Theorem 1.1 (Hitting sets for UPT circuits). There is an explicit hitting set $\mathcal{H}_{d,n,s}$ of size at most
$(snd)^{O(\log d)}$ for the class of degree d n-variate homogeneous non-commutative polynomials in
$\mathbb{F}\langle x_1,\ldots,x_n\rangle$ that are computed by UPT circuits of size at most s.
This result builds on the technique of basis isolating weight assignments introduced by [AGKS15] for
constructing hitting sets for ROABPs. Furthermore, we can also extend the hitting set to the class of non-
commutative circuits that have few shapes (analogous to [GKST15]’s hitting set for sum of few ROABPs).
Theorem 1.2 (Hitting sets for circuits with few parse tree shapes). There is an explicit hitting set
$\mathcal{H}_{d,n,s,k}$ of size at most $(s^{2^k} nd)^{O(\log d)}$ for the class of n-variate degree d homogeneous
non-commutative polynomials in $\mathbb{F}\langle x_1,\ldots,x_n\rangle$ that are computed by non-commutative
circuits of size at most s consisting of parse trees of at most k shapes.
Both the above theorems are fully black-box in the sense that it is not required to know the under-
lying shape(s). For the case of non-commutative ABPs (and more generally, ROABPs in a known order),
Gurjar, Korwar and Saxena [GKS16] presented a more efficient hitting set when the width of the ABP is
small. For UPT circuits, there is a natural notion of preimage-width of a UPT circuit (formally defined
in Definition 2.3) that corresponds to the notion of width of an ABP. We show an analogue of the hitting
set of Gurjar, Korwar and Saxena for the class of UPT circuits of small preimage-width if the underlying
shape of the parse trees is known.
Theorem 1.3 (Hitting sets for known-shape low-width UPT circuits). Let $\mathcal{C}_{n,d,T,w}$ be the class of
n-variate degree d non-commutative polynomials that are computable by UPT circuits of preimage-width
at most w with underlying parse-tree shape T. Over any field of zero or large characteristic, there is an
explicit hitting set $\mathcal{H}_{n,d,T,w}$ of size $w^{O(\log d)} \cdot \mathrm{poly}(nd)$ for $\mathcal{C}_{n,d,T,w}$.
These hitting sets also translate to natural commutative analogues such as UPT set-multilinear circuits
(formally defined in Definition 5.1).
Structural results
If f is a non-commutative polynomial of degree d and σ ∈ S_d is a permutation on d letters, we define
the shuffling of f by σ (denoted by $\Delta_\sigma(f)$) as the natural operation of permuting each word of f
according to σ.
All three PIT results above build on the following depth reduction for
UPT circuits.
Theorem 1.4 (Depth reduction for UPT circuits). Let f be an n-variate degree d polynomial that is com-
putable by a UPT circuit of preimage-width w. Then, there is some $\sigma \in S_d$ such that $\Delta_\sigma(f)$ can be
computed by a UPT circuit of $O(\log d)$ depth and preimage-width at most $O(w^2)$.
The above theorem implies that ∆σ( f ) is computable by an ABP of quasipolynomial size. We also
show that this blow-up of quasipolynomial size is tight.
Theorem 1.5 (Separating UPT circuits and ABPs, under shuffling). There is an explicit n-variate degree
d non-commutative polynomial f that is computable by UPT circuits of preimage-width $w = \mathrm{poly}(n,d)$
such that for every $\sigma \in S_d$, the polynomial $\Delta_\sigma(f)$ requires non-commutative ABPs of size
$(nd)^{\Omega(\log nd)}$ to compute it.
We also extend the lower bound of [LMP16] to give a polynomial computed by a skew circuit that
requires exponential-size UPT circuits under any shuffling. Details are in Appendix B.
1.4 Proof ideas
As mentioned, the starting point of all these results is the depth reduction. From a result of Nisan [Nis91],
the palindrome polynomial $\mathrm{Pal}_d$ is known to require ABPs of size $2^{\Omega(d)}$, even though it can be
computed by a polynomial-size UPT circuit. Therefore, $\mathrm{Pal}_d$ cannot be computed by a polynomial-size
circuit of depth $o(d/\log d)$.
The key insight here is that even though $\mathrm{Pal}_d$ cannot be computed by small-depth non-commutative
circuits, a shuffling of the palindrome is
$$\sum_{w_1,\ldots,w_d \in [n]} x_{w_1}x_{w_1}x_{w_2}x_{w_2}\cdots x_{w_d}x_{w_d} = \prod_{i=1}^{d}\left(x_1x_1 + \cdots + x_nx_n\right),$$
which is of course computable even by an $O(\log d)$-depth UPT formula. Hence we attempt to reduce the
depth under a suitable shuffling.
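The identity above can be checked by brute force for small n and d. The sketch below assumes the usual definition $\mathrm{Pal}_d = \sum_{w \in [n]^d} x_{w_1}\cdots x_{w_d}\,x_{w_d}\cdots x_{w_1}$ (it is not restated in this section), stores polynomials as dictionaries from words to coefficients, and uses the shuffling with σ(2i−1) = i and σ(2i) = 2d+1−i (written 0-indexed in the code), which moves each letter next to its mirror image. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def palindrome(n, d):
    """Pal_d as word -> coefficient: each w in [n]^d followed by its reverse."""
    return Counter(w + w[::-1] for w in product(range(n), repeat=d))


def shuffle(poly, sigma):
    """Delta_sigma with 0-indexed sigma: new letter at position j is old letter at sigma[j]."""
    out = Counter()
    for word, coeff in poly.items():
        out[tuple(word[i] for i in sigma)] += coeff
    return out


def pairing_sigma(d):
    """Put position i next to its mirror 2d-1-i, so equal letters become adjacent."""
    sigma = []
    for i in range(d):
        sigma += [i, 2 * d - 1 - i]
    return sigma


def product_of_squares(n, d):
    """prod_{i=1}^{d} (x_1 x_1 + ... + x_n x_n), expanded into words."""
    poly = Counter({(): 1})
    for _ in range(d):
        poly = Counter({word + (a, a): c for word, c in poly.items() for a in range(n)})
    return poly


print(shuffle(palindrome(2, 3), pairing_sigma(3)) == product_of_squares(2, 3))  # True
```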
In order to establish the depth reduction (Theorem 1.4) we follow the strategy of Valiant, Skyum,
Berkowitz and Rackoff [VSBR83] and Allender, Jiao, Mahajan and Vinay [AJMV98] but make use of the
UPT structure (work with different frontier nodes and gate quotients) based on the underlying shape of
the parse trees. It was pointed out to us that the key ideas in our proof of depth reduction were used by
Arvind and Raja ([AR16]) for a commutative analogue of UPT circuits.
This depth reduction immediately yields that there is a quasipolynomial sized ABP computing a shuf-
fling of f . We show that this blow-up is tight (Theorem 1.5) by essentially following the proof of Hrubeš
and Yehudayoff [HY16] to separate monotone ABPs and monotone circuits in the commutative world.
In order to obtain hitting sets for UPT circuits, one could simply use the fact that there is a
quasipolynomial-size ABP computing a shuffling of f, together with the known hitting sets for
non-commutative ABPs [FS13], to obtain a hitting set of size $\mathrm{poly}(ndw)^{O(\log^2 d)}$. However, we work
directly with the UPT circuit and lift the technique of basis isolating weight assignments of Agrawal, Gurjar,
Korwar and Saxena [AGKS15] to this more general setting to obtain Theorem 1.1. Theorem 1.3 is a
straightforward generalization of the ideas of Gurjar, Korwar and Saxena [GKS16] once we observe that the
depth reduction keeps the preimage-width small.
Theorem 1.2 essentially follows the same ideas of Gurjar, Korwar, Saxena and Thierauf [GKST15]. The
techniques of [GKST15] are general enough that once a circuit class has a characterizing set of dependen-
cies and a basis isolating weight assignment, there is a natural method to lift the techniques to work with
the sum of few elements from this class. [GKST15] use this for ROABPs and we use this for UPT circuits.
To summarize, once we obtain the depth reduction, most of the results in this paper are a careful
translation of prior work of [HY16], [AGKS15], [GKST15], [GKS16] to the setting of UPT (or FewPT)
circuits. Consequently, this also generalizes the hitting sets of [AGKS15, GKST15, GKS16] from ROABPs to
UPT (or FewPT) set-multilinear circuits. Such a generalization was unknown prior to this work.
2 Preliminaries
2.1 Notation
• We use F〈x1, . . . ,xn〉 to refer to the ring of polynomials in non-commuting variables {x1, . . . ,xn}.
For a parameter d , we use F〈x1, . . . ,xn〉deg=d to refer to the set of polynomials in F〈x1, . . . ,xn〉 that
are homogeneous and of degree d . Similarly, F〈x1, . . . ,xn〉deg≤d refers to the set of polynomials of
degree at most d .
• We use boldface letters x and y to denote sets of variables (the number of variables would be clear
from context). We shall also use [d ] to refer to the set {1,2, . . . ,d }.
• The paper will sometimes shift between the commutative and the non-commutative domains.
We use x whenever we are talking about non-commutative variables, and y, z for variables in the
commutative domain.
2.2 Basic definitions
UPT and FewPT circuits
Definition 2.1 (Parse trees). A parse tree T of a circuit C is a tree obtained as follows:
• the root of C is the root of T ,
• if v ∈ T is a × gate, then all the children of v in C are the children of v in T, in the same order,
• if v ∈ T is a + gate, then exactly one child of v in C is a child of v in T .
The value of the parse tree T , denoted by [T ], is just the product of the leaf labels in T . ♦
Intuitively, a parse tree is a certificate that a monomial was produced in the computation of C (though
it could potentially be cancelled by other parse trees computing the same monomial). Therefore, if f is
the polynomial computed by C, then
$$f = \sum_{T \text{ is a parse tree}} [T].$$
Definition 2.2. (UPT and FewPT circuits) A circuit C computing a homogeneous polynomial is said to be
a Unique Parse Tree (UPT) circuit if all parse trees of C have the same shape (that is, they are identical
except perhaps for the gate names).
A circuit C that computes a homogeneous polynomial is said to be a FewPT(k) circuit if the parse trees
of C have at most k distinct shapes. ♦
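As an illustration of Definitions 2.1 and 2.2, the sketch below enumerates the parse trees of a small circuit, records each parse tree's shape (gate names and leaf labels forgotten) and its value [T], and sums the values to recover f. The circuit encoding, the gate names and the helper functions are hypothetical; leaves are assumed to be variables only (no field constants), so every parse tree contributes coefficient 1.

```python
from collections import Counter
from itertools import product

# Hypothetical toy encoding: gate name -> ("var", label) | ("+", children) | ("*", children)
CIRCUIT = {
    "x1": ("var", "x1"), "x2": ("var", "x2"),
    "x3": ("var", "x3"), "x4": ("var", "x4"),
    "g1": ("+", ["x1", "x2"]), "g2": ("+", ["x3", "x4"]),
    "m1": ("*", ["g1", "g2"]), "m2": ("*", ["g2", "g1"]),
    "m3": ("*", ["x1", "g2"]),
    "upt_root": ("+", ["m1", "m2"]),    # both summands have the same shape
    "fewpt_root": ("+", ["m1", "m3"]),  # two different shapes
}


def parse_trees(circuit, gate):
    """Yield (shape, word) for every parse tree rooted at `gate`.

    The shape forgets gate names and leaf labels; the word is [T], the
    ordered product of the leaf labels.
    """
    kind, data = circuit[gate]
    if kind == "var":
        yield "leaf", (data,)
    elif kind == "+":
        for child in data:  # a + gate keeps exactly one child
            for shape, word in parse_trees(circuit, child):
                yield ("+", shape), word
    else:  # a product gate keeps all children, in order
        for combo in product(*(list(parse_trees(circuit, c)) for c in data)):
            shape = ("*", tuple(s for s, _ in combo))
            word = tuple(letter for _, w in combo for letter in w)
            yield shape, word


def analyse(circuit, root):
    """Return (number of distinct shapes, polynomial as a Counter of words)."""
    shapes, poly = set(), Counter()
    for shape, word in parse_trees(circuit, root):
        shapes.add(shape)
        poly[word] += 1  # f = sum over parse trees T of [T]
    return len(shapes), poly


n_upt, f_upt = analyse(CIRCUIT, "upt_root")
n_few, f_few = analyse(CIRCUIT, "fewpt_root")
print(n_upt, len(f_upt))            # 1 8
print(n_few, f_few[("x1", "x3")])   # 2 2
```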
Definition 2.3 (Preimage-width). Suppose C is a UPT circuit and say T is the shape of the underlying parse
trees. For a node τ ∈ T and a gate g ∈ C, we shall say that g is a preimage of τ, denoted by g ∼ τ, if and only
if there is some parse tree T′ of C where the gate g appears in position τ.
The preimage-width of a UPT circuit C is the largest number of preimages of any node τ ∈ T. That is,
$$\text{preimage-width}(C) = \max_{\tau \in T} \left|\{g \in C : g \sim \tau\}\right|.$$
♦
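Preimage-width can be computed for a toy UPT circuit by recording, for every parse tree, which gate sits at which node τ of the shape (a node is named by its path of child indices from the root). The encoding and names below are hypothetical, the positions are only meaningful because the circuit is UPT, and input gates are treated as gates, so leaf positions count too; that reading of "gate g ∈ C" is an assumption of this sketch.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy UPT circuit: gate name -> ("var", label) | ("+", children) | ("*", children)
CIRCUIT = {
    "x1": ("var", "x1"), "x2": ("var", "x2"),
    "x3": ("var", "x3"), "x4": ("var", "x4"),
    "g1": ("+", ["x1", "x2"]), "g2": ("+", ["x3", "x4"]),
    "m1": ("*", ["g1", "g2"]), "m2": ("*", ["g2", "g1"]),
    "root": ("+", ["m1", "m2"]),
}


def placements(circuit, gate, pos=()):
    """Yield one list of (position, gate) pairs per parse tree rooted at `gate`.

    A position is a path of child indices in the shape T.
    """
    kind, data = circuit[gate]
    if kind == "var":
        yield [(pos, gate)]
    elif kind == "+":
        for child in data:  # the chosen child is the only child, at index 0
            for rest in placements(circuit, child, pos + (0,)):
                yield [(pos, gate)] + rest
    else:
        parts = [list(placements(circuit, c, pos + (i,))) for i, c in enumerate(data)]
        for combo in product(*parts):
            yield [(pos, gate)] + [pair for part in combo for pair in part]


def preimages(circuit, root):
    """Map each node tau of T to the set {g : g ~ tau}."""
    result = defaultdict(set)
    for tree in placements(circuit, root):
        for pos, gate in tree:
            result[pos].add(gate)
    return result


def preimage_width(circuit, root):
    return max(len(gates) for gates in preimages(circuit, root).values())


print(sorted(preimages(CIRCUIT, "root")[(0,)]))  # ['m1', 'm2']
print(preimage_width(CIRCUIT, "root"))           # 4 (the leaf positions see x1..x4)
```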
It is clear that if C is a UPT circuit of preimage-width w computing a homogeneous degree d poly-
nomial, then the size of C is at most dw . The preimage-width of a UPT circuit is a more useful measure
to study than the size of the circuit. A simple concrete example of this is that the standard conversion of
homogeneous ABPs to homogeneous circuits in fact yields UPT circuits. Furthermore, the width of the
ABP is directly related to the preimage-width of the resulting UPT circuit.
Observation 2.4. If f is computable by a width w homogeneous algebraic branching program, then f can
be equivalently computed by UPT circuits of preimage-width w2.
×p-products
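Definition 2.5 (×p-products). For any $d_1, d_2 \ge 0$ and p satisfying $0 \le p \le d_2$, define $\times_p$ as the unique
bilinear map $\times_p : \mathbb{F}\langle x_1,\ldots,x_n\rangle^{\deg=d_1} \times \mathbb{F}\langle x_1,\ldots,x_n\rangle^{\deg=d_2} \to \mathbb{F}\langle x_1,\ldots,x_n\rangle^{\deg=d_1+d_2}$ that satisfies
$$x_{w_1}\cdots x_{w_{d_1}} \times_p x_{v_1}\cdots x_{v_{d_2}} = x_{v_1}\cdots x_{v_p}\, x_{w_1}\cdots x_{w_{d_1}}\, x_{v_{p+1}}\cdots x_{v_{d_2}}.$$
♦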
For instance, the usual multiplication (or concatenation) operation is just ×0.
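The $\times_p$-product is easy to implement on polynomials stored as dictionaries from words to coefficients. The following sketch (function name hypothetical) extends the defining identity bilinearly; with p = 0 it reduces to ordinary concatenation, matching the remark above.

```python
from collections import Counter


def times_p(f, g, p):
    """The bilinear product f x_p g on homogeneous polynomials.

    f and g map words (tuples of variable indices) to coefficients. Each word
    of f is inserted after the first p letters of each word of g, as in
    Definition 2.5, so 0 <= p <= deg(g) is required.
    """
    out = Counter()
    for w, a in f.items():
        for v, b in g.items():
            if not 0 <= p <= len(v):
                raise ValueError("need 0 <= p <= deg(g)")
            out[v[:p] + w + v[p:]] += a * b
    return out


f = Counter({(1,): 1})    # x1
g = Counter({(2, 3): 1})  # x2 x3
for p in range(3):
    print(p, dict(times_p(f, g, p)))
# 0 {(1, 2, 3): 1}   i.e. x1 x2 x3, ordinary concatenation
# 1 {(2, 1, 3): 1}   i.e. x2 x1 x3
# 2 {(2, 3, 1): 1}   i.e. x2 x3 x1
```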
Shuffling of a polynomial
Definition 2.6 (Shuffling of a non-commutative polynomial). Let $P_d(x_1,\ldots,x_n) \in \mathbb{F}\langle x_1,\ldots,x_n\rangle^{\deg=d}$ be a
homogeneous degree d non-commutative polynomial. Given any permutation σ ∈ S_d on d letters, we
define the shuffling of $P_d$ via σ using the unique linear map $\Delta_\sigma : \mathbb{F}\langle x_1,\ldots,x_n\rangle^{\deg=d} \to \mathbb{F}\langle x_1,\ldots,x_n\rangle^{\deg=d}$
obtained by linearly extending
$$\Delta_\sigma(x_{w_1}\cdots x_{w_d}) = x_{w_{\sigma(1)}}\cdots x_{w_{\sigma(d)}}.$$
♦
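A direct transcription of Definition 2.6, as a sketch with hypothetical names: σ is stored 0-indexed, so the word $(w_0, \ldots, w_{d-1})$ becomes $(w_{\sigma[0]}, \ldots, w_{\sigma[d-1]})$. Since $\Delta_\sigma$ permutes the words of length d bijectively, no two words merge, and applying $\Delta_\sigma$ and then $\Delta_\tau$ is again a shuffling, by the permutation $k \mapsto \sigma(\tau(k))$.

```python
from collections import Counter


def shuffle(poly, sigma):
    """Delta_sigma from Definition 2.6, with sigma given 0-indexed.

    `poly` maps words of length d to coefficients; sigma is a permutation of
    range(d). The word (w_0, ..., w_{d-1}) becomes (w[sigma[0]], ..., w[sigma[d-1]]).
    """
    out = Counter()
    for word, coeff in poly.items():
        out[tuple(word[i] for i in sigma)] += coeff
    return out


f = Counter({(1, 2, 3): 1, (3, 2, 1): 2})  # x1 x2 x3 + 2 x3 x2 x1
print(dict(shuffle(f, (2, 0, 1))))          # {(3, 1, 2): 1, (1, 3, 2): 2}
```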
2.3 Basic lemmas
Canonical UPT circuits, and types of gates
We shall say that a UPT circuit C with underlying parse tree shape T is canonical if for every gate g ∈C
there is some node τ ∈ T such that every parse tree of C involving g has g only in position τ. In other
words, every gate of the circuit has a unique type associated with it.
Lemma 2.7 ([LMP16]). Suppose $f \in \mathbb{F}\langle x_1,\ldots,x_n\rangle$ is a homogeneous, degree d, non-commutative poly-
nomial computed by a non-commutative UPT circuit of preimage-width w. Then, f can be equivalently
computed by a canonical UPT circuit of preimage-width w as well.
For a canonical UPT circuit where the parse trees have shape T , we shall say that g has type τ if τ ∈ T
is the unique node in T such that g ∼ τ.
Fix a τ ∈ T and let i be the number of leaves of the subtree rooted at τ, and let p be the number of
leaves to the left of τ in the inorder traversal of T . We shall then say that τ (or a gate g ∈C of type τ) has
position-type (i ,p). The following lemma allows us to write the polynomial computed by the circuit as a
small sum of ×p-products.
Lemma 2.8 ([LMP16]). Let f be a polynomial computed by a canonical UPT circuit C of preimage-width
w and say T is the shape of the underlying parse trees. If τ ∈ T with position-type (i ,p), then we can write
f as
$$f(\mathbf{x}) = \sum_{r=1}^{w} g_r(\mathbf{x}) \times_p h_r(\mathbf{x}),$$
where $\deg g_r = i$ and $\deg h_r = \deg(f) - i$ for all $r = 1, \ldots, w$.
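Position-types can be read off a parse-tree shape with one left-to-right traversal. In this sketch (hypothetical names and encoding), a shape is either "leaf" or a pair (label, children), nodes are named by their path of child indices, and p is taken to be the number of leaves preceding the whole subtree of τ. That is the reading under which a gate of type τ with position-type (i, p) contributes the degree-i factor inserted after the first p letters in Lemma 2.8; treat it as an interpretation rather than the paper's code.

```python
def position_types(shape):
    """Map each node (as a path of child indices) of a shape to its position-type (i, p).

    i = number of leaves below the node; p = number of leaves that precede the
    node's subtree in left-to-right order.
    """
    types = {}

    def visit(node, path, leaves_before):
        if node == "leaf":
            types[path] = (1, leaves_before)
            return 1
        _, children = node
        count = 0
        for idx, child in enumerate(children):
            count += visit(child, path + (idx,), leaves_before + count)
        types[path] = (count, leaves_before)
        return count

    visit(shape, (), 0)
    return types


# Degree-3 shape: a product of (a + node over one leaf) and (a product of two leaves).
shape = ("*", (("+", ("leaf",)), ("*", ("leaf", "leaf"))))
types = position_types(shape)
print(types[(1,)], types[(1, 1)])  # (2, 1) (1, 2)
```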
3 Depth reduction for UPT circuits
This section shall address Theorem 1.4, which we recall below.
Theorem 1.4 (Depth reduction for UPT circuits). Let f be an n-variate degree d polynomial that is com-
putable by a UPT circuit of preimage-width w. Then, there is some $\sigma \in S_d$ such that $\Delta_\sigma(f)$ can be
computed by a UPT circuit of $O(\log d)$ depth and preimage-width at most $O(w^2)$.
It was pointed out to us that a very similar depth reduction was also proved by Arvind and Raja [AR16].
They showed that a commutative UPT set-multilinear circuit can be depth-reduced to a corresponding
quasi-polynomial-size $O(\log d)$-depth UPT set-multilinear formula via Hyafil’s [Hya79] depth reduction.
Using techniques similar to [VSBR83], one can obtain a polynomial-size circuit of depth $O(\log d)$ while
maintaining unambiguity. Though this can be inferred from the results in [AR16], we state and prove it
in the form needed for the non-commutative setting.
3.1 UPT⊗-circuits
To prove the depth reduction, we will move to an intermediate model of UPT ⊗-circuits.
Definition 3.1 (UPT ⊗-circuits). The class of UPT ⊗-circuits is a generalization of homogeneous non-
commutative circuits in that the internal gates are + gates and ×p gates instead of the usual + and ×
gates. We shall also say that the circuit is semi-unbounded if all ×p gates have fan-in bounded by 2 (with
no restriction on + gates).
A parse tree for an ⊗-circuit is similar to parse trees in a general non-commutative circuit but the
internal nodes of the parse tree are labelled by + and×p (with the p specified at each gate).
We shall say that an ⊗-circuit C is UPT if every parse tree is of the same shape, i.e. two parse trees in C
can differ only in the gate names. ♦
To prove Theorem 1.4, we shall first depth reduce the circuit to obtain an ⊗-circuit computing f of
O(logd ) depth. Then, we will convert that to a UPT circuit that computes a shuffling of f .
Lemma 3.2 (Depth reducing to ⊗-circuits). Let $f \in \mathbb{F}\langle x_1,\ldots,x_n\rangle$ be a homogeneous degree d polynomial
that is computable by a UPT circuit of preimage-width s. Then, f can equivalently be computed by a
semi-unbounded UPT ⊗-circuit of preimage-width $O(s^2)$ and depth $O(\log d)$.
Proof. Let C be the UPT circuit computing $f(x_1, \ldots, x_n)$ and say T is the shape of the parse trees of C. For
any node τ ∈ T, let $F_\tau$ be the set of all gates in C whose position in T is τ. For two gates u, v ∈ C, we shall
say that u ⪰ v if the place of u in T is an ancestor of the place of v in T. We shall abuse notation and use
u ⪰ τ to mean that u’s position in T is an ancestor of τ ∈ T. For a gate u ∈ C, let [u] refer to the polynomial
computed at that gate. Similar to [VSBR83, AJMV98], we inductively define the following notion of a gate
quotient for any pair of gates u, v ∈ C:
$$[u : v] = \begin{cases}
0 & \text{if } u \not\succeq v,\\
1 & \text{if } u = v,\\
[u_1 : v] + [u_2 : v] & \text{if } u = u_1 + u_2,\\
[u_1 : v] \cdot [u_2] & \text{if } u = u_1 \times u_2 \text{ and } u_1 \succeq v,\\
[u_1] \cdot [u_2 : v] & \text{if } u = u_1 \times u_2 \text{ and } u_2 \succeq v.
\end{cases}$$
Claim 3.3. For any u ∈ C, if τ ∈ T is such that u ⪰ τ, then
$$[u] = \sum_{\substack{w \in C\\ w \sim \tau}} [w] \times_p [u : w] \tag{3.4}$$
for a suitable p depending only on τ and the type of u. Furthermore, suppose u, v ∈ C with v
a multiplication gate, and τ ∈ T is such that u ⪰ τ ⪰ v. Then
$$[u : v] = \sum_{\substack{w \in C\\ w \sim \tau}} [w : v] \times_p [u : w] \tag{3.5}$$
for a suitable p depending only on τ and the types of u and v.
We defer this proof and first finish the proof of Lemma 3.2. With (3.4) and (3.5), we can construct
the ⊗-circuit C′ for f just as in [VSBR83, AJMV98]. The circuit C′ has gates computing each [u]
and [u : v] for gates u, v ∈ C with u ⪰ v and v a multiplication gate. The wiring in C′ is built by
appropriate applications of (3.4) and (3.5).
Let u ∈ C and say $\deg[u] = d_u$. The plan is to set up the computation in C′ so that, using an
O(1)-depth computation, we can compute [u] from gates whose degrees are a constant factor smaller
than $d_u$. Consider any parse tree rooted at u, and starting from u follow the higher degree child. Let τ be
the last point on the path with degree $\ge d_u/2$ (so the degrees of its children are $< d_u/2$). Applying (3.4),
$$\begin{aligned}
[u] &= \sum_{w\sim\tau} [w] \times_p [u:w]\\
&= \sum_{w\sim\tau} \big([w_1]\times[w_2]\big) \times_p [u:w] \qquad \text{where } w = w_1\times w_2.
\end{aligned}$$
Now observe that each of the terms on the RHS, $[u : w]$, $[w_1]$ and $[w_2]$, has degree at most $d_u/2$, as we
wanted. Furthermore, as w runs over all w ∼ τ, the entries in each coordinate of the tuple
$([u : w], [w_1], [w_2])$ all have the same type.
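The choice of τ in the argument above depends only on the shape T, so it can be computed from T alone. The sketch below (hypothetical names; shapes encoded as "leaf" or (label, children), with degree equal to the number of leaves) walks from the root into a child of maximum degree and stops at the last node of degree at least $d_u/2$. For a left comb, the shape of a left-skew circuit, it lands on the node of degree exactly half.

```python
def degree(node):
    """Number of leaves under a shape node; a node is "leaf" or (label, children)."""
    if node == "leaf":
        return 1
    return sum(degree(child) for child in node[1])


def node_at(shape, path):
    """Follow a path of child indices from the root of the shape."""
    for idx in path:
        shape = shape[1][idx]
    return shape


def halving_node(shape):
    """Path to tau: walk down from the root, always into a child of maximum
    degree, and stop at the last node whose degree is still >= deg(root) / 2.
    Every child of that node has degree < deg(root) / 2.
    """
    half = degree(shape) / 2
    path, node = (), shape
    while node != "leaf":
        idx, child = max(enumerate(node[1]), key=lambda pair: degree(pair[1]))
        if degree(child) < half:
            break
        path, node = path + (idx,), child
    return path


# A left comb of degree 8, as produced by a left-skew circuit.
comb = "leaf"
for _ in range(7):
    comb = ("*", (comb, "leaf"))

tau = halving_node(comb)
print(tau, degree(node_at(comb, tau)))  # (0, 0, 0, 0) 4
```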
We now need to show how to compute [u : v] for a pair u ≻ v. Say $\deg[u] = d_u$ and $\deg[v] = d_v$. For
this, start with some parse tree rooted at u and walk down the path leading to the place of v, and let τ be
the last point on this path such that $\deg\tau \ge (d_u + d_v)/2$. Using (3.5),
$$\begin{aligned}
[u:v] &= \sum_{w\sim\tau} [w:v] \times_p [u:w]\\
&= \sum_{w\sim\tau} \big([w_1]\times[w_2:v]\big) \times_p [u:w]
\end{aligned}$$
where $w = w_1 \times w_2$ and $w_2 \succeq v$ (the other possibility is identical). By the choice of τ, we have
$\deg[u : w], \deg[w_2 : v] \le (d_u - d_v)/2$. However, the best bound we can give on $\deg[w_1]$ is $d_u - d_v$.
Nevertheless, we can apply (3.4) again on $[w_1]$ by finding a suitable $\tau' \prec w_1$ satisfying
$\deg\tau' \ge \deg w_1/2$, and write
$$\begin{aligned}
[u:v] &= \sum_{w\sim\tau} \big([w_1]\times[w_2:v]\big) \times_p [u:w]\\
&= \sum_{w\sim\tau} \Big(\Big(\sum_{w'\sim\tau'} [w'] \times_{p'} [w_1:w']\Big) \times [w_2:v]\Big) \times_p [u:w]\\
&= \sum_{w\sim\tau}\ \sum_{w'\sim\tau'} \Big(\Big(\big([w'_1]\times[w'_2]\big) \times_{p'} [w_1:w']\Big) \times [w_2:v]\Big) \times_p [u:w]
\end{aligned}$$
where $w' = w'_1 \times w'_2$. By the choice of τ and τ′, each of the factors on the RHS has degree at most
$(d_u - d_v)/2$, as we wanted. Furthermore, once again, all of the summands consist of similarly typed factors.
This naturally yields an ⊗-circuit computing f of depth $O(\log d)$ and size poly(s). Since all summands
consist of similarly typed factors, it follows that the circuit is UPT as well.
Proof of Claim 3.3. The proof is by induction. As a base case, suppose u ∼ τ. Among all gates w ∈ C
with w ∼ τ, only [u : u] = 1, and every other [u : w] = 0. Therefore, clearly
$[u] = \sum_{w\sim\tau} [w] \cdot [u : w]$.
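Now suppose u ≻ τ, and say we already know that $[u'] = \sum_{w\sim\tau} [w] \times_p [u' : w]$ for every u′ with
u ≻ u′ ⪰ τ. If $u = u_1 + u_2$, then
$$\begin{aligned}
[u] &= [u_1] + [u_2]\\
&= \Big(\sum_{w\sim\tau} [w] \times_p [u_1:w]\Big) + \Big(\sum_{w\sim\tau} [w] \times_p [u_2:w]\Big)\\
&= \sum_{w\sim\tau} [w] \times_p \big([u_1:w] + [u_2:w]\big)\\
&= \sum_{w\sim\tau} [w] \times_p [u:w].
\end{aligned}$$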
Similarly, suppose $u = u_1 \times u_2$. We have two cases depending on whether $u_1 \succeq \tau$ or $u_2 \succeq \tau$.
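If $u_1 \succeq \tau$, then
$$\begin{aligned}
[u] &= [u_1] \times [u_2]\\
&= \Big(\sum_{w\sim\tau} [w] \times_p [u_1:w]\Big) \times [u_2]\\
&= \sum_{w\sim\tau} [w] \times_p \big([u_1:w] \times [u_2]\big)\\
&= \sum_{w\sim\tau} [w] \times_p [u:w].
\end{aligned}$$
If $u_2 \succeq \tau$, then, writing $d_1 = \deg[u_1]$,
$$\begin{aligned}
[u] &= [u_1] \times [u_2]\\
&= [u_1] \times \Big(\sum_{w\sim\tau} [w] \times_p [u_2:w]\Big)\\
&= \sum_{w\sim\tau} [w] \times_{p+d_1} \big([u_1] \times [u_2:w]\big)\\
&= \sum_{w\sim\tau} [w] \times_{p+d_1} [u:w].
\end{aligned}$$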
Essentially the same proof works for (3.5) as well.
Lemma 3.6 (⊗-circuits to circuits for a shuffling). Let $f \in \mathbb{F}\langle x_1,\ldots,x_n\rangle$ be a homogeneous degree d poly-
nomial that is computable by a UPT ⊗-circuit C′ of size s. Consider the circuit C″ obtained by replacing
every $\times_p$ gate in C′ by a × gate. Then, C″ computes $\Delta_\sigma(f)$ for some $\sigma \in S_d$.
Proof. We prove this by induction, using a slightly stronger inductive hypothesis: the choice of
permutation σ depends only on the shape of the parse trees in C′.
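Say u is the root of C′. Suppose u is a + gate, say $u = u_1 + u_2 + \cdots + u_r$. If $u' = u'_1 + \cdots + u'_r$ is the
resulting computation in C″, then by the inductive hypothesis there is a $\sigma \in S_d$ such that
$[u'_i] = \Delta_\sigma([u_i])$ for every i (the same σ works for all i, since all the $u_i$ have parse trees of the same
shape). Therefore,
$$[u'] = \sum_{i=1}^{r} \Delta_\sigma([u_i]) = \Delta_\sigma([u]).$$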
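Suppose $u = u_1 \times_p u_2$ with $\deg[u_1] = d_1$ and $\deg[u_2] = d_2$. Say $[u_1] = \sum_{\alpha\in[n]^{d_1}} a_\alpha x^\alpha$ and
$[u_2] = \sum_{\beta\in[n]^{d_2}} b_\beta x^\beta$. Then, $[u] = \sum_{\alpha,\beta} a_\alpha b_\beta \cdot x^\alpha \times_p x^\beta$. If $u'$, $u'_1$ and $u'_2$ are the
resulting gates in C″, then
$$\begin{aligned}
[u'] &= [u'_1] \times [u'_2]\\
&= \Delta_{\sigma_1}([u_1]) \times \Delta_{\sigma_2}([u_2]) && \text{for some } \sigma_1 \in S_{d_1},\ \sigma_2 \in S_{d_2},\\
&= \sum_{\alpha,\beta} a_\alpha b_\beta \cdot \big(\Delta_{\sigma_1}(x^\alpha) \times \Delta_{\sigma_2}(x^\beta)\big)\\
&= \sum_{\alpha,\beta} a_\alpha b_\beta \cdot \Delta_\sigma(x^\alpha \times_p x^\beta) && \text{for some } \sigma \in S_d \text{ depending only on } \sigma_1, \sigma_2, p,\\
&= \Delta_\sigma([u]).
\end{aligned}$$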
Together, Lemma 3.2 and Lemma 3.6 yield Theorem 1.4.
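The step "for some σ ∈ S_d" in the proof of Lemma 3.6 can be made explicit. The following sketch (all names hypothetical, permutations 0-indexed) computes σ from $\sigma_1$, $\sigma_2$ and p alone, and checks $\Delta_\sigma(x^\alpha \times_p x^\beta) = \Delta_{\sigma_1}(x^\alpha)\,\Delta_{\sigma_2}(x^\beta)$ on random words. Because σ never looks at α or β, the same σ serves every monomial, which is what the inductive hypothesis needs.

```python
import random


def times_p_word(alpha, beta, p):
    """x^alpha x_p x^beta on single words: insert alpha after the first p letters of beta."""
    return beta[:p] + alpha + beta[p:]


def apply_perm(sigma, word):
    """Delta_sigma on a single word, with 0-indexed sigma."""
    return tuple(word[i] for i in sigma)


def merge_sigma(sigma1, sigma2, p):
    """A permutation sigma such that, for all words alpha, beta of lengths
    len(sigma1), len(sigma2):
        apply_perm(sigma, times_p_word(alpha, beta, p))
            == apply_perm(sigma1, alpha) + apply_perm(sigma2, beta)
    It depends only on sigma1, sigma2 and p, never on the words.
    """
    d1 = len(sigma1)
    left = [p + s for s in sigma1]                    # alpha occupies slots p .. p + d1 - 1
    right = [s if s < p else s + d1 for s in sigma2]  # beta is split around alpha
    return left + right


rng = random.Random(0)
d1, d2, p = 3, 4, 2
sigma1, sigma2 = rng.sample(range(d1), d1), rng.sample(range(d2), d2)
sigma = merge_sigma(sigma1, sigma2, p)
for _ in range(1000):
    alpha = tuple(rng.randrange(5) for _ in range(d1))
    beta = tuple(rng.randrange(5) for _ in range(d2))
    assert apply_perm(sigma, times_p_word(alpha, beta, p)) == \
        apply_perm(sigma1, alpha) + apply_perm(sigma2, beta)
print(sorted(sigma) == list(range(d1 + d2)))  # True: sigma is a permutation
```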
The following corollary is immediate from the fact that any circuit of depth D and size s can be com-
puted by a formula of size $s^{O(D)}$, and hence by an ABP of size $s^{O(D)}$.
Corollary 3.7. If $f \in \mathbb{F}\langle x_1,\ldots,x_n\rangle$ is a homogeneous degree d polynomial that is computable by a UPT
circuit of size s, then there is some $\sigma \in S_d$ such that $\Delta_\sigma(f)$ is computable by a non-commutative algebraic
branching program of size $s^{O(\log d)}$.
Furthermore, the shuffling σ that permits this can also be efficiently computed given the underlying
shape for the circuit computing f .
3.2 UPT circuits of constant width
For a UPT circuit C , we shall say that its width is w if for every node τ in the shape T , there are at most
w gates of C that have type τ. The following observation is evident from the proof of the above depth
reduction.
Observation 3.8. If C is a UPT circuit of width w, then the depth reduced circuit C ′ as obtained in Theo-
rem 1.4 has width O(w2).
This observation allows us to obtain a more efficient hitting set for the class of small-width known-shape
UPT circuits. Details are in Section C.2.
4 Separating ROABPs and UPT circuits
Theorem 1.5 (Separating UPT circuits and ABPs, under shuffling). There is an explicit n-variate degree
d non-commutative polynomial f that is computable by UPT circuits of preimage-width $w = \mathrm{poly}(n,d)$
such that for every $\sigma \in S_d$, the polynomial $\Delta_\sigma(f)$ requires non-commutative ABPs of size
$(nd)^{\Omega(\log nd)}$ to compute it.
The polynomial and the proof technique described here were introduced by Hrubeš and Yehudayoff
[HY16] to separate monotone circuits and monotone ABPs in the commutative regime. The polynomial
described here is a non-commutative analogue of the polynomial used by [HY16]. Much of the proof is
also the argument of [HY16] tailored to the non-commutative setting.
4.1 The polynomial
Let $T_d$ denote the complete binary tree of depth d (with $2^d$ leaves) and let $D = 2^{d+1} - 1$ denote the number
of nodes in $T_d$. We shall say that a colouring $\gamma : T_d \to \mathbb{Z}_m$ is legal if for every internal node $u \in T_d$ with
children v and w, we have $\gamma(u) = \gamma(v) + \gamma(w) \bmod m$.
Let $v_1, \ldots, v_D$ be the vertices of $T_d$ listed in an in-order manner (left subtree listed inductively, then
the root, and then the right subtree listed inductively). We now define the non-commutative polynomial
$P_d(x_1, \ldots, x_m) \in \mathbb{F}\langle x_1,\ldots,x_m\rangle$ of degree $D = 2^{d+1} - 1$ as
$$P_d(x_1,\ldots,x_m) = \sum_{\substack{\gamma \in [m]^D\\ \gamma \text{ is legal}}} x_{\gamma(v_1)} x_{\gamma(v_2)} \cdots x_{\gamma(v_D)}. \tag{4.1}$$
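For small m and d, $P_d$ can be enumerated directly. A legal colouring is determined by its leaf colours (each internal colour is the sum of its children's colours modulo m), so the sketch below loops over leaf colourings and fills in internal nodes bottom-up; it should therefore produce exactly $m^{2^d}$ distinct words of length D, each with coefficient 1. It uses heap numbering of $T_d$ and identifies the variable $x_c$ with colour c in $\mathbb{Z}_m$; both choices, and the function names, are assumptions of this sketch.

```python
from collections import Counter
from itertools import product


def in_order(d):
    """Nodes of T_d in heap numbering (root 1, children 2k and 2k + 1, leaves
    2^d .. 2^(d+1) - 1), listed left subtree, root, right subtree."""
    def walk(k):
        if k >= 2 ** d:
            return [k]
        return walk(2 * k) + [k] + walk(2 * k + 1)
    return walk(1)


def P(d, m):
    """P_d as a Counter of words; variable x_c is identified with colour c in Z_m."""
    order = in_order(d)
    leaves = range(2 ** d, 2 ** (d + 1))
    poly = Counter()
    for leaf_colours in product(range(m), repeat=2 ** d):
        colour = dict(zip(leaves, leaf_colours))
        for k in range(2 ** d - 1, 0, -1):  # internal nodes, children before parents
            colour[k] = (colour[2 * k] + colour[2 * k + 1]) % m
        poly[tuple(colour[v] for v in order)] += 1
    return poly


p2 = P(2, 3)
print(len(p2), set(p2.values()), {len(w) for w in p2})  # 81 {1} {7}
```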
Lemma 4.2 (Upper bound). For every m, d > 0, the polynomial $P_d(x_1, \ldots, x_m)$ can be computed by a non-
commutative UPT circuit of size $O(m^2 d)$.
(Refer to Appendix A for a proof.)
Theorem 4.3 (Lower bound). For every permutation $\sigma \in S_D$, any non-commutative ABP computing the
polynomial $\Delta_\sigma(P_d)$ has width $m^{\Omega(d)}$.
Hence for $d = \log m$, the polynomial $P_d(x_1, \ldots, x_m)$ is computable by a UPT circuit of size $O(m^2 \log m)$,
but for every $\sigma \in S_D$ the above theorem tells us that $\Delta_\sigma(P_d)$ requires ABPs of width $m^{\Omega(\log m)}$ to compute
it. The lower bound follows along exactly the same lines as in [HY16]. A proof is present in Appendix A.
5 Hitting sets for non-commutative models
Commutative brethren of non-commutative models
A reduction to an appropriate commutative case was used by Forbes and Shpilka [FS13] to reduce constructing hitting sets for non-commutative ABPs to constructing hitting sets for commutative ROABPs (more precisely, for set-multilinear ABPs). They studied the image of the non-commutative polynomial under the map $\Psi : \mathbb{F}\langle x_1, \dots, x_n\rangle_{\deg = d} \to \mathbb{F}[y_{1,1}, \dots, y_{d,n}]$, which is the unique $\mathbb{F}$-linear map given by $\Psi : x_{w_1} \cdots x_{w_d} \mapsto y_{1,w_1} \cdots y_{d,w_d}$.
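As an illustration, the map $\Psi$ can be sketched on sparse polynomials. This is a minimal sketch under our own representation choices (hypothetical, not from [FS13]): a non-commutative polynomial is a dictionary from words $(w_1, \dots, w_d)$ to coefficients, and a commutative set-multilinear monomial $y_{1,w_1} \cdots y_{d,w_d}$ is a frozenset of pairs $(i, w_i)$.

```python
def psi(f):
    """Apply the F-linear map Psi: x_{w1} ... x_{wd} -> y_{1,w1} ... y_{d,wd}.

    f maps words (tuples of variable indices) to coefficients; the result maps
    set-multilinear monomials, encoded as frozensets of (position, index) pairs,
    to coefficients."""
    g = {}
    for word, c in f.items():
        mono = frozenset((i + 1, w) for i, w in enumerate(word))
        g[mono] = g.get(mono, 0) + c
    return {mono: c for mono, c in g.items() if c != 0}
```

Because the position $i$ is recorded in each variable $y_{i,w_i}$, distinct words go to distinct monomials; for instance $x_1x_2 - x_2x_1$ is sent to $y_{1,1}y_{2,2} - y_{1,2}y_{2,1} \neq 0$.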
For the model of non-commutative UPT circuits, the appropriate commutative model is a restriction of set-multilinear circuits that we call UPT set-multilinear (UPT-SML) circuits.
Definition 5.1 (Set-multilinear circuits). Let $y = y_1 \sqcup \cdots \sqcup y_d$ be a partition of the variables. A circuit $C$ computing a polynomial $f \in \mathbb{F}[y]$ is said to be a set-multilinear circuit with respect to the above partition if:
• each gate $g \in C$ is labelled by a subset $S_g \subseteq [d]$, and $g$ computes a polynomial over the variables $\bigcup_{i \in S_g} y_i$ in which every monomial of $[g]$ is divisible by exactly one variable in $y_i$ for each $i \in S_g$,
• if $g$ is a $+$ gate, then the subset that labels $g$ also labels each of its children,
• if $g$ is a $\times$ gate with children $g_1$ and $g_2$, then the subsets $S_{g_1}$ and $S_{g_2}$ labelling $g_1$ and $g_2$ respectively form a partition of $S_g$, i.e. $S_g = S_{g_1} \sqcup S_{g_2}$. ♦
We shall say the circuit $C$ is UPT set-multilinear if every parse tree of $C$ is of the same shape and identically labelled. That is, if $g$ and $g'$ are $\times$ gates labelled by a set $S \subseteq [d]$, and if $g = g_1 \times g_2$ with $S_1$ and $S_2$ labelling $g_1$ and $g_2$, then the children of $g'$ are also labelled by $S_1$ and $S_2$ respectively.
We shall say the set-multilinear circuit C is FewPT(k) set-multilinear if the circuit consists of parse
trees of at most k different shapes.
A natural generalization that will be useful later is a multi-output UPT set-multilinear circuit, which is a UPT set-multilinear circuit that potentially has multiple output gates, all labelled with the same subset.
Forbes and Shpilka [FS13] showed that constructing hitting sets for these commutative models suffices for the non-commutative models, via a simple reduction (details in Section C.1). We shall therefore focus on these commutative models for the hitting set constructions. Moreover, since we have already seen that such circuits can be depth reduced² to $O(\log d)$ depth, it suffices to construct a hitting set for $O(\log d)$-depth UPT and FewPT set-multilinear circuits.
5.1 Preliminaries for PIT
Weight assignments and basis isolation
To construct hitting sets for ROABPs, Agrawal, Gurjar, Korwar and Saxena [AGKS15] defined the notion
of basis isolating weight assignments for associated vector spaces of polynomials. The description pre-
sented here is an adaptation of the approach of [AGKS15] to set-multilinear circuits of small depth.
Definition 5.2 (Basis Isolating Weight Assignment (BIWA)). A weight assignment is a function $\mathrm{wt} : y \to [M]^k$, for some positive integer $M$, that can then be extended to all multilinear monomials over $y$ via
$$\mathrm{wt}\Big(\prod_{i \in S} y_i\Big) = \sum_{i \in S} \mathrm{wt}(y_i).$$
♦
Let $V$ be a vector space of polynomials in $\mathbb{F}[y]$, which can also be thought of as a matrix with a generating set of polynomials listed out as rows (with each column being indexed by a monomial in $y$).
Such a weight assignment wt is said to be a basis isolating weight assignment for $V$ if there exists a basis of its column space, indexed by $B \subseteq \mathrm{Mons}(y)$, such that
1. if $m_1, m_2 \in B$ and $m_1 \neq m_2$, then $\mathrm{wt}(m_1) \neq \mathrm{wt}(m_2)$,
2. for every $m \notin B$,
$$V_m \in \operatorname{span}\{V_{m'} : m' \in B,\ \mathrm{wt}(m') \prec \mathrm{wt}(m)\},$$
where by $V_m$ we mean the column of $V$ indexed by the monomial $m$, and $\prec$ is the lexicographic ordering on $[M]^k \subset \mathbb{N}^k$.
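The two conditions can be checked mechanically on small examples. The following sketch (our own illustration; the function names are hypothetical) works over $\mathbb{Q}$ with exact `fractions.Fraction` arithmetic. It processes weights in increasing lexicographic order and relies on the observation that wt is a BIWA exactly when each weight class contains at most one column lying outside the span of all columns of strictly smaller weight; those columns then form the isolated basis $B$.

```python
from fractions import Fraction


def rank(vectors):
    """Rank over Q of a list of equal-length vectors (Gaussian elimination)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r, ncols = 0, (len(rows[0]) if rows else 0)
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_biwa(V, weights):
    """V: generating polynomials as rows, columns indexed by monomials.
    weights: one tuple per column, compared lexicographically (Python tuple order).
    Returns the isolated basis as a list of column indices, or None."""
    cols = [[row[j] for row in V] for j in range(len(weights))]
    basis = []
    for w in sorted(set(weights)):
        lower = [cols[j] for j in range(len(cols)) if weights[j] < w]
        base = rank(lower)
        # Columns of weight w that are not spanned by strictly lighter columns.
        new = [j for j in range(len(cols))
               if weights[j] == w and rank(lower + [cols[j]]) > base]
        if len(new) > 1:
            return None          # two such columns cannot both be isolated
        basis += new
    return basis
```

For example, with $V$ spanned by $y_1 + y_2$ and $y_3$, weights $(1), (2), (3)$ isolate the basis $\{y_1, y_3\}$, while giving $y_1$ and $y_2$ the same weight fails.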
Lemma 5.3 ([AGKS15]). Let $V$ be a vector space of polynomials in $\mathbb{F}[y]$ and say $f \in V$. If $\mathrm{wt} : y \to [M]^k$ is a BIWA for $V$, then for $\mathbf{t} = \{t_1, \dots, t_k\}$,
$$f(y_1, \dots, y_n) \neq 0 \iff f\big(\mathbf{t}^{\mathrm{wt}(y_1)}, \dots, \mathbf{t}^{\mathrm{wt}(y_n)}\big) \neq 0$$
(where $\mathbf{t}^{(\alpha_1, \dots, \alpha_k)}$ is shorthand for $t_1^{\alpha_1} \cdots t_k^{\alpha_k}$).
If $f \neq 0$ and $\deg(f) \le d$, then $f(\mathbf{t}^{\mathrm{wt}(y_1)}, \dots, \mathbf{t}^{\mathrm{wt}(y_n)})$ is a non-zero $k$-variate polynomial of degree at most $dM$ in each $t_j$. Hence, the Schwartz-Zippel lemma yields a hitting set of size $(dM+1)^k$.
² The shuffling just reorders the partition of the set-multilinear circuit.
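The substitution $y_i \mapsto \mathbf{t}^{\mathrm{wt}(y_i)}$ is easy to sketch for sparse multilinear polynomials. The snippet below (an illustration with hypothetical names, not code from [AGKS15]) maps each monomial to the exponent vector of its image and collects coefficients. The example shows why isolation matters: a weight assignment that gives two monomials of $f$ the same weight can make the image vanish even though $f \neq 0$.

```python
def substitute(f, wt):
    """Compute f(t^{wt(y_1)}, ..., t^{wt(y_n)}) for a multilinear f.

    f: dict from monomials (tuples of 0-indexed variable indices) to coefficients.
    wt: list of k-tuples; wt[i] is the weight of y_i.
    Returns a dict from exponent vectors in (t_1, ..., t_k) to coefficients."""
    k = len(wt[0])
    g = {}
    for mono, c in f.items():
        # The weight of a monomial is the coordinate-wise sum of its variables' weights.
        e = tuple(sum(wt[i][j] for i in mono) for j in range(k))
        g[e] = g.get(e, 0) + c
    return {e: c for e, c in g.items() if c != 0}
```

Here $f = y_1y_4 - y_2y_3$ (written with 0-indexed variables in the code): the weights $1, 2, 3, 4$ send both monomials to $t^5$ and the image cancels, whereas the weights $1, 2, 4, 8$ give $t^9 - t^6 \neq 0$.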
Definition 5.4 (Separating small sets of monomials). Let $S$ be an arbitrary set of monomials over $y$. We shall say that a weight assignment $\mathrm{wt} : y \to \mathbb{N}$ separates $S$ if for every distinct $m, m' \in S$ we have $\mathrm{wt}(m) \neq \mathrm{wt}(m')$. ♦
Lemma 5.5 ([AB03]). Let $S$ be an arbitrary set of $r$ multilinear monomials of degree at most $d$ over variables $y = \{y_{ij} : i \in [d], j \in [n]\}$. For a prime $p$, let $w_p : y \to \mathbb{N}$ be a weight assignment given by
$$w_p(y_{i,j}) = 2^{(i-1)n + (j-1)} \bmod p.$$
Then for all but at most $\binom{r}{2} \cdot n^2$ primes $p$, the weight assignment $w_p$ separates $S$.
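A small sketch of the weight assignment $w_p$ and of a search for a separating prime (hypothetical helper names; trial division is used only because the primes involved here are tiny). Monomials are sets of index pairs $(i, j)$ standing for $y_{i,j}$, and, as in Definition 5.2, the weight of a monomial is the sum of the weights of its variables, without a further reduction mod $p$.

```python
def w_p(i, j, n, p):
    """Weight of y_{i,j} (1-indexed): 2^((i-1)n + (j-1)) mod p."""
    return pow(2, (i - 1) * n + (j - 1), p)


def weight(mono, n, p):
    """Weight of a multilinear monomial given as a set of (i, j) pairs."""
    return sum(w_p(i, j, n, p) for i, j in mono)


def separates(S, n, p):
    """True if w_p gives pairwise distinct weights to the monomials in S."""
    ws = [weight(m, n, p) for m in S]
    return len(set(ws)) == len(ws)


def primes():
    """Infinite generator of primes by trial division (fine for small values)."""
    q = 2
    while True:
        if all(q % r for r in range(2, int(q ** 0.5) + 1)):
            yield q
        q += 1


def first_separating_prime(S, n):
    return next(p for p in primes() if separates(S, n, p))
```

With $n = 2$, the monomials $y_{1,1}y_{2,2}$ and $y_{1,2}y_{2,1}$ have exponents $\{0, 3\}$ and $\{1, 2\}$; the prime 3 gives both of them weight 3, so it is one of the excluded primes, while 2 and 5 separate them.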
BIWAs for subspaces and products
Agrawal, Gurjar, Korwar and Saxena [AGKS15] constructed BIWAs for polynomials computed by ROABPs. The following two lemmas are slight abstractions of the key ideas in [AGKS15], so that they can also be applied in our setting. For the sake of completeness, the proofs are provided in Section C.1.
Lemma 5.6 (BIWA for subspaces). Say $V$ is a vector space of polynomials and suppose wt is a BIWA for $V$. If $V'$ is a subspace of $V$, then wt is a BIWA for $V'$ as well.
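Lemma 5.7 (BIWA for variable disjoint products). Say $V_1 \subseteq \mathbb{F}[y]$ and $V_2 \subseteq \mathbb{F}[z]$ are two vector spaces of polynomials over disjoint sets of variables, each of dimension at most $s$. Suppose
$$\mathrm{wt}_1 : y \to \mathbb{N}^k, \qquad \mathrm{wt}_2 : z \to \mathbb{N}^k$$
are BIWAs for $V_1$ and $V_2$ isolating bases $B_1$ and $B_2$ respectively. If $w : y \cup z \to \mathbb{N}$ is a weight assignment that separates $B_1 \cdot B_2 = \{m_1 m_2 : m_1 \in B_1, m_2 \in B_2\}$, then the weight assignment $\mathrm{wt} : y \cup z \to \mathbb{N}^{k+1}$ defined by
$$\mathrm{wt}(y_i) = (\mathrm{wt}_1(y_i), w(y_i)) \text{ for all } y_i \in y, \qquad \mathrm{wt}(z_i) = (\mathrm{wt}_2(z_i), w(z_i)) \text{ for all } z_i \in z,$$
is a BIWA for $V = V_1 \cdot V_2 = \operatorname{span}\{f \cdot g : f \in V_1, g \in V_2\}$.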
5.2 Hitting sets for UPT set-multilinear circuits
Theorem 5.8 (Hitting sets for UPT set-multilinear circuits). Let $\mathcal{C}$ be the class of $n$-variate degree $d$ set-multilinear polynomials (with respect to $y = y_1 \sqcup \cdots \sqcup y_d$) that are computable by UPT set-multilinear circuits of preimage-width $w$ and depth $r$. Then, for $M = \big(\binom{w}{2} n^2 d + 1\big)^2$ and any set $A \subseteq \mathbb{F}$ with $|A| > dM$, the set
$$H = \Big\{(b_{11}, \dots, b_{dn}) : \mathbf{p} \in [M]^r,\ a_k \in A,\ b_{ij} = a_1^{\,j} \prod_{k=1}^{r} a_{k+1}^{\,2^{(i-1)n+(j-1)} \bmod p_k}\Big\}$$
is a hitting set for $\mathcal{C}$ of size $\mathrm{poly}(ndw)^r$.
The proof of this theorem is obtained by constructing what is called a basis isolating weight assign-
ment for polynomials simultaneously computed by a multi-output UPT-SML circuit, heavily borrowing
from the ideas in [AGKS15].
Proof. Suppose $f(y)$ is a polynomial that is computable by a UPT set-multilinear circuit $C$ with respect to $y = y_1 \sqcup \cdots \sqcup y_d$, and say $C$ has preimage-width $w$ and depth $r$.
Since $C$ is a UPT set-multilinear circuit, let $T$ be the common shape of its parse trees. For each $\tau \in T$, we define the vector space
$$V_\tau = \operatorname{span}\{[g] : g \in C,\ g \sim \tau\}.$$
The following claim relates the vector space corresponding to nodes in T to the vector spaces corre-
sponding to the children.
Claim 5.9. If τ∈ T labels a + gate and if τ′ is the unique child of τ, then Vτ ⊆Vτ′ .
If τ∈ T labels a × gate and has children τ1 and τ2, then Vτ is a subspace of Vτ1 ·Vτ2 .
Proof. Suppose $\tau \in T$ labels a $+$ gate and say $\tau'$ is the unique child of $\tau$ in $T$. Pick an arbitrary $g \in C$ such that $g \sim \tau$. If $[g] = [g_1] + \cdots + [g_s]$, then each $g_i \sim \tau'$. Therefore, $[g_i] \in V_{\tau'}$, and $[g] = [g_1] + \cdots + [g_s]$ implies that $[g] \in V_{\tau'}$. Since $g$ was an arbitrary gate of type $\tau$, it follows that $V_\tau$ is a subspace of $V_{\tau'}$.
Say τ labels a × gate, and say τ1 and τ2 are the children of τ. Pick an arbitrary gate g ∈C
with g ∼ τ. If [g ] = [g1]× [g2] then g1 ∼ τ1 and g2 ∼ τ2. But that implies that [g1] ∈ Vτ1 and
[g2] ∈ Vτ2 and therefore [g ] ∈ Vτ1 ·Vτ2 . Once again, since the choice of g was arbitrary, we get
Vτ is a subspace of Vτ1 ·Vτ2 . (Claim 5.9)
Define the multiplication height of any gate $g$, denoted by $|g|_\times$, as the largest number of $\times$ gates encountered on a path from $g$ to a leaf. Starting with the leaves, we shall build towards a BIWA for $V_{\mathrm{root}}$, which by Lemma 5.3 also yields a hitting set.
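Let $\mathcal{P}$ be the set of the first $\big(dn^2\binom{w}{2} + 1\big)$ primes. For each $0 \le k \le r$ and $\mathbf{p} = (p_1, \dots, p_k) \in \mathcal{P}^k$, define the function
$$\Omega^{(k)}_{\mathbf{p}} : y \to \mathbb{N}^{k+1}, \qquad \Omega^{(k)}_{\mathbf{p}} : y_{ij} \mapsto \big(j,\ 2^{(i-1)n+(j-1)} \bmod p_1,\ \dots,\ 2^{(i-1)n+(j-1)} \bmod p_k\big).$$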
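The plan is to use $\Omega^{(k)}_{\mathbf{p}}$ to build BIWAs for each $V_\tau$. For a $\tau \in T$ with $|\tau|_\times = k$, let $S_\tau \subseteq [d]$ be the subset of indices labelling $\tau$. Define $\mathrm{wt}^{(\tau)}_{\mathbf{p}}$ to be the restriction of $\Omega^{(k)}_{\mathbf{p}}$ to $\bigcup_{i \in S_\tau} y_i$:
$$\mathrm{wt}^{(\tau)}_{\mathbf{p}} : \bigcup_{i \in S_\tau} y_i \to \mathbb{N}^{k+1}, \qquad \mathrm{wt}^{(\tau)}_{\mathbf{p}}(y_{ij}) = \Omega^{(k)}_{\mathbf{p}}(y_{ij}).$$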
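We shall prove, by induction, that for each $0 \le k \le r$ there is a $\mathbf{p} \in \mathcal{P}^k$ such that for every $\tau \in T$ with $|\tau|_\times \le k$, the weight assignment $\mathrm{wt}^{(\tau)}_{\mathbf{p}}$ is a BIWA for $V_\tau$ (where, if $|\tau|_\times < k$, only the first $|\tau|_\times$ coordinates of $\mathbf{p}$ are used).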
If $\tau$ is a leaf of $T$, then any gate of type $\tau$ just computes a variable from a single part $y_i$. Clearly, $\mathrm{wt}^{(\tau)}_{\mathbf{p}} : y_{ij} \mapsto j$ is a BIWA, as it gives distinct weights to all variables of a part. Hence, $\mathrm{wt}^{(\tau)}_{\mathbf{p}}$ is a BIWA for $V_\tau$ whenever $\tau$ is a leaf.
If $\tau$ is not a leaf but $|\tau|_\times = 0$, then neither $\tau$ nor its descendants are $\times$ gates. Hence, the subtree at $\tau$ is a path ending at a unique leaf $\ell$, and all the nodes along this path are $+$ gates. By Claim 5.9, $V_\tau$ is a subspace of $V_\ell$ and hence, by Lemma 5.6, $\mathrm{wt}^{(\tau)}_{\mathbf{p}} = \mathrm{wt}^{(\ell)}_{\mathbf{p}}$ is a BIWA for $V_\tau$. That finishes the base case of $k = 0$.
Suppose we have proved the claim up to $k-1$. Let $T_k$ be the set of all nodes of multiplication height at most $k$ that are $\times$ gates. By the inductive hypothesis, there exists $\mathbf{p} \in \mathcal{P}^{k-1}$ such that $\mathrm{wt}^{(\tau')}_{\mathbf{p}}$ is a BIWA for all $V_{\tau'}$ with $|\tau'|_\times < k$. Fix such a $\mathbf{p}$. For each $\tau \in T_k$, its children $\tau_1, \tau_2$ must have multiplication height at most $k-1$. Since $C$ is set-multilinear, the subsets of indices that label $\tau_1$ and $\tau_2$ must be disjoint. Say $S_1$ and $S_2$ are the subsets of indices labelling $\tau_1$ and $\tau_2$ respectively.
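Hence, by Claim 5.9, $V_\tau$ is a subspace of $V_{\tau_1} \cdot V_{\tau_2}$. By our inductive hypothesis, we know that $\mathrm{wt}^{(\tau_1)}_{\mathbf{p}}$ and $\mathrm{wt}^{(\tau_2)}_{\mathbf{p}}$ are BIWAs for $V_{\tau_1}$ and $V_{\tau_2}$ respectively. Observe that $\Omega^{(k-1)}_{\mathbf{p}}$ restricted to the appropriate subset of variables is a refinement of the weight assignments $\mathrm{wt}^{(\tau_1)}_{\mathbf{p}}$ and $\mathrm{wt}^{(\tau_2)}_{\mathbf{p}}$ (as $|\tau_1|_\times$ or $|\tau_2|_\times$ could have been smaller than $k-1$). Nevertheless, appending coordinates only refines the lexicographic order (distinct weights stay distinct and strict inequalities are preserved), so if $\mathrm{wt}^{(\tau_1)}_{\mathbf{p}}$ and $\mathrm{wt}^{(\tau_2)}_{\mathbf{p}}$ are BIWAs for $V_{\tau_1}$ and $V_{\tau_2}$ respectively, then the following weight assignments
$$\mathrm{wt}_1 : \bigcup_{i \in S_1} y_i \to \mathbb{N}^k,\ \ \mathrm{wt}_1 : y_{ij} \mapsto \Omega^{(k-1)}_{\mathbf{p}}(y_{ij}), \qquad \mathrm{wt}_2 : \bigcup_{i \in S_2} y_i \to \mathbb{N}^k,\ \ \mathrm{wt}_2 : y_{ij} \mapsto \Omega^{(k-1)}_{\mathbf{p}}(y_{ij})$$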
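are also BIWAs for $V_{\tau_1}$ and $V_{\tau_2}$ respectively. By Lemma 5.7, Lemma 5.6 and Lemma 5.5, for all but at most $\binom{w}{2} n^2$ primes $p \in \mathcal{P}$, the weight assignment defined by
$$\mathrm{wt} : \bigcup_{i \in S_1 \cup S_2} y_i \to \mathbb{N}^{k+1}, \qquad \mathrm{wt}(y_{ij}) = \begin{cases} \big(\mathrm{wt}_1(y_{ij}),\ 2^{(i-1)n+(j-1)} \bmod p\big) & \text{if } i \in S_1,\\ \big(\mathrm{wt}_2(y_{ij}),\ 2^{(i-1)n+(j-1)} \bmod p\big) & \text{if } i \in S_2, \end{cases} \;=\; \big(\Omega^{(k-1)}_{\mathbf{p}}(y_{ij}),\ 2^{(i-1)n+(j-1)} \bmod p\big)$$
is a BIWA for $V_\tau$. For different $\tau$s in $T_k$ there may be a different set of $\binom{w}{2} n^2$ primes that we should exclude. But since $T_k$ has at most $d$ nodes and the set $\mathcal{P}$ contains $\binom{w}{2} n^2 d + 1$ primes, there is a prime $p \in \mathcal{P}$ for which $\mathrm{wt}(y_{ij}) = \big(\Omega^{(k-1)}_{\mathbf{p}}(y_{ij}), 2^{(i-1)n+(j-1)} \bmod p\big)$ is a BIWA for every $V_\tau$ where $\tau \in T_k$. By extending $\mathbf{p}$ by $p$ in the last coordinate, this shows that there is a $\mathbf{p}' \in \mathcal{P}^k$ such that for each $\tau \in T_k$, the weight assignment $\mathrm{wt}^{(\tau)}_{\mathbf{p}'}$ is a BIWA for $V_\tau$.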
To complete the inductive step, we also need to prove the same for $\tau \in T$ that are $+$ gates with $|\tau|_\times = k$. For such a $\tau$, there must be a $\times$ gate $\tau' \in T_k$ that is a descendant of $\tau$ such that the path from $\tau$ to $\tau'$ consists only of $+$ gates. Once again, this forces $\mathrm{wt}^{(\tau)}_{\mathbf{p}} = \mathrm{wt}^{(\tau')}_{\mathbf{p}}$, and by Claim 5.9, $V_\tau$ is a subspace of $V_{\tau'}$. Hence, by Lemma 5.6, $\mathrm{wt}^{(\tau)}_{\mathbf{p}} = \mathrm{wt}^{(\tau')}_{\mathbf{p}}$ is a BIWA for $V_\tau$ as well. And that completes the proof of the inductive step.
Hence, if $f$ is a polynomial computed by a preimage-width $w$ UPT set-multilinear circuit of depth $r$, then for some $\mathbf{p} \in \mathcal{P}^r$, $\Omega^{(r)}_{\mathbf{p}}$ is a BIWA for $V_{\mathrm{root}}$. Furthermore, by the prime number theorem, we know that the $\big(\binom{w}{2} n^2 d + 1\big)$-th prime cannot be bigger than $\big(\binom{w}{2} n^2 d + 1\big)^2$. Hence, the constructed BIWA is in fact a map
$$\Omega^{(r)}_{\mathbf{p}} : y \to [M]^{r+1}, \qquad \text{where } M \le \Big(\binom{w}{2} n^2 d + 1\Big)^2.$$
Therefore, by Lemma 5.3 and the Schwartz-Zippel lemma, if we pick a set $A \subseteq \mathbb{F}$ with $|A| > d \cdot \big(\binom{w}{2} n^2 d + 1\big)^2$, then
$$H = \Big\{(b_{11}, \dots, b_{dn}) : \mathbf{p} \in [M]^r,\ a_k \in A,\ b_{ij} = a_1^{\,j} \prod_{k=1}^{r} a_{k+1}^{\,2^{(i-1)n+(j-1)} \bmod p_k}\Big\}$$
is a hitting set for preimage-width $w$ depth $r$ UPT set-multilinear circuits, and $|H| = \mathrm{poly}(ndw)^r$.
5.3 Poly-sized hitting sets for constant width UPT circuits
Theorem 1.3 (Hitting sets for known-shape low-width UPT circuits). Let $\mathcal{C}_{n,d,T,w}$ be the class of $n$-variate degree $d$ non-commutative polynomials that are computable by UPT circuits of preimage-width at most $w$ and underlying parse-tree shape $T$. Over any field of zero or large characteristic, there is an explicit hitting set $\mathcal{H}_{n,d,T,w}$ of size $w^{O(\log d)} \cdot \mathrm{poly}(nd)$ for $\mathcal{C}_{n,d,T,w}$.
The proof is an easy extension of the ideas from [GKS16], the details of which are in Section C.2.
6 FewPT circuits
In this section we describe the black-box identity test for FewPT(k) circuits. The following lemma from [LLS17] shows that this class is equivalent to polynomials computed by a sum of $k$ UPT circuits (of possibly different shapes).
6.1 Preliminaries
Lemma 6.1 ([LLS17, Lemma 16]). Let $f(x)$ be a polynomial computed by a FewPT(k) circuit of preimage-width $w$. Then $f$ can be equivalently computed by a sum of $k$ UPT circuits, each of preimage-width $w$.
Like in [LLS17], we'll refer to this class as $\Sigma^k$-UPT. We shall further qualify this notation and use $\Sigma^k$-UPT$(w)$ to denote the class of circuits that are sums of $k$ UPT circuits of preimage-width $w$.
By this lemma, we can focus our attention on constructing hitting sets for $\Sigma^k$-UPT-SML circuits. The proof largely follows the ideas of Gurjar, Korwar, Saxena and Thierauf [GKST15]³.
Notation
Let $y = y_1 \sqcup \cdots \sqcup y_d$ be a partition of the variables and let $S = \{s_1, \dots, s_p\}$ be a subset of $[d]$. Define the set of variables $y_S = y_{s_1} \cup \cdots \cup y_{s_p}$ and the set of monomials $y^S = y_{s_1} \times \cdots \times y_{s_p}$. Also, define $y_{-S} = y \setminus y_S$ and $y^{-S} = y^{[d] \setminus S}$.
Definition 6.2 (Coefficient operator). Given a set-multilinear polynomial $f = \sum_{m \in y^{[d]}} \alpha_m m$ of degree $d$, for $S \subseteq [d]$ and a monomial $m \in y^S$, define $\mathrm{coeff}_m : \mathbb{F}[y] \to \mathbb{F}[y_{-S}]$ as follows:
$$\mathrm{coeff}_m(f) = \sum_{m' \in y^{-S}} \alpha_{(m \cdot m')}\, m',$$
where $\alpha_{(m \cdot m')}$ is the coefficient of $m m'$ in $f$. ♦
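The coefficient operator is straightforward on a sparse representation. In the following sketch (our own representation, with hypothetical names), a set-multilinear monomial is a frozenset of pairs $(i, j)$ standing for $y_{i,j}$, the variable chosen from part $y_i$. Since every monomial of $f$ picks exactly one variable per part, $\mathrm{coeff}_m(f)$ simply keeps the monomials of $f$ that contain $m$ and strips $m$ from them.

```python
def coeff(f, m):
    """coeff_m(f) for a set-multilinear f and a monomial m in y^S.

    f: dict from frozensets of (i, j) pairs to coefficients.
    m: frozenset of (i, j) pairs, one pair for each part i in S."""
    out = {}
    for mono, c in f.items():
        if m <= mono:                    # mono = m * m' with m' in y^{-S}
            rest = mono - m
            out[rest] = out.get(rest, 0) + c
    return {k: v for k, v in out.items() if v != 0}
```

For $f = y_{1,1}y_{2,1} + 2y_{1,1}y_{2,2} + 3y_{1,2}y_{2,1}$, this gives $\mathrm{coeff}_{y_{1,1}}(f) = y_{2,1} + 2y_{2,2}$ and $\mathrm{coeff}_{y_{2,1}}(f) = y_{1,1} + 3y_{1,2}$.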
Lemma 6.3. Let $y = y_1 \sqcup \cdots \sqcup y_d$ be a partition and $f(y)$ be a set-multilinear polynomial (with respect to the above partition) computed by a UPT-SML circuit of preimage-width $w$ and underlying parse-tree shape $T$. Suppose $g(y)$ is another set-multilinear polynomial (under the same partition) that cannot be computed by a UPT-SML circuit of preimage-width $w$ with the same shape $T$.
Then, there exist $S \subseteq [d]$, $R \in \mathbb{F}[y_S]^{1 \times w'}$, and $P, Q \in \mathbb{F}[y_{-S}]^{w' \times 1}$ with $w' \le w^2$ such that:
• for each $i \in [w']$, there is a monomial $m_i \in y^S$ such that the $i$-th elements of $P$ and $Q$ are $\mathrm{coeff}_{m_i}(f)$ and $\mathrm{coeff}_{m_i}(g)$ respectively,
³ [GKST15] constructed hitting sets for sums of ROABPs and we use similar techniques for sums of UPT circuits. Roughly speaking, if we have a class $\mathcal{C}$ that has a characterizing set of dependencies for which we know how to construct BIWAs, then we can also construct hitting sets for $\Sigma^k\mathcal{C}$.
• there is a vector $\Gamma \in \mathbb{F}^{1 \times w'}$ of support size at most $w+1$ such that $\Gamma P = 0$ and $\Gamma Q \neq 0$,
• the coefficient space of $R$ is full-rank, i.e. if we interpret $R$ as a matrix over $\mathbb{F}$ by listing each of its $w'$ entries as a column vector of coefficients, then this matrix has full column-rank,
• the vector of polynomials $R$ is simultaneously computable by a UPT-SML circuit of preimage-width at most $w'$.
This lemma is a fairly natural and straightforward generalization of [GKST15, Lemma 4.5] and a proof
of this is provided in the appendix (Appendix D).
Lemma 6.4. Suppose $f(y)$ is a non-zero polynomial computed by a $\Sigma^k$-UPT-SML$(w)$ circuit. Suppose $\mathrm{wt} : y \to [M]^r$ is a weight assignment that satisfies the following properties:
• wt is a BIWA for spaces of polynomials simultaneously computed by UPT-SML circuits of preimage-width at most $w(w+1)$,
• for any non-zero $g$ in $\Sigma^{k-1}$-UPT-SML$(w(w+1))$, the polynomial $g(y + \mathbf{t}^{\mathrm{wt}}) \in \mathbb{F}(\mathbf{t})[y]$ has a monomial with non-zero coefficient that depends on at most $\ell$ distinct variables in $y$.
Then, the polynomial $f(y + \mathbf{t}^{\mathrm{wt}})$ has a monomial, depending on at most $\log(w(w+1)) + \ell$ distinct variables in $y$, with a non-zero coefficient.
This is essentially a restatement of [GKST15, Lemma 4.6, Lemma 4.8] and follows from their proof.
Unravelling the recursion, we get the following corollary.
Corollary 6.5. Let $f(y)$ be a non-zero polynomial that can be computed by a $\Sigma^k$-UPT-SML$(w)$ circuit. Suppose $\mathrm{wt} : y \to [M]^r$ is a BIWA for the class of polynomials simultaneously computed by UPT-SML circuits of preimage-width at most $w^{2^{O(k)}}$. Then, the polynomial $f(y + \mathbf{t}^{\mathrm{wt}}) \in \mathbb{F}(\mathbf{t})[y]$ has a monomial with a non-zero coefficient that depends on at most $2^{O(k)} \log w$ variables in $y$.
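To see where the $w^{2^{O(k)}}$ and $2^{O(k)} \log w$ bounds come from, the recursion of Lemma 6.4 can be unrolled numerically. The sketch below is only a bookkeeping aid (hypothetical function name). It assumes logarithms base 2 and a base case not spelled out above: for $k = 1$ the support bound is $0$, since a BIWA already gives $f(\mathbf{t}^{\mathrm{wt}}) \neq 0$ by Lemma 5.3, so the constant term of $f(y + \mathbf{t}^{\mathrm{wt}})$ is non-zero.

```python
import math


def unravel(k, w):
    """Unroll Lemma 6.4 for a sum of k UPT-SML circuits of preimage-width w.

    Returns (width, support): the largest preimage-width for which wt must be a
    BIWA, and the resulting bound on the support of a surviving monomial.
    Assumes the base case k = 1 has support 0."""
    width, support = w, 0.0
    for _ in range(k - 1):
        support += math.log2(width * (width + 1))   # the log(w(w+1)) term
        width = width * (width + 1)                  # width for the (k-1)-sum
    return width, support
```

Since $w(w+1) + 1 \le (w+1)^2$, the width after $k-1$ steps stays below $(w+1)^{2^{k-1}}$ and the accumulated support is at most $2^k \log_2(w+1)$, in line with the corollary.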
Once we are guaranteed to retain a monomial of small-support, we can construct a hitting set by
enumerating over all possible supports and applying the Schwartz-Zippel lemma [Ore22, DL78, Sch80,
Zip79] (or apply standard generators such as the Shpilka-Volkovich generator [SV15]). This completes
the proof of Theorem 1.2, which we restate below for convenience.
Theorem 1.2 (Hitting sets for circuits with few parse tree shapes). There is an explicit hitting set $\mathcal{H}_{d,n,s,k}$ of size at most $(s^{2^k} nd)^{O(\log d)}$ for the class of $n$-variate degree $d$ homogeneous non-commutative polynomials in $\mathbb{F}\langle x_1, \dots, x_n\rangle$ that are computed by non-commutative circuits of size at most $s$ consisting of parse trees of at most $k$ shapes.
7 Open problems
An interesting open problem (at least to us) is whether we can give non-trivial hitting sets for the class
of non-commutative skew circuits. Lagarde, Limaye and Srinivasan [LLS17] provide a white-box PIT in
some restricted settings when the skew circuits are somewhat closer to UPT (with some restriction on
what sort of parse trees they can have) but removing this restriction would be a great step forward.
Another issue is that the current construction of hitting sets for FewPT circuits (which builds on [GKST15]) incurs quasipolynomial losses at two different places. The first is in the construction of the basis isolating weight assignment (BIWA), which we only know how to construct using quasipolynomially large weights. The other is in a brute-force enumeration of all monomials of support $O(\log s)$. As a result, even if we later obtain a construction of a BIWA with polynomially large weights, this proof would still only yield a quasipolynomially large hitting set for FewPT circuits. It would be interesting to see if this brute-force enumeration could be circumvented.
Acknowledgements
We thank the organizers of the NMI Workshop on Arithmetic Complexity 2017 where we learned of the
circuit classes that we study in this paper. We thank Nutan Limaye and Srikanth Srinivasan for numerous
discussions that eventually led to these results. We thank Rohit Gurjar for pointing out a subtlety in a
previous draft of this paper, and also thank Amir Shpilka for inviting RS to Tel Aviv University (where this
discussion took place).
References
[AB03] Manindra Agrawal and Somenath Biswas. Primality and identity testing via Chinese remain-
dering. J. ACM, 50(4):429–443, 2003. Preliminary version in the 40th Annual IEEE Symposium
on Foundations of Computer Science (FOCS 1999).
[AGKS15] Manindra Agrawal, Rohit Gurjar, Arpita Korwar, and Nitin Saxena. Hitting-Sets for ROABP and Sum of Set-Multilinear Circuits. SIAM Journal on Computing, 44(3):669–697, 2015. Pre-print available at arXiv:1406.7535.
[Agr05] Manindra Agrawal. Proving Lower Bounds Via Pseudo-random Generators. In Proceedings
of the 25th International Conference on Foundations of Software Technology and Theoretical
Computer Science (FSTTCS 2005), pages 92–105, 2005.
[AJMV98] Eric Allender, Jia Jiao, Meena Mahajan, and V. Vinay. Non-Commutative Arithmetic Circuits:
Depth Reduction and Size Lower Bounds. Theoretical Computer Science, 209(1-2):47–86, 1998.
Pre-print available at eccc:TR95-043.
[AR16] Vikraman Arvind and S. Raja. Some Lower Bound Results for Set-Multilinear Arithmetic Computations. Chicago Journal of Theoretical Computer Science, 2016.
[ASS13] Manindra Agrawal, Chandan Saha, and Nitin Saxena. Quasi-polynomial hitting-set for
set-depth-∆ formulas. In Proceedings of the 45th Annual ACM Symposium on Theory of Com-
puting (STOC 2013), pages 321–330, 2013. eccc:TR12-113.
[BS83] Walter Baur and Volker Strassen. The Complexity of Partial Derivatives. Theoretical Computer
Science, 22:317–330, 1983.
[DL78] Richard A. DeMillo and Richard J. Lipton. A Probabilistic Remark on Algebraic Program Test-
ing. Information Processing Letters, 7(4):193–195, 1978.
[FS13] Michael A. Forbes and Amir Shpilka. Quasipolynomial-Time Identity Testing of Non-com-
mutative and Read-Once Oblivious Algebraic Branching Programs. In Proceedings of the 54th
Annual IEEE Symposium on Foundations of Computer Science (FOCS 2013), pages 243–252,
2013. Full version at arXiv:1209.2408.
[GKS16] Rohit Gurjar, Arpita Korwar, and Nitin Saxena. Identity Testing for Constant-Width, and Com-
mutative, Read-Once Oblivious ABPs. In Proceedings of the 31st Annual Computational Com-
plexity Conference (CCC 2016), pages 29:1–29:16, 2016. arXiv:1601.08031.
[GKST15] Rohit Gurjar, Arpita Korwar, Nitin Saxena, and Thomas Thierauf. Deterministic Identity
Testing for Sum of Read-once Oblivious Arithmetic Branching Programs. In Proceedings of
the 30th Annual Computational Complexity Conference (CCC 2015), pages 323–346, 2015.
arXiv:1411.7341.
[HS80] Joos Heintz and Claus-Peter Schnorr. Testing Polynomials which Are Easy to Compute (Ex-
tended Abstract). In Proceedings of the 12th Annual ACM Symposium on Theory of Computing
(STOC 1980), pages 262–272, 1980.
[HWY10] Pavel Hrubes, Avi Wigderson, and Amir Yehudayoff. Non-commutative circuits and the
sum-of-squares problem. In Proceedings of the 42nd Annual ACM Symposium on Theory of
Computing (STOC 2010), pages 667–676, 2010.
[HY16] Pavel Hrubeš and Amir Yehudayoff. On Isoperimetric Profiles and Computational Complexity. In Proceedings of the 43rd International Colloquium on Automata, Languages and Programming (ICALP 2016), pages 89:1–89:12, 2016. eccc:TR15-164.
[Hya79] Laurent Hyafil. On the Parallel Evaluation of Multivariate Polynomials. SIAM Journal on Computing, 8(2):120–123, 1979. Preliminary version in the 10th Annual ACM Symposium on Theory of Computing (STOC 1978).
[KI04] Valentine Kabanets and Russell Impagliazzo. Derandomizing Polynomial Identity Tests Means Proving Circuit Lower Bounds. Computational Complexity, 13(1-2):1–46, 2004. Preliminary version in the 35th Annual ACM Symposium on Theory of Computing (STOC 2003).
[KS06] Adam R. Klivans and Amir Shpilka. Learning Restricted Models of Arithmetic Circuits. Theory of Computing, 2(10):185–206, 2006. Preliminary version in the 16th Annual Conference on Computational Learning Theory (COLT 2003).
[LLS17] Guillaume Lagarde, Nutan Limaye, and Srikanth Srinivasan. Lower Bounds and PIT for Non-Commutative Arithmetic circuits with Restricted Parse Trees. Electronic Colloquium on Computational Complexity (ECCC), 24:77, 2017. eccc:TR17-077.
[LMP16] Guillaume Lagarde, Guillaume Malod, and Sylvain Perifel. Non-commutative computations: lower bounds and polynomial identity testing. Electronic Colloquium on Computational Complexity (ECCC), 23:94, 2016. eccc:TR16-094.
[LMS16] Nutan Limaye, Guillaume Malod, and Srikanth Srinivasan. Lower Bounds for Non-Commu-
tative Skew Circuits. Theory of Computing, 12(1):1–38, 2016. eccc:TR15-22.
[Nis91] Noam Nisan. Lower bounds for non-commutative computation. In Proceedings of the 23rd Annual ACM Symposium on Theory of Computing (STOC 1991), pages 410–418, 1991. Available on citeseer:10.1.1.17.5067.
[Nis92] Noam Nisan. Pseudorandom generators for space-bounded computation. Combinatorica,
12(4):449–461, 1992.
[Ore22] Øystein Ore. Über höhere Kongruenzen. Norsk Mat. Forenings Skrifter, 1(7):15, 1922.
[RS05] Ran Raz and Amir Shpilka. Deterministic polynomial identity testing in non-commutative
models. Computational Complexity, 14(1):1–19, 2005. Preliminary version in the 19th Annual
IEEE Conference on Computational Complexity (CCC 2004).
[Sch80] Jacob T. Schwartz. Fast Probabilistic Algorithms for Verification of Polynomial Identities. Jour-
nal of the ACM, 27(4):701–717, 1980.
[SV15] Amir Shpilka and Ilya Volkovich. Read-once polynomial identity testing. Computational Com-
plexity, 24(3):477–532, 2015. Preliminary version in the 40th Annual ACM Symposium on The-
ory of Computing (STOC 2008).
[Val79] Leslie G. Valiant. Completeness Classes in Algebra. In Proceedings of the 11th Annual ACM
Symposium on Theory of Computing (STOC 1979), pages 249–261, 1979.
[VSBR83] Leslie G. Valiant, Sven Skyum, S. Berkowitz, and Charles Rackoff. Fast Parallel Computation of Polynomials Using Few Processors. SIAM Journal on Computing, 12(4):641–644, 1983. Preliminary version in the 6th International Symposium on the Mathematical Foundations of Computer Science (MFCS 1981).
[Zip79] Richard Zippel. Probabilistic algorithms for sparse polynomials. In Symbolic and Algebraic Computation, EUROSAM '79, An International Symposium on Symbolic and Algebraic Computation, volume 72 of Lecture Notes in Computer Science, pages 216–226. Springer, 1979.
A Separating ABPs from UPT circuits
This section contains the proofs of the separation between ABPs and UPT circuits. Recall the definition of the polynomial $P_d$ (of degree $D = 2^{d+1}-1$):
$$P_d(x_1, \dots, x_m) = \sum_{\substack{\gamma \in [m]^D \\ \gamma \text{ is legal}}} x_{\gamma(v_1)} x_{\gamma(v_2)} \cdots x_{\gamma(v_D)}.$$
Upper bound
Lemma 4.2 (Upper bound). For every $m, d > 0$, the polynomial $P_d(x_1, \dots, x_m)$ can be computed by a non-commutative UPT circuit of size $O(m^2 d)$.
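Proof. Let $G(d, \alpha)$ be the set of all legal colourings $\gamma$ whose value on the root $v_{2^d}$ of $T_d$ is $\gamma(v_{2^d}) = \alpha$. Now we define $P_{d,\alpha}(x_1, \dots, x_m)$ as
$$P_{d,\alpha}(x_1, \dots, x_m) = \sum_{\gamma \in G(d,\alpha)} x_{\gamma(v_1)} x_{\gamma(v_2)} \cdots x_{\gamma(v_D)}.$$
Clearly, $P_d(x_1, \dots, x_m) = \sum_{\alpha \in [m]} P_{d,\alpha}(x_1, \dots, x_m)$. Since a legal colouring of $T_d$ consists of legal colourings of its two depth-$(d-1)$ subtrees together with the root coloured by the sum of their root colours, we can now recursively write
$$P_d(x_1, \dots, x_m) = \sum_{\alpha, \beta \in [m]} P_{d-1,\alpha}(x_1, \dots, x_m) \cdot x_{\alpha +_m \beta} \cdot P_{d-1,\beta}(x_1, \dots, x_m), \qquad (A.1)$$
where $\alpha +_m \beta = (\alpha + \beta) \bmod m$. More precisely, for each $c \in [m]$ we have $P_{d,c} = \sum_{\alpha +_m \beta = c} P_{d-1,\alpha} \cdot x_c \cdot P_{d-1,\beta}$, with $P_{0,c} = x_c$.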
Now, using (A.1) in this refined form, it is easy to see that if we have UPT circuits for the polynomials $P_{d-1,\alpha}$, then a UPT circuit computing every $P_{d,c}$, and hence $P_d$, can be obtained with $O(m^2)$ additional gates; the claim follows directly by induction. Hence, repeated application of (A.1) yields a UPT circuit computing $P_d$ of size $O(m^2 d)$.
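The recursion can be mirrored directly in code. This sketch (an illustration, not the paper's circuit verbatim) builds the monomial sets of all $P_{d,c}$ level by level, identifying $[m]$ with $\mathbb{Z}_m$, and counts one product gate per pair $(\alpha, \beta)$ per level as a rough proxy for the $O(m^2 d)$ size bound; the names are hypothetical. Words are tuples of colour indices read in in-order (left subtree, root, right subtree).

```python
def P_alpha(m, d):
    """Return ({c: monomials of P_{d,c}}, number of product gates used).

    Uses P_{0,c} = x_c and P_{d,c} = sum_{a + b = c mod m} P_{d-1,a} * x_c * P_{d-1,b}."""
    layer = {c: {(c,)} for c in range(m)}          # depth 0: single leaves
    gates = 0
    for _ in range(d):
        new = {c: set() for c in range(m)}
        for a in range(m):
            for b in range(m):
                c = (a + b) % m
                gates += 1                          # one gate P_{d-1,a} * x_c * P_{d-1,b}
                new[c] |= {u + (c,) + v for u in layer[a] for v in layer[b]}
        layer = new
    return layer, gates
```

Every level uses the same product pattern, so all parse trees share one shape, and the gate count is exactly $m^2 d$.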
Lower bound
As mentioned earlier, much of the lower bound argument is exactly along the lines of the proof of [HY16]. The modifications required from their proof are quite minor, but we present the proof here for completeness.
Theorem 4.3 (Lower bound). For every permutation $\sigma \in S_D$, any non-commutative ABP computing the polynomial $\Delta_\sigma(P_d)$ has width $m^{\Omega(d)}$.
Proof. Let us fix some $\sigma \in S_D$ and let $Q(x_1, \dots, x_m) = \Delta_\sigma(P_d)$. In order to show that $Q$ requires ABPs of large width, it suffices to show that there exists some $0 \le k \le D$ for which the partial derivative matrix $M_k(Q)$, with rows indexed by words $w \in [m]^k$, columns indexed by words $w' \in [m]^{D-k}$, and $(w, w')$ entry equal to the coefficient of $x_w \cdot x_{w'}$ in $Q$, has rank at least $m^{\Omega(d)}$. We shall prove this by exhibiting an $r \times r$ identity matrix as a submatrix in $M_k(Q)$ with $r = m^{\Omega(d)}$. The $k$ that we will work with is the number whose binary expansion is $10101\cdots$. The relevance of this choice comes from the fact that for such a $k$, the edge boundary of any subset $V_0 \subseteq T_d$ with $|V_0| = k$ is reasonably large.
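The rank of such a matrix is easy to compute for tiny examples. The sketch below (illustrative; the names are hypothetical, and rank is computed over $\mathbb{Q}$ with exact fractions) builds the matrix for an arbitrary set $V_0$ of positions: rows are indexed by the sub-words on $V_0$ and columns by the sub-words on the remaining positions, so $V_0 = \{0, \dots, k-1\}$ (0-indexed) gives $M_k$. Only rows and columns occurring in the support are created, which does not change the rank.

```python
from fractions import Fraction


def rank(rows):
    """Rank over Q via Gaussian elimination on a list of rows."""
    rows = [[Fraction(x) for x in r] for r in rows]
    rk, ncols = 0, (len(rows[0]) if rows else 0)
    for c in range(ncols):
        piv = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][c] != 0:
                f = rows[i][c] / rows[rk][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def partial_derivative_matrix(Q, V0):
    """Q: dict from words (tuples of variables) to coefficients.
    V0: set of 0-indexed positions forming the row side of the cut."""
    rows, cols, entries = {}, {}, {}
    for word, c in Q.items():
        r = tuple(word[i] for i in sorted(V0))
        s = tuple(word[i] for i in range(len(word)) if i not in V0)
        rows.setdefault(r, len(rows))
        cols.setdefault(s, len(cols))
        key = (rows[r], cols[s])
        entries[key] = entries.get(key, 0) + c
    return [[entries.get((i, j), 0) for j in range(len(cols))]
            for i in range(len(rows))]
```

For the degree-4 palindrome over $\{x_1, x_2\}$, cutting after the first two positions gives rank $4$, while putting the outer pair of positions on one side gives rank $1$. The rank depends on which positions end up on which side, which is exactly what a shuffling $\Delta_\sigma$ changes.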
Definition A.2 (Isoperimetric profile of graphs). Given a graph $G = (V(G), E(G))$, the edge isoperimetric profile of $G$ is the function
$$\mathrm{eip}_G(k) = \min\big\{|E(A, \overline{A})| : A \subseteq V(G),\ |A| = k\big\},$$
where $E(A, \overline{A})$ is the set of edges with one end-point in $A$ and the other outside. ♦
Lemma A.3 ([HY16]). If $k \le D$ is the number whose binary expansion is $1010\cdots$, then $\mathrm{eip}_{T_d}(k) \ge d/4$.
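For very small trees, the isoperimetric profile can be computed by brute force, which is a handy sanity check on statements like Lemma A.3. The sketch below (hypothetical helper names; exponential time, so only for tiny $d$) again uses heap indexing of $T_d$ (root 1, children $2u$ and $2u+1$).

```python
from itertools import combinations


def tree_edges(d):
    """Edges of T_d with heap indexing, and its number of nodes D."""
    D = 2 ** (d + 1) - 1
    edges = [(u, c) for u in range(1, 2 ** d) for c in (2 * u, 2 * u + 1)]
    return edges, D


def eip(d, k):
    """Brute-force edge isoperimetric profile of T_d at size k."""
    edges, D = tree_edges(d)
    best = None
    for A in combinations(range(1, D + 1), k):
        A = set(A)
        cut = sum((u in A) != (v in A) for u, v in edges)   # boundary edges
        best = cut if best is None else min(best, cut)
    return best
```

For instance, in $T_2$ (7 nodes) removing a single edge splits off 1 or 3 nodes, so no set of 5 nodes has boundary 1, and indeed $\mathrm{eip}_{T_2}(5) = 2$.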
The relevance of this will become apparent shortly, but let us proceed for now. Since $Q$ is a shuffling of $P_d$, the rows of $M_k(Q)$ are just partial colourings of a subset $V_0 \subset T_d$ of size exactly $k$ (the nodes that end up in the first $k$ positions after the shuffling $\Delta_\sigma$). Similarly, the columns of $M_k(Q)$ are partial colourings of $V_1 := T_d \setminus V_0$. Therefore $M_k(Q)(x_w, x_{w'})$ is 1 if the colouring of $V_0$ given by $x_w$ and that of $V_1$ given by $x_{w'}$ together form a legal colouring of $T_d$, and 0 otherwise. Hence the task of finding an $r \times r$ identity submatrix of $M_k(Q)$ reduces to finding colourings $C_1, C_2, \dots, C_r$ of $V_0$ and colourings $C'_1, C'_2, \dots, C'_r$ of $V_1$ such that the colouring $C_i \circ C'_j$ is legal if and only if $i = j$, for all $i, j \in [r]$.
We will need the notion of pure nodes (as defined by [HY16]).
Definition A.4 (Pure nodes). For $i \in \{0, 1\}$, a non-leaf node $v$ in $V_i$ is said to be pure if there is a path $\Pi = (v, v_1, v_2, \dots, v_k)$ in $T_d$, where $v_k$ is a leaf that is a descendant of $v$, such that $\Pi \cap V_i = \{v\}$. ♦
There may be multiple witnesses vk for the fact that v is a pure node. For each pure node, we shall
assign one leaf arbitrarily as its pure leaf. It is easy to see that the pure leaves are distinct for each pure
node.
Let the pure nodes in V0 be P0 and those in V1 be P1 and say P := P0∪P1. Let ℓ(P), ℓ(P0) and ℓ(P1)
be the pure leaves of P , P0 and P1 respectively.
Lemma A.5 ([HY16, Claim 11]). $|P| \ge |E(V_0, V_1)|/4$.
Without loss of generality, we may assume that $P_0$ is bigger than $P_1$, and the above lemma, in conjunction with Lemma A.3, gives that $|P_0| \ge d/32$. We are now ready to define our colourings $C_1, \dots, C_r$ and $C'_1, \dots, C'_r$ for $r = m^{|P_0|} \ge m^{d/32}$.
Let $L$ be the set of all leaves in $T_d$. For each $c_i \in [m]^{|P_0|}$, define $\tilde{C}_i : T_d \to \mathbb{Z}_m$ by assigning colour 1 to all leaves in $L \setminus \ell(P_0)$, assigning the colours $c_i$ to the leaves in $\ell(P_0)$, and extending this uniquely to the other vertices of $T_d$ so as to make it legal. The partial colourings $C_i$ and $C'_i$ are the restrictions of $\tilde{C}_i$ to $V_0$ and $V_1$ respectively.
Clearly, $C_i \circ C'_i = \tilde{C}_i$, and hence it is a valid colouring. Now consider $C_i$ and $C'_j$ for $i \neq j$. There must exist some leaf $v \in \ell(P_0)$ that gets different colours in $\tilde{C}_i$ and $\tilde{C}_j$; let $u$ be the node in $P_0$ of which $v$ is the pure leaf. We shall assume that $u$ is minimal, in the sense that every pure node $u' \in P_0$ that is a descendant of $u$ has all its leaves identically coloured in $\tilde{C}_i$ and $\tilde{C}_j$. But then the colours of $u$ in $\tilde{C}_i$ and in $\tilde{C}_j$ cannot be the same, as exactly one leaf of the subtree of $u$ (namely $v$) has different colours in $\tilde{C}_i$ and $\tilde{C}_j$. This would then imply that $C_i$ forces $u$ to be given a colour different from what $C'_j$ forces, and hence $C_i \circ C'_j$ is not legal.
Therefore, the matrix $M_k(Q)$ has an $r \times r$ identity submatrix with $r \ge m^{d/32}$, and so any ABP computing $Q$ must have width at least $m^{\Omega(d)}$.
B Exponential lower bound under any shuffling
Here we give an explicit polynomial that has polynomial sized arithmetic circuits but requires exponen-
tial sized UPT circuits under any shuffling. A version of the hard polynomial appears in [LMP16]. They
show that the polynomial requires exponential sized UPT circuits and that it is efficiently computable by
what are known as skew circuits (see [LMP16] for a formal definition). Here we extend the lower bound
and show that it applies to any shuffling of the polynomial.
B.1 The polynomial
The hard polynomial we discuss is called the moving palindrome, which is a variant of the palindrome polynomial. The palindrome polynomial of degree $d$ on $n$ variables is defined as follows:
$$\mathrm{Pal}_d(x_1, \dots, x_n) := \sum_{w \in \{x_1, \dots, x_n\}^{d/2}} w \cdot w^R,$$
where $w^R$ denotes the reverse of the word $w$.
Using this definition, we define the $(n+1)$-variate moving palindrome of degree $D$ as follows:
$$\mathrm{Pal}^{\mathrm{mov}}_D(x_1, \dots, x_n, z) := \sum_{0 \le \ell \le D/2} z^{\ell} \cdot \mathrm{Pal}_{D/2}(x_1, \dots, x_n) \cdot z^{D/2 - \ell}.$$
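Both polynomials have 0/1 coefficients, so they can be represented by their sets of words. The following sketch (hypothetical names; variables are arbitrary hashable labels and `z` is the extra variable) enumerates them for small parameters. It assumes $D$ is divisible by 4, so that $\mathrm{Pal}_{D/2}$ is well defined.

```python
from itertools import product


def pal(variables, d):
    """Words of Pal_d: w . reverse(w) for all words w of length d/2."""
    return {w + w[::-1] for w in product(variables, repeat=d // 2)}


def palmov(variables, D, z="z"):
    """Words of the moving palindrome: z^l . Pal_{D/2} . z^{D/2 - l}, 0 <= l <= D/2."""
    half = D // 2
    return {(z,) * l + p + (z,) * (half - l)
            for l in range(half + 1) for p in pal(variables, half)}
```

The block of $z$'s on the left determines $\ell$, so the terms for different $\ell$ never overlap, and $\mathrm{Pal}^{\mathrm{mov}}_D$ has $(D/2 + 1) \cdot n^{D/4}$ monomials.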
B.2 The lower bound
Similar to the matrix $M_k$ defined in Appendix A, define a partial derivative matrix $M_{(i,p)}$ for a non-commutative polynomial $g$. Here the $(w, w')$ entry of $M_{(i,p)}$ will be the coefficient of $w \times_p w'$ in $g$, where $\deg(w) = i$. We will show that $M_{(i,p)}$ for $\mathrm{Pal}^{\mathrm{mov}}_D$ has rank $n^{\Omega(D)}$ for a range of types $(i, p)$, such that any UPT circuit computing any shuffling of $\mathrm{Pal}^{\mathrm{mov}}_D$ must admit at least one of those types. Then, using the characterization from [LMP16], we will conclude the following theorem.
Theorem B.1. For any $\sigma \in S_D$, a UPT circuit computing $\Delta_\sigma(\mathrm{Pal}^{\mathrm{mov}}_D)$ has $n^{\Omega(D)}$ gates.
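Proof. Let $2d$ be the degree of the palindrome, so that $D = 4d$. Also, let $P_\ell(\mathbf{x},z) = z^{\ell}\,\mathrm{Pal}_{2d}(\mathbf{x})\,z^{2d-\ell}$. Therefore $\mathrm{Pal}^{\mathrm{mov}}_D = \sum_{\ell=0}^{2d} P_\ell(\mathbf{x},z) = f(\mathbf{x},z)$ (say). For $P_\ell$, and for $\ell < j_1, j_2 \le \ell + 2d$, we say that $j_1$ and $j_2$ are dependent with respect to $P_\ell$ if all monomials in $P_\ell$ contain the same variable in positions $j_1$ and $j_2$. Since the palindrome occupies positions $\ell+1,\ldots,\ell+2d$ of $P_\ell$, position $j$ is mirrored to position $2\ell+2d+1-j$, so the criterion $j_1 + j_2 = 2(d+\ell)+1$ captures this relation. Define a dependency graph $G_\ell = (V, E_\ell)$ with $V = \{1,2,\ldots,4d\}$ such that $(j_1,j_2) \in E_\ell$ if and only if $j_1$ and $j_2$ are dependent with respect to $P_\ell$. Let $G = (V,E)$ with $E = \bigcup_\ell E_\ell$.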
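The dependency criterion can be checked by brute force on small instances. The sketch below (hypothetical helper names; monomials as tuples; positions 1-indexed as in the text) computes, for each $\ell$, the pairs of positions in the palindrome block that carry the same variable in every monomial of $P_\ell$, and compares them with the pairs satisfying $j_1 + j_2 = 2(d+\ell)+1$. With at least two $x$-variables, any non-mirrored pair is separated by some word, so the two sets should coincide.

```python
from itertools import product


def p_ell_words(letters, d, ell, z="z"):
    """Monomials of P_ell = z^ell * Pal_{2d} * z^(2d - ell), a degree-4d polynomial."""
    return [(z,) * ell + w + w[::-1] + (z,) * (2 * d - ell)
            for w in product(letters, repeat=d)]


def dependent_pairs(letters, d, ell):
    """Pairs j1 < j2 in the block ell < j <= ell + 2d (1-indexed) that carry the
    same variable in every monomial of P_ell, found by brute force."""
    words = p_ell_words(letters, d, ell)
    block = range(ell + 1, ell + 2 * d + 1)
    return {(j1, j2) for j1 in block for j2 in block
            if j1 < j2 and all(w[j1 - 1] == w[j2 - 1] for w in words)}


def predicted_pairs(d, ell):
    """Pairs selected by the criterion j1 + j2 = 2(d + ell) + 1."""
    block = range(ell + 1, ell + 2 * d + 1)
    return {(j1, j2) for j1 in block for j2 in block
            if j1 < j2 and j1 + j2 == 2 * (d + ell) + 1}


for d in (1, 2, 3):
    for ell in range(2 * d + 1):
        assert dependent_pairs(("x1", "x2"), d, ell) == predicted_pairs(d, ell)
```

Each $E_\ell$ is thus a perfect matching of the $2d$ block positions, with exactly $d$ edges.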
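If $[4d] = V_0 \sqcup V_1$ is a partition, we define $\tilde{M}_{V_0,V_1}(f)$ to be the matrix whose rows and columns are indexed by partial assignments to the positions in $V_0$ and $V_1$ respectively, and whose entry at a pair of assignments is the coefficient in $f$ of the monomial obtained by combining them.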
Claim B.2. Let $[4d] = V_0 \sqcup V_1$ be a partition of the positions, and suppose that for some $\ell \in \{0,\ldots,2d\}$ we have $t$ edges in $E_\ell$ crossing the cut $(V_0,V_1)$ in $G_\ell$. Then $\mathrm{rank}\big(\tilde{M}_{V_0,V_1}(f)\big) \ge n^t$.
Proof. In the polynomial $P_\ell$, let $Z_\ell \subseteq [4d]$ be the positions that are fixed to $z$. Consider the submatrix of $\tilde{M}_{V_0,V_1}$ where the positions in $V_0 \cap Z_\ell$ and $V_1 \cap Z_\ell$ are assigned $z$. Observe that this submatrix is precisely $\tilde{M}_{V'_0,V'_1}(P_\ell)$, where $V'_0 = V_0 \setminus Z_\ell$ and $V'_1 = V_1 \setminus Z_\ell$.
If we have $t$ edges crossing the cut $(V'_0,V'_1)$ (no edge of $E_\ell$ is incident on $Z_\ell$), then we have a matching of size $t$ across $(V'_0,V'_1)$. This means that fixing the variables at their $V'_0$ end-points uniquely fixes the variables at their $V'_1$ end-points. Hence we have an $n^t \times n^t$ identity submatrix, and so the rank of $\tilde{M}_{V_0,V_1}(f)$ is at least $n^t$.
The next claim shows that for any $V_0$ in a fairly wide range of sizes, there is always some $\ell$ for which $G_\ell$ exhibits a large cut.
Claim B.3. For any set $V_0 \subseteq [4d]$ of size $k$ with $d/6 \le k \le d/3$, there is some $\ell \in \{0,\ldots,2d\}$ such that $\Omega(d)$ edges in $E_\ell$ cross the cut $(V_0,V_1)$, where $V_1 = [4d] \setminus V_0$.
Proof. Let $V_0$ be a set of $k$ positions with $k \le d/3$. Let us partition the set of positions $V = \{1,2,\ldots,4d\}$ into $S_1 = \{1,\ldots,k\}$, $M_1 = \{k+1,\ldots,2d-k\}$, $T_1 = \{2d-k+1,\ldots,2d\}$, $T_2 = \{2d+1,\ldots,2d+k\}$, $M_2 = \{2d+k+1,\ldots,4d-k\}$ and $S_2 = \{4d-k+1,\ldots,4d\}$.
Now the possible choices for $V_0$ can be split into the following (possibly overlapping) cases:
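1. $|V_0 \cap T_1| \ge k/8$:
Note that every even (or odd) position in $M_1$ is adjacent to every odd (or even) position in $T_1$. Since $|V_0 \cap M_1| \le k$ and $M_1$ contains $d-k$ positions of each parity, $V_1 \cap M_1$ contains at least $d-2k \ge d/3$ positions of each parity. Hence every position in $V_0 \cap T_1$ has at least $d-2k$ neighbours in $V_1 \cap M_1$, and the total number of edges crossing $(V_0,V_1)$ is at least $\frac{k}{8}(d-2k) = \Omega(dk)$. Since the edge sets $E_0,\ldots,E_{2d}$ are disjoint (the sum $j_1+j_2$ determines $\ell$), some $E_\ell$ achieves at least the average, i.e. $\Omega(dk)/(2d+1) = \Omega(k) = \Omega(d)$ edges crossing $(V_0,V_1)$.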
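2. $|V_0 \cap S_1| \ge k/4$:
Consider the neighbourhood of $V_0 \cap S_1$ in $E_0$. Since $E_0$ matches position $j$ with $2d+1-j$, these neighbours are $|V_0 \cap S_1| \ge k/4$ distinct positions in $T_1$. If at least $k/8$ of them are in $V_0$, then case 1 applies. Otherwise, at least $k/4 - k/8 = k/8 = \Omega(d)$ edges from $E_0$ cross $(V_0,V_1)$.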
3. $|V_0 \cap M_1| \ge k/4$:
Again, every even (or odd) position in $M_1$ is adjacent to every odd (or even) position in $T_1$. We may assume $|V_0 \cap T_1| < k/8$ (otherwise case 1 applies), so $V_1 \cap T_1$ contains $\Omega(k)$ positions of each parity. Therefore a total of $\Omega(k^2)$ edges cross $(V_0,V_1)$, and averaging over the $2d+1$ values of $\ell$ again gives some $E_\ell$ with $\Omega(k^2/d) = \Omega(d)$ edges crossing $(V_0,V_1)$.
Since the other cases (with $T_2, S_2, M_2$) are symmetric to those discussed above, we can conclude the statement of the claim.
In order to complete the proof, we just need to show that any UPT circuit computing a homogeneous polynomial of degree $D = 4d$ contains a gate of position-type $(i,p)$ with $d/6 \le i \le d/3$.
Lemma B.4. For all $0 < \alpha < \frac{1}{2}$, any UPT circuit (with $\times$ gates of fan-in 2) computing a polynomial of degree $D$ contains a gate computing a degree-$i$ polynomial for some $\alpha D \le i \le 2\alpha D$.
Sketch of Proof. Let $C$ be a UPT circuit computing a degree-$D$ polynomial with multiplication gates of fan-in 2. Starting from the root of $C$, choose an arbitrary child at every addition gate and the child computing the higher degree polynomial at every multiplication gate. The degree is $D > 2\alpha D$ at the root and $1$ at a leaf, and it drops by at most a factor of 2 in each step. Hence (assuming $2\alpha D \ge 1$) the first gate on this walk of degree at most $2\alpha D$ has degree at least $\alpha D$, and is therefore an appropriate gate.
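The walk in the proof sketch of Lemma B.4 is easy to simulate. Below, a hypothetical `Gate` record stores only the gate type and degree of a homogeneous parse tree; the walk follows an arbitrary child at `+` gates and the higher-degree child at `*` gates, stopping at the first gate of degree at most $2\alpha D$. The sketch assumes $2\alpha D \ge 1$ and uses exact fractions for $\alpha$ to avoid floating-point edge cases; the two tree builders are only example inputs.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass
class Gate:
    """A gate of a homogeneous parse tree: kind is '+', '*' or 'leaf'."""
    kind: str
    degree: int
    left: Optional["Gate"] = None
    right: Optional["Gate"] = None


def find_mid_degree_gate(root, alpha):
    """Return the first gate of degree <= 2*alpha*D on the walk from the root."""
    D = root.degree
    assert 0 < alpha < Fraction(1, 2) and 2 * alpha * D >= 1
    g = root
    while g.degree > 2 * alpha * D:
        if g.kind == "+":
            g = g.left  # arbitrary child: both children have the same degree
        else:
            # fan-in 2 product: the larger child has degree >= g.degree / 2
            g = g.left if g.left.degree >= g.right.degree else g.right
    return g


def balanced_product(D):
    """Example tree: balanced product of D leaves."""
    if D == 1:
        return Gate("leaf", 1)
    left, right = balanced_product(D // 2), balanced_product(D - D // 2)
    return Gate("*", D, left, right)


def comb_product(D):
    """Example tree: left-deep product of D leaves."""
    g = Gate("leaf", 1)
    for k in range(2, D + 1):
        g = Gate("*", k, g, Gate("leaf", 1))
    return g


g = find_mid_degree_gate(comb_product(48), Fraction(1, 24))
print(g.degree)  # 4, which lies in [48/24, 48/12]
```

With $\alpha = 1/24$ this is exactly the range $D/24 \le i \le D/12$ used in the final step of the proof.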
Now Lemma B.4 (with $\alpha = 1/24$) tells us that any UPT circuit $C$ computing $\Delta_\sigma(\mathrm{Pal}^{\mathrm{mov}}_D)$ has a gate of position-type $(i,p)$ with $D/24 \le i \le D/12$, that is, $d/6 \le i \le d/3$. We can then apply Claim B.3 and then Claim B.2 to obtain an $n^{\Omega(D)}$ lower bound on the number of gates in $C$.
C Hitting sets for UPT circuits
C.1 Commutative analogue of UPT circuits
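Consider the substitution map $\Phi : \{x_1,\ldots,x_n\} \to \mathbb{F}[y_{1,1},\ldots,y_{d,n}]^{(d+1)\times(d+1)}$ given by
$$\Phi(x_i) = \begin{pmatrix} 0 & y_{1,i} & 0 & \cdots & 0 \\ 0 & 0 & y_{2,i} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & y_{d,i} \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix} \quad \text{for all } i \in [n],$$
that is, $\Phi(x_i)$ has $y_{k,i}$ in entry $(k,k+1)$ for each $k \in [d]$ and $0$ everywhere else.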
To understand the effect of $\Phi$ on a homogeneous non-commutative polynomial $f(x_1,\ldots,x_n)$ of degree $d$, define $\Psi : \mathbb{F}\langle x_1,\ldots,x_n\rangle_{\deg=d} \to \mathbb{F}[y_{1,1},\ldots,y_{d,n}]$ as the unique $\mathbb{F}$-linear map given by
$$\Psi : x_{w_1}\cdots x_{w_d} \mapsto y_{1,w_1}\cdots y_{d,w_d}.$$
Lemma C.1 ([FS13]). Let $f = \sum_w a_w x_w \in \mathbb{F}\langle x_1,\ldots,x_n\rangle$ be a homogeneous degree-$d$ non-commutative polynomial. Then $f$ under the substitution map $\Phi$ (defined above) is given by
$$f \circ \Phi = f(\Phi(x_1),\ldots,\Phi(x_n)) = \begin{pmatrix} 0 & \cdots & 0 & \Psi(f) \\ 0 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 0 & 0 \end{pmatrix}_{(d+1)\times(d+1)}.$$
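Lemma C.1 can be checked directly on a small example. The sketch below is pure Python with hypothetical helper names: a commutative polynomial is a dict from sorted tuples of variable names to integer coefficients, a matrix is a list of lists of such dicts, and a non-commutative polynomial is a dict from words (tuples of variable indices) to coefficients. It multiplies out $f(\Phi(x_1),\ldots,\Phi(x_n))$ monomial by monomial and compares the top-right entry with $\Psi(f)$.

```python
from itertools import product


def padd(p, q):
    """Sum of commutative polynomials (dict: sorted tuple of variable names -> coeff)."""
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + c
    return {m: c for m, c in out.items() if c != 0}


def pmul(p, q):
    """Product of commutative polynomials in the same representation."""
    out = {}
    for (m1, c1), (m2, c2) in product(p.items(), q.items()):
        m = tuple(sorted(m1 + m2))
        out[m] = out.get(m, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}


def matmul(A, B):
    """Product of square matrices whose entries are polynomials."""
    size = len(A)
    C = [[{} for _ in range(size)] for _ in range(size)]
    for i in range(size):
        for j in range(size):
            for k in range(size):
                C[i][j] = padd(C[i][j], pmul(A[i][k], B[k][j]))
    return C


def phi(i, d):
    """Phi(x_i): (d+1) x (d+1) matrix with y_{k,i} in entry (k, k+1) for k = 1..d."""
    M = [[{} for _ in range(d + 1)] for _ in range(d + 1)]
    for k in range(1, d + 1):
        M[k - 1][k] = {(f"y{k},{i}",): 1}
    return M


def substitute(f, d):
    """f(Phi(x_1), ..., Phi(x_n)) for f given as a dict: word (tuple of indices) -> coeff."""
    total = [[{} for _ in range(d + 1)] for _ in range(d + 1)]
    for word, coeff in f.items():
        # start from coeff * identity, then multiply by Phi(x_i) letter by letter
        M = [[{(): coeff} if r == c else {} for c in range(d + 1)] for r in range(d + 1)]
        for i in word:
            M = matmul(M, phi(i, d))
        total = [[padd(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(total, M)]
    return total


def psi(f):
    """Psi: x_{w1} ... x_{wd} -> y_{1,w1} ... y_{d,wd}, extended linearly."""
    out = {}
    for word, coeff in f.items():
        m = tuple(sorted(f"y{k},{i}" for k, i in enumerate(word, start=1)))
        out[m] = out.get(m, 0) + coeff
    return {m: c for m, c in out.items() if c != 0}


f = {(1, 2): 1, (2, 1): -1, (1, 1): 3}  # x1 x2 - x2 x1 + 3 x1 x1, so d = 2
M = substitute(f, 2)
print(M[0][2] == psi(f))  # True
```

Note that $x_1x_2$ and $x_2x_1$ do not cancel after the substitution: $\Psi$ records the position of each variable, which is what keeps the map injective on words.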
Similar to the above definition of $\Psi$, we define a shifted version of it, called $\Psi_a$ (for a parameter $a \in \mathbb{N}$), as $\Psi_a : x_{w_1}\cdots x_{w_d} \mapsto y_{a+1,w_1}\cdots y_{a+d,w_d}$.
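Observation C.2. If $f \in \mathbb{F}\langle x_1,\ldots,x_n\rangle_{\deg=d_1}$ and $g \in \mathbb{F}\langle x_1,\ldots,x_n\rangle_{\deg=d_2}$, then for any $a \in \mathbb{N}$, we have $\Psi_a(f \cdot g) = \Psi_a(f) \cdot \Psi_{a+d_1}(g)$.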
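Observation C.2 is a statement about positions: inside $f \cdot g$, the variables of $g$ start at position $d_1+1$. A minimal sketch, assuming homogeneous non-commutative polynomials stored as dicts from words to coefficients, and commutative set-multilinear monomials stored as frozensets of (position, index) pairs (enough because each position occurs at most once); all names are illustrative.

```python
def psi_a(f, a):
    """Psi_a: x_{w1} ... x_{wd} -> y_{a+1,w1} ... y_{a+d,wd}, extended linearly.

    f is a dict word (tuple of indices) -> coeff; a commutative monomial is a
    frozenset of (position, index) pairs.
    """
    out = {}
    for word, c in f.items():
        m = frozenset((a + k, i) for k, i in enumerate(word, start=1))
        out[m] = out.get(m, 0) + c
    return {m: c for m, c in out.items() if c != 0}


def nc_mul(f, g):
    """Non-commutative product: coefficients multiply, words concatenate."""
    out = {}
    for u, c in f.items():
        for v, e in g.items():
            out[u + v] = out.get(u + v, 0) + c * e
    return {m: c for m, c in out.items() if c != 0}


def c_mul(p, q):
    """Commutative product of polynomials on disjoint sets of positions."""
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[m1 | m2] = out.get(m1 | m2, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}


f = {(1, 2): 1, (2, 2): -2}  # x1 x2 - 2 x2 x2, so d1 = 2
g = {(3,): 5, (1,): 1}       # 5 x3 + x1
a = 4
print(psi_a(nc_mul(f, g), a) == c_mul(psi_a(f, a), psi_a(g, a + 2)))  # True
```

Using the unshifted $\Psi_a(g)$ instead would place the variables of $g$ on positions already used by $f$, and the identity fails.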
In the case of [FS13], when $f$ was computable by non-commutative ABPs, they showed that $\Psi(f)$ is computable by an ROABP. In our setting of non-commutative UPT circuits, the following is the commutative analogue.
Observation C.3. Let $C$ be a UPT circuit of size $s$ and depth $r$ computing a polynomial $f \in \mathbb{F}\langle x_1,\ldots,x_n\rangle$. Consider the commutative circuit $C'$ where each leaf of type $(1,p)$ that is labelled by $x_i$ is replaced by $y_{p+1,i}$. Then the circuit $C'$ computes $\Psi(f)$ and is UPT and set-multilinear with respect to $\mathbf{y} = \mathbf{y}_1 \sqcup \cdots \sqcup \mathbf{y}_d$, where $\mathbf{y}_i = \{y_{i,j} : j \in [n]\}$.
BIWAs for subspaces and products
Lemma 5.6 (BIWA for subspaces). Say $V$ is a vector space of polynomials and suppose $\mathrm{wt}$ is a BIWA for $V$. If $V'$ is a subspace of $V$, then $\mathrm{wt}$ is a BIWA for $V'$ as well.
Proof. If $B$ is a monomial basis of $V$ that is isolated by $\mathrm{wt}$, then the columns indexed by $B$ span the column space of $V'$ as well. Starting with the columns of $V'$ indexed by $B$, pick a minimum weight basis $B'$ according to $\mathrm{wt}$, so that any column of $V'$ that is outside $B'$ is spanned by lower weight columns in $B'$. By definition, $\mathrm{wt}$ is a BIWA of $V'$ isolating $B'$, as all columns in $B'$ get distinct weights and every column outside $B'$ is spanned by lower weight columns in $B'$.
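The minimum weight basis in this proof can be found by the usual greedy procedure: scan monomials in increasing order of weight and keep a column exactly when it is not spanned by the columns already kept. The sketch below represents a space by its columns (one coefficient vector per monomial), uses exact rational arithmetic, and assumes the weights are distinct on the monomials considered; weights may be tuples, compared lexicographically as Python does, which is an assumption about the order $\prec$. Helper names are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank over Q of a list of equal-length vectors (Gaussian elimination)."""
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def min_weight_basis(columns, wt):
    """columns: dict monomial -> column vector (its coefficient in each generator).
    Keep a monomial iff its column is not spanned by the lower-weight columns kept so far."""
    basis = []
    for m in sorted(columns, key=wt):
        if rank([columns[b] for b in basis] + [columns[m]]) > len(basis):
            basis.append(m)
    return basis


wt = {"y1": 1, "y2": 2, "y3": 3}.get
V = {"y1": (1, 0), "y2": (1, 1), "y3": (0, 1)}   # generators y1 + y2 and y2 + y3
V_sub = {"y1": (1,), "y2": (0,), "y3": (-1,)}    # subspace spanned by y1 - y3
print(min_weight_basis(V, wt), min_weight_basis(V_sub, wt))  # ['y1', 'y2'] ['y1']
```

Every column skipped by the scan is, by construction, a combination of kept columns of smaller weight, which is the property the proof needs for $V'$.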
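Lemma 5.7 (BIWA for variable disjoint products). Say $V_1 \subseteq \mathbb{F}[\mathbf{y}]$ and $V_2 \subseteq \mathbb{F}[\mathbf{z}]$ are two vector spaces of polynomials over disjoint sets of variables, each of dimension at most $s$. Suppose
$$\mathrm{wt}_1 : \mathbf{y} \to \mathbb{N}^k, \qquad \mathrm{wt}_2 : \mathbf{z} \to \mathbb{N}^k$$
are BIWAs for $V_1$ and $V_2$ isolating bases $B_1$ and $B_2$ respectively, and suppose $w : \mathbf{y} \cup \mathbf{z} \to \mathbb{N}$ is a weight assignment that separates $B_1 \cdot B_2 = \{m_1 m_2 : m_1 \in B_1, m_2 \in B_2\}$. Then the weight assignment defined by
$$\mathrm{wt} : \mathbf{y} \cup \mathbf{z} \to \mathbb{N}^{k+1},$$
$$\mathrm{wt} : y_i \mapsto (\mathrm{wt}_1(y_i), w(y_i)) \text{ for all } y_i \in \mathbf{y},$$
$$\mathrm{wt} : z_i \mapsto (\mathrm{wt}_2(z_i), w(z_i)) \text{ for all } z_i \in \mathbf{z},$$
is a BIWA for $V = V_1 \cdot V_2 = \mathrm{span}\{f \cdot g : f \in V_1, g \in V_2\}$.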
Proof. Observe that by the definition of $\mathrm{wt}$, $\mathrm{wt}(m \cdot m') = (\mathrm{wt}_1(m) + \mathrm{wt}_2(m'), w(m \cdot m'))$ for any $m \in \mathrm{Mons}(\mathbf{y})$ and $m' \in \mathrm{Mons}(\mathbf{z})$.
If $V_1$ and $V_2$ are expressed as matrices (with the generators listed as rows), then the matrix corresponding to $V$ is just $V_1 \otimes V_2$, the tensor product. Let $B_1 = \{m_1,\ldots,m_r\}$ and $B_2 = \{m'_1,\ldots,m'_s\}$. We shall prove that the weight assignment $\mathrm{wt}$ is a BIWA that isolates the natural spanning set $B = B_1 \cdot B_2 = \{m_i m'_j : i \in [r], j \in [s]\}$. Firstly, note that all the elements of $B$ have distinct weights due to the presence of the last coordinate from $\mathrm{wt}$, which separates the $rs$ monomials in $B_1 \cdot B_2$.
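Now suppose $\tilde{m} = m \cdot m' \notin B$ for $m \in \mathrm{Mons}(\mathbf{y})$ and $m' \in \mathrm{Mons}(\mathbf{z})$, and say without loss of generality $m \notin B_1$. The column indexed by $\tilde{m}$ in $V_1 \cdot V_2$ is just the tensor product of the column indexed by $m$ in $V_1$ and the column indexed by $m'$ in $V_2$. But since $\mathrm{wt}_1$ is basis isolating for $V_1$, the column of $V_1$ indexed by $m$ can be expressed as a linear combination of lower weight terms:
$$\begin{aligned} V_{1,m} &= \sum_{\mathrm{wt}_1(m_i) \prec \mathrm{wt}_1(m)} a_i \cdot V_{1,m_i} \\ \implies V_{\tilde{m}} = V_{1,m} \otimes V_{2,m'} &= \sum_{\mathrm{wt}_1(m_i) \prec \mathrm{wt}_1(m)} a_i \cdot \left(V_{1,m_i} \otimes V_{2,m'}\right) = \sum_{\mathrm{wt}_1(m_i) \prec \mathrm{wt}_1(m)} a_i \cdot V_{m_i m'}. \end{aligned}$$
But notice that $\mathrm{wt}_1(m_i) \prec \mathrm{wt}_1(m)$ also implies that $\mathrm{wt}(m_i m') \prec \mathrm{wt}(m m')$. Therefore (repeating this argument on $m'$ if $m' \notin B_2$), we can write any column with index outside $B$ as a linear combination of columns of smaller weight in $B$. Hence, $\mathrm{wt}$ is indeed a BIWA for $V$ that isolates $B$.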
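The tensor-product description used in this proof can be checked on a toy example: listing generators as rows and monomials as columns, the coefficient matrix of $V_1 \cdot V_2$ (generators $f_i g_j$, monomials $m\,m'$) is the Kronecker product of the two coefficient matrices, because the variable sets are disjoint. The dict-of-monomials representation and the function names are illustrative assumptions.

```python
def coeff_matrix(gens, monomials):
    """Rows = generators (dicts monomial -> coeff), columns = the listed monomials."""
    return [[g.get(m, 0) for m in monomials] for g in gens]


def kron(A, B):
    """Kronecker (tensor) product of matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def product_coeff_matrix(gens1, mons1, gens2, mons2):
    """Coefficient matrix of V1 * V2: rows f_i * g_j (i outer), columns m * m' (m outer).
    A product monomial is stored as the pair (m, m'), valid for disjoint variable sets."""
    rows = []
    for f in gens1:
        for g in gens2:
            fg = {}
            for m1, c1 in f.items():
                for m2, c2 in g.items():
                    fg[(m1, m2)] = fg.get((m1, m2), 0) + c1 * c2
            rows.append([fg.get((m1, m2), 0) for m1 in mons1 for m2 in mons2])
    return rows


gens1, mons1 = [{"y1": 1, "y2": 2}, {"y2": -1}], ["y1", "y2"]  # y1 + 2 y2 and -y2
gens2, mons2 = [{"z1": 3, "z2": 1}], ["z1", "z2"]             # 3 z1 + z2
print(product_coeff_matrix(gens1, mons1, gens2, mons2))  # [[3, 1, 6, 2], [0, 0, -3, -1]]
```

Each column of the product matrix, indexed by a pair $(m, m')$, is the tensor product of the column of $m$ in $V_1$ with the column of $m'$ in $V_2$, which is the fact the argument relies on.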
C.2 Constant width UPT circuits
In this subsection we prove the existence of a $\mathrm{poly}(n,d)$-size hitting set for UPT circuits of constant preimage-width computing $n$-variate degree-$d$ polynomials, when the shape of the circuit is known. The proof is an easy extension of the ideas of [GKS16] to the UPT-SML circuit regime. We will construct a univariate substitution map of degree $\mathrm{poly}(n,d)$ that preserves nonzeroness, which naturally yields a hitting set.
Say $\mathbf{y} = \mathbf{y}_1 \sqcup \cdots \sqcup \mathbf{y}_d$ and let $f(\mathbf{y})$ be an $nd$-variate degree-$d$ polynomial computable by a UPT-SML circuit (with respect to the above partition) of constant preimage-width. From Observation 3.8, we may assume that the circuit has depth $O(\log d)$. We will need the following lemma for bivariates over large fields.
Lemma C.4 ([GKS16, Lemma 3.2]). Let $f(y_1,y_2) = \sum_{i=1}^{w} u_i(y_1)\,v_i(y_2)$ be a nonzero bivariate polynomial of degree $d$ over $\mathbb{F}$. If $\mathrm{char}(\mathbb{F}) = 0$ or $\mathrm{char}(\mathbb{F}) > d$, then $f(t^w, t^{w-1} + t^w) \ne 0$.
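The substitution in Lemma C.4 is concrete enough to compute in a few lines. This sketch represents univariate polynomials in $t$ as dicts from exponents to coefficients and a bivariate $f$ as a dict from exponent pairs $(a,b)$ to coefficients; it only evaluates $f(t^w, t^{w-1}+t^w)$ on examples and does not prove the lemma. Helper names are hypothetical.

```python
def poly_add(p, q):
    """Sum of univariate polynomials (dict: exponent -> coeff)."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}


def poly_mul(p, q):
    """Product of univariate polynomials."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}


def poly_pow(p, k):
    """p ** k by repeated multiplication."""
    out = {0: 1}
    for _ in range(k):
        out = poly_mul(out, p)
    return out


def gks_substitute(f, w):
    """f(t^w, t^(w-1) + t^w) for f given as a dict (a, b) -> coeff of y1^a * y2^b."""
    y1 = {w: 1}              # t^w
    y2 = {w - 1: 1, w: 1}    # t^(w-1) + t^w
    out = {}
    for (a, b), c in f.items():
        term = poly_mul({0: c}, poly_mul(poly_pow(y1, a), poly_pow(y2, b)))
        out = poly_add(out, term)
    return out


print(gks_substitute({(1, 0): 1, (0, 1): -1}, 2))  # {1: -1}
```

For instance, $y_1 - y_2 = y_1 \cdot 1 + 1 \cdot (-y_2)$ has $w = 2$ terms and maps to $-t$, while $(y_1-y_2)^2 = y_1^2 \cdot 1 + y_1 \cdot (-2y_2) + 1 \cdot y_2^2$ has $w = 3$ and maps to $t^4$.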
Suppose $f(\mathbf{y})$ is computable by a circuit $C$ that has shape $T$. Define the set of variables $\mathbf{t} = \{t_\tau : \tau \in T\}$. We begin by substituting $t_{\tau_i}^{j}$ for every $y_{ij}$, where $\tau_i$ is the node of $T$ corresponding to the leaf in $C$ computing polynomials over $\mathbf{y}_i$. As long as we can, we pick a multiplication gate $\tau$ whose left and right children (say $\tau_L$ and $\tau_R$) compute univariates in $t_{\tau_L}$ and $t_{\tau_R}$ respectively, and then substitute $t_{\tau_L} \leftarrow t_\tau^{w}$ and $t_{\tau_R} \leftarrow t_\tau^{w} + t_\tau^{w-1}$. Let us call this substitution $\Phi_\tau$.
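Lemma C.5. Consider the above iterative process of substituting some of the $y_i$'s by suitable polynomials in $\mathbf{t}$. Let $\tilde{\Phi}(f) = \tilde{f}(\mathbf{t},\mathbf{y}) \ne 0$ be the polynomial just before applying the substitution $\Phi_\tau$. Then
$$\tilde{f}' = \Phi_\tau(\tilde{f}) := \tilde{f}\left(t_{\tau_L} \leftarrow t_\tau^{w},\; t_{\tau_R} \leftarrow t_\tau^{w} + t_\tau^{w-1}\right) \ne 0.$$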
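Proof. From (3.4), we have
$$f = \sum_{u \sim \tau} [u] \cdot [\mathrm{root} : u] = \sum_{u \sim \tau} [u_L] \cdot [u_R] \cdot [\mathrm{root} : u]$$
$$\implies \tilde{\Phi}(f) = \sum_{u \sim \tau} a_u(t_{\tau_L}) \cdot b_u(t_{\tau_R}) \cdot h_u(\mathbf{y}, \mathbf{t} \setminus \{t_{\tau_L}, t_{\tau_R}\}) \ne 0.$$
The sum has at most $w$ terms, since the circuit has preimage-width $w$. We may therefore treat $\tilde{\Phi}(f)$ as a bivariate polynomial in $t_{\tau_L}, t_{\tau_R}$ over the field $\mathbb{F}(\mathbf{t} \setminus \{t_{\tau_L}, t_{\tau_R}\})$ and apply Lemma C.4 to conclude that $\Phi_\tau(\tilde{\Phi}(f))$ is nonzero if and only if $\tilde{\Phi}(f)$ is nonzero.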
Now for every leaf node in $T$, create a sequence, which we will call its signature, by walking down from the root to the leaf. Every time we pick the left child, we append $L$ to the signature, and every time we pick the right child, we append $R$. For $\tau \in T$, call the sequence $\mathrm{sig}_\tau = (a_1 a_2 \cdots a_r)$. Let $t$ be a fresh variable and let $\tau_i$ be the node corresponding to $\mathbf{y}_i$. Define
$$\Phi_L : t \mapsto t^{w}, \qquad \Phi_R : t \mapsto t^{w} + t^{w-1},$$
$$\Psi : \mathbf{y} \to \mathbb{F}[t], \qquad \Psi : y_{ij} \mapsto \Phi_{a_1} \circ \Phi_{a_2} \circ \cdots \circ \Phi_{a_r}(t^{j}),$$
where $(a_1 \cdots a_r) = \mathrm{sig}_{\tau_i}$. Observe that the procedure described above essentially executes the substitution $\Psi$ on $\mathbf{y}$. We can then infer from Lemma C.5 that for any $f(\mathbf{y})$ computable by UPT-SML circuits of preimage-width $w$ and shape $T$, $f(\mathbf{y}) \ne 0 \iff f(\Psi(\mathbf{y})) \ne 0$. This gives us the following theorem.
Theorem C.6. Let $f(\mathbf{y})$ be a polynomial computed by a UPT-SML circuit of width $w$ and depth $r$. Consider the substitution $\Psi : \mathbf{y} \to \mathbb{F}[t]$ given by
$$\Psi : y_{ij} \mapsto \Phi_{a_1} \circ \Phi_{a_2} \circ \cdots \circ \Phi_{a_r}(t^{j}),$$
where the signature of the part $\mathbf{y}_i$ is $a_1 a_2 \cdots a_r$. Then $f(\mathbf{y})$ is nonzero if and only if $f(\Psi(\mathbf{y}))$ is nonzero.
Now, since the depth of the circuit is at most $O(\log d)$, the final degree of $f(\Psi(\mathbf{y}))$ is at most $O(n w^{O(\log d)})$, which is $\mathrm{poly}(n,d)$ if $w = O(1)$. This finishes the proof of Theorem 1.3.
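The signature-based map $\Psi$ can be computed mechanically. In the sketch below a parse-tree shape is a nested pair `(left, right)` with part indices at the leaves (a hypothetical encoding), and univariate polynomials are dicts from exponents to coefficients. Since the substitution maps compose with $\Phi_{a_r}$ applied first, $\Psi(y_{ij}) = p_{a_r}(p_{a_{r-1}}(\cdots p_{a_1}(t)\cdots))^j$ with $p_L = t^w$ and $p_R = t^w + t^{w-1}$, which matches the bottom-up substitutions described above. Its degree is $j \cdot w^r$, the source of the $w^{O(\log d)}$ bound.

```python
def poly_add(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}


def poly_mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}


def poly_pow(p, k):
    out = {0: 1}
    for _ in range(k):
        out = poly_mul(out, p)
    return out


def compose(p, q):
    """The polynomial p(q(t))."""
    out = {}
    for e, c in p.items():
        out = poly_add(out, poly_mul({0: c}, poly_pow(q, e)))
    return out


def signatures(shape, prefix=""):
    """Map each leaf label of a nested-pair shape to its root-to-leaf L/R string."""
    if not isinstance(shape, tuple):
        return {shape: prefix}
    left, right = shape
    sig = signatures(left, prefix + "L")
    sig.update(signatures(right, prefix + "R"))
    return sig


def psi_image(sig, j, w):
    """Phi_{a1} o ... o Phi_{ar}(t^j): build p_{ar}(...p_{a1}(t)...) from a1 outward."""
    inner = {1: 1}  # the polynomial t
    for a in sig:
        p_a = {w: 1} if a == "L" else {w: 1, w - 1: 1}
        inner = compose(p_a, inner)
    return poly_pow(inner, j)


sig = signatures(((1, 2), (3, (4, 5))))
print(sig[4], max(psi_image(sig[4], 2, 3)))  # RRL 54
```

For example, with $w = 2$ the leaf reached by $LR$ gives $(t^2)^2 + t^2 = t^4 + t^2$: first $t_{\tau_R} \leftarrow t_{\tau}^2 + t_{\tau}$ at the left child of the root, then $t_\tau \leftarrow t^2$ at the root.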
D Hitting sets for FewPT circuits
We will need the following fact about coefficient operators (defined in Definition 6.2).
Observation D.1 (Coefficients of UPT circuits are also UPT circuits). Suppose $f(\mathbf{y})$ is a homogeneous degree-$d$ polynomial that is computable by a UPT set-multilinear circuit of preimage-width $w$ with respect to $\mathbf{y} = \mathbf{y}_1 \sqcup \cdots \sqcup \mathbf{y}_d$. If $S \subseteq [d]$ and $m$ is any monomial in $\mathbf{y}^{S}$, then the polynomial $\mathrm{coeff}_m(f)$ can also be computed by a UPT set-multilinear circuit of preimage-width $w$.
Sketch of Proof. Since the UPT-SML circuit $C$ can be made canonical without loss of generality, we only need to set each leaf variable in $\mathbf{y}^{S}$ to $1$ or $0$ depending on whether or not it appears in $m$.
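Behind Observation D.1 is the following identity for set-multilinear polynomials: setting $y_{s,j} := 1$ if $y_{s,j}$ divides $m$ and $y_{s,j} := 0$ otherwise, for every part $s \in S$, yields exactly $\mathrm{coeff}_m(f)$, because every monomial of $f$ contains exactly one variable from each part. A minimal sketch with monomials stored as frozensets of (part, index) pairs; the names are illustrative, and the two functions can disagree when $f$ is not set-multilinear.

```python
def coeff(f, m, S):
    """Coefficient of the y^S-monomial m in f, viewing f as a polynomial in the
    variables of y^S with coefficients in F[y^{-S}]."""
    out = {}
    for mono, c in f.items():
        if frozenset(v for v in mono if v[0] in S) == m:
            rest = frozenset(v for v in mono if v[0] not in S)
            out[rest] = out.get(rest, 0) + c
    return {k: v for k, v in out.items() if v != 0}


def set_leaves(f, m, S):
    """Substitute y_{s,j} := 1 if (s, j) is in m, else 0, for every part s in S."""
    out = {}
    for mono, c in f.items():
        # the monomial survives iff all of its y^S variables were set to 1
        if all(v in m for v in mono if v[0] in S):
            rest = frozenset(v for v in mono if v[0] not in S)
            out[rest] = out.get(rest, 0) + c
    return {k: v for k, v in out.items() if v != 0}


f = {  # 2 y11 y21 y31 + 3 y12 y21 y32 - y11 y22 y32
    frozenset({(1, 1), (2, 1), (3, 1)}): 2,
    frozenset({(1, 2), (2, 1), (3, 2)}): 3,
    frozenset({(1, 1), (2, 2), (3, 2)}): -1,
}
S, m = {1, 3}, frozenset({(1, 1), (3, 2)})
print(coeff(f, m, S) == set_leaves(f, m, S))  # True
```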
The following is an analogue of [GKST15, Lemma 4.5].
Lemma 6.3. Let $\mathbf{y} = \mathbf{y}_1 \sqcup \cdots \sqcup \mathbf{y}_d$ be a partition and $f(\mathbf{y})$ be a set-multilinear polynomial (with respect to the above partition) computed by a UPT-SML circuit of preimage-width $w$ and underlying parse-tree shape $T$. Suppose $g(\mathbf{y})$ is another set-multilinear polynomial (under the same partition) that cannot be computed by a UPT-SML circuit of preimage-width $w$ with the same shape $T$.
Then there exist $S \subseteq [d]$, $R \in \mathbb{F}[\mathbf{y}^{S}]^{1\times w'}$, and $P, Q \in \mathbb{F}[\mathbf{y}^{-S}]^{w'\times 1}$ with $w' \le w^2$ such that:
• For each $i \in [w']$, there is a monomial $m_i \in \mathbf{y}^{S}$ such that the $i$-th elements of $P$ and $Q$ are $\mathrm{coeff}_{m_i}(f)$ and $\mathrm{coeff}_{m_i}(g)$ respectively,
• there is a vector $\Gamma \in \mathbb{F}^{1\times w'}$ of support size at most $w+1$ such that $\Gamma P = 0$ and $\Gamma Q \ne 0$,
• the coefficient space of $R$ is full-rank, i.e. if we interpret $R$ as a matrix over $\mathbb{F}$ by listing each of its $w'$ entries as a column vector of coefficients, then this matrix has full column-rank,
• the vector of polynomials $R$ is simultaneously computable by a UPT-SML circuit of preimage-width at most $w'$.
Proof. For an $S \subseteq [d]$, let $\mathbf{y}^{S} = \{m_1,\ldots,m_r\}$ and $\mathbf{y}^{-S} = \{n_1,\ldots,n_t\}$ (as sets of monomials) in some order. Define $M_{f,S} \in \mathbb{F}^{r\times t}$ such that $M_{f,S}(i,j)$ is the coefficient of $n_j$ in $\mathrm{coeff}_{m_i}(f)$. Note that the $i$-th row of $M_{f,S}$ is the polynomial $\mathrm{coeff}_{m_i}(f)$ written in coefficient vector form.
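For small set-multilinear polynomials the matrix $M_{f,S}$ can be written down explicitly. This sketch assumes every part has $n$ variables indexed $0,\ldots,n-1$, stores $f$ as a dict from index tuples (one index per part, parts 0-indexed) to coefficients, and lists rows and columns in lexicographic order; the helper names are hypothetical. Its rank is the dimension of the span of the polynomials $\mathrm{coeff}_{m_i}(f)$, which is what a basis $B_{f,\tau}$ indexes.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank over Q of a list of equal-length rows, by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def partial_derivative_matrix(f, parts_S, parts_rest, n):
    """M_{f,S}: rows = monomials over y^S, columns = monomials over y^{-S};
    entry (i, j) = coefficient of n_j in coeff_{m_i}(f)."""
    d = len(parts_S) + len(parts_rest)
    matrix = []
    for r in product(range(n), repeat=len(parts_S)):
        row = []
        for c in product(range(n), repeat=len(parts_rest)):
            mono = [None] * d
            for part, idx in zip(parts_S, r):
                mono[part] = idx
            for part, idx in zip(parts_rest, c):
                mono[part] = idx
            row.append(f.get(tuple(mono), 0))
        matrix.append(row)
    return matrix


inner_product = {(i, i): 1 for i in range(3)}                  # sum_i y_{1,i} y_{2,i}
product_poly = {(0, 0): 1, (0, 2): 2, (1, 0): 1, (1, 2): 2}    # (y10 + y11)(y20 + 2 y22)
print(rank(partial_derivative_matrix(inner_product, [0], [1], 3)),
      rank(partial_derivative_matrix(product_poly, [0], [1], 3)))  # 3 1
```

The inner product needs a basis of size $n$, while a product of polynomials on $\mathbf{y}^{S}$ and $\mathbf{y}^{-S}$ has a single basis element.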
For a type $\tau$ in a tree $T$, $S_\tau$ will denote the set of leaves of the node $\tau$ in $T$. Consequently, we will also use just $M_{f,\tau}$ to mean $M_{f,S_\tau}$. We will denote by $B_{f,\tau}$ a set of monomials from $\mathbf{y}^{S_\tau}$ such that the rows indexed by them in $M_{f,\tau}$ form a basis of the rows of $M_{f,\tau}$. Note that if $\tau$ has children $\tau_1, \tau_2$, then we can ensure that our choice of $B_{f,\tau}$ satisfies $B_{f,\tau} \subseteq B_{f,\tau_1} \times B_{f,\tau_2}$, as the latter is clearly a spanning set. Using such a basis $B_{f,\tau}$, we can then write down a set of dependencies corresponding to $f$ and $\tau$ as below.
$$\forall m \in \mathbf{y}^{S_\tau} : \quad \mathrm{coeff}_m(f) = \sum_{m' \in B_{f,\tau}} \gamma_{m,m'}\,\mathrm{coeff}_{m'}(f). \tag{D.2}$$
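Using this, we can rewrite $f$ in the following way for any $\tau \in T$:
$$f = \sum_{m_k \in \mathbf{y}^{S_\tau}} m_k \Big( \sum_{m'_i \in B_{f,\tau}} \gamma_{i,k}\,\mathrm{coeff}_{m'_i}(f) \Big) = \sum_{m'_i \in B_{f,\tau}} \Big( \sum_{m_k \in \mathbf{y}^{S_\tau}} \gamma_{i,k}\, m_k \Big) \mathrm{coeff}_{m'_i}(f),$$
$$f = \sum_{m'_i \in B_{f,\tau}} u_i(\mathbf{y}^{S_\tau})\,\mathrm{coeff}_{m'_i}(f) \quad \text{for some } u_i \in \mathbb{F}[\mathbf{y}^{S_\tau}]. \tag{D.3}$$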
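Suppose $\tau \in T$ has two children $\tau_1$ and $\tau_2$ that share the same dependencies for $g$ as well. That is,
$$f = \sum_{m'_i \in B_{f,\tau_1}} u_i(\mathbf{y}^{S_{\tau_1}})\,\mathrm{coeff}_{m'_i}(f), \qquad g = \sum_{m'_i \in B_{f,\tau_1}} u_i(\mathbf{y}^{S_{\tau_1}})\,\mathrm{coeff}_{m'_i}(g),$$
$$f = \sum_{n'_j \in B_{f,\tau_2}} v_j(\mathbf{y}^{S_{\tau_2}})\,\mathrm{coeff}_{n'_j}(f), \qquad g = \sum_{n'_j \in B_{f,\tau_2}} v_j(\mathbf{y}^{S_{\tau_2}})\,\mathrm{coeff}_{n'_j}(g).$$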
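Combining them (and renaming the variables by dropping the primes), we get
$$f = \sum_{(m_i,n_j) \in B_{f,\tau_1} \times B_{f,\tau_2}} u_i(\mathbf{y}^{S_{\tau_1}})\,v_j(\mathbf{y}^{S_{\tau_2}}) \cdot \mathrm{coeff}_{m_i \cdot n_j}(f),$$
$$g = \sum_{(m_i,n_j) \in B_{f,\tau_1} \times B_{f,\tau_2}} u_i(\mathbf{y}^{S_{\tau_1}})\,v_j(\mathbf{y}^{S_{\tau_2}}) \cdot \mathrm{coeff}_{m_i \cdot n_j}(g).$$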
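Observe that if for all $m \in B_{f,\tau_1} \times B_{f,\tau_2}$ we have
$$\mathrm{coeff}_m(f) = \sum_{m' \in B_{f,\tau}} \gamma_{m,m'}\,\mathrm{coeff}_{m'}(f), \qquad \mathrm{coeff}_m(g) = \sum_{m' \in B_{f,\tau}} \gamma_{m,m'}\,\mathrm{coeff}_{m'}(g),$$
then, as in (D.3), this also forces the following for $\tau$:
$$f = \sum_{m_i \in B_{f,\tau}} u'_i(\mathbf{y}^{S_\tau})\,\mathrm{coeff}_{m_i}(f), \qquad g = \sum_{m_i \in B_{f,\tau}} u'_i(\mathbf{y}^{S_\tau})\,\mathrm{coeff}_{m_i}(g).$$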
Since $g$ is not computable by a UPT-SML circuit with underlying shape $T$, this cannot happen for all $\tau \in T$. Let us pick the lowest $\tau$ (closest to the leaves; say its children are $\tau_1, \tau_2$) such that for some $m \in B_{f,\tau_1} \times B_{f,\tau_2}$ we have
$$\mathrm{coeff}_m(f) = \sum_{m' \in B_{f,\tau}} \gamma_{m,m'}\,\mathrm{coeff}_{m'}(f), \qquad \mathrm{coeff}_m(g) \ne \sum_{m' \in B_{f,\tau}} \gamma_{m,m'}\,\mathrm{coeff}_{m'}(g). \tag{D.4}$$
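The choice of the vector of polynomials is now clear. Let $w' = |B_{f,\tau_1}| \cdot |B_{f,\tau_2}| \le w^2$, and define
$$R := \big(u_i(\mathbf{y}^{S_{\tau_1}})\,v_j(\mathbf{y}^{S_{\tau_2}}) : (m_i,n_j) \in B_{f,\tau_1} \times B_{f,\tau_2}\big) \in \mathbb{F}[\mathbf{y}^{S_\tau}]^{1\times w'},$$
$$P := \big(\mathrm{coeff}_{m_i \cdot n_j}(f) : (m_i,n_j) \in B_{f,\tau_1} \times B_{f,\tau_2}\big)^T \in \mathbb{F}[\mathbf{y}^{-S_\tau}]^{w'\times 1},$$
$$Q := \big(\mathrm{coeff}_{m_i \cdot n_j}(g) : (m_i,n_j) \in B_{f,\tau_1} \times B_{f,\tau_2}\big)^T \in \mathbb{F}[\mathbf{y}^{-S_\tau}]^{w'\times 1}.$$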
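It is clear from the definition that the vectors $P$ and $Q$ are made up of coefficients of $f$ and $g$. Also, (D.4) provides a suitable vector $\Gamma$ of support at most $w+1$ (the at most $w$ coefficients $\gamma_{m,m'}$ together with the entry for $m$) such that $\Gamma P = 0$ but $\Gamma Q \ne 0$.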
It follows that the coefficient space of $R$ is full-rank, as the sets of polynomials $\{u_i : i \in B_{f,\tau_1}\}$ and $\{v_j : j \in B_{f,\tau_2}\}$ are linearly independent and are on disjoint sets of variables.
We only need to show that every entry of $R$ can also be computed by a UPT-SML circuit of preimage-width at most $w^2$. To see this, observe that the set of polynomials $\{u_i(\mathbf{y}^{S_{\tau_1}}) : i \in B_{f,\tau_1}\}$ spans the set $\{\mathrm{coeff}_m(f) : m \in \mathbf{y}^{-S_{\tau_1}}\}$, and similarly $\{v_j(\mathbf{y}^{S_{\tau_2}}) : j \in B_{f,\tau_2}\}$ spans $\{\mathrm{coeff}_n(f) : n \in \mathbf{y}^{-S_{\tau_2}}\}$. Since the dimension of these spaces is at most $w$, it follows that each $u_i(\mathbf{y}^{S_{\tau_1}})$ can be written as a linear combination of at most $w$ many $\mathrm{coeff}_m(f)$'s, and similarly each $v_j(\mathbf{y}^{S_{\tau_2}})$. Observation D.1 shows that each of the coefficient polynomials can also be computed by UPT-SML circuits of preimage-width at most $w$. Thus, by computing each of the $u_i$'s and $v_j$'s separately, and then taking all $w^2$ products, we obtain a UPT-SML circuit of preimage-width at most $w^2$ that simultaneously computes all the entries of $R$.