We study the class of non-commutative Unambiguous circuits or Unique-Parse-Tree (UPT)
• An explicit hitting set of quasipolynomial size for UPT circuits,
Introduction
The field of algebraic complexity deals with classifying multivariate polynomials based on their hardness.
Typically, the complexity of a polynomial is measured by the size of the smallest circuit computing it (an arithmetic circuit is a directed acyclic graph made up of internal nodes that are labelled with + or × and leaves labelled with variables or constants from the field). The central question in this field is to construct in Definition 2.2). The class of non-commutative UPT circuits subsumes the class of non-commutative ABPs as any ABP can be expressed as a left-skew circuit. A related model of set-depth-∆ formulas was studied by Agrawal, Saha and Saxena [ASS13] that is a subclass of UPT circuits where the underlying parse trees are extremely regular.
Lagarde, Malod and Perifel [LMP16] extended the techniques of Nisan [Nis91] to give exponential lower bounds for UPT circuits. Subsequently, Lagarde, Limaye and Srinivasan [LLS17] extended the lower bounds to the class of circuits with parse trees of not-too-many shapes (at most $2^{o(n)}$ shapes).
In Figure 1, (a) is an example of a UPT circuit with (b) being the underlying parse tree shape; (c) is an example of a circuit with two distinct parse tree shapes.
Polynomial identity testing
A Polynomial Identity Test (PIT) is an algorithm that, given a circuit as input, checks if the circuit is computing the zero polynomial or not. The standard Ore-DeMillo-Lipton-Schwartz-Zippel lemma [Ore22, DL78, Sch80, Zip79] provides a simple randomized algorithm, but the goal is to construct an efficient deterministic PIT. A stronger test is a black-box PIT, where we are only provided evaluation access to the circuit. A black-box PIT is essentially equivalent to constructing a hitting set, i.e., a set of points (or matrices, in the case of non-commutative polynomials) H such that every non-zero polynomial from the class of interest is guaranteed to evaluate to a nonzero value on some element a ∈ H.
PITs that use the structure of the circuit are called white-box PITs.
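The randomized test mentioned above can be sketched as follows. This is a minimal illustration for the commutative case over a prime field; the function name `is_probably_zero`, the choice of prime and the number of trials are assumptions made for the example, and the non-commutative setting would evaluate on matrices instead of field elements.

```python
import random


def is_probably_zero(poly, num_vars, prime=2**61 - 1, trials=20, rng=None):
    """Randomized black-box identity test (illustrative sketch).

    `poly` is any callable that takes a list of integers mod `prime` and
    returns an evaluation. A non-zero polynomial of total degree d vanishes
    at a uniformly random point with probability at most d / prime.
    """
    rng = rng or random.Random()
    for _ in range(trials):
        point = [rng.randrange(prime) for _ in range(num_vars)]
        if poly(point) % prime != 0:
            return False  # a non-zero evaluation certifies non-zeroness
    return True  # all evaluations vanished: zero with high probability


P = 2**61 - 1
# (x + y)^2 - x^2 - 2xy - y^2 is identically zero.
zero_poly = lambda v: ((v[0] + v[1]) ** 2 - v[0] ** 2 - 2 * v[0] * v[1] - v[1] ** 2) % P
# (x + y)^2 - x^2 - y^2 = 2xy is not.
nonzero_poly = lambda v: ((v[0] + v[1]) ** 2 - v[0] ** 2 - v[1] ** 2) % P
```

The test only uses evaluation access to `poly`, which is what makes it black-box; a hitting set replaces the random points by a fixed explicit list.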
The task of constructing efficient PITs is intimately connected to the task of proving lower bounds [HS80, KI04, Agr05]. Once we have a lower bound for a class C, it is natural to ask if we can also construct efficient PITs for that class. Raz and Shpilka [RS05] gave the first deterministic polynomial time white-box PIT for the class of non-commutative ABPs. Forbes and Shpilka [FS13] gave a quasipolynomial ($n^{O(\log n)}$) size hitting set for non-commutative ABPs. This was achieved by studying a natural commutative analogue of non-commutative ABPs, namely the class of Read-Once Oblivious Algebraic Branching Programs (ROABPs), where the variables are read in a "known order".
The class of ROABPs is interesting in its own right owing to the connection with the "RL vs L" question. In fact, many of the hitting set constructions for ROABPs have been inspired by Nisan's [Nis92] pseudorandom generator for RL (which has seed length $O(\log^2 n)$). As mentioned earlier, Forbes and Shpilka gave a hitting set of size $n^{O(\log n)}$ for polynomial sized ROABPs when the order in which the variables are read is known. Agrawal, Gurjar, Korwar and Saxena [AGKS15] presented a different hitting set for the class of commutative ROABPs that does not need knowledge of the order in which the variables are read.
Subsequently, Gurjar, Korwar, Saxena and Thierauf [GKST15] studied polynomials that can be computed as a sum of constantly many ROABPs (of possibly different orders) and presented a polynomial time white-box PIT, and also a quasipolynomial time black-box PIT for this class.
Lagarde, Malod and Perifel [LMP16], besides presenting lower bounds for non-commutative UPT circuits, also gave a polynomial time white-box PIT for this class. This was extended by Lagarde, Limaye and Srinivasan [LLS17] to a white-box algorithm for non-commutative circuits with constantly many parse tree shapes (analogous to the result of [GKST15]). The question of constructing black-box PITs was left open by them, and we answer this in our paper.
Our results

Polynomial Identity Testing
Our main results are hitting sets for the class of polynomials computed by UPT circuits and related classes.
Theorem 1.1 (Hitting sets for UPT circuits). There is an explicit hitting set $H_{d,n,s}$ of size at most $(snd)^{O(\log d)}$ for the class of degree d n-variate homogeneous non-commutative polynomials in $\mathbb{F}\langle x_1, \ldots, x_n\rangle$ that are computed by UPT circuits of size at most s.
This result builds on the technique of basis isolating weight assignments introduced by [AGKS15] for constructing hitting sets for ROABPs. Furthermore, we can also extend the hitting set to the class of non-commutative circuits that have few shapes (analogous to the hitting set of [GKST15] for sums of few ROABPs). Both the above theorems are fully black-box in the sense that it is not required to know the underlying shape(s).
For the case of non-commutative ABPs (and more generally, ROABPs in a known order), Gurjar, Korwar and Saxena [GKS16] presented a more efficient hitting set when the width of the ABP is small. For UPT circuits, there is a natural notion of preimage-width of a UPT circuit (formally defined in Definition 2.3) that corresponds to the notion of width of an ABP. We show an analogue of the hitting set of Gurjar, Korwar and Saxena for the class of UPT circuits of small preimage-width if the underlying shape of the parse trees is known.
These hitting sets also translate to the natural commutative analogues of UPT set-multilinear circuits etc. (formally defined in Definition 5.1).
Structural results
If f is a non-commutative polynomial of degree d and if $\sigma \in S_d$ is a permutation on d letters, we define the shuffling of f by σ (denoted by $\Delta_\sigma(f)$) as the natural operation of permuting each word of f according to σ.
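A small sketch of the shuffling operation may help fix ideas. It represents a homogeneous non-commutative polynomial as a dictionary from words (tuples of letters) to coefficients; this representation, the name `shuffle`, and the convention that the letter at position i moves to position `sigma[i]` (0-indexed) are assumptions of the example, not notation from the text.

```python
from itertools import product


def shuffle(poly, sigma):
    """Apply the permutation `sigma` to the positions of every word of `poly`.

    `poly` maps words (tuples) of a common length d to coefficients, and
    `sigma` is a list that is a permutation of range(d). The letter at
    position i of a word is moved to position sigma[i] (assumed convention).
    """
    out = {}
    for word, coeff in poly.items():
        assert len(word) == len(sigma)
        new_word = [None] * len(word)
        for i, letter in enumerate(word):
            new_word[sigma[i]] = letter
        key = tuple(new_word)
        out[key] = out.get(key, 0) + coeff
    return out


# The degree-4 palindrome over {x, y}: the sum of all words w followed by reverse(w).
pal = {w + w[::-1]: 1 for w in product("xy", repeat=2)}
# Move positions 0 and 3 next to each other, and likewise positions 1 and 2.
shuffled = shuffle(pal, [0, 2, 3, 1])
```

With this particular σ, every word (a, b, b, a) becomes (a, a, b, b), so the shuffled palindrome factors as a product of two copies of the sum of all squares of variables.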
The three PIT statements stated above begin with the following depth reduction statement about UPT circuits. We also extend the lower bound of [LMP16] to give a polynomial computed by a skew circuit that requires exponential sized UPT circuits under any shuffling. Details are in Appendix B.
Theorem 1.4 (Depth reduction for UPT circuits
Proof ideas
As mentioned, the starting point of all these results is the depth reduction. The key insight here is that even though Pal d cannot be computed by small depth non-commutative circuits, a shuffling of the palindrome is 
Preliminaries
Notation
• We use $\mathbb{F}\langle x_1, \ldots, x_n\rangle$ to refer to the ring of polynomials in non-commuting variables $\{x_1, \ldots, x_n\}$. For a parameter d, we use $\mathbb{F}\langle x_1, \ldots, x_n\rangle_{\deg = d}$ to refer to the set of polynomials in $\mathbb{F}\langle x_1, \ldots, x_n\rangle$ that are homogeneous and of degree d. Similarly, $\mathbb{F}\langle x_1, \ldots, x_n\rangle_{\deg \le d}$ refers to the set of polynomials of degree at most d.
• We use boldface letters x and y to denote sets of variables (the number of variables would be clear from context). We shall also use [d] to refer to the set {1, 2, . . . , d}.
• The paper sometimes shifts between the commutative and the non-commutative domains. We use x whenever we are talking about non-commutative variables, and y, z for variables in the commutative domain.
Basic definitions
UPT and FewPT circuits
Definition 2.1 (Parse trees). A parse tree T of a circuit C is a tree obtained as follows:
• the root of C is the root of T ,
The preimage-width of a UPT circuit C is the largest size of preimages of any node τ ∈ T . That is,
♦ It is clear that if C is a UPT circuit of preimage-width w computing a homogeneous degree d polynomial, then the size of C is at most d w . The preimage-width of a UPT circuit is a more useful measure to study than the size of the circuit. A simple concrete example of this is that the standard conversion of homogeneous ABPs to homogeneous circuits in fact yields UPT circuits. Furthermore, the width of the ABP is directly related to the preimage-width of the resulting UPT circuit. 
♦ For instance, the usual multiplication (or concatenation) operation is just $\times_0$.
Shuffling of a polynomial
Definition 2.6 (Shuffling of a non-commutative polynomial 
Basic lemmas
Canonical UPT circuits, and types of gates
We shall say that a UPT circuit C with underlying parse tree shape T is canonical if for every gate g ∈ C there is some node τ ∈ T such that every parse tree of C involving g has g only in position τ. In other words, every gate of the circuit has a unique type associated with it. For a canonical UPT circuit where the parse trees have shape T , we shall say that g has type τ if τ ∈ T is the unique node in T such that g ∼ τ.
Fix a τ ∈ T and let i be the number of leaves of the subtree rooted at τ, and let p be the number of leaves to the left of τ in the inorder traversal of T. We shall then say that τ (or a gate g ∈ C of type τ) has position-type (i, p). The following lemma allows us to write the polynomial computed by the circuit as a small sum of $\times_p$-products, in the form needed for the non-commutative setting.
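The position-type of every node of a shape can be computed in one traversal. The sketch below encodes a shape as nested tuples (a node is the tuple of its child subtrees and a leaf is the empty tuple) and names nodes by their path from the root; both the encoding and the function name `position_types` are assumptions made for illustration.

```python
def position_types(tree, offset=0, path=""):
    """Yield (path, (i, p)) for every node of `tree`.

    i is the number of leaves below the node and p is the number of leaves
    of the whole shape that lie to the left of the node. The first pair
    yielded by each call is always the node the call was made on.
    """
    if not tree:  # a leaf
        yield path, (1, offset)
        return
    below = []
    leaves = 0
    for idx, child in enumerate(tree):
        sub = list(position_types(child, offset + leaves, path + str(idx)))
        below.extend(sub)
        leaves += sub[0][1][0]  # sub[0] describes the child itself
    yield path, (leaves, offset)
    yield from below


# Shape of the product (x * x) * x.
shape = (((), ()), ())
types = dict(position_types(shape))
```

Here the root gets position-type (3, 0), its left child (2, 0), and the three leaves (1, 0), (1, 1) and (1, 2) from left to right.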
UPT ⊗-circuits
To prove the depth reduction, we will move to an intermediate model of UPT ⊗-circuits. A parse tree for an ⊗-circuit is similar to parse trees in a general non-commutative circuit but the internal nodes of the parse tree are labelled by + and × p (with the p specified at each gate).
Definition 3.1 (UPT ⊗-circuits). We shall say that an ⊗-circuit C is UPT if every parse tree is of the same shape, i.e. two parse trees in C can differ only in the gate names. ♦
To prove Theorem 1.4, we shall first depth reduce the circuit to obtain an ⊗-circuit computing f of O(log d ) depth. Then, we will convert that to a UPT circuit that computes a shuffling of f .
Lemma 3.2 (Depth reducing to ⊗-circuits). Let $f \in \mathbb{F}\langle x_1, \ldots, x_n\rangle$ be a homogeneous degree d polynomial that is computable by a UPT circuit of preimage-width s. Then, f can equivalently be computed by a semi-unbounded UPT ⊗-circuit of preimage-width $O(s^2)$ and depth $O(\log d)$.
Proof. Let C be the UPT circuit computing f (x 1 , . . . , x n ) and say T is the shape of the parse trees of C . For any node τ ∈ T , let F τ be the set of all gates in C whose position in T is τ. For two gates u, v ∈ C , we shall say that u ⪰ v if the place of u in T is an ancestor of the place of v in T . We shall abuse notation and use u ⪰ τ to mean that u's position in T is an ancestor of τ ∈ T . For a gate u ∈ C , let [u] refer to the polynomial computed at that gate. Similar to [VSBR83, AJMV98], we define inductively the following notion of a gate quotient for any pair of gates u, v ∈ C :
for a suitable p depending just on τ and the type of u. Furthermore, suppose u, v ∈ C with v being a multiplication gate and if τ ∈ T such that u ⪰ τ ⪰ v then
for a suitable p depending just on τ and the type of u and v.
We'll defer this proof to later and first finish the proof of Lemma 3.2. With (3.4) and (3.5), we can construct the ⊗-circuit C ′ for f just as in [VSBR83, AJMV98]. The circuit C ′ would have gates computing each [u] and [u : v] for nodes u, v ∈ C with u ⪰ v and v being a multiplication gate. The wirings in C ′ are built by appropriate applications of (3.4) and (3.5).
Let u ∈ C and say deg[u] = d u . The plan would be to set up the computation in C ′ so that using an O(1) depth computation, we can compute [u] using gates whose degrees are a constant factor smaller than d u . Consider any parse tree rooted at u, and starting from u follow the higher degree child. Let τ be the last point on the path with degree ≥ d u /2 (degree of its children will be < d u /2). Applying (3.4), . Using (3.5),
where w = w 1 × w 2 and w 2 v (the other possibility is identical). By the choice of τ, we have deg[u : and write
By the choice of τ and τ ′ , each of the factors on the RHS have degree at most
as we wanted.
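The choice of τ in the argument above (follow the higher degree child, and stop at the last node of degree at least half of the starting degree) can be sketched on a shape alone. The nested-tuple encoding of shapes (a leaf is the empty tuple) and the names `leaves` and `heavy_split` are assumptions of this illustration; the degree of a node is taken to be the number of leaves below it.

```python
def leaves(tree):
    """Number of leaves of a shape; this is the degree computed at its root."""
    return 1 if not tree else sum(leaves(child) for child in tree)


def heavy_split(tree):
    """Walk from the root towards the higher degree child and return the
    path to, and the degree of, the last node whose degree is at least
    half of the degree at the root."""
    d = leaves(tree)
    node, path = tree, ""
    while node:
        idx, child = max(enumerate(node), key=lambda ic: leaves(ic[1]))
        if 2 * leaves(child) < d:
            break  # every child of `node` has degree < d / 2
        node, path = child, path + str(idx)
    return path, leaves(node)


# A left-skew shape with 8 leaves, as would arise from an ABP.
skew = ()
for _ in range(7):
    skew = (skew, ())
```

On the left-skew shape with 8 leaves, the walk stops at the node with 4 leaves below it, whose children have 3 leaves and 1 leaf respectively.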
Furthermore, once again, all of the summands consist of similarly typed factors.
This naturally yields an ⊗-circuit computing f of depth $O(\log d)$ and size poly(s). Since all summands consist of similarly typed factors, it follows that the circuit is UPT as well.
Now suppose u ≻ τ and say we already know that [
. We have two cases depending on whether u 1 ⪰ τ or u 2 ⪰ τ.
Essentially the same proof works for (3.5) as well.
Lemma 3.6 (⊗-circuits to circuits for a shuffling). Let f ∈ F 〈x 1 , . . . , x n 〉 be a homogeneous degree d poly-
Proof. We shall prove this by induction. We need a slightly stronger inductive hypothesis which is that the choice of permutation σ depends only on the shape of the parse trees in C ′ .
Say u is the root of C ′ . Suppose u is a + gate and say
r is the resulting computation in C ′′ then by the inductive hypothesis, we know that there is a σ ∈ S d such that Furthermore, the shuffling σ that permits this can also be efficiently computed given the underlying shape for the circuit computing f .
UPT circuits of constant width
For a UPT circuit C, we shall say that its width is w if for every node τ in the shape T, there are at most w gates of C that have type τ. The following observation is evident from the proof of the above depth reduction. This observation allows us to obtain a more efficient hitting set for the class of small-width, known-shape UPT circuits. Details are present in Section C.2.
Separating ROABPs and UPT circuits
Theorem 1.5 (Separating UPT circuits and ABPs, under shuffling). There is an explicit n-variate degree d non-commutative polynomial f that is computable by UPT circuits of preimage-width w
The polynomial and the proof technique described here were introduced by Hrubeš and Yehudayoff [HY16] to separate monotone circuits and monotone ABPs in the commutative regime. The polynomial described here is a non-commutative analogue of the polynomial used by [HY16] . Much of the proof is also the argument of [HY16] tailored to the non-commutative setting.
The polynomial
Let the root, and then the right-subtree listed inductively). We now define the non-commutative polynomial it. The lower bound follows along exactly the same lines as in [HY16]. A proof is present in Appendix A.
Hitting sets for non-commutative models
Commutative brethren of non-commutative models
This reduction to an appropriate commutative case was used by Forbes and Shpilka [FS13] to reduce constructing hitting sets for non-commutative ABPs to hitting sets for commutative ROABPs (more precisely, to set-multilinear ABPs). They studied the image of the non-commutative polynomial under the
which is the unique F-linear map given by Ψ :
For the model of non-commutative UPT circuits, the appropriate commutative model is a restriction of set-multilinear circuits that we call UPT set-multilinear (UPT-SML) circuits.
• each gate g ∈ C is labelled by a subset S g ⊆
A natural generalization that will be useful later is a multi-output UPT set-multilinear circuit, which is a UPT set-multilinear circuit that potentially has multiple output gates, which are all labelled with the same subset.
Forbes and Shpilka [FS13] showed that constructing hitting sets for these commutative models suffices for the non-commutative models by a simple reduction (details in Section C.1). We shall therefore focus on these commutative models for the hitting set constructions. And since we have already seen that such circuits can be depth reduced to $O(\log d)$ depth, it suffices to construct a hitting set for $O(\log d)$-depth UPT and FewPT set-multilinear circuits.
Preliminaries for PIT
Weight assignments and basis isolation
To construct hitting sets for ROABPs, Agrawal, Gurjar, Korwar and Saxena [AGKS15] defined the notion of basis isolating weight assignments for associated vector spaces of polynomials. The description presented here is an adaptation of the approach of [AGKS15] to set-multilinear circuits of small depth. 
. For a prime p, let w p : y → N be a weight assignment given by
Then for all but at most r 2
· n 2 primes p, the weight assignment w p separates S.
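The idea behind such prime-based weight assignments can be illustrated with a short sketch. The formula for $w_p$ is not reproduced in this text, so the example assumes a common choice: give the i-th variable the weight $B^i \bmod p$ for a base B larger than every exponent, and search for a prime under which all monomials of S receive distinct weights. The names `weight` and `separating_prime`, the base, and the encoding of monomials as exponent tuples are all hypothetical.

```python
from itertools import count


def is_prime(m):
    """Trial-division primality test; adequate for the small primes used here."""
    return m >= 2 and all(m % q for q in range(2, int(m ** 0.5) + 1))


def weight(monomial, base, p):
    """Weight of an exponent tuple when variable i has weight base**i mod p."""
    return sum(e * pow(base, i, p) for i, e in enumerate(monomial))


def separating_prime(monomials, base):
    """Smallest prime p under which all the given monomials get distinct weights.

    Two monomials can only collide modulo p if p divides the non-zero
    difference of their weights under the assignment i -> base**i, so
    only finitely many primes fail for a finite set of distinct monomials.
    """
    for p in filter(is_prime, count(2)):
        if len({weight(m, base, p) for m in monomials}) == len(monomials):
            return p


# Four multilinear monomials in three variables: y0, y1, y2 and y0*y1.
S = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
```

For this S and base 2, the primes 2 and 3 both cause collisions, while 5 assigns the weights 1, 2, 4 and 3.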
BIWAs for subspaces and products
Agrawal, Gurjar, Korwar and Saxena [AGKS15] constructed BIWAs for polynomials computed by ROABPs.
The following two lemmas are slight abstractions of the key ideas in [AGKS15] , so that they can also be applied in our setting. For the sake of completeness, the proofs are provided in Section C.1. wt : Proof. Suppose f (y) is a polynomial that is computable by a UPT set-multilinear circuit C with respect to y = y 1 ⊔ · · · ⊔ y d and say C is of preimage-width size w and depth r .
Hitting sets for
Since C is a UPT set-multilinear circuit, let T be the shape of the parse tree. For each τ ∈ T , we define the vector space
The following claim relates the vector space corresponding to nodes in T to the vector spaces corresponding to the children.
Claim 5.9. If τ ∈ T labels a + gate and if τ
If τ ∈ T labels a × gate and has children τ 1 and τ 2 , then V τ is a subspace of V τ 1 · V τ 2 .
Proof. Suppose τ ∈ T labels a + gate and say τ ′ is the unique child of τ in T . Pick an arbitrary
Since the choice of g was an arbitrary gate of type τ, it follows that V τ is a subspace of V τ ′ .
Say τ labels a × gate, and say τ 1 and τ 2 are the children of τ. Pick an arbitrary gate g ∈ C
Define the multiplication height of any gate g, denoted by $|g|_\times$, as the largest number of × gates encountered on a path from g to a leaf. Starting with the leaves, we shall build towards a BIWA for $V_{\mathrm{root}}$, which by Lemma 5.3 also yields a hitting set.
Let P be the set of the first $(dn^2w^2 + 1)$ primes. For each 0 ≤ k ≤ r and $\mathbf{p}$ = (p 1 , . . . ,
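The multiplication height used to organise the induction is easy to compute recursively. The sketch below encodes a node as a pair `(label, children)` with labels `"+"`, `"*"` or a variable name; this encoding and the name `mult_height` are assumptions of the example.

```python
def mult_height(node):
    """Largest number of multiplication nodes on a path from `node` to a leaf,
    counting `node` itself when it is a multiplication node."""
    label, children = node
    below = max((mult_height(child) for child in children), default=0)
    return below + (1 if label == "*" else 0)


leaf = ("y", [])
# Shape of + ( * ( + ( * (y, y) ), y ) ).
inner = ("+", [("*", [leaf, leaf])])
shape = ("+", [("*", [inner, leaf])])
```

Leaves and chains of + nodes above a leaf have height 0, which is the base case of the induction, and every × node is strictly higher than its children.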
The plan is to use Ω i ∈S τ
We shall prove, by induction, that for each 0 ≤ k ≤ r there is a $\mathbf{p} \in P^k$ such that for every τ ∈ T with $|\tau|_\times \le k$, the weight assignment $\mathrm{wt}^{(\tau)}_{\mathbf{p}}$ is a BIWA for $V_\tau$. If τ is a leaf of T, then any such node just computes a variable. Clearly, $\mathrm{wt}^{(\tau)}_{\mathbf{p}} : y_{ij} \mapsto j$ is a BIWA as it gives distinct weights to all variables of a partition. Hence, $\mathrm{wt}^{(\tau)}_{\mathbf{p}}$ is a BIWA for all $V_\tau$ whenever τ is a leaf.
If τ is not a leaf but $|\tau|_\times = 0$, then neither τ nor its descendants are × gates. Hence, the subtree at τ has a unique leaf ℓ and all the nodes along this path are + gates. By Claim 5.9, $V_\tau$ is a subspace of $V_\ell$ and hence, by Lemma 5.6, $\mathrm{wt}^{(\tau)}_{\mathbf{p}}$ is a BIWA for $V_\tau$. That finishes the base case of k = 0.
Suppose we have proved the claim up to k − 1. Let $T_k$ be the set of all nodes of multiplication height at most k that are × gates. By the inductive hypothesis, there exists $\mathbf{p} \in P^{k-1}$ such that $\mathrm{wt}^{(\tau')}_{\mathbf{p}}$ is a BIWA for all $V_{\tau'}$ with $|\tau'|_\times < k$. Fix such a $\mathbf{p}$. For each $\tau \in T_k$, its children $\tau_1, \tau_2$ must have multiplication height at most k − 1. Since C is set-multilinear, the subsets of indices that label $\tau_1$ and $\tau_2$ must be disjoint. Say $S_1$ and $S_2$ are the subsets of indices labelling $\tau_1$ and $\tau_2$ respectively.
Hence, by Claim 5.9, V τ is a subspace of V τ 1 · V τ 2 . By our inductive hypothesis, we know that wt 
are also BIWAs for $V_{\tau_1}$ and $V_{\tau_2}$ respectively. By using Lemma 5.7, Lemma 5.6 and Lemma 5.5, for all but at most $w^2n^2$ primes p ∈ P, the weight assignment defined by wt :
is a BIWA for $V_\tau$. For different τs in $T_k$ there may be a different set of $w^2n^2$ primes that we should exclude. But since the set P contains at least $w^2n^2d + 1$ primes, there is a prime p ∈ P for which wt(
is a BIWA for every $V_\tau$ where $\tau \in T_k$. By extending the tuple $\mathbf{p}$ by the prime p in the last coordinate, this shows that there is a $\mathbf{p}' \in P^k$ such that for each $\tau \in T_k$, the weight assignment wt
To complete the inductive step, we also need to prove the same for τ ∈ T that are + gates with |τ| × = k.
Hence, there must be a × gate τ ′ ∈ T k that is a descendant of τ such that the path from τ to τ ′ consists only of + gates. Once again, this forces wt
p and V τ is a subspace of V τ ′ . Hence, by Claim 5.9 and Lemma 5.6, it follows that wt
p is a BIWA for V τ as well. And that completes the proof of the inductive step.
Hence, if f is a polynomial computed by a preimage-width w UPT set-multilinear circuit of depth r ,
p is a BIWA for V root . Furthermore, by the prime number theorem, we know that the 
is a hitting set for preimage-width w depth r UPT set-multilinear circuits and $|H| = \mathrm{poly}(ndw)^r$.
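As a sanity check on the parameters, this bound can be combined with the depth reduction of Lemma 3.2. The short calculation below is a sketch that assumes the preimage-width of a size-s UPT circuit is at most s, so that the depth-reduced circuit has preimage-width $O(s^2)$ and depth $O(\log d)$.

```latex
\[
  |H| = \mathrm{poly}(ndw)^{r},
  \qquad w = O(s^2),\quad r = O(\log d)
  \;\Longrightarrow\;
  |H| = \mathrm{poly}(nds^2)^{O(\log d)} = (snd)^{O(\log d)},
\]
```

which agrees with the quasipolynomial size stated in Theorem 1.1.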
Poly-sized hitting sets for constant width UPT circuits
Theorem 1.3 (Hitting sets for known-shape low-width UPT circuits). Let $C_{n,d,T,w}$ be the class of n-variate degree d non-commutative polynomials that are computable by UPT circuits of preimage-width at most w and underlying parse-tree shape T. Over any field of zero or large characteristic, there is an explicit hitting set $H_{n,d,T,w}$ of size $w^{O(\log d)} \cdot \mathrm{poly}(nd)$ for $C_{n,d,T,w}$.
In this section we describe the black-box identity test for FewPT(k) circuits. The following lemma from [LLS17] shows that this class is equivalent to polynomials computed by sum of k UPT circuits (of possibly different shapes).
Preliminaries
Lemma 6.1. ([LLS17, Lemma 16]) Let f (x) be a polynomial computed by a FewPT(k) circuit of preimage-width w. Then f can be equivalently computed by a sum of k UPT circuits of preimage-width w each.
Like in [LLS17] , we'll refer to this class by Σ k -UPT. We shall further qualify this notation to use Σ k -UPT(w ) to denote the class of circuits that is a sum of k UPT circuits of preimage-width w .
From this lemma, we can focus our attention on constructing hitting sets for Σ k -UPT-SML circuits.
The proof largely follows the ideas of Gurjar, Korwar, Saxena and Thierauf [GKST15].
Notation
A Separating ABPs from UPT circuits
This section contains the proofs of the separation between ABPs and UPT circuits. Recall the definition of the polynomial $P_d$ (of degree $D = 2^{d+1} - 1$).
Upper bound 
. . , x m ). Therefore we can now recursively write
Now using (A.1) it is easy to see that if we have UPT circuits for P d−1,α (x 1 , . . . , x m )s then a UPT circuit computing P d (x 1 , . . . , x m ) can be obtained and this follows directly by induction. Hence, repeated application of (A.1) yields a UPT circuit computing P d of size O(m 2 d ).
Lower bound
As mentioned earlier, much of the lower bound argument is exactly along the lines of the proof of [HY16] .
The modifications required from their proof are quite minor but we present the proof here for completeness.
Theorem 4.3 (Lower bound). For every permutation σ ∈ S D , any non-commutative ABP computing the
Proof. Let us fix some σ ∈ S D and let Q(x 1 , . . . , x m ) = ∆ σ (P d ). In order to show that Q requires ABPs of large width, it suffices to show that there exists some 0 ≤ k ≤ D for which the partial derivative matrix,
given by
. We shall prove this by exhibiting an r × r identity matrix as a submatrix in M k (Q)
The k that we will work with is the number whose binary expansion is 10101 · · · . Its relevance comes from the fact that, for such a k, the edge boundary of any subset $V_0 \subseteq T_d$ with $|V_0| = k$ is reasonably large. We will need the notion of pure nodes (as defined by [HY16]).
Definition A.2 (Isoperimetric profile of graphs). Given a graph G = (V(G), E(G)) and a subset of vertices A ⊆ V(G), let $E(A, \bar{A})$ denote the set of edges between A and its complement. The edge isoperimetric profile of G is the function $\mathrm{eip}_G(k)$ defined by
$\mathrm{eip}_G(k) = \min \{\, |E(A, \bar{A})| : A \subseteq V(G),\ |A| = k \,\},$
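For small graphs the edge isoperimetric profile can be computed by brute force, which makes the definition concrete. The function name `eip` and the encoding of a graph as a vertex list plus an edge list are assumptions of this sketch; the running time is exponential in the number of vertices, so it is only meant for toy examples.

```python
from itertools import combinations


def eip(vertices, edges, k):
    """Minimum number of edges leaving a vertex subset of size exactly k."""
    best = None
    for subset in combinations(vertices, k):
        inside = set(subset)
        # An edge is on the boundary when exactly one endpoint is inside.
        boundary = sum((u in inside) != (v in inside) for u, v in edges)
        best = boundary if best is None else min(best, boundary)
    return best


# Complete binary tree of depth 2 with heap-style vertex labels 1..7.
V = list(range(1, 8))
E = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7)]
profile = [eip(V, E, k) for k in range(len(V) + 1)]
```

On this 7-vertex tree a single edge can only cut off 1, 3, 4 or 6 vertices, so subsets of size 2 or 5 (binary 101) necessarily have at least two boundary edges.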
Definition A.4 (Pure nodes). For i ∈ {0, 1}, a non-leaf node v in V i is said to be pure if there is a path
where v k is a leaf that is a descendant of v, and Π ∩ V i = {v}.
♦
There may be multiple witnesses v k for the fact that v is a pure node. For each pure node, we shall assign one leaf arbitrarily as its pure leaf. It is easy to see that the pure leaves are distinct for each pure node.
Let the pure nodes in V 0 be P 0 and those in V 1 be P 1 and say P := P 0 ∪ P 1 . Let ℓ(P ), ℓ(P 0 ) and ℓ(P 1 )
be the pure leaves of P , P 0 and P 1 respectively. for i = j . There must exist some leaf v ∈ ℓ(P 0 ) that gets different colours in C i and C j and let u be the node in P 1 that v was a pure leaf of. We shall assume that u is minimal in the sense that any pure node u ′ ∈ P 1 that is a descendant has all its leaves identically coloured in C i and C j . But then, the colour of u inC i and inC j cannot be the same as exactly one leaf if u has a different colour inC i andC j respectively. This would then imply that 
B Exponential lower bound under any shuffling
Here we give an explicit polynomial that has polynomial sized arithmetic circuits but requires exponential sized UPT circuits under any shuffling. A version of the hard polynomial appears in [LMP16] . They show that the polynomial requires exponential sized UPT circuits and that it is efficiently computable by what are known as skew circuits (see [LMP16] for a formal definition). Here we extend the lower bound and show that it applies to any shuffling of the polynomial.
B.1 The polynomial
The hard polynomial we discuss is called the moving palindrome which is a variant of the palindrome polynomial. The palindrome polynomial of degree d on n variables, as known, is defined as follows. P ℓ (x, z) = f (x, z) (say). For P ℓ , and for ℓ < j 1 , j 2 ≤ 4d − ℓ, we will say that j 1 and j 2 are dependent with respect to P ℓ if all monomials in P ℓ contain the same variable in positions j 1 and j 2 . It is easy to see that the criterion j 1 + j 2 = 2(d + ℓ) + 1 captures this relation. Define a dependency graph
. . , 4d } such that ( j 1 , j 2 ) ∈ E ℓ if and only if j 1 and j 2 are dependent with respect 
Proof. In the polynomial P ℓ , let + 1) , . . . , (4d − k)} and S 2 = {(4d − k + 1), . . . , 4d }. Now the possible choices for V 0 can be split into the following (possibly overlapping) cases:
Note that the degree of any vertex in T 1 is at least (2d − k), and that every even (or odd) vertex in M 1 is connected to every odd (or even) vertex in S 2 . Now Since the other cases (with T 2 , S 2 , M 2 ) are symmetric to those discussed above, we can conclude the statement of the claim.
In order to complete the proof, we just need to show that in any UPT circuit computing a homogeneous degree d polynomial, there will be a gate of position-type (i , p) with 
C Hitting sets for UPT circuits
C.1 Commutative analogue of UPT circuits
Consider the substitution map Φ : {x 1 , . . . , 
To understand the effect of Φ on a homogeneous non-commutative polynomial f (x 1 , . . . , x n ) of degree
as the unique F-linear map given by 
Similar to the above definition of Ψ, we define a shifted version of it called Ψ a (for a parameter a ∈ N)
Observation C.2. If f ∈ F 〈x 1 , . . . , x n 〉 deg=d 1 and g ∈ F 〈x 1 , . . . , x n 〉 deg=d 2 , then for any a ∈ N, we have
In the case of [FS13] , when f was computable by non-commutative ABPs, they showed that Ψ( f ) is computable by an ROABP. In our setting of non-commutative UPT circuits, the following is the commutative analogue. 
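The passage from non-commutative words to set-multilinear monomials can be sketched concretely. The explicit formulas for Ψ and $\Psi_a$ are not reproduced in this text, so the example assumes the standard position-tagging map that sends the word $x_{i_1} \cdots x_{i_d}$ to the monomial $y_{a+1,i_1} \cdots y_{a+d,i_d}$ (with a = 0 for Ψ); the names `psi`, `nc_mul` and `comm_mul` and the dictionary encodings are hypothetical.

```python
def psi(poly, shift=0):
    """Send each word (tuple of variable indices) to the set-multilinear
    monomial that tags every letter with its (shifted) position."""
    out = {}
    for word, coeff in poly.items():
        mono = frozenset((shift + pos + 1, var) for pos, var in enumerate(word))
        out[mono] = out.get(mono, 0) + coeff
    return out


def nc_mul(f, g):
    """Non-commutative product: words are concatenated."""
    out = {}
    for u, a in f.items():
        for v, b in g.items():
            out[u + v] = out.get(u + v, 0) + a * b
    return out


def comm_mul(f, g):
    """Commutative product of set-multilinear polynomials whose monomials
    use disjoint sets of positions (so a union of sets suffices)."""
    out = {}
    for u, a in f.items():
        for v, b in g.items():
            out[u | v] = out.get(u | v, 0) + a * b
    return out


f = {(1, 2): 1, (2, 1): 2}  # x1 x2 + 2 x2 x1, homogeneous of degree 2
g = {(1,): 1, (2,): -1}     # x1 - x2, homogeneous of degree 1
```

Under this assumed definition, the image of a product is the product of the image of f and the image of g shifted by the degree of f, which is the kind of multiplicativity that lets the circuit structure carry over to the commutative side.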
C.2 Constant width UPT circuits
In this subsection we prove the existence of a poly(n, d ) hitting set for UPT circuits of constant preimagewidth computing n-variate degree-d polynomials, when the shape of the circuit is known. The proof is an easy extension of the ideas of [GKS16] to the UPT-SML circuits regime. We will construct a univariate substitution map that preserves its nonzero-ness and has degree poly(n, d ), which will imply a hitting set naturally.
Say y = y 1 ⊔ · · · ⊔ y d and let f (y) be an nd -variate degree d polynomial computable by a UPT-SML circuit (with respect to the above partition) of constant preimage-width. From Observation 3.8, we may assume that the circuit has depth log d . We will need the following lemma for bivariates over large fields. Suppose f (y) is computable by a circuit C that has shape T . Define the set of variables t = {t τ : τ ∈ T }.
We will begin by substituting t j τ i for every y i j where the leaf in C computing polynomials over y i corresponds to τ i in T . As long as we can, we will pick a multiplication gate τ that has its left and right children (say τ L and τ R ) computing univariates in t τ L and t τ R respectively; and then substitute t τ L ← t Proof. From (3.4), we have
We may treat Φ( f ) as a bivariate polynomial in t τ L , t τ R over the field F(t \ t τ L , t τ R ) and apply Lemma C.4 to conclude that Φ τ (Φ( f )) will be nonzero if and only if Φ( f ) was nonzero.
Now for every leaf node in T , create a sequence which we will call its signature, by walking down from the root to the leaf. Every time we pick the left child, we append L to the signature and every time we pick the right child, we append R. For τ ∈ T , call the sequence sig τ = (a 1 a 2 · · · a r ). Let t be a fresh variable and τ i be the node corresponding to y i . Define
where (a 1 · · · a r ) = sig τ i . Observe that the procedure described above essentially executes the substitution Ψ on y. We can then infer from Lemma C.5 that for any f (y) computable by UPT-SML circuits, f (y) = 0 ⇐⇒ f (Ψ(y)) = 0. This gives us the following theorem. 
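The signatures of the leaves can be computed by a straightforward traversal of the shape. The sketch below encodes a shape as nested tuples (a leaf is the empty tuple) and, as an assumption not stated in the text, lets a node with a single child (a + node) contribute nothing to the signature; the function name `leaf_signatures` is also hypothetical.

```python
def leaf_signatures(tree, prefix=""):
    """Return the signatures of the leaves of `tree`, from left to right.

    Walking down from the root, "L" is appended when moving to a left
    child and "R" when moving to a right child.
    """
    if not tree:  # a leaf
        return [prefix]
    if len(tree) == 1:  # assumed: a + node with one child appends nothing
        return leaf_signatures(tree[0], prefix)
    left, right = tree
    return leaf_signatures(left, prefix + "L") + leaf_signatures(right, prefix + "R")


# A product whose left factor is itself a product and whose right factor
# is a + node sitting above a single leaf.
shape = (((), ()), ((),))
signatures = leaf_signatures(shape)
```

Since the depth-reduced circuits have depth $O(\log d)$, every signature has length $O(\log d)$.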
D Hitting sets for FewPT circuits
We will need the following fact about coefficient operators (defined in Definition 6.2). For a type τ in a tree T , S τ will denote the set of leaves of the node τ in T . Consequently, we will also use just M f ,τ to mean M f ,S τ . We will denote by B f ,τ a set of monomials from y S τ such that the rows indexed by them in M f ,S will form a basis of the rows of M f ,S . Note that if τ has children τ 1 , τ 2 , then we can ensure that our choice of B f ,τ satisfies B f ,τ ⊆ B f ,τ 1 ×B f ,τ 2 as the latter is clearly a spanning set. Using such a basis B f ,τ , we can then write down a set of dependencies as below corresponding to f and τ.
∀m ∈ y S τ : coeff m ( f ) =
