The complexity class PPA consists of NP-search problems which are reducible to the parity principle in undirected graphs. It contains a wide variety of interesting problems from graph theory, combinatorics, algebra and number theory, but only a few of these are known to be complete in the class. Before this work, the known complete problems were all discretizations or combinatorial analogues of topological fixed point theorems.
Introduction
The class PPA. The complexity class TFNP [21] consists of NP-search problems corresponding to total relations. In the last 25 years various subclasses of TFNP have been thoroughly investigated. The polynomial parity argument classes PPA and PPAD were defined in the seminal work of Papadimitriou [22] . PPA consists of the search problems which are reducible to the parity principle stating that in an undirected graph the number of odd vertices is even. The more restricted class PPAD is based on the analogous principle for directed graphs.
The class PPAD contains a relatively large number of complete problems from various areas of mathematics. In his paper, Papadimitriou [22] already showed that, among others, the 3-dimensional Sperner and Brouwer problems, as well as the Exchange Equilibrium problem from mathematical economics, are PPAD-complete. A few years later Chen and Deng [9] proved that 2-dimensional Sperner is also PPAD-complete, and after a sequence of beautiful papers Chen and Deng [10] established the PPAD-completeness of computing a 2-player Nash equilibrium; see also [11]. Kintali [18] has compiled a list of 25 PPAD-complete problems; the list is far from complete.
In comparison with PPAD, relatively few complete problems are known in the class PPA, all of which are discretizations or combinatorial analogues of topological fixed point theorems. While the original paper of Papadimitriou [22] exhibited a large collection of problems in PPA, none of them was proven to be PPA-complete. Historically, the first PPA-completeness result was given by Grigni [14] who, realizing that analogues of PPAD-complete problems in non-orientable spaces could become PPA-complete, showed the PPA-completeness of the Sperner problem for a non-orientable 3-dimensional space. This result was strengthened by Friedl et al. [17] to a non-orientable and locally 2-dimensional space. To our knowledge, until 2015 only these two problems were known to be PPA-complete. Last year Deng et al. [13] established the PPA-completeness of several 2-dimensional problems on the Möbius band, including Sperner and Tucker, and obtained similar results for the Klein bottle and the projective plane. Recently Aisenberg, Bonet and Buss [1] have shown that 2-dimensional Tucker in Euclidean space is PPA-complete.
Compared to the fundamental similarity of these complete problems, the list of problems in PPA for which no completeness result is known is very rich. Already in Papadimitriou's paper [22] we find problems from graph theory, such as Smith and Hamiltonian decomposition; from combinatorics, such as Necklace splitting and Discrete Ham sandwich (the proof in [23] that these problems are in PPAD was incorrect [1]); and from algebra, a variant of Chevalley's theorem over the two-element field $\mathbb{F}_2$, which we call Explicit Chevalley. Cameron and Edmonds [8] gave new proofs based on the parity principle for a long series of theorems from graph theory [5-7, 25, 29]; the corresponding search problems are therefore in PPA. Recently Jeřábek [15] placed several number-theoretic problems, such as computing square roots and finding quadratic nonresidues modulo n, into PPA, and he also showed that Factoring is in PPA under randomized reductions.
Our contribution. The main result of this paper is that two appropriately defined problems related to the Chevalley-Warning Theorem [12, 28] and to Alon's Combinatorial Nullstellensatz [2] over $\mathbb{F}_2$ are complete in PPA. These are the first PPA-completeness results involving problems which are not inspired by topological fixed point theorems.
The Chevalley-Warning Theorem is a classical result about zeros of polynomials. It says that if $P_1, \dots, P_k$ are n-variate polynomials over a finite field of characteristic p such that the sum of their degrees is less than n, then the number of their common zeros is divisible by p. The Combinatorial Nullstellensatz (CNSS) of Alon states that if P is an n-variate polynomial over a field F whose degree is $d_1 + \dots + d_n$, and this is certified by the monomial $c x_1^{d_1} \cdots x_n^{d_n}$ of P for some $c \neq 0$, then $S_1 \times \dots \times S_n \subseteq F^n$ contains a point where P is not zero, whenever $|S_i| > d_i$ for $i = 1, \dots, n$. The CNSS has found a wide range of applications, among others in graph theory, combinatorics and additive number theory [2, 3].
Over the field $\mathbb{F}_2$ the two theorems simplify greatly via the notion of multilinear degree. For any polynomial P over $\mathbb{F}_2$, there exists a unique multilinear polynomial M such that P and M compute the same function on $\mathbb{F}_2^n$. We call the degree of M the multilinear degree of P, denoted mdeg(P). We use deg(P) to denote the usual degree of P. Then the Chevalley-Warning Theorem and the CNSS over $\mathbb{F}_2$ are equivalent to the following statement:
An n-variate $\mathbb{F}_2$-polynomial has an odd number of zeros if and only if its multilinear degree is n.
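For small n, this statement can be checked by brute force. The following Python sketch is only an illustration (it is exponential in n), and its function names are our own: it computes the coefficients of the multilinear polynomial that agrees with a given function on $\mathbb{F}_2^n$ via the binary Möbius transform. The coefficient of $x_1 \cdots x_n$ is then the parity of the number of points where the function equals 1, which for $n \ge 1$ has the same parity as the number of zeros, since $2^n$ is even.

```python
from itertools import product


def multilinear_coefficients(f, n):
    """Coefficients c_T of the unique multilinear M with M = f on F_2^n.

    A subset T is encoded as a bitmask (bit i set <=> x_{i+1} occurs in x_T).
    """
    # Start from the truth table, indexed by the same bitmasks.
    coeff = [f(tuple((a >> i) & 1 for i in range(n))) % 2 for a in range(1 << n)]
    # Binary Moebius transform: afterwards coeff[T] is the sum (mod 2) of f
    # over all points whose support is contained in T.
    for i in range(n):
        for a in range(1 << n):
            if a & (1 << i):
                coeff[a] ^= coeff[a ^ (1 << i)]
    return coeff


def mdeg(f, n):
    """Multilinear degree of f; -inf for the zero function."""
    coeff = multilinear_coefficients(f, n)
    degrees = [bin(T).count("1") for T in range(1 << n) if coeff[T]]
    return max(degrees, default=float("-inf"))


def num_zeros(f, n):
    return sum(1 for a in product((0, 1), repeat=n) if f(a) % 2 == 0)


if __name__ == "__main__":
    # Exhaustive check of the statement over all 2^(2^n) functions for n = 3.
    n = 3
    for table in range(1 << (1 << n)):
        f = lambda a, t=table: (t >> sum(bit << i for i, bit in enumerate(a))) & 1
        assert (num_zeros(f, n) % 2 == 1) == (mdeg(f, n) == n)
    print("statement verified for n =", n)
```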
The natural search problem corresponding to the CNSS is therefore: given an n-variate polynomial P whose multilinear degree is n, find a point a where P(a) = 1. Similarly, the search problem corresponding to the Chevalley-Warning Theorem is: given an n-variate polynomial P whose multilinear degree is less than n, and a zero of P, find another zero.
Obviously, these problems are not yet well defined algorithmically, since it is not specified how the polynomial P is given. The starting point of our investigations is Papadimitriou's result about an instantiation of the Chevalley-Warning Theorem. Specifically, in [22] Papadimitriou considered the following problem. Let the polynomials $P_1, \dots, P_k$ be given explicitly as sums of monomials, and define $P(x) = 1 + \prod_{i=1}^{k} (P_i(x) + 1)$. Then $\deg(P) = \sum_{i=1}^{k} \deg(P_i)$, and clearly $P(x) = 0$ if and only if $P_i(x) = 0$ for all $i \in [k]$. Suppose that $\deg(P) < n$, and that we are given $a \in \mathbb{F}_2^n$ such that $P(a) = 0$. Then the task is to find $a' \neq a$ such that $P(a') = 0$. We call this problem Explicit Chevalley; Papadimitriou showed [22] that it is in PPA.
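To illustrate the input format of Explicit Chevalley, the following sketch assumes a hypothetical encoding in which each $P_i$ is a list of monomials and each monomial is a tuple of variable indices. It builds $P(x) = 1 + \prod_{i=1}^{k}(P_i(x) + 1)$ and finds another zero by exhaustive search, which takes exponential time; it only illustrates the problem, not Papadimitriou's proof of membership in PPA.

```python
from itertools import product
from math import prod


# Hypothetical encoding: a polynomial over F_2 is a list of monomials; a monomial
# is a tuple of 0-based variable indices (repetitions encode powers, () is 1).
def evaluate(poly, a):
    return sum(prod(a[i] for i in mono) for mono in poly) % 2


def formal_degree(poly):
    """Degree of the explicit expression (an upper bound if monomials cancel)."""
    return max((len(mono) for mono in poly), default=float("-inf"))


def chevalley_polynomial(polys):
    """P(x) = 1 + prod_i (P_i(x) + 1), which vanishes exactly at the common zeros."""
    return lambda a: (1 + prod((evaluate(p, a) + 1) % 2 for p in polys)) % 2


def another_zero(polys, n, a):
    """Brute-force solver for Explicit Chevalley (exponential time, illustration only)."""
    assert sum(formal_degree(p) for p in polys) < n, "degree condition violated"
    P = chevalley_polynomial(polys)
    assert P(a) == 0, "a must be a common zero"
    for b in product((0, 1), repeat=n):
        if b != tuple(a) and P(b) == 0:
            return b
    return None  # never reached when the degree condition holds


if __name__ == "__main__":
    # P_1 = x_1 + x_2 and P_2 = x_3 in n = 3 variables; 1 + 1 < 3.
    print(another_zero([[(0,), (1,)], [(2,)]], 3, (0, 0, 0)))  # (1, 1, 0)
```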
Could it be that Explicit Chevalley is PPA-complete? We find this highly unlikely. There are two restrictions on the input of Explicit Chevalley. Firstly, the polynomial P is given by an arithmetic circuit (in fact by an arithmetic formula) of a specific form. Secondly, the number of variables bounds not only the multilinear degree of P but also its degree. The first restriction can easily be relaxed: the circuit degree of an arithmetic circuit (also known as the formal degree; see Section 2.3), which is an upper bound on the degree of the polynomial computed by the circuit, is easily defined and computed recursively. Could it be that the problem becomes PPA-complete when P is specified by an arithmetic circuit whose circuit degree is less than n? While this problem might indeed be harder than Explicit Chevalley, we still do not believe that it is PPA-complete.
We believe that the more important restriction in Papadimitriou's problem is the one on the degree of the polynomial P computed by the input circuit. As we have seen, for P to have an even number of zeros it is mathematically only required that the multilinear degree of P is less than n, so a restriction on the degree of P is too stringent. Let us then consider instances specified by arithmetic circuits computing polynomials of multilinear degree less than n. Here, however, we face a serious difficulty. We cannot simply promise that the polynomial has multilinear degree less than n, since PPA is a syntactic class: we must be able to verify syntactically that this is indeed the case.
The multilinear degree of the polynomial is determined by the parity of the number of monomials computed by the circuit that contain every variable. Let us call such monomials maximal. Indeed, the multilinear degree of P is less than n if and only if the circuit computes an even number of maximal monomials. A very general way to prove efficiently that a set has even cardinality is to give a polynomial-time Turing machine which computes a perfect matching on the elements of the set. However, parsing the monomials of an arbitrary arithmetic circuit is a rather complex task [19]. For a start, the number of maximal monomials computed by a polynomial-size arithmetic circuit can be doubly exponential, making even the description of such a monomial impossible in polynomial time. Fortunately, the situation over the field $\mathbb{F}_2$ simplifies considerably, thanks to cancellations due to certain symmetries. In fact, we are able to show that over $\mathbb{F}_2$ it is sufficient to consider only those monomials which are computed by consistent left/right labellings of the sum gates participating in the computation of the monomial, because the remaining monomials cancel out. We call such labellings parse subcircuits, and we call those parse subcircuits which compute maximal monomials maximal. The introduction of parse subcircuits was inspired by the concept of parse trees in [16, 20]. Technically, this result shows that computing the multilinear degree is in ⊕P (Parity P).
Is there a chance that computing the multilinear degree of a general circuit is in P? It turns out that there is not, unless ⊕P = P, because we can show that computing the multilinear degree is also ⊕P-hard. Therefore we have to identify a restricted class of circuits computing polynomials of multilinear degree less than n which satisfies two properties: on the one hand, the class is restricted enough that we can construct a polynomial-time perfect matching for the maximal parse subcircuits; on the other hand, it is large enough that finding another zero of the circuit is PPA-hard. The main contribution of this paper is the identification of such a class of arithmetic circuits, which we call PPA-circuits.
The definition of these circuits is inspired by a rather straightforward translation of Papadimitriou's basic PPA problem into a problem about arithmetic circuits. In a nutshell, the basic PPA problem is the following: given a degree-one vertex of a graph in which every vertex has degree at most two, find another degree-one vertex. Here the graph, whose vertices are the 0-1 strings of a given length, is given via a polynomial-time Turing machine M determining the neighbourhood of any specified node. We construct an arithmetic circuit over $\mathbb{F}_2$ which, given a vertex v of this graph, computes the complement of the parity of the number of v's neighbours. Therefore finding another degree-one vertex is the same as finding another zero of the polynomial computed by the circuit. Most importantly, the circuit is constructed in a special form which allows for a polynomial-time-computable perfect matching over its maximal parse subcircuits. Roughly speaking, from the Turing machine M that describes the neighbours of vertices, we extract two arithmetic circuits D and F that also describe the neighbours in a certain way. We then define the so-called PPA-composition of these two circuits, which produces a circuit C that accesses D and F in a black-box fashion. Symmetries of the PPA-composition, reflecting the special structure of degree computation, enable us to construct a polynomial-time-computable perfect matching over its maximal parse subcircuits (cf. Lemma 8). Finally, we define a PPA-circuit as the sum of a PPA-composition and another circuit whose circuit degree is less than n. This is only a minor extension of the family of PPA-compositions, since circuits with circuit degree less than n have no maximal parse subcircuits.
The reason for considering this extended family is that our result then immediately generalizes Papadimitriou's result [22] about Explicit Chevalley, and it also makes it easier to express the equivalence between the algorithmic versions of the Chevalley-Warning Theorem and the CNSS.
The definitions of our two problems, PPA-Circuit CNSS and PPA-Circuit Chevalley, are therefore the following. In both cases we are given an n-variable PPA-circuit C over $\mathbb{F}_2$ and an element $a \in \mathbb{F}_2^n$. In the case of PPA-Circuit Chevalley, a is a zero of C, and for PPA-Circuit CNSS we consider the sum of the circuits C and $L_a$, where $L_a$ is a simple Lagrange-circuit having a as its only satisfying assignment and having a single maximal parse subcircuit. The computational task is to compute another zero of C in the case of PPA-Circuit Chevalley, and a satisfying assignment for $C + L_a$ in the case of PPA-Circuit CNSS. Our result is stated in the following theorem.
Theorem 1. PPA-Circuit CNSS and PPA-Circuit Chevalley are PPA-complete.
Since the two problems are easily interreducible, for the proof of Theorem 1 we show that PPA-Circuit CNSS is PPA-easy and that PPA-Circuit Chevalley is PPA-hard.
For the easiness part we define a graph, inspired by Papadimitriou's construction, whose vertices are the assignments to the variables and the parse subcircuits. There is an edge between a parse subcircuit and an assignment if the monomial defined by the subcircuit takes the value 1 on the assignment. In addition, we put an edge between two maximal parse subcircuits of the PPA-composition part of the circuit if they are paired by the perfect matching. As it turns out, the odd-degree vertices of this graph are exactly the assignments where the polynomial defined by the circuit is 1, together with the unique maximal parse subcircuit of the Lagrange-circuit. Technically, the main part of the proof is to give, for every assignment, a polynomial-time computable pairing between its exponentially many neighbouring parse subcircuits. For the hardness part (which is much simpler to prove) we express the basic PPA-complete problem as a PPA-composition, as explained above.
Previous work. Papadimitriou proved that Explicit Chevalley is in PPA. Varga [27] showed the same for the special case of CNSS where the input polynomial P is specified as the sum of a polynomial number of polynomials $P_i$, each of which is the product of explicitly given polynomials whose degrees sum to at most n. In addition, the input also contains a polynomial-time computable matching for all but one of the monomials $x_1 \cdots x_n$ of P. However, that paper does not address the question of why this does not make the problem a promise problem. Concerning the hardness of CNSS, Alon proved in [3] the following result. Let P be specified by an arithmetic circuit in such a way that it can be efficiently checked that its multilinear degree is n. If a polynomial-time algorithm can find a point a where P(a) = 1, then there are no one-way permutations.
Structure of the paper. In Section 2 we recall the definitions of the class PPA, the Combinatorial Nullstellensatz and the Chevalley-Warning Theorem, and arithmetic circuits. In Section 3 we define the parse subcircuits of an arithmetic circuit over $\mathbb{F}_2$, and in Proposition 6 we prove that the polynomial computed by the circuit is the sum of the monomials computed by the parse subcircuits. In Section 4 we define PPA-circuits, and in Lemma 8 we prove that in such circuits a perfect matching for the maximal parse subcircuits can be computed in polynomial time. In Section 5 we state the problems PPA-Circuit CNSS and PPA-Circuit Chevalley over $\mathbb{F}_2$ and observe that they are polynomially interreducible. In Section 6, in Theorem 11, we prove that PPA-Circuit CNSS is in PPA, and in Section 7, in Theorem 13, we prove that PPA-Circuit Chevalley is PPA-hard.
Preliminaries

Total functional NP and the class PPA
We denote the set $\{1, \dots, n\}$ by [n]. A polynomial-time computable binary relation $R \subseteq \{0,1\}^* \times \{0,1\}^*$ is called balanced if for some polynomial p(n), for every x and y such that R(x, y) holds, we have $|y| \le p(|x|)$. Such a relation defines an NP-search problem $\Pi_R$ whose input is x; the task is to find a solution y with R(x, y) if one exists, and to report "failure" otherwise. The class FNP of functional NP consists of NP-search problems. For two problems $\Pi_R$ and $\Pi_S$ in FNP, we say that $\Pi_R$ is reducible to $\Pi_S$ if there exist two polynomial-time computable functions f and g such that for every positive x, S(f(x), y) implies R(x, g(x, y)).
An NP-search problem is total if for every x there exists a solution y. Megiddo and Papadimitriou [21] called the class of these problems TFNP (for Total Functional NP). Problems in TFNP exhibit very interesting complexity properties. An FNP-complete search problem cannot be total unless NP = coNP. It is also unlikely that every problem in TFNP can be solved in polynomial time, since this would imply P = NP ∩ coNP. TFNP is a semantic complexity class, in the sense that it involves a promise about the totality of the relation R. It is widely believed that such a promise cannot be enforced syntactically on a Turing machine; in fact, no recursive enumeration of the Turing machines that compute total search problems is known. As is usual with semantic complexity classes, TFNP does not seem to have complete problems. On the other hand, several syntactically defined subclasses of TFNP with a rich structure of complete problems have been identified, along the lines of the mathematical proofs establishing the totality of the defining relations.
The parity argument subclasses of TFNP were defined by Papadimitriou [22, 23]. They can be specified via concrete problems, by closure under reductions. The Leaf problem is defined as follows. The input is a triple (z, M, ω) where z is a binary string and M is the description of a polynomial-time Turing machine that defines a graph $G_z = (V_z, E_z)$ as follows. The set of vertices is $V_z = \{0,1\}^{p(|z|)}$ for some polynomial p. For any vertex $v \in V_z$, the machine M outputs on (z, v) a set of at most two vertices. Then we define $G_z$ as the graph without self-loops in which $\{v, v'\} \in E_z$ if and only if $v \neq v'$, $v' \in M(z, v)$ and $v \in M(z, v')$.
Obviously $G_z$ is an undirected graph in which every vertex has degree at most 2, and therefore the number of leaves, that is, of degree-one vertices, is even. Finally, $\omega \in V_z$ is a degree-one vertex that we call the standard leaf. The output of the problem Leaf is a leaf of $G_z$ different from the standard leaf. The Polynomial Parity Argument class PPA is the set of total search problems reducible to Leaf. The directed class PPAD is defined by D-Leaf, the directed analogue of Leaf. In the problem D-Leaf the Turing machine defines a directed graph in which the indegree and the outdegree of every vertex are at most one. The standard leaf ω is a source, and the output is a sink or a source different from ω.
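A toy instance of Leaf can make the definition concrete. In the sketch below the machine M is replaced by a hypothetical Python function `toy_M` on 3-bit vertices (encoded as the integers 0 to 7, with the string z left implicit), and {v, w} is an edge exactly when v ≠ w, w ∈ M(v) and v ∈ M(w). The exhaustive search over all vertices is exponential in the bit length and serves only to exhibit the even number of leaves.

```python
def toy_M(v):
    """Hypothetical stand-in for the machine M on {0,1}^3 (vertices 0..7).

    M proposes v - 1 and v + 1, except that vertex 4 does not propose 3.
    """
    candidates = {5} if v == 4 else {v - 1, v + 1}
    return {w for w in candidates if 0 <= w < 8}


def neighbours(M, v):
    """Edges of G_z: {v, w} is an edge iff v != w, w in M(v) and v in M(w)."""
    return {w for w in M(v) if w != v and v in M(w)}


def leaves(M, num_vertices):
    return [v for v in range(num_vertices) if len(neighbours(M, v)) == 1]


def other_leaf(M, num_vertices, omega):
    """Exhaustive search for a leaf different from the standard leaf omega."""
    assert len(neighbours(M, omega)) == 1, "omega must be a leaf"
    return next(v for v in leaves(M, num_vertices) if v != omega)


if __name__ == "__main__":
    # The proposal 3 -> 4 is not reciprocated, so G_z is the two paths
    # 0-1-2-3 and 4-5-6-7.
    print(leaves(toy_M, 8))         # [0, 3, 4, 7]: an even number of leaves
    print(other_leaf(toy_M, 8, 0))  # 3
```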
As shown in [23], the definition of PPA also captures problems for which the underlying graph has unbounded degrees and we seek another odd-degree vertex. Specifically, suppose there exists a polynomial-time edge recognition algorithm ε(v, v'), which decides whether $\{v, v'\} \in E_z$. Assume also that we have a polynomial-time pairing function φ(v, w), where by definition, for every vertex v, the function φ(v, ·) satisfies the following properties. For every even-degree vertex v, it is a pairing between the vertices adjacent to v, that is, for every such vertex w we have φ(v, w) = w', where $w' \neq w$, w' is also adjacent to v, and φ(v, w') = w. For every odd-degree vertex v, there is exactly one adjacent vertex w that is mapped to itself, and on the remaining adjacent vertices φ(v, ·) is a pairing as in the case of an even-degree vertex. The input also contains an odd-degree vertex v together with a certificate for this, in the form of an adjacent vertex w such that φ(v, w) = w. In [23, Corollary to Theorem 1], Papadimitriou showed that any problem defined in terms of an edge recognition algorithm and a pairing function is in PPA.
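The defining properties of a pairing function can be illustrated on an explicitly listed graph. This is only a sketch under that assumption: in the setting of [23], φ must be computable in polynomial time without listing the possibly exponentially many neighbours, whereas here the hypothetical helper `make_pairing` simply sorts the neighbours of v and pairs them consecutively, leaving the last one fixed when the degree is odd.

```python
def make_pairing(adj):
    """Pairing function phi(v, w) for an explicitly listed graph (illustration only).

    adj maps each vertex to the set of its neighbours. The sorted neighbours of v
    are paired as (w_0, w_1), (w_2, w_3), ...; if deg(v) is odd, the last
    neighbour is the unique fixed point of phi(v, .).
    """
    def phi(v, w):
        nbrs = sorted(adj[v])
        i = nbrs.index(w)
        if len(nbrs) % 2 == 1 and i == len(nbrs) - 1:
            return w
        return nbrs[i ^ 1]  # swaps positions 0<->1, 2<->3, ...
    return phi


def is_valid_pairing(adj, phi):
    """Check the properties required of phi(v, .) at every vertex v."""
    for v, nbrs in adj.items():
        fixed = [w for w in nbrs if phi(v, w) == w]
        if len(fixed) != len(nbrs) % 2:  # one fixed point iff deg(v) is odd
            return False
        if any(phi(v, w) not in nbrs or phi(v, phi(v, w)) != w for w in nbrs):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical star with centre 0 and leaves 1, 2, 3: all four vertices are odd.
    star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
    phi = make_pairing(star)
    print(is_valid_pairing(star, phi), phi(0, 1), phi(0, 3))  # True 2 3
```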
Combinatorial Nullstellensatz and Chevalley-Warning Theorem
Let F be a field. A polynomial over F (or simply a polynomial) in n variables is a formal expression $P(x) = P(x_1, \dots, x_n)$ of the form
$$P(x_1, \dots, x_n) = \sum_{(d_1, \dots, d_n) \in \mathbb{N}^n} c_{d_1, \dots, d_n}\, x_1^{d_1} \cdots x_n^{d_n},$$
where the coefficients $c_{d_1, \dots, d_n}$ are from F, and only finitely many of them are non-zero. The degree deg(P) of P is the largest value of $d_1 + \dots + d_n$ for which the coefficient $c_{d_1, \dots, d_n}$ is non-zero, where by convention the degree of the zero polynomial is −∞. The ring of polynomials over F in n variables is denoted by $F[x_1, \dots, x_n]$. Every polynomial $P \in F[x_1, \dots, x_n]$ naturally defines a function from $F^n$ to F. While over infinite fields this correspondence is one-to-one, this is not true over finite fields, where different polynomials may define the same function. For example, over the field $\mathbb{F}_q$ of size q, the polynomial $x^q - x$ is not the zero polynomial (it has degree q), but it computes the zero function.
Numerous results are known about the properties of zero sets of polynomials. The Combinatorial Nullstellensatz of Alon [2] is a higher-dimensional extension of the well-known fact that a non-zero polynomial of degree d has at most d zeros. It has been widely used to prove a variety of results, among others in combinatorics, graph theory and additive number theory.
Theorem 2 (Combinatorial Nullstellensatz). Let F be a field, let $d_1, \dots, d_n$ be non-negative integers, and let $P \in F[x_1, \dots, x_n]$ be a polynomial. Suppose that $\deg(P) = \sum_{i=1}^{n} d_i$, and that the coefficient of $x_1^{d_1} \cdots x_n^{d_n}$ in P is non-zero. Then for all subsets $S_1, \dots, S_n$ of F with $|S_i| > d_i$ for $i = 1, \dots, n$, there exists $(s_1, \dots, s_n) \in S_1 \times \dots \times S_n$ such that $P(s_1, \dots, s_n) \neq 0$.
The classical result of Chevalley [12] and Warning [28] asserts that if the sum of the degrees of some polynomials is less than the number of variables, then the number of their common zeros is divisible by the characteristic of the field.
Theorem 3 (Chevalley-Warning Theorem). Let F be a finite field of characteristic p, and let $P_1, \dots, P_k \in F[x_1, \dots, x_n]$ be non-zero polynomials. If $\sum_{i=1}^{k} \deg(P_i) < n$, then the number of common zeros of $P_1, \dots, P_k$ in $F^n$ is divisible by p. In particular, if the polynomials have a common zero, they also have another one.
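The following brute-force check illustrates Theorem 3 on a small example over the prime field $\mathbb{F}_3$, represented as the integers modulo 3. It is a sketch with hypothetical helper names; non-prime fields $\mathbb{F}_q$ would require genuine field arithmetic, which is not implemented here.

```python
from itertools import product


def common_zeros(polys, n, p):
    """All common zeros in F_p^n (p prime) of the given polynomial functions."""
    return [x for x in product(range(p), repeat=n)
            if all(P(x) % p == 0 for P in polys)]


# Hypothetical example over F_3: P(x) = x_1^2 + x_2^2 + x_3^2 has degree 2 < n = 3.
def P(x):
    return x[0] ** 2 + x[1] ** 2 + x[2] ** 2


zeros = common_zeros([P], 3, 3)
# Squares in F_3 are 0 or 1, so P(x) = 0 iff no coordinate or all three are non-zero:
# 1 + 2^3 = 9 common zeros, a multiple of p = 3.
print(len(zeros))  # 9
```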
Both of these results clearly suggest a computational problem in TFNP: given a (set of) polynomial(s) satisfying the respective condition of these theorems, find an element of $F^n$ satisfying the respective conclusion. We study these problems over the two-element field $\mathbb{F}_2$, where both theorems have a particularly simple form; in fact, they become almost the same statement. To see this, recall that a multilinear polynomial is a polynomial of the form $M(x_1, \dots, x_n) = \sum_{T \subseteq [n]} c_T x_T$, where $x_T$ stands for the monomial $\prod_{i \in T} x_i$ and the coefficients $c_T$ are elements of $\mathbb{F}_2$. We say that a monomial $x_T$ is in M if $c_T = 1$. The degree of a multilinear polynomial M is the cardinality of the largest set T such that $x_T$ is in M. It is well known that for every polynomial P over $\mathbb{F}_2$ there exists a unique multilinear polynomial $M_P(x_1, \dots, x_n)$ such that P and $M_P$ compute the same function. We define the multilinear degree of a polynomial P over $\mathbb{F}_2$ by $\mathrm{mdeg}(P) = \deg(M_P)$. We call a monomial maximal if its multilinear degree is n. Clearly $\mathrm{mdeg}(P) \le \deg(P)$, and $\mathrm{mdeg}(P) = n$ if and only if the number of maximal monomials of P is odd. Using the notion of multilinear degree, we can now state the rather simple equivalent formulations of the above theorems over $\mathbb{F}_2$.
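Theorem 4. Let $P \in \mathbb{F}_2[x_1, \dots, x_n]$ be a polynomial such that $\mathrm{mdeg}(P) = n$. Then there exists $a \in \mathbb{F}_2^n$ such that $P(a) = 1$.
Theorem 5. Let $P \in \mathbb{F}_2[x_1, \dots, x_n]$ be a polynomial such that $\mathrm{mdeg}(P) < n$, and let $a \in \mathbb{F}_2^n$ be such that $P(a) = 0$. Then there exists $b \neq a$ such that $P(b) = 0$.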
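The passage from P to $M_P$ is easy to carry out when P is given explicitly as a sum of distinct monomials. The sketch below, with hypothetical helper names, replaces each monomial by its support (since $x^k = x$ on $\mathbb{F}_2$ for $k \ge 1$) and cancels supports occurring an even number of times. The example shows that mdeg(P) can be much smaller than deg(P), and that two maximal monomials cancel.

```python
from collections import Counter


def multilinearize(monomials):
    """Multilinear representative M_P of P = sum of the given monomials over F_2.

    Each monomial is a distinct exponent tuple (d_1, ..., d_n) with coefficient 1.
    Since x^k = x on F_2 for every k >= 1, a monomial computes the same function
    as x_T, where T is its support; monomials with equal support cancel in pairs.
    """
    supports = Counter(frozenset(i for i, d in enumerate(exps) if d > 0)
                       for exps in monomials)
    return {T for T, count in supports.items() if count % 2 == 1}


def mdeg(monomials):
    return max((len(T) for T in multilinearize(monomials)), default=float("-inf"))


def degree(monomials):
    return max((sum(exps) for exps in monomials), default=float("-inf"))


if __name__ == "__main__":
    # P = x_1^2 x_2 + x_1 x_2^3 + x_1: its two maximal monomials cancel.
    P = [(2, 1), (1, 3), (1, 0)]
    print(degree(P), mdeg(P), multilinearize(P))  # 4 1 {frozenset({0})}
```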
Arithmetic circuits
An n-variable, m-output arithmetic circuit C over a field F is a vertex-labeled, acyclic directed graph whose vertices are called gates. It has n variable gates of in-degree 0, labeled by the variables $x_1, \dots, x_n$. For each non-zero field element there is at most one constant gate of in-degree 0, labeled by that element. The variable and constant gates are called input gates. The other gates have in-degree 2 and are called computational gates. They are labeled by + or ×; the former are the sum gates and the latter the product gates. The number of computational gates of out-degree 0 is m, and they are called the output gates. For a computational gate g, we distinguish its two children by specifying the left and the right child. The left child is denoted by $g_\ell$ and the right child by $g_r$. We denote the set of sum gates by $G_+$, and the set of product gates by $G_\times$. The size of C is the number of its gates, and the depth of C is the length of the longest path from an input gate to an output gate.
The definition of an arithmetic circuit extends naturally to computational gates of in-degree different from 2. Unary computational gates by definition act as the identity operator. The children of a computational gate of in-degree k > 2 are distinguished by some distinct labeling over a set of size k. It is easy to see that such an extended circuit can be simulated by a circuit with binary computational gates which computes the same polynomial and has only a polynomial blow-up in size. By default our circuits have binary computational gates, and we will mention explicitly when this is not the case.
A subcircuit of a circuit C is a subgraph of C which is also a circuit. The subcircuit rooted at gate g is the subgraph induced by all vertices lying on some path from the input gates to g; it is denoted by $C_g$. The left subcircuit of C, denoted by $C_\ell$, is the subcircuit rooted at the left child of the root of C, and the right subcircuit $C_r$ is defined similarly. The composition of arithmetic circuits is defined in a natural way. If $C_1$ is an n-variable, m-output circuit and $C_2$ is a k-variable, n-output circuit, then $C_1 \circ C_2$ is the k-variable, m-output circuit composed of $C_1$ and $C_2$ in which the output gates of $C_2$ are identified with the variable gates of $C_1$, and the identical constant gates of the two circuits are also identified. Let $C_1$ and $C_2$ be n-variable, single-output arithmetic circuits. The disjoint sum $C_1 \oplus C_2$ of $C_1$ and $C_2$ is the n-variable, single-output arithmetic circuit whose output gate is a sum gate and whose left and right subcircuits are disjoint copies of $C_1$ and $C_2$, except for the input gates, which $C_1$ and $C_2$ share. The disjoint sum naturally generalizes to more than two circuits.
Every gate g of an arithmetic circuit computes an n-variable polynomial $P_g(x)$ in the natural way, defined by recursion on the depth of the gate. An input gate g labeled by $\alpha \in \{x_1, \dots, x_n\} \cup F$ computes $P_g = \alpha$. If $g \in G_+$ then $P_g = P_{g_\ell} + P_{g_r}$, and if $g \in G_\times$ then $P_g = P_{g_\ell} P_{g_r}$. The polynomial computed by a single-output arithmetic circuit C is the polynomial computed by its output gate, which we denote by C(x). We similarly define by recursion the circuit degree cdeg(C) of C. If an input gate g is labeled by $\alpha \in F$ then $\mathrm{cdeg}(C_g) = 0$, and if it is labeled by $\alpha \in \{x_1, \dots, x_n\}$ then $\mathrm{cdeg}(C_g) = 1$. For computational gates, if $g \in G_+$ then $\mathrm{cdeg}(C_g) = \max\{\mathrm{cdeg}(C_{g_\ell}), \mathrm{cdeg}(C_{g_r})\}$, and if $g \in G_\times$ then $\mathrm{cdeg}(C_g) = \mathrm{cdeg}(C_{g_\ell}) + \mathrm{cdeg}(C_{g_r})$. The circuit degree can be computed in polynomial time, and clearly $\deg(C(x)) \le \mathrm{cdeg}(C)$.
Over the base field $\mathbb{F}_2$, we call an element $a \in \mathbb{F}_2^n$ such that C(a) = 1 a satisfying assignment for C, and an element a such that C(a) = 0 a zero of C. For every $a \in \mathbb{F}_2^n$, we define the Lagrange-circuit $L_a$ as $C_1 \times \dots \times C_n$, where $C_i = x_i$ if $a_i = 1$, and $C_i = x_i + 1$ if $a_i = 0$. Clearly $\mathrm{mdeg}(L_a(x)) = n$, and the only satisfying assignment for $L_a$ is a.
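The two claimed properties of the Lagrange-circuit can be verified by brute force for small n. The sketch below evaluates $L_a$ directly from its definition (the helper names are ours) and relies on the fact that the coefficient of $x_1 \cdots x_n$ in the multilinear representative of a function on $\mathbb{F}_2^n$ equals the parity of its number of satisfying assignments.

```python
from itertools import product


def lagrange(a):
    """Function computed by L_a = C_1 x ... x C_n (C_i = x_i if a_i = 1, else x_i + 1)."""
    def L(x):
        value = 1
        for xi, ai in zip(x, a):
            value *= xi if ai == 1 else (xi + 1) % 2
        return value
    return L


def satisfying_assignments(f, n):
    return [x for x in product((0, 1), repeat=n) if f(x) == 1]


if __name__ == "__main__":
    a = (1, 0, 1)
    sat = satisfying_assignments(lagrange(a), 3)
    print(sat)  # [(1, 0, 1)]
    # The coefficient of x_1 x_2 x_3 in the multilinear representative is the
    # parity of the number of satisfying assignments, so mdeg(L_a) = 3.
    print(len(sat) % 2)  # 1
```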
Parse subcircuits
We would like to understand how monomials are computed by a single-output arithmetic circuit C. If g is a sum gate, then the set of monomials computed by $C_g$ is a subset of the union of the sets of monomials computed by $C_{g_\ell}$ and by $C_{g_r}$. If g is a product gate, then every monomial computed by $C_g$ is the product of a monomial computed by $C_{g_\ell}$ and a monomial computed by $C_{g_r}$. A marking of the gates in $G_+$ by elements of $\{\ell, r\}$ therefore naturally computes a monomial of C(x). At first sight it seems that by considering markings restricted to the sum gates that effectively participate in the computation of a monomial, we could compute all of them. This is indeed the case when the fan-out of every sum gate is one, but it is not true for general circuits, since a sum gate can be used several times in the computation of a monomial, with possibly inconsistent markings. However, as we show below, this is essentially true over fields of characteristic 2, where it is sufficient to consider only consistent markings. In doing so, we have to be careful about two things. First, when computing a monomial by some marking, we should not mark the sum gates which do not participate in its computation. Indeed, by also considering both possible markings for irrelevant gates, we would ensure that the monomial is computed an even number of times, invalidating the whole argument. Second, we should mark all the sum gates necessary for the computation of the monomial. We make all this precise with the notions of closed marking and parse subcircuit.
Figure 3: Two parse subcircuits for Figure 1; note that the second one does not access all sum gates.
Let C be a single-output arithmetic circuit. A marking of C is a partial function $S : G_+ \to \{\ell, r\}$ from the sum gates of C to the marks $\{\ell, r\}$. We can equivalently specify a marking by a total function $S^* : G_+ \to \{\ell, r, *\}$, where $S^*(g) = *$ if and only if S(g) is undefined. We denote by Dom(S) the domain of S. For the output gate of C, let $S_\ell$ be the restriction of S to the sum gates in $C_\ell$, and let $S_r$ be the restriction of S to the sum gates in $C_r$. We define $G_S = (V_S, E_S)$, the accessibility graph of S, by induction on the depth of C. If C is a single vertex then $V_S$ consists of this vertex, and $E_S = \emptyset$. Otherwise, if the output gate is a product gate, then $V_S$ consists of the output gate of C added to $V_{S_\ell} \cup V_{S_r}$, and $E_S$ consists of the two edges from the two children of the output gate to the output gate, added to $E_{S_\ell} \cup E_{S_r}$. If the output gate of C is a sum gate with mark ℓ, then $V_S$ consists of the output gate of C added to $V_{S_\ell}$, and $E_S$ consists of the edge from the left child of the output gate to the output gate, added to $E_{S_\ell}$. The case when the mark of the output gate is r is analogous. If the output gate of C has no mark, then the accessibility graph consists of this single node only.
We say that a marking S is closed if $\mathrm{Dom}(S) = V_S \cap G_+$, that is, if the accessible sum gates of C are exactly those where S is defined. If S is closed then the accessibility graph $G_S$, with the vertex labels inherited from C, is in fact a subcircuit of C. The inclusion $\mathrm{Dom}(S) \subseteq V_S \cap G_+$ ensures that the only node of out-degree 0 in $G_S$ is the output gate of C, and the inclusion $V_S \cap G_+ \subseteq \mathrm{Dom}(S)$ ensures that the leaves of $G_S$ are leaves in C. We call this subcircuit the parse subcircuit induced by S, and denote it by $C_S$. The set of parse subcircuits of C will be denoted by $\mathcal{S}(C)$. Observe that a parse subcircuit has binary product gates but unary sum gates, which act as the identity operator. The polynomial $C_S(x)$ computed by the parse subcircuit $C_S$ is therefore a monomial, which we denote by $m_S(x)$. We say that a parse subcircuit $C_S$ is maximal if the multilinear degree of $m_S(x)$ is n, that is, if every variable $x_1, \dots, x_n$ occurs in $m_S(x)$, so that its multilinear representative is $x_1 \cdots x_n$. We say that two parse subcircuits $C_S$ and $C_{S'}$ are consistent if $S(g) = S'(g)$ for every $g \in \mathrm{Dom}(S) \cap \mathrm{Dom}(S')$.
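The closed markings can be enumerated recursively along the lines of the proof of Proposition 6: a sum gate records the chosen child in the marking, and a product gate combines one parse subcircuit of each child, keeping only consistent pairs. The following Python sketch uses a hypothetical list-of-tuples circuit encoding and exhaustive enumeration, so it is exponential in general. It checks the identity of Proposition 6 pointwise, that is, as functions on $\mathbb{F}_2^n$, and counts the maximal parse subcircuits, whose parity then decides whether the multilinear degree equals n.

```python
from collections import Counter
from itertools import product

# Hypothetical gate encoding (topological order, output gate last):
#   ("var", i), ("const", 1), ("+", l, r), ("*", l, r); l, r are child gate indices.


def parse_subcircuits(circuit):
    """All closed markings S (dict: sum gate -> "l"/"r") with their monomials m_S.

    A monomial is a Counter {variable index: exponent}. The number of parse
    subcircuits can be exponential; this is an illustration, not an algorithm.
    """
    memo = {}

    def rec(g):
        if g not in memo:
            gate = circuit[g]
            if gate[0] == "var":
                memo[g] = [({}, Counter({gate[1]: 1}))]
            elif gate[0] == "const":
                memo[g] = [({}, Counter())]
            elif gate[0] == "+":
                # A sum gate keeps exactly one child, recorded by its mark.
                memo[g] = [({**S, g: mark}, m)
                           for mark, child in (("l", gate[1]), ("r", gate[2]))
                           for S, m in rec(child)]
            else:
                # A product gate combines one parse subcircuit of each child,
                # keeping only consistent pairs (same mark on shared sum gates).
                memo[g] = [({**U, **W}, mU + mW)
                           for U, mU in rec(gate[1])
                           for W, mW in rec(gate[2])
                           if all(W.get(h, mark) == mark for h, mark in U.items())]
        return memo[g]

    return rec(len(circuit) - 1)


def evaluate(circuit, a):
    val = []
    for gate in circuit:
        if gate[0] == "var":
            val.append(a[gate[1]])
        elif gate[0] == "const":
            val.append(1)
        elif gate[0] == "+":
            val.append((val[gate[1]] + val[gate[2]]) % 2)
        else:
            val.append(val[gate[1]] * val[gate[2]])
    return val[-1]


def sum_agrees_with_circuit(circuit, n):
    """Check C(a) = sum over S of m_S(a) (mod 2) at every point a of F_2^n."""
    subs = parse_subcircuits(circuit)
    return all(
        evaluate(circuit, a) == sum(all(a[i] for i in m) for _, m in subs) % 2
        for a in product((0, 1), repeat=n))


def count_maximal(circuit, n):
    """Number of maximal parse subcircuits (every variable occurs in m_S)."""
    return sum(1 for _, m in parse_subcircuits(circuit) if len(m) == n)


if __name__ == "__main__":
    # (x_1 + x_2) * (x_1 + x_2): the shared sum gate gives only the consistent
    # parse subcircuits x_1^2 and x_2^2; the inconsistent choices would give
    # x_1 x_2 twice, which cancels over F_2.
    square = [("var", 0), ("var", 1), ("+", 0, 1), ("*", 2, 2)]
    print(len(parse_subcircuits(square)), sum_agrees_with_circuit(square, 2))  # 2 True
    print(count_maximal(square, 2))  # 0: even, and indeed mdeg = 1 < 2
```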
Clearly, the mapping from closed markings to induced parse subcircuits is a bijection. Therefore, to ease notation, we will often call the closed marking S itself the parse subcircuit, and we will speak about the gates, subcircuits and other circuit-related notions of S instead of $C_S$. The notation used for the monomial computed by a parse subcircuit is already consistent with this convention.
Proposition 6. Let C be a single-output arithmetic circuit over a field F of characteristic 2. Then
$$C(x) = \sum_{S \in \mathcal{S}(C)} m_S(x).$$
Proof. We proceed by induction on the depth of the circuit. If $C$ consists of a single gate, the statement is obvious. Otherwise, the parse subcircuits in $\mathcal{S}(C_\ell)$ (respectively $\mathcal{S}(C_r)$) are exactly the parse subcircuits in $\mathcal{S}(C)$ restricted to the sum gates of $C_\ell$ (respectively $C_r$). When the output gate of $C$ is a sum gate, then conversely, $\mathcal{S}(C)$ is obtained from $\mathcal{S}(C_\ell) \cup \mathcal{S}(C_r)$ by extending each marking in the latter set with the appropriate mark for the root of $C$. Therefore, using the definitions of $C(x)$ and $m_S(x)$, we get
$$C(x) = C_\ell(x) + C_r(x) = \sum_{U \in \mathcal{S}(C_\ell)} m_U(x) + \sum_{W \in \mathcal{S}(C_r)} m_W(x) = \sum_{S \in \mathcal{S}(C)} m_S(x),$$
where the second equality comes from the inductive hypothesis.
When the output gate of $C$ is a product gate, the situation is more complicated. The parse subcircuits $S_\ell$ and $S_r$ are always consistent for $S \in \mathcal{S}(C)$, but an arbitrary parse subcircuit $U \in \mathcal{S}(C_\ell)$ is not necessarily consistent with an arbitrary parse subcircuit $W \in \mathcal{S}(C_r)$. The crux of the induction step is therefore to show that the total contribution of the products $m_U(x)m_W(x)$ to $C(x)$ is zero when we sum over all inconsistent pairs $U$ and $W$. Indeed, we claim that
$$\sum_{\substack{(U,W) \in \mathcal{S}(C_\ell) \times \mathcal{S}(C_r) \\ U,\, W \text{ inconsistent}}} m_U(x)\, m_W(x) = 0.$$
To prove this, we define an involution $(U, W) \leftrightarrow (U', W')$ over inconsistent pairs in $\mathcal{S}(C_\ell) \times \mathcal{S}(C_r)$ such that $m_U(x)m_W(x) + m_{U'}(x)m_{W'}(x) = 0$. For this, let us fix some topological ordering of the gates in $C$ with respect to the edges of the circuit, and let $g$ be the first sum gate in this ordering where $U$ and $W$ have different marks, say $U(g) = \ell$ and $W(g) = r$. Let $T_0$ be the restriction of $U$ to the sum gates of $C_g$, and let $T_1$ be the restriction of $W$ to the sum gates of $C_g$. Both $T_0$ and $T_1$ are parse subcircuits in $C_g$, and they are inconsistent only at $g$. Also, for some monomials $m_0(x)$ and $m_1(x)$, we have $m_U(x) = m_0(x)m_{T_0}(x)$ and $m_W(x) = m_1(x)m_{T_1}(x)$. The parse subcircuit $U'$ is obtained from $U$ by exchanging, inside $C_g$, the parse subcircuit $T_0$ for the parse subcircuit $T_1$, that is, $U' = (U \setminus T_0) \cup T_1$. The parse subcircuit $W'$ is defined similarly from $W$, with the roles of $T_0$ and $T_1$ reversed. It follows from the choice of $g$ that $U'$ and $W'$ are parse subcircuits in $\mathcal{S}(C_\ell)$ and $\mathcal{S}(C_r)$ respectively, and that the first inconsistency between them in the topological order is at $g$. Therefore, starting the same process with $(U', W')$, we obtain $(U, W)$, and thus the mapping is indeed an involution. Since exchanging $T_0$ and $T_1$ between the two parse subcircuits leaves the product unchanged, we have $m_{U'}(x)m_{W'}(x) = m_U(x)m_W(x)$, and since $\mathbb{F}$ has characteristic 2, these two contributions cancel.
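We can now complete the induction step for product gates by observing the equalities
$$C(x) = C_\ell(x)\, C_r(x) = \sum_{U \in \mathcal{S}(C_\ell)} \sum_{W \in \mathcal{S}(C_r)} m_U(x)\, m_W(x) = \sum_{\substack{(U,W) \in \mathcal{S}(C_\ell) \times \mathcal{S}(C_r) \\ U,\, W \text{ consistent}}} m_U(x)\, m_W(x) = \sum_{S \in \mathcal{S}(C)} m_S(x),$$
where the second equality follows from the inductive hypothesis, the third from the claim above, and the last from the correspondence $S \leftrightarrow (S_\ell, S_r)$ between $\mathcal{S}(C)$ and the consistent pairs, under which $m_S(x) = m_{S_\ell}(x)\, m_{S_r}(x)$.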
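The identity of Proposition 6 can be checked by brute force on small circuits. The following Python sketch is a hypothetical illustration with a gate encoding of our own choosing: it enumerates the parse subcircuits by following the inductive structure of the proof (one marked child at sum gates, consistent pairs at product gates) and compares, over $\mathbb{F}_2$, the polynomial computed by the circuit with the sum of the monomials $m_S(x)$. Polynomials are stored as sets of exponent tuples, each with coefficient 1.

```python
from collections import Counter
from itertools import product

# Hypothetical encoding: gate name -> ("var", i) | ("const",) | ("add", l, r) | ("mul", l, r)
CIRC = {
    "x0": ("var", 0), "x1": ("var", 1), "one": ("const",),
    "s": ("add", "x0", "x1"),
    "t": ("add", "s", "one"),
    "p": ("mul", "s", "t"),   # p = s * (s + 1); the sum gate s is shared by both factors
}
N = 2


def unit(i=None):
    """Exponent tuple of x_i, or of the constant monomial 1 when i is None."""
    return tuple(int(j == i) for j in range(N))


def reduce_mod2(monomials):
    """Keep the monomials occurring an odd number of times (coefficients in F_2)."""
    return {m for m, c in Counter(monomials).items() if c % 2}


def times(m1, m2):
    return tuple(a + b for a, b in zip(m1, m2))


def polynomial(g):
    """Polynomial computed at gate g over F_2, as a set of exponent tuples."""
    kind, *args = CIRC[g]
    if kind == "var":
        return {unit(args[0])}
    if kind == "const":
        return {unit()}
    left, right = polynomial(args[0]), polynomial(args[1])
    if kind == "add":
        return reduce_mod2(list(left) + list(right))
    return reduce_mod2([times(a, b) for a, b in product(left, right)])


def parse_subcircuits(g):
    """All parse subcircuits of C_g as (marking, exponent tuple of m_S)."""
    kind, *args = CIRC[g]
    if kind == "var":
        return [({}, unit(args[0]))]
    if kind == "const":
        return [({}, unit())]
    left, right = parse_subcircuits(args[0]), parse_subcircuits(args[1])
    if kind == "add":  # a sum gate keeps exactly one marked child
        return [({**U, g: "l"}, m) for U, m in left] + [({**W, g: "r"}, m) for W, m in right]
    return [({**U, **W}, times(mu, mw))  # product gate: only consistent pairs survive
            for (U, mu), (W, mw) in product(left, right)
            if all(U[h] == W[h] for h in U.keys() & W.keys())]


subs = parse_subcircuits("p")
print(len(subs))                                             # 4
print(polynomial("p") == reduce_mod2([m for _, m in subs]))  # True
```

In this example the two inconsistent pairs at the product gate both contribute $x_0x_1$, and they cancel exactly as in the involution argument.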
Though it is not directly related to the main result of the paper, we prove here, essentially as a corollary of the previous proposition, that deciding whether the polynomial computed by a circuit over the two-element field has maximal multilinear degree is ⊕P-complete. Note that by the Chevalley-Warning theorem, the multilinear degree of a circuit is maximal if and only if it has an odd number of satisfying assignments; via this correspondence, Proposition 7 can also be proved by using the number of 1's to build a balanced relation. The point of our proof of Proposition 7 is to show this without referring to the Chevalley-Warning theorem, and thereby to illustrate the use of maximal parse subcircuits.

Proposition 7. Deciding whether an $n$-variable arithmetic circuit $C$ over $\mathbb{F}_2$ satisfies $\mathrm{mdeg}(C(x)) = n$ is ⊕P-complete.

Proof. For the easiness part, we can define a balanced relation $R(C, S)$, where $S \in \mathcal{S}(C)$, which equals 1 if and only if $S$ is a maximal parse subcircuit. By Proposition 6, the polynomial computed by the circuit $C$ is the sum of all the monomials computed by its parse subcircuits. Among all the parse subcircuits, only the maximal ones compute monomials of multilinear degree $n$, and each of these multilinearizes to $x_1 \cdots x_n$. Thus $\mathrm{mdeg}(C(x)) = n$ if and only if there is an odd number of maximal parse subcircuits.
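The correspondence between maximal multilinear degree and the parity of the number of satisfying assignments can be checked directly on truth tables. The sketch below is our own illustration, not the argument of the paper: it computes the multilinear representation of a Boolean function over $\mathbb{F}_2$ with the standard Möbius transform. The coefficient of $x_1 \cdots x_n$ is the sum of all values of the function, so it is 1 exactly when the number of satisfying assignments is odd. Assignments and monomials are encoded as bitmasks.

```python
import random


def multilinear_coefficients(truth_table, n):
    """Möbius transform over F_2.

    truth_table[a] is f(a) for the assignment with bitmask a; the result c[m] is the
    coefficient of the multilinear monomial prod_{i in m} x_i, i.e. the XOR of f(a) over a ⊆ m.
    """
    coeffs = list(truth_table)
    for i in range(n):
        bit = 1 << i
        for mask in range(1 << n):
            if mask & bit:
                coeffs[mask] ^= coeffs[mask ^ bit]
    return coeffs


def has_maximal_multilinear_degree(truth_table, n):
    return multilinear_coefficients(truth_table, n)[(1 << n) - 1] == 1


print(has_maximal_multilinear_degree([0] * 7 + [1], 3))  # True: f = x0 x1 x2

rng = random.Random(0)
for _ in range(100):
    table = [rng.randint(0, 1) for _ in range(1 << 4)]
    assert has_maximal_multilinear_degree(table, 4) == (sum(table) % 2 == 1)
```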
For the hardness part, we reduce the well-known ⊕P-complete problem ⊕3-SAT [26] to the maximality of $\mathrm{mdeg}(C(x))$. Let $\varphi = \{F_1, F_2, \ldots, F_m\}$ be an instance of 3-SAT, where the clause $F_i$ is the disjunction of three literals belonging to $\{x_1, \bar{x}_1, \ldots, x_n, \bar{x}_n\}$. The reduction maps $\varphi$ to an $m$-variable, single-output, depth-3 arithmetic circuit $C$ defined as follows. The output gate at level 0 is a product gate. It has $n$ children $\alpha_1, \ldots, \alpha_n$, all sum gates, which compose the first level of the circuit. At level 2, for all $1 \le j \le n$, the gate $\alpha_j$ has two children $x_j$ and $\bar{x}_j$, which are product gates. The gate $x_j$ is the left child of $\alpha_j$, and $\bar{x}_j$ is its right child. Finally, at level 3 are the $m$ variable gates $F_1, \ldots, F_m$, where $F_i$ is a child of $y \in \{x_1, \bar{x}_1, \ldots, x_n, \bar{x}_n\}$ if $y \in F_i$ in $\varphi$. The following is an illustration of the circuit which is the image of the formula (
We give a one-to-one mapping $S$ from the assignments of $\varphi$ to the parse subcircuits in $\mathcal{S}(C)$. Since all sum gates of $C$ are reachable from the output gate, a parse subcircuit of $C$ is an $\{\ell, r\}$-marking of the gates $\alpha_1, \ldots, \alpha_n$, and the parse subcircuits are therefore naturally identified with the elements of $\{\ell, r\}^n$. For an assignment $x \in \{0, 1\}^n$, the map $S$ is defined by
$$S(x)(\alpha_j) = \begin{cases} \ell & \text{if } x_j = 1, \\ r & \text{if } x_j = 0, \end{cases} \qquad 1 \le j \le n.$$
To finish the proof, we show that $x$ is a satisfying assignment if and only if $S(x)$ is a maximal parse subcircuit. Observe that $x$ is a satisfying assignment if and only if each clause $F_i$ of $\varphi$ contains a true literal. By the definition of $S$, the clause $F_i$ contains a true literal exactly when the variable gate $F_i$ of $C$ is in the parse subcircuit $C_{S(x)}$. Since $C_{S(x)}$ is maximal if and only if $F_i$ is in $C_{S(x)}$ for all $i$, and $S$ is a bijection, the number of maximal parse subcircuits of $C$ equals the number of satisfying assignments of $\varphi$, which concludes the proof.
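The correspondence between satisfying assignments and maximal parse subcircuits can be tested by brute force. In the Python sketch below, our own illustration, clauses are encoded as tuples of nonzero integers in the DIMACS style ($j$ for $x_j$, $-j$ for $\bar{x}_j$, variables numbered from 1); a parse subcircuit of $C$ is a tuple in $\{\ell, r\}^n$, and the mark `"l"` at $\alpha_j$ selects the product gate $x_j$. A parse subcircuit is maximal when it reaches every variable gate $F_i$.

```python
from itertools import product


def reached_clauses(clauses, marks):
    """Indices i of the variable gates F_i reached by the parse subcircuit `marks`."""
    chosen = {j + 1 if m == "l" else -(j + 1) for j, m in enumerate(marks)}
    return {i for i, clause in enumerate(clauses) if chosen & set(clause)}


def marking_of(assignment):
    """The map S of the proof: S(x)(alpha_j) = l iff x_j = 1."""
    return tuple("l" if bit else "r" for bit in assignment)


def satisfies(clauses, assignment):
    return all(any((lit > 0) == bool(assignment[abs(lit) - 1]) for lit in clause)
               for clause in clauses)


phi = [(1, -2, 3), (-1, 2, 3), (-1, -2, -3)]
n = 3
for x in product((0, 1), repeat=n):
    maximal = len(reached_clauses(phi, marking_of(x))) == len(phi)
    assert maximal == satisfies(phi, x)

print(sum(satisfies(phi, x) for x in product((0, 1), repeat=n)))  # 5
```

Here the number of satisfying assignments is odd, so the circuit built from `phi` has maximal multilinear degree.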
with $n$ children, all sum gates. Every sum gate has 3 children: the left child of the $i$th gate is the variable gate $x_i$, its center child is the variable gate $y_i$, and its right child is the constant gate 1. For an $n$-variable, $n$-output circuit $C$, we define $I \diamond C$, the diamond composition of $I$ with $C$, as the $n$-variable, single-output circuit composed of the circuit $I$ at the top and $C$ below. More precisely, the variable gates of $I \diamond C$ labeled by $x_1, \ldots, x_n$ are also the first $n$ variables of $I$, and the variable gates $y_1, \ldots, y_n$ of $I$ are identified with the output gates of $C$. If $C$ also has a constant gate 1, it is identified with the constant gate 1 of $I$. The polynomial computed by the circuit $I$ is
$$I(x_1, \ldots, x_n, y_1, \ldots, y_n) = \prod_{i=1}^{n} (x_i + y_i + 1).$$
Over $\mathbb{F}_2$, each factor $x_i + y_i + 1$ equals 1 exactly when $x_i = y_i$, so $I(x, y) = 1$ if and only if the two $n$-bit strings $x_1 \cdots x_n$ and $y_1 \cdots y_n$ are equal. Therefore $(I \diamond C)(x) = 1$ if and only if $C(x) = x$.
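Over $\mathbb{F}_2$ the diamond composition therefore acts as a fixed-point test. The following Python sketch evaluates $I$ and $I \diamond C$ on bit tuples, modelling the circuit $C$ simply as a Python function from $n$-bit tuples to $n$-bit tuples (a hypothetical stand-in for an arithmetic circuit); `swap01` is an arbitrary example map.

```python
from itertools import product


def I(x, y):
    """I(x, y) = prod_i (x_i + y_i + 1), evaluated over F_2."""
    value = 1
    for xi, yi in zip(x, y):
        value = value * ((xi + yi + 1) % 2)
    return value


def diamond(C):
    """I ⋄ C as a function of x: feed x and C(x) into I."""
    return lambda x: I(x, C(x))


def swap01(x):
    """Hypothetical example circuit: swap the first two bits."""
    return (x[1], x[0]) + tuple(x[2:])


check = diamond(swap01)
for x in product((0, 1), repeat=3):
    assert check(x) == int(swap01(x) == x)

print(sum(check(x) for x in product((0, 1), repeat=3)))  # 4 fixed points: x0 == x1
```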
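Given two $n$-variable, $n$-output arithmetic circuits $D$ and $F$, we consider the set of six $n$-variable, single-output circuits
$$\mathcal{C}_{D,F} = \{\, I_1 \diamond D_1 \bullet F_1,\ I_2 \diamond F_2 \bullet D_2,\ I_3 \diamond D_3 \bullet D_4,\ I_4 \diamond D_5,\ I_5 \diamond F_3 \bullet F_4,\ I_6 \diamond F_5 \,\},$$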
where $I_1, \ldots, I_6$ are copies of $I$, $D_1, \ldots, D_5$ are copies of $D$, $F_1, \ldots, F_5$ are copies of $F$, and the six circuits share the same input gates. The PPA-composition of $D$ and $F$ is the $n$-variable, single-output circuit $C_{D,F}$ defined as the disjoint sum of the six circuits in $\mathcal{C}_{D,F}$. We call the circuits in $\mathcal{C}_{D,F}$ the components of $C_{D,F}$. The polynomial computed by $C_{D,F}$ is
$$C_{D,F}(x) = I(x, D(F(x))) + I(x, F(D(x))) + I(x, D(D(x))) + I(x, D(x)) + I(x, F(F(x))) + I(x, F(x)).$$
Figure 7: The circuit $C_{D,F}$, the PPA-composition of the circuits $D$ and $F$.
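Since $I(x, y) = 1$ exactly when $x = y$ over $\mathbb{F}_2$, each component of $C_{D,F}$ tests for a fixed point of $D \circ F$, $F \circ D$, $D \circ D$, $D$, $F \circ F$ or $F$. The Python sketch below, in which $D$ and $F$ are arbitrary hypothetical maps on $\{0,1\}^n$ encoded as integers, evaluates $C_{D,F}$ and counts its satisfying assignments. The count is always even, which is consistent with the multilinear degree of $C_{D,F}$ being less than $n$: modulo 2 it equals the total number of fixed points of the six maps, the fixed points of $D \circ F$ and $F \circ D$ are in bijection via $x \mapsto F(x)$, and the fixed points of $D \circ D$ are those of $D$ together with the points on 2-cycles of $D$ (and similarly for $F$).

```python
import random


def ppa_composition(D, F):
    """C_{D,F}(x) over F_2, using I(x, y) = 1 iff x == y."""
    def C(x):
        images = [D(F(x)), F(D(x)), D(D(x)), D(x), F(F(x)), F(x)]
        return sum(y == x for y in images) % 2
    return C


def count_satisfying(C, n):
    return sum(C(x) for x in range(1 << n))


rng = random.Random(1)
n = 4
for _ in range(200):
    d_table = [rng.randrange(1 << n) for _ in range(1 << n)]
    f_table = [rng.randrange(1 << n) for _ in range(1 << n)]
    C = ppa_composition(d_table.__getitem__, f_table.__getitem__)
    assert count_satisfying(C, n) % 2 == 0

print(count_satisfying(ppa_composition(lambda x: x ^ 1, lambda x: x), 2))  # 4
```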
The main structural property of a PPA-composition $C$ is that it computes a polynomial whose multilinear degree is less than $n$. Moreover, a witness for this fact can be computed in polynomial time. By Proposition 6, the multilinear degree of $C(x)$ is determined by the parity of the number of its maximal parse subcircuits: $\mathrm{mdeg}(C(x)) = n$ if and only if this number is odd. Thus, the multilinear degree of $C(x)$ can be certified by a special type of syntactically defined matching over its maximal parse subcircuits. Formally, a matching for the maximal parse subcircuits in $C$ is a polynomial-time Turing machine $\mu$ which defines a matching over the maximal parse subcircuits of $C$ as follows: $S$ and $S'$ are matched if $\mu(C, S) = S'$ and $\mu(C, S') = S$. If $\mu$ defines a perfect matching between the maximal parse subcircuits, then $\mathrm{mdeg}(C(x)) < n$. If $\mu$ defines a perfect matching outside some maximal parse subcircuit $T$, meaning that $T$ is the only maximal parse subcircuit without a matched partner under $\mu$, then $\mathrm{mdeg}(C(x)) = n$.
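The matching condition can be made concrete with a small checker. In the sketch below, our own illustration, the maximal parse subcircuits are given as a finite list of hashable objects and $\mu$ as a Python function (the circuit argument of $\mu$ is dropped); an element $S$ counts as matched when $\mu(S) = S' \neq S$, $S'$ belongs to the list, and $\mu(S') = S$.

```python
def unmatched(items, mu):
    """Elements of `items` that have no partner under mu."""
    pool = set(items)

    def matched(s):
        t = mu(s)
        return t != s and t in pool and mu(t) == s

    return [s for s in items if not matched(s)]


def is_perfect(items, mu):
    """A perfect matching certifies an even number of items."""
    return not unmatched(items, mu)


def is_perfect_outside(items, mu, T):
    """A perfect matching outside T certifies an odd number of items."""
    return unmatched(items, mu) == [T]


print(is_perfect(list(range(6)), lambda s: s ^ 1))  # True
print(unmatched(list(range(5)), lambda s: s ^ 1))   # [4]
```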
All the above statements also hold for circuits which are the direct sum of a PPA-composition and another circuit which certifiably has no maximal parse subcircuit. This is obviously the case for circuits which compute polynomials of degree less than $n$. Our final set of authorized circuits is of this form: we say that a circuit $C$ is a PPA-circuit if $C = C_{D,F} \oplus C'$ for some $D$ and $F$, where $\mathrm{mdeg}(C'(x)) < n$.

Lemma 8. If $C$ is a PPA-circuit, then $\mathrm{mdeg}(C(x)) < n$, and a perfect matching $\mu$ between the maximal parse subcircuits of $C$ can be computed in polynomial time.
Proof. Let $C = C_{D,F} \oplus C'$, where $\mathrm{mdeg}(C'(x)) < n$. We can suppose without loss of generality that $C'$ is the empty circuit, that is, $C = C_{D,F}$. Since the six components of $C$ are pairwise disjoint (except for the input gates), every maximal parse subcircuit in $C$ consists of the mark of the root of $C$, taken from the set $\{1, \ldots, 6\}$, and a maximal parse subcircuit in the corresponding component. To define $\mu$, we decompose $C$ into the disjoint sum of three circuits $C_1$, $C_2$ and $C_3$, each of which is the disjoint sum of two components of $C_{D,F}$, and we define the matching inside each of these circuits. The three circuits are as follows (each component using its own copies of $I$, $D$ and $F$):
$$C_1 = (I \diamond D \bullet F) \oplus (I \diamond F \bullet D), \qquad C_2 = (I \diamond D \bullet D) \oplus (I \diamond D), \qquad C_3 = (I \diamond F \bullet F) \oplus (I \diamond F).$$
Clearly $C_3$ is obtained from $C_2$ by exchanging the roles of $D$ and $F$; therefore it is sufficient to define $\mu$ for $C_1$ and $C_2$.
The matching $\mu$ inside $C_1$.
To ease the notation, we rename the subcircuits of $C_1$ as $I \diamond D \bullet F$ and $I' \diamond F' \bullet D'$, and we suppose that $I \diamond D \bullet F$ is the left subcircuit of $C_1$ and $I' \diamond F' \bullet D'$ is its right subcircuit. Let us denote the output (sum) gate of $C_1$ by $h$, the sum gates of $I$ by $h_1, \ldots, h_n$, the output gates of $D$ by $d_1, \ldots, d_n$, and the output gates of $F$ by $f_1, \ldots, f_n$. For every gate $g$ in $I$, $D$ and $F$, we denote the corresponding gate in $I'$, $D'$ and $F'$ by $g'$, and we also set $h' = h$. Recall that $h_i$ has three children: the left child is the input gate $x_i$, the center child is $d_i$, the $i$th output gate of $D$, and the right child is the constant gate 1. A parse subcircuit can therefore assign to $h_i$ one of the three marks $\ell$, $c$ and $r$, corresponding respectively to its left, center and right child.
We define $\mu(S)$ for the maximal parse subcircuits of $I \diamond D \bullet F$, that is, when $S(h) = \ell$. The definition for the case $S(h) = r$ is symmetric. Let us first define three sets of indices $S_{\mathrm{out}}, S_{\mathrm{middle}}, S_{\mathrm{in}} \subseteq [n]$. Let $S_{\mathrm{out}} = \{i \in [n] : S(h_i) = c\}$; that is, $S_{\mathrm{out}}$ contains those indices $i$ for which the edge from $d_i$ to $h_i$ belongs to $S$. By definition, $i \in S_{\mathrm{middle}}$ if there exists an edge in $S$ from $f_i$ to a gate in $D$. Finally, $i \in S_{\mathrm{in}}$ if there exists an edge in $S$ from $x_i$ to a gate in $F$. We claim that $S_{\mathrm{out}} \subseteq S_{\mathrm{in}}$. Indeed, if there existed $i \in S_{\mathrm{out}} \setminus S_{\mathrm{in}}$, then $x_i$ would be picked up neither at $h_i$ nor inside $F$, so the monomial $m_S(x)$ would not contain the variable $x_i$, contradicting its maximality. We are now ready to define $S' = \mu(S)$ by distinguishing two cases, depending on whether $S_{\mathrm{out}}$ is a proper subset of $S_{\mathrm{in}}$ or not.
Case 1: $S_{\mathrm{out}} \subsetneq S_{\mathrm{in}}$. Let $i$ be the smallest index in $S_{\mathrm{in}} \setminus S_{\mathrm{out}}$. Since $i \notin S_{\mathrm{out}}$, we have $S(h_i) \in \{\ell, r\}$. By definition, $S'$ is the same as $S$ except on $h_i$, where $S'$ takes the mark $r$ when $S(h_i) = \ell$, and the mark $\ell$ when $S(h_i) = r$. This means that the only difference between $S$ and $S'$ is that at the $i$th sum gate of $I$, one of them contains the edge from $x_i$ to $h_i$, whereas the other contains the edge from 1 to $h_i$. $S'$ is therefore a parse subcircuit. To show that $S'$ is also maximal, the interesting case is when $S(h_i) = \ell$ and $S'(h_i) = r$, that is, when $m_{S'}(x)$ does not directly pick up $x_i$ at $h_i$. But since $i \in S_{\mathrm{in}}$, the variable $x_i$ still occurs in $m_{S'}(x)$, which is therefore maximal. Finally, $S'_{\mathrm{out}} = S_{\mathrm{out}}$ and $S'_{\mathrm{in}} = S_{\mathrm{in}}$, so the same index $i$ is selected for $S'$, and clearly $\mu(S') = S$.
Figure 10: Case 1 of the matching $\mu$ for $C_1$, where $i$ is the smallest index in $S_{\mathrm{in}} \setminus S_{\mathrm{out}}$.
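The Case 1 rule only touches the marks of $h_1, \ldots, h_n$, so it can be sketched on an abstraction of $S$. In the hypothetical Python model below, a parse subcircuit is represented only by the tuple of its marks on $h_1, \ldots, h_n$ (with `"c"` for the center child) together with the set $S_{\mathrm{in}}$, which the flip does not change because it depends only on edges from the inputs into $F$; indices are 0-based.

```python
def s_out(h_marks):
    """S_out: indices i with S(h_i) = c."""
    return {i for i, m in enumerate(h_marks) if m == "c"}


def case1_partner(h_marks, s_in):
    """Case 1 of mu for C_1: flip l <-> r at the smallest index of S_in minus S_out."""
    out = s_out(h_marks)
    assert out < s_in, "Case 1 requires S_out to be a proper subset of S_in"
    i = min(s_in - out)
    flipped = list(h_marks)
    flipped[i] = "r" if h_marks[i] == "l" else "l"  # i is not in S_out, so the mark is l or r
    return tuple(flipped)


marks, s_in = ("c", "l", "r", "l"), {0, 2, 3}
partner = case1_partner(marks, s_in)
print(partner)                          # ('c', 'l', 'l', 'l')
print(case1_partner(partner, s_in))     # ('c', 'l', 'r', 'l'): mu is an involution here
```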
Case 2: $S_{\mathrm{out}} = S_{\mathrm{in}}$. In that case, first observe that for every index $i \notin S_{\mathrm{out}}$ we have $S(h_i) = \ell$, that is, $S$ contains the edge $(x_i, h_i)$, since otherwise $m_S(x)$ would not contain $x_i$. By definition, let $\mathrm{Dom}(S') = \{g' \in G_+ : g \in \mathrm{Dom}(S)\}$. For the output gate $h' = h$ of $C_1$ we set $S'(h') = r$, that is, $S'$ will be a parse subcircuit of $I' \diamond F' \bullet D'$. For the sum gates $h'_1, \ldots, h'_n$ of $I'$, we set $S'(h'_i) = c$ if $i \in S_{\mathrm{middle}}$, and $S'(h'_i) = \ell$ otherwise. Finally, for every sum gate $g \in \mathrm{Dom}(S)$ in $D$ or in $F$, we set $S'(g') = S(g)$.
Figure 11: Case 2 of the matching $\mu$ for $C_1$: $S_{\mathrm{out}} = S_{\mathrm{in}}$.
Let us recall that $V_S$ is the set of vertices of the accessibility graph $G_S$ of $S$. The proof that $S'$ is a maximal parse subcircuit follows immediately from the following proposition.

Proposition 9. For every computational gate $g$ of $I \diamond D \bullet F$, we have $g \in V_S$ if and only if $g' \in V_{S'}$.

Proof. We show the implication from left to right. This is certainly true for the computational gates of $I$, since they are all accessible in $G_S$, as are the computational gates of $I'$ in $G_{S'}$.
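If $g \in V_S$ is a computational gate of $D$, then there is a path $p$ in $G_S$ from $g$ to $h$ which can be decomposed as $p = p_1 p_2$, where $p_1$ goes from $g$ to $d_i$ for some $i \in S_{\mathrm{out}}$, and $p_2$ is the path from $d_i$ to $h$. In $G_{S'}$ we therefore have a path $p'_1$ from $g'$ to $d'_i$. Since $S_{\mathrm{out}} = S_{\mathrm{in}}$, in $G_S$ we have a path $p_3$ from $x_i$ to $f_j$ for some $j \in S_{\mathrm{middle}}$. Therefore in $G_{S'}$ there exists a path $p'_3$ from $d'_i$ to $f'_j$, and since $j \in S_{\mathrm{middle}}$, the edges from $f'_j$ to $h'_j$ and from $h'_j$ to $h'$ are also in $G_{S'}$; together with $p'_1$, these yield a path $p'$ from $g'$ to $h'$.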
If $g \in V_S$ is a computational gate of $F$, then there is a path $p$ in $G_S$ from $g$ to $h$ which can be decomposed as $p = p_1 p_2 p_3$, where $p_1$ goes from $g$ to $f_i$ for some $i \in S_{\mathrm{middle}}$, $p_2$ goes from $f_i$ to $d_j$ for some $j \in S_{\mathrm{out}}$, and $p_3$ is the path from $d_j$ to $h$. Then in $G_{S'}$ there exists a path $p'_1$ from $g'$ to $f'_i$, and a path $p'_2$ which goes from $f'_i$ to $h'$ via $h'_i$, since $i \in S_{\mathrm{middle}}$. The path $p' = p'_1 p'_2$ then goes from $g'$ to $h'$.
The implication from right to left follows from the symmetry between $S$ and $S'$. For this, it is useful to observe that $S'_{\mathrm{out}} = S'_{\mathrm{in}} = S_{\mathrm{middle}}$ and $S'_{\mathrm{middle}} = S_{\mathrm{out}} = S_{\mathrm{in}}$.
We have $\mathrm{Dom}(S) = V_S \cap G_+$ since $S$ is a parse subcircuit. Proposition 9 and the definition $\mathrm{Dom}(S') = \{g' \in G_+ : g \in \mathrm{Dom}(S)\}$ imply that $\mathrm{Dom}(S') = V_{S'} \cap G_+$, and therefore $S'$ is a parse subcircuit. To prove the maximality of $S'$, let us show that every input gate is in $V_{S'}$. If $i \in S_{\mathrm{middle}}$, then the path $p$ defined above for the computational gates in $D$ yields a path $p'$ from $x_i$ to $h'$. If $i \notin S_{\mathrm{middle}}$, then the direct path $p'$ from $x_i$ to $h'$ via $h'_i$ exists in $G_{S'}$. Finally, $\mu$ is clearly involutive in this case too.
The matching $\mu$ inside $C_2$.
We now turn to the description of $\mu$ for $C_2$, where we rename its two subcircuits as $I \diamond D \bullet D'$ and $I^* \diamond D^*$. The matching for $C_2$ has strong analogies with the matching for $C_1$; to make this visible, we also use the names $I'$, $F$ and $F'$ for the circuits $I$, $D'$ and $D$, respectively. This means that $I \diamond D \bullet F$ and $I' \diamond F' \bullet D'$ are just different names for the circuit $I \diamond D \bullet D'$. We suppose that $I \diamond D \bullet D'$ is the left subcircuit of $C_2$ and $I^* \diamond D^*$ is its right subcircuit. As for the circuit $C_1$, we denote the output gate of $C_2$ by $h$, the sum gates of $I$ by $h_1, \ldots, h_n$, the output gates of $D$ by $d_1, \ldots, d_n$, and the output gates of $D'$ by $d'_1, \ldots, d'_n$. For every gate $g$ in $I$, $D$ and $D'$, we denote the corresponding gate in $I'$, $D'$ and $D$, respectively, by $g'$. For every gate $g$ in $I$ and $D$, we denote the corresponding gate in $I^*$ and $D^*$ by $g^*$. We also set $h^* = h' = h$. Again, $h_i$ has three children: the left child is the input gate $x_i$, the center child is $d_i$, and the right child is the constant gate 1, with respective marks $\ell$, $c$ and $r$.
We first describe $S' = \mu(S)$ when $S$ is a maximal parse subcircuit of $I \diamond D \bullet D'$. We define $S_{\mathrm{out}}, S_{\mathrm{middle}}, S_{\mathrm{in}}$ in the same way as for the circuit $I \diamond D \bullet F$, keeping in mind that $F = D'$. As before, we have $S_{\mathrm{out}} \subseteq S_{\mathrm{in}}$. For the definition of $\mu$ we now distinguish three cases.
Case 1: $S_{\mathrm{out}} \subsetneq S_{\mathrm{in}}$.
The definition of $S'$ is identical to the first case of the definition of the matching for $C_1$.
Case 2: $S_{\mathrm{out}} = S_{\mathrm{in}}$, and there exists a sum gate $g$ in $D$ such that $S(g) \neq S(g')$.
The definition of $S'$ is identical to the second case of the definition of the matching for $C_1$, with one exception: $S'$ remains in the left subcircuit of $C_2$, that is, for the output gate $h$ of $C_2$ we set $S'(h) = \ell$.
Figure 12: Case 2 of the matching $\mu$ for $C_2$: $S_{\mathrm{out}} = S_{\mathrm{in}}$ and $\exists g$, $S(g) \neq S(g')$. Panel (b): the maximal parse subcircuit $S'$.
Case 3: $S_{\mathrm{out}} = S_{\mathrm{in}}$, and $S(g) = S(g')$ for every sum gate $g$ in $D$.
By definition, we set $\mathrm{Dom}(S') = \{g^* \in G_+ : g \in \mathrm{Dom}(S)\}$. For the output gate $h^* = h$ of $C_2$ we set $S'(h^*) = r$, that is, $S'$ will be a parse subcircuit of $I^* \diamond D^*$. For every other sum gate $g \in \mathrm{Dom}(S)$, we set $S'(g^*) = S(g)$.
The description of $S' = \mu(S)$ when $S$ is a maximal parse subcircuit of $I^* \diamond D^*$ is as follows. By definition, we set $\mathrm{Dom}(S') = \{g, g' \in G_+ : g^* \in \mathrm{Dom}(S)\}$. We set $S'(h) = \ell$, that is, $S'$ is a parse subcircuit of $I \diamond D \bullet D'$. For the sum gates of $I$, we set $S'(h_i) = S(h^*_i)$. For every sum gate $g^* \in \mathrm{Dom}(S)$ which is in $D^*$, we set $S'(g) = S'(g') = S(g^*)$.
Figure 13: Case 3 of the matching $\mu$ for $C_2$: $S_{\mathrm{out}} = S_{\mathrm{in}}$ and $\forall g$, $S(g) = S(g')$.
The proof that $S'$ is a maximal parse subcircuit is essentially the same as for the circuit $C_1$, and it follows immediately from the definition that $\mu$ is an involution. The only additional point to check is that in Case 2 we have $S' \neq S$: indeed, $S(g) \neq S(g')$ for some gate $g$ in $D$, whereas $S'(g') = S(g)$.
The computational problems
We are now ready to define PPA-Circuit CNSS and PPA-Circuit Chevalley, the two computational problems corresponding to the CNSS and to the Chevalley-Warning theorem over $\mathbb{F}_2$. In both cases, the input is an $n$-variable, single-output PPA-circuit $C$ together with an element $a \in \mathbb{F}_2^n$. In the case of PPA-Circuit Chevalley, $a$ is a zero of $C$, and Lemma 8 ensures that $C$ satisfies the hypotheses of the Chevalley-Warning theorem. For PPA-Circuit CNSS, we consider the circuit $C \oplus L_a$, and Lemma 8 again ensures that this circuit satisfies the hypotheses of the CNSS. The computational task is to compute an element $b \in \mathbb{F}_2^n$ whose existence is guaranteed by these theorems. The two problems are defined as follows.
PPA-Circuit Chevalley
Input: $(C, a)$, where $C$ is an $n$-variable PPA-circuit over $\mathbb{F}_2$, and $a$ is a zero of $C$.
Output: Another zero $b \neq a$ of $C$.
PPA-Circuit CNSS
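Input: $(C', a)$, where $C'$ is an $n$-variable PPA-circuit over $\mathbb{F}_2$, and $a \in \mathbb{F}_2^n$.
Output: An element $b \in \mathbb{F}_2^n$ such that $(C' \oplus L_a)(b) = 1$.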
Let us restate here our main theorem.
Theorem 1. The problems PPA-Circuit CNSS and PPA-Circuit Chevalley are PPA-complete.
Proof. In Proposition 10 below we show that PPA-Circuit CNSS and PPA-Circuit Chevalley are polynomially interreducible. In Theorem 11 in Section 6 we prove that PPA-Circuit CNSS is in PPA, and in Theorem 13 in Section 7 we prove that PPA-Circuit Chevalley is PPA-hard.
We now turn to the proof of the various parts of Theorem 1.
Proposition 10. PPA-Circuit CNSS and PPA-Circuit Chevalley are polynomially interreducible.
Proof. First we reduce PPA-Circuit CNSS to PPA-Circuit Chevalley. Let $(C', a)$ be an instance of PPA-Circuit CNSS, and set $C = C' \oplus L_a$. We can suppose that $C'(a) = 1$, since otherwise $C(a) = 1$ and $a$ itself is a solution. We define the circuit $C'' = C' \oplus 1$. Then clearly $C''$ is a PPA-circuit, and $C''(a) = 0$. The result of the reduction is the input $(C'', a)$ to PPA-Circuit Chevalley. If the solution to that input is another zero $b \neq a$ of $C''(x)$, then $C'(b) = 1$ and $L_a(b) = 0$, hence $C(b) = 1$.
The reduction from PPA-Circuit Chevalley to PPA-Circuit CNSS is very similar. Let $(C, a)$ be an instance of PPA-Circuit Chevalley. We set $C' = C \oplus 1$ and $C'' = C' \oplus L_a$. Clearly $C'$ is a PPA-circuit. The result of the reduction is $(C', a)$. Suppose that the solution to that input is an assignment $b$ with $C''(b) = 1$. Since $C''(a) = 0$, we have $b \neq a$, hence $L_a(b) = 0$ and $C'(b) = 1$, so $b$ is another zero of $C$.
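Both reductions only add the constant 1 or the polynomial of $L_a$ to a circuit, so they can be exercised on truth tables. The Python sketch below is a hypothetical illustration: circuits are modelled as Python functions on bit tuples, $L_a$ is modelled by the indicator function of $a$ (which is how the reductions above use it), and a brute-force search stands in for an oracle solving PPA-Circuit Chevalley. The random test functions have an even number of ones, i.e. multilinear degree less than $n$.

```python
import random
from itertools import product


def points(n):
    return list(product((0, 1), repeat=n))


def chevalley_oracle(C, a, n):
    """Brute-force stand-in for PPA-Circuit Chevalley: another zero b != a of C."""
    return next(b for b in points(n) if b != a and C(b) == 0)


def cnss_via_chevalley(C_prime, a, n):
    """Return b with (C' + L_a)(b) = 1 over F_2, following the first reduction."""
    if C_prime(a) == 0:          # then (C' + L_a)(a) = 0 + 1 = 1
        return a

    def C_second(x):             # C'' = C' + 1, so C''(a) = 0
        return (C_prime(x) + 1) % 2

    return chevalley_oracle(C_second, a, n)


def random_even_function(n, rng):
    """Random f on F_2^n with an even number of ones (multilinear degree < n)."""
    ones = set(rng.sample(points(n), 2 * rng.randrange(1 << (n - 1))))
    return lambda x: int(x in ones)


rng = random.Random(2)
for _ in range(100):
    C_prime = random_even_function(3, rng)
    a = rng.choice(points(3))
    b = cnss_via_chevalley(C_prime, a, 3)
    assert (C_prime(b) + int(b == a)) % 2 == 1   # (C' + L_a)(b) = 1

print(cnss_via_chevalley(lambda x: int(x in {(0, 0), (1, 1)}), (0, 0), 2))  # (1, 1)
```

If $C'(a) = 1$, the zeros of $C''$ are exactly the points where $C'$ equals 1; there is an even number of them and $a$ is one of them, so the oracle always finds another one.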
PPA-easiness
Theorem 11. PPA-Circuit CNSS is in PPA.
Proof. We give a reduction from PPA-Circuit CNSS to Leaf. Given an input $N = (C', a)$ to PPA-Circuit CNSS, we set $C = C' \oplus L_a$. We construct a graph $G_N = (V_N, E_N)$ by a polynomial-time edge recognition algorithm and a polynomial-time pairing function $\varphi$, as explained in Section 2.1. The vertices of $G_N$ are $V_N = \mathbb{F}_2^n \cup \mathcal{S}(C)$. There are two types of edges in $E_N$: the first type is between an assignment and a parse subcircuit, and the second type is between two maximal parse subcircuits. By definition, the edge $\{a, S\}$ exists between $a \in \mathbb{F}_2^n$ and $S \in \mathcal{S}(C)$ if $m_S(a) = 1$. Such an edge can easily be recognized, since the monomial $m_S(x)$ can be evaluated in time linear in the size of $C$.
Since $C$ is the disjoint sum of $C'$ and $L_a$, the maximal parse subcircuits of $C$ are the maximal parse subcircuits of $C'$ extended with the appropriate mark at the output gate, together with the unique maximal parse subcircuit of $L_a$, again extended with the appropriate mark at the output gate. Let us denote the latter parse subcircuit by $T$. Let $\mu$ be a polynomial-time computable perfect matching between the maximal parse subcircuits of $C'$, which exists by Lemma 8. By definition, the edge $\{S, S'\}$ exists between $S, S' \in \mathcal{S}(C)$ if both are extensions of maximal parse subcircuits of $C'$ and their restrictions to $C'$ are matched by $\mu$.
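Observe that by Proposition 6, a vertex $a \in \mathbb{F}_2^n$ has odd degree if and only if $C(a) = 1$, since its degree is the number of parse subcircuits $S$ with $m_S(a) = 1$. If $S$ is not maximal, then $m_S(a) = 1$ for exactly $2^{n-k}$ assignments $a$, where $k < n$ is the number of distinct variables occurring in $m_S(x)$, so its degree is even. If $S$ is a maximal parse subcircuit, then among the vertices in $\mathbb{F}_2^n$ it is only connected to $1^n$. If moreover $S \neq T$, then it has one more neighbor, its matching pair given by $\mu$, and therefore its degree is two. On the other hand, the degree of $T$ is one, which is odd. We can therefore take $T$ as the standard leaf.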
We first give the pairing for the vertices in $\mathcal{S}(C)$. We fix $S \in \mathcal{S}(C)$ and let $a \in \mathbb{F}_2^n$ be such that $m_S(a) = 1$. If $S$ is not a maximal parse subcircuit, then let $i \in [n]$ be the smallest integer such that $x_i$ does not occur in $m_S(x)$, and let $a'$ be obtained from $a$ by flipping its $i$th bit. Then by definition $\varphi(S, \cdot)$ pairs $a$ with $a'$. If $S \neq T$ is a maximal parse subcircuit, then it has two neighbors, its matching pair $S'$ under $\mu$ and $1^n$, and $\varphi(S, \cdot)$ pairs these two neighbors. For every $S$, the mapping $\varphi(S, \cdot)$ is clearly involutive.
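For a non-maximal $S$, both the neighbors of $S$ and the pairing $\varphi(S, \cdot)$ depend only on the set of variables occurring in $m_S(x)$. The Python sketch below (hypothetical function names; variables indexed from 0) lists the neighbors $a$ with $m_S(a) = 1$ and implements the bit-flip pairing, which is a fixed-point-free involution on them; in particular the degree $2^{n-k}$ of $S$ is even.

```python
from itertools import product


def neighbours(support, n):
    """Assignments a with m_S(a) = 1: every variable occurring in m_S is set to 1."""
    return [a for a in product((0, 1), repeat=n) if all(a[i] == 1 for i in support)]


def pair(support, n, a):
    """phi(S, .) for a non-maximal S: flip the smallest variable missing from m_S."""
    i = min(set(range(n)) - set(support))
    return a[:i] + (1 - a[i],) + a[i + 1:]


support, n = {0, 2}, 4         # m_S contains only x_0 and x_2, so k = 2 < n
nbrs = neighbours(support, n)
print(len(nbrs))               # 4 = 2 ** (n - k)
for a in nbrs:
    b = pair(support, n, a)
    assert b != a and b in nbrs and pair(support, n, b) == a
```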
We now turn to the more complicated pairing for the vertices in $\mathbb{F}_2^n$. Observe that it depends only on the edges of the first type, that is, edges between an assignment $a \in \mathbb{F}_2^n$ and a parse subcircuit $S \in \mathcal{S}(C)$. These edges can in fact be defined for an arbitrary circuit $C$. Let us denote by $G(C)$ the graph with vertex set $\mathbb{F}_2^n \cup \mathcal{S}(C)$ whose edges are the edges of the first type from $G_N$. We first prove the following lemma about $G(C)$ by induction on the size of $C$.
Lemma 12. For every $n$-variable, single-output circuit $C$ and for every vertex $a \in \mathbb{F}_2^n$ in $G(C)$:
a) if $\deg(a)$ is even, then for every $S \in \mathcal{S}(C)$ such that $m_S(a) = 1$, there exists $g \in \mathrm{Dom}(S)$ with $P_g(a) = 0$;
b) if $\deg(a)$ is odd, then there exists a unique $S \in \mathcal{S}(C)$ such that $m_S(a) = 1$ and $P_g(a) = 1$ for all $g \in \mathrm{Dom}(S)$.
Proof. If $C$ consists of a single node, the statement is obviously true. Otherwise, we first handle a). When $\deg(a)$ is even, then $C(a) = 0$. If the root is a sum gate, we are done, since it is in the domain of every parse subcircuit. If the root is a product gate, then at least one of its children (without loss of generality the left one) also evaluates to 0, that is, $C_\ell(a) = 0$. Let $S \in \mathcal{S}(C)$ be such that $m_S(a) = 1$; then we also have $m_{S_\ell}(a) = 1$. By the inductive hypothesis there exists $g \in \mathrm{Dom}(S_\ell)$ with $P_g(a) = 0$, and since $g$ is also in the domain of $S$, we are again done. We now deal with the induction step of b). When $\deg(a)$ is odd, then $C(a) = 1$. If the root is a sum gate, then one of its children evaluates to 0 and the other one to 1, say $C_\ell(a) = 0$ and $C_r(a) = 1$. By the inductive hypothesis there exists a unique $S' \in \mathcal{S}(C_r)$ such that $m_{S'}(a) = 1$ and $P_g(a) = 1$ for all $g \in \mathrm{Dom}(S')$. On the other hand, if $S \in \mathcal{S}(C)$ is such that $m_S(a) = 1$ and the mark of $S$ at the root is $\ell$, then $S_\ell \in \mathcal{S}(C_\ell)$ and $m_{S_\ell}(a) = 1$, and by a) there exists $g \in \mathrm{Dom}(S)$ with $P_g(a) = 0$. Therefore the unique $S$ satisfying the conclusion of b) is $S'$ extended with the mark $r$ at the root.
To finish the induction step for b), let us now suppose that the root of $C$ is a product gate. Since $C(a) = 1$, we have $C_\ell(a) = C_r(a) = 1$, so by the inductive hypothesis there exists a unique $S' \in \mathcal{S}(C_\ell)$ such that $m_{S'}(a) = 1$ and $P_g(a) = 1$ for all $g \in \mathrm{Dom}(S')$, and similarly there exists a unique $S'' \in \mathcal{S}(C_r)$ such that $m_{S''}(a) = 1$ and $P_g(a) = 1$ for all $g \in \mathrm{Dom}(S'')$. We claim that $S'$ and $S''$ are consistent, and therefore their union $S = S' \cup S''$ is the unique parse subcircuit of $C$ satisfying the claim. Suppose that this is not the case, that is, there exists $g \in \mathrm{Dom}(S') \cap \mathrm{Dom}(S'')$ such that $S'(g) \neq S''(g)$. Since $g$ is a sum gate with $P_g(a) = 1$, one of its children, say $g_\ell$, satisfies $P_{g_\ell}(a) = 0$, contradicting the inductive hypothesis about the parse subcircuit in $\{S', S''\}$ which takes the value $\ell$ at $g$.
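Lemma 12 can be checked exhaustively on small circuits. The following self-contained Python sketch uses a hypothetical gate encoding and a small circuit of our choosing in which the sum gate `s` is shared; it enumerates the parse subcircuits together with the set of variables they reach, computes the values $P_g(a)$, and verifies statements a) and b) for every $a \in \mathbb{F}_2^n$.

```python
from itertools import product

# Hypothetical encoding: gate name -> ("var", i) | ("const",) | ("add", l, r) | ("mul", l, r)
CIRC = {
    "x0": ("var", 0), "x1": ("var", 1), "one": ("const",),
    "s": ("add", "x0", "x1"),
    "t": ("add", "s", "one"),
    "p": ("mul", "s", "t"),
    "out": ("add", "p", "x1"),
}


def value(g, a):
    """P_g(a): the value of gate g at the assignment a, over F_2."""
    kind, *args = CIRC[g]
    if kind == "var":
        return a[args[0]]
    if kind == "const":
        return 1
    left, right = value(args[0], a), value(args[1], a)
    return (left + right) % 2 if kind == "add" else left * right


def parse_subcircuits(g):
    """Parse subcircuits of C_g as (marking, set of variables occurring in m_S)."""
    kind, *args = CIRC[g]
    if kind == "var":
        return [({}, frozenset([args[0]]))]
    if kind == "const":
        return [({}, frozenset())]
    left, right = parse_subcircuits(args[0]), parse_subcircuits(args[1])
    if kind == "add":
        return [({**U, g: "l"}, v) for U, v in left] + [({**W, g: "r"}, v) for W, v in right]
    return [({**U, **W}, vu | vw) for (U, vu), (W, vw) in product(left, right)
            if all(U[h] == W[h] for h in U.keys() & W.keys())]


def lemma12_holds(root, n):
    subs = parse_subcircuits(root)
    for a in product((0, 1), repeat=n):
        # neighbours of a in G(C): parse subcircuits with m_S(a) = 1
        nbrs = [S for S, variables in subs if all(a[i] for i in variables)]
        # those neighbours whose marked sum gates all evaluate to 1 at a
        good = [S for S in nbrs if all(value(g, a) for g in S)]
        if len(nbrs) % 2 == 0 and good:            # violates a)
            return False
        if len(nbrs) % 2 == 1 and len(good) != 1:  # violates b)
            return False
    return True


print(lemma12_holds("out", 2))  # True
```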
We now give the pairing $\varphi(a, \cdot)$ for $a \in \mathbb{F}_2^n$. If $\deg(a)$ is even, let $S \in \mathcal{S}(C)$ be such that $m_S(a) = 1$. By Lemma 12 there exists a sum gate in the domain of $S$ where $P$ evaluates to 0. Let $g$ be the first sum gate in $\mathrm{Dom}(S)$, in some fixed topological ordering of the gates of $C$, such that $P_g(a) = 0$, and suppose without loss of generality that $S(g) = \ell$. Let $Z \in \mathcal{S}(C_g)$ be the restriction of $S$ to $C_g$; we obviously have $m_Z(a) = m_{Z_\ell}(a) = 1$. We claim that $P_{g_\ell}(a) = P_{g_r}(a) = 1$. Since $P_g(a) = P_{g_\ell}(a) + P_{g_r}(a) = 0$, the two values are equal, and if $P_{g_\ell}(a) = P_{g_r}(a) = 0$, then by Lemma 12 applied to $C_{g_\ell}$ there exists $g' \in \mathrm{Dom}(Z_\ell)$ with $P_{g'}(a) = 0$, which contradicts the choice of $g$. Therefore, again by Lemma 12, there exists a unique $Z'' \in \mathcal{S}(C_{g_r})$ such that $m_{Z''}(a) = 1$ and $P_h(a) = 1$ for all $h \in \mathrm{Dom}(Z'')$. We let $Z' \in \mathcal{S}(C_g)$ be the extension of $Z''$ with $Z'(g) = r$. Finally, we define $\varphi(a, S)$ as the parse subcircuit $S'$ obtained from $S$ by exchanging $Z$ with $Z'$, that is, $S' = (S \setminus Z) \cup Z'$. It is clear that $m_{S'}(a) = 1$ and $\varphi(a, S') = S$.
If $\deg(a)$ is odd, then by Lemma 12 there exists a unique parse subcircuit $S$ such that $m_S(a) = 1$ and $P_g(a) = 1$ for all $g \in \mathrm{Dom}(S)$. We set $\varphi(a, S) = S$. For all parse subcircuits $S$ with $m_S(a) = 1$ such that $P_g(a) = 0$ for some $g \in \mathrm{Dom}(S)$, the construction of $S' = \varphi(a, S)$ is identical to the previous case.
To finish the proof, observe that the vertices of odd degree in $V_N$ other than the standard leaf $T$ are exactly the elements $a \in \mathbb{F}_2^n$ such that $C(a) = 1$. Therefore the output of the reduction is a satisfying assignment $a$ of $C$.
PPA-hardness
Theorem 13. PPA-Circuit Chevalley is PPA-hard.
Proof. We reduce Leaf to PPA-Circuit Chevalley. Let $(z, M, \omega)$ be an instance of Leaf, where $M$ defines the graph $G_z = (V_z, E_z)$ with $V_z = \{0, 1\}^n$ for some polynomial function $n$ of $|z|$, and $\omega$ is the standard leaf in $G_z$. We know that for every vertex $u$, $M(z, u)$ is a set of at most two vertices. Composing the standard simulation of polynomial-time Turing machines by polynomial-size Boolean circuits [24] with the obvious simulation of Boolean circuits by arithmetic circuits, there exist two $n$-variable, $n$-output polynomial-size arithmetic circuits $D$ and $F$ with the following properties: Consider the PPA-composition $C_{D,F}$ of $D$ and $F$. We claim that for every vertex $u$, the degree of $u$ in $G_z$ is odd if and only if $u$ is a satisfying assignment for $C_{D,F}$. This is equivalent to saying that the parity of the degree of $u$ is the same as the parity of the number of components of $C_{D,F}$ satisfied by $u$. The proof of this claim is straightforward, but somewhat tedious. We distinguish three cases in the proof, depending on the cardinality of $M(z, u) \setminus \{u\}$.
