To approximate arbitrary unitary transformations on one or more qubits, one must perform transformations which are outside of the Clifford group. The gate most commonly considered for this purpose is the $T = \mathrm{diag}(1, e^{i\pi/4})$ gate. As T gates are computationally expensive to perform fault-tolerantly in the most promising error-correction technologies, minimising the "T-count" (the number of T gates) required to realise a given unitary in a Clifford+T circuit is of great interest. We describe techniques to find circuits with reduced T-count in unitary circuits, which develop on the ideas of Heyfron and Campbell [10] with the help of the ZX calculus. Following Ref.
Introduction
An important goal of quantum technologies is to realise, as faithfully as possible, an architecture capable of performing approximately universal quantum computation. Ignoring the practical difficulties of imperfectly realised operations and noise in imperfect hardware, such an architecture must be able to approximate an arbitrary unitary transformation with high probability, possibly relative to some embedding of the computational space into the states of its qubits (e.g., to perform error correction).
The above goal requires that the set of transformations that the architecture can perform does not form a discrete set. This is challenging, as the operations which can be easily performed fault-tolerantly for various error correcting codes form a discrete set, often the Clifford group or a subset of it. As the Clifford group is in any case useful to reason about quantum error correction and very simple procedures on quantum data, this motivates (a) considering fault-tolerant realisations of the Clifford group, together with a more labour-intensive procedure to realise some unitary transformation outside of the Clifford group, and then (b) minimising the number of non-Clifford gates required to realise or approximate a given unitary. The most popular approach is to consider "Clifford+T" circuits, using a gate-set such as {CNOT, H, S, T}, involving CNOT, the Hadamard gate H, and $S = \mathrm{diag}(1, i)$ as generators of the Clifford group, supplemented by the gate $T = \mathrm{diag}(1, e^{i\pi/4}) = \sqrt{S}$. We then consider the problem of minimising the T-count of a unitary transformation: the number of T gates required to realise (or approximate) that unitary.
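As a quick sanity check on the gate definitions above, the following NumPy sketch confirms that T is a square root of S, and that S is in turn a square root of the Pauli Z gate. The variable names are chosen here purely for illustration.

```python
import numpy as np

# Diagonal single-qubit gates as defined in the text.
Z = np.diag([1, -1]).astype(complex)
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])

# T squares to S, and S squares to Z.
t_squared_is_s = np.allclose(T @ T, S)
s_squared_is_z = np.allclose(S @ S, Z)
print(t_squared_is_s, s_squared_is_z)
```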
Heyfron and Campbell [10] describe a circuit transformation that allows one to realise a Clifford+T unitary using a circuit consisting of a circuit of CNOT operations, a circuit of diagonal non-Clifford operations, and a sequence of (possibly classically controlled) Clifford operations. This allows them to reduce the problem of T-count reduction to an appropriate analysis of the diagonal non-Clifford portion of this circuit. The strategy of Heyfron and Campbell [10] is to consider non-Clifford diagonal circuits in terms of phase polynomials, and builds on a sequence of results which revolve around such operations [4, 9, 3, 2, 6, 5], presented in various but similar ways. These results note the connection of T-count optimisation to difficult coding problems and NP-hard tensor decomposition problems [5, 10], and generally approach the problem of reducing T-count by approaching these difficult problems.
Our approach is to describe diagonal CNOT+T unitaries using "π/4-parity-phase" operations. These are operations which induce an $e^{i\pi/4}$ phase on standard basis states, depending on a parity computation $f(x) = x_{k_1} \oplus x_{k_2} \oplus \cdots \oplus x_{k_m}$, for any integer $m \geq 1$ and $1 \leq k_1, k_2, \ldots, k_m \leq n$. As each π/4-parity-phase gate can be realised in principle using a single T or T† gate (and some CNOT gates), simplifying π/4-parity-phase circuits is directly productive to reducing T-count. On this same line of investigation, Kissinger and van de Wetering [12] use the ZX calculus to describe a technique of "phase teleportation" to reduce circuits involving "phase gadgets" (denoting unitaries such as our π/4-parity-phase operations).
In this article, we describe a framework to reduce T-count, by using "tactics" which are induced by any family of identities on π/4-parity-phase operations. We then describe some identities on π/4-parity-phase operations (which define two different such tactics), and describe strategies to deploy these tactics in an effective way. Our techniques yield new records for the T-count of some standard benchmark circuits, and yield results which are near to the best known results in further circuits. Because of the simple way in which we use these identities on π/4-parity-phase operations, we speculate that even these record-setting results may be easy to improve on.
Preliminaries
We first set out some basic or existing results, using the following notation. Let $[n] := \{1, 2, \ldots, n\}$ and let $\mathbb{1}$ be the 2×2 identity matrix. For sets $S, T \subseteq V$ we write $S \Delta T$ for the symmetric difference $(S \cup T) \setminus (S \cap T)$, and let $x^{(S)} \in \{0, 1\}^V$ denote the incidence vector of $S$, where $x^{(S)}_j = 1$ if and only if $j \in S$.
The Clifford hierarchy
Let $P_n := \{\, i^k P_1 \otimes \cdots \otimes P_n : k \in \mathbb{Z} \text{ and } P_j \in \{\mathbb{1}, X, Y, Z\} \,\}$ denote the n-qubit Pauli group. We define the Clifford hierarchy (on n qubits) by defining $C^n_1 := P_n$, and
$$C^n_k := \{\, U \in \mathrm{U}(2^n) : U P U^\dagger \in C^n_{k-1} \text{ for all } P \in P_n \,\}$$
for $k > 1$. We then define $D^n_k \subseteq C^n_k$ to be the subset of diagonal operations. As an abuse of notation, we will identify $C^n_k$ and $D^n_k$ with subsets of $C^N_k$ and $D^N_k$ (respectively) for $n < N$. As a part of this abuse of notation, we allow ourselves to write $S \in C^n_2$ and $T \in C^n_3$ for all $n \geq 1$.
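The recursive definition of the hierarchy can be tested directly on single-qubit gates. The sketch below assumes the standard convention in which level 1 is the Pauli group and level k consists of the unitaries that conjugate every Pauli into level k−1; the helper names `is_pauli` and `in_level` are illustrative, not taken from the article.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = (X + Z) / np.sqrt(2)
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])


def is_pauli(U):
    """True if the 2x2 unitary U equals I, X, Y or Z up to a global phase."""
    return any(np.isclose(abs(np.trace(P.conj().T @ U)) / 2, 1.0)
               for P in (I2, X, Y, Z))


def in_level(U, k):
    """Membership of a single-qubit unitary in level k (level 1 = Pauli group)."""
    if k == 1:
        return is_pauli(U)
    # Global phases of P are irrelevant, so X, Y, Z cover all non-trivial Paulis.
    return all(in_level(U @ P @ U.conj().T, k - 1) for P in (X, Y, Z))


# Lowest level (among 1, 2, 3) containing each gate.
levels = {name: min(k for k in (1, 2, 3) if in_level(U, k))
          for name, U in {"Z": Z, "H": H, "S": S, "T": T}.items()}
print(levels)
```

The check places H and S at the second level and T at the third, in agreement with the statement that $S \in C^n_2$ and $T \in C^n_3$.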
Parity-phase operations
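Defining parity-phase operations. It is easy to show that $D^n_k$ forms an abelian group. In particular, one can show (see Appendix A) that $D^n_k$ is generated by the operators $\omega \cdot \mathbb{1}^{\otimes n}$ for any global phase $\omega$, together with all operations of the form $D_{S,k}$ for sets $S = \{s_1, \ldots, s_m\} \subseteq [n]$ with $m \geq 1$, defined by
$$D_{S,k} := \exp\bigl(-i\pi Z_S / 2^k\bigr),$$
where $Z_S = \prod_{j \in S} Z_j$. (We define $D_{S,k}$ for all $k \in \mathbb{Z}$; however, one may show $D_{S,0} = -\mathbb{1}^{\otimes n}$ and $D_{S,-k} = \mathbb{1}^{\otimes n}$ for all $k > 0$ and $S \subseteq [n]$.) Note that $X_j Z_S X_j^\dagger = (-1)^{x^{(S)}_j} Z_S$, and $\mathrm{CNOT}_{h,j}\, Z_S\, \mathrm{CNOT}_{h,j}^\dagger = Z_{S'}$ for a subset $S' \subseteq [n]$ determined by $S$, $h$, and $j$.
From this it follows that
$$X_j D_{S,k} X_j^\dagger = D_{S,k}^{(-1)^{x^{(S)}_j}}, \tag{4a}$$
$$\mathrm{CNOT}_{h,j}\, D_{S,k}\, \mathrm{CNOT}_{h,j}^\dagger = D_{S',k}, \tag{4b}$$
so that $D^n_k$ is preserved under conjugation by CNOT and X operations. Also note that $D_{S,k}^2 = D_{S,k-1}$, from which it follows that $D^n_{k-1} \subseteq D^n_k$. We refer to operations $D_{S,k+1}$, and their inverses, as "$\pi/2^k$-parity-phase" operations. We motivate this terminology as follows. Let $S = \{s_1, s_2, \ldots, s_m\}$ for some $m \geq 1$. For a standard basis vector $|z\rangle$, we have
$$D_{S,k+1} |z\rangle = \exp\bigl(-i\pi (-1)^{x^{(S)} \cdot z} / 2^{k+1}\bigr)\, |z\rangle.$$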
This is equivalent (up to a global phase of $e^{-i\pi/2^{k+1}}$) to inducing a relative phase of $\pi/2^k$ on $|z\rangle$ for those $z \in \{0, 1\}^n$ for which $x^{(S)} \cdot z = z_{s_1} \oplus z_{s_2} \oplus \cdots \oplus z_{s_m} = 1$; and similarly for $D^{-1}_{S,k+1}$. More generally, we refer to $\exp(\pm \tfrac{1}{2} i\theta Z_S)$ as a θ-parity-phase operation. We note that θ-parity-phase operations, the operators $D_{S,k}$ among them, can be represented by ZX diagrams with the usual denotational semantics (read from left to right in this article), with structure such as the following:
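[ZX diagrams: a θ-phase gadget acting on the qubits of S (left), and the same gadget with its phase conditioned on the parity of a set of bits R (right).]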
where the long horizontal wires are the qubits indexed by $[n] = \{1, 2, \ldots, n\}$ and $S \subseteq [n]$ is the subset of those qubits which have (light, green) degree-3 nodes on them. In the right-hand diagram, R denotes a set of boolean variables $s_i \in \{0, 1\}$: using the extended annotations of Ref. [7], the diagram denotes that the phase applied is ±θ only if $\bigoplus_{s_i \in R} s_i = 1$, and that otherwise the phase is zero. We refer to these as "phase gadgets", adopting the terminology of Ref. [12, Section 4.3]; when $|S| = m$, we may refer to it as a "phase m-gadget". (If θ is an odd multiple of π/4, we may refer to it as a "T-phase m-gadget"; for θ an integer multiple of π/2, we refer to it as a "Clifford-phase m-gadget". If m = 1, we may also mildly abuse this terminology to refer to a simple green phase node as a "1-gadget".)
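The stated properties of the parity-phase operations can be checked numerically. The sketch below assumes the form $D_{S,k} = \exp(-i\pi Z_S/2^k)$, which is consistent with the properties $D_{S,0} = -\mathbb{1}^{\otimes n}$ and $D_{S,k}^2 = D_{S,k-1}$ quoted above, and assumes that $\mathrm{CNOT}_{h,j}$ has control h and target j; under that convention, conjugation replaces S by $S \Delta \{h\}$ whenever $j \in S$. The helper names are illustrative.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])


def z_on(S, n):
    """Z_S on n qubits (1-indexed; qubit 1 is the leftmost tensor factor)."""
    return reduce(np.kron, [Z if j in S else I2 for j in range(1, n + 1)])


def D(S, k, n):
    """Assumed form D_{S,k} = exp(-i pi Z_S / 2^k); Z_S is diagonal."""
    return np.diag(np.exp(-1j * np.pi * np.diag(z_on(S, n)) / 2.0**k))


def cnot(h, j, n):
    """Permutation matrix of a CNOT with control h, target j (assumed convention)."""
    U = np.zeros((2**n, 2**n))
    for b in range(2**n):
        flipped = b ^ (1 << (n - j)) if (b >> (n - h)) & 1 else b
        U[flipped, b] = 1.0
    return U


n, S = 3, {2, 3}
eye = np.eye(2**n)

# Squaring lowers k by one; k = 0 gives -1 and negative k gives the identity.
squares_ok = np.allclose(D(S, 3, n) @ D(S, 3, n), D(S, 2, n))
trivial_ok = np.allclose(D(S, 0, n), -eye) and np.allclose(D(S, -2, n), eye)

# Up to the global phase exp(-i pi/8), D_{S,3} puts exp(i pi/4) on odd-parity states.
parities = np.array([sum((b >> (n - q)) & 1 for q in S) % 2 for b in range(2**n)])
stripped = np.diag(D(S, 3, n)) * np.exp(1j * np.pi / 8)
phase_ok = np.allclose(stripped, np.exp(1j * np.pi / 4 * parities))

# Conjugation by CNOT_{1,2}: the target 2 lies in S, so S becomes S xor {1}.
C = cnot(1, 2, n)
conj_ok = np.allclose(C @ z_on(S, n) @ C.T, z_on(S ^ {1}, n))
print(squares_ok, trivial_ok, phase_ok, conj_ok)
```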
Parity-phase operations in relation to controlled phases. An important role of $D_{S,3}$ gates for $S \subseteq [n]$ is their relationship to diagonal gates in $D^n_3$ which are controlled-unitaries of a more straightforward sense, such as CS and CCZ:
we may describe how to generate these from $D_{S,3}$ operations by decomposing the projectors $|11\rangle\langle 11|$ or $|111\rangle\langle 111|$ into tensor products of $|1\rangle\langle 1| = \tfrac{1}{2}(\mathbb{1} - Z)$, and expanding to obtain a product of $D_{S,3}$ gates (see Eqn. (23) in Appendix A): disregarding the $D_{\emptyset,3}$ factors, which realise global phases, we obtain
More generally, we may relate (t−1)-controlled $\pi/2^k$-phase gates to $\pi/2^{k-t+1}$-parity-phase gates, following Eqn. (23):
where the right-hand operator applies a phase of $\pi/2^{k-|T|-1}$ to those components of a state in which all of the qubits in $T$ are in the state $|1\rangle$. A corollary of this, on which our results depend, is that
Connection between parity-phase operations and T-count. From Eqn. (4b), it follows that any operation $D_{S,k}$ can be reduced to an operation $D_{\{j\},k} \propto \mathrm{diag}(1, e^{2\pi i/2^k})$ acting on a single qubit $j$, by conjugation with an appropriate CNOT circuit. In particular, it follows that any $D_{S,3}$ circuit has minimal T-count 1. This allows us to approach the question of reducing T-count by considering decompositions of unitaries involving few π/4-parity-phase operations, acting on many qubits.
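Two of the facts used in this section lend themselves to a numerical check: that CCZ is, up to a global phase, a product of seven π/4-parity-phase operations, and that a parity-phase operation on several qubits reduces to a single-qubit phase under conjugation by CNOT gates. The sketch below again assumes $D_{S,k} = \exp(-i\pi Z_S/2^k)$ and a control-then-target convention for CNOT; the particular signs in the CCZ product are obtained here by expanding $\tfrac{1}{8}\prod_j(\mathbb{1} - Z_j)$ rather than quoted from the article.

```python
import numpy as np
from functools import reduce

I2, Z = np.eye(2), np.diag([1.0, -1.0])


def z_on(S, n):
    """Z_S on n qubits (1-indexed; qubit 1 is the leftmost tensor factor)."""
    return reduce(np.kron, [Z if j in S else I2 for j in range(1, n + 1)])


def D(S, k, n):
    """Assumed form D_{S,k} = exp(-i pi Z_S / 2^k)."""
    return np.diag(np.exp(-1j * np.pi * np.diag(z_on(S, n)) / 2.0**k))


def cnot(h, j, n):
    """CNOT with control h and target j (assumed convention)."""
    U = np.zeros((2**n, 2**n))
    for b in range(2**n):
        flipped = b ^ (1 << (n - j)) if (b >> (n - h)) & 1 else b
        U[flipped, b] = 1.0
    return U


def proportional(A, B):
    """True if the unitaries A and B agree up to a global phase."""
    return np.isclose(abs(np.trace(A.conj().T @ B)), A.shape[0])


n = 3
D3 = lambda S: D(S, 3, n)
inv = lambda M: M.conj().T

# CCZ as a product of seven pi/4-parity-phase operations (pairs inverted).
ccz = np.diag([1, 1, 1, 1, 1, 1, 1, -1]).astype(complex)
product = (D3({1}) @ D3({2}) @ D3({3})
           @ inv(D3({1, 2})) @ inv(D3({1, 3})) @ inv(D3({2, 3}))
           @ D3({1, 2, 3}))
ccz_ok = proportional(product, ccz)

# Compute the parity of qubits 1, 2, 3 onto qubit 3, apply a phase, uncompute.
ladder = cnot(1, 3, n) @ cnot(2, 3, n)
reduced_ok = np.allclose(ladder.T @ D3({3}) @ ladder, D3({1, 2, 3}))

# The single-qubit operation is a T gate up to a global phase.
T = np.diag([1, np.exp(1j * np.pi / 4)])
single_qubit_ok = proportional(D({1}, 3, 1), T)
print(ccz_ok, reduced_ok, single_qubit_ok)
```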
Previous work on π/4-parity-phase operations, and the role of the ZX calculus. Parity-phase operations were identified early in our work as objects of interest, independently of that of Ref. [12] (which is also informed by the ZX calculus) or Refs. [5, 17] (which do not use the ZX calculus). Amy and Mosca [5] identify these as relevant unitary operators, but immediately proceed to describe them rather in terms of more local controlled-phase operations. Kissinger and van de Wetering [12] seem to have identified π/4-parity-phase operations (in the form of ZX phase gadgets) for similar reasons to us: there is the sense of being confronted with them as the principal object of study, but the lack of commitment of the ZX calculus to the circuit model allows one to be more relaxed about their nature as many-qubit operations. We note that Zhang and Chen [17], and for that matter Litinski [13], demonstrate that the ZX calculus is not actually required to productively reason about π/4-parity-phase operations. Indeed, little knowledge of the ZX calculus is required either to understand or to make use of our results. The role played by the ZX calculus in our work is therefore not an essential one: instead, its role was to quickly single out π/4-parity-phase operations as the relevant objects of study, and to allow us to easily reason about them, which are the main things that one might reasonably ask of a good mathematical notation.
Reduction of T-count through simplification of parity-phase circuits
In this section, we apply circuit reduction techniques similar to those of Heyfron and Campbell [10], augmented with techniques motivated by the ZX calculus, to describe simplifications which can reduce the T-count necessary to realise a unitary $D^n_3$ operation. Our results do not make heavy (explicit) use of the re-write rules of the ZX calculus: a reader who is content with circuits including intermediate measurements, and who is comfortable with reading a parity-phase gadget such as that of Eqn. (6) as a unitary operator, may interpret every diagram below as a circuit diagram.
Reduction to "homogeneous" circuits of T-gadgets
We first consider a series of circuit transformations, following (and mildly extending) that of Heyfron and Campbell [10], to reduce the amount of non-Clifford diagonal operations used to realise a unitary U. We suppose that U is given by a circuit C, initially expressed as a circuit over the gate-set {X, CNOT, CCNOT, Z, CZ, CCZ, H, S, T, SWAP}, of which all gates apart from {CCNOT, CCZ, T} are Clifford gates. We note that reversible circuits commonly involve multiply-controlled-NOT gates with more than two controls: for the sake of simplicity we suppose that these have been decomposed into CCNOT gates, involving computation and uncomputation on auxiliary qubits initialised to $|0\rangle$ in the usual way (though superior techniques to this are by now well-established: see e.g. [11, 8, 15]).
The main concept is to isolate a "homogeneous circuit" of $D^n_3$ operations, preceded and followed by circuits consisting entirely of (possibly classically-controlled) Clifford operations and Pauli observable measurements. To this end, we transform C as follows:
1. Replace the reversible classical operations X, CNOT, and CCNOT with the decompositions $HZH$, $(\mathbb{1} \otimes H)\, CZ\, (\mathbb{1} \otimes H)$, and $(\mathbb{1} \otimes \mathbb{1} \otimes H)\, CCZ\, (\mathbb{1} \otimes \mathbb{1} \otimes H)$ respectively. After this step, CCZ and T are the only non-Clifford gates in C.
2. Cancel consecutive pairs of self-inverse gates H, X, Z, CZ, or SWAP which occur in the circuit (e.g., such as may be introduced in the preceding step), and commute as many Clifford gates to the beginning or end of C as possible without transforming any of the gates in the circuit (e.g., commuting CZ operations but not Hadamard operations past CCZ operations). We refer to these as the "initial" and "final" Clifford stages of C below, and the rest of C as the "main body".
3. From the earliest H gate in the circuit to the latest, determine whether it can be commuted either to the initial or final Clifford part of the circuit, or to be adjacent to another H gate on the same qubit, by suitable transformations of X gates, Z gates, CZ gates, or the targets of CNOT gates. If it is possible to commute it in this way, do so.
4. Repeat step 2 to cancel pairs of H gates, or extract any further Clifford operations, to the initial or final Clifford stages of C.
5. Rewrite the H gates in the interior of the circuit, using the following circuit / ZX gadgets (using a fresh classical bit label in place of "s" below, each time):
After this step, X and CNOT are the only non-diagonal gates left in the main body of C.
Interpretive remark. In the circuit second from the left, the two qubits are subject to a SWAP operation, followed by a $CZ = \exp(i\pi\, |11\rangle\langle 11|)$ operation. The bottom qubit is finally measured with an X observable measurement (i.e., in the $|\pm\rangle$-basis), and the top qubit is finally acted on by an X operation only if the outcome is $|-\rangle$. On the right are annotated ZX diagrams in the style of Ref. [7], in which the measurement is represented as a projection with a random outcome s which is heralded and may be used to control phase operations elsewhere. The leftmost ZX diagram describes the decomposition of the controlled-Z operation, using CZ h,
The final ZX diagram propagates the single-qubit $D^{-1}_{\{*\},2}$ operations towards the preparation and measurement of the second qubit, so that the second qubit is prepared in the $|{-y}\rangle \propto |0\rangle - i|1\rangle$ state.
6. Decompose CCZ operations in C using the formula of Eqn. (8), and represent T gates (on some qubit j) by $D_{\{j\},3}$. After this, all non-Clifford operations are π/4-parity-phase operations, and are in the main body of C.
7. Commute all remaining X, Z, CNOT, CZ, and SWAP gates to the beginning or end of C, out of the main body and into the initial or final Clifford phases. This may transform various $D_{S,t}$ gates by Eqns. (4), changing the set S involved and/or negating the phase, according to the following commutation relations:
. . .
After this, the main body of C will be a commuting circuit, consisting entirely of π/4-parity-phase operations.
8. Fuse together any $D_{S,k}$ operations for $k \leq 3$ which arise on a common subset S:
and apply Eqn.
If the original circuit C had m Hadamard gates, the above procedure realises a transformation
where (reading right-to-left) $\mathrm{Cl}_0$ and $\mathrm{Cl}_1$ are the initial and final Clifford phases of the circuit respectively; $D_0$ is a circuit realising a $D^n_3$ operation; and the circuits $D_j$ (for $1 \leq j \leq m$) consist of the j-th measurement in the $|\pm\rangle$-basis with outcome $s_j$ (denoted in ZX by a green "$\pi, \{s_j\}$" node), followed by $D^n_k$ operations conditioned on the outcome $s_j$. We refer to a circuit with this structure as Cl-D-Cl (pronounced "cliddicle") form.
In the circuits produced by this procedure, all of the non-Clifford operations are $D^n_3$ operations in $D_0$. In particular, each of them is a π/4-parity-phase operation $D_{S,3}$, which can be realised by CNOT gates and a single T gate. This motivates the question of how to simplify a circuit consisting entirely of π/4-parity-phase operations. In some instances, we find a significant reduction in the T-count simply by representing the contributions to the T-count entirely in terms of π/4-parity-phase operations, and "fusing" these operations together using $D^2_{S,3} = D_{S,2}$ for any subset $S \subseteq [n]$. However, in general it is useful to consider what other reduction techniques can be used to simplify homogeneous circuits of "T-gadgets". In the setting of simplifying such a homogeneous circuit, we may easily make use of the TODD subroutine of Heyfron and Campbell [10]; to this we add another technique which (a) in some cases yields T-counts which are better than any previously known results, whether or not one also uses TODD; and (b) is in principle extensible, allowing for the possibility of further improvements through improved algorithms for this sub-problem.
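The effect of gadget fusion on the T-count can be illustrated with a small data structure. The sketch below is not the authors' implementation: it assumes a homogeneous circuit is stored as a dictionary from qubit sets to integer multiples of π/4 (taken modulo 8, since a multiple of 2π is a global phase), and it uses a seven-gadget CCZ decomposition whose signs are derived from expanding the projector $|111\rangle\langle 111|$. The function names are hypothetical.

```python
from collections import Counter


def fuse(gadgets):
    """Fuse gadgets acting on equal qubit sets; angles are multiples of pi/4 mod 8."""
    total = Counter()
    for qubits, multiple in gadgets:
        total[frozenset(qubits)] += multiple
    return {S: m % 8 for S, m in total.items() if m % 8 != 0}


def t_count(circuit):
    """Number of gadgets whose angle is an odd multiple of pi/4."""
    return sum(1 for m in circuit.values() if m % 2 == 1)


def ccz_gadgets(a, b, c):
    """Seven pi/4-parity-phase gadgets realising CCZ up to a global phase."""
    return [({a}, 1), ({b}, 1), ({c}, 1),
            ({a, b}, -1), ({a, c}, -1), ({b, c}, -1),
            ({a, b, c}, 1)]


# Two CCZ gates sharing qubits 1 and 2, followed by a T gate on qubit 3.
unfused = ccz_gadgets(1, 2, 3) + ccz_gadgets(1, 2, 4) + [({3}, 1)]
circuit = fuse(unfused)
print(len(unfused), t_count(circuit))
```

In this example the gadgets on {1}, {2}, {3} and {1, 2} each occur twice, so fusion turns them into Clifford-phase gadgets and the T-count drops from 15 to 7 before any further identity is applied.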
Phase Gadget Elimination tactics
Reducing the T-count while preserving the meaning of a circuit implicitly involves applying a mathematical identity, possibly passing temporarily through different representations of these circuits. (These are often identities of diagonal unitary circuits [3, 5], though not always [9, 12].) In the special case of reductions by identities of π/4-parity-phase operations, these may in principle be described in terms of a commuting product of operations which are proportional to the identity operator. We now consider a general approach to the reduction of such circuits by an analysis of families of non-trivial circuits which realise the identity transformation.
(a) PHAGE tactics
In the following, we use the terms "identity of π/4-parity-phase operations" or "identity of phase gadgets" (or simply "an identity") to refer to a circuit J whose T-count is at least 1 but which nevertheless realises the identity operation. We make the simple observation that for any family F of such "identities", there is an associated "tactic" to reduce the T-count in a homogeneous circuit C of such phase gadgets. For a given subset $S \subseteq [n]$, this tactic is as follows:
1. Determine whether there is an identity J ∈ F such that C contains at least half of the T-gadgets (or alternatively the inverses of the T-gadgets) which occur in J.
2. For any such identity J, compute a circuit $C_J$ as the product of C and $J^{-1}$ (simplifying this circuit by fusing phase gadgets, possibly cancelling T-gadgets or otherwise turning them into Clifford gadgets). Determine the resulting T-count.
3. Replace C with the circuit $C_J$ with the smallest T-count, if this is less than the T-count of C itself.
We call such a procedure a "Phase Gadget Elimination" (or PHAGE) tactic. This procedure is apparently "greedy", in that it selects the circuit $C_J$ which minimises the T-count after a single application. It is possible to take a subtler view, in which the family F of identities which may be deployed is only implicitly defined, in a way which may depend on the particular structure of C or how it acts on S. The main principle of a PHAGE tactic is that it selects a particular way to reduce the T-count based on the independent comparison of one or more different possible identities after some bounded-time procedure.
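The three steps of a PHAGE tactic can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: circuits and identities are dictionaries from qubit sets to multiples of π/4 modulo 8, the function names are hypothetical, and the example identity is the four-qubit case of the spider nest identity described later in the article (phases π/4 on single qubits and triples, −π/4 on pairs and on the 4-gadget).

```python
from itertools import combinations


def t_count(circuit):
    """Number of gadgets whose angle is an odd multiple of pi/4."""
    return sum(1 for m in circuit.values() if m % 2 == 1)


def times_inverse(circuit, identity):
    """Product of `circuit` with the inverse of `identity`, with gadgets fused."""
    out = dict(circuit)
    for S, m in identity.items():
        out[S] = (out.get(S, 0) - m) % 8
    return {S: m for S, m in out.items() if m != 0}


def phage(circuit, family):
    """One greedy PHAGE step; returns `circuit` unchanged if nothing improves."""
    best = circuit
    for J in family:
        # Step 1: count T-gadgets of J (or their inverses) that occur in the circuit.
        shared = sum(1 for S, m in J.items()
                     if m % 2 == 1 and circuit.get(S, 0) % 2 == 1)
        if 2 * shared < t_count(J):
            continue
        # Steps 2 and 3: fuse with J^{-1} and keep the result if the T-count drops.
        candidate = times_inverse(circuit, J)
        if t_count(candidate) < t_count(best):
            best = candidate
    return best


# Example identity on qubits 1..4, in units of pi/4 modulo 8.
qubits = (1, 2, 3, 4)
J = {frozenset(qubits): 7}
J.update({frozenset({q}): 1 for q in qubits})
J.update({frozenset(p): 7 for p in combinations(qubits, 2)})
J.update({frozenset(t): 1 for t in combinations(qubits, 3)})

# A circuit containing 8 of the 15 T-gadgets of J.
C = {frozenset(p): 7 for p in combinations(qubits, 2)}
C.update({frozenset(qubits): 7, frozenset({1}): 1})
reduced = phage(C, [J])
print(t_count(C), t_count(reduced))
```

Since the T-count depends only on the parity of each multiple of π/4, multiplying by J or by its inverse gives the same T-count; the two choices differ only in the Clifford-phase gadgets left behind.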
In principle, the T-optimise subroutine of Ref. [5] and the TOOL and TODD subroutines of Ref. [10] may be interpreted as algorithms to deploy one or more PHAGE tactics, possibly more than once in sequence, and possibly with a random choice of family F (and where F itself may on occasion be a singleton set). This approach to T-count reduction can be distinguished from that of Ref. [12], in which phases may in principle be aggregated one at a time in circuits which are unitary but not diagonal. While such techniques seem fruitful, we suggest that investigation of identities on parity-phase operations, and the way in which such identities may be deployed as PHAGE tactics, may provide a complementary approach to reduce the T-count.
The difficulty in reducing the T-count arises from the fact that there are a very large number of identities of π/4-parity-phase operations, and a large number of subsets $S \subseteq [n]$ which one may consider. A naïve approach is simply to let F be the family of all identities on n qubits: as this set is exponentially large, the associated PHAGE tactic is infeasible to use for large n. The difficulty is in formulating a successful strategy, in which one selects a more appropriately-sized family F of identities to try on a particular circuit or subsystem S. The question is then one of having a range of tactics which one may efficiently explore, and also successfully deploy, to reduce the T-count.
(b) Spider nest identities
Our results depend on a PHAGE tactic (i.e., an approach to attempt to reduce homogeneous circuits of phase gadgets) which is induced by a simple family of identities of π/4-parity-phase operations, which we now describe. While these identities are in a sense elementary, to our knowledge they have not previously been noted in the literature (though Maslov and Roetteler [14, Theorem 2] make similar observations for operations in $D^n_2$). The identities can be composed from some specific homogeneous circuits which realise the identity (essentially a set of generators for the group of functions $C_n$ described by Amy and Mosca [5]), which involve a single T-phase n-gadget for $n \geq 4$, and phase k-gadgets with $k \leq 3$:
Let $G_n$ denote the n-qubit circuit on the left-hand side of Eqn. (17). This consists of a 1-gadget with phase angle $(n-2)(n-3)\frac{\pi}{8}$ on each line, a 2-gadget on each pair of lines with phase angle $-(n-3)\frac{\pi}{4}$, a 3-gadget with phase angle $\frac{\pi}{4}$ on each subset of three lines, and finally an n-gadget with phase angle $-\frac{\pi}{4}$. (We prove this identity in Appendix B.)
We refer to identities of the form of Eqn. (17), and any other identity involving a small number of large phase-gadget "spiders" together with a large number of smaller phase-gadget "spiders", as a spider nest identity.
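The claim that $G_n$ realises the identity (up to a global phase) can be checked numerically for small n from the phase angles listed above. The sketch below assumes that a gadget with angle θ on a set S denotes $\exp(-\tfrac{1}{2} i\theta Z_S)$; taking the opposite sign throughout gives the complex conjugate and so does not affect the conclusion. The function names are illustrative.

```python
import numpy as np
from itertools import combinations


def spider_nest(n):
    """Gadget angles of G_n in units of pi/4, keyed by the qubits acted on (n >= 4)."""
    qubits = range(n)
    nest = {frozenset(qubits): -1}
    nest.update({frozenset({q}): (n - 2) * (n - 3) // 2 for q in qubits})
    nest.update({frozenset(p): -(n - 3) for p in combinations(qubits, 2)})
    nest.update({frozenset(t): 1 for t in combinations(qubits, 3)})
    return nest


def diagonal(circuit, n):
    """Diagonal of the product of gadgets exp(-i (theta/2) Z_S), theta = m * pi/4."""
    diag = np.ones(2**n, dtype=complex)
    for S, m in circuit.items():
        mask = sum(1 << q for q in S)
        # Eigenvalue of Z_S on basis state b is (-1)^(parity of b restricted to S).
        signs = np.array([(-1) ** bin(b & mask).count("1") for b in range(2**n)])
        diag *= np.exp(-0.5j * m * (np.pi / 4) * signs)
    return diag


def is_identity_up_to_phase(circuit, n):
    diag = diagonal(circuit, n)
    return bool(np.allclose(diag, diag[0]))


results = {n: is_identity_up_to_phase(spider_nest(n), n) for n in range(4, 8)}
print(results)
```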
Features of simple spider nest identities. Our results in fact make only limited (but crucial) use of spider nest identities. As it seems likely to us that these identities can be used to greater effect than we have in our results, we now describe some features of these identities in general. Let $N_S$ represent the homogeneous circuit of phase gadgets on the left-hand side of Eqn. (17), acting on a set $S = \{1, 2, \ldots, n\}$ of cardinality n. Note the following features of $N_S$:
• If n = 4, then it is essentially the same as the rule $R_{13}$ given in [2], and also Eqn. (10).
• If n ≡ 1 (mod 4) or n ≡ 3 (mod 4), all of the 2-gadgets in Eqn. (17) are Clifford-phase gadgets, which do not contribute to the T-count.
• If n ≡ 3 (mod 4) or n ≡ 2 (mod 4), all of the 1-gadgets in Eqn. (17) are Clifford-phase gadgets, which again do not contribute to the T-count.
For a fixed value of n, and a T-phase gadget on 1 to 3 qubits, there is a question of whether or not such a gadget is involved in $N_S$, as a number of the phase gadgets involved are Clifford gadgets instead. This would affect the number of phase gadgets involved in the identity, and therefore in a sense how easily one may find subcircuits on which the induced PHAGE tactic could be fruitfully used. Let $T_n$ denote the T-count of $N_S$: then
$$T_n = 1 + \begin{cases} \tfrac{1}{6} n(n^2 + 5), & \text{for } n \equiv 0 \pmod 4; \\ \tfrac{1}{6} n(n^2 - 3n + 8), & \text{for } n \equiv 1 \pmod 4; \\ \tfrac{1}{6} n(n^2 - 1), & \text{for } n \equiv 2 \pmod 4; \\ \tfrac{1}{6} n(n^2 - 3n + 2), & \text{for } n \equiv 3 \pmod 4. \end{cases} \tag{18}$$
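The value of $T_n$ can be recovered by directly counting the gadgets of $N_S$ whose angle is an odd multiple of π/4, using the phase angles given for $G_n$ and the parity observations in the list above. In the sketch below, the closed form (including the leading 1 for the n-gadget and the expression for n ≡ 0 mod 4) is obtained from that count rather than quoted, so it should be read as a consistency check and not as the authors' statement.

```python
from math import comb


def t_count_by_counting(n):
    """Count gadgets of N_S with an odd multiple of pi/4 as phase angle."""
    count = 1 + comb(n, 3)                    # the n-gadget and every 3-gadget
    if (n - 3) % 2 == 1:                      # 2-gadgets have angle -(n-3) pi/4
        count += comb(n, 2)
    if ((n - 2) * (n - 3) // 2) % 2 == 1:     # 1-gadgets have angle (n-2)(n-3) pi/8
        count += n
    return count


def t_count_closed_form(n):
    """Closed form by residue of n modulo 4 (derived from the count above)."""
    numerator = {
        0: n * (n * n + 5),
        1: n * (n * n - 3 * n + 8),
        2: n * (n * n - 1),
        3: n * (n * n - 3 * n + 2),
    }[n % 4]
    return 1 + numerator // 6


table = {n: t_count_by_counting(n) for n in range(4, 12)}
print(table)
```

For n = 4 both expressions give 15, the number of non-empty subsets of four qubits, as every gadget in that case is a T-gadget.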
In some cases, it is also possible to compose two or more circuits $N_{S_j}$ (or their inverses) to obtain a "composite" spider nest identity which has a small T-count. This may be helpful for finding simplifications of circuits, through PHAGE tactics which use such an identity. For instance, consider the specific circuit $N_S N^{-1}_{S'}$ where $|S| \geq 5$ and $S' = S \setminus \{r\}$ for some $r \in S$. In this composite circuit, various small T-gadgets of $N^{-1}_{S'}$ and of $N_S$ cancel each other out, yielding a circuit of the form:
[Eqn. (19): ZX diagram of the composite circuit $N_S N^{-1}_{S'}$.]
Let $r \in S \setminus S'$ represent the top qubit in the diagram above. The purpose of composing $N_S$ with $N^{-1}_{S'}$ is to fuse the various phase gadgets together (as we have above) to cancel the majority of the 3-gadgets on $S$ against the 3-gadgets on $S' \subset S$, and potentially to cancel almost all of the 1-gadgets on $S$ as well. We are then left with whichever 1- and 2-gadgets on $S'$ are left uncancelled, a collection of gadgets from $N_S$ involving $r$ interacting with some one or two qubits of $S'$, and the large phase gadgets acting on all of $S'$ and $S$ respectively. If $\tilde{T}_n$ denotes the T-count of the circuit above, we then have
$$\tilde{T}_n = \begin{cases} n^2 - n + 2 + \delta_n, & \text{for } n \text{ even}; \\ n^2 - 3n + 4 + \delta_n, & \text{for } n \text{ odd}, \end{cases}$$
where $\delta_n = 1$ if n ≡ 0 or n ≡ 1 modulo 4, and $\delta_n = 0$ if n ≡ 2 or n ≡ 3 modulo 4 (determining whether the 1-gadget on qubit r has T-count one or zero). As this is asymptotically smaller than $T_n$, one may see how it could be easier to find scenarios in which a single application of the identity $N_S N^{-1}_{S'} \propto \mathbb{1}$ is beneficial.
(This must be weighed against the prospect that the asymmetry between the qubits in Eqn. (19) will imply that the structure of the input circuit will play a role in whether it is likely to be useful.)
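The T-count of the composite circuit $N_S N^{-1}_{S'}$ can be obtained by brute force, by fusing the gadgets of $N_S$ with the inverted gadgets of $N_{S'}$ and counting those left with an odd multiple of π/4. The sketch below does this with the phase angles given for $G_n$, taking n = |S|. The expression used for odd n, $n^2 - 3n + 4 + \delta_n$, is derived from this count and is an assumption of the sketch; the even case is the one stated in the text.

```python
from itertools import combinations


def spider_nest(qubits):
    """Gadget angles of N_S in units of pi/4, for a tuple of at least 4 qubits."""
    n = len(qubits)
    nest = {frozenset(qubits): -1}
    nest.update({frozenset({q}): (n - 2) * (n - 3) // 2 for q in qubits})
    nest.update({frozenset(p): -(n - 3) for p in combinations(qubits, 2)})
    nest.update({frozenset(t): 1 for t in combinations(qubits, 3)})
    return nest


def composite_t_count(n):
    """T-count of N_S N_{S'}^{-1} with |S| = n and S' = S minus its first qubit."""
    S = tuple(range(n))
    fused = dict(spider_nest(S))
    for gadget, m in spider_nest(S[1:]).items():
        fused[gadget] = fused.get(gadget, 0) - m
    return sum(1 for m in fused.values() if m % 2 == 1)


def delta(n):
    return 1 if n % 4 in (0, 1) else 0


def closed_form(n):
    if n % 2 == 0:
        return n * n - n + 2 + delta(n)
    return n * n - 3 * n + 4 + delta(n)   # odd case: derived here, not quoted


counts = {n: composite_t_count(n) for n in range(5, 11)}
print(counts)
```

The quadratic growth of these counts, compared with the cubic growth of $T_n$, is what makes the composite identity easier to match against a given circuit.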
(c) Two naïve spider-nest PHAGE tactics
We now describe the way in which we use spider nest identities to obtain our results. This involves two simple PHAGE tactics, relative to the scheme described in Section (a), using distinct families $F_1$, $F_2$ of spider nest identities.
PHAGE TACTIC ("STOMP 4"). For a set $S = \{q_1, q_2, q_3, q_4\}$, apply the PHAGE tactic associated with the family $F_1 = \{N_S\}$ on the set S.
PHAGE TACTIC ("STOMP 5"). For a set $S = \{q_1, q_2, q_3, q_4, q_5\}$, apply the PHAGE tactic associated with the family $F_2$ consisting of the 63 different identities
where we define $S_j = S \setminus \{j\}$ for each $1 \leq j \leq 5$, and where $p_0 p_1 p_2 p_3 p_4 p_5 \in \{0, 1\}^6$ is not all zero.
These PHAGE tactics do not exploit many properties of the spider nest identities described above: they consist of a brute-force application of (possibly composite) spider nest identities on small subsystems. The tactic STOMP 5 in particular is motivated in part by the lower T-count involved in the composite spider nest identity of Eqn. (19): by testing many such composites, we attempt to find local opportunities to reduce the T-count. The strategies which we use to deploy them are also very simple: on any homogeneous circuit C of π/4-parity-phase operations on n qubits, first we apply STOMP 4 on all subsets of size 4 in some order, and then we apply STOMP 5 on all subsets of size 5 in some order. (This is somewhat redundant, as $F_2$ contains five different identities $N_{S_j}$ for $1 \leq j \leq 5$; we may simplify this by requiring that $p_0 p_1 p_2 p_3 p_4 p_5$ have Hamming weight at least 2, replacing $F_2$ with a family $F'_2$ of a mere 58 identities.)
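The deployment strategy just described can be sketched as a pair of loops over subsets. The code below is a hedged illustration, not the authors' implementation: it assumes the dictionary representation of homogeneous circuits (qubit sets mapped to multiples of π/4 modulo 8), and it assumes that each of the 63 identities of STOMP 5 is a product of $N_S$ and the $N_{S_j}$ selected by the non-zero bit string $p_0 \ldots p_5$. Whether each factor enters as itself or as its inverse does not change any T-count, since only the parity of each multiple of π/4 matters. It also skips the preliminary "at least half" test and simply compares T-counts. All function names are hypothetical.

```python
from itertools import combinations, product


def spider_nest(qubits):
    """Gadget angles of N_S in units of pi/4 for a tuple of at least 4 qubits."""
    n = len(qubits)
    nest = {frozenset(qubits): -1}
    nest.update({frozenset({q}): (n - 2) * (n - 3) // 2 for q in qubits})
    nest.update({frozenset(p): -(n - 3) for p in combinations(qubits, 2)})
    nest.update({frozenset(t): 1 for t in combinations(qubits, 3)})
    return nest


def t_count(circuit):
    return sum(1 for m in circuit.values() if m % 2 == 1)


def combine(a, b, sign):
    """Fuse circuit a with circuit b (sign=+1) or with the inverse of b (sign=-1)."""
    out = dict(a)
    for S, m in b.items():
        out[S] = (out.get(S, 0) + sign * m) % 8
    return {S: m for S, m in out.items() if m != 0}


def stomp4_family(S):
    return [spider_nest(S)]


def stomp5_family(S):
    """The 63 composites indexed by non-zero bit strings p0..p5 (assumed form)."""
    parts = [spider_nest(S)] + [spider_nest(tuple(q for q in S if q != r)) for r in S]
    family = []
    for bits in product((0, 1), repeat=6):
        if any(bits):
            composite = {}
            for part, p in zip(parts, bits):
                if p:
                    composite = combine(composite, part, +1)
            family.append(composite)
    return family


def stomp(circuit, qubits, size, family):
    """Apply the tactic greedily on every subset of the given size, in order."""
    for S in combinations(qubits, size):
        best = min((combine(circuit, J, -1) for J in family(S)), key=t_count)
        if t_count(best) < t_count(circuit):
            circuit = best
    return circuit


def reduce_t_count(circuit, qubits):
    circuit = stomp(circuit, qubits, 4, stomp4_family)
    return stomp(circuit, qubits, 5, stomp5_family)


# Example: 8 of the 15 T-gadgets of the four-qubit spider nest.
qubits = (1, 2, 3, 4)
C = {frozenset(p): 7 for p in combinations(qubits, 2)}
C.update({frozenset(qubits): 7, frozenset({1}): 1})
result = reduce_t_count(C, qubits)
print(t_count(C), t_count(result), len(stomp5_family((1, 2, 3, 4, 5))))
```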
As we show in Section 4, in many cases we obtain the best known T-count for a number of circuits. Even so, our result may be considered only a proof of principle of the usefulness of spider nest identities: a more sophisticated application of them may well yield superior results to those we present below.
Analysis of a procedure to reduce T-count
We now describe the reduction procedure used in our results. Suppose that we are given a circuit C on n qubits, over the gate-set {X, CNOT, CCNOT, Z, CZ, CCZ, H, S, T, SWAP}.
(a) The reduction procedure
We perform the following transformations on C.
1. Reduce the circuit C to a Cl-D-Cl form, using the procedure described in Section 3.1. This serves to isolate a homogeneous circuit of commuting π/4-parity-phase operations, with the rest of the circuit consisting of Clifford group operations (possibly conditioned on the outcomes of measurements). This yields a circuit C′ on $N \geq n$ qubits.
2. Perform the PHAGE tactic STOMP 4 on all subsets of size 4, in some sequence, from the N qubits on which C′ acts. Call the resulting circuit C′′.
3. Perform the PHAGE tactic STOMP 5 on all subsets of size 5, in some sequence, from the N qubits on which C′′ acts. Call the resulting circuit C′′′.
4*. Perform TODD on C′′′ some constant number of times, independently; output the circuit which has the smallest T-count from among these runs. (Our results used the best outcome from 3-40 independent executions of TODD for each circuit.)
Note that N, the number of qubits of the circuit produced as output, is a function of how many Hadamard gates are either involved in C or are introduced from the decomposition of CCNOT gates. More precisely, it also depends on how many of these gates can be commuted from the "main body" of C to the initial or final Clifford stages. Thus, for a circuit consisting of M gates, a bound which is substantially better than $N \leq n + M$ will be difficult without some knowledge of the structure of C. In several cases, we find that many or all of the Hadamard gates introduced by decomposing CCNOT gates can be eliminated: so, $N \leq n + M$ is likely a loose upper bound in practical circumstances.
(b) Remarks on the TODD subroutine
The final step involves the subroutine TODD described by Heyfron and Campbell [10], for the simple reason that this subroutine is effective at reducing T-count without impacting the asymptotic run-time of our algorithm. It also allows us to demonstrate how using our techniques in conjunction with TODD in some cases yields a result which is better than those found to date using TODD alone.
Heyfron and Campbell bound the run-time of the TODD subroutine as $O(N^3 t^2 + N t^3)$ for t the initial T-count of the circuit (see Ref. [10, Eqn. 53]). The number of times TODD is invoked for a given circuit is somewhat arbitrary. As it is a randomised algorithm, it will yield different results in different invocations; and as it is difficult to determine when one has obtained a circuit with optimal (or approximately optimal) T-count, one might imagine in principle that running it a larger number of times might eventually yield a better result. As we show below, the run-time analysis of our algorithm would not be affected were we to run TODD for each circuit $O(\log M)$ times; in practice we contented ourselves with at most 40 times, and in fact only 3 times for most circuits.
(c) Run-time analysis
Our procedure runs in time polynomial in the number of gates M of the input circuit, and can be realised in a run-time which is only slightly larger than the asymptotic upper bound of the TODD subroutine. We may describe the asymptotic run-time of each of the steps of our procedure, as follows:
• Step 1 involves operations which involve simple decompositions of gates, or commutations of pairs of gates, in the circuit, and so runs in time $O(M^2)$. As a part of this run-time cost (in time $O(t \log t)$), we may create a tree structure (with t elements) storing the T-gadgets in the homogeneous circuit.
• Steps 2 and 3 involve determining whether an identity on 4- or 5-qubit subsystems of N-qubit homogeneous circuits leads to T-count reductions. As each identity has constant size, the run-time for this is governed by the number of such subsystems, times the search time for a tree of size t, or $O(N^4 \log t)$ and $O(N^5 \log t)$ respectively, where t is the initial T-count of the circuit.
• Finally, TODD runs in time $O(N^3 t^2 + N t^3)$.
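To give a sense of scale for Steps 2 and 3, the number of subsystems examined can be tabulated directly. The sketch below counts the 4- and 5-element subsets of N qubits, and multiplies the latter by the 63 identities tried by STOMP 5; the value N = 35 is used because it is the largest circuit size considered in the benchmarks reported in this article.

```python
from math import comb


def stomp_workload(N):
    """Number of subsets visited by STOMP 4 and identity trials made by STOMP 5."""
    stomp4_subsets = comb(N, 4)
    stomp5_subsets = comb(N, 5)
    stomp5_trials = 63 * stomp5_subsets   # 63 identities per 5-qubit subset
    return stomp4_subsets, stomp5_subsets, stomp5_trials


for N in (10, 20, 35):
    print(N, stomp_workload(N))
```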
Thus, our procedure runs in time $O(M^2 + N^5 \log t + N^3 t^2 + N t^3)$. Consider a family of circuits $\{C_n\}_{n \in \mathbb{N}}$, with at least one operation on each qubit (so that $M \ge \tfrac{1}{3} n$), and in which some constant fraction of the gates of $C_n$ are CCNOT gates, whose decomposition in Step 1 introduces Hadamard gates. Then we have $N = n + \alpha M$ for some $0 \le \alpha \le 2$, and $t = \beta M$ for $0 \le \beta \le 7$. Our procedure then runs in time $O(M^5 \log M)$, which is dominated by the asymptotic upper bound on the run-time of STOMP 5, and up to a log-factor is the same as the bound on the run-time of Step 4 (which applies TODD).
Table 1: Comparison of our techniques for T-count reduction against previous techniques, for a selection of benchmark circuits. For each circuit, we describe the number of qubits introduced by our algorithm, and the T-counts realised after each stage of our procedure (gadget fusion, then the STOMP PHAGE tactics, and finally the TODD subroutine of Ref. [10]). We compare the number of additional qubits required to the results of Ref. [10], and we compare our results for T-count to the best known prior results. The prior results are classified into results which use the (computationally expensive) TODD subroutine of Ref. [10], and those that do not. We indicate the algorithms which achieve these results by TPar [3], TOpt [10] (specifically either TOOL(F), TOOL(NF), or TODD), recursive Reed-Muller decoding RM$_r$ [5], or PyZX [12]. In each case, we compare the counts achievable after Steps 1 and 3 of our algorithm to the prior results without TODD, and the count achievable after Step 4 to the prior results with TODD. In a number of instances, our results match or improve upon the best previously known results. Circuits for which our techniques are the same as or better than the best previous result are in boldface; those where our results are strictly better are also marked with an asterisk. In some instances, we manage to obtain the best known result even without the use of the TODD subroutine, indicated by a (!) mark. Note that even when we do not achieve the best known result, we often exceed that result by a single T gate.
Table 1 presents a comparison of the results of our algorithm with the previous best algorithms for reducing T-count. In order to separately demonstrate the effectiveness of the fusion of phase gadgets, the PHAGE tactics, and TODD, we describe the T-count obtained by each of these stages of the algorithm. Our results do not include an account of the cost of the Clifford group operations. These are also of interest in principle, though they will likely be significantly less expensive than T gates in the error-corrected setting in which the T-count becomes a meaningful quantity to reduce. Almost all of our results were computed using a personal laptop (dual-core 2.5 GHz Intel i7-6500U with 8 GiB of RAM), with either 3 independent runs of TODD, or 10 independent runs in the case of the circuits GF($2^4$)-mult and GF($2^5$)-mult. For the circuits GF($2^k$)-mult for $6 \le k \le 8$, we instead performed 40 runs of TODD on Dalhousie's Mathstat Cluster (each run being performed on a separate core), taking about 5 hours in total between these circuits. The circuits in Table 1 on which we demonstrate our results are those which act on 35 qubits or fewer after the stage of replacing Hadamard gates with gadgets involving auxiliary qubits.

Results
The circuits which were obtained using our techniques may be found on GitHub [https://github.com/onestruggler/stomp]. As our main aim was to consider reductions in T-count, our algorithm ignores the possibility that the measurement outcomes on the auxiliary qubits could be anything but $|+\rangle$: in the event of a $|-\rangle$ outcome, additional Clifford group operations would be required, which however would not affect the T-count. We verified our circuits using feynver [1], which was extended to accommodate circuits involving post-selection of $|+\rangle$ states on qubits which are maximally entangled with a set of other qubits.
Our results show that our techniques, simple as they are, are competitive with the best known techniques for reducing T-count. In some cases, the PHAGE tactics STOMP 4 and STOMP 5 alone match or even surpass the best previously known results. In other cases, supplementing our techniques with TODD yields results which are better both than those previously known with TODD and than those obtained using only STOMP 4 and STOMP 5. Note that even when our results do not match the best known prior results, they often differ from the best known T-count only by 1.
The particular PHAGE tactics which we used to obtain these results, and the way in which we deploy them, are (apart from TODD) very simple. We expect that better results should be achievable by a more refined approach to using these concepts, within the more general framework which we have described of deploying PHAGE tactics.
Discussion
General observations
It seems to us that the ZX calculus not only lends itself to analysis in terms of parity-phase operations, but also leads directly to the idea of analysing T-count in terms of the parity-phase operations and phase gadgets. This is particularly the case when considering circuit transformations such as those of Ref. [10] which isolate a layer of diagonal operators by commuting CNOT gates past them.
Much of our analysis clearly generalises beyond the case of reduction of T-count (as a measure of the complexity of a $\mathcal{D}^n_3$ circuit), to simplifications of $\mathcal{D}^n_k$ circuits. We expect that simple generalisations of Eqn. (17) would provide the opportunity to explore more general simplification of diagonal circuits.
Towards better strategies for PHAGE tactics
Our work motivates the concept of a PHAGE tactic (simplifying a part of a circuit by selecting the best identity to apply from a family of identities), and of the importance of strategically choosing identities to apply. The latter concept is one which is absent from our actual results, but would clearly be important to develop more efficient techniques to make use of spider nest PHAGE tactics. As the problem of reducing T-count is closely related to difficult decoding or tensor-decomposition problems, it is important to find ways to divide the problem into more approachable parts: the strategy/tactic distinction is one way in which this might be done, in which the development of effective "tactics" which are useful in some circumstances may be the easier part, and the development of effective "strategies" to deploy those tactics may be the more difficult part.
We now contemplate the form that a nuanced strategy to apply spider nest PHAGE tactics could take. A possible approach would be to compute the smallest number of "usable" gadgets (phase gadgets with non-trivial contribution to the T-count) of different sizes, which are required for some PHAGE tactic to possibly be useful, and then identify subsystems which may have the appropriate number of usable gadgets. This motivates the problem of finding "dense" collections of usable T-phase gadgets. Any collection of phase $m$-gadgets which are not essentially independent of one another must have some significant overlap: this motivates measuring the density $d(q)$ of T-phase gadgets at each qubit $q$, which we define by Eqn. (21).
We also define $d_3(q)$, the 3-max density (of T-phase gadgets), which is the same sum but for $1 \le k \le 3$. It is easy to show that $d_3(q) \le \bigl(\tfrac{1}{18} + O(1/n)\bigr) \cdot n^3$; on any qubit or collection of qubits where $d(q)$ significantly exceeds this bound, there must be several T-phase $m$-gadgets for $m > 3$, and it may be helpful to apply Eqn. (17) to decompose these into gadgets on at most 3 qubits. Having ensured that the circuit does not have an obvious excess of large T-gadgets, we may then attempt to apply a PHAGE tactic to any large collections of "usable" gadgets that we can find on subsystems of different sizes. This suggests a strategy along the following lines (which may be repeated several times):
1. Compute the density of T-phase gadgets acting on each qubit (i.e., the $k \in \{1, 2, 3\}$ terms of Eqn. (21)). Determine the largest integer $N \ge 5$ such that the sum of the $N$ largest 3-densities is at least $T'_N$. (If no $N \ge 5$ satisfies this, then let $N = 4$.)
2. For each $k \in \{4, 5, \ldots, N\}$, compute the score for each qubit as the sum of the densities of those $m$-gadgets (for $1 \le m \le 3$) which are useful.
3. Again for each $k$, rank each qubit in order of descending score, and compute $r(k)$ to be the "lowest" rank such that the sum of the scores of the qubits ranked $\{1, r(k)-k+2, r(k)-k+3, \ldots, r(k)\}$ is at least half of the smallest T-count of some spider-nest identity on $k$ qubits. Then, let $M(k)$ be the sum of the scores of the qubits ranked from 1 to $r(k)$, so that $M(k)$ is proportional to the average total score of a uniformly random subset of these qubits.
4. Repeatedly sample (a polynomial number of times) from integers $k$, with probability proportional to $M(k)$; and for each sample attempt to find a subset of size $k$ from among the qubits with the highest scores $r(k)$, in which we may reduce the T-count by applying a spider nest identity. (We may attempt to find such a subset of size $k$ by breadth-first search on the hypergraph of T-gadgets.) Compute the value of this set as the T-count reduction that can be realised on this subset.
5. If any set with positive value was found, realise a T-count reduction by applying an identity to the vertex-set with the largest value.
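As an illustration of the first part of Step 1 only, the following minimal sketch computes a 3-max density for each qubit and ranks the qubits by it. The representation of gadgets as sets of qubit labels, the function names, and in particular the per-gadget weight $1/k$ are assumptions made for this example: the weight is passed in as a parameter precisely because the sketch does not rely on the exact form of Eqn. (21).

```python
from collections import defaultdict
from fractions import Fraction


def three_max_density(gadgets, weight=lambda k: Fraction(1, k)):
    """Sum a weight over the T-phase k-gadgets (1 <= k <= 3) touching each qubit.

    `gadgets` is an iterable of sets of qubit labels (the support of each gadget).
    `weight` is a hypothetical stand-in for the k-dependent term of the density.
    """
    density = defaultdict(Fraction)
    for support in gadgets:
        k = len(support)
        if 1 <= k <= 3:  # larger gadgets do not contribute to the 3-max density
            for q in support:
                density[q] += weight(k)
    return dict(density)


def rank_by_density(density):
    """Qubits in order of descending density (ties broken by qubit label)."""
    return sorted(density, key=lambda q: (-density[q], q))


# Toy example: five gadgets on four qubits; the 4-gadget is ignored.
gadgets = [{0}, {0, 1}, {0, 1, 2}, {1, 2}, {0, 1, 2, 3}]
d3 = three_max_density(gadgets)
ranking = rank_by_density(d3)
```

Exact rational arithmetic is used so that comparisons between densities are not affected by rounding; qubits touched only by gadgets on more than 3 qubits simply do not appear in the result.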
A Parity-phase operators as generators of $\mathcal{D}^n_k$
We show in this Section that, together with arbitrary global phases, the operators $D_{S,k} = \exp\bigl(-\tfrac{i\pi}{2^k} Z_S\bigr)$ for $S \subseteq [n]$ generate the group $\mathcal{D}^n_k$. Let $\mathcal{M}^n_1 = \mathcal{P}^n$, and for $k \ge 2$, let $\mathcal{M}^n_k$ consist of all products of elements of $\mathcal{D}^n_k$ with products of CNOT and X on various qubits. As $\mathcal{D}^n_k$ is preserved under conjugation by CNOT and X operations, it is easy to show that $\mathcal{M}^n_k$ forms a group for each $k$, and in particular that operators in $\mathcal{M}^n_k$ decompose as a product $U_D U_X$ for $U_D \in \mathcal{D}^n_k$ and $U_X$ a circuit of CNOT and X gates.
Proof. For $k = 2$, elements of $\mathcal{M}^n_k$ are Clifford circuits by construction. For $k > 2$, consider $U = U_X U_D \in \mathcal{M}^n_k$, where $U_X$ is a product of CNOT and X gates, and $U_D \in \mathcal{D}^n_k$. Then for any $P \in \mathcal{P}^n$, we have
Proof. Note that $D_{S,1} = -i Z_{s_1} \otimes \cdots \otimes Z_{s_m} = -i Z_S \in \mathcal{P}^n = \mathcal{C}^n_1$ for any $S = \{s_1, \ldots, s_m\} \subseteq [n]$. Also, by definition we have $D_{S,k-1} = D_{S,k}^2$ for any $k \ge 2$. By decomposing any Pauli operator $P \in \mathcal{P}^n$ into a product $P \propto X_A Z_B$ for sets $A, B \subseteq [n]$, it is easy to see that $D_{S,k} \in \mathcal{D}^n_k$: we have
in either case, $D_{S,k} P D_{S,k}^{-1} \in \mathcal{M}^n_{k-1} \subseteq \mathcal{C}^n_{k-1}$. Then $D_{S,k} \in \mathcal{C}^n_k$, and is therefore an element of $\mathcal{D}^n_k$.
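The two cases referred to here can be sketched as follows. This is a reconstruction using only the definition $D_{S,k} = \exp(-\tfrac{i\pi}{2^k} Z_S)$ and the relation $D_{S,k}^2 = D_{S,k-1}$, and is not necessarily the form in which the displayed equation was originally written; the case split on the parity of $|S \cap A|$ is our choice of presentation.

```latex
% Assumption: P \propto X_A Z_B, so that Z_S P = (-1)^{|S \cap A|} P Z_S.
\begin{align*}
|S \cap A| \text{ even:} \quad
  & Z_S P = P Z_S
  \;\Longrightarrow\; D_{S,k}\, P\, D_{S,k}^{-1} = P, \\
|S \cap A| \text{ odd:} \quad
  & Z_S P = -P Z_S
  \;\Longrightarrow\; D_{S,k}\, P = P\, D_{S,k}^{-1}
  \;\Longrightarrow\; D_{S,k}\, P\, D_{S,k}^{-1}
      = P\, D_{S,k}^{-2} = P\, D_{S,k-1}^{-1}.
\end{align*}
```

In the odd case the result is a Pauli operator multiplied by a parity-phase operator one level lower, which is the shape of operator that the conclusion about $\mathcal{M}^n_{k-1}$ requires.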
Lemma 3. For any $n, k \ge 1$, any $V \in \mathcal{D}^n_k$ is proportional to a product of operators $D_{S,k}$ for $S \subseteq [n]$.
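Proof. Consider a decomposition of $V$ into a product of operators $V = \prod_z V_z$ for $z$ ranging over $\{0,1\}^n$, where $\langle z | V_z | z \rangle = \langle z | V | z \rangle = \exp(i\theta_z)$ and where $\langle y | V_z | y \rangle = 1$ for all $y \ne z$. We may then express the operator $V_z$ as an exponential of a rank-1 projector on $n$ qubits:
$$V_z = \exp\Bigl( i\theta_z\, |z_1\rangle\langle z_1| \otimes \cdots \otimes |z_n\rangle\langle z_n| \Bigr) = \exp\Bigl( \tfrac{i\theta_z}{2^n} \bigl(\mathbb{1} + (-1)^{z_1} Z\bigr) \otimes \cdots \otimes \bigl(\mathbb{1} + (-1)^{z_n} Z\bigr) \Bigr) = \prod_{S \subseteq [n]} \exp\Bigl( \tfrac{i\theta_z}{2^n} \prod_{j \in S} (-1)^{z_j} Z_j \Bigr) = \exp\Bigl( \sum_{S \subseteq [n]} i (-1)^{z \cdot x(S)} \tfrac{\theta_z}{2^n} Z_S \Bigr). \qquad (23)$$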
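Taking the product over $z \in \{0,1\}^n$, we then have
$$V = \prod_{z \in \{0,1\}^n} V_z = \exp\Bigl( \sum_{S \subseteq [n]} i \tilde\theta_S Z_S \Bigr) = \prod_{S \subseteq [n]} \exp\bigl( i \tilde\theta_S Z_S \bigr),$$
where $\tilde\theta_S = \sum_z (-1)^{z \cdot x(S)} \theta_z / 2^n$ for the sake of brevity. For $j \in [n]$, consider the effect of conjugation of $X_j$ by $V$: we have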
is a tensor product of Z operations. By the linear independence of the operators $Z_S$, it follows that every factor $\exp(2^{k-1} i \tilde\theta_S Z_S)$ is proportional to either $\mathbb{1}$ or $Z_S$, for $j \in S$. As this result holds for all $j$, we obtain the same result for every non-empty set $S$. This implies that $2^{k-1} \tilde\theta_S \in \tfrac{\pi}{2} \mathbb{Z}$ for all $S \ne \emptyset$, or equivalently that $\tilde\theta_S = m_S \pi / 2^k$ for some $m_S \in \mathbb{Z}$. It follows that $\exp(i \tilde\theta_S Z_S) = D_{S,k}^{-m_S}$, so that $V \propto \prod_S D_{S,k}^{-m_S}$ for $S$ ranging over non-empty subsets of $[n]$.
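The decomposition in this proof can be checked numerically on a small example. The sketch below is only an illustration: the function names and the choice of CCZ on three qubits (a standard diagonal gate in the third level of the Clifford hierarchy) are ours. It computes the angles $\sum_z (-1)^{z \cdot x(S)} \theta_z / 2^n$ from the diagonal phases, checks that each is an integer multiple of $\pi/2^k$, and rebuilds the diagonal from the parity-phase factors.

```python
import cmath
import math
from itertools import product


def sign(z, x):
    """Eigenvalue (-1)^{z . x(S)} of Z_S on the basis state |z>."""
    return -1 if sum(a * b for a, b in zip(z, x)) % 2 else 1


def parity_phase_angles(theta, n):
    """Angles of the factors exp(i * angle * Z_S), keyed by the indicator vector x(S)."""
    bitstrings = list(product((0, 1), repeat=n))
    return {
        x: sum(sign(z, x) * theta.get(z, 0.0) for z in bitstrings) / 2 ** n
        for x in bitstrings
    }


def diagonal_from_angles(angles, n):
    """Diagonal entries of the product over S of exp(i * angle_S * Z_S), keyed by z."""
    return {
        z: cmath.exp(1j * sum(sign(z, x) * a for x, a in angles.items()))
        for z in product((0, 1), repeat=n)
    }


n, k = 3, 3
theta = {(1, 1, 1): math.pi}  # CCZ: phase pi on |111>, zero elsewhere
angles = parity_phase_angles(theta, n)
# Each angle divided by pi / 2^k should be an integer (the m_S of the proof).
multiples = {x: a * 2 ** k / math.pi for x, a in angles.items()}
rebuilt = diagonal_from_angles(angles, n)
```

Because the empty-set term is kept in the product, the rebuilt diagonal agrees with CCZ exactly rather than only up to a global phase; dropping that term gives the statement of the lemma, in which the product ranges over non-empty subsets.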
B Proof of gadget decomposition
Here we provide a proof of Eqn. (17). We express this as a proof by induction of the proportionality (i.e., the equality of the denotational semantics of ZX-diagrams) of a T-phase $n$-gadget for $n \ge 4$ on one side, and a collection of 3-, 2-, and 1-gadgets as in Eqn. (17) on the other.
Below we use the notations $\tau := \tfrac{\pi}{4}$ and $\iota := -\tfrac{\pi}{4}$ for the angles of phase gadgets, written in this case inside (rather than outside) of the node to which this phase is associated. We prove Eqn. (17)
In the last diagram above, we substitute every 4-gadget with the RHS of (27), and fuse together all the phase gadgets that dwell on the same lines. We assert that the resulting diagram after fusion is exactly the decomposition as presented on the RHS of (17) $= \theta_{m+1}$. Similarly, one can check that the 1-gadget on each line has phase angle $\sigma_{m+1}$, the 2-gadget on every two lines has phase angle $\theta_{m+1}$, and the 3-gadget on every three lines has phase angle $\tfrac{\pi}{4}$. Therefore, (17) holds for $n = m + 1$. This completes the proof.
