Fast and effective techniques for T-count reduction via spider nest
  identities by de Beaudrap, Niel et al.
arXiv:2004.05164v2 [quant-ph] 14 Apr 2020
Submitted to:
TQC 2020
© N. de Beaudrap, X. Bian, & Q. Wang
This work is licensed under the
Creative Commons Attribution License.
Fast and effective techniques for T-count reduction
via spider nest identities
Niel de Beaudrap
Department of Computer Science
University of Oxford
Oxford, UK
niel.debeaudrap@cs.ox.ac.uk
Xiaoning Bian
Department of Mathematics & Statistics
Dalhousie University
Halifax, Canada
bian@dal.ca
Quanlong Wang
Department of Computer Science
University of Oxford
Oxford, UK
Cambridge Quantum Computing Ltd.
Cambridge, UK
quanlong.wang@cs.ox.ac.uk
Abstract
In fault-tolerant quantum computing systems, realising (approximately) universal quantum computation is usually described in terms of realising Clifford+T operations, which is to say a circuit of CNOT, Hadamard, and π/2-phase rotations, together with T operations (π/4-phase rotations). For many error correcting codes, fault-tolerant realisations of Clifford operations are significantly less resource-intensive than those of T gates, which motivates finding ways to realise the same transformation with a T-count (the number of T gates involved) which is as low as possible. Investigations into this problem [3–6, 13, 21] have led to observations that this problem is closely related to NP-hard tensor decomposition problems [23] and is tantamount to the difficult problem of decoding exponentially long Reed-Muller codes [6]. This problem then presents itself as one for which we must be content in practice with approximate optimisation, in which one develops an array of tactics to be deployed through some pragmatic strategy. In this vein, we describe techniques to reduce the T-count, based on the effective application of “spider nest identities”: easily recognised products of parity-phase operations which are equivalent to the identity operation. We demonstrate the effectiveness of such techniques by obtaining improvements in the T-counts of a number of circuits, in run-times which are typically less than the time required to make a fresh cup of coffee.
1 Introduction
To achieve practical scalable quantum computation, it is important to find effective (both useful and efficient) techniques to reduce the resources required to perform computations. Error correction, and in particular realising operations in a fault-tolerant way, is expected to be a particularly significant source of resource overheads. In most quantum error-correcting codes, Clifford group operations involve less overhead than non-Clifford gates, such as the T (or π/4 phase-rotation) gate. As the set of Clifford+T circuits is approximately universal for quantum computation [32], this motivates the T-count (the number of T gates) as a quantity of interest in the resources required to realise a quantum computation.
On the other hand, in order to test the effectiveness of quantum technologies, it is helpful to be able to simulate the outcomes of quantum computations inasmuch as this is feasible. As circuits of Clifford operations can be efficiently simulated [1, 22], one approach is to simulate quantum circuits by extending those efficient simulation techniques [11, 12]; this again motivates the T-count as a measure of interest in the complexity of quantum circuits.
In this article, we consider the problem of reducing the T-count required to represent a unitary circuit provided as input. Following Heyfron and Campbell [23], we consider transformations of circuits which isolate a subcircuit of diagonal operations which is the only part of the algorithm with non-trivial T-count. The approach of Heyfron and Campbell [23] is to transform Clifford+T circuits to circuits with the following structure:
(i) An initial stage of CNOT gates; followed by
(ii) A stage of diagonal non-Clifford operations; followed by
(iii) A sequence of (possibly classically controlled) Clifford operations.
This allows Ref. [23] to reduce the problem of T-count reduction to an analysis of the diagonal non-Clifford portion of this circuit, in terms of phase polynomials. This builds on a sequence of results which revolve around such operations [3–6, 13, 21], presented in various but similar ways, and in particular establishes a connection between T-count optimisation and difficult coding problems and tensor decomposition problems [6, 23]. Our approach is to elaborate on that of Campbell and Heyfron as follows:
• Reduce the complexity of the diagonal non-Clifford operation by more flexible (but essentially elementary) separation of the circuit into stages by allowing the first stage to contain arbitrary Clifford gates;
• Analyse the diagonal non-Clifford portion of the circuit directly in terms of “π/4-parity-phase operations” (essentially operators of the form $\exp\bigl(\tfrac{i\pi}{8}(Z\otimes\cdots\otimes Z)\bigr)$) rather than as phase polynomials, simplifying them through the efficient application of identities of such operations.
We call these “π/4-parity-phase operations” as they induce an $e^{i\pi/4}$ relative phase on standard basis states, depending on some parity computation $f(x) = x_{k_1}\oplus x_{k_2}\oplus\cdots\oplus x_{k_m}$. As each π/4-parity-phase gate can be realised in principle using a single T or T† gate (and some CNOT gates), simplifying π/4-parity-phase circuits contributes directly to reducing the T-count.
This line of investigation, first identified in the context of T-count by Amy, Maslov, and Mosca [4], was further developed by Gosset et al. [21], Amy and Mosca [6], Kissinger and van de Wetering [26], and Zhang and Chen [34]. In previous work [7], we described a family of identities of π/4-parity-phase operations (“spider nest identities”) which, when used in combination with Heyfron and Campbell’s “TODD” subroutine [23], led to new records in T-count for several benchmark circuits.
In this work, we report new techniques for T-count reduction through the use of spider nest identities, and compare their effectiveness (the reduced T-count and run-times) against the best previous results found in the literature. While these techniques could easily be combined with other high-performance reduction subroutines such as TODD, our results do not involve any other recently developed techniques beyond those of Ref. [7]. We obtain a number of new records for the T-count, almost exclusively¹ in very practical run-times on a consumer-grade laptop. (For example, the second-largest circuit, on 768 qubits, was simplified in less than 3 minutes.) This opens the door to further improvements through the identification of further useful identities of π/4-parity-phase operations, and improved techniques for deploying these identities.
2 Preliminaries
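We first set out some basic or existing results, using the following notation. Let $[n] := \{1,2,\ldots,n\}$ and let $\mathbb 1$ be the 2×2 identity matrix. For sets $S,T\subseteq V$ we write $S\,\Delta\,T$ for the symmetric difference $(S\cup T)\setminus(S\cap T)$, and let $x(S)\in\{0,1\}^V$ denote the incidence vector of S, where $x(S)_j = 1$ if and only if $j\in S$. We let $\mathcal P_n := \{\, i^k P_1\otimes\cdots\otimes P_n \mid k\in\mathbb Z,\ P_j\in\{\mathbb 1,X,Y,Z\} \,\}$ denote the n-qubit Pauli group. We define the Clifford hierarchy (on n qubits) by defining $\mathcal C^n_1 := \mathcal P_n$, and
$$ \mathcal C^n_k = \bigl\{\, U\in\mathrm U_n(\mathbb C) \bigm| \forall P\in\mathcal P_n.\ UPU^\dagger\in\mathcal C^n_{k-1} \,\bigr\} \qquad (1) $$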
¹ The one circuit which we did not simplify on a laptop was the largest benchmark circuit that we tested, acting on 1536 qubits and involving nearly two million T gates alone. This was instead simplified on Dalhousie University’s Mathstat Cluster [14], which took less than 15 minutes to realise a 43% reduction in T-count.
for k > 1; we call $\mathcal C^n_k$ (for arbitrary n) the k-th level of the Clifford hierarchy. As an abuse of notation, we identify $\mathcal C^n_k$ as a subset of $\mathcal C^N_k$ for $n<N$; we may then write $S\in\mathcal C^n_2$ and $T\in\mathcal C^n_3$ for all $n\geqslant 1$.
Let $\mathcal D^n_k\subseteq\mathcal C^n_k$ be the subset of diagonal operations in the k-th level of the Clifford hierarchy. (We again identify $\mathcal D^n_k$ as a subset of $\mathcal D^N_k$ for $n<N$.) It is easy to show that $\mathcal D^n_k$ forms an abelian group. In particular:
consider any diagonal operation as a product of operators $\exp\bigl(i\theta_x|x\rangle\langle x|\bigr)$ for various $x\in\{0,1\}^n$, and expand each $|x\rangle\langle x|$ as a linear combination of Pauli operators. Then one may show (see e.g. Ref. [7, Appendix A]) that $\mathcal D^n_k$ is generated by the operators $\omega\cdot\mathbb 1^{\otimes n}$ for any global phase ω, together with all operations of the form $D_{S,k}$ for sets $S = \{s_1,\ldots,s_m\}\subseteq[n]$ with $m\geqslant 1$, defined by
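$$ D_{S,k} = \exp\Bigl(-\frac{i\pi}{2^k}\bigl(Z_{s_1}\otimes\cdots\otimes Z_{s_m}\bigr)\Bigr) = \exp\Bigl(-\frac{i\pi}{2^k}Z_S\Bigr) = \cos\Bigl(\frac{\pi}{2^k}\Bigr)\mathbb 1 - i\sin\Bigl(\frac{\pi}{2^k}\Bigr)Z_S, \qquad (2) $$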
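where $Z_S = \bigotimes_{j\in S}Z_j$.² Note that $X_a Z_S X_a^\dagger = (-1)^{x(S)_a}Z_S$, and that $\mathrm{CNOT}_{a,b}\,Z_S\,\mathrm{CNOT}_{a,b}^\dagger = Z_{S'}$, where here $S' = S\,\Delta\,\{a\}$ if $b\in S$ and $S' = S$ otherwise. From this it follows that
$$ X_b D_{S,k} X_b^\dagger = D_{S,k}^{-1} \in \mathcal D^n_k \qquad (3a) $$
if $b\in S$ (and $X_b D_{S,k} X_b^\dagger = D_{S,k}$ otherwise); and
$$ \mathrm{CNOT}_{a,b}\,D_{S,k}\,\mathrm{CNOT}_{a,b}^\dagger = D_{S',k} \in \mathcal D^n_k, \qquad (3b) $$
so that $\mathcal D^n_k$ is preserved under conjugation by CNOT and X operations. Also note that $D_{S,k}^2 = D_{S,k-1}$, from which it follows that $\mathcal D^n_{k-1}\subseteq\mathcal D^n_k$.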
We refer to the operators $D_{S,k+1}$, and their inverses, as “π/2^k-parity-phase” operations, as the action of $D_{S,k+1}$ on standard basis states is given by
$$ D_{S,k+1}\,|z\rangle = e^{-i\pi/2^{k+1}}\exp\bigl(i\,[x(S)\cdot z]\,\pi/2^k\bigr)\,|z\rangle, \qquad (4) $$
inducing a relative phase of $\pi/2^k$ depending on the result of a parity computation $x(S)\cdot z = z_{s_1}\oplus z_{s_2}\oplus\cdots\oplus z_{s_m}$. (This follows from Eqn. (2), since $Z_S|z\rangle = (-1)^{x(S)\cdot z}|z\rangle$.) More generally, we may refer to $\exp(\pm\tfrac12 i\theta Z_S)$ as a θ-parity-phase operation.
From Eqn. (3b), it follows that any operation $D_{S,k}$ can be reduced to an operation $D_{\{j\},k}\propto\mathrm{diag}(1,e^{2\pi i/2^k})$ acting on a single qubit j, by conjugation with an appropriate CNOT circuit. In particular, it follows that the operation $D_{S,3}$ can easily be realised with a T-count of 1. This allows us to approach the question of reducing T-count by considering decompositions of unitaries involving few π/4-parity-phase operations, acting on many qubits. Amy and Mosca [6] noted the relevance of the operators $D_{S,k}$ in this context, and both Kissinger and van de Wetering [26] and Zhang and Chen [34] make direct use of them in their analysis of T-count to achieve their results. (Litinski [27] similarly considers these operators in the context of compiling quantum circuits to lattice surgery [24].)
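As a small sanity check of this reduction, the following Python sketch (standard library only; all helper names are ours and purely illustrative) computes, for every standard basis state, the phase acquired under a CNOT ladder that copies the parity of S onto one qubit t, then a single-qubit $D_{\{t\},k}$ on t, then the same ladder again, and compares it with the phase of $D_{S,k}$ from Eqn. (2).

```python
import cmath
import math
from itertools import product


def d_phase(z, S, k):
    """Eigenvalue of D_{S,k} = exp(-i*pi/2**k * Z_S) on the basis state |z>."""
    parity = sum(z[j] for j in S) % 2
    return cmath.exp(-1j * math.pi / 2**k * (-1) ** parity)


def cnot(z, control, target):
    """Classical action of CNOT on a basis state (a tuple of bits)."""
    bits = list(z)
    bits[target] ^= bits[control]
    return tuple(bits)


def ladder_phase(z, S, k):
    """Phase on |z> from: CNOTs from every other qubit of S onto t = max(S),
    then the single-qubit D_{{t},k} on t, then the same CNOTs again."""
    t = max(S)
    controls = [s for s in S if s != t]
    w = z
    for s in controls:
        w = cnot(w, s, t)       # qubit t now holds the parity of S
    phase = d_phase(w, [t], k)  # a single one-qubit rotation
    for s in controls:
        w = cnot(w, s, t)       # undo the parity computation
    assert w == z
    return phase


def ladder_matches(n, S, k):
    return all(abs(ladder_phase(z, S, k) - d_phase(z, S, k)) < 1e-12
               for z in product((0, 1), repeat=n))


print(ladder_matches(4, [0, 2, 3], 3))  # True
```

For k = 3 the single-qubit rotation is a T gate up to a global phase, which is the sense in which each $D_{S,3}$ costs one T gate.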
An important role of $D_{S,3}$ gates for $S\subseteq[n]$ is their relationship to diagonal gates in $\mathcal D^n_3$ which are controlled-unitaries in a more straightforward sense, such as CS and CCZ:
$$ \mathrm{CS} = \exp\Bigl(\frac{i\pi}{2}\,|11\rangle\langle 11|\Bigr), \qquad \mathrm{CCZ} = \exp\bigl(i\pi\,|111\rangle\langle 111|\bigr); \qquad (5) $$
we may describe how to generate these from $D_{S,3}$ operations by decomposing the projectors $|11\rangle\langle 11|$ or $|111\rangle\langle 111|$ into tensor products of $|1\rangle\langle 1| = \tfrac12(\mathbb 1 - Z)$, and expanding to obtain a product of $D_{S,3}$ gates. Disregarding any $D_{\varnothing,3}$ factors, which realise global phases, we obtain
$$ \mathrm{CS}_{h,j} \propto D_{\{h\},3}\,D_{\{j\},3}\,D^{-1}_{\{h,j\},3}; \qquad \mathrm{CCZ}_{g,h,j} \propto D_{\{g\},3}\,D_{\{h\},3}\,D_{\{j\},3}\,D^{-1}_{\{g,h\},3}\,D^{-1}_{\{g,j\},3}\,D^{-1}_{\{h,j\},3}\,D_{\{g,h,j\},3}. \qquad (6) $$
² We define $D_{S,k}$ for all $k\in\mathbb Z$; however, as one may easily show that $D_{S,0} = -\mathbb 1^{\otimes n}$ and $D_{S,k} = \mathbb 1^{\otimes n}$ for all $k<0$ and $S\subseteq[n]$, these operations are of interest principally for $k>0$.
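More generally, for a set V of t qubits, we may relate the (t−1)-controlled $\pi/2^{k-t}$-phase gate on V to the $\pi/2^{k-1}$-parity-phase operations $D_{S,k}$ for nonempty $S\subseteq V$:
$$ \prod_{\substack{S\in\wp(V)\\ S\neq\varnothing}} D_{S,k}^{(-1)^{|S|+1}} \;\propto\; \exp\Bigl(\frac{i\pi}{2^{k-|V|}}\,|1\rangle\langle 1|^{\otimes V}\Bigr), \qquad (7) $$
where the right-hand operator applies a phase of $\pi/2^{k-|V|}$ to those components of a state in which all of the qubits in V are in the state $|1\rangle$. (This follows by expanding $|1\rangle\langle 1|^{\otimes V} = 2^{-|V|}\sum_{S\subseteq V}(-1)^{|S|}Z_S$; for k = 3 and |V| = 2 or |V| = 3, it recovers the decompositions of CS and CCZ in Eqn. (6).)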
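The same kind of brute-force check applies to the sign pattern of Eqn. (6), taking $D_{S,k}$ for odd |S| and $D_{S,k}^{-1}$ for even |S|. Expanding $|1\rangle\langle 1|^{\otimes V} = 2^{-|V|}\sum_{S\subseteq V}(-1)^{|S|}Z_S$ suggests that this product equals $\exp\bigl(\tfrac{i\pi}{2^{k-|V|}}|1\rangle\langle 1|^{\otimes V}\bigr)$ up to a global phase; the sketch below (our own, illustrative helper names) tests this on every basis state, and for k = 3 covers the CS and CCZ cases.

```python
import cmath
import math
from itertools import combinations, product


def d_phase(z, S, k):
    """Eigenvalue of D_{S,k} = exp(-i*pi/2**k * Z_S) on the basis state |z>."""
    sign = -1 if sum(z[j] for j in S) % 2 else 1
    return cmath.exp(-1j * math.pi / 2**k * sign)


def product_phase(z, V, k):
    """Phase of the product over nonempty S in V of D_{S,k} (odd |S|)
    or its inverse (even |S|), on the basis state |z>."""
    total = 1
    for m in range(1, len(V) + 1):
        for S in combinations(V, m):
            p = d_phase(z, S, k)
            total *= p if m % 2 else p.conjugate()  # conjugate = inverse phase
    return total


def matches_controlled_phase(V, k, n):
    """True if the product equals exp(i*pi/2**(k-|V|) |1..1><1..1|_V),
    up to one global phase shared by all basis states of n qubits."""
    ratio = None
    for z in product((0, 1), repeat=n):
        all_ones = all(z[v] for v in V)
        target = cmath.exp(1j * math.pi / 2 ** (k - len(V))) if all_ones else 1
        r = product_phase(z, V, k) / target
        if ratio is None:
            ratio = r  # the global phase is fixed by the first basis state
        elif abs(r - ratio) > 1e-9:
            return False
    return True


# k = 3: |V| = 2 gives CS (phase pi/2) and |V| = 3 gives CCZ (phase pi).
print(matches_controlled_phase([0, 1], 3, 2), matches_controlled_phase([0, 1, 2], 3, 3))
```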
Circuits of parity-phase operations on n qubits which realise the identity correspond, in the notation of Amy and Mosca [6], to operators $U_{P_a}$ for $a\in C_n\subseteq\mathbb Z_8^{2^n-1}$, where
$$ P_a(z) = \sum_{\substack{x\in\{0,1\}^n\\ x\neq 0}} a_x\,\bigl(x_1z_1\oplus x_2z_2\oplus\cdots\oplus x_nz_n\bigr) \qquad (8) $$
and where $U_{P_a}|z\rangle = \exp\bigl(\tfrac{i\pi}{4}P_a(z)\bigr)|z\rangle$, which is identically $|z\rangle$ for all $z\in\{0,1\}^n$ when $a\in C_n$. Let $\mathrm{supp}(a) = \{x\in\{0,1\}^n : a_x\neq 0\}$. In this notation, each element $y\in\mathrm{supp}(a)$ corresponds to a single phase-parity operator acting on the qubits j for which $y_j = 1$; the relative phase induced by this operator is $a_y\pi/4$; and the polynomial $P_a$ describes a commuting product of such operations, for which $P_a\colon\{0,1\}^n\to\mathbb Z_8$ is the all-zero function when $a\in C_n$.
We remark that a θ-parity-phase operation U (such as an operator $D_{S,k}$) can easily be represented as a tensor network, using ZX diagrams (see Appendix A for an introduction to this notation),³ with structure such as the following:
[ZX diagram: a phase gadget with phase ±θ connected to each qubit in S (or, labelled ±θ, R, a gadget applied only if classically conditioned on a parity ∑R ≡ 1 (mod 2))]   (9)
where horizontal wires represent qubits which are acted on by U, and S ⊆ [n] is the subset of those qubits which have (light, green) degree-3 nodes on them. These are “phase gadgets”, in the terminology of Kissinger and van de Wetering [26]. When the number of qubits acted on is m, we may refer to it as an “m-gadget”. (If θ is an odd multiple of π/4, we may refer to it as a “T-phase m-gadget”; for θ an integer multiple of π/2, we refer to it as a “Clifford-phase m-gadget”. If m = 1, we may also mildly abuse this terminology to refer to a simple green phase node as a “1-gadget”.)
Remark. The role played by the ZX calculus in our work is not an essential one, nor is expertise in the ZX calculus required to understand our results. However, in practice it did inform our line of investigation, by allowing us to obtain our results more quickly by identifying the objects of interest, and by making it easy to reason directly about the operators $D_{S,k}$. As the ZX calculus also provides a useful notation for visually representing the (non-local) unitary gates $D_{S,k}$ in a readable way, as in Eqn. (9), we use this notation in the article below. Readers should be able to understand our results by reading ZX diagrams simply as a straightforward alternative notation for quantum circuits (see Appendix A), the transformations of which are the subject of our work.
³ In this article, where they occur, ZX diagrams may be read essentially as circuit diagrams, and in particular are read from left to right as with other circuit diagrams.
3 Phase gadget elimination tactics & spider nest identities
Reducing the T-count while preserving the meaning of a circuit implicitly involves applying a mathematical identity. These are often identities of diagonal unitary circuits [4, 6, 34], though not always [21, 26]. In the special case of unitary circuits consisting solely of π/4-parity-phase operations, such a mathematical identity may be described in terms of a commuting product of operations which is proportional to the identity operator; and for any such identity, there is the question of how to apply it effectively to realise a significant reduction of T-count, as efficiently as possible.
In this section, we describe a broad framework for the reduction of T-count by means of the application of mathematical identities of commuting $\mathcal D^n_3$ operations. We also present some mathematical identities of this form (called “spider nest identities”), first presented in Ref. [7], and describe new techniques to use these identities to reduce T-count.
In the following, we use the terms “identity of π/4-parity-phase operations” or “identity of phase gadgets” (or simply “an identity”) to refer to a circuit $\mathcal J$ whose T-count is at least 1 but which nevertheless realises the identity operation.
3.1 PHAGE tactics
We consider a particular approach to the reduction of $\mathcal D^n_3$ circuits by an analysis of families of non-trivial circuits which realise the identity transformation, which may be applied more broadly than we do here (and which in principle can be used to describe some existing techniques [6, 23]). For any family $\mathcal F$ of identities of π/4-parity-phase operations, there is an associated “phase gadget elimination tactic” (or PHAGE tactic) to reduce the T-count in a circuit C of such phase gadgets:
PHAGE Tactic ($\mathcal F$):
1. Determine whether there is an identity $\mathcal J\in\mathcal F$ such that C contains at least half of the T-gadgets which occur in $\mathcal J$ (or their inverses).
2. For any such identity $\mathcal J$, compute a circuit $C_{\mathcal J}$ as the product of C and $\mathcal J^{-1}$. This may allow for simplifications (using the fact noted in Section 2 that $D_{S,k}^2 = D_{S,k-1}$), whereby T-gadgets accumulate to form Clifford gadgets or cancel altogether. Determine the resulting T-count.
3. Replace C with the circuit $C_{\mathcal J}$ with the smallest T-count, if this is less than the T-count of C itself.
The behaviour of a PHAGE tactic is in a sense “greedy”, in that it selects some circuit $C_{\mathcal J}$ which minimises the T-count after a single application, ignoring the possibility of a more complicated sequence of reductions. The main principle of a PHAGE tactic is that it selects a way to reduce the T-count, based on the comparison of a few different applicable identities of phase gadgets from a specific family $\mathcal F$. Such a tactic can then be applied again, or followed by other such “tactics”.
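The three steps above can be sketched in Python under our own simplifying assumptions (this is an illustration, not the authors' implementation, and the function names are hypothetical). A circuit of commuting phase gadgets is stored as a dictionary from qubit sets (frozensets) to an integer a, meaning a gadget of phase aπ/4 taken mod 8; a gadget is then a T-phase gadget exactly when a is odd, and fusing gadgets on the same set is addition mod 8.

```python
from itertools import combinations


def t_count(circuit):
    """Number of T-phase gadgets, i.e. odd multiples of pi/4."""
    return sum(1 for c in circuit.values() if c % 2)


def multiply(a, b, sign=1):
    """Product of two gadget circuits (b inverted if sign = -1), fusing
    gadgets on equal qubit sets and dropping those with phase 0."""
    out = dict(a)
    for S, c in b.items():
        out[S] = (out.get(S, 0) + sign * c) % 8
    return {S: c for S, c in out.items() if c}


def phage(circuit, family):
    """One application of the PHAGE tactic for a family of identities."""
    best = circuit
    for J in family:
        t_gadgets = [S for S, c in J.items() if c % 2]
        present = sum(1 for S in t_gadgets if circuit.get(S, 0) % 2)
        if 2 * present < len(t_gadgets):
            continue  # step 1: C must contain at least half of J's T-gadgets
        candidate = multiply(circuit, J, -1)  # step 2: C times J^{-1}
        if t_count(candidate) < t_count(best):
            best = candidate  # step 3: keep the lowest T-count found
    return best


# The 4-qubit spider nest: coefficients 1, 7, 1, 7 (mod 8) on 1-, 2-, 3-, 4-sets.
J = {frozenset(S): c for m, c in ((1, 1), (2, 7), (3, 1), (4, 7))
     for S in combinations(range(4), m)}
C = {S: c for S, c in J.items() if len(S) <= 2}  # 10 of J's 15 T-gadgets
reduced = phage(C, [J])
print(t_count(C), t_count(reduced))  # 10 5
```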
In principle, the Tpar subroutine of Ref. [6], the TOOL and TODD subroutines of Ref. [23], and the results of Zhang and Chen [34] may be interpreted as algorithms to deploy PHAGE tactics, possibly more than once in sequence, and possibly with a random choice of family $\mathcal F$ (where $\mathcal F$ itself may on occasion be a singleton set). This approach to T-count reduction can be distinguished from that of Kissinger and van de Wetering [26], in which phases may be reduced in unitary circuits (or more general tensor networks) which are not diagonal.
The difficulty in reducing the T-count arises from the fact that there is a very large number of identities of π/4-parity-phase operations, and a large number of subsets S ⊆ [n] which one may consider. As Amy and Mosca observe [6], reducing the T-count is formally equivalent to decoding a length-$(2^n-1)$ punctured Reed-Muller code, in that the smallest T-count of a circuit amounts to the distance of a received word to a valid codeword of such a code. However, no polynomial-time algorithms are known for the decoding problem on such codes. The difficulty is in formulating a successful strategy: a means of selecting an appropriately-sized family $\mathcal F$ of identities to try on a particular circuit. The question is then one of having a variety of tactics which one may efficiently explore and deploy to reduce the T-count.
3.2 Spider nest identities
We consider PHAGE tactics arising from identities of π/4-parity-phase operations (i.e., of T-phase gadgets) which can be composed from some specific circuits, introduced in Ref. [7], which realise the identity operator and which we call “spider nest identities”.
In qualitative terms, a “spider nest identity” consists of any circuit of phase-parity operations which realises an operation on n qubits which is proportional to the identity, in which only “very few” operations act on “many” qubits, and the vast majority act on “very few” qubits. (In terms of the notation of Amy and Mosca [6], they would correspond to $a\in\mathbb Z_8^{2^n-1}$ for which only very few $y\in\mathrm{supp}(a)$ have Hamming weight larger than some low threshold w > 0; in the case of $\mathcal D^n_3$ operations, we set w = 3.) We generate these circuits from a minimal family of such circuits for $n\geqslant 4$, involving a single phase n-gadget and various phase k-gadgets with $k\leqslant 3$:
[ZX diagram: the n-qubit circuit of phase gadgets described below]  $\propto \mathbb 1^{\otimes n}$.   (10)
Here, the n-qubit circuit on the left-hand side of Eqn. (10) consists of a 1-gadget with phase $(n-2)(n-3)\pi/8$ on each line, a 2-gadget on each pair of lines with phase $-(n-3)\pi/4$, a 3-gadget with phase $\pi/4$ on each set of three lines, and finally an n-gadget with phase angle $-\pi/4$. (For a proof of this identity, see Appendix B of Ref. [7]; in the case n = 4 this corresponds to R13 of Ref. [3].) The name “spider nest” here refers to the qualitative feature that it involves a few “large spiders”, together with a large number of “small spiders”.
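The identity of Eqn. (10) is easy to check by brute force for small n. The sketch below (ours, for illustration) stores $N_S$ in the notation of Eqn. (8), as a map from qubit subsets to coefficients $a\in\mathbb Z_8$ (a gadget of phase $a\pi/4$), and verifies that $P_a(z)\equiv 0 \pmod 8$ for every basis state z.

```python
from itertools import combinations, product


def spider_nest(qubits):
    """Coefficients a_S (phase a_S * pi/4, mod 8) of the gadgets in N_S, Eqn. (10)."""
    qubits = list(qubits)
    n = len(qubits)
    assert n >= 4
    a = {}
    for q in qubits:
        a[frozenset([q])] = ((n - 2) * (n - 3) // 2) % 8  # (n-2)(n-3)pi/8
    for pair in combinations(qubits, 2):
        a[frozenset(pair)] = (-(n - 3)) % 8               # -(n-3)pi/4
    for triple in combinations(qubits, 3):
        a[frozenset(triple)] = 1                          # pi/4
    a[frozenset(qubits)] = (-1) % 8                       # -pi/4 on all n qubits
    return a


def phase_polynomial(a, z):
    """P_a(z) mod 8: the sum of a_S over gadgets whose parity on z is odd."""
    return sum(c for S, c in a.items() if sum(z[j] for j in S) % 2) % 8


def is_identity(a, n):
    return all(phase_polynomial(a, z) == 0 for z in product((0, 1), repeat=n))


for n in range(4, 10):
    print(n, is_identity(spider_nest(range(n)), n))  # True for each n
```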
Let $N_S$ represent the circuit of phase gadgets on the left-hand side of Eqn. (10), acting on a set $S = \{1,2,\ldots,n\}$ of cardinality n. How easily one may use this identity as part of a PHAGE tactic to reduce T-count is affected by the T-count of the circuit $N_S$ itself. For a fixed value of n, and a phase gadget on 1 to 3 qubits, there is a question of whether such a gadget is a T-phase gadget in $N_S$, as some of the phase gadgets involved are Clifford-phase gadgets instead. In particular:
• If n ≡ 1 (mod 4) or n ≡ 3 (mod 4), all of the 2-gadgets in Eqn. (10) are Clifford-phase gadgets, which do not contribute to the T-count.
• If n ≡ 2 (mod 4) or n ≡ 3 (mod 4), all of the 1-gadgets in Eqn. (10) are Clifford-phase gadgets, which again do not contribute to the T-count.
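Let $T_n$ denote the T-count of $N_S$: then
$$ T_n = \begin{cases} \tfrac16\,n\,(n^2+6\delta_n-1)+1, & n \text{ even};\\ \tfrac16\,n\,(n^2-3n+6\delta_n+2)+1, & n \text{ odd}, \end{cases} \qquad (11) $$
where $\delta_n = 1$ if $n\equiv 0$ or $n\equiv 1$ modulo 4, and $\delta_n = 0$ if $n\equiv 2$ or $n\equiv 3$ modulo 4 (determining whether the 1-gadgets on each wire have T-count one or zero). Here $\binom n3$ counts the 3-gadgets, $\binom n2$ (for n even only) the 2-gadgets, $n\delta_n$ the 1-gadgets, and the final +1 the single n-gadget; note that $\binom n3+\binom n2 = \tfrac16 n(n^2-1)$ and $\binom n3 = \tfrac16 n(n^2-3n+2)$. In general, we have $T_n = \tfrac16 n^3 - O(n^2)\pm O(n)$.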
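To make the gadget counting concrete, the following sketch (ours, illustrative) builds the coefficients of $N_S$ as above and counts the T-phase gadgets directly, comparing against the count read off from the two bullet points: $\binom n3$ 3-gadgets, one n-gadget, $\binom n2$ 2-gadgets when n is even, and n 1-gadgets when $\delta_n = 1$.

```python
from itertools import combinations
from math import comb


def spider_nest(qubits):
    """Coefficients a_S (phase a_S * pi/4, mod 8) of the gadgets in N_S."""
    qubits = list(qubits)
    n = len(qubits)
    a = {frozenset([q]): ((n - 2) * (n - 3) // 2) % 8 for q in qubits}
    a.update({frozenset(p): (-(n - 3)) % 8 for p in combinations(qubits, 2)})
    a.update({frozenset(t): 1 for t in combinations(qubits, 3)})
    a[frozenset(qubits)] = 7  # phase -pi/4
    return a


def t_count(a):
    """Number of T-phase gadgets (odd multiples of pi/4)."""
    return sum(1 for c in a.values() if c % 2)


def delta(n):
    return 1 if n % 4 in (0, 1) else 0


def expected_t_count(n):
    return comb(n, 3) + 1 + (comb(n, 2) if n % 2 == 0 else 0) + n * delta(n)


for n in range(4, 9):
    print(n, t_count(spider_nest(range(n))), expected_t_count(n))
```

For n = 4, 5, 6, 7, 8 both columns read 15, 16, 36, 36 and 93.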
The scaling of $T_n$ above might suggest that these circuits have at best a limited role to play in T-count reduction: for increasing sizes of wire-sets S, a fairly large number of operations on a given subset S of wires must be present for substitution of $N_S$ to yield a reduction in T-count. However, by composing multiple such circuits $N_S$ for different subsets S, we may obtain a “composite” spider nest identity which has a smaller T-count, and which is thus more likely to be usable in practice for T-count reduction.
For instance, consider the specific circuit $N_S N_{S'}^{-1}$ where $|S|\geqslant 5$ and $S' = S\setminus\{r\}$ for some $r\in S$. As all of the operations in these circuits commute, it is possible to see that most of the phase 3-gadgets of $N_S$ (the dominant contribution to $T_n$ above) are cancelled by corresponding phase 3-gadgets of $N_{S'}^{-1}$. (In many cases, most of the phase 1-gadgets of $N_S$ are similarly cancelled.) By collecting together the actions of the phase gadgets on each subset, we may show that $N_S N_{S'}^{-1}$ simplifies to a circuit of the following form:
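[ZX diagram on n qubits, with r the top qubit and the remaining n−1 qubits forming S′: a 1-gadget with phase (n−2)(n−3)π/8 on r; a 1-gadget with phase (n−3)π/4 on each qubit of S′; a 2-gadget with phase −(n−3)π/4 on r and each qubit of S′; a 2-gadget with phase −π/4 on each pair of qubits in S′; a 3-gadget with phase π/4 on r and each pair of qubits in S′; an (n−1)-gadget with phase π/4 on S′; and an n-gadget with phase −π/4 on all of S]   (12)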
If r (so that $S\setminus S' = \{r\}$) is the top qubit in the circuit above, note in particular that the dominant contributions to the size of the circuit are the phase 2-gadgets on all size-2 subsets of S′, and the phase 3-gadgets which involve r and some size-2 subset of S′. If $\tilde T_n$ denotes the T-count of the circuit above, we then have
$$ \tilde T_n = \begin{cases} n^2-n+2+\delta_n, & n \text{ even};\\ n^2-3n+4+\delta_n, & n \text{ odd}, \end{cases} \qquad (13) $$
where again $\delta_n = 1$ if $n\equiv 0$ or $n\equiv 1$ modulo 4, and $\delta_n = 0$ if $n\equiv 2$ or $n\equiv 3$ modulo 4. In any case, we have $\tilde T_n = n^2 - O(n)$.
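Eqn. (13) can likewise be checked by composing the gadget coefficients directly. In the sketch below (ours, illustrative), qubit 0 plays the role of r, the composite $N_S N_{S'}^{-1}$ is formed by subtracting coefficients mod 8, and its T-count is compared with Eqn. (13).

```python
from itertools import combinations


def spider_nest(qubits):
    """Coefficients a_S (phase a_S * pi/4, mod 8) of the gadgets in N_S."""
    qubits = list(qubits)
    n = len(qubits)
    a = {frozenset([q]): ((n - 2) * (n - 3) // 2) % 8 for q in qubits}
    a.update({frozenset(p): (-(n - 3)) % 8 for p in combinations(qubits, 2)})
    a.update({frozenset(t): 1 for t in combinations(qubits, 3)})
    a[frozenset(qubits)] = 7  # phase -pi/4
    return a


def multiply(a, b, sign=1):
    """Fuse two gadget circuits (b inverted if sign = -1); drop phase-0 gadgets."""
    out = dict(a)
    for S, c in b.items():
        out[S] = (out.get(S, 0) + sign * c) % 8
    return {S: c for S, c in out.items() if c}


def t_count(a):
    return sum(1 for c in a.values() if c % 2)


def delta(n):
    return 1 if n % 4 in (0, 1) else 0


def composite_formula(n):
    """Eqn. (13)."""
    if n % 2 == 0:
        return n * n - n + 2 + delta(n)
    return n * n - 3 * n + 4 + delta(n)


def composite(n):
    S = list(range(n))  # r = 0, S' = S[1:]
    return multiply(spider_nest(S), spider_nest(S[1:]), -1)


for n in range(5, 11):
    print(n, t_count(composite(n)), composite_formula(n))
```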
3.3 Simple PHAGE tactics based on spider nest identities
Combining the two ideas above, we describe the PHAGE tactics which are used to achieve the T-count reductions seen in our results.
The first tactic is the reduction of phase-parity circuits by merging together π/4-parity-phase operations which act on the same set of qubits, which may be described as the PHAGE tactic associated to the circuits consisting of mutually inverse pairs of T-phase gadgets on all possible sets of qubits. To do this to greatest effect (and also as simply as possible), we first use a circuit transformation procedure along the lines of Heyfron and Campbell [23], with modifications to improve performance. (In the context of reasoning about T-count in terms of π/4-parity-phase operations, this technique was introduced in Ref. [7].) We describe this in more detail in the following section, which describes our T-count reduction procedure.
Our other PHAGE tactics (technically numerous, but all similar) are novel, and are best described in terms of the following two sets of spider nest identities on N-qubit circuits:
• The family $\mathcal F^{(4)}_N = \{\, N_S \mid S\subseteq[N] \text{ and } |S| = 4 \,\}$, consisting of versions of the identity of Eqn. (10) applied to all subsets of [N] of size 4;
• The family
$$ \mathcal F^{(5)}_N = \left\{ N_S^{p_0}N_{S_1}^{p_1}N_{S_2}^{p_2}N_{S_3}^{p_3}N_{S_4}^{p_4}N_{S_5}^{p_5} \;\middle|\; \begin{array}{l} S = \{q_1,q_2,q_3,q_4,q_5\} \text{ for distinct } q_j\in[N],\\ S_j = S\setminus\{q_j\} \text{ for } 1\leqslant j\leqslant 5, \text{ and}\\ p_0p_1p_2p_3p_4p_5\in\{0,1\}^6\setminus\{000000\} \end{array} \right\}, \qquad (14) $$
consisting of the 63 distinct identities for each set $S\subseteq[N]$ with |S| = 5, consisting of $N_{S_j}$ applied to some or all subsets $S_j\subseteq S$ of size 4, and possibly also a copy of $N_S$ on all the qubits of S, fusing together those phase-parity operations which act on common subsets $S'\subseteq S$.
These are the sets of all possible spider nest identities on 4 or 5 qubits.⁴
For increasing values of N, the numbers of 4- and 5-qubit subsets on which these families act grow as $\tfrac1{24}N^4+O(N^3)$ and $\tfrac1{120}N^5+O(N^4)$ respectively, so that the families are polynomial in size, but impractical to iterate through exhaustively for values of N which occur in common benchmark tests. This raises the question of how best to use them to realise T-count reductions. Our approach is to construct a list of 64 identities on four or five qubits, consisting of the elements of the sets $\mathcal F^{(4)}_4\cup\mathcal F^{(5)}_5$, and to perform the following for each element $\mathcal J$ of this list:
1. Let s be the number of qubits on which $\mathcal J$ acts.
2. Repeat the following R times, for some fixed R > 0:
a. Select a subset $S\subseteq[N]$ of size s uniformly at random.
b. Select (from $\mathcal F^{(4)}_N$ if s = 4, or $\mathcal F^{(5)}_N$ if s = 5) the identity $\mathcal K$ acting on S which is equivalent to $\mathcal J$ up to relabelling of the qubits.
c. Apply the tactic PHAGE($\{\mathcal K\}$) associated with the singleton set $\{\mathcal K\}$.
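The sampling loop above can be sketched in Python as follows. This is our own illustrative reading of steps 1 and 2, not the authors' implementation: it reuses a dictionary representation of phase gadgets (qubit set mapped to a phase in units of π/4, mod 8), assumes each template identity is written on qubits 0 to s−1, and draws a random subset together with a random labelling using `random.Random.sample`. All function names are hypothetical.

```python
import random
from itertools import combinations


def t_count(circuit):
    return sum(1 for c in circuit.values() if c % 2)


def multiply(a, b, sign=1):
    out = dict(a)
    for S, c in b.items():
        out[S] = (out.get(S, 0) + sign * c) % 8
    return {S: c for S, c in out.items() if c}


def phage_single(circuit, K):
    """PHAGE({K}): use K only if the circuit holds at least half of K's T-gadgets."""
    t_gadgets = [S for S, c in K.items() if c % 2]
    present = sum(1 for S in t_gadgets if circuit.get(S, 0) % 2)
    if 2 * present < len(t_gadgets):
        return circuit
    candidate = multiply(circuit, K, -1)
    return candidate if t_count(candidate) < t_count(circuit) else circuit


def relabel(J, targets):
    """Map template qubit q (0 <= q < s) to targets[q]."""
    return {frozenset(targets[q] for q in S): c for S, c in J.items()}


def randomized_reduction(circuit, templates, N, R, rng):
    for J in templates:
        s = 1 + max(q for S in J for q in S)  # template acts on qubits 0..s-1
        for _ in range(R):
            targets = rng.sample(range(N), s)  # random subset, random labelling
            circuit = phage_single(circuit, relabel(J, targets))
    return circuit


# Template N_[4]: coefficients 1, 7, 1, 7 (mod 8) on 1-, 2-, 3- and 4-sets.
N4 = {frozenset(S): c for m, c in ((1, 1), (2, 7), (3, 1), (4, 7))
      for S in combinations(range(4), m)}
# A 5-qubit circuit holding the 1- and 2-gadgets of N_S for S = {0, 2, 3, 4}.
A = [0, 2, 3, 4]
circuit = {frozenset(A[q] for q in S): c for S, c in N4.items() if len(S) <= 2}
result = randomized_reduction(circuit, [N4], N=5, R=200, rng=random.Random(1))
print(t_count(circuit), t_count(result))
```

With only five 4-subsets of five qubits, R = 200 draws make it overwhelmingly likely that the subset {0, 2, 3, 4} is hit, reducing the T-count from 10 to 5; for realistic N the hit rate per draw is far smaller, which is why a large R matters.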
This technique implicitly provides opportunities for each identity to be applied in proportion to the number of isomorphic copies of it which exist in $\mathcal F^{(4)}_4\cup\mathcal F^{(5)}_5$. (For instance, isomorphic copies of the simplest identity $N_{[4]}$ occur six times in this set, and the identity of Eqn. (12) occurs five times.) As the probability that any one such identity will be useful when applied to a particular set $S\subseteq[N]$ of size 4 or 5 is small, it is important to choose a sufficiently large value of R: for our results, we took R = 20 000.
We note that this particular strategy for T-count reduction is not especially strongly suggested by the framework of PHAGE tactics induced by spider nest identities. Both the concept of a PHAGE tactic, and the range of possibilities for assembling spider nest identities, are broad enough that there is potential for much more sophisticated strategies to deploy them. Despite this, as we show in Section 5, we obtain the best known T-count for a number of circuits. Our result may therefore be considered a further proof of principle of the usefulness of spider nest identities, beyond the results of Ref. [7].
4 Reduction of T-count through simplification of parity-phase circuits
In this section, we describe how we applied the concept of T-count reduction via PHAGE tactics as part of a complete procedure to transform unitary circuits provided as input.
⁴ The set $\mathcal F^{(5)}$ in particular is motivated by the reduction in T-count of the spider nest identity shown in Eqn. (12), which is represented in five different ways in $\mathcal F^{(5)}$: once for each subset $S_j$ of size 4.
Remark. Our results do not make heavy (explicit) use of the re-write rules of the ZX calculus: a reader
who is content with circuits which involve intermediate measurements, and who is comfortable with
reading a parity-phase gadget such as that of Eqn. (9) as a unitary operator, may interpret every diagram
below as a circuit diagram. (See Appendix A for a guide to reading ZX diagrams.)
We take as input a unitary circuit over the gate-set {X, CNOT, CCNOT, Z, CZ, CCZ, H, S, T, SWAP}.
For the sake of simplicity, we suppose that any multiply-controlled NOT gates with more than two controls are decomposed into CCNOT gates, for instance by computation and uncomputation on auxiliary qubits initialised to |0⟩, or by some more advanced technique.⁵
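One standard way to carry out such a computation-uncomputation decomposition (an assumption on our part; the exact gate ordering used in our benchmarks is not specified here) uses k−2 auxiliary qubits initialised to |0⟩ for k controls, and 2k−3 CCNOT gates. The sketch below builds that gate list and checks it classically, which suffices because CCNOT gates permute standard basis states.

```python
from itertools import product


def mcx_via_toffolis(controls, target, ancillas):
    """Decompose a NOT on `target` controlled by all of `controls` (k >= 2) into
    CCNOT gates: compute running ANDs into clean ancillas, flip the target,
    then uncompute. Returns (control1, control2, target) triples in time order."""
    k = len(controls)
    assert k >= 2 and len(ancillas) >= k - 2
    if k == 2:
        return [(controls[0], controls[1], target)]
    compute = [(controls[0], controls[1], ancillas[0])]
    for i in range(2, k - 1):
        compute.append((controls[i], ancillas[i - 2], ancillas[i - 1]))
    middle = [(controls[k - 1], ancillas[k - 3], target)]
    return compute + middle + compute[::-1]


def run(circuit, bits):
    """Simulate CCNOT gates on a classical bit list."""
    bits = list(bits)
    for a, b, t in circuit:
        bits[t] ^= bits[a] & bits[b]
    return bits


def check(k):
    controls, target = list(range(k)), k
    ancillas = list(range(k + 1, k + 1 + max(k - 2, 0)))
    circuit = mcx_via_toffolis(controls, target, ancillas)
    for x in product((0, 1), repeat=k + 1):
        out = run(circuit, list(x) + [0] * len(ancillas))
        expected = list(x)
        expected[target] ^= int(all(x[:k]))
        if out[:k + 1] != expected or any(out[k + 1:]):  # ancillas must be clean
            return False
    return True


print([check(k) for k in range(2, 7)], len(mcx_via_toffolis(list(range(5)), 5, [6, 7, 8])))
# [True, True, True, True, True] 7
```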
Our procedure follows and extends the approach of Heyfron and Campbell [23], of performing a transformation on circuits $C\to C_F\circ C_\varphi\circ C_I$, where $C_F$ and $C_I$ consist entirely of Clifford gates, stabiliser state preparations, and stabiliser state measurements, and where $C_\varphi$ can be realised using only CNOT and T gates. We express the circuit $C_\varphi$ entirely in terms of phase gadgets, and so we describe it as a “homogeneous” circuit. The objective of isolating such a circuit is that it provides us with the best opportunities to apply PHAGE tactics to reduce the T-count.
4.1 Circuit translation techniques
Our procedure, which we describe more explicitly in the next section, makes use of the following techniques.
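H gate gadgetisation. One of the techniques involved in isolating a $\mathcal D^N_3$ circuit is to replace Hadamard gates with a measurement-based gadget:
[Diagram: H ≡ a measurement-based gadget using an auxiliary qubit prepared in |+⟩ and classically controlled X corrections ≡ two equivalent ZX diagrams, with phases ±π/2 and nodes π,{s} conditioned on the measurement outcome s]   (15)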
In the circuit second from the left, the two qubits are subject to a SWAP operation, followed by a CZ = $\exp(i\pi|11\rangle\langle 11|)$ operation. The bottom qubit is then measured with an X observable measurement (i.e., in the $|\pm\rangle$ basis), and the top qubit is then acted on by an X operation only if the outcome is $|-\rangle$. The two diagrams on the right are ZX diagrams with additional annotations in the style of Ref. [18] (see also Appendix A). In particular, measurement is represented as a projection with a random outcome s which is heralded and may be used to control phase operations elsewhere. The first of these two ZX diagrams describes the decomposition of the controlled-Z operation, using $\mathrm{CZ}_{h,j}\propto D_{\{h,j\},2}\,D^{-1}_{\{h\},2}\,D^{-1}_{\{j\},2}$. The final ZX diagram propagates the single-qubit $D^{-1}_{\{*\},2}$ operations towards the preparation and measurement of the second qubit, so that the second qubit is prepared in the state $|{-y}\rangle\propto|0\rangle - i|1\rangle$.
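The measurement-based H gadget can be checked with a tiny state-vector calculation. The sketch below (ours, standard library only) uses the textbook form of the gadget without the SWAP, so the output simply emerges on the auxiliary qubit rather than being relabelled: prepare the auxiliary qubit in |+⟩, apply CZ, measure the input qubit in the X basis with outcome s, and apply $X^s$ to the auxiliary qubit. For either outcome the result should be $H|\psi\rangle$, scaled by $1/\sqrt2$ from the measurement projection.

```python
import math

INV_SQRT2 = 1 / math.sqrt(2)


def hadamard(psi):
    a, b = psi
    return (INV_SQRT2 * (a + b), INV_SQRT2 * (a - b))


def h_gadget(psi, s):
    """Input A = psi, auxiliary B = |+>, apply CZ(A, B), measure A in the X basis
    with outcome s (0 for |+>, 1 for |->), then apply X^s to B.
    Returns B's unnormalised state (each outcome has probability 1/2)."""
    # Amplitudes of |x_A x_B> after preparing |psi>|+> and applying CZ.
    amp = {(xa, xb): psi[xa] * INV_SQRT2 * (-1) ** (xa * xb)
           for xa in (0, 1) for xb in (0, 1)}
    # Project A onto |+> or |->, using <s|x_A> = (-1)**(s * x_A) / sqrt(2).
    b = [sum(amp[(xa, xb)] * INV_SQRT2 * (-1) ** (s * xa) for xa in (0, 1))
         for xb in (0, 1)]
    if s:
        b = [b[1], b[0]]  # classically controlled X correction on B
    return b


psi = (0.6, 0.8j)
for s in (0, 1):
    out = [math.sqrt(2) * amp for amp in h_gadget(psi, s)]  # undo the 1/sqrt(2)
    print(s, all(abs(o - t) < 1e-12 for o, t in zip(out, hadamard(psi))))
```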
Extracting H gates from the circuit. An obvious drawback of gadgetising H gates in this way is that it requires the use of auxiliary qubits. More directly important to our results is that the more wires a circuit has, the more difficult it may be to find opportunities to reduce the T-count. Therefore, we attempt to transform the circuit so as to reduce the number of H gates in the part of the circuit with non-trivial T-count. This motivates us to define a subroutine moveH (which we describe at a high level in Appendix B), which transforms a circuit C over our gate-set into a pair of circuits $(C_F, C')$, obtained by attempting to commute as many Hadamard gates of C as possible to the end of the circuit.
⁵ In our benchmarks, we use the simple computation-uncomputation approach; other techniques (see e.g. Refs. [20, 25, 29]) are advisable in serious production work for optimising T-count.
• We define $(C_F, C') = \mathrm{moveH}(C)$ in such a way that $C_F\circ C'\cong C$ realises the same unitary, $C_F$ contains only Clifford gates, $C'$ contains no CCNOT gates, and the total number of Hadamard gates in $C_F\circ C'$ is at most the number of Hadamard gates in C.
• We may use moveH twice, to attempt to extract Hadamard gates from both the end and the beginning of the circuit C. If we compute
$$ (C_F, C') = \mathrm{moveH}(C); \qquad (\tilde C_I, \tilde C_M) = \mathrm{moveH}\bigl((C')^{-1}\bigr); \qquad (C_I, C_M) = \bigl(\tilde C_I^{-1}, \tilde C_M^{-1}\bigr), \qquad (16) $$
then $C_F\circ C_M\circ C_I\cong C$, the number of Hadamard gates in $C_F\circ C_M\circ C_I$ is at most the number of Hadamard gates in C, and $C_I$ and $C_F$ only contain Clifford gates.
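The bookkeeping in Eqn. (16) can be illustrated with a deliberately simplistic, hypothetical stand-in for moveH that merely splits off the longest run of Clifford gates at the end of a gate list. Unlike the moveH described in Appendix B, it does not commute Hadamard gates through non-Clifford gates; the point is only to show how one pass on C and one pass on the inverse of what remains yield initial, main and final stages whose concatenation reproduces C. Gates are (name, qubits) pairs in time order, and the gate names are our own convention.

```python
# Gates are (name, qubits) pairs listed in time order (first gate applied first).
CLIFFORD = {"H", "S", "Sdg", "X", "Z", "CNOT", "CZ", "SWAP"}
INVERSE_NAME = {"S": "Sdg", "Sdg": "S", "T": "Tdg", "Tdg": "T"}  # others self-inverse


def inverse(circuit):
    """Reverse the gate order and invert each gate."""
    return [(INVERSE_NAME.get(name, name), qubits) for name, qubits in reversed(circuit)]


def peel_trailing_cliffords(circuit):
    """Toy stand-in for moveH: split off the longest all-Clifford suffix.
    Returns (C_F, C') with circuit == C' + C_F (C' applied first)."""
    i = len(circuit)
    while i > 0 and circuit[i - 1][0] in CLIFFORD:
        i -= 1
    return circuit[i:], circuit[:i]


def split3(circuit):
    """The two-pass scheme of Eqn. (16); returns (C_I, C_M, C_F) in time order."""
    c_f, c_prime = peel_trailing_cliffords(circuit)
    ct_i, ct_m = peel_trailing_cliffords(inverse(c_prime))
    return inverse(ct_i), inverse(ct_m), c_f


circuit = [("H", (0,)), ("CNOT", (0, 1)), ("T", (1,)), ("H", (1,)),
           ("T", (0,)), ("S", (1,)), ("H", (0,))]
c_i, c_m, c_f = split3(circuit)
print(c_i, c_m, c_f, sep="\n")
```

Here the main body keeps only the gates between the first and last T gates, while the leading H and CNOT and the trailing S and H move to the initial and final Clifford stages.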
We call $C_I$ and $C_F$ the initial and final Clifford stages of the circuit, respectively, and $C_M$ the main body of the circuit. We use this tripartite decomposition to condense the part of the circuit with non-trivial T-count into the main body, and to move Clifford gates (H gates in particular) into the initial and final Clifford stages to the extent that this is possible.
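The composition in Eqn. (16) can be sketched by treating a circuit as a time-ordered list of (gate, qubits) pairs, so that $C_F \circ C_M \circ C_I$ runs $C_I$ first. This is a minimal sketch under our own assumptions: the gate names, the list representation, and the helper `peel_trailing_clifford` (a hypothetical, much weaker stand-in for moveH that only splits off a trailing run of Clifford gates and performs no commutation) are illustrative, and are not taken from the authors' Haskell code [10].

```python
# Gates whose inverse is a different gate; every other gate used here is self-inverse.
INVERSE = {"T": "Tdg", "Tdg": "T", "S": "Sdg", "Sdg": "S"}
CLIFFORD = {"H", "X", "Z", "S", "Sdg", "CNOT", "CZ", "SWAP"}


def inverse(circuit):
    """Inverse of a circuit stored as a time-ordered list of (gate, qubits) pairs."""
    return [(INVERSE.get(gate, gate), qubits) for gate, qubits in reversed(circuit)]


def peel_trailing_clifford(circuit):
    """Hypothetical stand-in for moveH: split off the trailing run of Clifford gates.

    Returns (C_F, C') such that running C' and then C_F realises C."""
    k = len(circuit)
    while k > 0 and circuit[k - 1][0] in CLIFFORD:
        k -= 1
    return circuit[k:], circuit[:k]


def three_stages(circuit, move_h):
    """Eqn. (16): return (C_I, C_M, C_F) in time order, realising C_F o C_M o C_I."""
    c_f, c_prime = move_h(circuit)
    ci_tilde, cm_tilde = move_h(inverse(c_prime))   # Clifford part comes first
    return inverse(ci_tilde), inverse(cm_tilde), c_f


# Usage: the leading H and CNOT end up in C_I, the T gate in C_M, the rest in C_F.
circ = [("H", (0,)), ("CNOT", (0, 1)), ("T", (1,)), ("H", (1,)), ("S", (0,))]
c_i, c_m, c_f = three_stages(circ, peel_trailing_clifford)
```

Any genuine implementation of moveH can be passed in place of the stand-in, since `three_stages` only relies on the contract that `move_h` returns a Clifford suffix together with the remaining circuit.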
Phase-gadgetisation. Through appropriate substitution of H gates by gadgets as in Eqn. (15), and substitution of CCZ with π/4-parity-phase operations as in Eqn. (6), we may transform the main body of the circuit so that it only contains SWAP gates, X gates, CNOT gates, CZ gates, and various phase gadgets (including powers of the T gate). We wish to transform this into a circuit consisting only of phase gadgets, by commuting everything apart from phase gadgets either to the beginning of the main body (and then removing it to the initial Clifford stage) or to the end of the main body (and then removing it to the final Clifford stage). In particular, we commute all SWAP, measurement, and X operations to the end of the circuit; we commute all preparation operations to the beginning of the circuit; and we commute each CNOT operation either to the beginning or the end according to a simple heuristic (described in Appendix B). This may transform various $D_{S,t}$ gates as in Eqn. (3), changing the set S involved and/or negating the phase, according to the following commutation relations:
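[ZX diagrams (17)–(19). Eqn. (17): commuting an X gate (a red π node) past a phase gadget of angle θ acting on its wire negates the angle to −θ; commuting a classically controlled X gate (a red "π,{s}" node) past it leaves a gadget of angle θ together with a gadget of angle −2θ conditioned on {s}. Eqns. (18)–(19): relations for commuting CNOT gates past a phase gadget of angle θ, which preserve the angle but may change the set of qubits on which the gadget acts.]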
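The commutation rules summarised in Eqns. (17)–(19) can be checked by brute force over computational basis states. The sketch below encodes the standard rules which we take these diagrams to express (an assumption on our part): conjugating a phase gadget on a set S by an X gate on a qubit of S negates its angle up to a global phase, and conjugating it by a CNOT whose target lies in S toggles the control qubit in or out of S. Angles are stored exactly as integers in units of π/4; all function names are our own.

```python
from itertools import product


def gadget(S, k, n):
    """Diagonal of a phase gadget of angle k*pi/4 on qubit set S, over n qubits.

    Maps each basis state (tuple of bits) to its phase in units of pi/4 (mod 8);
    following Eqn. (37), odd-parity states on S get the phase, even ones do not."""
    return {x: (k * (sum(x[q] for q in S) % 2)) % 8 for x in product((0, 1), repeat=n)}


def x_gate(j):
    """Basis permutation performed by an X gate on qubit j."""
    return lambda x: x[:j] + (1 - x[j],) + x[j + 1:]


def cnot(c, t):
    """Basis permutation performed by a CNOT with control c and target t."""
    return lambda x: x[:t] + (x[t] ^ x[c],) + x[t + 1:]


def conjugate(diag, perm):
    """Diagonal of P D P for an involutive basis permutation P (such as X or CNOT)."""
    return {x: diag[perm(x)] for x in diag}


def equal_up_to_global_phase(d1, d2):
    """True if the two diagonals differ by a single constant phase."""
    return len({(d1[x] - d2[x]) % 8 for x in d1}) == 1


# Usage: a T-type gadget on qubits {0, 1} of a 3-qubit register.
g = gadget({0, 1}, 1, 3)
negated = conjugate(g, x_gate(0))      # X on a qubit of S: angle pi/4 becomes -pi/4
widened = conjugate(g, cnot(2, 0))     # CNOT with target 0 in S: S becomes {0, 1, 2}
```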
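Phase gadget fusion. A final simplifying technique is to multiply together any phase gadgets acting on the same set S of qubits:
$$\big[\text{gadget on } S \text{ with phase } \alpha\big]\,\big[\text{gadget on } S \text{ with phase } \beta\big] \;\longrightarrow\; \big[\text{gadget on } S \text{ with phase } \alpha+\beta\big]. \tag{20}$$
In some cases, this will reduce the T-count by turning two gadgets with phases $\alpha = \tfrac14 k_1\pi$ and $\beta = \tfrac14 k_2\pi$ (for $k_1$ and $k_2$ odd) into a single gadget with phase $\alpha+\beta = \tfrac14(k_1+k_2)\pi$, where $k_1+k_2$ is even. The fused phase is then a multiple of π/2, so the fused gadget is a Clifford operation and contributes nothing to the T-count.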
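A minimal sketch of Eqn. (20) on a homogeneous circuit follows. It assumes (our convention, not necessarily the authors') that each gadget is stored as a pair (set of qubits, k) with angle kπ/4, so that angles add modulo 8 and a gadget costs one T gate exactly when k is odd. Keying a dictionary by `frozenset` makes gadgets on the same subset collide, which is one way a hash table can absorb the fusion step.

```python
from collections import defaultdict


def fuse(gadgets):
    """Fuse phase gadgets acting on the same qubit set, as in Eqn. (20).

    gadgets: iterable of (qubits, k) pairs, each a gadget of angle k*pi/4.
    Returns {frozenset(qubits): k mod 8}, omitting gadgets whose angle is 0.
    Fusion is valid because all the gadgets are diagonal, hence commute."""
    table = defaultdict(int)
    for qubits, k in gadgets:
        key = frozenset(qubits)
        table[key] = (table[key] + k) % 8
    return {key: k for key, k in table.items() if k != 0}


def t_count(table):
    """Gadgets with an odd multiple of pi/4 are non-Clifford; each costs one T."""
    return sum(1 for k in table.values() if k % 2 == 1)


# Five T-type gadgets (T-count 5) before fusion:
circuit = [({0, 1}, 1), ({1, 0}, 3), ({2}, 1), ({0, 2}, 5), ({2, 0}, 3)]
fused = fuse(circuit)
# {0,1}: 1 + 3 = 4 (angle pi, Clifford);  {2}: 1;  {0,2}: 5 + 3 = 8 = 0 (dropped)
```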
4.2 Circuit translation procedure
Given a unitary circuit C over the gate-set {X, CNOT, CCNOT, Z, CZ, CCZ, H, S, T, SWAP}, we transform C as follows:
1. We first replace CCNOT operations in C with (1⊗1⊗H)CCZ (1⊗1⊗H), yielding a circuit C′.
2. Transform $C' \to C'_F \circ C'_M \circ C'_I$, with an initial Clifford stage $C'_I$, a final Clifford stage $C'_F$, and a main body $C'_M$, using the procedure moveH to reduce the number of Hadamard gates in $C'_M$ as much as possible.
3. Substitute the H gates in $C'_M$ with Hadamard gadgets as in Eqn. (15), using a fresh bit label for each measurement outcome; decompose the CCZ operations in $C'_M$ using the formula of Eqn. (6); and represent T gates (on some qubit j) by $D_{\{j\},3}$. Call the resulting circuit $C_M$.
4. We gadgetise $C_M$ by commuting all gates which are not single-qubit phase gates or phase gadgets to the beginning or the end, removing these to the initial or final Clifford stages. This will generally add some number of measurements and classically-conditioned Clifford operations to the final Clifford stage, and some qubit preparations to the initial Clifford stage. This realises a transformation of circuits $C'_F \circ C_M \circ C'_I \to C_F \circ C'_\varphi \circ C_I$.
5. As $C'_\varphi$ is now a homogeneous circuit of phase gadgets, we may commute them past one another to fuse gadgets on common subsets, yielding a circuit $C_\varphi$.
6. Apply the randomised procedure for applying PHAGE tactics based on spider nest identities described in Section 3.3.
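Steps 1 and 3 involve purely local rewrites, which can be sketched in the same (gate, qubits) list representation. Eqn. (6) is not reproduced in this section, so as an assumption we use a standard seven-gadget decomposition of CCZ into π/4-parity-phase gadgets, based on the identity $4abc = a + b + c - (a\oplus b) - (a\oplus c) - (b\oplus c) + (a\oplus b\oplus c)$ for bits a, b, c; it may differ from Eqn. (6) in presentation. The function names are hypothetical.

```python
from itertools import product


def replace_ccnot(circuit):
    """Step 1: CCNOT(a, b, c) -> H on c, CCZ(a, b, c), H on c (in time order)."""
    out = []
    for gate, qubits in circuit:
        if gate == "CCNOT":
            a, b, c = qubits
            out += [("H", (c,)), ("CCZ", (a, b, c)), ("H", (c,))]
        else:
            out.append((gate, qubits))
    return out


def ccz_gadgets(a, b, c):
    """Seven pi/4-parity-phase gadgets (qubits, k), angle k*pi/4, whose product is CCZ."""
    return [({a}, 1), ({b}, 1), ({c}, 1),
            ({a, b}, 7), ({a, c}, 7), ({b, c}, 7),   # 7*pi/4 = -pi/4
            ({a, b, c}, 1)]


def total_phase(gadgets, bits):
    """Phase (units of pi/4, mod 8) of a product of gadgets on one basis state;
    bits maps each qubit label to its value."""
    return sum(k * (sum(bits[q] for q in qs) % 2) for qs, k in gadgets) % 8


def matches_ccz(gadgets, a, b, c):
    """True if the gadgets apply phase pi (k = 4) exactly on |111> of qubits a, b, c."""
    return all(total_phase(gadgets, {a: x, b: y, c: z}) == 4 * x * y * z
               for x, y, z in product((0, 1), repeat=3))
```

Four of the seven gadgets have angle π/4 and three have angle −π/4, so each CCZ contributes seven T-type gadgets before any fusion.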
Steps 1–5 realise a transformation $C \to C_F \circ C_\varphi \circ C_I$. If the original circuit C acted on n qubits and had m Hadamard gates, then the number of Hadamard gates in $C'_M$ which are replaced in Step 3 is some $\delta n \leqslant m$. Then the circuits $C_I$, $C_\varphi$, and $C_F$ all act on $N = n + \delta n$ qubits, and $C_F$ has internal structure
$$C_F = \tilde C_F\, D_{\delta n} \cdots D_2 D_1, \tag{21}$$
where $\tilde C_F$ is some general Clifford circuit, and the circuits $D_j$ (for $1 \leqslant j \leqslant \delta n$) consist of the j-th measurement in the $\{|+\rangle, |-\rangle\}$ basis with outcome $s_j$ (denoted in ZX notation by a light green "π,{s_j}" node), followed by $D^N_k$ operations conditioned on the outcome $s_j$.
In some instances, we find a significant reduction in the T-count simply from the fusion of phase gadgets in Step 5 of this transformation. These improvements are similar to those seen in Refs. [26, 34]. However, the purpose of this circuit transformation (as in Ref. [23]) is to isolate a circuit $C_\varphi$ consisting entirely of $D^N_3$ operations for some N, to which we can apply the PHAGE tactics of Step 6.
Note that δn, the number of additional "auxiliary" qubits involved in the circuit, is bounded above by the number of Hadamard gates which are either present in C or introduced by the decomposition of CCNOT gates. More precisely, it depends on how many of these gates can be commuted from the "main body" of C to the initial or final Clifford stages. For a circuit consisting of M gates, a bound for $N = n + \delta n$ which is substantially better than $N \leqslant n + M$ will be difficult to obtain without some knowledge of the structure of C. In several cases, we find that many or all of these Hadamard gates can be eliminated from the main body of the circuit; so $N \leqslant n + M$ is likely a loose upper bound in a large number of practical examples.
The largest contributions to the asymptotic run-time of the procedure above are the complexity of moveH in Step 2; the cumulative complexity of computing the heuristic for moving Clifford gates out of the main body of the circuit in Step 4; and the complexity of fusing phase gadgets and applying PHAGE tactics in Steps 5 and 6. For M the number of gates in the input circuit, the procedure moveH and the procedure to commute CNOT gates out of the main body take time $O(M^2)$, essentially due to repeatedly commuting individual gates past $O(M)$ other gates (or computing the potential cost of doing so, in the case of the heuristic used for determining the direction in which to move CNOT gates). We use a hash table to store homogeneous circuits, allowing essentially $O(1)$ time to look up the phase associated with a phase gadget acting on a particular subset (which we set to 0 when no such phase gadget is present). In Step 5, fusing together pairs of phase gadgets can be made a part of initialising this hash table, and so takes time $O(M)$. In Step 6, applying a PHAGE tactic associated with some given identity K (which acts on at most 5 qubits) takes time $O(1)$; performing this for each of the 64 identities in $F^{(4)}_4 \cup F^{(5)}_5$ on R uniformly random subsets takes time $O(R) = O(1)$, for R independent of M. Thus our procedure runs in time $O(M^2)$.
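The constant-time claim for Step 6 can be illustrated with a dictionary keyed by qubit subsets. In the sketch below (our own illustration; the function names and the sampling loop are assumptions), we assume that applying an identity on a chosen subset only needs the gadgets supported inside that subset. Restricting the homogeneous circuit to a subset of at most 5 qubits then takes at most $2^5 - 1 = 31$ lookups, independent of M. The sketch stops short of applying any spider nest identity.

```python
import random
from itertools import combinations


def restrict(table, subset):
    """Gadgets of a homogeneous circuit whose qubit set lies inside subset.

    table maps frozenset(qubits) -> k (angle k*pi/4); absent keys mean angle 0.
    For |subset| <= 5 this costs at most 31 dictionary lookups, i.e. O(1)."""
    found = {}
    for r in range(1, len(subset) + 1):
        for qs in combinations(subset, r):
            k = table.get(frozenset(qs), 0)
            if k:
                found[frozenset(qs)] = k
    return found


def random_restrictions(table, n_qubits, size, rounds, seed=0):
    """Yield (subset, restriction) for `rounds` uniformly random `size`-qubit subsets."""
    rng = random.Random(seed)
    for _ in range(rounds):
        subset = rng.sample(range(n_qubits), size)
        yield subset, restrict(table, subset)


table = {frozenset({0, 1}): 1, frozenset({2}): 3, frozenset({0, 5}): 1}
```

With R rounds the total work is O(R) lookups, which matches the O(R) = O(1) accounting above when R does not grow with M.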
5 Results
We realised our techniques in Haskell code [10]. All but two of the circuits were obtained from Ref. [30]: the circuits “GF(2^256) Mult” and “GF(2^512) Mult” were instead obtained from Ref. [28]. With one exception, we ran our code for these benchmarks on a machine with a 2.5GHz Intel Core i7 processor and 8GB of 1867MHz LPDDR3 memory, running Ubuntu Linux 18.04.4. The largest single benchmark circuit, “GF(2^512) Mult”, was instead reduced on Dalhousie University’s Mathstat Cluster [14]. The results are presented in Table 1, including comparisons to the best known reductions for which recorded times are available.⁶
Our results do not include an account of the cost of the Clifford group operations. These are also of interest in principle, though they are likely to be significantly less expensive than T gates in the error-corrected setting in which the T-count becomes a meaningful quantity to reduce. We also do not report the T-depth of our circuits, which may be optimised independently of the T-count itself [4].
The circuits which were obtained using our techniques may be found on GitHub [10]. As our main aim was to consider reductions in T-count, our algorithm ignores the possibility that the measurement outcomes on the auxiliary qubits could be anything but |+⟩: in the event of a |−⟩ outcome, additional Clifford group operations would be required, which however would not affect the T-count. We verified our circuits using feynver [2], which was recently extended to accommodate circuits involving post-selection of |+⟩ states on qubits which are maximally entangled with a set of other qubits.
⁶ We do not present the best known T-counts which do not have recorded times. We do note that for two of our results (for the circuits Mod Red21 and RC Adder6) which improve on the known timed results, there are recorded untimed results which are still better: these may be found in Ref. [7].
| Circuit | n (# qubits) | δn [23] | δn (ours) | init. #T (input) | prev. opt. #T | Ref. | prev. opt. time (s) | our #T | our time (s) | Verified? (feynver) |
|---|---|---|---|---|---|---|---|---|---|---|
| Adder8 | 24 | 71 | 41 | 399 | 212 (a) | [23] | 227.81 | 176 * | 24.62 | ✓ |
| Barenco Tof3 | 5 | 3 | 3 | 28 | 14 (b) | [23] | 0.01* | 13 * | 0.07607 | ✓ |
| Barenco Tof4 | 7 | 7 | 7 | 56 | 24 | [23] | 0.45 | 25 | 1.884 | ✓ |
| Barenco Tof5 | 9 | 11 | 11 | 84 | 34 | [23] | 1.94 | 37 | 13.76 | ✓ |
| Barenco Tof10 | 19 | 31 | 31 | 224 | 84 | [23] | 460.33 | 97 | 24.49 | ✓ |
| CSLA MUX3 | 15 | 17 | 6 | 70 | 40 (b) | [23] | 3.73 | 44 | 18.01 | ✓ |
| CSUM MUX9 | 30 | 12 | 12 | 196 | 74 (a) | [23] | 36.57 | 84 | 23.98 | ✓ |
| GF(2^4) Mult | 12 | 7 | 0 | 112 | 56 (b) | [23] | 0.55 | 53 * | 0.8180 | ✓ |
| GF(2^5) Mult | 15 | 9 | 0 | 175 | 90 (b) | [23] | 6.96 | 88 * | 4.279 | ✓ |
| GF(2^6) Mult | 18 | 11 | 0 | 252 | 132 (b) | [23] | 121.16 | 128 * | 7.894 | ✓ |
| GF(2^7) Mult | 21 | 13 | 0 | 343 | 185 (a) | [23] | 153.75 | 167 * | 27.21 | ✓ |
| GF(2^8) Mult | 24 | 15 | 0 | 448 | 216 (a) | [23] | 517.63 | 229 | 95.63 | ✓ |
| GF(2^9) Mult | 27 | 17 | 0 | 567 | 295 | [23] | 3213.53 | 306 | 24.79 | ✓ |
| GF(2^10) Mult | 30 | 19 | 0 | 700 | 351 | [23] | 23969.1 | 357 | 24.65 | ✓ |
| GF(2^16) Mult | 48 | 31 | 0 | 1 792 | 922 | [23] | 76312.5 | 972 | 25.65 | ✓ (d) |
| GF(2^32) Mult | 96 | – | 0 | 7 168 | 4 128 | [31] | 1.834 | 3 936 * | 26.07 | ✓ (d) |
| GF(2^64) Mult | 192 | – | 0 | 28 672 | 16 448 | [31] | 58.341 | 15 865 * | 29.73 | – |
| GF(2^128) Mult | 384 | – | 0 | 114 688 | 65 664 | [31] | 1744.746 | 64 461 * | 48.78 | – |
| GF(2^131) Mult | 393 | – | 0 | 120 127 | 69 037 | [31] | 1953.353 | 67 772 * | 53.39 | – |
| GF(2^163) Mult | 489 | – | 0 | 185 983 | 106 765 | [31] | 4955.927 | 105 182 * | 66.27 | – |
| GF(2^256) Mult | 768 | – | 0 | 458 752 | – | – | – | 260 539 * | 137.1 | – |
| GF(2^512) Mult | 1536 | – | 0 | 1 835 008 | – | – | – | 1 046 964 * | 850.7 (d) | – |
| Mod 5_4 | 5 | 6 | 0 | 28 | 16 (b) | [31] | 0.001* | 7 * | 0.00899 | ✓ |
| Mod Adder1024 | 28 | 6 270 (c) | 304 | 1 995 | 978 | [23] | 665.5 | 1 010 | 27.56 | ✓ (d) |
| Mod Mult 5_5 | 9 | 10 | 3 | 49 | 28 (a) | [23] | 0.02 | 19 * | 0.5775 | ✓ |
| Mod Red21 | 11 | 17 | 17 | 119 | 69 (b) | [23] | 0.59 | 65 | 27.68 | ✓ |
| QCLA Adder10 | 36 | 28 | 25 | 238 | 157 | [23] | 366.1 | 147 * | 24.96 | ✓ |
| QCLA Com7 | 24 | 19 | 18 | 203 | 81 | [23] | 170.77 | 84 | 24.21 | ✓ |
| QCLA Mod7 | 26 | 58 | 58 | 413 | 221 (a) | [23] | 289.77 | 233 | 24.26 | ✓ (d) |
| RC Adder6 | 14 | 21 | 10 | 77 | 45 (b) | [23] | 0.97 | 38 | 30.70 | ✓ |
| NC Toff3 | 5 | 2 | 2 | 21 | 13 | [23] | 0.01* | 13 * | 0.04005 | ✓ |
| NC Toff4 | 7 | 4 | 4 | 35 | 19 | [23] | 0.06 | 19 * | 0.5322 | ✓ |
| NC Toff5 | 9 | 11 | 6 | 49 | 25 | [23] | 0.4 | 26 | 2.910 | ✓ |
| NC Toff10 | 19 | 16 | 16 | 119 | 55 | [23] | 44.78 | 60 | 28.01 | ✓ |
| VBE Adder3 | 10 | 4 | 4 | 70 | 20 | [23] | 0.15 | 20 * | 1.896 | ✓ |
Table 1: Comparison of our techniques to previously reported results. • In each case, “prev. opt.” represents
the best T -count with a time record (an asterisk indicates that the recorded time is an upper bound). For some
circuits, better reductions without times have been reported: those indicated by (a) have a better reduction reported
in Ref. [26], and those indicated by (b) have a better reduction reported in Ref. [7]. Where it was feasible to
verify the correctness of our reduction with feynver, this is indicated; in all other cases the verification was too
computationally expensive to carry out. • In each case, we also compare the number δn of additional “auxiliary”
qubits required by our decomposition, to that of Ref. [23] (where results are available); in the case of (c), we may
only infer an upper bound on the number of auxiliary qubits used by Ref. [23]. • In our results, those T -counts
which are indicated in bold are those which reproduce or surpass the T -count of the best previously known timed
result. Those with an asterisk also match or surpass the best previously known untimed result. • All results
of Ref. [23] were obtained on the University of Sheffield’s Iceberg HPC cluster. All results of Ref. [31] were
obtained on a machine with a 2.9GHz Intel Core i5 processor and 8GB of 1867MHz DDR3 memory, running
OS X El Capitan. All of our results were obtained on a machine with a 2.5GHz Intel Core i7 processor and
8GB of 1867MHz LPDDR3 memory, running Ubuntu Linux 18.04.4— except those indicated by (d), which were
obtained on Dalhousie University’s Mathstat Cluster [14].
6 Discussion
Our results show that our techniques, simple as they are, are competitive with the best known techniques for reducing T-count. We expect that better results should be achievable by a more refined approach to using these concepts, within the more general framework which we have described of deploying PHAGE tactics. It is not clear which further ideas may prove useful, however. For instance, in experiments on how to choose the subsets to which PHAGE tactics are applied, we found that it was not helpful to bias the choice towards qubits acted on by many T-phase gadgets. More work will be required to find effective ways to bias or to narrow down the ways in which spider nest identities are used to simplify homogeneous circuits.
It is remarkable that the run-times for our results in Table 1 are not more varied. Over half of our results were obtained in between 1 and 100 seconds, for circuits on which other leading techniques [23, 31] had times ranging over more than six orders of magnitude. Indeed, in our tests on larger circuits (and in line with the asymptotic analysis of Section 4.2), we found that the most computationally expensive parts of our procedure were the relatively mundane moveH and CNOT-commutation subroutines. Refining these elementary steps may provide yet further gains. Increasing the sophistication of the subroutines which apply PHAGE tactics may also yield further gains without substantial increases in run-time.
We note that gadgetising Hadamard gates as in Step 3 motivates an optimisation problem of interest. Simply put: given an n-qubit circuit with M gates over the gate-set {X, Z, S, CNOT, CZ, T, CCZ}, obtain an equivalent (unitary) circuit with the minimum number of H gates between the first and the last non-Clifford gate.⁷ Should this problem be solvable in $O(M^2\,\mathrm{polylog}(M))$ time, this may further contribute to the effectiveness of our approach to T-count reduction.
Finally, we remark that while the benchmarks which we have tested have become a commonplace standard for the evaluation of such techniques, they consist entirely of circuits realising permutation operations which would not in themselves be difficult to realise classically. (Some of these, such as the “GF(2^n) Mult” series, may be motivated on the grounds of cryptography, although this motivation may become less important if standard cryptographic practice incorporates post-quantum cryptography.) A larger range of circuits, including ones motivated by the more likely practical applications of fault-tolerant quantum computation, should be of general interest for future benchmark tests.
Acknowledgements.
N. de Beaudrap was supported in part by a Fellowship funded by a gift from Tencent Holdings (tencent.com), and by the EPSRC National Hub in Networked Quantum Information Technologies (NQIT.org). X. Bian is supported by NSERC and by AFOSR under Award No. FA9550-15-1-0331. Q. Wang is supported by Cambridge Quantum Computing Ltd. and by the AFOSR grant FA2386-18-1-4028. Our results were made possible in part by the use of the Dalhousie University Mathstat Cluster [14].
We thank Earl Campbell, Luke Heyfron, Alexander Cowtan, Aleks Kissinger, and John van de Wetering for helpful discussions. We extend a very special thanks to Matthew Amy, who wrote a small extension of feynver [2] to allow verification of procedures which post-select the |+⟩ state, for the express purpose of helping us to independently verify the correctness of reductions such as appear in this work and in Ref. [7]. X. Bian would like to thank his Ph.D. supervisor Peter Selinger for his support.
⁷ It seems plausible that this problem would remain equally difficult without CCZ gates.
References
[1] S. Aaronson & D. Gottesman (2004): Improved simulation of stabilizer circuits. Phys. Rev. A 70, p. 052328.
[2] Matthew Amy (2018): Towards Large-scale Functional Verification of Universal Quantum Circuits. In: Proceedings of QPL 2018, pp. 1–21, doi:10.4204/EPTCS.287.1. [arXiv:1901.09476]; see also [https://github.com/meamy/feynman].
[3] Matthew Amy, Jianxin Chen & Neil J. Ross (2018): A Finite Presentation of CNOT-Dihedral Operators. Electronic Proceedings in Theoretical Computer Science 266, pp. 84–97. [arXiv:1701.00140].
[4] Matthew Amy, Dmitri Maslov & Michele Mosca (2014): Polynomial-Time T-Depth Optimization of Clifford+T Circuits Via Matroid Partitioning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 33(10), pp. 1476–1489, doi:10.1109/TCAD.2014.2341953. [arXiv:1303.2042].
[5] Matthew Amy, Dmitri Maslov, Michele Mosca & Martin Roetteler (2013): A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32(6), pp. 818–830, doi:10.1109/TCAD.2013.2244643. [arXiv:1206.0758].
[6] Matthew Amy & Michele Mosca (2019): T-count optimization and Reed-Muller codes. IEEE Transactions on Information Theory 65(8), pp. 4771–4784, doi:10.1109/TIT.2019.2906374. [arXiv:1601.07363].
[7] Niel de Beaudrap, Xiaoning Bian & Quanlong Wang (2019): Techniques to reduce π/4-parity phase circuits, motivated by the ZX calculus. In: to appear in Proceedings of QPL 2019. [arXiv:1911.09039].
[8] Niel de Beaudrap, Ross Duncan, Dominic Horsman & Simon Perdrix (2019): Pauli Fusion: a computational model to realise quantum transformations from ZX terms. In: Proceedings of QPL 2019, to appear. [arXiv:1904.12817].
[9] Niel de Beaudrap & Dominic Horsman (2020): The ZX calculus is a language for surface code lattice surgery. Quantum 4, p. 218, doi:10.22331/q-2020-01-09-218. [arXiv:1704.08670].
[10] Xiaoning Bian: “STOMP-code” Github. Available at https://github.com/onestruggler/stomp-code/tree/8df4f46228c2f413e0cf5f8b6d25c20b6460fc0e.
[11] Sergey Bravyi, Dan Browne, Padraic Calpin, Earl Campbell, David Gosset & Mark Howard (2019): Simulation of quantum circuits by low-rank stabilizer decompositions. Quantum 3, p. 181, doi:10.22331/q-2019-09-02-181. [arXiv:1808.00128].
[12] Sergey Bravyi & David Gosset (2016): Improved classical simulation of quantum circuits dominated by Clifford gates. Physical Review Letters 116, p. 250501, doi:10.1103/PhysRevLett.116.250501. [arXiv:1601.07601].
[13] Earl T. Campbell & Mark Howard (2017): A unified framework for magic state distillation and multi-qubit gate-synthesis with reduced resource cost. Physical Review A 95, p. 022316, doi:10.1103/PhysRevA.95.022316. [arXiv:1606.01904].
[14] Dalhousie University Mathstat Cluster: Available at https://www.mathstat.dal.ca/cluster/doku.php.
[15] Bob Coecke & Ross Duncan (2011): Interacting quantum observables: categorical algebra and diagrammatics. New Journal of Physics 13, p. 043016, doi:10.1088/1367-2630/13/4/043016. [arXiv:0906.4725].
[16] Bob Coecke & Aleks Kissinger (2017): Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press, doi:10.1017/9781316219317.
[17] Ross Duncan, Aleks Kissinger, Simon Perdrix & John van de Wetering (2019): Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus. [arXiv:1902.03178].
[18] Ross Duncan & Simon Perdrix (2010): Rewriting Measurement-Based Quantum Computations with Generalised Flow. In Samson Abramsky, Cyril Gavoille, Claude Kirchner, Friedhelm Meyer auf der Heide & Paul G. Spirakis, editors: Automata, Languages and Programming, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 285–296.
[19] Emmanuel Jeandel, Simon Perdrix & Renaud Vilmart (2019): Completeness of the ZX-Calculus. [arXiv:1903.06035].
[20] Craig Gidney (2018): Halving the cost of quantum addition. Quantum 2, p. 74. [arXiv:1709.06648].
[21] David Gosset, Vadym Kliuchnikov, Michele Mosca & Vincent Russo (2014): An Algorithm for the T-count. Quantum Info. Comput. 14(15–16), pp. 1261–1276. [arXiv:1308.4134].
[22] D. Gottesman (1997): Stabilizer codes and quantum error correction. Ph.D. thesis.
[23] Luke E. Heyfron & Earl T. Campbell (2018): An efficient quantum compiler that reduces T count. Quantum Science and Technology 4(1), p. 015004. [arXiv:1712.01557].
[24] C. Horsman, A. G. Fowler, S. Devitt & R. Van Meter (2012): Surface code quantum computing by lattice surgery. New Journal of Physics 14(12), p. 123011. [arXiv:1111.4022].
[25] Cody Jones (2013): Low-overhead constructions for the fault-tolerant Toffoli gate. Phys. Rev. A 87, p. 022328, doi:10.1103/PhysRevA.87.022328. [arXiv:1212.5069].
[26] Aleks Kissinger & John van de Wetering (2019): Reducing T-count with the ZX-calculus. [arXiv:1903.10477].
[27] Daniel Litinski (2019): A Game of Surface Codes: Large-Scale Quantum Computing with Lattice Surgery. Quantum 3, p. 128. [arXiv:1808.02892].
[28] Dmitri Maslov: Reversible Logic Synthesis Benchmarks page. Available at http://webhome.cs.uvic.ca/~dmaslov. Accessed February 2020.
[29] Giulia Meuli, Mathias Soeken, Earl Campbell, Martin Roetteler & Giovanni De Micheli (2019): The Role of Multiplicative Complexity in Compiling Low T-count Oracle Circuits. [arXiv:1908.01609].
[30] Yunseong Nam, Neil J. Ross, Yuan Su, Andrew M. Childs & Dmitri Maslov: “optimiser” Github. Available at https://github.com/njross/optimizer.
[31] Yunseong Nam, Neil J. Ross, Yuan Su, Andrew M. Childs & Dmitri Maslov (2018): Automated optimization of large quantum circuits with continuous parameters. npj Quantum Information 4(1), p. 23, doi:10.1038/s41534-018-0072-4. [arXiv:1710.07345].
[32] Michael A. Nielsen & Isaac Chuang (2000): Quantum computation and quantum information. Cambridge University Press, Cambridge UK.
[33] Renaud Vilmart (2019): A Near-Optimal Axiomatisation of ZX-Calculus for Pure Qubit Quantum Mechanics. In: 34th Annual ACM/IEEE Symposium on Logic in Computer Science, pp. 1–10, doi:10.1109/LICS.2019.8785765. [arXiv:1812.09114].
[34] Fang Zhang & Jianxin Chen (2019): Optimizing T gates in Clifford+T circuit as π/4 rotations around Paulis. [arXiv:1903.12456].
A ZX diagram reference
The ZX calculus — first developed by Coecke and Duncan [15] (see also Refs. [16, 19, 33] for updated
treatments, and Refs. [8, 9, 17, 26] for applications to quantum technology) — is a relatively recently
developed notation for quantum operations, equipped with rules (the “calculus” part) to compute with
this notation. This article does not make explicit use of the “calculus” part of the ZX calculus: while it
does make statements about equivalences of diagrams which could be shown using the calculus, these
can and should be understood in the same way that other papers make statements of equivalences of
conventional circuit diagrams.
We use ZX notation at various points to describe quantum circuits, including circuits with classically
controlled operations and non-local unitaries such as pi/4-parity-phase operations. The ZX diagrams in
this article can be read merely as a slightly unusual (but convenient) circuit notation. In this Appendix,
we provide a reference for this notation, serving also as a glossary of sorts for various operations as they
are represented in ZX diagrams, to allow readers to understand our results as well as conventional circuit
diagrams would.
A.1 General statements
For the purposes of this article (and essentially all other practical purposes), ZX diagrams are represen-
tations of tensor networks. To represent quantum circuits, it is common to choose a direction in which to
read the diagrams from “input” to “output”. (In our paper, we draw these diagrams with input on the left
and output on the right, as with the usual circuit notation.) The ZX diagrams of our work are composed
of three different kinds of tensor nodes:
• “Green” nodes (which are lighter coloured in our article), which may have any number of indices, and as a tensor represent a sort of GHZ state over the standard basis. If a phase parameter θ is provided, the tensor also involves a relative phase of $e^{i\theta}$ between the two terms; otherwise θ = 0 is assumed (and there is no relative phase).
$$\big[\text{green node with phase } \theta \text{ and } n \text{ legs}\big] = |0\rangle^{\otimes n} + e^{i\theta}\,|1\rangle^{\otimes n} \tag{22}$$
In principle, we also permit the border case of n = 0, in which case this represents the “tensor” $|0\rangle^{\otimes 0} + e^{i\theta}|1\rangle^{\otimes 0} = (1 + \cos\theta) + i\sin\theta$, though we do not make use of such nodes in our results.
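• “Red” nodes (which are darker coloured in our article), which may have any number of indices, and are similar to green nodes except that they are expressed in terms of the $\{|+\rangle, |-\rangle\}$ basis.
$$\big[\text{red node with phase } \theta \text{ and } n \text{ legs}\big] = |+\rangle^{\otimes n} + e^{i\theta}\,|-\rangle^{\otimes n} \tag{23}$$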
• “Hadamard” boxes, which represent the usual 2×2 unitary Hadamard matrix.
$$H = |+\rangle\langle 0| + |-\rangle\langle 1| \tag{24}$$
To represent operations taking some qubits as input, we change some of the “kets” in the tensor nodes to “bras”; but as |0⟩, |1⟩, |+⟩ and |−⟩ are real vectors, this change does not affect any of the tensor coefficients. This allows us to be flexible with our diagrams, and to avoid committing to the indices of each node as being explicitly an “input” or an “output”, unless it is a free index of the whole diagram. (In particular, this allows us to draw some closed indices by vertical wires, without confusion.)
In the rest of this appendix, we describe some simple examples (and simple extensions) of this
notation, which the interested reader should find themselves able to verify by routine calculation.
A.2 Single-node diagrams
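With (light) green or (dark) red nodes of degree 1, we may easily represent states of the $\{|0\rangle, |1\rangle\}$ basis or $\{|+\rangle, |-\rangle\}$ basis, albeit supernormalised by a factor of $\sqrt2$:
$$\big[\text{red}\big] = |+\rangle^{\otimes 1} + |-\rangle^{\otimes 1} = \sqrt2\,|0\rangle;\qquad \big[\text{red } \pi\big] = |+\rangle^{\otimes 1} - |-\rangle^{\otimes 1} = \sqrt2\,|1\rangle; \tag{25}$$
$$\big[\text{green}\big] = |0\rangle^{\otimes 1} + |1\rangle^{\otimes 1} = \sqrt2\,|+\rangle;\qquad \big[\text{green } \pi\big] = |0\rangle^{\otimes 1} - |1\rangle^{\otimes 1} = \sqrt2\,|-\rangle. \tag{26}$$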
More generally, green degree-1 nodes may be used to represent newly prepared qubits in the XY plane of the Bloch sphere, and red degree-1 nodes may be used to represent newly prepared qubits in the YZ plane of the Bloch sphere, up to the same supernormalisation of $\sqrt2$. This additional factor of $\sqrt2$ does not affect our results: it may be accounted for any time we represent the preparation of a qubit in one of these states.
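The node tensors of Eqns. (22)–(23), and their degree-1 cases in Eqns. (25)–(26), can be checked numerically. The sketch below is our own illustration, not part of the authors' tooling: it stores a node with n legs as a dictionary from n-bit tuples to amplitudes in the standard basis, and writes red nodes in that basis using $\langle x|+\rangle = 1/\sqrt2$ and $\langle x|-\rangle = (-1)^x/\sqrt2$.

```python
import cmath
import math
from itertools import product


def green(n, theta=0.0):
    """Green node with n legs, Eqn. (22): |0...0> + e^{i theta}|1...1>."""
    t = {x: 0j for x in product((0, 1), repeat=n)}
    t[(0,) * n] += 1
    t[(1,) * n] += cmath.exp(1j * theta)   # for n = 0 both keys are (), giving 1 + e^{i theta}
    return t


def red(n, theta=0.0):
    """Red node with n legs, Eqn. (23): |+...+> + e^{i theta}|-...->, in the standard basis."""
    return {x: (1 + cmath.exp(1j * theta) * (-1) ** sum(x)) / math.sqrt(2) ** n
            for x in product((0, 1), repeat=n)}


# Eqn. (25): red(1) has amplitudes (sqrt 2, 0), i.e. sqrt(2)|0>.
# Eqn. (26): green(1) has amplitudes (1, 1), i.e. sqrt(2)|+>.
```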
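We may also represent single-qubit measurements by degree-1 nodes oriented in the opposite direction. As re-orienting edges from the right of a node to the left corresponds to turning |0⟩ to ⟨0|, turning |1⟩ to ⟨1|, and so forth, we then have
$$\big[\text{red}\big] = \sqrt2\,\langle 0|;\qquad \big[\text{red } \pi\big] = \sqrt2\,\langle 1|; \tag{27}$$
$$\big[\text{green}\big] = \sqrt2\,\langle +|;\qquad \big[\text{green } \pi\big] = \sqrt2\,\langle -|. \tag{28}$$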
Again, the additional factor of $\sqrt2$ may be accounted for any time we represent a projection of a qubit onto one of these states. To represent a measurement which may yield either |0⟩ or |1⟩, or either |+⟩ or |−⟩, we may introduce a variable s ∈ {0,1} representing whether a relative phase of π is absent in the result (s = 0, for the states |0⟩ or |+⟩) or present in the result (s = 1, for the states |1⟩ or |−⟩). We then represent measurement in the {|0⟩, |1⟩} basis and the {|+⟩, |−⟩} basis respectively as
$$\big[\text{red } \pi,\{s\}\big] = \langle +| + e^{is\pi}\langle -| \in \{\sqrt2\,\langle 0|,\ \sqrt2\,\langle 1|\}; \tag{29}$$
$$\big[\text{green } \pi,\{s\}\big] = \langle 0| + e^{is\pi}\langle 1| \in \{\sqrt2\,\langle +|,\ \sqrt2\,\langle -|\}. \tag{30}$$
The bit s is in effect a random variable representing the measurement outcome.
In other ZX diagrams (including on nodes of degree 2 or higher), we may use a set S = {s_1, s_2, ...} in place of the set {s}. This indicates a node in which the presence or absence of the phase of π depends on the parity $s_1 \oplus s_2 \oplus \cdots$ of the entire set S, rather than on the single bit s. For example, we may represent Z rotations and X rotations each by a single node of degree 2:
$$\big[\text{green } \theta\big] = |0\rangle\langle 0| + e^{i\theta}|1\rangle\langle 1| = R_z(\theta),\qquad \big[\text{red } \theta\big] = |+\rangle\langle +| + e^{i\theta}|-\rangle\langle -| = R_x(\theta); \tag{31}$$
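Then, the following diagrams represent the same operations, conditioned on the parity $s = \bigoplus_j s_j$ of a set of bits S = {s_1, s_2, ...}:
$$\big[\text{green } \theta, S\big] = R_z(s\theta) = R_z(\theta)^s,\qquad \big[\text{red } \theta, S\big] = R_x(s\theta) = R_x(\theta)^s. \tag{32}$$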
This feature of the ZX calculus does not play a prominent role in our work, but it is present in our treatment of the Hadamard gadget (Eqn. (15) on page 9), and is in principle useful for representing the circuits which we would obtain by expressing classically controlled Clifford operations in the ZX calculus.
A.3 Two-node diagrams
Diagrams of more than one node can be easily constructed simply by composing nodes on their edges.
In many cases, this has the same meaning as in conventional quantum circuit diagrams (with the same
“feature” that the algebra is read right-to-left, even though the diagram is read left-to-right): for example,
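$$\big[\text{green } \theta\text{, then } H\big] = H\,R_z(\theta) = R_x(\theta)\,H = \big[H\text{, then red } \theta\big] \tag{33}$$
$$\big[\text{green } \theta\text{, then red } \pi\big] = X R_z(\theta) = e^{i\theta}\,R_z(-\theta)\,X = e^{i\theta}\,\big[\text{red } \pi\text{, then green } {-\theta}\big] \tag{34}$$
where the factor $e^{i\theta}$ in Eqn. (34) is a global phase.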
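Eqns. (33) and (34) can be checked with explicit 2×2 matrices. In this minimal sketch (plain Python; the helper names are our own), matrices are nested lists with $R_z$ and $R_x$ as defined in Eqn. (31). The check confirms Eqn. (33) exactly and Eqn. (34) up to the global phase $e^{i\theta}$.

```python
import cmath
import math


def mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def scale(c, a):
    """Scalar multiple of a 2x2 matrix."""
    return [[c * a[i][j] for j in range(2)] for i in range(2)]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def rz(theta):
    """R_z(theta) = |0><0| + e^{i theta}|1><1|."""
    return [[1, 0], [0, cmath.exp(1j * theta)]]


def rx(theta):
    """R_x(theta) = |+><+| + e^{i theta}|-><-|."""
    e = cmath.exp(1j * theta)
    return [[(1 + e) / 2, (1 - e) / 2], [(1 - e) / 2, (1 + e) / 2]]


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
X = [[0, 1], [1, 0]]
theta = 0.7
```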
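As with circuit diagrams, we may also represent the tensor product of operations by representing operations happening on different wires in parallel; for example:
$$\big[\text{green } \theta \text{ on the upper wire, red } \varphi \text{ on the lower wire}\big] = R_z(\theta) \otimes R_x(\varphi). \tag{35}$$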
Not all “compositions” of nodes take these forms, however: in general we may compose any two nodes
simply by connecting their edges (corresponding to contracting the shared indices of the tensor nodes).
An especially important case in point is the way that CNOT operators are represented as ZX terms. As
with single-qubit states, the usual representation of CNOT by ZX diagrams is not precisely normalised:
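$$\begin{aligned} \big[\text{CNOT diagram}\big] &= |0\rangle\langle 0|\otimes\langle 0|+\rangle\,|+\rangle\langle +| + |0\rangle\langle 0|\otimes\langle 0|-\rangle\,|-\rangle\langle -| \\ &\quad + |1\rangle\langle 1|\otimes\langle 1|+\rangle\,|+\rangle\langle +| + |1\rangle\langle 1|\otimes\langle 1|-\rangle\,|-\rangle\langle -| \\ &= \tfrac{1}{\sqrt2}\,|0\rangle\langle 0|\otimes\mathbb{1} + \tfrac{1}{\sqrt2}\,|1\rangle\langle 1|\otimes X = \tfrac{1}{\sqrt2}\,\mathrm{CNOT}. \end{aligned} \tag{36}$$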
(Again, the subnormalisation of this diagram does not affect our analysis, and can in principle be ac-
counted for in the ZX representation of any circuit involving CNOT gates.) Note that the shared wire
between the red and green dot does not have a specific interpretation as an “input” or an “output” of
either — nor is this necessary to provide the interpretation of the diagram as an operator.
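The contraction behind Eqn. (36) can be carried out explicitly by summing over the shared wire. In this sketch (ours; the dictionary encoding and helper names are assumptions), a phase-free green node with legs (control in, control out, shared) is contracted with a phase-free red node with legs (shared, target in, target out), and the result matches $\tfrac{1}{\sqrt2}\mathrm{CNOT}$ entry by entry.

```python
import math
from itertools import product


def green3():
    """Phase-free green node with 3 legs: 1 when all three bits agree, else 0."""
    return {x: 1.0 if x in ((0, 0, 0), (1, 1, 1)) else 0.0
            for x in product((0, 1), repeat=3)}


def red3():
    """Phase-free red node with 3 legs, |+++> + |--->, in the standard basis."""
    return {x: (1 + (-1) ** sum(x)) / math.sqrt(2) ** 3 for x in product((0, 1), repeat=3)}


def cnot_diagram():
    """Contract green (c_in, c_out, m) with red (m, t_in, t_out) over the shared wire m.

    Returns {((c_out, t_out), (c_in, t_in)): amplitude}."""
    g, r = green3(), red3()
    return {((c_out, t_out), (c_in, t_in)):
            sum(g[(c_in, c_out, m)] * r[(m, t_in, t_out)] for m in (0, 1))
            for c_in, t_in, c_out, t_out in product((0, 1), repeat=4)}
```

Because the green node forces the shared wire to equal the control bit, the red node reduces to a parity check $t_{\mathrm{out}} = t_{\mathrm{in}} \oplus c$ with weight $1/\sqrt2$.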
A.4 Multi-node diagrams
Composing the diagrams above, in series or in parallel (and with appropriate accounting for normalisa-
tion), suffices to represent an arbitrary unitary operation by the (slightly redundant) gate set consisting of
arbitrary X and Z rotations, Hadamard gates, and CNOT operations. We may also more directly repre-
sent somewhat more “exotic” operators using ZX diagrams, and pi/4-parity-phase operations are in this
case the most relevant example: for instance,
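$$\big[\text{phase gadget with phase } \theta \text{ on wires } 1, 3, 5\big] = \big(\langle 0|_d + e^{i\theta}\langle 1|_d\big)\big(|+\rangle_d\langle{+}{+}{+}|_{abc} + |-\rangle_d\langle{-}{-}{-}|_{abc}\big) \times \big(|0\rangle\langle 0|_1\otimes|0\rangle_a + |1\rangle\langle 1|_1\otimes|1\rangle_a\big)\big(|0\rangle\langle 0|_3\otimes|0\rangle_b + |1\rangle\langle 1|_3\otimes|1\rangle_b\big) \times \big(|0\rangle\langle 0|_5\otimes|0\rangle_c + |1\rangle\langle 1|_5\otimes|1\rangle_c\big) \tag{37a}$$
$$\begin{aligned} &= \frac{1}{2\sqrt2}\sum_{a,b,c\in\{0,1\}} \big(\langle 0|_d + e^{i\theta}\langle 1|_d\big)\big(|+\rangle_d + (-1)^{a+b+c}|-\rangle_d\big)\otimes|a,b,c\rangle\langle a,b,c|_{1,3,5} \\ &= \frac14\sum_{a,b,c\in\{0,1\}}\big(1 + e^{i\theta} + (-1)^{a+b+c} - (-1)^{a+b+c}e^{i\theta}\big)\,|a,b,c\rangle\langle a,b,c|_{1,3,5} \\ &= \frac12\sum_{\substack{a,b,c\in\{0,1\}\\ a\oplus b\oplus c=0}}|a,b,c\rangle\langle a,b,c|_{1,3,5} + \frac12\sum_{\substack{a,b,c\in\{0,1\}\\ a\oplus b\oplus c=1}} e^{i\theta}\,|a,b,c\rangle\langle a,b,c|_{1,3,5} \\ &= \tfrac12\, e^{i\theta/2}\exp\!\big(-\tfrac12 i\theta\,(Z\otimes\mathbb 1\otimes Z\otimes\mathbb 1\otimes Z)\big). \end{aligned} \tag{37b}$$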
Again, the subnormalisation by a factor of ½ does not affect our analysis, which is in principle about products of $D_{S,3}$ operators (merely denoted in our work by these phase gadgets, for convenience) which are equal to the identity up to a global phase.
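The chain of equalities in Eqn. (37) can be checked entry by entry. This sketch is our own and evaluates only diagonal entries, since the green copy nodes make the operator diagonal on wires 1, 3, 5. It compares the direct contraction of Eqn. (37a) with the rotation form of Eqn. (37b), using the sign convention $\exp(-\tfrac12 i\theta\,Z\otimes\mathbb1\otimes Z\otimes\mathbb1\otimes Z)$, under which even-parity states pick up no phase.

```python
import cmath
from itertools import product


def gadget_entry(a, b, c, theta):
    """Diagonal entry <a,b,c| G |a,b,c> of the phase gadget, contracted as in Eqn. (37a).

    The copy nodes fix wires a, b, c to the input bits; the red node then gives
    <d|+><+++|abc> + <d|-><---|abc> = (1 + (-1)^(d+a+b+c)) / 4, and the green
    theta node contributes 1 for d = 0 and e^{i theta} for d = 1."""
    total = 0j
    for d in (0, 1):
        red = (1 + (-1) ** (d + a + b + c)) / 4
        green = 1 if d == 0 else cmath.exp(1j * theta)
        total += green * red
    return total


def rotation_entry(a, b, c, theta):
    """The same entry from Eqn. (37b): (1/2) e^{i theta/2} exp(-(1/2) i theta (-1)^(a+b+c))."""
    return 0.5 * cmath.exp(1j * theta / 2) * cmath.exp(-0.5j * theta * (-1) ** (a + b + c))


theta = 0.9
```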
The existence of rules for transforming ZX diagrams allows us to reason (i.e., to compute) effectively
about these diagrams without the need to expand their meaning algebraically as we have been doing in
this Appendix. This has particularly motivated our use of the ZX calculus in our work, as a convenient
notational tool and also as a means by which we performed our analysis.
For more information about the ZX calculus, and in particular for resources to learn about these
diagrammatic computational methods, the interested reader is invited to visit [zxcalculus.com].
B Details of the moveH subroutine and CNOT-commutation heuristic
In this Appendix, we describe at a high level our procedures for H gate extraction and CNOT gate extraction (used in Steps 2 and 4 of the procedure described in Section 4.2). For more details, the interested reader may view our source code on GitHub [https://github.com/onestruggler/fast-stomp].
B.1 The moveH subroutine
Our procedure for extracting H gates from a circuit is built on a subroutine moveH, which attempts to move each H gate as far to the right (towards the end of the circuit) as possible.
Representing the circuit as a list of gates in a particular order (without parallelisation), this procedure looks for the first H gate and attempts to move it to the right. In doing so, it makes use of several simple commutation relations and opportunities for cancellation, for example:
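$$
H\,H \;\to\; (\text{no gates}); \qquad H\,X \;\to\; Z\,H; \qquad H\,Z \;\to\; X\,H;
\tag{38}
$$
[three further rewrite rules, drawn as circuit diagrams in the original, each with a single H gate on either side]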
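The three single-wire rules in (38) are exact operator identities when each side is read as a circuit from left to right (so the later gate multiplies on the left). A minimal check with plain 2×2 matrices, using hypothetical helper names `circuit` and `close`:

```python
import math


def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def circuit(*gates):
    """Operator of a single-wire circuit read left to right (first gate applied first)."""
    U = [[1, 0], [0, 1]]
    for G in gates:
        U = matmul(G, U)  # a later gate multiplies on the left
    return U


def close(A, B, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
I = [[1, 0], [0, 1]]

print(close(circuit(H, H), I))               # H H -> (no gates): True
print(close(circuit(H, X), circuit(Z, H)))   # H X -> Z H: True
print(close(circuit(H, Z), circuit(X, H)))   # H Z -> X H: True
```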
If the procedure moves the H gate to a point where it precedes a second H gate, it proceeds recursively to attempt to move the second H gate before continuing with the first. When the procedure has finished attempting (successfully or otherwise) to move the second H gate, it returns to the task of moving the first, moving it past the other H gate if the attempt to move that gate failed. This process continues until the procedure has stopped trying to move what was originally the first H gate.
In attempting to move H gates, moveH may encounter situations in which no progress is possible without first moving or cancelling other kinds of gates. For instance, in a circuit consisting only of an H gate followed by four T gates on a single wire, it is possible to move the H gate to the end, but only after "pushing" the T gate which follows it to the right, accumulating the other phase gates to form a Z gate.
In general, if moveH encounters a gate G for which no commutation rule is provided, it instead attempts to push G forward, to commute with, accumulate with, or cancel against gates further to its right. In doing so, moveH may encounter yet another gate F past which G has no commutation rule, in which case moveH will attempt to move F further to the right, and so on.
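To make the "pushing" step concrete, here is a toy single-wire version of this behaviour. It is a minimal sketch under assumptions of our own, not the moveH code in the fast-stomp repository: gates are the strings "H" and "X" or integers k denoting a phase rotation by kπ/4 (so T = 1, S = 2, Z = 4), only the first three rules of (38) are used, and multi-qubit gates and the recursive handling of a second H gate are omitted. The function names `push_phase` and `move_h` are hypothetical.

```python
def push_phase(gates, i):
    """Merge the phase gate at index i with the run of phase gates following it.

    Phases are integers k meaning a rotation by k*pi/4 (T=1, S=2, Z=4); a total
    that is a multiple of 8 is the identity and is dropped."""
    total = gates[i]
    j = i + 1
    while j < len(gates) and isinstance(gates[j], int):
        total += gates[j]
        j += 1
    merged = [] if total % 8 == 0 else [total % 8]
    return gates[:i] + merged + gates[j:]


def move_h(gates):
    """Move the first H gate on a single wire as far right as these rules allow.

    Assumes the list contains at least one "H"."""
    gates = list(gates)
    i = gates.index("H")
    while i + 1 < len(gates):
        nxt = gates[i + 1]
        if nxt == "H":                      # H H -> (no gates)
            return gates[:i] + gates[i + 2:]
        if nxt == "X":                      # H X -> Z H
            gates[i:i + 2] = [4, "H"]
        elif nxt == 4:                      # H Z -> X H
            gates[i:i + 2] = ["X", "H"]
        elif isinstance(nxt, int):          # no rule: push the phase forward instead
            merged = push_phase(gates, i + 1)
            if merged == gates:             # nothing to accumulate with: give up
                break
            gates = merged
            continue
        else:
            break
        i += 1
    return gates


print(move_h(["H", 1, 1, 1, 1]))  # H T T T T -> H Z -> X H
```

On the example from the text, the four T gates accumulate into a Z gate, after which the rule H Z → X H moves the H gate to the end, giving `['X', 'H']`.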
In some cases, there are fruitful opportunities for multi-gate substitutions which either reduce the number of H gates or allow an H gate to be moved further to the right. For instance:
• If in moving an H gate to the right we encounter an S gate followed by an H gate, moveH first tries to move the second H gate. If this fails, we may apply the transformation
$$
H\,S\,H \;\to\; S\,Z\,H\,S\,Z,
\tag{39}
$$
which holds up to a global phase. This reduces the number of H gates by 1. We then move the S and Z gates further to the right, and continue by moving the new H gate to the right.
• If in moving an H gate to the right we encounter the control qubit of a CNOT gate, followed by an H gate on either the control or the target, we again first try to move the second H gate. If this fails, we may apply one of the transformations
[two circuit identities (40), drawn as diagrams in the original, each rewriting a CNOT gate together with two H gates]
This does not directly reduce the number of H gates, but it may make it possible to move the later H gate and the CNOT gate to the right before continuing, giving at least one of the two H gates another opportunity to be moved further to the right.
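The substitution (39) can be checked directly. Reading both sides as circuits from left to right, the left-hand side is the operator HSH and the right-hand side is ZSHZS = S†HS†; the minimal sketch below (plain 2×2 matrices, hypothetical helper names) confirms that they agree up to the global phase e^{iπ/4}, which is why the rewrite is safe for our purposes.

```python
import cmath
import math


def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def circuit(*gates):
    """Operator of a single-wire circuit read left to right (first gate applied first)."""
    U = [[1, 0], [0, 1]]
    for G in gates:
        U = matmul(G, U)  # a later gate multiplies on the left
    return U


r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
S = [[1, 0], [0, 1j]]
Z = [[1, 0], [0, -1]]

lhs = circuit(H, S, H)
rhs = circuit(S, Z, H, S, Z)
# Read off the candidate global phase from entry (0, 0), then check every entry.
phase = lhs[0][0] / rhs[0][0]
same_up_to_phase = all(
    abs(lhs[i][j] - phase * rhs[i][j]) < 1e-12 for i in range(2) for j in range(2)
)
print(same_up_to_phase, abs(phase - cmath.exp(1j * math.pi / 4)) < 1e-12)  # True True
```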
The precise commutation relations which we define for each gate are not important, except that these rules must be defined so that the procedure terminates (rather than repeatedly commuting two gates such as T and CCZ past one another in an attempt to cancel them, so that an H gate can be moved to the right of both). Different choices of rules will affect how well moveH reduces the number of H gates which precede any non-Clifford gate.
B.2 CNOT movement heuristic
In Step 4, we move all operations which are not single-qubit phase operations or phase-parity operations
out of the main body of the circuit. Our treatment of CNOT gates aims, roughly, to avoid generating phase-parity operations on very large subsystems, while also avoiding performing too much computation.
The heuristic used to determine which direction to move a CNOT operation is as follows. For each
CNOT gate, from the first in the circuit to the last, we compute the following:
1. Compute the set PL of all phase-parity gadgets to the left which act on the target but not the control
of the CNOT, and the set ML of phase-parity gadgets to the left which act on its target and control
both.
2. Similarly, compute the set PR of phase-parity gadgets to the right which act on the target but not the control of the CNOT, and the set MR of phase-parity gadgets to the right which act on its target and control both.
3. If PL−ML < PR−MR (comparing the sizes of these sets), we prefer to move the CNOT to the left; otherwise we prefer to move it to the right.
If no other CNOT gate acted on any qubits in common with this left-most CNOT gate, the quantity PL
(respectively, ML) would correctly indicate how many phase-parity gadgets would act on one more qubit
(respectively, one fewer) if we commuted that CNOT to the left. The difference PL−ML then indicates
the net change in the cumulative number of qubits acted on by the phase-gadgets to the left of the CNOT.
Similar remarks apply for PR−MR, albeit with the important caveat that this figure may be inaccurate
if there are further CNOT gates to the right whose targets coincide with the control of the CNOT under
consideration.
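The counting argument above rests on how a CNOT acts on a parity-phase gadget when the two are commuted past each other: conjugation by CNOT(control → target) maps Z_target to Z_control Z_target and fixes Z_control, so the control qubit is toggled in the support of every gadget that contains the target. The minimal sketch below represents each gadget only by its support (a set of qubit indices, an assumption of this illustration) and checks, on a small hypothetical example, that the net change in the total support size equals PL − ML when no other CNOT interferes.

```python
def conjugate_support(support, control, target):
    """Support of a parity-phase gadget after commuting CNOT(control -> target) past it.

    Conjugation by the CNOT maps Z_target to Z_control Z_target and fixes Z_control,
    so the control qubit is toggled in any support that contains the target."""
    support = frozenset(support)
    return support ^ {control} if target in support else support


def net_change(gadgets, control, target):
    """Change in the cumulative number of qubits acted on by these gadgets."""
    before = sum(len(s) for s in gadgets)
    after = sum(len(conjugate_support(s, control, target)) for s in gadgets)
    return after - before


# Hypothetical gadgets to the left of a CNOT with control 0 and target 1.
left = [{1, 2}, {0, 1}, {1}, {2, 3}]
pl = sum(1 for s in left if 1 in s and 0 not in s)   # act on the target, not the control
ml = sum(1 for s in left if 1 in s and 0 in s)       # act on both
print(pl, ml, net_change(left, control=0, target=1))  # 2 1 1
```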
The approach taken to produce our results is as follows. For the left-most CNOT in the circuit, compute PL, ML, PR, and MR. Commute the CNOT gate to the left if PL−ML < PR−MR, and otherwise commute it to the right. If in commuting it to the right we encounter another CNOT gate with which it does not commute, we also commute that CNOT gate to the right (and any CNOT gates with which those do not commute, and so on). Having done this, we compute PL, ML, PR, and MR for the left-most remaining CNOT gate in the circuit, where these may depend on the commutations which occurred for the previous CNOT gate. We proceed in this way from left to right until no CNOT gates remain in the main body of the circuit.
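The loop just described can be sketched as follows, under simplifying assumptions of our own: the main body contains only CNOT gates and parity-phase gadgets, each gadget is represented only by its support (the angles do not affect the counts), and PR and MR count all gadgets to the right, ignoring intervening CNOTs as the caveat above notes. All names (`extract_cnots`, `toggle`, and so on) are hypothetical; this is an illustration, not the code in the fast-stomp repository.

```python
from typing import List, Tuple, Union

Cnot = Tuple[str, int, int]      # ("cnot", control, target)
Gadget = Tuple[str, frozenset]   # ("gadget", support)
Gate = Union[Cnot, Gadget]


def toggle(support, control, target):
    """Support of a gadget after conjugation by CNOT(control -> target)."""
    return support ^ {control} if target in support else support


def commute(g, h):
    """Two CNOTs commute unless the control of one is the target of the other."""
    return g[1] != h[2] and g[2] != h[1]


def score(supports, control, target):
    """|P| - |M|: gadgets gaining the control qubit minus gadgets losing it."""
    p = sum(1 for s in supports if target in s and control not in s)
    m = sum(1 for s in supports if target in s and control in s)
    return p - m


def extract_cnots(circuit: List[Gate]):
    """Split a circuit into (left CNOTs, gadget-only body, right CNOTs)."""
    prefix, body, suffix = [], list(circuit), []
    while True:
        idx = next((i for i, g in enumerate(body) if g[0] == "cnot"), None)
        if idx is None:
            return prefix, body, suffix
        cnot = body.pop(idx)
        _, c, t = cnot
        left = [g[1] for g in body[:idx]]  # only gadgets remain left of the first CNOT
        right = [g[1] for g in body[idx:] if g[0] == "gadget"]
        if score(left, c, t) < score(right, c, t):
            # Commute the CNOT to the left past every gadget before it.
            for i in range(idx):
                body[i] = ("gadget", toggle(body[i][1], c, t))
            prefix.append(cnot)
            continue
        # Commute it to the right, dragging along CNOTs it fails to commute with.
        block, rest = [cnot], []
        for g in body[idx:]:
            if g[0] == "gadget":
                s = g[1]
                for _, bc, bt in reversed(block):  # G' = B^dagger G B, last CNOT innermost
                    s = toggle(s, bc, bt)
                rest.append(("gadget", s))
            elif all(commute(g, b) for b in block):
                rest.append(g)                     # the block passes this CNOT
            else:
                block.append(g)
        body = body[:idx] + rest
        suffix = block + suffix


example = [("gadget", frozenset({1})), ("gadget", frozenset({1})),
           ("cnot", 0, 1), ("cnot", 1, 2), ("gadget", frozenset({2}))]
print(extract_cnots(example))
```

In the example, moving CNOT(0 → 1) to the left would enlarge two gadgets, so it is moved right instead; it drags CNOT(1 → 2) along, and the last gadget's support grows from {2} to {0, 1, 2}.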
