Error rates and resource overheads of encoded three-qubit gates by Takagi, Ryuji et al.
Error rates and resource overheads of encoded three-qubit gates
Ryuji Takagi, Theodore J. Yoder and Isaac L. Chuang
Department of Physics, Massachusetts Institute of Technology,
77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
(Dated: October 5, 2017)
A non-Clifford gate is required for universal quantum computation, and, typically, this is the most
error-prone and resource intensive logical operation on an error-correcting code. Small, single-qubit
rotations are popular choices for this non-Clifford gate, but certain three-qubit gates, such as Toffoli
or controlled-controlled-Z (CCZ), are equivalent options that are also more suited for implementing
some quantum algorithms, for instance, those with coherent classical subroutines. Here, we calculate
error rates and resource overheads for implementing logical CCZ with pieceable fault-tolerance, a
non-transversal method for implementing logical gates. We provide a comparison with a non-local
magic-state scheme on a concatenated code and a local magic-state scheme on the surface code. We
find the pieceable fault-tolerance scheme particularly advantaged over magic states on concatenated
codes and in certain regimes over magic states on the surface code. Our results suggest that pieceable
fault-tolerance is a promising candidate for fault-tolerance in a near-future quantum computer.
I. INTRODUCTION
Quantum error-correcting codes are the most promis-
ing route to scalable quantum computation. However,
some of their limitations are well-known. For instance, a
major problem is that a single code cannot support a full
set of universal, transversal operations [1–3]. Often, and
always for 2D designs [4], the missing gate is not in the
normalizer of the Pauli group; that is, it is non-Clifford.
The techniques of gate-teleportation [5] and magic-
states [6] can overcome the lack of a non-Clifford gate.
Different magic-states can be created to implement small
Z-rotations such as the T-gate or 3-qubit operations, like
Toffoli or controlled-controlled-Z (CCZ). However, the
process to create a magic-state occurs post-selectively
and recursively and leads to large resource overheads. Although
these overheads are improving consistently [7, 8] and approaching what are
believed to be fundamental limits [9], large resource demands remain a
serious obstacle for near-future architectures.
Certain other approaches exist in the literature for im-
plementing a universal gate-set while circumventing the
use of magic-states. A popular approach is gauge-fixing
[10–12], in which a subsystem code can implement com-
plementary sets of transversal logical gates depending
on the settings of the gauge qubits. Another approach
[13–15] concatenates different codes with complementary
transversal gate sets to achieve the same effect in one
larger code. Recently, this approach was shown to lead to
asymptotic thresholds around $10^{-3}$, albeit using more
physical qubits than, for example, surface code magic-
state distillation [16, 17].
Any fault-tolerant, universal computing scheme oper-
ating without magic states is expected to be a promis-
ing candidate for near-future architectures where fairly
accurate physical components are supplied but space-
time resources, like qubit count and circuit depth, are
limited. The primary goal in this near-future regime is
to achieve some desired target error rate after a finite-
sized computation with small resource overheads. Such
constraints imply that the logical error rates of encoded
gates and the first-level pseudothreshold [18] (called just
pseudothreshold hereafter) are more important measures
than asymptotic threshold, which only becomes mean-
ingful with access to huge amounts of resources.
To evaluate near-future fault-tolerant computation, we
focus on another magic-less alternative that allows for a
logical implementation of three-qubit gates, the pieceable
fault-tolerance scheme [19]. In this approach, a logical
gate is done non-transversally through the “round-robin”
construction, and made fault-tolerant via partial error-
correction performed throughout the circuit. This con-
struction has recently been used in [20] to perform fault-
tolerant, universal computing on seven logical qubits re-
quiring only four ancillary qubits and 15 code qubits.
The circuit volume metric, a space-time resource measure
that counts all gates weighted by the number of qubits
involved, was used in [19] to argue that pieceable fault-
tolerance reduces logical gate overhead by nearly a factor
of two over magic-state creation and injection. However,
little was said about error rates of pieceable gates.
In this paper, we calculate these error rates and com-
pare to magic-state schemes for implementing three-qubit
non-Clifford gates. Our contenders are (1) a non-local
magic-state scheme: magic-states created postselectively
on Steane’s 7-qubit code (also known as the smallest
color code), (2) a local magic-state scheme: surface code
magic-state distillation, and (3) pieceable fault-tolerance
on the (a) 5-qubit [19], (b) 7-qubit [19], (c) 3× 3 Bacon-
Shor [21], and (d) 3 × 9 Bacon-Shor [21] codes. Our
metrics are (I) error rate of the logical gate and (II) cir-
cuit volume. Among concatenated schemes (1) and (3),
we can definitively declare pieceable 3 × 3 Bacon-Shor
the winner in both metrics (I) and (II). When comparing
to (2), the picture is more complicated and interesting.
The pieceable 3×3 Bacon-Shor beats the surface code in
error rate at low distance and in circuit volume when the
physical error rate is sufficiently low compared with the
desired target logical error. On the other hand, asymp-
totically in code distance, the surface code outperforms
pieceable 3×3 Bacon-Shor due to better scaling of logical
error rate and volume with distance.
arXiv:1707.00012v3 [quant-ph] 4 Oct 2017
II. METHODS
We first describe our method to evaluate the logical er-
ror rates. Evaluating the surface code scheme (2) draws
on the extensive literature on the topic [22]. Our calcu-
lations of the logical error rates of schemes (1) and (3) at
code distance d = 3 are done by exact enumeration of all
combinations of up to two faults in the circuit extended-
rectangle (exREC) [23] under the standard depolarizing
noise model (which serves as a model of average-case
noise). In [23], a rigorous upper bound on the logical
error rate under depolarizing noise is given. In contrast,
we provide formulas giving a rigorous lower bound as
well as a tighter rigorous upper bound. The lower and
upper bounds on logical error rate also determine lower
and upper bounds on the pseudothreshold. Having both
bounds allows us to definitively prove a separation be-
tween two different schemes when it exists. Our method
also confers some advantages over a Monte Carlo sim-
ulation. First, we can rigorously verify our circuits are
fault-tolerant under the chosen noise model by checking
that all single faults are correctable. Second, once the
counting is complete, we can independently vary noise
for each type of gate.
Our standard noise breakdown assigns single-qubit
gates, two-qubit gates, and three-qubit gates each their
own failure probabilities p1, p2, and p3, respectively. In
the circuit depolarization noise model, an r-qubit gate
fails with total probability $p_r$, suffering each of the $4^r - 1$
nontrivial r-qubit Pauli errors with
probability $p_r/(4^r - 1)$. In principle, preparation and
measurement could be treated separately as well, though
we will assign them failure probabilities also equal to p1.
Bounds on the error rate can always be written as poly-
nomials in p1, p2, p3 as we discuss below.
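As a concrete illustration of this noise model, the following minimal Python sketch samples the fault on a single r-qubit gate. The function names and the string representation of Pauli errors are illustrative assumptions and are not taken from the simulation described in this paper.

```python
import itertools
import random

def nontrivial_paulis(r):
    """All 4**r - 1 non-identity r-qubit Pauli strings, e.g. 'XI' or 'ZY'."""
    return ["".join(s) for s in itertools.product("IXYZ", repeat=r)
            if set(s) != {"I"}]

def sample_gate_fault(r, p_r, rng=random):
    """Sample the fault on one r-qubit gate under depolarizing noise.

    Returns None (no fault) with probability 1 - p_r; otherwise returns one
    of the 4**r - 1 nontrivial Paulis, each with probability p_r / (4**r - 1).
    """
    if rng.random() >= p_r:
        return None
    return rng.choice(nontrivial_paulis(r))

# Hypothetical usage: a two-qubit gate with failure probability 1e-3.
fault = sample_gate_fault(2, 1e-3)
```

Drawing the fault uniformly from the nontrivial Paulis after a single Bernoulli trial reproduces the per-error probability $p_r/(4^r - 1)$ stated above.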
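Our ultimate goal in error-rate estimation is to find the
probability that the exREC is incorrect given that all ancillas
pass verification. Denote this $P_{\mathrm{fail|acc}} = \Pr[\mathrm{fail}\,|\,\mathrm{acc}]$. Our
counting gives the exact values of
$$P^{(2)}_{\mathrm{fail,acc}} = \Pr[\mathrm{fail}, \mathrm{acc}, \le 2\ \mathrm{faults}], \tag{1}$$
$$P^{(2)}_{\mathrm{succ,acc}} = \Pr[\neg\mathrm{fail}, \mathrm{acc}, \le 2\ \mathrm{faults}], \tag{2}$$
$$P^{(2)}_{\mathrm{rej}} = \Pr[\neg\mathrm{acc}, \le 2\ \mathrm{faults}], \tag{3}$$
as polynomials in $p_1, p_2, p_3$ with degree equal to the number
of potentially faulty components in the entire exREC.
These exactly calculated quantities are enough to bound
$P_{\mathrm{fail,acc}} = \Pr[\mathrm{fail}, \mathrm{acc}]$, $P_{\mathrm{succ,acc}} = \Pr[\neg\mathrm{fail}, \mathrm{acc}]$, and
$P_{\mathrm{acc}} = \Pr[\mathrm{acc}]$ as
$$P^{(2)}_{\mathrm{fail,acc}} \le P_{\mathrm{fail,acc}}, \tag{4}$$
$$P^{(2)}_{\mathrm{succ,acc}} \le P_{\mathrm{succ,acc}}, \tag{5}$$
$$P_{\mathrm{acc}} \le 1 - P^{(2)}_{\mathrm{rej}}. \tag{6}$$
Thus,
$$\frac{P^{(2)}_{\mathrm{fail,acc}}}{1 - P^{(2)}_{\mathrm{rej}}} \le P_{\mathrm{fail|acc}} = 1 - P_{\mathrm{succ|acc}} \le 1 - \frac{P^{(2)}_{\mathrm{succ,acc}}}{1 - P^{(2)}_{\mathrm{rej}}}. \tag{7}$$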
More details on the simulation including the description
on how to obtain these polynomials can be found in Ap-
pendix B.
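Equation (7) is straightforward to evaluate once the three truncated probabilities are known. The following is a minimal Python sketch of that step; the function name and the numerical inputs are hypothetical placeholders, not values from our counting.

```python
def conditional_failure_bounds(p_fail_acc_2, p_succ_acc_2, p_rej_2):
    """Lower and upper bounds on Pr[fail | acc] following Eq. (7).

    Arguments are the truncated (at most two faults) probabilities of
    Eqs. (1)-(3), already evaluated at some physical error rates.
    """
    denom = 1.0 - p_rej_2                 # upper bound on Pr[acc], Eq. (6)
    lower = p_fail_acc_2 / denom          # uses Eq. (4)
    upper = 1.0 - p_succ_acc_2 / denom    # uses Eq. (5)
    return lower, upper

# Hypothetical inputs, for illustration only.
lo, hi = conditional_failure_bounds(1e-6, 0.98, 0.01)
print(lo, hi)
```

Because the three truncated probabilities sum to at most one, the lower bound never exceeds the upper bound.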
Next, we consider evaluating the resource overhead.
There exist various resource measures such as qubit
count, circuit volume, gate counts and so on. The num-
ber of reusable physical qubits is often taken as a physi-
cal resource measure in the literature. However, it is not
the best, especially when we would like to compare re-
source overheads between different codes, because there
is ambiguity that comes with the level of parallelization
we assume. In this paper, we mainly focus on circuit
volume, a space-time resource measure that counts all
gates weighted by the number of qubits involved. Unlike
physical qubit count, circuit volume takes into account
the trade-off between space and time resources. The cir-
cuit volume is a space-time metric in the same vein as
the “quantum volume” [24], except for evaluating spe-
cific circuits rather than a universal quantum computer.
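To make the definition concrete, the following minimal Python sketch computes the circuit volume of a flat (unencoded) circuit from its component counts. The dictionary keys and the example counts are hypothetical and chosen only for illustration; preparations and measurements are weighted as single-qubit operations, consistent with the unencoded volumes (3, 2, 1, 1, 1) used with Eq. (8).

```python
# Weight of each component type = number of qubits it acts on.
WEIGHTS = {"3q": 3, "2q": 2, "1q": 1, "prep": 1, "meas": 1}

def circuit_volume(counts):
    """Sum of component counts weighted by the number of qubits involved."""
    return sum(WEIGHTS[kind] * n for kind, n in counts.items())

# Hypothetical counts, for illustration only.
example = {"3q": 1, "2q": 4, "1q": 2, "prep": 3, "meas": 3}
print(circuit_volume(example))  # 3 + 8 + 2 + 3 + 3 = 19
```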
The circuit volume at a high concatenation level is easy
to compute using the volume of the logical construction
at the first level of encoding. Let $V^{(k)}_G$ be the volume
for implementing circuit component G at the kth level
of concatenation. Then, there is a recursion relation between
two concatenation levels,
$$V^{(k+1)}_G = \sum_{G'} N^{G'}_G V^{(k)}_{G'},$$
where $N^{G'}_G$ is the number of circuit components $G'$ in
the logical construction of component G. We can understand
this as evolution of a vector of circuit volumes of
each component via a transformation matrix determined
by the logical gate constructions. Namely, we get
$$\mathbf{V}^{(k)} = A^k \mathbf{V}^{(0)}, \tag{8}$$
where A is the matrix $A_{ij} = N^{G_j}_{G_i}$, $\mathbf{V}^{(k)}_i = V^{(k)}_{G_i}$, and
$V^{(0)}_G$ is the volume of an unencoded component. We
set $\mathbf{V}^{(k)} = (V^{(k)}_3, V^{(k)}_2, V^{(k)}_1, V^{(k)}_{\mathrm{prep}}, V^{(k)}_{\mathrm{meas}})^T$, where the
components refer to the circuit volume of three-qubit
gates, two-qubit gates, single-qubit gates, $|0\rangle$ or $|+\rangle$
preparation, and measurement, respectively. Note that
$(V^{(0)}_3, V^{(0)}_2, V^{(0)}_1, V^{(0)}_{\mathrm{prep}}, V^{(0)}_{\mathrm{meas}}) = (3, 2, 1, 1, 1)$.
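The recursion of Eq. (8) can be sketched in a few lines of Python. The matrix `A_TOY` below is a made-up placeholder used only to exercise the code; the actual transformation matrices are those of Appendix D and are not reproduced here. Row i of the matrix lists how many components of each type appear in the logical construction of component i, in the order (3-qubit, 2-qubit, 1-qubit, prep, meas).

```python
V0 = (3, 2, 1, 1, 1)  # unencoded volumes: 3q, 2q, 1q, prep, meas

def mat_vec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def concatenated_volumes(a, k, v0=V0):
    """Return V^(k) = A^k V^(0) by applying the recursion k times."""
    v = list(v0)
    for _ in range(k):
        v = mat_vec(a, v)
    return v

# Hypothetical transformation matrix, for illustration only.
A_TOY = [
    [1, 2, 0, 0, 0],
    [0, 3, 1, 0, 0],
    [0, 0, 3, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 3],
]
print(concatenated_volumes(A_TOY, 2))
```

Iterating the matrix-vector product avoids forming $A^k$ explicitly and returns the volumes of all five component types at once.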
III. LOGICAL CONSTRUCTIONS
Here, we describe the logical constructions used in the
simulation. Explicit descriptions of the circuits at the
gate level can be found in [25]. All of our constructions
begin with a round of syndrome measurement and re-
covery (the leading error correction) and end with the
same (the trailing error correction), in accordance with
the exREC formalism [23]. The rest of the circuit may
also include rounds of error correction, called intermediate,
in accordance with pieceable fault-tolerance [19].
For the 5-qubit code, we implement a logical CCZ gate
by the round-robin construction [19] with three interme-
diate error corrections. The leading error correction and
trailing error correction are done by Steane’s error correc-
tion [26]. Since the 5-qubit code is non-CSS, a 10-qubit
ancilla is needed to extract the entire syndrome simulta-
neously. We actually find that the circuit in [26] needs
some modification for non-CSS codes, which we discuss in
detail in Appendix A. For intermediate error corrections,
we use Shor-type error correction with CAT states
[27]. The size of the CAT states is always four for measuring
constant stabilizers (those that commute with the
preceding circuitry), but it varies for measuring non-constant
stabilizers because their weight changes as they
go through the CCZ gates. For our circuit, we need to use
at most a 9-CAT, 13-CAT, and 9-CAT state for the first, second,
and third intermediate error corrections, respectively.
For the 7-qubit code, we consider the construction that
requires only one intermediate error correction [19]. All
of the error corrections are done by Steane’s error cor-
rection. Since the 7-qubit code is a CSS code, correction
of Z type errors can be done separately from that of X
type errors, and only the encoded states $|\bar 0\rangle$ and $|\bar +\rangle$ are
needed. The state $|\bar 0\rangle$ ($|\bar +\rangle$) is verified by applying CNOT
gates transversally to another noisy $|\bar 0\rangle$ ($|\bar +\rangle$) and measuring
it transversally (a Steane ancilla factory [28]). If
some error is detected, we discard the state and start
again. For estimating the circuit volume, we consider
a more resource-efficient state preparation method pro-
posed by Goto [29]. Although we did not estimate the
logical error rate using Goto’s method, we suspect
that the change in the logical error rate between differ-
ent verification methods would be small as indicated in
[29]. Since intermediate Z-type error correction is not
needed, we just apply the X-type error correction in the
middle and notify the trailing error correction about pos-
sible locations of Z-type errors as described in [19].
Logical CCZ on the Bacon-Shor code is implemented
as proposed in [21]. On the 3×3 Bacon-Shor we need no
intermediate correction although we do use a non-Pauli
recovery at the end. Furthermore, since the ancilla for
the error correction is a tensor product of 3-CAT states,
there is no need for verification since, modulo its stabi-
lizers, an error on a 3-CAT is equivalent to a weight one
error. In contrast, the 3× 9 Bacon-Shor implements log-
ical CCZ transversally, but it comes with a substantially
larger overhead [21].
For the non-local magic-state scheme, we use magic
state injection on the 7-qubit code to implement a logical
CCZ gate. The CCZ magic state is defined by the stabilizers
$\langle X_1 CZ(2,3),\ X_2 CZ(1,3),\ X_3 CZ(1,2) \rangle$. The protocol
consists of two parts, a state preparation circuit and
a teleportation circuit. The state preparation starts with
the +1 eigenstate of the second and the third stabilizer,
$|\bar 0\rangle|\bar +\rangle|\bar +\rangle$, and measures the first stabilizer [30]. Our
circuit is a variant of the circuit in [31] which we modify
to create the CCZ state instead. Two measurements of
$X_1 CZ(2,3)$ are done with complete error-correction in
between. This makes the circuit fault-tolerant (to one
fault). If the two measurement results do not match, we
discard the created state and start over again. If they
match and they both show the result -1, we apply $\bar Z$ on
the first code block to put it back to the desired magic
state. If both show +1, we do not need to apply a cor-
rection. Like the pieceable 7-qubit case, all the error
corrections are done using Steane’s method [26].
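The stabilizer description of the CCZ state can be checked on three unencoded qubits with a small state-vector calculation. The sketch below is only an illustration (plain Python, unnormalized integer amplitudes, 0-indexed qubits, and helper names of our own choosing); it takes the CCZ state to be CCZ applied to $|+\rangle|+\rangle|+\rangle$ and verifies that this state is fixed by all three stabilizers, while the starting state $|0\rangle|+\rangle|+\rangle$ is fixed only by the second and third.

```python
import itertools

BITS = list(itertools.product((0, 1), repeat=3))

def apply_cz(state, i, j):
    """CZ on qubits i, j (0-indexed): sign flip when both bits are 1."""
    return {b: (-a if b[i] and b[j] else a) for b, a in state.items()}

def apply_x(state, k):
    """X on qubit k: move each amplitude to the bit-flipped basis state."""
    out = {}
    for b, a in state.items():
        flipped = b[:k] + (1 - b[k],) + b[k + 1:]
        out[flipped] = a
    return out

def stabilizer(state, k):
    """Apply X_k CZ(i, j), where i, j are the two qubits other than k."""
    i, j = [q for q in range(3) if q != k]
    return apply_x(apply_cz(state, i, j), k)

# Unnormalized amplitudes; normalization does not affect these checks.
ccz_state = {b: (-1 if b == (1, 1, 1) else 1) for b in BITS}  # CCZ|+++>
initial = {b: (1 if b[0] == 0 else 0) for b in BITS}          # |0>|+>|+>

print(all(stabilizer(ccz_state, k) == ccz_state for k in range(3)))  # True
print([stabilizer(initial, k) == initial for k in range(3)])  # [False, True, True]
```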
IV. COMPARISON OF CONCATENATED
SCHEMES
We compute the logical error rates and resource over-
heads of pieceably fault-tolerant CCZ gates on the 5-
qubit code, 7-qubit code [19], 3 × 3 Bacon-Shor code
and 3 × 9 Bacon-Shor code [21], and compare them to
a magic-state scheme on the 7-qubit code. Fig. 1 shows
the obtained logical error rates for these cases using two
different settings of physical error rate, p1 = p2 = p3 = p
and 10p1 = p2 = 0.1p3 = p. Lower and upper bounds
on pseudothresholds are the crossing points of the “break-even”
line with the upper and lower bounds for logical
error rates, respectively. For both settings of physical error rate, the
3 × 3 Bacon-Shor code has lower logical error rate than
the magic-state scheme below pseudothreshold. For the
7-qubit code, whether the pieceable scheme has a lower
rate than the magic-state scheme depends on the physical
error rate setting. The 5-qubit code has a large logical
error rate due to a large number of pieces in the round-
robin construction. Similarly, the 3× 9 Bacon-Shor code
has a higher logical error rate than the 3× 3 Bacon-Shor
code because the size of the logical code block is obviously
much bigger. Moreover, the 3 × 9 Bacon-Shor needs to
implement verification for 9-qubit CAT states.
We now compare the resource overheads. Table I shows
the resource overheads to implement a logical CCZ gate
with these constructions. We assume that the ancillas are
not reusable. Due to a finite ancilla verification rejection
rate, the effective resource count is slightly higher than
the values in the table. However, the rejection rate of the
verification is O(p), and the effective resource count is
obtained by multiplying by $(1 - n_{\mathrm{rej}} p)^{-1}$, where $n_{\mathrm{rej}}$ is the
number of error locations that lead to rejection. Since
we are interested in the region $p < 10^{-4}$, and the largest
module involving verification is the magic state preparation
circuit, which has $n_{\mathrm{rej}} \sim 100$, the increase in
resources due to verification is within 1%. Thus, it is safe
to ignore the effects of verification. Apart from using more
3-qubit gates, pieceable constructions on the 7-qubit and
3 × 3 Bacon-Shor code have smaller resource overheads
compared to the magic-state scheme. In particular, they
have a significant reduction in circuit volume. Fig. 3
shows circuit volumes for the pieceable 7-qubit code, 3×3
Bacon-Shor code, and magic-state scheme. Transforma-
tion matrices A (see Eq. (8)) for these codes are given in
Appendix D.
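The size of the verification correction quoted above can be checked with a one-line calculation. In the sketch below, the function name is our own, and the inputs are the rough values from the text ($n_{\mathrm{rej}} \sim 100$ at the edge of the region $p < 10^{-4}$).

```python
def verification_overhead(n_rej, p):
    """Multiplicative resource factor (1 - n_rej * p)**-1 due to rejections."""
    return 1.0 / (1.0 - n_rej * p)

factor = verification_overhead(100, 1e-4)
print(f"{100 * (factor - 1):.2f}% increase")
```

At these values the factor is 1/0.99, i.e. an increase of about 1%, and it shrinks further for smaller p.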
Combining the results for the logical error rates and
circuit volume, we conclude that the pieceable construc-
tion on the 3× 3 Bacon-Shor code beats the magic-state
scheme on the 7-qubit code in both criteria. The
pieceable construction on the 7-qubit code also beats
magic state injection in circuit volume, and in logical
error rate when p1 = p2 = p3.
                     Volume   Qubits   2-qubit gates   3-qubit gates
Pieceable 5-qubit      3841      364             445              46
Pieceable 7-qubit       771       93             162              21
3 × 3 Bacon-Shor        414       81              90              27
3 × 9 Bacon-Shor       1350      252             306              27
Magic state            1352      154             267              14
3 × 3 BS/Magic         0.31     0.59            0.34             1.9
TABLE I. Resource overheads to implement logical CCZ.
Volume refers to the circuit volume, which counts all gates
weighted by the number of qubits involved. Qubits are the
number of physical qubits including data qubits and ancilla
qubits where ancilla qubits are assumed to be not reusable.
Numbers for the 5-qubit code include all the resources for the
adaptive measurements.
V. COMPARISON TO SURFACE CODES
Next, we compare logical error rate and resource over-
heads to a local magic-state scheme on surface codes. We
find that the pieceable construction can have a significant
advantage in circuit volume in a certain region in terms
of physical error rate and target logical error rate.
A. Logical error rates
Surface codes are known to have high asymptotic
threshold, which is 0.1%-1% depending on assumptions
and error model [22, 32–35], and thus they have attracted
attention as a candidate for a scalable quantum com-
puter. However, having a high asymptotic threshold does
not automatically imply that logical error rate is always
low for reasonably sized codes.
Firstly, as can be seen in [22], in the low distance
regime the pseudothreshold of the surface code is much
smaller than the asymptotic threshold. Thus, if the phys-
ical error rate is lower than the asymptotic threshold but
not below the relevant pseudothreshold, encoding at low
distance does not help to reduce the error rate.
Secondly, the logical error rate of a logical gate can be
large even if the error rate for one surface code cycle is
small, because a logical gate is made up of many cycles.
Each cycle consists of measuring the complete error syn-
drome once via measurement qubits, one per stabilizer
generator, as in [22]. Let $\bar p_{\mathrm{cycle}}$ be the logical error rate
of the surface code per surface code cycle. Let $C_G$ be the
number of surface code cycles it takes to implement a
logical version of gate G. Then, the logical error rate of gate
G is $\bar p_G \approx C_G\, \bar p_{\mathrm{cycle}}$. Since $C_G \propto d$ and $\bar p_{\mathrm{cycle}} \propto p^{(d+1)/2}$,
where d is the surface code distance, $\bar p_{\mathrm{cycle}}$ dominates for
large distance. However, when d is small, the contribution
to $\bar p_G$ from $C_G$ is not negligible. In Appendix E, we
find a specific form of $C_G$ for the logical Toffoli gate for
two different implementations.
[Figure 1: four panels (a)-(d) plotting logical error rate p_L against physical error rate p, with p_L/p^2 also shown. Legends: (a,b) Pieceable 7-qubit, 3x3 Bacon-Shor, Magic state 7-qubit; (c,d) Pieceable 5-qubit, Pieceable 3x9 Bacon-Shor, Magic state 7-qubit; reference line p_L = p in (a,c) and p_L = 10p in (b,d).]
FIG. 1. (Color online.) Logical error rates of the 3-qubit gate
on (a,b) the pieceable 7-qubit code (green, dot-dashed) and the
pieceable 3 × 3 Bacon-Shor code (blue, dashed), (c,d) the pieceable
5-qubit code (green, dot-dashed) and the pieceable 3 × 9 Bacon-Shor
code (blue, dashed), and magic state injection on the 7-qubit
code (orange, solid) with (a,c) p1 = p2 = p3 = p and (b,d)
10p1 = p2 = 0.1p3 = p, where pi refers to the physical error rate of
an i-qubit gate. Initialization of single-qubit states $|0\rangle$, $|+\rangle$ also
fails with probability p1. The dotted line is the “break-even” line
where the logical error rate coincides with the physical error rate.
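The interplay between the cycle count and the per-cycle error rate in the estimate $\bar p_G \approx C_G \bar p_{\mathrm{cycle}}$ can be sketched numerically. In the Python snippet below, the prefactors `cycles_per_distance` and `cycle_prefactor` are hypothetical placeholders, since the text fixes only the proportionalities $C_G \propto d$ and $\bar p_{\mathrm{cycle}} \propto p^{(d+1)/2}$; the specific form of $C_G$ is given in Appendix E and is not reproduced here.

```python
def surface_code_gate_error(p, d, cycles_per_distance=1.0, cycle_prefactor=1.0):
    """Estimate p_G ~ C_G * p_cycle for a distance-d surface code.

    Assumes C_G = cycles_per_distance * d and
    p_cycle = cycle_prefactor * p**((d + 1) / 2); both prefactors are
    placeholders and may in practice depend on d and on the gate.
    """
    c_g = cycles_per_distance * d
    p_cycle = cycle_prefactor * p ** ((d + 1) / 2)
    return c_g * p_cycle

# With unit prefactors the factor C_G = d multiplies the per-cycle rate.
for d in (3, 5, 7):
    print(d, surface_code_gate_error(1e-2, d))
```

With realistic prefactors the linear factor in d matters most at small distance, which is the regime discussed in the text.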
Fig. 2 shows logical error rates of a 3-qubit gate on the
surface code using a Toffoli state, and upper bounds of
logical error rate of pieceable 3× 3 Bacon-Shor code and
pieceable 7-qubit code in terms of code distance with
three different physical error rates. Upper bounds are
obtained by concatenating the function upper bounding
the actual rate in Eq. (7). Since the 3-qubit gate is the
largest component among the components that appear
in the logical construction of 3-qubit gate, concatenat-
ing the upper bounding error function for the 3-qubit
gate upper bounds its error rates at higher concatena-
tion level. However, because logical 3-qubit gates have an
order of magnitude higher error rate than 2-qubit gates
and the logical constructions of 3-qubit gates mostly consist
of single-qubit gates and 2-qubit gates, this upper bound
is highly pessimistic. A careful analysis taking into account
error functions for other types of components and
possibly even using better decoding algorithms [36, 37] at
higher levels may greatly reduce estimates of logical
error rates.
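The concatenation of the bounding function can be sketched as follows. The quadratic form f(p) = c p^2 and the constant c used here are assumptions for illustration only; the actual bounding functions are the polynomials entering Eq. (7). We also assume the usual convention that k levels of a distance-3 code correspond to distance 3^k.

```python
def concatenated_bound(f, p, levels):
    """Apply the level-1 upper-bound function f to p, `levels` times."""
    for _ in range(levels):
        p = f(p)
    return p

def toy_bound(p, c=1e4):
    """Hypothetical level-1 bound on the logical 3-qubit gate error rate."""
    return c * p * p

# Bound on the logical error rate versus code distance d = 3**k.
for k in range(1, 4):
    print(3 ** k, concatenated_bound(toy_bound, 1e-5, k))
```

For this toy bound the pseudothreshold is 1/c, and the bound improves with each level only for p below that value.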
Nevertheless, in Fig. 2, we can see that surface codes
have better scaling with distance than pieceable con-
catenated codes, which should be attributed to the high
threshold. However, for small d, $C_{\mathrm{Toffoli}}$ has a significant
contribution, and when d = 3 the logical error rate
of the pieceable constructions is two orders of magnitude
lower than that of the surface code.
B. Resource overheads
We also count the circuit volume for implementing log-
ical Toffoli on surface codes. This allows us to compare
the circuit volume between pieceable codes and the sur-
face codes, shown in Fig. 3. Although surface codes have
better scaling with distance, pieceable constructions have
a significant advantage up to three concatenations. This
is especially true at distance three, where the difference
is three orders of magnitude.
Consider now the space consisting of pairs (physical
error rate, target logical error rate)≡ (p, pT ). Combining
volume and error rate estimates, the region of this space
where concatenated pieceable constructions require less
circuit volume for implementing Toffoli than the surface
code can be obtained. Fig. 4 shows this region for the
pieceable 3 × 3 Bacon-Shor code and the 7-qubit code.
It shows that over a large range, the pieceable 3 × 3 Bacon-Shor
code has an advantage in circuit volume over the surface code,
and the difference can be significant, as can be seen in
Fig. 3. This region is actually determined by the upper
bound of error rates at the third level of concatenation. This is
because the surface code with distance five already has a larger
volume than the 3 × 3 Bacon-Shor code with three concatenations,
as can be seen in Fig. 3. For the 7-qubit code,
Fig. 3 shows that a 3-qubit logical gate at two concatenations
of the 7-qubit code has less circuit volume than the
surface code of any size. Thus, whenever two concatenations
are sufficient to achieve the target logical error
rate, the 7-qubit code has the advantage, as is represented
by the region in Fig. 4. Fig. 3 also shows that the
volume for the 7-qubit code with three concatenations is
slightly larger than that for the surface code with distance
seven. Thus, the surface code has the advantage in
the region where distance seven is enough for the surface
code but three concatenations are needed for the 7-qubit
code, which corresponds to the region between the upper
purple region and the lower purple region in Fig. 4. The
7-qubit code again starts to have an advantage over the surface
code in the region where the surface code needs distance
nine whereas the 7-qubit code only needs to be concatenated
three times, which corresponds to the lower purple
region in Fig. 4.
[Figure 2: two panels (a), (b) plotting logical error rate p_L against code distance d for p = 1.6×10^-5, 8×10^-6, and 4×10^-6. Legends: Surface code, together with (a) 3x3 Bacon-Shor and (b) 7-qubit.]
FIG. 2. Logical error rates for the 3-qubit gate on the surface
codes and (a) the pieceable 3 × 3 Bacon-Shor code, (b) the pieceable
7-qubit code, in terms of code distance. The rates shown for pieceable
codes are upper bounds obtained by concatenating the
upper bounding function from Eq. (7). Black dashed curves
are only a guide to the eye.
[Figure 3: circuit volume V plotted against code distance d. Legend: Surface codes, Pieceable 7-qubit, Pieceable 3x3 Bacon-Shor, 7-qubit magic state.]
FIG. 3. Circuit volume for a logical 3-qubit gate on the pieceable
7-qubit code (circles), the pieceable 3 × 3 Bacon-Shor code (squares),
and the magic-state scheme on the 7-qubit code (diamonds) in terms
of code distance. The dots correspond to every concatenation
level in the range. Although it may be hard to see the data
for the pieceable 7-qubit code because they are close to the data
for the magic-state scheme, the pieceable 7-qubit code has slightly
lower volume than the magic-state scheme for every distance
shown.
[Figure 4: regions in the (p, pT) plane, with curves labeled BS3, ST2, ST3, SC(7), and pT = p.]
FIG. 4. (Color online.) The region where the pieceable 3 × 3
Bacon-Shor code (orange and purple) and the pieceable 7-qubit
code (purple) use less volume than surface codes to implement
a 3-qubit gate and achieve a fixed target logical error rate, pT , with
a fixed physical error rate, p. Dashed lines labeled by STj are
upper bounds on logical error rate at the jth-concatenation
level of the pieceable 7-qubit code (Steane code). The dotted
line labeled by SC(7) is the logical error rate for the surface
code with distance seven. The dot-dashed line labeled by
BS3 is an upper bound of logical error rate at the third-
concatenation level of the 3 × 3 Bacon-Shor code. The solid
line on the upper boundary is the pT = p line.
VI. CONCLUSIONS
In this paper, we calculated logical error rates and re-
source overheads of 3-qubit gates using pieceable fault-
tolerant constructions, a non-local magic-state scheme
(on the 7-qubit code), and a local magic-state scheme
(on the surface code). In comparison with the non-local
magic-state scheme, we found that while pieceable constructions
have logical error rates comparable to, or even lower than,
that of the magic-state scheme, the required circuit volume
can be as little as 30% of the magic-state volume. This suggests
that the pieceable construction is a promising complement to schemes
relying on magic states.
We also compared the pieceable construction to the
surface codes and found that in quite a large region in
terms of physical error rates and target logical error rates,
pieceable constructions can have significantly lower cir-
cuit volume than surface codes.
Although realizing physical components with a physical error
rate small enough for pieceable constructions to have
a great advantage is challenging, one should note that
surface codes face a comparably hard challenge in
terms of resource overheads. Just as surface codes are
good candidates given access to large overheads, the
pieceable construction appears to be a good candidate
given access to small physical error rates.
Another difference between pieceable constructions
and the surface code is locality, i.e. the constraint that
physical gates involved act only between qubits that are
neighboring in some chosen low-dimensional layout. Al-
though the locality property is desirable in many experi-
mental setups, some systems allow non-local interaction
too [31]. Our result indicates that such non-local techniques
can lead to a significant reduction of resource use
for quantum error correcting codes.
ACKNOWLEDGEMENT
R.T. gratefully acknowledges the support of the Tak-
enaka scholarship. T.Y. appreciates the support of the
Department of Defense (DoD) through the National De-
fense Science and Engineering Graduate (NDSEG) Fel-
lowship program.
Appendix A: Steane’s error correction for arbitrary
stabilizer codes
Here, we describe the error correction used for the lead-
ing error correction and trailing error correction of the 5-
qubit code. Since the 5-qubit code is not CSS, one might
think Steane’s error-correction is inappropriate. How-
ever, in [26], Steane proposes a circuit to do just that for
the 5-qubit code. Unfortunately, Steane’s construction
as written is not quite correct. We present the correct
method that works for any stabilizer code. We will also
see that this method gives a conceptually simple way to
prepare the necessary ancilla state in line with Steane’s
original proposal [26].
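Consider an $[[n, k]]$ stabilizer code C with stabilizer
$$S = \begin{pmatrix} S_x & S_z \end{pmatrix}, \tag{A1}$$
and logical operators
$$N = \begin{pmatrix} N_x & N_z \end{pmatrix}, \tag{A2}$$
written in symplectic matrix form. That is,
$S_x, S_z \in \mathbb{F}_2^{(n-k) \times n}$ and $N_x, N_z \in \mathbb{F}_2^{2k \times n}$. Also, if we
define $\Lambda = \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} \in \mathbb{F}_2^{2n \times 2n}$ using n × n block notation,
then the canonical commutation relations are expressed
by $S \Lambda S^T = 0$, $S \Lambda N^T = 0$, and $N \Lambda N^T = A$ for the 2k × 2k matrix
A with 1s on only the antidiagonal.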
Following Steane, we propose the circuit in Fig. 5 to
extract the syndrome of C. The ancilla state used is twice
the size of the code C. The stabilizer of the ancilla state
$|a\rangle$ can be written
$$S_a = \begin{pmatrix} S_z & S_x & 0 & S_z \\ 0 & 0 & S_x & S_z \\ 0 & 0 & N_x & N_z \end{pmatrix}. \tag{A3}$$
We show that this ancilla state and the circuit in Fig. 5
successfully extract the syndrome without giving infor-
mation about the logical operators by propagating the
observables of the code C and the stabilizer Sa through
the circuit. Begin with,
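$$\left(\begin{array}{c|ccc|ccc}
a & S_x & 0 & 0 & S_z & 0 & 0 \\
b & N_x & 0 & 0 & N_z & 0 & 0 \\
  & 0 & S_z & S_x & 0 & 0 & S_z \\
  & 0 & 0 & 0 & 0 & S_x & S_z \\
  & 0 & 0 & 0 & 0 & N_x & N_z
\end{array}\right), \tag{A4}$$
where the syndrome is $a \in \mathbb{F}_2^{n-k}$ and the logical operator
values are $b \in \mathbb{F}_2^{2k}$. After the controlled-Z gates,
$$\left(\begin{array}{c|ccc|ccc}
a & S_x & 0 & 0 & S_z & S_x & 0 \\
b & N_x & 0 & 0 & N_z & N_x & 0 \\
  & 0 & S_z & S_x & S_z & 0 & S_z \\
  & 0 & 0 & 0 & 0 & S_x & S_z \\
  & 0 & 0 & 0 & 0 & N_x & N_z
\end{array}\right). \tag{A5}$$
After the controlled-X gates,
$$\left(\begin{array}{c|ccc|ccc}
a & S_x & 0 & 0 & S_z & S_x & S_z \\
b & N_x & 0 & 0 & N_z & N_x & N_z \\
  & S_x & S_z & S_x & S_z & 0 & 0 \\
  & 0 & 0 & 0 & 0 & S_x & S_z \\
  & 0 & 0 & 0 & 0 & N_x & N_z
\end{array}\right). \tag{A6}$$
This is equivalent to the stabilizer,
$$\left(\begin{array}{c|ccc|ccc}
a & S_x & 0 & 0 & S_z & 0 & 0 \\
b & N_x & 0 & 0 & N_z & 0 & 0 \\
a & 0 & S_z & S_x & 0 & 0 & 0 \\
  & 0 & 0 & 0 & 0 & S_x & S_z \\
  & 0 & 0 & 0 & 0 & N_x & N_z
\end{array}\right), \tag{A7}$$
and so we see that measuring all ancilla qubits in the X-basis
results in a bitstring $m \in \mathbb{F}_2^{2n}$ such that $S \Lambda m = a$.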
We note that $|a\rangle$ is simply related to a Bell pair $|\Phi\rangle =
(|00\rangle + |11\rangle)/\sqrt{2}$ encoded in C. If $\mathrm{CX}_{tb}$ denotes n CX gates
transversally acting from the top n qubits of the ancilla
to the bottom n and $H_t$ denotes n H gates applied to the
top n qubits, then $|a\rangle = H_t \mathrm{CX}_{tb} |\bar\Phi\rangle$. Thus, we can think
of $|a\rangle$ as an encoded Bell pair that has been “transversally
disentangled”. Circuit identities can be used to rearrange
Fig. 5 to Knill’s error-correction [38]. Also, if C is CSS,
Fig. 5 reduces to Steane’s error-correction for CSS codes
[28].
Steane’s original proposal for non-CSS error-correction
[26] omitted the Sz on the right side of the first row of
Eq. (A3). Doing the same calculation as above shows
that this will not succeed in measuring the syndrome.
Steane’s proposal suggested that the ancilla state would
always be CSS for any code. This, unfortunately, is not
true. Indeed, the 7-qubit code from [39] has an ancilla
that is not even local-Clifford (LC) equivalent to a CSS
state.
However, there are non-CSS codes for which Sa is LC
equivalent to a CSS code. The 5-qubit code with stabi-
lizer
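$$S_5 = \left(\begin{array}{ccccc|ccccc}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 0
\end{array}\right) \tag{A8}$$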
is one of these. Indeed, Sa can be written using only Y -
type and Z-type generators. This allows us to prepare
the ancilla using Fig. 6, and verify the ancilla against
single circuit faults using Fig. 7, which are both standard
constructions for CSS states [28, 40].
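As a sanity check on Eqs. (A3) and (A8), the sketch below builds $S_a$ for the 5-qubit code and verifies that its rows mutually commute and are linearly independent, so that they fix a single $2n$-qubit state. It is illustrative only: the logical operators $\bar{X} = XXXXX$ and $\bar{Z} = ZZZZZ$ are an assumed standard choice that the text does not specify, and `symplectic` and `gf2_rank` are hypothetical helper names.

```python
n = 5
S5 = [  # rows (Sx | Sz) of Eq. (A8)
    [1, 0, 1, 0, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 1, 1, 0, 0, 0],
]
Sx = [r[:n] for r in S5]
Sz = [r[n:] for r in S5]
# Assumed logical operators: X-bar = XXXXX and Z-bar = ZZZZZ (not given in the text).
Nx = [[1] * n, [0] * n]
Nz = [[0] * n, [1] * n]
ZERO = [0] * n

# Rows of S_a, Eq. (A3), ordered as (x_top, x_bottom | z_top, z_bottom).
Sa = [sz + sx + ZERO + sz for sx, sz in zip(Sx, Sz)]
Sa += [ZERO + ZERO + sx + sz for sx, sz in zip(Sx, Sz)]
Sa += [ZERO + ZERO + nx + nz for nx, nz in zip(Nx, Nz)]


def symplectic(u, v):
    """Symplectic inner product over F_2; 0 means the two Paulis commute."""
    h = len(u) // 2
    xz = sum(a * b for a, b in zip(u[:h], v[h:]))
    zx = sum(a * b for a, b in zip(u[h:], v[:h]))
    return (xz + zx) % 2


def gf2_rank(rows):
    """Rank over F_2 using an XOR basis stored as integers."""
    basis = []
    for r in rows:
        v = int("".join(map(str, r)), 2)
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b from v if it is set
        if v:
            basis.append(v)
    return len(basis)


code_commutes = all(symplectic(u, v) == 0 for u in S5 for v in S5)
ancilla_commutes = all(symplectic(u, v) == 0 for u in Sa for v in Sa)
ancilla_rank = gf2_rank(Sa)
print(code_commutes, ancilla_commutes, ancilla_rank)
```

A rank of $2n = 10$ together with mutual commutation means the generators specify a unique stabilizer state on the ten ancilla qubits.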
FIG. 5. Circuit for Steane's error correction on a non-CSS code.
$|\bar{\psi}\rangle$ is an arbitrary encoded state, and $|\bar{a}\rangle$ is the
$2n$-qubit ancilla state from Eq. (A3). The two labels $n$ on the gates mean
that the $n$ CZ gates are transversally coupled to the top $n$ qubits in the
ancilla, and that the $n$ CNOT gates are transversally coupled to the bottom
$n$ qubits in the ancilla. Measurement is done transversally, from which the
syndrome can be classically computed.
Appendix B: Details of the simulation for logical
error estimation
Here, we describe some techniques used in the estima-
tion of logical error rates.
For reasons of simulation efficiency, only errors originating from at most
two faults are considered, but all such errors are counted. For Clifford
circuits, propagating the Pauli errors resulting from circuit depolarizing
noise can be done simply using the Gottesman-Knill theorem [41]. However,
some of our circuits are built from non-Clifford CCZ gates. In this case, a
tracked error is modified to include controlled-Z (CZ) terms. A Pauli error
that propagates through $m$ CCZ gates picks up at most $m$ CZ terms (some may
cancel). Upon measurement (e.g. in the error-correction circuits), the CZ
terms must be broken down into a sum of Paulis, only some of which flip
measurement bits to cause a signal. We treat each term as a different error
element with probability equal to the square of the amplitude of the term.
FIG. 6. Preparing the error-correction ancilla state for the 5-qubit code for
use in Fig. 5. Input states are all prepared in $|0\rangle$.
FIG. 7. Verification circuit for the ancilla state prepared by the circuit in
Fig. 6. $|\bar{a}_n\rangle$ is a noisy ancilla state that needs to be verified,
and $|\bar{a}_p\rangle$ is a purified one.
There is a subtlety in breaking down CZ errors. As a sum of Pauli terms, a CZ
error is written $(II + ZI + IZ - ZZ)/2$. If there are multiple CZ errors, this
Pauli sum has every possible combination of $I$ and $Z$ on the qubits on which
CZ errors are applied, each with a plus or minus sign. Thus, $m$ CZ errors
applied on different qubit pairs are decomposed into a Pauli sum with $4^m$
terms. If we treated each term as a different error element at this point,
each term would be assigned a probability equal to the square of its
amplitude. However, some terms may be equivalent to other terms up to
stabilizers. Such terms should interfere coherently. In the simulation of
pieceable CCZ on the 3 × 3 Bacon-Shor code, all the terms in the Pauli sum are
rewritten in an unambiguous way up to stabilizers, and terms interfere before
they are assigned a probability.
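The decomposition and the interference issue can be illustrated with a small sketch. It checks the identity $CZ = (II + ZI + IZ - ZZ)/2$ on diagonals, counts the $4^m$ terms for $m$ CZ errors on disjoint pairs, and then compares incoherent and coherent bookkeeping in a toy setting where $ZZ$ is assumed to be a stabilizer. That last assumption, like the function names, is purely hypothetical and is not a description of the simulation code used for the results.

```python
from itertools import product

# Amplitudes of the Pauli terms in CZ = (II + ZI + IZ - ZZ)/2.
AMP = {"II": 0.5, "ZI": 0.5, "IZ": 0.5, "ZZ": -0.5}


def diag(label):
    """Diagonal of a tensor product of I and Z (first letter = first qubit)."""
    d = [1]
    for ch in label:
        d = [a * b for a in d for b in ((1, 1) if ch == "I" else (1, -1))]
    return d


# Rebuild the diagonal of CZ from its Pauli sum.
cz_diag = [sum(c * diag(l)[i] for l, c in AMP.items()) for i in range(4)]


def expand(m):
    """Pauli sum for m CZ errors on m disjoint qubit pairs: label -> amplitude."""
    terms = {}
    for combo in product(AMP.items(), repeat=m):
        label = "".join(l for l, _ in combo)
        amp = 1.0
        for _, c in combo:
            amp *= c
        terms[label] = amp
    return terms


def reduce_mod_zz(label):
    """Coset representative when ZZ is (hypothetically) a stabilizer."""
    if label[0] == "Z":
        return "I" + ("I" if label[1] == "Z" else "Z")
    return label


incoherent, amplitude = {}, {}
for label, amp in AMP.items():
    rep = reduce_mod_zz(label)
    incoherent[rep] = incoherent.get(rep, 0.0) + amp ** 2  # square, then add
    amplitude[rep] = amplitude.get(rep, 0.0) + amp         # add, then square
coherent = {rep: a ** 2 for rep, a in amplitude.items()}

print(cz_diag, len(expand(2)), incoherent, coherent)
```

In this toy case the incoherent rule assigns probability 1/2 to each of the two classes, whereas adding amplitudes first shows that the CZ error acts as a single-qubit $Z$ with certainty, which is the kind of discrepancy the coherent treatment avoids.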
Now that we recognize the subtle issue as the coherent addition of the Pauli
terms, we argue that it does not affect the logical error rate except in the
3 × 3 Bacon-Shor code case. Firstly, note that the coherent addition can
only happen when the number of qubits in one block on
which CZ errors are applied is more than or equal to the
weight of stabilizers. This is because if stabilizers have
higher weight, multiplying a stabilizer necessarily gives
extra Paulis on the qubits that are not affected by CZ errors. This prevents
a term multiplied by a stabilizer from being the same as another term in the
Pauli sum. In the pieceable CCZ circuit on the 5-qubit code, CZ errors only
occur on the three qubits that support logical Z. Since stabilizers are
weight four, the coherent addition will not happen
for the above reason. For the pieceable CCZ circuit on
the 7-qubit code, we argue that although the coherent
addition may happen, it will not affect logical error rate.
Since stabilizers for the 7-qubit code are weight four, the
coherent addition could happen only when two X errors
go through in the same block in the first piece. However,
these X errors cannot be corrected because the 7-qubit
code is a perfect CSS code. Thus, all the error elements
where the coherent addition could happen end up with
logical errors regardless, and it does not matter whether
we accurately interfere the terms. For the pieceable CCZ
circuit on the 3 × 9 Bacon-Shor code, the situation is
similar to the 7-qubit code case; the coherent addition
could happen, but will not affect the logical error rate.
Since CCZs on the 3×9 Bacon-Shor code are transversal,
the number of qubits in the same code block on which
CZ errors are applied is at most two. Since weight-two
Z-gauge operators are aligned along a row, the coherent
addition could only happen when two CZ errors are ap-
plied on the two qubits in the same row in some code
block. However, all the terms in the Pauli sum for the
CZ errors in that block are Z-type errors whose weight
is less than or equal to two, and whose support is in the
same row. Since weight-one errors can be corrected by
the standard error correction, and weight-two errors in
the same row are equivalent to the identities up to stabi-
lizers for the Z-gauge Bacon-Shor code, the terms in the
Pauli sum are all correctable when the concerned coher-
ent addition could happen. Thus, it will not affect the
logical error rate.
Considering CZ errors as a Pauli sum is inefficient: $m$ CZ terms lead to
$4^m$ Pauli addends. However, in the simulation, we do not actually break down
all the CZ errors. In certain cases, we know for certain that the final error
correction will succeed in correcting the CZ error. One such case is that the
CZ error is applied over different code blocks and those code blocks do not
have any Z errors. The other case is that the CZ error is applied in one code
block, there are no Z errors in the block, and an intermediate error
correction correctly reports the locations on which the CZ error acts.
Also, we can reduce the number of CZ errors by re-
moving harmless CZ errors before the measurements in
the final error correction take place. A harmless error is
one that does not affect encoded states. When errors are only Paulis, as in
circuits that consist only of Clifford gates, such errors are just
stabilizers. The following theorem generalizes the condition for harmless
errors to the non-Pauli case.
Theorem 1. Let $E$ be an error operator, $S = \langle g_1, \dots, g_{n-k} \rangle$
be the stabilizers, $\langle g_{n-k+1}, \dots, g_{n+k} \rangle$ be the logical
operators of the code, and $|\bar{\psi}\rangle$ be an encoded state. If
$g_i^\dagger E^\dagger g_i E \in S$ for all $i = 1, \dots, n+k$, then
$E|\bar{\psi}\rangle = |\bar{\psi}\rangle$ up to global phase.
Proof. By the assumption, there exists a stabilizer $s_l$ such that
$g_i E = E g_i s_l$, $\forall i$. For $i = 1, \dots, n-k$, since
$$g_i E |\bar{\psi}\rangle = E g_i s_l |\bar{\psi}\rangle = E |\bar{\psi}\rangle, \tag{B1}$$
$E$ preserves the codeword space. Now for $i = n-k+1, \dots, n+k$, let
$|g_i^{(\pm)}\rangle$ be the eigenstate of the logical operator $g_i$ with
eigenvalue $\pm 1$, then
$$g_i E |g_i^{(\pm)}\rangle = E g_i s_l |g_i^{(\pm)}\rangle = \pm E |g_i^{(\pm)}\rangle. \tag{B2}$$
Thus, $E$ also preserves the logical space.
This theorem allows us to ignore the CZ errors that sat-
isfy the above condition, which greatly reduces the com-
putational task.
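To make the condition of Theorem 1 concrete, here is a toy check on a code that is not one of those studied in the paper: the three-qubit repetition code with stabilizers $ZZI$ and $IZZ$ and logical operators $XXX$ and $ZII$. This choice, and the helper names, are assumptions made only for illustration. The product of two CZ errors on qubit pairs (1,2) and (2,3) satisfies the condition and acts trivially on the code space, while a single CZ on (1,2) does not.

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


PAULI = {"I": [[1, 0], [0, 1]], "X": [[0, 1], [1, 0]], "Z": [[1, 0], [0, -1]]}


def pauli(label):
    m = [[1]]
    for ch in label:
        m = kron(m, PAULI[ch])
    return m


def cz_product(pairs):
    """Diagonal matrix of a product of CZ gates on the given pairs of 3 qubits."""
    def sign(i):
        bits = [(i >> 2) & 1, (i >> 1) & 1, i & 1]
        return (-1) ** sum(bits[a] * bits[b] for a, b in pairs)
    return [[sign(i) if i == j else 0 for j in range(8)] for i in range(8)]


# Toy code (assumed for illustration): stabilizers ZZI, IZZ; logicals XXX, ZII.
stabilizer_group = [pauli(l) for l in ("III", "ZZI", "IZZ", "ZIZ")]
generators_and_logicals = [pauli(l) for l in ("ZZI", "IZZ", "XXX", "ZII")]


def satisfies_theorem(E):
    """Check g^dag E^dag g E in S; all matrices here are real symmetric."""
    return all(matmul(matmul(g, E), matmul(g, E)) in stabilizer_group
               for g in generators_and_logicals)


def fixes_codewords(E):
    """E is diagonal, so it fixes |000> and |111> iff both diagonal entries are +1."""
    return E[0][0] == 1 and E[7][7] == 1


harmless = cz_product([(0, 1), (1, 2)])
harmful = cz_product([(0, 1)])
print(satisfies_theorem(harmless), fixes_codewords(harmless))
print(satisfies_theorem(harmful), fixes_codewords(harmful))
```

The check needs only conjugations of the $n + k$ generators and logical operators rather than a full Pauli expansion of the error, which is where the computational saving comes from.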
When intermediate error corrections are present, CZ
errors need to be broken down according to the Pauli
sum in the intermediate error corrections, and need to
be propagated until the error correction at the end. If
the number of intermediate corrections is zero or one, it
is rather easy to deal with, because the number of error
elements due to the CZ errors that need to be propagated
until the end is limited. In fact, except for the pieceable
5-qubit code, all the CZ errors that do not satisfy the
condition of Theorem 1 were broken down upon mea-
surement and tracked to see if they end with a logical
error.
For the 5-qubit code, to reduce the computational demand, we adopt the rule
that an error is declared a logical error as soon as some CZ errors are
measured in an intermediate error correction. Although this strategy would
cause some overestimation of the logical error
rate, we argue that the probability that CZ errors are
measured in an intermediate error correction is rather
small. CZ errors are measured in an intermediate error
correction in the following two cases. The first case is
that an X or Y -type error is caught by a CCZ gate in
the adaptive nonconstant-stabilizer measurement. It is
described in [19] that the adaptive nonconstant-stabilizer
measurement is only triggered when some constant stabi-
lizer measurements click due to an X or Y -type error only
for a single code block. The adaptive measurement may
contain CCZ gates connected between the ancilla block
and the code blocks whose constant stabilizers did not
click. Thus, an X or Y -type error is caught by a CCZ
gate in the adaptive measurement only when an X or
Y error triggers the adaptive measurement, the constant
measurement in different code blocks fail with X or Y
type error, and it goes to a CCZ gate in the adaptive mea-
surement. The second case is that a CZ error is caught
by a CNOT gate in the adaptive measurement. Note
that a CZ error only arises when an X or Y-type error
propagates through the CCZ gates in the code blocks.
CZ errors are then present in code blocks other than the
one in which the X or Y -type error exists. Also, CNOT
gates in the adaptive measurement could be only applied
to the code block whose constant stabilizers click. Thus,
a CZ error is caught by a CNOT gate in the adaptive
measurement only when an X or Y -type error generates
CZ errors in different code blocks, a later CCZ gate fails
to cancel the first X or Y -type error and generate an-
other X or Y -type error in the other code block that will
make the constant measurement click, and the CZ er-
ror goes into CNOT gate in the adaptive measurement.
These two cases are realized in very restricted situations,
so the contribution to the total logical error rate from
these cases would be rather small.
Another situation arises with two or more intermediate
corrections. The pieceable construction on the 5-qubit
code has multiple intermediate corrections, and they de-
tect X errors and notify possible error locations to the
final error correction so that the final error correction can
correct up to weight two located errors. However, mul-
tiple faults can cause two intermediate error corrections
to incorrectly notify more than two locations to the final
error correction. We declare those elements to be logical
errors.
Appendix C: Details on the error polynomials
Here, we describe how to obtain Eq. (1)-(3) from the
exact counting. We first consider Eq. (1), the probability
that one or two faults occur and that pattern is accepted
by all the verification modules through the propagation,
but ends up with a logical error. Due to the fault-tolerant
property, a single fault never causes a logical error. Thus,
it suffices to consider the cases when two faults occur. In
the simulation, each combination of two-fault patterns is assigned a
probability $\left(\frac{p_r}{4^r - 1}\right)\left(\frac{p_s}{4^s - 1}\right)$
if the faulty components are an $r$-qubit gate and an $s$-qubit gate. We
propagate all the errors until the end and sum up the probabilities of the
errors that lead to logical errors. During the propagation, these errors may
encounter verification processes. If they are accepted by the verification,
we keep propagating them. Otherwise, we stop propagating them so that they do
not contribute to the logical error rate. Let $Q_{\mathrm{fail,acc}}$ denote the
estimated logical error rate. Since each physical error rate is either $p_1$,
$p_2$ or $p_3$, it looks like
$$Q_{\mathrm{fail,acc}} = \sum_{r=1}^{3} \sum_{s \ge r}^{3} F^{(2)}_{rs} p_r p_s. \tag{C1}$$
Let $n_r$ be the total number of $r$-qubit gates. Since we assume that
different components fail independently, Eq. (1) is obtained as
$$P^{(2)}_{\mathrm{fail,acc}} = \left[\prod_{t=1}^{3} (1 - p_t)^{n_t}\right] \left(\sum_{r=1}^{3} \sum_{s \ge r}^{3} F^{(2)}_{rs} \frac{p_r}{1 - p_r} \frac{p_s}{1 - p_s}\right). \tag{C2}$$
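Similarly, $Q_{\mathrm{succ,acc}}$, the sum of the assigned probability of the
patterns that are accepted by all the verification modules and do not cause a
logical error, looks like
$$Q_{\mathrm{succ,acc}} = \sum_{r=1}^{3} S^{(1)}_r p_r + \sum_{r=1}^{3} \sum_{s \ge r}^{3} S^{(2)}_{rs} p_r p_s \tag{C3}$$
and Eq. (2) is obtained as
$$P^{(2)}_{\mathrm{succ,acc}} = \left[\prod_{t=1}^{3} (1 - p_t)^{n_t}\right] \cdot \left(1 + \sum_{r=1}^{3} \frac{S^{(1)}_r p_r}{1 - p_r} + \sum_{r=1}^{3} \sum_{s \ge r}^{3} \frac{S^{(2)}_{rs} p_r p_s}{(1 - p_r)(1 - p_s)}\right). \tag{C4}$$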
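The patterns that are not counted in either $Q_{\mathrm{fail,acc}}$ or
$Q_{\mathrm{succ,acc}}$ are rejected in some verification module. Thus, we
obtain Eq. (3) as
$$P^{(2)}_{\mathrm{rej}} = \left[\prod_{t=1}^{3} (1 - p_t)^{n_t}\right] \cdot \left(\sum_{r=1}^{3} \frac{A^{(1)}_r p_r}{1 - p_r} + \sum_{r=1}^{3} \sum_{s \ge r}^{3} \frac{A^{(2)}_{rs} p_r p_s}{(1 - p_r)(1 - p_s)}\right) \tag{C5}$$
where
$$A^{(1)}_r = n_r - S^{(1)}_r \tag{C6}$$
$$A^{(2)}_{rs} = \begin{cases} n_r n_s - F^{(2)}_{rs} - S^{(2)}_{rs} & (r \neq s) \\ \binom{n_r}{2} - F^{(2)}_{rr} - S^{(2)}_{rr} & (r = s) \end{cases} \tag{C7}$$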
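As a worked illustration of Eqs. (C2), (C4), (C6) and (C7), the sketch below plugs in the 3 × 3 Bacon-Shor parameters reported in Eqs. (C10)-(C13). The function names and the sample error rates are hypothetical and only show how the polynomials are evaluated; they are not the settings used for the figures.

```python
from math import comb, prod

# 3 x 3 Bacon-Shor parameters, Eqs. (C10)-(C13); keys are gate sizes r (and s).
n = {1: 252, 2: 180, 3: 27}
S1 = {1: 252, 2: 180, 3: 27}
F2 = {(1, 1): 4216.8, (1, 2): 4271.9, (1, 3): 783.5,
      (2, 2): 1194.5, (2, 3): 461.5, (3, 3): 34.9}
S2 = {(1, 1): 27409.2, (1, 2): 41088.1, (1, 3): 6020.5,
      (2, 2): 14915.5, (2, 3): 4398.5, (3, 3): 316.1}


def no_fault(p):
    """Probability that no component fails: prod_t (1 - p_t)^(n_t)."""
    return prod((1 - p[t]) ** n[t] for t in n)


def odds(p, r):
    return p[r] / (1 - p[r])


def p_fail_acc(p):
    """Eq. (C2)."""
    return no_fault(p) * sum(F2[r, s] * odds(p, r) * odds(p, s) for r, s in F2)


def p_succ_acc(p):
    """Eq. (C4)."""
    one = sum(S1[r] * odds(p, r) for r in S1)
    two = sum(S2[r, s] * odds(p, r) * odds(p, s) for r, s in S2)
    return no_fault(p) * (1 + one + two)


def rejection_coefficients():
    """Eqs. (C6) and (C7)."""
    a1 = {r: n[r] - S1[r] for r in n}
    a2 = {(r, s): (comb(n[r], 2) if r == s else n[r] * n[s]) - F2[r, s] - S2[r, s]
          for r, s in F2}
    return a1, a2


p = {1: 1e-4, 2: 1e-4, 3: 1e-4}  # hypothetical physical error rates
A1, A2 = rejection_coefficients()
print(p_fail_acc(p), p_succ_acc(p), A1, A2)
```

For this construction the rejection coefficients come out as zero up to rounding, so every one- and two-fault pattern is counted in either the failing or the succeeding polynomial.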
Special care is required for the 5-qubit code because $n_r$ cannot be
definitely determined owing to the adaptive measurements. Note that at most
two adaptive measurements are triggered when one or two faults occur. Thus,
taking $n_r$ to include the two largest adaptive measurements, which are the
ones with 13-CAT and 9-CAT, the lower bound in Eq. (7) still holds. Instead of
Eq. (C4), we take
$$P^{(2)}_{\mathrm{succ,acc}} = \left[\prod_{t=1}^{3} (1 - p_t)^{n'_t}\right] \cdot \left(1 + \sum_{r=1}^{3} \frac{S^{(1)}_r p_r}{1 - p_r}\right) + \left[\prod_{t=1}^{3} (1 - p_t)^{n_t}\right] \cdot \left(\sum_{r=1}^{3} \sum_{s \ge r}^{3} \frac{S^{(2)}_{rs} p_r p_s}{(1 - p_r)(1 - p_s)}\right) \tag{C8}$$
where $n'_r$ is the number of fault locations not including adaptive
measurements.
For the 5-qubit code we also use
$$P_{\mathrm{acc}} = \prod_j P_{\mathrm{acc},j} = \prod_j (1 - P_{\mathrm{rej},j}) < \prod_{j'} (1 - P^{(2)}_{\mathrm{rej},j'}) \tag{C9}$$
where $j$ is taken over all the verification modules and $j'$ is taken over
all the verification modules except adaptive measurements.
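The following are the obtained values for the parameters for each
construction.
• 3 × 3 Bacon-Shor
$$n_1 = 252,\ n_2 = 180,\ n_3 = 27 \tag{C10}$$
$$S^{(1)}_r = (252, 180, 27) \tag{C11}$$
$$F^{(2)}_{rs} = \begin{pmatrix} 4216.8 & 4271.9 & 783.5 \\ & 1194.5 & 461.5 \\ & & 34.9 \end{pmatrix} \tag{C12}$$
$$S^{(2)}_{rs} = \begin{pmatrix} 27409.2 & 41088.1 & 6020.5 \\ & 14915.5 & 4398.5 \\ & & 316.1 \end{pmatrix} \tag{C13}$$
• Pieceable 7-qubit
$$n_1 = 648,\ n_2 = 480,\ n_3 = 21 \tag{C14}$$
$$S^{(1)}_r = (383, 224, 21) \tag{C15}$$
$$F^{(2)}_{rs} = \begin{pmatrix} 13258.4 & 12722.6 & 3581.4 \\ & 3077.3 & 1855.3 \\ & & 176.7 \end{pmatrix} \tag{C16}$$
$$S^{(2)}_{rs} = \begin{pmatrix} 56460.9 & 68953.7 & 4461.6 \\ & 20748.8 & 2848.7 \\ & & 33.3 \end{pmatrix} \tag{C17}$$
• 3 × 9 Bacon-Shor
$$n_1 = 2736,\ n_2 = 864,\ n_3 = 27 \tag{C18}$$
$$S^{(1)}_r = (1524, 566.4, 27) \tag{C19}$$
$$F^{(2)}_{rs} = \begin{pmatrix} 52074 & 43098.4 & 7049.2 \\ & 8663.0 & 2968.1 \\ & & 183.3 \end{pmatrix} \tag{C20}$$
$$S^{(2)}_{rs} = \begin{pmatrix} 1013640 & 748636 & 34098.8 \\ & 138296 & 12324.7 \\ & & 167.7 \end{pmatrix} \tag{C21}$$
• Pieceable 5-qubit
$$n_1 = 3365,\ n_2 = 1228,\ n_3 = 41 \tag{C22}$$
$$n'_1 = 2967,\ n'_2 = 1152,\ n'_3 = 27 \tag{C23}$$
$$S^{(1)}_r = (1475, 457.6, 27) \tag{C24}$$
$$F^{(2)}_{rs} = \begin{pmatrix} 113030.0 & 85261.6 & 14679.2 \\ & 16067.4 & 5551.4 \\ & & 332.5 \end{pmatrix} \tag{C25}$$
$$S^{(2)}_{rs} = \begin{pmatrix} 639301.0 & 482043.0 & 20392.4 \\ & 90716.0 & 7554.7 \\ & & 59.3 \end{pmatrix} \tag{C26}$$
• 7-qubit with magic state
$$n_1 = 1138,\ n_2 = 743,\ n_3 = 14 \tag{C27}$$
$$S^{(1)}_r = (612, 324.3, 6.9) \tag{C28}$$
$$F^{(2)}_{rs} = \begin{pmatrix} 25436.5 & 24565.9 & 1078.5 \\ & 6232.4 & 521.1 \\ & & 26.9 \end{pmatrix} \tag{C29}$$
$$S^{(2)}_{rs} = \begin{pmatrix} 154650 & 166625 & 3921.4 \\ & 44308.3 & 2178.5 \\ & & 18.6 \end{pmatrix} \tag{C30}$$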
Appendix D: Transformation matrix for volume
calculation
As explained in the main text, the circuit volume for concatenated codes at
higher concatenation level is described by a transformation matrix $A$ where
$A_{ij} = N^{G_j}_{G_i}$. We show the matrices for the pieceable 3 × 3
Bacon-Shor code, the pieceable 7-qubit code, and the 7-qubit code with magic
state, which are denoted by $A_{pBS}$, $A_{p7}$, $A_{m7}$ respectively. We take
the following order for gates: G = {3-qubit gate, 2-qubit gate, single-qubit
gate, $|0\rangle$ and $|+\rangle$ preparation, X basis and Z basis
measurement}. For preparation of $|\bar{0}\rangle$ and $|\bar{+}\rangle$ on the
7-qubit code, we use the method proposed by Goto [29], which requires just one
additional ancilla.
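One way to read such a matrix is sketched here, under the assumption that row $i$ lists how many gates of each type $G_j$ are needed for one encoded gate $G_i$, so that going up one concatenation level amounts to one more multiplication by $A$. The code uses the pieceable Bacon-Shor matrix $A_{pBS}$ given in this appendix; the function names are illustrative, and the conversion from gate counts to circuit volume described in the main text is not reproduced.

```python
# Pieceable 3 x 3 Bacon-Shor matrix A_pBS. Gate order:
# [3-qubit gate, 2-qubit gate, single-qubit gate, preparation, measurement].
A_PBS = [
    [27, 90, 45, 54, 54],
    [0, 69, 30, 36, 36],
    [0, 30, 24, 18, 18],
    [0, 6, 3, 9, 0],
    [0, 0, 0, 0, 9],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gate_counts(A, level):
    """A**level: row i holds the physical gate counts of a level-`level` gate G_i.

    Assumes each level substitutes every gate by its encoded version (a sketch).
    """
    size = len(A)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(level):
        result = matmul(result, A)
    return result


level1_ccz = gate_counts(A_PBS, 1)[0]  # physical gates in one level-1 logical CCZ
level2_ccz = gate_counts(A_PBS, 2)[0]  # physical gates in one level-2 logical CCZ
print(level1_ccz, level2_ccz)
```

Because the first column has a single nonzero entry, the number of physical three-qubit gates in a level-$L$ logical CCZ is simply $27^L$ in this reading.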
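$$A_{pBS} = \begin{pmatrix}
27 & 90 & 45 & 54 & 54 \\
0 & 69 & 30 & 36 & 36 \\
0 & 30 & 24 & 18 & 18 \\
0 & 6 & 3 & 9 & 0 \\
0 & 0 & 0 & 0 & 9
\end{pmatrix}$$
$$A_{p7} = \begin{pmatrix}
21 & 162 & 240 & 72 & 72 \\
0 & 79 & 104 & 32 & 32 \\
0 & 36 & 59 & 16 & 16 \\
0 & 11 & 22 & 8 & 1 \\
0 & 0 & 0 & 0 & 7
\end{pmatrix}$$
$$A_{m7} = \begin{pmatrix}
14 & 267 & 504 & 136 & 136 \\
0 & 79 & 104 & 32 & 32 \\
0 & 36 & 59 & 16 & 16 \\
0 & 11 & 22 & 8 & 1 \\
0 & 0 & 0 & 0 & 7
\end{pmatrix}$$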
Appendix E: Detailed resource analysis for surface
code
We describe the detailed resource analysis to imple-
ment logical Toffoli gate on the surface code. There are
mainly two ways to do it, synthesizing a Toffoli gate using
Clifford gates and T gates, and injecting a logical Toffoli
state by gate teleportation.
Consider the first method, in the context of the Toffoli implementation
proposed by Jones [42] using four T gates. The T gates are implemented by a
$|T\rangle$ state and gate teleportation, where the $|T\rangle$ state is
purified by a distillation protocol. We use the 15-1 protocol [6, 22], which
reduces the error rate of $|T\rangle$ from $O(p)$ to $O(p^3)$, because it
requires the smallest circuit volume compared to others [9, 43, 44]. Since the
region of the physical error rate in which the pieceable construction helps to
reduce the error rate is $p < 10^{-4}$, as can be seen in Fig. 1, the logical
error rate of the magic state distilled once is $< 10^{-12}$. Although this
error rate may not be sufficiently low depending on the target logical error
rate, one distillation already gives large overheads. Thus, we consider the
circuit volume for one distillation as a lower bound and proceed with the
discussion.
It may come as a surprise that other distillation protocols with a better
conversion rate between noisy and purified magic states have larger circuit
volume. This is because the Hadamard gate and phase gate are not transversal
on the surface code. To implement the Hadamard gate or phase gate
fault-tolerantly, some nontrivial techniques, such as state injection, lattice
surgery [45], code deformation [46], or surface folding [47], are required.
These take many surface code steps, which affect the circuit volume. Even if
the conversion rate between noisy T states and purified T states is high, the
circuit volume will be large if the protocol requires many costly Clifford
gates. Especially when only one distillation is required, a poor conversion
rate does not hurt circuit volume that much.
Let us analyze the number of surface code cycles and the circuit volume for
each gate that is necessary to implement the logical Toffoli gate. Let $C_G$
and $V_G$ be the surface code cycles and circuit volume it takes to implement
$G$. We discuss circuit volume in units of [qubit·cycle] and then convert it to
[qubit·step] using the fact that one surface code cycle consists of six steps
[22]. Also, let $d$ be the surface code distance, and $n = (2d - 1)^2$ be the
number of physical qubits on a surface. Necessary components here are
{$|\bar{0}\rangle$ and $|\bar{+}\rangle$ preparation, CNOT, Hadamard, Phase}.
For logical state preparation, we initialize a surface with physical
$|0\rangle$ for $|\bar{0}\rangle$ preparation, and $|+\rangle$ for
$|\bar{+}\rangle$ preparation. After $d$ rounds of error correction, an
appropriate recovery can be determined to prepare the desired logical state
fault-tolerantly. Thus, we find $C_{\mathrm{prep}} = d$, $V_{\mathrm{prep}} = nd$.
The CNOT gate can be transversally implemented if we allow non-locality or a
3D layered architecture. However, since one of the striking features of
surface codes is local interactions in a 2D architecture, we use lattice
surgery to implement the CNOT gate [45]. First, prepare a surface with the
$|\bar{+}\rangle$ state between the control surface and the target surface.
The control surface and the intermediate surface are merged while obtaining
measurement syndromes. This corresponds to a $\bar{Z}\bar{Z}$ measurement.
After that, the surface is split into the two original surfaces and the
intermediate surface is merged with the target surface, which corresponds to
an $\bar{X}\bar{X}$ measurement. It ends with splitting it into the two
original surfaces. Since merger and splitting each take $d$ rounds of error
correction to stabilize the surface,
$$C_{\mathrm{CNOT}} = C_{\mathrm{prep}} + 4d = 5d \tag{E1}$$
and
$$V_{\mathrm{CNOT}} = V_{\mathrm{prep}} + (3n + 2(2d - 1))(C_{\mathrm{prep}} + 4d) = 6d - 44d^2 + 64d^3. \tag{E2}$$
The Hadamard gate is also implemented by the lat-
tice surgery. In the lattice surgery technique, firstly
Hadamard gates are applied transversally. To correct the
orientation of the boundary, additional qubits are merged
to the boundary and some qubits are split out so that it
restores the original boundary orientation. The protocol
ends with moving the surface back to the original posi-
tion. It takes d cycles to stabilize the original surface
after applying transversal H, d cycles for lattice merger,
d cycles for lattice splitting, and d cycles for SWAP op-
erations to move the lattice back to the original position.
Thus, $C_H = 4d$. For circuit volume, we need a surface that is bigger by one
more column and row of qubits to carry out the merger and split. Thus,
$V_H = (2d)^2 C_H = 16d^3$.
For implementing the phase gate, we use the circuit in Fig. 8. An advantage of
this circuit is that the ancilla state
$|S\rangle = S|+\rangle = (|0\rangle + i|1\rangle)/\sqrt{2}$ is preserved. Thus,
once a purified $|S\rangle$ state is prepared at the beginning of the
computation, it can be reused whenever a phase gate needs to be applied. After
averaging over a whole computation, the volume used for the distillation
process at the beginning will be negligible per logical gate construction.
Note that if only local interactions are allowed, it may take additional
circuit volume when the qubit to which the phase gate should be applied is far
from the stored $|S\rangle$ state. Thus, our estimation should be considered a
lower bound on the actual circuit volume in the setting in which only local
interactions are allowed. It gives
$C_S = 2C_{\mathrm{CNOT}} + 2C_H = 18d$ and
$V_S = 2V_{\mathrm{CNOT}} + 2V_H + 2nC_H = 20d - 120d^2 + 192d^3$.
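The closed-form polynomials above follow mechanically from the building-block definitions, which the short sketch below re-derives numerically for a few code distances. The function names are illustrative; the formulas are those stated in the text for preparation, CNOT, Hadamard and phase gates.

```python
def n_phys(d):
    """Number of physical qubits on one distance-d surface, n = (2d - 1)^2."""
    return (2 * d - 1) ** 2


def c_prep(d):
    return d


def v_prep(d):
    return n_phys(d) * d


def c_cnot(d):  # Eq. (E1)
    return c_prep(d) + 4 * d


def v_cnot(d):  # Eq. (E2)
    return v_prep(d) + (3 * n_phys(d) + 2 * (2 * d - 1)) * (c_prep(d) + 4 * d)


def c_h(d):
    return 4 * d


def v_h(d):
    return (2 * d) ** 2 * c_h(d)


def c_s(d):
    return 2 * c_cnot(d) + 2 * c_h(d)


def v_s(d):
    return 2 * v_cnot(d) + 2 * v_h(d) + 2 * n_phys(d) * c_h(d)


# Compare the definitions with the stated closed forms at several distances.
for d in (3, 5, 11, 21):
    assert c_cnot(d) == 5 * d
    assert v_cnot(d) == 6 * d - 44 * d**2 + 64 * d**3
    assert v_h(d) == 16 * d**3
    assert c_s(d) == 18 * d
    assert v_s(d) == 20 * d - 120 * d**2 + 192 * d**3

print(v_cnot(5), v_h(5), v_s(5))
```

All volumes here are in units of [qubit·cycle], matching the convention of this appendix.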
Combining these building blocks, we find the number
of cycles and volume required to implement a T gate and
a Toffoli gate.
For distilling a T state, $|T\rangle = T|+\rangle$, we use the circuit in [22],
which takes 15 $|T\rangle$ states and outputs 1 $|T\rangle$ with lower error
rate. It takes 7 CNOT rounds of surface code cycles and 2 steps for
transversal T and measurements, which is 1/4 surface code cycle. Ignoring the
last 1/4 cycle, we get $C_{|T\rangle} = 7C_{\mathrm{CNOT}} = 35d$. With some
parallelization, we get
$V_{|T\rangle} = 16V_{\mathrm{prep}} + V_{\mathrm{CNOT}} + 7(5V_{\mathrm{CNOT}} + 6nC_{\mathrm{CNOT}}) + \frac{1}{4} \cdot 16nd = 446d - 2504d^2 + 3224d^3$.
For implementing a T gate, we use the usual gate teleportation technique [48].
The S gate correction is applied with probability 1/2. We get
$C_T = C_{\mathrm{CNOT}} + \frac{1}{2}C_S = 14d$ and
$V_T = V_{|T\rangle} + V_{\mathrm{CNOT}} + \frac{1}{2}V_S = 462d - 2608d^2 + 3384d^3$.
Since the surface code is CSS, we can transversally make measurements on all
the data qubits and extract the eigenvalue of the measured operator. Since
measurement is thus done in only one time step, which is 1/8 of one surface
code cycle, we ignore the volume due to the measurement.
Toffoli gate synthesis in [42] consists of two steps. In the first part, one
constructs the Toffoli* gate, which is a Toffoli gate followed by a
controlled-$S^\dagger$ gate, where four T gates and two H gates are used. Also,
note that one logical ancilla block is used. The second part takes the
Toffoli* gate to the usual Toffoli gate with the help of one additional
ancilla block. By construction of the synthesis circuit, we get
$$C_{\mathrm{Toffoli}^*} = 2C_H + 4C_{\mathrm{CNOT}} + C_T = 42d \tag{E3}$$
and
$$V_{\mathrm{Toffoli}^*} = V_{\mathrm{prep}} + 6nC_H + 2V_H + 8V_{\mathrm{CNOT}} + 4V_T = 1921d - 10884d^2 + 14180d^3. \tag{E4}$$
The second part of the circuit gives
$$C_{\mathrm{Toffoli}} = C_{\mathrm{Toffoli}^*} + C_S + C_{\mathrm{CNOT}} + C_H + C_{|T\rangle} = 104d \tag{E5}$$
and
$$V_{\mathrm{Toffoli}} = V_{\mathrm{Toffoli}^*} + V_{\mathrm{prep}} + nC_{\mathrm{Toffoli}^*} + V_S + V_{\mathrm{CNOT}} + V_H + 3n(C_S + C_{\mathrm{CNOT}} + C_H) + n(C_{\mathrm{Toffoli}^*} + C_{\mathrm{CNOT}} + C_H) = 2118d - 11732d^2 + 15136d^3 \tag{E6}$$
where the unit for the volume is [qubit·cycle]. We included $C_{|T\rangle}$ in
$C_{\mathrm{Toffoli}}$ because cycles in the distillation circuit also
contribute to an increase in the final logical error rate. Note that it
includes the ancilla qubits holding the $|S\rangle$ state that is kept during
the whole computation.
Another way to implement the logical Toffoli gate on the surface code is to
use a Toffoli state. To locally prepare the Toffoli state, we use the protocol
that takes eight $|H\rangle$ states and outputs one Toffoli state [49]. In the
preparation circuit, there are two $Y(\pi/4)$ gates and four $Y(-\pi/4)$ gates,
which are rotations with respect to the Y axis. These gates are implemented
using an $|H\rangle$ state with Y basis measurement, a controlled-Y gate, and a
$Y(\pm\pi/2)$ gate. To implement these gates on the surface code, we use phase
gates and Hadamard gates to rotate them to X basis measurement, CNOT gate, and
phase gate. We then obtain
$$C_{Y(\pi/4)} = C_S + C_{\mathrm{CNOT}} + C_S + (2C_H + C_S)/2 = 54d \tag{E7}$$
and
$$V_{Y(\pi/4)} = V_{\mathrm{prep}} + V_S + V_{\mathrm{CNOT}} + 2V_S + (2V_H + V_S)/2 = 77d - 468d^2 + 756d^3. \tag{E8}$$
Using these, we obtain
$$C_{|\mathrm{Toffoli}\rangle} = 7C_{\mathrm{CNOT}} + (15/2)C_S + 3C_H = 182d \tag{E9}$$
and
$$V_{|\mathrm{Toffoli}\rangle} = 4V_{\mathrm{prep}} + 3(2V_{\mathrm{CNOT}} + 2V_{Y(\pi/4)} + 2nC_{Y(\pi/4)}) + V_{\mathrm{CNOT}} + 2nC_{\mathrm{CNOT}} = 842d - 4468d^2 + 6336d^3 \tag{E10}$$
where $|\mathrm{Toffoli}\rangle$ refers to the Toffoli state. Cycles and volume
for the teleportation circuit, which we write $C_{\mathrm{tele}}$ and
$V_{\mathrm{tele}}$, are
$$C_{\mathrm{tele}} = C_{\mathrm{CNOT}} + \tfrac{1}{2}(3C_{\mathrm{CNOT}} + 2C_H) = 16.5d \tag{E11}$$
$$V_{\mathrm{tele}} = 3V_{\mathrm{CNOT}} + \{4nC_H + 3(V_{\mathrm{CNOT}} + nC_{\mathrm{CNOT}})\}/2 = 42.5d - 260d^2 + 350d^3. \tag{E12}$$
Combining all of them, we get
$$C_{\mathrm{Toffoli}} = 198.5d \tag{E13}$$
and
$$V_{\mathrm{Toffoli}} = 7076d - 37824d^2 + 53488d^3 \tag{E14}$$
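The intermediate expressions for the Toffoli-state route, Eqs. (E7)-(E13), can be checked in the same way. This is a sketch with illustrative function names; it starts from the closed forms for the building blocks stated earlier in this appendix and uses exact rational arithmetic for the half-integer coefficients.

```python
from fractions import Fraction as Fr


def n_phys(d):
    return (2 * d - 1) ** 2


# Building blocks, using the closed forms stated earlier in this appendix.
def v_prep(d):
    return n_phys(d) * d


def c_cnot(d):
    return 5 * d


def v_cnot(d):
    return 6 * d - 44 * d**2 + 64 * d**3


def c_h(d):
    return 4 * d


def v_h(d):
    return 16 * d**3


def c_s(d):
    return 18 * d


def v_s(d):
    return 20 * d - 120 * d**2 + 192 * d**3


def c_y(d):  # Eq. (E7)
    return c_s(d) + c_cnot(d) + c_s(d) + Fr(2 * c_h(d) + c_s(d), 2)


def v_y(d):  # Eq. (E8)
    return v_prep(d) + v_s(d) + v_cnot(d) + 2 * v_s(d) + Fr(2 * v_h(d) + v_s(d), 2)


def c_state(d):  # Eq. (E9)
    return 7 * c_cnot(d) + Fr(15, 2) * c_s(d) + 3 * c_h(d)


def v_state(d):  # Eq. (E10)
    return (4 * v_prep(d) + 3 * (2 * v_cnot(d) + 2 * v_y(d) + 2 * n_phys(d) * c_y(d))
            + v_cnot(d) + 2 * n_phys(d) * c_cnot(d))


def c_tele(d):  # Eq. (E11)
    return c_cnot(d) + Fr(3 * c_cnot(d) + 2 * c_h(d), 2)


def v_tele(d):  # Eq. (E12)
    return 3 * v_cnot(d) + Fr(4 * n_phys(d) * c_h(d)
                              + 3 * (v_cnot(d) + n_phys(d) * c_cnot(d)), 2)


for d in (3, 5, 11, 21):
    assert c_y(d) == 54 * d
    assert v_y(d) == 77 * d - 468 * d**2 + 756 * d**3
    assert c_state(d) == 182 * d
    assert v_state(d) == 842 * d - 4468 * d**2 + 6336 * d**3
    assert c_tele(d) == Fr(33, 2) * d
    assert v_tele(d) == Fr(85, 2) * d - 260 * d**2 + 350 * d**3
    assert c_state(d) + c_tele(d) == Fr(397, 2) * d  # Eq. (E13)

print(c_state(5) + c_tele(5), v_state(5), v_tele(5))
```

Using `Fraction` avoids any floating-point rounding in the terms that carry a factor of 1/2.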
where the unit for the volume is [qubit·cycle]. Fig. 9 shows circuit volume
with unit [qubit·step] in terms of code distance for both ways of
implementation. We can see that the scheme with the Toffoli state has lower
circuit volume. This is why the scheme with the Toffoli state is discussed in
the main text.
FIG. 8. Circuit identity used for implementing S gate (input $|\psi\rangle$
and ancilla $|S\rangle$, two controlled gates from $|\psi\rangle$ and two H
gates on the ancilla, outputs $S|\psi\rangle$ and $|S\rangle$). Here
$|S\rangle = S|+\rangle = (|0\rangle + i|1\rangle)/\sqrt{2}$.
FIG. 9. Circuit volume $V$ versus code distance $d$ for two different
implementations of Toffoli gate. Dashed: gate synthesis using T gate. Solid:
Toffoli state scheme.
[1] B. Zeng, A. Cross, and I. L. Chuang, IEEE Transactions
on Information Theory 57, 6272 (2011).
[2] X. Chen, H. Chung, A. W. Cross, B. Zeng, and I. L.
Chuang, Phys. Rev. A 78, 012353 (2008).
[3] B. Eastin and E. Knill, Phys. Rev. Lett. 102, 110502
(2009).
[4] S. Bravyi and R. König, Phys. Rev. Lett. 110, 170503
(2013).
[5] D. Gottesman and I. L. Chuang, Nature (London) 402,
390 (1999).
[6] S. Bravyi and A. Kitaev, Phys. Rev. A 71, 022316 (2005).
[7] C. Jones, Phys. Rev. A 87, 042305 (2013).
[8] E. T. Campbell and M. Howard, Phys. Rev. A 95, 022316
(2017).
[9] S. Bravyi and J. Haah, Phys. Rev. A 86, 052329 (2012).
[10] A. Paetznick and B. W. Reichardt, Phys. Rev. Lett. 111,
090505 (2013).
[11] J. T. Anderson, G. Duclos-Cianci, and D. Poulin, Phys.
Rev. Lett. 113, 080501 (2014).
[12] H. Bombín, New J. Phys. 17, 083002 (2015).
[13] T. Jochym-O’Connor and R. Laflamme, Phys. Rev. Lett.
112, 010505 (2014).
[14] E. Nikahd, M. Sedighi, and M. Saheb Zamani, Phys.
Rev. A 96, 032337 (2017).
[15] E. Nikahd, M. S. Zamani, and M. Sedighi,
arXiv:1610.03309 (2016).
[16] C. Chamberland, T. Jochym-O’Connor, and
R. Laflamme, Phys. Rev. Lett. 117, 010501 (2016).
[17] C. Chamberland, T. Jochym-O’Connor, and
R. Laflamme, Phys. Rev. A 95, 022313 (2017).
[18] K. M. Svore, A. W. Cross, I. L. Chuang, and A. V. Aho,
Quantum Info. Comput. 6, 193 (2006).
[19] T. J. Yoder, R. Takagi, and I. L. Chuang, Phys. Rev. X
6, 031039 (2016).
[20] R. Chao and B. W. Reichardt, arXiv:1705.05365
(2017).
[21] T. J. Yoder, arXiv:1705.01686 (2017).
[22] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N.
Cleland, Phys. Rev. A 86, 032324 (2012).
[23] P. Aliferis, D. Gottesman, and J. Preskill, Quantum Inf.
Comput. 6, 97 (2006).
[24] L. S. Bishop, S. Bravyi, A. Cross, J. M. Gambetta, and
J. Smolin, (2017), dal.objectstorage.open.softlayer.com.
[25] R. Takagi, T. Yoder, and I. Chuang, “Circuit database,”
GitHub repository.
[26] A. M. Steane, Phys. Rev. Lett. 78, 2252 (1997).
[27] P. W. Shor, in Proceedings of the 37th Annual Symposium
on Foundations of Computer Science, FOCS ’96 (IEEE
Computer Society, Washington, DC, USA, 1996) pp. 56–.
[28] A. Steane, Fortschritte der Physik 46, 443 (1998).
[29] H. Goto, Sci. Rep. 6, 19578 (2016).
[30] X. Zhou, D. W. Leung, and I. L. Chuang, Phys. Rev. A
62, 052316 (2000).
[31] C. Monroe, R. Raussendorf, A. Ruthven, K. R. Brown,
P. Maunz, L.-M. Duan, and J. Kim, Phys. Rev. A 89,
022317 (2014).
[32] A. G. Fowler, A. M. Stephens, and P. Groszkowski, Phys.
Rev. A 80, 052312 (2009).
[33] D. S. Wang, A. G. Fowler, A. M. Stephens, and L. C. L.
Hollenberg, Quantum Inf. Comput. 10, 456 (2010).
[34] D. S. Wang, A. G. Fowler, and L. C. L. Hollenberg, Phys.
Rev. A 83, 020302 (2011).
[35] J. R. Wootton and D. Loss, Phys. Rev. Lett. 109, 160503
(2012).
[36] D. Poulin, Phys. Rev. A 74, 052333 (2006).
[37] J. Fern, Phys. Rev. A 77, 010301 (2008).
[38] E. Knill, Nature (London) 434, 39 (2005).
[39] T. J. Yoder and I. H. Kim, Quantum 1, 2 (2017).
[40] A. W. Cross, D. P. Divincenzo, and B. M. Terhal, Quan-
tum Info. Comput. 9, 541 (2009).
[41] D. Gottesman, in Group22: Proceedings of the XXII In-
ternational Colloquium on Group Theoretical Methods
in Physics (International Press, Cambridge, MA, USA,
1999) pp. 32–43.
[42] C. Jones, Phys. Rev. A 87, 022328 (2013).
[43] A. M. Meier, B. Eastin, and E. Knill, Quantum Inf.
Comput. 13, 195 (2013).
[44] B. W. Reichardt, Quantum Information Processing 4,
251 (2005).
[45] C. Horsman, A. G. Fowler, S. Devitt, and R. V. Meter,
New J. Phys. 14, 123011 (2012).
[46] H. Bombin and M. A. Martin-Delgado, J. Phys. A 42,
095302 (2009).
[47] J. E. Moussa, Phys. Rev. A 94, 042316 (2016).
[48] M. A. Nielsen and I. L. Chuang, Quantum Computation
and Quantum Information (Cambridge university press,
2010).
[49] B. Eastin, Phys. Rev. A 87, 032321 (2013).
