In theory, quantum computers can efficiently simulate quantum physics, factor large numbers and estimate integrals, thus solving otherwise intractable computational problems. In practice, quantum computers must operate with noisy devices called "gates" that tend to destroy the fragile quantum states needed for computation. The goal of fault-tolerant quantum computing is to compute accurately even when gates have a high probability of error each time they are used. Here we give evidence that accurate quantum computing is possible with error probabilities above 3 % per gate, which is significantly higher than what was previously thought possible. However, the resources required for computing at such high error probabilities are excessive. Fortunately, they decrease rapidly with decreasing error probabilities. If we had quantum resources comparable to the considerable resources available in today's digital computers, we could implement non-trivial quantum computations at error probabilities as high as 1 % per gate.
the EPG is smaller than a threshold, then scalable quantum computing is possible. [5] [6] [7] [8] Thresholds depend on additional assumptions on the error model and device capabilities. Estimated thresholds vary from below $10^{-6}$ [5] [6] [7] [8] to $3 \times 10^{-3}$ [9], with $10^{-4}$ [10] often quoted as the EPG to be achieved in experimental quantum computing.
Many experimental proposals for quantum computing claim to achieve EPGs below $10^{-4}$ in theory. However, in the few cases where experiments with two quantum bits (qubits) have been performed, the EPGs currently achieved are much higher, $3 \times 10^{-2}$ or more in ion traps [11, 12] and liquid-state NMR [13, 14]. The first goal of our work is to give evidence that scalable quantum computing is possible at EPGs above $3 \times 10^{-2}$. While this is encouraging, the fault-tolerant architecture that achieves this is extremely impractical because of its large resource requirements. To reduce the resource requirements, lower EPGs are required. The second goal of our work is to give a fault-tolerant architecture (called the "$C_4/C_6$ architecture") well suited to EPGs between $10^{-4}$ and 10
and to determine its resource requirements, which we compare to the state of the art in scalable quantum computing as exemplified by the work of Steane [9].

Fault-tolerant architectures realize low-error qubits and gates by encoding them with error-correcting codes. A standard technique for amplifying error reduction is concatenation. Suppose we have a scheme that, starting with qubits and gates at one EPG, produces encoded qubits and gates that have a lower EPG. Provided the error model for encoded gates is sufficiently well behaved, we can then apply the same scheme to the encoded qubits and gates to obtain a next level of encoded qubits and gates with much lower EPGs. Thus, a concatenated fault-tolerant architecture involves a hierarchy of repeatedly encoded qubits and gates. The hierarchy is described in terms of levels of encoding, with the physical qubits and gates being at level 0. The top level is used for implementing quantum computations, and its qubits, gates, EPGs, etc. are referred to as being "logical". Typically, the EPGs decrease superexponentially with the number of levels, provided that the physical EPG is below the threshold for the architecture in question.
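As a rough illustration of why concatenation can drive errors down superexponentially, the following minimal sketch iterates a generic recursion $p_{l+1} = c\,p_l^2$. The quadratic form and the constant `c` are illustrative assumptions only and are not the error model of the architecture described in this work; under them, the threshold is $1/c$.

```python
def concatenated_error(p0, c, levels):
    """Return [p_0, ..., p_levels] under the ASSUMED map p -> c * p**2.

    p0 is the physical error probability and c a hypothetical constant.
    Values are capped at 1.0 so they remain valid probabilities.
    """
    errors = [p0]
    for _ in range(levels):
        errors.append(min(1.0, c * errors[-1] ** 2))
    return errors


if __name__ == "__main__":
    c = 100.0  # hypothetical constant; the implied threshold is 1/c = 0.01
    for p0 in (0.005, 0.02):  # one value below and one above that threshold
        print(p0, concatenated_error(p0, c, 4))
```

Below the assumed threshold the error shrinks at every level, with the exponent of $p_0$ doubling each time; above it, the encoded error grows instead.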
The $C_4/C_6$ architecture differs from previous ones in five significant ways. First, we use the simplest possible error-detecting codes, thus avoiding the complexity of even the smallest error-correcting codes. Error correction is added naturally by concatenation. Second, error correction is performed in one step and combined with logical gates by means of error-correcting teleportation. This minimizes the number of gates contributing to errors before they are corrected. Third, the fault-tolerant architecture is based on a minimal set of operations with only one unitary gate, the controlled-NOT. Although this set does not suffice for universal quantum computing, it is possible to bootstrap other gates. Fourth, verification of the needed ancillary states (logical Bell states) largely avoids the traditional syndrome-based schemes. Instead, we use hierarchical teleportations. Fifth, the highest thresholds are obtained by introducing the model of postselected computing with its own thresholds, which may be higher than those for standard quantum computing. Our fault-tolerant implementation of postselected computing has the property that it can be used to prepare states sufficient for (standard) scalable quantum computing.

Basics. For an introduction to quantum information, computing and error correction, see [15]. The unit of quantum information is the qubit, whose states are superpositions $\alpha|0\rangle + \beta|1\rangle$. Qubits are acted on by the Pauli operators $X = \sigma_x$ (bit flip), $Z = \sigma_z$ (sign flip) and $Y = \sigma_y = i\sigma_x\sigma_z$. The identity operator is $I$. One-qubit gates include preparation of $|0\rangle$ and $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, Z-measurement (distinguishing between $|0\rangle$ and $|1\rangle$), X-measurement (distinguishing between $|+\rangle$ and $|-\rangle = (|0\rangle - |1\rangle)/\sqrt{2}$), and the Hadamard gate (HAD, $\alpha|0\rangle + \beta|1\rangle \to \alpha|+\rangle + \beta|-\rangle$). We use one unitary two-qubit gate, the controlled-NOT (CNOT), which maps $|00\rangle \to |00\rangle$, $|01\rangle \to |01\rangle$, $|10\rangle \to |11\rangle$, and $|11\rangle \to |10\rangle$.
This set of gates is a subset of the so-called Clifford gates, which are insufficient for universal quantum computing [10]. Our minimal gate set $G_{\min}$ consists of $|0\rangle$ and $|+\rangle$ preparation, Z and X measurement and CNOT. Universality may be achieved with the addition of other one-qubit preparations or measurements, as explained below. The physical gates mentioned are treated as being implemented in one "step"; the actual implementation may be more complex.
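The conventions in the Basics paragraph can be checked with a few lines of plain Python. This is only a sketch: the helper names (`matmul`, `scale`, `apply`, `cnot`) are introduced here for illustration, and one-qubit states are represented as amplitude pairs in the $|0\rangle, |1\rangle$ basis.

```python
import math

# 2x2 matrices as nested lists; one-qubit states as [amp0, amp1].
X = [[0, 1], [1, 0]]      # bit flip
Z = [[1, 0], [0, -1]]     # sign flip
Y = [[0, -1j], [1j, 0]]
S = 1 / math.sqrt(2)
HAD = [[S, S], [S, -S]]   # Hadamard gate


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def scale(c, a):
    """Multiply every entry of matrix a by the scalar c."""
    return [[c * entry for entry in row] for row in a]


def apply(m, v):
    """Apply a 2x2 matrix to a one-qubit state vector."""
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]


def cnot(control, target):
    """CNOT on computational basis labels: flip target if control is 1."""
    return control, target ^ control


if __name__ == "__main__":
    print(scale(1j, matmul(X, Z)) == Y)   # checks Y = i X Z
    print(apply(HAD, [1, 0]))             # amplitudes of |+>
    print([cnot(c, t) for c in (0, 1) for t in (0, 1)])
```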
The $C_4/C_6$ architecture is based on two error-detecting stabilizer codes, $C_4$, a four-qubit code, and $C_6$, a three-qubit-pair code, both encoding a qubit pair. A stabilizer code is a common eigenspace of a set of commuting products of Pauli operators (the "check operators"). Such products are denoted by strings of X, Y, Z and I. For example, XIZ is a Pauli product for three qubits with X acting on the first and Z on the last. The shortest error-detecting code $C_4$ for qubits encodes two qubits in four and has check operators XXXX and ZZZZ. The encoded qubits (labeled L and S) are defined by encoded operators $X_L$ = XXII, $Z_L$ = ZIZI, $X_S$ = IXIX and $Z_S$ = IIZZ. We use this code as a first level of encoding and call the encoded qubits "level 1" qubits. Level 1 qubits come in pairs, each encoded in a "block" of four physical qubits. The second code $C_6$ is constructed as a code on three qubit pairs able to detect any error acting on one pair. It encodes a qubit pair and has check operators XIIXXX, XXXIIX, ZIIZZZ and ZZZIIZ acting on three consecutive qubit pairs. A choice for encoded operators for $C_6$ is $X_L$ = IXXIII, $Z_L$ = IIZZIZ, $X_S$ = XIXXII, $Z_S$ = IIIZZI. This code is used for the second and higher levels of encoding. For example, the second level is obtained by using three level 1 pairs to obtain a level 2 pair. A level $l$ qubit pair requires a block of $4 \times 3^{l-1}$ physical qubits. The block structure is depicted in Fig. 1.
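The operator strings quoted above can be sanity-checked mechanically. The sketch below uses the standard rule that two Pauli strings commute exactly when they differ on an even number of positions where both are non-identity. The function and dictionary names are hypothetical, chosen for this illustration.

```python
from itertools import combinations


def commute(p, q):
    """True if Pauli strings p and q (over I, X, Y, Z) commute."""
    if len(p) != len(q):
        raise ValueError("Pauli strings must have equal length")
    clashes = sum(1 for a, b in zip(p, q) if "I" not in (a, b) and a != b)
    return clashes % 2 == 0


def is_consistent_code(checks, logical):
    """Check the commutation structure expected of a stabilizer code.

    checks: list of check operators.
    logical: dict with keys such as "XL", "ZL", "XS", "ZS".
    """
    ops = list(logical.values())
    checks_ok = all(commute(a, b) for a, b in combinations(checks, 2))
    logical_ok = all(commute(c, op) for c in checks for op in ops)
    # Encoded X and Z anticommute only when they act on the same encoded qubit.
    pairs_ok = all(
        commute(logical[a], logical[b]) == (a[1] != b[1] or a[0] == b[0])
        for a, b in combinations(logical, 2)
    )
    return checks_ok and logical_ok and pairs_ok


def physical_qubits(level):
    """Block size for a level-l qubit pair: 4 * 3**(l - 1)."""
    return 4 * 3 ** (level - 1)


C4_CHECKS = ["XXXX", "ZZZZ"]
C4_LOGICAL = {"XL": "XXII", "ZL": "ZIZI", "XS": "IXIX", "ZS": "IIZZ"}
C6_CHECKS = ["XIIXXX", "XXXIIX", "ZIIZZZ", "ZZZIIZ"]
C6_LOGICAL = {"XL": "IXXIII", "ZL": "IIZZIZ", "XS": "XIXXII", "ZS": "IIIZZI"}

if __name__ == "__main__":
    print(is_consistent_code(C4_CHECKS, C4_LOGICAL))
    print(is_consistent_code(C6_CHECKS, C6_LOGICAL))
    print([physical_qubits(l) for l in range(1, 5)])
```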
The concatenation of error-detecting codes allows for a flexible use of error detection and correction. Given a joint eigenstate of the check operators, its list of eigenvalues is called the "syndrome". The level $l$ encoding has check operators that can be derived from the check and encoded operators of $C_4$ and $C_6$. Ideally, the state of a level $l$ qubit exists in the subspace with syndrome 0 (all eigenvalues are +1). In the presence of errors this is typically not the case, so the state is defined only with respect to a current "Pauli frame" and an implicit recovery scheme. The Pauli frame is defined by a Pauli product that restores the error-free state of the block to the syndrome 0 subspace. The implicit recovery scheme determines the Pauli products needed to coherently map states with other syndromes to the syndrome of the error-free state. Defining the level $l$ state in this way makes it possible to avoid explicitly applying Pauli products for correction or teleportation compensation [9]. Error detection and correction are based on measurements that retroactively determine the syndrome of the state (the current syndrome has already been affected by further errors).
An error is detected when the syndrome differs from that expected according to the Pauli frame. In "postselected" quantum computing, the state is then rejected and the computation restarted. In standard quantum computing, the syndrome information must be used to update the Pauli frame. With the $C_4/C_6$ architecture, it is possible to do so at level 2 and above by the following method: First check the level 1 $C_4$ syndromes of each block of four qubits. For each block where an error is detected, mark the encoded level 1 qubit pair as having an error. Proceed to level 2 and check the (encoded) $C_6$ syndrome for each block of three level 1 pairs. If exactly one of the level 1 pairs has an error, use the $C_6$ syndrome to correct it. This works because error-detecting codes can always correct an error at a known location. If not, mark the encoded level 2 pair as having an error unless none of the three level 1 pairs have an error and the $C_6$ syndrome is as expected according to the Pauli frame. Continue in this fashion through all higher levels. For optimizing state preparation, we can replace the error-correction step by error detection at the top few levels depending on context as explained below.

Error model and assumptions. All error models can be described by inserting errors (which act as quantum operations) after gates or before measurements. We could model correlations between the errors by extending the errors' quantum operations to a common external environment. However, here we assume that errors are independent. We further assume that a gate's errors consist of applications of Pauli products with probabilities determined by the gate. Ideally, we would obtain a threshold that does not depend on the details of the probability distributions of Pauli products. This is too difficult with available techniques, so a depolarizing model is assumed for each gate: $|0\rangle$ ($|+\rangle$) state preparation erroneously produces $|1\rangle$ ($|-\rangle$) with probability $e_p$. A binary (e.g. Z or X) measurement results in the wrong outcome with probability $e_m$. CNOT is followed by one of the 15 possible non-identity Pauli products, each with probability $e_c/15$. HAD is modified by one of the Pauli operators, each with probability $e_h/3$. We further simplify by setting $e_c = \gamma$, $e_m = e_p = 4\gamma/15$, $e_h = 4\gamma/5$. This choice is justified as follows: $4\gamma/5$ is the one-qubit marginal probability of error for the CNOT, and it is reasonable to expect that one-qubit gates have error below this. In fact, one-qubit gates have much lower error than CNOTs in experimental systems such as ion traps and liquid-state NMR. As for preparation errors, if they are much larger than $4\gamma/15$, then it is possible to purify prepared states using a CNOT. For example, prepare $|0\rangle$ twice, apply a CNOT from the first to the second and Z-measure the second. Try again if the measurement outcome indicates $|1\rangle$, otherwise use the first state. The probability of error is given by $4\gamma/15 + O(\gamma^2)$, assuming that CNOT error is as above and measurement and preparation errors are proportional to $\gamma$. This also works for $|+\rangle$ preparation. To improve Z measurement, it is necessary to introduce an ancilla in $|0\rangle$, apply a CNOT from the qubit to be measured to the ancilla, and measure both qubits, accepting the answer only if the measurements agree. The error probability conditional on acceptance is again $4\gamma/15 + O(\gamma^2)$. Detected error is much more readily managed than undetected error [16]. In our architecture, the primary role of state preparation implies that the conditional error is typically more relevant. To improve measurement without possibility of rejection requires an additional ancilla [4] and CNOT with majority decoding of the three measurement outcomes. However, the error probability is now $4\gamma/5 + O(\gamma^2)$.
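The statement that $4\gamma/5$ is the one-qubit marginal error probability of the CNOT follows from counting: of the 15 non-identity two-qubit Pauli products, 12 act non-trivially on any given qubit. A minimal sketch of that count, with hypothetical function names and exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

PAULIS = "IXYZ"


def cnot_error_distribution(gamma):
    """Depolarizing CNOT error: each of the 15 non-identity two-qubit
    Pauli products occurs with probability gamma / 15."""
    return {a + b: gamma / 15
            for a, b in product(PAULIS, repeat=2) if a + b != "II"}


def one_qubit_marginal(dist, qubit):
    """Probability that the error acts non-trivially on the given qubit."""
    return sum(p for pauli, p in dist.items() if pauli[qubit] != "I")


def error_parameters(gamma):
    """Simplified parameter choice stated in the text."""
    return {"e_c": gamma,
            "e_m": Fraction(4, 15) * gamma,
            "e_p": Fraction(4, 15) * gamma,
            "e_h": Fraction(4, 5) * gamma}


if __name__ == "__main__":
    gamma = Fraction(1, 100)  # example value only
    dist = cnot_error_distribution(gamma)
    print(len(dist), one_qubit_marginal(dist, 0), error_parameters(gamma)["e_h"])
```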
The error model used here is idealized and does not match the error behavior of physical qubits and gates. There are three notable differences. First, real errors include coherent rotations, but any error can still be expressed as a linear combination of Pauli products. The syndrome measurements serve to "collapse" the linear combinations, so that these errors can be managed. The main problem with such errors is that consecutive errors can add coherently rather than probabilistically, resulting in more rapid error propagation. In principle, this problem can be eliminated by frequently applying known but random Pauli products, thus modulating the Pauli frame and reducing the likelihood of coherent addition. This also has the beneficial effect of decoupling [17] weakly interacting environments. The second difference is that real errors on gates nearby in time and space have correlations. These correlations are expected to decay rapidly with distance, and their effect can be alleviated by known coding techniques such as block interleaving [9]. The third difference is that in many cases, qubits are defined by a subspace of a quantum system from which amplitude can leak. An example of this problem is photon loss in optical quantum computing. Leakage errors, particularly if undetected, can be problematic. One advantage of error-correcting teleportation is that leakage is automatically controlled at each step.
The error model does not specify "memory" error or the amount of time used by a measurement. 9 We assume that gates other than measurements take the same amount of time; that is, the error parameter should represent the total error including any delays for faster gates to equalize gate times. For the C 4 /C 6 architecture, memory is an issue only when waiting for measurement outcomes that determine whether prepared states are good, or that are needed after teleportation, particularly when implementing non-Clifford gates. 18, 19 If the architecture is used for postselected computing, we can compute "optimistically", anticipating but not waiting for measurement outcomes. The output of the computation is accepted only if all measurement outcomes are as anticipated. Consider standard quantum computing with maximum parallelism. In teleportation, Bell measurements determine correction gates that need to be applied. If the correction gates are simple Pauli products, they can be absorbed into the Pauli frame and do not need to be known immediately. For teleportations used to implement non-Clifford gates, a non-Pauli compensation may be required. In this case, we must wait for the measurement outcomes. To avoid accumulating memory errors, we can maintain the logical state by repeatedly applying error-correcting teleportations with delays whose memory error is equivalent to that of physical one-qubit gates. The logical errors for these steps are comparable to logical one-qubit gates, which are small by design. The measurement outcomes for the teleportations determine only Pauli-frame updates and need not be known immediately. In state preparation, using additional resources, we may continue computing optimistically without waiting for measurement outcomes until the state is ready to be used for a logical gate. At this point it is necessary to wait for measurement outcomes to make sure the prepared state has no detected uncorrectable error. 
This adds at most the memory error incurred during a measurement time to each qubit. To account for this we assume that gate errors are set high enough to include this memory error.
Two additional assumptions are used in analyzing the $C_4/C_6$ architecture: The first is that there is no error and no speed constraint on classical computations required to interpret measurement outcomes and control future gates. The second is that two-qubit gates can be applied to any pair of qubits without delay or additional error. This assumption is unrealistic, but the effect on the threshold is due primarily to relatively short-range CNOTs acting within the ancillas needed for maintaining one or two blocks. This may be accounted for by use of a higher effective EPG [9, 20].
Clifford gates for $C_4$ and $C_6$. The codes $C_4$, $C_6$ and their concatenations have the property that encoded CNOTs, HADs and measurements that act in parallel on both encoded qubits in a pair can be implemented "transversally" with physical qubit relabeling. For example, to apply an encoded CNOT between two encoded qubit pairs, it suffices to apply physical CNOTs transversally, that is, between corresponding physical qubits in the encoding blocks. HAD requires in addition permuting the physical qubits in a block, which can be done by relabeling without physical manipulations or error; see the supplementary information (S.I. Sect. B). The transversal implementation ensures that errors from the physical gates apply independently to each physical qubit in a block so that they can be managed by error detection or correction. We use two methods for encoded state preparation. The first yields low-error encoded $|0\rangle$ and $|+\rangle$ states and follows from the Bell state-preparation scheme needed for error correction (S.I. Sect. B). The second uses teleportation to "inject" physical states into encoded qubits. The resulting encoded state has error but can be purified.

Error-correcting teleportation. To correct (more accurately, to keep track of) errors, we use error-correcting teleportation, which generalizes gate teleportation [18]. It involves preparing two blocks, each encoding a logical qubit pair so that the first pair is uniformly entangled with the second. Two such blocks form a "logical Bell pair", and its logical state is the "logical Bell state". Suppose that a logical Bell pair's error is as if each physical qubit were subject to independent error of order $\gamma$. A block used for logical computation can then be error-corrected by applying Bell measurements transversally between corresponding qubits in the computational block and the first block of the logical Bell pair.
This is the first step of conventional quantum teleportation [21] and results in the transfer of the logical state to the second block of the logical Bell pair, up to a known change in the Pauli frame. The Bell measurement outcomes reveal the syndromes of the products of identical check operators on the two blocks. Provided that the combined errors from the two measured blocks are within the limits of what the codes used can handle, they can be determined to update the Pauli frame (S.I. Sect. A). Compared to the syndrome extraction methods of Steane [9], error-correcting teleportation involves only one step instead of at least two but requires preparing more complex states.

Logical Bell state preparation. State preparation networks are detailed in S.I. Sect. B. It is necessary to prepare logical Bell states so that any errors introduced are similar to independent physical one-qubit errors. We prepare such states by constructing encoded Bell states at each level, using them as a resource for constructing Bell states at the next level. An encoded Bell state can be obtained by preparing and verifying encoded $|{+}{+}\rangle|00\rangle$ in two encoded qubit pairs and applying an encoded CNOT from the first to the second. The encoded CNOT is applied transversally but can introduce correlations between the first and second block. To limit these correlations to the current level, the subblocks are teleported using lower-level Bell states with error detection or correction depending on context. The remaining correlations do not appear to significantly affect logical errors but can be reduced by purification [22] or by entanglement swapping [23] with two encoded Bell states. A key observation for preparing encoded $|00\rangle$ and $|{+}{+}\rangle$ is that for both $C_4$ and $C_6$, they are close to cat states such as $(|0 \ldots 0\rangle + |1 \ldots 1\rangle)/\sqrt{2}$. In the case of $C_4$, encoded $|00\rangle$ and $|{+}{+}\rangle$ are cat states on four qubits.
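The cat-state observation for $C_4$ can be checked directly against the check and encoded operators given earlier for that code. The sketch below assumes the usual stabilizer convention that encoded $|00\rangle$ is the joint +1 eigenstate of the checks and of $Z_L$, $Z_S$, and encoded $|{+}{+}\rangle$ that of the checks and of $X_L$, $X_S$. The helper names are hypothetical, and only I, X and Z factors are supported.

```python
import math
from itertools import product


def apply_pauli(pauli, state):
    """Apply a string over I, X, Z to a state {bit tuple: amplitude}."""
    out = {}
    for bits, amp in state.items():
        new_bits = list(bits)
        for i, op in enumerate(pauli):
            if op == "X":
                new_bits[i] ^= 1          # bit flip
            elif op == "Z":
                if bits[i]:
                    amp = -amp            # sign flip on |1>
            elif op != "I":
                raise ValueError("only I, X and Z are supported in this sketch")
        key = tuple(new_bits)
        out[key] = out.get(key, 0.0) + amp
    return out


def is_plus_one_eigenstate(pauli, state, tol=1e-12):
    """True if the Pauli string leaves the state unchanged."""
    image = apply_pauli(pauli, state)
    keys = set(state) | set(image)
    return all(abs(image.get(k, 0.0) - state.get(k, 0.0)) < tol for k in keys)


def cat_state(n):
    """(|0...0> + |1...1>) / sqrt(2) on n qubits."""
    a = 1 / math.sqrt(2)
    return {(0,) * n: a, (1,) * n: a}


def x_basis_cat_state(n):
    """(|+...+> + |-...->) / sqrt(2), written in the computational basis
    as the uniform superposition of even-weight bit strings."""
    keys = [b for b in product((0, 1), repeat=n) if sum(b) % 2 == 0]
    a = 1 / math.sqrt(len(keys))
    return {k: a for k in keys}


if __name__ == "__main__":
    for op in ("XXXX", "ZZZZ", "ZIZI", "IIZZ"):
        print(op, is_plus_one_eigenstate(op, cat_state(4)))
    for op in ("XXXX", "ZZZZ", "XXII", "IXIX"):
        print(op, is_plus_one_eigenstate(op, x_basis_cat_state(4)))
```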
For $C_6$, they are parallel three-qubit cat states on three qubit pairs modified by internal CNOTs on two of the pairs. To prepare verified cat states we use a minimal variant of the methods of Shor [24] starting with Bell states. For the concatenations used here, the internal CNOTs can be implemented by relabeling the physical qubits.

Low error logical $G_{\min}$ gates. The first step in establishing fault tolerance of the $C_4/C_6$ architecture is to implement logical $G_{\min}$ gates with low EPGs. For the purpose of establishing high thresholds, we first consider postselected $G_{\min}$ computing. Postselected computing is like standard quantum computing except that when a gate is applied, the gate may fail. If it fails, this is known. The probability of success must be non-zero. There may be gate errors conditional on success, but fault-tolerant postselected computing requires that such errors are small. Purely postselected computing has little computational power, but we can use it to prepare states needed to enable scalable quantum computing. A fault-tolerant architecture for postselected computing can be implemented by use of the $C_4/C_6$ architecture without error correction, aborting the computation whenever an error is detected. We have used two methods to determine threshold values for $\gamma$ below which fault-tolerant postselected $G_{\min}$ computing is possible. The first involves a computer-assisted heuristic analysis of the conditional errors in prepared encoded Bell pairs. The analysis is described for a $C_4$ architecture in [25]. It requires that encoded Bell pairs are purified to ensure that errors are approximately independent between each Bell pair's two blocks. We obtained exact conditional errors for level 1 encoded Bell pairs and then heuristically bounded them from above with an error model that is independent between the two blocks.
This independence implies that the error model for gates at the next level also satisfies strict independence, so the process can be repeated at each level to bound the conditional logical errors. With this analysis, thresholds of above γ = 0.03 were obtained. The second method involves direct simulation of the error behavior of postselected encoded CNOTs with error-detecting teleportation at up to two levels of encoding and physical EPGs of .01 ≤ γ ≤ .0375. The simulation method is outlined in S.I. Sect. E. The resulting conditional logical errors are shown in Fig. 2 and suggest a threshold of above γ = 0.06 by extrapolation. At γ = 0.03, the logical preparation and measurement errors were found to be consistent with being below the threshold.
Scalable G min computing with the C 4 /C 6 architecture requires lower EPGs and the use of error correction to increase the probability of success to near 1. To optimize the resource requirements needed to achieve a given logical EPG, the last level at which error correction is used is dl levels below the relevant top level, where dl depends on context and γ. At higher levels, errors are only detected. For simplicity and to enable extrapolation by modeling, we examined a fixed strategy with dl = 1 in all state-preparation contexts and dl = 0 (maximum error correction) in the context of logical computation. The relevant top level in a state preparation context is the level of a block measurement or error-correcting teleportation of a subblock, not the logical level of the state that is eventually prepared. Each logical gate now has a probability of detected but uncorrectable error, and a probability of logical error conditional on not having detected an error. Fig. 3 shows both error probabilities up to level 4 for a logical CNOT with error-correcting teleportation and EPGs γ ≤ 0.01. The data indicate that the threshold for this architecture is above 0.01. The logical preparation and measurement errors were found to be comparatively low.
In designing and analyzing fault-tolerant architectures, particularly those based on concatenation, care must be taken to ensure that logical errors do not have correlations that lead to larger than expected errors when gates are composed. Such effects can be missed when inferring thresholds from analysis or simulation of just one level of concatenation. An additional complication is that the $C_4/C_6$ architecture's level $l + 1$ gates are not implemented solely in terms of level $l$ gates. We therefore simulated the architecture at the highest levels possible. To verify that logical errors are sufficiently uncorrelated, we simulated sequential teleportation and checked the incremental error behavior of each step as shown in Fig. 4.

Universal computation. To complete the $G_{\min}$ gate set so that we can implement arbitrary quantum computations, it suffices to add HAD and preparation of the state $|\pi/8\rangle = \cos(\pi/8)|0\rangle + \sin(\pi/8)|1\rangle$ [26, 27]. We treat the qubits in a logical qubit pair identically and ignore one of them for the purpose of computation. See S.I. Sect. D for how to take full advantage of both qubits. The logical HAD is implemented similarly to the logical CNOT and uses one error-correcting teleportation. Its logical errors are less than those of the logical CNOT. To prepare logical $|\pi/8\rangle$ in both qubits of a logical qubit pair, we obtain a logical Bell pair, decode the first block of the Bell pair into two physical qubits and make measurements to project the physical qubits' states onto $|\pi/8\rangle$ or the orthogonal state. If an orthogonal state is obtained, we adjust the Pauli frame by Y's accordingly. Because of the entanglement between the physical qubits and the logical ones, this prepares the desired logical state, albeit with error. This procedure is called "state injection". To decode the first block of the Bell pair, we first decode the $C_4$ subblocks and continue by decoding six-qubit subblocks of $C_6$.
Syndrome information is obtained in each step and can be used for error detection or correction. The error in decoding is expected to be dominated by the last decoding steps. Consequently, the error in the injected state should remain bounded as the number of levels increases, which we verified by simulation to the extent possible. To remove errors from the injected states, logical purification can be used [27, 28] and is effective if the error of the injected state is less than 0.141 [28]. The purification method can be implemented fault tolerantly to ensure that the purified logical $|\pi/8\rangle$ states have errors similar to those of logical CNOTs (S.I. Sect. G). To simplify the implementation of quantum computations, other states can be prepared similarly.
Consider the threshold for postselected universal quantum computing. The logical HAD and injection errors at γ = 0.03 and level 2 are shown in Fig. 2 . The injection error is well below the maximum allowed and is not expected to increase substantially for higher levels. The injection error should scale approximately linearly with EPG, so the extrapolated threshold of γ ≥ 0.06 may apply to universal postselected quantum computing.
The injection and purification method for preparing states needed to complete the gate set works with the error-correcting $C_4/C_6$ architecture. Consider state injection at $\gamma = 0.01$. The context for injection is state preparation, which determines the combination of error correction and detection as discussed above. The conditional logical error after state injection was determined to be $8.6^{+0.6}_{-0.5} \times 10^{-3}$ at level 3 and $1.1^{+0.1}_{-0.1} \times 10^{-2}$ at level 4, comparable to $\gamma$ and sufficiently low for $|\pi/8\rangle$ purification. As a result, the $C_4/C_6$ architecture enables scalable quantum computing at EPGs above 0.01.

To obtain higher thresholds, we use fault-tolerant postselected computing to prepare states in a code that can handle higher EPGs than $C_4/C_6$ concatenated codes can. The states are chosen so that we can implement a universal set of gates by error-correcting teleportation. Suppose that arbitrarily low logical EPGs are achievable with the $C_4/C_6$ architecture for universal postselected computing. To compute scalably, we choose a sufficiently high level $l$ for the $C_4/C_6$ architecture and a very good error-correcting quantum code $C_e$. The first step is to prepare the desired $C_e$-encoded states using level $l$ encoded qubits, in essence concatenating $C_e$ with level $l$ of the $C_4/C_6$ architecture. The second step is to decode each block of the $C_4/C_6$ architecture to physical qubits to obtain unconcatenated $C_e$-logical states. Once these states are successfully prepared, they can be used to implement each logical gate by error-correcting teleportation. Simulations show that the postselected decoding introduces an error ≈ $\gamma$ for each decoded qubit (Fig. 2). There is no postselection in error-correcting teleportation with $C_e$, and it is sensitive to decoding error in two blocks (≈ $2\gamma$) as well as the error of the CNOT (≈ $\gamma$) and the two physical measurements (≈ $8\gamma/15$) required for the Bell measurement. Hence, the effective error per qubit that needs to be corrected is ≈ $3.53\gamma$. The maximum error probability per qubit correctable by known codes $C_e$ is ≈ 0.19 [29]. Provided that $3.53\gamma \lesssim 0.19$ and $\gamma$ is below the postselected threshold for the $C_4/C_6$ architecture, the error in the state preparation before decoding together with the logical error in error-correcting teleportation can be made smaller than $10^{-3}$ (S.I. Sect. G).
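The error budget in the preceding paragraph is simple enough to recompute. The sketch below adds the three contributions as exact fractions and derives the largest $\gamma$ compatible with a correctable error of about 0.19. The function names are hypothetical; the coefficients are the ones stated in the text.

```python
from fractions import Fraction


def effective_error_coefficient():
    """Coefficient of gamma in the effective error per qubit."""
    decoding = Fraction(2)          # decoding error in two blocks, about 2*gamma
    cnot = Fraction(1)              # CNOT of the Bell measurement, about gamma
    measurements = Fraction(8, 15)  # two measurements at 4*gamma/15 each
    return decoding + cnot + measurements


def max_gamma(correctable=0.19):
    """Largest gamma with coefficient * gamma <= correctable."""
    return correctable / float(effective_error_coefficient())


if __name__ == "__main__":
    coeff = effective_error_coefficient()
    print(coeff, float(coeff))   # 53/15, about 3.53
    print(max_gamma())           # about 0.054
```

The resulting bound of roughly 0.054 is in line with the approximate limit on $\gamma$ that this construction can tolerate.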
The $C_e$ architecture can therefore be concatenated with the error-correcting $C_4/C_6$ architecture to arbitrarily reduce the logical EPG. In view of the postselected threshold indicated by Fig. 2, scalable quantum computing is possible at $\gamma = 0.03$ and perhaps up to $\gamma \approx 0.05$. Although the postselection overheads are extreme, this method is theoretically efficient.

Resources. The resource requirements for the error-correcting $C_4/C_6$ architecture can be mapped out as a function of $\gamma$ for different sizes of computations. Since we do not have analytical expressions for the resources for logical Bell state preparation or for the logical errors as a function of $\gamma$ and, with our current capabilities, we are not able to determine them in enough detail by simulation, we use naive models to approximate the needed expressions. The resources required are related to the number of physical CNOTs used, which dominates the number of state preparations and measurements. HADs are used only for universality at the logical level. The number of physical CNOTs used in a logical Bell state preparation is modeled by functions of the form $C/(1 - \gamma)^k$, which would be correct on average if the state-preparation network had $C$ gates of which $k$ fail independently with probability $\gamma$, and the network were repeatedly applied until none of the $k$ gates fail. $C$ and $k$ depend on the level of concatenation. The logical error probabilities are modeled
is the Fibonacci sequence. These expressions are asymptotically correct as γ → 0. We verified that they model the desired values well and determined the constants at the lower levels by simulation (S.I. Sect. G). At high levels, the constants were estimated by extrapolating their level-dependent behavior. Using these expressions, we determined the level of concatenation that requires the fewest resources to implement a computation of a given size. The resulting resource graph is shown in Fig. 5 .
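The repeat-until-success model of the form $C/(1 - \gamma)^k$ for the CNOT count of a logical Bell state preparation can be sketched as follows. The values of `C` and `k` used in the demonstration are placeholders chosen for illustration only; the level-dependent constants determined by simulation are not reproduced here.

```python
def expected_cnots(C, k, gamma):
    """Average CNOT count when a network of C gates is repeated until
    none of its k failure-prone gates fails (each fails with prob. gamma)."""
    if not 0 <= gamma < 1:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return C / (1 - gamma) ** k


def expected_attempts_by_series(k, gamma, terms=10000):
    """Cross-check: mean of the geometric distribution of attempts,
    summed term by term instead of using the closed form 1/(1-gamma)**k."""
    success = (1 - gamma) ** k
    return sum(n * success * (1 - success) ** (n - 1)
               for n in range(1, terms + 1))


if __name__ == "__main__":
    C, k = 100, 10  # hypothetical network size and number of risky gates
    for gamma in (0.0, 0.001, 0.01, 0.03):
        print(gamma, expected_cnots(C, k, gamma))
```

Because the success probability of one attempt is $(1 - \gamma)^k$, the overhead grows quickly once $k\gamma$ is no longer small, which is one reason the resource requirements fall rapidly as $\gamma$ decreases.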
Since interesting quantum computations use many non-Clifford gates, it is necessary to estimate the average resources required for preparing states such as the $|\pi/8\rangle$ state. One instance of this state suffices for implementing a 45° Y rotation. Two are required for a phase-variant of a Toffoli gate. Consider $\gamma = 0.01$. At level 4 of the $C_4/C_6$ architecture, one purification stage requires ≈ 370 logical CNOTs (S.I. Sect. G). It is likely that this overhead can be significantly improved, but it must be accounted for when using the graphs of Fig. 5 as discussed in the caption. It is possible to implement a computation with 100 logical qubits and up to 1000 $|\pi/8\rangle$-preparations using $1.23 \times 10^{14}$ physical CNOTs (S.I. Sect. G). This takes into account the probability that the computation fails with a detected error. The conditional probability of obtaining an incorrect output is ≈ 0.02. Such a computation is non-trivial in the sense that its output is not efficiently predictable using known classical algorithms. The resource requirements are large but would be reasonable in the context of classical computing: Central processing units have $10^8$ or more transistors operating at rates faster than $10^9$ bit operations per second [30].
We compare our resource requirements to those of Steane's architecture based on an example at $\gamma = 10^{-4}$ detailed in [9]. Steane's architecture is based on non-concatenated block codes, which are expected to be more efficient at such low EPGs. 5 Steane's example has an effective logical error per qubit of ≈ $7 \times 10^{-12}$ using ≈ 420 physical CNOTs per qubit per gate. Our architecture achieves detected errors of $5.5 \times 10^{-9}$ (level 3) or $6 \times 10^{-14}$ (level 4) using respectively ≈ 2100 or $2.6 \times 10^{4}$ physical CNOTs per qubit (S.I. Sect. G). The conditional logical errors are much smaller. The C 4 /C 6 architecture's resource requirements are still within two orders of magnitude of Steane's at $\gamma = 10^{-4}$. The C 4 /C 6 architecture has the advantage of simplicity, of yielding more reliable answers conditional on having no detected errors, and of operating at higher EPGs.

Discussion. How high must EPGs be before scalable quantum computing becomes impossible? It is known that if unbiased one-qubit EPGs exceed 0.5, then we can simulate the effect of gates classically.
31, 32 Furthermore, if one-qubit EPGs exceed 0.25, then we cannot realize a quantum computation "faithfully", that is by encoding the computation's qubits with quantum codes. 33, 34 This is because the quantum channel capacity vanishes at a depolarizing error probability above 0.25. Faithful techniques are likely to require at least three sequential gates before an error can be eliminated (in our case these are preparation gates whose errors remain in the logical Bell pairs, a CNOT and a measurement for teleportation). Thus one would not expect to obtain thresholds above ∼ 0.09 using faithful methods. This is not far from the extrapolated 0.05 evidenced by our work. Note that the thresholds obtained here are similar to those for quantum communication. 35 An important use of studies of fault-tolerant architectures is to provide guidelines for EPGs that should be achieved to meet the low-error criterion for scalability. Such guidelines should depend on the details of the relevant error models and constraints on two-qubit gates. Nevertheless, the value of $\gamma = 10^{-4}$ has often been cited as the EPG to be achieved.

[11] Leibfried, D., DeMarco, B., Meyer, V., Lucas, D., Barrett, M., Britton, J., Itano, W. M., Jelenković, B., Langer, C., Rosenband, T., and Wineland, D. J. Experimental demonstration of a robust, high-fidelity geometric two ion-qubit phase gate. Nature 422, 412-415 (2003).
[12] Roos, C. F., Lancaster, G. P. T., Riebe, M., Häffner, H., Hänsel, W., Gulde, S., Becher, C., Eschner, J., Schmidt-Kaler, F., and Blatt, R.

FIGURE. 1: Block structure of C 4 /C 6 concatenated codes. The bottom line shows 9 blocks of four physical qubits. Each block encodes a level 1 qubit pair with C 4 . The encoded qubit pairs are shown in the line above. Formally, each such pair is associated with two syndrome bits, shown below the encoded pair in a lighter shade, which are accessible by syndrome measurements or decoding for the purpose of error detection and correction. The next level groups three level 1 qubit pairs into a block, encoding a level 2 qubit pair with C 6 that is associated with 4 syndrome bits. A level 2 block consists of a total of 12 physical qubits. Three level 2 qubit pairs are used to form a level 3 qubit pair, again with C 6 and associated with 4 syndrome bits. The total number of physical qubits in a level 3 block is 36.

[Plot. Axes: logical CNOT conditional error probability versus physical CNOT error probability γ. Inset table: Level 2 errors at γ = 0.03. CNOT: 2.1±]

The smallest number of undetectable errors at level 2 is 4, which should be the slope as γ goes to 0. At high γ, the curves are expected to level off. 9 Other operations' errors for γ = 0.03 and level 2 are shown in the inset table. Ratios between the preparation or measurement and CNOT errors are smaller than those assumed for the physical error model. The logical HAD error is expected to be between 0.5 and 0.8 of the logical CNOT error, which could not be confirmed because of the large error bars. The decoding error is the incremental error introduced by decoding a block into two physical qubits. The injection error is the error in a logical state that we prepare by decoding one block of a logical Bell pair and measuring the decoded qubits. The measurement error per qubit is assumed to be the same as that of X- and Z-measurements. Decoding and injection errors were found to decrease from level 1 (decoding error 4.4±
FIGURE. 4: Error-compounding behavior with and without error-correcting teleportation. Incremental conditional (a.) and detected (b.) error probabilities are shown for each step of a sequence of 30 steps of applying the one-qubit error associated with HAD to each physical qubit and teleporting or not teleporting the logical qubit pair's block. Error bars are 68 % confidence intervals. Level 3 of the error-correcting architecture is used. The first step is omitted since it is biased by the error-free reference-state preparation as discussed in S.I. Sect. E. The horizontal gray lines show the average incremental error if teleportation is used. Note that for the first four steps, the incremental conditional error is smaller if no teleportation is used. This may be exploited when optimizing networks, provided one takes account of the resulting spreading of otherwise localized error events.

To obtain the total computational resources, multiply pcnot × KQ by twice the average number of logical CNOTs needed for implementing a gate of the computation. It is assumed that these logical CNOTs are involved in state preparation required for universality but do not contribute to the error (S.I. Sect. G). The "scale-up" (number of physical qubits per logical qubit) depends on parallelism and level l of concatenation. With maximum parallelism, the scale-up is of the same order as pcnot. For a completely sequential algorithm such as could be used if there is no memory error, this can be reduced to $3^{l-1} \cdot 2$. With some memory error and logical gate parallelism, ≈ $(1 + 2(l-1))\,3^{l-1} \cdot 2$ is more realistic (S.I. Sect. G). The steps in the curve arise from increasing the number of levels. The first step is to level 2, and each subsequent step increments the level by 1. The steps are smoothed because we can exploit error-detection to avoid using the next level. Improvements of only one to two orders of magnitude are obtained by reducing γ from 0.001 to 0.0001, compared to at least five orders by reducing γ from 0.01 to 0.001.
Supplementary Information

A Explanation of Error-Correcting Teleportation
For the basic theory of stabilizer codes, see [15]. Let Q be the l × 2n binary check matrix with entries defining a stabilizer code on n qubits for encoding k = n − l qubits with good error-detecting or -correcting properties. The check matrix is obtained from an independent set of check operators $P_1, \ldots, P_l$ by placing a binary representation of $P_j$ into row $j$.
Arithmetic with binary vectors and matrices is modulo 2 and S is the 2n × 2n block-diagonal matrix with blocks $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.
Consider an n-qubit "input" block carrying k qubits encoded in the stabilizer code for Q, where the block has been affected by errors. An effective way of detecting or correcting errors is to teleport each of the n qubits of the input block using two blocks of n qubits that form an "encoded Bell pair". That is, both blocks have syndrome 0 with respect to Q and corresponding qubits encoded in the two blocks are in the state $(|00\rangle + |11\rangle)/\sqrt{2}$. The state of the two blocks is defined by the following preparation procedure: Start with n pairs of qubits in the standard Bell state $(|00\rangle + |11\rangle)/\sqrt{2}$. The two blocks are formed from the first and second members of each pair, respectively. Use a Q-syndrome measurement on the n second members of each pair to project them into one of the joint eigenspaces of Q. Finally, apply identical Pauli matrices to both members of pairs in such a way as to reset the syndromes to 0. To teleport, apply the usual protocol to corresponding qubits in the three blocks. In the absence of errors, this copies the encoded input state to the second block of the encoded Bell pair. We show that errors are revealed by parities of the teleportation measurement outcomes.
The standard quantum teleportation protocol begins with an arbitrary state $|\psi\rangle$. This is identical to B (23) with qubits 2, 3 exchanged for qubits 1, 2. Depending on the syndrome e that results from the measurement, one applies correcting Pauli matrices to qubit 3 to restore $|\psi\rangle$ in qubit 3.
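As an illustration of the single-qubit protocol, the following state-vector sketch teleports a state from qubit 1 to qubit 3 and checks every measurement branch. It assumes the Bell measurement convention given in the caption of Fig. 6 (a CNOT from qubit 1 to qubit 2, an X-measurement on qubit 1 and a Z-measurement on qubit 2). The function name and the choice of corrections (X for a qubit-2 outcome of 1, then Z for a qubit-1 outcome of 1) are choices made for this sketch.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)


def teleport_branches(psi):
    """Return {(a, b): (probability, corrected state of qubit 3)} for all outcomes."""
    psi = np.asarray(psi, dtype=complex)
    bell = np.array([[1, 0], [0, 1]], dtype=complex) / np.sqrt(2)  # (|00> + |11>)/sqrt(2)
    state = np.einsum("a,bc->abc", psi, bell)  # axes: qubit 1, qubit 2, qubit 3
    # CNOT from qubit 1 to qubit 2: flip qubit 2 where qubit 1 is |1>.
    state[1] = state[1, ::-1, :].copy()
    # X-measurement on qubit 1 = Hadamard followed by a Z-measurement.
    state = np.einsum("ia,abc->ibc", H, state)
    branches = {}
    for a in (0, 1):          # outcome on qubit 1
        for b in (0, 1):      # outcome on qubit 2
            out = state[a, b, :]
            prob = float(np.vdot(out, out).real)
            out = out / np.sqrt(prob)
            if b:
                out = X @ out  # bit-flip correction
            if a:
                out = Z @ out  # phase-flip correction
            branches[(a, b)] = (prob, out)
    return branches


if __name__ == "__main__":
    psi = np.array([0.6, 0.8j])
    for outcome, (prob, out) in teleport_branches(psi).items():
        print(outcome, round(prob, 3), round(abs(np.vdot(psi, out)), 6))
```

In a Pauli-frame implementation the two corrections would not be applied physically; the pair (a, b) would simply be recorded.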
Consider the teleportation of n qubits in a block as described above. The protocol is such that the 2n binary measurement outcomes linearly (with respect to computation modulo 2) determine the Pauli product correction to be applied to the second block of the encoded Bell pair. Let g be the binary representation of the Pauli product correction. The syndrome of the input block constrains g as shown in Fig. 6 . The principle is as explained in [18] for unitary gates, but generalized to measurements. In this case, a stabilizer projection on the destination qubits before teleportation is equivalent to a projection after teleportation, where the syndrome associated with the projection is modified by the correction Pauli product used at the end of teleportation. The expression $Q S g^{T}$ must match the syndrome of the input block. Consequently, the syndrome of the input block can be deduced from g, a function of the teleportation Bell measurement. Errors can be detected or corrected accordingly.
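The syndrome expression can be made concrete with a small sketch. It assumes an interleaved binary representation $(x_1, z_1, \ldots, x_n, z_n)$ of Pauli products, which is one ordering compatible with a block-diagonal S with 2 × 2 swap blocks, and it uses the standard check operators XXXX and ZZZZ of the four-qubit error-detecting code as the example Q. The helper names are hypothetical.

```python
import numpy as np

PAULI_BITS = {"I": (0, 0), "X": (1, 0), "Z": (0, 1), "Y": (1, 1)}


def pauli_to_bits(pauli):
    """Binary vector (x_1, z_1, ..., x_n, z_n) of a Pauli string such as 'XIZY'."""
    return np.array([bit for p in pauli for bit in PAULI_BITS[p]], dtype=int)


def swap_matrix(n):
    """2n x 2n block-diagonal matrix S with blocks [[0, 1], [1, 0]]."""
    return np.kron(np.eye(n, dtype=int), np.array([[0, 1], [1, 0]]))


def syndrome(Q, g):
    """Q S g^T modulo 2: entry j is 1 iff g anticommutes with check operator j."""
    n = Q.shape[1] // 2
    return (Q @ swap_matrix(n) @ g) % 2


if __name__ == "__main__":
    Q = np.array([pauli_to_bits("XXXX"), pauli_to_bits("ZZZZ")])
    for g in ("XIII", "ZIII", "YIII", "XXII"):
        print(g, syndrome(Q, pauli_to_bits(g)))
```

A single-qubit Pauli product always gives a non-zero syndrome here, whereas XXII gives syndrome 0, reflecting that this code detects but cannot correct single-qubit errors.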
It is necessary to consider the effects of errors in the prepared encoded Bell pair. Errors on the second block propagate forward and must be handled by future teleportations. Because of the Bell measurement, errors on the first block have an effect equivalent to the same errors on the input block. Thus, using the inferred syndrome for detection or correction of errors deals with errors in both blocks, as long as their combination is within the capabilities of the code.
Error correction or detection by teleportation handles leakage errors in the same way as other errors. If a qubit "leaked", the outcome of its Bell measurement becomes undetermined. The Bell measurement can be filled in arbitrarily, because for the purpose of interpreting the syndrome, the effect is the same as if a Pauli error occurred depending on how the measurement result is filled in.
Note that as usual, none of the Pauli corrections actually have to be implemented explicitly. One can just update the Pauli frame as needed.
FIGURE. 6: Teleporting with an encoded entangled state is equivalent to a syndrome measurement. The gray lines are the time lines of blocks of n qubits. The boxes denote various operations. The Bell-state preparation on corresponding pairs of qubits in two blocks is depicted with a box angled to the right and labeled "Bell". The state used for teleportation in the top diagram is obtained after Bell-state preparation by projecting one of the blocks with Π(Q, 0) (the actual preparation procedure is different but has the same output). Projection operators are shown with boxes angled both ways with the operator written in the box. Bell measurement of corresponding pairs of qubits in two blocks is depicted with a box angled to the left and labeled "Bell". A Bell measurement on qubits 1 and 2 is implemented by applying a CNOT from qubit 1 to 2, performing an X-measurement on qubit 1 and a Z-measurement on qubit 2. The top diagram is the actual network implemented. The other two are logically equivalent. The Bell measurement outcome g is correlated with the effective projection in the bottom diagram. If the input state has a particular syndrome, then only those g for which the projection is onto the subspace with this syndrome have non-zero probabilities.
B Networks for C 4 and C 6 State Preparation and Gates
[Figure 7 element labels: encoded $|0\rangle$ preparation; T: C 6 subblock teleportation, involves three lower-level blocks.]

FIGURE. 7: Network elements. The elements shown represent networks acting on blocks of qubits. Blocks (shown by thick gray lines) may consist of only one physical qubit, so the elements can also represent physical gates. Elements with "fringes" are transversal gates: The indicated gate is applied to each physical qubit, or to corresponding physical qubits in the input blocks. The measurements have classical output indicated with a black line. Because they are transversal, the output contains as many bits as there are physical qubits. Because the codes used here are CSS codes, the check operators and the encoded Pauli operators contain only one type of non-identity Pauli operator. The output bits therefore contain both error-check information and the encoded-measurement answers. The $*u$ and $*u^{2}$ elements are defined as shown for physical qubit pairs. The notation comes from a polynomial construction of C 6 as a code on three quaternary qudits using the four-element field GF(4). The symbol u denotes a third root of unity over GF(2). The gates transform Pauli operators by multiplication with $u$ or $u^{2}$ in a GF(4) labeling of these operators.

FIGURE. 8: Encoded state preparations in terms of lower-level elements for C 4 . The lower-level blocks ("subblocks") can be either physical or encoded single qubits and are represented by the merging lines in the networks on the right. In the C 4 /C 6 architecture, C 4 is used only at level 1, so the subblocks are always physical qubits. In this case, the output block contains an encoded qubit pair with each qubit in the pair in the state indicated by the preparation gate on the left. The physical states prepared are four-qubit cat states ($(|0000\rangle + |1111\rangle)/\sqrt{2}$ in the case of the top network). If no error occurred, the four measurements in each network on the right have total parity 0. For any single error in the state preparation network, if this error results in an error in the output state that is not equivalent to a single physical qubit error, then the parity is 1, so this event can be detected. Thus, if the total measurement parity is 1, the output state is rejected. This ensures that errors occurring with linear probability in the EPGs introduce no undetectable errors. Note that the networks on the right begin with Bell-state preparations. The teleportation steps are not implemented on physical qubits but are included for generality. The encoded Z- and X-preparations shown assume that the next step is a transversal CNOT followed by subblock teleportations. Otherwise it may be necessary to teleport subblocks immediately to avoid error propagation.

FIGURE. 9: Encoded state preparations in terms of lower-level elements for C 6 . The lower-level blocks ("subblocks") contain encoded qubit pairs. The beginning of the network prepares two parallel three-qubit cat states ($(|000\rangle + |111\rangle)/\sqrt{2}$ in the top network) on corresponding members of the encoded qubit pairs. The encoded measurements in the cat-state preparation satisfy the parity constraint described in the caption of Fig. 8 for each of the three corresponding qubits in the encoded qubit pairs. Because the measurements are implemented transversally, they also provide lower-level syndrome information that can be used for error detection or correction. Again, the networks begin with Bell pair preparations and the teleportations are only implemented on encoded qubit pairs. The last elements rotate the parallel cat states into C 6 , so that the encoded qubit pair has both qubits in the desired state. Because the first level encoding uses C 4 , they can be implemented as simple permutations, which can be accomplished by logical relabeling without delay or error, see Fig. 10 . As in Fig. 8 , the encoded Z- and X-preparations shown assume that the next step is a transversal CNOT followed by subblock teleportations. Otherwise it may be necessary to teleport subblocks immediately to avoid error propagation.

FIGURE. 12: Implementation of encoded HADs for C 4 and C 6 . The top network is for C 4 and is transversal except for an interchange of the middle two qubits. The bottom is for C 6 and is transversal. Using the HAD and CNOT implementations, it is also possible to implement the encoded conditional sign flip transversally up to a physical qubit permutation implementable by relabeling.
As can be seen, all preparation networks are based ultimately on Bell state preparation followed by full or half Bell measurements. As shown, the networks use teleportation fastidiously. It may be possible to delay teleportation in some cases, but this was not confirmed by simulation. For postselected computing, there is no need to wait for measurement outcomes before proceeding to the next steps. However, this delays the rejection of states found later to be faulty, which incurs a large resource cost if the probability of detecting an error is high. For standard quantum computing, this resource cost can be avoided by delaying further processing and incurring some memory error instead. If error correction is used, at higher levels the probability of unrecoverable error decreases rapidly so one can again proceed optimistically, before measurement answers are known.
C Decoding C 4 and C 6 .
There are two reasons to explicitly decode logical states encoded by concatenating C 4 and C 6 . First, at the highest EPGs, to implement a standard quantum computation with the postselected C 4 /C 6 architecture requires preparing C 4 /C 6 encoded states that are themselves states encoded in a code C e with very good error-correction capabilities. Once such a state is prepared, the C 4 /C 6 concatenation hierarchy is decoded to obtain a physical block encoding a state in C e . Second, to implement arbitrary quantum computations requires preparing special encoded states that are not reachable using G min and HAD gates alone. These encoded states need not be error-free initially, since they can be purified using low-error logical G min and HAD gates. A way to prepare these states with error that is bounded independently of the number of levels is to prepare a logical Bell state in two blocks, decode the first block into two physical qubits, and make a measurement of the physical qubits to project them into the desired state. (Alternatively, but with more error, the measurement can be replaced by a teleportation of the desired state prepared in another pair of physical qubits.) The entanglement between the physical qubits and the logical ones in the second block ensures that the state is injected into the logical qubits.
A good method for decoding the C 4 /C 6 concatenation hierarchy is to decode "bottom up". That is, in the first step, the blocks of four physical qubits encoding qubit pairs in C 4 at the lowest level of the hierarchy are decoded. Syndrome information becomes available in a pair of ancillas for each block of C 4 and can be used for error detection. In subsequent steps, six physical qubits encoding qubit pairs in C 6 are similarly decoded. Error information obtained in previous decoding steps can be combined with new syndrome information for error detection or correction. The C 4 and C 6 decoding networks are shown in Fig. 13 .

2), (3, 4) or (5, 6) for C 6 ) was detected to have an error. The C 6 decoding can be simplified if the first level of the full concatenation hierarchy uses C 4 : The first step is a $*u^{2}$ operation on the first and last pair and can be implemented by relabeling before the level 1 blocks of C 4 are decoded. Even if the C 6 decoding is implemented with maximum parallelism and without waiting for measurement outcomes, it has an initial memory delay on qubits 5 and 6 that was not taken into consideration in the simulations.
D Universal Computing
Universal computing with logical qubits encoded with C 4 /C 6 can be accomplished by use of the logical G min gates, HADs and $|\pi/8\rangle$-state preparation. However, since these operations do not distinguish between the two logical qubits encoded in one block, computations are implemented on only one of the two logical qubits in each block. Because the other one experiences the same evolution, the computation's output is obtained twice each time it is run. It is desirable to be able to address the two logical qubits in a block separately and have the ability to apply a CNOT from one to the other. One operation that is already available is the $*u$ gate and its inverse, which acts on a logical qubit pair as a swap followed by a CNOT. As with all stabilizer codes, it is also possible to apply arbitrary combinations of logical Pauli matrices by applying suitable products of physical Pauli matrices or by making a Pauli frame change.

Pauli products. The top network shows how to implement any gate U selectively on one qubit in a pair. The implementation uses a selective Pauli operator and non-selective controlled-U gates. The bottom network shows how to implement a type of controlled phase gate between the two logical qubits in a pair. It uses a $*u$ operation, a selective 90° z-rotation (which can be implemented using the top network) and a $*u^{2}$ operation. The CNOT (without swap) between the qubits in a pair can be implemented in terms of the controlled phase gate shown and selective one-qubit gates.
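The description of the $*u$ gate as a swap followed by a CNOT can be checked numerically on a qubit pair. The sketch below builds that operation and verifies that applying it three times gives the identity, so that its inverse is its square, consistent with the $*u$ and $*u^{2}$ naming. The basis ordering and the choice of the first qubit as the CNOT control are assumptions of this sketch; the order-3 property holds for either control choice.

```python
import numpy as np

# Basis order |00>, |01>, |10>, |11>; the first qubit is the CNOT control (assumed).
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]])
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])


def star_u():
    """Swap followed by CNOT, acting on basis states as |a, b> -> |b, a XOR b>."""
    return CNOT @ SWAP


if __name__ == "__main__":
    U = star_u()
    print(np.array_equal(np.linalg.matrix_power(U, 3), np.eye(4, dtype=int)))
    print(np.array_equal(U @ U, U.T))  # the square is the inverse
```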
The networks shown in Fig. 14 do not result in particularly efficient ways of implementing gates on individual logical qubits. An alternative is to inject and purify states needed for one-qubit teleportation of the desired gates using the techniques given in [19] . An example of such a state is $|0\rangle|+\rangle$. Note that $|0\rangle|+\rangle$ is much more readily purified than $|\pi/8\rangle$. For example, to purify $|0\rangle|+\rangle$ one can apply the method suggested in the main text to reduce the preparation error. In the encoded setting, this requires a measurement of Z and X of the qubits in a pair, which cannot be done by a transversal encoded measurement. Instead, a third instance of $|0\rangle|+\rangle$ is introduced and involved in a transversal Bell measurement with the block of the qubits to be measured. As in error-correcting teleportation, the desired information can be extracted from parities of the Bell measurements. At the same time, syndrome information that can be used for error detection and correction is obtained.
E Simulation of Error Behavior
To simulate the error behavior of fault-tolerant methods based on stabilizer codes, we use the result that computation with Clifford gates and feed-forward from Z- and X-measurements can be efficiently simulated. 10 The Clifford gates include Z- and X-state preparations and measurements, HAD, CNOT and 90° Z-rotations. Networks using these gates always result in stabilizer states, which are eigenstates of maximal sets of commuting check operators (the check matrix). Simulation requires tracking a complete independent set of such check operators and the syndrome (which gives the state's eigenvalues with respect to the check operators). Check operators can be represented by binary vectors (see Sect. A). To simplify the computations required for updating the check matrix and syndrome after applying gates, we maintain it in "graph-state normal form". 39, 40 In this form, each qubit has an associated "commuting" operator, which is either X or Z, and there is exactly one check operator acting on the qubit with an operator different from I or the commuting operator. In addition to the check matrix and the ideal syndrome, we maintain the Pauli products representing the current effect of errors (the "error vector") and the Pauli frame. The error vector is known only to the simulation, not to the user implementing a computation. The error vector and Pauli frame are updated with each operation. For efficiency, blocks that have not yet interacted are associated with separate check matrices and "merged" when needed. Also, since it is necessary to accumulate as much statistical data as possible, an array of error vectors and corresponding Pauli frames is used to represent multiple simultaneous preparation attempts without duplicating check matrices. For rapid prototyping purposes and fast array processing, we used Octave to implement the simulator. For simulating measurements and errors, a random-number generator is needed. We used the standard random-number generator provided with Octave. Because this implies that there are implicit correlations in the errors for the large-scale simulations undertaken here, the results obtained do not constitute full statistical proof. However, no artifacts not explainable by statistics were observed.
In particular, in the few cases where analytic expressions for the data were available, the simulated data were as expected. This was checked for conditional error probabilities in postselected computing using concatenation with C 4 as discussed in [25] for up to two levels (data not shown).
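The idea of updating an array of error vectors with each operation can be sketched as follows. This is not the Octave simulator described above; it is a minimal Python illustration that tracks only the X and Z parts of Pauli errors for many trials at once and applies the standard propagation rules for CNOT and HAD. The function names and the array layout (one row per trial, one column per qubit) are assumptions of the sketch.

```python
import numpy as np


def apply_cnot(err_x, err_z, control, target):
    """Propagate Pauli errors through a CNOT for every trial (row) at once.

    X errors copy from control to target; Z errors copy from target to control.
    """
    err_x[:, target] ^= err_x[:, control]
    err_z[:, control] ^= err_z[:, target]


def apply_had(err_x, err_z, qubit):
    """Propagate Pauli errors through HAD: X and Z errors are exchanged."""
    tmp = err_x[:, qubit].copy()
    err_x[:, qubit] = err_z[:, qubit]
    err_z[:, qubit] = tmp


if __name__ == "__main__":
    trials, qubits = 2, 2
    err_x = np.zeros((trials, qubits), dtype=bool)
    err_z = np.zeros((trials, qubits), dtype=bool)
    err_x[0, 0] = True  # trial 0: X error on the control
    err_z[1, 1] = True  # trial 1: Z error on the target
    apply_cnot(err_x, err_z, control=0, target=1)
    print(err_x.astype(int), err_z.astype(int), sep="\n")
```

Because the updates are bitwise operations on whole columns, many simultaneous preparation attempts share one pass through the network, which is the point of keeping the error vectors in an array.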
The simulations are used to determine the error behavior of various logical gates. For the data shown in Fig. 2, 3 and 4 , we used the reference entanglement method 41 for determining logical CNOT error probabilities. This involves applying the logical CNOT and error-detecting or -correcting teleportations to the first members of two error-free logical Bell pairs and then comparing the logical state to what would have been obtained if the logical CNOT had no error. The comparison is implemented by applying error-free CNOTs to disentangle the Bell pairs and making error-free logical X-or Z-measurements with error detection or correction depending on the context. The procedure was modified by (1) applying only the CNOT's physical error model associated with the transversal implementation and (2) applying the error model and error-detecting teleportation twice and determining the incremental error introduced the second time. (1) simplifies the verification without affecting the error probabilities. (2) is required so as to determine the effective error introduced in the middle of a computation, because the error-free Bell pairs have no initial error, contrary to what would be expected later. Using the second of two steps suffices because of the isolating properties of teleportation, which was verified by taking some data for more steps as shown in Fig. 4 . For detected error probabilities, the incremental error is determined as the fraction of trials in which an uncorrectable error was detected during the teleportations or in the verifying measurements. For conditional error probabilities, the incremental error is the fraction of trials with no detected uncorrectable error for which the logical measurement outcomes are incorrect but there was no undetected logical error in the preceding steps.
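The two incremental error estimates can be written as simple counting rules. The sketch below is one reading of the definitions above: the detected error is the fraction of all trials with a detected uncorrectable error, and the conditional error is taken over trials without a detected error, counting those whose logical outcome is wrong and that had no undetected logical error in earlier steps. The per-trial boolean lists and the function name are hypothetical.

```python
def incremental_errors(detected, wrong, prior_error):
    """Return (detected error fraction, conditional error fraction).

    detected[i]    : an uncorrectable error was detected in trial i
    wrong[i]       : the logical measurement outcome of trial i is incorrect
    prior_error[i] : an undetected logical error was already present before this step
    """
    trials = len(detected)
    accepted = [i for i in range(trials) if not detected[i]]
    detected_rate = (trials - len(accepted)) / trials
    bad = sum(1 for i in accepted if wrong[i] and not prior_error[i])
    conditional_rate = bad / len(accepted) if accepted else float("nan")
    return detected_rate, conditional_rate


if __name__ == "__main__":
    detected = [True, False, False, False, False]
    wrong = [False, True, False, True, False]
    prior_error = [False, False, False, True, False]
    print(incremental_errors(detected, wrong, prior_error))
```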
F Scalable Quantum Computing via Bootstrapping with Postselection
The fault-tolerant architecture based on a good quantum error-correcting code C e using the C 4 /C 6 architecture with postselection for state preparation is described in the main text. We claimed that if $3.53\gamma < 0.19$ and γ is below the threshold for fault-tolerant postselected computing with the C 4 /C 6 architecture, then the logical errors for the C e architecture can be made to be below $10^{-3}$, which is below the threshold for known fault-tolerant architectures. The estimate assumes that the decoding error per decoded qubit is ≈ γ, in which case ≈ 3.53γ is the effective error per qubit that determines whether the error-correcting teleportation successfully corrects. With this assumption, the claim is proven as follows. Choose ε such that $3.53\gamma < 0.19 - \epsilon$. Choose C e such that if a logical qubit is encoded in C e without error and each physical qubit is independently subjected to an error with probability 0.19 − ε, then the logical state can be recovered with error at most $10^{-3}/4$. Such codes exist 29 although their length n grows as ε goes to zero. Algorithms for encoding needed states in one to four blocks of C e require at most $c_1 n^{2}$ gates for some constant $c_1$ [42] . Choose the level of the postselected C 4 /C 6 architecture so that the logical gate error is well below $10^{-3}/(2 c_1 n^{2})$. Then, before they are decoded, the postselected prepared states have logical error at most $10^{-3}/2$, since they required fewer than $c_1 n^{2}$ logical gates of the C 4 /C 6 architecture.
This error persists as a C e -logical error after the C e -error-correcting teleportation that uses this state after it is decoded. It adds to the logical error introduced by failure to error-correct in teleportation. However, because at most two error-correcting teleportations are involved, the total logical error is below $10^{-3}$. Note that if γ is given and strictly below the threshold, then the resources required to achieve C e -logical EPGs below $10^{-3}$ are determined. Because the fault-tolerant architectures that can be used with EPGs of $10^{-3}$ are known to be theoretically efficient, the combined architecture starting with C e is also theoretically efficient. The problem is that as γ approaches the upper limit, the minimum length of the code C e grows, and as a result the probability of successfully preparing the required states by postselection goes down dramatically, making the combined architecture highly impractical.
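The error budget of this argument can be written out explicitly. The following is a worked restatement, taking the condition on 3.53γ as an upper bound of 0.19 and adding the individual error contributions (a union bound); it introduces no quantities beyond those in the text.

```latex
\begin{align*}
\gamma &< \frac{0.19}{3.53} \approx 0.054, \\
\text{state-preparation error} &\le c_1 n^{2} \cdot \frac{10^{-3}}{2 c_1 n^{2}} = \frac{10^{-3}}{2}, \\
\text{teleportation error} &\le 2 \cdot \frac{10^{-3}}{4} = \frac{10^{-3}}{2}, \\
\text{total logical error} &\le \frac{10^{-3}}{2} + \frac{10^{-3}}{2} = 10^{-3}.
\end{align*}
```

The first line shows why the argument cannot reach beyond γ ≈ 0.054 regardless of the postselected threshold, which is in line with the extrapolated value of about 0.05 quoted in the main text.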
G Resource Usage
The simulations keep track of the number of operations of different types that are applied in the course of implementing a quantum network. The resources required depend on whether the networks are implemented with maximum parallelism or sequentially: If they are implemented sequentially, one can take advantage of the ability to abort some computations early, but such implementations require quantum memory of sufficiently low error. Here we consider only the case of maximum parallelism. At the core of the fault-tolerant architecture is Bell pair preparation. One can analyze the resources required to construct a level l+1 Bell pair in terms of the number of level l Bell pairs consumed. As a first step, consider the case of zero EPG. In this case no error is ever detected and all networks succeed on the first try. We count only the number of physical qubit state preparations, p(l, γ), and the number of physical CNOTs, c(l, γ). The number of physical qubit measurements is less than the number of qubit state preparations.

Level   Preparations   CNOTs
0       2              1
1       16             20
2       192            300
3       2304           3780
4       2.765×10

With maximum parallelism, the average resource requirements increase by factors inversely related to the probability of success at various points in the preparation process. Preparing an encoded Bell state involves two sequential steps that may fail. The first verifies the initial states of each block before they are combined with CNOTs. Let the probability of successful verification of a block at level l be given by v(l, γ). The second involves teleportation of each subblock after the two blocks are combined. Let the overall probability of success of the teleportations be t(l, γ). Note that both of these probabilities of success are with respect to the combination of error correction and detection used in state preparation, which differs from the full error correction used in logical computation. The above resource formulas are modified as follows: p(0, γ) = 2, c(0, γ) = 1,

These formulas were obtained under the assumption that the verification of the two blocks proceeds independently with many simultaneous attempts, where the successful ones are then combined. This requires waiting for measurement outcomes and any associated memory error must be accounted for in γ. The subblock teleportations are not independent because of the immediately preceding transversal CNOT, which introduces correlated errors. Tables 2, 3, 4 show the success probabilities up to level 5 for γ = 0.01, 0.001, 0.0001 together with the resources estimated according to these recursive formulas and the resources determined by the simulation after averaging over the number of attempts made. The simulation is expected to show higher resource requirements because it involves some loss when combining unequal numbers of independently prepared blocks, as would be expected to occur in a real implementation. This was not taken into account in deriving the formulas. The values v(l, 0.01), t(l, 0.01) and the numbers in the "preparations" and "CNOTs" columns are obtained by simulation using the number of successful Bell pair preparations shown in the "# Bell pairs" column. Because only two successful preparations were used at level 5, the level 5 data have significant noise.

Resources for implementing logical gates transversally are dominated by those required for logical Bell state preparation. For example, the logical CNOT includes error-correcting teleportation and therefore requires two logical Bell states and three transversal CNOTs. The number of physical CNOTs in a transversal CNOT grows by a factor of 3 for each level after the first, whereas the number of physical CNOTs required for logical Bell state preparation grows by a factor greater than 12. This justifies focusing attention on the resources required for logical Bell state preparation.
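The growth factors quoted here can be read off the zero-EPG counts for levels 0 to 3. The sketch below only computes ratios of the tabulated values and compares them with the size of a transversal CNOT, taken to be one physical CNOT per physical qubit of a block (4, 12 and 36 qubits at levels 1 to 3). It does not reproduce the recursive formulas, and the helper names are hypothetical.

```python
# Zero-EPG counts for levels 0..3, copied from the table of preparations and CNOTs.
preparations = [2, 16, 192, 2304]
cnots = [1, 20, 300, 3780]


def growth(values):
    """Ratio of each entry to the previous one."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


def transversal_cnots(level):
    """Physical CNOTs in one transversal CNOT on level >= 1 blocks (assumed 4 * 3**(level - 1))."""
    return 4 * 3 ** (level - 1)


if __name__ == "__main__":
    print("preparation growth:", growth(preparations))
    print("Bell-pair CNOT growth:", growth(cnots))
    print("transversal CNOT sizes:", [transversal_cnots(l) for l in (1, 2, 3)])
```

The Bell-pair CNOT count grows by 15 and then 12.6 between successive levels, against a factor of 3 for the transversal CNOT, which is why the Bell-state preparation dominates.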
The biggest resource overhead is incurred when implementing non-Clifford gates such as $|\pi/8\rangle$-preparation (see below) or Toffoli gates. Note that two $|\pi/8\rangle$ states are needed to implement a Toffoli gate up to a reversible phase in the logical basis, which is all that is required for most uses of Toffoli gates. We have not attempted to optimize $|\pi/8\rangle$-preparation. Furthermore, it is possible that gates such as the Toffoli gate can be implemented more efficiently using other states, for example, using Steane's adaptation [43] of Shor's method [24].
For completeness and to obtain an upper bound on the requirements for a minimal non-trivial quantum algorithm at γ = 0.01, we outline one method for preparing good logical $|\pi/8\rangle$ states, discuss why the error when using these states is expected to be similar to that of one logical CNOT and estimate the average number of logical CNOTs required. A straightforward method for preparing a noisy logical $|\pi/8\rangle$ state is to prepare a logical Bell state, decode the first block and make a measurement in the basis $|\pi/8\rangle$, $|5\pi/8\rangle = -\sin(\pi/8)|0\rangle + \cos(\pi/8)|1\rangle$ of each of the two decoded, now physical qubits. Note that $|5\pi/8\rangle$ differs from $|\pi/8\rangle$ by a Y operator, so any measurement outcome is acceptable and can be accounted for by a change in Pauli frame if necessary. The simulations indicate that if the measurement has the same error probability as a Z- or X-measurement, then the error $\epsilon_{\pi/8}$ in the logical prepared state is near the EPG parameter γ.

To reduce the noise in the logical $|\pi/8\rangle$ states, one can purify them. The simplest purification method known so far involves using 15 prepared $|\pi/8\rangle$ states. One is encoded into the [[7, 1, 3]] code [44] (a code that encodes 1 qubit in 7 qubits with minimum distance 3, which implies that it can correct any (3 − 1)/2 = 1 qubit error or detect any (3 − 1) = 2 qubit errors). The other 2 × 7 = 14 $|\pi/8\rangle$ states are used to implement a conditional logical HAD from an ancilla to realize an encoded HAD measurement. Note that $|\pi/8\rangle$ is the +1 eigenstate of HAD. In the last step, the [[7, 1, 3]] code is decoded. If the measurement outcomes are as would be expected if no error had occurred, the state is accepted and has much reduced conditional error. The method is equivalent to Bravyi and Kitaev's scheme [28] (Reichardt, private communication) and can be analyzed using their formulas.
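The two single-qubit facts used here, that $|\pi/8\rangle$ is a +1 eigenstate of HAD and that $|5\pi/8\rangle$ is obtained from it by a Y operator, are easy to check numerically. The following is a minimal sketch that assumes $|\pi/8\rangle = \cos(\pi/8)|0\rangle + \sin(\pi/8)|1\rangle$, which is the convention consistent with the expression for $|5\pi/8\rangle$ above. The helper names are hypothetical and the equality with $Y|\pi/8\rangle$ holds up to the global phase i.

```python
import math

c, s = math.cos(math.pi / 8), math.sin(math.pi / 8)
KET_PI8 = [c, s]     # assumed: |pi/8>  =  cos(pi/8)|0> + sin(pi/8)|1>
KET_5PI8 = [-s, c]   # from text: |5pi/8> = -sin(pi/8)|0> + cos(pi/8)|1>

r = 1 / math.sqrt(2)
HAD = [[r, r], [r, -r]]      # Hadamard gate
Y = [[0, -1j], [1j, 0]]      # Pauli Y


def apply(matrix, ket):
    """Multiply a 2x2 matrix by a 2-component state vector."""
    return [sum(m * a for m, a in zip(row, ket)) for row in matrix]


def close(u, v, tol=1e-12):
    """Componentwise comparison of two state vectors."""
    return all(abs(a - b) < tol for a, b in zip(u, v))


def correctable_and_detectable(distance: int):
    """Errors a distance-d code can correct, or alternatively detect."""
    return (distance - 1) // 2, distance - 1


# HAD leaves |pi/8> unchanged, so it is a +1 eigenstate.
HAD_FIXES_PI8 = close(apply(HAD, KET_PI8), KET_PI8)
# Y|pi/8> equals i|5pi/8>, i.e. |5pi/8> up to a global phase.
Y_GIVES_5PI8 = close(apply(Y, KET_PI8), [1j * a for a in KET_5PI8])
# The two measurement basis states are orthogonal.
ORTHOGONAL = abs(sum(a * b for a, b in zip(KET_PI8, KET_5PI8))) < 1e-12

if __name__ == "__main__":
    print(HAD_FIXES_PI8, Y_GIVES_5PI8, ORTHOGONAL)
    print(correctable_and_detectable(3))
```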
With no error in the $G_{\min}$ and HADs used to implement the procedure, the probability of error in successfully purified gates is $\epsilon$

Consider the effect of logical gate error on the error in the purified $|\pi/8\rangle$. We conjecture that by using state injection with Steane's fault-tolerant methods for preparing states, the additional error on the purified $|\pi/8\rangle$ state is dominated by a decoding error of the order of the logical CNOT error. Specifically, one can encode one noisy logical $|\pi/8\rangle$ by teleportation into the [[7, 1, 3]] code using a Bell state correlating a logical qubit and a [[7, 1, 3]]-encoded qubit. (Strictly speaking, our architecture requires the use of logical qubit pairs associated with blocks of the $C_4/C_6$ codes, but we treat each qubit in a pair identically.) This Bell state has minimum distance 4, so that any combination of Pauli errors on up to three qubits results in an orthogonal state. It can therefore be well verified using Steane's methods.

The error in the state teleported into the [[7, 1, 3]] code is due to the initially prepared $|\pi/8\rangle$ state, initial error in the logical qubit of the Bell state used, and the CNOT and measurements needed for the teleportation Bell measurement. Because we are operating with logical qubits of the $C_4/C_6$ architecture, all but the first of these errors are comparatively small, assuming that the $C_4/C_6$ encoding level is chosen so as to significantly decrease CNOT errors. The errors have two effects. One is to modify the encoded state, which can be subsumed by considering this as additional error in the initial $|\pi/8\rangle$ state. The other is to perturb the syndrome of the encoded state. If two or fewer errors occurred, this can be detected in the decoding stage. The encoded state is verified using the controlled-HADs implemented with the other 14 noisy logical $|\pi/8\rangle$ states. Each of these controlled-HADs involves at most five CNOTs [27].
The error is dominated by that in the $|\pi/8\rangle$ states used. Additional error due to the logical CNOT either has a smaller effect, to be detected in decoding, or results in the wrong outcome in the encoded HAD measurement. The latter event could cause unintentional acceptance of the final state, but only if additional error occurred elsewhere. At the end of the procedure, the [[7, 1, 3]]-encoded qubit is decoded and the syndrome verified. One can decode directly or by reverse teleportation through the same type of Bell state used for the initial teleportation, verifying the syndrome in the teleportation process. The latter method may be more robust. In all cases, the effects of additional errors are either suppressed by the fault-tolerant methods used to encode and decode the [[7, 1, 3]] code, or can be subsumed as a relatively small amount of additional error in the initial $|\pi/8\rangle$ states due to at most five logical CNOTs. Based on experience with $C_4/C_6$ codes, it is likely that the additional error from encoding and decoding is of the order of that of a logical CNOT, whereas the effective additional $|\pi/8\rangle$ error should be sufficiently small (because of the significant decrease in CNOT errors at the level chosen) to have little effect on the error in the purified $|\pi/8\rangle$.
We estimate the number of logical CNOTs needed for the $|\pi/8\rangle$ purification process. The Bell state needed for injection into the [[7, 1, 3]] code can be prepared from a logical Bell state by encoding one of the two blocks into the [[7, 1, 3]] code. Eleven CNOTs suffice for encoding. The resulting state can be verified using Steane's methods. There are eight syndromes, each of weight 4, to check; each requires an ancilla preparation with five CNOTs and four CNOTs for the syndrome check. If memory is an issue, it may be necessary to add error-correcting teleportations not associated with a gate. We do not consider this here but note that this may add another four logical Bell states per syndrome check to the resources required. If the robust decoding scheme is used, two of the injection Bell states are required overall. The verification process using controlled-HADs requires about 5 × 7 CNOTs. This gives a total of 201 CNOTs, but does not take into account the probability of failure in the various checks. We can estimate this probability as $1 - (1 - p)^{201}$, where p is the probability of detected error in a logical CNOT. For the relevant parameters, p is below 0.003. Taking the average number of trials required due to logical gate failure as $1/(1 - p)^{201}$, we can upper bound the average number of CNOTs required as 370.
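The tally above can be reproduced with a few lines of arithmetic. The sketch below is one way to arrive at the quoted total, assuming that each of the two injection Bell states costs the 11 encoding CNOTs plus 8 syndrome checks of 5 + 4 CNOTs each, and that the controlled-HAD verification adds 5 × 7 CNOTs. The grouping and the names are an interpretation of the text, not a statement of the authors' bookkeeping.

```python
ENCODE_CNOTS = 11            # encode one block into the [[7,1,3]] code
SYNDROMES = 8                # syndromes checked per injection Bell state
ANCILLA_CNOTS = 5            # ancilla preparation per syndrome
CHECK_CNOTS = 4              # CNOTs per weight-4 syndrome check
INJECTION_BELL_STATES = 2    # robust decoding uses two injection Bell states
CONTROLLED_HAD_CNOTS = 5 * 7 # verification with controlled-HADs

PER_BELL_STATE = ENCODE_CNOTS + SYNDROMES * (ANCILLA_CNOTS + CHECK_CNOTS)
TOTAL_CNOTS = INJECTION_BELL_STATES * PER_BELL_STATE + CONTROLLED_HAD_CNOTS


def failure_probability(total: int, p: float) -> float:
    """Probability that at least one of `total` logical CNOTs has a detected error."""
    return 1 - (1 - p) ** total


def expected_cnots(total: int, p: float) -> float:
    """Average CNOT count when the whole procedure is retried until no error is detected."""
    return total / (1 - p) ** total


if __name__ == "__main__":
    print("CNOTs per attempt:", TOTAL_CNOTS)
    print("failure probability at p = 0.003:", failure_probability(TOTAL_CNOTS, 0.003))
    print("expected CNOTs at p = 0.003:", expected_cnots(TOTAL_CNOTS, 0.003))
```

Because p is stated to be below 0.003, evaluating at p = 0.003 gives a pessimistic value, which is why it serves as an upper bound.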
An obvious optimization of the $|\pi/8\rangle$-purification method in the context of the fault-tolerant $C_4/C_6$ architecture is to concatenate with the [[7, 1, 3]] code as a last level, lifting all logical states accordingly, but injecting $|\pi/8\rangle$ states to the last $C_4/C_6$ level as before for $|\pi/8\rangle$ purification purposes. This avoids having to decode the purified $|\pi/8\rangle$ states while achieving significantly lower error probabilities. Within the $C_4/C_6$ scheme, if more than one purification stage is required, it may be worthwhile injecting and purifying states at intermediate levels before injecting and purifying at the top.
As an example, consider γ = 0.01, aiming for implementing a non-trivial quantum computation. The smallest non-trivial quantum computation must be one involving more qubits than can be directly simulated on existing classical computers. 100 qubits is a safe number for this property. Such a quantum computation should also apply sufficiently many gates for a classical simulation with current computers not to be able to predict the output of the quantum computation by taking advantage of restrictions on the reachable states. If the number of gates applied involves sufficiently many parallel steps of non-Clifford gates involving all qubits, this is expected to be the case. Short of having an explicit example of a computation whose output is unknown and not believed to be accessible to classical computers, we assume that 10 steps involving parallel CNOTs, HADs and $|\pi/8\rangle$-preparations suffice.¹ We therefore take 1000 as a minimal number of gates in a non-trivial quantum algorithm. Note that with EPGs of 0.01 it is not possible to combine this many physical gates and still expect that a computation's output can be discerned. If more than 68 physical gates at this EPG are applied, the probability that the output is correct cannot be guaranteed to be strictly greater than 0.5. Although the output of a computation with so few gates may already be difficult to simulate with current classical computers, it is conceivably possible to do so.
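The figure of 68 physical gates follows from a simple bound. As a minimal sketch, assume that gate errors are independent and that the output is only guaranteed correct when no gate fails, so that the guaranteed success probability after n gates is (1 − γ)^n. The function name is hypothetical.

```python
def max_unprotected_gates(epg: float, min_success: float = 0.5) -> int:
    """Largest n for which (1 - epg)**n stays strictly above min_success."""
    if not 0 < epg < 1:
        raise ValueError("epg must lie strictly between 0 and 1")
    if not 0 < min_success < 1:
        raise ValueError("min_success must lie strictly between 0 and 1")
    n = 0
    # Add gates one at a time while the no-error probability stays above the bound.
    while (1 - epg) ** (n + 1) > min_success:
        n += 1
    return n


if __name__ == "__main__":
    n = max_unprotected_gates(0.01)
    print(n, (1 - 0.01) ** n, (1 - 0.01) ** (n + 1))
```

At γ = 0.01 the no-error probability drops below 0.5 between the 68th and 69th gate, far short of the 1000 gates taken as the minimum for a non-trivial algorithm.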
Consider level 4 of our scheme at γ = 0.01. The detected error probability of a logical CNOT is $p_d = 2.4^{+1.0}_{-0.7} \times 10^{-3}$. The conditional probability of a logical error is much lower and estimated as $p_c = 2.3 \times 10^{-5}$ (see below). The logical $|\pi/8\rangle$-purification method ensures that similar error probabilities apply to uses of these states in the algorithm. The probability that there is a detected failure in 1000 gates is

¹ Finding a computation with as few as 100 qubits and fewer than $10^4$ gates with a definite and convincing answer of interest independent of quantum information theory would be very helpful and could be a boon for quantum information processing. For comparison, all fully worked out computations of this sort seem to require that the number of gates greatly exceeds $10^9$.

Resource requirements for implementing a given computation decrease significantly with γ. Simulation is too inefficient for resolving the dependence of resource requirements on γ, particularly when error probabilities are extremely small. We therefore obtain and verify simple models for resources and errors as a function of γ and level of concatenation. Ideally, we would like to obtain analytic expressions; however, this is difficult to do, particularly since our schemes are not strictly concatenated, and the combination of error detection and correction behaves differently depending on the level. Nevertheless, it is possible to derive functional forms for Bell state preparation resources and logical CNOT error behavior that are asymptotically valid as γ goes to 0.
We model the number of physical CNOTs required for preparing logical Bell states at level l as $\mathrm{rbell}(l, \gamma) = P(l)/(1 - \gamma)^{k(l)}$. This is a naive model based on assuming that the resources are determined by applying a network with P(l) physical CNOT gates, k(l) of which fail independently with probability γ each, and the network is repeatedly applied until no failure is detected. Perhaps surprisingly, this model matches the simulations well in the range shown in Fig. 15.

To understand the error behavior of the $C_4/C_6$ architecture, suppose more generally that we have a fault-tolerant scheme A for implementing an encoded gate, which results in a detected, uncorrectable error with probability $p_d$, or an undetected logical error with probability $p_c$, conditional on not having detected an error. Suppose that this is concatenated with a one-error detecting (minimum distance 2) code C and used in a scheme similar to the ones used here. C can correct any error at a known location. If the implementation of C-encoded gates is fault tolerant and includes error-correcting teleportation or another method for determining the C-syndrome, any one error detected by A can be corrected with no resulting encoded error. The event that an error is detected but not correctable during implementation of a C-encoded gate therefore requires at least one undetected error or at least two detected errors. The conditional event that an undetected error occurs requires that the A gates used have one detected and one undetected error or two or more undetected errors. To lowest order, the detected and conditional error probabilities for C-encoded gates are therefore of the form p

In our case, $p_d$ and $p_c$ depend on one parameter γ. After level 1, the order in γ of $p_c$ is always between $p_d$ and p. As is typical of concatenation schemes, the exponent grows exponentially.
In view of the previous paragraph, we examine the data shown in Fig. 3 to determine C(l), D(l) for l = 1, 2 and c(l), d(l) for l = 1, 2, 3. The results are shown in Table 5. We computed the values of c(l) and d(l) by fitting the model curves to the error probabilities obtained by simulation. The points at γ = 0.01 were omitted for levels 1, 2 and 3 to reduce the chance of introducing optimistic biases by the curves' leveling off at higher γ, although this effect has not been observed. We obtained the fits by starting with a least-squares fit of the log-log plots and then using a fastest-descent method to optimize the likelihood. We computed standard deviations by resampling the data according to the fitted curve and repeating the fitting process. The fitted curves are shown with the data in Fig. 16.

FIGURE 16: Fits to the error data for the logical CNOT. Horizontal axis: physical CNOT error probability γ. The model assumed is

The constants D(l), C(l) are significantly reduced in going from level 2 to level 3 compared to going from level 1 to 2. Level 2 is the first stage of using $C_6$ and the first where error correction can be used. One may conjecture that the level 2 to level 3 behavior persists or improves at higher levels, as is the case for D(3) compared to D(2). For the purposes of modeling errors we use this conjecture to recursively obtain d(l + 1) and c(l + 1) with D(2) and C(2) in place of D(l) and C(l) for l > 2. It is an interesting exercise to use the recursion implied by the D(l) and C(l) to obtain a threshold. The threshold thus obtained is conjectural, because the approximations made are not strictly valid, particularly at high γ, and because of the extrapolation of D(l) and C(l). By implementing the recursion numerically, we obtained a threshold of ≈ 0.028 for this architecture, which does not seem unreasonable in view of the data shown in Fig. 3. Of course, the resource overheads diverge as any such threshold is approached from below.
We return to the question of resource requirements for implementing gates at γ < 0.01. As γ decreases, the physical resources required per logical CNOT are reduced in two ways. First, the state preparation success probabilities at a given level of concatenation increase; see Tables 2, 3 and 4 and Fig. 15. This increase is particularly notable near the upper limit for γ. Second, fewer levels of concatenation suffice for achieving sufficiently low logical errors. Consider implementing a computation C with the product of the number of logical gates and average number of qubits per gate given by KQ. For computations that are not maximally parallel, this quantity should include memory delays in the gate count. To simplify the resource estimates, logical errors and physical gate counts are given in terms of "effective" error and physical gate counts per (logical) qubit and gate. For example, consider the logical CNOT in the $C_4/C_6$ architecture. It acts on two logical qubit pairs, so its effective error per qubit is 1/4 of its total error. Similarly, its effective physical gate count per qubit is 1/4 of the total gate count. With this simplification, we can estimate the total error and number of physical gates for implementing the computation C by multiplying KQ by the appropriate effective quantity and a nontransversal-gate state preparation overhead. In making these estimates, we assume that (1) each of the logical gates needed by C can be implemented with effective error similar to that of the logical CNOT, (2) the implementation can take advantage of both logical qubits in the logical qubit pairs and (3) overhead for addressing individual logical qubits in the pairs is accounted for in the nontransversal-gate state preparation overhead.
The assumptions require that the nontransversal-gate state preparations have the property that logical gates used in the preparations do not contribute additional error, as is the case for the $|\pi/8\rangle$ state preparation described above. The reason for not including the nontransversal-gate state preparation overhead in the effective quantities per qubit and gate is that this overhead can be optimized independently of the architecture and depends on the choice of elementary nontransversal gates. It is expected to add one to two orders of magnitude to the total implementation resources.
We estimate the optimal effective number pcnot(KQ, γ) of physical CNOTs per qubit and gate as a function of the size KQ of C and the EPG parameter γ. As noted above, other physical resources such as state preparation and measurement are comparable. We optimize pcnot(KQ, γ) by choosing the level l of the $C_4/C_6$ architecture and using it to repeatedly implement C until no uncorrectable error is detected in the logical gates. At this point the output of C must be correct with probability at least 2/3. The value of 2/3 is chosen to be strictly between 1/2 and 1 but is otherwise not crucial. At the minimizing level, pcnot(KQ, γ) is computed as the product of the average number of times C must be implemented until no error is detected and 1/2 of the number of physical CNOTs, rbell(l, γ), needed to prepare a logical Bell state (neglecting the relatively small additional number of physical CNOTs needed for transversal gates and for using the Bell state in an error-correcting teleportation). The factor of 1/2 accounts for having two qubits in each block of the $C_4/C_6$ concatenated codes. The probability of success of a single instance of C can be estimated as $(1 - p_d(l, \gamma)/4)^{KQ}$, which is approximately correct for our accounting using effective errors per qubit and gate, provided that $p_d(l, \gamma)$ is small. On average, C must be tried $1/(1 - p_d(l, \gamma)/4)^{KQ}$ times to successfully obtain the output. The conditional probability of a successful output's being correct is $(1 - p_c(l, \gamma)/4)^{KQ}$. Thus, given KQ, the optimal pcnot(KQ, γ) is obtained as the minimum over l of $\tfrac{1}{2}\,\mathrm{rbell}(l, \gamma)/(1 - p_d(l, \gamma)/4)^{KQ}$ subject to $(1 - p_c(l, \gamma)/4)^{KQ} \geq 2/3$. Curves for pcnot(KQ, γ) for various KQ as a function of γ are plotted in Fig. 5.
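The minimization just described is straightforward to express in code. The sketch below is only an illustration of the procedure. The functions `rbell`, `p_d` and `p_c` are passed in by the caller because their fitted forms are not reproduced in this passage, and all names are hypothetical. The only numbers taken from the text are the level 4 values at γ = 0.01 ($p_d = 2.4 \times 10^{-3}$, $p_c = 2.3 \times 10^{-5}$) used in the demonstration.

```python
from typing import Callable, Iterable, Optional, Tuple

Model = Callable[[int, float], float]  # maps (level, gamma) to a number


def trials_until_success(p_d: float, kq: float) -> float:
    """Average number of runs of C until no uncorrectable error is detected."""
    return 1.0 / (1.0 - p_d / 4.0) ** kq


def conditional_correctness(p_c: float, kq: float) -> float:
    """Probability that an accepted run of C produced the correct output."""
    return (1.0 - p_c / 4.0) ** kq


def optimal_pcnot(kq: float, gamma: float, levels: Iterable[int],
                  rbell: Model, p_d: Model, p_c: Model,
                  min_correct: float = 2.0 / 3.0) -> Optional[Tuple[int, float]]:
    """Return (level, pcnot) minimizing 0.5 * rbell / (1 - p_d/4)**kq.

    Levels whose accepted output is correct with probability below
    `min_correct` are skipped. Returns None if no level qualifies.
    """
    best = None
    for level in levels:
        if conditional_correctness(p_c(level, gamma), kq) < min_correct:
            continue
        cost = 0.5 * rbell(level, gamma) * trials_until_success(p_d(level, gamma), kq)
        if best is None or cost < best[1]:
            best = (level, cost)
    return best


if __name__ == "__main__":
    # Level 4 at gamma = 0.01 with KQ = 1000, using the values quoted in the text.
    print("average trials:", trials_until_success(2.4e-3, 1000))
    print("P(correct | accepted):", conditional_correctness(2.3e-5, 1000))
```

Passing the models as callables keeps the optimization separate from whichever fitted expressions for rbell, $p_d$ and $p_c$ are used.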
The quantity pcnot(KQ, γ) gives the overall "work" overhead for implementing a computation using the $C_4/C_6$ architecture, but does not differentiate between parallel and sequential resources or indicate the number of physical qubits needed per logical qubit ("scale-up"). The $C_4/C_6$ architecture does not determine these resources uniquely, as they depend on how the trade-off between parallelism and requirements for memory is resolved. In the case of maximum parallelism, the scale-up is close to pcnot(KQ, γ). If minimum parallelism is used, this can be reduced to a small multiple of the minimum scale-up associated with the $C_4/C_6$ concatenated code at the level l that is used. This minimum scale-up is given by $2 \cdot 3^{l-1}$ (taking into account that there are two qubits per block of $4 \cdot 3^{l-1}$ qubits). If there is no memory error at all, the additional overhead per block can be minimized by operating on only one block at a time. Otherwise, for each block, two additional blocks are needed in error-correcting teleportation. Logical Bell state preparation requires an additional overhead depending on the degree of parallelism required. If the subblock teleportations in the preparation are done in parallel, and taking into account lower-level Bell state preparations, two more blocks or equivalent are needed for each level other than the first. This means that 1 + 2(l − 1) blocks are needed per computational block. The $|\pi/8\rangle$-state preparation has additional overhead. Depending on how it is implemented, it may require up to 14 blocks with their own overhead of 1 + 2(l − 1) or more blocks each. The contribution of $|\pi/8\rangle$-state preparation can be minimized by implementing the logical part of the computation sequentially but using memory steps to remove the effects of memory error as needed. Based on these estimates, the scale-up for low but not minimum parallelism is $\approx 2 \cdot 3^{l-1}(1 + 2(l - 1))$. At levels 2, 3 and 4, this evaluates to 18, 90 and 378, respectively.
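The scale-up estimate is a product of two simple factors, which the following sketch evaluates for the levels quoted above. The function names are hypothetical, and the formulas are exactly those stated in the text.

```python
def min_scale_up(level: int) -> int:
    """Physical qubits per logical qubit: a block of 4 * 3**(level - 1) qubits holds 2 logical qubits."""
    if level < 1:
        raise ValueError("level must be at least 1")
    return (4 * 3 ** (level - 1)) // 2


def blocks_per_computational_block(level: int) -> int:
    """One computational block plus two extra blocks for each level above the first."""
    return 1 + 2 * (level - 1)


def low_parallelism_scale_up(level: int) -> int:
    """Approximate scale-up for low but not minimum parallelism."""
    return min_scale_up(level) * blocks_per_computational_block(level)


if __name__ == "__main__":
    for level in (2, 3, 4):
        print(level, min_scale_up(level), low_parallelism_scale_up(level))
```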
The error-correcting $C_4/C_6$ architecture is relatively simple and designed to work well at high EPGs. However, there is a minimum resource cost (of order $10^3$ per gate and qubit) to use it, since error correction kicks in only at level 2. As a result, at low EPGs, architectures such as Steane's [9], based on more efficient codes with little or no concatenation, are more efficient and have more flexibility in achieving the desired logical error probabilities. This effect can be quantified by comparing the $C_4/C_6$ architecture to that of Steane using the illustrative example at $\gamma \approx 10^{-4}$ worked out in [9]. Steane's error model differs from ours in that preparation, measurement and one-qubit gates all have error probability γ. In our analysis, preparation and measurement errors are 4γ/15, which we justified with a purification scheme. This scheme could also be used in the context of Steane's error model. We compare the two architectures based on the resources per logical qubit of one logical step such as a CNOT, for which the $C_4/C_6$ architecture does not require one-qubit gates other than preparation and measurement. Steane's error model also includes memory error (γ/100 per step) and accounts for measurement times in excess of gate times (25 times the gate time). In our model and in the maximally parallel setting, this would require an additional error of γ/4 per qubit at the end of state preparation to delay for measurement outcomes that determine whether the state is good or not. The comparison is also complicated by Steane's method deferring some error correction to later steps (we do not account for the implicit overhead in this) and by our method having both detected and conditional logical error, with the latter typically being much lower (we use only the detected error for comparison).
Steane's example is based on a [[127, 43, 13]] code, which encodes 43 logical qubits. Full error correction of a block requires about $1.8 \times 10^4$ physical CNOTs on average and has a probability of logical error (called "crash probability" in [9]) of $\approx 3 \times 10^{-10}$. This translates to ≈ 420 physical CNOTs per qubit and gate and an effective error of $\approx 7 \times 10^{-12}$ per logical qubit. The $C_4/C_6$ architecture at level 3 uses 4158.1 physical CNOTs for an error-correcting teleportation. Including 36 physical gates for a transversal operation, this gives ≈ 2100 physical CNOTs per qubit and gate. The detected error probability for a logical CNOT was estimated above as $2.2 \times 10^{-8}$, which translates to $\approx 5.5 \times 10^{-9}$ effective error per qubit and gate. To meet the effective error probability achieved by Steane requires another level of encoding. At level 4, the $C_4/C_6$ architecture uses $2.6 \times 10^4$ physical CNOTs per qubit and gate and has a detected error probability of $6 \times 10^{-14}$ per qubit and gate.

One can also compare the scale-up for the two architectures at $\gamma = 10^{-4}$: Steane's example has a scale-up of between 10 and 20, compared to from 378 to over 2000 for the $C_4/C_6$ architecture at level 3, depending on parallelism. As expected, Steane's architecture requires fewer resources at low EPGs. It is, however, notable that the $C_4/C_6$ architecture requires only two orders of magnitude more resources at EPGs as low as $\gamma = 10^{-4}$. The $C_4/C_6$ architecture has the advantage of simplicity and of yielding more reliable answers, conditional on having no detected errors.
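The per-qubit figures in this comparison can be recovered from the quoted totals. The sketch below assumes that Steane's block totals are divided by the 43 logical qubits of the block, that the $C_4/C_6$ CNOT count is divided by the 2 logical qubits per block, and that the logical CNOT error is divided by 4 as in the effective-error convention introduced earlier. These divisors are an interpretation of the text, and the variable names are hypothetical.

```python
# Steane's example: [[127, 43, 13]] code at gamma of about 1e-4.
STEANE_LOGICAL_QUBITS = 43
STEANE_CNOTS_PER_BLOCK = 1.8e4
STEANE_CRASH_PROBABILITY = 3e-10

# C4/C6 architecture at level 3.
C4C6_TELEPORT_CNOTS = 4158.1
C4C6_TRANSVERSAL_GATES = 36
C4C6_QUBITS_PER_BLOCK = 2
C4C6_DETECTED_ERROR = 2.2e-8
QUBITS_PER_LOGICAL_CNOT = 4  # two logical qubit pairs

steane_cnots_per_qubit = STEANE_CNOTS_PER_BLOCK / STEANE_LOGICAL_QUBITS
steane_error_per_qubit = STEANE_CRASH_PROBABILITY / STEANE_LOGICAL_QUBITS
c4c6_cnots_per_qubit = (C4C6_TELEPORT_CNOTS + C4C6_TRANSVERSAL_GATES) / C4C6_QUBITS_PER_BLOCK
c4c6_error_per_qubit = C4C6_DETECTED_ERROR / QUBITS_PER_LOGICAL_CNOT

if __name__ == "__main__":
    print("Steane CNOTs per qubit and gate:", steane_cnots_per_qubit)
    print("Steane effective error per qubit:", steane_error_per_qubit)
    print("C4/C6 level 3 CNOTs per qubit and gate:", c4c6_cnots_per_qubit)
    print("C4/C6 level 3 effective error:", c4c6_error_per_qubit)
    print("CNOT ratio:", c4c6_cnots_per_qubit / steane_cnots_per_qubit)
```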
