We study the performance of distance-three surface code layouts under realistic multi-parameter noise models. We first calculate their thresholds under depolarizing noise. We then compare amplitude and phase damping with its Pauli-twirl approximation. We find that the approximate channel yields a pessimistic estimate of the logical error rate, indicating that the realistic threshold may be higher than previously estimated. From Monte-Carlo simulations, we identify experimental parameters for which these layouts admit reliable computation. Due to its low resource cost and superior performance, we conclude that the 17-qubit layout should be targeted in early experimental implementations of the surface code. We find that architectures with gate times in the 5-40 ns range and T1 times of at least 1-2 µs will exhibit improved logical error rates with a 17-qubit surface code encoding.
I. INTRODUCTION
Topological quantum error-correcting codes are a leading approach to scalable fault-tolerant quantum computation [1, 2]. The most practical topological code to date is the surface code, which requires only a 2-D planar qubit layout with nearest-neighbor interactions [3-6]. It has been shown to tolerate error rates up to a threshold of approximately 1% [1, 7-9]. Several quantum architectures, including superconducting devices [10, 11] and ion traps [12-15], are suitable for realizing the surface code. Recent experiments on superconducting qubits have demonstrated error rates in the required range [16].
Until recently, the threshold for the surface code was calculated primarily for the depolarizing channel [1, 7-9]. Simulating the surface code under the depolarizing channel requires only Clifford operations and Pauli measurements on stabilizer states, which can be simulated efficiently on a classical computer by the Gottesman-Knill theorem [17, 18].
It has been shown that realistic quantum noise such as decoherence can be adequately approximated by a depolarizing noise model parametrized by a method such as Pauli twirling [19, 20], enabling efficient simulation. Simulations of the surface code with noise based on Pauli-twirl approximations have been performed for several superconducting architectures [21]. Other studies have achieved efficient classical simulation of realistic noise models by using Clifford gates to approximate arbitrary gates [22, 23] and amplitude damping [24].
More recently, it has been shown that the surface code threshold is significantly degraded in the presence of qubit leakage in conjunction with depolarizing noise [25]. The surface code has also been shown to achieve arbitrary reliability with modest additional qubit resources under local many-qubit errors and non-local two-qubit errors [26]. A recent study determined a threshold for the surface code that accounts for correlated errors and the coupling between qubits and the environment by formulating the problem as an Ising model [27].
In all cases, the thresholds have been calculated for a standard surface code layout. Variations of the surface code layout have been proposed [28, 29] that reduce the qubit and gate resources necessary for implementation. To the best of our knowledge, the thresholds for these modified surface code layouts have not been analyzed. In addition, studies of the threshold under realistic (non-Clifford) noise models have been limited due to the exponential cost of simulation. With device error rates rapidly approaching the surface code threshold, it is timely to investigate the performance and requirements of low-distance surface code layouts for near-term experimental implementation.
In this work, we determine the threshold for distance-three surface code layouts under depolarizing and realistic noise models. We study the layouts under an amplitude and phase damping channel and under an approximation of that channel obtained by Pauli twirling [21]. Our studies require simulating non-Clifford operations, which requires memory exponential in the number of qubits. We use the LIQUi|⟩ [30] software architecture for our simulations. We also outline parameter regimes that enable reliable quantum error correction for low-distance surface codes, and we present a decoder based on a small lookup table optimized for distance-three layouts and limited classical computation.
Our paper is organized as follows. Section II briefly reviews the surface code and three layouts for the distance-three code. We introduce our decoding method, based on a small lookup table, in Section III. Section IV describes the realistic noise models and their approximations. Our experimental methodology is introduced in Section V. In Section VI, we present our surface code simulation results. Finally, we conclude in Section VII.
II. LOW-DISTANCE SURFACE CODES
The surface code is a stabilizer code arranged on a 2-D lattice with nearest-neighbor interactions [3]. It encodes a single logical qubit in a number of physical qubits determined by the code distance d and the chosen layout (described below). Through repeated measurement of its stabilizer generators, the surface code, in conjunction with a classical decoding algorithm, can detect errors and correct up to $\lfloor (d - 1)/2 \rfloor$ physical errors. The distance d is the length of the shortest undetectable error chain, which is also the length of the shortest logical operator. For an excellent review of the surface code, we refer the reader to [1].
A. 25-qubit Layout
We study three different distance d = 3 layouts, shown in Figure 1. We begin with the standard layout, referred to as Surface-25, shown in Figure 1(a). It uses a (2d − 1) × (2d − 1) square grid of qubits with smooth and rough boundaries [4]. For d = 3, the grid contains 25 qubits: 13 data qubits (large white circles) encode the logical qubit, and 12 syndrome qubits (small black circles) extract the error syndromes by way of stabilizer measurements.
Surface-25 is stabilized by the group generated by the stabilizers listed in Table II. In Fig. 1, the Z stabilizers are represented by light (yellow) patches and the X stabilizers by dark (green) patches; each patch represents a tensor product of Z (or X) operators on the data qubits surrounding the patch.
A logical X operator $X_L$ is defined as a chain of physical X operations between two data qubits on opposite smooth boundaries (top and bottom edges). The chain may cross any Z stabilizer patch and follow any edge of an X stabilizer patch. A logical Z operator $Z_L$ is defined analogously as a chain of physical Z operations between two data qubits on opposite rough boundaries (left and right edges). Table II lists one possible logical X and Z operator. There are $2^G$ equivalent logical operators for each logical Pauli operator (X and Z), where G is the number of stabilizer generators for the given surface code. Since $X_L$ and $Z_L$ commute with all of the stabilizers but cannot be written as products of them, logical errors, which take the form of logical operators, cannot be detected by the code.
The surface code detects errors through the eigenvalues of the stabilizers. A bit flip (phase flip) on a data qubit changes the eigenvalue of the adjacent Z (X) stabilizers. To extract an eigenvalue, also referred to as an error syndrome, the corresponding stabilizer is measured. Figure 2 shows the standard quantum circuit for measuring the stabilizers [1, 9], where data qubit b corresponds to the top (north) qubit and c to the bottom (south) qubit of each diamond patch in Fig. 1(a).
The circuit begins with CNOT gates that propagate error information from the data qubits a, b, c, d to the syndrome qubit (black circle). The CNOT gates are applied in the order top (b), left (a), right (d), bottom (c). Cyclic orders, such as clockwise or counter-clockwise (e.g., bdca), fail to maintain commutation between neighboring stabilizer measurements, which in turn can cause random measurement outcomes [1]. The CNOT gates must therefore follow an "S" or "Z" shape.
The syndrome qubit is then measured to extract the eigenvalue of the stabilizer. These error syndromes are input to a classical decoding algorithm to determine an appropriate correction operator. Details of our decoding algorithm are given in Section III. The total number of operations in a given round of stabilizer measurements for the surface code is given in Table I.
B. 13- and 17-qubit Layouts
The number of qubits in Surface-25 can be reduced while maintaining the same code distance by rotating the lattice clockwise by 45 degrees and removing the four corner data qubits [28, 29], as shown in Fig. 1(b). The number of data qubits is reduced from 13 to 9 and the number of syndrome qubits from 12 to 8, for a total of 17 qubits. We call this layout Surface-17. Its stabilizer generators include weight-4 and weight-2 stabilizers (Table II). Figure 3(a) shows the circuit for simultaneously measuring a weight-4 X stabilizer and a weight-2 Z stabilizer.
A further reduction in qubits can be obtained by reusing the syndrome qubits [29]. Surface-13 uses only 4 syndrome qubits, as shown in Fig. 1(c). Each syndrome qubit is used twice per round, once for an X stabilizer measurement and once for a Z stabilizer measurement. Figure 3(b) contains the corresponding circuit for measuring a weight-4 X stabilizer followed by a weight-2 Z stabilizer. Surface-13 reduces the number of qubits but increases the depth of a round by 4 time steps. The depth and number of operations required for one round of Surface-17 and Surface-13 are given in Table I, and their stabilizers and logical operators are listed in Table II. Despite having fewer stabilizers, Surface-17 and Surface-13 remain distance-three surface codes [28, 29]. Because they reduce the number of qubits by 32-48% relative to Surface-25, these layouts are promising candidates for early experimental implementation. In Section VI, we determine which layout is most promising based on its threshold and resource costs.
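As a quick check of the qubit counts above, the following Python sketch computes the number of data and syndrome qubits for each distance-three layout and the resulting reduction relative to Surface-25. The general-d formulas are our assumption, extrapolated from the d = 3 counts given in the text; the four shared syndrome qubits of Surface-13 are hard-coded for d = 3 only.

```python
def surface25_counts(d):
    """(data, syndrome) qubits for the standard (2d-1) x (2d-1) planar layout."""
    total = (2 * d - 1) ** 2
    data = d * d + (d - 1) * (d - 1)
    return data, total - data


def surface17_counts(d):
    """(data, syndrome) qubits for the rotated layout with corners removed."""
    return d * d, d * d - 1


def reduction(n_qubits, baseline):
    """Fractional reduction in qubit count relative to a baseline layout."""
    return 1 - n_qubits / baseline


d = 3
data25, syn25 = surface25_counts(d)
data17, syn17 = surface17_counts(d)
n25, n17 = data25 + syn25, data17 + syn17
n13 = data17 + 4  # Surface-13 (d = 3 only): same 9 data qubits, 4 reused syndrome qubits

print(n25, n17, n13)
print(round(reduction(n17, n25), 2), round(reduction(n13, n25), 2))
```

For d = 3 this reproduces the 25, 17 and 13 qubit totals and the 32% and 48% reductions quoted above.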
III. DECODING METHOD
A standard method for mapping error syndromes to the most probable error chain is the minimum weight perfect matching algorithm [7, 31, 32]. It requires O(n) time for n detection events if executed serially, and O(1) time if executed in parallel [33]. The algorithm corrects X and Z errors independently by identifying, for each error type, the most likely set of error chains, namely the one of minimal total weight. It has recently been extended to handle correlations between X and Z errors, in which case the chains are not constructed independently [34]. Corrections are applied along the chain(s). If, after correction, a chain of errors connecting two smooth (rough) boundaries remains, a logical error has occurred. If errors are independent, long chains are exponentially unlikely.
A. Lookup Table Decoder
In this work we target first-generation implementations of a single qubit protected by a small surface code. While the classical time and space requirements of the minimum weight perfect matching algorithm are modest, we further reduce the classical computational overhead by designing a lookup table based on the algorithm that can be implemented on a small classical device. Our lookup table is designed to find the most probable low-weight error chain from a short history of error syndromes.
Consider the set of error syndromes that indicate an error, that is, a −1 eigenvalue, after one full (noisy) round of the surface code. Based on the locations of these syndromes, the decoder determines the probable data-qubit error locations. For example, consider a Z error on qubit 4 in Surface-17 (Fig. 1(b)). If no other errors occur, after one round of the surface code syndrome qubits 11 and 14 will indicate an error. The decoder determines that the shortest error chain connecting these two syndromes passes through data qubit 4, and corrects the error by applying $Z_4$.
As another example, consider an X error on qubit 6. It causes syndrome 13 to indicate an error. Since syndromes 10 and 12 do not indicate errors, the decoder infers an error on either data qubit 6 or data qubit 7. In this case, the decoder can apply either $X_6$ or $X_7$, since $X_6 X_7$ is a stabilizer.
An error syndrome may also arise from a measurement error, which the decoder may misinterpret as a data-qubit error. For example, consider a measurement error on syndrome qubit 11. The decoder will apply either $Z_0$ or $Z_3$ to "correct" the error, thereby introducing an error on a clean data qubit.
To improve identification of actual data-qubit errors, inference is performed over several rounds of stabilizer measurements [7]. Consider performing r consecutive rounds of the surface code. Instead of storing the syndromes of each round, we store the locations in time and space of the syndromes whose values change, or "flip", between the current and previous rounds.
For r rounds, this requires storing a 3-D space-time array containing at most s × r values, where s is the maximum number of syndrome flips in a round. We refer to this array as the syndrome volume; its third dimension, of size r, represents time. The goal is to determine from the syndrome volume a correction operator (a product of X and/or Z operators) that minimizes the number of errors remaining after correction, in turn reducing the chance of forming a logical error chain.
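A minimal sketch of how the layers of a syndrome volume could be computed from raw syndrome measurements, assuming each round's syndromes are stored as a list of 0/1 values (0 for a +1 eigenvalue) and the first round is compared with a reference such as the quiescent state. The function name `syndrome_flips` and this encoding are illustrative assumptions, not part of our implementation.

```python
def syndrome_flips(rounds, reference):
    """Return one set per round containing the indices of syndromes whose value
    differs from the previous round (the first round is compared with `reference`)."""
    volume = []
    previous = reference
    for current in rounds:
        flipped = {i for i, (a, b) in enumerate(zip(previous, current)) if a != b}
        volume.append(flipped)
        previous = current
    return volume


# Syndrome 1 reads -1 once and then recovers: two flips in consecutive rounds.
print(syndrome_flips([[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]], [0, 0, 0, 0]))
# A persistent data-qubit error: syndromes 1 and 2 flip once and stay flipped.
print(syndrome_flips([[0, 1, 1, 0]] * 3, [0, 0, 0, 0]))
```

The first pattern, the same syndrome flipping in two consecutive rounds, is the signature that the first decoding rule below attributes to a measurement error.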
Our lookup table is based on the fact that short error chains are more likely than long chains. Assuming a syndrome volume contains r rounds, we construct a lookup table based on the following rules ( Figure 4 shows the rules visually):
1. If the same syndrome flips in two consecutive rounds, the pair (in time) of flips is ignored, since it most likely indicates a measurement error.
2. If a pair (in space) of neighboring syndromes flips in the same round, a correction is applied to the data qubit between the pair.
3. If a syndrome flips in round r − 1 and a neighboring syndrome flips in round r, a correction is applied to the data qubit between the pair (in time).
4. If a syndrome flips only once, in a round other than the last, a correction is applied to a data qubit on the boundary, chosen so that this data qubit does not lie between two stabilizers that did not indicate a syndrome.
5. If a single syndrome flips only once, in the last round, the information is kept until the next round of error correction, and no correction based on this syndrome is applied. In this case the location of the error, if any, is inconclusive without another round of syndrome measurements.
We decode by checking the above rules in order, which yields a set of data-qubit error locations. We then swap the order of rules 2 and 3 and determine a second set of possible error locations. We correct based on the set with fewer error locations, since fewer errors are more likely. Here we assume r = 3.
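The rules above can be sketched in Python as follows. This is a simplified reading for illustration only: the rules are applied directly rather than through a precomputed lookup table, rule 3 is applied to any two consecutive rounds of the window, and the geometry maps `EDGE_QUBIT` (neighboring syndrome pair to the data qubit between them) and `BOUNDARY_QUBIT` (syndrome to a boundary data qubit) are hypothetical toy inputs, not the full Surface-17 connectivity of Fig. 1(b).

```python
# Toy, hypothetical geometry (NOT the full Surface-17 connectivity).
EDGE_QUBIT = {frozenset({11, 14}): 4, frozenset({11, 12}): 3}
BOUNDARY_QUBIT = {11: 0, 12: 5, 14: 7}


def _apply_rules(volume, edge_qubit, boundary_qubit, space_first):
    """Apply rules 1-5 to a syndrome volume (a list of per-round sets of flips).
    Returns (data qubits to correct, unpaired flips carried to the next window)."""
    flips = [set(layer) for layer in volume]  # work on a copy
    r = len(flips)
    correction = set()

    def toggle(q):
        # Applying the same Pauli twice cancels, so corrections add mod 2.
        correction.symmetric_difference_update({q})

    # Rule 1: the same syndrome flipping in consecutive rounds is a measurement error.
    for t in range(r - 1):
        repeated = flips[t] & flips[t + 1]
        flips[t] -= repeated
        flips[t + 1] -= repeated

    def rule2():
        # Neighboring syndromes flipping in the same round: fix the qubit between them.
        for layer in flips:
            for pair, q in edge_qubit.items():
                if pair <= layer:
                    toggle(q)
                    layer -= pair

    def rule3():
        # A syndrome in round t-1 and a neighbor in round t: fix the qubit between them.
        for t in range(1, r):
            for pair, q in edge_qubit.items():
                s1, s2 = sorted(pair)
                for early, late in ((s1, s2), (s2, s1)):
                    if early in flips[t - 1] and late in flips[t]:
                        toggle(q)
                        flips[t - 1].discard(early)
                        flips[t].discard(late)
                        break

    for rule in ((rule2, rule3) if space_first else (rule3, rule2)):
        rule()

    # Rule 4: an unpaired flip before the last round is matched to the boundary.
    for t in range(r - 1):
        for s in sorted(flips[t]):
            toggle(boundary_qubit[s])
        flips[t].clear()

    # Rule 5: unpaired flips in the last round are kept for the next window.
    return correction, flips[-1]


def lookup_decode(volume, edge_qubit=EDGE_QUBIT, boundary_qubit=BOUNDARY_QUBIT):
    """Run the rules with rules 2 and 3 in both orders; keep the smaller correction."""
    first = _apply_rules(volume, edge_qubit, boundary_qubit, space_first=True)
    second = _apply_rules(volume, edge_qubit, boundary_qubit, space_first=False)
    return first if len(first[0]) <= len(second[0]) else second


print(lookup_decode([{11, 14}, set(), set()]))  # space pair: correct qubit 4
print(lookup_decode([{11}, {11}, set()]))       # measurement error: no correction
```

Because every possible syndrome volume of a distance-three window could be run through such a function once, its outputs could in principle be tabulated ahead of time, which is the role of the lookup table described next.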
These rules are equivalent to the minimum weight perfect matching algorithm restricted to neighboring-syndrome pairs, with uniform weight for pairs at the same distance. Since our surface codes are small, the performance of the code does not improve when the decoder considers more distant pairs. We encode these rules into a lookup table, which maps the syndrome volume of measurement flips to a set of probable errors on the data qubits. The table requires constant time and 2n space, where n is the number of data qubits.
B. Improved Stabilizer Measurement Circuits
In our simulations of Surface-13 and Surface-17 under noise (details on noise are given in Sec. IV), we find that using the same CNOT ordering for both X- and Z-type stabilizer measurements allows a single error on a syndrome qubit to cause a logical X or Z error. Figure 5(a) shows an example. A Z error on a Z-stabilizer syndrome qubit after the first two CNOT gates propagates onto two horizontally aligned data qubits. Since our surface codes require only three data qubits to complete a logical error chain, the next round of syndrome measurements will incorrectly diagnose a Z error on the third qubit, completing a logical Z error chain.
To prevent the creation of a logical error, we propose measuring X- and Z-type stabilizers with different CNOT orders. The order for X stabilizers is the same as in Figure 2. For Z stabilizers, we change the order to top right (b), bottom right (d), top left (a), bottom left (c) (Figure 5(b)). With this order, the last two data qubits, a and c, are aligned perpendicular to the direction of the corresponding logical chain. The order also preserves the commutation relations as well as the circuit depth and size. Fig. 5 shows an example in which two Z errors map to a single Z error under the new order, versus a logical error under the old order. We use the new order for all simulations in this paper.
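To see why the modified order helps, the following sketch lists the data qubits reached by an error on the syndrome qubit that occurs after the first two CNOTs, and classifies their alignment. The coordinates are an assumption read from the description of Fig. 5(b) (a = top left, b = top right, c = bottom left, d = bottom right), and "badc" is the Fig. 2 order written with the same labels.

```python
# Hypothetical (row, column) positions around one weight-4 plaquette.
POSITION = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}

OLD_Z_ORDER = "badc"  # Fig. 2 order: top (b), left (a), right (d), bottom (c)
NEW_Z_ORDER = "bdac"  # modified order: top right, bottom right, top left, bottom left


def hook_qubits(order, after):
    """Data qubits reached by an error on the syndrome qubit occurring after the
    first `after` CNOTs: it propagates through the CNOTs still to come."""
    return order[after:]


def alignment(qubits):
    """Classify a pair of data qubits as horizontal, vertical, or diagonal."""
    (r1, c1), (r2, c2) = (POSITION[q] for q in qubits)
    if r1 == r2:
        return "horizontal"
    if c1 == c2:
        return "vertical"
    return "diagonal"


for name, order in (("old", OLD_Z_ORDER), ("new", NEW_Z_ORDER)):
    pair = hook_qubits(order, 2)
    print(name, pair, alignment(pair))
# old dc horizontal
# new ac vertical
```

Under the old order the two affected qubits lie along a horizontal logical Z chain, while under the new order they lie across it.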
IV. NOISE MODELS
In this section, we present the noise models considered in our surface code simulations. We review two noise models that can be simulated efficiently on a classical computer (depolarizing and Pauli-twirl approximation) and one noise model that requires exponential memory to simulate (amplitude and phase damping).
A. Symmetric and Asymmetric Depolarizing Channels
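The depolarizing channel ($\mathcal{D}$) is a standard quantum noise model in which a qubit becomes depolarized with a given probability p. This channel transforms the density matrix of a single qubit as

$$\mathcal{D}(\rho) = (1 - p)\,\rho + p_X\, X \rho X + p_Y\, Y \rho Y + p_Z\, Z \rho Z, \qquad (1)$$

where $p = p_X + p_Y + p_Z$.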
In this model, a qubit suffers a discrete Pauli bit-flip (X), phase-flip (Z), or bit-and-phase-flip (Y) error with probability $p_X$, $p_Z$, or $p_Y$, respectively. When $p_X = p_Y = p_Z$, the channel is called a symmetric depolarizing channel. When the three probabilities are set independently, the model is called an asymmetric depolarizing channel.
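The depolarizing channel (Eq. 1) can be applied directly to a single-qubit density matrix. The sketch below stores 2 × 2 complex matrices as nested lists to stay dependency-free; the function name `depolarize` is illustrative.

```python
# 2x2 complex matrices as nested lists; no external dependencies.
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def depolarize(rho, p_x, p_y, p_z):
    """Asymmetric depolarizing channel: (1 - p) rho + sum_P p_P P rho P.
    The Pauli matrices are Hermitian, so P^dagger = P."""
    p = p_x + p_y + p_z
    out = [[(1 - p) * rho[i][j] for j in range(2)] for i in range(2)]
    for pauli, prob in ((X, p_x), (Y, p_y), (Z, p_z)):
        term = matmul(matmul(pauli, rho), pauli)
        out = [[out[i][j] + prob * term[i][j] for j in range(2)] for i in range(2)]
    return out


# Symmetric case p_X = p_Y = p_Z = p/3 with p = 0.03, acting on |0><0|.
zero = [[1, 0], [0, 0]]
print(depolarize(zero, 0.01, 0.01, 0.01))
```

On $|0\rangle\langle 0|$, X and Y errors both move population to $|1\rangle$ while Z leaves it unchanged, so the $|1\rangle$ population becomes $p_X + p_Y$.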
B. Amplitude and Phase Damping Channel
The amplitude damping channel ($\mathcal{AD}$) characterizes energy dissipation in a quantum system, including spontaneous emission of a photon from a qubit. This channel transforms the density matrix of a single qubit as

$$\mathcal{AD}(\rho) = E_0 \rho E_0^\dagger + E_1 \rho E_1^\dagger, \qquad E_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1 - p_{AD}} \end{pmatrix}, \quad E_1 = \begin{pmatrix} 0 & \sqrt{p_{AD}} \\ 0 & 0 \end{pmatrix},$$

where $p_{AD}$ is the probability of a qubit emitting a single photon. Figure 6 expresses amplitude damping of a single qubit as a quantum circuit in which an ancilla qubit represents the environment and $\sin^2(\theta/2) = p_{AD}$ [20]. The input is an arbitrary single-qubit state $|\psi_{in}\rangle = a|0\rangle + b|1\rangle$, and the output state is given by

$$|\psi_{out}\rangle = \begin{cases} \frac{1}{N}\left(a|0\rangle + b\sqrt{1 - p_{AD}}\,|1\rangle\right) & \text{if the ancilla is measured as } 0, \\ |0\rangle & \text{if the ancilla is measured as } 1, \end{cases}$$

where N is a normalization constant. The probabilities of measuring 0 and 1 are $1 - |b|^2 p_{AD}$ and $|b|^2 p_{AD}$, respectively. During simulation, we do not use the extra ancilla shown in the circuit in Figure 6. Instead, we calculate the probabilities of measuring 0 and 1 given the input state $|\psi_{in}\rangle$ and simulate the measurement outcome with a random number. When the simulated outcome is 0, we apply the rotation $R_y(\theta)$ to $|\psi_{in}\rangle$. When it is 1, we apply damping and the state becomes $|0\rangle$.

[Figure 6: Circuit representation of amplitude damping [20].]
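The sampling procedure described above can be sketched for a pure single-qubit state. This is an illustration under our own assumption: on the no-emission outcome it applies the standard Kraus operator $E_0 = \mathrm{diag}(1, \sqrt{1 - p_{AD}})$ and renormalizes, which may differ in detail from the rotation described above. The function name `amplitude_damp` is hypothetical.

```python
import math
import random


def amplitude_damp(a, b, p_ad, rng):
    """One sampled amplitude-damping step on the pure state a|0> + b|1>.
    Returns ((a', b'), outcome); outcome 1 means a photon was emitted."""
    p_emit = abs(b) ** 2 * p_ad          # probability of measuring the environment in 1
    if rng.random() < p_emit:
        return (1 + 0j, 0j), 1           # damping: the qubit is reset to |0>
    # No emission: apply E0 = diag(1, sqrt(1 - p_ad)) and renormalize (the 1/N factor).
    a0, b0 = a, b * math.sqrt(1 - p_ad)
    norm = math.sqrt(abs(a0) ** 2 + abs(b0) ** 2)
    return (a0 / norm, b0 / norm), 0


# Averaged over many trajectories, the |1> population should approach |b|^2 (1 - p_AD).
rng = random.Random(7)
h = 1 / math.sqrt(2)
trials = 20000
mean_p1 = sum(abs(amplitude_damp(h, h, 0.3, rng)[0][1]) ** 2 for _ in range(trials)) / trials
print(mean_p1)  # close to 0.35 = 0.5 * (1 - 0.3)
```

Averaging over trajectories recovers the density-matrix result, since the emission branch contributes nothing to the $|1\rangle$ population and the no-emission branch contributes $|b|^2(1 - p_{AD})$ in total.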
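The phase damping channel ($\mathcal{PD}$) is described similarly as

$$\mathcal{PD}(\rho) = E_0 \rho E_0^\dagger + E_1 \rho E_1^\dagger, \qquad E_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1 - p_{PD}} \end{pmatrix}, \quad E_1 = \begin{pmatrix} 0 & 0 \\ 0 & \sqrt{p_{PD}} \end{pmatrix},$$

where $p_{PD}$ is the phase damping probability.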
Phase damping noise, also called pure dephasing, is equivalent to the phase-flip channel. By the unitary freedom of the operator-sum representation, we can derive a new set of operation elements that express the channel in terms of the probability $p_Z$ of a phase-flip (Z) error:

$$\mathcal{PD}(\rho) = (1 - p_Z)\,\rho + p_Z\, Z \rho Z, \qquad p_Z = \frac{1 - \sqrt{1 - p_{PD}}}{2}.$$

We assume that amplitude and phase damping ($\mathcal{APD}$) are the main sources of decoherence. Applying the two channels together, decoherence on a single qubit transforms the density matrix as

$$\mathcal{APD}(\rho) = \begin{pmatrix} 1 - \rho_{11} e^{-t/T_1} & \rho_{01} e^{-t/T_2} \\ \rho_{10} e^{-t/T_2} & \rho_{11} e^{-t/T_1} \end{pmatrix}, \qquad (7)$$

where t is the execution time of the gate (including the identity gate), $T_1$ and $T_2$ are the single-qubit relaxation and dephasing times, respectively, and $e^{-t/T_1} = 1 - p_{AD}$ and $e^{-t/T_2} = \sqrt{(1 - p_{AD})(1 - p_{PD})}$ [20].
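As a consistency check of these relations, the sketch below applies amplitude damping and phase damping in sequence as Kraus maps and compares the result with the closed-form $T_1$/$T_2$ matrix. It assumes the standard Kraus operators $\mathrm{diag}(1, \sqrt{1-p})$ together with $\sqrt{p}\,|0\rangle\langle 1|$ (amplitude damping) or $\sqrt{p}\,|1\rangle\langle 1|$ (phase damping), and the relation $e^{-t/T_2} = \sqrt{(1 - p_{AD})(1 - p_{PD})}$ that these operators imply. Times are in µs and the parameter values are arbitrary.

```python
import math


def kraus_apply(rho, ops):
    """Return sum_k K rho K^dagger for 2x2 matrices stored as nested lists."""
    out = [[0j, 0j], [0j, 0j]]
    for k in ops:
        for i in range(2):
            for j in range(2):
                out[i][j] += sum(k[i][m] * rho[m][n] * complex(k[j][n]).conjugate()
                                 for m in range(2) for n in range(2))
    return out


def amplitude_damping_ops(p):
    return [[[1, 0], [0, math.sqrt(1 - p)]], [[0, math.sqrt(p)], [0, 0]]]


def phase_damping_ops(p):
    return [[[1, 0], [0, math.sqrt(1 - p)]], [[0, 0], [0, math.sqrt(p)]]]


def damping_probabilities(t, T1, T2):
    """Invert e^{-t/T1} = 1 - p_AD and e^{-t/T2} = sqrt((1 - p_AD)(1 - p_PD)).
    Requires T2 <= 2 T1 so that p_PD >= 0."""
    p_ad = 1 - math.exp(-t / T1)
    p_pd = 1 - math.exp(-2 * t / T2) / (1 - p_ad)
    return p_ad, p_pd


def apd_closed_form(rho, t, T1, T2):
    """Populations relax with T1; coherences decay with T2."""
    e1, e2 = math.exp(-t / T1), math.exp(-t / T2)
    return [[1 - rho[1][1] * e1, rho[0][1] * e2],
            [rho[1][0] * e2, rho[1][1] * e1]]


# |+><+| held for t = 0.02 us with T1 = 10 us and T2 = 8 us (arbitrary values).
t, T1, T2 = 0.02, 10.0, 8.0
plus = [[0.5, 0.5], [0.5, 0.5]]
p_ad, p_pd = damping_probabilities(t, T1, T2)
via_kraus = kraus_apply(kraus_apply(plus, amplitude_damping_ops(p_ad)), phase_damping_ops(p_pd))
closed = apd_closed_form(plus, t, T1, T2)
print(max(abs(via_kraus[i][j] - closed[i][j]) for i in range(2) for j in range(2)))
```

The printed maximum deviation should be at the level of floating-point rounding.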
C. Approximate Amplitude and Phase Damping Channel
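Using a technique called Pauli twirling ($\mathcal{PT}$) [19], a Pauli channel $\mathcal{T}$ can be used to approximate the decoherence channel given in Eq. (7) [21, 35], where

$$\mathcal{T}(\rho) = \frac{1}{4} \sum_{A \in \{I, X, Y, Z\}} A^\dagger\, \mathcal{APD}\!\left(A \rho A^\dagger\right) A.$$

Twirling removes the off-diagonal terms and in turn allows the channel to be expressed as an asymmetric depolarizing channel (Eq. (1)) with probabilities

$$p_X = p_Y = \frac{1 - e^{-t/T_1}}{4}, \qquad p_Z = \frac{1}{2} - p_X - \frac{1}{2} e^{-t/T_2} = \frac{1 + e^{-t/T_1} - 2 e^{-t/T_2}}{4},$$

where the failure probabilities are expressed in terms of the execution time t of a gate, the qubit relaxation time $T_1$, and the qubit dephasing time $T_2$ [21]. Assuming errors are independent, the probabilities of two-qubit errors, for example when a CNOT gate fails, are approximated as in [21] by

$$p_{AB} = p_A\, p_B, \qquad A, B \in \{I, X, Y, Z\}, \qquad p_I = 1 - p_X - p_Y - p_Z.$$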
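The Pauli-twirl probabilities are closed-form functions of t, $T_1$ and $T_2$. The sketch below computes them for one location and builds the 16 two-qubit probabilities under the independence assumption; the formulas follow the standard Pauli-twirl approximation of amplitude and phase damping (as in [21]), while the function names and example parameters are illustrative.

```python
import math
from itertools import product


def pauli_twirl_probs(t, T1, T2):
    """Pauli-twirl approximation of amplitude and phase damping for a location
    of duration t:  p_X = p_Y = (1 - e^{-t/T1}) / 4,
                    p_Z = 1/2 - p_X - e^{-t/T2} / 2."""
    p_x = p_y = (1 - math.exp(-t / T1)) / 4
    p_z = 0.5 - p_x - math.exp(-t / T2) / 2
    return {"I": 1 - p_x - p_y - p_z, "X": p_x, "Y": p_y, "Z": p_z}


def two_qubit_probs(first, second):
    """Independent errors: p_{AB} = p_A * p_B for A, B in {I, X, Y, Z}."""
    return {a + b: first[a] * second[b] for a, b in product("IXYZ", repeat=2)}


# A 40 ns CNOT with T1 = 10 us and T2 = 15 us (times in us, arbitrary values).
single = pauli_twirl_probs(0.04, 10.0, 15.0)
pair = two_qubit_probs(single, single)
print(single)
print(1 - pair["II"])  # probability that the CNOT suffers some Pauli error
```

Since the twirled channel preserves the Bloch-vector contraction along z, $1 - 2(p_X + p_Y)$ equals $e^{-t/T_1}$ exactly, which gives a simple sanity check.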
V. EXPERIMENTAL SETUP
We use the LIQUi|⟩ software architecture [30] to simulate the surface code under noise. LIQUi|⟩ (Language-Integrated Quantum Operations) contains an embedded, domain-specific language for programming quantum circuits as well as two circuit simulation environments. The first environment, called Stabilizer simulation, allows efficient simulation of Clifford circuits based on the Gottesman-Knill theorem [17, 18]. The second environment, called Universal simulation, allows full simulation of arbitrary quantum circuits.
While some of our noise models allow Stabilizer simulation, for consistency we perform all simulations in the Universal simulation environment. LIQUi|⟩ allows universal simulation of a number of qubits limited by the main memory of the machine. We ran simulations on a large HPC cluster containing several hundred nodes with 32 GB of RAM each, allowing simulation of up to roughly 30 qubits on each node. Our simulations required thousands of hours of compute time.

[Figure 7 caption (partially recovered): ... Fig. 3(a). Four different identity gates are used based on the other location type in the given time step: prepare (P), single-qubit gate (H), two-qubit gate (C), and measurement (M).]
A. Monte-Carlo Simulation
We restrict the operations in our circuits to the five types given in Table I, which we refer to as location types: I, H, CNOT, preparation of a $|0\rangle$ state, and measurement in the Z basis. When no location type is specified for a qubit in a time step, the identity gate I is applied to that qubit, with its duration set by the location types occurring on the other qubits in that time step. To simplify the simulations, when a qubit is idle for t time steps (while gates are applied to other qubits), we apply t identity gates to it. Figure 7 shows the circuit of Figure 3(a) with identity gates inserted. Further circuit optimization is possible, for example delaying qubit preparation and measuring a qubit as soon as its gate operations complete; such optimization would be expected to improve the thresholds. For simplicity, we maintain gate alignment between stabilizers.
We perform Monte-Carlo simulations of the surface code layouts to compute their logical error rates. At each time step of the circuit, each qubit undergoes a location followed by the given noise model. For depolarizing noise and approximate damping noise, we replace each location except measurement by the location followed by an X, Y, or Z gate ("error") with probability $p_X$, $p_Y$, or $p_Z$, respectively. For measurements, the X, Y, or Z error is placed before the measurement location.
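A minimal sketch of this location-and-noise bookkeeping for the Pauli noise models. Assumptions for illustration: each time step is a dict from qubit to location type ('P', 'H', 'C' for CNOT, 'M'); idle qubits receive 'I'; and, for brevity, each qubit draws an independent single-qubit Pauli error with fixed probabilities, so correlated two-qubit errors and duration-dependent probabilities are not modeled. All names are hypothetical.

```python
import random


def pad_with_identities(steps, qubits):
    """Give every idle qubit an identity location 'I' in each time step."""
    return [{q: step.get(q, "I") for q in qubits} for step in steps]


def sample_pauli(probs, rng):
    """Return 'X', 'Y' or 'Z' with probabilities probs[...], else None."""
    u = rng.random()
    for pauli in ("X", "Y", "Z"):
        if u < probs[pauli]:
            return pauli
        u -= probs[pauli]
    return None


def noisy_schedule(steps, qubits, probs, rng):
    """Flatten a circuit into (operation, qubit) pairs with sampled Pauli errors:
    after each location, or before it for measurements ('M')."""
    ops = []
    for step in pad_with_identities(steps, qubits):
        for q, loc in step.items():
            err = sample_pauli(probs, rng)
            if err and loc == "M":
                ops.append((err, q))
            ops.append((loc, q))
            if err and loc != "M":
                ops.append((err, q))
    return ops


# Toy schedule: prepare both qubits, Hadamard on qubit 0 while qubit 1 idles,
# a CNOT ('C') on both, then measure qubit 1 while qubit 0 idles.
steps = [{0: "P", 1: "P"}, {0: "H"}, {0: "C", 1: "C"}, {1: "M"}]
rng = random.Random(11)
print(noisy_schedule(steps, [0, 1], {"X": 0.01, "Y": 0.01, "Z": 0.01}, rng))
```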
For amplitude and phase damping and the Pauli-twirl approximation, we apply the noise model after every location using the duration t of the current time step. The duration values we consider are given in Table III of Section V C.
B. Logical Error Rate Calculation
We calculate the logical error rate of a given layout by simulating it under the various noise models. At the start of each simulation, we initialize all data qubits to $|0\rangle$ (if preparing $|0_L\rangle$) or $|+\rangle$ (if preparing $|+_L\rangle$) and run a noise-free round of syndrome measurements to project into an initial stabilizer state of the code. We refer to this state as the quiescent state [1]. Note that for a code with s stabilizers, there are $2^s$ possible quiescent states, since each stabilizer measurement can randomly project onto either its +1 or −1 eigenstate. In the absence of noise, the quiescent state is maintained during subsequent rounds of the surface code.
After initialization of the quiescent state, the simulation proceeds as follows:
1. Execute two rounds of the surface code with noise (three if this is the first execution of the loop). Record the syndrome flips between consecutive rounds in the syndrome volume; for the first round, compare with the quiescent state.
2. Apply the decoder (Section III) to the three-layer syndrome volume to determine the most probable set of error locations.
3. Apply noise-free corrections to the state. In practice, corrections can be tracked directly in software.
4. Check for a logical error by calculating the distance of the state to the possible logical states. If the closest logical state is incorrect, count a logical error.
5. Repeat from Step 1 until m logical errors are detected.
After each logical error check (Step 4), the syndrome volume contains a list of unpaired syndrome flips left by the last two rules of our decoder. Each syndrome volume, as shown in Fig. 8, therefore contains three layers: the final layer of the previous volume and two layers from two additional rounds of the surface code. We refer to the number of rounds in a volume as the window size. We experimented with various window sizes and found three to be optimal for distance-three layouts. In our simulations, m varies between 10 and 200 depending on the physical error rate.
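The windowing scheme can be sketched as a loop over a stream of syndrome-flip layers. The decoder is passed in as a function that returns the correction and the leftover flips of the last layer; the function `run_windows`, its signature, and the stand-in decoder in the example are illustrative assumptions.

```python
def run_windows(flip_layers, decode, new_rounds=2):
    """Decode a stream of syndrome-flip layers in three-layer windows.
    The first window takes new_rounds + 1 fresh layers; every later window is the
    leftover layer returned by `decode` followed by new_rounds fresh layers."""
    layers = iter(flip_layers)
    carried = None
    corrections = []
    while True:
        needed = new_rounds + 1 if carried is None else new_rounds
        # zip stops on range first, so no extra layer is consumed from the stream
        fresh = [layer for _, layer in zip(range(needed), layers)]
        if len(fresh) < needed:
            break  # not enough rounds left for a full window
        volume = fresh if carried is None else [carried] + fresh
        correction, carried = decode(volume)
        corrections.append(correction)
    return corrections


# Stand-in decoder that records each window and carries its raw last layer
# (a real decoder would carry only the unpaired flips of that layer).
seen = []


def record(volume):
    seen.append(volume)
    return set(), volume[-1]


run_windows([{i} for i in range(7)], record)
print(seen)  # [[{0}, {1}, {2}], [{2}, {3}, {4}], [{4}, {5}, {6}]]
```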
We calculate the logical error rate per window since, in an experiment, the logical qubit will be measured after the completion of a window to ensure optimal decoding and correction. For a window containing r rounds, the logical error rate $P_r$ is given by

$$P_r = \frac{m}{R}, \qquad (11)$$

where R is the number of windows executed to observe m logical errors. When r = 1, Eq. (11) gives the logical error rate per round of the surface code.
Since we only calculate $P_r$ for distance d = 3, we estimate the pseudothreshold [36, 37], denoted $P^{th}_r$, rather than the asymptotic threshold as $d \to \infty$. The pseudothreshold is defined as the crossing point between the line y = x and the curve of $P_r$ versus p. If the error rate p of each physical location type falls below the pseudothreshold $P^{th}_r$, the code lowers the logical error rate below p.
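One way to estimate this crossing from simulated data is to interpolate between the two sampled physical error rates that bracket it. The sketch below interpolates $\log(P_r/p)$ linearly in $\log p$, which is exact when $P_r$ follows a power law in p; this interpolation choice and the synthetic data in the example are our assumptions, not the procedure used to produce Table IV.

```python
import math


def pseudothreshold(ps, rates):
    """Estimate where the logical rate P_r(p) crosses the line P_r = p.
    `ps` (sorted ascending) and `rates` are sampled physical and logical error
    rates; log(P_r / p) is interpolated linearly in log(p) between the bracketing
    samples. Returns None if no crossing lies inside the sampled range."""
    g = [math.log(P / p) for p, P in zip(ps, rates)]
    for i in range(len(ps) - 1):
        if g[i] <= 0 <= g[i + 1]:
            x0, x1 = math.log(ps[i]), math.log(ps[i + 1])
            frac = 0.0 if g[i + 1] == g[i] else -g[i] / (g[i + 1] - g[i])
            return math.exp(x0 + frac * (x1 - x0))
    return None


# Synthetic, hypothetical data following P_r = 100 p^2, whose crossing is p = 0.01.
ps = [0.002, 0.005, 0.02, 0.05]
rates = [100 * p * p for p in ps]
print(pseudothreshold(ps, rates))
```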
The logical error rate per window $P_r$ and the logical error rate per round $P_1$ are related by

$$P_r = 1 - (1 - P_1)^r.$$
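The estimator $P_r = m/R$ and the conversion between per-window and per-round rates can be sketched as follows, assuming independent rounds so that $P_r = 1 - (1 - P_1)^r$. The function names and the counts in the example are hypothetical.

```python
def window_rate(m, R):
    """Logical error rate per window: m logical errors observed in R windows."""
    return m / R


def window_from_round(P1, r):
    """Per-window rate from a per-round rate, assuming independent rounds."""
    return 1 - (1 - P1) ** r


def round_from_window(Pr, r):
    """Inverse of window_from_round."""
    return 1 - (1 - Pr) ** (1 / r)


# Hypothetical counts: 50 logical errors in 10,000 three-round windows.
P3 = window_rate(50, 10_000)
P1 = round_from_window(P3, 3)
print(P3, P1)
```

For small rates, $P_1 \approx P_r / r$, so the per-round rate here is close to one third of the per-window rate.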
For depolarizing noise, we calculate $P_1$ (to compare with previous work) and $P_3$. For amplitude and phase damping and the Pauli-twirl approximation, we calculate $P_3$.
C. Architectural Settings
For amplitude and phase damping and the Pauli-twirl approximation, we consider several parameter settings derived from superconducting and ion trap architectures. These architectures are well suited to the 2-D, nearest-neighbor operations required by the surface code. Table III lists the parameter settings considered for each architecture. The time per round $t_{r,\{13,17,25\}}$ is the time required to complete one round of the surface code given the other parameters. These six architecture settings span round times from 165 ns to $602 \times 10^3$ ns for Surface-17 and Surface-25. Note that Surface-13 requires roughly twice as much time per round as Surface-17.
2-D superconducting architectures have demonstrated fast single- and two-qubit gate execution times in recent years [16, 38-40]. Current gate times are in the range of 10-20 ns for single-qubit gates and 30-80 ns for two-qubit gates, with experimental $T_1$ times as long as 20-40 µs [16, 40]. The DiVincenzo ($SC_D$) [11] and Helmer ($SC_H$) [10] superconductor parameters are derived from [21]. $SC_D$ requires longer CNOT gate times than $SC_H$. $SC_S$ and $SC_F$ represent parameters for slow and fast gate times, respectively, based on recent experiments [38, 39]. In particular, they account for µs prepa[...] [12-15, 41]. While trapped-ion devices tend to have longer gate execution times than superconducting devices, they have been shown to have much longer relaxation and dephasing times, in the range of 780-1800 ms [42, 43]. Recently, a $T_2^*$ time of 50 s has been reported [41]. $IT_S$ accounts for gate times observed in current experiments and for longer preparation and measurement times [42-44]. $IT_F$ accounts for the gate, preparation, and measurement times of a proposed scalable ion trap quantum computer model [13]. It assumes that all gate operations occur within one Elementary Logic Unit (ELU) with 10-100 qubits arranged linearly. ELUs are connected to each other using photonic quantum channels to achieve modular scalability.
VI. EXPERIMENTAL RESULTS
In this section we analyze numerical Monte-Carlo simulations of the distance-three surface code layouts under the multi-parameter noise models. We first determine the distance-three layout with the highest pseudothreshold under depolarizing noise. We then study the performance of the preferred layout under several realistic noise models. In particular, for the six architectural settings, we compare the accuracy of the approximate amplitude and phase damping channel, which can be simulated efficiently, against the amplitude and phase damping channel, which requires universal simulation. In each plot, error bars indicate an upper bound on the statistical uncertainty, computed from the standard deviation.
A. Depolarizing Noise
We begin by calculating the symmetric depolarizing noise threshold for each distance-three layout. In this model, each location fails with probability p. For single-qubit locations, $P_I = 1 - p$ and $P_X = P_Y = P_Z = p/3$. For two-qubit locations, $P_{I,I} = 1 - p$ and each of the 15 non-identity pairs $P_{\{I,X,Y,Z\},\{I,X,Y,Z\}}$ occurs with probability p/15. Since the circuits and round times differ between layouts, we expect the pseudothreshold to vary by layout. Figure 9 plots the location error rate p versus the logical X error rate per round $P_{1,X}$ for Surface-13, 17, and 25, where each layout encodes a logical $|1_L\rangle$ state and we check for a logical bit flip $X_L$. Each point represents between 10 and 200 independent simulation runs.
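The symmetric depolarizing model can be sampled per location as follows. This sketch draws a single-qubit error with probability p (X, Y and Z equally likely, so each has p/3) and a two-qubit error uniformly from the 15 non-identity Pauli pairs (each with p/15). The function names are illustrative.

```python
import random
from itertools import product

# The 15 non-identity two-qubit Pauli errors (e.g. "IX", "ZY").
NONTRIVIAL_PAIRS = [a + b for a, b in product("IXYZ", repeat=2) if a + b != "II"]


def sample_single(p, rng):
    """Single-qubit location: X, Y, or Z, each with probability p/3."""
    return rng.choice("XYZ") if rng.random() < p else "I"


def sample_two_qubit(p, rng):
    """Two-qubit location: each non-identity pair with probability p/15."""
    return rng.choice(NONTRIVIAL_PAIRS) if rng.random() < p else "II"


rng = random.Random(3)
print([sample_two_qubit(0.1, rng) for _ in range(10)])
```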
The corresponding pseudothresholds calculated per round ($P^{th}_{1,X}$) and per window ($P^{th}_{3,X}$) are given in Table IV. We find that Surface-13 exhibits slightly lower pseudothresholds due to its greater circuit depth. Similarly, Surface-25 requires more data qubits and syndrome measurements and thus exhibits a slightly lower pseudothreshold than Surface-17.
Table IV also contains the pseudothreshold and threshold calculated by Fowler et al. for Surface-25 [7, 45]. Our per-round pseudothreshold for Surface-25 is slightly lower. Our simulations use a constant window size (Section V) to set the volume history, while Fowler et al. use a volume including a history of rounds limited only by the available data. They perform minimum weight matching continuously, round by round, on a large volume, while we correct based on our lookup table and three rounds of history. We use a small, static window to mimic future experimental implementations, which are likely to be limited to a small number of rounds and to restricted cold classical processing.
We also calculate the logical Z error rate $P_{L,Z}$ for each layout by encoding a logical $|+_L\rangle$ state and checking for a logical phase flip $Z_L$. Figure 10 plots the location error rate p versus $P_{1,\{X,Z\}}$ for Surface-17. The plot shows that the pseudothresholds $P^{th}_{1,X}$ and $P^{th}_{1,Z}$ are comparable. We find similar results for Surface-13 and Surface-25. Based on these results, we conclude that Surface-17 is the preferable layout. It requires roughly half the depth of Surface-13 and significantly fewer qubits and gates than Surface-25. In addition, Surface-17 exhibits slightly higher pseudothresholds than the other layouts. For the remaining experiments, we therefore perform all simulations with the Surface-17 layout.

We now compare the accuracy of the approximate amplitude and phase damping channel obtained by Pauli twirling with that of the amplitude and phase damping channel itself. We first verify that our logical Z and X error rates per round for the Pauli-twirl approximation on Surface-17 agree with those reported in [21]. For $T_1 = 10$ µs, we find $P_{1,Z} = 4.27 \times 10^{-3}$ and $P_{1,X} = 4.41 \times 10^{-3}$. These results are very similar to those of [21]; small differences are expected since [21] uses Surface-25.
We then calculate the logical Z error rate per window, $P_{3,Z}$, for a qubit in the encoded $|+\rangle_L$ state in Surface-17 under both channels for the Helmer setting ($\mathrm{SC}_H$). Figure 11(a) plots $T_1$ versus $P_{3,Z}$ for the Pauli-twirl approximation (solid red) and for amplitude and phase damping (dashed green). The approximate channel yields a logical Z error rate that closely matches that of the actual channel.
We also calculate the logical X error rate per window, $P_{3,X}$, for a qubit in the encoded $|1\rangle_L$ state in Surface-17 under both channels for $\mathrm{SC}_H$, plotted in Fig. 11(b). The approximate channel yields much higher logical X error rates, particularly as the qubit relaxation time $T_1$ increases. Pauli twirling therefore gives a pessimistic estimate of the error rate, indicating that the threshold under decoherence may be significantly better than previously calculated with this technique.
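One single-qubit intuition for how the two channels differ on bit flips (our illustration, not an analysis from this paper): true amplitude damping flips $|1\rangle$ to $|0\rangle$ with probability $\gamma$ and never flips $|0\rangle$, whereas the twirled channel flips either basis state with probability $p_x + p_y = \gamma/2$. The pure-Python sketch below checks this by applying Kraus operators to density matrices. The helper names are hypothetical, and this comparison alone does not explain the logical-level behavior in Fig. 11(b).

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def apply_channel(kraus_ops, rho):
    """Return sum_k K rho K^dagger for a list of 2x2 Kraus operators."""
    out = [[0j, 0j], [0j, 0j]]
    for k in kraus_ops:
        term = matmul(matmul(k, rho), dagger(k))
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out


def amplitude_damping_kraus(gamma):
    return [[[1, 0], [0, math.sqrt(1 - gamma)]], [[0, math.sqrt(gamma)], [0, 0]]]


def scaled(p, m):
    """Kraus operator sqrt(p) * m for a Pauli channel component."""
    return [[math.sqrt(p) * m[i][j] for j in range(2)] for i in range(2)]


def twirled_kraus(gamma):
    """Pauli twirl of pure amplitude damping: p_x = p_y = gamma / 4."""
    p_xy = gamma / 4
    p_z = (1 - math.sqrt(1 - gamma)) ** 2 / 4
    p_i = 1 - 2 * p_xy - p_z
    return [scaled(p_i, I2), scaled(p_xy, X), scaled(p_xy, Y), scaled(p_z, Z)]


def flip_probability(kraus_ops, bit):
    """Probability of measuring the opposite basis state after the channel acts on |bit>."""
    rho = [[0j, 0j], [0j, 0j]]
    rho[bit][bit] = 1 + 0j
    return apply_channel(kraus_ops, rho)[1 - bit][1 - bit].real
```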
Since the Pauli-twirl approximation agrees well for phase-flip errors but not for bit-flip errors, we examine bit-flip errors further. Fig. 12 plots $T_1$ (µs) versus memory duration (µs) versus the logical X failure rate $P_{3,X}$ of a qubit encoded in $|1\rangle_L$ in Surface-17 for the $\mathrm{SC}_H$ setting under (a) the Pauli-twirl approximation and (b) amplitude and phase damping. In the 3D plots on the left, the blue surface represents the amplitude damping probability of an unencoded qubit in $|1\rangle$ for a given $T_1$ and memory duration; since the qubit is in $|1\rangle$, phase damping has no effect. The yellow surface represents the logical error rate $P_{3,X}$ of an encoded qubit for a given $T_1$ and surface code round time (see Table III), and the orange surface indicates the upper error bar of $P_{3,X}$. For the yellow and orange surfaces, the encoded qubit undergoes the surface code three-round window time; for the blue surface, the unencoded qubit undergoes the given memory duration.
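The orange surface is the upper error bar on a Monte Carlo estimate of $P_{3,X}$. This section does not state how the error bars are computed. A common choice for a failure fraction from $N$ independent trials is a binomial confidence interval, and the sketch below assumes the Wilson score interval at roughly 95% confidence ($z = 1.96$). The function name and trial counts are hypothetical.

```python
import math


def wilson_upper_bound(failures, trials, z=1.96):
    """Upper end of the Wilson score interval for a binomial failure rate."""
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need 0 <= failures <= trials and trials > 0")
    p_hat = failures / trials
    z2 = z * z
    centre = p_hat + z2 / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    return (centre + margin) / (1 + z2 / trials)


# Hypothetical run: 50 logical failures in 1000 Monte Carlo windows.
upper = wilson_upper_bound(50, 1000)
```

Unlike the simple normal approximation, the Wilson bound stays inside $[0, 1]$ and remains nonzero when no failures are observed, which matters for small logical error rates.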
The region where the blue surface lies above the orange and yellow surfaces is the regime in which Surface-17 encoding improves the logical error rate of the qubit (analogous to operating below pseudothreshold). This region is larger in Fig. 12(b) than in Fig. 12(a), indicating that Pauli twirling yields a pessimistic estimate of the logical error rate.
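At fixed $T_1$, the boundary of this region is where the amplitude damping probability of the unencoded qubit, $1 - e^{-t/T_1}$, reaches the (upper-bound) logical error rate $P_{3,X}$ of the encoded qubit. Solving for the memory duration gives $t^* = -T_1 \ln(1 - P_{3,X})$. A minimal sketch, assuming $P_{3,X}$ at that $T_1$ is already known from simulation; the function names and example values are hypothetical.

```python
import math


def amplitude_damping_probability(t, t1):
    """Probability that an unencoded |1> has relaxed to |0> after idling for time t."""
    return 1.0 - math.exp(-t / t1)


def break_even_duration(logical_error_rate, t1):
    """Memory duration at which amplitude damping of a bare qubit equals P_L.

    In this comparison, longer durations favor the encoded qubit.
    """
    if not 0.0 <= logical_error_rate < 1.0:
        raise ValueError("logical_error_rate must lie in [0, 1)")
    return -t1 * math.log1p(-logical_error_rate)


t_star = break_even_duration(0.01, t1=1.0)  # in us, since t1 is given in us
```

For a hypothetical $P_{3,X} = 0.01$ at $T_1 = 1$ µs, this gives $t^* \approx 10$ ns. The `log1p` call keeps the result accurate when $P_{3,X}$ is very small.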
The 2D plots on the right show the view from the +z-axis. The blue and red regions indicate $T_1$ times (x-axis) for which encoding a qubit in $|1\rangle_L$ in Surface-17 reduces or increases, respectively, the logical error rate compared to an unencoded $|1\rangle$ qubit in memory for a given duration (y-axis). The purple region indicates the upper error bar of $P_{3,X}$, where the orange and blue surfaces cross in the 3D plots. Surface-17 again performs better under amplitude and phase damping than under Pauli twirling. For example, at $T_1 = 1$ µs, memory durations above 150 ns result in lower logical error rates for an encoded qubit than for an unencoded qubit, while the Pauli-twirl approximation lowers error rates only for memory durations longer than 350 ns. At $T_1 = 50$ µs, memory durations of 20 ns result in lower logical error rates, while Pauli twirling indicates lower rates only at memory durations longer than 30-70 ns.
C. Amplitude and Phase Damping
Figure 13 shows the same 2D plots for Surface-17 for all six architecture settings under amplitude and phase damping. For each architecture, the y-axis ranges from 0 µs to the time per surface code window. In all graphs, as $T_1$ increases, encoding improves the logical error rate over a larger range of memory durations. This behavior is expected since the amplitude damping probability of the unencoded qubit increases monotonically with memory duration.
In Fig. 13(a), at $T_1 = 1$ µs, we observe that for the $\mathrm{SC}_S$ parameters encoding does not improve the logical error rate for any plotted memory duration. However, with 10 times faster gates ($\mathrm{SC}_F$), performance improves, as shown in Fig. 13(b): at $T_1 = 1$ µs, encoding gives a lower logical error rate than an unencoded qubit for memory durations of at least 8 µs. At $T_1 = 10$ µs, $\mathrm{SC}_S$ shows no improvement with encoding, while $\mathrm{SC}_F$ shows improvements for memory durations of 2 µs or longer.
The $\mathrm{SC}_H$ setting assumes 100 times faster preparation and measurement than $\mathrm{SC}_F$ and roughly 10 times faster gates. These faster operations lead to significantly better performance under encoding. For example, at $T_1 = 1$ µs and $T_1 = 10$ µs in Fig. 13(d), encoding decreases the logical error rate for memory durations longer than 150 ns and 20 ns, respectively.
Comparing Fig. 13(c) and (d), we find that CNOT time strongly influences performance. A CNOT gate is four times longer in $\mathrm{SC}_D$ than in $\mathrm{SC}_H$, and this longer two-qubit gate time is reflected in the poorer performance of Surface-17 under $\mathrm{SC}_D$ parameters. At $T_1 = 1$ µs, $\mathrm{SC}_D$ shows a logical error rate reduction from encoding only at memory durations roughly 3 times longer than those required for $\mathrm{SC}_H$. Fig. 13(e) and (f) show similar results for the ion trap settings. Although $\mathrm{IT}_F$ assumes 10 times faster CNOT gates than $\mathrm{IT}_S$, both $\mathrm{IT}_S$ and $\mathrm{IT}_F$ yield lower logical error rates upon encoding for a range of $T_1$ times. $\mathrm{IT}_S$ gives improvements for memory durations longer than 300-400 µs, while $\mathrm{IT}_F$ gives improvements for memory durations above roughly 15 µs.
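One way to see why CNOT duration matters is through the time per surface code window, during which every qubit decoheres. The sketch below assumes a hypothetical syndrome-extraction schedule of one preparation, two single-qubit gate layers, four CNOT layers and one measurement per round, with three rounds per window. The actual schedule and durations for each setting are those of Table III, which are not reproduced here, and all numbers below are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationTimes:
    """Hypothetical operation durations, all in the same time unit."""
    prep: float
    single_qubit: float
    cnot: float
    measure: float


def window_time(times, rounds=3, single_qubit_layers=2, cnot_layers=4):
    """Duration of a syndrome-history window under an assumed per-round schedule."""
    round_time = (times.prep
                  + single_qubit_layers * times.single_qubit
                  + cnot_layers * times.cnot
                  + times.measure)
    return rounds * round_time


# Illustrative only: identical other durations, with a 4x longer CNOT.
fast = OperationTimes(prep=10.0, single_qubit=10.0, cnot=40.0, measure=100.0)
slow = OperationTimes(prep=10.0, single_qubit=10.0, cnot=160.0, measure=100.0)
print(window_time(fast), window_time(slow))
```

With these illustrative numbers, the CNOT layers dominate the window, so a fourfold slower CNOT more than doubles the time over which decoherence acts on the encoded qubit.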
In Figure 14, we plot the qubit relaxation time $T_1$ (µs) versus the logical error rate $P_{3,X}$ (red) or the amplitude damping probability (blue) for the six architecture settings (analogous to Fig. 11(b)). All plots assume a three-round memory duration. The logical error rate of each architecture for a given $T_1$ time can be read off these plots. As gate times and $T_1$ times improve, the logical error rate decreases. An order-of-magnitude improvement in logical error rate can be obtained, for example, by improving gate times from those of $\mathrm{SC}_F$ to those of $\mathrm{SC}_H$.

FIG. 12 (partial caption). The blue surface represents the amplitude damping probability at a given $T_1$ and memory duration (unencoded qubit). The yellow surface is the simulated logical error rate for a given $T_1$ for a qubit encoded in the $|1\rangle_L$ state in Surface-17. The orange surface indicates the upper error bar on $P_{3,X}$. (Right) 2D plots from the +z-axis. The blue and red regions indicate the range of $T_1$ times (x-axis) for which encoding a qubit in the $|1\rangle_L$ state in Surface-17 reduces or increases, respectively, the logical error rate compared to an unencoded $|1\rangle$ qubit in memory for a range of durations (y-axis).
The plots also indicate that near-term experiments may be able to detect improved logical error rates due to encoding, providing experimental evidence of surface code error correction. For example, at $T_1 = 1$ µs, no difference in the logical error rate between an encoded and an unencoded qubit can be detected for the $\mathrm{SC}_S$ and $\mathrm{SC}_F$ settings. However, $\mathrm{SC}_F$ exhibits differences of about an order of magnitude at $T_1$ times above 30 µs. Both the $\mathrm{SC}_D$ and $\mathrm{SC}_H$ settings show a significant difference in the logical error rate of an encoded versus an unencoded qubit. For both ion trap settings, a difference in logical error rate can be detected starting at $T_1 = 10$ µs.

Figure 15 contains plots for $T_1$ times up to 50 µs for the four superconducting architectures. The left column contains 3D plots of $T_1$ (µs) versus memory duration (µs) versus logical failure rate, the middle column contains 2D plots of $T_1$ (µs) versus memory duration (µs), and the right column contains 2D plots of $T_1$ (µs) versus logical failure rate. We conclude that gate durations in the $\mathrm{SC}_S$ setting are too slow for Surface-17 to significantly decrease the logical error rate given realistic $T_1$ times (Fig. 15(a)). However, given gate durations between the $\mathrm{SC}_F$ and $\mathrm{SC}_H$ settings and current $T_1$ times of 20-40 µs, encoding a qubit in Surface-17 significantly improves the error rate over an unencoded qubit (Fig. 15(b)-(d)). For both superconducting and ion trap architectures, near-term experimental implementations could demonstrate surface code error correction of a single logical qubit and measure significant improvements in the logical error rate. We find that previous estimates [21] that $T_1$ times of 2.6-2.8 µs are needed to achieve improved logical error rates are too high; in fact, with a $T_1$ time of only 1 µs, the logical error rate can be improved using Surface-17.

FIG. 13. Plots of $T_1$ (µs) versus memory duration (µs) for six architecture settings under amplitude and phase damping. The blue and red regions indicate the range of $T_1$ times (x-axis) for which encoding a qubit in $|1\rangle_L$ in Surface-17 reduces or increases, respectively, the logical error rate compared to an unencoded $|1\rangle$ qubit in memory for a range of durations (y-axis).
VII. CONCLUSION
We have analyzed three distance-three surface code layouts under realistic noise models. Under symmetric depolarizing noise, we find that the pseudothreshold is slightly lower for Surface-13 than for Surface-17 and Surface-25. We have compared the performance of Surface-17 simulated under a Pauli-twirl approximation and under amplitude and phase damping. Our results show that Pauli twirling pessimistically estimates the logical bit-flip rate. Thus, the surface code threshold under realistic noise may be significantly better than previously calculated.
We have also simulated the 17-qubit surface code under amplitude and phase damping for six architecture settings. While gate durations in the $\mathrm{SC}_S$ setting are too slow, gate durations between $\mathrm{SC}_F$ and $\mathrm{SC}_H$ combined with current $T_1$ times yield improved logical error rates for a qubit encoded in Surface-17. For both superconducting and ion trap architectures, current state-of-the-art experiments may be able to demonstrate surface code error correction. For example, with $T_1$ around 10 µs and $\mathrm{SC}_F$ settings, encoding in Surface-17 will improve logical error rates, and the improvement may be detectable in experiment. With $\mathrm{SC}_H$ settings, even shorter $T_1$ times will yield significant improvements with encoding.

FIG. 14. Plots of $T_1$ (µs) versus the logical X error rate $P_{3,X}$ for six architecture settings for a qubit encoded in $|1\rangle_L$ in Surface-17 subject to amplitude and phase damping (red) and an unencoded $|1\rangle$ qubit subject to amplitude damping (blue) for the duration of three rounds of the surface code.
Methods of approximating decoherence using Clifford gates have recently been shown to be more accurate than Pauli twirling [22, 24]. However, these studies have been conducted only at the level of individual gate operations rather than at the circuit level of a given code. One direction for future work is to simulate these noise models on Surface-17 and compare the results to amplitude and phase damping. Another direction is to determine the performance of Surface-17 under leakage. Finally, developing and simulating realistic noise models for specific architectures will be important for guiding experimental surface code implementations.
