We consider realistic, multi-parameter error models and investigate the performance of the surface code for three possible fault-tolerant superconducting quantum computer architectures. We map amplitude and phase damping to a diagonal Pauli "depolarization" channel via the Pauli twirl approximation, and obtain the logical error rate as a function of the qubit T1,2 and state preparation, gate, and readout errors. A numerical Monte Carlo simulation is performed to obtain the logical error rates, and a leading-order analytic formula is derived to estimate their behavior below threshold. Our results suggest that scalable fault-tolerant quantum computation should be possible with existing superconducting devices.
I. INTRODUCTION
The surface code is a topological stabilizer code that is attractive because of its two-dimensional nearestneighbor layout and high fault-tolerant error threshold [1] [2] [3] [4] [5] [6] [7] . The most direct implementation of the surface code with superconducting circuits leads to the hardware design shown in Fig. 1 , where the circles represent qubit devices and the dotted lines between them represent tunable couplers. These tunable couplers, however, significantly increase the complexity of the hardware. We therefore analyze the performance of this most basic design, which we call the textbook architecture, as well as two other fault-tolerant architectures for superconducting qubits with fixed capacitive coupling, using a (mostly) realistic error model that includes qubit decoherence. Although our approach is valid for large qubit arrays, we especially focus on first-generation implementations with code distances d = 3 and d = 5, and show that an experimental demonstration of a small-d topological quantum memory should be possible with existing superconducting devices; the d = 5 case already exhibits a pronounced quantum memory enhancement with current transmon T 1 values. Figures 1 through 3 show diagrams of the three faulttolerant superconducting architectures we consider. For the textbook architecture we assume a two-dimensional array of transmon qubits [8] [9] [10] [11] , and tunable couplers [12] [13] [14] [15] [16] [17] [18] [19] connecting nearest neighbors. The transmon qubits have tunable frequency [20] . We assume that the CNOT gates used to implement the stabilizer measurements are performed using the CZ gate of Strauch et al. [21] , which has extremely high performance in realistic, multi-qubit settings [22] . High-fidelity singlequbit gates are carried out using DRAG pulses [23] .
The initial states of the syndrome qubits are prepared via ideal projective measurements followed by local rotations (if required). We assume the "catch-disperserelease" measurement protocol of Ref. [24] for this purpose, as well as for syndrome readout. Ideal tunable coupler performance-including infinite on/off ratio-is assumed, and stray coupling (such as that arising from unintended capacitance between device elements) is ignored. Gate parameters adopted for this architecture are discussed in Sec. IV and summarized in Table I .
FIG. 1. (Color online)
Layout of the distance-3 surface code considered here. Open circles denote data qubits, and light green (dark blue) filled circles denote X-type (Z-type) syndrome qubits. The dashed lines denote tunable qubit-qubit coupling. We refer to this hardware design as the textbook architecture.
Although tunable couplers have been demonstrated by several groups, it is unknown whether they will be practical for use in a large-scale quantum computer. Therefore we consider two alternative architectures with fixed (capacitive) coupling. Figure 2 shows an architecture proposed by Helmer et al. [25] , where each qubit in a two- dimensional square lattice is coupled to a "horizontal" as well as a "vertical" cavity. Horizontal and vertical cavities in this architecture are fixed at different frequencies, while qubit frequencies are varied between them. The CNOT gates between a pair of adjacent qubits across the cavity are performed via an effective two-qubit flipflop interaction in the dispersive regime [25] . While parallel CNOT gates between two or more pairs of qubits attached to the same resonator are allowed [25] , these simultaneous operations reduce the gate fidelity. Note that the Helmer architecture is not scalable, but the small distance cases are still of interest here.
Finally, we consider a scalable fixed-coupling architecture discussed by DiVincenzo [26] and shown schematically in Fig. 3 . Here each qubit (circle with solid boundary) is coupled to two resonators (squares with solid boundary). Both the qubit and resonator frequencies are fixed, avoiding the need for low-frequency qubit biases, and the CNOT gates are performed via a cross-resonance protocol using microwaves [27] [28] [29] . Note that the number of qubit frequencies required for this architecture is independent of the number of qubits in a large array. In Fig. 3 we have shown a possible frequency allocation represented by 14 colors, 12 for qubits and 2 for resonators. We emphasize that the error model considered here for the DiVincenzo architecture ignores the effects of higherorder interactions, microwave cross-coupling, and other multi-qubit errors typically neglected in theoretical models, which might be significant in this architecture given the larger values of coupling required.
We analyze these different architectures by fixing the intrinsic errors and gate times to estimated realistic values and calculating the logical error rate as a function of the qubit coherence time T 1 . For tunable transmon qubits, the T 2 time is assumed to be equal to T 1 , while for fixed-frequency transmons we assume that T 2 = 2T 1 . The logical error rate is calculated by mapping amplitude and phase damping to the asymmetric "depolarization" channel (ADC), a single-qubit error channel that is diagonal in the Pauli basis. This is explained in Sec. II.
The depolarization channel error model is widely used in the quantum error correction literature, and the symmetric case allows simple comparison (especially of faulttolerant error-threshold values) between different errorcorrecting codes. The action of the depolarization channel on stabilizer states can be efficiently simulated with a classical computer, enabling the direct calculation of logical error rates for large distance codes, and it accurately captures pure dephasing (but only approximately describes the decoherence found in real superconducting qubits). In Sec. III we derive a leading-order analytic expression for the logical error rate that estimates the below-threshold scaling behavior (for small code distances). Section IV gives the approximate performance of the three fault-tolerant architectures discussed above, using both the leading-order analytic formula and classical Monte Carlo simulation. There are many ways to implement a surface code with superconducting qubits, and the design details of any given fault-tolerant architecture will surely be improved and optimized over time; in this sense the architectures of Figs. 2 and 3 mainly serve as examples of our approach and indicate that largescale quantum computers should be possible with existing superconducting devices (assuming the simple error models considered here). The textbook architecture of Fig. 1 is also interesting because it likely provides a bound on the performance of any possible future superconducting surface code implementation, as the additional errorcorrection-cycle steps and unwanted multi-qubit interactions of alternative fixed-coupling designs will only degrade the performance. Comparing the performance of the textbook architecture with the Helmer and DiVincenzo architectures also allows one to assess the benefit of using tunable couplers, which increases the hardware complexity but offers a lower T 1 threshold and logical error rate.
where, γ ≡ p AD and λ ≡ (1−p AD )p PD . Next we represent the parameters p AD and p PD in terms of the single-qubit relaxation time T 1 and dephasing time T 2 ,
The combination of amplitude and phase damping on a single qubit transforms the density matrix as,
B. Asymmetric depolarization channel
Classical simulation of Eq. (7) is inefficient for a multiqubit system. For example, the textbook architecture requires 25 physical qubits for d = 3 and 81 physical qubits for d = 5. The dimension of the Hilbert space is more than 33 million for d = 3 and more than 10 24 for d = 5. This motivates one to construct a simplified error model which is tractable via some efficient classical simulation.
The asymmetric depolarization channel (ADC) is such a model, where a decoherent qubit is assumed to suffer from discrete Pauli X (bit-flip) errors, Z (phase flip) errors, or Y (both): (8) where (8) is the symmetric depolarization channel, where
The ADC is not sufficient to exactly capture the combined effects of amplitude and phase damping, as no choice of p X , p Y , and p Z lead to E ADC (ρ) = E D (ρ). However, the advantage of the ADC (and more generally the Clifford channel) is that it can be efficiently simulated with a classical computer. Therefore we construct an ADC that approximates (7).
C. Pauli twirl approximation
We approximate the combined amplitude damping and dephasing with an ADC via twirling [30] [31] [32] [33] [34] . Twirling is used in quantum information to study the average effect of arbitrarily general noise models via their mapping to more symmetric ones. Alternative approximate approaches have also been recently proposed [35, 36] .
Using the Kraus matrices (4), we can rewrite (7) in terms of Pauli matrices as [34] ,
(9) Twirling over the Pauli group removes the off-diagonal terms [33] from (9) , leading to the ADC (8) with error probabilities [34] 
4 and
If T 2 = T 1 , the ADC reduces to the symmetric depolarization channel.
We refer to the approximate reduction of any quantum channel to the ADC in this manner as the Pauli twirl approximation (PTA). The PTA corresponds to expanding the Kraus matrices in terms of Pauli matrices (and the identity), performing the Kraus summation, and keeping only terms that are diagonal in the Pauli basis. Equivalently, only the diagonal elements of the χ matrix in the Pauli basis are retained. Because of its simplicity and wide applicability, we expect the PTA to be a good starting point for refinements that might (approximately) account for the neglected non-diagonal terms.
III. PHYSICAL AND LOGICAL ERRORS
In this section we discuss the assumptions of our error model and the logical error rate in the surface code. We also describe the error correction cycle and review the concept of a distance-dependent error threshold.
A. Error model
While superconducting qubits promise scalability, they suffer from various error mechanisms caused by gate errors and decoherence [22, 23, 37, 38] . In order to model quantum noise for various surface code architectures we assume that the errors are Markovian (noise affects each individual gate operation independently) and uncorrelated (noise affects each individual qubit separately).
With these assumptions we now describe the dominant error mechanisms relevant for our purpose. We classify these mechanisms as follows:
1. Decoherence. We consider amplitude damping and dephasing as the dominant sources of decoherence, characterized by the relaxation time T 1 and dephasing time T 2 of the qubits. Decoherence is introduced here via the PTA as described above, which allows us to express the single qubit X, Y , and Z error probabilities as (10), where t is the operation time. Similarly, with the assumption of uncorrelated errors, one can quantify the error probabilities for various two qubit Pauli channels as
Also notice that our assumptions guarantee that any error (X, Y , or Z) in one of the qubits for a two qubit operation can be retrieved when errors on another qubit are traced out; for example p X = p XI + p XX + p XY + p XZ .
Unitary rotation error. Incorrect unitary opera-
tions give rise to a type of intrinsic error. By intrinsic we mean an error not resulting from noise or decoherence. For single qubit operations, such errors can always be diagonalized in Pauli X, Y or Z basis. An estimate suggests that with the use of DRAG pulse shapes [23] , these errors are ignorable with respect to the intrinsic two-qubit gate errors . Two-qubit gate errors depend on the architecture, and gate protocol.
3. Leakage. Leakage is an intrinsic error that populates a quantum state outside of the computational subspace. As far as the single qubit operations are concerned it is possible to suppress leakage below the level of any considerable effect (in comparison to other dominant errors) using quantum control techniques. More quantitatively, it's possible to show that higher-order DRAG pulse is capable to suppress single qubit leakage error below 10 −8 (theoretically) in 5 ns for superconducting qubits [23] .
In the present analysis, however, our primary focus is to investigate the effect of decoherence on logical error rates and therefore, we do not consider leakage or unitary errors rigorously. Instead, we compute the average intrinsic error of two-qubit gates for the three architec-tures and distribute it equally to all possible Pauli channels, while decoherence is treated via the PTA. In this section we discuss the use of the surface code as a single-logical-qubit quantum memory and describe the error correction cycle. A distance 3 quantum memory is shown in Fig. 4 . The open circles are data qubits and filled circles are ancillary qubits used for syndrome measurements. A bit-flip on any data qubit results in an eigenvalue change of adjacent Z stabilizers and a phase-flip does the same on neighboring X stabilizers. Therefore, Pauli X(bit-flip), Y (bit and phase-flip) and Z (phase-flip) errors are detectable (and therefore correctable) by sequential measurements of the stabilizer group generators, unless a misidentification in errordetection leads to the formation of a chain starting from one boundary and ending at another. Such error chains commute with all stabilizers but cannot be written as a product of them and therefore remain undetected. The larger the array (or higher the code distance) the lower the probability of formation of these error chains.
Fig . 5 shows the steps that a single surface code error correction cycle is comprised of. The first step is the initial state preparation for the syndrome qubits (state |0 for syndrome Z and |+ for syndrome X). While there exists multiple approaches for a qubit state preparation, we here assume that this is done via an ideal projective measurement and a subsequent local rotation (σ x or Hadamard), if necessary. The state preparation is followed by four CNOT operations with four adjacent data qubits. The order of these CNOT operations is important and in fact from the reference of a syndrome qubit the clockwise and anti-clockwise orders do not work as they lead to unwanted entanglement among the syndrome qubits [7] . We here adopt north-west-east-south protocol without any loss of generality. Notice that while for syndrome Z measurements data qubits act as control qubits, for syndrome X measurements data qubits are the targets. These four CNOT operations are followed by measurements for the syndrome Z case and requires a Hadamard operation before syndrome X qubits get measured. Such an error correction cycle can be shown to be equivalent to measuring the four-qubit operators XXXX and ZZZZ, and are repeated successively.
The data collected via the measurements of syndrome Z and X qubits at the end of every cycle are stored in a classical computer. A classical minimum-weight perfect matching algorithm is used to match (up to a homology) syndrome events to identify various error chains [2, 7] . The most likely logical errors occur when a misidentification by the classical software leads to the formation of an error chain starting from one boundary and ending at another of the same type. Such error chains are referred to as homologically nontrivial error chains and are responsible for logical X or Z operations on the encoded logical qubit. The logical error rate contributed by these error chains can be determined via classical Monte Carlo simulations. An analytical leading order estimate of the logical X or Z error rates for an asymmetric depolarization channel error model in a surface code is also derived in Appendix A, and its performance is compared against the numerical Monte Carlo simulation (as obtained in Ref. [7] ) in Fig. 6 . As observed in Refs. [39, 40] , there exists an additional mechanism for logical errors originating from error propagation via CNOT operationsthe diagonal error chains. We neglect such diagonal error chains in the derivation of our analytic formula and therefore it underpredicts the logical error rates. However, a close correspondence between our analytic estimate and numerical simulation is observed for small distance and below threshold, as shown in Fig. 6 , since the contributions from the diagonal error chains are negligible in that regime. Thus the approximate analytic formula is sufficient for the regimes of interest in this work. The convergence of the curves indicate that below the crossover point, surface code error correction helps as we go from d = 3 to d = 5 and above it hurts. We define that transition point as the distance-dependent error threshold.
IV. ARCHITECTURE PERFORMANCE
In this section we perform an analysis of the logical error rate with numerical Monte Carlo simulation (using Autotune [41] ), and also compare the result to our analytical estimate for the three superconducting architectures. We emphasize that while the numerical Monte-carlo simulation captures all possible error mechanisms, our analytical approach neglects the diagonal error chains as described in Ref. [39, 40] ; it therefore underpredicts the numerical result. However, the analytic formula enables a simple and immediate extension to alternative candidate architectures, error models, and parameter values. Table I shows the parameters used to estimate the logical error rate for the three architectures. We assume tunable transmons for the textbook and Helmer architectures and use two-qubit gate designs that use this tunability. CNOT gates are performed via cross-resonance protocol in the DiVincenzo architecture, which uses transmons operating at the flux sweet spot. Tunable transmons have an additional source of dephasing and therefore we assume T 2 = T 1 for the textbook and Helmer architectures. State preparation of syndrome qubits is assumed to be done via projective measurement followed by a conditional local rotation (as shown in Fig. 5 ) and therefore t QSP = t meas + t loc in Table I .
A. Approximate logical error rate
Here we construct an approximate analytic formula to estimate the logical error rates below threshold. We use the assumptions of our error model and add the individual error probabilities on data and syndrome qubits for each step to obtain the bit-flip and phase-flip error probabilities per cycle as
where t middle ≡ t cycle − (t QSP + t loc + t meas ), p bf and q bf (p pf and q pf ) are the bit-flip (phase-flip) error rates per cycle in the data qubits and syndrome qubits, respectively. The functions p(t) in (12) refer to the expressions (10) evaluated with operation time t. Furthermore, t QSP is the time required to complete the initial state preparation for syndrome qubits, and p QSP is the error probability that a wrong state is prepared. p intr is the intrinsic error of a CNOT gate averaged over the Hilbert space of all input states, and p meas is the error probability that a wrong eigenvalue is reported in the readout process. Note that the intrinsic gate error (described by p intr ) is assumed to be equally distributed over all 15 two-qubit Pauli errors and therefore the probability of a bit flip (occurs with X or Y errors) or phase flip (occurs with Z or Y errors) of any qubit during CNOT due to the intrinsic error is 8p intr /15. When we add each probability we are ignoring all higher-order contributions and also the coherence of these error mechanisms (although our numerical simulation takes the higher-order effects into account). We use these bit and phase flip probabilities in Eq. (A6) and Eq. (A7) of Appendix A to obtain analytical estimates of logical X and Z error rates. 
B. Textbook architecture
The textbook architecture consists of a twodimensional square lattice (as shown in Fig. 1 ) of superconducting qubits-tunable transmons-with nearestneighbor tunable couplings having infinite on-off ratio. The CNOT operations in this architecture are performed using the protocol discussed in Ref. [22] . We assume that the idle data qubit frequencies are 6 GHz and syndrome qubit frequencies are 8 GHz. The optimal parameters for a CNOT operation, shown in Table IV , are determined in Appendix B by modeling amplitude and phase damping. As mentioned earlier, T 1 = T 2 is assumed for tunable transmons, as they have an additional source of dephasing that degrades their T 2 .
We use Eq. (12) along with Eq. (A6) and Eq. (A7) to compute the logical X and Z error rates (P XL and P ZL ) for the textbook architecture. For the numerical Monte Carlo simulation, we use Autotune [41] to simulate the circuit shown in Fig. 5 for every syndrome qubit.
In Fig. 7 we show the graphs (both analytical and numerical) of the logical X and Z error probabilities per cycle with d = 3 and d = 5 codes, versus the relaxation time T 1 . Note that for d = 3 our analytic formula closely reproduces the numerical simulation, while for d = 5 it underpredicts as we expect. From the numerical plots we observe that the threshold is at ≈ 2.6 µs, where all other parameters are kept fixed as listed in Table I . This result signifies that if we construct this architecture with qubits having T 1 (or T 2 ) more than ≈ 2.6 µs, then surface code error correction helps as we increase the distance from d = 3 to d = 5; otherwise it hurts.
C. Helmer architecture
In this section we discuss the architecture proposed by Helmer et al. [25] , where superconducting qubits are arranged in a two-dimensional square lattice and each qubit is coupled to one horizontal and one vertical cavity as shown in Fig. 2 . The rectangular blocks (horizontal and vertical) are cavities, circles represent qubits and the colors denote their idle (between gate) frequencies. As pointed out in Ref. [25] , the minimum frequency range required to allocate the frequencies of all qubits in this architecture is proportional to square root of the number of qubits. While this architecture is not scalable, it is suitable for implementing the distance 3 and 5 surface code, which is a main focus here.
The CNOT gates are performed between a pair of adjacent qubits by tuning them into mutual resonance and waiting for a while somewhere near cavity frequency and thereby utilizing the effective flip-flop interaction between qubits. The waiting time for this gate is inversely proportional to the magnitude of the effective flip-flop interaction strength and for parameters used in Ref. [25] we estimate t CNOT ≈ 20 ns for this protocol. The dominant source of intrinsic errors for such a CNOT emerges from the higher-order Landau-Zener transitions during tuning and detuning and are estimated to be in the order of 10 −3 [25] . As specified earlier, the parallel CNOT operations involving the same resonator also cost fidelity due to the higher-order couplings in this architecture. However, in the low distance limit we assume that the total intrinsic error is bound by the fixed (distance-independent) value mentioned above. These parameters are shown in Table  I and used to estimate the logical error probability per cycle for this architecture. The bit-flip and phase-flip error probabilities per cycle for data and syndrome qubits in this architecture are given by (12) with t cycle = 160 ns and t middle = (4×20) ns = 80 ns. With a similar analysis we obtain Fig. 8 , which shows the plots of logical Xand Z error probabilities per cycle for d = 3 and 5 error correction with respect to T 1 , and we observe that the threshold is at ≈ 2.8 µs. 
D. DiVincenzo architecture
Here we analyze the architecture (shown in Fig. 3) proposed by DiVincenzo [26] , in which each qubit is dispersively coupled to two resonators, while each resonator couples four such qubits. In this architecture every data or syndrome qubit consists of four physical qubits where one of them is primary and the remaining three act as ancillary qubits. The CNOT operations in this architecture are performed via the virtual cross resonance pro- tocol where qubits always remain dispersively coupled to the resonators while microwaves drive the population transition between two qubits [27] [28] [29] . Notice that this architecture is fully scalable and the frequency allocation does not depend on the number of qubits. We first estimate the time required to complete a single surface code cycle in this architecture. As mentioned earlier, for each block one out of four qubits acts as a principal qubit and without loss of generality we assume the eastern qubit to be the principal one for every block. Table II shows the time required for each individual step in this architecture. The state preparation and read out takes 40 and 35 ns, respectively, as for previous architectures. The first CNOT is performed between a syndrome block and its north data qubit block and this is performed by doing a CNOT between the eastern qubit of the syndrome block and the western qubit of the data block. This CNOT must be accompanied by pre-and post-SWAP operations in the data block where the quantum state of the eastern qubit is transferred to the western one. As discussed, the CNOT operations are performed via the cross-resonance protocol and we assume the gate time for such a CNOT to be ≈ 20 ns [29] . SWAP operations between two qubits coupled via resonator is also assumed to be performed in 20 ns. The intrinsic error p intr for such CNOT gates is estimated to be in the order of 10 −3 [28] . These results give us the time durations required for each step in the error correction cycle, shown in Table II . These estimate the duration of a single cycle in this architecture to be 400 ns long.
Following the same argument as in the textbook architecture and using (12) for the bit-flip and phase-flip error probabilities with t cycle = 400 ns and t middle = (100 + 60 + 60 + 100) ns = 320 ns, we compute logical X and Z error rates. Fig. 9 shows the total logical X and Z error probabilities per cycle for d = 3 and 5. Note that the condition, T 2 = 2T 1 , leads to p X + p Y ≈ 2(p Z + p Y ) (assuming T 1,2 ≫ t cycle ), which means that the bit-flip error rate is almost twice larger than the phase-flip error rate. Since, logical X error rate mostly depends on bit-flip probability and logical Z on phase-flip, we expect P X > P Z for this case. This asymmetry between logical X and Z error rates imply a larger error threshold for logical X error in comparison to logical Z. We observe from our numerical simulation that logical Z errors can be suppressed if T 1 > 5 µs, while in order to suppress logical X errors we need T 1 > 10 µs, which is consistent with the above argument. 
E. Other possible architectures
We also discuss some other possible architectures based on fixed coupling elements, as shown in Fig. 10 . As per our convention, the squares denote resonators, circles denote qubits, and solid lines denote fixed couplings. For gate protocols (CNOT or SWAP) that require tuning and detuning qubits in and out of resonance, the greatest challenge is the frequency allocation such that firstorder Landau-Zener transitions can be avoided. We observe that with DC control-based gate protocols [21, 22] , none of these architectures can avoid first order LandauZener transitions. This fact is an inherent property of the topology of these architectures. However, we note that with microwave-control-based gate protocols (for example, cross-resonance), these unwanted transitions can be avoided.
The crucial role of Landau-Zener transition on the error mechanisms for these architectures motivates us to estimate this error: For any two level system LandauZener formula predicts the diabatic transition probability as
where g is the coupling between the levels, ǫ(t) is the time-dependent energy level separation and t * is the time when the levels are in resonance. Assuming parameters relevant for superconducting architectures (g ≈ 45 MHz andǫ(t * ) ≈ 2 GHz/ns), we obtain a Landau-Zener transition error 1 − P LZ of about 4%. This error is unacceptably large. We do not attempt here to perform a more quantitative analysis of logical error rate for these architectures. However, we emphasize that while microwave control-based gate protocols may prove to be useful for these cases, such gate operations have not yet been analyzed in the context of these designs. We have investigated the logical error rate and faulttolerant error threshold for three superconducting surface code implementations. While the coherence time has been improving over the past few years for superconducting qubits, we discuss here the minimum coherence time required to achieve error correction. The logical error rate for d = 3 and 5 is computed as a function of qubit coherence time and the threshold is found to be dependent on the architecture, error model, and assumed gate protocol. Table III shows our main results. These error thresholds are within reach of current state-of-theart superconducting circuit designs. The operation time requirements for qubit state preparation and readout are, however, yet to be achieved experimentally to the accuracy assumed in this work. Our analysis can be extended to the future surface code architectures. As mentioned earlier, the effect of decoherence on the logical error rate is a primary focus in this work, and our error models neglect various higher-order and unintended stray couplings between qubits. Exploring the effect of these factors is a possible direction of future research.
V. CONCLUSION
Appendix A: Derivation of approximate surface code logical error rate
In this section we derive the logical error per qubit per cycle as a function of the single qubit error rates, to leading order. Our derivation here does not include the "diagonal" error propagation via CNOT gates [39, 40] and therefore underestimates the logical error rates. Logical error rates for X and Z errors per cycle (P XL and P ZL respectively) are defined as the probability of formation of an X or Z error chain in the surface at the end of a single cycle. We consider the logical X error first, and the expression for the logical Z error follows from a similar combinatorial argument. Suppose p bf and q bf are bit-flip error probabilities (per cycle) in the data and syndrome qubits, respectively. The dominant error mechanism emerges from the fact that (d + 1)/2 errors either get misidentified as (d − 1)/2 errors (with 100% probability) or as a different arrangement of (d + 1)/2 errors (with 50% probability), thereby producing an error chain after attempting error correction. Such a process can happen in three ways. ways, and such an error chain may occur in any one of the d rows, leading to
Note that the chance of misidentification of these (d + 1)/2 errors is 100% for this case because the classical error detection software is based on minimal-weight perfect matching. This expression was previously derived in Ref. [7] . Case 2. In this case (d + 1)/2 errors occur in two consecutive rows, as shown in Fig. 4 . We refer to such an error chain as a 'broken' error chain and call the point where the chain changes its row as the 'breaking point' (shown in Fig. 4) . In order to estimate this case correctly one needs additional care with error-chains starting from one boundary and ending at the same boundary in a different row. We refer to such an error chain as a 'clasp' (shown in Fig. 4 ). Notice that clasps are homologically trivial and therefore should not be considered as a source of logical error. In order not to count these clasps, we classify this case into two mutually exclusive and exhaustive (to leading order) subcases: i when errors occur in horizontal links of a surface code lattice and ii when there are no errors on horizontal links. Also, observe that chains with errors in more than one horizontal links contribute to a higher-order process and are therefore excluded from our leading order analysis. If we think that the horizontal link with error divides a row into shorter and longer arms (also shown in Fig. 4) , then for subcase i, the number of ways an error chain is formed (W 1 ) is constrained by the condition that all sites of the shorter side cannot be filled with errors for any error chain since in that subcase one would be constructing a clasp. Satisfying this condition, for a given orientation and a specific pair of adjacent rows, we obtain,
For subcase ii all the single physical qubit errors are distributed among vertical links in two adjacent rows. In this subcase, for a given distribution of single qubit errors, in order not to overcount the homotopic error chains one needs to adopt a convention to place the breaking point. Without loss of any generality, we adopt the convention that the breaking point for this subcase is always placed right next to the rightmost error on the lower arm. Such a convention prevents overcounting of homotopic error chains. The remaining condition one needs to satisfy for this subcase is not to place all single qubit errors on the longer arm of the error chain. This condition prevents us from overcounting case 1. Satisfying these conditions, we find the number of ways an error chain is formed (W 2 ) for subcase ii as,
Combining these results we obtain the logical X error probability per cycle for case 2 as
In the first line, the factor of 2 comes from the orientation (bottom-left to top-right or top-left to bottom-right) of the error chain, the factor of 1/2 denotes the fact that the classical error detection software misidentifies such an error chain with a 50% probability, and d− 1 corresponds to the number of adjacent pair of rows in a distance-d code.
Case 3. The third process that contributes to the same order involves error chains weaving through surfaces in different time slices. In Fig. 11 we show a single row of a distance-5 surface in two subsequent time slices. Note that the geometry of locations of data qubits and measurement events for this case exactly correspond to the geometry of broken error chains discussed in case 2, except for the fact that the breaking point is along timelike direction instead of spacelike one. In analogy with case 2 we argue that such a situation happens for two subcases: i when there is one measurement error with a probability q bf on one time slice along with (d − 1)/2 bit-flip errors on data qubits in two subsequent time slices in a single row, and ii when there are only (d + 1)/2 bit-flip errors on data qubits in two subsequent time slices in a single row. Bit-flip error probability on a syndrome or data qubit in one of the two subsequent time slices is in fact p bf (1 − p bf ) or q bf (1 − q bf ) and keeping only leading order terms we approximate those as p bf or q bf . Note that the two subcases of case-3 can be mapped exactly with the two subcases of case-2 as far as their combinatorics are concerned and following a similar argument as in case 2 we obtain
where the suffix denotes qubit index, g represents the coupling between the qubits and η is the anharmonicity of the qubit. For a CZ operation we control the frequency of the first qubit (ω 1 (t)) with an error function pulse as described in Ref. [22] while the frequency of the second qubit is kept constant. The Kraus matrices for the amplitude damping channel of any three level quantum system are given by, 
where λ k for k = 1, 2, 3, 4 being parameters of our decoherence model. We assume the same amplitude and phase damping probability for |1 and |2 states (λ ≡ λ 1 = λ 2 and λ ′ ≡ λ 3 = λ 4 ) and represent λ and λ ′ as functions of time duration (∆t) and T 1 , T 2 of the quantum system as, The assumption that decoherence affects each qubit independently enables us to construct the full Kraus matrices for the qubit-qubit model by performing all possible tensor products between individual single qubit Kraus matrices. We first simulate the Hamiltonian given by (B1) for parameters given in Table. I without decoherence to obtain an optimal pulse shape that maximizes the average fidelity of the CZ gate for a given coupling and gate time. Next we apply our decoherence model described by Eqs. (B2) and (B3) on those optimal pulses. Fig. 12 shows plots of leakage error from |11 for such decoherence model (for T 1 = 10 µs) applied on optimal pulses with respect to various total gate time and for various values of coupling strengths. Fig. 12 also shows that there exists an optimal point corresponding to total gate time ∼ 11 ns at g = 55 MHz for which the leakage from |11 state is the minimum under decoherence. We use this point for the CZ part of the CNOT operation in textbook architecture and assuming that local rotations can be performed almost exactly in 5 ns, a CNOT requires 21 ns time duration as it involves two Hadamard operations along with a CZ. Table IV shows the optimal parameters and results obtained from this analysis. 
