Abstract We analytically design control bias pulses that realize new multi-qubit parity detector gates for a two-dimensional (2D) array of superconducting flux qubits with non-tunable couplings. We design two 5-qubit gates in which the middle qubit is the target and its four coupled neighbors are the controls. These gates detect the parity between the two vertically (or horizontally) coupled neighbor qubits while cancelling the coupling effect of the two horizontally (or vertically) coupled neighbor qubits. For a 3×3 array of 9 qubits with non-tunable couplings, we simulated the new 5-qubit horizontal and vertical parity detector gates and obtained an intrinsic fidelity of 99.9% for both. In this paper we realize Surface Code memories based on the multi-qubit parity detector gates for nearest-neighbor superconducting flux qubits with and without tunable couplings; the scheme is applicable to other superconducting qubits as well. In the proposed memory realization, error correction cycles can be performed in parallel on several logical qubits or even on the entire 2D array of qubits, which makes it a desirable candidate for large-scale, long-running quantum computation. In addition to substantially reducing the number of control parameters, our method shortens the error correction cycle, and the cycle time does not grow with the number of qubits in the logical qubit layout. A further advantage is that idle qubits cause no dephasing, since all the qubits are used in the error correction cycles.
Introduction
One of the most important research areas in quantum computing is the design and implementation of efficient, fault-tolerant, and scalable quantum architectures. Quantum systems are intrinsically error-prone because environmentally induced errors can change the state of the qubits. Quantum error correction schemes are therefore required to preserve the state of the qubits during idle times.
Quantum errors on single qubits will propagate in quantum circuits through multi-qubit quantum gates.
A bit-flip error on a control qubit propagates to the target qubit, and a phase-flip error on a target qubit propagates to the control qubit. Additionally, erroneous gates can introduce errors into their coupled qubits. Furthermore, in Quantum Error Correction (QEC) schemes, a measurement error can corrupt the result of the calculation; we do not consider such errors in this paper.
There are many quantum error correcting codes [1]. Some of the most recognized are Shor's 9-qubit code [2], the Steane 7-qubit code [3], the Calderbank-Shor-Steane (CSS) code [4], the Stabilizer code [5], and more recently the Bacon-Shor code, the Repetition Code, and the Surface Code [6, 7, 8, 9]. One of the most promising fault-tolerant quantum computing schemes, and one that follows the fault-tolerant metrology proposed by Martinis [10], is the Surface Code [1]. Martinis [10] proposed a metrology for fault-tolerant error correction in scalable quantum computers based on measuring qubit parities, which detect bit-flip and phase-flip errors in pairs of qubits. In this metrology, a parity operation consists of one-qubit, two-qubit, and measurement components, and the error probability of each component must be kept below a defined threshold to reach an error suppression factor Λ greater than 1. Higher-order error detection leads to a lower logical error probability as follows [10]:
$$\epsilon_L \simeq \Lambda^{-(n+1)},$$
where $n$ is the order of error and $\Lambda = \epsilon_t/\epsilon$ is the error suppression factor, with $\epsilon$ the probability of physical error and $\epsilon_t$ the error threshold. According to Martinis, Λ is "the key metrological figure of merit that quantifies how much the decoding error drops as the order n increases by one" [10]. Here Λ > 1 means that the physical error $\epsilon$ is lower than the threshold $\epsilon_t$, and by making the error correction code larger the decoding error is decreased exponentially with n [10]. Assuming after Martinis that a typical quantum algorithm implementation uses $10^{18}$ operations, and that we need an overall logical error probability below $\epsilon_L = 10^{-18}$ with a suppression factor of Λ = 10, this leads to an error order of n = 17. To achieve error correction of order n in a Surface Code architecture [8] we need a (4n+1)×(4n+1) array of qubits, which requires as many as 4761 qubits for n = 17. Although this number seems large, the Surface Code architecture is still the best practical error correction method for fault-tolerant quantum computing because of its high tolerance to errors, which allows an error rate of 1% per operation.
Moreover, its two-dimensional physical layout with nearest-neighbor couplings makes it a scalable and practical approach for solid-state quantum computers [1], [8], [10]. Furthermore, because it relies on simple projective measurements and tracks the detected errors in software, there is no need to apply physical correction gates, which introduces less noise and perturbation into the physical system. In Surface Code error correction, it is of high interest to perform error correction cycles on many qubits simultaneously, which is the focus of this work.
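The figures quoted from Martinis's metrology can be checked with a few lines of arithmetic. The sketch below assumes the scaling ε_L ≈ Λ^-(n+1) for the logical error and the (4n+1)×(4n+1) array size stated above; the function names are illustrative and do not come from the cited works.

```python
def logical_error(suppression_factor: float, order: int) -> float:
    """Logical error probability under the assumed scaling Lambda ** -(n + 1)."""
    return suppression_factor ** -(order + 1)


def minimum_order(suppression_factor: float, target_error: float) -> int:
    """Smallest order n whose logical error does not exceed target_error."""
    if suppression_factor <= 1:
        raise ValueError("error suppression requires a factor greater than 1")
    order = 0
    # The small relative tolerance guards against floating-point rounding.
    while logical_error(suppression_factor, order) > target_error * (1 + 1e-12):
        order += 1
    return order


def surface_code_qubits(order: int) -> int:
    """Number of physical qubits in a (4n + 1) x (4n + 1) array."""
    return (4 * order + 1) ** 2


order = minimum_order(10, 1e-18)
qubits = surface_code_qubits(order)
print(order, qubits)  # prints: 17 4761
```

The quadratic growth of the array with n is what makes the number of physical qubits, rather than the order itself, the practical cost.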
The surface code architecture is based on the stabilizer formalism and consists of Z and X stabilizers [7] .
Surface Code introduces ancillary qubits dedicated to these stabilizers and repetitively performs projective quantum non-demolition (QND) parity measurements on these ancillary qubits to measure the bit-flip and phase-flip errors of the data qubits [8] . The number of ancillary qubits in these measurements is approximately equal to the number of data qubits. Although it has been shown that this approach results in storing information with a lower error rate, the Surface Code methodology has a high computational and resource overhead to realize the logical states and process information. It has been
shown [34] [35] [36] [37] [38] that the distance-three Surface Code is achievable with reduced number of physical qubits from Surface-25 to Surface-17 and Surface-13. In [8] the authors created a logical qubit from 13 physical qubits and performed quantum error correction to preserve the logical states. In [35] the authors showed that the Surface-17 code has the best threshold with lowest cost and is a suitable candidate for experiments and architectural explorations [35] , [19] . Therefore, we focus on Surface-17 code to illustrate our memory architecture, however, our proposed scheme is extensible to very large-scale Surface Code architectures and larger logical qubits. Surface-17 code consists of 9 data qubits surrounded by 8 measurement qubits (four X and four Z stabilizers).
Quantum errors and the instability of quantum states are considered fundamental obstacles to large-scale quantum computers. Investigating methodologies for designing quantum architectures with efficient error correction schemes is therefore an important area of research. By efficient designs we mean those that have reduced control circuitry and less computational overhead, and that are more reliable and faster. In this work we aim to find such an efficient Surface-17 Code implementation for large-scale 2-Dimensional Nearest Neighbor (NN) quantum computers. This is made possible by our new multi-qubit parity detector gates (involving more than three qubits), which can act on many qubits simultaneously.
Previous works
In [19] a scalable scheme for Surface Code implementation is proposed for flux-tunable transmons [23] with nearest-neighbor couplings. The Helmer architecture [36] consists of transmons in a 2D array in which each transmon is coupled to one vertical and one horizontal cavity, and its Surface Code cycle time is 160 ns. This architecture has tunable couplings, but it does not scale to a large number of qubits, because the method must allocate a frequency to every qubit and the minimum frequency range needed is proportional to the square root of the number of qubits [24].
In the DiVincenzo architecture [37], each transmon is dispersively coupled to two resonators, while each resonator is coupled to four qubits. This architecture is fully scalable; however, the couplings are not tunable, and each error correction cycle takes 400 ns.
The Textbook and Helmer architectures both have tunable couplings and the qubit relaxation time is 1-10 µs, while DiVincenzo architecture with non-tunable coupling has 1-40 µs qubit relaxation time.
The architectures discussed above and their timings [19, 36, 37] assume a transmon-based physical realization. To our knowledge, no Surface Code architecture based on flux qubits has been reported in the literature. Since quantum computers based on flux qubits remain of high importance among superconducting qubit architectures, we propose here an efficient, scalable Surface Code architecture for a flux-based physical model.
We consider a 2D nearest-neighbor superconducting architecture consisting of flux qubits with long-range DC-SQUID (DC Superconducting Quantum Interference Device) couplings. In [22] Fowler et al. presented an architecture based on flux qubits with long-range couplings that is scalable and suitable for 2D error correction codes. In [21] the authors designed tunable couplings based on the architecture proposed by Fowler in [22]. This motivated us to design a single-shot multi-qubit parity detector gate for flux qubit architectures with tunable couplings [21]. A single-shot gate is one that requires only a single pulse to realize the gate operation. In [11] we introduced a novel single-shot multi-qubit parity detector gate and applied it to generate efficient circuits for Mirror Inversion (MI) [12, 13, 14] as a sequence of controlled-unitary operations between 2-Dimensional nearest-neighbor qubits.
The method from [11] significantly increased efficiency by lowering the computational overhead, since the state transfer could be achieved in fewer computational steps and without ancillary qubits.
Furthermore, there was no dephasing from idle qubits, since all the qubits were used in the MI operation as target or control qubits. Dephasing, or qubit precession, is a phenomenon in qubits based on Josephson junctions or trapped ions in which the basis states are non-degenerate in the absence of external interactions. Therefore, when no gate operations are performed, a time-varying relative phase develops between the basis states [15]. One way to overcome dephasing is to design special encoding architectures as presented in [15] and [16].
In [17] , the author analytically showed how a multi-qubit controlled-unitary gate can be achieved in a single pulse and suggested the usage of this gate in error correction codes. These results motivate us to extend the work from [11, 17] to investigate designing multi-qubit parity detector gates and apply them to the Surface Code architecture to achieve less computational overhead, less control circuitry, and faster error correction cycles.
Our proposed architecture
Here we explain how the parity detector gates can be used in a memory realization for 2D NN architectures based on Surface Code error correction. We achieved reduced computational overhead and overall error correction cycle time and simplified control circuitry. For the systems with non-tunable couplings where couplings cannot be turned off, we realize Surface Code architecture by designing a new multi-qubit parity detector gate where each target qubit is connected to 4 adjacent control qubits.
In tunable coupling architectures, we can complete each error correction cycle in two pulse sequences of multi-qubit gates combined with Hadamard gates and measurement operations. This approach leads to a circuit of maximum depth 5 for a Surface Code architecture of any size. The multi-qubit gate operation takes only 10 ns in our considered physical model. This greatly reduces the error correction cycle time compared with a corresponding traditional approach, which would use 4 CNOT gates, each taking 10 ns in flux-qubit-based architectures.
For systems with non-tunable couplings, the interaction between qubits cannot be turned off. Since the gate operation time is a control parameter, we can reduce the multi-qubit gate operation time even further by adjusting other system parameters. Additionally, in our approach no qubit is left idle, so the scheme does not suffer from qubit precession during idle times.
In section 2, we provide background on the simulation method and the analytical design of the existing single-shot parity detector gate for systems with tunable couplings. In section 3, we explain the physical model used in this work as well as the analytical design of the new multi-qubit parity detector gate for systems with non-tunable couplings. In section 4, we briefly introduce the Surface Code memory realization. In section 5, we propose the Surface Code architectures that use the multi-qubit parity detector gates. In section 6, we conclude our work and outline future research directions.
Background
In [11] the authors introduced a novel single-shot multi-qubit parity gate for superconducting qubits with Ising interactions. This gate inverts the state of the target qubit only if the two neighboring control qubits have opposite logical states. In section 2.1, the quantum simulation method is explained. In section 2.2, we introduce a new logical notation to describe a parity detector gate. In section 2.3, the analytical approach for designing a quantum gate for systems with tunable couplings is reviewed.
Simulation method
To simulate the dynamics of a quantum system, a Time Dependent Schrödinger Equation (TDSE) needs to be solved. Knowing the initial state of the system and the Hamiltonian, the time evolution of the state of the system is calculated as follows:
Here, for a single-qubit system, $|\psi(t_0)\rangle$ is the initial state of the system at start time $t_0$, and the state $|\psi(t)\rangle = c_0|0\rangle + c_1|1\rangle$ determines the probability distribution for the outcome of each possible measurement on the system at time t, where $c_0$ is the probability amplitude of the qubit being in state |0⟩ and $c_1$ is the probability amplitude of the qubit being in state |1⟩. The Hamiltonian H generates the time evolution that maps the current quantum state to the next state. ħ = h/2π, where h is the Planck constant. In our simulations the Planck constant is normalized to 1.
We can write the solution of the Schrödinger equation as $|\psi(t)\rangle = U|\psi(t_0)\rangle$, where U is the unitary transformation of the system.
Any single qubit gate operation can be realized using a 2×2 unitary matrix U [29] :
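$$U = e^{i\alpha} R_z(\beta)\, R_y(\gamma)\, R_z(\delta),$$
where $e^{i\alpha}$ is the global phase factor, $R_z(\beta)$ and $R_z(\delta)$ are rotations by $\beta$ and $\delta$ around the Z axis, and $R_y(\gamma)$ is a rotation by $\gamma$ around the Y axis of the Bloch sphere.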
In the realm of gate design, we would like to set up the control parameters of the system such that the unitary operation U becomes as close as possible to our target gate transformation matrix. This can be done experimentally [20, 25, 26, 27, 30] , analytically [32] , by utilizing optimization methods [33] , or Machine Learning approaches [18, 28] .
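The simulation step described in this section can be sketched numerically. The example below is a minimal illustration, not the simulator used in this work: it assumes a time-independent single-qubit Hamiltonian H = 2πΔσ_X with ħ = 1, frequencies in GHz and times in ns, zero bias, and a fourth-order Runge-Kutta integrator chosen here for simplicity. The values Δ = 25 MHz, T = 10 ns and the 0.1 ns step are borrowed from later parts of the text.

```python
import math


def schrodinger_rhs(hamiltonian, psi):
    """Right-hand side of d|psi>/dt = -i H |psi> with hbar = 1."""
    size = len(psi)
    return [-1j * sum(hamiltonian[r][c] * psi[c] for c in range(size))
            for r in range(size)]


def rk4_evolve(hamiltonian, psi0, t_total, dt):
    """Integrate the Schrodinger equation with classical RK4 steps."""
    psi = [complex(a) for a in psi0]
    for _ in range(round(t_total / dt)):
        k1 = schrodinger_rhs(hamiltonian, psi)
        k2 = schrodinger_rhs(hamiltonian, [p + 0.5 * dt * k for p, k in zip(psi, k1)])
        k3 = schrodinger_rhs(hamiltonian, [p + 0.5 * dt * k for p, k in zip(psi, k2)])
        k4 = schrodinger_rhs(hamiltonian, [p + dt * k for p, k in zip(psi, k3)])
        psi = [p + dt / 6 * (a + 2 * b + 2 * c + d)
               for p, a, b, c, d in zip(psi, k1, k2, k3, k4)]
    return psi


DELTA_GHZ = 0.025                      # assumed tunneling energy (25 MHz)
coupling = 2 * math.pi * DELTA_GHZ     # angular frequency in rad/ns
H = [[0.0, coupling], [coupling, 0.0]]  # 2*pi*Delta*sigma_X, zero bias

psi_final = rk4_evolve(H, [1, 0], t_total=10.0, dt=0.1)
p1 = abs(psi_final[1]) ** 2   # probability of finding the qubit in |1>
norm = sum(abs(a) ** 2 for a in psi_final)
print(round(p1, 6), round(norm, 6))
```

Starting from |0⟩, the population moves entirely to |1⟩ after 10 ns, which is the X operation (up to a global phase) that the gate design aims for.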
Introducing the notation of parity detector gate
By convention, a fully colored circle on a control qubit means that the gate operation is performed on the target when the logical state of the control qubit is 1, while an empty circle means that the gate operation is performed on the target when the logical state of the control qubit is 0. As shown in Fig. 1 (a), we introduce half-colored circles, which mean that the logical state of the control qubit can be 1 or 0. Half-colored circles are meaningful only when applied in pairs: they represent two control qubits in opposite states, which results in a gate operation on the target qubit as shown in Fig. 1
Parity detector gate for systems with tunable couplings
In this section we explain the analytical derivation of the multi-qubit single-shot parity detector gate for 2D NN systems with tunable couplings, based on the work in [11]. To illustrate how controlled-unitary gates can be designed analytically, consider a 3-qubit parity detector gate in which the target B is the middle qubit and the adjacent qubits A and C are the controls, as shown in Fig. 2. We define the couplings $\xi_{AB}$ and $\xi_{BC}$ between qubits A and B, and B and C, respectively. In tunable coupling systems, the coupling values can be changed during the quantum computation from negative to positive values, including zero [21]. We consider systems in which all qubits have the same tunneling energy parameter ∆, and the coupling energy is much higher than the tunneling energy ($\xi \gg \Delta$) [31].
Knowing that any single-qubit gate operation can be realized using a 2×2 unitary matrix U as shown in Eq. (6), we can use a reduced Hamiltonian scheme [31, 32] to realize an arbitrary controlled-unitary operation with multiple control qubits and one target qubit. To design such a gate, we apply a high bias to all the control qubits to preserve their states, and we adjust the bias on the target qubit for the duration of the gate operation. The Hamiltonian of the system shown in Fig. 2 can be written as follows:
For a parity detector gate realized as shown in Fig. 2, qubits A and C are control qubits; therefore, we can decompose the Hamiltonian in Eq. 6 into 4 subspaces depending on the logical states of A and C. The expectation value of $\sigma_Z$ for qubits A and C is +1 or -1 depending on whether the qubit is in |0⟩ or |1⟩, and the expectation value of $\sigma_X$ for qubits A and C is zero when their states are frozen to |0⟩ or |1⟩. Therefore, the above Hamiltonian is reduced to four 2×2 Hamiltonians for qubit B in the subspaces defined by the basis states of qubits A and C being |00⟩, |01⟩, |10⟩, and |11⟩ [11].
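The reduction described above can be made explicit. The following is a sketch in assumed notation, not a reproduction of the derivation in [11]: $\Delta$ is the tunneling energy, $\varepsilon_B$ the bias on the target, $\xi_{AB}$ and $\xi_{BC}$ the couplings, and $s_A, s_C = +1$ or $-1$ for a control frozen in |0⟩ or |1⟩. Constant terms from the control biases are dropped because they only contribute a global phase.

```latex
\begin{align*}
H_B^{(s_A s_C)} &= \Delta\,\sigma_X + E_{s_A s_C}\,\sigma_Z,
  \qquad E_{s_A s_C} = \varepsilon_B + s_A\,\xi_{AB} + s_C\,\xi_{BC}, \\
\text{with } \xi_{AB} = \xi_{BC} = \xi:\quad
  E_{|00\rangle} &= \varepsilon_B + 2\xi, \quad
  E_{|01\rangle} = E_{|10\rangle} = \varepsilon_B, \quad
  E_{|11\rangle} = \varepsilon_B - 2\xi .
\end{align*}
```

Under these assumptions, setting $\varepsilon_B = 0$ leaves a pure $\sigma_X$ term only in the subspaces where A and C differ, while the subspaces |00⟩ and |11⟩ keep a large effective bias $|E| = 2\xi \gg \Delta$ that suppresses the flip.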
Note that the bias terms associated with the control qubits A and C only contribute to a global phase. By adjusting the bias values such that the overall global phase is an integer multiple of 2π, we can remove the unwanted phase. For each subspace we integrate the Schrödinger equation over the gate operation time and equate the resulting unitary matrix to the desired 2×2 unitary gate operation. For example, to realize a parity detector gate, we want an X operation in subspaces AC = |10⟩ and AC = |01⟩, and an Identity operation in subspaces AC = |00⟩ and AC = |11⟩. Solving the system parameters for the parity operation gives the duration and amplitude of the bias pulse to be applied to qubit B. We keep the biases on A and C at a high value (3 GHz), such that they do not cancel the effect of the couplings, and apply a bias pulse to qubit B. We have shown in [11] that by choosing the bias on qubit B to be zero and solving the system parameters to cancel out unwanted phases, we can realize a controlled-unitary parity detector gate with 100% fidelity. By changing the bias of qubit B from 3 GHz to zero for a duration T = 10 ns, we achieved a parity detector gate in a system with the following parameters: $\Delta = 25$ MHz, $\xi_{AB} = \xi_{BC} = 1$ GHz, $\varepsilon_A = \varepsilon_C = 3$ GHz. Notice that only one pulse needs to be applied to qubit B to realize the controlled-unitary operation. The gate operation time T is a flexible parameter, and a shorter gate operation time can be achieved by increasing the tunneling ∆. The analytical derivation of these values is given in [11], together with how the proposed gate can be used efficiently to realize mirror inversion operations in 2D and 3D arrays of NN architectures. In [11], we introduced four single-shot operations based on the parity detector gate, named P, CNOT-P-CNOT, P-CNOT, and CNOT-P, shown in Fig. 3. All of them can be used in our Surface Code architecture for tunable coupling systems. Note that an arbitrary number of three-qubit parity detector gates can be added or removed in the middle to scale these single-shot gates up or down.
We can generalize the parity detector gate realization to a multi-qubit parity detector gate for a very large-scale array of qubits. The simulation results for a 9-qubit parity detector P gate (see Fig. 3 (a)) with the input state |100011110⟩ are shown in Fig. 4. Comparing with the output state after applying the parity detector gate, |110111100⟩, we see that the states of the second, fourth, and eighth qubits have changed, because opposite states were detected between the first and third, the third and fifth, and the seventh and ninth qubits, respectively, while the states of all control qubits were preserved during the gate operation time.
Fig. 4. Simulation results of a 9-qubit parity detector operation (shown in Fig. 3 (a)) in tunable coupling systems. The vertical axis shows the probability amplitude of each qubit being in state |1⟩, and the horizontal axis shows the time in 0.1 ns time steps. The simulation procedure is discussed in section 2.
In section 5, we explain how the multi-qubit parity detector operations from Fig. 3 can be used to realize a quantum memory with Surface Code error correction.
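The logical action of the chain P gate on a computational basis state can be written in a few lines. The sketch below assumes, based on the 9-qubit example above, that qubits at even positions (counting from 1) are the targets and their two neighbors are the controls; it models only the truth table, not the pulse-level dynamics, and the function name is illustrative.

```python
def parity_detector_chain(bits: str) -> str:
    """Apply the truth table of a chain parity detector (P) gate.

    Each target (even 1-based position) is flipped when its left and right
    neighbours, the control qubits, hold opposite values.
    """
    if len(bits) % 2 == 0 or set(bits) - {"0", "1"}:
        raise ValueError("expected an odd-length string of 0s and 1s")
    out = list(bits)
    for target in range(1, len(bits) - 1, 2):  # 0-based indices 1, 3, 5, ...
        if bits[target - 1] != bits[target + 1]:
            out[target] = "1" if bits[target] == "0" else "0"
    return "".join(out)


result = parity_detector_chain("100011110")
print(result)  # prints: 110111100
```

Because every target depends only on its own two controls, all targets can be updated in the same step, which is what allows the gate to act on the whole chain at once.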
3 New Multi-qubit parity detector gates for 2D NN systems with non-tunable couplings
In this section, after describing our physical model we explain the analytical approach of designing the new multi-qubit parity detector gates for systems with non-tunable couplings.
Physical Model
Consider a 2D NN system with m rows and n columns, where qubits are located using index (j, k), j = 1, 2, …, m, and k = 1, …, n. The evolution of a 2D NN system of m×n superconducting flux qubits inductively coupled through DC-SQUIDs [21, 22] with and without tunable couplings [39] [40] [41] [42] can be represented by the following Hamiltonian:
where $\Delta_{(j,k)}$ is the tunneling energy for the qubit located at index (j, k), and $\varepsilon_{(j,k)}$ is the bias energy for the qubit located at index (j, k). Here $\xi_{(j,k)(j+1,k)}$ is the coupling energy between two adjacent vertically coupled qubits in column k. Similarly, $\xi_{(j,k)(j,k+1)}$ is the coupling energy between two adjacent horizontally coupled qubits in row j. The couplings can be tunable or non-tunable depending on the physical constraints of the system.
The Hamiltonian operator in Eq. 11 is a $2^{m \times n} \times 2^{m \times n}$ matrix. As the number of qubits in the system increases, it becomes very hard to solve this matrix analytically to derive the system parameters. However, using a reduced Hamiltonian technique [15], we can solve the system parameters to realize a desired gate operation as explained in section 3.2, where it is analytically shown how to derive the control parameters to achieve a multi-qubit parity detector gate. Following a brief introduction to Surface Code in section 4, we show how to use this gate in Surface Code architectures in section 5.
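To make the size of the problem concrete, the sketch below enumerates the nearest-neighbor coupling terms of an m×n array and the dimension of the full Hamiltonian. The indexing (j, k) follows the text; the function names are illustrative assumptions.

```python
def nn_couplings(m: int, n: int):
    """Return the vertical and horizontal nearest-neighbour pairs of an m x n array."""
    vertical = [((j, k), (j + 1, k)) for j in range(1, m) for k in range(1, n + 1)]
    horizontal = [((j, k), (j, k + 1)) for j in range(1, m + 1) for k in range(1, n)]
    return vertical, horizontal


def hilbert_dimension(m: int, n: int) -> int:
    """Dimension of the full Hamiltonian matrix for m * n qubits."""
    return 2 ** (m * n)


vertical, horizontal = nn_couplings(3, 3)
neighbours_of_centre = [pair for pair in vertical + horizontal if (2, 2) in pair]
print(len(vertical), len(horizontal), hilbert_dimension(3, 3))  # prints: 6 6 512
print(len(neighbours_of_centre))  # prints: 4
```

For the 3×3 array the full matrix is already 512×512, whereas the reduced scheme only ever handles 2×2 blocks for the target qubit.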
New Multi-qubit parity detector gates for 2D NN systems with non-tunable couplings
In a 2D array of qubits with non-tunable couplings, each qubit interacts with 4 neighbors. Therefore, a gate operation on 5 qubits is described by a 32×32 Hamiltonian matrix representing the evolution of the 5-qubit system (qubits with black labels in Fig. 5). We use the reduced Hamiltonian scheme [31, 32] to break this Hamiltonian into sixteen 2×2 Hamiltonian matrices. Each 2×2 Hamiltonian describes the evolution of the target qubit T in a subspace determined by the states of the control qubits A, B, C, and D. For each 2×2 Hamiltonian we then generate the unitary matrix by integrating the Schrödinger equation and equate it to the desired controlled-unitary gate operation for that subspace. In Fig. 5, we show the 9-qubit system used in our simulation to demonstrate that the states of the surrounding qubits E, F, G, and H are not affected by the proposed parity detector gate operation.
where |ψ⟩ represents the subspace where the state of each qubit A, B, C, and D is initialized to |0⟩ or |1⟩.
We keep all the biases on non-interacting qubits E, F, G, and H high, such that their states are preserved.
Note that the qubits E, F, G, and H do not contribute to the Hamiltonian of the target qubit T, as they have no direct coupling with T. The unitary operation on the target qubit can be written in terms of the system parameters as follows:
where E is the effective bias, ω is the angular frequency of the gate operation, ∆ is the tunneling, and $\omega = 2\pi\sqrt{\Delta^2 + E^2}$. The unitary also carries a global phase factor, and k is considered zero in our physical model.
Depending on the initial state |ψ⟩, the expectation value of the operator $\sigma_Z^{i}$ can be +1 or -1, where i = A, B, C, D. We can now choose the system parameters such that the bias of the middle qubit T and the couplings of the two vertically adjacent qubits A and B ($\pm\xi_{AT} \pm \xi_{BT}$) cancel out the couplings of the two horizontally adjacent qubits C and D ($\pm\xi_{CT} \pm \xi_{DT}$), so that the middle qubit T is not affected by the horizontally coupled qubits. Therefore, we assume controlled-unitary operations as shown in Fig. 6 (a) to cancel out the effect of horizontal couplings. The architecture to cancel out the effect of vertical couplings is shown in Fig. 6 (b).
where the notation ± means either + or -.
To realize an X operation, we need to cancel the diagonal terms in Eq. 13 and force $-i\sin(\omega T) = -i$, where $-i$ contributes a global phase factor of $3\pi/2$ on the target qubit. Therefore, the following conditions must be satisfied:
Therefore, we need to cancel out the effective bias (E = 0) and satisfy $\sin(\omega T) = 1$, which results in the following condition:
$$2\pi\Delta T = \frac{\pi}{2} + 2n\pi,$$
where n is an integer and T is the gate time duration.
One set of parameters that solves Eq. 18 is $\Delta = 25$ MHz, n = 0, T = 10 ns, and to cancel out the effective bias (E = 0) we need the following condition:
where the notation ± means either + or -. As shown in Table 1 , the effective bias in Eq. 19 (subspace QAQB =|10⟩), expands to four subspaces depending on the state of QCQD:
By choosing $\xi_{AT} = \xi_{BT}$ and $\xi_{CT} = \xi_{DT}$, Eq. 21 and Eq. 22 converge to $\varepsilon_T = 0$, while Eq. 20 and Eq. 23 simplify to $\varepsilon_T = -\xi_{CT} - \xi_{DT} = -2\xi_{CT}$ and $\varepsilon_T = \xi_{CT} + \xi_{DT} = 2\xi_{CT}$.
A similar calculation for subspace QAQB = |01⟩ gives the same results for the bias on target qubit T ($\varepsilon_T$).
Therefore, to realize an X operation on target qubit T, we keep the biases on all control qubits high, $\varepsilon_A = \varepsilon_B = \varepsilon_C = \varepsilon_D = 3$ GHz, and apply a bias pulse on the target qubit consisting of three sequences, each taking 10 ns, with the following magnitudes.
where $\varepsilon_T^{(i)}$ represents the bias magnitude of the i-th sequence on target qubit T. The order in which these three sequences are applied does not matter, since the desired gate operation has been realized once all three have been applied (after 30 ns). Table 2 summarizes all possible effective biases in each subspace under the three pulses given by Eq. 24, Eq. 25, and Eq. 26. Table 2 is derived by substituting the bias value of the target qubit under each sequence ($\varepsilon_T^{(1)}$, $\varepsilon_T^{(2)}$, $\varepsilon_T^{(3)}$) into the effective bias (E) formula shown in Table 1.
To perform an X gate in subspaces QAQB = |10⟩ and QAQB = |01⟩, we set the coupling values such that the effective bias is cancelled in one of the three pulse sequences shown in Table 2. Under the remaining two pulse sequences in subspaces QAQB = |10⟩ and QAQB = |01⟩, as well as under all three pulse sequences in subspaces QAQB = |00⟩ and QAQB = |11⟩, we want to achieve an Identity operation. By choosing $\xi_{AT} = \xi_{BT}$ and $\xi_{CT} = \xi_{DT}$, most of the equations in Table 2 simplify or cancel out, and only 7 effective bias equations remain, which are listed below.
Under the above effective biases, we would like to achieve an Identity gate operation. Therefore, we should choose the coupling values such that the off-diagonal terms in Eq. 13 are zero and the diagonal terms are 1, which results in the following condition:
where n is an integer. Knowing $E \gg \Delta$, we can ignore ∆ in $\omega = 2\pi\sqrt{\Delta^2 + E^2}$, which results in $\omega = 2\pi E = 2n\pi/T$. Therefore, we choose each effective bias in the equations above to be an integer multiple of the inverse gate duration (T = 10 ns), as below.
2 ± 2 = ± = 2 ± (37) The above set of parameters realize a parity detector gate that detects the parity of qubits A vs B (vertical)
while ignoring the parity of D vs C (horizontal). The simulation results for the vertical and horizontal parity detector gates, i.e., the architectures from Fig. 7 (a) and (b), are shown in Fig. 8 and Fig. 9, respectively.
We include qubits E, F, G, and H in the simulation to show that their states remain unchanged during the 5-qubit gate operations. An average gate fidelity of 99.9% was achieved for the 5-qubit vertical and horizontal parity detector gates operating on a 9-qubit system, as shown in Fig. 8 and Fig. 9, respectively.
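The three-pulse construction can be sanity-checked in the reduced 2×2 picture. The sketch below is an illustration under stated assumptions, not the 9-qubit simulation reported here: the controls are treated as perfectly frozen, a control in |0⟩ or |1⟩ contributes +ξ or -ξ to the effective bias, the single-qubit evolution uses the closed form of exp(-i·2π(Δσ_X + Eσ_Z)T), and the coupling values ξ_V = 3 GHz (vertical) and ξ_H = 1 GHz (horizontal) are placeholders chosen only so that every non-zero effective bias is an integer multiple of 1/T. They are not the parameters of this work.

```python
import math
from itertools import product


def pulse_unitary(delta, bias, duration):
    """Closed form of exp(-i * 2*pi * (delta*X + bias*Z) * duration)."""
    rate = math.hypot(delta, bias)
    theta = 2 * math.pi * rate * duration
    c, s = math.cos(theta), math.sin(theta)
    nx, nz = delta / rate, bias / rate
    return [[c - 1j * s * nz, -1j * s * nx],
            [-1j * s * nx, c + 1j * s * nz]]


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


DELTA = 0.025   # tunneling in GHz (25 MHz)
T = 10.0        # duration of each pulse in ns
XI_V = 3.0      # assumed vertical coupling (A-T, B-T) in GHz, placeholder
XI_H = 1.0      # assumed horizontal coupling (C-T, D-T) in GHz, placeholder
PULSES = (0.0, -2 * XI_H, 2 * XI_H)  # target bias during the three sequences


def sign(bit):
    """Assumed convention: |0> contributes +1 and |1> contributes -1."""
    return 1 if bit == 0 else -1


def flip_probability(a, b, c, d):
    """Probability that target T flips from |0> to |1> after the three pulses."""
    shift = XI_V * (sign(a) + sign(b)) + XI_H * (sign(c) + sign(d))
    u = [[1, 0], [0, 1]]
    for bias in PULSES:
        u = matmul2(pulse_unitary(DELTA, bias + shift, T), u)
    return abs(u[1][0]) ** 2


table = {bits: flip_probability(*bits) for bits in product((0, 1), repeat=4)}
for (a, b, c, d), p in table.items():
    print(f"AB={a}{b} CD={c}{d}  flip probability = {p:.6f}")
```

With these placeholder values the target flips only when A and B differ, regardless of C and D, which is the vertical parity detection described above. Exactly one of the three pulses cancels the effective bias in each such subspace; the other two act as near-identity rotations.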
We used the following formula for the gate fidelity:
where Utarget is the unitary transformation of the target parity detector gate, and U is the achieved unitary transformation calculated from the time evolution of the system as follows:
where T is the gate operation time, and $H(t)$ is the Hamiltonian of the system at time t.
Introduction to Surface Code memory realization
In this section, we briefly introduce the Surface Code memory realization. In surface code quantum computing [8] , a 2-Dimensional array of physical qubits is constructed with interleaving data qubits and measurement qubits called measure-Z and measure-X ancillary qubits, and a methodology is presented to protect the architecture from both bit-flip and phase-flip errors at the same time.
In Surface Code error correction, multiple physical qubits form a logical qubit. The quantum information is therefore distributed over many physical qubits, which consist of data qubits and ancillary measurement qubits. The measure-X and measure-Z qubits detect phase-flip and bit-flip parities, respectively. Each measurement qubit is surrounded by four data qubits in a 2-Dimensional array. At the start, all measurement qubits are initialized to zero. In each error correction cycle, we perform measurements only on the ancillary measurement qubits, which stabilize the data qubits, i.e., the states of the data qubits are not perturbed by the measurement. Software maps the detected error syndromes (bit-flip, phase-flip, measurement error) to a graph model, which keeps track of the errors and fixes them [1].
In [9] , Kelly et al. developed an error correction methodology on one-Dimensional (1D) array of linear nearest neighbor (LNN) superconducting qubits named Repetition Code which preserves the logical states of qubits. In the beginning of an error correction cycle, all the measurement qubits are initialized to zero.
When qubits are idle, the ZZ operators are repeatedly applied to measure all the measure-Z ancillary qubits in a Repetition Code cycle. If the state of a measure-Z ancillary qubit is flipped in a cycle, a bit-flip error syndrome in the adjacent data qubits is detected. If the state of a measure-Z ancillary qubit changes in two consecutive cycles, a measurement error syndrome is detected. The error syndromes are then mapped to a graph that represents the error propagation model in software, and a recovery operation is applied to restore the states. Notably, the recovery operations are applied only in software: by tracking the error syndromes of all cycles, the software corrects the final data measurement by fixing the measured data if necessary [9]. The circuitry used in [9] relies on the primitive two-qubit controlled-phase gate in combination with single-qubit rotation gates. These gates are applied to the qubits in at least three sequences [9] of two-qubit gates in each Repetition Code cycle before the parity measurements are performed on the ancillary qubits. The parity measurement information is used to detect bit-flip errors, and recovery operations are then performed to restore the quantum states [9]. With our proposed parity detector gate, only one multi-qubit gate application on all qubits is needed in each Repetition Code cycle in a 1D NN architecture. Here, however, we introduce a more general method of error correction in Surface Code that detects both bit-flip and phase-flip errors in a 2D nearest-neighbor architecture.
We now introduce the building-block circuits currently used in the Repetition Code and Surface Code architectures. Applying the ZZ operator to |00⟩ and |11⟩ results in +|00⟩ and +|11⟩, respectively. Applying the ZZ operator to |10⟩ and |01⟩ results in −|10⟩ and −|01⟩, respectively. Therefore, states |00⟩, |01⟩, |10⟩, and |11⟩ are eigenstates of the ZZ operator with corresponding eigenvalues +1, -1, -1, and +1. The ZZ operator can then be used to detect changes in the parity of data qubits. It can be realized using two CNOT gates applied to three qubits, where the ancillary qubit is the target qubit and the two data qubits are the control qubits, as depicted in Fig. 10. The ancillary target qubit is a measurement qubit called the measure-Z qubit, and it can be measured repeatedly to detect any bit-flip between its neighboring data qubits. Similarly, the measure-X qubit can be used to detect phase-flip errors. An XX operator can be realized using two CNOT gates applied to three qubits, where the middle qubit is the measure-X ancillary qubit and acts as the control while the two adjacent data qubits are the targets, as depicted in Fig. 11.
In Fig. 11, Hadamard gates applied to the ancillary measure-X qubit before and after the CNOT gates change its state from |0⟩ to |+⟩ and revert it after the XX operator is applied. Since phase errors propagate to the control qubit, a phase-flip error on an adjacent data qubit changes the state of the measure-X qubit from |+⟩ to |−⟩, which the final Hadamard gate converts to |1⟩.
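The phase kickback described above can be reproduced with a small state-vector sketch in plain Python. It assumes a three-qubit register ordered as (data, measure-X ancilla, data) with the data qubits prepared in |+⟩|+⟩, a +1 eigenstate of XX; the helper names and the qubit ordering are illustrative choices, not the circuit of Fig. 11 itself.

```python
import math


def apply_h(state, q):
    """Hadamard on qubit q; the state is a dict {bit tuple: amplitude}."""
    out = {}
    s = 1 / math.sqrt(2)
    for bits, amp in state.items():
        for new in (0, 1):
            sign = -1 if (bits[q] == 1 and new == 1) else 1
            nb = bits[:q] + (new,) + bits[q + 1:]
            out[nb] = out.get(nb, 0.0) + sign * s * amp
    return out


def apply_cnot(state, control, target):
    """CNOT: flip the target bit whenever the control bit is 1."""
    out = {}
    for bits, amp in state.items():
        nb = list(bits)
        nb[target] ^= nb[control]
        out[tuple(nb)] = out.get(tuple(nb), 0.0) + amp
    return out


def apply_z(state, q):
    """Pauli Z (phase-flip) on qubit q."""
    return {bits: (-amp if bits[q] else amp) for bits, amp in state.items()}


def ancilla_one_probability(phase_flip_on=None):
    """Probability of reading 1 on the measure-X ancilla (qubit 1)."""
    state = {(0, 0, 0): 1.0}
    state = apply_h(apply_h(state, 0), 2)  # data qubits in |+>|+>
    if phase_flip_on is not None:
        state = apply_z(state, phase_flip_on)  # inject a phase-flip error
    state = apply_h(state, 1)        # ancilla |0> -> |+>
    state = apply_cnot(state, 1, 0)  # ancilla controls data qubit 0
    state = apply_cnot(state, 1, 2)  # ancilla controls data qubit 2
    state = apply_h(state, 1)        # map |+>/|-> back to |0>/|1>
    return sum(abs(a) ** 2 for bits, a in state.items() if bits[1] == 1)


for error in (None, 0, 2):
    print(error, round(ancilla_one_probability(error), 6))
```

In this sketch a phase-flip on either data qubit drives the ancilla to |1⟩ with the same probability, so the measurement reveals that an error occurred but not on which of the two data qubits.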
Therefore, the XX operator can be used to detect any phase-flip of the adjacent data qubits. [...] change from the pair of eigenvalues (+1, +1) to (+1, −1). Note that if an X error happens on De, we get the same result. Consequently, we cannot determine on which data qubit the error occurred, because both cases give the same measurement result. This is also true for Z (phase-flip) errors. Therefore, to uniquely identify errors on specific data qubits, we need a more complex mechanism such as Surface Code [8].
In Surface Code, each data qubit is surrounded by four measurement qubits, and each measurement qubit is surrounded by four data qubits, as shown in Fig. 12. The measure-Z qubit stabilizes the product of the Z operators on the surrounding data qubits. For example, in Fig. 12, the qubit Zb forces the data qubits Df, De, Dc, and Db into an eigenstate of the operator product Z_Df ⊗ Z_De ⊗ Z_Dc ⊗ Z_Db. The measure-X qubit stabilizes the product of the X operators on the surrounding data qubits. In Fig. 12, the qubit Xc forces the data qubits Di, Df, Dh, and De into an eigenstate of the operator product X_Di ⊗ X_Df ⊗ X_Dh ⊗ X_De. Note that the chosen X and Z stabilizers must commute with one another to force the projective measurement outcome of the system into a unique eigenstate of all the stabilizers. Table 4 shows the eigenstates of the X and Z stabilizers with their corresponding eigenvalues.
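The commutation requirement can be checked with a short Python sketch. It relies on the standard fact that a product of X operators and a product of Z operators commute exactly when they act on an even number of common qubits. The qubit labels follow the example above; the function name is hypothetical.

```python
def x_and_z_products_commute(x_support, z_support):
    """An X-type and a Z-type Pauli product commute if and only if
    they overlap on an even number of qubits."""
    return len(set(x_support) & set(z_support)) % 2 == 0


z_support = {"Df", "De", "Dc", "Db"}  # data qubits stabilized by Zb
x_support = {"Di", "Df", "Dh", "De"}  # data qubits stabilized by Xc

shared = sorted(x_support & z_support)
print(shared, x_and_z_products_commute(x_support, z_support))
```

The two stabilizers share exactly two data qubits (De and Df), so the sign changes from the two anticommuting single-qubit pairs cancel and the stabilizers commute.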
The order in which the X and Z operators are applied to the data qubits to realize the 4-qubit stabilizers is important. The order must be chosen to ensure that we are not measuring the result of the X and Z operators of any data qubit simultaneously. Failure to maintain the commutation relationship of neighboring stabilizers results in random measurements [8]. In our example, the operator ordering in the X and Z stabilizers guarantees that the two stabilizers commute and that the data qubits Df and De, which are shared between the two stabilizer types, interact with only one ancilla qubit (Xc or Zb) at a time. This ensures robustness of the Surface-17 Code to ancilla errors [35]. Suppose we have initialized the system in the |0000⟩ state. If an X error happens on data qubit Df, measuring the system using the Z stabilizer reports a change from eigenstate |0000⟩ to |1000⟩. If an X error instead happens on De, measuring the system using the Z stabilizer reports a change from |0000⟩ to |0100⟩. As can be seen in Table 4, a single error on any data qubit can be uniquely identified because the measurement result lands on a specific eigenstate with eigenvalue −1.
Table 4. The set of eigenstates and corresponding eigenvalues for the four-qubit X and Z stabilizers. Note that X and Z are Pauli operators acting on the data qubits as shown in Fig. 12, where the measure-X qubit Xc stabilizes the X operator product and the measure-Z qubit Zb stabilizes the Z operator product.
Eigenvalue +1: |++++⟩ (X stabilizer), |0000⟩ (Z stabilizer)
In this work, our focus is on efficient Surface Code memory realization using multi-qubit gates as the building-block circuitry. Performing quantum computation in Surface Code is outside the scope of this paper. The interested reader is referred to [8] for further information about Surface Code.
Surface Code memory based on parity detector gate
Designing quantum architectures that prevent dephasing, as well as error correction schemes that detect and correct phase-flip and bit-flip errors, is of high interest [1-10]. In Sections 5.1 and 5.2 we explain how multi-qubit parity detector gates can be used to realize Surface Code error correction.
Surface Code architecture based on parity detector gate for systems with tunable couplings
The parity detector gates, in combination with Identity and Hadamard gates, can detect any bit-flip or phase-flip error. We therefore only need Hadamard gates on the measure-X qubits at the beginning and end of each error detection cycle to detect phase-flip errors. Identity gates are applied to the measure-Z qubits at the beginning and end of each error correction cycle, simply by waiting, to match the duration of the Hadamard operations on the measure-X qubits. To perform the single-qubit Hadamard gates, we must turn off the couplings between the desired qubit and its neighboring qubits.
As depicted in Fig. 14, the circuit has a depth of 5, including the parity detection operations, the Hadamard gates, and the measurement operations. Each global pulse application that realizes the multi-qubit parity detector gate takes only 10 ns. For each single-qubit Hadamard gate operation, we consider a duration of 7.1 ns and a bias value equal to the tunneling (25 MHz) [32]. The bias value of all control qubits is 3 GHz, and the bias value of all target qubits for parity detection is zero for 10 ns. For CNOT gates, the bias on each target qubit equals the coupling value between the control and target qubits for 10 ns. All the qubits are involved in each vertical or horizontal gate sequence; therefore, no qubit is left idle at any time, which prevents dephasing in the system. In each sequence (yellow areas) of an error correction cycle, the gates involved in X stabilizers are depicted in red, while the gates involved in Z stabilizers are depicted in purple.
Note that the total circuit depth of the proposed error correction cycle is always 5, regardless of the total size of the Surface Code. This allows for very large-scale Surface Code architectures with extensively reduced control circuitry.
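The depth count and the gate-time budget of one cycle can be written out as a short Python sketch. It uses the 7.1 ns Hadamard duration and the 10 ns parity detector gate duration quoted above and assumes the layers run strictly one after another. The measurement duration is not given in this section, so it is left as a hypothetical parameter that defaults to zero.

```python
HADAMARD_NS = 7.1      # single-qubit Hadamard duration quoted in the text
PARITY_GATE_NS = 10.0  # one global parity detector pulse quoted in the text


def cycle_depth(parity_sequences):
    """Hadamard layer + parity sequences + Hadamard layer + measurement."""
    return 1 + parity_sequences + 1 + 1


def tunable_gate_time_ns(measurement_ns=0.0):
    """Gate time of one cycle with two parity sequences (tunable couplings).

    measurement_ns is an assumed placeholder, not a value from the text.
    """
    return 2 * HADAMARD_NS + 2 * PARITY_GATE_NS + measurement_ns


print(cycle_depth(2))                     # two sequences (vertical, horizontal)
print(round(tunable_gate_time_ns(), 1))   # gate time excluding measurement
```

Neither function takes the number of qubits as an argument, which mirrors the claim that the depth of the cycle does not grow with the size of the Surface Code.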
Fig. 14 Quantum circuit of depth 5 for one cycle of error correction in the Surface-17 code for systems with tunable couplings. Two sequences (vertical and horizontal) of multi-qubit parity detector gates are required to detect all bit-flip and phase-flip errors. The gates required for X stabilizers are depicted in red, while the gates required for Z stabilizers are depicted in purple. All gates are applied to nearest neighbors. On the left, the data qubits are green, X-measurement qubits are orange, and Z-measurement qubits are blue. See Fig. 13 to match the cycles. Note that all the gates in the pink regions are applied simultaneously; therefore, including the required Hadamard gate operations and the measurement, the depth of the circuit is 5.
Surface Code architecture based on parity detector gates for systems with non-tunable couplings
Although tunable coupling architectures are very good candidates for error correction schemes and make multi-qubit gates easier to perform, they have disadvantages such as increased circuit complexity and additional noise. It is therefore valuable to design a Surface Code architecture for systems with non-tunable couplings [32, 39-42], where the coupling values do not change during the quantum computation.
In systems with non-tunable couplings, we cannot implement parity detector operations simultaneously on two neighboring target qubits, since each target qubit must be surrounded by 4 control qubits. These control qubits should have high bias values to preserve their states during the gate operation. Therefore, it is not possible to simultaneously realize two vertical/horizontal parity detector gates on two neighboring columns/rows in a 2D Surface Code. The error correction cycle is performed by applying 4 sequences of multi-qubit parity detector gates, as shown in Fig. 15 (a), (b), (c), and (d). Note that the order of applying these multi-qubit operators is not important, since the stabilizers are configured along the columns or rows and do not share any data qubit simultaneously. We chose an arbitrary order as follows: first a multi-qubit vertical X operator, second a horizontal X operator, third a horizontal Z operator, and fourth a vertical Z operator. Fig. 16 shows the required sequence of multi-qubit gates. In Fig. 16, the gates required for X stabilizers are depicted in red, while the gates required for Z stabilizers are depicted in purple. All gates are applied to nearest neighbors. As before, the data qubits are depicted in green, X-measurement qubits in orange, and Z-measurement qubits in blue. Compare Fig. 15 and Fig. 16 to match the cycles.
Fig. 16 Quantum circuit of depth 7 for one cycle of error correction in the Surface-17 code for systems with non-tunable couplings. Four sequences (vertical and horizontal) of multi-qubit parity detector gates are required to detect all bit-flip and phase-flip errors. Note that all the gates in the pink regions are applied simultaneously; therefore, including the required Hadamard gate operations and the measurement, the depth of the circuit is 7.
Note that we can add or remove an arbitrary number of five-qubit parity detector gates to scale these gates up or down in a larger 2D array of qubits when realizing a large-scale Surface Code memory. For example, in Fig. 15(d), a 9-qubit vertical parity detector gate is realized with two target qubits, Zb and Zc, and control qubits Db, Dc, Df, De, Dd, Dh, and Dg. The notation for this 9-qubit gate is shown in the last vertical Z operator (last pink column) in Fig. 16. Since the 9-qubit vertical parity detector detects only the parity of the vertically coupled control qubits Dc versus De and De versus Dg, to simplify Fig. 16 we did not show the connections of the horizontally coupled control qubits Db and Df to target qubit Zb, or of the horizontally coupled control qubits Dd and Dh to target qubit Zc.
Conclusions
We designed two new multi-qubit parity detector gates for nearest neighbor (NN) architectures with non-tunable couplings. We achieved a fidelity of 99.9% for the 5-qubit horizontal and vertical parity detector gates. These gates are designed for the physical realization of nearest-neighbor flux qubits inductively coupled to each other. However, the proposed multi-qubit parity detector gates and the proposed Surface Code architectures can be applied to other physical realizations. These gates are realized in only 3 sequences of pulse applications and thus extensively reduce the control circuitry. There are many applications for these new gates; here we proposed their application in the Surface Code scheme for quantum memory realization to detect both bit-flip and phase-flip errors. Furthermore, for nearest neighbor (NN) architectures with tunable couplings, we applied the existing single-shot parity detector gates to realize a Surface Code memory.
The advantages of using our proposed Surface Code architectures can be summarized in four main points:
• It extensively simplifies the control circuitry.
• It achieves a much faster error detection cycle. Since the gate operation time is a control parameter, we can reduce the error detection cycle time even further by adjusting system parameters such as the tunneling.
• It is scalable to very large-scale Surface Code architectures: it needs only a circuit of depth 5 for any size of 2-Dimensional array of qubits with tunable couplings, and only a circuit of depth 7 for any size of 2-Dimensional array of qubits with non-tunable couplings.
• It removes the possibility of developing relative phases (dephasing) during idle times, since there are no idle qubits in these schemes.
We considered the Surface-17 code to show how using our proposed multi-qubit parity detector gates greatly reduces the control circuitry complexity. However, investigating new logical qubit layout patterns that are optimized for multi-qubit gates as building blocks is another topic for future research. For example, in the Surface-17 Code scheme for systems with non-tunable couplings, we need to apply four sequences of parity detector gates in each error correction cycle, while CNOT gates (on the borders of the Surface-17 logical qubit layout) are performed in parallel with the parity detector gates. For our considered physical realization with non-tunable couplings, the gate time of the CNOT gates is longer than that of the parity detector gates, since realizing a CNOT gate between two qubits requires the interaction of the target qubit with its three other neighbors to cancel out. Thus, for smaller Surface Code logical qubit layouts like Surface-17, the use of CNOT gates on the borders increases the cycle time. However, if we consider a different logical qubit layout pattern that does not use CNOT gates at all, using the presented parity detector gates extensively raises the performance of a large-scale Surface Code architecture.
Note that experimentalists can realize the parity detector gates proposed in this work by choosing a different set of parameters that matches their physical system. They can simply multiply each parameter by a scaling factor such that the conditions explained in section 3.2 remain satisfied [11].
In this work, we considered ideal control pulses to design the new gates. The effect of the rise and fall times of non-ideal control pulses, as well as the effect of next-to-nearest-neighbor couplings on the gate fidelity, will be topics of future research.
