We present a scalable scheme for executing the error-correction cycle of a monolithic surface-code fabric composed of fast-flux-tunable transmon qubits with nearest-neighbor coupling. An eight-qubit unit cell forms the basis for repeating both the quantum hardware and coherent control, enabling spatial multiplexing. This control uses three fixed frequencies for all single-qubit gates and a unique frequency detuning pattern for each qubit in the cell. By pipelining the interaction and readout steps of ancilla-based X- and Z-type stabilizer measurements, we can engineer detuning patterns that avoid all second-order transmon-transmon interactions except those exploited in controlled-phase gates, regardless of fabric size. Our scheme is applicable to defect-based and planar logical qubits, including lattice surgery.
The scaling of small quantum processors [1-5] into large qubit arrays capable of fault-tolerant quantum computation (FTQC) [6] is an outstanding challenge for leading experimental quantum information platforms [7, 8].
Both modular [9] and monolithic [10] architectures require a systems approach that simultaneously and compatibly addresses challenges in all layers of the quantum computer stack [11]: from the quantum hardware at the low level, through classical control electronics in the middle, to software at the high level (i.e., micro-instruction sets, compilers, and high-level programming languages).
Currently, the surface code [10, 12, 13] provides an experimentally attractive paradigm for FTQC owing to its modest requirements on the quantum hardware: only nearest-neighbor coupling is needed between qubits, and the error threshold lies robustly close to 1% across a range of error models and error-decoding strategies, significantly higher than those of the Steane and Shor codes [6]. In superconducting quantum integrated circuits based on circuit QED (cQED) [14], the error rate of single-qubit gates has reached < 0.1% [15, 16], while those of two-qubit conditional-phase (CZ) gates and measurement are 0.6% [15] and ∼1% [17, 18], respectively.
The scalability of monolithic systems hinges on the ability to copy-paste a unit cell in the quantum plane, with suitable quantum interconnect between cells, and suitable classical interconnect to and from the control plane. The latter pursuit is very active, with several groups developing vertical (rather than the traditional lateral) interconnection of input/output (I/O) signals using through-the-wafer coaxial lines [19], electromechanical sockets [20], and bump bonding in flip-chip configuration [21].
For true scalability, it is crucial that the unit cell also extend into the classical control plane. A unit cell for control signals opens the door to hardware simplification through spatial multiplexing, i.e., the selective routing of control signals (with minimal customization) to spatially separated components. While frequency-division multiplexing is already heavily exploited in cQED [3, 18, 22], spatial multiplexing is in its infancy. Precision control of same-frequency qubits using a microwave-frequency vector switch matrix (VSM) for pulse multicasting has only recently been demonstrated [23].
In this paper, we propose a scalable scheme for the QEC cycle of a monolithic superconducting surface code by defining a concrete unit cell for both the quantum hardware and the control signals. We focus on a fabric of fast-flux-tunable transmon qubits interacting with nearest neighbors via flux-controlled conditional-phase (CZ_f) gates [24, 25] realized by pulsing into the resonator-mediated |11⟩ ↔ |02⟩ avoided crossing of the interacting transmon pair (numbers indicate excitation level). Our approach is compatible with adiabatic [25], sudden [26], and fast-adiabatic [15, 27] use of these crossings. Our eight-qubit unit cell uses three fixed frequencies for all single-qubit control and eight detuning sequences for two-qubit gates. This approach to classical control allows significant control-hardware savings via spatial multiplexing. By pipelining the measurement of the two types of stabilizers of the surface code, we engineer detuning patterns that avoid all second-order transmon-transmon interactions except those exploited in CZ_f gates, regardless of fabric size. Our scheme allows changing the weight of stabilizer measurements by simple on/off masking of detuning pulses, making it applicable to both defect-based and planar logical qubits [10], including lattice surgery [28].
II. BACKGROUND

A. Surface code QEC cycle
A surface-code fabric consists of the two-dimensional square lattice of data-carrying qubits shown in Fig. 1. The stabilizers of this code are the X-type (Z-type) parity operators $\prod_i X_i$ ($\prod_i Z_i$), where $i$ runs over the data qubits on the corners of a blue (green) plaquette. Conventionally, these stabilizers are measured indirectly using ancilla qubits positioned at the center of the plaquettes, forming a second square lattice. Standard circuits for measuring X- and Z-type stabilizers, shown in Fig. 2, combine a sequence of coherent interactions of the ancilla with its nearest-neighbor data qubits, followed by projective ancilla measurement.
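The caption of Fig. 2 states that the opening and closing H gates can be replaced by Y−90 and Y+90 rotations. The pure-Python check below verifies this numerically for one ancilla-data pair. It assumes matrix products act right to left (so "Y+90 Z" means Z first), Y±90 = R_y(±π/2) with R_y(θ) = exp(−iθY/2), and Z the Pauli matrix; all helper names are our own.

```python
import math

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    """Kronecker (tensor) product; the first factor is the more significant qubit."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def close(a, b, tol=1e-12):
    """Element-wise comparison of two equally sized matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def ry(theta):
    """R_y(theta) = exp(-i theta Y / 2), a real rotation about the y axis."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

r = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
H = [[r, r], [r, -r]]
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
Y_P90, Y_M90 = ry(math.pi / 2), ry(-math.pi / 2)

# H = Y+90 Z = Z Y-90 (the rightmost factor acts first).
h_closing = matmul(Y_P90, Z)
h_opening = matmul(Z, Y_M90)

# Ancilla = first tensor factor, data = second.
# H_a CZ H_a versus Y+90_a CZ Y-90_a: the Z factors commute with CZ and cancel.
with_h = matmul(kron(H, I2), matmul(CZ, kron(H, I2)))
with_y = matmul(kron(Y_P90, I2), matmul(CZ, kron(Y_M90, I2)))
# H_a CZ H_a is a CNOT with the data qubit as control and the ancilla as target.
CNOT_DATA_CONTROLS = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]

print(close(h_closing, H), close(h_opening, H), close(with_h, with_y))  # True True True
```

The same cancellation argument applies to any qubit wrapped by H gates with only CZ gates in between, which is why the replacement is valid for the circuits of Fig. 2.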
Using controlled-NOT (CNOT) gates as the fundamental interaction, X-type and Z-type stabilizer measurements can be fully parallelized with circuit depth seven. We define circuit depth as the number of operations on each ancilla per QEC cycle, including measurement but excluding ancilla initialization [we assume Pauli frame updating (PFU) [13, 29] is used for data and ancilla qubits]. The order of two-qubit gates in Fig. 2 is important for two reasons [30]. First, data qubits common to adjacent plaquettes must complete all their interactions with one ancilla before interacting with the other. Second, the S (N) pattern for X-type (Z-type) stabilizers provides resilience to single ancilla-qubit errors even in small distance-three surface codes such as Surface-17. Surface-17 consists of the patch delineated in Fig. 1, with nine data qubits (labelled Da to Di), four ancillas (Xa to Xd) for X-type stabilizer measurements, and four ancillas (Za to Zd) for Z-type stabilizer measurements. When the two-qubit gate is CZ, parallelizing the stabilizer measurements of Surface-17 requires depth nine because Hadamard (H) gates do not commute with CZ gates. The full circuit for the parallelized QEC cycle of Surface-17 using CZ gates is shown in Fig. 3. Using gate and measurement times from recent experiments (τ_1Q = 20 ns for single-qubit gates, τ_2Q = 40 ns for CZ_f gates, and 500 ns for ancilla readout and photon depletion of the readout resonator), the QEC cycle completes in 740 ns.
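The 740 ns figure follows from adding sequential layers. The sketch below assumes the depth-nine CZ circuit splits into four single-qubit layers, four CZ layers (one per neighbor of each ancilla), and one readout; this split is our reading, chosen because it reproduces the quoted total. The function name is hypothetical.

```python
# Gate and readout durations quoted in the text (ns).
TAU_1Q_NS = 20    # single-qubit gate
TAU_2Q_NS = 40    # CZ_f gate
TAU_RO_NS = 500   # ancilla readout including resonator photon depletion

def parallel_cycle_ns(n_1q_layers: int, n_cz_layers: int, n_readouts: int = 1) -> int:
    """Duration of a fully parallelized cycle built from strictly sequential layers."""
    return n_1q_layers * TAU_1Q_NS + n_cz_layers * TAU_2Q_NS + n_readouts * TAU_RO_NS

# Assumed decomposition of the depth-nine CZ circuit.
layers = {"1q": 4, "cz": 4, "readout": 1}
depth = sum(layers.values())
cycle_ns = parallel_cycle_ns(layers["1q"], layers["cz"], layers["readout"])
print(depth, cycle_ns)  # 9 740
```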
FIG. 2. X-type (a) and Z-type (b) plaquettes. Data qubits are labelled according to their position relative to the ancilla (NE = northeast, NW = northwest, SE = southeast, and SW = southwest). Standard circuits for measuring X-type (c, e) and Z-type (d, f) stabilizers indirectly using ancillas, with CNOT (c, d) or CZ (e, f) as the primitive data-ancilla interaction. The order of two-qubit gates, NE-NW-SE-SW (NE-SE-NW-SW) for X-type (Z-type), ensures that all data qubits common to adjacent plaquettes complete their interactions with one ancilla before the other, and also provides resilience to ancilla errors in Surface-17 [30]. Using the relations $H = Y_{+90} Z = Z Y_{-90}$, and noting that the resulting Z factors commute with the intervening CZ gates and cancel, the opening and closing H gates can be replaced by $Y_{-90}$ and $Y_{+90}$ rotations, respectively.
B. Limitations of fully parallelized X- and Z-type stabilizer measurements using CZ_f gates
On paper, it is straightforward to compose a depth-nine quantum circuit for the fully parallelized QEC cycle of a surface-code fabric of arbitrary size. However, to the best of our knowledge, and following numerous failed attempts, the full parallelization of X- and Z-type stabilizer measurements makes it impossible to realize a scalable implementation with CZ_f gates that satisfies all of the following desirable properties:
• Microwave pulses for single-qubit gates should be applied at a fixed, small number of frequencies.
• Transmons should maximally exploit their coherence sweetspot [31] .
• Flux-pulsed transmons should not cross any other interaction zones on their way to or from the intended |11⟩ ↔ |02⟩ avoided crossings realizing the CZ_f gate.
• The flux-pulsing schemes should be extensible to a surface code of arbitrary size using a fixed number of detuning sequences and a fixed detuning range.
• The implementation should be compatible with logical qubit operations.
We have found frequency arrangements and flux-pulse sequences that meet the first three criteria. However, all of these solutions require a number of detuning sequences and a detuning range that grow as the fabric expands, in order to avert all other interactions on the way to and from the |11⟩ ↔ |02⟩ avoided crossings of CZ_f gates. Furthermore, these solutions already appear impractical at distance five (Surface-49 [28]). To our knowledge, no fully parallel solution exists with a fixed number of detuning sequences and a fixed detuning range. In the next section, we introduce a pipelined (rather than parallelized) version of the QEC cycle that simultaneously meets all five desirable properties for a fabric of arbitrary size.
III. THE PIPELINED QEC CYCLE
Our scalable scheme, which we term the pipelined QEC cycle, combines four key elements:
A. Repeating unit cells of eight qubits;
B. Pipelined X- and Z-type stabilizer measurements;
C. Three frequencies for single-qubit control;
D. Eight detuning sequences implementing the requisite CZ_f gates, realizable by on/off masking of three flux-pulse primitives.
We now introduce these elements in detail.
A. Unit cell
The first element is a unit cell (Fig. 4) from which a surface code of arbitrary size can be assembled by repetition (and truncation at boundaries). A unit cell contains four data qubits (D1 to D4) and four ancillas (X1, X2, Z1, and Z2). Crucially, the cell is the fundamental unit of repetition not only for the quantum hardware but also for all coherent control.
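To make "assembled by repetition" concrete, the sketch below tiles a hypothetical layout. Assumptions (not read off Fig. 4): doubled coordinates with data qubits at odd (x, y) and ancillas at even (x, y), X-type plaquettes where (x + y)/2 is even, data rows alternating between D1/D2 (f1) and D3/D4 (f3), and one cell per 4×4 block. The function `role` and the specific label positions are illustrative only.

```python
from collections import Counter

def role(x: int, y: int) -> str:
    """Hypothetical label of the qubit at doubled coordinates (x, y).

    Data qubits sit at odd (x, y), ancillas at even (x, y); plaquette type
    alternates in a checkerboard, X-type where (x + y) / 2 is even.
    """
    xm, ym = x % 4, y % 4
    if x % 2 == 1 and y % 2 == 1:            # data qubit
        if ym == 1:                           # data row at f1
            return "D1" if xm == 1 else "D2"
        return "D3" if xm == 1 else "D4"      # data row at f3
    if x % 2 == 0 and y % 2 == 0:             # ancilla
        return {(0, 0): "X1", (2, 2): "X2", (2, 0): "Z1", (0, 2): "Z2"}[(xm, ym)]
    raise ValueError(f"({x}, {y}) is not a qubit site")

def cell_sites(cx: int, cy: int):
    """Sites of unit cell (cx, cy): one 4x4 block of doubled coordinates."""
    return [(x, y)
            for x in range(4 * cx, 4 * cx + 4)
            for y in range(4 * cy, 4 * cy + 4)
            if x % 2 == y % 2]

home = Counter(role(x, y) for x, y in cell_sites(0, 0))
far = Counter(role(x, y) for x, y in cell_sites(5, -3))
print(sorted(home))  # ['D1', 'D2', 'D3', 'D4', 'X1', 'X2', 'Z1', 'Z2']
```

Every cell, wherever it sits in the fabric, carries the same eight labels, so any control resource keyed by label (frequency, detuning sequence) is reused unchanged from cell to cell.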
B. Pipelining of X-type and Z-type stabilizer measurements
The second element is the pipelined execution of the X- and Z-type stabilizer measurements. The pipelining concept is illustrated in Fig. 5(a). While stabilizer measurements of one type always run simultaneously, the coherent and readout steps of the two ancilla types are interleaved: ancillas of one type undergo coherent steps while ancillas of the other type are measured. Time slots A and B (D and E) are for single-qubit gates pertinent to the X-type (Z-type) stabilizer measurements, while slots 1 to 4 (5 to 8) are for two-qubit gates. Note that, of the CZ gates involving qubits of a given cell, nine act on two qubits within the cell, while fourteen pair one of its qubits with a qubit from a neighboring unit cell.
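The nine/fourteen split can be reproduced by counting couplings in a bulk cell. The sketch assumes a hypothetical layout (data at odd and ancillas at even doubled coordinates, each ancilla coupled to its four diagonal neighbors, one CZ per coupling per cycle) and one particular choice of cell boundary (4×4 blocks); it is not taken from Fig. 4.

```python
def cell_of(x: int, y: int):
    """Unit cell containing site (x, y) when cells are 4x4 blocks of doubled coordinates."""
    return (x // 4, y // 4)

def cz_pairs(window):
    """Every ancilla-data coupling (one CZ per QEC cycle) for ancillas inside the window."""
    for ax in window:
        for ay in window:
            if ax % 2 == 0 and ay % 2 == 0:          # ancillas at even coordinates
                for dx in (-1, 1):
                    for dy in (-1, 1):
                        yield (ax, ay), (ax + dx, ay + dy)

def count_cz(cell, window=range(-8, 16)):
    """(internal, boundary-crossing) CZ counts for one bulk cell of an unbounded fabric."""
    internal = crossing = 0
    for anc, dat in cz_pairs(window):
        inside = (cell_of(*anc) == cell) + (cell_of(*dat) == cell)
        if inside == 2:
            internal += 1
        elif inside == 1:
            crossing += 1
    return internal, crossing

print(count_cz((0, 0)))  # (9, 14)
```

As a consistency check, the eight qubits of a cell each take part in four CZs, so 2 × 9 + 14 = 32 gate endpoints; equivalently each cell "owns" 9 + 14/2 = 16 CZs, four per ancilla.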
Generally, ancilla measurement (including any photon depletion of the readout resonator) takes longer than the coherent steps, leaving time to perform operations on the data qubits in steps C and F while all ancillas are measured. Possible operations include logical gates, refocusing pulses, or single-qubit gates performing error correction. Performing such operations during steps C or F does not increase the QEC cycle time.
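The text does not quote a cycle time for the pipelined scheme, but one can be estimated under stated assumptions. The sketch assumes the Sec. II durations (20 ns, 40 ns, 500 ns), that each ancilla type is read out during C (or F), the other type's coherent block, and F (or C), and that C and F are equal and no longer than the readout requires. Under these assumptions only, the result is 700 ns; `pipelined_schedule` is a hypothetical helper.

```python
TAU_1Q_NS, TAU_2Q_NS, TAU_RO_NS = 20, 40, 500   # durations quoted in Sec. II

def pipelined_schedule(tau_1q=TAU_1Q_NS, tau_2q=TAU_2Q_NS, tau_ro=TAU_RO_NS):
    """Hypothetical step durations (ns) for one pipelined QEC cycle.

    Assumption: X-type ancillas are read out during C, D-5..8-E and F;
    Z-type ancillas during F, A-1..4-B and C.
    """
    coherent = 2 * tau_1q + 4 * tau_2q          # A, 1-4, B (or D, 5-8, E)
    idle = max(0, tau_ro - coherent) / 2        # C and F split the remaining readout time
    x_block = [("A", tau_1q)] + [(str(k), tau_2q) for k in range(1, 5)] + [("B", tau_1q)]
    z_block = [("D", tau_1q)] + [(str(k), tau_2q) for k in range(5, 9)] + [("E", tau_1q)]
    return x_block + [("C", idle)] + z_block + [("F", idle)]

schedule = pipelined_schedule()
durations = dict(schedule)
cycle_ns = sum(d for _, d in schedule)
# X-type readout window: C + (D, 5-8, E) + F, i.e. schedule entries 6 through 13.
x_readout_window = durations["C"] + sum(d for _, d in schedule[7:13]) + durations["F"]
print(cycle_ns, x_readout_window)  # 700.0 500.0
```

Each ancilla still sees depth seven (two single-qubit steps, four CZ steps, one readout); the saving relative to the parallel circuit comes from overlapping one type's readout with the other type's coherent steps.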
Pipelining offers several advantages. First, it compresses the stabilizer measurements to depth seven, two single-qubit-gate steps fewer than fully parallelized quantum circuits (such as Fig. 3 for Surface-17). A second and more crucial advantage is the ability to scale without increasing the number of frequencies for single-qubit control or the number of qubit detuning sequences, as explained next.
C. Single-qubit control and detuning sequences
The third and fourth elements are best described together. Figure 5(b) presents our choice of frequencies for single-qubit control and the qubit-specific detuning sequences for realizing the two-qubit QEC cycle interactions. Single-qubit gates on data qubits (steps A, B, D, and E) are performed at frequencies f1 and f3 (alternating between data-qubit rows), while those on ancillas are performed at the intermediate frequency f2. Note that with only nearest-neighbor coupling, two distinct frequencies (one for ancilla qubits and one for data qubits) already reduce the exchange coupling between same-frequency qubits to fourth order (qubit-resonator, resonator-qubit, qubit-resonator, resonator-qubit). Extending to the proposed three frequencies additionally allows engineering the detuning sequences so that no transmon crosses any other second-order interaction zone on the way to or from the |11⟩ ↔ |02⟩ avoided crossings exploited in the CZ_f gates.
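The fourth-order argument is a graph-distance statement: if no two coupled qubits share a frequency, the closest same-frequency pair is two qubit-qubit hops apart, and each hop through a bus resonator contributes two orders. The sketch checks this on a hypothetical 8×8 doubled-coordinate patch (data at odd and ancillas at even sites, couplings between diagonal neighbors, ancillas at f2, data rows alternating f1/f3); it does not model the detuning sequences.

```python
from collections import deque
from itertools import combinations

SIZE = 8  # hypothetical patch: doubled coordinates 0..7 in x and y
SITES = [(x, y) for x in range(SIZE) for y in range(SIZE) if x % 2 == y % 2]
SITE_SET = set(SITES)

def freq(site):
    """Single-qubit-control frequency: ancillas (even sites) at f2, data rows alternate f1/f3."""
    x, y = site
    if x % 2 == 0:
        return "f2"
    return "f1" if y % 4 == 1 else "f3"

def neighbors(site):
    """Bus-coupled partners: diagonal neighbors in doubled coordinates."""
    x, y = site
    for dx in (-1, 1):
        for dy in (-1, 1):
            if (x + dx, y + dy) in SITE_SET:
                yield (x + dx, y + dy)

def hops_from(src):
    """Breadth-first-search distance (in qubit-qubit couplings) from src to every site."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        for nxt in neighbors(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist

coupled_pairs_differ = all(freq(s) != freq(n) for s in SITES for n in neighbors(s))
dist = {s: hops_from(s) for s in SITES}
min_hops = min(dist[a][b] for a, b in combinations(SITES, 2) if freq(a) == freq(b))
# Each qubit-qubit hop is qubit-resonator + resonator-qubit: two perturbative orders.
print(coupled_pairs_differ, min_hops, 2 * min_hops)  # True 2 4
```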
During steps 1-4 and 5-8, transmons are flux pulsed to a discrete set of frequencies, depending on whether they interact, idle, or are measured: D1 and D2 to f1 or f… Small offsets are added to some detuning sequences to clarify the distinction between sequences for D1 and D2, X1 and X2, Z1 and Z2, and D3 and D4.
D. Constructing detuning sequences by masking of primitive flux pulses
The frequency detuning patterns during interaction steps 1 through 4 and 5 through 8 can be synthesized by on/off masking of three flux-pulse primitives using a switch matrix: a first primitive detuning data qubits of type D1 and D2 from f1 to f… For example, the detuning sequence for D2 in Fig. 5(b) can be synthesized by masking the pulse primitive on (off) at steps 1, 4, 6, and 7 (2, 3, 5, and 8).
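The D2 example above amounts to gating one shared waveform with a per-qubit boolean mask. In the sketch, the primitive is modeled as one normalized detuning value per interaction step (1.0 = detuned from f1 toward the primitive's target, 0.0 = left at f1); this per-step representation and the name `synthesize` are assumptions for illustration, not the actual pulse shapes.

```python
STEPS = range(1, 9)   # interaction steps 1-4 (X-type) and 5-8 (Z-type)

def synthesize(primitive, steps_on):
    """Gate a shared flux-pulse primitive with a per-qubit on/off mask.

    primitive: normalized detuning the primitive applies at each step (hypothetical units)
    steps_on:  steps at which the switch matrix passes the primitive to this qubit
    """
    return [primitive[s] if s in steps_on else 0.0 for s in STEPS]

# Hypothetical D1/D2 primitive: a full detuning pulse offered in every step.
d12_primitive = {s: 1.0 for s in STEPS}
d2_sequence = synthesize(d12_primitive, steps_on={1, 4, 6, 7})
print(d2_sequence)  # [1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
```

Because only the mask differs between qubits sharing a primitive, the waveform generator is shared and the per-qubit cost reduces to a few switch settings per cycle.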
IV. COMPATIBILITY WITH LOGICAL QUBIT OPERATIONS
Two types of logical qubits can be envisioned for the surface code: defect-based [10] and planar [28]. Defect-based logical qubits are introduced by stopping the measurement of one or two stabilizers (X-type for rough logical qubits, and Z-type for smooth ones [10]). In our scheme, turning stabilizer measurements fully off can be accomplished in either of two ways. One is to mask off the H gates of the corresponding ancilla using the VSM, without changing the detuning sequence or stopping the ancilla measurement. If the ancilla is in |0⟩, all its CZ_f gates are inactive and there is no net action on the logical qubit. If it is in |1⟩, the stabilizer operator (not its measurement) is applied, which performs a logical X_L (Z_L) gate on a rough (smooth) qubit. The ancilla measurements provide the key input allowing the decoder to keep track of this action by PFU. A second way to turn a stabilizer fully off is to mask off all the flux-pulse primitives for the corresponding interaction steps.
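The |0⟩/|1⟩ argument can be checked on computational basis states, since CZ is diagonal. The sketch treats a weight-four Z-type stabilizer with the ancilla H gates masked off, so the ancilla stays in a basis state; for X-type stabilizers, the single-qubit gates on the data qubits (Fig. 2) would turn the resulting Z string into an X string, which is not simulated here. Function names are our own.

```python
from itertools import product

def cz_fan_phase(ancilla_bit, data_bits):
    """Phase on basis state |a, d1..dn> after CZ(ancilla, d_i) for every data qubit i."""
    phase = 1
    for d in data_bits:
        if ancilla_bit == 1 and d == 1:   # CZ flips the sign only on |11>
            phase = -phase
    return phase

def z_string(data_bits):
    """Eigenvalue of Z x Z x ... x Z on a data basis state."""
    return (-1) ** sum(data_bits)

WEIGHT = 4
basis = list(product((0, 1), repeat=WEIGHT))
ancilla_0_is_identity = all(cz_fan_phase(0, b) == 1 for b in basis)
ancilla_1_applies_z_string = all(cz_fan_phase(1, b) == z_string(b) for b in basis)
print(ancilla_0_is_identity, ancilla_1_applies_z_string)  # True True
```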
Logical operations, such as move and braiding operations on defect-based qubits [10], and lattice surgery on planar ones [28], also require dynamically changing the weight of specific stabilizer measurements, i.e., selectively removing specific data qubits from the quantum parity checks. In our scheme, this can be achieved by selective on/off masking of flux-pulse primitives. For example, removing a qubit of type D2 from the X-type stabilizer measurement below it simply requires masking off its pulse primitive at step 1. The order of the two-qubit gates can also be changed by masking.
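The weight change can be pictured as bookkeeping over the ancilla's interaction schedule. The sketch assumes steps 1-4 of an X-type ancilla follow the NE-NW-SE-SW order of Fig. 2, so that the D2 above it is its NE partner at step 1 (our inference from the example), and that masking a step suppresses exactly the CZ scheduled there. The names are hypothetical.

```python
# Hypothetical bookkeeping: data position partnered with an X-type ancilla at each
# of steps 1-4, following the NE-NW-SE-SW order of Fig. 2.
X_SCHEDULE = {1: "NE", 2: "NW", 3: "SE", 4: "SW"}
D2_STEPS_ON = {1, 4, 6, 7}   # steps where D2's primitive is on (Sec. III D example)

def support(schedule, masked_steps):
    """Data positions still checked once the CZs at masked_steps are suppressed."""
    return [pos for step, pos in sorted(schedule.items()) if step not in masked_steps]

full = support(X_SCHEDULE, masked_steps=set())
# Remove the D2 above the ancilla: switch D2's primitive off at step 1.
d2_steps_on = D2_STEPS_ON - {1}
reduced = support(X_SCHEDULE, masked_steps={1})
print(len(full), len(reduced), reduced, sorted(d2_steps_on))  # 4 3 ['NW', 'SE', 'SW'] [4, 6, 7]
```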
V. IMPLEMENTATION DETAILS AND VARIATIONS
A. Choosing the frequencies
Ideally, f1 (f3) would match the sweetspot frequency of D1 and D2 (D3 and D4), and f2 would match that of all ancillas, to minimize dephasing from 1/f flux noise. In practice, f1 would be set to the lowest sweetspot frequency among all transmons labelled D1 or D2, and so on. It is straightforward to expand the circuit of Fig. 5(a) with refocusing single-qubit gates to minimize dephasing in the transmons that are not at their sweetspot during single-qubit control [33].
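The "lowest sweetspot" rule follows from the fact that a flux-tunable transmon can only be detuned downward from its sweetspot, so the shared frequency must be reachable by every transmon in the group. The sketch applies the rule to made-up sweetspot values (illustrative only, not measured data) and reports how far each transmon sits below its sweetspot during single-qubit control.

```python
# Illustrative sweetspot frequencies in GHz (made-up values, not measured data).
SWEET_SPOTS = [
    ("D1", 6.80), ("D2", 6.75), ("D1", 6.78), ("D2", 6.90),
    ("X1", 5.60), ("Z1", 5.55), ("X2", 5.70), ("Z2", 5.58),
    ("D3", 4.90), ("D4", 4.85), ("D3", 4.95), ("D4", 4.88),
]
GROUPS = {"f1": {"D1", "D2"}, "f2": {"X1", "X2", "Z1", "Z2"}, "f3": {"D3", "D4"}}

def choose_frequencies(sweet_spots, groups):
    """Each shared frequency is the lowest sweetspot among the transmons that use it."""
    return {f: min(fss for role, fss in sweet_spots if role in roles)
            for f, roles in groups.items()}

freqs = choose_frequencies(SWEET_SPOTS, GROUPS)
# Distance (GHz) each transmon is pulled below its sweetspot during single-qubit gates.
offsets = [round(fss - freqs[f], 3)
           for role, fss in SWEET_SPOTS for f, roles in GROUPS.items() if role in roles]
print(freqs)  # {'f1': 6.75, 'f2': 5.55, 'f3': 4.85}
```

Transmons with a nonzero offset are the ones that would benefit from the refocusing gates mentioned above.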
The frequencies …, and f3^Park must be chosen so that residual interactions during single-qubit gates can be tolerated. For simplicity, we consider a uniform detuning scale ΔF = f1 − f…
B. Frequency arrangement variations
There exist frequency arrangements other than that shown in Fig. 5(b). For example, consider the inverted arrangement with all data qubits at f2 and the ancillas at the outer frequencies. Figure 6 shows one of these configurations, with X1 and Z1 (X2 and Z2) at f1 (f3). It is straightforward to modify the detuning sequences for this arrangement to also avert all unwanted interactions. However, comparing this alternative to the original arrangement reveals a key difference that makes the original preferable for a cQED implementation with flux-tunable transmons. Specifically, the original exactly balances the number of interaction steps in which qubits can remain at their upper frequency (i.e., at or closest to their coherence sweetspot), while the inverted arrangement allows this on just two (out of eight) steps for data qubits and zero or four (out of four) steps for ancilla qubits. The reduced data-qubit dephasing during the coherent steps will lead to a lower logical error rate. Note that this advantage of the original arrangement is made possible by lowering the ancillas to f2^Park for their measurement, where the additional dephasing is innocuous in view of the measurement-induced projection.
Residual single-qubit-gate cross-talk between D1 and D2 (D3 and D4) can be reduced by breaking the frequency degeneracy at f1 (f3), which increases the number of primitive pulses from three to five, or even at f2, which further increases the number of primitive pulses to eight.
C. Switch matrix
A digitally addressable switch matrix and primitive-flux-pulse synthesizer operating at room temperature (cryogenically) are exciting challenges for the near (long) term. The switch matrix should allow qubit-specific customization of the flux-pulse primitives, including fine adjustment of delay (to compensate cable-length mismatch), amplitude (for fine tuning of two-qubit and single-qubit phases), and dc offset (for tuning to f1, f2, and f3).
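The three per-qubit adjustments can be modeled as a simple transformation of a sampled primitive. The sketch below is a hypothetical model, not a hardware specification: the delay is restricted to whole samples (a real device would likely need finer resolution), the amplitude is a plain scale factor, and the dc offset is added to every sample; the sample values are illustrative.

```python
def customize(primitive, delay_samples=0, amplitude=1.0, dc_offset=0.0):
    """Qubit-specific copy of a shared flux-pulse primitive (hypothetical model).

    delay_samples: whole-sample delay compensating cable-length mismatch
    amplitude:     scale factor for fine-tuning two-qubit and single-qubit phases
    dc_offset:     static flux bias selecting the qubit's operating frequency
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    n = len(primitive)
    delayed = ([0.0] * delay_samples + list(primitive))[:n]   # keep the original length
    return [dc_offset + amplitude * v for v in delayed]

primitive = [0.0, 0.5, 1.0, 1.0, 0.5, 0.0]   # illustrative samples, arbitrary units
pulse = customize(primitive, delay_samples=1, amplitude=0.9, dc_offset=0.1)
print([round(v, 3) for v in pulse])  # [0.1, 0.1, 0.55, 1.0, 1.0, 0.55]
```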
VI. CONCLUSION AND OUTLOOK
We have presented a concrete scheme for the QEC cycle of an arbitrary-size surface code implemented with flux-tunable transmons. The scheme combines four key concepts: an eight-qubit unit cell as the basis for repetition of quantum hardware and control signals; pipelining of X- and Z-type stabilizer measurements; a fixed set of three frequencies for single-qubit control; and a fixed set of eight detuning sequences implementing the requisite controlled-phase gates. These eight detuning sequences can be composed by on/off masking of three flux-pulse primitives.
… Fig. 8(a). Note that a similar cQED chip design is envisioned in Ref. [20]. Our scheme reduces the number of feedlines by bridging these over bus resonators.
Experimental efforts are underway to implement and evaluate the pipelined QEC cycle in Surface-17. We pursue a realization of the Surface-17 quantum hardware with transmons in a planar cQED architecture [14] made extensible using vertical interconnect for all input and output signals (Fig. 7). Each transmon will have a dedicated flux line [25] allowing control of its transition frequency on nanosecond timescales, a dedicated microwave-drive line, and a dedicated dispersively coupled resonator for readout. We opt for coupling nearest-neighbor data and ancilla transmons via bus resonators (rather than direct capacitance [1]). Figure 8(a) shows a prototype seven-port transmon developed in our lab, nicknamed starmon [34]. The vertical I/O will be realized either using through-silicon vias [Fig. 8(c)] or bump bonding in a flip-chip arrangement.
Experimental efforts are also underway to demonstrate the scalability of the classical control plane. Diagonally running feedlines coupled to readout resonators will allow simultaneous readout by frequency-division multiplexing [3, 22], reducing the need for cryogenic amplifiers and microwave electronics (circulators, etc.), as well as room-temperature homodyne detection systems. While frequency multiplexing for readout is common in cQED, the simultaneous readout of nine qubits using one feedline, as required by …, has not yet been achieved. In addition, demonstrating the control-hardware savings achievable by spatial multiplexing is an immediate priority. Single-qubit control at f1, f2, and f3 will make use of a next-generation VSM (follow-up to [23]) to independently route precision DRAG [35, 36] pulses to same-frequency qubits, with significant savings in microwave sources. Finally, a switch matrix for constructing frequency detuning sequences by on/off masking of flux-pulse primitives is envisioned for room temperature.
