It is not widely known that measurement-free quantum error correction protocols can be designed to achieve fault-tolerant quantum computing. Despite their potential advantages, namely relaxed accuracy, speed and addressing requirements on the measurement process, such protocols have usually been overlooked because they are expected to yield a much worse threshold than error correction protocols that use measurements. Here we show that this is not the case. We design fault-tolerant circuits for the 9-qubit Bacon-Shor code and find a threshold for gates and preparation of $p^{(p,g)}_{\mathrm{thresh}} = 3.76 \times 10^{-5}$ (30% of the best known result for the same code using measurement-based error correction) while admitting measurement error rates of up to 1/3 and imposing no constraints on measurement speed. We further show that, by demanding gate error rates sufficiently below the threshold, one can improve the preparation threshold to $p^{(p)}_{\mathrm{thresh}} = 1/3$. We also show how these techniques can be adapted to other Calderbank-Shor-Steane codes.

PACS numbers: 03.67.Lx
An ideal quantum computer is a theoretical object capable of highly efficient computation. A major difficulty in realizing such a device is that physical implementations of any quantum operation will be noisy. However, with the use of quantum error correction (QEC) codes and fault-tolerantly designed circuits, and provided that error rates are below some threshold value, one is still able to efficiently simulate a quantum computation with arbitrarily high accuracy [1] [2] [3]. State-of-the-art experiments do not yet reach the error rates and execution times required to achieve the fault-tolerant regime. The results in this paper alleviate part of this constraint, pushing the required error rates a step closer to current technology.
In many physical systems measurements pose a potential bottleneck for scalable fault-tolerant quantum computation because they are slower and/or noisier than gates or preparation [4, 5]. However, they are central in the readout stage, and are widely used in QEC routines as a way of extracting error syndrome information in order to correct the quantum data. Slow measurements have been shown to be a surmountable issue by using error correction where measured error syndromes can be classically postprocessed at the end of a round of gates to execute a compensating Pauli frame rotation [6], with the caveat that there can be a significant time lag during classical processing [7]. Regarding noise, measurement error rates cannot usually be improved by noise suppression techniques such as dynamical decoupling, whereas gate error rates can be [9, 10]. Furthermore, measurement results must be distinguishable in every time step, i.e. one must be able to discriminate between results from different measurements repeatedly over the computation, which leads to further constraints on the physical processes executing the measurements, e.g. measurements relying on photon scattering as in ion traps [11].

In this letter we overcome these problems by eliminating most measurements during fault-tolerant computation. It is well known [1, 12] that this is possible for Calderbank-Shor-Steane codes [13] such as the Steane code; however, "The penalty paid in the stringency of the threshold has never been quantified, but it is expected that replacing measurement by coherent operations decreases the noise threshold by a large amount" [6]. We show that, contrary to this expectation, coherent FT QEC suffers only slightly in its threshold and brings substantial rewards.
We begin by setting up our scenario and introducing measurement-free error correction (EC) routines for the Bacon-Shor code. We then show how to execute fault-tolerant Clifford operations consisting of: (I) preparation of $|0\rangle$ and $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$ states, (II) Clifford group [14] unitary gates, and (III) measurement in the X and Z basis, and derive a threshold error rate which is stringent for preparation and gates but as high as 33% for measurement. We proceed to show that through an encoder circuit we can prepare special ancillas at any level of concatenation. While this encoding cannot achieve an arbitrarily low error rate ($p_{\mathrm{anc}}$), the error rate is small enough, $p^{(\mathrm{anc})} < p_{H-\mathrm{anc}} = \sin^2(\pi/8) \sim 14.6\%$, for the ancillas to be used as a resource in magic state distillation [7, 8] (MSD), a protocol using exclusively Clifford operations to distill arbitrarily low-error encoded $|H\rangle_L = (|0\rangle_L + e^{i\pi/4}|1\rangle_L)/\sqrt{2}$ magic states. Using this resource to execute non-Clifford gates at the highest level of concatenation completes the universality of our model. Moreover, we show how to relax the threshold value for preparation, using a variant of algorithmic cooling and demanding a gate error rate, $p^{(g)}$, sufficiently below the threshold. Thus fault-tolerant universal quantum computing (FTUQC) can be achieved with measurement and preparation error rates, $p^{(m)}$ and $p^{(p)}$ respectively, which are already within reach of current technology.
We demonstrate our scheme for the 9-qubit Bacon-Shor (BS) subsystem code [15] but our tools can be adapted to other CSS codes (see B). The BS code is defined by the stabilizer set on a two-dimensional array,

\[
S = \left\langle
\begin{array}{ccc} X & X & I \\ X & X & I \\ X & X & I \end{array},\;
\begin{array}{ccc} I & X & X \\ I & X & X \\ I & X & X \end{array},\;
\begin{array}{ccc} Z & Z & Z \\ Z & Z & Z \\ I & I & I \end{array},\;
\begin{array}{ccc} I & I & I \\ Z & Z & Z \\ Z & Z & Z \end{array}
\right\rangle .
\]

For this code logical Pauli operators are given by $X_L = X^{\otimes 3}$ ($Z_L = Z^{\otimes 3}$), where $X_L$ ($Z_L$) acts on a column (row) of the array. This code is a subsystem code and is invariant under pairs of X (Z) operators along any given row (column) because they act only on gauge degrees of freedom. Given the subsystem structure of the code, one can correct by acting on only one row (for X-errors) and one column (for Z-errors). The library of physical (level-0) gates we use is {X, Z, H, CNOT, TOFFOLI, Z-TOFFOLI $= H^{\otimes 3}(\mathrm{TOFFOLI})H^{\otimes 3}$, $|0\rangle$ preparation, $|+\rangle$ preparation, $|H\rangle$ preparation, Z-measurement, X-measurement}, allowing also for non-local interactions. We adopt an adversarial, local, stochastic error model [16].
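The commutation structure described here can be checked numerically. The following is a minimal sketch, assuming the standard four weight-six generators of the 9-qubit Bacon-Shor code (XX on adjacent column pairs across all rows, ZZ on adjacent row pairs across all columns); the bitmask representation and the helper names `pauli` and `commute` are illustrative choices, not part of the original construction.

```python
from itertools import combinations

# Qubit (r, c) of the 3x3 array is indexed as 3*r + c.
def pauli(kind, sites):
    """Return (x_mask, z_mask) for a product of X or Z on the given sites."""
    mask = 0
    for r, c in sites:
        mask |= 1 << (3 * r + c)
    return (mask, 0) if kind == "X" else (0, mask)

def commute(p, q):
    """Two Pauli products commute iff their symplectic overlap is even."""
    overlap = bin(p[0] & q[1]).count("1") + bin(p[1] & q[0]).count("1")
    return overlap % 2 == 0

rows = cols = range(3)
# Assumed generators: XX on adjacent columns (all rows), ZZ on adjacent rows (all columns).
stabilizers = [
    pauli("X", [(r, c) for r in rows for c in (0, 1)]),
    pauli("X", [(r, c) for r in rows for c in (1, 2)]),
    pauli("Z", [(r, c) for r in (0, 1) for c in cols]),
    pauli("Z", [(r, c) for r in (1, 2) for c in cols]),
]
x_logical = pauli("X", [(r, 0) for r in rows])  # X along a column
z_logical = pauli("Z", [(0, c) for c in cols])  # Z along a row
x_gauge = pauli("X", [(0, 0), (0, 1)])          # pair of X along a row
z_gauge = pauli("Z", [(0, 0), (1, 0)])          # pair of Z along a column

stabilizers_commute = all(commute(a, b) for a, b in combinations(stabilizers, 2))
logicals_ok = (
    all(commute(s, x_logical) and commute(s, z_logical) for s in stabilizers)
    and not commute(x_logical, z_logical)
)
gauge_ok = all(commute(s, g) for s in stabilizers for g in (x_gauge, z_gauge))
print(stabilizers_commute, logicals_ok, gauge_ok)
```

Under these assumptions all three checks hold: the generators commute, the column-X and row-Z operators commute with every generator while anticommuting with each other, and the row-X and column-Z pairs commute with every generator, as expected for gauge operators.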
The first obstacle is, of course, to design an EC routine (gadget) which uses coherent feedback instead of measurements and classical feedback. One needs more gates within the EC gadgets to execute the coherent feedback and, in particular, one would typically need fault-tolerant implementations of TOFFOLI gates at every level. This would yield a very bad threshold value [1, 6]. However, during QEC we do not really need a full-fledged TOFFOLI gate, since it will only be controlled by ancillas containing the syndrome, i.e. classical, information. For instance, when correcting X-errors, a Z error in the ancillas is irrelevant; thus we can map a BS-encoded ancilla to a quantum repetition (QR), i.e. bit-flip, code which protects against X-errors but is vulnerable to Z-errors. Using the QR-encoded controls and the structure of the logical operators in the BS code, we can use bitwise TOFFOLI gates to implement the needed operation (see Fig. 1(a)).
The mapping between the BS code and the QR code, of the same level of concatenation, is achieved using the gate
is encoded in the k-concatenated BS code ($9^k$ physical qubits), and $s^{(k)}$ denotes a bit encoded in the k-concatenated QR code ($3^k$ physical qubits). Because we use the BS and QR codes jointly, we must also introduce a measurement-free error correction routine for the QR code, i.e. for states of the form $a|0,0,0\rangle + b|1,1,1\rangle$. We build a majority voting gadget, which we dub the M-gate (Fig. 1(a)). In the QR code all gates involved in the M gate are transversal, and thus we can use this circuit as an EC gadget for this code at any level of concatenation. Moreover, through the N gate we can also use M as an encoded majority voting gadget, i.e. acting on a state of the form a 0
. By virtue of the fact that the Bacon-Shor code is, in essence, a composition of X and Z basis QR codes, we can use M and N as the building blocks for the BS EC gadget. Schematically the BS QEC routine works as follows (we refer the reader to Fig. 1(c) for a detailed description). The boxed part of Fig. 1(a) is a syndrome extraction stage, which turns the ancilla, initially in a $|0\rangle$ state, into a string containing the error information. We adapt this method to the BS code. In this code, to correct for X-errors, we execute an extraction stage in every column of the BS state and get three strings $(s_1, s_2, s_3)$. We use them to vote into a fourth one, $s_4 = s_1 \oplus s_2 \oplus s_3$, which will control the final correction via N and bTOFFOLI. A single error in, e.g., column one of the BS state leads to $s_4 = s_1$, which correctly executes the correction by virtue of the gauge freedom; on the other hand, a gauge operation, e.g. two X-errors in the same row, leads to $s_4 = s \oplus s = 0$, which correctly implies an identity correction operation. An analogous analysis holds for Z-error correction. The X and Z correction stages of the BS QEC routine are essentially equivalent but differ in some details.
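The voting rule $s_4 = s_1 \oplus s_2 \oplus s_3$ can be illustrated classically. This is a sketch under a stated assumption: each column's syndrome string is modelled as a one-hot indicator of the row carrying an X-error, which may differ from the encoding actually produced by the extraction stage of Fig. 1(a). The function names are hypothetical.

```python
def xor_strings(*strings):
    """Bitwise XOR of equal-length bit tuples."""
    return tuple(sum(bits) % 2 for bits in zip(*strings))

def column_syndrome(x_errors, column):
    """Assumed one-hot model: bit r is set when qubit (r, column) carries an X error."""
    return tuple(1 if (r, column) in x_errors else 0 for r in range(3))

def vote(x_errors):
    """Combine the three column strings into the control string s4."""
    s1, s2, s3 = (column_syndrome(x_errors, c) for c in range(3))
    return xor_strings(s1, s2, s3)

single = vote({(1, 0)})         # one X error in column one, row two
gauge = vote({(2, 0), (2, 1)})  # two X errors in the same row (a gauge operation)
print(single, gauge)
```

In this model the single error gives $s_4 = s_1$, flagging the affected row, while the gauge operation gives the all-zero string and hence no correction.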
Because the syndrome information after the syndrome extraction stage is different in the two cases, we define the gates $N^{(X)}$ (and $N^{(Z)}$) for the BS code at level k of concatenation (see Fig. 1(b)):
, where $A_{(r,c)}$ denotes gate A acting on the qubit in row r and column c of the logical qubit. The (X) or (Z) version of the gate is chosen depending on the correction subroutine in which it is being used, e.g. to correct X-errors (as in the lower part of Fig. 1(c)), we use $N^{(X)}$. The $N^{(X)}$ ($N^{(Z)}$) gate is a Z (X) decoder: it keeps only the protection that is needed while completely removing protection against the other type of error. Moreover, after the X syndrome extraction stage, the corresponding ancilla does not need protection against Z errors, so only the lower stage ($EC_X$) of EC must be used. This greatly reduces the overall execution time for encoded gates acting on those ancillas. We found that, due to this property, the subroutine $(VN)_i(k)$ not only takes less time, in terms of execution time of level-(k − 1) gates, but can also be shown to fail with a probability smaller than that of a CNOT(k), for k > 1. For k = 1 EC gadgets, there is no need to use N since N(0) = Id. We detail this in Supplementary Material A.
FIG. 1 (caption, partially recovered). In our circuits G(k) denotes the implementation of gate G, in terms of level-(k − 1) gates, without the prepended and appended EC(k) routines, and W denotes a waiting gate. (c) Full error correction (EC) gadget for the BS code. Here, a TOFFOLI drawn with a distinct control symbol is a Z-TOFFOLI, and a further symbol denotes a set of transversal CNOTs. The control of the gates in boxes is always the top input of the gate. The last gate is a bTOFFOLI.

We are now ready to describe the remaining elements of our BS code fault-tolerant scheme. First we describe the elements needed to fault-tolerantly simulate any circuit based solely on Clifford operations. (I) Preparation of $|0\rangle_L$ and $|+\rangle_L$ states: by (i) starting with a 3 × 3 array of $|+\rangle$, and (ii) applying an $M^{(X)}$ in every column, we can prepare a $|+\rangle_L$. Similarly, $|0\rangle_L$ is obtained by (i) starting with a 3 × 3 array of $|0\rangle$, and (ii) executing an $M^{(Z)}$ in every row. (II) Clifford group generators CNOT, H, $Z^{1/2}$: The CNOT gate is transversal. The H gate can also be implemented in a bitwise fashion but, because stabilizers are rotated by this action, it is followed by a physical π/2-rotation accommodated by relabeling or rewiring of gates. The $Z^{1/2}$ gate can be implemented using the circuit in Fig. 2(a), provided one can prepare a logical ancilla in the state $|{+i}\rangle_L = (|0\rangle_L + i|1\rangle_L)/\sqrt{2}$.
Since the $Z^{1/2}$ gate is not part of the EC routines, it is only needed at the highest level of concatenation. Furthermore, as it is the only complex gate, it can be shown that if one always uses the same logical ancilla prepared in Fig. 2(a), the entire quantum computation splits into two noninterfering paths (evolution by $U_{\mathrm{comp}}$ and $U^{*}_{\mathrm{comp}}$), and the measurements of real, Hermitian operators at the end have the same expectation values as for evolution by $U_{\mathrm{comp}}$ alone [17]. Alternatively one can use the distillation circuit in [3] at the highest level, provided one can prepare the ancilla with an error rate below $p^{(i-\mathrm{anc})} = 1/2$. (III) X and Z basis measurements.- They are only required at the highest level of concatenation. Given their form, encoded logical operators can be measured by measuring only one row or column of the $9^k \times 9^k$ encoding array.

Threshold calculation for Clifford operations.- We use the extended rectangle (exRec) method developed in [3] to compute the threshold (see Supplementary Material for more details). An exRec of a gate is constructed by prepending and appending error correction routines on the inputs and outputs. The exRec with the largest number of malignant pairs, i.e. pairs of faults which generate two or more errors in the data, determines the threshold value. A quick inspection reveals that the largest exRec is the one corresponding to the CNOT gate. Following [3], only at level k = 1 must one consider all elements: preparation and gates (including waiting gates). At level k > 1, using contraction of exRecs, preparation locations can be omitted. This means that one has to solve the recursion relations for the error $p^{(j)}$ at level j:
where
B denotes all possible three-site errors, and $A^{(k)}$ denotes the number of malignant pairs in the largest exRec of that level. This process can be repeated for four-site errors, etc., to get an even tighter bound [3]. Executing this algorithm with our largest exRec, the CNOT, we obtain a threshold value for preparation and gates of $p^{(p,g)}_{\mathrm{thresh}} = 3.76 \times 10^{-5}$. This value is not a bound for measurement error rates, since measurements are not needed during the QEC process and are only required at the highest level of concatenation. So it follows that
If preparation and gate error rates are below threshold, then for k large enough $p^{(k)}$ is vanishingly small and the terms $O(p^{(k)})$ can be neglected. Then the threshold condition for X and Z measurements is $p^{(m)}_{\mathrm{thresh}} = 1/3$.

Encoded non-Clifford operations.- The missing component to achieve universality is the FT execution of a non-Clifford gate. Using the circuit in Fig. 2(b) we translate the problem into preparing the $|H\rangle_L$ ancilla. To create an ancilla at the highest level we use an encoder circuit, which allows us to keep $p^{(m)}_{\mathrm{thresh}} \le 1/3$. To encode an arbitrary state we use the following algorithm: (i) we start with the level-0 state $|\phi\rangle$ we want to encode and 8 $|0\rangle$ states, then (ii) we use CNOT gates, including waiting times such that a qubit never interacts with more than one other qubit in a single step, to create the state φ
. Finally (iii) we execute an $M^{(Z)}$ gate in every row, to create the state $|\phi\rangle_L = a|0\rangle_L + b|1\rangle_L$. We can use the same algorithm recursively to create the state at any level of concatenation k. Repeating this process recursively yields an error rate $p^{(L)}_{(\mathrm{anc})}$ for the encoding at the highest level of concatenation, k = L. This error rate cannot be made arbitrarily small; however, provided $p^{(g)} \le p_{\mathrm{thresh}}$, it can be made small enough to give $p^{(L)}_{(\mathrm{anc})} \le \sin^2(\pi/8)$, and then one can use MSD to achieve FTUQC [8].
Additionally, we promised that preparation errors can in fact be much higher than gate error rates. The argument uses a variant of the algorithmic cooling algorithm introduced in Ref. [18]. For a group of three qubits (a, b, c), each with identical probability $p^{(p)} = \varepsilon^{(0)} < 1/2$ of being in the erroneous state $|1\rangle$, we apply $\mathrm{TOFFOLI}_{((c,b),a)}\,\mathrm{CNOT}_{(a,c)}\,\mathrm{CNOT}_{(a,b)}$. The reduced state of qubit a is colder, i.e. has lower error ($\varepsilon^{(1)} < \varepsilon^{(0)}$). Concatenating the process, after j rounds using a total of $3^j$ qubits, the final error of the single output qubit satisfies the recursion relation $\varepsilon^{(j)} = (\varepsilon^{(j-1)})^2 (3 - 2\varepsilon^{(j-1)})$. Including gate errors, the total error of this preparation process is p
We are now ready to combine our tools. If we are appreciably below threshold, say with $p^{(g,p)} = 0.75\, p^{(g,p)}_{\mathrm{thresh}} = 2.82 \times 10^{-5}$, then with $p^{(m)} = 33\%$ we get $p^{(6)}_{(g)} \sim 10^{-13}$ and $p^{(6)}_{\mathrm{anc}} = 8.32 \times 10^{-3}$, which is safely below the 14.6% needed for $|H\rangle_L$ distillation (and certainly below the 50% needed for $|{+i}\rangle_L$ distillation [3]).
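The order of magnitude quoted for the level-6 gate error can be cross-checked with a leading-order estimate. The sketch below assumes the simplified recursion $p^{(k)} = (p^{(k-1)})^2 / p_{\mathrm{thresh}}$, i.e. it identifies the threshold with the inverse of the malignant-pair count and drops the three-site term; it is therefore only a rough check, not the calculation used to obtain the numbers in the text. The function name `level_error` is hypothetical.

```python
P_THRESH = 3.76e-5  # gate/preparation threshold quoted in the text

def level_error(p0, level, p_thresh=P_THRESH):
    """Iterate the assumed leading-order recursion p_k = p_{k-1}**2 / p_thresh."""
    p = p0
    for _ in range(level):
        p = p * p / p_thresh
    return p

# Start at 75% of threshold and concatenate six times.
p6 = level_error(0.75 * P_THRESH, 6)
print(f"{p6:.2e}")
```

In closed form this recursion gives $p^{(k)} = p_{\mathrm{thresh}} (p^{(0)}/p_{\mathrm{thresh}})^{2^k}$, so at $k = 6$ the suppression factor is $0.75^{64}$, and the result is of order $10^{-13}$.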
Thus FTUQC is possible with noisy, currently achievable measurement error rates, and with only a small impact on the threshold value compared to the best known result ($1.26 \times 10^{-4}$) for the same code allowing measurements [3]. One can go further and use algorithmic cooling (AC) to also push preparation error rates within reach of current technology. We find that if one has physical preparation error rates of $p^{(p)} = 1\%$, then two rounds of AC and physical gate error rates of $p^{(g)} = 2.32 \times 10^{-6}$ allow for FTUQC. Preparation error rates as high as 1/3 can also be allowed, at the cost of demanding a lower gate error rate. For $p^{(p)} \ge 1/3$, one can instead use noisy measurement, since measurement followed by a unitary is a preparation.
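The effect of the cooling rounds can be sketched by iterating the recursion $\varepsilon^{(j)} = (\varepsilon^{(j-1)})^2 (3 - 2\varepsilon^{(j-1)})$, which is the probability that a majority of three independent bits is wrong. The sketch assumes ideal gates, so it omits the gate-error contribution to the total preparation error; the function names are hypothetical.

```python
def cooling_round(eps):
    """Output error after one three-qubit cooling round with ideal gates."""
    return eps ** 2 * (3 - 2 * eps)

def cooled_error(eps0, rounds):
    """Iterate the recursion; `rounds` rounds consume 3**rounds input qubits."""
    eps = eps0
    for _ in range(rounds):
        eps = cooling_round(eps)
    return eps

eps2 = cooled_error(0.01, 2)  # 1% physical preparation error, two rounds
qubits = 3 ** 2
print(f"{eps2:.2e} using {qubits} qubits")
```

Starting from 1%, two ideal rounds bring the preparation error to a few times $10^{-7}$, well below the gate and preparation threshold. The recursion has a fixed point at $\varepsilon = 1/2$, so any initial error below 1/2 decreases under ideal gates.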
To put this result in perspective, notice that $p^{(g)} = 1.39 \times 10^{-6}$ is not a threshold value but the value required for the effective preparation and gate error rates to be appreciably below our threshold ($0.75 \times p_{\mathrm{thresh}}$). In comparison, under the same assumptions the best known result [3] implies that quantum computing is possible, with reasonable overhead, when $p^{(p,g,m)} \sim 9.5 \times 10^{-5}$. So the price we pay to push measurement and preparation error rates within reach of current technology (an improvement of three and two orders of magnitude respectively) is roughly two orders of magnitude more stringent gate error rates. The result is even more significant if one considers recent results which show that arbitrarily accurate unitary gates (but not measurement and preparation) can, in principle, be achieved via open system control strategies [10]. Furthermore, note that the required measurement and preparation error rates have already been reported: in trapped ions [4], $p^{(m)} = 2.3 \times 10^{-3}$, while in quantum dots [5], $p^{(m)} = 3 \times 10^{-2}$.
We point out that the threshold value for gates computed here is by no means tight, as we wanted to keep the calculations simple. We have overcounted malignant pairs of locations, and the design of our circuits may not be optimal in terms of error locations; thus in principle the threshold can be improved. On the other hand, restricting ourselves to two-qubit interactions only, and decomposing TOFFOLI gates into one- and two-qubit gates, degrades the gate and preparation threshold value to $2.69 \times 10^{-5}$. Restricting to nearest-neighbor interactions will also degrade the threshold value [19]. In our circuits ancillas can be prepared offline, and we have been careful to limit measurement to when the data is encoded (at the highest level of concatenation); thus physical systems with slow measurement or preparation are allowed.
In conclusion, we have shown that measurement-free QEC is viable, considerably relaxing the time and error rate constraints on preparation and measurement operations, and pushing them within reach of current technology, while yielding only a small penalty to the gate threshold. This small penalty seems even less relevant if one considers recent results showing that arbitrarily accurate unitary gates can, in principle, be achieved using open system control [10]. Those results complement the methods developed here and bring fault-tolerant quantum computing closer to reality.

the same amount of time as a gate from our gate library of the same level, nor is it obvious that the failure probability of the corresponding exRec is smaller than that of a CNOT exRec. We show here that this is indeed the case and use these attributes to compute the error threshold. One of the main properties we use in our threshold calculation and in our circuits is that N(k) takes less time than a fully protected gate of the same degree of concatenation k, i.e. T(N(k)) < T(G(k)), where T(A(k)) denotes the execution time of the protected gate A. By protected we mean that the gate has EC gadgets prepended and appended to it. The key observation to prove this is that whenever N is used, the state only needs protection against one kind of error; for example, during the X error correction stage the ancilla only needs protection against X-errors. So to achieve this protection we execute the EC gadget, $EC_X$, without the Z error correction stage. The same analysis follows for $EC_Z$. In this section we will prove relations explicitly for $N^{(X)}(k)$, and thus will omit the X or Z superscript, but the reader should keep in mind that the same results hold for $N^{(Z)}$.
With this in mind, we begin by comparing a fully protected exRec of a gate G at level-k of concatenation with N (k). The relevant gates can be decomposed as
where the notation A(k − 1) denotes the implementation of the A(k) gate in terms of (k − 1)-level protected gates but omitting k-level protection. At level k = 1, A(k − 1) corresponds to the physical gates implementing the encoded gate. Moreover, a contracted extended rectangle (exRec) A(k) is composed of the implementation of the gate in terms of level-(k − 1) gates and level-k error correcting gadgets on all the inputs and relevant outputs. We use the notation G(k) to denote a fully protected gate made up of a single step of level-(k − 1) gates, for example a transversal CNOT(k − 1) gate. In contrast, N(k) consists of more than one step of level-(k − 1) protected gates. To calculate a bound on the time needed to perform our error correction, first notice that the full error correction gadget, as illustrated in Fig. 1(c), consists of X and Z error correction, which both consist of the same number of gates and overlap in all but two locations (neither of which is an N(k) gate). Hence, regardless of the structure of N(k) or its time duration, we find
Moreover, given the structure of N (k) (see Fig. (1(b)) ), it follows then that
which in turn implies that
At level k = 1 we do know the form and time-duration for all gates in the circuit, and we have that
So an inductive reasoning leads us to
Even more, in the error correction gadgets of level-k we used the subroutine V N (k − 1) which is composed of level k − 1 gates acting on 9 level k − 1 BS encoded inputs and 3 level k − 1 QR encoded outputs. For analysis of this module, the sequence of operations can be decomposed as follows
where in the last equality we have used the exRec-contraction technique from Ref. [3] , and used the fact that only the lower output of the gate will be used. The execution time of this contracted exRec satisfies
where we have used the fact that T (EC X ) > T (M ) to go from line one to two, Eq. (A4) from two to three and Eq. (A1) to get the last equality. This shows why V N (k − 1) takes one level k − 1 time slot in our circuits.
b. Error contribution of N
Another relevant property for our threshold calculation is the failure probability of an N gate at some level k. For instance, if its failure probability were greater than that of a CNOT of the same level, we would have to factor that into our threshold calculation. We want to show something even stronger. During EC(k) one uses the subroutine (VN)(k − 1) (see Fig. 1(b)), which can be further decomposed as $(VN)(k) = \prod_{i \in \mathrm{rows}} (VN)_i(k)$; we will show that the error probability of the contracted exRec corresponding to the collection of level
is not larger than that of a CNOT of level-(k − 1). More fundamentally, we will show that the exRec with the highest failure probability is the one corresponding to the CNOT gate. As in the previous section, we will prove all relations for $N^{(X)}$. To simplify notation we drop the X superscripts or subscripts when necessary, but remind the reader that the analysis holds for both X- and Z-related routines. The sketch of the calculation is the following. At an arbitrary level of concatenation k, the failure probability of an exRec depends on the failure probability of gates and of N at the lower level (k − 1). Since we do not know ab initio the failure probability of $(VN)_i(k)$, but only that it must be larger than that of N(k), we cannot directly compute a threshold condition. Fortunately we know the specific form of $(VN)_i(1)$ at level k = 1 in terms of level-0 gates, and we can directly compare it with CNOT(1). In general, this comparison can be done by first counting the number of malignant pairs of level-(k − 1) errors within the exRec of a gate G at level k, A(G(k)), written in terms of various malignant-error parameters $\{A_{EC(k)}, \ldots\}$ which we define below. The error probability of such an exRec is then given by
where $\tilde{G}$ corresponds to the gate of level k − 1 with the highest failure probability,
, and $B_{G(k)}$ denotes all possible three-site errors in the exRec. Once we show that at level k = 1 the CNOT is the largest exRec, we can do the analogous level k = 2 calculation, now replacing the failure probability of $(VN)_i(k)$ with $p^{(1)}$ and using the parameters $\{A_{EC(k=2)}, \ldots\}$. An inductive argument finally leads us to the comparison at any level of concatenation: p
The malignant error parameters are defined as:
• $A_{EC}$ ($A_{EC_X}$): the number of malignant pairs in an EC ($EC_X$);
• $u$ ($u_X$): the number of single failures in an EC ($EC_X$) which generate a single error in the data; $\bar{u}$ ($\bar{u}_X$) is similar but restricts the error to only 8 of the 9 qubits in the encoded data. This case is important when an error propagates through a CNOT;
• $\alpha$ ($\alpha_X$): the number of single failures in an EC ($EC_X$) which, in conjunction with an incoming data error, generate a double error in the data;
• Parameters $A_M$, $m$, $\bar{m}$ and $\beta$ can be defined analogously for the quantum repetition code and its error correcting gadget, the M gate.
Because at this point we have to assume that the size of the circuits varies with every level of concatenation, each parameter carries a label (k) denoting the level of concatenation it corresponds to.
Let us now proceed with the calculation. The number of malignant pairs in the CNOT (1), (V N ) i (1) , and N (1) exRecs are then given by
A bT OFF ( 
CNOT and p
CNOT . Now, for k = 2, we obtain the following failure probabilities, using the fact that bTOFFOLI(1) and $(VN)_i(1)$ fail with half the probability of a CNOT(1) gate.
with corresponding parameter values
CNOT . From this point on, the structure of the level-k error correction circuits, and thus the corresponding malignant error parameter values, are the same as those of the level k = 2 circuits, so repeating the process for k = 3, 4, ... leads us to the conclusion that the CNOT exRec is in fact the largest exRec to be considered and the one which determines our threshold value.
The largest exRecs to be considered. An EC gate corresponds to a BS QEC routine while a M gate corresponds to a QR QEC routine. The circuit (3(c)) is executed in every row of the 3 × 3 array. Because in the N exRec we are discarding the top-lines we do not require output M gadgets appended to them. Moreover, at level k = 1 there is no need for the waiting (W) gate and both CNOTs can be executed simultaneously.
Error analysis for the encoder circuit
The error analysis for the encoder circuit is as follows. To encode a level-k state given a level-(k − 1) state, step (i) uses 8 CNOTs, 20 waiting gates, and 8 $|0\rangle$ preparations, each failing with probability $p^{(k-1)}$, and step (ii) can introduce unwanted phases with a single error (note that this is not a problem in Clifford ancilla preparations, e.g. $|0\rangle$ states); thus we count all locations in the M gates. An M(1) contributes 27 level-0 locations, while an M(k), for k > 1, contributes 24 level-(k − 1) locations. So we have
which justifies our encoder circuit error analysis.
M gate for larger codes
If we were to use larger codes, e.g. the 25 qubit BS code, one would typically need to execute larger majority votings, i.e. of a longer distance. To achieve this purpose we developed a way of executing majority voting in a fault-tolerant way or, equivalently, a way of fault-tolerantly and unitarily correcting a quantum repetition code. We denote this gate as M (N).
where $m = \mathrm{MBF}\{s_1, s_2, \ldots, s_N\}$, $|\varepsilon_0\rangle = |0\rangle^{\otimes N}$ and MBF is the majority boolean function. Note that when N = 2k, i.e. N is even, the MBF may be undefined, namely when the string is balanced; in those cases the protocol will just take $\{s_1, s_2, \ldots, s_{2k}\}$ to another balanced string.
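The majority boolean function and its undefined case for balanced strings can be illustrated classically. This is a sketch only: it assumes that, on computational basis states, M(N) sets every bit of the string to the majority value, and it returns balanced strings unchanged as a placeholder, whereas the text only states that a balanced string is mapped to some balanced string. The function names are hypothetical.

```python
def mbf(bits):
    """Majority boolean function; returns None for a balanced (tied) string."""
    ones = sum(bits)
    zeros = len(bits) - ones
    if ones == zeros:
        return None
    return 1 if ones > zeros else 0

def majority_correct(bits):
    """Assumed classical action of M(N): set every bit to the majority value."""
    m = mbf(bits)
    # Placeholder for the balanced case: leave the string as it is.
    return list(bits) if m is None else [m] * len(bits)

print(majority_correct([1, 0, 1]))     # N = 3, single flip on the middle bit
print(majority_correct([0, 1, 1, 0]))  # N = 4, balanced string
```

For N = 3 a balanced string cannot occur, which is why the three-qubit gadget always has a well-defined majority.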
Consider the M gate in Fig. 5. The C gate can now be chosen to be a series of multi-controlled NOT gates targeting the data string. In general, it will be all possible k-controlled-NOT gates targeting the data, where k is to be chosen from a set $K_N$ which is characteristic of every string length N, e.g. $K_3 = \{2\}$, $K_5 = \{4, 3\}$, $K_7 = \{6, 4\}$. For the purposes of this paper the N = 3 case is of special interest, and C is just a TOFFOLI gate, which is assumed to be in our library of physical operations. For notational purposes we write M for M(3). Note that with this majority voting gadget we can build the corresponding EC gadget for larger BS codes, in the same way we used M to build the 3 × 3 BS code QEC routine.

FIG. 5 (caption). The boxed part of the circuit is in charge of the syndrome extraction, R corresponds to a cyclic permutation of the physical qubits and $|s\rangle$ denotes a codeword $|s_1, \ldots, s_N\rangle$. The last unitary C, targeting the top line, is the one in charge of executing the desired operation, depending on how we choose the controls. All operations depicted here are bitwise and thus transversal.
Parity voter (P )
Another circuit, which we do not use here but which may be of use, is the parity voter circuit, P:
where $q = s_1 \oplus s_2 \oplus \ldots \oplus s_N$. This gate can be executed by slightly modifying the circuit in Fig. 5: we (i) swap the order of C and the last set of CNOTs and (ii) define C as all the possible CNOTs controlled by the ancilla qubits in a column and targeting the data string qubit corresponding to that column. Let us note that we can also vote the parity of a string $|q_1, q_2, \ldots, q_S\rangle$ into its last qubit via $\prod_{k=1}^{S} \mathrm{CNOT}_{i,k}$.
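Voting the parity of a string into its last qubit can be sketched on computational basis states with classical bits. The index convention of the CNOT product above is ambiguous, so the sketch assumes one CNOT from every other qubit (control) onto the last qubit (target); the function names are hypothetical.

```python
def cnot(bits, control, target):
    """Classical action of a CNOT on basis states: target ^= control."""
    bits[target] ^= bits[control]

def vote_parity_into_last(bits):
    """Apply a CNOT from each of the other qubits onto the last one."""
    out = list(bits)
    last = len(out) - 1
    for k in range(last):
        cnot(out, k, last)
    return out

print(vote_parity_into_last([1, 0, 1, 1]))
print(vote_parity_into_last([1, 1, 0]))
```

After the sequence the last bit holds $q_1 \oplus q_2 \oplus \ldots \oplus q_S$, while the other bits are unchanged.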
$|\mathrm{cat}\rangle$ state verification
In contrast to the 9-qubit Bacon-Shor code, other codes need an extra element: $|\mathrm{cat}\rangle$ state verification. This verification stage in general provides a test which, if passed, gives an output state that will not ruin fault tolerance when used as an ancilla in the QEC process and, if failed, indicates that the whole ancilla preparation and verification process must be restarted. The method we develop here is not a "test" but rather a deterministic way of producing "verified" output $|\mathrm{cat}\rangle$ states.
Depending on the EC gadget of choice, one will typically need a way of verifying the state $|\mathrm{cat}\rangle = (1/\sqrt{2})(|0000\rangle + |1111\rangle)$.
Typically these states are prepared and verified through a measurement: if the verifier qubit has not flipped then the state passes the test; if it has flipped then the preparation and verification process must be restarted. The idea is that a two-bit-flip cat state such as $(|0011\rangle + |1100\rangle)/\sqrt{2}$ must pass the verification with probability at most of order $p^2$, so that fault tolerance is maintained. Note that this state is the result of a single failure of a CNOT during the preparation stage; however, the measurement gives us a criterion for discarding it. An extra failure in the measurement must happen for the bad cat state to pass the test, but that is already a $p^2$ event, so the analysis for fault tolerance is valid. This process is non-deterministic in the sense that one error in the measurement can lead to the rejection of a perfectly good ancilla. As we want to avoid measurements, we would ideally avoid such a process altogether; to this end we can execute the circuit in Fig. 7. This gives us a way of preparing our verified $|\mathrm{cat}\rangle$ state ancilla unitarily and deterministically, i.e. without discarding and restarting the process.
M &N for other CSS codes
Beyond the circuits that we presented here, in a deeper sense what our $N^{(X)}$ ($N^{(Z)}$) circuit does is check the parity of all representations of logical X (Z) operators modulo stabilizer operators, and project this information into an X (Z) QR code. For example, the Steane code is defined by the stabilizers {IIIXXXX, IXXIIXX, XIXIXIX, IIIZZZZ, IZZIIZZ, ZIZIZIZ}, so there are seven different implementations of $X_L$ ($Z_L$) using only three X (Z) physical gates. So to execute the $N^{(X)}$ gate we execute the following protocol for all seven logical operators. For the i-th such operator, obtained by applying X operators on the qubits $\alpha_i, \beta_i, \gamma_i$ of the codeword, we: (i) prepare an ancilla $|000\rangle$ state; (ii) apply a bitwise CNOT between the qubits $\alpha_i, \beta_i, \gamma_i$ of the codeword and the ancilla, so that the resulting state of the ancilla is a string $|s_i\rangle$; (iii) vote the parity of $|s_i\rangle$ into its last qubit, $|p(s_i)\rangle$. Note that the process can be executed simultaneously, because no qubit is targeted twice and we are not concerned by propagation of Z-errors
