We present a method for constructing optimal fault-tolerant approximations of arbitrary unitary gates using an arbitrary discrete universal gate set. The method is numerical and its runtime scales exponentially with the number of gates used in the approximation. However, for the specific case of arbitrary single-qubit gates and the fault-tolerant gates permitted by the 7-qubit Steane code, it is shown that the longest practical gate sequences can be found. We also analyse the practicality of the fault-tolerant approximations of the phase rotation gates used in Shor's algorithm and find that simple non-fault-tolerant phase rotations are more robust for realistic error rates. A general scaling law describing how rapidly these fault-tolerant approximations converge to arbitrary single-qubit gates is also determined.
In large-scale quantum computation, every qubit of data is encoded across multiple physical qubits to form a logical qubit, permitting quantum error correction and fault-tolerant computation. Unfortunately, only very small sets of fault-tolerant gates $\mathcal{G}$ can be applied simply and exactly to logical qubits, where $\mathcal{G}$ depends on the number of logical qubits considered, the code used, and the level of complexity one is prepared to tolerate when implementing fault-tolerant gates. Gates outside $\mathcal{G}$ must be approximated with sequences of gates in $\mathcal{G}$. The existence of efficient approximating sequences has been established by the Solovay-Kitaev theorem and subsequent work [1, 2, 3, 4]. In this paper, we describe a numerical procedure taking a universal gate set $\mathcal{G}$, gate $U$, and integer $l$ and outputting an optimal approximation of $U$ using at most $l$ gates from $\mathcal{G}$. This procedure is used to explore the properties of approximations of the single-qubit phase rotation gates $R_{2^d}$ built out of fault-tolerant gates that can be applied to a single Steane code logical qubit. The average rate of convergence of Steane code fault-tolerant approximations to arbitrary single-qubit gates is also obtained.
Section 1 describes the basics of the numerical procedure used to find optimal gate sequences approximating a given gate. A finite universal set of fault-tolerant gates that can be applied to a single Steane code logical qubit is given in Section 2, along with most of their quantum circuits. The complicated circuits of the T-gate, which is required to ensure universality, are described separately in Section 3. Section 4 contains a discussion of phase rotations $R_{2^d}$ and their fault-tolerant approximations, followed by approximations of arbitrary gates in Section 5. Section 6 summarizes the results of this paper and their implications, and points to further work.
Finding optimal approximations
In this section, we outline a numerical procedure that takes a finite gate set $\mathcal{G} \subset U(m)$ that generates $U(m)$, a gate $U \in U(m)$, and an integer $l$, and outputs an optimal sequence $U_l$ of at most $l$ gates from $\mathcal{G}$ minimizing the metric
$$\mathrm{dist}(U, U_l) = \sqrt{\frac{m - |\mathrm{tr}(U^\dagger U_l)|}{m}}. \tag{1}$$
The rationale of Eq. (1) is that if $U$ and $U_l$ are similar, $U^\dagger U_l$ will be close to the identity matrix (possibly up to some global phase) and the absolute value of the trace will be close to $m$. Subtracting this absolute value from $m$ and dividing by $m$ gives a number between 0 and 1. The overall square root is required to ensure that the triangle inequality
$$\mathrm{dist}(U, W) \leq \mathrm{dist}(U, V) + \mathrm{dist}(V, W)$$
is satisfied. This metric has been used in preference to the trace distance used in the Solovay-Kitaev theorem [2, 3], as the trace distance does not ignore global phase, and hence leads to unnecessarily long, phase-correct approximating sequences. Finding optimal gate sequences is a difficult task, and the runtime of the numerical procedure presented here scales exponentially with $l$. Nevertheless, as we shall see in Section 4, gate sequences of sufficient length for practical purposes can be obtained.
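As a minimal sketch of the metric described above, the following NumPy function assumes Eq. (1) has the form $\sqrt{(m - |\mathrm{tr}(U^\dagger U_l)|)/m}$, as the verbal description suggests. The function name `dist` and the example gates are illustrative choices, not part of the original work.

```python
import numpy as np

def dist(u, v):
    """Global-phase-insensitive distance sqrt((m - |tr(U^dagger V)|) / m)."""
    u = np.asarray(u, dtype=complex)
    v = np.asarray(v, dtype=complex)
    m = u.shape[0]
    overlap = abs(np.trace(u.conj().T @ v))
    # Clip tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(0.0, (m - overlap) / m)))

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])

# Multiplying by a global phase leaves the distance at (numerically) zero.
print(dist(T, np.exp(0.3j) * T))
# Distinct gates give a value between 0 and 1.
print(dist(H, T))
# The triangle inequality can be spot-checked on concrete gates.
print(dist(H, T) <= dist(H, S) + dist(S, T))
```

Because only the absolute value of the trace enters, two sequences that differ by a global phase are treated as identical, which is the property the text relies on.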
For a set $\mathcal{G}$ of size $g = |\mathcal{G}|$ and a maximum sequence length of $l$, the size of the set of all possible gate sequences of length up to $l$ is approximately $g^l$. For even moderate $g$ and $l$, this set cannot be searched exhaustively. To describe the basics of the actual method used, a few more definitions are required. Let $G$ denote a gate in the set $\mathcal{G}$. Order $\mathcal{G}$, and denote the $i$th gate by $G_i$. Let $S$ denote a sequence of gates in $\mathcal{G}$. Order the possible gate sequences in the obvious manner $G_1, G_2, \ldots, G_g, G_1 G_1, G_1 G_2, \ldots$, and let $S_n$ denote the $n$th sequence in this ordering. Let $\{S\}_l$ denote all sequences with length less than or equal to $l$. Let $\{Q\}_{l'}$, $l' < l$, denote the set of unique sequences of length at most $l'$. Naively, $\{Q\}_{l'}$ can be constructed by starting with the set containing the identity matrix, sequentially testing whether $S_n \in \{S\}_{l'}$ satisfies $\mathrm{dist}(S_n, Q) > 0$ for all $Q \in \{Q\}_{l'}$, and adding $S_n$ to $\{Q\}_{l'}$ if it does.
A search for an optimal approximation of $U$ using gates in $\mathcal{G}$ begins with the construction of a very large set of unique sequences $\{Q\}_{l'}$. The utility of $\{Q\}_{l'}$ lies in its ability to predict which sequences in $\{S\}_l$, $l > l'$, do not need to be compared with $U$ to determine whether they are good approximations, and what the next sequence worth comparing is. To be more precise, assume every sequence up to $S_{n-1}$ has been compared with $U$. Let $\{S_{n-1}\}$ denote this set of compared sequences. Consider subsequences of $S_n$ of length $l'$. If any subsequence is not in $\{Q\}_{l'}$, there exists a sequence in $\{S_{n-1}\}$ equivalent to $S_n$. In other words, a sequence equivalent to $S_n$ has already been compared with $U$, and $S_n$ can be skipped. Furthermore, let
where
The next sequence with the potential to not be equivalent to a sequence in {S n−1 } is
The process of checking subsequences is then repeated on this new sequence. Skipping sequences in this manner is vastly better than an exhaustive search, and enables optimal sequences of interesting length to be obtained. It should be stressed, however, that the runtime is still exponential in $l$.
Highly non-optimal but polynomial runtime sequence finding techniques do exist [2, 3, 5, 6] but will not be discussed here.
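The idea of comparing only unique sequences with $U$ can be illustrated with a much simpler variant than the subsequence-skipping procedure of this section. The sketch below is not the paper's algorithm: it performs a breadth-first enumeration that keeps every unique product in memory, uses the three-gate set {H, S, T} rather than the full set $\mathcal{G}$, and assumes the metric form $\sqrt{(m - |\mathrm{tr}(U^\dagger U_l)|)/m}$. All function names are hypothetical.

```python
import numpy as np

def dist(u, v):
    m = u.shape[0]
    return float(np.sqrt(max(0.0, (m - abs(np.trace(u.conj().T @ v))) / m)))

def phase_key(mat, decimals=6):
    """Hashable fingerprint that is identical for matrices equal up to global phase."""
    flat = mat.ravel()
    pivot = flat[np.argmax(np.abs(flat) > 1e-9)]  # first non-negligible entry
    canon = np.round(flat * (abs(pivot) / pivot), decimals)
    return tuple(canon.real) + tuple(canon.imag)

def unique_sequences(gates, max_len):
    """Breadth-first list of (name, matrix) pairs, one per distinct gate product."""
    identity = np.eye(2, dtype=complex)
    seen = {phase_key(identity): ("", identity)}
    frontier = [("", identity)]
    for _ in range(max_len):
        next_frontier = []
        for name, mat in frontier:
            for gate_name, gate in gates.items():
                prod = mat @ gate
                key = phase_key(prod)
                if key not in seen:  # redundant sequences are never extended
                    seen[key] = (name + gate_name, prod)
                    next_frontier.append((name + gate_name, prod))
        frontier = next_frontier
    return list(seen.values())

def best_approximation(target, gates, max_len):
    """Shortest-first search for the unique product closest to `target`."""
    return min(unique_sequences(gates, max_len), key=lambda item: dist(target, item[1]))

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])
gates = {"H": H, "S": S, "T": T}

target = np.diag([1, np.exp(1j * np.pi / 8)])  # an illustrative target gate
name, approx = best_approximation(target, gates, max_len=8)
print(name or "I", dist(target, approx))
```

Names are written in matrix-product order. Storing every unique product limits this variant to short sequences, whereas the procedure described in the text only stores unique sequences up to the shorter length $l'$ and uses them to skip redundant longer ones.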
Simple Steane code single-qubit gates
For the remainder of the paper we will restrict our attention to fault-tolerant single-qubit gates that can be applied to the 7-qubit Steane code. The Steane code representation of states $|0\rangle$ and $|1\rangle$ is [7]
An equivalent description of this code can be given in terms of stabilizers [8], which are operators that map the logical states $|0_L\rangle$ and $|1_L\rangle$ to themselves.
States $|0_L\rangle$ and $|1_L\rangle$ are the only two that are simultaneously stabilized by Eqs (8-13). Non-fault-tolerant circuits for both a general and a linear nearest neighbour (LNN) architecture that take an arbitrary state $\alpha|0\rangle + \beta|1\rangle$ and produce $\alpha|0_L\rangle + \beta|1_L\rangle$ are shown in Fig. 1. The fault-tolerant preparation of logical states is more complicated, and will be discussed in detail in Section 3.
The minimal universal set of single-qubit fault-tolerant gates that can be applied to a Steane code logical qubit consists of just the Hadamard gate and the T-gate
To justify the use of such a large set $\mathcal{G}$, consider the circuits shown in Fig. 2 implementing $H$, $X$, $Z$, $S$ and $S^\dagger$. By combination, it can be seen that gates $\{G_6, \ldots, G_{23}\}$ can also be implemented with simple transversal applications of single-qubit gates. As we shall see in Section 3, the T-gate is by comparison extremely complicated to implement. Since we are interested in minimal complexity as well as minimum length sequences of gates in $\mathcal{G}$, it would be unreasonable to count $G_{23}$ as three gates when in reality it can be implemented as easily as any other gate in $\{G_1, \ldots, G_{22}\}$. Since $\{I, G_1, \ldots, G_{23}\}$ is a group under multiplication, minimum length sequences of gates approximating some $U$ outside $\mathcal{G}$ will alternate between an element of $\{G_1, \ldots, G_{23}\}$ and a T-gate. Note that the $T^\dagger$-gate is not required in $\mathcal{G}$ for universality or efficiency as, in gate sequences of length $l \geq 2$, it is equally efficient to use $S^\dagger T$ or $T S^\dagger$. The extra $S^\dagger$-gate is absorbed into neighboring $G_i$-gates, $i < 24$.
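The group property used in this argument can be checked numerically. The sketch below assumes that $\{I, G_1, \ldots, G_{23}\}$ is the set of all products of $H$ and $S$ taken up to global phase (the single-qubit Clifford group), which is consistent with the gates named in the text but is not spelled out there. It also checks that $T^\dagger$ equals both $S^\dagger T$ and $T S^\dagger$. The helper names are hypothetical.

```python
import numpy as np

def phase_key(mat, decimals=6):
    """Hashable fingerprint that is identical for matrices equal up to global phase."""
    flat = mat.ravel()
    pivot = flat[np.argmax(np.abs(flat) > 1e-9)]  # first non-negligible entry
    canon = np.round(flat * (abs(pivot) / pivot), decimals)
    return tuple(canon.real) + tuple(canon.imag)

def generate_group(generators):
    """Close a set of generators under multiplication, ignoring global phase."""
    identity = np.eye(2, dtype=complex)
    elements = {phase_key(identity): identity}
    frontier = [identity]
    while frontier:
        new_elements = []
        for mat in frontier:
            for gen in generators:
                prod = mat @ gen
                key = phase_key(prod)
                if key not in elements:
                    elements[key] = prod
                    new_elements.append(prod)
        frontier = new_elements
    return list(elements.values())

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])

clifford = generate_group([H, S])
print(len(clifford))  # 24 elements: the identity plus 23 non-trivial gates

# T-dagger is reproduced exactly by one S-dagger and one T, in either order.
S_dag, T_dag = S.conj().T, T.conj().T
print(np.allclose(S_dag @ T, T_dag), np.allclose(T @ S_dag, T_dag))
```

Since the closure terminates at 24 elements, any product of two such gates is again a single such gate, which is why optimal sequences alternate between one group element and one T-gate.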
The fault-tolerant T-gate
Moving on to implementing the fault-tolerant T-gate [3], the basic idea is to prepare an ancilla state $|0_L\rangle + e^{i\pi/4}|1_L\rangle$ then apply the circuit shown in Fig. 3. Tracing the action of Fig. 3, we initially have
After measuring the lower logical qubit, if $|0_L\rangle$ is observed (meaning one of the eight bit strings shown in Eq. (5) or a bit string a single bit different from one of these eight), no further action is required. If $|1_L\rangle$ is observed, applying the logical gate $SX$ to the top qubit will yield the desired state up to an irrelevant global phase. Note that the measurement step and subsequent classical processing allow the correction of a single bit-flip error and are insensitive to phase errors.
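The measurement-and-correction step can be checked on single unencoded qubits. The sketch below assumes a particular layout for the circuit of Fig. 3, which is not reproduced here: the ancilla $|0\rangle + e^{i\pi/4}|1\rangle$ on the top wire acts as the control of a CNOT whose target is the data qubit on the lower wire, and the lower qubit is then measured. The function name `t_gadget` is hypothetical.

```python
import numpy as np

T = np.diag([1, np.exp(1j * np.pi / 4)])
S = np.diag([1, 1j])
X = np.array([[0, 1], [1, 0]], dtype=complex)

def t_gadget(psi, outcome):
    """Normalised top-qubit state after the lower qubit is measured as `outcome`."""
    ancilla = np.array([1, np.exp(1j * np.pi / 4)]) / np.sqrt(2)
    # Joint state indexed as state[top, bottom]: ancilla on top, data below.
    state = np.kron(ancilla, psi).reshape(2, 2)
    # CNOT controlled by the top qubit: flip the bottom qubit when top = 1.
    state[1] = state[1, ::-1].copy()
    top = state[:, outcome]            # project the lower qubit onto `outcome`
    top = top / np.linalg.norm(top)
    if outcome == 1:
        top = S @ X @ top              # correction SX (X is applied first)
    return top

psi = np.array([0.6, 0.8j])            # arbitrary normalised input state
for outcome in (0, 1):
    overlap = abs(np.vdot(T @ psi, t_gadget(psi, outcome)))
    print(outcome, overlap)            # overlap 1 means T|psi> up to global phase
```

Under this assumed layout both measurement outcomes leave the top qubit in $T|\psi\rangle$ up to a global phase, matching the behaviour described in the text.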
To fault-tolerantly prepare the ancilla state, we first need to be able to fault-tolerantly prepare the state $|0_L\rangle$. As we shall see, to do this, we need to be able to fault-tolerantly determine whether a state $|\Psi\rangle$ is in the +1 or −1 eigenstate of a self-inverse operator $A$ ($A^2 = I$). A non-fault-tolerant circuit doing this is shown in Fig. 4. It is instructive to trace the action of the circuit. The initial state is $|0\rangle|\Psi\rangle$, which after the first Hadamard becomes $(|0\rangle + |1\rangle)|\Psi\rangle$. After the controlled-$A$ the state becomes $|0\rangle|\Psi\rangle + |1\rangle A|\Psi\rangle$. After the second Hadamard the state is
$$|0\rangle(|\Psi\rangle + A|\Psi\rangle) + |1\rangle(|\Psi\rangle - A|\Psi\rangle).$$
If a zero is measured, the lower qubit will be in the +1 eigenstate $|\Psi\rangle + A|\Psi\rangle$. Conversely, if a one is measured, the lower qubit will be in the −1 eigenstate $|\Psi\rangle - A|\Psi\rangle$. The specific self-inverse operators we wish to measure are the stabilisers Eqs (8-10). To build a fault-tolerant circuit measuring these multiple-qubit operators, the control qubit shown in Fig. 4 must be replaced by a cat state so that each qubit modified by the stabiliser is controlled by a different qubit in the cat state. This is necessary to prevent a single error in a control qubit propagating to multiple target qubits. This in turn necessitates fault-tolerant cat state preparation, which is shown in Fig. 5a [9]. A single bit- or phase-flip anywhere in this circuit causes at most one error in the final state. This circuit is significantly simpler, and no less robust, than the fault-tolerant cat state preparation circuit suggested in [3] (Fig. 6). The uncat circuit of Fig. 5b is fault-tolerant purely because its output is a single qubit and by definition a single error can cause at most one error in the output.
Using the circuit notation shown in Fig. 7, the complete circuit for fault-tolerantly measuring a stabiliser is shown in Fig. 8. Note that the basic stabiliser measurement circuit appears three times, as a single error in a cat state block, while not propagating to multiple qubits in the logical state block, almost always causes an incorrect measurement. To ensure a probability $O(p^2)$ of incorrect measurement, the process must be repeated up to three times. The third measurement structure can be omitted if the first two measurements are the same. The final triply controlled Z-gate is only applied if the majority of the measurements are one. Note that this assumes fast and reliable classical processing is available. The final Z-gate converts a −1 eigenstate of $XIXIXIX$ into a +1 eigenstate. Thus the output of Fig. 8 is the +1 eigenstate of $XIXIXIX$ with probability $O(p^2)$ of failure (i.e. more than one incorrect output qubit).
We now have the necessary tools to fault-tolerantly prepare $|0_L\rangle$. Recall that $|0_L\rangle$ and $|1_L\rangle$ are the only two states simultaneously stabilised by all of Eqs (8-13). If we include the logical $Z$ operator, $|0_L\rangle$ is the unique state stabilised by $Z$ and all six stabilisers. The state $|0_L\rangle$ could thus be created using the circuit of Fig. 9, which outputs $|0_L\rangle$ for arbitrary input [3].
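The non-fault-tolerant eigenvalue measurement of Fig. 4, traced earlier in this section, can be simulated directly. The sketch below is a plain state-vector calculation with a single control qubit (no cat state), and it uses $A = X \otimes X$ purely as an illustrative self-inverse operator. The function name and the choice of $A$ are assumptions made for the example.

```python
import numpy as np

def measure_eigenvalue(a_op, psi, outcome):
    """Probability of `outcome` on the control qubit and the resulting lower-register state.

    Assumes the requested outcome has non-zero probability.
    """
    dim = len(psi)
    had = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    hadamard_on_control = np.kron(had, np.eye(dim))
    # Controlled-A: identity when the control is |0>, A when it is |1>.
    controlled_a = np.block([[np.eye(dim), np.zeros((dim, dim))],
                             [np.zeros((dim, dim)), a_op]])
    state = np.kron([1, 0], psi)                       # |0>|psi>
    state = hadamard_on_control @ controlled_a @ hadamard_on_control @ state
    branch = state.reshape(2, dim)[outcome]            # unnormalised post-measurement state
    prob = float(np.vdot(branch, branch).real)
    return prob, branch / np.sqrt(prob)

X = np.array([[0, 1], [1, 0]], dtype=complex)
A = np.kron(X, X)                                      # self-inverse: A @ A = I

rng = np.random.default_rng(0)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)

p0, plus_state = measure_eigenvalue(A, psi, 0)
p1, minus_state = measure_eigenvalue(A, psi, 1)
print(np.allclose(A @ plus_state, plus_state))         # outcome 0 gives a +1 eigenstate
print(np.allclose(A @ minus_state, -minus_state))      # outcome 1 gives a -1 eigenstate
print(p0 + p1)                                         # the two outcomes exhaust all cases
```

The fault-tolerant version replaces the single control qubit with a cat state, as described above, but the projection onto the ±1 eigenspaces is the same.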
A better way of obtaining $|0_L\rangle$ is to start with the state $|0000000\rangle$, which is physically accessible in a quantum computer architecture either via some form of special reset operation, or measurement possibly followed by an X-gate. State $|0000000\rangle$ is a +1 eigenstate of logical $Z$ and Eqs (11-13), therefore only stabilisers Eqs (8-10) need to be measured. To complete the construction of the ancilla state, and hence the T-gate (Fig. 14), the operator $e^{i\pi/4}X$ is measured. Note that $e^{i\pi/4}X$ is not self-inverse, but nevertheless the circuit works as required. Specifically, before cat state preparation we have $|000\rangle|0_L\rangle$. After cat state creation this becomes
After transversal CNOT
Note that only three physical CNOT gates are required to implement a logical CNOT gate on the Steane code due to its stabiliser structure. After the single-
After uncat 
Under the assumptions that 2-qubit gates, measurement, reset and classical processing each have depth 1, single-qubit gates have depth zero and do not contribute to the gate count, and arbitrary disjoint 2-qubit gates can be implemented in parallel, Table 1 summarises the best case complexity of the T-gate.
Approximations of phase gates
We now use the machinery described in this paper to construct optimal fault-tolerant approximations of phase rotation gates
Gates $R_{2^d}$ are examples of gates used in the single-qubit quantum Fourier transform that forms part of the Shor circuits described in [10, 11]. Note that phase rotations of angle $2\pi x/2^d$, where $x$ is a $d$-digit binary number, are also required, but the properties of fault-tolerant approximations of such gates can be inferred from $R_{2^d}$.
For a given $R_{2^d}$ and maximum number $l$ of gates in $\mathcal{G}$, Fig. 12 shows $\mathrm{dist}(R_{2^d}, U_l)$, where $U_l$ is an optimal sequence of at most $l$ gates in $\mathcal{G}$ minimising $\mathrm{dist}(R_{2^d}, U_l)$. For $d \geq 3$, $U_1$ is equivalent to the identity. Note that as $d$ increases, $R_{2^d}$ becomes closer and closer to the identity, lowering the value of $\mathrm{dist}(R_{2^d}, U_1)$, and increasing the value of $l$ required to obtain an approximation $U_l$ that is closer to $R_{2^d}$ than the identity. In fact, for $R_{128}$ the shortest sequence of gates that provides a better approximation of $R_{128}$ than the identity has length $l = 31$. There are a very large number of optimal sequences of this length. An example of one with a minimal number of T-gates is
HT (SH)T (SH)T (SH)T HT HT (SH)T HT HT (SH)T HT HT HT (SH)T (S†
Figure 12: Optimal fault-tolerant approximations $U_l$ of phase rotation gates
Figure 13: Circuit exactly implementing $R_{128}$ by first decoding the logical qubit and re-encoding after application of $R_{128}$.
Note that $\mathrm{dist}(R_{128}, I) = 8.7 \times 10^{-3}$ whereas $\mathrm{dist}(R_{128}, U_{33}) = 8.1 \times 10^{-3}$. This raises the question of how many gates are required to construct a sufficiently good approximation.
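The quoted distance between $R_{128}$ and the identity can be reproduced with a short calculation. The derivation below rests on two assumptions inferred from the surrounding text rather than stated in it: that the metric is $\mathrm{dist}(U, V) = \sqrt{(m - |\mathrm{tr}(U^\dagger V)|)/m}$ with $m = 2$, and that $R_{2^d} = \mathrm{diag}(1, e^{i\pi/2^d})$. Both are adopted here because together they reproduce the quoted value.

```latex
\begin{align}
\mathrm{tr}\!\left(I^\dagger R_{2^d}\right) &= 1 + e^{i\pi/2^d},
\qquad
\left|\mathrm{tr}\!\left(I^\dagger R_{2^d}\right)\right| = 2\cos\!\left(\frac{\pi}{2^{d+1}}\right), \\
\mathrm{dist}(R_{2^d}, I) &= \sqrt{1 - \cos\!\left(\frac{\pi}{2^{d+1}}\right)}
 = \sqrt{2}\,\sin\!\left(\frac{\pi}{2^{d+2}}\right), \\
\mathrm{dist}(R_{128}, I) &= \sqrt{2}\,\sin\!\left(\frac{\pi}{512}\right) \approx 8.7 \times 10^{-3}
\quad (d = 7).
\end{align}
```

Under these assumptions the distance to the identity roughly halves each time $d$ increases by one, which is why progressively longer sequences are needed before any approximation beats the identity.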
In [11], it was shown that
was sufficiently close to $R_{128}$. This is, of course, only a property of Shor's algorithm, not a universal property of quantum circuits. Given $\mathrm{dist}(R_{128}, U) = 2.2 \times 10^{-3}$, a sufficiently accurate fault-tolerant approximation $U_l$ of $R_{128}$ must therefore satisfy $\mathrm{dist}(R_{128}, U_l) < 2.2 \times 10^{-3}$. The smallest value of $l$ for which this is true is 46, and one of the many optimal gate sequences satisfying $\mathrm{dist}(R_{128}, U_{46}) = 7.5 \times 10^{-4}$ is
$U_{46}$ = HT HT HT (SH)T HT (SH)T (SH)T (SH)T HT (SH)T (SH)T HT HT (SH)T (SH)T HT (SH)T (SH)T (SH)T HT (SH)T HT (HS
Now that we have a minimal complexity circuit sufficiently close to $R_{128}$, the immediate question is whether it is practical. An alternative to Eq. (27) is shown in Fig. 13, which simply decodes the logical qubit, applies $R_{128}$, then re-encodes. This simple non-fault-tolerant circuit will fail (generate more than one error in the output logical qubit) if a single error occurs almost anywhere in the top six qubits. The probability of no errors in the top six qubits is $(1 - p)^{66}$. This is the worst-case reliability of the circuit.
A partial schematic of the circuit corresponding to Eq. (27) is shown in Fig. 14. As the circuit is fault-tolerant, it only fails if at least two errors occur within the circuit. Any analysis of the reliability of the circuit is complicated by the fact that the T-gates that comprise the bulk of the circuit have error correction built in at a number of places. Furthermore, when errors are detected and corrected, the circuit typically increases in depth. Referring back to Fig. 11, we shall assume that the T-gate is only sensitive to errors in the lower 14 qubits and that the depth of the circuit is never increased by errors. From Table 1, the total area of the circuit sensitive to errors is then approximately 30000, giving a reliability of $(1 - p)^{30000} + 30000p(1 - p)^{29999}$, which is only greater than that of the non-fault-tolerant circuit for $p < 1.4 \times 10^{-7}$.
A fault-tolerant circuit correcting an arbitrary single error in a Steane logical qubit is shown in Fig. 15. By applying error correction to the logical qubit after each T-gate in Fig. 14, the reliability of the circuit can be increased. The best case depth of Fig. 15 is 120, and if we assume that the circuit is only sensitive to errors in the lower seven qubits, the total area sensitive to errors is approximately 800. The error corrected $U_{46}$ circuit will only fail if two errors occur within a single T-gate and error correction block. The reliability of a single block is $(1 - p)^{2100} + 2100p(1 - p)^{2099}$. The failure probabilities of the non-fault-tolerant circuit, the fault-tolerant circuit without correction, with correction after every second T-gate, and with correction after every T-gate are compared in Fig. 16.
Of the fault-tolerant circuits, the one with error correction after every T-gate performs best. Nevertheless, this circuit is still only more reliable than the non-fault-tolerant circuit for $p < 1.3 \times 10^{-6}$. Given that $p \sim 10^{-6}$ is likely to be very difficult to achieve in practice, longer error correction code words or concatenation would be required to make the fault-tolerant circuit practical. Given that Fig. 14 is both extremely complex and the simplest fault-tolerant circuit sufficiently close to $R_{128}$, for practical computation non-fault-tolerant circuits similar to Fig. 13 are likely to remain the best way to implement arbitrary rotations for the foreseeable future.
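The reliability comparison can be reproduced approximately with a few lines of arithmetic. The sketch below assumes independent errors with probability $p$ per location, a sensitive area of 66 for the non-fault-tolerant circuit, 30000 for the uncorrected fault-tolerant circuit (modelled with the same "at most one error" formula as a single block), and 23 blocks of area 2100 when correction follows every T-gate. The block count of 23 is an assumption based on half of the 46 gates being T-gates, and all function names are hypothetical.

```python
def reliability_at_most_one_error(p, area):
    """Probability of zero or one error among `area` independent error locations."""
    return (1 - p) ** area + area * p * (1 - p) ** (area - 1)

def failure_probabilities(p, t_gates=23):
    """Failure probabilities: (non-fault-tolerant, FT uncorrected, FT corrected)."""
    non_ft = 1 - (1 - p) ** 66
    ft_uncorrected = 1 - reliability_at_most_one_error(p, 30000)
    ft_corrected = 1 - reliability_at_most_one_error(p, 2100) ** t_gates
    return non_ft, ft_uncorrected, ft_corrected

def crossover(circuit_index, lo=1e-9, hi=1e-4, steps=200):
    """Error rate below which the chosen fault-tolerant circuit beats the non-FT one."""
    for _ in range(steps):
        mid = (lo * hi) ** 0.5          # bisect on a logarithmic scale
        probs = failure_probabilities(mid)
        if probs[circuit_index] < probs[0]:
            lo = mid                    # fault-tolerant circuit still better here
        else:
            hi = mid
    return lo

print(crossover(1))  # threshold for the uncorrected fault-tolerant circuit
print(crossover(2))  # threshold with correction after every T-gate
```

Under these assumptions the two crossovers land close to the thresholds quoted in the text, of order $10^{-7}$ without intermediate correction and of order $10^{-6}$ with correction after every T-gate.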
In Shor's algorithm, the use of non-fault-tolerant rotations would be acceptable as only $2L$ such gates are used to factorise an $L$-bit number $N$. Furthermore, only half of Fig. 13 would be required as these gates immediately precede measurement, and there is no point re-encoding before measurement. In a 4096-bit factorisation, the total area of non-fault-tolerant circuit would be approximately $2 \times 10^5$. Assuming the rest of the Shor circuit uses sufficient error correction to be reliable, if $p \sim 10^{-5}$, the average number of errors in the non-fault-tolerant part of the circuit would be 2, which is completely manageable with just a few repetitions of the entire circuit or minimal classical processing.
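The error budget quoted for the 4096-bit case follows from simple arithmetic. The sketch below uses the area and error rate from the text and adds one conservative assumption that is not in the original: a run counts as successful only if no error at all occurs in the non-fault-tolerant part, with errors independent across locations. The function name is hypothetical.

```python
def non_ft_error_budget(area=2e5, p=1e-5):
    """Expected errors, chance of an error-free run, and mean runs per clean run."""
    expected_errors = area * p
    p_error_free = (1 - p) ** area
    expected_runs = 1 / p_error_free
    return expected_errors, p_error_free, expected_runs

errors, clean, runs = non_ft_error_budget()
print(errors)  # average number of errors per run
print(clean)   # probability that a run has no errors at all
print(runs)    # average number of runs needed to obtain one clean run
```

Even under this pessimistic success criterion the expected number of repetitions stays in the single digits, in line with the claim that a few repetitions suffice.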
Approximations of arbitrary gates
In this section, we investigate the properties of fault-tolerant approximations of arbitrary single-qubit gates
Consider Fig. 17. This was constructed using 1000 random matrices $U$ of the form Eq. (28) with $\alpha$, $\beta$, $\theta$ uniformly distributed in $[0, 2\pi)$. Optimal fault-tolerant approximations $U_l$ were constructed of each, with the average $\mathrm{dist}(U, U_l)$ plotted for each $l$. The indicated line of best fit has the form
$$\delta = 0.292 \times 10^{-0.0511 l}. \tag{29}$$
This equation characterises the average number $l$ of Steane code single-qubit fault-tolerant gates required to obtain a fault-tolerant approximation $U_l$ of an arbitrary single-qubit gate $U$ to within $\delta = \mathrm{dist}(U, U_l)$. An important point to note is that even with unlimited resources Eq. (29) does not provide a pathway to construct arbitrarily accurate gates. The accuracy of the fault-tolerant T-gate described in Section 3 depends critically on the accuracy of a single physical T-gate (Eq. (22)). Any over- or under-rotation at this point will be directly reflected in the output state of the logical qubit. Since half the gates in an optimal fault-tolerant approximation are T-gates, as the number of gates increases rotation errors will inevitably accumulate.
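The scaling law can be inverted to estimate how many gates an average approximation needs for a target accuracy. The sketch below takes the fitted form $\delta = 0.292 \times 10^{-0.0511 l}$, whose coefficients appear in the inequality later in this section. The function name and the example target of $2.2 \times 10^{-3}$ (the accuracy required for $R_{128}$ in Section 4) are illustrative choices.

```python
import math

def gates_needed(delta, a=0.292, b=0.0511):
    """Average sequence length l at which the fit a * 10**(-b * l) reaches `delta`."""
    return math.log10(a / delta) / b

# Average length needed to reach the accuracy required of R_128 in Section 4.
print(gates_needed(2.2e-3))
# Each additional decimal digit of accuracy costs about 1 / 0.0511 extra gates.
print(gates_needed(1e-4) - gates_needed(1e-3))
```

The estimate for $2.2 \times 10^{-3}$ comes out a few gates below the 46 found for $R_{128}$, which is unsurprising since the fit describes an average over random gates rather than any particular one.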
Consider the over-rotation gate
where $\theta \ll 1$. For sufficiently small $\theta$, $\delta = \mathrm{dist}(I, I_\theta) = \sqrt{3/8}\,\theta$. In the logical T-gate, even if there is systematic over-rotation in the single physical T-gate, the stochastic nature of Eq. (23) ensures that the final logical state will be out by a random angle $\pm\theta$. This implies that a fault-tolerant approximation involving $l/2$ T-gates will be uncertain by an amount $\delta = \sqrt{3l/16}\,\theta$. The inequality $\sqrt{3l/16}\,\theta < 0.292 \times 10^{-0.0511 l}$ therefore sets the maximum number of gates that can meaningfully be included in a fault-tolerant approximation. Note that even if $\theta \sim 10^{-4}$, $l_{\max}$ is only 60, a number of gates accessible using the algorithm described in this paper.
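The bound on the useful sequence length can be evaluated numerically. The sketch below assumes a random-walk reading of the accumulated rotation error, in which the uncertainty after $l/2$ T-gates grows as $\sqrt{3l/16}\,\theta$, and compares it with the fitted accuracy $0.292 \times 10^{-0.0511 l}$. The square-root form is an assumption about the intended expression, and the function name is hypothetical.

```python
import math

def max_useful_length(theta, a=0.292, b=0.0511):
    """Largest l for which accumulated rotation error stays below the fitted accuracy."""
    l = 0
    # The left side grows with l and the right side shrinks, so stop at the first failure.
    while math.sqrt(3 * (l + 1) / 16) * theta < a * 10 ** (-b * (l + 1)):
        l += 1
    return l

print(max_useful_length(1e-4))  # 57
print(max_useful_length(1e-3))  # smaller: a sloppier physical T-gate allows fewer gates
```

With $\theta = 10^{-4}$ this gives a limit in the high fifties, consistent with the figure of roughly 60 quoted above.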
Conclusion
We have described an algorithm enabling the optimal approximation of arbitrary unitary matrices given a discrete universal gate set. We have used this algorithm to investigate the properties of fault-tolerant approximations of arbitrary single-qubit gates using the gates that can be applied to a single Steane code logical qubit and found that on average an $l$ gate approximation can be found to within $\delta = 0.292 \times 10^{-0.0511 l}$ of the ideal gate. We have considered the specific case of the phase rotation gates used in Shor's algorithm and found that even the minimal complexity fault-tolerant circuits obtained are still so large that they are outperformed by non-fault-tolerant equivalents. The work here suggests that practical quantum algorithms should avoid, where possible, logical gates that must be implemented using an approximate sequence of fault-tolerant gates. In the near future, this work will also be extended to multiple-qubit gates and larger circuits.
