Abstract-Every quantum operation can be decomposed into a sequence of single-qubit and Controlled-NOT (C-NOT) gates. In many implementations, single-qubit gates are simpler to perform than C-NOTs, and it is hence desirable to minimize the number of C-NOT gates required to implement a circuit. Previous work has looked at C-NOT-efficient synthesis of arbitrary unitaries and state preparation. Here we consider the generalization to arbitrary isometries from m qubits to n qubits. We derive a theoretical lower bound on the number of C-NOT gates required to decompose an isometry for arbitrary m and n, and give an explicit gate decomposition that achieves this bound up to a factor of about two in the leading order. We also perform some bespoke optimizations in the case of small m and n. In addition, we show how to apply our result for isometries to give a decomposition scheme for an arbitrary quantum operation via Stinespring's theorem, and derive a lower bound on the number of C-NOTs in this case too.
I. INTRODUCTION
Quantum computers would allow us to speed up several important computations including search [1], [2], quantum simulation [3] and factoring [4]. The ability to do the latter would render RSA [5], a widespread cryptographic protocol, unfit for purpose. However, constructing a device capable of performing such computations is one of the biggest challenges facing the field, and many candidate platforms remain in their infancy, operating only with a few qubits at best.
In spite of this, the theory of quantum computation is quite advanced. At an abstract level, a quantum computation corresponds to a unitary operation, and a universal quantum computer should be able to perform arbitrary unitary operations (each to very good precision). Rather than having a bespoke component for each unitary operation, it is convenient to break down such operations in terms of a small family of simple-to-perform gates. This is the aim of the circuit model of quantum computation, which mirrors an analogous model for classical computation, in which an arbitrary computation can be decomposed in terms of (for example) NOT, AND, OR and C-NOT gates.
In the quantum case, several examples of universal gate libraries are known (see for example [6]). In this work we focus on one involving arbitrary single-qubit operations and C-NOT gates. This set is particularly well-suited to certain architectures in which these operations are straightforwardly implemented. Of these operations, C-NOT is often the most difficult to perform since in all experimental architectures it involves connecting the qubits using an additional degree of freedom [7], [8]. Taking this as our motivation, we consider circuits that minimize the number of such gates.
R. Iten is an M.Sc. student of physics at ETH Zürich, Department of Physics, Otto-Stern-Weg 1, 8093 Zürich, Switzerland.
R. Colbeck is with the Department of Mathematics, University of York, YO10 5DD, UK.
I. Kukuljan is an M.Sc. student of mathematical physics at the University of Ljubljana, Faculty of Mathematics and Physics, Jadranska ulica 19, 1000 Ljubljana, Slovenia.
J. Home is with the Institute for Quantum Electronics, ETH Zürich, Otto-Stern-Weg 1, 8093 Zürich, Switzerland.
M. Christandl is with the Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen Ø, Denmark.
This task has been previously considered both for arbitrary unitary operations and for state preparation (see for example [9], [10] and references therein). In [9], a decomposition scheme was found for an arbitrary unitary operation on $n$ qubits that requires $\frac{23}{48} 4^n$ C-NOTs to leading order, approximately twice as many as the best known lower bound. Similarly, in order to prepare a state of $n$ qubits, the best known construction requires $\frac{23}{24} 2^n$ C-NOTs to leading order if $n$ is even [10], and $2^n$ to leading order if $n$ is odd [11], which is again approximately twice the best known lower bound.
State preparation and arbitrary unitaries are special cases of a wider class of operations, isometries. An isometry is an inner-product preserving transformation that maps between two Hilbert spaces that in general have different dimensions. As a simple example, the map $|0\rangle$ to $|0\rangle \otimes |0\rangle$ and $|1\rangle$ to $|1\rangle \otimes |1\rangle$ is an isometry from one qubit to two. Physically, isometries can be thought of as the introduction of ancillas in a fixed state (conventionally $|0\rangle$) followed by a general unitary on the system and ancilla (in the case of the previous example, this unitary can be chosen to be a single C-NOT). However, because its action only has to be specified when the ancilla systems start in state $|0\rangle$, there is a lot of freedom when constructing the general unitary. This freedom can be exploited to lower the number of C-NOTs needed. In the special case where the input and output spaces have the same dimensions, the isometry is a unitary operation, while state preparation corresponds to an isometry from a (trivial) one-dimensional space to that of the required output. In this work we consider the problem of synthesis of general isometries from $m$ qubits to $n \ge m$ qubits.
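The one-to-two-qubit example above can be checked numerically. The following is a minimal sketch in plain Python (the helper names `dagger`, `matmul`, `EMBED` and the other constants are illustrative choices, not notation from this paper): it verifies that $V^\dagger V$ is the identity and that $V$ coincides with "append an ancilla in $|0\rangle$, then apply one C-NOT".

```python
# Sketch: the isometry |0> -> |00>, |1> -> |11> stored as a 4 x 2 matrix
# (rows: two-qubit output basis states, columns: one-qubit input basis states).

def dagger(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(M[r][c]).conjugate() for r in range(len(M))]
            for c in range(len(M[0]))]

def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

V = [[1, 0],
     [0, 0],
     [0, 0],
     [0, 1]]

# Appending an ancilla in state |0>: |b> -> |b>|0> = |b0>.
EMBED = [[1, 0],
         [0, 0],
         [0, 1],
         [0, 0]]

# C-NOT with the first qubit as control and the second as target.
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

GRAM = matmul(dagger(V), V)           # V^dagger V, the 2 x 2 identity
V_FROM_CIRCUIT = matmul(CNOT, EMBED)  # ancilla in |0> followed by one C-NOT
```

Here `GRAM` equals the $2 \times 2$ identity and `V_FROM_CIRCUIT` reproduces `V` entry by entry.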
The paper is structured as follows. First, in Section II, we summarize relevant background material and introduce our notation. In Section III-A we derive a lower bound on the number of C-NOTs required to decompose an isometry $V$ from $m$ to $n$ qubits of $\frac{1}{2} 2^{m+n} - \frac{1}{4} 2^{2m}$ to leading order. Our main result is presented in Section III-B, where we describe a column-by-column (looking at $V$ as a $2^n \times 2^m$ matrix) decomposition of an isometry. The idea of the decomposition is to decompose the isometry into $2^m$ unitaries, where each of these unitaries implements one column of the isometry. The decomposition of the unitaries for each column develops an idea used for state preparation using uniformly controlled gates [11]. We find that the column-by-column decomposition of an $m$ to $n$ isometry requires $2^{m+n} - \frac{1}{24} 2^n$ C-NOTs to leading order (an explicit upper bound for $n \ge 8$ and $1 \le m \le n$ is given in (39)).
In Section III-C we provide a second decomposition scheme by adapting the scheme of [9], which is based on the Cosine-Sine Decomposition (CSD), to isometries. This approach performs better than the column-by-column approach for $m \simeq n$ (for large $n$, there is improvement only in the cases $m = n-1$ and $m = n$, where the second case corresponds exactly to the decomposition of [9]). The CSD approach requires $\frac{23}{144}(4^m + 2 \cdot 4^n)$ C-NOTs to leading order (for $2 \le m \le n$ an explicit upper bound is given in (40)).
Using the two decompositions described above, for large enough $n$, any $m$ to $n$ isometry can be implemented using at most $16/7$ times the number of C-NOTs required by the lower bound.
The asymptotic results are summarized in Table I (see Section III-D). We optimize the C-NOT counts for isometries from m qubits to n qubits for small n in Section IV. The results are summarized in Table II . These are most likely to be of practical relevance for experiments performed in the near future.
In addition, the CSD approach can be used to lower the best-known C-NOT count for state preparation on an odd number of qubits, and we can hence show that, to leading order, $\frac{23}{24} 2^n$ C-NOTs are required for state preparation for both even and odd $n$. Previously, this bound was only known to be achievable for an even number of qubits [10], with a slightly weaker bound in the odd case [10], [11]. By exploiting a known technical trick we can also save one C-NOT gate for state preparation on an even number of qubits. In particular we reach the lowest known C-NOT counts for state preparation on 4 and 5 qubits: assuming the initial state $|0\rangle^{\otimes 4}$ or $|0\rangle^{\otimes 5}$, 8 or 19 C-NOTs are enough to generate an arbitrary 4- or 5-qubit state respectively. In Table III of Appendix B we give an overview of C-NOT counts for some special controlled gates that are used for decompositions arising in this paper.
Experimental groups strive to demonstrate their ability to control a small number of qubits, and the ultimate demonstration would be the ability to do any quantum operation on them (i.e., any completely positive trace-preserving (CPTP) map). Since any such operation can be implemented via an isometry followed by partial trace (using Stinespring's theorem), we can use our decomposition scheme for isometries to generate an efficient way to synthesize arbitrary CPTP maps. This is discussed in Section V.
II. BACKGROUND AND NOTATION
We work in the circuit model of quantum computation in which the fundamental information carriers are qubits. Since it is sufficient to consider only pure states to describe quantum computation, the state of a qubit can be described by a normalized element of a two dimensional complex Hilbert space $\mathcal{H}_1 := \mathrm{span}_{\mathbb{C}}\{|0\rangle, |1\rangle\}$, where $|0\rangle$ and $|1\rangle$ denote two orthonormal basis states, called the computational basis states, and the subscript '1' denotes one qubit. An arbitrary (pure) state $|\psi\rangle$ of a qubit can be written in the form
$$|\psi\rangle = e^{i\gamma/2}\left(\cos\frac{\theta}{2}\,|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}\,|1\rangle\right), \tag{1}$$
where $\gamma, \theta, \varphi \in \mathbb{R}$. The global phase shift $e^{i\gamma/2}$ has no observable effects so we can ignore $\gamma$ and think of the variables $\theta$ and $\varphi$ as determining a point on the unit sphere (usually known as the Bloch sphere [6]). In this representation the basis states $|0\rangle$ and $|1\rangle$ correspond to the north and south pole respectively.
A. Multiple qubits
Consider a system comprising an array of $n$ qubits in a fixed order (an $n$-qubit register). A state $|\psi\rangle$ of an $n$-qubit register can be described by a normalized state living in the $2^n$-dimensional Hilbert space $\mathcal{H}_n = \mathcal{H}_1^{\otimes n}$. Up to global phase (which has no physical significance), a pure state of an $n$-qubit register can be specified by $2^{n+1} - 2$ real parameters. A computational basis state of the Hilbert space $\mathcal{H}_n$ can be written as $|b_{n-1}\rangle \otimes |b_{n-2}\rangle \otimes \cdots \otimes |b_0\rangle$ or, in short notation, as $|b_{n-1} b_{n-2} \dots b_0\rangle$, where $b_i \in \{0, 1\}$. To abbreviate further we write
$$|b_{n-1} b_{n-2} \dots b_0\rangle = \Big|\sum_{i=0}^{n-1} b_i 2^i\Big\rangle_n,$$
i.e., we interpret the bit string $b_{n-1} b_{n-2} \dots b_0$ as a binary number. If $n = 1$ we omit the subindex. Thus, $|1\rangle_3 = |001\rangle = |0\rangle \otimes |0\rangle \otimes |1\rangle$, for example.
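The labelling convention can be made concrete with a small sketch; the function names `ket_label` and `ket_index` are hypothetical helpers introduced here for illustration only.

```python
def ket_label(i, n):
    """Bit string b_{n-1} ... b_0 of the computational basis state |i>_n (n >= 1)."""
    if n < 1 or not 0 <= i < 2 ** n:
        raise ValueError("need n >= 1 and 0 <= i < 2**n")
    return format(i, "0{}b".format(n))

def ket_index(bits):
    """Inverse map: read the bit string b_{n-1} ... b_0 as a binary number."""
    return int(bits, 2)
```

For instance, `ket_label(1, 3)` gives `"001"`, matching $|1\rangle_3 = |001\rangle$ in the text.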
We say that an $n$-qubit state $|\psi\rangle$ has product form if $|\psi\rangle$ can be written in the form $|\psi\rangle = |\psi_{n-1}\rangle \otimes |\psi_{n-2}\rangle \otimes \cdots \otimes |\psi_0\rangle$, where $|\psi_i\rangle \in \mathcal{H}_1$ describes the state of the $(n-i)$th qubit. We call the process of rotating a state to product form disentangling.
B. Single-qubit gates
In the circuit model of quantum computation, information carried in qubit wires is modified by quantum gates, which correspond mathematically to unitary operations. A general $n$-qubit gate can be represented as a $2^n \times 2^n$ unitary matrix, and requires operating on all $n$ qubits simultaneously. However, it is convenient to break such gates down into simpler components, and in this work we focus on arbitrary single-qubit gates and the C-NOT gate.
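In particular, we will use the following single-qubit gates:
$$R_x(\theta) = \begin{pmatrix} \cos\frac{\theta}{2} & -i\sin\frac{\theta}{2} \\ -i\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}, \quad R_y(\theta) = \begin{pmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}, \quad R_z(\theta) = \begin{pmatrix} e^{-i\theta/2} & 0 \\ 0 & e^{i\theta/2} \end{pmatrix},$$
which correspond to rotations by angle $\theta$ about the $x$-, $y$- and $z$-axes of the Bloch sphere. One important special case is the NOT gate, $\sigma_x = iR_x(\pi)$.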
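As a quick numerical illustration, the sketch below assumes the standard convention $R_a(\theta) = e^{-i\theta\sigma_a/2}$, which is the convention consistent with the identity $\sigma_x = iR_x(\pi)$ quoted above. The function names `rx`, `ry`, `rz` are illustrative.

```python
import cmath
import math

def rx(theta):
    """Rotation about the x-axis, assuming R_x(theta) = exp(-i theta sigma_x / 2)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(theta):
    """Rotation about the y-axis, assuming R_y(theta) = exp(-i theta sigma_y / 2)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def rz(theta):
    """Rotation about the z-axis, assuming R_z(theta) = exp(-i theta sigma_z / 2)."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]

# NOT gate obtained as i * R_x(pi); equal to sigma_x up to rounding error.
NOT_GATE = [[1j * entry for entry in row] for row in rx(math.pi)]
```

Up to floating-point rounding, `NOT_GATE` is the matrix with zeros on the diagonal and ones off the diagonal.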
Lemma 1 (ZYZ decomposition): For every unitary operation $U$ acting on a single qubit, there exist real numbers $\alpha$, $\beta$, $\gamma$ and $\delta$ such that
$$U = e^{i\alpha} R_z(\beta) R_y(\gamma) R_z(\delta).$$
A proof of this decomposition can be found in [6]. Note that (by symmetry) Lemma 1 holds for any two orthogonal rotation axes. Lemma 1 shows that a one-qubit gate can be specified by three real parameters neglecting the (physically insignificant) global phase $e^{i\alpha}$. This is analogous to the description of a rotation in 3 dimensions being parameterized in terms of three Euler angles, here $\beta$, $\gamma$ and $\delta$.
Lemma 2: Let $|\psi'\rangle \in \mathcal{H}_1$ and define $r$ such that $\langle\psi'|\psi'\rangle = r^2$. There exist $U_0, U_1 \in SU(2)$ such that
$$U_0|\psi'\rangle = r|0\rangle \quad \text{and} \quad U_1|\psi'\rangle = r|1\rangle. \tag{6}$$
Proof: Define $|\psi\rangle = \frac{1}{r}|\psi'\rangle$ and $|\varphi\rangle = -\langle\psi|1\rangle|0\rangle + \langle\psi|0\rangle|1\rangle \in \mathcal{H}_1$. Then $U_0 = |0\rangle\langle\psi| + |1\rangle\langle\varphi|$ is unitary with $\det U_0 = 1$ and obeys (6). $U_1$ can be obtained analogously.
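The construction in the proof of Lemma 2 can be sketched numerically as follows. The sketch assumes that the statement of the lemma is $U_0|\psi'\rangle = r|0\rangle$ and $U_1|\psi'\rangle = r|1\rangle$; the function names and the handling of the zero vector are illustrative additions.

```python
import math

def lemma2_u0(psi_prime):
    """Return (U0, r) with det U0 = 1 and U0 |psi'> = r |0>."""
    a, b = psi_prime
    r = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    if r == 0:
        return [[1, 0], [0, 1]], 0.0       # nothing to rotate (illustrative choice)
    a, b = a / r, b / r                    # normalized |psi> = (a, b)
    # Rows of U0 are <psi| = (a*, b*) and <phi| = (-b, a),
    # where |phi> = -<psi|1>|0> + <psi|0>|1> = (-b*, a*).
    return [[a.conjugate(), b.conjugate()], [-b, a]], r

def lemma2_u1(psi_prime):
    """Return (U1, r) with det U1 = 1 and U1 |psi'> = r |1> (analogous construction)."""
    a, b = psi_prime
    r = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    if r == 0:
        return [[1, 0], [0, 1]], 0.0
    a, b = a / r, b / r
    return [[b, -a], [a.conjugate(), b.conjugate()]], r

def apply(U, v):
    """Apply a 2 x 2 matrix to a length-2 vector."""
    return [U[0][0] * v[0] + U[0][1] * v[1], U[1][0] * v[0] + U[1][1] * v[1]]
```

For example, for $|\psi'\rangle = 3|0\rangle + 4i|1\rangle$ one has $r = 5$, and `apply(lemma2_u0((3, 4j))[0], (3, 4j))` returns $(5, 0)$ up to rounding.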
C. Multi-qubit gates and quantum circuits
Our circuit decompositions will use one two-qubit operation, the controlled-NOT gate, or C-NOT. It is a unitary operator acting on the four dimensional Hilbert space $\mathcal{H}_2 = \mathrm{span}_{\mathbb{C}}\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$ that flips the second qubit if the first qubit is in the state $|1\rangle$, and otherwise does nothing, i.e., the operator form of a C-NOT gate is given by $|00\rangle\langle 00| + |01\rangle\langle 01| + |11\rangle\langle 10| + |10\rangle\langle 11|$.
Together with arbitrary single-qubit gates, the C-NOT is universal for quantum computation in the sense that an arbitrary $n$-qubit unitary can be decomposed in terms of these gates alone [12]. Motivated by the relative complexity of two-qubit operations over single-qubit ones, in this work we use the number of C-NOT gates required in a decomposition as a measure of the complexity of a gate sequence. This is justified in experimental settings because multi-qubit gates require additional degrees of freedom for their implementation compared to single-qubit gates. This provides additional channels for the introduction of decoherence. The mediated interaction also typically requires longer gate times, increasing susceptibility to the direct qubit decoherence. As an example, the current lowest infidelities achieved experimentally are $< 10^{-6}$ for single-qubit gates [13] and $\sim 10^{-3}$ for two-qubit gates [14].
It is convenient to represent quantum circuits diagrammatically. Each qubit is represented by a wire and gates are shown using a variety of symbols. Conventionally time flows from left to right. For example, the two-qubit circuit
[circuit diagram: a C-NOT followed by $U$ on the lower qubit]
represents a C-NOT operation followed by a unitary on the lower qubit, i.e., the operation $I \otimes U$.
In the next subsection we generalize the example of the C-NOT gate to the concept of controlled gates.
D. Controlled gates
The IF-THEN construction is very important in classical computation. Its quantum counterpart is a controlled operation. In the case of two qubits, we may want to perform $U$ on the second qubit if the first qubit is in state $|1\rangle$ (and otherwise do nothing). This controlled operation can be written in operator form as $|0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes U$, which has block diagonal matrix form $I \oplus U$, and the corresponding circuit is
[circuit diagram: $U$ on the second qubit, controlled on the first qubit]
A C-NOT gate is a special case of such a controlled gate, where $U = \sigma_x$.
Analogously, we can control on the first qubit being in the state $|0\rangle$, with such a gate corresponding to the matrix $U \oplus I$. We denote this using an unfilled circle on the control qubit in a quantum circuit.
With a sequence of such controlled gates we can implement an IF-THEN-ELSE logical operation, which corresponds to a block diagonal matrix of the form $U_0 \oplus U_1$ in the computational basis. We call a gate of this form a uniformly controlled gate (UCG), where some authors also use the terminology of "multiplexed gates" [9]. We draw a UCG with an unfilled square on the control qubit. Note that it is equivalent to a sequence of two controlled unitaries, one controlled on $|0\rangle$ and the other controlled on $|1\rangle$, i.e., in matrix form, $U_0 \oplus U_1 = (U_0 \oplus I)(I \oplus U_1)$.
We will use circuit equivalences like the above throughout this paper. The general meaning of a circuit equivalence is the following: for all possible values of the (free) parameters on the left hand side there exist values for the parameters on the right hand side such that the two sides perform the same operation (up to a global phase). Thus, the above equivalence encodes the statement that for any two-qubit UCG, there exists a unitary controlled on $|0\rangle$ and another controlled on $|1\rangle$ such that the circuits are equivalent. With this convention, Lemma 1 can be expressed as the equivalence of an arbitrary single-qubit gate $U$ with the gate sequence $R_z$, $R_y$, $R_z$,
where the gate R z , for example, denotes a rotation gate with unspecified angle. If we use symbols for certain gates which have not been introduced before, they are considered to be arbitrary quantum gates. Often these are denoted by U . We use ∆ for gates that are diagonal in the computational basis. If the same symbol is used as a placeholder for more than one quantum gate, we mean that all gates are of this form, but the gates themselves don't have to be identical (as in the previous example where although R z appears twice on the right hand side, each instance can have a different rotation angle).
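To illustrate what such an equivalence means in practice ("for every gate on the left there exist parameters on the right"), the following sketch computes angles for the ZYZ form $U = e^{i\alpha} R_z(\beta) R_y(\gamma) R_z(\delta)$ of a given single-qubit unitary. It assumes the convention $R_a(\theta) = e^{-i\theta\sigma_a/2}$; the function names and the particular way of extracting the angles are one possible choice, not taken from the paper.

```python
import cmath
import math

def rz(theta):
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]

def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def matmul2(A, B):
    """Product of two 2 x 2 matrices."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]

def zyz_angles(U):
    """Angles (alpha, beta, gamma, delta) with U = e^{i alpha} Rz(beta) Ry(gamma) Rz(delta)."""
    det = U[0][0] * U[1][1] - U[0][1] * U[1][0]
    alpha = cmath.phase(det) / 2                 # remove the global phase: W = e^{-i alpha} U
    w00 = U[0][0] * cmath.exp(-1j * alpha)       # = e^{-i(beta+delta)/2} cos(gamma/2)
    w10 = U[1][0] * cmath.exp(-1j * alpha)       # = e^{+i(beta-delta)/2} sin(gamma/2)
    gamma = 2 * math.atan2(abs(w10), abs(w00))
    beta = cmath.phase(w10) - cmath.phase(w00)
    delta = -cmath.phase(w10) - cmath.phase(w00)
    return alpha, beta, gamma, delta

def zyz_matrix(alpha, beta, gamma, delta):
    """Rebuild e^{i alpha} Rz(beta) Ry(gamma) Rz(delta) from its angles."""
    M = matmul2(rz(beta), matmul2(ry(gamma), rz(delta)))
    phase = cmath.exp(1j * alpha)
    return [[phase * x for x in row] for row in M]
```

Calling `zyz_matrix(*zyz_angles(U))` reproduces `U` up to rounding error for any $2 \times 2$ unitary `U`, which is the content of the equivalence.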
The types of gate introduced above are readily generalized to allow for controls on several qubits. We use $l$-qubit-$C^u_k(U)$ to denote a gate that performs a different $l$-qubit unitary for each possible state of $k$ control qubits, where $U$ is a place holder for a size $2^k$ set of $2^l$-dimensional unitary operations. If $l = 1$ we abbreviate the notation to $C^u_k(U)$. If we write $R_x$, $R_y$ or $R_z$ instead of $U$, we mean that all the $2^k$ single-qubit gates that determine the UCG are of the form of the corresponding rotation gate.
In order to write such gates out more precisely, we split the Hilbert space of $n$ qubits into a $2^k$-dimensional space corresponding to the control qubits, a $2^l$-dimensional space corresponding to the target qubits and a $2^f$-dimensional space, where $f := (n - l - k)$, corresponding to the free qubits, i.e., the qubits we neither control nor act on:
f −1} and $U_{i_1}$ denotes the quantum gate acting on the target qubits if the control qubits are in the state $|i_1\rangle_k$. If each member of the set $U_{i_1}$ apart from one (call this one $U_j$) is equal to the identity operation, we drop the word "uniformly" and call such an operation a $k$-controlled $l$-qubit gate, denoted by $l$-qubit-$C_k(U_j)$, or more generally a multi-controlled gate (MCG). If $l = 1$ and we want to emphasize the total number $n$ of qubits of the system being considered, we add an $n$ as a second subindex, i.e. $C_k(U)$ becomes $C_{k,n}(U)$ (this notation is only used in the Appendix).
By way of example, the following circuit diagram shows a 2-qubit-$C^u_2(U)$, $C_3(U)$ (or $C_{3,4}(U)$) and $C_2(U)$ (or $C_{2,4}(U)$) gate in this order (from left to right)
Note that the $C_k(U)$ notation does not specify which are the control and which are the target qubits and whether we control on $|1\rangle$ or on $|0\rangle$; these must be made clear in the particular context.
Each uniformly $k$-controlled gate can be decomposed into a sequence of $2^k$ $k$-controlled gates, as should be clear from the following example for the case $k = 2$, $l = n - 2$ and $n \ge 3$.
Note that the symbol "\" stands for a data bus of several (in this case $l$) qubits. We can always transform a gate controlled on $|0\rangle$ on a certain control qubit of an MCG into a gate controlled on $|1\rangle$ using two $\sigma_x$ gates, as illustrated below.
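The block-matrix statements of this subsection can be verified on a small example. The sketch below uses plain Python lists and two integer-valued unitaries chosen purely for illustration ($U_0 = \sigma_x$, $U_1 = \sigma_z$); the helper names are hypothetical.

```python
def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    """Kronecker (tensor) product A (x) B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def direct_sum(A, B):
    """Block diagonal matrix A (+) B."""
    return ([list(row) + [0] * len(B[0]) for row in A]
            + [[0] * len(A[0]) + list(row) for row in B])

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]    # sigma_x
Z = [[1, 0], [0, -1]]   # sigma_z

CNOT = direct_sum(I2, X)                 # controlled on |1>: I (+) sigma_x
UCG = direct_sum(X, Z)                   # uniformly controlled gate U0 (+) U1
AS_TWO_CONTROLLED = matmul(direct_sum(X, I2), direct_sum(I2, Z))

# Conjugating the control qubit with sigma_x turns a |1>-control into a |0>-control.
X_ON_CONTROL = kron(X, I2)
ZERO_CONTROLLED = matmul(X_ON_CONTROL, matmul(direct_sum(I2, Z), X_ON_CONTROL))
```

Here `AS_TWO_CONTROLLED` equals `UCG`, and `ZERO_CONTROLLED` equals `direct_sum(Z, I2)`, i.e. $U \oplus I$ with $U = \sigma_z$.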
III. DECOMPOSITION OF ISOMETRIES
As mentioned in the introduction, detailed constructions already exist for obtaining efficient (in terms of C-NOT count) gate decompositions for arbitrary unitary operations and creating arbitrary (pure) states. These are two extremes of a wider class of operations known as isometries. Such an operation is an inner-product preserving transformation between two Hilbert spaces, and in this work we will consider isometries from $m$ qubits to $n$ qubits, where $m, n \in \mathbb{N}_0$ and $m \le n$. One can think of such an operation physically as one in which $n - m$ qubits are in a known initial state (say $|0\rangle$), while $m$ start in an unknown state and the desired output is a state of $n$ qubits.
A. Lower bound
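In this subsection we prove the following lower bound.
Lemma 3: Let $m$ and $n$ be natural numbers with $n \ge 2$ and $m \le n$ and $V$ be an isometry from $m$ qubits to $n$ qubits. In any decomposition of $V$ into a circuit comprising single-qubit unitaries and C-NOTs, the number of C-NOT gates required, $N_{\mathrm{iso}}(m, n)$, satisfies
$$N_{\mathrm{iso}}(m, n) \ge \left\lceil \frac{1}{4}\left(2^{n+m+1} - 2^{2m} - 2n - m - 1\right) \right\rceil. \tag{9}$$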
The proof follows a similar argument used to derive a theoretical lower bound for general quantum gates [15], [16] or for state preparation [10]. An $m$ to $n$ isometry can be represented by a $2^n \times 2^m$ complex matrix satisfying $V^\dagger V = I_{2^m \times 2^m}$. Therefore such an isometry is described by $2^{n+m+1} - 2^{2m} - 1$ real parameters, where the $-1$ accounts for the physically negligible global phase.
Consider the qubits before any circuit is performed. Of these, $m$ start in an unknown state, while the remaining $n - m$ can each be taken to start in state $|0\rangle$. Consider now applying single-qubit unitaries individually to each of these $n$ qubits. Each such unitary introduces at most 3 parameters (neglecting the global phase shift; cf. Lemma 1). However, for the qubits that start in state $|0\rangle$, only two parameters are introduced (cf. (1)). In order to introduce further parameters, C-NOT gates are required.
One might expect each C-NOT gate to allow the introduction of six real parameters by placing arbitrary single-qubit rotations after the control and target. However, since R z gates commute with control qubits, and R x gates with target qubits, we can introduce at most four parameters for each additional C-NOT gate [15] , [16] . In essence we are using the following circuit
We conclude that we can introduce at most $3m + 2(n - m) + 4r$ real parameters using $r$ C-NOT gates.
In order for this to be a valid gate decomposition, the number of introduced parameters must be at least the number of parameters required to specify an arbitrary $m$ to $n$ isometry. Thus, the number of C-NOTs required for such an isometry, $N_{\mathrm{iso}}(m, n)$, satisfies
$$3m + 2(n - m) + 4 N_{\mathrm{iso}}(m, n) \ge 2^{n+m+1} - 2^{2m} - 1,$$
from which we obtain the claim.
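The parameter-counting argument translates directly into a few lines of arithmetic. The sketch below evaluates the bound obtained from the counts stated in the proof (at most $3m + 2(n-m)$ parameters from the initial single-qubit gates, at most 4 per C-NOT, and $2^{n+m+1} - 2^{2m} - 1$ parameters to be reached); the function names are illustrative.

```python
def isometry_parameters(m, n):
    """Real parameters of an m-to-n isometry, global phase removed."""
    return 2 ** (n + m + 1) - 2 ** (2 * m) - 1

def cnot_lower_bound(m, n):
    """Smallest r with 3m + 2(n - m) + 4r >= number of isometry parameters."""
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    free = 3 * m + 2 * (n - m)                  # parameters available without any C-NOT
    deficit = isometry_parameters(m, n) - free
    return max(0, -(-deficit // 4))             # integer ceiling of deficit / 4
```

For example, `cnot_lower_bound(2, 2)` gives 3 and `cnot_lower_bound(3, 3)` gives 14, in line with the known bound $\lceil (4^n - 3n - 1)/4 \rceil$ for $n$-qubit unitaries, while `cnot_lower_bound(0, 2)` gives 1 for two-qubit state preparation.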
Fig. 1. Implementing the first column of an isometry $V$ from $m \ge 0$ qubits to $n = 4$ qubits. The action of $G_0$ on $|\psi_0^0\rangle := V|0\rangle_m$ can be decomposed into operators $\{G_0^i\}_{i \in \{0,1,2,3\}}$, where
The upper part shows how these gates successively zero the entries of the column, while the lower part gives the circuit representation. The inverse of this decomposition scheme was introduced in [11] for state preparation together with an efficient decomposition of the uniformly controlled gates $G_0^i$ into C-NOTs and single-qubit gates. The symbol "*" denotes an arbitrary complex number.
B. Column-by-column decomposition of an isometry using uniformly controlled one-qubit gates
This subsection is devoted to our main result that we state as the following theorem.
Theorem 4: Let $m$ and $n$ be natural numbers with $n \ge 2$ and $m \le n$ and $V$ be an $m$ to $n$ isometry. There exists a decomposition of $V$ in terms of single-qubit gates and C-NOTs such that the number of C-NOT gates required satisfies
$$N_{\mathrm{iso}}(m, n) \le 2^{m+n} - \frac{1}{24} 2^n + O(n^2)\, 2^m.$$
Note that our proof is constructive, and the exact count is given in (39) of Appendix C. Comparing with (9), we see that for n ≫ m, our construction uses about twice the number of C-NOTs required by the theoretical lower bound.
In the proof we extend a technique for state preparation of [11] so that it applies to arbitrary isometries. We defer a rigorous proof of the theorem to Appendix C, and instead use this section to explain the main ideas behind the argument.
The isometry $V$ can be described by a $2^n \times 2^m$ matrix, which can instead be represented by a $2^n \times 2^n$ unitary matrix $G^\dagger$, by writing $V = G^\dagger I_{2^n \times 2^m}$, where $I_{2^n \times 2^m}$ denotes the first $2^m$ columns of the $2^n \times 2^n$ identity matrix. Note that $G^\dagger$ is not unique (unless $m = n$). Our aim is to find a decomposition of a quantum gate of the form $G^\dagger$ in terms of C-NOTs and single-qubit gates. Since a C-NOT gate is inverse to itself and the inverse of a single-qubit unitary is another single-qubit unitary, this is equivalent to an analogous decomposition of a unitary operation $G$ satisfying
$$G V = I_{2^n \times 2^m}.$$
In essence, the idea is to find a sequence of unitary operations that, when applied to $V$, successively bring it closer to $I_{2^n \times 2^m}$. We will do this in a column-by-column fashion, first choosing a sequence of quantum gates, corresponding to a unitary $G_0$ that gets the first column right, i.e., $G_0 V|0\rangle_m = I_{2^n \times 2^m}|0\rangle_m = |0\rangle_n$. We then use $G_1$ to get the second column right without affecting the first, i.e., $G_1 G_0 V|1\rangle_m = |1\rangle_n$ and $G_1 G_0 V|0\rangle_m = |0\rangle_n$, and so on. In other words, $G_k$ gets the $(k+1)$th column right and acts trivially on the first $k$ columns.
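The first step, a unitary $G_0$ built from uniformly controlled single-qubit gates that maps the first column of $V$ to $|0\rangle_n$, can be sketched numerically as follows. This is an illustration under stated assumptions rather than the circuit construction of the paper: each layer acts on one target qubit (starting from the least significant one), is controlled on all more significant qubits, and uses the $SU(2)$ matrices of Lemma 2 to zero every second remaining entry. All function names are hypothetical.

```python
import math

def su2_to_zero(a, b):
    """SU(2) matrix mapping (a, b) to (r, 0), r = sqrt(|a|^2 + |b|^2) (cf. Lemma 2)."""
    r = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    if r == 0:
        return [[1, 0], [0, 1]]
    a, b = complex(a) / r, complex(b) / r
    return [[a.conjugate(), b.conjugate()], [-b, a]]

def apply_ucg(state, layer, s):
    """Apply a UCG with target bit s, controlled on all bits above s."""
    out = list(state)
    for i in range(len(state)):
        if (i >> s) & 1 == 0:
            j = i | (1 << s)               # partner index with target bit set
            u = layer[i >> (s + 1)]        # 2 x 2 block selected by the control bits
            out[i] = u[0][0] * state[i] + u[0][1] * state[j]
            out[j] = u[1][0] * state[i] + u[1][1] * state[j]
    return out

def zero_column(column):
    """Return (layers, final_state); final_state is (1, 0, ..., 0) for a unit vector."""
    n = len(column).bit_length() - 1
    state = [complex(x) for x in column]
    layers = []
    for s in range(n):
        # One 2 x 2 block per control value c, fixed by the pair whose lower bits are 0.
        layer = [su2_to_zero(state[c << (s + 1)], state[(c << (s + 1)) | (1 << s)])
                 for c in range(len(state) >> (s + 1))]
        state = apply_ucg(state, layer, s)
        layers.append(layer)
    return layers, state
```

Applied to a normalized column of length $2^n$, `zero_column` returns $n$ layers (with $2^{n-1}, 2^{n-2}, \dots, 1$ blocks) and a final state equal to $|0\rangle_n$ up to rounding. Running the layers in reverse with inverted blocks would prepare the column from $|0\rangle_n$.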
The unitary operation G 0 can be decomposed using the reverse of the decomposition scheme for state preparation as described in [11] . First we act with a UCG G 0 ∈ H n−2 . We continue in this fashion until we've disentangled all the qubits. So we have constructed a quantum gate
By construction, the matrix form of $G_0 V$ has first column $(1, 0, \dots, 0)$, and, since $G_0$ is unitary, the first row also has the form $(1, 0, \dots, 0)$. It is natural to imagine repeating this construction for each column in turn. However, without further modification, this procedure doesn't work since the action required for the decomposition of later columns affects those that have already been done. In other words, if we construct a unitary $\tilde{G}_1$ exactly analogously to the construction above, we can obtain $\tilde{G}_1 G_0 V|1\rangle_m = |1\rangle_n$, but, in general, $\tilde{G}_1 G_0 V|0\rangle_m \ne |0\rangle_n$.
In the following we describe how to construct a unitary G 1 setting the second column of G 0 V to (0, 1, 0, . . . , 0) without affecting the first column. We construct
$G_0^\dagger$ is a circuit for preparing the state $|\psi_0^0\rangle$; in this sense we have performed the inverse of state preparation.
Fig. 2. Implementing the second column of an isometry $V$ from $m \ge 1$ qubits to $n = 4$ qubits. The operation of
Note that all these gates act trivially on $|0\rangle_n$. The symbol "*" denotes an arbitrary complex number.
choosing the unitary operations such that the first entry of each pair becomes zero (see Fig. 2 ). In other words, defining ψ
Because the first entry of $|\psi_1^0\rangle$ is already 0 due to the form of $G_0$, we can set the upper most $2 \times 2$ block of the uniformly controlled gate $G_1^0$, i.e. the block acting on the states $|0\rangle_n$ and $|1\rangle_n$, to the identity. Therefore we can perform this step without affecting the first column, i.e. $G_1^0 G_0 V|0\rangle_m = G_1^0|0\rangle_n = |0\rangle_n$. The next step would be to do the same to $|\psi_1^1\rangle$ (i.e., zero every second entry). Doing so using a $C^u_{n-2}(U)$ gate would, in general, have a nontrivial effect on the basis state $|0\rangle_n$. Therefore we modify the procedure and instead use a $C^u_{n-2}(U)$ gate to zero every second entry except that in the upper most double block of $|\psi_1^1\rangle$ or equivalently that in the upper most block of four elements of $G_1^0|\psi_1^0\rangle$. We subsequently correct for this using an additional MCG acting on the second least significant qubit, i.e., we set G
. With this additional MCG we can directly address the quantum states corresponding to the two non-zero entries in the upper-most four-element block. Indeed, controlling on $|0\rangle \otimes |0\rangle \otimes \cdots \otimes |0\rangle$ on the first $(n-2)$ qubits and on $|1\rangle$ on the least significant qubit we can zero the second non-zero entry of the upper-most four-element block without affecting $|0\rangle_n$.
We conclude that G
We continue in this way, until we've disentangled the first significant qubit and therefore constructed an operation $G_1$ such that
This procedure can be continued in a similar fashion, leading to unitaries
For a general description of the construction of the unitary G k see Appendix C. We can hence construct a unitary operator
In order to compute the number of C-NOTs used for such a decomposition, we use the following existing results:
1) $2^k - 1$ C-NOTs are sufficient to decompose a UCG with $k$ controls, up to a diagonal gate [11].
2) $2^m - 2$ C-NOTs are sufficient to decompose a diagonal gate acting non-trivially on $m$ qubits [17].
In order to take advantage of 1), we require a small modification to our decomposition scheme. Note that if we implement the UCGs up to diagonal gates, i.e., if we replace each UCG with a UCG followed by a diagonal gate (more precisely, for every $k$, C
where $\Delta_{k+1}$ is a diagonal gate on $k + 1$ qubits), then the effect of these diagonal gates can be corrected for at the end of the entire circuit. This can be done by adding a diagonal gate that acts non-trivially on $m$ qubits and whose C-NOT count is given in 2). (In fact, the number of C-NOTs required for this is of sufficiently low order that it doesn't feature in the count of Theorem 4.) Furthermore, as shown in Lemma 2, we only require MCGs $C_{n-1}(W)$ for $W \in SU(2)$, and hence can use 3). In fact, we have modified the decomposition described in [12] and used some technical tricks (see Appendix A) to obtain a C-NOT count for a $C_{n-1}(W)$ gate with leading order $28n$.
We conclude that we can decompose each column of an isometry using at most
$$\sum_{s=0}^{n-1}\left[2^{n-s-1} - 1 + O(n)\right] = 2^n + O(n^2)$$
C-NOTs. Note that (for simplicity) we have overcounted the number of additional MCGs, since in the above we have assumed each $G_k^s$ requires an additional MCG. Therefore, to decompose an $m$ to $n$ isometry, we require at most $2^m\left[2^n + O(n^2)\right] + 2^m = 2^{m+n} + O(n^2)\, 2^m$ C-NOTs.
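The leading term of this count comes from the uniformly controlled gates alone. As a rough check, the sketch below sums the UCG counts for one column, assuming (as in the construction above) that the gate used in step $s$ has $n - 1 - s$ control qubits and costs $2^k - 1$ C-NOTs for $k$ controls. The additional MCGs and the final diagonal gate are deliberately left out, so the numbers are not the full counts of the theorem; the function names are illustrative.

```python
def ucg_cnots(k):
    """C-NOTs for a UCG with k controls, up to a diagonal gate (result 1) above)."""
    return 2 ** k - 1

def column_ucg_cnots(n):
    """UCG contribution for one column: steps s = 0..n-1 with n-1-s controls each."""
    return sum(ucg_cnots(n - 1 - s) for s in range(n))

def isometry_ucg_cnots(m, n):
    """UCG contribution for all 2^m columns (MCGs and final diagonal gate excluded)."""
    return 2 ** m * column_ucg_cnots(n)
```

One finds `column_ucg_cnots(n) == 2**n - n - 1`, so the UCGs contribute $2^{m+n}$ to leading order, while the omitted gates only enter at order $n^2\, 2^m$.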
Remark 1:
In some physical realizations it is difficult to implement C-NOT gates between non-adjacent qubits. Our decomposition can be adapted to the gate library containing only nearest neighbour C-NOT and single-qubit gates. To do so, note that the UCGs used to implement one column of an $m$ to $n$ isometry can be performed with at most $(5/3)2^n + O(n^2)$ nearest neighbour C-NOT gates [11]. Furthermore, since a C-NOT gate acting between qubits a distance $n$ apart can be decomposed using $O(n)$ nearest neighbour C-NOT gates [9], the MCGs used to implement one column use $O(n^3)$ nearest neighbour C-NOT gates. Therefore the decomposition of an $m$ to $n$ isometry uses at most $(5/3)2^{m+n} + O(n^3)\, 2^m$ nearest neighbour C-NOT gates.
C. Decomposition of isometries using the Cosine-Sine Decomposition
The most efficient known decomposition scheme for arbitrary unitary operators in terms of the number of C-NOT gates required uses the CSD [9]. In this section we adapt the decomposition scheme used in [9] to isometries.
Theorem 5: Let $m$ and $n$ be natural numbers with $2 \le m \le n$ and $V$ be an isometry from $m$ qubits to $n$ qubits. There exists a decomposition of $V$ in terms of single-qubit gates and C-NOTs such that the number of C-NOT gates required satisfies, to leading order,
$$N_{\mathrm{iso}}(m, n) \le \frac{23}{144}\left(4^m + 2 \cdot 4^n\right).$$
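The Cosine-Sine Decomposition (CSD) [18] was first used by [19] in the context of quantum computation. In particular, the CSD states that every unitary matrix $U \in \mathbb{C}^{2^n \times 2^n}$ can be decomposed in terms of unitaries $A_0, A_1, B_0, B_1 \in \mathbb{C}^{2^{n-1} \times 2^{n-1}}$ and real diagonal matrices $C$ and $S$ satisfying $C^2 + S^2 = I$:
$$U = \begin{pmatrix} A_0 & 0 \\ 0 & A_1 \end{pmatrix} \begin{pmatrix} C & -S \\ S & C \end{pmatrix} \begin{pmatrix} B_0 & 0 \\ 0 & B_1 \end{pmatrix}. \tag{13}$$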
The CSD can be summarized by the gate identity
Together with
(which is Theorem 12 of [9]) it allows a recursive decomposition of an arbitrary unitary operation in terms of single-qubit gates and uniformly controlled $R_y$ and $R_z$ gates. In the case of an isometry, we again use a representation in terms of a unitary matrix, $V_n$, such that $V = V_n I_{2^n \times 2^m}$. Now, if $n > m$, we can take the control qubit of the first $(n-1)$-qubit-$C^u_1(U_{n-1})$ gate to be in the state $|0\rangle$, and hence this gate need not be uniformly controlled. Thus, the following circuit identity holds
Note that $V_{n-1}$ represents an $m$ to $n - 1$ isometry. In the matrix representation the circuit identity above corresponds to setting $B_1 := B_0$ in (13). We can decompose the $(n-1)$-qubit-$C^u_1(U)$ gate as above so that
We can use this idea to recursively decompose $V_n$. The uniformly $(n-1)$-controlled rotations can be decomposed using at most $2^{n-1}$ C-NOT gates [17], [20]. The two $U_{n-1}$ gates can be decomposed by using the CSD and (14) recursively until two-qubit gates remain (each of which can be implemented with 3 C-NOTs). In this way it can be shown that each $U_{n-1}$ requires at most $(9/16)4^{n-1} - (3/2)2^{n-1}$ C-NOT gates [9]. Note that this is not the optimal count reached in [9], but we use this slightly weaker count here for simplicity (a count that takes into account the additional optimizations of the Appendix of [9] can be found in Appendix D). The C-NOT count for an $m$ to $n$ isometry, $N_{\mathrm{iso}}(m, n)$, hence satisfies the recursion relations
Solving these leads to the claimed count.
Remark 2 (Optimized state preparation): As a byproduct of the above we obtain an improved bound over that of [11] on the number of C-NOT gates required for state preparation on an odd number $n = 2k + 1 \ge 5$ of qubits. The optimized decomposition is based on [10] and described in Appendix E. The count (49) using state preparation as done in [11] on $k$ qubits, which requires $2^k - k - 1$ C-NOTs, gives the following count for state preparation starting from the basis state $|0\rangle^{\otimes n}$:
Note that this count has the same leading order as the lowest known C-NOT count for state preparation on an even number of qubits [10] .
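The recursive structure of the CSD approach of this subsection can be explored with exact rational arithmetic. The sketch below encodes one possible reading of the recursion described in the text, stated here as an assumption: each step from $n - 1$ to $n$ qubits adds two $U_{n-1}$ gates and two uniformly $(n-1)$-controlled rotations of $2^{n-1}$ C-NOTs each, and the recursion ends at $N_{\mathrm{iso}}(m, m)$, the count of an $m$-qubit unitary. The function names and the parameter `c` (leading coefficient of the unitary count) are illustrative.

```python
from fractions import Fraction

def unitary_cnots(n, c=Fraction(9, 16)):
    """Assumed count c * 4^n - (3/2) * 2^n for an n-qubit unitary (c = 9/16 in the text)."""
    return c * 4 ** n - Fraction(3, 2) * 2 ** n

def csd_isometry_cnots(m, n, c=Fraction(9, 16)):
    """Iterate the assumed recursion from an m-qubit unitary up to n qubits."""
    if not 2 <= m <= n:
        raise ValueError("need 2 <= m <= n")
    count = unitary_cnots(m, c)
    for j in range(m + 1, n + 1):
        # two U_{j-1} gates plus two uniformly (j-1)-controlled rotations
        count += 2 * unitary_cnots(j - 1, c) + 2 * 2 ** (j - 1)
    return count
```

Under this reading the recursion solves to $\frac{c}{3}(4^m + 2 \cdot 4^n) - 2^n - 2^{m-1}$, i.e. a leading coefficient of $3/16$ for $c = 9/16$ and of $23/144$ when the optimized value $c = 23/48$ of [9] is used instead.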
D. Summary of asymptotic results
We have summarized the lowest known C-NOT counts used to decompose an m to n isometry for large n in Table I . The two special cases m = 0 and m = n have already been considered in the past. The case m = 0 corresponds to state preparation starting from the state |0 n . The lowest known C-NOT count for state preparation on an even number of qubits uses the Schmidt decomposition [10] . Using the CSD approach of Section III-C we could lower the C-NOT count of [10] for state preparation on an odd number of qubits reaching a slightly lower C-NOT count than the previously best-known decomposition (2 n to leading order [11] ). The case m = n corresponds to synthesis of an arbitrary n qubit gate and the lowest C-NOT count in this case is found using the CSD [9] .
The C-NOT count using the column-by-column approach follows from Section III-B, modified slightly to use the asymptotic best-known C-NOT counts for state preparation to implement the first column of the isometry (i.e., the number of C-NOTs, $N_{SP}(n)$, required to implement the gate $G_0$ is taken from the first column of Table I). Alternatively, the count follows directly from (39) of Appendix C.

To compare the various techniques, let us introduce $c_{CC}(m, n)$ and $c_{CSD}(m, n)$ as the ratio of the C-NOT count for the column-by-column approach or the CSD approach, respectively, to that of the lower bound for an m to n isometry. For $n \gg m$ the column-by-column approach of Section III-B has $c_{CC}(m, n) \simeq 2$ (i.e., the decomposition uses roughly twice the number of C-NOT gates of the lower bound). In the case $m \simeq n$, the CSD approach can be better. To see this, note that for any natural number d, for sufficiently large n, $c_{CC}(m(n), n) \lesssim 2^{d+2}/(2^{d+1} - 1)$ for all $m(n) \le n - d$. In particular $c_{CC}(n - 2, n) \simeq 2.3$ and $c_{CC}(n - 1, n) \simeq 2.7$ for large n. The growth of $c_{CC}(m, n)$ with respect to m can be understood by recalling that we implement every column of the isometry in a similar fashion, whereas the last few columns are heavily constrained by orthogonality. On the other hand, the CSD is adjusted to the unitary structure. For m = n − 1 we can use the CSD approach of Section III-C to again reach $c_{CSD}(n - 1, n) \simeq 2$ for large n. Note that $c_{CC}(0, n) \simeq 2$ [10], [11] and also $c_{CSD}(n, n) \simeq 2$ [9] for large n. In this sense our decomposition of an arbitrary isometry is comparably optimal to that of the special cases: state preparation and that of an arbitrary unitary operation.
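As a quick numerical illustration of the ratio discussed above, the following minimal sketch evaluates the leading-order expression $2^{d+2}/(2^{d+1} - 1)$ for m = n − d. The function name `cc_ratio_limit` is hypothetical and only the closed-form expression quoted in the text is used; no actual circuit is constructed.

```python
def cc_ratio_limit(d: int) -> float:
    """Leading-order ratio 2^(d+2) / (2^(d+1) - 1) of the column-by-column
    C-NOT count to the lower bound, for m = n - d and large n."""
    if d < 0:
        raise ValueError("d must be a non-negative integer")
    return 2 ** (d + 2) / (2 ** (d + 1) - 1)


if __name__ == "__main__":
    # d = 1 and d = 2 correspond to the values quoted for m = n - 1 and m = n - 2.
    for d in (1, 2, 5, 10):
        print(f"d = {d:2d}: ratio ~ {cc_ratio_limit(d):.3f}")
```

The ratio decreases monotonically in d and approaches 2, which matches the statement that the column-by-column approach is within a factor of about two of the lower bound when n is much larger than m.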
Remark 3 (CSD approach zeroes too many entries):
Recall that constructing a gate $V_n$ such that $V = V_n I_{2^n \times 2^m}$ is equivalent to constructing a gate $V_n^\dagger$ such that $V_n^\dagger V = I_{2^n \times 2^m}$. Therefore, rewriting (13), the first recursion step of the CSD approach leads to
If m < n − 1 we apply the same procedure to $B_0$. However, in this case, we have already zeroed more entries than necessary in the first recursion step. Specifically, it was unnecessary to zero at least half of the entries in the upper right and in the lower left $2^{n-1} \times 2^{n-1}$-dimensional block of the matrix on the rhs of (18), and the number of unnecessary zeros grows as m decreases. This intuitively explains why the CSD approach is not well-suited to m to n isometries with m < n − 1: by zeroing too many entries, more C-NOT gates are used than needed.
IV. ISOMETRIES ON A SMALL NUMBER OF QUBITS
We have presented two methods to decompose an m to n isometry. Asymptotically the CSD approach of Section III-C only outperforms the column-by-column approach in the case where m = n − 1 or m = n. For small m and n, one has to check more carefully, and in certain cases tailored modifications of the techniques can be used to further reduce the C-NOT count.
In particular, decomposing the MCGs arising in the column-by-column decomposition as described in [12, Lemma 7.9] contributes significantly to the total C-NOT count for an m to n isometry if n is small. Since the decomposition scheme of UCGs up to diagonal gates [11] is very efficient for small n, we may also use it to decompose the MCGs, which are a special case of UCGs. For example, in the case m = 1 and n = 3, the decomposition scheme of Section III-B leads to the following circuit.
Note that we have set U u 1,2 = I and that we use state preparation (SP) on three qubits to implement the first column of the isometry, which uses at most three C-NOT gates [21] . We transform the MCGs into UCGs and absorb the neighbouring UCGs:
We can implement all uniformly k-controlled gates up to diagonal gates using at most $2^k - 1$ C-NOT gates [11]. Therefore we can implement a 1 to 3 isometry using at most 12 C-NOT gates. A general C-NOT count for an m to n isometry using this technique is given in Appendix F. In the special case of a 1 to 2 isometry we use an ad hoc decomposition based on [9], [15], [16], which is described in Appendix G.
We have summarized the results in Table II, where for each m and $2 \le n \le 10$ we have chosen the decomposition scheme using the lowest number of C-NOT gates. The construction of Table II is partly recursive (for more details, see Appendix H).

We use a parameter counting argument similar to that of Section III-A to find a lower bound on the number of C-NOTs required to implement an arbitrary CPTP map via a quantum circuit. First we use the Choi-Jamiolkowski isomorphism to simplify the counting of the parameters that are required to describe an arbitrary CPTP map. By the Choi-Jamiolkowski isomorphism the set of all CPTP maps from a system A consisting of m qubits to a system B consisting of n qubits is isomorphic to the set of all density operators $\rho_{AB}$ on $H_A \otimes H_B$ satisfying $\mathrm{tr}_B(\rho_{AB}) = \frac{1}{2^m} I_A$. Since a density operator $\rho_{AB}$ is Hermitian, it can be described by $2^{2(m+n)}$ real parameters. The condition $\mathrm{tr}_B(\rho_{AB}) = \frac{1}{2^m} I_A$ corresponds to $2^{2m}$ constraints, and hence the determination of a CPTP map requires $2^{2(m+n)} - 2^{2m}$ real parameters.

We restrict our analysis of the lower bound to the following setting: for the implementation of a CPTP map E from a system A to a system B we allow the use of an arbitrary number of qubits k on which we can perform C-NOT and single-qubit gates, and finally we trace out qubits. Since tracing out qubits commutes with quantum gates on the other qubits, without loss of generality, we trace out a system C consisting of $k - n$ qubits at the end of the circuit. Now we use a similar argument as described in Section III-A, but instead of commuting the $R_x$ and $R_z$ gates to the left of each C-NOT, we commute them to the right. Therefore we perform arbitrary single-qubit unitaries on all of the qubits at the end of the circuit (cf. (10)).
Since we have unitary freedom on the system C (because $\mathrm{tr}_C((I_B \otimes U_C)\rho_{BC}(I_B \otimes U_C^\dagger)) = \mathrm{tr}_C(\rho_{BC})$), the single-qubit gates on each qubit of the system C at the end of the circuit cannot introduce additional parameters. Hence, using r C-NOTs, we can introduce at most $4r + 3n$ real parameters. By the parameter count for a CPTP map given above, we conclude that at least $\lceil \frac{1}{4}(2^{2(m+n)} - 2^{2m} - 3n) \rceil$ C-NOTs are required. (For a more rigorous proof one could use a similar argument as given in [15], [16].)

By Stinespring's theorem, every CPTP map E from a system A to a system B can be implemented with an isometry V from system A to system BC, where the system C consists of (at most) n + m qubits, followed by a partial trace on C. We can use the column-by-column approach described in Section III-B to decompose the isometry V, which requires $4^{m+n} - \frac{1}{24} 2^{2n+m}$ C-NOTs in leading order (without using the unitary freedom on C). Therefore we have found a way to implement an arbitrary quantum channel in a constructive and exact way using about four times the number of C-NOTs required by the lower bound.
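The parameter-counting argument above can be sketched numerically. The following minimal example assumes only the two counts stated in the text: a CPTP map from m to n qubits has $2^{2(m+n)} - 2^{2m}$ real parameters, and a circuit with r C-NOTs introduces at most $4r + 3n$ of them. The function names are hypothetical, and the Stinespring count uses only the quoted leading-order expression.

```python
def cptp_parameter_count(m: int, n: int) -> int:
    """Real parameters of a CPTP map from m to n qubits: 2^(2(m+n)) - 2^(2m)."""
    return 2 ** (2 * (m + n)) - 2 ** (2 * m)


def cptp_cnot_lower_bound(m: int, n: int) -> int:
    """Smallest integer r with 4r + 3n >= number of parameters."""
    needed = cptp_parameter_count(m, n) - 3 * n
    return max(0, -(-needed // 4))  # integer ceiling of needed / 4


def stinespring_leading_order(m: int, n: int) -> float:
    """Leading-order column-by-column count 4^(m+n) - 2^(2n+m) / 24."""
    return 4 ** (m + n) - 2 ** (2 * n + m) / 24


if __name__ == "__main__":
    for m, n in [(1, 1), (2, 2), (4, 4), (3, 6)]:
        bound = cptp_cnot_lower_bound(m, n)
        ratio = stinespring_leading_order(m, n) / bound
        print(f"m={m}, n={n}: lower bound {bound}, ratio ~ {ratio:.2f}")
```

For moderately large m and n the printed ratio is close to four, in line with the factor stated above; for very small systems the leading-order expression is not a reliable estimate.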
Remark 4:
The results of this section are derived in the setting where the CPTP map must be implemented in the quantum circuit model. However, this is not the only possibility. For example, alternative methods for the implementation of quantum channels are described in [22] and [23] , which allow for additional classical randomness to implement the channel. In future work we will investigate how to use our approach in an alternative model that allows either measurements or classical randomness as additional resources, in order to further improve the C-NOT counts.
Remark 5: By Naimark's theorem any POVM on a system A can be implemented by an isometry from system A to an enlarged system AB followed by a measurement on system B. Therefore our decomposition schemes for isometries can also be used for the implementation of arbitrary POVMs.
APPENDIX A DECOMPOSITION OF MCGS
In this section we describe how to efficiently decompose MCGs $C_{n-1,n}(W)$, where $W \in SU(2)$. The decomposition scheme is nearly the same as described in [12]. We introduce some technical tricks to reduce the C-NOT count needed for this decomposition. Note that the same number of C-NOTs is required whether we control on one or zero (see Section II-D). We denote a k-controlled NOT gate acting on n qubits by $C_{k,n}(\sigma_x)$. In the case k = 2, where we control on $|1\rangle \otimes |1\rangle$, we call such a gate a Toffoli gate.
Lemma 6 ($C_{1,2}(U)$ gates [12, Corollary 5.3]): Any $C_{1,2}(U)$ gate can be decomposed using two C-NOT gates, three special unitary gates A, B and C and a diagonal gate of the form $E = |0\rangle\langle 0| + e^{i\delta}|1\rangle\langle 1|$, where $\delta \in \mathbb{R}$.

Fig. 3. Decomposition of the action part of a multi-controlled Toffoli gate. The gates A and V are defined by: $A = R_y(\ldots)$ and $V^2 = \sigma_x$.

Lemma 7 ($C_{2,3}(U)$ gates [12, Lemma 6.1]): Any $C_{2,3}(U)$ gate can be decomposed as follows

Lemma 8 (Toffoli gates [12, Section 6.2]): We can decompose a Toffoli gate up to a phase shift with the following decomposition.
Proof: To see this, note that if the second control-qubit is in the state $|0\rangle$, the least significant qubit is unchanged, since $AA^\dagger = I$. If the second control-qubit is in the state $|1\rangle$ and the first control-qubit in the state $|0\rangle$, the action on the least significant qubit is $A^2 \sigma_x (A^\dagger)^2$, which is $-|0\rangle\langle 0| + |1\rangle\langle 1|$. If both control-qubits are in the state $|1\rangle$, the action on the least significant qubit is $A\sigma_x A\sigma_x A^\dagger \sigma_x A^\dagger = \sigma_x$. We choose the diagonal gate ∆ such that $|010\rangle$ is mapped to $-|010\rangle$.
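The three cases in this proof can be checked numerically. The sketch below is only an illustration under two stated assumptions that are not confirmed by the text as reproduced here: that $A = R_y(\pi/4)$, and that $R_y(\theta) = \exp(-i\theta\sigma_y/2)$, which is a real rotation matrix. All helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def chain(*mats):
    """Multiply the given 2x2 matrices from left to right."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for mat in mats:
        result = matmul(result, mat)
    return result


def dagger(a):
    """Adjoint of a real 2x2 matrix, i.e. its transpose."""
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]


def close(a, b, tol=1e-12):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def ry(theta):
    """R_y(theta) = exp(-i theta sigma_y / 2), assumed convention."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
SIGMA_X = [[0.0, 1.0], [1.0, 0.0]]
A = ry(math.pi / 4)  # assumption: the angle is not recoverable from the text
A_DAG = dagger(A)


def check_toffoli_cases():
    """Return one boolean per case of the proof, in the order it lists them."""
    second_control_zero = close(chain(A, A_DAG), IDENTITY)
    only_second_control_one = close(
        chain(A, A, SIGMA_X, A_DAG, A_DAG), [[-1.0, 0.0], [0.0, 1.0]])
    both_controls_one = close(
        chain(A, SIGMA_X, A, SIGMA_X, A_DAG, SIGMA_X, A_DAG), SIGMA_X)
    return second_control_zero, only_second_control_one, both_controls_one


if __name__ == "__main__":
    print(check_toffoli_cases())
```

Under these assumptions all three identities hold; the third one holds for any rotation angle, while the second one fixes the angle.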
Lemma 9 (Commutation of diagonal gates and UCGs):
Lemma 10: Let $n \ge 5$ denote the total number of qubits considered and $k \in \{3, \ldots, \lceil n/2 \rceil\}$; then we can implement a $C_{k,n}(\sigma_x)$ gate with at most $N_{C_{k,n}(\sigma_x)} = 8k - 6$ C-NOTs.
To illustrate the idea, consider the decomposition leading to the desired C-NOT count for k = 4, n = 7. Lemma 7.2 of [12] shows that
[Circuit diagram: action part, reset part]
However, we consider instead the alternative decomposition
[Circuit diagram: action part, reset part]
To see that this is also valid, note that the diagonal gates $\Delta_i$ are of the same kind as introduced in Lemma 8 and therefore $\Delta_i = \Delta_i^\dagger$. By Lemma 9 the two $\Delta_2$ and $\Delta_1$ gates cancel each other out. In addition, the combination of all gates between the two $\Delta_0$ gates corresponds to a UCG acting only on the least significant (lowest) qubit, and hence the two $\Delta_0$ gates cancel each other out by Lemma 9.
The Toffoli gates acting on the least significant qubit can be decomposed using Lemma 7. Since a Toffoli gate is its own inverse, we can also decompose it with the inverted decomposition (i.e., reversing the gate order). The other Toffoli gates can be decomposed together with the diagonal gates using Lemma 8. This leads to a decomposition of the action part of the last circuit as shown in Fig. 3. The marked gates cancel each other out, because they commute with the gates between them. The reset part can be decomposed analogously.
Proof of Lemma 10: First we apply Lemma 7.2 of [12] (a circuit diagram for the case k = 5 and n = 9 can be found in [12]). By similar arguments as used in the special case above, we introduce a corresponding diagonal gate for each Toffoli gate apart from the two that act on the least significant qubit (i.e., on the target qubit of the $C_{k,n}(\sigma_x)$ gate).
TABLE III
C-NOT COUNTS AND NUMBERS OF REAL PARAMETERS THAT CAN BE INTRODUCED INTO A CIRCUIT BY A SPECIFIC GATE, FOR VARIOUS CONTROLLED GATES.

| Gate | Notation | C-NOT count (upper bound) | # real parameters |
|---|---|---|---|
| Uniformly controlled gate (up to a diagonal gate) | $C^u_{n-1}(U)\Delta$ | $2^{n-1} - 1$ [11] | $2^n$ |
| Uniformly controlled rotation | $C^u_{n-1}(R_z)$ / $C^u_{n-1}(R_y)$ | $2^{n-1}$ [17], [20] | $2^{n-1}$ |
| Multi controlled special unitary gate ($W \in SU(2)$) | $C_{n-1,n}(W)$ | $28n - 88$ if $n \ge 8$ is even, $28n - 92$ if $n \ge 8$ is odd (Thm. 12) | 3 |
| Multi controlled Toffoli gate | $C_{k,n}(\sigma_x)$ | $8k - 6$ if $n \ge 5$ and $k \in \{3, \ldots, \lceil n/2 \rceil\}$ (Lemma 10) | 0 |

The required C-NOT count for $C_{k,n}(\sigma_x)$ is thus equal to twice that required for the reset part plus the number of C-NOTs needed to implement the Toffoli gates that form the first and last gates in the action part. For the latter we use 4 C-NOTs and 4 $C_{1,2}(U)$ gates (each of which requires 2 C-NOTs using Lemma 6), to give 12 C-NOTs. One reset part uses $N^{\mathrm{reset}}_{C_{k,n}(\sigma_x)} = 4(k - 3) + 3$ C-NOTs. This leads to the claimed count.
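The arithmetic behind the count of Lemma 10 can be sketched as follows, using only the numbers stated in the proof: twice the reset part at $4(k-3)+3$ C-NOTs, plus 4 C-NOTs and four $C_{1,2}(U)$ gates at 2 C-NOTs each for the first and last Toffoli gates of the action part. The function names are hypothetical.

```python
def reset_part_cnots(k: int) -> int:
    """C-NOTs used by one reset part: 4(k - 3) + 3."""
    return 4 * (k - 3) + 3


def edge_toffoli_cnots() -> int:
    """First and last Toffoli gates of the action part:
    4 C-NOTs plus 4 singly controlled gates at 2 C-NOTs each (Lemma 6)."""
    return 4 + 4 * 2


def multi_controlled_not_cnots(k: int, n: int) -> int:
    """Upper bound on C-NOTs for a k-controlled NOT on n qubits (Lemma 10)."""
    half_up = -(-n // 2)  # ceiling of n / 2
    if n < 5 or not 3 <= k <= half_up:
        raise ValueError("requires n >= 5 and 3 <= k <= ceil(n / 2)")
    return 2 * reset_part_cnots(k) + edge_toffoli_cnots()


if __name__ == "__main__":
    # The illustrative case discussed in the text: k = 4, n = 7.
    print(multi_controlled_not_cnots(4, 7))
```

Expanding the sum gives $2(4k - 9) + 12 = 8k - 6$, which is the count claimed in Lemma 10.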
Lemma 11 ([12, Lemma 7.3]): Let $n \ge 5$ denote the total number of qubits considered. A $C_{n-2,n}(\sigma_x)$ gate can be decomposed into two $C_{k,n}(\sigma_x)$ and two $C_{n-k-1,n}(\sigma_x)$ gates, where $k \in \{2, 3, \ldots, n - 3\}$.
For example, the decomposition for n = 7 and k = 4 is shown in the following circuit diagram.
Theorem 12 ($C_{n-1,n}(W)$ gates, where $W \in SU(2)$): Let $n \ge 8$ and $W \in SU(2)$. We can decompose a $C_{n-1,n}(W)$ gate using at most $28n - 88$ C-NOTs if n is even and $28n - 92$ C-NOTs if n is odd.
Proof: To aid the proof, we provide illustrations for the case n = 8. By Lemma 7.9 of [12] there exist quantum gates $A, B, C \in SU(2)$ such that we can decompose the $C_{n-1,n}(W)$ gate as follows.
By Lemma 11 we can decompose the $C_{n-2,n}(\sigma_x)$ gates using two $C_{k_1,n}(\sigma_x)$ and two $C_{k_2,n}(\sigma_x)$ gates, where we set $k_2 = \lceil n/2 \rceil$ and $k_1 = n - k_2 - 1$. In our example $k_1 = 3$ and $k_2 = 4$:
Since the $C_{n-2,n}(\sigma_x)$ gate is its own inverse, we can use the inverted decomposition scheme to decompose the second $C_{n-2,n}(\sigma_x)$ gate. We can decompose the gates $C_{k_1,n}(\sigma_x)$ and $C_{k_2,n}(\sigma_x)$ using Lemma 10. Note that this works for all $n \ge 8$, since $3 \le k_1, k_2 \le \lceil n/2 \rceil$. We can lower the C-NOT count with some technical tricks. As in the proof of Corollary 7.4 of [12] we can decompose all Toffoli gates not acting on the least significant qubit up to diagonal gates. This can be seen by reversing the decomposition scheme of Lemma 10 for the second and fourth $C_{k_1,n}(\sigma_x)$ gate and using Lemma 9. Therefore using the same technique as in Lemma 10, but implementing all Toffoli gates up to diagonal gates, we can decompose each of the $C_{k_1,n}(\sigma_x)$ gates using
Now consider the marked part of the last circuit. By Lemma 10 this can be decomposed using
where, to simplify, we have not explicitly illustrated the diagonal gates. The two reset parts commute with the controlled B gate, since they do not act on the two least significant qubits, and cancel out. Therefore each of the marked $C_{k_2,n}(\sigma_x)$ gates uses $N_{C_{k_2,n}(\sigma_x)} - N^{\mathrm{reset}}_{C_{k_2,n}(\sigma_x)} = 4k_2 + 3$ C-NOTs. We decompose the other two $C_{k_2,n}(\sigma_x)$ gates exactly as in Lemma 10. Using Lemma 6 for the three singly controlled gates then leads to the claimed C-NOT count.
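The bookkeeping of this proof can be tallied in a few lines. The sketch below is an illustration, not the authors' derivation: the cost of a $C_{k_1,n}(\sigma_x)$ gate with all Toffoli gates implemented up to diagonal gates is not reproduced in the text above, so the value $8k_1 - 14$ used here is an assumption, chosen because it reproduces the bound of Theorem 12. All other numbers are taken from Lemmas 6 and 10 and from the proof. Function names are hypothetical.

```python
def lemma10_cnots(k: int) -> int:
    """C-NOTs for a k-controlled NOT as in Lemma 10: 8k - 6."""
    return 8 * k - 6


def reset_part_cnots(k: int) -> int:
    """C-NOTs for one reset part: 4(k - 3) + 3."""
    return 4 * (k - 3) + 3


def up_to_diagonal_cnots(k: int) -> int:
    """ASSUMPTION: cost when every Toffoli gate is implemented up to a
    diagonal gate; 8k - 14 is not stated in the text reproduced here."""
    return 8 * k - 14


def theorem12_cnots(n: int) -> int:
    """Tally of the C-NOTs used for a C_{n-1,n}(W) gate, following the proof."""
    if n < 8:
        raise ValueError("Theorem 12 requires n >= 8")
    k2 = -(-n // 2)      # k2 = ceil(n / 2)
    k1 = n - k2 - 1
    total = 3 * 2                                  # three singly controlled gates (Lemma 6)
    total += 4 * up_to_diagonal_cnots(k1)          # four k1-controlled NOTs
    total += 2 * (lemma10_cnots(k2) - reset_part_cnots(k2))  # marked gates: 4*k2 + 3 each
    total += 2 * lemma10_cnots(k2)                 # remaining two k2-controlled NOTs
    return total


if __name__ == "__main__":
    for n in (8, 9, 10, 11):
        print(n, theorem12_cnots(n))
```

Under the stated assumption the tally equals $28n - 88$ for even n and $28n - 92$ for odd n, for instance with $k_1 = 3$ and $k_2 = 4$ in the case n = 8.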
APPENDIX B OVERVIEW OF C-NOT COUNTS FOR CONTROLLED GATES
We summarize C-NOT counts for some commonly-used uniformly and not uniformly controlled gates in Table III. Note that implementing a uniformly controlled $C^u_{n-1}(U)$ gate up to a diagonal gate ∆ means that we implement $C^u_{n-1}(U)\Delta$, for some diagonal gate ∆. The number of real parameters required to specify a particular gate is shown in the final column and follows from Lemma 1 and the block diagonal form of the uniformly controlled gates (see also Section III-A). For example, a $C^u_{n-1}(U)$ gate is described by $2^{n-1}$ (2 × 2)-unitaries. By Lemma 1 this corresponds to $4 \cdot 2^{n-1}$ real parameters. Since a diagonal gate ∆ on n qubits is described by $2^n$ real parameters, a $C^u_{n-1}(U)\Delta$ gate is described by $4 \cdot 2^{n-1} - 2^n = 2^n$ real parameters.
APPENDIX C RIGOROUS PROOF OF THE DECOMPOSITION SCHEME DESCRIBED IN SECTION III-B AND EXACT C-NOT COUNT
We begin this section by introducing some additional notation. For m ′ ∈ N and k ∈ {0, 1, . . . , 2

We now consider an elementary step in the decomposition scheme. Let $n \in \mathbb{N}_{\ge 2}$, $m \in \mathbb{N}$ with $n \ge m$, $k \in \{1, 2, \ldots, 2^n - 1\}$ and $s \in \{0, 1, \ldots, n - 2\}$. Furthermore suppose $|\psi\rangle$ is an n-qubit state of the form
where
Since it is clear from the context that, e.g., |l ∈ H n−s , we shorten the notation and write |l instead of |l n−s .
[Note that we use the following convention: If s − 1 < 0, we mean that the part |k s−1 k s−2 . . . k 0 in (19) does not exist, i.e., for s = 0 the statement of (19) 
⊗0 means that no such part exists in the considered expression. Similarly we set {n s , . . . , n e } = ∅ if n e < n s .]
Lemma 13: Take |ψ e := 
There exists a UCG $A := C^u_{n-1-s}(U)$ of the form
such that |ψ ′ := A |ψ has the form
where c
Additionally, A has the property that
Proof: The following proof depends on whether $k_s = 0$ or $k_s = 1$. In the case $k_s = 0$ we also have to distinguish between the cases b

We now determine the UCG A. To ensure that A fulfils (23) we set:

Fig. 5. Decomposition scheme of a quantum gate $G_k$. The notation "*" surrounded by the square signifies either a control on one or on zero.
where r ∈ R. = 0 because the corresponding entry in |ψ e is initially zero by (19) and A acts trivially on it by (24a). So in all cases we can write |ψ ′e = Fig. 4 ). Therefore, A |ψ is of the desired form (22) and by construction A satisfies (23) .
Lemma 14: Let $k \in \{1, 2, \ldots, 2^n - 1\}$ and $s \in \{0, 1, \ldots, n - 1\}$ be such that $k_s = 0$ and $b^k_{s+1} = 0$. Let $|\psi\rangle$ be an n-qubit state of the form (19). Then there exists an MCG $B := C_{n-1}(U)$, whose nontrivial part is of the form
, such that we can write
where c = 0. In addition, B leaves the first k basis states invariant: $B|i\rangle = |i\rangle$ for $i \in \{0, \ldots, k - 1\}$.
Proof: Since k s = 0 the condition (27) is satisfied by construction of the gate B. We define the gate U with Lemma 2 such that
where $r \in \mathbb{R}$.

Lemma 15 (Implementing one column of an isometry): Let $k \in \{1, 2, \ldots, 2^n - 1\}$. Let $|\psi\rangle \in H_n$ be an n-qubit state such that $\langle i|\psi\rangle = 0$ for $i \in \{0, 1, \ldots, k - 1\}$. There exists a quantum gate $G_k$ with the following properties:
where ϕ i ∈ R for all i ∈ {0, 1, . . . , k}.
Proof: We claim that we can implement the operator G k with a circuit of the form as shown in Fig. 5 .
[Note that we have interchanged the order of the MCGs and the UCGs compared with Section III-B. We are allowed to do this, since the gates commute by their construction.]
The structure of this decomposition is based on the idea used for state preparation in [11]. The diagonal gates in $\{\Delta_i\}_{i \in \{0,1,\ldots,n-1\}}$ are present so we can use the efficient decomposition of the UCGs up to diagonal gates in [11]. Note that we never use the MCG $C_{n-1}(U_0)$, since we can absorb it into the UCG $C^u_{n-1}(U^u_0)$. Formally we write:
To keep the notation simple, we don't write down which of the n qubits are the control/target qubits. The target qubit of the controlled gates with lower index s is the (n − s)th qubit. We consider all controlled gates as n qubit gates. If there are free qubits, i.e., qubits that are neither controlled nor acted on, they are the least significant ones.
We use Lemma 13 recursively to disentangle one qubit after another starting from the state $|\psi\rangle$. More formally: we define the state $|\psi_s\rangle := \prod_{s'=0}^{s-1} O_{s'} |\psi\rangle$ for $s \in \{1, 2, \ldots, n\}$ and we set $|\psi_0\rangle := |\psi\rangle$. To determine the gate $C^u_{n-1-s}(U^u_s)$ for $s \in \{0, 1, \ldots, n - 2\}$ we apply Lemma 13 on the state |ψ

By construction, the operators $O_s$ leave the states $\{|i\rangle\}_{i \in \{0,1,\ldots,k-1\}}$ invariant (up to phase shifts caused by the diagonal gates).
Lemma 16 (C-NOT count for one column of an isometry): Let $k \in \{1, 2, \ldots, 2^n - 1\}$. We can decompose a quantum gate $G_k$ of the form described in Lemma 15 using at most $(2^n - n - 1) + Q_k(n) N_{C_{n-1}(U)}$ C-NOTs, where $Q_k(n) := |\{s : k_s = 0 \wedge b^k_{s+1} = 0,\ s \in \{0, 1, \ldots, n - 1\}\}|$ and $N_{C_{n-1}(U)}$ denotes the number of C-NOTs used to decompose a $C_{n-1}(U)$ gate.
Proof: To decompose the quantum gate $G_k$ we use the decomposition scheme described in the proof of Lemma 15. The C-NOTs used to decompose the UCGs (together with the diagonal gates) give a count of $\sum_{s=0}^{n-1} (2^{n-1-s} - 1) = 2^n - n - 1$ C-NOTs [11]. By the construction of the proof of Lemma 15 we conclude that the number of MCGs used for the decomposition of $G_k$ is at most $Q_k(n)$. We add the number of C-NOTs used to decompose $Q_k(n)$ MCGs to the C-
m − 1}, which is equivalent to (37).
Theorem 19 (C-NOT count for an isometry):
Let m and n be natural numbers with $n \ge 8$ and V be an isometry from m qubits to n qubits. There exists a decomposition of V in terms of single-qubit gates and C-NOTs such that the number of C-NOT gates required satisfies
$$N_{\mathrm{iso}}(m, n) \le N_G(m, n) + N_{SP}(n) + N_\Delta(m), \qquad (38)$$
where $N_{SP}(n)$ denotes the number of C-NOTs required for state preparation on n qubits starting from the state $|0\rangle^{\otimes n}$, $N_\Delta(m) \le 2^m - 2$ denotes the number of C-NOTs required to decompose a diagonal gate acting on m qubits [17] and $N_G(m, n)$ is the number of C-NOTs used to decompose the gates in $\{G_i\}_{i \in \{1,2,\ldots,2^m-1\}}$.
Proof: We decompose V as described in Lemma 18, and $\{G_i\}_{i \in \{1,2,\ldots,2^m-1\}}$ as in the proof of Lemma 15. By Lemma 16 we have
$$N_G(m, n) \le (2^m - 1)(2^n - n - 1) + Q(m, n)\, N_{C_{n-1}(U)},$$
where $Q(m, n) = 2^m (n - \frac{m}{2} - 1) - n + m + 1$ is the number of MCGs used, as given by Corollary 17, and $N_{C_{n-1}(U)}$ denotes the number of C-NOTs needed to decompose an MCG $C_{n-1}(U)$, given by Theorem 12. Note that we require $U \in SU(2)$ to use Theorem 12. This causes no problems in our construction, since Lemma 14 holds for $U \in SU(2)$. The gate $G_0^\dagger$ can be decomposed using a decomposition scheme for state preparation, which finishes the proof.
Corollary 20 (Explicit C-NOT count for an isometry): The number of C-NOTs required to decompose an isometry V from m to $n \ge 8$ qubits satisfies
Proof: Theorem 12 implies that $N_{C_{n-1}(U)} \le 28n - 88$ for all $n \ge 8$ (for simplicity we over-count in the case that n is odd). The asymptotic best-known C-NOT counts for state preparation (see Table I) give us the upper bound
The number of C-NOTs used to decompose a diagonal gate ∆ acting on m qubits is at most $N_\Delta(m) = 2^m - 2$ [17]. Using (38) this leads to the claimed count.
APPENDIX D OPTIMIZATION OF THE DECOMPOSITION OF AN ISOMETRY USING THE CSD
Theorem 21 (Optimized CSD approach): Let m and n be natural numbers with $2 \le m \le n$ and V be an isometry from m qubits to n qubits. There exists a decomposition of V in terms of single-qubit gates and C-NOTs such that the number of C-NOT gates required satisfies
(40)
Note that we recover the optimized C-NOT count for general quantum gates [9] by setting n = m in (40).
Proof: We optimize the C-NOT count of Section III-C using the two ideas described in the Appendix of [9]. There it is shown how one can combine the decomposition of the $C^u_i(R_y)$ gates with neighbouring i-qubit $C^u_1(U)$ gates to save one C-NOT gate over what would be required if the $C^u_i(R_y)$ gates were decomposed on their own. The essential idea is to use the circuit identity
The same idea also works for the CSD adapted to isometries, allowing us to save one C-NOT per uniformly controlled $R_y$ gate.
To count the number of uniformly controlled $R_y$ gates $Q_{R_y}(m, n)$ used for an m to n isometry using the decomposition scheme of Section III-C we use the following recursion relation:
where the last relation comes from Appendix A of [9]. Solving these gives $Q_{R_y}(m, n) = \frac{1}{144}\left(2^{2n+1} + 4^m\right) + \frac{1}{3}(n - m - 1)$.
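As a sanity check on the closed form, the following sketch evaluates it with exact rational arithmetic and confirms that it returns whole numbers, as a gate count must. It assumes the closed form is grouped as $\frac{1}{144}(2^{2n+1} + 4^m) + \frac{1}{3}(n - m - 1)$; this grouping of the fractions is a reading of the formula and the function name `q_ry` is hypothetical.

```python
from fractions import Fraction


def q_ry(m: int, n: int) -> Fraction:
    """Closed form for the number of uniformly controlled R_y gates,
    read as (2^(2n+1) + 4^m) / 144 + (n - m - 1) / 3 (assumed grouping)."""
    if not 2 <= m <= n:
        raise ValueError("requires 2 <= m <= n")
    return Fraction(2 ** (2 * n + 1) + 4 ** m, 144) + Fraction(n - m - 1, 3)


if __name__ == "__main__":
    for m, n in [(2, 2), (2, 3), (2, 4), (3, 5), (4, 4)]:
        value = q_ry(m, n)
        print(f"m={m}, n={n}: Q_Ry = {value} (integer: {value.denominator == 1})")
```

Since one C-NOT is saved per uniformly controlled $R_y$ gate, this number is also the saving contributed by the first optimization.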
The CSD decomposition is used until the only generic unitaries that remain are on two qubits. In Appendix B of [9] it is shown how to save one C-NOT gate for each of the remaining two-qubit gates apart from one. Again this idea also works using the CSD adapted to isometries. The number of two-qubit gates $Q_{U_2}(m, n)$ arising in the decomposition scheme described in Section III-C satisfies the following recursion relation:
where the last of these relations is taken from Appendix B of [9] . Solving these gives
The optimized C-NOT count is thus given by
where $\tilde{N}_{\mathrm{iso}}(m, n)$ is given by (12). This leads to the claimed count.
APPENDIX E OPTIMIZED STATE PREPARATION
For state preparation on two and three qubits there exist ad hoc methods using one and three C-NOT gates respectively [21]. For state preparation on $n \ge 4$ qubits we use the decomposition scheme described in [10]. In the case that n is even, this uses the following iterative circuit:
where we have divided the qubits into two groups of n/2. In other words, state preparation on n qubits is equivalent to state preparation on n/2 qubits, n/2 C-NOTs, and then two n/2-qubit unitary operations. If n is odd, then $U_2$ is replaced by an isometry from $\lfloor n/2 \rfloor$ to $\lfloor n/2 \rfloor + 1$ qubits.
If n is odd we can implement U 2 using the CSD approach. Furthermore, we can use a similar technical trick as described in Appendix B of [9] to save one C-NOT gate when implementing U 1 : as noted in Appendix B of [9] all apart from one of the two-qubit gates arising in the decomposition of a general unitary can be decomposed using two C-NOT gates. For the last one we can also extract a diagonal gate and merge it with the state preparation, since the diagonal gate commutes through the control qubits of the C-NOT gates that precede U 1 .
In other words, for n even, we have
where for the purpose of evaluating $N_{\mathrm{iso}}$ in these counts, we use (40). Starting from $N_{SP}(2) = 1$ and $N_{SP}(3) = 3$, this allows us to iteratively compute $N_{SP}(n)$ for increasing n, leading to the C-NOT counts presented in the first column of Table II in Section IV.
APPENDIX F C-NOT COUNT FOR AN ISOMETRY ON A SMALL NUMBER OF QUBITS
Lemma 22 (Column-by-column approach for small n): Let m and n be natural numbers with $m \le n$ and $n \ge 2$. There exists a decomposition of an isometry from m qubits to n qubits in terms of single-qubit gates and C-NOTs such that the number of C-NOT gates required satisfies
where $N_{SP}(n)$ denotes the number of C-NOTs required to prepare an n-qubit state starting from the state $|0\rangle^{\otimes n}$ (see Appendix E), $N_\Delta(m) \le 2^m - 2$ denotes the number of C-NOT gates required to decompose a diagonal gate ∆ acting on m qubits [17], $N_{C^u_k(U)} \le 2^k - 1$ denotes the number of C-NOT gates required to decompose a $C^u_k(U)$ gate up to diagonal gates [11] and $Q_s(m)$ is given by (34) and (35).
Proof: The proof is based on the decomposition scheme of Section III-B (or, respectively, of Appendix C) and the techniques described in Section IV. We use the notation of Appendix C in this section. To decompose the gate $G_0^\dagger$ we can use a decomposition scheme for state preparation on n qubits. For the gates in $\{G_i\}_{i \in \{1,2,\ldots,2^m-1\}}$ we use the techniques described in Section IV. For each such gate we use a uniformly (n − 1)-controlled gate to disentangle the least significant qubit. To disentangle the second least significant qubit we use $Q_1(m)$ uniformly (n − 1)-controlled gates and $2^m - 1 - Q_1(m)$ uniformly (n − 2)-controlled gates, where $Q_1(m)$ is defined in the proof of Corollary 17 in Appendix C. This is the case because merging an (n − 1)-controlled gate and a uniformly (n − 2)-controlled gate results in a uniformly (n − 1)-controlled gate, as described in Section IV. Continuing in this fashion leads to the claimed C-NOT count.
APPENDIX G C-NOT COUNT FOR A 1 TO 2 ISOMETRY
We present an ad hoc decomposition for a 1 to 2 isometry V reaching the theoretical lower bound of two C-NOT gates. Our result is based on the following decomposition of an arbitrary two-qubit operator U described in [9], [15], [16].
We represent V by a unitary matrix $V_2$ such that $V = V_2 I_{2^2 \times 2^1}$. Since we are only interested in the first two columns of $V_2$, we can replace the diagonal gate ∆ of the last circuit by a single-qubit diagonal gate acting on the least significant qubit. Absorbing this gate into the neighbouring (arbitrary) single-qubit gate we conclude the following circuit equivalence.
APPENDIX H CONSTRUCTION OF TABLE II
Since some C-NOT counts of Table II depend on others, we fill out the table column by column starting from m = 0 up to m = n. The first column (m = 0) corresponds to state preparation and uses the techniques described in Appendix E. For $m \ge 1$ we compare the counts (38), (50) and (40). Note that the counts (38) and (50), corresponding to the column-by-column approach, use the count for state preparation on n qubits (i.e., the first column of the table). We enter the lowest of the three counts into Table II. For example, in the case m = 3 and n = 9, to obtain the count given by (38), the first column of Table II has already been calculated and shows that we can implement state preparation on 9 qubits using $N_{SP}(9) \le 440$ C-NOTs. A diagonal gate on 3 qubits requires $N_\Delta(3) \le 2^3 - 2$ C-NOTs [17]. By Theorem 12, $N_{C_8(U)} \le 28 \cdot 9 - 92$ and therefore $N_G(3, 9) \le 11034$ (see proof of Theorem 19). We conclude that $N_{\mathrm{iso}}(3, 9) \le 11480$.
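The worked example for m = 3 and n = 9 can be reproduced with a short calculation. This is a minimal sketch under the assumption that the count (38) is the sum of the three contributions named in Theorem 19 (the gates $G_i$, state preparation, and the diagonal gate), with the gates $G_i$ costed via Lemma 16 and Corollary 17; the value $N_{SP}(9) \le 440$ is taken from the text as an input. All function names are hypothetical.

```python
def ucg_cnots(n: int) -> int:
    """C-NOTs for the UCGs of one column: sum over s of (2^(n-1-s) - 1) = 2^n - n - 1."""
    return sum(2 ** (n - 1 - s) - 1 for s in range(n))


def mcg_count(m: int, n: int) -> int:
    """Q(m, n) = 2^m (n - m/2 - 1) - n + m + 1, kept in integer arithmetic."""
    if m < 1:
        raise ValueError("requires m >= 1")
    return 2 ** (m - 1) * (2 * n - m - 2) - n + m + 1


def mcg_cnots(n: int) -> int:
    """Theorem 12: 28n - 88 C-NOTs for even n, 28n - 92 for odd n (n >= 8)."""
    if n < 8:
        raise ValueError("Theorem 12 requires n >= 8")
    return 28 * n - 88 if n % 2 == 0 else 28 * n - 92


def column_gates_cnots(m: int, n: int) -> int:
    """Bound on N_G(m, n): 2^m - 1 columns of UCGs plus Q(m, n) MCGs."""
    return (2 ** m - 1) * ucg_cnots(n) + mcg_count(m, n) * mcg_cnots(n)


def isometry_cnots(m: int, n: int, state_prep_cnots: int) -> int:
    """Assumed form of (38): gates G_i + state preparation + diagonal gate."""
    return column_gates_cnots(m, n) + state_prep_cnots + (2 ** m - 2)


if __name__ == "__main__":
    print(column_gates_cnots(3, 9))      # bound on N_G(3, 9)
    print(isometry_cnots(3, 9, 440))     # bound on N_iso(3, 9)
```

With these inputs the sketch uses 47 MCGs at 160 C-NOTs each and 7 columns of UCGs at 502 C-NOTs each, which reproduces the two bounds quoted in the example.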
