Several known algorithms for synthesizing quantum circuits in terms of elementary gates reduce arbitrary computations to diagonal computations [1, 2]. Circuits for n-qubit diagonal computations can be constructed using one (n − 1)-controlled one-qubit diagonal computation [3] and one inverter per pair of diagonal elements, not unlike the construction of classical AND-OR-NOT circuits from the lines of the truth table of a one-output Boolean function. More economical quantum circuits for diagonal computations are known in special cases [5, 2].
Introduction
Logic circuits provide a notation for compositions of multiple functions, often bearing computational semantics. For example, F[g(x), h(y)] might be represented by three gates labelled F, g and h. Input lines of gates g and h are then labelled x and y, and their output lines enter the gate F, whose output lines carry the result of the computation. Observe that if g and h are viewed as one gate, any pair (x, y) is a valid input, which is how classical bits are combined into bit-strings. The situation is similar in quantum computing, except that qubits are complex two-dimensional vectors. Combined gates are applied to tensor-product vector spaces and represented mathematically by matrix tensor products. With an appropriate gate library, a logic circuit outlines how to implement a given computation in hardware. This motivates circuit synthesis, i.e., finding circuits that implement functionally-specified computations.
We briefly recall the following definitions [9]. First, an n-qubit state vector is an element of $\bigotimes_{1}^{n}\mathbb{C}[|0\rangle, |1\rangle]$, with abbreviations such as $|01\rangle = |0\rangle \otimes |1\rangle$ being typical. In this work, measurements are not allowed mid-computation. Thus, a quantum computation is a $2^n \times 2^n$ matrix $A$ with complex entries which is moreover unitary, i.e., $A\bar{A}^t = \mathbf{1}$. Here, $A$ is written in terms of the computational basis $|00\cdots0\rangle, |00\cdots01\rangle, |00\cdots10\rangle, \ldots$

* Partially supported by the University of Michigan mathematics department VI-GRE grant and the DARPA QuIST program. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing official policies or endorsements, either expressed or implied, of employers and funding agencies.
With more than three qubits, some quantum computations can be specified more compactly using quantum circuit diagrams such as those in Figure 3 . The more significant qubits correspond to higher lines. Gates are applied left to right and are chosen from the following universal gate library [1] :
• A y-axis Bloch sphere rotation $R_y(\theta) = \begin{pmatrix} \cos\theta/2 & \sin\theta/2 \\ -\sin\theta/2 & \cos\theta/2 \end{pmatrix}$, where $0 \le \theta < 2\pi$.
• A z-axis Bloch sphere rotation $R_z(\alpha) = \begin{pmatrix} e^{-i\alpha/2} & 0 \\ 0 & e^{i\alpha/2} \end{pmatrix}$, where $0 \le \alpha < 2\pi$.
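• A CNOT gate $\mathrm{CNOT}^{\ell}_{j}$, which is controlled on the $j$th line and changes the $\ell$th. Given a bit string $b_1 \cdots b_n$ and letting $\oplus$ denote the XOR operation, also known as addition in the field of two elements $\mathbb{F}_2$, the gate $\mathrm{CNOT}^{\ell}_{j}$ exchanges the basis states $|b_1 \cdots b_\ell \cdots b_n\rangle$ and $|b_1 \cdots (b_\ell \oplus b_j) \cdots b_n\rangle$.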
The three types of gates above can be applied to any qubits and are considered elementary. We will also use several types of composite gates as shortcuts for frequently used circuits. For example, the one-qubit NOT gate (also known as Pauli-X or $\sigma_x$) is expressed using two elementary gates, up to a global phase:

$X = i\, R_z(\pi)\, R_y(\pi)$ (1)

We recall that the term CNOT means "Controlled-NOT", i.e., NOT is applied on the controlled line iff the control line carries 1. Along the same lines, we define the k-controlled NOT gate, denoted k-CNOT, which preserves the values $b_{s_1}, b_{s_2}, \ldots, b_{s_k}$ on its k control lines and produces $b_\ell \oplus (b_{s_1} b_{s_2} \cdots b_{s_k})$ on the controlled line $\ell$, i.e., the inverter is applied iff all control lines carry 1. The 2-CNOT is also called the Toffoli gate. In Section 2 we outline how k-CNOT gates are implemented in terms of elementary gates and consider a more general construction of k-controlled one-qubit gates, in which a given one-qubit gate is applied iff all control lines carry 1. When using such gates in a broader context, one may want to specify the k control lines as a subset S of all lines; we then say that the gate is S-controlled. We denote S-controlled $R_z(\theta)$ rotations by $CR_z(S, \theta)$ and point out that their decompositions into elementary gates are given in [1, 7]. These composite quantum gates are used in circuit synthesis algorithms in Section 3. Furthermore, we consider a less common type of control, which applies a given one-qubit gate iff the values on the control lines XOR to one rather than AND to one. In Section 4 such XOR-controlled gates are denoted by $XR_z(S, \theta)$ and used in circuit synthesis algorithms. We also decompose them into elementary gates as shown in Figure 3.
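As a quick numerical sanity check of the gate library and of Equation (1), the sketch below builds $R_y$ and $R_z$ as 2 × 2 matrices and applies $\mathrm{CNOT}^{\ell}_{j}$ to bit strings in plain Python. The helper names (`ry`, `rz`, `matmul`, `cnot_on_bits`) are hypothetical, lines are numbered from 1 at the top as in the text, and this is an illustration rather than the authors' code.

```python
import cmath
import math


def ry(theta):
    """R_y(theta) with the sign convention of the gate library above."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, s], [-s, c]]


def rz(alpha):
    """R_z(alpha) = diag(e^{-i alpha/2}, e^{i alpha/2})."""
    return [[cmath.exp(-1j * alpha / 2), 0], [0, cmath.exp(1j * alpha / 2)]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def cnot_on_bits(bits, target, control):
    """CNOT^target_control on a bit string b_1 ... b_n (line 1 is the top line)."""
    out = list(bits)
    out[target - 1] ^= bits[control - 1]
    return out


# Matrix product R_z(pi) R_y(pi): R_y(pi) acts first in the circuit.
# Multiplying by the global phase i should give the NOT gate X = [[0, 1], [1, 0]].
not_gate = [[1j * entry for entry in row] for row in matmul(rz(math.pi), ry(math.pi))]
```

For instance, `cnot_on_bits([1, 0, 1], target=2, control=1)` flips line 2 because line 1 carries 1, giving `[1, 1, 1]`.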
As in prior work [2], our main goal is to produce quantum circuits with elementary gates for a specified sort of quantum computation. Our main result generalizes a prior result [2, §2.2] to n-qubit computations. The prior result asserts that any 2-qubit unitary matrix whose off-diagonal entries in the computational basis vanish can be synthesized using at most 5 elementary gates.

Definition 1.1 An n-qubit unitary matrix $A = (a_{ij})$ is diagonal iff $a_{ij} = 0$ whenever $i \ne j$. We describe a quantum computation as diagonal when its associated matrix in terms of the computational basis is diagonal. Finally, the notation $A = \mathrm{diag}(b_1, \cdots, b_n)$ means that $A$ is the $n \times n$ diagonal matrix $A = (a_{ij})$ with $a_{ii} = b_i$.

Proposition 1.2 Let $A = \mathrm{diag}(z_1, \cdots, z_{2^n})$ be an n-qubit diagonal quantum computation. Then there exist an (n − 1)-qubit diagonal quantum computation $B = \mathrm{diag}(w_1, \cdots, w_{2^{n-1}})$ and a one-qubit diagonal $C = \mathrm{diag}(y_1, y_2)$ so that $A = B \otimes C$ if and only if

$z_1/z_2 = z_3/z_4 = \cdots = z_{2^n-1}/z_{2^n}.$
Proof: Checking that such a tensor product satisfies this chain of equalities is routine.
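For the opposite implication, begin with $A = \mathrm{diag}(z_1, z_2, \cdots, z_{2^n})$ and define the $C$ of the statement by $C = \mathrm{diag}(z_1, z_2)$. Since $A$ is unitary, $z_1 \ne 0$. Thus, choose $B = \mathrm{diag}(1, z_3/z_1, z_5/z_1, \ldots, z_{2^n-1}/z_1)$; the chain of equalities gives $z_{2k} = (z_{2k-1}/z_1)\, z_2$ for every k, i.e., $A = B \otimes C$.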
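The criterion of Proposition 1.2 and the factorization from its proof are easy to check numerically. In this sketch a diagonal is represented by the list of its diagonal entries; `is_tensor`, `factor` and `kron_diag` are hypothetical helper names used only for illustration.

```python
import cmath


def is_tensor(z, tol=1e-9):
    """Proposition 1.2: diag(z) = B (x) C iff z1/z2 = z3/z4 = ... = z_{2^n-1}/z_{2^n}."""
    ratios = [z[2 * k] / z[2 * k + 1] for k in range(len(z) // 2)]
    return all(abs(r - ratios[0]) < tol for r in ratios)


def factor(z):
    """Split diag(z) as diag(w) (x) diag(y), following the proof of Proposition 1.2."""
    if not is_tensor(z):
        raise ValueError("diagonal is not of the form B (x) C")
    y = [z[0], z[1]]                                    # C = diag(z1, z2)
    w = [z[2 * k] / z[0] for k in range(len(z) // 2)]   # B = diag(1, z3/z1, z5/z1, ...)
    return w, y


def kron_diag(w, y):
    """Diagonal of diag(w) (x) diag(y)."""
    return [a * b for a in w for b in y]


if __name__ == "__main__":
    B = [1, cmath.exp(0.3j), cmath.exp(-1.1j), cmath.exp(2.0j)]
    C = [cmath.exp(0.1j), cmath.exp(-0.7j)]
    print(is_tensor(kron_diag(B, C)), is_tensor([1, 1, 1, -1]))  # True False
```

The second call fails the test because the ratios $z_1/z_2 = 1$ and $z_3/z_4 = -1$ differ.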
Before giving a key definition, we recall that in Lie theory the term character denotes a continuous complex-valued function χ on a group (typically a group of matrices) with the property χ(ab) = χ(a)χ(b). Our manuscript does not assume familiarity with Lie theory, and we state all necessary properties of characters explicitly; e.g., for a character χ we typically consider the function log χ, which has the property log χ(ab) = log χ(a) + log χ(b).
Definition 1.3
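Let D(n) be the set of n-qubit diagonal quantum computations. Then for $j = 1, \cdots, 2^{n-1} − 1$, we define the character $\chi_j : D(n) \to U(1)$ by $\chi_j(\mathrm{diag}(z_1, \cdots, z_{2^n})) = z_{2j-1}\,\bar z_{2j}\,\bar z_{2j+1}\,z_{2j+2}$.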
By Proposition 1.2, the elements of D(n) that are tensor products B ⊗ C are precisely the elements of $\bigcap_{j=1}^{2^{n-1}-1} \ker \chi_j$. The circuit synthesis algorithm then proceeds as follows for the diagonal input matrix A.
• For each nonempty subset S of the lines {1, · · · , n − 1}, we build a circuit $XR_z(S, \theta)$ whose effect depends on a variable $R_z(\theta)$ gate within the block. Each such circuit translates the vector $[\log\chi_1(A) \cdots \log\chi_{2^{n-1}-1}(A)]^t$ by a θ-dependent multiple of a fixed nonzero vector.
• An appropriate check shows that the vectors of the last item are linearly independent. Since there are $2^{n-1} − 1$ nonempty subsets of {1, · · · , n − 1} and likewise $2^{n-1} − 1$ characters, we may choose the θ's so that every character evaluates to 1 (i.e., every log χ_j vanishes) on XR-circuits • A.
• This produces a circuit decomposition A = XR-Circuits•(B⊗ C) with B an (n − 1)-qubit diagonal and C a 1-qubit diagonal.
• Recurse on B.
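The characters driving this outline can be evaluated directly on the list of diagonal entries. The sketch below assumes the form $\chi_j(A) = z_{2j-1}\bar z_{2j}\bar z_{2j+1} z_{2j+2}$, i.e. the quotient of the pair ratios $z_{2j-1}/z_{2j}$ and $z_{2j+1}/z_{2j+2}$ (using $|z_k| = 1$). With this form it reproduces $\chi_1(A) = e^{2\pi i/6}$ for the matrix of Example 3.7, and every character equals 1 on a tensor product, matching the kernel criterion above. The function names are illustrative only.

```python
import cmath
import math


def characters(z):
    """chi_j(A) for j = 1 .. 2^{n-1} - 1, for A = diag(z) with |z_k| = 1.

    Assumed form: chi_j = (z_{2j-1}/z_{2j}) / (z_{2j+1}/z_{2j+2}) (1-based indices).
    """
    pair_ratio = [z[2 * k] / z[2 * k + 1] for k in range(len(z) // 2)]
    return [pair_ratio[j] / pair_ratio[j + 1] for j in range(len(pair_ratio) - 1)]


def log_chi(z):
    """log chi_j(A) divided by i, as principal-value phases in (-pi, pi]."""
    return [cmath.phase(c) for c in characters(z)]


# Example 3.7: A = diag(e^{k pi i/6}) for the exponents below.
example_37 = [cmath.exp(1j * math.pi * k / 6) for k in (6, 3, 9, 8, 5, 1, 6, 0)]

# A tensor product B (x) C lies in the kernel of every chi_j.
tensor = [b * c for b in (1, 1j, -1, cmath.exp(0.4j)) for c in (cmath.exp(0.2j), -1j)]
```

Here `log_chi(example_37)` evaluates to approximately $[2\pi/6, -3\pi/6, -2\pi/6]$.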
The gate count implied by this construction is given below.

Figure 1: The CA({1, 2, 3}, θ, φ) gate shown on the left is surrounded by inverters on lines one and two. The top line is an ancilla (work) qubit initialized to |0⟩. Say $A = \alpha E_{11} + \beta E_{22} = e^{i\varphi}R_z(\theta)$ for some α, β ∈ ℂ and angles φ, θ. Then the circuit realizes the diagonal quantum computation whose diagonal elements are all 1 except in positions $2 = 0010_b$ and $3 = 0011_b$, where they are α and β, respectively. By changing the placement of inverters, any four-qubit diagonal computation may be built from eight sub-circuits of the type shown. Generally, in n qubits the use of an ancilla allows any (n − 1)-conditioned A gate to be realized in Θ(n) rather than Θ(n²) gates [1]. Figure 2 shows a decomposition of 3-CNOTs into Toffoli gates (2-CNOTs), which can be further decomposed into elementary gates [1].
This compares favorably with a straightforward algorithm shown in Section 2, where the leading term of the gate count is estimated at (i) $12n2^n$ if one ancilla qubit is available, and (ii) $\Theta(n^2 2^n)$ otherwise. In particular, we lower the leading term from $12n2^n$ to $\frac{1}{2}n2^n$. To our knowledge, no comparable results have been published. Diagonal quantum computations are considered in a different context [5], where measurement is allowed as an elementary gate.
The remaining part of the manuscript is structured as follows. Known results are discussed in Section 2, including those in [5] and also the construction of k-controlled z-rotations. The latter are used for synthesis algorithms in Section 3. A novel construction of XOR-controlled circuits is given in Section 4.1 and used in synthesis algorithms in Section 4.2. The correctness of our main algorithm is proven in Section 4.3. Gate counts are given in Section 5, and the manuscript concludes with directions for future work in Section 6. Our constructions are illustrated by examples.
Prior Work
Given the truth table of an n-to-1 Boolean function $\phi : (\mathbb{F}_2)^n \to \mathbb{F}_2$, an AND-OR-NOT circuit computing φ easily follows. Namely, consider an input bit-string $\bar b$ with $\phi(\bar b) = 1$. Using AND and NOT gates, one constructs a one-output circuit which produces 1 iff its input is $\bar b$. Build such circuits for all lines of the truth table corresponding to φ = 1 and connect their outputs to one giant OR gate. This circuit computes φ and can be decomposed into truly elementary two-input gates via tree decompositions of the large AND and OR gates. The resulting overall circuit may be referred to as two-level logic, a disjunctive normal form, or a sum-of-products (SOP) decomposition. The AND operation may be viewed as logic multiplication, and the OR operation as Boolean addition.
Two-level AND-OR logic is a standard topic in textbook logic synthesis [4]. Its optimization is NP-hard and has been studied intensively since at least the 1960s. Two-level logic optimization algorithms and tools (e.g., Espresso) are widely known, and some are used in commercial CAD tools. More recently, two-level ESOP (a.k.a. EXOR-SUM) decompositions have been introduced. Such decompositions use XOR gates rather than OR gates and can be produced from AND-OR decompositions, e.g., along the lines of (b
. Publicly available tools for such ESOP decompositions include EXORCISM-4 [6, 8].
While our main focus is on quantum circuits, we observe that classical two-level AND-OR circuits can be constructed by treating every nonzero value in the truth table separately, by means of a sub-circuit responsible only for that line of the truth table. A number of parallels exist between the synthesis of classical two-level logic on n inputs and of diagonal n-qubit computations. However, note that the present work focuses on arbitrary diagonal n-qubit computations and worst-case complexities, which are often similar to average-case complexities for both classical and quantum circuits. Yet industrial benchmarks are often easier than randomly chosen functions or circuits. This work does not consider parallels to specific two-level logic optimization tools which target industrial circuits.
In reversible logic circuits, AND gates are not available, but k-CNOT gates allow for a similar construction. These gates act on a bit-string by computing the conjunction of the k control bits and then XOR'ing the result into the prior value of the controlled bit. Inverters, necessary at some inputs of some AND gates in classical two-level circuits, can be modelled by pairs of inverters on control lines of k-CNOT gates, one before and one after. To this end, we denote the collection of parallel inverters on lines from a subset S by $\otimes_S X$, where X stands for an inverter.¹ The giant OR gates are then modelled by chaining multiple k-CNOT gates that share the same controlled bit through the circuit. In fact, such circuits correspond one-to-one to XOR-SUM decompositions of the Boolean function computed on the controlled bit and have been studied in the context of ROM-based classical and quantum computation [10], where signals on only some lines can be modified.
More generally, we consider other k-controlled one-qubit unitaries besides the k-CNOT gates, which are directly analogous to k-input AND gates. Specifically, the multi-argument AND gates will be replaced by the quantum computations CA(S, θ, φ) realized by the circuit diagram of Figure 1. In analogy to the giant XOR gate, we compose CA(S, θ, φ) gates acting on the same nth qubit. In this case, the effect of all CA gates is cumulative, and each one-qubit computation is applied only when its control qubits have the right values. Classical two-level circuit decompositions can be viewed similarly.

Definition 2.1 Let S ⊂ {1, · · · , n − 1} with S ≠ ∅, let θ, φ ∈ ℝ, and let $R_z(\theta)$ be the Bloch sphere rotation of the introduction. Then CA(S, θ, φ) is the quantum computation whose action on the computational basis states is given as follows: $|b_1 \cdots b_{n-1}\rangle \otimes |b_n\rangle \mapsto |b_1 \cdots b_{n-1}\rangle \otimes e^{i\varphi}R_z(\theta)|b_n\rangle$ if $b_s = 1$ for all $s \in S$, and $|b_1 \cdots b_n\rangle \mapsto |b_1 \cdots b_n\rangle$ otherwise.
For S and θ as above, we also introduce the notation $CR_z(S, \theta) = \mathrm{CA}(S, \theta, 0)$. Direct computation verifies that the CA(S, θ, φ) are diagonal; moreover, each is realized by a circuit diagram as in Figure 1. Although some gates in Figure 1 are not elementary, their decompositions are known [1]. By placing inverters appropriately around the CA({1..n − 1}, θ, φ) gate, i.e., forming $\otimes_S X \cdot \mathrm{CA}(\{1..n-1\}, \theta, \varphi) \cdot \otimes_S X$, any pair of consecutive diagonal entries $z_{2j-1}, z_{2j}$ may be set to any two values in U(1). Composing such computations, one can synthesize an arbitrary diagonal quantum computation.
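To illustrate this straightforward construction, the sketch below models each conjugated gate $\otimes_S X \cdot \mathrm{CA}(\{1..n-1\}, \theta, \varphi) \cdot \otimes_S X$ directly on the list of diagonal entries. Conjugating a diagonal by inverters only permutes its entries, and CA multiplies the last pair by $e^{i\varphi}R_z(\theta)$. The sketch composes one such gate per pair of entries. It simulates matrices only and does not produce the elementary-gate decomposition of Figure 1; the function names are hypothetical, and line 1 is taken as the most significant bit.

```python
import cmath


def ca_full(n, theta, phi):
    """Diagonal of CA({1..n-1}, theta, phi): e^{i phi} R_z(theta) on line n iff lines 1..n-1 carry 1."""
    d = [1] * 2 ** n
    d[-2] = cmath.exp(1j * (phi - theta / 2))
    d[-1] = cmath.exp(1j * (phi + theta / 2))
    return d


def conjugate_by_inverters(d, n, S):
    """Diagonal of (x)_S X . diag(d) . (x)_S X; line 1 is the most significant bit."""
    mask = sum(1 << (n - line) for line in S)
    return [d[i ^ mask] for i in range(len(d))]


def synthesize_with_ca(z):
    """Write diag(z) as a product of 2^{n-1} conjugated CA gates; return the gates and the product."""
    n = len(z).bit_length() - 1
    product, gates = [1] * len(z), []
    for p in range(2 ** (n - 1)):  # p = value of the top n-1 bits of the targeted pair
        S = [line for line in range(1, n) if not (p >> (n - 1 - line)) & 1]  # lines where p has a 0
        alpha, beta = cmath.phase(z[2 * p]), cmath.phase(z[2 * p + 1])
        theta, phi = beta - alpha, (alpha + beta) / 2  # e^{i(phi -/+ theta/2)} = z_{2p+1}, z_{2p+2}
        gate = conjugate_by_inverters(ca_full(n, theta, phi), n, S)
        gates.append((tuple(S), theta, phi))
        product = [a * b for a, b in zip(product, gate)]
    return gates, product
```

For n = 3, the first gate targets the pair of |00⟩ and therefore carries inverters on both control lines, while the last targets |11⟩ and needs none.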
According to the decomposition in Figure 1 [1, Lemma 7.11], such a k-controlled one-qubit diagonal computation may be viewed as an elementary one-qubit computation flanked by two k-CNOTs if one reusable ancilla qubit initialized to |0⟩ is available. Also [1, Lemma 7.4], a k-CNOT can be implemented using 8(k − 3) Toffoli gates for any k ≥ 5 given an ancilla qubit. Accounting for further normalizations and cancellations, a k-CNOT is implemented with 48k − 116 elementary gates and one ancilla qubit initialized to |0⟩.

¹ $\otimes_S X$ is a slight abuse of notation, as we also implicitly tensor by $\mathbf{1}_2$.
To produce an overall gate count (or, rather, an upper bound), we observe that any two diagonal quantum computations U, V commute (UV = VU). In particular, this is true for cir-
To set each pair of entries of a given diagonal computation, we choose one such circuit for every subset S ⊂ {1, . . . , n − 1}. To summarize, we have $2^{n-1}$ CA({1..n − 1}, θ, φ) gates, each taking up to 48n − 164 elementary gates. To count the elementary gates in the inverters, we note that every possible S occurs exactly once and contributes 2|S| inverters, one on each side of its CA gate. Therefore, we need $\sum_S 2|S| = (n-1)2^{n-1}$ inverters in total.
The overall count of $2^{n-1}(49n − 165)$ elementary gates to implement an arbitrary n-qubit diagonal in the presence of an ancilla qubit may be improved by ordering the Figure 1 circuits so as to cancel most of the inverters. This can be achieved via a Gray-code ordering of the subsets S, in which every two consecutive subsets differ by exactly one X gate. Thus a single inverter separates every consecutive pair of CA({1, · · · , n − 1}, θ, φ) circuits. This decreases the overall gate count to

$2^{n-1}(48n − 164) + 2^{n-1} − 2 = 2^{n-1}(48n − 163) − 2,$ (2)

barring further cancellations of CNOT gates in the implementations of CA({1, · · · , n − 1}, θ, φ). However, since lingering inverters are equally distributed on all possible lines, one does not expect more than half of the 1-CNOTs per CA({1, · · · , n − 1}, θ, φ) gate to cancel on average. Therefore, our estimated gate count is $12n2^n$. Interestingly, without the single ancilla qubit used above, a k-CNOT gate requires a quadratic rather than linear number of elementary gates. Moreover, without the ancilla, the overall gate count becomes $\Theta(n^2 2^n)$ rather than $\Theta(n2^n)$. The following sections accomplish two improvements, using characters defined on D(n):
• We achieve $\Theta(n2^n)$ gates without any ancilla qubit.
• We lower the leading term of the gate count from $12n2^n$ to $\frac{n}{2}2^n$.

Finally, we point out that special-case circuits for diagonal computations are proposed in [5]. They use measurements on ancilla qubits (and thus are not purely combinational quantum circuits) and do not solve our generic synthesis problem for diagonal computations. Moreover, later steps of their algorithms depend on outcomes of earlier measurements, which is roughly equivalent to using multiplexor gates (also known as MUX or if-then-else gates) immediately after measurement. Furthermore, the emphasis of that work is on $A = \mathrm{diag}(a_1, \cdots, a_{2^n}) \in D(n)$ for which at most p(n) of the $a_j \ne 1$, for some polynomial p in n, where moreover an oracle $f(\bar b)$ detecting nonunit entries may be evaluated using polynomial resources. Despite the radically different setting, we emphasize the following point. The generic measurement algorithm [5, §3] would need to synthesize the $2^n − 1 \approx 2^n$ nonunit diagonal phases within the tensor $\otimes_{j=1}^{n} R_z(\theta_j)$ individually. Since phases are generically unlikely to repeat, the functions $U_f$ in the worst case become $U_{\delta_{\bar b}}$ [5, p. 1351] for $\bar b$ an (n − 1)-bit string and $\delta_{\bar b}$ the corresponding delta function. Thus, in the present notation, $U_{\delta_{\bar b}}$ is an (n − 1)-conditioned CNOT. Each costs roughly 50n elementary gates (§2, [1]) in the presence of an ancilla. The work in [5] approximates arbitrary rotations with binary digits, and approximating $\otimes_{j=1}^{n} R_z(\theta_j)$ to one-bit precision requires at least $100n2^n$ elementary gates in addition to $2^n$ measurements. To improve the generic algorithm's performance in this case, a criterion for detecting full tensors [5, §4] was provided to trap $\otimes_{j=1}^{n} R_z(\theta_j)$ as an exception and execute the tensor as such. In contrast, our main algorithm (§4, §5) and a secondary construction (§3) both implement $\otimes_{j=1}^{n} R_z(\theta_j)$ in n elementary gates as the given tensor, modulo blocks of cancelling CNOT gates. No exception checking is required.
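The inverter bookkeeping of the straightforward construction in this section can be checked by enumeration. The sketch below counts the inverters of the unordered construction (two per line of each subset S, one before and one after the CA gate) and generates a reflected binary Gray code over the subsets of {1, …, n − 1}, confirming that consecutive subsets differ in exactly one line. The reflected Gray code is one standard choice made here as an assumption; the text does not specify which Gray code is used, and the helper names are hypothetical.

```python
from itertools import combinations


def all_subsets(m):
    """All subsets of {1, ..., m}, including the empty set."""
    return [set(c) for k in range(m + 1) for c in combinations(range(1, m + 1), k)]


def naive_inverter_count(n):
    """Two inverters per line of S (one before and one after CA), summed over all S."""
    return sum(2 * len(S) for S in all_subsets(n - 1))


def naive_total(n):
    """2^{n-1} CA gates of 48n - 164 elementary gates each, plus the inverters."""
    return 2 ** (n - 1) * (48 * n - 164) + naive_inverter_count(n)


def gray_code_subsets(m):
    """Subsets of {1, ..., m} in reflected binary Gray-code order."""
    codes = [i ^ (i >> 1) for i in range(2 ** m)]
    return [{line for line in range(1, m + 1) if (g >> (m - line)) & 1} for g in codes]
```

In Gray-code order, only one inverter is needed between neighboring CA circuits instead of up to 2(n − 1).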
Synthesis via Controlled Rotations
This section describes a different synthesis algorithm in terms of CR z (S, θ) gates, compared to that analyzed in Section 2. This new algorithm motivates further synthesis algorithms in terms of XR z (S, θ) gates.
Subset Controlled Rotations
Following Definition 2.1, an (n − 1)-qubit computational basis state $|b_1 b_2 \cdots b_{n-1}\rangle$ with $b_i = 1$ for each $i \in S$ will be referred to as S-conditioned.
Definition 3.1
Let $e_j$, $1 \le j \le 2^{n-1} − 1$, denote the standard basis vectors of $\mathbb{R}^{2^{n-1}-1}$, e.g., $[0 \ldots 0\ 1\ 0 \ldots 0]^t$. The symbol $\bar b$ denotes the bit-string $b_1 \cdots b_{n-1}$ or the corresponding integer with this binary representation. These states are associated to vectors in $\mathbb{R}^{2^{n-1}-1}$ as follows. For the extremal computational basis states, set $v_{|00\cdots0\rangle} = -e_1$ and $v_{|11\cdots1\rangle} = e_{2^{n-1}-1}$; otherwise let $v_{|\bar b\rangle} = e_{\bar b} - e_{\bar b+1}$.
Observe that the vectors $\{v_{|\bar b\rangle}\}_{\bar b \ne 0}$ form a basis for $\mathbb{R}^{2^{n-1}-1}$. They are used in Propositions 3.3 and 4.7.
Example 3.2 In the case of n = 4 qubits, where the states above denote computational basis states of the values on the top three lines, we form a basis of $\mathbb{R}^7$. In particular, $v_{|000\rangle} = -e_1$ does not count, while the basis consists of $v_{|001\rangle} = e_1 - e_2$, $v_{|010\rangle} = e_2 - e_3$, …, $v_{|101\rangle} = e_5 - e_6$, $v_{|110\rangle} = e_6 - e_7$, and $v_{|111\rangle} = e_7$. We will typically omit $v_{|000\rangle}$.
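The vectors of Definition 3.1 can be generated directly from the rule $v_{|\bar b\rangle} = e_{\bar b} - e_{\bar b+1}$ together with its two boundary cases. The sketch below (hypothetical helper names, exact rational arithmetic via `fractions.Fraction`) builds them for n = 4 and checks that the seven nonzero states give a basis of $\mathbb{R}^7$, as stated in Example 3.2.

```python
from fractions import Fraction


def v_vector(b, n):
    """v_{|b>} in R^{2^{n-1}-1}; b is the integer value of the top n-1 bits."""
    dim = 2 ** (n - 1) - 1
    vec = [0] * dim
    if b >= 1:
        vec[b - 1] += 1   # + e_b  (there is no e_0, hence v_{|00...0>} = -e_1)
    if b + 1 <= dim:
        vec[b] -= 1       # - e_{b+1}  (there is no e_{2^{n-1}}, hence v_{|11...1>} = e_{2^{n-1}-1})
    return vec


def rank(rows):
    """Rank of a list of integer vectors, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * p for a, p in zip(m[i], m[r])]
        r += 1
    return r


basis_n4 = [v_vector(b, 4) for b in range(1, 8)]  # v_{|001>}, ..., v_{|111>}
```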
Now consider the map $\log\chi : D(n) \to i\mathbb{R}^{2^{n-1}-1}$ given by $\log\chi(A) = [\log\chi_1(A) \cdots \log\chi_{2^{n-1}-1}(A)]^t$.
This map is a group homomorphism between the commutative group D(n) and the commutative (additive) group $i\mathbb{R}^{2^{n-1}-1}$. Now consider $A = CR_z(S, \theta)$. If the binary expression for j − 1 represents an S-conditioned state, then $z_{2j-1} = e^{-i\theta/2}$ and $z_{2j} = e^{i\theta/2}$; if it is not S-conditioned, then both entries are 1. Likewise, if the binary expression for j is an S-conditioned bit-string, then $z_{2j+1} = e^{-i\theta/2}$ and $z_{2j+2} = e^{i\theta/2}$; otherwise both entries are 1. A case analysis verifies that the formula above holds for the jth component of log χ(A).
Thus we have computed the right-hand side of Proposition 3.3.
Description of the Controlled Synthesis Algorithm
Note that there are $2^{n-1} − 1$ nonempty subsets of {1, · · · , n − 1} and $2^{n-1} − 1$ functions $\chi_j : D(n) \to U(1)$, i.e., $\log\chi : D(n) \to i\mathbb{R}^{2^{n-1}-1}$.
Thus, the following matrix is square.
Definition 3.5
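The matrix $\log\chi[CR_z(n)]$ is the $(2^{n-1} − 1) \times (2^{n-1} − 1)$ matrix defined as follows. Order the nonempty subsets $S_1, S_2, \ldots, S_{2^{n-1}-1}$ in dictionary order. Then for $1 \le j \le 2^{n-1} − 1$, the jth column of $\log\chi[CR_z(n)]$ is $\log\chi[CR_z(S_j, 1)]$.
Lemma 3.6 Let $\theta = [\theta_1 \cdots \theta_{2^{n-1}-1}]^t$. Then for $S_1, S_2, \ldots, S_{2^{n-1}-1}$ the dictionary ordering of the nonempty subsets of {1, · · · , n − 1}, we have
$\log\chi\big(CR_z(S_1, \theta_1) \cdots CR_z(S_{2^{n-1}-1}, \theta_{2^{n-1}-1})\big) = \log\chi[CR_z(n)]\,\theta.$
Proof: By the homomorphism property, the left-hand side equals $\sum_j \log\chi(CR_z(S_j, \theta_j)) = \sum_j \theta_j \log\chi(CR_z(S_j, 1))$, which can be seen to equal the right-hand side.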
We now state the controlled rotation synthesis algorithm for a diagonal unitary computation. The proof of correctness in the next subsection verifies the assertions that $\log\chi[CR_z(n)]$ is invertible for all n and that D is, as stated, a tensor.
Controlled Rotation Synthesis Algorithm
Begin with $A \in D(n)$, for which we wish to synthesize a circuit diagram in terms of the elementary gates of the introduction. Label $S_1, S_2, S_3, \ldots, S_{2^{n-1}-1}$ to be the nonempty subsets of the top n − 1 lines {1, · · · , n − 1} in dictionary order.
1. Compute $\psi = \log\chi(A)$.
2. Compute the inverse matrix $\{\log\chi[CR_z(n)]\}^{-1}$.
3. Compute $\theta = \{\log\chi[CR_z(n)]\}^{-1}\psi$.
4. Let $D = CR_z(S_1, -\theta_1) \cdots CR_z(S_{2^{n-1}-1}, -\theta_{2^{n-1}-1})\, A$. As is verified below, D is a tensor.
5. Use the argument of Proposition 1.2 to compute $D = B \otimes C$ for $B \in D(n − 1)$ and $C = R_z(\eta)$ for some angle η.
6. Thus $A = CR_z(S_1, \theta_1) \cdots CR_z(S_{2^{n-1}-1}, \theta_{2^{n-1}-1})\,(B \otimes C)$. Techniques from the literature are then used to decompose each $CR_z(S, \theta)$ into elementary gates per Figure 2 [1].
7. The algorithm terminates by recursively producing a circuit diagram for $B \in D(n − 1)$.
Example 3.7
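Let $A = \mathrm{diag}(e^{6\pi i/6}, e^{3\pi i/6}, e^{9\pi i/6}, e^{8\pi i/6}, e^{5\pi i/6}, e^{1\pi i/6}, e^{6\pi i/6}, 1)$. Then one has $\chi_1(A) = e^{2\pi i/6}$, $\chi_2(A) = e^{-3\pi i/6}$, and $\chi_3(A) = e^{-2\pi i/6}$, so $\psi = \log\chi(A) = i[2\pi/6\ \ {-3\pi/6}\ \ {-2\pi/6}]^t$.
We now must compute θ by computing the inverse matrix $\{\log\chi[CR_z(3)]\}^{-1}$. For this matrix, first compute the following, with $S_1 = \{1\}$, $S_2 = \{1, 2\}$, $S_3 = \{2\}$:
$\log\chi[CR_z(3)] = i\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -1 \\ 0 & 1 & 1 \end{pmatrix}.$
The following inverse matrix results, and it may be reused for multiple specific diagonals A:
$\{\log\chi[CR_z(3)]\}^{-1} = -i\begin{pmatrix} 1 & 1 & 0 \\ -1 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$
So $\theta = \{\log\chi[CR_z(3)]\}^{-1}\psi = [-\pi/6\ \ {-4\pi/6}\ \ 2\pi/6]^t$. Hence D as defined below is a tensor:
$D = CR_z(\{1\}, \pi/6)\, CR_z(\{1, 2\}, 4\pi/6)\, CR_z(\{2\}, -2\pi/6)\, A.$
Since D is a tensor, we obtain the following decomposition of A:
$A = CR_z(\{1\}, -\pi/6)\, CR_z(\{1, 2\}, -4\pi/6)\, CR_z(\{2\}, 2\pi/6)\,[\mathrm{diag}(1, e^{8\pi i/12}, e^{-3\pi i/12}, e^{-3\pi i/12}) \otimes \mathrm{diag}(e^{12\pi i/12}, e^{6\pi i/12})]$ (13)
The algorithm then recursively synthesizes the 2-qubit diagonal $\mathrm{diag}(1, e^{8\pi i/12}, e^{-3\pi i/12}, e^{-3\pi i/12})$.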
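The steps of the controlled rotation synthesis algorithm can be traced numerically on Example 3.7. The sketch below is a floating-point illustration under stated assumptions. Diagonals are lists of entries, and line 1 is the most significant bit. The characters are taken in the form $\chi_j = (z_{2j-1}/z_{2j})/(z_{2j+1}/z_{2j+2})$, and the columns of $\log\chi[CR_z(n)]$ are stored as real phases, with the common factor i dropped. It performs one recursion step and returns θ, B and C; all names are hypothetical, not the paper's.

```python
import cmath
import math
from itertools import combinations


def dictionary_order(m):
    """Nonempty subsets of {1, ..., m} as sorted tuples, in dictionary order."""
    return sorted(c for k in range(1, m + 1) for c in combinations(range(1, m + 1), k))


def bit(i, line, n):
    """Value carried by `line` (1 = top, most significant) in basis state i."""
    return (i >> (n - line)) & 1


def cr_z(S, theta, n):
    """Diagonal of CR_z(S, theta): R_z(theta) on line n iff every line in S carries 1."""
    return [cmath.exp(1j * theta * (bit(i, n, n) - 0.5)) if all(bit(i, s, n) for s in S) else 1
            for i in range(2 ** n)]


def char_phases(z):
    """Phases of chi_j(diag(z)), j = 1 .. 2^{n-1} - 1 (i.e. log chi divided by i)."""
    r = [z[2 * k] / z[2 * k + 1] for k in range(len(z) // 2)]
    return [cmath.phase(r[j] / r[j + 1]) for j in range(len(r) - 1)]


def solve(M, y):
    """Solve a small square real system M x = y by Gauss-Jordan elimination with pivoting."""
    size = len(y)
    aug = [list(row) + [y[i]] for i, row in enumerate(M)]
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(size):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def cr_synthesis_step(z):
    """One step: returns (subsets, theta, B, C) with A = prod CR_z(S_j, theta_j) . (B (x) C)."""
    n = len(z).bit_length() - 1
    subsets = dictionary_order(n - 1)
    cols = [char_phases(cr_z(S, 1.0, n)) for S in subsets]    # columns of log chi[CR_z(n)] / i
    M = [[col[i] for col in cols] for i in range(len(subsets))]
    theta = solve(M, char_phases(z))                           # steps 1-3
    d = list(z)
    for S, t in zip(subsets, theta):                           # step 4: D = prod CR_z(S_j, -theta_j) A
        d = [a * b for a, b in zip(d, cr_z(S, -t, n))]
    C = [d[0], d[1]]                                           # step 5, as in Proposition 1.2
    B = [d[2 * k] / d[0] for k in range(len(d) // 2)]
    return subsets, theta, B, C


example_37 = [cmath.exp(1j * math.pi * k / 6) for k in (6, 3, 9, 8, 5, 1, 6, 0)]
```

For `example_37`, this yields θ ≈ [−π/6, −4π/6, 2π/6] and the factors of Equation (13).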
Proof of Correctness of Controlled Rotation Synthesis
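We briefly check that D is, as claimed, a tensor B ⊗ C. First note that
$\log\chi(D) = \sum_j \log\chi(CR_z(S_j, -\theta_j)) + \log\chi(A) = -\log\chi[CR_z(n)]\,\theta + \psi.$
Here, we have used the group homomorphism property of log χ(−) together with Lemma 3.6. Since $\theta = \{\log\chi[CR_z(n)]\}^{-1}\psi$, this further implies
$\log\chi(D) = -\psi + \psi = 0$, i.e., $\chi_j(D) = 1$ for all j. (15)
So by the restatement of Proposition 1.2, we must have D = B ⊗ C.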
Proof: It suffices instead to consider the similar matrix M corresponding to a change of basis to the vectors $v_{|\bar b\rangle}$ as $\bar b$ runs over the bit strings representing the binary expressions for $1, 2, \cdots, 2^{n-1} − 1$.
So for example, if S k = {1, 2, · · · , n − 1}, then the k th column of M is e 2 n−1 −1 .
M is invertible since column operations reduce M to a permutation matrix. Indeed, the $e_{2^{n-1}-1}$ column may be used to clear all other nonzero entries in the last row. Then each of the columns corresponding to (n − 2)-element subsets retains a single nonzero entry, and the corresponding rows may be cleared. Continuing by induction produces a permutation matrix.
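The invertibility argument can be spot-checked for small n. The sketch below is an illustration rather than a proof. It computes the entries of $\log\chi[CR_z(n)]/i$ from the diagonals of $CR_z(S_j, 1)$, with line 1 as the most significant bit and the assumed character form $\chi_j = (z_{2j-1}/z_{2j})/(z_{2j+1}/z_{2j+2})$. It rounds the resulting phases to the integers −1, 0 and 1, and computes the determinant exactly with `fractions.Fraction`. The helper names are hypothetical.

```python
import cmath
from fractions import Fraction
from itertools import combinations


def dictionary_order(m):
    """Nonempty subsets of {1, ..., m} as sorted tuples, in dictionary order."""
    return sorted(c for k in range(1, m + 1) for c in combinations(range(1, m + 1), k))


def cr_z(S, theta, n):
    """Diagonal of CR_z(S, theta); line 1 is the most significant bit."""
    def bit(i, line):
        return (i >> (n - line)) & 1
    return [cmath.exp(1j * theta * (bit(i, n) - 0.5)) if all(bit(i, s) for s in S) else 1
            for i in range(2 ** n)]


def char_phases(z):
    """Phases of chi_j(diag(z)) under the assumed character form."""
    r = [z[2 * k] / z[2 * k + 1] for k in range(len(z) // 2)]
    return [cmath.phase(r[j] / r[j + 1]) for j in range(len(r) - 1)]


def log_chi_cr_matrix(n):
    """Integer matrix log chi[CR_z(n)] / i; column j comes from CR_z(S_j, 1)."""
    cols = [[round(x) for x in char_phases(cr_z(S, 1.0, n))] for S in dictionary_order(n - 1)]
    return [[col[i] for col in cols] for i in range(len(cols))]


def determinant(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in M]
    det = Fraction(1)
    for c in range(len(a)):
        p = next((r for r in range(c, len(a)) if a[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, len(a)):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return det
```

For n = 3, `log_chi_cr_matrix(3)` is the integer matrix that appears (times i) in Example 3.7.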
Synthesis via XOR-Controlled Rotations
This section defines new composite XR z (S, θ) gates. We then propose a new synthesis algorithm based on decomposing a circuit into such XR-gates, which are then broken down into elementary gates.
XOR-Controlled Rotations
Note that XR z (S, 4πℓ) is an identity computation for ℓ ∈ Z. 
We claim that $XR_z(S, \theta, \sigma) = XR_z(S, \theta)$ and show an example in Figure 3. The claim can be verified for basis states and then extended by linearity. Indeed, those circuits contain two symmetric chains of CNOT gates, and the second chain restores all lines except for the bottom to their input values. On the last line, $R_z(\theta)$ or $R_z(-\theta) = X \cdot R_z(\theta) \cdot X$ is applied, depending on the ⊕-sum of all input lines of the gate.²
² The ordering of the elements in S affects the order in which the ⊕-sum is computed, but this does not affect the sum.
This $\bigoplus_{s \in S} b_s$ sum in $\mathbb{F}_2$ motivates the notation for circuits $XR_z(S, \theta)$ introduced in Figure 3.
Definition 4.4 A computational basis state $|b_1 b_2 \cdots b_{n-1}\rangle$ is an S-flip state for a nonempty S ⊂ {1, · · · , n − 1} iff
has β k = 1. Equivalently, S-flip states are those whose bits in lines listed in S have an odd number of 1s (i.e., XOR to 1).
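The parity rule behind $XR_z(S, \theta)$ can be confirmed by pushing basis states through a CNOT cascade. The circuit used here is an assumption chosen to match the count of $2\#S + 1$ gates, since Figure 3 is not reproduced here. For an ordering $[j_1 \cdots j_k]$ of S, it applies $\mathrm{CNOT}^{j_2}_{j_1}, \mathrm{CNOT}^{j_3}_{j_2}, \ldots, \mathrm{CNOT}^{n}_{j_k}$, then $R_z(\theta)$ on line n, then the mirrored cascade. The sketch checks three things. Every ordering of S gives the same diagonal. That diagonal applies $R_z(\theta)$ on non-flip states and $R_z(-\theta)$ on S-flip states. Finally, $XR_z(S, 4\pi)$ is the identity. The function names are hypothetical.

```python
import cmath
from itertools import permutations


def xr_phase_via_cnots(bits, order, theta):
    """Phase picked up by basis state `bits` (line 1 first, line n last) under the XR circuit."""
    b = list(bits)
    n = len(b)
    chain = list(zip(order, order[1:])) + [(order[-1], n)]  # (control, target) pairs
    for control, target in chain:
        b[target - 1] ^= b[control - 1]
    phase = cmath.exp(1j * theta * (b[n - 1] - 0.5))  # R_z(theta) on line n
    for control, target in reversed(chain):
        b[target - 1] ^= b[control - 1]
    if b != list(bits):
        raise AssertionError("the mirrored cascade should restore every line")
    return phase


def xr_diag_via_cnots(order, theta, n):
    """Diagonal of the CNOT-cascade circuit for the ordered control list `order`."""
    return [xr_phase_via_cnots([(i >> (n - line)) & 1 for line in range(1, n + 1)], order, theta)
            for i in range(2 ** n)]


def xr_diag_parity_rule(S, theta, n):
    """R_z(theta) on line n for non-flip states, R_z(-theta) for S-flip states."""
    out = []
    for i in range(2 ** n):
        parity = sum((i >> (n - s)) & 1 for s in S) % 2
        sign = -1 if parity else 1
        out.append(cmath.exp(1j * sign * theta * ((i & 1) - 0.5)))
    return out
```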
Example 4.5
Suppose n = 4 qubits, so the top line set is {1, 2, 3}. Then $XR_z(\{1, 3\}, \theta)$ is provided in Figure 3. Also shown is the circuit $XR_z(\{1, 3\}, \theta, \sigma)$ for σ the flip permutation of two elements. These two circuits realize the same diagonal quantum computation.
Example 4.6
Consider the special case of n = 4 qubits. The flip states of each nonempty subset of {1, 2, 3} of the top three lines are given in the table below.
Shown are all bit-strings where bits in relevant positions XOR to 1.
² Recall that a $\mathrm{CNOT}^k_j$ permutes |b
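The flip-state table for n = 4 is easy to regenerate. The sketch below lists, for each nonempty subset of the top three lines in dictionary order, the bit strings $b_1 b_2 b_3$ whose bits on the lines in S XOR to 1. Each subset has exactly four flip states, and 000 never occurs. The function names are hypothetical.

```python
from itertools import combinations


def flip_states(S, m):
    """Bit strings b_1...b_m whose bits on the lines in S XOR to 1."""
    states = []
    for value in range(2 ** m):
        bits = format(value, f"0{m}b")
        if sum(int(bits[s - 1]) for s in S) % 2 == 1:
            states.append(bits)
    return states


def flip_table(n):
    """Map each nonempty S of the top n-1 lines (dictionary order) to its flip states."""
    m = n - 1
    subsets = sorted(c for k in range(1, m + 1) for c in combinations(range(1, m + 1), k))
    return {S: flip_states(S, m) for S in subsets}


if __name__ == "__main__":
    for S, states in flip_table(4).items():
        print(S, states)
```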
Note that $|00\cdots0\rangle$ is never a flip state, so $v_{|00\cdots0\rangle}$ never appears within the above sum.
The proof is similar to that of Proposition 3.3. However, $XR_z(S, \theta)$ never leaves any computational basis state fixed, which is why the factor of θ is doubled.
Example 4.8 Consider n = 4 qubits, the subset S = {1, 3}, and θ arbitrary. For convenience, let φ = −θ/2, so that $R_z(\theta) = e^{i\varphi}E_{11} + e^{-i\varphi}E_{22}$. We leave it to the reader to check that $A = XR_z(\{1, 3\}, \theta)$ is diagonal and merely describe the multiples on each computational basis state.
Description of XOR-Controlled Synthesis Algorithm
The angle −0.5 (in radians) in the definition of the following matrix cancels the −2 coefficient in Equation (18). The definition parallels Definition 3.5.
Definition 4.9
The matrix log χ[XR z (n)] is the (2 n−1 − 1) × (2 n−1 − 1) matrix defined as follows. Order nonempty subsets S 1 , S 2 , . . . S 2 n−1 −1 in dictionary order. Then for 1 ≤ j ≤ 2 n−1 − 1, the j th column of log χ[XR z (n)] is log χ[XR z (S j , −0.5)]. 
The third column recalls Example 4.8.
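Lemma 4.11 Let $\theta = [\theta_1 \cdots \theta_{2^{n-1}-1}]^t$. Then for $S_1, S_2, \ldots, S_{2^{n-1}-1}$ the dictionary ordering of the nonempty subsets of {1, · · · , n − 1}, we have
$\log\chi\big(XR_z(S_1, \theta_1) \cdots XR_z(S_{2^{n-1}-1}, \theta_{2^{n-1}-1})\big) = -2\,\log\chi[XR_z(n)]\,\theta.$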
The proof is quite similar to that of Lemma 3.6. The factor of −2 is due to the definition of $\log\chi[XR_z(n)]$, where we chose entries of ±i over entries of ±i/2.
We now state the synthesis algorithm. It is critical in the following to note that log χ[XR z (n)] is invertible for all n ≥ 1, so that one may refer to the inverse matrix. This result will be proven in Proposition 4.13 below.
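The matrix of Definition 4.9 can be assembled from the diagonals of $XR_z(S_j, -0.5)$. The sketch below checks numerically that, for n = 3, it equals i times the integer matrix computed by hand in Example 4.12. It also confirms that the matrix is invertible for small n, which is a spot check rather than a substitute for Proposition 4.13. It assumes line 1 is the most significant bit and takes the characters in the form $\chi_j = (z_{2j-1}/z_{2j})/(z_{2j+1}/z_{2j+2})$. Real phases are stored in place of multiples of i. The helper names are hypothetical.

```python
import cmath
from fractions import Fraction
from itertools import combinations


def dictionary_order(m):
    """Nonempty subsets of {1, ..., m} as sorted tuples, in dictionary order."""
    return sorted(c for k in range(1, m + 1) for c in combinations(range(1, m + 1), k))


def xr_z(S, theta, n):
    """Diagonal of XR_z(S, theta): R_z(theta) on line n, or R_z(-theta) on S-flip states."""
    out = []
    for i in range(2 ** n):
        parity = sum((i >> (n - s)) & 1 for s in S) % 2
        out.append(cmath.exp(1j * theta * (1 - 2 * parity) * ((i & 1) - 0.5)))
    return out


def char_phases(z):
    """Phases of chi_j(diag(z)) under the assumed character form."""
    r = [z[2 * k] / z[2 * k + 1] for k in range(len(z) // 2)]
    return [cmath.phase(r[j] / r[j + 1]) for j in range(len(r) - 1)]


def log_chi_xr_matrix(n):
    """Integer matrix log chi[XR_z(n)] / i; column j comes from XR_z(S_j, -0.5)."""
    cols = [[round(x) for x in char_phases(xr_z(S, -0.5, n))] for S in dictionary_order(n - 1)]
    return [[col[i] for col in cols] for i in range(len(cols))]


def determinant(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in M]
    det = Fraction(1)
    for c in range(len(a)):
        p = next((r for r in range(c, len(a)) if a[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, len(a)):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return det
```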
Synthesis Algorithm
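Begin with A ∈ D(n). Label $S_1, S_2, S_3, \ldots, S_{2^{n-1}-1}$ to be the nonempty subsets of the top n − 1 lines {1, · · · , n − 1} in dictionary order.
1. Compute $\psi = \log\chi(A)$.
2. Compute the inverse matrix $\{\log\chi[XR_z(n)]\}^{-1}$.
3. Compute $\theta = \frac{1}{2}\{\log\chi[XR_z(n)]\}^{-1}\psi$.
4. Let $D = XR_z(S_1, \theta_1) \cdots XR_z(S_{2^{n-1}-1}, \theta_{2^{n-1}-1})\, A$. By Lemma 4.11, $\log\chi(D) = -2\log\chi[XR_z(n)]\,\theta + \psi = 0$, so D is a tensor.
5. Use the argument of Proposition 1.2 to compute $D = B \otimes C$ for $B \in D(n − 1)$ and C a one-qubit diagonal.
6. Thus $A = XR_z(S_1, -\theta_1) \cdots XR_z(S_{2^{n-1}-1}, -\theta_{2^{n-1}-1})\,(B \otimes C)$.
7. The algorithm terminates by recursively producing a circuit diagram for B ∈ D(n − 1).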
Example 4.12 Consider the following 3-qubit computation:
A = diag(e 4πi/12 , e 2πi/12 , e 9πi/12 , e 7πi/12 , e 3πi/12 , e 8πi/12 , e 11πi/12 , e 10πi/12 )
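We begin by computing $\log\chi[XR_z(3)]$. Since {1} ⊂ {1, 2} has flip states |10⟩, |11⟩, we see that the first column is $i(e_2 - e_3 + e_3) = i[0\ 1\ 0]^t$. Continuing in this way produces the matrix
$\log\chi[XR_z(3)] = i\begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & -1 \\ 0 & -1 & 1 \end{pmatrix}.$
The inverse matrix appears in the algorithm and may be reused for multiple diagonal computations:
$\{\log\chi[XR_z(3)]\}^{-1} = -\frac{i}{2}\begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & -1 \\ 1 & 0 & 1 \end{pmatrix}.$
Here $\psi = \log\chi(A) = i[0\ \ 7\pi/12\ \ {-6\pi/12}]^t$, so $\theta = \frac{1}{2}\{\log\chi[XR_z(3)]\}^{-1}\psi = [4\pi/24\ \ 3\pi/24\ \ {-3\pi/24}]^t$. Then
$D = XR_z(\{1\}, 4\pi/24)\, XR_z(\{1, 2\}, 3\pi/24)\, XR_z(\{2\}, -3\pi/24)\, A$ (24)
has the form D = B ⊗ C for B a two-qubit diagonal and C a one-qubit diagonal. Note that the subset circuits above commute, so the order is immaterial.
The first step in computing D is to write $XR_z(\{1\}, 4\pi/24)$ as a diagonal matrix. Begin by noting that the flip states of {1} are those with $b_1 = 1$, so $R_z(4\pi/24)$ acts on the bottom line when $b_1 = 0$ and $R_z(-4\pi/24)$ acts when $b_1 = 1$:
$XR_z(\{1\}, 4\pi/24) = \mathrm{diag}(e^{-\pi i/12}, e^{\pi i/12}, e^{-\pi i/12}, e^{\pi i/12}, e^{\pi i/12}, e^{-\pi i/12}, e^{\pi i/12}, e^{-\pi i/12}).$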
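Example 4.12 can be traced end to end with a short script. The sketch below follows the XOR-controlled algorithm as we read it: it computes θ = ½{log χ[XR_z(n)]}^{-1}ψ and then D = XR_z(S_1, θ_1) ⋯ A. It assumes line 1 is the most significant bit and takes the characters in the form $(z_{2j-1}/z_{2j})/(z_{2j+1}/z_{2j+2})$. Real phases replace imaginary logarithms. For this A it yields θ = [4π/24, 3π/24, −3π/24] and a D that factors with $C = e^{i\pi/4}\,\mathbf{1}_2$. The helper names are hypothetical.

```python
import cmath
import math
from itertools import combinations


def dictionary_order(m):
    """Nonempty subsets of {1, ..., m} as sorted tuples, in dictionary order."""
    return sorted(c for k in range(1, m + 1) for c in combinations(range(1, m + 1), k))


def xr_z(S, theta, n):
    """Diagonal of XR_z(S, theta): R_z(theta) on line n, or R_z(-theta) on S-flip states."""
    out = []
    for i in range(2 ** n):
        parity = sum((i >> (n - s)) & 1 for s in S) % 2
        out.append(cmath.exp(1j * theta * (1 - 2 * parity) * ((i & 1) - 0.5)))
    return out


def char_phases(z):
    """Phases of chi_j(diag(z)) under the assumed character form."""
    r = [z[2 * k] / z[2 * k + 1] for k in range(len(z) // 2)]
    return [cmath.phase(r[j] / r[j + 1]) for j in range(len(r) - 1)]


def solve(M, y):
    """Solve a small square real system M x = y by Gauss-Jordan elimination with pivoting."""
    size = len(y)
    aug = [list(row) + [y[i]] for i, row in enumerate(M)]
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(size):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def xr_synthesis_step(z):
    """One step: returns (subsets, theta, B, C) with D = prod XR_z(S_j, theta_j) A = B (x) C."""
    n = len(z).bit_length() - 1
    subsets = dictionary_order(n - 1)
    cols = [char_phases(xr_z(S, -0.5, n)) for S in subsets]   # log chi[XR_z(n)] / i
    M = [[col[i] for col in cols] for i in range(len(subsets))]
    theta = [t / 2 for t in solve(M, char_phases(z))]          # theta = (1/2) M^{-1} psi
    d = list(z)
    for S, t in zip(subsets, theta):
        d = [a * b for a, b in zip(d, xr_z(S, t, n))]
    C = [d[0], d[1]]
    B = [d[2 * k] / d[0] for k in range(len(d) // 2)]
    return subsets, theta, B, C


example_412 = [cmath.exp(1j * math.pi * k / 12) for k in (4, 2, 9, 7, 3, 8, 11, 10)]
```

Since $XR_z(S, \theta)^{-1} = XR_z(S, -\theta)$, multiplying B ⊗ C by the rotations with negated angles recovers A.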
Gate Counts
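An upper bound can be produced by adding gate counts in separate parts of synthesized circuits, but that does not account for possible gate cancellations between the parts.
CANCELLATIONS OF CNOTS IGNORED. Since each $XR_z(S, \theta)$ contains a total of $2\#S + 1$ elementary gates (assuming θ ≠ 0), the zeroing block $XR_z(S_1, \theta_1) \cdots XR_z(S_{2^{n-1}-1}, \theta_{2^{n-1}-1})$ on n qubits requires $\sum_S (2\#S + 1) = n2^{n-1} − 1$ elementary gates.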
For example, the zeroing block in three qubits requires 3 + 3 + 5 = 11 gates, and in four qubits 3 + 3 + 3 + 5 + 5 + 5 + 7 = 31 gates. Thus, recursively synthesizing an n-qubit diagonal A requires at most the following number of elementary gates:
$n2^{n-1} + (n − 1)2^{n-2} + \cdots + 2 \cdot 2^{2-1} + 1 = (n − 1)2^n + 1$ (31)
Note that Theorem 1.4 is based not upon this number but upon the cancellations described below.
A CNOT CANCELLATION HEURISTIC.
Given that some CNOT gates in neighboring XR subcircuits in $XR_z(S_1, \theta_1) \cdots XR_z(S_{2^{n-1}-1}, \theta_{2^{n-1}-1})$ may cancel, one wishes to reorder those subcircuits to facilitate more cancellations. This can be achieved with a tie-breaking scheme described here, which additionally optimizes another degree of freedom related to the structure of each XR block.
Definition 5.1 Let $S = \{j_1, j_2, \cdots, j_k\}$ for $j_1 < j_2 < \cdots < j_k$, and let θ ∈ ℝ. For σ : {1, · · · , k} → {1, · · · , k} a permutation, the circuit diagram $XR_z([j_{\sigma(1)} j_{\sigma(2)} \cdots j_{\sigma(k)}], \theta)$ corresponds to the sequence of CNOTs and the $R_z(\theta)$ gate of Equation (17) for $XR_z(S, \theta, \sigma)$. Thus the circuit diagram $XR_z([j_{\sigma(1)} j_{\sigma(2)} \cdots j_{\sigma(k)}], \theta)$ realizes the quantum computation $XR_z(S, \theta, \sigma) = XR_z(S, \theta)$.
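Lemma 5.3 Consider the circuit diagram $XR_z([j_1 j_2 \cdots], \theta_1) \cdot XR_z([k_1 k_2 \cdots], \theta_2)$, and suppose $\theta_1, \theta_2 \notin 4\pi\mathbb{Z}$. Then the number of pairs of cancelling CNOTs in this diagram is 0 if $j_1 \ne k_1$, and $-1 + \max\{m \ge 1 \mid j_p = k_p\ \forall p \le m\}$ otherwise.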
Given this lemma, cancellations within XR z (S 1 , θ 1 ) • · · · XR z (S 2 n−1 −1 , θ 2 n−1 −1 ) will depend on both the choice of ordering of the S j , 1 ≤ j ≤ 2 n−1 − 1 and also on a choice of ordering of the elements of each S j . Below we show that a good ordering can be found by performing depth-first traversals of an appropriately defined rooted tree. In such a tree, every leaf corresponds to one XR block, and they can be ordered by DFS. Indeed, an XR block is uniquely identified by its subset S j , which can be encoded by a decreasing sequence of non-repeating integers. Moreover, a decomposition of such an XR block into elementary gates is determined by an ordering of set elements, and we always use the decreasing order.
Definition 5.4
The tree T(n) is a rooted tree whose vertices are (labelled by) all integer sequences $(n − 1)a_2 a_3 \cdots a_j$ with $n − 1 > a_2 > a_3 > \cdots > a_j \ge 0$. The root is labelled (n − 1). The directed edges begin at the vertex $(n − 1)a_2 a_3 \cdots a_j$ and end at the vertex $(n − 1)a_2 a_3 \cdots a_j a_{j+1}$ for $a_{j+1} < a_j$. Figure 4 shows T(4). In any T(n) the depth of the vertex labelled $(n − 1)a_2 \cdots a_j$ is j, and the leaves of T(n) have labels of the form $(n − 1)a_2 a_3 \cdots a_j 0$.
Therefore each leaf is associated with a sequence $(n − 1)a_2 a_3 \cdots a_j$. All in all, there are $2^{n-2}$ leaves, since subsets of {1, 2, · · · , n − 1} containing n − 1 correspond to subsets of {1, 2, · · · , n − 2}.
Definition 5.5
Let v(j) denote the number of vertices of T(n) of depth j, and ṽ(j) the number of internal vertices (i.e., non-leaves) of depth j.
We then define the number of bends at depth j in T (n) as b( j) = v( j + 1) −ṽ( j).
Lemma 5.6 Within T(n), the following hold.
1. $v(j) = \binom{n-1}{j-1}$.
2. The number of leaves at depth j is $l(j) = \binom{n-2}{j-2}$.
3. $\tilde v(j) = \binom{n-2}{j-1}$.
Proof: 1. and 2. are proven by induction; 3. and 4. follow from popular combinatorial identities.
Definition 5.7 Consider the set of all strictly decreasing integer sequences
$S(n) = \{(n − 1)a_2 a_3 \cdots a_j \mid (n − 1) > a_2 > a_3 > \cdots > a_j \ge 0\}$ (32)
Assume a particular depth-first traversal of T(n), which in particular induces an ordering of the leaves and, hence, of S(n), because the leaves correspond to subsets. Furthermore, subsets correspond to XR blocks, and therefore any DFS induces an ordering of those blocks. Our circuits use this ordering and, furthermore, implement each XR block as $XR_z([S], \theta)$, according to the decomposition of Definition 5.1. The elements of S are in decreasing order, corresponding to the label of the leaf of the tree T(n).
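The vertex counts of Lemma 5.6 can be verified by building T(n) explicitly. The sketch below enumerates the labels of Definition 5.4 recursively and tallies vertices, leaves and internal vertices by depth. Item 3 is checked in the form $\tilde v(j) = v(j) - l(j) = \binom{n-2}{j-1}$, which follows from items 1 and 2 by Pascal's rule. The function names are hypothetical.

```python
from math import comb


def tree_vertices(n):
    """Labels of T(n): strictly decreasing tuples starting at n - 1 with entries >= 0."""
    vertices = []

    def grow(label):
        vertices.append(label)
        for a in range(label[-1] - 1, -1, -1):  # children append any smaller non-negative integer
            grow(label + (a,))

    grow((n - 1,))
    return vertices


def depth_counts(n):
    """Vertices, leaves and internal vertices of T(n), keyed by depth (= label length)."""
    v, leaves, internal = {}, {}, {}
    for label in tree_vertices(n):
        j = len(label)
        v[j] = v.get(j, 0) + 1
        bucket = leaves if label[-1] == 0 else internal  # labels ending in 0 have no children
        bucket[j] = bucket.get(j, 0) + 1
    return v, leaves, internal
```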
Below we count the number of cancelling CNOTs when decompositions of $XR_z([S], \theta)$ blocks into elementary gates are concatenated. This count is independent of how ties are broken in the DFS. It corresponds to the generic and, simultaneously, worst case, in which no $R_z$ gate is trivial.
Proof: Every XR block corresponds to a leaf of T(n). We therefore consider the unique shortest paths from the root to the leaves. Every edge on such a path corresponds to a CNOT in our decomposition of XR. Indeed, consider the last (smallest) two integers j > ℓ in the label of the vertex that is further from the root. Then the corresponding gate is $\mathrm{CNOT}^{\ell}_{j}$. Moreover, the CNOTs in the implementation of XR are ordered just like the edges on the shortest path.⁴ Consider two leaves that are neighbors in a DFS-induced ordering. Then the shortest paths from the root to each leaf coincide up to a certain depth j, and (j − 1) pairs of CNOTs cancel. The furthest vertex shared by the two paths is the least common ancestor (LCA) of the two leaves in the tree. Observe that the number of leaf pairs whose LCAs are at depth j equals the number of bends b(j). The right-hand side follows using the fourth identity in Lemma 5.6 and differentiation of the binomial theorem.
Corollary 5.10 The computation performed by the circuit block $XR_z(S_1, \theta_1) \cdots XR_z(S_{2^{n-1}-1}, \theta_{2^{n-1}-1})$ may be implemented so as to produce $(n − 5)2^{n-2} + n + 1$ cancelling pairs of CNOTs. Thus, any n-qubit diagonal computation may be implemented using at most $(n + 3)2^{n-1} − 2n − 1$ elementary gates.
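The counts in Corollary 5.10 can be reproduced by brute force. The sketch below encodes each nonempty S ⊂ {1, …, n − 1} as a decreasing tuple and orders the blocks by a preorder depth-first traversal of the prefix tree of these tuples. Blocks with different largest elements are simply concatenated. It scores neighboring blocks with the rule of Lemma 5.3 and compares both the total and the resulting gate bound (two gates saved per cancelling pair, starting from Equation (31)) with the closed forms. The traversal is our own modelling of the DFS ordering described above, not code from the paper, and the function names are hypothetical.

```python
def cancelling_pairs(a, b):
    """Lemma 5.3: cancelling CNOT pairs between XR_z([a], .) and XR_z([b], .), nontrivial angles."""
    if a[0] != b[0]:
        return 0
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common - 1


def dfs_block_order(n):
    """Preorder DFS over decreasing tuples with entries in {1, ..., n-1}, largest first element first."""
    order = []

    def visit(seq):
        order.append(seq)
        for a in range(seq[-1] - 1, 0, -1):
            visit(seq + (a,))

    for top in range(n - 1, 0, -1):
        visit((top,))
    return order


def total_cancellations(n):
    """Sum of Lemma 5.3 counts over neighboring blocks in the DFS order."""
    order = dfs_block_order(n)
    return sum(cancelling_pairs(a, b) for a, b in zip(order, order[1:]))


def gate_bound(n):
    """(n - 1) 2^n + 1 gates before cancellations, minus two gates per cancelling pair."""
    return (n - 1) * 2 ** n + 1 - 2 * total_cancellations(n)
```

For n = 4 the order is [3], [3 2], [3 2 1], [3 1], [2], [2 1], [1], and only the pair [3 2], [3 2 1] contributes one cancelling pair, matching $(4 − 5)2^{2} + 4 + 1 = 1$.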
