We present some basic integer arithmetic quantum circuits, such as adders and multiplier-accumulators of various forms, as well as diagonal operators, which operate on multilevel qudits. The integers to be processed are represented in an alternative basis after they have been Fourier transformed. Several arithmetic circuits operating on Fourier transformed integers have appeared in the literature for two-level qubits. Here we extend these techniques to multilevel qudits, as they may offer some advantages relative to qubit implementations. The arithmetic circuits presented can be used as basic building blocks for higher level algorithms such as quantum phase estimation, quantum simulation and quantum optimization, but they can also be used in the implementation of a quantum fractional Fourier transform, as shown in a companion work presented separately.
Introduction
The common representation of elementary quantum information is the qubit, whose state is a superposition $a|0\rangle + b|1\rangle$ belonging to a two-dimensional Hilbert space with two basis states $|0\rangle$ and $|1\rangle$, known as the computational basis. A quantum computer is a finite dimensional quantum system composed of a collection of qubits, on which various unitary operations (quantum gates) and quantum measurements are performed. Accordingly, there is a correspondence between a qubit and a classical bit, in the sense that the basis states of a qubit follow the binary logic. We can extend this correspondence to multivalued logic, instead of two values only, by enlarging the dimension of the elementary Hilbert space used. The qudit is a generalization of the qubit to a larger Hilbert space of dimension $d > 2$. The state of a qudit is a superposition $a_0|0\rangle + a_1|1\rangle + \cdots + a_{d-1}|d-1\rangle$, where $|0\rangle, |1\rangle, \ldots, |d-1\rangle$ are the computational basis states. Qutrit is a special name for the case $d = 3$, while ququart corresponds to $d = 4$. In many cases, the employment of a multivalued quantum logic is more natural. E.g. in ion traps we could exploit more than two energy levels. Multiple laser beams could be used to manipulate the transitions between these levels [1].
Working with qudits instead of qubits may offer some advantages. The required number of qudits is smaller by a factor $\log_2 d$ than the corresponding number of qubits for the same dimension a quantum computer has to explore. E.g. the dimension of a composite system of $n$ qubits is $2^n$, while the same dimension can be reached with only $\log_d 2^n = \log_2 2^n / \log_2 d = n/\log_2 d$ qudits. Such a reduction of the required number of physical carriers of quantum information is advantageous, considering the difficulty of reliably controlling a large number of carriers. Also, when fewer quantum information carriers are used, a decrease in the overall decoherence is expected, which favors scalability [1, 2].
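The register-size comparison above can be checked with a few lines of exact integer arithmetic. This is only an illustrative sketch: the function name `qudits_needed` is hypothetical, and it rounds up to a whole number of qudits, whereas the text quotes the unrounded ratio $n/\log_2 d$.

```python
def qudits_needed(n_qubits: int, d: int) -> int:
    """Smallest q with d**q >= 2**n_qubits, i.e. ceil(n / log2(d)).

    Integer arithmetic is used instead of floating-point logarithms so that
    exact powers (e.g. d = 4) are not affected by rounding.
    """
    if d < 2 or n_qubits < 0:
        raise ValueError("need d >= 2 and n_qubits >= 0")
    target = 2 ** n_qubits   # Hilbert space dimension reached by n qubits
    q, dim = 0, 1
    while dim < target:      # grow the qudit register until it is large enough
        dim *= d
        q += 1
    return q
```

For instance, ten qubits span a space of dimension 1024, which five ququarts ($d = 4$) or seven qutrits ($d = 3$) can cover.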
Another advantage, which is also related to the adverse effect of decoherence, is that fewer multilevel qudit gates are required to construct a quantum circuit implementing a given unitary operation compared to the case of using two-level gates [1, 3]. Fewer gates reduce the number of steps needed to complete the circuit operation (circuit depth), and consequently fewer errors are accumulated during the overall operation of the circuit. Even so, protection of quantum information against environmental interaction remains necessary. Quantum error correcting codes and fault tolerant gate constructions to combat decoherence on multilevel qudits have been proposed, and they are similar to the ones used for the qubit case [4, 5, 6].
At a higher level, generalizations of known quantum algorithms and circuits using d-level qudits may offer improvements with respect to their qubit counterparts. E.g., quantum phase estimation, which is the core part of Shor's algorithm [7] and is also used in quantum simulation [8], is improved in terms of success probability when multilevel qudits are incorporated [9]. A multiple-valued version of the Deutsch-Jozsa algorithm has been reported in [10], while an implementation proposal for a five-level superconducting qudit appeared in [11]. A qudit version of Grover's algorithm [12] has been reported in [13]. The high dimensional Deutsch-Jozsa algorithm may find applications in image processing, while the high dimensional Grover's algorithm offers a trade-off between space and time.
An assortment of quantum gates operating on qudits has been proposed or experimentally realized on various technologies. Single and two qudit d-level gates were proposed in [1, 14] for the ion trap technology. Single qudit gates for d = 5 were implemented in superconducting technology and used to emulate spins of 1/2, 1 and 3/2 in [15]. Proposals for single and two qudit gates based on superconducting technology appeared in [16]. Single qudit gates based on optical technology were reported in [17]. Three dimensional entanglement between photons was observed in [18].
In this work we present some quantum arithmetic circuits operating on d-level qudits by extending results given in prior works [19, 20, 21] . These circuits exploit the quantum Fourier transform and various single qudit and two qudit rotation gates to perform the desired calculations. Processing in the Fourier domain may offer some advantages related to speed [21] and robustness to decoherence [22, 23] . Among the proposed circuits are various versions of adders (adder with constant, generic adder, adder with constant controlled by single qudit) and multipliers (multiplier with constant and accumulator, multiplier with constant). Such circuits are useful in many quantum algorithms, e.g. quantum phase estimation, quantum simulation.
The increased interest in quantum information processing exploiting d-level qudits, both in theoretical and experimental aspects, was one of the stimuli for this work. However, the main motivation was the particular application targeted by the quantum circuits presented in this manuscript, which is a new definition of the fractional Fourier transform. Unitary operations on high dimensional d-level qudits fit more naturally for this specific application because the proposed fractional Fourier transform operates on a Hilbert space of dimension $d^n$, where $d \neq 2$ is a prime. The development of the quantum fractional Fourier transform and its implementation on qudits is presented in a separate work [24]. The rest of the paper is organized as follows: A short background about design and synthesis of qudit quantum circuits is given in section 2. The elementary and basic qudit gates used in the proposed designs are given in section 3. The quantum Fourier transform definition and its circuit for q qudits of d levels are presented in section 4. Section 5 introduces integer arithmetic circuits like adder with constant, adder of two integers, controlled adder with constant, multiplier with constant and accumulator, and multiplier with constant. All of the arithmetic units accept one of their operands after it has been Fourier transformed. In section 6 a method to implement a diagonal operator on q qudits is analyzed, where the diagonal elements are powers of roots of unity. A quantum multiplier of two integers is introduced and then a quantum squarer is built upon this multiplier. It is demonstrated how to introduce relative phases between the basis states of a superposition which depend quadratically on the index of the basis state. The method can be generalized to a function that is polynomial in the state index.
This operation is a necessary part of the quantum fractional Fourier transform presented in the companion article, and it may also find other applications, such as quantum simulation algorithms and Grover's search algorithm. Appendix A gives the decomposition of a three qudit rotation gate introduced in section 6. Complexity analysis in terms of quantum cost, depth and width is reported in section 7. In Appendix B we discuss how it is possible to use a discrete library of components to approximate the proposed designs, and the impact on cost and depth. A discrete library of gates is necessary if fault tolerance is to be incorporated. Finally, we conclude in section 8.
Background and related work
The construction of a complex quantum circuit operating on multilevel qudits is based on the selection of a set of elementary qudit gates and their interconnection so as to achieve the target operation. A multilevel gate operates on a single qudit, on two qudits or on more qudits. A single d-level qudit gate is represented by a unitary matrix $U$ of dimensions $d \times d$. It transforms an initial qudit state $|\psi_{in}\rangle$ to $|\psi_{out}\rangle = U|\psi_{in}\rangle$. As an example consider the general superposition qutrit state $|\psi_{in}\rangle = a|0\rangle + b|1\rangle + c|2\rangle$. The application of the gate
$$U = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$
on this state results in the state $|\psi_{out}\rangle = c|0\rangle + a|1\rangle + b|2\rangle$.
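The output state quoted above fixes the example gate as the cyclic shift $|k\rangle \to |k+1 \pmod 3\rangle$. The sketch below, under that reading, applies the corresponding permutation matrix to an amplitude vector and checks unitarity; the names `SHIFT`, `apply_gate` and `is_unitary` are illustrative only.

```python
# Qutrit cyclic shift: column c has its single 1 in row (c + 1) mod 3.
SHIFT = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]

def apply_gate(U, psi):
    """Return U|psi> for a d x d matrix U and a length-d amplitude list psi."""
    return [sum(U[r][c] * psi[c] for c in range(len(psi))) for r in range(len(U))]

def is_unitary(U, tol=1e-12):
    """Check U^dagger U = I entry by entry."""
    d = len(U)
    for r in range(d):
        for c in range(d):
            entry = sum(complex(U[k][r]).conjugate() * U[k][c] for k in range(d))
            if abs(entry - (1 if r == c else 0)) > tol:
                return False
    return True
```

With amplitudes $(a, b, c) = (1, 2, 3)$ (unnormalized, just to track positions), `apply_gate(SHIFT, [1, 2, 3])` gives `[3, 1, 2]`, i.e. $c|0\rangle + a|1\rangle + b|2\rangle$.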
Two qudit gates operate on states of two qudits, which are $d^2$ dimensional, so their representing unitary matrices have dimensions of $d^2 \times d^2$.
Two single qudit gates V 1 and V 2 operating on two different qudits can be seen as a two qudit gate which is their tensor product U = V 1 ⊗ V 2 . However, not every two qudit gate can be decomposed as a tensor product of two single qudit gates, in which case we have an entangling gate.
Consecutive application of single, two or more qudit gates to a collection of $q$ qudits results in a quantum circuit which is represented by a unitary matrix of dimensions $d^q \times d^q$. The design of a quantum circuit is the procedure of interconnecting various elementary gates so as to fulfill the given specifications. These specifications are given in the form of a unitary matrix or of the desired input-output state relation in the computational basis.
It is proven that single qudit gates and a two qudit gate alone are adequate to form a universal set of gates, provided that the two qudit gate is an entangling gate [25]. A universal set of gates can be used to approximate any target quantum circuit with arbitrary precision. Various sets of qudit gates (gate libraries) and methods to exploit them to build more complex unitaries have been introduced in the literature. The library used in [1] consists of single and two qudit gates with continuous parameters, and the synthesis method is based on spectral decomposition of the target unitary matrix. Cosine-Sine decomposition is another method, used in [26]. A discrete set of single qudit gates and a single two qudit gate is used in [27] to synthesize the large unitary matrix using QR decomposition. In [14] a different two qudit gate and a set of single qudit gates are used along with quantum Shannon decomposition to synthesize the target unitary. The previous methods and results are similar to the two-level qubit synthesis cases. It is proven that the cost of the resulting circuit in terms of two qudit gates is upper bounded by $O(d^{2n})$, where $n$ is the number of qudits [28]. Thus, these automated methods are suitable only for small quantum circuits due to the exponential cost increase.
When the target circuit is an arithmetic or logic block whose unitary matrix is a permutation matrix consisting of 0 and 1 elements, multiple-valued reversible synthesis methods can be applied. These methods are extensions of the binary reversible logic case and may be applicable to a specific value of d, e.g. [29] (d = 3), or to any value of d [30, 31]. Similarly to the quantum synthesis case, these algorithms are not suitable for large circuits.
As many algorithms widely use quantum arithmetic blocks like adders or multipliers recurrently, it is crucial to have available efficient arithmetic and logic blocks. Ad hoc design of such blocks usually offers better results compared to the automated synthesis methods. One can exploit the iterative and regular structure of these arithmetic blocks or extend known classical designs to the quantum case. A diversity of ad hoc designed quantum arithmetic and logic circuits for two-level qubits can be found in the literature [32, 19, 33, 34, 35, 36, 37, 21] , but few (usually adders) are known for multilevel qudits and also they are mostly designed for a specific value of d. In contrast, the proposed designs are parametrized for any value of d.
One of the first ternary (d = 3) quantum adders, for 3 inputs only, appeared in [30] as an application example of the proposed synthesis method. Ternary quantum adders/subtractors designed ad hoc for any number of inputs are given in [38]. In [39] a ternary extension of the well known VBE ripple-carry adder [32] is reported. Quaternary (d = 4) comparators were proposed in [40]. Improved designs of ternary ripple carry and carry look-ahead adders, along with modifications that lead to subtractors and comparators, are given in [41]. The previous ternary ripple carry adder is a modification of the CDKM binary quantum adder that appeared in [33], and it also has a depth of O(n) using one ancilla qutrit. Similarly, the previous ternary carry look-ahead quantum adder is an extension of the DKRS binary quantum adder that appeared in [35], and thus it offers a depth of O(log(n)) using O(n) ancilla qutrits.
Several of the previous multilevel qudit (usually qutrit) designs are modifications of binary quantum adders. The gate libraries used differ from design to design. This is justifiable, as the implementation technology for multilevel qudits is far from mature. However, gates of one library can be expressed as gates of another one, provided that the libraries are universal.
Diagonal operator circuits on qubits or qudits do not change the absolute value of the amplitudes of a superposition, but rearrange their relative phases. Such circuits are useful in quantum algorithms [42, 43, 44] like quantum optimization, quantum simulation, Grover's search etc. Synthesis methods for diagonal unitary matrices of two-level qubits have been developed for diagonals of special structure [45, 44, 46] or for any diagonal [47]. Recently, diagonal synthesis for multilevel qudits was reported in [48]. In general, the synthesis cost is related to the dimensionality of the Hilbert space covered (which is exponential in the number of qubits or qudits) and the number of distinct phases of the diagonal. In this work, based on ideas of [45] and the arithmetic circuits designed, we develop a diagonal operator circuit with a special structure, namely one whose phases are quadratic functions of the coordinates, with polynomial cost and depth. Using the same techniques, other powers, instead of quadratic, can be achieved.
The gates used in our proposed designs are the ones introduced in [14] where also physical implementation directions are given. The proposed designs use extensively various rotation gates. As the rotation angles of the gates vary with the size of the circuit and also small angles are required, implementation and fault tolerance issues are addressed in Appendix B, using results of the binary quantum case.
Many of the arithmetic multilevel quantum designs of this manuscript are direct extensions or modifications of the binary quantum designs that appeared in [19, 20, 21], which apply the QFT to one of the two integer operands before applying the rotation gates and then apply the inverse QFT to bring the result back to the computational basis. The following arithmetic units are presented:
• Adder of two integers of $q$ qudits with depth of $O(q)$ and width of $2q$ qudits (Subsection 5.1).
• Adder of an integer of $q$ qudits with a constant integer. Its depth is $O(q)$ including the direct and inverse QFT blocks or $O(1)$ without the QFT blocks. Its width is $q$ qudits (Subsection 5.2).
• Single state controlled adder of an integer of $q$ qudits with a constant. The adder is enabled if the control qudit is in a particular basis state among the $d$ possible states, otherwise it acts as an identity. Its depth is $O(q)$ and its width is $q + 1$ qudits (Subsection 5.3).
• Generalized controlled adder of an integer of $q$ qudits with a constant. It adds a multiple of the constant to the integer. The multiple depends on the state of the control qudit, being between 0 and $d - 1$. Its depth is $O(q)$ and its width is $q + 1$ qudits (Subsection 5.4).
• Multiplier with constant and accumulator. It multiplies an integer of $q$ qudits with a constant and adds the product to a second integer of $q$ qudits. Its depth is $O(q)$ and its width is $2q$ qudits (Subsection 5.5).
• Multiplier with constant. It multiplies an integer of $q$ qudits with a constant, provided that the constant is relatively prime to $d^q$, which is always the case when $d$ is prime. Its depth is $O(q)$ and its width is $2q$ qudits, of which $q$ qudits are ancilla initialized to the zero state and then reset back to the zero state (Subsection 5.6).
• Multiplier of two integers and accumulator. It multiplies two integers of $q$ qudits and adds the product to a third integer of $q$ qudits. Its depth is $O(q^2)$ and its width is $3q$ qudits (Subsection 6.1).
• Squarer/Multiplier with constant/Accumulator. It performs the transform $|x\rangle|z\rangle \to |x\rangle|z + \gamma x^2\rangle$, where $\gamma$ is the integer constant. Its depth is $O(q^2)$ and its width is $4q$ qudits (Subsection 6.2).
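• General diagonal operator. It operates diagonally on a general superposition state of $q$ qudits and changes the phases of the superposition amplitudes by applying the matrix
$$\mathrm{diag}\left(e^{i\frac{2\pi}{d^q} f(0)},\, e^{i\frac{2\pi}{d^q} f(1)},\, \ldots,\, e^{i\frac{2\pi}{d^q} f(d^q-1)}\right)$$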
The specific diagonal operator presented here is based on the previous squarer and some other blocks, as it applies the function $f(k) = \gamma k^2$. It can be generalized for other powers of $k$ or even for polynomial functions of $k$ by exploiting similar techniques. It has a depth of $O(q^2)$ and its width is $4q$ qudits, of which $3q$ qudits are ancilla (Section 6).
Detailed complexity analysis in terms of quantum cost and depth is given in section 7, where the parameter d enters the previous rough approximations. This is because many gates like the basic rotation gates used and introduced in subsection 3.6 are synthesized using more elementary gates with a cost (and consequently depth) which depends on the dimension d of the qudits.
Elementary and Basic Gates on Qudits
We followed a hierarchical bottom-up approach to design the arithmetic circuits. At the lowest level, elementary gates operating in a two dimensional subspace of the d-dimensional space of a qudit are used. Upon them, more complex gates (which are basic for the designs) operating in the whole d-dimensional space are built. Some of the basic gates are reported in [14] , while others like the generalized controlled and doubly controlled rotation gates are introduced here (subsection 3.6, subsection 6.1 and Appendix A ).
Generalized X gates
The $X^{(jk)}$ gates [14] operate on a two-dimensional subspace of a d-level qudit by exchanging the basis states $|j\rangle$ and $|k\rangle$ and leaving the other basis states intact; thus they are a generalization of the well known X gate for qubits, which exchanges the basis states $|0\rangle$ and $|1\rangle$. They are defined by the $d \times d$ matrix
$$X^{(jk)} = |j\rangle\langle k| + |k\rangle\langle j| + \sum_{n \neq j,k} |n\rangle\langle n|$$
It holds that $X^{(jk)} = X^{(kj)}$, so there are $d(d-1)/2$ different such gates in this family.
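As a quick check of the count above, the following sketch builds the swap gate as a permutation matrix directly from its verbal description (exchange $|j\rangle$ and $|k\rangle$, leave everything else fixed) and enumerates the distinct matrices over all ordered pairs. The helper names are illustrative.

```python
def x_gate(d, j, k):
    """d x d permutation matrix exchanging basis states |j> and |k>."""
    if j == k or not (0 <= j < d and 0 <= k < d):
        raise ValueError("need two distinct levels in range(d)")
    X = [[1 if r == c else 0 for c in range(d)] for r in range(d)]  # identity
    X[j][j] = X[k][k] = 0   # remove the fixed points |j>, |k>
    X[j][k] = X[k][j] = 1   # and exchange them instead
    return X

def count_distinct_x_gates(d):
    """Number of different matrices among all ordered pairs (j, k), j != k."""
    seen = set()
    for j in range(d):
        for k in range(d):
            if j != k:
                seen.add(tuple(map(tuple, x_gate(d, j, k))))
    return len(seen)
```

Because the ordered pairs $(j, k)$ and $(k, j)$ give the same matrix, the count comes out as $d(d-1)/2$, e.g. 3 gates for a qutrit and 10 for $d = 5$.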
Rotation gates of two levels
These gates perform a rotation on a two dimensional subspace [14] of a d-level qudit and are defined as
where σ
Parameter θ is the rotation angle, while i = √ −1.
Generalized Controlled X gates
The GCX (jk) (m) gates are generalization in the qudits of the CNOT gates acting on qubits [14] . Thus, they are gates which operate on a control and a target qudit. A GCX gate has three parameters, m, j and k, which define its operation. A GCX (jk) (m) acts like a X (jk) on the target qudit iff the control qudit is on the basis state |m . Consequently, the definition matrix of such a gate is block diagonal with dimension
where I d is the identity matrix of dimensions d × d. Equation (3) can be equivalently written as
Hadamard gate
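The Hadamard gate $H^{(d)}$ for a d-level qudit is defined as
$$H^{(d)} = \frac{1}{\sqrt{d}} \sum_{n=0}^{d-1} \sum_{m=0}^{d-1} e^{i2\pi(0.n)m}\, |m\rangle\langle n|$$
In the above equation the notation $(0.n)$ is the fractional representation of $n/d$ in the base-$d$ arithmetic system. The application of the $H^{(d)}$ gate to a basis state $|j\rangle$ is shown below
$$H^{(d)}|j\rangle = \frac{1}{\sqrt{d}} \sum_{m=0}^{d-1} e^{i2\pi(0.j)m}\, |m\rangle$$
Fig. 1. Symbols of the $X^{(jk)}$, GCX, $R_a^{(jk)}(\theta)$ and $H^{(d)}$ elementary gates ($a$ is $x$, $y$ or $z$).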
The Hadamard gate for qudits essentially performs the order-$d$ Fourier transform, just as the Hadamard gate for qubits performs the order-2 Fourier transform. Methods for implementation of the $H^{(d)}$ gate are proposed in [2, 49]. The symbols that will be used throughout the text for the three families of elementary gates defined and the $H^{(d)}$ gate are shown in Figure 1.
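The statement that the qudit Hadamard gate is the order-$d$ Fourier transform can be illustrated numerically. The sketch below assumes the matrix elements $\frac{1}{\sqrt d} e^{i2\pi mn/d}$, with the positive sign in the exponent taken from the phases $e^{i2\pi(0.j)m}$ that appear later in the text; the function names are illustrative.

```python
import cmath
import math

def hadamard(d):
    """Order-d Fourier matrix: entry (m, n) is exp(2*pi*i*m*n/d) / sqrt(d)."""
    return [[cmath.exp(2j * math.pi * m * n / d) / math.sqrt(d) for n in range(d)]
            for m in range(d)]

def is_unitary(U, tol=1e-9):
    """Check U^dagger U = I entry by entry."""
    d = len(U)
    for r in range(d):
        for c in range(d):
            entry = sum(U[k][r].conjugate() * U[k][c] for k in range(d))
            if abs(entry - (1 if r == c else 0)) > tol:
                return False
    return True
```

For $d = 2$ the matrix reduces to the familiar qubit Hadamard gate with entries $\pm 1/\sqrt{2}$, and for any $d$ the first column (the image of $|0\rangle$) is the uniform superposition.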
Diagonal Gates of one and two qudits
The qudit elementary gates of the previous section affect a 2-dimensional subspace of the whole d-dimensional Hilbert space of a single qudit. In this section single and two qudit diagonal basic gates affecting the whole d-dimensional space of one of the qudits are described and synthesized using elementary gates of the previous section. [14] is defined by the equation
It can be easily proved that such a gate can be constructed by sequentially applying d − 1 R (jk) z (θ) gates as shown in the following equation
A related gate is the
The
and add a global phase of angle ϕ =
The diagonal gates of the previous subsection can be extended to operate on two qudits, where the first is the control qudit and the second is the target qudit, in the following manner:
is applied on the target qudit iff the control qudit is in state |m , otherwise no operation is effective on the target. Thus, the d 2 × d 2 matrices representing such gates have the following block diagonal form
and
A construction of a CD 
Generalized Controlled Rotation gate $R_k^{(d)}$
The controlled diagonal gates $CD'_m$ and $CD_m$ of the previous subsection are activated whenever the control state is equal to one of the $d$ possible basis states, e.g. $|m\rangle$. We define a basic controlled diagonal gate, $R_k^{(d)}$, such that each one of the $d$ possible control states has a different effect on the target qudit. Such gates will be useful in the QFT and arithmetic circuits presented in the following sections. The $R_k^{(d)}$ gate is parametrized by the integer $k$. The matrix defining this gate is block diagonal of the form
where the matrix $\Phi_k$ is diagonal too, and defined with
The angles $\phi_1, \phi_2, \ldots, \phi_{d-1}$ depend on the parameter $k$ as follows
The $R_k^{(d)}$ gates can be equivalently written in a more detailed form consisting of a sum of tensor products of the basis states of the two qudits as
We can see by inspecting Eq. (16) that an $R_k^{(d)}$ gate is a generalization on qudits of the controlled rotation gates $R_k = \mathrm{diag}(1, e^{i2\pi/2^k})$ for the qubit case (where $d = 2$), and this generalization will be exploited when constructing the QFT and the various arithmetic circuits based on the QFT. To understand this, it is useful to see the effect of an $R_k^{(d)}$ gate when the control qudit is in a basis state $|j_1\rangle$ ($j_1 = 0, 1, \ldots, d-1$) and the target qudit is in a superposition of equal amplitudes, but with different phases, such as $|b\rangle = \frac{1}{\sqrt{d}}\sum_{l=0}^{d-1} e^{i\phi_l}|l\rangle$. The joint state of the two qudits after the application of the $R_k^{(d)}$
A. Pavlidis and E. Floratos 11 (12), (13) and (14) )
Taking into account that a CD 
Quantum Fourier Transform
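The Quantum Fourier Transform on the $N$-dimensional computational basis $\{|0\rangle, |1\rangle, \ldots, |N-1\rangle\}$ is defined by
$$QFT|j\rangle = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} e^{i\frac{2\pi}{N} jk}\, |k\rangle$$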
Using $q$ qudits of $d$ levels [2], [49], and setting $N = d^q$, the $q$ qudits basis consists of $|j\rangle = |j_1 \ldots j_q\rangle = |j_1\rangle \ldots |j_q\rangle$, where for the $l$-th qudit it holds $|j_l\rangle \in \{|0\rangle, \ldots, |d-1\rangle\}$. Then, the QFT action on a basis state $|j\rangle$ (j = 0 . .
q are used in the above definition. This tensor product form is similar to the form of the QFT of order $2^n$ implemented using $n$ qubits of two levels. Thus, the structure of a QFT circuit implemented with qudits is similar to the binary QFT case, as depicted in Figure 4.
Indeed, comparing the state $\frac{1}{\sqrt{d}}\sum_{m=0}^{d-1} e^{i2\pi(0.j_l j_{l+1} \ldots j_{q-1} j_q)m}|m\rangle$ of the $l$-th qudit after the transformation of Eq. (20) with Eqs. (6) and (17), we can conclude that this state can be generated by applying to the basis state $|j_l\rangle$ of the $l$-th qudit a Hadamard gate $H^{(d)}$ and a sequence of $q - l$ generalized rotation gates $R_k^{(d)}$, with $k = 2 \ldots q - l + 1$, controlled by the qudits $l + 1 \ldots q$, respectively. At the end, the order of the qudits must be reversed with swap gates, as in the case of the QFT operating on qubits. This swapping of the qudits is not shown in Figure 4. The inverse QFT circuit is derived by reversing horizontally the direct QFT circuit of Figure 4 (including the SWAP gates not shown) with opposite signs in the angles of the rotation gates.
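The factorization described above can be verified numerically for small registers. The sketch below assumes the standard definition $QFT|j\rangle = \frac{1}{\sqrt N}\sum_k e^{i2\pi jk/N}|k\rangle$ with $N = d^q$ and the most significant digit indexed 1; it builds the per-qudit states with phase fractions $(0.j_l \ldots j_q)$, reverses the qudit order (the role of the swap gates), and compares with the full transform. All function names are illustrative.

```python
import cmath
import math
from fractions import Fraction

def digits(value, d, q):
    """Base-d digits of value, most significant first (digit 1 .. digit q)."""
    return [(value // d ** (q - 1 - i)) % d for i in range(q)]

def qft_state(value, d, q):
    """Amplitudes of QFT|value> on a register of dimension N = d**q."""
    N = d ** q
    return [cmath.exp(2j * math.pi * value * k / N) / math.sqrt(N) for k in range(N)]

def qft_product_form(value, d, q):
    """Same state built qudit by qudit from the phase fractions (0.j_l ... j_q)."""
    ds = digits(value, d, q)
    per_qudit = []
    for l in range(q):
        # exact base-d fraction 0.j_l j_{l+1} ... j_q
        phi = sum(Fraction(x, d ** (i + 1)) for i, x in enumerate(ds[l:]))
        per_qudit.append([cmath.exp(2j * math.pi * float(phi) * m) / math.sqrt(d)
                          for m in range(d)])
    per_qudit.reverse()            # qudit-order reversal done by the swap gates
    state = [1]
    for qudit in per_qudit:        # tensor product, first factor most significant
        state = [s * a for s in state for a in qudit]
    return state
```

The two constructions agree for every basis state, which is the property that allows each output qudit to be produced by one $H^{(d)}$ gate followed by controlled rotations.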
Arithmetic Circuits
The integer arithmetic circuits presented in this section are developed in a bottom-up succession, starting from the simpler ones and proceeding gradually to more complex ones. The arithmetic operations are assumed to be modulo $d^q$, where $d$ is the number of qudit levels and $q$ is the number of qudits used to represent the integers. All the adders can be easily converted to subtractors by using the opposite sign in the angles of the rotation gates while retaining the same circuit structure.
Adder of two integers (ADD)
A basic arithmetic operation block is an adder of two integers of $q$ d-ary digits each, e.g. $a = (a_1 a_2 \ldots a_q)$ and $b = (b_1 b_2 \ldots b_q)$, or of two superpositions of integers. Following the previous sections, the most significant d-ary digit of an integer is indexed with 1 while the least significant digit is indexed with $q$. The circuit in Figure 5 operates on $2q$ qudits: the upper register holds the state $|b\rangle$, while the lower register holds the state $|\phi(a)\rangle$, the quantum Fourier transform of $|a\rangle$ (Eq. (20)). It is a generalization to qudits of the circuit proposed in [19].
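The first qudit of the lower register is initially in the state $|\phi_1(a)\rangle = \frac{1}{\sqrt{d}}\sum_{m=0}^{d-1} e^{i2\pi(0.a_1 a_2 \ldots a_q)m}|m\rangle$. The effect of the first rotation gate $R_1^{(d)}$, controlled by $|b_1\rangle$, is to evolve it (step [1,1]) to the state
$$\frac{1}{\sqrt{d}}\sum_{m=0}^{d-1} e^{i2\pi(0.a_1 a_2 \ldots a_q + 0.b_1)m}|m\rangle$$
The effect of the second gate $R_2^{(d)}$, controlled by $|b_2\rangle$, is to further evolve it (step [1,2]) to the state
$$\frac{1}{\sqrt{d}}\sum_{m=0}^{d-1} e^{i2\pi(0.a_1 a_2 \ldots a_q + 0.b_1 b_2)m}|m\rangle$$
Proceeding in a similar way up to gate $R_q^{(d)}$, controlled by $|b_q\rangle$, we find the final state (step [1,q]) of the first qudit, which becomes
$$\frac{1}{\sqrt{d}}\sum_{m=0}^{d-1} e^{i2\pi(0.a_1 a_2 \ldots a_q + 0.b_1 b_2 \ldots b_q)m}|m\rangle$$
In general, the final state of the $l$-th qudit of the lower register is found to be
$$|\phi_l(a+b)\rangle = \frac{1}{\sqrt{d}}\sum_{m=0}^{d-1} e^{i2\pi(0.a_l a_{l+1} \ldots a_q + 0.b_l b_{l+1} \ldots b_q)m}|m\rangle \tag{24}$$
Applying Eq. (24) to each of the lower register qudits, we find that the lower register has the final joint state
$$\bigotimes_{l=1}^{q} |\phi_l(a+b)\rangle$$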
This is the quantum Fourier transform of the sum state |a + b (mod d q ) . By applying the inverse QFT at the lower register we can get the desired sum in the computational basis, while the upper register remains in the initial state |b . The required direct and inverse QFT blocks are not shown in Figure 5 .
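The claim that the accumulated phases are those of the Fourier transform of $a + b \pmod{d^q}$ can be checked with exact rational arithmetic. The sketch below tracks only the phase fraction carried by each lower-register qudit and assumes that the gate $R_k^{(d)}$ acting on target qudit $l$ is controlled by digit $b_{k+l-1}$ and contributes $b_{k+l-1}/d^k$ to that fraction, which is the pattern the constant-adder phases in the next subsection suggest. Function names are illustrative.

```python
from fractions import Fraction

def digits(value, d, q):
    """Base-d digits of value, most significant first (digit 1 .. digit q)."""
    return [(value // d ** (q - 1 - i)) % d for i in range(q)]

def fourier_fractions(value, d, q):
    """Phase fraction (0.x_l x_{l+1} ... x_q) carried by each qudit l of QFT|value>."""
    ds = digits(value, d, q)
    return [sum(Fraction(ds[l + i], d ** (i + 1)) for i in range(q - l))
            for l in range(q)]

def adder_fractions(a, b, d, q):
    """Phase fractions of the lower register after all controlled rotations."""
    phases = fourier_fractions(a, d, q)       # lower register starts as QFT|a>
    b_digits = digits(b, d, q)
    for l in range(q):                        # target qudit (0-based index)
        for k in range(1, q - l + 1):         # gate R_k, controlled by b_{k+l-1}
            phases[l] += Fraction(b_digits[l + k - 1], d ** k)
        phases[l] %= 1                        # phases are only defined modulo 1
    return phases
```

For every pair $(a, b)$ the result equals `fourier_fractions((a + b) % d**q, d, q)`, so the inverse QFT returns the sum modulo $d^q$.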
Adder of an integer with constant (ADDC b )
Whenever one of the integers is constant, e.g. $b = (b_1 b_2 \ldots b_q)$, the upper register in Figure 5 is not necessary and all the controlled rotation gates become single qudit rotation gates with their angles defined by the constant integer $b$. Thus (see Eqs. (13) and (14)), we must apply on the $l$-th qudit of the lower register a sequence of $q - l + 1$ rotation gates $\sum_{m=0}^{d-1} e^{i\frac{2\pi}{d^k} m b_{k+l-1}} |m\rangle\langle m|$, for $k = 1 \ldots q - l + 1$. This product of gates can be merged into one gate of the form
$$B_l(b) = \sum_{m=0}^{d-1} e^{i2\pi(0.b_l b_{l+1} \ldots b_q)m}\, |m\rangle\langle m|$$
The gates $B_l(b)$ are diagonal, so they can be constructed with elementary $R_z^{(jk)}(\theta)$ gates using the procedure described in subsection 3.5. Figure 6 shows the constant $b$ adder (direct and inverse QFT blocks not included in the diagram). Like the general adder ADD, this adder performs the addition modulo $d^q$.
Fig. 6. Adder of an integer with constant $b$ (ADDC$_b$) and the respective symbol.
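A small state-vector simulation illustrates the constant adder end to end. This sketch treats the register as a whole rather than gate by gate: it applies the QFT as a dense matrix, multiplies the amplitude of each Fourier-basis index $k$ by $e^{i2\pi bk/d^q}$ (the combined effect assumed here for the single qudit phase gates), and applies the inverse QFT. It is meant for tiny registers only, and the function names are illustrative.

```python
import cmath
import math

def qft_matrix(N, sign=+1):
    """Dense QFT matrix of order N; sign=-1 gives the inverse transform."""
    return [[cmath.exp(sign * 2j * math.pi * r * c / N) / math.sqrt(N)
             for c in range(N)] for r in range(N)]

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

def add_constant(state, b, d, q):
    """Map sum_a c_a |a> to sum_a c_a |a + b mod d**q> via the Fourier domain."""
    N = d ** q
    if len(state) != N:
        raise ValueError("state must have d**q amplitudes")
    phi = matvec(qft_matrix(N), state)                        # direct QFT
    phi = [cmath.exp(2j * math.pi * b * k / N) * amp          # diagonal phases
           for k, amp in enumerate(phi)]
    return matvec(qft_matrix(N, -1), phi)                     # inverse QFT
```

With $d = 3$, $q = 2$, the basis state $|7\rangle$ and the constant $b = 5$, the output is concentrated on index $12 \bmod 9 = 3$. Replacing `b` by `-b` turns the same routine into a subtractor, mirroring the sign flip of the rotation angles.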
Single State Controlled Adder of an integer with constant (C c ADDC b )
The constant adder ADDC$_b$ can be easily converted to a constant adder controlled by the state of an additional control qudit, so as to perform the transformation
$$C_c\mathrm{ADDC}_b(|e\rangle|a\rangle) = |e\rangle|a + b\,\delta_{ce}\rangle \tag{28}$$
where $\delta_{ce}$ is the Kronecker delta function. Consequently, the addition is performed iff the control state equals $|c\rangle$, otherwise the target state $|a\rangle$ remains unaltered. The one state controlled constant adder $C_c\mathrm{ADDC}_b$ can be constructed as shown in Figure 7 if the one qudit rotation gates $B_l(b)$ of Figure 6 are converted to the respective two qudit diagonal gates controlled by state $|c\rangle$. These gates are exactly the $CD_c$ gates of subsection 3.5.
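On computational basis states the controlled adder is a classical reversible map, which makes Eq. (28) easy to tabulate. The sketch below implements that map for integers and leaves the quantum gate decomposition aside; the function name `c_addc` is illustrative.

```python
def c_addc(e, a, b, c, d, q):
    """Basis-state action |e>|a> -> |e>|a + b*delta_{ce} mod d**q>.

    e : control digit (0 .. d-1), returned unchanged
    a : target integer (0 .. d**q - 1)
    b : constant to add, c : the single control value that enables the adder
    """
    N = d ** q
    if not (0 <= e < d and 0 <= c < d and 0 <= a < N):
        raise ValueError("operand out of range")
    return e, (a + (b if e == c else 0)) % N
```

Since the map is a bijection on the $d \cdot d^q$ basis states, it corresponds to a permutation matrix and is therefore unitary.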
Generalized Controlled Adder of an integer with constant (GCADDC b )
A useful generalization of the previous C c ADDC b circuit can be achieved if we permit all the basis states of the control qudit to have an influence on the result of the addition. Such a circuit will be named Generalized Controlled Adder with constant b and is defined by the relation
The above equation can be rewritten as
Equation (30) 
) ( 
Multiplier with constant and Accumulator (MAC b )
A Multiplier with constant and Accumulator MAC$_b$ multiplies a $q$ qudits integer $x$ with a constant $b$ of $q$ d-ary digits, and accumulates the product $bx$ to a $q$ qudits integer $a$ (modulo $d^q$). Namely, the MAC$_b$ circuit consists of two $q$ qudits registers holding initially the states $|x\rangle$ and $|a\rangle$ and transforms them as
Taking into account that x can be written as (x 1 x 2 . . .
This means that the above transformation can be implemented by applying q GCADDC circuits, where the control is done consecutively by the qudits x q , x q−1 , . . . , x 1 and the constant 
This always happens when $d$ is a prime number. Figure 10 shows how to construct a Multiplier with constant $b$ using two MAC$_b$ blocks and the necessary direct and inverse QFT blocks. It requires a $q$ qudits register initially holding the integer $x$ and another $q$ qudits ancilla register initially in the zero state. At the end, one register is set to the state $|bx \ (\mathrm{mod}\ d^q)\rangle$ while the other register is set to the zero state, so effectively the ancilla register is reset and can be reused.
In the diagram of Figure 10, the boxes with the black strip at their right side are the "direct" blocks, while those with the black strip at their left side are the respective inverses. The operation of the inverse MAC with parameter $b^{-1}$ is to perform subtraction instead of accumulation; that is, referring to Figure 10, the operation $\mathrm{MAC}^{-1}_{b^{-1}}$ produces $|bx\rangle\,|\phi(x - b^{-1}(bx))\rangle = |bx\rangle\,|\phi(0)\rangle$, which is the zero state after the inverse QFT. The inverse $\mathrm{MAC}^{-1}$ has the same internal topology as the direct MAC of Figure 9 (of course with parameter $b^{-1}$ instead of $b$), the only difference being that the angles of its rotation gates have a minus sign. By inspecting the labels at the qudit buses of Figure 10 describing the respective states, we can conclude that the circuit implements the multiplication
Excluding the ancilla register, which is in the zero state before and after the operation and thus it remains unentangled, we conclude that this circuit performs the desired multiplication operation.
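The two-MAC construction can be traced on basis states with ordinary modular arithmetic. The sketch below makes two assumptions that are not spelled out in the surviving text: the generalized controlled adder adds $e \cdot b$ for control digit $e$, and the $q$ adders inside the MAC use the constants $b\,d^i$ for the digit of weight $d^i$. The inverse MAC with parameter $b^{-1}$ is modelled as accumulation of $-b^{-1}$. Function names are illustrative, and `pow(b, -1, N)` requires Python 3.8 or later.

```python
def gcaddc(e, a, b, d, q):
    """Generalized controlled adder on basis states: a -> a + e*b mod d**q."""
    return (a + e * b) % d ** q

def mac(x, a, b, d, q):
    """|x>|a> -> |x>|a + b*x mod d**q>, built from q generalized controlled adders."""
    N = d ** q
    for i in range(q):                        # i = 0 is the least significant digit x_q
        digit = (x // d ** i) % d
        a = gcaddc(digit, a, (b * d ** i) % N, d, q)
    return x, a

def multiply_by_constant(x, b, d, q):
    """|x>|0> -> |b*x mod d**q>|0> using a MAC followed by an inverse MAC."""
    N = d ** q
    b_inv = pow(b, -1, N)                     # ValueError if gcd(b, d**q) != 1
    x, acc = mac(x, 0, b, d, q)               # ancilla now holds b*x
    acc, x = mac(acc, x, (-b_inv) % N, d, q)  # x - b^{-1}(b*x) = 0 clears the first register
    return acc, x
```

For $d = 3$, $q = 2$ and $b = 5$, every input $x$ ends as the pair $(5x \bmod 9,\ 0)$, so the ancilla is returned to zero as described. A constant sharing a factor with $d^q$ (e.g. $b = 3$) has no inverse and the construction is rejected.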
Diagonal Operators on q qudits
The diagonal operator on $q$ qudits of $d$ levels, as its name implies, is a circuit whose unitary matrix of dimensions $d^q \times d^q$ has a diagonal form. The circuit developed in this section is such that the diagonal elements of the matrix are integer powers of the principal root of unity $e^{i2\pi/d^q}$. In what follows, the diagonal operator circuit developed is for the function $f(k) = \gamma k^2$, where $\gamma$ is an integer constant. Thus, the definition of our diagonal operator on $q$ qudits is
where $Q = d^q$ and $f(k) = \gamma k^2 \ (\mathrm{mod}\ d^q)$. All the diagonal entries of the above matrix are integer powers of the basic phase $\omega = e^{i2\pi/Q}$. The effect of this matrix upon a general superposition state of $q$ qudits will be
The circuit that implements the operator of Eq. (34) will be derived by exploiting results of [45], which are given for the case of binary quantum circuits. A prerequisite for this construction is a Squarer/Multiplier with constant/Accumulator circuit (SMAC) that computes the function $f$ involving two $q$ qudits registers as in
Such an SMAC circuit will be described in subsection 6.2. Figure 11 shows the diagonal operator circuit with entries dependent on the function $f(k) = \gamma k^2 \ (\mathrm{mod}\ Q)$. Two quantum registers are used, each $q$ qudits wide, namely Reg1 and Reg2. The upper register Reg1 is assumed to be in a general superposition state prior to the application of the operator $\Delta_\gamma^{(q)}$, as described in Eq. (35), while the lower register Reg2 is an ancilla register with zero initial and final state.
The first step is to form in the ancilla register Reg2 the uniform superposition state $\frac{1}{\sqrt{Q}}\sum_{h=0}^{Q-1}|h\rangle$. This is accomplished with the application of $q$ Hadamard gates $H^{(d)}$
and it has exactly the same form of the diagonal gates of Eq. (9). The joint affect of these gates at Reg2 is given by their tensor product which is a diagonal matrix too, of dimensions
Then, the state of Reg2 becomes
The initial state of Reg1 is assumed to be a general superposition of basis states and can be expressed as
On the right hand side of Eq. (40) we have grouped all the basis states with value $k$ such that $f(k) = n$ in a set $K_n$ and then summed over all the states belonging to the sets $K_n$. The expediency of this grouping will become clear later. Combining Eqs. (40) and (39) we find the joint state of Reg1 and Reg2 just before the application of the SMAC block, which is given by the tensor product
Taking into account the effect of the SMAC block given by Eq. (36) we get the state of the two registers after the application of the SMAC
We are going to use $m = h + n \pmod{Q}$ as the index of the inner summation in place of $h$. For a particular $n$, as $h$ runs from 0 to $Q - 1$, $m = h + n \pmod{Q}$ takes each value from 0 to $Q - 1$ exactly once (a one-to-one mapping). Thus the lower and upper limits of the new index $m$ remain the same, and we have $h = m - n \pmod{Q}$ and $Q - h = n + (Q - m) \pmod{Q}$. Also, $\omega^{Q} = 1$ holds. Then Eq. (42) becomes
This shows that Reg1 has the desired state of Eq. (35) and that it is disentangled from Reg2, which remains in the state of Eq. (39). Thus, the ancilla Reg2 can be reset without any effect on Reg1. The resetting can be accomplished as shown in Figure 11 by applying in the reverse sequence (a) the inverse of the gates
respectively. An alternative method would be to measure Reg2 and, depending on the measurement result, to apply GCX gates controlled by the classical measurement outcome. This measurement would not affect Reg1, as it is disentangled from Reg2.
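The phase kickback argument of this section can be checked numerically on a small instance. The following sketch is only an illustration under stated assumptions: the function name `kickback_demo` and the test amplitudes are hypothetical, and the prepared Reg2 state is assumed to be $\frac{1}{\sqrt{Q}}\sum_h \omega^{-h}|h\rangle$, which is what the index substitution $Q - h = n + (Q - m)$ suggests. The SMAC is modelled only by its action on basis states, $|k\rangle|h\rangle \rightarrow |k\rangle|h + f(k) \bmod Q\rangle$.

```python
import cmath
import math


def kickback_demo(d=3, q=2, gamma=2):
    """Return (state after SMAC, expected product state) as Q x Q amplitude tables."""
    Q = d ** q
    omega = cmath.exp(2j * math.pi / Q)
    f = [(gamma * k * k) % Q for k in range(Q)]

    # Arbitrary normalised Reg1 amplitudes c_k (hypothetical test values).
    raw = [complex(k + 1, -k) for k in range(Q)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in raw))
    c = [a / norm for a in raw]

    # Assumed Reg2 state: (1/sqrt(Q)) * sum_h omega^(-h) |h>.
    anc = [omega ** (-h) / math.sqrt(Q) for h in range(Q)]

    # SMAC on the joint state: the amplitude of |k>|h> moves to |k>|h + f(k) mod Q>.
    after = [[0j] * Q for _ in range(Q)]
    for k in range(Q):
        for h in range(Q):
            after[k][(h + f[k]) % Q] += c[k] * anc[h]

    # Expected result: (sum_k c_k omega^f(k) |k>) tensor (unchanged Reg2 state).
    expected = [[c[k] * omega ** f[k] * anc[m] for m in range(Q)] for k in range(Q)]
    return after, expected


if __name__ == "__main__":
    after, expected = kickback_demo()
    err = max(abs(a - e) for ra, re in zip(after, expected) for a, e in zip(ra, re))
    print("max deviation from product state:", err)
```

In this model the joint state after the SMAC factorises, with Reg1 carrying the phases $\omega^{f(k)}$ and Reg2 unchanged, which is the disentanglement used to justify resetting the ancilla.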
Multiplier of two integers / Accumulator (MMAC)
The construction of the SMAC block requires a multiplier of two integers with accumulator (MMAC), whose operation is to multiply integer $x$ by integer $y$ and accumulate the product $xy$ onto integer $z$ (modulo $d^q$). This means that the MMAC block is applied on three q-qudit registers and performs the transformation

$$|x\rangle\,|y\rangle\,|z\rangle \;\rightarrow\; |x\rangle\,|y\rangle\,|z + xy \pmod{d^q}\rangle,$$

where, in terms of the digits of $x$ and $y$,

$$xy \pmod{d^q} = \sum_{s=0}^{q-1} d^{s} \sum_{t=0}^{q-1} x_{q-t}\, y_{q-s+t} \pmod{d^q}. \qquad (45)$$

In Eq. (45) the full product terms corresponding to powers $d^s$ with $s \geq q$ have not been included, because the product is to be calculated modulo $d^q$. Also, digits with negative index (e.g. $x_{-1}$), as well as with index greater than $q$ (e.g. $x_{q+1}$), are assumed zero. The calculation of the product and the accumulation can be performed in a similar way as in the MAC circuit given in subsection 5.5. We assume that the state corresponding to the accumulation register integer $|z\rangle$ is already Fourier transformed and, taking into account Eq. (20) which expresses the QFT, we expect that the l-th qudit of the accumulation result $|z + xy\rangle$ prior to the inverse QFT is
Thus, to bring an initial state $|\phi_l(z)\rangle$ of the l-th qudit to the state of Eq. (46) we must add various integer multiples of the basic angle $2\pi/d^l$. Namely, taking into account Eq. (45), the angles that must be added to the amplitude phases of a basis state $|r\rangle$ ($r = 0 \ldots d - 1$) in the superposition $|\phi_l(z)\rangle$ of Eq. (46) are
The restriction $s < l$ at the upper limit of the first sum of Eq. (47) comes from the periodicity $\exp\left(i(\phi + 2\pi d^{n})\right) = \exp(i\phi)$, which holds for any integer $d$ and any non negative integer $n$. The restriction $t \leq s$ at the upper limit of the second sum results because $y_{q-s+t} = 0$ for $t > s$. Replacing with $k = l - s$, Eq. (47) becomes

$$\Phi_{l,r} = 2\pi r \sum_{k=1}^{l} \frac{1}{d^{k}} \sum_{t=0}^{l-k} x_{q-t}\, y_{q+k-l+t}. \qquad (48)$$
Consequently, the angles that must be added to the amplitude phase of the $|r\rangle$ component of the superposition are $(2\pi/d^{k})\,x_m y_n r$ and depend on the indices $m = q - t$ and $n = q + k - l + t$. This can be attained if we introduce the notion of a double controlled generalized rotation gate applied to three qudits, two controls and one target. Similarly to Eq. (16), which is the definition of the simply controlled generalized rotation gate, we define the double controlled generalized rotation gate $R^{(d)}_{k}$. Figure 12 depicts the symbol for this double controlled rotation gate. In Appendix A a construction of $R^{(d)}_{k}$ will be presented using some of the elementary and basic gates introduced in Section 3.
The topology of the MMAC circuit follows directly from Eq. (48), as this equation describes which gates have to be applied and what their control connections to the qudits carrying $|x\rangle$ and $|y\rangle$ are. Figure 13 shows an example MMAC for the case of $q = 4$. In this figure the $R^{(d)}_{k}$ gates are represented with the value $k$ inside the circle. The generalization to any value of $q$ is straightforward.
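The digit-level expansion behind these angles can be verified by brute force for small parameters. The sketch below is a hedged illustration: the function names are hypothetical, and it assumes the digit convention in which $x_q$ is the least significant digit and $x_1$ the most significant one, which is the convention that makes the index pattern $x_{q-t}\,y_{q-s+t}$ reproduce the product. Phases are expressed as integer multiples of $2\pi/d^l$ so that the comparison is exact.

```python
def digits(x, d, q):
    """Return {j: x_j} for j = 1..q, with x_q the least significant digit.

    The indexing convention is an assumption made for this illustration.
    """
    return {q - t: (x // d ** t) % d for t in range(q)}


def product_mod(x, y, d, q):
    """Evaluate sum_s d^s sum_{t<=s} x_{q-t} y_{q-s+t} modulo d^q."""
    xd, yd = digits(x, d, q), digits(y, d, q)
    total = 0
    for s in range(q):
        for t in range(s + 1):
            total += d ** s * xd[q - t] * yd[q - s + t]
    return total % d ** q


def phase_units(x, y, d, q, l, r):
    """Angle added to component |r> of target qudit l, in units of 2*pi/d^l.

    Implements r * sum_{k=1..l} d^(l-k) * sum_{t=0..l-k} x_{q-t} y_{q+k-l+t},
    reduced modulo d^l (a full turn).
    """
    xd, yd = digits(x, d, q), digits(y, d, q)
    acc = 0
    for k in range(1, l + 1):
        for t in range(l - k + 1):
            acc += d ** (l - k) * xd[q - t] * yd[q + k - l + t]
    return (r * acc) % d ** l


if __name__ == "__main__":
    d, q = 3, 2
    ok = all(
        product_mod(x, y, d, q) == (x * y) % d ** q
        for x in range(d ** q) for y in range(d ** q)
    )
    print("digit expansion reproduces xy mod d^q:", ok)
```

Under these assumptions the accumulated angle on the l-th qudit equals $r\,xy$ in units of $2\pi/d^l$, up to full turns, which is the phase needed to turn $|\phi_l(z)\rangle$ into $|\phi_l(z + xy)\rangle$.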
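We observe in Figure 13 and in Eq. (48) that $l - k + 1$ of the $R^{(d)}_{k}$ gates are sequentially applied on the l-th target qudit for a specific $k$. Thus, a total of $\sum_{k=1}^{l}(l - k + 1) = l(l+1)/2$ such gates are applied on the l-th target qudit. Summing over all target qudits we find the total number of gates used

$$C_{\mathrm{MMAC}}(q) = \sum_{l=1}^{q} \frac{l(l+1)}{2} = \frac{q(q+1)(q+2)}{6}.$$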
The same value gives the depth of the circuit as arranged in Figure 13. Indeed, for the example $q = 4$ we find $C_{\mathrm{MMAC}}(4) = 20$. We can exploit the fact that the gates $R^{(d)}_{k}$ mutually commute, as they are diagonal, and rearrange them so as to parallelize their execution. The gates that can be executed simultaneously are those that operate on different qudits. An example of the proposed parallelization for the case $q = 4$ is shown in Figure 13, where below each gate is shown the soonest timestep in which it can be executed. E.g. at the first timestep three gates can be executed in parallel, as none of these gates operates on the same qudit as the other two. We can generalize this parallelization scheme and conclude that we can achieve a depth of about $q(q + 1)/2$, which is quadratic, instead of the cubic depth obtained without the proposed rearrangement.
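The gate count and the lower limit on the parallel depth can be reproduced by enumerating the gates. The sketch below is an illustration only: `mmac_gates` and `qudit_loads` are hypothetical helper names, the gate list follows the index pattern $m = q - t$, $n = q + k - l + t$ stated above, and the register labels "x", "y", "z" are chosen here for convenience. Since two gates sharing a qudit cannot run in the same timestep, the largest number of gates touching any single qudit bounds the depth from below; the code does not construct the schedule of Figure 13.

```python
from collections import Counter


def mmac_gates(q):
    """List (k, x_index, y_index, target) for every double controlled rotation."""
    gates = []
    for l in range(1, q + 1):              # target qudit of the accumulator
        for k in range(1, l + 1):          # rotation index k, angle unit 2*pi/d^k
            for t in range(l - k + 1):
                gates.append((k, q - t, q + k - l + t, l))
    return gates


def qudit_loads(q):
    """Count how many gates act on each qudit of the three registers."""
    load = Counter()
    for k, m, n, l in mmac_gates(q):
        load[("x", m)] += 1
        load[("y", n)] += 1
        load[("z", l)] += 1
    return load


if __name__ == "__main__":
    q = 4
    print("number of gates:", len(mmac_gates(q)))
    print("busiest qudit load (depth lower bound):", max(qudit_loads(q).values()))
```

For $q = 4$ the enumeration gives 20 gates, and the busiest qudits each take part in 10 gates, so no rearrangement of this gate set can go below $q(q+1)/2$ timesteps.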
[Figure 13: example MMAC circuit for q = 4. The label below each gate is the soonest timestep in which it can be executed; the labels run from 1 to 10.]
Squarer/Multiplier/Accumulator (SMAC)
The MMAC circuit allows the construction of the $\mathrm{SMAC}_{\gamma}$ circuit described by Eq. (36) and required for the q-qudit diagonal operator $\Delta^{(q)}_{\gamma}$. The Squarer/Multiplier with constant $\gamma$/Accumulator modulo $d^q$ is presented as a block diagram in Figure 14. It uses $4q$ qudits, $2q$ of which are ancilla qudits with zero initial and final state, where $q$ is the number of qudits used to represent the argument $x$. The $4q$ qudits are grouped into four registers of q qudits each. The second register from the top holds the argument $|x\rangle$, while the bottom register holds the accumulation value $|z + \gamma x^2 \bmod d^q\rangle$.
The first step is to set the state of the top register to the same state as the second one, which is $|x\rangle$. This is accomplished with the adder block sandwiched between two QFT blocks, direct and inverse (this operation could also be achieved using a sequence of GCX gates to "copy" the second register's state to the first). In a second step, the two states $|x\rangle$ and $|x\rangle$ of the two top registers are multiplied together by the MMAC and the product is accumulated onto the third register from the top, which was initially in the zero state. At this stage, the joint state of the three top registers is $|x\rangle\,|x\rangle\,|x^2 \pmod{d^q}\rangle$. Next, the $\mathrm{MAC}_{\gamma}$ block follows to multiply the constant $\gamma$ with the $|x^2 \pmod{d^q}\rangle$ state of the third register. The result is accumulated onto the bottom register, which was initially in state $|z\rangle$. At this point the joint state of the four registers can be described by $|x\rangle\,|x\rangle\,|x^2 \pmod{d^q}\rangle\,|z + \gamma x^2 \pmod{d^q}\rangle$.
What remains is to reset the first and the third ancilla registers. The inverse MMAC resets the third register by performing subtraction instead of accumulation of the product of the two top registers; it is constructed like the direct MMAC, with opposite angles in its rotation gates. Last, the inverse adder resets the top ancilla register. Consequently, the circuit of Figure 14 implements the transformation
which is exactly the transformation of Eq. (36) if the ancilla registers are ignored.
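The sequence of blocks in the SMAC can be traced classically on basis-state labels. The sketch below is only a register-level illustration with a hypothetical function name `smac`; each block is replaced by the modular arithmetic it performs on the integer labels, and the QFT blocks are omitted since they do not alter those labels.

```python
def smac(x, z, gamma, d, q):
    """Trace the four registers (top, argument, third, bottom) through the SMAC."""
    Q = d ** q
    top, arg, third, bottom = 0, x, 0, z

    top = (top + arg) % Q                  # adder between QFT blocks: copy x to the top register
    third = (third + top * arg) % Q        # MMAC: accumulate x*x onto the third register
    bottom = (bottom + gamma * third) % Q  # MAC with constant gamma: accumulate gamma*x^2
    third = (third - top * arg) % Q        # inverse MMAC: reset the third register
    top = (top - arg) % Q                  # inverse adder: reset the top register

    return top, arg, third, bottom


if __name__ == "__main__":
    d, q, gamma = 3, 2, 2
    for x in range(d ** q):
        print(x, smac(x, z=1, gamma=gamma, d=d, q=q))
```

Both ancilla registers return to zero for every input, and the bottom register ends with $z + \gamma x^2 \bmod d^q$, in line with the text.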
Complexity Analysis
The arithmetic quantum circuits proposed in the previous sections are broken down to the level of elementary gates
x (θ) and GCX (jk) m introduced in Section 3. This decomposition is depicted in Figure 15 as a tree structure, where the root of each tree is one of the complete circuits proposed and the leaves of the tree (trapezoids) represent the elementary gates. The edges of each tree are labeled with the number of components that each level needs from the level below (no label stands for 1).
A rough complexity analysis in terms of quantum cost (number of elementary gates used) and depth (execution time) can be done with the help of Figure 15. The analysis assumes that single and two qudit gates are equivalent in terms of cost and execution time. Exact costs and depths depend on the particular implementations. The total gate count for each block can be found by traversing the tree emerging from the inspected block down to each leaf of the subtree. The labels of the edges along each path are multiplied and then the products of all paths are summed together. E.g. the QFT circuit needs q Hadamard gates,
(θ) gates. Similar calculations provide us with the quantum costs shown in Table 1 , which shows only the highest order terms.
The depth calculation will be done in more detail by finding first the depths of QFT, ADD, MAC and MMAC blocks and then the depths of MULC and SMAC.
QFT At first glance Figure 4 exhibits a quadratic depth $O(q^2)$, but it can be easily shown that we can parallelize the execution with an appropriate reordering of the gates and thus achieve a linear depth, namely depth(QFT) $= 8d^2 q$.
ADD As in the QFT case, a reordering of the gates in Figure 5 offers a linear depth too, that is depth(ADD) $= 4d^2 q$.
MAC Concurrent execution of gates is possible in this case, too. It can be easily seen that, by flattening the hierarchy MAC-GCADDC-CADDC, q different controlled gates $B_l(b)$ (Eq. (26)) belonging to different GCADDC blocks can be executed concurrently. Thus, the depth of the MAC is of the order $O(4d^2 q)$ instead of $O(4d^2 q^2)$ as directly calculated from the number of elementary gates.
MULC Observing Figure 10 we find depth(MULC) = 3 depth(QFT) + 2 depth(MAC), as the two middle QFT blocks (direct and inverse) can be executed simultaneously. Thus, we derive depth(MULC) $= 32d^2 q$.
MMAC The reordering of gates achieves $q(q + 1)/2$ execution steps of double controlled rotation gates. Taking into account the decomposition of these three qudit gates into single and two qudit gates (see Appendix A), we end up with depth(MMAC) $= 21d^3 q^2$.
SMAC From the previous calculations and Figure 14 we find that the dominant depth of SMAC in leading order is twice the depth of the MMAC block.
The depth of the diagonal circuit is essentially the depth of the SMAC.
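The leading-order depth expressions quoted in this section can be collected in a small helper for quick comparisons. This is a convenience sketch with a hypothetical function name `leading_depths`; it only evaluates the formulas stated above, takes the MAC depth as $4d^2 q$ and the SMAC and diagonal operator depths as twice the MMAC depth, and ignores all lower order terms.

```python
def leading_depths(d, q):
    """Evaluate the leading-order depth formulas for d-level qudits and q-qudit registers."""
    qft = 8 * d ** 2 * q
    add = 4 * d ** 2 * q
    mac = 4 * d ** 2 * q
    mulc = 3 * qft + 2 * mac          # two middle QFT blocks run simultaneously
    mmac = 21 * d ** 3 * q ** 2
    smac = 2 * mmac                   # dominated by the direct and inverse MMAC
    return {
        "QFT": qft,
        "ADD": add,
        "MAC": mac,
        "MULC": mulc,
        "MMAC": mmac,
        "SMAC": smac,
        "DIAGONAL": smac,             # essentially the depth of the SMAC
    }


if __name__ == "__main__":
    for name, depth in leading_depths(d=3, q=4).items():
        print(f"{name:9s} {depth}")
```

The MULC entry reproduces $32d^2 q$, which is a simple consistency check between the QFT and MAC depths used in its derivation.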
Conclusions and Future Work
In this paper we presented an assortment of quantum circuits for multilevel qudits. These are basic integer arithmetic circuits (addition, multiplication/accumulation and multiplication) as well as more complex circuits such as squarers. Additional extensions can be applied. E.g., the ADD, ADDC, MAC and MULC circuits can be converted to single qudit controlled versions. Such controlled versions could be useful for multilevel qudit quantum phase estimation algorithms and quantum simulations. The general diagonal operator has been developed for the special case of a quadratic function $f(k) = \gamma k^2$, where $k$ is the coordinate of the diagonal element; however, using the same techniques we can easily construct diagonal operators for any power of $k$ and even for a polynomial function of $k$. E.g. the Squarer/Multiplier/Accumulator can be converted to a circuit that accumulates the third power by inserting additional MMAC units in Figure 14.
The designs are based on the alternative representation of an integer after it has been QFT transformed, instead of the usual computational basis representation, a method which has already been exploited in the binary qubit case. QFT based arithmetic circuit design is a versatile method for developing many arithmetic circuits. E.g. there is no need to handle carries, which leads to space reduction. Moreover, if suitably used, it can offer advantages in terms of speed. This is possible when similar blocks are iterated to act on a datapath whose state follows the QFT representation. The extensive usage of rotation gates (which mutually commute) on such a datapath permits their rearrangement so that they execute concurrently. This capability is observed in the MAC block, where a suitable reordering of gates led to a depth reduction from $O(q^2)$ to $O(q)$. Similarly, the depth of the MMAC block was reduced from $O(q^3)$ to $O(q^2)$. Another advantage that has been observed in designs adopting the QFT method is their robustness to various kinds of deviations from the ideal operation. E.g. approximate QFT [50], or QFT banding, is the design procedure of eliminating small angle rotation gates. Studies of Shor's algorithm, which uses the QFT, showed that the algorithm still works sufficiently well even when a large proportion of the QFT rotation gates are eliminated [22, 51, 52]. Recent studies extended this analysis to circuits beyond the QFT. In [53, 54] the simultaneous pruning of rotation gates of the QFT circuit and of the QFT based modular exponentiator of Beauregard's circuit [20] was simulated. The simulation results showed similar robustness of Shor's algorithm to these gate eliminations. This robustness is sustained even if the parameters of the remaining rotation gates are randomly selected [23]. The above results suggest that a similar robustness is to be expected in the multidimensional qudit case and that further investigation should be carried out.
On the other hand, there is a drawback related to the requirement of reliably implementing high accuracy, small angle rotation gates. Moreover, these gates must belong to a set of fault tolerant gates if large scale quantum computation is considered. Fortunately, as shown in Appendix B, approximation of these gates is possible, albeit at a cost. However, the remarks of the previous paragraph suggest that this cost may be much lower if approximate computation is adopted.
For the above reasons, and also because the exact cost depends on the exact technology used, which for qudits is at an early stage, the complexity analysis of section 7 is to be considered a crude indicator of performance. Despite that, we think that the proposed designs enrich the toolkit of future quantum computing.
gate, while the values m and n inside the small circles of the same gate signify that the (mn)-th block of the diagonal matrix CCD is of the form diag(1, e (a 1 , a 2 , . . . , a d−1 ) . The difference in this case is that we need double controlled generalized NOT gates, which will be called GCCX (jk) (m,n) . They can be thought of as an extension of Toffoli gates to the qudit case and their operation is analogous to that of the GCX gates.
What remains is the construction of the GCCX (jk) (m,n) gate. This gate operates in a two dimensional subspace of the target qudit. Thus, the decomposition of a Toffoli gate into single- and two-qubit gates [55] can be extended to the qudit case, as shown in Figure A.3. The single and two qudit gates used will be the generalizations of the S, T and H qubit gates to the d dimensions of the qudits, but operating only on a 2-dimensional subspace. Concretely, we define the gates
