Improved Quantum Ternary Arithmetics by Bocharov, Alex et al.
arXiv:1512.03824v2 [quant-ph] 9 Jun 2016
Improved Quantum Ternary Arithmetics
Alex Bocharov^a
Quantum Architectures and Computations Group, Microsoft Research
Redmond, Washington, 98052, USA
Shawn X. Cui^b
University of California
Santa Barbara, California, 93106, USA
Martin Roetteler^c
Quantum Architectures and Computations Group, Microsoft Research
Redmond, Washington, 98052, USA
Krysta M. Svore^d
Quantum Architectures and Computations Group, Microsoft Research
Redmond, Washington, 98052, USA
Qutrit (or ternary) structures arise naturally in many quantum systems, notably in certain non-abelian
anyon systems. We present efficient circuits for ternary reversible and quantum arithmetics. Our main
result is the derivation of circuits for two families of ternary quantum adders. The main distinction from
the binary adders is a richer ternary carry which leads potentially to higher resource counts in universal
ternary bases. Our ternary ripple adder circuit has a circuit depth of O(n) and uses only 1 ancilla,
making it more efficient in both circuit depth and width than previous constructions.
Our ternary carry lookahead circuit has a circuit depth of only O(log n), while using O(n) ancillas.
Our approach works on two levels of abstraction: at the first level, descriptions of arithmetic circuits
are given in terms of gate sequences that use various types of non-Clifford reflections. At the second
level, we break down these reflections further by deriving them either from the two-qutrit Clifford gates
and the non-Clifford gate C(X) : |i, j〉 ↦ |i, j + δ_{i,2} mod 3〉 or from the two-qutrit Clifford gates and
the non-Clifford gate P_9 = diag(e^{−2πi/9}, 1, e^{2πi/9}). The two choices of elementary gate sets correspond
to two possible mappings onto two different prospective quantum computing architectures which we
call the metaplectic and the supermetaplectic basis, respectively. Finally, we develop a method to
factor diagonal unitaries using multi-variate polynomials over the ternary finite field, which allows us to
characterize classes of gates that can be implemented exactly over the supermetaplectic basis.
Keywords: Quantum circuits, ternary quantum systems, quantum adders
a alexeib@microsoft.com
b cuixsh@gmail.com
c martinro@microsoft.com
d ksvore@microsoft.com
1 Introduction
Quantum computation has seen vast progress over the years, both theoretically and experimentally.
Computations on a programmable and scalable fault-tolerant quantum computer will consist of fully
controlled sequences of primitive operations such as unitary gates, measurements and state prepara-
tions. Such sequences are called quantum circuits. In the most commonly used circuit model, quantum
information is stored in a collection of qubits, where each qubit has a two-dimensional Hilbert state
space with the computational basis {|0〉, |1〉}. A standard universal gate set consists of Clifford gates
and one non-Clifford gate such as the π/8-gate [1] or V-gate [2]. By design, circuits over a universal set
can be used to approximate arbitrary quantum gates. Thus any quantum algorithm can be processed
given a quantum computer with a universal gate set.
It has been noted by several researchers that the architecture of certain quantum registers and gates
is more naturally described by multi-valued logic than by binary logic. In particular, experiments
with ternary superconducting registers go back to 1989 [3],[4]. More recently, in the
quantum computation domain, multi-valued logic has been proposed for linear ion traps [5], cold
atoms [6], and entangled photons [7]. It remains to be seen at what scale it would be possible to balance
quantum universality and fault-tolerance in these and other similar architectures.
The research presented here is motivated in part by recent progress in circuit synthesis over uni-
versal quantum bases arising in topological quantum computing, where multi-qubit encoding is not
necessarily the most natural choice. Several physical systems capable of performing topologically-
protected quantum computations have a natural structure of a qutrit instead of a qubit, where a qutrit
has a three-dimensional Hilbert space with the computational basis {|0〉, |1〉, |2〉}. For instance, in the
SU(2)_4 anyon system, anyons with quantum dimension √3 are well-suited for encoding quantum
states in qutrits. What is more, it was shown in [8] that the SU(2)_4 anyon system can be made universal
through braiding and projective measurement of anyons. This anyonic structure is quite far from
physical realization at the moment, yet it offers the promise of comparatively simple quantum universality
combined with native topological protection, which, in our opinion, makes it a worthwhile
subject of forward-looking research.
In [9], an algorithm is given for approximating any multi-qutrit gate with an asymptotically
optimal circuit over the gate set Clifford + diag(1, 1, −1). This work also demonstrated the importance
of Householder reflections for the synthesis of efficient circuits. Even though the gate set turned out to
be powerful enough for such synthesis, it had certain conceptual and practical limitations. In particular, it
is quite unlikely that all the reversible classical permutation gates can be implemented exactly over
Clifford + diag(1, 1, −1). This has a dampening effect on the implementation of arithmetic-heavy algorithms
such as Shor’s factorization algorithm, since integer modular arithmetic is naturally described by
reversible classical circuits. As a matter of principle, such circuits may be represented exactly in
commonly used multi-qubit circuit models.^e
When compared to [9], the present paper aims at a more abstract level. Here we assume that
the entire group of multi-qutrit classical permutations is representable at some cost, explore different
scenarios of its representation and focus on synthesizing efficient circuits for ternary base arithmetic
in these scenarios. Our thinking at this level remains reflection-centric. Previous research on non-
binary reversible circuits [11] mostly focused on proving the universality of the local classical Clifford
gates in combination with the controlled-increment gate |j, k〉 ↦ |j, k + δ_{j,d−1} mod d〉, where d is the
dimension of the qudit and δ is the Kronecker delta. Reversible circuits available in the literature tend to
use ancillary qudits fairly liberally.
e To the extent the three-qubit Toffoli gate may be assumed exactly representable.
This paper differentiates itself from previous work in two ways. First, we explore several alternate
methods for synthesizing classical reversible circuits. Second, we strive to minimize both the depth
and the width of arithmetic circuits specifically. For example, we show in Section 3.1 that implementing
a faithful CARRY gate is not necessary in a correct ternary adder. By using a modified carry
we eliminate the use of ancillary qutrits and reduce the cost of the gate when compared to a faithful
CARRY as used in previous approaches to implementing ternary ripple-carry adders [12, 13, 14].
Our focus is mainly on two types of ternary quantum adders, a modified ripple-carry adder and a
carry look-ahead adder. Both adders are generalized from their binary counterparts, but the general-
izations are somewhat non-trivial. To add two n-qutrit numbers, the modified ripple-carry adder uses
1 ancilla and has a circuit depth of O(n), while the carry look-ahead adder requires O(n) ancillas and
has a circuit depth of O(log n). Each of the two adders has an overall circuit size of O(n) elementary
gates. We also study various extensions of quantum adders including an adder modulo 3^n, comparison,
and subtraction.
We show these arithmetic circuits can be realized exactly using classical Clifford gates and one
additional gate C(X), the controlled-increment gate, whose matrix is given in Equation 1. C(X) is
a two-qutrit non-Clifford gate and it is universal for reversible classical computation. This sets the
ternary reversible circuits apart from their binary analogs, where at least one three-qubit gate, e.g., the
Toffoli gate, is required for universality.
\[
C(X) =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
\tag{1}
\]
We also introduce a qutrit universal gate set Clifford + diag(e^{−2πi/9}, 1, e^{2πi/9}), called the supermetaplectic
basis, which resembles the single-qubit π/8-gate. Some techniques are developed to construct
new quantum gates from old ones. As an application, it will be shown that all ternary arithmetic
studied in this paper can be implemented exactly over the supermetaplectic basis.
We note that the reflection-centric synthesis of our adder circuits is a ternary counterpart of Toffoli-
centric binary adder circuits as developed, for example, in [17] and [18]. This analogy is explained in
more detail in corresponding sections throughout the paper. The exact representation of the C(X) gate
in supermetaplectic basis parallels the exact representation of the three-qubit Toffoli in the Clifford
+ π/8 basis. A quantitative comparison of the ternary and binary adders is beyond the scope of
this work. A major step towards a comprehensive comparison of this kind is made in the upcoming
paper [10], which demonstrates the advantages of emulating Shor’s period-finding function on a ternary
quantum computer and especially on the metaplectic topological quantum framework.
The paper is organized as follows. In Section 2, some preliminaries and notations used throughout
the paper are given. In Section 3, we separately discuss the modified ripple-carry adder and carry
look-ahead adder. Section 4 gives some extensions of quantum adders, including addition modulo 3^n,
comparison, and subtraction. Lastly in Section 5, we introduce the supermetaplectic basis and develop
techniques for the construction of new gates.
2 Preliminaries and Notations
We denote the standard computational basis in a qutrit by {|0〉, |1〉, |2〉}. The terminology “qutrit”
and “ternary” are sometimes used interchangeably. We call a quantum gate reversible or a classical
permutation gate if it acts as some permutation of the standard basis elements. Unless otherwise
noted, the arithmetic, e.g., addition, multiplication, etc., within a ket is assumed to be taken modulo
3. Also by default circuits are read from left to right, while compositions of gates when written as
expressions follow the rule of matrix multiplications, i.e., they are read from right to left. Throughout
the paper, the following ternary quantum gates are frequently used:
1. $X = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$, namely, X|i〉 = |i + 1〉.
2. $S_{0,1} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, namely, S_{0,1} swaps |0〉 with |1〉 and fixes |2〉. Similarly, one can define
S_{0,2}, S_{1,2}. This notation is also generalized to multi-qutrit gates. For instance, S_{00,22} is a 2-qutrit
gate, which swaps |00〉 with |22〉, and fixes all other basis elements.
3. Given an n-qutrit gate U, there are two versions of “controlled-U”. The first version is called
“soft-controlled-U,” denoted by ∧(U), and is defined as the (n + 1)-qutrit gate |i, j_1, · · · , j_n〉 ↦
(I ⊗ U^i)|i, j_1, · · · , j_n〉, where the first qutrit is called the control qutrit. The second version is
the “hard-controlled-U,” denoted by C_c(U), where c ∈ {0, 1, 2}. The gate C_c(U) is also an
(n + 1)-qutrit gate. However, in contrast to the soft-controlled-U, it maps |i, j_1, · · · , j_n〉 to
(I ⊗ U^{δ_{i,c}})|i, j_1, · · · , j_n〉. It is straightforward to see that the gates C_c(U) for different values of c are
equivalent to each other up to some 1-qutrit reversible gates. Thus we also use C(U) to denote a general C_c(U).
Moreover, the equality
∧(U) = C_1(U)(C_2(U))^2 holds.
4. The following is a list of some important controlled gates:
• SUM = ∧(X) : |i, j〉 ↦ |i, i + j〉,
• C(X) = C_c(X) : |i, j〉 ↦ |i, j + δ_{i,c}〉,
• Horner = ∧(∧(X)) : |i, j, k〉 ↦ |i, j, ij + k〉,
• C(SUM) = C_c(SUM) : |i, j, k〉 ↦ |i, j, jδ_{i,c} + k〉.
The Horner gate is a qutrit generalization of the qubit Toffoli gate. See also [15] for additional
background on the Horner gate.
5. SWAP : |i, j〉 ↦ |j, i〉.
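The gate definitions above can be checked on classical basis states. The following is a minimal sketch, assuming each reversible gate is modelled as a plain Python function on trits; the names `soft_controlled`, `hard_controlled`, `power`, `horner` and `check_identities` are illustrative and not taken from the paper.

```python
from itertools import product

TRITS = (0, 1, 2)

def X(j):
    """Increment gate on one trit: X|j> = |j + 1 mod 3>."""
    return (j + 1) % 3

def power(gate, k):
    """Return the k-fold composition of a single-trit classical gate."""
    def applied(j):
        for _ in range(k):
            j = gate(j)
        return j
    return applied

def soft_controlled(gate):
    """Soft control: |i, j> -> |i, gate^i(j)>."""
    return lambda i, j: (i, power(gate, i)(j))

def hard_controlled(gate, c):
    """Hard control C_c: apply gate to the target only when the control equals c."""
    return lambda i, j: (i, gate(j) if i == c else j)

SUM = soft_controlled(X)  # |i, j> -> |i, i + j>

def horner(i, j, k):
    """Horner = soft-controlled SUM: |i, j, k> -> |i, j, ij + k>."""
    for _ in range(i):
        j, k = SUM(j, k)
    return i, j, k

def check_identities():
    """Verify the stated actions and the identity ∧(X) = C_1(X)(C_2(X))^2."""
    c1, c2 = hard_controlled(X, 1), hard_controlled(X, 2)
    for i, j in product(TRITS, repeat=2):
        assert SUM(i, j) == (i, (i + j) % 3)
        # Right-to-left composition: C_2 twice, then C_1.
        assert c1(*c2(*c2(i, j))) == SUM(i, j)
    for i, j, k in product(TRITS, repeat=3):
        assert horner(i, j, k) == (i, j, (i * j + k) % 3)
    return True
```

Calling `check_identities()` enumerates all basis states, which is feasible here because each gate acts on at most three trits.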
For graphical representations of the gates defined above, see Figure 1.
The qutrit Clifford group C [16] is generated by SUM, X, H, and Q, where H and Q are defined as
follows:
\[
H = \frac{1}{\sqrt{3}}
\begin{pmatrix} 1 & 1 & 1 \\ 1 & \zeta_3 & \zeta_3^2 \\ 1 & \zeta_3^2 & \zeta_3 \end{pmatrix},
\qquad
Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \zeta_3 \end{pmatrix},
\]
where we use the notation ζ_n = e^{2πi/n} for n ≥ 1.
[Fig. 1. Graphical representations of some ternary gates: ∧(U), C_c(U), X, SUM, C(X), C(SUM).]
It can be shown that, along with the SUM, all the reversible 1-qutrit gates and SWAP are also
contained in C. Moreover, SUM and all the 1-qutrit reversible gates generate the subgroup of all
reversible gates in C. Some other Clifford gates are Z and ∧(Z), where Z = diag(1, ζ_3, ζ_3^2), and ∧(Z) = (I ⊗ H)SUM(I ⊗ H^{−1}) : |i, j〉 ↦ ζ_3^{ij} |i, j〉. However, C(X), Horner, C(SUM) and S_{00,22} are
non-Clifford gates.
Consider two pairs of standard basis vectors |j_1〉, |k_1〉 and |j_2〉, |k_2〉. It is useful to note
that the two-way classical reflection S_{|j_1〉,|k_1〉}, which swaps |j_1〉 with |k_1〉 and fixes everything else, can be
reduced to the corresponding reflection S_{|j_2〉,|k_2〉} by applications of O(n) SUM and SWAP gates (which
are Clifford gates: see [9], Lemma 16). In particular, the two-way swap S_{00,22} is Clifford-equivalent
to any other two-qutrit two-way swap.
We think of Clifford gates as being cheap in the quantum sense. The general rationale for this assumption
is that such gates can be simulated classically. (Additional motivation comes from topological
computing: in the context of non-abelian anyons such as the SU(2)_4 anyon system [8], Clifford gates can
be obtained by anyon braiding alone.) Thus we define the complexity (resp. depth) of a circuit as the
number (resp. depth) of non-Clifford gates.
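The following two identities will be used, where ω(n) is the number of 1’s in the binary expansion
of n, ⌊x⌋ denotes the largest integer less than or equal to x, and log denotes the base-2 logarithm:
\[
\sum_{i=1}^{\infty} \left\lfloor \frac{n}{2^i} \right\rfloor = n - \omega(n), \tag{2}
\]
\[
\sum_{i=1}^{\lfloor \log n \rfloor + 1} \left\lfloor \frac{n}{2^i} - \frac{1}{2} \right\rfloor = n - \lfloor \log n \rfloor - 1. \tag{3}
\]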
See also [17] for similar identities.
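Both identities are easy to confirm numerically for small n. The sketch below is one way to do so, assuming base-2 logarithms and using integer arithmetic throughout; the function names are illustrative only.

```python
def omega(n):
    """Number of 1's in the binary expansion of n."""
    return bin(n).count("1")

def floor_log2(n):
    """floor(log2(n)) for a positive integer n."""
    return n.bit_length() - 1

def lhs_identity_2(n):
    """Sum of floor(n / 2^i) over i >= 1 (terms vanish once 2^i > n)."""
    return sum(n >> i for i in range(1, n.bit_length() + 1))

def lhs_identity_3(n):
    """Sum of floor(n / 2^i - 1/2) for i = 1 .. floor(log n) + 1.

    floor(n / 2^i - 1/2) is rewritten as (2n - 2^i) // 2^(i+1)
    so that no floating-point rounding is involved.
    """
    return sum((2 * n - (1 << i)) // (1 << (i + 1))
               for i in range(1, floor_log2(n) + 2))

def identities_hold(limit):
    """Check identities (2) and (3) for every n in 1 .. limit."""
    return all(
        lhs_identity_2(n) == n - omega(n)
        and lhs_identity_3(n) == n - floor_log2(n) - 1
        for n in range(1, limit + 1)
    )
```

For instance, with n = 10 the left-hand side of (2) is 5 + 2 + 1 = 8 = 10 − ω(10), and the left-hand side of (3) is 4 + 2 + 0 + 0 = 6 = 10 − 3 − 1.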
3 Quantum Ternary Adders
Given two n-trit numbers a = a_{n−1} · · · a_1 a_0, b = b_{n−1} · · · b_1 b_0, an adder computes their sum s =
s_n s_{n−1} · · · s_0 = a + b. The elementary method of adding two n-trit numbers is illustrated in Figure 2.
Let c_0 = 0 be the initial carry trit and for 1 ≤ i ≤ n, let c_i be the carry trit arising from a_{i−1}, b_{i−1}, c_{i−1},
namely, c_i = 0 if a_{i−1} + b_{i−1} + c_{i−1} ≤ 2 and c_i = 1 otherwise. For 0 ≤ i ≤ n − 1, s_i = a_i + b_i + c_i mod 3
and s_n = c_n.
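The elementary method just described can be written out directly. This is a minimal classical sketch, assuming numbers are stored as little-endian lists of trits (index i holds the trit of weight 3^i); the helper names are illustrative.

```python
def to_trits(value, n):
    """Little-endian list of the n low trits of a non-negative integer."""
    return [(value // 3**i) % 3 for i in range(n)]

def from_trits(trits):
    """Integer represented by a little-endian list of trits."""
    return sum(t * 3**i for i, t in enumerate(trits))

def schoolbook_add(a, b):
    """Add two n-trit numbers given as little-endian trit lists.

    Returns (s, carries), where s has n + 1 trits (s_n = c_n) and
    carries holds c_0, ..., c_n with c_0 = 0.
    """
    n = len(a)
    carries = [0]
    s = []
    for i in range(n):
        total = a[i] + b[i] + carries[i]
        s.append(total % 3)
        carries.append(0 if total <= 2 else 1)
    s.append(carries[n])
    return s, carries
```

As a usage example, `schoolbook_add(to_trits(5, 3), to_trits(7, 3))` yields a sum list whose `from_trits` value is 12.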
In Section 3.1 and Section 3.2, we present two methods to implement a reversible ternary quantum
adder: a ripple-carry adder and a carry look-ahead adder. The two adders are generalized from their
binary counterparts [17, 18], but the generalizations are somewhat nontrivial, as seen later. On the one
hand, the modified ripple-carry adder uses only 1 ancilla for the whole process and has a circuit depth
in O(n). On the other hand, the carry look-ahead adder requires O(n) ancillas with the advantage of
having a circuit depth in O(log n). We will also compare the two adders to other ternary adders known
in the literature and show that our adders are more efficient both space-wise and depth-wise.
        a_{n−1}  · · ·  a_1  a_0
        b_{n−1}  · · ·  b_1  b_0
c_n   c_{n−1}  · · ·  c_1  c_0 = 0
s_n   s_{n−1}  · · ·  s_1  s_0
Fig. 2. Addition of two n-trit numbers
To implement the adders, we utilize C(X), C(SUM), C(S_{0,1}) and S_{00,22} as the basic building units.
As shown in Section 5.1, C(SUM), C(S_{0,1}) and S_{00,22} can all be constructed exactly from C(X) and
Clifford operations. Therefore, the adder circuits can be designed from Clifford operations and
C(X) alone. The reason that we still treat C(SUM), C(S_{0,1}) and S_{00,22} as basic units is that it might
be more efficient to synthesize them directly in some basis rather than breaking them up into C(X) gates.
An example is the metaplectic basis [9], where S_{00,22} has an efficient approximation by a metaplectic
circuit.
3.1 Modified Ripple-Carry Adder
The binary quantum ripple-carry adder was considered in [19], where O(n) ancillas are required to
add two n-qubit numbers. In [17], the method was improved so that only 1 ancilla is necessary. Here
we give a ternary generalization of the improved ripple-carry adder.
Note that in contrast to the binary case, the ternary carry is more complicated: if the inputs to a
binary full adder are denoted by a, b, c ∈ F_2, then the outgoing carry is given by c_out = ab + ac + bc,
where all operations are computed modulo 2. In the case of a ternary full adder with inputs a, b, c ∈ F_3,
the outgoing carry is given by c_out = 2(1 + a + b + c)(ab + ac + bc) + abc, where all operations are
computed modulo 3. Though directly implementing this polynomial using the presented universal
gates is possible, it leads to a relatively large number of elementary gates. A simple observation
allows us to reduce this cost significantly: it turns out that c_out does not have to be implemented for all
27 input triples but only for 18 of them. Indeed, it can be shown inductively that, provided there
is no initial incoming carry, every carry trit c_i of a ternary adder can only be 0 or 1, and can
never be 2. This is also indicated in Figure 3, where the crossed-out case can never
occur in an actual addition: the case c_{i+1} = 2 is possible only if c_i = 2, which inductively we assume
cannot happen. With this restriction, c_{i+1} becomes a balanced function, i.e., there is the same number
of inputs corresponding to each outcome c_{i+1}.
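The carry polynomial and the balancedness claim can both be confirmed by enumeration. The following sketch does so classically; `carry_polynomial`, `true_carry` and `carry_histogram` are illustrative names introduced here.

```python
from itertools import product

def carry_polynomial(a, b, c):
    """Ternary carry as the polynomial 2(1+a+b+c)(ab+ac+bc) + abc over F_3."""
    return (2 * (1 + a + b + c) * (a * b + a * c + b * c) + a * b * c) % 3

def true_carry(a, b, c):
    """Carry trit of the integer sum a + b + c of three trits."""
    return (a + b + c) // 3

def polynomial_matches():
    """Compare the polynomial with the true carry on all 27 triples."""
    return all(carry_polynomial(*t) == true_carry(*t)
               for t in product(range(3), repeat=3))

def carry_histogram():
    """Count outgoing carries over the 18 triples with incoming carry in {0, 1}."""
    counts = {0: 0, 1: 0, 2: 0}
    for a, b in product(range(3), repeat=2):
        for c in (0, 1):
            counts[true_carry(a, b, c)] += 1
    return counts
```

On the 18 admissible inputs the histogram is split evenly between the outcomes 0 and 1, and the outcome 2 never appears, which is the balancedness property used in the construction.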
We sketch the idea of constructing the circuit to compute c_{i+1} from a_i, b_i and c_i based on this
observation. As illustrated in Figure 3, c_{i+1} equals c_i for all but six inputs: the last three inputs in
the column c_{i+1} = 0 and the last three in the column c_{i+1} = 1. For each of these six inputs, c_{i+1}
equals 1 − c_i. If the gate S_{00,22} is applied to the qutrits a_i, b_i, then the six inputs are turned into six new
triples. See Figure 4 for the transition. Moreover, the new six triples are exactly equal to the set
{(a, b, c) ∈ {0, 1, 2}^3 : a + b = c, c ≠ 2}. In light of these observations, a reversible circuit, called Carry,
is constructed, which takes c_i, a_i, b_i as input, and outputs c_{i+1} in the last qutrit. See Figure 5, where f
and g are some functions of a_i, b_i, c_i. The exact shape of f and g is not important since they will be
reversed at the appropriate step of the adder.
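The observation about the six exceptional inputs can be turned into a small classical model. The sketch below first checks that S_{00,22} maps the six inputs onto the stated set, and then builds one possible reversible map with the required carry output. The function `carry_sketch` is an assumption made for illustration: it uses one S_{00,22}, two SUM-type additions and one hard-controlled S_{0,1}, but it does not claim to reproduce the exact gate placement or the SWAPs of Figure 5.

```python
from itertools import product

S01 = {0: 1, 1: 0, 2: 2}  # single-trit swap of 0 and 1

def s00_22(a, b):
    """Two-trit gate swapping (0, 0) with (2, 2) and fixing everything else."""
    if (a, b) == (0, 0):
        return (2, 2)
    if (a, b) == (2, 2):
        return (0, 0)
    return (a, b)

def flipped_inputs():
    """Triples (a, b, c), c in {0, 1}, whose outgoing carry equals 1 - c."""
    return {(a, b, c)
            for a, b in product(range(3), repeat=2)
            for c in (0, 1)
            if (a + b + c) // 3 == 1 - c}

def target_set():
    """The set {(a, b, c) : a + b = c (mod 3), c != 2}."""
    return {(a, b, c)
            for a, b, c in product(range(3), repeat=3)
            if (a + b) % 3 == c and c != 2}

def carry_sketch(c, a, b):
    """Hypothetical reversible map |c, a, b> -> |f, g, c_next>."""
    a, b = s00_22(a, b)      # move the six exceptional inputs
    g = (a + b - c) % 3      # two SUM-type gates: g = 0 marks a flip
    if g == 0:
        c = S01[c]           # hard-controlled S_{0,1} with control value 0
    return a, g, c
```

Because every step is itself a permutation of basis states, the composite map is reversible, so the by-products f and g can be uncomputed later by running the steps backwards.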
As illustrated in Figure 5, the circuit Carry is ancilla free, in contrast to the carry circuit considered
in [13] where 1 ancilla is required for each round of carry. See Figure 6 for the comparison. The circuit
utilizes one S_{00,22}, one C(S_{0,1}), two SUM, and two SWAP gates. The SUM and SWAP are both
Clifford gates, so only 2 non-Clifford gates are needed. The depth of Carry in terms of non-Clifford
gates is also 2. Moreover, unlike the binary ripple-carry circuit MAJ [17] where the two qubits other
than c_{i+1} end up with a_i + b_i, c_i + b_i, in our circuit the two qutrits other than c_{i+1} have the final values
f(a_i, b_i, c_i) and g(a_i, b_i, c_i). This is the reason we call our carry circuit modified. However, as will be
seen below, the modified carry circuit works in the same way as the regular one.
      | c_{i+1} = 0       | c_{i+1} = 1       | c_{i+1} = 2 (crossed out)
a_i   | 0 0 0 1 1 2 0 0 1 | 0 1 1 2 2 2 1 2 2 | 0 0 0 1 1 1 2 2 2
b_i   | 0 1 2 0 1 0 0 1 0 | 2 1 2 0 1 2 2 1 2 | 0 1 2 0 1 2 0 1 2
c_i   | 0 0 0 0 0 0 1 1 1 | 1 1 1 1 1 1 0 0 0 | 2 2 2 2 2 2 2 2 2
Fig. 3. Ternary carry table
      Before S_{00,22}              After S_{00,22}
      | c_{i+1} = 0 | c_{i+1} = 1      | c_{i+1} = 0 | c_{i+1} = 1
a_i   | 0 0 1       | 1 2 2            | 2 0 1       | 1 2 0
b_i   | 0 1 0       | 2 1 2      ⟹     | 2 1 0       | 2 1 0
c_i   | 1 1 1       | 0 0 0            | 1 1 1       | 0 0 0
Fig. 4. Transition of inputs due to S_{00,22}
Let C : |c_i, a_i, b_i〉 ↦ |f(a_i, b_i, c_i), g(a_i, b_i, c_i), c_{i+1}〉 be the Carry gate represented by the circuit in
Figure 5. Similar to the adder circuit in [17], the modified ripple-carry adder circuit is designed in
Figure 7, which, as an illustration, shows the addition of two 3-qutrit numbers.
In Figure 7, the qutrit c_0, initialized with 0, is the only ancilla required. The qutrit on the bottom
holds the overflow trit, i.e., the highest trit in the sum. Therefore, to add two n-qutrit numbers, exactly
1 ancilla, n Carry gates, n inverse Carry gates and 2n SUM gates are required, and the depth of the
circuit is 4n. In contrast, the adder in [13] uses n ancillas and has the complexity in O(n).
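To illustrate why the by-products f and g never need to be known explicitly, here is a classical simulation of a compute, copy, uncompute ripple structure. It is a sketch under stated assumptions: `carry` is a hypothetical reversible map with the same interface as the Carry gate (not the exact circuit of Figure 5), the running carry is kept on a single wire, and the top carry is copied with one extra SUM-type step, so the wire layout and exact gate count of Figure 7 are not reproduced.

```python
S01 = {0: 1, 1: 0, 2: 2}  # single-trit swap of 0 and 1

def s00_22(a, b):
    """Swap (0, 0) with (2, 2); fix every other pair of trits."""
    if (a, b) == (0, 0):
        return (2, 2)
    if (a, b) == (2, 2):
        return (0, 0)
    return (a, b)

def carry(c, a, b):
    """Hypothetical Carry: (c_i, a_i, b_i) -> (f, g, c_{i+1})."""
    a, b = s00_22(a, b)
    g = (a + b - c) % 3
    if g == 0:
        c = S01[c]
    return a, g, c

def carry_inverse(f, g, c):
    """Inverse of carry: (f, g, c_{i+1}) -> (c_i, a_i, b_i)."""
    if g == 0:
        c = S01[c]
    b = (g + c - f) % 3
    a, b = s00_22(f, b)
    return c, a, b

def value(trits):
    """Integer represented by a little-endian list of trits."""
    return sum(t * 3**i for i, t in enumerate(trits))

def ripple_add(a, b):
    """Simulate the adder on little-endian trit lists a, b of length n.

    Returns (a, s, ancilla): a is restored, s has n + 1 trits, and the
    single ancilla ends in its initial value 0.
    """
    n = len(a)
    a, b = list(a), list(b)
    c = 0                                   # the only ancilla, c_0 = 0
    for i in range(n):                      # n Carry gates
        a[i], b[i], c = carry(c, a[i], b[i])
    overflow = c                            # copy c_n onto a fresh wire
    for i in reversed(range(n)):            # n inverse Carry gates
        c, a[i], b[i] = carry_inverse(a[i], b[i], c)
        b[i] = (b[i] + a[i] + c) % 3        # two SUM gates write s_i
    return a, b + [overflow], c
```

The uncompute pass restores c_i, a_i and b_i position by position, and only then overwrites b_i with s_i, so the correctness of the sum does not depend on what f and g actually are.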
3.2 Carry Look-ahead Adder
In the ripple-carry adder, the carry ci+1 is computed only after the value of ci has been obtained, and
thus the overall depth of the circuit is in O(n). One protocol to reduce the depth is the carry look-ahead
adder studied in [18] for the binary addition. Here we generalize it to give a ternary carry look-ahead
adder, which computes all the carry trits in depth O(log n) by introducing extra O(n) ancillas.
The main idea is that there are relations between c_i and c_{i+1}, and more generally between c_i and c_j
for i ≠ j. For instance, if a_i + b_i = 2, then c_{i+1} = c_i. If a_i + b_i = 1, then c_{i+1} = 0 regardless of the value
of c_i. See Figure 8 for a summary of the relation between c_{i+1} and c_i. Note that c_0 = 0, thus when
i = 0, the column c_{i+1} = c_i in Figure 8 becomes c_1 = c_0 = 0. Motivated by these relations, we define,
for 0 ≤ i < j ≤ n, the carry status indicator C[i, j]:
\[
C[i, j] =
\begin{cases}
0 & \text{if } c_j = 0 \text{ regardless of } c_i, \\
1 & \text{if } c_j = 1 \text{ regardless of } c_i, \\
2 & \text{if } c_j = c_i.
\end{cases}
\]
[Fig. 5. The circuit Carry: inputs c_i, b_i, a_i; outputs c_{i+1}, g(a_i, b_i, c_i), f(a_i, b_i, c_i).]
[Fig. 6. (Left) ripple carry in the present paper, mapping c_i, b_i, a_i to c_{i+1}, g(a_i, b_i, c_i), f(a_i, b_i, c_i); (Right) ripple carry studied in [13], which takes an additional ancilla initialized to 0 and outputs c_{i+1}, c_i, b_i, a_i.]
[Fig. 7. Circuit for ripple-carry adder, shown for n = 3: inputs 0, b_2, a_2, b_1, a_1, b_0, a_0, c_0; outputs s_3, s_2, a_2, s_1, a_1, s_0, a_0, c_0.]
Since we already know c_0 = 0, the case c_j = c_0 is then the same as the first case c_j = 0. Thus we
can treat these two cases as one, and design C[0, j] so that it will never take the value 2, namely, we
will have C[0, j] = c_j.
Explicitly, for 0 < i < n, the circuit AdjC shown in Figure 9 computes C[i, i + 1] from a_i and b_i. It
requires 1 non-Clifford gate S_{00,22}, and no ancilla. However, to compute C[0, 1], we need to make use
of 1 ancilla and 2 non-Clifford gates, S_{00,22} and C(X). See Figure 10 for the circuit, which we call AdjC0.
Having computed the carry status indicators for any two adjacent indices, we furthermore compute
C[i, j] for arbitrary i ≠ j. For 0 ≤ i < k < j ≤ n, C[i, j] can be obtained from C[i, k] and C[k, j] by the
merging formula in Figure 11.
Note that when i = 0, the row corresponding to C[0, k] = 2 in Figure 11 will never be used. Also
when C[0, k] takes values in {0, 1}, so will C[0, j]. A circuit, M, realizing the merging formula is
illustrated in Figure 12, where M takes C[i, k], C[k, j], and an ancilla initialized to 0 as inputs, and
outputs C[i, j] to the ancilla. The circuit requires 1 non-Clifford gate C(SUM).
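The carry status indicator and the merging formula have a compact classical description. The sketch below is illustrative only: `adjacent_status` encodes the relation between c_{i+1} and c_i for i > 0, `merge` encodes the merging formula, and `resolve` is a helper introduced here to apply a status to a known incoming carry.

```python
from itertools import product

def adjacent_status(a_i, b_i):
    """C[i, i+1] for i > 0: 0 or 1 if the carry is forced, 2 if c_{i+1} = c_i."""
    total = a_i + b_i
    if total >= 3:
        return 1
    if total == 2:
        return 2
    return 0

def merge(left, right):
    """Merging formula: C[i, j] from left = C[i, k] and right = C[k, j]."""
    return left if right == 2 else right

def resolve(status, c_in):
    """Outgoing carry implied by a status and an incoming carry in {0, 1}."""
    return c_in if status == 2 else status

def merge_is_consistent():
    """Check that merging statuses agrees with applying them one after another."""
    return all(
        resolve(merge(left, right), c) == resolve(right, resolve(left, c))
        for left, right in product(range(3), repeat=2)
        for c in (0, 1)
    )
```

The merge operation is associative, which is what makes it possible to combine the indicators in a tree of logarithmic depth.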
The circuits AdjC and AdjC0 both only depend on a_i and b_i, thus we can compute all the C[i, i + 1]
in one time slice. The nature of the merging formula enables us to obtain all the C[0, j] in O(log n)
time slices. We elaborate on this below.
      | c_{i+1} = 0 | c_{i+1} = 1 | c_{i+1} = c_i
a_i   | 0 0 1       | 1 2 2       | 0 1 2
b_i   | 0 1 0       | 2 1 2       | 2 1 0
Fig. 8. Relation between c_{i+1} and c_i
[Fig. 9. Circuit AdjC computing C[i, i + 1], 0 < i < n: inputs b_i, a_i; C[i, i + 1] is output on the b_i wire.]
[Fig. 10. Circuit AdjC0 computing C[0, 1]: inputs b_0, a_0 and an ancilla initialized to 0.]
For i = 0, 1, · · · , n − 1, let B_i be the working register configured to be C[i, i + 1] at the beginning,
and let Z_{i+1} be the working registers initialized to |0〉, which will end up with C[0, i + 1]. We also
need n − ω(n) − ⌊log n⌋ ancillas X_i initialized to |0〉. The circuit consists of three processes, namely,
P-process, C-process, and P^{−1}-process. Each process roughly contains ⌊log n⌋ rounds.
In P-process, we compute all the carry status indicators of the form C[2^t m, 2^t(m + 1)] and write
all the results into the ancillas, except the ones C[0, 2^k], which are written to Z_{2^k}. There are ⌊log n⌋
rounds, one for each t = 1, · · · , ⌊log n⌋. In the t-th round, which we call the
P[t]-round, the status indicators C[2^t m, 2^t(m + 1)], m = 0, · · · , ⌊n/2^t⌋ − 1, are computed. By the merging
formula, C[2^t m, 2^t(m + 1)] can be obtained from C[2^{t−1}(2m), 2^{t−1}(2m + 1)] and C[2^{t−1}(2m + 1), 2^{t−1}(2m + 2)],
both of which have been computed in the P[t − 1]-round by induction. Moreover, the circuits M
producing C[2^t m, 2^t(m + 1)] for different values of m in the P[t]-round take different carry status indicators
from the P[t − 1]-round as input. Note that the P[1]-round requires the carry status indicators C[i, i + 1] in
the registers B_i. Therefore, in the P[t]-round, all the circuits M computing C[2^t m, 2^t(m + 1)] can be
run in parallel, and their inputs only depend on the carry status indicators from the P[t − 1]-round. Thus,
the depth of the circuit in P-process is ⌊log n⌋, the number of ancillas needed is n − ω(n) − ⌊log n⌋,
and the complexity is n − ω(n).
In C-process, we compute C[0, j] into the register Z_j, j = 1, · · · , n. This is performed in ⌊log(n/3)⌋ + 1
rounds. Note that the C[0, 2^k] have already been obtained in P-process, and are located in the desired
positions. For t = ⌊log(n/3)⌋, · · · , 0, the C[t]-round consists of computing the carry status indicators
C[0, 2^t(2m + 1)], m = 1, · · · , ⌊n/2^{t+1} − 1/2⌋. Again, by the merging formula, we can get C[0, 2^t(2m + 1)]
from C[0, 2^{t+1} m] and C[2^t(2m), 2^t(2m + 1)]. By induction, C[0, 2^{t+1} m] has been obtained in earlier C-rounds
if m is not a power of 2, and in the P[t + 1 + log m]-round otherwise. Also C[2^t(2m), 2^t(2m + 1)]
has been computed in the P[t]-round. Therefore, we can run all the M circuits in the C[t]-round in
parallel. These circuits depend on the carry status indicators in the P[t]-round and C[k]-rounds,
k ≥ t + 1. If m is a power of 2, then the corresponding M circuit also depends on C[0, 2^{t+1} m] from the
P[t + 1 + log m]-round. Thus the circuit in C-process has a depth of ⌊log(n/3)⌋ + 1, and the complexity
is n − ⌊log n⌋ − 1.
⊙            | C[k, j] = 0 | C[k, j] = 1 | C[k, j] = 2
C[i, k] = 0  | 0           | 1           | 0
C[i, k] = 1  | 0           | 1           | 1
C[i, k] = 2  | 0           | 1           | 2
Fig. 11. The merging formula C[i, j] = C[i, k] ⊙ C[k, j]
[Fig. 12. Circuit M realizing the merging formula: inputs 0, C[k, j], C[i, k]; outputs C[i, j], C[k, j], C[i, k].]
C[⌊log(n/3)⌋]  · · ·  C[⌊log n⌋ − 3]       · · ·  C[0]
                      P^{−1}[⌊log n⌋ − 1]  · · ·  P^{−1}[2]  P^{−1}[1]
Fig. 13. Parallelism between C- and P^{−1}-process
In P^{−1}-process, we set the ancillas back to |0〉, thus we need to reverse all the M circuits in P-process,
except for those computing the C[0, 2^k], which are not stored in the ancillas. The P^{−1}-process
consists of ⌊log n⌋ − 1 rounds. For t = ⌊log n⌋ − 1, · · · , 1, the P^{−1}[t]-round uncomputes
C[2^t m, 2^t(m + 1)], m = 1, · · · , ⌊n/2^t⌋ − 1, by using the inverse of M. Note that in this process, the C[0, 2^k] will not
be touched. The process has a depth of ⌊log n⌋ − 1, and the complexity of the circuit is n − ω(n) − ⌊log n⌋.
We note that most parts of C-process and P^{−1}-process can actually be parallelized. The argument
is as follows. All the inputs to the C[t]-round which are not of the form C[0, 2^m] only depend on
C[k]-rounds, k ≥ t + 1, and the P[t]-round. The inputs that are of the form C[0, 2^m] were computed in
the P[m]-round, but they will not be touched in P^{−1}-process. The P^{−1}[t + 2]-round only depends on the
outputs in the P[t + 1]-round and P[t + 2]-round. Thus the C[t]-round and the P^{−1}[t + 2]-round can be
performed simultaneously. The precise parallelism between C-process and P^{−1}-process is illustrated
in Figure 13.
To summarize, the whole circuit uses n − ω(n) − ⌊log n⌋ ancillas, and has a depth of
⌊log n⌋ + ⌊log(n/3)⌋ + 2. The total complexity of the circuit is 3n − 2ω(n) − 2⌊log n⌋ − 1.
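The round structure of the P-process and C-process can be simulated classically to confirm that every C[0, j] is available when it is needed, and that the number of merges matches the stated complexities. This is a minimal sketch that tracks indicator values in a dictionary rather than in qutrit registers; it does not model the ancilla management or the P^{−1}-process, and the function names are illustrative.

```python
def adjacent_status(a_i, b_i):
    """C[i, i+1]: 1 or 0 if the carry is forced, 2 if it is propagated."""
    total = a_i + b_i
    return 1 if total >= 3 else (2 if total == 2 else 0)

def merge(left, right):
    """Merging formula: C[i, j] from C[i, k] and C[k, j]."""
    return left if right == 2 else right

def lookahead_carries(a, b):
    """Compute c_0..c_n for little-endian trit lists a, b via P- and C-rounds.

    Returns (carries, p_merges, c_merges).
    """
    n = len(a)
    log_n = n.bit_length() - 1              # floor(log2(n))
    C = {}
    for i in range(n):
        status = adjacent_status(a[i], b[i])
        # For i = 0, propagation means c_1 = c_0 = 0.
        C[(i, i + 1)] = 0 if (i == 0 and status == 2) else status
    p_merges = c_merges = 0
    for t in range(1, log_n + 1):           # P[t]-rounds
        for m in range(n >> t):
            lo, mid, hi = m << t, (2 * m + 1) << (t - 1), (m + 1) << t
            C[(lo, hi)] = merge(C[(lo, mid)], C[(mid, hi)])
            p_merges += 1
    for t in range(log_n, -1, -1):          # C[t]-rounds, largest t first
        m_max = (2 * n - (1 << (t + 1))) // (1 << (t + 2))
        for m in range(1, m_max + 1):
            lo, hi = (2 * m) << t, (2 * m + 1) << t
            C[(0, hi)] = merge(C[(0, lo)], C[(lo, hi)])
            c_merges += 1
    carries = [0] + [C[(0, j)] for j in range(1, n + 1)]
    return carries, p_merges, c_merges
```

Rounds with an empty range of m do nothing, so looping t down from ⌊log n⌋ covers exactly the rounds t = ⌊log(n/3)⌋, · · · , 0 that contain at least one merge.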
3.3 Complete Circuit for Carry Look-Ahead Adder
We give two implementations of the carry look-ahead adder, namely, the out-of-place adder and the in-place
adder. Recall that the circuits in Figures 9, 10, and 12 are denoted by AdjC, AdjC0, and M,
respectively. The complexity of both AdjC and M is 1, and the complexity of AdjC0 is 2. The depth
of these circuits is equal to their complexity.
3.3.1 Out-of-place Adder
Let A_i, B_i be the registers with initial values a_i, b_i, respectively, i = 0, · · · , n − 1. Let Z_i, i = 0, · · · , n
be the registers initialized to 0, which will hold the sum a + b at the end of the computation. We
need n − ω(n) − ⌊log n⌋ ancillas X_i to store intermediate carry status indicators. The following is a
description of the circuit of our out-of-place adder.
Out-of-place Procedure:
1. For 0 < i ≤ n − 1, run the circuit AdjC on A_i, B_i, which outputs C[i, i + 1] to B_i. Run AdjC0 on
A_0, B_0, and Z_0 with Z_0 as the ancilla, which outputs C[0, 1] to B_0. Copy C[0, 1] to Z_1 with the
SUM gate. The circuit has a depth of 2, and it consists of n − 1 AdjC, 1 AdjC0, and 1 SUM gates.
2. As discussed in Section 3.2, compute all the C[0, i] with the ancillas X_i and the circuit
M^{±1}. At the end of this process, the ancillas are returned to 0, and Z_i = C[0, i], i = 1, · · · , n.^f
This requires 3n − 2ω(n) − 2⌊log n⌋ − 1 calls to the circuit M^{±1}, and has a circuit depth of
⌊log n⌋ + ⌊log(n/3)⌋ + 2.
3. Undo all the AdjC and AdjC0 circuits. At the end of this step, we have B_i = b_i, Z_i = C[0, i] = c_i. The
circuit has a depth of 2, and it consists of n − 1 AdjC^{−1}, 1 AdjC0^{−1}, and 1 SUM^{−1}.
4. Set Z_i = Z_i ⊕ A_i ⊕ B_i, 0 ≤ i ≤ n − 1. This requires 2n SUM gates.
[Fig. 14. Circuit glyphs for AdjC0, AdjC and M. The inverse gates AdjC0^{−1}, AdjC^{−1} and M^{−1} are represented by
mirror images of these glyphs.]
In summary, the out-of-place adder uses n − ω(n) − ⌊log n⌋ ancillas, and has a circuit depth of
⌊log n⌋ + ⌊log(n/3)⌋ + 6, with a complexity of 5n − 2ω(n) − 2⌊log n⌋ + 1 (n + 1 from Step 1,
3n − 2ω(n) − 2⌊log n⌋ − 1 from Step 2, and n + 1 from Step 3).
We represent AdjC0, AdjC and M as shown in Figure 14. Their inverses are represented by the
mirror images of these glyphs. Also a black rectangle means the content will be changed after
the application of the relevant gate, while a blank rectangle means the content remains the same. As
an illustration, we give a complete out-of-place circuit for adding two 10-qutrit numbers in Figure 15,
where we use x to stand for 10, and c_{ij} is the carry status indicator C[i, j]. From Figure 15, it is clear
that the C[0]-round and P^{−1}[2]-round can be parallelized since the gates in these two rounds act on
different wires. One can also verify the cost: the number of ancillas is n − ω(n) − ⌊log n⌋ = 5, the
depth of the circuit is ⌊log n⌋ + ⌊log(n/3)⌋ + 6 = 10, and the complexity is
5n − 2ω(n) − 2⌊log n⌋ + 1 = 11 + 19 + 11 = 41, counting Steps 1, 2 and 3, respectively.
f Z_1 = C[0, 1] was obtained in the previous step.
[Fig. 15. Out-of-place carry look-ahead adder for two 10-qutrit numbers. The circuit proceeds through Steps 1 to 4; Step 2 consists of the rounds P[1], P[2], P[3], C[1], C[0], P^{−1}[2], P^{−1}[1]. Inputs are a_i, b_i and ancillas initialized to 0; outputs are a_i, b_i and the sum trits s_0, · · · , s_9, s_x.]
3.3.2 In-place Adder
The idea of the in-place adder is also generalized from that in [18]. Let $\bar{2}$ be the n-trit number with all
trits equal to 2, namely $\bar{2}$ = 3^n − 1. When no confusion arises, we make no distinction between a number and
its trit representation. For two n-trit numbers a, b, denote by a ⊕ b the number obtained by trit-wise
summation modulo 3, and denote by a′ the number obtained by replacing every trit a_i by 2 − a_i. Thus,
the following equations hold:
a ⊕ a′ = $\bar{2}$ and a + a′ = 3^n − 1.
Let c = c_0 · · · c_{n−1} be the sequence of the n low carry trits for a and b, and let s be the n low trits
of a + b. Then we have
s = a + b (mod 3^n) and s = a ⊕ b ⊕ c.
Also note that s′ + a = 3^n − 1 − s + a = 3^n − 1 − b = b′ (mod 3^n).
Let d = d_0 · · · d_{n−1} be the n low carry trits resulting from adding s′ and a. Then s′ ⊕ a ⊕ d = b′,
and thus we have
\[
\begin{aligned}
\bar{2} \oplus a \oplus b \oplus d &= s \oplus s' \oplus a \oplus b \oplus d \\
&= s \oplus b' \oplus b \\
&= \bar{2} \oplus a \oplus b \oplus c.
\end{aligned}
\]
Therefore, c = d, i.e., the n low carry trits for a, b are the same as those for s′, a. We will use this
property to implement the in-place adder.
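The property c = d can be checked exhaustively for small n. The sketch below is an illustration under two assumptions: numbers are handled as ordinary integers rather than quantum registers, and the lowest carry is $c_0 = 0$ (no carry-in). The helper names are hypothetical.

```python
def carries(x: int, y: int, n: int) -> list:
    """Return [c_0, ..., c_n] for the ternary addition x + y, with c_0 = 0."""
    c = [0]
    for _ in range(n):
        total = x % 3 + y % 3 + c[-1]
        c.append(total // 3)  # the carry of two trits plus a carry is 0 or 1
        x //= 3
        y //= 3
    return c


def complement(x: int, n: int) -> int:
    """Trit-wise complement x' = 3^n - 1 - x of an n-trit number."""
    return 3**n - 1 - x


def low_carries_agree(n: int) -> bool:
    """Check that the n low carries of a + b equal those of s' + a."""
    for a in range(3**n):
        for b in range(3**n):
            s = (a + b) % 3**n
            if carries(a, b, n)[:n] != carries(complement(s, n), a, n)[:n]:
                return False
    return True


print(low_carries_agree(3))
```

Only the n low carries $c_0, \cdots, c_{n-1}$ are compared, since the argument above makes no claim about the high carry.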
For $0 \le i \le n - 1$, let $A_i$, $B_i$ be the working registers initialized with $a_i$, $b_i$, respectively. We will
need $2n - \omega(n) - \lfloor \log n \rfloor$ ancillas, n of which are denoted by $Z_0, Z_1, \cdots, Z_{n-1}$ and the rest by $X_i$. Let
$Z_n$ be the working register which will store the high trit of a + b. All ancillas start in the state 0.
In-place Procedure:
1. As described in Out-of-place Procedure Steps 1 through 3, compute all the carry trits $C[0, j]$ into
$Z_j$, $j = 0, \cdots, n$. The ancillas $X_i$ and working registers $A_i$, $B_i$ are all returned to their initial
configuration at the end of the process. This has a circuit depth of $\lfloor \log n \rfloor + \lfloor \log \frac{n}{3} \rfloor + 6$, with
a complexity of $5n - 2\omega(n) - 2\lfloor \log n \rfloor + 1$.
2. For $0 \le i \le n - 1$, let $B_i = B_i \oplus A_i \oplus Z_i$, namely, the registers $B_i$ will store the n low trits of the
sum a + b. This can be done by 2n SUM gates.
3. Now we want to erase the n carry trits $C[0, i] = c_i$, $i = 0, \cdots, n - 1$. For $0 \le i \le n - 2$, let
$B_i = 2 - B_i$. This can be achieved by $n - 1$ $S_{0,2}$ gates.
4. Apply the inverse of the Out-of-place Procedure Steps 1 through 3 on the registers $A_i$, $B_i$ for
$0 \le i \le n - 2$ to erase the carry trits $c_j$ stored in $Z_j$, $j = 0, \cdots, n - 1$.
5. For $0 \le i \le n - 2$, let $B_i = 2 - B_i$. Again this can be done by $n - 1$ $S_{0,2}$ gates.
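The five steps of the In-place Procedure can be traced on classical basis states. The sketch below is a simplified model, not the circuit itself: the carry look-ahead subcircuit is replaced by a direct carry computation (a hypothetical helper `carry_trits`), registers are Python lists of trits, and "undoing" a carry computation is modeled as subtracting the recomputed carries modulo 3.

```python
def to_trits(x: int, n: int) -> list:
    """Little-endian list of the n low trits of x."""
    return [(x // 3**i) % 3 for i in range(n)]


def from_trits(trits) -> int:
    return sum(t * 3**i for i, t in enumerate(trits))


def carry_trits(a, b) -> list:
    """Carries [c_0, ..., c_len] of the trit lists a and b, with c_0 = 0."""
    c = [0]
    for ai, bi in zip(a, b):
        c.append((ai + bi + c[-1]) // 3)
    return c


def in_place_add(a_val: int, b_val: int, n: int):
    A, B = to_trits(a_val, n), to_trits(b_val, n)
    Z = [0] * (n + 1)
    # Step 1: write the carries C[0, j] into Z_j for j = 0..n.
    for j, c in enumerate(carry_trits(A, B)):
        Z[j] = (Z[j] + c) % 3
    # Step 2: B_i <- B_i + A_i + Z_i (mod 3), giving the n low trits of the sum.
    for i in range(n):
        B[i] = (B[i] + A[i] + Z[i]) % 3
    # Step 3: complement the n - 1 low trits of B.
    for i in range(n - 1):
        B[i] = 2 - B[i]
    # Step 4: uncompute c_0..c_{n-1} from the n - 1 low trits of A and B.
    for j, c in enumerate(carry_trits(A[: n - 1], B[: n - 1])):
        Z[j] = (Z[j] - c) % 3
    # Step 5: restore the n - 1 low trits of B.
    for i in range(n - 1):
        B[i] = 2 - B[i]
    return A, B, Z


A, B, Z = in_place_add(17, 25, 3)
print(from_trits(B) + 3**3 * Z[3], Z[:3])
```

In this model the register B ends with the low trits of a + b, $Z_n$ holds the high trit, and $Z_0, \cdots, Z_{n-1}$ return to 0, which is the behavior the procedure is designed to achieve.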
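Tracing the cost of the circuit above, we see that the in-place adder has a depth of
$\lfloor \log n \rfloor + \lfloor \log \frac{n}{3} \rfloor + \lfloor \log (n - 1) \rfloor + \lfloor \log \frac{n-1}{3} \rfloor + 12$,
and its complexity is $10n - 2\omega(n) - 2\lfloor \log n \rfloor - 2\omega(n - 1) - 2\lfloor \log (n - 1) \rfloor - 3$.
Moreover, the number of ancillas required is $2n - \omega(n) - \lfloor \log n \rfloor$.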
Figure 16 gives a complete circuit of the in-place adder for n = 10. See Figure 14 and the last
paragraph in Section 3.3.1 for explanations of the notation used in the circuit.
4 Extensions
In this section, we give various extensions based on the modified ripple-carry adder and the carry
look-ahead adder, including addition modulo $3^n$, subtraction, and comparison.
4.1 Addition Mod $3^n$
To add two n-qutrit numbers modulo $3^n$, we simply do not compute the high carry trit $c_n$.
In the ripple-carry adder (see Figure 7), it suffices to remove the circuit C, SUM, C−1 in the middle,
and the last qutrit on the bottom. Thus in total we need 1 ancilla, 2(n−1) Carry gates, and 2n−1 SUM
gates, and the depth of the circuit is 4(n − 1).
[Fig. 16. In-place carry look-ahead adder for n = 10. The stages are labeled Step 1 through Step 5; Steps 3 and 5 consist of $S_{0,2}$ gates, and the block between them is marked †.]
In the out-of-place carry look-ahead adder, we run the circuit as described in the Out-of-place Procedure
in Section 3.3.1. However, in the first three steps of the procedure, we restrict the inputs to the
$n - 1$ low trits of a and b, namely, $a_0, \cdots, a_{n-2}, b_0, \cdots, b_{n-2}$, since there is no need to compute $c_n$. Of
course, in the last step we still need to compute the modulo summation $a_i \oplus b_i \oplus c_i$ for all $0 \le i \le n - 1$.
Thus the out-of-place modulo adder uses $n - 1 - \omega(n - 1) - \lfloor \log (n - 1) \rfloor$ ancillas, and has a circuit
depth of $\lfloor \log (n - 1) \rfloor + \lfloor \log \frac{n-1}{3} \rfloor + 6$, with complexity $5(n - 1) - 2\omega(n - 1) - 2\lfloor \log (n - 1) \rfloor + 1$.
Similarly, for the in-place carry look-ahead modulo $3^n$ adder, we run exactly the same circuit as
the In-place Procedure in Section 3.3.2, except in Step 1 where we again restrict the inputs only to the
$n - 1$ low trits of a and b. It is straightforward to total the cost of the circuit. It has a depth of
$2(\lfloor \log (n - 1) \rfloor + \lfloor \log \frac{n-1}{3} \rfloor + 6)$, with a complexity of $2(5(n - 1) - 2\omega(n - 1) - 2\lfloor \log (n - 1) \rfloor + 1)$. The number of
ancillas required is $2(n - 1) - \omega(n - 1) - \lfloor \log (n - 1) \rfloor$.
4.2 Subtraction
To compute a − b for two n-trit numbers a, b, first convert a to $a'$, then compute $a' + b$, and finally
convert $a' + b$ to $(a' + b)'$. Note that $a'$ is the n-trit number obtained by replacing each $a_i$ by $2 - a_i$,
namely, $a' = 3^n - 1 - a$. Thus we have
$(a' + b)' = (3^n - 1 - a + b)' = 3^n - 1 - (3^n - 1 - a + b) = a - b$.
Changing a to $a'$ costs n Clifford gates $S_{0,2}$. Therefore, the circuit for subtraction has the same
depth and complexity as the regular adder.
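The complement identity behind subtraction can be checked on integers. This is a sketch under one stated assumption: the sum $a' + b$ is reduced modulo $3^n$ (only its n low trits are kept) before the second complement, so that the result is a − b modulo $3^n$ also when a < b. The function names are illustrative.

```python
def complement(x: int, n: int) -> int:
    """Trit-wise complement x' = 3^n - 1 - x of an n-trit number."""
    return 3**n - 1 - x


def subtract_via_addition(a: int, b: int, n: int) -> int:
    """Compute a - b (mod 3^n) as ((a' + b) mod 3^n)'."""
    total = (complement(a, n) + b) % 3**n  # keep only the n low trits
    return complement(total, n)


print(subtract_via_addition(20, 7, 3), subtract_via_addition(7, 20, 3))
```

The first call returns the ordinary difference, the second its residue modulo 27.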
4.3 Comparison
Given the circuit for subtraction, it is straightforward to compare two numbers a and b. In fact, there
is a circuit for the comparison of a, b with smaller complexity than that of subtraction, since we only
need to know the high trit of a − b. Let $a' = 3^n - 1 - a$; then $a - b \ge 0$ if and only if the high trit of
$a' + b$ is 0.
In the ripple-carry adder, we convert a to $a'$ and use the Carry gate C to compute all the carry trits
$c_1, \cdots, c_n$ for $a' + b$. After copying $c_n$ to the register storing the result of the comparison, we undo all
the C's and convert $a'$ back to a. The circuit thus requires 1 ancilla, 2n Carry gates C, 1 SUM gate, 2n
$S_{0,2}$ gates, and has a depth of 4n.
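The comparison criterion can be illustrated with integer arithmetic. In this sketch, which is an illustration rather than the reversible circuit, the "high trit" of $a' + b$ is taken to be the carry out of position n − 1, that is, the integer quotient by $3^n$; the function names are hypothetical.

```python
def high_trit_of_complement_sum(a: int, b: int, n: int) -> int:
    """High trit c_n of a' + b, where a' = 3^n - 1 - a."""
    return ((3**n - 1 - a) + b) // 3**n


def is_at_least(a: int, b: int, n: int) -> bool:
    """Decide a >= b for two n-trit numbers from the high trit of a' + b."""
    return high_trit_of_complement_sum(a, b, n) == 0


print(is_at_least(5, 5, 2), is_at_least(4, 5, 2))
```

Since $a' + b \le 2 \cdot 3^n - 2$, the high trit is always 0 or 1, so a single trit carries the result.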
In the carry look-ahead adder, again we first convert a to $a'$. To compute $a' + b$, the circuit
sequentially generates all the carry status indicators $C[i, j]$. However, since we only care about the
high trit $c_n = C[0, n]$, we can design a more efficient circuit to implement the comparison.
Recall from Section 3.2 that in the P process we have obtained all the carry status indicators of the
form $C[2^t m, 2^t(m + 1)]$, and in particular, any $C[0, 2^k]$ is of this form. Therefore, if $n = 2^k$ for some
k, then $c_n$ is obtained at the end of the P process. At this point, there is no need to go through the C
process. Instead, we copy $c_n$ into the register storing the result, and undo the P process. In general, let
$k = \lceil \log n \rceil$; then we can pad a and b with leading zeros to make them $2^k$-trit numbers,
and use the circuit described above to compare a and b. We still call the $2^k$-trit numbers a and b. For
$0 \le i \le n - 1$, let $A_i = a_i$, $B_i = b_i$ be the working registers, and let R be the register which will store the
result of the comparison. We also need $2^k + 2(2^k - n)$ ancillas, among which $2(2^k - n)$ are used to hold
the extra zeros in front of a and b, one is denoted by $Z_0$ as the ancilla to the $\mathrm{AdjC}_0$ circuit, and the rest
are denoted by $X_i$.
Note that after padding a and b with zeros, the carry status indicators $C[i, j]$, $n \le i < j \le 2^k$, are
known before the compilation; thus we can store their values in the registers and there is no need to
recompute them later.
Carry Look-ahead Comparison:
1. Convert a to $a'$. This requires $2^k$ $S_{0,2}$ gates.
2. For $0 < i \le n - 1$, run the circuit AdjC on $A_i$, $B_i$, which outputs $C[i, i + 1]$ to $B_i$. Run $\mathrm{AdjC}_0$ on
$A_0$, $B_0$, and $Z_0$ with $Z_0$ as the ancilla, which outputs $C[0, 1]$ to $B_0$. The circuit has a depth of 2,
and it consists of $n - 1$ AdjC and 1 $\mathrm{AdjC}_0$.
3. Perform the P process in Section 3.2 to compute all the $C[2^t m, 2^t(m + 1)]$ that are not known
before compilation into the ancillary registers $X_i$. Note that here, since we don't have the $Z_i$
registers, all the $C[0, 2^m]$ are also written to the $X_i$ registers. The depth of the circuit is k, and
the complexity is $2^k - \omega(2^k) - (2^k - n - \omega(2^k - n)) = n + \omega(2^k - n) - 1$.
4. Copy $c_{2^k}$ to the result register R.
5. Undo Step 3.
6. Undo Step 2.
7. Undo Step 1.
Therefore, the total depth of the circuit above is $2k + 4 = 2\lceil \log n \rceil + 4$, and it has a complexity
of $4n + 2\omega(2^k - n) = 4n + 2\omega(2^{\lceil \log n \rceil} - n)$. The number of ancillas used is $3 \cdot 2^{\lceil \log n \rceil} - 2n$.
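The resource counts of the carry look-ahead comparison, and the simplification used for the P process in Step 3, can be evaluated numerically. This is a sketch assuming that $\omega$ is the binary Hamming weight and that log is base 2; the function names are illustrative.

```python
def omega(n: int) -> int:
    """Number of ones in the binary expansion of n (assumed meaning of omega)."""
    return bin(n).count("1")


def ceil_log2(n: int) -> int:
    """ceil(log2 n) for a positive integer n."""
    return (n - 1).bit_length()


def comparison_cost(n: int) -> dict:
    """Depth, complexity and ancilla count of the look-ahead comparison."""
    k = ceil_log2(n)
    padding = 2**k - n  # number of leading zero trits added to a and to b
    return {
        "depth": 2 * k + 4,
        "complexity": 4 * n + 2 * omega(padding),
        "ancillas": 3 * 2**k - 2 * n,
    }


def p_process_complexity(n: int) -> int:
    """Unsimplified Step 3 count: 2^k - w(2^k) - (2^k - n - w(2^k - n))."""
    k = ceil_log2(n)
    return 2**k - omega(2**k) - (2**k - n - omega(2**k - n))


print(comparison_cost(10))
```

The simplification in Step 3 relies on $\omega(2^k) = 1$, which is why the unsimplified count agrees with $n + \omega(2^k - n) - 1$.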
5 Techniques for Constructing Quantum Gate Decompositions
In previous sections, we developed a system of ternary arithmetic with the focus on two types of
quantum ternary adders. The building blocks of these circuits include the Carry circuit C, the circuits
AdjC and $\mathrm{AdjC}_0$ computing carry status indicators, and the merging formula M. Moreover, the
non-Clifford gates used in these four circuits are $S_{00,22}$, $C(S_{0,1})$, $C(X)$, and $C(\mathrm{SUM})$.
In this section, we show that it suffices to have $C(X)$ along with Clifford gates to produce the other
three non-Clifford gates exactly. The key technique involved is to analyze the algebraic expressions
of these gates. In Section 5.1, it is proven that $C(X)$ and Horner are equivalent up to Clifford gates,
and that all other non-Clifford gates can be obtained from $C(X)$. In Section 5.2, we introduce a
universal gate set called the supermetaplectic basis, which is a qutrit analog of the qubit Clifford + $\pi/8$
gate set. We then illustrate in Section 5.3 that $C(X)$ and Horner can both be implemented exactly over
the supermetaplectic basis. Therefore, with the supermetaplectic basis, the ternary circuits for arithmetic
can be realized exactly.
5.1 Construction of Reversible Gates from Polynomial Expressions
Let $\mathbb{F}_3$ be the field with three elements {0, 1, 2}. Then any n-qutrit reversible gate can be represented as
a map $\mathbb{F}_3^n \to \mathbb{F}_3^n$, or a sequence of n functions $\mathbb{F}_3^n \to \mathbb{F}_3$, if one identifies each $|i\rangle$ with i, i = 0, 1, 2. We
will see that reversible gates have polynomial representations and these polynomial representations
provide hints to construct one reversible gate from another.
Note that $0^2 = 0$, $1^2 = 2^2 = 1 \pmod 3$, and thus $\delta_{i,0} = 1 - i^2 \pmod 3$. By default, arithmetic within
a ket is taken modulo 3. The following is a list of polynomial expressions of some non-Clifford gates.
• $\mathrm{SUM} = \wedge(X) : |i, j\rangle \mapsto |i, i + j\rangle$;
• $C_0(X) : |i, j\rangle \mapsto |i, j + \delta_{i,0}\rangle = |i, j - i^2 + 1\rangle$;
• $\mathrm{Horner} := \wedge(\wedge(X)) : |i, j, k\rangle \mapsto |i, j, ij + k\rangle$;
• $C_0(\mathrm{SUM}) : |i, j, k\rangle \mapsto |i, j, k + (1 - i^2) j\rangle$.
[Fig. 17. A circuit for $S_{01,10}$.]
The above list shows that if a qutrit works as a soft control, then it contributes a linear factor in the
expression of the target qutrit, while a hard control qutrit contributes a quadratic factor.
Define $C'(X) : |i, j\rangle \mapsto |i, j + i^2\rangle$. Thus, $C'(X) = (I \otimes X) C_0(X)^{-1}$ is equivalent to $C(X)$. We will use
$C'(X)$ below for the construction of other gates.
The relation between the expressions of Horner and C′(X) resembles that of a bilinear form and a
quadratic form, which are equivalent. This suggests that Horner and C′(X) are also equivalent. Indeed,
the following diagrams give a construction of one from another.
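• Implementation of the Horner gate in terms of $C'(X)$:
$$|i, j, k\rangle \xrightarrow{\mathrm{SUM}_{1,2}} |i, i + j, k\rangle \xrightarrow{C'(X)^{-1}_{2,3}} |i, i + j, k - (i + j)^2\rangle \xrightarrow{\mathrm{SUM}^{-1}_{1,2}} |i, j, k - i^2 - j^2 + ij\rangle \xrightarrow{C'(X)_{1,3}} |i, j, k - j^2 + ij\rangle \xrightarrow{C'(X)_{2,3}} |i, j, k + ij\rangle.$$
• Implementation of the $C'(X)_{1,2}$ gate in terms of Horner:
$$|i, j, k\rangle \xrightarrow{\mathrm{SUM}_{1,3}} |i, j, i + k\rangle \xrightarrow{\mathrm{Horner}_{1,3,2}} |i, j + i^2 + ik, i + k\rangle \xrightarrow{\mathrm{SUM}^{-1}_{1,3}} |i, j + i^2 + ik, k\rangle \xrightarrow{\mathrm{Horner}^{-1}_{1,3,2}} |i, j + i^2, k\rangle.$$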
Note that in the construction of 2-qutrit C′(X), we made use of a third qutrit, but that qutrit does
not have to be clean, namely it could have arbitrary state.
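Similarly, $C'(X)$ is enough to construct $C(\mathrm{SUM})$:
$$C_0(\mathrm{SUM}): |i, j, k\rangle \xrightarrow{C'(X)_{1,2}} |i, i^2 + j, k\rangle \xrightarrow{C'(X)_{2,3}} |i, i^2 + j, k + (i^2 + j)^2\rangle \xrightarrow{C'(X)^{-1}_{1,2}} |i, j, k + i^2 + j^2 - i^2 j\rangle \xrightarrow{C'(X)^{-1}_{1,3}} |i, j, k + j^2 - i^2 j\rangle \xrightarrow{C'(X)^{-1}_{2,3}} |i, j, k - i^2 j\rangle \xrightarrow{\mathrm{SUM}_{2,3}} |i, j, k + (1 - i^2) j\rangle.$$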
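The three gate sequences above (Horner from $C'(X)$, $C'(X)$ from Horner, and $C_0(\mathrm{SUM})$ from $C'(X)$) can be checked by treating each reversible gate as a map on triples in $\mathbb{F}_3^3$. The sketch below is an illustration; the Python function names are hypothetical, qutrits are indexed from 1 as in the text, and the subscripts are read as (control, target) or (control, control, target).

```python
from itertools import product


def add_to(state, target, value):
    """Copy of state with value added (mod 3) to the qutrit at 1-based index target."""
    s = list(state)
    s[target - 1] = (s[target - 1] + value) % 3
    return tuple(s)


def SUM(state, c, t, inverse=False):
    v = state[c - 1]                      # target += control
    return add_to(state, t, -v if inverse else v)


def CprimeX(state, c, t, inverse=False):
    v = state[c - 1] ** 2                 # target += control^2
    return add_to(state, t, -v if inverse else v)


def Horner(state, c1, c2, t, inverse=False):
    v = state[c1 - 1] * state[c2 - 1]     # target += control1 * control2
    return add_to(state, t, -v if inverse else v)


def C0SUM(state):
    i, j, k = state
    return (i, j, (k + (1 - i * i) * j) % 3)


def horner_from_cprime(s):
    s = SUM(s, 1, 2)
    s = CprimeX(s, 2, 3, inverse=True)
    s = SUM(s, 1, 2, inverse=True)
    s = CprimeX(s, 1, 3)
    return CprimeX(s, 2, 3)


def cprime_from_horner(s):
    s = SUM(s, 1, 3)
    s = Horner(s, 1, 3, 2)
    s = SUM(s, 1, 3, inverse=True)
    return Horner(s, 1, 3, 2, inverse=True)


def c0sum_from_cprime(s):
    s = CprimeX(s, 1, 2)
    s = CprimeX(s, 2, 3)
    s = CprimeX(s, 1, 2, inverse=True)
    s = CprimeX(s, 1, 3, inverse=True)
    s = CprimeX(s, 2, 3, inverse=True)
    return SUM(s, 2, 3)


STATES = list(product(range(3), repeat=3))
print(all(horner_from_cprime(s) == Horner(s, 1, 2, 3) for s in STATES))
print(all(cprime_from_horner(s) == CprimeX(s, 1, 2) for s in STATES))
print(all(c0sum_from_cprime(s) == C0SUM(s) for s in STATES))
```

In `cprime_from_horner` the third qutrit is restored for every input value, consistent with the remark that it need not be clean.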
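To implement $C(S_{0,1})$ and $S_{00,22}$, notice that the circuit in Figure 17 realizes $S_{01,10}$, and moreover
we have:
• $S_{00,22} = \mathrm{SUM}^{-1} (X^{-1} \otimes I) S_{01,10} (X \otimes I) \mathrm{SUM}$.
• $C_0(S_{0,1}) = \mathrm{SUM}_{2,1}^{-1} (X^{-1} \otimes X^{-1}) S_{00,22} (X \otimes X) \mathrm{SUM}_{2,1}$.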
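The first identity above can be checked on the nine two-qutrit basis states. The sketch assumes that $S_{u,v}$ denotes the transposition of the basis states $|u\rangle$ and $|v\rangle$, that $X|i\rangle = |i + 1\rangle$, and that the rightmost operator in a product acts first; the function names are illustrative.

```python
from itertools import product


def SUM(s, inverse=False):
    """SUM with the first qutrit as control and the second as target."""
    i, j = s
    return (i, (j - i) % 3) if inverse else (i, (j + i) % 3)


def X_first(s, inverse=False):
    """X (or its inverse) acting on the first qutrit."""
    i, j = s
    return ((i - 1) % 3, j) if inverse else ((i + 1) % 3, j)


def transpose(s, u, v):
    """Swap the basis states u and v, leaving every other state fixed."""
    if s == u:
        return v
    if s == v:
        return u
    return s


def s0022_via_s0110(s):
    """Apply SUM^{-1} (X^{-1} x I) S_{01,10} (X x I) SUM, rightmost factor first."""
    s = SUM(s)
    s = X_first(s)
    s = transpose(s, (0, 1), (1, 0))
    s = X_first(s, inverse=True)
    return SUM(s, inverse=True)


STATES = list(product(range(3), repeat=2))
print(all(s0022_via_s0110(s) == transpose(s, (0, 0), (2, 2)) for s in STATES))
```

The check works because conjugating a transposition by a permutation yields the transposition of the two preimages, here $|00\rangle$ and $|22\rangle$.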
5.2 Supermetaplectic Basis
Recall from Section 2 that $\mathcal{C}$ is the qutrit Clifford group generated by H, Q, X, and SUM. Some other
gates in $\mathcal{C}$ are Z and $\wedge(Z)$, where $Z = \mathrm{diag}(1, \zeta_3, \zeta_3^2)$, and $\wedge(Z) = (I \otimes H) \mathrm{SUM} (I \otimes H^{-1})$. It can be
directly verified that $\wedge(Z)$ has the following expression:
$\wedge(Z) : |i, j\rangle \mapsto \zeta_3^{ij} |i, j\rangle$.
In [8], it has been established that the multi-qutrit metaplectic gate set $\mathcal{C} + \mathrm{diag}(1, 1, -1)$, or
equivalently $\mathcal{C} + \mathrm{diag}(1, \zeta_6, \zeta_6^2)$, is universal for quantum computation in the sense that any multi-qutrit
unitary operator can be approximated to any given precision by a circuit over that gate set. We
conjecture that the metaplectic gate set is not universal for exact reversible computation, i.e., it seems that the
subgroup of reversible classical gates that can be represented exactly by metaplectic circuits is rather
thin. In order to ensure exact representation of the reversible gates over a relatively simple multi-qutrit
basis, we expand the basis by adding essentially the "cubic root" of the Z gate to it. To this end we
increase the order of the root of unity used in defining the non-Clifford diagonal gate, and define $P_9$
as the 1-qutrit diagonal gate $\mathrm{diag}(\zeta_9^{-1}, 1, \zeta_9)$ (see footnote g).
Definition 1 The gate set $\mathcal{C} + P_9$ is called the supermetaplectic basis.
Since the P9 gate is non-Clifford, this basis is universal for quantum computation. The supermeta-
plectic basis resembles the qubit Clifford + T basis in several aspects. Firstly, we show in Section 5.3
that all the reversible gates can be constructed exactly over the supermetaplectic basis. Secondly, the
P9 gate is a fundamental diagonal gate in the third level of the Clifford hierarchy [20]. Lastly, it was
shown in [21] that P9 can be obtained by magic state distillation.
5.3 Construction of Diagonal Gates from Polynomial Expressions
We continue exploring the use of polynomial expressions in constructing new quantum gates.
The group of reversible gates in $\mathcal{C}$ is generated by SUM, X, and $S_{1,2}$. More precisely, it is described by
the following proposition.
Proposition 2 $\{S_{1,2}, X, \mathrm{SUM}\}$ generate a maximal subgroup, isomorphic to $GL(n, \mathbb{F}_3) \rtimes \mathbb{F}_3^n$,
of the group of reversible gates for any number n of qutrits.
Proof: See Appendix A (Section 9).
The statement in Proposition 2 for the case n = 2 was also proved in [9].
By the proof of Proposition 2, the correspondence between $GL(n, \mathbb{F}_3) \rtimes \mathbb{F}_3^n$ and the group generated
by $\{S_{1,2}, X, \mathrm{SUM}\}$ is as follows:
Given a pair $(A, v) \in GL(n, \mathbb{F}_3) \rtimes \mathbb{F}_3^n$, where $A = (a_{ij})_{1 \le i, j \le n}$ and $v = (v_i)_{1 \le i \le n}$, the reversible n-qutrit
gate corresponding to it maps $|x\rangle$, for any computational basis element $|x\rangle = |x_1, \cdots, x_n\rangle$, to $|A.x + v\rangle$.
Moreover, any reversible gate of this form is generated by $\{S_{1,2}, X, \mathrm{SUM}\}$.
A function $f : \mathbb{F}_3^n \to \mathbb{F}_3$ is called affine linear if $f(x_1, \cdots, x_n) = a_1 x_1 + \cdots + a_n x_n + b$, where
$a_1, \cdots, a_n, b \in \mathbb{F}_3$. A reversible n-qutrit gate can be viewed as an n-tuple of functions:
$|x\rangle \mapsto |f_1(x), \cdots, f_n(x)\rangle$, where we call the $f_i$ the coordinates of the gate. Then the above argument shows that
a reversible n-qutrit gate is generated by $\{S_{1,2}, X, \mathrm{SUM}\}$ if and only if all of its coordinates are affine
linear functions. Let $\mathcal{F}_n$ be the set of all affine linear functions from $\mathbb{F}_3^n$ to $\mathbb{F}_3$.
Let D be the group generated by the reversible gates in $\mathcal{C}$, together with the diagonal gates $\wedge(Z)$
and $P_9$. We give a technique to characterize all the diagonal gates in D.
By Proposition 2 and the argument above, the reversible gates in D can change the basis element
$|x\rangle$ to any element of the form $|f_1(x), \cdots, f_n(x)\rangle$, where each $f_i$ is an affine linear function from $\mathbb{F}_3^n$ to $\mathbb{F}_3$. The
action of $\wedge(Z)$ and $P_9$ contributes a scalar to the basis element. Thus the most general n-qutrit
diagonal gate in D has the form:
$$|i_1, i_2, \cdots, i_n\rangle \mapsto \zeta_9^{\sum_{f \in \mathcal{F}_n} A_f f(i_1, \cdots, i_n)} \, \zeta_3^{\sum_{f, g \in \mathcal{F}_n} B_{f,g} f(i_1, \cdots, i_n) g(i_1, \cdots, i_n)} \, |i_1, i_2, \cdots, i_n\rangle, \qquad (4)$$
where $A_f$, $B_{f,g}$ are integer parameters. Notice that the affine linear functions f and g take values in
$\mathbb{F}_3$, while $A_f$, $B_{f,g}$ take values in $\mathbb{Z}$. We have to evaluate f, g first in {0, 1, 2}, then multiply by $A_f$, $B_{f,g}$
inside $\mathbb{Z}$. This is critical for the $\zeta_9$ term.
Footnote g: This is the distillable gate denoted $M_3^\dagger$ in [21].
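As an application, we show that $\wedge(\wedge(Z))$ and $C_2(Z)$ are both contained in D. The expressions of
relevant gates are given below.
• $\wedge(Z)|i, j\rangle = \zeta_3^{ij} |i, j\rangle$, $P_9|i\rangle = \zeta_9^{i} |i\rangle$,
• $X|i\rangle = |i + 1\rangle$, $S_{1,2}|i\rangle = |2i\rangle$, $\mathrm{SUM}|i, j\rangle = |i, i + j\rangle$.
• $\wedge(\wedge(Z)) : |i, j, k\rangle \mapsto \zeta_3^{ijk} |i, j, k\rangle$.
• $C_2(Z) : |i, j\rangle \mapsto \zeta_3^{j \delta_{i,2}} |i, j\rangle$.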
For n = 3, the coefficient in Formula 4 can be written as:
$$L(i, j, k) = \zeta_9^{\sum_{a,b,c,d=0}^{2} A_{a,b,c,d} (ai + bj + ck + d)} \, \zeta_3^{Bij + Cjk + Dik}, \quad i, j, k \in \mathbb{F}_3, \qquad (5)$$
where $A_{a,b,c,d}$, B, C, D are integer parameters (see footnote h). Again $ai + bj + ck + d$ is assumed to be taken modulo 3.
To construct $\wedge(\wedge(Z))$, set $L(i, j, k) = \zeta_3^{ijk}$. Since $\zeta_3 = \zeta_9^3$, we get the equation:
$$\mathrm{Equ}(i, j, k) : \sum_{a,b,c,d} A_{a,b,c,d} (ai + bj + ck + d) + 3(Bij + Cjk + Dik) = 3ijk \pmod 9, \quad i, j, k \in \mathbb{F}_3. \qquad (6)$$
The set {Equ(i, j, k) : i, j, k ∈ F3} is a system of 27 linear equations in the variables Aa,b,c,d, B,C,
and D. Thus there is an efficient way to find the solutions, if any.
By direct calculations, one solution to the above system of equations is:
$$\zeta_3^{ijk} = \zeta_9^{(1+2i+j+k) + 2(1+2i+j+2k) + 6(2+2i+j+2k) + 2(1+2i+2j+k) + 6(2+2i+2j+k) + 4(1+2i+2j+2k) + 6(2+2i+2j+2k)}, \qquad (7)$$
where each parenthesized term in the exponent is taken modulo 3.
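Equation 7 can be confirmed by evaluating both exponents on all 27 inputs. The sketch below encodes each term of the exponent as a coefficient together with the constants of its affine function; the table layout and function names are illustrative choices, while the numbers are read off from Equation 7.

```python
from itertools import product

# Each entry is (coefficient, (d, a, b, c)) for a term
# coefficient * ((d + a*i + b*j + c*k) mod 3) in the exponent of zeta_9.
TERMS = [
    (1, (1, 2, 1, 1)),
    (2, (1, 2, 1, 2)),
    (6, (2, 2, 1, 2)),
    (2, (1, 2, 2, 1)),
    (6, (2, 2, 2, 1)),
    (4, (1, 2, 2, 2)),
    (6, (2, 2, 2, 2)),
]


def zeta9_exponent(i: int, j: int, k: int) -> int:
    """Exponent of zeta_9 on the right-hand side of Equation 7, reduced mod 9."""
    total = 0
    for coeff, (d, a, b, c) in TERMS:
        # Reduce the affine function mod 3 first, then multiply inside the integers.
        total += coeff * ((d + a * i + b * j + c * k) % 3)
    return total % 9


def equation_7_holds() -> bool:
    """Compare with 3*i*j*k mod 9, using zeta_3 = zeta_9^3."""
    return all(
        zeta9_exponent(i, j, k) == (3 * i * j * k) % 9
        for i, j, k in product(range(3), repeat=3)
    )


print(equation_7_holds())
```

The order of operations in `zeta9_exponent` matters: reducing modulo 3 after multiplying by the coefficient would give a different exponent of $\zeta_9$.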
In light of the solution in Equation 7, it is not hard to create a circuit realizing $\wedge(\wedge(Z))$. Explicitly,
this is given in Figure 18.
Similarly, with the same method, we construct a circuit for $C_2(Z)$. See Figure 19.
Note that $\wedge(\wedge(Z))$ and $C_2(Z)$ are related to Horner and $C_2(X)$, respectively, by the Clifford gate H,
namely, we have
• $(I \otimes H) C_2(X) (I \otimes H^\dagger) = C_2(Z)$,
• $(I \otimes I \otimes H) \mathrm{Horner} (I \otimes I \otimes H^\dagger) = \wedge(\wedge(Z))$.
Therefore, both Horner and $C_2(X)$ can be implemented exactly over the supermetaplectic basis.
Footnote h: Actually there are also terms $i^2$, $j^2$, $k^2$ in the exponent of $\zeta_3$, but it is direct to see that $\zeta_3^{i^2} = \zeta_9^{(2i \bmod 3) - ((2 - i) \bmod 3)}$ up to a global
phase, so the square terms can be absorbed into the $\zeta_9$ terms.
[Fig. 18. A circuit for $\wedge(\wedge(Z))$.]
[Fig. 19. A circuit for $C_2(Z)$.]
Remark 3 1. The papers [22, 23] developed a similar framework for the binary case.
2. If one applies a similar technique to the qubit Clifford + T gates, namely replacing $(\zeta_9, \zeta_3)$ with
$(\zeta_8, -1)$, one obtains a circuit for the Toffoli gate with T-depth 3, which is optimal in the ancilla-free
scenario.
6 Conclusion
We developed improved ternary circuits for reversible ternary adders of two types: the modified ripple-carry
adder and the carry look-ahead adder. We have also derived solutions for a modulo $3^n$ adder,
subtraction and comparison in ternary encoding. We have offered two levels of abstraction for describing the
corresponding ternary circuits: one in terms of reversible reflections of certain types and one in a more
uniform language that allows only one non-Clifford gate: either the $C(X) : |i, j\rangle \mapsto |i, j + \delta_{i,2} \bmod 3\rangle$ gate
or the $P_9 = \mathrm{diag}(e^{-2\pi i/9}, 1, e^{2\pi i/9})$ gate.
Future circuit synthesis work should entail the design of fully modular adders, circuits for singly-
and doubly-controlled adders, as well as optimized circuits for singly- and doubly-controlled additive
shifts that would be essential parts of Shor's integer factorization algorithm.
An important theoretical direction of future work would be establishing lower complexity bounds
for the arithmetic circuits and evaluating the efficiency of the designs presented here against these bounds.
7 Acknowledgment
Most of the work in the present paper was done during Summer 2015 when the second author was
interning with Microsoft QuArC Group.
8 References
1. Boykin, P Oscar and Mor, Tal and Pulver, Matthew and Roychowdhury, Vwani and Vatan, Farrokh: A new
universal and fault-tolerant quantum basis. Information Processing Letters,75(3):101–107, 2000
2. Harrow, Aram W and Recht, Benjamin and Chuang, Isaac L: Efficient discrete approximations of quantum
gates. Journal of Mathematical Physics, 43(9), 4445–4451, 2002
3. Morisue, Mititada and Oochi, Kiyoshi and Nishizawa, Mitsuhiro: A novel ternary logic circuit using Joseph-
son junction. IEEE Trans. Magn., 25(2), 1989
4. Morisue, Mititada and Endo, Jun and Morooka, Toshimitu and Shimizu, Nobuhiro and Sakamoto, Masahiro: A
Josephson ternary memory circuit. Multiple-Valued Logic, 1998. Proceedings. 1998 28th IEEE International
Symposium on, 19 – 24, 1998
5. Muthukrishnan, Ashok and Stroud, Carlos R, Jr.: Multivalued logic gates for quantum computation. Phys.
Rev. A., 62(5), 051309, 2000
6. Smith, Aaron and Anderson, Brian E and Sosa-Martinez, Hector and Riofrio, Carlos A and Deutsch, Ivan H
and Jessen, Poul S: Quantum control in the Cs 6S 1/2 ground manifold using rf and µw magnetic fields. Phys.
Rev. Lett., 111(170502), 2013
7. Malik, Mehul and Erhard, Manuel and Huber, Marcus and Krenn, Mario and Fickler, Robert and Zeilinger, Anton:
Multi-photon entanglement in high dimensions. Nature Photonics, 10, 248–252, 2016
8. Cui, Shawn X and Wang, Zhenghan: Universal quantum computation with metaplectic anyons. Journal of
Mathematical Physics, 56(3), 032202, 2015
9. Bocharov, Alex and Cui, Xingshan and Kliuchnikov, Vadym and Wang, Zhenghan: Efficient topological com-
pilation for weakly-integral anyon model. Phys. Rev. A 93, 012313, 2016
10. Bocharov, Alex and Roetteler, Martin and Svore, Krysta M.: Factoring with Qutrits: Shor’s Algorithm on
Ternary and Metaplectic Quantum Architectures. (In preparation)
11. Brennen, Gavin K and Bullock, Stephen S and O’Leary, Dianne P: Efficient circuits for exact-universal com-
putations with qudits. Quantum Information and Computation, 6, 436, 2006
12. Miller, D Michael and Dueck, Gerhard W and Maslov, Dmitri: A synthesis method for MVL reversible logic.
34th IEEE International Symposium on Multiple-Valued Logic (ISMVL), 74–80, 2004
13. Satoh, Takahiko and Nagayama, Shota and Van Meter, Rodney: A reversible ternary adder for quantum com-
putation. Asian Conf. on Quantum Information Science, 2007
14. Khan, Mozammel HA and Perkowski, Marek A: Quantum ternary parallel adder/subtractor with partially-
look-ahead carry. Journal of Systems Architecture, 53(7), 453–464, 2007
15. Grassl, Markus and Roetteler, Martin and Beth, Thomas: Efficient quantum circuits for non-qubit quantum
error-correcting codes. International Journal of Foundations of Computer Science, 14(5), 757–775, 2003
16. Gottesman, Daniel: Fault-tolerant quantum computation with higher-dimensional systems. Quantum Com-
puting and Quantum Communications, Springer, 302–313, 1999
17. Cuccaro, Steven A and Draper, Thomas G and Kutin, Samuel A and Moulton, David Petrie: A new quantum
ripple-carry addition circuit. arXiv:quant-ph/0410184, 2004
18. Draper, Thomas G and Kutin, Samuel A and Rains, Eric M and Svore, Krysta M: A logarithmic-depth quan-
tum carry-lookahead adder. Quantum Information and Computation, 6(4), 351–369, 2006
19. Vedral, Vlatko and Barenco, Adriano and Ekert, Artur: Quantum networks for elementary arithmetic opera-
tions. Phys. Rev. A, 54(1), 147, 1996
20. Howard, Mark and Vala, Jiri: Qudit versions of the qubit pi/8 gate. Phys. Rev. A, 86(2), 022316, 2012
21. Campbell, Earl T and Anwar, Hussain and Browne, Dan E: Magic-state distillation in all prime dimensions
using quantum Reed-Muller codes. Phys. Rev. X, 2(4), 041021, 2012
22. Amy, Matthew and Maslov, Dmitri and Mosca, Michele and Roetteler, Martin: A meet-in-the-middle algo-
rithm for fast synthesis of depth-optimal quantum circuits. Computer-Aided Design of Integrated Circuits and
Systems, IEEE Transactions on, 32(6), 818–830, 2013
23. Amy, Matthew and Maslov, Dmitri and Mosca, Michele: Polynomial-time T -depth optimization of Clifford+T
circuits via matroid partitioning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, 33(10), 1476–1489, 2014
24. Scott, Leonard L: Representations in characteristic p. The Santa Cruz Conference on Finite Groups (Univ.
California, Santa Cruz, Calif., 1979), 37, 319–331, 1980
25. Liebeck, Martin W and Praeger, Cheryl E and Saxl, Jan: On the O’Nan-Scott theorem for finite primitive
permutation groups. Journal of the Australian Mathematical Society (Series A), 44(03), 389–396, 1988
Appendix A
9 Reversible gates generated by {S 12, X,SUM}
Proposition A.1 $\{S_{1,2}, X, \mathrm{SUM}\}$ generate a maximal subgroup, isomorphic to $GL(n, \mathbb{F}_3) \rtimes \mathbb{F}_3^n$,
of the group of reversible gates (the permutation group) for any number n of qutrits.
Proof: Let $\mathbb{F}_3^n$ be the n-dimensional vector space over the finite field $\mathbb{F}_3$. Then there is a one-to-one
correspondence between the elements of $\mathbb{F}_3^n$ and the computational basis of the n-qutrit space
$(\mathbb{C}^3)^{\otimes n}$. That is, any element $(x_1, \cdots, x_n) \in \mathbb{F}_3^n$ corresponds to the basis element $|x_1, \cdots, x_n\rangle$. Thus any
automorphism of $\mathbb{F}_3^n$ induces a permutation of the n-qutrit basis, which is a reversible n-qutrit gate.
Let $G = GL(n, \mathbb{F}_3) \rtimes \mathbb{F}_3^n$, the semidirect product of $GL(n, \mathbb{F}_3)$ and $\mathbb{F}_3^n$, and let $S_{3^n}$ be the symmetric
group on $3^n$ elements, or equivalently the group of reversible gates on n qutrits. We first prove that the
group generated by $\{S_{1,2}, X, \mathrm{SUM}\}$ is isomorphic to G. As a corollary of applying the O'Nan-Scott
Theorem to the classification of maximal subgroups of the symmetric group [24] [25], it then follows that
G is a maximal subgroup of $S_{3^n}$.
The group G is the affine linear group of degree n over $\mathbb{F}_3$, namely, it consists of all the pairs (A, v),
where A is an $n \times n$ invertible matrix with entries in $\mathbb{F}_3$, and v is a vector in $\mathbb{F}_3^n$. The group G acts on $\mathbb{F}_3^n$
as follows:
$(A, v).x = A.x + v, \quad (A, v) \in G, \ x \in \mathbb{F}_3^n$.
Therefore, we get a map $\varphi : G \to U(3^n)$ such that $\varphi(A, v)|x\rangle = |Ax + v\rangle$, where $|x\rangle$ is any
computational basis vector. This map $\varphi$ is evidently an injective group homomorphism.
For $1 \le i \ne j \le n$, define $A_{ij}, M_i \in GL(n, \mathbb{F}_3)$ and $v_i \in \mathbb{F}_3^n$ as follows:
$A_{ij} = I_n + E_{ji}$ is the identity matrix with one additional entry 1 in row j, column i;
$M_i = I_n + E_{ii} = \mathrm{diag}(1, \cdots, 1, 2, 1, \cdots, 1)$, with the entry 2 in position i;
$v_i = (0, \cdots, 0, 1, 0, \cdots, 0)$, with the entry 1 in position i.
It is straightforward to check that $\varphi(A_{ij}, 0) = \mathrm{SUM}_{ij}$, $\varphi(M_i, 0) = (S_{1,2})_i$, $\varphi(I_n, v_i) = X_i$, where
the subscript of the gate on the right hand side of each expression denotes the qutrits it acts on. For
instance, $X_i$ is the X gate acting on the i-th qutrit. Therefore, the group generated by SUM, X, $S_{1,2}$ is
isomorphic to the group generated by $A_{ij}$, $M_i$, $v_i$, for $1 \le i \ne j \le n$.
Clearly all the $v_i$ generate $\mathbb{F}_3^n$ as an additive group. We next prove that $A_{ij}$, $M_i$ generate the group
$GL(n, \mathbb{F}_3)$.
Let $B_{ij} = M_i A_{ij} A_{ji}^{-1} A_{ij} = I_n - E_{ii} - E_{jj} + E_{ij} + E_{ji}$; thus $B_{ij}$ swaps the two basis elements $e_i$ and $e_j$.
Now given any matrix $A \in GL(n, \mathbb{F}_3)$, multiplying A on the left by $A_{ij}$, $B_{ij}$, and $M_i$ constitutes the three
types of row operations on A, and since A is invertible, it can always be reduced to the identity matrix
by row operations. This proves that any matrix in $GL(n, \mathbb{F}_3)$ can be written as a product of $A_{ij}$, $B_{ij}$, and
$M_i$. Therefore, $GL(n, \mathbb{F}_3)$ is generated by $A_{ij}$, $M_i$, and hence G is generated by $A_{ij}$, $M_i$, and $v_i$.
Combining the above arguments, we have shown that the group generated by SUM, $S_{1,2}$, X is isomorphic
to $G = GL(n, \mathbb{F}_3) \rtimes \mathbb{F}_3^n$.
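The matrix identity for $B_{ij}$ used in the proof can be checked with explicit arithmetic modulo 3. The sketch below assumes that $E_{rc}$ denotes the matrix unit with a single 1 in row r, column c, so that $A_{ij} = I_n + E_{ji}$ and $A_{ji}^{-1} = I_n - E_{ij}$; the helper names are hypothetical.

```python
def identity(n):
    return [[int(r == c) for c in range(n)] for r in range(n)]


def with_entry(matrix, r, c, value):
    """Copy of matrix with value added (mod 3) at the 1-based position (r, c)."""
    m = [row[:] for row in matrix]
    m[r - 1][c - 1] = (m[r - 1][c - 1] + value) % 3
    return m


def mat_mul(p, q):
    """Matrix product over F_3."""
    n = len(p)
    return [
        [sum(p[r][t] * q[t][c] for t in range(n)) % 3 for c in range(n)]
        for r in range(n)
    ]


def A(n, i, j):
    """A_ij = I + E_ji."""
    return with_entry(identity(n), j, i, 1)


def A_inv(n, i, j):
    """A_ij^{-1} = I - E_ji."""
    return with_entry(identity(n), j, i, -1)


def M(n, i):
    """M_i = I + E_ii, i.e. the i-th diagonal entry is 2."""
    return with_entry(identity(n), i, i, 1)


def B(n, i, j):
    """B_ij = M_i A_ij A_ji^{-1} A_ij."""
    m = mat_mul(M(n, i), A(n, i, j))
    m = mat_mul(m, A_inv(n, j, i))
    return mat_mul(m, A(n, i, j))


def swap_matrix(n, i, j):
    """Permutation matrix exchanging the basis elements e_i and e_j."""
    m = identity(n)
    m[i - 1], m[j - 1] = m[j - 1], m[i - 1]
    return m


print(all(B(3, i, j) == swap_matrix(3, i, j)
          for i in range(1, 4) for j in range(1, 4) if i != j))
```

The role of $M_i$ in the product is to turn the entry −1 left by the three elementary matrices into +1, which is what makes $B_{ij}$ a genuine permutation matrix.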
