Factoring with Qutrits: Shor's Algorithm on Ternary and Metaplectic
  Quantum Architectures by Bocharov, Alex et al.
Alex Bocharov∗, Martin Roetteler∗, and Krysta M. Svore∗
∗Quantum Architectures and Computation Group,
Station Q, Microsoft Research, Redmond, WA (USA)
We determine the cost of performing Shor’s algorithm for integer factorization on a ternary quan-
tum computer, using two natural models of universal fault-tolerant computing: (i) a model based
on magic state distillation that assumes the availability of the ternary Clifford gates, projective
measurements, and classical control as its natural instrumentation set; (ii) a model based on a
metaplectic topological quantum computer (MTQC). A natural choice to implement Shor’s algorithm on
a ternary quantum computer is to translate the entire arithmetic into a ternary form. However, it
is also possible to emulate the standard binary version of the algorithm by encoding each qubit in
a three-level system. We compare the two approaches and analyze the complexity of implementing
Shor’s period finding function in the two models. We also highlight the fact that the cost of
achieving universality through magic states in the MTQC architecture is asymptotically lower than
in the generic ternary case.
I. INTRODUCTION
Shor’s quantum algorithm for integer factorization [42]
is a striking case of superpolynomial speed-up promised
by a quantum computer over the best-known classical
algorithms. Since Shor’s original paper, many explicit
circuit constructions over qubits for performing the al-
gorithm have been developed and analyzed. This in-
cludes automated synthesis of the underlying quantum
circuits for the binary case (see the following and refer-
ences therein: [3, 4, 14, 31, 32, 44, 45, 47, 49]).
It has been previously noted that arithmetic encod-
ing systems beyond binary may yield more natural em-
beddings for some computations and potentially lead to
more efficient solutions. (A brief history note on this
line of thought can be found in section 4.1 of [28].) Ex-
perimental implementation of computation with ternary
logic, for example with Josephson junctions, dates back
to 1989 [34, 35]. More recently, multi-valued logic has
been proposed for linear ion traps [36], cold atoms [43],
and entangled photons [30]. In topological quantum com-
puting it has been shown that metaplectic non-Abelian
anyons [18] naturally align with ternary, and not binary,
logic. These anyons offer a natively topologically pro-
tected universal set of quantum gates (see, for example,
[37]), in turn requiring little to no quantum error correc-
tion.
It is also interesting to note that qutrit-based computers
are, in a certain sense, space-optimal among all
qudit-based computers with varying local quantum dimension.
In [22] an argument is made that, as the
dimension of the constituent qudits increases, the cost of
maintaining a qudit in a fully entangled state also increases,
and the optimum cost per Hilbert dimension is attained
at local dimension $\lceil e \rceil = 3$.
Transferring the wealth of multi-qubit circuits to
the multi-qutrit framework is not straightforward. Some of
the binary primitives, for example the binary Hadamard
gate and the two-qubit CNOT gate, do not remain Clif-
ford operations in the ternary case. Therefore, they can-
not be emulated by ternary Clifford circuits. We resolve
this complication by developing efficient non-Clifford cir-
cuits for a generic ternary quantum computer first. We
then extend the solution to the Metaplectic Topological
Quantum Computer (MTQC) platform [18], which fur-
ther reduces the cost of implementation.
A generic ternary framework that supports the full
ternary Clifford group, measurement, and classical con-
trol [13], also supports a distillation protocol that pre-
pares magic states for the P9 gate:
$$P_9 = \omega_9^{-1}\,|0\rangle\langle 0| + |1\rangle\langle 1| + \omega_9\,|2\rangle\langle 2|, \qquad \omega_9 = e^{2\pi i/9}. \tag{1}$$
The Clifford+P9 basis is universal for quantum compu-
tation and serves a similar functional role in ternary logic
as the Clifford+T basis in binary logic (see [13, 25] for
the more general qudit context).
We show in more detail below that the primitive R
gate available in MTQC is more powerful in practice than
the P9 gate.
Arguably, a natural choice to implement Shor’s algo-
rithm on a ternary quantum computer is to translate the
entire arithmetic into ternary form. We do so by us-
ing ternary arithmetic tools developed in [8] (with some
practical improvements). We also explore an alternative
approach: emulation of the binary version of Shor’s
period-finding algorithm on a ternary processor. Emulation has
notable practical advantages in some contexts. For example,
as shown in section III A, using a binary ripple-carry
additive shift consumes fewer clean P9 magic states than
the corresponding ternary ripple-carry additive shift (cf.
table III).
We also show that on a metaplectic ternary computer
the magic state coprocessor is asymptotically smaller
than a magic state distillation coprocessor, such as the
one developed in [13] for the generic ternary quantum
computer. Another benefit of the MTQC is the ability
to approximate desired non-Clifford reflections directly to
the required fidelity, thus eliminating the need for magic
states. The tradeoff is an increase in the depth of the
emulated Shor’s period finding circuit by a logarithmic
factor, which is tolerable for the majority of instances.
The cost benefits of using exotic non-Abelian anyons
for integer factorization have been previously noted, for
example in [2], where hypothetical Fibonacci anyons were
used. It is worth noting that neither binary nor
ternary logic is native to Fibonacci anyons, so the NOT,
CNOT or Toffoli gates are much harder to emulate there
than on a hypothetical metaplectic anyon computer.
The paper is organized as follows. In Section II we
state the definitions and equations pertaining to the two
ternary architectures used, and give a quick overview of
Shor’s period-finding function. In Section III we
perform a detailed analysis of reversible classical circuits for
modular exponentiation. We compare two designs of the
modular exponentiation arithmetic. One is emulation of
binary encoding of integers combined with ternary arith-
metic gates. The other uses ternary encoding of integers
with ternary gates.
In Section IV we develop circuits for the key arithmetic
gates based on designs from [8] with further optimiza-
tions.
In Section V we compare the resource cost of perform-
ing modular exponentiation. An interesting feature of
ternary arithmetic circuits is the fact that the denser and
more compact ternary encoding of integers does not nec-
essarily lead to more resource-efficient period finding so-
lutions compared to binary encoding. The latter appears
to be better suited in practice for low-width arithmetic
circuit designs (hence, e.g., for smaller quantum comput-
ers).
We also compare the magic state preparation requirements.
We highlight the huge advantage of the metaplectic
topological computer: magic state preparation
requires width that is linear in $\log(n)$ on an MTQC,
whereas it requires width in $O(\log^3(n))$ on a generic
ternary quantum computer (see footnote 1).
All the circuit designs and resource counts are done under
the assumption of a fully connected multi-qutrit network.
Factorization circuitry optimized for sparsely connected
networks, such as nearest-neighbor networks, is undeniably
interesting (cf. [40]), but this topic is outside
the scope of this paper.
Footnote 1: It requires width in $O(\log^{\gamma}(n))$ in the binary Clifford+T architecture,
where $\gamma$ can vary between $\log_2(3)$ and $\log_3(15)$ depending
on the practically applicable distillation protocol.
II. BACKGROUND AND NOTATION
A common assumption for a multi-qudit quantum
computer architecture is the availability of quantum
gates generating the full multi-qudit Clifford group (see
[25], [13]). In this section we describe a generic ternary
computer, where the full ternary Clifford group is pos-
tulated; we also describe the more specific Metaplectic
Topological Quantum Computer (MTQC) where the re-
quired Clifford gates are explicitly implemented by braid-
ing non-Abelian anyons [17, 18]. For purposes of this
paper, each braid corresponds to a unitary operation on
qutrits. Braids are considered relatively inexpensive and
tolerant to local noise. Universal quantum computation
on MTQC is achieved by adding a single-qutrit phase flip
gate (Flip in [18], R|2〉 in [7] and our Subsection II D). In
contrast with the binary phase flip Z, which is a Pauli
gate, the ternary phase flip is not only non-Clifford, but
it does not belong to any level of Clifford hierarchy (see,
for example, [8]). Intuitively one should expect this gate
to be very powerful. Level C3 of the ternary Clifford hi-
erarchy is emulated quite efficiently on MTQC architec-
ture, while the converse is quite expensive: implementing
phase flip in terms of C3 requires several ancillas and a
number of repeat-until-success circuits.
A. Ternary Clifford group
Let {|0〉, |1〉, |2〉} be the standard computational basis
for a qutrit. Let $\omega_3 = e^{2\pi i/3}$ be a primitive third root
of unity. The ternary Pauli group is generated by the
increment gate
INC = |1〉〈0|+ |2〉〈1|+ |0〉〈2|, (2)
and the ternary Z gate
$$Z = |0\rangle\langle 0| + \omega_3\,|1\rangle\langle 1| + \omega_3^{2}\,|2\rangle\langle 2|. \tag{3}$$
The ternary Clifford group, which stabilizes the Pauli group,
is obtained by adding the ternary Hadamard gate H,
$$H = \frac{1}{\sqrt{3}} \sum_{j,k} \omega_3^{jk}\,|j\rangle\langle k|, \tag{4}$$
the Q gate
Q = |0〉〈0|+ |1〉〈1|+ ω3|2〉〈2|, (5)
and the two-qutrit SUM gate,
SUM|j, k〉 = |j, j + k mod 3〉, j, k ∈ {0, 1, 2} (6)
to the set of generators of the Pauli group.
Compared to the binary Clifford group, H is the
ternary counterpart of the binary Hadamard gate, Q is
the counterpart of the phase gate S, and SUM is an ana-
log of the CNOT (although, intuitively it is a “weaker”
entangler than CNOT, as described below).
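As a quick sanity check on the generators of Eqs. (2)-(5), the following sketch verifies numerically that H and Q map the Pauli gate INC to Pauli elements under conjugation. The particular relations tested (H INC H† = Z, Q INC Q† = INC Z, Z INC = ω3 INC Z) are our own direct calculation from the definitions above, not formulas quoted from the text, and the helper names are hypothetical.

```python
import cmath

w3 = cmath.exp(2j * cmath.pi / 3)  # primitive third root of unity

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    """Entrywise equality up to floating-point tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def conjugate_by(g, p):
    """Return g p g^dagger."""
    return matmul(matmul(g, p), dagger(g))

# Entry [row][col] is the coefficient of |row><col|.
INC = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]                              # Eq. (2)
Z = [[1, 0, 0], [0, w3, 0], [0, 0, w3 ** 2]]                         # Eq. (3)
H = [[w3 ** (j * k) / 3 ** 0.5 for k in range(3)] for j in range(3)]  # Eq. (4)
Q = [[1, 0, 0], [0, 1, 0], [0, 0, w3]]                               # Eq. (5)

h_sends_inc_to_z = close(conjugate_by(H, INC), Z)
q_sends_inc_to_inc_z = close(conjugate_by(Q, INC), matmul(INC, Z))
z_inc_equals_w3_inc_z = close(
    matmul(Z, INC), [[w3 * x for x in row] for row in matmul(INC, Z)])
```

All three flags evaluate to true, which is consistent with H and Q normalizing the single-qutrit Pauli group.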
For any n, ternary Clifford gates and their various tensor
products generate a finite subgroup of $U(3^n)$; therefore
they are not sufficient for universal quantum computation.
We consider and compare two methods of building
up quantum universality: by implementing the P9
gate as per Eq. (1) and by expanding into the metaplectic
basis (Subsection II D). Given enough ancillae, these
two bases are effectively and efficiently equivalent in prin-
ciple (see Appendix A), and the costs in ancillae create
practical tradeoffs depending on the given application.
B. Binary and ternary control
Given an n-qutrit unitary operator U there are differ-
ent ways of expanding it into an (n + 1)-qutrit unitary
using the additional qutrit as “control”. Let |c〉 be a state
of the control qutrit and |t〉 be a state of the n-qutrit reg-
ister. We define
$$C_\ell(U)\,|c\rangle|t\rangle = |c\rangle \otimes \left(U^{\delta_{c,\ell}}\right)|t\rangle, \qquad \ell \in \{0, 1, 2\},$$
wherein δ denotes the Kronecker delta symbol. We re-
fer to this operator as a binary-controlled unitary U and
denote it in circuit diagrams as
[circuit symbol: gate U with a control labeled $\ell$].
We omit the label $\ell$ when $\ell = 1$. We also define the
ternary-controlled extension of U by
$$\Lambda(U)\,|c\rangle|t\rangle = |c\rangle \otimes \left(U^{c}\,|t\rangle\right)$$
and denote it in circuit diagrams as
[circuit symbol: ternary-controlled gate U].
It is paramount to keep in mind that
SUM = Λ(INC)
(see equations (2) and (6)). Another useful observation
is that for any unitary U we have that
$$\Lambda(U) = C_1(U)\,\left(C_2(U)\right)^{2}.$$
More detail can be found in [8].
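The two identities above can be checked on basis states for the special case U = INC, where all gates involved are classical permutations. The sketch below is only an illustration for this one choice of U; the function names are hypothetical and the operator product is read with the rightmost factor acting first.

```python
from itertools import product

def inc(t):
    """INC of Eq. (2) on a basis label: |t> -> |t + 1 mod 3>."""
    return (t + 1) % 3

def binary_controlled_inc(ell):
    """C_ell(INC): apply INC to the target only when the control equals ell."""
    return lambda c, t: (c, inc(t) if c == ell else t)

def ternary_controlled_inc(c, t):
    """Lambda(INC): apply INC to the target c times."""
    for _ in range(c):
        t = inc(t)
    return (c, t)

def sum_gate(j, k):
    """SUM of Eq. (6): |j, k> -> |j, j + k mod 3>."""
    return (j, (j + k) % 3)

def compose(*gates):
    """Operator product of two-qutrit gates; the rightmost gate acts first."""
    def run(c, t):
        for g in reversed(gates):
            c, t = g(c, t)
        return (c, t)
    return run

c1, c2 = binary_controlled_inc(1), binary_controlled_inc(2)
product_form = compose(c1, c2, c2)  # C_1(INC) (C_2(INC))^2

basis = list(product(range(3), repeat=2))
sum_is_lambda_inc = all(sum_gate(*s) == ternary_controlled_inc(*s) for s in basis)
lambda_factorizes = all(product_form(*s) == ternary_controlled_inc(*s) for s in basis)
```

Both flags are true on all nine basis states, confirming SUM = Λ(INC) and the factorization of Λ(INC) into binary-controlled increments.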
C. The P9 gate and its corresponding magic state
It is easy to see that the P9 gate in Eq. (1) is not a
Clifford gate, i.e., it does not stabilize the ternary Pauli
group. However, it can be realized by a certain deterministic
measurement-assisted circuit given a copy of the
magic state
$$|\mu\rangle = \omega_9^{-1}\,|0\rangle + |1\rangle + \omega_9\,|2\rangle, \qquad \omega_9 = e^{2\pi i/9}. \tag{7}$$
An appropriate deterministic magic state injection circuit,
as proposed in Ref. [13], is shown in Figure 1.
[Figure 1 circuit: inputs |µ〉 and |input〉; gates INC† and Cµ,m; output P9|input〉.]
Figure 1: Exact representation of the P9 gate by state
injection. Cµ,m stands for a certain precompiled ternary
Clifford gate, classically predicated by the measurement
result m.
For completeness, $C_{\mu,m} = (P_9\,\mathrm{INC}\,P_9^{\dagger})^{-m}\,\mathrm{INC}^{m}$. Note that
$P_9\,\mathrm{INC}\,P_9^{\dagger}$ is a Clifford gate, since P9 is at level 3 of the
ternary Clifford hierarchy (cf. [8]).
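The claim that conjugating INC by P9 yields a Clifford gate can be made concrete. A direct calculation from Eqs. (1), (2) and (5) suggests the closed form P9 INC P9† = ω9 INC Q†; this closed form is our own derivation rather than a formula stated in the text, and the sketch below merely checks it numerically with hypothetical helper names.

```python
import cmath

w9 = cmath.exp(2j * cmath.pi / 9)
w3 = w9 ** 3

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def diag(*d):
    """Diagonal matrix with the given entries."""
    return [[d[i] if i == j else 0 for j in range(len(d))] for i in range(len(d))]

def close(a, b, tol=1e-12):
    """Entrywise equality up to floating-point tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

P9 = diag(1 / w9, 1, w9)                  # Eq. (1)
Q = diag(1, 1, w3)                        # Eq. (5)
INC = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]   # Eq. (2)

lhs = matmul(matmul(P9, INC), dagger(P9))
rhs = [[w9 * x for x in row] for row in matmul(INC, dagger(Q))]
p9_conjugate_is_clifford = close(lhs, rhs)

# P9 is not itself a product of the Clifford generators checked here:
# its conjugate of INC picks up a Q^dagger factor rather than staying Pauli.
p9_conjugate_is_pauli_inc = close(lhs, [[w9 * x for x in row] for row in INC])
```

The first flag is true and the second is false, which is consistent with P9 sitting at level 3 of the hierarchy: it sends a Pauli gate to a non-Pauli Clifford gate.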
Such a magic state naturally exists in any multi-qudit
framework with qudits of prime dimension [13]. When
the framework supports the full multi-qudit Clifford
group, projective measurements and classical control,
then it also supports stabilizer protocols for magic state
distillation based on generalized Reed-Muller codes. In
particular, a multi-qutrit framework supports a distillation
protocol that requires $O(\log^3(1/\delta))$ raw magic states
of low fixed fidelity in order to distill a copy of the magic
state µ at fidelity $1 - \delta$. The distillation protocol is
iterative and converges to that fidelity in $O(\log(\log(1/\delta)))$
iterations. The protocol performance is analogous to the
magic state distillation protocol for the T gate in the
Clifford+T framework [11].
One architectural design is to split the actual computation
into “online” and “offline” components, where the
main part of the quantum processor runs the target quantum
circuit whereas the (potentially rather large) “offline”
coprocessor distills copies of a magic state that are
subsequently injected into the main circuit by a deterministic
widget of constant depth. Discussing the details of the
distillation protocol for the magic state µ is beyond the
scope of this paper and we refer the reader to Ref. [13].
D. Metaplectic quantum basis
The ternary metaplectic quantum basis is obtained by
adding the single qutrit axial reflection gate
R|2〉 = |0〉〈0|+ |1〉〈1| − |2〉〈2| (8)
to the ternary Clifford group. It is easy to see that R|2〉
is a non-Clifford gate and that Clifford+R|2〉 framework
is universal for quantum computation.
In Ref. [18] this framework has been realized with cer-
tain weakly integral non-Abelian anyons called metaplec-
tic anyons which explains our use of the “metaplectic”
epithet in the name of this universal basis. In Ref. [18],
R|2〉 is produced by injection of the magic state
|ψ〉 = |0〉 − |1〉+ |2〉. (9)
The injection circuit is coherent and probabilistic; it succeeds
in three iterations on average and consumes three copies
of the magic state |ψ〉 on average.
For completeness we present the logic of the injection
circuit in Figure 2. Each directed arrow in the circuit is
labeled with the result of a standard measurement of the
first qutrit in the state $\mathrm{SUM}_{2,1}(|\psi\rangle \otimes |\mathrm{input}\rangle)$. On m = 0
the sign of the third component of the input is flipped;
on m = 1, 2 the sign of the first or second component,
respectively, is flipped.
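The sign-flip rule and the average of three trials can be reproduced with a short calculation. The sketch below assumes that SUM_{2,1} uses the second qutrit as control and the first as target, i.e. |j, k〉 ↦ |j + k mod 3, k〉; this convention and all function names are our assumptions, chosen because they reproduce the behaviour described above.

```python
from fractions import Fraction

PSI = (1, -1, 1)  # unnormalized amplitudes of the magic state |psi> of Eq. (9)

def post_measurement(state, m):
    """Unnormalized second-qutrit state after measuring m on the first qutrit.

    Under the assumed convention SUM_{2,1}|j, k> = |j + k mod 3, k>,
    outcome m selects the terms with j = m - k mod 3.
    """
    return tuple(PSI[(m - k) % 3] * state[k] for k in range(3))

def flipped_component(m):
    """Index of the input component whose sign is flipped on outcome m."""
    return post_measurement((1, 1, 1), m).index(-1)

def next_pattern(pattern, m):
    """Sign pattern after one more trial, normalized up to a global sign."""
    new = list(pattern)
    new[flipped_component(m)] *= -1
    if new[0] < 0:
        new = [-s for s in new]
    return tuple(new)

TARGET = (1, 1, -1)  # sign pattern of R_{|2>} applied to the input
TRANSIENT = [(1, 1, 1), (1, -1, -1), (1, -1, 1)]  # start and the two wrong flips

# All three outcomes are equally likely because every amplitude of |psi>
# has modulus one, so each trial succeeds with probability (count / 3).
success_counts = [sum(next_pattern(s, m) == TARGET for m in range(3))
                  for s in TRANSIENT]
expected_trials = 1 / Fraction(success_counts[0], 3)
```

Every transient state has exactly one outcome leading to the absorbing state, so the number of trials is geometrically distributed with success probability 1/3 and mean 3, matching the statement above.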
In the original anyonic framework the |ψ〉 state is pro-
duced by a relatively inexpensive protocol that uses topo-
logical measurement and consequent intra-qutrit projec-
tion (see [18], Lemma 5). This protocol requires only
three qutrits and produces an exact copy of |ψ〉 in 9/4
trials on average. This is much better than any state dis-
tillation method, especially because it produces a copy of
|ψ〉 with fidelity 1.
In [7] we have developed effective methods
to compile efficient circuits in the metaplectic basis
Clifford+R|2〉. In particular, an arbitrary two-level
Householder reflection r is approximated to a desired target
precision ε by a metaplectic circuit
of R-count at most $8 \log_3(1/\varepsilon) + O(\log(\log(1/\varepsilon)))$, where
R-count is the number of occurrences of non-Clifford axial
reflections in the circuit. This allows us to approximate
the CNOT and Toffoli gates very tightly and at low
cost over the metaplectic basis (see Section IV B). Moreover,
if we want constant-depth high-fidelity widgets
for CNOT and Toffoli, we can obtain them by emulating, rather
than distilling, the magic state |µ〉 of (7) by a metaplectic
circuit and thus obtain a high-fidelity emulation of the
P9 gate at constant online depth (see Section IV A).
As we show in Appendix A, the converse also works.
With available ancillas and enough reversible classical
gates we can prepare the requisite magic state |ψ〉 exactly
on a generic ternary computer. The particular method in
the appendix is a probabilistic circuit for the magic state
|ψ〉 of (9) using the classical non-Clifford gate C2(INC).
Our current method for the latter gate is to implement
it as an ancilla-free circuit with three P9 gates.
E. Top-level view of Shor’s integer factorization
algorithm
The polynomial-time algorithm for integer factoriza-
tion originally developed in Ref. [42] is a hybrid algo-
rithm that combines a quantum circuit with classical pre-
processing and post-processing. In general, the task of
factoring an integer can be efficiently reduced classically
to a set of hard cases. A hard case of the factorization
problem comprises factoring a large integer N that is
odd, square-free and composite.
Let a be a randomly picked integer that is relatively
prime to N. By Euler’s theorem, $a^{\varphi(N)} = 1 \bmod N$,
where $\varphi$ is Euler’s totient function, and thus the modular
exponentiation function $e_a : x \mapsto a^x \bmod N$ is periodic
with period $\varphi(N) < N$. Let now $0 < r < N$ be a period
of the $e_a(x)$ function ($e_a(x + r) = e_a(x), \forall x$) and
suppose, additionally, that r is even and $a^{r/2} \neq -1 \bmod N$.
Then $\gcd(a^{r/2} - 1, N)$ must be a non-trivial divisor
of N. The greatest common divisor is computed efficiently
by classical means and it can be shown that the
probability of satisfying the conditions $r = 0 \bmod 2$ and
$a^{r/2} \neq -1 \bmod N$ is rather high when a is picked at random.
Therefore in Shor’s algorithm a quantum circuit is
only used for finding the small period r of $e_a(x)$ once an
appropriate a has been randomly picked.
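The classical reduction from period finding to factoring can be illustrated on toy inputs. In the sketch below the period is found by brute force, standing in for the quantum subroutine, and it is taken to be the smallest period (the multiplicative order of a); the function names are hypothetical.

```python
from math import gcd

def multiplicative_order(a, n):
    """Smallest r > 0 with a**r = 1 (mod n); brute force, needs gcd(a, n) = 1."""
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r

def factor_from_period(a, n):
    """Return a non-trivial divisor of n, or None when this choice of a fails."""
    if gcd(a, n) != 1:
        return gcd(a, n)          # a already shares a factor with n
    r = multiplicative_order(a, n)
    if r % 2 == 1:
        return None               # period is odd
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None               # a^(r/2) = -1 (mod n)
    return gcd(half - 1, n)
```

For example, with N = 15 and a = 7 the period is 4 and the procedure returns the divisor 3, whereas a = 14 has period 2 with a^(r/2) = -1 mod 15 and must be discarded.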
One quantum circuit to solve for r consists of three
stages:
1. Prepare quantum state proportional to the follow-
ing superposition:
$$\sum_{k=0}^{N^2} |k\rangle\,|a^k \bmod N\rangle. \tag{10}$$
2. Perform in-place quantum Fourier transform of the
first register.
3. Measure the first register.
The process is repeated until a classical integer state j
obtained as the result of measurement in step 3 enables
recovery of a small period r by efficient classical post-
processing.
Shor has shown [42] that the probability of suc-
cessful recovery of r in one of the iterations is in
Ω(1/ log(log(N))). Therefore we will succeed “al-
most certainly” in finding a desired small period r in
O(log(log(N))) trials.
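The classical post-processing of a measurement result j is not spelled out here; a common choice, which we assume for illustration, is a continued-fraction expansion of j/q, where q denotes the number of basis states in the first register. The sketch relies on Python's `Fraction.limit_denominator`, which returns the closest fraction with a bounded denominator; the names `candidate_period`, `recover_period` and `q` are hypothetical.

```python
from fractions import Fraction

def candidate_period(j, q, n):
    """Denominator of the closest fraction to j/q with denominator below n."""
    return Fraction(j, q).limit_denominator(n - 1).denominator

def recover_period(j, q, a, n):
    """Return a verified period of x -> a**x mod n, or None if j is not useful."""
    r = candidate_period(j, q, n)
    return r if pow(a, r, n) == 1 else None

# Toy instance: N = 21, a = 2 has period 6; take q = 512 >= N**2.
useful = recover_period(85, 512, 2, 21)       # 85/512 is close to 1/6
not_useful = recover_period(171, 512, 2, 21)  # 171/512 is close to 2/6 = 1/3
```

Here `useful` equals 6, while `not_useful` is None because the fraction 2/6 reduces to 1/3 and the candidate 3 fails the verification; such outcomes are the reason the quantum loop may need to be repeated.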
Given the known efficiency of the quantum Fourier
transform, most of the quantum complexity of this solution
falls in stage 1, where the state (10) is prepared. Specific
quantum circuits for preparing this superposition have
been proposed (cf. [3, 4, 14, 31, 32, 44, 45, 47, 49, 50]).
In the context of this paper, we distinguish between two
types of period-finding circuits. One type, as in Ref. [3],
is width-optimizing and uses approximate arithmetic.
These circuits interleave multiple quantum Fourier trans-
form and inverse Fourier transform blocks into modular
arithmetic circuits, which in practice leads to significant
depth overhead. We forego the analysis of circuits of this
type for the lack of space leaving such analysis for future
research.
The second type are framed as exact reversible arith-
metic circuits. Their efficient ternary emulation amounts
to efficient emulation of CNOT and Toffoli gates, possi-
bly after some peephole optimization. We discuss two
typical circuits of this kind in detail in Section III, and
briefly touch upon a number of alternatives in Appendix
C.
It is important to note that, with a couple of exceptions,
the multi-qubit designs for Shor state preparation
assumed ideal CNOT and Toffoli gates. However, in the
Clifford+T framework, for example, the Toffoli gate is often
only as ideal as the T gate. The question of the required
fidelity of CNOT and Toffoli gates for the quantum
period finding loop to work is an important one.
[Figure 2 diagram: four single-qutrit states ±(a|0〉+ b|1〉+ c|2〉), ±(−a|0〉+ b|1〉+ c|2〉), ±(a|0〉 − b|1〉+ c|2〉), ±(a|0〉+ b|1〉 − c|2〉), connected by arrows labeled m = 0, m = 1, m = 2.]
Figure 2: Markov chain for repeat-until-success implementation of the injection of the R|2〉 gate [18]. Starting point
is a general input a|0〉+ b|1〉+ c|2〉, where a, b, c ∈ C. Arrows indicate transitions between single-qutrit states. Each
arrow represents a single trial including measurement and consumption of the resource state |ψ〉, where each of the
transitions is labeled with the measurement result. The absorbing state corresponds to successful implementation of
the R|2〉 gate and is denoted by double borders.
If the superposition (10) is prepared imperfectly, with
fidelity $1 - \varepsilon$ for some $\varepsilon$ in $o(1/\sqrt{\log(\log(N))})$, then the
probability of obtaining one of the “useful” measurements
will be asymptotically the same as with the ideal
superposition, i.e., in Ω(1/ log(log(N))). (For completeness,
we spell out the argument in Appendix B.) Therefore,
if d is the depth of the corresponding quantum circuit
preparing the state, then the bound on the required
precision of the individual gates in the circuit may be in
$o(1/(d\,\sqrt{\log(\log(N))}))$ in the context of Shor’s algorithm.
In the rest of the paper we explore ternary emulations
of binary period-finding circuits and compare them to
truly ternary period-finding circuits with ternary encod-
ing of integers. We demonstrate that the fidelity and
non-Clifford cost of such ternary circuits are reduced to
those of the C(INC) gates. We also demonstrate that ef-
ficient emulation of binary period finding requires mostly
binary Toffoli gates with some use of C(INC).
III. MULTI-QUTRIT AND MULTI-QUBIT
ARITHMETIC ON GENERIC TERNARY
QUANTUM COMPUTER
We explore two options for cost-efficient integer arith-
metic over the ternary Clifford+P9 basis: (a) by emu-
lating arithmetic on binary-encoded data; and, (b) by
performing arithmetic on ternary-encoded data, based
on tools developed in [8].
Circuits for reversible ternary adders have been explored
earlier (see, for example, [27, 33, 41]). Since this
field is still in its early stages, there is a lot of
divergence in terminology; however, in [27, 33, 41] the key
non-Clifford tool for the circuitry is an equivalent of the
$C_f(\mathrm{INC})$ gate, in our notation. As pointed out in [8], our
use of this tool is more efficient, mainly due to the design
of “generalized” carry gates and other reflection-based
operations.
Our ternary circuits for emulated binary encoding of
integers are new, as far as we know.
The emulated binary and genuine ternary versions of
integer arithmetic have different practical bottlenecks, al-
though they are asymptotically equivalent in terms of
cost. With the ripple-carry adders, the emulated bi-
nary encoding wins, in practice, in both width and depth
over the ternary encoding, whereas with carry-lookahead
adders the ternary encoding achieves smaller width but
yields no notable non-Clifford depth advantage in the
context of modular exponentiation.
To give the study a mathematical form, let us agree to
take into account only non-Clifford gates used with either
encoding and let us agree to count a stack of non-Clifford
gates performed in parallel in one time slice as a single
unit of non-Clifford depth. We call the number of units
of non-Clifford depth in a circuit the non-Clifford depth
of the circuit.
Throughout the rest of the paper we use the following
Definition 1. For integer n > 0 let |j〉, |k〉 be two dif-
ferent standard basis vectors in the n-qudit Hilbert space.
We call the classical gate
$$\tau_{|j\rangle,|k\rangle} = I^{\otimes n} - |j\rangle\langle j| - |k\rangle\langle k| + |j\rangle\langle k| + |k\rangle\langle j| \tag{11}$$
a two-level axial reflection in n qudits.
As a motivation for this term, note that τ|j〉,|k〉 can be
rewritten as the two-level Householder reflection
$$I^{\otimes n} - 2\,|u\rangle\langle u|, \qquad |u\rangle = (|j\rangle - |k\rangle)/\sqrt{2}.$$
Clearly, in binary encoding, the CNOT, the Toffoli and
any variably controlled Toffoli gate is a two-level axial
reflection in the corresponding number of dimensions.
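The equivalence between Eq. (11) and the Householder form, and the remark about CNOT and Toffoli, can be checked directly on small matrices. The sketch below indexes two-qubit basis states as 2·control + target and three-qubit states analogously; this indexing and the helper names are our own choices for illustration.

```python
def identity(dim):
    return [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]

def two_level_reflection(dim, j, k):
    """tau_{|j>,|k>} of Eq. (11): the identity with basis states j, k exchanged."""
    m = identity(dim)
    m[j][j] = m[k][k] = 0.0
    m[j][k] = m[k][j] = 1.0
    return m

def householder(dim, j, k):
    """I - 2|u><u| with |u> = (|j> - |k>)/sqrt(2)."""
    u = [0.0] * dim
    u[j], u[k] = 2 ** -0.5, -(2 ** -0.5)
    eye = identity(dim)
    return [[eye[r][c] - 2 * u[r] * u[c] for c in range(dim)] for r in range(dim)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# CNOT exchanges |10> and |11>, i.e. basis indices 2 and 3.
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
cnot_is_reflection = close(CNOT, two_level_reflection(4, 2, 3))
forms_agree = close(two_level_reflection(4, 2, 3), householder(4, 2, 3))
# Toffoli exchanges |110> and |111>, i.e. basis indices 6 and 7.
toffoli_forms_agree = close(two_level_reflection(8, 6, 7), householder(8, 6, 7))
```

All three flags are true, so on binary data CNOT and Toffoli are instances of the two-level axial reflection of Definition 1.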
A. Ternary circuit for binary ripple-carry additive
shift
We discuss emulating an additive shift circuit that improves
on the quantum ripple-carry adder from [16]. [...] are cast
in bold font below.
Let a be a classically known n-bit integer and b be
an n-qubit basis state stored in a quantum register. We are looking
for a quantum implementation of the function $|b\rangle \mapsto |a + b\rangle$.
More specifically, we are looking for a pre-compiled
quantum circuit $C_a$ parameterized by a, which is known
at compilation time. Consider the well-known quantum
ripple-carry adder from [16] (in particular, the circuit
illustrated in Figure 4 there for n = 6, which is copied, for
completeness, into our Fig. 3).
[Figure 3 circuit: input wires 0 = c0, b0, a0, b1, a1, ..., bn−1, an−1, bn, an, z; a ladder of MAJ gates followed by a ladder of UMA gates; output wires 0, s0, a0, s1, a1, ..., sn−1, an−1, sn, an, z ⊕ sn+1.]
Figure 3: Ripple-carry adder from [16].
The adder uses 2n + 2 qubits. It performs a ladder
of n MAJ gates to compute all the carry bits, including
the top one. The top carry bit is copied onto the last
qubit and the ladder of n UMA gates is performed. Each
UMA gate uncomputes the corresponding MAJ function
and performs the three-way Z2 addition ai ⊕ bi ⊕ ci.
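The MAJ/UMA ladder can be simulated classically on basis states. The gate decompositions used below (two CNOTs and one Toffoli per MAJ or UMA gate) are the standard ones for this style of adder and are assumed here, since they are not spelled out in the text; the wire layout and function names are likewise our own.

```python
def maj(w, c, b, a):
    """MAJ on wires (c, b, a): afterwards wire a holds the carry majority."""
    w[b] ^= w[a]
    w[c] ^= w[a]
    w[a] ^= w[b] & w[c]

def uma(w, c, b, a):
    """UMA: undo MAJ on wires c and a, and leave a XOR b XOR c on wire b."""
    w[a] ^= w[b] & w[c]
    w[c] ^= w[a]
    w[b] ^= w[c]

def ripple_carry_add(a, b, n):
    """Simulate the adder on 2n + 2 wires: c0, (b_i, a_i) pairs, and z."""
    w = [0] * (2 * n + 2)
    for i in range(n):
        w[2 * i + 1] = (b >> i) & 1
        w[2 * i + 2] = (a >> i) & 1
    for i in range(n):                      # carry for position i sits on wire 2i
        maj(w, 2 * i, 2 * i + 1, 2 * i + 2)
    w[2 * n + 1] ^= w[2 * n]                # copy the top carry onto z
    for i in reversed(range(n)):
        uma(w, 2 * i, 2 * i + 1, 2 * i + 2)
    total = sum(w[2 * i + 1] << i for i in range(n)) + (w[2 * n + 1] << n)
    a_out = sum(w[2 * i + 2] << i for i in range(n))
    return total, a_out, w[0]
```

For instance, `ripple_carry_add(11, 6, 4)` returns `(17, 11, 0)`: the b register and z hold the sum, the a register is restored, and the initial carry wire returns to 0.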
It is somewhat hard to fold in the classically known
a in the multi-qubit framework using this design, although
a solution along these lines is offered in
[44]. It is, however, easy to fold in a in the ternary emulation
using the third basis state of the qutrit. We show that
it takes exactly n + 2 qutrits to emulate the binary shift
$|b\rangle \mapsto |a + b\rangle$.
Consider n + 2 qutrits where the top and the bottom
ones are prepared in |0〉 state and the remaining n encode
the binary bits of the |b〉.
We will be looking for reversible two-qutrit gates $Y_0, Y_1$
such that
$$Y_{a_j}\,|c_j, b_j\rangle = |c'_j, c_{j+1}\rangle \tag{12}$$
where $c_{j+1}$ is the correct carry bit for $c_j + a_j + b_j$ and $c'_j$
is an appropriate trit.
Since all the bits of a are known we can precompile a
ladder of Y gates that correctly computes the top carry
bit $c_n$ and puts the modified carry trit $c'_j$ on each $b_j$ wire.
Having copied $c_n$ onto the last qutrit, we sequentially
undo the Y gates in lockstep with computing partial
$\mathbb{Z}_2$-sums $b_j \oplus c_j$ on all the $b_j$ wires using gates of CNOT
type.
We note that Y0, Y1 are ternary gates, used however in
a narrow context of a truth table with just four columns.
One would intuitively expect that their restriction to the
context can be emulated at a relatively small expense.
Indeed:
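Proposition 2. Label the $c_i$ wire by 0 and the $b_i$ wire by 1.
In the context of binary data the gates
$$Y_0 = C_2(\mathrm{INC})^{\dagger}_{0,1}\;\mathrm{SUM}_{1,0}\;(\tau_{|0\rangle,|1\rangle} \otimes I)$$
and
$$Y_1 = C_2(\mathrm{INC})_{0,1}\;(I \otimes \tau_{|0\rangle,|1\rangle})\;\mathrm{SUM}_{1,0}\;(I \otimes \tau_{|0\rangle,|1\rangle})$$
satisfy the condition (12).
Here $C_2(\mathrm{INC})$ is the binary-controlled increment
$$C_2(\mathrm{INC}) : |j, k\rangle \mapsto |j, k + \delta_{j,2}\rangle.$$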
Proof. By direct computation. Note, that we do not care,
what either of these two circuits does outside of the bi-
nary data subspace as long as the action is reversible.
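The direct computation behind Proposition 2 can be carried out on basis labels, since all gates involved are classical. The sketch below assumes that operator products act right to left, that the first subscript of SUM and of C2(INC) denotes the control wire, and that the ladder applies each Y gate to neighbouring wires; these conventions and the function names are our assumptions.

```python
def tau01(x):
    """Clifford reflection tau_{|0>,|1>}: exchange labels 0 and 1, fix 2."""
    return (1, 0, 2)[x]

def y0(c, b):
    """Y_0 on (wire 0 = c, wire 1 = b), rightmost factor first."""
    c = tau01(c)              # tau ⊗ I
    c = (c + b) % 3           # SUM_{1,0}: wire 1 controls, wire 0 is the target
    if c == 2:                # C_2(INC)^dagger_{0,1}
        b = (b - 1) % 3
    return c, b

def y1(c, b):
    """Y_1 on (wire 0 = c, wire 1 = b), rightmost factor first."""
    b = tau01(b)              # I ⊗ tau
    c = (c + b) % 3           # SUM_{1,0}
    b = tau01(b)              # I ⊗ tau
    if c == 2:                # C_2(INC)_{0,1}
        b = (b + 1) % 3
    return c, b

def top_carry(a, b, n):
    """Run the precompiled ladder of Y gates on n + 1 qutrits; return c_n."""
    wires = [0] + [(b >> j) & 1 for j in range(n)]
    for j in range(n):
        gate = y1 if (a >> j) & 1 else y0
        wires[j], wires[j + 1] = gate(wires[j], wires[j + 1])
    return wires[n]

bits = [(c, b) for c in (0, 1) for b in (0, 1)]
y0_ok = all(y0(c, b)[1] == (c + b) // 2 for c, b in bits)
y1_ok = all(y1(c, b)[1] == (c + b + 1) // 2 for c, b in bits)
```

Under these conventions both gates leave the correct carry bit on wire 1 for all four binary inputs, and the ladder reproduces the top carry of a + b; the uncomputation and bitwise addition stages are not simulated here.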
The C2(INC) gate is also denoted as C2(X) in Ref.
[8], where its cost and utility are discussed in detail (see
also further discussion in section IV). The non-Clifford
cost of either Yj gate is equal to the non-Clifford cost of
C2(INC) which is known to be 3 P9 gates. Allowing one
ancillary qutrit, the C2(INC) is represented by a circuit
of P9-depth of 1 and P9-width of 3.
Besides the generalized carry computation, the addi-
tive shift circuit also needs to perform the bitwise
mod 2 addition by emulated gates of CNOT type.
Recall that CNOT gate cannot be exactly represented
by a ternary Clifford circuit (cf. [8], Appendix A). As
shown further in Proposition 4, the non-Clifford cost
of ternary-emulated CNOT on binary data only is an
equivalent of two C2(INC). Thus the additive shift takes
roughly 12n P9 gates to complete (not counting the Clif-
ford scaffolding). With one ancilla this can be done at
P9-depth of 4n and P9-width of 3.
However, Shor’s period-finding function relies on controlled
and doubly-controlled versions of the additive
shift. It suffices to control only the bitwise addition gates.
Thus adding one level of control produces n additional
Toffoli gates and adding the second level of control turns
these gates into controlled Toffolis. This is the bottleneck
of the emulated solution: as per Corollaries 8 and 9
below, an emulated Toffoli takes 12 P9 gates and the
binary-controlled Toffoli takes 18 P9 gates.
Thus overall the controlled shift takes 18n P9 gates and the
doubly-controlled shift takes 24n P9 gates. Allowing,
again, an ancillary qutrit, the P9-depths of the corresponding
circuits can be made 6n and 8n, respectively.
For what it is worth, the P9-counts in this solution are
similar to (and in fact marginally lower than) the T-counts
required for running the original binary adder [16] on
the more common binary Clifford+T platform. Indeed
each of the MAJ and UMA gates shown in Figure 3 is
Clifford-equivalent to a Toffoli gate that takes 7 T gates
to implement. Adding one level of control to the adder
increases the non-Clifford complexity by an additional n
Toffoli gates to the total T -count of 21n. Adding the
second level of control, conservatively, brings in 2n ad-
ditional modified Toffoli gates to yield the total T -count
of 29n.
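The counts quoted in this subsection can be tallied as follows. The totals 12n, 18n, 24n, 21n and 29n are taken from the text; the breakdown into a carry ladder of 6n P9 gates plus the bitwise addition gates, the uncontrolled T-count of 14n, and the figure of 4 T gates per modified Toffoli are inferred by us from those totals and should be read as assumptions.

```python
def p9_counts_emulated_binary(n):
    """P9 counts of the ternary-emulated binary additive shift on n bits."""
    carry = 2 * 3 * n  # Y ladder computed and undone, 3 P9 per Y gate (inferred)
    return {
        "uncontrolled": carry + 6 * n,        # n CNOTs at 2 C_2(INC) = 6 P9 each
        "controlled": carry + 12 * n,         # n emulated Toffolis at 12 P9 each
        "doubly_controlled": carry + 18 * n,  # n controlled Toffolis at 18 P9 each
    }

def t_counts_binary(n):
    """T counts of the binary adder of [16] on the Clifford+T platform."""
    base = 2 * n * 7               # n MAJ + n UMA, each one Toffoli of 7 T (inferred)
    controlled = base + n * 7      # n additional Toffoli gates: 21n
    doubly = controlled + 2 * n * 4  # 2n modified Toffolis, 4 T each (inferred): 29n
    return {"uncontrolled": base, "controlled": controlled,
            "doubly_controlled": doubly}

p9 = p9_counts_emulated_binary(10)
t = t_counts_binary(10)
```

For n = 10 this gives 120, 180 and 240 P9 gates against 140, 210 and 290 T gates, which illustrates the "marginally lower" comparison made above.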
We also note that the width of the ternary emulation
circuit is equal to n + 2 qutrits, whereas the original
purely binary design appears to require 2n+ 2 qubits.
The construction of Corollaries 8 and 9 requires 1 and 2
ancillas respectively. These ancillae can be shared along
the depth of the circuit inflating the overall width by 2
qutrits.
B. Ternary circuit for ternary ripple-carry additive
shift
Consider a ripple-carry implementation of the quantum
function $|b\rangle \mapsto |a + b\rangle$, where |b〉 is a quantumly encoded
integer and a is an integer that is classically known. Suppose
a and b are encoded as either bit strings with at most
n bits or as trit strings with at most $m = \lceil \log_3(2)\, n \rceil$ trits
(with $\log_3(2) \approx 0.63093$). Since a is classically known, we
strive to improve on the ternary ripple-carry adder of [8]
by folding in the trits of a. However, we are no longer
able to encode all of the quantum information for b and
the carry on the same qutrit. The additive shift thus requires
roughly $2m - w_1(a)$ qutrits to run (where $w_1(a)$
is the number of trits equal to 1 in the ternary expansion
of a).
The ternary additive shift in this design has somewhat
higher non-Clifford time cost compared to the emulated
binary shift of section III A.
For the classical additive shift we do not physically encode
the trits of a and instead pre-compile different generalized
carry circuits for different values of these trits.
Tables I and II show the truth tables for the consecutive
carry $c_{i+1}$ given, respectively, $a_i = 1$ and $a_i = 0$ (the case
of $a_i = 2$ is symmetric to the case $a_i = 0$ and yields the same
conclusions).
The case of ai = 1 does not require any ancillary
qutrits since the ci+1 is a balanced binary function that
can be produced reversibly on the pair of qutrits encoding
ci and bi by ternary SWAP gate followed by |01〉 ↔ |20〉.
However, in the case of ai = 0 the ci+1 = 0 in five
cases (respectively, for ai = 2 the ci+1 = 1 in five cases)
and such five basis vectors cannot be represented in two-
qutrit state space. These cases thus require an ancillary
qutrit to encode ci+1.
In the case of ai = 0 we simply take the ancilla in the
|0〉 state and apply doubly-controlled INC gate with the
ternary control on ci and binary control on bi. In the
case of ai = 2 it suffices to additionally use the Clifford
τ|0〉,|1〉 gate on the ci+1.
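The case analysis above can be reproduced by enumerating the carry truth tables. The sketch assumes that the incoming carry c_i is a bit and b_i is a trit, so that each table has six rows; Tables I and II themselves are not reproduced in this section, and the function names are hypothetical.

```python
def next_carry(a_i, b_i, c_i):
    """Carry out of one ternary digit position."""
    return (a_i + b_i + c_i) // 3

def carry_histogram(a_i):
    """Number of inputs (c_i, b_i) producing each value of c_{i+1}."""
    counts = {0: 0, 1: 0}
    for c_i in (0, 1):
        for b_i in (0, 1, 2):
            counts[next_carry(a_i, b_i, c_i)] += 1
    return counts

histograms = {a_i: carry_histogram(a_i) for a_i in (0, 1, 2)}
```

The result is `{0: 5, 1: 1}` for a_i = 0, `{0: 3, 1: 3}` for a_i = 1 and `{0: 1, 1: 5}` for a_i = 2. Only the balanced case a_i = 1 fits reversibly into two qutrits, since a fixed value of c_{i+1} leaves just three two-qutrit basis states to distinguish five inputs.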
Assuming a is generic with $w_1(a) \approx m/3$ we get an average
width of the additive shift circuit of roughly $(5/3)\,m$,
which eliminates the space savings afforded by the denser
ternary encoding ($(5/3) \log_3(2) \approx 1.05$).
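The width estimate can be evaluated for a concrete shift constant. The sketch below computes m, the weight w1(a) and the width 2m − w1(a) as stated above; the function names and the sample value a = 200 are our own.

```python
from math import ceil, log

def trits(a):
    """Ternary digits of a, least significant first."""
    digits = []
    while a:
        a, d = divmod(a, 3)
        digits.append(d)
    return digits or [0]

def shift_width(a, n):
    """Return (m, 2m - w1(a)) for an n-bit shift constant a."""
    m = ceil(n * log(2, 3))
    w1 = trits(a).count(1)
    return m, 2 * m - w1

m, width = shift_width(200, 8)
average_ratio = (5 / 3) * log(2, 3)  # average qutrits per bit when w1(a) ≈ m/3
```

For a = 200 and n = 8 we get m = 6 and a width of 10 qutrits, while `average_ratio` is about 1.05 qutrits per bit, compared with the n + 2 qutrits of the emulated binary shift of section III A.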
Let us now make the case for the second observation.
[...] proven practically optimal). We start by assessing the
clean magic state counts for simple uncontrolled additive
shift. We note that for any classical value of the ai trit the
non-Clifford cost of the carry gate is the same and equals
15 clean magic states. Indeed, depending on $a_i$, and in the
terminology of [8], we need either one gate of $S_{01,10}$ type
or one gate of $C_0(\mathrm{SUM})$ type. In subsection 5.1 of
[8] both types are reduced to 5 binary-controlled increments
and consequently to 15 P9 gates. The concluding
trit-wise addition is done by Clifford SUMs at negligi-
ble cost. Thus the overall cost of the circuit is roughly
30m ≈ 19n P9 gates. Allowing an ancillary qutrit, the
P9-depth of the circuit can be made equal to 10m > 6n.
Adding one ternary control to the circuit turns all the
finalizing SUMs into “Horner” gates Λ(SUM), which overall
adds 4m P9 gates and brings the total non-Clifford
cost to 34m > 21n P9 gates.
A subtle point discussed in section III E below is that
the second control that is routinely added to the additive
shift gate Sa is in fact a strict control that turns it into a
Cf (Sa) gate, f ∈ {1, 2}, where Sa is activated only by
the control basis state |f〉. This turns each of the m
“Horner” gates into a four-qutrit Cf (Λ(SUM)) gate. We
do not have an available ancilla-free design for a synthesis
of this gate.
Our best design described in Proposition 11 sets the
non-Clifford cost at 23 P9 gates given one clean ancilla.
Thus adding the required second (strict) control in-
flates the overall cost of the ternary circuit to 53m > 33n
P9 gates.
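The per-trit and per-bit counts quoted above can be cross-checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes m ≈ log3(2) n, and the names `PER_TRIT` and `per_bit_cost` are illustrative, not taken from the paper.

```python
import math

# Trits needed per bit, using m ~ log3(2) * n (about 0.6309).
LOG3_2 = math.log(2, 3)

# P9 counts per trit quoted in the text for the ternary ripple-carry shift.
# The doubly-controlled figure replaces no cost in the simple circuit; it adds
# the 23 P9 gates of the strictly controlled Horner gate to the 30 per trit.
PER_TRIT = {"simple": 30, "controlled": 30 + 4, "doubly_controlled": 30 + 23}

def per_bit_cost(per_trit: int) -> float:
    """Convert a per-trit P9 count into a per-bit count."""
    return per_trit * LOG3_2

costs = {name: per_bit_cost(c) for name, c in PER_TRIT.items()}

for name, value in costs.items():
    print(f"{name}: {PER_TRIT[name]}m is about {value:.2f}n P9 gates")
```

The three values land just below 19n and just above 21n and 33n, in line with the inequalities stated in the text.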
Again, with available ancilla the circuit can be
restacked to P9-width of 3 reducing the P9-depth by the
factor of 3 (to roughly 19m in case of doubly-controlled
additive shift). is less than the non-Clifford cost of the
emulated binary n-bits doubly-controlled additive shift.
The comparative cost of the binary and ternary options
is summarized in table III.
We demonstrate in Section III D that the best-
known ternary-controlled modular shift circuit requires
4 instead of 3 additive shift blocks on roughly half of the
modular addition cases, so, in the context of the required
modular addition, the emulated binary encoding appears
to be a practical win-win when a low width ripple-carry
adder is used.
C. Circuits for carry lookahead additive shift
The resource layout is different for known carry looka-
head solutions. For the sake of space we forego detailed
analysis and only sketch the big picture.
Table I: Truth table for ci+1 given ai = 1
ci    | 0 0 0 1 1 1
bi    | 0 1 2 0 1 2
ci+1  | 0 0 1 0 1 1

Table II: Truth table for ci+1 given ai = 0
ci    | 0 0 0 1 1 1
bi    | 0 1 2 0 1 2
ci+1  | 0 0 0 0 0 1
We assume that the integers a and b are encoded as
either bit strings with at most n bits or as trit strings with
at most m = ⌈log3(2) n⌉ trits. We use carry lookahead
additive shifts based on the in-place multi-qubit carry
lookahead adder [20] and the in-place multi-qutrit carry
lookahead adder [8].
The non-Clifford depths of the corresponding circuits
are 4 log2(n) and 4 log2(m) respectively up to small ad-
ditive constants.
Because log2(m) = log2(n) + log2(log3(2)), there is no
substantial difference in non-Clifford depths. The non-
Clifford layers of the binary adder are populated with
Toffoli gates and for the ternary adder they are populated
with carry status merge/unmerge widgets (the M and
M† widgets of [8]). The cost of ancilla-free emulation
of the former or, respectively, execution of the latter is
identical with 15 P9 gates.
When levels of control are added to the shift circuit,
putting ternary control on ternary widgets is more expensive
than building multi-controlled Toffoli gates, as the
discussion in Section III A implies. But in the context
of carry lookahead circuits the multi-controlled gates are
located in just two layers out of O(log(n)) thus the im-
pact of this cost distinction is both asymptotically and
practically negligible.
Note that the widths of the binary and ternary circuits
are roughly proportional to n and m = ⌈log3(2) n⌉,
respectively. This means that the width of the purely ternary
solution is smaller by a factor of roughly m/n ≈ log3(2).
depth overhead percentage is moderate, we should prefer
purely ternary encoding when implementing Shor’s
period finding on a small quantum computer.
D. Circuits for modular additive shifts
We review the layout of the modular additive shift and the
controlled modular additive shift in both emulated binary
and genuine ternary setups.
Let N ≫ 0 and a < N be classically known integers.
The commonly used scheme to compute the quantum
modular additive shift |b〉 ↦ |(a + b) mod N〉 is to compute
|a + b〉, figure out whether a + b < N and, if not,
then subtract N. In order to do it coherently without
measurement we need to
1. Speculatively compute the |(a−N)+b〉 shift; struc-
ture it so that the top carry bit cn+1 is 1 iff
(a−N) + b < 0.
2. Copy cn+1 to a clean ancilla x.
3. Apply the shift by +N controlled by the ancilla x.
4. Clean up the ancilla x.
Surprisingly, the last step is less than trivial. We need
to compare the encoded integer |y〉 after step 3) to a.
Then y ≥ a if and only if cn+1 = 1. Therefore we must
flip the ancilla if and only if y ≥ a. We do this by taking
a circuit for comparison to classical threshold and wiring
the NOTx into it in place of the top carry qubit. It
is easy to see that performing the comparison circuit has
exactly the desired effect on the ancilla x. A top-level
layout of the modular additive shift is shown in Figure
4. We note that the three-stage layout shown in the
Figure is not entirely new. It is very similar to designs
proposed in [45] and [44]. Clearly the non-Clifford depth
of this scheme is roughly triple the non-Clifford depth
of the additive shift circuit in either binary or ternary
framework.
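The four steps above, including the ancilla cleanup, can be traced classically on basis states. The following Python sketch assumes 0 ≤ a < N and 0 ≤ b < N; the function name `modular_shift` is illustrative, and the quantum circuit would of course act on superpositions rather than on single integers.

```python
from typing import Tuple

def modular_shift(b: int, a: int, N: int) -> Tuple[int, int]:
    """Classically trace steps 1-4 of the modular additive shift.

    Returns (result, ancilla). The ancilla must end in 0 for the
    circuit to be usable coherently. Assumes 0 <= a < N and 0 <= b < N.
    """
    y = b + (a - N)          # step 1: speculative shift by a - N
    x = 1 if y < 0 else 0    # step 2: copy the top carry (sign) to ancilla x
    if x:                    # step 3: shift by +N controlled on x
        y += N
    if y >= a:               # step 4: comparison to a flips x back to 0
        x ^= 1
    return y, x

# Small usage example with hypothetical values.
print(modular_shift(b=9, a=5, N=11))
```

In this trace the comparison in step 4 fires exactly when the correction in step 3 happened, which is why the ancilla always returns to 0.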
In the context of ternary encoding of integers and al-
lowing for ternary control the logic turns out to be more
involved. Depending on whether 2 a < N or not, which is
known at compilation time, we need to compile two dif-
ferent circuits. When 2 a < N we need to speculatively
precompute b + c a − N where c is the quantum value
of the control trit. This is different from adding ternary
control to the additive shift +(a−N). A straightforward
way to do this is by taking the controlled shift +c (a−N)
followed by strictly controlled shift C2(+N). Aside from
this additional shift box, the circuit in Figure 4 still works
as intended, which is easy to establish: the speculative
b + c a − N is corrected back to b + c a if and only if
the eventual result is ≥ c a which is the condition for the
ancilla cleanup.
When 2 a > N we can precompile ternary control on
the entire +(a − N) box, which then precomputes the
y = b + c(a − N) for us. However, here we still get
some overhead compared to the binary encoding context.
Indeed, we need to correct the speculative state y to y =
b + c(a − N) + N when y < 0 and it is easily seen that
the result is ≥ c(a − N) + N if and only if y was negative
and the correction happened. Thus the ancilla cleanup
threshold is t = c(a − N) + N on this branch. Since c is the
quantum control trit, the comparison to t is somewhat
more expensive to engineer than comparison to c a.

Table III: Cost of ripple-carry additive shift: ternary vs. emulated binary. n is the bit size of the arguments.
Circuits                           | #P9: emulated binary | #P9: ternary
Simple additive shift              | 12n                  | 19n
Controlled additive shift          | 18n                  | > 21n
Doubly-controlled additive shifts  | 24n                  | > 33n

[Figure 4 circuit: wires |b〉, |0〉, |0〉; boxes +(a − N), +N and the comparison ≥ a?]
Figure 4: Top-level layout of modular additive shift for binary encoding.
To summarize, a purely ternary modular shift cir-
cuit allowing for ternary control would be similar to one
shown in Figure 5, where the extra dashed C2(+N) box
is inserted at compilation time when 2 a < N . The latter
case constitutes the critical path where we have to use
an equivalent of 4 additive shifts instead of 3.
E. Circuits for modular exponentiation
For modular exponentiation |k〉|1〉 ↦ |k〉|a^k mod N〉
we follow the known implementation proposed in the first
half of Ref. [49]. Our designs are also motivated in part
by Ref. [14].
We denote by d the dimension of the single qudit. d is
assumed to be either 2 or 3 where it matters. Suppose
that a,N are classically known integers a < N , and n is
an integer approximately equal to logd(N).
Suppose |k〉 is quantumly encoded, where $k = \sum_{j=0}^{2n-1} k_j d^j$
is the base-d expansion of k and the $k_j$ are the corresponding
qudit states. First, we observe that
\[ a^k \bmod N = \prod_{j=0}^{2n-1} \bigl(a^{d^j} \bmod N\bigr)^{k_j} \bmod N. \tag{13} \]
Note that $(a^{d^j} \bmod N)$ are 2n classical values that are
known and easily pre-computable at compilation time.
Thus $|a^k \bmod N\rangle$ is computed as a sequence of modular
multiplicative shifts, each quantumly controlled by the
$|k_j\rangle$.
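Suppose we have computed the partial product
\[ p_{k,m} = \prod_{j=0}^{m} \bigl(a^{d^j} \bmod N\bigr)^{k_j} \bmod N, \]
and let
\[ p_{k,m} = \sum_{\ell=0}^{n-1} p_{k,m,\ell}\, d^{\ell} \]
be the base-d expansion of $p_{k,m}$. Then
\[ p_{k,m+1} = \sum_{\ell=0}^{n-1} p_{k,m,\ell}\, \bigl(d^{\ell} a^{d^{m+1}} \bmod N\bigr)^{k_{m+1}} \bmod N. \]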
Observe, again, that
\[ \bigl\{ \bigl(d^{\ell} a^{d^{m+1}} \bmod N\bigr)^{f} \bmod N \;\big|\; f \in [1..d-1] \bigr\} \tag{14} \]
is the set of fewer than d pre-computable classical values
known a priori. Therefore, promoting $p_{k,m}$ to $p_{k,m+1}$
is performed as a sequence of modular additive shifts,
controlled by $p_{k,m,\ell}$ and $k_{m+1}$.

[Figure 5 circuit: wires |b〉, |0〉, |0〉; boxes c(a − N), C2(+N), +N and the comparison ≥ t?]
Figure 5: Top-level layout of ternary modular additive shift. In case 2 a < N the circuit is compiled with the
additional C2(+N) shift controlled on c = 2 and using the threshold t = c a. In case 2 a > N the additional shift is
not needed, but the threshold is t = c(a − N) + N.
Herein lies a subtle difference between the case of d = 2
and the case of d > 2 (e.g. d = 3). In the case of d = 2
we do the modular shift by $2^{\ell} a^{2^{m+1}} \bmod N$ if and only
if $p_{k,m,\ell} = k_{m+1} = 1$. Thus the corresponding gate is
simply the doubly-controlled modular additive shift.
In case of d > 2 the d− 1 basis values of km+1 lead to
modular additive shift by one of the d−1 potentially dif-
ferent values listed in the equation (14). Thus we need a
(d−1)-way quantum switch capable of selection between
the listed values. Let Sf , f ∈ [1..d − 1] be the modu-
lar additive shift by the f -th value in (14). Then the
desired switch can be realized coherently as the product
C1(S1) · · ·Cd−1(Sd−1) where Cf (Sf ) is the Sf activated
only by the basis state km+1 = |f〉.
This implies the following difference in the circuit
makeup between the case of d = 2 and the case of d = 3.
For d = 2 modular exponentiation takes roughly $2n^2$
doubly-controlled modular additive shifts; for d = 3 it
takes roughly $4m^2$ doubly-controlled modular additive
shifts (where m is the trit size of the arguments), each
with one ternary and one strict control on one of the two
ternary values.
When comparing the option of performing the circuit
in emulated binary encoding against the option of running
it in true ternary encoding we find a practical dead
heat between the two options in terms of circuit depth.
Indeed, in counting the number of doubly-controlled additive
shift boxes we find that $4m^2 = 2(\log_3(2))^2 (2n^2) \approx
0.796 \times (2n^2)$. But we should be aware of a possible factor
4/3 overhead in the number of additive shifts per
ternary modular shift as suggested, for example, by Figure 5.
(Of course 4/3 × 0.796 ≈ 1.06.)
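The two ratios in this paragraph follow from m ≈ log3(2) n alone. A minimal Python sketch of the arithmetic, where the function name `shift_box_ratio` is an illustrative choice:

```python
import math

LOG3_2 = math.log(2, 3)  # trits per bit

def shift_box_ratio(extra_shift_factor: float = 1.0) -> float:
    """Ratio of ternary (4 m^2) to binary (2 n^2) doubly-controlled shift counts.

    Uses m ~ log3(2) * n. The optional factor models additional additive
    shifts per ternary modular shift (for example 4/3).
    """
    return 2 * LOG3_2 ** 2 * extra_shift_factor

print(round(shift_box_ratio(), 3), round(shift_box_ratio(4 / 3), 3))
```

With the 4/3 factor included the ratio moves from slightly below one to slightly above one, which is the "dead heat" described in the text.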
To summarize, solutions based on emulation of binary
ripple-carry adders are still win-win over the comparable
true ternary ripple-carry designs in the context of the
modular exponentiation; when carry lookahead adders
are used, the two options have nearly identical non-
Clifford depth numbers, but there is notable width reduc-
tion advantage (factor of log3(2)) of using true ternary
solution over the emulated binary one.
F. Circuits for quantum Fourier transform
In the solutions for period finding discussed so far, the
quantum cost is dominated by the cost of modular expo-
nentiation represented by an appropriate reversible clas-
sical circuit. In this context just a fraction of the cost
falls onto the quantum Fourier transform. Nevertheless,
for the sake of completeness we discuss some designs for
emulating binary quantum Fourier transform on ternary
computers and implementing ternary Fourier transform
directly in ternary logic.
Odd radix Fourier transforms appeared in earlier quan-
tum algorithm literature. In particular [50] outlines
the benefits of “trinary” (ternary) Fourier for low-width
Shor factorization circuits and also briefly sketches how
ternary Fourier transform can be emulated in multi-qubit
framework. On a more general level, Ref. [24] describes
quantum Fourier transform over Zp. In Subsection III F 2
we develop specific circuitry for a version of such a trans-
form over Zp where p is some integer power of 3.
1. The case of emulated binary
A familiar binary circuit for approximate Fourier transform
in dimension $2^n$ with precision δ consists of roughly
Θ(n log(n/δ)) controlled phases and n binary Hadamard
gates (see [38], Section 5). In known fault-tolerant binary
frameworks, the phases $e^{\pi i/2^k}$, k ∈ Z, occurring in
the Fourier transform have to be treated just like generic
phases. Of all the possible ways to emulate a controlled
phase gate we will focus on just one with minimal parametric
cost. This is the one with one clean ancilla, two
Toffoli gates and one uncontrolled phase gate. (It is not
clear when exactly this design was invented, but cf.
[48], Section 2 for a more recent discussion.)
Given the control qubit |c〉 and target qubit |t〉 the con-
trolled phase gate C(P (ϕ)), |ϕ| = 1 is emulated by ap-
plying Toffoli(I ⊗ I ⊗ P (ϕ)) Toffoli to the state |c〉|t〉|0〉.
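Because both the Toffoli gate and P(ϕ) map basis states to basis states up to a phase, this construction can be checked on the four basis inputs |c〉|t〉|0〉. The Python sketch below does so with a tiny dictionary-based state representation; the helper names are illustrative and the value of ϕ is an arbitrary test phase.

```python
import cmath
from itertools import product

def toffoli(state):
    """Flip the third bit when the first two bits are both 1."""
    return {(c, t, x ^ (c & t)): amp for (c, t, x), amp in state.items()}

def phase_on_third(state, phi):
    """Apply I (x) I (x) P(phi): multiply by phi when the third bit is 1."""
    return {(c, t, x): amp * (phi if x else 1) for (c, t, x), amp in state.items()}

def emulated_controlled_phase(c, t, phi):
    """Apply Toffoli, then P(phi) on the ancilla, then Toffoli to |c, t, 0>."""
    state = {(c, t, 0): 1 + 0j}
    return toffoli(phase_on_third(toffoli(state), phi))

phi = cmath.exp(0.3j)
for c, t in product((0, 1), repeat=2):
    print((c, t), emulated_controlled_phase(c, t, phi))
```

The ancilla returns to |0〉 in every case, and the phase ϕ appears only on the input with c = t = 1, which is the action of C(P(ϕ)).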
Ternary emulation of Toffoli gate is discussed in detail
in Section IV. Somewhat surprisingly, ternary emula-
tion of uncontrolled phase gates in practice incurs larger
overhead than emulation of classical gates. Also the
binary Hadamard gate is a Clifford gate in the binary
framework, but cannot be emulated by a ternary Clif-
ford circuit. This introduces additional overhead factor
of (1 + Θ(1/ log(1/δ))).
2. The case of true ternary
We develop our own circuitry for the QFT over $Z_{3^n}$ based
on the textbook Cooley-Tukey procedure.
Quantum Fourier transform in the n qutrit state space
is given by the unitary matrix
\[ \mathrm{QFT}_{3^n} = \bigl[\zeta_{3^n}^{\,jk}\bigr] \tag{15} \]
where $\zeta_{3^n}$ is the primitive $3^n$-th root of unity. In particular the
$\mathrm{QFT}_3$ coincides with the ternary (Clifford) Hadamard
gate.
The following recursion for n > 1 is verified by straightforward
direct computation:
\[ \mathrm{QFT}_{3^n} = \Pi_n\, \mathrm{QFT}_{3^{n-1}}\, (\Lambda(D_n))\, \mathrm{QFT}_3 \tag{16} \]
where $\Pi_n$ is a certain n-qutrit permutation,
\[ D_n = \mathrm{diag}\bigl(1, \zeta_{3^n}, \ldots, \zeta_{3^n}^{3^{n-1}-1}\bigr) \tag{17} \]
and where Λ is the ternary control.
By further direct computation we observe that
\[ D_n = \prod_{k=0}^{n-2} \mathrm{diag}\bigl(1, \zeta_{3^n}^{3^k}, \zeta_{3^n}^{2\times 3^k}\bigr). \tag{18} \]
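The factorization of $D_n$ can be checked entry by entry. The Python sketch below makes two assumptions that are not spelled out in the text: the root of unity is taken as exp(2πi/3^n), and the k-th factor of the product is read as acting on the qutrit that carries the trit of weight 3^k. The function names are illustrative.

```python
import cmath

def zeta(n: int) -> complex:
    """Primitive 3^n-th root of unity, taken here as exp(2*pi*i / 3^n)."""
    return cmath.exp(2j * cmath.pi / 3 ** n)

def d_entry_direct(n: int, j: int) -> complex:
    """j-th diagonal entry of D_n, read off from Eq. (17)."""
    return zeta(n) ** j

def d_entry_factored(n: int, j: int) -> complex:
    """Same entry built as in Eq. (18): one single-qutrit phase per trit of j."""
    entry = 1 + 0j
    for k in range(n - 1):
        trit = (j // 3 ** k) % 3            # trit of weight 3^k in j
        entry *= zeta(n) ** (trit * 3 ** k)  # phase 1, zeta^(3^k) or zeta^(2*3^k)
    return entry

n = 4
worst = max(abs(d_entry_direct(n, j) - d_entry_factored(n, j))
            for j in range(3 ** (n - 1)))
print(worst)
```

The agreement is just the statement that the exponent j is the sum of its trits weighted by powers of 3, so the phase splits into one single-qutrit factor per trit.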
The permutation gate Πn is not computationally im-
portant, since it amounts to O(n) qutrit swaps which are
all ternary Clifford.
Aside from this tweak we have decomposed $\mathrm{QFT}_{3^n}$ recursively
into $\Theta(n^2)$ gates of the form
$\Lambda(\mathrm{diag}(1, \zeta_{3^m}^{3^k}, \zeta_{3^m}^{2\times 3^k}))$
which are ternary analogs of familiar controlled phase
gates.
Similar to the binary case, it is known in general (cf.
[24]) that once we are allowed to approximate the QFT to
some fidelity 1 − δ, we can compute the approximate QFT
with $\Theta(n \log(n/\delta) + \log(1/\delta)^2)$ gates. This is because
controlled phase gates with phases in some O(δ/n) can
be dropped from the circuit without compromising the
fidelity.
3. Implementation of binary and ternary controlled phase
gates in the Clifford+R|2〉 basis
In the ternary framework a gate $P(\varphi) = |0\rangle\langle 0| + \varphi\,|1\rangle\langle 1|$, $|\varphi| = 1$,
can be emulated exactly by the balanced two-level gate
$P'(\varphi) = |0\rangle\langle 0| + \varphi\,|1\rangle\langle 1| + \varphi^{-1}|2\rangle\langle 2|$, which is a composition
of the Clifford reflection $H^2$ and the non-Clifford
reflection $P''(\varphi) = |0\rangle\langle 0| + \varphi\,|1\rangle\langle 2| + \varphi^{-1}|2\rangle\langle 1|$. Also,
the binary Hadamard gate $h = (|0\rangle\langle 0| + |0\rangle\langle 1| + |1\rangle\langle 0| - |1\rangle\langle 1|)/\sqrt{2}$
is a two-level Householder reflection. As per
[6],[7], both $P''(\varphi)$ and h can be effectively approximated
to precision δ by Clifford+$R_{|2\rangle}$ circuits with R-counts
$\le C \log_3(1/\delta) + O(\log(\log(1/\delta)))$, with the constant C
between 5 and 8. For reference, in the Clifford+T framework
the T-count of a δ-approximation of a generic phase
gate is in $3 \log_2(1/\delta) + O(\log(\log(1/\delta)))$.
Thus, emulation of the binary circuit for a binary
Fourier transform incurs no surprising costs.
In pure ternary encoding we need to implement the
ternary analog of controlled phase gate: gates of the form
Λ(diag(1, ϕ, ϕ2)), |ϕ| = 1. This is not difficult after some
algebraic manipulation:
Proposition 3. Given a phase factor ϕ, |ϕ| = 1 and
an arbitrarily small δ > 0 the gate Λ(diag(1, ϕ, ϕ2)) can
be effectively approximated to precision δ by a metaplectic
circuit with at most 40 (log3(1/δ)+O(log(log(1/δ)))) R|2〉
gates.
Alternatively such a δ-approximation can be effec-
tively achieved by a metaplectic circuit with at most
24 (log3(1/δ) + O(log(log(1/δ)))) R|2〉 gates and a fixed-
cost widget with at most 30 P9 gates.
Proof. We note that
\[ \Lambda(\mathrm{diag}(1,\varphi,\varphi^2)) = \varphi\;\mathrm{diag}(1,1,1,\varphi^*,1,\varphi,1,1,1)\;
\mathrm{diag}(1,1,1,1,1,1,(\varphi^*)^2,1,\varphi^2)\;(\mathrm{diag}(\varphi^*,1,\varphi)\otimes I). \tag{19} \]
Each of the three factors in this decomposition is a
product of two two-level reflections. It is also notable
that one particular reflection, the $\tau_{|0\rangle,|2\rangle}$ coming from
$\mathrm{diag}(\varphi^*,1,\varphi) = \tau_{|0\rangle,|2\rangle}(\varphi\,|0\rangle\langle 2| + |1\rangle\langle 1| + \varphi^*|2\rangle\langle 0|)$, is in
fact ternary Clifford. Therefore we have a total of
five non-Clifford reflections in this decomposition, two of
which are non-parametric classical reflections.
As per [7] any two-level reflection can be effectively
(δ/5)-approximated by a metaplectic circuit with at most
$8(\log_3(1/\delta) + O(\log(\log(1/\delta))))$ $R_{|2\rangle}$ gates, and this can
be applied to all five non-Clifford reflections. Alternatively,
each of the two classical ones can be represented
exactly as per [8] using five C2(INC) or, respectively, 15
P9 gates.
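The diagonal identity (19) used in this proof can be confirmed numerically. The Python sketch below assumes that the ternary control Λ(U) applies the c-th power of U when the control trit equals c, in line with the definition of the Horner gate Λ(SUM) used elsewhere in the paper; the function names are illustrative.

```python
import cmath

def lam_diag(phi: complex):
    """Diagonal of Lambda(diag(1, phi, phi^2)): control value c applies the c-th power."""
    return [phi ** (c * t) for c in range(3) for t in range(3)]

def decomposition_19(phi: complex):
    """Diagonal of the right-hand side of Eq. (19)."""
    ps = phi.conjugate()
    f1 = [1, 1, 1, ps, 1, phi, 1, 1, 1]
    f2 = [1, 1, 1, 1, 1, 1, ps ** 2, 1, phi ** 2]
    f3 = [x for x in (ps, 1, phi) for _ in range(3)]  # diag(phi*, 1, phi) (x) I
    return [phi * a * b * c for a, b, c in zip(f1, f2, f3)]

phi = cmath.exp(0.7j)  # arbitrary unit-modulus test phase
print(max(abs(l - r) for l, r in zip(lam_diag(phi), decomposition_19(phi))))
```

The check relies on |ϕ| = 1, so that ϕ* equals the inverse of ϕ.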
Thus implementation of either version of QFT circuit
is never a cost surprise in the metaplectic Clifford+R|2〉
basis.
Although numerically the R-depth of the required
approximation circuits is a good factor higher than the
T-depth of corresponding circuits required in the Clifford+T
framework, we need to keep in mind that the R|2〉 gate
is significantly easier to execute on a natively metaplectic
computer since, unlike the T gate, it does not require
magic state distillation.
4. Implementation of binary and ternary controlled phase
gates in the Clifford+P9 basis
At the time of this writing emulation of QFT circuits
on a generic ternary computer is not entirely straightfor-
ward.
First of all, we currently do not know an efficient di-
rect circuit synthesis method for Householder reflections
in the Clifford+P9 basis. It follows from [9] that any
ternary unitary gate can be also approximated to preci-
sion δ by an ancilla-free Clifford+P9 circuit of depth in
O(log(1/δ)); but we do not have a good effective proce-
dure for finding ancilla-free circuits of this sort, neither
do we have a clear idea of the practical constant hidden
in the O(log(1/δ)).
As a bridge solution, we show in Appendix A that
the requisite magic state |ψ〉 (see eq. (9)) for the gate
R|2〉 can be emulated exactly and coherently by a set of
effective repeat-until-success circuits with four ancillary
qutrits and expected average P9-count of 27/4. Thus we
can approximate a required uncontrolled phase gate with
an efficient Clifford+R|2〉 circuit and then transcribe the
latter into a corresponding ancilla-assisted probabilistic
circuit over the Clifford+P9 basis. In order to have a
good synchronization with the Clifford+R|2〉 circuit ex-
ecution it would suffice to have the magic state prepa-
ration coprocessor of width somewhat greater than 27.
Since the controlled phase gates and hence the approxi-
mating Clifford+R|2〉 circuits are performed sequentially
in the context of the QFT, this coprocessor is shared
across the QFT circuit and thus the width overhead is
bound by a constant.
On balance, we conclude that ternary execution of
the QFT is likely to be more expensive in terms of re-
quired non-Clifford units, than, for example, comparable
Clifford+T implementation. However the non-Clifford
depth overhead factor over Clifford+T is upper bounded
by an (α + Θ(1/ log(1/δ))) where α is a small constant.
Such overhead becomes practically valid, however, when
hosting period-finding solutions that make heavy use of
Fourier transform, such as for example the Beauregard
circuit [3] (see Appendix C for a further brief discussion).
G. Comparative cost of ternary emulation vs. true
ternary arithmetic
With the current state of the art ternary arithmetic
circuits, modular exponentiation (and hence Shor’s pe-
riod finding) is practically less expensive with emulated
binary encoding in low width (e.g. small quantum com-
puter); however, when O(m2 log(m)) depth is desired,
pure ternary arithmetic allows for width reduction by a
factor of log3(2) compared to emulated binary circuits,
while requiring essentially the same non-Clifford depth.
IV. IMPLEMENTING REFLECTIONS ON
GENERIC TERNARY AND METAPLECTIC
TOPOLOGICAL QUANTUM COMPUTERS
State of the art implementation of the three-qubit bi-
nary Toffoli gate assumes the availability of the Clif-
ford+T basis [38]. It has been known for quite some time
cf. [1] that a Toffoli gate can be implemented ancilla-free
using a network of CNOTs and 7 T±1 gates. It has been
shown in [21] that this is the minimal T -count for ancilla-
free implementation of the Toffoli gate.
In Section IV A we develop emulations of classical two-
level reflections (which generalize Toffoli and Toffoli-like
gates) on generic ternary computer endowed with the
Clifford+P9 basis as described in Section II C. We also
introduce purely ternary tools necessary for implement-
ing controlled versions of key gates for ternary arithmetic
proposed in [8]. ancillas. This implies of course an emu-
lation of the three-qubit Toffoli gate with 6 P9 gates and
one clean ancilla.
In Section IV B we reevaluate the emulation cost
assuming a metaplectic topological quantum computer
(MTQC) with Clifford+R|2〉 basis as described in Sec-
tion II D. In that setup we get two different options both
for implementing non-Clifford classical two-way transpo-
sitions (including the Toffoli gate) and for circuitizing key
gate for proper ternary arithmetic.
One is direct approximation using Clifford+R|2〉 cir-
cuits. The other is based on the P9 gate but it uses
magic state preparation in the Clifford+R|2〉 basis in-
stead of magic state distillation. This is explained in
detail in Subsection IV B. The first option might be ideal
for smaller quantum computers. It allows circuits of
fixed widths but creates implementation circuits for Tof-
foli gates with the R-count of approximately 8 log3(1/δ)
when 1− δ is the desired fidelity of the Toffoli gate. The
second option supports separation of the cost of the P9
gate into the “online” and “offline” components (similar
to the Clifford+T framework) with the “online” compo-
nent depth in O(1) and the “offline” cost offloaded to a
state preparation part of the computer, which has the
width of roughly 9 log3(1/δ) qutrits but does not need
to remain always coherent.
A. Implementing classical reflections in the
Clifford+P9 basis
The synthesis described here is a generic ternary coun-
terpart of the exact, constant T -count representation of
the three-qubit Toffoli gate in the Clifford+T framework.
One distinction of the ternary framework from the bi-
nary one is that not all two-qutrit classical gates are Clif-
ford gates. In particular the τ|10〉,|11〉 reflection which is
a strict emulation of the binary CNOT is not a Clifford
gate; neither is the τ|10〉,|01〉 which is a strict emulation
of the binary SWAP. However, while binary SWAP
can be emulated simply as a restriction of the (Clifford)
ternary swap on binary subspace, the CNOT cannot be
so emulated.
A particularly important two-qutrit building block is
the following non-Clifford gate
C1(INC)|j〉|k〉 = |j〉|(k + δj,1) mod 3〉.
A peculiar phenomenon in multi-qudit computation (in
dimension greater than two) is that a two-qudit classical
non-Clifford gate (such as C1(INC)) along with the INC
gate is universal for the ancilla-assisted reversible clas-
sical computation, cf. [12], whereas a three-qubit gate,
such as Toffoli is needed for the purpose in multi-qubit
case.
The following is a slight variation of a circuit from [8]:
\[ \tau_{|02\rangle,|20\rangle} = \mathrm{TSWAP}\; C_1(\mathrm{INC})_{2,1}\, C_1(\mathrm{INC})_{1,2}\,
C_1(\mathrm{INC})_{2,1}\, C_1(\mathrm{INC})_{1,2}\, C_1(\mathrm{INC})_{2,1}, \]
where TSWAP is the ternary (Clifford) swap gate. This
suggests using 5 copies of the C1(INC) gate for implementing
a two-level two-qutrit reflection. However, this is inefficient
when we only need to process binary data.
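Since every gate in this circuit is classical, the identity can be checked on the nine two-qutrit basis states. The Python sketch below assumes that the subscripts of C1(INC) read (control, target); because the gate sequence is a palindrome and the target reflection is symmetric under swapping the qutrits, the opposite reading gives the same result. The function names are illustrative.

```python
from itertools import product

def c1_inc(state, control, target):
    """C1(INC): increment the target trit mod 3 when the control trit equals 1."""
    s = list(state)
    if s[control] == 1:
        s[target] = (s[target] + 1) % 3
    return tuple(s)

def tswap(state):
    """Ternary swap of the two qutrits."""
    return (state[1], state[0])

def circuit(state):
    """Five C1(INC) gates (applied right to left as written), then TSWAP.

    Qutrits 1 and 2 of the text are indices 0 and 1 here.
    """
    for control, target in [(1, 0), (0, 1), (1, 0), (0, 1), (1, 0)]:
        state = c1_inc(state, control, target)
    return tswap(state)

def reflection_02_20(state):
    """The two-level reflection exchanging |02> and |20>."""
    return {(0, 2): (2, 0), (2, 0): (0, 2)}.get(state, state)

print(all(circuit(s) == reflection_02_20(s) for s in product(range(3), repeat=2)))
```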
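Proposition 4. The following classical circuit is an exact
emulation of the binary CNOT gate on the binary
data:
\[ \mathrm{SUM}_{2,1}\,(\tau_{|1\rangle,|2\rangle} \otimes \tau_{|1\rangle,|2\rangle})\; \mathrm{TSWAP}\; C_1(\mathrm{INC})_{2,1}\,
C_1(\mathrm{INC}^{\dagger})_{1,2}\,(\tau_{|1\rangle,|2\rangle} \otimes \tau_{|1\rangle,|2\rangle})\,\mathrm{SUM}^{\dagger}_{2,1} \tag{20} \]
Proof. By direct computation.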
The two non-Clifford gates in this circuit are the
$C_1(\mathrm{INC})$ and $C_1(\mathrm{INC}^{\dagger})$. (To avoid confusion, note that
the gate as per Eq. (20) is no longer an axial reflection
on ternary data.)
The $C_1(\mathrm{INC})$ is Clifford-equivalent to the
$C_1(Z) = \mathrm{diag}(1,1,1,1,\omega_3,\omega_3^2,1,1,1)$ gate ($\omega_3 = e^{2\pi i/3}$), and the
latter gate is represented exactly by the network shown
in Figure 6 (up to a couple of local $\tau_{|0\rangle,|1\rangle}$ gates and a
local Q gate).
Plugging in corresponding representations of $C_1(\mathrm{INC})$
and $C_1(\mathrm{INC}^{\dagger})$ into the circuit (20) we obtain an exact
emulation of CNOT that uses 6 instances of the $P_9^{\pm 1}$
gate.
Remark 5. By using an available clean ancilla, we can
exactly represent the C1(Z) in P9-depth one. The cor-
responding circuit is equivalent to one shown in Figure
7. Thus the CNOT gate can be emulated on binary data
using a clean ancilla in P9-depth two.
Thus when depth is the optimization goal, a clean an-
cilla can be traded for triple compression in non-Clifford
depth of ternary emulation of the CNOT. (This rewrite
is similar in nature to the one employed in [26] for the
binary Margolus-Toffoli gate.)
Proposition 6. A three-qubit Toffoli gate can be emu-
lated, ancilla-free, by the following three-qutrit circuit:
(SUM† ⊗ I)(I ⊗ τ|20〉,|21〉)(SUM⊗ I) (21)
This circuit requires 15 P9 gates to implement.
Proof. The purpose of the emulation is to perform the
|110〉 ↔ |111〉 reflection in the binary data subspace.
Having applied the rightmost SUM ⊗ I we find that
(SUM ⊗ I)|110〉 = |120〉, (SUM ⊗ I)|111〉 = |121〉 and
we note that the latter two are the only two transformed
binary basis states to have the second trit equal to 2.
Therefore the I⊗τ|20〉,|21〉 operator affects only these two
transformed states. By uncomputing the SUM ⊗ I we
conclude the emulation.
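The argument in this proof can be replayed classically on the eight binary basis states. The Python sketch below uses the convention SUM|j, k〉 = |j, j + k mod 3〉, which matches the example (SUM ⊗ I)|110〉 = |120〉 given in the proof; the function names are illustrative.

```python
from itertools import product

def sum_gate(s, inverse=False):
    """SUM on trits 1,2: |j,k> -> |j, k + j mod 3> (k - j mod 3 for the inverse)."""
    j, k, l = s
    return (j, (k - j) % 3 if inverse else (k + j) % 3, l)

def tau_20_21(s):
    """I (x) tau_{|20>,|21>}: exchange |20> and |21> on trits 2,3."""
    j, k, l = s
    if k == 2 and l in (0, 1):
        l ^= 1
    return (j, k, l)

def emulated_toffoli(s):
    """Circuit (21), applied right to left: SUM, then tau, then SUM inverse."""
    return sum_gate(tau_20_21(sum_gate(s)), inverse=True)

for a, b, c in product((0, 1), repeat=3):
    print((a, b, c), "->", emulated_toffoli((a, b, c)))
```

Only the inputs with both controls equal to 1 reach the value 2 on the middle trit, so only they are affected by the two-level reflection.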
Importantly and typically we can reduce the emulation
cost by using a clean ancilla. To this end we first prove
the following
Lemma 7. Let U be n-qubit unitary and let the binary-
controlled (n+ 1)-qubit unitary C(U) be emulated in the
binary subspace of an m-qutrit register m > n. Then one
level of binary control can be added to emulate C(C(U))
in an (m+ 2)-qutrit register using 6 additional P9 gates;
one of the new qutrits is a clean ancilla in state |0〉 and
the other new qutrit emulates the binary control.
With one more ancilla the additional P9 gates can be
stacked to P9-depth 2.
Proof. We prove the lemma by explicitly extending the
emulation circuit. Let c1 be a label of the qutrit emu-
lating the control wire of C(U). Let c2 be a label of the
new qutrit to emulate the new control wire. Let a be the
label of the new clean ancilla.
Apply the sequence of gates $C_2(\mathrm{INC})_{c_2,a}\,\mathrm{SUM}_{c_1,c_2}$
(right to left), then use the ancilla a as the control
in the known emulation of C(U), then unentangle:
$\mathrm{SUM}^{\dagger}_{c_1,c_2}\,C_2(\mathrm{INC})^{\dagger}_{c_2,a}$.
The circuit applies correct emulation to the binary sub-
space of the (m + 2)-qutrit register. The correctness is
straightforward: within the binary subspace SUMc1,c2
generates |2〉 on the c2 wire. The C2(INC)c2,a promotes
the ancilla to |1〉 if and only if |c1, c2〉 = |11〉. Therefore
U is triggered only by the latter basis element, which is
the definition of the dual binary control.
The cost estimate follows from the fact that
C2(INC)c2,a and its inverse take 3 P9 gates each.
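The control logic of this construction can be traced on the four binary values of the two control wires. In the Python sketch below the callback `controlled_u` is a hypothetical stand-in for the known emulation of C(U); it only records the value of the ancilla that drives it.

```python
def add_control(c1, c2, controlled_u):
    """Trace the Lemma 7 circuit on binary inputs c1, c2 in {0, 1}.

    controlled_u(flag) stands in for the emulation of C(U), now driven
    by the ancilla instead of c1. Returns the final (c1, c2, ancilla).
    """
    a = 0                                  # clean ancilla
    c2 = (c2 + c1) % 3                     # SUM_{c1,c2}
    a = (a + 1) % 3 if c2 == 2 else a      # C2(INC)_{c2,a}
    controlled_u(a)                        # C(U) controlled by the ancilla
    a = (a - 1) % 3 if c2 == 2 else a      # inverse of C2(INC)_{c2,a}
    c2 = (c2 - c1) % 3                     # inverse of SUM_{c1,c2}
    return c1, c2, a

fired = []
for c1 in (0, 1):
    for c2 in (0, 1):
        print((c1, c2), "->", add_control(c1, c2, fired.append))
print(fired)
```

The recorded flags show that U is triggered only for |c1, c2〉 = |11〉, and all three wires return to their initial values.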
[Figure 6 circuit: sequence of P9 and INC gates]
Figure 6: Exact representation of C1(Z) in terms of P9 gates.
[Figure 7 circuit: INC, INC† and P9 gates with ancillas prepared in |0〉]
Figure 7: Exact representation of C0(Z) in P9-depth one.
Corollary 8. Three-qubit Toffoli gate can be emulated
in four qutrits (allowing one clean ancilla) with 12 P9
gates at P9-depth of 4.
Indeed Toffoli = CC(NOT) and C(NOT) takes 6 P9
gates with no ancillas to emulate as per Proposition 4.
Corollary 9. Four-qubit binary-controlled Toffoli gate
CCC(NOT) can be emulated
1) in six qutrits (allowing two clean ancillas) with 18
P9 gates at P9-depth of 6.
2) in five qutrits (allowing one clean ancilla) with 21
P9 gates.
Proof. For 1), we emulate using Lemma 7 and Corollary
8.
For 2), we emulate using Lemma 7 and Proposition 6.
We will further use the three-qutrit “Horner” gate
Λ(SUM):
Λ(SUM)|i, j, k〉 = |i, j, k+ i j mod 3 〉, i, j, k ∈ {0, 1, 2}
as a tool for adding levels of control to emulated binary
and true ternary gates.
Recall from [8], Figure 18 and discussion, that the best-
known non-Clifford cost of Λ(SUM) is 4 P9 gates at P9-
depth 2.
We now proceed to implement the completely ternary
four-qutrit gate ΛΛ(SUM) using the same construction as
above.
Proposition 10. Label primary qutrits with 1, 2, 3, 4 and
label a clean ancillary qutrit in state |0〉 with 5. Then the
following circuit implements the ΛΛ(SUM) gate on the
primary qutrits:
\[ \Lambda(\mathrm{SUM})^{\dagger}_{1,2,5}\; \Lambda(\mathrm{SUM})_{3,5,4}\; \Lambda(\mathrm{SUM})_{1,2,5} \tag{22} \]
This circuit requires 12 P9 gates to implement.
However, as follows from discussion in Sections III C
and III A, controlled ternary modular exponentiation also
relies on another form of the doubly-controlled SUM gate:
the strictly controlled Horner gate Cf (Λ(SUM)), f ∈
{0, 1, 2} where the Horner gate Λ(SUM) is activated only
by the basis state |f〉 of the top qutrit.
A certain implementation of the Cf (SUM) has been
developed in [8] costing 15 P9 gates.
The following Proposition explains how to insert another
level of ternary control using a cascade of Horner
gates again.
Proposition 11. Label primary qutrits with 1, 2, 3, 4 and
label a clean ancillary qutrit in state |0〉 with 5. Then the
following circuit implements the Cf (Λ(SUM)) gate on
the primary qutrits:
\[ \Lambda(\mathrm{SUM})^{\dagger}_{2,3,5}\; C_f(\mathrm{SUM})_{1,5,4}\; \Lambda(\mathrm{SUM})_{2,3,5} \tag{23} \]
This circuit takes 23 P9 gates to implement.
With one additional ancilla the circuit can be restacked
to have P9-depth of 9.
Let us give a direct proof for transparency.
Proof. By definition, given a four-qutrit state $|j,k,\ell,m\rangle$,
we must have $C_f(\Lambda(\mathrm{SUM}))|j,k,\ell,m\rangle = |j,k,\ell,\, m + \delta_{j,f}\,k\,\ell\rangle$.
After applying the rightmost Horner gate to the clean
ancilla we have the ancilla in the $|k\,\ell\rangle$ state. The
correctness of (23) now follows from the definition of
$C_f(\mathrm{SUM})_{1,5,4}$.
The best known circuitry for the components yields the
cost of 15 + 2 × 4 = 23 P9 gates.
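Circuit (23) is classical, so the proof can also be replayed exhaustively on all basis states. The Python sketch below uses the definition Λ(SUM)|i, j, k〉 = |i, j, k + i j mod 3〉 from the text and reads $C_f(\mathrm{SUM})_{1,5,4}$ as adding qutrit 5 into qutrit 4 when qutrit 1 equals f; the function names are illustrative.

```python
from itertools import product

def horner(s, i, j, t, inverse=False):
    """Lambda(SUM)_{i,j,t}: add (or subtract) s[i]*s[j] to s[t] mod 3."""
    s = list(s)
    sign = -1 if inverse else 1
    s[t] = (s[t] + sign * s[i] * s[j]) % 3
    return tuple(s)

def strict_sum(s, c, src, t, f):
    """C_f(SUM)_{c,src,t}: add s[src] to s[t] mod 3 only when s[c] == f."""
    s = list(s)
    if s[c] == f:
        s[t] = (s[t] + s[src]) % 3
    return tuple(s)

def circuit_23(j, k, l, m, f):
    """Eq. (23) applied right to left; qutrit 5 (index 4) is a clean ancilla."""
    s = (j, k, l, m, 0)
    s = horner(s, 1, 2, 4)                 # ancilla <- k * l
    s = strict_sum(s, 0, 4, 3, f)          # m += ancilla when j == f
    s = horner(s, 1, 2, 4, inverse=True)   # return the ancilla to 0
    return s

ok = all(
    circuit_23(j, k, l, m, f) == (j, k, l, (m + (k * l if j == f else 0)) % 3, 0)
    for j, k, l, m, f in product(range(3), repeat=5)
)
print(ok)
```

Replacing the middle gate by an unconditional Horner gate gives the same compute, use, uncompute pattern as in Proposition 10.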
B. Implementing classical reflections in metaplectic
Clifford+R|2〉 basis
It has been shown in [7] that, given a small enough δ > 0,
any n-qutrit two-level Householder reflection can be
approximated effectively and efficiently to precision δ by
a Clifford+$R_{|2\rangle}$ circuit containing at most
$8 \log_3(1/\delta) + O(\log(\log(1/\delta))) + O((2+\sqrt{5})^n)$ instances of the $R_{|2\rangle}$
gate. In particular, when n = 1 the asymptotic term
$O((2+\sqrt{5})^n)$ resolves to exactly 1 and when n = 2 it
resolves to exactly 4. In both cases it is safe to merge
this term with the O(log(log(1/δ))) term.
The single-qutrit P9 gate is the composition of the
ternary Clifford gate $\tau_{|0\rangle,|2\rangle}$ and the Householder reflection
$\omega_9 |0\rangle\langle 2| + |1\rangle\langle 1| + \omega_9^{-1} |2\rangle\langle 0|$. The two-qutrit gate
CNOT = $\tau_{|10\rangle,|11\rangle}$ is by itself a two-level Householder reflection
$R_{(|10\rangle-|11\rangle)/\sqrt{2}}$. Similarly, Toffoli = $\tau_{|110\rangle,|111\rangle}$ =
$R_{(|110\rangle-|111\rangle)/\sqrt{2}}$. Therefore, our results apply and we
have efficient strict emulations of P9, CNOT and Toffoli
gates at depths that are logarithmic in 1/δ and in
practice are roughly $8 \log_3(1/\delta)$.
We note that the direct metaplectic approximation of
classical reflections is significantly more efficient than the
circuits expressed in Cf (INC) gates (as each of the latter
have to be approximated).
Let us briefly review such direct approximation in the
context of ternary arithmetic in ternary encoding. As per
[8], the generalized carry gate of the ternary ripple-carry
additive shift contains two classical non-Clifford reflections
([8], Fig. 5) that can be represented at fidelity $1-\delta$
by a metaplectic circuit of R-count at most $16 \log_3(1/\delta)$.
The same source implies that the carry status merge
widget $M$, which is key in the carry lookahead additive
shift, is Clifford-equivalent to a $C_f(\mathrm{SUM})$, which is easily
decomposed into four classical two-level reflections and
thus can be represented at fidelity $1-\delta$ by a metaplectic
circuit of R-count at most $32 \log_3(1/\delta)$.
A sufficient per-gate precision δ may be found in
O(1/(d log(n))) where d is the depth of the modular ex-
ponentiation circuit expressed in non-Clifford reflections.
Therefore, injecting metaplectic circuits in place of reflec-
tions creates an overhead factor in Θ(log(d) log(log(n))).
While being asymptotically moderate, such overhead
could be a deterrent when factoring very large numbers.
This motivates us to explore constant-depth approxima-
tions of classical reflections as in the next section.
C. Constant-depth implementation of CNOT and
Cf (INC) on ternary quantum computers.
We demonstrate that integer arithmetic on a ternary
quantum computer can be efficient both asymptotically
and in practice. We build on Section IV A that describes
exact emulation of CNOT with 6 instances of the P9 gate.
A core result in [13] implies that the $P_9$ gate can be executed
exactly by a deterministic state injection circuit
using one ancilla, one measurement and classical feedback,
provided the “magic” ancillary state
$\mu = \omega_9^{-1} |0\rangle + |1\rangle + \omega_9 |2\rangle$ is available.
The state injection circuit is given in Figure 1.
Assuming, hypothetically, that the magic state $\mu$ can
be prepared in a separate ancillary component of the
computer (then teleported), we get a separation of the
quantum complexity into “online” and “offline” components,
similar to the one employed in the binary Clifford+T
network.
We call these components the execution and preparation
components. We use the term execution depth
somewhat synonymously to “logical circuit depth”. The
execution part of the $P_9$ state injection, and hence of the CNOT
and Toffoli emulations as well as of the implementation of $C_f(\mathrm{INC})$,
has constant depth. The magic state preparation can run separately
in parallel when the preparation code is granted
enough width.
In the context of the binary Clifford+T network, assuming
the required fidelity of the T gate is $1-\delta$, $\delta > 0$,
there is a choice of magic state distillation solutions. For
comparison we have selected one particular protocol,
described in [11]. At the top level, it can be described
as a quantum code of depth in $O(\log(\log(1/\delta)))$ and
width of approximately $O(\log^{\log_3(15)}(1/\delta))$. The newer
protocol in [10] achieves asymptotically smaller width in
$O(\log^{\gamma(k)}(1/\delta))$, where $k$ is an error correction hyperparameter
and $\gamma(k) \to \log_2(3)$ when $k \to \infty$. However, the
protocol of [10] is a tradeoff rather than a win-win over [11] in terms
of practical width value.
In comparison, the magic state distillation for a generic
ternary quantum computer, described in [13], maps onto
a quantum processor of depth in $O(\log(\log(1/\delta)))$ and
width of $O(\log^3(1/\delta))$. Therefore the preparation of a
magic state by distillation requires asymptotically larger
width than the one for the Clifford+T basis.
We observe that the preparation width is asymptotically
better at $O(\log(1/\delta))$ and significantly better in practice
when the target ternary computer is MTQC. Since the
MTQC implements the universal Clifford+$R_{|2\rangle}$ basis that
does not require magic state distillation, the instances of
the magic state $\mu$ can be prepared on a much smaller
scale.
Observation 12. (see [6], Section 4) An instance of
magic state $\mu$ can be prepared at fidelity $1-\delta$ by a
Clifford+$R_{|2\rangle}$ circuit of non-Clifford depth
$r(\delta) = 6 \log_3(1/\delta) + O(\log(\log(1/\delta)))$.
To synchronize with the P9 gates in the logical cir-
cuit we need to pipeline r(δ) instances of the magic state
preparation circuit, so we always have a magic state at
fidelity 1− δ ready to be injected into the P9 protocol.
One important consequence of the synchronization re-
quirement is that higher parallelization of non-Clifford
operations reflects proportionally in an increase in width
of the preparation coprocessor.
In particular, when we employ low-width circuits for
Shor’s period finding, such as those based on ripple-carry additive
shifts, it suffices to produce a constant number
of clean magic states per time step. For example, if the
ternary $C_f(\mathrm{INC})$ is taken as the base classical gate and
realized as shown in Figure 7, then we need three clean
magic states per time step.
Suppose now we employ an $n$-bit quantum carry lookahead
adder in the same context. In order to preserve the
logarithmic time cost advantage we should be able to
perform up to $n$ base reflection gates (such as Toffolis) in
parallel or at least $O(n/\log(n))$ such gates in parallel on
average. Thus the preparation component must deliver
at least $O(n/\log(n))$ clean magic states per time step,
which widens the preparation component by that factor.
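
As a rough illustration of the synchronization requirement, the sketch below multiplies the leading term of $r(\delta)$ from Observation 12 by the number of clean magic states consumed per time step. It is only a back-of-the-envelope model: the lower-order $O(\log(\log(1/\delta)))$ term is dropped, and the function and parameter names are hypothetical.

```python
import math

def r_leading(delta):
    """Leading term 6*log_3(1/delta) of the non-Clifford depth r(delta)."""
    return 6 * math.log(1 / delta, 3)

def pipeline_width(delta, states_per_step):
    """Number of preparation circuits that must run concurrently so that
    `states_per_step` magic states of fidelity 1 - delta finish every time step."""
    return math.ceil(r_leading(delta)) * states_per_step
```

For a ripple-carry circuit one would call `pipeline_width(delta, 3)`; for a carry lookahead adder the second argument grows to roughly $n/\log(n)$ on average and up to $n$ in the worst case.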
V. PLATFORM SPECIFIC RESOURCE
COUNTS
In a more conventional circuit layout for Shor’s period
finding, the $\approx N^2$ modular exponentiations $|a^k \bmod N\rangle$,
$k \in [1..N^2]$, are done in superposition over
$k$ and the width of such superposition trivially depends
on the integer representation radix. Thus the purely
ternary encoding has the width advantage with a factor
of $\log_3(2)$.
However, on a small quantum computer platform a
more practical approach is to use a single control (cf.
[39] or [3], Section 2.4), which allows iterating through
the modular exponentiations using only one additional
qubit (resp. qutrit).
With this method in mind our principal focus is on the
overall cost of modular exponentiation.
We assume that for bitsize $n$, $\varepsilon = 1/\log(n)$ is a
sufficient end-to-end precision of the period-finding circuit.
Then the atomic precision $\delta$ per individual gate,
or rather per individual clean magic state within the circuit,
depends on circuit size. The circuits under comparison
differ asymptotically in depth but not in size, which
is in $O(n^3)$ (disregarding the slower $O(\log(n))$ terms).
We observe that $\log(1/\delta)$ is roughly $3 \log(n)$ for the required
$\delta$. It follows that the distillation width for one
clean magic state scales like $(3 \log_2(n))^3$ in the ternary
context. In the case of magic state preparation in the metaplectic
basis one needs at most $6 \times 3 \log_3(n) = 18 \log_3(n)$
R gates per clean $P_9$ magic state. There has been a
wide array of magic state distillation protocols for the
Clifford+T benchmark. For practical reasons and for
simplicity we have selected the Bravyi-Kitaev protocol
([11]), where the raw magic state consumption scales like
$O(\log^{\log_3(15)}(1/\mathrm{precision}))$; $\log_3(15) \approx 2.465$. This scaling
is shown in the “preparation width” cells in the resource
tables below. An attractive alternative would be
the Bravyi-Haah protocol ([10]). The protocol is defined
by the hyperparameter $k$ of the underlying $[n, k, d]$
error correction code and requires preparation width in
$O((\log_2(n))^{\gamma(k)})$, where $\gamma(k) \approx \log_2((3k + 8)/k)$. In particular,
for $k = 8$ the protocol distills 8 magic states simultaneously
and $\gamma(k) \approx 2$. Unfortunately this protocol
is more sensitive to the fidelity of the raw magic states
and this is one of the reasons we decided not to cost it
out at this time. One needs to be mindful that the scaling
exponent $\gamma(k)$ can in principle be made smaller than
2 under certain circumstances.
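
To get a feel for the two per-state scalings quoted above, the following sketch evaluates the generic ternary distillation width $(3 \log_2(n))^3$ and the metaplectic R-gate count $18 \log_3(n)$ for a few bit sizes. Only these leading terms are modeled; the function names are illustrative and the printed figures are not resource estimates from the paper.

```python
import math

def ternary_distillation_width(n):
    """(3*log_2 n)^3: generic ternary distillation width per clean magic state."""
    return (3 * math.log2(n)) ** 3

def metaplectic_r_count(n):
    """18*log_3 n: R-gate count per clean P9 magic state on the MTQC."""
    return 18 * math.log(n, 3)

for n in (1024, 2048, 4096):
    print(n, round(ternary_distillation_width(n)), round(metaplectic_r_count(n)))
```

The first quantity grows with the cube of $\log(n)$ while the second grows linearly in $\log(n)$, which is the asymptotic gap discussed in the text.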
Tables IV and V contain comparative resource esti-
mates for the modular exponentiation circuits based, re-
spectively, on the ripple-carry additive shift and the carry
lookahead additive shift. For simplicity, only asymptot-
ically dominating terms are represented. An actual re-
source bound may differ by terms of lower order w.r.t.
log(n).
In addition to resource counts for ternary processing
we have provided the same for Clifford+T solutions as
a backdrop. In the Clifford+T basis, the resource estimate
in Table IV for low-width modular exponentiation on a
binary quantum computer is based on [23], in which an
implementation was given that uses $2n+2$ logical qubits.
The Toffoli depth of the circuit in [23] can be analyzed
to be bounded by $160 n^3$. Note that the Toffoli depth is
equal to T-depth, provided that 4 additional ancillas are
available, leading to an overall circuit width of $2n + 6$.
The two resource estimates in Table V for reduced-depth
modular exponentiation are based on [29] and [20]. In
[29] an implementation for an arbitrary coupling architecture
was given that uses $3n + 6 \log_2(n) + O(1)$ qubits
and has a total depth of $12 n^2 + 60 n \log_2^2(n) + O(n \log(n))$.
This implementation is based on a gate set that has arbitrary
rotations. To break this further into Clifford+T
operations, we require an increase in depth by a factor of
$4 \log_2(1/\varepsilon) = 12 \log_2(n)$, as each rotation has to be approximated
with accuracy $\varepsilon \approx 1/n^3$. Up to leading order,
this leads to the estimate of the circuit depth of
$144 n^2 \log_2(n)$ given in the table. In [20] a Toffoli-based
circuit to implement an adder in depth $4 \log_2 n$ was given
that needs $4n - \omega_1(n)$ qubits, where $\omega_1$ denotes the Hamming
weight of the integer $n$. As there are $O(n)$ Toffoli gates
in parallel in this circuit, we use the implementation of
a Toffoli gate in T-depth 3 from [1] to implement a single
addition in T-depth $12 \log_2(n)$. The modular addition
can then be implemented using 3 integer additions.
To implement Shor’s algorithm, we need $2 n^2$ modular
additions, leading to an overall T-depth estimate of
$72 n^2 \log_2(n)$.
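
The arithmetic behind the two Clifford+T depth figures can be retraced step by step. The sketch below only reproduces the leading terms stated in the paragraph above (lower-order terms are dropped), and the function names are hypothetical.

```python
import math

def rotation_overhead(n):
    """Depth factor 4*log_2(1/eps) per rotation, with eps = 1/n^3."""
    eps = 1 / n**3
    return 4 * math.log2(1 / eps)            # equals 12*log_2(n)

def depth_from_29(n):
    """Leading term for [29]: depth 12*n^2 in rotations, each expanded as above."""
    return 12 * n**2 * rotation_overhead(n)  # 144 * n^2 * log_2(n)

def depth_from_20(n):
    """Leading term for [20]: adder depth 4*log_2(n), Toffoli T-depth 3,
    3 integer additions per modular addition, 2*n^2 modular additions."""
    adder_t_depth = 3 * 4 * math.log2(n)     # 12*log_2(n)
    return 2 * n**2 * 3 * adder_t_depth      # 72 * n^2 * log_2(n)
```

With these leading terms the estimate derived from [20] is exactly half of the one derived from [29] for every $n$.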
The rightmost column of either table lists counts proportional
to either the number of raw magic states or, in
the case of MTQC, to the number of metaplectic magic
states required per time step of the circuit.
The logarithmic execution depth for integer addition
is achieved by using carry lookahead additive shift circuit.
However this comes at significant width cost, as
the circuit performs in parallel up to $n$ (in the worst
case) or roughly $n/\log_2(n)$ (on average) reflection gates.
This requires a corresponding number of magic states or
metaplectic registers simultaneously, and therefore the
preparation width numbers in the last column of Table
V are multiplied by the corresponding bit size, or, respectively,
trit size. This represents the critical path bound
on the magic state preparation width of the solution.

Table IV Size comparison for low-width modular exponentiation circuits. $n$ is the bitsize, $m = \lceil \log_3(2)\, n \rceil$, $\omega_1(\cdot)$ is the number of 1s in the corresponding ternary or binary expansion.

| Platforms | Circuit width | Circuit depth ($P_9$/$R_{\lvert 2\rangle}$/$T$) | Preparation width |
|---|---|---|---|
| Emulated binary, metaplectic, via $P_9$ | $n + 4$ | $48 n^3$ | $54 \times \log_3(n)$ |
| Section III A, emulated binary, via $P_9$ | $n + 4$ | $48 n^3$ | $3 (3 \log_2(n))^3$ |
| Ternary, metaplectic, via $P_9$ | $2m - \omega_1(m)$ | $\approx 76.35\, n^3$ | $54 \times \log_3(n)$ |
| Section III A, ternary, via $P_9$ | $2m - \omega_1(m)$ | $\approx 76.35\, n^3$ | $3 (3 \log_2(n))^3$ |
| Emulated binary, MTQC inline | $n + 4$ | $432\, n^3 \log_3(n)$ | 3 |
| Ternary, MTQC inline | $2m - \omega_1(m)$ | $\approx 506.3\, n^3 \log_3(n)$ | 3 |
| Haener et al. [23], Takahashi [44] | $2n + 6$ (qubits) | $160 n^3$ | (a) $\sim n \times (6 \log_2(n))^{\gamma}$ |

(a) Here $\log_2(3) < \gamma \le \log_3(15)$ depending on practically applicable distillation protocol. $n \times$ reflects the worst case bound on the logical width of the circuit.

Table V Sizes for reduced-depth modular exponentiation circuits. $n$ is the bitsize, $m = \lceil \log_3(2)\, n \rceil$, $\omega_1(\cdot)$ is the number of 1s in the corresponding ternary or binary expansion.

| Circuits | Circuit width | Circuit depth ($P_9$/$R_{\lvert 2\rangle}$/$T$) | Preparation width |
|---|---|---|---|
| Emulated binary, metaplectic, via $P_9$ | $4n - \omega_1(n)$ | $120\, n^2 \log_2(n)$ | $54 \times n \log_3(n)$ |
| Section III C, emulated binary, via $P_9$ | $4n - \omega_1(n)$ | $120\, n^2 \log_2(n)$ | $12 n\, (3 \log_2(n))^3$ |
| Ternary, metaplectic, via $P_9$ | $4m - \omega_1(m)$ | $\approx 127.4\, n^2 \log_2(n)$ | $54 \times m \log_3(m)$ |
| Section III C, ternary, via $P_9$ | $4m - \omega_1(m)$ | $\approx 127.4\, n^2 \log_2(n)$ | $12 n\, (3 \log_2(n))^3$ |
| Emulated binary, MTQC inline | $3n - \omega_1(n)$ | $384\, n^2 \log_3(2) (\log_2(n))^2$ | $3n$ |
| Ternary, MTQC inline | $3m - \omega_1(m)$ | $\approx 1630.5\, n^2 \log_3(2) (\log_2(n))^2$ | $3m$ |
| Binary, via Clifford+T, [20] | $4n - \omega_1(n)$ (qubits) | $72\, n^2 \log_2(n)$ | (a) $3n\, (6 \log_2(n))^{\gamma}$ |
| Binary, via Clifford+T, [29] | $3n + 6 \log_2(n)$ (qubits) | $144\, n^2 \log_2(n)$ | $3n\, (6 \log_2(n))^{\gamma}$ |

(a) Here $\log_2(3) < \gamma \le \log_3(15)$ depending on practically applicable distillation protocol.
In both tables the preparation width bound for ternary
processing is dominated by the width of the $C_f(\mathrm{INC})$.
The tables do not exhaust the vast array of possible
depth/width tradeoffs. We have chosen to represent
$C_f(\mathrm{INC})$ with non-Clifford depth one as shown in Fig.
7. This circuit has the $P_9$-width of 3 and requires a
clean ancillary qutrit. For the ripple-carry solution the
ancillary qutrit is reused and has minimal impact. For
the carry lookahead solution up to $n$ (respectively, up
to $m = \lceil \log_3(2)\, n \rceil$) ancillas must be available in parallel,
which inflates the online width by more than 30 percent.
The fifth and sixth rows show a tradeoff based on direct
approximation of Toffoli gates and (controlled) $C_f(\mathrm{INC})$
gates, respectively, by topological metaplectic circuits
to precision $\Theta(1/n^3)$. The topological metaplectic $R_{|2\rangle}$
gates are executed sequentially for each individual arithmetic
gate. This nearly eliminates the need for the
magic state preparation, as only 3 topological ancillas
are needed at a time in the injection circuit for the $R_{|2\rangle}$
(Figure 2). This tradeoff introduces the online depth
of a subcircuit for a Toffoli gate of roughly $24 \log_3(n)$.
A corresponding subcircuit for a $C_f(\mathrm{INC})$ then has online
depth of $48 \log_3(m)$ (two two-level reflections). For
$C_f(\Lambda\Lambda(\mathrm{INC}))$, it is $192 \log_3(m)$ (eight two-level reflections).
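
The three inline depth figures follow one pattern: the number of two-level reflections in the gate times the leading-term cost $8 \log_3(1/\delta)$ of one reflection at $\delta = 1/\mathrm{size}^3$. A minimal sketch, assuming that `size` stands for $n$ in the binary case and $m$ in the ternary case (the function names are hypothetical):

```python
import math

def reflection_depth(size):
    """Leading-term R-depth of one two-level reflection at precision 1/size^3:
    8*log_3(size^3) = 24*log_3(size)."""
    return 8 * math.log(size**3, 3)

def inline_gate_depth(num_reflections, size):
    """Online depth of an arithmetic gate compiled as sequential reflections."""
    return num_reflections * reflection_depth(size)
```

Here `inline_gate_depth(1, n)` corresponds to a Toffoli gate, `inline_gate_depth(2, m)` to $C_f(\mathrm{INC})$, and `inline_gate_depth(8, m)` to $C_f(\Lambda\Lambda(\mathrm{INC}))$.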
We estimate the number of required controlled integer
additive shifts in a modular exponentiation circuit
as $6 n^2$ ($2 n^2$ controlled modular additions) when binary
emulation is used and as $16 m^2$ ($4 m^2$ controlled modular
additions) when ternary encoding is used. These bounds
define the execution depth columns in both tables.
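
Since $m = \lceil \log_3(2)\, n \rceil$, the ternary count $16 m^2$ is about $16 (\log_3 2)^2 n^2 \approx 6.37\, n^2$, slightly above the binary count $6 n^2$. The sketch below simply evaluates both expressions for a given bit size; the function names are illustrative.

```python
import math

def additive_shifts_binary(n):
    """6*n^2 controlled additive shifts (2*n^2 controlled modular additions)."""
    return 6 * n**2

def additive_shifts_ternary(n):
    """16*m^2 controlled additive shifts, with m = ceil(log_3(2) * n) trits."""
    m = math.ceil(math.log(2, 3) * n)
    return 16 * m**2
```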
The most significant distinction in Tables IV and V is
the asymptotic advantage in the magic state preparation
width with the MTQC.
There is also a tradeoff between emulated binary en-
coding and true ternary encoding on a ternary quantum
processor. It is seen from Table IV that with ripple-carry
adders (e.g., when targeting a small quantum computer)
we get a moderate practical advantage in non-Clifford cir-
cuit depth when emulating binary encoding and a small
advantage in width compared to the use of true ternary
encoding. This is true even accounting for the fact that
the trit size m is smaller than the bit size n by the factor
of log3(2).
On the other hand when carry lookahead adders are
used, the difference in the overall non-Clifford circuit
depth between the two encoding scenarios is insignifi-
cant, unless inline metaplectic circuits with MTQC are
compiled. But the use of true ternary encoding yields the
width advantage by a factor of roughly log3(2). In the
fifth and sixth lines of Table V, the use of emulated bi-
nary encoding is practically better than the use of ternary
encoding. Intuitively, this is because the metaplectic cir-
cuits are reflection-oriented and best suited for direct ap-
proximation of the (controlled) Toffoli gates that are two-
level reflections, whereas ternary arithmetic gates such as
Cf (INC) or Horner have to be first decomposed into sev-
eral two-level reflections.
The resource bounds shown in the tables provide a
great deal of flexibility in selecting a resource balance
appropriate for a specific ternary quantum computer.
On a generic ternary quantum computer where univer-
sality is achieved by distillation of magic states for the
P9 gate the choice of encoding and arithmetic circuits is
likely to be dictated by the size of the actual computer.
When native metaplectic topological resources are available,
magic states for the P9 gate are prepared asymptotically
more efficiently. The metaplectic platform also offers a third
choice: bypassing the P9 gate altogether and using
inline metaplectic circuits instead, at the cost of a
factor in O(log(bitsize)) in circuit depth expansion. In
this scenario using emulated binary encoding of integers
is always more efficient in practice than the use of true
ternary encoding.
VI. CONCLUSION
We have investigated implementations of Shor’s period
finding function ([42]) in quantum ternary logic. We per-
formed comparative resource cost analysis targeting two
prospective quantum ternary platforms. The “generic”
platform uses magic state distillation as described in [13]
for universality. The other one, referred to as MTQC
(metaplectic topological quantum computer), is a non-Abelian
anyonic platform, where universality is achieved
by a relatively inexpensive protocol based on anyonic
braiding and interferometric measurement [17], [18].
On each of these platforms we considered two different
logical solutions for the modular exponentiation circuit of
Shor’s period finding function: one where the integers are
encoded using the binary subspace of the ternary state
space and ternary optimizations of known binary arithmetic
circuits are employed; the other where ternary encoding
of integers and arithmetic circuits stemming from [8] are
used.
On the MTQC platform we additionally consider semi-
classical metaplectic expansions of arithmetic circuits;
the non-Clifford depth of such a circuit is larger than the
non-Clifford depth of the corresponding classical arith-
metic circuit by a factor of O(log(bitsize)). Notably, cir-
cuits of this type bypass the need for magic states and
the P9 gate entirely.
We have derived both asymptotic and practical bounds
on the quantum resources consumed by Shor’s period
finding function for practically interesting combinations
of platform, integer encoding and modular exponenti-
ation. For evaluation purposes we have derived such
bounds for widths and non-Clifford depths of the logi-
cal circuits as well as for sizes of the state preparation
resources that either distill or prepare necessary magic
states with the required fidelity.
We find significant asymptotic and practical advan-
tages of the MTQC platform compared to other plat-
forms. In particular this platform allows factorization of
an n-bit number using the smallest possible number of
n + 7 logical qutrits at the cost of inflating the depth
of the logical circuit by a logarithmic factor. In scenar-
ios where increasing the depth is undesirable, the MTQC
platform still exhibits significant advantage in the size of
the magic state preparation component that is linear in
the bitsize of the target fidelity (compared to cubic or
near-cubic for a generic magic state distillation).
An interesting feature of our ternary arithmetic cir-
cuits is the fact that the denser and more compact
ternary encoding of integers does not necessarily lead
to more resource-efficient period finding solutions compared
to binary encoding. As a rule of thumb: if
low-width circuits are desired, then binary encoding of
integers combined with ternary arithmetic gates appears
more efficient both in terms of width and depth than a
pure ternary solution. However, even a moderate ancilla-
assisted depth compression, such as provided by carry
lookahead additive shifts, tips the balance in favor of
ternary encoding and ternary arithmetic gates.
In summary, having a variety of encoding and logic
options provides flexibility when choosing period finding
solutions for ternary quantum computers of varying sizes.
ACKNOWLEDGMENTS
The Authors are grateful to Jeongwan Haah for use-
ful references. We would also like to thank Tom Draper
and Sandy Kutin for providing 〈q|pic〉 [19] which we used
for typesetting most of the figures in this paper. We are
thankful to an anonymous Reviewer for insightful com-
ments that inspired us to rewrite the paper into its cur-
rent, more comprehensive format.
[1] M. Amy, D. Maslov, M. Mosca, and M. Roetteler,
“A meet-in-the-middle algorithm for fast synthesis of
depth-optimal quantum circuits,” IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Sys-
tems 32(6) (2013)
[2] M. Baraban, N. E. Bonesteel, and S. H. Simon, “Re-
sources required for topological quantum factoring,”
Phys. Rev. A 81(062317) (2010)
[3] S. Beauregard, “Circuit for Shor’s algorithm using 2n+3
qubits,” Quantum Information and Computation 3(2)
(2003)
[4] D. Beckman, A. N. Chari, S. Devabhaktuni, and
J. Preskill, “Efficient networks for quantum factoring,”
Phys. Rev. A 54(1034-1063) (1996)
[5] E. Bernstein, and U. V. Vazirani, “Quantum complexity
theory,” SIAM J. Comput. 26(5) (1997)
[6] A. Bocharov, “A Note on Optimality of Quantum Cir-
cuits over Metaplectic Basis,” 2016, 1606.02315
[7] A. Bocharov, S. X. Cui, V. Kliuchnikov, and Z. Wang,
“Efficient topological compilation for a weakly integral
anyonic model,” Phys. Rev. A 93(012313) (2016)
[8] A. Bocharov, S. X. Cui, M. Roetteler, and
K. M. Svore, “Improved quantum ternary arithmetics,”
QIC 16(9,10) (2016)
[9] J. Bourgain, and A. Gamburd, “Spectral gaps in
SU(d),” Comptes Rendus Mathematique 348(11,12)
(2010)
[10] S. Bravyi, and J. Haah, “Magic state distillation with
low overhead,” Phys. Rev. A 86(052329) (2012)
[11] S. Bravyi, and A. Kitaev, “Universal quantum compu-
tation with ideal Clifford gates and noisy ancillas,” Phys.
Rev. A 32(6) (2005)
[12] G. K. Brennen, S. S. Bullock, and D. P. O’Leary, “Ef-
ficient circuits for exact-universal computation with qu-
dits,” QIC 6(4,5) (2006)
[13] E. T. Campbell, H. Anwar, and D. E. Browne, “Magic
state distillation in all prime dimensions using quantum
Reed-Muller codes,” Phys. Rev. X 2(041021) (2012)
[14] R. Cleve, and J. Watrous, “Fast parallel circuits for the
quantum Fourier transform,” FOCS ’00 Proceedings of
the 41st Annual Symposium on Foundations of Computer
Science 526 (2000)
[15] C. Jones, “Multilevel distillation of magic states for
quantum computing,” Phys. Rev. A 87(042305) (2013)
[16] S. A. Cuccaro, T. G. Draper, S. A. Kutin, and
D. P. Moulton, “A new quantum ripple-carry addition
circuit,” 2004, quant-ph/0410184
[17] S. X. Cui, S. M. Hong, and Z. Wang, “Universal quan-
tum computation with weakly integral anyons,” Quan-
tum Information Processing 14(2687–2727) (2015)
[18] S. X. Cui, and Z. Wang, “Universal quantum com-
putation with metaplectic anyons,” J. Math. Phys.
56(032202) (2015)
[19] A. Draper and S. Kutin “〈q|pic〉: Creating quan-
tum circuit diagrams in TikZ,” 2016, Available from
https://github.com/qpic/qpic
[20] T. G. Draper, S. A. Kutin, E. M. Rains, and K. M. Svore,
“A logarithmic-depth quantum carry-lookahead adder,”
Quantum Information and Computation 6(4–5) (2006)
[21] D. Gosset, V. Kliuchnikov, M. Mosca, and V. Russo,
“An algorithm for the T-count,” QIC 14(15,16) (2014)
[22] A. D. Greentree, S. G. Schirmer, F. Green, L. C. L. Hollenberg,
A. R. Hamilton, and R. G. Clark, “Maximizing
the Hilbert Space for a Finite Number of Distinguishable
Quantum States,” Phys. Rev. Lett. 92(097901) (2004)
[23] Th. Haener, M. Roetteler, and K. M. Svore “Factoring
using 2n+ 2 qubits with Toffoli-based modular multipli-
cation,” in preparation (2016)
[24] S. Hallgren, and L. Hales, “An Improved Quantum
Fourier Transform Algorithm and Applications,” IEEE
54th Annual Symposium on Foundations of Computer
Science (2000)
[25] M. Howard, and J. Vala, “Qudit versions of the qubit
pi/8 gate,” Phys. Rev. A 86(022316) (2012)
[26] C. Jones, “Novel constructions for the fault-tolerant
Toffoli gate,” Phys. Rev. A 87(022328) (2013)
[27] M. H. A. Khan, and M. A. Perkowski, “Quantum
ternary parallel adder/subtractor with partially-look-ahead
carry,” Journal of Systems Architecture 53 (2007)
[28] D. E. Knuth, “The Art of Computer Programming. Vol-
ume 2. Seminumerical Algorithms,” Addison-Wesley
(1969)
[29] S.A. Kutin, “Shor’s algorithm on nearest-neighbor ma-
chine,” 2006, quant-ph/0609001
[30] M. Malik, M. Erhard, M. Huber, H. Sosa-Martinez,
M. Krenn, R. Fickler, and A. Zeilinger, “Multi-photon
entanglement in high dimensions,” Nature Photonics
10(248–252) (2016)
[31] I. L. Markov, and M. Saeedi, “Constant-optimized quan-
tum circuits for modular multiplication and exponentia-
tion,” Quantum Information and Computation 12(5,6)
(2012)
[32] I. L. Markov, and M. Saeedi, “Faster quantum num-
ber factoring via circuit synthesis,” Phys. Rev. A
87(012310) (2013)
[33] M. D. Miller, G. W. Dueck, and D. Maslov, “A synthe-
sis method for MVL reversible logic,” 34th IEEE Inter-
national Symposium on Multiple-Valued Logic (ISMVL)
(2004)
[34] M. Morisue, K. Oochi, and M. Nishizawa, “A novel
ternary logic circuit using Josephson junction,” IEEE
Trans. Magn. 25(2) (1989)
[35] M. Morisue, J. Endo, T. Morooka, and N. Shimzu, “A
Josephson ternary memory circuit,” Multiple-Valued
Logic, 1998. Proceedings. 1998 28th IEEE International
Symposium on (1998)
[36] A. Muthukrishnan, and C. Stroud Jr., “Multivalued
logic gates for quantum computation,” Phys. Rev. A
62(051309) (2000)
[37] C. Nayak, S. H. Simon, M. Freedman, and S. D. Sarma,
“Non-abelian anyons and topological quantum computa-
tion,” Rev. Mod. Phys. 80(1083) (2008)
[38] M. A. Nielsen and I. L. Chuang, Quantum Computation
and Quantum Information (Cambridge University Press,
Cambridge, UK, 2000)
[39] S. Parker, and M. B. Plenio, “Efficient factorization
with a single pure qubit and log N mixed qubits,” Phys.
Rev. Lett. 85(3049–3052) (2000)
[40] P. Pham, and K. M. Svore, “ A 2D Nearest-Neighbor
Quantum Architecture for Factoring in Polylogarith-
mic Depth,” Quantum Information and Computation
13(11,12) (2013)
[41] T. Satoh, S. Nagayama, and R. Van Meter, “A reversible
ternary adder for quantum computation,” Asian Conf.
on Quantum Information Science, 2007 (2007)
[42] P. W. Shor, “Polynomial-time algorithms for prime fac-
torization and discrete logarithms on a quantum com-
puter,” SIAM J. Comput. 26 (1484–1509) (1997)
[43] A. Smith, B. E. Anderson, H. Sosa-Martinez,
I. H. Deutsch, C. A. Riofrio, and P. S. Jessen,
“Quantum control in the Cs 6S1/2 ground manifold
using rf and µw magnetic fields.,” Phys. Rev. Lett.
111(170502)(2013)
[44] Y. Takahashi, and N. Kunihiro, “A quantum circuit
for Shor’s factoring algorithm using 2n+2 qubits,” QIC
6(2) (2006)
[45] R. Van Meter, and K. M. Itoh, “Fast quantum modular
exponentiation,” Phys. Rev. A 71(052320) (2005)
[46] U. V. Vazirani, “On the power of quantum computa-
tion,” Phil. Trans. R. Soc. Lond. A 356(1759–1768)
(1998)
[47] V. Vedral, A. Barenco, and A. Ekert, “Quantum net-
works for elementary arithmetic operations,” Phys. Rev.
A 54(147) (1995)
[48] J. Welch, A. Bocharov, and K. M. Svore, “Efficient Ap-
proximation of Diagonal Unitaries over the Clifford+T
Basis.,” QIC 16(1,2) (2016)
[49] C. Zalka, “Fast versions of Shor’s quantum factoring
algorithm,” 1998, quant-ph/9806084
[50] C. Zalka, “Shor’s algorithm with fewer (pure) qubits,”
2006, quant-ph/0601097
Appendix A: Exact emulation of the R|2〉 gate in the
Clifford+P9 basis.
At this time we lack a good effective classical compila-
tion procedure for approximating non-classical unitaries
by efficient ancilla-free circuits in the Clifford+P9 basis.
We show here, however, that the magic state |ψ〉 of
(9) that produces the R|2〉 gate by state injection can be
prepared by certain probabilistic measurement-assisted
circuits over the Clifford+P9 basis. Therefore the com-
pilation into the Clifford+P9 basis can be reduced to a
compilation into the Clifford+R|2〉 basis, while incurring
a certain state preparation cost. This solution, however
inelegant, is sufficient, for example, in the context of
Shor’s integer factorization.
We have seen in Section IV A that the classical
C1(INC) and, hence, the classical C2(INC) gates can be
represented exactly and ancilla-free using three P9 gates.
We use the availability of these gates to prove the key
lemma below.
Recall that $\omega_3 = e^{2\pi i/3}$ is a Clifford phase.
Lemma 13. Each of the ternary resource states
$$(|0\rangle + \omega_3 |1\rangle)/\sqrt{2} \quad \text{and} \quad (|0\rangle + \omega_3^2 |1\rangle)/\sqrt{2}$$
can be represented exactly by a repeat-until-success (RUS)
circuit over Clifford+$P_9$ with one ancillary qutrit and expected
average number of trials equal to 3/2.
Proof. Let us give a proof for the second resource state.
(The proof is symmetrical for the first one.)
We initialize a two-qutrit register in the state $|20\rangle$ and
compute
$$C_2(\mathrm{INC})(H \otimes I)|20\rangle = (|00\rangle + \omega_3^2 |10\rangle + \omega_3 |21\rangle)/\sqrt{3}.$$
If we measure 0 on the second qutrit, then the first qutrit
is in the desired state. Otherwise, we discard the register
and start over. Since the probability of measuring 0 is
2/3, the Lemma follows.
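
The computation in this proof is small enough to simulate directly. The sketch below assumes that $H$ is the ternary Hadamard (the Fourier transform over $\mathbb{Z}_3$ with phase $\omega_3$) and that $C_2(\mathrm{INC})$ increments the second qutrit when the first is in $|2\rangle$; states are stored as dictionaries from basis labels to amplitudes, and all names are illustrative.

```python
import cmath
import math

W3 = cmath.exp(2j * cmath.pi / 3)  # omega_3

def hadamard_on_first(state):
    """Apply H|a> = sum_b w^(a*b)|b>/sqrt(3) to the first qutrit.
    `state` maps basis labels (a, b) to complex amplitudes."""
    out = {}
    for (a, b), amp in state.items():
        for a2 in range(3):
            out[(a2, b)] = out.get((a2, b), 0) + amp * W3 ** (a * a2) / math.sqrt(3)
    return out

def c2_inc(state):
    """C_2(INC): increment qutrit 2 (mod 3) when qutrit 1 is in |2>."""
    return {(a, ((b + 1) % 3 if a == 2 else b)): amp
            for (a, b), amp in state.items()}

state = c2_inc(hadamard_on_first({(2, 0): 1.0}))
# Probability of measuring 0 on the second qutrit, and the post-measurement
# state of the first qutrit in that case.
p_success = sum(abs(amp) ** 2 for (a, b), amp in state.items() if b == 0)
post = {a: amp / math.sqrt(p_success) for (a, b), amp in state.items() if b == 0}
expected_trials = 1 / p_success  # mean of a geometric distribution
```

Under these assumptions the success probability is 2/3, the surviving state is $(|0\rangle + \omega_3^2 |1\rangle)/\sqrt{2}$, and the expected number of trials is 3/2, in agreement with the Lemma.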
Corollary 14. A copy of the two-qutrit resource state
$$|\eta\rangle = (|0\rangle + \omega_3 |1\rangle) \otimes (|0\rangle + \omega_3^2 |1\rangle)/2 \qquad (A1)$$
can be represented exactly by a repeat-until-success circuit
over Clifford+$P_9$ with two ancillary qutrits and expected
average number of trials equal to 9/4.
To effectively build a circuit for the Corollary, we stack
together the two RUS circuits described in Lemma 13.
Lemma 15. There exists a measurement-assisted circuit
that, given a copy of resource state $|\eta\rangle$ as in (A1), produces
a copy of the resource state
$$|\psi\rangle = (|0\rangle - |1\rangle + |2\rangle)/\sqrt{3} \qquad (A2)$$
with probability 1.
Proof. Measure the first qutrit in the state $(H^{\dagger} \otimes I)\,\mathrm{SUM}\,|\eta\rangle$.
Here is the list of reduced second qutrit states given
the measurement outcome $m$:
$$m = 0 \mapsto (|0\rangle - |1\rangle + |2\rangle)/\sqrt{3},$$
$$m = 1 \mapsto (|0\rangle - \omega_3 |1\rangle + \omega_3^2 |2\rangle)/\sqrt{3},$$
$$m = 2 \mapsto (|0\rangle - |1\rangle + \omega_3 |2\rangle)/\sqrt{3}.$$
While the first state on this list is the desired $|\psi\rangle$, each of
the other two states can be turned into $|\psi\rangle$ by classically-controlled
Clifford correction.
As shown in [18], Lemma 5, the resource state $|\psi\rangle$ as in
(A2) can be injected into a coherent repeat-until-success
circuit of expected average depth 3 to execute the $R_{|2\rangle}$
gate on a coherent state. See our Figure 2 in Section II D.
Recall that the $C_2(\mathrm{INC})$ gate appearing in the Lemma
13 construction has the non-Clifford cost of three $P_9$
gates. Thus, to summarize the procedure: we can effectively
and exactly prepare the magic state $|\psi\rangle$ using
a four-qutrit register at the expected average $P_9$-count of
27/4.
To have a good synchronization of the magic state
preparation with the $R_{|2\rangle}$ gate injection it would suffice
to have a magic state preparation coprocessor of
width greater than 27 (to compensate for the variances
in repeat-until-success circuits).
Appendix B: Circuit fidelity requirements for Shor’s
period finding function
To recap the discussion in Section II E, the quantum
period finding function consists of preparing a unitary
state $|u\rangle$ proportional to the superposition
$$\sum_{k=0}^{N^2} |k\rangle |a^k \bmod N\rangle \qquad (B1)$$
followed by quantum Fourier transform, followed by
measurement, followed by classical postprocessing.
As we know, the measurement result $j$ can be useful
for recovering a period $r$ or it can be useless. It has been
shown in [42] that the probability $p_{\mathrm{useful}}$ of getting a
useful measurement is in $\Omega(1/\log(\log(N)))$.
Speaking in more general terms, let $H$ be the Hilbert
space where the $\mathrm{QFT}|u\rangle$ is to be found after the quantum
Fourier transform step, let $G \subset H$ be the subspace
spanned by all possible state reductions after all possible
useful measurements and let $G^{\perp}$ be its orthogonal complement
in $H$.
Let $\mathrm{QFT}|u\rangle = |u_1\rangle + |u_2\rangle$, $|u_1\rangle \in G$, $|u_2\rangle \in G^{\perp}$ be the
orthogonal decomposition of $\mathrm{QFT}|u\rangle$. Then $p_{\mathrm{useful}} = \big| |u_1\rangle \big|^2$.
Let now |v〉 be an imperfect unitary copy of QFT|u〉 at
Hilbert distance ε. What is the probability of obtaining
some useful measurement on measuring |v〉?
By definition, it is the probability of |v〉 being projected
to G upon measurement.
Proposition 16. In the above context the probability of
|v〉 being projected to G upon measurement is greater than
puseful − 2
√
puseful ε
Proof. Let |v〉 = |v1〉 + |v2〉, |v1〉 ∈ G, |v2〉 ∈ G⊥ be the
orthogonal decomposition of the state |v〉.
Clearly ||u1〉 − |v1〉| < ε and, by triangle inequality,
||v1〉| ≥ ||u1〉| − ||u1〉 − |v1〉| > ||u1〉| − ε.
Hence ||v1〉|2 > (||u1〉| − ε)2 > ||u1〉|2 − 2 ||u1〉| ε =
puseful − 2
√
puseful ε as claimed.
Corollary 17. In the above context, if $\varepsilon < \gamma \sqrt{p_{\mathrm{useful}}}$
where $0 < \gamma < 1/2$, then the probability of obtaining some
useful measurement on measuring $|v\rangle$ is greater than
$(1 - 2\gamma)\, p_{\mathrm{useful}}$.
In particular, if $\varepsilon < \sqrt{p_{\mathrm{useful}}}/4$, we are at least half
as likely to obtain a useful measurement from the proxy
state $|v\rangle$ as from the ideal state $\mathrm{QFT}|u\rangle$.
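The bound of Proposition 16 can be spot-checked numerically. The following is a rough sketch under simplifying assumptions that are not in the paper: amplitudes are real, the space has dimension 16, and G is taken to be the span of the first three basis vectors. All names are hypothetical.

```python
import math
import random

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def prob_in_G(v, dim_G):
    """Probability of projecting onto G, here the span of the first dim_G basis vectors."""
    return sum(x * x for x in v[:dim_G])

def bound_holds(dim=16, dim_G=3, eps=0.05, trials=1000, seed=1):
    """Check p(v) >= p(u) - 2*sqrt(p(u))*dist for random unit vectors with dist(u, v) <= eps."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = normalize([rng.gauss(0, 1) for _ in range(dim)])
        w = normalize([rng.gauss(0, 1) for _ in range(dim)])
        # A perturbation of size eps/2 followed by renormalization stays within distance eps of u.
        v = normalize([a + 0.5 * eps * b for a, b in zip(u, w)])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
        p_u, p_v = prob_in_G(u, dim_G), prob_in_G(v, dim_G)
        # Small slack absorbs floating-point rounding.
        if dist > eps or p_v < p_u - 2 * math.sqrt(p_u) * dist - 1e-12:
            return False
    return True
```

A random search of this kind cannot prove the proposition, but a failing trial would indicate an error in the statement or in the sketch.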
In summary, there is a useful precision threshold $\varepsilon$ in
$O(1/\sqrt{\log(\log(N))})$ that allows us to use a state
prepared imprecisely, to precision $\varepsilon$, in place of the ideal state
in the measurement and classical post-processing part of
Shor’s period finding function.
This translates into a per-gate tolerance in the preparation
circuit in the usual way. If d is the unitary depth of
the state preparation circuit, then it suffices to represent
each of the consecutive unitary gates to fidelity $1 - \varepsilon/d$
or better. For completeness, we make this argument explicit
in the following proposition. Let $\|U\|$ denote the
spectral norm of a unitary operator U.
Proposition 18. Assume that an ideal quantum computation
$U = \prod_{k=1}^{d} U_k$ is specified using d perfect unitary
gates $U_k$ and that we actually implement it using d
imperfect unitary gates $V_k$, where for all k = 1, . . . , d it
holds that $\|U_k - V_k\| \le \delta$. Then for the actually
implemented unitary transformation $V = \prod_{k=1}^{d} V_k$ it holds
that $\|U - V\| \le d\,\delta$.
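Proof. (See also [5], [46]). We perform induction on d.
When d = 1 there is nothing to prove. Assume the inequality
has been proven for a product of length d − 1.
We have
\[
\begin{aligned}
\|U - V\|
 &= \Big\| \prod_{k=1}^{d} U_k - \Big(\prod_{k=1}^{d-1} U_k\Big) V_d
      + \Big(\prod_{k=1}^{d-1} U_k\Big) V_d - \prod_{k=1}^{d} V_k \Big\| \\
 &\le \Big\| \prod_{k=1}^{d} U_k - \Big(\prod_{k=1}^{d-1} U_k\Big) V_d \Big\|
      + \Big\| \Big(\prod_{k=1}^{d-1} U_k\Big) V_d - \prod_{k=1}^{d} V_k \Big\| \\
 &= \Big\| \prod_{k=1}^{d-1} U_k \Big\| \, \|U_d - V_d\|
      + \Big\| \prod_{k=1}^{d-1} U_k - \prod_{k=1}^{d-1} V_k \Big\| \, \|V_d\| \\
 &\le \delta + (d-1)\,\delta = d\,\delta,
\end{aligned}
\]
where in the second step we used the triangle inequality,
in the third step the fact that multiplication by a unitary
preserves the spectral norm, i.e. $\|W A\| = \|A W\| = \|W\|\,\|A\|$ for
every unitary W and every operator A, together with $\|W\| = 1$
for all unitary W. In the last step we used the inductive
hypothesis.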
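The linear error accumulation of Proposition 18 can be illustrated on single-qubit gates. This is a minimal sketch, assuming a hypothetical two-parameter family of 2x2 unitaries and uniformly random parameter noise; the closed-form spectral norm used here is valid for 2x2 matrices only.

```python
import cmath
import math
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

def spectral_norm(A):
    """Largest singular value of a 2x2 complex matrix, from trace and determinant of A^dagger A."""
    f2 = sum(abs(x) ** 2 for row in A for x in row)          # trace(A^dagger A)
    det2 = abs(A[0][0] * A[1][1] - A[0][1] * A[1][0]) ** 2   # det(A^dagger A)
    disc = max(f2 * f2 - 4 * det2, 0.0)
    return math.sqrt((f2 + math.sqrt(disc)) / 2)

def gate(theta, phi):
    """A 2x2 unitary parameterized by a rotation angle and a phase (illustrative family)."""
    c, s, e = math.cos(theta), math.sin(theta), cmath.exp(1j * phi)
    return [[c, -s * e], [s * e.conjugate(), c]]

def accumulated_error(d=50, noise=1e-3, seed=7):
    """Return (||U - V||, d * delta) for d ideal gates U_k and noisy copies V_k."""
    rng = random.Random(seed)
    U = V = [[1, 0], [0, 1]]
    delta = 0.0
    for _ in range(d):
        theta, phi = rng.uniform(0, math.pi), rng.uniform(0, 2 * math.pi)
        Uk = gate(theta, phi)
        Vk = gate(theta + rng.uniform(-noise, noise), phi + rng.uniform(-noise, noise))
        delta = max(delta, spectral_norm(sub(Uk, Vk)))  # worst per-gate error
        U, V = matmul(U, Uk), matmul(V, Vk)
    return spectral_norm(sub(U, V)), d * delta

err, bound = accumulated_error()
```

For random, uncorrelated gate errors the observed `err` is typically well below `bound`; the proposition gives a worst-case guarantee.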
Appendix C: Alternative circuits for modular
exponentiation
1. Width optimizing circuits
One way for the modular exponentiation to use qubits
(resp., qutrits) sparingly is to perform computation in
phase space.
The first consistent solution of this kind was proposed
in [3]. A peculiar feature of the proposed solution is that
the modular additive shift block for |b〉 7→ |(a+b) mod N〉
has four interleaved quantum Fourier transforms (two direct
and two inverse, see Figure 5 in [3]), the sole purpose
of which is establishing and then forgetting the sign of
a + b − N . It is unlikely that any of these transforms
can be made redundant without significant redesign of
the circuit. As we have pointed out in Section III F,
ternary quantum computers are comparatively inefficient
in practice when emulating non-classical modular exponentiation
circuits such as Beauregard’s [3].
Fortunately, the Beauregard circuit can be supplanted
by the circuit of Haener et al. [23]. Instead of emulating that
circuit directly, we point out that our ternary modular exponentiation
circuit based on the ripple-carry adder (see Section III A)
maintains a smaller width in qutrits by much
simpler means: systematic use of the ternary basis state |2〉,
which for our purposes is always “idle”.
2. Depth-optimizing circuits
Reversible classical circuits that stand apart from the
relatively simple layouts we have analyzed are hinted at in
a hidden-gem paragraph in Section 5 of [14].
Let us revisit equation (13) in Section III E:
\[
  a^k \bmod N = \prod_{j=0}^{2n-1} \left( a^{2^j} \bmod N \right)^{k_j} \bmod N \tag{C1}
\]
It is pointed out in [14] that, instead of accumulating
the partial modular products sequentially, one can accumulate
pairwise modular products at nodes of a binary
tree, whose original leaves are classically pre-computed
values $a^{2^j} \bmod N$. This prepares the entire product in
depth $O(\log(n))$ (instead of $2n$) multiplications and size
$O(n^2)$.
Furthermore, each of the pairwise multiplications can
be set up as a binary tree of modular additions and so
performed in depth O(log(n)) and size O(n).
Thus the entire modular exponentiation is done in
depth $O(\log(n)^2)$ and size $O(n^3)$.
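The tree-shaped accumulation of the product in (C1) can be sketched classically as follows. This is an illustration of the layout only, not of the reversible circuit itself; the function name and the toy parameters are assumptions.

```python
def tree_modexp(a, k, N, n_bits):
    """Compute a^k mod N by multiplying the factors of (C1) pairwise in a binary tree.

    Returns (result, depth), where depth counts layers of pairwise modular products.
    Assumes n_bits >= 1 and k < 2**n_bits.
    """
    # Leaves: (a^(2^j) mod N)^(k_j), i.e. the precomputed power when bit k_j is 1, else 1.
    leaves = []
    power = a % N
    for j in range(n_bits):
        leaves.append(power if (k >> j) & 1 else 1)
        power = (power * power) % N  # classical precomputation of a^(2^(j+1)) mod N

    depth = 0
    layer = leaves
    while len(layer) > 1:
        # Multiply neighbours pairwise; an unpaired last element is carried up unchanged.
        nxt = [(layer[i] * layer[i + 1]) % N for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            nxt.append(layer[-1])
        layer = nxt
        depth += 1
    return layer[0], depth

result, depth = tree_modexp(7, 11, 15, 8)
```

With 8 leaves the tree has 3 layers of products, compared with 7 sequential multiplications in the straightforward accumulation.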
This proposal still uses modular addition as the core
building block, and thus we can plug in an emulation
of the modular addition circuit built out of carry-lookahead
adders and comparators as in Section III E.
This fits well into the polylogarithmic depth promise of
[14].
It should be straightforward to rewrite this design
in ternary logic and to implement it using the ternary
lookahead additive shift circuits described in subsection III B.
For lack of time and space we forgo a more detailed
analysis here.
