Optimal and asymptotically optimal NCT reversible circuits by the gate
  types by Maslov, Dmitri
arXiv:1602.02627v4 [quant-ph] 28 Dec 2016
Dmitri Maslov1,2
1 National Science Foundation, Arlington, VA, USA
2 QuICS, University of Maryland, College Park, MD, USA
dmitri.maslov@gmail.com
September 21, 2018
Abstract
We report optimal and asymptotically optimal reversible circuits composed of NOT, CNOT, and
Toffoli (NCT) gates, keeping the count by the subsets of the gate types used. This study fine tunes
the circuit complexity figures for the realization of reversible functions via reversible NCT circuits.
An important consequence is a result on the limitation of the use of the T-count quantum circuit
metric popular in applications.
1 Introduction
Reversible circuits are important parts of quantum algorithms. Grover’s oracles, integer and finite
field arithmetic operations (used in Shor-type discrete logarithm quantum algorithms), as well as numerous
types of Boolean operations over quantum registers are all examples of reversible circuits.
Consequently, the study of reversible circuits and their complexities is important in understanding the
complexity of quantum circuits and algorithms, as well as for the efficient implementation of quantum
algorithms.
In this paper, we study reversible circuits over the gate library consisting of the NOT, the CNOT, and
the Toffoli gates, also commonly referred to as NCT circuits. The individual gates are defined via the
logical transformations they perform over Boolean variables, as follows:
• NOT gate, NOT(a) : a ↦ a ⊕ 1;
• CNOT gate, CNOT(a; b) : (a, b) ↦ (a, b ⊕ a);
• Toffoli gate, TOF(a, b; c) : (a, b, c) ↦ (a, b, c ⊕ ab).
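As a minimal sketch, the three gate definitions above can be written as functions acting on a tuple of bits. The function names, the tuple-of-integers state, and the `(name, wires...)` gate encoding are illustrative assumptions, not notation from the paper.

```python
def apply_not(state, a):
    """NOT(a): a -> a XOR 1."""
    s = list(state)
    s[a] ^= 1
    return tuple(s)

def apply_cnot(state, a, b):
    """CNOT(a; b): b -> b XOR a, with a unchanged."""
    s = list(state)
    s[b] ^= s[a]
    return tuple(s)

def apply_tof(state, a, b, c):
    """TOF(a, b; c): c -> c XOR (a AND b), with a and b unchanged."""
    s = list(state)
    s[c] ^= s[a] & s[b]
    return tuple(s)

def run(circuit, state):
    """Apply a string of gates left to right; each gate is (name, *wire_indices)."""
    ops = {"NOT": apply_not, "CNOT": apply_cnot, "TOF": apply_tof}
    for name, *wires in circuit:
        state = ops[name](state, *wires)
    return state
```

Each of the three gates is its own inverse, so running a circuit and then its gates in reverse order restores the input.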
A reversible circuit is a string of gates, read left to right. In addition to the primary variable inputs,
a reversible circuit may have constant inputs, carrying a constant value of either a zero or a one. Those
additional inputs are called ancillae. They can be a useful resource, as they provide additional space for
the computations.
A reversible function of n Boolean variables, $f(x) = f(x_1, x_2, ..., x_n) = (f_1(x_1, x_2, ..., x_n), f_2(x_1, x_2, ..., x_n), ..., f_n(x_1, x_2, ..., x_n))$,
is a bijective mapping of the n-dimensional Boolean cube into
itself. There are three possible notions of what it means to implement a reversible function by a reversible
circuit, which we list next, from the weakest to the strongest.
W. Weak. The reversible circuit computes a set of Boolean functions, and among them, are all n
outputs of the desired reversible function f .
I. Intermediate. The reversible circuit computes the mapping (x, y) ↦ (x, y ⊕ f(x)), where x, y, and
f(x) are n-bit registers, and the EXOR operates bitwise. In addition, some ancillae may be used,
but their values are returned to the original state.
S. Strong. The circuit implementing f performs the mapping x ↦ f(x). Some ancillae may be used,
but their values are returned to the original state.
Stronger notions of implementability can be used straightforwardly to compute the weaker notions;
some CNOTs may be required. Weaker notions can also be used to compute the stronger notions. To
construct the intermediate implementation using the weak implementation, apply the weak circuit A,
EXOR the useful outputs, f(x), into the new register y using n CNOTs, and then apply A⁻¹. Recall
that the circuit A⁻¹ may be obtained from the NCT circuit A by reversing the order of gates in A, since
each NCT gate is its own inverse. As
a result, the intermediate implementation can be constructed with at most twice the number of gates
in the weak implementation, plus n CNOT gates. Incidentally, the same procedure can be applied to the
strong implementation to obtain the intermediate implementation from it.
To obtain the strong implementation from the intermediate implementation, take two circuits: circuit
B computing (x, y) ↦ (x, y ⊕ f(x)) and circuit C computing (x, y) ↦ (x, y ⊕ f⁻¹(x)). To obtain
the transformation x ↦ f(x), start with the 2n-bit register (x, 0), apply B to it to transform it into
(x, f(x)), then SWAP the first and second n-bit registers to obtain (f(x), x), and finally apply C to obtain
(f(x), x ⊕ f⁻¹(f(x))) = (f(x), x ⊕ x) = (f(x), 0). Discarding the mention of ancillae, the aggregate
transformation can now be described as x ↦ f(x). A further in-depth study of the relation between the
intermediate and strong forms can be found in [6].
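The intermediate-to-strong construction can be checked on a small example. The sketch below models the circuits B and C abstractly as maps on pairs of n-bit integers rather than as gate lists; the permutation `f` is an arbitrary, hypothetical 3-bit reversible function chosen only for illustration.

```python
def make_intermediate(g):
    """Return the map (x, y) -> (x, y XOR g(x)) for a lookup table g."""
    def circuit(x, y):
        return x, y ^ g[x]
    return circuit

n = 3
f = [3, 0, 5, 6, 1, 7, 2, 4]                  # hypothetical 3-bit reversible function
f_inv = [f.index(v) for v in range(2 ** n)]   # lookup table of the inverse of f

B = make_intermediate(f)       # (x, y) -> (x, y XOR f(x))
C = make_intermediate(f_inv)   # (x, y) -> (x, y XOR f^{-1}(x))

def strong(x):
    """Compute x -> f(x) from B, a register SWAP, and C, starting with ancillae at 0."""
    a, b = B(x, 0)   # (x, f(x))
    a, b = b, a      # SWAP the two registers: (f(x), x)
    a, b = C(a, b)   # (f(x), x XOR f^{-1}(f(x))) = (f(x), 0)
    return a, b
```

For every input `x`, `strong(x)` returns `(f[x], 0)`: the first register holds f(x) and the ancilla register is back at zero.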
Neither of the above constructions affects the asymptotics when we are concerned with upper
bounds on the resources required to obtain the most difficult function. Indeed, if the upper bound on
the resource count used by B, performing the mapping (x, y) ↦ (x, y ⊕ f(x)), is I(n), then the same number,
I(n), upper bounds the cost of C, performing the mapping (x, y) ↦ (x, y ⊕ f⁻¹(x)), because f⁻¹ is itself
an n-bit reversible function. Next
we summarize how the following notions are related: W(n), the cost of the weak implementation of an
arbitrary n-bit reversible function; I(n), the cost of the intermediate implementation of an arbitrary
n-bit reversible function; and S(n), the cost of the strong implementation of an arbitrary n-bit reversible
function:
WI: W(n) ≤ I(n);
IS: I(n) ≤ 2 · S(n) + n · Cost(CNOT);
WS: W(n) ≤ S(n);
IW: I(n) ≤ 2 · W(n) + n · Cost(CNOT);
SI: S(n) ≤ 2 · I(n) + 3n · Cost(CNOT);
SW: S(n) ≤ 4 · W(n) + 5n · Cost(CNOT).
In the above, we relied on the common notion that the SWAP gate can be implemented via three CNOTs.
We conclude that in the case when W (n), I(n), and S(n) are at least linear in n, asymptotic optimality
of any one of them implies the asymptotic optimality of all other types of implementations.
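As a worked example, the relation SW follows by chaining SI with IW. The derivation below uses only the relations listed above; the 3n term is the cost of the SWAP of two n-bit registers, at three CNOTs per swapped pair of bits.

```latex
\begin{align*}
S(n) &\le 2\,I(n) + 3n\cdot\mathrm{Cost}(\mathrm{CNOT}) && \text{(SI)}\\
     &\le 2\bigl(2\,W(n) + n\cdot\mathrm{Cost}(\mathrm{CNOT})\bigr) + 3n\cdot\mathrm{Cost}(\mathrm{CNOT}) && \text{(IW)}\\
     &= 4\,W(n) + 5n\cdot\mathrm{Cost}(\mathrm{CNOT}).
\end{align*}
```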
The reason to have multiple definitions of computability is that the weak notion
is the one expected when reversible circuits are of interest in their
own right. However, reversible circuits are most often viewed in the broader context of quantum computing
[11]. From the point of view of quantum computations, the weak notion of computability by a reversible
circuit may give rise to the unwanted entanglement residing on those (qu)bits carrying partial results of
the computation. The intermediate notion removes the concern of the unwanted entanglement residing
on the partially computed outputs, and furthermore is used broadly within the context of quantum
algorithms. Should the strong notion of computability be required, it is possible to obtain it too, without
affecting the asymptotic optimality. Therefore, from the point of view of this paper, we will be satisfied
with any one type of implementation.
Define $L_{a,b,c}(n, g)$, where $a, b, c \in \{0, 1\}$ and g is a positive integer, to be the smallest cost of the circuit
implementation of the most expensive reversible function of n variables realized by an NCT circuit using
at most g input constants. The input constants are allowed to take values 0 or 1. The circuit cost is
calculated as the sum across all gates participating in the given circuit, where the NOT gate is counted
with the weight a, the CNOT gate is counted with the weight b, and the Toffoli gate is counted with
the weight c. We note that $L_{a,b,c}(n, 0)$ does not exist, as no odd permutation may be synthesized via an
NCT circuit without using an additional ancilla [15]; this explains why we chose g > 0 in the definition
of $L_{a,b,c}(n, g)$. $L_{a,b,c}(n)$ furthermore reports the best cost NCT implementation of the function that is
most difficult to obtain via its circuit realization in the scenario where the use of an arbitrary number of
ancillae is allowed.
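The weighted cost of a single given circuit can be sketched as follows. This computes only the cost of one circuit; the quantity $L_{a,b,c}(n, g)$ additionally minimizes over all circuits for a function and maximizes over all functions, which is not attempted here. The gate encoding `(name, wires...)` and the example circuit are illustrative assumptions.

```python
def circuit_cost(circuit, a, b, c):
    """Weighted gate count: NOT has weight a, CNOT weight b, Toffoli weight c."""
    weights = {"NOT": a, "CNOT": b, "TOF": c}
    return sum(weights[name] for name, *_ in circuit)

# Hypothetical 3-wire circuit with one NOT, one CNOT, and two Toffoli gates.
example = [("NOT", 0), ("CNOT", 0, 1), ("TOF", 0, 1, 2), ("TOF", 1, 2, 0)]
```

With weights (1, 1, 1) every gate is counted, while (0, 0, 1) counts only the Toffoli gates, the case this paper focuses on.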
Of the 8 possible choices for parameters a, b, c in $L_{a,b,c}(n, g)$, some carry a special meaning. For instance,
$L_{1,0,0}(n, g)$ determines the maximal number of NOT gates required by reversible circuits. We will
later show that this number is zero, meaning that NOT gates by themselves are not very helpful, as their use
can be avoided. $L_{0,0,1}(n, g) = L_{0,0,1}(n)$, when g is allowed to be arbitrarily high so as not to limit the
space used by the computation, can be viewed as the multiplicative, or otherwise non-linear, cost of
reversible functions. It is furthermore closely related to the T-count circuit metric in quantum circuits,
as both ignore the cost of linear (with respect to EXOR) reversible transformations. The
study of lower and upper bounds on $L_{0,0,1}(n)$, as well as the discussion of the implications, is the main
focus of this paper.
1.1 Motivation
In this paper we study the problem of minimizing the gate counts by their type in reversible circuits with
NOT, CNOT, and Toffoli gates. This study is motivated by the relative hardness of constructing the
Toffoli gate compared to the effort required to obtain the CNOT, and the relative hardness of CNOT
compared to NOT.
First, compare the implementation costs of NOT and CNOT. Both are Clifford gates, therefore, on the
logical level they are likely to be transversal. This means that both gates are implemented via a set of
NOT, and, respectively, CNOT gates, applied to the physical-level qubits. On the physical level, a NOT
gate is often less expensive than the CNOT gate. Frequently, this is due to the two-qubit gates taking
more effort to implement than any of the single-qubit gates. It is not uncommon for a CNOT gate to be
20 times more resource demanding compared to the NOT gate.
Next, compare the cost of the Toffoli gate to the cost of the CNOT gate. Consider quantum logical-level
circuits. Often, all Clifford gates, CNOT included, are relatively easy to implement on the logical level.
In contrast, the Toffoli gate, being a transformation outside the Clifford group, is more difficult to obtain.
Assuming the non-Clifford gate provided by the fault tolerance approach selected is the so-called T gate,
the Toffoli gate may be implemented as a circuit with 2 Hadamard gates, 7 T/T† gates, and 6 CNOT
gates [1]. Discarding the cost of the Hadamard gate, and taking the sum of the costs of the remaining
gates in this circuit implementation, we obtain Cost(TOF) = 6 · Cost(CNOT) + 7 · Cost(T/T†). While
it was shown that 6 CNOT gates are required to implement the Toffoli gate as a circuit over the library
including arbitrary single-qubit and CNOT gates [14], the number 7 of T/T† gates has been obtained via
a computer search [1] and in principle could be reducible (and, in fact, it is when additional resources
are available [5]). The known way of implementing the fault-tolerant gate T requires state distillation
and then its teleportation. The teleportation is achieved via the use of the single logical CNOT gate,
relying on the well-known teleportation circuit, and therefore it is not resource demanding. The state
distillation relies on a nested application of the 15-qubit Hamming code [4]. Physical parameters of the
quantum information system used and the overall length of the desired computation play a determining
role in deciding on the details of the protocol and the complexity of implementing the T gate. Assuming
the distillation depth of 2, which appears to be a practical choice for scalable computations, the cost
of implementing the T gate is roughly 50 times that of the logical CNOT. The cost of
the Toffoli gate, expressed in units of the cost of the CNOT gate, then becomes roughly
6 · 1 + 7 · 50 = 356, which is, in practical terms, a large number. While the number 356 may itself be possible
to reduce (e.g., by outsourcing the ancilla production to before the desired computation), it is likely that the
Toffoli gate will remain substantially (provably, at least 6 times over arbitrary single-qubit gates and the
CNOT [14]) more expensive than the (nearest-neighbour) CNOT gate.
One other resource that can be useful for efficient implementation of reversible functions is ancillary
(qu)bits. It may be difficult to compare the cost of arbitrary gates to that of ancillary qubits directly,
as these are, strictly speaking, resources of a different kind. However, we will next evaluate the relation
between the cost of a logical ancilla qubit and that of logical NOT/Toffoli gates to conclude that, within
known fault tolerant approaches, the ancilla qubit is substantially more expensive than the NOT gate,
and substantially less expensive than the Toffoli gate. First, to obtain a logical ancilla qubit in the state
|0〉 or |1〉 we need a physical space (qubits). This physical space needs to be prepared in the respective
encoded logical state. The procedure accomplishing it can be described as a (physical-level) Clifford
circuit. As a result, one may expect a number of the physical-level CNOT gates to be applied. Thus,
recalling earlier discussions, it can be expected that the cost of this operation exceeds that of the logical
NOT gate. Second, the Toffoli gate relies on the seven T/T† gates that themselves are obtained with
the use of state distillation, which includes a nested application of the circuits implementing a Clifford
unitary. Logical |0〉 or |1〉 state preparation, on the other hand, uses only one Clifford circuit designed
to prepare a state, as opposed to a whole unitary, which is expected to be much simpler to accomplish.
The cost of an ancilla can thus be said to be roughly similar to that of the CNOT gate. As such, we
will disregard the cost of ancillae every time we exclude the cost of the CNOT gates from the circuit
cost figure ($L_{a,b,c}(n, g)$ with b = 0). In the scenario when we include the CNOT count in the overall
calculation, we may consider accounting for ancillae as well. There are only two interesting cases to consider,
$L_{1,1,1}(n, g)$ and $L_{0,1,1}(n, g)$. This is because, as shown later, the remaining two relevant complexity
figures, $L_{0,1,0}(n, g) = 0$ when g ≥ 1, and $L_{1,1,0}(n, 1) = 1$ and $L_{1,1,0}(n, g) = 0$ when g ≥ 2, carry small
values, and there is little interplay between the value of the complexity function L and the number of
ancillae. The case $L_{0,1,1}(n, g)$ may furthermore be reduced to $L_{1,1,1}(n, g)$. Indeed, constructing the lower
bound involves the counting argument, and the asymptotics of the number of transformations achievable
by cost-1 circuits over the library with “free” NOT gates is the same as the asymptotics of the number
of transformations achievable by cost-1 circuits when the NOT gates are counted towards the cost figure.
This means that essentially the same lower bounds will apply to both $L_{1,1,1}(n, g)$ and $L_{0,1,1}(n, g)$. An
upper bound for the quantity $L_{1,1,1}(n, g)$ can furthermore be used directly to upper bound the quantity
$L_{0,1,1}(n, g)$. As a result, of the two quantities, $L_{1,1,1}(n, g)$ and $L_{0,1,1}(n, g)$, only one, $L_{1,1,1}(n, g)$, needs to be
studied, with the results transferable to $L_{0,1,1}(n, g)$.
If we allow arbitrary ancillae, a classical Boolean circuit complexity result stating that any Boolean
function can be implemented using at most $O(2^n/n)$ NOT/OR/AND classical gates [7] can be used to
upper bound the number of NOT/OR/AND gates in a classical irreversible circuit by $O(2^n)$ for every
reversible function. Making this latter classical circuit reversible may increase the number of gates used by
some constant factor, and will not require more than $O(2^n)$ ancillae. This leads to the upper bound of the
form $L_{1,1,1}(n, C_2 2^n) \lesssim C_3 2^n$ for a proper choice of constants $C_2$ and $C_3$. We can furthermore lower bound
$L_{1,1,1}(n, C_2 2^n)$ by the quantity $C_1 2^n$, for a proper choice of the constant $C_1$, via applying the simple
counting argument [15, Lemma 8], to obtain asymptotic optimality, $C_1 2^n \lesssim L_{1,1,1}(n, C_2 2^n) \lesssim C_3 2^n$. It
is interesting to study how the value of the $L_{1,1,1}(n, g)$ function changes when the number of ancillae g is
increased from 1 to $C_2 2^n$; however, such a study is outside the scope of this paper. For the rest of the
paper, we will restrict the number of ancillae to a constant when considering the values of the L function
with the CNOT gate count included.
1.2 Previous work
The topic of the complexity of NCT realizations of reversible functions has been studied extensively.
Within the terminology introduced above, the previous literature contains the following results. [15]
reports upper and lower bounds of the following form: $\frac{n2^n}{3\log_2 n} \lesssim L_{1,1,1}(n, 1) \lesssim 9n2^n$, $L_{0,1,1}(n, 1) \lesssim 9n2^n$,
and $L_{0,0,1}(n, 1) \lesssim 9n2^n$. The upper bounds were improved to $L_{1,1,1}(n, 1) \lesssim 5n2^n$, $L_{0,1,1}(n, 1) \lesssim 5n2^n$,
and $L_{0,0,1}(n, 1) \lesssim 3n2^n$ in [8], and then to $L_{1,1,1}(n, 1) \lesssim 4.5n2^n$, $L_{0,1,1}(n, 1) \lesssim 4n2^n$, and
$L_{0,0,1}(n, 1) \lesssim 2n2^n$ in [12, 13]. Finally, [16] reports improved upper bounds of the following form:
$L_{1,1,1}(n, 1) \lesssim \frac{48n2^n}{\log_2 n}$, $L_{0,1,1}(n, 1) \lesssim \frac{40n2^n}{\log_2 n}$, and $L_{0,0,1}(n, 1) \lesssim \frac{32n2^n}{\log_2 n}$.
Technically, to apply the result of [16], which was itself
developed to handle even permutations, we need to mention that any odd permutation can be reduced
to an even permutation via multiplying it by the maximal size multiple control Toffoli gate (one spanning
all n bits). This latter multiple control Toffoli gate requires a linear number of 3-bit Toffoli gates to
be implemented as an NCT circuit [2], and therefore does not affect the advertised asymptotics. The
multiplicative constant separating the best known lower [15] and upper [16] bounds for $L_{1,1,1}(n, 1)$ is
about a hundred; however, it is no longer a function of n, allowing us to conclude that asymptotic
optimality has been established.
The topic of gate-specific Boolean circuit complexities has also been studied in the literature. Specifically,
[10] studied the complexity of formulas, circuits, and contact relays, implementing a Boolean single-
output function, including the multiplication cost (in the basis with Boolean multiplication and addition
modulo-2, where the addition can be used for free) of Boolean single-output functions. The upper
bound developed in [10] on the number of multiplications required, $2^{n/2} + o(2^{n/2})$, may be applied to
the reversible case to obtain the upper bound of $n2^{n/2}$, in the leading order, on the number of Toffoli gates per
reversible function. However, in this paper, we are able to show a better upper bound of 3√
2
√
n2n/2 in
the leading order via a direct construction. The difference between classical and reversible logic cases
is furthermore not only in the different number of outputs (n versus 1, assuming n input bits) that need
to be computed by the reversible circuit simultaneously, but also in the ancillae management, that is of
no importance in classical circuits. These differences are substantial in that they appear to prohibit a
direct transfer of the results from standard Boolean logic to the reversible function/circuit scenario.
2 $L_{a,b,c}(n, g)$: practice-motivated and other cases
We have previously established that Cost(NOT) < Cost(CNOT) < Cost(TOF). Therefore, there is
most value in studying circuit complexities by the gate types in the scenario discarding the costs of
the simpler resources first. These are the cases of $L_{0,1,1}(n, g)$ and $L_{0,0,1}(n)$. Recall that asymptotically
optimal lower and upper bounds in the scenario $L_{1,1,1}(n, 1)$ have already been obtained by previous
authors. We study $L_{0,1,1}(n, g)$ and $L_{0,0,1}(n)$ in the following sections. In this section, we study optimal
and asymptotically optimal circuits in the remaining four scenarios ($L_{0,0,0}(n, g)$ is trivial): $L_{1,0,0}(n, g)$,
$L_{1,1,0}(n, g)$, $L_{0,1,0}(n, g)$, and $L_{1,0,1}(n, g)$.
Consider $L_{1,0,0}(n, g)$, counting the number of NOT gates in reversible NCT circuits. It may be easily
established that $\forall g\; L_{1,0,0}(n, g) = 0$. Indeed, assign the value of 1 to the ancillary qubit y. Every time
a NOT($x_i$) gate is used by the synthesis algorithm reported in, e.g., [9] (a NOT gate is applied at
most once to each primary input in the beginning of the circuit), replace it with CNOT($y; x_i$).
As a result of this modification of the synthesis algorithm in [9], no NOT gates are used, and therefore
$L_{1,0,0}(n, g) = 0$. Similarly, if one were to discard the cost of the Toffoli gates, and study the NOT/CNOT
cost, the following statement is true: $\forall g \geq 2\; L_{1,1,0}(n, g) = 0$. The case of $L_{1,1,0}(n, 1)$ is somewhat more
complex and requires an explicit proof.
Lemma 1. $L_{1,1,0}(n, 1) = 1$.
Proof. Lower bound. There are two cases to consider: on the input side we have either (a) values
$x_1, x_2, ..., x_n, 0$ or (b) values $x_1, x_2, ..., x_n, 1$, where $x_1, x_2, ..., x_n$ are primary inputs. To keep the NOT
and CNOT count at zero, we may use only the Toffoli gates. We next prove that there exists a reversible
function that may not be computed by such a circuit. The impossibility implies that at least one NOT
or CNOT gate needs to be used. In the proof of the upper bound that follows we will furthermore show
that one NOT gate does suffice, leading to the desired equality. To prove that we need a NOT or a
CNOT gate, apply a series of Toffoli gates, and observe that the number of components with term 1 in
their PPRM expansion (Positive Polarity Reed-Muller expansion, see (1)) remains constant (zero in
case (a) and one in case (b)). Indeed, to include the term 1 in some component y not yet containing
a 1 in its PPRM expansion, this bit needs to be a target of a Toffoli gate. Suppose this is the Toffoli gate
with controls $y_1$ and $y_2$. To obtain term 1 in the PPRM expansion of the product $y_1 y_2$, both $y_1$ and $y_2$
need to contain the term 1 in their PPRM expansion. However, we have at most zero (case (a)) or one
(case (b)) components containing the term 1, but not the two that we need. To conclude the proof, observe
that the function $(x_1 \oplus 1, x_2 \oplus 1, x_3, ..., x_n)$ is reversible and has two of its components contain term 1 in
their PPRM expansion. This function cannot be generated by the Toffoli gates alone.
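The invariant used in the lower bound can be checked empirically. The constant term of the PPRM expansion of a Boolean function equals its value at the all-zero input, so it suffices to simulate Toffoli-only circuits with all primary inputs set to 0. The sketch below does this for random circuits; the circuit length, the number of trials, and the random seed are arbitrary choices made for illustration.

```python
import random

def run_toffolis(circuit, state):
    """Apply a list of TOF(a, b; c) gates, given as (a, b, c) wire indices."""
    s = list(state)
    for a, b, c in circuit:
        s[c] ^= s[a] & s[b]
    return s

def constant_terms(circuit, n, ancilla):
    """Constant PPRM term of every wire: its value when x = (0, ..., 0)."""
    return run_toffolis(circuit, [0] * n + [ancilla])

def random_toffoli_circuit(n_wires, length, rng):
    """Random Toffoli gates on three distinct wires each."""
    return [tuple(rng.sample(range(n_wires), 3)) for _ in range(length)]

rng = random.Random(0)
n = 4
counts = {0: set(), 1: set()}   # observed numbers of components with term 1
for _ in range(200):
    circ = random_toffoli_circuit(n + 1, 20, rng)
    for ancilla in (0, 1):
        counts[ancilla].add(sum(constant_terms(circ, n, ancilla)))
```

In every trial the count stays at zero for case (a) and at one for case (b), so a function with two components containing the term 1 is out of reach of Toffoli gates alone.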
Upper bound. We modify the algorithm reported in [9] to implement any reversible function using
only the Toffoli gates and at most one NOT gate, when only a single ancilla is available. First, choose the
value of the ancilla to be 1. Take a reversible function and synthesize a reversible circuit implementing
it using the Toffoli gates and no more than one NOT gate.
Step 1 of the basic algorithm [9] prescribes the use of NOT gates for every bit where f(0) ≠ 0. Without
loss of generality, assume these are the first k bits. Then, replace the circuit composed of these
k NOT gates, NOT($x_1$)NOT($x_2$)...NOT($x_k$), with the functionally equivalent circuit TOF($1, x_1; x_2$)
TOF($1, x_1; x_3$)...TOF($1, x_1; x_k$)NOT($x_1$)TOF($1, x_1; x_k$)...TOF($1, x_1; x_3$)TOF($1, x_1; x_2$), as illustrated
next (k = 3, n = 4):
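[Figure: on the wires $x_1$, $x_2$, $x_3$, $x_4$ and the ancilla carrying the value 1, the circuit NOT($x_1$)NOT($x_2$)NOT($x_3$) (left) is mapped (↦) to the equivalent circuit consisting of four Toffoli gates, each controlled by $x_1$ and the ancilla, placed around a single NOT($x_1$) gate (right).]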
Observe that the circuit on the right hand side uses only one NOT gate, at which point, the circuit we
have synthesized thus far has the NOT/CNOT cost of one.
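The equivalence of the k NOT gates and the Toffoli ladder with a single NOT can be verified exhaustively for small sizes. In the sketch below, wires 0 to n-1 are the primary inputs and the last wire is the ancilla fixed to 1; the function names are illustrative.

```python
from itertools import product

def original(bits, k):
    """NOT(x1) NOT(x2) ... NOT(xk), acting on wires 0..k-1."""
    s = list(bits)
    for i in range(k):
        s[i] ^= 1
    return tuple(s)

def replacement(bits, k):
    """TOF(1,x1;x2)...TOF(1,x1;xk) NOT(x1) TOF(1,x1;xk)...TOF(1,x1;x2)."""
    s = list(bits)
    anc = len(s) - 1                  # ancilla wire, expected to carry 1
    targets = list(range(1, k))       # wires x2..xk
    for t in targets:                 # forward Toffoli ladder
        s[t] ^= s[anc] & s[0]
    s[0] ^= 1                         # the single NOT gate
    for t in reversed(targets):       # mirrored Toffoli ladder
        s[t] ^= s[anc] & s[0]
    return tuple(s)

def equivalent(n, k):
    """Compare both circuits on all primary inputs with the ancilla set to 1."""
    return all(original(x + (1,), k) == replacement(x + (1,), k)
               for x in product((0, 1), repeat=n))
```

Each target $x_j$ receives $x_1$ before the NOT and $x_1 \oplus 1$ after it, which together flip $x_j$; this uses 2(k − 1) Toffoli gates and one NOT gate.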
Steps 2 to $2^n$ use only CNOT, Toffoli, and multiple control Toffoli gates. We replace each CNOT($x_i; x_j$)
with TOF($1, x_i; x_j$) and replace each multiple control Toffoli gate with its Toffoli gate realization [2,
Lemmas 7.2, 7.3]. In particular, when needed, we break down the given multiple control Toffoli gate into
four smaller multiple control Toffoli gates using Lemma 7.3, and then apply Lemma 7.2 to decompose all
smaller multiple control Toffoli gates into 3-bit Toffoli gates. The resulting circuit may use the ancilla
bit that we have available, carrying the known value 1. As such, steps 2 to $2^n$ use no NOT or CNOT
gates, and the overall NOT/CNOT gate count is 1.
Next, consider $L_{0,1,0}(n, g)$, counting the number of CNOTs in reversible NCT circuits. $\forall g\; L_{0,1,0}(n, g) = 0$,
since one may select the ancillary qubit y to carry the value 1, and replace each CNOT($x_i; x_j$) used
in [9] with TOF($y, x_i; x_j$).
Finally, the case $L_{1,0,1}(n, g)$ may be reduced to $L_{0,0,1}(n, g)$, considered in Section 4. This is because in
the presence of “free” CNOT gates, each NOT($x_i$) gate can be replaced by a CNOT($1; x_i$), being an
implementation of NOT with zero cost.
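Both gate replacements used in this section rely on an ancilla holding the constant 1. A minimal sketch checking them on all inputs follows; the helper names are illustrative, and the ancilla is placed on the last wire.

```python
from itertools import product

def cnot(s, a, b):
    """CNOT(a; b) on a tuple of bits."""
    s = list(s)
    s[b] ^= s[a]
    return tuple(s)

def tof(s, a, b, c):
    """TOF(a, b; c) on a tuple of bits."""
    s = list(s)
    s[c] ^= s[a] & s[b]
    return tuple(s)

def flip(s, a):
    """NOT(a) on a tuple of bits."""
    s = list(s)
    s[a] ^= 1
    return tuple(s)

def check_replacements(n):
    """With ancilla y = 1: NOT(xi) == CNOT(y; xi) and CNOT(xi; xj) == TOF(y, xi; xj)."""
    y = n                                    # index of the ancilla wire
    for x in product((0, 1), repeat=n):
        s = x + (1,)
        for i in range(n):
            if flip(s, i) != cnot(s, y, i):
                return False
            for j in range(n):
                if i != j and cnot(s, i, j) != tof(s, y, i, j):
                    return False
    return True
```

Neither replacement changes the ancilla, so the same constant can be reused for every gate in the circuit.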
3 $L_{0,1,1}(n, g)$
Lemma 2. $\frac{n2^n}{3\log_2 n} \lesssim L_{0,1,1}(n, 1) \lesssim \frac{40n2^n}{\log_2 n}$.
Proof. Lower bound. We rely on [15, Lemma 8] to construct the lower bound. Specifically, [15, Lemma
8] states that the quantity $\frac{\log_2 G}{\log_2 b}$, where G is the size of the set of functions being computed, and b is the
number of different cost-one circuits, lower bounds the cost of the circuits that implement an arbitrary
function. In our calculations, $G = \frac{(2^n)!}{2^n}$, since each function may be implemented by a circuit up to the
possible “free” negation of all output side wires. The number of the different cost-one circuits (Toffoli
and CNOT gates with arbitrary NOT gates on the input side) is $4n^3 + o(n^3)$. The ratio $\frac{\log_2 G}{\log_2 b}$ is then
equal to $\frac{n2^n}{3\log_2 n}$ up to the lower degree additive terms.
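To get a feel for how the counting bound compares with its leading term at finite n, the ratio can be evaluated numerically. The sketch below is an illustration under stated assumptions: it takes b to be exactly $4n^3$ (dropping the $o(n^3)$ part) and uses the log-gamma function to evaluate $\log_2 (2^n)!$.

```python
import math

def log2_factorial(m):
    """log2(m!) computed via the log-gamma function."""
    return math.lgamma(m + 1) / math.log(2)

def counting_bound(n):
    """log2(G) / log2(b) with G = (2^n)!/2^n and b = 4 n^3 (leading term only)."""
    log2_G = log2_factorial(2 ** n) - n
    log2_b = math.log2(4 * n ** 3)
    return log2_G / log2_b

def leading_term(n):
    """The leading-order expression n 2^n / (3 log2 n)."""
    return n * 2 ** n / (3 * math.log2(n))

ratios = {n: counting_bound(n) / leading_term(n) for n in (10, 20, 30)}
```

The ratios stay below 1 for these values of n, reflecting the lower-order terms that the asymptotic statement discards.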
Upper bound. $L_{0,1,1}(n, 1) \lesssim \frac{40n2^n}{\log_2 n}$ was shown in [16].
4 $L_{0,0,1}(n)$
4.1 Lower bound
Next we show that the number of Toffoli gates required to implement an arbitrary reversible
n-bit function $f(x_1, x_2, ..., x_n) = (f_1(x_1, x_2, ..., x_n), f_2(x_1, x_2, ..., x_n), ..., f_n(x_1, x_2, ..., x_n))$ of n primary
inputs $x_1, x_2, ..., x_n$ with n primary outputs $f_1, f_2, ..., f_n$, $L_{0,0,1}(n)$, is at least $\sqrt{n}\,2^{n/2} + o(\sqrt{n}\,2^{n/2})$. To
accomplish this, consider a circuit with h Toffoli gates. We will number and refer to the Toffoli gates
within the circuit as $\mathrm{TOF}_1, \mathrm{TOF}_2, ..., \mathrm{TOF}_h$, in the order they appear in the circuit (first to last).
We furthermore break bits/wires in the circuit into smaller chunks. In particular, denote by $w_a$ an
uninterrupted piece of wire between some two gates. The values carried by those pieces of wire are equal
to the respective primary input/constant between that input/constant and the first gate applied; the
values of the pieces of wire in the middle of the circuit and the output depend on which gates were
applied by the circuit. For instance, for a $\mathrm{TOF}_i(w_a, w_b; w_c)$ over three input-side pieces of wire, being
two input controls $w_a$ and $w_b$, and one input target $w_c$, and three pieces of wire on the output side,
$w_d$, $w_e$, and $w_f$, the values carried by the pieces of wire are related by the following formulas: $w_d = w_a$,
$w_e = w_b$, and $w_f = w_c \oplus w_a w_b$. We will furthermore denote the Boolean product computed by the Toffoli
gate $\mathrm{TOF}_i$ and EXORed into its target, $w_a w_b$, as Prod($\mathrm{TOF}_i$).
Lemma 3. In a reversible NCT circuit with h Toffoli gates each value carried by a piece of wire $w_a$ can
be written as a linear sum $LS(w_a) = \bigoplus_{i=1}^{h} c_i \mathrm{Prod}(\mathrm{TOF}_i) \oplus l(x)$, where $c_i \in \{0, 1\}$ and $l(x)$ is a linear
function of primary inputs.
Proof. To construct the linear representation advertised in the statement of the Lemma, start with an empty
linear sum, LS = 0, and traverse the given piece of wire $w_a$ back towards the beginning of the circuit
until the terminal pieces of wire are found. Terminal pieces of wire include all primary inputs and all
input constants. Define $S := \{w_a\}$. The set S contains all pieces of wire we need to look at. We next
traverse the circuit and compute LS.
• For a piece of wire $w_a \in S$ and upon finding a NOT gate with input $w_b$ and output $w_a$, replace $w_a$
with $w_b$ in the set S, and replace LS with LS ⊕ 1.
• For a piece of wire $w_a \in S$ and upon finding a CNOT gate with input control $w_b$, input target $w_c$,
and output target $w_a$, replace S with $(S \setminus \{w_a\}) \cup \{w_b, w_c\}$.
• For a piece of wire $w_a \in S$ and upon finding a CNOT gate with input control $w_b$ and output control
$w_a$, replace S with $(S \setminus \{w_a\}) \cup \{w_b\}$.
• For a piece of wire $w_a \in S$ and upon finding a Toffoli gate $\mathrm{TOF}_i$ with input target $w_b$ and output
target $w_a$, replace $w_a$ with $w_b$ in S, and replace LS with LS ⊕ Prod($\mathrm{TOF}_i$).
• For a piece of wire $w_a \in S$ and upon finding a Toffoli gate $\mathrm{TOF}_i$ with output control $w_a$, and
input control $w_b$ on the same bit as $w_a$, replace $w_a$ with $w_b$ in the set S.
• For a piece of wire $w_a \in S$ carrying a primary input signal $x_k$ or an input constant Const, remove
$w_a$ from S and replace LS with LS ⊕ $x_k$ or LS ⊕ Const, correspondingly.
Observe that S may already include a $w_b$ when we try to add $w_b$ to it. It is safe to keep only those pieces
of wire in the set S that appear in it with odd multiplicity. The above algorithm terminates in time at
most linear in the number of gates in the circuit.
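The representation in Lemma 3 can be sketched in code. Note that the sketch below is not the backward traversal described in the proof: it propagates the linear sums forward through the circuit, which yields the same LS for every wire. Every wire is treated as an input variable (an ancilla is simply a wire later fixed to a constant), and the data layout, a set of Toffoli indices plus a bit mask for the linear part, is an assumption made for illustration.

```python
def linear_sums(n_wires, circuit):
    """Track LS for every wire as (set of Toffoli indices, linear mask).

    Bit k of the mask is the coefficient of input k; bit n_wires is the constant 1.
    Also returns, for each Toffoli gate, the LS forms of its two controls.
    """
    forms = [(frozenset(), 1 << k) for k in range(n_wires)]
    controls = []
    for name, *w in circuit:
        if name == "NOT":
            prods, lin = forms[w[0]]
            forms[w[0]] = (prods, lin ^ (1 << n_wires))
        elif name == "CNOT":
            (pc, lc), (pt, lt) = forms[w[0]], forms[w[1]]
            forms[w[1]] = (pc ^ pt, lc ^ lt)   # XOR keeps odd multiplicities only
        else:  # "TOF": record the controls, add Prod(TOF_i) to the target
            controls.append((forms[w[0]], forms[w[1]]))
            pt, lt = forms[w[2]]
            forms[w[2]] = (pt ^ {len(controls) - 1}, lt)
    return forms, controls

def evaluate(form, x, n_wires, prod_vals):
    """Evaluate an LS form at input x (an integer bit vector)."""
    prods, lin = form
    v = bin(lin & x).count("1") & 1            # linear part in the inputs
    v ^= (lin >> n_wires) & 1                  # constant term
    for i in prods:
        v ^= prod_vals[i]
    return v

def check(n_wires, circuit):
    """Compare the LS forms against direct simulation on every input."""
    forms, controls = linear_sums(n_wires, circuit)
    for x in range(2 ** n_wires):
        prod_vals = []
        for fa, fb in controls:                # Prod(TOF_i), in circuit order
            prod_vals.append(evaluate(fa, x, n_wires, prod_vals)
                             & evaluate(fb, x, n_wires, prod_vals))
        predicted = [evaluate(f, x, n_wires, prod_vals) for f in forms]
        s = [(x >> k) & 1 for k in range(n_wires)]
        for name, *w in circuit:
            if name == "NOT":
                s[w[0]] ^= 1
            elif name == "CNOT":
                s[w[1]] ^= s[w[0]]
            else:
                s[w[2]] ^= s[w[0]] & s[w[1]]
        if s != predicted:
            return False
    return True
```

Only Toffoli gates add product terms; NOT and CNOT gates change only the linear part or merge existing sums, which is why the number of distinct products is bounded by h.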
Theorem 1. $\sqrt{n}\,2^{n/2} \lesssim L_{0,0,1}(n)$.
Proof. We will apply the counting argument to obtain the desired lower bound. The key to successfully
using this strategy is to encode reversible functions via a combinatorial structure that tightly (the
encoding must not be so wasteful as to affect the asymptotics) encodes different functions via their
circuit representation, and for which it is easy to either count or upper bound the number of such structures, as
parametrized by the number of the Toffoli gates used. In such a case, the number h, a parameter in the
formula counting the number of such structures, may be used to lower bound the value $L_{0,0,1}(n)$, as
h needs to be sufficiently high before the number of combinatorial structures becomes large enough to
contain $(2^n)!$ instances, where $(2^n)!$ is the number of reversible functions of n inputs.
We next map reversible NCT circuits over n primary inputs containing h Toffoli gates into directed
acyclic graphs (DAGs) with edge and vertex labels.
Vertices and edges. The set of vertices is a union of two sets, T and F. The set T contains
h elements, $\{T_1, T_2, ..., T_h\}$, each corresponding to the respective Toffoli gate in the original circuit,
$\{\mathrm{TOF}_1, \mathrm{TOF}_2, ..., \mathrm{TOF}_h\}$. The Toffoli gates within the original circuit are numbered in the ascending
order as they appear in the circuit. The set F consists of n terminal vertices, $\{F_1, F_2, ..., F_n\}$, each corresponding
to a single bit of output. The number of vertices in the DAG is thus h + n. We draw a directed
edge $(T_i, T_j)$ iff for some input control $w_a$ of $\mathrm{TOF}_j$, $LS(w_a)$ contains Prod($\mathrm{TOF}_i$) with the non-zero
coefficient $c_i$. We draw a directed edge $(T_i, F_k)$ iff $LS(w_a)$, where $w_a$ is the piece of wire corresponding
to the output $f_k$, contains Prod($\mathrm{TOF}_i$).
Edge labels. To each edge $(T_i, T_j)$ ending in the vertex $T_j$, we assign a label $EL_{T_i,T_j}$ with a numeric
value from the set {1, 2, 3}. The binary encoding of the label $EL_{T_i,T_j}$ tells which input-side controls
of the gate $\mathrm{TOF}_j$ contain term Prod($\mathrm{TOF}_i$) in their LS linear form representation. Specifically, label
$1 = 01_2$ says the second control of $\mathrm{TOF}_j$ requires the knowledge of Prod($\mathrm{TOF}_i$) to be computed (i.e.,
Prod($\mathrm{TOF}_i$) is included in the linear sum LS of this piece of wire), label $2 = 10_2$ says the first control
of $\mathrm{TOF}_j$ contains Prod($\mathrm{TOF}_i$) in its LS form, and label $3 = 11_2$ says both controls of $\mathrm{TOF}_j$ contain
Prod($\mathrm{TOF}_i$) in their LS forms. To each edge $(T_i, F_k)$ ending in the vertex $F_k$ we assign the numeric
label of 1. The meaning of each such edge is the statement that Prod($\mathrm{TOF}_i$) is EXOR-ed with
something else to obtain the output bit $f_k$, but it will become useful to think of the label as being equal
to 1, as opposed to any other number or no label, when counting the number of DAGs.
Vertex labels. Enumerate all $2^{n+1}$ linear functions of primary inputs $\{x_1, x_2, ..., x_n\}$. Each vertex $T_i$
is labelled by a pair of numbers, $VL_{T_i} := (l_{w_{i,a}}, l_{w_{i,b}})$, reporting the numeric orders of the linear
functions of the primary inputs that are being EXORed to the LS of the two input-side controls of
the gate $\mathrm{TOF}_i$, being the pieces of wire $w_{i,a}$ and $w_{i,b}$ directly feeding into the gate itself. $l_{w_{i,a}}$ ($l_{w_{i,b}}$)
is obtained via removing all Prod(·) terms from $LS(w_{i,a})$ ($LS(w_{i,b})$) and computing the numeric order
of the respective linear function of the primary inputs. Each vertex $F_k$ is labelled by the number $l_{F_k}$,
corresponding to the numeric order of the linear function of primary inputs EXORed to the kth primary
output. It is obtained via removing all product terms from $LS(w_{F_k})$, where $w_{F_k}$ is the piece of wire
carrying the kth primary output.
Each such DAG uniquely defines a reversible function, and as such the number of different DAGs upper
bounds the number of different reversible functions that can be obtained as a function of $h$, the number of
Toffoli gates used. Indeed, consider a specific instance of the above DAG, and obtain the reversible
function it encodes. We construct the logical functions computed in each vertex of the DAG in the following
order: $T_1, T_2, ..., T_h, F_1, F_2, ..., F_n$. $T_1$ has no incoming edges and is labelled by $VL_{T_1} = (l_{w_{1,a}}, l_{w_{1,b}})$. The
product computed by the $\mathrm{TOF}_1$ gate is thus $\mathrm{Prod}(T_1) = l_{w_{1,a}}(x) \,\&\, l_{w_{1,b}}(x)$; here, we use the numeric order
of the linear function to call the function itself. Incidentally, it is the same as $LS(w_{1,a}) \,\&\, LS(w_{1,b})$ since this
is the first Toffoli in the circuit. $T_i$ has incoming edges that may be broken down into two sets $S_{T_i,1}$ and
$S_{T_i,2}$, such that every edge label in the set $S_{T_i,j}$, $j \in \{1, 2\}$, has a non-zero $j$th digit. The product function
$\mathrm{Prod}(T_i)$ takes the value

$$\mathrm{Prod}(T_i) = \Big( l_{w_{i,a}}(x) \oplus \bigoplus_{(T_k,T_i) \in S_{T_i,1}} \mathrm{Prod}(T_k) \Big) \,\&\, \Big( l_{w_{i,b}}(x) \oplus \bigoplus_{(T_k,T_i) \in S_{T_i,2}} \mathrm{Prod}(T_k) \Big).$$

The function assigned to $F_k$ is

$$\mathrm{Out}(F_k) := l_{F_k}(x) \oplus \bigoplus_{(T_i,F_k) \in S_{F_k}} \mathrm{Prod}(T_i),$$

where $S_{F_k}$ is the set of edges coming into the vertex $F_k$. The reversible function is given by the
output vector $(\mathrm{Out}(F_1), \mathrm{Out}(F_2), ..., \mathrm{Out}(F_n))$.
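The evaluation order described above can be traced in code. The following Python sketch is only an illustration, not the construction used in the paper: the function names `affine` and `evaluate_dag`, the encoding of a numeric order (low n bits select the variables, bit n is the constant term), and the storage of edge labels in a list of lists `B` are all assumptions made for this example.

```python
def affine(index, x, n):
    """Evaluate a linear function of the primary inputs on the input word x.

    Assumed encoding (illustrative only): the low n bits of index select the
    variables that are EXORed together, and bit n is the constant term.
    Bit t of x holds the value of variable x_(t+1).
    """
    mask = index & ((1 << n) - 1)
    const = (index >> n) & 1
    parity = bin(mask & x).count("1") & 1
    return parity ^ const


def evaluate_dag(n, t_labels, f_labels, B, x):
    """Return the n-bit output word encoded by a labelled DAG on input x.

    t_labels[i] = (l_a, l_b) is the vertex label of the (i+1)-th Toffoli gate,
    f_labels[k] is the vertex label of the (k+1)-th output, and B[i][j] is the
    edge label from T_(i+1) to column j (the first h columns are T vertices,
    the last n columns are F vertices); 0 means there is no edge.
    """
    h = len(t_labels)
    prod = []
    for i in range(h):
        ctrl_a = affine(t_labels[i][0], x, n)
        ctrl_b = affine(t_labels[i][1], x, n)
        for k in range(i):
            if B[k][i] & 2:  # labels 10_2 and 11_2: first control uses Prod(T_k)
                ctrl_a ^= prod[k]
            if B[k][i] & 1:  # labels 01_2 and 11_2: second control uses Prod(T_k)
                ctrl_b ^= prod[k]
        prod.append(ctrl_a & ctrl_b)
    out = 0
    for k in range(n):
        bit = affine(f_labels[k], x, n)
        for i in range(h):
            if B[i][h + k]:
                bit ^= prod[i]
        out |= bit << k
    return out


# Example: the DAG of a single TOF(x1, x2; x3) gate on n = 3 bits.
toffoli_table = [
    evaluate_dag(3, [(0b001, 0b010)], [0b001, 0b010, 0b100], [[0, 0, 0, 1]], x)
    for x in range(8)
]
```

In the example, `toffoli_table` lists the outputs for the inputs 0 to 7; only the two inputs with x1 = x2 = 1 are exchanged, as expected of a Toffoli gate.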
We count the number of DAGs representing NCT circuits using the following product formula: the
number of DAGs with vertex and edge labels equals the product of the number of DAGs with edge
labels and the number of ways to label vertices. The second number is easy to obtain. Each vertex $T_i$
is labelled by a pair of linear Boolean functions of $n$ variables. There are $2^{n+1}$ linear Boolean functions
of $n$ variables, and as such the number of choices for the label of $T_i$ is $2^{2(n+1)}$. The number of choices
for the label of $F_k$ is $2^{n+1}$. Given the total number of $T$ type vertices is $h$ and the number of $F$ type
vertices is $n$, the overall number of vertex labels is $2^{2h(n+1)} \cdot 2^{n(n+1)}$.
To count the number of DAGs with edge labels, describe those by a size $h \times (h + n)$ matrix
$B = \{b_{i,j}\}_{i=1..h,\, j=1..h+n}$, $b_{i,j} \in \{0, 1, 2, 3\}$, where $b_{i,j} = EL_{T_i,T_j}$ when $j \le h$ and edge $(T_i, T_j)$ is in the DAG,
$b_{i,j} = EL_{T_i,F_k}$ when $j > h$, $k = j - h$, and edge $(T_i, F_k)$ is in the DAG, and $b_{i,j} = 0$ everywhere else.
Such matrices encode all DAGs; however, not every $h \times (h + n)$ matrix is used by this encoding.
Specifically, $b_{i,h+k}$ ($k > 0$) never take values above 1, all diagonal elements $b_{i,i} = 0$, and, by construction
of the DAG, its matrix $B$ has zeros below the diagonal. Including those constraints gives an improved
count compared to a simple count of all $h \times (h + n)$ matrices whose elements take one of 4 values.
Specifically, the restricted set of matrices, subject to the above conditions, has $h(h-1)/2$ elements that may
take any one of 4 values and $nh$ elements that may take binary values. The number of such restricted
matrices is thus $4^{h(h-1)/2} \cdot 2^{nh}$.
The number of DAGs, $4^{h(h-1)/2} \cdot 2^{nh} \cdot 2^{2h(n+1)} \cdot 2^{n(n+1)}$, should be at least as high as the number of
reversible functions, $(2^n)!$. Solving for $h$, we obtain:

$$2^{h(h-1)} \cdot 2^{nh} \cdot 2^{2h(n+1)} \cdot 2^{n(n+1)} \ge (2^n)! \ge \sqrt{2\pi 2^n} \cdot 2^{n2^n} \cdot e^{-2^n}$$

$$h^2 + 3nh + h + n^2 + n \ge n2^n + C_1 2^n + C_2 n$$

$$h \ge \sqrt{n}\,2^{n/2} + o(\sqrt{n}\,2^{n/2})$$

Dropping lower degree additive terms, we obtain the desired inequality

$$\sqrt{n}\,2^{n/2} \lesssim h = L_{0,0,1}(n).$$
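The counting inequality above can also be solved numerically for small $n$. The Python sketch below is an illustration under stated assumptions: the helper names are hypothetical, and $\log_2 (2^n)!$ is evaluated in floating point through `math.lgamma` rather than through the Stirling bound used in the derivation. It finds the smallest $h$ for which the number of labelled DAGs reaches the number of reversible functions.

```python
import math


def log2_factorial(m):
    """log2(m!) computed in floating point via the log-gamma function."""
    return math.lgamma(m + 1) / math.log(2)


def log2_dag_count(n, h):
    """log2 of 4^(h(h-1)/2) * 2^(nh) * 2^(2h(n+1)) * 2^(n(n+1))."""
    return h * (h - 1) + n * h + 2 * h * (n + 1) + n * (n + 1)


def smallest_h(n):
    """Smallest h with at least (2^n)! labelled DAGs (binary search)."""
    target = log2_factorial(2 ** n)
    hi = 1
    while log2_dag_count(n, hi) < target:
        hi *= 2
    lo = 0
    while lo < hi:
        mid = (lo + hi) // 2
        if log2_dag_count(n, mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


def ratio_to_leading_term(n):
    """Ratio of the numeric bound to the leading term sqrt(n) * 2^(n/2)."""
    return smallest_h(n) / (math.sqrt(n) * 2 ** (n / 2))
```

For moderate $n$ the numeric bound sits slightly below $\sqrt{n}\,2^{n/2}$, which is consistent with the lower-order terms that the asymptotic statement drops.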
4.2 Upper bound
Recall that every Boolean function can be written as a positive polarity Reed-Muller (PPRM) expansion,
also known as the Zhegalkin polynomial,

$$f(x_1, x_2, ..., x_n) = a_0 \oplus a_1 x_1 \oplus a_2 x_2 \oplus a_3 x_1 x_2 \oplus ... \oplus a_{2^n-1} x_1 x_2 ... x_n, \qquad (1)$$

where $a_i$, $i = 0..2^n-1$, are Boolean numbers. We will rely on the PPRM expansion in our construction. In
particular, we start by describing how one may obtain all product terms that the PPRM expansion relies
on using the optimal number of Toffoli gates.
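As an illustration of expansion (1), the coefficients $a_i$ of a function given by its truth table can be computed with the binary Möbius transform. This is a standard technique and not something the paper prescribes; the function names below are hypothetical, and the sketch assumes that bit $t$ of an index corresponds to the variable $x_{t+1}$, so that index 3 denotes the term $x_1 x_2$ as in (1).

```python
def pprm_coefficients(truth_table):
    """Return the PPRM coefficients a_0..a_(2^n - 1) of a Boolean function.

    truth_table[x] is the function value on the input word x, where bit t
    of x is the value of x_(t+1). Uses the in-place binary Moebius transform.
    """
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1
    for i in range(n):
        bit = 1 << i
        for idx in range(len(coeffs)):
            if idx & bit:
                coeffs[idx] ^= coeffs[idx ^ bit]
    return coeffs


def evaluate_pprm(coeffs, x):
    """Evaluate the polynomial: EXOR of all terms whose variables are all 1 in x."""
    value = 0
    for mask, a in enumerate(coeffs):
        if a and (x & mask) == mask:
            value ^= 1
    return value


# Example: OR of two variables equals x1 XOR x2 XOR x1*x2.
or_coeffs = pprm_coefficients([0, 1, 1, 1])
```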
Lemma 4. The set of all $2^n$ $n$-bit product terms $\{1, x_1, x_2, x_1x_2, ..., x_1x_2...x_n\}$ may be generated by a
reversible NCT circuit with the optimal number of $2^n - n - 1$ Toffoli gates.
Proof. Lower bound. The set $\{1, x_1, x_2, x_1x_2, ..., x_1x_2...x_n\}$ contains $2^n - 1$ linearly independent terms
(all but the first term are linearly independent). The set of primary inputs, $\{x_1, x_2, ..., x_n\}$, contains $n$
linearly independent terms. For a set $S$ of Boolean functions, the only way to obtain a new Boolean
function that is linearly independent from all those in the set $S$ using NOT, CNOT, and Toffoli gates
applied to those functions in the set is to use a Toffoli gate. As such, to generate $2^n - 1$ linearly
independent functions from the original set containing $n$ linearly independent functions, one must use
at least $2^n - 1 - n$ Toffoli gates.
Upper bound. Denote by $C(n)$ the number of Toffoli gates used to obtain the set of the product terms
$\{1, x_1, x_2, x_1x_2, ..., x_1x_2...x_n\}$. Once the set $\{1, x_1, x_2, x_1x_2, ..., x_1x_2...x_n\}$ over $n$ variables is constructed,
obtain the set $\{1, x_1, x_2, x_1x_2, ..., x_{n+1}, ..., x_1x_2...x_{n+1}\}$ over $n + 1$ variables as follows. For each register
$r$ in the existing set except the first, use the Toffoli gate with controls $r$ and $x_{n+1}$, and a target holding the
value 0, to compute the product $r x_{n+1}$ into the target bit. This allows constructing the set of $2^n - 1$ terms,
$\{x_1x_{n+1}, x_2x_{n+1}, x_1x_2x_{n+1}, ..., x_1x_2...x_nx_{n+1}\}$, at the cost of $2^n - 1$ Toffoli gates. Uniting these newly
constructed terms with the set $\{1, x_1, x_2, x_1x_2, ..., x_1x_2...x_n\}$ that we already have and the input variable $x_{n+1}$
gives the desired set $\{1, x_1, x_2, x_1x_2, ..., x_{n+1}, ..., x_1x_2...x_{n+1}\}$. To summarise the above construction,
we can write the following equality

$$C(n + 1) = C(n) + 2^n - 1.$$

Observing that $C(1) = 0$ allows solving this recurrence to obtain the desired $C(n) = 2^n - n - 1$.
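The recursive construction in the proof of Lemma 4 can be simulated directly. In the Python sketch below, which is an illustration only, each register is represented by the bit mask of the variables in its product term (mask 0 stands for the constant 1); the function name and this representation are assumptions, not part of the paper.

```python
def build_product_terms(n):
    """Simulate the construction of all product terms over n variables.

    Returns (terms, toffoli_count), where terms is the set of bit masks of
    the product terms held in registers and toffoli_count is the number of
    Toffoli gates the construction applies.
    """
    terms = [0]  # mask 0 is the constant 1
    toffoli_count = 0
    for v in range(n):
        new_var = 1 << v
        products = []
        for r in terms:
            if r == 0:
                continue  # multiplying by the constant 1 needs no Toffoli gate
            products.append(r | new_var)  # TOF(r, x_(v+1); fresh zero target)
            toffoli_count += 1
        terms = terms + products + [new_var]
    return set(terms), toffoli_count
```

Calling `build_product_terms(n)` for small `n` returns all $2^n$ masks together with a Toffoli count that can be compared against $2^n - n - 1$.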
Theorem 2. $L_{0,0,1}(n) \lesssim \frac{3}{\sqrt{2}} \sqrt{n}\,2^{n/2}$.
Proof. To obtain a circuit computing the reversible function $f(x_1, x_2, ..., x_n) = (f_1(x_1, x_2, ..., x_n),
f_2(x_1, x_2, ..., x_n), ..., f_n(x_1, x_2, ..., x_n))$, we rely on the PPRM decomposition of the individual output
components, followed by the grouping of variables into two non-overlapping sets $A$ and $B$, $A \sqcup B =
\{x_1, x_2, ..., x_n\}$, containing $a$ and $b$ variables each ($a + b = n$), as follows. Denote $P(\sigma_1, \sigma_2, ..., \sigma_n) =
x_{i_1} x_{i_2} ... x_{i_k}$, where $\sigma_i$, $i = 1..n$, are Boolean numbers and $\{\sigma_{i_1}, \sigma_{i_2}, ..., \sigma_{i_k}\}$ is the set of all $\sigma_i = 1$ within
the set $\{\sigma_1, \sigma_2, ..., \sigma_n\}$. Boolean $n$-tuples can furthermore be treated as natural numbers (via binary
decomposition of integers).

$$
\begin{aligned}
f_i(x_1, x_2, ..., x_n) &= \bigoplus_{j=0}^{2^n-1} P(j) f_i(j) \\
&= \bigoplus_{\substack{j=0..2^a-1, \\ f_i(j,\sigma)=1}} P(j2^b) \,\&\, \Big( \bigoplus_{k=0..2^b-1} P(k) f_i(j,k) \Big) \\
&= \bigoplus_{\substack{j=0..2^a-1, \\ f_i(j,\sigma)=1}} P(j2^b) \,\&\, \Big( \bigoplus_{\substack{k=0..2^b-1, \\ f_i(j,k)=1}} P(k) \Big)
\end{aligned}
$$
The circuit implementing $f(x_1, x_2, ..., x_n)$ is obtained as follows:
1. Construct all positive polarity product terms over the set $B$ with $b$ variables. Per Lemma 4, this
requires $2^b - b - 1$ Toffoli gates.
2. Construct all positive polarity product terms over the set $A$ with $a$ variables. Per Lemma 4, this
requires $2^a - a - 1$ Toffoli gates.
3. For each of the $2^a$ product terms in the set $A$, multiply this term by the proper function over the
set $B$, obtained as a linear combination of product terms over the set $B$. Add the constructed
terms together to obtain the $i$th output bit. This operation requires $2^a - 1$ Toffoli gates for each of the
$n$ bits of the target function $f(x_1, x_2, ..., x_n)$; this is because we use a CNOT instead of the Toffoli
gate to multiply by the term 1. The total Toffoli gate count of this part is thus $n(2^a - 1)$.
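The grouped expansion that step 3 relies on can be checked numerically on a random output component. The Python sketch below is an illustration under stated assumptions: the helper names are hypothetical, the PPRM coefficients are obtained with the binary Möbius transform (a standard technique, not one named in the paper), and the variables of $A$ are assumed to occupy the high-order $a$ bits of an index, matching the term $P(j2^b)$.

```python
import random


def pprm_coefficients(truth_table):
    """PPRM coefficients of a Boolean function via the binary Moebius transform."""
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1
    for i in range(n):
        bit = 1 << i
        for idx in range(len(coeffs)):
            if idx & bit:
                coeffs[idx] ^= coeffs[idx ^ bit]
    return coeffs


def monomial(mask, x):
    """Value of the product term P(mask) on input x (1 for the empty product)."""
    return int((x & mask) == mask)


def grouped_eval(coeffs, a, b, x):
    """EXOR over j of P(j * 2^b) & (EXOR over k with nonzero coefficient of P(k))."""
    result = 0
    for j in range(1 << a):
        inner = 0
        for k in range(1 << b):
            if coeffs[(j << b) | k]:
                inner ^= monomial(k, x)
        result ^= monomial(j << b, x) & inner
    return result


def check_random_function(a, b, seed=0):
    """Compare the grouped expansion with a random truth table on a + b inputs."""
    rng = random.Random(seed)
    n = a + b
    table = [rng.randint(0, 1) for _ in range(1 << n)]
    coeffs = pprm_coefficients(table)
    return all(grouped_eval(coeffs, a, b, x) == table[x] for x in range(1 << n))
```

In `grouped_eval`, the outer loop visits the $2^a$ product terms over $A$; the term with `j == 0` is the constant 1, which is the case handled by a CNOT rather than a Toffoli gate in step 3.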
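The total Toffoli gate count in the above construction is $2^a + 2^b - a - b - 2 + n(2^a - 1)$. Assigning
$a := \frac{n-\log_2 n}{2}$ and $b := \frac{n+\log_2 n}{2}$, we furthermore obtain:

$$
\begin{aligned}
2^{\frac{n-\log_2 n}{2}} + 2^{\frac{n+\log_2 n}{2}} - n - 2 + n\left(2^{\frac{n-\log_2 n}{2}} - 1\right) &= \frac{2^{n/2}}{\sqrt{n}} + \sqrt{n}\,2^{n/2} - 2n - 2 + \sqrt{n}\,2^{n/2} \\
&= 2\sqrt{n}\,2^{n/2} + o(\sqrt{n}\,2^{n/2}).
\end{aligned}
$$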
The above calculation relies on real-valued $a$ and $b$, whereas in our construction the numbers $a$ and $b$
must take integer values. This limitation imposes the requirement to correct the leading coefficient by
the ratio $\frac{\min\{f(0), f(1)\}}{\min_{x \in [0,1]} f(x)}$, where $f(x) = 2^x + 2^{1-x}$, and $x$ plays the role of the fractional part of either $a$ or
$b$. This ratio equals $\frac{3}{2\sqrt{2}}$, resulting in the overall upper bound of

$$L_{0,0,1}(n) \lesssim \frac{3}{2\sqrt{2}} \cdot 2\sqrt{n}\,2^{n/2} = \frac{3}{\sqrt{2}} \sqrt{n}\,2^{n/2}.$$
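The rounding correction can be examined numerically. The Python sketch below is an illustration only; the helper names are hypothetical. It evaluates the ratio $f(0)/f(1/2)$ and minimises the Toffoli count $2^a + 2^b - a - b - 2 + n(2^a - 1)$ over integer splits. Because the theorem is asymptotic, the comparison function uses $\sqrt{(n+1)2^n}$ in place of $\sqrt{n\,2^n}$; this slightly looser form is an assumption made here so that the inequality can be checked for every small $n$, not a statement taken from the paper.

```python
import math


def f(x):
    """The function f(x) = 2^x + 2^(1 - x) from the rounding argument."""
    return 2 ** x + 2 ** (1 - x)


def toffoli_count(n, a):
    """Toffoli gates used by the construction for the split a + b = n."""
    b = n - a
    return 2 ** a + 2 ** b - a - b - 2 + n * (2 ** a - 1)


def best_integer_split(n):
    """Integer a in 1..n-1 minimising the Toffoli count."""
    return min(range(1, n), key=lambda a: toffoli_count(n, a))


def loosened_bound(n):
    """(3 / sqrt(2)) * sqrt((n + 1) * 2^n), an illustrative comparison value."""
    return 3 / math.sqrt(2) * math.sqrt((n + 1) * 2 ** n)


rounding_ratio = min(f(0), f(1)) / f(0.5)
```

Here `rounding_ratio` can be compared with $\frac{3}{2\sqrt{2}} \approx 1.0607$, and `toffoli_count(n, best_integer_split(n))` with `loosened_bound(n)` for a range of `n`.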
Conjecture 1. $L_{0,0,1}(n) \lesssim \sqrt{n}\,2^{n/2}$.
4.3 Corollaries and discussion
Define the non-Clifford cost of a quantum circuit to be the number of operations outside the Clifford
group that it contains. The T-count, a metric of this kind, is popular in applications, owing to the dominating
cost of the $T/T^{\dagger}$ gates over the cost of other gates.
Corollary 1. The T-count of quantum circuits implementing a reversible function $f(x)$ of $n$ primary
inputs in the form of the mapping $(x, 0) \mapsto (x, f(x))$ can be upper bounded by the expression
$21\sqrt{n}\,2^{n/2} + o(\sqrt{n}\,2^{n/2})$.
Proof. The desired construction relies on Bennett's trick [3]. Bennett's trick ensures that all
auxiliary bits are properly cleaned and no residual entanglement remains that may prevent the use of
the desired reversible circuit within quantum algorithms. In particular, apply the result of Theorem 2
to obtain a reversible NCT circuit with $n$ Boolean outputs $(f_1(x), f_2(x), ..., f_n(x)) = f(x)$. This circuit,
$C$, relies on $2^a + 2^b - n - 2 + n(2^a - 1)$ Toffoli gates and computes functions $f_1, f_2, ..., f_n$, product terms
over the set $A$, and product terms over the set $B$. To obtain the desired mapping $(x, 0) \mapsto (x, f(x))$, we
only need to uncompute product terms over the sets $A$ and $B$ using the inversion of the circuit that was
used to compute them. Such a circuit uses $2^a + 2^b - n - 2$ Toffoli gates. The overall gate count is thus
$2(2^a + 2^b - n - 2) + n(2^a - 1)$. Equating $2 \cdot 2^b = 2 \cdot 2^{n-a}$ to $n2^a$ yields favourable asymptotics.
This requires the parameter $a$ to take the value $\frac{n+1-\log_2 n}{2}$. The overall Toffoli gate count, corrected for
the fact that $a$ must be an integer, is thus $\frac{3}{2\sqrt{2}} \cdot 2\sqrt{2}\sqrt{n}\,2^{n/2} + o(\sqrt{n}\,2^{n/2}) = 3\sqrt{n}\,2^{n/2} + o(\sqrt{n}\,2^{n/2})$. Since
each Toffoli gate relies on seven $T/T^{\dagger}$ gates, the overall T-count is $21\sqrt{n}\,2^{n/2} + o(\sqrt{n}\,2^{n/2})$.
Table 1 reports upper bounds on the number of $T/T^{\dagger}$ gates in the NCT circuit realizations of reversible
$n$-bit functions for small values of $n$.
In our constructions of the upper bounds we relied on the seven $T/T^{\dagger}$ gate implementation of the Toffoli
gate. However, in the presence of measurements and the ability for classical feedback, the Toffoli gate
can be implemented via a circuit with only four $T/T^{\dagger}$ gates [5]. This means that the upper bound in
Corollary 1 drops down to $12\sqrt{n}\,2^{n/2} + o(\sqrt{n}\,2^{n/2})$, and all T-counts in Table 1 can be reduced by the
factor of 7/4 (e.g., a 15-bit reversible function would require at most 8252 $T/T^{\dagger}$ gates).
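The entries of Table 1 can be recomputed from the Toffoli count in the proof of Corollary 1. The Python sketch below assumes that each entry is seven times $2(2^a + 2^b - n - 2) + n(2^a - 1)$, minimised over integer splits $a + b = n$ with $a, b > 0$; the function names are hypothetical. Under this assumption the sketch agrees with the listed values for $n = 4$ to $20$; for $n = 3$ it evaluates to 35 rather than the listed 36, and it makes no attempt to account for that entry.

```python
def toffoli_count_with_uncompute(n, a):
    """Toffoli gates for the split a + b = n, including the uncomputation."""
    b = n - a
    return 2 * (2 ** a + 2 ** b - n - 2) + n * (2 ** a - 1)


def t_count_upper_bound(n, t_per_toffoli=7):
    """T-count bound: cost per Toffoli times the best integer split."""
    best = min(toffoli_count_with_uncompute(n, a) for a in range(1, n))
    return t_per_toffoli * best


# Values for n = 4..20, using seven T/T-dagger gates per Toffoli gate.
table_values = {n: t_count_upper_bound(n) for n in range(4, 21)}

# The same bound with four T/T-dagger gates per Toffoli gate, for n = 15.
reduced_15 = t_count_upper_bound(15, t_per_toffoli=4)
```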
| n | T-count | n | T-count |
|---|---------|---|---------|
| 3 | 36 | 12 | 4,648 |
| 4 | 84 | 13 | 6,643 |
| 5 | 175 | 14 | 10,430 |
| 6 | 294 | 15 | 14,441 |
| 7 | 525 | 16 | 22,036 |
| 8 | 812 | 17 | 30,975 |
| 9 | 1,295 | 18 | 46,186 |
| 10 | 2,002 | 19 | 65,877 |
| 11 | 2,989 | 20 | 96,320 |

Table 1: Upper bounds on the T-count of an arbitrary reversible $f(x)$ over $n$ variables, implemented as
the mapping $(x, 0) \mapsto (x, f(x))$ with the use of arbitrary ancillae.

In the above, we upper bounded the T-count cost of the implementations of reversible functions by an
expression of the form $O(\sqrt{n}\,2^{n/2})$, as well as reported a table showing the T-count for small numbers
of inputs $n$ (Table 1). We will next consider a more realistic circuit cost metric and establish a lower
bound on its value to show that the use of the T-count may significantly downplay the real cost of
circuit implementations. The lesson here is that the T-count metric must be used with extra care, or better
yet replaced with a metric that does not lead to an abuse of a resource deemed less costly and thereby
ignored, such as the T-count does with the CNOT gates.
Per previous constructions, the number of Toffoli gates that suffice to implement an arbitrary $n$-bit
reversible function is upper bounded by the expression $\min_{\{a>0,\,b>0,\,a+b=n\}} \left(2(2^a + 2^b - n - 2) + n(2^a - 1)\right)$,
and thereby the T-count is no more than

$$X := 7 \cdot \min_{\{a>0,\,b>0,\,a+b=n\}} \left(2(2^a + 2^b - n - 2) + n(2^a - 1)\right),$$

in the scenario when we are concerned with the potential unwanted entanglement. Recall that when
applying Bennett's trick to this circuit, the sub-functions $\bigoplus_{k=0..2^b-1,\, f_i(j,k)=1} P(k)$ need to be
computed and multiplied by a proper product over the variable set $A$ only in the first part of the circuit, but
are unnecessary in the second part, as they are uncomputed after each use. Such circuits implementing
the reversible functions use no more than a total of $S = 2^a + 2^b + n + 1$ bits: $2^a$ bits contain products
over the set $A$ (including primary inputs), plus $2^b$ bits containing products over the set $B$ (including
primary inputs), plus $n$ bits where the output values $f_1, f_2, ..., f_n$ are constructed, plus 1 bit to
compute/uncompute different sub-functions $\bigoplus_{k=0..2^b-1,\, f_i(j,k)=1} P(k)$. Next, establish a lower bound on the
quantity $L_{0,1,1}(n, S - n)$, counting the number of CNOT and Toffoli gates in the reversible circuits over
$S$ bits and implementing reversible $n$-bit functions. Considering the quantity $L_{0,1,1}(n, S - n)$ ensures we
use the same number of ancillary qubits as that used to obtain the number $X$. Applying [15, Lemma 8], a
lower bound is given by the expression

$$Y := \frac{\log_2 G}{\log_2 b} = \frac{\log_2\left((2^n)!/2^n\right)}{\log_2\left(4S(S-1)(S-2) + 4S(S-1)\right)} \quad \left(= \frac{2^{n+1}}{3} + o(2^n)\right).$$

To be able to compare the numeric values of $X$, being the upper bound expressing the T-count, to $Y$, being
the lower bound for the number of CNOT/Toffoli gates, divide $Y$ by the cost of the T gate expressed
in terms of the cost of the CNOT, being the cheaper one between the CNOT and the Toffoli. We have
previously established that this number may carry a value of about 50. Comparing $X$ to $Y/50$ reveals
that for $n = 27$, $a = 12$ the latter is already greater than the former. This means that the T-count cost
metric may already undervalue the real cost of the circuits when $n$ is as small as 27. By the time $n = 50$
($a = 23$), the difference between the two grows to a factor of 2662, meaning that for numbers this
high the T-count cost metric can be rather misleading. The discrepancy furthermore grows very rapidly
with $n$, specifically, at the rate $C\frac{2^{n/2}}{\sqrt{n}}$, for some constant $C$.
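The comparison of $X$ with $Y/50$ can be reproduced numerically. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, $G$ is read as $(2^n)!/2^n$, the integer split $a$ is the one that minimises $X$, and $\log_2 (2^n)!$ is evaluated in floating point through `math.lgamma`.

```python
import math


def toffoli_count_with_uncompute(n, a):
    """Toffoli gates for the split a + b = n, including the uncomputation."""
    b = n - a
    return 2 * (2 ** a + 2 ** b - n - 2) + n * (2 ** a - 1)


def x_and_split(n):
    """Return (X, a): the T-count bound and the integer split attaining it."""
    a = min(range(1, n), key=lambda s: toffoli_count_with_uncompute(n, s))
    return 7 * toffoli_count_with_uncompute(n, a), a


def y_lower_bound(n, a):
    """Lower bound Y on the CNOT/Toffoli count over S = 2^a + 2^b + n + 1 bits."""
    b = n - a
    S = 2 ** a + 2 ** b + n + 1
    log2_g = math.lgamma(2 ** n + 1) / math.log(2) - n  # log2((2^n)! / 2^n)
    log2_gates = math.log2(4 * S * (S - 1) * (S - 2) + 4 * S * (S - 1))
    return log2_g / log2_gates


def discrepancy(n, t_cost_in_cnots=50):
    """Ratio (Y / t_cost_in_cnots) / X for the split that minimises X."""
    x, a = x_and_split(n)
    return y_lower_bound(n, a) / t_cost_in_cnots / x
```

A value of `discrepancy(n)` above 1 means that $Y/50$ exceeds $X$ for that $n$; evaluating it at `n = 27` and `n = 50` allows a comparison with the figures quoted above.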
While the above numbers clearly discourage the use of the T-count circuit metric in scalable designs,
we suspect that the real scope of the potential misinformation carried by using the T-count may be much
larger. This is because in our calculations we did not account for such resources as the cost of ancillae, or
the cost of the long-range CNOT gates, that are downplayed (in fact, ignored) by the T-count. On the
other hand, we proved that the discrepancy exists in general, whereas practical quantum computations
rely on very specific and well-structured reversible transformations (such as arithmetic circuits, including
exponentiation part of Shor’s algorithm). The extent to which the discrepancy can and does manifest
itself in practice and over such structured circuits needs to be studied separately.
5 Summary of the results
Our study details reversible NCT circuit complexity figures by the gate types, leading to the following
list of refined optimal and asymptotically optimal values for the respective counts.
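000. $\forall g\ L_{0,0,0}(n, g) = 0$;
001. $\sqrt{n}\,2^{n/2} \lesssim L_{0,0,1}(n) \lesssim \frac{3}{\sqrt{2}} \sqrt{n}\,2^{n/2}$;
010. $\forall g\ L_{0,1,0}(n, g) = 0$;
011. $\frac{n2^n}{3\log_2 n} \lesssim L_{0,1,1}(n, 1) \lesssim \frac{40n2^n}{\log_2 n}$;
100. $\forall g\ L_{1,0,0}(n, g) = 0$;
101. $\sqrt{n}\,2^{n/2} \lesssim L_{1,0,1}(n) \lesssim \frac{3}{\sqrt{2}} \sqrt{n}\,2^{n/2}$;
110. $L_{1,1,0}(n, 1) = 1$, $\forall g > 1\ L_{1,1,0}(n, g) = 0$;
111. $\frac{n2^n}{3\log_2 n} \lesssim L_{1,1,1}(n, 1) \lesssim \frac{48n2^n}{\log_2 n}$;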
6 Conclusion
In this paper, we studied the complexity function $L_{a,b,c}(n, g)$, detailing reversible NCT circuit costs by
the gate types used. We established asymptotically optimal or optimal counts in every possible scenario.
Of these, some bounds were known from the previous literature. We upper and lower bounded the
multiplicative complexity of reversible circuits, leading to their asymptotic optimality. We formulated
a conjecture stating that $L_{0,0,1}(n) \lesssim \sqrt{n}\,2^{n/2}$. Proving this conjecture would establish that the
multiplicative complexity of reversible functions is equal to $\sqrt{n}\,2^{n/2}$ up to lower order additive terms. We
furthermore applied our study to show the limitations on the use of the T-count, multiplicative
complexity, and Toffoli count metrics in practical designs. The discrepancy between a real cost and the one
provided by the T-count/multiplicative complexity/Toffoli count may be as high as $C\frac{2^{n/2}}{\sqrt{n}}$, where $C$ is a
constant. Taking some realistic parameters we estimated that for $n = 50$ the T-count may misrepresent
the real cost of the circuit it is applied to evaluate by a factor of as much as 2662.
Acknowledgements
I wish to thank Prof. Sergey B. Gashkov from Lomonosov Moscow State University and Dr. Martin
Rötteler from Microsoft Research for their helpful discussions.
This material was based on work supported by the National Science Foundation, while working at the
Foundation. Any opinion, finding, and conclusions or recommendations expressed in this material are
those of the author and do not necessarily reflect the views of the National Science Foundation.
References
[1] M. Amy, D. Maslov, M. Mosca, and M. Rötteler, “A meet-in-the-middle algorithm for fast synthesis
of depth-optimal quantum circuits”, IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, 32(6):818–830, 2013, arXiv:1206.0758.
[2] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus, P. Shor, T. Sleator, J. Smolin,
and H. Weinfurter, “Elementary gates for quantum computation”, Phys. Rev. A 52:3457–3467, 1995,
quant-ph/9503016.
[3] C. H. Bennett, “Logical reversibility of computation”, Maxwell’s Demon. Entropy, Information, Com-
puting: 197–204, 1973.
[4] S. Bravyi and A. Kitaev, “Universal quantum computation with ideal Clifford gates and noisy ancil-
las”, Phys. Rev. A 71, 022316, 2005, quant-ph/0403025.
[5] C. Jones, “Novel constructions for the fault-tolerant Toffoli gate”, Phys. Rev. A 87, 022328, 2013,
arXiv:1212.5069.
[6] E. Kashefi, A. Kent, V. Vedral, and K. Banaszek, “A comparison of quantum oracles”, Phys. Rev. A
65, 050304, 2002, quant-ph/0109104.
[7] O. B. Lupanov. “On circuits of functional elements with delays”, in Russian, Problemy Kibernetiki,
23:43–81, 1970.
[8] D. Maslov, D. M. Miller, and G. W. Dueck, “Techniques for the synthesis of reversible Toffoli net-
works”, ACM Transactions on Design Automation of Electronic Systems, 12(4), article 42, 2007,
quant-ph/0607166.
[9] D. M. Miller, D. Maslov, and G. W. Dueck, “A transformation based algorithm for reversible logic
synthesis”, Proc. Design Automation Conference, pages 318–323, 2003.
[10] E. I. Nechiporuk, “On the complexity of schemes in some bases containing nontrivial elements with
zero weights”, in Russian, Problemy Kibernetiki 8:123–160, 1962.
[11] M. Nielsen and I. Chuang, Quantum Computation and Quantum Information, Cambridge University
Press, 2000.
[12] M. Saeedi, private communication, January 23, 2016.
[13] M. Saeedi, M. S. Zamani, M. Sedighi, and Z. Sasanian, “Reversible circuit synthesis using a cycle-
based approach”, ACM Journal of Emerging Technologies in Computing Systems, 6(4), article 13,
2010, arXiv:1004.4320.
[14] V. V. Shende and I. L. Markov, “On the CNOT-cost of Toffoli gates”, Quantum Information and
Computation 9(5-6):461–486, 2009, arXiv:0803.2316.
[15] V. V. Shende, A. K. Prasad, I. L. Markov, and J. P. Hayes, “Synthesis of reversible logic circuits”,
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 22(6):710–722,
June 2003, quant-ph/0207001.
[16] D. Zakablukov, “On asymptotic gate complexity and depth of reversible circuits without additional
memory”, 2015, arXiv:1504.06876.
