Lowering the T-depth of Quantum Circuits By Reducing the Multiplicative
  Depth Of Logic Networks by Häner, Thomas & Soeken, Mathias
Thomas Haener Mathias Soeken
Microsoft Quantum, Switzerland
ABSTRACT
The multiplicative depth of a logic network over the gate basis
{∧, ⊕,¬} is the largest number of ∧ gates on any path from a
primary input to a primary output in the network. We describe a
dynamic-programming-based logic synthesis algorithm to reduce
the multiplicative depth of logic networks. It makes use of cut
enumeration, tree balancing, and exclusive sum-of-products (ESOP)
representations. Our algorithm has applications to cryptography
and quantum computing, as a reduction in the multiplicative depth
directly translates to a lower T-depth of the corresponding quantum
circuit. Our experimental results show improvements in T-depth
over state-of-the-art methods and over several hand-optimized
quantum circuits for instances of AES, SHA, and floating-point
arithmetic.
1 INTRODUCTION
Logic networks are the central data structure in logic optimiza-
tion algorithms, which have been widely applied for technology-
independent optimization in electronic design automation applica-
tions [9, 38, 54]. Roughly speaking, the number of logic gates in a
logic network corresponds to the size of a physical implementation,
while the number of logic levels corresponds to its delay.
In recent years, the domain of applications for logic optimization
has broadened to also target areas such as cryptography [7] and
fault-tolerant quantum computing (see, e.g., [28, 29, 50, 52]). Logic
networks are typically represented over a gate set consisting of
2-input AND gates, 2-input XOR gates, and inverters, called XOR-
AND graphs (XAGs), in which only the AND gates contribute to
the cost functions. The multiplicative complexity (MC, [48]) and
the multiplicative depth (MD, [11]) of a Boolean function are two
important theoretical metrics. The multiplicative complexity is
the smallest number of AND gates necessary in any XAG that
represents the function. Similarly, the multiplicative depth of a
function is the smallest number of AND gates on a critical path
(counting only AND gates), taken over all XAGs that represent the
function. We also refer to the number of AND gates on the critical
path of an XAG as its AND-depth.
Multiplicative complexity and depth play important roles in
cryptography and fault-tolerant quantum computing. A low multi-
plicative complexity corresponds to a higher vulnerability to some
cryptographic attacks. In fault-tolerant quantum computing, the
multiplicative complexity provides an upper bound on the num-
ber of expensive quantum operations as well as the number of
qubits [31]. Furthermore, the multiplicative depth corresponds to
the execution time of a quantum algorithm [32]. Computing the
multiplicative complexity of a Boolean function f is expensive: it
has been shown that, if one-way functions [26] exist, no algorithm
can compute the multiplicative complexity of f in time polynomial in
the size of the truth table of f [17]. We are not aware of any
theoretical results concerning the multiplicative depth.
[Figure 1: XAG for the majority-of-5 function f = ⟨x1x2x3x4x5⟩, with multiplicative complexity 3 and multiplicative depth 2; 2-input XOR gates are merged into multi-input XOR gates.]
We refer to the number of AND gates and the AND-depth of an
XAG as the MC and MD of the XAG, respectively. We use just MC or MD
if it is clear from the context whether we refer to the MC/MD of
a Boolean function or to the MC/MD of a logic network. We note
that the latter provides an upper bound on the former.
Thus, many heuristics have been proposed that reduce the MC
of an XAG (see, e.g., [7, 14, 43, 55, 56]), aiming to arrive at tighter
upper bounds on the MC of the function being implemented. Simi-
larly, some heuristics have been proposed that aim to reduce the
multiplicative depth [5, 11]. In this paper, we introduce a logic syn-
thesis algorithm to reduce the MD of logic networks. Our algorithm
is based on dynamic programming and makes use of cut enumera-
tion [15], tree balancing [35], as well as ESOP [46] and ESPP [21]
representations.
Contributions. We present a fully automatic logic synthesis al-
gorithm that reduces the multiplicative depth of logic networks. We
present benchmarks demonstrating that our algorithm is capable of
reducing the MD by up to 3× for depth-optimized logic networks
and up to 9× for MC-optimized logic networks. As a result, the
quantum circuits derived from our depth-optimized networks also
feature depths that are significantly smaller than state-of-the-art
circuit designs. Crucially, these improvements in depth are possible
without increasing the number of qubits significantly.
2 PRELIMINARIES
2.1 Logic networks
In this work, we consider XOR-AND graphs (XAGs), which are logic
networks consisting of 2-input AND gates and 2-input XOR gates. Such
logic networks can represent all 0-preserving Boolean functions,
i.e., functions f for which f(0, . . . , 0) = 0. We are interested in
logic networks that minimize the maximum number of AND gates
on any path from an input to an output as the primary cost criterion,
and the overall number of AND gates as a secondary cost criterion.
arXiv:2006.03845v1 [quant-ph] 6 Jun 2020
A function f that is not 0-preserving can be realized by finding
an XAG for its complement f̄ and then inverting the output. Restricting
inversions to the outputs does not affect the number of AND gates in
the circuit, as all inner inversions can be propagated to the outputs
using only XOR gates [48].
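The claim that inner inversions can be moved to the outputs with XOR gates alone rests on a few elementary identities over GF(2). The following worked derivation is a sketch of the argument, not the exact procedure of [48]; a and b denote arbitrary gate inputs:
$$
\begin{aligned}
\bar a \wedge b &= (1 \oplus a) \wedge b = (a \wedge b) \oplus b,\\
\bar a \wedge \bar b &= (1 \oplus a) \wedge (1 \oplus b) = (a \wedge b) \oplus a \oplus b \oplus 1,\\
\bar a \oplus b &= a \oplus b \oplus 1 = \overline{a \oplus b}.
\end{aligned}
$$
Each identity replaces a gate with an inverted input by the same AND or XOR gate on the uninverted inputs, followed by additional XOR gates and possibly an inversion at its output. Applying the identities in topological order therefore pushes every inversion toward the outputs while the number of AND gates stays the same.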
Formally, we model an XAG for a single-output Boolean function
f over n variables x1, . . . , xn as a sequence of steps, or gates,
$$x_i = x_{j_{1i}} \circ_i x_{j_{2i}} \tag{1}$$
for $n < i \le n + r$ and $\circ_i \in \{\oplus, \wedge\}$. The values $1 \le j_{1i} < j_{2i} < i$ point
to primary inputs or previous steps in the network. The function
value is computed by the last step, $f = x_{n+r}$. This model is readily
extended to multi-output Boolean functions by associating each
output function with some step in the network. The logic level of a
primary input or gate i is defined as
$$\ell_i = \begin{cases} 0 & \text{if } i \le n,\\ \max\{\ell_{j_{1i}}, \ell_{j_{2i}}\} & \text{if } i > n \text{ and } \circ_i = \oplus,\\ \max\{\ell_{j_{1i}}, \ell_{j_{2i}}\} + 1 & \text{if } i > n \text{ and } \circ_i = \wedge. \end{cases} \tag{2}$$
The depth of an XAG is $d = \max\{\ell_i \mid 1 \le i \le n + r\}$, the largest
level among all gates. In other words, if we aim at parallelizing the
evaluation of a logic network, the logic level of a step is the earliest
time at which the step can be computed. Similarly, we define the
reverse logic level $\ell^r_i$ as the latest time at which step i must be
computed without increasing the depth of the logic network.
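To make the step model and Eq. (2) concrete, the following Python sketch computes logic levels, reverse logic levels, MC, and MD of an XAG. The encoding (a list of `(op, j1, j2)` triples for steps n+1, ..., n+r, with `"and"` and `"xor"` as operation names) and all function names are illustrative assumptions, not the data structures of the authors' implementation.

```python
"""Sketch of the XAG step model of Eq. (1) with logic levels (Eq. (2))."""

# Hypothetical encoding: steps 1..n are primary inputs; step n+1+t is
# gates[t] = (op, j1, j2) with op in {"and", "xor"} and j1 < j2 < n+1+t.


def logic_levels(n, gates):
    """Return levels[i] for steps 1..n+r (index 0 is an unused placeholder)."""
    levels = [0] * (n + len(gates) + 1)
    for t, (op, j1, j2) in enumerate(gates):
        i = n + 1 + t
        # XOR gates do not increase the level, AND gates add one (Eq. (2))
        levels[i] = max(levels[j1], levels[j2]) + (1 if op == "and" else 0)
    return levels


def reverse_levels(n, gates):
    """Latest level at which each step can be computed without increasing the depth."""
    levels = logic_levels(n, gates)
    depth = max(levels)
    # Steps without fanout may be computed as late as the depth itself.
    rev = [depth] * len(levels)
    for t in reversed(range(len(gates))):
        op, j1, j2 = gates[t]
        # an AND gate computed at time rev[i] needs its operands one step earlier
        ready_by = rev[n + 1 + t] - (1 if op == "and" else 0)
        rev[j1] = min(rev[j1], ready_by)
        rev[j2] = min(rev[j2], ready_by)
    return rev


def mc_md(n, gates):
    """Number of AND gates (MC) and AND-depth (MD) of the XAG."""
    mc = sum(1 for op, _, _ in gates if op == "and")
    return mc, max(logic_levels(n, gates))


if __name__ == "__main__":
    # x4 = x1 & x2, x5 = x3 ^ x4, x6 = x3 & x5
    example = [("and", 1, 2), ("xor", 3, 4), ("and", 3, 5)]
    print(logic_levels(3, example)[1:])
    print(reverse_levels(3, example)[1:])
    print(mc_md(3, example))
```

A step whose level equals its reverse level has no slack; in the example this holds for all three gates, while input x3 has level 0 but reverse level 1.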
2.2 Cut enumeration
Many logic optimization algorithms are based on applying local
changes to small subnetworks instead of considering the whole logic
network at once. An important family of single-rooted subnetworks
are cuts. Formally, a cut C of a step i in a logic network is a set of
steps, called leaves, such that (i) every path from step i to a primary
input visits at least one leaf, and (ii) each leaf is contained in at least
one such path. Step i is called the root of the cut, and each cut represents
a subgraph that includes the root i and some internal steps and
has the leaves as primary inputs. A cut is k-feasible (also called a
k-cut) if |C| ≤ k, i.e., if it has at most k leaves.
Cut enumeration [15] is an algorithm that computes all or a subset
of all k-cuts for each step in a network. It constructs a mapping
CUTS(i) that maps each step to a set of cuts using the following
recursive procedure:
$$\mathrm{CUTS}(i) = \begin{cases} \{\{i\}\} & \text{if } i \le n,\\ \{\{i\}\} \cup \{\, C_1 \cup C_2 \mid C_1 \in \mathrm{CUTS}(j_{1i}),\ C_2 \in \mathrm{CUTS}(j_{2i}),\ |C_1 \cup C_2| \le k \,\} & \text{otherwise.} \end{cases} \tag{3}$$
The cut {i} of root i is called its trivial cut. Trivial cuts are
essential, since otherwise the leaves of cuts could only be primary
inputs. Cut enumeration can also compute the function FUNC(i, C)
represented by a cut C of root i, by assigning $\mathrm{FUNC}(i, \{i\}) = x_i$ for all
trivial cuts, and
$$\mathrm{FUNC}(i, C) = \mathrm{FUNC}(j_{1i}, C_1) \circ_i \mathrm{FUNC}(j_{2i}, C_2) \tag{4}$$
if C was constructed from C1 and C2 in (3). Support-normalized
truth tables are typically used to represent the cut functions; e.g.,
the truth tables of the cut functions x1 ∧ x3 and x4 ∧ x9 are both represented
by the 4-bit string $(1000)_2$. The variables to which a truth table refers
can be determined from the cut’s leaves.
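The recursion in Eq. (3) together with Eq. (4) can be sketched in Python as follows. Cuts are stored as sorted tuples of leaves, and each cut function is a support-normalized truth table held in an integer, where bit a is the function value when the t-th leaf (in sorted order) takes bit t of a. The encoding and the helper names `project` and `enumerate_cuts` are hypothetical, and the sketch performs no cut pruning; if the same leaf set is derived twice, it keeps the first derivation.

```python
"""Sketch of k-cut enumeration (Eq. (3)) with support-normalized truth tables (Eq. (4))."""
from itertools import product


def project(a, leaves, sub_leaves):
    """Map an assignment index over `leaves` to the index of its restriction to `sub_leaves`."""
    idx = 0
    for pos, leaf in enumerate(sub_leaves):
        if (a >> leaves.index(leaf)) & 1:
            idx |= 1 << pos
    return idx


def enumerate_cuts(n, gates, k):
    """Return cuts[i]: dict from a sorted tuple of leaves to the cut function's truth table.

    Steps 1..n are inputs; gates[t] = (op, j1, j2) defines step n+1+t.
    """
    cuts = {i: {(i,): 0b10} for i in range(1, n + 1)}  # trivial cuts, FUNC = x_i
    for t, (op, j1, j2) in enumerate(gates):
        i = n + 1 + t
        result = {(i,): 0b10}  # trivial cut of step i
        for (c1, t1), (c2, t2) in product(cuts[j1].items(), cuts[j2].items()):
            leaves = tuple(sorted(set(c1) | set(c2)))
            if len(leaves) > k or leaves in result:
                continue  # too many leaves, or leaf set already derived
            table = 0
            for a in range(1 << len(leaves)):
                b1 = (t1 >> project(a, leaves, c1)) & 1
                b2 = (t2 >> project(a, leaves, c2)) & 1
                table |= ((b1 & b2) if op == "and" else (b1 ^ b2)) << a
            result[leaves] = table
        cuts[i] = result
    return cuts
```

With this convention, a 2-leaf AND cut gets the table `0b1000` regardless of which two steps are its leaves, matching the x1 ∧ x3 and x4 ∧ x9 example above.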
2.3 Exclusive sum-of-products
An ESOP for an n-variable Boolean function $f(x_1, \dots, x_n)$ has the
form
$$f(x_1, \dots, x_n) = \bigoplus_{j=1}^{m} \left( x_1^{p_{1,j}} \wedge \cdots \wedge x_n^{p_{n,j}} \right) \tag{5}$$
for some m and polarities $p_{i,j}$, which take values from 0 to 2. Their
meaning is that $x_i^0 = \bar x_i$, $x_i^1 = x_i$, and $x_i^2 = 1$. We call $x_i^0$ a negative
literal, $x_i^1$ a positive literal, and $x_i^2$ an empty literal. If m = 0, we
define $f(x_1, \dots, x_n) = 0$. The constant-1 function can be represented
by an ESOP where m = 1 and $p_{1,1} = \cdots = p_{n,1} = 2$.
Each term $\left( x_1^{p_{1,j}} \wedge \cdots \wedge x_n^{p_{n,j}} \right)$ is called a cube of degree $d_j = |\{i \mid p_{i,j} \neq 2\}|$. It can be regarded as an $(n - d_j)$-dimensional subcube of
the n-dimensional hypercube, in which the $2^n$ vertices correspond
to all bitstrings of length n. We require that no cube occurs more
than once in an ESOP. The degree of the ESOP is $\max_{1 \le j \le m} d_j$.
An ESOP in which $p_{i,j} \neq 0$ for all $1 \le i \le n$, $1 \le j \le m$ is
called the algebraic normal form of f. It is unique up to permutation
of the cubes. The degree of the algebraic normal form is called
the algebraic degree of f and is a lower bound for the degree of
any ESOP for f. An ESOP can be translated into the algebraic
normal form by replacing each cube with $2^l$ cubes in which all
$l = |\{i \mid p_{i,j} = 0\}|$ negative literals are replaced by all combinations
of positive and empty literals, using $\bar x_i = x_i \oplus 1$; cubes that then
occur an even number of times cancel.
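The translation into algebraic normal form can be sketched in a few lines of Python. Cubes are encoded as tuples of polarities (0 = negative, 1 = positive, 2 = empty literal), following Eq. (5); the function names are hypothetical. The symmetric-difference update implements the cancellation of cubes that appear an even number of times.

```python
"""Sketch: expand an ESOP into its algebraic normal form (ANF)."""
from itertools import product


def eval_esop(cubes, x):
    """Evaluate an ESOP given as polarity tuples at the assignment x (tuple of 0/1)."""
    value = 0
    for cube in cubes:
        term = 1
        for p, xi in zip(cube, x):
            if p == 0:
                term &= 1 - xi  # negative literal
            elif p == 1:
                term &= xi      # positive literal; p == 2 is the empty literal
        value ^= term
    return value


def esop_to_anf(cubes):
    """Replace each negative literal by a positive and an empty literal (x' = x XOR 1)."""
    anf = set()
    for cube in cubes:
        negative = [i for i, p in enumerate(cube) if p == 0]
        for choice in product((1, 2), repeat=len(negative)):  # 2**l expanded cubes
            expanded = list(cube)
            for i, p in zip(negative, choice):
                expanded[i] = p
            anf ^= {tuple(expanded)}  # a cube that appears twice cancels
    return anf


def degree(cubes):
    """Largest number of non-empty literals in any cube (algebraic degree for an ANF)."""
    return max((sum(p != 2 for p in cube) for cube in cubes), default=0)
```

For example, the single cube x̄1 x2, encoded as `(0, 1)`, expands to x1 x2 ⊕ x2, i.e., `{(1, 1), (2, 1)}`, of degree 2.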
Various exact and heuristic algorithms [8, 16, 20, 37, 40, 44, 46,
47, 53] exist to find ESOPs for Boolean functions, where the primary
cost function is the number of cubes in the ESOP and the secondary
cost function is the total number of non-empty literals. The positive
impact of ESOP expressions on our work is mainly that they have a
small depth, which gives them the potential to reduce the
multiplicative depth; however, they are likely to introduce many AND
gates to express the cubes. An ESOP optimization algorithm that
targets the number of literals as the primary cost would therefore be a
better fit for our application.
2.4 Quantum computing
A quantum computer contains quantum bits, so-called qubits, to
which quantum gates are applied in order to solve a computational
task. It is controlled by a classical computer running a quantum
program, which consists of both classical and quantum instructions:
classical instructions are executed by the (classical) host computer,
and quantum instructions get sent to the quantum co-processor
for execution. In each computational step, the classical computer
decides on the sequence of quantum instructions to be executed
on the co-processor. Such sequences can be depicted as quantum
circuits. The circuit diagram is read from left to right, with each
horizontal line representing a qubit, and quantum gates are repre-
sented as boxes/symbols on these lines. Fig. 2 shows a quantum
circuit that computes the majority-of-5 function and is derived from
the logic network in Fig. 1. The circuit consists of CNOT gates,
AND gates, as well as uncomputing AND gates. CNOT gates act
on two qubits and compute the XOR of both qubit values onto the
lower (target) qubit, leaving the upper (control) qubit unchanged.
The AND gate computes a 1 on a newly initialized target qubit if
and only if the two control qubits are 1. The uncomputing AND
gate expects that the target qubit is 1 if and only if the two control
qubits are 1, and releases the target qubit in a clean state such
that it can be used for subsequent computations.
[Figure 2: Quantum circuit for the majority-of-5 function with inputs |x1⟩, . . . , |x5⟩ and output |f⟩, derived from the logic network in Fig. 1. Each of the computing AND gates can be realized in T-depth 1 when using one helper qubit. The circuit therefore has T-count 12, T-depth 2, and 11 qubits, including qubits for function inputs and outputs.]
In this paper, we target quantum computers running a protocol
for fault tolerance, which is necessary to run quantum algorithms
with more than a few thousand operations, e.g., for chemistry simulations
of practical interest [42]. In this setting, the focus of circuit
optimization shifts away from two-qubit gates (e.g., for NISQ
devices [41]) toward gates that require distillation. In particular,
when the surface code is used, the so-called T-gate incurs a large
overhead [2, 39]. In fault-tolerant quantum computing, the cost
of CNOTs is typically neglected. The AND gate has a T-count
of 4 and a T-depth of 1 if one additional helper qubit is used for
its implementation [23] (otherwise, it can be implemented with a
T-depth of 2 without the use of a helper qubit). The uncomputing
AND gate requires no T-gates.
Previous work [31] focused on reducing the number of costly
T-gates. Instead, we aim to shorten the time to solution by reducing
the T-depth.
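As a worked example of this cost model, the T-cost of a circuit derived from an XAG can be estimated from the XAG's MC and MD alone. The estimate below assumes one helper qubit per AND gate, free CNOTs and uncomputing AND gates, and that all AND gates on the same logic level can be executed in parallel. Under these assumptions, the circuit in Fig. 2 (MC 3, MD 2) gives:
$$
\begin{aligned}
T\text{-count} &= 4 \cdot \mathrm{MC} = 4 \cdot 3 = 12,\\
T\text{-depth} &= 1 \cdot \mathrm{MD} = 1 \cdot 2 = 2.
\end{aligned}
$$
Both values agree with the caption of Fig. 2. The parallelization assumption enters only the second line; AND gates that share inputs may otherwise have to be executed one after another.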
3 MULTIPLICATIVE DEPTH REDUCTION
In this section, we introduce various methods that reduce the multi-
plicative depth of logic networks. Then, we present a procedure to
map these networks to quantum circuits while maintaining depth
improvements.
3.1 Cut-based balancing
Algorithm 1 describes a generic balancing algorithm based on dy-
namic programming and cut enumeration inspired by [35]. It takes
as input a logic network for an n-variable Boolean function with r
steps and returns a new depth-optimized logic network. Travers-
ing all steps i in topological order, it computes depth-optimized
candidates for each cut C of i , and stores the best candidate in a
mapping BEST(i). The output of the depth-optimized network is
BEST(n + r) after all steps have been visited. For each cut C of step i,
the algorithm tries to resynthesize the cut function FUNC(i, C) with
the target of reducing the level of step i. For this purpose, it uses
the best candidates BEST(l1), . . . , BEST(lk) already computed for the cut’s leaves.
Algorithm 1 Generic cut-based balancing
for i = 1, . . . , n do
    BEST(i) ← i
end for
for i = n + 1, . . . , n + r do
    jbest ← Λ
    ℓ_jbest ← ∞
    for C ∈ CUTS(i) s.t. |C| > 1 do
        {l1, . . . , lk} ← C
        j ← balance(FUNC(i, C), BEST(l1), . . . , BEST(lk))
        if ℓ_j < ℓ_jbest then
            jbest ← j
        end if
    end for
    BEST(i) ← jbest
end for
return BEST(n + r)
The algorithm uses a balance function to resynthesize the cut
function. It is therefore generic and can be customized by applying
various resynthesis procedures. One possible resynthesis procedure
is presented in [35]. It computes a sum-of-products (SOP) represen-
tation for the cut function and then translates each term in the SOP
into a weight-balanced tree of AND gates, as well as all terms into
a weight-balanced tree of OR gates. Our work adapts this method
by using an ESOP representation instead, where the outer XOR
operations do not contribute to the logic network’s multiplicative
depth.
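Algorithm 1 can be sketched in Python as a dynamic program over the steps. This is a level-only sketch under several simplifying assumptions: it computes, for every step, the AND-depth of the best cut-based candidate but does not build the resynthesized network; instead of truth tables and exorcism, it represents each cut function by its exact ANF (a set of monomials, each a set of leaves), so no negative literals occur; and it applies the level computation of Algorithm 2 to each monomial. All names are hypothetical.

```python
"""Level-only sketch of Algorithm 1 with ANF-based balancing of each cut function."""
import heapq
from itertools import product

# Hypothetical encoding: an ANF is a frozenset of monomials, a monomial is a
# frozenset of leaf steps; steps 1..n are inputs, gates[t] = (op, j1, j2).


def anf_and(f, g):
    """Product of two ANFs over GF(2): x & x = x, and equal monomials cancel."""
    out = set()
    for m1, m2 in product(f, g):
        out ^= {m1 | m2}
    return frozenset(out)


def cube_level(monomial, best):
    """Algorithm 2 on levels only: repeatedly AND the two lowest-level operands."""
    queue = [best[leaf] for leaf in monomial]
    heapq.heapify(queue)
    while len(queue) > 1:
        u, v = heapq.heappop(queue), heapq.heappop(queue)
        heapq.heappush(queue, max(u, v) + 1)
    return queue[0] if queue else 0


def balanced_levels(n, gates, k=4):
    """Return best[i], the AND-depth of the best candidate found for each step."""
    best = {i: 0 for i in range(1, n + 1)}
    cuts = {i: {frozenset({i}): frozenset({frozenset({i})})} for i in range(1, n + 1)}
    for t, (op, j1, j2) in enumerate(gates):
        i = n + 1 + t
        candidates = {}  # non-trivial cuts of step i and their ANFs
        for (c1, f1), (c2, f2) in product(cuts[j1].items(), cuts[j2].items()):
            leaves = c1 | c2
            if len(leaves) > k or leaves in candidates:
                continue
            candidates[leaves] = anf_and(f1, f2) if op == "and" else f1 ^ f2
        # XOR gates combining the cubes add no AND-depth: a cut costs its deepest cube.
        best[i] = min(
            max((cube_level(m, best) for m in func), default=0)
            for func in candidates.values()
        )
        candidates[frozenset({i})] = frozenset({frozenset({i})})  # trivial cut
        cuts[i] = candidates
    return best
```

For the unbalanced chain ((x1 ∧ x2) ∧ x3) ∧ x4, which has AND-depth 3, the 4-cut {x1, x2, x3, x4} of the root yields a balanced tree of depth 2.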
3.2 ESOP balancing
In this section we discuss a rebalancing algorithm based on ESOP
forms, which can be used in Algorithm 1. ESOP forms offer a po-
tentially low-depth implementation as an XAG. For the sake of a
simpler description of the algorithm, we assume that the ESOP form
is given in algebraic normal form; however, in our implementation
we also consider ESOP forms that contain negative literals, since
they allow for a more compact representation.
Let $C = \{l_1, \dots, l_k\}$ be a k-cut of root i with cut function
$\mathrm{FUNC}(i, C) = f(\hat x_1, \dots, \hat x_k)$,
where $\hat x_i = x_{\mathrm{BEST}(l_i)}$ with corresponding level $\hat\ell_i$. Given
an ESOP for f with m cubes, each cube is translated into a
tree of 2-input AND gates that is balanced with respect to the leaf
levels. Then the outputs of all these AND trees are combined by a tree
of 2-input XOR gates, which does not add to the multiplicative
depth.
Algorithm 2 Tree balancing computation of product term j
Require: Product term p1,j, . . . , pk,j; variables x̂1, . . . , x̂k at levels ℓ̂1, . . . , ℓ̂k
let Q be a priority queue of steps, ordered by the steps’ level in ascending order
for i = 1, . . . , k s.t. pi,j = 1 do
    push(Q, x̂i)
end for
while |Q| > 1 do
    u ← pop(Q)
    v ← pop(Q)
    push(Q, u ∧ v)
end while
return pop(Q)
The algorithm to balance a non-constant ESOP cube with respect
to the leaf levels is described in Algorithm 2. First, all non-empty
literals are inserted into a priority queue Q according to their levels
in ascending order. Then, as long as the queue has more than one
element, the two elements with the lowest levels are popped from the queue
and merged with an AND gate. The resulting step is then pushed
back into the queue, taking its level into account for
the ordering.
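Algorithm 2 and the subsequent XOR combination of the balanced cubes can be sketched in Python with the standard `heapq` module. To keep the result readable, the sketch builds expression strings instead of network steps; the `(level, expression)` encoding of leaves and the function names are assumptions for illustration. An insertion counter breaks ties between equal levels so that expressions are never compared.

```python
"""Sketch of Algorithm 2 plus the XOR combination of balanced cubes (ESOP balancing)."""
import heapq
from itertools import count


def balance_cube(literals):
    """literals: (level, expression) pairs of the non-empty literals of one cube.

    Returns (level, expression) of an AND tree balanced with respect to the levels.
    """
    if not literals:
        return 0, "1"  # constant-1 cube; Algorithm 2 is only applied to non-constant cubes
    tie = count()  # insertion counter: breaks level ties without comparing expressions
    queue = [(level, next(tie), expr) for level, expr in literals]
    heapq.heapify(queue)
    while len(queue) > 1:
        lu, _, u = heapq.heappop(queue)  # the two lowest-level operands
        lv, _, v = heapq.heappop(queue)
        heapq.heappush(queue, (max(lu, lv) + 1, next(tie), f"({u} & {v})"))
    level, _, expr = queue[0]
    return level, expr


def balance_esop(cubes, leaves):
    """cubes: ANF cubes as tuples of leaf positions; leaves: (level, name) per leaf.

    The XOR gates joining the cubes add no AND-depth, so the level is that of the deepest cube.
    """
    outs = [balance_cube([leaves[t] for t in cube]) for cube in cubes]
    return max(level for level, _ in outs), " ^ ".join(expr for _, expr in outs)
```

With leaves a and b at level 0 and c at level 2, the sketch returns `(3, "((a & b) & c)")`; ANDing c first would give level 4 instead.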
3.3 ESPP optimization
An exclusive sum-of-pseudoproducts (ESPP [21]) for an n-variable
Boolean function $f(x_1, \dots, x_n)$ has the form
$$f(x_1, \dots, x_n) = \bigoplus_{j=1}^{m} \left( L_0^{p_{0,j}} \wedge \cdots \wedge L_{2^n-1}^{p_{2^n-1,j}} \right) \tag{6}$$
where $L_i = b_1 x_1 \oplus \cdots \oplus b_n x_n$ with $i = (b_n \dots b_1)_2$ is the linear
function (or parity function) that contains the variables at the
positions of the 1s in the binary expansion of i. The polarity variables
$p_{i,j}$ play the same role as for ESOP forms, i.e., the parity
function $L_i$ in term j is negated if $p_{i,j} = 0$, used as is if $p_{i,j} = 1$, and
omitted if $p_{i,j} = 2$. The terms in (6) are called pseudoproducts [27].
Note that each ESOP is also an ESPP, but an ESPP is only an ESOP
if $(p_{i,j} \neq 2) \rightarrow (\nu(i) = 1)$, where $\nu(i)$ is the sideways sum of i, i.e.,
the number of 1s in its binary expansion.
The authors of [21] presented an exhaustive search algorithm to find
small ESPPs, and some theoretical investigations on the form
have been conducted [49]. However, to the best of our knowledge,
no efficient heuristic optimization algorithm for ESPPs has been
presented.
We implemented a simple greedy heuristic to minimize the number of
terms in an ESPP. The algorithm iteratively merges cubes to increase
the use of linear functions as cube literals, thereby reducing the number
of AND operations. It starts with an initial ESPP form that corresponds to
an ESOP form extracted from a cut function. It then checks whether
there exist two distinct terms with indices j1 and j2 and two indices
$0 \le i_1, i_2 < 2^n$ such that $p_{i_1,j_1} = p_{i_2,j_2} = 2$ but
$p_{i_1,j_2} \neq 2$ and $p_{i_2,j_1} \neq 2$, and $p_{i,j_1} = p_{i,j_2}$ for all other indices $i \notin \{i_1, i_2\}$.
Then, the two terms can be combined into a single
term j, with $p_{i_1,j} = p_{i_2,j} = 2$, $p_{i,j} = p_{i,j_1}$ for all $i \notin \{i_1, i_2, i_1 \oplus i_2\}$, and
$$p_{i_1 \oplus i_2, j} = [p_{i_1,j_2} = p_{i_2,j_1}] \tag{7}$$
if $p_{i_1 \oplus i_2, j_1} \in \{2, [p_{i_1,j_2} = p_{i_2,j_1}]\}$, where $[\cdot]$ is 1 if the enclosed condition holds and 0 otherwise.
This follows from $(P \wedge A) \oplus (P \wedge B) = P \wedge (A \oplus B)$, where P is the common part of both terms and A and B
are the literals of $L_{i_2}$ in term j1 and of $L_{i_1}$ in term j2, together with $L_{i_1} \oplus L_{i_2} = L_{i_1 \oplus i_2}$.
If $p_{i_1 \oplus i_2, j_1} = 1 - [p_{i_1,j_2} = p_{i_2,j_1}]$,
the two terms cancel and can be removed without adding another
term to the ESPP. We iterate this procedure until no more such
pairs of terms can be found. In our implementation, empty parity functions
are not explicitly stored, and therefore this procedure can be
implemented efficiently.
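The merge rule of Eq. (7) and the greedy iteration can be sketched in Python as follows. Following the remark that empty parity functions are not stored, each pseudoproduct is a dict from a parity index i ≥ 1 to its polarity (0 or 1); this encoding, the first-found pair order, and all names are assumptions of the sketch, not necessarily the authors' implementation.

```python
"""Sketch of the greedy ESPP term merging of Eq. (7)."""

# Hypothetical encoding: a pseudoproduct is a dict {i: p} that stores only the
# non-empty parity functions L_i (i >= 1) with polarity p in {0, 1}.


def parity(i, x):
    """L_i(x): XOR of the variables x_b whose bit b is set in i (x[0] is x_1)."""
    return sum(x[b] for b in range(len(x)) if (i >> b) & 1) % 2


def eval_espp(terms, x):
    value = 0
    for term in terms:
        product_value = 1
        for i, p in term.items():
            product_value &= parity(i, x) ^ (1 - p)  # p = 0 negates L_i
        value ^= product_value
    return value


def merge(t1, t2):
    """Return (mergeable, merged); merged is None when the two terms cancel."""
    only1, only2 = t1.keys() - t2.keys(), t2.keys() - t1.keys()
    if len(only1) != 1 or len(only2) != 1:
        return False, None
    (i2,), (i1,) = only1, only2  # t1 contains L_i2 but not L_i1, t2 the reverse
    common = t1.keys() & t2.keys()
    if any(t1[i] != t2[i] for i in common):
        return False, None
    q = int(t2[i1] == t1[i2])  # Eq. (7): polarity of L_{i1 xor i2}
    merged = {i: t1[i] for i in common}
    if merged.get(i1 ^ i2, q) != q:
        return True, None  # L and not L in one product: both terms cancel
    merged[i1 ^ i2] = q
    return True, merged


def greedy_espp(terms):
    terms = [dict(t) for t in terms]
    changed = True
    while changed:
        changed = False
        pairs = ((a, b) for a in range(len(terms)) for b in range(a + 1, len(terms)))
        for a, b in pairs:
            ok, merged = merge(terms[a], terms[b])
            if ok:
                del terms[b]  # b > a, so index a stays valid
                if merged is None:
                    del terms[a]
                else:
                    terms[a] = merged
                changed = True
                break  # restart the search on the updated term list
    return terms
```

For example, x1 x3 ⊕ x2 x3 (two AND operations) becomes the single pseudoproduct x3 (x1 ⊕ x2), i.e., `[{4: 1, 3: 1}]`.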
3.4 Mapping to quantum circuit
Given a logic network over the gate basis {∧, ⊕,¬}, it is straight-
forward to generate a quantum circuit that computes the same
function: Each ∧ node in the network can be mapped to a Toffoli
that writes the output into an extra qubit starting in |0⟩; each ⊕ and
¬ node can be computed in place using a (controlled) NOT gate [31].
While the resulting quantum circuit computes the same function,
a significant amount of parallelism is lost due to input dependencies.
As a remedy, we copy the inputs of those gates that can be executed
in parallel, thus removing these dependencies [32].
4 EXPERIMENTAL RESULTS
We use various arithmetic and random-control functions from [1]
as well as cryptographic functions and IEEE floating-point oper-
ations [4] as benchmarks for our algorithm. Our algorithm has
been implemented in C++ on top of the EPFL logic synthesis li-
braries [51]. All experiments were run on a Microsoft Azure virtual
machine, on a general purpose Standard D8s v3 size configuration,
running on an Intel Xeon Platinum 8171M 2.40GHz CPU with 32
GiB memory and Ubuntu 18.04.
We choose two different baselines as starting points: heavily
optimized XAGs for low MC (Min. MC baseline) and heavily optimized
AIGs (And-inverter graphs) for low (general) logic network
depth (Min. depth baseline). The Min. MC baseline is obtained using
the MC optimization algorithm in [56].¹ The Min. depth baseline is
obtained by calling the ABC [10] optimization scripts resyn2rs
(depth-preserving size optimization [33, 36]), followed by if -K 6 -y
(AIG depth optimization [58]), followed by another round of
resyn2rs, each run until depth is no longer improved.
4.1 Multiplicative-depth optimization
As a first step, we apply ESOP balancing with a cut size of 6 to the
chosen benchmarks for both baselines, using exorcism [37] to obtain
ESOPs for the cut functions. We call the algorithm repeatedly
until no further reduction in the multiplicative depth can be obtained.
We report the results in Table 1. For the EPFL benchmarks,
we list the currently best-known results for multiplicative depth
obtained from the state-of-the-art multiplicative depth optimization
approach in [5, 11]. That approach has not been applied to the
cryptographic and floating-point operations. For each baseline, we
list MC and MD after optimization together with the initial values
in parentheses, as well as the run-time in seconds. The result with the
lowest MD is highlighted in bold; in case of a tie, we compare MC
as a second cost metric.
¹The cryptographic and floating-point operations were not further optimized, as they
are already optimized for MC.

Table 1: Experimental results for applying ESOP-balancing

| Benchmark | State of the art [5, 11]: MC | State of the art: MD | State of the art: run-time (s) | Min. MC baseline: MC (before) | Min. MC baseline: MD (before) | Min. MC baseline: run-time (s) | Min. depth baseline: MC (before) | Min. depth baseline: MD (before) | Min. depth baseline: run-time (s) |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Arithmetic functions [1] | | | | | | | | | |
| adder | 16378 | 9 | 125.00 | 481 (128) | 34 (128) | 0.15 | 2761 (1742) | 12 (14) | 10.13 |
| bar | 4193 | 10 | 0.70 | 1303 (832) | 4 (7) | 0.33 | 3516 (3334) | 8 (11) | 2.42 |
| div | 190855 | 532 | 3731.00 | 158795 (5288) | 973 (2243) | 26.18 | 120327 (120327) | 523 (620) | 541.33 |
| hyp | 135433 | 15230 | 172000.00 | 120765 (56635) | 4428 (8784) | 166.07 | 780220 (417567) | 1287 (1558) | 324.31 |
| log2 | 31573 | 129 | 94.00 | 34133 (10906) | 104 (201) | 778.78 | 83177 (33951) | 114 (171) | 130.33 |
| max | 7666 | 26 | 14.50 | 3839 (890) | 93 (252) | 1.81 | 8368 (4027) | 25 (28) | 4.17 |
| multiplier | 23059 | 57 | 30.73 | 15138 (7653) | 65 (149) | 13.50 | 39628 (28331) | 56 (86) | 77.72 |
| sin | 5507 | 74 | 4.50 | 6822 (2603) | 62 (105) | 9.45 | 14067 (6424) | 61 (89) | 58.87 |
| sqrt | 321555 | 2084 | 107814.00 | 71587 (5381) | 951 (2167) | 45.75 | 185061 (65762) | 769 (936) | 290.40 |
| square | 11306 | 26 | 12.50 | 6348 (4672) | 59 (155) | 8.41 | 10777 (14570) | 20 (36) | 38.75 |
| Random control [1] | | | | | | | | | |
| arbiter | 5183 | 10 | 43.00 | 3128 (1174) | 13 (50) | 2.10 | 7276 (6205) | 11 (12) | 1.35 |
| cavlc | 667 | 9 | 0.00 | 447 (394) | 7 (11) | 1.15 | 564 (576) | 8 (10) | 0.45 |
| ctrl | 109 | 5 | 0.00 | 54 (45) | 4 (5) | 0.10 | 77 (80) | 4 (8) | 0.06 |
| dec | 304 | 3 | 0.00 | 328 (328) | 3 (3) | 0.08 | 292 (292) | 3 (3) | 0.02 |
| i2c | 1213 | 7 | 0.10 | 816 (557) | 7 (11) | 0.87 | 1122 (1007) | 7 (8) | 0.37 |
| int2float | 216 | 7 | 0.00 | 104 (85) | 6 (11) | 0.87 | 184 (190) | 7 (8) | 0.13 |
| mem_ctrl | 54816 | 40 | 85.00 | 9983 (4695) | 14 (39) | 17.56 | 78044 (37519) | 35 (41) | 20.37 |
| priority | 876 | 102 | 0.50 | 442 (323) | 11 (95) | 1.08 | 522 (479) | 10 (13) | 0.28 |
| router | 198 | 11 | 0.00 | 116 (93) | 8 (13) | 0.10 | 227 (196) | 10 (12) | 0.19 |
| voter | 4288 | 30 | 112.42 | 7335 (4257) | 26 (40) | 31.95 | 3255 (6716) | 17 (48) | 6.14 |
| Cryptographic functions [4] | | | | | | | | | |
| AES-128 | | | | 8400 (6400) | 50 (60) | 5.49 | 33953 (85547) | 80 (299) | 65.52 |
| AES-192 | | | | 9408 (7168) | 60 (72) | 5.98 | 39533 (96979) | 99 (359) | 55.39 |
| AES-256 | | | | 11592 (8832) | 70 (84) | 7.49 | 53775 (120627) | 123 (417) | 90.26 |
| Keccak-f | | | | 38400 (38400) | 24 (24) | — | 38630 (567395) | 28 (266) | 129.00 |
| SHA-256 | | | | 22573 (22573) | 1607 (1607) | — | 450447 (296951) | 1519 (1936) | 247.16 |
| SHA-512 | | | | 57947 (57947) | 3303 (3303) | — | 1988586 (831166) | 2383 (2894) | 1489.64 |
| IEEE floating-point operations [4] | | | | | | | | | |
| FP-add | | | | 16721 (5384) | 96 (235) | 9.93 | 27541 (15879) | 64 (83) | 15.42 |
| FP-div | | | | 3829444 (82265) | 1646 (3619) | 2994.35 | 732932 (200112) | 885 (1157) | 400.12 |
| FP-eq | | | | 315 (315) | 9 (9) | — | 220 (336) | 9 (10) | 0.02 |
| FP-f2i | | | | 3290 (1467) | 24 (94) | 3.01 | 3405 (2881) | 21 (29) | 3.41 |
| FP-mul | | | | 23886 (19614) | 92 (129) | 14.78 | 62254 (47213) | 87 (140) | 54.51 |
| FP-sqrt | | | | 4946577 (91499) | 3763 (6507) | 2981.18 | 893849 (264130) | 1877 (2374) | 506.28 |

Our algorithm can improve the best-known results in 18 out of 20
cases. For the arithmetic functions, the largest MD reductions were
obtained when applying our approach to the Min. depth baseline,
whereas for the random control functions, the Min. MC baseline
turns out to be the better starting point. Note that in some cases
(e.g., hyp and priority) we obtain a 10× improvement over the state
of the art. For the cryptographic and floating-point functions, we
can improve the MD compared to both baselines for all benchmarks
except for Keccak-f. Because we use heavily optimized networks
as baselines, we do not expect large gains for cryptographic
functions, especially since MC and MD are important quantities
in cryptography. In contrast, we find depth reductions of up to
3× for floating-point operations (e.g., FP-add and FP-f2i) with only
moderate increases in MC.
4.2 T-depth optimization
In a second step, we map our depth-optimized XAGs to quantum
circuits using two straightforward heuristics for upper-bounding
the number of qubits: the as soon as possible (ASAP) heuristic com-
putes all AND gates in parallel that have the same logic level and
the as late as possible (ALAP) heuristic computes all AND gates
in parallel that have the same reverse logic level. We present the
resulting T-counts, T-depths, and qubit estimates in Table 2. For
each cryptographic function and floating-point operation, we report
the two quantum circuits with the fewest qubits (first row) and the
lowest T-depth (second row). The corresponding XAG and heuristic
(ASAP or ALAP) are given in the last column.
[Figure 3: T-depth versus number of qubits, on logarithmic axes, for AES-128, AES-192, AES-256, SHA-256, Keccak-f, FP-add, and FP-mul. The plots contain various resource estimates that can be found in the literature (Kim, Han, Jeong [24]; Grassl, Langenberg, Roetteler, Steinwandt [18]; Langenberg, Pham, Steinwandt [25]; Jaques, Naehrig, Roetteler, Virdia [22]; Amy, Di Matteo, Gheorghiu, Mosca, Parent, Schanck [3]; Haener, Soeken, Roetteler, Svore [19]), together with all Pareto-optimal results for our approach ("This work") that we obtained during the experimental evaluation. These also include results from intermediate optimization steps.]
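The two scheduling heuristics described at the start of this subsection can be sketched as a grouping of AND gates into parallel layers: ASAP uses the logic level and ALAP the reverse logic level of each AND gate. This sketch reuses the hypothetical `(op, j1, j2)` encoding of the earlier examples and only returns the layers; it does not compute qubit counts, since the exact qubit accounting (helper qubits and the input copies of Section 3.4) is not spelled out in the text.

```python
"""Sketch of the ASAP and ALAP groupings of AND gates into parallel layers."""
from collections import defaultdict

# Hypothetical encoding: steps 1..n are inputs, gates[t] = (op, j1, j2) is step n+1+t.


def and_layers(n, gates, alap=False):
    """Map each layer (logic level, or reverse level if alap) to its AND gate steps."""
    cost = [1 if op == "and" else 0 for op, _, _ in gates]
    level = [0] * (n + len(gates) + 1)  # index 0 unused
    for t, (_, j1, j2) in enumerate(gates):
        level[n + 1 + t] = max(level[j1], level[j2]) + cost[t]
    if alap:
        depth = max(level)
        rev = [depth] * len(level)
        for t in reversed(range(len(gates))):
            _, j1, j2 = gates[t]
            ready_by = rev[n + 1 + t] - cost[t]  # operands must be ready one layer earlier
            rev[j1], rev[j2] = min(rev[j1], ready_by), min(rev[j2], ready_by)
        level = rev
    layers = defaultdict(list)
    for t, (op, _, _) in enumerate(gates):
        if op == "and":
            layers[level[n + 1 + t]].append(n + 1 + t)
    return dict(sorted(layers.items()))
```

For x4 = x1 ∧ x2, x5 = x3 ∧ x4, x6 = x1 ∧ x3, and x7 = x5 ⊕ x6, ASAP places gates 4 and 6 in layer 1, whereas ALAP moves gate 6 to layer 2 next to gate 5; both schedules keep T-depth 2.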
Compared to state-of-the-art, manually crafted quantum circuit
designs, we achieve significant reductions in depth without dramatically
increasing the qubit requirements. A comparison of our
automatically generated designs to a variety of state-of-the-art
circuits for several cryptographic and floating-point functions is
given in Fig. 3, and the state-of-the-art implementations with the best
T-depth are listed explicitly in Table 3.
We note that, in addition to reduced circuit depths compared to
the state of the art, our approach has the clear advantage that it is
completely automatic. This stands in stark contrast to the circuits
found in the literature, which are manual designs that were
not created at the push of a button.
5 CONCLUSIONS
In this work, we presented a dynamic programming algorithm to
minimize the multiplicative depth of XAGs that makes use of cut
enumeration, tree balancing, as well as ESOP and ESPP representations.
We report significant improvements over the state-of-the-art
MD optimization algorithms in [5, 11]. We used our algorithm to
find fault-tolerant quantum implementations of various cryptographic
and floating-point operations that improve the T-depth
over state-of-the-art manually designed quantum circuits.

Table 2: Estimates for T-depth optimized quantum circuits obtained from depth-optimized XAGs. We report quantum circuits that achieve the smallest number of qubits (first row) and the lowest T-depth (second row) over all T-depth optimized circuits.

| Benchmark | T-count | T-depth | Qubits | Instance |
|---|---:|---:|---:|---|
| Cryptographic functions | | | | |
| AES-128 | 25600 | 60 | 7324 | Min. MC baseline (ASAP) |
| AES-128 | 33600 | 50 | 9384 | Min. MC opt (ASAP) |
| AES-192 | 28672 | 72 | 8156 | Min. MC baseline (ASAP) |
| AES-192 | 37632 | 60 | 10456 | Min. MC opt (ASAP) |
| AES-256 | 35328 | 84 | 9884 | Min. MC baseline (ASAP) |
| AES-256 | 46368 | 70 | 12704 | Min. MC opt (ASAP) |
| Keccak-f | 153600 | 24 | 46400 | Min. MC baseline (ASAP) |
| SHA-256 | 90292 | 1607 | 23684 | Min. MC baseline (ASAP) |
| SHA-256 | 1801788 | 1519 | 458974 | Min. depth opt (ASAP) |
| SHA-512 | 231788 | 3303 | 60448 | Min. MC baseline (ASAP) |
| SHA-512 | 7954344 | 2383 | 2008595 | Min. depth opt (ASAP) |
| IEEE floating-point operations | | | | |
| FP-add | 21384 | 235 | 5969 | Min. MC baseline (ALAP) |
| FP-add | 100832 | 64 | 28154 | Min. depth opt (ASAP) |
| FP-div | 290848 | 3604 | 81066 | Min. MC baseline (ALAP) |
| FP-div | 3054524 | 885 | 792188 | Min. depth opt (ASAP) |
| FP-eq | 880 | 9 | 655 | Min. depth opt (ALAP) |
| FP-eq | 1260 | 9 | 976 | Min. MC baseline (ASAP) |
| FP-f2i | 5832 | 94 | 1821 | Min. MC baseline (ALAP) |
| FP-f2i | 13620 | 21 | 4846 | Min. depth opt (ASAP) |
| FP-mul | 76368 | 118 | 26890 | Min. MC baseline (ALAP) |
| FP-mul | 249052 | 87 | 69347 | Min. depth opt (ASAP) |
| FP-sqrt | 315924 | 6498 | 84017 | Min. MC baseline (ALAP) |
| FP-sqrt | 3575396 | 1877 | 901087 | Min. depth opt (ASAP) |

The adaptation of SOP-based balancing for Boolean logic networks
to ESOP balancing for XAGs in order to reduce the multiplicative
depth worked very well, since the XOR gates corresponding to the
outer XOR operator of the ESOP forms do not contribute to the
depth. We plan to investigate how this change in the cost function
and underlying logic representation may benefit from alternative
depth optimization algorithms such as MUX-based optimization [6,
34], generalized select transform algorithms [30, 45], or BDD-based
techniques [12, 13].
We presented a post-optimization algorithm for ESOPs based on
ESPPs, a generalization of ESOPs. XP2 forms are a generalization
of ESPPs and a minimization algorithm for such forms has been
presented in [57]. We expect that such forms can help to further
reduce the number of AND gates in the rebalancing step of our
algorithm without increasing the multiplicative depth.
Table 3: Estimates from related work
Benchmark     T-count  T-depth  Qubits  Comment
AES-128 [24]  —        960      21854   1
AES-128 [18]  1060864  50688    984∗
AES-192 [18]  1204224  44352    1112∗
AES-256 [18]  1505280  59904    1336∗
AES-128 [25]  118580   7520     864∗    2
AES-192 [25]  137060   6560     896∗    2
AES-256 [25]  166320   8640     1232∗   2
AES-128 [22]  54400    120      1785∗
AES-192 [22]  60928    120      2105∗
AES-256 [22]  75072    126      2425∗
Keccak-f [3]  24640    33       3200∗
SHA-256 [24]  —        30336    938∗    1
SHA-256 [3]   228992   70400    2402∗
FP-add [19]   26348    7224     268∗    3
FP-mul [19]   122752   52116    315∗    3
A ∗ indicates that this value is better than the best value reported
in Table 2.
1 The authors report no Toffoli-count or T-count; the T-depth is derived
from the reported Toffoli-depth by multiplying by 3 [2]. The authors report
six different candidates, from which we picked the one with the best T-depth.
2 T-count and T-depth are derived from the Toffoli-count and Toffoli-depth
reported in the paper.
3 The floating-point designs in the paper are not IEEE-compliant and do
not account for special cases or denormalized numbers.
In classical logic synthesis optimization flows, it is customary to
interleave depth-optimization algorithms with size-optimization
algorithms to obtain good trade-off points. We plan to adapt heuristic
MC optimization algorithms to be depth-preserving, i.e., to allow
the removal of AND gates only if the multiplicative depth does not
increase. This would make it possible to reduce the T-count and qubit
count of the corresponding quantum circuits without increasing the
T-depth.
REFERENCES
[1] Luca Gaetano Amarù, Pierre-Emmanuel Gaillardon, and Giovanni De Micheli.
2015. The EPFL combinational benchmark suite. In Int’l Workshop on Logic and
Synthesis.
[2] Matthew Amy, Dmitri Maslov, Michele Mosca, and Martin Roetteler. 2013. A
Meet-in-the-Middle Algorithm for Fast Synthesis of Depth-Optimal Quantum
Circuits. IEEE Trans. on CAD of Integrated Circuits and Systems 32, 6 (2013),
818–830. https://doi.org/10.1109/TCAD.2013.2244643
[3] Matthew Amy, Olivia Di Matteo, Vlad Gheorghiu, Michele Mosca, Alex Parent,
and John M. Schanck. 2016. Estimating the Cost of Generic Quantum Pre-image
Attacks on SHA-2 and SHA-3. In Int’l Conf. on Selected Areas in Cryptography.
317–337. https://doi.org/10.1007/978-3-319-69453-5_18
[4] David Archer, Victor Arribas Abril, Pieter Maene, Nele Mertens, Danilo Sijacic,
and Nigel Smart. [n.d.]. ‘Bristol Fashion’ MPC circuits. https://homes.esat.kuleuven.be/~nsmart/MPC/.
[5] Pascal Aubry, Sergiu Carpov, and Renaud Sirdey. 2020. Faster Homomorphic
Encryption is not Enough: Improved Heuristic for Multiplicative Depth
Minimization of Boolean Circuits. In The Cryptographers’ Track at the RSA Conference.
345–363. https://doi.org/10.1007/978-3-030-40186-3_15
[6] C. Leonard Berman, David J. Hathaway, Andrea S. LaPaugh, and Louise H.
Trevillyan. 1990. Efficient techniques for timing correction. In Int’l Symp. on Circuits
and Systems. 415–419. https://doi.org/10.1109/ISCAS.1990.112064
[7] Joan Boyar, Philip Matthews, and René Peralta. 2013. Logic Minimization
Techniques with Applications to Cryptology. Journal of Cryptology 26, 2 (2013),
280–312. https://doi.org/10.1007/s00145-012-9124-7
[8] Daniel Brand and Tsutomu Sasao. 1993. Minimization of AND-EXOR Expressions
Using Rewrite Rules. IEEE Trans. on Computers 42, 5 (1993), 568–576. https:
//doi.org/10.1109/12.223676
[9] Robert K. Brayton, Gary D. Hachtel, and Alberto L. Sangiovanni-Vincentelli. 1990.
Multilevel logic synthesis. Proc. IEEE 78, 2 (1990), 264–300.
[10] Robert K. Brayton and Alan Mishchenko. 2010. ABC: An Academic Industrial-
Strength Verification Tool. In Computer Aided Verification. 24–40. https://doi.
org/10.1007/978-3-642-14295-6_5
[11] Sergiu Carpov, Pascal Aubry, and Renaud Sirdey. 2017. A Multi-start Heuristic
for Multiplicative Depth Minimization of Boolean Circuits. In Int’l Workshop on
Combinatorial Algorithms. 275–286. https://doi.org/10.1007/978-3-319-78825-8_23
[12] Lei Cheng, Deming Chen, and Martin D. F. Wong. 2007. DDBDD: Delay-Driven
BDD Synthesis for FPGAs. In Design Automation Conference. 910–915. https:
//doi.org/10.1145/1278480.1278705
[13] Mihir R. Choudhury and Kartik Mohanram. 2010. Bi-decomposition of large
Boolean functions using blocking edge graphs. In Int’l Conf. on Computer-Aided
Design. 586–591. https://doi.org/10.1109/ICCAD.2010.5654210
[14] Stelvio Cimato, Valentina Ciriani, Ernesto Damiani, and Maryam Ehsanpour.
2019. An OBDD-Based Technique for the Efficient Synthesis of Garbled Circuits.
In Int’l Workshop on Security and Trust Management (Lecture Notes in Computer
Science, Vol. 11738), Sjouke Mauw and Mauro Conti (Eds.). Springer, 158–167.
https://doi.org/10.1007/978-3-030-31511-5_10
[15] Jason Cong, Chang Wu, and Yuzheng Ding. 1999. Cut Ranking and Pruning:
Enabling a General and Efficient FPGA Mapping Solution. In Int’l Symp. on Field
Programmable Gate Arrays. 29–35. https://doi.org/10.1145/296399.296425
[16] Rolf Drechsler. 1999. Preudo-Kronecker Expressions for Symmetric Functions.
IEEE Trans. on Computers 48, 9 (1999), 987–990. https://doi.org/10.1109/12.795226
[17] Magnus Gausdal Find. 2014. On the Complexity of Computing Two Nonlinearity
Measures. In Int’l Computer Science Symposium in Russia. 167–175. https://doi.
org/10.1007/978-3-319-06686-8_13
[18] Markus Grassl, Brandon Langenberg, Martin Roetteler, and Rainer Steinwandt.
2016. Applying Grover’s Algorithm to AES: Quantum Resource Estimates. In Int’l
Workshop on Post-Quantum Cryptography. 29–43. https://doi.org/10.1007/978-3-
319-29360-8_3
[19] Thomas Häner, Mathias Soeken, Martin Roetteler, and Krysta M. Svore. 2018.
Quantum Circuits for Floating-Point Arithmetic. In Int’l Conf. on Reversible
Computation. 162–174. https://doi.org/10.1007/978-3-319-99498-7_11
[20] Martin Helliwell and Marek A. Perkowski. 1988. A Fast Algorithm to Minimize
Multi-Output Mixed-Polarity Generalized Reed-Muller Forms. In Design Automation
Conference. 427–432. http://portal.acm.org/citation.cfm?id=285730.285799
[21] Ryoji Ishikawa, Takashi Hirayama, Goro Koda, and Kensuke Shimizu. 2004. New
Three-Level Boolean Expression Based on EXOR Gates. IEICE Trans. on Information
& Systems 87-D, 5 (2004), 1214–1222. http://search.ieice.org/bin/summary.php?id=e87-d_5_1214
[22] Samuel Jaques, Michael Naehrig, Martin Roetteler, and Fernando Virdia. 2019.
Implementing Grover oracles for quantum key search on AES and LowMC. arXiv
preprint arXiv:1910.01700 (2019).
[23] Cody Jones. 2013. Low-overhead constructions for the fault-tolerant Toffoli gate.
Physical Review A 87, 2 (2013), 022328.
[24] Panjin Kim, Daewan Han, and Kyung Chul Jeong. 2018. Time-space complexity
of quantum search algorithms in symmetric cryptanalysis: applying to AES and
SHA-2. Quantum Information Processing 17, 12 (2018), 339. https://doi.org/10.
1007/s11128-018-2107-3
[25] Brandon Langenberg, Hai Pham, and Rainer Steinwandt. 2020. Reducing the
Cost of Implementing the Advanced Encryption Standard as a Quantum Circuit.
IEEE Trans. on Quantum Engineering 1 (2020), 1–12.
[26] Leonid A. Levin. 2003. The Tale of One-Way Functions. Problems of Information
Transmission 39, 1 (2003), 92–103. https://doi.org/10.1023/A%3A1023634616182
[27] Fabrizio Luccio and Linda Pagli. 1999. On a New Boolean Function with
Applications. IEEE Trans. on Computers 48, 3 (1999), 296–310. https://doi.org/10.1109/12.754996
[28] Igor L. Markov and Mehdi Saeedi. 2012. Constant-optimized quantum circuits
for modular multiplication and exponentiation. Quantum Information and
Computation 12, 5&6 (2012), 361–394.
[29] Igor L. Markov and Mehdi Saeedi. 2013. Faster quantum number factoring via
circuit synthesis. Physical Review A 87, 1 (2013), 012310. https://doi.org/10.1103/PhysRevA.87.012310
[30] Patrick C. McGeer, Robert K. Brayton, Alberto L. Sangiovanni-Vincentelli, and
Sartaj Sahni. 1991. Performance Enhancement through the Generalized Bypass
Transform. In Int’l Conf. on Computer-Aided Design. 184–187. https://doi.org/10.
1109/ICCAD.1991.185226
[31] Giulia Meuli, Mathias Soeken, Earl Campbell, Martin Roetteler, and Giovanni
De Micheli. 2019. The Role of Multiplicative Complexity in Compiling Low
T-count Oracle Circuits. In Int’l Conf. on Computer-Aided Design. 1–8. https:
//doi.org/10.1109/ICCAD45719.2019.8942093
[32] Giulia Meuli, Mathias Soeken, Martin Roetteler, and Giovanni De Micheli. 2020.
Enumerating Optimal Quantum Circuits Using Spectral Classification. In Int’l
Symp. on Circuits and Systems.
[33] Alan Mishchenko and Robert K. Brayton. 2006. Scalable logic synthesis using a
simple circuit structure. In Int’l Workshop on Logic and Synthesis. 15–22.
[34] Alan Mishchenko, Robert K. Brayton, and Stephen Jang. 2010. Global delay
optimization using structural choices. In Int’l Symp. on Field Programmable Gate
Arrays. 181–184. https://doi.org/10.1145/1723112.1723144
[35] Alan Mishchenko, Robert K. Brayton, Stephen Jang, and Victor N. Kravets. 2011.
Delay optimization using SOP balancing. In Int’l Conf. on Computer-Aided Design.
375–382. https://doi.org/10.1109/ICCAD.2011.6105357
[36] Alan Mishchenko, Satrajit Chatterjee, and Robert K. Brayton. 2006. DAG-aware
AIG rewriting: a fresh look at combinational logic synthesis. In Design Automation
Conference. 532–535. https://doi.org/10.1145/1146909.1147048
[37] Alan Mishchenko and Marek A. Perkowski. 2001. Fast Heuristic Minimization of
Exclusive-Sum-of-Products. In Reed-Muller Workshop.
[38] Saburo Muroga. 1993. Logic Synthesizers, the Transduction Method and Its
Extension, Sylon. In Logic Synthesis and Optimization, Tsutomu Sasao (Ed.).
Springer, 59–86.
[39] Joe O’Gorman and Earl T. Campbell. 2017. Quantum computation with realistic
magic-state factories. Physical Review A 95, 3 (2017), 032338.
https://doi.org/10.1103/PhysRevA.95.032338
[40] George K. Papakonstantinou. 2014. A Parallel Algorithm for Minimizing ESOP
Expressions. Journal of Circuits, Systems, and Computers 23, 1 (2014). https:
//doi.org/10.1142/S0218126614500157
[41] John Preskill. 2018. Quantum Computing in the NISQ era and beyond. Quantum
2 (2018), 79. arXiv preprint arXiv:1801.00862v3.
[42] Markus Reiher, Nathan Wiebe, Krysta M. Svore, Dave Wecker, and Matthias
Troyer. 2017. Elucidating reaction mechanisms on quantum computers. Proceedings
of the National Academy of Sciences 114, 29 (2017), 7555–7560.
[43] M. Sadegh Riazi, Mojan Javaheripi, Siam U. Hussain, and Farinaz Koushanfar. 2019.
MPCircuits: Optimized Circuit Generation for Secure Multi-Party Computation.
In Int’l Symp. on Hardware-Oriented Security and Trust. 198–207. https://doi.org/
10.1109/HST.2019.8740831
[44] Heinz Riener, Rüdiger Ehlers, Bruno Schmitt, and Giovanni De Micheli. 2020.
Exact synthesis of ESOP forms. In Advanced Boolean Techniques, Rolf Drechsler
and Mathias Soeken (Eds.). Springer. arXiv preprint arXiv:1807.11103.
[45] Alexander Saldanha, Heather Harkness, Patrick C. McGeer, Robert K. Brayton,
and Alberto L. Sangiovanni-Vincentelli. 1994. Performance Optimization Using
Exact Sensitization. In Design Automation Conference. 425–429. https://doi.org/
10.1145/196244.196448
[46] Tsutomu Sasao. 1993. AND-EXOR Expressions and Their Optimization. In Logic
Synthesis and Optimization, Tsutomu Sasao (Ed.). Kluwer Academic.
[47] Tsutomu Sasao and Philipp Besslich. 1990. On the complexity of mod-2 sum
PLA’s. IEEE Trans. on Computers 39, 2 (1990). https://doi.org/10.1109/12.45212
[48] Claus-Peter Schnorr. 1988. The Multiplicative Complexity of Boolean Functions.
In Int’l Conf. on Applied Algebra, Algebraic Algorithms and Error-Correcting Codes.
45–58. https://doi.org/10.1007/3-540-51083-4_47
[49] Svetlana Nikolaevna Selezneva. 2014. On the Length of Boolean Functions in the
Class of Exclusive-OR Sums of Pseudoproducts. Moscow University Computational
Mathematics and Cybernetics 38, 2 (2014), 64–68. https://doi.org/10.3103/
S0278641914020083
[50] Vivek V. Shende, Aditya K. Prasad, Igor L. Markov, and John P. Hayes. 2003.
Synthesis of reversible logic circuits. IEEE Trans. on CAD of Integrated Circuits
and Systems 22, 6 (2003), 710–722. https://doi.org/10.1109/TCAD.2003.811448
[51] Mathias Soeken, Heinz Riener, Winston Haaswijk, Eleonora Testa, Bruno Schmitt,
Giulia Meuli, Fereshte Mozafari, and Giovanni De Micheli. 2018. The EPFL logic
synthesis libraries. arXiv preprint arXiv:1805.05121v2 (2018).
[52] Mathias Soeken, Martin Roetteler, Nathan Wiebe, and Giovanni De Micheli. 2019.
LUT-Based Hierarchical Reversible Logic Synthesis. IEEE Trans. on CAD of
Integrated Circuits and Systems 38, 9 (2019), 1675–1688. https://doi.org/10.1109/
TCAD.2018.2859251
[53] Stergios Stergiou, Konstantinos Daskalakis, and George K. Papakonstantinou.
2004. A fast and efficient heuristic ESOP minimization algorithm. In ACM Great
Lakes Symposium on VLSI. 78–81. https://doi.org/10.1145/988952.988971
[54] Eleonora Testa, Mathias Soeken, Luca Gaetano Amarù, and Giovanni De Micheli.
2019. Logic Synthesis for Established and Emerging Computing. Proc. IEEE 107,
1 (2019), 165–184. https://doi.org/10.1109/JPROC.2018.2869760
[55] Eleonora Testa, Mathias Soeken, Luca G. Amarù, and Giovanni De Micheli. 2019.
Reducing the Multiplicative Complexity in Logic Networks for Cryptography
and Security Applications. In Design Automation Conference. 74. https://doi.org/
10.1145/3316781.3317893
[56] Eleonora Testa, Mathias Soeken, Heinz Riener, Luca Gaetano Amarù, and
Giovanni De Micheli. 2020. A logic synthesis toolbox for reducing the multiplicative
complexity in logic networks. In Design, Automation and Test in Europe.
[57] Ajay K. Verma, Philip Brisk, and Paolo Ienne. 2008. XP2: A new compact
representation for manipulating arithmetic circuits. In Int’l Workshop on Logic and
Synthesis.
[58] Wenlong Yang, Lingli Wang, and Alan Mishchenko. 2012. Lazy man’s logic
synthesis. In Int’l Conf. on Computer-Aided Design. 597–604. https://doi.org/10.
1145/2429384.2429513
