Generalized swap networks for near-term quantum computing by O'Gorman, Bryan et al.
Generalized swap networks for near-term quantum computing
Bryan O’Gorman,1, 2, 3 William J. Huggins,2 Eleanor G. Rieffel,3 and K. Birgitta Whaley2
1Department of Electrical Engineering and Computer Sciences, University of California, Berkeley
2Department of Chemistry and Berkeley Quantum Information and Computation Center, University of California, Berkeley
3Quantum Artificial Intelligence Laboratory, NASA Ames Research Center, Moffett Field, CA
(Dated: May 14, 2019)
The practical use of many types of near-term quantum computers requires accounting for their
limited connectivity. One way of overcoming limited connectivity is to insert swaps in the circuit so
that logical operations can be performed on physically adjacent qubits, which we refer to as solving
the “routing via matchings” problem. We address the routing problem for families of quantum
circuits defined by a hypergraph wherein each hyperedge corresponds to a potential gate. Our
main result is that any unordered set of k-qubit gates on distinct k-qubit subsets of n logical
qubits can be ordered and parallelized in $O(n^{k-1})$ depth using a linear arrangement of n physical
qubits; the construction is completely general and achieves optimal scaling in the case where gates
acting on all $\binom{n}{k}$ sets of k qubits are desired. We highlight two classes of problems for which
our method is particularly useful. First, it applies to sets of mutually commuting gates, as in
the (diagonal) phase separators of Quantum Alternating Operator Ansatz (Quantum Approximate
Optimization Algorithm) circuits. For example, a single level of a QAOA circuit for Maximum Cut
can be implemented in linear depth, and a single level for 3-SAT in quadratic depth. Second, it
applies to sets of gates that do not commute but for which compilation efficiency is the dominant
criterion in their ordering. In particular, it can be adapted to Trotterized time-evolution of fermionic
Hamiltonians under the Jordan-Wigner transformation, and also to non-standard mixers in QAOA.
Using our method, a single Trotter step of the electronic structure Hamiltonian in an arbitrary basis
of n orbitals can be done in $O(n^3)$ depth while a Trotter step of the unitary coupled cluster singles
and doubles method can be implemented in $O(n^2\eta)$ depth, where η is the number of electrons.
I. INTRODUCTION
The state of experimental quantum computing is
rapidly advancing towards “quantum supremacy” [1],
i.e., the point at which quantum computers will be able
to perform certain specialized tasks that are infeasible
for even the largest classical supercomputers. Beyond
this technical milestone, however, lies another: useful
quantum supremacy, in which quantum computers can
solve problems whose answers are of interest indepen-
dently of how they were achieved. The combination
of efficient quantum algorithms [2, 3] and scalable er-
ror correction [4] makes such progress likely in the long
term, barring fundamental surprises. In the near term,
we have so-called Noisy Intermediate-Scale Quantum
(NISQ) devices [5], capable perhaps of outperforming
classical devices on certain problems, but with extremely
constrained resources. Many types of such devices (e.g.,
superconducting quantum processors) will have limited
connectivity. For the most part, existing quantum algo-
rithms assume an abstract device with arbitrary connec-
tivity, i.e., the ability to do a two-qubit gate between any
pair of qubits. In theory, this suffices given that circuits
can be compiled to any concrete family of devices with
polynomial overhead in qubits and gates [6]. In prac-
tice, polynomial overheads matter and can be the crucial
difference between being feasible and infeasible on NISQ
devices.
The overall goal of compilation within the quantum cir-
cuit model is to take a quantum algorithm and implement
it (maybe approximately) on a concrete piece of quan-
tum computing hardware. There are many approaches
to this, but perhaps the most straightforward is to trans-
form the desired quantum circuit into an executable one
in two steps: 1) decomposition of the constituent gates
into (maybe approximately) equivalent sub-circuits con-
sisting of “native” gates, and 2) what we call routing
via matchings of the circuit [7]. Our focus here is on
the routing problem, in which the logical qubits are dy-
namically assigned to physical qubits in a way that al-
lows the desired logical gates to be implemented while
respecting the restricted connectivity of the actual hard-
ware. In general, it may be necessary to use swap gates
to change this assignment of logical qubits to physical
qubits throughout the execution of the circuit.
In the past several years, there has been a blossom-
ing of tools for addressing variants of this routing prob-
lem, which are variously called “quantum circuit place-
ment”, “qubit mapping”, “qubit allocation”, or “quan-
tum circuit compilation” (though the latter term gen-
erally encompasses much more). Prior work, however,
has taken one of two approaches. The first, of theoretical
interest, is to show how any quantum circuit can be con-
verted “efficiently” (i.e., with polynomial overhead) into
one in which gates act only locally in some hardware
graph [6, 8, 9]. The second is an instance-specific ap-
proach, in which the problem is solved anew for each
logical circuit [10–20]. We propose and instantiate a new
instance-independent approach, in which the routing is
done for a family of instances, with little-to-no compila-
tion necessary for each instance; the per-instance compi-
lation time is therefore effectively amortized to nil. This
approach, which finds solutions for families of instances,
interpolates between the two approaches above and seeks
to balance the time to solution and the quality of solu-
tion. The family-specific routing can be found either al-
gorithmically or, as is done here, manually. Algorithms
useful in the instance-independent approach, where qual-
ity of solution is prioritized over time to solution, may
(but need not) differ significantly from those useful
in the instance-specific approach, wherein the prioritiza-
tion is reversed. On the other hand, for many problem
families, there is an instance with maximal structure on
which instance-specific algorithms can be run, thus ob-
taining compilations that can be used for the whole fam-
ily. In general, these instance-specific approaches work
best on sparser cases, and on dense instances will return
inferior compilations to the ones given here.
In many quantum algorithms for quantum chemistry
it is the case that all circuits of a given size for a partic-
ular problem have the same structure with respect to a
partial ordering of the operations, and the only instance-
specific aspect is the parameters (e.g. rotation angles)
of the gates. Furthermore, the implementation of these
gates on hardware often has the same properties (e.g.
fidelity and duration) regardless of the parameters. In
such cases, the instance of the compilation problem is
effectively independent of the instance of the applica-
tion problem. Compare this with implementing QAOA
on hardware in which gate durations are independent of
their parameters, in which case the routing problem for a
given problem instance is the same regardless of the vari-
ational parameters, but differs significantly for different
problem instances. In cases in which gate durations vary,
an upper bound on (or average over) the range of dura-
tions can be used to obtain instance-independent compi-
lations. Thus, the distinction between instance-specific
and instance-independent approaches is somewhat sub-
jective and contextual, but we merely aim to emphasize
that there is an under-explored regime in the trade-off
between quality of solution and computation time in ap-
proaches to the quantum circuit routing problem.
An alternative approach for variational algorithms in
general is to obviate the compilation problem by using
an ansatz that is based on the connectivity of the tar-
get hardware [21, 22] and less so on the target applica-
tion. By efficiently compiling application-specific circuits
to constrained hardware, our methods combine the effi-
ciency of this approach with respect to physical resources
with the advantages of an application-specific ansatz (e.g.
fewer variational parameters).
A method was recently proposed for implementing a
Trotter step of a fermionic Hamiltonian containing $\binom{n}{2}$
terms, where n is the number of orbitals, using a circuit
of depth n with only linear connectivity [23, 24]. Using
fermionic swap gates [25], Kivlichan et al. were able to
change the mapping between fermionic modes and phys-
ical qubits while preserving anti-symmetry [23]. By con-
structing a network of these gates such that, at some
point in the circuit, each pair of orbitals is assigned to
some pair of neighboring qubits, they were able to guar-
antee that they could implement each of the terms in
the Trotter decomposition of the Hamiltonian by acting
only locally on said pair of qubits, and that they could
implement n/2 such terms in parallel. In this work, we
generalize their approach and describe a way to construct
networks of fermionic swap gates acting on n qubits such
that each k-tuple of fermionic modes is mapped to adja-
cent physical qubits at some point during the circuit. The
circuits that we construct have an asymptotically optimal
depth of $O(n^{k-1})$ while only assuming linear connectivity.
For fermionic Hamiltonians, Motta et al. take a differ-
ent approach that exploits the fact that many Hamilto-
nians of practical interest are low-rank [26]. For unitary
coupled cluster and full-rank generic chemical Hamilto-
nians, their methods achieve the same scaling as ours, as
summarized in Table I. Our methods provide an alterna-
tive Trotter order, whose relative value will need to be
studied empirically. For a Trotter step of the Hamilto-
nian for real molecular systems, empirical data indicate
that their low-rank methods can achieve $O(n^2)$ depth.
The question of how to optimally implement a collec-
tion of k-qubit operators is not confined to the simulation
of fermionic quantum systems. Another promising use is
in the application of the Quantum Alternating Opera-
tor Ansatz (Quantum Approximate Optimization Algo-
rithm, or QAOA) to Constraint Satisfaction Problems
(CSPs) over Boolean domains. This approach was taken
for the Maximum Cut problem using existing linear swap
networks [27]; our methods can address k-CSP for any k
in $O(n^{k-1})$ depth.
Our main contributions are:
• Formalizing a variant of the quantum circuit rout-
ing problem in a way that abstracts away details of
particular devices and focuses on their geometry,
which is shared by a wide class of devices;
• Making explicit and general the equivalence be-
tween swap networks that change the mapping of
logical qubits to physical qubits and those that
change the mapping between fermionic modes and
physical qubits;
• Explicit constructions for several important classes
of problems, as summarized in Table I, using modu-
lar primitives that can be applied to new problems;
and
• Providing tools for lower bounding the depth of so-
lutions to the routing problem, in particular by con-
necting it with prior work on acquaintance time and
graph minors.
This paper is organized as follows. In Section II, we
more formally describe the quantum circuit routing prob-
lem and our approach thereto. In Section III, we in-
troduce generalized swap networks that will be used in
Instance family:   k-CSP               UCCGSD          UCCSD                UpCCGSD
Depth:             $\Theta(n^{k-1})$   $\Theta(n^3)$   $\Theta(\eta n^2)$   $\Theta(n)$
TABLE I. Main results. k-CSP indicates depth for a sin-
gle round of QAOA. Remaining columns indicate depth for
a single Trotter step of the coupled cluster operator or of a
similarly structured variational ansatz. All are optimal up to
constant prefactors for arbitrary connectivity. See Section V
for details regarding k-CSP and Section VI regarding Unitary
Coupled Cluster.
the constructions of later sections. In Section IV, we in-
troduce some specific quantum simulation tasks related
to fermionic Hamiltonians, as well as the Quantum Al-
ternating Operator Ansatz (QAOA), which yield fami-
lies of circuits that can be routed using our methods.
In Section V, we present our main result, showing how
to achieve optimal scaling when routing (with an arbi-
trary ordering) circuits consisting of a k-qubit gate for
each possible set of k qubits. In Section VI we describe
families of instances arising from the Unitary Coupled
Cluster method and how to efficiently route them. In
Section VII, we conclude. A reader familiar with either
QAOA or quantum simulation of fermionic Hamiltonians
and interested in quickly learning some useful techniques
may do so in sections III, V, and VI.
In Appendix A, we discuss the instance-independent
approach in the context of quantum annealing. In Ap-
pendix B, we show how to lower bound the depth of a
solution to the circuit routing problem.
II. MODEL
We consider hardware consisting of a line of n qubits
and suppose that we are able to implement any k-qubit
gate in time $\tau_k$ on any set of k qubits that are adja-
cent. This is an abstraction of the more physical model
in which only 1- and 2-qubit gates can be directly im-
plemented; $\tau_k$ for k > 2 is thus some linear combination
of $\tau_1$ and $\tau_2$ that indicates an upper bound on the cost
of compiling any k-qubit gate. When considering a spe-
cific piece of hardware, this model is relatively coarse;
different gates on different sets of physical qubits may
require vastly different times to implement. However,
this level of abstraction allows for significant generality
without too great a loss of precision. Accordingly, for a
specific piece of hardware, our constructions should be
considered as a starting point, with low-level optimiza-
tions likely to improve the constant factors significantly.
For example, the line of qubits on which the swap net-
works are defined can be embedded in a “castellated”
manner in a 2 × (n/2) lattice, as shown in Figure 1. The
availability of the additional qubit adjacency can enable
more efficient decomposition of higher-locality gates.
The problem we would like to solve is as follows: given
a set of k-qubit gates G on n qubits, implement them in
FIG. 1. “Castellated” mapping of the Jordan-Wigner string
into a 2 × (n/2) square lattice. While all swapping is done
along the Jordan-Wigner string, mapping the string to the
lattice in this way allows for potentially more efficient de-
composition of the 4-qubit gates by making available a fourth
adjacency.
some order on the hardware described above. In particu-
lar, we focus on the swap-network paradigm. That is, we
start with an initial assignment of logical qubits to phys-
ical qubits and insert a sequence of 2-qubit swap gates
to move the logical qubits around so that for every gate
in G the logical qubits on which it acts are physically
adjacent at some point in the process. As discussed in
Sections IV A and VI, the routing problem thus defined is
equivalent to the problem of using fermionic swap gates
to change the ordering of a Jordan-Wigner string to en-
able the implementation of gates locally. Without loss of
generality, we assume that there is at most one gate in G
acting on any set of qubits, and that for any gate g ∈ G
acting on a set of qubits S there is no other gate g′ ∈ G
acting on a subset of qubits S′ ⊂ S. This is a convenient
abstraction, rather than a restriction. An instance of the
routing problem is thus specified as a hypergraph, with
vertices corresponding to logical qubits and hyperedges
corresponding to logical gates. We focus on k-complete
hypergraphs, ones in which for every subset of k vertices,
there is an edge connecting them; $|G| = \binom{n}{k}$. Results
for complete hypergraphs give worst case bounds for the
general problem. A more general variant is the more typ-
ically considered problem in which one wants to enforce
a temporal partial ordering on the logical gates.
In general, near-term hardware will have greater con-
nectivity than a line; nevertheless, it will likely contain a
line as a subset, so that our constructions give a baseline.
Even with greater connectivity, our scaling is optimal
when the number of gates is $\Omega(n^k)$. Let m = |G| be the
number of gates, ν the number of physical qubits, and n
the number of logical qubits. At most ν gates can be im-
plemented at a time, so the circuit depth must be at least
m/ν. For $m = \Omega(n^k)$ and $\nu = O(n)$, this implies a mini-
mal depth of $\Omega(n^{k-1})$, which our construction provides.
Because our focus is on resource-constrained near-term
hardware, we shall assume that the number of physical
qubits is equal to the number of logical qubits.
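The counting argument above can be written out for the k-complete case. The following is a sketch under the stated assumptions that $\nu = n$ and $m = \binom{n}{k}$; the inequality $\binom{n}{k} \ge (n/k)^k$ is a standard bound and is not taken from the text.

```latex
\[
  \text{depth} \;\ge\; \frac{m}{\nu}
  \;=\; \frac{1}{n}\binom{n}{k}
  \;\ge\; \frac{1}{n}\left(\frac{n}{k}\right)^{k}
  \;=\; \frac{n^{k-1}}{k^{k}}
  \;=\; \Omega\!\left(n^{k-1}\right)
  \quad\text{for constant } k .
\]
```

Since each k-qubit gate occupies k physical qubits, at most $\lfloor \nu/k \rfloor$ gates can in fact run at once; this tightens the constant but not the scaling.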
III. SWAP NETWORKS
Henceforth, by “swap gate”, we shall mean either the
standard swap gate (when considering a mapping of log-
ical qubits to physical qubits) or the fermionic swap gate
FIG. 2. Top: Notation (left) and decomposition (right) for the canonical 2-complete linear (2-CCL) swap network for
acquainting all pairs of qubits, annotated with empty boxes showing acquaintance opportunities. The circuit has depth n, and
contains $n(n-1)/2$ swap gates. Bottom: The canonical 3-complete linear (3-CCL) swap network that acquaints all triples of
qubits. It is formed by replacing each layer i of acquaintance opportunities in the 2-CCL swap network with a $P_i$-swap network;
the qubits to be acquainted in the former are the parts in the partition $P_i$. The 3-local acquaintance opportunities are not
shown. Part of this swap network is shown in more detail in the top of Figure 5, with acquaintance opportunities indicated.
(Each layer of 1-swaps will be cancelled out by the final layer in the expansion of the preceding 2-swap network; we include
them here to make the recursive step clear.)
(when considering a mapping of fermionic modes to phys-
ical qubits); for circuit routing, everything is exactly
the same in both cases except for the “interpretation”.
A swap network is a circuit consisting entirely of swap
gates. We define a 2-complete linear swap network, a no-
tion we shall generalize shortly, to be a swap network in
which all pairs of logical qubits are linearly adjacent at
some point in the circuit and in which all swap gates act
on linearly adjacent physical qubits. Such networks en-
sure that, in the linear architecture described in Sec. II,
there is an opportunity to add, for each pair of logical
qubits, a 2-qubit gate acting on those logical qubits (or
fermionic modes as the case may be) at some point in the
circuit. We call such opportunities acquaintance oppor-
tunities. They are not part of the swap network, but we
shall often draw them as empty boxes in circuit diagrams
to illustrate acquaintance properties of swap networks,
as in Figure 2. We shall say that a set of logical qubits
that has at least one such acquaintance opportunity is
“acquainted” by the network, or that the swap network
“acquaints” those qubits.
Before generalizing this notion, we review the con-
struction of Kivlichan et al. [23] for implementing a 2-
local gate on every pair of logical qubits in depth n,
using $\binom{n}{2}$ swap gates. The swap network underlying
this construction is what we shall call the canonical 2-
complete linear swap network. Let the physical qubits
be labeled 1 through n, and partition the pairs of ad-
jacent qubits into two sets based on the parity of their
larger index: even pairs $\{\{1, 2\}, \{3, 4\}, \ldots\}$ and odd pairs
$\{\{2, 3\}, \{4, 5\}, \ldots\}$. Note that the pairs within each of these
two sets are mutually disjoint. We define the canonical 2-
complete linear (2-CCL) swap network as n alternating
layers of swaps on the even pairs and odd pairs, as illus-
trated in the top half of Figure 2. The overall effect of
the 2-CCL swap network is to reverse the ordering of the
logical qubits. In doing so, it directly swaps every pair of
logical qubits. This construction has the attractive prop-
erty that each acquaintance opportunity precedes a swap
gate on the same two qubits, so any added gate that acts
on a pair of logical qubits can be combined with the swap
of those two qubits, with the result that in depth n we
can execute a 2-qubit gate between every pair of logical
qubits.
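The 2-CCL construction described above can be checked with a short simulation. This is a minimal sketch, not code from the paper: the function names `two_ccl_layers` and `run_swap_network` are hypothetical, positions are 0-indexed (so the "even pairs" {1, 2}, {3, 4}, ... become positions (0, 1), (2, 3), ...), and each acquaintance opportunity is recorded immediately before the corresponding swap.

```python
from itertools import combinations


def two_ccl_layers(n):
    """Return n alternating layers of swaps on adjacent positions (0-indexed)."""
    return [[(i, i + 1) for i in range(t % 2, n - 1, 2)] for t in range(n)]


def run_swap_network(n, layers):
    """Apply the layers to the identity mapping.

    Returns the final order (order[pos] = logical qubit at physical position pos)
    and the set of pairs of logical qubits that were acquainted.
    """
    order = list(range(n))
    acquainted = set()
    for layer in layers:
        for i, j in layer:
            # Acquaintance opportunity on the two qubits about to be swapped.
            acquainted.add(frozenset((order[i], order[j])))
            order[i], order[j] = order[j], order[i]
    return order, acquainted


if __name__ == "__main__":
    n = 6
    order, acquainted = run_swap_network(n, two_ccl_layers(n))
    print(order)            # final assignment of logical qubits to positions
    print(len(acquainted))  # number of distinct acquainted pairs
```

For small n, the simulation can be compared against the properties stated in the text: depth n, n(n-1)/2 swap gates, every pair acquainted, and a reversed final ordering.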
One direction for generalization is to (S,A)-swap net-
works, where S is a subset of all pairs of qubits and A is
an architecture, such as a 2D grid. The set S captures
the pairs of qubits to which we want to apply 2-qubit
gates at a given stage in a circuit. We shall not discuss
this generalization further in this paper, other than to
note that our results can be used to provide bounds for
(S,A)-swap networks. Because in the present work we
shall present only swap networks acting on a line, we shall
often leave that aspect implicit in the terminology and
refer simply, e.g., to a “2-complete swap network”.
Instead, we are interested in generalizing to k-complete
swap networks, networks in which the elements of every
set of k logical qubits are adjacent at some point, so that
a k-qubit gate (or set of 1- and 2-qubit gates making up
the k-qubit gate) could be applied thereto. To support
the construction of k-complete swap networks in Sec. V,
here we introduce a generalization of a 2-complete swap
network that swaps elements of a partition of qubits,
rather than individual logical qubits: a complete P-swap
network, where P is an ordered partition of the physi-
cal qubits such that each part contains only contiguous
qubits, contains only swap operators that swap parts of
the partition. In this way, a complete P-swap network
has the property that every part in the partition is ad-
jacent to every other part in the partition at some point
in the network.
In constructing P-swap networks, it will be useful to
swap pairs of sets of qubits using what we call a $(k_1, k_2)$-
swap gate, or, more generally, a generalized swap gate.
The $(k_1, k_2)$-swap gate swaps a set of $k_1$ logical qubits
with a set of $k_2$ logical qubits, while preserving the
ordering within each set, i.e., it permutes a sequence
of logical qubits from $(i_1, \ldots, i_{k_1}, i_{k_1+1}, \ldots, i_{k_1+k_2})$ to
$(i_{k_1+1}, \ldots, i_{k_1+k_2}, i_1, \ldots, i_{k_1})$. Several examples of these
generalized swap gates and their decompositions are
shown in Figure 3. In general, a $(k_1, k_2)$-swap gate can
be decomposed using $k_1 k_2$ standard swap gates in depth
$k_1 + k_2 - 1$. We call a swap network a k-swap net-
work whenever it contains only $(k_1, k_2)$-swap gates for
$k_1, k_2 \le k$.
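One way to realize the stated gate count and depth is sketched below. The scheduling rule is an assumption chosen for illustration (the paper specifies the decompositions only through Figure 3): the swap between the a-th qubit of the first set and the b-th qubit of the second set is placed in layer (k1 - 1 - a) + b, acting on positions a + b and a + b + 1. The names `generalized_swap_layers` and `apply_layers` are hypothetical.

```python
def generalized_swap_layers(k1, k2, offset=0):
    """Decompose a (k1, k2)-swap into layers of adjacent standard swaps.

    The first set occupies positions offset .. offset+k1-1 and the second set
    the next k2 positions. Returns k1 + k2 - 1 layers holding k1 * k2 swaps.
    """
    assert k1 >= 1 and k2 >= 1
    layers = [[] for _ in range(k1 + k2 - 1)]
    for a in range(k1):          # index within the first set
        for b in range(k2):      # index within the second set
            pos = offset + a + b
            layers[(k1 - 1 - a) + b].append((pos, pos + 1))
    return layers


def apply_layers(order, layers):
    """Apply layers of swaps to a sequence; swaps within a layer must be disjoint."""
    order = list(order)
    for layer in layers:
        touched = [p for swap in layer for p in swap]
        assert len(touched) == len(set(touched))  # layer is parallelizable
        for i, j in layer:
            order[i], order[j] = order[j], order[i]
    return order


if __name__ == "__main__":
    # A (2, 3)-swap acting on the sequence (0, 1 | 2, 3, 4).
    print(apply_layers(range(5), generalized_swap_layers(2, 3)))
```

The ordering within each set is preserved because two qubits from the same set are never swapped with each other.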
The canonical P-swap network has the same structure
as the 2-CCL swap network, except that instead of pairs
of single qubits being swapped at a time, pairs of sets of
FIG. 3. Generalized swap gates. From top to bottom,
notation and decompositions for (1, 2)-, (1, 3)-, (2, 2)-, and
(3, 3)-swap gates. A $(k_1, k_2)$-swap gate can be implemented
in depth $k_1 + k_2 - 1$ using $k_1 k_2$ standard swap gates.
qubits (i.e., the parts of the partition) are swapped. In
the canonical P-swap network, each $(k_1, k_2)$-swap gate is
preceded by a $(k_1 + k_2)$-local acquaintance opportunity.
To make the overall effect of a complete P-swap network
be a complete reversal of the qubit mapping, we append
to the end a 1-swap network within each part. This is
unnecessary when considering a single swap network, but
may be helpful when using the swap network as a primi-
tive in a larger construction. Note that this is primarily
for explanatory purposes, and in an actual implementation
it would likely be optimized away. In the recursive
strategy for k-local hypergraphs (discussed in Sec. V),
each generalized swap gate is preceded by some number
of acquaintance opportunities and swap gates that ensure
that each set of $k_1$ or $k_2$ qubits is acquainted with each
one of the other set.
FIG. 4. Notation (left) and decomposition (right) for
a bipartite swap network corresponding to the bipartition
(((1, 2) , (3, 4) , (5, 6)) , ((7, 8) , (9, 10) , (11, 12))). For each pair
of qubits in the first half and each pair of qubits in the sec-
ond half, their union is acquainted; the overall effect is the
same as that of a generalized swap gate corresponding to the
two halves. Note the similarity of the notation to that for a
complete swap network, except that the dotted lines connect
only each part of the bipartition.
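The canonical P-swap network can be sketched at the level of parts, treating each part as a single unit that moves as a block. This is an illustrative simplification: the function name `canonical_p_swap` is hypothetical, each block exchange stands for one generalized swap gate, and the acquaintance recorded is that of the two parts about to be exchanged.

```python
from itertools import combinations


def canonical_p_swap(parts):
    """Run the 2-CCL pattern on an ordered partition.

    parts: ordered list of tuples of logical qubits, each a contiguous block.
    Returns the final order of the parts and the set of acquainted part pairs.
    """
    parts = [tuple(p) for p in parts]
    acquainted = set()
    for t in range(len(parts)):                    # one layer per part
        for j in range(t % 2, len(parts) - 1, 2):  # alternate even and odd pairs
            # (k1 + k2)-local acquaintance opportunity before the block swap.
            acquainted.add(frozenset((parts[j], parts[j + 1])))
            parts[j], parts[j + 1] = parts[j + 1], parts[j]
    return parts, acquainted


if __name__ == "__main__":
    final, acquainted = canonical_p_swap([(0, 1), (2, 3), (4,)])
    print(final)
    # Appending a reversal within each part gives a full reversal of the mapping.
    print([q for part in final for q in reversed(part)])
```

The last line mirrors the remark in the text that a 1-swap network within each part is needed only if a complete reversal of the qubit mapping is wanted.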
The 2-CCL swap network has the exact same structure
as the optimal sorting network on a line [28]. A sorting
network is a fixed circuit consisting of “comparators”.
Given an initial assignment of objects to the wires, each
comparator compares the objects and swaps them if they
are out of order. This means that a subset of the 2-
CCL swap network can be used to effect an arbitrary
permutation of logical qubits in at most linear depth.
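The connection to sorting networks can be made concrete with odd-even transposition sort, which uses the same layer pattern as the 2-CCL network but swaps only when two neighbors are out of order. The sketch below is an illustration under that assumption; `routing_layers` is a hypothetical name, and the result is the subset of 2-CCL swaps that takes one assignment of logical qubits to another.

```python
def routing_layers(current, target):
    """Layers of adjacent swaps that rearrange `current` into `target`.

    current, target: sequences listing the logical qubit at each position.
    Uses n layers of odd-even transposition sort; some layers may be empty.
    """
    rank = {q: pos for pos, q in enumerate(target)}  # desired position of each qubit
    order = list(current)
    n = len(order)
    layers = []
    for t in range(n):
        layer = []
        for i in range(t % 2, n - 1, 2):
            # Comparator: swap only if the two qubits are out of order.
            if rank[order[i]] > rank[order[i + 1]]:
                order[i], order[i + 1] = order[i + 1], order[i]
                layer.append((i, i + 1))
        layers.append(layer)
    assert order == list(target)
    return layers


if __name__ == "__main__":
    for layer in routing_layers([0, 1, 2, 3, 4], [3, 0, 4, 2, 1]):
        print(layer)
```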
The swap networks above acquaint all pairs of sets of
qubits. Another useful primitive is what we call a “bipar-
tite swap network”; again, this should be more precisely
called a “bipartite linear swap network” to emphasize
that it acts on a line, but we leave this implicit for conci-
sion. Given a bipartition of sets of qubits, it acquaints all
the unions of pairs of sets of qubits which can be formed
by taking one set from the first part and the other set
from the second part. While the depth of a bipartite
swap network is similar to that of a complete swap net-
work, the gate count is approximately halved. Figure 4
shows an example bipartite swap network for the sets of
qubits ((1, 2), (3, 4), . . . , (11, 12)) with the first three in
one part and the latter three in the second part.
Swap networks can be useful for measurement as well.
In many cases, the gates to be executed correspond one-
to-one with the terms of a Hamiltonian to be measured.
Any swap network used to implement those gates thus
yields a partition of the terms of the Hamiltonian into
parts containing only gates acting on disjoint sets of
qubits. This partition can then be used to parallelize
the measurements. After an application of the swap net-
work, the swap layers following the logical layer to be
measured can be executed in reverse to return the map-
ping to one in which the terms of the Hamiltonian are
mapped to adjacent sets of qubits, with appropriate optimizations
made to account for the fact that many swap
gates will likely cancel out once the logical gates are re-
moved. Alternatively, a simple sorting network can be
used to achieve the same end. For fermionic Hamiltoni-
ans, this approach can significantly reduce the number
of measurements needed by reducing the locality of all
measurement terms, in addition to the savings yielded
by parallelization.
Application   QAOA                                                       Quantum chemistry
Iteration     $\prod_I \exp\bigl(i\gamma c_I \prod_{i\in I} Z_i\bigr)$   $\prod_{p,q,r,s} \exp(-itH_{p,q,r,s})$
Assignment    logical qubits                                             fermionic modes
Changed by    SWAP                                                       FSWAP
TABLE II. The two problem families we consider. The table
lists the iterated operator we compile, and the logical unit
and gates used in the compilation.
IV. PROBLEM FAMILIES
In this section, we introduce two families of quantum
circuits that come from quite different application do-
mains but whose compilation can be addressed using es-
sentially the same tools; the analogy is summarized in
Table II. Both cases involve repeated application of a
circuit of a particular form such that for each iteration
the compilation instance is the same. We provide con-
structions for a single iteration; these can be repeated
sequentially for the full circuit. Solving the compilation
instance for the full circuit all at once may provide a bet-
ter solution, but likely at the cost of it being much harder
to find.
A. Fermionic Hamiltonians
The general form of the electronic structure Hamilto-
nian in second quantization is
$$ H = \sum_{p,q} c_{p,q}\, a_p^\dagger a_q + \sum_{p,q,r,s} c_{p,q,r,s}\, a_p^\dagger a_q^\dagger a_r a_s, \tag{1} $$
where p, q, r, s label single-electron orbitals, $c_{p,q}$ and
$c_{p,q,r,s}$ are real coefficients, and $a_p^\dagger$ is the creation opera-
tor for the pth orbital. A common subroutine of quantum
simulation algorithms is the Trotterization of time evo-
lution under such a fermionic Hamiltonian [29]:
$$ e^{-iHt} \approx \prod_{l=1}^{t/\delta t} \Bigl( \prod_{p,q,r,s} e^{-iH_{p,q,r,s}\delta t} \Bigr), \tag{2} $$
where $H_{p,q,r,s}$ is the part of the Hamiltonian that acts
exclusively on modes p, q, r, s. (For simplicity we absorb
the terms acting on two fermionic modes into the terms
acting on four.) One approach to mapping the fermionic
operators $e^{-iH_{p,q,r,s}}$ into operators acting on the qubit
Hilbert space is to employ the Jordan-Wigner transfor-
mation [30],
$$ a_p = -\prod_{i=1}^{p-1} \sigma_i^{(z)} \cdot \sigma_p^{(-)}. \tag{3} $$
After performing the Jordan-Wigner transformation on
Equation 2, many of the resulting operators will act non-
trivially on Θ(n) qubits, resulting in a naive gate depth
of $\Theta(n^5)$ for the implementation of Equation 2, assuming
there are $\Theta(n^4)$ terms in the Hamiltonian. As we shall
see, by reordering the fermionic modes (thereby chang-
ing the Jordan-Wigner ordering), this overhead from the
non-locality of the Jordan-Wigner transformation is ad-
dressed automatically in our scheme for parallelization.
For this reason, our constructions provide significant ad-
vantage even when connectivity is not a constraint, in-
cluding in the error-corrected regime. As a result, at
least with respect to scaling, we avoid the need for more
sophisticated alternatives to the Jordan-Wigner trans-
formation, such as those developed by Bravyi and Ki-
taev [31] and others [32].
A related approach, employed by a variety of works
proposing the study of quantum chemistry using a near-
term device, is the use of a quantum circuit to prepare
and measure the unitary coupled cluster ansatz [33–36].
Under the typical choice to include only single and double
excitations in the cluster operator, this wave function is
given by
$$ |\psi\rangle = e^{T - T^\dagger} |\phi_0\rangle, \tag{4} $$
where the cluster operator T has a form similar to H
in Equation 1. Usually, it contains only excitations from
the η “occupied” orbitals which contain an electron in the
reference state $|\phi_0\rangle$ to the $n - \eta$ “virtual” orbitals, and the
coefficients are determined variationally. We refer to this
case as UCCSD, and the case where all 2-electron exci-
tations are included as UCCGSD. The exact exponential
of Equation 4 is typically approximated by a Trotter ex-
pansion and (assuming $n \gg \eta$), the $\Theta(n)$ overhead from
the non-locality of the Jordan-Wigner strings discussed
above would lead to a circuit depth of $\Theta(n^3\eta^2)$ for a sin-
gle Trotter step.
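The naive UCCSD depth quoted above follows from a term count. The sketch below assumes a spin-orbital picture in which each double excitation picks two of the η occupied orbitals and two of the n − η virtual orbitals, and that the terms are executed one after another; both are assumptions made for illustration rather than statements from the text.

```latex
\[
  \#\{\text{double excitations}\}
  \;=\; \binom{\eta}{2}\binom{n-\eta}{2}
  \;=\; \Theta\!\left(\eta^{2} n^{2}\right)
  \quad (n \gg \eta),
  \qquad
  \underbrace{\Theta\!\left(\eta^{2} n^{2}\right)}_{\text{terms}}
  \times
  \underbrace{\Theta(n)}_{\text{Jordan-Wigner string}}
  \;=\; \Theta\!\left(n^{3}\eta^{2}\right).
\]
```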
We show how depths of $O(n^3)$ and $O(n^2\eta)$ can be
achieved for a Trotter step of the time evolution under a
fermionic Hamiltonian (or the similarly structured UC-
CGSD) and the UCCSD ansatz, respectively. These scal-
ings match the asymptotic results of Ref. [37] while also
respecting the spatial locality of the available gates and
requiring no additional ancilla qubits.
B. QAOA
As originally proposed [38, 39], QAOA is a method for
minimizing the expectation value of a diagonal Hamilto-
nian
$$ H_f = \sum_{I \subset [n]} c_I \prod_{i \in I} Z_i \tag{5} $$
corresponding to a classical function $f : \{\pm 1\}^n \to \mathbb{R}$
whose multilinear form is
$$ f(s) = \sum_{I \subset [n]} c_I \prod_{i \in I} s_i. \tag{6} $$
The minimization is done variationally over states of the
form
$$ |\beta, \gamma\rangle = e^{i\beta_p M} e^{i\gamma_p H_f} \cdots e^{i\beta_1 M} e^{i\gamma_1 H_f} |+\rangle^{\otimes n}, \tag{7} $$
which consists of p alternating applications of the “phase
separator” $e^{i\gamma H_f}$ and the “mixer” $M = \sum_{i=1}^{n} X_i$. The
phase separator can be written as the product of gates
corresponding to terms in the Hamiltonian,
$$ e^{i\gamma H_f} = \prod_I e^{i\gamma c_I \prod_{i \in I} Z_i}. \tag{8} $$
Note that the gates are diagonal and so their order does
not matter. The locality of the gates corresponds directly
to the locality of the terms in the Hamiltonian, max |I|.
QAOA applied to k-CSP, in which each term acts on at
most k variables, thus requires k-qubit gates.
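Because every term gate is diagonal in the computational basis, the claim that their order does not matter can be checked directly on the diagonal entries. The following is a small pure-Python sketch; the function name `phase_separator_diagonal`, the dictionary format for the terms, and the triangle-graph example are all illustrative assumptions rather than anything defined in the paper.

```python
import cmath
from itertools import product
from math import prod


def phase_separator_diagonal(n, terms, gamma, order=None):
    """Diagonal of exp(i * gamma * H_f), built as a product of one gate per term.

    terms: dict mapping a tuple of qubit indices I to the coefficient c_I.
    order: optional list of the same tuples giving the order of the gates.
    """
    keys = list(terms) if order is None else list(order)
    diagonal = []
    for s in product((1, -1), repeat=n):  # eigenvalues of Z_1, ..., Z_n
        amplitude = 1.0 + 0.0j
        for subset in keys:               # multiply in the term gates one by one
            z = prod(s[i] for i in subset)
            amplitude *= cmath.exp(1j * gamma * terms[subset] * z)
        diagonal.append(amplitude)
    return diagonal


if __name__ == "__main__":
    # Three ZZ terms on a triangle, as in a Maximum Cut instance on 3 vertices.
    terms = {(0, 1): 0.5, (1, 2): 0.5, (0, 2): 0.5}
    d_forward = phase_separator_diagonal(3, terms, gamma=0.7)
    d_reverse = phase_separator_diagonal(3, terms, gamma=0.7, order=list(terms)[::-1])
    print(max(abs(x - y) for x, y in zip(d_forward, d_reverse)))
```

This freedom of ordering is what allows the gates to be placed wherever the swap network provides an acquaintance opportunity.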
Hadfield et al. [40] generalized QAOA to the Quantum
Alternating Operator Ansatz, employing a wider variety
of mixers, many of which involve k-qubit gates. While
these gates, in general, do not commute, it is an open
question how the order of the gates affects the efficacy
of the mixing. In NISQ devices with limited depth, the
depth in which different mixers can be implemented plays
a key role in their usefulness. The techniques here can be
applied to these alternative mixers, with different order-
ings giving different mixers within the same family, and
the resulting compilation a key step in determining the
most effective mixing strategy.
V. COMPLETE HYPERGRAPHS
A. Cubic interactions
Now, suppose we want to implement a 3-qubit gate
between every triple of logical qubits. We call a swap
network that achieves this goal a 3-complete linear (3-
CCL) swap network. We can do so in the following way.
First, we start with the 2-CCL swap network, as shown
in the top half of Figure 2. At each layer i where ac-
quaintance opportunities appear, consider the partition
Pi whose parts are the pairs of qubits appearing in the
acquaintance opportunities together with singleton parts
for any unpaired qubits at the boundary. To obtain a 3-
complete linear swap network, we add 2-swap networks
corresponding to the partition Pi, as shown in the bottom
half of Figure 2. The 3-way acquaintance opportunities,
where 3-local gates (or compilations of them to 1- and 2-local
gates) can be added, are interspersed between the
generalized swaps making up the Pi-swap network, as
shown in the top half of Figure 5. We make use of the
property that for any two pairs of logical qubits involved
in a 2-swap, each triple consisting of one of the pairs and
one qubit from the other pair is mapped to three con-
tiguous physical qubits either before or after the swap.
This ensures that overall every triple of logical qubits is
acquainted because any triple T is the union of a pair and
a third qubit. The 2-CCL network ensures that the pair is
adjacent at some point, and thus a part S of some parti-
tion Pi. The third qubit is necessarily in some other part
S′ of the same partition, so that at some point in the Pi-
swap network there is a 2-swap network involving S and
S′, ensuring that the triple T is acquainted. (Actually, it
is acquainted thrice, because there are three pairs S for
which the preceding logic applies.) There are exactly n
2-swap networks inserted, and each 2-swap gate can be
implemented in depth 3 using standard swap gates, for a
total depth of approximately (3/2) n^2 (τ_2 + τ_3) = Θ(n^2).
B. General k-qubit gates
The above ideas generalize to arbitrary k. The con-
struction is recursive. First, construct the network to im-
plement all (k−1)-qubit gates. Then replace every layer i
of acquaintance opportunities with the corresponding Pi-
complete swap network, inserting 1-swaps and acquain-
tance opportunities between the layers of (k − 1)-swaps
in order to acquaint each set of k − 1 qubits with each
qubit in the other set of k − 1 qubits with which it will
be swapped. Specifically, when inserting a (k − 1)-swap
involving two sets of (k − 1) qubits each, we want to en-
sure that each set of k qubits consisting of one of the sets
and one qubit from the other set is mapped to k con-
tiguous physical qubits either before or after the swap.
For k = 2, this is the case without additional swaps. For
larger k, this can be achieved by adding swaps that bring
half of each set to the “interface” between them before
the swap (the half closest to the interface), and the other
half to the interface afterwards (when it will then be the
closer half). This ensures that overall every set of k log-
ical qubits is acquainted because any such set T is the
union of a set S of k − 1 qubits and a kth qubit t. Sup-
pose we start with a swap network that acquaints every
set of k − 1 qubits, and in particular S, so that S is a
part of some partition Pi (corresponding to acquaintance
layer i in the starting swap network) in the recursive step.
The kth qubit t is necessarily in some other part S′ of
the same partition Pi, so that at some point in the Pi-
swap network there is a (k− 1)-swap involving S and S′,
ensuring that the set T is acquainted. (Actually, it is ac-
quainted at least k times, because there are k sets S ⊂ T
for which the preceding logic applies.)
Each (k − 1)-swap network has depth at most n in
terms of (k − 1)-swap gates. A k-swap gate has depth at
most 2k − 1 in terms of standard swap gates, and the
additional swaps for bringing inner qubits to the interface
add depth k − 2 at each swap. Therefore, if we have a
depth O(n^{k−2}) construction for all (k − 1)-qubit gates, we
can use that to get an O(n^{k−1}) depth construction for all
k-qubit gates. The base case is the linear-depth 2-CCL
swap network for 2-qubit gates. Figures 2 and 5 show the
steps for k = 4. Lower-locality gates can be included in
one of two ways, or a combination thereof. First, they can
be incorporated directly into the highest-locality gates.
Alternatively, the lower-locality acquaintance opportuni-
ties can be kept when recursing.
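The depth recursion can be written out as a short calculation. This is a sketch under stated assumptions: the symbols D_k (depth of the construction for all k-qubit gates) and c_k (the cost, independent of n, of one generalized swap together with its interface swaps) are introduced here for illustration only, and the number of acquaintance layers in the (k − 1)-qubit construction is bounded by its depth D_{k−1}.

```latex
D_k \;\le\; c_k \, n \, D_{k-1}, \qquad D_2 = O(n)
\quad\Longrightarrow\quad
D_k \;\le\; \Bigl(\prod_{j=3}^{k} c_j\Bigr)\, n^{k-2}\, D_2 \;=\; O\bigl(n^{k-1}\bigr)
\quad \text{for fixed } k, \text{ with } c_j = O(j).
```

The product of the c_j grows with k, so the bound is only meaningful when k is treated as a constant.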
Using this recursive method yields a significant amount
of redundancy with respect to the number of times that
each set of k qubits can be acquainted. For applications
in which the gates do not commute, this can be exploited
in two ways. First, distributing the gates over all possible
acquaintance opportunities may lead to smaller Trotter
errors. Second, for each gate a possible acquaintance
opportunity may be chosen randomly. In other words,
the swap network can be considered as a family of swap
networks, each corresponding to a particular Trotter or-
der; prior work shows that such random Trotter orderings
may be helpful [41].
C. Alternative for 3-local
Here we present an alternative construction for sets of
3-local gates. Its depth is similar to that of the other
given, but it doesn’t obviously generalize. We include
it for two reasons: it demonstrates a potentially useful
property of complete linear swap networks, and it may
be better when applied to specific hardware devices.
Note that in the 2-complete swap network, every pair
of logical qubits that is initially at distance 2 from each
other remains so, except near the ends of the line. Fur-
thermore, every other logical qubit passes through them
at some point. For our purposes, this means that in the
course of the 2-complete swap network we can execute
any 3-local gate such that some pair of the three logical
qubits on which it acts is at distance 2 at the start of the
network.
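The property described above can be checked by brute force for small n. The sketch below is illustrative only: it assumes the 2-complete swap network is the usual alternating pattern of n swap layers on a line, starting with positions (0, 1), (2, 3), ..., and all function names are hypothetical. It tests whether every triple containing a pair initially at distance 2 occupies three contiguous positions at some point.

```python
def complete_swap_network_states(n):
    """Qubit orderings before the first layer and after each of the n swap layers.

    Layer t swaps positions (i, i+1) for every i with i % 2 == t % 2
    (an assumed convention; the other starting parity gives the mirror image).
    """
    order = list(range(n))
    states = [tuple(order)]
    for t in range(n):
        for i in range(t % 2, n - 1, 2):
            order[i], order[i + 1] = order[i + 1], order[i]
        states.append(tuple(order))
    return states

def contiguous_triples(states):
    """All 3-element sets of logical qubits found on three adjacent positions."""
    found = set()
    for state in states:
        for i in range(len(state) - 2):
            found.add(frozenset(state[i:i + 3]))
    return found

def distance2_triples_covered(n):
    """True if every triple containing a pair initially at distance 2 is covered."""
    found = contiguous_triples(complete_swap_network_states(n))
    wanted = {frozenset((a, a + 2, c))
              for a in range(n - 2) for c in range(n) if c not in (a, a + 2)}
    return wanted <= found
```

For n = 4, 5, and 6 the check succeeds; larger n can be tested the same way, since the enumeration is polynomial in n.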
Consider a sequence of mappings labeled by
∆ = 1, ..., n/2. In the mapping labeled by ∆, the
logical qubits
(1, 1+∆, 1+2∆, ..., 2, 2+∆, 2+2∆, ..., ∆, 2∆, 3∆, ...)
are mapped to physical qubits
(1, n, 3, n−1, 5, ..., ⌊n/2⌋+1), respectively. Any
triple of logical qubits contains at least one pair that
are mapped to physical qubits at distance 2 in at least
one of the n/2 mappings. The construction is thus:
alternate between 1) 2-complete swap networks with
initial assignments given by the mappings, and 2) sorting
networks to get to the next mapping. The 2-complete
swap networks have depth n and the sorting networks
depth at most n, so overall the total depth is at most
2n · (n/2) = n^2.
FIG. 5. Top: Notation and decomposition for complete 2-swap network for acquainting each pair of qubits in the initial
partition with every other third qubit. Note how the 3-qubit acquaintance opportunities are almost perfectly parallelized;
this helps significantly when recursing further. Bottom: Swap network for acquainting every set of 4 qubits such that 2 of
the 4 qubits were paired in the partition of the originating 2-swap network, formed by replacing each layer of acquaintance
opportunities in the complete 2-swap network with a complete 3-swap network, in the same manner as in Figure 2. In the swap
network to acquaint every set of 4 qubits, this replacement (i.e., from top to bottom of this figure) is done for every complete
2-swap network in the circuit for acquainting every set of 3 qubits, i.e., for every other layer in the bottom of Figure 2.
VI. UNITARY COUPLED CLUSTER
In this section, we describe how the techniques of this
paper can be used to implement Trotterized versions of
three different types of unitary coupled cluster ansatz
with a depth scaling that is optimal up to constant pref-
actors. We present the details for the standard unitary
coupled cluster method with single and double excita-
tions from occupied to virtual orbitals (UCCSD) [33–35],
a unitary coupled cluster that includes additional, gen-
eralized, excitations (UCCGSD) [42, 43], and a recently
introduced ansatz that is a sparsified version of UCCGSD
(k-UpCCGSD) [36].
The standard unitary coupled cluster singles and dou-
bles ansatz is given by
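|\psi\rangle = e^{T - T^\dagger} |\phi_0\rangle,    (9)
where |φ_0⟩ is the Hartree-Fock state, T = T_1 + T_2,
T_1 = \sum_{i \in \mathrm{occ},\, a \in \mathrm{vir}} t_i^a \, a_a^\dagger a_i,
T_2 = \sum_{i,j \in \mathrm{occ},\, a,b \in \mathrm{vir}} t_{i,j}^{a,b} \, a_a^\dagger a_b^\dagger a_j a_i.    (10)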
The i and j indices range over the η “occupied” orbitals
(those which are occupied in the Hartree-Fock state |φ0〉)
and the a and b indices over the n− η “virtual” orbitals
(those which are unoccupied in |φ_0⟩). A Trotter step of
the corresponding unitary has $\binom{\eta}{2}\binom{n-\eta}{2}$
4-local gates.
These can be implemented in O(η n^2) depth, as shown
in Figure 6. First, we assign the occupied orbitals to the
first η physical qubits (1, . . . , η) and the virtual orbitals
to the last n− η physical qubits (η + 1, . . . , n). We have
a 2-complete swap network on the occupied orbitals. In
between every swap layer thereof, we do a 2-complete
swap network on the virtual orbitals. For every pair of
occupied orbitals and every pair of virtual orbitals, there
is a layer in this composite network such that the pairs are
simultaneously adjacent. Thus, if we then insert a final 2-
swap network with appropriate partitions at every layer,
then every set of 2 occupied orbitals and 2 virtual orbitals
will be adjacent at some point and a 4-local gate can be
implemented on them. The swap depth of just the
2-complete swap networks is η(n − η + 1)τ_2 = Θ(ηn). Before
each one, a 2-swap network is inserted with an average
depth of (n + 2)(3τ_2 + τ_4) = Θ(n). Overall, this yields
the claimed Θ(η n^2) depth. The coefficient of the leading
term in the depth can be halved by accounting for the
fact that we are typically interested in implementing only
those excitations that are spin-preserving. If we initially
order the spin orbitals within the sets of occupied and
virtual orbitals by ↑, ↓, ↓, ↑, ↑, . . ., then the parity of the
spins of the pairs of orbitals acquainted in each layer of
the 1-swap networks alternates, and we only need to do
a bipartite 2-swap network when the spin parities of the
layers of the two sets coincide.
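The alternation of spin parities can be illustrated with a short simulation. This is a sketch under assumptions: the swap network is taken to be the alternating linear pattern starting on positions (0, 1), (2, 3), ..., the spin ordering is ↑, ↓, ↓, ↑ repeated with period 4, and the function name is hypothetical.

```python
def layer_spin_parities(n):
    """For each swap layer, collect whether the swapped pairs have equal or opposite spins.

    Returns one set per layer containing "same" and/or "opposite".
    """
    # Initial ordering up, down, down, up, up, down, down, up, ...
    spins = ["up" if i % 4 in (0, 3) else "down" for i in range(n)]
    report = []
    for t in range(n):
        kinds = set()
        for i in range(t % 2, n - 1, 2):
            kinds.add("same" if spins[i] == spins[i + 1] else "opposite")
            spins[i], spins[i + 1] = spins[i + 1], spins[i]
        report.append(kinds)
    return report

# With 8 spin orbitals, every layer is homogeneous and the two kinds alternate.
print(layer_spin_parities(8))
```

Under these assumptions each layer acquaints only opposite-spin pairs or only same-spin pairs, which is what allows the bipartite 2-swap networks to be skipped when the parities of the two sets differ.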
[Figure 6: qubit spin labels in the initial mapping: ↑, ↓, ↓, ↑, ↑, ↓, ↓, ↑]
FIG. 6. Construction of the swap network for a UCCSD operator (Equation 10) with two occupied and two virtual spatial
orbitals. On top is an intermediate form of the construction useful for reasoning through the logic of its structure; on bottom
is the actual network. On the first four qubits of the top half is a complete swap network with the acquaintance opportunity
layers repeated twice. On the second four qubits are four concatenated complete swap networks, one for each acquaintance
layer (before repetition) of the complete swap network on the first four qubits. The spins of the orbitals in the initial mapping
to qubits are indicated; with this initial mapping, the parity of the spins of the orbitals to be acquainted in each layer of the
complete swap networks is the same (either all ↑↑ or ↓↓, or all ↑↓). For every pair of occupied spin orbitals with some parity
and every pair of virtual spin orbitals with the same parity, there is some layer of the combined swap network in which both
pairs are simultaneously (but separately) acquainted. The construction is completed by replacing each acquaintance layer with
a bipartite swap network over the occupied and virtual orbitals, which then acquaints the union of every such pair of pairs of
spin orbitals. (An example bipartite swap network is shown in Figure 4.)
A more general version of the unitary coupled cluster
ansatz is obtained by allowing excitations between any
pair of orbitals. Rather than the cluster operators given
in Equation 10, we use
T_1 = \sum_{p,q} t_p^q \, a_q^\dagger a_p,
T_2 = \sum_{p,q,r,s} t_{p,q}^{r,s} \, a_r^\dagger a_s^\dagger a_q a_p,    (11)
where the indices p, q, r, and s are allowed to range over
the entire set of orbitals (except that we often disallow
excitations that do not preserve spin). It has been shown
that the inclusion of these “generalized” singles and doubles
greatly increases the ability of unitary coupled cluster
to target the kind of strongly correlated states that
pose the greatest challenge for quantum chemical calcu-
lations on a classical computer [36, 43]. A Trotter step
for unitary coupled cluster with generalized singles and
doubles may be implemented by a straightforward appli-
cation of the techniques for implementing 4-local gates
described in Figure 5. That construction also yields the
optimal scaling here, enabling the execution of all Θ(n^4)
gate operations corresponding to the terms in Equation 11
using a circuit of depth Θ(n^3). One possibility for
exploiting spin symmetry is as follows. Start with an ini-
tial mapping in which the orbitals of one spin are mapped
to the first half of the physical qubits and those of the
other spin to the second half. Then apply the quartic
swap network to each half of the qubits in parallel, thus
acquainting all sets of four orbitals with the same spin.
Then apply a double bipartite swap network, of the sort
used for UCCSD, to acquaint every set of four orbitals
such that there are two orbitals of each spin.
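The optimality of the Θ(n^3) scaling follows from a simple counting argument: a single layer on n qubits can contain at most ⌊n/4⌋ disjoint 4-qubit gates, so executing all of the roughly n^4/24 gates needs Ω(n^3) layers. A minimal sketch of that count, with a hypothetical function name and assuming k ≤ n:

```python
from math import comb

def counting_depth_lower_bound(n, k):
    """Fewest layers needed if each layer holds at most floor(n/k) disjoint k-qubit gates."""
    gates = comb(n, k)        # one gate per k-subset of the n qubits
    per_layer = n // k        # disjoint k-qubit gates that fit in one layer
    return -(-gates // per_layer)  # ceiling division

# Example: 16 qubits, all 4-qubit subsets.
print(counting_depth_lower_bound(16, 4))  # 455
```

This bound ignores connectivity and swaps entirely, so it only shows that the scaling in n cannot be improved, not that the constants are tight.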
As a final example of the utility of a swap network ap-
proach to circuit compilation, we describe the implemen-
tation of a sparse version of the unitary coupled cluster
operator with generalized singles and doubles recently
developed by Lee et al. [36]. Rather than the full set of
double excitations as in Equation 11, this variant of uni-
tary coupled cluster uses only those double excitations
that transfer two electrons with opposite spins from one
spatial orbital to another. The resulting cluster opera-
tors,
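T_1 = \sum_{p,q,\alpha} t_p^q \, a_{q\alpha}^\dagger a_{p\alpha},
T_2 = \sum_{p,q} t_{p,p}^{q,q} \, a_{q\uparrow}^\dagger a_{q\downarrow}^\dagger a_{p\downarrow} a_{p\uparrow}    (12)
contain only Θ(n^2) terms and can be implemented in
Θ(n) depth using the approach detailed below.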
Recall our prior observation that, throughout the exe-
cution of a complete 1-swap network, every pair of logi-
cal qubits that is initially at distance 2 from each other
will remain so. Furthermore, every such pair of log-
ical qubits will become adjacent to every other pair.
Therefore we begin by ordering the fermionic modes
(1 ↑, 2 ↑, 1 ↓, 2 ↓, 3 ↑, 4 ↑, 3 ↓, 4 ↓, . . .). Then, by execut-
ing a 2-complete swap network, we bring the fermionic
modes involved in each of the 2-local and 4-local terms
in Equation 12 adjacent to each other at some point. We
show an example for n = 8 in Figure 7 below.
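The claim for this ordering can be verified by simulation on small cases. The sketch below is an illustration, not the paper's implementation: it assumes the alternating linear swap pattern starting on positions (0, 1), (2, 3), ..., an even number m of spatial orbitals, and hypothetical function names. It records every pair of spatial orbitals whose four spin orbitals occupy four contiguous qubits at some point.

```python
from itertools import combinations

def upccgsd_initial_order(m):
    """Ordering (1 up, 2 up, 1 down, 2 down, 3 up, 4 up, ...) for an even number m of spatial orbitals."""
    order = []
    for p in range(1, m, 2):
        order += [(p, "up"), (p + 1, "up"), (p, "down"), (p + 1, "down")]
    return order

def spatial_pairs_acquainted(m):
    """Pairs of spatial orbitals whose four spin orbitals become contiguous during the network."""
    order = upccgsd_initial_order(m)
    n = len(order)
    seen = set()
    for t in range(n + 1):
        # Inspect every window of four adjacent qubits in the current ordering.
        for i in range(n - 3):
            orbitals = {p for p, _ in order[i:i + 4]}
            if len(orbitals) == 2:
                seen.add(frozenset(orbitals))
        if t < n:
            for i in range(t % 2, n - 1, 2):
                order[i], order[i + 1] = order[i + 1], order[i]
    return seen

def all_spatial_pairs(m):
    return {frozenset(c) for c in combinations(range(1, m + 1), 2)}
```

For four spatial orbitals (n = 8 qubits), `spatial_pairs_acquainted(4) == all_spatial_pairs(4)` holds, so all six pairs of spatial orbitals get a 4-local acquaintance opportunity.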
VII. CONCLUSION
We have introduced and instantiated an instance-
independent approach to quantum circuit routing. This
instance-independent approach has a distinct advantage
among the growing number of alternatives for addressing
the limited connectivity of physical devices: it requires ef-
fectively no marginal classical computation per instance.
Of course, there is the corresponding disadvantage that
it cannot in general achieve instance-specific optimality.
However, for many applications, including the fermionic
simulation tasks we addressed, all instances of a given size
share a topology. For applications in which this is not the
case, instance-independent swap networks can neverthe-
less provide a starting point for further optimizations.
(Regardless, simple local optimizations, such as remov-
ing any two swap gates in a row on the same pair of
qubits, should be used to tighten up the swap networks
presented here in any practical implementation.)
Another limitation of our approach is the complete sep-
aration of the decomposition aspect of compilation from
the routing aspect; perhaps a better compilation can be
found by solving these together at once. Nevertheless,
given the hardness of the compilation problem in its full
generality, we expect that this separation will in general
be useful in balancing quality of the solution found with
the time to find it.
We have made a connection between the quantum cir-
cuit routing problem and the minor embedding problem
in quantum annealing. This analogy should not be taken
too mathematically, especially when considering the or-
dered variant of the routing problem, but may still be
of value in encouraging the lifting of ideas from the sig-
nificant body of theoretical and applied work on minor
embedding. For example, the separation of gate decom-
position and circuit routing can be thought of as corre-
sponding to the separation of the parameter-setting and
minor-embedding aspects of compilation in quantum an-
nealing.
While our motivation and focus is NISQ-era devices,
our results may continue to be applicable even with full
error correction. In the surface code, for example, the
dominant cost with respect to both time and qubits is
the implementation of T gates; in comparison, the cost
of swap gates is negligible, and thus so is the overhead
in overcoming limited connectivity. However, even error-
corrected devices benefit from parallelization. Our con-
structions for swap networks imply a scheme for paral-
lelization, which may be of use independent of any map-
ping to physical qubits. For problems arising from the
Jordan-Wigner transformation of fermionic Hamiltoni-
ans, the swap networks are just as useful even with “free”
(fermionic) swap gates. In that case, the locality to be
addressed is not spatial locality but the number of qubits
that each gate acts on, which must be bounded even in
the error-corrected regime. The same applies to proposed
ion trap implementations with effectively all-to-all con-
nectivity.
There are several directions for future work. Of most
practical interest is lowering the abstraction level, that
is, using the high-level constructions presented here to
compile specific families of circuits to low-level hardware
with restricted gate sets and variable durations. This is
a necessary step in a more general program of directly
comparing swap network-based methods to alternative
approaches, with respect to quantum resources, basis-set
errors, Trotter errors, etc. Furthermore, for some
algorithms, there is freedom in the choice of operator at
certain stages in the algorithm. For example, for the
alternative mixers in the Quantum Alternating Operator
Ansatz [40], while reordering the gates gives different
mixers, it is an open research question as to which mixers
and which orders provide the best performance. Given
the limited depth of NISQ devices, the efficiency with
which qubit routing can be achieved for a given operator
significantly impacts the choice of operator. The
techniques described here provide a key step toward
exploring these trade-offs.
[Figure 7: initial assignment of spin orbitals to qubits: 1↑, 2↑, 1↓, 2↓, 3↑, 4↑, 3↓, 4↓]
FIG. 7. The swap network for a UpCCGSD operator (Equation 12) with four spatial orbitals. The initial assignment of spin
orbitals to qubits is indicated; the important feature is that the two spin orbitals for each spatial orbital are assigned to qubits
at distance 2 from each other. They then stay at distance 2 from each other throughout the evolution of the swap network
(except temporarily at the edges). The swaps are exactly the same as in the standard 1-swap network, except that a layer of
4-local acquaintance opportunities is inserted before every other swap layer, allowing the four spin orbitals corresponding to a
pair of spatial orbitals to be acquainted.
There is also further work to be done in the present ab-
straction level. Specifically, our construction for 4-local
gate sets is likely suboptimal with respect to constant
factors, and may be improved. The same goes for k > 4.
We also focus only on the routing problem for unordered
sets of gates, in which there is no precedence structure to
be enforced on the logical gates; examples of solutions to
the ordered problem would significantly broaden the use-
fulness of this approach. One limited example would be
the iterated circuits of a complete variational algorithm
or Trotter-based simulation, whereas in the present work
we focused on a single iteration.
More generally, with this work we have established a
foundation for designing swap networks for more appli-
cations and more architectures. A more comprehensive
understanding of how well different architectures support
the topologies of different applications can be the foun-
dation for co-design in both directions: in one direction
motivating new architectures by how well they are suited
generally or specifically to applications, and in the other
direction tweaking problems in a way that doesn’t de-
grade the value of their solution but that makes them
more efficiently solvable on a quantum computer.
ACKNOWLEDGMENTS
This work was supported by the U.S. Department of
Energy, Office of Science, Office of Advanced Scientific
Computing Research, Quantum Algorithm Teams Pro-
gram, under contract number DE-AC02-05CH11231. We
are also grateful for support from the NASA Ames Research
Center, the NASA Advanced Exploration Systems
(AES) program, the NASA Transformative Aeronautic
Concepts Program (TACP), and from the AFRL In-
formation Directorate under grant F4HBKC4162G001.
B.O. was supported by a NASA Space Technology Re-
search Fellowship.
Appendix A: Instance-independent embedding for
quantum annealing
Quantum annealing is an alternative model of quantum
computation for minimizing a classical pseudo-Boolean
function f : {±1}^n → R, in which the Hamiltonian is
slowly changed from an initial Hamiltonian H_init into
the problem Hamiltonian H_f, whose ground state(s) we
would like to find. Often, the desired Hamiltonian H_f
cannot be implemented directly on a physical quantum
annealer due to limited connectivity. To overcome this
limitation, each logical qubit in H_f can be mapped to
a connected set of physical qubits which are coupled together
with a ferromagnetic field that induces them to
take on the same value. In the standard case in which
H_f is 2-local (i.e., f is quadratic), it can be considered as
a graph, and this mapping from logical to physical qubits
as a minor embedding into the hardware graph. For ex-
ample, Choi [44] gave a family of minor embeddings of
the complete graph into a so-called Triad hardware graph
(similar to the Chimera hardware graph used by D-Wave)
in which the number of physical qubits scales quadrat-
ically with the number of logical qubits, which is opti-
mal for bounded-degree hardware graphs. Zaribafiyan et
al. [45] provide a deterministic embedding for Cartesian
product graphs.
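The conditions that make such a mapping a valid minor embedding can be stated as a short check. The following is a minimal sketch, assuming graphs are given as edge lists and the embedding as a dictionary from logical qubits to sets of physical qubits; the function name and data layout are illustrative and do not correspond to any particular embedding library.

```python
def is_minor_embedding(logical_edges, hardware_edges, chains):
    """Check that chains (logical qubit -> set of physical qubits) form a minor embedding."""
    hw = {frozenset(e) for e in hardware_edges}

    def connected(nodes):
        # Depth-first search restricted to the physical qubits of one chain.
        nodes = set(nodes)
        if not nodes:
            return False
        stack, seen = [next(iter(nodes))], set()
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            stack.extend(w for w in nodes if frozenset((v, w)) in hw)
        return seen == nodes

    used = [q for chain in chains.values() for q in chain]
    if len(used) != len(set(used)):
        return False  # chains must be pairwise disjoint
    if not all(connected(chain) for chain in chains.values()):
        return False  # each chain must induce a connected subgraph
    # Every logical edge needs at least one physical edge between the two chains.
    return all(
        any(frozenset((a, b)) in hw for a in chains[u] for b in chains[v])
        for u, v in logical_edges
    )

# Hypothetical example: a triangle embedded into a 4-cycle using one chain of length 2.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(is_minor_embedding(triangle, cycle, {"a": {0}, "b": {1}, "c": {2, 3}}))  # True
```

Without the two-qubit chain for "c", the triangle cannot be placed on the 4-cycle, which is the small-scale version of why chains are needed on hardware with limited connectivity.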
In practice, problem graphs of interest are usually
much sparser than the complete graph, or the Cartesian
product graphs, and so using an embedding for the com-
plete graph is likely to use more physical qubits than
necessary. Specifically, most problems run on the D-
Wave quantum annealer make use of D-Wave’s heuris-
tic embedding software [46]. Many practitioners thus
use instance-specific embeddings to maximize the use
of scarce resources. The problem, however, is the dif-
ficulty of finding such instance-specific embeddings. An
approach similar to the one we used for quantum circuits
can be taken. Instead of using an embedding of either
the complete graph (which is trivial to find but resource-
inefficient) or a single problem graph (which is harder to
find but more resource-efficient), one can use an embed-
ding of a “supergraph” of a class of problem graphs. Such
an embedding can be found either manually or algorith-
mically, but in any case can be reused for any instance
in the class with negligible marginal cost. This approach
thus strikes a potentially valuable balance between the
two existing ones.
Appendix B: Lower bounds
The optimality of the complete swap network is easy
to show: $\binom{n}{2}$ logical gates are executed in n almost
perfectly parallelized layers. In a reasonable accounting in
which any 2-qubit gate on adjacent qubits can be done in
unit time, the logical gates and swaps can be combined
into one. However, for more complicated cases the reasoning
becomes more involved. This section gives some
methods for lower bounding the depth of solutions to the
(unordered) circuit embedding problem. In particular,
the lower bounds are on the depth of the 2-qubit swaps
only, i.e., the “swap depth”. For a bounded-degree phys-
ical graph and bounded-locality logical graph, the logical
gates that can be executed with a single, fixed mapping
of logical to physical qubits, i.e., after a given swap layer,
can be executed in O(1) depth. In such cases, which comprise
almost all of practical interest, exact lower bounds
on the swap depth thus yield scaling lower bounds on the
total depth.
1. Acquaintance time
Benjamini et al. [47] defined the acquaintance time of
a graph G, denoted AC(G), as follows. Consider placing
an agent at each vertex of the graph and a series of
matchings [48] of the graph. Each matching corresponds
to simultaneously swapping the agents on the vertices of
each edge. Such a sequence of matchings of G is a
strategy for acquaintance in G if every pair of agents is
adjacent in the graph G at least once. The acquaintance
time is the number of rounds (matchings) in the shortest
strategy for acquaintance (and is finite if and only if the
graph is connected).
This notion of strategies for acquaintance is a use-
ful if limited abstraction for compiling quantum circuits
around geometric constraints. As is, a strategy for ac-
quaintance corresponds to a compilation of all 2-local
gates in a hardware graph G, with agents correspond-
ing to logical qubits, vertices corresponding to physical
qubits, and edges of matchings to swap gates. A gate
between two logical qubits can be implemented at any
point at which they become “acquainted”. This
level of abstraction has the advantage and disadvantage
that it disregards the exact nature of the gates. This
makes it extremely general, but also makes constructions
within it somewhat approximate. For example, in a strategy for
acquaintance, it is permissible for an agent to become
acquainted with more than one other agent in a single
round, while the corresponding 2-local gates would need
to be implemented sequentially.
Nevertheless, known results about acquaintance
times [47, 49] can be interpreted in the context of quantum
circuit embedding. For example, that the acquaintance
time of the path graph P_n is n − 2 provides an
alternative proof of the optimality of the complete linear
1-swap network. Interestingly, the acquaintance time of
the barbell graph B_n (two fully connected halves connected
by a single edge) is also n − 2. Generally, it
is known that for a graph G of maximum degree ∆,
AC(G) = min{O(n^2/∆), 20∆n}, which in particular implies
that for any graph AC(G) = O(n^{3/2}). There are
also hardness results: AC(G) is NP-hard to approximate
within a multiplicative factor of 2 or within any additive
constant factor.
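For very small paths, the acquaintance time can be computed directly from the definition by exhaustive search. The sketch below is illustrative only (the function names are hypothetical, and the search is exponential in n): it explores all sequences of matchings of P_n breadth-first, counting agents as acquainted whenever they are adjacent, including in the initial placement.

```python
def path_matchings(n):
    """All matchings of the path graph on n vertices, as tuples of edges (i, i+1)."""
    result = []

    def extend(start, chosen):
        result.append(tuple(chosen))
        for i in range(start, n - 1):
            extend(i + 2, chosen + [(i, i + 1)])  # skip i+1 to keep edges disjoint

    extend(0, [])
    return result

def adjacent_pairs(perm):
    """Unordered pairs of agents sitting on neighbouring vertices of the path."""
    return frozenset(frozenset((perm[i], perm[i + 1])) for i in range(len(perm) - 1))

def acquaintance_time_path(n):
    """Smallest number of rounds after which all pairs of agents have been adjacent."""
    total = n * (n - 1) // 2
    start = tuple(range(n))
    frontier = {(start, adjacent_pairs(start))}
    matchings = path_matchings(n)
    rounds = 0
    while not any(len(known) == total for _, known in frontier):
        next_frontier = set()
        for perm, known in frontier:
            for matching in matchings:
                new = list(perm)
                for i, j in matching:
                    new[i], new[j] = new[j], new[i]
                new = tuple(new)
                next_frontier.add((new, known | adjacent_pairs(new)))
        frontier = next_frontier
        rounds += 1
    return rounds
```

For n = 3 and n = 4 this returns 1 and 2 respectively, in agreement with the value n − 2 quoted above.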
A strategy for acquaintance as defined above requires
that every pair of agents become acquainted. However,
it will often be the case that we care only about certain
pairs of agents, or larger-sized sets of agents. We now
define a generalization of acquaintance time that may be
of value in finding lower bounds in such cases. Let H be
the hypergraph whose vertices correspond to the agents
and whose hyperedges correspond to the sets of agents
that we would like to acquaint. We can then define a
strategy for H-acquaintance in G as an initial (injective)
mapping σ of the vertices of H to the vertices of G and a
sequence of matchings as above such that, for every edge
{i_1, ..., i_k} of H, if agent i_1 is placed on vertex σ(i_1) in
G, i_2 on vertex σ(i_2), and so on, then the set of agents
{i_1, ..., i_k} can be acquainted at some point. Whether
a set of agents can be acquainted given their locations
on the vertices of G can be specified in one of two ways.
In the first case, G itself is a hypergraph and the agents
can be acquainted if their positions {σ_t(i_1), ..., σ_t(i_k)}
are a hyperedge of G, where σ_t(i) is the location of agent
i after t rounds. In the alternative, G is a simple graph,
and the agents can be acquainted if their positions form
a connected subgraph of G. The latter is closer to our
application of strategies for acquaintance: the physical
graph G specifies on which pairs of qubits a 2-qubit gate
can be applied, and higher-locality gates are decomposed
using such 2-qubit gates. The H-acquaintance time of G,
denoted AC_H(G), is then the minimal size of a strategy
for H-acquaintance in G. Note that this definition does
not assume that |V (H)| = |V (G)|.
2. Circuit embeddings as minor embeddings
This section assumes that the reader is familiar with
the basic ideas of graph minor embeddings and treewidth;
see Klymko et al. [50] for a brief introduction to these
ideas in a related context. All graphs in this section will
be assumed to have edges of size 2.
Consider a strategy for G-acquaintance in Γ with d
rounds. Let H = Γ ⊠ P_{d+1} be the strong product of Γ
and the path graph on d + 1 vertices. That is,
V(H) = \{(v, t) \mid v \in V(\Gamma),\ t \in \{0, \ldots, d\}\},    (B1)
E(H) = \{\{(v, t), (v', t')\} \mid v = v' \lor \{v, v'\} \in E(\Gamma),\ |t - t'| \le 1\}.    (B2)
The strategy for G-acquaintance in Γ can be interpreted
as a graph minor embedding of G into H as follows. Figure 8
shows an example for G = K_4 and Γ = P_4. The
“agents” are the vertices of G. The vertex model of v ∈ G
is the set of vertices {(σ_t(v), t) | t ∈ {0, ..., d}} ⊂ V(H)
corresponding to the series of assignments of v to vertices
of Γ. Note that this vertex model is connected (indeed, a
simple path) and that the vertex models of distinct vertices
are disjoint, by the properties of an acquaintance
strategy. The edge model of an edge {v, w} ∈ E(G) is
{(σ_t(v), t), (σ_t(w), t)} ∈ E(H) for some round t in which
the vertices v and w are assigned to adjacent vertices of
Γ. For any graphs A and B, if A is a minor of B, then
pw(A) ≤ pw(B) and tw(A) ≤ tw(B), because any path
or tree decomposition for B can be converted into one
for A by edge-contracting the vertex models, without
increasing the relevant width. In our case, we have shown
that G is a minor of H = Γ ⊠ P_{d+1} whenever there exists
a d-round strategy for G-acquaintance in Γ. Therefore,
pw(G) \le pw\bigl(\Gamma \boxtimes P_{AC_G(\Gamma)+1}\bigr),    (B3)
and similarly for treewidth.
FIG. 8. A 2-round strategy for K_4-acquaintance in P_4 as a
minor embedding of K_4 into P_4 ⊠ P_3. Each set of solid points
and lines of a given color indicates a vertex model. Each
bicolored dashed line indicates an edge model. Solid gray lines
indicate unused edges of P_4 ⊠ P_3.
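The correspondence between acquaintance strategies and minor embeddings can be made concrete for the K_4 in P_4 example. The sketch below is an illustration under assumptions: the data layout (a list of per-round placements) and the function names are hypothetical, and the 2-round strategy used is one valid choice, not necessarily the one drawn in Figure 8. It builds the strong product according to (B1) and (B2) and checks the vertex-model and edge-model conditions.

```python
def strong_product_with_path(gamma_vertices, gamma_edges, d):
    """Vertices and edges of the strong product of Gamma with the path on d+1 vertices."""
    g_edges = {frozenset(e) for e in gamma_edges}
    vertices = [(v, t) for v in gamma_vertices for t in range(d + 1)]
    edges = set()
    for a in vertices:
        for b in vertices:
            if a == b:
                continue
            (v, t), (w, s) = a, b
            if abs(t - s) <= 1 and (v == w or frozenset((v, w)) in g_edges):
                edges.add(frozenset((a, b)))
    return vertices, edges

def strategy_is_minor_embedding(g_edges, gamma_vertices, gamma_edges, placements):
    """placements[t][agent] is the vertex of Gamma holding that agent after round t."""
    d = len(placements) - 1
    _, h_edges = strong_product_with_path(gamma_vertices, gamma_edges, d)
    # Vertex models: consecutive positions of an agent must be adjacent in H.
    for agent in placements[0]:
        for t in range(d):
            step = frozenset(((placements[t][agent], t), (placements[t + 1][agent], t + 1)))
            if step not in h_edges:
                return False
    # Disjointness: no two agents share a vertex of Gamma in the same round.
    if any(len(set(p.values())) != len(p) for p in placements):
        return False
    # Edge models: each edge of G needs a round in which its endpoints are adjacent.
    return all(
        any(frozenset(((p[u], t), (p[v], t))) in h_edges for t, p in enumerate(placements))
        for u, v in g_edges
    )

# One possible 2-round strategy for K4-acquaintance in P4 (agents a, b, c, d).
p4_vertices, p4_edges = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)]
k4_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
strategy = [
    {"a": 0, "b": 1, "c": 2, "d": 3},  # initial placement
    {"a": 0, "b": 2, "c": 1, "d": 3},  # after swapping positions 1 and 2
    {"a": 1, "b": 3, "c": 0, "d": 2},  # after swapping positions (0, 1) and (2, 3)
]
print(strategy_is_minor_embedding(k4_edges, p4_vertices, p4_edges, strategy))  # True
```

The product graph for this example has 12 vertices and 29 edges, so the embedding leaves most edges unused.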
We show now that, for an arbitrary graph G on n vertices,
the pathwidth pw(G) is at most about one more
than the G-acquaintance time in the path graph P_n,
pw(G) \le 2 \left\lceil \frac{AC_G(P_n)}{2} \right\rceil + 1.    (B4)
We do so by explicitly constructing a path decomposition
of a graph from a strategy for G-acquaintance in
P_n. Consider such a strategy and let σ_t(v) ∈ P_n be the
assignment of vertex v ∈ G after round t. We can construct
a path decomposition with n − 1 bags as follows.
Each bag corresponds to an edge e of P_n and contains all
the vertices of G that are assigned to a vertex of P_n
incident to e. The bags form the path graph P_{n−1}
corresponding to the line graph of P_n. Each bag can contain
at most 2⌈d/2⌉ + 2 vertices, where d is the number of
rounds in the strategy for G-acquaintance. Lastly, the
number of rounds in the strategy is at least the minimum
number of rounds AC_G(P_n) and the pathwidth of
the graph is at most the width of this decomposition,
yielding the desired inequality.
One application of this inequality is yet another lower
bound on the swap depth of a complete swap network.
15
Equation B4 and the fact that pw(Kn) = n − 1 imply that
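\begin{align}
\mathrm{pw}(K_n) = n - 1 &\le 2\left\lceil \frac{\mathrm{AC}(P_n)}{2} \right\rceil + 1 \tag{B5} \\
\Rightarrow \quad \frac{n}{2} - 1 &\le \left\lceil \frac{\mathrm{AC}(P_n)}{2} \right\rceil \tag{B6} \\
\Rightarrow \quad \mathrm{AC}(P_n) &\ge \begin{cases} n - 2, & n \text{ odd}, \\ n - 3, & n \text{ even}. \end{cases} \tag{B7}
\end{align}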
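The step from (B6) to (B7) relies on two elementary properties of the ceiling function. Spelled out, with c as a shorthand introduced here (not in the original) for the ceiling term:

```latex
\begin{align*}
c := \left\lceil \tfrac{\mathrm{AC}(P_n)}{2} \right\rceil \in \mathbb{Z}
  \ \text{and}\ c \ge \tfrac{n}{2} - 1
  &\implies c \ge \left\lceil \tfrac{n}{2} \right\rceil - 1, \\
\mathrm{AC}(P_n) \ge 2c - 1
  &\implies \mathrm{AC}(P_n) \ge 2\left\lceil \tfrac{n}{2} \right\rceil - 3
  = \begin{cases} n - 2, & n \text{ odd}, \\ n - 3, & n \text{ even}. \end{cases}
\end{align*}
```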
Note that Equation B4 is not necessarily tight for arbi-
trary graphs. For example, consider the star graph Sk for
large k. It has pathwidth 1 [51], but the minimum swap
circuit depth is Ω(k). More generally, caterpillar graphs
exemplify the looseness of the above bound for the same
reason; the minimum depth of a swap circuit for any
graph grows at least linearly with the maximum degree of the graph.
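The gap for star graphs can be made concrete with a rough calculation. This sketch assumes a simple counting argument that is not taken from the text: on a path, a vertex has at most two neighbours in each of the d + 1 layouts of a d-round strategy, so it can be acquainted with at most 2(d + 1) others. Both function names are hypothetical.

```python
from math import ceil


def degree_lower_bound(max_degree):
    """Fewest rounds d with 2 * (d + 1) >= max_degree (assumed counting argument)."""
    return max(0, ceil(max_degree / 2) - 1)


def pathwidth_lower_bound(pathwidth):
    """Fewest rounds d consistent with pw <= 2 * ceil(d / 2) + 1 (Equation B4)."""
    d = 0
    while 2 * ceil(d / 2) + 1 < pathwidth:
        d += 1
    return d


# Star graph with k = 100 leaves: pathwidth 1, central vertex of degree 100.
star_from_pathwidth = pathwidth_lower_bound(1)
star_from_degree = degree_lower_bound(100)
```

For the star with 100 leaves, Equation B4 yields only the trivial bound of 0 rounds, whereas the degree count yields 49. For K5 (pathwidth 4), `pathwidth_lower_bound` returns 3, in agreement with Equation B7.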
[1] S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush,
N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and
H. Neven, Nature Physics, 1 (2018).
[2] P. Shor, SIAM Review 41, 303 (1999),
https://doi.org/10.1137/S0036144598347011.
[3] L. K. Grover, in Proceedings of the Twenty-eighth Annual
ACM Symposium on Theory of Computing, STOC ’96
(ACM, New York, NY, USA, 1996) pp. 212–219.
[4] A. G. Fowler, A. M. Stephens, and P. Groszkowski, Phys.
Rev. A 80, 052312 (2009).
[5] J. Preskill, arXiv preprint arXiv:1801.00862 (2018).
[6] S. Brierley, Quantum Information & Computation 17,
1096 (2017).
[7] A. M. Childs, E. Schoute, and C. M. Unsal, arXiv
preprint arXiv:1902.09102 (2019).
[8] Y. Hirata, M. Nakanishi, S. Yamashita, and
Y. Nakashima, in 2009 Third International Conference
on Quantum, Nano and Micro Technologies (2009) pp.
26–33.
[9] D. Maslov, S. M. Falconer, and M. Mosca, in Proceed-
ings of the 44th annual Design Automation Conference
(ACM, 2007) pp. 962–965.
[10] D. Bhattacharjee and A. Chattopadhyay, “Depth-
optimal quantum circuit placement for arbitrary topolo-
gies,” (2017), arXiv:1703.08540 [cs.ET].
[11] M. Y. Siraichi, V. F. d. Santos, S. Collange, and F. M. Q.
Pereira, in Proceedings of the 2018 International Sympo-
sium on Code Generation and Optimization, CGO 2018
(ACM, New York, NY, USA, 2018) pp. 113–125.
[12] G. Li, Y. Ding, and Y. Xie, “Tackling the qubit mapping
problem for NISQ-era quantum devices,” (2018),
arXiv:1809.02573 [cs.ET].
[13] A. Lye, R. Wille, and R. Drechsler, in The 20th Asia
and South Pacific Design Automation Conference (2015)
pp. 178–183.
[14] D. Venturelli, M. Do, E. Rieffel, and J. Frank, Quantum
Science and Technology 3, 025004 (2018).
[15] K. E. Booth, M. Do, J. C. Beck, E. Rieffel, D. Venturelli,
and J. Frank, arXiv preprint arXiv:1803.06775 (2018).
[16] M. Saeedi, R. Wille, and R. Drechsler, Quantum Infor-
mation Processing 10, 355 (2011).
[17] R. Wille, O. Keszocze, M. Walter, P. Rohrs, A. Chat-
topadhyay, and R. Drechsler, in 2016 21st Asia and
South Pacific Design Automation Conference (ASP-
DAC) (2016) pp. 292–297.
[18] C. Lin, S. Sur-Kolay, and N. K. Jha, IEEE Transactions
on Very Large Scale Integration (VLSI) Systems 23, 1221
(2015).
[19] A. Zulehner, A. Paler, and R. Wille, in 2018 Design, Au-
tomation Test in Europe Conference Exhibition (DATE)
(2018) pp. 1135–1138.
[20] S. Herbert and A. Sengupta, arXiv preprint
arXiv:1812.11619 (2018).
[21] E. Farhi, J. Goldstone, S. Gutmann, and
H. Neven, arXiv preprint arXiv:1703.06199 (2017).
[22] A. Kandala, A. Mezzacapo, K. Temme, M. Takita,
M. Brink, J. M. Chow, and J. M. Gambetta, Nature
549, 242 (2017).
[23] I. D. Kivlichan, J. McClean, N. Wiebe, C. Gidney,
A. Aspuru-Guzik, G. K.-L. Chan, and R. Babbush,
Physical Review Letters 120, 110501 (2018).
[24] Z. Jiang, K. J. Sung, K. Kechedzhi, V. N. Smelyanskiy,
and S. Boixo, Physical Review Applied 9, 044036 (2018).
[25] P. Corboz and G. Vidal, Physical Review B 80, 165129
(2009).
[26] M. Motta, E. Ye, J. R. McClean, Z. Li, A. J. Min-
nich, R. Babbush, and G. K. Chan, arXiv preprint
arXiv:1808.02625 (2018).
[27] G. E. Crooks, arXiv preprint arXiv:1811.08419 (2018).
[28] R. Beals, S. Brierley, O. Gray, A. W. Harrow, S. Kutin,
N. Linden, D. Shepherd, and M. Stather, Proc. R. Soc.
A 469, 20120686 (2013).
[29] A. Aspuru-Guzik, A. D. Dutoi, P. J. Love, and M. Head-
Gordon, Science 309, 1704 (2005).
[30] G. Ortiz, J. Gubernatis, E. Knill, and R. Laflamme,
Physical Review A 64, 022319 (2001).
[31] S. B. Bravyi and A. Y. Kitaev, Annals of Physics 298,
210 (2002).
[32] S. Bravyi, J. M. Gambetta, A. Mezzacapo, and
K. Temme, arXiv preprint arXiv:1701.08213 (2017).
[33] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q.
Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien,
Nature Communications 5, 4213 (2014).
[34] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-
Guzik, New Journal of Physics 18, 023023 (2016).
[35] J. Romero, R. Babbush, J. R. McClean, C. Hempel, P. J.
Love, and A. Aspuru-Guzik, Quantum Science and Tech-
nology 4, 014008 (2018).
[36] J. Lee, W. J. Huggins, M. Head-Gordon, and K. B. Whaley,
Journal of Chemical Theory and Computation (2018).
[37] M. B. Hastings, D. Wecker, B. Bauer, and M. Troyer,
Quantum Information & Computation 15, 1 (2015).
[38] E. Farhi, J. Goldstone, and S. Gutmann, arXiv preprint
arXiv:1411.4028 (2014).
[39] E. Farhi and A. W. Harrow, arXiv preprint
arXiv:1602.07674 (2016).
[40] S. Hadfield, Z. Wang, B. O’Gorman, E. G. Rieffel,
D. Venturelli, and R. Biswas, Algorithms 12, 1 (2019).
[41] A. M. Childs, A. Ostrander, and Y. Su, arXiv preprint
arXiv:1805.08385 (2018).
[42] M. Nooijen, Physical Review Letters 84, 2108 (2000).
[43] D. Wecker, M. B. Hastings, and M. Troyer, Phys. Rev.
A 92, 042303 (2015).
[44] V. Choi, Quantum Information Processing 10, 343
(2011).
[45] A. Zaribafiyan, D. J. Marchand, and S. S. C. Rezaei,
Quantum Information Processing 16, 136 (2017).
[46] J. Cai, W. G. Macready, and A. Roy, arXiv preprint
arXiv:1406.2741 (2014).
[47] I. Benjamini, I. Shinkar, and G. Tsur, SIAM Journal on
Discrete Mathematics 28, 767 (2014).
[48] A matching is a set of mutually disjoint edges of a graph.
[49] O. Angel and I. Shinkar, Graphs and Combinatorics 32,
1667 (2016).
[50] C. Klymko, B. D. Sullivan, and T. S. Humble, Quantum
Information Processing 13, 709 (2014).
[51] Consider the decomposition in which there is a bag for
each leaf containing that leaf and the internal vertex.
