Circuit Transformations for Quantum Architectures by Childs, Andrew M. et al.
Circuit Transformations for Quantum Architectures
Andrew M. Childs
Joint Center for Quantum Information and Computer Science, University of Maryland, USA
Institute for Advanced Computer Studies, University of Maryland, USA
Department for Computer Science, University of Maryland, USA
amchilds@umd.edu
Eddie Schoute
Joint Center for Quantum Information and Computer Science, University of Maryland, USA
Institute for Advanced Computer Studies, University of Maryland, USA
Department for Computer Science, University of Maryland, USA
eschoute@cs.umd.edu
Cem M. Unsal
Department of Mathematics, University of Maryland, USA
Abstract
Quantum computer architectures impose restrictions on qubit interactions. We propose efficient
circuit transformations that modify a given quantum circuit to fit an architecture, allowing for
any initial and final mapping of circuit qubits to architecture qubits. To achieve this, we first
consider the qubit movement subproblem and use the Routing via Matchings framework to prove
tighter bounds on parallel routing. In practice, we only need to perform partial permutations, so we
generalize Routing via Matchings to that setting. We give new routing procedures for common
architecture graphs and for the generalized hierarchical product of graphs, which produces subgraphs
of the Cartesian product. Secondly, for serial routing, we consider the Token Swapping framework
and extend a 4-approximation algorithm for general graphs to support partial permutations. We
apply these routing procedures to give several circuit transformations, using various heuristic
qubit placement subroutines. We implement these transformations in software and compare their
performance for large quantum circuits on grid and modular architectures, identifying strategies
that work well in practice.
2012 ACM Subject Classification Computer systems organization → Quantum computing;
Hardware → Quantum computation; Mathematics of computing → Graph theory; Applied computing →
Physics; General and reference → General conference proceedings; Networks
Keywords and phrases quantum circuit, quantum architectures, circuit mapping
Digital Object Identifier 10.4230/LIPIcs.TQC.2019.3
Related Version Full version at arXiv:1902.09102 [quant-ph].
Supplement Material Source code and result data available at
https://gitlab.umiacs.umd.edu/amchilds/arct.
Funding This work was supported in part by the Army Research Office (MURI award number
W911NF-16-1-0349), the Canadian Institute for Advanced Research, the National Science Foundation
(grant number 1813814), and the U.S. Department of Energy, Office of Science, Office of Advanced
Scientific Computing Research, Quantum Algorithms Teams and Quantum Testbed Pathfinder
(award number DE-SC0019040) programs.
Acknowledgements The authors would like to thank Aniruddha Bapat for insights on the hierarchical
product of graphs and suggestions for tightening the routing lower bound for these graphs. We
would also like to thank Drew Risinger for helpful formative discussions.
© Andrew M. Childs, Eddie Schoute, and Cem M. Unsal;
licensed under Creative Commons License CC-BY
14th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2019).
Editors: Wim van Dam and Laura Mančinska; Article No. 3; pp. 3:1–3:24
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
1 Introduction
Quantum algorithms are typically formulated in a circuit model in which two-qubit gates
can be performed between any pair of qubits. However, most realistic quantum architectures
impose restrictions on qubit interactions. Thus a natural challenge is to find a way of
implementing a given circuit on a given architecture with low overhead. We can do this by
finding a time-efficient architecture-respecting circuit transformation – a mapping to a new
circuit that preserves the function of the original quantum circuit up to an initial mapping
of circuit qubits to architecture qubits and a final mapping of architecture qubits back to
circuit qubits, where the new circuit is constrained to respect the architecture.
There have been many proposals for the design of quantum processors. Examples
include trapped ion systems that enable interactions between any two ions in a trap [39]
and superconducting qubit architectures with more limited interactions [19, 24, 44]. Many
proposed architectures for scalable devices employ modularity, building a large device from
interconnected subunits [39, 40, 12].
There is also a considerable amount of work on implementing circuits under architectural
constraints. Some examples include implementations of Shor’s algorithm [18], the quantum
Fourier transform on 1D nearest-neighbor architectures [35], and quantum adders on nearest-
neighbor architectures [15, 16]. However, the aforementioned works focus on analyzing specific
circuits. Instead, we wish to find automated circuit transformations that can handle complex
circuits and compare their performance when implemented in various architectures. Bounds
on the efficacy of architecture-respecting circuit transformations and good automated tools for
implementing them may be able to inform architecture design decisions [52]. Unfortunately,
it is challenging to achieve good performance with an automated tool. Indeed, finding even
one optimal placement for a set of gates is NP-hard [34].
Prior Work on Automated Architecture-Respecting Circuit Transformations
Several previous works use exhaustive approaches that take time exponential in the number
of qubits (and hence can only be used for small instances). For example, Saeedi, Wille,
and Drechsler [46] use SAT solvers to decompose circuits so they can be run on the path
architecture; [33] finds an optimal circuit transformation on nearest-neighbor architectures
by formulating the problem as a pseudo-boolean optimization; Venturelli et al. [50] use
temporal planners to schedule gates; and [41] uses satisfiability modulo theory solvers to
find mappings of the circuit with high success probability using calibration data. Other
work has instead proposed minimizing the distance between all qubits in groups of gates
on specific architectures [48, 54, 43], but this is also NP-hard in general. These and other
papers add swap gates so that the logical state of a given physical qubit is transferred to
a different physical qubit (henceforth, we simply refer to this as qubit movement, with the
implicit understanding that only the logical state is moved).
As a heuristic solution, we can break the circuit into sets of disjoint gates and move qubits
between each set. Metodi et al. [36] propose polynomial-time heuristic routines that prioritize
gates with many dependents. Hirata et al. [22] propose exhaustive and heuristic searches for
good placements of qubits on the path architecture to construct circuit transformations.
One can also use heuristic qubit placement and movement algorithms on fault-tolerant
2D grid architectures [30] or algorithms that are designed to handle the surface code [28].
We do not consider fault tolerance explicitly and instead work only at the logical level.
An exhaustive search of all permutations of n qubit locations takes time O(n!) but can
work well for small numbers of locations [53], or can be done selectively using A∗ heuristic
search [57, 58] or local search [34, 8]. By choosing a suitable initial placement of qubits, we
can further reduce the qubit movement cost. For example, [29] tries to find a good initial
placement by repeatedly transforming the quantum circuit forwards and then backwards,
taking the output qubit placement as input for the next iteration.
Others have considered a model in which one can perform fast measurements and adapt
later parts of the computation based on the outcomes [20]. This model allows the movement
of qubits with just a constant overhead at the cost of extra ancillas [45]. However, realizing
such a model presents significant technical challenges and we do not consider it here.
Various bounds are also known for the cost of moving qubits. Sorting networks provide a
way to upper bound the depth of the qubit movement circuit [27, 7, 13, 21]. We refer to [3]
for a more complete overview of sorting networks.
Contribution
In this paper, we construct architecture-respecting circuit transformations that attempt to
minimize the circuit depth or size overhead and have worst-case time complexity polynomial
in the sizes of the circuit and architecture graph. We model the connectivity of the underlying
hardware as a simple graph where vertices represent the qubits and edges represent places
where a two-qubit gate can be performed.
As a simple and fast approach, we propose the greedy swap circuit transformation
(Section 2.2.2). It inserts swaps on edges chosen to minimize the total distance between
qubits involved in two-qubit gates until some gate(s) can be executed.
We then propose building architecture-respecting circuit transformations (Section 2.2.3)
by combining algorithms for two basic subproblems: qubit movement (addressed by permuters,
for which we provide theoretical performance guarantees) and qubit placement (addressed
by mappers). For the latter, we specify a variety of heuristic strategies (Section 4) to find
suitable placements of qubits from the input circuit, attempting to optimize for circuit size
or depth. We implement these algorithms in software, which is publicly available under a
free software license [47].
Consider now the problem of moving qubits on a given architecture graph. A sorting
network sorts any fixed-length sequence of integers with a circuit of comparators, which
compare two inputs and output them in some ordering. While sorting networks can be used
to route qubits [7], they achieve a more general task, and the cost of routing can sometimes be
lower with other methods. Specifically, we suggest Routing via Matchings [2] (introduced
in Section 3.1) as a more suitable framework for moving qubits in parallel. Deciding whether
there exists a depth-k circuit for Routing via Matchings is NP-complete in general for
k > 2 [3], but optimal or near-optimal protocols are known for specific graph families [2, 56].
In some cases it is possible to implement any permutation asymptotically more efficiently than
a general sorting network (see Table 1). On complete graphs, for example, any permutation
can be implemented in a depth-2 circuit of transpositions [2], whereas an optimal sorting
network has depth Θ(log n) [1].
While it is common to consider only the worst-case routing performance, we also wish
to route efficiently in practice. To improve practical performance, we generalize to partial
permutations (permutations only defined on some subdomain) so that we can also move
subsets of qubits efficiently. The destinations of the remaining qubits are unconstrained. In
Section 3.1, we present routing algorithms for the path graph, the complete graph, and the
generalized hierarchical product of graphs [6], which includes the Cartesian product of graphs
and modular architectures as special cases [40]. Graphs obtained as hierarchical products
have many good properties for quantum architectures [5]. We establish an upper bound
on the routing number of a hierarchical product (Theorem 4) that matches prior work for
total permutations on the Cartesian product of graphs [2] and depends on easily computable
properties of the input partial permutation.
Table 1 Performance bounds for sorting networks versus routing via matchings (the routing
number, rt(G); see (6)) where |V | = n. A (∆, D)-tree has max degree ∆ and diameter D. The
generalized hierarchical product of the graphs G1 and G2 = (V2, E2) is denoted by Π~v(G1, G2),
for ~v ∈ {0, 1}^|V2| (see Definition 2). The special cases of the Cartesian product of graphs, the
r-dimensional grid, and the modular graph are also listed. Let ~1 := [1 . . . 1] and ~e1 := [1 0 . . . 0].

Worst-case circuit depth
Graph family               | Sorting (comparators)               | Routing nr. (transpositions)
path (Pn)                  | n [25]                              | n [2]
complete (Kn)              | Θ(log n) [1]                        | 2 [2]
(∆, D)-tree                | O(min(∆, log(n/D)) n) [4]           | 3n/2 + O(log n) [56]
Π~v(G1, G2)                | not known                           | ⌈|V2| / ham(~v)⌉ (rt(G1) + rt(G2)) + rt(G2)
G1 × G2 = Π~1(G1, G2)      | not known                           | 2 rt(G1) + rt(G2) [2]
×_{i=1}^{r} P_{n_i}        | n1 + 2 ∑_{i=2}^{r} ni + o(·) [26]   | n1 + 2 ∑_{i=2}^{r} ni [2]
Π~e1(Kn1, Kn2)             | not known                           | 3n2 + 2
We also propose using Token Swapping [55] for minimizing the total number of swaps,
which is relevant when optimizing for total circuit size (Section 3.2). We generalize this
problem to partial permutations and obtain a 4-approximation algorithm (Theorem 7).
Finally, we evaluate our circuit transformations on large quantum circuits (Section 5) and
compare their performance with the circuit transformation included in the Qiskit software
(Section 2.2.1) [8]. We find that the relative performance varies significantly with the circuit
type and architecture. When minimizing circuit size, the greedy swap circuit transformation
is one of the best, though some improvement may be gained using some of our specialized
circuit transformations. For depth, some of our specialized circuit transformations do best
on random circuits on grid architectures, whereas Qiskit’s circuit transformation does well
on modular architectures. For quantum signal processing circuits [32] we find that the depth
is best minimized by our greedy swap circuit transformation.
2 Constructing Circuit Transformations
Program transformations are algorithms that modify computer programs while retaining
functionality [42]. In a similar vein, we define a circuit transformation as an algorithm
that modifies an input quantum circuit to produce an output quantum circuit with the
same functionality. We represent an architecture by a simple graph G = (V,E), and let
Q denote the set of qubits of the input circuit. A circuit transformation is architecture-
respecting if it produces injective initial and final mappings of the form pˆ : Q→ V and an
architecture-respecting output circuit. The output circuit is architecture-respecting if for
each two-qubit gate acting on (qubit) vertices v1, v2 we have (v1, v2) ∈ E (where the ordering
is irrelevant since G is undirected). Henceforth, we only consider circuit transformations
that are architecture-respecting, and we refer to them simply as circuit transformations. We
propose a construction for a general circuit transformation that may use the properties of
the underlying architecture by relying on a specialized subroutine for moving qubits called a
permuter (Section 3), and a subroutine determining where to place qubits, called a mapper
(Section 4). We show in Appendix C that our circuit transformations are polynomial-time in
the circuit size and architecture graph size.
To be able to transform a circuit, we must have |Q| ≤ |V |, and the output circuit must
contain a qubit for every vertex in the architecture. Throughout the circuit transformation,
we keep track of the injective current placement of qubits pˆ : Q→ V . The initial and final
values of pˆ are also the initial and final mappings, respectively, of qubits to the architecture.
A gate is executed by appending it to the output circuit. Two-qubit gates with qubits
q1, q2 ∈ Q can only be executed when (pˆ(q1), pˆ(q2)) ∈ E. By adding swap gates to the output
circuit, we can change pˆ and thereby unitarily transform quantum circuits for execution on
an architecture.
2.1 Definitions
Partial Functions and Partial Permutations. For sets X and Y , a partial function f : X ⇀
Y is a mapping from dom(f) ⊆ X to image(f) := {f(x) | x ∈ dom(f)} ⊆ Y . However, f(x)
is undefined for x ∈ X \ dom(f). We consider such elements x unmapped. For x ∈ dom(f),
we write x ↦ f(x) and say that x is mapped to f(x). We can then define any partial function
f as a set of mappings, f := {x ↦ y | x ∈ X, y ∈ Y }, where all preimages must be distinct
(i.e., if x ↦ y ∈ f and x′ ↦ y′ ∈ f with y ≠ y′, then x ≠ x′). A total function fˆ is a partial
function where dom(fˆ) = X and is denoted fˆ : X → Y . By the term “function” we will
mean a total function.
A partial function f is injective iff ∀x, x′ ∈ dom(f) with x ≠ x′, f(x) ≠ f(x′). A function
fˆ : X → Y is surjective iff ∀y ∈ Y, ∃x ∈ X : fˆ(x) = y. A bijective partial function f
is a partial function that is injective and is denoted f : X ↼⇀ Y (note that such an f is
necessarily surjective on its image). A bijective function fˆ is both injective and surjective
and is denoted by fˆ : X ↔ Y . For any bijective (partial) function f there exists an inverse
function f^{-1} : image(f) → dom(f).
A partial permutation pi is any bijective partial function with the same domain and
codomain, i.e., pi : X ↼⇀ X. Similarly, a total permutation is any σ : X ↔ X. By “permuta-
tion” we mean a total permutation.
We also define some notions specifically useful for this paper. An unmapped vertex is a
vertex in V \ dom(pi), for a graph G = (V,E) and pi : V ↼⇀ V . We define the union of partial
functions f : X ⇀ Y and g : X ⇀ Y when dom(f) ∩ dom(g) = ∅ as
\[
(f \cup g)(x) :=
\begin{cases}
f(x) & \text{if } x \in \operatorname{dom}(f), \\
g(x) & \text{if } x \in \operatorname{dom}(g).
\end{cases}
\tag{1}
\]
Furthermore, (f ∪ g) is a bijective partial function iff f and g are bijective partial functions
and image(f) ∩ image(g) = ∅. A completion of pi : X ↼⇀ X is a permutation pˆi : X ↔ X of the
form pˆi = pi ∪ σ for some
σ : X ↼⇀ X, where dom(σ) = X \ dom(pi) and image(σ) = X \ image(pi).
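As an illustration of these definitions, the following minimal sketch represents a partial permutation as a Python dictionary and builds one (arbitrary) completion. The function names `is_partial_permutation`, `union`, and `complete` are hypothetical and chosen only for this example; they are not taken from the accompanying software.

```python
def is_partial_permutation(pi, universe):
    """True if the dict pi is an injective partial map from universe into itself."""
    images = list(pi.values())
    return (set(pi) <= set(universe)
            and set(images) <= set(universe)
            and len(set(images)) == len(images))


def union(f, g):
    """Union of two partial functions with disjoint domains, as in (1)."""
    if set(f) & set(g):
        raise ValueError("domains must be disjoint")
    return {**f, **g}


def complete(pi, universe):
    """Return one completion of pi: a total permutation of universe extending pi."""
    used = set(pi.values())
    free_sources = [x for x in universe if x not in pi]    # X \ dom(pi)
    free_targets = [x for x in universe if x not in used]  # X \ image(pi)
    sigma = dict(zip(free_sources, free_targets))
    return union(pi, sigma)
```

Any bijection between X \ dom(pi) and X \ image(pi) gives a valid σ; pairing the two lists in order is just one simple choice.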
Directed Acyclic Graph Representation of a Circuit. A quantum circuit can be viewed as
a directed acyclic graph (DAG), where vertices represent gates and directed edges represent
qubit dependencies. We define the first layer of the DAG, L, to be the set of all vertices
without predecessors. By removing L and taking the first layer of the resulting DAG, we can
define the second layer, and so on.
The size of a circuit is the number of gates it contains (i.e., the number of vertices in the
DAG); the depth of a circuit is the number of layers. It is natural to minimize either the
depth, corresponding to the execution time when gates can be applied in parallel, or the size,
corresponding to the total number of operations that must be performed. We are mostly
interested in two-qubit gates and their qubits. Therefore, we define the partial function
tg : VD ⇀ Q × Q, where VD is the set of DAG vertices, which maps each two-qubit gate to the
pair of qubits it acts on.
For simplicity, we denote tg(L) := {tg(g) | g ∈ L, g is a two-qubit gate}.
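The layer decomposition can be sketched as follows, assuming a circuit is given as a list of gates in program order, each gate being a tuple of the qubits it acts on. This representation and the names `layers` and `two_qubit_pairs` are assumptions made for illustration only. A gate lands in the layer directly after the latest layer that already uses one of its qubits, which matches repeatedly peeling off the vertices without predecessors.

```python
def layers(gates):
    """Split a gate list (tuples of qubits, in program order) into DAG layers."""
    last_layer = {}  # qubit -> index of the layer holding its most recent gate
    result = []
    for gate in gates:
        index = max(last_layer.get(q, -1) for q in gate) + 1
        if index == len(result):
            result.append([])
        result[index].append(gate)
        for q in gate:
            last_layer[q] = index
    return result


def two_qubit_pairs(layer):
    """tg(L): the qubit pairs of the two-qubit gates in a layer."""
    return {gate for gate in layer if len(gate) == 2}


circuit = [("a", "b"), ("c",), ("b", "c"), ("a",), ("a", "c")]
circuit_layers = layers(circuit)
size = len(circuit)           # number of gates
depth = len(circuit_layers)   # number of layers
```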
2.2 Architecture-Respecting Circuit Transformations
We now describe some specific architecture-respecting circuit transformations. We first
describe two basic circuit transformations, one provided by the Qiskit software (Section 2.2.1)
and another that uses a simple greedy approach (Section 2.2.2). Then, in Section 2.2.3 we
specify a family of circuit transformations that builds on specialized procedures for qubit
placement and routing.
2.2.1 Qiskit Circuit Transformation
The open-source quantum computing software framework Qiskit [8] contains a circuit
transformation¹ that we build upon in one of our mappers (Section 4). We specify this transformation
here and compare it with our other approaches to circuit transformations in Section 5.
We initialize pˆ arbitrarily. Fix a number of trials, k ∈ N, for each layer. We do the
following in trial i ∈ [k], where [k] := {1, . . . , k}. For each pair (v, u) ∈ V × V ,
independently sample a symmetric weight di(v, u) := (1 + N (0, 1/N)) d(v, u)^2, where N (µ, σ)
represents a sample from the normal distribution with mean µ ∈ R and standard deviation
σ ≥ 0, and d : V × V → N is the shortest distance function on the architecture graph. We
define an objective function as the sum of gate distances,
\[
S := \sum_{(q_1, q_2) \in \mathrm{tg}(L)} d_i(\hat{p}(q_1), \hat{p}(q_2)) .
\tag{2}
\]
We now try to swap pairs of qubits to decrease S. Specifically, we construct a set of swaps
by iterating over all edges e ∈ E and greedily adding the corresponding swap if it decreases
S and neither endpoint of e is already involved in some swap. We execute the set of swaps
and update S. We then iterate this process until either S = |tg(L)|; or there is no swap that
decreases S; or we reach the upper bound of 2|V | iterations.
Now, if S = |tg(L)| then the algorithm has successfully found a sequence of swaps and
all gates in L can be executed. The result of trial i is then set to this sequence of swaps.
Otherwise, trial i is a failure. If there is at least one successful trial out of k trials, we execute
the swaps of a successful trial with the fewest swaps and then execute all gates in L.
If no trial was successful, we apply the same routine for finding swaps that minimize S,
but taking only a single gate (q1, q2) ∈ tg(L) at a time. Note that this results in a sequence
of swaps along the shortest path between pˆ(q1) and pˆ(q2). After each such step we execute
the selected gate. We repeat this until all gates in tg(L) have been executed and also execute
all single-qubit gates in L. Finally, we remove the vertices in L from the input circuit DAG
and iterate this process until all gates in the input circuit are executed.
2.2.2 Greedy Swap Circuit Transformation
We also describe a simple greedy approach to circuit transformations. Similar to the Qiskit
circuit transformation described above, we prioritize swaps that maximally reduce the total
distance between the qubits tg(L), but now using the simpler objective function
\[
R := \sum_{(q_1, q_2) \in \mathrm{tg}(L)} d(\hat{p}(q_1), \hat{p}(q_2)) .
\tag{3}
\]
Note that this is different from (2), where a randomized distance di is used.
¹ We base our description on qiskit.mapper.swap_mapper from Qiskit version 0.6.1.
We construct an initial pˆ as follows. Let us consider the first layer L′ of the circuit
consisting of only two-qubit gates (i.e., single-qubit gates are ignored), initialize p′ : Q ↼⇀ V
as undefined everywhere, and set U := ∅ ⊆ V . We iteratively construct
p′ ← p′ + {q1 ↦ v1, q2 ↦ v2 | (q1, q2) ∈ L′, (v1, v2) ∈ M} , (4)
where M ⊆ E is a maximum matching of G, remove (q1, q2) from L′, set U ← U ∪ {v1, v2},
and recompute M on the subgraph of G with the vertices V \ U.² The remaining qubits
Q \ dom(p′) are arbitrarily mapped to the available vertices V \ image(p′) to obtain pˆ.
In every iteration, we construct a set of disjoint gates to execute. We first execute as
many gates from L as possible given pˆ, and we remove these gates from the input circuit.
Second, let Ei, for i ∈ [2], be the set of edges where executing a swap would decrease R
by i, excluding edges which already had a vertex involved in a gate this iteration. We then
greedily execute swaps from E2 first and E1 second, updating both sets Ei as we go. If we were
not able to execute a gate from L and no swaps were executed, then, as a fallback, we
deterministically pick a two-qubit gate (q1, q2) ∈ tg(L) and swap along the first edge on the
shortest path between pˆ(q1) and pˆ(q2). We update pˆ according to the inserted swaps, update
L, and finally update R. This process is repeated until the input circuit is empty.
The fallback routine ensures that this circuit transformation always produces an output
circuit. The value R strictly decreases in every iteration until a gate can be executed unless
the fallback routine is performed, in which case R stays the same. On repeated calls to
the fallback routine, the same two-qubit gate is picked deterministically until it is executed.
This happens within diam(G) + 1 iterations, where diam(G) denotes the diameter of G. By
induction we see that the whole circuit will be executed.
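To make the role of R and the sets E1 and E2 concrete, the following sketch computes R from (3) with breadth-first-search distances and reports, for every edge, by how much a swap across it would decrease R. It is only an illustration under stated assumptions: the architecture is an adjacency dictionary with integer vertices, the placement is a dictionary from qubits to vertices, and the names `distances`, `objective`, and `swap_gains` are hypothetical. It does not implement the full transformation (gate execution, the exclusion of busy vertices, or the fallback).

```python
from collections import deque


def distances(adj):
    """All-pairs shortest-path lengths of an unweighted graph via BFS."""
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist[source] = d
    return dist


def objective(pairs, placement, dist):
    """R from (3): total architecture distance over the two-qubit gates of L."""
    return sum(dist[placement[a]][placement[b]] for a, b in pairs)


def swap_gains(pairs, placement, adj, dist):
    """Map each edge (u, w) with u < w to the decrease in R from swapping across it."""
    occupant = {v: q for q, v in placement.items()}
    base = objective(pairs, placement, dist)
    gains = {}
    for u in adj:
        for w in adj[u]:
            if u < w:  # visit each undirected edge once
                trial = dict(placement)
                if u in occupant:
                    trial[occupant[u]] = w
                if w in occupant:
                    trial[occupant[w]] = u
                gains[(u, w)] = base - objective(pairs, trial, dist)
    return gains


# Path architecture 0 - 1 - 2 - 3 with gates (a, b) and (c, d) in the first layer.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
placement = {"a": 0, "c": 1, "b": 2, "d": 3}
gains = swap_gains([("a", "b"), ("c", "d")], placement, path, distances(path))
E2 = {edge for edge, gain in gains.items() if gain == 2}
E1 = {edge for edge, gain in gains.items() if gain == 1}
```

Since the gates of a layer act on disjoint qubits, a single swap moves at most two gate qubits by one step each, so no swap can decrease R by more than 2.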
2.2.3 Constructing Architecture-aware Circuit Transformations
We now present our construction for a general circuit transformation and make some
definitions more precise. Let a permuter (Section 3) be a subroutine that, given pi : V ↼⇀ V ,
outputs a sequence of transpositions that implements pi while respecting the architecture
constraints. Let a mapper (Section 4) be a subroutine that, given pˆ, a permuter, and a
quantum circuit, computes a new placement of qubits, p : Q ↼⇀ V , such that some gates of
the input circuit can be executed.
Initialize pˆ in the same way as the greedy swap circuit transformation. We repeat the
following steps until the entire circuit has been transformed:
1. Use the given mapper to find a placement, p : Q ↼⇀ V , for the remaining input circuit;
2. Let “◦” denote partial function composition, i.e., given g : X ⇀ Y and f : Y ⇀ Z,
(f ◦ g)(x) := f(g(x)), for x ∈ dom(g) and g(x) ∈ dom(f). We use the permuter to find
transpositions implementing p ◦ pˆ−1 : V ↼⇀ V and replace the transpositions with swap
gates to construct a permutation circuit to execute. We also update pˆ to reflect the new
placement of qubits after running the permutation circuit.
3. Execute all gates in L that can be executed in accordance with pˆ, remove these gates
from the input circuit, and recompute L.
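Step 2 can be sketched as follows, assuming both placements are stored as dictionaries from qubits to architecture vertices. The helper name `movement_target` is hypothetical. The result is the partial permutation on V that is handed to the permuter.

```python
def movement_target(current, target):
    """Compose the new placement with the inverse of the current placement.

    current: injective dict qubit -> vertex (the current placement)
    target:  dict qubit -> vertex, possibly defined on fewer qubits (the mapper output)
    Returns a dict vertex -> vertex saying where the qubit now at each vertex must go.
    """
    inverse = {vertex: qubit for qubit, vertex in current.items()}
    return {vertex: target[qubit]
            for vertex, qubit in inverse.items()
            if qubit in target}


current = {"a": 0, "b": 1, "c": 2}
target = {"a": 2, "c": 0}           # qubit b is left unconstrained
pi = movement_target(current, target)
```

Qubits that the mapper leaves unplaced do not appear in the result, so the permuter is free to move them anywhere.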
² This is equivalent to running the greedy depth mapper (Section 4) on the input circuit with only
two-qubit gates, an arbitrary pˆ, and free permutations of qubits. In other words, the greedy depth
mapper will pick a placement of qubits on the architecture unconstrained by movement of qubits, since
this is the initial placement.
3 Partial Permutations via Transpositions
In this section we provide routing algorithms for implementing partial permutations via
transpositions constrained to edges of a graph. We call such algorithms permuters. The
Routing via Matchings and Token Swapping problems capture exactly our optimization
goals of implementing a permutation of qubits on a quantum architecture while minimizing
the circuit depth and size, respectively.
3.1 Partial Routing Via Matchings
The framework of Routing via Matchings captures how to permute qubits on a graph using
a circuit of the smallest possible depth [2]. We first define a generalization of Routing via
Matchings that allows for partial permutations and then provide permuters for implementing
partial permutations for some architectures of interest.
Definition 1 (Partial Routing via Matchings). Partial Routing via Matchings
is the following optimization problem. Given a simple graph G = (V,E) and a pi : V ↼⇀ V ,
the objective is to find the smallest k ∈ N such that there exist matchings M1, . . . ,Mk ⊆ E
on G, where each matching induces a permutation as a product of disjoint transpositions
\[
\pi_{M_i} = \prod_{(v,u) \in M_i} (v\ u), \quad \text{such that} \quad \hat{\pi} = \prod_{i=1}^{k} \pi_{M_i}
\tag{5}
\]
is a completion of pi.
Routing via Matchings is the special case of Partial Routing via Matchings where
pi is constrained to be a (total) permutation. The partial routing number of pi : V ↼⇀ V on G
is rt(G, pi) := k, where k obtains the minimum in Definition 1. The routing number [2] is the
special case of the partial routing number where pi is total. In this paper, we simply refer to
the partial routing number as the routing number. The routing number of G is defined as
\[
\mathrm{rt}(G) := \max_{\sigma \in \mathrm{Sym}(V)} \mathrm{rt}(G, \sigma) ,
\tag{6}
\]
where we maximize over all permutations σ : V ↔ V (here Sym(V ) denotes the group of such
permutations). Note that we only optimize over permutations, since for any pi : V ↼⇀ V ,
\[
\mathrm{rt}(G, \pi) = \min_{\hat{\pi}} \mathrm{rt}(G, \hat{\pi}) ,
\tag{7}
\]
where we minimize over all completions pˆi of pi.
An alternate way to interpret (Partial) Routing via Matchings is to assign tokens
to all v ∈ dom(pi) and destinations pi(v) for the tokens. A token can only be moved through
an exchange of tokens between adjacent vertices. The goal is to move all tokens to their
destination in as few matchings (specifying exchange locations) as possible. If a vertex does
not hold a token at the time of an exchange with a neighbor, as can be the case in Partial
Routing via Matchings, then after the exchange the neighbor will not hold a token.
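The token interpretation suggests a simple check of whether a proposed sequence of matchings implements a given partial permutation. The sketch below is one way to write it, assuming edges are stored as a set of `frozenset` pairs; the function name `implements` is hypothetical.

```python
def implements(matchings, pi, edges):
    """Check that exchanging tokens along the matchings moves each token v to pi[v].

    matchings: list of rounds, each a list of vertex pairs (u, w)
    pi:        dict vertex -> destination (a partial permutation)
    edges:     set of frozenset({u, w}) describing the graph
    """
    position = {v: v for v in pi}  # token (named by its start vertex) -> current vertex
    for matching in matchings:
        swap = {}
        for u, w in matching:
            if frozenset((u, w)) not in edges or u in swap or w in swap:
                raise ValueError("each round must be a matching of the graph")
            swap[u], swap[w] = w, u
        # Vertices without a token simply do not appear in `position`.
        position = {token: swap.get(v, v) for token, v in position.items()}
    return all(position[token] == pi[token] for token in pi)


# Path 0 - 1 - 2: the single token at vertex 0 reaches vertex 2 in two rounds.
path_edges = {frozenset((0, 1)), frozenset((1, 2))}
ok = implements([[(0, 1)], [(1, 2)]], {0: 2}, path_edges)
```

The number of rounds in an accepted sequence is an upper bound on rt(G, pi).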
We give simple constructions for permuters of the complete graph, Kn, and the path
graph, Pn, for n ∈ N. Let V be the vertex set of the respective graph and pi : V ↼⇀ V
given. For Kn, if |dom(pi) ∪ image(pi)| = 2|dom(pi)| all mappings are disjoint, so we return
{(v, pi(v)) | v ∈ dom(pi)} as a single matching that implements pi. Otherwise, we construct an
arbitrary completion pˆi of pi and run the standard algorithm for Routing via Matchings
for complete graphs on pˆi [2], obtaining rt(Kn, pi) ≤ 2.
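The depth-2 bound for complete graphs rests on the fact that every cycle is a product of two reflections of its index set. The following sketch uses that decomposition; it is an illustrative construction under our own naming (`cycles`, `route_complete`, `route_partial`), not necessarily the exact procedure of [2] or of the accompanying software.

```python
def cycles(sigma):
    """Decompose a total permutation (dict) into cycles [c0, c1, ...] with sigma[ci] = c(i+1)."""
    seen, result = set(), []
    for start in sigma:
        if start in seen:
            continue
        cycle, v = [], start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = sigma[v]
        result.append(cycle)
    return result


def route_complete(sigma):
    """Two matchings on K_n; applying the first and then the second realizes sigma."""
    first, second = [], []
    for cycle in cycles(sigma):
        k = len(cycle)
        for i in range(k):
            j = (-i) % k            # reflection i -> -i
            if i < j:
                first.append((cycle[i], cycle[j]))
            j = (1 - i) % k         # reflection i -> 1 - i; composed: i -> i + 1
            if i < j:
                second.append((cycle[i], cycle[j]))
    return first, second


def route_partial(pi, vertices):
    """Route a partial permutation on K_n in one round if possible, else in two."""
    used = set(pi.values())
    if not set(pi) & used:
        # Domain and image are disjoint, so the mappings form a single matching.
        return [[(v, target) for v, target in pi.items()]]
    free_targets = [v for v in vertices if v not in used]
    sigma = dict(pi)
    for v in vertices:
        if v not in sigma:
            sigma[v] = free_targets.pop()  # arbitrary completion
    return list(route_complete(sigma))
```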
For Pn, let V ∼= [n], ordered from one end of the path to the other (picking ends
arbitrarily). Iterate through i ∈ V in ascending order, setting
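\[
\hat{\pi}(i) =
\begin{cases}
\pi(i) & \text{if } i \in \operatorname{dom}(\pi), \\
\min\left(V \setminus \operatorname{image}(\hat{\pi})\right) & \text{otherwise.}
\end{cases}
\tag{8}
\]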
We then run the standard path routing algorithm [2] on pˆi, obtaining rt(Pn, pi) ≤ n. It
remains an open question whether a tighter bound can be proven as a function of some
property of pi.
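The path permuter can be sketched in two steps. The sketch makes two assumptions that are ours rather than the text's: first, that image(pˆi) in (8) already contains image(pi) before the iteration starts (otherwise a destination reserved by pi could be handed out twice), and second, that the standard path routing algorithm is odd-even transposition sort on the destination labels. The names `complete_on_path` and `route_path` are hypothetical.

```python
def complete_on_path(pi, n):
    """Completion following (8) on vertices 1..n, with image(pi) reserved up front."""
    used = set(pi.values())
    sigma = {}
    for i in range(1, n + 1):
        if i in pi:
            sigma[i] = pi[i]
        else:
            sigma[i] = min(v for v in range(1, n + 1) if v not in used)
            used.add(sigma[i])
    return sigma


def route_path(sigma, n):
    """Odd-even transposition sort on destinations; returns n rounds of matchings."""
    dest = [sigma[i] for i in range(1, n + 1)]  # dest[p] = destination of token at p+1
    rounds = []
    for r in range(n):
        matching = []
        for i in range(r % 2, n - 1, 2):        # alternate between the two edge sets
            if dest[i] > dest[i + 1]:
                dest[i], dest[i + 1] = dest[i + 1], dest[i]
                matching.append((i + 1, i + 2))  # 1-indexed path vertices
        rounds.append(matching)
    return rounds


sigma = complete_on_path({1: 4, 4: 2}, 4)
rounds = route_path(sigma, 4)
```

Rounds in which no exchange is needed come out as empty matchings and can be dropped from the output circuit.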
Hierarchical Product
The generalized hierarchical product (henceforth hierarchical product) of graphs [6] produces
various subgraphs of the Cartesian product of graphs that include natural models of quantum
computer architectures [5].
Definition 2 (Hierarchical Product [6]). For j ∈ {1, 2}, let Gj = (Vj , Ej) be a graph with
nj := |Vj | vertices and adjacency matrix Aj ∈ Mnj , where Mk is the set of k × k boolean
matrices for k ∈ N. Then the hierarchical product Π~v(G1, G2), for ~v ∈ {0, 1}^n2 , has vertex
set V1 × V2 and adjacency matrix A1 ⊗ diag(~v) + In1 ⊗A2, where In1 ∈Mn1 is the n1 × n1
identity matrix, M1 ⊗M2 ∈Mn1n2 is the Kronecker product of M1 ∈Mn1 and M2 ∈Mn2 ,
and diag(~v) ∈Mn2 is the diagonal matrix with the entries of ~v on the diagonal.
Intuitively, this graph consists of n1 copies of G2, where the jth vertices in all copies of
G2 are connected by a copy of G1 if ~vj = 1. We restrict ourselves to connected simple graphs,
so A1 and A2 are symmetric 0–1 matrices and ~v is nonzero. An example of the hierarchical
product of two path graphs is
Π[1 0 1](P2, P3) = Π[1 0 1](1 - 2, 1 - 2 - 3) =

    1,1 --- 1,2 --- 1,3
     |               |
    2,1 --- 2,2 --- 2,3        (9)
The Cartesian product is Π~1, where ~1 := [1 . . . 1], and Π~ei is the rooted product of graphs,
rooted at the ith vertex of G2.
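The adjacency matrix of Definition 2 can be computed directly from the Kronecker-product formula. The sketch below does so with nested lists, indexing the vertex (i1, i2) of the product as i1 · n2 + i2 with zero-based indices; this indexing and the function name `hierarchical_product` are choices made for the example.

```python
def hierarchical_product(a1, a2, v):
    """Adjacency matrix A1 (x) diag(v) + I (x) A2 as a nested list of 0/1 entries."""
    n1, n2 = len(a1), len(a2)
    size = n1 * n2
    adj = [[0] * size for _ in range(size)]
    for i1 in range(n1):
        for i2 in range(n2):
            for j1 in range(n1):
                for j2 in range(n2):
                    # Edges of G1 between copies, only at positions where v is 1.
                    between = a1[i1][j1] * (v[i2] if i2 == j2 else 0)
                    # Edges of G2 inside each copy.
                    within = (1 if i1 == j1 else 0) * a2[i2][j2]
                    adj[i1 * n2 + i2][j1 * n2 + j2] = between + within
    return adj


def edge_set(adj):
    """Undirected edges (a, b) with a < b of an adjacency matrix."""
    n = len(adj)
    return {(a, b) for a in range(n) for b in range(a + 1, n) if adj[a][b]}


P2 = [[0, 1], [1, 0]]
P3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
example = edge_set(hierarchical_product(P2, P3, [1, 0, 1]))
grid = edge_set(hierarchical_product(P2, P3, [1, 1, 1]))
```

With ~v = [1 0 1] the middle vertices of the two copies of P3 are not joined, which reproduces example (9); with ~v = [1 1 1] the same call yields the 2 × 3 grid.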
We define the vertex-induced subgraph of any graph G = (V,E) for vertex set U ⊆ V as
G[U ] := (U,E ∩ (U × U)) . (10)
Now, let G = (V,E) = Π~v(G1, G2) and denote the vertices of G by v = (v1, v2) ∈ V1×V2 = V .
We define Gi = (Vi, Ei) := G [{i} × V2], for i ∈ V1. Note that each Gi is isomorphic to G2, so
the permuter for G2 can be used for Gi. We also define the communicator vertices of Gi as the
vertices {i} × {j ∈ V2 | ~vj = 1} ⊆ Vi and index them in ascending order (for some ordering
of V ). Note that the jth communicator vertex (of any Gi) also belongs to G[V1 ×{j}], which
is isomorphic to G1.
A useful metric is the maximum number of vertices that need to leave or enter any Gi to
implement pi, defined as the degree of pi,
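\[
\deg(\pi) := \max \bigcup_{i \in V_1}
\Bigl\{ \bigl|\{v \in \operatorname{dom}(\pi) \cap V_i \mid \pi(v) \notin V_i\}\bigr|,\;
\bigl|\{v \in \operatorname{dom}(\pi) \setminus V_i \mid \pi(v) \in V_i\}\bigr| \Bigr\} .
\tag{11}
\]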
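The degree in (11) is easy to compute once vertices are written as pairs (v1, v2), since the first coordinate identifies the copy Gi a vertex belongs to. A minimal sketch, with the hypothetical function name `degree`:

```python
def degree(pi):
    """deg(pi) from (11) for a partial permutation on vertices written as pairs (v1, v2)."""
    leaving, entering = {}, {}
    for (source_copy, _), (target_copy, _) in pi.items():
        if source_copy != target_copy:
            leaving[source_copy] = leaving.get(source_copy, 0) + 1
            entering[target_copy] = entering.get(target_copy, 0) + 1
    # The maximum is over all copies, including those nothing leaves or enters.
    return max([0, *leaving.values(), *entering.values()])


pi = {(1, 1): (2, 1), (1, 2): (2, 3), (2, 2): (1, 3), (2, 3): (2, 2)}
d = degree(pi)
```

In this example two vertices leave copy 1 and two enter copy 2, while (2, 3) stays inside its copy and does not contribute.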
In every iteration of the routing algorithm, we route a set R = {v(i) ∈ Vi | i ∈ V1} such
that all pi(v)1 are distinct, for v ∈ R and pi(v) = (pi(v)1, pi(v)2) ∈ V . Undefined values are
always considered distinct. We call such R a set of representative vertices, and we view v(i)
as the representative vertex of Vi.
Algorithm 3.1: Partial Routing via Matchings on the hierarchical product
of graphs Π~v(G1, G2). In Line 1, routing means constructing a partial permutation
σ on a subgraph (G1 or G2), using the applicable permuter to find transpositions
implementing σ, and applying those transpositions to update pi and each Ri.
input : pi : V1 × V2 ↼⇀ V1 × V2; permuters on G1 and G2
1 Let Ri, for i ∈ [deg(pi)], be given by Lemma 3
2 for i = 1, . . . , ⌈deg(pi) / ham(~v)⌉ :
3     foreach j ∈ V1 :
4         on Gj , for all k ∈ [ham(~v)], route the (unique) vertex v ∈ R_{(i−1)·ham(~v)+k} ∩ Vj
          to the k-th communicator vertex of Gj   // For Rℓ with ℓ > deg(pi), do nothing
5     foreach communicator vertex (v1, v2) of G1 :   // All copies of G1
6         on G[V1 × {v2}] = (V ′, E′), route each v ∈ V ′ ∩ dom(pi) to (pi(v)1, v2) ∈ V ′
7 foreach i ∈ V1 :
8     route all v ∈ dom(pi) ∩ Vi to pi(v) within Gi
9 return the transpositions that implement this routing
Lemma 3 (Proof in Appendix A.1). For a graph $\Pi_{\vec{v}}(G_1, G_2)$ and $\pi : V ↼⇀ V$, let $d := \deg(\pi)$.
We can find distinct sets of representative vertices $R_i$, for $i \in [d]$, such that
$$\{v \in \mathrm{dom}(\pi) \mid v_1 \neq \pi(v)_1\} \subseteq \bigcup_{i \in [d]} R_i.$$
Algorithm 3.1 specifies a permuter for the hierarchical product. We prove the following
performance bound for this algorithm.
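Theorem 4. For a graph $\Pi_{\vec{v}}(G_1, G_2)$, Algorithm 3.1 finds a sequence of transpositions
that implements $\pi : V ↼⇀ V$, certifying that
$$\mathrm{rt}(\Pi_{\vec{v}}(G_1, G_2), \pi) \le \left\lceil \frac{\deg(\pi)}{\mathrm{ham}(\vec{v})} \right\rceil (\mathrm{rt}(G_1) + \mathrm{rt}(G_2)) + \mathrm{rt}(G_2),$$
where $\mathrm{ham}(\vec{v})$ is the Hamming weight of $\vec{v}$, i.e., the number of ones in $\vec{v}$.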
Proof. In every round of routing, we route $\mathrm{ham}(\vec{v})$ sets $R_i$ to their destination $G_j$s, for $j \in V_1$.
In each round, we route on all copies of $G_2$ in parallel and then route on all copies of $G_1$ in
parallel. After routing all $R_i$ in at most $\lceil \deg(\pi) / \mathrm{ham}(\vec{v}) \rceil$ rounds, Lemma 3 ensures that
only permutations local to each $G_j$ remain. Finally, we route vertices to their destinations,
as given by $\pi$, in each $G_j$ independently using the permuter for $G_2$. ∎
As a possible optimization, we can remove some vertices from the partial permutations in
the routing steps. For each removed vertex, we must ensure that the remaining steps of the
routing algorithm remain valid. Specifically, consider a vertex $u \in V_i \cap R_k$ for $i \in V_1$ and
$k \in [\deg(\pi)]$. If $u \in \mathrm{dom}(\pi)$ and $\pi(u) \in V_i$, then we remove it since it does not need to be routed
outside of $G_i$. Otherwise, if $u \notin \mathrm{dom}(\pi)$, we remove it unless $\{v \in R_k \cap \mathrm{dom}(\pi) \mid \pi(v) \in V_i\} \neq \emptyset$,
since in that case an unmapped vertex is expected at the communicator vertex in the second loop of
the routing round. We apply this optimization in our implementation of the permuter for
modular graphs (Appendix A.2).
We show a lower bound on the routing number of hierarchical products of graphs, which
can be shown to be tight up to constant factors (see full arXiv version on title page).
Theorem 5. For a graph $\Pi_{\vec{v}}(G_1, G_2)$ and any $\pi : V ↼⇀ V$,
$$2 \left\lceil \frac{\deg(\pi)}{\mathrm{ham}(\vec{v})} \right\rceil - 1 \le \mathrm{rt}(\Pi_{\vec{v}}(G_1, G_2), \pi).$$
Proof. Let us consider the token-based formulation of Partial Routing via Matchings.
At most $\deg(\pi)$ tokens need to be moved out of any $G_i$, for $i \in V_1$. Every matching
can move at most $\mathrm{ham}(\vec{v})$ tokens out of their original $G_i$. Once moved out, a new set of
tokens must be moved onto the $\mathrm{ham}(\vec{v})$ communicator vertices. Therefore, it takes at least
$2 \lceil \deg(\pi) / \mathrm{ham}(\vec{v}) \rceil - 1$ matchings to move $\deg(\pi)$ tokens out of any $G_i$. ∎
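To see how far apart the two bounds can be, the following short sketch evaluates the lower bound of Theorem 5 and the upper bound of Theorem 4 for given parameters. The function name `routing_bounds` and the sample values (including the routing numbers passed in for the two factor graphs) are hypothetical inputs chosen for illustration, not results from the paper.

```python
def routing_bounds(deg, ham, rt_g1, rt_g2):
    """Return (lower, upper) bounds on the routing number of the hierarchical product.

    deg is deg(pi), ham is the Hamming weight of the vector, and rt_g1 / rt_g2 are
    the routing numbers of the two factor graphs.
    """
    rounds = -(-deg // ham)                    # integer ceil(deg / ham)
    lower = 2 * rounds - 1                     # Theorem 5
    upper = rounds * (rt_g1 + rt_g2) + rt_g2   # Theorem 4
    return lower, upper


print(routing_bounds(deg=5, ham=2, rt_g1=1, rt_g2=3))  # prints (5, 15)
```

Both bounds grow linearly in the number of rounds, so for factor graphs with constant routing number they differ only by a constant factor.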
For special cases of interest to quantum architectures, we analyze the modular graph in
Appendix A.2 and provide a permuter specialized to Cartesian products of graphs with some
heuristic optimizations in the full arXiv version.
3.2 Partial Token Swapping
The Token Swapping problem is similar to Routing via Matchings, but minimizes
the total number of transpositions instead of the depth [55]. It follows that the induced
permutation circuit is optimized for circuit size. The decision version of Token Swapping
was first shown to be NP-complete [38] and, later, hardness was shown in parametrized
complexity of the number of swaps $k$ [10]. For $\epsilon > 0$, a $(1 + \epsilon)$-approximation algorithm is an
algorithm that produces a solution within a factor $(1 + \epsilon)$ of optimal for all valid inputs. Here,
we define a generalized version of Token Swapping that allows for partial permutations,
and then give a 4-approximation algorithm for this problem on connected simple graphs that
generalizes a previous 4-approximation algorithm for total permutations [38].
Definition 6 (Partial Token Swapping). We define Partial Token Swapping as
an optimization problem. Given are a graph $G = (V, E)$ and partial permutation $\pi : V ↼⇀ V$.
The objective is to find the smallest $k \in \mathbb{N}$ such that $\hat{\pi} = (u_1\ v_1)(u_2\ v_2) \dots (u_k\ v_k)$, for $\hat{\pi}$
some completion of $\pi$ and $(u_i, v_i) \in E$ for $i \in [k]$.
Analogous to the routing number, we define the routing size of $\pi : V ↼⇀ V$ on $G$, $\mathrm{rs}(G, \pi)$, to
be the minimum $k$ in Definition 6, and the routing size of $G$ as
$$\mathrm{rs}(G) := \max_{\sigma \in \mathrm{Sym}(V)} \mathrm{rs}(G, \sigma). \tag{12}$$
Token Swapping is the special case of Partial Token Swapping where $\pi$ is constrained
to be a total permutation. Partial Token Swapping also has an equivalent token-based
formulation, similar to Partial Routing via Matchings.
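For very small graphs, the routing size of a partial permutation can be found by exhaustive search over token configurations, which is useful as a reference when testing an approximation algorithm. The sketch below is such a brute-force search; it takes exponential time, and the function name `routing_size` and the dict-based encoding of the partial permutation are assumptions made for illustration.

```python
from collections import deque


def routing_size(edges, pi):
    """Smallest number of edge swaps realizing the partial permutation pi (exhaustive search)."""
    start = tuple(sorted(pi.items()))          # one (current vertex, destination) pair per token
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, swaps = queue.popleft()
        if all(pos == dest for pos, dest in state):
            return swaps
        for a, b in edges:
            # Swapping along edge (a, b) exchanges whatever sits on a and b.
            moved = tuple(sorted(
                (b if pos == a else a if pos == b else pos, dest)
                for pos, dest in state
            ))
            if moved not in seen:
                seen.add(moved)
                queue.append((moved, swaps + 1))
    return None  # not reached when the graph is connected


path = [(0, 1), (1, 2)]
print(routing_size(path, {0: 2}))              # prints 2
print(routing_size(path, {0: 2, 1: 1, 2: 0}))  # prints 3
```

On the three-vertex path, moving a single token across two tokenless vertices needs two swaps, whereas reversing the end tokens while the middle token stays home needs three.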
We now describe a permuter that aims to minimize the circuit size. Miltzow et al. [38]
gave a 4-approximation algorithm for Token Swapping. Here, we generalize their results
to Partial Token Swapping and prove that our generalized algorithm is also a
4-approximation algorithm. For this section, we consider the token-based formulation of
Partial Token Swapping (recall the notion of tokens introduced in Section 3.1).
The main idea of Miltzow et al. is to perform swaps that reduce the sum of all distances
of tokens to their destinations. We use the following definitions from [38]: An unhappy swap
is “an edge swap where one of the tokens swapped is already on its target and the other
token reduces its distance to its target vertex (by one)”, and a happy swap chain is a path
of $\ell$ distinct vertices $v_1 v_2 \dots v_\ell$, such that swapping all $(v_i, v_{i+1}) \in E$, for $i \in [\ell - 1]$, in
increasing order strictly reduces the distances of all tokens in the chain to their destinations.
Algorithm 3.2: Routing tokens to their destinations while minimizing the number
of transpositions. We add an extra step that performs no-token swaps to the
algorithm of [38]. For $v \in V$, $N(v) \subseteq V$ denotes the set of neighbors of $v$. The
partial permutation $\mathrm{id}|_{\mathrm{dom}(\pi)} : V ↼⇀ V$ is the restriction of the identity function
$\mathrm{id} : V \leftrightarrow V$ to $\mathrm{dom}(\pi)$ (so it is undefined outside of $\mathrm{dom}(\pi)$).
input : $\pi : V ↼⇀ V$
while $\pi \neq \mathrm{id}|_{\mathrm{dom}(\pi)}$ :
    if there exists a happy swap chain $v_1 v_2 \dots v_\ell$ then
        Perform transpositions $(v_1\ v_2)(v_2\ v_3) \dots (v_{\ell-1}\ v_\ell)$
    else if $\exists v \in \mathrm{dom}(\pi), \exists u \in N(v) \setminus \mathrm{dom}(\pi) : d(u, \pi(v)) < d(v, \pi(v))$ then
        Perform no-token swap $(v\ u)$   // $u$ has no token
    else
        There exists an unhappy swap; perform it
    Update $\pi$ according to the transpositions that were performed
return the sequence of transpositions that was performed
When considering a partial permutation, not all vertices have a token assigned to them.
We add an extra step to the approximation algorithm for Token Swapping to make use
of this: Before considering an unhappy swap, we first try to swap a token to a tokenless
neighbor if it brings the token closer to its destination. We call this a no-token swap. The
approximation algorithm for Partial Token Swapping is specified in full in Algorithm 3.2.
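The three cases of the algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the graph is given as an adjacency dict, the partial permutation as a dict from token position to destination, and happy swap chains are detected as directed cycles in an auxiliary graph with an arc from a vertex to each neighbor that brings its token closer to its destination (an implementation choice made here). The helper names `distances`, `find_cycle`, and `partial_token_swapping` are hypothetical, and termination is assumed to follow from the analysis in the paper rather than being established by this sketch.

```python
from collections import deque


def distances(adj, src):
    """Breadth-first distances from src in the undirected graph adj."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def find_cycle(arcs):
    """Return a directed cycle [c1, ..., cm] with arcs c1->c2->...->cm->c1, or None."""
    state, done = {}, object()
    for root in arcs:
        if root in state:
            continue
        state[root] = "active"
        path, iters = [root], [iter(arcs[root])]
        while path:
            nxt = next(iters[-1], done)
            if nxt is done:
                state[path.pop()] = "finished"
                iters.pop()
            elif state.get(nxt) == "active":
                return path[path.index(nxt):]
            elif nxt not in state:
                state[nxt] = "active"
                path.append(nxt)
                iters.append(iter(arcs[nxt]))
    return None


def partial_token_swapping(adj, pi):
    """Return a list of edge swaps that moves the token on each v in pi to pi[v]."""
    dist = {v: distances(adj, v) for v in adj}
    pi = dict(pi)  # vertex currently holding a token -> that token's destination
    swaps = []

    def swap(a, b):
        ta, tb = pi.pop(a, None), pi.pop(b, None)
        if ta is not None:
            pi[b] = ta
        if tb is not None:
            pi[a] = tb
        swaps.append((a, b))

    while any(v != t for v, t in pi.items()):
        # Arc v -> u: the token on v gets strictly closer to its destination on u.
        arcs = {v: [u for u in adj[v] if v in pi and dist[u][pi[v]] < dist[v][pi[v]]]
                for v in adj}
        cycle = find_cycle(arcs)
        if cycle is not None:
            # Happy swap chain: every token on the cycle advances one step.
            for i in range(len(cycle) - 2, -1, -1):
                swap(cycle[i], cycle[i + 1])
            continue
        moves = [(v, u) for v in pi for u in arcs[v]]
        no_token = [(v, u) for v, u in moves if u not in pi]    # u holds no token
        unhappy = [(v, u) for v, u in moves if pi.get(u) == u]  # u's token is at home
        swap(*(no_token or unhappy)[0])                         # prefer no-token swaps
    return swaps


path = {0: [1], 1: [0, 2], 2: [1]}
print(partial_token_swapping(path, {0: 2}))                  # prints [(0, 1), (1, 2)]
print(len(partial_token_swapping(path, {0: 2, 1: 1, 2: 0})))  # prints 3
```

In the first call only no-token swaps are used, so the single token reaches its destination in two swaps. In the second call the permutation is total, so the run starts with an unhappy swap and finishes with two happy swaps.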
Theorem 7 (Proof in Appendix A.3). Given a simple connected graph $G = (V, E)$ and
$\pi : V ↼⇀ V$, Algorithm 3.2 uses at most $4 \cdot \mathrm{rs}(G, \pi)$ transpositions.
In the full arXiv version we also show that, when restricted to tree graphs, the approximation
ratio of Algorithm 3.2 is at least 3 for partial permutations, even though the algorithm is a
2-approximation algorithm on trees for total permutations [38].
4 Placing Qubits on the Architecture
A mapping algorithm (or mapper) finds an assignment of circuit qubits to architecture
vertices such that gates can be executed efficiently. We specify mappers in terms of the
routing number and the routing size. In practice, we replace these quantities with the upper
bounds that result from applying our permuters.
Mappers construct placements of circuit qubits onto qubits of the architecture. A
placement is a bijective partial function $p : Q ↼⇀ V$, where $G = (V, E)$ is the architecture
graph. A mapper has access to the current placement $\hat{p} : Q \to V$ provided by the circuit
transformation. Given a placement $p$ and the current placement $\hat{p}$, we can compute a partial
permutation $p \circ \hat{p}^{-1} : V ↼⇀ V$ that implements $p$. All our mappers construct a placement $p$
that is initially undefined everywhere and modify it until finished.
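The composition of the desired placement with the inverse of the current placement is a one-line computation when both are stored as dictionaries. The sketch below assumes that representation; the function name `placement_to_permutation` and the sample qubit names are hypothetical.

```python
def placement_to_permutation(current, placement):
    """Partial permutation on architecture vertices that implements `placement`.

    current:   dict circuit qubit -> vertex (total; the current placement)
    placement: dict circuit qubit -> vertex (partial; the desired placement)
    """
    # A qubit sitting on current[q] must be moved to placement[q].
    return {current[q]: v for q, v in placement.items()}


current = {"a": 0, "b": 1, "c": 2}
wanted = {"a": 2, "c": 1}            # qubit "b" is left unplaced
print(placement_to_permutation(current, wanted))  # prints {0: 2, 2: 1}
```

Vertex 1, which currently holds the unplaced qubit "b", is outside the domain of the resulting partial permutation, so a permuter is free to move that qubit wherever is convenient.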
We briefly describe the mappers (and their abbreviations) that we implement and evaluate
(see Appendix B for details). We propose mappers optimizing for circuit depth (depth mappers)
and for circuit size (size mappers). For size mappers, specifically, if there is any gate that can
be performed without moving qubits, then there is no disadvantage to doing that immediately
since it will have to be performed eventually. If there is any such gate, we simply return
the empty placement. Thus we assume, for all size mappers, that there are no gates to be
performed in-place. Let L be the first layer of gates of the input circuit.
The greedy depth mapper (greedy depth) repeatedly places the highest-cost gate in L at
its lowest-cost location, where the cost is the routing number to achieve the placement.
The incremental depth mapper (incremental) guarantees placement of only the lowest-
cost gate, instead of trying to place (almost) all gates in L, and incrementally improves
the situation for the other gates.
The greedy size mapper (greedy size) is the same as the greedy depth mapper, except
that we replace rt(·) with rs(·) in the objective function.
The simple size mapper (simple) places only the lowest-cost gate at its lowest-cost location.
The extension size mapper (extend) first places one gate using the same approach as the
simple size mapper. We then place another gate only if it is cheaper to place now
than in a later call to the mapper.
The Qiskit-based size mapper (qiskit) is based on Qiskit’s circuit transformation. We
slightly modify the circuit transformation routine in that it picks edges to swap one at a
time instead of finding a maximal matching. Since this is a mapper, we only execute one
iteration of the circuit transformation: for the first layer L. We return the final placement
$\hat{p}$ that would be induced by executing all swaps found during the mapping process.
5 Results
We implement the circuit transformation introduced in Section 2.2.3 with a variety of mappers
and appropriate permuters. We also implement the greedy swap transformation described in
Section 2.2.2. We check the validity of our implementations by testing closeness in fidelity of
the original output state and that of the transformed circuit for random input states of 11
qubits on random circuits [47] (described in the next section).
Evaluation Criteria. When testing the performance of these circuit transformations, each
is allocated at most 8GB of RAM and 2 days to transform all circuits of a data point. For
each data point we transform 10 random circuits and 1 quantum signal processing (QSP)
circuit. We consider a 2-day runtime acceptable, given that classical computational resources
are plentiful compared to quantum ones. We generate the data on a heterogeneous cluster
with AMD Opteron 2354 and Intel Xeon X5560 processors.
The Cartesian permuter (see full arXiv version), the general size permuter (Section 3.2),
and Qiskit’s circuit transformation (Section 2.2.1) are randomized. We run multiple trials
of these permuters and take the best result. Most of the time, trials produce equally good
permutation circuits, although occasionally they deviate by a few swap gates. Our mappers
run permuters O(|L||E|) times, so we do only 4 trials to quickly remove any bad outliers. In
contrast, our circuit transformation only directly runs a permuter once per layer of gates, so
in this case we perform a slower 100 trials in an attempt to save a few swaps. We leave the
number of trials for Qiskit’s circuit transformation at its default of 40.
We test the performance of circuit transformations for the grid, $P_{n_1} \times P_{n_2}$, using the
permuter for Cartesian graphs and for the modular architecture (Appendix A.2), $\mathrm{MG}(n_1, n_2)$,
for $n_1, n_2 \in \mathbb{N}$. For an $N$-qubit circuit, we set $n_1 = n_2 = \lceil \sqrt{N} \rceil$ so that there are enough
qubits in the architecture to contain the circuit. By Theorem 4, we know that taking $n_1 = n_2$
minimizes the routing time for our routing strategy among all grids with the same number
of qubits. It is less clear how to balance parameters for the modular architecture since
Corollary 8 does not depend on $n_1$ and $n_2$. For $n_1 \ll n_2$ or $n_2 \ll n_1$, less movement of
qubits is needed, since many qubits are adjacent to one another. Thus, we take $n_1 = n_2$ in
an attempt to consider a hard case. For some values of $N$, it may also be possible to find
parameters $n'_1 \neq n'_2$ such that $N \le n'_1 n'_2 < \lceil \sqrt{N} \rceil^2 = n_1 n_2$, requiring fewer qubits. However,
this introduces unwanted size-dependent behavior in our results when $|n'_1 - n'_2| \gg 0$ for one
circuit size and $n'_1 \approx n'_2$ for the next, so we find it preferable to fix $n_1 = n_2$.
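The choice of side length can be made concrete with integer arithmetic. The sketch below computes the ceiling of the square root without floating point; the function name `grid_side` is an assumption, `math.isqrt` requires Python 3.8 or later, and the formula is valid for circuits with at least one qubit.

```python
import math


def grid_side(num_qubits):
    """Smallest n with n * n >= num_qubits, i.e. the ceiling of sqrt(num_qubits)."""
    return math.isqrt(num_qubits - 1) + 1


for n_qubits in (49, 50, 64, 65):
    side = grid_side(n_qubits)
    print(n_qubits, side, side * side)
# prints:
# 49 7 49
# 50 8 64
# 64 8 64
# 65 9 81
```

For a 50-qubit circuit the square grid has 64 vertices, whereas a 7 × 8 rectangle with 56 vertices would also fit the circuit; this is the kind of unbalanced alternative that the text chooses not to use.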
We compare the transformed circuits in terms of their weighted depth and weighted size.
For both trapped-ion and superconducting qubits, two-qubit gates typically have longer
execution times and lower fidelities than single-qubit gates [31]. Even among two-qubit gates
there is a difference between execution times. Assuming fast local unitaries, the swap gate
has 1–3 times the interaction cost of a cnot depending on the physical interactions used
to realize the gates [51]. For simplicity, we assign unit cost for one-qubit gates, cost 10 for
cnot, and cost 30 for swap. We define the weighted size of a circuit as the sum of all gate
weights and the weighted depth of a circuit as the maximum-weight path in the DAG of the
circuit, where the weight of a path is the sum of the weights of the gates along it.
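Both metrics can be computed in a single pass over the gate list. The sketch below uses the gate costs stated above (1 for one-qubit gates, 10 for cnot, 30 for swap); the gate-list format, the name `weighted_size_and_depth`, and the convention that any gate name other than "cnot" or "swap" denotes a one-qubit gate are assumptions made for illustration.

```python
WEIGHTS = {"cnot": 10, "swap": 30}   # every other gate name is treated as a one-qubit gate of cost 1


def weighted_size_and_depth(gates):
    """gates: list of (name, qubits) in circuit order; returns (weighted size, weighted depth)."""
    finish = {}   # qubit -> weight of the heaviest dependency path ending on that qubit
    size = 0
    for name, qubits in gates:
        weight = WEIGHTS.get(name, 1)
        size += weight
        # A gate can only start once all of its qubits are free.
        end = max((finish.get(q, 0) for q in qubits), default=0) + weight
        for q in qubits:
            finish[q] = end
    return size, max(finish.values(), default=0)


circuit = [("h", (0,)), ("cnot", (0, 1)), ("swap", (1, 2)), ("x", (0,))]
print(weighted_size_and_depth(circuit))  # prints (42, 41)
```

Tracking the finish time per qubit is equivalent to finding the maximum-weight path in the circuit DAG, because each gate depends only on the previous gate acting on each of its qubits. In the example the heaviest path is h, cnot, swap with weight 1 + 10 + 30 = 41.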
We consider two circuit families: random circuits and QSP circuits [32]. Random
circuits have been proposed for quantum computational supremacy experiments on near-
term quantum devices [9, 11]. Such proposals typically construct random circuits so that
architecture constraints are automatically obeyed. For our purposes, random circuits provide
a class of examples with little structure for circuit transformations to exploit, so we expect
them to represent a hard case with large overhead. We generate a fixed set of 10 random
circuits of depth 20 for various qubit counts. For each layer, we bin the qubits into pairs
uniformly at random and assign each pair of qubits a Haar-random unitary from SU(4).
Finally, we decompose each unitary into the smallest possible number of cnot + SU(2)
gates [49]. This random circuit generator is provided by Qiskit [8].
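The pairing step of this construction can be sketched with the standard library alone. The code below only draws the random pairs for each layer; sampling the Haar-random unitaries and decomposing them is left to the generator cited above and is not reproduced here. The function name `random_pairing_layers` and the choice to leave one qubit idle when the qubit count is odd are assumptions of this sketch.

```python
import random


def random_pairing_layers(num_qubits, depth, seed=None):
    """Return `depth` layers, each a list of disjoint qubit pairs chosen uniformly at random."""
    rng = random.Random(seed)
    layers = []
    for _ in range(depth):
        qubits = list(range(num_qubits))
        rng.shuffle(qubits)
        # Pair up consecutive entries of the shuffled list; an odd qubit out stays idle.
        layers.append([(qubits[i], qubits[i + 1]) for i in range(0, num_qubits - 1, 2)])
    return layers


layers = random_pairing_layers(num_qubits=6, depth=20, seed=1)
print(len(layers), len(layers[0]))  # prints 20 3
```

Because consecutive layers use independent pairings, two qubits that interact in one layer are usually far apart on the architecture in the next, which is what makes these circuits a hard case for routing.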
We consider QSP circuits for Hamiltonian simulation as an example of a realistic quantum
algorithm. We use the unoptimized circuits provided in [14], decomposed into Z rotations,
cnot gates, and single-qubit Clifford gates. The QSP algorithm requires precise angles that
turn out to be expensive to compute. Therefore, [14] uses randomized angles instead, giving
a circuit that does not correctly implement the Hamiltonian simulation. Nevertheless, the
circuit corresponds to an accurate implementation of QSP, up to rotation angles, and can be
used for benchmarking resources. Furthermore, the circuit transformations we construct are
unaffected by those angles. We only consider one pair of phased iterates of the QSP algorithm
($V^\dagger_{\phi_i + \pi} V_{\phi_{i-1}}$ as in [14, Eq. 31]). A full QSP circuit for the architecture can be constructed by
iterating the mapped circuit of such phased iterates, a permutation circuit between iterations,
state preparation, and state unpreparation. The cost of the transformed phased iterates
dominates all other costs of the construction, so the total cost can be estimated by taking
our result times the number of iterations.
The circuit transformations from Section 2.2.3 are constructed from a permuter and a
mapper. We denote such circuit transformations by tf : {d,s} ×Mp, where Mp is the set of
all mappers (referred to by their abbreviations, see Section 4), “d” denotes an appropriate
depth permuter (Section 3.1), and “s” denotes the general size permuter (Section 3.2). For
example, by tf(d,greedy depth) we denote a circuit transformation with a depth permuter
for the architecture and the greedy depth mapper (Section 4).
[Figure 1: plots of weighted depth and weighted size (vertical axes) against the number of qubits (horizontal axes). Legend: greedy_swap, original, qiskit, tf(d,greedy depth), tf(d,incremental), tf(d,qiskit), tf(s,extend), tf(s,greedy size), tf(s,qiskit), tf(s,simple).]
Figure 1 The weighted depth and weighted size for transformed random circuits (top two
rows) and QSP circuits (bottom two rows) on the grid architecture (left column) and the modular
architecture (right column). We generate fixed sets of 10 random circuits for increasing qubit counts
and plot the mean and standard deviation for each data point. One QSP circuit is considered for
each data point. The metrics for the original circuit are also given to make the overhead introduced
in circuit transformations explicit; note that the original circuit does not respect the architecture
constraints. The notation tf : {d,s} × Mp indicates a circuit transformation constructed from either
an appropriate depth (“d”) permuter or the size (“s”) permuter and one of our mappers (Section 4).
Numerical Results. Figure 1 plots our results. We first consider the random circuit results.
For the grid, we find that tf(d,incremental) shows much slower growth of weighted depth than
circuit transformations that do not use depth-optimized permuters (Section 3.1). We also note
that tf(d,qiskit) performs much better than Qiskit’s circuit transformation (Section 2.2.1),
suggesting that depth-optimized permuters can offer a significant advantage. On the modular
graph, Qiskit’s circuit transformation is much better at minimizing the weighted depth, but
tf(d,qiskit) starts closing the gap for larger sizes. Unfortunately, we do not know if tf(d,qiskit)
performs better at larger sizes because Qiskit’s circuit transformation is not fast enough to
generate the relevant data. Up to 100 qubits, tf(s,qiskit) achieves the best weighted size on
grid architectures, and tf(s,simple) does best on modular architectures up to 121 qubits. For
all sizes, the greedy swap circuit transformation (Section 2.2.2) is among the best
at optimizing for weighted circuit size. The greedy swap circuit transformation is also able
to transform larger circuits within the time limit, as expected from its lower time complexity.
For larger QSP circuits, the greedy swap circuit transformation (Section 2.2.2) is the clear
winner in both weighted depth and weighted size, suggesting that it may be a good approach
for practical quantum circuits. Surprisingly, tf(s,qiskit) also performs fairly well at minimizing
the depth despite targeting the circuit size.
6 Future Work
We would like to better understand what circuit transformations work best for which
architectures, quantum algorithms, and objective functions. We also would like to use the
tools of Partial Routing via Matchings and Partial Token Swapping to establish
bounds on the overhead of specific architectures. Ideally, we could use these tools and circuit
transformations to design architectures that offer good performance subject to realistic
hardware constraints and to compute realistic resource estimates for implementations of
quantum algorithms.
There are many ways our methods could be improved. It would be interesting to know
whether one can do better than just using swap gates to route qubits. Our mapper algorithms
may also be improved by including some form of lookahead to consider later layers of the
given circuit, or by specializing mappers to particular architectures.
Modeling the architecture as a simple graph loses information about the underlying
hardware. For example, in the IBM system the architecture edges have directionality
indicating the control and target of cnots. In implementations of the modular architecture,
the interconnecting links are probably much noisier and slower than local operations. In
general, gate costs and times can vary significantly across a hardware implementation
and sometimes even vary over time [41]. Adapting to variable costs and keeping track of
operations performed asynchronously is challenging but could be worthwhile for architectures
that support a mixture of fast and slow operations.
Finally, we hope that future progress on the challenges addressed in this paper will be
facilitated by a suitable set of benchmarks of large quantum circuits. We publicly make
available and license our source code, benchmark circuits, and results (in TSV format) [47]
and encourage others to do the same.
References
1 M. Ajtai, J. Komlós, and E. Szemerédi. An $O(n \log n)$ sorting network. In Proceedings of the
fifteenth annual ACM symposium on Theory of computing - STOC '83. ACM Press, 1983.
doi:10.1145/800061.808726.
2 Noga Alon, F. R. K. Chung, and R. L. Graham. Routing Permutations on Graphs via
Matchings. SIAM Journal on Discrete Mathematics, 7(3):513–530, May 1994. doi:10.1137/
s0895480192236628.
3 Indranil Banerjee and Dana Richards. New Results on Routing via Matchings on Graphs.
In Fundamentals of Computation Theory, pages 69–81. Springer Berlin Heidelberg, 2017.
doi:10.1007/978-3-662-55751-8_7.
4 Indranil Banerjee, Dana Richards, and Igor Shinkar. Sorting Networks on Restricted Topologies.
In SOFSEM 2019: Theory and Practice of Computer Science, pages 54–66. Springer
International Publishing, 2019. doi:10.1007/978-3-030-10801-4_6.
5 Aniruddha Bapat, Zachary Eldredge, James R. Garrison, Abhinav Deshpande, Frederic T.
Chong, and Alexey V. Gorshkov. Unitary entanglement construction in hierarchical networks.
Physical Review A, 98(6), 2018. doi:10.1103/PhysRevA.98.062328.
6 L. Barrière, C. Dalfó, M. A. Fiol, and M. Mitjana. The generalized hierarchical product of
graphs. Discrete Mathematics, 309(12):3871–3881, June 2009. doi:10.1016/j.disc.2008.10.
028.
7 R. Beals, S. Brierley, O. Gray, A. W. Harrow, S. Kutin, N. Linden, D. Shepherd, and M. Stather.
Efficient distributed quantum computing. Proceedings of the Royal Society A: Mathematical,
Physical and Engineering Sciences, 469(2153), 2013. doi:10.1098/rspa.2012.0686.
8 Luciano Bello, Jim Challenger, Andrew Cross, Ismael Faro, Jay Gambetta, Juan Gomez, Ali
Javadi-Abhari, Paco Martin, Diego Moreda, Jesus Perez, Erick Winston, and Chris Wood.
Qiskit, 2017. URL: https://www.qiskit.org/.
9 Sergio Boixo, Sergei V. Isakov, Vadim N. Smelyanskiy, Ryan Babbush, Nan Ding, Zhang
Jiang, Michael J. Bremner, John M. Martinis, and Hartmut Neven. Characterizing quantum
supremacy in near-term devices. Nature Physics, 14(6):595–600, 2018.
doi:10.1038/s41567-018-0124-x.
10 Édouard Bonnet, Tillmann Miltzow, and Paweł Rzążewski. Complexity of Token Swapping and
its Variants. Algorithmica, 80(9):2656–2682, October 2017. doi:10.1007/s00453-017-0387-0.
11 Adam Bouland, Bill Fefferman, Chinmay Nirkhe, and Umesh Vazirani. On the complexity
and verification of quantum random circuit sampling. Nature Physics, October 2018. doi:
10.1038/s41567-018-0318-2.
12 Teresa Brecht, Wolfgang Pfaff, Chen Wang, Yiwen Chu, Luigi Frunzio, Michel H. Devoret, and
Robert J. Schoelkopf. Multilayer microwave integrated quantum circuits for scalable quantum
computing. npj Quantum Information, 2(16002), 2016. doi:10.1038/npjqi.2016.2.
13 Stephen Brierley. Efficient Implementation of Quantum Circuits with Limited Qubit Interactions.
Quantum Info. Comput., 17(13-14):1096–1104, November 2017.
14 Andrew M. Childs, Dmitri Maslov, Yunseong Nam, Neil J. Ross, and Yuan Su. Toward the
first quantum simulation with quantum speedup. Proceedings of the National Academy of
Sciences, 115(38):9456–9461, 2018. doi:10.1073/pnas.1801723115.
15 Byung-Soo Choi and Rodney van Meter. On the Effect of Quantum Interaction Distance on
Quantum Addition Circuits. ACM Journal on Emerging Technologies in Computing Systems,
7(3):1–17, August 2011. doi:10.1145/2000502.2000504.
16 Byung-Soo Choi and Rodney van Meter. A $\Theta(\sqrt{N})$-depth Quantum Adder on the 2D NTC
Quantum Computer Architecture. J. Emerg. Technol. Comput. Syst., 8(3):24:1–24:22, August
2012. doi:10.1145/2287696.2287707.
17 Robert W. Floyd. Algorithm 97: Shortest path. Communications of the ACM, 5(6):345, June
1962. doi:10.1145/367766.368168.
18 Austin G. Fowler, Simon J. Devitt, and Lloyd C. L. Hollenberg. Implementation of Shor’s
Algorithm on a Linear Nearest Neighbour Qubit Array. Quantum Info. Comput., 4:237–251,
2004.
19 Google Quantum AI Lab. A Preview of Bristlecone, Google’s New Quantum Processor. URL:
https://ai.googleblog.com/2018/03/a-preview-of-bristlecone-googles-new.html.
20 Jeff Heckey, Shruti Patil, Ali JavadiAbhari, Adam Holmes, Daniel Kudrow, Kenneth R. Brown,
Diana Franklin, Frederic T. Chong, and Margaret Martonosi. Compiler Management of
Communication and Parallelism for Quantum Computation. In Proceedings of the Twentieth
International Conference on Architectural Support for Programming Languages and Operating
Systems - ASPLOS '15. ACM Press, 2015. doi:10.1145/2694344.2694357.
21 Steven Herbert. On the depth overhead incurred when running quantum algorithms on
near-term quantum computers with limited qubit connectivity.
22 Yuichi Hirata, Masaki Nakanishi, Shigeru Yamashita, and Yasuhiko Nakashima. An Efficient
Method to Convert Arbitrary Quantum Circuits to Ones on a Linear Nearest Neighbor Architecture.
In 2009 Third International Conference on Quantum, Nano and Micro Technologies.
IEEE, February 2009. doi:10.1109/icqnm.2009.25.
23 John E. Hopcroft and Richard M. Karp. An $n^{5/2}$ Algorithm for Maximum Matchings
in Bipartite Graphs. SIAM Journal on Computing, 2(4):225–231, December 1973. doi:
10.1137/0202019.
24 IBM Q Team. IBM Q Experience Devices, 2018. URL: https://quantumexperience.ng.
bluemix.net/qx/devices.
25 Donald E. Knuth. Networks for Sorting, volume 3 of The Art of Computer Programming,
pages 219–247. Addison-Wesley Professional, second edition, 1998.
26 Manfred Kunde. Optimal sorting on multi-dimensionally mesh-connected computers. In
STACS 87, pages 408–419. Springer-Verlag, 1987. doi:10.1007/bfb0039623.
27 Samuel A. Kutin, David Petrie Moulton, and Lawren M. Smithline. Computation at a distance.
Chicago Journal of Theoretical Computer Science, 13(1):1–17, 2007. doi:10.4086/cjtcs.2007.
001.
28 L. Lao, B. van Wee, I. Ashraf, J. van Someren, N. Khammassi, K. Bertels, and C. G. Almudever.
Mapping of lattice surgery-based quantum circuits on surface code architectures. Quantum
Science and Technology, 4(1):015005, September 2018. doi:10.1088/2058-9565/aadd1a.
29 Gushu Li, Yufei Ding, and Yuan Xie. Tackling the Qubit Mapping Problem for NISQ-Era
Quantum Devices.
30 Chia-Chun Lin, Susmita Sur-Kolay, and Niraj K. Jha. PAQCS: Physical design-aware fault-
tolerant quantum circuit synthesis. IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, 23(7):1221–1234, July 2015. doi:10.1109/tvlsi.2014.2337302.
31 Norbert M. Linke, Dmitri Maslov, Martin Roetteler, Shantanu Debnath, Caroline Figgatt,
Kevin A. Landsman, Kenneth Wright, and Christopher Monroe. Experimental comparison
of two quantum computing architectures. Proceedings of the National Academy of Sciences,
114(13):3305–3310, March 2017. doi:10.1073/pnas.1618020114.
32 Guang Hao Low and Isaac L. Chuang. Optimal Hamiltonian Simulation by Quantum Signal
Processing. Physical Review Letters, 118(1), January 2017. doi:10.1103/physrevlett.118.
010501.
33 Aaron Lye, Robert Wille, and Rolf Drechsler. Determining the minimal number of swap gates
for multi-dimensional nearest neighbor quantum circuits. In The 20th Asia and South Pacific
Design Automation Conference. IEEE, January 2015. doi:10.1109/aspdac.2015.7059001.
34 D. Maslov, S. M. Falconer, and M. Mosca. Quantum Circuit Placement. IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, 27(4):752–763, 2008. doi:
10.1109/tcad.2008.917562.
35 Dmitri Maslov. Linear depth stabilizer and quantum Fourier transformation circuits with no
auxiliary qubits in finite-neighbor quantum architectures. Physical Review A, 76(5), November
2007. doi:10.1103/physreva.76.052310.
36 Tzvetan S. Metodi, Darshan D. Thaker, Andrew W. Cross, Frederic T. Chong, and Isaac L.
Chuang. Scheduling physical operations in a quantum information processor. In Eric J. Donkor,
Andrew R. Pirich, and Howard E. Brandt, editors, Quantum Information and Computation
IV, page 6244. SPIE, 2006. doi:10.1117/12.666419.
37 Silvio Micali and Vijay V. Vazirani. An $O(\sqrt{|V|}\,|E|)$ algorithm for finding maximum matching
in general graphs. In 21st Annual Symposium on Foundations of Computer Science (sfcs 1980).
IEEE, October 1980. doi:10.1109/sfcs.1980.12.
38 Tillmann Miltzow, Lothar Narins, Yoshio Okamoto, Günter Rote, Antonis Thomas, and
Takeaki Uno. Approximation and Hardness for Token Swapping. In Piotr Sankowski and
Christos Zaroliagis, editors, Annual European Symposium on Algorithms, volume 24, pages
185:1–185:15. Leibniz International Proceedings in Informatics (LIPIcs), 2016.
39 C. Monroe and J. Kim. Scaling the Ion Trap Quantum Processor. Science, 339(6124):1164–1169,
2013. doi:10.1126/science.1231298.
40 C. Monroe, R. Raussendorf, A. Ruthven, K. R. Brown, P. Maunz, L.-M. Duan, and J. Kim.
Large-scale modular quantum-computer architecture with atomic memory and photonic
interconnects. Physical Review A, 89(2), February 2014. doi:10.1103/physreva.89.022317.
41 Prakash Murali, Jonathan M. Baker, Ali Javadi Abhari, Frederic T. Chong, and Margaret
Martonosi. Noise-Adaptive Compiler Mappings for Noisy Intermediate-Scale Quantum Computers,
2019.
42 Flemming Nielson, Hanne Riis Nielson, and Chris Hankin. Principles of Program Analysis.
Springer Berlin Heidelberg, 1999. doi:10.1007/978-3-662-03811-6.
43 M. Pedram and A. Shafaei. Layout Optimization for Quantum Circuits with Linear Nearest
Neighbor Architectures. IEEE Circuits and Systems Magazine, 16(2):62–74, 2016. doi:
10.1109/MCAS.2016.2549950.
44 Rigetti. QPU Specifications, 2018. URL: https://www.rigetti.com/qpu.
45 David J. Rosenbaum. Optimal Quantum Circuits for Nearest-Neighbor Architectures. In
Simone Severini and Fernando Brandao, editors, 8th Conference on the Theory of Quantum
Computation, Communication and Cryptography (TQC 2013), volume 22 of Leibniz International
Proceedings in Informatics (LIPIcs), pages 294–307, Dagstuhl, Germany, 2013. Schloss
Dagstuhl–Leibniz-Zentrum fuer Informatik. doi:10.4230/LIPIcs.TQC.2013.294.
46 Mehdi Saeedi, Robert Wille, and Rolf Drechsler. Synthesis of quantum circuits for linear
nearest neighbor architectures. Quantum Information Processing, 10(3):355–377, June 2011.
doi:10.1007/s11128-010-0201-2.
47 Eddie Schoute, Cem Unsal, and Andrew Childs. arct, 2019. URL: https://gitlab.umiacs.
umd.edu/amchilds/arct.
48 Alireza Shafaei, Mehdi Saeedi, and Massoud Pedram. Qubit placement to minimize communication
overhead in 2D quantum architectures. In 2014 19th Asia and South Pacific Design Automation
Conference (ASP-DAC). IEEE, January 2014. doi:10.1109/aspdac.2014.6742940.
49 Farrokh Vatan and Colin Williams. Optimal quantum circuits for general two-qubit gates.
Physical Review A, 69(3), 2004. doi:10.1103/physreva.69.032315.
50 Davide Venturelli, Minh Do, Eleanor Rieffel, and Jeremy Frank. Compiling quantum circuits
to realistic hardware architectures using temporal planners. Quantum Science and Technology,
3(2):025004, February 2018. doi:10.1088/2058-9565/aaa331.
51 G. Vidal, K. Hammerer, and J. I. Cirac. Interaction Cost of Nonlocal Gates. Physical Review
Letters, 88(23), 2002. doi:10.1103/physrevlett.88.237902.
52 Mark Whitney, Nemanja Isailovic, Yatish Patel, and John Kubiatowicz. Automated generation
of layout and control for quantum circuits. In Proceedings of the 4th international conference on
Computing frontiers - CF '07, pages 83–94. ACM Press, 2007. doi:10.1145/1242531.1242546.
53 Robert Wille, Oliver Keszocze, Marcel Walter, Patrick Rohrs, Anupam Chattopadhyay, and
Rolf Drechsler. Look-ahead schemes for nearest neighbor optimization of 1D and 2D quantum
circuits. In 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC).
IEEE, January 2016. doi:10.1109/aspdac.2016.7428026.
54 Robert Wille, Aaron Lye, and Rolf Drechsler. Exact Reordering of Circuit Lines for Nearest
Neighbor Quantum Architectures. IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, 33(12):1818–1831, December 2014. doi:10.1109/tcad.2014.2356463.
55 Katsuhisa Yamanaka, Erik D. Demaine, Takehiro Ito, Jun Kawahara, Masashi Kiyomi, Yoshio
Okamoto, Toshiki Saitoh, Akira Suzuki, Kei Uchizawa, and Takeaki Uno. Swapping Labeled
Tokens on Graphs. In Lecture Notes in Computer Science, pages 364–375. Springer International
Publishing, 2014. doi:10.1007/978-3-319-07890-8_31.
56 Louxin Zhang. Optimal Bounds for Matching Routing on Trees. SIAM Journal on Discrete
Mathematics, 12(1):64–77, January 1999. doi:10.1137/s0895480197323159.
57 Alwin Zulehner, Alexandru Paler, and Robert Wille. An Efficient Methodology for Mapping
Quantum Circuits to the IBM QX Architectures. IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, 2018. doi:10.1109/tcad.2018.2846658.
58 Alwin Zulehner and Robert Wille. Compiling SU(4) quantum circuits to IBM QX architectures.
In Proceedings of the 24th Asia and South Pacific Design Automation Conference on - ASPDAC
'19, pages 185–190. ACM Press, 2019. doi:10.1145/3287624.3287704.
A Partial Permutations via Transpositions
Here we present proofs and statements that were omitted in the main text of Section 3.
A.1 Hierarchical Product
Proof of Lemma 3. Let G = (U, V, E) be a bipartite multi-graph, with U = V := [n_1] the
left and right vertex sets, and the edge multi-set
\[ E = \{ (v_1, \pi(v)_1) \mid v \in \operatorname{dom}(\pi) \} . \tag{13} \]
Each vertex k ∈ U belongs to at most d edges (k, l), for l ∈ V and k ≠ l, and each vertex
l′ ∈ V belongs to at most d edges (k′, l′), for k′ ∈ U and k′ ≠ l′. However, for any k ∈ U
there could be as many as n_2 edges (k, k). For all k ∈ U we remove as many (k, k) ∈ E as
necessary to ensure that the maximum degree of any vertex in G is d.
We make G d-regular by repeating the following: If ∄k ∈ U with deg(k) < d, we are done.
Otherwise, such a k exists and ∃k′ ∈ V with deg(k′) < d, since
\[ \sum_{k \in U} \deg(k) = \sum_{k' \in V} \deg(k') . \tag{14} \]
It follows that there exist vertices u ∈ V_k \ dom(π) and v ∈ V_{k′} \ image(π). For the purposes
of this proof, we set π(u) = v, effectively adding an edge (k, k′) to E.
Now we have modified π so that G is d-regular. By Hall's marriage theorem, there exists
a perfect matching in G, and removing it results in a (d − 1)-regular graph. We iterate
this to find d distinct perfect matchings in G. Each edge (k, k′) ∈ E corresponds to some
v ∈ V_k and u ∈ V_{k′}, with π(v) = u. Therefore, each perfect matching corresponds to a set of
representative vertices, R_i. Since all perfect matchings are distinct, and all e ∈ E are covered
by some matching, the lemma follows. ◀
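The matching-peeling step of this proof can be illustrated with a short sketch. It assumes the regularized multi-graph is already given as a multiplicity matrix (a hypothetical input format, with mult[k][l] counting parallel edges from left vertex k to right vertex l) and uses a plain augmenting-path search rather than any particular algorithm from the paper. The function names are illustrative only.

```python
def perfect_matching(mult, n):
    """Augmenting-path (Kuhn) search for a perfect matching.

    mult[k][l] is the number of parallel edges between left vertex k and
    right vertex l. Returns match_right with match_right[l] = k.
    """
    match_right = [None] * n

    def augment(k, seen):
        for l in range(n):
            if mult[k][l] > 0 and l not in seen:
                seen.add(l)
                if match_right[l] is None or augment(match_right[l], seen):
                    match_right[l] = k
                    return True
        return False

    for k in range(n):
        if not augment(k, set()):
            raise ValueError("no perfect matching; is the graph regular?")
    return match_right


def decompose(mult):
    """Peel a d-regular bipartite multi-graph into d perfect matchings."""
    n = len(mult)
    mult = [row[:] for row in mult]  # work on a copy
    d = sum(mult[0])                 # common degree of a regular graph
    matchings = []
    for _ in range(d):
        match_right = perfect_matching(mult, n)
        matching = [(match_right[l], l) for l in range(n)]
        for k, l in matching:
            mult[k][l] -= 1          # what remains is (d - 1)-regular
        matchings.append(matching)
    return matchings
```

Each returned matching uses one edge copy per vertex, so in the setting of the proof it would correspond to one set of representative vertices.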
A.2 Modular Graph
Large-scale quantum computation may benefit from a modular design, with many interconnected
subunits [39, 40, 12]. As a simple model of a modular quantum processor
consisting of n_1 modules with n_2 qubits each, we consider the modular graph MG(n_1, n_2) :=
Π_{e⃗_1}(K_{n_1}, K_{n_2}) = (V, E), where e⃗_i ∈ {0, 1}^{n_2}, for i ∈ [n_2], is the ith standard basis vector. In
this architecture, two qubits in the same module can be directly coupled, and any two modules
can be coupled through their unique communicator qubits. With one minor modification to
Theorem 4, we get the following bounds on the routing number of the modular graph.
▶ Corollary 8. For n_1, n_2 ∈ ℕ and π : V ↼⇀ V, we have deg(π) ≤ rt(MG(n_1, n_2), π) ≤
3 deg(π) + 2.
Proof. Directly applying Theorem 4 gives rt(MG(n_1, n_2), π) ≤ 4 deg(π) + 2. However, only
one token needs to be routed to the communicator vertex in every round of Algorithm 3.1. We
note that this can be routed with one set of parallel transpositions (Section 3.1), saving us one
matching every round. To show the lower bound, we apply Theorem 5 with ham(e⃗_1) = 1. ◀
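To make the modular graph concrete, the following sketch builds its edge set. It assumes vertices are labelled (module, index) and that index 0 is the communicator qubit of each module; both the labelling and the function name are illustrative choices, not taken from the paper's software.

```python
from itertools import combinations


def modular_graph(n1, n2):
    """Edge set of MG(n1, n2) with vertices labelled (module, index)."""
    edges = set()
    for m in range(n1):
        # Complete graph K_{n2} inside every module.
        for a, b in combinations(range(n2), 2):
            edges.add(((m, a), (m, b)))
    # Complete graph K_{n1} on the communicator qubits (index 0, by assumption).
    for m1, m2 in combinations(range(n1), 2):
        edges.add(((m1, 0), (m2, 0)))
    return edges
```

Under this labelling the graph has n_1 · n_2(n_2 − 1)/2 intra-module edges and n_1(n_1 − 1)/2 inter-module edges.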
A.3 Partial Token Swapping
Proof of Theorem 7. The proof is very similar to [38, Theorem 7] with some minor modifications
to account for no-token swaps. Let
\[ S := \sum_{v \in \operatorname{dom}(\pi)} d(v, \pi(v)) , \tag{15} \]
and let s_u, s_h, s_nt be the number of unhappy, happy, and no-token swaps, respectively. We
know that rs(G, π) ≥ S/2 since each swap can reduce S by at most two. A no-token swap
reduces S by one. A happy swap chain of length ℓ reduces S by ℓ + 1. As such, over the
course of the algorithm, s_h + s_nt ≤ S. For an unhappy swap, the token that is swapped
away from its destination must next be involved in a happy swap or a no-token swap, so
s_u ≤ s_h + s_nt. Thus, we get s_u + s_h + s_nt ≤ 4 · rs(G, π) as desired. ◀
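For reference, the final bound follows by chaining the three inequalities stated in the proof. Here s_u, s_h, s_{nt} count the unhappy, happy, and no-token swaps, and S is the total distance from (15).

```latex
\[
  s_u + s_h + s_{nt}
  \;\le\; 2\,(s_h + s_{nt})   % since s_u \le s_h + s_{nt}
  \;\le\; 2S                  % since s_h + s_{nt} \le S
  \;\le\; 4\,\mathrm{rs}(G,\pi) % since \mathrm{rs}(G,\pi) \ge S/2
\]
```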
B Specifics of Mappers
Here, we describe our mappers (Section 4) in full. Let M be a maximum matching in the
architecture graph.
B.1 Greedy Depth Mapper
The greedy depth mapper iteratively places the highest-cost gate at its lowest-cost location,
where cost is measured in terms of the routing number to achieve the placement. More
precisely, we initialize the set of used vertices U := ∅ and find a placement p′ := {q_1 ↦
v_1, q_2 ↦ v_2} that attains the optimum
\[ \max_{(q_1,q_2) \in \mathrm{tg}(L)} \; \min_{(v_1,v_2) \in M} \mathrm{rt}\bigl( G, (p \cup \{q_1 \mapsto v_1, q_2 \mapsto v_2\}) \circ \hat{p}^{-1} \bigr) , \tag{16} \]
where we consider both orderings of edges from M, (v, u), (u, v) ∈ M, since edges are
undirected. Then, we update U ← U ∪ image(p′) and recompute M for the graph G[V \ U]
(recall (10)); we remove the gate associated to (q_1, q_2) from L; we set p ← p ∪ p′; and we
iterate until tg(L) = ∅ or M = ∅. Finally, we return the placement p.
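The loop structure of this mapper can be sketched as follows. This is only an illustration under several assumptions: `cost` is a caller-supplied stand-in for the routing number in (16), evaluated on the candidate placement; `greedy_maximal_matching` is a simple stand-in for a true maximum matching; and all names are hypothetical rather than taken from the paper's implementation.

```python
def greedy_maximal_matching(edges, used):
    """Stand-in for a maximum matching: pick disjoint edges avoiding `used`."""
    taken, matching = set(used), []
    for u, v in edges:
        if u not in taken and v not in taken:
            matching.append((u, v))
            taken.update((u, v))
    return matching


def greedy_depth_mapper(gates, edges, cost):
    """gates: disjoint two-qubit gates (q1, q2); cost(placement) -> number."""
    placement, used = {}, set()
    remaining = list(gates)
    matching = greedy_maximal_matching(edges, used)
    while remaining and matching:
        best = None  # (cost, gate, location) of the highest-cost gate so far
        for q1, q2 in remaining:
            # Cheapest location for this gate, trying both edge orientations.
            options = [
                (cost({**placement, q1: a, q2: b}), (a, b))
                for u, v in matching
                for a, b in ((u, v), (v, u))
            ]
            c, loc = min(options, key=lambda t: t[0])
            if best is None or c > best[0]:
                best = (c, (q1, q2), loc)
        _, (q1, q2), (v1, v2) = best
        placement[q1], placement[q2] = v1, v2
        used.update((v1, v2))              # mark the vertices as used
        remaining.remove((q1, q2))
        matching = greedy_maximal_matching(edges, used)
    return placement
```

Passing the cost as a callable keeps the sketch independent of how the routing number is bounded or computed for a given architecture.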
B.2 Incremental Depth Mapper
Instead of trying to place (almost) all gates in L, the incremental depth mapper guarantees
placement of only the lowest-cost gate, as given by the routing number, and incrementally
improves the situation for the other gates. Specifically, we first find a placement p_min :=
{q_1 ↦ v_1, q_2 ↦ v_2} that attains the optimum
\[ c'_{\min} := \min_{(q_1,q_2) \in \mathrm{tg}(L)} \; \min_{(v_1,v_2) \in E} \mathrm{rt}\bigl( G, (p \cup \{q_1 \mapsto v_1, q_2 \mapsto v_2\}) \circ \hat{p}^{-1} \bigr) , \tag{17} \]
where we consider both orderings of E, (u, v), (v, u) ∈ E. We set p ← p_min and define
U := {v_1, v_2}. Let c_min := max{c′_min, 1}.
We find a placement for the remaining two-qubit gates that (individually) does not exceed
c_min. We iterate in arbitrary order over (q_1, q_2) ∈ tg(L) and do the following: For i ∈ [2], we
construct a set of eligible vertices
\[ U_i := \bigl\{ v \in V \setminus U \;\big|\; \mathrm{rt}\bigl( G, (p \cup \{q_i \mapsto v\}) \circ \hat{p}^{-1} \bigr) \le c_{\min} \bigr\} . \tag{18} \]
Now we try to find v_1^* ≠ v_2^* as (v_1^*, v_2^*) := arg min_{(v_1,v_2) ∈ U_1×U_2} d(v_1, v_2). If such (v_1^*, v_2^*) does
not exist, we do not include q_1 and q_2 in p; otherwise, we set p ← p ∪ {q_1 ↦ v_1^*, q_2 ↦ v_2^*}
and update U ← U ∪ {v_1^*, v_2^*}. After iterating over all gates in tg(L), we return p.
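The per-gate step of the second phase can be sketched as below. The callable `cost(q, v)` is an assumed stand-in for the routing number of placing qubit q alone at vertex v, as in (18), `dist` is a precomputed distance matrix, and the function name is hypothetical.

```python
from itertools import product


def place_gate(q1, q2, free_vertices, dist, cost, c_min):
    """Closest pair of distinct eligible vertices for (q1, q2), or None."""
    # Eligible sets U_1 and U_2: free vertices whose individual cost
    # does not exceed c_min.
    eligible = [
        [v for v in free_vertices if cost(q, v) <= c_min] for q in (q1, q2)
    ]
    pairs = [
        (dist[v1][v2], v1, v2)
        for v1, v2 in product(*eligible)
        if v1 != v2
    ]
    if not pairs:
        return None  # the gate is left out of the placement
    _, v1, v2 = min(pairs, key=lambda t: t[0])
    return v1, v2
```

The caller would add the returned vertices to the used set before handling the next gate.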
B.3 Greedy Size Mapper
The greedy size mapper is the same as the greedy depth mapper, except that we replace rt(·)
with rs(·) in (16).
B.4 Simple Size Mapper
The simple size mapper places only the lowest-cost gate at its lowest-cost location. More
precisely, we find a placement p := {q_1 ↦ v_1, q_2 ↦ v_2} that attains the optimum
\[ \min_{(q_1,q_2) \in \mathrm{tg}(L)} \; \min_{(v_1,v_2) \in E} \mathrm{rs}\bigl( G, (p \cup \{q_1 \mapsto v_1, q_2 \mapsto v_2\}) \circ \hat{p}^{-1} \bigr) \tag{19} \]
where we consider all orderings of the edges of E, and return p. Note that we have replaced
rt(·) with rs(·) in (17).
B.5 Extension Size Mapper
The extension size mapper first finds an initial placement p using (19). Let c′_min be the value
attained at the optimum for (19). After finding the initial placement, we try to place
another gate only if it is cheaper to place now rather than in a later call to the mapper.
Specifically, for the current p and p̂, we define p̂′ : Q → V as the placement after performing
the permutation circuit constructed from transpositions achieving rs(G, p ∘ p̂^{-1}). Let U := ∅.
Now we define a heuristic for the number of saved transpositions, s : Q × Q → ℕ, as
\[ s(q_1, q_2) := \mathrm{rs}\bigl( G, p \circ \hat{p}^{-1} \bigr) + \min_{(v_1,v_2) \in E} \mathrm{rs}\bigl( G, \{q_1 \mapsto v_1, q_2 \mapsto v_2\} \circ (\hat{p}')^{-1} \bigr) - \min_{(u_1,u_2) \in E'} \mathrm{rs}\bigl( G, (p \cup \{q_1 \mapsto u_1, q_2 \mapsto u_2\}) \circ \hat{p}^{-1} \bigr) , \tag{20} \]
where E′ is the edge set of G[V \ U] and we consider all orderings of the edges of E and E′.
The extension size mapper iterates the following. We find the gate (q_1^*, q_2^*) ∈ tg(L)
attaining s_max := max_{(q_1,q_2) ∈ tg(L)} s(q_1, q_2), and let (u_1^*, u_2^*) ∈ E′ be the edge attaining s_max
as given by (20). If s_max ≥ 0, we set p ← p ∪ {q_1^* ↦ u_1^*, q_2^* ↦ u_2^*}, remove the gate (q_1^*, q_2^*)
from L, update U ← U ∪ {u_1^*, u_2^*}, and iterate; otherwise, we stop and return p.
B.6 Qiskit-based Mapper
Finally, we implement a mapper that is based on Qiskit’s circuit transformation (described
in Section 2.2.1). Since this is a mapper, we only execute one iteration of the circuit
transformation: for the first layer L. We also do not modify the output circuit, but instead
return the final pˆ that would be induced by executing all swaps found during the mapping
process.
We make three changes to Qiskit's circuit transformation. The first is that when minimizing
S, instead of choosing a maximal set of swaps in every iteration, we choose only one
swap along an edge e ∈ E that minimizes S. The second is that the upper bound on the
number of iterations is raised to |V|^2, since we only apply one swap per iteration. The third
is that, if no trial is successful, we fall back to the simple size mapper and return the
placement it finds, which places only one gate in this iteration.
C Time Complexity Analysis
To show that our proposed algorithms have polynomial worst case time complexity, we
compute time complexities of our circuit transformations, permuters, and mappers explicitly.
C.1 Circuit Transformations
Greedy Swap Circuit Transformation. We ignore the initial placement since it is insignificant
for large circuits. A gate from L is executed in at most diam(G) iterations, where
diam(G) is the diameter of G. In every iteration, O(|E|) edges are checked to determine gates
that can be executed and swaps that will decrease R. Therefore, the total time complexity
is O(|C||E| diam(G)), where |C| denotes the size of circuit C. There is a tighter bound in
terms of the output circuit C′: since every iteration creates a layer in the transformed circuit,
the complexity is O(depth(C′)|E|), where depth(C′) denotes the circuit depth of C′.
Specialized Circuit Transformations (Section 2.2.3). We again ignore the time complexity
of computing the initial placement. Let t_m be an upper bound on the time complexity of the
mapper, and let t_p be an upper bound on the time complexity of the permuter. Computing
p ∘ p̂^{-1} takes time O(|V|). The number of transpositions produced by the permuter is at
most t_p, so executing the associated swaps takes time O(t_p). Only one gate from L may
be executed every iteration, so we upper bound the number of iterations by |C|. We find a
time complexity of O(|C| (t_m + |V| + t_p)). Clearly, if t_p, t_m ∈ poly(|C|, |V|) then our circuit
transformation is also poly-time as desired.
C.2 Permuters
We show that the permuters are polynomial-time in the input size.
Complete Graph. The time complexity of the Routing via Matchings algorithm for
K_n is O(n) [2]. The other operations described above also take time O(n), so we get a time
complexity of O(n) for the complete graph permuter.
Path Graph. Constructing the completion π̂ takes time O(|V|). The total complexity for
running the path permuter is O(|V|^2), where the time complexity of the Routing via
Matchings algorithm [2] dominates the construction of π̂.
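As an illustration of where the quadratic cost comes from, a standard way to route on a path is odd-even transposition sort, which needs at most |V| rounds of up to |V|/2 comparisons each. The sketch below assumes a full permutation is given (that is, the completion has already been constructed) and is not necessarily the exact procedure used by the path permuter; the function name is hypothetical.

```python
def route_path(dest):
    """Route tokens on a path by odd-even transposition sort.

    dest[i] is the destination vertex of the token that starts at vertex i.
    Returns a list of layers; each layer is a list of disjoint swaps (i, i+1)
    that can be applied in parallel.
    """
    tokens = list(dest)
    n = len(tokens)
    layers = []
    for r in range(n):
        layer = []
        # Even rounds compare (0,1), (2,3), ...; odd rounds (1,2), (3,4), ...
        for i in range(r % 2, n - 1, 2):
            if tokens[i] > tokens[i + 1]:
                tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
                layer.append((i, i + 1))
        if layer:
            layers.append(layer)
    return layers
```

Replaying the returned layers on the input moves every token to its destination, and each layer is a matching of the path.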
Hierarchical Product. Let t_1 and t_2 upper bound the time complexity of algorithms for
Partial Routing via Matchings on G_1 and G_2, respectively. We first find deg(π) distinct
sets of representative vertices by Lemma 3. The time to find one set of representative vertices
is dominated by the time to find the maximum bipartite matching, O(n_1^{2.5}) [23]. Then, for
⌈deg(π)/ham(v⃗)⌉ iterations, we route on all copies of G_2 and then G_1 in parallel. Overall,
we get a time complexity of
\[ O\!\left( \deg(\pi) \cdot n_1^{2.5} + \left\lceil \frac{\deg(\pi)}{\operatorname{ham}(\vec{v})} \right\rceil \bigl( \operatorname{ham}(\vec{v})\, t_1 + n_1 t_2 \bigr) + n_1 t_2 \right) . \tag{21} \]
Modular Architecture. We evaluate the time complexity of this permuter using Equation (21).
We have t_1 = O(n_1) and t_2 = O(n_2), giving an overall time complexity of
O(d n_1^{2.5} + n_1 n_2), where we noted that t_2 = O(1) while doing the deg(π) rounds of routing.
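The substitution into Equation (21) can be written out explicitly. The sketch below assumes ham(e⃗_1) = 1 for the modular graph, writes d := deg(π), and uses t_2 = O(1) inside the d routing rounds while keeping t_2 = O(n_2) for the final routing step.

```latex
\begin{align*}
  & O\bigl( d\, n_1^{2.5} + \lceil d/1 \rceil \,( 1 \cdot O(n_1) + n_1 \cdot O(1) ) + n_1 \cdot O(n_2) \bigr) \\
  &\quad = O\bigl( d\, n_1^{2.5} + d\, n_1 + n_1 n_2 \bigr) \\
  &\quad = O\bigl( d\, n_1^{2.5} + n_1 n_2 \bigr),
  \qquad \text{since } d\, n_1 \le d\, n_1^{2.5}.
\end{align*}
```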
Approximation Algorithm for Partial Token Swapping. Computing an all-to-all distance
matrix takes time Θ(|V|^3) using the Floyd-Warshall algorithm [17], but this cost needs only
to be incurred once for a graph so we do not include it. A happy or unhappy swap can be
found in time O(|E|) by finding cycles in an auxiliary directed graph [38]. Similarly, finding
no-token swaps has time complexity O(|E|). Therefore, we get a total time complexity of
O(S|E|) ≤ O(|V|^2 |E|).
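The one-time distance computation and the quantity S from (15) can be sketched as follows. The code assumes a connected, unweighted graph on vertices 0, ..., n − 1 and a partial permutation given as a dictionary; the function names are illustrative.

```python
import math


def floyd_warshall(n, edges):
    """All-pairs shortest-path distances of an unweighted graph, Theta(n^3)."""
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def total_distance(dist, pi):
    """S from (15) and the lower bound ceil(S / 2) on the number of swaps.

    pi maps each vertex in dom(pi) to its destination; vertices outside
    dom(pi) carry no token and are simply absent from the dictionary.
    """
    S = sum(dist[v][target] for v, target in pi.items())
    return S, (S + 1) // 2
```

Since each swap reduces S by at most two, the second returned value bounds rs(G, π) from below, and the approximation algorithm performs O(S) swaps in total.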
C.3 Mappers
We give polynomial upper bounds on the time complexity of the mappers as a function
of the time complexity of the permuter, t_p.
Greedy Depth Mapper. We perform at most min{|L|, |M|} iterations to place gates. In
each iteration, we find a p′ according to (16) in time O(|L||M| t_p). Thus, the time complexity
for one call of the mapper is
\[ O\Bigl( \min\{|L|, |M|\} \bigl( |L||M|\, t_p + \sqrt{|V|}\,|E| \bigr) \Bigr) , \tag{22} \]
where O(√|V| |E|) is the complexity of computing a maximum matching [37].
Incremental Depth Mapper. We get O(|L| (|E| t_p + |V| t_p + |V|^2)) for the time complexity
of the incremental depth mapper. This assumes we have access to the all-pairs distance matrix
of the architecture graph, which can be precomputed in time Θ(|V|^3) [17] (independent of
the input circuit).
Simple Size Mapper. The time complexity of the simple size mapper is O(|L||E| t_p).
Extension Size Mapper. Calculating s(q_1, q_2) for any q_1, q_2 ∈ Q takes time O(|E| t_p).
Therefore, the total time complexity of the extension size mapper is O(|L|^2 |E| t_p).
Qiskit-based Mapper. First, we compute an all-to-all distance matrix in time Θ(|V|^3) [17],
which we ignore since it is a one-time cost dependent only on the architecture. Each of the
O(|V|^2) iterations has a time complexity of O(|E||L|). Thus, the Qiskit mapper has time
complexity O(|V|^2 |E||L|).
