Circuit Transformations for Quantum Architectures by Childs, Andrew M. et al.
Circuit Transformations for Quantum Architectures
Andrew M. Childs*1,2,3, Eddie Schoute†1,2,3, and Cem M. Unsal4
1Joint Center for Quantum Information and Computer Science, University of Maryland
2Institute for Advanced Computer Studies, University of Maryland
3Department of Computer Science, University of Maryland
4Department of Mathematics, University of Maryland
September 6, 2019
Abstract
Quantum computer architectures impose restrictions on qubit interactions. We propose efficient
circuit transformations that modify a given quantum circuit to fit an architecture, allowing for any
initial and final mapping of circuit qubits to architecture qubits. To achieve this, we first consider
the qubit movement subproblem and use the Routing via Matchings framework to prove tighter
bounds on parallel routing. In practice, we only need to perform partial permutations, so we generalize
Routing via Matchings to that setting. We give new routing procedures for common architecture
graphs and for the generalized hierarchical product of graphs, which produces subgraphs of
the Cartesian product. Secondly, for serial routing, we consider the Token Swapping framework
and extend a 4-approximation algorithm for general graphs to support partial permutations. We
apply these routing procedures to give several circuit transformations, using various heuristic qubit
placement subroutines. We implement these transformations in software and compare their per-
formance for large quantum circuits on grid and modular architectures, identifying strategies that
work well in practice.
1 Introduction
Quantum algorithms are typically formulated in a circuit model in which two-qubit gates can be per-
formed between any pair of qubits. However, most realistic quantum architectures impose restrictions
on qubit interactions. Thus a natural challenge is to find a way of implementing a given circuit on a
given architecture with low overhead. We can do this by finding a time-efficient architecture-respecting
circuit transformation: a mapping to a new circuit that preserves the function of the original quantum
circuit up to an initial mapping of circuit qubits to architecture qubits and a final mapping of architecture
qubits back to circuit qubits, where the new circuit is constrained to respect the architecture.
There have been many proposals for the design of quantum processors. Examples include trapped
ion systems that enable interactions between any two ions in a trap [40] and superconducting qubit architectures
with more limited interactions [20, 50, 45]. Many proposed architectures for scalable devices
employ modularity, building a large device from interconnected subunits. For example, one proposal
considers registers of ion trap qubits coupled via photonic quantum channels through a reconfigurable
optical switch [40, 41]. Another approach uses shielded modules of circuit quantum electrodynamics
devices connected by superconducting transmission wires [12].
*amchilds@umd.edu
†eschoute@cs.umd.edu
There is also a considerable amount of work on implementing circuits under architectural constraints.
Some examples include implementations of Shor’s algorithm [18], the quantum Fourier transform
on 1D nearest-neighbor architectures [36], and quantum adders on nearest-neighbor architectures
[16, 15]. However, the aforementioned works focus on analyzing specific circuits. Instead, we wish
to find automated circuit transformations that can handle complex circuits and compare their performance
when implemented in various architectures. Bounds on the efficacy of architecture-respecting
circuit transformations and good automated tools for implementing them may be able to inform architecture
design decisions [54]. Unfortunately, it is challenging to achieve good performance with an
automated tool. Indeed, finding even one optimal placement for a set of gates is NP-hard [35].
Prior Work on Automated Architecture-Respecting Circuit Transformations Several previ-
ous works use exhaustive approaches that take time exponential in the number of qubits (and hence
can only be used for small instances). For example, Saeedi, Wille, and Drechsler [47] use SAT solvers to
decompose circuits so they can be run on the path architecture; [34] finds an optimal circuit transforma-
tion on nearest-neighbor architectures by formulating the problem as a pseudo-boolean optimization;
Venturelli et al. [52] use temporal planners to schedule gates; and [42] uses satisfiability modulo the-
ory solvers to find mappings of the circuit with high success probability using calibration data. Other
work has instead proposed minimizing the distance between all qubits in groups of gates on specific
architectures [49, 56, 44], but this is also NP-hard in general. These and other papers add swap gates
so that the logical state of a given physical qubit is transferred to a different physical qubit (henceforth,
we simply refer to this as qubit movement, with the implicit understanding that only the logical state is
moved).
As a heuristic solution, we can break the circuit into sets of disjoint gates and move the qubits
between each set. Metodi et al. [37] propose polynomial-time heuristic routines that prioritize gates
with many dependents. Hirata et al. [23] propose exhaustive and heuristic searches for good placements
of qubits on the path architecture and use those to construct circuit transformations.
One can also use heuristic qubit placement and movement algorithms on fault-tolerant 2D grid
architectures [31] or algorithms that are designed to handle the surface code [29]. We do not consider
fault tolerance explicitly and instead work only at the logical level.
An exhaustive search of all permutations of 𝑛 qubit locations takes time 𝑂(𝑛!) but can work well
for small numbers of locations [55], or can be done selectively using 𝐴∗ heuristic search [59, 60] or
local search [35, 8]. By choosing a suitable initial placement of qubits, we can further reduce the qubit
movement cost. For example, [30] tries to find a good initial placement by repeatedly transforming the
quantum circuit forwards and then backwards, taking the output qubit placement as input for the next
iteration.
Other work has considered a model in which one can perform fast measurements and adapt later
parts of the computation based on the outcomes [21]. This model allows the movement of qubits with
just a constant overhead at the cost of extra ancillas [46]. However, realizing such a model presents
significant technical challenges and we do not consider it here.
Various bounds are also known for the cost of moving qubits. Sorting networks provide a way to
upper bound the depth of the qubit movement circuit [28, 7, 13]. Further, there exists a circuit where
the depth overhead of qubit movement is at least logarithmic for architectures with finite degree [22].
We refer to [3] for a more complete overview of sorting networks.
Contribution In this paper, we construct architecture-respecting circuit transformations that attempt
to minimize the circuit depth or size overhead and have worst-case time complexity polynomial in the
sizes of the circuit and architecture graph. We model the connectivity of the underlying hardware as
a simple graph where vertices represent the qubits and edges represent places where a two-qubit gate
can be performed.
As a simple and fast approach, we propose the greedy swap circuit transformation (Section 2.2.2).
It inserts swaps on edges chosen to minimize the total distance between qubits involved in two-qubit
gates until some gate(s) can be executed.
We then propose building architecture-respecting circuit transformations (Section 2.2.3) by combining
algorithms for two basic subproblems: qubit movement (addressed by permuters, for which we
provide theoretical performance guarantees) and qubit placement (addressed by mappers). For the latter,
we specify a variety of heuristic strategies (Section 4) to find suitable placements of qubits from
the input circuit, attempting to optimize for circuit size or depth. We implement these algorithms in
software, which is publicly available under a free software license [48].
Consider now the problem of moving qubits on a given architecture graph. A sorting network sorts
any fixed-length sequence of integers with a circuit of comparators, which compare two inputs and
output them in nondecreasing ordering. While sorting networks can be used to route qubits [7], they
achieve a more general task, and the cost of routing can sometimes be lower with other methods. Specifically,
we suggest Routing via Matchings [2] (introduced in Section 3.1) as a more suitable framework
for moving qubits in parallel. Deciding whether there exists a depth-𝑘 circuit for Routing via
Matchings is NP-complete in general for 𝑘 > 2 [3], but optimal or near-optimal protocols are known
for specific graph families [2, 58]. In some cases it is possible to implement any permutation asymptot-
ically more efficiently than a general sorting network (see Table 1). On complete graphs, for example,
any permutation can be implemented in a depth-2 circuit of transpositions [2], whereas an optimal
sorting network has depth Θ(log𝑛) [1].
While it is common to consider only the worst-case routing performance, we also wish to route
efficiently in practice. To improve practical performance, we generalize to partial permutations (per-
mutations only defined on some subdomain) so that we can also move subsets of qubits efficiently. The
destinations of the remaining qubits are unconstrained. In Section 3.1, we present routing algorithms
for the path graph, the complete graph, and the generalized hierarchical product of graphs [6], which
includes the Cartesian product of graphs and modular architectures as special cases [41]. Graphs ob-
tained as hierarchical products have many good properties for quantum architectures [5]. We establish
an upper bound on the routing number of a hierarchical product (Theorem 3.4) that matches prior
work for total permutations on the Cartesian product of graphs [2] and depends on easily computable
properties of the input partial permutation.
We also propose using Token Swapping [57] for minimizing the total number of swaps, which
is relevant when optimizing for total circuit size (Section 3.2). We generalize this problem to partial
permutations and obtain a 4-approximation algorithm (Theorem 3.10).
Finally, we evaluate our circuit transformations on large quantum circuits (Section 5) and compare
their performance with the circuit transformation included in the Qiskit software (Section 2.2.1) [8].
We find that the relative performance varies significantly with the circuit type and architecture. When
minimizing circuit size, the greedy swap circuit transformation is one of the best, though some improvement
may be gained using some of our specialized circuit transformations. For depth, some of our
specialized circuit transformations do best on random circuits on grid architectures, whereas Qiskit’s
circuit transformation does well on modular architectures. For quantum signal processing circuits [33]
we find that the depth is best minimized by our greedy swap circuit transformation.
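Graph family | Worst-case circuit depth: sorting (comparators) | Worst-case circuit depth: routing nr. (transpositions)
path ($P_n$) | $n$ [25] | $n$ [2]
complete ($K_n$) | $\Theta(\log n)$ [1] | $2$ [2]
tree with max degree $\Delta$, diameter $D$ | $O(\min(\Delta, \log(n/D))\, n)$ [4] | $3n/2 + O(\log n)$ [58]
gen. hierarchical product, $\Pi_{\mathbf{v}}(G_1, G_2)$ | not known | $\lceil |V_2| / \operatorname{ham}(\mathbf{v}) \rceil (\operatorname{rt}(G_1) + \operatorname{rt}(G_2)) + \operatorname{rt}(G_2)$
Cart. product $G_1 \times G_2 = \Pi_{\mathbf{1}}(G_1, G_2)$ | not known | $2 \operatorname{rt}(G_1) + \operatorname{rt}(G_2)$ [2]
$r$-dimensional grid ($\times_{i=1}^{r} P_{n_i}$) | $n_1 + 2 \sum_{i=2}^{r} n_i + o(\cdot)$ [27] | $n_1 + 2 \sum_{i=2}^{r} n_i$ [2]
modular architecture, $\Pi_{\mathbf{e}_1}(K_{n_1}, K_{n_2})$ | not known | $3 n_2 + 2$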
Table 1: Performance bounds for sorting networks versus routing via matchings (the routing number, rt(𝐺);
see (9)) where |𝑉| = 𝑛. Let 𝐺2 = (𝑉2, 𝐸2) and 𝐯 ∈ {0, 1}^{|𝑉2|}; ham(𝐯) is the Hamming weight of 𝐯, 𝟏 ≔ [1… 1], and
𝐞1 ≔ [1 0…0]. We list special cases of the generalized hierarchical product (see Definition 3.2): the Cartesian
product of graphs, the 𝑟-dimensional grid, and the modular graph. See [4] for a short overview of known lower
and upper bounds for sorting networks restricted to common topologies.
2 Constructing Circuit Transformations
Program transformations are algorithms that modify computer programs while retaining functional-
ity [43]. In a similar vein, we define a circuit transformation as an algorithm that modifies an input
quantum circuit to produce an output quantum circuit with the same functionality. We represent an
architecture by a simple graph 𝐺 = (𝑉, 𝐸), and let 𝑄 denote the set of qubits of the input circuit. A
circuit transformation is architecture-respecting if it produces injective initial and final mappings of
the form 𝑝̂ ∶ 𝑄 → 𝑉 and an architecture-respecting output circuit. The output circuit is
architecture-respecting if for each two-qubit gate acting on (qubit) vertices 𝑣1, 𝑣2 we have (𝑣1, 𝑣2) ∈ 𝐸 (where the
ordering is irrelevant since 𝐺 is undirected). Henceforth, we only consider circuit transformations that
are architecture-respecting, and we refer to them simply as circuit transformations. We propose a construction
for a general circuit transformation that may use the properties of the underlying architecture
by relying on a specialized subroutine for moving qubits called a permuter (Section 3), and a subroutine
determining where to place qubits, called a mapper (Section 4).
To be able to transform a circuit, we must have |𝑄| ≤ |𝑉|, and the output circuit must contain a
qubit for every vertex in the architecture. Throughout the circuit transformation, we keep track of the
injective current placement of qubits 𝑝̂ ∶ 𝑄 → 𝑉. The initial and final values of 𝑝̂ are also the initial
and final mappings, respectively, of qubits to the architecture. A gate is executed by appending it to the
output circuit. Two-qubit gates with qubits 𝑞1, 𝑞2 ∈ 𝑄 can only be executed when (𝑝̂(𝑞1), 𝑝̂(𝑞2)) ∈ 𝐸.
By adding swap gates to the output circuit, we can change 𝑝̂ and thereby unitarily transform quantum
circuits for execution on an architecture.
2.1 Definitions
We define some terminology that will be used throughout the paper.
2.1.1 Partial Functions and Partial Permutations
For sets 𝑋 and 𝑌, a partial function 𝑓∶ 𝑋 ⇀ 𝑌 is a mapping from dom(𝑓) ⊆ 𝑋 to image(𝑓) ≔ {𝑓(𝑥) ∣
𝑥 ∈ dom(𝑓)} ⊆ 𝑌. However, 𝑓(𝑥) is undefined for 𝑥 ∈ 𝑋 ⧵ dom(𝑓). We consider such elements 𝑥
unmapped. For 𝑥 ∈ dom(𝑓), we write 𝑥 ↦ 𝑓(𝑥) and say that 𝑥 is mapped to 𝑓(𝑥). We can then define
any partial function 𝑓 as a set of mappings, 𝑓 ≔ {𝑥 ↦ 𝑦 ∣ 𝑥 ∈ 𝑋, 𝑦 ∈ 𝑌}, where all preimages must be
distinct (i.e., if 𝑥 ↦ 𝑦 ∈ 𝑓 and 𝑥′ ↦ 𝑦′ ∈ 𝑓 with 𝑦 ≠ 𝑦′, then 𝑥 ≠ 𝑥′). A total function 𝑓̂ is a partial
function where dom(𝑓̂) = 𝑋 and is denoted 𝑓̂ ∶ 𝑋 → 𝑌. By the term “function” we will mean a total
function.
A partial function 𝑓 is injective iff ∀𝑥, 𝑥′ ∈ dom(𝑓) with 𝑥 ≠ 𝑥′, 𝑓(𝑥) ≠ 𝑓(𝑥′). A function 𝑓̂ ∶ 𝑋 → 𝑌
is surjective iff ∀𝑦 ∈ 𝑌, ∃𝑥 ∈ 𝑋 ∶ 𝑓̂(𝑥) = 𝑦. A bijective partial function 𝑓 is a partial function that is
injective and is denoted 𝑓∶ 𝑋 ⥎ 𝑌 (note that such an 𝑓 is necessarily surjective on its image). A
bijective function 𝑓̂ is both injective and surjective and is denoted by 𝑓̂ ∶ 𝑋 ↔ 𝑌. For any bijective
(partial) function 𝑓 there exists an inverse function 𝑓⁻¹∶ image(𝑓) → dom(𝑓).
A partial permutation 𝜋 is any bijective partial function with the same domain and codomain, i.e.,
𝜋∶ 𝑋 ⥎ 𝑋. Similarly, a total permutation is any 𝜎∶ 𝑋 ↔ 𝑋. By “permutation” we mean a total
permutation.
We also define some notions specifically useful for this paper. An unmapped vertex is a vertex in
𝑉 ⧵ dom(𝜋), for a graph 𝐺 = (𝑉, 𝐸) and 𝜋∶ 𝑉 ⥎ 𝑉. We define the union of partial functions 𝑓∶ 𝑋 ⇀ 𝑌
and 𝑔∶ 𝑋 ⇀ 𝑌 when dom(𝑓) ∩ dom(𝑔) = ∅ as
$$ (f \cup g)(x) := \begin{cases} f(x) & \text{if } x \in \operatorname{dom}(f), \\ g(x) & \text{if } x \in \operatorname{dom}(g). \end{cases} \tag{1} $$
Furthermore, (𝑓 ∪ 𝑔) is a bijective partial function iff 𝑓 and 𝑔 are bijective partial functions and image(𝑓) ∩
image(𝑔) = ∅. A completion of 𝜋∶ 𝑋 ⥎ 𝑋 is a 𝜋̂ ∶ 𝑋 ↔ 𝑋 with 𝜋̂ = (𝜋 ∪ 𝜎) for some 𝜎∶ 𝑋 ⥎ 𝑋, where
dom(𝜎) = 𝑋 ⧵ dom(𝜋) and image(𝜎) = 𝑋 ⧵ image(𝜋).
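As an illustration of these definitions, the following minimal Python sketch represents a partial function as a dict whose keys form its domain. The function names (`is_injective`, `union`, `complete`) are hypothetical and chosen only for this example; the particular completion built here (pairing free sources and free targets in sorted order) is one arbitrary choice among many.

```python
def is_injective(f):
    """A partial function (dict) is injective iff no two keys share a value."""
    return len(set(f.values())) == len(f)


def union(f, g):
    """Union of two partial functions with disjoint domains, as in (1)."""
    if f.keys() & g.keys():
        raise ValueError("domains must be disjoint")
    return {**f, **g}


def complete(pi, X):
    """Return one (arbitrary) completion of the partial permutation pi on X."""
    free_sources = sorted(set(X) - set(pi))            # X \ dom(pi)
    free_targets = sorted(set(X) - set(pi.values()))   # X \ image(pi)
    sigma = dict(zip(free_sources, free_targets))      # bijective by construction
    return union(pi, sigma)


pi = {0: 2, 3: 0}                 # partial permutation on X = {0, 1, 2, 3}
pi_hat = complete(pi, range(4))   # total permutation that agrees with pi
```

Here `pi_hat` agrees with `pi` on dom(𝜋) and maps the unmapped elements onto the unused targets, so it is a total permutation of 𝑋.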
2.1.2 Directed Acyclic Graph Representation of a Circuit
A quantum circuit can be viewed as a directed acyclic graph (DAG), where vertices represent gates and
directed edges represent qubit dependencies. We define the first layer of the DAG, 𝐿, to be the set of all
vertices without predecessors. By removing 𝐿 and taking the first layer of the resulting DAG, we can
define the second layer, and so on.
The size of a circuit is the number of gates it contains (i.e., the number of vertices in the DAG);
the depth of a circuit is the number of layers. It is natural to minimize either the depth (Section 4.1),
corresponding to the execution time when gates can be applied in parallel, or the size (Section 4.2),
corresponding to the total number of operations that must be performed. We are mainly interested in
two-qubit gates and their qubits. Thus we define tg∶ 𝑉𝐷 → 𝑄 × 𝑄, where 𝑉𝐷 is the set of DAG vertices
corresponding to two-qubit gates, that outputs the pair of qubits acted on by a given gate. For simplicity,
we denote tg(𝐿) ≔ {tg(𝑔) ∣ 𝑔 ∈ 𝐿, 𝑔 is a two-qubit gate}.
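The layer decomposition can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a circuit is given as an ordered list of gates, each gate is a tuple of the qubits it acts on, and the helper names `layers` and `tg` are hypothetical. A gate lands in the layer immediately after the latest layer that touches any of its qubits, which matches repeatedly peeling off the gates without predecessors.

```python
def layers(gates):
    """Split an ordered gate list into DAG layers; each gate is a tuple of qubits."""
    next_free = {}   # qubit -> index of the first layer in which it is free
    result = []
    for gate in gates:
        layer = max((next_free.get(q, 0) for q in gate), default=0)
        if layer == len(result):
            result.append([])
        result[layer].append(gate)
        for q in gate:
            next_free[q] = layer + 1
    return result


def tg(layer):
    """Qubit pairs of the two-qubit gates in a layer."""
    return {gate for gate in layer if len(gate) == 2}


circuit = [("a", "b"), ("c",), ("b", "c"), ("a",)]
circuit_layers = layers(circuit)
size, depth = len(circuit), len(circuit_layers)
```

In this toy circuit the size is 4 and the depth is 2, and only the gate on ("a", "b") contributes to tg of the first layer.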
2.2 Architecture-Respecting Circuit Transformations
We now describe some specific architecture-respecting circuit transformations. We first describe two
basic circuit transformations, one provided by the Qiskit software (Section 2.2.1) and another that uses
a simple greedy approach (Section 2.2.2). Then, in Section 2.2.3 we specify a family of circuit transfor-
mations that builds on specialized procedures for qubit placement and routing.
2.2.1 Qiskit Circuit Transformation
The open-source quantum computing software framework Qiskit [8] contains a circuit transformation1
that we build upon in one of our approaches (Section 4.2.4). We specify this transformation here and
compare it with our other approaches to circuit transformations in Section 5.
We initialize 𝑝̂ arbitrarily. Fix a number of trials, 𝑘 ∈ ℕ, for each layer. We do the following in trial
𝑖 ∈ [𝑘] where [𝑘] ≔ {1, … , 𝑘}: For all 𝑣, 𝑢 ∈ 𝑉, sample a symmetric weight
$$ d_i(v, u) = \bigl(1 + \mathcal{N}(0, 1/N)\bigr)\, d(v, u)^2 \tag{2} $$
independently for (𝑣, 𝑢) ∈ 𝑉 × 𝑉, where 𝒩(𝜇, 𝜎) represents a sample from the normal distribution with
mean 𝜇 ∈ ℝ and standard deviation 𝜎 ≥ 0, and 𝑑∶ 𝑉 × 𝑉 → ℕ is the shortest distance function on the
architecture graph. We define an objective function as the sum of gate distances,
$$ S := \sum_{(q_1, q_2) \in \operatorname{tg}(L)} d_i(\hat{p}(q_1), \hat{p}(q_2)) . \tag{3} $$
We now try to swap pairs of qubits to decrease 𝑆. Specifically, we construct a set of swaps by iterating
over all edges 𝑒 ∈ 𝐸 and greedily adding the corresponding swap if it decreases 𝑆 and neither endpoint
of 𝑒 is already involved in some swap. We execute the set of swaps and update 𝑆. We then iterate this
process until either 𝑆 = |tg(𝐿)|; or there is no swap that decreases 𝑆; or we reach the upper bound of
2|𝑉| iterations.
Now, if 𝑆 = |tg(𝐿)| then the algorithm has successfully found a sequence of swaps and all gates in 𝐿
can be executed. The result of trial 𝑖 is then set to this sequence of swaps. Otherwise, trial 𝑖 is a failure.
If there is at least one successful trial out of 𝑘 trials, we execute the swaps of a successful trial with the
fewest swaps and then execute all gates in 𝐿.
If no trial was successful, we apply the same routine for finding swaps that minimize 𝑆, but taking
only a single gate (𝑞1, 𝑞2) ∈ tg(𝐿) at a time. Note that this results in a sequence of swaps along the
shortest path between 𝑝̂(𝑞1) and 𝑝̂(𝑞2). After each such step we execute the selected gate. We repeat
this until all gates in tg(𝐿) have been executed and also execute all single-qubit gates in 𝐿. Finally, we
remove the vertices in 𝐿 from the input circuit DAG and iterate this process until all gates in the input
circuit are executed.
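The randomized weights (2) and the objective (3) can be illustrated with the short Python sketch below. It is not Qiskit's code: the function names are hypothetical, the distance table is passed in as a plain dict, and the parameter `n_param` stands in for the 𝑁 of (2), whose value is left to the caller. One weight is sampled per unordered pair so that the result is symmetric.

```python
import random


def sample_weights(dist, n_param, rng):
    """Sample d_i from (2). dist maps ordered pairs (u, v) to shortest-path distances."""
    weights = {}
    for (u, v), d in dist.items():
        if (v, u) in weights:
            weights[(u, v)] = weights[(v, u)]   # reuse the sample to stay symmetric
        else:
            weights[(u, v)] = (1 + rng.gauss(0, 1 / n_param)) * d ** 2
    return weights


def objective(weights, placement, gate_pairs):
    """The sum S of (3) for the current placement of qubits on vertices."""
    return sum(weights[(placement[q1], placement[q2])] for q1, q2 in gate_pairs)


# Path architecture 0 - 1 - 2 with its shortest-path distances.
dist = {(0, 1): 1, (1, 0): 1, (1, 2): 1, (2, 1): 1, (0, 2): 2, (2, 0): 2}
weights = sample_weights(dist, n_param=3, rng=random.Random(0))
S = objective(weights, {"a": 0, "b": 2}, [("a", "b")])
```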
2.2.2 Greedy Swap Circuit Transformation
We also describe a simple greedy approach to circuit transformations. Similar to the Qiskit circuit transformation
described above, we prioritize swaps that maximally reduce the total distance between the
qubits tg(𝐿), but now using the simpler objective function
$$ R := \sum_{(q_1, q_2) \in \operatorname{tg}(L)} d(\hat{p}(q_1), \hat{p}(q_2)) . \tag{4} $$
Note that this is different from (3), where a randomized distance 𝑑𝑖 is used.
We construct an initial 𝑝̂ as follows. Let us consider the first layer 𝐿′ of the circuit consisting of only
two-qubit gates (i.e., single-qubit gates are ignored), initialize 𝑝′∶ 𝑄 ⥎ 𝑉 as undefined everywhere,
and set 𝑈 ≔ ∅ ⊆ 𝑉. We iteratively construct
$$ p' \leftarrow p' + \{ q_1 \mapsto v_1,\; q_2 \mapsto v_2 \mid (q_1, q_2) \in L',\; (v_1, v_2) \in M \} , \tag{5} $$
where 𝑀 ⊆ 𝐸 is a maximum matching of 𝐺, remove (𝑞1, 𝑞2) from 𝐿′, set 𝑈 ← 𝑈 ∪ {𝑣1, 𝑣2}, and recompute
𝑀 on the subgraph of 𝐺 with the vertices 𝑉 ⧵ 𝑈.2 The remaining qubits 𝑄 ⧵ dom(𝑝′) are arbitrarily
mapped to the available vertices 𝑉 ⧵ image(𝑝′) to obtain 𝑝̂.
1We base our description on qiskit.mapper.swap_mapper from Qiskit version 0.6.1.
In every iteration, we construct a set of disjoint gates to execute. We first execute as many gates from
𝐿 as possible given 𝑝̂, and we remove these gates from the input circuit. Second, let 𝐸𝑖, for 𝑖 ∈ [2], be the
set of edges where executing a swap would decrease 𝑅 by 𝑖, excluding edges which already had a vertex
involved in a gate this iteration. We then greedily execute swaps on edges from 𝐸2 first and 𝐸1 second, updating
both sets 𝐸𝑖 as we go. If we were not able to execute a gate from 𝐿 and no swaps were executed, then, as
a fallback, we deterministically pick a two-qubit gate (𝑞1, 𝑞2) ∈ tg(𝐿) and swap along the first edge on
the shortest path between 𝑝̂(𝑞1) and 𝑝̂(𝑞2). We update 𝑝̂ according to the inserted swaps, update 𝐿, and
finally update 𝑅. This process is repeated until the input circuit is empty.
The fallback routine ensures that this circuit transformation always produces an output circuit. The
value 𝑅 strictly decreases in every iteration until a gate can be executed unless the fallback routine is
performed, in which case 𝑅 stays the same. On repeated calls to the fallback routine, the same two-qubit
gate is picked deterministically until it is executed. This happens within diam(𝐺) + 1 iterations, where
diam(𝐺) denotes the diameter of 𝐺. By induction we see that the whole circuit will be executed.
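The swap-selection step of the greedy approach can be sketched as follows. This Python fragment is a simplified illustration, not the released implementation: it assumes integer-labeled vertices given as an adjacency dict, it omits gate execution and the fallback routine, and all function names are hypothetical.

```python
from collections import deque


def all_distances(adj):
    """All-pairs shortest-path distances by breadth-first search from every vertex."""
    dist = {}
    for s in adj:
        dist[s] = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist[s]:
                    dist[s][w] = dist[s][u] + 1
                    queue.append(w)
    return dist


def total_distance(dist, placement, gate_pairs):
    """The objective R of (4)."""
    return sum(dist[placement[a]][placement[b]] for a, b in gate_pairs)


def swapped(placement, u, w):
    """Placement after exchanging whatever sits on vertices u and w."""
    return {q: (w if v == u else u if v == w else v) for q, v in placement.items()}


def greedy_swaps(adj, dist, placement, gate_pairs):
    """One round: disjoint swaps that decrease R by 2 first, then those that decrease it by 1."""
    placement = dict(placement)
    used, chosen = set(), []
    for gain in (2, 1):
        for u in adj:
            for w in adj[u]:
                if u >= w or u in used or w in used:
                    continue   # visit each edge once; keep the swaps vertex-disjoint
                before = total_distance(dist, placement, gate_pairs)
                trial = swapped(placement, u, w)
                if before - total_distance(dist, trial, gate_pairs) == gain:
                    placement = trial
                    used |= {u, w}
                    chosen.append((u, w))
    return chosen, placement


adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # path on four vertices
dist = all_distances(adj)
swaps, new_placement = greedy_swaps(adj, dist, {"a": 0, "b": 3}, [("a", "b")])
```

On this path, one round moves both qubits one step toward each other, after which the gate on ("a", "b") acts on adjacent vertices.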
Let us analyze the time complexity of this circuit transformation. We ignore the initial placement
since it is insignificant for large circuits. A gate from 𝐿 is executed in at most diam(𝐺) iterations, where
diam(𝐺) is the diameter of 𝐺. In every iteration, 𝑂(|𝐸|) edges are checked to determine gates that can
be executed and swaps that will decrease 𝑅. Therefore, the total time complexity is 𝑂(|𝐶||𝐸| diam(𝐺)),
where |𝐶| denotes the size of circuit 𝐶. A tighter bound holds in terms of the output circuit 𝐶′: since every
iteration creates a layer in the transformed circuit, the complexity is 𝑂(depth(𝐶′)|𝐸|), where depth(𝐶′)
denotes the circuit depth of 𝐶′.
2.2.3 Constructing Architecture-aware Circuit Transformations
We now present our construction for a general circuit transformation and make some definitions more
precise. Let a permuter (Section 3) be a subroutine that, given 𝜋∶ 𝑉 ⥎ 𝑉, outputs a sequence of transpositions
that implements 𝜋 while respecting the architecture constraints. Let a mapper (Section 4) be
a subroutine that, given 𝑝̂, a permuter, and a quantum circuit, computes a new placement of qubits,
𝑝∶ 𝑄 ⥎ 𝑉, such that some gates of the input circuit can be executed.
Initialize 𝑝̂ in the same way as the greedy swap circuit transformation. We repeat the following steps
until the entire circuit has been transformed:
1. Use the given mapper to find a placement, 𝑝∶ 𝑄 ⥎ 𝑉, for the remaining input circuit;
2. Let “∘” denote partial function composition, i.e., given 𝑔∶ 𝑋 ⇀ 𝑌 and 𝑓∶ 𝑌 ⇀ 𝑍,
(𝑓 ∘ 𝑔)(𝑥) ≔ 𝑓(𝑔(𝑥)) , for 𝑥 ∈ dom(𝑔) and 𝑔(𝑥) ∈ dom(𝑓) . (6)
We use the permuter to find transpositions implementing 𝑝 ∘ 𝑝̂⁻¹∶ 𝑉 ⥎ 𝑉 and replace the transpositions
with swap gates to construct a permutation circuit to execute. We also update 𝑝̂ to
reflect the new placement of qubits after running the permutation circuit.
3. Execute all gates in 𝐿 that can be executed in accordance with 𝑝̂, remove these gates from the
input circuit, and recompute 𝐿.
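Step 2 of the loop above can be made concrete with a small Python sketch. It assumes placements are dicts from qubits to vertices; the function names are hypothetical, and the swap list in the example is written by hand rather than produced by a permuter.

```python
def compose_with_inverse(p, p_hat):
    """The partial permutation p ∘ p_hat^{-1}: current vertex -> target vertex."""
    return {p_hat[q]: v for q, v in p.items()}


def apply_swaps(p_hat, swaps):
    """Update the current placement after executing swap gates on the given edges."""
    at_vertex = {v: q for q, v in p_hat.items()}   # inverse of the placement
    for u, w in swaps:
        qu, qw = at_vertex.pop(u, None), at_vertex.pop(w, None)
        if qu is not None:
            at_vertex[w] = qu
        if qw is not None:
            at_vertex[u] = qw
    return {q: v for v, q in at_vertex.items()}


p_hat = {"a": 0, "b": 1, "c": 2}            # current (total) placement
p = {"a": 1, "b": 0}                        # partial placement requested by a mapper
target = compose_with_inverse(p, p_hat)     # what a permuter would be asked to implement
p_hat = apply_swaps(p_hat, [(0, 1)])        # a single swap realizes it here
```

After the update, the new placement agrees with 𝑝 on dom(𝑝), while qubit "c" is unconstrained and happens to stay in place.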
2This is equivalent to running the greedy depth mapper (Section 4.1.1) on the input circuit with only two-qubit gates, an
arbitrary 𝑝̂, and free permutations of qubits. In other words, the greedy depth mapper will pick a placement of qubits on the
architecture unconstrained by movement of qubits, since this is the initial placement.
We note that the permuter used by the circuit transformation can be different from the one used by the
mapper. This can, for example, be useful if the permuter is randomized and can be run multiple times
in an attempt to obtain a better result. The number of such trials can be set much higher for the circuit
transformation since only the permutation circuit for 𝑝 ∘ 𝑝̂⁻¹ needs to be computed in every iteration.
We make use of this flexibility in our implementation (Section 5).
Let us analyze the time complexity of this circuit transformation. We again ignore the time complexity
of computing the initial placement. Let 𝑡𝑚 be an upper bound on the time complexity of the
mapper, and let 𝑡𝑝 be an upper bound on the time complexity of the permuter. Computing 𝑝 ∘ 𝑝̂⁻¹ takes
time 𝑂(|𝑉|). The number of transpositions produced by the permuter is at most 𝑡𝑝, so executing the
associated swaps takes time 𝑂(𝑡𝑝). Possibly only one gate from 𝐿 is executed in each iteration, so we upper
bound the number of iterations by |𝐶|. We find a time complexity of 𝑂(|𝐶| (𝑡𝑚 + |𝑉| + 𝑡𝑝)). Clearly, if
𝑡𝑝, 𝑡𝑚 ∈ poly(|𝐶|, |𝑉|) then our circuit transformation is also poly-time as desired.
3 Partial Permutations via Transpositions
In this section we provide routing algorithms for implementing partial permutations via transpositions
constrained to edges of a graph. We call such algorithms permuters. The Routing via Matchings
and Token Swapping problems capture exactly our optimization goals of implementing a permutation
of qubits on a quantum architecture while minimizing the circuit depth and size, respectively.
3.1 Partial Routing Via Matchings
The framework of Routing via Matchings captures how to permute qubits on a graph using a circuit
of the smallest possible depth [2]. We first define a generalization of Routing via Matchings that
allows for partial permutations and then provide permuters for implementing partial permutations for
some architectures of interest.
Definition 3.1 (Partial Routing via Matchings). Partial Routing via Matchings is the fol-
lowing optimization problem. Given a simple graph 𝐺 = (𝑉, 𝐸) and partial permutation 𝜋∶ 𝑉 ⥎ 𝑉,
the objective is to find the smallest 𝑘 ∈ ℕ such that there exist matchings 𝑀1, … , 𝑀𝑘 ⊆ 𝐸 on 𝐺, where
each matching induces a permutation as a product of disjoint transpositions
$$ \pi_{M_i} = \prod_{(v, u) \in M_i} (v\ u) , \tag{7} $$
such that
$$ \hat{\pi} = \prod_{i=1}^{k} \pi_{M_i} \tag{8} $$
is a completion of 𝜋.
Routing via Matchings is the special case of Partial Routing via Matchings where 𝜋 is
constrained to be a (total) permutation. The partial routing number of 𝜋∶ 𝑉 ⥎ 𝑉 on 𝐺 is rt(𝐺, 𝜋) ≔ 𝑘,
where 𝑘 obtains the minimum in Definition 3.1. The routing number [2] is the special case of the partial
routing number where 𝜋 is total. In this paper, we simply refer to the partial routing number as the
routing number. The routing number of 𝐺 is defined as
$$ \operatorname{rt}(G) := \max_{\sigma \in \operatorname{Sym}(V)} \operatorname{rt}(G, \sigma) , \tag{9} $$
where we maximize over all permutations 𝜎∶ 𝑉 ↔ 𝑉 (here Sym(𝑉) denotes the group of such permutations).
Note that we only optimize over permutations, since for any 𝜋∶ 𝑉 ⥎ 𝑉,
$$ \operatorname{rt}(G, \pi) = \min_{\hat{\pi}} \operatorname{rt}(G, \hat{\pi}) , \tag{10} $$
where we minimize over all completions 𝜋̂ of 𝜋.
An alternate way to interpret (Partial) Routing via Matchings is to assign tokens to all 𝑣 ∈
dom(𝜋) and destinations 𝜋(𝑣) for the tokens. A token can only be moved through an exchange of
tokens between adjacent vertices. The goal is to move all tokens to their destinations in as few matchings
(specifying exchange locations) as possible. If a vertex does not hold a token at the time of an exchange
with a neighbor, as can be the case in Partial Routing via Matchings, then after the exchange the
neighbor will not hold a token.
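The token picture lends itself to a simple checker. The Python sketch below is an illustrative helper with a hypothetical name: it applies a sequence of matchings to the tokens of a partial permutation and reports whether every token ends at its destination. It checks that each matching is vertex-disjoint but, as a simplification, does not check that the edges belong to the graph.

```python
def route_tokens(pi, matchings):
    """Return True iff applying the matchings in order sends every v in dom(pi) to pi(v)."""
    position = {v: v for v in pi}   # the token that started on v currently sits on position[v]
    for matching in matchings:
        move = {}
        for u, w in matching:
            if u in move or w in move:
                raise ValueError("edges of a matching must be vertex-disjoint")
            move[u], move[w] = w, u
        position = {token: move.get(v, v) for token, v in position.items()}
    return all(position[v] == pi[v] for v in pi)


# On the path 0 - 1 - 2, the single token on vertex 0 must reach vertex 2.
ok = route_tokens({0: 2}, [[(0, 1)], [(1, 2)]])
too_short = route_tokens({0: 2}, [[(0, 1)]])
```

Vertices 1 and 2 hold no tokens here, so their final contents are unconstrained, exactly as in the partial setting described above.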
3.1.1 Complete Graph
We give a simple construction of a permuter for the 𝑛-vertex complete graph, 𝐾𝑛 = (𝑉, 𝐸). Given
𝜋∶ 𝑉 ⥎ 𝑉, do the following. If
|dom(𝜋) ∪ image(𝜋)| = 2|dom(𝜋)|, (11)
all mappings are disjoint, so we return
{(𝑣, 𝜋(𝑣)) ∣ 𝑣 ∈ dom(𝜋)} (12)
as a single matching that implements 𝜋. Otherwise, we construct an arbitrary completion 𝜋̂ of 𝜋 and
run the standard algorithm for Routing via Matchings for complete graphs on 𝜋̂ [2]. This trivially
achieves the same rt(𝐾𝑛) ≤ 2 bound for all 𝜋, but will obtain rt(𝐾𝑛, 𝜋) = 1 for all 𝜋 with disjoint domain
and image.
The time complexity of the Routing via Matchings algorithm for 𝐾𝑛 is 𝑂(𝑛) [2]. The other
operations described above also take time 𝑂(𝑛), so we get a time complexity of 𝑂(𝑛) for the complete
graph permuter.
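A compact Python sketch of such a complete-graph permuter is given below. The one-matching shortcut follows (11) and (12). For the general case the sketch uses a textbook construction, assumed here for illustration rather than taken from [2]: every cycle of the completed permutation is written as a product of two reflections, which gives two matchings. Function names are hypothetical.

```python
def complete_graph_permuter(pi, vertices):
    """Return at most two matchings on the complete graph that implement pi."""
    vertices = list(vertices)
    dom, img = set(pi), set(pi.values())
    if len(dom | img) == 2 * len(dom):             # condition (11)
        return [[(v, pi[v]) for v in pi]]          # the single matching (12)
    # Arbitrary completion: pair unmapped sources with unused destinations.
    free_src = [v for v in vertices if v not in dom]
    free_dst = [v for v in vertices if v not in img]
    sigma = {**pi, **dict(zip(free_src, free_dst))}
    first, second, seen = [], [], set()
    for start in vertices:
        cycle, v = [], start
        while v not in seen:                       # walk the cycle containing start
            seen.add(v)
            cycle.append(v)
            v = sigma[v]
        k = len(cycle)
        for i in range(k):
            if i < (-i) % k:                       # reflection i <-> -i (mod k)
                first.append((cycle[i], cycle[(-i) % k]))
            if i < (1 - i) % k:                    # reflection i <-> 1 - i (mod k)
                second.append((cycle[i], cycle[(1 - i) % k]))
    return [first, second]


def apply_matchings(matchings, v):
    """Follow the token that starts on vertex v through the matchings."""
    for matching in matchings:
        for u, w in matching:
            if v in (u, w):
                v = w if v == u else u
                break
    return v


pi = {0: 1, 1: 2, 2: 0}                            # a 3-cycle on K_4
matchings = complete_graph_permuter(pi, range(4))
```

Applying the first reflection and then the second advances every element of a cycle by one position, which is why two matchings suffice.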
3.1.2 Path Graph
We construct a permuter for the 𝑛-vertex path graph, 𝑃𝑛 = (𝑉, 𝐸), by first giving a completion and then
using the standard complete permuter for paths [2]. Different completions achieve different routing
numbers. We give a heuristic for constructing a completion that seems to result in a low routing number
in practice.
We are given 𝜋∶ 𝑉 ⥎ 𝑉 and construct a completion 𝜋̂ of 𝜋 as follows: Let 𝑉 ≅ [𝑛], ordered from
one end of the path to the other (picking ends arbitrarily). Iterate through 𝑖 ∈ 𝑉 in ascending order,
setting
$$ \hat{\pi}(i) = \begin{cases} \pi(i) & \text{if } i \in \operatorname{dom}(\pi), \\ \min\bigl(V \setminus \operatorname{image}(\hat{\pi})\bigr) & \text{otherwise.} \end{cases} \tag{13} $$
It can easily be seen that 𝜋̂ is a completion of 𝜋. We have rt(𝑃𝑛, 𝜋) ≤ rt(𝑃𝑛, 𝜋̂) ≤ 𝑛 by the standard path
routing algorithm [2]. It remains open whether a tighter bound can be proven as a function of some
parameters of 𝜋.
Constructing the completion takes time 𝑂(|𝑉|). The total complexity for running the path permuter
is 𝑂(|𝑉|²), where the time complexity of the Routing via Matchings algorithm [2] dominates the
construction of 𝜋̂.
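The path permuter can be sketched in Python as follows, under two labeled assumptions. First, the completion (13) is read as starting from 𝜋̂ = 𝜋, so that destinations already used by 𝜋 are never reassigned. Second, odd-even transposition sort is used as the routing step; it routes any total permutation on a path in at most 𝑛 rounds, but whether it coincides in detail with the algorithm of [2] is not claimed here. Vertices are numbered 0, …, 𝑛 − 1 instead of [𝑛], and the function names are hypothetical.

```python
def path_completion(pi, n):
    """Completion (13): unmapped vertices take the smallest unused destinations in order."""
    pi_hat = dict(pi)
    unused = iter(sorted(set(range(n)) - set(pi.values())))
    for i in range(n):
        if i not in pi_hat:
            pi_hat[i] = next(unused)
    return pi_hat


def odd_even_route(pi_hat, n):
    """Route a total permutation on the path with odd-even transposition sort."""
    dest = [pi_hat[i] for i in range(n)]   # dest[pos] = destination of the token now at pos
    matchings = []
    for r in range(n):
        matching = []
        for i in range(r % 2, n - 1, 2):   # alternate between even and odd edges
            if dest[i] > dest[i + 1]:
                dest[i], dest[i + 1] = dest[i + 1], dest[i]
                matching.append((i, i + 1))
        if matching:                       # rounds without swaps are dropped
            matchings.append(matching)
    return matchings


pi_hat = path_completion({0: 3}, 4)
matchings = odd_even_route(pi_hat, 4)
```

In this example the only constrained token travels from one end of the path to the other, so three matchings are needed.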
3.1.3 Hierarchical Product
The generalized hierarchical product (henceforth hierarchical product) of graphs [6] produces various
subgraphs of the Cartesian product of graphs that include natural models of quantum computer archi-
tectures [5].
Definition 3.2 (Hierarchical Product [6]). For 𝑗 ∈ {1, 2}, let 𝐺𝑗 = (𝑉𝑗, 𝐸𝑗) be a graph with 𝑛𝑗 ≔ |𝑉𝑗|
vertices and adjacency matrix 𝐴𝑗 ∈ ℳ𝑛𝑗, where ℳ𝑘 is the set of 𝑘 × 𝑘 boolean matrices, for 𝑘 ∈ ℕ. Then
the hierarchical product Π𝐯(𝐺1, 𝐺2), for 𝐯 ∈ {0, 1}^{𝑛2}, has vertex set 𝑉1 × 𝑉2 and adjacency matrix
$$ A_1 \otimes \operatorname{diag}(\mathbf{v}) + \mathbb{1}_{n_1} \otimes A_2 , $$
where 𝟙𝑛1 ∈ ℳ𝑛1 is the 𝑛1 × 𝑛1 identity matrix, 𝑀1 ⊗ 𝑀2 ∈ ℳ𝑛1𝑛2 is the Kronecker product of 𝑀1 ∈ ℳ𝑛1
and 𝑀2 ∈ ℳ𝑛2, and diag(𝐯) ∈ ℳ𝑛2 is the diagonal matrix with the entries of 𝐯 on the diagonal.
Intuitively, this graph consists of 𝑛1 copies of 𝐺2, where the 𝑗th vertices in all copies of 𝐺2 are con-
nected by a copy of 𝐺1 if 𝐯𝑗 = 1. We restrict ourselves to connected simple graphs, so 𝐴1 and 𝐴2 are
symmetric 0–1 matrices and 𝐯 is nonzero. An example of the hierarchical product of two path graphs
is
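Π[1 0 1](𝑃2, 𝑃3), where 𝑃2 has vertices 1, 2 and 𝑃3 has vertices 1, 2, 3. The product has vertices
(1,1), (1,2), (1,3), (2,1), (2,2), (2,3): each row {𝑖} × 𝑉2 is a copy of 𝑃3, and the rows are joined
by the edges (1,1)–(2,1) and (1,3)–(2,3), since 𝐯1 = 𝐯3 = 1. (14)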
The Cartesian product is Π𝟏, where 𝟏 ≔ [1…1] (see Section 3.1.5). Furthermore, Π𝐞1 is the standard
hierarchical product, and Π𝐞𝑖 is the rooted product of graphs, rooted at the 𝑖th vertex of 𝐺2.
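The edge set of a hierarchical product can be built directly from Definition 3.2 without forming Kronecker products. The Python sketch below is an illustration with a hypothetical function name; it assumes both factor graphs have vertices numbered from 1 and are given as edge lists.

```python
def hierarchical_product(edges1, n1, edges2, n2, v):
    """Edges of the hierarchical product of G1 and G2 on vertex pairs (i, j)."""
    edges = set()
    for i in range(1, n1 + 1):             # identity ⊗ A2: one copy of G2 per vertex of G1
        for a, b in edges2:
            edges.add(((i, a), (i, b)))
    for j in range(1, n2 + 1):             # A1 ⊗ diag(v): a copy of G1 wherever v_j = 1
        if v[j - 1]:
            for a, b in edges1:
                edges.add(((a, j), (b, j)))
    return edges


p2 = [(1, 2)]                              # path on two vertices
p3 = [(1, 2), (2, 3)]                      # path on three vertices
example = hierarchical_product(p2, 2, p3, 3, [1, 0, 1])    # the graph of (14)
cartesian = hierarchical_product(p2, 2, p3, 3, [1, 1, 1])  # the 2 x 3 grid
```

The example graph has six edges and lacks the middle rung (1,2)–(2,2), whereas the Cartesian product adds it for a total of seven.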
We define the vertex-induced subgraph of any graph 𝐺 = (𝑉, 𝐸) for vertex set 𝑈 ⊆ 𝑉 as
𝐺[𝑈] ≔ (𝑈, 𝐸 ∩ (𝑈 × 𝑈)) . (15)
Now, let 𝐺 = (𝑉, 𝐸) = Π𝐯(𝐺1, 𝐺2) and denote the vertices of 𝐺 by 𝑣 = (𝑣1, 𝑣2) ∈ 𝑉1 × 𝑉2 = 𝑉. We define
𝒢𝑖 = (𝒱𝑖, ℰ𝑖) ≔ 𝐺 [{𝑖} × 𝑉2] , (16)
for 𝑖 ∈ 𝑉1. Note that each 𝒢𝑖 is isomorphic to 𝐺2, so the permuter for 𝐺2 can be used for 𝒢𝑖. We also
define the communicator vertices of 𝒢𝑖 as the vertices
{𝑖} × {𝑗 ∈ 𝑉2 ∣ 𝐯𝑗 = 1} ⊆ 𝒱𝑖 , (17)
and index them in ascending order (for some ordering of 𝑉). Note that the 𝑗th communicator vertex
(of any 𝒢𝑖) also belongs to 𝐺[𝑉1 × {𝑗}], which is isomorphic to 𝐺1.
A useful metric is
deg(𝜋) ≔ max ⋃_{𝑖∈𝑉1} {|{𝑣 ∈ dom(𝜋) ∩ 𝒱𝑖 ∣ 𝜋(𝑣) ∉ 𝒱𝑖}|, |{𝑣 ∈ dom(𝜋) ⧵ 𝒱𝑖 ∣ 𝜋(𝑣) ∈ 𝒱𝑖}|} , (18)
which represents the maximum number of vertices that need to leave or enter any 𝒢𝑖 to implement 𝜋.
In every iteration of the routing algorithm, we route a set 𝑅 = {𝑣(𝑖) ∈ 𝒱𝑖 ∣ 𝑖 ∈ 𝑉1} such that all 𝜋(𝑣)1
are distinct, for 𝑣 ∈ 𝑅 and 𝜋(𝑣) = (𝜋(𝑣)1, 𝜋(𝑣)2) ∈ 𝑉. Undefined values are always considered distinct.
We call such 𝑅 a set of representative vertices, and we view 𝑣(𝑖) as the representative vertex of 𝑉𝑖.
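To make the metric (18) concrete, here is a small sketch that counts, for every copy 𝒢𝑖, how many mapped vertices must leave it and how many must enter it. It assumes a partial permutation is stored as a Python dict from (𝑣1, 𝑣2) pairs to (𝑣1, 𝑣2) pairs; the function name `deg` mirrors the notation of the text, but the representation is an assumption.

```python
def deg(pi, v1_vertices):
    """Maximum number of mapped vertices leaving or entering any copy G_i, as in (18).

    pi maps (v1, v2) -> (w1, w2); only vertices in dom(pi) appear as keys.
    """
    worst = 0
    for i in v1_vertices:
        leaving = sum(1 for (a, _), (b, _) in pi.items() if a == i and b != i)
        entering = sum(1 for (a, _), (b, _) in pi.items() if a != i and b == i)
        worst = max(worst, leaving, entering)
    return worst


# Two vertices leave copy 0 and one vertex leaves copy 1.
crossing = {(0, 0): (1, 0), (0, 1): (1, 1), (1, 0): (0, 0)}
# A permutation that stays inside copy 0 needs no communication at all.
local = {(0, 0): (0, 1), (0, 1): (0, 0)}
```

Here `deg(crossing, [0, 1])` is 2 and `deg(local, [0, 1])` is 0, so only the first permutation requires routing between copies.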
input : 𝜋∶ 𝑉1 × 𝑉2 ⥎ 𝑉1 × 𝑉2; permuters on 𝐺1 and 𝐺2
1 Let 𝑅𝑖, for 𝑖 ∈ [deg(𝜋)], be given by Lemma 3.3
2 for 𝑖 = 1, … , ⌈deg(𝜋)/ham(𝐯)⌉ :
3   foreach 𝑗 ∈ 𝑉1 :
4     on 𝒢𝑗, for all 𝑘 ∈ [ham(𝐯)], route the (unique) vertex 𝑣 ∈ 𝑅_{(𝑖−1)·ham(𝐯)+𝑘} ∩ 𝒱𝑗 to the
      𝑘-th communicator vertex of 𝒢𝑗 // For 𝑅ℓ with ℓ > deg(𝜋), do nothing
5   foreach communicator vertex (𝑣1, 𝑣2) of 𝒢1 : // All copies of 𝐺1
6     on 𝐺[𝑉1 × {𝑣2}] = (𝑉′, 𝐸′), route each 𝑣 ∈ 𝑉′ ∩ dom(𝜋) to (𝜋(𝑣)1, 𝑣2) ∈ 𝑉′
7 foreach 𝑖 ∈ 𝑉1 :
8   route all 𝑣 ∈ dom(𝜋) ∩ 𝒱𝑖 to 𝜋(𝑣) within 𝒢𝑖
9 return the transpositions that implement this routing
Algorithm 3.1: Partial Routing via Matchings on the hierarchical product of graphs
Π𝐯(𝐺1, 𝐺2). In Lines 4 and 6, routing means constructing a partial permutation 𝜎 on a subgraph
(𝐺1 or 𝐺2), using the applicable permuter to find transpositions implementing 𝜎, and
applying those transpositions to update 𝜋 and each 𝑅𝑖.
Lemma 3.3. For a graph Π𝐯(𝐺1, 𝐺2) and 𝜋∶ 𝑉 ⥎ 𝑉, let 𝑑 ≔ deg(𝜋). We can find distinct sets of representative
vertices 𝑅𝑖, for 𝑖 ∈ [𝑑], such that
{𝑣 ∈ dom(𝜋) ∣ 𝑣1 ≠ 𝜋(𝑣)1} ⊆ ⋃_{𝑖∈[𝑑]} 𝑅𝑖 .
Proof. Let 𝐺 = (𝑈,𝑉, 𝐸) be a bipartite multi-graph, with 𝑈 = 𝑉 ≔ [𝑛1] the left and right vertex sets,
and the edge multi-set
𝐸 = {(𝑣1, 𝜋(𝑣)1) ∣ 𝑣 ∈ dom(𝜋)} . (19)
Each vertex 𝑘 ∈ 𝑈 belongs to at most 𝑑 edges (𝑘, 𝑙), for 𝑙 ∈ 𝑉 and 𝑘 ≠ 𝑙, and each vertex 𝑙′ ∈ 𝑉 belongs
to at most 𝑑 edges (𝑘′, 𝑙′), for 𝑘′ ∈ 𝑈 and 𝑘′ ≠ 𝑙′. However, for any 𝑘 ∈ 𝑈 there could be as many as
𝑛2 edges (𝑘, 𝑘). For all 𝑘 ∈ 𝑈 we remove as many (𝑘, 𝑘) ∈ 𝐸 as necessary to ensure that the maximum
degree of any vertex in 𝐺 is 𝑑.
We make 𝐺 𝑑-regular by repeating the following: If ∄𝑘 ∈ 𝑈 with deg(𝑘) < 𝑑, we are done. Otherwise,
such a 𝑘 exists and ∃𝑘′ ∈ 𝑉 with deg(𝑘′) < 𝑑, since
∑_{𝑘∈𝑈} deg(𝑘) = ∑_{𝑘′∈𝑉} deg(𝑘′) . (20)
It follows that there exist vertices 𝑢 ∈ 𝒱𝑘 ⧵ dom(𝜋) and 𝑣 ∈ 𝒱𝑘′ ⧵ image(𝜋). For the purposes of this
proof, we set 𝜋(𝑢) = 𝑣, effectively adding an edge (𝑘, 𝑘′) to 𝐸.
Now we have modified 𝜋 so that 𝐺 is 𝑑-regular. By Hall’s marriage theorem, there exists a perfect
matching in 𝐺, and removing it results in a (𝑑 − 1)-regular graph. We iterate this to find 𝑑 distinct
perfect matchings in 𝐺. Each edge (𝑘, 𝑘′) ∈ 𝐸 corresponds to some 𝑣 ∈ 𝒱𝑘 and 𝑢 ∈ 𝒱𝑘′, with 𝜋(𝑣) = 𝑢.
Therefore, each perfect matching corresponds to a set of representative vertices, 𝑅𝑖. Since all perfect
matchings are distinct, and all 𝑒 ∈ 𝐸 are covered by some matching, the Lemma follows.
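The last step of the proof, peeling 𝑑 perfect matchings off a 𝑑-regular bipartite multi-graph, can be sketched as follows. This is an illustration only: it assumes the multi-graph is given as an 𝑛 × 𝑛 matrix of edge multiplicities, and it uses a simple augmenting-path search rather than the faster matching algorithm cited later in the text. The names `perfect_matching` and `decompose_regular` are hypothetical.

```python
def perfect_matching(count):
    """Perfect matching (left -> right) in a bipartite multi-graph.

    count[u][v] is the number of parallel edges between left u and right v.
    """
    n = len(count)
    match_right = [None] * n  # right vertex -> matched left vertex

    def try_augment(u, seen):
        # Depth-first search for an augmenting path starting at left vertex u.
        for v in range(n):
            if count[u][v] > 0 and v not in seen:
                seen.add(v)
                if match_right[v] is None or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    for u in range(n):
        if not try_augment(u, set()):
            raise ValueError("no perfect matching; is the multi-graph regular?")
    return {match_right[v]: v for v in range(n)}


def decompose_regular(count):
    """Split a d-regular bipartite multi-graph into d perfect matchings."""
    count = [row[:] for row in count]  # work on a copy
    d = sum(count[0])
    matchings = []
    for _ in range(d):
        matching = perfect_matching(count)
        for u, v in matching.items():
            count[u][v] -= 1  # removing a perfect matching keeps the graph regular
        matchings.append(matching)
    return matchings


regular = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # a 2-regular example
parts = decompose_regular(regular)
```

Each returned matching plays the role of one set of representative vertices 𝑅𝑖: it selects one edge (𝑘, 𝑘′) per copy 𝒢𝑘 with all destinations 𝑘′ distinct.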
Algorithm 3.1 specifies a permuter for the hierarchical product. We prove the following performance
bounds for this algorithm.
Theorem 3.4. For a graph Π𝐯(𝐺1, 𝐺2), Algorithm 3.1 finds a sequence of transpositions that implements
𝜋∶ 𝑉 ⥎ 𝑉, certifying that
rt(Π𝐯(𝐺1, 𝐺2), 𝜋) ≤ ⌈deg(𝜋)/ham(𝐯)⌉ (rt(𝐺1) + rt(𝐺2)) + rt(𝐺2) ,
where ham(𝐯) is the Hamming weight of 𝐯, i.e., the number of ones in 𝐯.
Proof. In every round of routing, we route ham(𝐯) sets 𝑅𝑖 to their destination 𝒢𝑗s, for 𝑗 ∈ 𝑉1. In each
round, we route on all copies of 𝐺2 in parallel and then route on all copies of 𝐺1 in parallel. After routing
all 𝑅𝑖 in at most ⌈deg(𝜋)/ham(𝐯)⌉ rounds, Lemma 3.3 ensures that only permutations local to each 𝒢𝑗
remain. Finally, we route vertices to their destinations, as given by 𝜋, in each 𝒢𝑗 independently using
the permuter for 𝐺2.
Corollary 3.5.
rt(Π𝐯(𝐺1, 𝐺2)) ≤ ⌈𝑛2/ham(𝐯)⌉ (rt(𝐺1) + rt(𝐺2)) + rt(𝐺2) .
Proof. By definition (9), we maximize Theorem 3.4 over 𝜋∶ 𝑉 ↔ 𝑉 and bound deg(𝜋) ≤ 𝑛2 to get the
result.
As a possible optimization, we can remove some vertices from the partial permutations in the rout-
ing steps. For each removed vertex, we must ensure that the remaining steps of the routing algorithm
remain valid. Specifically, let there be a 𝑢 ∈ 𝒢𝑖 ∩ 𝑅𝑘 for 𝑖 ∈ 𝑉1 and 𝑘 ∈ [deg(𝜋)]. If 𝑢 ∈ dom(𝜋) and
𝜋(𝑢) ∈ 𝒱𝑖, then we remove it since it does not need to be routed outside of 𝒢𝑖. Otherwise, if 𝑢 ∉ dom(𝜋),
we remove it unless
∃𝑣 ∈ {𝑅𝑘 ∩ dom(𝜋) ∣ 𝜋(𝑣) ∈ 𝒢𝑖} (21)
since an unmapped vertex is expected at the communicator vertex in the second loop of the routing
round. We apply this optimization in our implementation of the permuter for modular graphs (Sec-
tion 3.1.4).
Next, we analyze the time complexity of Algorithm 3.1. Let 𝑡1 and 𝑡2 upper bound the time com-
plexity of algorithms for Partial Routing via Matchings on 𝐺1 and 𝐺2, respectively. We first find
deg(𝜋) distinct sets of representative vertices by Lemma 3.3. The time to find one set of representative
vertices is dominated by the time to find the maximum bipartite matching, 𝑂(𝑛1^2.5) [24]. Then, for
⌈deg(𝜋)/ham(𝐯)⌉ iterations, we route on all copies of 𝐺2 and then 𝐺1 in parallel. Overall, we get a time
complexity of
𝑂(deg(𝜋) · 𝑛1^2.5 + ⌈deg(𝜋)/ham(𝐯)⌉ (ham(𝐯)𝑡1 + 𝑛1𝑡2) + 𝑛1𝑡2) . (22)
We show a lower bound on the routing number of hierarchical products of graphs and prove that it
is tight, up to constant factors.
Theorem 3.6. For a graph Π𝐯(𝐺1, 𝐺2) and any 𝜋∶ 𝑉 ⥎ 𝑉,
2⌈deg(𝜋)/ham(𝐯)⌉ − 1 ≤ rt(Π𝐯(𝐺1, 𝐺2), 𝜋) .
Proof. Let us consider the token-based formulation of Partial Routing via Matchings. At most
deg(𝜋) tokens need to be moved out of any 𝒢𝑖, for 𝑖 ∈ 𝑉1. Every matching can move at most ham(𝐯)
tokens out of their original 𝒢𝑖. Once moved out, a new set of tokens must be moved onto the ham(𝐯)
communicator vertices. Therefore, it takes at least 2⌈deg(𝜋)/ham(𝐯)⌉ − 1 matchings to move deg(𝜋)
tokens out of any 𝒢𝑖.
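The bounds of Theorems 3.4 and 3.6 are simple enough to evaluate numerically. The sketch below assumes the routing numbers rt(𝐺1) and rt(𝐺2) are supplied by the caller (the values used in the example are placeholders, not results from the paper) and uses integer arithmetic for the ceiling.

```python
def rounds(deg_pi, ham_v):
    """Number of routing rounds, ceil(deg(pi) / ham(v)), in integer arithmetic."""
    return (deg_pi + ham_v - 1) // ham_v


def routing_upper_bound(deg_pi, ham_v, rt_g1, rt_g2):
    """Upper bound of Theorem 3.4 on rt(Pi_v(G1, G2), pi)."""
    return rounds(deg_pi, ham_v) * (rt_g1 + rt_g2) + rt_g2


def routing_lower_bound(deg_pi, ham_v):
    """Lower bound of Theorem 3.6; only informative when deg(pi) >= 1."""
    return 2 * rounds(deg_pi, ham_v) - 1


# Placeholder inputs: one communicator vertex and rt(G1) = rt(G2) = 2.
lower = routing_lower_bound(3, 1)
upper = routing_upper_bound(3, 1, 2, 2)
```

With these inputs the two bounds are 5 and 14, which illustrates that they differ by a factor that depends only on rt(𝐺1) and rt(𝐺2), not on deg(𝜋).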
We now show that Theorem 3.6 is tight up to constant factors by considering a specific permutation
on the path graph 𝑃2𝑛, with 𝑛 ∈ ℕ^+. We have 𝑃2𝑛 = (𝑉, 𝐸) ≅ Π𝐞1(𝑃2, 𝑃𝑛) by a relabeling of vertices. We
define 𝜋′∶ 𝑉 ↔ 𝑉 as
𝜋′ ≔ ∏_{𝑖=0}^{𝑛−1} (𝑖 (𝑛 + 𝑖)) . (23)
Then,
2⌈deg(𝜋′)/ham(𝐞1)⌉ − 1 = 2𝑛 − 1 ≤ rt(Π𝐞1(𝑃2, 𝑃𝑛), 𝜋′) = rt(𝑃2𝑛, 𝜋′) ≤ rt(𝑃2𝑛) ≤ 2𝑛 , (24)
where we used Section 3.1.2 for the last inequality, and 𝐞𝑖 ∈ {0, 1}^𝑛 is the 𝑖th standard basis vector. This
also matches the tightest known (diameter) lower bound for rt(𝑃2𝑛).
3.1.4 Modular Graphs
Large-scale quantum computation may benefit from a modular design, with many interconnected subunits
[40, 41, 12]. As a simple model of a modular quantum processor consisting of 𝑛1 modules with 𝑛2
qubits each, we consider the modular graph Mod(𝑛1, 𝑛2) ≔ Π𝐞1(𝐾𝑛1, 𝐾𝑛2) = (𝑉, 𝐸). In this architec-
ture, any two qubits in the same module can be directly coupled, and any two modules can be coupled
through their unique communicator qubits. With one minor modification to Theorem 3.4, we get the
following bounds on the routing number of the modular graph.
Corollary 3.7. For 𝑛1, 𝑛2 ∈ ℕ and 𝜋∶ 𝑉 ⥎ 𝑉, we have
2 deg(𝜋) − 1 ≤ rt(Mod(𝑛1, 𝑛2), 𝜋) ≤ 3 deg(𝜋) + 2 .
Proof. Directly applying Theorem 3.4 gives
rt(Mod(𝑛1, 𝑛2), 𝜋) ≤ 4 deg(𝜋) + 2 . (25)
However, only one token needs to be routed to the communicator vertex in every round of Algorithm 3.1
and this satisfies (11). Therefore, we can route with one set of parallel transpositions, saving us one
matching every round.
To show the lower bound, we apply Theorem 3.6 with ham(𝐞1) = 1.
We evaluate the time complexity of this permuter using Eq. (22). Recall from Section 3.1.1 that the
time complexity of the permuter is 𝑂(𝑛). Thus we have 𝑡1 = 𝑂(𝑛1) and 𝑡2 = 𝑂(𝑛2), giving an overall
time complexity of 𝑂(𝑑 𝑛1^2.5 + 𝑛1𝑛2), where we noted that 𝑡2 = 𝑂(1) while doing the deg(𝜋) rounds of
routing.
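As a concrete picture of the modular graph Mod(𝑛1, 𝑛2) defined at the start of this subsection, the following sketch enumerates its edges. It assumes vertices are labeled (module, position) with 0-based indices and that position 0 is the communicator qubit of each module; the function name `modular_graph_edges` is illustrative.

```python
from itertools import combinations


def modular_graph_edges(n1, n2):
    """Edge set of Mod(n1, n2): a complete graph inside each module, plus a
    complete graph on the communicator vertices (position 0 of every module)."""
    edges = set()
    for module in range(n1):
        for a, b in combinations(range(n2), 2):
            edges.add(((module, a), (module, b)))
    for m1, m2 in combinations(range(n1), 2):
        edges.add(((m1, 0), (m2, 0)))
    return edges


mod_3_4 = modular_graph_edges(3, 4)
```

For three modules of four qubits this gives 3 · 6 edges inside modules and 3 edges between communicator qubits, so any inter-module interaction has to pass through position 0.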
3.1.5 Cartesian Product
The Cartesian product of graphs is a special case of the hierarchical product, namely Π𝟏 for 𝟏 ≔ [1…1].
We refer to a copy of 𝐺1 in 𝐺1 × 𝐺2 (i.e., 𝐺[𝑉1 × {𝑣2}] for some 𝑣2 ∈ 𝑉2) as a row of 𝐺1 × 𝐺2, and, vice
versa, to a copy of 𝐺2 as a column. Also, let 𝑛1 ≔ |𝑉1| and 𝑛2 ≔ |𝑉2|. Theorem 3.4 allows us to reprove
an upper bound on the routing number of a Cartesian product of graphs [2].
Corollary 3.8. For any graphs 𝐺1 = (𝑉1, 𝐸1) and 𝐺2 = (𝑉2, 𝐸2),
rt(𝐺1 × 𝐺2) = rt(Π𝟏(𝐺1, 𝐺2)) ≤ rt(𝐺1) + 2 rt(𝐺2) .
Proof. We fill in ham(𝐯) = 𝑛2 in Corollary 3.5 to get the result.
Lemma 3.3 does not specify the order in which systems of distinct representatives are picked, but
this order matters in practice. Since ham(𝐯) = 𝑛2, we can pick 𝑛2 distinct sets of representative vertices
without incurring another round of routing (in Algorithm 3.1). We propose a heuristic for picking these
𝑛2 sets that seems to produce low-depth implementations of partial permutations in practice (Algo-
rithm 3.2).
Algorithm 3.2 uses a modification of Lemma 3.3 to choose representative vertices. The proof of
Lemma 3.3 can be straightforwardly extended by not initially removing edges of the form (𝑘, 𝑘) and
adding edges until an 𝑛2-regular bipartite multi-graph, 𝐵, is constructed. Thus, by Hall’s marriage
theorem, there exist 𝑛2 distinct perfect matchings in 𝐵, enough for all the rows. We choose a perfect
matching of minimum weight for each row with respect to a heuristic cost function 𝑐∶ dom(𝜋) × 𝑉2 →
ℕ, with the rows processed in a random order.
We add additional edges to 𝐵 to allow for more options to minimize the weight. We construct a
bipartite multi-graph 𝐵′ that contains 𝐵, disregarding some duplicated edges. Edge duplication does
not change the minimum-weight perfect matching. Instead of adding an edge for unmapped vertices
as in Lemma 3.3, we add edges to all possible destination columns for each column with an unmapped
vertex.
Let 𝜎∶ 𝑉1 × 𝑉2 ⥎ 𝑉1 × 𝑉2 be the partial permutation defined on Line 1 of Algorithm 3.2. The cost
function depends on the current value of 𝜎 and is defined as
𝑐(𝑣, 𝑖) ≔ rt(𝐺2, 𝜋1) + rt(𝐺2, {𝑖 ↦ 𝜋(𝑣)2}) − rt(𝐺2, 𝜋2) − rt(𝐺2, {𝑣2 ↦ 𝜋(𝑣)2}) , (26)
where we define 𝜋𝑘∶ 𝑉2 ⥎ 𝑉2 for 𝑘 ∈ [2] such that
𝜋1∶ 𝑢 ↦ 𝜎(𝑣1, 𝑢)2   if (𝑣1, 𝑢) ∈ dom(𝜎) ,
        𝑖           if 𝑢 = 𝑣2            (27)
is the partial permutation routing 𝑣 to row 𝑖 within its column, and 𝜋2∶ 𝑢 ↦ 𝜎(𝑣1, 𝑢)2 is the current
partial permutation already planned for column 𝑣1. For simplicity, we assume the routing time along
rows is the same in both cases, so it cancels out. To compute an upper bound on the routing number
in (26) we use the given permuter for 𝐺2.
To implement routing on the Cartesian product of graphs, we route 𝜎 obtained from Algorithm 3.2
within each column independently, and proceed with Line 5 of Algorithm 3.1.
Finally, we analyze the time complexity of the permuter for Cartesian products of graphs. Assume
the time complexity of computing rt(𝐺1, 𝜎) and rt(𝐺2, 𝜎′) is upper bounded by 𝑡1 and 𝑡2, respectively.
Computing the cost function (26) then has time complexity 𝑂(𝑡2). In Algorithm 3.2 we construct a bipartite
weighted graph with 2𝑛2 vertices in time 𝑂(𝑛2𝑛1𝑡2 + 𝑛2^2). On that graph we perform a maximum
weighted bipartite matching algorithm in 𝑂(𝑛2^3) using the Hungarian algorithm [26].³ We do this once
for each row and route all vertices to their assigned rows. Then, we continue with Line 5 of Algorithm 3.1,
resulting in a total time complexity for running the permuter of 𝑂(𝑛1 (𝑛2𝑛1𝑡2 + 𝑛2^3) + 𝑛2𝑡1).
³ A tighter bound of 𝑂(√𝑛 𝑚 log(𝑛𝐶)) is possible [19], for 𝑛, 𝑚, 𝐶 the number of vertices, the number of edges, and
the absolute maximum integer edge weight, respectively. Our edge weights can be scaled to integers that are upper bounded by
𝑂(𝑛2 (𝑛1 + 𝑛2)).
input : 𝜋∶ 𝑉1 × 𝑉2 ⥎ 𝑉1 × 𝑉2, a partial permutation
1 𝜎 ← ∅ // we have 𝜎∶ 𝑉1 × 𝑉2 ⥎ 𝑉1 × 𝑉2
2 𝑟 ← 𝑛2 // #remaining rows
3 foreach row 𝑖 ∈𝑅 𝑉2 :
4   𝐸 ← {(𝑣1, 𝜋(𝑣)1, 𝑐(𝑣, 𝑖)) ∣ 𝑣 ∈ dom(𝜋) ⧵ dom(𝜎)}
    // Add edges for unmapped vertices
5   𝐸′ ← 𝐸
6   𝐺 = (𝑈, 𝑉, 𝐸′), with 𝑈 = 𝑉 ≔ [𝑛1]
7   foreach 𝑢 ∈ 𝑈 with deg𝐸(𝑢) < 𝑟 :
8     foreach 𝑣 ∈ 𝑉 with deg𝐸(𝑣) < 𝑟 :
9       Add (𝑢, 𝑣, 𝜖) to 𝐸
10  Find a minimum-weight perfect matching 𝐸match in 𝐺
11  𝑉match ← the set of vertices associated with 𝐸match
12  𝜎 ← 𝜎 + {𝑣 ↦ (𝑣1, 𝑖) ∣ 𝑣 ∈ 𝑉match} // Recall (1)
13  𝑟 ← 𝑟 − 1
14 return 𝜎
Algorithm 3.2: Heuristically choosing distinct sets of representative vertices for the Cartesian
product of graphs. We modify Lemma 3.3 to pick 𝑛2 minimum-weight perfect matchings, with
respect to the heuristic cost function 𝑐 (Eq. (26)). The notation ∈𝑅 indicates that we select
elements uniformly at random without replacement. The edges of the weighted undirected
bipartite multi-graph 𝐺 are specified as a multi-set of triples from 𝑉 × 𝑉 × ℝ. We pick 𝜖 > 0
so that zero-cost edges for mapped vertices are favored over edges for unmapped vertices.
3.2 Partial Token Swapping
The Token Swapping problem is similar to Routing via Matchings, but minimizes the total number
of transpositions instead of the depth [57]. It follows that the induced permutation circuit is optimized
for circuit size. For 𝜖 > 0, a (1 + 𝜖)-approximation algorithm is an algorithm that produces a solution
within a factor (1 + 𝜖) of optimal for all valid inputs. Here, we define a generalized version of Token
Swapping that allows for partial permutations, and then give a 4-approximation algorithm for this
problem on connected simple graphs that generalizes a previous 4-approximation algorithm for total
permutations [39].
Definition 3.9 (Partial Token Swapping). We define Partial Token Swapping as an optimization
problem. Given are a graph 𝐺 = (𝑉, 𝐸) and a partial permutation 𝜋∶ 𝑉 ⥎ 𝑉. The objective is to find the
smallest 𝑘 ∈ ℕ such that 𝜋̂ = (𝑢1 𝑣1)(𝑢2 𝑣2) … (𝑢𝑘 𝑣𝑘), for 𝜋̂ some completion of 𝜋 and (𝑢𝑖, 𝑣𝑖) ∈ 𝐸 for
𝑖 ∈ [𝑘].
Analogous to the routing number, we define the routing size of 𝜋∶ 𝑉 ⥎ 𝑉 on 𝐺, rs(𝐺, 𝜋), to be the
minimum 𝑘 in Definition 3.9, and the routing size of 𝐺 as
rs(𝐺) ≔ max_{𝜎∈Sym(𝑉)} rs(𝐺, 𝜎) . (28)
Token Swapping is the special case of Partial Token Swapping where 𝜋 is constrained to be a total
permutation. Partial Token Swapping also has an equivalent token-based formulation, similar to
Partial Routing via Matchings.
The decision version of Token Swapping was first shown to be NP-complete [39] and hard for a
model of parametrized complexity, parametrized by the number of swaps 𝑘 [10]. Furthermore, assuming
the Exponential Time Hypothesis (ETH), Token Swapping cannot be solved in time 𝑓(𝑘)(|𝑉| + |𝐸|)^{𝑜(𝑘/log 𝑘)}
with 𝑓 any computable function [10].
input : 𝜋∶ 𝑉 ⥎ 𝑉
1 while 𝜋 ≠ id |dom(𝜋) :
2 if there exists a happy swap chain 𝑣1𝑣2…𝑣ℓ then
3 Perform transpositions (𝑣1 𝑣2)(𝑣2 𝑣3) … (𝑣ℓ−1 𝑣ℓ)
4 else if ∃𝑣 ∈ dom(𝜋), ∃𝑢 ∈ 𝑁(𝑣) ⧵ dom(𝜋) ∶ 𝑑(𝑢, 𝜋(𝑣)) < 𝑑(𝑣, 𝜋(𝑣)) then
5 Perform no-token swap (𝑣 𝑢) // 𝑢 has no token
6 else
7 There exists an unhappy swap; perform it
8 Update 𝜋 according to the transpositions that were performed
9 return The sequence of transpositions that was performed
Algorithm 3.3: Routing tokens to their destinations while minimizing the number of trans-
positions. We add an extra step that performs no-token swaps to the algorithm of [39]. For
𝑣 ∈ 𝑉,𝑁(𝑣) ⊆ 𝑉 denotes the set of neighbors of 𝑣. The partial permutation id |dom(𝜋)∶ 𝑉 ⥎ 𝑉
is the restriction of the identity function id∶ 𝑉 ↔ 𝑉 to dom(𝜋) (so it is undefined outside of
dom(𝜋)).
3.2.1 Approximation Algorithm for Partial Token Swapping
We now describe a permuter that aims to minimize the circuit size. Miltzow et al. [39] gave a 4-approx-
imation algorithm for Token Swapping. Here, we generalize their results to Partial Token Swap-
ping and prove that our generalized algorithm is also a 4-approximation algorithm. For this section,
we consider the token-based formulation of Partial Token Swapping (recall the notion of tokens
introduced in Section 3.1).
The main idea of Miltzow et al. is to perform swaps that reduce the sum of all distances of tokens
to their destinations. We use the following definitions from [39]: An unhappy swap is “an edge swap
where one of the tokens swapped is already on its target and the other token reduces its distance to its
target vertex (by one)”, and a happy swap chain is a path of ℓ + 1 distinct vertices 𝑣1𝑣2…𝑣ℓ, such that
swapping all (𝑣𝑖, 𝑣𝑖+1) ∈ 𝐸, for 𝑖 ∈ [ℓ−1], in increasing order strictly reduces the distances of all tokens
in the chain to their destinations.
When considering a partial permutation, not all vertices have a token assigned to them. We add an
extra step to the approximation algorithm for Token Swapping tomake use of this: Before considering
an unhappy swap, we first try to swap a token to a tokenless neighbor if it brings the token closer to its
destination. We call this a no-token swap.
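The search for a no-token swap can be sketched in a few lines. The code below is an illustration under stated assumptions, not the paper's implementation: the graph is an adjacency dict, `pi` maps each vertex that currently holds a token to that token's destination (so dom(𝜋) is exactly the set of keys), and distances are recomputed by breadth-first search instead of being read from a precomputed distance matrix. The names `bfs_distances` and `find_no_token_swap` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def find_no_token_swap(adj, pi):
    """Return (v, u) where u is a tokenless neighbor of v that is strictly
    closer to pi(v) than v is, or None if no such pair exists."""
    for v, target in pi.items():
        dist = bfs_distances(adj, target)  # distances to the token's destination
        for u in adj[v]:
            if u not in pi and dist[u] < dist[v]:
                return (v, u)
    return None


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # the path graph on 4 vertices
swap = find_no_token_swap(path, {0: 2})
```

On the path above, the single token at vertex 0 with destination 2 can step onto the empty vertex 1, so `swap` is `(0, 1)`; if vertex 1 already held a token, the function would return `None` and the algorithm would fall through to an unhappy swap.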
The approximation algorithm for Partial Token Swapping is specified in full in Algorithm 3.3.
Theorem 3.10. Given a simple connected graph 𝐺 = (𝑉, 𝐸) and 𝜋∶ 𝑉 ⥎ 𝑉, Algorithm 3.3 uses at most
4 · rs(𝐺, 𝜋) transpositions.
Proof. The proof is very similar to [39, Theorem 7] with some minor modifications to account for no-
token swaps. Let
𝑆 ≔ ∑
𝑣∈dom(𝜋)
𝑑(𝑣, 𝜋(𝑣)) . (29)
We know that rs(𝐺, 𝜋) ≥ 𝑆/2 since each swap can only reduce 𝑆 by two. A no-token swap reduces 𝑆 by
one. A happy swap chain of length ℓ reduces 𝑆 by ℓ + 1. As such, over the course of the algorithm,
#(happy swaps) + #(no-token swaps) ≤ 𝑆 . (30)
For an unhappy swap, the token that is swapped away from its destination must next be involved in a
happy swap or a no-token swap, so
#(unhappy swaps) ≤ #(happy swaps) + #(no-token swaps) . (31)
Overall, we have
#(unhappy swaps) + #(happy swaps) + #(no-token swaps)
≤ 2#(happy swaps) + 2#(no-token swaps)
≤ 2𝑆 ≤ 4 · rs(𝐺, 𝜋) .
Miltzow et al. further showed that their algorithm for total permutations gives a 2-approximation
algorithm when the graph is a tree. We now give an example showing that this is not the case for our
modified algorithm when the permutation is partial. Consider the path graph 𝑃𝑛, for 𝑛 > 2, and a partial
permutation
𝜋 ≔ {𝑖 ↦ 𝑖 + 1 ∣ 𝑖 ∈ [𝑛 − 2]} ∪ {𝑛 ↦ 1} . (32)
Trivially, the shortest product of transpositions implementing 𝜋 is
∏_{𝑖=0}^{𝑛−2} ((𝑛 − 𝑖) (𝑛 − 1 − 𝑖)) (33)
of length 𝑛 − 1. However, the algorithm selects no-token swaps arbitrarily. In the worst case, it could
select the sequence of transpositions
[∏_{𝑖=0}^{𝑛−3} ((𝑛 − 2 − 𝑖) (𝑛 − 1 − 𝑖))] ⋅ [∏_{𝑖=0}^{𝑛−2} ((𝑛 − 𝑖) (𝑛 − 1 − 𝑖))] ⋅ [∏_{𝑖=0}^{𝑛−3} ((2 + 𝑖) (3 + 𝑖))] (34)
of length 3𝑛 − 5. Therefore, in the limit we get an approximation ratio of lim_{𝑛→∞} (3𝑛 − 5)/(𝑛 − 1) = 3.
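The two swap sequences (33) and (34) can be checked mechanically. The sketch below assumes that the transpositions in each product are applied from left to right (increasing 𝑖 within each factor) and that a token is represented by its destination; the helper names `apply_swaps`, `is_routed`, and `path_example` are illustrative.

```python
def apply_swaps(tokens, swaps):
    """Apply swaps in order; tokens maps vertex -> destination of the token there."""
    tokens = dict(tokens)
    for a, b in swaps:
        assert abs(a - b) == 1, "swaps must act on edges of the path"
        token_a, token_b = tokens.pop(a, None), tokens.pop(b, None)
        if token_a is not None:
            tokens[b] = token_a
        if token_b is not None:
            tokens[a] = token_b
    return tokens


def is_routed(tokens):
    """True when every token sits on its destination vertex."""
    return all(vertex == dest for vertex, dest in tokens.items())


def path_example(n):
    """The partial permutation (32) and the swap sequences (33) and (34)."""
    pi = {i: i + 1 for i in range(1, n - 1)}  # i in [n - 2]
    pi[n] = 1
    short = [(n - i, n - 1 - i) for i in range(n - 1)]             # Eq. (33)
    detour = ([(n - 2 - i, n - 1 - i) for i in range(n - 2)]
              + short
              + [(2 + i, 3 + i) for i in range(n - 2)])            # Eq. (34)
    return pi, short, detour


pi, short, detour = path_example(6)
```

For 𝑛 = 6 both sequences route every token to its destination, using 5 and 13 swaps respectively, in agreement with the lengths 𝑛 − 1 and 3𝑛 − 5.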
While (32) is only undefined on one input, we can modify 𝜋 by removing 𝑘 = 𝑜(𝑛) entries to make
it harder to find an appropriate completion, since there are (𝑘 + 1)! possibilities. Then the algorithm
still asymptotically achieves an approximation ratio of lim𝑛→∞(3𝑛 − 5 − 2𝑘)/(𝑛 − 1) = 3.
Of course, it is still possible that the algorithm could achieve better than a 4-approximation. We
leave the best approximation ratio of our PartialToken Swapping algorithm (on trees and in general)
as an open question.
Finally, we determine the time complexity of this permuter. Computing an all-to-all distance matrix
takes time Θ(|𝑉|^3) using the Floyd-Warshall algorithm [17], but this cost needs only to be incurred once
for a graph so we do not include it. A happy or unhappy swap can be found in time 𝑂(|𝐸|) by finding
cycles in an auxiliary directed graph [39]. Similarly, finding no-token swaps has time complexity 𝑂(|𝐸|).
Therefore, we get a total time complexity of 𝑂(𝑆|𝐸|) ≤ 𝑂(|𝑉|^2|𝐸|).
4 Placing Qubits on the Architecture
A mapping algorithm (or mapper) finds an assignment of circuit qubits to architecture vertices such that
gates can be executed efficiently. We specify mappers in terms of the routing number rt(𝐺, 𝜋) (Eq. (9))
and the routing size rs(𝐺, 𝜋) (Eq. (28)), where 𝐺 = (𝑉, 𝐸) is the architecture graph and 𝜋∶ 𝑉 ⥎ 𝑉. In
practice, we replace these quantities with the upper bounds that result from applying our permuters.
Mappers construct placements of circuit qubits onto qubits of the architecture. A placement is a
bijective partial function 𝑝∶ 𝑄 ⥎ 𝑉. A mapper has access to the current placement 𝑝̂∶ 𝑄 → 𝑉 provided
by the circuit transformation. Given a placement 𝑝 and the current placement 𝑝̂, we can compute a
partial permutation 𝑝 ∘ 𝑝̂^{−1}∶ 𝑉 ⥎ 𝑉 that implements 𝑝. All our mappers construct a placement 𝑝 that
is initially undefined everywhere and modify it until finished.
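The partial permutation that implements a placement is a one-line composition. As a minimal sketch, assume both the desired placement and the current placement are Python dicts from circuit qubits to architecture vertices (the names `p`, `p_hat`, and `placement_to_partial_permutation` are illustrative):

```python
def placement_to_partial_permutation(p, p_hat):
    """Compose the desired placement p with the inverse of the current
    placement p_hat: the vertex now holding qubit q is sent to p[q]."""
    return {p_hat[q]: v for q, v in p.items()}


p_hat = {"a": 0, "b": 1, "c": 2}  # current placement, defined on all qubits
p = {"a": 1, "b": 2}              # desired placement, defined on a subset
pi = placement_to_partial_permutation(p, p_hat)
```

Here `pi` is `{0: 1, 1: 2}`: vertex 2, which holds the unplaced qubit `c`, is left out of the domain, so a permuter is free to move `c` wherever is cheapest.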
In the remainder of this section, we describe several specific mappers that we implement and eval-
uate. We describe mappers optimizing for circuit depth in Section 4.1 and for circuit size in Section 4.2.
We also give an upper bound on the time complexity of the mappers as a function of the time complexity
of the permuter, 𝑡𝑝.
4.1 Circuit Depth Mappers
In this section we discuss mappers that attempt to minimize the transformed circuit depth. Let 𝐿 be the
first layer of gates of the input circuit, and let 𝑀 be a maximum matching in the architecture graph.
4.1.1 Greedy Depth Mapper
The greedy depth mapper iteratively places the highest-cost gate at its lowest-cost location, where cost
is measured in terms of the routing number to achieve the placement. More precisely, we initialize the
set of used vertices 𝑈 ≔ ∅ and find a placement 𝑝′ ≔ {𝑞1 ↦ 𝑣1, 𝑞2 ↦ 𝑣2} that attains the optimum
max_{(𝑞1,𝑞2)∈tg(𝐿)} min_{(𝑣1,𝑣2)∈𝑀} rt(𝐺, (𝑝 ∪ {𝑞1 ↦ 𝑣1, 𝑞2 ↦ 𝑣2}) ∘ 𝑝̂^{−1}) , (35)
where we consider both orderings of edges from 𝑀, (𝑣, 𝑢), (𝑢, 𝑣) ∈ 𝑀, since edges are undirected. Then,
we update 𝑈 ← 𝑈 ∪ dom(𝑝′) and recompute 𝑀 for the graph 𝐺[𝑉 ⧵ 𝑈] (recall (15)); we remove the gate
associated to (𝑞1, 𝑞2) from 𝐿; we set 𝑝 ← 𝑝 ∪ 𝑝′; and we iterate until tg(𝐿) = ∅ or 𝑀 = ∅. Finally, we
return the placement 𝑝.
In this procedure, we perform at most min{|𝐿|, |𝑀|} iterations to place gates. In each iteration, we
find a 𝑝′ according to (35) in time 𝑂(|𝐿||𝑀|𝑡𝑝). Thus, the time complexity for one call of the mapper is
𝑂(min{|𝐿|, |𝑀|} (|𝐿||𝑀|𝑡𝑝 + √|𝑉| |𝐸|)) , (36)
where 𝑂(√|𝑉| |𝐸|) is the complexity of computing a maximum matching [38].
4.1.2 Incremental Depth Mapper
Instead of trying to place (almost) all gates in 𝐿, the incremental depth mapper guarantees placement
of only the lowest-cost gate, as given by the routing number, and incrementally improves the situation
for the other gates. Specifically, we first find a placement 𝑝min ≔ {𝑞1 ↦ 𝑣1, 𝑞2 ↦ 𝑣2} that attains the
optimum
𝑐′min ≔ min_{(𝑞1,𝑞2)∈tg(𝐿)} min_{(𝑣1,𝑣2)∈𝐸} rt(𝐺, (𝑝 ∪ {𝑞1 ↦ 𝑣1, 𝑞2 ↦ 𝑣2}) ∘ 𝑝̂^{−1}) , (37)
where we consider both orderings of 𝐸, (𝑢, 𝑣), (𝑣, 𝑢) ∈ 𝐸. We set 𝑝 ← 𝑝min and define 𝑈 ≔ {𝑢, 𝑣}. Let
𝑐min ≔ max{𝑐′min, 1}.
We find a placement for the remaining two-qubit gates that (individually) does not exceed 𝑐min. We
iterate in arbitrary order over (𝑞1, 𝑞2) ∈ tg(𝐿) and do the following: For 𝑖 ∈ [2], we construct a set of
eligible vertices
𝑈𝑖 ≔ {𝑣 ∈ 𝑉 ⧵ 𝑈 ∣ rt(𝐺, (𝑝 ∪ {𝑞𝑖 ↦ 𝑣}) ∘ 𝑝̂^{−1}) ≤ 𝑐min} . (38)
Now we try to find 𝑣1∗ ≠ 𝑣2∗ as
(𝑣1∗, 𝑣2∗) ≔ argmin_{(𝑣1,𝑣2)∈𝑈1×𝑈2} 𝑑(𝑣1, 𝑣2) . (39)
If such (𝑣1∗, 𝑣2∗) does not exist, we do not include 𝑞1 and 𝑞2 in 𝑝; otherwise, we set 𝑝 ← 𝑝 ∪ {𝑞1 ↦
𝑣1∗, 𝑞2 ↦ 𝑣2∗} and update 𝑈 ← 𝑈 ∪ {𝑣1∗, 𝑣2∗}. After iterating over all gates in tg(𝐿), we return 𝑝.
The time complexity of the incremental mapper is
𝑂(|𝐿| (|𝐸|𝑡𝑝 + |𝑉|𝑡𝑝 + |𝑉|^2)) . (40)
This assumes we have access to the all-pairs distance matrix of the architecture graph, which can be
precomputed in time Θ(|𝑉|^3) [17] (independent of the input circuit).
4.2 Circuit Size Mappers
We now discuss mappers that optimize for circuit size. The behavior of such mappers is somewhat
different from mappers optimizing for circuit depth. If there is any gate that can be performed without
moving qubits, then there is no disadvantage to doing that immediately since it will have to be per-
formed eventually. If there is any such gate, we simply return the empty placement. Thus we assume,
for all mappers in this section, that there are no gates to be performed in-place.
4.2.1 Greedy Size Mapper
The greedy size mapper is the same as the greedy depth mapper (Section 4.1.1), except that we replace rt(·)
with rs(·) in (35).
4.2.2 Simple Size Mapper
The simple size mapper places only the lowest-cost gate at its lowest-cost location. More precisely, we
find a placement 𝑝 ≔ {𝑞1 ↦ 𝑣1, 𝑞2 ↦ 𝑣2} that attains the optimum
min_{(𝑞1,𝑞2)∈tg(𝐿)} min_{(𝑣1,𝑣2)∈𝐸} rs(𝐺, (𝑝 ∪ {𝑞1 ↦ 𝑣1, 𝑞2 ↦ 𝑣2}) ∘ 𝑝̂^{−1}) (41)
where we consider all orderings of the edges of 𝐸, and return 𝑝. Note that we have replaced rt(·) with
rs(·) in (37). The time complexity of the simple size mapper is 𝑂(|𝐿||𝐸|𝑡𝑝).
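The double minimization in (41) amounts to an exhaustive search over gates and oriented edges. The sketch below is a simplified illustration: the `cost` callable stands in for the routing-size estimate rs(·) obtained from a permuter, and the example uses the total distance on a path graph as a crude proxy for that cost. The proxy and all names are assumptions made for this example only.

```python
def simple_size_mapper(gates, edges, cost):
    """Place the single cheapest gate on its cheapest edge, as in (41).

    gates: iterable of (q1, q2) qubit pairs from the first layer.
    edges: iterable of undirected architecture edges (u, v).
    cost:  callable taking a placement dict; a stand-in for rs(...).
    """
    best_cost, best_placement = None, {}
    for q1, q2 in gates:
        for u, v in edges:
            for v1, v2 in ((u, v), (v, u)):  # both orientations of the edge
                placement = {q1: v1, q2: v2}
                c = cost(placement)
                if best_cost is None or c < best_cost:
                    best_cost, best_placement = c, placement
    return best_placement


# Path architecture 0-1-2-3 with qubits "a" and "b" at opposite ends.
current = {"a": 0, "b": 3}
path_edges = [(0, 1), (1, 2), (2, 3)]


def distance_proxy(placement):
    # Hypothetical proxy: total distance the placed qubits must travel on the path.
    return sum(abs(current[q] - v) for q, v in placement.items())


placement = simple_size_mapper([("a", "b")], path_edges, distance_proxy)
```

Ties are broken in favor of the first candidate found, so here `placement` is `{"a": 0, "b": 1}` with proxy cost 2. Replacing `distance_proxy` by a call into an actual permuter would give the behavior described in the text, at the cost of one permuter call per candidate.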
4.2.3 Extension Size Mapper
The extension size mapper first finds an initial placement 𝑝 using (41). Let 𝑐′min be the value attained
at the optimum for (41). After finding the initial placement, we try to only place another gate if it is
cheaper to place now rather than in a later call to the mapper.
Specifically, for the current 𝑝 and 𝑝̂, we define 𝑝̂′∶ 𝑄 → 𝑉 as the placement after performing the
permutation circuit constructed from transpositions achieving rs(𝐺, 𝑝 ∘ 𝑝̂^{−1}). Let 𝑈 ≔ ∅. Now we
define a heuristic for the number of saved transpositions, 𝑠∶ 𝑄 × 𝑄 → ℕ, as
𝑠(𝑞1, 𝑞2) ≔ rs(𝐺, 𝑝 ∘ 𝑝̂^{−1}) + min_{(𝑣1,𝑣2)∈𝐸} rs(𝐺, {𝑞1 ↦ 𝑣1, 𝑞2 ↦ 𝑣2} ∘ (𝑝̂′)^{−1})
            − min_{(𝑢1,𝑢2)∈𝐸′} rs(𝐺, (𝑝 ∪ {𝑞1 ↦ 𝑢1, 𝑞2 ↦ 𝑢2}) ∘ 𝑝̂^{−1}) , (42)
where 𝐸′ is the edge set of 𝐺[𝑉 ⧵ 𝑈] and we consider all orderings of the edges of 𝐸 and 𝐸′.
The extension size mapper iterates the following. We find the gate (𝑞1∗, 𝑞2∗) ∈ tg(𝐿) attaining
𝑠max ≔ max_{(𝑞1,𝑞2)∈tg(𝐿)} 𝑠(𝑞1, 𝑞2) , (43)
and let (𝑢1∗, 𝑢2∗) ∈ 𝐸′ be the edge attaining 𝑠max as given by (42). If 𝑠max ≥ 0, we set 𝑝 ← 𝑝 ∪
{𝑞1∗ ↦ 𝑢1∗, 𝑞2∗ ↦ 𝑢2∗}, remove the gate (𝑞1∗, 𝑞2∗) from 𝐿, update 𝑈 ← 𝑈 ∪ {𝑢1∗, 𝑢2∗}, and iterate; otherwise,
we stop and return 𝑝.
Calculating 𝑠(𝑞1, 𝑞2) for any 𝑞1, 𝑞2 ∈ 𝑄 takes time 𝑂(|𝐸|𝑡𝑝). Therefore, the total time complexity of
the extension size mapper is
𝑂(|𝐿|^2 |𝐸| 𝑡𝑝) . (44)
4.2.4 Qiskit-based Mapper
Finally, we implement a mapper that is based on Qiskit’s circuit transformation (described in Sec-
tion 2.2.1). Since this is a mapper, we only execute one iteration of the circuit transformation: for the
first layer 𝐿. We also do not modify the output circuit, but instead return the final 𝑝̂ that would be
induced by executing all swaps found during the mapping process.
We make three changes to Qiskit’s circuit transformation. The first is that when minimizing 𝑆,
instead of choosing a maximal set of swaps in every iteration, we choose only one swap along an edge
𝑒 ∈ 𝐸 that minimizes 𝑆. The second is that the upper bound on the number of iterations is raised to
|𝑉|^2, since we only apply one swap per iteration. Thirdly, if no trial is successful, we fall back to the
simple size mapper (Section 4.2.2) and return the placement it finds, which places only one gate in this
iteration.
We now give the time complexity of the Qiskit mapper. First, we compute an all-to-all distance matrix
in time Θ(|𝑉|^3) [17], which we ignore since it is a one-time cost dependent only on the architecture.
Each of the 𝑂(|𝑉|^2) iterations has a time complexity of 𝑂(|𝐸||𝐿|). Thus, the Qiskit mapper has time
complexity 𝑂(|𝑉|^2|𝐸||𝐿|).
5 Results
We implement the circuit transformation introduced in Section 2.2.3 with a variety of mappers and
appropriate permuters. We also implement the greedy swap transformation described in Section 2.2.2.
We check the validity of our implementations by testing closeness in fidelity of the original output
state and that of the transformed circuit for random input states of 11 qubits on random circuits [48]
(described in the next section).
5.1 Evaluation Criteria
When testing the performance of these circuit transformations, each is allocated at most 8GB of RAM
and 2 days to transform all circuits of a data point. For each data point we transform 10 random circuits
and 1 QSP circuit. We consider a 2-day runtime acceptable, given that classical computational resources
are plentiful compared to quantum ones. We generate the data on a heterogeneous cluster with AMD
Opteron 2354 and Intel Xeon X5560 processors.
The Cartesian permuter (Section 3.1.5), the general size permuter (Section 3.2.1), and Qiskit’s circuit
transformation (Section 2.2.1) are randomized. We run multiple trials of these permuters and take the
best result. Most of the time, trials produce equally good permutation circuits, although occasionally
they deviate by a few swap gates. Our mappers run permuters 𝑂(|𝐿||𝐸|) times, so we do only 4 trials to
quickly remove any bad outliers. In contrast, our circuit transformation only directly runs a permuter
once per layer of gates, so in this case we perform a slower 100 trials in an attempt to save a few swaps.
We leave the number of trials for Qiskit’s circuit transformation at its default of 40.
We test the performance of circuit transformations for the grid, 𝑃𝑛1 × 𝑃𝑛2, using the permuter from
Section 3.1.5 and the modular architecture, Mod(𝑛1, 𝑛2), with the permuter from Section 3.1.4, for 𝑛1, 𝑛2 ∈
ℕ. For an 𝑁-qubit circuit, we set 𝑛1 = 𝑛2 = ⌈√𝑁⌉ so that there are enough qubits in the architecture
to contain the circuit. By Corollary 3.8, we know that taking 𝑛1 = 𝑛2 minimizes the routing time for
our routing strategy among all grids with the same number of qubits. It is less clear how to balance
parameters for the modular architecture since Corollary 3.7 does not depend on 𝑛1 and 𝑛2. For 𝑛1 ≪ 𝑛2
or 𝑛2 ≪ 𝑛1, less movement of qubits is needed, since many qubits are adjacent to one another. Thus,
we take 𝑛1 = 𝑛2 in an attempt to consider a hard case. For some values of 𝑁, it may also be possible to
find parameters 𝑛′1 ≠ 𝑛′2 such that 𝑁 ≤ 𝑛′1𝑛′2 < ⌈√𝑁⌉^2 = 𝑛1𝑛2, requiring fewer qubits. However, this
introduces unwanted size-dependent behavior in our results when |𝑛′1 − 𝑛′2| ≫ 0 for one circuit size
and 𝑛′1 ≈ 𝑛′2 for the next, so we find it preferable to fix 𝑛1 = 𝑛2.
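The choice 𝑛1 = 𝑛2 = ⌈√𝑁⌉ is easy to compute exactly with integer arithmetic. The following sketch (the function name `square_side` is illustrative) also shows the trade-off mentioned above for a 10-qubit circuit.

```python
from math import isqrt


def square_side(num_qubits):
    """Smallest n with n * n >= num_qubits, i.e. ceil(sqrt(N)), without floats."""
    root = isqrt(num_qubits)
    return root if root * root == num_qubits else root + 1


side = square_side(10)        # side length of the square architecture
qubits_used = side * side     # architecture qubits in the square layout
```

For 𝑁 = 10 this gives a 4 × 4 architecture with 16 qubits, whereas an unbalanced 2 × 5 layout would need only 10; this is the size-dependent effect that fixing 𝑛1 = 𝑛2 avoids.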
We compare the transformed circuits in terms of their weighted depth and weighted size. For both
trapped-ion and superconducting qubits, two-qubit gates typically have longer execution times and
lower fidelities than single-qubit gates [32]. Even among two-qubit gates there is a difference between
execution times. Assuming fast local unitaries, the swap gate has 1–3 times the interaction cost of a
cnot depending on the physical interactions used to realize the gates [53]. For simplicity, we assign
unit cost for one-qubit gates, cost 10 for cnot, and cost 30 for swap. We define the weighted size of a
circuit as the sum of all gate weights and the weighted depth of a circuit as the maximum-weight path
in the DAG of the circuit, where the weight of a path is the sum of the weights of the gates along it.
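Both metrics can be computed in one pass over the gate list. The sketch below assumes a circuit is given as a list of (kind, qubits) pairs in execution order, with the gate weights stated above; the representation and the names `WEIGHTS`, `weighted_size`, and `weighted_depth` are assumptions for illustration.

```python
WEIGHTS = {"1q": 1, "cnot": 10, "swap": 30}  # gate costs stated in the text


def weighted_size(gates):
    """Sum of the weights of all gates."""
    return sum(WEIGHTS[kind] for kind, _ in gates)


def weighted_depth(gates):
    """Weight of the heaviest path in the circuit DAG.

    A gate can start once every qubit it acts on is free, so its finishing
    time equals the heaviest path ending at that gate.
    """
    finish = {}  # qubit -> finishing time of the last gate acting on it
    longest = 0
    for kind, qubits in gates:
        start = max((finish.get(q, 0) for q in qubits), default=0)
        end = start + WEIGHTS[kind]
        for q in qubits:
            finish[q] = end
        longest = max(longest, end)
    return longest


circuit = [("1q", (0,)), ("cnot", (0, 1)), ("swap", (1, 2)), ("1q", (0,))]
size = weighted_size(circuit)
depth = weighted_depth(circuit)
```

For this four-gate circuit the weighted size is 42 and the weighted depth is 41: the final one-qubit gate on qubit 0 runs in parallel with the swap and therefore does not lengthen the critical path.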
We consider two circuit families: random circuits and quantum signal processing (QSP) circuits [33].
Random circuits have been proposed for quantum computational supremacy experiments on near-term
quantum devices [9, 11]. Such proposals typically construct random circuits so that architecture con-
straints are automatically obeyed. For our purposes, random circuits provide a class of examples with
little structure for circuit transformations to exploit, so we expect them to represent a hard case with
large overhead. We generate a fixed set of 10 random circuits for various qubit counts. We set the num-
ber of circuit layers to 20. For each layer, we bin the qubits into pairs uniformly at random and assign
each pair of qubits a Haar-random unitary from SU(4). Finally, we decompose each unitary into the
smallest possible number of cnot + SU(2) gates [51]. This random circuit generator is provided by
Qiskit [8].
We consider QSP circuits for Hamiltonian simulation as an example of a realistic quantum algo-
rithm. We use the unoptimized circuits provided in [14], decomposed into 𝑍 rotations, cnot gates, and
single-qubit Clifford gates. The QSP algorithm requires precise angles that turn out to be expensive to
compute. Therefore, [14] uses randomized angles instead, giving a circuit that does not correctly implement
the Hamiltonian simulation. Nevertheless, the circuit corresponds to an accurate implementation
of QSP, up to rotation angles, and can be used for benchmarking resources. Furthermore, the circuit
transformations we construct are unaffected by those angles. We only consider one pair of phased
iterates of the QSP algorithm ($V^\dagger_{\varphi_i+\pi} V_{\varphi_{i-1}}$ as in [14, Eq. 31]). A full QSP circuit for the architecture can
be constructed by iterating the mapped circuit of such phased iterates, a permutation circuit between
iterations, state preparation, and state unpreparation. The cost of the transformed phased iterates dom-
inates all other costs of the construction, so the total cost can be estimated by taking our result times
the number of iterations.
The circuit transformations from Section 2.2.3 are constructed from a permuter and a mapper. We
denote such circuit transformations by tf∶ {d,s} × ℳ, where ℳ is the set of all mappers (see Table 2),
“d” denotes an appropriate depth permuter (Section 3.1), and “s” denotes the general size permuter
(Section 3.2.1). For example, by tf(d,greedy depth) we denote a circuit transformation with a depth
permuter for the architecture and the greedy depth mapper (Section 4.1.1).
Abbreviation    Mapper name                 Section
greedy depth    greedy depth mapper         4.1.1
incremental     incremental depth mapper    4.1.2
greedy size     greedy size mapper          4.2.1
simple          simple size mapper          4.2.2
extend          extension size mapper       4.2.3
qiskit          qiskit-based mapper         4.2.4
Table 2: The abbreviated names of the set of mappers ℳ used to construct circuit transformations tf∶ {d,s} × ℳ.
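As an illustration of the notation tf∶ {d,s} × ℳ, the sketch below enumerates every permuter and mapper pairing using the abbreviations of Table 2. The names PERMUTERS, MAPPERS, and tf_name are hypothetical and are not taken from the authors' software. The enumeration lists all formal combinations, whereas Figure 1 reports only a subset of them.

```python
from itertools import product

# "d" = depth permuter, "s" = size permuter (abbreviations from the text).
PERMUTERS = ("d", "s")
# Mapper abbreviations from Table 2.
MAPPERS = ("greedy depth", "incremental", "greedy size", "simple", "extend", "qiskit")


def tf_name(permuter, mapper):
    """Format a circuit transformation label in the tf(permuter,mapper) notation."""
    return f"tf({permuter},{mapper})"


# Cartesian product {d,s} x M: every formally possible circuit transformation label.
all_tfs = [tf_name(p, m) for p, m in product(PERMUTERS, MAPPERS)]
print(len(all_tfs))  # 2 permuters x 6 mappers = 12 labels
print(all_tfs[0])    # tf(d,greedy depth)
```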
5.2 Numerical Results
Figure 1 plots our results. We first consider the random circuit results. For the grid, we find that tf(d,in-
cremental) shows much slower growth of weighted depth than circuit transformations that do not use
depth-optimized permuters (Section 3.1.5). We also note that tf(d,qiskit) performs much better than
Qiskit’s circuit transformation (Section 2.2.1), suggesting that depth-optimized permuters can offer a
significant advantage. On the modular graph, Qiskit’s circuit transformation is much better at mini-
mizing the weighted depth, but tf(d,qiskit) starts closing the gap for larger sizes. Unfortunately, we do
not know if tf(d,qiskit) performs better at larger sizes because Qiskit’s circuit transformation is not fast
enough to generate the relevant data. Up to 100 qubits tf(s,qiskit) achieves the best weighted size on
grid architectures, and tf(s,simple) does best on modular architectures up to 121 qubits. For all sizes the
greedy swap circuit transformation (Section 2.2.2) performs as one of the best at optimizing for weighted
circuit size. The greedy swap circuit transformation is also able to transform larger circuits within the time
limit as expected from its lower time complexity.
For larger QSP circuits, the greedy circuit transformation (Section 2.2.2) is the clear winner in both
weighted depth and weighted size, suggesting that it may be a good approach for practical quantum
circuits. Surprisingly, tf(s,qiskit) also performs fairly well at minimizing the depth despite targeting the
circuit size.
6 Conclusion and Future Work
We have specified various ways to efficiently transform general quantum circuits to respect architecture
constraints while attempting to minimize the overhead. We investigated the qubit movement subprob-
lem and proposed Partial Routing via Matchings and Partial Token Swapping as models of
our optimization objectives of minimizing the circuit depth and circuit size, respectively. We gave al-
gorithms for Partial Routing via Matchings for the path graph, the complete graph, and for the
generalized hierarchical products of graphs, and showed tighter bounds for certain partial permuta-
tions. We then gave more detailed analyses of special cases of the generalized hierarchical product that
arise in proposed quantum architectures: the Cartesian product (e.g., for grid architectures) and the
modular architecture. We also showed a 4-approximation algorithm for Partial Token Swapping on
general graphs.
We constructed circuit transformations with a variety of heuristic qubit placement strategies (called
mappers). A mapper attempts to find suitable qubit placements on the architecture to execute the circuit
succinctly. Given a permuter subroutine, our mappers can handle any connected simple graph. We also
showed how to construct a circuit transformation from a permuter and a mapper.
[Figure 1 plot panels omitted. Vertical axes: weighted depth, weighted size. Horizontal axes: qubits,
input circuit qubits. Legend: greedy_swap, original, qiskit, tf(d,greedy depth), tf(d,incremental),
tf(d,qiskit), tf(s,extend), tf(s,greedy size), tf(s,qiskit), tf(s,simple).]
Figure 1: The weighted depth and weighted size for transformed random circuits (top two rows) and QSP
circuits (bottom two rows) on the grid architecture (left column) and the modular architecture (right column). We
generate fixed sets of 10 random circuits for increasing qubit counts and plot the mean and standard deviation for
each data point. One QSP circuit is considered for each data point. The metrics for the original circuit are also
given to make the overhead introduced in circuit transformations explicit; note that the original circuit does not
respect the architecture constraints. The notation tf∶ {d,s} × ℳ indicates a circuit transformation constructed
from either an appropriate depth (“d”) permuter or the size (“s”) permuter and one of our mappers (Table 2).
Finally, we tested our circuit transformations against Qiskit’s circuit transformation and a basic
greedy strategy with large quantum circuits on a grid or modular architecture. When optimizing for
weighted circuit size, our greedy circuit transformation was one of the best in all cases, though using
our circuit transformations with algorithms for Partial Token Swapping sometimes gave a slight
advantage. For the weighted circuit depth, the picture was more nuanced. We found that algorithms
using Partial Routing via Matchings for qubit movement could give good performance for random
circuits, but Qiskit’s circuit transformation and our greedy circuit transformation also performed well
and gave the best results in some cases.
We would like to better understand what circuit transformations work best for which architectures,
quantum algorithms, and objective functions. We also would like to use the tools of Partial Routing
via Matchings and Partial Token Swapping to establish bounds on the overhead of specific quantum
architectures. Ideally, we could use these tools and circuit transformations to design architectures
that offer good performance subject to realistic hardware constraints and to compute realistic resource
estimates for implementations of quantum algorithms.
There are many ways our methods could be improved. It would be interesting to know whether one
can do better than just using swap gates to route qubits. Our mapper algorithms may also be improved
by including some form of lookahead to consider later layers of the given circuit, or by specializing
mappers to particular architectures.
Modeling the architecture as a simple graph loses information about the underlying hardware. For
example, in the IBM system the architecture edges have directionality indicating the control and tar-
get of cnots. In implementations of the modular architecture, the interconnecting links are probably
much noisier and slower than local operations. In general, gate costs and times can vary significantly
across a hardware implementation and sometimes even vary over time [42]. Adapting to variable costs
and keeping track of operations performed asynchronously is challenging but could be worthwhile for
architectures that support a mixture of fast and slow operations.
Finally, we hope that future progress on the challenges addressed in this paper will be facilitated
by a suitable set of benchmarks of large quantum circuits. We have licensed and made publicly available our
source code, benchmark circuits, and results (in TSV format) [48], and we encourage others to do the same.
Acknowledgments
The authors would like to thank Aniruddha Bapat for insights into the hierarchical product of graphs and
suggestions for tightening the routing lower bound on these graphs. We would also like to thank Drew
Risinger for helpful formative discussions. This work was supported in part by the Army Research
Office (MURI award number W911NF-16-1-0349), the Canadian Institute for Advanced Research, the
National Science Foundation (grant number 1813814), and the U.S. Department of Energy, Office of Sci-
ence, Office of Advanced Scientific Computing Research, Quantum Algorithms Teams and Quantum
Testbed Pathfinder (award number DE-SC0019040) programs.
References
[1] M. Ajtai, J. Komlós, and E. Szemerédi. “An 𝑂(𝑛 log𝑛) sorting network”. In: Proceedings of the
fifteenth annual ACM symposium on theory of computing (STOC). ACM Press, 1983, pp. 1–9. isbn:
0-89791-099-0. doi: 10.1145/800061.808726.
[2] Noga Alon, F. R. K. Chung, and R. L. Graham. “Routing permutations on graphs via matchings”.
In: SIAM journal on discrete mathematics 7.3 (May 1994), pp. 513–530. issn: 0895-4801. doi: 10.
1137/s0895480192236628.
[3] Indranil Banerjee and Dana Richards. “New results on routing via matchings on graphs”. In:
Fundamentals of computation theory. Springer Berlin Heidelberg, 2017, pp. 69–81. doi: 10.1007/
978-3-662-55751-8_7.
[4] Indranil Banerjee, Dana Richards, and Igor Shinkar. “Sorting networks on restricted topologies”.
In: SOFSEM 2019: theory and practice of computer science. Springer International Publishing, 2019,
pp. 54–66. doi: 10.1007/978-3-030-10801-4_6.
[5] Aniruddha Bapat, Zachary Eldredge, James R. Garrison, Abhinav Deshpande, Frederic T. Chong,
and Alexey V. Gorshkov. “Unitary entanglement construction in hierarchical networks”. In: Physical
review a 98.6 (Dec. 26, 2018). doi: 10.1103/PhysRevA.98.062328.
[6] L. Barrière, C. Dalfó, M. A. Fiol, and M. Mitjana. “The generalized hierarchical product of graphs”.
In: Discrete mathematics 309.12 (June 2009), pp. 3871–3881. issn: 0012-365X. doi: 10.1016/j.
disc.2008.10.028.
[7] R. Beals, S. Brierley, O. Gray, A. W. Harrow, S. Kutin, N. Linden, D. Shepherd, and M. Stather.
“Efficient distributed quantum computing”. In: Proceedings of the royal society a: mathematical,
physical and engineering sciences 469.2153 (Feb. 20, 2013). doi: 10.1098/rspa.2012.0686.
[8] Luciano Bello, Jim Challenger, Andrew Cross, Ismael Faro, Jay Gambetta, Juan Gomez, Ali Javadi-Abhari,
Paco Martin, Diego Moreda, Jesus Perez, Erick Winston, and Chris Wood. Qiskit. An open
source quantum computing framework for writing quantum experiments, programs, and applica-
tions. IBM. 2017. url: https://www.qiskit.org/ (visited on 08/21/2018).
[9] Sergio Boixo, Sergei V. Isakov, Vadim N. Smelyanskiy, Ryan Babbush, Nan Ding, Zhang Jiang,
Michael J. Bremner, John M. Martinis, and Hartmut Neven. “Characterizing quantum supremacy
in near-term devices”. In: Nature physics 14.6 (June 2018), pp. 595–600. doi: 10.1038/s41567-
018-0124-x.
[10] Édouard Bonnet, Tillmann Miltzow, and Paweł Rzążewski. “Complexity of token swapping and
its variants”. In: Algorithmica 80.9 (Oct. 2017), pp. 2656–2682. doi: 10.1007/s00453-017-0387-0.
[11] Adam Bouland, Bill Fefferman, Chinmay Nirkhe, and Umesh Vazirani. “On the complexity and
verification of quantum random circuit sampling”. In: Nature physics (Oct. 29, 2018). doi: 10.
1038/s41567-018-0318-2.
[12] Teresa Brecht, Wolfgang Pfaff, Chen Wang, Yiwen Chu, Luigi Frunzio, Michel H. Devoret, and
Robert J. Schoelkopf. “Multilayer microwave integrated quantum circuits for scalable quantum
computing”. In: Npj quantum information 2.16002 (Feb. 23, 2016). doi: 10.1038/npjqi.2016.2.
[13] Stephen Brierley. “Efficient implementation of quantum circuits with limited qubit interactions”.
In: Quantum info. comput. 17.13-14 (Nov. 2017), pp. 1096–1104. issn: 1533-7146. arXiv: 1507.04263
[quant-ph].
[14] Andrew M. Childs, Dmitri Maslov, Yunseong Nam, Neil J. Ross, and Yuan Su. “Toward the first
quantum simulation with quantum speedup”. In: Proceedings of the national academy of sciences
115.38 (Sept. 18, 2018), pp. 9456–9461. doi: 10.1073/pnas.1801723115.
[15] Byung-Soo Choi and Rodney van Meter. “A Θ(√𝑁)-depth quantum adder on the 2D NTC quan-
tum computer architecture”. In: Acm journal on emerging technologies in computing systems 8.3
(Aug. 3, 2012), 24:1–24:22. issn: 1550-4832. doi: 10.1145/2287696.2287707.
[16] Byung-Soo Choi and Rodney van Meter. “On the effect of quantum interaction distance on quantum
addition circuits”. In: ACM journal on emerging technologies in computing systems 7.3 (Aug.
2011), pp. 1–17. doi: 10.1145/2000502.2000504.
[17] Robert W. Floyd. “Algorithm 97: shortest path”. In: Communications of the ACM 5.6 (June 1962),
p. 345. doi: 10.1145/367766.368168.
[18] Austin G. Fowler, Simon J. Devitt, and Lloyd C. L. Hollenberg. “Implementation of Shor’s al-
gorithm on a linear nearest neighbour qubit array”. In: Quantum information & computation 4
(Feb. 25, 2004), pp. 237–251. arXiv: quant-ph/0402196v1.
[19] Harold N. Gabow and Robert E. Tarjan. “Faster scaling algorithms for network problems”. In:
SIAM journal on computing 18.5 (Oct. 1989), pp. 1013–1036. doi: 10.1137/0218069.
[20] Google Quantum AI Lab. A preview of Bristlecone, Google’s new quantum processor. Mar. 5, 2018.
url: https://ai.googleblog.com/2018/03/a-preview-of-bristlecone-googles-
new.html (visited on 10/09/2018).
[21] Jeff Heckey, Shruti Patil, Ali JavadiAbhari, Adam Holmes, Daniel Kudrow, Kenneth R. Brown,
Diana Franklin, Frederic T. Chong, and Margaret Martonosi. “Compiler management of communication
and parallelism for quantum computation”. In: Proceedings of the twentieth international
conference on architectural support for programming languages and operating systems (ASPLOS).
ACM Press, 2015. doi: 10.1145/2694344.2694357.
[22] Steven Herbert. On the depth overhead incurred when running quantum algorithms on near-term
quantum computers with limited qubit connectivity. Sept. 25, 2018. arXiv: 1805.12570v4 [quant-
ph].
[23] Yuichi Hirata, Masaki Nakanishi, Shigeru Yamashita, and Yasuhiko Nakashima. “An efficient
method to convert arbitrary quantum circuits to ones on a linear nearest neighbor architecture”.
In: 2009 third international conference on quantum, nano and micro technologies. IEEE, Feb. 2009.
doi: 10.1109/icqnm.2009.25.
[24] John E. Hopcroft and Richard M. Karp. “An $n^{5/2}$ algorithm for maximum matchings in bipartite
graphs”. In: SIAM journal on computing 2.4 (Dec. 1973), pp. 225–231. doi: 10.1137/0202019.
[25] Donald E. Knuth. “Networks for sorting”. In: Second. Vol. 3. The Art of Computer Programming.
Addison-Wesley Professional, 1998, pp. 219–247. isbn: 0201896850.
[26] H. W. Kuhn. “The hungarian method for the assignment problem”. In: Naval research logistics
quarterly 2.1-2 (Mar. 1955), pp. 83–97. doi: 10.1002/nav.3800020109.
[27] Manfred Kunde. “Optimal sorting on multi-dimensionally mesh-connected computers”. In: Pro-
ceedings of the symposium on theoretical aspects of computer science (STACS). Vol. 247. Lecture
Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 1987, pp. 408–419.
isbn: 978-3-540-47419-7. doi: 10.1007/bfb0039623.
[28] Samuel A. Kutin, David Petrie Moulton, and Lawren M. Smithline. “Computation at a distance”.
In: Chicago journal of theoretical computer science 13.1 (Sept. 25, 2007), pp. 1–17. doi: 10.4086/
cjtcs.2007.001.
[29] L. Lao, B. van Wee, I. Ashraf, J. van Someren, N. Khammassi, K. Bertels, and C. G. Almudever.
“Mapping of lattice surgery-based quantum circuits on surface code architectures”. In: Quantum
science and technology 4.1 (Sept. 2018), p. 015005. doi: 10.1088/2058-9565/aadd1a.
[30] Gushu Li, Yufei Ding, and Yuan Xie. “Tackling the qubit mapping problem for nisq-era quantum
devices”. In: Proceedings of the twenty-fourth international conference on architectural support for
programming languages and operating systems (ASPLOS). Ed. by Iris Bahar, Maurice Herlihy,
Emmett Witchel, and Alvin R. Lebeck. Providence, RI, USA: ACM, 2019, pp. 1001–1014. isbn:
978-1-4503-6240-5. doi: 10.1145/3297858.3304023.
[31] Chia-Chun Lin, Susmita Sur-Kolay, and Niraj K. Jha. “PAQCS: physical design-aware fault-tolerant
quantum circuit synthesis”. In: IEEE transactions on very large scale integration (VLSI) systems
23.7 (July 2015), pp. 1221–1234. doi: 10.1109/tvlsi.2014.2337302.
[32] Norbert M. Linke, Dmitri Maslov, Martin Roetteler, Shantanu Debnath, Caroline Figgatt, Kevin
A. Landsman, Kenneth Wright, and Christopher Monroe. “Experimental comparison of two
quantum computing architectures”. In: Proceedings of the national academy of sciences 114.13
(Mar. 28, 2017), pp. 3305–3310. doi: 10.1073/pnas.1618020114.
[33] Guang Hao Low and Isaac L. Chuang. “Optimal Hamiltonian simulation by quantum signal pro-
cessing”. In: Physical review letters 118.1 (Jan. 2017). doi: 10.1103/physrevlett.118.010501.
[34] Aaron Lye, Robert Wille, and Rolf Drechsler. “Determining the minimal number of SWAP gates
for multi-dimensional nearest neighbor quantum circuits”. In: 20th asia and south pacific design
automation conference (ASP-DAC). IEEE, 2015. isbn: 978-1-4799-7792-5. doi: 10.1109/aspdac.
2015.7059001.
[35] D. Maslov, S. M. Falconer, and M. Mosca. “Quantum circuit placement”. In: IEEE transactions
on computer-aided design of integrated circuits and systems 27.4 (Mar. 21, 2008), pp. 752–763. doi:
10.1109/tcad.2008.917562.
[36] Dmitri Maslov. “Linear depth stabilizer and quantum Fourier transformation circuits with no
auxiliary qubits in finite-neighbor quantum architectures”. In: Physical review a 76.5 (Nov. 2007).
doi: 10.1103/physreva.76.052310.
[37] Tzvetan S. Metodi, Darshan D. Thaker, Andrew W. Cross, Frederic T. Chong, and Isaac L. Chuang.
“Scheduling physical operations in a quantum information processor”. In: Quantum information
and computation IV. Ed. by Eric J. Donkor, Andrew R. Pirich, and Howard E. Brandt. Vol. 6244.
SPIE, May 12, 2006. doi: 10.1117/12.666419.
[38] Silvio Micali and Vijay V. Vazirani. “An 𝑂(√|𝑉||𝐸|) algorithm for finding maximum matching in
general graphs”. In: 21st annual symposium on foundations of computer science (sfcs). IEEE, Oct.
1980. doi: 10.1109/sfcs.1980.12.
[39] Tillmann Miltzow, Lothar Narins, Yoshio Okamoto, Günter Rote, Antonis Thomas, and Takeaki
Uno. “Approximation and hardness of token swapping”. In: 24th annual european symposium on
algorithms (ESA). Ed. by Piotr Sankowski and Christos Zaroliagis. Vol. 57. Leibniz International
Proceedings in Informatics (LIPIcs). Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum fuer
Informatik, Aug. 2016, 66:1–66:15. isbn: 978-3-95977-015-6. doi: 10.4230/LIPIcs.ESA.2016.66.
[40] C. Monroe and J. Kim. “Scaling the ion trap quantum processor”. In: Science 339.6124 (Mar. 8,
2013), pp. 1164–1169. doi: 10.1126/science.1231298.
[41] C. Monroe, R. Raussendorf, A. Ruthven, K. R. Brown, P. Maunz, L.-M. Duan, and J. Kim. “Large-scale
modular quantum-computer architecture with atomic memory and photonic interconnects”.
In: Physical review a 89.2 (Feb. 2014). doi: 10.1103/physreva.89.022317.
[42] Prakash Murali, Jonathan M. Baker, Ali Javadi-Abhari, Frederic T. Chong, and Margaret Martonosi.
“Noise-adaptive compiler mappings for noisy intermediate-scale quantum computers”. In: Proceedings
of the twenty-fourth international conference on architectural support for programming
languages and operating systems (ASPLOS). Ed. by Iris Bahar, Maurice Herlihy, Emmett Witchel,
and Alvin R. Lebeck. Providence, RI, USA: ACM, 2019, pp. 1015–1029. isbn: 978-1-4503-6240-5. doi:
10.1145/3297858.3304075.
[43] Flemming Nielson, Hanne Riis Nielson, and Chris Hankin. Principles of program analysis. Springer
Berlin Heidelberg, 1999. doi: 10.1007/978-3-662-03811-6.
[44] M. Pedram and A. Shafaei. “Layout optimization for quantum circuits with linear nearest neigh-
bor architectures”. In: Ieee circuits and systems magazine 16.2 (2016), pp. 62–74. issn: 1531-636X.
doi: 10.1109/MCAS.2016.2549950.
[45] Rigetti. QPU specifications. Rigetti 16q aspen-1. 2018. url: https://www.rigetti.com/qpu
(visited on 01/23/2019).
[46] David J. Rosenbaum. “Optimal quantum circuits for nearest-neighbor architectures”. In: 8th con-
ference on the theory of quantum computation, communication and cryptography (TQC). Ed. by Si-
mone Severini and Fernando Brandao. Vol. 22. Leibniz International Proceedings in Informatics
(LIPIcs). Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2013, pp. 294–
307. isbn: 978-3-939897-55-2. doi: 10.4230/LIPIcs.TQC.2013.294.
[47] Mehdi Saeedi, Robert Wille, and Rolf Drechsler. “Synthesis of quantum circuits for linear nearest
neighbor architectures”. In: Quantum information processing 10.3 (June 1, 2011), pp. 355–377. issn:
1573-1332. doi: 10.1007/s11128-010-0201-2.
[48] Eddie Schoute, Cem Unsal, and Andrew Childs. Arct. Architecture-respect circuit transformations.
2019. url: https://gitlab.umiacs.umd.edu/amchilds/arct (visited on 02/24/2019).
[49] Alireza Shafaei, Mehdi Saeedi, and Massoud Pedram. “Qubit placement to minimize communi-
cation overhead in 2D quantum architectures”. In: 19th asia and south pacific design automation
conference (ASP-DAC). IEEE, Jan. 2014. isbn: 978-1-4799-2816-3. doi: 10.1109/aspdac.2014.
6742940.
[50] IBM Q Team. IBM Q Experience devices. 2018. url:
https://quantumexperience.ng.bluemix.net/qx/devices (visited on 10/09/2018).
[51] Farrokh Vatan and Colin Williams. “Optimal quantum circuits for general two-qubit gates”. In:
Physical review a 69.3 (Mar. 22, 2004). doi: 10.1103/physreva.69.032315.
[52] Davide Venturelli, Minh Do, Eleanor Rieffel, and Jeremy Frank. “Compiling quantum circuits to
realistic hardware architectures using temporal planners”. In: Quantum science and technology
3.2 (Feb. 2018), p. 025004. doi: 10.1088/2058-9565/aaa331.
[53] G. Vidal, K. Hammerer, and J. I. Cirac. “Interaction cost of nonlocal gates”. In: Physical review
letters 88 (23 May 2002), p. 237902. doi: 10.1103/PhysRevLett.88.237902.
[54] Mark Whitney, Nemanja Isailovic, Yatish Patel, and John Kubiatowicz. “Automated generation
of layout and control for quantum circuits”. In: Proceedings of the 4th international conference on
computing frontiers (CF). Ischia, Italy: ACM, 2007, pp. 83–94. isbn: 978-1-59593-683-7. doi: 10.
1145/1242531.1242546.
[55] Robert Wille, Oliver Keszocze, Marcel Walter, Patrick Rohrs, Anupam Chattopadhyay, and Rolf
Drechsler. “Look-ahead schemes for nearest neighbor optimization of 1D and 2D quantum cir-
cuits”. In: 21st asia and south pacific design automation conference (ASP-DAC). IEEE, Jan. 2016,
pp. 292–297. doi: 10.1109/aspdac.2016.7428026.
[56] Robert Wille, Aaron Lye, and Rolf Drechsler. “Exact reordering of circuit lines for nearest neighbor
quantum architectures”. In: IEEE transactions on computer-aided design of integrated circuits
and systems 33.12 (Dec. 2014), pp. 1818–1831. doi: 10.1109/tcad.2014.2356463.
[57] Katsuhisa Yamanaka, Erik D. Demaine, Takehiro Ito, Jun Kawahara, Masashi Kiyomi, Yoshio
Okamoto, Toshiki Saitoh, Akira Suzuki, Kei Uchizawa, and Takeaki Uno. “Swapping labeled to-
kens on graphs”. In: Fun with algorithms. Fun 2014. Vol. 8496: Fun with algorithms. Lecture Notes
in Computer Science. Springer International Publishing, 2014, pp. 364–375. isbn: 978-3-319-07890-
8. doi: 10.1007/978-3-319-07890-8_31.
[58] Louxin Zhang. “Optimal bounds for matching routing on trees”. In: SIAM journal on discrete
mathematics 12.1 (Jan. 1999), pp. 64–77. doi: 10.1137/s0895480197323159.
[59] Alwin Zulehner, Alexandru Paler, and Robert Wille. “An efficient methodology for mapping
quantum circuits to the IBM QX architectures”. In: IEEE transactions on computer-aided design of
integrated circuits and systems (June 7, 2018). issn: 1937-4151. doi: 10.1109/tcad.2018.2846658.
[60] Alwin Zulehner and Robert Wille. “Compiling SU(4) quantum circuits to IBM QX architectures”.
In: 24th asia and south pacific design automation conference (ASP-DAC). Tokyo, Japan: ACM
Press, 2019, pp. 185–190. isbn: 978-1-4503-6007-4. doi: 10.1145/3287624.3287704.
