Qubit Mapping Based on Subgraph Isomorphism and Filtered Depth-Limited
  Search by Li, Sanjiang et al.
Sanjiang Li∗1, Xiangzhen Zhou1,2, and Yuan Feng†1
1Centre for Quantum Software and Information, Faculty of Engineering and
Information Technology, University of Technology Sydney, NSW 2007, Australia
2State Key Lab of Millimeter Waves, Southeast University, Nanjing 211189,
China
Abstract
Mapping logical quantum circuits to Noisy Intermediate-Scale Quantum (NISQ) devices
is a challenging problem which has attracted rapidly increasing interest from both the quantum
and classical computing communities. This paper proposes an efficient method that (i) selects
an initial mapping that takes into consideration the similarity between the architecture graph
of the given NISQ device and a graph induced by the input logical circuit; and (ii) searches,
in a filtered and depth-limited way, for a most useful swap combination, i.e., one that makes
as many two-qubit gates in the logical circuit executable as possible. The proposed circuit transformation
algorithm can significantly decrease the number of auxiliary two-qubit gates that need to be
added to the logical circuit, especially when it has a large number of two-qubit gates. For
an extensive benchmark set of 131 circuits and IBM’s current premium Q system, viz., IBM
Q Tokyo, our algorithm needs, on average, 0.4346 extra two-qubit gates per input two-qubit
gate, while the corresponding figures for three state-of-the-art algorithms are 0.6047, 0.8154,
and 1.0067, respectively.
1 Introduction
Since Shor’s quantum algorithms for integer factorization and discrete logarithms
[17], many quantum algorithms have been proposed that could offer an exponential speed-up
when compared with the best classical algorithms. These include in particular the HHL algorithm
for solving systems of linear equations [9] and other machine learning algorithms derived from
HHL (cf. [4] for a summary). Typically, the implementation of these algorithms requires quantum
computers with millions of qubits, which may well remain unavailable for the next two decades.
∗sanjiang.li@uts.edu.au
†yuan.feng@uts.edu.au
arXiv:2004.07138v2 [quant-ph] 16 Apr 2020
Figure 1: The architecture graph of IBM Q Tokyo, with its 20 physical qubits labelled 0 to 19 and drawn in four rows of five.
On the other hand, IBM, Intel and Google have all recently announced quantum devices with around
50-70 qubits. The Noisy Intermediate-Scale Quantum (NISQ) era appears to be arriving.
Although quantum error correction will not be available in the near future, it is expected that
quantum supremacy could still be demonstrated on an NISQ device, for instance, in the random
circuit sampling problem.1
There is yet another gap between theoretical research on quantum algorithms and their implementation
on realistic quantum devices. When designing quantum algorithms, typically, the
quantum circuit model allows multi-qubit gates to act on any set of qubits without restriction.
This is, however, not the case in realistic NISQ devices, which have “limited number of qubits,
limited connectivity between qubits, restricted (hardware-specific) gate alphabets, and limited
circuit depth due to noise” [10]. In the superconducting devices of IBM, Google, and Rigetti, only
single-qubit gates and special two-qubit gates (like cnot or cz) are supported. Even worse, these two-qubit
gates can only be implemented between neighbouring qubits. For example, Fig. 1 shows the architecture
graph of IBM’s current premium quantum system IBM Q Tokyo (also known as IBM
Q20), which supports elementary single-qubit gates and two-qubit cnot gates; a cnot gate
can be implemented only between qubits which are connected by an (undirected) edge. In order
to use these NISQ devices, the desired quantum functionality in an ideal quantum circuit should
be transformed or mapped so that the underlying coupling constraints imposed by these quantum
devices are satisfied.
More precisely, in order to implement an ideal quantum circuit on an NISQ device like IBM Q
Tokyo, we need to address two issues. The first is to decompose the desired functionality (arbitrary
quantum gates) into elementary operations that can be directly applied on the NISQ device. This
issue has already been properly addressed in several works [2, 14, 22]. The second issue, known
as qubit mapping or circuit transformation, is to map or route the qubits in the ideal quantum
circuit to qubits of the quantum device so that the coupling constraints imposed by the quantum
device are satisfied and thus the two-qubit gates in the ideal quantum circuit are executable.
1https://www.quantamagazine.org/quantum-supremacy-is-coming-heres-what-you-should-know-20190718/
In the past several years, the qubit mapping problem has attracted rapidly increasing interest
from both classical and quantum computing communities, see, e.g., [13, 16, 18, 21, 15, 24, 11, 5,
7, 23] and references therein. Given an arbitrary quantum circuit and an NISQ device, the task
of qubit mapping is to construct automatically a quantum circuit with the same functionality
which can be immediately implemented in the NISQ device. For ease of presentation, we call an
arbitrary quantum circuit a logical circuit and call a circuit that is implementable in the NISQ
device a physical circuit. Similarly, we call qubits in a logical (physical) circuit logical (physical)
qubits. We assume that gates in the input logical circuit are already decomposed into elementary
gates supported by the NISQ device. In particular, each gate in the logical circuit involves at
most two qubits. Naturally, we also assume that the number of logical qubits is not larger than
the number of physical qubits.
It is not difficult to find such a solution for an input logical circuit. Indeed, we can start with
an arbitrary initial mapping and then execute one two-qubit gate per round (note that single-qubit
gates can be executed directly) by inserting several swap operations to transform the current
mapping to a mapping that can execute the current two-qubit gate. The challenge lies in
finding a solution with minimal overhead in terms of the number of auxiliary swap gates
added. This is crucial for the success of quantum computing, as a large number of extra two-qubit
gates will significantly increase the accumulated error of the output physical circuit.
Finding an optimal solution for the qubit mapping problem is often very difficult. Indeed, it
is NP-complete [18] to decide if an input logical circuit can be transformed into an equivalent
physical circuit using up to k swap gates for a fixed integer k > 0. Several previous works use
off-the-shelf tools like dynamic programming [18], SAT solvers [16], temporal planners [21], Integer
Linear Programming (ILP) [8], and satisfiability modulo theories (SMT) solvers [15]. In the worst case, all
these approaches take time exponential in the number of qubits.
Many other works devise specialised heuristic search algorithms for solving the qubit mapping
problem. For example, Zulehner, Paler and Wille [24] partition the input logical circuit into layers
and introduce an A∗ search algorithm. When combined with a lookahead scheme and a dedicated
method for selecting the initial mapping, their algorithm performs much better than IBM’s own
solution. However, this A∗ algorithm also takes time exponential in the number of qubits. Li, Ding
and Xie [11] propose a search algorithm based on reverse traversal, which is polynomial in the
number of qubits and works very well for small circuits with fewer than a hundred two-qubit gates.
Their algorithm, called sabre, outperforms the A∗-approach [24] with exponential speedup and
comparable or better results on various benchmarks. Childs, Schoute and Unsal [5] also propose
efficient methods that attempt to minimise the circuit depth or size overhead and have worst-case
time complexity polynomial in the sizes of the input circuit and the architecture graph. To this
end, they decompose the problem into two subproblems: qubit movement and qubit placement.
The first subproblem concerns how to transform the current mapping to a selected next mapping
by imposing swap gates on edges in the architecture graph, while the second subproblem gives
a method to compute the next mapping. In another work, Cowtan et al. [7] describe a solution
implemented in the platform-independent compiler t|ket〉. They also partition the input circuit
into layers and then select the swap which can maximally reduce the diameter of the subgraph
composed of all pairs of qubits in the current layer. We refer to their algorithm as the Cambridge
algorithm henceforth. In [23], we designed a new qubit mapping algorithm, called sahs in this
paper, which uses simulated annealing for constructing an initial mapping and searches the next
mapping by using a heuristic function that reflects the variable influence of gates in different layers.
Empirical results showed that sahs outperforms both sabre and the Cambridge algorithm by a
large margin. Initial mappings of sahs are, however, computed non-deterministically by simulated
annealing, which is sometimes unstable and runs slowly when the circuit size is large [23].
In this paper, we propose a new search algorithm based on subgraph isomorphism and filtered
depth-limited search. The idea is to construct a graph G from the input circuit which is isomorphic
to a subgraph of AG, the architecture graph of the given NISQ device, and select any embedding
from G to AG as the initial mapping. Starting from this initial mapping, we then, step by step,
construct the physical circuit while removing executable gates from the logical circuit. If the
current mapping can execute some gates in the front layer of the logical circuit, we remove them
from the logical circuit and properly append them to the current physical circuit; if there are no
executable gates in the front layer, then we need to insert swap gates and obtain a new mapping
so that some two-qubit gates in the front layer can be executed. To select a good next mapping,
we would ideally search exhaustively over all possible combinations of swap operations for one that
maximises the number of executable two-qubit gates per swap. As selecting the best swap combination
is expensive, we set up a fixed limit k > 0 and only consider combinations of at most k swaps.
The search process can be sped up further if we filter out those swaps which do not interact with
gates in the front layers of the circuit. Such filters are designed and used in our algorithm. While
less efficient when compared with sabre, our algorithm is, if we neglect the time for computing
the initial mapping, still polynomial in all relevant parameters and can significantly reduce the
number of swaps required to transform the input logical circuit. Indeed, empirical evaluation
shows that our algorithm can often reduce by half the number of swaps required when compared
with sabre, if the input logical circuit has hundreds or more two-qubit gates. Similar empirical
evaluations also show that our algorithm is significantly better than the Cambridge algorithm
developed by Cowtan et al. [7] and our sahs algorithm [23] in terms of the size of the output
circuits, when no postmapping optimisation is used.
The remainder of this paper is organised as follows. In Section 2, we recall some background of
quantum computation and quantum circuits, and describe and analyse our algorithm in Section 3.
Detailed empirical evaluation is reported in Section 4. The last section concludes the paper with
discussions on directions for future research.
2 Background
In this section, after a brief introduction to quantum gates and quantum circuits, we describe the
dependency graph associated with a logical circuit and show how to partition the logical circuit into
layers by using the dependency graph.
2.1 Quantum Gates and Quantum Circuits
A qubit is the counterpart of a bit in quantum computation. While a ‘classical’ bit can only be in one
of two states, viz., 0 and 1, a qubit can be in a superposition state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ of the two
basis states, |0〉 and |1〉, where α, β ∈ ℂ are probability amplitudes satisfying $|\alpha|^2 + |\beta|^2 = 1$. For
example, $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ are two superposition states. The success of
quantum computation partially lies in the ingenious use of quantum superposition.
Figure 2: Hadamard, cnot and swap gate (from left to right).
Figure 3: A logical quantum circuit with only cnot gates (left) and its dependency graph (right).
Quantum computation is realised by applying quantum gates on qubits. Complex, multi-qubit
gates can be decomposed into elementary single or two-qubit gates. In fact, any quantum gate can
be approximated to arbitrary accuracy using a fixed set of single-qubit gates and cnot gates [3].
Fig. 2 illustrates three very useful gates: the Hadamard gate h, the cnot gate and the swap gate. The Hadamard
gate is a single-qubit gate which evenly mixes the basis states to produce a superposed one.
Precisely, h maps |0〉 to |+〉 and |1〉 to |−〉. cnot and swap are both two-qubit gates, i.e., they
operate on two qubits. A cnot gate flips the target qubit (indicated graphically with ⊕) if and
only if the control qubit (indicated graphically with a black dot •) is in state |1〉, while a swap
gate exchanges the states of the two qubits it acts on. Precisely, cnot maps |a〉 |b〉 to |a〉 |a⊕ b〉
and swap maps |a〉 |b〉 to |b〉 |a〉 for a, b ∈ {0, 1}. Most NISQ devices do not support swap gates
directly and, if this is the case, we may implement a swap gate by three cnot gates (see Fig. 2
(right)).
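As a quick sanity check of the three-cnot implementation of swap, the following Python sketch applies the gates to classical basis states only. The helper names (`cnot`, `swap_via_cnots`) and the particular control/target ordering are illustrative assumptions, not something read off Fig. 2; agreement on all basis states extends to arbitrary states by linearity.

```python
from itertools import product

def cnot(state, control, target):
    """Apply cnot to a tuple of classical bits (a computational basis state)."""
    bits = list(state)
    bits[target] ^= bits[control]   # |a>|b> -> |a>|a XOR b>
    return tuple(bits)

def swap_via_cnots(state, a, b):
    """Exchange bits a and b using three cnot gates with alternating control."""
    state = cnot(state, a, b)
    state = cnot(state, b, a)
    return cnot(state, a, b)

# Check all four two-qubit basis states.
results = {s: swap_via_cnots(s, 0, 1) for s in product((0, 1), repeat=2)}
for before, after in results.items():
    print(before, "->", after)
```

Each swap inserted by a mapping algorithm therefore costs three cnot gates on such a device, which is why the size overhead is counted in multiples of three.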
Quantum circuits are the most commonly used model to describe quantum algorithms, which
consist of input qubits, quantum gates, measurements and classical registers [20]. As only input
qubits and quantum gates are relevant in the qubit mapping problem, in this paper, we represent
a quantum circuit simply as a pair (Q,C), where Q is the set of involved qubits and C a sequence
of quantum gates.
2.2 Dependency Graph and Front Layer
Two-qubit gates in a logical circuit LC = (Q,C) are not independent in general. We say a two-qubit
gate g1 depends on another two-qubit gate g2 if the latter must be executed before the
former. This happens when g2 is in front of g1 in C and they share a common qubit, or when g1
depends on a two-qubit gate g3 which depends on g2. For clarity, we say g1 directly depends on g2
if g2 is in front of g1 in C, they share a common qubit, and there are no other gates between
them which act on that qubit.
For a logical circuit LC = (Q,C), we construct a directed acyclic graph (DAG), called the
dependency graph, to characterise the direct dependency between two-qubit gates in LC [11, 5].
Each node of the dependency graph represents a two-qubit gate and each directed edge the direct
dependency relationship from one two-qubit gate to another. The front layer of LC, denoted
F(LC) or L0(LC), consists of all two-qubit gates in LC which have no parents in the dependency
graph. The second layer L1(LC) is then the front layer of the circuit obtained from LC by deleting
all gates in F(LC). Analogously, we can define the k-th layer Lk(LC) of LC for all k ≥ 0.
Example 1. Consider the logical circuit LC = (Q,C) shown in Fig. 3 (left), where
Q = {q0, q1, q2, q3},
C = (g0 ≡ 〈q2, q0〉, g1 ≡ 〈q3, q2〉, g2 ≡ 〈q0, q3〉, g3 ≡ 〈q0, q2〉, g4 ≡ 〈q3, q2〉, g5 ≡ 〈q0, q3〉, g6 ≡ 〈q3, q1〉).
Note that, for ease of presentation, we use for example 〈q2, q0〉 to represent the cnot gate in the
logical circuit with q2 being the control qubit and q0 the target.
For this circuit, we have F(LC) = {g0}, L1(LC) = {g1}, L2(LC) = {g2}, and L3(LC) = {g3},
and so on. Fig. 3 (right) shows the dependency graph of LC. From the dependency graph we can
see, for example, gate g2 can be executed only after g0 and g1.
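The layer partition can be computed in one pass over the gate sequence. The sketch below is a minimal illustration, not the authors' implementation: the function name `layers` and the representation of gates as pairs of qubit names are assumptions. It places each gate one layer after the latest earlier gate sharing a qubit with it, which gives the same result as repeatedly peeling off the front layer.

```python
def layers(gates):
    """Partition two-qubit gates into layers L0, L1, ... (lists of gate indices)."""
    last_layer = {}   # qubit -> layer index of the most recent gate acting on it
    result = []
    for idx, (q, r) in enumerate(gates):
        # The direct parents are the latest gates on q and on r.
        k = max(last_layer.get(q, -1), last_layer.get(r, -1)) + 1
        if k == len(result):
            result.append([])
        result[k].append(idx)
        last_layer[q] = last_layer[r] = k
    return result

# The circuit of Example 1, gates g0..g6 as (control, target) pairs.
example1 = [("q2", "q0"), ("q3", "q2"), ("q0", "q3"), ("q0", "q2"),
            ("q3", "q2"), ("q0", "q3"), ("q3", "q1")]
example1_layers = layers(example1)
print(example1_layers)

# A circuit with two independent gates in its front layer.
parallel_layers = layers([("a", "b"), ("c", "d"), ("b", "c")])
print(parallel_layers)
```

For Example 1 every gate shares a qubit with its predecessor, so each layer contains exactly one gate.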
3 The Proposed Approach
The main objective of qubit mapping is to transform an input logical circuit to a physical one
with minimal size or depth so that the constraints imposed by the NISQ device are satisfied.
To simplify the discussion, we only consider the connectivity constraints for two-qubit gates as
specified by the architecture graph. This means that single-qubit gates have no effect in the circuit
transformation process. Furthermore, we make the following assumptions:2
1. The NISQ device supports all single-qubit gates and cnot gates;
2. The architecture graph of the NISQ device, AG, is an undirected graph;
3. cnot gates are the only two-qubit gates in the input logical circuit.
From now on and as in Example 1, we write a cnot gate simply as a pair 〈q, q′〉, where q is
the control qubit and q′ is the target qubit. We call the cnot gate 〈q′, q〉 the inverse of 〈q, q′〉.
2The occurrences of cnot gates may be replaced by cz gates when, e.g., a Rigetti device is used.
Let AG = (V,E) be the undirected architecture graph of the NISQ device we are given, where
V is the set of physical qubits and E the set of edges along which cnot gates can be performed.
Recall that an edge e in an undirected graph is an unordered pair of the two endnodes p, q of e.
In the following, we write e simply as {p, q}, i.e., the set of its two endnodes.
Given a logical circuit LC = (Q,C) consisting of only cnot gates with |Q| ≤ |V |, we need
to construct a physical circuit PC = (V,Cp) which is equivalent to LC in functionality and only
contains cnot gates. Moreover, for any cnot gate 〈q, q′〉 in Cp, we have {q, q′} ∈ E.
It is easy to find a physical circuit that satisfies the above conditions, but the real challenge
is to find one with minimal size or depth, which is NP-hard in general [18]. In this paper, we
modify the input logical circuit stepwise by inserting auxiliary swap operations (each implemented
with three cnot gates as in Fig. 2 (right)) until the logical circuit is transformed into a physical
circuit that can be executed on the NISQ device. To evaluate the effectiveness of qubit mapping
algorithms, we use the size of the output circuit, i.e., the total number of its two-qubit gates.
3.1 Qubit Mapping
In each step, qubits in the logical circuit are mapped or allocated to physical qubits in the NISQ
device. Mathematically, a (partial) qubit mapping is a (partial) function τ from Q to V such that
τ(q) = τ(q′) if and only if q = q′ for any q, q′ ∈ Q on which τ is defined. We say a partial qubit mapping is complete if
it is defined for every q in Q. The mapping may change between consecutive steps of the transformation,
as determined by the inserted auxiliary swap operations.
Given a logical circuit LC and a mapping τ , a cnot gate g = 〈q, q′〉 in LC is said to be
satisfied by τ , or τ satisfies g, if {τ(q), τ(q′)} is an edge in AG. Furthermore, g is executable by τ
if it appears in the front layer of LC and τ satisfies g. If this is the case, we remove g from LC
and append a cnot gate τ(g) := 〈τ(q), τ(q′)〉 to the end of the physical circuit. This process is
called the execution of g.
For two physical qubits v, v′ in AG, we write distAG(v, v′) for the distance (i.e., the length of
a shortest path) from v to v′ in AG. For a two-qubit gate g in C, the physical distance between
the two qubits q, q′ in g under the current mapping τ is defined as the distance between τ(q) and
τ(q′) in AG if both τ(q) and τ(q′) are defined and as the diameter of AG otherwise. Clearly,
a gate g = 〈q, q′〉 in the front layer is executable by τ iff the physical distance between q and q′
under τ is 1.
Example 2 (Example 1 cont’d). Consider the logical circuit LC shown in Fig. 3. Let τ1 : Q→ V
be the mapping specified by τ1(q0) = v2, τ1(q1) = v0, τ1(q2) = v10, τ1(q3) = v6, see Figure 4 (left).
Then, because τ1(q2) = v10 and τ1(q0) = v2 and distAG(v2, v10) = 2, we can see that g0 ≡ 〈q2, q0〉 is
not executable by τ1 in IBM Q Tokyo. However, for the mapping τ3 shown in Figure 4 (right), g0
is executable by τ3 and, indeed, every gate in LC is satisfied by τ3.
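The distances used in Example 2 can be reproduced with a breadth-first search. The sketch below is illustrative only. The edge list in `tokyo_edges` is an assumption (the commonly reported IBM Q Tokyo coupling graph: a 4 x 5 grid plus six pairs of crossed diagonals), since the edges are only drawn in Fig. 1 and not listed in the text; the mapping `tau3` is derived from τ1 by the two swaps given in the caption of Fig. 4. Only complete mappings are handled, so the "diameter" case of the definition is not covered.

```python
from collections import deque

def tokyo_edges():
    """Assumed IBM Q Tokyo coupling graph, edges stored as (smaller, larger)."""
    edges = set()
    for v in range(20):
        if v % 5 < 4:
            edges.add((v, v + 1))   # horizontal neighbour
        if v < 15:
            edges.add((v, v + 5))   # vertical neighbour
    edges |= {(1, 7), (2, 6), (3, 9), (4, 8), (5, 11), (6, 10),
              (7, 13), (8, 12), (11, 17), (12, 16), (13, 19), (14, 18)}
    return edges

def distances_from(source, edges):
    """Breadth-first search distances from source to every reachable node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

AG = tokyo_edges()

def physical_distance(gate, tau):
    """Distance in AG between the images of the two qubits of a gate."""
    q, r = gate
    return distances_from(tau[q], AG)[tau[r]]

circuit = [("q2", "q0"), ("q3", "q2"), ("q0", "q3"), ("q0", "q2"),
           ("q3", "q2"), ("q0", "q3"), ("q3", "q1")]
tau1 = {"q0": 2, "q1": 0, "q2": 10, "q3": 6}
tau3 = {"q0": 2, "q1": 0, "q2": 6, "q3": 1}

print(physical_distance(circuit[0], tau1))              # g0 under tau1
print([physical_distance(g, tau3) for g in circuit])    # all gates under tau3
```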
Figure 4: Qubit mappings τi : {q0, q1, q2, q3} → V for i = 1, 2, 3, where τ2 is obtained from τ1 by swap(1, 6) and τ3 is obtained from τ2 by swap(6, 10). Here swap(1, 6) is shorthand for swap(v1, v6).
3.2 Initial Mapping
An initial mapping can be constructed step by step, selected arbitrarily, or computed by
a dedicated subroutine. Zulehner et al. [24] tested both arbitrary initial mappings and initial
mappings evolved from an empty mapping. Their experimental evaluation shows that, in general,
the latter approach has better performance. In [11], Li et al. proposed to use an initial mapping
that takes the whole input circuit into consideration. Starting from a randomly generated mapping
τ0, they first take this mapping as the initial mapping and apply it to the input circuit (Q,C). The
final mapping τ1 thus obtained is then used as the initial mapping and applied to the inverse circuit3
(Q,Cinv). Lastly, the final mapping τf obtained in this way is selected as the initial mapping for their
main algorithm sabre. Their approach is shown to perform consistently better than the A∗ search
algorithm in [24]. In [23], we proposed sahs, which uses simulated annealing to search for the best
initial mapping that fits well with the input logical circuit. Empirical evaluation there shows
that, when compared with the naive mapping that sends qi to vi, sahs works significantly better
with the initial mapping obtained by simulated annealing.
In this section we show how to obtain a good initial mapping by matching a particular graph
induced by the input circuit with the architectural graph.
Suppose LC = (Q,C) is the input logical circuit. We first construct an undirected graph
Gcirc(C) = (Q,Ecirc) on Q, where {q, q′} is an edge in Ecirc if either 〈q, q′〉 or its inverse 〈q′, q〉 is
in C. If Gcirc(C) happens to be isomorphic to a subgraph of the architecture graph AG, then the
qubit mapping problem is solved by constructing an (arbitrary) isomorphic embedding τ from
Gcirc(C) to AG. For NISQ devices, which have up to several thousand qubits, such an embedding can be easily
found by, for example, the VF2 algorithm [6]. If Gcirc(C) is not isomorphic to a subgraph of AG,
then we may select a maximal sub-circuit Ctop of C such that
(i) Ctop is a front section of C = {gi ∈ C | 1 ≤ i ≤ n} w.r.t. the dependency graph of C, i.e., a
two-qubit gate gi is in Ctop only if all two-qubit gates gj on which gi depends are in Ctop;
(ii) the graph Gcirc(Ctop) is isomorphic to a subgraph of AG; and
(iii) the graph Gcirc(Ctop ∪ {gi∗}) is not isomorphic to a subgraph of AG for any gi∗ that is in the
front layer of C \ Ctop.
3Note that “inverse” here has a different meaning from the “inverse” of a cnot gate defined in Section 3.
We call Gcirc(Ctop) a top subgraph (topgraph for short) of Gcirc(C). Let τtop be an isomorphic
embedding from Gcirc(Ctop) to AG. We select τtop as the initial mapping, which satisfies all gates
in Ctop. Note that τtop might not be a complete mapping from Q to V .
Example 3 (Example 1 cont’d). For the circuit LC in Example 1, we have Q = {q0, q1, q2, q3}
and Ecirc = {{q0, q2}, {q0, q3}, {q2, q3}, {q1, q3}}. Clearly, Gcirc(C) is isomorphic to a subgraph of
AG and such an isomorphism is specified by the qubit mapping τ3 in Fig. 4 (right).
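To make the embedding step concrete, the sketch below searches for an injective map that sends every edge of Gcirc(C) to an edge of AG. It uses plain backtracking as a simple stand-in for VF2, not VF2 itself, and the function names and the Tokyo edge list in `tokyo_edges` are assumptions (the edges are only drawn in Fig. 1). Qubits that appear in no gate are left unmapped, so the result may be a partial mapping.

```python
def tokyo_edges():
    """Assumed IBM Q Tokyo coupling graph, edges stored as (smaller, larger)."""
    edges = set()
    for v in range(20):
        if v % 5 < 4:
            edges.add((v, v + 1))
        if v < 15:
            edges.add((v, v + 5))
    edges |= {(1, 7), (2, 6), (3, 9), (4, 8), (5, 11), (6, 10),
              (7, 13), (8, 12), (11, 17), (12, 16), (13, 19), (14, 18)}
    return edges

def find_embedding(pattern_edges, host_edges):
    """Return a dict pattern node -> host node preserving edges, or None."""
    host_adj, pat_adj = {}, {}
    for adj, edges in ((host_adj, host_edges), (pat_adj, pattern_edges)):
        for a, b in edges:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    # Place high-degree pattern nodes first to prune the search early.
    order = sorted(pat_adj, key=lambda q: -len(pat_adj[q]))

    def extend(i, tau):
        if i == len(order):
            return dict(tau)
        q = order[i]
        for v in sorted(host_adj):
            if v in tau.values():
                continue   # keep the map injective
            # Every already placed neighbour of q must be adjacent to v.
            if all(tau[r] in host_adj[v] for r in pat_adj[q] if r in tau):
                tau[q] = v
                found = extend(i + 1, tau)
                if found is not None:
                    return found
                del tau[q]
        return None

    return extend(0, {})

AG = tokyo_edges()
e_circ = [("q0", "q2"), ("q0", "q3"), ("q2", "q3"), ("q1", "q3")]
tau = find_embedding(e_circ, AG)
print(tau)

# A triangle cannot be embedded into a path on four nodes.
no_embedding = find_embedding([("a", "b"), ("b", "c"), ("a", "c")],
                              [(0, 1), (1, 2), (2, 3)])
```

The embedding found need not coincide with τ3; any embedding satisfies every gate of the circuit.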
Note that τtop often does not take the whole circuit into consideration. We propose another
method for constructing the initial mapping that considers the whole circuit. For a logical circuit
LC = (Q,C), we introduce a weight function ω which assigns a weight to each edge of Ecirc (the
edge set of the undirected graph Gcirc(C) defined above) such that ω({q, q′}) is the number of gates
gi in C with gi = 〈q, q′〉 or gi = 〈q′, q〉. Let $E^w_{circ}$ = {e1, e2, ..., en} be Ecirc listed so that ω(e1) ≥ . . . ≥ ω(en). We
then construct a subgraph G∗ = (Q,E∗) of Gcirc(C) which is isomorphic to a subgraph of AG as
follows. We start by letting E∗ = {e1} and then consider the next edge e2. In general, suppose
that, for some k < n, we have decided for every i ≤ k whether ei should be put into E∗, and that the current
subgraph G∗ is isomorphic to some subgraph of AG. We then consider ek+1. If putting ek+1 into E∗
would make G∗ non-isomorphic to any subgraph of AG, we skip this edge; otherwise, we put ek+1
into E∗ and update G∗, which is still isomorphic to some subgraph of AG. If k + 1 < n, we
continue with ek+2, and so on until there is no edge left in $E^w_{circ}$. In this way, we obtain a subgraph G∗ of
Gcirc(C) that is isomorphic to some subgraph of AG. The sum of weights of edges in G∗, though
not necessarily the largest, is sufficiently large among all subgraphs of Gcirc(C) that are isomorphic
to some subgraph of AG. Using the VF2 algorithm again, we can find an embedding τwgt which
embeds G∗ into AG. Again, we note that τwgt might be a partial mapping from Q to V .
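The greedy construction of G∗ can be sketched as follows. This is an illustration under stated assumptions: the embedding test is a plain backtracking search used in place of VF2, the names `find_embedding` and `weighted_subgraph` are hypothetical, and the toy architecture is a path on four nodes rather than a real device.

```python
from collections import Counter

def find_embedding(pattern_edges, host_edges):
    """Backtracking search for an injective, edge-preserving map (or None)."""
    host_adj, pat_adj = {}, {}
    for adj, edges in ((host_adj, host_edges), (pat_adj, pattern_edges)):
        for a, b in edges:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    order = list(pat_adj)

    def extend(i, tau):
        if i == len(order):
            return dict(tau)
        q = order[i]
        for v in host_adj:
            free = v not in tau.values()
            if free and all(tau[r] in host_adj[v] for r in pat_adj[q] if r in tau):
                tau[q] = v
                found = extend(i + 1, tau)
                if found is not None:
                    return found
                del tau[q]
        return None

    return extend(0, {})

def weighted_subgraph(gates, host_edges):
    """Add edges in order of non-increasing weight while an embedding exists."""
    weights = Counter(frozenset(g) for g in gates)   # omega on undirected edges
    chosen = []
    for edge, _ in weights.most_common():            # heaviest edges first
        candidate = chosen + [tuple(edge)]
        if find_embedding(candidate, host_edges) is not None:
            chosen = candidate                       # keep the edge
        # otherwise skip it and move on to the next edge
    return chosen, weights

path = [(0, 1), (1, 2), (2, 3)]   # toy architecture graph: 0 - 1 - 2 - 3
gates = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "b"),
         ("b", "c"), ("a", "c"), ("a", "b"), ("c", "d")]
chosen, weights = weighted_subgraph(gates, path)
tau_wgt = find_embedding(chosen, path)
print(chosen, tau_wgt)
```

In this toy run the third edge would close a triangle, which a path cannot host, so it is skipped and the lighter edge {c, d} is added instead.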
In the following, we call τtop the topgraph initial mapping and call τwgt the weighted graph
initial mapping of LC. Besides these two initial mappings, we also introduce a method for evolving
an initial mapping from the empty mapping. A similar idea was used by Zulehner et al. [24], but
we extend a partial mapping only when necessary, i.e., when the thus extended mapping can
execute a two-qubit gate in the current front layer or it can reduce the minimum physical distance
(cf. Section 3.1) between qubits in a two-qubit gate in the current front layer. This mapping
extension technique is also used when τtop or τwgt is incomplete.
3.3 Fixed-Depth Heuristic Search
In most search-based algorithms for the qubit mapping problem, a heuristic function is used to
select an action (i.e., a swap or a sequence of swaps) which can maximally reduce the sum or
the minimum of the distances between the two qubits in the cnot gates of the front layer and,
sometimes, the lookahead layer.
For each edge e = {v, v′} in AG there is an associated swap operation, written swap(e), which
swaps the states on v and v′. More precisely, suppose τ is the current mapping and τ(q) = v,
τ(q′) = v′. Then swap(e) transforms τ into a new mapping τ′ such that τ′(q) = v′, τ′(q′) = v,
and τ′(q∗) = τ(q∗) for q∗ ∉ {q, q′}. If only one of the two endnodes is occupied, say τ(q′) = v′ while
no logical qubit is mapped to v, then τ′(q′) = v and v′ becomes unoccupied; the case where only v is
occupied is symmetric. If neither endnode is occupied, then τ′ = τ.
We often write swap(e) ◦ τ for τ′.
Example 4 (Example 1 cont’d). For the three qubit mappings in Fig. 4, we have τ2 = swap(1, 6)◦
τ1 and τ3 = swap(6, 10) ◦ τ2.
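The effect of swap(e) on a mapping is easy to state in code. In the minimal sketch below, a mapping is a Python dict from logical qubit names to physical qubit indices; the function name `apply_swap` is an assumption. For partial mappings the sketch takes the reading that a qubit sitting on one endnode moves to the other endnode even when the latter is unoccupied.

```python
def apply_swap(tau, edge):
    """Return the mapping swap(edge) o tau for a (possibly partial) mapping tau."""
    v, w = edge
    exchange = {v: w, w: v}
    # Qubits on v or w move to the other endnode; all others stay put.
    return {q: exchange.get(p, p) for q, p in tau.items()}

tau1 = {"q0": 2, "q1": 0, "q2": 10, "q3": 6}
tau2 = apply_swap(tau1, (1, 6))    # q3 moves from v6 to the empty v1
tau3 = apply_swap(tau2, (6, 10))   # q2 moves from v10 to the now empty v6
print(tau2)
print(tau3)
```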
In this section, we propose a new heuristic function which measures how efficiently a mapping
can execute gates in the logical circuit. For convenience, we extend the notion of executability and say a cnot gate (in or not in the
front layer) is executable by a mapping τ if it and all cnot gates it depends on are satisfied by τ.
Starting with a selected initial mapping $\tau^0$, we write $s_0 = (\tau^0, PC^0, LC^0)$ for the initial state of
the search process, where $LC^0$ is obtained by removing all cnot gates 〈q, q′〉 that are executable
by $\tau^0$ from LC, and $PC^0$ is obtained by adding the corresponding cnot gates $\langle\tau^0(q), \tau^0(q')\rangle$ to an
empty physical circuit. Step by step, we select an action a from S, the set of sequences of swaps
on AG, and enforce all swaps in a one by one to get the next mapping (and the next state) until
there are no gates left in the logical circuit.
Suppose $s_i$ is the current state with $s_i = (\tau^i, PC^i, LC^i)$ and all gates that are executable by $\tau^i$
are already removed from $LC^i$. For a sequence $a = (\mathrm{swap}_1, \mathrm{swap}_2, \ldots, \mathrm{swap}_\ell)$ of swaps on AG,
we define a value function
\[ \mathrm{val}(\tau^i, a) = \frac{\text{number of gates executable by } \tau'}{\mathrm{len}(a) \times 3}, \tag{1} \]
where τ′ is the mapping obtained by enforcing the swaps in action a one by one on $\tau^i$ and $\mathrm{len}(a) = \ell$
is the number of swaps in a. Recall that each swap is implemented by three cnot gates (see Fig. 2
(right)).
Our action set consists of all sequences of swaps on AG and we select any one with the maximal
value, i.e., we select $a^*$ from
\[ \arg\max_{a \in S} \mathrm{val}(\tau^i, a). \tag{2} \]
After selecting $a^*$, we enforce on $\tau^i$ the swaps in $a^*$ one by one and obtain the next mapping $\tau^{i+1}$.
Then we remove all gates that are executable by $\tau^{i+1}$ from $LC^i$ and write $LC^{i+1}$ for the resulting
logical circuit. Meanwhile, we append to $PC^i$ three cnot gates (as in Fig. 2 (right)) for
each swap in $a^*$, and a cnot gate $\langle\tau^{i+1}(q), \tau^{i+1}(q')\rangle$ for each cnot gate 〈q, q′〉 removed from $LC^i$.
In this way, we obtain $PC^{i+1}$ and the next state $s_{i+1} = (\tau^{i+1}, PC^{i+1}, LC^{i+1})$.
Clearly, considering all sequences of swaps is impractical. In practice, we propose to consider
actions with up to k swaps for some fixed k ≥ 1. In particular, for IBM Q Tokyo, we select k = 3.
Example 5 (Example 1 cont’d). Suppose τ1 is the qubit mapping which maps q0, q1, q2, q3 to,
respectively, v2, v0, v10, v6, see Fig. 4 (left). As the front layer contains only g0 = 〈q2, q0〉, which is
not executable by τ1, there are no gates in LC that can be executed by τ1. Examining all sequences
of up to 3 swaps, the four best actions are as follows:
• a1 = (swap(1, 6), swap(6, 10)), which can execute all 7 gates in LC;
• a2 = (swap(5, 6), swap(2, 6)), which can execute all 7 gates in LC;
• a3 = (swap(6, 7), swap(6, 10)), which can execute all but the last gate in LC;
• a4 = (swap(6, 11), swap(2, 6)), which can execute all but the last gate in LC.
The fifth best action contains 3 swaps. Thus a1 and a2 are the optimal actions, with the optimal
value val(τ1, a1) = val(τ1, a2) = 7/6.
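The value function (1) and the depth-limited selection (2) can be sketched for Example 5 as follows. This is an unfiltered brute-force illustration, not the authors' implementation: the helper names are hypothetical, the Tokyo edge list in `tokyo_edges` is an assumption (the edges are only drawn in Fig. 1), the mapping is assumed complete, and the search below is run with k = 2 to keep it fast.

```python
from fractions import Fraction
from itertools import product

def tokyo_edges():
    """Assumed IBM Q Tokyo coupling graph, edges stored as (smaller, larger)."""
    edges = set()
    for v in range(20):
        if v % 5 < 4:
            edges.add((v, v + 1))
        if v < 15:
            edges.add((v, v + 5))
    edges |= {(1, 7), (2, 6), (3, 9), (4, 8), (5, 11), (6, 10),
              (7, 13), (8, 12), (11, 17), (12, 16), (13, 19), (14, 18)}
    return edges

def executable_count(gates, tau, edges):
    """Count gates that are satisfied together with all gates they depend on."""
    blocked = set()   # qubits touched by some gate that cannot be executed
    count = 0
    for q, r in gates:
        p, s = tau[q], tau[r]
        satisfied = (min(p, s), max(p, s)) in edges
        if satisfied and q not in blocked and r not in blocked:
            count += 1
        else:
            blocked.update((q, r))
    return count

def apply_swaps(tau, action):
    """Enforce the swaps of an action one by one on the mapping tau."""
    for v, w in action:
        exchange = {v: w, w: v}
        tau = {q: exchange.get(p, p) for q, p in tau.items()}
    return tau

def val(gates, tau, action, edges):
    """Executable gates per inserted cnot, as in Eq. (1)."""
    return Fraction(executable_count(gates, apply_swaps(tau, action), edges),
                    3 * len(action))

def best_actions(gates, tau, edges, k):
    """Exhaustively score all sequences of at most k swaps."""
    best, winners = Fraction(0), []
    for length in range(1, k + 1):
        for action in product(sorted(edges), repeat=length):
            value = val(gates, tau, action, edges)
            if value > best:
                best, winners = value, [action]
            elif value == best and value > 0:
                winners.append(action)
    return best, winners

AG = tokyo_edges()
circuit = [("q2", "q0"), ("q3", "q2"), ("q0", "q3"), ("q0", "q2"),
           ("q3", "q2"), ("q0", "q3"), ("q3", "q1")]
tau1 = {"q0": 2, "q1": 0, "q2": 10, "q3": 6}
best, winners = best_actions(circuit, tau1, AG, k=2)
print(best, winners)
```

Using exact fractions avoids floating-point ties when comparing actions of different lengths.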
3.3.1 Heuristics used in related works
Now is a good time to compare our heuristic function with those used in related works.
Zulehner et al. [24] select the action that results in a mapping which can execute all gates in
the front layer and the lookahead layer. The action consists of a sequence of swaps and is selected
by using A∗ search and the following heuristic:
\[ h(\tau^i) = \sum_{\langle q, q' \rangle \in L_0(LC^i) \cup L_1(LC^i)} 3 \times \bigl(\mathrm{dist}_{AG}(\tau^i(q), \tau^i(q')) - 1\bigr), \tag{3} \]
where, for any two-qubit gate 〈q, q′〉, $\mathrm{dist}_{AG}(\tau^i(q), \tau^i(q'))$ is the distance from $\tau^i(q)$ to $\tau^i(q')$ in AG.
The heuristic cost is not admissible and thus an optimal action is not guaranteed. Moreover, the
worst-case time complexity of this A∗ search algorithm is exponential in the number of logical
qubits.
Childs et al. [5] select the action which can maximally reduce the total distance between qubits
in the cnot gates in the current front layer, i.e.,
\[ R(\tau^i) = \sum_{\langle q, q' \rangle \in L_0(LC^i)} \mathrm{dist}_{AG}(\tau^i(q), \tau^i(q')). \tag{4} \]
Their algorithm is polynomial in all relevant parameters but its performance is not directly compared
with the A∗ algorithm in [24].
To overcome the inefficiency of the A∗ search algorithm, several researchers (see, e.g., [11, 7])
propose to select a single swap each time. Their methods are more efficient than the A∗ approach
when processing logical circuits with more than 15 qubits. In [11], Li et al. design a heuristic cost
function that can reduce the sum of distances between the two qubits in each two-qubit gate in
the front (and the lookahead) layers. Analogously, Cowtan et al. [7] use a heuristic cost function
that can reduce the diameter of the subgraph composed of all qubits in the two-qubit gates of the
front layer. However, as will become clear in the evaluation section, the efficiency of the above
algorithms is achieved at the cost of the quality of the output physical circuit.
In the sahs algorithm [23], we introduce a heuristic function that supports weight parameters
to reflect the variable influence of gates in different layers. In each step, instead of selecting the
action with the minimal cost to apply, the sahs algorithm selects the swap which has the best
consecutive swap to apply.
3.4 Optimisation and Fallback
Considering all sequences of up to k swaps is still not very efficient for devices with a medium to
large architecture graph. Let τ be the current mapping and Li the current i-th (0 ≤ i ≤ k) layer.
Write Qi for the set of logical qubits in Li. It is natural not to consider swaps that do not interact
with gates in the front layers. This idea was used in, e.g., [24, 11, 23].
For an edge e = {v, v′}, if neither τ−1(v) nor τ−1(v′) is in Q0, then swapping v and v′ does not
reduce the minimum distance between qubits in a two-qubit gate in the current front layer, viz. L0.
Therefore, it is reasonable to introduce the following filter for selecting actions a = (e1, e2, ..., eℓ)
with at most k swaps:
1. Q0-filter: We say a = (e1, e2, ..., eℓ) is a Q0-plausible action if, for any edge ej = {v, v′} of
a, either τ^{−1}_{j−1}(v) or τ^{−1}_{j−1}(v′) is in Q0, where τ0 ≡ τ and τj is obtained from τj−1 by
enforcing swap(ej) for 1 ≤ j ≤ ℓ.
Similarly, we could look ahead and introduce Qi-filter for 0 < i < k.
2. Qi-filter: We say a = (e1, e2, ..., eℓ) is a Qi-plausible action if, for any edge ej = {v, v′} of a
with j > i, either τ^{−1}_{j−1}(v) or τ^{−1}_{j−1}(v′) is in Qi, where τ0 ≡ τ and τj is obtained from
τj−1 by enforcing swap(ej) for 1 ≤ j ≤ ℓ.
The above Qi-filter could be weakened by requiring that either τ^{−1}_{i−1}(v) or τ^{−1}_{i−1}(v′) is in
(a subset of) Q0 ∪ Q1 ∪ · · · ∪ Qi. In our evaluation, we used Q0 and (weakened) Q1 filters and the results
are very promising (see Section 4).
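The plausibility test behind the Q0- and Qi-filters can be sketched as follows. This is a minimal illustration, assuming a mapping is stored as a dictionary from logical qubits to physical nodes and an action as a list of edges; the function names apply_swap and is_q_plausible and the toy mapping are hypothetical and not taken from the authors' code.

```python
def apply_swap(inv, edge):
    """Return a new inverse mapping (physical node -> logical qubit or None) after a swap."""
    v, w = edge
    new_inv = dict(inv)
    new_inv[v], new_inv[w] = inv.get(w), inv.get(v)
    return new_inv


def is_q_plausible(action, tau, q_set, i=0):
    """Check the Qi-filter: every swap e_j with j > i must touch a qubit in q_set.

    With i = 0 and q_set = Q0 this is the Q0-filter.
    """
    inv = {phys: logical for logical, phys in tau.items()}
    for j, edge in enumerate(action, start=1):
        v, w = edge
        # inv is the inverse of tau_{j-1} at this point.
        if j > i and inv.get(v) not in q_set and inv.get(w) not in q_set:
            return False
        inv = apply_swap(inv, edge)
    return True


# Toy example on physical nodes 0..3 (hypothetical).
tau = {"q0": 0, "q1": 1, "q2": 2}
q0_set = {"q0"}
ok = is_q_plausible([(0, 1), (1, 2)], tau, q0_set)        # both swaps move q0
rejected = is_q_plausible([(1, 2), (0, 1)], tau, q0_set)  # first swap ignores q0
```

Enumerating only plausible actions prunes the candidate set before any action is evaluated, which is where the practical speed-up of the filters comes from.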
It should be stressed that, sometimes, Q0-filter may ‘filter’ out optimal actions.
Example 6 (Example 1 cont’d). Note that Q0 = {0, 2} and τ1(q0) = v2, τ2(q2) = v10. Each ai
for 1 ≤ i ≤ 4 in Example 5 is not Q0-plausible. Thus our algorithm with Q0-filter cannot find an
optimal action. In fact, the following Q0-plausible action is selected by our algorithm
• a5 = (swap(2, 7), swap(1, 6), swap(6, 10)).
This action (see Fig. 4) can execute all 7 gates in LC and has val(τ1, a5) = 7/9.
For a prefixed positive integer k, it is possible that, in some cases, no sequence of swaps with
length ≤ k can lead to a mapping which can execute any cnot gate in the current front layer. If
this is the case, we use the following natural fallback:
3. Fallback: Select any swap that can reduce FB(τ), the minimum distance between qubits
in a two-qubit gate in the current front layer, which is formally defined as follows:
    FB(\tau) = \min \{ \mathrm{dist}_{AG}(\tau(q), \tau(q')) \mid \langle q, q' \rangle \in L_0 \}.    (5)
It is worth noting that, for our experiments on IBM Q Tokyo and an extensive set of logical
circuits, the fallback is rarely activated.
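A small sketch of the fallback rule is given below. It assumes a connected, unweighted architecture graph stored as an adjacency dictionary and returns the first swap found that lowers FB(τ); the names distance, fb, swap_mapping and fallback_swap, and the toy path graph, are hypothetical.

```python
from collections import deque


def distance(adj, source, target):
    """Breadth-first shortest-path distance between two physical nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    raise ValueError("target not reachable from source")


def fb(adj, tau, front_layer):
    """FB(tau): minimum distance over the two-qubit gates in the front layer (Eq. 5)."""
    return min(distance(adj, tau[q], tau[q2]) for q, q2 in front_layer)


def swap_mapping(tau, edge):
    """Mapping obtained from tau by swapping the contents of the two ends of edge."""
    v, w = edge
    return {q: (w if p == v else v if p == w else p) for q, p in tau.items()}


def fallback_swap(adj, tau, front_layer):
    """Return some edge whose swap reduces FB(tau), or None if there is none."""
    current = fb(adj, tau, front_layer)
    for v in adj:
        for w in adj[v]:
            if v < w and fb(adj, swap_mapping(tau, (v, w)), front_layer) < current:
                return (v, w)
    return None


# Toy path graph v0 - v1 - v2 - v3 - v4 (hypothetical).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
tau = {"q0": 0, "q1": 4}
front = [("q0", "q1")]
before = fb(adj, tau, front)             # distance 4
chosen = fallback_swap(adj, tau, front)  # swap on edge (0, 1) brings q0 closer
```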
3.5 Complexity Analysis
In the following we give a rough estimation of the complexity of the search process of our algorithm.
The construction of the topgraph and weighted graph initial mappings requires finding an
isomorphic graph embedding. In general, it is NP-hard to check if a graph is isomorphic to a
subgraph of another, but there are efficient algorithms, say the VF2 algorithm [6], which can easily
solve this problem, and output an embedding if the answer is yes, for graphs with several thousand
nodes. Therefore, for the purpose of qubit mapping with NISQ devices, we do not include the time
of computing an initial mapping in our analysis.
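To make the embedding problem concrete, the sketch below searches for an edge-preserving injection from a small pattern graph into a target graph. It is a naive backtracking search standing in for VF2 [6], not VF2 itself, and it would not scale to large graphs; the function name find_embedding and the example graphs are hypothetical.

```python
def find_embedding(pattern_adj, target_adj):
    """Return a dict pattern node -> target node preserving all pattern edges, or None."""
    # Place high-degree pattern nodes first so that conflicts show up early.
    pattern_nodes = sorted(pattern_adj, key=lambda n: -len(pattern_adj[n]))

    def extend(mapping):
        if len(mapping) == len(pattern_nodes):
            return dict(mapping)
        node = pattern_nodes[len(mapping)]
        used = set(mapping.values())
        for cand in target_adj:
            if cand in used:
                continue
            # Every already-placed neighbour must land on a neighbour of cand.
            if all(mapping[nb] in target_adj[cand]
                   for nb in pattern_adj[node] if nb in mapping):
                mapping[node] = cand
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[node]
        return None

    return extend({})


# Target: a 4-cycle of physical nodes; patterns: a 3-node path and a triangle.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
path = {"q0": {"q1"}, "q1": {"q0", "q2"}, "q2": {"q1"}}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
path_embedding = find_embedding(path, cycle)          # an embedding exists
triangle_embedding = find_embedding(triangle, cycle)  # None: a 4-cycle has no triangle
```

When an embedding of the circuit's interaction graph exists, using it as the initial mapping makes every two-qubit gate executable without any swap.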
Suppose LC = (Q,C) is a logical circuit and AG = (V,E) is the architecture graph of a NISQ
device. Write |Q|, |V|, and |E| for, respectively, the cardinalities of Q, V, and E. Let d be the
diameter of AG and m the number of cnot gates in C.
We have the following simple observations:
• The dependency graph of LC can be computed in time linear in m.
• For any mapping τ and any logical circuit LC, we can identify (and remove from LC as well
as from its dependency graph) in time linear in m the set of gates in LC executable by τ .
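Both observations can be illustrated with single passes over the gate list. The sketch below is an assumption-laden simplification: it treats the circuit as an ordered list of cnot gates and tracks dependencies per qubit instead of building an explicit dependency graph; the names front_layer and split_executable are hypothetical.

```python
def front_layer(cnots):
    """Indices of gates that have no earlier gate acting on either of their qubits."""
    seen = set()
    front = []
    for idx, (q, q2) in enumerate(cnots):
        if q not in seen and q2 not in seen:
            front.append(idx)
        seen.update((q, q2))
    return front


def split_executable(cnots, tau, edges):
    """Split the gate list into gates executable under tau and the remaining gates.

    A gate is executable if its qubits sit on adjacent physical nodes and no
    earlier gate on either qubit is still pending. One pass, linear in m.
    """
    blocked = set()
    executed, remaining = [], []
    for q, q2 in cnots:
        adjacent = frozenset((tau[q], tau[q2])) in edges
        if adjacent and q not in blocked and q2 not in blocked:
            executed.append((q, q2))
        else:
            blocked.update((q, q2))
            remaining.append((q, q2))
    return executed, remaining


# Toy example: path v0 - v1 - v2 with the identity-like mapping (hypothetical).
edges = {frozenset((0, 1)), frozenset((1, 2))}
tau = {"q0": 0, "q1": 1, "q2": 2}
cnots = [("q0", "q1"), ("q0", "q2"), ("q1", "q2"), ("q0", "q1")]
executed, remaining = split_executable(cnots, tau, edges)
next_front = front_layer(remaining)
```

In this example only the first gate is executable; the gate on (q0, q2) blocks everything that follows it on q0 and q2.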
We first consider the ideal case when fallback is never activated during the search process. As
described in Section 3.3, starting with a selected initial mapping τ^0, step by step, we select an
action a consisting of up to k swaps on AG and enforce all swaps in a, one by one, to get the
next mapping till there are no gates left in the logical circuit. Suppose si = (τ^i, PC^i, LC^i) is the
current search state. As there are at most O(|E|^k) actions with up to k swaps, we can generate
at most O(|E|^k) different mappings from τ^i. To select from these mappings the one which can
execute the most gates in LC^i, we need time O(|E|^k × m) (cf. the second observation above).
Because each step removes at least one cnot from LC, in at most O(|E|^k × m^2) time, we can
execute all gates in LC.
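The counting step in this argument can be written out explicitly. The derivation below assumes that an action is any sequence of between 1 and k edges (repetitions allowed, which can only over-count) and that |E| ≥ 2.

```latex
\[
\#\{\text{actions with at most } k \text{ swaps}\}
  \;\le\; \sum_{\ell=1}^{k} |E|^{\ell}
  \;=\; \frac{|E|\,(|E|^{k}-1)}{|E|-1}
  \;\le\; 2\,|E|^{k}
  \;=\; O(|E|^{k}).
\]
\[
\underbrace{O(|E|^{k})}_{\text{candidate mappings}} \times
\underbrace{O(m)}_{\text{evaluation per mapping}} \times
\underbrace{m}_{\text{steps}}
  \;=\; O(|E|^{k} \times m^{2}).
\]
```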
Now, suppose fallback is activated. Since each activation of the fallback reduces by (at least)
one the minimum distance between qubits in a cnot gate in the current front layer (i.e., FB(τ) in
Eq. 5), the whole search process activates the fallback procedure at most m × d times. Note that
each activation (see Eq. 5) needs to compute the shortest distance between the control and target
qubits in a cnot gate in the front layer of the current logical circuit and there are at most |Q|/2
cnot gates in the front layer. Using Dijkstra’s algorithm with lists, FB(τ) can be computed in
time O(|Q| × |V|^2). Thus the total fallback on-cost is at most O(|Q| × |V|^2 × m × d).
Therefore, the overall time complexity of the search is O(|E|^k × m^2 + |Q| × |V|^2 × m × d).
As |Q| ≤ |V| ≤ |E| + 1 and d is usually very small when compared with m, the overall time
complexity is bounded by O(|E|^k × m^2) if k ≥ 3. In practice, the running time can be significantly
lower when Q-filters are used, as they shrink the effective base of |E|^k.
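The absorption of the fallback term can be checked as follows. The derivation assumes |E| ≥ 1, k ≥ 3, and d ≤ m; the last inequality is an explicit assumption here, standing in for the remark that d is usually very small when compared with m.

```latex
\[
|Q| \times |V|^{2} \;\le\; |V|^{3} \;\le\; (|E|+1)^{3} \;\le\; 8\,|E|^{3} \;\le\; 8\,|E|^{k},
\qquad
m \times d \;\le\; m^{2},
\]
\[
\text{hence}\quad
|E|^{k} \times m^{2} + |Q| \times |V|^{2} \times m \times d
  \;\le\; 9\,|E|^{k} \times m^{2}
  \;=\; O(|E|^{k} \times m^{2}).
\]
```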
As for the space complexity, in each state s, we maintain, besides the logical and physical
circuits, the dependency graph of the current logical circuit and the set of plausible actions with
up to k swaps. Thus the space complexity of the algorithm is bounded by O(|E|^k + m).
4 Evaluation
In this section, we compare our approach with the sabre algorithm of Li et al. [11], the algorithm
of [7] (Cambridge henceforth), and our qubit transformation algorithm sahs based on simulated
annealing and heuristic search [23], which are the state-of-the-art algorithms for the qubit mapping
problem on IBM Q Tokyo (see Fig. 1). Although we focus on a particular NISQ device in the
evaluation, our approach is applicable to any undirected architecture graph, including for example
Rigetti 16Q Aspen-4 (note that this device supports CZ instead of cnot). We use Python as our
programming language and IBM Qiskit [1] as auxiliary environment (the code is available at
https://github.com/BensonZhou1991/Qubit-Mapping-Subgraph-Isomorphism). All experiments are
conducted on a MacBook Pro with a 3.1 GHz Intel Core i5 processor and 8GB memory.
As for benchmark circuits, we consider all publicly available circuits evaluated in [11] or [7].
Note that only cnot gates are considered in our comparison. For each individual circuit, we
extract all its cnot gates and use the thus reduced circuit as the input of our qubit mapping
algorithm. We then compare the involved algorithms in terms of the number of auxiliary cnot
gates required. One may also use the following relative measure Rcnot, first introduced in [7], to
compare different algorithms on a particular circuit:
    R_{cnot} = \frac{\text{number of cnot gates in the output physical circuit}}{\text{number of cnot gates in the input logical circuit}}.    (6)
In order to compare algorithms evaluated over different benchmark sets of circuits, for any benchmark
set B of quantum circuits, we define the following cnot index, written I^B_{cnot}, of an algorithm
relative to B as the fraction of the total number of cnot gates in all output physical circuits over
the total number of cnot gates in all input logical circuits from B, i.e.,
    I^{B}_{cnot} = \frac{\sum_{LC \in B} \text{number of cnot gates in the output physical circuit of } LC}{\sum_{LC \in B} \text{number of cnot gates in } LC}.    (7)
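The two measures in Eqs. 6 and 7 can be computed as in the following sketch. It assumes that each circuit is described by its number of input cnot gates and the number of cnot gates added by the mapper, so that the output count is their sum; the function names cnot_ratio and cnot_index are hypothetical, and the three sample rows are taken from Table 2.

```python
def cnot_ratio(input_cnots, added_cnots):
    """R_cnot of Eq. 6 for one circuit: output cnot count over input cnot count."""
    return (input_cnots + added_cnots) / input_cnots


def cnot_index(circuits):
    """I^B_cnot of Eq. 7: total output cnot count over total input cnot count."""
    total_in = sum(n_in for n_in, _ in circuits)
    total_out = sum(n_in + added for n_in, added in circuits)
    return total_out / total_in


# (input cnot gates, added cnot gates) for qft 10, rd84 142 and z4 268.
sample = [(90, 45), (154, 84), (1343, 525)]
ratio_qft10 = cnot_ratio(90, 45)  # 1.5
index = cnot_index(sample)        # (1587 + 654) / 1587
```

Note that the index is a ratio of totals, not the mean of the per-circuit ratios, so circuits with many cnot gates dominate it.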
For convenience, we write Bs and Bc for the benchmark sets of circuits used in [11] and [7]
respectively. Note that Bc contains 131 circuits and includes all the 23 circuits in Bs. For more
precise comparison, we decompose Bc into three categories according to the number of cnots
these benchmark circuits contain: small (0-99), medium (100-999) and large (≥ 1000).
For all experiments reported here, we fix the search depth k as 3 and use Q0-filter for the
first swap and Q1-filter for all the other swaps for filtering actions a = (e1, ..., eℓ) with at most 3
swaps.
4.1 Comparison Among Different Initial Mappings
We first compare the two subgraph isomorphism related initial mappings (viz., the topgraph initial
mapping τtop and the weighted graph initial mapping τwgt) introduced in Section 3 with the empty
mapping and the naive initial mapping (which maps qi to vi for each qi in Q). Table 1 summarises
the results for all (small, medium, large) circuits in Bc, while we give the detailed results in
Tables 3-5 in Appendix A. For each of these circuits and each initial mapping, the transformation
can be completed within 15 minutes by using our algorithm.
From Table 1, we can see that the two subgraph isomorphism related initial mappings, τtop
and τwgt, are significantly better than the empty initial mapping and the naive initial mapping for
small and medium circuits, but the difference is not significant when large circuits are evaluated.
This is not a surprise as the search heuristic plays a dominant role if the circuit has a large size.
Note that if a logical circuit can be transformed into a physical circuit with zero overhead, our
algorithm, when using either the topgraph or weighted graph initial mapping, will certainly detect
this.
Since the topgraph initial mapping is slightly better than the other three initial mappings, in
the following, when compared with other algorithms, we always use the topgraph initial mapping.
4.2 Comparison with SABRE, Cambridge, and SAHS
benchmarks  #circ.  topgr. i.m.  wghtgr. i.m.  empty i.m.  naive i.m.  sahs    Cambridge
small       63      1.2545       1.3497        1.6598      1.8785      1.2619  1.5103
medium      39      1.3718       1.3351        1.4208      1.5698      1.3101  1.6854
large       29      1.4376       1.4386        1.4478      1.4413      1.6151  1.8211
all         131     1.4346       1.4353        1.4486      1.4477      1.6047  1.8154
Table 1: Summary of the I^B_{cnot} index of our algorithm with four different initial mapping
construction methods, and of sahs and Cambridge
We then compare our algorithm with sabre [11] on the small benchmark set Bs of circuits used
in [11]. We use the topgraph initial mapping τtop. The results are reported in Table 2, in which
the ‘Comp.’ (comparison) column shows the improvement of our algorithm over sabre in terms of the
numbers of auxiliary cnot gates added. Specifically, let nsabre and nours be the numbers of cnot
gates added by sabre and by ours, respectively. Then the improvement ratio is calculated as
(nsabre − nours)/nsabre. From Table 2 we can see that only two circuits have negative improvement
against sabre. For all circuits with more than 1000 cnot gates, the improvement is at least 50%.
In terms of the cnot index, we have successfully decreased the index I^{B_s}_{cnot} from 1 + 50874/50534 =
2.0067 to 1 + 20652/50534 = 1.4087.
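The improvement ratio defined above is easy to reproduce from the numbers of added cnot gates. In the sketch below the function name improvement is hypothetical, and the handling of circuits where sabre adds no gate is an assumption chosen to match the 0.00% entries of Table 2; the sample values are taken from Table 2.

```python
def improvement(n_sabre, n_ours):
    """Improvement ratio (n_sabre - n_ours) / n_sabre of our result over sabre.

    Assumption: when neither algorithm adds gates the ratio is reported as 0;
    when only sabre adds none, the ratio is undefined and None is returned.
    """
    if n_sabre == 0:
        return 0.0 if n_ours == 0 else None
    return (n_sabre - n_ours) / n_sabre


alu_v0_27 = round(100 * improvement(3, 9), 2)      # -200.0
qft_16 = round(100 * improvement(186, 189), 2)     # -1.61
total = round(100 * improvement(50874, 20652), 2)  # 59.41
```

The value for the sum row is computed from the totals, so it is an aggregate improvement over the whole benchmark set rather than a mean of the per-circuit percentages.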
We further compare our algorithm with the Cambridge algorithm of [7] and our sahs algorithm
[23] on the large benchmark set Bc, which contains 131 circuits. A summary of the results in
terms of the cnot index is presented in Table 1. As for detailed results, see Tables 3-5 in
Appendix A. It is worth stressing that the results of the Cambridge algorithm are obtained without
using postmapping optimisations. Precisely, we have removed the following lines of code from their
algorithm:
• ‘Transform.OptimisePhaseGadgets().apply(tkcirc)’ and
• ‘Transform.OptimisePostRouting().apply(outcirc)’.
Thus the results of their algorithm given in Tables 3-5 are different from those reported in [7,
Table VI]. In this sense, Tables 3-5 provide a fair comparison as we do not do postmapping
optimisations either.
From Table 1 we can see that, for all 131 circuits in Bc, the I^{B_c}_{cnot} index of our algorithm is
1 + 145083/333811 = 1.4346, while the indices for sahs and Cambridge are, respectively, 1.6047
and 1.8154. This shows that our algorithm can on average generate significantly better results
than Cambridge and sahs. This is particularly true for large circuits which contain 1000 or more
cnot gates. For small circuits with fewer than 100 cnot gates, our algorithm has almost the same
index as sahs, but sahs is 6 points (1.3101 vs. 1.3718) better than ours when medium circuits
with 100-999 cnot gates are considered. Both algorithms are significantly better than Cambridge
in all three categories.
In the above experiments, we used Q0-filter for the first swap and Q1-filter (see Section 3.4)
for all the other swaps when filtering actions with up to 3 swaps. If we use Q0-filter for all
swaps, then the index becomes 1.4702, which is inferior to the index 1.4346 reported in Table 1.
In addition, if we weaken Q1-filter by using the qubits in the front layer and the lookahead layer,
then the index could be further improved to 1.3886 from 1.4346. However, this is achieved at
the cost of a much slower search process: for several large circuits (e.g., ‘sym10 262’), the complete
process requires 2000-3600 seconds, but all circuits can still be transformed within one hour.
Circuit name    qubit no.  input gate  input CNOT  sabre added  topgraph added  Time (s)  Comp.
4mod5-v1 22     5          21          11          0            0               0.38      0.00%
mod5mils 65     5          35          16          0            0               0.81      0.00%
alu-v0 27       5          36          17          3            9               0.71      -200.00%
decod24-v2 43   4          52          22          0            0               0.98      0.00%
4gt13 92        5          66          30          0            0               0.36      0.00%
ising model 10  10         480         90          0            0               0.94      0.00%
ising model 13  13         633         120         0            0               1.01      0.00%
ising model 16  16         786         150         0            0               1.08      0.00%
qft 10          10         200         90          54           45              6.06      16.67%
qft 16          16         512         240         186          189             124       -1.61%
rd84 142        15         343         154         105          84              8.72      20.00%
adr4 197        13         3439        1498        1614         681             37.4      57.81%
radd 250        13         3213        1405        1275         633             52.8      50.35%
z4 268          11         3073        1343        1365         525             27.5      61.54%
sym6 145        7          3888        1701        1272         540             4.8       57.55%
misex1 241      15         4813        2100        1521         621             40.9      59.17%
rd73 252        10         5321        2319        2133         1062            23.6      50.21%
cycle10 2 110   12         6050        2648        2622         1125            23.7      57.09%
square root 7   15         7630        3089        2598         1263            446.9     51.39%
sqn 258         10         10223       4459        4344         1467            39.9      66.23%
rd84 253        12         13658       5960        6147         2952            111.8     51.98%
co14 215        15         17936       7840        8982         3975            194.6     55.74%
sym9 193        11         34881       15232       16653        5481            161.1     67.09%
sum             -          117289      50534       50874        20652           -         59.41%
Table 2: Comparison of our algorithm with sabre in [11] on IBM Q Tokyo. Note the percentage
in the last row denotes the average improvement of our algorithm (with the topgraph initial
mapping) against sabre.
5 Conclusion
We have proposed a new algorithm for qubit mapping which can significantly reduce the extra two-
qubit gates required per two-qubit gate in the input circuit. Our algorithm is based on subgraph
isomorphism and filtered depth-limited search. If the input circuit can be executed directly, the
proposed approach can always detect this. It seems that this nice property is not enjoyed by any
other approach.
From our experimental results, we can see that, when the circuit has fewer than 1000 two-qubit
gates, our subgraph isomorphism induced initial mappings are much better than empty mappings
and naive mappings that assign the i-th qubit in the logical circuit to the i-th qubit in the quantum
device. For large circuits, our results show that initial mappings are not very important.
A weighted graph like ours (see Section 3.2) is also introduced in a recent work [12], where Lin,
Anschuetz, and Harrow apply spectral graph theory to qubit mapping. The performance of their
algorithm is in general not better than the A∗ approach of [24]. They also suggest using their
“spectral mapper to provide an initial mapping”, although its effectiveness needs further investigation.
It seems that we are still quite far from devising algorithms that could output circuits with
nearly minimal overheads. Future work will investigate along the following directions:
• Although our approach basically can be applied on any NISQ device with an undirected
architecture graph, it is not certain if the actual efficacy highly depends on the size or
‘compactness’ of the architecture graph. Further work is required to evaluate our approach
on more general NISQ devices with a large number of qubits and various topologies.
• Minimizing depth or latency and circuit error is also important for qubit mapping. Although
the number of CNOT gates in our output circuit is already smaller than the depth of the
output circuit (only CNOT gates are counted) of some compared algorithm [7] (see [23] for
a more detailed analysis), it will be nice if we can adapt our approach to address other or
multiple optimisation objectives.
• More on heuristic search and filters. Could we do better by designing new filters?
• In our approach, the initial mapping is obtained by constructing a subgraph isomorphism.
We certainly could do this in each step, after those executable gates are removed from the
logical circuit. But the clear obstacle is how to transform the current mapping to a selected
next mapping and how to select a good subgraph isomorphism.
• Machine learning and deep learning algorithms may be designed to quickly select the best
action in Eq. 2.
A Detailed Empirical Results
Circuit name  qubit no.  input CNOT  topgr. added  wgtgr. added  empty added  naive added  sahs added  Cambridge added
ex1 226 6 5 0 0 3 18 0 0
graycode6 47 6 5 0 0 0 9 0 0
xor5 254 6 5 0 0 3 18 0 0
4gt11 84 4 9 0 0 9 12 0 0
ex-1 166 3 9 0 0 6 6 0 0
4mod5-v0 20 5 10 0 0 9 9 0 9
4mod5-v1 22 5 11 0 0 9 12 0 9
ham3 102 3 11 0 0 9 6 0 0
mod5d1 63 5 13 0 0 3 12 0 0
4gt11 83 5 14 0 0 0 18 0 12
4mod5-v0 19 5 16 0 0 0 18 0 9
4mod5-v1 24 5 16 0 0 15 21 0 12
mod5mils 65 5 16 0 0 15 21 0 9
rd32-v0 66 4 16 0 0 18 18 0 0
rd32-v1 68 4 16 0 0 18 18 0 0
3 17 13 3 17 0 0 9 9 0 0
alu-v0 27 5 17 9 6 12 18 6 3
alu-v1 29 5 17 9 6 12 24 6 3
alu-v2 33 5 17 9 6 12 18 6 9
4gt11 82 5 18 3 3 3 27 3 12
alu-v1 28 5 18 9 6 21 18 6 3
alu-v3 35 5 18 9 6 12 24 6 3
alu-v4 37 5 18 9 6 12 18 6 3
decod24-v2 43 4 22 0 0 18 15 0 0
decod24-v0 38 4 23 0 0 27 15 0 0
miller 11 3 23 0 0 6 9 0 0
alu-v3 34 5 24 9 6 27 27 6 3
mod5d2 64 5 25 18 9 18 36 12 12
4gt13 92 5 30 0 0 30 42 0 18
4gt13-v1 93 5 30 0 0 33 27 0 18
4mod5-bdd 287 7 31 6 18 9 36 6 15
4mod5-v0 18 5 31 9 12 30 36 9 9
4mod5-v1 23 5 32 9 21 30 36 9 12
decod24-bdd 294 6 32 15 27 24 24 15 21
one-two-three-v2 100 5 32 9 9 18 30 9 9
one-two-three-v3 101 5 32 15 18 24 36 6 15
rd32 270 5 36 18 24 39 30 12 18
4gt5 75 5 38 9 12 24 51 15 15
alu-bdd 288 7 38 24 15 39 36 24 45
alu-v0 26 5 38 12 9 30 36 9 21
decod24-v1 41 5 38 3 21 21 45 15 18
4gt5 76 5 46 21 36 48 48 15 27
4gt13 91 5 49 6 6 15 45 15 6
alu-v4 36 5 51 6 15 30 30 15 36
4gt13 90 5 53 9 9 18 48 27 9
4gt5 77 5 58 9 18 18 45 9 36
one-two-three-v1 99 5 59 24 33 42 48 12 39
rd53 138 8 60 30 42 30 42 27 39
decod24-v3 45 5 64 15 24 36 72 15 39
one-two-three-v0 98 5 65 18 27 48 63 24 27
4gt10-v1 81 5 66 15 18 36 60 27 33
aj-e11 165 5 69 33 36 42 33 18 24
4mod7-v0 94 5 72 12 27 33 51 12 39
4mod7-v1 96 5 72 21 21 48 45 18 42
alu-v2 32 5 72 15 45 69 45 15 39
mod10 176 5 78 15 24 63 66 24 36
4gt4-v0 80 6 79 15 39 48 69 24 78
cnt3-5 179 16 85 3 30 72 87 15 87
4gt12-v0 88 6 86 21 15 69 66 21 21
ising model 10 10 90 0 0 18 27 0 0
qft 10 10 90 45 33 57 96 36 57
sys6-v0 111 10 98 54 81 63 60 45 111
4 49 16 5 99 18 30 42 48 36 69
sum 2428 618 849 1602 2133 636 1239
I-index 1.255 1.350 1.660 1.879 1.262 1.510
Table 3: Comparison on IBM Q Tokyo with small circuits
Circuit name  qubit no.  input CNOT  topgr. added  wgtgr. added  empty added  naive added  sahs added  Cambridge added
4gt12-v1 89 6 100 57 18 30 81 24 93
0410184 169 14 104 6 27 75 93 12 75
4gt4-v0 79 6 105 12 12 30 87 12 96
hwb4 49 5 107 36 42 54 63 33 45
mod10 171 5 108 39 27 27 60 24 60
4gt4-v0 78 6 109 15 15 33 93 15 99
4gt12-v0 87 6 112 6 6 24 69 6 123
4gt4-v0 72 6 113 45 39 51 93 42 90
4gt12-v0 86 6 116 9 9 27 75 9 123
4gt4-v1 74 6 119 39 27 75 93 78 114
ising model 13 13 120 0 0 36 45 0 0
mini-alu 167 5 126 30 33 69 87 33 75
one-two-three-v0 97 5 128 42 78 72 90 66 66
rd53 135 7 134 60 84 99 111 54 48
decod24-enable 126 6 149 66 63 72 84 87 81
ham7 104 7 149 48 42 75 51 81 102
ising model 16 16 150 0 0 48 48 0 0
mod8-10 178 6 152 69 33 69 87 21 162
rd84 142 15 154 84 126 87 108 102 198
ex3 229 6 175 24 81 87 102 18 174
4gt4-v0 73 6 179 99 42 117 120 42 177
mod8-10 177 6 196 123 78 72 87 39 135
alu-v2 31 5 198 78 90 60 99 54 63
rd53 131 7 200 63 78 93 81 90 87
C17 204 7 205 111 84 144 147 96 114
alu-v2 30 6 223 60 54 87 93 45 105
mod5adder 127 6 239 84 81 93 114 51 87
qft 16 16 240 189 135 204 231 135 195
rd53 133 7 256 60 174 138 150 105 159
majority 239 7 267 66 105 87 213 84 123
ex2 227 7 275 78 108 90 126 96 270
cm82a 208 8 283 117 105 102 225 84 222
sf 274 6 336 30 36 63 180 24 381
sf 276 6 336 36 36 102 159 24 384
con1 216 9 415 273 153 210 195 192 375
rd53 130 7 448 267 207 168 222 171 390
f2 232 8 525 336 126 192 312 213 225
rd53 251 8 564 201 195 225 240 204 309
hwb5 53 6 598 207 204 195 237 174 210
sum 8513 3165 2853 3582 4851 2640 5835
I-index 1.372 1.335 1.421 1.570 1.310 1.685
Table 4: Comparison on IBM Q Tokyo with medium
circuits
Circuit name  qubit no.  input CNOT  topgr. added  wgtgr. added  empty added  naive added  sahs added  Cambridge added
z4 268 11 1343 525 468 609 600 546 1671
radd 250 13 1405 633 567 555 549 669 1647
adr4 197 13 1498 681 741 642 807 711 1146
sym6 145 7 1701 540 585 540 765 744 2139
misex1 241 15 2100 621 726 858 924 921 1263
rd73 252 10 2319 1062 852 1227 1074 1065 2115
cycle10 2 110 12 2648 1125 1290 1320 1236 1038 2424
hwb6 56 7 2952 1077 1026 1098 1011 1104 1719
square root 7 15 3089 1263 1374 1242 1470 1353 1326
sqn 258 10 4459 1467 1716 1638 1986 1953 3192
cm85a 209 14 4986 2073 2091 2289 2397 2337 4173
rd84 253 12 5960 2952 2841 3174 3009 3198 5286
root 255 13 7493 2928 3099 3468 3399 3525 5601
co14 215 15 7840 3975 4563 4629 4437 4356 7752
mlp4 245 16 8232 4275 4146 4173 4104 4116 6462
sym9 148 10 9408 1947 2166 1992 2388 2172 6438
urf2 277 8 10066 6285 6267 6135 6045 5934 8205
hwb7 59 8 10681 3684 3846 3696 3588 4602 6378
max46 240 10 11844 4410 4308 4560 4530 5289 9681
clip 206 14 14772 6762 6834 6963 6405 6843 12624
9symml 195 11 15232 5481 6462 5373 5682 6036 11454
sym9 193 11 15232 5481 6462 5373 5682 6123 11454
dist 223 13 16624 7470 6582 7107 7254 6936 12834
sao2 257 14 16864 7596 6756 8361 6792 7827 11742
urf5 280 9 23764 11988 11679 12060 11802 13065 20436
urf1 278 9 26692 13872 13809 14190 14022 15678 24600
sym10 262 12 28084 11490 10623 11520 10635 11697 20115
hwb8 113 9 30372 11295 11382 11769 11394 59977 35376
urf2 152 8 35210 18342 18342 18018 18489 18780 25857
sum 322870 141300 141603 144579 142476 198595 265110
I-index 1.438 1.439 1.448 1.441 1.615 1.821
Table 5: Comparison on IBM Q Tokyo with large circuits
References
[1] Gadi Aleksandrowicz, Thomas Alexander, P Barkoutsos, L Bello, Y Ben-Haim, D Bucher,
FJ Cabrera-Hernández, J Carballo-Franquis, A Chen, CF Chen, et al. Qiskit: An open-source
framework for quantum computing. Accessed on: March 16, 2019, 2019.
[2] Matthew Amy, Dmitri Maslov, Michele Mosca, and Martin Roetteler. A meet-in-the-middle
algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Trans. on CAD of
Integrated Circuits and Systems, 32(6):818–830, 2013.
[3] Adriano Barenco, Charles H. Bennett, Richard Cleve, David P. DiVincenzo, Norman Margo-
lus, Peter Shor, Tycho Sleator, John A. Smolin, and Harald Weinfurter. Elementary gates
for quantum computation. Phys. Rev. A, 52:3457–3467, Nov 1995.
[4] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth
Lloyd. Quantum machine learning. Nature, 549:195, 2017.
[5] Andrew M. Childs, Eddie Schoute, and Cem M. Unsal. Circuit transformations for quantum
architectures. In van Dam and Mancinska [19], pages 3:1–3:24.
[6] Luigi P. Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento. A (sub)graph iso-
morphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell.,
26(10):1367–1372, 2004.
[7] Alexander Cowtan, Silas Dilkes, Ross Duncan, Alexandre Krajenbrink, Will Simmons, and
Seyon Sivarajah. On the qubit routing problem. In van Dam and Mancinska [19], pages
5:1–5:32.
[8] Alexandre AA de Almeida, Gerhard W Dueck, and Alexandre CR da Silva. Finding optimal
qubit permutations for IBM’s quantum computer architectures. In Proceedings of the 32nd
Symposium on Integrated Circuits and Systems Design, page 13. ACM, 2019.
[9] Aram W Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear systems
of equations. Physical Review Letters, 103(15):150502, 2009.
[10] Sumeet Khatri, Ryan LaRose, Alexander Poremba, Lukasz Cincio, Andrew T Sornborger,
and Patrick J Coles. Quantum-assisted quantum compiling. Quantum, 3:140, 2019.
[11] Gushu Li, Yufei Ding, and Yuan Xie. Tackling the qubit mapping problem for NISQ-era quantum
devices. In Proceedings of the Twenty-Fourth International Conference on Architectural
Support for Programming Languages and Operating Systems, pages 1001–1014. ACM, 2019.
[12] Joseph X. Lin, Eric R. Anschuetz, and Aram W. Harrow. Using spectral graph theory to
map qubits onto connectivity-limited devices. arXiv:1910.11489, 2019.
[13] Dmitri Maslov, Sean M. Falconer, and Michele Mosca. Quantum circuit placement. IEEE
Trans. on CAD of Integrated Circuits and Systems, 27(4):752–763, 2008.
[14] Ken Matsumoto and Kazuyuki Amano. Representation of quantum circuits with Clifford and
pi/8 gates. arXiv:0806.3834, 2008.
[15] Prakash Murali, Jonathan M. Baker, Ali Javadi-Abhari, Frederic T. Chong, and Margaret
Martonosi. Noise-adaptive compiler mappings for noisy intermediate-scale quantum com-
puters. In Iris Bahar, Maurice Herlihy, Emmett Witchel, and Alvin R. Lebeck, editors,
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Pro-
gramming Languages and Operating Systems, ASPLOS 2019, Providence, RI, USA, April
13-17, 2019, pages 1015–1029. ACM, 2019.
[16] Mehdi Saeedi, Robert Wille, and Rolf Drechsler. Synthesis of quantum circuits for linear
nearest neighbor architectures. Quantum Information Processing, 10(3):355–377, 2011.
[17] Peter W. Shor. Polynominal time algorithms for discrete logarithms and factoring on a
quantum computer. In Leonard M. Adleman and Ming-Deh A. Huang, editors, Algorithmic
Number Theory, First International Symposium, ANTS-I, Ithaca, NY, USA, May 6-9, 1994,
Proceedings, volume 877 of Lecture Notes in Computer Science, page 289. Springer, 1994.
[18] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Sylvain Collange, and Fernando
Magno Quintão Pereira. Qubit allocation. In Proceedings of the 2018 International Symposium
on Code Generation and Optimization, pages 113–125. ACM, 2018.
[19] Wim van Dam and Laura Mancinska, editors. 14th Conference on the Theory of Quantum
Computation, Communication and Cryptography, TQC 2019, June 3-5, 2019, University of
Maryland, College Park, Maryland, USA, volume 135 of LIPIcs. Schloss Dagstuhl - Leibniz-
Zentrum fuer Informatik, 2019.
[20] Rodney Van Meter. Quantum Networking. John Wiley & Sons, 2014.
[21] Davide Venturelli, Minh Do, Eleanor Rieffel, and Jeremy Frank. Compiling quantum circuits
to realistic hardware architectures using temporal planners. Quantum Science and Technology,
3(2):025004, Feb 2018.
[22] Robert Wille, Mathias Soeken, Christian Otterstedt, and Rolf Drechsler. Improving the
mapping of reversible circuits to quantum circuits using multiple target lines. In 18th Asia and
South Pacific Design Automation Conference, ASP-DAC 2013, Yokohama, Japan, January
22-25, 2013, pages 145–150. IEEE, 2013.
[23] X. Zhou, S. Li, and Y. Feng. Quantum circuit transformation based on simulated annealing
and heuristic search. IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, 2020. DOI: 10.1109/TCAD.2020.2969647, arXiv:1908.08853.
[24] Alwin Zulehner, Alexandru Paler, and Robert Wille. Efficient mapping of quantum circuits
to the IBM QX architectures. In 2018 Design, Automation & Test in Europe Conference
& Exhibition, DATE 2018, Dresden, Germany, March 19-23, 2018, pages 1135–1138. IEEE,
2018.
