A Monte Carlo Tree Search Framework for Quantum Circuit Transformation by Zhou, Xiangzhen et al.
A Monte Carlo Tree Search Framework for Quantum Circuit Transformation
Xiangzhen Zhou¹,², Yuan Feng∗² and Sanjiang Li†²
¹State Key Lab of Millimeter Waves, Southeast University, Nanjing 211189, China
²Centre for Quantum Software and Information, Faculty of Engineering and Information Technology, University of Technology Sydney, NSW 2007, Australia
April 2020
Abstract
In the Noisy Intermediate-Scale Quantum (NISQ) era, quantum processing units (QPUs) suffer from, among other
limitations, highly restricted connectivity between physical qubits. To make a quantum circuit executable, a circuit
transformation process is needed that converts it into a functionally equivalent circuit satisfying the connectivity
constraints imposed by the QPU. While several algorithms have been proposed for this purpose, their overhead
costs are often very high, which sharply degrades the fidelity of the resulting circuits. A major reason is that,
due to the high branching factor and the vast search space, almost all these algorithms search very shallowly
and thus often reach only locally optimal solutions at best.
In this paper, we propose a Monte Carlo Tree Search (MCTS) framework to tackle this problem, which
enables the search process to go much deeper. In particular, we carefully design a scoring mechanism that takes
both short- and long-term rewards into account, and we propose a fast random strategy for simulation. The
resulting search algorithm is polynomial in all relevant parameters, and empirical results on extensive realistic
circuits show that it can often reduce the size of output circuits by at least 30% compared with state-of-the-art
algorithms on IBM Q20.
1 Introduction
With Google’s recent high-profile, though debated, demonstration of quantum supremacy on a 53-qubit quantum
processor [1], NISQ (Noisy Intermediate-Scale Quantum) devices have attracted rapidly increasing interest from
researchers in both academia and industry. Quantum processing units (QPUs) in the NISQ era support only a
limited set of basic operations (elementary quantum gates) and often suffer from high gate errors, short coherence
times, and limited connectivity between physical qubits. To run a quantum algorithm described as a quantum
circuit (henceforth called the logical circuit), we need to compile it into a functionally equivalent physical circuit
executable on the QPU. The compilation consists of two basic processes. In the decomposition process, gates in
the logical circuit are decomposed, or transformed, into elementary gates supported by the QPU [2, 11]. The
transformation process, also known as quantum circuit transformation (QCT) [6, 31] or qubit mapping [15, 16, 30],
is then performed on the generated circuit; it in turn consists of two steps, initial mapping and qubit routing. The
former maps the qubits of the logical circuit, called logical qubits, to those of the QPU, called physical qubits; the
latter transforms the circuit by adding ancillary operations such as SWAP gates that ‘route’ physical qubits so
that all multi-qubit gates become executable.
∗yuan.feng@uts.edu.au
†sanjiang.li@uts.edu.au
arXiv:2008.09331v1 [quant-ph] 21 Aug 2020
Both the decomposition and the transformation processes have been studied extensively in the literature.
As standard decomposition procedures are now available (see [19, 23]), in this paper we focus on transformation
and assume that gates in the input logical circuit have already been decomposed into elementary gates supported
by the QPU. Furthermore, we assume that an initial mapping is given; it can be obtained, for example, by the
greedy strategy [32, 21, 7], the reverse traversal technique [15], the simulated-annealing-based algorithm [31], or
subgraph-isomorphism-based methods [16].
To reduce the gate overheads in the qubit routing step, many algorithms have been proposed aiming at
minimizing gate counts [32, 31, 16, 17], circuit depths [14, 30, 3, 29] or circuit error [20, 18]. These algorithms can
be roughly classified into two broad categories. The first category consists of algorithms that try to reformulate
QCT as a planning or optimization problem and solve it by applying off-the-shelf tools [3, 29, 27, 22, 28, 18, 8].
However, as already shown in [27, 6], the QCT problem is NP-complete in general, and algorithms in this
category usually scale poorly as the size of input circuits grows.
In contrast, algorithms in the second category use heuristic search to construct the output quantum circuit
step by step from the original input quantum circuit [15, 21, 32, 27, 10, 31]. Experimental results show that
customized heuristic search algorithms are more promising in transforming large-scale circuits, but usually
there is still a considerable gap between the output circuit and an optimal one. The reason partially lies in
the limited search depth in most of these algorithms. To achieve efficiency, one either divides the circuits into
layers and tries to execute the gates layer-wise [32], or simply considers only the direct effect of a single move
(i.e., SWAP) (see e.g., [15, 6, 7]). This leads to a very shallow search depth. The Simulated Annealing and
Heuristic Search algorithm (SAHS) [31] and the Filtered and Depth-Limited Search approach (FiDLS) [16] can
go one or two steps further, but exploring even deeper seems impractical, as the search process becomes very
slow when the QPU has dozens or hundreds of qubit connections.
Inspired by the recent spectacular success of Monte Carlo Tree Search (MCTS) in playing Go [24, 25], in this
paper we propose an MCTS framework for the QCT problem. Although first designed for computer games, MCTS
has found applications in many domains that can be represented as trees of sequential decisions [4]. MCTS is
a flexible statistical anytime algorithm that can be used with little or no domain knowledge [4]. The basic idea
behind MCTS is to explore and exploit, in a balanced way, a search tree in which each node represents a game
state and each branch a legal move from that state. Given the current game state, the aim is to select the most
promising move by exploring a search tree rooted at this state, based on random sampling of the search space.
This is achieved through the following five steps: (1) Selection. Starting from the root, successively select a
child node until a leaf node is reached; (2) Expansion. Expand the selected leaf node with one or more child
nodes, each corresponding to a legal move; (3) Simulation. Play out the task to completion by selecting
subsequent moves randomly; (4) Backpropagation. Propagate the simulation result (winning, losing, or the
reward points collected) back towards the root to update the values of the nodes along the way; (5) Decision.
After steps (1)-(4) have been repeated a sufficient number of times, select the best move (the one with the
largest value) and move to the next game state.
Example 1. We now show how to conduct a full playout on the search tree shown in Fig. 1(a). Suppose that
Selection uses a simple strategy that always chooses the child with the maximum winning rate¹. Then, starting
from root node 0, nodes 2 and 6, which have the maximum values 5/6 and 2/2 among their siblings according to
the evaluation table in Fig. 1(c), are chosen successively. Because node 6 is a leaf, it is expanded and its child
nodes 8 and 9, shown in the dashed box of Fig. 1(b), are opened. After Expansion, one or more newly opened
nodes are chosen to perform simulations. In this example, both nodes 8 and 9 are chosen to execute 2 random
simulations each, with assumed results 0/2 and 1/2, respectively. After the simulations at node 8 are done, the
result is backpropagated to root node 0 through nodes 6 and 2, and their updated values are marked (in red)
in the ‘EABP1’ column of Fig. 1(c). Because the two players alternate from level to level, a loss at node 8 is a
win for the player of nodes 6 and 0: these nodes are credited with the 2 simulations lost at node 8 (node 6 goes
from 2/2 to 4/4), whereas node 2 is credited with 0 wins (5/6 becomes 5/8). The same operation applies after
the simulations at node 9 are finished, and the updated values can be found in the ‘EABP2’ column.
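The bookkeeping of Example 1 can be checked with a few lines of Python. This is a minimal sketch, assuming (as the grey/white colouring of Fig. 1 suggests) that the two players alternate from level to level, so each node stores wins from the perspective of the player who moved into it; the names `stats` and `parent` are illustrative and list only the nodes on the path used in the example.

```python
# (wins, simulations) per node before backpropagation: column EBBP of Fig. 1(c).
# Only the nodes on the backpropagation path of Example 1 are listed.
stats = {0: (4, 14), 2: (5, 6), 6: (2, 2), 8: (0, 0), 9: (0, 0)}
parent = {2: 0, 6: 2, 8: 6, 9: 6}  # tree edges used in Example 1


def backpropagate(stats, parent, leaf, wins, sims):
    """Credit `sims` playouts with `wins` wins at `leaf` to the leaf and all its ancestors.

    Assumes the players alternate between tree levels, so each step up switches
    perspective: the father's player wins the playouts that the child lost.
    """
    node, credited = leaf, wins
    while True:
        w, n = stats[node]
        stats[node] = (w + credited, n + sims)
        if node not in parent:  # reached the root
            return stats
        node, credited = parent[node], sims - credited


backpropagate(stats, parent, leaf=8, wins=0, sims=2)
eabp1 = dict(stats)  # column EABP1
backpropagate(stats, parent, leaf=9, wins=1, sims=2)
eabp2 = dict(stats)  # column EABP2
print(eabp2[0])  # (7, 18)
```

Both snapshots reproduce the EABP1 and EABP2 columns of Fig. 1(c) for nodes 0, 2, 6, 8 and 9.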
Our MCTS framework for the QCT problem also consists of these five major modules. Within the framework,
we carefully design a scoring mechanism that takes both short- and long-term rewards into account. The resulting
MCTS algorithm is polynomial in all relevant parameters, and experiments on an extensive set of realistic
benchmark circuits show that the search depth of our method can easily exceed that of most, if not all, existing
¹ In practice, the strategy for Selection is much more complex than this and should take both the evaluations
and the visit counts into account. Interested readers can refer to [13, 5] for further details.
[Figure 1: (a) Selection, (b) Expansion]
(c) Simulation and Backpropagation
Node  EBBP      EABP1  EABP2
0     4/14      6/16   7/18
1     4/6       4/6    4/6
2     5/6       5/8    6/10
3     1/2       1/2    1/2
4     1/2       1/2    1/2
5     2/2       2/2    2/2
6     2/2       4/4    5/6
7     1/2       1/2    1/2
8     unopened  0/2    0/2
9     unopened  0/0    1/2
Figure 1: Example for one playout in MCTS. Each grey or white node in (a) and (b) represents a legal move
made by the corresponding player. The last three columns in (c) represent evaluations in each node before
Backpropagation, after the first Backpropagation and after the second Backpropagation, respectively. The
evaluation is defined as #wins/#simulations obtained by Simulation and Backpropagation.
Figure 2: Hadamard, CNOT and SWAP gates (from left to right).
algorithms and the gate overhead of the obtained physical circuits can be reduced by at least 30% when
compared to the state-of-the-art algorithms [31, 16, 7].
The remainder of this paper is organised as follows. Section 2 provides background on quantum computation
and summarizes the state of the art in quantum circuit transformation. Section 3 presents a detailed description
of the MCTS framework together with a theoretical analysis. Empirical results on an extensive set of realistic
benchmark circuits are presented in Section 4. The last section concludes the paper with an outlook.
2 Quantum Circuit Transformation
In classical computing, data are stored in the form of bits which have two states, 0 and 1. In contrast, data in
quantum computing are stored in qubits, which also have two basis states represented by |0〉 and |1〉, respectively.
However, unlike a classical bit, a qubit can be in a superposition

$$\alpha|0\rangle + \beta|1\rangle \qquad (1)$$

of the basis states, where α and β are complex numbers satisfying $|\alpha|^2 + |\beta|^2 = 1$.
The state of a qubit can be changed by quantum gates, which are mathematically represented by unitary
matrices. Fig. 2 depicts three important quantum gates used in this paper: Hadamard, CNOT and SWAP gates.
Hadamard is a single-qubit gate that can generate superposition: it maps |0〉 to (|0〉 + |1〉)/√2 and |1〉 to
(|0〉 − |1〉)/√2. CNOT and SWAP are both two-qubit gates. CNOT flips the target qubit if and only if the control
qubit is |1〉; that is, CNOT: |c〉|t〉 → |c〉|c ⊕ t〉, where c, t ∈ {0, 1} and ⊕ denotes exclusive-or. SWAP exchanges
the states of its operand qubits: it maps |a〉|b〉 to |b〉|a〉 for all a, b ∈ {0, 1}. Note that a SWAP gate can be
decomposed into three CNOT gates, as shown in Fig. 3.
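As a quick sanity check of these definitions, the following sketch builds the 4×4 permutation matrices of CNOT and SWAP from their classical actions and confirms the three-CNOT decomposition of Fig. 3. The basis ordering |ab〉 ↦ index 2a + b and the helper names are assumptions of this illustration, not notation from the paper.

```python
from math import isclose, sqrt


def matmul(x, y):
    """Product of two square matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def two_qubit_gate(f):
    """4x4 permutation matrix of a reversible classical map f(a, b) -> (a', b').

    Basis state |a b> sits at index 2*a + b (an assumed ordering).
    """
    m = [[0] * 4 for _ in range(4)]
    for a in (0, 1):
        for b in (0, 1):
            a2, b2 = f(a, b)
            m[2 * a2 + b2][2 * a + b] = 1
    return m


CNOT_01 = two_qubit_gate(lambda c, t: (c, c ^ t))  # control: first qubit
CNOT_10 = two_qubit_gate(lambda t, c: (t ^ c, c))  # control: second qubit
SWAP = two_qubit_gate(lambda a, b: (b, a))

# Fig. 3: three CNOTs with alternating control implement one SWAP.
assert matmul(CNOT_01, matmul(CNOT_10, CNOT_01)) == SWAP

# Hadamard: its columns are the images of |0> and |1>.
H = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]
assert isclose(H[0][0], H[1][0])   # |0> -> (|0> + |1>)/sqrt(2)
assert isclose(H[0][1], -H[1][1])  # |1> -> (|0> - |1>)/sqrt(2)
print(matmul(CNOT_01, matmul(CNOT_10, CNOT_01)) == SWAP)  # True
```

Because the three-CNOT product is a palindrome, it does not matter whether the circuit of Fig. 3 is read left to right or right to left.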
Quantum gates can be concatenated to form complex circuits which, together with measurements, are used
to describe quantum algorithms. A circuit is usually denoted by a pair (Q,C), where Q is a set of qubits and
C a sequence of quantum gates on Q. Sometimes we also call C a circuit when Q is clear from the context.
Figure 3: The decomposition of a SWAP into three CNOT gates.
Figure 4: A quantum circuit (left), its dependency graph (middle) and the circuit after transformation (right).
Fig. 4 shows a circuit where Q = {q0, . . . , q4}, C = (g0, . . . , g4), g0 = CNOT(q0, q2), g1 = CNOT(q3, q4), etc.
Here CNOT gates are annotated with the qubits on which they are applied.
2.1 Quantum Circuit Transformation
As mentioned in the introduction, to run a quantum circuit on a given QPU in NISQ era, we need to transform
it so that the connectivity constraints imposed by the QPU are all satisfied. Such connectivity constraints are
typically described as an undirected and connected graph AG = (V,E), called the architecture graph [6], where
V denotes the set of physical qubits of the QPU and E the pairs of physical qubits on which a two-qubit gate
can be applied.
Note that by a standard process [19], any quantum circuit can be decomposed into a functionally equivalent
one which consists of only CNOT and single-qubit gates. Furthermore, as single-qubit gates can be executed
directly on a QPU (connectivity constraints only prevent two-qubit gates from applying on certain pairs of
physical qubits), we assume that the circuits to be transformed in this paper consist solely of CNOT gates.²
An important notion related to quantum circuits which plays a key role in QCT is the dependency graph.
Let C = (g0, g1, . . .) be a quantum circuit. We say gate gi in C depends on gj if j < i and they share at least
one common qubit. The dependence is direct if there is no gate gk with j < k < i such that gi depends on
gk and gk depends on gj . In general, we can construct a directed acyclic graph (DAG), called the dependency
graph [12], to characterize the dependency between gates in a circuit. Specifically, each node of the dependency
graph represents a gate and each directed edge the direct dependency relationship between the gates involved.
With the help of the dependency graph, any quantum circuit C can be divided into layers such that gates
in the same layer can be executed in parallel. The first or front layer, denoted by L0(C), consists of the gates
which have no parents in the DAG. The second layer, L1(C), is then the front layer of the DAG obtained by
deleting all gates in L0(C). Analogously, we can define the i-th layer of a circuit for any i ≥ 0.
Example 2. Fig. 4 shows an example of a quantum circuit (left) and its dependency graph (middle), from which
we can see that the front layer of the circuit consists of g0 and g1, the second layer of g2, the third of g3, and
the fourth of g4.
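The layer decomposition can be sketched by peeling off front layers exactly as defined above. In this illustration a gate is a tuple of logical qubit indices; g0 and g1 of Fig. 4 are given in the text, while g2 = CNOT(q0, q1), g3 = CNOT(q1, q2) and g4 = CNOT(q2, q3) are read off from the physical circuit of Example 3, so treat them as a reconstruction.

```python
def layers(circuit):
    """Split a circuit into layers L0, L1, ... by repeatedly removing its front layer.

    `circuit` is a list of two-qubit gates, each a tuple of logical qubit indices.
    A gate is in the current front layer if no earlier, not-yet-removed gate shares
    a qubit with it, i.e. it has no parent in the remaining dependency graph.
    Returns one list of gate indices per layer.
    """
    remaining = list(range(len(circuit)))
    result = []
    while remaining:
        front, busy = [], set()
        for i in remaining:  # scan in circuit order
            qubits = set(circuit[i])
            if not qubits & busy:
                front.append(i)
            busy |= qubits  # later gates on these qubits depend on gate i
        result.append(front)
        remaining = [i for i in remaining if i not in front]
    return result


# Circuit of Fig. 4: g0 = CNOT(q0, q2), g1 = CNOT(q3, q4), then g2, g3, g4.
fig4 = [(0, 2), (3, 4), (0, 1), (1, 2), (2, 3)]
print(layers(fig4))  # [[0, 1], [2], [3], [4]]
```

The output matches Example 2: g0 and g1 form the front layer, followed by g2, g3 and g4 in separate layers.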
Another key notion for QCT is a qubit mapping τ which allocates logical qubits Q to physical qubits V so
that for any qi, qj ∈ Q, τ(qi) = τ(qj) if and only if i = j. Given a logical circuit (Q,C) and an architecture
graph AG, a two-qubit gate g = CNOT(qi, qj) in C is called executable by τ if τ(qi) and τ(qj) are adjacent in
AG, and g is either in the front layer of C or all the gates it depends on are executable. Note that, in general,
no single mapping makes all two-qubit gates in a circuit executable. Once no gates are
² This implies that we cannot simplify the circuits by, say, cancelling two consecutive CNOT gates acting on the
same pair of qubits.
[Figure 5: IBM Q20, physical qubits v0 to v19 arranged in four rows of five]
Figure 5: The architecture graph for IBM Q20.
executable by the current mapping τ , a QCT algorithm seeks to insert into the circuit some ancillary SWAP
gates to change τ into a new one so that more gates become executable. This insertion-execution process is iterated
until all gates of the input circuit have been executed. To illustrate the basic ideas, we revisit the circuit on the
left side of Fig. 4 as an example.
Example 3. We transform the logical circuit LC = (Q,Cl) shown in Fig. 4 into a physical one PC = (V,Cp)
satisfying the architecture graph AG in Fig. 5. Suppose the initial qubit mapping τ is given as a naive one
which maps qi to vi, 0 ≤ i ≤ 4.
1. Since τ(q3) = v3, τ(q4) = v4, and v3 and v4 are adjacent in AG, gate g1 in Cl is already executable by τ.
Thus we initialise PC as a physical circuit with V = {v0, . . . , v19} and a single CNOT gate acting on v3
and v4, and delete g1 from Cl. Now Cl = (g0, g2, g3, g4) and Cp = (CNOT(v3, v4)).
2. As no gate in Cl is executable by τ, we have to insert a SWAP gate (or a sequence of them) to obtain a
new mapping under which more CNOT gates from Cl become executable. In this example, we choose to add
SWAP(v0, v1) to Cp, which in effect converts τ into τ′ that maps q0 to v1 and q1 to v0. Now g0, which
acts on q0 and q2, is executable (since v1 and v2 are adjacent in AG). Similarly, g2 is executable as
well. Thus they can be deleted from Cl and added to Cp (with the operand qubits changed accordingly).
Consequently, now Cl = (g3, g4) and
Cp = (CNOT(v3, v4),SWAP(v0, v1),CNOT(v1, v2),CNOT(v1, v0)) .
3. Proceeding in a similar way, we add another SWAP(v0, v1) to Cp to convert τ′ back to τ so that g3 and
g4 become executable. After deleting them from Cl and adding them to Cp, we have Cl = ∅ and the final
physical circuit becomes
Cp =(CNOT(v3, v4),SWAP(v0, v1),CNOT(v1, v2),CNOT(v1, v0),
SWAP(v0, v1),CNOT(v1, v2),CNOT(v2, v3)),
which satisfies all the connectivity constraints of AG. The final physical circuit is shown in Fig. 4 (right).
2.2 Heuristic Search Algorithms
Recall that given a logical circuit LC0, an architecture graph AG, and an initial qubit mapping τini, a QCT
process aims to output a physical circuit which respects all the connectivity constraints in AG. To present this
process as a search problem, we need to first define the notion of states. Naturally, a state of the QCT process
is a triple s = (τ, PC,LC), where τ is a qubit mapping describing the current allocation of logical qubits, PC
is the physical circuit that consists of all gates that have been executed so far and the auxiliary SWAP gates
inserted, and the logical circuit LC consists of the remaining gates to be executed. Sometimes we denote by
LC(s) and PC(s) the logical and the physical circuits of s, respectively.
A legal action in the QCT process can be either a SWAP operation (corresponding to an edge in AG) or a
sequence of SWAP operations. Let s = (τ, PC,LC) be the current state, and suppose an action SWAP(vi, vj)
is taken on s. Then a new state s′ = (τ′, PC′, LC′) is reached, where τ′ is the same as τ except that it maps
τ⁻¹(vi) to vj and τ⁻¹(vj) to vi, where τ⁻¹(vi) and τ⁻¹(vj) are, respectively, the preimages of vi and vj under
τ. Furthermore, LC′ is obtained from LC by deleting all gates that are executable by τ′, and PC′ is obtained
from PC by adding first SWAP(vi, vj) and then all the gates just deleted from LC, with the operand qubits
changed according to τ′. While most algorithms select one SWAP at a time, the A∗ algorithm [32] and FiDLS
[16] select a sequence of SWAPs. Note that when regarding sequences of SWAPs as legal actions, usually we
execute a gate only after the last SWAP is applied.
Finally, the initial state s0 of the QCT process is taken as (τini, PC0, LC′0), where PC0 is the physical circuit
consisting of all gates from LC0 that are executable by τini, and LC′0 is the logical circuit obtained by deleting
all gates in PC0 from LC0. The goal states are those whose associated logical circuit is empty. Note that the
associated physical circuit of any goal state respects the connectivity constraints in AG. The cost of a state
s is the total number of auxiliary SWAPs inserted so far to reach it from s0. The aim of QCT is to find a goal
state with the minimal cost.
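The state, action and cost definitions above can be sketched in Python and checked against Example 3. This is a minimal illustration, not the paper's implementation: gates are (control, target) tuples of logical qubit indices, physical qubit vi is the integer i, and AG is restricted to the links v0–v1, v1–v2, v2–v3 and v3–v4, which are the only ones Example 3 relies on (the remaining couplings of IBM Q20 are omitted).

```python
from collections import namedtuple

# A QCT state s = (tau, PC, LC); tau maps logical qubits to physical qubits (ints).
State = namedtuple("State", "tau pc lc")


def execute(tau, pc, lc, edges):
    """Move every gate of lc that is executable by tau into pc; return (pc, rest).

    A gate is executable if its physical qubits are adjacent in AG and all gates it
    depends on are executable; blocking the qubits of every gate left behind makes
    later gates on those qubits wait, which enforces the second condition.
    """
    pc, rest, blocked = list(pc), [], set()
    for a, b in lc:
        u, v = tau[a], tau[b]
        if {a, b} & blocked or frozenset((u, v)) not in edges:
            rest.append((a, b))
            blocked |= {a, b}
        else:
            pc.append(("CNOT", u, v))
    return pc, rest


def initial_state(tau, lc0, edges):
    """s0 = (tau_ini, PC0, LC0'): execute whatever tau_ini already allows."""
    pc, rest = execute(tau, [], lc0, edges)
    return State(dict(tau), pc, rest)


def apply_swap(s, vi, vj, edges):
    """Action SWAP(vi, vj): exchange the preimages of vi and vj, then execute new gates."""
    assert frozenset((vi, vj)) in edges, "a SWAP must act on an edge of AG"
    inv = {v: q for q, v in s.tau.items()}  # tau^-1 on mapped physical qubits
    tau = dict(s.tau)
    if vi in inv:
        tau[inv[vi]] = vj
    if vj in inv:
        tau[inv[vj]] = vi
    pc, rest = execute(tau, s.pc + [("SWAP", vi, vj)], s.lc, edges)
    return State(tau, pc, rest)


def cost(s):
    """Number of auxiliary SWAPs inserted so far."""
    return sum(1 for gate in s.pc if gate[0] == "SWAP")


# Replaying Example 3 (only the links v0-v1-v2-v3-v4 of AG are modelled).
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 4)]}
circuit = [(0, 2), (3, 4), (0, 1), (1, 2), (2, 3)]  # g0..g4 of Fig. 4
s = initial_state({q: q for q in range(5)}, circuit, edges)
s = apply_swap(s, 0, 1, edges)
s = apply_swap(s, 0, 1, edges)
print(s.pc)
print(s.lc, cost(s))  # [] 2
```

The final `s.pc` is the physical circuit Cp of Example 3, and the reached state is a goal state of cost 2.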
Many QCT algorithms in the literature adopt a divide-and-conquer approach in the search process. Starting
from the current state s = (τ, PC,LC), each subtask consists of executing the front layer, the first two layers,
or a front section of the circuit. For example, in the A∗ algorithm, a shortest path in AG (which corresponds
to a sequence of SWAPs) is found which converts τ to a new mapping so that all gates in the first two layers
of LC are executable. In [7], Cowtan et al. partition LC into layers and then select the SWAP which can
maximally reduce the diameter of the subgraph composed of all pairs of qubits in the current layer. Siraichi
et al. [26] decompose LC into sub-circuits each of which leads to an isomorphic subgraph of AG and thus the
corresponding embedding can act as a mapping τ ′ that executes all gates in the sub-circuit. Their algorithm
then tries to find a minimal sequence of SWAPs which converts τ to τ ′. A similar approach is also adopted in
Childs et al. [6].
Unlike the above algorithms, SAHS [31] and FiDLS [16] do not divide the problem into sub-problems.
Whenever a mapping is generated, they try to execute as many gates from the logical circuit as possible, no
matter which layer they are in. SAHS regards each SWAP as a valid action, but when selecting the best SWAP
to apply, it simulates the search process one step further and selects the SWAP that admits the best subsequent
SWAP. In principle, SAHS can go deeper, but this makes the algorithm much slower (cf. Fig. 11 for an example).
FiDLS regards any sequence of up to k SWAPs as a legal action and selects the sequence that executes the
largest number of gates per SWAP. In a sense, this means that its search depth can reach k. To keep the
running time acceptable, in the experiments on Q20, FiDLS sets k to 3 and introduces various filters to discard
unlikely SWAPs.
3 The Proposed MCTS Framework
In this section, we describe an MCTS framework for quantum circuit transformation. A detailed algorithm
implementation is also given. Like general MCTS algorithms, our framework also consists of five major parts:
Selection, Expansion, Simulation, Backpropagation and Decision. However, some significant modifications have
been made to cater to the unique characteristics of QCT.
The Monte Carlo search tree for QCT, which is initialized immediately after the algorithm starts, stores all
states that have been explored during the transformation process. As stated in the previous section, an edge (s′, s)
connecting node s′ and its child s indicates that a SWAP gate is applied to convert s′ to s. Recall that the
aim of QCT is to find a goal state which has the minimal cost in terms of the number of SWAP gates inserted
along the path from the initial state to it. For this purpose, we define an immediate, short-term reward for
each edge and a long-term value for each node of the search tree as follows.
• rew(s′, s), which is the reward collected from the father node s′ to the child s, in terms of the number of
gates executed by the newly inserted SWAP when this transition is made:
rew(s′, s) = # of gates in LC(s′) - # of gates in LC(s). (2)
[Figure 6: (a) Selection; (b) Expansion, according to pertinent SWAPs; (c) Simulation, simulating on GSIM gates randomly for NSIM times; (d) Backpropagation, with discount γ; steps repeated NBP times]
Figure 6: Overview of the Monte Carlo Tree Search Framework.
• val(s). To determine the value of a state s, the following two factors are taken into account: (i) the
number of SWAPs inserted when the transformation of the remaining logical circuit is simulated at s. For
efficiency, the simulation is performed on a fixed-size sub-circuit of LC(s) instead of LC(s) itself; the
larger the sub-circuit used for simulation, the more accurate the simulated value is expected to be. (ii) the
(simulated) value of its best child node plus the reward collected from s to it. To be specific,
val(s) = max{sim, γ · [rew(s, s′′) + val(s′′)]} (3)
where sim is the simulated value obtained from (i), s′′ is the child of s with the maximal value, and γ is a
predefined discount factor. In our implementation, val(s) is initially assigned sim in the Simulation module
and then updated in Backpropagation whenever simulations are performed at a descendant of s.
Intuitively, val(s) describes the efficiency of introducing SWAP gates (in terms of the average number of
executed gates per SWAP) from s, taking into account both the simulation at s itself and the values
backpropagated from its child nodes. The larger val(s) is, the fewer SWAPs are expected to be needed to
lead s to a goal node, and the ‘better’ s is (compared with its siblings).
In addition to the above definitions, as shown in Fig. 6, our framework differs from traditional MCTS
algorithms for game playing in the following ways:
1. The simulation is performed on the leaf node selected in the Selection module, instead of on the child
nodes opened in the Expansion module. Experimental results on real benchmarks indicate that this achieves
better performance for the QCT problem.
2. In game playing, the simulation result can be obtained only when the game is decided. In contrast, the
reward of a move in our setting is collected during the execution of CNOT gates from the logical circuit.
Consequently, in the Simulation module we simulate only on a sub-circuit of the current logical circuit to
improve efficiency.
3. We introduce a discount factor, which can be adjusted to better suit the problem setting, when
backpropagating the simulated values.
3.1 Main modules
We now elaborate the five major modules of our framework one by one.
Selection. Selection is an iterated process to find an appropriate leaf node in the search tree to expand
and simulate. It starts from the root node and, in each iteration, evaluates and picks one of the child nodes
until a leaf node is reached.
The way we evaluate child nodes during Selection is critical to the performance of the whole algorithm. On
the one hand, if we consider only their values, the chance of exploring nodes that currently look unpromising
will be too low and we can easily get stuck in a local minimum. On the other hand, if we always select nodes
with a smaller visit count, the search will be too shallow and a large amount of time will be wasted on
exploring inferior nodes. To balance these two aspects, the following evaluation formula, similar to the
well-known UCT (Upper Confidence Bound 1 applied to trees) [13], is used in our implementation to make a
balanced evaluation among all child nodes s′ of s:
$$\mathrm{rew}(s, s') + \mathrm{val}(s') + c\sqrt{\frac{\log \mathrm{visit}(s)}{\mathrm{visit}(s')}} \qquad (4)$$
where c is a predefined parameter and visit(s) is the number of times s has been visited. Intuitively, the
first two terms in Eq. 4 correspond to the exploitation term and the third to the exploration term in UCT. In
each iteration of the Selection module, the node which maximises Eq. 4 is selected. The Selection module is
presented in Alg. 1.
Algorithm 1: Select(T )
input : A Monte Carlo search tree T .
output: A leaf node to expand and simulate.
begin
s← root(T );
visit(s)← visit(s) + 1;
while s is not a leaf node do
s← the child node s′ of s which maximises Eq. 4;
visit(s)← visit(s) + 1;
end
return s;
end
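Eq. 4 and Alg. 1 can be sketched as follows. The `Node` class is a hypothetical container for rew, val and visit; how to score a child with visit(s′) = 0 is not specified in the text, and returning infinity (so that unvisited children are tried first) is an assumption borrowed from common UCT practice.

```python
import math


class Node:
    """Search-tree node holding the quantities used in Eq. 4 (a hypothetical structure)."""

    def __init__(self, rew=0.0, val=0.0, visit=0):
        self.rew = rew      # rew(father, self), Eq. 2
        self.val = val      # long-term value val(self), Eq. 3
        self.visit = visit  # visit(self)
        self.children = []


def score(s, child, c):
    """Eq. 4 for a child s' of s."""
    if child.visit == 0:
        return math.inf  # assumption: an unvisited child is tried first
    return child.rew + child.val + c * math.sqrt(math.log(s.visit) / child.visit)


def select(root, c):
    """Alg. 1: walk down from the root, taking the child that maximises Eq. 4."""
    s = root
    s.visit += 1
    while s.children:
        parent = s
        s = max(parent.children, key=lambda ch: score(parent, ch, c))
        s.visit += 1
    return s


root = Node(visit=4)
a, b = Node(rew=1, val=2, visit=3), Node(rew=0, val=1, visit=1)
root.children = [a, b]
print(select(root, c=1) is a)  # True: a's higher value dominates
```

Building the same tree afresh and calling `select` with c = 5 returns b instead: its single visit inflates the exploration term enough to outweigh a's larger rew + val, which is the trade-off Eq. 4 is meant to control.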
Expansion. The goal of Expansion is to open all child nodes of a given leaf node by applying all pertinent
SWAP operations. Given a logical circuit C and a qubit mapping τ, the set of pertinent SWAPs, denoted
SWAP_{C,τ}, is the set of gates SWAP(vi, vj) such that τ⁻¹(vi) or τ⁻¹(vj) appears in a gate in the current
front layer of C, i.e.,
(vi, vj) ∈ E and (τ⁻¹(vi) ∈ Q0 or τ⁻¹(vj) ∈ Q0), (5)
where Q0 is the set of logical qubits involved in the gates in L0(C). To expand a selected node
s = (τ, PC, LC), only SWAP gates in SWAP_{LC,τ} are applied to generate child nodes. Note that this
strategy has been widely used in quantum circuit transformation; see e.g., [32, 15, 31]. In particular, multiple
variants of this filtering strategy are introduced in FiDLS [16].
For each pertinent SWAP of s, a new child node s′ will be generated. Furthermore, we set val(s′) =
visit(s′) = 0, and the reward rew(s, s′) as defined in Eq. 2. The details can be found in Alg. 2.
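Eq. 5 and the child generation of Alg. 2 can be sketched as below. This is an illustration under assumptions: a node is represented only by its mapping and logical circuit, AG is restricted to the path v0–v1–v2–v3–v4, and the helper names are hypothetical. Applied to the state of Example 3 after g1 has been executed, it yields three pertinent SWAPs, and SWAP(v0, v1), the one chosen in Example 3, has the largest immediate reward.

```python
def front_layer(lc):
    """L0(lc): gates with which no earlier gate of lc shares a qubit."""
    seen, front = set(), []
    for a, b in lc:
        if a not in seen and b not in seen:
            front.append((a, b))
        seen |= {a, b}
    return front


def pertinent_swaps(lc, tau, edges):
    """Eq. 5: edges (vi, vj) of AG with tau^-1(vi) or tau^-1(vj) in Q0."""
    q0 = {q for gate in front_layer(lc) for q in gate}
    inv = {v: q for q, v in tau.items()}
    return sorted(tuple(sorted(e)) for e in edges if any(inv.get(v) in q0 for v in e))


def remaining(tau, lc, edges):
    """Gates of lc left after executing everything tau allows (dependencies respected)."""
    rest, blocked = [], set()
    for a, b in lc:
        if {a, b} & blocked or frozenset((tau[a], tau[b])) not in edges:
            rest.append((a, b))
            blocked |= {a, b}
    return rest


def expand(tau, lc, edges):
    """Alg. 2 (sketch): one child (swap, tau', LC', rew) per pertinent SWAP."""
    inv = {v: q for q, v in tau.items()}
    children = []
    for vi, vj in pertinent_swaps(lc, tau, edges):
        new_tau = dict(tau)
        if vi in inv:
            new_tau[inv[vi]] = vj
        if vj in inv:
            new_tau[inv[vj]] = vi
        rest = remaining(new_tau, lc, edges)
        children.append(((vi, vj), new_tau, rest, len(lc) - len(rest)))  # rew: Eq. 2
    return children


# State of Example 3 after g1 has been executed; AG restricted to v0-v1-v2-v3-v4.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 4)]}
lc = [(0, 2), (0, 1), (1, 2), (2, 3)]
tau = {q: q for q in range(5)}
for swap, _, _, rew in expand(tau, lc, edges):
    print(swap, rew)  # (0, 1) 2, then (1, 2) 1, then (2, 3) 0
```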
Simulation. The objective of this module is to obtain, by random simulation, a simulated score for the current
state s that serves as its initial long-term value val(s). In our implementation, we perform the simulation on
the first Gsim gates of the current logical circuit, where Gsim is a predefined number. In principle, almost any
existing QCT algorithm could be used for this purpose. However, the Simulation module is invoked every time
a new node is opened, so for the sake of efficiency we design a fast random simulation, given in Alg. 3.
Given the current state s, let N be the minimum, over Nsim (a predefined number) simulation runs, of the
number of SWAP gates inserted until all of the first Gsim CNOT gates of LC(s) have been executed. Then the
initial long-term value of s is defined as

$$\mathrm{val}(s) = \gamma^{N/2} \cdot G_{\mathrm{sim}}, \qquad (6)$$

where γ is a predefined discount factor.
We next show how to perform the random simulation. Let C be a sub-circuit of LC(s) and τ the current mapping.
We write SWAP_{C,τ} for the set of pertinent SWAPs for C under τ. For any h ∈ SWAP_{C,τ}, its impact factor
Algorithm 2: Expand(T , s)
input : A Monte Carlo search tree T and node s = (τ, PC,LC).
begin
for all SWAP(vi, vj) in SWAP_{LC,τ} do
τ′ ← τ[τ⁻¹(vi) ↦ vj, τ⁻¹(vj) ↦ vi];
C ← the set of all executable gates in LC under τ′;
LC′ ← LC with all gates in C deleted;
PC′ ← PC by adding SWAP(vi, vj) and all gates in C;
s′ ← (τ ′, PC ′, LC ′);
val(s′), visit(s′)← 0;
Add s′ as a child node of s;
rew(s, s′)← number of gates in C;
end
end
is defined as

$$\mathrm{IF}(h) := f\Big(\sum_{g \in L_0(C)} \mathrm{scost}(g, \tau) - \sum_{g \in L_0(C)} \mathrm{scost}(g, \tau')\Big) \qquad (7)$$
where τ′ is the mapping obtained from τ after applying h, and f is the scaling function defined as

$$f(x) = \begin{cases} 0, & \text{if } x < 0,\\ 0.001, & \text{if } x = 0,\\ x, & \text{if } x > 0. \end{cases} \qquad (8)$$
Furthermore, scost(g, τ) is the swap cost of g = CNOT(qj , qk) with respect to τ , which is defined to be the
shortest distance between the physical qubits τ(qj) and τ(qk) in the architecture graph. Then, a probability
distribution is obtained as follows
$$P(X = h) = \frac{\mathrm{IF}(h)}{\sum \{\mathrm{IF}(h') \mid h' \in \mathrm{SWAP}_{C,\tau}\}}, \qquad (9)$$
through which a SWAP operation can be sampled from SWAP_{C,τ} and used to execute gates from C. This
simulation process is repeated Nsim times (Nsim also being a predefined parameter), and the smallest number of
inserted SWAPs is kept to obtain the best score.
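The random simulation (Eqs. 6 to 9 and Alg. 3) can be sketched in Python as follows. It is a minimal sketch, not the authors' code: helper names are hypothetical, swap costs are BFS distances on AG, and the uniform fallback when every impact factor is zero is our own safeguard, which the text does not discuss.

```python
import random
from collections import deque


def distances(n, edges):
    """All-pairs shortest-path lengths in AG (physical qubits 0..n-1), by BFS."""
    adj = {v: set() for v in range(n)}
    for u, v in (tuple(e) for e in edges):
        adj[u].add(v)
        adj[v].add(u)
    dist = {}
    for src in adj:
        d, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist[src] = d
    return dist


def execute(c, tau, edges):
    """Drop executable gates of c; return (front layer L0 of the rest, rest)."""
    rest, blocked = [], set()
    for a, b in c:
        if {a, b} & blocked or frozenset((tau[a], tau[b])) not in edges:
            rest.append((a, b))
            blocked |= {a, b}
    front, seen = [], set()
    for a, b in rest:
        if not {a, b} & seen:
            front.append((a, b))
        seen |= {a, b}
    return front, rest


def swapped(tau, vi, vj):
    """Mapping obtained from tau by applying SWAP(vi, vj)."""
    inv = {v: q for q, v in tau.items()}
    new = dict(tau)
    if vi in inv:
        new[inv[vi]] = vj
    if vj in inv:
        new[inv[vj]] = vi
    return new


def f(x):
    """Scaling function of Eq. 8."""
    return 0.0 if x < 0 else (0.001 if x == 0 else float(x))


def simulate(tau, lc, edges, dist, g_sim, n_sim, gamma, rng=random):
    """Alg. 3 (sketch): gamma**(N/2) * g_sim, with N the fewest SWAPs over n_sim runs."""
    best = float("inf")
    for _ in range(n_sim):
        t = dict(tau)
        front, c = execute(lc[:g_sim], t, edges)
        n = 0
        while c:
            q0 = {q for gate in front for q in gate}
            inv = {v: q for q, v in t.items()}
            swaps = sorted(tuple(sorted(e)) for e in edges
                           if any(inv.get(v) in q0 for v in e))  # SWAP_{C,tau}, Eq. 5
            before = sum(dist[t[a]][t[b]] for a, b in front)
            weights = []
            for vi, vj in swaps:  # impact factors, Eq. 7
                t2 = swapped(t, vi, vj)
                weights.append(f(before - sum(dist[t2[a]][t2[b]] for a, b in front)))
            if sum(weights) == 0:  # assumption: uniform fallback (not in the paper)
                weights = [1.0] * len(swaps)
            t = swapped(t, *rng.choices(swaps, weights=weights)[0])  # Eq. 9
            front, c = execute(c, t, edges)
            n += 1
        best = min(best, n)
    return gamma ** (best / 2) * g_sim  # Eq. 6


edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 4)]}
dist = distances(5, edges)
lc = [(0, 2), (0, 1), (1, 2), (2, 3)]  # Example 3 after g1 is executed
print(simulate({q: q for q in range(5)}, lc, edges, dist,
               g_sim=4, n_sim=200, gamma=0.7, rng=random.Random(0)))
```

On this state every run needs at least two SWAPs, and with 200 runs the minimum N = 2 is found with overwhelming probability, giving γ^{N/2} · Gsim = 0.7 × 4 = 2.8.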
Another issue that deserves explanation is the way we compute the simulated score (i.e., the initial value) of
state s in Eq. 6, in particular why the exponent is N/2 instead of N. The intuition is as follows. Although the
Gsim gates are executed at different steps during the simulation, for simplicity we suppose that they are all
executed exactly at the midpoint s′, the descendant of s that lies N/2 generations below it. Then the reward
collected at the transition from the father of s′ to s′ is exactly Gsim, while every edge on the path from s to the
father of s′ has zero reward. Thus, we need only backpropagate the reward collected at s′ upwards with discount
factor γ, which gives the simulated score γ^{N/2} · Gsim for s, as specified in Eq. 6. Moreover, experiments on
real benchmarks also show that this choice performs better than simply letting val(s) be the sum of all the
(discounted) rewards collected during the actual execution of these Gsim CNOT gates.
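The argument can be written out as a short derivation. It assumes, as in the simplification just described, that N/2 is a whole number k of generations, that val(s′) = 0, and that the second branch of the max in Eq. 3 is taken at every step; s_0 = s, s_1, …, s_k = s′ denotes the path from s to s′.

```latex
\begin{aligned}
\mathrm{val}(s_{k-1}) &= \gamma\,[\mathrm{rew}(s_{k-1}, s_k) + \mathrm{val}(s_k)] = \gamma\, G_{\mathrm{sim}},\\
\mathrm{val}(s_{i-1}) &= \gamma\,[0 + \mathrm{val}(s_i)] \qquad (1 \le i \le k-1),\\
\mathrm{val}(s) = \mathrm{val}(s_0) &= \gamma^{k-1} \cdot \gamma\, G_{\mathrm{sim}} = \gamma^{N/2}\, G_{\mathrm{sim}}.
\end{aligned}
```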
Backpropagation. The Backpropagation module updates the values of the ancestors of the node just simulated
in the search tree. More precisely, the value of a node s on the propagation path is updated if
val(s) < γ · [rew(s, s′) + val(s′)], (10)
where s′ is the child node of s on the path; in that case, val(s) is updated to γ · [rew(s, s′) + val(s′)]. This
reflects the intuitive meaning of val(s) discussed at the beginning of this section. The implementation is shown
in Alg. 4.
Algorithm 3: Simulate(T , s)
input : A Monte Carlo search tree T and node s = (τ, PC,LC).
begin
N ←∞;
do
C ← circuit with the first Gsim gates in LC;
n← 0; τ ′ ← τ ;
while C is not empty do
Sample h from SWAP_{C,τ′} according to the probability distribution in Eq. 9;
τ ′ ← τ ′ by applying h;
C ← C with all gates executable by τ ′ deleted;
n← n+ 1;
end
if n < N then
N ← n;
end
for Nsim times;
val(s) ← γ^{N/2} · Gsim;
end
Algorithm 4: Backpropagate(T , s)
input : A Monte Carlo search tree T and node s.
begin
while s ≠ root(T) do
s′ ← father node of s;
val(s′)← max{val(s′), γ · [rew(s′, s) + val(s)]};
s← s′;
end
end
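Alg. 4 can be sketched in Python as below. The `Node` class, with the reward of the edge from the father stored on the child, is a hypothetical representation chosen for illustration; the loop applies exactly the max-update of Alg. 4 at every ancestor up to the root.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    val: float = 0.0                 # val(s)
    rew: float = 0.0                 # rew(father(s), s): reward on the edge into s
    father: Optional["Node"] = None


def backpropagate(s: Node, gamma: float) -> None:
    """Alg. 4: val(s') <- max{val(s'), gamma * [rew(s', s) + val(s)]} up to the root."""
    while s.father is not None:
        f = s.father
        f.val = max(f.val, gamma * (s.rew + s.val))
        s = f
```

Because of the `max`, a later simulation that finds a worse continuation never lowers the value of an ancestor.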
Decision. This module, depicted in Alg. 5, decides the best move from the root node and replaces the
search tree with the subtree rooted at the corresponding best child node.
Algorithm 5: Decide(T )
input : A Monte Carlo search tree T .
begin
rt← root(T );
s← child node of rt with the highest rew(rt, s) + val(s);
T ← the subtree of T rooted at s;
end
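A minimal Python sketch of Alg. 5, assuming a hypothetical `Node` class in which each child stores the reward of the edge from its father: it picks the child maximising rew(rt, s) + val(s) and detaches it, so its subtree becomes the new search tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    val: float = 0.0                        # val(s)
    rew: float = 0.0                        # rew(father(s), s)
    father: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def decide(root: Node) -> Node:
    """Alg. 5: choose the child with the highest rew(rt, s) + val(s) as the new root."""
    best = max(root.children, key=lambda s: s.rew + s.val)
    best.father = None   # detach: the subtree rooted at `best` is the new tree
    return best
```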
3.2 Combine Everything Together
Finally, we combine all the modules as in Alg. 6 to form the MCTS framework for QCT. Note that, to
ensure the reliability of the Decision module, a sufficiently large number (Nbp, a predefined parameter) of
Selection-Expansion-Simulation-Backpropagation rounds should be performed to obtain a good estimate of the
values of the relevant states. Note also that we perform simulation on the leaf node chosen by the Selection
module, which differs from traditional MCTS algorithms, where simulation is usually performed on the newly
opened nodes. This design choice is made purely for the sake of efficiency.
Due to the stochastic nature of our algorithm, there is a negligible but nonzero probability that, at
a certain iteration of the while loop in Alg. 6, even the best child node returned by the Decision module cannot
execute any new gate. To guarantee termination in this extreme case, a fallback mechanism, which has been
widely used in the literature (cf. [6]), is adopted. Specifically, if no CNOT gate has been executed after |V|
consecutive Decisions and the current root node is (τ, PC, LC), then we choose a CNOT gate from L0(LC)
with minimum swap cost with respect to τ, and insert the corresponding SWAP gates into PC so that progress
is made by executing the chosen CNOT gate.
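The fallback step can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes the swap cost of a gate is the shortest-path distance between its mapped physical qubits minus one, that the architecture graph (an adjacency dict) is connected, and that the SWAPs move the gate's first qubit along a BFS shortest path until it is adjacent to the second.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """BFS path of physical qubits from src to dst (graph assumed connected)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def fallback(front_layer, mapping, adj):
    """Pick the front-layer CNOT with the fewest required SWAPs and list those SWAPs."""
    paths = {g: shortest_path(adj, mapping[g[0]], mapping[g[1]]) for g in front_layer}
    gate = min(front_layer, key=lambda g: len(paths[g]))
    path = paths[gate]
    # Move the first qubit along the path until it sits next to the second one.
    swaps = list(zip(path[:-2], path[1:-1]))
    return gate, swaps
```

On the line 0-1-2-3-4 with the identity mapping and front layer [(0, 4), (1, 3)], the sketch chooses (1, 3) and returns the single SWAP on edge (1, 2).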
Algorithm 6: Quantum circuit transformation based on Monte Carlo tree search
input : An architecture graph AG, a logical circuit LC, and an initial mapping τini.
output: A physical circuit satisfying the connectivity constraints in AG.
begin
PC ← the circuit consisting of all executable gates in LC under τini;
LC ← LC with gates in PC deleted;
s← (τini, PC,LC);
val(s),visit(s)← 0;
T ← a search tree with a single (root) node s;
while LC(s) ≠ ∅ do
do
s← Select(T );
Expand(T , s);
Simulate(T , s);
Backpropagate(T , s);
for Nbp times;
Decide(T );
s← root(T );
end
return PC(s)
end
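The control flow of Alg. 6 can be summarised by a small Python driver in which each module is passed in as a callable. This is a sketch under our own conventions: the callables and their signatures are hypothetical, `decide` is assumed to return the re-rooted tree (Alg. 6 instead updates T in place), and extracting PC from the final root is left to the caller.

```python
def transform(tree, is_done, select, expand, simulate, backpropagate, decide, n_bp=20):
    """Skeleton of Alg. 6: Nbp rounds of Sel-Exp-Sim-BP before each Decision."""
    while not is_done(tree):              # LC(root) is not empty
        for _ in range(n_bp):
            s = select(tree)              # leaf chosen by the Selection module
            expand(tree, s)
            simulate(tree, s)             # simulation runs on the selected leaf
            backpropagate(tree, s)
        tree = decide(tree)               # commit the best move; keep its subtree
    return tree
```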
3.3 Complexity Analysis
This subsection is devoted to a rough analysis of the complexity of our algorithm. Suppose AG = (V,E) and
the input logical circuit LC = (Q,C). Among the five main modules presented in subsection 3.1, the most
expensive ones are Selection, Expansion, and Simulation. We analyze their complexity separately as follows.
Selection. The complexity of this module depends on the depth of the search tree. In the worst case,
each of the Nbp iterations of the do loop in Alg. 6 increases the depth by 1. Taking into account the fallback
mechanism introduced in the previous subsection, the depth of the search tree is at most Nbp · |V|. As each
node has at most |E| children, the overall complexity of this module is O(Nbp · |V| · |E|).
Expansion. There are at most |E| pertinent SWAP gates available to create new nodes, and for each new
one, at most |C| gates need to be checked to see whether they are executable. Thus the time complexity is
O (|E| · |C|). Here |C| denotes the number of gates in C.
Simulation. Computing the probability distribution in Eq. 9 takes time O(|E| · |V|). To guarantee termination,
the while loop is aborted if no gate has been executed after |V| consecutive iterations, so each of the Nsim
simulations performs at most |V| · Gsim iterations. Hence, the complexity of this module is
$O(|E| \cdot |V|^2 \cdot G_{sim} \cdot N_{sim})$.
Finally, note that in the worst case, all gates in C are executed by the fallback mechanism, which is
invoked after every |V| Decisions. Hence, the Sel-Exp-Sim-BP modules are run at most |C| · |V| · Nbp
times, and the overall time complexity of our algorithm is

$$O\left(|C| \cdot |V| \cdot N_{bp} \cdot |E| \cdot \left[N_{bp} \cdot |V| + |C| + |V|^2 \cdot G_{sim} \cdot N_{sim}\right]\right), \qquad (11)$$

or $O(|C| \cdot |V| \cdot |E| \cdot (|C| + |V|^2))$ when the parameters are regarded as constants.
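To see which term of the bracket in Eq. 11 dominates in practice, the short Python helper below evaluates the three per-round terms (Selection, Expansion, Simulation), each of which is multiplied by |E|. The inputs are illustrative: |V| = 20 matches IBM Q20 and the parameters are those chosen in Section 4, but |C| = 1000 is a hypothetical circuit size. This compares worst-case bounds only, not measured costs; it is merely consistent with the decision, reported in Section 4, to implement the Simulation module in C++.

```python
def per_round_terms(num_gates, num_vertices, n_bp, g_sim, n_sim):
    """The three bracketed terms of Eq. 11 for one Sel-Exp-Sim-BP round."""
    selection = n_bp * num_vertices               # depth bound Nbp * |V|
    expansion = num_gates                         # |C| gates checked per new child
    simulation = num_vertices ** 2 * g_sim * n_sim
    return selection, expansion, simulation


print(per_round_terms(num_gates=1000, num_vertices=20, n_bp=20, g_sim=30, n_sim=500))
```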
[Diagram omitted; legend: Process, Communication.]
Figure 7: System diagram for the parallelization of the proposed algorithm.
3.4 Parallelization
In addition to the performance improvement demonstrated in Section 4, another advantage of our MCTS framework
is that it supports a multi-threaded parallel implementation, in which computational processes³ run in parallel
and share the same memory.
As depicted in Fig. 7, the main process is responsible for initialising the search tree shared among all
subprocesses, for invoking and terminating all other subprocesses, and for running the Decision module.
At each iteration of Alg. 6, the main process shares the search tree produced by the Decision module in the
previous iteration (or, in the first iteration, initialised from the algorithm input) and invokes all other
subprocesses. The Sel. subprocesses, implementing the Selection module, put the chosen leaf nodes into a queue
shared with the Exp. subprocesses. The latter, implementing the Expansion module, take nodes from this queue,
expand them, and put their child nodes into another queue shared with the Sim. subprocesses. Similar
communication occurs between the Sim. subprocesses and the BP subprocesses.
After a sufficiently large number of Backpropagations, or once the waiting time exceeds a threshold, the main
process terminates all the subprocesses and invokes the Decision module to produce a new search tree. The next
iteration then begins, and this repeats until the input circuit has been transformed completely.
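The queue-based hand-off between stages can be sketched with Python's standard `threading` and `queue` modules. This is a minimal, hypothetical illustration of the stage-to-stage communication only: the stage functions, the number of workers per stage, and the sentinel-based shutdown are our assumptions, the shared search tree is not modelled, and in CPython the global interpreter lock prevents CPU-bound stages from actually running in parallel.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit


def run_pipeline(items, stages, workers_per_stage=2):
    """Push items through a chain of stage functions, each served by its own threads."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(fn, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _STOP:
                break
            outbox.put(fn(item))

    groups = []
    for i, fn in enumerate(stages):
        group = [threading.Thread(target=worker, args=(fn, queues[i], queues[i + 1]))
                 for _ in range(workers_per_stage)]
        for t in group:
            t.start()
        groups.append(group)

    for item in items:
        queues[0].put(item)
    # Stop the stages in order: all items precede the sentinels in each FIFO queue,
    # and joining stage i before signalling stage i + 1 ensures nothing is dropped.
    for i, group in enumerate(groups):
        for _ in group:
            queues[i].put(_STOP)
        for t in group:
            t.join()

    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get())
    return results
```

With two toy stages standing in for, say, Expansion and Simulation, `sorted(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 10]))` yields `[10, 20, 30, 40, 50]`; the output order is otherwise nondeterministic.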
4 Programming and Benchmarks
To evaluate our approach, we compare it with three state-of-the-art algorithms proposed in the literature
[31, 16, 7]. As the choice of initial mapping may influence the performance of QCT algorithms, for a fair
comparison we always use the same initial mappings as the compared algorithms whenever these are available.
We use Python as our main programming language and IBM Qiskit [9] as the auxiliary environment to implement
the unparallelized version of our algorithm⁴. For efficiency, the Simulation module is implemented in C++. All
experimental results reported here are obtained by taking the best of 5 trials.
³ The term "process" in Subsection 3.4 (a computational process) differs from its usage in other parts of the paper.
⁴ Source code and detailed empirical results are available at https://github.com/BensonZhou1991/Circuit-Transformation-via-Monte-Carlo-Tree-Search
[Figure 8 plots omitted. (a) Aggregated number of added CNOT gates and running time (s) vs. Nbp, from 10 to 100;
(b) number of gates vs. γ, from 0.1 to 0.9; (c) number of gates (left axis) and running time in seconds (right axis)
vs. Gsim, from 10 to 80.]
Figure 8: Evaluation of different parameter settings: (a) Nbp, (b) γ, and (c) Gsim on IBM Q20, where the left
and right vertical axes in each sub-figure represent the aggregated number of added CNOT gates in the output
circuits (blue lines) and the running time (seconds, orange dashed lines), respectively. All data are aggregated
from a small set of benchmarks with 11 circuits and 26,676 gates in total, and initial mappings are taken from
SAHS [31].
Our MCTS-based QCT algorithm has five parameters that must be set before running: Nbp (the number of
Sel-Exp-Sim-BP rounds before each Decision), c (the exploration parameter in Eq. 4), Gsim (the size of the
sub-circuit used in simulation), Nsim (the number of simulations), and γ (the discount factor). To tune them,
we use a small set of benchmarks with 11 circuits and 26,676 CNOT gates (see Table 2) running on IBM Q20.
Fig. 8 shows how the size of the final physical circuits (left vertical axis) and the running time (right vertical
axis) depend on the parameter settings. To balance algorithm performance against running time, we empirically
set Nbp = c = 20, Gsim = 30, Nsim = 500 and γ = 0.7 for our experiments.
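For reference, the five parameters and the values chosen above can be collected in a small configuration object. The class and field names are hypothetical, and the check that γ lies strictly between 0 and 1 is our own assumption (matching the 0.1 to 0.9 range explored in Fig. 8), not a constraint stated in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MCTSParams:
    n_bp: int = 20       # Sel-Exp-Sim-BP rounds before each Decision
    c: float = 20.0      # exploration parameter in Eq. 4
    g_sim: int = 30      # size of the sub-circuit used in simulation
    n_sim: int = 500     # number of simulations
    gamma: float = 0.7   # discount factor

    def __post_init__(self):
        # Assumed valid range for a discount factor (our assumption).
        if not 0.0 < self.gamma < 1.0:
            raise ValueError("gamma must lie strictly between 0 and 1")
```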
We have shown in Subsection 3.3 that our algorithm runs in time polynomial in all relevant parameters.
To further demonstrate its running time in practice, we randomly generate two sets of 10 quantum circuits.
In one set, each circuit has 500 CNOT gates, and the number of logical qubits ranges from 5 to 14. In the
other, each circuit has 10 qubits, and the number of CNOT gates ranges from 50 to 500. We transform all
these circuits with our algorithm on the IBM Q20 architecture and record the average running time for each
circuit set. As shown in Fig. 9, the running time grows roughly as the 1.9th power of the number of qubits and
linearly in the number of CNOT gates (with a slope of about 0.15 seconds per gate), indicating that our
algorithm is scalable in practice.
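An exponent such as 1.9 can be estimated with a least-squares line through the points (log x, log t). The paper does not state how its exponent was fitted, so the Python sketch below is only one common approach, and the example uses synthetic timings, not the paper's measurements.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = k * log(x) + b; returns (k, exp(b))."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    k = cov / var
    return k, math.exp(my - k * mx)


# Synthetic timings t = 0.02 * q**1.9 for q = 5..14 qubits (illustrative only).
qubits = list(range(5, 15))
k, scale = fit_power_law(qubits, [0.02 * q ** 1.9 for q in qubits])
```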
Now we compare our algorithm with SAHS [31] on IBM Q20 on two benchmark sets, consisting of 11 and 114
circuits, respectively. The results are presented in Tables 2 and 3, where the column 'Improvement' shows the
improvement of our algorithm over SAHS in terms of the number of auxiliary CNOT gates⁵ added. Specifically,
let n_comp and n_ours be the numbers of gates added by the compared algorithm and by ours, respectively. Then
the ratio in 'Improvement' is defined as $(n_{comp} - n_{ours})/n_{comp}$. This definition of 'Improvement'
is also used in the other tables in the rest of the paper. Our algorithm achieves a substantial improvement of
41.67% and 36.68% in aggregate (Summary rows) on these two benchmark sets, respectively.
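The 'Improvement' ratio is simple to recompute from the table entries. The Python helper below (a hypothetical name) reproduces the Summary rows; returning 0.0 when the compared algorithm adds no gates is our convention, matching the 0.0% entries in Table 3.

```python
def improvement(n_comp, n_ours):
    """(n_comp - n_ours) / n_comp; 0.0 when neither algorithm adds any gate."""
    if n_comp == 0:
        return 0.0
    return (n_comp - n_ours) / n_comp


print(f"{improvement(12420, 7245):.2%}")     # Table 2 Summary row
print(f"{improvement(116487, 73758):.2%}")   # Table 3 Summary row
```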
⁵ Each added SWAP gate is decomposed into 3 CNOT gates, as shown in Fig. 3.
[Figure 9 plots omitted. (a) Running time (s) vs. number of qubits (5 to 14), with reference curve #Qubits^1.9;
(b) running time (s) vs. number of CNOT gates (50 to 500), with reference line 0.15 * Size.]
Figure 9: The average running time (seconds, orange line) vs. the number of logical qubits (a) and the number of
CNOT gates (b) of the input circuits on IBM Q20 with the naive initial mapping. Each data point is the average
over 10 randomly generated quantum circuits with 500 CNOT gates (a) or 10 qubits (b).
Table 1: Summarized results for a large benchmark set with 114 circuits and 248,553 CNOT gates in total,
where Columns 2 and 3 give the aggregated numbers of added CNOT gates obtained by the other methods and by
our algorithm (MCTS) when using their initial mappings, respectively.
Method | CNOT added (others) | CNOT added (MCTS) | Improvement
Naive SAHS | 161013 | 77544 | 51.84%
SAHS | 116487 | 73758 | 36.68%
Topg. FiDLS | 107406 | 74763 | 30.39%
Weig. FiDLS | 105645 | 75126 | 28.89%
Cambridge | 251143 | 77544 | 69.12%
We further compare our algorithm with FiDLS [16], where two techniques for constructing initial mappings,
topgraph (topg.) and weighted graph (weig.), are proposed. As shown in Table 1, our algorithm achieves a
consistent improvement: 30.39% with the topgraph mapping and 28.89% with the weighted graph mapping.
Interestingly, our algorithm still performs well even when the initial mapping is the naive one; its overhead
relative to the best result is only 5.13% (77,544 vs. 73,758). Note that many other algorithms [31, 15] rely
on specially designed techniques to construct initial mappings, which are themselves time-consuming. Our
algorithm provides more flexibility in choosing initial mappings without sacrificing much performance. Detailed
results can be found in Appendix B.
Finally, we compare our algorithm with the one in [7], called Cambridge in this paper. As their initial
mappings are not directly available, in Table 1 (last row) we only show their result (without post-mapping
optimisations), which is more than three times our result with naive initial mappings.
We also record the search depth of each call to the Selection module and calculate the minimum, average,
and maximum depth before each Decision of our algorithm. As shown in Fig. 10, the maximum depth easily
exceeds that of SAHS [31], which is set to 2. In fact, most of the time it is greater than 3, meaning that
our algorithm explores the search space more deeply. Note that [31] reports that the size of the output
physical circuits can be further decreased by increasing the search depth, at the cost of more running time.
Fig. 11 compares the output circuit size and the running time of our algorithm and SAHS on the example
circuit 'misex1_241' with 4,813 gates, where the search depth of SAHS varies from 1 to 4. It shows that our
algorithm outperforms SAHS in output circuit size even when the search depth of SAHS is set to 4. Moreover,
in this case the running time of SAHS is over 20 minutes, while our algorithm finishes in only 31 seconds.
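The per-Decision statistics plotted in Fig. 10 amount to a minimum, mean and maximum over the depths recorded by the Selections in each round. A minimal Python sketch, assuming (hypothetically) that the depths are collected as one list per Decision round:

```python
from statistics import mean


def depth_summary(depths_per_round):
    """(min, average, max) search depth over the Selections before each Decision."""
    return [(min(d), mean(d), max(d)) for d in depths_per_round]


print(depth_summary([[1, 2, 3], [2, 4]]))
```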
[Figure 10 plot omitted: search depth (0 to 8) vs. Decision round (1 to about 111); series: MCTS Min. Depth,
MCTS Ave. Depth, MCTS Max. Depth, and SAHS Depth.]
Figure 10: Search depth (for the Selections before each Decision) of the proposed algorithm for circuit ‘misex1_241’.
The horizontal and vertical axes represent Decision rounds and search depth, respectively.
[Figure 11 plot omitted: number of gates in the output circuit (left axis, 0 to 10,000) and running time in
seconds (right axis, logarithmic, 1 to 10,000) of SAHS at search depths 1 to 4.]
Figure 11: The benefit of increasing the search depth in SAHS [31] for the example circuit ‘misex1_241’.
The horizontal axis represents the search depth, and the left and right vertical axes represent the number of
gates in the output circuit and the running time (seconds), respectively. The dashed area and line represent the
number of gates and the running time of our algorithm, respectively.
5 Conclusion
In this paper, we propose an MCTS framework for the quantum circuit transformation problem, which aims at
minimizing the number of ancillary SWAP gates added to transform an ideal logical circuit into a physical one
executable on a QPU with connectivity constraints. For this purpose, a scoring mechanism (cf. Eq. 10) is
designed that takes into account both the short-term reward of introducing a SWAP gate at the current state
and a long-term value obtained by random simulation. Furthermore, when backpropagating rewards collected by
descendant states to their ancestors, a discount factor is introduced to guide the algorithm towards a shortest
path to a goal state. The MCTS algorithm is polynomial in all relevant parameters and, with five parameters, it
is very flexible: it can stop whenever a preassigned resource limit is reached while searching much deeper than
existing algorithms. Empirical results on extensive realistic circuits confirm that the MCTS algorithm indeed
searches deeply and reduces the number of CNOT gates added to the output circuits by 28.89% to 69.12% in
aggregate (Table 1) when compared with the state-of-the-art algorithms on IBM Q20.
For future work, we identify the following open problems. First, more objectives, e.g., depth, fidelity
and error rate, should be included in evaluating output physical circuits. Second, the parameters of our
MCTS algorithm are QPU-dependent, and a careful study of their correlations may provide better insight
into how to choose them in practice. Third, there is still much room to improve the implementation of our
algorithm. For example, the current strategy for choosing pertinent SWAPs in Expansion may be too strict,
and better performance may be obtained with a more relaxed strategy, e.g., the Q0 and Q1 filters used in
FiDLS [16].
Table 2: Comparison of our algorithm with SAHS [31] on IBM Q20.
Circuit name | Input CNOT | CNOT added (SAHS) | CNOT added (MCTS) | Improvement
rd84_142 | 154 | 102 | 66 | 35.29%
adr4_197 | 1498 | 711 | 414 | 41.77%
radd_250 | 1405 | 729 | 471 | 35.39%
z4_268 | 1343 | 546 | 390 | 28.57%
sym6_145 | 1701 | 744 | 339 | 54.44%
misex1_241 | 2100 | 921 | 357 | 61.24%
rd73_252 | 2319 | 1125 | 783 | 30.40%
cycle10_2_110 | 2648 | 1038 | 738 | 28.90%
square_root_7 | 3089 | 1353 | 624 | 53.88%
sqn_258 | 4459 | 1953 | 1182 | 39.48%
rd84_253 | 5960 | 3198 | 1881 | 41.18%
Summary | 26676 | 12420 | 7245 | 41.67%
6 Acknowledgments
This work was supported by the National Key R&D Program of China (Grant No. 2018YFA0306704) and the
Australian Research Council (Grant No. DP180100691).
References
[1] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C Bardin, Rami Barends, Rupak Biswas,
Sergio Boixo, Fernando GSL Brandao, David A Buell, et al. Quantum supremacy using a programmable
superconducting processor. Nature, 574(7779):505–510, 2019.
[2] Adriano Barenco, Charles H Bennett, Richard Cleve, David P DiVincenzo, Norman Margolus, Peter Shor,
Tycho Sleator, John A Smolin, and Harald Weinfurter. Elementary gates for quantum computation.
Physical Review A, 52(5):3457, 1995.
[3] Kyle EC Booth, Minh Do, J Christopher Beck, Eleanor Rieffel, Davide Venturelli, and Jeremy Frank. Comparing
and integrating constraint programming and temporal planning for quantum circuit compilation.
In Twenty-Eighth International Conference on Automated Planning and Scheduling, 2018.
[4] Cameron B Browne, Edward Powley, Daniel Whitehouse, Simon M Lucas, Peter I Cowling, Philipp Rohlfshagen,
Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton. A survey of Monte Carlo
tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43, 2012.
[5] Guillaume M JB Chaslot, Mark HM Winands, H Jaap van den Herik, Jos WHM Uiterwijk, and
Bruno Bouzy. Progressive strategies for Monte-Carlo tree search. New Mathematics and Natural Computation,
4(03):343–357, 2008.
[6] Andrew M Childs, Eddie Schoute, and Cem M Unsal. Circuit transformations for quantum architectures.
In 14th Conference on the Theory of Quantum Computation, Communication and Cryptography, 2019.
[7] Alexander Cowtan, Silas Dilkes, Ross Duncan, Alexandre Krajenbrink, Will Simmons, and Seyon Sivarajah.
On the qubit routing problem. In 14th Conference on the Theory of Quantum Computation, Communication
and Cryptography, 2019.
[8] Alexandre AA de Almeida, Gerhard W Dueck, and Alexandre CR da Silva. Finding optimal qubit permutations
for IBM’s quantum computer architectures. In Proceedings of the 32nd Symposium on Integrated
Circuits and Systems Design, pages 1–6, 2019.
[9] Gadi Aleksandrowicz et al. Qiskit: An open-source framework for quantum computing, 2019.
[10] Will Finigan, Michael Cubeddu, Thomas Lively, Johannes Flick, and Prineha Narang. Qubit allocation
for noisy intermediate-scale quantum computers. arXiv preprint arXiv:1810.08291, 2018.
[11] Thomas Häner, Damian S Steiger, Krysta Svore, and Matthias Troyer. A software methodology for
compiling quantum programs. Quantum Science and Technology, 3(2):020501, 2018.
[12] Toshinari Itoko, Rudy Raymond, Takashi Imamichi, Atsushi Matsuo, and Andrew W Cross. Quantum
circuit compilers using gate commutation rules. In Proceedings of the 24th Asia and South Pacific Design
Automation Conference, pages 191–196. ACM, 2019.
[13] Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo planning. In 15th European Conference
on Machine Learning, pages 282–293. Springer, 2006.
[14] Lingling Lao, Daniel M Manzano, Hans van Someren, Imran Ashraf, and Carmen G Almudever. Mapping
of quantum circuits onto NISQ superconducting processors. arXiv preprint arXiv:1908.04226, 2019.
[15] Gushu Li, Yufei Ding, and Yuan Xie. Tackling the qubit mapping problem for NISQ-era quantum devices.
In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming
Languages and Operating Systems, pages 1001–1014. ACM, 2019.
[16] Sanjiang Li, Xiangzhen Zhou, and Yuan Feng. Qubit mapping based on subgraph isomorphism and filtered
depth-limited search. arXiv preprint arXiv:2004.07138, 2020.
[17] Aaron Lye, Robert Wille, and Rolf Drechsler. Determining the minimal number of swap gates for multi-
dimensional nearest neighbor quantum circuits. In The 20th Asia and South Pacific Design Automation
Conference, pages 178–183. IEEE, 2015.
[18] Prakash Murali, Jonathan M Baker, Ali Javadi-Abhari, Frederic T Chong, and Margaret Martonosi.
Noise-adaptive compiler mappings for noisy intermediate-scale quantum computers. In Proceedings of
the Twenty-Fourth International Conference on Architectural Support for Programming Languages and
Operating Systems, pages 1015–1029. ACM, 2019.
[19] Michael A Nielsen and Isaac L Chuang. Quantum Computation and Quantum Information. Cambridge
University Press, 2000.
[20] Shin Nishio, Yulu Pan, Takahiko Satoh, Hideharu Amano, and Rodney Van Meter. Extracting success
from IBM’s 20-qubit machines using error-aware compilation. arXiv preprint arXiv:1903.10963, 2019.
[21] Alexandru Paler. On the influence of initial qubit placement during NISQ circuit compilation. In International
Workshop on Quantum Technology and Optimization Problems, pages 207–217. Springer, 2019.
[22] Mehdi Saeedi, Robert Wille, and Rolf Drechsler. Synthesis of quantum circuits for linear nearest neighbor
architectures. Quantum Information Processing, 10(3):355–377, 2011.
[23] Vivek V Shende, Stephen S Bullock, and Igor L Markov. Synthesis of quantum-logic circuits. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(6):1000–1010, 2006.
[24] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian
Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go
with deep neural networks and tree search. Nature, 529(7587):484, 2016.
[25] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas
Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of Go without human
knowledge. Nature, 550(7676):354–359, 2017.
[26] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Caroline Collange, and Fernando Magno Quintão
Pereira. Qubit allocation as a combination of subgraph isomorphism and token swapping. Proc. ACM
Program. Lang., 3(OOPSLA):120:1–120:29, 2019.
[27] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Sylvain Collange, and Fernando Magno Quintão
Pereira. Qubit allocation. In Proceedings of the 2018 International Symposium on Code Generation and
Optimization, pages 113–125. ACM, 2018.
[28] Davide Venturelli, Minh Do, Eleanor Rieffel, and Jeremy Frank. Compiling quantum circuits to realistic
hardware architectures using temporal planners. Quantum Science and Technology, 3(2):025004, 2018.
[29] Davide Venturelli, Minh Do, Eleanor G Rieffel, and Jeremy Frank. Temporal planning for compilation of
quantum approximate optimization circuits. In Twenty-Sixth International Joint Conference on Artificial
Intelligence, pages 4440–4446, 2017.
[30] Chi Zhang, Yanhao Chen, Yuwei Jin, Wonsun Ahn, Youtao Zhang, and Eddy Z Zhang. A depth-aware
swap insertion scheme for the qubit mapping problem. arXiv preprint arXiv:2002.07289, 2020.
[31] Xiangzhen Zhou, Sanjiang Li, and Yuan Feng. Quantum circuit transformation based on simulated annealing
and heuristic search. IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, pages 1–1, 2020.
[32] Alwin Zulehner, Alexandru Paler, and Robert Wille. An efficient methodology for mapping quantum
circuits to the IBM QX architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, 2018.
A Comparison with SAHS on the Large Benchmark Set
Table 3: Comparison of our algorithm with SAHS [31] on IBM Q20.
Circuit name | Input CNOT | CNOT added (SAHS) | CNOT added (MCTS) | Improvement
graycode6_47 5 0 0 0.0%
xor5_254 5 0 0 0.0%
ex1_226 5 0 0 0.0%
4gt11_84 9 0 0 0.0%
ex-1_166 9 0 0 0.0%
ham3_102 11 0 0 0.0%
4mod5-v0_20 10 0 0 0.0%
4mod5-v1_22 11 0 0 0.0%
mod5d1_63 13 0 0 0.0%
4gt11_83 14 0 0 0.0%
4gt11_82 18 3 3 0.0%
rd32-v0_66 16 0 0 0.0%
mod5mils_65 16 0 0 0.0%
4mod5-v0_19 16 0 0 0.0%
rd32-v1_68 16 0 0 0.0%
alu-v0_27 17 6 6 0.0%
3_17_13 17 0 0 0.0%
4mod5-v1_24 16 0 0 0.0%
alu-v1_29 17 6 6 0.0%
alu-v1_28 18 6 6 0.0%
alu-v3_35 18 6 6 0.0%
alu-v2_33 17 6 6 0.0%
alu-v4_37 18 6 6 0.0%
miller_11 23 0 0 0.0%
decod24-v0_38 23 0 0 0.0%
alu-v3_34 24 6 6 0.0%
decod24-v2_43 22 0 0 0.0%
mod5d2_64 25 12 12 0.0%
4gt13_92 30 0 0 0.0%
4gt13-v1_93 30 0 0 0.0%
one-two-three-v2_100 32 9 9 0.0%
4mod5-v1_23 32 9 9 0.0%
4mod5-v0_18 31 9 9 0.0%
one-two-three-v3_101 32 6 6 0.0%
4mod5-bdd_287 31 6 6 0.0%
decod24-bdd_294 32 15 15 0.0%
4gt5_75 38 15 15 0.0%
alu-v0_26 38 9 9 0.0%
rd32_270 36 12 12 0.0%
alu-bdd_288 38 24 15 37.5%
decod24-v1_41 38 15 12 20.0%
4gt5_76 46 15 15 0.0%
4gt13_91 49 15 15 0.0%
4gt13_90 53 27 21 22.2%
alu-v4_36 51 15 15 0.0%
4gt5_77 58 9 9 0.0%
one-two-three-v1_99 59 12 12 0.0%
rd53_138 60 27 27 0.0%
one-two-three-v0_98 65 24 24 0.0%
4gt10-v1_81 66 27 21 22.2%
decod24-v3_45 64 15 15 0.0%
aj-e11_165 69 18 18 0.0%
4mod7-v0_94 72 12 12 0.0%
alu-v2_32 72 15 15 0.0%
4mod7-v1_96 72 18 18 0.0%
cnt3-5_179 85 15 9 40.0%
mod10_176 78 24 24 0.0%
4gt4-v0_80 79 24 24 0.0%
4gt12-v0_88 86 21 21 0.0%
0410184_169 104 12 12 0.0%
4_49_16 99 36 27 25.0%
4gt12-v1_89 100 24 15 37.5%
4gt4-v0_79 105 12 9 25.0%
hwb4_49 107 33 30 9.1%
4gt4-v0_78 109 15 15 0.0%
mod10_171 108 24 24 0.0%
4gt12-v0_87 112 6 6 0.0%
4gt12-v0_86 116 9 9 0.0%
4gt4-v0_72 113 42 18 57.1%
4gt4-v1_74 119 78 30 61.5%
mini-alu_167 126 33 33 0.0%
one-two-three-v0_97 128 66 39 40.9%
rd53_135 134 54 54 0.0%
ham7_104 149 81 60 25.9%
decod24-enable_126 149 87 54 37.9%
mod8-10_178 152 21 21 0.0%
4gt4-v0_73 179 42 42 0.0%
ex3_229 175 18 18 0.0%
mod8-10_177 196 39 36 7.7%
alu-v2_31 198 54 48 11.1%
C17_204 205 96 60 37.5%
rd53_131 200 90 51 43.3%
alu-v2_30 223 45 45 0.0%
mod5adder_127 239 51 51 0.0%
rd53_133 256 105 84 20.0%
majority_239 267 84 60 28.6%
ex2_227 275 96 78 18.7%
cm82a_208 283 84 84 0.0%
sf_276 336 24 24 0.0%
sf_274 336 24 24 0.0%
con1_216 415 192 102 46.9%
rd53_130 448 171 132 22.8%
f2_232 525 213 105 50.7%
rd53_251 564 204 147 27.9%
hwb5_53 598 174 159 8.6%
radd_250 1405 669 441 34.1%
rd73_252 2319 1065 627 41.1%
cycle10_2_110 2648 1296 759 41.4%
hwb6_56 2952 1104 813 26.4%
cm85a_209 4986 2337 1278 45.3%
rd84_253 5960 3246 1950 39.9%
root_255 7493 3525 2082 40.9%
mlp4_245 8232 4116 2670 35.1%
urf2_277 10066 5934 4470 24.7%
sym9_148 9408 2172 1146 47.2%
hwb7_59 10681 4602 2769 39.8%
clip_206 14772 6843 4596 32.8%
sym9_193 15232 6441 4104 36.3%
dist_223 16624 6936 4719 32.0%
sao2_257 16864 7827 4515 42.3%
urf5_280 23764 13065 8721 33.2%
urf1_278 26692 15678 11040 29.6%
sym10_262 28084 11697 6636 43.3%
hwb8_113 30372 14976 8127 45.7%
Summary 248553 116487 73758 36.7%
B More Initial Mappings
Table 4: Comparison of our algorithm with FiDLS [16] under various initial mappings, including Naive,
Topgraph (Topg.) [16] and Weighted graph (Weig.) [16], on IBM Q20.
Circuit name | Input CNOT | CNOT added (Naive, MCTS) | CNOT added (Topg., FiDLS) | CNOT added (Topg., MCTS) | CNOT added (Weig., FiDLS) | CNOT added (Weig., MCTS)
graycode6_47 5 9 0 0 0 0
xor5_254 5 15 0 0 0 0
ex1_226 5 15 0 0 0 0
4gt11_84 9 12 0 0 0 0
ex-1_166 9 6 0 0 0 0
ham3_102 11 6 0 0 0 0
4mod5-v0_20 10 9 0 0 0 0
4mod5-v1_22 11 9 0 0 0 0
mod5d1_63 13 12 0 0 0 0
4gt11_83 14 12 0 0 0 0
4gt11_82 18 15 3 3 3 3
rd32-v0_66 16 12 0 0 0 0
mod5mils_65 16 18 0 0 0 0
4mod5-v0_19 16 15 0 0 0 0
rd32-v1_68 16 12 0 0 0 0
alu-v0_27 17 15 9 6 6 9
3_17_13 17 6 0 0 0 0
4mod5-v1_24 16 15 0 0 0 0
alu-v1_29 17 18 9 6 6 9
alu-v1_28 18 15 9 6 6 9
alu-v3_35 18 15 9 6 6 9
alu-v2_33 17 15 9 6 6 6
alu-v4_37 18 15 9 6 6 9
miller_11 23 6 0 0 0 0
decod24-v0_38 23 12 0 0 0 0
alu-v3_34 24 15 9 6 6 12
decod24-v2_43 22 12 0 0 0 0
mod5d2_64 25 21 18 9 9 12
4gt13_92 30 21 0 0 0 0
4gt13-v1_93 30 24 0 0 0 0
one-two-three-v2_100 32 24 9 9 9 9
4mod5-v1_23 32 24 9 15 21 9
4mod5-v0_18 31 24 9 12 12 9
one-two-three-v3_101 32 27 15 9 18 9
4mod5-bdd_287 31 21 6 12 18 6
decod24-bdd_294 32 21 15 15 27 12
4gt5_75 38 24 9 9 12 15
alu-v0_26 38 27 12 9 9 12
rd32_270 36 27 18 18 24 12
alu-bdd_288 38 27 24 15 15 24
decod24-v1_41 38 30 3 15 21 3
4gt5_76 46 27 21 24 36 15
4gt13_91 49 27 6 6 6 6
4gt13_90 53 30 9 9 9 9
alu-v4_36 51 30 6 21 15 6
4gt5_77 58 24 9 18 18 9
one-two-three-v1_99 59 30 24 27 33 15
rd53_138 60 33 30 30 42 24
one-two-three-v0_98 65 39 18 27 27 18
4gt10-v1_81 66 33 15 18 18 15
decod24-v3_45 64 39 15 24 24 15
aj-e11_165 69 33 33 21 36 18
4mod7-v0_94 72 42 12 27 27 12
alu-v2_32 72 27 15 21 45 15
4mod7-v1_96 72 33 21 21 21 27
cnt3-5_179 85 63 3 15 30 3
mod10_176 78 36 15 24 24 15
4gt4-v0_80 79 48 15 12 39 21
4gt12-v0_88 86 48 21 12 15 18
0410184_169 104 72 6 24 27 6
4_49_16 99 42 18 30 30 21
4gt12-v1_89 100 36 57 12 18 27
4gt4-v0_79 105 54 12 9 12 9
hwb4_49 107 42 36 39 42 30
4gt4-v0_78 109 60 15 15 15 15
mod10_171 108 42 39 27 27 33
4gt12-v0_87 112 60 6 6 6 6
4gt12-v0_86 116 63 9 9 9 9
4gt4-v0_72 113 60 45 36 39 27
4gt4-v1_74 119 72 39 27 27 24
mini-alu_167 126 48 30 33 33 30
one-two-three-v0_97 128 57 42 51 78 36
rd53_135 134 81 60 66 84 60
ham7_104 149 51 48 39 42 36
decod24-enable_126 149 69 66 51 63 42
mod8-10_178 152 33 69 24 33 48
4gt4-v0_73 179 66 99 42 42 87
ex3_229 175 42 24 24 81 15
mod8-10_177 196 57 123 42 78 54
alu-v2_31 198 81 78 60 90 57
C17_204 205 81 111 72 84 63
rd53_131 200 78 63 75 78 33
alu-v2_30 223 84 60 57 54 51
mod5adder_127 239 69 84 51 81 81
rd53_133 256 84 60 90 174 39
majority_239 267 96 66 84 105 93
ex2_227 275 93 78 90 108 72
cm82a_208 283 93 117 78 105 60
sf_276 336 60 36 24 36 24
sf_274 336 45 30 24 36 69
con1_216 415 138 273 111 153 162
rd53_130 448 126 267 144 207 129
f2_232 525 114 336 111 126 117
rd53_251 564 159 201 144 195 165
hwb5_53 598 174 207 171 204 156
radd_250 1405 453 633 474 567 435
rd73_252 2319 813 1062 630 852 825
cycle10_2_110 2648 891 1125 954 1290 912
hwb6_56 2952 825 1077 783 1026 834
cm85a_209 4986 1479 2073 1449 2091 1140
rd84_253 5960 1959 2952 1995 2841 2055
root_255 7493 2442 2928 2196 3099 2013
mlp4_245 8232 2811 4275 2838 4146 2844
urf2_277 10066 4551 6285 4605 6267 4560
sym9_148 9408 1206 1947 1311 2166 1131
hwb7_59 10681 2826 3684 2904 3846 2805
clip_206 14772 4773 6762 4677 6834 4764
sym9_193 15232 3783 5481 3855 6462 4020
dist_223 16624 4560 7470 4764 6582 4857
sao2_257 16864 4899 7596 4386 6756 4749
urf5_280 23764 8721 11988 8946 11679 8916
urf1_278 26692 10824 13872 10749 13809 10548
sym10_262 28084 7500 11490 6633 10623 7284
hwb8_113 30372 8226 11295 8073 11382 7989
Summary 248553 77544 107406 74763 105645 75126
C Depth Analysis
[Figure 12 plots omitted: one panel per circuit (adr4_197, radd_250, z4_268, sym6_145, misex1_241, rd73_252,
cycle10_2_110, square_root_7, sqn_258, rd84_253), each showing search depth against Decision round.]
Figure 12: Search depths on the small benchmark set, with parameters set as described in Section 4. The
horizontal and vertical axes represent Decision rounds and search depth, respectively. The green, orange
and blue lines represent, respectively, the maximum, average and minimum depth among all Selections before
each Decision.
