Timing and resource-aware mapping of quantum circuits to superconducting
  processors by Lao, Lingling et al.
Lingling Lao,1, 2, 3 Hans van Someren,1, 2 Imran Ashraf,1, 2 and Carmen G. Almudever1, 2
1QuTech, Delft University of Technology, The Netherlands
2Department of Quantum and Computer Engineering, Delft University of Technology, The Netherlands
3Department of Physics and Astronomy, University College London, UK
Quantum algorithms need to be compiled to respect the constraints imposed by quantum proces-
sors, which is known as the mapping problem. The mapping procedure will result in an increase of
the number of gates and of the circuit latency, decreasing the algorithm’s success rate. It is crucial
to minimize mapping overhead, especially for Noisy Intermediate-Scale Quantum (NISQ) processors
that have relatively short qubit coherence times and high gate error rates. Most prior mapping
algorithms have only considered constraints such as the primitive gate set and qubit connectivity,
but the actual gate duration and the restrictions imposed by the use of shared classical control
electronics have not been taken into account. In this paper, we present a timing and resource-aware mapper
called Qmap to make quantum circuits executable on a scalable superconducting processor named
Surface-17 with the objective of achieving the shortest circuit latency. In particular, we propose
an approach to formulate the classical control restrictions as resource constraints in a conventional
list scheduler with polynomial complexity. Furthermore, we implement a routing heuristic to cope
with the connectivity limitation. This router finds a set of movement operations that minimally
extends circuit latency. To analyze the mapping overhead and evaluate the performance of different
mappers, we map 56 quantum benchmarks onto Surface-17. Compared to a prior mapping strategy
that minimizes the number of operations, Qmap can reduce the latency overhead by up to 47.3% and
the operation overhead by up to 28.6%.
I. INTRODUCTION
Quantum computing is entering the Noisy
Intermediate-Scale Quantum (NISQ) era [1]. This
refers to exploiting quantum processors consisting
of only 50 to a few hundreds of noisy qubits - i.e.
qubits with a relatively short coherence time and faulty
operations. Due to the limited number of qubits,
little or no quantum error correction (QEC) will be
used in the coming years, posing a limitation
on the size of the quantum applications that can be
successfully run on NISQ processors. Nevertheless,
these processors will still be useful to explore quantum
physics, and implement small quantum algorithms that
will hopefully demonstrate quantum advantage [2]. For
running quantum applications on NISQ devices, it is
thus crucial to minimize their size in terms of circuit
width (number of qubits), number of gates, and circuit
latency/depth (number of cycles/steps). In addition,
these quantum applications have to be adapted to the
hardware constraints imposed by quantum processors.
The main constraints include:
• Primitive gate set: Generally, only a limited set
of quantum gates that can be realized with rela-
tively high fidelity will be predefined on a quantum
device. Each quantum technology may support a
specific universal set of single-qubit and two-qubit
gates, which are called primitive gates. Different
primitive gates may have different gate durations.
For instance, some superconducting quantum tech-
nologies have CZ as a primitive two-qubit gate of
which the duration is twice as long as of a single-
qubit primitive gate [3].
• Qubit connectivity: quantum technologies such
as superconducting qubits [4–6] and quantum
dots [7, 8] arrange their qubits in 1D/2D archi-
tectures with nearest-neighbour (NN) interactions.
This means that only neighbouring qubits can in-
teract or in other words, qubits are required to be
adjacent for performing a two-qubit gate.
• Classical control: classical electronics are re-
quired for controlling and operating the qubits. Us-
ing a dedicated instrument per qubit is not scal-
able and is a very expensive approach. Therefore,
shared control is required especially when building
scalable quantum processors. For instance, a single
Arbitrary Waveform Generator (AWG) is used for
operating on a group of qubits and several qubits
are measured through the same feedline [9, 10].
All these constraints may vary between different quan-
tum processors, and quantum circuits normally cannot be
directly executable on these devices. A mapping proce-
dure is required to transform a hardware-agnostic quan-
tum circuit into a constraint-compliant one that can be
realized on a given device. This mapping process i) de-
composes any quantum gate into the supported primitive
gates; ii) performs an initial placement of qubits and finds
the set of movement operations to route non-NN qubits
to adjacent positions when they need to interact; and iii)
schedules operations to leverage the maximum available
parallelism. Moreover, minimizing mapping overhead in
terms of the number of gates and circuit execution time
(latency) is critical for implementing quantum algorithms
on NISQ processors.
arXiv:1908.04226v2 [quant-ph] 2 Jul 2020
Different solutions including both exact algorithms and
heuristics have been proposed to map quantum circuits
onto NISQ processors. Refs. [11–15] propose mapping approaches
for a 2D grid qubit architecture with NN interactions.
Other works [6, 16–26] target current quantum
processors from IBM and Rigetti, which have irregular
qubit connections. Most prior works [6, 11–23]
mainly consider the qubit connectivity and the primitive
gate set constraints and their strategies focus on min-
imizing gate overhead. They assume that any opera-
tion takes one time-step without taking the actual gate
duration into account. Moreover, they do not consider
the shared classical control electronics, which restricts
the parallelism of some operations. This means the out-
put circuits from previous mapping algorithms need to
be further scheduled by another hardware-aware trans-
lation phase such as OpenPulse from IBM [10] so that
quantum operations can be performed on real qubits with
correct timing without violating any classical control con-
straint [10, 27]. Venturelli et al. [24–26] consider gate
duration and crosstalk constraints, but their mathemati-
cal optimization formulation has exponential complexity.
This paper presents a timing and resource-aware map-
per called Qmap to make quantum circuits executable
on the Surface-17 superconducting processor [27]. Dif-
ferent modules are developed in Qmap to comply with
the hardware constraints, including common restrictions
such as primitive gate set and qubit connectivity, as well
as other hardware parameters such as actual gate du-
ration and classical control constraints which have not
been addressed in prior works. Qmap is embedded in the
OpenQL compiler [28] and its output circuit is described
by an executable low-level QASM-like code with precise
timing information. In order to analyze the impact of
the mapping procedure, we compile 56 benchmarks taken
from RevLib [29] and QLib [30] onto the Surface-17 pro-
cessor. Compared to the original circuit characteristics
before mapping, the evaluation results show that the circuit
latency and the number of operations after mapping
can increase by up to 260% and 78.1%, respectively.
The main contributions of this paper are the following:
• We provide a comprehensive analysis of the hard-
ware constraints of the Surface-17 processor, in-
cluding the supported primitive gates with corre-
sponding duration, the processor’s topology that
limits the qubit connectivity, and the classical con-
trol constraints resulting from the shared control
electronics among qubits.
• We develop a Qmap mapper embedded in the
OpenQL compiler [28] to compile a quantum cir-
cuit into one that complies with all the above con-
straints of Surface-17. Specifically, we propose an
approach to formulate the classical control limita-
tions as resource constraints in a conventional list
scheduling algorithm. Its objective is to achieve
the shortest circuit latency and therefore the high-
est gate-level parallelism with respect to these con-
straints. The complexity of the developed schedul-
ing heuristic is polynomial in terms of the number
of operations and resources, which is applicable to
large-scale circuits.
• For coping with the limited qubit connectivity, we
present a routing strategy in Qmap to move qubits
that need to interact to be adjacent. The proposed
router not only finds shortest paths that use the fewest
operations for moving qubits (which is
the routing strategy developed in prior works) but
also selects a set of movement operations that will
minimally extend the overall circuit latency. Com-
pared to a prior mapping strategy, the average re-
duction of latency overhead and the average reduc-
tion of gate overhead when using Qmap are 22%
and 3.0%, respectively.
• To enable a flexible implementation, we provide a
method to encode all hardware characteristics in a
configuration file that is accessed by every module
of the compiler. This flexibility also allows a com-
parative analysis of the mapping impacts of differ-
ent characteristics, giving some directions for build-
ing future quantum devices. In addition, it allows
the mapper to target different processors.
• Qmap uses not only SWAP operations (3 consec-
utive CNOTs) for moving qubits but also MOVE
operations (2 consecutive CNOTs) when possible.
Compared to mapping that uses only SWAPs, as in
prior works, the use of MOVEs helps to reduce
the number of gates and the circuit latency by up to
38.9% and 29%, respectively.
The rest of this paper is organized as follows. We first
describe all the hardware parameters that will be con-
sidered in this work in Section II. Then we introduce
the proposed resource-constrained scheduling algorithm
in Section III and other modules of the developed mapper
such as the routing heuristic in Section IV. Afterwards,
we evaluate this mapping strategy in Section V and sum-
marize related works in Section VI. Finally, Section VII
concludes the paper and discusses future work.
II. QUANTUM HARDWARE CONSTRAINTS
In this section, the hardware constraints of the Surface-
17 superconducting processor will be briefly introduced,
including the primitive gates that can be directly per-
formed, the topology of the processor which limits in-
teractions between qubits, and the constraints caused by
the classical control electronics which impose extra limi-
tations on the parallelism of the operations.
A. Primitive gate set
In order to run any quantum circuit, a universal set
of operations needs to be implemented. In supercon-
ducting quantum processors, these operations commonly
are measurement, single-qubit rotations, and multi-qubit
gates.
In principle, any kind of single-qubit rotation can be
performed on the Surface-17 processor. However, an infinite
number of gates cannot be predefined. In this work,
we will limit single-qubit gates to X and Y rotations
(easier to implement), and more specifically rotations of ±45, ±90
and ±180 degrees will be used in our decomposition.
The primitive two-qubit gate on this processor is
the conditional-phase (CZ) gate. Table I shows the gate
duration (gate execution time) of single-qubit gates, CZ
gate and measurement (in the Z basis) [31]. After map-
ping, the output circuit will only contain operations that
belong to this primitive gate set. The decomposition for
Z,H, S, S†, T, T †, CNOT, SWAP and MOVE gates into
these primitive gates is shown in Figure 1 (ignoring the
global phase).
TABLE I: The gate duration in cycles (each cycle
represents 20 nanoseconds) of the primitive gates in the
Surface-17 processor.

Gate type              Duration
RX(±45, ±90, ±180)     1 cycle
RY(±45, ±90, ±180)     1 cycle
CZ                     2 cycles
MZ                     15 cycles
Z ≡ X Y
H ≡ Y−90 Z ≡ Z Y+90 ≡ X Y−90
T ≡ H X+45 H ≡ Y+90 X+45 Y−90
T† ≡ H X−45 H ≡ Y+90 X−45 Y−90
S ≡ H X+90 H ≡ Y+90 X+90 Y−90
S† ≡ H X−90 H ≡ Y+90 X−90 Y−90
CNOT(c, t) ≡ Y−90 on t, CZ(c, t), Y+90 on t
SWAP(a, b) ≡ CNOT(a, b), CNOT(b, a), CNOT(a, b)
MOVE (Umv), taking |ψ〉|0〉 on qubits (a, b) to |0〉|ψ〉 ≡ CNOT(a, b), CNOT(b, a)
FIG. 1: Gate decomposition into primitives supported
in the superconducting Surface-17 processor. Umv is the
MOVE operation.
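
As a rough illustration of how Table I and Figure 1 combine, the sketch below sums primitive gate durations for the decomposed CNOT, SWAP and MOVE operations. The gate labels ("rx", "ry", "cz", "measz") and the helper name are assumptions made for this example, and the totals assume strictly serial execution, so they are only an upper bound: a scheduler may overlap single-qubit gates that act on different qubits.

```python
# Durations from Table I, in cycles; one cycle is 20 ns.
CYCLE_NS = 20
DURATION = {"rx": 1, "ry": 1, "cz": 2, "measz": 15}

# Decompositions from Figure 1, written as lists of primitive gate kinds.
CNOT = ["ry", "cz", "ry"]  # Y-90 on target, CZ, Y+90 on target
SWAP = CNOT * 3            # three consecutive CNOTs
MOVE = CNOT * 2            # two consecutive CNOTs


def serial_cycles(gates):
    """Total cycles if the gates run strictly one after another."""
    return sum(DURATION[g] for g in gates)


for name, seq in [("CNOT", CNOT), ("SWAP", SWAP), ("MOVE", MOVE)]:
    cycles = serial_cycles(seq)
    print(f"{name}: {cycles} cycles = {cycles * CYCLE_NS} ns")
```

Under these assumptions a MOVE saves one CNOT (four cycles) relative to a SWAP, which is the motivation for using it whenever the destination qubit is known to be in |0〉.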
B. Processor topology
FIG. 2: Schematic of the realization of the Surface-17
superconducting processor (qubits are numbered 0 to 16).
Figure 2 shows the topology of the Surface-17 proces-
sor, where nodes represent the qubits and edges repre-
sent the connections (resonators) between them. Two-
qubit gates can only be performed between connected
qubits, i.e., nearest-neighbouring qubits. This implies
that qubits that have to interact but are not placed in
neighbouring positions will need to be moved to be ad-
jacent. Quantum states in superconducting technology
are usually moved using SWAP gates. A SWAP gate
is implemented by three CNOTs that in the case of the
Surface-17 processor need to be further decomposed into
CZ and RY gates as shown in Figure 1. In this work, we
also consider the use of a MOVE operation which only
requires two CNOTs (see Figure 1). Note that a MOVE
operation requires that the destination qubit where the
quantum state needs to be moved to, is in the |0〉 state.
As mentioned, moving qubits results in an overhead in
terms of number of operations and circuit depth, which
in turn will decrease the circuit reliability. Therefore, an
efficient routing procedure is required to find the series of
movement operations to enable all two-qubit gates with
minimum overhead.
C. Classical control constraints
In principle, any qubit in a processor can be operated
individually and then any combination of independent
single-qubit and two-qubit operations can be performed
in parallel. However, scalable quantum processors use
classical control electronics with channels that are shared
among several qubits. Here we will describe the con-
straints imposed by the classical control electronics used
in the Surface-17 processor and how they affect the par-
allelism of quantum operations.
a. Single-qubit gates: Single-qubit gates on super-
conducting qubits are performed by using microwave
pulses. In Surface-17, these pulses are applied at a few
fixed specific frequencies to ensure scalability and pre-
cise control. The three frequencies used in Surface-17
are shown in Figure 2: single-qubit gates on red, blue
and pink colored qubits are performed at frequencies f1,
f2, and f3, respectively [27]. In this work, we assume
that same-frequency qubits are operated by the same mi-
crowave source or arbitrary waveform generator (AWG)
and a vector switch matrix (VSM) is used for distribut-
ing the control pulses modulated on the waves to the
corresponding qubits [9].
The consequence of this is that one can perform the
same single-qubit gate on all or some of the qubits
that share a frequency, but one cannot perform differ-
ent single-qubit gates at the same time on these qubits
(as these would require other pulses to be generated). For
instance, an X gate can be performed simultaneously on
any of the pink qubits (7, 8 and 9) but not an X and a
Y operation.
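
The single-qubit restriction can be pictured as a small compatibility check. The sketch below is only an illustration: the table `AWG_OF_QUBIT` and the function name are hypothetical, and only the pink group (qubits 7, 8 and 9) is taken from the text.

```python
# Hypothetical frequency-group table; only the pink group (7, 8, 9)
# is taken from the text, the label "f3" follows the colour assignment.
AWG_OF_QUBIT = {7: "f3", 8: "f3", 9: "f3"}


def single_qubit_gates_compatible(gates):
    """gates: list of (gate_name, qubit) that should start in the same cycle.

    Returns True if no AWG is asked to generate two different pulses.
    """
    pulse_on_awg = {}
    for name, qubit in gates:
        awg = AWG_OF_QUBIT[qubit]
        # setdefault stores the first gate seen on this AWG and returns it.
        if pulse_on_awg.setdefault(awg, name) != name:
            return False
    return True


print(single_qubit_gates_compatible([("x", 7), ("x", 8), ("x", 9)]))
print(single_qubit_gates_compatible([("x", 7), ("y", 8)]))
```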
b. Measurement: Measuring the qubits is done by
using feedlines each of which is coupled to multiple
qubits [27]. In Figure 2, qubits in the same dashed rect-
angle are using the same feedline, e.g., qubits 13 and 16
will be measured through the same feedline. Because
measurement takes several steps in sequence, measure-
ment of a qubit cannot start when another qubit coupled
to the same feedline is being measured, but any combi-
nation of qubits that are coupled to the same feedline
can be measured simultaneously at a given time. For
instance, qubits 13 and 16 can be measured at time t0,
but it is not possible to start measuring qubit 13 at time
t0 and then measure qubit 16 at time t1 if the previous
measurement has not finished.
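
A minimal sketch of this feedline rule follows, assuming time only moves forward and that one feedline object serves qubits 13 and 16. The class and attribute names are hypothetical; the 15-cycle measurement duration is taken from Table I.

```python
MEAS_CYCLES = 15  # measurement duration from Table I


class Feedline:
    """Tracks the one measurement window currently occupying a feedline."""

    def __init__(self):
        self.busy_from = None   # first occupied cycle
        self.busy_until = None  # first cycle at which the feedline is free

    def can_start(self, t):
        idle = self.busy_until is None or t >= self.busy_until
        # A new measurement may join only if it starts in the same cycle.
        return idle or t == self.busy_from

    def start(self, t):
        if not self.can_start(t):
            raise ValueError(f"feedline busy at cycle {t}")
        self.busy_from, self.busy_until = t, t + MEAS_CYCLES


feedline = Feedline()                # assumed to serve qubits 13 and 16
feedline.start(0)                    # measure qubit 13 at cycle 0
same_time = feedline.can_start(0)    # qubit 16 in the same cycle
too_late = feedline.can_start(1)     # qubit 16 one cycle later
after_done = feedline.can_start(15)  # qubit 16 once the first one finished
print(same_time, too_late, after_done)
```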
c. Two-qubit gates: As mentioned, in the processor
of Figure 2 each qubit belongs to one of three frequency
groups f1 > f2 > f3, colored red, blue and pink, re-
spectively; links between neighbouring qubits are either
between qubits from f1 and f2, or between qubits from
f2 and f3, i.e. between a higher frequency qubit and a
next lower one. In between, additional frequencies are defined:
$f_1 > f_1^{\mathrm{int}} > f_2 > f_2^{\mathrm{park}} > f_2^{\mathrm{int}} > f_3 > f_3^{\mathrm{park}}$
(see the frequency arrangement and the example interactions
presented in Figure 5 of [27]); each qubit can be individually
driven with one of the frequencies of its group (e.g. group
$\{f_2, f_2^{\mathrm{int}}, f_2^{\mathrm{park}}\}$). A CZ gate between two neighbouring
qubits is realized by lowering the frequency of the higher
frequency qubit near to the frequency of the lower one.
For instance, a CZ gate between qubits 3 and 0 is performed
by detuning qubit 3 from $f_1$ to $f_1^{\mathrm{int}}$, which is near
to the frequency $f_2$ of qubit 0. However, CZ gates will
occur between any two neighbouring qubits which have
close frequencies and share a connection. For example, a
CZ gate can occur between the detuned qubit 3 in $f_1^{\mathrm{int}}$
and its neighbour qubit 6 in $f_2$ in the above example. To
avoid this, the qubits that should not be involved in a CZ
gate must be kept out of the way. In this example, qubit 6
needs to be detuned to a lower parking frequency, $f_2^{\mathrm{park}}$.
Note that qubits in parking frequencies cannot engage
in any two-qubit or single-qubit gate. In addition, when
performing a CZ on qubits 3 and 0, qubit 2 must stay
at f1 (and not be detuned) to avoid interaction between
qubits 2 and 0. The implementation of two-qubit gates
poses limitations not only on parallelizing multiple two-
qubit gates but also on the parallelism of two-qubit gates
and single-qubit gates. More details can be found in [27].
Violation of these classical control constraints will
cause incorrect execution of quantum operations, leading
to a computational failure. Therefore, scheduling algo-
rithms that can take these constraints into account are
needed to explore the maximum available parallelism.
D. Configuration file
The hardware characteristics explained in this section
are precisely described in a configuration file (in JSON
format). It parameterizes the mapping modules that will
be introduced in the following sections.
a. Primitive gate set: For Surface-17, the primitive
gates with all attributes including duration as listed in
Table I and the gate decomposition rules corresponding
to those in Figure 1 are described in full detail in the
configuration file.
b. Processor topology: The topology is defined by
describing each connection with its source and target
qubits. In Surface-17, all edges are bidirectional, e.g.,
both CNOT(qa, qb) and CNOT(qb, qa) can be performed
on edge e(qa, qb). Qubits and directed qubit connections
are both named by integer values taken from contiguous
ranges of integer numbers starting from 0. As an exam-
ple, the qubit numbering of the Surface-17 processor is
shown in Figure 2; in the Surface-17 topology the number
of directed qubit connections is 48.
c. Classical control constraints: For single-qubit
gates, we use a look-up table Tg1 to describe the avail-
able AWGs and the list of corresponding qubits that each
AWG controls. Similarly for measurement, the feed-
lines (three feedlines in Surface-17) and the correspond-
ing qubits that each feedline is coupled to are described
in a look-up table Tgm in the configuration file. The
AWGs and feedlines are both named by contiguous inte-
ger numbers starting from 0. As mentioned in Section II,
it is assumed that three AWGs and three feedlines are
used in Surface-17, that is, |Tg1| = 3 and |Tgm| = 3,
respectively. The classical control constraints of two-
qubit gates are defined by using two look-up tables.
One, called Tg2f, describes for each connection which other
connections cannot be used to execute CZ gates in parallel
(24 bidirectional edges on the Surface-17 topology,
i.e. |Tg2f| = 48). The other table, Tg2d, describes for each
connection which set of qubits needs to be detuned in
addition to one of its end-points, which means a CZ on
this connection and single-qubit gates on these detuned
qubits cannot be performed in parallel (|Tg2d| = 48).
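
To make the role of these tables concrete, the sketch below builds a small fragment of such a configuration as a Python dictionary and serializes it to JSON. All key names, connection ids and the helper functions are hypothetical and do not reproduce the actual OpenQL configuration schema. Only entries stated in the text are filled in: the pink AWG group (7, 8, 9), a feedline shared by qubits 13 and 16, and the CZ example on qubits 3 and 0, which requires qubit 6 to be detuned and excludes a simultaneous CZ on the connection between qubits 2 and 0.

```python
import json

# Hypothetical layout: key names and ids are illustrative only.
config = {
    "cycle_time_ns": 20,
    "gate_duration_cycles": {"rx": 1, "ry": 1, "cz": 2, "measz": 15},
    # Directed connections, each named by an integer id (ids made up here).
    "topology": [
        {"id": 0, "src": 3, "dst": 0},
        {"id": 1, "src": 0, "dst": 3},
        {"id": 2, "src": 2, "dst": 0},
        {"id": 3, "src": 0, "dst": 2},
    ],
    "Tg1": {"0": [7, 8, 9]},             # AWG id -> qubits (partial)
    "Tgm": {"0": [13, 16]},              # feedline id -> qubits (partial)
    "Tg2f": {"0": [2, 3], "1": [2, 3]},  # connection -> excluded connections
    "Tg2d": {"0": [6], "1": [6]},        # connection -> extra detuned qubits
}


def edge_id(cfg, src, dst):
    """Look up the id of the directed connection src -> dst."""
    for edge in cfg["topology"]:
        if (edge["src"], edge["dst"]) == (src, dst):
            return str(edge["id"])
    raise KeyError((src, dst))


def detuned_qubits(cfg, src, dst):
    """Qubits that cannot run single-qubit gates during CZ(src, dst)."""
    return cfg["Tg2d"][edge_id(cfg, src, dst)]


restored = json.loads(json.dumps(config))
print(detuned_qubits(restored, 3, 0))
```

A full Surface-17 description would list all 48 directed connections and complete every table; the fragment only shows how a scheduler could query them.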
III. RESOURCE-CONSTRAINED SCHEDULING
Some current quantum technologies such as supercon-
ducting qubits and quantum dots have relatively short
coherence times, limiting the size of circuits that can be
run successfully with high fidelity. It is therefore necessary
to minimize the execution time of the circuit (or
makespan, or circuit latency) and explore the highest
gate-level parallelism, which is the objective of a quan-
tum gate scheduler. Before discussing the other map-
ping modules, we first introduce the proposed heuris-
tic scheduling algorithm that can take the actual gate
duration and classical control constraints into account.
The circuit shown in Figure 3 will be used as an exam-
ple. We refer to the qubits in the quantum circuit as
virtual qubits (others call them program qubits or logi-
cal qubits). These need to be mapped to the qubits in
the quantum processor, called physical, real or hardware
qubits or locations.
A. Weighted dependency graph
As mentioned previously, precise timing is essential for
correctly executing quantum applications on real qubits.
Therefore, a scheduler that considers gate duration is required
to efficiently generate the correct instruction sequences
with timing information while minimizing
the circuit execution time. Prior works [6, 11–23] do not
consider the actual gate duration, assuming any operation
takes one time-step. To ensure quantum operations
can be executed at the correct time, their output circuits
need to be further scheduled by some other low-level
hardware-aware units such as OpenPulse [10]. In con-
trast, the scheduling algorithm developed in the Qmap
mapper will directly take gate duration into account.
Similar to classical scheduling, a Quantum Operation
Dependency Graph (QODG) G(VG, EG) is constructed
from the QASM representation of a quantum circuit, in
which each operation is denoted by a node vi ∈ VG, and
the data dependency between two operations vi and vj is
represented by a directed edge e(vi, vj) ∈ EG with weight
wi that represents the duration of operation vi. Pseudo
source and sink nodes are added to the start and end to
simplify starting and stopping iteration over the graph.
The QODG of the circuit in Figure 3a is shown in Figure
4a. In previous works that do not consider gate duration,
only unweighted directed graphs are constructed, which cannot be
directly applied to this work.
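
The construction can be sketched in a few lines of Python. This is a simplified illustration rather than the OpenQL implementation: the circuit format, the gate labels and the function name are assumptions, the pseudo source and sink nodes are omitted, and any two consecutive gates on the same qubit are treated as dependent.

```python
DURATION = {"rx": 1, "ry": 1, "cz": 2, "measz": 15}  # cycles, from Table I


def build_qodg(circuit):
    """circuit: list of (gate_name, qubits) in program order.

    Returns {(i, j): weight}, where i and j index into the circuit and
    the weight is the duration of the predecessor gate i.
    """
    last_on_qubit = {}
    edges = {}
    for j, (_, qubits) in enumerate(circuit):
        for q in qubits:
            if q in last_on_qubit:
                i = last_on_qubit[q]
                edges[(i, j)] = DURATION[circuit[i][0]]
            last_on_qubit[q] = j
    return edges


circuit = [("ry", (1,)), ("cz", (0, 1)), ("ry", (1,)), ("measz", (0,))]
edges = build_qodg(circuit)
print(edges)
```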
B. Formulation of resource constraints
Furthermore, the scheduler also needs to adhere to the
parallelism restrictions imposed by the shared classical
control electronics as described in Section II. In this work,
these classical control constraints are treated as resource
constraints in an otherwise conventional critical path list-
scheduler implementation [32]. A so-called machine state
S is defined to describe the occupation status of each re-
source ri ∈ R, where R represents the set of all resources
in Tg1, Tgm, Tg2f, and Tg2d. The constraints for single-qubit
gates and measurement are implemented by using
|Tg1| and |Tgm| resource states, respectively. To support
the two-qubit gate constraints, there is a resource state for
each connection (to constrain mutual CZ concurrency)
and a resource state per qubit (to constrain CZ versus
single-qubit gate concurrency). Specifically, a resource
state consists of two elements: the operation type that
is using this resource and the occupation period, which is
described by a pair of cycle times ([t0, t1)), representing
the first cycle that it is occupied and the first cycle that
it is free again, respectively. If an operation v is scheduled
at cycle t0 (v.cycle = t0), then all the resources for
performing v (v.resources) will be occupied till (and not
including) t1 = t0 + v.duration (v.duration is the duration
of v).
FIG. 3: An example circuit consisting of 6 qubits and
15 gates. (a) Its circuit description (top) and its virtual
to physical qubits mapping (bottom) for the Surface-17
processor after initial placement; (b) Its cQASM
representation without scheduling.
FIG. 4: (a) The QODG of the circuit in Figure 3a. The
red and purple boxed CNOTs have qubits that are not
NN. (b) The parallel cQASM code of the mapped
circuit, where each line represents one cycle and
operations in the same line (inside one bracket) are
scheduled to start at the same cycle. The CZ gates in
bold are already nearest neighbouring. Movement
operations (added two-qubit gates are in yellow) are
inserted to perform the CZ gates in red and purple.
Algorithm 1 Forward Scheduling algorithm
Input: Non-scheduled circuit
Input: Configuration file with gate durations and resource descriptions R
Output: Scheduled circuit
1: Generate QODG G(VG, EG) from circuit
2: Initialize ∀v ∈ VG : v.resources ⊂ R and v.duration
3: Vm ← Unique pseudo source node
4: Vav ← All available gates in G(VG − Vm, EG)
5: Initialize cycle t ← 0
6: Initialize machine-state S ← ∀r ∈ R is free
7: while Vav ≠ ∅ do
8:   Vr ← resource-free gates ⊂ Vav based on S
9:   if Vr ≠ ∅ then
10:    Vc ← Most-critical gates ⊂ Vr in G(VG − Vm, EG)
11:    Select v ∈ Vc which is first in the circuit
12:    Add v to Vm
13:    v.cycle ← t
14:    Update S with v.resources occupied at [t, t + v.duration)
15:    Vav ← All available gates in G(VG − Vm, EG)
16:  else
17:    t ← t + 1
18:  end if
19: end while
C. Scheduling heuristic
Algorithm 1 shows the pseudo code of this algorithm,
which schedules all the gates of a given circuit with re-
spect to the resource constraints. Its objective is to
achieve the shortest circuit latency.
The heuristic maintains two sets of gates: Vm holds
the gates that have been scheduled, and Vav includes the
gates that are available for scheduling. A gate v is avail-
able when all predecessors p of v in G have been sched-
uled, that is, ∀p, p is in Vm. Furthermore, it maintains
a machine-state S consisting of all resource states as de-
scribed above.
Algorithm 1 first generates a QODG for the input
circuit and initializes Vm, Vav, and S (lines 1-6). Af-
ter finding all the available gates at current cycle t, it
selects the ones that can be scheduled at cycle t and
collects them in Vr (line 8). A gate v ∈ Vav can
be scheduled at cycle t only if it is resource-free at
t: when its predecessors have finished execution, i.e.,
for every predecessor p of v, p.cycle + p.duration ≤ t (this data dependency
constraint can be seen as a qubit resource constraint); and
when all resources in v.resources are not occupied for all
cycles in [t, t + v.duration). The worst-case time complexity
of this step is O(min(g, n) · (n + |R|)), where n and g are
the number of qubits and operations in the input circuit,
respectively (in the worst case, gates on every qubit can
be scheduled).
If Vr is not empty, the heuristic selects the first most-
critical gate v in this set (lines 9-11). A most-critical gate
in Vr is the one that has the longest path to the pseudo
sink node of the QODG G. In this work, the length of the
longest path is pre-computed for each node in G, which
only takes linear time. Then it adds this gate v to Vm,
assigns the current cycle attribute to v.cycle, updates S
by reserving all the resources of v (v.resources) for its
execution duration, and updates Vav given that v has
been scheduled now and thus some more gates may have
become available (lines 12-15). In this case, cycle t is not
incremented because more gates may be scheduled in the
same cycle.
If Vr is empty, the heuristic increments t (line 17) and
continues the schedule loop again until all the gates are
scheduled, that is, Vav is empty. In the worst case, this
loop needs to be repeated O(L) times, where L is the product
of the total number of operations (g) in the
given circuit and the longest gate duration in cycles.
Resource-constrained scheduling is NP-hard in the strong
sense [33]. Previous works that use exact optimization
approaches or exhaustive search algorithms for
scheduling [12, 17, 18, 24] cannot be adapted to efficiently
solve this problem. In contrast, the proposed scheduling
algorithm has a complexity of at most
$O_{\mathrm{schedule}} = O(\min(g, n) \cdot (n + |R|) \cdot g)$.
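
The loop of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the OpenQL scheduler: the circuit format, gate labels and resource names ("awg_f1", "awg_f3", "edge_3_0") are hypothetical, each resource keeps only its latest occupation, and an occupied resource may be shared only by a gate of the same type starting in the same cycle (the rule described for same-frequency single-qubit gates and same-feedline measurements). The per-qubit detuning resources of CZ gates are not modelled.

```python
from collections import defaultdict

DURATION = {"x": 1, "y": 1, "cz": 2, "measz": 15}  # cycles


def schedule(circuit):
    """circuit: list of (name, qubits, resources). Returns {gate index: cycle}."""
    n = len(circuit)
    preds, succs, last = defaultdict(set), defaultdict(set), {}
    for j, (_, qubits, _) in enumerate(circuit):
        for q in qubits:
            if q in last:
                preds[j].add(last[q])
                succs[last[q]].add(j)
            last[q] = j
    # Criticality: longest remaining path in cycles, computed in reverse order.
    crit = [0] * n
    for i in reversed(range(n)):
        tail = max((crit[s] for s in succs[i]), default=0)
        crit[i] = DURATION[circuit[i][0]] + tail

    state = {}  # resource -> (operation type, t0, t1)
    start = {}  # scheduled gates (the set Vm) with their start cycle

    def ready(i, t):
        name, _, resources = circuit[i]
        if any(start[p] + DURATION[circuit[p][0]] > t for p in preds[i]):
            return False  # a predecessor is still executing
        for r in resources:
            if r in state:
                occ_name, t0, t1 = state[r]
                if t < t1 and not (occ_name == name and t0 == t):
                    return False  # resource occupied by an incompatible gate
        return True

    t = 0
    while len(start) < n:
        available = [i for i in range(n)
                     if i not in start and all(p in start for p in preds[i])]
        runnable = [i for i in available if ready(i, t)]
        if runnable:
            # Most critical first; ties go to the gate that is first in the circuit.
            v = max(runnable, key=lambda i: (crit[i], -i))
            start[v] = t
            for r in circuit[v][2]:
                state[r] = (circuit[v][0], t, t + DURATION[circuit[v][0]])
        else:
            t += 1
    return start


circuit = [
    ("x", (7,), ("awg_f3",)),
    ("x", (8,), ("awg_f3",)),
    ("y", (9,), ("awg_f3",)),
    ("cz", (3, 0), ("edge_3_0",)),
    ("y", (3,), ("awg_f1",)),
]
print(schedule(circuit))
```

In this toy run the two X gates share the AWG in cycle 0, the Y gate on qubit 9 has to wait one cycle because it needs a different pulse from the same AWG, and the Y gate on qubit 3 starts only after the two-cycle CZ has finished.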
IV. MAPPING QUANTUM ALGORITHMS
Mapping means to transform the original hardware-
agnostic quantum circuit that describes the quantum al-
gorithm to an equivalent one that can be executed on the
target quantum processor. To this purpose, the mapping
process has to be aware of the constraints imposed by
the physical implementation of the quantum processor.
These include the set of primitive gates that is supported,
the allowed qubit interactions that are determined by the
processor topology, and the limited concurrency of multi-
gate execution because of classical control constraints.
Mapping will likely increase the number of operations
that are required to implement the given algorithm as
well as the circuit latency/depth, decreasing the reliabil-
ity of the algorithm. Efficient algorithms that can mini-
mize this mapping overhead are then necessary, especially
in NISQ processors where noise sets a limit on the maxi-
mum size of a computation that can be run successfully.
A. Overview of the Qmap mapper
The Qmap mapper developed in this work is embed-
ded in the OpenQL compiler [28] and its design flow is
shown in Figure 5. The input of Qmap is a quantum cir-
cuit written in OpenQL (C++ or Python). The OpenQL
compiler reads and parses it to a QASM-level interme-
diate representation. Qmap then performs the mapping
and optimization of the quantum circuit based on the
processor characteristics provided in a configuration file
as described in Section II. This approach allows
Qmap to target different quantum devices by just
changing the parameters in the configuration file. After
mapping, QASM-like code is generated. Currently, the
OpenQL compiler is capable of generating cQASM [34]
that can be executed on the QX simulator [35] as well
as eQASM [36], a QASM-like executable code that can
target the Surface-17 processor. The generation of other
QASM-like languages will be part of future extensions
of the OpenQL compiler. The modules of Qmap will be
discussed in the rest of this section.
[Figure 5: flow diagram. Input circuits (quantum algorithms written in OpenQL) enter the Qmap mapper, whose modules are initial placement, routing with local scheduling, global scheduling and optimization. The mapper reads a configuration file (elementary gates with duration, gate decomposition, chip topology, classical control constraints) and produces executable QASM code (cQASM, eQASM).]
FIG. 5: Overview of the Qmap mapper embedded in
the OpenQL compiler.
B. Initial placement
It is preferable to place highly interacting qubits next
to each other so that fewer movement operations will
be added for performing two-qubit gates. Similar to
the placement approaches in [37], the initial placement
problem in this work is formulated as a quadratic as-
signment problem (QAP) and the objective is to mini-
mize the movement or communication overhead, which
is modeled by the distance between interacting qubits
minus 1. Qmap tries to find an initial placement with
minimum communication overhead by using the Integer
Linear Programming (ILP) algorithm presented in [38].
Such an initial placement implementation can only solve
small-scale problems in reasonable time. Even though for
near-term implementations these numbers largely suffice,
for large-scale circuits, one can either partition a large
circuit into several smaller ones or apply heuristic algo-
rithms to efficiently solve these models [39]. Other works
also solve this initial placement problem by using a Sat-
isfiability Modulo Theories (SMT) solver [40].
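
The placement objective can be illustrated with a brute-force search on a toy device. The sketch below is not the ILP formulation of [38]: it assumes a hypothetical 1D chain of four physical qubits, enumerates all placements, and weights each pair's overhead (distance minus 1) by the number of two-qubit gates on that pair, which is an assumption made for this example.

```python
from itertools import permutations


def distance(a, b):
    # Hypothetical 1D chain of physical qubits: distance is the hop count.
    return abs(a - b)


def communication_overhead(placement, interactions):
    """placement: virtual -> physical qubit.

    interactions: {(v1, v2): number of two-qubit gates on that pair}.
    """
    return sum(count * (distance(placement[u], placement[v]) - 1)
               for (u, v), count in interactions.items())


def best_placement(num_virtual, physical, interactions):
    """Exhaustive search; only feasible for very small problems."""
    best = None
    for perm in permutations(physical, num_virtual):
        placement = dict(enumerate(perm))
        cost = communication_overhead(placement, interactions)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


interactions = {(0, 1): 3, (1, 2): 2, (0, 2): 1}
cost, placement = best_placement(3, range(4), interactions)
print(cost, placement)
```

The factorial growth of this enumeration is one way to see why exact placement only scales to small circuits.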
C. Qubit Router
It is unlikely that an initial placement can be found in which all
qubit pairs on which a two-qubit gate needs to be performed
are placed in neighboring positions. Therefore,
qubits will have to be moved during computation.
For instance, based on the initial placement of qubits
shown in Figure 3a, the first 6 CNOT gates of the circuit
can be performed directly as qubits are NN, but qubits
in the last 2 CNOT gates will need to be routed to ad-
jacent positions. Routing refers to the task of finding
a series of movement operations that enables the execu-
tion of two-qubit gates on a given processor topology with
low communication overhead. To do so, multiple routing
paths are evaluated and one is selected based on various
optimization criteria such as the number of added move-
ment operations, increase of circuit depth, or decrease of
circuit reliability [6, 16–22, 24, 40, 41]. Afterwards, the
corresponding movement operations are inserted.
Algorithm 2 Forward Routing algorithm
Input: Non-routed circuit, VP-map M
Input: Configuration file with topology and constraints
Output: Routed circuit
1: Generate QODG G(VG, EG)
2: Vm ← Unique pseudo source node
3: Vav ← All available gates in G(VG − Vm, EG)
4: while Vav ≠ ∅ do
5:   Vnn ← All single-qubit and NN two-qubit gates in Vav
6:   if Vnn ≠ ∅ then
7:     Select the first most-critical gate v ∈ Vnn
8:   else
9:     Vc ← Most-critical gates ⊂ Vav in G(VG − Vm, EG)
10:    Select v ∈ Vc which is first in the circuit
11:    Insert movement(s) for v
12:    Update M
13:  end if
14:  Map v according to M
15:  Add v to Vm
16:  Vav ← All available gates in G(VG − Vm, EG)
17: end while
1. Routing heuristic
In this work, after the ILP-based initial placement, a
heuristic algorithm is used to perform this routing task.
It is a scheduler-based heuristic of which the objective
is to minimize overall circuit latency. Algorithm 2 shows
the pseudo code of the proposed routing algorithm, which
finds all two-qubit gates in which qubits are not nearest-
neighbours and inserts the required movement operations
to make them adjacent. As mentioned in Section II
we use SWAPs as well as MOVE operations for moving
qubits.
The router algorithm starts by mapping the pseudo
source node and then selecting all available gates (Vav)
from the generated QODG (lines 1-3). Then it finds all
the single-qubit gates and the two-qubit gates whose
qubits are NN in Vav; these gates are collected in Vnn
(line 5). If Vnn is not empty, then all gates in this set
are mapped directly and a new set of available gates is
computed (lines 6, 7, and 13-15). Mapping a (NN) gate
implies replacing virtual qubit operands by their physical
counterparts according to the VP-map table M, similar
to the one shown in Figure 3a, and decomposing this gate
to its primitives when the configuration specifies so.
After that, only non-NN two-qubit gates remain in the
available set. The router selects the ones which are most
critical in the remaining dependency graph G since they
have the highest likelihood to extend the circuit when
mapped in an inefficient way or when delayed (line 9).
When there are several equally critical gates, the routing
heuristic chooses the first one in the input circuit (line
10) and finds a set of movement operations to bring these
two qubits to adjacent positions. After the movement set
selection, the router schedules the SWAP/MOVE opera-
tions into the circuit (line 11), updates the VP-map (line
12), recomputes the set of available gates (line 15), and
runs the routing heuristic until all the gates are mapped.
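The control flow of Algorithm 2 can be sketched in Python as
follows. This is a simplified illustration under several
assumptions that are not part of Qmap: dependencies are derived
only from shared qubits (no commutation), criticality is the
length of the longest remaining dependency chain with unit gate
durations, and the movement step simply SWAPs the first operand
along one shortest path (no MOVEs, no latency evaluation). All
names are hypothetical.

```python
from collections import deque

def shortest_path(adj, src, dst):
    """One shortest path from src to dst (BFS), as a list of physical qubits."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]

def route(circuit, v2p, adj):
    """Forward routing in the spirit of Algorithm 2.

    circuit: list of (name, virtual_qubits) in program order.
    v2p:     dict virtual -> physical qubit (the VP-map M); updated in place.
    adj:     dict physical qubit -> set of neighbouring physical qubits.
    """
    n = len(circuit)
    # Dependency graph: a gate depends on the previous gate on each of its qubits.
    preds = [set() for _ in range(n)]
    succs = [set() for _ in range(n)]
    last = {}
    for i, (_, qubits) in enumerate(circuit):
        for q in qubits:
            if q in last:
                preds[i].add(last[q])
                succs[last[q]].add(i)
            last[q] = i
    # Criticality: longest chain of dependent gates starting at the gate.
    crit = [1] * n
    for i in reversed(range(n)):
        for s in succs[i]:
            crit[i] = max(crit[i], 1 + crit[s])

    def is_nn(i):
        qubits = circuit[i][1]
        return len(qubits) == 1 or v2p[qubits[1]] in adj[v2p[qubits[0]]]

    mapped, routed = set(), []
    while len(mapped) < n:
        available = [i for i in range(n)
                     if i not in mapped and preds[i] <= mapped]
        executable = [i for i in available if is_nn(i)]
        pool = executable or available
        # Most critical first; ties broken by position in the circuit.
        g = min(pool, key=lambda i: (-crit[i], i))
        if not executable:
            a, b = circuit[g][1]
            path = shortest_path(adj, v2p[a], v2p[b])
            p2v = {p: v for v, p in v2p.items()}
            # SWAP operand a along the path until it neighbours operand b.
            for here, nxt in zip(path[:-2], path[1:-1]):
                routed.append(("swap", (here, nxt)))
                va, vb = p2v.get(here), p2v.get(nxt)
                p2v[here], p2v[nxt] = vb, va
            v2p.clear()
            v2p.update({v: p for p, v in p2v.items() if v is not None})
        name, qubits = circuit[g]
        routed.append((name, tuple(v2p[q] for q in qubits)))
        mapped.add(g)
    return routed

# Hypothetical 4-qubit line 0-1-2-3 with an identity initial placement.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
v2p = {0: 0, 1: 1, 2: 2, 3: 3}
circuit = [("h", (0,)), ("cnot", (0, 1)), ("cnot", (0, 3))]
routed = route(circuit, v2p, adj)
print(routed)
```

In this example the last CNOT is not NN, so two SWAPs are
inserted to bring virtual qubit 0 next to virtual qubit 3
before the gate is mapped.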
2. Movement set selection
For finding a set of movement operations for a non-
NN two-qubit gate, all shortest paths between these two
qubits are considered. During Qmap initialization time,
the distance (i.e. the length of the shortest path) between
each pair of qubits has been computed using the Floyd-
Warshall algorithm. Finding all shortest paths between
qubits at mapping-time is done by a breadth-first search
(BFS), that is, selecting only path extensions which de-
crease the distance between the qubits. For each short-
est path, there may exist several movement sets since
qubits can meet in any neighboring position within the
path. Note that all movement sets would lead to adding
an equal minimum number of movements to the circuit.
In a $\sqrt{N} \times \sqrt{N}$ grid architecture, the total number of
shortest paths between the two most remote nodes $(q_i, q_j)$ is
$O(4^{\sqrt{N}})$ and the number of movement sets for each path
is $(2\sqrt{N} - 2)$.
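The enumeration described above can be illustrated with a small
Python sketch. It computes all distances with Floyd-Warshall,
enumerates every shortest path by only following neighbours that
are closer to the destination (written recursively here rather
than as an explicit BFS), and lists the movement sets of a path
as all ways of splitting the required hops between the two
qubits. The 3x3 grid and all function names are assumptions for
illustration; each movement is represented as a (from, to) pair
of physical positions.

```python
from math import comb

def grid(k):
    """Edges and adjacency lists of a k x k grid (hypothetical topology)."""
    edges = []
    for r in range(k):
        for c in range(k):
            q = r * k + c
            if c + 1 < k:
                edges.append((q, q + 1))
            if r + 1 < k:
                edges.append((q, q + k))
    adj = {q: [] for q in range(k * k)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return edges, adj

def floyd_warshall(n, edges):
    """All-pairs hop distances."""
    inf = float("inf")
    dist = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for a, b in edges:
        dist[a][b] = dist[b][a] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def all_shortest_paths(src, dst, adj, dist):
    """Extend partial paths only with neighbours that are closer to dst."""
    if src == dst:
        return [[dst]]
    paths = []
    for nxt in adj[src]:
        if dist[nxt][dst] == dist[src][dst] - 1:
            paths += [[src] + rest
                      for rest in all_shortest_paths(nxt, dst, adj, dist)]
    return paths

def movement_sets(path):
    """All ways to make the two end points of a path adjacent.

    The qubit at path[0] takes i hops forward and the qubit at path[-1]
    takes the remaining hops backward, for every possible split i.
    """
    hops = len(path) - 2  # total hops needed to make the qubits adjacent
    sets = []
    for i in range(hops + 1):
        forward = list(zip(path[:i], path[1:i + 1]))
        backward = [(path[-1 - k], path[-2 - k]) for k in range(hops - i)]
        sets.append(forward + backward)
    return sets

edges, adj = grid(3)                      # N = 9 qubits
dist = floyd_warshall(9, edges)
paths = all_shortest_paths(0, 8, adj, dist)
print(len(paths), [len(movement_sets(p)) for p in paths])
```

For the two opposite corners of the 3x3 grid this yields
$\binom{4}{2} = 6$ shortest paths and $2\sqrt{N} - 2 = 4$
movement sets per path, each containing the same number of
movements.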
In this work, a set of movement operations that min-
imally extends the circuit latency is selected and sched-
uled into the circuit. As shown in Algorithm 3, this
router evaluates all movement sets by looking back to
the previously mapped gates (lines 1 and 2) and inter-
leaving each set of movements with those gates using the
proposed resource-constrained scheduling heuristic (Sec-
tion III) in an as-soon-as-possible (ASAP) policy (line
4). It selects the one(s) which minimally extend(s) the
circuit latency (lines 6 and 7). When there are multiple
minimal-cost sets, a random one is taken. The complexity
of this routing strategy is
$O(g\sqrt{n}\,4^{\sqrt{n}}) \cdot O_{\mathrm{schedule}}$.
Algorithm 3 Movement selection algorithm
Input: QODG G(VG, EG), gate v, VP-map M
Input: Configuration file with topology and constraints
Output: The set of movements for v
1: P ← All shortest paths for v
2: MVP ← All possible sets of movements based on P
3: for mvj in MVP do
4:   Interleave mvj with previous gates (looking back)
5:   Lmvj ← circuit's latency extension by mvj
6: end for
7: if Lmvi = min(⋃j Lmvj) then
8:   Select mvi as the set of movements, picking a random
     minimum one when there are more
9: end if
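The selection step of Algorithm 3 can be sketched in Python as
below. This is a minimal illustration under stated assumptions:
the ASAP scheduler only tracks when each qubit becomes free and
ignores the classical control (resource) constraints that the
actual scheduling heuristic takes into account, and the gate
durations (1 cycle for `x`, 6 cycles for `swap`) are
hypothetical values. Function names are likewise hypothetical.

```python
import random

def asap_latency(gates, durations):
    """Latency in cycles of an ASAP schedule limited only by qubit dependencies."""
    free_at = {}
    for name, qubits in gates:
        start = max((free_at.get(q, 0) for q in qubits), default=0)
        for q in qubits:
            free_at[q] = start + durations[name]
    return max(free_at.values(), default=0)

def select_movement_set(mapped_gates, candidates, durations, rng=random):
    """Pick a candidate that minimally extends the latency of the mapped circuit."""
    base = asap_latency(mapped_gates, durations)
    extension = [asap_latency(mapped_gates + mv, durations) - base
                 for mv in candidates]
    best = min(extension)
    minimal = [mv for mv, ext in zip(candidates, extension) if ext == best]
    # Several candidates may be equally good; one of them is taken at random.
    return rng.choice(minimal), best

durations = {"x": 1, "swap": 6}           # hypothetical durations in cycles
# Previously mapped gates keep physical qubit 0 busy for three cycles.
mapped_gates = [("x", (0,)), ("x", (0,)), ("x", (0,))]
# Three movement sets for a gate between the qubits at positions 0 and 3
# of a line 0-1-2-3.
candidates = [
    [("swap", (0, 1)), ("swap", (1, 2))],   # move only the qubit at 0
    [("swap", (0, 1)), ("swap", (3, 2))],   # both qubits move one hop
    [("swap", (3, 2)), ("swap", (2, 1))],   # move only the qubit at 3
]
chosen, extension = select_movement_set(mapped_gates, candidates, durations)
print(chosen, extension)
```

All three candidates insert two SWAPs, but the second one lets
the two SWAPs run in parallel and therefore extends the
schedule the least in this toy example.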
D. Global scheduling
After routing, the circuit adheres to the processor
topology constraint for two-qubit interactions and has
been scheduled in an as-soon-as-possible (ASAP) way.
The global scheduler reschedules the routed circuit to
achieve the shortest circuit latency and the highest
instruction-level parallelism. It does this in an as-late-as-
possible (ALAP) way to minimize the required life-time
and thus the decoherence error of each qubit. The global
scheduler employs a backward version of Algorithm 1,
i.e. it traverses the circuit starting from the sink node,
working backwards through the circuit and decrementing
t.
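The difference between the two policies can be illustrated with
the following Python sketch. It is a simplification that only
respects qubit dependencies and ignores the resource
constraints handled by the actual scheduler; the gate durations
and function names are assumptions for illustration.

```python
def asap(gates, durations):
    """Start cycle of each gate when scheduled as soon as possible."""
    free_at, start = {}, []
    for name, qubits in gates:
        t = max((free_at.get(q, 0) for q in qubits), default=0)
        start.append(t)
        for q in qubits:
            free_at[q] = t + durations[name]
    return start

def alap(gates, durations):
    """Start cycle of each gate when scheduled as late as possible,
    without exceeding the latency of the ASAP schedule."""
    starts = asap(gates, durations)
    latency = max((s + durations[name]
                   for s, (name, _) in zip(starts, gates)), default=0)
    needed_by, start = {}, [0] * len(gates)
    # Traverse the circuit backwards, starting from the sink.
    for i in reversed(range(len(gates))):
        name, qubits = gates[i]
        end = min((needed_by.get(q, latency) for q in qubits), default=latency)
        start[i] = end - durations[name]
        for q in qubits:
            needed_by[q] = start[i]
    return start

durations = {"h": 1, "x": 1, "cz": 2}     # hypothetical durations in cycles
gates = [("h", (0,)), ("x", (1,)), ("x", (1,)), ("x", (1,)), ("cz", (0, 1))]
print(asap(gates, durations))
print(alap(gates, durations))
```

Both schedules have the same latency of 5 cycles, but in the
ALAP schedule the H gate on qubit 0 starts at cycle 2 instead
of cycle 0, so that qubit spends less time idle after its first
operation.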
E. Decomposition and optimization
Starting from a quantum circuit described in cQASM
format (see Figure 3), the circuit is also decomposed
during mapping into one which only contains the prim-
itive gates specified in the configuration file, on top of
adherence to the other constraints. A circuit optimiza-
tion module is also implemented to reduce the number
of gates, e.g., two consecutive X gates can cancel each
other out.
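A gate-cancellation pass of this kind can be sketched as
follows. The sketch is an assumption about one simple way to
do it, not the OpenQL implementation: it removes pairs of
identical self-inverse gates that act back to back on exactly
the same qubits, and repeats the check so that nested pairs
also cancel. The gate names and the `SELF_INVERSE` set are
hypothetical.

```python
SELF_INVERSE = {"x", "y", "z", "h", "cnot", "cz"}

def cancel_adjacent_inverses(gates):
    """Remove back-to-back pairs of identical self-inverse gates."""
    kept = []        # gates kept so far; cancelled ones become None
    history = {}     # qubit -> stack of indices into `kept`
    for gate in gates:
        name, qubits = gate
        # Index of the last surviving gate on each operand qubit.
        prev = {history[q][-1] if history.get(q) else None for q in qubits}
        if name in SELF_INVERSE and len(prev) == 1:
            (i,) = prev
            if i is not None and kept[i] == gate:
                kept[i] = None
                for q in qubits:
                    history[q].pop()
                continue
        for q in qubits:
            history.setdefault(q, []).append(len(kept))
        kept.append(gate)
    return [g for g in kept if g is not None]

circuit = [("x", (0,)), ("h", (1,)), ("x", (0,)), ("cnot", (0, 1))]
print(cancel_adjacent_inverses(circuit))
```

The two X gates on qubit 0 cancel even though an H gate on a
different qubit appears between them in the gate list, because
only the order of gates on the same qubit matters here.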
The decomposition and optimization can be done at
every step of the mapping procedure, i.e. before, dur-
ing, and after routing. Qmap reduces sequences of single
qubit gates to their minimally required sequence both
before and after routing. Whether decomposition is ap-
plied at a mapping step is specified in the configuration
file. The implementation of the QODG represents the
commutability of not only all gates with disjoint qubit
operands but also the known two-qubit operations CNOT
and CZ with overlapping operands, and optimizes their
order during both routing and global scheduling.
The version of the circuit in Figure 3a mapped by the
Qmap mapper is shown in Figure 4b. It is described
in cQASM code with precise timing information, that is,
which operations can be issued at each cycle. The output
circuit can also be represented by eQASM code [36] that
can be directly read by the quantum microarchitecture
in [42].
V. QMAP EVALUATION
In this section, we evaluate Qmap by mapping a set of
benchmarks from RevLib [29] and QLib [30] on the su-
perconducting processor Surface-17 that has a distance-3
surface-code topology [27]. All the hardware constraints
discussed in Section II, including the primitive gates with
their real gate duration, the topology and the electronic
control constraints are taken into account. The mapping
experiments are executed on a server with 2 Intel Xeon
E5-2683 CPUs (56 logical cores) and 377GB memory.
The Operating System is CentOS 7.5 with Linux kernel
version 3.10 and GCC version 4.8.5.
A. Benchmarks
The circuit characteristics of the used benchmarks are
shown in Table II. All circuits have been decomposed into
ones which only consist of gates from the universal set
{Pauli, S, S†, T, T †, H, CNOT}. In these benchmarks,
the number of qubits varies from 3 to 16, the number of
gates goes from 5 to 64283, and the percentage of CNOT
gates varies from 2.8% to 100%. Moreover, the minimum
circuit depth and the minimum circuit latency are also
included, ranging from 2 to 35572 time-steps and from
5 to 122564 cycles (using the gate duration of
Surface-17), respectively. Note that these numbers are meant
to characterize the algorithms without considering the
processor topology and classical control constraints.
The latter two parameters are defined as follows:
Circuit depth is the length of the circuit. It is equiva-
lent to the total number of time-steps for executing the
circuit assuming each of the gates takes one time-step.
Circuit latency refers to the execution time of the cir-
cuit considering the real gate duration. Latency and gate
duration are expressed in cycles. In this paper, we as-
sume that a cycle takes 20 nanoseconds.
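The two definitions differ only in the duration assigned to
each gate, as the following Python sketch shows. The gate
durations used here (1 cycle for a single-qubit gate, 4 cycles
for a CNOT) are assumptions for illustration, and dependencies
are derived from shared qubits only.

```python
def schedule_length(gates, duration_of):
    """Length of an ASAP schedule in which gates on a shared qubit run in order."""
    free_at = {}
    for name, qubits in gates:
        start = max(free_at.get(q, 0) for q in qubits)
        for q in qubits:
            free_at[q] = start + duration_of(name)
    return max(free_at.values(), default=0)

def circuit_depth(gates):
    """Number of time-steps when every gate takes one time-step."""
    return schedule_length(gates, lambda name: 1)

def circuit_latency(gates, durations):
    """Execution time in cycles using per-gate durations."""
    return schedule_length(gates, lambda name: durations[name])

CYCLE_NS = 20                              # one cycle is 20 ns in this paper
durations = {"h": 1, "cnot": 4}            # hypothetical durations in cycles
# A chain of five CNOTs in which consecutive gates share a qubit.
chain = [("cnot", (i + 1, i)) for i in range(5)]
depth = circuit_depth(chain)
latency = circuit_latency(chain, durations)
print(depth, latency, latency * CYCLE_NS, "ns")
```

With these assumed durations the chain has depth 5 and a
latency of 20 cycles, i.e. 400 ns.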
In order to generate quantum circuits which are ex-
ecutable on real processors, extra movement operations
need to be added and gate parallelism will be compro-
mised. Other parameters after mapping these bench-
marks to the Surface-17 processor are obtained, such as
the number of inserted SWAP and MOVE operations and
the CPU time the mapping process takes. We analyze
the impact of the mapping procedure in terms of number
of gates and circuit latency for Surface-17. The mapping
overhead is calculated as $(X_o - X_{in})/X_{in}$, where
$X_{in}$ and $X_o$ represent the values of the same circuit
characteristic before and after mapping, respectively.
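As a worked example of this formula, the short Python sketch
below evaluates the overhead for two entries taken from
Tables II and IV. The function name is hypothetical.

```python
def mapping_overhead(x_in, x_out):
    """Relative overhead (X_o - X_in) / X_in of a circuit characteristic."""
    return (x_out - x_in) / x_in

# 'alu v0 27': latency 72 cycles before mapping, 100 cycles with MinPath.
latency_overhead = mapping_overhead(72, 100)
# 'benstein vazirani': 35 gates before mapping, 9 gates with MinPath.
gate_overhead = mapping_overhead(35, 9)
print(f"{100 * latency_overhead:.1f}%  {100 * gate_overhead:.1f}%")
```

The first value is about 38.9%; the second is negative because
the mapped circuit contains fewer gates than the input circuit.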
B. Prior mapping strategies
As mentioned previously, the routing algorithms in
most prior works [6, 11–23] optimise the number of
operations, that is, the number of added SWAP gates.
They do not take actual gate duration and classical
control limitations into account. Their output circuits need
to be further scheduled by a low-level hardware unit like
OpenPulse [10] so that they can be correctly executed
with precise timing. In this work, we also implement such
a mapping procedure, called the MinPath mapper, to
compare with the proposed timing and resource-aware Qmap,
which considers gate duration and control constraints
during routing to minimize the circuit latency. MinPath uses
the same initial placement approach as the Qmap
mapper. However, the router in MinPath randomly selects
one of the movement sets along one of the shortest paths
as described in Section IV C, without respecting
classical control constraints and without evaluating which
set(s) will minimally extend circuit latency. The
complexity of the router in MinPath is
$O(g\sqrt{n}\,4^{\sqrt{n}})$.
Furthermore, we also introduce a Trivial mapper that
may not be able to map the circuit with minimal latency
extension but whose routing strategy has linear complexity
(O(g)). In the trivial mapping strategy, a naive initial
placement is used in which qubits are simply placed in
their order of appearance, and no circuit optimization is
performed. The router in the trivial mapper maps the gates
in the order in which they appear in the input circuit,
i.e. by-passing the QODG. To perform a
non-NN two-qubit gate, it simply selects the first shortest
path that is found. Moreover, only a single set of
movement operations is generated for the chosen path, namely
the set that moves the control qubit adjacent to the
target qubit.
In addition, only SWAP gates are generated for moving
qubits. After routing, the proposed resource-constrained
scheduling will be applied. By contrast, the MinPath
and Qmap mappers use the ILP-based initial placement,
enable circuit optimization, and can insert both SWAP
and MOVE gates.
The main differences between these three mapping strategies
are summarized in Table III. To provide gate sequences
with precise timing and comply with the classical control
constraints, the proposed resource-constrained scheduling
is also performed after the routing procedure of the
Trivial and MinPath mappers.
C. Overhead comparison of different mappers
Table IV shows the results of mapping benchmark cir-
cuits to the Surface-17 superconducting processor using
three different mapping strategies: Trivial, MinPath, and
Qmap. In this paper, the mapper is set to find an
ILP-based initial placement for only the first ten two-qubit
gates in any given circuit; the computation time of this
step is limited to 10 minutes and is not included in the
final CPU time. For each benchmark circuit, the mapping
procedure is executed five times and the run with minimum
overhead is reported.
TABLE II: The characteristics of the input benchmarks including the number of qubits, the total number of gates,
the number of two-qubit gates (CNOTs), its circuit depth and its circuit latency in cycles (20 ns per cycle).
Benchmarks         Qubits Gates CNOTs Depth Latency  |  Benchmarks         Qubits Gates CNOTs Depth Latency
alu bdd 288             7    84    38    48     169  |  sym9 146               12   328   148   127     450
alu v0 27               5    36    17    21      72  |  sys6 v0 111            10   215    98    74     266
benstein vazirani      16    35     1     5      40  |  vbeAdder 2b             7   210    42    52     116
4gt12 v1 89             6   228   100   130     448  |  wim 266                11   986   427   514    1788
4gt4 v0 72              6   258   113   137     478  |  xor5 254                6     7     5     2       5
4mod5 bdd 287           7    70    31    40     140  |  z4 268                 11  3073  1343  1643    5688
cm42a 207              14  1776   771   940    3249  |  adr4 197               13  3439  1498  1839    6377
cnt3 5 180             16   485   215   207     729  |  9symml 195             11 34881 15232 19235   66303
cuccaroAdder 1b         4    73    17    25      58  |  clip 206               14 33827 14772 17879   61786
cuccaroMultiply         6   176    32    55     133  |  cm152a 212             12  1221   532   684    2366
decod24 bdd 294         6    73    32    40     143  |  cm85a 209              14 11414  4986  6374   21967
decod24 enable          6   338   149   190     669  |  co14 215               15 17936  7840  8570   29608
graycode6 47            6     5     5     5      20  |  cycle10 2 110          12  6050  2648  3384   11692
ham3 102                3    20    11    11      41  |  dc1 220                11  1914   833  1038    3597
miller 11               3    50    23    29     105  |  dc2 222                15  9462  4131  5242   18097
mini alu 167            5   288   126   162     564  |  dist 223               13 38046 16624 19693   68111
mod5adder 127           6   555   239   302    1048  |  ham15 107              15  8763  3858  4793   16607
mod8 10 177             6   440   196   248     872  |  life 238               11 22445  9800 12511   43123
one two three           5    70    32    40     141  |  max46 240              10 27126 11844 14257   49400
rd32 v0 66              4    34    16    18      66  |  mini alu 305           10   173    77    68     242
rd53 311               13   275   124   124     441  |  misex1 241             15  4813  2100  2676    9240
rd73 140               10   230   104    92     330  |  pm1 249                14  1776   771   940    3249
rd84 142               15   343   154   110     394  |  radd 250               13  3213  1405  1778    6163
sf 274                  6   781   336   436    1516  |  root 255               13 17159  7493  8835   30575
shor 15                11  4792  1788  2268    7731  |  sqn 258                10 10223  4459  5458   18955
sqrt8 260              12  3009  1314  1659    5740  |  square root 7          15  7630  3089  3830   13049
squar5 261             13  1993   869  1048    3644  |  sym10 262              12 64283 28084 35572  122564
sym6 145                7  3888  1701  2187    7615  |  sym9 148               10 21504  9408 12087   41641
TABLE III: The main differences of the Trivial, MinPath, and Qmap mappers. n and g are the number of qubits
and gates in an input circuit, respectively.
Feature                            Trivial   MinPath                        Qmap
Circuit optimization               No        Yes                            Yes
ILP-based placement                No        Yes                            Yes
Routing: smart gate selection      No        Yes                            Yes
Routing: shortest path             Yes       Yes                            Yes
Routing: MOVE operation            No        Yes                            Yes
Routing: multiple movement sets    No        Yes                            Yes
Routing: minimize latency          No        No                             Yes
Routing: wrt. classical controls   No        No                             Yes
Complexity                         O(g)      $O(g\sqrt{n}\,4^{\sqrt{n}})$   $O(g\sqrt{n}\,4^{\sqrt{n}}) \cdot O_{\mathrm{schedule}}$
Compared to the circuit characteristics before mapping
(Table II), no matter which strategy is applied, the map-
ping procedure results in high overhead for most of the
benchmarks as shown in Table IV. The only exceptions
are the ‘benstein v’ and ‘graycode6 47’ circuits, because
some operations in these circuits can be canceled out
by the optimization module in the mapper, decreasing
their circuit sizes. When the trivial mapper is used, the
mapping procedure leads to a high overhead in both cir-
cuit latency and total number of gates that ranges from
50% (‘graycode6 47’) to 1160% (‘xor5 254’) and from
122.9% (‘wim 266’) to 800% (‘xor5 254’), respectively.
The MinPath mapper results in an increase of the cir-
cuit latency and the total number of gates that goes from
38.9% (‘alu v0 27’) to 260% (‘xor5 254’) and from 26.0%
(‘cuccaroAdder 1b’) to 373.4% (‘rd84 142’), respectively.
Finally, the proposed Qmap mapper increases the cir-
cuit latency and the total number of gates from 32.4%
(‘miller 11’) to 260% (‘xor5 254’) and from 20.7% (‘cuc-
caroAdder 1b’) to 78.1% (‘rd32 v0’), respectively.
FIG. 6: Comparison of three different mapping strategies. Overhead reduction (left) when comparing the MinPath
mapper to the trivial mapper and (right) when comparing Qmap to MinPath. Benchmarks are on the horizontal axis
and listed in their order of appearance in Table II.
Furthermore, we compare the resulting overhead of
these three mapping strategies as shown in Figure 6. The
trivial mapper leads to the highest mapping overhead as
less optimization is performed. Compared to the trivial
strategy, the MinPath mapper can reduce the latency
overhead and gate overhead up to 140% (‘gray6 47’) and
360% (‘benstein vazirani’), respectively. The average
latency (AVL) reduction and average gate (AVG) reduction
are 30% and 30.2%, respectively. Moreover, the proposed
Qmap mapper has lower or equal overhead than the Min-
Path mapper in terms of both circuit latency and num-
ber of gates for 96.4% and 87.5% of the benchmarks,
respectively. More specifically, Qmap can reduce the la-
tency overhead up to 47.3% (‘decod24 b’) and decrease
the gate overhead up to 28.6% (‘cuccaroMultiply’) com-
pared to the MinPath mapping strategy. The average
latency (AVL) reduction and average gate (AVG) reduc-
tion are 22% and 3.0%, respectively. This is because the
router in the MinPath mapper only considers the qubit
connectivity limitation and minimizes the number of op-
erations, that is, it randomly selects a movement set that
has minimum number of operations to move qubits to be
neighbours. The gate duration and classical constraints
will only be taken into account by a later module (such
as the global scheduler in this work and the OpenPulse in
IBM Qiskit [16]). In comparison, the router in Qmap uses
the proposed resource-constrained scheduling approach
as base and evaluates more minimum-weight movement
sets to select one which minimally extends the circuit
latency (Section IV).
D. Scalability and runtime
As discussed in Section IV, the complexity of the pro-
posed resource-constrained scheduling heuristic in the
worst case is still polynomial, making it applicable to
large-scale quantum circuits. The complexity of the rout-
ing heuristic is polynomial in terms of the number of
gates but scales sub-exponentially with the number of
qubits in a given circuit when using the Qmap and Min-
Path strategies.
We have tested three mapping strategies (Trivial,
MinPath, Qmap) for different sizes of benchmarks, in which
the number of qubits ranges from 3 to 16 and the number
of gates from 5 to 64283. The runtime (in seconds)
that the different mappers require for mapping each
benchmark on the Surface-17 processor can be found in
Table IV; it is measured as the CPU time that the
entire mapping procedure takes, excluding the time the
ILP-based initial placement takes. As expected, the mapper
that performs more optimizations and evaluates more
movement sets has a longer runtime. In this case, the
trivial mapper has the shortest execution time whereas
Qmap takes the longest. For example, when
mapping the largest benchmark ‘sym10 262’ with 64283
gates onto the Surface-17 processor, the trivial and the
Qmap mappers take 72.8 seconds and 9083.4 seconds,
respectively.
Based on the complexity analysis and the experimental
results, we can conclude that Qmap scales well with
the number of gates. However, our experiments only
use benchmarks that have fewer than 20 qubits. Therefore,
its scalability with the number of qubits needs to be
further investigated. Moreover, one may need to make a
compromise between mapping performance and runtime for
large-scale benchmarks.
E. SWAPs versus MOVEs
FIG. 7: Reduction of mapping overhead when using
MOVEs if possible compared to when only using
SWAPs. Benchmarks are in the horizontal axis and
listed in their appearing order in Table II. The average
latency (AVL) reduction and average gate (AVG)
reduction are 2.76% and 4.21%, respectively.
As mentioned in Section II, a SWAP gate is imple-
mented by three consecutive CNOT gates whereas a
MOVE operation is implemented by two consecutive
CNOT gates but requiring an ancilla qubit in the state
|0〉. Therefore, if there are available ancilla qubits (qubits
that are not used for computation), then it is preferable
to use MOVE operations rather than SWAP gates, which
helps to reduce the mapping overhead. In this section,
we evaluate the benefit of using MOVE operations
instead of only using SWAPs. We map the benchmarks
in Table II onto the Surface-17 processor using the
MinPath mapper. Different from the setups in Table IV, to
have a fair comparison between using MOVEs if possible
and only using SWAPs, in this case the naive initial
placement is applied and the first movement set is
always selected. With the same qubit overhead, the
mapping with MOVEs can reduce the number of gates up
to 38.9% (‘benstein vazirani’) and the circuit latency up
to 29% (‘graycode6 47’) compared to the mapping with
only SWAPs as shown in Figure 7. The latency reduction
and gate reduction are higher than 1% for around 48.2%
and 64.3% of the benchmarks, respectively.
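The two decompositions mentioned at the start of this section
can be checked on computational basis states with a few lines
of Python. The sketch below treats a CNOT as a classical XOR
on bits, which suffices because circuits built only from CNOTs
permute basis states without introducing phases. The function
names are hypothetical.

```python
from itertools import product

def cnot(bits, control, target):
    """Apply a CNOT to a tuple of classical bits (a basis state)."""
    bits = list(bits)
    bits[target] ^= bits[control]
    return tuple(bits)

def swap(bits, a, b):
    """SWAP implemented as three consecutive CNOTs."""
    for c, t in ((a, b), (b, a), (a, b)):
        bits = cnot(bits, c, t)
    return bits

def move(bits, src, dst):
    """MOVE implemented as two consecutive CNOTs; dst must hold 0."""
    assert bits[dst] == 0, "MOVE needs an ancilla in state |0>"
    for c, t in ((src, dst), (dst, src)):
        bits = cnot(bits, c, t)
    return bits

for a, b in product((0, 1), repeat=2):
    print((a, b), "-> swap ->", swap((a, b), 0, 1))
for a in (0, 1):
    print((a, 0), "-> move ->", move((a, 0), 0, 1))
```

The MOVE leaves the source position in state |0>, so that
position can serve as the ancilla of a later MOVE.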
VI. RELATED WORK
To achieve the shortest circuit latency and provide
precise timing information to generate correct control
signals, schedulers that consider actual gate duration
should be developed. Furthermore, the control electronic
constraints, which can be very restrictive especially when
scaling up quantum processors, should also be taken into
account to allow valid execution of quantum
applications. As discussed previously, most prior mapping
works [6, 11–23] only focus on the primitive gate set and
qubit connectivity constraints. The output circuits from
prior mappers need to be further scheduled with respect
to the gate duration and classical control constraints,
which is less optimal than the Qmap mapper as shown
in Section V. Moreover, they all use SWAP operations
for moving qubits when targeting superconducting
quantum processors. In addition, so far no mapper has been
developed for more scalable quantum processors such as
the Surface-17 processor presented in [27, 43]. Although
this type of processor has been designed with the aim of
building a large qubit array capable of performing
fault-tolerant quantum computations based on the surface code,
it can also be used for running quantum algorithms in a
near-term implementation.
Many existing mapping algorithms [6, 11–26] and this
paper use either the number of inserted movement oper-
ations or the circuit depth/latency as optimization met-
rics. Although all these metrics affect the success prob-
ability of a quantum circuit, an analysis on which ones
are more critical to be minimized is required. Recent
works [19, 21, 22, 40, 41, 44] suggest choosing the
routing path based on the fidelity of the two-qubit gates along
the path, as they are used to implement the movements
(noise-aware mapper). However, the reliability of a path
is calculated by simply multiplying the fidelity of each
gate without considering error propagation and
decoherence, which makes this metric incomplete and not very
accurate; it thus sometimes fails to predict the
most reliable route [22]. A more accurate metric that
represents the success probability well and can also be easily
used by the mapping procedure needs to be developed.
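The product-of-fidelities metric discussed above can be written
in a few lines of Python. The sketch below is only meant to
make the metric concrete; the edge fidelities, the 2x2 layout,
and the function name are hypothetical, and the metric has the
limitations stated in the text (no error propagation, no
decoherence).

```python
from math import prod

def route_reliability(gate_edges, fidelity):
    """Product of the fidelities of all two-qubit gates used by a route."""
    return prod(fidelity[frozenset(edge)] for edge in gate_edges)

# Hypothetical two-qubit gate fidelities on a 2x2 layout:
#   0 - 1
#   |   |
#   2 - 3
fidelity = {
    frozenset((0, 1)): 0.99, frozenset((1, 3)): 0.97,
    frozenset((0, 2)): 0.95, frozenset((2, 3)): 0.99,
}
# A gate between the qubits at 0 and 3: one SWAP (three two-qubit gates)
# on the first edge of the path, then the gate itself on the second edge.
via_1 = route_reliability([(0, 1)] * 3 + [(1, 3)], fidelity)
via_2 = route_reliability([(0, 2)] * 3 + [(2, 3)], fidelity)
print(via_1, via_2)
```

Both routes insert the same number of operations, yet this
metric prefers the route through position 1 because its SWAP
runs on the higher-fidelity edge.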
VII. CONCLUSION AND DISCUSSION
In this work, we have presented a mapper called Qmap
to make quantum circuits executable on the Surface-17
superconducting processor. It takes into account com-
mon processor constraints such as the primitive gate set
and qubit connectivity, as well as actual gate duration
and classical control electronic restrictions as it mini-
mizes the circuit latency. Qmap has been embedded in
the OpenQL compiler and consists of several modules, in-
cluding qubit initial placement, resource-constrained list
scheduling with polynomial complexity, qubit routing,
and gate decomposition and optimization. The evalua-
tion results show that the proposed timing and resource-
aware mapper results in lower overhead in terms of both
circuit latency and number of gates compared to the prior
mapping strategy (MinPath) that minimizes the number
of operations in the routing process and then reschedules
the circuits with respect to the actual gate duration and
classical control constraints. However, the complexity of
the routing algorithm in Qmap scales sub-exponentially
with the number of qubits in the input circuit. Future
work can reduce its complexity by only evaluating the
shortest paths on which fewer qubits were, are, or will be
busy in the past, current, or coming cycles.
Furthermore, Qmap can be applied to different pro-
cessors by only changing their corresponding hardware
characteristics in the configuration file. We will investi-
gate the performance of Qmap on other NISQ processors
and compare it with prior works in the future. In addi-
tion, more mapping metrics need to be investigated and
included in the mapper. Note that what parameter(s) to
optimise during the mapping might depend on the char-
acteristics of the target quantum processor. In addition,
our mapping approach is based on the compilation of
quantum circuits at the gate level. Although it generates
valid instructions with precise timing, they still need to
be further translated into appropriate signals that con-
trol the qubits by the microarchitecture proposed in [42].
A different approach is to directly compile quantum algo-
rithms to control pulses [45]. Further work will compare
both solutions and investigate the trade-off of allocating
mapping tasks to the compiler and the microarchitecture.
ACKNOWLEDGMENTS
The authors would like to thank Xiang Fu and Adriaan
Rol for useful discussions. The authors acknowledge sup-
port from the Intel Corporation. LLL also acknowledges
funding from the China Scholarship Council.
[1] J. Preskill, Quantum 2, 79 (2018).
[2] R. P. Feynman, International Journal of Theoretical
Physics 21, 467 (1982).
[3] M. Kjaergaard, M. E. Schwartz, J. Braumüller,
P. Krantz, J. I.-J. Wang, S. Gustavsson, and W. D.
Oliver, arXiv:1905.13641 (2019).
[4] IBM, “Quantum experience,” https://www.research.
ibm.com/ibm-q/ (2017).
[5] S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush,
N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and
H. Neven, Nature Physics 14, 595 (2018).
[6] Rigetti, “Rigetti forest,” https://www.rigetti.com/
forest (2018).
[7] C. D. Hill, E. Peretz, S. J. Hile, M. G. House, M. Fuech-
sle, S. Rogge, M. Y. Simmons, and L. C. Hollenberg,
Science advances 1, e1500707 (2015).
[8] R. Li, L. Petit, D. P. Franke, J. P. Dehollain, J. Helsen,
M. Steudtner, N. K. Thomas, Z. R. Yoscovits, K. J.
Singh, S. Wehner, et al., Science advances 4, eaar3960
(2018).
[9] S. Asaad, C. Dickel, N. K. Langford, S. Poletto,
A. Bruno, M. A. Rol, D. Deurloo, and L. DiCarlo, npj
Quantum Information 2, 16029 (2016).
[10] D. C. McKay, T. Alexander, L. Bello, M. J. Biercuk,
L. Bishop, J. Chen, J. M. Chow, A. D. Córcoles,
D. Egger, S. Filipp, et al., arXiv:1809.03452 (2018).
[11] M. Yazdani, M. S. Zamani, and M. Sedighi, Quantum
information processing 12, 3239 (2013).
[12] A. Lye, R. Wille, and R. Drechsler, in The 20th Asia
and South Pacific Design Automation Conference (ASP-
DAC) (IEEE, 2015) pp. 178–183.
[13] R. Wille, O. Keszocze, M. Walter, P. Rohrs, A. Chat-
topadhyay, and R. Drechsler, in 2016 21st Asia and
South Pacific Design Automation Conference (ASP-
DAC) (IEEE, 2016) pp. 292–297.
[14] A. Farghadan and N. Mohammadzadeh, International
Journal of Circuit Theory and Applications 45, 989
(2017).
[15] S. Herbert and A. Sengupta, arXiv:1812.11619 (2018).
[16] H. Abraham, AduOffei, I. Y. Akhalwaya, G. Alek-
sandrowicz, et al., “Qiskit: An open-source framework
for quantum computing,” (2019).
[17] A. Zulehner, A. Paler, and R. Wille, IEEE Transactions
on Computer-Aided Design of Integrated Circuits and
Systems (2018).
[18] M. Y. Siraichi, V. F. d. Santos, S. Collange, and F. M. Q.
Pereira, in Proceedings of the 2018 International Sympo-
sium on Code Generation and Optimization (ACM, 2018)
pp. 113–125.
[19] W. Finigan, M. Cubeddu, T. Lively, J. Flick, and
P. Narang, arXiv:1810.08291 (2018).
[20] G. Li, Y. Ding, and Y. Xie, in Proceedings of the Twenty-
Fourth International Conference on Architectural Sup-
port for Programming Languages and Operating Systems
(ACM, 2019) pp. 1001–1014.
[21] S. S. Tannu and M. K. Qureshi, in Proceedings of the
Twenty-Fourth International Conference on Architec-
tural Support for Programming Languages and Operating
Systems (ACM, 2019) pp. 987–999.
[22] S. Nishio, Y. Pan, T. Satoh, H. Amano, and R. Van Me-
ter, arXiv:1903.10963 (2019).
[23] A. Cowtan, S. Dilkes, R. Duncan, A. Krajenbrink,
W. Simmons, and S. Sivarajah, arXiv:1902.08091
(2019).
[24] D. Venturelli, M. Do, E. Rieffel, and J. Frank, Quantum
Science and Technology 3, 025004 (2018).
[25] K. E. Booth, M. Do, J. C. Beck, E. Rieffel, D. Venturelli,
and J. Frank, in Twenty-Eighth International Conference
on Automated Planning and Scheduling (2018).
[26] D. Venturelli, M. Do, B. O’Gorman, J. Frank, E. Rieffel,
K. E. Booth, T. Nguyen, P. Narayan, and S. Nanda,
(2019).
[27] R. Versluis, S. Poletto, N. Khammassi, B. Tarasinski,
N. Haider, D. J. Michalak, A. Bruno, K. Bertels, and
L. DiCarlo, Phys. Rev. Applied 8, 034021 (2017).
[28] N. Khammassi, I. Ashraf, J. v. Someren, R. Nane,
A. Krol, M. A. Rol, L. Lao, K. Bertels, and C. G. Al-
mudever, arXiv:2005.13283 (2020).
[29] R. Wille, D. Große, L. Teuber, G. W. Dueck, and
R. Drechsler, in 38th International Symposium on Multi-
ple Valued Logic (ismvl 2008) (IEEE, 2008) pp. 220–225.
[30] C. C. Lin, A. Chakrabarti, and N. K. Jha, ACM Journal
on Emerging Technologies in Computing Systems 11, 7
(2014).
[31] T. E. O'Brien, B. Tarasinski, and L. DiCarlo, npj
Quantum Information 3, 39 (2017).
[32] J. E. Kelley Jr and M. R. Walker, in Papers presented at
the December 1-3, 1959, eastern joint IRE-AIEE-ACM
computer conference (ACM, 1959) pp. 160–173.
[33] J. Blazewicz, J. K. Lenstra, and A. R. Kan, Discrete
applied mathematics 5, 11 (1983).
[34] N. Khammassi, G. G. Guerreschi, I. Ashraf, J. W.
Hogaboam, C. G. Almudever, and K. Bertels,
arXiv:1805.09607 (2018).
[35] N. Khammassi, I. Ashraf, X. Fu, C. G. Almudéver, and
K. Bertels, in Design, Automation & Test in Europe
Conference & Exhibition (DATE), 2017 (IEEE, 2017) pp.
464–469.
[36] X. Fu, L. Riesebos, M. A. Rol, J. van Straten, J. van
Someren, N. Khammassi, I. Ashraf, R. Vermeulen,
V. Newsum, K. Loh, et al., in 2019 IEEE International
Symposium on High Performance Computer Architecture
(HPCA) (IEEE, 2019) pp. 224–237.
[37] M. J. Dousti, A. Shafaei, and M. Pedram, in Proceedings
of the 24th edition of the great lakes symposium on VLSI
(ACM, 2014) pp. 117–122.
[38] L. Lao, B. van Wee, I. Ashraf, J. van Someren, N. Kham-
massi, K. Bertels, and C. G. Almudever, Quantum Sci-
ence and Technology 4, 015005 (2019).
[39] M. J. Dousti and M. Pedram, in Proceedings of the 50th
Annual Design Automation Conference (DAC) (ACM,
2013) p. 42.
[40] P. Murali, N. M. Linke, M. Martonosi, A. J. Abhari,
N. H. Nguyen, and C. H. Alderete, in Proceedings of the
46th International Symposium on Computer Architecture
(2019) pp. 527–540.
[41] P. Murali, J. M. Baker, A. Javadi-Abhari, F. T. Chong,
and M. Martonosi, in Proceedings of the Twenty-Fourth
International Conference on Architectural Support for
Programming Languages and Operating Systems (2019)
pp. 1015–1029.
[42] X. Fu, M. A. Rol, C. C. Bultink, J. van Someren,
N. Khammassi, I. Ashraf, R. F. L. Vermeulen, J. C.
de Sterke, W. J. Vlothuizen, R. N. Schouten, C. G. Al-
mudever, L. DiCarlo, and K. Bertels, in Proceedings of
the 50th Annual IEEE/ACM International Symposium
on Microarchitecture (MICRO-50) (IEEE/ACM, 2017)
pp. 813–825.
[43] Intel, “Intel newsroom,” https://newsroom.intel.
com/press-kits/quantum-computing/#intel-qutech
(2019).
[44] N. M. Linke, D. Maslov, M. Roetteler, S. Debnath,
C. Figgatt, K. A. Landsman, K. Wright, and C. Mon-
roe, Proceedings of the National Academy of Sciences
114, 3305 (2017).
[45] Y. Shi, N. Leung, P. Gokhale, Z. Rossi, D. I. Schus-
ter, H. Hoffmann, and F. T. Chong, in Proceedings of
the Twenty-Fourth International Conference on Architec-
tural Support for Programming Languages and Operating
Systems (ACM, 2019) pp. 1031–1044.
TABLE IV: The results of mapping quantum benchmarks to the Surface-17 processor, including the total number of
gates and the number of two-qubit gates (CZs) in the mapped output circuits, the circuit latency in cycles (20 ns
per cycle), the numbers of inserted SWAP (SWs) and MOVE (MVs) operations, and the CPU time that mapping
takes in seconds.
| Benchmark | Trivial Latency | Trivial Gates | Trivial CZs | Trivial SWs | Trivial MVs | Trivial Time (s) | MinPath Latency | MinPath Gates | MinPath CZs | MinPath SWs | MinPath MVs | MinPath Time (s) | Qmap Latency | Qmap Gates | Qmap CZs | Qmap SWs | Qmap MVs | Qmap Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| alu_bdd_288 | 335 | 393 | 113 | 25 | 0 | 0.06365 | 286 | 341 | 100 | 16 | 7 | 1.7313 | 254 | 362 | 109 | 15 | 13 | 1.77362 |
| alu_v0_27 | 166 | 188 | 56 | 13 | 0 | 0.0353 | 100 | 116 | 30 | 3 | 2 | 4.20968 | 106 | 122 | 34 | 3 | 4 | 4.13529 |
| benstein_vazirani | 36 | 45 | 10 | 3 | 0 | 0.01667 | 36 | 9 | 1 | 0 | 0 | 0.01051 | 36 | 9 | 1 | 0 | 0 | 0.01135 |
| 4gt12_v1_89 | 931 | 1191 | 346 | 82 | 0 | 0.19592 | 811 | 917 | 270 | 54 | 4 | 25.6367 | 690 | 886 | 259 | 51 | 3 | 26.0342 |
| 4gt4_v0_72 | 1124 | 1416 | 413 | 100 | 0 | 0.2555 | 884 | 1018 | 296 | 55 | 9 | 4.40794 | 788 | 973 | 273 | 52 | 2 | 4.628 |
| 4mod5_bdd_287 | 298 | 339 | 100 | 23 | 0 | 0.07120 | 234 | 247 | 71 | 10 | 5 | 18.7469 | 226 | 240 | 69 | 10 | 4 | 19.5225 |
| cm42a_207 | 7167 | 8782 | 2532 | 587 | 0 | 1.42467 | 6499 | 7887 | 2352 | 517 | 15 | 611.534 | 5713 | 7724 | 2301 | 494 | 24 | 535.889 |
| cnt3_5_180 | 1985 | 2491 | 725 | 170 | 0 | 0.38054 | 1480 | 2103 | 623 | 136 | 0 | 25.0301 | 1236 | 2132 | 641 | 142 | 0 | 25.6028 |
| cuccaroAdder_1b | 175 | 171 | 50 | 11 | 0 | 0.03036 | 90 | 92 | 23 | 0 | 3 | 0.2521 | 90 | 92 | 23 | 0 | 3 | 0.28906 |
| cuccaroMultiply | 417 | 427 | 122 | 30 | 0 | 0.06122 | 260 | 274 | 74 | 10 | 6 | 2.05933 | 217 | 246 | 64 | 6 | 7 | 2.09601 |
| decod24_bdd | 315 | 375 | 110 | 26 | 0 | 0.06353 | 253 | 301 | 90 | 14 | 8 | 1.38109 | 201 | 287 | 83 | 15 | 3 | 1.46449 |
| decod24_enable | 1342 | 1607 | 467 | 106 | 0 | 0.23441 | 1324 | 1464 | 434 | 95 | 0 | 28.704 | 1151 | 1474 | 434 | 95 | 0 | 28.7617 |
| graycode6_47 | 30 | 31 | 11 | 2 | 0 | 0.00898 | 16 | 15 | 5 | 0 | 0 | 5.83973 | 16 | 15 | 5 | 0 | 0 | 5.8601 |
| ham3_102 | 79 | 87 | 26 | 5 | 0 | 0.01126 | 60 | 62 | 17 | 2 | 0 | 0.15724 | 60 | 62 | 17 | 2 | 0 | 0.22297 |
| miller_11 | 199 | 222 | 65 | 14 | 0 | 0.02856 | 156 | 166 | 46 | 3 | 7 | 0.19096 | 139 | 149 | 39 | 0 | 8 | 0.27471 |
| mini_alu_167 | 1144 | 1431 | 414 | 96 | 0 | 0.21319 | 985 | 1120 | 309 | 61 | 0 | 29.411 | 818 | 1068 | 294 | 56 | 0 | 28.5271 |
| mod5adder_127 | 2229 | 2744 | 794 | 185 | 0 | 0.44677 | 1908 | 2240 | 645 | 130 | 8 | 7.0105 | 1618 | 2104 | 598 | 109 | 16 | 7.4544 |
| mod8_10_177 | 1819 | 2285 | 661 | 155 | 0 | 0.36898 | 1570 | 1808 | 530 | 102 | 14 | 2.26425 | 1434 | 1786 | 518 | 106 | 2 | 2.53567 |
| one_two_three | 287 | 346 | 101 | 23 | 0 | 0.05446 | 235 | 263 | 76 | 12 | 4 | 6.18516 | 215 | 252 | 70 | 10 | 4 | 6.41456 |
| rd32_v0_66 | 168 | 184 | 55 | 13 | 0 | 0.02766 | 105 | 113 | 32 | 4 | 2 | 1.65692 | 104 | 111 | 31 | 1 | 6 | 1.71454 |
| rd53_311 | 1183 | 1514 | 448 | 108 | 0 | 0.24811 | 909 | 1249 | 370 | 78 | 6 | 0.32513 | 856 | 1257 | 375 | 81 | 4 | 0.67113 |
| rd73_140 | 970 | 1198 | 350 | 82 | 0 | 0.19047 | 751 | 1010 | 300 | 62 | 5 | 20.682 | 662 | 988 | 292 | 52 | 16 | 20.3441 |
| rd84_142 | 1385 | 1804 | 526 | 124 | 0 | 0.30129 | 1044 | 1624 | 481 | 109 | 0 | 20.7494 | 861 | 1516 | 448 | 98 | 0 | 21.1735 |
| sf_274 | 3351 | 3892 | 1137 | 267 | 0 | 0.67493 | 2705 | 3157 | 926 | 178 | 28 | 40.0639 | 2151 | 2822 | 818 | 104 | 85 | 41.2879 |
| shor_15 | 15082 | 19608 | 5472 | 1228 | 0 | 4.33023 | 13460 | 17464 | 5046 | 1028 | 87 | 2.45284 | 11217 | 17058 | 4924 | 982 | 95 | 14.6928 |
| sqrt8_260 | 12708 | 16131 | 4719 | 1135 | 0 | 3.49803 | 11626 | 14041 | 4231 | 953 | 29 | 4.12037 | 10020 | 13944 | 4216 | 956 | 17 | 13.2009 |
| squar5_261 | 7865 | 10178 | 2951 | 694 | 0 | 2.17597 | 7198 | 8922 | 2663 | 594 | 6 | 3.48788 | 6468 | 8764 | 2630 | 585 | 3 | 7.76352 |
| sym6_145 | 15466 | 19266 | 5583 | 1294 | 0 | 4.12125 | 14094 | 16427 | 4872 | 965 | 138 | 3.94839 | 12873 | 16145 | 4787 | 970 | 88 | 16.7757 |
| sym9_146 | 1250 | 1721 | 499 | 117 | 0 | 0.30173 | 1040 | 1493 | 447 | 93 | 10 | 21.1801 | 980 | 1456 | 431 | 91 | 5 | 21.6935 |
| sys6_v0_111 | 859 | 1142 | 338 | 80 | 0 | 0.24816 | 640 | 976 | 290 | 62 | 3 | 21.1563 | 573 | 909 | 267 | 49 | 11 | 21.1608 |
| vbeAdder_2b | 332 | 468 | 135 | 31 | 0 | 0.09799 | 236 | 300 | 79 | 9 | 5 | 0.16455 | 215 | 298 | 80 | 6 | 10 | 0.1938 |
| wim_266 | 3941 | 5084 | 1474 | 349 | 0 | 0.98658 | 3546 | 4289 | 1273 | 272 | 15 | 13.0377 | 3190 | 4203 | 1254 | 265 | 16 | 13.5583 |
| xor5_254 | 63 | 63 | 23 | 6 | 0 | 0.01135 | 18 | 18 | 8 | 1 | 0 | 29.7578 | 18 | 18 | 8 | 1 | 0 | 29.2384 |
| z4_268 | 12341 | 15792 | 4598 | 1085 | 0 | 3.19178 | 11463 | 13962 | 4178 | 905 | 60 | 818.036 | 9704 | 13537 | 4088 | 887 | 42 | 869.445 |
| adr4_197 | 14296 | 18110 | 5287 | 1263 | 0 | 3.38715 | 12772 | 15868 | 4780 | 1082 | 18 | 1.67818 | 11070 | 15496 | 4685 | 1021 | 62 | 10.924 |
| 9symml_195 | 142144 | 182319 | 53224 | 12664 | 0 | 36.5722 | 134023 | 164219 | 49485 | 11167 | 376 | 16.642 | 116118 | 162001 | 49154 | 11282 | 38 | 2332.7 |
| clip_206 | 139948 | 180243 | 52809 | 12679 | 0 | 40.1273 | 128597 | 162421 | 49227 | 11379 | 159 | 17.44 | 111253 | 160880 | 49090 | 11268 | 257 | 2587.99 |
| cm152a_212 | 5166 | 6320 | 1834 | 434 | 0 | 1.3859 | 4508 | 5347 | 1586 | 346 | 8 | 0.66896 | 4086 | 5306 | 1591 | 353 | 0 | 3.53968 |
| cm85a_209 | 48394 | 61007 | 17886 | 4300 | 0 | 14.2237 | 44110 | 54845 | 16654 | 3832 | 86 | 6.49007 | 37839 | 53363 | 16224 | 3716 | 45 | 389.036 |
| co14_215 | 75821 | 99108 | 29218 | 7126 | 0 | 20.9755 | 68064 | 92308 | 28381 | 6837 | 15 | 10.8777 | 57968 | 90267 | 27787 | 6615 | 51 | 1044.04 |
| cycle10_2_110 | 25607 | 31630 | 9236 | 2196 | 0 | 7.12406 | 23070 | 28148 | 8460 | 1904 | 50 | 3.26144 | 20458 | 27897 | 8471 | 1899 | 63 | 106.071 |
| dc1_220 | 7740 | 9845 | 2867 | 678 | 0 | 2.62955 | 7116 | 8575 | 2574 | 567 | 20 | 1.45486 | 5979 | 8117 | 2444 | 481 | 84 | 8.51783 |
| dc2_222 | 39466 | 50396 | 14754 | 3541 | 0 | 12.5991 | 36113 | 44864 | 13547 | 3100 | 58 | 5.16826 | 31796 | 44379 | 13520 | 3077 | 79 | 268.637 |
| dist_223 | 156674 | 201426 | 58891 | 14089 | 0 | 40.4085 | 144079 | 183197 | 55613 | 12757 | 359 | 55.0312 | 124031 | 179639 | 54717 | 12599 | 148 | 3550.7 |
| ham15_107 | 36221 | 45826 | 13356 | 3166 | 0 | 10.1604 | 33368 | 40721 | 12257 | 2797 | 4 | 5.74947 | 28906 | 39762 | 12030 | 2704 | 30 | 193.48 |
| life_238 | 92286 | 117371 | 34238 | 8146 | 0 | 30.3595 | 85068 | 104447 | 31370 | 7134 | 84 | 14.0716 | 75462 | 104689 | 31920 | 7324 | 74 | 1364.49 |
| max46_240 | 111978 | 141438 | 41211 | 9789 | 0 | 35.6086 | 101798 | 125209 | 37631 | 8375 | 331 | 16.8426 | 89164 | 123895 | 37565 | 8217 | 535 | 1840.89 |
| mini_alu_305 | 767 | 862 | 254 | 59 | 0 | 0.16079 | 505 | 741 | 228 | 35 | 23 | 0.19804 | 518 | 775 | 242 | 41 | 21 | 0.40925 |
| misex1_241 | 19670 | 24793 | 7206 | 1702 | 0 | 5.75472 | 18143 | 22002 | 6577 | 1479 | 20 | 3.13745 | 15892 | 21883 | 6588 | 1480 | 24 | 53.4152 |
| pm1_249 | 7167 | 8782 | 2532 | 587 | 0 | 2.3615 | 6446 | 7793 | 2314 | 499 | 23 | 1.48028 | 5629 | 7774 | 2331 | 504 | 24 | 6.86838 |
| radd_250 | 13254 | 16700 | 4867 | 1154 | 0 | 4.11447 | 12291 | 14955 | 4516 | 979 | 87 | 2.43807 | 10798 | 14408 | 4363 | 948 | 57 | 24.0061 |
| root_255 | 71310 | 91873 | 26882 | 6463 | 0 | 20.7858 | 64948 | 82599 | 24991 | 5824 | 13 | 11.4199 | 55963 | 80542 | 24520 | 5627 | 73 | 844.114 |
| sqn_258 | 43328 | 53252 | 15529 | 3690 | 0 | 11.7106 | 38165 | 46370 | 13908 | 3019 | 196 | 6.55198 | 33010 | 45801 | 13815 | 2984 | 202 | 270.981 |
| square_root_7 | 35769 | 44042 | 12896 | 3269 | 0 | 10.6409 | 27419 | 34333 | 10274 | 2371 | 36 | 50077.7 | 23203 | 33088 | 9845 | 2184 | 102 | 46862.9 |
| sym10_262 | 269200 | 340622 | 99658 | 23858 | 0 | 72.7567 | 247750 | 305153 | 92270 | 21030 | 548 | 42.2372 | 215185 | 303141 | 92326 | 20986 | 642 | 9083.41 |
| sym9_148 | 87919 | 110393 | 32127 | 7573 | 0 | 27.4377 | 79881 | 95215 | 28378 | 6152 | 257 | 14.4444 | 70756 | 94656 | 28462 | 6182 | 254 | 800.226 |
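
As a reading aid for Table IV, the following minimal Python sketch converts latencies from cycles to microseconds using the 20 ns cycle time stated in the caption, and computes how much shorter the Qmap latency is relative to the MinPath latency for three rows copied from the table. The helper names `latency_us` and `relative_reduction` are hypothetical and chosen for illustration only. The quantity computed here is the plain relative difference between the two mapped latencies. It is not the latency overhead with respect to the unmapped circuit, which cannot be derived from this table alone.

```python
# Cycle time taken from the Table IV caption (20 ns per cycle).
CYCLE_NS = 20

# (benchmark, MinPath latency, Qmap latency), in cycles, copied from Table IV.
rows = [
    ("alu_bdd_288", 286, 254),
    ("sf_274", 2705, 2151),
    ("alu_v0_27", 100, 106),  # a row where Qmap is slightly slower
]


def latency_us(cycles):
    """Convert a latency in cycles to microseconds."""
    return cycles * CYCLE_NS / 1000.0


def relative_reduction(minpath_cycles, qmap_cycles):
    """Fraction by which the Qmap latency is shorter than the MinPath latency.

    A negative value means the Qmap latency is longer.
    """
    return (minpath_cycles - qmap_cycles) / minpath_cycles


for name, minpath, qmap in rows:
    change = 100.0 * relative_reduction(minpath, qmap)
    print(
        f"{name}: MinPath {latency_us(minpath):.2f} us, "
        f"Qmap {latency_us(qmap):.2f} us, reduction {change:.1f}%"
    )
```

The same two helpers can be applied to any other row of the table, or to the Trivial mapper columns, by extending `rows`.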
