Noise-Adaptive Compiler Mappings for Noisy Intermediate-Scale Quantum
  Computers by Murali, Prakash et al.
Prakash Murali∗
Princeton University
Jonathan M. Baker
University of Chicago
Ali Javadi Abhari
IBM T. J. Watson Research Center
Frederic T. Chong
University of Chicago
Margaret Martonosi
Princeton University
Abstract
A massive gap exists between current quantum computing
(QC) prototypes, and the size and scale required for many
proposed QC algorithms. Current QC implementations are
prone to noise and variability which affect their reliability,
and yet with fewer than 80 quantum bits (qubits) total, they
are too resource-constrained to implement error correction.
The term Noisy Intermediate-Scale Quantum (NISQ) refers
to these current and near-term systems of 1000 qubits or
fewer. Given NISQ’s severe resource constraints, low reliability,
and high variability in physical characteristics such as
coherence time or error rates, it is of pressing importance
to map computations onto them in ways that use resources
efficiently and maximize the likelihood of successful runs.
This paper proposes and evaluates backend compiler ap-
proaches to map and optimize high-level QC programs to
execute with high reliability on NISQ systems with diverse
hardware characteristics. Our techniques all start from an
LLVM intermediate representation of the quantum program
(such as would be generated from high-level QC languages
like Scaffold) and generate QC executables runnable on the
IBM Q public QC machine. We then use this framework to
implement and evaluate several optimal and heuristic map-
ping methods. These methods vary in how they account for
the availability of dynamic machine calibration data, the rel-
ative importance of various noise parameters, the different
possible routing strategies, and the relative importance of
compile-time scalability versus runtime success. Using real-system
measurements, we show that fine-grained spatial and
temporal variations in hardware parameters can be exploited
to obtain an average 2.9x (and up to 18x) improvement in
program success rate over the industry standard IBM Qiskit
compiler. Despite small qubit counts, NISQ systems will soon
be large enough to demonstrate “quantum supremacy,” i.e., an
advantage over classical computing. Tools like ours provide
significant improvements in program reliability and execu-
tion time, and offer high leverage in accelerating progress
towards quantum supremacy.
Keywords noise-adaptive compilation; qubit mapping
∗Prakash Murali is the corresponding author and can be reached at
pmurali@cs.princeton.edu.
1 Introduction
Quantum computing (QC) aims to solve intractable compu-
tational problems by leveraging quantum mechanical prin-
ciples like superposition and entanglement to manipulate
information efficiently. QC algorithms show potential to sig-
nificantly impact areas such as quantum chemistry [1, 2],
cryptography [3], machine learning [4], and others. Unfortu-
nately, a massive gap exists between the resources required
by most proposed QC algorithms, and the resources which
exist in current prototype hardware.
QC systems have been announced with 49-72 qubits [5–7]
and current operational systems have been demonstrated
publicly with roughly 20 qubits or fewer [8]. A QC system
with 72 fully-entangled qubits and sufficiently-precise oper-
ations (“gates”) would likely be sufficient to show “quantum
advantage” over the largest classical supercomputers, but
would still be 5-6 orders of magnitude smaller than the re-
source requirements of Shor’s well-known QC algorithm for
factoring large numbers [3, 9, 10].
The term Noisy Intermediate-Scale Quantum (NISQ) com-
puters refers to the current and near-term QC systems which
have roughly 1000 qubits or fewer—typically too small to em-
ploy error correction codes (ECC) [11]. While resource con-
strained, NISQ machines offer an important step forward: if
used well, they can demonstrate QC applications generating
useful results. Making good use of NISQ hardware, however,
requires very efficient, near-optimal mappings of algorithms
onto them. This paper proposes a suite of optimization- and
heuristic-based approaches for mapping applications onto
NISQ hardware, and evaluates them by running the mapped
executables on a public 16-qubit IBM system (see footnote 1).
A good mapping of a QC algorithm onto NISQ hardware
requires first an intelligent initial placement of the program
qubits onto the hardware qubits in order to reduce commu-
nication requirements. Second, it requires efficient orches-
tration of operations both for the computation itself, and
also for the additional SWAP operations which communicate
state between hardware qubits. Third and most importantly,
mapping decisions must reduce the likelihood of operational
or decoherence errors which cause the program run to fail
1 We run all experiments on the 16-qubit IBM instance named IBMQ 16
Rueschlikon [8]. For the remainder of the paper, we shorten this name to
IBMQ16.
arXiv:1901.11054v1 [quant-ph] 30 Jan 2019
[Figure 1 panels: (a) Coherence time (T2): T2 time (us) versus day for qubits Q0, Q4, Q9 and Q13. (b) CNOT gate error rate: gate error rate versus day for CNOT 5,4, CNOT 7,10 and CNOT 3,14.]
Figure 1. Daily variations in qubit coherence time (larger
is better) and gate error rates (lower is better) for selected
elements in IBMQ 16 Rueschlikon. The qubits and gates that
are most or least reliable are different across days.
to achieve a useful answer. Our work performs mappings
using the daily calibration data provided by IBM in order to
avoid using unreliable qubits and to prioritize qubit position-
ing which reduces the likelihood of communication (SWAP)
errors. For example, Figure 1 shows large daily variations
in the gate error rates and coherence times of the qubits of
IBMQ16 on which we experiment. Our contributions are:
First, we develop an LLVM [12] compiler which optimally
or near-optimally maps quantum programs to OpenQASM
assembly code [13] and then to the web-accessible IBMQ16
machine for real-system evaluation. For 12 QC programs
written in the Scaffold quantum programming language [14],
we use this framework to explore how optimal and heuristic
mapping methods, qubit movement policies, and the intelli-
gent adaptation to machine calibration data can affect the
quality of the compiled code.
In particular, our compiler provides up to 1.68x gain in
execution time and 9x gain in success rate over an optimal
but calibration-unaware baseline. Our compiler obtains an
average 2.9x improvement (up to 18x) in success rate, and an
average 2.7x improvement in execution time (up to 6x), com-
pared to the IBM Qiskit compiler [15], which is the industry
standard for IBMQ16.
Furthermore, although compile-time is not a first-order
design goal, QC compilers must scale well enough for intel-
ligent compilation to be tractable throughout NISQ-range
machines. We show that our methods based on Satisfiability
Modulo Theory (SMT) scale well up to 32 qubits. Further, we
have developed calibration-aware heuristic methods which
produce executables with similar reliability and execution
time as the SMT approaches, but with more scalable compile-
times beyond 32 qubits.
Finally, across the 12 benchmarks, we study the influ-
ence of application instruction mix and time varying qubit
error characteristics on compiled programs. For example,
applications for which our compiler can identify zero-qubit-
movement mappings have substantially higher likelihood of
success (up to 2.8x), compared to programs which require
even a single qubit movement operation.
Overall, NISQ systems are important to QC progress be-
cause their success in demonstrating quantum supremacy
and running small but useful QC programs is an important
stepping-stone in the maturation of this technology. In its
leveraging of intelligent and calibration-aware mapping tech-
niques to significantly improve execution time and success
rate of quantum executions, our tool makes an important
contribution in helping close the gap to quantum supremacy
and advancing toward practical QC.
2 Background on Quantum Computing
Principles of Quantum Computing: A qubit is the basic
unit of quantum information. Unlike classical bits, which
take two values (0 and 1), superposition allows qubits to
be in a probabilistic combination of the two states. If we
consider the states $|0\rangle$ and $|1\rangle$ as basis vectors of $\mathbb{C}^2$, we can
express the state of a qubit $|\psi\rangle$ as $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where $\alpha$
and $\beta$ are complex amplitudes such that $|\alpha|^2 + |\beta|^2 = 1$. The
state of one or more qubits can be manipulated by modifying
the complex amplitudes using operations termed gates.
Single-qubit operations include: H, X, Y, Z and others. The
act of measurement or readout collapses the superposition
state to one of the two basis vectors, a classical output.
A controlled NOT (CNOT) gate is an example of a two-qubit
gate. A CNOT gate has a control and a target qubit. When
the control qubit is in the state |1⟩, the state of the target qubit
is flipped. In quantum CNOT gates, the gate can operate on
qubits to entangle them to have non-classical correlations in
their states and measurement outputs. We use the notation
CNOT C, T for a CNOT gate with control C and target T.
A quantum computer with $n$ fully-entangled qubits has an
exponential state space of size $2^n$. In a QC application, a set
of qubits are initialized to encode a given problem including
its data input. As the program executes, qubit amplitudes
are manipulated, typically to boost the probabilities of the
desired outcomes in the state space. Finally, the qubits are
measured to produce classical output for the given problem.
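To make the amplitude picture concrete, the following minimal sketch (plain Python, not part of the paper's toolchain) tracks the four amplitudes of a two-qubit state, applies H to the first qubit and then a CNOT, and computes the measurement probabilities. The function names and the basis ordering |00⟩, |01⟩, |10⟩, |11⟩ are assumptions made for illustration.

```python
import math

# Amplitudes are stored in the order |00>, |01>, |10>, |11>,
# where the first bit is the control qubit and the second is the target.

def apply_h_on_first(state):
    """Apply a Hadamard gate to the first qubit of a 2-qubit state."""
    s = 1 / math.sqrt(2)
    a00, a01, a10, a11 = state
    return [s * (a00 + a10), s * (a01 + a11), s * (a00 - a10), s * (a01 - a11)]

def apply_cnot(state):
    """CNOT with the first qubit as control: exchanges the |10> and |11> amplitudes."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]

def probabilities(state):
    """Measurement probability |amplitude|^2 of each basis state."""
    return [abs(a) ** 2 for a in state]

state = [1, 0, 0, 0]                         # start in |00>
bell = apply_cnot(apply_h_on_first(state))   # (|00> + |11>) / sqrt(2)
probs = probabilities(bell)
```

After the two gates, only |00⟩ and |11⟩ have non-zero probability, which is the kind of non-classical correlation between measurement outputs that the text describes.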
NISQ Systems: NISQ systems are near-term quantum sys-
tems expected to scale to a few hundred qubits, paving the
way towards large-scale QC [11]. Qubits in NISQ systems
have short coherence times, high gate error rates, and
limited qubit connectivity. They are typically too resource-constrained
to implement error-correcting codes (ECC).
As a concrete NISQ example, Figure 2b shows the lay-
out of the qubits in the 16-qubit IBM system. This system
implements a set of 1- and 2-qubit operations, akin to an
instruction set. For 2-qubit operations, this machine only
supports hardware CNOT gates being performed between
adjacent qubits, based on the topology shown in Figure 2b.
To perform CNOT gates between non-adjacent qubits, we
should use SWAP operations between adjacent qubits until
the two of interest for a given CNOT computation are in
adjacent locations. Each SWAP operation between two adjacent
qubits itself requires 3 CNOT gates (see footnote 2). Our compiler aims
to reduce the time cost of these operations. More importantly,
each one of these operations incurs some error, so a key goal
of our optimization is to reduce operation counts and error
rates in order to increase the likelihood of an overall success-
ful run. We refer to this as reliability and it is the primary
design goal of this work.
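The three-CNOT construction of SWAP can be checked on computational basis states, where a CNOT acts as a classical conditional bit flip. Because quantum gates are linear, agreement on every basis state is enough to establish the identity. This is an illustrative sketch; the helper names are hypothetical.

```python
def cnot(bits, control, target):
    """Action of CNOT on a basis state: flip the target bit if the control bit is 1."""
    bits = list(bits)
    if bits[control] == 1:
        bits[target] ^= 1
    return tuple(bits)

def swap_via_cnots(bits, x, y):
    """SWAP(X,Y) := CNOT X,Y; CNOT Y,X; CNOT X,Y."""
    bits = cnot(bits, x, y)
    bits = cnot(bits, y, x)
    return cnot(bits, x, y)

# Apply the construction to all four 2-qubit basis states.
results = {b: swap_via_cnots(b, 0, 1) for b in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

Each entry of `results` maps a basis state to the same state with its two bits exchanged.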
In addition to compiler optimization based on attributes
like gate counts, our approach also adapts based on publicly-
available experimental data. In particular, the IBM Q ma-
chines are calibrated twice a day. Once a day there are pub-
lic postings of experimental measurements of key proper-
ties: qubit relaxation time (T1), coherence time (T2), gate
errors and readout errors [17]. From daily calibration logs,
we observe that qubit coherence time is 70 microseconds
on average, but varies up to 9.2x spatially and temporally
across qubits and daily calibrations. The average error rate
for CNOTs is 0.04, readouts is 0.07 and single qubit gates is
0.002. CNOT and readout error rates exhibit up to 9.0x and
5.9x variation across qubits and calibration cycles, respec-
tively. CNOT gate durations vary up to 1.8x across qubits.
These fluctuations stem from material defects caused by the
lithographic processes used to manufacture the qubits and
are expected to be present in future generations of supercon-
ducting qubits also [18].
These error rates imply only very short programs can
execute reliably on the machine. At the average CNOT error rate of 0.04,
a program with more than 16 CNOT operations has less than a 50% chance of
executing correctly. A key goal of our compiler optimizations is to use
this calibration data to boost the success rate of individual
program runs, by avoiding portions of the machine with
poor coherence, operation, or readout errors.
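The 16-CNOT figure follows from multiplying per-gate success probabilities. A small sketch, assuming independent errors and the average CNOT error rate of 0.04 quoted above (readout and single-qubit errors are ignored here):

```python
def success_probability(num_cnots, cnot_error=0.04):
    """Probability that every CNOT succeeds, assuming independent errors."""
    return (1.0 - cnot_error) ** num_cnots

def max_cnots_above(threshold, cnot_error=0.04):
    """Largest CNOT count whose success probability is still >= threshold."""
    n = 0
    while success_probability(n + 1, cnot_error) >= threshold:
        n += 1
    return n

limit = max_cnots_above(0.5)   # 16 for an error rate of 0.04
```

With 16 CNOTs the success probability is about 0.52. A 17th CNOT brings it just below 0.5.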
2 For two qubits X and Y, SWAP(X,Y) := {CNOT X,Y; CNOT Y,X; CNOT X,Y} [16].
3 Compilation Framework: Overview
Our framework takes a Scaffold program [14] as input, and
produces compiled OpenQASM code [13]. The Scaffold quan-
tum programming language extends C with quantum gates.
Scaffold programs are independent of the machine topol-
ogy, size and qubit properties. The ScaffCC compiler [19, 20]
performs automatic gate and rotation decomposition, imple-
ments high level operations like the Toffoli gate and produces
an LLVM Intermediate Representation (IR) [12] of the pro-
gram. The IR version of the program includes the qubits
required for each operation and the data dependencies be-
tween operations. For example, Figure 2a shows the IR for
the simple 4-qubit Bernstein-Vazirani algorithm which is
chosen because it fits on machines of this size and has an
answer which can be calculated to check our results [21]. We
use the program IR as a starting point for the noise-aware
backend described here.
Starting from the IR, the noise-aware backend has three
primary tasks. First, qubits in the program must be mapped
to distinct qubits in the hardware implementation, preferably
in a way that reduces qubit state movement required as the
program executes. Second, the compiler performs operation
scheduling while respecting data dependencies between
gates. To accomplish this, each operation is assigned a start
time constraint, and the scheduler emits control code that
enforces this. Third, to perform 2-qubit operations on non-
adjacent qubits, the compiler should orchestrate commu-
nication through SWAPs. That is, it automatically inserts
the required SWAP operations to bring the qubits adjacent
to each other before the operation is performed.
Consider a simple compilation method where program
qubits are assigned to random qubits on the hardware. Figure
2b shows such a mapping for the BV4 IR. In this mapping, the
compiler must insert qubit movement or swap operations to
perform the CNOT gates between p1 and p3. In contrast, the
mapping shown in Figure 2c requires no qubit movement
because the qubits required for the CNOTs are adjacent. In
addition, this mapping is noise-aware; namely, it uses the
calibration data to select a mapping that avoids using qubits
with low coherence time and gates with high error rates.
Our compiler uses machine topology and calibration data to
automatically generate such mappings for a given program.
Our primary goal is to maximize the likelihood that the
program runs successfully. To accomplish this, we have three
main strategies. First, the compiler places program qubits on
hardware locations with high reliability, based on the cali-
bration data. The compiler considers the effect of errors due
to CNOTs and readouts; for this machine, single-qubit error
rates are considerably smaller so our formulation chooses to
ignore them. Second, to mitigate errors due to decoherence,
the compiler should schedule all gates to finish before the
coherence time of the hardware qubits (intuitively analogous
to making use of data within the refresh interval of a DRAM).
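[Figure 2 panels: (a) Bernstein-Vazirani Intermediate Representation (circuit on program qubits p0 to p3; p0, p1, p2 each carry H, a CNOT control, H; p3 carries X, H, the CNOT targets, H). (b) Layout of qubits in IBMQ16 and a naive mapping for BV4. (c) Optimized mapping for BV4.]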
Figure 2. Figure (a) shows the intermediate representation of the Bernstein-Vazirani algorithm on 4 qubits (BV4). Each program
qubit is represented by a line. X and H are single qubit gates. The CNOT gates from each of the qubits p0, p1, p2 to p3 are marked by vertical
connectors. The measurement or readout operation is indicated by the meter. Figure (b) shows the layout of the hardware qubits
in IBMQ16 and a naive mapping of BV4’s program qubits. The black circles denote qubits and the edges indicate permitted
CNOT gates. The numbers on the labelled edges indicate the CNOT gate error ($\times 10^{-2}$). The hatched qubits and crossed gates
are unreliable. In this mapping, qubit movement is required to perform the CNOTs and error-prone operations are used. Figure
(c) shows a mapping where qubit movement is not required and unreliable qubits and gates are avoided.
[Figure 3 diagram. Inputs: QC application in LLVM IR; machine configuration and calibration data; compiler options (objective, routing policy, solver, etc.). Compiler stages: generate configuration constraints (mapping, scheduling and routing constraints); generate data-aware constraints (readout error, CNOT error and CNOT time constraints); solve constrained optimization (perform qubit mapping, gate scheduling and routing); executable OpenQASM code generation. Output: OpenQASM code optimized for current machine state.]
Figure 3. Optimization Pipeline. Inputs are a QC program,
details about the specific hardware configuration, and a set
of options, such as routing policy and solver approach. From
these, the compiler generates a set of appropriate constraints
and uses them to map program qubits to hardware qubits
and schedule operations. Finally, the compiler generates an
executable version of the program, here for IBMQ16.
Third, the compiler optimizes for the qubit topology to avoid
unnecessary qubit movement. Qubit movement not only in-
creases execution duration, but more importantly leads to
high error rates since each qubit SWAP operation includes
three error-prone CNOTs. We have designed a set of optimal
and heuristic compilation variants to accomplish these goals.
Table 1 enumerates the full set of compiler variants we con-
sider in this paper. In addition to the publicly-available IBM
Qiskit compiler we use as a comparative baseline, we also de-
velop several approaches which are either truly optimization-
based or heuristic. We give an overview of these approaches
here, before offering details in the following section.
3.1 Optimization-Based Mappings
In the optimization-based variants of our compiler, we imple-
ment the above goals by posing the compilation problem as
a constrained optimization problem to be solved by a satisfia-
bility modulo theory (SMT) solver. The optimization problem
has variables and constraints which express program infor-
mation, machine topology constraints, and machine error
information. The variables include program qubit locations,
gate start times and routing paths. The constraints specify
that qubit mappings should be distinct, gates should start in
program dependency order, and routing paths should be
non-overlapping. Fig. 3 summarizes the general compilation
pipeline for the solver-based approach, beginning with an
IR of a program and resulting in execution-ready code.
The optimization objective is to maximize the reliability
or success rate of program runs. We express the reliability
of the program as the product of the reliability of all gates
in the program. (Because of the degree of entanglement in
QC programs, this serves as a useful measure of overall
correctness.) For a given mapping, the solver determines the
reliability of each program CNOT, readout operation and
single qubit gate and computes an overall reliability score.
For the optimization variants which are noise-aware, the
solver can maximize the reliability score over all mappings
by tracking and adapting to the error rates, coherence limits,
and qubit movement based on program qubit locations.
Algorithm | Objective                        | Parameters                                      | Constraints
Qiskit    | Heuristic, minimize duration     | -                                               | -
T-SMT     | SMT solver, minimize duration    | Routing policy: RR, 1BP                         | 1-4, 7-9
T-SMT⋆    | SMT solver, minimize duration    | Routing policy: RR, 1BP                         | 1-3, 5-9
R-SMT⋆    | SMT solver, maximize reliability | Routing policy: 1BP; readout weight ω ∈ [0, 1]  | 1-3, 5-6, 9, 10-11
GreedyV⋆  | Heuristic, maximize reliability  | Routing policy: Best Path                       | -
GreedyE⋆  | Heuristic, maximize reliability  | Routing policy: Best Path                       | -
Table 1. List of compiler configurations used in our study. The IBM Qiskit 0.5.7 compiler is used as the baseline. The use of
calibration data is marked by a ⋆.
Given a target machine, our framework converts the program
IR into an optimization problem by expressing an objective
and constraints that can be solved using a Satisfiability
Modulo Theory (SMT) solver [22, 23]. For classical
programs, these solvers have been used to obtain optimal
hardware mapping and scheduling for spatial architectures
[24], but to our knowledge, ours is the first use of them for
QC systems. SMT solvers take as input a set of linear con-
straints, and an objective function and search for an optimal
solution. Although the reliability objective is a product of
individual gate reliability scores (and therefore non-linear),
we linearize the objective by instead optimizing for the addi-
tive logarithms of the reliability scores. An SMT solver can
then be invoked to find a mapping which maximizes the log
reliability.
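The linearization relies on the logarithm being monotonic: the mapping with the larger product of gate reliabilities is also the one with the larger sum of log reliabilities. A minimal sketch, with made-up reliability values for two hypothetical mappings:

```python
import math

def product_reliability(reliabilities):
    """Non-linear form: product of per-gate reliabilities."""
    p = 1.0
    for r in reliabilities:
        p *= r
    return p

def log_reliability(reliabilities):
    """Linear (additive) form: sum of per-gate log reliabilities."""
    return sum(math.log(r) for r in reliabilities)

# Illustrative numbers only; these are not measured values.
mapping_a = [0.9, 0.8, 0.95]
mapping_b = [0.9, 0.9, 0.85]

prefers_b_by_product = product_reliability(mapping_a) < product_reliability(mapping_b)
prefers_b_by_log_sum = log_reliability(mapping_a) < log_reliability(mapping_b)
```

Both comparisons agree, so maximizing the additive objective selects the same mapping as maximizing the product.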
Does maximizing the reliability score achieve our goal
of increasing program success rate? Optimizing for the
reliability score induces the compiler to place qubits at loca-
tions where CNOT and readout errors are low. It also indi-
rectly minimizes qubit movement because CNOTs between
far away qubits are error-prone. For example, for the BV4 IR,
consider the mapping shown in Figure 2b. Here, the reliability of
the CNOT between p0 and p3 is 0.8 (80% chance of executing
correctly), while the reliability of the CNOT between p1 and
p3 is only 0.65 (see footnote 3). Thus, the compiler will choose mappings
where communicating qubits are close together, minimiz-
ing unnecessary qubit movement and allowing gates to be
scheduled to finish within the coherence window.
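The reliability of a CNOT that needs qubit movement can be computed by multiplying in three CNOT reliabilities for every SWAP on the path. The sketch below reproduces the p1-to-p3 number from the example. It assumes that the three CNOTs of a SWAP share the reliability of the hardware edge they use; the function name is hypothetical.

```python
def routed_cnot_reliability(swap_edge_reliabilities, final_cnot_reliability):
    """Reliability of a CNOT preceded by SWAPs.

    swap_edge_reliabilities: CNOT reliability of each hardware edge that a
    SWAP crosses (each SWAP costs 3 CNOTs on that edge).
    final_cnot_reliability: reliability of the CNOT performed once the
    qubits are adjacent.
    """
    reliability = final_cnot_reliability
    for edge_reliability in swap_edge_reliabilities:
        reliability *= edge_reliability ** 3
    return reliability

p1_p3 = routed_cnot_reliability([0.9], 0.9)   # one SWAP, then the CNOT
p0_p3 = routed_cnot_reliability([], 0.8)      # adjacent qubits, no SWAP
```

Here `p1_p3` evaluates to 0.9^3 × 0.9 = 0.6561, which the text rounds to 0.65.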
3.2 Heuristic Mappings
We also determine whether heuristic techniques can ap-
proach the optimization-based results, but with better scal-
ability. For this, we develop two comparative algorithms
based on greedy heuristics. The greedy heuristics analyze the
CNOTs in the program IR, and determine a gate frequency
for each qubit and program CNOT.
3 p1 has to swap once to move to a location adjacent to p3. The net reliability
of the 3 CNOTs required to perform the SWAP is $0.9^3 = 0.729$. Then the
actual CNOT operation can be performed with reliability 0.9. Hence, the
overall CNOT reliability is $0.729 \times 0.9 \approx 0.65$.
We explore two policies. In the first policy, GreedyV⋆,
we place program qubits on hardware qubits in the heaviest
qubit first order. In the second policy, GreedyE⋆, we place
program CNOTs and their control and target qubits in a
heaviest edge first order. Intuitively, the first policy places
qubits which use more CNOTs in locations which have good
CNOT and readout error rates. The second policy places
pairs of qubits which have the most frequent CNOTs first.
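The two orderings can be sketched by counting CNOTs per program qubit and per qubit pair. This covers only the ordering step, not the placement onto hardware. Treating a CNOT pair as undirected and breaking ties by name are assumptions of the sketch, not details taken from the paper.

```python
from collections import Counter

def cnot_weights(cnots):
    """Count CNOTs per program qubit and per (undirected) qubit pair.

    cnots: list of (control, target) program-qubit names.
    """
    qubit_weight = Counter()
    edge_weight = Counter()
    for control, target in cnots:
        qubit_weight[control] += 1
        qubit_weight[target] += 1
        edge_weight[frozenset((control, target))] += 1
    return qubit_weight, edge_weight

def heaviest_qubit_first(cnots):
    """Order used by a GreedyV-style policy: qubits with the most CNOTs first."""
    qubit_weight, _ = cnot_weights(cnots)
    return sorted(qubit_weight, key=lambda q: (-qubit_weight[q], q))

def heaviest_edge_first(cnots):
    """Order used by a GreedyE-style policy: most frequent CNOT pairs first."""
    _, edge_weight = cnot_weights(cnots)
    edges = (tuple(sorted(e)) for e in edge_weight)
    return sorted(edges, key=lambda e: (-edge_weight[frozenset(e)], e))

bv4_cnots = [("p0", "p3"), ("p1", "p3"), ("p2", "p3")]
qubit_order = heaviest_qubit_first(bv4_cnots)
```

For BV4, p3 takes part in all three CNOTs, so it is placed first under the heaviest-qubit-first order.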
4 Optimal Compilation
4.1 Notations and Assumptions
Let $Q_P$ be the set of program qubits. Let $Q_H$ be the set of
hardware qubits. In this work, we assume hardware qubits
are arranged as a 2-D grid of dimensions $M_x \times M_y$. Likewise,
due to the connectivity characteristics of IBMQ16, we assume
only hardware qubits which are adjacent in the grid are permitted
to participate in two-qubit operations. More elaborate
topology and routing assumptions can be handled in future
work. For $q \in Q_P$, the ordered pair $(q.x, q.y)$ corresponds to
the location of the hardware qubit assigned to the program
qubit $q$. Let $G$ be the set of operations in the program. This
includes single-qubit gates such as $H$, the 2-qubit CNOT
gate, and qubit measurement or readout operations. CNOT
and readout operations dominate the reliability outcomes,
so the reliability score focuses on them. The subset of CNOT
gates is denoted by $G_{CNOT}$, and the subset of readout operations
is $G_{Readout}$. For each gate $g$ in the program, the start
time is denoted by $g.\tau$, duration by $g.\delta$, and reliability by
$g.\epsilon$. To denote data dependencies between the operations,
we use a binary relation $>$ on the gates, so that for two operations
$g_2 > g_1$ if $g_2$ depends on $g_1$. Although the reliability
objective focuses on a subset of operations, we map and
schedule all operations (including single-qubit operations)
to provide a valid real-system executable.
4.2 Constraints
Qubit Mapping Constraints: Constraint 1 guarantees all
program qubits are mapped to actual hardware qubits. Constraint
2 guarantees each program qubit is assigned a unique
location.
$\forall q \in Q_P : 0 \le q.x < M_x \wedge 0 \le q.y < M_y$ (1)
$\forall q_1, q_2 \in Q_P,\ q_1 \ne q_2 : q_1.x \ne q_2.x \vee q_1.y \ne q_2.y$ (2)
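Constraints 1 and 2 can be read as a validity check on a candidate mapping. The following sketch checks them directly in Python rather than encoding them for an SMT solver. The dictionary representation and the 8 × 2 grid in the usage lines are assumptions for illustration.

```python
def is_valid_mapping(mapping, mx, my):
    """Check Constraints 1 and 2 for a candidate qubit mapping.

    mapping: dict from program qubit name to its (x, y) hardware location.
    mx, my:  grid dimensions M_x and M_y.
    """
    # Constraint 1: every program qubit lies inside the hardware grid.
    in_bounds = all(0 <= x < mx and 0 <= y < my for x, y in mapping.values())
    # Constraint 2: no two program qubits share a hardware location.
    distinct = len(set(mapping.values())) == len(mapping)
    return in_bounds and distinct

ok = is_valid_mapping({"p0": (0, 0), "p1": (1, 0), "p3": (1, 1)}, 8, 2)
clash = is_valid_mapping({"p0": (0, 0), "p1": (0, 0)}, 8, 2)
outside = is_valid_mapping({"p0": (8, 0)}, 8, 2)
```

In the solver-based formulation the same conditions are stated over variables `q.x` and `q.y`, and the solver searches for values that satisfy them.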
Gate Scheduling Constraints: For each gate $g$ in the program,
the compiler determines the start time and execution
duration. If two gates $g_1$ and $g_2$ both operate on the same
qubit, and $g_2$ uses the output of $g_1$, $g_2$ should start only after
$g_1$ finishes. For every such edge in the dependency graph,
Constraint 3 shows the form we use to enforce such data
dependencies.
$\forall g_1, g_2 \in G : g_2 > g_1 \Rightarrow g_2.\tau \ge g_1.\tau + g_1.\delta$ (3)
The durations, δ , for single qubit operations are set using
the documented durations in timeslots of the corresponding
hardware operations. For CNOTs, the duration includes both
the operation itself as well as the time to bring the relevant
program qubit states into adjacent hardware qubits; this
depends on the routing policy and is discussed below.
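Constraint 3 can be illustrated with a checker and a simple as-soon-as-possible schedule. The scheduler below is a stand-in to show what a schedule satisfying the constraint looks like. It is not the SMT-based scheduling used in the paper, and the gate names and durations are invented.

```python
def respects_dependencies(start, duration, depends_on):
    """Check Constraint 3: each gate starts after the gates it depends on finish.

    depends_on: list of (g2, g1) pairs meaning g2 > g1 (g2 depends on g1).
    """
    return all(start[g2] >= start[g1] + duration[g1] for g2, g1 in depends_on)

def asap_start_times(duration, depends_on):
    """Earliest start times that satisfy Constraint 3 (assumes no cyclic dependencies)."""
    start = {g: 0 for g in duration}
    changed = True
    while changed:
        changed = False
        for g2, g1 in depends_on:
            earliest = start[g1] + duration[g1]
            if start[g2] < earliest:
                start[g2] = earliest
                changed = True
    return start

duration = {"h": 1, "cx": 4, "measure": 2}          # durations in timeslots
depends_on = [("cx", "h"), ("measure", "cx")]
start = asap_start_times(duration, depends_on)
```

For this chain of three gates the start times are 0, 1 and 5 timeslots.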
CNOT Duration based on Grid Distance: The duration
of a CNOT gate accounts for both CNOT time and the duration
of the swap paths before and after the CNOT. For
a CNOT $g \in G_{CNOT}$, let the control and target qubits
be $q_c$ and $q_t$. Then the duration of the CNOT is
$g.\delta = 2 (\|q_c - q_t\|_1 - 1) \tau_{SWAP} + \tau_{CNOT}$, where
$\|q_c - q_t\|_1 = |q_c.x - q_t.x| + |q_c.y - q_t.y|$ and $\tau_{SWAP}$, $\tau_{CNOT}$ are the times
to complete a SWAP or CNOT operation, respectively.
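The grid-distance duration formula translates directly into code. In this sketch qubit locations are (x, y) tuples, and the timeslot values in the usage lines are placeholders rather than documented hardware durations.

```python
def cnot_duration(qc, qt, tau_swap, tau_cnot):
    """Duration of a CNOT between grid locations qc and qt.

    Implements 2 * (L1 distance - 1) * tau_swap + tau_cnot: SWAPs move the
    qubits adjacent before the CNOT and move them back afterwards.
    """
    distance = abs(qc[0] - qt[0]) + abs(qc[1] - qt[1])
    if distance < 1:
        raise ValueError("control and target must be on distinct hardware qubits")
    return 2 * (distance - 1) * tau_swap + tau_cnot

adjacent = cnot_duration((0, 0), (1, 0), tau_swap=3, tau_cnot=1)
distant = cnot_duration((0, 0), (2, 1), tau_swap=3, tau_cnot=1)
```

Adjacent qubits pay only the CNOT time. At grid distance 3 the gate pays for two SWAPs before and two after the CNOT.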
The compiler must schedule operations before the individual
qubits decohere. For T-SMT (noise-unaware) we simply
assume a coherence time $M_T$ of 1000 timeslots,
which is the long-term average for the machine:
$\forall g \in G : g.\tau + g.\delta < M_T$ (4)
CNOT Duration based on Calibration Data: For T-
SMT⋆ and R-SMT⋆, we set durations based on calibration
data. In particular, since qubit coherence time changes daily
(Figure 1a) and CNOT gate durations vary across qubits,
these approaches use the calibration-based data in the opti-
mization constraint. To set durations based on calibration
data, we assume a routing policy and compute the CNOT durations
for each hardware qubit pair. Let $\Delta$ be a $|Q_H| \times |Q_H|$
matrix where $\Delta_{h_i,h_j}$, $i \ne j$, specifies the duration of a CNOT
between hardware qubits $h_i, h_j \in Q_H$. The duration of a
program CNOT can be set as: for all $g \in G_{CNOT}$ and for all
$h_1, h_2 \in Q_H$:
$g_c = h_1 \wedge g_t = h_2 \Rightarrow g.\delta = \Delta_{h_1,h_2}$ (5)
For the calibration-aware coherence time bound, Constraint
6 ensures every gate finishes before the coherence
time of the qubits it acts on, i.e., if a gate uses a hardware
qubit $h$, it should complete before $h$ decoheres, with $h.\tau$ as
the coherence time of a hardware qubit $h \in Q_H$. We have for
all $g \in G$ and for all $h_1, h_2 \in Q_H$:
$g_c = h_1 \wedge g_t = h_2 \Rightarrow g.\tau + g.\delta \le \min(h_1.\tau, h_2.\tau)$ (6)
Figure 4. Two routing policies for swap-based architectures: (a) Rectangle Reservation (RR); (b) One Bend Paths (1BP).
4.3 Routing for CNOT Gates
To route multiple CNOTs in parallel, the compiler uses two
routing policies based on policies in VLSI routing [25, 26].
Rectangle Reservation: In this policy, for every CNOT in
the program, the compiler blocks a 2D region bounded by
the control and target qubit, during the CNOT execution. For
example, in Figure 4a, the highlighted rectangle is reserved
for the duration of the CNOT.
Consider a CNOT gate $g_i \in G_{CNOT}$. Let $(l^i_x, l^i_y)$ and $(r^i_x, r^i_y)$
denote the top left and bottom right corners, respectively,
of the bounding rectangle $R_i$ of $g_i$. These variables are defined
using min and max relations on the qubit mapping variables
of the CNOT. For two CNOTs $g_i$ and $g_j$, the routing constraint
is:
$S(R_i, R_j) = \neg(l^i_x > r^j_x \vee r^i_x < l^j_x \vee l^i_y > r^j_y \vee r^i_y < l^j_y)$ (7)
$T(g_i, g_j) = \neg(g_i.\tau > g_j.\tau + g_j.\delta \vee g_j.\tau > g_i.\tau + g_i.\delta)$ (8)
Constraint S checks if the two rectangles overlap in space.
Constraint T checks whether CNOTs overlap in time. For
any pair of CNOTs $g_i$ and $g_j$, they cannot overlap in time if
they overlap in space: $S(R_i, R_j) \Rightarrow \neg T(g_i, g_j)$.
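The space and time overlap tests of Constraints 7 and 8 can be written as plain predicates. The sketch assumes grid coordinates in which "top left" is the corner with the smaller x and y values. All names are illustrative.

```python
def bounding_rect(control, target):
    """Bounding rectangle (lx, ly, rx, ry) of a CNOT's control and target locations."""
    (cx, cy), (tx, ty) = control, target
    return (min(cx, tx), min(cy, ty), max(cx, tx), max(cy, ty))

def overlap_space(r1, r2):
    """Constraint 7: True if the two rectangles overlap in space."""
    l1x, l1y, r1x, r1y = r1
    l2x, l2y, r2x, r2y = r2
    return not (l1x > r2x or r1x < l2x or l1y > r2y or r1y < l2y)

def overlap_time(start1, dur1, start2, dur2):
    """Constraint 8: True if the two execution intervals overlap in time."""
    return not (start1 > start2 + dur2 or start2 > start1 + dur1)

def routing_ok(r1, start1, dur1, r2, start2, dur2):
    """Rectangle reservation rule: overlap in space implies no overlap in time."""
    return (not overlap_space(r1, r2)) or (not overlap_time(start1, dur1, start2, dur2))

rect_a = bounding_rect((0, 0), (2, 1))
rect_b = bounding_rect((1, 1), (3, 1))
rect_c = bounding_rect((4, 0), (5, 0))
```

Rectangles `rect_a` and `rect_b` share grid cells, so their CNOTs must be scheduled at different times. `rect_a` and `rect_c` are disjoint and may run in parallel.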
One Bend Paths: In this policy, CNOT routes are restricted
to the two paths along the bounding rectangle of the control
and target qubit. For example, in Figure 4b, the CNOT is al-
lowed to use one of the two highlighted paths. To implement
this policy, the solver selects one of the two routes for every
CNOT in the program.
To express constraints for this policy, we use variables
to record the junction through which the CNOT is routed.
The one bend path is composed of two segments: control
to junction and junction to target. For generality, we can
consider these segments as rectangles, and apply the same
overlap check as in rectangle reservation. Denote the control
to junction path for CNOT $i$ as $R^{cj}_i$ and the junction to target path as $R^{jt}_i$. Then, we can check if
two CNOTs $g_i$ and $g_j$ overlap using:
$Overlap(i, j) = S(R^{cj}_i, R^{cj}_j) \vee S(R^{cj}_i, R^{jt}_j) \vee S(R^{jt}_i, R^{cj}_j) \vee S(R^{jt}_i, R^{jt}_j)$ (9)
Similar to rectangle reservation, we impose the condition
that CNOTs do not overlap in time if they overlap in space.
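For one bend paths, each route is two straight segments that meet at a corner of the bounding rectangle, and Constraint 9 checks the four segment pairs. A minimal sketch under the same coordinate assumptions as the rectangle check, with hypothetical helper names:

```python
def rect(a, b):
    """Rectangle (lx, ly, rx, ry) spanned by two grid points; a straight segment is a thin rectangle."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[0], b[0]), max(a[1], b[1]))

def overlap_space(r1, r2):
    """True if the two rectangles share at least one grid location."""
    l1x, l1y, r1x, r1y = r1
    l2x, l2y, r2x, r2y = r2
    return not (l1x > r2x or r1x < l2x or l1y > r2y or r1y < l2y)

def one_bend_junctions(control, target):
    """The two candidate junctions: the other two corners of the bounding rectangle."""
    return [(target[0], control[1]), (control[0], target[1])]

def path_segments(control, target, junction):
    """Control-to-junction and junction-to-target segments of a one bend path."""
    return rect(control, junction), rect(junction, target)

def paths_overlap(path_i, path_j):
    """Constraint 9: True if any segment of one path overlaps any segment of the other."""
    return any(overlap_space(a, b) for a in path_i for b in path_j)

c1, t1 = (0, 0), (2, 1)
c2, t2 = (0, 1), (1, 1)
route_top = path_segments(c1, t1, one_bend_junctions(c1, t1)[0])      # bend at (2, 0)
route_bottom = path_segments(c1, t1, one_bend_junctions(c1, t1)[1])   # bend at (0, 1)
other = path_segments(c2, t2, one_bend_junctions(c2, t2)[0])
```

Routing the first CNOT through (2, 0) avoids the second CNOT's qubits, while routing it through (0, 1) collides with them. This is the choice the solver makes when it picks a junction.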
4.4 Reliability Constraints
To optimize the reliability of program executions, we use
a set of constraints to track the reliability scores of CNOT
and readout operations in the program. Let $g.\epsilon$ denote the
reliability score for the operation $g$. For readout operations,
we set the reliability as
$\forall g \in G_{Readout} : \forall h \in Q_H : g.q = h \Rightarrow g.\epsilon = E^R_h$ (10)
where $E^R_h$ is the reliability score for readout operations on
hardware qubit $h$, and $G_{Readout} \subseteq G$ is the set of readout operations.
In R-SMT⋆ we perform reliability optimization using
the one bend paths routing policy. Under this policy, for
each CNOT gate, we set reliability tracking variables based on
the junction used for routing. For each pair of hardware
qubits, we compute the reliability of the two possible paths,
and store them in a matrix $E^C$, indexed by the hardware
qubits and junction. This reliability factors in the reliability
of the swap paths through the junction and the actual CNOT
operation. Let $g.j$ be the junction for gate $g \in G_{CNOT}$. The
constraints to track CNOT error are given for all $g \in G_{CNOT}$
and for all $h_1, h_2, h_j \in Q_H$:
$g_c = h_1 \wedge g_t = h_2 \wedge g.j = h_j \Rightarrow g.\epsilon = E^C_{h_1,h_2,j}$ (11)
In our experiments, considering the error rates of single
qubit gates such as H, X, Y etc. is not required for IBMQ16,
because their error rates are much smaller than CNOTs and
readouts. For systems where such errors matter, they can
be easily incorporated into the optimization using similar
constraints.
4.5 Optimal Compilation: Objective Function
The different optimization variants use different objective
functions. For the time-oriented variants T-SMT and
T-SMT⋆, the objective function is based on the execution
time for the program. Using the gate scheduling and dura-
tion constraints in Section 4, the objective is to minimize the
finish time of the last gate in the dependency order.
For the reliability-oriented variant, R-SMT⋆, the objec-
tive function is based on the reliability of a program ex-
ecution. We define the reliability of a program execution
as the product of the reliability of each of its gates. Since
single qubit gates have low error, we define the reliabil-
ity using CNOT and readout operations only. Ideally, the
reliability objective would be the product across all readout
and CNOT operations of their reliabilities for the whole program:
$\max \prod_{g \in G_{Readout} \cup G_{CNOT}} g.\epsilon$. Because the SMT solver requires
linear operations, we convert this to an additive linear
objective function by considering the logarithm of the operation
reliabilities, instead of their product. Finally, to allow for
different emphases on readout error versus CNOT error, we
convert the above objective into a weighted objective using
a weight $\omega$ which is applied to the readout term:
$\omega \sum_{g \in G_{Readout}} \log(g.\epsilon) + (1 - \omega) \sum_{g \in G_{CNOT}} \log(g.\epsilon)$. (12)
We use this objective to study the relative importance of
CNOT and readout error rates.
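As an illustration of how the weight ω in Eq. (12) shifts the preferred mapping, the following minimal sketch evaluates the weighted log-reliability objective for two candidate mappings. The function names and the reliability values are hypothetical and chosen only for illustration; the sketch evaluates the objective directly and does not model the SMT encoding.

```python
import math


def reliability_objective(readout_rel, cnot_rel, omega):
    """Weighted log-reliability objective in the form of Eq. (12).

    readout_rel: reliabilities (values in (0, 1]) of the readout operations.
    cnot_rel:    reliabilities of the CNOT operations (routing included).
    omega:       weight in [0, 1] applied to the readout term.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    readout_term = sum(math.log(r) for r in readout_rel)
    cnot_term = sum(math.log(r) for r in cnot_rel)
    return omega * readout_term + (1.0 - omega) * cnot_term


def pick_better_mapping(candidates, omega):
    """Return the name of the candidate with the largest objective value."""
    return max(
        candidates,
        key=lambda name: reliability_objective(*candidates[name], omega),
    )


# Hypothetical reliabilities for a program with two readouts and one CNOT.
CANDIDATES = {
    "A": ([0.95, 0.94], [0.90]),  # good readout qubits, weak CNOT
    "B": ([0.90, 0.89], [0.97]),  # weaker readout qubits, strong CNOT
}
```

With these made-up numbers, ω = 1 (readout only) prefers mapping A and ω = 0 (CNOT only) prefers mapping B, which mirrors the trade-off the weighted objective is meant to expose.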
Optimizing reliability places qubits at hardware locations
with high CNOT and readout reliability. It indirectly opti-
mizes qubit movement because CNOT gates between non-
adjacent qubits have low reliability. This objective is used
by R-SMT⋆ in our experiments. The output of the solver
has the optimal reliability with respect to the program and
machine model assumptions. Our experiments show that it
is also near-optimal in execution duration.
To compute a qubit mapping and gate schedule which maximizes
this objective, we set up an optimization problem using
this objective along with the mapping and scheduling constraints,
gate durations using calibration data, routing approaches,
and reliability constraints discussed before. The reliability
constraints make the g.ϵ variables dependent on the qubit
mapping variables.
5 Heuristic Compilation
Where tractable, the SMT-based compilation approach offers
the best chance at successful application runs on real hard-
ware. However, effective heuristic approaches may offer sim-
ilar reliability but scale better to future NISQ systems with
hundreds of qubits. Here we propose and evaluate heuris-
tic mapping/scheduling alternatives as comparators to the
optimization-based approaches.
Our heuristic techniques are also based on a program
graph constructed from the program IR. The program graph
has a node for every qubit, and an edge between every pair
of qubits which is involved in a CNOT. For example, the
program graph of BV4 has 4 nodes for p0, p1, p2, p3 and 3 edges, one
from each of p0, p1, p2 to p3. For each heuristic, we first compute
the most reliable path between every pair of hardware qubits
using Dijkstra’s algorithm, where edge weights are given as
the negative log of the CNOT reliabilities (one minus the CNOT
error rates from the calibration data).
For both heuristics, once we map the qubits, we schedule
gates using an earliest ready gate first policy [27] and route
based on the precomputed paths.
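The path precomputation can be sketched as follows. This is a minimal illustration, assuming that the reliability of a hardware CNOT is one minus its error rate and that the reliability of a path is the product of the CNOT reliabilities along it (the extra cost of SWAP operations is ignored here). The function names and the 4-qubit example graph are hypothetical.

```python
import heapq
import math


def most_reliable_paths(cnot_error, source):
    """Dijkstra's algorithm with -log(reliability) edge weights.

    cnot_error maps an undirected hardware edge (u, v) to its CNOT error rate.
    Returns (reliability, predecessor) dictionaries keyed by hardware qubit.
    """
    adjacency = {}
    for (u, v), err in cnot_error.items():
        weight = -math.log(1.0 - err)  # assumption: reliability = 1 - error
        adjacency.setdefault(u, []).append((v, weight))
        adjacency.setdefault(v, []).append((u, weight))

    dist = {source: 0.0}
    pred = {source: None}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in adjacency.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))

    # Summed -log weights convert back to a product of reliabilities.
    reliability = {q: math.exp(-d) for q, d in dist.items()}
    return reliability, pred


def path_to(pred, target):
    """Reconstruct the path from the source to `target`."""
    path = []
    while target is not None:
        path.append(target)
        target = pred[target]
    return path[::-1]


# Hypothetical 4-qubit ring with one poor CNOT between qubits 0 and 3.
EXAMPLE_ERRORS = {(0, 1): 0.02, (1, 2): 0.03, (0, 3): 0.10, (3, 2): 0.01}
```

In this made-up example the most reliable path from qubit 0 to qubit 3 goes around the ring (0, 1, 2, 3) rather than over the direct but error-prone link, which is the behavior that distinguishes reliability-weighted paths from hop-count shortest paths.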
5.1 Greatest Vertex Degree First
The GreedyV⋆ heuristic seeks to minimize communication
distance (and therefore reduce the number of error-prone
SWAP operations) by considering qubits in descending order
of degree. The degree of the qubit is the number of CNOTs
in which the qubit is used. First, place the highest degree
program qubit at the hardware location which has highest
readout reliability among high degree hardware qubits. Next,
for each program qubit which shares a CNOT with an al-
ready placed qubit, place this qubit in order to maximize the
total reliability of paths between it and each of its placed
neighbors, where the total reliability is given by the sum of
the path lengths computed between it and its neighbors.
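The ordering step of this heuristic can be sketched as below. The sketch covers only the program graph construction and the descending-degree ordering, not the placement itself; the helper names and the alphabetical tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter


def program_graph(cnots):
    """Edge weights: number of CNOTs between each pair of program qubits."""
    return Counter(frozenset(pair) for pair in cnots)


def qubit_degrees(cnots):
    """Degree of a program qubit: the number of CNOTs in which it is used."""
    degree = Counter()
    for control, target in cnots:
        degree[control] += 1
        degree[target] += 1
    return degree


def greedy_v_order(cnots):
    """Program qubits in descending order of degree.

    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    degree = qubit_degrees(cnots)
    return sorted(degree, key=lambda q: (-degree[q], q))


# BV4 as described in the text: one CNOT from each of p0, p1, p2 to p3.
BV4_CNOTS = [("p0", "p3"), ("p1", "p3"), ("p2", "p3")]
```

For BV4, p3 has degree 3 and is considered first, followed by p0, p1 and p2 with degree 1 each.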
5.2 Greatest Weighted Edge First
In GreedyE⋆, we map edges in the descending order of
weight. The weight of an edge between two nodes is the
number of times a CNOT gate is invoked between them.
Therefore, placing edges with high weight first allows qubits
which interact highly to be close together. Such placement
reduces qubit movement and increases reliability. The
algorithm starts by placing the highest weighted edge at the
hardware location with maximum CNOT and readout
reliability. Next, for each edge which has one mapped and one
unmapped endpoint, we map the unmapped qubit to the
position which maximizes the total reliability of CNOTs with
already mapped qubits, where the total reliability is given
by the sum of the path lengths computed from before be-
tween it and its neighbors. The process is repeated for each
unmapped edge in weight order.
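A minimal sketch of an edge-first greedy placement in this spirit is given below. It is not the authors' implementation: the function signature, the scoring by CNOT-count-weighted log path reliability, the tie-breaking, and the example data are all assumptions, and program graphs with several disconnected components are not handled.

```python
import math
from collections import Counter


def greedy_e(cnots, cnot_rel, path_rel, readout_rel):
    """Edge-first greedy placement (illustrative sketch).

    cnots:       list of (control, target) program-qubit pairs.
    cnot_rel:    {(h1, h2): reliability} for directly coupled hardware qubits.
    path_rel:    path_rel[h1][h2] = reliability of the best path h1 to h2.
    readout_rel: {h: readout reliability} for every hardware qubit.
    Returns a dict mapping program qubits to hardware qubits.
    """
    weights = Counter(frozenset(pair) for pair in cnots)
    edges = sorted(weights, key=lambda e: (-weights[e], sorted(e)))
    hardware = sorted(readout_rel)
    mapping = {}

    # Seed: heaviest program edge goes on the coupled hardware pair with the
    # best combined CNOT and readout reliability.
    a, b = sorted(edges[0])
    seed = max(
        cnot_rel,
        key=lambda p: cnot_rel[p] * readout_rel[p[0]] * readout_rel[p[1]],
    )
    mapping[a], mapping[b] = seed

    def score(q, h):
        # Log reliability of the paths from candidate location h to the
        # already placed neighbours of q, weighted by CNOT count.
        total = 0.0
        for edge, count in weights.items():
            if q in edge:
                (other,) = edge - {q}
                if other in mapping:
                    total += count * math.log(path_rel[h][mapping[other]])
        return total

    progress = True
    while progress:
        progress = False
        for edge in edges:  # edges are visited in descending weight order
            placed = [q for q in edge if q in mapping]
            if len(placed) != 1:
                continue
            (q,) = edge - set(placed)
            free = [h for h in hardware if h not in mapping.values()]
            mapping[q] = max(free, key=lambda h: score(q, h))
            progress = True
    return mapping


# Hypothetical 4-qubit line 0-1-2-3 with made-up reliabilities.
CNOT_REL = {(0, 1): 0.95, (1, 2): 0.98, (2, 3): 0.96}
PAIRS = dict(CNOT_REL)
PAIRS.update({(0, 2): 0.931, (1, 3): 0.9408, (0, 3): 0.89376})
PATH_REL = {h: {} for h in range(4)}
for (u, v), r in PAIRS.items():
    PATH_REL[u][v] = PATH_REL[v][u] = r
READOUT_REL = {0: 0.90, 1: 0.95, 2: 0.97, 3: 0.93}
CNOTS = [("a", "b"), ("a", "b"), ("b", "c")]
```

In this example the heaviest edge (a, b) lands on the hardware pair (1, 2), and c is then placed on qubit 3, next to b, because that free location gives the more reliable path to its already placed neighbor.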
6 Experimental Setup
Benchmarks: Table 2 lists 12 quantum programs derived
from prior work on compilation and system benchmarking
[28–30]. These benchmarks include the Bernstein-Vazirani al-
gorithm [21], Hidden Shift Algorithm [31], Quantum Fourier
Transform [32], a one bit adder and important quantum ker-
nels such as the Toffoli gate [16]. We used or created Scaffold
programs for each benchmark and obtained LLVM IR using
the ScaffCC compiler [19]. To be runnable on real-system QC
hardware, the benchmarks must be relatively small in qubit
counts and short in execution time steps. Nonetheless, our
ability to show order-of-magnitude improvements in success
rate for these programs is a promising indicator of the value
of such compilation techniques for future larger systems and
programs. Furthermore, several of these programs—such as
QFT and Toffoli—are important kernels for larger programs.
Beyond these, to study scalability trends across different
qubit and gate counts, we generate a synthetic benchmark
where we can specify the number of qubits and gates and
from this, we experiment with randomly generated quantum
programs with 4-128 qubits and 128-2048 gates. We generate
these circuits by uniformly sampling gates from the universal
gate set of H, X, Y, Z, S, T, CNOT.
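A generator of this kind can be sketched in a few lines. The text only states that gates are sampled uniformly from the gate set; choosing operand qubits uniformly at random, the tuple representation, and the `seed` parameter are assumptions of this sketch.

```python
import random

GATE_SET = ["H", "X", "Y", "Z", "S", "T", "CNOT"]


def random_circuit(num_qubits, num_gates, seed=None):
    """Sample `num_gates` gates uniformly from GATE_SET.

    Single-qubit gates are returned as (name, qubit) and CNOTs as
    (name, control, target). Operands are drawn uniformly at random, which
    is an assumption of this sketch. Requires num_qubits >= 2.
    """
    if num_qubits < 2:
        raise ValueError("need at least two qubits to place a CNOT")
    rng = random.Random(seed)
    circuit = []
    for _ in range(num_gates):
        gate = rng.choice(GATE_SET)
        if gate == "CNOT":
            control, target = rng.sample(range(num_qubits), 2)
            circuit.append((gate, control, target))
        else:
            circuit.append((gate, rng.randrange(num_qubits)))
    return circuit
```

For example, `random_circuit(32, 384, seed=1)` yields one random program at the 32-qubit, 384-gate size used in the scalability study.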
Compiler Configurations: To study various compilation
schemes, our framework includes various options for the
solver, routing policy, use of calibration data and other pa-
rameters. We evaluate these options one factor at a time
using the configurations listed in Table 1. We compare
R-SMT⋆ and T-SMT⋆ to demonstrate the benefits of
noise-adaptive compilation. We compare T-SMT⋆ and T-SMT to
demonstrate the importance of considering gate times and
coherence times from calibration data.
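Name     Qubits  Gates  CNOTs  CNOT Graph
BV4      4       12     3      [drawing]
BV6      6       12     3      [drawing]
BV8      8       18     3      [drawing]
HS2      2       16     2      [drawing]
HS4      4       28     4      [drawing]
HS6      6       42     6      [drawing]
Fredkin  3       19     8      [drawing]
Or       3       17     6      [drawing]
Peres    3       16     5      [drawing]
Toffoli  3       18     6      [drawing]
Adder    4       23     10     [drawing]
QFT      2       13     5      [drawing]
Table 2. Characteristics of benchmark programs.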
Experimental Setup: Our compilation experiments use
an Intel Skylake processor (2.6GHz, 12GB RAM) using
Python 3.5 and gcc version 5.4. Our optimization approach
uses the Z3 SMT solver [22]. To perform experiments on
IBMQ16, we use the IBM Quantum Experience APIs [8, 17].
The daily machine calibration data is available through the
Quantum Experience APIs. The calibration data includes
time data such as single qubit gate time, qubit coherence
time (T2 time), durations for CNOT gates, and error rates
such as single qubit gate error, CNOT gate error, and readout
(measurement) error. We use version 0.5.7 of IBM’s Qiskit
compiler/mapper as our baseline for comparison.
Metrics: Before each run, we obtain the latest calibration
data, and recompile the benchmark. We execute each bench-
mark on IBMQ16, using 8192 trials in each run. We mea-
sure the success rate as the fraction of trials which gave
the correct answer. For example, success rate of 0.6 means
the execution produced the correct answer in 60% of the
trials. The ideal success rate is 1, where all trials succeed.
Runs reported within a single graph are performed close together
in time, so their results are comparable. Results from different graphs may not be
comparable because the machine error characteristics can
be different across runs. We also study quantum execution
time and compilation time. Because timing granularity is so
coarse, execution time is estimated using real gate duration
data from the IBMQ16 system. We report durations in terms
of timeslots on IBMQ16, where each timeslot is 80ns.
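The two metrics can be made concrete with a short sketch. The 80 ns timeslot is taken from the text; the function names and the list-of-bitstrings representation of trial outcomes are assumptions for illustration.

```python
from collections import Counter

TIMESLOT_NS = 80  # duration of one IBMQ16 timeslot, as stated in the text


def success_rate(outcomes, correct):
    """Fraction of trials whose measured bitstring equals the correct answer."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    counts = Counter(outcomes)
    return counts[correct] / len(outcomes)


def timeslots_to_microseconds(timeslots):
    """Convert a duration in IBMQ16 timeslots to microseconds."""
    return timeslots * TIMESLOT_NS / 1000.0
```

For instance, a run in which 6 of 10 trials return the correct bitstring has a success rate of 0.6, and a program that finishes in 150 timeslots takes 12 microseconds.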
7 Optimizing Execution Reliability
Baseline Comparison to IBM Qiskit: We compare the success
rate of program runs from our compiler versus the IBM
Qiskit compiler for real-system runs on IBMQ16. Figure 5
shows the success rate of the IBM Qiskit compiler, T-SMT⋆
and R-SMT⋆ with ω = 0.5 on all the benchmarks. In all
[Figure 5 plot. x-axis: Benchmarks (BV4, BV6, BV8, HS2, HS4, HS6, Toffoli, Fredkin, Or, Peres, QFT, Adder); y-axis: Success Rate (0.0 to 1.0); series: Qiskit, T-SMT⋆, R-SMT⋆ with ω = 0.5.]
Figure 5. Measured success rate of R-SMT⋆ compared to
Qiskit and T-SMT⋆. (Of 8192 trials per execution, success
rate is the percentage that achieve the correct answer in
real-system execution.) R-SMT⋆ obtains higher success rate
than Qiskit because it simultaneously adapts placement
according to dynamic error rates and avoids unnecessary qubit
movement.
[Figure 6 plot. x-axis: Date (07/31/18 to 08/06/18); y-axis: Success Rate (0.3 to 0.6); series: Toffoli, BV4 and HS6, each with T-SMT⋆ and R-SMT⋆.]
Figure 6. Executions of three benchmarks for 1 week.
R-SMT⋆ is more resilient to errors compared to T-SMT⋆.
Similar trends for other benchmarks.
benchmarks, R-SMT⋆ has higher success rate than Qiskit,
indicating that its reliability-oriented objective function is
effective. In fact, R-SMT⋆ obtains a geomean 2.9x improvement
over Qiskit, with up to 18x gain. Figure 8 shows the mapping
used by Qiskit, T-SMT⋆ and R-SMT⋆ for BV4. Qiskit places
qubits in a lexicographic order without considering CNOT
and readout errors and incurs extra swap operations. For
BV8, the compiled code produced by Qiskit used 15 CNOT
operations to move qubits (in addition to the 3 CNOTs re-
quired by the algorithm), while R-SMT⋆ obtains a mapping
which requires no qubit movement. Each extra CNOT gate
increases both the error rate and the execution duration of
the code and leads to poor success rate. Benchmarks which
require no qubit movement such as BV, HS, QFT and Adder
have higher reliability than Toffoli, Fredkin, Or, and Peres,
which require at least one qubit swap.
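To give a rough sense of why the extra movement CNOTs matter, the sketch below applies the product-of-reliabilities model from Section 4.5 to the BV8 CNOT counts quoted above. The uniform CNOT error rate of 0.05 is a hypothetical value, not a measurement, and readout and single qubit errors are ignored.

```python
def cnot_only_reliability(num_cnots, cnot_error):
    """Product of identical, independent CNOT reliabilities."""
    return (1.0 - cnot_error) ** num_cnots


E = 0.05  # hypothetical uniform CNOT error rate, not a measured value

without_movement = cnot_only_reliability(3, E)    # 3 CNOTs required by BV8
with_movement = cnot_only_reliability(3 + 15, E)  # plus 15 movement CNOTs
```

Under these assumptions the CNOT contribution to reliability drops from about 0.857 to about 0.397, roughly a factor of two, purely from the additional movement operations.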
In all benchmarks, R-SMT⋆ outperforms T-SMT⋆, even
though they use the same number of qubit movement opera-
tions. While optimizing qubit communication is important,
it is essential to optimize for gate error rates to improve
success rate. In fact, in our experiments, when the machine
state has high variability, R-SMT⋆ can obtain up to 9.2x
improvement in success rate over T-SMT⋆ (see Figures 7 and 8).
Resilience to Daily Variations: Since IBM limits the
executions researchers may perform per day, we perform
detailed experiments on three benchmarks, BV4, HS6 and
Toffoli. These benchmarks are chosen as examples of differ-
ent CNOT patterns (see Table 2). Figure 6 compares the suc-
cess rate of R-SMT⋆ and T-SMT⋆ over a week for the three
benchmarks. The success rate of each program changes every
day because error rates of the hardware CNOT and readout
units change daily. (We recompile each day before running.)
For all three benchmarks, R-SMT⋆ is more resilient to error
than T-SMT⋆, since it adapts the qubit mappings to account
for daily variations in operation error rates. Since T-SMT⋆
compiles based on static information (qubit topology and
gate duration), it uses the same qubits and hardware gates
every day, irrespective of their dynamic error characteristics.
7.1 Choice of Optimization Objective
Figure 7 compares R-SMT⋆ with ω ∈ {0, 0.5, 1} and T-SMT⋆
on the three benchmarks. R-SMT⋆ with ω = 0.5 achieves the
highest success rate among the methods, with up to 9.25x
gain over T-SMT⋆. For BV4, we illustrate the mappings
obtained by these methods in Figure 8. T-SMT⋆ obtains
a mapping which requires no qubit movement, but it uses
a hardware CNOT with very high error rate. With ω = 1,
R-SMT⋆ optimizes only for readouts and uses long swap
paths which reduce success rate. With ω = 0.5, R-SMT⋆
maps qubits to simultaneously optimize CNOT gate error,
readout error and qubit movement.
R-SMT⋆ with ω = 0.5 also achieves near-optimal exe-
cution durations, comparable to T-SMT⋆, which directly
optimizes for duration. From the perspective of compilation
time, optimizing for reliability is harder than optimizing
execution duration. However, each method finds optimal
mappings in under a minute for each benchmark.
R-SMT⋆ was executed with ω ∈ [0, 1] to determine the relative
importance of optimizing for readout error and CNOT
error. In general, choosing an ω roughly near 0.5 is appro-
priate to obtain good success rates. On the IBMQ16 machine,
readout and CNOT error rates are fairly balanced, and hence
we see that an equal weighted combination of both is suitable
for optimization.
[Figure 7 plots, each over benchmarks BV4, HS6 and Toffoli, with series T-SMT⋆ and R-SMT⋆ for ω = 1, ω = 0 and ω = 0.5: (a) Success Rate (0.0 to 0.6); (b) Execution Duration in timeslots (0 to 250); (c) Compile Time in seconds (0 to 20).]
Figure 7. Measured success rate, execution duration and compile time for three representative benchmarks. T-SMT⋆ which
directly optimizes for execution duration obtains the minimum execution durations, but R-SMT⋆ with ω = 0.5 is close, and
more resilient to errors (higher success rate). All benchmarks compile in less than 1 minute.
(a) Qiskit
(b) T-SMT⋆: Optimize duration without error data
(c) R-SMT⋆ (ω = 1): Optimize readout reliability
(d) R-SMT⋆ (ω = 0.5): Optimize CNOT+readout reliability
Figure 8. For real data/experiment, on IBMQ16, qubit mappings for Qiskit and our compiler with three optimization objectives,
varying the type of noise-awareness. In each figure, the edge labels indicate the CNOT gate error rate (×10^-2), and the numbers
inside each node indicate that qubit’s readout error rate (×10^-2). The thin red arrows indicate CNOT gates. The thick yellow
arrows indicate SWAP operations. (a) Qiskit finds a mapping which requires SWAP operations and uses hardware qubits with
high readout errors. (b) T-SMT⋆ finds a mapping which requires no SWAP operations, but it uses an unreliable hardware
CNOT between p3 and p0. (c) Program qubits are placed on the best readout qubits, but p0 and p3 communicate using swaps.
(d) R-SMT⋆ finds a mapping which has the best reliability where the best CNOTs and readout qubits are used. It also requires
no SWAP operations.
7.2 Sensitivity to Gate Durations and Coherence Time
We test whether the use of real gate time data significantly
affects the execution duration of NISQ benchmarks. Our
compiler is run on three settings: T-SMT (RR), which assumes
all hardware CNOTs have the same gate duration, and
T-SMT⋆ (RR) and R-SMT⋆ (1BP), which use real gate durations.
We restrict R-SMT⋆ to the 1BP policy to reduce the number
of experimental configurations; we show in Section 7.3 that
the choice of routing policy does not affect execution duration
for NISQ benchmarks.
Gate Durations: Figure 9 shows execution duration, com-
puted using the gate time data, for the three methods. Consid-
ering real gate durations can improve the execution duration
for each benchmark, with up to 1.68x gain on Toffoli. Con-
sidering real durations increases the number of constraints
in the optimization problem and increases the compilation
time by up to 3x (not shown). Even with real durations, each
benchmark requires only a few seconds of compilation time.
Coherence Time: Each benchmark finishes in less than
150 timeslots using the R-SMT⋆method. Since the coherence
time of the worst qubit on the machine is more than 300
timeslots, considering fine grained variations in coherence
time is not necessary for our benchmarks.
[Figure 9 plot. x-axis: Benchmarks (BV4, BV6, BV8, HS2, HS4, HS6, Toffoli, Fredkin, Or, Peres, QFT, Adder); y-axis: Execution Duration (timeslots), 0 to 224; series: T-SMT RR, T-SMT⋆ RR, T-SMT⋆ 1BP, R-SMT⋆ 1BP.]
Figure 9. Effect of gate durations, routing policy and
objective function on execution duration. Although reliability is
our primary objective, several variants perform well on run
time as well. T-SMT⋆ (either RR or 1BP) has the best
execution duration, but R-SMT⋆ is very close in run time and
offers better success rates. Noise-aware policies, R-SMT⋆ and
T-SMT⋆, are 1.6x better than T-SMT.
[Figure 10 plot. x-axis: Benchmarks (BV4, BV6, BV8, HS2, HS4, HS6, Toffoli, Fredkin, Or, Peres, QFT, Adder); y-axis: Success Rate (0.0 to 1.0); series: R-SMT⋆ with ω = 0.5, GreedyE⋆, GreedyV⋆.]
Figure 10. Noise-aware Heuristics: GreedyE⋆ heuristic
mapping offers reliability comparable to R-SMT⋆ on most
benchmarks.
7.3 Effect of Routing Policy
Figure 9 compares the execution duration and compilation
time of T-SMT⋆ with two routing policies (RR and 1BP) and
R-SMT⋆ (1BP). The three policies produce executables with
similar execution duration since NISQ benchmarks are small,
and have only a few parallel CNOTs. Hence, most CNOTs
execute without swapping or blocking qubits. Although R-
SMT⋆ optimizes reliability, it obtains execution durations
close to T-SMT⋆ on all benchmarks.
7.4 Success Rate and Scalability of Heuristics
We compare the success rate of heuristics to the optimal
methods and evaluate the scalability of all methods.
Figure 10 compares the success rate of the heuristics and
R-SMT⋆. Greedy methods are comparable to R-SMT⋆ in
success rate and in some cases, they outperform R-SMT⋆
marginally because ω = 0.5 may not be the optimal value for
[Figure 11 plot. x-axis: Operation Count (128, 192, 256, 384, 512, 768, 1024, 1536, 2048); y-axis: Compilation Time (usec), 10^1 to 10^10; series: 4, 8 and 32 qubits (R-SMT⋆); 4, 8, 32 and 128 qubits (GreedyE⋆).]
Figure 11. Scalability of optimal and heuristic methods on
synthetic benchmarks. The legend shows a line’s qubit count.
every benchmark and machine state. GreedyE⋆ is as successful
as R-SMT⋆ in all cases. Our study reveals that the edge-based
heuristic, GreedyE⋆, is more successful than the vertex-based
heuristic GreedyV⋆. Considering edges instead of vertices
allows the heuristic to prioritize the reliability of the most
frequent CNOTs.
To study the scalability of optimal and heuristic methods,
we used a benchmark of randomly generated quantum pro-
grams. Figure 11 shows the compilation time on the bench-
mark. R-SMT⋆ requires up to 3 hours to compile a program
with 32 qubits and 384 gates. On the other hand, the greedy
methods compile programs in under one second in all cases.
8 Related Work
Quantum programming languages and their compilers have
been developed by extending languages such as C and
C# with quantum functionality. Examples include Quipper
[33, 34], LIQUi|⟩ [35], and Scaffold [19, 20]. ProjectQ [36]
is a Python framework to describe quantum circuits and
compile them for different backends. PyQuil, developed at
Rigetti [37, 38] is another such Python framework. Until
very recently, most backends were simulators or resource-
estimators, rather than real hardware. Our work here is an
early example of top-to-bottom compilation from a high-
level QC language (Scaffold) to real hardware.
OpenQASM [13] and Quil [39] are low-level assembly
language interfaces to QC hardware [8]. To target IBM ma-
chines, our compiler produces optimized OpenQASM code.
Our compiler can be easily extended to generate code for
other low-level interface languages also.
QC compilation has been studied for different hardware
technologies and topologies. [40] develops a heuristic to
schedule quantum circuits on linear topologies where all
gates (including swaps) consume unit time. [41] uses AI plan-
ners for scheduling a specific class of quantum circuits. [27]
develops heuristic techniques for ion trap systems. Recently,
[42] compiled small benchmarks for IBM systems, based on
only qubit topology information, not calibration data. Two
recent works [43, 44] reduce swap operations and optimize
1-qubit gates for 5-qubit IBM systems. Other prior works
[45–58] are either manual methods or restricted to a specific
architecture, or a specific class of quantum programs; none
account for real gate durations, gate errors and variations in
qubit coherence time. Similarly, other work has focused on
compilation issues in future QC systems with ECC [59–65].
In contrast to these works, our compiler is designed and
evaluated using a real IBM QC system. Using real-system
measurements, we show that driving compilation decisions
based on machine calibration and configuration data dramat-
ically improves program success rates.
[66, 67] observed the usefulness of calibration data. While
[66] uses error data manually to improve execution success,
[67] proposes the use of calibration data-aware qubit map-
ping and movement policies on the 20-qubit IBM system.
However, they do not perform any real hardware executions
of their mapped code, making it difficult to compare results
based on reliability. Their work also does not discuss how
program success rates are computed on the simulator and
uses error rates which are scaled by 10x. Simulated or scaled
success rates may not correlate well with real performance.
[68] is another recent work which maps circuits described
in the low-level OpenQASM language to IBMQ16. Their sim-
ulated annealing based method considers only CNOT error
rates to compute the qubit mapping. In contrast, our work
develops a toolflow which maps high-level programs onto
IBMQ16, using both CNOT and readout error rates, gate
times, coherence times and qubit layout. Using real-system
evaluations our work determines the relative importance of
these parameters and compares the performance of heuristic
and optimal techniques.
9 Conclusions
This paper proposed and evaluated calibration-aware com-
piler techniques for NISQ systems. We considered optimal
and heuristic compilation methods, the use of calibration
data, different objective functions and routing policies. Our
evaluations show it is crucial to adapt quantum program
compilation to dynamic operation error characteristics of the
machine. It is most important to consider CNOT and readout
error rates, since these operations are more noisy than single
qubit gates. Optimization based on qubit coherence time is
also useful, but less critical here because gate errors severely
limit useful computation time. Our research has shown that
SMT approaches are very effective for current and near-term
systems, but may not scale well to the far-NISQ machines of
500 qubits or more. For those, we have developed heuristic
approaches, GreedyV⋆ and GreedyE⋆, which offer nearly
as good results but with much more tractable compile times.
This paper’s results offer important insights on QC based
on real-system measurements. Our work shows the impor-
tance of initial qubit placement. Namely, benchmarks which
require more qubit movement are hard to reliably execute on
systems with grid topologies. Our results show that proper
placement could result in over 10X improvements in run
success rate. Mapping and scheduling based on calibration
data offer further benefits. Ultimately the best-performing
approach offered up to 18X improvement (2.9X average) in
success rate and up to 6X (2.7X average) improvement in
runtime over the current IBM Qiskit baseline. Our results
also give insights to future system designers. Developing
richer qubit topologies will reduce the need for SWAP op-
erations and improve the reliability of important quantum
primitives such as the Toffoli gate.
Our work is relevant for future QC systems for several
reasons. Fundamental unreliability in qubits [18] and short
coherence times, even with Schoelkopf’s coherence scaling
law [70], necessitate optimizations based on error rates and
gate times. Although QEC is promising in the long run, even
a single logical error-corrected qubit will be composed of
many noisy qubits and our methods will be useful to per-
form noise-adaptive compilation of error correcting circuits.
Our methods can also be extended to map programs to log-
ical qubits based on their error properties. Our techniques
can be adapted for other qubit technologies such as trapped
ions [71] and other routing approaches such as teleportation-
based communication [72] by choosing the appropriate con-
straints in the optimization.
Overall, given the challenges of building reliable and scal-
able QC hardware, the key for the next five years or more will
lie in ultra-efficient use of the resources available in NISQ
systems. Our tool offers important leverage in stewarding
runtime resource usage and optimizing reliability.
Acknowledgments
This work is funded in part by EPiQC, an NSF Expedition in
Computing, under grants CCF-1730449/1730082, in part by
NSF PHY-1818914 and a research gift from Intel.
References
[1] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung,
Xiao-Qi Zhou, Peter J. Love, Alán Aspuru-Guzik, and Jeremy L. O’Brien.
A variational eigenvalue solver on a photonic quantum processor.
Nature Communications, 5:4213, Jul 2014.
[2] Abhinav Kandala, Antonio Mezzacapo, Kristan Temme, Maika Takita,
Markus Brink, Jerry M. Chow, and Jay M. Gambetta. Hardware-
efficient variational quantum eigensolver for small molecules and
quantum magnets. Nature, 549:242, Sep 2017.
[3] P. Shor. Polynomial-time algorithms for prime factorization and dis-
crete logarithms on a quantum computer. SIAM Review, 41(2):303–332,
1999.
[4] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost,
Nathan Wiebe, and Seth Lloyd. Quantum machine learning. Nature,
549:195, Sep 2017.
[5] Julian Kelly. A Preview of Bristlecone, Google’s New
Quantum Processor. https://ai.googleblog.com/2018/03/
a-preview-of-bristlecone-googles-new.html, 2018. Accessed:
2018-08-05.
[6] IBM. IBM Announces Advances to IBM Quantum Systems and Ecosys-
tem. https://www-03.ibm.com/press/us/en/pressrelease/53374.wss,
2018. Accessed: 2018-08-05.
[7] Hsu, Jeremy. CES 2018: Intel’s 49-Qubit Chip Shoots for Quantum
Supremacy. https://spectrum.ieee.org/tech-talk/computing/hardware/
intels-49qubit-chip-aims-for-quantum-supremacy, 2018. Accessed:
2018-08-05.
[8] IBM. IBM Quantum Devices. https://quantumexperience.ng.bluemix.
net/qx/devices, 2018. Accessed: 2018-05-16.
[9] Simon J. Devitt, Ashley M. Stephens, William J. Munro, and Kae
Nemoto. Requirements for fault-tolerant factoring on an atom-optics
quantum computer. Nature Communications, 4:2524, Oct 2013.
[10] Martin Roetteler, Michael Naehrig, Krysta M. Svore, and Kristin Lauter.
Quantum resource estimates for computing elliptic curve discrete
logarithms, 2017.
[11] John Preskill. Quantum Computing in the NISQ era and beyond, 2018.
[12] Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for
Lifelong Program Analysis & Transformation. In Proceedings of the In-
ternational Symposium on Code Generation and Optimization: Feedback-
directed and Runtime Optimization, CGO ’04, pages 75–, Washington,
DC, USA, 2004. IEEE Computer Society.
[13] Andrew W. Cross, Lev S. Bishop, John A. Smolin, and Jay M. Gambetta.
Open quantum assembly language, 2017.
[14] Ali Javadi Abhari, Arvin Faruque, Mohammad Javad Dousti, Lukas
Svec, Oana Catu, Amlan Chakrabati, Chen-Fu Chiang, Seth Vander-
wilt, John Black, Fred Chong, Margaret Martonosi, Martin Suchara,
Ken Brown, Massoud Pedram, and Todd Brun. Scaffold: Quantum
programming language. Report TR-934-12, Princeton University, 2012.
[15] IBM. IBM Qiskit. https://qiskit.org/, 2018. Accessed: 2018-08-05.
[16] N. David Mermin. Quantum Computer Science: An Introduction. Cam-
bridge University Press, New York, NY, USA, 2007.
[17] IBM. IBM Quantum Experience. https://github.com/Qiskit/
qiskit-api-py, 2018. Accessed: 2018-11-16.
[18] P. V. Klimov, J. Kelly, Z. Chen, M. Neeley, A. Megrant, B. Burkett,
R. Barends, K. Arya, B. Chiaro, Yu Chen, A. Dunsworth, A. Fowler,
B. Foxen, C. Gidney, M. Giustina, R. Graff, T. Huang, E. Jeffrey, Erik
Lucero, J. Y. Mutus, O. Naaman, C. Neill, C. Quintana, P. Roushan,
Daniel Sank, A. Vainsencher, J. Wenner, T. C. White, S. Boixo, R. Bab-
bush, V. N. Smelyanskiy, H. Neven, and John M. Martinis. Fluctuations
of energy-relaxation times in superconducting qubits. Phys. Rev. Lett.,
121:090502, Aug 2018.
[19] Ali JavadiAbhari, Shruti Patil, Daniel Kudrow, Jeff Heckey, Alexey Lvov,
Frederic T. Chong, and Margaret Martonosi. Scaffcc: A framework
for compilation and analysis of quantum computing programs. In
Proceedings of the 11th ACM Conference on Computing Frontiers, CF
’14, pages 1:1–1:10, New York, NY, USA, 2014. ACM.
[20] ScaffCC Compiler. Compiler for the Scaffold Language. https://github.
com/epiqc/ScaffCC, 2018. Accessed: 2018-05-16.
[21] Ethan Bernstein and Umesh Vazirani. Quantum complexity theory. In
Proceedings of the Twenty-fifth Annual ACM Symposium on Theory of
Computing, STOC ’93, pages 11–20, New York, NY, USA, 1993. ACM.
[22] Leonardo de Moura and Nikolaj Bjørner. Z3: An Efficient SMT Solver.
In C. R. Ramakrishnan and Jakob Rehof, editors, Tools and Algorithms
for the Construction and Analysis of Systems, pages 337–340, Berlin,
Heidelberg, 2008. Springer Berlin Heidelberg.
[23] Nikolaj Bjørner, Anh-Dung Phan, and Lars Fleckenstein. νZ - An
Optimizing SMT Solver. In Christel Baier and Cesare Tinelli, editors,
Tools and Algorithms for the Construction and Analysis of Systems, pages
194–199, Berlin, Heidelberg, 2015. Springer Berlin Heidelberg.
[24] Tony Nowatzki, Michael Sartin-Tarm, Lorenzo De Carli, Karthikeyan
Sankaralingam, Cristian Estan, and Behnam Robatmili. A General
Constraint-centric Scheduling Framework for Spatial Architectures.
In Proceedings of the 34th ACM SIGPLAN Conference on Programming
Language Design and Implementation, PLDI ’13, pages 495–506, New
York, NY, USA, 2013. ACM.
[25] Teofilo F. Gonzalez and David Serena. Complexity of pairwise shortest
path routing in the grid. Theoretical Computer Science, 326(1):155 –
185, 2004.
[26] Christopher J. Glass and Lionel M. Ni. The turn model for adaptive
routing. J. ACM, 41(5):874–902, September 1994.
[27] Jeff Heckey, Shruti Patil, Ali JavadiAbhari, Adam Holmes, Daniel
Kudrow, Kenneth R. Brown, Diana Franklin, Frederic T. Chong, and
Margaret Martonosi. Compiler management of communication and
parallelism for quantum computation. In Proceedings of the Twenti-
eth International Conference on Architectural Support for Programming
Languages and Operating Systems, ASPLOS ’15, pages 445–456, New
York, NY, USA, 2015. ACM.
[28] Matthew Amy, Dmitri Maslov, Michele Mosca, and Martin Roetteler.
A meet-in-the-middle algorithm for fast synthesis of depth-optimal
quantum circuits. Trans. Comp.-Aided Des. Integ. Cir. Sys., 32(6):818–
830, June 2013.
[29] Norbert M. Linke, Dmitri Maslov, Martin Roetteler, Shantanu Debnath,
Caroline Figgatt, Kevin A. Landsman, Kenneth Wright, and Christo-
pher Monroe. Experimental comparison of two quantum comput-
ing architectures. Proceedings of the National Academy of Sciences,
114(13):3305–3310, 2017.
[30] Mathias Soeken, Thomas Haner, and Martin Roetteler. Programming
Quantum Computers Using Design Automation, 2018.
[31] Andrew M. Childs and Wim van Dam. Quantum algorithm for a gen-
eralized hidden shift problem. In Proceedings of the Eighteenth Annual
ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, pages 1225–
1232, Philadelphia, PA, USA, 2007. Society for Industrial and Applied
Mathematics.
[32] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and
Quantum Information: 10th Anniversary Edition. Cambridge University
Press, New York, NY, USA, 10th edition, 2011.
[33] Alexander S. Green, Peter LeFanu Lumsdaine, Neil J. Ross, Peter
Selinger, and Benoît Valiron. Quipper: A scalable quantum program-
ming language. In Proceedings of the 34th ACM SIGPLAN Conference
on Programming Language Design and Implementation, PLDI ’13, pages
333–342, New York, NY, USA, 2013. ACM.
[34] Alexander S. Green, Peter LeFanu Lumsdaine, Neil J. Ross, Peter
Selinger, and Benoît Valiron. Quipper: A scalable quantum program-
ming language. SIGPLAN Not., 48(6):333–342, June 2013.
[35] Dave Wecker and Krysta M. Svore. Liqui|>: A software design archi-
tecture and domain-specific language for quantum computing, 2014.
[36] Damian S. Steiger, Thomas Häner, and Matthias Troyer. ProjectQ: an
open source software framework for quantum computing. Quantum,
2:49, January 2018.
[37] Rigetti. PyQuil. https://github.com/rigetticomputing/pyquil, 2018.
Accessed: 2018-08-01.
[38] Rigetti. Rigetti Forest. http://forest.rigetti.com, 2018. Accessed: 2018-
08-01.
[39] Robert S. Smith, Michael J. Curtis, and William J. Zeng. A practical
quantum instruction set architecture, 2016.
[40] Gian Giacomo Guerreschi and Jongsoo Park. Two-step approach to
scheduling quantum circuits, 2017.
[41] Davide Venturelli, Minh Do, Eleanor Rieffel, and Jeremy Frank. Compil-
ing quantum circuits to realistic hardware architectures using temporal
planners. Quantum Science and Technology, 3(2):025004, 2018.
[42] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Sylvain Col-
lange, and Fernando Magno Quintao Pereira. Qubit allocation. In
Proceedings of the 2018 International Symposium on Code Generation
and Optimization, CGO 2018, pages 113–125, New York, NY, USA, 2018.
ACM.
[43] Xin Zhang, Hong Xiang, Tao Xiang, Li Fu, and Jun Sang. An efficient
quantum circuits optimizing scheme compared with Qiskit, 2018.
[44] Alwin Zulehner, Alexandru Paler, and Robert Wille. An efficient
methodology for mapping quantum circuits to the IBM QX architectures,
2017.
[45] Amlan Chakrabarti, Susmita Sur-Kolay, and Ayan Chaudhury. Linear
nearest neighbor synthesis of reversible circuits by graph partitioning,
2011.
[46] Byung-Soo Choi and Rodney Van Meter. On the effect of quantum
interaction distance on quantum addition circuits. J. Emerg. Technol.
Comput. Syst., 7(3):11:1–11:17, August 2011.
[47] C. Lin, A. Chakrabarti, and N. K. Jha. FTQLS: Fault-tolerant quantum
logic synthesis. IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, 22(6):1350–1363, June 2014.
[48] Mohammad Javad Dousti and Massoud Pedram. Minimizing the la-
tency of quantum circuits during mapping to the ion-trap circuit fab-
ric. In Proceedings of the Conference on Design, Automation and Test
in Europe, DATE ’12, pages 840–843, San Jose, CA, USA, 2012. EDA
Consortium.
[49] C. Lin, S. Sur-Kolay, and N. K. Jha. PAQCS: Physical design-aware fault-
tolerant quantum circuit synthesis. IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, 23(7):1221–1234, July 2015.
[50] A. Kole, K. Datta, and I. Sengupta. A new heuristic for n-dimensional
nearest neighbor realization of a quantum circuit. IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, 37(1):182–
192, Jan 2018.
[51] D. Ruffinelli and B. Baran. A multiobjective approach to linear nearest
neighbor optimization for 2d quantum circuits. In 2016 XLII Latin
American Computing Conference (CLEI), pages 1–8, Valparaiso, Chile,
Oct 2016. IEEE.
[52] Mrityunjay Ghosh, Amlan Chakrabarti, and Niraj K. Jha. Automated
quantum circuit synthesis and cost estimation for the binary welded
tree oracle. J. Emerg. Technol. Comput. Syst., 13(4):51:1–51:14, June
2017.
[53] M. Pedram and A. Shafaei. Layout optimization for quantum circuits
with linear nearest neighbor architectures. IEEE Circuits and Systems
Magazine, 16(2):62–74, Second Quarter 2016.
[54] Mark Whitney, Nemanja Isailovic, Yatish Patel, and John Kubiatowicz.
Automated generation of layout and control for quantum circuits. In
Proceedings of the 4th International Conference on Computing Frontiers,
CF ’07, pages 83–94, New York, NY, USA, 2007. ACM.
[55] Paul Pham and Krysta M. Svore. A 2d nearest-neighbor quantum
architecture for factoring in polylogarithmic depth. Quantum Info.
Comput., 13(11-12):937–962, November 2013.
[56] Mehdi Saeedi, Robert Wille, and Rolf Drechsler. Synthesis of quantum
circuits for linear nearest neighbor architectures. Quantum Information
Processing, 10(3):355–377, June 2011.
[57] Alireza Shafaei, Mehdi Saeedi, and Massoud Pedram. Optimization of
quantum circuits for interaction distance in linear nearest neighbor
architectures. In Proceedings of the 50th Annual Design Automation
Conference, DAC ’13, pages 41:1–41:6, New York, NY, USA, 2013. ACM.
[58] Simon J. Devitt. Programming Quantum Computers Using 3-D Puzzles,
Coffee Cups, and Doughnuts. XRDS, 23(1):45–50, September 2016.
[59] Alexandru Paler, Simon J. Devitt, Kae Nemoto, and Ilia Polian. Mapping
of topological quantum circuits to physical hardware. Scientific Reports,
4:4657, Apr 2014.
[60] Adam Paetznick and Austin G. Fowler. Quantum circuit optimization
by topological compaction in the surface code, 2013.
[61] Alexandru Paler, Austin G. Fowler, and Robert Wille. Synthesis of
arbitrary quantum circuits to topological assembly: Systematic, online
and compact. Scientific Reports, 7(1):10414, 2017.
[62] Alexandru Paler, Ilia Polian, Kae Nemoto, and Simon J Devitt. Fault-
tolerant, high-level quantum circuits: form, compilation and descrip-
tion. Quantum Science and Technology, 2(2):025003, 2017.
[63] Y. Lin, B. Yu, M. Li, and D. Z. Pan. Layout synthesis for topological
quantum circuits with 1-d and 2-d architectures. IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, 37(8):1574–
1587, Aug 2018.
[64] Alexandru Paler, Austin G. Fowler, and Robert Wille. Online scheduled
execution of quantum circuits protected by surface codes, 2017.
[65] L. Lao, B. van Wee, I. Ashraf, J. van Someren, N. Khammassi, K. Bertels,
and C. G. Almudever. Mapping of lattice surgery-based quantum
circuits on surface code architectures, 2018.
[66] Christophe Vuillot. Is error detection helpful on IBM 5Q chips?, 2017.
[67] Swamit S. Tannu and Moinuddin K. Qureshi. A Case for Variability-
Aware Policies for NISQ-Era Quantum Computers, 2018.
[68] Will Finigan, Michael Cubeddu, Thomas Lively, Johannes Flick, and
Prineha Narang. Qubit allocation for noisy intermediate-scale quan-
tum computers, 2018.
[69] X. Fu, M. A. Rol, C. C. Bultink, J. van Someren, N. Khammassi, I. Ashraf,
R. F. L. Vermeulen, J. C. de Sterke, W. J. Vlothuizen, R. N. Schouten,
C. G. Almudever, L. DiCarlo, and K. Bertels. A microarchitecture for
a superconducting quantum processor. IEEE Micro, 38(3):40–47, May
2018.
[70] Adam Sears. Extending Coherence in Superconducting Qubits: from
Microseconds to Milliseconds. PhD dissertation, Yale University, 2013.
[71] S. Debnath, N. M. Linke, C. Figgatt, K. A. Landsman, K. Wright, and
C. Monroe. Demonstration of a small programmable quantum computer
with atomic qubits. Nature, 536:63, Aug 2016.
[72] K. S. Chou, J. Z. Blumoff, C. S. Wang, P. C. Reinhold, C. J. Axline,
Y. Y. Gao, L. Frunzio, M. H. Devoret, Liang Jiang, and R. J. Schoelkopf.
Deterministic teleportation of a quantum gate between two logical
qubits, 2018.
