Machine learning of noise-resilient quantum circuits by Cincio, Lukasz et al.
Lukasz Cincio,1 Kenneth Rudinger,2 Mohan Sarovar,3, ∗ and Patrick J. Coles1, †
1Theoretical Division, MS 213, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
2Quantum Computer Science, Sandia National Laboratories, Albuquerque, NM 87185, USA
3Extreme-scale Data Science and Analytics, Sandia National Laboratories, Livermore, CA 94550, USA
Noise mitigation and reduction will be crucial for obtaining useful answers from near-term quan-
tum computers. In this work, we present a general framework based on machine learning for reducing
the impact of quantum hardware noise on quantum circuits. Our method, called noise-aware circuit
learning (NACL), applies to circuits designed to compute a unitary transformation, prepare a set
of quantum states, or estimate an observable of a many-qubit state. Given a task and a device
model that captures information about the noise and connectivity of qubits in a device, NACL
outputs an optimized circuit to accomplish this task in the presence of noise. It does so by minimiz-
ing a task-specific cost function over circuit depths and circuit structures. To demonstrate NACL,
we construct circuits resilient to a fine-grained noise model derived from gate set tomography on a
superconducting-circuit quantum device, for applications including quantum state overlap, quantum
Fourier transform, and W -state preparation.
I. INTRODUCTION
Recent years have seen a surge in quantum computer
hardware development, and we now have several quan-
tum computing platforms with tens of qubits that can
be controlled and coupled with fidelities that enable ex-
ecution of quantum circuits of limited depth. This has
led to intense interest in formulating quantum algorithms
that can be reliably executed on such devices. The chal-
lenge however is that naive compilations of nearly all non-
trivial quantum algorithms require circuit depths that
are currently out of reach for near-term hardware. Mo-
tivated by this challenge, in this work we study how ma-
chine learning (ML) can be applied to formulate noise-
aware quantum circuits that can be executed on near-
term quantum hardware to produce reliable results.
Our method is called noise-aware circuit learning
(NACL), and given a suitable description of a computational
task and a device model that captures the noise
and constraints of a device, it outputs a native circuit
that performs the task with greatest robustness to noise.
NACL has several broad applications, as illustrated in
Fig. 1. The task can be the compilation of a specified
unitary transformation (Fig. 1(a)), the preparation of a
target state from a specified input state (Fig. 1(b)), or
the extraction of an observable from a many-qubit state
(Fig. 1(c)). In each case, NACL returns a circuit that
is significantly more resilient to the given noise
model; however, as we detail below, the formulation of
the machine learning problem is different in each appli-
cation. Perhaps the most familiar version of NACL is
that depicted in Fig. 1(a), where a specified unitary ma-
trix is to be implemented by a circuit composed of native
gates, which is usually called compilation. In this con-
text, NACL results in noise-aware circuit compilations.
∗ mnsarov@sandia.gov
† pcoles@lanl.gov
FIG. 1. Applications of NACL. (a) In compiling, the goal is
to approximate an input unitary matrix U by a noise-resilient
circuit that is compatible with the device constraints. (b) In
state preparation, one inputs a set of N input and output
states {|xi〉, |yi〉}, where N could be as small as one, and the
output is a noise-resilient circuit that approximately prepares
the |yi〉 states from the |xi〉 states. (c) In observable extrac-
tion, one inputs a set of input states and classical outputs
that typically correspond to local observable expectation val-
ues, {|xi〉, yi}, and the output is a noise-resilient circuit that
approximately computes the outputs from any input state |ψ〉
that might or might not be in the input set.
Previous work on circuit optimization for noise mit-
igation has largely considered the task of compilation,
under restricted models of errors or imperfections. In
fact, most work focuses on reducing overall circuit er-
ror by reducing the number of two-qubit gates (which
tend to be more noisy than single-qubit gates), avoid-
ing faulty qubits, reducing the number of SWAP gates
required in architectures with restricted connectivity, or
reducing the amount of qubit idle time and/or overall circuit
depth [1–8]. These strategies incorporate very little
information about errors present in a particular hardware
platform. More recent work on error-aware compilation
by Murali et al. [9] goes beyond this and includes basic
calibration information (e.g., qubit T2 times, CNOT gate
error rates) to compile circuits using more reliable qubits
and gates.
arXiv:2007.01210v1 [quant-ph] 2 Jul 2020
In this work we extend this direction even further and
demonstrate that one can use fine-grained error model
information to increase the reliability of the outputs of
quantum circuits. Incorporating detailed noise models
into one’s circuit optimization, as we do here, is par-
ticularly compelling at present with the advent of ad-
vanced characterization techniques like gate-set tomog-
raphy [10, 11]. These techniques produce fine-grained
details – e.g., estimates of process matrices representing
the action of imperfect quantum gates – describing the
actual evolution of qubits in near-term hardware. We
will demonstrate that such experimentally derived noise
models can be used to go beyond naive circuit compila-
tions for several example quantum algorithms.
NACL has several additional strengths relative to ex-
isting approaches in the literature. Crucially, NACL
takes a task-oriented approach to quantum circuit dis-
covery, which implies that one does not need a starting
point or example quantum circuit that already accom-
plishes the task. Note that traditional compilers do re-
quire such a quantum circuit to start from. Furthermore,
because NACL does not start from a template circuit,
the optimization is less susceptible to bias. In contrast,
standard literature methods that tweak a given quantum
circuit inherently bias their optimization towards solu-
tions that look like that starting point. This means that
NACL has the potential to discover more novel solutions
that otherwise would not be obvious to the human mind.
In addition, we will see that NACL naturally balances
the trade-off between circuit depth, which leads to more
expressivity, and circuit noise, which makes outputs less
accurate.
Machine learning was previously applied to train pa-
rameterized quantum circuits [6, 12], albeit in a noise-
free setting. In addition, variational quantum algorithms
(VQAs) [13–26] can also be thought of as machine learn-
ing of quantum circuits. In the Discussion (Sec. VII), we
elaborate on the relationship between NACL and VQAs.
In what follows, we first present our theoretical frame-
work (Sec. II). We then discuss a device model with
experimentally determined noise parameters (Sec. III).
Next, we present our implementations of NACL with this
noisy device, for examples from the three different appli-
cation classes shown in Fig. 1 (Secs. IV - VI). Finally,
we conclude with a discussion in Sec. VII.
FIG. 2. Schematic diagram of NACL. Our approach takes a
task and a device model as an input. The task is defined via
examples in a training set and a cost function, C. That infor-
mation is sufficient to find a noise-aware circuit that approx-
imates a specified task. It is done via optimization over a set
of parameters (L,k,θ) that describe a quantum circuit. The
algorithm returns parameters (Lopt,kopt,θopt), which repre-
sent an optimized quantum circuit that minimizes the cost
function C. See text for details.
II. MACHINE LEARNING FRAMEWORK
A. Overview
A schematic diagram of the steps of NACL is shown in
Figure 2. There are two inputs to NACL: (1) a task, and
(2) a device model. The output of NACL is an optimized
quantum circuit that accomplishes the specified task under
the specified device model. NACL may
not output a globally optimal solution (this depends on
details of the cost function landscape and optimization
method used), but even local optima are improvements
over circuit compilations that are not noise-aware.
Note that circuit depth is not an input to NACL. This
is because NACL optimizes over circuit depths, and aims
to find the depth that achieves the most noise resilience.
In addition, an ansatz for the circuit is not an input,
because NACL attempts to optimize over many ansätze.
Hence, the structure of the circuit, as well as its depth,
are optimized by NACL. This feature of NACL is in the
spirit of task-oriented programming, where the user only
needs to specify the task, and not the details of the cir-
cuit. NACL adapts the circuit structure to optimize a
cost function that depends on the type of task specified.
As shown in Fig. 1, there are three categories of tasks.
In what follows we provide more details on how NACL
works. Sections II B and II C describe the parameterized
circuit and the device model with its noise specification,
and Section II D defines the NACL cost function for each
application. Finally, Section II E summarizes the
optimization methods used by NACL.
B. Parameterized circuit
For a given quantum hardware, we denote the native
gate set or gate alphabet as A = {Aj(θ)}. Each gate
Aj is either a one- or two-qubit gate and may also have
an internal continuous parameter θ. As an example, the
IBM Q 5-qubit computer “Ourense” has the native gate
alphabet
AOurense = {CNOT12, CNOT23, CNOT24, CNOT45,
Z1(θ), X1(π/2), Z2(θ), X2(π/2),
Z3(θ), X3(π/2), Z4(θ), X4(π/2),
Z5(θ), X5(π/2)} , (1)
where CNOTjk is a CNOT between qubits j and k, Zj(θ)
is a rotation of angle θ about the z-axis of qubit j, and
Xj(π/2) is a rotation of angle π/2 about the x-axis of
qubit j (also called a pulse gate).
Such a gate set is supplemented by state preparation
and measurement quantum operations. These are typ-
ically fixed in most quantum computing architectures
(e.g., prepare all qubits in the ground state and measure
in the computational basis), so there is
no opportunity for optimizing over them. We therefore
do not consider these as part of the learnable set.
We consider a generic gate sequence that defines a cir-
cuit
Gα = G(L,k,θ) = AkL(θL) · · ·Ak2(θ2)Ak1(θ1) , (2)
where L is the number of gates, k = (k1, ..., kL) is the
vector of indices describing which gates are utilized in the
gate sequence, θ = (θ1, ..., θL) is the vector of continuous
parameters associated with these gates, and α = (L,k,θ)
is the set of all these parameters. All parameters in α =
(L,k,θ) are optimized over in NACL.
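As a minimal illustration of Eq. (2), the sketch below stores a circuit as the pair (k, θ) over a toy one-qubit alphabet and multiplies the gates in order of application. The alphabet, the `Circuit` class, and `circuit_unitary` are hypothetical names introduced here for illustration; they are not taken from the NACL implementation.

```python
from dataclasses import dataclass
import numpy as np

def rz(theta):
    """Rotation by angle theta about the z-axis."""
    return np.array([[np.exp(-1j * theta / 2), 0],
                     [0, np.exp(1j * theta / 2)]])

def rx_half_pi(_theta=None):
    """Fixed X(pi/2) pulse gate; its parameter slot is ignored."""
    return np.array([[1, -1j], [-1j, 1]]) / np.sqrt(2)

# Toy one-qubit alphabet (an assumption): gate index -> gate factory.
ALPHABET = [rz, rx_half_pi]

@dataclass
class Circuit:
    k: list       # discrete gate indices (k_1, ..., k_L)
    theta: list   # continuous parameters (theta_1, ..., theta_L)

    @property
    def L(self):
        return len(self.k)

def circuit_unitary(c):
    """Return A_{k_L}(theta_L) ... A_{k_1}(theta_1); gate 1 acts first."""
    u = np.eye(2, dtype=complex)
    for idx, th in zip(c.k, c.theta):
        u = ALPHABET[idx](th) @ u
    return u
```

Two consecutive Z rotations then compose to a single rotation by the summed angle, and two pulses compose to an X(π) rotation, which gives a quick sanity check on the ordering convention.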
C. Device model
An input to NACL is a device model, which captures
the constraints of a device (e.g., limited connectivity)
and also represents the noise in the device. We assume
the device constraints and connectivity are captured by
the specification of a native gate alphabet for the device,
e.g., Eq. (1). Only gates that are available are listed in
this specification.
The salient characteristics of noise are captured by (i)
process matrices for each element of the device’s native
gate alphabet, and (ii) for state preparation and mea-
surement (SPAM) noise, by quantum-classical channels
that represent noisy state preparation or measurement
POVM elements. The assumption of a fixed process ma-
trix for each gate in the alphabet restricts this treatment
to Markovian noise. This can be relaxed by generalizing
to time-dependent process matrices for each elementary
gate, but we do not do this here for simplicity, and also
because characterization tools capable of producing such
non-Markovian representations of quantum computer op-
erations are still in early stages of development [27]. Sim-
ilarly, in this treatment we mostly ignore the effects of
crosstalk, and assume that the process matrix describing
a gate operates only on the qubits the ideal gate is de-
fined on. Properly incorporating crosstalk into the noise
models that NACL considers requires advances in char-
acterization methods [28] that we discuss later.
Given this paradigm for representing noisy quantum
operations, each gate in the alphabet A has an associated
process matrix that accounts for the local noise occurring
during that gate. Note that even the identity gate may
have a non-trivial process matrix, for example due to
relaxation during idling.
Mathematically speaking, the noise model provides a
map from a parameterized circuit Gα to a parameterized
quantum channel Eα:
$$G_{\alpha} \xrightarrow{\ \text{Noise Model}\ } \mathcal{E}_{\alpha} . \tag{3}$$
Here, Eα is a completely positive trace preserving
(CPTP) map that represents the action of Gα in a noisy
environment.
Specifically, when the noise model is given in the form
of process matrices for gates, one can do the follow-
ing. Let A = {Aj(θ)} denote the gate alphabet asso-
ciated with the noiseless gates. In the presence of noise,
this gate alphabet becomes a set of quantum channels,
A¯ = {A¯j(θ)}, where we note that A¯j(θ) now denotes
a quantum channel. Now suppose that Gα is given by
Gα = AkL(θL) · · ·Ak2(θ2)Ak1(θ1). Then the simplest way
to incorporate the noise model would be to replace each
Aki with A¯ki ; i.e., to transform Gα into a sequence of
quantum channels:
Eα = A¯kL(θL) ◦ · · · ◦ A¯k2(θ2) ◦ A¯k1(θ1) . (4)
However, it is important to note that this formula for Eα
only accounts for the non-trivial gates that were in the
original circuit Gα. In practice, identity gates
are also noisy due to, e.g., thermal relaxation.
Therefore, care must be taken with respect to identity
gates, and we discuss this next.
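The composition in Eq. (4) can be sketched numerically by representing each noisy gate as a superoperator acting on the vectorized density matrix. In the sketch below, the noise model (an ideal unitary followed by single-qubit depolarizing noise of strength p) is a stand-in assumption chosen for brevity; the process matrices used in this work come from GST instead. All function names are hypothetical.

```python
import numpy as np

def unitary_superop(u):
    """Superoperator of rho -> u rho u^dagger on the row-major vec(rho)."""
    return np.kron(u, u.conj())

def depolarizing_superop(p, d=2):
    """Superoperator of rho -> (1 - p) rho + p Tr(rho) I/d."""
    vec_id = np.eye(d).reshape(-1)
    return (1 - p) * np.eye(d * d) + (p / d) * np.outer(vec_id, vec_id)

def noisy_gate(u, p):
    """Toy noisy gate (assumption): ideal unitary, then depolarizing noise."""
    return depolarizing_superop(p, u.shape[0]) @ unitary_superop(u)

def compose(channels):
    """E = A_L o ... o A_2 o A_1, where channels[0] is applied first."""
    total = np.eye(channels[0].shape[0], dtype=complex)
    for s in channels:
        total = s @ total
    return total

def apply_superop(s, rho):
    """Apply a superoperator to a density matrix."""
    d = rho.shape[0]
    return (s @ rho.reshape(-1)).reshape(d, d)
```

Because every element is a matrix, composing a sequence of noisy gates reduces to an ordered matrix product, which is what makes repeated cost-function evaluation inexpensive for few-qubit circuits.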
1. Parallelization
The object we are optimizing over, the circuit in Eq.
(2), needs to be modified in the presence of imperfect
idle operations. In this case, the sensible thing to do
is to perform as many gates in parallel as possible, but
the description of a circuit as a sequence of gates, as in
Eq. (2), is incomplete because it does not capture which
gates can be performed in parallel. In other words, in the
presence of imperfect idle operations we cannot simply
think of Gα as a linear sequence of gates; we have to
map Gα to a two-dimensional circuit diagram, in space
and time.
Abstractly, we can rewrite
$$G^{\rm Par}_{\alpha} = G_{\alpha} = U_{\alpha,M} \cdots U_{\alpha,2} U_{\alpha,1} . \tag{5}$$
Here, each Uα,j represents a layer of gates that can be
parallelized. Specifically, we take the circuit proposed in
Gα and compress it using simple circuit rules to minimize
idling of qubits. For example, an X(π/2) rotation
that occurs on the target qubit after a CNOT can be
moved to before the CNOT because their actions on the
target qubit commute. In this manner, each gate in Gα
is moved to as early a time as possible without changing
the unitary being implemented by Gα. This naturally
defines the circuit layers and subsequently GParα . Even
though the reordering does not change the overall uni-
tary, whenever we write Gα in the form in Eq. (5) we de-
note it as GParα . An important aspect of the optimization
in NACL is to numerically find the parallelized represen-
tation, GParα , that yields the minimum error in the cost
functions detailed below.
Once Gα is rewritten in the form of GParα , we can then
account for noise by replacing each gate in GParα by the
quantum channel that represents its noisy implementation.
For example, suppose a circuit layer in GParα on a 5-qubit
processor is
$$U_{\alpha,j} = Z^{1}(\theta) \otimes \mathrm{CNOT}^{23} \otimes I^{4} \otimes X^{5}, \tag{6}$$
where the superscript indicates which qubit(s) each gate
operates on, Z(θ) is a rotation around the Z axis,
CNOT is a CNOT gate, I is the identity, and X is a
π/2 rotation around the X axis. This layer would be
replaced by
$$\bar{U}_{\alpha,j} = \bar{Z}^{1}(\theta) \otimes \overline{\mathrm{CNOT}}^{23} \otimes \bar{I}^{4} \otimes \bar{X}^{5}, \tag{7}$$
where the quantities with bars above them are the quan-
tum channels representing those gates. Then the overall
noisy circuit corresponding to GParα is written as
EParα = U¯α,M ◦ · · · ◦ U¯α,2 ◦ U¯α,1. (8)
It is important to note that NACL uses EParα rather than
Eα as the overall noisy channel associated with Gα.
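The layering step can be illustrated with a simple as-soon-as-possible scheduler. The sketch below is a simplification under stated assumptions: it only respects qubit dependencies, it does not apply the commutation rules mentioned above, and it treats every gate as taking one time step. The function names and the (name, qubits) gate encoding are hypothetical.

```python
def asap_layers(gates, n_qubits):
    """Greedy as-soon-as-possible layering.

    gates: list of (name, qubits) in circuit order, qubits a tuple of indices.
    Returns a list of layers, each a list of gates on disjoint qubits.
    """
    next_free = [0] * n_qubits          # earliest free layer for each qubit
    layers = []
    for name, qubits in gates:
        t = max(next_free[q] for q in qubits)
        while len(layers) <= t:
            layers.append([])
        layers[t].append((name, qubits))
        for q in qubits:
            next_free[q] = t + 1
    return layers

def with_idles(layers, n_qubits):
    """Fill every unused qubit slot with an explicit idle gate 'I'."""
    padded = []
    for layer in layers:
        busy = {q for _, qubits in layer for q in qubits}
        idles = [("I", (q,)) for q in range(n_qubits) if q not in busy]
        padded.append(layer + idles)
    return padded
```

After padding, each layer lists one operation per qubit, so replacing every entry (including each explicit idle) by its noisy channel yields a layer of the form in Eq. (7).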
Note that this procedure of parallelizing and incorpo-
rating the noise model that we have outlined is valid
because our noise models do not account for crosstalk
effects. If crosstalk is significant, then this strategy of
maximizing parallelization might not be optimal since
performing many gates in parallel may lead to more noise.
Moreover, in the presence of significant crosstalk, captur-
ing processor noise using quantum channels for each of its
gates is probably insufficient. Instead, one would need to
characterize each possible layer (there are an exponential
number of these) since the operation on a qubit due to
application of a gate could depend on what is done to
any other qubit in the computer at the same time. We
discuss how to extend NACL in the presence of crosstalk
in Sec. VII.
D. Cost functions
In this subsection, we construct the cost functions that
are minimized by NACL in each of the application classes
outlined in Fig. 1.
1. Preliminaries
We first define some relevant quantities. Let
$F(\rho, \sigma) = \left(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right)^2$
be the fidelity between two states ρ and
σ. For a given pure input state |ψ〉, we can denote the
fidelity of the output states under quantum channels E
and F as
F (E ,F , |ψ〉) := F (E(|ψ〉〈ψ|),F(|ψ〉〈ψ|)) . (9)
We will be interested in the case where F corresponds to
a unitary process U , in which case we have
F (E ,U , |ψ〉) = Tr(E(|ψ〉〈ψ|)U(|ψ〉〈ψ|)) . (10)
Furthermore, we can define the average process fidelity
as
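\begin{align}
\bar{F}(\mathcal{E},\mathcal{U}) &= \int d\psi\, F(\mathcal{E},\mathcal{U},|\psi\rangle) \tag{11} \\
&= \int d\psi\, \mathrm{Tr}\big(\mathcal{E}(|\psi\rangle\langle\psi|)\,\mathcal{U}(|\psi\rangle\langle\psi|)\big) , \tag{12}
\end{align}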
with the integral taken over the Haar measure.
2. Observable extraction
The first class of applications involves estimating an
observable given one, or a set of, input states. An exam-
ple of this is computing the overlap of two quantum states
(discussed in Section IV). In this application, the output
of the circuit is a classical number (the observable expec-
tation, which in practice is estimated by many executions
of the circuit) denoted f(x), and the input, denoted |x〉,
is a quantum state (or classical data encoded in a quan-
tum state). Hence, we want to construct a circuit that
computes the function |x〉 → f(x). We classically generate
a training data set of the form
T = {(|x(i)〉, f(x(i)))}Ni=1 . (13)
In general, the amount of training data required could
scale exponentially in the problem size (i.e., number of
qubits), since the data must be general enough to cover
the space of possible inputs.
Recall that the parameters α define a circuit Gα,
which in turn defines a noisy quantum channel EParα . For
this quantum channel, let y(i)α denote the output of the
circuit (i.e., the expectation of the observable of interest)
when the input is |x(i)〉. Then we define the cost function
as
$$C_{\rm OE}(\alpha) = \frac{1}{N}\sum_{i=1}^{N}\left(f(x^{(i)}) - y^{(i)}_{\alpha}\right)^2 . \tag{14}$$
The cost quantifies the discrepancy between the desired
output f(x(i)) and the true output y(i)α , averaged over all
training data points.
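A minimal sketch of Eqs. (13) and (14) for the state-overlap task follows. It assumes pure single-qubit input states and takes the circuit as an opaque callable `circuit_output` that maps an input pair to an estimated expectation value; both choices, and all function names, are assumptions made for illustration.

```python
import numpy as np

def random_pure_state(rng, d=2):
    """Random normalized state vector (Gaussian amplitudes)."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def make_training_set(n_samples, seed=0):
    """Pairs of states labeled with their classically computed overlap."""
    rng = np.random.default_rng(seed)
    data = []
    for _ in range(n_samples):
        a, b = random_pure_state(rng), random_pure_state(rng)
        data.append(((a, b), abs(np.vdot(a, b)) ** 2))
    return data

def cost_oe(circuit_output, training_set):
    """Eq. (14): mean squared error between f(x) and the circuit output."""
    errs = [(f_x - circuit_output(x)) ** 2 for x, f_x in training_set]
    return float(np.mean(errs))
```

An ideal circuit gives zero cost on this set, while any systematic attenuation of the measured expectation value, as noise typically causes, gives a strictly positive cost.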
3. State Preparation
A second class of applications outlined in Fig. 1 is
state preparation. Here, the input is a quantum state
or more generally a set of quantum states {|x(i)〉}Ni=1.
The task is then to construct a circuit U that prepares
the output states {|y(i)〉 = U |x(i)〉}Ni=1 from these input
states. In other words, one wishes to learn a unitary U
that accomplishes the desired state preparation task on
the training data, {|x(i)〉, |y(i)〉}Ni=1. Note that this is an
under-constrained problem since in the state preparation
application N ≪ 2^n, where U is an n-qubit unitary. In
this case, we use the following cost function:
$$C_{\rm SP}(\alpha) = 1 - \frac{1}{N}\sum_{i=1}^{N} F\left(\mathcal{E}^{\rm Par}_{\alpha}, \mathcal{U}, |x^{(i)}\rangle\right), \tag{15}$$
where U(·) ≡ U(·)U†. This is the infidelity between the state
prepared by EParα and the target state |y(i)〉, averaged
over the training data points. A typical scenario is when
there is a single input and output state (N = 1), as we
will consider below in Section V.
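Equation (15) can be evaluated directly once the noisy channel is available in some explicit form. The sketch below assumes the channel is given as a list of Kraus operators, which is one of several equivalent representations and not necessarily the one used in this work; `apply_kraus` and `cost_sp` are hypothetical names.

```python
import numpy as np

def apply_kraus(kraus_ops, rho):
    """Apply a channel given by Kraus operators to a density matrix."""
    return sum(k @ rho @ k.conj().T for k in kraus_ops)

def cost_sp(kraus_ops, target_unitary, inputs):
    """Eq. (15): 1 - mean of <y_i| E(|x_i><x_i|) |y_i>, with |y_i> = U|x_i>."""
    fids = []
    for x in inputs:
        y = target_unitary @ x
        rho_out = apply_kraus(kraus_ops, np.outer(x, x.conj()))
        fids.append(np.real(np.vdot(y, rho_out @ y)))
    return 1.0 - float(np.mean(fids))
```

For a target unitary followed by single-qubit depolarizing noise of strength p, this cost equals p/2 for any pure input state.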
4. Compilation
Finally we consider the application of compiling a tar-
get unitary, U , into a set of native gates. The action of
U on all possible input quantum states must be repro-
duced. This is a more challenging task than constructing
a state-preparation circuit, since one must consider the
action on all states rather than just on one state or a
small set of states.
Let U(·) ≡ U(·)U† denote the quantum channel as-
sociated with U . Then we define the cost function for
compiling as
$$C_{\rm UC}(\alpha) = 1 - \bar{F}(\mathcal{E}^{\rm Par}_{\alpha}, \mathcal{U}) . \tag{16}$$
Note that this is analogous to Eq. (15) with the discrete
average replaced by a continuous average (i.e., integral
with Haar measure). The average, F¯ , can be computed in
various ways. Most elegantly, the average process fidelity
is related to the entanglement fidelity Fe, via [29–31]
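$$\bar{F}(\mathcal{E}^{\rm Par}_{\alpha}, \mathcal{U}) = \frac{d\,F_e(\mathcal{U}^{\dagger} \circ \mathcal{E}^{\rm Par}_{\alpha}) + 1}{d + 1} , \tag{17}$$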
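where $F_e(\mathcal{E}) = \langle\phi|(\mathcal{I}\otimes\mathcal{E})(|\phi\rangle\langle\phi|)|\phi\rangle = F(|\phi\rangle\langle\phi|, (\mathcal{I}\otimes\mathcal{E})(|\phi\rangle\langle\phi|))$,
with $|\phi\rangle = \sum_j |j\rangle|j\rangle/\sqrt{d}$ being a maximally entangled
state, and $d = 2^n$ being the Hilbert-space dimension.
Therefore, we can compute the compilation cost function
by computing $F(|\phi\rangle\langle\phi|, (\mathcal{I}\otimes\mathcal{U}^{\dagger}\circ\mathcal{E}^{\rm Par}_{\alpha})(|\phi\rangle\langle\phi|))$. From
the machine learning perspective, the training data set in
this case just consists of a pair of states $\{|\phi\rangle, (\mathbb{1}\otimes U)|\phi\rangle\}$.
However, this approach requires a computation in a doubled
space of dimension $2^{2n}$.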
Alternative approaches to computing F¯ that trade this
greater memory complexity for greater time complexity
(but can be easily parallelized) are (i) to approximate the
Haar average with a sample average over a set of states
that form a 2-design, or (ii) to use Nielsen’s formula in
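terms of Pauli operators $\{\sigma_i\}_{i=1}^{d^2}$ [31]
$$\bar{F}(\mathcal{E}^{\rm Par}_{\alpha}, \mathcal{U}) = \frac{1}{d^2(d+1)}\left(\sum_{i=1}^{d^2}\mathrm{Tr}\left(U\sigma_i U^{\dagger}\,\mathcal{E}^{\rm Par}_{\alpha}(\sigma_i)\right) + d^2\right) .$$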
From the machine learning perspective, for (i), the training
data set corresponds to the sampled 2-design and
the action of the ideal channel on these, {|φi〉, U |φi〉},
and for (ii) the training data set corresponds to the Pauli
operators and the action of the ideal channel on these,
{σi, UσiU†}.
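As a numerical cross-check of the two routes above, the sketch below evaluates the average fidelity once through the entanglement fidelity in the doubled space, Eq. (17), and once through the Pauli-sum formula. It assumes the noisy channel is supplied as Kraus operators; that representation and the function names are assumptions made for illustration.

```python
import itertools
import numpy as np

PAULIS = [np.eye(2), np.array([[0, 1], [1, 0]]),
          np.array([[0, -1j], [1j, 0]]), np.array([[1, 0], [0, -1]])]

def apply_kraus(kraus_ops, op):
    """Apply a channel given by Kraus operators to an operator."""
    return sum(k @ op @ k.conj().T for k in kraus_ops)

def avg_fidelity_entanglement(kraus_ops, u):
    """Eq. (17): (d*Fe + 1)/(d + 1), with Fe computed in the doubled space."""
    d = u.shape[0]
    phi = np.eye(d).reshape(-1) / np.sqrt(d)        # sum_j |j>|j> / sqrt(d)
    target = np.kron(np.eye(d), u) @ phi            # (1 (x) U)|phi>
    lifted = [np.kron(np.eye(d), k) for k in kraus_ops]
    rho_out = apply_kraus(lifted, np.outer(phi, phi.conj()))
    fe = np.real(np.vdot(target, rho_out @ target))
    return (d * fe + 1) / (d + 1)

def avg_fidelity_pauli(kraus_ops, u):
    """Pauli-sum formula over the d^2 n-qubit Pauli operators."""
    d = u.shape[0]
    n = int(round(np.log2(d)))
    total = 0.0
    for combo in itertools.product(PAULIS, repeat=n):
        sigma = combo[0]
        for p in combo[1:]:
            sigma = np.kron(sigma, p)
        e_sigma = apply_kraus(kraus_ops, sigma)
        total += np.real(np.trace(u @ sigma @ u.conj().T @ e_sigma))
    return (total + d * d) / (d * d * (d + 1))
```

The first routine stores a d² × d² density matrix, whereas the second only ever applies the channel to d × d operators but loops over d² of them, which mirrors the memory versus time trade-off described above.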
E. Optimization Methods
In this Section we describe the techniques used to find
optimal values of parameters αopt = (Lopt, kopt,θopt) for
a given task and device model, see Fig. 2. The methods
are general and could be applied to any cost function.
In particular, they are applicable to the cost functions
associated with the applications discussed in Section II D.
The space in which the optimization takes place is large
and has a complicated form. In our method we are opti-
mizing over circuits composed of gates taken from a par-
ticular alphabet. The circuit is described by two kinds
of parameters, discrete and continuous. The discrete pa-
rameters k define the circuit’s layout. That is, they spec-
ify what type of gate is acting on a given qubit, at a given
time during the evaluation of the circuit. The continu-
ous parameters θ span all gates that contain a variational
parameter. In the example of an alphabet derived from
the IBM Q Ourense device in Eq. (1), only Z rotations
contain a continuous parameter.
The optimization is an iterative procedure in which
every iteration is organized in two nested loops. In the
inner one, the optimizer deals with continuous parame-
ters with a fixed circuit layout k. Changes to the struc-
ture of the circuit are introduced in the outer loop. The
optimization over continuous parameters θ is straightfor-
ward. Once the structure parameters k are fixed, the cost
function depends on at most L continuous parameters θi.
We use off-the-shelf unconstrained methods (the cost function is
invariant under θi → θi + 2π, so no bounds are needed) to find a minimum
of the cost function Ck = Ck(θ).
When the minimum c of Ck = Ck(θ) is found, the optimizer
switches to the outer loop and makes a change
in the structural parameters k. In this part of the pro-
cedure the optimizer is testing small, random updates to
the structure of the circuit. Those updates include gate
shuffling, gate removal as well as inserting new gates in
the form of resolutions of identity (1-qubit and 2-qubit
ones). This way, the number of gates L in the circuit is
variable and reaches an optimal (noise dependent) value
during the optimization, see below for more detailed dis-
cussion. After new structural parameters k′ are identi-
fied, the optimizer enters the inner loop and varies con-
tinuous parameters θ to find a new minimum c′ of a cost
function Ck′ = Ck′(θ). Finally, the optimizer makes a
decision whether or not the old circuit structure k should
be replaced by the new one k′. Here we follow the simu-
lated annealing approach and accept the change if c′ < c.
If c′ > c, the change is accepted only with a probability that
decreases exponentially in c′ − c, and is rejected otherwise.
The above describes one iteration of the optimization
algorithm. The iterations are repeated until convergence
of the cost function is observed. The optimization is also
restarted multiple times to detect possible local minima.
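The nested-loop procedure can be sketched on a toy problem: learning a one-qubit Hadamard from free Z(θ) rotations and noisy X(π/2) pulses. Everything specific here is an assumption made for illustration: each pulse is followed by depolarizing noise of strength `P_PULSE` (so the cost has a closed form), the structural moves are reduced to single-gate insertion and removal, the temperature is fixed, and SciPy's BFGS stands in for the off-the-shelf continuous optimizer.

```python
import numpy as np
from scipy.optimize import minimize

X90 = np.array([[1, -1j], [-1j, 1]]) / np.sqrt(2)
TARGET = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard
P_PULSE = 0.01   # toy depolarizing strength per pulse (assumption)

def rz(t):
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

def cost(k, theta):
    """1 - average fidelity; k[i] == 0 is a free Z(theta), k[i] == 1 a noisy pulse."""
    v, angles = np.eye(2, dtype=complex), iter(theta)
    for g in k:
        v = (rz(next(angles)) if g == 0 else X90) @ v
    lam = (1 - P_PULSE) ** sum(k)                   # surviving coherent weight
    f_unitary = (abs(np.trace(TARGET.conj().T @ v)) ** 2 + 2) / 6
    return 1 - (lam * f_unitary + (1 - lam) / 2)

def inner_loop(k, rng):
    """Optimize the continuous angles for a fixed structure k."""
    n_par = k.count(0)
    if n_par == 0:
        return cost(k, []), np.array([])
    x0 = rng.uniform(0, 2 * np.pi, n_par)
    res = minimize(lambda t: cost(k, t), x0, method="BFGS")
    return res.fun, res.x

def propose(k, rng):
    """Random structural update: remove or insert one gate (simplified moves)."""
    k = list(k)
    if k and rng.random() < 0.5:
        k.pop(rng.integers(len(k)))
    else:
        k.insert(rng.integers(len(k) + 1), int(rng.integers(2)))
    return k

def optimize(n_iter=200, temperature=0.02, seed=0):
    rng = np.random.default_rng(seed)
    k = [1]
    c, theta = inner_loop(k, rng)
    best = (c, k, theta)
    for _ in range(n_iter):                         # outer loop over structures
        k_new = propose(k, rng)
        c_new, theta_new = inner_loop(k_new, rng)
        if c_new < c or rng.random() < np.exp(-(c_new - c) / temperature):
            k, c, theta = k_new, c_new, theta_new
            if c < best[0]:
                best = (c, k, theta)
    return best
```

In this toy model every additional pulse lowers the attainable fidelity, so the search has no incentive to keep pulses that do not pay for themselves, which illustrates how a noise-aware cost can select circuit length without an explicit depth penalty.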
Finally, let us mention an important feature of the op-
timization approach. As stated above, random structure
updates done in the outer loop involve identity insertion
and gate removal. Because the cost function is evaluated
in the presence of noise, this procedure can sometimes
lead to a larger value of the cost function (this is
not possible with a noiseless simulator). Thanks to that,
the optimization algorithm automatically finds the opti-
mal length L of the circuit for a specified error model.
Other machine learning approaches that are not noise-
aware must be artificially biased towards short circuits.
In contrast, our approach automatically finds a balance
between deep, expressive but noisy circuits and shallow,
less noisy ones.
III. NOISE MODEL
We demonstrate NACL in the following sections us-
ing a fine-grained noise model derived from one- and
two-qubit gate-set tomography (GST) [10, 11, 32] ex-
periments run on the five-qubit IBM Q Ourense super-
conducting qubit device. We emphasize that we are not
claiming to capture the full behavior of this device; this
cannot be done with just one- and two-qubit GST, and
we need to make some assumptions about device behav-
ior. The most important physical effects we are ignoring
in this noise model are: (i) non-uniformity across the de-
vice, since we use one-qubit GST results on qubit 0 and
two-qubit GST on the qubit pair 0-1 to infer process ma-
trices for all qubits on the device, and (ii) since we do
not characterize spectator qubits, we do not capture any
crosstalk effects.
FIG. 3. Qubit layout and connectivity for the device modeled in
the noise model used to demonstrate NACL. This layout is
inspired by the IBM Q Ourense device, and the lines indicate
the qubits that can participate in CNOT coupling gates.
FIG. 4. The textbook SWAP-test based circuit for state overlap
estimation when the input states (ρ, σ) are single qubit
states. It is obtained by decomposing the SWAP operation
into a standard universal gate set.
One-qubit GST on qubit 0 of the Ourense device yields
estimated one-qubit process matrices representing channels
associated with the principal native gates on the device,
X(π/2) (or the “pulse” gate), and I, the single-qubit idle
operation. The other single qubit gate used in this de-
vice is Z(θ), but this is performed virtually in software
(through a phase shift of future single qubit gates) and so
we assume it takes no time and is implemented perfectly.
We also use the process matrices estimated by single-
qubit GST for |0〉 state preparation and single-qubit mea-
surement POVM elements for representing these opera-
tions. Then two-qubit GST on qubits 0 and 1 is used
to extract a process matrix for the CNOT gate. All the
estimated process matrices and their figures of merit are
presented in Appendix B.
We assume the layout and connectivity of the qubits
is the same as for the IBM Q Ourense device, and these
are outlined in Fig. 3. This connectivity and the pro-
cess matrices described above together define our device
model.
Note that we only performed GST on qubits 0 and 1
for simplicity, and assume that the resulting process ma-
trices describe the same gates on other qubits also. This
assumption could easily be relaxed at the expense of more
GST experiments on all the qubits in the device.
IV. IMPLEMENTATION FOR OBSERVABLE
EXTRACTION
The observable extraction application we focus on is
state overlap estimation, where the task is to estimate
the overlap between two input states ρ and σ, i.e., esti-
mate Tr(ρσ). The standard way to achieve this is to ap-
ply a controlled swap operation conditioned on an ancilla
qubit, and then measure an expectation of an observable
on the ancilla. We consider the case where ρ and σ are
single qubit states, and decompose the textbook SWAP-based
circuit for overlap estimation into a standard gate
set in Fig. 4.
FIG. 5. The form of the textbook SWAP-test based overlap
estimation circuit, shown in Fig. 4, when decomposed into the
native gates in our device model. P denotes the pulse gate, or
X(π/2) rotation, and I is an idle timestep. The vertical lines
denote Z(θ) rotations that are done virtually and therefore
take no time. This notation helps visualize which gates can be
performed in parallel. Values of θn are shown in Appendix A.
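The ideal (noise-free) behavior of the SWAP test can be checked with a short density-matrix simulation: after a Hadamard on the ancilla, a controlled-SWAP, and a second Hadamard, the ancilla expectation 〈σz〉 equals Tr(ρσ). The sketch below works with the abstract three-qubit circuit rather than the native-gate decomposition, and the random-state ensemble and function names are assumptions made for illustration.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
Z = np.diag([1.0, -1.0])

def random_mixed_state(rng, d=2):
    """Random density matrix built from a Ginibre matrix (one possible ensemble)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho)

def swap_test_expectation(rho, sigma):
    """<Z> on the ancilla after H, controlled-SWAP, H (ancilla is qubit 0)."""
    h_anc = np.kron(H, np.eye(4))
    cswap = np.block([[np.eye(4), np.zeros((4, 4))],
                      [np.zeros((4, 4)), SWAP]])
    u = h_anc @ cswap @ h_anc
    anc0 = np.array([[1, 0], [0, 0]])
    state = np.kron(anc0, np.kron(rho, sigma))
    out = u @ state @ u.conj().T
    return np.real(np.trace(np.kron(Z, np.eye(4)) @ out))
```

Comparing this ideal value with the expectation obtained from a noisy simulation of a given circuit gives the per-sample estimation error used to compare circuits.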
For evaluation under the noise model, we first compile
the textbook circuit in Fig. 4 into the native gate set
composed of CNOT, X(π/2) and Z(θ) rotations. Given
the connectivity of the device, Fig. 3, we map the input
qubits to qubits 2 and 3, and the ancilla qubit to qubit 1.
This is the most favorable mapping since in this case the
minimal number (2) of CNOTs in Fig. 4 needs to be de-
composed to account for the lack of device connectivity.
There are other mappings that result in similar require-
ments for CNOT decomposition. We iterated over all of
them and selected the decomposition that gives the small-
est error (as measured by the value of the cost function
evaluated in the presence of noise).
The decomposed circuit is shown in Fig. 5. In this
figure we show identity gates, or periods where a qubit
is idle, in red. This circuit has been compressed and
made as parallel as possible (using simplifications afforded
by simple commutation relations and circuit identities);
however, the remaining idle periods cannot be
compressed away. We assume that X(π/2) rotations
(denoted P in the figure) take the same amount of time as
a CNOT for simplicity.
Next, we consider ML-based circuit implementations
that do not consider noise. Using techniques developed
in [6], which attempt to find exact implementations
that consist of as few gates as possible, we perform training
without the noise model (but with the connectivity
restrictions of the device). The training dataset consists
of 15 pairs of randomly generated single qubit states
and their computed overlap. The resulting circuit for
overlap estimation and its compiled version are shown
in Fig. 6. In the absence of the noise model there is no
penalty for the circuit to contain identity gates, and so
the resulting circuit has a lot of them.
FIG. 6. (a) Machine learned circuit found without considering
the noise model. (b) The circuit decomposed into the
native gates in the device model. The notation is the same as
in Fig. 5. Values of θn are shown in Appendix A.
FIG. 7. Machine learned circuit found by NACL incorporating
the noise model. The notation is the same as in Fig. 5.
Values of θn are shown in Appendix A.
Finally, we apply NACL to this problem and formulate
the cost function using circuit simulation with the noise
model described in Sec. III. The training dataset again
consists of 15 pairs of randomly generated single
qubit states and their computed overlap. The algorithm
works directly with the native gate set, and so no subsequent
decomposition is necessary. The circuit found by
NACL is shown in Fig. 7. Two features of the NACL cir-
cuit immediately stand out. First, since we have taken
into account the noise associated with idling qubits, the
circuit contains very few idles. Second, NACL makes
interesting use of Z(θ) gates: these are error free, take
no time, and also increase the expressiveness of a
circuit. Consequently, NACL seems to maximize their
use (especially compared to the noise-unaware ML circuit in
Fig. 6, which does not distinguish Z(θ) gates from other
gates, and therefore does not use them more frequently).
This liberal use of Z(θ) gates most likely also leads to
the shorter depth circuit. It should be stressed that these
features are not built into the algorithm but result from
the optimization and represent the best found balance
between the number of gates and the noise induced by
their action.
In the following, we compare the performance of the
three circuits described above. We generated a validation
dataset of 1000 pairs of new random one-qubit mixed
states {ρj, σj} and applied the three circuits to estimate
the overlap between each pair (the circuits are simulated
under the noise model). For simplicity, we label the text-
book circuit (Fig. 5) A1, the noise unaware, standard
ML circuit (Fig. 6) A2, and the result of NACL (Fig. 7)
A3. Fig. 8(a) compares the errors of all three circuits, de-
fined as the absolute value of the difference between the
exact overlap Tr(ρjσj) and its estimate computed with
the given circuit:
$$\mathrm{error}_{j,A_i} = \left| \mathrm{Tr}(\rho_j \sigma_j) - \langle \sigma_z \rangle_{A_i} \right|, \tag{18}$$
where 〈σz〉Ai is the expectation value of the σz operator
on the measured qubit at the end of circuit Ai. The
data is sorted such that the error of A1 is increasing
with sample index, j. Fig. 8(a) shows that the noise-aware,
ML-generated circuit (A3) gives the best overlap estimate for
most of the state pairs.
FIG. 8. (a) A comparison of error in computing state overlap
(as quantified by Eq. (18)) for each of the validation samples
for the three circuits: textbook (A1, blue), noise unaware ML
(A2, red), and NACL (A3, green). The x-axis indexes pairs of
states in the validation dataset. The inset shows differences
in error. (b) Overlap estimation error for the three circuits
as a function of the exact value of the overlap Tr(ρjσj). The
inset shows a histogram of exact overlaps for the validation
dataset.
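As a rough illustration of the error measure in Eq. (18), the following sketch draws random one-qubit mixed states, computes the exact overlap Tr(ρσ), and evaluates the absolute error of an estimate. The sampling scheme (a normalized Ginibre-type matrix) and all function names are assumptions made for this example, since the text does not specify how the random states were generated. The noisy-circuit estimate 〈σz〉 is replaced by a hypothetical damped value rather than a circuit simulation.

```python
import numpy as np

def random_mixed_state(rng, dim=2):
    """Random density matrix built from a Ginibre matrix (assumed sampling scheme)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T               # positive semidefinite by construction
    return rho / np.trace(rho).real    # normalize to unit trace

def exact_overlap(rho, sigma):
    """Exact state overlap Tr(rho sigma); real for Hermitian inputs."""
    return np.trace(rho @ sigma).real

def overlap_error(rho, sigma, estimate):
    """Absolute error of an overlap estimate, in the form of Eq. (18)."""
    return abs(exact_overlap(rho, sigma) - estimate)

rng = np.random.default_rng(0)
rho, sigma = random_mixed_state(rng), random_mixed_state(rng)
# Hypothetical stand-in for <sigma_z> measured on a noisy circuit.
estimate = 0.9 * exact_overlap(rho, sigma)
print(overlap_error(rho, sigma, estimate))
```

In an actual validation run, `estimate` would come from simulating one of the circuits A1, A2, or A3 under the device noise model.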
The inset in Fig. 8(a) shows the difference between the
error of the textbook circuit and both ML circuits (these
sets of data points are both independently ordered ac-
cording to decreasing error difference). The ML circuit
is better than the textbook one if the value shown in the
inset is positive. We can see that this is indeed the case
for over 90% of cases, with the NACL circuit also outper-
forming the regular ML circuit in these cases. For fur-
ther analysis, we look at the same data in Fig. 8(b), but
this time with the errors plotted against exact overlap of
the 1000 samples in the validation dataset. This figure
shows that the error of A1 generally decreases with the
exact overlap. In addition, the error of A3 (NACL) shows
non-monotonic behavior with exact overlap, achieving its
minimum around exact overlap of 0.5 and increasing for
larger and smaller overlaps. This behavior of the NACL
error can be explained by the specifics of the training method
and the type of cost function that was used. NACL
is trying to minimize average error (see Eq. (14)), and
examining a histogram of overlaps in the training sam-
ple (inset in Fig. 8 (b)) we see that these overlaps are
concentrated between 0.4 and 0.5. Therefore, NACL op-
timizes the average-case cost function by performing best
on input state pairs that have overlaps around this value.
An interesting observation is that there can be a correla-
tion between the structure of a circuit and the overlaps
it can best estimate.
Finally, we can explain why the textbook circuit out-
performs NACL in regions of low exact overlap as a com-
bination of two factors: (i) as mentioned above, NACL
minimizes average error, and the contribution to this
from training samples with small overlap is small; hence
it sacrifices performance on small overlap states to get
better performance on states with larger overlap; (ii) the
other factor that results in the textbook circuit perform-
ing well for small exact overlap samples is accidental;
namely, that the overlap is estimated by measuring 〈σz〉
on the ancilla, and this quantity tends to zero with circuit
length (since the stochastic noise in the gates dampens
this polarization). The output of A1 is small due to noise,
and thus is accidentally close to the correct answer for
small overlap states.
We note that the uneven behavior of NACL with exact
overlap of input states can be easily modified by (i) mod-
ifying the training dataset to have uniformly distributed
overlaps, and (ii) modifying the cost function to be a
worst-case measure of performance instead of average-
case and/or a function of relative error as opposed to
absolute error with the exact overlap.
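The alternatives mentioned above can be sketched in a few lines. The functions below are illustrative only: the average-case form is assumed to resemble the mean absolute error minimized in training (Eq. (14) is not reproduced here), and the sample values are made up for the example rather than taken from the paper.

```python
def average_cost(exact, estimated):
    """Mean absolute error over the dataset (average-case measure)."""
    return sum(abs(e - s) for e, s in zip(exact, estimated)) / len(exact)

def worst_case_cost(exact, estimated):
    """Largest absolute error over the dataset (worst-case measure)."""
    return max(abs(e - s) for e, s in zip(exact, estimated))

def average_relative_cost(exact, estimated, eps=1e-12):
    """Mean relative error; eps guards against division by a zero overlap."""
    return sum(abs(e - s) / max(abs(e), eps)
               for e, s in zip(exact, estimated)) / len(exact)

# Hypothetical exact overlaps and circuit estimates, for illustration only.
exact = [0.05, 0.45, 0.50, 0.90]
estimated = [0.15, 0.46, 0.50, 0.85]

print(average_cost(exact, estimated))
print(worst_case_cost(exact, estimated))
print(average_relative_cost(exact, estimated))
```

In this toy data the small-overlap sample barely moves the average cost but dominates both the worst-case and the relative measures, which is the kind of rebalancing the text describes.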
V. IMPLEMENTATION FOR STATE
PREPARATION
For the state preparation application, we will focus on
preparing W-states of n qubits:
$$|W_n\rangle = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} |i\rangle, \tag{19}$$
where |i〉 is the state where qubit i is |1〉 and all other
qubits are in state |0〉. W-states are multipartite entan-
gled states that are robust against loss and can be used
for multipartite cryptographic protocols and for telepor-
tation [33]. As far as we are aware, the circuits generated
in Cruz et al. [34] are the most efficient circuits for W-
state generation, and we will use these circuits as our
base-case “textbook” circuits to compare against.
In the following we will study the preparation of
W-states for n = 4, 5.
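For concreteness, the target state of Eq. (19) and the fidelity used to score a prepared state can be written down directly. This is a minimal sketch: the qubit-ordering convention (qubit 1 as the most significant bit) and the function names are assumptions of the example, not taken from the paper.

```python
import numpy as np

def w_state(n):
    """State vector of |W_n>: equal superposition of all single-excitation states."""
    psi = np.zeros(2**n)
    for i in range(n):
        # Basis index with only qubit i set to 1 (qubit 0 is the most
        # significant bit; this ordering is an assumed convention).
        psi[1 << (n - 1 - i)] = 1.0
    return psi / np.sqrt(n)

def fidelity_with_pure(rho, psi):
    """Fidelity <psi|rho|psi> between a density matrix and a pure target state."""
    return np.real(np.vdot(psi, rho @ psi))

w4 = w_state(4)
rho_ideal = np.outer(w4, w4.conj())
print(np.count_nonzero(w4), fidelity_with_pure(rho_ideal, w4))
```

A noisy preparation would replace `rho_ideal` with the density matrix produced by simulating the circuit under the device model.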
A. 4 qubit W-state preparation
The textbook circuit for preparing |W4〉 is shown in
Fig. 9(a). It was obtained by following the general proce-
dure given in [34]. This circuit will be applied to the first
four qubits in the device shown in Fig. 3. The perfor-
mance of the textbook circuit and the NACL circuit will
depend on the subset of qubits on which we are preparing
the state. However, in realistic situations, one will not be
given that freedom since the state preparation is usually
only one step in a larger quantum circuit, which imposes
constraints on the choice of qubits. We select qubits 1-4
to show how NACL can optimize circuits on devices with
restricted connectivity.
FIG. 9. (a) Textbook circuit for preparing |W4〉. (b) Decomposition
of the control-G(p) gate into CNOT and one-qubit
gates u and u†, where u = e^{−iYα} and α = arcsin(√p)/2. (c)
Compilation of the textbook circuit shown in (a) onto the first
four qubits of the device model in Fig. 3. The notation is
the same as in Fig. 5. The values of angles θj are given in
Appendix A.
The one-qubit gate, depicted as G(p) in Fig. 9(a), is
defined as follows:
$$G(p) = \begin{pmatrix} \sqrt{p} & \sqrt{1-p} \\ \sqrt{1-p} & -\sqrt{p} \end{pmatrix}. \tag{20}$$
Note that this is a slightly different definition than the
one given in Ref. [34]. The above definition of G(p) leads
to the same state that is prepared with the circuit shown
in Fig. 9(a) but allows for more efficient decomposition
of control-G(p) into CNOTs and one-qubit gates.
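The decomposition quoted in the caption of Fig. 9 can be checked numerically. The sketch below assumes the circuit applies u, then the CNOT, then u† on the target qubit, which corresponds to the matrix product u† X u. This ordering is an assumption of the example, since the figure itself is not reproduced here.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
I2 = np.eye(2, dtype=complex)

def G(p):
    """One-qubit gate of Eq. (20)."""
    return np.array([[np.sqrt(p), np.sqrt(1 - p)],
                     [np.sqrt(1 - p), -np.sqrt(p)]], dtype=complex)

def u(p):
    """u = exp(-i alpha Y) with alpha = arcsin(sqrt(p)) / 2."""
    alpha = np.arcsin(np.sqrt(p)) / 2
    # Since Y @ Y = I, exp(-i alpha Y) = cos(alpha) I - i sin(alpha) Y.
    return np.cos(alpha) * I2 - 1j * np.sin(alpha) * Y

def block_diag2(a, b):
    """Controlled gate with the first qubit as control: a if |0>, b if |1>."""
    out = np.zeros((4, 4), dtype=complex)
    out[:2, :2], out[2:, 2:] = a, b
    return out

p = 0.25
U = u(p)
CNOT = block_diag2(I2, X)
# Circuit order u, CNOT, u-dagger on the target (rightmost factor acts first).
controlled_G = np.kron(I2, U.conj().T) @ CNOT @ np.kron(I2, U)
print(np.allclose(U.conj().T @ X @ U, G(p)))
print(np.allclose(controlled_G, block_diag2(I2, G(p))))
```

Under this ordering, each control-G(p) costs a single CNOT plus two one-qubit rotations.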
The circuit shown in Fig. 9(a) must be compiled into
the native gate set in the device model. The W state is
invariant under permutation of qubits, and so one can re-
label the qubits in the circuit shown in Fig. 9(a) if this is
advantageous for compilation. To find the optimal com-
pilation of the textbook circuit we checked all possible
permutations of qubits. All permutations lead to a com-
pilation in which at least two CNOTs are not compatible
with device connectivity and need to be decomposed further.
We evaluated each permutation by simulating the
corresponding compiled circuit under the noise model
and computing the fidelity of the output with the
exact |W4〉 state. The permutation that gives the highest
fidelity is simply [1, 2, 3, 4] (there are, however, other
permutations that lead to the same fidelity), and the
corresponding compiled circuit is shown in Fig. 9(c). We
found that this textbook circuit produces |W4〉 with
fidelity 0.671 under the noise model.
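The exhaustive search over qubit relabelings can be sketched as follows. Note the substitutions: the paper scores each permutation by the simulated noisy fidelity, whereas this toy version uses a simpler proxy (the number of CNOTs that fall on uncoupled qubit pairs). The coupling map and the CNOT list are placeholders, not the device of Fig. 3 or the circuit of Fig. 9(a).

```python
from itertools import permutations

# Hypothetical linear coupling map on four qubits (NOT the device of Fig. 3).
COUPLED = {frozenset(pair) for pair in [(1, 2), (2, 3), (3, 4)]}
# Hypothetical (control, target) CNOTs in logical-qubit labels.
LOGICAL_CNOTS = [(1, 2), (1, 3), (3, 4), (2, 4)]

def uncoupled_cnots(perm):
    """Count CNOTs that land on physically uncoupled qubits under a relabeling."""
    mapping = dict(zip(range(1, len(perm) + 1), perm))
    return sum(frozenset((mapping[c], mapping[t])) not in COUPLED
               for c, t in LOGICAL_CNOTS)

def best_permutations(n=4):
    """Score every relabeling and return the best score with all minimizers."""
    scores = {perm: uncoupled_cnots(perm)
              for perm in permutations(range(1, n + 1))}
    best = min(scores.values())
    return best, sorted(p for p, s in scores.items() if s == best)

best_score, best_perms = best_permutations()
print(best_score, best_perms[0], len(best_perms))
```

As in the text, several relabelings can tie for the best score, so the search returns all of them rather than a single winner.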
The circuit produced by NACL for preparing |W4〉 is
shown in Fig. 10. Since the task here is to prepare one
state from one other state, the training dataset and vali-
dation dataset are the same, and just consist of one pair
{|0〉⊗4, |W4〉}; the first element is the input state and
the second is the ideal output state. This NACL circuit
outputs a state under the noise model with a fidelity of
0.8894 to the exact state. This is a reduction in error (as
measured by 1 − F, where F is fidelity) by a factor of about 3
as compared with the best known textbook circuit.
FIG. 10. Circuit that prepares |W4〉 found by NACL. The
notation is the same as in Fig. 5. Angles θj are specified in
Appendix A.
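As a quick check of the quoted reduction factor, using only the two |W4〉 fidelities reported in this subsection (0.671 for the textbook circuit and 0.8894 for the NACL circuit):

```latex
\frac{1 - F_{\text{textbook}}}{1 - F_{\text{NACL}}}
  = \frac{1 - 0.671}{1 - 0.8894}
  = \frac{0.329}{0.1106}
  \approx 2.97
```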
Careful inspection of the circuit in Fig. 10 reveals an
interesting feature. In certain circumstances, it is more
beneficial (from the point of minimizing the cost func-
tion; infidelity in this case) to have a long sequence of
gates that are not compiled into an equivalent transfor-
mation with a shorter sequence. An example is the final
13 gates (including Z(θ) gates in this count) applied to
qubit 1. It is possible to implement the resulting trans-
formation with a shorter sequence of gates, but doing so
would mean that the qubit sits idle for the remaining
time while the operations on the other qubits complete.
Apparently this incurs a greater cost than the longer se-
quence (the pulse gates are fairly high quality gates for
this device and in fact, have a smaller infidelity than the
idle operations, see Appendix B). We thus observe a fea-
ture that resembles dynamical decoupling or a dynami-
cally corrected gate for this final transformation of qubit
1. We have reasonable confidence that this feature is not
a numerical artifact or local optimum because we also
independently optimized just that subcircuit (i.e., we kept
the rest of the circuit fixed and optimized just the last
six clock cycles of qubit 1 under the same cost function
that evaluates the error on the 4-qubit output state), and
could not find a better sequence. Note that this feature is
“emergent”: dynamical gate correction techniques were
not coded into the search algorithm, and yet NACL
effectively used them in the optimized solution. In a way,
those techniques were “discovered” via cost optimization.
We also point out that this feature of preferring longer
sequences to idles is not general – one cannot replace ev-
ery sequence of idles with a sequence of pulses and Z(θ)
rotations and lower the error. For example, qubit 3 sits
idle over five clock cycles and this achieves the minimum
cost function even when we attempt to re-optimize just
that sub-sequence of gates. This feature demonstrates
the ability of NACL to find circuit implementations that
optimize performance in highly non-trivial ways that in-
corporate an interplay between the computational task
(encoded in the cost function) and the device model.
FIG. 11. (a) Circuit for preparing |W5〉 obtained by following
the construction given in [34]. The first controlled G(p) gate
can be simplified, as the first qubit is initialized in |1〉. This
allows for a shorter compilation. (b) Its best compiled version
achieved by a proper permutation of qubits. The notation is
the same as in Fig. 5. Angles θj are given in Appendix A.
B. 5 qubit W-state preparation
We also study the preparation of |W5〉 since this task
requires the use of all qubits on the device in Fig. 3.
Again, we follow the prescription in Cruz et al. [34] to
arrive at the best textbook circuit for preparing |W5〉 in
Fig. 11(a). The compilation of this circuit onto the device
under study is not trivial since we can arbitrarily permute
the qubits. Every permutation will result in a potentially
different decomposition of CNOTs, given the constrained
connectivity of the device. We checked all 120 qubit per-
mutations and found that the circuit compilation shown
in Fig. 11(b) gives the smallest value of the cost function
when evaluated under the noise model. This optimal per-
mutation was found to be [4, 3, 5, 2, 1]. Under this per-
mutation, only one CNOT (the second gate from the left
in Fig. 11(a)) needs to be decomposed due to the lack
of connectivity. The circuit in Fig. 11(b) achieves a
fidelity of 0.675.
NACL found the circuit presented in Fig. 12 for |W5〉
state preparation. Again, NACL finds a circuit that is
much more compact than the textbook one. It uses fewer
CNOTs, requires less idling of qubits, and uses the error-
free Z(θ) gates liberally. The circuit produces an output
state with fidelity of F = 0.837 with the ideal |W5〉 state.
That is, the error (as measured by 1− F ) is reduced by
a factor of 2 as compared to the textbook circuit.
FIG. 12. The circuit that approximates preparation of |W5〉
found by NACL. The notation is the same as in Fig. 5. Angles
θj are given in Appendix A.
FIG. 13. (a) A textbook circuit for performing QFT on three
qubits. (b) A compilation of the circuit in (a) into the native
gate set in the device model we are simulating. The compi-
lation has to take into account that qubit 1 and 3 are not
directly connected. Angles θj are specified in Appendix A.
VI. IMPLEMENTATION FOR CIRCUIT
COMPILATION
For the circuit compilation application we consider the
problem of compiling the quantum Fourier transform
(QFT), which is a paradigmatic building block that is
used in many quantum algorithms [35]. In the following
we will consider implementing a three-qubit QFT.
A textbook circuit for implementing a QFT on three
qubits is shown in Fig. 13(a). We will consider imple-
menting this on qubits 1, 2 and 3 in the device shown in
Fig. 3. We first need to decompose the controlled Z(θ)
rotations. Every controlled Z(θ) is decomposed using two
CNOTs [36]. This decomposition leads to two CNOTs be-
tween qubits 1 and 3. Since these qubits are not directly
connected, these CNOTs need to be decomposed fur-
ther. The result of this compilation procedure is shown
in Fig. 13(b). This compilation leads to a very sparse
circuit with many (incompressible) idle gates, which has
a negative impact on the quality of the final result.
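The two-CNOT decomposition of a controlled rotation can be verified numerically. The sketch below treats the controlled Z(θ) of the QFT as a controlled phase gate diag(1, 1, 1, e^{iθ}). This is an assumption of the example, since the convention used for Z(θ) is not shown in this section; the two agree up to one-qubit Z rotations if Z(θ) = exp(−iθZ/2).

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
# CNOT with the first qubit as control and the second as target.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def phase(theta):
    """One-qubit phase gate diag(1, e^{i theta}) (assumed stand-in for Z(theta))."""
    return np.diag([1.0, np.exp(1j * theta)])

def controlled_phase(theta):
    """Target two-qubit gate diag(1, 1, 1, e^{i theta})."""
    return np.diag([1.0, 1.0, 1.0, np.exp(1j * theta)])

def controlled_phase_from_cnots(theta):
    """Two-CNOT construction; the rightmost factor acts first."""
    half = phase(theta / 2)
    return (np.kron(half, half) @ CNOT
            @ np.kron(I2, phase(-theta / 2)) @ CNOT)

theta = np.pi / 4
print(np.allclose(controlled_phase_from_cnots(theta), controlled_phase(theta)))
```

When the two qubits involved are not directly coupled, each of these CNOTs must itself be decomposed further, which is where the extra depth and idle periods of the compiled textbook circuit come from.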
The circuit constructed via NACL is shown in Fig. 14.
We used NACL with the cost function defined in Eq.
(16) with the average process fidelity computed via Eq.
(17). The circuit has shorter depth than the compiled
textbook circuit, and does not contain a single idle gate
(as compared with 18 for the textbook circuit). It also
contains more error-free Z(θ) rotations, enhancing the
expressiveness of the circuit.
FIG. 14. Circuit performing QFT found by NACL. The notation
is the same as in Fig. 5. Angles θj are given in Appendix
A.
To compare the performance of the two compiled cir-
cuits for QFT, we select 1000 random pure states |Ψj〉
and evaluate each circuit on those states. The error metric
we use is the infidelity between the ideal QFT output
and the circuit output ρj, $1 - \mathrm{Tr}(\rho_j |\Psi_j^{\mathrm{ex}}\rangle\langle\Psi_j^{\mathrm{ex}}|)$,
where $|\Psi_j^{\mathrm{ex}}\rangle$
is the result of the exact evaluation of QFT on |Ψj〉. Our
results are summarized in Fig. 15. For easier comparison,
the states |Ψj〉 were ordered such that the error of the
textbook circuit (represented by the blue line) increases
with the state index j. The NACL-generated circuit per-
formed better than the textbook one on all considered
states. Since the validation dataset is composed of ran-
dom pure input states, the average infidelity (over these
input states) is related to the entanglement infidelity of
the channel defined by the noisy circuit (see Eq. (17)),
which is an input-state independent measure of the quality
of a channel (or circuit implementation). We use this
relation to validate our error metric defined over randomly
sampled input states. In Fig. 15 the dotted lines
show $1 - (d\,F_e(\mathcal{U}^\dagger \circ \mathcal{E}) + 1)/(d + 1)$, where d = 8, $\mathcal{U}$ is the
channel corresponding to the ideal circuit implementation,
$\mathcal{E}$ is the channel corresponding to the noisy circuit
implementation, and $F_e$ is the entanglement fidelity defined
in Sec. IID. These lines correspond well to the sample
averages of our infidelity error metric. We find that NACL
reduced the average infidelity from 0.289 to 0.124, that is,
by 57%. Another observation is that the performance of
the textbook circuit varies more significantly with input
state than for the NACL-generated circuit.
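The relation between entanglement fidelity and average infidelity used for the dotted lines can be illustrated on a small example. In this sketch the channel is given by Kraus operators and the entanglement fidelity is computed as (1/d²) Σ_k |Tr(U† K_k)|², a standard expression assumed here to match the definition in Sec. IID. The one-qubit depolarizing channel is a toy stand-in, not the GST-derived noise model of the paper.

```python
import numpy as np

def entanglement_fidelity(kraus_ops, target_unitary):
    """F_e of the channel U^dagger composed with E, from Kraus operators of E."""
    d = target_unitary.shape[0]
    u_dag = target_unitary.conj().T
    return sum(abs(np.trace(u_dag @ k)) ** 2 for k in kraus_ops) / d**2

def average_fidelity(f_e, d):
    """Average fidelity over pure input states: (d * F_e + 1) / (d + 1)."""
    return (d * f_e + 1) / (d + 1)

# Toy example (assumed): depolarizing noise of strength p after an ideal identity.
p = 0.1
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
kraus = [np.sqrt(1 - p) * I2] + [np.sqrt(p / 3) * P for P in (X, Y, Z)]

f_e = entanglement_fidelity(kraus, I2)
print(f_e, 1 - average_fidelity(f_e, d=2))
```

For the three-qubit QFT in the text, the same formula would be applied with d = 8 and the Kraus (or process-matrix) representation of the full noisy circuit.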
FIG. 15. Performance of textbook and NACL-generated circuits
to evaluate QFT. The figure shows error (as defined
in the text) for 1000 randomly generated pure states. The
NACL-generated circuit performs much better than the
textbook one on all considered states.
VII. DISCUSSION AND CONCLUSIONS
We have introduced the framework of noise-aware circuit
learning (NACL), whereby the circuit implementation
of a quantum algorithm is formulated by machine
learning and optimization based on a cost function that
captures the goal of the algorithm and a device model
that captures the connectivity and noise in the device
that executes the circuit. We have shown that this framework
can be applied to all of the common tasks in quantum
computing (observable or mean-value extraction,
state preparation, and circuit compilation) and demonstrated
through examples the types of performance
improvements that can be obtained through NACL. For the
examples considered here, NACL produces reductions in
error rates (suitably defined for the different tasks) by
factors of 2 to 3, when compared to textbook circuits for
the same tasks.
In general, NACL produces shorter depth circuits that
minimize the impact of stochastic noise sources. How-
ever, as demonstrated through the examples consid-
ered here, NACL can automatically derive known noise-
suppression concepts such as dynamical decoupling and
apply these in contexts where they are useful (as defined
by the cost function). It also naturally outputs circuits
that incorporate commonsense strategies such as mini-
mizing the number of noisy idle gates and maximizing
the use of ideal gates, such as error-free Z(θ) rotations.
NACL can incorporate much more fine-grained informa-
tion about the device than other circuit compilation tech-
niques – e.g., in the demonstrations presented here we
have used process matrices derived from gate set tomog-
raphy of real hardware to approximately model noise on
this device. Such process matrices can capture effects ig-
nored by effective noise models, such as coherent noise
and non-unital processes such as relaxation.
We note that we have also executed NACL with an
error model derived from trapped-ion physics (see Ap-
pendix C for details), to validate that the technique can
be used with a variety of noise model specifications. The
results are very similar to those presented above, al-
though there are some simplifications due to an assump-
tion of full connectivity in the device (which is realistic
for small trapped-ion platforms).
The noise models currently compatible with NACL do
not include crosstalk effects. Although these can be in-
corporated for small devices using the approach outlined
in this paper, incorporating crosstalk in a scalable man-
ner is complicated. The heart of the issue is how to
model crosstalk in a scalable manner [28]. In the pres-
ence of crosstalk, the natural description of operations on
a quantum computer is not in terms of gates, but in terms
of layers, which capture what is done to each qubit in the
device in a given clock cycle. This is because the precise
operation performed on a qubit could, in principle, de-
pend on what is performed on any other qubit in the
device. Therefore, the first extension of NACL required
to capture crosstalk is to optimize circuits in terms of
layers as opposed to gate sequences. Moreover, one has
to also consider whether it is realistic to develop quan-
tum channels representing noisy implementation of any
circuit layer. Firstly, there are an exponential (in n, the
number of qubits) number of possible layers to character-
ize, and secondly, one needs to perform n-qubit process
tomography in order to get quantum channels for each
layer. This last task is obviously impossible for large n,
and therefore one has to develop more approximate tech-
niques to describe noisy implementations of layers. One
approach around these issues is to patch together quan-
tum channels derived from one-, two-, and three-qubit
tomography to get an approximate description of a cir-
cuit layer, similar to what is demonstrated in Govia et al.
[37]. This would model a physically important subclass
of crosstalk errors [28]. Future work will look at
incorporating these more complex noise effects into the
NACL circuit learning framework.
An important issue to consider is how to scale NACL
to develop noise-resilient circuits for larger devices. The
complexity of circuit simulation under a noise model and
the complexity of optimization over the circuit parameters
increase exponentially with the number of qubits. This
means that NACL can be used as-is to optimize circuits
for small modular elements (operating on 10-20 qubits)
of a larger application; e.g., magic state distillation cir-
cuits. However, we can also outline a strategy for ex-
tending NACL beyond this use-case. The strategy ap-
plies when one is already given a circuit compilation for
a computational task. Perhaps this is a compilation de-
rived using theoretical decompositions or some other ef-
ficient method. Then one can sample a subcircuit from
this circuit. This subcircuit defines an ideal unitary and
one can use NACL to find best approximations to this
unitary under the given device model. This sampling can
be repeated for multiple subcircuits. However, note that
this strategy does not guarantee any optimality proper-
ties for the circuit derived from combining these individ-
ually optimized subcircuits. Studying the potential of
this strategy for scaling up the NACL framework is left
as future work.
Related to scalability is the connection between NACL
and variational quantum algorithms (VQAs). An alter-
native to evaluating the NACL cost functions in Sec. IID
by simulating a parameterized quantum circuit on a clas-
sical computer is to evaluate them by executing the pa-
rameterized circuits on quantum hardware directly in the
spirit of VQAs. In addition to the obvious advantage of
scalability, this hardware-enabled approach has the ad-
vantage of capturing the noise model exactly (and does
not require any noise modeling). However, for certain ap-
plications (e.g., compiling and state preparation [19, 20]),
the NACL cost functions require comparing against the
ideal target circuit outputs. In a VQA setting, any prepa-
ration of the targets would also be noisy, and therefore
one cannot exactly evaluate the required cost functions.
Whether it is possible to sufficiently approximate the cost
functions with noisy hardware is an open problem [38],
and if this were possible, it would make hardware-enabled
NACL realistic.
Modern optimization and machine learning methods
will be critical for deriving computational use from near-
term quantum devices. Motivated by this, we have de-
veloped the NACL framework as a way to utilize detailed
noise characterization information to build noise-resilient
circuits for near-term quantum computing applications,
and we outlined promising directions for extending this
framework. Our NACL method can be combined with
(and hence is complementary to) other approaches to er-
ror mitigation that have been recently proposed [39–43].
Hence, NACL is a novel primitive that will play an im-
portant role in the quest for quantum advantage.
ACKNOWLEDGMENTS
The authors would like to thank Tim Proctor and An-
drew Baczewski for useful comments on a draft of this
work.
Research presented in this article was supported by
the Laboratory Directed Research and Development pro-
gram of Los Alamos National Laboratory under project
number 20180628ECR for the noise-free machine learn-
ing approach and project number 20190065DR for the
machine learning approach in the presence of noise. PJC
also acknowledges support from the LANL ASC Beyond
Moore’s Law project. This work was also supported by
the U.S. Department of Energy, Office of Science, Office
of Advanced Scientific Computing Research, under the
Quantum Computing Application Teams (QCAT) pro-
gram.
Sandia National Laboratories is a multimission labora-
tory managed and operated by National Technology and
Engineering Solutions of Sandia, LLC, a wholly owned
subsidiary of Honeywell International, Inc., for the U.S.
Department of Energy’s National Nuclear Security Ad-
ministration under contract DE-NA0003525. This paper
describes objective technical results and analysis. Any
subjective views or opinions that might be expressed in
the paper do not necessarily represent the views of the
U.S. Department of Energy or the United States Govern-
ment.
REFERENCES
[1] Robert R. Tucci, “Qubiter Algorithm Modification, Ex-
pressing Unstructured Unitary Matrices with Fewer
CNOTs,” arXiv:quant-ph/0411027 (2004), arXiv: quant-
ph/0411027.
[2] Ali JavadiAbhari, Shruti Patil, Daniel Kudrow, Jeff
Heckey, Alexey Lvov, Frederic T. Chong, and Margaret
Martonosi, “ScaffCC: a framework for compilation and
analysis of quantum computing programs,” in Proceed-
ings of the 11th ACM Conference on Computing Fron-
tiers - CF ’14 (ACM Press, Cagliari, Italy, 2014) pp.
1–10.
[3] Dmitri Maslov, “Basic circuit compilation techniques for
an ion-trap quantum machine,” New Journal of Physics
19, 023035 (2017).
[4] Davide Venturelli, Minh Do, Eleanor Rieffel, and Jeremy
Frank, “Compiling Quantum Circuits to Realistic Hard-
ware Architectures using Temporal Planners,” in Proc. of
IJCAI (2017).
[5] Prakash Murali, Ali Javadi-Abhari, Frederic T. Chong,
and Margaret Martonosi, “Formal constraint-based com-
pilation for noisy intermediate-scale quantum systems,”
Microprocessors and Microsystems 66, 102–112 (2019).
[6] L. Cincio, Y. Subaşı, A. T. Sornborger, and P. J. Coles,
“Learning the quantum algorithm for state overlap,” New
Journal of Physics 20, 113022 (2018), arXiv:1803.04114
[quant-ph].
[7] Swamit S. Tannu and Moinuddin K. Qureshi, “Not All
Qubits Are Created Equal: A Case for Variability-Aware
Policies for NISQ-Era Quantum Computers,” in Proceed-
ings of the Twenty-Fourth International Conference on
Architectural Support for Programming Languages and
Operating Systems (ACM, Providence RI USA, 2019) pp.
987–999.
[8] Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will
Simmons, Alec Edgington, and Ross Duncan, “t|ket〉 : A
Retargetable Compiler for NISQ Devices,” Quantum Sci-
ence and Technology (2020), 10.1088/2058-9565/ab8e92,
arXiv: 2003.10611.
[9] Prakash Murali, Jonathan M. Baker, Ali Javadi Abhari,
Frederic T. Chong, and Margaret Martonosi, “Noise-
Adaptive Compiler Mappings for Noisy Intermediate-
Scale Quantum Computers,” in Proceedings of the
Twenty-Fourth International Conference on Architec-
tural Support for Programming Languages and Operating
Systems (ACM, 2019) p. 1015, arXiv: 1901.11054.
[10] Robin Blume-Kohout, John King Gamble, Erik Nielsen,
Kenneth Rudinger, Jonathan Mizrahi, Kevin Fortier,
and Peter Maunz, “Demonstration of qubit operations
below a rigorous fault tolerance threshold with gate set
tomography,” Nature Communications 8, 1 (2017).
[11] “PyGSTi. A python implementation of Gate Set Tomog-
raphy.” .
[12] Kwok Ho Wan, Oscar Dahlsten, Hlér Kristjánsson,
Robert Gardner, and MS Kim, “Quantum generalisation
of feedforward neural networks,” npj Quantum Informa-
tion 3, 36 (2017).
[13] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-
Hong Yung, Xiao-Qi Zhou, Peter J Love, Alán Aspuru-
Guzik, and Jeremy L O’brien, “A variational eigenvalue
solver on a photonic quantum processor,” Nature Com-
munications 5, 4213 (2014).
[14] Jarrod R McClean, Jonathan Romero, Ryan Babbush,
and Alán Aspuru-Guzik, “The theory of variational
hybrid quantum-classical algorithms,” New Journal of
Physics 18, 023023 (2016).
[15] Edward Farhi, Jeffrey Goldstone, and Sam Gut-
mann, “A quantum approximate optimization algo-
rithm,” arXiv:1411.4028 (2014).
[16] J. Romero, J. P. Olson, and A. Aspuru-Guzik, “Quantum
autoencoders for efficient compression of quantum data,”
Quantum Science and Technology 2, 045001 (2017).
[17] Ying Li and Simon C Benjamin, “Efficient variational
quantum simulator incorporating active error minimiza-
tion,” Physical Review X 7, 021050 (2017).
[18] Kosuke Mitarai, Makoto Negoro, Masahiro Kitagawa,
and Keisuke Fujii, “Quantum circuit learning,” Physical
Review A 98, 032309 (2018).
[19] Sumeet Khatri, Ryan LaRose, Alexander Poremba,
Lukasz Cincio, Andrew T Sornborger, and Patrick J
Coles, “Quantum-assisted quantum compiling,” Quan-
tum 3, 140 (2019).
[20] Tyson Jones and Simon C Benjamin, “Quantum compi-
lation and circuit optimisation via energy dissipation,”
arXiv:1811.03147 (2018).
[21] Ryan LaRose, Arkin Tikku, Étude O’Neel-Judy, Lukasz
Cincio, and Patrick J Coles, “Variational quantum state
diagonalization,” npj Quantum Information 5, 57 (2019).
[22] Andrew Arrasmith, Lukasz Cincio, Andrew T Sorn-
borger, Wojciech H Zurek, and Patrick J Coles, “Vari-
ational consistent histories as a hybrid algorithm for
quantum foundations,” Nature Communications 10, 3438
(2019).
[23] Marco Cerezo, Alexander Poremba, Lukasz Cincio, and
Patrick J Coles, “Variational quantum fidelity estima-
tion,” Quantum 4, 248 (2020).
[24] Carlos Bravo-Prieto, Ryan LaRose, M. Cerezo, Yigit Subasi,
Lukasz Cincio, and Patrick J. Coles, “Variational quan-
tum linear solver: A hybrid algorithm for linear systems,”
arXiv:1909.05820 (2019).
[25] Cristina Cirstoiu, Zoe Holmes, Joseph Iosue, Lukasz Cin-
cio, Patrick J Coles, and Andrew Sornborger, “Vari-
ational fast forwarding for quantum simulation beyond
the coherence time,” arXiv preprint arXiv:1910.04292
(2019).
[26] M Cerezo, Kunal Sharma, Andrew Arrasmith, and
Patrick J Coles, “Variational quantum state eigensolver,”
arXiv preprint arXiv:2004.01372 (2020).
[27] Timothy Proctor, Melissa Revelle, Erik Nielsen, Kenneth
Rudinger, Daniel Lobser, Peter Maunz, Robin Blume-
Kohout, and Kevin Young, “Detecting, tracking, and
eliminating drift in quantum information processors,”
arXiv:1907.13608 [physics, physics:quant-ph] (2019).
[28] Mohan Sarovar, Timothy Proctor, Kenneth Rudinger,
Kevin Young, Erik Nielsen, and Robin Blume-Kohout,
“Detecting crosstalk errors in quantum information pro-
cessors,” arXiv:1908.09855 [quant-ph] (2019), arXiv:
1908.09855.
[29] Benjamin Schumacher, “Sending entanglement through
noisy quantum channels,” Physical Review A 54, 2614
(1996).
[30] Michał Horodecki, Paweł Horodecki, and Ryszard
Horodecki, “General teleportation channel, singlet frac-
tion, and quasidistillation,” Physical Review A 60, 1888
(1999).
[31] Michael A Nielsen, “A simple formula for the average
gate fidelity of a quantum dynamical operation,” Physics
Letters A 303, 249–252 (2002).
[32] Erik Nielsen, Kenneth Rudinger, Timothy Proctor, An-
tonio Russo, Kevin Young, and Robin Blume-Kohout,
“Probing quantum processor performance with pyGSTi,”
arXiv preprint arXiv:2002.12476 (2020).
[33] Jaewoo Joo, Young-Jai Park, Sangchul Oh, and Jaewan
Kim, “Quantum teleportation via a W state,” New Jour-
nal of Physics 5, 136–136 (2003).
[34] Diogo Cruz, Romain Fournier, Fabien Gremion, Alix
Jeannerot, Kenichi Komagata, Tara Tosic, Jarla Thies-
brummel, Chun Lam Chan, Nicolas Macris, Marc-André
Dupertuis, et al., “Efficient quantum algorithms for GHZ
and W states, and implementation on the IBM quantum
computer,” Advanced Quantum Technologies 2, 1900015
(2019).
[35] M. A. Nielsen and I. L. Chuang, Quantum Computation
and Quantum Information (Cambridge University Press, 2010).
[36] Andrew W. Cross, Lev S. Bishop, John A. Smolin,
and Jay M. Gambetta, “Open Quantum Assembly Lan-
guage,” arXiv e-prints (2017), arXiv:1707.03429 [quant-
ph].
[37] L. C. G. Govia, G. J. Ribeill, D. Ristè, M. Ware, and
H. Krovi, “Bootstrapping quantum process tomography
via a perturbative ansatz,” Nature Communications 11,
1084 (2020).
[38] Kunal Sharma, Sumeet Khatri, Marco Cerezo, and
Patrick Coles, “Noise resilience of variational quantum
compiling,” New Journal of Physics (2020).
[39] Kristan Temme, Sergey Bravyi, and Jay M Gambetta,
“Error mitigation for short-depth quantum circuits,”
Physical Review Letters 119, 180509 (2017).
[40] Abhinav Kandala, Kristan Temme, Antonio D Córcoles,
Antonio Mezzacapo, Jerry M Chow, and Jay M Gam-
betta, “Error mitigation extends the computational reach
of a noisy quantum processor,” Nature 567, 491–495
(2019).
[41] Piotr Czarnik, Andrew Arrasmith, Patrick J Coles, and
Lukasz Cincio, “Error mitigation with Clifford quantum-circuit
data,” arXiv preprint arXiv:2005.10189 (2020).
[42] Armands Strikis, Dayue Qin, Yanzhu Chen, Simon C.
Benjamin, and Ying Li, “Learning-based quantum error
mitigation,” arXiv preprint arXiv:2005.07601 (2020).
[43] Alexander Zlokapa and Alexandru Gheorghiu, “A deep
learning model for noise prediction on near-term quan-
tum devices,” arXiv preprint arXiv:2005.10811 (2020).
[44] Yuval R Sanders, Joel J Wallman, and Barry C Sanders,
“Bounding quantum gate error rate based on reported av-
erage fidelity,” New Journal of Physics 18, 012002 (2015).
[45] Colin J. Trout, Muyuan Li, Mauricio Gutierrez, Yukai
Wu, Sheng-Tao Wang, Luming Duan, and Kenneth R.
Brown, “Simulating the performance of a distance-3 sur-
face code in a linear ion trap,” New Journal of Physics
20, 043038 (2018), arXiv: 1710.01378.
Appendix A: Numerical values of rotation angles
In Table I we list the angles θn that define the Z(θ) gates in all the circuits presented in the main text.
n  θn in Fig. 5  θn in Fig. 6(b)  θn in Fig. 7  θn in Fig. 9(c)  θn in Fig. 10  θn in Fig. 11(b)  θn in Fig. 12  θn in Fig. 13(b)  θn in Fig. 14
1 3.926991 4.729459 1.249703 1.570796 6.261032 0.785398 0.000110 2.748894 5.528124
2 1.570796 2.356455 2.410073 0.785398 1.581678 2.356194 3.107730 1.570796 1.603007
3 1.570796 6.271231 5.503714 2.356194 0.618247 3.141593 0.775046 2.356194 3.180240
4 0.785398 0.012770 0.172928 3.141593 6.210505 0.615480 1.385048 1.570796 1.588121
5 5.497787 5.497958 0.106332 0.785398 3.155136 2.526113 3.218745 0.785398 5.982926
6 5.497787 0.017911 0.022432 2.356194 3.088771 3.141593 6.184576 5.497787 1.517815
7 0.785398 0.785692 1.621729 3.141593 3.127992 1.369438 0.000214 1.570796 3.174252
8 5.497787 4.713347 6.267293 2.708279 0.785398 0.725692 2.356194 2.380614
9 2.355849 3.672289 1.477670 2.356194 0.895856 5.497787 3.909589
10 4.711034 0.132619 2.327048 3.141593 0.149143 0.392699 6.271246
11 5.697289 0.012648 2.289056 5.890486 0.006795
12 3.141953 0.876330 3.142930 3.903039
13 4.364565 0.444937 4.665748 4.728925
14 0.964557 4.781066 0.000281 3.157351
15 6.037635 5.429627 0.614126 2.383873
16 5.975455 2.827826 6.176411 0.022179
17 0.159144 3.101400 3.698677 3.135448
18 6.194334 0.505015 1.349278 5.062797
19 1.518005 1.752444 5.651896 3.119667
20 2.570119 0.077924 0.048880 0.821506
21 2.836344 2.324862 1.604027
22 3.171423 5.525191 4.721183
23 4.005286 1.711153 0.000189
24 0.113893 1.563388
25 0.040828 3.165257
26 0.863436 0.025552
27 6.210510 3.165166
28 4.981590 1.566059
29 2.622501 0.897177
30 0.536382 5.028318
31 2.882208 6.282094
32 0.148144 2.400830
33 2.916385 3.127614
34 5.971181 2.312352
35 5.047211
TABLE I. Angles (in radians) defining the Z(θ) gates in each of the circuits presented in the main text.
Appendix B: Noise model process matrices
In this Appendix we list the process matrices and SPAM elements, derived from GST experiments, that define our
error model for the 5-qubit device on which we demonstrate NACL. These process matrices are completely positive,
trace-preserving estimates of the corresponding operations. (To estimate these process matrices, GST
required that we also estimate the process matrix corresponding to the Y(π/2) operation. We omit that estimate
here because our device model does not include the Y(π/2) gate in the native gate set.) All process matrices are given
in the Pauli basis (i.e., they are “Pauli transfer matrices”), while the SPAM operations are given in the “standard”
representation. Because of throughput constraints, only “short” GST circuits (i.e., circuits for linear-inversion GST
[10]) were used; each circuit was repeated 1024 times.
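\[
I = \begin{pmatrix}
 1.0000 & -0.0000 &  0.0000 & -0.0000 \\
 0.0042 &  0.9943 & -0.0064 &  0.0178 \\
-0.0033 &  0.0120 &  0.9962 &  0.0186 \\
 0.0029 & -0.0182 & -0.0167 &  0.9928
\end{pmatrix},
\qquad
X(\pi/2) = \begin{pmatrix}
 1.0000 &  0.0000 &  0.0000 & -0.0000 \\
 0.0007 &  0.9988 & -0.0050 & -0.0055 \\
-0.0010 & -0.0060 &  0.0167 & -0.9980 \\
-0.0017 &  0.0065 &  0.9979 &  0.0176
\end{pmatrix},
\]
\[
P_0 = \begin{pmatrix} 0.9997 & -0.0006 \\ 0.0055 & 0.0231 \end{pmatrix},
\qquad
P_1 = \begin{pmatrix} 0.0003 & 0.0006 \\ -0.0055 & 0.9769 \end{pmatrix},
\qquad
\rho_0 = \begin{pmatrix} 0.9903 & 0 \\ 0 & 0.0097 \end{pmatrix}.
\]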
Here, P0 and P1 are the imperfect POVM effects for projections onto the |0〉 and |1〉 states, respectively. ρ0 is the
density matrix for the single-qubit imperfect state preparation. Finally,
\[
\mathrm{CNOT} = \left(\begin{array}{rrrrrrrrrrrrrrrr}
1.000 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.012 & 0.973 & 0.016 & 0.005 & 0.005 & -0.002 & 0.012 & -0.004 & -0.002 & 0.003 & -0.004 & 0.002 & -0.010 & 0.008 & 0.015 & -0.001 \\
0.001 & -0.009 & 0.004 & -0.003 & -0.002 & 0.000 & -0.023 & 0.001 & -0.006 & -0.001 & -0.007 & 0.003 & 0.005 & -0.019 & 0.974 & 0.003 \\
0.002 & 0.006 & 0.000 & 0.003 & -0.005 & -0.001 & 0.002 & -0.021 & -0.010 & 0.001 & 0.003 & -0.010 & -0.001 & -0.007 & 0.004 & 0.983 \\
0.002 & 0.001 & 0.012 & -0.008 & 0.015 & 0.964 & 0.017 & 0.004 & 0.001 & 0.020 & -0.018 & 0.003 & 0.048 & 0.020 & -0.002 & -0.004 \\
0.002 & -0.001 & 0.004 & 0.002 & 0.980 & 0.004 & -0.002 & -0.009 & 0.018 & 0.001 & -0.005 & 0.012 & 0.021 & 0.042 & 0.002 & 0.005 \\
-0.002 & -0.003 & 0.041 & 0.002 & -0.009 & 0.001 & 0.005 & -0.018 & -0.005 & -0.002 & 0.003 & 0.977 & 0.014 & -0.003 & 0.000 & 0.012 \\
-0.003 & -0.006 & -0.002 & 0.045 & -0.006 & 0.019 & 0.015 & 0.006 & -0.002 & 0.022 & -0.968 & -0.001 & -0.006 & 0.001 & -0.008 & 0.005 \\
0.001 & 0.007 & -0.004 & 0.001 & 0.000 & -0.019 & 0.017 & -0.001 & 0.011 & 0.966 & 0.019 & 0.003 & 0.012 & 0.009 & -0.002 & -0.005 \\
0.001 & 0.008 & 0.004 & -0.001 & -0.021 & -0.000 & 0.002 & -0.011 & 0.981 & 0.004 & -0.001 & -0.005 & 0.014 & 0.004 & 0.002 & 0.010 \\
-0.001 & -0.005 & 0.007 & -0.002 & 0.005 & 0.005 & -0.003 & -0.975 & -0.011 & 0.002 & 0.007 & -0.020 & -0.003 & -0.002 & 0.008 & -0.023 \\
-0.002 & -0.012 & 0.004 & 0.006 & 0.003 & -0.021 & 0.967 & 0.001 & -0.005 & 0.017 & 0.016 & 0.007 & 0.003 & 0.004 & 0.021 & 0.004 \\
-0.002 & -0.003 & -0.001 & 0.001 & -0.021 & -0.035 & -0.008 & -0.001 & -0.010 & -0.006 & 0.001 & -0.006 & 0.987 & 0.002 & 0.001 & -0.000 \\
-0.008 & 0.006 & 0.012 & -0.001 & -0.043 & -0.020 & -0.003 & 0.003 & -0.010 & -0.009 & 0.003 & 0.008 & 0.011 & 0.970 & 0.016 & 0.007 \\
0.005 & -0.018 & 0.973 & 0.003 & -0.004 & -0.009 & 0.002 & 0.008 & 0.002 & 0.005 & -0.001 & -0.039 & -0.004 & -0.007 & 0.005 & -0.005 \\
0.000 & -0.007 & 0.005 & 0.982 & 0.005 & 0.002 & -0.008 & 0.003 & 0.003 & -0.009 & 0.040 & 0.002 & 0.002 & 0.005 & 0.001 & 0.001
\end{array}\right).
\]
We also list below various error metrics for these noisy operators (as compared to ideal operators).
Gate label   Infidelity    1/2 diamond distance
I            2.8 × 10^-3   1.7 × 10^-2
X(π/2)       8.8 × 10^-4   1.1 × 10^-2
CNOT         1.9 × 10^-2   5.0 × 10^-2
ρ0           9.7 × 10^-3   -
P0           2.0 × 10^-3   -
P1           2.3 × 10^-2   -
TABLE II. Error metrics for noisy operations (compared to ideal operations) used in our device model input to NACL. For
gate operations, average gate infidelity and half-diamond distance are presented, while for SPAM operations, only state
infidelity is used.
“Infidelity” for gate operations is taken to be average gate infidelity, i.e., $1-\bar{F}$, where $\bar{F}$ is the average gate fidelity
(with respect to the desired target operation), as defined in Eq. (12). For SPAM operations we simply use state
infidelity, i.e.,
\[
1 - F(\rho,\sigma) = 1 - \left( \mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}} \right)^2. \tag{B1}
\]
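Half-diamond distance, denoted $\epsilon_\diamond$, is defined as
\[
\epsilon_\diamond(A,B) = \tfrac{1}{2}\,\|A-B\|_\diamond = \tfrac{1}{2}\,\sup_{\rho} \left\| (A\otimes \mathbb{1}_d)[\rho] - (B\otimes \mathbb{1}_d)[\rho] \right\|_1, \tag{B2}
\]
where $\|\cdot\|_1$ is the trace norm, the supremum is taken over all density matrices of dimension $d^2$, and $d = \dim A = \dim B$.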
Average gate infidelity may be thought of as the infidelity of a state that has passed through the gate’s channel,
averaged over the Haar measure; diamond distance may be thought of as a worst-case error rate. Average gate
infidelity is quadratically more sensitive to stochastic error than to unitary error, while diamond distance is equally
sensitive to both classes of errors [44].
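The infidelity entries of Table II can be cross-checked directly from the matrices listed in this Appendix. The following is a minimal sketch, not the code used to produce the table. It assumes the standard relations $F_e = \mathrm{Tr}(R_{\rm ideal}^{T} R)/d^2$ and $\bar{F} = (dF_e + 1)/(d+1)$ between a Pauli transfer matrix $R$ and the average gate fidelity, which may differ in form from Eq. (12). The ideal X(π/2) matrix is an assumption read off from the dominant entries of the estimate, and all function names are illustrative.

```python
import numpy as np

# Pauli transfer matrices copied from the GST estimates listed above,
# in the (I, X, Y, Z) ordering.
PTM_I = np.array([
    [ 1.0000, -0.0000,  0.0000, -0.0000],
    [ 0.0042,  0.9943, -0.0064,  0.0178],
    [-0.0033,  0.0120,  0.9962,  0.0186],
    [ 0.0029, -0.0182, -0.0167,  0.9928],
])
PTM_X = np.array([
    [ 1.0000,  0.0000,  0.0000, -0.0000],
    [ 0.0007,  0.9988, -0.0050, -0.0055],
    [-0.0010, -0.0060,  0.0167, -0.9980],
    [-0.0017,  0.0065,  0.9979,  0.0176],
])

# Ideal targets. Assumption: the ideal X(pi/2) maps Y -> Z and Z -> -Y,
# matching the dominant entries of the estimate above.
IDEAL_I = np.eye(4)
IDEAL_X = np.array([
    [1, 0, 0,  0],
    [0, 1, 0,  0],
    [0, 0, 0, -1],
    [0, 0, 1,  0],
], dtype=float)

def average_gate_infidelity(ptm, ideal_ptm):
    """1 - F_avg, with F_avg = (d * F_e + 1) / (d + 1)."""
    d = int(round(np.sqrt(ptm.shape[0])))          # Hilbert-space dimension
    f_ent = np.trace(ideal_ptm.T @ ptm) / d**2     # entanglement fidelity
    return 1 - (d * f_ent + 1) / (d + 1)

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0, None)                  # guard against round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def state_infidelity(rho, sigma):
    """1 - (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2, as in Eq. (B1)."""
    s = psd_sqrt(rho)
    return 1 - np.real(np.trace(psd_sqrt(s @ sigma @ s))) ** 2

RHO0 = np.diag([0.9903, 0.0097])   # estimated state preparation
KET0 = np.diag([1.0, 0.0])         # ideal |0><0|

print(f"I      : {average_gate_infidelity(PTM_I, IDEAL_I):.1e}")
print(f"X(pi/2): {average_gate_infidelity(PTM_X, IDEAL_X):.1e}")
print(f"rho0   : {state_infidelity(RHO0, KET0):.1e}")
```

Under these assumptions the three printed values round to the corresponding Table II entries (2.8e-03, 8.8e-04 and 9.7e-03), which indicates that the tabulated gate infidelities are average gate infidelities rather than entanglement infidelities.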
Appendix C: Noise model for a trapped-ion quantum computer
In addition to the noise model presented in the main text, we also ran NACL using an effective noise model
built from error metrics derived from a near-term trapped-ion quantum computer. We adapt
the coarse-grained error maps used to model errors during execution of a common trapped-ion gate set developed
by Trout et al. in Ref. [45]. In particular, the native gates in the processor are assumed to be $X(\theta)$, $Y(\theta)$, $Z(\theta)$ and
$XX(\theta) \equiv e^{i\theta X\otimes X}$, where the first three are single-qubit rotations about the three orthogonal axes and the last is
an arbitrary-angle Mølmer-Sørensen interaction between two qubits. The quantum channels representing the noisy
versions of each of these gates are given by:
\begin{align*}
\mathcal{E}_{X(\theta)} &= \mathcal{D}(p_d) \circ \mathcal{W}(p_{\rm dep}) \circ \mathcal{R}_X(p_\alpha) \circ \mathcal{U}_X(\theta), \\
\mathcal{E}_{Y(\theta)} &= \mathcal{D}(p_d) \circ \mathcal{W}(p_{\rm dep}) \circ \mathcal{R}_Y(p_\alpha) \circ \mathcal{U}_Y(\theta), \\
\mathcal{E}_{Z(\theta)} &= \mathcal{D}(p_d) \circ \mathcal{W}(p_{\rm dep}) \circ \mathcal{R}_Z(p_\alpha) \circ \mathcal{U}_Z(\theta), \\
\mathcal{E}_{XX(\theta)} &= [\mathcal{D}_1(p_{d,1}) \otimes \mathcal{D}_2(p_{d,2})] \circ [\mathcal{W}_1(p_{\rm dep}) \otimes \mathcal{W}_2(p_{\rm dep})] \circ \mathcal{H}(p_{xx}) \circ \mathcal{H}(p_h) \circ \mathcal{U}_{XX}(\theta).
\end{align*}
Here, $\mathcal{U}_k(\theta)$ represents an ideal rotation about axis $k$ (e.g., $\mathcal{U}_X(\theta)\rho = e^{-i\theta X}\rho\, e^{i\theta X}$), $\mathcal{R}_k(p_\alpha)$ represents the effects of
rotation angle imprecision about axis $k$ (e.g., $\mathcal{R}_X(p_\alpha)\rho = (1-p_\alpha)\rho + p_\alpha X\rho X$), $\mathcal{W}(p_{\rm dep})$ is a depolarizing channel
(i.e., $\mathcal{W}(p_{\rm dep})\rho = (1-p_{\rm dep})\rho + p_{\rm dep} I/2$), $\mathcal{D}(p_d)$ is a dephasing channel (i.e., $\mathcal{D}(p_d)\rho = (1-p_d)\rho + p_d Z\rho Z$), and finally,
$\mathcal{H}(p)\rho = (1-p)\rho + p\,XX\rho XX$ is a two-qubit channel that represents the effects of an imprecise rotation (when $p = p_{xx}$)
or the effects of ion heating (when $p = p_h$). For the two-qubit operation, the subscript on any of these channels
denotes the qubit on which it acts.
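The composition of single-qubit error maps defined above can be sketched in a few lines acting directly on density matrices. This is an illustrative sketch under stated assumptions rather than the implementation used for the results: the function names are hypothetical, each stochastic channel is taken to be trace preserving (weight $p$ on the error term), and the depolarizing channel is assumed to mix in the maximally mixed state $I/2$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = {
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def rotation(rho, axis, theta):
    """Ideal rotation U_k(theta): rho -> exp(-i theta P) rho exp(+i theta P)."""
    P = PAULI[axis]
    U = np.cos(theta) * I2 - 1j * np.sin(theta) * P
    return U @ rho @ U.conj().T

def pauli_flip(rho, axis, p):
    """rho -> (1 - p) rho + p P rho P.

    Covers R_k(p_alpha) and, with axis='Z', the dephasing channel D(p_d).
    """
    P = PAULI[axis]
    return (1 - p) * rho + p * (P @ rho @ P)

def depolarize(rho, p):
    """W(p): rho -> (1 - p) rho + p Tr(rho) I/2 (assumed normalization)."""
    return (1 - p) * rho + p * np.trace(rho) * I2 / 2

def noisy_rotation(rho, axis, theta, p_alpha, p_dep, p_d):
    """E_k(theta) = D(p_d) o W(p_dep) o R_k(p_alpha) o U_k(theta)."""
    rho = rotation(rho, axis, theta)      # applied first (rightmost map)
    rho = pauli_flip(rho, axis, p_alpha)  # rotation-angle imprecision
    rho = depolarize(rho, p_dep)          # depolarizing noise
    return pauli_flip(rho, "Z", p_d)      # dephasing, applied last

# Usage with exaggerated, purely illustrative error rates.
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)
out = noisy_rotation(rho0, "X", np.pi / 2, p_alpha=1e-2, p_dep=2e-2, p_d=1e-2)
print(np.round(out.real, 4))
```

The two-qubit map $\mathcal{E}_{XX(\theta)}$ follows the same pattern, with the single-qubit channels applied to each qubit through Kronecker products.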
In addition to these imperfect gates, we model SPAM errors by following an ideal ground state preparation with a
depolarizing channel, and by preceding ideal single-qubit measurement POVM effects by a depolarizing channel, i.e.,
\begin{align}
\langle\langle 0| &\to \langle\langle 0|\,\mathcal{W}(p_{\rm dep}), \nonumber \\
|i\rangle\rangle &\to \mathcal{W}(p_{\rm dep})\,|i\rangle\rangle \quad \text{for } i = 0, 1, \tag{C1}
\end{align}
where we have written state preparation and measurement effects as Hilbert-Schmidt vectors. Finally, in order to
capture noise during idle cycles, all idles are modeled as a depolarizing channel $\mathcal{W}(p_{\rm idle})$.
This effective noise model captures many of the non-idealities in typical ion-trap quantum computing architectures.
However, note that under this model there are no connectivity restrictions, and it is possible to perform a two-qubit
gate between any two qubits. In the following computations we use the error rates:
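\begin{align}
p_d &= 1.5\times 10^{-4}, \nonumber \\
p_{\rm dep} &= 8\times 10^{-4}, \nonumber \\
p_{d,1} = p_{d,2} &= 7.5\times 10^{-4}, \nonumber \\
p_\alpha &= 1\times 10^{-4}, \nonumber \\
p_{xx} &= 1\times 10^{-3}, \nonumber \\
p_h &= 1.25\times 10^{-3}, \nonumber \\
p_{\rm idle} &= 8\times 10^{-4}. \tag{C2}
\end{align}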
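To get a feel for the scale of these rates, one can combine the single-qubit error maps into a single Pauli transfer matrix and evaluate its average gate infidelity. The sketch below is only an illustration and is not a result reported in this work. It assumes trace-preserving forms of the channels, $\rho \to (1-p)\rho + pP\rho P$ for the Pauli-flip maps and $\rho \to (1-p)\rho + pI/2$ for depolarizing, and uses hypothetical helper names. Because the ideal rotation is unitary, the infidelity of the noisy gate relative to it depends only on the trace of the noise part.

```python
import numpy as np

# Error rates from Eq. (C2) that enter the single-qubit gates.
p_d, p_dep, p_alpha = 1.5e-4, 8e-4, 1e-4

def pauli_flip_ptm(axis, p):
    """PTM of rho -> (1 - p) rho + p P rho P in the (I, X, Y, Z) basis."""
    diag = np.full(4, 1 - 2 * p)      # Paulis anticommuting with P are damped
    diag[0] = 1.0                     # identity component is untouched
    diag["IXYZ".index(axis)] = 1.0    # P commutes with itself
    return np.diag(diag)

def depolarizing_ptm(p):
    """PTM of rho -> (1 - p) rho + p I/2 (assumed normalization)."""
    return np.diag([1.0, 1 - p, 1 - p, 1 - p])

def average_gate_infidelity_of_noise(noise_ptm, d=2):
    """Average gate infidelity of (noise o U) relative to the unitary U."""
    f_ent = np.trace(noise_ptm) / d**2
    return 1 - (d * f_ent + 1) / (d + 1)

# Noise part of E_X(theta): D(p_d) o W(p_dep) o R_X(p_alpha).
noise_X = pauli_flip_ptm("Z", p_d) @ depolarizing_ptm(p_dep) @ pauli_flip_ptm("X", p_alpha)
infidelity_X = average_gate_infidelity_of_noise(noise_X)
print(f"average gate infidelity of noisy X(theta): {infidelity_X:.2e}")
```

To first order in the rates, this quantity is $\tfrac{2}{3}\,(p_d + \tfrac{3}{4}p_{\rm dep} + p_\alpha)$ under the stated conventions, so the depolarizing term dominates the single-qubit gate error in this model.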
