Expressibility and entangling capability of parameterized quantum
  circuits for hybrid quantum-classical algorithms by Sim, Sukin et al.
Sukin Sim,1, 2, ∗ Peter D. Johnson,2 and Alán Aspuru-Guzik2, 3, 4, 5, †
1Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street,
Cambridge, MA 02138, USA
2Zapata Computing, Inc., 501 Massachusetts Avenue,
Cambridge, MA 02139, USA
3Department of Chemistry and Department of Computer Science, University of Toronto, 80 St. George Street,
Toronto, ON M5S 3H6, Canada
4Canadian Institute for Advanced Research (CIFAR) Senior Fellow, 661 University Avenue, Suite 505,
Toronto, ON M5G 1M1, Canada
5Vector Institute, 661 University Avenue, Suite 710
Toronto, ON M5G 1M1, Canada
(Dated: May 28, 2019)
Parameterized quantum circuits play an essential role in the performance of many variational
hybrid quantum-classical (HQC) algorithms. One challenge in implementing such algorithms is to
choose an effective circuit that well represents the solution space while maintaining a low circuit
depth and number of parameters. To characterize and identify expressible, yet compact, parame-
terized circuits, we propose several descriptors, including measures of expressibility and entangling
capability, that can be statistically estimated from classical simulations of parameterized quantum
circuits. We compute these descriptors for different circuit structures, varying the qubit connectivity
and selection of gates. From our simulations, we identify circuit fragments that perform well with
respect to the descriptors. In particular, we quantify the substantial improvement in performance
of two-qubit gates in a ring or all-to-all connected arrangement compared to that of those on a line.
Furthermore, we quantify the improvement in expressibility and entangling capability achieved by
sequences of controlled X-rotation gates compared to sequences of controlled Z-rotation gates. In
addition, we investigate how expressibility “saturates” with increased circuit depth, finding that the
rate and saturated-value appear to be distinguishing features of a parameterized quantum circuit
template. While the correlation between each descriptor and performance of an algorithm remains to
be investigated, methods and results from this study can be useful for both algorithm development
and design of experiments for general variational HQC algorithms.
Due to significant development in both algorithm and
hardware, we are expected to approach the era of “Noisy
Intermediate-Scale Quantum” (NISQ) devices in the near
future [1]. This generation of quantum machines is expected
to support 50–100 qubits and around 10^3 gate
operations. While these devices cannot perform error-
corrected, large scale quantum computations, smaller but
meaningful computations are anticipated to find use by
combining both quantum and classical computational re-
sources.
A particular class of algorithms that maximizes the use
of such pre-threshold hardware is the hybrid quantum-
classical (HQC) algorithm, which strategically divides
computational tasks between quantum and classical re-
sources. A prime example of a HQC algorithm is the
variational quantum eigensolver (VQE), used to compute
the ground states of molecular systems [2, 3]. Within the
VQE framework, a parameterized trial wavefunction of
the system-of-interest is prepared on the quantum com-
puter by tuning a quantum circuit based on the chosen
parameterization. This is followed by measurements of
the energy expectation value with respect to the ansatz
parameters. These parameters are updated using optimization
routines on the classical computer and fed back
into the quantum device to prepare a “better-learned”
trial state. This cycle of updating the trial state and its
parameters continues until the convergence criteria (e.g.
energy convergence) are satisfied.
∗ ssim@g.harvard.edu
† alan@aspuru.com
Other examples of HQC algorithms include the quan-
tum approximate optimization algorithm (QAOA) [4],
quantum autoencoder (QAE) [5], quantum variational
error corrector (QVECTOR) [6], classification via near-
term quantum neural networks (QNN) [7–9], generative
modeling [10–12], among others. These algorithms pro-
vide an approach for a variety of problems. Though
their objective functions (e.g. energy, average fidelity,
mean squared error, KL divergence) may differ, these al-
gorithms share a quantum subroutine for producing pa-
rameterized trial state(s) or ansatz(e) such that the pa-
rameters can be tuned to correspond to the optimal or
near-optimal objective function value. Consequently, the
performance (i.e. accuracy, scalability) of each algorithm
will depend on the expressive power of the chosen param-
eterized quantum circuit.
Despite the crucial role of the parameterized circuit for
a variety of HQC algorithms, there is a lack of general
understanding and intuition behind characteristics associated
with an “effective” training circuit. For instance,
given two circuits A and B, which circuit is more suitable
for a given application and why? Are there figures-of-merit,
or more neutrally, descriptors we can compute for
these circuits to approach such a question? In this work,
we present a set of operational descriptors for characterizing
and evaluating parameterized quantum circuits using
classical simulations such that given a particular problem
to solve using an HQC algorithm, we are equipped with
tools for making decisions about and designing parameterized
quantum circuits.
arXiv:1905.10876v1 [quant-ph] 26 May 2019
Similar questions have been posed and addressed in
the field of statistical learning, in which various metrics,
such as empirical risk, generalization ability, and sample
complexity, have been proposed to evaluate the perfor-
mance of a learning algorithm [13]. In practice, a method
used to improve performance defined by one metric can
come at the cost of losing performance defined by another
metric, demonstrating a tension between these descrip-
tors. One example is the balance between minimizing
empirical risk and maximizing generalization ability for
a supervised learning task: while the algorithm should
learn the training data well, it is also important to per-
form well with respect to unobserved data. In recent
years, the theory has been extended to the quantum do-
main [14]. Our work can be viewed as an effort to develop
NISQ analogues to the ansatz descriptors in statistical
learning theory. Further investigation is needed to more
rigorously connect descriptors from this work to analo-
gous metrics in statistical learning theory. But, situating
our framework as a potential NISQ analogue urges us
to investigate new questions about PQCs and to address
analogous challenges, such as the phenomenon of “barren
plateaus” [15] in training landscapes.
Before presenting the descriptors, we provide a brief
background on the structure and past studies of param-
eterized quantum circuits in Section I. In Section II, we
propose two descriptors, namely expressibility and entan-
gling capability, that provide a quantitative description of
parameterized circuits independent of the algorithm or
application. In Section III, we demonstrate the utility of
such descriptors by computing and analyzing these quan-
tities for a select set of parameterized circuits, several of
which have been designed or inspired by past studies. In
Section IV, we discuss key observations from our simula-
tions, including the phenomenon of expressibility satura-
tion, the efficacy of two-qubit controlled Z-rotation (CRZ)
versus controlled X-rotation (CRX) gates, and the circuit
configuration or topology. We conclude with a summary
and outlook of future directions for this work.
I. PARAMETERIZED QUANTUM CIRCUITS
We define a parameterized quantum circuit (PQC) as a
tunable unitary operation on n qubits, Uθ, that is applied
to a reference state |φ0〉, often set to |0〉⊗n. The resulting
parameterized quantum state is:
|ψθ〉 = Uθ |φ0〉 , (1)
where θ is a vector of a polynomial number of circuit
parameters. In this work, circuit parameters correspond
to angles of rotation gates (e.g. θ in RX(θ)), but more
generally, they can represent any tunable parameters in
a quantum operation. In near-term HQC settings, the
PQC is the point-of-contact between quantum and classi-
cal computational resources. That is, upon computing an
objective function value based on executing these circuit
runs on the quantum computer, the circuit parameters
are then refined using optimization schemes on the clas-
sical computer. More recently, this model has been com-
pared to and interpreted in the language of classical neu-
ral networks, in which the parameters of the quantum cir-
cuit are analogous to the parameters (i.e. weights, biases)
of a classical neural network [9, 16]. And just as various
neural network architectures have been proposed for spe-
cific tasks, the structure of the parameterized quantum
circuits can widely vary depending on the application.
For instance, in the case of simulating fermionic systems,
various ansatz designs, including unitary coupled-cluster
[2, 17], fermionic SWAP network [18], and low-depth cir-
cuit ansatz (LDCA) [19], have been proposed. In recent
years, a more heuristic but near-term approach to circuit
designs, e.g. “hardware-efficient” circuits [20], has been
used for applications in quantum chemistry and quantum
machine learning. Specifically, this circuit layout assumes
a unit layer containing single-qubit operations followed
by entangling two-qubit operations. In this work, we will
also refer to this unit layer of gate sequence as a circuit
template. This unit layer can be repeated L times to
provide more flexibility in the circuit. This “multi-layer”
circuit architecture, reminiscent of deep neural networks,
has been shown to provide better results in VQE [20].
The importance of parameterized circuits has led to
the development of new ansatz designs as well as studies
of circuit properties and capabilities [21, 22]. However,
there remains a lack of understanding on what makes a
particular parameterized circuit more powerful or useful
than another. In this work, we approach this question by
defining some operational quantities to characterize pa-
rameterized circuits. In the following section, we propose
two descriptors, expressibility and entangling capability,
that can be quantified by computing statistical properties
based on sampling states from a parameterized quantum
circuit template.
II. CIRCUIT DESCRIPTORS
In this section, we define two descriptors which we call
expressibility and entangling capability, in addition to
other descriptors such as the number of circuit parame-
ters and the number of two-qubit operations. Several of
these quantities or related quantities have been applied
to past studies of pseudorandom quantum circuits [23].
However, our objective in this work is not to generate
pseudorandom circuits but rather to study the capabilities
of a PQC by quantifying its deviation from random cir-
cuits in order to approach the question of how much gen-
eralization or expressiveness is useful or enough in a PQC
for a particular task.
A. Expressibility
We define expressibility as a circuit’s ability to gen-
erate (pure) states that are well representative of the
Hilbert space. In the case of a single qubit, this corre-
sponds to a circuit’s ability to explore the Bloch sphere.
One approach for computing this notion of expressibil-
ity is to compare the distribution of states obtained from
sampling the parameters of a PQC to the (expressive)
uniform distribution of states, i.e. the ensemble of Haar-
random states. To quantify the non-uniformity, we look
to the definition of an ε-approximate state t-design using
the Hilbert-Schmidt norm¹. That is, we are interested
in computing the deviation from a state t-design, where
ε may not necessarily be small. The deviation is often
quantified as:
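$$A = \int_{\mathrm{Haar}} \left(|\psi\rangle\langle\psi|\right)^{\otimes t} d\psi - \int_{\mathcal{E}} \left(|\phi\rangle\langle\phi|\right)^{\otimes t} d\phi, \tag{2}$$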
where the first integral is taken over all pure states over
the Haar measure, and the second integral is taken over
all states |φ〉 according to an ensemble in consideration,
ℰ. In our application, we consider a specific case of a state
ensemble that is generated by uniformly sampling in the
parameter space. Choosing an uninformative distribution
on θ reflects the fact that in practice, the optimizer
employed in variational algorithms initially lacks
knowledge of the parameter landscape. Considering such
an ensemble, A can be rewritten as:
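$$A = \int_{\mathrm{Haar}} \left(|\psi\rangle\langle\psi|\right)^{\otimes t} d\psi - \int_{\Theta} \left(|\psi_\theta\rangle\langle\psi_\theta|\right)^{\otimes t} d\theta, \tag{3}$$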
where the latter integral, which we call µt, is taken over
all states over the measure induced by uniformly sam-
pling the parameters θ of the PQC. Once matrix A is
defined, consider the square of its Hilbert-Schmidt norm,
which can be expanded as follows:
1 This definition of an approximate t-design employing the Hilbert-
Schmidt norm can be related to alternative definitions via norm
inequalities. That is, an ε-approximate t-design of a particular
definition and an ε′-approximate t-design of another definition
are related by a factor of poly(2^{nt}) for n qubits [24].
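\begin{align}
\|A\|_{HS}^{2} &= \mathrm{tr}(A^{\dagger}A) \tag{4} \\
&= \mathrm{tr}\left( \left[ \frac{\Pi^{t}_{\mathrm{sym}}}{d^{(t)}_{\mathrm{sym}}} - \mu_t \right]^{\dagger} \left[ \frac{\Pi^{t}_{\mathrm{sym}}}{d^{(t)}_{\mathrm{sym}}} - \mu_t \right] \right) \tag{5} \\
&= \mathrm{tr}\left[ \left( \frac{\Pi^{t}_{\mathrm{sym}}}{d^{(t)}_{\mathrm{sym}}} \right)^{2} \right] - 2\, \mathrm{tr}\left( \frac{\Pi^{t}_{\mathrm{sym}}\, \mu_t}{d^{(t)}_{\mathrm{sym}}} \right) + \mathrm{tr}(\mu_t^{2}) \tag{6} \\
&= \frac{1}{d^{(t)}_{\mathrm{sym}}} - \frac{2}{d^{(t)}_{\mathrm{sym}}} + \mathrm{tr}(\mu_t^{2}) \tag{7} \\
&= -\frac{1}{d^{(t)}_{\mathrm{sym}}} + \mathrm{tr}(\mu_t^{2}). \tag{8}
\end{align}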
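As a brief worked step (a sketch added for clarity, not part of the original derivation), the passage from Eq.(6) to Eq.(7) can be checked with two facts. First, a projector satisfies Π² = Π and its trace equals the dimension of the subspace it projects onto. Second, each operator (|ψθ〉〈ψθ|)⊗t is supported on the symmetric subspace, so the projector acts trivially on µt. The second fact, together with a normalized measure on Θ so that tr(µt) = 1, is assumed here:

```latex
\begin{align*}
\mathrm{tr}\!\left[\left(\frac{\Pi^{t}_{\mathrm{sym}}}{d^{(t)}_{\mathrm{sym}}}\right)^{2}\right]
  &= \frac{\mathrm{tr}\,\Pi^{t}_{\mathrm{sym}}}{\big(d^{(t)}_{\mathrm{sym}}\big)^{2}}
   = \frac{1}{d^{(t)}_{\mathrm{sym}}}, \\
2\,\mathrm{tr}\!\left(\frac{\Pi^{t}_{\mathrm{sym}}\,\mu_t}{d^{(t)}_{\mathrm{sym}}}\right)
  &= \frac{2\,\mathrm{tr}(\mu_t)}{d^{(t)}_{\mathrm{sym}}}
   = \frac{2}{d^{(t)}_{\mathrm{sym}}}.
\end{align*}
```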
In Eq.(5), we substitute the Haar integral with the normalized
projector onto the symmetric subspace of t copies of the
2^n-dimensional Hilbert space, or $\Pi^{t}_{\mathrm{sym}} / d^{(t)}_{\mathrm{sym}}$, where
$d^{(t)}_{\mathrm{sym}}$ denotes the dimension of the subspace. Using properties of the projector,
specifically properties of a general projector as well
as the expansion of the Haar integral as a linear combination
of subsystem permutation operators, the expression
simplifies to a sum of two terms – a constant (provided
a fixed number of qubits) plus a purity term for µt, as
shown in Eq.(8). The purity term is:
$$\mathrm{tr}(\mu_t^{2}) = \int_{\Theta} \int_{\Phi} |\langle\psi_\theta|\psi_\phi\rangle|^{2t}\, d\theta\, d\phi, \tag{9}$$
which as seen above, is integrated twice over the param-
eter space. Each term in Eq.(8) can be expressed as a
quantity called the frame potential [25]:
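$$F^{(t)}_{\mathrm{Haar}} = \frac{1}{d^{(t)}_{\mathrm{sym}}} \quad \text{and} \quad F^{(t)} = \int_{\Theta} \int_{\Phi} |\langle\psi_\theta|\psi_\phi\rangle|^{2t}\, d\theta\, d\phi, \tag{10}$$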
where the generalized t-th frame potential F (t) for the
state |0〉 = |0〉⊗n is defined as:
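\begin{align}
F^{(t)}_{|0\rangle\langle 0|} &= \int d\mu(U)\, d\mu(V)\, \left[ \langle 0|UV^{\dagger}|0\rangle \langle 0|VU^{\dagger}|0\rangle \right]^{t} \tag{11} \\
&= \int d\mu(|\psi\rangle)\, d\mu(|\phi\rangle)\, |\langle\psi|\phi\rangle|^{2t}. \tag{12}
\end{align}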
In this definition, the measures µ correspond, respectively,
to the distribution over unitaries and the distribution
over states induced by applying unitaries from
the distribution to the initial state |0〉. For an ensemble
of Haar random states or a state t-design,
$F^{(t)} = 1/d^{(t)}_{\mathrm{sym}} = \frac{t!\,(N-1)!}{(t+N-1)!}$, where $N = 2^{n}$ for n qubits.
These frame potential values saturate the so-called Welch bounds [26].
Past studies have shown that the first t frame potentials
of an ensemble achieve their minimum values (Welch
bounds) if and only if the ensemble is a state t-design [27–29],
motivating frame potentials as “probe[s] of randomness”
[25]. This lower bound can be observed by noting
that $\|A\|_{HS}^{2} \geq 0$, and thus Eqs.(8) and (10) imply $F^{(t)} \geq F^{(t)}_{\mathrm{Haar}}$.
Given this property, one potential approach to quantifying
expressibility could be to estimate the first few frame
potentials using sampled parameterized states and compare
them against the Haar values. However, using this
method, it is difficult to derive a single meaningful score.
Instead, we draw inspiration from the structure of frame
potentials to establish an operationally meaningful score
of expressibility.
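The frame-potential comparison described above can be sketched numerically. The following is a minimal illustration, not the authors' code: the function names are hypothetical, and the sampled ensemble is an assumed toy case, namely single-qubit states (|0〉 + e^{iθ}|1〉)/√2 with θ drawn uniformly, for which the fidelity between two samples is cos²((θ − φ)/2).

```python
import math
import numpy as np

def haar_frame_potential(t: int, n_qubits: int) -> float:
    """Haar value 1/d_sym^(t) = t!(N-1)!/(t+N-1)!, with N = 2**n_qubits."""
    N = 2 ** n_qubits
    # comb(t+N-1, t) = (t+N-1)! / (t! (N-1)!), i.e. the dimension d_sym^(t)
    return 1.0 / math.comb(t + N - 1, t)

def estimate_frame_potential(fidelities: np.ndarray, t: int) -> float:
    """Sample mean of F**t over independently sampled pairs of states."""
    return float(np.mean(fidelities ** t))

rng = np.random.default_rng(0)
n_pairs = 20000
# Assumed toy ensemble: equatorial single-qubit states, uniform in the angle.
theta = rng.uniform(0.0, 2.0 * np.pi, n_pairs)
phi = rng.uniform(0.0, 2.0 * np.pi, n_pairs)
fidelities = np.cos((theta - phi) / 2.0) ** 2

for t in range(1, 5):
    estimate = estimate_frame_potential(fidelities, t)
    haar = haar_frame_potential(t, n_qubits=1)
    print(f"t={t}: estimate={estimate:.4f}  Haar={haar:.4f}")
```

For this toy ensemble the first moment coincides with the Haar value, while higher moments exceed it, which illustrates why several moments would have to be inspected and why a single score is hard to extract this way.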
1. Estimating expressibility
Frame potentials can be understood as the t-th mo-
ments of the distribution of state overlaps:
$$p\left(F = |\langle\psi_\theta|\psi_\phi\rangle|^{2}\right). \tag{13}$$
This is seen as follows:
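\begin{align}
F^{(t)} &= \int_{\Theta} \int_{\Phi} |\langle\psi_\theta|\psi_\phi\rangle|^{2t}\, d\theta\, d\phi \tag{14} \\
&= \mathbb{E}_{\theta}\, \mathbb{E}_{\phi}\left[ \left( |\langle\psi_\theta|\psi_\phi\rangle|^{2} \right)^{t} \right] \tag{15} \\
&= \mathbb{E}[F^{t}], \quad F = |\langle\psi_\theta|\psi_\phi\rangle|^{2}. \tag{16}
\end{align}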
For the ensemble of Haar random states, the analytical
form of the probability density function of fidelities
is known: $P_{\mathrm{Haar}}(F) = (N-1)(1-F)^{N-2}$, where F
corresponds to the fidelity and N is the dimension of
the Hilbert space [30]. To probe the non-uniformity of
the set of states generated by a PQC, we propose com-
paring the resulting distribution of state fidelities gen-
erated by the sampled ensemble of parameterized states
to that of the ensemble of Haar random states. In prac-
tice, we can estimate the fidelity distributions or esti-
mate the first few moments of fidelity F by indepen-
dently sampling pairs of states (i.e. sampling pairs of
parameter vectors and obtaining parameterized states)
and treating the corresponding fidelities as random vari-
ables. Using this sampling technique, we can show that
the resulting sample mean is an unbiased estimator of
the population mean. After collecting sufficient samples
of state fidelities², the Kullback-Leibler (KL) divergence
[31], often used in machine learning applications, between
the estimated fidelity distribution and that of the Haar-
distributed ensemble can be computed to quantify ex-
pressibility (“Expr”):
2 Refer to Appendix B for a discussion on the sample size.
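$$\mathrm{Expr} = D_{KL}\left( \hat{P}_{PQC}(F; \boldsymbol{\theta}) \,\|\, P_{\mathrm{Haar}}(F) \right), \tag{17}$$
where $\hat{P}_{PQC}(F; \boldsymbol{\theta})$ is the estimated probability distribution
of fidelities resulting from sampling states from a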
PQC. Due to a finite sample size, the probability dis-
tribution is estimated with a histogram. Therefore, we
must choose a discretization of the two probability dis-
tributions in order to numerically estimate the KL diver-
gence. By this scoring method, a PQC with a resulting
fidelity distribution that corresponds to a lower KL diver-
gence with respect to that of Haar, is a more expressible
circuit. In the least-expressible case, in which a circuit
consists only of fixed gates, e.g. I (rather than parameterized
gates), and therefore always outputs the same state, the
expressibility attains its upper bound of (N − 1) ln(n_bin),
where N is the dimension of the Hilbert space and n_bin is
the number of bins used for the histograms.
Defining expressibility in terms of the KL divergence
grants us an operational meaning for this numerical
value: expressibility is the amount of information that
is lost if we were to approximate the distribution of state
fidelities generated by a PQC using that of Haar random
states. We expect that, for well-behaved distributions,
the expressibility will be zero if and only if the distribu-
tion of states is the Haar distribution. However, a proof
of this fact has so far eluded us.
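The scoring procedure of Eq.(17) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the Haar distribution is discretized by integrating P_Haar(F) exactly over each bin (the text does not specify how the Haar histogram was obtained), and the second ensemble is an assumed toy case of equatorial single-qubit states.

```python
import numpy as np

def haar_bin_probabilities(n_qubits: int, n_bins: int) -> np.ndarray:
    """Haar probability mass of each fidelity bin on [0, 1].

    Uses the CDF 1 - (1 - F)**(N - 1) of P_Haar(F) = (N-1)(1-F)**(N-2).
    """
    N = 2 ** n_qubits
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    cdf = 1.0 - (1.0 - edges) ** (N - 1)
    return np.diff(cdf)

def expressibility(fidelities, n_qubits: int, n_bins: int = 75) -> float:
    """KL divergence between the fidelity histogram and the binned Haar law."""
    # Clip guards against values such as 1.0000000002 from rounding error.
    fidelities = np.clip(np.asarray(fidelities, dtype=float), 0.0, 1.0)
    counts, _ = np.histogram(fidelities, bins=n_bins, range=(0.0, 1.0))
    p = counts / counts.sum()
    q = haar_bin_probabilities(n_qubits, n_bins)
    mask = p > 0  # empty bins contribute zero to the KL divergence
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Least-expressible case: a fixed circuit always returns fidelity 1.
expr_idle = expressibility(np.ones(5000), n_qubits=1)

# Assumed toy ensemble: (|0> + e^{i theta}|1>)/sqrt(2), theta uniform.
rng = np.random.default_rng(0)
theta, phi = rng.uniform(0.0, 2.0 * np.pi, (2, 5000))
expr_equator = expressibility(np.cos((theta - phi) / 2.0) ** 2, n_qubits=1)

print(expr_idle, np.log(75), expr_equator)
```

With one qubit and 75 bins, the idle case reproduces the upper bound (N − 1) ln(n_bin) = ln(75) quoted above, since all probability mass falls in the last bin.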
2. Expressibility: single qubit demonstration
We describe a simple example of computing expressibility
to clarify its construction. Consider the single-qubit
circuits shown in Fig. 1a, which range in their ability
to explore the Bloch sphere. The “idle” circuit is
an un-expressive circuit that consists of an identity or
idle gate acting on the reference state |0〉. The states
obtained from uniformly sampling θ (rotation about the
Z-axis) on circuit A are limited to an exploration about
the equator of the Bloch sphere. The set of states gen-
erated by circuit B, however, is expected to have a bet-
ter coverage of the Bloch sphere due to the additional
degree of freedom provided by the X-rotation. Finally,
as a reference, we uniformly sample single-qubit unitary
matrices to simulate the most expressible circuit case.
Fig. 1b displays simulated data of sampled states (2000
points) plotted on the Bloch sphere. Because we are sam-
pling uniformly in the parameter space (versus the state
space), sampling states from circuit B will lead to greater
concentrations of states in the +x and −x poles. In Fig.
1c, the estimated histograms of fidelities are displayed
for each circuit, overlaid with the histogram of fidelities
for the Haar-distributed ensemble for comparison. The
histograms were generated using 75 bins. Although
the KL divergences will vary with the number of bins,
we expect the observations coming from relative
quantitative comparisons among circuits to remain
the same. Above each histogram, the KL divergences are
reported to quantify the deviation, in which a lower KL
divergence value corresponds to more favorable expressibility
or a greater closeness to Haar random states. For
the least expressible case, i.e. the idle circuit, the expected
KL divergence is ln(75) ≈ 4.3. In Fig. 1d, the
estimated frame potentials for the first four moments are
plotted, in which we see that circuits that are expected
to be more expressible have frame potential values that
are lower or closer to the corresponding values of the
Haar-distributed ensemble.
[Figure 1 panels, ordered from low to high expressibility: idle circuit (|0〉, I), Circuit A (|0〉, H, RZ), Circuit B (|0〉, H, RZ, RX), arbitrary unitary (|0〉, U). KL divergences reported in panel (c), in the same order: DKL = 4.30, 0.22, 0.02, 0.007.]
Figure 1: Quantifying expressibility for single-qubit circuits. (a) Circuit diagrams are shown for the four types of
circuits. (b) For each circuit, 1000 sample pairs of circuit parameter vectors were uniformly drawn, corresponding to
2000 parameterized states that were plotted on the Bloch sphere using QuTiP [32]. (c) Histograms of estimated
fidelities are shown, overlaid with fidelities of the Haar-distributed ensemble, with the computed KL divergences
reported above the histograms. (d) The frame potential estimates for the first four moments are plotted for each
circuit, with the Haar values (Welch bounds) plotted using a purple dotted line.
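The Bloch-sphere coverage of circuits A and B can be checked with a short simulation. The sketch below is an illustration and not the authors' code: it assumes the gate conventions RZ(θ) = exp(−iθZ/2) and RX(θ) = exp(−iθX/2), applies the gates in the order drawn in Fig. 1a (H, then RZ, then RX), and uses a hypothetical threshold of |x| > 0.9 to quantify the concentration near the ±x poles.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def rz(angle):
    """Z-rotation exp(-i * angle * Z / 2)."""
    return np.array([[np.exp(-0.5j * angle), 0], [0, np.exp(0.5j * angle)]])

def rx(angle):
    """X-rotation exp(-i * angle * X / 2)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def bloch_vector(psi):
    """Bloch coordinates (x, y, z) of a normalized single-qubit state."""
    coherence = np.conj(psi[0]) * psi[1]
    x, y = 2 * coherence.real, 2 * coherence.imag
    z = abs(psi[0]) ** 2 - abs(psi[1]) ** 2
    return np.array([x, y, z])

rng = np.random.default_rng(0)
ket0 = np.array([1, 0], dtype=complex)
bloch_a, bloch_b = [], []
for _ in range(2000):
    theta, phi = rng.uniform(0.0, 2.0 * np.pi, 2)
    psi_a = rz(theta) @ H @ ket0   # circuit A: H then RZ
    psi_b = rx(phi) @ psi_a        # circuit B: H, RZ, then RX
    bloch_a.append(bloch_vector(psi_a))
    bloch_b.append(bloch_vector(psi_b))
bloch_a, bloch_b = np.array(bloch_a), np.array(bloch_b)

# Circuit A stays on the equator (z = 0); circuit B leaves it.
print("max |z| for circuit A:", np.abs(bloch_a[:, 2]).max())
# For uniformly distributed states, x is uniform on [-1, 1], so the
# fraction with |x| > 0.9 would be 0.1; circuit B should exceed this.
frac_near_x_poles = np.mean(np.abs(bloch_b[:, 0]) > 0.9)
print("fraction of circuit B states with |x| > 0.9:", frac_near_x_poles)
```

Because the final X-rotation leaves the x coordinate unchanged, circuit B inherits x = cos θ from circuit A, which is why uniform sampling of θ piles states up near the ±x poles.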
B. Entangling capability
In the context of variational HQC algorithms, poten-
tial advantages of generating highly entangled states with
low-depth circuits include the ability to efficiently repre-
sent the solution space for tasks such as ground state
preparation or data classification, and to capture non-
trivial correlation in the quantum data [9, 20]. Accord-
ingly, in VQE [20] and quantum machine learning [8, 9],
strongly entangling circuits have been realized by ap-
pending and repeating layers comprised of varying con-
figurations of two-qubit gates, e.g. CNOT, CZ, and their
parameterized variants. Though the powers and advan-
tages of such “entanglers” have been empirically demon-
strated for specific problems, we propose computing the
Meyer-Wallach (MW) entanglement measure [33] as a
way to quantify the entangling capability of a parameter-
ized quantum circuit, or its ability to generate entangled
states, independent of the problem at hand. The Meyer-
Wallach entanglement measure, often denoted as Q, is
a global measure of multi-particle entanglement for pure
states. While there exist several methods for quantifying
entanglement as a resource [34], the MW measure was
chosen due to its scalability and ease of computation.
1. Meyer-Wallach measure
The Meyer-Wallach entanglement measure, or Q, is
defined as follows. For a system of n qubits, consider
a linear mapping ιj(b) that acts on the computational
basis:
$$\iota_j(b)\, |b_1 \ldots b_n\rangle = \delta_{b b_j}\, |b_1 \ldots \hat{b}_j \ldots b_n\rangle, \tag{18}$$
where bj ∈ {0, 1} and the symbol ˆ denotes absence of the
j-th qubit. Meyer and Wallach define the entanglement
measure Q as:
$$Q(|\psi\rangle) \equiv \frac{4}{n} \sum_{j=1}^{n} D\left( \iota_j(0)|\psi\rangle,\ \iota_j(1)|\psi\rangle \right), \tag{19}$$
where the generalized distance D is:
$$D(|u\rangle, |v\rangle) = \frac{1}{2} \sum_{i,j} |u_i v_j - u_j v_i|^{2}, \tag{20}$$
with |u〉 = ∑ ui |i〉 and |v〉 = ∑ vi |i〉. D can be understood
as the square of the area of the parallelogram
created by vectors |u〉 and |v〉. By construction, Q has
the following properties: (1) Q is invariant under local
unitaries, (2) 0 ≤ Q ≤ 1, and (3) Q(|ψ〉) = 0 if and
only if |ψ〉 is a product state. For instance, Q(|01〉) = 0,
while $Q\left( \frac{|00\rangle + |11\rangle}{\sqrt{2}} \right) = 1$. It can be shown that Q is,
up to a normalization factor of 2, the average linear entropy
(i.e. 1 − tr{ρ²}) of all the single-qubit reduced states [35].
A drawback of the MW
measure is that it is fairly undiscerning with respect to
different types of entanglement. As described in [35],
for instance, the state
$|\Psi\rangle = \frac{|00\rangle_{1,2} + |11\rangle_{1,2}}{\sqrt{2}} \otimes \frac{|00\rangle_{3,4} + |11\rangle_{3,4}}{\sqrt{2}}$
and the four-qubit GHZ state both correspond to the
maximum Q value of 1. However, alternative measures
of entanglement, such as the Schmidt number [36], rank
the GHZ state as having higher entanglement than |Ψ〉.
Nevertheless, we chose the MW measure because it has
been used as an effective probe of entanglement for a wide
range of applications in quantum information, including
characterization of entangled states involved in quantum
error correcting codes [33] and quantum phase transitions
[37]. The MW measure has also been applied as a tool for
dynamically tracking the convergence of pseudorandom
circuits by computing the deviation of the MW measure
from the Haar value [23], and the evolution of entangle-
ment for an instance of Grover’s algorithm [33]. Over the
course of a variational HQC algorithm, parameters of a
PQC are dynamically tuned to reach the solution space
for a given problem. Therefore, the MW measure seems
to be especially well-suited for quantifying the entangling
capability of a parameterized quantum circuit by quan-
tifying the number and types of entangled states it can
generate.
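The two equivalent forms of the Meyer-Wallach measure can be sketched in a few lines. This is an illustrative implementation with hypothetical function names: one version follows Eqs.(18)-(20) directly, and the other uses the single-qubit reduced states through Q = 2(1 − (1/n) Σ_k tr ρ_k²), where the factor of 2 is the normalization that makes Q = 1 for a Bell state. The first qubit is assumed to be the most significant bit of the state-vector index.

```python
import numpy as np

def _qubit_slices(psi, k):
    """Rows 0 and 1 are iota_k(0)|psi> and iota_k(1)|psi> of Eq. (18)."""
    n = psi.size.bit_length() - 1
    return np.moveaxis(psi.reshape([2] * n), k, 0).reshape(2, -1)

def meyer_wallach_distance(psi):
    """Q from the definition: (4/n) * sum_j D(iota_j(0)psi, iota_j(1)psi)."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    n = psi.size.bit_length() - 1
    total = 0.0
    for k in range(n):
        u, v = _qubit_slices(psi, k)
        # D(u, v) = (1/2) * sum_{i,j} |u_i v_j - u_j v_i|^2, Eq. (20)
        total += 0.5 * np.sum(np.abs(np.outer(u, v) - np.outer(v, u)) ** 2)
    return 4.0 * total / n

def meyer_wallach_purity(psi):
    """Q from reduced states: 2 * (1 - mean_k tr(rho_k^2))."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    n = psi.size.bit_length() - 1
    purities = []
    for k in range(n):
        m = _qubit_slices(psi, k)
        rho_k = m @ m.conj().T  # 2x2 reduced density matrix of qubit k
        purities.append(np.real(np.trace(rho_k @ rho_k)))
    return 2.0 * (1.0 - np.mean(purities))

ket01 = np.array([0, 1, 0, 0], dtype=complex)
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
ghz4 = np.zeros(16, dtype=complex)
ghz4[0] = ghz4[15] = 1 / np.sqrt(2)
bell_pairs = np.kron(bell, bell)

for name, state in [("|01>", ket01), ("Bell", bell),
                    ("GHZ4", ghz4), ("Bell x Bell", bell_pairs)]:
    print(name, meyer_wallach_distance(state), meyer_wallach_purity(state))
```

The last two states illustrate the drawback noted above: the product of two Bell pairs and the four-qubit GHZ state are indistinguishable under Q.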
2. Estimating entangling capability
We define the entangling capability of a parameterized
quantum circuit as the average Meyer-Wallach entangle-
ment of states generated by the circuit. For a given PQC,
we can estimate this value by sampling the circuit pa-
rameters and computing the sample average of the MW
measure of output states. More precisely, we take the
estimate of the entangling capability to be:
$$\mathrm{Ent} = \frac{1}{|S|} \sum_{\theta_i \in S} Q\left( |\psi_{\theta_i}\rangle \right), \tag{21}$$
where S = {θi} is the set of sampled circuit parame-
ter vectors. Using this measure, a parameterized circuit
that outputs only product states will have an entangling
capability score of 0, whereas one that always produces
highly entangled states will correspond to a score close to
1. For cases in between, the mean will lie in between the
two values, and for a closer investigation to understand
to what extent the circuit is able to produce entangled
states, one can compute statistical properties of the sam-
ple distribution of Q values. Note that we can recycle the
states generated in the expressibility study for estimating
this entangling capability.
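The estimate in Eq.(21) can be sketched for a small example. The circuit used here is an assumption chosen for illustration and is not one of the circuits studied in this work: two qubits, an RY rotation on each, optionally followed by a CNOT. The function names are hypothetical, and Q is computed from single-qubit reduced-state purities.

```python
import numpy as np

def meyer_wallach(psi):
    """Q = 2 * (1 - mean_k tr(rho_k^2)) for a normalized n-qubit state."""
    n = psi.size.bit_length() - 1
    tensor = psi.reshape([2] * n)
    purities = []
    for k in range(n):
        m = np.moveaxis(tensor, k, 0).reshape(2, -1)
        rho_k = m @ m.conj().T
        purities.append(np.real(np.trace(rho_k @ rho_k)))
    return 2.0 * (1.0 - np.mean(purities))

def ry(angle):
    """Y-rotation exp(-i * angle * Y / 2)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

# CNOT with the first qubit as control (basis order |00>, |01>, |10>, |11>).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def sample_states(n_samples, entangle, rng):
    """Sample output states of the assumed toy circuit."""
    ket00 = np.zeros(4, dtype=complex)
    ket00[0] = 1.0
    states = []
    for _ in range(n_samples):
        a, b = rng.uniform(0.0, 2.0 * np.pi, 2)
        psi = np.kron(ry(a), ry(b)) @ ket00
        states.append(CNOT @ psi if entangle else psi)
    return states

def entangling_capability(states):
    """Sample average of Q over the sampled states, Eq. (21)."""
    return float(np.mean([meyer_wallach(s) for s in states]))

rng = np.random.default_rng(0)
ent_product = entangling_capability(sample_states(2000, False, rng))
ent_cnot = entangling_capability(sample_states(2000, True, rng))
print("without CNOT:", ent_product)
print("with CNOT:   ", ent_cnot)
```

Without the CNOT the circuit outputs only product states, so the score is zero up to rounding; with the CNOT the score lies strictly between 0 and 1, and the spread of the individual Q values could be inspected for a closer look.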
C. Circuit costs
In order to establish a fair comparison of the ex-
pressibility and entangling capability among circuits, the
“costs” of implementing the various circuits should also
be taken into account. The three costs we consider are
the circuit depth, circuit connectivity, and number of pa-
rameters. The first two costs are particularly relevant
for NISQ computers, which are limited in their coherence
times and connectivity among the qubits. In addition to
the circuit depth, we also track the number of two-qubit
gates for each circuit to estimate the difficulty of implementing
the circuit on a quantum device. That is, we
expect a circuit composed largely of costly two-qubit
gates to execute with a lower program fidelity. For
the circuits we consider in the study, the configuration
of their two-qubit gates determines the overall connectivity,
where the use of non-local two-qubit gates generally
increases the complexity of the required qubit topology.
In principle, one can decompose the circuit to map onto
alternative topologies, but the resulting circuit may incur
a high cost in the circuit depth and number of gates.
Therefore, we consider three possible configurations of
two-qubit gates in our circuits: nearest-neighbor, ring
topology, and all-to-all connectivities. The number of
parameters is especially relevant for variational quantum
algorithms, in which an optimization routine on the clas-
sical computer is tasked with updating and refining the
parameter values. Thus, we use the number of parame-
ters as a rough measure for the difficulty of optimization.
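The three connectivity patterns can be made concrete by listing the qubit pairs each one couples. The helper below is a hypothetical illustration, not taken from this work; it counts coupled pairs only, whereas an actual circuit template may place more than one two-qubit gate on a pair (for example, one in each control direction), so the gate count of a given template can be larger.

```python
from itertools import combinations

def coupled_pairs(n_qubits: int, topology: str):
    """Qubit pairs coupled by two-qubit gates under a given connectivity."""
    if topology == "line":
        # nearest neighbors on an open chain
        return [(i, i + 1) for i in range(n_qubits - 1)]
    if topology == "ring":
        # nearest neighbors with the last qubit wrapped around to the first
        # (assumes n_qubits >= 3 so that no pair is repeated)
        return [(i, (i + 1) % n_qubits) for i in range(n_qubits)]
    if topology == "all-to-all":
        return list(combinations(range(n_qubits), 2))
    raise ValueError(f"unknown topology: {topology}")

for topology in ("line", "ring", "all-to-all"):
    pairs = coupled_pairs(4, topology)
    print(f"{topology:>10}: {len(pairs)} pairs -> {pairs}")
```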
III. NUMERICAL EXPERIMENTS
In this section, we demonstrate the use of the descrip-
tors for characterizing parameterized quantum circuits
by computing them for a select set of circuits composed
of different configurations of single-qubit and two-qubit
gate operations. Parameterized gates used in this work
are RX , RY , and RZ , defined in [38]. We set the circuit
width to four qubits, but for an analysis of these circuit
templates at larger widths, the reader should refer to Ap-
pendix A. Circuit simulations presented in this work are
implemented using the Forest platform [39].
Several circuit designs in Fig. 2 were derived or in-
spired by past studies. For instance, circuits 5 and 6
were developed in [40] as programmable quantum cir-
cuits and were applied to train the quantum autoencoder
in [5]. Circuits 7 and 8 were used as encoding and un-
encoding circuits for the QVECTOR algorithm [6]. Cir-
cuit 9 was a Quantum Kitchen Sinks ansatz considered
in [41]. Circuit 10 followed the hardware-efficient circuit
architecture from [20]. Circuits 11 and 12 were Josephson
sampler circuits from [21]. Circuits 13-15 and 18-19
followed the circuit-block construction from [9] used for
data classification.
In the following subsections, we review observations
from our simulations for expressibility and entangling ca-
pability. For an in-depth discussion of trends e.g. in gate
types and configurations, the reader should refer to Sec-
tion IV.
A. Expressibility observations
One of the main advantages of employing a multi-layered
parameterized circuit in variational HQC algorithms
is the potential to systematically extend its “flexibility,”
or the ability to represent a wider class of states,
simply by adding more layers to the original circuit,
which was demonstrated in the Appendix of Ref. [20].
In this section, we provide a deeper analysis into the ef-
fects of multi-layering by computing how much each layer
contributes to the overall expressibility for a given cir-
cuit template. We numerically demonstrate that, while
expressibility generally improves with increased circuit
layers, it does so differently depending on the circuit
template. This implies that, given two circuits with
different expressibility values, it may be preferable (e.g.
more economical in depth and gate count) to select the
less expressible circuit template with added layers,
provided that multi-layering sufficiently boosts the
expressibility of that circuit.
To compute and compare the expressibility among pa-
rameterized circuit templates in Fig. 2, 5000 state fideli-
ties were sampled for each circuit instance to construct
its histogram, using 75 bins. For each circuit template,
we considered instances in which the unit layer was
repeated up to five times, i.e. Lmax = 5. Expressibility
values for these circuit instances are computed and plotted in
Fig. 3, in which the data is organized such that circuits
are ordered based on ascending values of expressibility
for the L = 1 instance. This ordering scheme enables us
to track the changes in the expressibility values as well
as changes in the relative ranking (by expressibility) with
increased values of L.
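The estimation procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' Forest-based implementation: the function names are hypothetical, the circuit is a simple RX-then-RZ product circuit in the spirit of circuit 1, angles are drawn uniformly from [0, 2π), and the Haar reference is obtained by integrating the fidelity density (N − 1)(1 − F)^(N−2) over each of the 75 bins.

```python
import numpy as np

def rx(theta):
    """Single-qubit rotation about X: exp(-i * theta * X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def rz(theta):
    """Single-qubit rotation about Z: exp(-i * theta * Z / 2)."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def product_circuit_state(params, n):
    """State from one layer of RX then RZ on every qubit (no entangling gates)."""
    state = np.array([1.0 + 0j])
    for q in range(n):
        qubit = rz(params[n + q]) @ rx(params[q]) @ np.array([1.0, 0.0])
        state = np.kron(state, qubit)
    return state

def sample_fidelities(n, num_samples, rng):
    """Fidelities |<a|b>|^2 between pairs of states with uniformly drawn angles."""
    fids = np.empty(num_samples)
    for i in range(num_samples):
        a = product_circuit_state(rng.uniform(0, 2 * np.pi, 2 * n), n)
        b = product_circuit_state(rng.uniform(0, 2 * np.pi, 2 * n), n)
        fids[i] = abs(np.vdot(a, b)) ** 2
    return fids

def haar_bin_probs(edges, dim):
    """Probability mass of P_Haar(F) = (N-1)(1-F)^(N-2) in each bin, via its CDF."""
    cdf = 1.0 - (1.0 - edges) ** (dim - 1)
    return np.diff(cdf)

def expressibility(fids, dim, num_bins=75):
    """KL divergence between the binned fidelity histogram and the Haar bins."""
    fids = np.clip(fids, 0.0, 1.0)  # guard against rounding slightly above 1
    counts, edges = np.histogram(fids, bins=num_bins, range=(0.0, 1.0))
    p = counts / counts.sum()
    q = haar_bin_probs(edges, dim)
    mask = p > 0  # empty bins contribute zero to the KL sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
fids = sample_fidelities(n=4, num_samples=5000, rng=rng)
print(expressibility(fids, dim=2 ** 4))
```

Swapping `product_circuit_state` for a simulator of any other template (and repeating its unit layer L times) gives the corresponding estimate; a lower printed value indicates a more expressible circuit.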
We expect that circuits with more gates will generally
have a more favorable expressibility. In Appendix C we
compare expressibility among circuits as a function of
the number of two-qubit gates. Expressibility as a function
of two-qubit gate number may be a fairer basis for
comparing two circuit templates, since two-qubit gates are
more costly in terms of both the time required for implementation
and the noise they contribute. However, care must
be taken with such a cost analysis. In practice, the cir-
cuits proposed must be transpiled into the gate set that
is native to the particular device. Depending on the de-
vice, the resulting total number of two-qubit gates will
be different. For now, we will consider expressibility as a
function of circuit layer number, and reserve the express-
ibility per-two-qubit-gate comparisons for Appendix C.
We first grouped circuits based on closeness in express-
ibility values at L = 1. This enables us to make compar-
isons among circuits that start with similar expressibility
values. In the following, we describe observations on the
groups of circuits, before extending the analysis to L > 1.
At L = 1:
• Circuit 9 is the least expressible of the nineteen
circuit templates, i.e. corresponds to the highest
KL divergence of approximately 0.68. By construc-
tion, a single layer of circuit 9 can produce high-
entanglement (Q=1) states but cannot efficiently
explore low-entanglement states. This results in an
unfavorable expressibility value.
• Circuits 1, 2, 16, 3, 18, 10, 12, and 15 had comparable
expressibility values of around 0.2 (see footnote 3). Circuits 1-4
were originally devised to demonstrate a systematic
increase in expressibility, as (fixed) two-qubit gates
are added to circuit 1 to construct circuit 2. From
circuit 2, the CNOTs are replaced with parametric
two-qubit gates to construct circuits 3 and 4.
Using a single circuit layer, there was no significant
increase in expressibility for circuits 2 and 3,
compared to that of circuit 1. However, a higher
expressibility was observed for circuit 4.
Footnote 3: Circuits 3 and 16 are in fact equivalent, and we chose to include
both circuits to reflect the fact that parameterized circuits are
often generated by choosing a particular stencil (fixed placement
of arbitrary gates), and then populating the gate slots from a
selected gate set. In principle, we should be able to (and do)
capture the circuit equivalence from matching descriptor values.
Moreover, these circuits may not yield identical behavior in an
experimental setting.
[Figure 2 diagrams omitted: four-qubit circuit templates on |0〉 inputs, with panels labeled Circuit 1 through Circuit 19 and gate labels H, RX, RY, and RZ.]
Figure 2: A set of circuit templates considered in the study, each labeled with a circuit ID. The dashed box indicates
a single circuit layer, denoted by L in the text, that can be repeated. Gates RX , RY , and RZ are parameterized.
Several circuit templates are from or inspired by past studies. Circuit diagrams were generated using qpic [42].
[Figure 3 plot omitted: axis annotated from “Low Expr” to “High Expr”.]
Figure 3: Expressibility values (or KL divergences) computed for the benchmark circuits from Fig. 2 with circuit
widths of n = 4 qubits. Marker colors indicate different numbers of circuit layers (L) applied to a circuit template.
Data for each circuit are presented in the order of increasing expressibility (i.e. decreasing KL divergence) for L = 1.
The zoomed-in plots of the two highlighted regions show instances of “expressibility saturation” discussed in the text.
• Circuits 17, 4, and 11 achieved expressibility values
near 0.13. These circuits, which employed
two-qubit gates in a nearest-neighbor fashion, were
more expressible than circuit 15 that employed two-
qubit gates in a ring topology. This may be due to
the parametric two-qubit gates used by the three
circuits, compared to the (static) CNOTs used in
circuit 15.
• Circuits 7, 8, and 19 achieved expressibility values
near 0.09. Circuit 8 can be viewed as a parame-
terized variant of circuit 11, where the additional
degrees of freedom from the extended parameteri-
zation led to a (small) improvement in expressibil-
ity.
• Circuit templates exhibiting favorable expressibility
(DKL ≲ 0.06; cf. Table II) included circuits 5, 13, 14, and
6, in ascending order of expressibility. Although circuit 5
employs a costly all-to-all configuration of two-qubit gates,
circuit 13, which employs the circuit-block
construction (ranges of control set to 1 and
3), achieved more favorable expressibility.
• Circuit 6 was the most expressible circuit at L = 1,
achieving a low KL divergence with a single cir-
cuit layer. However, even a single layer of circuit
6 requires a high circuit depth and number of pa-
rameters.
For each circuit template, its unit layer was repeated
L times, leading to a general increase in expressibility.
However, shifts in the ordering of circuits by expressibil-
ity were observed, as shown in Fig. 3 for L > 1 instances.
For example, two layers of circuit 2 are more expressible
than two layers of circuits 16 or 3. This implies that
the rates of change in expressibility vary among the cir-
cuits. For fourteen out of the nineteen circuit templates
that were considered, expressibility of these circuits at
L = 2 increased (i.e. their corresponding KL divergences
decrease) by more than 60% of their respective values
at L = 1. From L = 2 to L = 3, expressibility values
of six circuits, including that of circuit 9, increased by
more than 60%. That is, differences in circuit structures
corresponded to having different rates of increase in ex-
pressibility from layer to layer. One consequence is that
there may be a circuit template that corresponds to a less
favorable expressibility with limited layers but reaches a
significantly better value with sufficient layers. For exam-
ple, with just three layers the expressibility of circuit 11
“catches up” to that of circuit 6 and does so with a simpler
circuit connectivity. Finally, we observe that the value of
expressibility for increasing layers appears to “saturate”
at different values for different circuit templates. We further
explore and elaborate on these observed behaviors
in Section IV.
[Figure 4 plot omitted: axis annotated from “Low Ent” to “High Ent”.]
Figure 4: Entangling capability values for the
benchmark circuits with widths of n = 4. Marker colors
indicate the different numbers of circuit layers (L)
applied. Data for each circuit are presented in the order
of increasing entangling capability for L = 1. The black
dashed line shows the mean Q value for random pure
states.
B. Entangling capability observations
As noted earlier, the entangling capability, or the aver-
age Q, can be estimated by recycling the set of parame-
terized states used to estimate the expressibility. We per-
form this computation for the circuit templates in Fig. 2,
this time exploring the effects of multi-layering in generating
high-entanglement states. The two “extreme” cases
are circuits 1 and 9. Circuit 1 is composed only
of local unitaries, and thus all the states sampled by tuning
its parameters have Q values of 0, as observed in
Fig. 4. By contrast, circuit 9 outputs high-entanglement
states, resulting in Q values near 1. However, when
circuit 9 was extended to four and five layers, a decrease
in the mean was observed. At that point, the circuit
is flexible enough to produce lower-entanglement states,
leading to lower average Q. More specifically, the entangling
capability for L = 4 is lower than that for L = 5.
This may be explained as an oscillatory convergence to
the mean Q value for Haar-random pure states:
\[ \langle Q \rangle_{\mathrm{Haar}} = \frac{N - 2}{N + 1}, \tag{22} \]
given an N -dimensional Hilbert space [23]. This alterna-
tive metric for quantifying the closeness to Haar is ver-
ified in the expressibility data for circuit 9 in Fig. 3, in
which its expressibility monotonically increases with the
number of circuit layers. For other circuits, their mean Q
values are distributed between 0 and 1, where their values
appear to approach the theoretical average for random
pure states as circuit layers are added. These simula-
tions allow us to compare expressibility and entangling
capability. While a favorable expressibility corresponds
to an entangling capability that approaches or converges
to 〈Q〉Haar, the converse is not generally true. For exam-
ple, a single layer of circuit 2 has a relatively unfavorable
expressibility value, despite having a mean Q value close
to 〈Q〉Haar. This joint information in fact provides some
insight into the types of states explored by circuit 2, in
which sampling states from this circuit allows for a “selec-
tive” exploration of some highly-entangled states in the
Hilbert space.
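As a rough numerical check of Eq. (22), the sketch below computes Q for individual states and averages it over Haar-random samples. It assumes the commonly used purity form of the Meyer-Wallach measure, Q = 2(1 − (1/n) Σ_k Tr ρ_k²), where ρ_k is the reduced state of qubit k; the function names are hypothetical and not taken from the paper's code.

```python
import numpy as np

def meyer_wallach(state, n):
    """Meyer-Wallach Q of an n-qubit pure state via single-qubit purities."""
    psi = np.asarray(state, dtype=complex).reshape([2] * n)
    purity_sum = 0.0
    for k in range(n):
        # Rows index qubit k, columns index all remaining qubits.
        m = np.moveaxis(psi, k, 0).reshape(2, -1)
        rho_k = m @ m.conj().T  # reduced density matrix of qubit k
        purity_sum += np.real(np.trace(rho_k @ rho_k))
    return 2.0 * (1.0 - purity_sum / n)

def haar_mean_q(dim):
    """Mean Q for Haar-random pure states in a dim-dimensional space, Eq. (22)."""
    return (dim - 2) / (dim + 1)

def haar_state(dim, rng):
    """Haar-random pure state from a normalized complex Gaussian vector."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

n = 4
rng = np.random.default_rng(0)
ghz = np.zeros(2 ** n, dtype=complex)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)
print(meyer_wallach(ghz, n))  # GHZ state: every qubit is maximally mixed
samples = [meyer_wallach(haar_state(2 ** n, rng), n) for _ in range(2000)]
print(np.mean(samples), haar_mean_q(2 ** n))
```

The entangling capability of a circuit template is the same average taken over states sampled from that circuit rather than from the Haar measure.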
C. Cost estimate observations
For each circuit template, we considered the circuit
depth, number and topology of two-qubit gates, and the
number of parameters to estimate the cost associated
with implementing the circuit. The number of parame-
ters and two-qubit gates and the circuit depth are shown
in Table I in terms of the number of qubits n and number
of circuit layers L. Circuit templates that employ the
circuit-block construction from Ref. [9] are additionally
defined by their ranges of control, i.e. the number of qubits
between the control and target qubits, which quantifies the
non-locality of two-qubit gates in the block. Circuits 13-15
each comprise two consecutive circuit-blocks,
with ranges of control set to 1 and 3, respectively. Circuits
18-19 each utilize a single circuit-block with range
of control set to 1. To compare circuits based on the
cost, we group them based on the scaling in terms of the
qubit number. The circuit depths for circuits 5 and 6
grow quadratically with the qubit number. Depths for
circuits 2-4, 9, 10, 13-15, 18, 19 have a linear dependence
with respect to the number of qubits, while the remain-
ing circuits have a constant dependence. Similarly, for
the number of two-qubit gates and circuit parameters,
circuits 5 and 6 again have a quadratic dependence on
the qubit number, while the remaining circuits have a
linear dependence. Circuits 5 and 6 are consistently the
two most costly circuits, in terms of the circuit depth and
numbers of two-qubit gates and parameters.
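The scaling statements above can be checked by evaluating the Table I expressions directly. The helper below is a hypothetical convenience function, not part of the original study; it transcribes the table rows and takes the gcd term in rows 13-15 to be the fraction n/gcd(n, 3), which is an interpretation of the extracted table.

```python
from math import gcd

def circuit_cost(circuit_id, n, L):
    """Return (parameters, two-qubit gates, depth) for a circuit ID per Table I."""
    r = n // gcd(n, 3)  # n / gcd(n, 3) is always an integer
    table = {
        1: (2 * n * L, 0, 2 * L),
        2: (2 * n * L, (n - 1) * L, (n + 1) * L),
        3: ((3 * n - 1) * L, (n - 1) * L, (n + 1) * L),
        5: ((n * n + 3 * n) * L, (n * n - n) * L, (n * n - n + 4) * L),
        7: ((5 * n - 1) * L, (n - 1) * L, 6 * L),
        9: (n * L, (n - 1) * L, (n + 1) * L),
        10: (n + n * L, n * L, 1 + (n + 1) * L),
        11: ((4 * n - 4) * L, (n - 1) * L, 6 * L),
        13: ((3 * n + r) * L, (n + r) * L, (2 + n + r) * L),
        15: (2 * n * L, (n + r) * L, (2 + n + r) * L),
        16: ((3 * n - 1) * L, (n - 1) * L, 4 * L),
        18: (3 * n * L, n * L, (n + 2) * L),
    }
    # Pairs of templates that share a row in Table I.
    for twin, base in {4: 3, 6: 5, 8: 7, 12: 11, 14: 13, 17: 16, 19: 18}.items():
        table[twin] = table[base]
    return table[circuit_id]

# Compare an all-to-all template with a circuit-block template as n grows.
for n in (4, 8, 16):
    print(n, circuit_cost(5, n, 1), circuit_cost(13, n, 1))
```

Printing the costs for increasing n makes the quadratic growth of circuits 5 and 6 and the linear growth of the circuit-block templates easy to compare side by side.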
Circuit templates in Fig. 2 vary in qubit topologies.
For example, the nearest-neighbor two-qubit interactions
in circuits 2-4, 7-9, 11, 12, 16, and 17 enable them to be
mapped naturally onto a linear array of qubits. Several
circuit templates employ non-local two-qubit operations
to represent circuits tailored to quantum hardware that
can support such interactions between qubits. Circuits
10, 18, and 19 can be executed on a ring topology (i.e.
the range of control is 1 for these circuits). Circuits
13, 14, and 15 each comprise two circuit blocks,
with ranges of control set to 1 and 3 respectively. Conse-
quently, the overall qubit connectivity is somewhere be-
tween a ring topology and a fully connected graph topol-
ogy. Lastly, circuits 5 and 6 are composed of two-qubit
interactions that require all-to-all connectivity.
From the analysis of circuit costs, it was evident that,
while circuits 5 and 6 exhibit favorable
expressibility and entangling capability (especially with
additional circuit layers), this comes at the cost of
a large number of parameters and two-qubit gates, as
well as high circuit depth and demanding qubit connectivity. These
factors make the circuits less favorable for use on NISQ
devices. Some promising alternatives include circuits 11,
12, and 19, which with additional layers become comparable
to circuits 5 and 6 in expressibility and entangling
capability while maintaining reasonable circuit costs.
IV. DISCUSSION
With the statistical properties and cost estimates com-
puted for the set of test circuits, we discuss several obser-
vations and trends noted in the simulation data that may
serve as guides for designing new or improved parameter-
ized quantum circuits for variational HQC algorithms.
Expressibility saturation. As additional layers are
added to parameterized quantum circuits, the express-
ibility value does not always continue to improve. For
each of the circuits studied, there is a layer number be-
yond which the expressibility of added layers “saturates.”
An example of the phenomenon is shown in the two call-
outs of Fig. 3. Circuit 15 saturates to a value around
0.1, while Circuit 13 is expected to saturate at 0 (as ex-
plained in Appendix C, the true saturation value here
can only be said to be below the systematic statistical
finite-sampling bias of 0.0039). In Appendix C we show
numerics of expressibility saturation with respect to two-
qubit gate number. We find that different PQCs saturate
at different layer numbers and to different expressibil-
ity values. This observation may inform the selection of
PQCs used in practice.
Consider trying to design a parameterized quantum
circuit to maximize expressibility, while maintaining a
low depth. One should choose a circuit that does not
saturate at a poor value of expressibility and choose a
number of layers that is below that circuit’s saturation
point. Furthermore, in thinking about using expressibil-
ity to choose circuit fragments, saturation might be useful
in determining the number of layers used in each circuit
fragment.
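One simple way to operationalize this guidance is to scan the KL divergences of a template for increasing L and report the layer count after which the improvement becomes negligible. The sketch below is only illustrative: the function name, the relative tolerance, and the example numbers are hypothetical placeholders, not values from this study.

```python
def saturation_layer(kl_by_layer, rel_tol=0.1):
    """Return the smallest layer count L (1-indexed) after which one more layer
    lowers the KL divergence by less than rel_tol (relative), or None if the
    sequence is still improving at its last entry."""
    for i in range(len(kl_by_layer) - 1):
        current, nxt = kl_by_layer[i], kl_by_layer[i + 1]
        if current <= 0 or (current - nxt) / current < rel_tol:
            return i + 1
    return None

# Hypothetical KL divergences for L = 1..5 (placeholder numbers, not paper data).
example_kl = [0.20, 0.12, 0.105, 0.10, 0.10]
print(saturation_layer(example_kl))
```

In practice the tolerance should be chosen with the finite-sampling bias of the KL estimate in mind, since differences below that bias are not meaningful.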
In the case that a PQC achieves sufficiently favor-
able expressibility with added layers, it is important to
note that introducing more layers not only increases the
depth but also increases the number of circuit param-
eters. While having more circuit parameters can in-
crease the dimension of the manifold of states explored,
it may also make the optimization more challenging. If
the states in the smaller manifold (from adding some k
layers) are close in distance to the larger manifold (us-
ing greater than k layers), then a similar optimization
performance can be achieved, while using fewer parame-
ters. The expressibility value may be an indicator of the
representational power of a PQC, informing how many
circuit layers are sufficient to use in an application. For
a better understanding of how expressibility (saturation)
correlates with the algorithm performance, future inves-
tigation is needed.
Types of two-qubit operations. Many circuit struc-
tures in the benchmark set utilized parameterized two-
qubit operations. These included controlled-Z rotation
(CRZ) and controlled-X rotation (CRX) gates. Controlled-
Z rotation gates tend to feature in many experimen-
tal demonstrations as they are natively implemented
in superconducting architectures. Several experimental
demonstrations have used layers of controlled-Z rotations
for variational quantum algorithms [43, 44]. Controlled-X
rotation gates can be constructed by conjugating the target
qubit of a controlled-Z rotation with Hadamard gates.
To compare expressibility and entangling capability of
the two types of gates, we considered several pairs of cir-
cuit templates that only differ in the type of two-qubit
gates used. For example, circuits 5 and 6 share a com-
mon circuit template, with 5 using CRZ gates and 6 using
CRX gates in their respective two-qubit entangling blocks.
The descriptor values for such circuit pairs are shown
in Table II. For each pair, circuits employing CRX gates
corresponded to more favorable expressibility and entan-
gling capability. An explanation might be that CRZ oper-
ations in the entangling block commute with each other
and thus the effective unitary operation comprised of CRZ
gates can be expressed using unique generator terms that
are fewer than the number of parameters for these gates.
Accordingly, the dimension of the manifold of states ex-
plored by, say, circuit 5 will be less than the dimension
of that explored by circuit 6. This suggests that, if one is
trying to design a PQC to increase expressibility, it is bet-
ter to insert single-qubit gates which skew the controlled-
gate rotation axis away from the control axis (i.e. the
Z-axis).
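The commutation argument can be checked numerically on a small register. The sketch below builds controlled rotations as explicit matrices (the helper names and the chosen angles are hypothetical) and compares commutators of neighboring CRZ gates with those of neighboring CRX gates. It also verifies that conjugating the target of a CRZ gate with Hadamards yields a CRX gate.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
P0 = np.diag([1, 0]).astype(complex)  # projector onto |0>
P1 = np.diag([0, 1]).astype(complex)  # projector onto |1>
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def rx(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def rz(t):
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

def embed(ops, n):
    """Tensor product over n qubits; `ops` maps qubit index to a 2x2 matrix."""
    out = np.array([[1.0 + 0j]])
    for q in range(n):
        out = np.kron(out, ops.get(q, I2))
    return out

def controlled(rot, control, target, n):
    """Apply `rot` to `target` when `control` is |1>, identity otherwise."""
    return embed({control: P0}, n) + embed({control: P1, target: rot}, n)

def commutator_norm(a, b):
    return np.linalg.norm(a @ b - b @ a)

n = 3
crz_a, crz_b = controlled(rz(0.7), 0, 1, n), controlled(rz(1.3), 1, 2, n)
crx_a, crx_b = controlled(rx(0.7), 0, 1, n), controlled(rx(1.3), 1, 2, n)
print(commutator_norm(crz_a, crz_b))  # diagonal gates: zero up to rounding
print(commutator_norm(crx_a, crx_b))  # nonzero: these gates do not commute

h_target = embed({1: H}, 2)
print(np.allclose(h_target @ controlled(rz(0.7), 0, 1, 2) @ h_target,
                  controlled(rx(0.7), 0, 1, 2)))
```

Because every CRZ gate is diagonal in the computational basis, a block of them can be reordered freely, which is consistent with the explanation offered above for their lower expressibility.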
Configurations of two-qubit operations. Though
abstract quantum circuits are often depicted in a lin-
ear orientation, recent hardware developments allow for
more complex qubit topologies, e.g. nearest-neighbor in-
teractions on a two-dimensional lattice or all-to-all qubit
interactions, depending on the architecture. To investi-
gate the potential advantages of these qubit topologies,
three configurations of two-qubit gates were compared:
nearest-neighbor (NN), circuit-block (CB), and all-to-all
(AA) configurations. The NN configuration is a natural
arrangement of two-qubit operations on a linear array of
qubits. The CB configuration is a natural arrangement
for an array of qubits that form a closed loop. The AA
configuration assumes a fully connected graph arrange-
ment of qubits. To set up a fair comparison, we consid-
ered the three circuits shown in Fig. 5 allowing the same
single-qubit rotations as well as the same number of two-
qubit operations. Qualitatively, the circuit-block config-
uration from [9] (in which the range of control is fixed
to 1) can be interpreted as an intermediate between the
nearest-neighbor and all-to-all configurations. That is,
for each circuit block, this circuit structure includes re-
gions of consecutive nearest-neighbor interactions in ad-
dition to a non-local interaction to complete the cyclic
connectivity.
Both expressibility and entangling capability were
computed, as shown in Table III. The AA configuration
led to the most favorable expressibility (lowest KL divergence),
although the CB configuration had an expressibility
value close to that of the AA configuration. Although
the NN configuration had the worst expressibility, for the
same number of two-qubit operations, it corresponded to
the lowest circuit depth. Trends in entangling capability
were similar: both CB and AA configurations led to
high entangling capability. Therefore, the use of an all-to-all
configuration led to both favorable expressibility
and entangling capability scores but with a trade-off in
the number of parameters, circuit depth, and qubit connectivity
requirements. Though slightly less expressible
than the all-to-all configuration, the use of the circuit-block
architecture led to relatively favorable expressibility
and entangling capability, offering a cheaper or more
near-term circuit structure alternative.

Circuit ID   Number of parameters    Number of two-qubit gates   Circuit depth
1            2nL                     0                           2L
2            2nL                     (n - 1)L                    (n + 1)L
3            (3n - 1)L               (n - 1)L                    (n + 1)L
4            (3n - 1)L               (n - 1)L                    (n + 1)L
5            (n^2 + 3n)L             (n^2 - n)L                  (n^2 - n + 4)L
6            (n^2 + 3n)L             (n^2 - n)L                  (n^2 - n + 4)L
7            (5n - 1)L               (n - 1)L                    6L
8            (5n - 1)L               (n - 1)L                    6L
9            nL                      (n - 1)L                    (n + 1)L
10           n + nL                  nL                          1 + (n + 1)L
11           (4n - 4)L               (n - 1)L                    6L
12           (4n - 4)L               (n - 1)L                    6L
13           (3n + n/gcd(n,3))L      (n + n/gcd(n,3))L           (2 + n + n/gcd(n,3))L
14           (3n + n/gcd(n,3))L      (n + n/gcd(n,3))L           (2 + n + n/gcd(n,3))L
15           2nL                     (n + n/gcd(n,3))L           (2 + n + n/gcd(n,3))L
16           (3n - 1)L               (n - 1)L                    4L
17           (3n - 1)L               (n - 1)L                    4L
18           3nL                     nL                          (n + 2)L
19           3nL                     nL                          (n + 2)L
Table I: Cost estimates for circuits from Fig. 2, i.e. the number of parameters, number of two-qubit operations, and
circuit depth in terms of n, number of qubits and L, number of circuit layers.

CRZ                     CRX
ID    Expr    Ent       ID    Expr    Ent
3     0.24    0.34      4     0.13    0.47
5     0.06    0.41      6     0.004   0.78
7     0.10    0.33      8     0.09    0.39
13    0.05    0.61      14    0.01    0.66
16    0.26    0.35      17    0.14    0.45
18    0.24    0.44      19    0.08    0.59
Table II: Descriptors computed for six circuit pairs with
widths of n = 4 qubits and depths of L = 1 layer. Data
for each pair are reported in the same row. Each pair
assumes the same circuit template with varying
two-qubit operations (CRZ or CRX gates).

Configuration       Expr     Ent
Nearest-neighbor    0.087    0.67
Circuit-block       0.015    0.80
All-to-all          0.011    0.80
Table III: Descriptors computed for circuits (when
n = 4) from Fig. 5 that employ different configurations
of two-qubit gates, i.e. nearest-neighbor, circuit-block,
or all-to-all.
Multi-descriptor comparison. In practice, when selecting
a suitable circuit structure for a given application,
laying out the “circuit descriptor landscape” may be
a useful method to identify circuits that have favorable
qualities as well as reasonable circuit costs depending on
the resource constraints of the quantum hardware. An
example of such a landscape plot is shown in Fig. 6. From
this plot, for an application in which favorable expressibility
can improve the performance, one may consider
selecting circuit 6. However, for a “cheaper” alternative
in terms of the number of parameters and circuit depth,
one may instead choose to use circuit 14.
[Figure 5 diagrams omitted: three four-qubit circuits on |0〉 inputs with gate labels RX and RZ, in panels a) Nearest-neighbor (NN), b) Circuit-block (CB), and c) All-to-all (AA).]
Figure 5: Circuits considered for comparing two-qubit
interaction configurations: (a) nearest-neighbor, (b)
circuit-block, and (c) all-to-all. Two-qubit entangling
blocks are shown in blue dashed lines. For a fair
comparison, these circuits assume the same number and
type of two-qubit operations.
[Figure 6 plot omitted: axes annotated from “Low Ent” to “High Ent” and from “Low Expr” to “High Expr”.]
Figure 6: Circuit descriptor landscape for circuit
instances with width of n = 4 qubits and depth of L = 1
layer. Circuits are labeled by their IDs assigned in Fig.
2. Marker color indicates the number of parameters
associated with the circuit instance.
Limitations of the method. With growing system
size, a large number of state samples will be required to
estimate each property to a reasonable precision: the ex-
pected overlap between states drops exponentially with
qubit number. Nevertheless, the theoretical framework
from this study can be useful for studying and designing
modest-sized circuits that are suitable for NISQ comput-
ers. In addition, trends observed in the descriptors for
several low-width circuits appear to generalize to larger
widths. We also note that estimation of each descriptor
may be improved using alternative quantities or numer-
ical methods. For example, the entanglement measure
used to quantify the entangling capability is one particu-
lar measure of multi-particle entanglement that does not
fully characterize the entanglement in the system. Ad-
ditional entanglement measures [34] can be explored in
tandem for an in-depth characterization.
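The exponential drop in the expected overlap can be made explicit for the Haar reference. Taking the standard fidelity density for Haar-random pure states in an N-dimensional space, P(F) = (N − 1)(1 − F)^{N−2} with N = 2^n (the same reference distribution used for expressibility), a short calculation with the Beta function gives the mean fidelity:

```latex
\[
\mathbb{E}_{\mathrm{Haar}}[F]
  = \int_0^1 F\,(N-1)(1-F)^{N-2}\,dF
  = (N-1)\,B(2, N-1)
  = \frac{(N-1)\,\Gamma(2)\,\Gamma(N-1)}{\Gamma(N+1)}
  = \frac{1}{N}
  = 2^{-n}.
\]
```

For n = 4 this is 1/16 ≈ 0.06, and each additional qubit halves it, so for a fixed number of bins on [0, 1] the histogram mass concentrates ever closer to F = 0 as the width grows.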
V. CONCLUSION AND OUTLOOK
In an effort to improve or create HQC algorithms, nu-
merous studies have developed better ansatze, in terms of
accuracy, scalability, and implementability on near-term
devices for specific applications [2, 18–20, 45]. Many pro-
posals have been either theoretically motivated but im-
practical, or practical but largely ad hoc. There is an op-
portunity and need for developing principled approaches
to designing parameterized quantum circuits.
In this work, we presented a theoretical framework
to characterize and compare parameterized quantum circuits,
independent of the algorithm or application. The value of
one of the descriptors, expressibility, was shown
to saturate with sufficient depth. We described how the
rate of saturation and the saturated value may be useful
indicators of the performance of a parameterized quan-
tum circuit and, therefore, may help to design and se-
lect such circuits in an application. In addition, we ap-
plied the descriptors to identify useful circuit fragments,
in terms of both the gate choice and the configuration
of two-qubit operations. Several of these fragments are
natural operations for particular quantum hardware (e.g.
all-to-all qubit connectivity of ion trap quantum comput-
ers) and thus may guide designs of PQCs for experiments
on particular devices. However, there still remain open
questions and challenges regarding design of PQCs that
will be explored in future work.
It remains to understand the connection between ex-
pressibility and performance of a PQC in a particular
algorithm such as VQE. This will require a deeper bench-
mark study that quantifies the correlations between the
descriptors and the performance metrics of a given al-
gorithm, e.g. energy errors and/or numbers of function
evaluations. Some degree of expressibility is expected to
be necessary in order that a PQC performs well for a vari-
ety of Hamiltonians in VQE. For certain settings, it may
be advantageous to exploit symmetry in the problem to
design a PQC. As a simple example, consider a fermionic
second-quantized Hamiltonian describing a system with
fixed particle number. In this case, the parameterized
quantum circuit may be designed to output states only
in the proper particle-number subspace, resulting in an
unfavorable expressibility. One approach to handling this
situation would be to generalize the notion of expressibil-
ity to subspace expressibility, comparing the distribution
of fidelities to that of the Haar distribution over the sub-
space. By understanding the correlations between the
descriptors and the performance metrics of a variational
HQC algorithm, these descriptors may become useful for
guiding the design and compilation of the ansatz. In
practice, it may be particularly helpful to derive a uni-
fied quantity that combines the descriptors to evaluate
a single value that quantifies the capability of a PQC,
similar to the “quantum volume” [46] for quantifying the
capabilities of quantum devices.
Thus far, our descriptors have only been explored
for pure-state classical simulations of quantum circuits.
To better understand the performance of parameterized
quantum circuits in a more-realistic setting, it may be
worth exploring noisy simulations or designing experi-
mental protocols to estimate expressibility (e.g. using
SWAP tests to compute state fidelities).
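For reference, the textbook SWAP test relates the measured ancilla statistics to the fidelity as follows. This is the standard relation rather than a protocol specified in this work; the symbols p_0 (probability of measuring the ancilla in |0〉), its empirical estimate \hat{p}_0, and M (number of shots) are introduced here for illustration.

```latex
\[
p_0 = \frac{1}{2} + \frac{1}{2}\,\bigl|\langle \psi | \phi \rangle\bigr|^2
\quad\Longrightarrow\quad
F = \bigl|\langle \psi | \phi \rangle\bigr|^2 = 2\,p_0 - 1,
\qquad
\sigma_{\hat{F}} = 2\sqrt{\frac{p_0\,(1 - p_0)}{M}} .
\]
```

Since typical fidelities shrink with circuit width, p_0 approaches 1/2 and the shot count M needed to resolve F from the sampling noise grows accordingly.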
A major challenge in developing and scaling up variational
HQC algorithms is the “barren plateau” phenomenon
highlighted in [15]. The authors show that the
gradient of the objective function concentrates around zero,
with a variance that decays exponentially with increasing system size,
when the output states are randomly drawn from an approximate
2-design. This shows that expressible circuits
must be used with care. In particular, with an express-
ible circuit, choosing a random starting point for a VQE
optimization is not a reasonable approach. Methods for
circumventing this issue have been proposed [15, 47, 48],
suggesting ways to improve the “optimizability” of a pa-
rameterized quantum circuit. These techniques indicate
a tension between being expressible and being optimiz-
able (i.e. having sufficient variation in the cost function).
Both qualities are expected to be important in practice.
This tension points to an opportunity to investigate the
proper balance between these two qualities when solving
particular problems of interest.
While this study provides only loose design criteria for
selecting circuit ansatze, it presents a concrete classical
simulation framework for identifying and approaching
these challenges by defining quantities that can be easily
computed and compared among various circuit designs
or even circuit fragments. This allows one to evaluate
and rank a group of potential circuits based on criteria
such as the application of interest and/or hardware
of choice. We intend for this study to be a useful starting
point for researchers designing parameterized quantum
circuits, as well as for experimentalists benchmarking and
developing gates for quantum devices.
ACKNOWLEDGMENTS
The authors thank Ryan Sweke for a helpful discussion
in connecting ideas in this work to ideas in statistical
learning theory. The authors also thank Morten Kjaer-
gaard, Juani Bermejo-Vega, Maria Schuld, and Jonathan
Olson for valuable discussions and comments on the
manuscript. S. S. also thanks Aram Harrow and Isaac
Chuang for helpful discussions on the project regard-
ing frame potentials and parameterized circuit designs
respectively during the Quantum Information Science
(8.371) course. All the authors thank the colleagues
at Zapata Computing for their feedback. A. A.-G. ac-
knowledges support from Anders G. Frøseth as well as
the Google Focused Award. A. A.-G. acknowledges sup-
port from the Vannevar Bush Faculty Fellowship under
Award No. ONR 00014-16-1-2008 and the Army Re-
search Office under Award No. W911NF-15-1-0256. S.
S. is supported by the Department of Energy Computa-
tional Science Graduate Fellowship (DOE CSGF) under
grant number DE-FG02-97ER25308.
[1] J. Preskill, Quantum 2, 79 (2018).
[2] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-
Guzik, New J. Phys 18, 023023 (2016).
[3] Y. Cao, J. Romero, J. P. Olson, M. Degroote, P. D. John-
son, M. Kieferová, I. D. Kivlichan, T. Menke, B. Per-
opadre, N. P. D. Sawaya, S. Sim, L. Veis, and A. Aspuru-
Guzik, (2018), arXiv:1812.09976.
[4] E. Farhi, J. Goldstone, and S. Gutmann, arXiv preprint
arXiv:1411.4028 (2014).
[5] J. Romero, J. P. Olson, and A. Aspuru-Guzik, Quantum
Sci. Technol 2, 023023 (2017).
[6] P. D. Johnson, J. Romero, J. Olson, Y. Cao,
and A. Aspuru-Guzik, arXiv preprint arXiv:1711.02249
(2017).
[7] E. Farhi and H. Neven, arXiv preprint arXiv:1802.06002
(2018).
[8] V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow,
A. Kandala, J. M. Chow, and J. M. Gambetta, Nature
567, 209 (2019).
[9] M. Schuld, A. Bocharov, K. Svore, and N. Wiebe, arXiv
preprint arXiv:1804.00633 (2018).
[10] P.-L. Dallaire-Demers and N. Killoran, Phys. Rev. A 98,
012324 (2018).
[11] S. Lloyd and C. Weedbrook, Phys. Rev. Lett 121, 040502
(2018).
[12] D. Zhu, N. M. Linke, M. Benedetti, K. A. Landsman,
N. H. Nguyen, C. H. Alderete, A. Perdomo-Ortiz, N. Korda,
A. Garfoot, C. Brecque, L. Egan, O. Perdomo, and
C. Monroe, arXiv preprint arXiv:1812.08862 (2018).
[13] S. Shalev-Shwartz and S. Ben-David, Understanding Machine
Learning: From Theory to Algorithms (Cambridge
University Press, 2014).
[14] S. Arunachalam and R. de Wolf, (2017),
arXiv:1701.06806.
[15] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush,
and H. Neven, Nat. Commun 9, 4812 (2018).
[16] N. Killoran, T. R. Bromley, J. M. Arrazola,
M. Schuld, N. Quesada, and S. Lloyd, arXiv preprint
arXiv:1806.06871 (2018).
[17] J. Romero, R. Babbush, J. R. McClean, C. Hempel, P. J.
Love, and A. Aspuru-Guzik, Quantum Sci. Technol 4,
014008 (2018).
[18] I. D. Kivlichan, J. McClean, N. Wiebe, C. Gidney,
A. Aspuru-Guzik, G. K.-L. Chan, and R. Babbush, Phys.
Rev. Lett 120 (2018).
[19] P.-L. Dallaire-Demers, J. Romero, L. Veis, S. Sim,
and A. Aspuru-Guzik, arXiv preprint arXiv:1801.01053
(2018).
[20] A. Kandala, A. Mezzacapo, K. Temme, M. Takita,
M. Brink, J. M. Chow, and J. M. Gambetta, Nature
549, 242 (2017).
[21] M. R. Geller, Phys. Rev. Appl 10, 024052 (2018),
arXiv:1711.11026.
[22] Y. Du, M.-H. Hsieh, T. Liu, and D. Tao, arXiv preprint
arXiv:1810.11922 (2018).
[23] Y. S. Weinstein, W. G. Brown, and L. Viola, Phys. Rev.
A 78, 052332 (2008).
[24] Y. Nakata, M. Koashi, and M. Murao, New J. Phys 16,
053043 (2014).
[25] D. A. Roberts and B. Yoshida, J. High Energ. Phys
(2017).
[26] S. Datta, S. Howard, and D. Cochran, Linear Algebra
Its Appl. 437, 2455 (2012).
[27] J. M. Renes, R. Blume-Kohout, A. J. Scott, and C. M.
Caves, J. Math. Phys (2004).
[28] A. Klappenecker and M. Rotteler, in Proceedings.
International Symposium on Information Theory, 2005.
ISIT 2005., Vol. 2005 (IEEE, 2005) pp. 1740–1744,
arXiv:0502031 [quant-ph].
[29] I. Bengtsson and K. Życzkowski, Geometry of Quantum
States: An Introduction to Quantum Entanglement
(Cambridge University Press, 2017) pp. 1–619.
[30] K. Życzkowski and H.-J. Sommers, Phys. Rev. A 71,
032313 (2005).
[31] S. Kullback and R. A. Leibler, Ann. Math. Stat 22, 79
(1951).
[32] J. Johansson, P. Nation, and F. Nori, Comput. Phys.
Commun 183, 1760 (2012), arXiv:1110.0573.
[33] D. A. Meyer and N. R. Wallach, J. Math. Phys 43, 4273
(2002).
[34] M. A. Nielsen, C. M. Dawson, J. L. Dodd, A. Gilchrist,
D. Mortimer, T. J. Osborne, M. J. Bremner, A. W. Harrow,
and A. Hines, Phys. Rev. A 67, 052301 (2003).
[35] G. K. Brennen, Quantum Information & Computation 3,
619 (2003).
[36] B. M. Terhal and P. Horodecki, Physical Review A 61,
040301 (2000), arXiv:9911117 [quant-ph].
[37] R. Somma, G. Ortiz, H. Barnum, E. Knill, and L. Viola,
Phys. Rev. A 70 (2004), 10.1103/PhysRevA.70.042311.
[38] M. A. Nielsen and I. L. Chuang, Quantum Computation
and Quantum Information (Cambridge University Press,
2009).
[39] R. S. Smith, M. J. Curtis, and W. J. Zeng, arXiv preprint
arXiv:1608.03355 (2016).
[40] P. B. M. Sousa and R. V. Ramos, arXiv preprint quant-
ph/0602174 (2006).
[41] C. M. Wilson, J. S. Otterbach, N. Tezak, R. S. Smith,
G. E. Crooks, and M. P. da Silva, arXiv preprint
arXiv:1806.08321 (2018).
[42] “qpic,” https://github.com/qpic/qpic (2016).
[43] P. J. J. O’Malley, R. Babbush, I. D. Kivlichan,
J. Romero, J. R. McClean, R. Barends, J. Kelly,
P. Roushan, A. Tranter, N. Ding, B. Campbell, Y. Chen,
Z. Chen, B. Chiaro, A. Dunsworth, A. G. Fowler, E. Jeffrey,
E. Lucero, A. Megrant, J. Y. Mutus, M. Neeley,
C. Neill, C. Quintana, D. Sank, A. Vainsencher, J. Wenner,
T. C. White, P. V. Coveney, P. J. Love, H. Neven,
A. Aspuru-Guzik, and J. M. Martinis, Phys. Rev. X 6,
031007 (2016).
[44] J. Otterbach, R. Manenti, N. Alidoust, A. Bestwick,
M. Block, B. Bloom, S. Caldwell, N. Didier, E. S. Fried,
S. Hong, et al., arXiv preprint arXiv:1712.05771 (2017).
[45] D. Wecker, M. B. Hastings, and M. Troyer, Phys. Rev.
A 92 (2015).
[46] L. S. Bishop, S. Bravyi, A. Cross, J. M. Gambetta,
and J. Smolin, “Quantum Volume,”
https://dal.objectstorage.open.softlayer.com/v1/AUTH_039c3bf6e6e54d76b8e66152e2f87877/community-documents/quatnum-volumehp08co1vbo0cc8fr.pdf
(2017).
[47] H. R. Grimsley, S. E. Economou, E. Barnes, and N. J.
Mayhall, arXiv preprint arXiv:1812.11173 (2018).
[48] E. Grant, L. Wossnig, M. Ostaszewski, and
M. Benedetti, arXiv preprint arXiv:1903.05076 (2019).
[49] A. Chao and T. J. Shen, Environ. Ecol. Stat. 10, 429
(2003).
APPENDIX
Appendix A: Circuit width
This section aims to numerically demonstrate the utility of the theoretical framework for characterizing modest-
sized circuits (i.e. with widths and depths suitable for NISQ devices). We extend the descriptor analysis of the
circuit templates by considering their instances at larger qubit numbers: n = 6 and n = 8. For each circuit width,
we compute expressibility and entangling capability of the circuit at depths L = 1 and L = 2 (in circuit layers). As
observed in Fig. 7, the relative (ascending) ordering based on descriptor values as well as trends in the rate of increase
for these descriptors with an added circuit layer are largely preserved, up to statistical error. For example, circuit 9
is the least expressible for the three different circuit widths considered, and circuit 6 is the most expressible for all
three cases. A similar result is observed for circuits 1 and 9 for the entangling capability. While the raw values of
the rates of increase in expressibility or entangling capability change with circuit width, circuit templates at n = 4
that correspond to large (or small) increases in the descriptor values from L = 1 to L = 2 also
correspond to large (or small) increases in the descriptor values for larger qubit numbers. With circuit templates that
exhibit consistent trends in descriptor values over varying qubit numbers, it may be possible to use observations and
insights from simulations of smaller circuit instances to infer the performance of the same circuit template with a
larger qubit number.
a) n = 6 qubits
b) n = 8 qubits
Figure 7: Expressibility and entangling capability computed for circuits from Fig. 2 considering larger circuit
widths: (a) n = 6 and (b) n = 8 qubits. Marker colors indicate different numbers of circuit layers (L) applied, and
marker types indicate different circuit widths (n). In each plot, circuits are ordered by ascending descriptor values
for when L = 1. Black dashed lines in entangling capability plots for both (a) and (b) show the average Q values for
random pure states. Error bars show standard deviations over three independent computations.
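The entangling capability plots in Fig. 7 compare each circuit against the average Q value for random pure states. As a minimal sketch of how such a reference value can be obtained, the Python snippet below evaluates the Meyer-Wallach measure in its single-qubit-purity form, Q = 2(1 - (1/n) Σ_k Tr ρ_k²), and averages it over Haar-random states. The function names, the use of NumPy, and the closed form (2^n - 2)/(2^n + 1) for the Haar average are assumptions introduced here for illustration; they are not taken from the authors' simulation code.

```python
import numpy as np


def meyer_wallach(state, n_qubits):
    """Meyer-Wallach measure Q of a pure n-qubit state vector (length 2**n)."""
    psi = np.asarray(state, dtype=complex).reshape([2] * n_qubits)
    purities = []
    for k in range(n_qubits):
        # Bring qubit k to the front and flatten the rest: a 2 x 2**(n-1) matrix.
        m = np.moveaxis(psi, k, 0).reshape(2, -1)
        rho_k = m @ m.conj().T  # reduced density matrix of qubit k
        purities.append(np.real(np.trace(rho_k @ rho_k)))
    return 2.0 * (1.0 - float(np.mean(purities)))


def haar_average_q(n_qubits):
    """Closed-form Haar average of Q (assumed standard result, not from the text)."""
    dim = 2 ** n_qubits
    return (dim - 2) / (dim + 1)


def haar_average_q_estimate(n_qubits, n_states, seed=0):
    """Monte Carlo estimate of <Q> over Haar-random pure states."""
    rng = np.random.default_rng(seed)
    dim = 2 ** n_qubits
    total = 0.0
    for _ in range(n_states):
        # A normalized complex Gaussian vector is Haar-distributed.
        v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
        total += meyer_wallach(v / np.linalg.norm(v), n_qubits)
    return total / n_states
```

Under these assumptions, a product state gives Q = 0 and a GHZ state gives Q = 1, which provides a quick sanity check before comparing a circuit's sampled ⟨Q⟩ against the Haar reference line.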
Appendix B: Sample size
The appropriate sample size for estimating expressibility and entangling capability was deduced by applying Chebyshev’s
inequality. For each circuit simulation in Section III, 5000 pairs of states (10^4 states total) were sampled to
compute the descriptors. In the case of expressibility, this sample size corresponded to estimating the mean of state
fidelities within a relative precision of approximately 0.1, with respect to the variance, with 98% confidence. Similarly,
this sample size corresponded to estimating the average MW measure, or ⟨Q⟩, within a relative precision of 0.07 with
98% confidence. To empirically justify the sample size, both descriptor values were computed for a single layer of
circuit 6, considering two different circuit widths, n = 4 and n = 8. Estimated descriptor values are plotted over varying
sample size in Fig. 8. In this figure we observe bias in the estimation of expressibility, which is most
pronounced at low sample sizes. Computing an unbiased estimator for entropy (or entropy-based quantities)
is a well-known challenge [49]. The bias can often be alleviated by a combination of collecting sufficient samples and
adding correction terms to the entropy estimate, e.g. Chao-Shen terms for relative abundance and sample coverage
[49]. We leave such considerations in the context of estimating expressibility to future work. On the left-hand panels
we can observe the bias due to finite sampling of the expressibility value; the plotted standard deviations do not
encompass the subsequent sample means that are estimated using larger sample numbers.
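The quoted precisions can be reproduced from Chebyshev's inequality for a sample mean, P(|X̄ - μ| ≥ ε) ≤ σ²/(N ε²). The short Python sketch below solves this bound for ε/σ at a given sample size and failure probability. It assumes one particular reading of the text, namely that "relative precision" means ε measured in units of the standard deviation σ of a single sample; the function name and parameters are hypothetical.

```python
import math


def chebyshev_relative_precision(n_samples, failure_prob):
    """Smallest eps/sigma guaranteed by Chebyshev's inequality.

    Setting sigma**2 / (n_samples * eps**2) = failure_prob and solving
    for eps / sigma gives 1 / sqrt(n_samples * failure_prob).
    """
    return 1.0 / math.sqrt(n_samples * failure_prob)


# 98% confidence corresponds to a failure probability of 0.02.
fidelity_precision = chebyshev_relative_precision(5000, 0.02)   # 5000 state pairs
mw_precision = chebyshev_relative_precision(10_000, 0.02)       # 10^4 states
```

Under this reading, 5000 fidelity samples give a precision of about 0.1 and 10^4 single-state samples of Q give about 0.07, in line with the figures stated above.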
a) n = 4 qubits
b) n = 8 qubits
Figure 8: Plots showing convergences in both expressibility and entangling capability values for circuit 6 with
increased sample size. Plots (a) and (b) show the descriptor data for n = 4 and n = 8 qubits, respectively. Error
bars show standard deviations over five independent computations. The term Nsamples refers to the number of
sample pairs of parameterized states.
Appendix C: Expressibility saturation
In Section IIIA, we observed evidence of expressibility values saturating with increased layers. We extend the
number of layers L up to 10, to confirm and further investigate this phenomenon. Expressibility values are plotted
with respect to the number of two-qubit gates as shown in Fig. 9, where the markers indicate the layer number. This
allows the reader to visualize the saturating effect while taking into account the total number of two-qubit gates it
took to reach a particular expressibility value. The red dotted line in Fig. 9 indicates the bias in the estimation of the
KL divergence due to finite sampling; this value is computed by averaging over five sets of 5000 Haar-random
states. This provides a numerical barrier below which we cannot resolve the expressibility values. In order to better
resolve the value of expressibility saturation below this numerical barrier (e.g. in the case of circuit 16), we would
either need to generate more samples or use a bias-corrected estimator.
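To illustrate where such a numerical barrier comes from, the following Python sketch estimates the KL divergence between a histogram of sampled Haar-random fidelities and the analytical Haar fidelity distribution, P_Haar(F) = (N - 1)(1 - F)^(N - 2) with N = 2^n. Since the true divergence is zero in this case, any positive value returned is purely a finite-sampling bias. The bin count of 75, the interpretation of each set as 5000 state pairs, the seed, and all function names are assumptions made for this illustration rather than details confirmed by this appendix.

```python
import numpy as np


def haar_pair_fidelities(n_pairs, dim, rng):
    """Fidelities |<a|b>|^2 for independent pairs of Haar-random states."""
    shape = (2, n_pairs, dim)
    vecs = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    vecs /= np.linalg.norm(vecs, axis=-1, keepdims=True)
    overlaps = np.sum(vecs[0].conj() * vecs[1], axis=-1)
    return np.abs(overlaps) ** 2


def haar_bin_probs(dim, edges):
    """Probability mass of P_Haar(F) in each bin, from the survival function."""
    survival = (1.0 - edges) ** (dim - 1)  # P(F' > F) = (1 - F)^(N - 1)
    return survival[:-1] - survival[1:]


def kl_to_haar(fidelities, dim, n_bins=75):
    """Plug-in estimate of D_KL(P_sampled || P_Haar) from binned fidelities."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts, _ = np.histogram(fidelities, bins=edges)
    p = counts / counts.sum()
    q = haar_bin_probs(dim, edges)
    occupied = p > 0  # empty bins contribute zero to the sum
    return float(np.sum(p[occupied] * np.log(p[occupied] / q[occupied])))


def haar_bias_floor(n_pairs=5000, dim=16, n_sets=5, seed=0):
    """Average plug-in KL estimate over several independent Haar sample sets."""
    rng = np.random.default_rng(seed)
    values = [kl_to_haar(haar_pair_fidelities(n_pairs, dim, rng), dim)
              for _ in range(n_sets)]
    return float(np.mean(values))
```

Calling haar_bias_floor with a smaller n_pairs should return a larger value, which mirrors the observation that the bias shrinks as more samples are generated.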
From the plot, we observe different cases of expressibility saturation, varying in the rate of and value at saturation.
For instance, circuits 6, 10, and 15 saturate at or near the first layer. However, circuits 10 and 15 saturate at unfavorable
expressibility values compared with circuit 6. Circuit 3 (or, equivalently, 16) does not reach saturation even
with ten circuit layers, though the expressibility continues to improve. Many of the circuits, regardless of the qubit
connectivity or two-qubit gate configuration, reach favorable expressibility values with sufficient depth. In practice,
we may be interested in circuits that correspond to the most favorable expressibility (lowest KL divergence) while
maintaining a low number of two-qubit gates. Circuits 11 and 12 saturate at favorable expressibility values using
between 10 and 20 nearest-neighbor two-qubit operations. By contrast, circuit 6 reaches a favorable expressibility
value within a single circuit layer, using 12 two-qubit operations, but several of these gates are non-local, requiring
a higher degree of qubit connectivity or a (costly) decomposition into nearest-neighbor gates. This suggests that
circuits 11 and 12 may be better choices than circuit 6 for NISQ devices.
[Figure 9 plot: one panel per circuit (Circuit 1 through Circuit 19). Horizontal axis: # of 2q gates. Vertical axis: Expr, D_KL (logarithmic scale). Legend: Layer Number, L = 1 through L = 10.]
Figure 9: Expressibility values plotted against the number of two-qubit gates for each circuit in Fig. 2 with width of
n = 4 qubits. Markers indicate the layer number. As described in Appendix C, finite sampling introduces a bias in
the expressibility estimation. The red dotted line in each subplot shows the estimator’s bias introduced in the case
of 5000 samples when the true value is zero (i.e. the expressibility of the Haar distribution itself).
