We introduce a single-number metric, quantum volume, that can be measured using a concrete protocol on near-term quantum computers of modest size ($n \lesssim 50$), and measure it on several state-of-the-art transmon devices, finding values as high as 8. The quantum volume is linked to system error rates, and is empirically reduced by uncontrolled interactions within the system. It quantifies the largest random circuit of equal width and depth that the computer successfully implements. Quantum computing systems with high-fidelity operations, high connectivity, large calibrated gate sets, and circuit rewriting toolchains are expected to have higher quantum volumes. The quantum volume is a pragmatic way to measure and compare progress toward improved system-wide gate error rates for near-term quantum computation and error-correction experiments.
IBM T. J. Watson Research Center, Yorktown Heights, NY 10598
Recent quantum computing efforts have moved beyond controlling a few qubits, and are now focused on controlling systems with several tens of qubits [1] [2] [3] . In these noisy intermediate-scale quantum (NISQ) systems [4] , performance of isolated gates may not predict the behavior of the system. Methods such as randomized benchmarking [5] , state and process tomography [6] , and gateset tomography [7] are valued for measuring the performance of operations on a few qubits, yet they fail to account for errors arising from interactions with spectator qubits [8, 9] . Given a system such as this, whose individual gate operations have been independently calibrated and verified, how do we measure the degree to which the system performs as a general purpose quantum computer? We address this question by introducing a single-number metric, the quantum volume, together with a concrete protocol for measuring it on near-term systems. Similar to how LINPACK is used for comparing diverse classical computers [10] , this metric is not tailored to any particular system, requiring only the ability to implement a universal set of quantum gates. With the concept of this metric being discussed elsewhere [11, 12] , our focus here is on measuring this metric in near-term quantum devices.
The quantum volume protocol we present is strongly linked to gate error rates, and is influenced by underlying qubit connectivity and gate parallelism. It can thus be improved by moving toward the limit in which large numbers of well-controlled, highly coherent, connected, and generically programmable qubits are manipulated within a state-of-the-art circuit rewriting toolchain. High-fidelity state preparation and readout are also necessary. In this work, we evaluate the quantum volume of current IBM Q devices [1] , and corroborate the results with simulations of the same circuits under a depolarizing error model. While we focus on transmon devices, the protocol can be implemented with any universal programmable quantum computing device.
The quantum volume is based on the performance of random circuits with a fixed but generic form. It is well-known that quantum algorithms can be expressed as polynomial-sized quantum circuits built from two-qubit unitary gates [13]. Quantum algorithms are generally not random circuits. However, random circuits model generic state preparations, and are used as the basis of proposals for demonstrating quantum advantage [14]. In addition, circuits with a similar form appear in near-term algorithms like quantum adiabatic optimization algorithms [15] and variational quantum eigensolvers [16].

FIG. 1. Model circuit. A model circuit consists of d layers of random permutations of the qubit labels, followed by random two-qubit SU(4) gates. When the circuit width m is odd, one of the qubits is idle in each layer. A final permutation can be applied to the labels of the measurement outcomes.

* awcross@us.ibm.com
† lsbishop@us.ibm.com
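A model circuit, shown in Fig. 1, with depth $d$ and width $m$, is a sequence

$$U = U^{(d)} \cdots U^{(2)} U^{(1)}$$

of $d$ layers

$$U^{(t)} = U^{(t)}_{\pi_t(m'-1),\pi_t(m')} \otimes \cdots \otimes U^{(t)}_{\pi_t(1),\pi_t(2)},$$

each labeled by times $t = 1, \ldots, d$ and acting on $m' = 2\lfloor m/2 \rfloor$ qubits. Each layer is specified by choosing a uniformly random permutation $\pi_t \in S_m$ of the $m$ qubit indices and sampling each $U^{(t)}_{a,b}$, acting on qubits $a$ and $b$, from the Haar measure on SU(4).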
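As a minimal sketch of the layer structure just described, the following Python snippet samples only the qubit pairings of a model circuit. The function name `sample_model_circuit` and the list-of-pairs representation are illustrative assumptions, and the Haar-random SU(4) gate attached to each pair is deliberately left out.

```python
import random

def sample_model_circuit(m, d, seed=None):
    """Return d layers; each layer is a list of qubit pairs (a, b).

    Each pair stands for one two-qubit gate U^(t)_{a,b}. Sampling the gate
    itself from the Haar measure on SU(4) is not part of this sketch.
    """
    rng = random.Random(seed)
    layers = []
    for _ in range(d):
        perm = list(range(m))
        rng.shuffle(perm)  # uniformly random permutation pi_t of the labels
        # Pair consecutive entries; when m is odd the last qubit idles.
        pairs = [(perm[i], perm[i + 1]) for i in range(0, m - 1, 2)]
        layers.append(pairs)
    return layers

# Example: width 5, depth 3 (one qubit idles in every layer).
for t, pairs in enumerate(sample_model_circuit(m=5, d=3, seed=1), start=1):
    print(t, pairs)
```

Each layer contains $\lfloor m/2 \rfloor$ disjoint pairs, which matches the statement that a layer acts on an even number of qubits.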
To define when a model circuit $U$ has been successfully implemented in practice, we use the heavy output generation problem [17]. The ideal output distribution is

$$p_U(x) = |\langle x | U | 0 \rangle|^2,$$

where $x \in \{0,1\}^m$ is an observable bit-string. Consider the set of output probabilities given by the range of $p_U(x)$ sorted in ascending order $p_0 \le p_1 \le \cdots \le p_{2^m-1}$. The median of the set of probabilities is $p_{\mathrm{med}} = (p_{2^{m-1}} + p_{2^{m-1}-1})/2$, and the heavy outputs are

$$H_U = \{ x \in \{0,1\}^m \ \text{such that} \ p_U(x) > p_{\mathrm{med}} \}.$$

The heavy output generation problem is to produce a set of output strings such that more than two-thirds are heavy.
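The definition of heavy outputs can be illustrated with a short Python sketch. The probability vectors below are hypothetical numbers chosen for illustration, not data from any device, and the function names are assumptions of this example.

```python
from statistics import median

def heavy_outputs(ideal_probs):
    """Indices x whose ideal probability p_U(x) exceeds the median p_med."""
    p_med = median(ideal_probs)  # mean of the two middle values for even length
    return {x for x, p in enumerate(ideal_probs) if p > p_med}

def heavy_output_probability(ideal_probs, observed_probs):
    """Total observed probability that falls on the ideal heavy set."""
    return sum(observed_probs[x] for x in heavy_outputs(ideal_probs))

ideal = [0.05, 0.40, 0.10, 0.45]   # hypothetical p_U(x) for m = 2
uniform = [0.25] * 4               # a fully depolarized device

print(sorted(heavy_outputs(ideal)))
print(heavy_output_probability(ideal, ideal))
print(heavy_output_probability(ideal, uniform))
```

For an even number of outcomes, `statistics.median` averages the two middle values, which coincides with the definition of $p_{\mathrm{med}}$ above. A device that outputs uniformly random strings lands on the heavy set half of the time, which is why the two-thirds threshold is a meaningful bar.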
To evaluate heavy output generation, we implement model circuits using the gate set provided by the target system. For example, the model circuit may need to be rewritten, not only to use the system's gate set, but also to respect the set of available interactions, which may require additional operations such as SWAP gates. The average gate fidelity [18] between $m$-qubit unitaries $U$ and $U'$ is

$$F_{\mathrm{avg}}(U, U') = \frac{|\mathrm{Tr}(U^\dagger U')|^2 / 2^m + 1}{2^m + 1}.$$

Given a model circuit $U$, a circuit-to-circuit transpiler finds an implementation $U'$ for the target system such that $1 - F_{\mathrm{avg}}(U, U') \le \epsilon \ll 1$. The approximation error is limited by the selected classical precision in many cases, but may be further increased if the hardware requires SU(4) to be approximated with a discrete set of available gates.
The transpiler is free to use all available tricks and hardware resources to implement $U$ (e.g., taking great computational effort in finding an optimized $U'$, using extra qubits for gate teleportation or temporary storage, etc.). It may optimize over qubit placements by choosing the best region of the device. If it is practical to calibrate a very large gate set, and it happens to include an accurate implementation of $U$, the transpiler is free to use it. None of these approaches is expected to provide an asymptotic advantage, but they may significantly improve practical performance. We do require that the transpiler make an honest attempt to implement $U$, and not merely choose a relatively simple operation far from $U$ that nevertheless produces the heavy outputs for $U$. The compilation routine for computing the quantum volume of IBM Q devices is described in Appendix A, and an approximation scheme is given in Appendix B.
The observed distribution for an implementation $U'$ of model circuit $U$ is $q_U(x)$, and the probability of sampling a heavy output is

$$h_U = \sum_{x \in H_U} q_U(x).$$

Footnote: To determine if a given output is heavy, we compute $H_U$ directly from $U$ using a method that scales exponentially [...] $U$ that can be successfully implemented will involve few enough qubits and/or low enough depth to compute $H_U$ classically. For lower error rates than this, the quantum volume can be superseded by new metrics.

We desire a metric that is a single real number, as this enables straightforward comparison. Data $\{d(m)\}$ can be gathered by sweeping over values of $m$ and $d$. We are free to choose any function of this data $\{d(m)\}$ to capture how well a device performs. The quantum volume treats the width and depth of a model circuit with equal importance and measures the largest square-shaped (i.e., $m = d$) model circuit a quantum computer can implement successfully on average [11, 12]. We define the quantum volume $V_Q$ as

$$\log_2 V_Q = \operatorname*{argmax}_m \min\big(m, d(m)\big).$$

This definition loosely coincides with the complexity of classically simulating the model circuits. There are different ways to classically simulate the model quantum circuits. A straightforward wave-vector propagation approach requires exponential space and time $\sim 2^m$. A 'Feynman' algorithm uses linear space $\sim dm$ but exponential time $\sim 4^{dm}$. It is possible to trade off time and space complexity in a smooth way [17]. Clever partitioning of circuits can achieve good parallelism and efficient use of distributed memory resources for particular supercomputer architectures [19-25]. Particular efforts for circuit partitioning and parallelism have been expended for circuits defined on a 2-dimensional square grid of qubits, where the state-of-the-art is $d = 40$ for a $9 \times 9$ grid [20].
One view of these methods is that they use heuristics to approach optimal variable elimination ordering for a tensor network calculation on the graph corresponding to the circuit. The time complexity scales exponentially with the treewidth of the circuit graph [26] . The treewidth is upper-bounded by m, and while there are specific circuits of depth d = 4 with expander graph structure for which the treewidth is Ω(m), heuristic estimation of the treewidth for some classes of random circuits [22, 23] indicates that the treewidth grows roughly as d. Therefore, we heuristically bound the treewidth of the model circuits as min(d, m), and since the simulation complexity grows exponentially with the treewidth, we define the quantum volume as
We have run quantum volume circuits on three IBM Q devices: 5-qubit Tenerife [27], 16-qubit Melbourne [28], and 20-qubit Tokyo. We generate 200 circuits for $d = m$ with $m \in \{2, 3, 4\}$ to determine $V_Q$. The experimental results and comparison to simulated data for Tokyo are given in Fig. 2, whereas a summary of results across all devices is in Table I. We note that the noisy simulation substantially over-estimates the performance, highlighting the value of system-level metrics such as quantum volume. In order to set a high confidence level that the experimental measurements of $h_d$ surpass the threshold, we repeat the experiments for $m = 2$ on Tenerife and $m = 3$ on Tokyo with 5000 circuits. This larger number of circuits has a strict threshold of $\hat{h}_d > 0.68$ for a 97.5% one-sided confidence interval (see Appendix C). From Table I we see that $\log_2 V_Q = 3$ for Tokyo, $\log_2 V_Q = 2$ for Tenerife, and $\log_2 V_Q < 2$ for Melbourne. Additional details about the devices used here are given in Appendix D.
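The step from a set of achievable depths to a single number can be sketched as follows. This reads the definition as the side length of the largest square circuit that passes, i.e. the largest value of $\min(m, d(m))$. The function name and the depth values in the example are hypothetical and are not the measured data of Table I.

```python
def log2_quantum_volume(achievable_depth):
    """achievable_depth maps a width m to the achievable depth d(m).

    Returns the largest min(m, d(m)), or 0 if no width passes.
    """
    return max((min(m, d) for m, d in achievable_depth.items()), default=0)

# Hypothetical achievable depths for widths 2, 3 and 4.
depths = {2: 4, 3: 3, 4: 2}
k = log2_quantum_volume(depths)
print(k, 2 ** k)
```

With these illustrative depths the square circuit of side 3 is the largest that succeeds, giving a quantum volume of 8.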
We also compare circuits run on Tokyo with optimized compiling schemes. Table II reports the heavy output probabilities $\hat{h}_d$ found with circuits optimized both by the KAK decomposition [29, 30] described in Appendix A and the approximate SU(4) decomposition described in Appendix B, assuming CX error rates of 0.01, 0.03, and 0.05. We find modest increases in $\hat{h}_d$ that correspond to the reduction in the total number of CX gates in the compiled circuits: the standard Qiskit Terra transpiler [31] produces circuits with 28 CX gates on average, and we measure $\hat{h}_d = 0.614(0.003)$; KAK reduces the average number of CX gates to 21 and produces $\hat{h}_d = 0.632(0.005)$. The approximate SU(4) circuits introduce further gains, with the best result of $\hat{h}_d = 0.649(0.005)$ achieved using circuits with a 1% CX error approximation.
To understand how the quantum volume scales in a system with limited connectivity, as gate error probabilities decrease, we consider model circuits of width $m$ on a square grid of $m$ qubits. The $m$ qubits are arranged into the largest possible square, and extra qubits are added first to a new right column and then to a new bottom row. We approximate the achievable model circuit depth $d(m)$ by assuming independent stochastic errors, so that the computation fails with high probability when the model circuit volume (width times depth) satisfies

$$m\, d(m) \approx \frac{1}{\epsilon_{\mathrm{eff}}(m)}. \qquad (8)$$

FIG. 2. Experimental data for square (width = depth) quantum volume circuits using the IBM Q 20-qubit device, Tokyo. The ideal simulation results are green plus signs. The noisy simulations, using a depolarizing noise model with average error rates from the qubits used on the device, are red circles. The experiments using 200 circuits are blue squares. The dotted line is the threshold of 2/3 for heavy output generation, and the dashed (green) line is the asymptotic ideal heavy output probability of $(1 + \ln 2)/2$ [17], which the ideal simulations quickly approach. In order to set a high confidence level that $h_d$ surpasses the threshold, the point at $m = d = 3$ was repeated with 5000 circuits (cyan diamond). This number of circuits corresponds to a stricter threshold of 0.68 indicated by the solid line at the experimental points for $m = 3$.
We substitute an estimate of the mean effective error probability $\epsilon_{\mathrm{eff}}(m)$ per two-qubit gate into this expression. This estimate $\epsilon_{\mathrm{eff}}(m) = \epsilon\,(a\sqrt{m} + b)$ is proportional to the two-qubit gate error probability $\epsilon$, with a prefactor that is linear in $\sqrt{m}$. This factor fits the mean number of SWAPs necessary to bring a pair of qubits next to each other, apply the gate, and then return them to their original positions. It is twice the average shortest path length (minus one). We do a similar calculation for a loop of $m$ qubits.

To validate these estimates, we consider the influence of connectivity on quantum volume by simulating three coupling graphs for up to 12 qubits: all-to-all connectivity, square grid, and loop. We estimate the two-qubit gate error $\epsilon$ required for each coupling graph to obtain a $\log_2 V_Q$ of 4, 6, 8, and 12, assuming the single-qubit gate error is equal to $\epsilon/10$ (Table III). We run these simulations with no measurement error for all graphs, and for measurement errors of 0%, 1%, and 5% for the square grid (Table IV). The values for $\epsilon$ here correspond to 200 simulated circuits with a heavy output probability of $\hat{h}_d = 0.67 \pm 0.05$.
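The SWAP overhead that enters the effective error estimate can be illustrated by computing average shortest path lengths directly. The sketch below assumes the reading "twice (the average shortest path length minus one)" and averages over all distinct qubit pairs; the function names are illustrative, and the fitted coefficients $a$ and $b$ are not reproduced here.

```python
from itertools import combinations

def mean_swaps(distances):
    """2 * (mean shortest-path length - 1) over all distinct qubit pairs."""
    mean_path = sum(distances) / len(distances)
    return 2 * (mean_path - 1)

def grid_distances(side):
    """Shortest-path (Manhattan) distances on a side x side square grid."""
    nodes = [(r, c) for r in range(side) for c in range(side)]
    return [abs(r1 - r2) + abs(c1 - c2)
            for (r1, c1), (r2, c2) in combinations(nodes, 2)]

def loop_distances(m):
    """Shortest-path distances on a ring of m qubits."""
    return [min(j - i, m - (j - i)) for i, j in combinations(range(m), 2)]

for side in (2, 3, 4, 5):
    m = side * side
    print(m,
          round(mean_swaps(grid_distances(side)), 3),
          round(mean_swaps(loop_distances(m)), 3))
```

On the grid the mean distance grows roughly like $\sqrt{m}$, whereas on the loop it grows linearly in $m$, which is the asymptotic difference between the two layouts.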
It is clear from Table III that all-to-all connectivity provides an advantage over the less-connected graphs; $\log_2 V_Q$ of 12 is achievable with roughly twice the two-qubit error rate (0.0032) of the square grid (0.0015) and the 12-qubit loop (0.0014). At the same time, there is little difference between the required two-qubit error rate for the square grid versus the loop graphs; the error rate for the loop is less than 7% lower than that of the square grid for the 12-qubit case. This relatively small difference is due to the small total number of qubits, since there is a significant asymptotic difference between loop and grid layouts. However, the difference may increase, even at small sizes, when using an optimal transpiler. All circuits for the simulations in Tables III and IV were compiled using the standard Qiskit Terra transpiler. Quantum volume estimates computed from Eq. 8 are consistent with these depolarizing noise simulations at error probabilities down to $\approx 10^{-3}$, as shown in Fig. 3.

FIG. 3. [...] Table III together with estimates using the expression in Eq. 8 for grid and loop connectivities.
These simulations give an indication of how quantum volume measurements might look on different quantum computing architectures. Trapped ions, for instance, will benefit from having all-to-all connectivity. Typical trapped-ion systems have both two-qubit gate errors and measurement errors less than 0.01, which based on Table III should be sufficient to achieve $\log_2 V_Q = 6$ if not higher. Recently, trapped-ion experiments have demonstrated two-qubit gates with errors of 0.001 [32], indicating higher quantum volumes should be possible. However, multi-qubit experiments are susceptible to larger error rates than isolated two-qubit gates, due to correlated errors across many ions [33]. A measurement of quantum volume would give a reliable validation of multi-qubit trapped-ion systems. Similarly, we can infer that for superconducting devices, coupling maps with more connectivity should produce higher quantum volume, but only if additional coupling does not also introduce larger errors.
In this paper we expand on a previously presented metric, the quantum volume [11, 12], and show both a concrete specification and a method for benchmarking noisy intermediate-scale quantum devices. This metric takes into account all relevant hardware parameters. This includes the performance parameters (coherence, calibration errors, crosstalk, spectator errors, gate fidelity, measurement fidelity, initialization fidelity) as well as the design parameters such as connectivity and gate set. It also includes the software behind the circuit optimization. Additionally, the quantum volume is architecture-independent, and can be applied to any system that is capable of running quantum circuits. We implement this metric on several IBM Q devices, and find a quantum volume as high as 8. We conjecture that systems with higher connectivity will have higher quantum volume given otherwise similar performance parameters.
From numerical simulations for a given connectivity, we find that there are two possible paths for increasing the quantum volume. Although all operations must improve to increase the quantum volume, the first path is to prioritize improving the gate fidelity above other operations, such as measurement and initialization. This sets the roadmap for device performance to focus on the errors that limit gate performance, such as coherence and calibration errors. The second path stems from the observation that, for these devices and this metric, circuit optimization is becoming important. We implemented various circuit optimization passes (far from optimal) and showed a measurable change in the experimental performance. In particular, we introduced an approximate method for NISQ devices, and used it to show experimental improvements.
We encourage the adoption of quantum volume as a primary performance metric, which we believe will allow the field to work together and focus efforts on the important factors to develop improved NISQ devices.
ACKNOWLEDGMENTS
The authors acknowledge support from ARO under Contract No. W911NF-14-1-0124 and thank Sergey Bravyi, John A. Smolin, and Christopher J. Wood for informative discussions. We thank Antonio Córcoles, Abigail Cross, John Gunnels, David McKay, Travis Scholten, and Ted Yoder for valuable comments on the manuscript. We are grateful to the IBM Q team for their contributions to the systems and devices used in this work.
Model circuits must be rewritten to use the gate set of the target system, while attempting to minimize any additional overhead that might result from the translation. The IBM Q systems used in this paper accept quantum circuits expressed by products of controlled-NOT (CNOT) gates and single-qubit gates [34]. The single-qubit gates are defined by

$$u_1(\lambda) = R_Z(\lambda),$$
$$u_2(\phi, \lambda) = R_Z(\phi + \pi/2)\, R_X(\pi/2)\, R_Z(\lambda - \pi/2),$$
$$u_3(\theta, \phi, \lambda) = R_Z(\phi + 3\pi)\, R_X(\pi/2)\, R_Z(\theta + \pi)\, R_X(\pi/2)\, R_Z(\lambda),$$

where $R_P(\theta) = \exp(-i\theta P/2)$ for a Pauli matrix $P \in \{X, Y, Z\}$. The available CNOT gates for a particular system are given in the form of a qubit connectivity graph $G = (V, E)$. Each vertex of $G$ represents a qubit and each (directed) edge represents a pair of qubits that can be coupled by gates. We generate input model circuits by sampling and expanding each SU(4) gate to CNOT and single-qubit gates using the KAK decomposition [29, 30] implemented in Qiskit Terra (see also Appendix B). Each input circuit is then mapped to the target system and optimized using a sequence of circuit rewriting passes that are implemented in Qiskit Terra. These passes are named unrolling, CNOT reorientation, CNOT cancellation, single-qubit optimization, and swap mapping. All of the passes can be applied multiple times, but some passes, such as CNOT reorientation, have requirements that are ensured by other passes, such as swap mapping.
The unrolling pass is essentially a macro expansion that descends into each gate's hierarchical definition and rewrites that gate in terms of lower-level gates. In the setting of rewriting model circuits, the lower-level gate set is always the IBM Q gate set. For example, a Hadamard (H) gate is defined as $u_2(0, \pi)$ in the Qiskit Terra gate library, which is in the IBM Q gate set, and a SWAP gate is defined as $\mathrm{CNOT}_{a,b}\,\mathrm{CNOT}_{b,a}\,\mathrm{CNOT}_{a,b}$.
The CNOT reorientation pass examines each CNOT gate in the circuit and applies the identity

$$\mathrm{CNOT}_{c,t} = (H \otimes H)\, \mathrm{CNOT}_{t,c}\, (H \otimes H)$$

if $(t, c)$ is a directed edge of $G$ but $(c, t)$ is not. The pass fails if neither $(c, t)$ nor $(t, c)$ is an edge of $G$. The CNOT cancellation pass collects sequences $\mathrm{CNOT}^m_{c,t}$ of CNOT gates with the same control and target qubits, and replaces them by $\mathrm{CNOT}_{c,t}$ if $m$ is odd or removes them from the circuit if $m$ is even.
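Both passes rest on simple identities that can be checked numerically. The following sketch verifies the standard Hadamard-conjugation identity for reversing a CNOT with plain 4x4 matrices and shows a minimal cancellation routine on a list of CNOTs. The helper names are illustrative and this is not the Qiskit Terra implementation; in particular, the cancellation here only handles lists that contain CNOT gates and nothing else.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(a, b):
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
HH = kron(H, H)

# Basis order |q0 q1>. CNOT_01: control q0, target q1. CNOT_10: the reverse.
CNOT_01 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
CNOT_10 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]

# Reorientation: conjugating by H on both qubits swaps control and target.
reoriented = matmul(HH, matmul(CNOT_10, HH))
print(close(reoriented, CNOT_01))

def cancel_cnots(gates):
    """gates: list of (control, target) CNOTs; cancel adjacent equal pairs."""
    out = []
    for g in gates:
        if out and out[-1] == g:
            out.pop()          # CNOT followed by the same CNOT is the identity
        else:
            out.append(g)
    return out

print(cancel_cnots([(0, 1)] * 3), cancel_cnots([(0, 1)] * 4))
```

A run of $m$ identical CNOTs therefore collapses to one gate when $m$ is odd and to nothing when $m$ is even.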
The single-qubit optimization pass collects sequences of single-qubit gates on the same qubit and replaces each sequence by at most one single-qubit gate. Furthermore, the replacement is chosen in an attempt to minimize the number of physical pulses used to implement the gate; $u_1$ uses zero pulses, $u_2$ uses one pulse, and $u_3$ uses two pulses. The algorithm composes the gates in sequence, rewriting each composed pair of gates as a new gate according to a handful of rewriting rules that follow from the definitions.
The swap mapping pass is the most involved of the fundamental passes within Qiskit Terra. This pass first partitions the input circuit into a sequence of layers such that each layer consists of gates that act on disjoint sets of qubits. The algorithm then acts layer by layer. For simplicity we will ignore single-qubit gates in the following discussion. Consider the gate $U = U_1 U_2 \cdots U_m$ applied in a particular layer, where $U_1, \ldots, U_m$ are pairwise disjoint two-qubit gates that may act on remote pairs of qubits. When the mapping pass acts on this layer, it computes a quantum circuit $U'$ with the following properties:

1. $U'$ consists of nearest-neighbor gates with respect to the connectivity graph $G = (V, E)$
2. $U' = W U$ where $W$ is some permutation of the $n = |V|$ qubits
3. $U'$ has small depth, which the algorithm tries to minimize subject to the first two conditions

The algorithm to compute $U'$ consists of a sequence of rounds, each of which increases the depth of $U'$ by one. At the beginning of a round, the algorithm applies all gates $U_j$ that are nearest-neighbors and removes them from $U$. The rest of the round performs a greedy (randomized) optimization over swap gates to choose a depth-one swap circuit that brings pairs of qubits coupled by gates as close as possible.
The passes are applied in the following order for our standard compilation:

[...]

In our study of optimized model circuits, we apply the following optimization passes after the standard set of passes:

1. Two-qubit block collection pass
2. Two-qubit block optimization pass
The two-qubit block collection pass is an analysis pass that traverses the circuit's gates in topologically sorted order. Starting at each newly-discovered CNOT gate, the pass explores that gate's predecessors and ancestors to collect the largest block of previously unseen and contiguous gates acting on the control and target qubits. The pass continues in this manner and returns a collection of disjoint blocks. The two-qubit block optimization pass computes the unitary operation for each block, synthesizes a new sub-circuit (either exactly, using the KAK decomposition [29, 30] , or approximately; see Appendix B), and replaces the block.
To further reduce the number of SWAP gates, we considered an optimization called the Local Ordering Circuit Optimization (LOCO), which permutes qubits such that those interacting via CNOT gates are as close to nearest-neighbor as possible in the circuit representation; the circuit is optimized for a linear nearest-neighbor topology. This method employs a weighted variant of reverse Cuthill-McKee ordering [35, 36] to reorder the sparse matrix $A_{ij}$, with non-zero elements counting the number of CNOT gate operations between qubits $i$ and $j$ in the circuit, so that its bandwidth is minimized. The matrix is symmetric as we do not consider the direction of the CNOTs. This reordering is efficient, having a runtime that is linear in the number of nonzero matrix elements [37]. To properly account for multiple CNOT interactions between qubits, the LOCO algorithm uses a weighted heuristic when reordering that favors optimizing pairs of qubits with the largest number of repeated interactions over those with fewer gates between them. Input circuits whose bandwidth was reduced by LOCO were replaced with their optimized counterparts. Although this optimization did not lead to significant improvements for heavy output generation using small numbers of qubits, we expect SWAP optimizations such as these to further improve results for larger circuits mapped onto devices with limited connectivity.
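The bandwidth-reduction idea can be sketched with a plain, unweighted reverse Cuthill-McKee ordering. This is a simplified stand-in, not the weighted LOCO variant described above; the function names and the example interaction list are hypothetical.

```python
from collections import deque

def bandwidth(edges, order):
    """Largest distance along the line between two interacting qubits."""
    pos = {q: i for i, q in enumerate(order)}
    return max((abs(pos[a] - pos[b]) for a, b in edges), default=0)

def reverse_cuthill_mckee(n, edges):
    """Unweighted RCM ordering of qubits 0..n-1 for the interaction graph."""
    adj = {q: set() for q in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    order, seen = [], set()
    # Start each component from a lowest-degree qubit, then breadth-first.
    for start in sorted(range(n), key=lambda q: len(adj[q])):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            q = queue.popleft()
            order.append(q)
            for nb in sorted(adj[q] - seen, key=lambda x: len(adj[x])):
                seen.add(nb)
                queue.append(nb)
    return order[::-1]

# Hypothetical circuit: CNOTs between these qubit pairs (direction ignored).
cnot_pairs = [(0, 3), (3, 1), (1, 4), (4, 2)]
order = reverse_cuthill_mckee(5, cnot_pairs)
print(bandwidth(cnot_pairs, list(range(5))), bandwidth(cnot_pairs, order))
```

In this example the interactions form a chain, so after reordering every interacting pair sits on adjacent positions of the line and no SWAPs would be needed on a linear nearest-neighbor device.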
Appendix B: Approximate compiling
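We can always decompose [38, 39] an arbitrary two-qubit unitary in the form

$$U = K_1\, U_d(\alpha, \beta, \gamma)\, K_2, \qquad (B1)$$

where

$$K_i = K_i^l \otimes K_i^r \qquad (B2)$$

are products of single-qubit unitaries $K_i^{l,r}$, the two-qubit component is represented in terms of the information content $(\alpha, \beta, \gamma)$ as

$$U_d(\alpha, \beta, \gamma) = \exp\!\big(i(\alpha\, XX + \beta\, YY + \gamma\, ZZ)\big), \qquad (B3)$$

and we can always restrict to the Weyl chamber $\pi/4 \ge \alpha \ge \beta \ge |\gamma|$. Let $U \sim V$ denote equivalence between $U$ and $V$ under local operations, implying equality of the information content of $U$ and $V$. We can calculate a trace of the product of two such unitaries. Writing $\Delta_\alpha = \alpha' - \alpha$, $\Delta_\beta = \beta' - \beta$, and $\Delta_\gamma = \gamma' - \gamma$,

$$\mathrm{Tr}\big[U_d(\alpha, \beta, \gamma)^\dagger\, U_d(\alpha', \beta', \gamma')\big] = 4\cos\Delta_\alpha \cos\Delta_\beta \cos\Delta_\gamma + 4i \sin\Delta_\alpha \sin\Delta_\beta \sin\Delta_\gamma. \qquad (B4)$$

From this trace we may easily determine the average gate fidelity [18]

$$F_{\mathrm{avg}} = \frac{4 + |\mathrm{Tr}|^2}{20} = \frac{1 + 4\big(\cos^2\Delta_\alpha \cos^2\Delta_\beta \cos^2\Delta_\gamma + \sin^2\Delta_\alpha \sin^2\Delta_\beta \sin^2\Delta_\gamma\big)}{5}, \qquad (B5)$$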
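and these expressions give also the maximal fidelity between arbitrary unitaries $U_c, U_t \in \mathrm{SU}(4)$ after optimizing over local pre- and post-rotations [40]

$$\max_{K_1, K_2} F_{\mathrm{avg}}(K_1 U_c K_2,\, U_t) = F_{\mathrm{avg}}\big(U_d(\alpha_c, \beta_c, \gamma_c),\, U_d(\alpha_t, \beta_t, \gamma_t)\big). \qquad (B6)$$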
We are interested in decompositions of a target unitary $U_t \in \mathrm{SU}(4)$ with the minimal number of applications of a fixed 'basis' gate $U_b$. It is obvious that with zero applications of the basis we can construct only non-entangling target unitaries $U_t \sim U_d(0, 0, 0)$, and with one application of the basis we can construct only target unitaries which are equivalent to the basis, $U_t \sim U_b$. For a CNOT basis it is known [41, 42] that 3 applications of the basis are sufficient to cover all of SU(4). Zhang et al. [43] give decompositions using a more general 'super controlled' basis $U_b \sim U_d(\pi/4, \beta_b, 0)$, for any $\beta_b$: both an expansion with 3 applications of $U_b$ to decompose an arbitrary $U_t \sim U_d(\alpha_t, \beta_t, \gamma_t)$ and also an expansion using two applications of $U_b$ for a restricted target unitary

$$U_t \sim U_d(\alpha_t, \beta_t, 0).$$
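The above expansions are exact so that the constructed unitary $U_c$ satisfies

$$U_c = U_t, \qquad (B7)$$

but we can use eq. (B5) to find the average gate fidelity due to approximating general $U_t$ by fewer applications of the basis gate than is necessary for exact expansion. With zero applications of arbitrary $U_b$ we have:

$$U_c^{(0)} \sim U_d(0, 0, 0), \qquad (B8a)$$
$$F^{(0)}_{\mathrm{avg}} = \frac{1 + 4\big(\cos^2\alpha_t \cos^2\beta_t \cos^2\gamma_t + \sin^2\alpha_t \sin^2\beta_t \sin^2\gamma_t\big)}{5}, \qquad (B8b)$$

which is optimal. With one application of arbitrary $U_b \sim U_d(\alpha_b, \beta_b, \gamma_b)$ we have:

$$U_c^{(1)} \sim U_b, \qquad (B8c)$$
$$F^{(1)}_{\mathrm{avg}} = \frac{1 + 4\big(\cos^2(\alpha_t - \alpha_b) \cos^2(\beta_t - \beta_b) \cos^2(\gamma_t - \gamma_b) + \sin^2(\alpha_t - \alpha_b) \sin^2(\beta_t - \beta_b) \sin^2(\gamma_t - \gamma_b)\big)}{5}, \qquad (B8d)$$

which is optimal. With two applications of super controlled $U_b \sim U_d(\pi/4, \beta_b, 0)$ we have:

$$U_c^{(2)} \sim U_d(\alpha_t, \beta_t, 0), \qquad (B8e)$$
$$F^{(2)}_{\mathrm{avg}} = \frac{1 + 4\cos^2\gamma_t}{5}, \qquad (B8f)$$

which is optimal for [...]. With three applications of super controlled $U_b$ there is no need to approximate and we have:

$$U_c^{(3)} = U_t, \qquad (B8g)$$
$$F^{(3)}_{\mathrm{avg}} = 1, \qquad (B8h)$$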
which is clearly optimal. There can be an additional freedom when expanding a two-qubit gate: in many cases it does not matter whether we implement $U_t$ or $U_{tm} = U_t \cdot \mathrm{SWAP}$ since the latter differs merely by permutation of the output qubit labels. We call it the mirror gate of $U_t$ and its expansion is easily related to $U_t$:

$$U_{tm} \sim U_d\big(\pi/4 - |\gamma_t|,\ \pi/4 - \beta_t,\ \mathrm{sgn}(\gamma_t)(\alpha_t - \pi/4)\big), \qquad (B9)$$

making use of the sign function defined as $\mathrm{sgn}(x) = -1$ for $x < 0$ and $\mathrm{sgn}(x) = 1$ for $x \ge 0$. We can extend eqs. (B8) to give $i$-gate expansions of $U_{tm}$, $U_c^{(im)}$, with fidelities $F^{(im)}_{\mathrm{avg}}$, defined by choosing to expand whichever of $U_t$ and $U_{tm}$ gives the better fidelity. For example, the 2-gate expansion has

$$F^{(2m)}_{\mathrm{avg}} = \frac{1 + 4\max\big(\cos^2\gamma_t,\ \cos^2(\alpha_t - \pi/4)\big)}{5}. \qquad (B10)$$
Because of the mirroring action within the Weyl chamber, the expansion of the mirrored gate has best fidelity exactly when the expansion of the unmirrored gate has worst fidelity, and vice versa. In addition to improving F avg , the freedom to combine a SWAP operation may also allow reduction in the number of inserted SWAP gates during a 'swap mapping pass' as described in Appendix A.
It is interesting to investigate the expected infidelity of each of the approximate expansions of $U_t$, averaged over $U_t$ uniformly distributed within SU(4) in the Haar measure on the Weyl chamber [44, 45]

$$M(\alpha, \beta, \gamma) = \frac{24}{\pi}\big[\cos(4\alpha)\cos(8\beta) + \cos(4\beta)\cos(8\gamma) + \cos(4\gamma)\cos(8\alpha) - \cos(8\alpha)\cos(4\beta) - \cos(8\beta)\cos(4\gamma) - \cos(8\gamma)\cos(4\alpha)\big],$$
allowing calculating the distribution of fidelities of the 2-basis gate approximation of eq. (B8f) for a random element of SU(4)
where z is defined by
for F > 3/5, and
for F ≤ 3/5. Similarly, for the mirrored version eq. (B10)
for z < π/8, F > 0.88, and
The 2-basis gate approximations perform surprisingly well, with the median fidelities $F^{(2)}_{\mathrm{avg}} = 0.99$ and $F^{(2m)}_{\mathrm{avg}} = 0.997$ comparing favorably to the typical 2-qubit gate fidelities for current quantum devices. The full distributions of fidelities for the zero-, one-, and two-gate approximations are plotted in Fig. 4, where the zero- and one-gate distributions are determined by random sampling.
By comparing $F^{(i)}_{\mathrm{avg}}$ for all $i$ we can choose the best approximation for any given $U_t$. Specifically, if the basis gate $U_b$ may be implemented with average gate fidelity $F_b$ we can estimate the overall fidelity by multiplying the fidelity due to approximation with the fidelity due to the number of applications of $U_b$, and choose the expansion with the highest overall fidelity

$$F_{\mathrm{best}} = \max_i\ F^{(i)}_{\mathrm{avg}}\, F_b^{\,i}.$$
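The selection rule can be sketched in Python as below. The sketch assumes the standard relation $F_{\mathrm{avg}} = (|\mathrm{Tr}(U^\dagger V)|^2/4 + 1)/5$ for two-qubit unitaries, a CNOT-equivalent basis with Weyl coordinates $(\pi/4, 0, 0)$, and the convention $U_d(\alpha,\beta,\gamma) = \exp(i(\alpha XX + \beta YY + \gamma ZZ))$. All function names are illustrative, and mirroring is not included.

```python
import math

def trace_to_fid(trace):
    """Average gate fidelity of two-qubit unitaries from Tr(U^dag V)."""
    return (4 + abs(trace) ** 2) / 20

def weyl_trace(da, db, dc):
    """Trace of U_d^dag U_d' for Weyl-coordinate differences (da, db, dc)."""
    return 4 * complex(math.cos(da) * math.cos(db) * math.cos(dc),
                       math.sin(da) * math.sin(db) * math.sin(dc))

def expansion_fidelities(target, basis=(math.pi / 4, 0.0, 0.0)):
    """Approximation fidelities for 0, 1, 2 and 3 basis-gate applications."""
    a, b, c = target
    ab, bb, cb = basis
    return [
        trace_to_fid(weyl_trace(a, b, c)),                 # 0 gates: local only
        trace_to_fid(weyl_trace(a - ab, b - bb, c - cb)),  # 1 gate: basis class
        trace_to_fid(4 * math.cos(c)),                     # 2 gates: gamma dropped
        1.0,                                               # 3 gates: exact
    ]

def best_expansion(target, basis_fidelity):
    """Number of basis gates maximizing F_avg^(i) * F_b**i, and that score."""
    fids = expansion_fidelities(target)
    scores = [f * basis_fidelity ** i for i, f in enumerate(fids)]
    i_best = max(range(len(scores)), key=lambda i: scores[i])
    return i_best, scores[i_best]

q = math.pi / 4
print(best_expansion((q, 0.0, 0.0), 0.97))   # CNOT-like target
print(best_expansion((q, q, q), 0.97))       # SWAP-like target
```

A target in the basis class is best served by a single gate, while a SWAP-like target sits at the worst case of the two-gate approximation (fidelity 3/5) and still needs the exact three-gate expansion at this basis fidelity.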
The statistics of the number of basis gate applications for a randomly-generated ensemble of target gates are shown in Fig. 5. With a fairly noisy basis gate $F_b = 0.97$ and no mirroring, the best expansion by this method has three applications of the basis for 22%, two applications for 76%, one application for 2%, and zero applications for < 0.1% of targets, thus a mean of 2.2 basis gate applications. With the freedom to mirror, the best expansion has three applications for 3%, two applications for 93%, one application for 4%, and zero applications for < 0.1% of targets, thus a mean of 2.0 basis gate applications. The resulting fidelity can be quoted as an 'effective fidelity' $F_e$ equal to the cube root of the mean of $F_{\mathrm{best}}$, which we can interpret as the equivalent basis gate fidelity if we were to use only exact 3-gate expansions of random targets. We show in Fig. 6 the ratio of the effective infidelity $1 - F_e$ to the basis gate infidelity $1 - F_b$, giving the factor by which the use of approximate expansions improves effective gate performance. For $F_b = 0.97$ we get $F_e = 0.976$, $F$ [...] mirroring, assuming fixed $1 - F_b$ of 1%, 3% or 5%. Using measured CNOT fidelities for each of the qubit pairs, implementing the mirror expansions, and combining the mirror choice with the swap-mapping pass should allow future compiler-driven improvements in quantum volume.
Appendix C: Confidence intervals for the heavy probability
To be confident with a finite number of trials that the heavy probability $h_d$ exceeds 2/3, we should set a stricter threshold $t > 2/3$ for the estimated probability, requiring $\hat{h}_d > t$ to claim success. Drawing $n_c$ random model circuits of given width and depth, and executing each circuit $n_s$ times gives a total of $n_c n_s$ experiment outcomes, each of which is to be checked against simulation of the corresponding circuit to determine a count $n_h$ of heavy outcomes. We estimate $h_d$ in the natural way by the heavy fraction over these outcomes

$$\hat{h}_d = \frac{n_h}{n_c n_s}. \qquad (C1)$$

For the purposes of making a conservative bound on the spread of $\hat{h}_d$ we analyze using the worst-case distribution where the heavy probability conditioned on each circuit is either zero or one. Thus, executing each circuit multiple times $n_s > 1$ (as is typically convenient to avoid reconfiguring experimental settings and allow recycling of simulation results) will generally narrow the observed fluctuations in $\hat{h}_d$ but, for fear of systematic errors, we do not allow this to alter the threshold $t$. Under this worst-case assumption, $n_h/n_s$ is binomial distributed with parameter $n_c$. While it would be straightforward to calculate confidence intervals numerically directly from the binomial distribution, the interesting range of $\hat{h}_d$ is close to 2/3, where a normal approximation is valid. We therefore require a minimum of $n_c = 100$ circuits, make a normal approximation to the binomial, and write the requirements for claiming success at a given width and depth

$$n_c \ge 100, \qquad (C2)$$
$$\frac{n_h - z\sqrt{n_h\,(n_s - n_h/n_c)}}{n_c n_s} > \frac{2}{3}, \qquad (C3)$$
where we set z = 2 for a 97.5% '2-sigma' one-sided confidence interval. For example, to claim success with n c = 5000 model circuits, the observed heavy fraction must exceed the threshold t = 0.68.
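The success criterion can be checked with a few lines of Python. The sketch below uses the normal approximation under the worst-case assumption, written as $\hat{h}_d - z\sqrt{\hat{h}_d(1-\hat{h}_d)/n_c} > 2/3$; the function names and the bisection search for the threshold are assumptions of this example rather than the procedure used for the reported data.

```python
import math

def passes_threshold(n_h, n_c, n_s, z=2.0):
    """True if the heavy fraction clears 2/3 by z worst-case sigmas."""
    if n_c < 100:
        return False                      # too few circuits for the test
    h_hat = n_h / (n_c * n_s)
    sigma = math.sqrt(h_hat * (1 - h_hat) / n_c)
    return h_hat - z * sigma > 2 / 3

def strict_threshold(n_c, z=2.0):
    """Smallest heavy fraction t that passes for n_c circuits (bisection)."""
    lo, hi = 2 / 3, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if mid - z * math.sqrt(mid * (1 - mid) / n_c) > 2 / 3:
            hi = mid
        else:
            lo = mid
    return hi

for n_c in (200, 5000):
    print(n_c, round(strict_threshold(n_c), 3))
```

The bisection relies on the left-hand side being increasing in the heavy fraction over the range above 2/3, so the threshold is unique. Increasing the number of circuits from 200 to 5000 tightens the threshold substantially, which is why the borderline points were repeated with more circuits.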
Appendix D: Device parameters
We measured the quantum volume of three IBM Q devices: 5-qubit Tenerife, 16-qubit Melbourne, and 20-qubit Tokyo. The device connectivities are shown in Fig. 7, with the four qubits from each device that were used for the experiments highlighted in grey boxes. Table V lists the average error rates for the set of qubits used in these experiments. These error rates were measured one day before the quantum volume experiments were performed. Fluctuations in these numbers can occur during the time scale of these experiments, but they are representative of the single-qubit, two-qubit, and measurement errors for each device. The data from Table V was also used in the noisy simulations of the quantum volume circuits in Table II.

TABLE V. [...] 1Q for single-qubit error rates, CX for two-qubit error rates, and M for measurement. The averages are taken over the set of qubits from each device that were used in the quantum volume experiments.
FIG. 7. Device diagrams used for the experimental data in Table I : (a) Tenerife, (b) Melbourne, and (c) Tokyo. The shaded boxes indicate the qubits selected for the experiments discussed here. CX gates are available between pairs of qubits connected by a line.
