Performance of a quantum annealer on range-limited constraint
  satisfaction problems by King, Andrew D. et al.
A. D. King,∗ T. Lanting, and R. Harris
D-Wave Systems Inc., 3033 Beta Avenue, Burnaby, British Columbia, Canada V5G 4M9
(Dated: September 7, 2015)
The performance of a D-Wave Vesuvius quantum annealing processor was recently compared to a
suite of classical algorithms on a class of constraint satisfaction instances based on frustrated loops.
However, the construction of these instances leads the maximum coupling strength to increase with
problem size. As a result, larger instances are subject to amplified analog control error, and are
effectively annealed at higher temperatures in both hardware and software. We generate similar
constraint satisfaction instances with limited range of coupling strength and perform a similar
comparison to classical algorithms. On these instances the D-Wave Vesuvius processor, run with
a fixed 20µs anneal time, shows a scaling advantage in the median case over the software solvers
for the hardest regime studied. This scaling advantage implies that quantum speedup is not ruled
out for these problems. Our results support the hypothesis that performance of D-Wave Vesuvius
processors is strongly influenced by analog control error, which can be reduced and mitigated as the
technology matures.
I. INTRODUCTION
Following the recent introduction of D-Wave quantum
annealing processors, a wealth of research has aimed to
characterize the performance of this new platform, in
particular pitting it against classical competition [1–11].
D-Wave processors take as input spin glass instances in
the Ising model, and it is straightforward to express a
variety of NP-hard problems in this format [12]. How-
ever, the energy landscape of some instances may be more
amenable to solution by thermal or combinatorial meth-
ods than quantum methods [13, 14], and input to current
D-Wave processors must be reasonably robust to analog
control error if we are to observe the mechanics of the un-
derlying quantum annealing platform rather than classi-
cal noise [3, 15]. The selection of appropriate testbeds to
use when probing for quantum speedup has recently been
the subject of much research. This research has identi-
fied several desirable properties of input sets, including
the existence of a nonzero-temperature spin glass phase
transition [13], foreknowledge of the ground state energy
and possibly ground states [1], tunable difficulty, and ro-
bustness to analog control error and thermal effects [15].
Randomly generated instances of constraint satisfac-
tion problems (CSPs) or satisfiability problems are an
attractive target: they are well-understood from a sta-
tistical physics perspective, and their difficulty can be
tuned by a single parameter: the constraint-to-variable
ratio α [16]. However, direct solution of these instances
requires, in general, the ability to couple arbitrary pairs
of qubits in the processor. While this can be done indi-
rectly in a D-Wave processor through creation of logical
qubits [3, 17–20], this may amplify control error and ob-
scure the underlying mechanics of the processor [3].
Hen [21] managed this issue by constructing constraint
satisfaction problems that can be directly embedded in
an arbitrary qubit connectivity graph; each constraint
is a frustrated loop, i.e. a cycle of couplers of which an
odd number are antiferromagnetic. These problems have
two desirable properties: a planted (foreknown) ground
state and difficulty that can be tuned with the parameter
α. Hen et al. [1] show that performance scaling for
the D-Wave processor is superior to the best of a suite
of classical solvers in one region of α, but is worse in the
region of α encompassing the hardest instances. However,
their instances are constructed in such a way that
for a fixed value of α, thermal effects and analog error are
increasingly amplified by normalization as the problems
increase in size.
∗ aking@dwavesys.com
Here we present a simple modification of the construc-
tion of these instances that curtails this effect, putting
the analog and digital solvers on more level ground and
reducing unwanted thermal behavior. On these range-
limited instances, we find that a D-Wave Vesuvius pro-
cessor shows better performance scaling than classical
competition for all values of α tested. This competi-
tion consists of the two best-performing classical soft-
ware solvers studied by Hen et al.: the zero-temperature
Hamze-de Freitas-Selby (HFS) algorithm [22–25], and a
solver version of simulated annealing (SAS) [26]. Hen
et al. also showed very strong correlation between suc-
cess probabilities in the D-Wave processor and a thermal
Gibbs state approximated using standard simulated an-
nealing (SAA) [1]. Moderating the coupling range of the
input instances, and therefore the temperature relative to
the final gap of the time-dependent Hamiltonian, reduces
correlation with the thermal model.
arXiv:1502.02098v2 [quant-ph] 3 Sep 2015
II. QUANTUM ANNEALING AND THE
D-WAVE PLATFORM
Quantum annealing in the Ising model aims to find
low-energy states in a system of n interacting spins via
evolution of the time-dependent Hamiltonian
H_S(t) = \frac{1}{2} \sum_i A(t) \sigma^x_i + B(t) H_I    (1)
where 0 ≤ t ≤ t_f, t_f is the run time of the QA algorithm,
A(0) ≫ B(0), A(t_f) ≪ B(t_f), and H_I is the
time-independent Ising problem Hamiltonian:
H_I = \sum_{i<j} J_{ij} \sigma^z_i \sigma^z_j + \sum_i h_i \sigma^z_i.    (2)
We refer to the Ising Hamiltonian as H_I = (h, J), where
the biases h and pairwise couplings J encode the optimization
problem (i.e. energy function) we wish to solve
(i.e. minimize).
In a D-Wave quantum annealing processor [27, 28], not
all pairs of qubits are coupled, and therefore the set of
nonzero entries of J must adhere to the physical con-
straints of the processor. One can view (h, J) as a set of
vertex and edge weights, respectively, of the qubit con-
nectivity graph, whose vertices correspond to qubits and
whose edges correspond to couplers.
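As an illustration of Eq. (2) restricted to classical spin values, the energy of a configuration under (h, J) can be evaluated as in the following minimal sketch. The function name `ising_energy` and the dictionary representation of h and J are assumptions made here for illustration; they are not part of any processor interface.

```python
def ising_energy(h, J, spins):
    """Classical energy of Eq. (2) for spins in {-1, +1}.

    h:     dict mapping qubit -> bias (vertex weight)
    J:     dict mapping (i, j) -> coupling (edge weight)
    spins: dict mapping qubit -> +1 or -1
    """
    field_term = sum(h_i * spins[i] for i, h_i in h.items())
    coupling_term = sum(J_ij * spins[i] * spins[j] for (i, j), J_ij in J.items())
    return coupling_term + field_term


# A single frustrated triangle: two ferromagnetic couplers, one antiferromagnetic.
h = {0: 0, 1: 0, 2: 0}
J = {(0, 1): -1, (1, 2): -1, (0, 2): +1}
all_up = {0: 1, 1: 1, 2: 1}
energy = ising_energy(h, J, all_up)  # (-1) + (-1) + (+1)
```

Only edges present in J contribute, which mirrors the restriction of nonzero couplings to the edges of the qubit connectivity graph.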
III. FRUSTRATED LOOP PROBLEMS AND
LIMITED COUPLING RANGE
For a particular hardware qubit connectivity graph G
with n vertices (qubits), and a particular constraint-to-
qubit ratio α, Hen et al. construct a frustrated loop in-
stance using k = roundoff(αn) loops like so:
1. For i = 1, . . . , k, loop ℓ_i is a cycle in G chosen as the
first cycle generated by the edges of a random walk
in G starting at a random vertex. If ℓ_i contains
fewer than 8 vertices, it is discarded and generated
anew.
2. The constraint Ising Hamiltonian J_i corresponding
to ℓ_i has value −1 on every edge of ℓ_i except for a
randomly selected edge e_i of ℓ_i, where J_i has value
+1. J_i is zero elsewhere.
3. The final Ising Hamiltonian is (h, J), where h is the
zero vector and J = \sum_{i=1}^{k} J_i.
Any instance constructed with this method has integer-valued
h and J, but the coupling range R = max_{i,j} |J_{ij}|,
i.e. the maximum magnitude of any entry in J, is not
necessarily bounded. Moreover, typical instances con-
structed at a fixed ratio α on increasingly large subgrids
of the D-Wave processor have increasing range limits R
[1, 29]. Since input (h, J) to the D-Wave processor must
be normalized to within the range [−1, 1], coupling range
R necessitates scaling by a factor of 1/R. This scale fac-
tor creates two complications when studying the efficacy
of the quantum annealing algorithm on practical hard-
ware. First, the operating temperature of the processor
relative to the magnitude of the input increases with R,
thus increasing undesirable thermal effects. Second, each
coupler and local field is subject to analog control error
on the order of δJ ∼ 0.035 and δh ∼ 0.05 respectively.
The magnitudes of the errors δh and δJ are relative to the
normalized full energy scale J = 1. Note that errors
are present even for h = 0 or J = 0. For scale factors
of R, analog control error relative to the magnitude of
the desired input is increased by a factor of R. Thus,
the deviation of the actual input from the desired input
increases with increasing R. In the range-unlimited in-
stances studied by Hen et al. this amplification factor is
as high as 17, and can be dictated by a single coupler that
happens to be in disproportionately many loops. Since
R grows with instance size n, the D-Wave processor is
penalized on larger instances.
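The arithmetic behind this amplification can be made explicit with a short sketch. It assumes only the error magnitudes quoted above (δJ ∼ 0.035 relative to full scale); the helper names are hypothetical.

```python
DELTA_J = 0.035  # coupler error quoted in the text, relative to full scale J = 1


def normalize(J):
    """Scale couplings into [-1, 1]; returns (scaled couplings, coupling range R)."""
    R = max(abs(value) for value in J.values())
    return {edge: value / R for edge, value in J.items()}, R


def relative_coupler_error(R, delta_J=DELTA_J):
    """Control error measured against a unit-magnitude coupler of the unscaled problem."""
    return delta_J * R


# After scaling by 1/R, a coupler of magnitude 1 is programmed as 1/R, so a
# fixed absolute error delta_J is R times larger relative to that coupler.
errors = {R: relative_coupler_error(R) for R in (2, 3, 17)}
```

Under these assumptions the relative coupler error is about 0.07 for R = 2 and about 0.105 for R = 3, compared with roughly 0.6 at the amplification factor of 17 mentioned above.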
In order to address this issue, we construct each in-
stance with respect to an integer coupling range R ≥ 2,
so that in our instances each entry of h and J is an in-
teger between −R and R. To do this, when selecting
a candidate for ℓ_i via random walk we ignore edges of
G on which |\sum_{j=1}^{i-1} J_j| already equals R. This ensures that
the final Hamiltonian (h, J) has all entries in the range
[−R,R], so when the instance is necessarily normalized
to the range [−1, 1] as input to the D-Wave processor, it
is scaled down by no more than a factor of 1/R. Con-
sequently, analog control errors and thermal effects in
our instances are relatively amplified by no more than a
factor of R where R is independent of instance size n.
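The range-limited construction can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: it assumes an ideal Chimera graph with no missing qubits, a non-backtracking random walk, resampling whenever the walk reaches a dead end, and the single-unit-cell rejection rule described in the next paragraph. All function names are hypothetical.

```python
import random


def chimera_graph(L):
    """Adjacency sets of an ideal Chimera graph C_L; qubits are (row, col, side, k)."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for r in range(L):
        for c in range(L):
            for k in range(4):
                for m in range(4):      # complete bipartite K_{4,4} in each unit cell
                    link((r, c, 0, k), (r, c, 1, m))
                if r + 1 < L:           # side-0 qubits couple to the cell below
                    link((r, c, 0, k), (r + 1, c, 0, k))
                if c + 1 < L:           # side-1 qubits couple to the cell to the right
                    link((r, c, 1, k), (r, c + 1, 1, k))
    return adj


def edge(u, v):
    return (u, v) if u <= v else (v, u)


def random_loop(adj, J, R, rng):
    """First cycle closed by a non-backtracking walk that ignores saturated edges."""
    start = rng.choice(sorted(adj))
    path, position = [start], {start: 0}
    while True:
        u = path[-1]
        prev = path[-2] if len(path) > 1 else None
        options = [v for v in sorted(adj[u])
                   if v != prev and abs(J.get(edge(u, v), 0)) < R]
        if not options:
            return None                 # dead end: the caller resamples
        v = rng.choice(options)
        if v in position:               # the walk has closed a cycle
            return path[position[v]:]
        position[v] = len(path)
        path.append(v)


def in_single_cell(loop):
    return len({(q[0], q[1]) for q in loop}) == 1


def frustrated_loop_instance(L, alpha, R, seed=0, max_attempts=100_000):
    rng = random.Random(seed)
    adj = chimera_graph(L)
    J, loops = {}, []
    k = round(alpha * len(adj))
    attempts = 0
    while len(loops) < k:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("could not place all loops; graph may be saturated")
        loop = random_loop(adj, J, R, rng)
        if loop is None or in_single_cell(loop):
            continue
        cycle_edges = [edge(loop[i], loop[(i + 1) % len(loop)])
                       for i in range(len(loop))]
        frustrated = rng.choice(cycle_edges)
        for e in cycle_edges:           # -1 on every edge except the frustrated one
            J[e] = J.get(e, 0) + (1 if e == frustrated else -1)
        loops.append(loop)
    return J, loops
```

Because every loop adds ±1 only to edges whose accumulated magnitude is below R, no entry of J can leave [−R, R]. Passing `R=float("inf")` disables the range limit in this sketch.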
There is another, less crucial modification of the con-
struction: While Hen et al. reject and resample a choice
of ℓ_i if it is too short, we reject the choice if it is contained
in a single eight-qubit unit cell [28]. Thus we sometimes
allow loops of length 6, and sometimes forbid loops of
length 8. This modification should in principle allow for
greater frustration and less domain clustering within unit
cells.
In any meaningful study of analog quantum annealing
processors it is desirable to limit relative amplification
of analog control error and unwanted thermal effects if it
can be done without otherwise materially detracting from
the experiment. In this work we consider R ∈ {2, 3,∞}
and α ∈ [0.1, 0.5]; these values of α include the hardest
regime. Implications of our choice of coupling range and
loop selection criteria are considered in greater detail in
the Supplemental Material [29].
All of these frustrated loop instances will, by construc-
tion, have ↑↑ · · · ↑ and ↓↓ · · · ↓ as planted ground states.
Hen et al. describe these instances as being constructed
with respect to an arbitrary antipodal pair of planted so-
lutions, but our construction is equivalent under change
of variables (Ising spin reversal) both in theory and, due
to the application of random spin reversals in hardware,
in practice.
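The equivalence under Ising spin reversal can be checked numerically. The sketch below is illustrative only: it assumes the standard gauge transformation h_i → g_i h_i, J_ij → g_i g_j J_ij for a vector g of ±1 values, and the function names are hypothetical.

```python
import random


def gauge_transform(h, J, g):
    """Apply a spin reversal g (dict qubit -> +1 or -1) to the problem (h, J)."""
    h_t = {i: g[i] * h_i for i, h_i in h.items()}
    J_t = {(i, j): g[i] * g[j] * J_ij for (i, j), J_ij in J.items()}
    return h_t, J_t


def energy(h, J, s):
    return (sum(h[i] * s[i] for i in h)
            + sum(J_ij * s[i] * s[j] for (i, j), J_ij in J.items()))


# One frustrated loop on four spins with planted ground state "all up".
h = {i: 0 for i in range(4)}
J = {(0, 1): -1, (1, 2): -1, (2, 3): -1, (0, 3): +1}
rng = random.Random(0)
g = {i: rng.choice((-1, 1)) for i in h}
h_t, J_t = gauge_transform(h, J, g)

all_up = {i: 1 for i in h}
planted = {i: g[i] * all_up[i] for i in h}   # image of the planted state
same_energy = energy(h, J, all_up) == energy(h_t, J_t, planted)
```

The transformed instance has the same spectrum, with the planted pair ↑↑ · · · ↑ and ↓↓ · · · ↓ mapped to g and −g.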
IV. EXPERIMENTAL RESULTS
We compare performance of a D-Wave quantum an-
nealing processor to HFS and SAS, both described by
Hen et al. [1]. Our instances are constructed on subgraphs
of a Chimera graph C_L [28], in which n of N = 8L^2
qubits are functional, for 4 ≤ L ≤ 8 (see Supplemental
Material [29]). Following their methodology, we run SAS
on a linear schedule of optimal length in inverse temper-
ature spanning the range β ∈ [0.01, 5] after scaling the
input to the range [−1, 1]. Scaling of the Hamiltonian has
no bearing on HFS, which is a large-neighborhood zero-
temperature search that exploits low-treewidth induced
subgraphs in the Chimera architecture.
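For orientation, a simulated annealing solver on a linear schedule in inverse temperature can be sketched as below. This is a generic single-spin-flip Metropolis implementation that records the best state seen, written under the assumption that the input has already been scaled to [−1, 1]; it is not the SAS code used in the experiments, and the function name is hypothetical.

```python
import math
import random


def simulated_annealing(h, J, sweeps, beta0=0.01, beta_f=5.0, seed=None):
    """Metropolis annealing with beta increasing linearly from beta0 to beta_f."""
    rng = random.Random(seed)
    nodes = sorted(h)
    nbrs = {i: [] for i in nodes}
    for (i, j), J_ij in J.items():
        nbrs[i].append((j, J_ij))
        nbrs[j].append((i, J_ij))
    s = {i: rng.choice((-1, 1)) for i in nodes}
    energy = (sum(h[i] * s[i] for i in nodes)
              + sum(J_ij * s[i] * s[j] for (i, j), J_ij in J.items()))
    best_s, best_energy = dict(s), energy
    for step in range(sweeps):
        frac = step / (sweeps - 1) if sweeps > 1 else 1.0
        beta = beta0 + (beta_f - beta0) * frac
        for i in nodes:                      # one Monte Carlo sweep
            local = h[i] + sum(J_ij * s[j] for j, J_ij in nbrs[i])
            dE = -2 * s[i] * local           # energy change if spin i is flipped
            if dE <= 0 or rng.random() < math.exp(-beta * dE):
                s[i] = -s[i]
                energy += dE
                if energy < best_energy:     # keep the best intermediate state
                    best_s, best_energy = dict(s), energy
    return best_s, best_energy
```

Returning the best intermediate state rather than the final state is what distinguishes a solver-style annealer from one used as a thermal sampler.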
In order to remain consistent with previous probes for
quantum speedup [1, 4], we assume that classical algo-
rithms are run by a perfectly parallel oracle that allows
all n sites to be updated simultaneously in simulated an-
nealing, and allows all possible cell updates in HFS to
be performed in parallel. Going even further, we simply
divide SAS running time by n and divide HFS running
time by L = \sqrt{N/8}, the maximum possible number of
parallel cell updates at any point in the algorithm. Fur-
ther detail on experimental methods, benchmarking and
data analysis is given in the Supplemental Material [29].
To account for differences between implementations and
hardware, we use the assumption of Hen et al. that each
Monte Carlo sweep takes time τSA = 3.54µs. We assume
that each HFS unit cell update takes 1µs.
The D-Wave processor used was a D-Wave Two V6
processor of the same architecture and fabrication lot as
the processor used by Hen et al. [1].
A. Performance scaling results
In Fig. 1 we show the scaling of the median time to
solution for the three solvers as the problem size L in-
creases. As in previous work [1, 4], we are particularly in-
terested in how the ratio between two solvers’ time to so-
lution scales with respect to problem size. This is shown
in Fig. 2. A positive slope in Fig. 2 indicates a perfor-
mance scaling advantage for the D-Wave processor, and
the possibility of limited quantum speedup as defined by
Rønnow et al. [4] in the case of SAS, and the possibility
of potential quantum speedup in the case of HFS (see footnote 1). In
the Supplemental Material [29] we arrange the data for
range-2, range-3, and range-unlimited instances by α.
D-Wave Vesuvius processors allow a minimum anneal
time of 20µs; previous work has shown that C8-scale
problems with optimal anneal time greater than this are
elusive [1, 3, 4]. Proving limited quantum speedup in this
framework would require data from the D-Wave processor
using shorter anneals to certify that we are not artificially
slowing the processor on easier instances. In
particular for the smaller and easier problems, the minimum
anneal time may mask the true performance scaling
of the quantum annealing platform [1, 30]. This may explain,
to some extent, the outstanding performance of
the D-Wave processor on high-α instances.
1 The distinction arises because HFS is a combinatorial algorithm
rather than one based on a physical model [1].
[Figure 1: six panels (D-Wave, HFS, and SAS; range 2 and range 3) plotting time to solution (µs, logarithmic axis from 10^1 to 10^5) against problem size C4 to C8, with one curve per α ∈ {0.10, 0.15, . . . , 0.50}.]
FIG. 1. Median time to solution per size. Shown is the
median time to solution for the D-Wave processor (left), HFS
(middle), and SAS (right). The top and bottom rows show
data for range-2 and range-3 instances respectively. Following
Hen et al. [1], we divide HFS times by L = \sqrt{N/8} to simulate
hypothetical parallelization. SAS data incorporates full n-core
hypothetical parallelization. Error bars represent one
standard deviation from bootstrap samples; most are smaller
than the data markers.
In the Supplemental Material [29] we analyze the effect
that range limitation has on the difficulty of the prob-
lems. Here we simply note that for the hardest range of
α, limiting the range of instances to 3 does not seem to
make the problems significantly easier. This can be seen
where range-2, range-3, and range-unlimited instances
are compared for the available solvers.
[Figure 2: four panels (D-Wave vs. HFS and D-Wave vs. SAS; range 2 and range 3) plotting the ratio of times to solution, with vertical axes labeled "D-Wave time / HFS time" and "D-Wave time / SAS time" (logarithmic), against problem size C4 to C8, with one curve per α ∈ {0.10, 0.15, . . . , 0.50}.]
FIG. 2. Median ratio of running time by size. Shown is
the median ratio of time to solution for the D-Wave processor
compared with each classical solver. Positive slope represents
an increasing advantage for the D-Wave processor as problems
get larger. The hardest overall regime roughly corresponds to
α ≈ 0.25. Error bars represent one standard deviation from
bootstrap samples.
B. Comparison with a nearly equilibrated thermal
annealer
Hen et al. found strong correlation between the success
probabilities of a D-Wave processor and a nearly equili-
brated thermal annealer with a final inverse temperature
of βf = 5. Our results (see Fig. 3 and Supplemental Ma-
terial [29]) show poorer correlation and an inability to fit
D-Wave scaling data to SAA at a single inverse temper-
ature. Unlike the range-unlimited instances studied in
[1], the hardness peak for SAA does not remain constant
with varying βf . In the Supplemental Material we show
instancewise scatter plots of D-Wave and SAA success
probabilities. The predictions of the thermal model do
not correlate well with the hardware on the range 2 and
range 3 instances.
V. DISCUSSION
Far from being artificial, fixed coupling range in the
large system limit appears at the phase transition in the
Ising formulation of Not-All-Equal 3-SAT [31] (see footnote 2), which
is NP-hard and can be formulated as a frustrated loop
problem on the complete graph. Furthermore, limiting
coupling range in hard frustrated loop instances affects
a vanishingly small portion of each instance (see Supple-
mental Material [29]).
The study of instances with fixed coupling range allows
for the control of two factors: amplification of analog
control error (for D-Wave) and effective operating tem-
perature (for D-Wave, SAS, SAA, and any other simu-
lated physical model with a thermal component [1]). Our
results, taken in conjunction with those from Ref. [1], in-
dicate a decreasing advantage for the D-Wave hardware
relative to SAS with increasing range. This observation
is consistent with the hypothesis that increasing range
penalizes the hardware by augmenting both the relative
magnitude of control errors and the importance of ther-
malization.
It is straightforward to construct input classes for
which analog control error will dominate the performance
scaling of an analog processor. When probing the poten-
tial for quantum speedup in a quantum annealing plat-
form, it is important to do the opposite: construct an
input class for which the impact of analog control error
is minimal. In doing so we might better observe proper-
ties of the annealer’s mechanics rather than observing the
effect of precision limitations, which by now are reason-
ably well understood [1, 3, 32] and expected to improve
with the maturation of the technology and the possible
implementation of error correction strategies [20, 33–35].
ACKNOWLEDGMENTS
The authors thank Tameem Albash, Itay Hen, Joshua
Job, and Daniel Lidar for a detailed and informative
exchange on this work and theirs, and for generous
provision of data. They thank Evgeny Andriyash and
Jack Raymond for fruitful discussions about frustrated
loops, and Emile Hoskinson for providing specifications
of the D-Wave processor used. They thank Mohammad
Amin and Miles Steininger for valuable comments on the
manuscript.
2 The expected number of constraints containing a given pair of
variables is approximately 12.6/n at the phase transition for
NAE3SAT instances on n variables, whereas in a frustrated loop
instance with α = 0.25 on a Chimera graph the expected number
of constraints containing a given coupler is close to 1.
[Figure 3: two panels (range-2 and range-3 instances) plotting scaling coefficient b against α from 0.1 to 0.5, with curves for D-Wave and for SAA at βf = 3, βf = 4, and βf = 5.]
FIG. 3. Scaling coefficients for the D-Wave processor and SAA. Shown are scaling coefficients (exponential slope fits,
i.e. time to solution ∝ exp(b(α)L)) for performance of the D-Wave processor and SAA on range-2 and range-3 instances. Hen
et al. find agreement between the D-Wave processor and SAA at βf = 5. Here each final inverse temperature βf ∈ {3, 4, 5}
appears deficient in some regime as a model for performance of the D-Wave processor. Error bars represent two standard
deviations from the bootstrap set.
[1] I. Hen, T. Albash, J. Job, T. F. Rønnow, M. Troyer, and
D. Lidar, “Probing for quantum speedup in spin glass
problems with planted solutions,” (2015), arXiv preprint
arXiv:1502.01663v2.
[2] J. King, S. Yarkoni, M. M. Nevisi, J. P. Hilton, and
C. C. McGeoch, “Benchmarking a quantum annealing
processor with the time-to-target metric,” (2015), arXiv
preprint arXiv:1508.05087v1.
[3] D. Venturelli, S. Mandrà, S. Knysh, B. O’Gorman,
R. Biswas, and V. Smelyanskiy, arXiv preprint
arXiv:1406.7553 (2014).
[4] T. Rønnow, Z. Wang, J. Job, S. Boixo,
S. Isakov, D. Wecker, J. Martinis, D. Li-
dar, and M. Troyer, Science 345, 420 (2014),
http://www.sciencemag.org/content/345/6195/420.full.pdf.
[5] V. Martin-Mayor and I. Hen, arXiv preprint
arXiv:1502.02494 (2015).
[6] S. Boixo, T. Albash, F. M. Spedalieri, N. Chancellor,
and D. A. Lidar, Nature Communications 4 (2013).
[7] S. Boixo, V. N. Smelyanskiy, A. Shabani, S. V.
Isakov, M. Dykman, V. S. Denchev, M. Amin,
A. Smirnov, M. Mohseni, and H. Neven, arXiv preprint
arXiv:1411.4036 (2014).
[8] H. G. Katzgraber, F. Hamze, Z. Zhu, A. J. Ochoa, and
H. Munoz-Bauza, Phys. Rev. X 5, 031026 (2015).
[9] S. Boixo, T. Rønnow, S. Isakov, Z. Wang, D. Wecker,
D. Lidar, J. Martinis, and M. Troyer, Nature Physics
10, 218 (2014).
[10] C. McGeoch and C. Wang, in Proceedings of the ACM In-
ternational Conference on Computing Frontiers (ACM,
2013) p. 23.
[11] T. Albash, W. Vinci, A. Mishra, P. A. Warburton, and
D. A. Lidar, Physical Review A 91, 042314 (2015).
[12] A. Lucas, Frontiers in Physics 2 (2014),
10.3389/fphy.2014.00005.
[13] H. Katzgraber, F. Hamze, and R. Andrist, Physical Re-
view X 4, 021008 (2014).
[14] D. S. Steiger, T. F. Rønnow, and M. Troyer, (2015),
arXiv:1504.07991.
[15] Z. Zhu, A. J. Ochoa, S. Schnabel, F. Hamze, and H. G.
Katzgraber, (2015).
[16] M. Mézard and A. Montanari, Information, Physics, and
Computation (Oxford University Press, 2009).
[17] C. Klymko, B. D. Sullivan, and T. S. Humble, Quantum
Information Processing 13, 709 (2014).
[18] J. Cai, W. Macready, and A. Roy, arXiv preprint
arXiv:1406.2741 (2014).
[19] T. Boothby, A. D. King, and A. Roy, arXiv preprint
arXiv:1507.04774 (2015).
[20] W. Vinci, T. Albash, G. Paz-Silva, I. Hen, and D. A.
Lidar, arXiv preprint arXiv:1507.02658 (2015).
[21] I. Hen, “Performance of D-Wave Two on problems with
planted solutions,” (2014), presented at AQC 2014, www.
isi.edu/events/aqc2014/.
[22] F. Hamze and N. de Freitas, in Proceedings of the
20th conference on Uncertainty in artificial intelligence
(AUAI Press, 2004) pp. 243–250.
[23] A. Selby, arXiv preprint arXiv:1409.3934v1 (2014).
[24] A. Selby, alex1770/QUBO-Chimera · GitHub,
http://github.com/alex1770/QUBO-Chimera (2014).
[25] A. Selby, D-Wave: comment on comparison with
classical computers, http://www.archduke.org/stuff/d-
wave-comment-on-comparison-with-classical-computers/
(2014).
[26] S. Kirkpatrick, C. Gelatt Jr, and M. Vecchi, Science 220,
671 (1983).
[27] M. Johnson, M. Amin, S. Gildert, T. Lanting, F. Hamze,
N. Dickson, R. Harris, A. Berkley, J. Johansson,
P. Bunyk, et al., Nature 473, 194 (2011).
[28] P. Bunyk, E. Hoskinson, M. Johnson, E. Tolkacheva,
F. Altomare, A. Berkley, R. Harris, J. Hilton, T. Lant-
ing, A. Przybysz, et al., IEEE Transactions on Applied
Superconductivity (2014).
[29] See Supplemental Material at [URL to be inserted] for
additional data and analysis.
[30] M. Amin, arXiv preprint arXiv:1503.04216 (2015).
[31] D. Achlioptas, A. Chtcherba, G. Istrate, and C. Moore,
in Proceedings of the Twelfth Annual ACM-SIAM Sympo-
sium on Discrete Algorithms (Society for Industrial and
Applied Mathematics, 2001) pp. 721–722.
[32] A. D. King and C. C. McGeoch, arXiv preprint
arXiv:1410.2628 (2014).
[33] K. Pudenz, T. Albash, and D. Lidar, Nature Communi-
cations 5 (2014).
[34] K. Pudenz, T. Albash, and D. Lidar, arXiv preprint
arXiv:1408.4382 (2014).
[35] K. Young, R. Blume-Kohout, and D. Lidar, Physical
Review A 88, 062314 (2013).
Supplemental material for “Performance of a quantum annealer on range-limited
constraint satisfaction problems”
A. D. King,∗ T. Lanting, and R. Harris
D-Wave Systems Inc., 3033 Beta Avenue, Burnaby, British Columbia, Canada V5G 4M9
(Dated: September 3, 2015)
I. METHODS
A. Quantum annealing platform
In this work we used a Vesuvius quantum annealing
processor manufactured by D-Wave Systems Inc. The
processor is of identical design to the D-Wave Two V6
processor installed at ISI [1], and is from the same fab-
rication lot. We call these two processors “SR10-V6”
and “ISI-V6” respectively. The problems we consider are
generated on subgraphs of the processor’s Chimera qubit
connectivity graph [28], using up to 467 qubits (see Fig.
5). The data were gathered in June, 2014. SR10-V6 had
an operating temperature of approximately 15 mK, and
used maximum J inductance (coupling strength) of 1.25
pH, compared with temperature and inductance of 17 mK
and 1.33 pH for ISI-V6 [1]. All experiments were run using
an anneal length of t_a = 20µs, the same length used
for most of the key analysis in the work of Hen et al. [1].
Although the processors have the same architecture and
similar ratios of temperature to inductance, they used
different annealing schedules (see Fig. 6 and Ref. [1]).
FIG. 5. The largest subprocessor used, a partial C8 graph
(an 8×8 grid of unit cells) with 467 qubits. Smaller instances
use the square subgrid containing the top-left corner.
[Figure 6: curves A(t), B(t), and k_B T plotted in GHz (vertical axis from 0 to 8) against t/t_f from 0 to 1.]
FIG. 6. Annealing schedule of the D-Wave processor.
A(t) and B(t) represent the transverse field and longitudinal
couplings, respectively. Operating temperature of 15mK is
shown. Scale is normalized to h = 1.
B. Experimental details
The primary testbed of problems studied consists of
200 instances of each size L ∈ {4, 5, 6, 7, 8}, for each value
of α ∈ {0.10, 0.15, . . . , 0.50}, for each range limit R ∈
{2, 3,∞}. Data were not collected using the hardware for
R =∞, as the processor was taken offline in 2014 before
the need for such results was apparent. In Section II B we
provide evidence that such data would likely be similar to
the results in Ref. [1], where quantum annealing success
probabilities are highly correlated with themselves when
run on a different annealing schedule.
1. D-Wave processor
Each instance was annealed 10240 times by the D-
Wave processor with an anneal length of 20µs, the min-
imum allowed by the system. These experiments were
performed in batches of 1024 anneals, each batch with a
random Ising spin reversal (gauge transformation) applied,
as in previous work [4, 32].
2. HFS
Selby’s implementation of the HFS algorithm [1, 24, 25]
is a heuristic approach whose main effort consists of tree
updates, and which typically restarts to a random state
when it performs two tree updates without improvement
in energy. In order to treat HFS more like an annealer,
we modified the code so that a solution is recorded before
each reset – as in SAS, the original algorithm is modified
to return possibly optimal intermediate states. It would
be algorithmically equivalent to force the HFS algorithm
to terminate instead of resetting.
In our experiments, each instance was solved 10240
times by the HFS algorithm. The runs were performed
single-threaded in parallel using an HPC cluster of 8-core
Intel Xeon E5-2670 processors.
3. SAS
SAS was run on each instance 1024 times at anneal
length round(2^{a/2}) for each integer a ∈ {2, . . . , 22}.
Thus the longest anneal for each instance is 2048 Monte
Carlo sweeps, enough so that the optimal anneal length
for median time to solution is exceeded for all problem
sets studied (see Fig. 7).
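Reading the anneal length as 2^{a/2} rounded to the nearest integer, the grid of sweep counts can be listed with the short sketch below (the variable name is illustrative).

```python
# Anneal lengths in Monte Carlo sweeps for a = 2, ..., 22, spaced by a factor sqrt(2).
anneal_lengths = [round(2 ** (a / 2)) for a in range(2, 23)]
shortest, longest = anneal_lengths[0], anneal_lengths[-1]
```

This gives 21 lengths from 2 sweeps up to 2^{11} = 2048 sweeps, consistent with the longest anneal quoted above.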
4. SAA
SAA was run on each instance 1024 times for 20,000
Monte Carlo sweeps. For similar instances, Hen et al.
support the claim that 20,000 sweeps gets us reasonably close
to equilibration (of performance as a constant-time op-
timizer) for frustrated loop problems. We did not inves-
tigate longer SAA anneals due to limited availability of
computing resources. Like SAS, SAA was run on a lin-
ear schedule in inverse temperature from β0 = 0.01 to
βf ∈ {3, 4, 5}.
C. Benchmarking methodology
Following previous work [1, 4, 32], we treat each solver
(now including HFS) as a stochastic sampler, and mea-
sure the time to reach 99% confidence of having found
the ground state energy of an instance. We call this time
to solution (TTS). Given a solver achieving success prob-
ability p over a set of trials (i.e. anneals) and taking mean
time τ to complete a single trial, we compute the number
of samples required for this solver and instance
as
r(p) = \frac{\log(0.01)}{\log(1 - p)},    (1)
and compute TTS for this solver and instance as τ r(p). For
SAS we assume τ to be 3.54µs multiplied by the number
of update sweeps in the anneal in order to remain consis-
tent with Hen et al. [1]. We determine the optimal sweep
length for each set of 200 problems as the length giving
the minimum bootstrapped median time to solution. We
disregard the issue of conditioning our SAS results on
minimizing TTS; due to the smoothness of the curves in
Fig. 7 near the optimal anneal lengths, we do not expect
this to have a significant effect on the results.
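The time-to-solution computation above can be written out as a small sketch. The function names are hypothetical, and the handling of the boundary cases p = 0 and p = 1 is an assumption of this sketch rather than something specified in the text.

```python
import math

SWEEP_TIME_US = 3.54  # assumed time per Monte Carlo sweep, following Hen et al. [1]


def samples_needed(p, failure=0.01):
    """r(p): trials needed to observe the ground state with 99% confidence."""
    if p <= 0.0:
        return math.inf      # never succeeds (assumed convention)
    if p >= 1.0:
        return 0.0           # limit of the formula as p -> 1 (assumed convention)
    return math.log(failure) / math.log(1.0 - p)


def time_to_solution(p, tau):
    """TTS = tau * r(p), with tau the mean time of a single trial."""
    return tau * samples_needed(p)


def sas_time_to_solution(p, sweeps):
    """TTS for SAS, taking tau as 3.54 microseconds per sweep."""
    return time_to_solution(p, SWEEP_TIME_US * sweeps)


# Example: success probability 0.5 with a 20 microsecond anneal.
r_half = samples_needed(0.5)             # log(0.01) / log(0.5) = log2(100)
tts_half = time_to_solution(0.5, 20.0)
```

For p = 0.5 this gives r ≈ 6.64 samples, so a 20µs anneal corresponds to a TTS of roughly 133µs.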
For HFS our methodology differs from that of Hen et
al.: We use an enumerative effort computation, as with
the D-Wave processor and SAS, rather than timing the
process. This allows a bare look at the dominant op-
erations. Following Hen et al., we assume hypothetical
parallelization of L cores on a CL instance, and accord-
ingly assume that a tree update on a CL instance takes
O(L) parallel steps (actually L steps) of “leaf updates”.
Using these assumptions we compute the total number
of leaf update steps required for a given “anneal” (i.e.
sample draw) and for convenience we assume a constant
of L · 1µs per tree update, noting that this gives reason-
ably comparable performance at C4 scale to the results
of Hen et al. [1].
1. Statistical methods
To generate the data points and error bars in the per-
formance and speedup figures, we used the same Bayesian
bootstrapping approach used by Hen et al. [1]. First we
describe the approach for performance data, which differs
slightly for HFS.
For a given solver, SR10-V6 or SAS or SAA, and each
set S of 200 instances at a given value of (L,α), we have,
for each instance si, an empirical success probability pi
representing xi successes out of y trials. We consider
the probability distribution of success probability to be
β_i = β(x_i + 1/2, y − x_i + 1/2). We then construct 1000
bootstrap sets Sj of size 100 by drawing 200 members
from S with repetition, resulting in multisets Sj = {si,j |
1 ≤ i ≤ 100}j . Now for each set we sample a probability
pi,j from distribution βi,j .
At this point we can apply the desired function fj to
the set of 100 probabilities, which is typically the median
of {r(pi,j)} for a fixed j. We then take the data point to
be the mean of fj , and we take the error in the statistic
to be the standard deviation of fj .
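A minimal sketch of this bootstrap, using only the Python standard library, is given below. It is an illustration under stated assumptions, not the analysis code used for the figures: the resample size is exposed as a parameter that defaults to the number of instances, the statistic is fixed to the median of r(p), and the function names are hypothetical.

```python
import math
import random
import statistics


def samples_needed(p):
    """r(p) for 99% confidence, with assumed conventions at p = 0 and p = 1."""
    if p <= 0.0:
        return math.inf
    if p >= 1.0:
        return 0.0
    return math.log(0.01) / math.log(1.0 - p)


def bootstrap_median_r(successes, trials, n_sets=1000, set_size=None, seed=0):
    """Mean and standard deviation of the bootstrapped median of r(p).

    successes: list of success counts x_i, one per instance
    trials:    number of trials y per instance
    """
    rng = random.Random(seed)
    if set_size is None:
        set_size = len(successes)
    stats = []
    for _ in range(n_sets):
        # Resample instances with repetition.
        resampled = [rng.choice(successes) for _ in range(set_size)]
        # Draw each success probability from Beta(x + 1/2, y - x + 1/2).
        probs = [rng.betavariate(x + 0.5, trials - x + 0.5) for x in resampled]
        stats.append(statistics.median([samples_needed(p) for p in probs]))
    return statistics.mean(stats), statistics.stdev(stats)
```

The returned mean plays the role of the data point and the standard deviation plays the role of the error bar.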
For HFS we use a similar approach suggested by
Joshua Job (personal communication, January 2015).
After we sample 1000 values indicating the number r_i of
samples needed for 99% assuredness of success, we take
⌈r_i⌉ random samples (with repetition) from our set of
10240 samples, and take the sum of the numbers of tree
updates in those ⌈r_i⌉ samples to give a sample of time to
solution.
Speedup. To compute data on speedup between two
solvers (or equally, the same solver on two problem sets),
we assume that time to solution is normally distributed
with mean and standard deviation as estimated above.
[Figure 7: three panels (SAS, range 2; SAS, range 3; SAS, range ∞) plotting time to solution (µs, logarithmic axis from 10^2 to 10^6) against number of Monte Carlo sweeps (logarithmic axis from 10^1 to 10^3), with one curve per α ∈ {0.10, 0.15, . . . , 0.50}.]
FIG. 7. SAS time to solution vs. anneal length for C8 instances. Shown here are SAS results for R ∈ {2, 3,∞} on
C8 problems. Note that all panels are qualitatively similar. Behavior at the left of each panel is a statistical artifact arising
from our Bayesian model and limited number of experiments, which effectively places a lower bound on the success probability
(cf. the data of Hen et al. on suboptimal annealing time [1]). The slight increase in difficulty for the hardest problems as R
increases can be explained by higher temperature relative to the gap of the final Hamiltonian. The increase in difficulty for
long anneals in high-α problems as R increases can be explained by the suppression of randomness (and accelerated evolution
of ferromagnetic structure) for low-R, high-α combinations (see Appendix II B).
We then sample 1000 points from each normal distribu-
tion and compute 1000 speedup ratios. We use the mean
and standard deviation of this set of ratios for our data
point and error bar.
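The speedup computation can be sketched as follows. This is an illustration only: the function name is hypothetical, and the orientation of the ratio (time to solution of the first solver divided by that of the second, so that values above 1 favor the second) is an assumption.

```python
import numpy as np

def speedup_ratio(mean_a, std_a, mean_b, std_b, n=1000, rng=None):
    """Mean and standard deviation of n sampled ratios TTS_a / TTS_b."""
    rng = np.random.default_rng(rng)
    tts_a = rng.normal(mean_a, std_a, size=n)   # TTS samples, first solver or problem set
    tts_b = rng.normal(mean_b, std_b, size=n)   # TTS samples, second solver or problem set
    ratios = tts_a / tts_b
    return ratios.mean(), ratios.std(ddof=1)    # data point and error bar
```

When the standard deviation of the denominator is comparable to its mean, sampled ratios can be very large or negative, so a sketch like this is only meaningful when both distributions are well separated from zero.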
Scaling coefficients. When computing scaling coeffi-
cients (see Appendix II C), we assume normal distri-
butions on TTS for a given solver and set of 1000 in-
stances (200 of each size). We then draw 1000 samples
from each of the five distributions, and for each set of
five samples we compute the slope of the best fit line
ln(TTS) ≈ a(α)+b(α)L. From these 1000 slopes we take
the mean and standard deviation. Error bars in Fig. 3
represent two standard deviations, in keeping close to the
methods of Hen et al., who use 95% confidence intervals
[1].
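A minimal sketch of the slope estimate follows, assuming the mean and standard deviation of TTS at each of the five sizes are already known. The function name is hypothetical, and clipping non-positive draws before taking the logarithm is an assumption made here to keep the example well defined; the text does not say how such draws are handled.

```python
import numpy as np

def scaling_coefficient(sizes, tts_mean, tts_std, n=1000, rng=None):
    """Mean and standard deviation of the fitted slope b in ln(TTS) ~ a + b*L."""
    rng = np.random.default_rng(rng)
    sizes = np.asarray(sizes, dtype=float)
    # n draws from the normal distribution at each size: shape (n, number of sizes).
    draws = rng.normal(tts_mean, tts_std, size=(n, sizes.size))
    draws = np.clip(draws, 1e-12, None)          # assumption: avoid log of non-positive draws
    # polyfit fits every column of a 2-D y against the same x in one call.
    coeffs = np.polyfit(sizes, np.log(draws).T, 1)
    slopes = coeffs[0]
    return slopes.mean(), slopes.std(ddof=1)
```

An error bar of two standard deviations, as used in Fig. 3, would then be twice the second returned value.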
II. FURTHER DATA
A. Problem hardness as a function of
constraint-to-variable ratio
In Section III we described Itay Hen’s original con-
struction of frustrated loop instances, and offered a mod-
ification. For convenience, we call the former HenFL in-
stances, and the latter KingFL instances. As explored
by Hen et al. [1], there is a clear easy-hard-easy pat-
tern of hardness for all solvers as α increases from 0 to
1 and beyond, with the hardest regime generally falling
near the point α = 0.25. This is shown to correlate with
frustration in the problem, measured by Hen et al. as
the proportion of extant couplers that are frustrated in
the planted ground state. For large α, systems tend to-
wards ferromagnetism. For sufficiently small α, a typical
system is simply a collection of disjoint frustrated loops,
and therefore both highly degenerate and combinatorially
trivial. In Fig. 8 we see the same qualitative dependence
on α.
In HenFL problems, the mean loop length is approx-
imately 11. In KingFL problems it is approximately 9
(see Fig. 9 for the instances studied). It is therefore not
surprising that the hardest problems appear for slightly
larger α in our results compared with the results of Hen
et al. We remark that a well-yielded Chimera graph will
have between 5n/2 and 3n edges, so in our data the hardest
problems arise when the expected number of loops
containing a given coupler is near 1.
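The arithmetic behind this remark can be checked with a short sketch. It assumes a fully yielded CL graph (L × L unit cells of K4,4, giving 24L² − 8L couplers on n = 8L² qubits), takes the number of loops to be αn, and counts each loop once for every coupler it contains. The function names are hypothetical.

```python
def chimera_edges(L):
    """Couplers in a fully yielded C_L: 16 per cell plus 4 per adjacent cell pair."""
    return 16 * L * L + 8 * L * (L - 1)

def loops_per_coupler(alpha, L, mean_loop_length):
    """Expected number of loops containing a given coupler."""
    n = 8 * L * L                                  # qubits in a full C_L
    return alpha * n * mean_loop_length / chimera_edges(L)
```

For C8 with mean loop length 9 this gives roughly 0.78 at α = 0.25 and 0.94 at α = 0.30. A processor with inoperable qubits has fewer couplers, which raises these values somewhat.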
B. Effect of range limitation and loop distribution
Of fundamental importance to this work is the ques-
tion of whether or not limiting the coupling range of
frustrated loop instances makes them intrinsically easier.
Here we give evidence that the difference in performance
of the D-Wave processors here and in the paper of Hen
et al. [1] cannot be explained by our testbed being easier.
Clearly the range-limited instances studied here are
easier for the SR10-V6 D-Wave processor used in this
work than the range-unlimited HenFL instances are for
the ISI-V6 processor, and clearly the range-2 instances
are easier than the range-3 instances for SR10-V6. For an
idea of whether or not range-limited instances are com-
binatorially easier, we appeal to HFS performance, since
HFS is unaffected by coupling range and has no ther-
mal component. Perhaps the most pertinent answer to
this question is in the bottom-middle panel of Fig. 11.
There we see that for the highest α, HFS finds range-2
instances consistently easier than range-∞ instances,
but this phenomenon weakens as α decreases. There is
a simple intuitive reason for this: for any given coupling
range limit R, if α is sufficiently large (but not so large
that a random KingFL instance cannot be consistently
constructed), the number of loops containing an edge is
forced to be relatively consistent across the edges of the
graph. Consequently, the system is more orderly and
therefore more ferromagnetic (recall that at least 5/6
of nonzero couplers in each Ji are ferromagnetic). This
intuition is corroborated by our HFS results.
[FIG. 8 panels: D-Wave, range 2; HFS, range 2; SAS, range 2; D-Wave, range 3; HFS, range 3; SAS, range 3; HFS, range ∞; SAS, range ∞. Horizontal axis: α (0.1 to 0.5). Vertical axis: time to solution (µs), 10^1 to 10^5. Legend: C4, C5, C6, C7, C8.]
FIG. 8. Time to solution vs. constraint-to-qubit ratio
α. Shown is the median time to solution for the D-Wave processor
(left), HFS (middle), and SAS (right) plotted against
α for each problem size CL for L ∈ {4, 5, 6, 7, 8}.
[FIG. 9 panels: Range-2 instances; Range-3 instances. Horizontal axis: problem size (C4 to C8). Vertical axis: average loop length (7 to 11). Legend: α = 0.10, 0.15, ..., 0.50.]
FIG. 9. Loop length in experimental testbed. Shown
here are the mean and standard deviation of loop length for
range-2 and range-3 instances. The distribution appears to
converge to approximately length 9. This may be slightly
different for more fully-yielded Chimera graphs.
[FIG. 10 panels: HFS, range ∞; SAS, range ∞. Horizontal axis: problem size (C4 to C8). Vertical axis: time to solution (µs), 10^1 to 10^5. Legend: α = 0.10, 0.15, ..., 0.50.]
FIG. 10. Median time to solution per size for range-
unlimited instances. Shown here are data for HFS and
SAS on range-unlimited instances; these plots are analogous
to Fig. 1.
In Figures 12 and 13 we give HFS data on a more
extensive set of inputs, arranged according to α. For each
choice of L and α, the set contains 200 instances. Fig. 12
suggests that we should not expect the range-unlimited
KingFL instances on the SR10-V6 hardware graph to be
any easier than the range-unlimited HenFL instances on
the ISI-V6 hardware graph, if for each set we choose α
to maximize median HFS time to solution. This takes
into consideration the fact that ISI-V6 uses 503 qubits
on C8 problems, while SR10-V6 uses only 467. Fig. 13
suggests that for L ≤ 8 and α ≤ 0.3, we should not
expect range-3 KingFL instances to be consistently easier
than range-unlimited KingFL instances. For L ≤ 8 and
α ≤ 0.2, we should not expect range-2 KingFL instances
to be consistently easier than range-unlimited KingFL
instances. In short, it appears that the difference between
our results and the results of Hen et al. [1], especially the
scaling advantage we see for the D-Wave processor on
range-3 instances, cannot be attributed to the problems
being fundamentally easier.
[FIG. 11 panels: D-Wave, range 2 vs. 3; HFS, range 2 vs. 3; SAS, range 2 vs. 3; D-Wave, range 3 vs. ∞; HFS, range 3 vs. ∞; SAS, range 3 vs. ∞; HFS, range 2 vs. ∞; SAS, range 2 vs. ∞. Horizontal axis: problem size (C4 to C8). Vertical axis: advantage from reducing range (2^−1 to 2^2). Legend: α = 0.10, 0.15, ..., 0.50.]
FIG. 11. Benefit of reduced coupling range. Here we
show the median speedup arising from changing the coupling
range for each solver. No range-∞ data exists for the D-Wave
processor. Positive slopes indicate increasing benefit from
reduction of precision range as problems get larger. Error bars
represent one standard deviation from bootstrap samples.
For SAS, the issue of computational difficulty is con-
volved with that of effective temperature. This is most
troublesome for range-∞ instances, where the energy
scale (and relative temperature) varies significantly between
instances (see Fig. 14). A speedup of range-2 relative to
range-3 problems at low α may indicate that the final
inverse temperature βf is too low relative to the energy
scale of the normalized Hamiltonian, whereas the same
speedup at high α may indicate, in agreement with HFS
data, that these problems are easier in range 2. Fig. 7
provides further insight into
the question of easiness and temperature.
Going back to Fig. 11, we emphasize that particularly
for α ≤ 0.3, which includes the hardest instances, the
HFS speedup as coupling range decreases is small com-
pared with both the speedup of the D-Wave processor
over HFS and the speedup of the D-Wave processor be-
tween range-2 and range-3 instances. For α ≤ 0.25,
range-3 instances appear to have very little structural
“easiness” compared with range-∞ instances.
As further illustration of why this should be the case,
Fig. 15 takes the range-∞ KingFL testbed and plots the
proportion of nonzero couplers exceeding a given range
limit for each value of α. For a fixed α, the structural
impact of imposing a coupler range limit of R diminishes
quickly as R increases. In other words, in range-∞ in-
stances scaled to the interval [−1, 1], the scaling factor
is determined by the tail of the coupler value distribu-
tion, which we expect to be insignificant to the overall
combinatorial structure of the problem.
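The quantity plotted in Fig. 15 can be sketched as follows. This is an illustration under assumptions: coupler values are taken to be the unscaled integer sums of loop contributions, "exceeding a range limit R" is read as |J| > R, and the function name is hypothetical.

```python
import numpy as np

def proportion_exceeding(J, range_limits):
    """Fraction of nonzero couplers with |J| > R, for each range limit R."""
    J = np.asarray(J)
    magnitudes = np.abs(J[J != 0])               # ignore absent (zero) couplers
    return {R: float(np.mean(magnitudes > R)) for R in range_limits}
```

Averaging these proportions over the instances at a given α gives one curve of the figure.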
C. Ruling out a thermal model for D-Wave
performance
In order to support a thermal model for performance
scaling of the D-Wave processor, SAA data should pro-
vide a good match with D-Wave processor data on both
range-2 and range-3 instances at the same final inverse
temperature βf , for various choices of α. In other words,
it should have predictive power [11]. Hen et al. found
a good match between SAA and their D-Wave processor
data at βf = 5 for certain values of α [1]. In Fig. 3 we can
see the scaling coefficients for the D-Wave processor and
for SAA with βf ∈ {3, 4, 5}. The ratio of temperature to
energy scale is similar for the D-Wave processor studied
here and that studied by Hen et al. (∼ 7% difference).
For range-3 instances, βf = 5 appears to be too high
and βf = 3 is clearly too low. Although βf = 4 gives a
better fit, its scaling does not match that of the D-Wave
processor on low-α range-2 instances.
Fig. 16 shows instance-wise scatter plots of success
probabilities for the D-Wave processor and SAA at βf =
4. It is clear from this data that the best correlation is
poor when compared with the data of Hen et al., par-
ticularly looking at each problem size individually, and
that there is not a good fit between success probabili-
ties on range-2 instances. These data fail to support the
hypothesis that SAA provides a thermal model for per-
formance scaling on frustrated loop instances. On the
contrary, there are several pieces of evidence that in this
context, analog control error causes the appearance of
thermal behavior. First is the poor correlation compared
to that found by Hen et al.; the data shown in Fig. 16 for
range-3 instances represents the best visual fit between
the D-Wave processor and SAA data out of all our experiments.
Second is the fact that we do not find agreement
at the same inverse temperature βf = 5. Third is given
by Hen et al. ([1] Fig. 12), who show scaling coefficients
for various choices of βf , along with results on perturbed
instances at βf = 5. Their data suggests that perturbing
the Hamiltonian has a similar effect to increasing the
apparent temperature of SAA.
[FIG. 12 panels: HFS, KingFL range ∞; HFS, HenFL range ∞. Horizontal axis: α (0.1 to 0.5). Vertical axis: time to solution (µs), 10^1 to 10^6. Legend: problem size C4 to C14.]
FIG. 12. Comparison of KingFL and HenFL instances on different hardware graphs. Shown here are median TTS
for HFS on range-unlimited KingFL and HenFL instances. The KingFL instances are constructed on subgrids of a random C16
graph with similar yield to the SR10-V6 processor (91%), while HenFL instances are constructed on subgrids of a random C16
graph with similar yield to the ISI-V6 processor (98%).
[FIG. 13 panels: HFS, range 2 vs. ∞; HFS, range 3 vs. ∞. Horizontal axis: α (0.1 to 0.5). Vertical axis: advantage from reducing range (10^0 to 10^1). Legend: problem size C4 to C14.]
FIG. 13. Benefit of reduced coupling range for HFS. Here we show median speedup arising from changing the coupling
range for HFS on a broad set of inputs: α ∈ {0.100, 0.125, . . . , 0.500} and L ∈ {4, . . . , 14}. Instances are constructed on full CL
graphs. Error bars represent one standard deviation from bootstrap samples.
[FIG. 14 panels: HenFL precision limits for α = 0.25; HenFL precision limits for α = 0.35; KingFL precision limits for α = 0.25; KingFL precision limits for α = 0.35. Horizontal axis: coupling range (1 to 10). Vertical axis: instances (of 1000), 0 to 800. Legend: C2 to C8.]
FIG. 14. Distribution of coupling range for range-unlimited instances. Here we see the change in the distribution of
coupling range for random HenFL and KingFL instances at α = 0.25 and α = 0.35. Each data series consists of 1000 randomly
generated instances on a full Chimera graph CL. Naturally, coupling range increases as α increases. KingFL instances have
lower coupling range than corresponding HenFL instances, owing to their lower average loop length.
[FIG. 15 panel: Proportion of nonzero couplers exceeding a given range. Horizontal axis: coupler range R (1 to 7). Vertical axis: proportion of nonzero couplers (10^−4 to 10^0). Legend: α = 0.10, 0.15, ..., 0.50.]
FIG. 15. Couplers exceeding a given range. Shown is the mean proportion of nonzero couplers in a C8 range-∞ instance exceeding
each range limit. For fixed α, the proportion of couplers exceeding the range limit R decreases superexponentially with R.
Error bars represent one standard deviation from each data set of 200 instances.
[FIG. 16 panels: Range 2, α = 0.25; Range 2, α = 0.30; Range 3, α = 0.25; Range 3, α = 0.30. Horizontal axis: D-Wave success probability (0 to 1). Vertical axis: SAA (βf = 4) success probability (0 to 1).]
FIG. 16. Success probability correlations between the D-Wave processor and SAA. Shown are instance-wise scatters
for the most difficult problems (α ∈ {0.25, 0.30}) for range-2 and range-3 instances. Instances receive colors according to their
size. SAA has βf = 4. Correlation on the range-3 instances is weak compared to correlation found by Hen et al. [1], whereas
correlation on range-2 instances, particularly for each problem size as a separate data set, is poor.
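A minimal sketch of an instance-wise correlation, computed over all instances and for each problem size separately, is given here. The Pearson coefficient is an assumption, since the text does not name a correlation measure, and the function and argument names are hypothetical.

```python
import numpy as np

def correlation_by_size(p_dwave, p_saa, sizes):
    """Pearson correlation of success probabilities, overall and per problem size."""
    p_dwave, p_saa, sizes = map(np.asarray, (p_dwave, p_saa, sizes))
    result = {"all": float(np.corrcoef(p_dwave, p_saa)[0, 1])}
    for L in np.unique(sizes):
        mask = sizes == L                        # instances of size C_L only
        result[int(L)] = float(np.corrcoef(p_dwave[mask], p_saa[mask])[0, 1])
    return result
```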
