Chaotic Amplitude Control for Neuromorphic Ising Machine in Silico by Leleu, Timothee et al.
Chaotic Amplitude Control for
Neuromorphic Ising Machine in Silico
Timothee Leleu1,2∗, Farad Khoyratee1,3∗, Timothee Levi1,3,4,
Ryan Hamerly,5 Takashi Kohno,1,2,4 Kazuyuki Aihara1,2
1Institute of Industrial Science, University of Tokyo, Japan
2International Research Center for Neurointelligence, University of Tokyo, Japan
3IMS, University of Bordeaux, France
4LIMMS/CNRS-IIS, the University of Tokyo, Japan
5Massachusetts Institute of Technology, Cambridge, MA, USA
∗To whom correspondence should be addressed; E-mail: timothee@iis.u-tokyo.ac.jp.
Ising machines are special-purpose hardware designed to reduce time, resources,
and energy consumption needed for finding low energy states of the Ising
Hamiltonian. In recent years, most of the physical implementations of such
machines have been based on a similar concept that is closely related to an-
nealing such as in simulated, mean-field, chaotic, and quantum annealing. We
argue that Ising machines benefit from implementing a chaotic amplitude con-
trol of mean field dynamics that does not rely solely on annealing for escap-
ing local minima trap. We show that this scheme outperforms recent state-
of-the-art solvers for some reference benchmarks quantitatively and qualita-
tively in terms of time to solution, energy efficiency, and scaling of its variance
with problem size when it is implemented in hardware such as in a field pro-
grammable gate array that is organized according to a neuromorphic design
where memory is distributed with processing.
1
ar
X
iv
:2
00
9.
04
08
4v
1 
 [p
hy
sic
s.c
om
p-
ph
]  
9 S
ep
 20
20
One Sentence Summary (150 characters): The performance of Ising machines
can be increased quantitatively and qualitatively by controlling the amplitude
of the mean field dynamics especially when implemented on a neuromorphic
hardware in silico.
1 Introduction
Many complex systems such as spin glasses, interacting proteins, large scale hardware, and fi-
nancial portfolios, can be described as ensembles of disordered elements that have competing
and frustrated interactions (1). There has been a growing interest in using physical simulators
called “Ising machines” in order to reduce time and resources needed to identify configura-
tions that minimize their total interaction energy, notably that of the Ising HamiltoniansH with
H(σ) = −1
2
∑
ij ωijσiσj (with ωij the symmetric Ising couplings, i.e., ωij = ωji, and σi = ±1)
that is related to many NP-hard combinatorial optimization problems and various real-world
applications (2). Recently proposed implementations include memresistor networks (3), micro-
or nano-electromechanical systems (4), micro-magnets (5, 6), coherent optical systems (7), hy-
brid opto-electronic hardware (8–10), integrated photonics (11–13), flux qubits (14), and Bose-
Einstein condensates (15). In principle, these physical systems often possess unique properties,
such as coherent superposition in flux qubits (16) and energy efficiency of memresistors (3,17),
which could lead to a distinctive advantage compared to conventional computers (see Fig. 1 (a))
for solving hard combinatorial optimization problems. In practice, the difficulty in constructing
connections between constituting elements of the hardware is often the main limiting factor
to scalability and performance for these systems (14, 18). Moreover, these devices often im-
plement schemes that are directly related to the concept of annealing (either simulated (19,20),
mean-field (21,22), chaotic (17,23), and quantum (16,24)) in which the escape from the numer-
ous local minima (25) and saddle points (26) of the free energy function can only be achieved
under very slow modulation of the control parameter (see Fig. 1 (b)).
Interestingly, alternative dynamics that does not depend on the concept of annealing may
perform better for solving hard combinatorial optimization problems (27–29). Various kinds of
dynamics have been proposed (30–34), notably chaotic dynamics (17, 35–38), but have either
2
not been implemented onto specialized hardware (35, 39) or use chaotic dynamics merely as
a replacement to random fluctuations (17, 36). We have recently proposed that the control of
amplitude in mean-field dynamics can significantly improve the performance of Ising machines
by introducing error correction terms (see Fig. 1 (d)), effectively doubling the dimensionality of
the system, whose role is to correct the amplitude heterogeneity (27). Because of the similarity
of such dynamics with that of a neural network, it can be implemented especially efficiently in
electronic neuromorphic hardware where memory is distributed with the processing (40–42).
In this paper, we implement a version of this scheme that we name chaotic amplitude control
(CAC) on a field programmable gate array (FPGA, see Fig. 1 (c)) as a paradigmatic example
and show that the developed hardware outperforms state-of-the-art Ising problem solvers for
some reference benchmarks quantitatively and qualitatively in terms of time to solution, energy
efficiency, and scaling of its variance with problem size.
2 Results
For the sake of simplicity, we consider the classical limit of Ising machines for which the state
space is often naturally described by analog variables (i.e., real numbers) noted xi in the follow-
ing. The variables xi represent measured physical quantities such as voltage (3) or optical field
amplitude (7–12) and these systems can often be simplified to networks of interacting nonlinear
elements whose time evolution can be written as follows:
dxi
dt
= fi(xi) + βi(t)
∑
j
ωijgj(xj) + σ0ηi,∀i ∈ {1, . . . , N}, (1)
where fi and gi represent the nonlinear gain and interaction, respectively, and are assumed to be
monotonic, odd, and invertible “sigmoidal” functions for the sake of simplicity; ηi, experimen-
tal white noise of standard deviation σ0 with 〈ηi(t)ηj(t′)〉 = δijδ(t− t′)1; and N , the number of
spins. Ordinary differential equations similar to eq. (1) have been used in various computational
models that are applied to nondeterministic polynomial-hard (NP-hard) combinatorial optimiza-
tion problems such as Hopfield-Tank neural networks (43), coherent Ising machines (44), and
1δij ; δ(t), the Kronecker delta symbol and Dirac delta function, respectively.
3
correspond to the “soft” spin description of frustrated spin systems (45). Moreover, the steady
states of eq. (1) correspond to the solutions of the “naive” Thouless-Anderson-Palmer (nTAP)
equations that arise from the mean-field description of Sherrington-Kirkpatrick spin glasses
when the Onsager reaction term has been discarded (46). In the case of neural networks in
particular, the variables xi and constant parameters ωij correspond to firing rates of neurons and
synaptic coupling weights, respectively.
It is well known that, when βi = β for all i and the noise is not taken into account (σ0 = 0),
the time evolution of this system is motion in the state space that seeks out minima of a potential
function (47) (or Lyapunov function) V given as V = βH(y)+∑i Vb(yi) where Vb is a bistable
potential with Vb(yi) = −
∫ yi
0
fi(g
−1
i (y)) dy and H(y) = −12
∑
ij ωijyiyj is the extension of
the Ising Hamiltonian in the real space with yi = gi(xi) (see supplementary material A.1).
The bifurcation parameter β, which can be interpreted as the inverse temperature of the naive
TAP equations (46), the steepness of the neuronal transfer function in Hopfield-Tank neural
networks (43), or to the coupling strength in coherent Ising machines (7–9), is usually decreased
gradually in order to improve the quality of solutions found. This procedure has been called
mean-field annealing (22), and can be interpreted as a quasi-static deformation of the potential
function V (see Fig. 1 (b)). There is, however, no guarantee that a sufficiently slow deformation
of the landscape V will ensure convergence to the lowest energy state contrarily to the quantum
adiabatic theorem (48) or the convergence theorem of simulated annealing (49)2. Moreover,
the statistical analysis of spin glasses suggests that the potential V is highly non-convex at low
temperature and that simple gradient descent of the potential V very unlikely reaches the global
minimum of H(σ) because of the presence of exponentially numerous local minima (25) and
saddle points (26) as the size of the system increases. The slow relaxation time of Monte-Carlo
simulations, such as simulated annealing, might also be explained in the case of spin glasses
by similar trapping dynamics during the descent of the free energy landscape obtained from
the TAP equations (26). In the following, we consider in particular the soft spin description
obtained by taking fi(xi) = (−1 + p)xi − x3i and yi = g(xi) = xi, where p is the gain
2At fixed β, global convergence to the minimum of the potential V can be assured if σ0 is gradually decrease
with σ0(t)2 ∼ clog(2+t) and c sufficiently large (50). The parameter σ20 is analogous to the temperature in simulated
annealing in this case. The global minimum of the potential V does not, however, generally correspond to that of
the Ising HamiltoniansH at finite β.
4
parameter, which is the canonical model of the system described in eq. (1) at proximity of a
pitchfork bifurcation with respect to the parameter p. In this case, the potential function Vb is
given as Vb(xi) = (−1 + p)x
2
i
2
− x4i
4
and eq. (1) can be written as dxi
dt
= − ∂V
∂xi
, ∀i.
The proposed amplitude control of mean-field dynamics consists in introducing error sig-
nals, noted ei ∈ R, that modulate the strength of coupling βi to the ith nonlinear element such
that βi(t) defined in eq. (1) is expressed as βi(t) = βei(t) with β > 0. The time evolution of
the error signals ei are given as follows (27):
dei
dt
= −ξ(g(xi)2 − a)ei, (2)
where a and ξ are the target amplitude and the rate of change of error variables, respectively,
with a > 0 and ξ > 0. If the system settles to a steady state, the values y∗i = g(x
∗
i ) become
exactly binary with y∗i = ±
√
a. When p < 1, the internal fields hi at the steady state, defined as
hi =
∑
j ωijσj with σj = y
∗
j/|y∗j |, are such that hiσi > 0, ∀i (27). Thus, each equilibrium point
of the analog system corresponds to that of a zero-temperature local minimum of the binary
spin system.
The dynamics described by the coupled equations (1) and (2) is not derived from a potential
function and the computational principle is not related to a gradient descent. Rather, the addition
of error variables results in additional dimensions in the phase space via which the dynamics
can escape local minima. The mechanism of this escape can be summarized as follows. It can
be shown (see the supplementary material A.2) that the dimension of the unstable manifold at
equilibrium points corresponding to local minima σ of the Ising Hamiltonian depends on the
number of eigenvalues µ(σ) with µ(σ) > F (a) where µ(σ) are the eigenvalues of the matrix
{ ωij|hi|}ij (with internal field hi) and F a function given as F (y) =
ψ′(y)
ψ(y)
y and ψ(y) = f(g
−1(y))
(g−1)′(y) .
Thus, there exists a value of a such that all local minima (including the ground state) are un-
stable and for which the system exhibits chaotic dynamics that explores successively candidate
boolean configurations. The energy is evaluated at each step and the best configuration visited
is kept as the solution of a run. Interestingly, this chaotic search is particularly efficient for
sampling configurations of the Ising Hamiltonian close to that of the ground state using a sin-
gle run although the distribution of sampled states is not necessarily given by the Boltzmann
distribution. Note that the use of chaotic dynamics for solving Ising problems has been dis-
5
cussed previously (17, 51), notably in the context of neural networks, and it has been argued
that chaotic fluctuations may possess better properties than Brownian noise for escaping from
local minima traps. In the case of the proposed scheme, the chaotic dynamics is not merely used
as a replacement to noise. Rather, the interaction between nonlinear gain and error-correction
results in the destabilization of states associated with lower Ising Hamiltonian.
The target amplitude a for which all local minima is unstable is not known a priori. There-
fore, we propose instead to destabilize the local minima traps by dynamically modulating the
target amplitude a as a function of the visited configuration σ using the heuristic function given
as follows:
a(t) = α− ρtanh(δ∆H(t)), (3)
where ∆H(t) = Hopt −H(t); H(t), the Ising Hamiltonian of the configuration visited at time
t; and Hopt, a given target energy. In practice, we set Hopt to the lowest energy visited during
the current run, i.e., Hopt(t) = mint′≤tH(t′). The function tanh is the tangent hyperbolic. ρ
and δ are positive real constants. In this way, configurations that have much larger Ising energy
than the lowest energy visited are destabilized more strongly due to smaller target amplitude
a. Lastly, the parameter ξ (see eq. (2)) is modulated as follows: dξ
dt
= γ when t − tr < ∆t,
where tr is the last time for which either the best known energyHopt was updated or ξ was reset.
Otherwise, ξ is reset to 0 if t− tr ≥ ∆t and tr is set to t3.
In order to verify that chaotic amplitude control is able to accelerate the search of mean-
field dynamics for finding the ground state of typical frustrated systems, we solve Sherrington-
Kirkpatrick (SK) spin glass problems using the numerical simulation of eqs. (1) to (3) and
two recently proposed implementation of the mean-field annealing method: noisy mean-field
annealing (NMFA) (21) and the simulation of the coherent Ising machine (simCIM) 4. Because
the arithmetic complexity of calculating one step of these three schemes are dominated by the
3The modulation described in eq. (3) is better suited for a digital implementation on FPGA than the one pro-
posed in (27). Numerical simulations shown in this paper suggest that this modulation results in destabilization of
non-trivial attractors (periodic, chaotic, etc.) for typical problem instances.
4See supplementary material E.1 and 2
6
Figure 1: Schematic representation of (a) conventional CPU architecture with the von Neumann
bottleneck problem and (c) the proposed neuromorphic chip for combinatorial optimization.
Schema of state-space dynamics of algorithms based on (b) annealing on a potential function
and (d) the proposed chaotic amplitude control scheme.
matrix-vector multiplication (MVM), it is sufficient for the sake of comparison to count the
number of MVM, noted ν, to find the ground state energy of a given instance with 99% success
probability, with ν(K) = K ln(1−0.99)ln(1−p0(K)) and p0(K) the probability of visiting a ground state
configuration at least once after a number K of MVMs in a single run. In Fig. 2, NMFA (a) and
the CAC (b) are compared using the averaged success probability 〈p0〉 of finding the ground
state for 100 randomly generated SK spin glass instances per problem size N . Note that the
success probability of the mean-field annealing method does not seem to converge to 1 even for
large annealing time (see Fig. 2 (a)). Because the success probability of NMFA and simCIM
remains low at larger problem size, its correct estimation requires simulating a larger number
7
of runs which we achieved by implementing these methods using a GPU. On the other hand,
the average success probability 〈p0〉 of CAC is of order 1 when the maximal number of MVM
is large enough, suggesting that the system rarely gets trapped in local minima of the Ising
Hamiltonian or non-trivial attractors. In Figure 2 (c) and (d) are shown the qth percentile (with
q = 50, i.e., the median) of the MVM to solution distribution, noted νq(K;N), for various
duration of simulation K, where K is the number of MVMs of a single run. The minimum
of these curves, noted ν∗q (N) with ν
∗
q (N) = minKνq(K;N), represents the optimal scaling
of MVM to solution vs. problem size N (14). For large N , the median MVM to solution
ν∗50(N) is better fitted by an exponential scaling with the square root problem size N , that
is ν∗50(N) ∼ eγ
√
N (see Fig. 2 (e), p-value: 4.3 × 10−5, R-value: 0.95), than an exponential
scaling with N (p-value: 1.1 × 10−4, R-value: 0.91) although there are not enough data points
to reject either of the two hypotheses. Using the former hypothesis, CAC exhibits significantly
smaller scaling exponent (γ = 0.26 ± 0.03) than the NMFA (γ = 0.47 ± 0.04) and simCIM
(γ = 0.54± 0.03, see inset in Fig. 2 (e)). We have verified that this scaling advantage holds for
various parameters of the mean-field annealing (see supplementary material E.1).
Although comparison of algebraic complexity indicates that CAC has a scaling advantage
over mean-field annealing, it is in practice necessary to compare its physical implementation
against other state-of-the-art methods because the performance of hardware depends on other
factors such as memory access and information propagation delays. To this end, CAC is imple-
mented into a FPGA because its similarity with neural networks makes it well-fitted for a design
where memory is distributed with processing (see supplementary material B for the details of
the FPGA implementation). The organization of the electronic circuit can be understood using
the following analogy. Pairs of analog values xi and ei, which represent averaged activity of
two types of neurons, are encoded within neighboring circuits. This micro-structure is repeated
N times on the whole surface of the chip which resembles the columnar organization of the
brain. The nonlinear processes fi(xi), which model the local-population activation functions
and are independent for i 6= j, are calculated in parallel. The coupling between elements i and
j ∈ {1, . . . , N} that is achieved by the dot product in eq. (1) is implemented by circuits that
are at the periphery of the chip and are organized in a summation tree reminiscent of dendritic
branching (see Fig. 1 (c)).
8
Figure 2: (a,b) Average success probability 〈p0〉 of finding the ground state configuration of 100
Sherrington-Kirkpatrick spin glass instances and (c,d) 50th percentile of the MVM to solution
distribution ν50 vs. problem size N . (a,c) NMFA. (b,d) CAC. (e) Number of matrix-vector mul-
tiplication MVM to solution distribution. Lower, higher, and upper whisker of boxes show the
50th, 80th, and 90th percentiles of the distribution. The upper right inset shows the exponential
scaling factor γ of the 50th percentile with ν50 ∼ eγ
√
N for CAC, NMFA, and simCIM.
We compare the FPGA implementation of CAC against state-of-the-art methods imple-
mented on various types of hardware: the single-core CPU implementation of break-out local
search (52) (BLS) that has been used to find most of the best known maximum-cuts (equiva-
lently, Ising energies) from the GSET benchmark set (53), a well-optimized single-core CPU
implementation of parallel tempering (or random replica exchange Monte-Carlo Markov chain
sampling) (54, 55) (PT, courtesy of S. Mandra), simulated annealing (SA) (56), and Toshiba’s
bifurcation machine on GPU (TBM) (57). Figure 3 (a) shows that the CAC on FPGA has the
smallest real time to solution τ ∗q against the other Ising machines and state-of-the-art algorithms
despite just 5W power-consumption where τ ∗q (N) is the optimal q
th percentile of time to so-
lution with 99% success probability and is given as τ ∗q (N) = minT τq(T ;N) where τ(T ) of
a given instance is τ(T ) = T ln(1−0.99)ln(1−p0(T )) and T is the duration in seconds of a run. Predic-
tions of time to solution obtained when increasing the clock frequencies of the FPGA are also
9
shown in dashed lines. Using 20W of power consumption, CAC on FPGA implementation has
smaller time to solution than recently proposed on-chip implementation of simulated annealing
called Digital Annealer (see results in (19)). Note that the power consumption of transistors
in the FPGA scales proportionally to its clock frequencies. In order to compare different Ising
machines despite the heterogeneity in their power consumption, the qth percentile of energy-
to-solution E∗q , i.e., the energy E
∗
q required to solve SK instances with E
∗
q = Pτ
∗
q and P the
power consumption5, is plotted in Fig. 3 (b). CAC on FPGA is 102 to 103 times more energy ef-
ficient than state-of-the-art algorithms running on classical computers. Moreover, the relatively
slow increase of time to solution with respect to the number of spins N when solving larger SK
problems using CAC suggests that our FPGA implementation is faster than the Hopfield neural
network implemented using memresistors (mem-HNN) (3) and the restricted Boltzmann ma-
chine using a FPGA (FPGA-RBM) (58) for N ' 1000 (see Tab. 1 and supplementary material
C).
Ising machine N = 100 N = 700 N = 2000
experimental eγN fit eγ
√
N fit eγN fit eγ
√
N fit
NTT CIM (9) 6× 10−2 6× 106 8× 102 > 105 > 105
mem-HNN (59) 10−4 2× 103 10−1 > 105 4× 102
FPGA-RBM (58) 3× 10−5 104 2× 10−1 > 105 3× 103
experimental eγN fit eγ
√
N fit
5 W FPGA-CAC 2× 10−3 5× 10−1 6× 104 2× 102
Table 1: Median time to solution in seconds and extrapolations for the NTT CIM (not paral-
lelized) (9), mem-HNN (59), FPGA-RBM (58), and FPGA-CAC (this work). Extrapolations
are based on the hypotheses of scaling in eγN and eγ
√
N by fitting the available experimental
data for each Ising machine (see supplementary material C).
We next consider the whole distribution of time to solution in order to compare the ability of
various methods to solve harder instances. As shown in Fig. 4 (a), the cumulative distribution
function (CDF) P (τ ;T ) of time to solution with 99% success probability τ is not uniquely
defined as it depends on the duration T of the runs. We can define an optimal CDF P ∗(τ) that
is independent of the runtime T as follows: P ∗(τ) = maxTP (τ ;T ). Numerical simulations
5For the sake of simplicity, we assume a 20 and 120 watts power consumptions for the CPU and GPU. These
numbers represent typical orders to magnitude for contemporary digital systems.
10
Figure 3: (a) Lower, higher, and upper whisker of boxes show the 50th, 80th, and 90th percentiles
of the real time to solution distribution in seconds for the FPGA implementation of CAC with
a maximum of 5W power consumption, CAC, SA, and PT algorithm running on a CPU (20W)
and TBM on a GPU (120W). The dashed and dotted red lines show the predictions of the real
time to solution in the case of a 10W and 20W FPGA implementation, respectively. (b) The
same as (a) for the energy-to-solution E∗.
show that this optimal CDF is well described by lognormal distribution, that is P ∗(log(τ)) ∼
N (µ,√v) where √v is the standard deviation of log(τ) (see Figs. 4 (b), (c), and (d) for the
cases of CAC, SA, and NMFA, respectively). In Fig. 4 (e), it is shown that the scaling of the
standard deviation
√
v(N) with the problem size N is significantly smaller for CAC, which
implies that harder instances can be solved relatively more rapidly than using other methods.
This result confirms the advantageous scaling of higher percentiles for CAC that was observed
in Figs. 2 and 3.
3 Conclusions
In order to test that hard instances of other types than SK can be solved by our FPGA implemen-
tation, we have benchmarked it against BLS on the GSET (53) (see supplementary material D):
most of the best known solutions were found approximately a hundred times faster than the BLS
algorithm (52) using less than 5W of power consumption although the current implementation
11
does not take advantage of the sparsity of GSET instances for reducing energy consumption.
For some instances (14th and 15th of size N = 2000 from GSET), we have found solutions of
better quality than previously known in (52) and (27). The framework described in this paper
can be extended to solve other types of constrained combinatorial optimization problems such
as the traveling salesman (43), vehicle routing, and lead optimization problems. Moreover, it
can be easily adapted to a variety of recently proposed Ising machines (3–12, 14) which would
benefit from implementing a scheme that does not rely solely on the descent of a potential
function. In particular, the performance of CIM (8, 9), mem-HNN (3), and chip-scale photonic
Ising machine (13), which have small time to solution for problem sizes N ≈ 100 but with a
relatively large scaling exponent that limit their scalability, could be significantly improved by
adapting the implementation we propose if these hardware can be shown to be able to support
larger problem sizes experimentally.
4 Methods
We target the implementation of a low-power system with maximum power supply of 5W using
a XCKU040 Kintex Ultrascale Xilinx FPGA integrated on an Avnet board. The implemented
circuit can process Ising problems of more than 2000 spins. Data are encoded into 18 bits fixed
point vectors with 1 sign, 5 integer and 12 decimal bits to optimize computation time and power
consumption. An important feature our FPGA implementation of CAC is the use of several
clock frequencies to concentrate the electrical power on the circuits that are the bottleneck of
computation and require high speed clock. For the realization of the matrix-vector product, each
element of the matrix is encoded with 2 bits precision (wij is −1, 0 or 1). An approximation
based on the combination of logic equations describing the behavior of a multiplexer allows
to achieve 104 multiplications within one clock cycle. The results of these multiplications are
summed using cascading DSP and CARRY8 connected in a tree structure. Using pipelining,
a matrix-vector product for a squared matrix of size N is computed in 2 + 5 log(N−4)log(5) + (
N
u
)2
clock cycles (see supplementary material B.4) at a clock frequency of 50MHz with u = 100
which is determined by the limitation of the number of available electronic component of the
12
Figure 4: (a) Cumulative distribution of the time to solution P (τ) for N = 400 SK problems.
(b,c,d) Optimal cumulative distribution P ∗(τ) with P ∗(log(τ)) ∼ N (µ(N),√v(N)) for CAC
(b), SA (c), and NMFA (d), respectively. (e) Standard deviation
√
v of the logarithm of time
to solution distribution vs. problem size N . Shaded regions show the 95% error in the scaling
exponents.
XCKU040 FPGA6. The calculation of the nonlinearity fi and error terms is achieved at higher
frequency using DSP in 8+(N/u) and 9+(N/u) clock cycles, respectively. In order to minimize
energy resources and maximize speed, the nonlinear and error terms are calculated multiple
times during the calculation of a single matrix-vector product (see supplementary material B).
6The block size u can be made at least 3 times larger using commercially available FPGAs, which implies that
the number of clock cycles needed to compute a dot product can scale almost logarithmically for problems of size
N < 1000 (see supplementary material B.4 for discussions of scalability) and that the calculation time can be
further significantly decreased using a higher-end FPGA.
13
References
1. G. Parisi, M. Me´zard, M. Virasoro, World Scientific, Singapore 187, 202 (1987).
2. G. Kochenberger, et al., Journal of Combinatorial Optimization 28, 58 (2014).
3. F. Cai, et al., Nature Electronics pp. 1–10 (2020).
4. I. Mahboob, H. Okamoto, H. Yamaguchi, Science advances 2, e1600236 (2016).
5. K. Y. Camsari, R. Faria, B. M. Sutton, S. Datta, Physical Review X 7, 031014 (2017).
6. K. Y. Camsari, B. M. Sutton, S. Datta, Applied Physics Reviews 6, 011305 (2019).
7. A. Marandi, Z. Wang, K. Takata, R. L. Byer, Y. Yamamoto, Nature Photonics 8, 937 (2014).
8. P. L. McMahon, et al., Science 354, 614 (2016).
9. T. Inagaki, et al., Nature Photonics 10, 415 (2016).
10. D. Pierangeli, G. Marcucci, C. Conti, Physical Review Letters 122, 213902 (2019).
11. C. Roques-Carmes, et al., Nature communications 11, 1 (2020).
12. M. Prabhu, et al., arXiv preprint arXiv:1909.13877 (2019).
13. Y. Okawachi, et al., Nature Communications 11, 1 (2020).
14. R. Hamerly, et al., Science advances 5, eaau0823 (2019).
15. K. P. Kalinin, N. G. Berloff, Scientific reports 8, 17791 (2018).
16. M. W. Johnson, et al., Nature 473, 194 (2011).
17. S. Kumar, J. P. Strachan, R. S. Williams, Nature 548, 318 (2017).
18. B. Heim, T. F. Rønnow, S. V. Isakov, M. Troyer, Science 348, 215 (2015).
19. M. Aramon, et al., Frontiers in Physics 7, 48 (2019).
14
20. S. Kirkpatrick, C. D. Gelatt, M. P. Vecchi, Science 220, 671 (1983).
21. A. D. King, W. Bernoudy, J. King, A. J. Berkley, T. Lanting, arXiv preprint
arXiv:1806.08422 (2018).
22. G. Bilbro, et al., Advances in neural information processing systems (1989), pp. 91–98.
23. L. Chen, K. Aihara, Neural Netw. 8, 915 (1995).
24. T. Kadowaki, H. Nishimori, Phys. Rev. E 58, 5355 (1998).
25. F. Tanaka, S. Edwards, Journal of Physics F: Metal Physics 10, 2769 (1980).
26. G. Biroli, Journal of Physics A: Mathematical and General 32, 8365 (1999).
27. T. Leleu, Y. Yamamoto, P. L. McMahon, K. Aihara, Physical review letters 122, 040607
(2019).
28. M. Ercsey-Ravasz, Z. Toroczkai, Nat. Phys. 7, 966 (2011).
29. B. Molna´r, F. Molna´r, M. Varga, Z. Toroczkai, M. Ercsey-Ravasz, Nature communications
9, 4864 (2018).
30. T. Aspelmeier, M. Moore, Physical Review E 100, 032127 (2019).
31. S. Boettcher, A. G. Percus, Phys. Rev. Lett. 86, 5211 (2001).
32. G. Zarand, F. Pazmandi, K. Pal, G. Zimanyi, Physical review letters 89, 150201 (2002).
33. T. Leleu, Y. Yamamoto, S. Utsunomiya, K. Aihara, Phys. Rev. E 95, 022118 (2017).
34. S. Krishna Vadlamani, T. P. Xiao, E. Yablonovitch, arXiv e-prints pp. arXiv–2007 (2020).
35. T. Aspelmeier, R. Blythe, A. J. Bray, M. A. Moore, Physical Review B 74, 184411 (2006).
36. M. Hasegawa, T. Ikeguchi, K. Aihara, Phys. Rev. Lett. 79, 2344 (1997).
37. Y. Horio, K. Aihara, Physica D: Nonlinear Phenomena 237, 1215 (2008).
15
38. K. Aihara, Proceedings of the IEEE 90, 919 (2002).
39. A. Montanari, arXiv preprint arXiv:1812.10897 (2018).
40. S. B. Furber, F. Galluppi, S. Temple, L. A. Plana, Proceedings of the IEEE 102, 652 (2014).
41. M. Davies, et al., IEEE Micro 38, 82 (2018).
42. B. V. Benjamin, et al., Proceedings of the IEEE 102, 699 (2014).
43. J. J. Hopfield, D. W. Tank, Biol. Cybern. 52, 141 (1985).
44. Z. Wang, A. Marandi, K. Wen, R. L. Byer, Y. Yamamoto, Physical Review A 88, 063853
(2013).
45. H. Sompolinsky, A. Zippelius, Physical Review B 25, 6860 (1982).
46. D. J. Thouless, P. W. Anderson, R. G. Palmer, Philosophical Magazine 35, 593 (1977).
47. J. J. Hopfield, Proceedings of the national academy of sciences 81, 3088 (1984).
48. E. Farhi, J. Goldstone, S. Gutmann, M. Sipser, arXiv preprint quant-ph/0001106 (2000).
49. V. Granville, M. Kriva´nek, J.-P. Rasson, IEEE transactions on pattern analysis and machine
intelligence 16, 652 (1994).
50. S. Geman, C.-R. Hwang, SIAM Journal on Control and Optimization 24, 1031 (1986).
51. H. Goto, Scientific reports 6, 21686 (2016).
52. U. Benlic, J.-K. Hao, Eng. Appl. Artif. Intell. 26, 1162 (2013).
53. https://web.stanford.edu/˜yyye/yyye/Gset/.
54. K. Hukushima, K. Nemoto, Journal of the Physical Society of Japan 65, 1604 (1996).
55. S. Mandra, B. Villalonga, S. Boixo, H. Katzgraber, E. Rieffel, APS Meeting Abstracts
(2019).
16
56. S. V. Isakov, I. N. Zintchenko, T. F. Rønnow, M. Troyer, Computer Physics Communica-
tions 192, 265 (2015).
57. H. Goto, K. Tatsumura, A. R. Dixon, Science advances 5, eaav2372 (2019).
58. S. Patel, L. Chen, P. Canoza, S. Salahuddin, arXiv preprint arXiv:2008.04436 (2020).
59. F. Cai, et al., arXiv preprint arXiv:1903.11194 (2019).
60. F. J. Pineda, Journal of Complexity 4, 216 (1988).
Acknowledgments
This research is partially supported by the Brain-Morphic AI to Resolve Social Issues (BMAI)
project (NEC corporation), AMED under Grant Number JP20dm0307009, and UTokyo Center
for Integrative Science of Human Behavior(CiSHuB).
17
Supplementary material A: theoretical analysis
1. Derivation of a potential function
First, we analyze the system described by eq. (1) only, when σ0 = 0 and βi = β, ∀i. In this
case, the potential function V (y) = −1
2
β
∑
ij ωijyiyj −
∑
i
∫ yi
0
fi(g
−1
i (y)) dy has the following
property (47):
dV
dt
= −
∑
i
dyi
dt
(β
∑
j
ωijyj + fi(g
−1
i (yi))), (4)
= −
∑
i
dyi
dt
(
dxi
dt
), (5)
= −
∑
i
(
dyi
dt
)2(g−1)′(yi). (6)
Consequently, V is such that dV
dt
< 0 because g is strictly monotonic and dV
dt
= 0 =⇒ dyi
dt
= 0,
∀i. In other words, the dynamics of the system can be understood in this case as the gradient
descent on the potential function V and its stable steady states correspond to local minima of
V .
2. Analysis of CAC
Next, we analyze the system described in eqs. (1) and (2) (a is constant) by considering the
change of variable yi = g(xi) with g such that g−1(yi) = xi. In this case, eqs. (1) and (2) can
be rewritten as follows (note that f and g are odd functions):
φ′(yi)
dyi
dt
= f(g−1(yi)) + βei
∑
j
ωijyj, (7)
dei
dt
= ξ(y2i − a)ei, (8)
where φ(y) = g−1(y).
We analyze the dimension of the unstable manifold of CAC by linearizing the dynamics
near the steady states and calculating the real part of eigenvalues of the corresponding Jacobian
matrices. The steady state of the system in eqs. (7) and (8) can be written as follows:
18
y∗i = σi
√
a, (9)
e∗i = −
f(g−1(σi
√
a))
β
√
ahi
= −f(g
−1(
√
a))
β
√
aσihi
. (10)
with hi =
∑
j ωijσj . The last equality is a consequence of the fact that g, and thus also g
−1, are
odd functions.
The Jacobian matrix J corresponding to the system of eqs. (1) to (2) at the steady state
can be written in the block representation J = [JyyJye; JeyJee] with its components given as
follows:
Jyyij = ψ
′(
√
a)δij − ψ(
√
a)√
a
ωij
hiσi
, (11)
Jyeij =
β
√
a
φ′(
√
a)
hiδij, (12)
Jeyij =
−2ξf(g−1(√a))
βhi
δij, (13)
Jeeij = 0. (14)
with ψ(y) = f(g
−1(y))
φ′(y) and φ(y) = g
−1(y). Note that we define Jeyii J
ye
jj = b with b =
−2ξ√aψ(√a).
The eigenvalues of the Jacobian matrix J are solutions of the polynomial equation P (λ) =
det[J − λI] = det[(Jyy − λI)λI − bI]. The eigenvalues of J are thus solutions of the quadratic
equation z(z − λi) − b = 0 where λi is the ith eigenvalue of the matrix Jyy. Therefore, the
eigenvalues of J can be described by pairs λ+i and λ
−
i given as follows:
λ±i =
1
2
(λi ±
√
∆i), with (15)
λi = ψ
′(
√
a)− ψ(
√
a)√
a
µi, and (16)
∆i = λ
2
i + 4b, (17)
where µi is the ith eigenvalue of the matrix D[(σh)−1]Ω.
19
The eigenvalues λ+i and λ
−
i become complex conjugate when the ∆i = 0, i.e., [ψ
′(
√
a) −
ψ(
√
a)√
a
µi]
2 − 8ξ√aψ(√a)) = 0, which can be rewritten as follows:
µi = G
±
ξ (
√
a), (18)
with G±ξ (y) =
y
ψ(y)
[ψ′(y)± 2√2ξ|y|ψ(|y|)].
Lastly, the real part of eigenvalues λ+i become equal to zero at the condition given as follows
(note that Re[λ+i ] ≥ Re[λ−i ] and µi are real at local minima7 for which, ∀j, σjhj > 0):
√
a = 0 or ψ(
√
a) = 0 if ∆i ≤ 0, (19)
µi = F (
√
a) otherwise. (20)
with F (y) = ψ
′(y)
ψ(y)
y. The dimension of the unstable manifold at a given fixed point is then given
by the number of indices i for which Re[λ±i ] is positive. To illustrate the destabilization of the
ground state configuration of an Ising problem, we show in Figs. 5 and 6 the dynamics of CAC
when solving a problem of size N = 15 spins when the state encoding for the ground state
configuration is stable and unstable, respectively, for two different examples of functions f and
g defining eqs. (1) and (2). The first set of functions f and g with f(x) = (−1 + p)x− x3 and
g(x) = x shown in Fig. 5 (a,b,c,d) corresponds to the soft spin model (or simulation of CIM)
with chaotic amplitude control whereas the second one (e,f,g,h) with f(x) = −x+ tanh[0.99x]
and g(x) = tanh[x] corresponds to an Hopfield neural network with amplitude heterogeneity
error correction. These two figures show that the stability of a given local minima of the Ising
Hamiltonian depends on the value of the target amplitude a as predicted in eq. (20).
7Because the vector σ · h has positive components at local minima, the eigenvalues of D[(σ · h)−1]Ω are the
same as the ones of D[(σ · h)−1] 12 ΩD[(σ · h)−1] 12 (Sylvester’s law of inertia), which is a symmetric real matrix.
Thus, the eigenvalues µj are always real.
20
Figure 5: (a,e) Bifurcation diagram in the space {√a, µ = µj(σ)}, ∀j ∈ {1, . . . , N}, at config-
urationσ. The full red and green dotted lines correspond to the set for whichRe[λ±i ] = 0 (where
fixed points corresponding to local minima become stable) and ∆i = 0 (where oscillations start
to occur around fixed points). Red + and black× symbols correspond to the largest eigenvalues
µ0(σ) of the Jacobian at the ground state and excited states, respectively, of an Ising problem of
size N = 15 and a = 0.03. In (b,f) and (c,g) are shown the dynamics of the variables x and e,
respectively. In (d,h) are shown the Ising Hamiltonian H(t) with horizontal lines showing the
values of the Ising Hamiltonian at local minima. (a,b,c,d) f(x) = (−1 +p)x−x3 and g(x) = x
with p = 0.95, β = 0.1, ξ = 0.1. (e,f,g,h) f(x) = −x + tanh[0.99x] and g(x) = tanh[x] with
β = 0.1, ξ = 0.1.
21
Figure 6: Same as Fig. 5 with a = 0.13.
22
Supplementary material B: FPGA implementation
1. FPGA implementation of CAC
CAC has been implemented on a XCKU040 Xilinx FPGA integrated into the KU040 develop-
ment board designed by Avnet. The circuit can process Ising problems of more than 2000 spins.
The reconfigurable chip is a regular Ultrascale FPGA including 484,800 flip-flops, 242,400
Look Up Table (LUT), 12,000 Digital Signal Processing (DSP) and 600 Block RAM (BRAM).
Initial value for xi, ei and ωij are sent through Universal Asynchronous Receiver Transmitter
(UART) transmission. UART protocol has been chosen because it does not require a lot of
resources to be implemented and its simplicity.
Data are encoded into 18 bits fixed point with 1 sign bit, 5 integer bits which represents a
good compromise between accuracy and power consumption. The power consumption of the
FPGA is equal or lower to 5W depending on the problem size. The system is defined by its
top-level entity that represents the highest level of organization of the reconfigurable chip.
2. Top level entity
The circuit is organized into four principal building blocks as shown on Fig. 7 (a). Several
clock frequencies have been generated to concentrate the electrical power on the circuits that
need high speed clock when these circuits constitute a bottleneck in the current implementation.
Finally, the organization of the main circuit is represented in Fig. 7 (b). The control block is
composed of Finite State Machines (FSM) and is well tuned to pilot all computation core, the
Random-Access Memory (RAM), and data flowing between computation cores and RAM. All
circuits are synchronized although the system utilizes multiple clock frequencies. Clock Enable
(CE) ports on the BUFGCE component are used to stop the power when a circuit is not used in
order to reduce further the global power dissipation.
23
Figure 7: Organization of the FPGA circuit. (a) The top layer entity is divided into four mod-
ules: The Phased-Locked Loop (PLL) that convert a differential clock of 250MHz into three
clocks fg=50MHz, fx=300Mhz and fe=100MHz respectively representing the global clock, the
clock for xi and the clock for ei. Two UART modules are used to receives parameters and to
return the results of the computation. (b) The Max-Cut circuit is composed of several processes
aiming to control the exchange of data between the memory and the computation cores (xi, ei,
xiωij and Ising). The three generated clocks are controlled by the Clock Enable (CE) port of
the BUFGCE component of the FPGA.
24
3. Temporal organization of the computation
The control circuit pilot the computation in order to calculate several steps of xi and ei per cal-
culation of the Ising coupling. The dynamics of CAC (see eqs. (1), (2) and (3)) is implemented
based on the following pseudocode:
Algorithm Pseudocode from which the FPGA implementation is adapted
1: for ν ∈ {0, · · · , K} do . Iterate on the number of MVMs
2: x′ ← x . Save state
3: I ← e(˙Ωx′) . calculation of injection term
4: σ ← x′i|x′i|
5: H ← −1
2
σ(˙Ωσ) . Ising energy calculation
6: for i ∈ {0, · · · , nx} do . Update nonlinear terms
7: ∆x ← (−1 + p)x− (x)3 + I
8: x← x+∆xdtx
9: for i ∈ {0, · · · , ny} do . Update error terms
10: ∆e ← −β((x′)2 − a)e
11: e← e+∆edte
12: β ← β + λdt . Update β
13: ∆H ← H −Hopt . Update a
14: a← α + ρtanh(δ∆H)
15: if ν − νc > τ/dt then . Reset of β
16: νc ← ν
17: β ← 0
18: if H < Hopt then . Update optimal H
19: Hopt ← H
20: νopt ← ν
21: νc ← ν
Note that the order of operations described in the pseudocode are a simplification of the ones
occurring on the FPGA.
The temporal organization of the circuit represented in Fig. 8 shows how is controlled the
reading and writing state on the RAM in the case of a problem size N = 500 spins. Fig. 8
(a) shows the case of a single xi and ei calculation per dot product which is the classic way
to integrate the differential equations (1) and (2) using the Euler method. Fig. 8 (b) represents
the case of eight and four calculations of xi and ei, respectively, per dot product. In this case,
the error correction term is computed with a normalized time step of 2−4 (half of the time step
25
used in the computation of xi). The total power consumption can be reduced by increasing the
frequency of circuits involved in the calculation of xi because these circuits are smaller than
those involved in the dot product calculation (see Fig. 9 (d) and (e)). Increasing the problem
size will increase the power consumption by filling up the calculation pipelines but this can
be balanced out by reducing the frequency used for the calculation of xi for larger problem
size. The end of the dot product is followed by an update of all RAM and saving of σi which
correspond to the Most Significant Bit (MSB) of xi values. This step is not represented here.
Figure 8: Temporal representation of the circuits where the red bars represent the computation
cores and the blue bars the use of RAM. Here, (r) stand for read and (w) for write. (a) One
time step representing the classical way of solving differential equation. In this configuration,
the dot product create a bottleneck to the computation speed. (b) 8 time steps for xi and 4 time
step for xi accelerating the computation time to find the ground state energy and create a new
bottleneck to the number of possible time step possible. Note that the time between every red
bars are necessary update the xi RAM which is not represented in this figure for better clarity.
The use of several xi and ei calculations per dot product allows to reduce the bottleneck
26
created by the later whose calculation is in principle the most time consuming. Fig. 9 (a) to (c)
show the reduction in time to solution vs. the number of xi calculations per dot product.
Figure 9: (a) The number of iterations of the x variable to solution vs. the number of update,
noted nx, of the x variable per matrix-vector product. (b) The number of matrix-vector multi-
plications MVM to solution. (c) Real time to solution. (d) Maximum possible time step with
and without clock modulation on xi. (e) Maximum possible time step with and without clock
modulation on ei.
4. Coupling
Overview of the coupling circuit
The dot product operation (see Fig. 10) is preceded by a multiplication by β so that the domain
of xi is reduced and, in turn, the number of digits required to encode the integer part of xi. The
result of the dot product is multiplied by the error correction term as shown in Fig. 10. Since the
dot product is performed at 50MHz clock speed, the optional registers of the DSP are no longer
27
required and have been removed for the two multiplications resulting in 1 clock cycle operation
for each operation.
Figure 10: The circuit designed to compute the coupling is composed of three parts: the multi-
plication between xi and β, the dot product, and multiplication with the error correction term.
High speed and massively parallel multiplication
The multiplication of the elements of a vector x by the element of a matrix Ω can be performed
using an approximation based on a specific circuit combining logic equations describing the
behavior of a multiplexer and the optimized use of FPGA components. This circuit allows the
design to achieve 10,000 multiplications in 1 clock cycle. In our case an element of x is fixed
point binary vector on k bits that are multiplied by an element of a matrix composed of two
bits vectors where ω is an element of Ω and ω ∈ {−1, 0, 1}. The behavior of a multiplexer
that implements the multiplication of x by ω by selecting R ∈ {−x, 0, x} is given as follows (x¯
represent −x):
R = xω¯1ω¯0 + x¯ω1ω0 (21)
where x is an element of a vector, ωi a binary vector and an element of the matrix Ω with i the
index of the binary vector that is either 1, the most significant bit (MSB) or 0, the less significant
bit (LSB). If we expand eq. (21) to each bits xi of the vector x and use Boolean operation, we
obtain the equation eq. (22) given as follows:
R = xω¯1ω0 + (x¯⊕ C)ω1ω0 (22)
28
where −x is represented by the two-complement operation x¯ ⊕ C that consist of inverting the
bits of x and adding C a constant corresponding to 2−d with d the decimal part size of the vector.
Here, x¯ now represents the bit-wise not operation. Applying De Morgans law on eq. (22) will
lead to the following:
R = xω¯1ω0 + x¯ω1ω0 ⊕ Cω1ω0, (23)
R = ω0(x⊕ ω1)⊕ Cω1ω0, (24)
R = ω0(x⊕ ω1), (25)
In eq. (24), we consider C = 0 which introduces an absolute error of −2−d when the MSB and
the LSB of ω are equal to 0. The aim of such approximation is that eq. (25) can be implemented
by a single LUT3 component. Consequently, achieving 10,000 operations require k×104 LUT3
where k represents the precision of x and is chosen to lower the required resources and error.
First adder stage
The circuit shown in Fig. 11 represents the operations of multiplication used in the dot product
based on eq. (25). To accelerate the computation time, multiple access memory is utilized:
Fig. 11 (a) and (b) show the implementation of multiple block RAM (BRAM) that output 100
rows of 100 values at the same time. Eq. (25) is implemented in LUT3 as described in the
circuit of the Fig. 11 (c) which compute the addition of two elements of the vector {ωijxi}i.
Using LUT3 for the elements at index 2i, with i the index of the generated vectors beginning at
0, and multiplexers for the elements at index 2i + 1 ensures the use of lower resources and the
optimal use of the configurable logic block (CLB).
DSP tree
For block matrices and vectors of size u with u = 100, the dot product requires to perform
10,000 multiplications and 8,100 additions. The circuit of the Fig. 11 (c) computes two multi-
plications and one addition. Thus, by reproducing this circuit 5,000 times, the FPGA computes
15,000 operations in a clock cycle and generates 100 vectors of 50 values that need to be added.
This operation is done using an adder tree represented in Fig. 12. A first stage of adder, that is
29
not represented between the circuit of the Fig. 11 (c) and Fig. 12, is realized using the CARRY8
component that reduces the 100 vectors of 100 elements to 100 vectors of 25 elements each.
The remaining elements are then added using a DSP tree in adder mode. The advantage of using
an adder tree of DSP is the low LUT number that is required, the optimal use of power con-
sumption and the possibility to increase the frequency of the circuit. The Ultrascale architecture
provided by the FPGA KU040 possesses a high number of DSP that can be cascaded allowing
to reduce the routing circuit and the computation time. The DSP tree of Fig. 12 is repeated 100
times to compute at the same time all elements of the dot product resulting in the use of 1,200
DSP.
Scalability of such architecture for bigger FPGA
The number of clock cycles required to perform the dot product is given as follows:
Ct(u,N) = K + (N/u)
2 (26)
where N is the problem size, u the divider that partitions the matrix (in the current implemen-
tation u=100) and K the number of clock cycles required for the multiplications and additions.
The adder tree composed of CARRY8 and DSP previously proposed is constrained by the fol-
lowing condition:
N
2hC + 5hD
= 1, (27)
where hC represents the height of the CARRY8 tree and hD the height of the DSP tree. The
CARRY8 requires only one clock cycle when two cascaded DSP need 5 clock cycles. The
height K of the adder tree (in clock cycle) is then:
K = hC + 5hD. (28)
We can fix hC as a constant according to the proposed FPGA circuit and find hD as follow:
30
N = 2hC + 5hD , (29)
N − 2hC = 5hD ,
log(N − 2hC ) = hDlog(5),
hD =
log(N − 2hC )
log(5)
. (30)
Then the height K is equal to:
K = hC + 5
log(N − 2hC )
log(5)
. (31)
The adder tree increases logarithmically if we assume that an infinite amount of resources is
available. Also, we show here that the design can be significantly improved if u become larger
with more available resources.
5. Ising energy circuit
The Ising energy is computed at the same time as the main dot product of xi by ωij because it
shares the same output from the RAM that store the ωij . As for the dot product, the Ising energy
has been computed using logic equations to fit in a minimal number of LUT. The logic equation
for the multiplication of σ by the matrix Ω can be described as follows:
S1 = (ω1 ⊕ σj)ω0, (32)
S0 = w0, (33)
where σi is the sign of xi and S is a 2 bits vector representing the multiplication of σi at index i
by ω ( representing ωij) which is also represented by 2 bits whose index is either 0 (LSB) or 1
(MSB).
The Fig. 13 shows a schematic of the circuit used to compute the Ising energy. Note that
the Ising energy of a matrix M divided into matrices mij , where i and j are the indexes of the
partitioned M, is equal to the sum of the energies of mij . Thus, the output of the pipelined Ising
energy circuit is added with itself.
31
6. Non-linear term
To optimize the use of the electrical power, xi has been designed to use the highest frequency
fx and the error correction use and intermediate frequency fe. Both are computed several times
during the operation of the dot product to reduce the computation time. Fig. 14 shows the circuit
use to compute one element of the vectors xi and ei that are reproduced 100 times. The circuits
use pipelined DSP to compute additions, subtractions and multiplications. A shift register is
used to multiply by the dt of Euler approximation. To reduce the number of DSP into the
design, x2i is shared between xi and ei through a true dual port RAM that can be used with two
different clocks to synchronize the two circuits. The RAM is controlled by an external FSM in
the control module of Fig. 7.
7. Modulation term
The circuits implementing the modulation of the target amplitude a defined in eq. (3) are shown
in Fig. 15. A saturation circuit has been designed to reduce the domain of the sigmoid function.
Pre and post operations are available into the DSP for reducing the use of LUT.
The tangent hyperbolic function is approximated by Lagrange interpolation using only 96
LUT and 1 DSP. The Lagrange interpolation, defined by L, is given as follows:
L(x) = y0 +
y1 − y0
x1 − x0 (x− x0) (34)
where, L(x) is the interpolation function, [x0, y0] the coordinate of the point situated before the
results and [x1, y1] the coordinate of the point after the result (see Fig. 16).
8. Power consumption
Energy consumption of the circuit is determined by the number of logic transitions (from 0 to 1
or 1 to 0) and the frequency with P = 〈s〉CV 2F where P is the power dissipated by a transistor
based circuit, C the switching capacitance, V the voltage and F the frequency, and 〈s〉 the aver-
age number of switch per clock cycle. Voltage and capacitance are FPGA dependent since it is
already manufactured. The digital circuit are at the highest power consumption when the com-
ponents are enabled and when the clock signal drives the Configurable Logic Block (CLB) or
32
the DSP. Thus, Enable and disable the block RAM is efficient to reduce the consumption how-
ever, clock gating shows better performances in this implementation. The methods have been
extended on the available FF of the Xilinx FPGA and DSP which possess clock enable gate.
However, when the design becomes large, high number of signal propagating enable towards
CLB or DSP increase fanout and routing complexity. This can be solved by using BUFGCE
component that are available in the FPGA and able to enable or disable the clock. Thus, when a
given circuit is not needed, the clock can be disabled which will reduce significantly the power
consumption.
The KU040 boards use Infineon regulators IR38060 incorporated voltage and current sen-
sors. These sensors can be access either from the FPGA or from external bus with the PMBUS
protocol which is based on i2c. We used here an Arduino to communicate with the regulators
and record power data.
The power has been measured for different problem sizes and does not exceed 5W. Exper-
iments show that the most critical operation for energy efficiency and computation time is the
dot product which dissipates most of the FPGA power and needs more clock cycles to operate.
33
Figure 11: Representation of the high speed and multiple memory access apply to a circuit used
for the dot product. (a) 100 RAM are instantiated and can be accessed at the same time. Each
RAM corresponds to a row of the matrix Ω. Each element ωij of Ω is encoded on a two bits
vector. (b) As in (a), 100 RAM can be accessed at the same time to compose the vector x. (c)
The Ωi,2j and x2i values are injected into a LUT3 to compute the eq. (25). The x2i+1 values
are injected into a multiplexer (MUX) whose selector is controlled by the LSB of Ωi,2j+1. The
output of the LUT3 and the MUX are propagated through the CARRY8 component that add or
subtract according to the value of Ωi,2j+1 MSB.
34
Figure 12: Schematic representation of the DSP tree following the multiplication and the two
addition stages shown in Fig. 11. This first addition stage takes 5 clock cycles and produces 5
values that can be added into a new DSP stage of 5 clock cycles.
35
Figure 13: Ising energy circuit. The LUTs integrate eqs. (32) and (33) producing vectors of
σjωij . The results are added up using an adder tree composed of CARRY8 resulting in a vector
that is multiplied by σi using a multiplexer (MUX). The MUXs select between the result or its
negation with σi as a selector. Then, the output of MUXs are added and negated.
36
Figure 14: Circuits of the non-linear terms xi and ei. The two terms use different clocks and
share their results through a dual port block RAM allowing to read and write at different speed.
Both circuit use DSP for high speed computation and are generated one hundred times to per-
form parallel computation.
37
Figure 15: Circuit of the target amplitude using pre-subtractor into one DSP and post-subtractor
into the other DSP. A saturation comparator has been added to reduce the range of value at
the input of the sigmoid function. The sigmoid function uses an interpolation of the tangent
hyperbolic.
Figure 16: (a) Schematic representation of the FPGA implementation of the Lagrange inter-
polation function. Each points used for the interpolation is separated by 0.5. A shift register
allows to select the address of interpolated points into the Read Only Memory (ROM). The sign
of x is conserved so that only the positive part is interpolated. (b) Representation of the tanh
function and interpolated points.
38
11. Parameter values used for solving SK spin glass instances
The parameter values used for solving SK spin glass instances are shown in Tab. 2.
Symbol meaning value
β coupling strength 0.25
α target amplitude baseline 3.0
p linear gain 1− ( N
220
)−2
ρ amplitude and gain variation 3
δ sensitivity to energy variations 4
γ rate of increase of ξ 0.00011
τ max. time w/o energy change 1200
nx number of iterations for nonlinear terms 6
ne number of iterations for error terms 4
dtx normalized time-step of nonlinear terms 2−5
dte normalized time-step of error terms 2−4
Table 2: Parameters used for solving SK problem instances.
39
Supplementary material C: scaling vs. recently proposed Ising
machines
In order to assess the scalability of the chaotic amplitude control and its FPGA implementation,
we compare its median time to solution in seconds and extrapolations against that of the NTT
CIM (not parallelized) (9), Hopfield neural network implemented using memresistors (mem-
HNN) (59), restricted Boltzmann machine using a FPGA (FPGA-RBM) (58). Extrapolations
are based on the hypotheses of scaling in eγN and eγ
√
N by fitting the available experimental
data up to N = 100 for mem-HNN and FPGA-RBM, N = 150 for NTT CIM, and N = 700
for FPGA-CAC. Fig. 17 shows that mem-HNN, FPGA-RBM, and NTT CIM have similar
scaling exponents, although FPGA-RBM tends to exhibit a scaling in eγN rather than eγ
√
N for
N ≈ 100 (58). In the hypothesis of scaling in eγ
√
N for these three machines, extrapolations
show that FPGA-CAC exhibits smaller time to solution for N ' 1000 even though we do
not take into account the increase in scaling exponents that would occur when implementing
larger problem sizes for mem-HNN and FPGA-RBM. Fig. 17 shows moreover that the 20W
implementation of FPGA-CAC has smaller time to solution than DA (19) atN ≈ 700. It should
be noted that the experimental data points of Fig. 17 are directly taken from published results
of each Ising machine (9, 58, 59) which are not based on the same benchmark instances. It
can be nonetheless expected that the algorithm implemented in mem-HNN, which is similar to
mean-field annealing, has the same scaling behavior as simCIM and NMFA (see Fig. 2).
40
Figure 17: Median time to solution and extrapolations based on the hypotheses of scaling in eγN
and eγ
√
N by fitting the available experimental data for each Ising machine (see text). Shaded
regions show the 95% error in the scaling exponents. For FPGA-CAC and DA, the lower,
higher, and upper whisker of boxes show the 50th, 80th, and 90th percentiles of the real time to
solution distribution.
41
Supplementary material D: benchmark on GSET
Performance of FPGA-CAC in finding the maximum cuts known, i.e., lowest Ising Hamiltonian
known, of graphs in the GSET benchmark are shown in Tab. 3.
id N Copt C∗ C∗ − Copt p0 < tBLS > (s) < tFPGA > (s) < tBLS > / < tFPGA >
1 800 11624 11624 0 1.00 13.00 0.18 72.16
2 800 11620 11620 0 1.00 41.00 0.57 71.82
3 800 11622 11622 0 1.00 83.00 0.25 329.85
4 800 11646 11646 0 1.00 214.00 0.68 314.29
5 800 11631 11631 0 1.00 14.00 0.26 54.15
6 800 2178 2178 0 0.70 18.00 0.76 23.73
7 800 2006 2006 0 1.00 317.00 0.85 372.20
8 800 2005 2005 0 1.00 195.00 0.53 368.32
9 800 2054 2054 0 0.85 97.00 0.61 158.63
10 800 2000 2000 0 1.00 79.00 0.84 94.17
11 800 564 564 0 0.85 1.00 0.79 1.27
12 800 556 556 0 0.35 2.00 1.01 1.98
13 800 582 582 0 0.25 2.00 1.00 2.01
14 800 3064 3064 0 0.25 119.00 0.94 126.01
15 800 3050 3050 0 1.00 43.00 0.31 138.08
16 800 3052 3052 0 1.00 70.00 0.51 136.65
17 800 3047 3047 0 0.65 96.00 1.28 74.95
18 800 992 992 0 1.00 106.00 0.25 421.13
19 800 906 906 0 0.85 20.00 0.85 23.62
20 800 941 941 0 1.00 9.00 0.11 80.30
21 800 931 931 0 0.40 42.00 1.21 34.79
1 2000 13359 13359 0 0.15 560.00 2.76 203.06
2 2000 13354 13342 -12 0.25 278.00 4.33 64.23
3 2000 13337 13337 0 0.35 311.00 2.48 125.58
4 2000 13340 13340 0 0.25 148.00 9.49 15.59
5 2000 13328 13328 0 0.25 429.00 7.14 60.08
6 2000 3341 3341 0 1.00 449.00 9.96 45.07
7 2000 3298 3298 0 0.20 432.00 13.86 31.18
8 2000 3405 3405 0 0.20 17.00 9.90 1.72
9 2000 3413 3413 0 0.05 283.00 3.12 90.76
10 2000 3310 3309 -1 0.25 285.00 14.22 20.04
11 2000 1410 1410 0 0.05 336.00 35.90 9.36
12 2000 1382 1380 -2 0.05 402.00 5.68 70.74
13 2000 1384 1384 0 0.05 170.00 1.55 109.63
14 2000 7684 7685 1 0.15 442.00 12.94 34.15
15 2000 7678 7679 1 0.20 604.00 19.71 30.64
16 2000 7690 7690 0 0.05 444.00 33.87 13.11
17 2000 7688 7688 0 0.05 461.00 1.94 237.46
18 2000 2408 2408 0 0.45 251.00 15.01 16.72
19 2000 2400 2398 -2 0.05 431.00 2.45 175.71
20 2000 2405 2405 0 0.05 73.00 5.57 13.10
21 2000 2481 2481 0 0.50 183.00 19.15 9.56
1 1000 6660 6660 0 1.00 26.00 0.34 76.92
2 1000 6650 6650 0 1.00 43.00 0.31 140.21
3 1000 6654 6654 0 1.00 104.00 1.12 92.71
4 1000 6649 6649 0 1.00 67.00 1.42 47.24
5 1000 6657 6657 0 0.95 102.00 1.20 85.15
1 1000 3848 3848 0 0.45 81.00 2.62 30.86
2 1000 3851 3851 0 0.80 78.00 2.24 34.88
3 1000 3850 3850 0 0.35 117.00 2.43 48.14
4 1000 3852 3851 -1 0.80 131.00 2.23 58.62
Table 3: Performance of FPGA-CAC implementation in finding the maximum cuts known, i.e., lowest
Ising Hamiltonian known, of graphs in the GSET benchmark. id, Copt, and C∗ are the name of instances,
best maximum cuts known from (27, 52) and the proposed method, respectively, after 20 runs. p0 is the
probability that FPGA-CAC finds the cut C∗ in a single run. Moreover, < tBLS > and < tFPGA > are
the averaged time to solution using BLS written C++ and running on a Xeon E5440 2.83 GHz (52) and
the proposed scheme simulated implemented on the KU040 FPGA, respectively.
We believe a better tuning of the system parameters would allow to find the best known
cut for all instances (except the 2nd instance of N = 2000). For instances 14 and 15 of size
42
N = 2000, FPGA-CAC finds solutions of better quality than previously known from (52)
and (27). The solutions found are given as follows.
• Solution of instance 14 of size N = 2000 with cut 7685:
[1,-1, 1,-1,-1,-1,-1, 1, 1,-1,-1,-1,-1,-1, 1,-1,-1,-1, -1, 1, 1, 1,-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,-1,-1,
1, -1, 1, 1, 1,-1, 1,-1,-1,-1,-1, 1, 1, 1,-1,-1,-1,-1, 1, 1, 1,-1, 1, 1, 1,-1, 1, 1, 1,-1, 1,-1,-1,-1,-1,-1,-1,
-1,-1,-1, 1, 1,-1, 1, 1, 1, 1, 1, 1,-1,-1, 1, 1,-1, 1, 1,-1,-1,-1,-1,-1, 1, 1, 1,-1, 1,-1,-1, 1,-1,-1,-1, 1,
1,-1,-1, 1, 1,-1,-1,-1,-1, 1,-1, 1, 1,-1, 1, 1, 1,-1, 1,-1,-1, 1,-1, 1,-1, 1, 1,-1,-1,-1,-1, 1, 1, 1, 1, 1,
1, 1, 1,-1, 1,-1, 1,-1,-1, 1,-1, 1, 1, 1, 1,-1, 1, 1, 1, 1, 1,-1, 1, 1, 1,-1, 1,-1, 1,-1,-1, 1, 1, 1,-1, 1,
1, 1,-1,-1, 1,-1, 1, 1,-1,-1,-1, 1,-1, 1, 1,-1, 1, 1, 1, 1,-1, 1,-1,-1, 1, 1, 1, 1, 1, 1, 1,-1, 1, 1,-1,-1,
-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,-1,-1, 1, 1, 1,-1, 1,-1, 1, 1,-1, 1, 1, 1, 1, 1,-1, 1, 1,-1,-1,-1, 1, 1,
1, 1, 1,-1,-1, 1, 1, 1, 1,-1, 1, 1, 1, 1,-1,-1, 1,-1, 1,-1,-1, 1,-1,-1, 1, 1,-1, 1, 1,-1,-1, 1,-1,-1,-1, 1,
-1, 1, 1, 1, 1,-1, 1, 1, 1,-1,-1, 1, 1, 1, 1,-1,-1,-1, 1,-1, 1,-1,-1, 1,-1,-1, 1, 1, 1,-1, 1, 1,-1,-1, 1,-1,
-1, 1,-1,-1, 1, 1,-1, 1, 1, 1, 1, 1, 1, 1, 1,-1,-1,-1, -1, 1, 1,-1, 1,-1, 1, 1, 1, 1, 1,-1, 1,-1, 1, 1,-1, 1,
-1, 1,-1,-1, 1, 1,-1, 1,-1, 1,-1, 1, 1,-1,-1,-1, 1,-1, -1, 1,-1, 1, 1,-1, 1,-1,-1,-1,-1, 1,-1,-1,-1, 1, 1,-1,
-1,-1,-1,-1, 1, 1, 1, 1, 1,-1,-1, 1,-1, 1,-1, 1,-1, 1, -1, 1, 1, 1,-1,-1,-1,-1,-1, 1,-1, 1,-1, 1,-1, 1, 1,-1,
1,-1, 1, 1,-1,-1, 1, 1, 1,-1,-1, 1, 1,-1, 1, 1,-1,-1, -1, 1,-1, 1,-1, 1, 1,-1,-1, 1, 1, 1,-1,-1, 1,-1,-1, 1,
1,-1, 1, 1, 1,-1, 1,-1, 1,-1,-1, 1, 1,-1,-1, 1, 1, 1, -1,-1,-1,-1, 1, 1, 1,-1,-1,-1, 1, 1,-1, 1, 1,-1, 1, 1,
1,-1, 1,-1,-1,-1,-1,-1,-1,-1, 1, 1, 1, 1, 1,-1,-1, 1, 1,-1,-1, 1, 1,-1,-1,-1, 1, 1,-1,-1,-1,-1,-1, 1, 1,-1,
-1, 1, 1, 1,-1, 1, 1,-1, 1, 1,-1,-1,-1,-1,-1, 1, 1,-1, 1,-1, 1,-1,-1,-1, 1,-1,-1,-1, 1,-1, 1,-1,-1, 1, 1,
1, -1, 1, 1, 1,-1,-1, 1,-1,-1,-1, 1,-1, 1,-1,-1, 1, 1,-1, -1,-1,-1, 1,-1,-1,-1,-1,-1, 1,-1,-1,-1,-1,-1,-1,
1,-1, 1,-1, 1, 1, 1, 1, 1,-1, 1, 1, 1,-1, 1,-1,-1, 1, 1, 1, -1,-1,-1, 1, 1, 1,-1, 1,-1,-1,-1,-1, 1,-1,-1,-1,-1,
1, -1,-1, 1, 1,-1,-1, 1,-1,-1,-1,-1,-1,-1,-1, 1,-1,-1,-1, 1,-1,-1, 1,-1,-1, 1, 1,-1,-1,-1,-1,-1, 1, 1,-1,
1, 1, -1,-1, 1, 1, 1,-1,-1,-1,-1, 1,-1, 1,-1,-1, 1,-1, 1,-1, -1,-1,-1,-1, 1, 1,-1,-1,-1, 1,-1,-1,-1,-1, 1,
1,-1,-1, 1,-1, 1,-1, 1,-1, 1, 1, 1,-1,-1,-1,-1,-1, 1,-1,-1, 1, 1, 1,-1, 1, 1,-1,-1,-1,-1, 1, 1,-1,-1, 1,-1,
1,-1,-1, -1, 1, 1,-1, 1,-1, 1,-1,-1, 1, 1,-1,-1, 1,-1,-1,-1,-1, -1,-1,-1, 1, 1, 1,-1, 1, 1, 1,-1, 1, 1, 1, 1,
1,-1, 1, 1, 1, 1,-1, 1,-1, 1, 1, 1,-1, 1, 1, 1,-1, 1,-1, 1, 1, 1,-1,-1,-1, 1,-1,-1,-1, 1, 1,-1, 1, 1,-1, 1, 1,
1,-1, 1,-1,-1, 1, 1, 1, 1,-1, 1, 1,-1,-1,-1,-1,-1,-1, 1,-1, 1, 1,-1,-1,-1, 1, 1,-1, 1,-1, 1, 1, 1,-1,-1,-1,
1, 1, 1,-1,-1,-1, 1,-1, 1,-1,-1,-1,-1,-1,-1, 1,-1,-1,-1,-1, -1,-1, 1, 1,-1,-1, 1,-1,-1,-1, 1, 1, 1, 1, 1,-1,
1, 1, -1,-1,-1, 1,-1, 1, 1, 1, 1, 1,-1,-1,-1, 1, 1, 1, 1, 1, 1,-1,-1,-1, 1, 1,-1,-1, 1, 1,-1, 1,-1, 1, 1,-1, 1,
1, 1, 1, 1,-1, 1, 1,-1,-1, 1,-1, 1,-1, 1, 1,-1, 1, 1,-1, -1, 1,-1, 1,-1, 1, 1,-1,-1,-1, 1,-1, 1, 1, 1, 1,-1,-1,
43
-1,-1, 1,-1, 1,-1, 1, 1,-1,-1, 1, 1,-1, 1,-1, 1,-1, 1, 1,-1, 1, 1, 1, 1, 1,-1, 1,-1, 1, 1, 1,-1,-1,-1, 1, 1,
-1, 1, 1,-1, 1,-1, 1, 1,-1,-1,-1,-1,-1, 1, 1,-1,-1, 1, 1,-1, 1,-1, 1,-1, 1, 1, 1,-1, 1,-1,-1, 1, 1, 1,-1, 1,
1,-1,-1, 1, 1,-1,-1, 1, 1, 1, 1,-1, 1,-1,-1,-1,-1, 1, 1, 1,-1,-1, 1, 1,-1,-1, 1, 1, 1, 1,-1, 1,-1,-1, 1,-1,
-1, 1, 1,-1, 1,-1,-1,-1,-1, 1, 1,-1,-1,-1, 1, 1, 1, 1, 1, 1,-1,-1,-1,-1,-1,-1, 1, 1,-1, 1, 1, 1, 1,-1,-1, 1,
-1,-1,-1, 1,-1,-1,-1, 1, 1, 1, 1,-1, 1,-1, 1, 1, 1,-1, -1, 1, 1, 1,-1, 1,-1, 1, 1, 1,-1, 1, 1,-1,-1, 1,-1,-1,
1, 1,-1,-1,-1,-1, 1,-1,-1,-1,-1, 1, 1,-1, 1, 1, 1, 1, -1, 1,-1, 1, 1,-1, 1,-1,-1,-1, 1,-1, 1,-1,-1,-1, 1,-1,
-1, 1,-1, 1,-1,-1, 1, 1,-1, 1, 1, 1,-1, 1, 1, 1, 1,-1, -1,-1,-1, 1, 1, 1,-1, 1, 1,-1,-1, 1,-1,-1, 1,-1, 1, 1,
-1,-1,-1, 1, 1,-1, 1, 1, 1,-1,-1,-1, 1, 1,-1, 1,-1,-1, 1,-1,-1, 1,-1,-1, 1,-1, 1, 1,-1, 1,-1,-1,-1,-1, 1, 1,
1, 1,-1,-1,-1,-1, 1,-1, 1,-1, 1, 1, 1,-1,-1,-1,-1,-1, 1,-1,-1, 1, 1, 1, 1,-1,-1,-1,-1, 1, 1,-1,-1,-1, 1,-1,
1, 1, 1, 1, 1,-1, 1, 1,-1,-1, 1, 1, 1,-1, 1,-1,-1, 1, 1, 1,-1,-1,-1,-1, 1,-1, 1,-1, 1, 1, 1,-1,-1, 1, 1,-1,
-1, 1,-1,-1,-1,-1,-1,-1, 1, 1,-1,-1,-1,-1,-1,-1, 1,-1, 1, 1,-1,-1, 1,-1,-1,-1, 1, 1,-1, 1, 1,-1,-1, 1, 1, 1,
-1,-1, 1,-1, 1,-1, 1,-1,-1, 1,-1, 1,-1,-1, 1, 1,-1, 1, 1,-1, 1,-1, 1,-1,-1,-1, 1, 1,-1, 1, 1,-1, 1, 1,-1, 1,
-1,-1,-1,-1, 1,-1, 1, 1, 1,-1,-1,-1, 1, 1,-1, 1, 1,-1, 1,-1,-1,-1,-1, 1, 1, 1, 1,-1,-1,-1, 1, 1, 1,-1,-1, 1,
1, 1,-1, 1,-1,-1,-1,-1,-1, 1,-1, 1,-1,-1,-1,-1, 1,-1, -1,-1,-1,-1,-1,-1, 1, 1,-1,-1, 1,-1,-1,-1,-1,-1,-1,-1,
1,-1, 1,-1,-1,-1, 1, 1,-1, 1,-1, 1, 1,-1, 1,-1,-1,-1, 1,-1, 1, 1, 1,-1,-1,-1, 1,-1,-1, 1,-1,-1, 1,-1, 1,-1,
-1,-1,-1,-1, 1, 1, 1, 1,-1, 1,-1,-1, 1,-1,-1,-1,-1,-1, 1,-1, 1, 1,-1, 1, 1,-1, 1,-1, 1,-1, 1,-1, 1, 1,-1, 1,
-1,-1, 1,-1, 1, 1, 1, 1,-1, 1, 1, 1,-1,-1,-1, 1, 1,-1, 1,-1,-1, 1,-1, 1,-1, 1,-1,-1, 1, 1, 1, 1,-1,-1,-1, 1,
-1, 1,-1,-1, 1,-1, 1, 1,-1, 1,-1, 1,-1,-1,-1, 1,-1, 1, -1, 1,-1,-1, 1,-1,-1,-1,-1,-1, 1, 1,-1,-1, 1, 1,-1, 1,
1, 1, 1,-1,-1,-1, 1, 1,-1, 1, 1,-1,-1, 1, 1,-1,-1,-1, -1,-1, 1,-1,-1,-1,-1,-1, 1, 1, 1, 1, 1,-1, 1,-1, 1, 1,
1, 1, 1,-1, 1,-1,-1, 1,-1,-1, 1, 1, 1,-1,-1, 1,-1, 1, -1,-1,-1, 1,-1,-1,-1, 1,-1,-1,-1, 1, 1, 1, 1, 1,-1,-1,
1, 1,-1, 1, 1, 1,-1, 1, 1, 1, 1, 1,-1, 1,-1, 1, 1, 1, -1, 1, 1,-1, 1,-1, 1, 1, 1, 1,-1,-1,-1, 1, 1, 1, 1,-1,
1,-1,-1,-1, 1, 1,-1,-1,-1,-1,-1, 1, 1,-1, 1, 1, 1,-1, 1, 1,-1, 1, 1, 1, 1,-1,-1,-1, 1,-1, 1,-1, 1, 1,-1,-1,
1,-1,-1,-1,-1,-1, 1, 1,-1,-1,-1, 1, 1,-1, 1, 1, 1, 1, 1, 1, 1, 1,-1, 1,-1,-1,-1,-1,-1, 1, 1,-1, 1,-1, 1, 1,
-1, 1,-1, 1, 1,-1,-1,-1, 1, 1,-1,-1, 1,-1,-1,-1, 1, 1, 1, 1,-1,-1,-1,-1, 1, 1,-1,-1, 1, 1,-1, 1, 1,-1,-1,-1,
-1, 1,-1, 1, 1, 1,-1,-1,-1, 1,-1, 1, 1, 1, 1,-1,-1,-1, 1, 1,-1, 1, 1,-1, 1,-1,-1, 1, 1, 1,-1, 1,-1, 1,-1,-1,
-1, 1,-1, 1,-1, 1, 1, 1,-1, 1, 1, 1,-1,-1,-1, 1,-1,-1, -1, 1,-1,-1,-1, 1,-1, 1, 1,-1, 1,-1, 1, 1,-1,-1, 1, 1,
-1,-1,-1,-1, 1, 1,-1, 1,-1, 1, 1, 1,-1, 1,-1, 1, 1, 1, -1, 1,-1, 1,-1,-1, 1, 1,-1,-1,-1,-1, 1, 1,-1, 1,-1, 1,
-1,-1, 1,-1, 1,-1, 1, 1, 1, 1, 1, 1, 1,-1,-1, 1,-1, 1, 1, 1,-1, 1, 1, 1,-1, 1, 1, 1,-1, 1, 1, 1,-1, 1, 1, 1, -1,
1, 1,-1, 1, 1,-1,-1, 1, 1, 1,-1, 1, 1, 1,-1,-1, 1, 1, 1]
• Solution of instance 15 of size N = 2000 with cut 7679:
44
[-1, 1,-1,-1,-1,-1, 1,-1,-1,-1,-1,-1, 1,-1, 1, 1,-1,-1, -1,-1, 1,-1, 1,-1,-1, 1,-1, 1,-1, 1,-1,-1,-1, 1,
1,-1, -1, 1,-1, 1, 1, 1, 1,-1, 1, 1,-1, 1,-1,-1, 1, 1, 1, 1, 1,-1,-1,-1, 1, 1,-1,-1, 1, 1, 1,-1,-1, 1, 1, 1,
1,-1, 1, 1,-1,-1, 1,-1, 1, 1, 1,-1, 1,-1, 1, 1,-1,-1,-1,-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,-1, 1,-1,-1,
1, -1,-1,-1,-1,-1,-1, 1, 1, 1,-1,-1, 1,-1, 1,-1,-1, 1, 1, -1, 1,-1,-1,-1,-1, 1, 1,-1,-1, 1, 1,-1,-1, 1, 1,
1,-1, 1, 1, 1, 1, 1,-1,-1, 1, 1, 1, 1, 1, 1,-1, 1, 1, 1,-1, 1,-1, 1,-1, 1, 1,-1,-1, 1, 1, 1,-1,-1, 1,-1, 1,-1,
1, 1, 1, 1, 1, 1, 1,-1, 1, 1, 1, 1,-1, 1, 1, 1, 1, 1, 1, -1, 1,-1, 1,-1,-1, 1,-1,-1, 1,-1,-1, 1,-1, 1, 1,-1,-1,
1, 1, 1, 1, 1,-1, 1,-1,-1,-1, 1,-1, 1,-1, 1,-1, 1, 1, -1,-1,-1, 1,-1, 1, 1, 1,-1,-1, 1, 1,-1,-1, 1, 1,-1,-1,
1,-1, 1, 1,-1,-1, 1,-1,-1,-1, 1,-1,-1, 1,-1,-1,-1, 1, -1,-1, 1,-1, 1,-1, 1, 1, 1,-1, 1, 1,-1,-1,-1,-1, 1,-1,
1,-1, 1, 1, 1, 1,-1,-1, 1, 1,-1, 1, 1,-1,-1, 1, 1, 1, 1, 1, 1, 1, 1,-1, 1,-1, 1, 1,-1, 1, 1,-1, 1, 1, 1, 1,
-1,-1, 1, 1, 1,-1,-1,-1,-1,-1, 1, 1,-1, 1, 1,-1,-1, 1, 1, 1,-1, 1,-1, 1, 1,-1,-1, 1,-1, 1, 1, 1, 1, 1,-1,-1,
1, 1, 1, 1, 1,-1,-1,-1,-1, 1,-1,-1, 1,-1, 1, 1, 1, 1, 1,-1,-1, 1, 1,-1, 1,-1,-1, 1,-1,-1, 1,-1, 1,-1,-1,-1,
-1,-1,-1, 1, 1,-1, 1,-1,-1, 1, 1, 1,-1, 1,-1, 1,-1,-1, 1,-1, 1,-1,-1,-1, 1, 1, 1,-1, 1,-1,-1, 1,-1, 1,-1,-1,
-1, 1, 1,-1, 1, 1,-1,-1,-1, 1,-1,-1, 1,-1,-1, 1, 1,-1, -1,-1,-1,-1, 1, 1,-1, 1,-1,-1, 1,-1, 1,-1,-1,-1, 1, 1,
-1,-1,-1, 1,-1, 1, 1, 1,-1, 1,-1, 1,-1,-1, 1, 1,-1,-1, -1,-1,-1, 1, 1, 1,-1, 1,-1,-1,-1, 1, 1,-1, 1, 1,-1, 1,
1,-1,-1,-1,-1,-1,-1, 1,-1, 1,-1, 1, 1, 1,-1,-1, 1,-1, -1, 1,-1, 1,-1, 1,-1, 1, 1, 1, 1, 1, 1,-1,-1, 1, 1, 1,
1,-1, 1, 1, 1,-1, 1,-1, 1,-1,-1,-1, 1, 1, 1,-1, 1, 1, 1,-1, 1, 1, 1, 1, 1,-1, 1, 1,-1,-1, 1,-1,-1,-1, 1, 1,
1,-1, 1,-1,-1, 1,-1,-1, 1,-1, 1, 1,-1, 1,-1, 1,-1, 1, -1,-1,-1, 1,-1,-1, 1, 1, 1, 1, 1, 1,-1,-1, 1,-1, 1, 1,
1,-1,-1, 1,-1, 1, 1, 1, 1,-1, 1, 1,-1,-1,-1,-1,-1, 1, -1,-1, 1,-1, 1,-1, 1, 1, 1, 1,-1, 1, 1, 1,-1, 1,-1,-1,
-1,-1, 1, 1, 1, 1, 1, 1,-1,-1,-1, 1,-1, 1, 1,-1, 1, 1, 1,-1, 1, 1,-1,-1, 1, 1, 1, 1,-1,-1, 1,-1, 1,-1, 1,-1,
-1, 1, 1, 1,-1, 1, 1,-1,-1, 1, 1, 1,-1, 1, 1, 1, 1,-1, 1, 1, 1,-1,-1,-1, 1, 1, 1,-1,-1,-1, 1,-1,-1, 1,-1, 1,
1, 1,-1, 1, 1, 1, 1, 1, 1, 1,-1, 1,-1, 1, 1,-1,-1, 1, 1,-1, 1,-1,-1, 1, 1,-1, 1, 1, 1, 1, 1, 1,-1,-1,-1, 1,
1, 1, 1, 1,-1, 1,-1,-1, 1, 1, 1, 1, 1, 1,-1, 1, 1, 1, -1,-1,-1, 1, 1,-1, 1,-1, 1,-1, 1,-1,-1, 1, 1,-1,-1, 1,
-1, 1, 1,-1, 1, 1,-1,-1, 1, 1,-1, 1,-1, 1,-1,-1,-1, 1, -1, 1, 1, 1, 1,-1, 1, 1,-1,-1, 1,-1, 1, 1, 1,-1, 1,-1,
-1, 1, 1,-1, 1,-1, 1,-1, 1,-1,-1, 1,-1,-1,-1, 1,-1,-1, -1,-1,-1,-1, 1, 1,-1, 1, 1, 1,-1,-1,-1,-1,-1,-1,-1, 1,
-1,-1, 1,-1,-1,-1, 1,-1, 1, 1,-1,-1,-1,-1,-1, 1,-1,-1, -1, 1,-1,-1,-1,-1,-1, 1, 1, 1, 1,-1, 1,-1, 1,-1, 1,-1,
1,-1,-1, 1,-1,-1,-1, 1,-1,-1,-1,-1,-1, 1, 1,-1,-1,-1, -1, 1,-1, 1,-1,-1,-1, 1, 1, 1,-1,-1, 1, 1,-1, 1,-1, 1,
-1, 1,-1,-1, 1,-1, 1, 1,-1,-1, 1,-1,-1, 1,-1,-1, 1,-1, 1,-1,-1,-1, 1,-1,-1, 1, 1,-1, 1,-1,-1, 1,-1,-1,-1, 1,
1, 1,-1,-1,-1, 1,-1, 1, 1,-1, 1,-1,-1,-1, 1,-1, 1,-1, 1, 1,-1,-1,-1, 1,-1,-1, 1, 1, 1,-1,-1, 1,-1,-1, 1,-1,
1, 1,-1,-1,-1, 1,-1,-1,-1, 1, 1, 1,-1,-1, 1, 1, 1,-1, -1,-1,-1, 1, 1, 1, 1,-1,-1, 1,-1, 1,-1, 1,-1, 1,-1, 1,
-1, 1, 1, 1,-1, 1,-1,-1, 1,-1,-1, 1, 1, 1,-1, 1,-1,-1, -1, 1, 1, 1,-1,-1, 1,-1, 1, 1, 1, 1,-1, 1,-1,-1,-1, 1,
45
1, 1,-1,-1, 1,-1,-1, 1, 1,-1,-1,-1,-1,-1,-1, 1, 1, 1, 1,-1, 1, 1,-1, 1, 1,-1,-1,-1, 1, 1, 1,-1, 1, 1,-1,-1,
-1,-1, 1,-1,-1, 1,-1,-1, 1, 1,-1, 1,-1, 1,-1,-1, 1,-1, -1, 1,-1,-1, 1,-1,-1,-1,-1, 1,-1, 1, 1,-1,-1, 1, 1, 1,
1,-1,-1, 1, 1, 1, 1,-1, 1, 1, 1,-1,-1, 1, 1, 1, 1, 1, -1,-1, 1,-1,-1,-1, 1,-1,-1,-1, 1,-1,-1, 1, 1, 1,-1, 1,
-1,-1,-1, 1,-1,-1,-1,-1,-1, 1, 1,-1, 1, 1, 1,-1, 1, 1, 1,-1, 1, 1,-1, 1, 1, 1, 1,-1, 1, 1, 1,-1,-1,-1,-1,-1,
1, 1,-1, 1,-1, 1,-1,-1,-1, 1,-1, 1,-1,-1,-1, 1, 1, 1, 1,-1, 1,-1,-1, 1,-1,-1,-1, 1,-1,-1, 1, 1,-1, 1,-1,-1,
-1,-1,-1, 1, 1,-1, 1, 1, 1,-1,-1,-1, 1, 1, 1, 1,-1, 1, 1, 1, 1,-1, 1, 1,-1, 1,-1, 1, 1, 1, 1, 1, 1,-1,-1, 1,
-1,-1,-1,-1,-1, 1,-1,-1,-1, 1,-1, 1, 1,-1,-1, 1, 1, 1, 1, 1,-1, 1,-1,-1,-1, 1, 1, 1, 1, 1,-1,-1, 1, 1,-1,-1,
-1,-1,-1, 1, 1, 1,-1, 1,-1, 1, 1, 1,-1,-1,-1,-1, 1, 1, 1,-1, 1, 1,-1, 1, 1, 1,-1, 1, 1,-1, 1,-1, 1, 1,-1,-1,
-1, 1,-1,-1, 1, 1, 1, 1,-1, 1, 1,-1,-1, 1, 1,-1, 1, 1, 1, 1,-1, 1, 1, 1, 1, 1,-1, 1,-1,-1,-1,-1, 1,-1,-1, 1,
1,-1, 1, 1,-1, 1, 1,-1,-1, 1,-1, 1, 1, 1,-1, 1, 1,-1, -1,-1, 1, 1,-1,-1,-1, 1,-1, 1, 1, 1, 1, 1,-1, 1, 1, 1,
1,-1,-1, 1,-1, 1, 1, 1,-1, 1,-1, 1, 1,-1,-1,-1, 1,-1, 1,-1, 1, 1,-1, 1,-1, 1,-1, 1, 1,-1,-1, 1,-1, 1, 1, 1,
-1, 1,-1,-1, 1, 1,-1, 1,-1,-1,-1,-1,-1, 1, 1,-1,-1,-1, 1, 1, 1, 1,-1,-1,-1, 1,-1,-1, 1,-1, 1, 1, 1,-1, 1, 1,
1, 1,-1,-1,-1, 1, 1,-1, 1, 1,-1,-1, 1, 1,-1,-1, 1, 1, -1,-1, 1,-1, 1, 1, 1,-1,-1,-1, 1, 1,-1, 1,-1, 1, 1,-1,
-1, 1, 1, 1,-1, 1, 1,-1, 1,-1,-1,-1, 1, 1, 1, 1, 1,-1, -1,-1, 1,-1,-1,-1, 1, 1, 1,-1,-1, 1,-1, 1,-1,-1,-1,-1,
-1, 1, 1,-1, 1, 1,-1, 1, 1,-1, 1, 1,-1,-1, 1,-1, 1,-1, 1,-1, 1,-1, 1, 1, 1, 1, 1,-1,-1, 1, 1, 1,-1,-1,-1,-1,
-1, 1, 1,-1,-1,-1, 1,-1,-1,-1,-1, 1, 1,-1,-1,-1,-1, 1, 1,-1,-1,-1, 1, 1,-1, 1, 1,-1,-1,-1, 1, 1,-1,-1, 1,-1,
1, 1, 1,-1,-1, 1, 1,-1,-1, 1, 1, 1, 1, 1,-1, 1,-1,-1, 1, 1,-1, 1,-1, 1,-1,-1, 1, 1,-1, 1, 1,-1, 1, 1,-1, 1,
1, 1, 1, 1, 1,-1, 1, 1,-1,-1,-1,-1, 1, 1, 1, 1,-1,-1, -1,-1, 1, 1,-1,-1, 1, 1,-1, 1, 1, 1, 1,-1, 1, 1, 1, 1,
-1, 1,-1,-1, 1, 1, 1,-1, 1,-1,-1,-1,-1, 1,-1,-1, 1,-1, -1,-1,-1,-1, 1, 1, 1, 1, 1,-1, 1,-1,-1,-1,-1,-1,-1, 1,
1, 1, 1, 1,-1, 1,-1, 1,-1, 1,-1,-1,-1, 1,-1,-1,-1, 1, 1,-1, 1,-1, 1, 1, 1, 1,-1,-1,-1, 1,-1, 1,-1, 1, 1, 1,
-1,-1,-1, 1,-1, 1,-1,-1, 1, 1, 1, 1,-1,-1,-1,-1,-1, 1, 1, 1,-1, 1, 1, 1, 1,-1,-1, 1, 1, 1, 1,-1, 1, 1, 1, 1,
1,-1,-1, 1, 1,-1,-1, 1, 1,-1, 1, 1,-1, 1,-1, 1, 1, 1, 1,-1,-1, 1, 1, 1,-1,-1, 1,-1,-1,-1, 1,-1,-1, 1, 1,-1,
1,-1,-1,-1,-1,-1, 1, 1,-1,-1,-1,-1, 1, 1, 1,-1,-1,-1, 1, 1, 1, 1,-1,-1, 1,-1,-1, 1,-1, 1,-1, 1, 1,-1, 1, 1,
1, 1, 1, 1,-1, 1, 1, 1,-1, 1, 1,-1, 1, 1, 1, 1,-1,-1, -1,-1,-1,-1, 1,-1,-1, 1,-1, 1,-1,-1,-1,-1,-1,-1, 1,-1,
-1, 1, 1,-1,-1,-1, 1,-1, 1,-1,-1, 1, 1, 1, 1,-1, 1, 1, 1, 1, 1, 1,-1, 1,-1, 1,-1, 1,-1, 1, 1, 1, 1,-1, 1, 1, 1,
1,-1,-1,-1, 1, 1,-1,-1, 1,-1,-1,-1,-1, 1, 1, 1,-1, -1, 1]
46
Parameter values used for solving instances of the GSET are shown in Tab. 4.
Symbol meaning value
 coupling strength 3
d0
α target amplitude baseline 3.0
pi linear gain baseline 1− 400d−2.51
ρ amplitude and gain variation 1.0
δ sensitivity to energy variations 2.6
N
γ rate of increase of β 2
N
τ max. time w/o energy change 9N
nx number of iterations for nonlinear terms 6
ne number of iterations for error terms 4
dtx normalized time-step of nonlinear terms 2−6
dte normalized time-step of error terms 2−4
Table 4: Parameters used for solving GSET problem instances.
where d1 is a function of the maximum degree given as d1 = max{d0, 10} and d0 = meani{
∑
j |ωij |}.
47
Supplementary material E: other algorithms
1. Details of NMFA simulation
Noisy mean-field annealing can be simplified to the following discrete system (21):
yi(n+ 1) = (1− α)yi(n) + αtanh[ 1
σωT (t)
(
∑
j
ωijyj(n)) + σrri], (35)
with σω =
√∑
j J
2
ij . When the noise is not taken into account (i.e., ri = 0), the steady state of eq. (35)
is given as follows:
y∗i = tanh[
1
σωT (t)
(
∑
j
ωijyj(n))], (36)
Note that the solutions of the eq. (36) are the same as those of the TAP naive mean-field equations
(see (22)). Moreover, they are the same as the steady state of eq. (1) when considering the change of
variable yi = g(xi) with g(x) = tanh(x) and βi(t) = 1T (t) . In fact, it can be shown that the two systems
have the same set of attractors (60).
The default parameters used in the numerical simulations are given as follows (21):
Symbol meaning value
α recombination parameter 0.15
σr standard deviation of noise 0.15
Moreover, the temperature T (t) is decreased with time according to an annealing schedule.
The eq. (35) is simulated on a GPU using CUDA code provided in (21). Various parameters of
the temperature scheduling T (t) and parameters α and σr have been tried in order to maximize the
performance of this algorithm in finding the ground state of SK spin glass problems (see Figs. 18 and
19).
48
Figure 18: 50th (a) and 80th (b) percentiles of the step to solution distribution vs. the problems
size N of bimodal Sherrington-Kirkpatrick spin glass problems.
Figure 19: Exponential and inverse linear scaling of T (t) = 1
β(t)
for T = 1000.
49
2. Details of CIM simulation (simCIM)
The physical model of the measurement feedback coherent Ising machine developed in (8) can be sim-
plified as follows:
xi(n+ 1) = AG(xi(n)) + r1 +
√
ξ0Θ(B
∑
j
ωijG(xi(n)) + r3)) (37)
where Θ(x) = R(−Fx;xmax) and R(x; y) is the saturation function defined as R(x; y) = x if |x| < y
and R(x; y) = xmax otherwise. If we assume, for simplicity, that the saturation function Θ(x) is simply
linear with Θ(x) = Ftx, then eq. (37) can be written under the form of eq. (1) by using the following:
f(xi) = AG(xi), g(βxi) = G(xi), βi(t) = F (t)B
√
ξ0, and r1 +
√
ξ0 + r3 = σηi.
Eq. (37) is simulated using a GPU implementation in order to approximate accurately the success
probability when it is small.
50
3. Benchmark set availability
Sherrington-Kirkpatrick instances used in this paper are available upon request. The GSET instances are
available at https://web.stanford.edu/ yyye/yyye/Gset/.
51
