Abstract. We have designed, fabricated and tested an XY-addressable readout system that is specifically tailored to reading superconducting flux qubits in an integrated circuit that could enable adiabatic quantum optimization. In such a system, the flux qubits only need to be read at the end of an adiabatic evolution, when quantum mechanical tunneling has been suppressed, which simplifies many aspects of the readout process. The readout architecture for an N-qubit adiabatic quantum optimization system comprises N hysteretic dc SQUIDs and N rf SQUID latches controlled by 2√N + 2 bias lines. The latching elements are coupled to the qubits and the dc SQUIDs are then coupled to the latching elements. This readout scheme provides two key advantages: First, the latching elements provide exceptional flux sensitivity that significantly exceeds what may be achieved by directly coupling the flux qubits to the dc SQUIDs using a practical mutual inductance. Second, the states of the latching elements are robust against the influence of ac currents generated by the switching of the hysteretic dc SQUIDs, thus allowing one to interrogate the latching elements repeatedly so as to mitigate the effects of stochastic switching of the dc SQUIDs. We demonstrate that it is possible to achieve single-qubit error rates of < 10⁻⁶ with this readout scheme. We have characterized the system-level performance of a 128-qubit readout system and have measured an error probability of 8 × 10⁻⁵ in the presence of optimal latching element bias conditions. PACS numbers: 85.25.Cp, 85.25.Dq, 85.25.Hv
Introduction
Many proposals exist for how to build superconducting quantum computing systems [1, 2] , each with their own unique challenges. One such proposal is to enable adiabatic quantum optimization (AQO) using networks of inductively coupled rf SQUID flux qubits [3] [4] [5] . In AQO, one embeds an optimization problem of interest in the set of local flux biases applied to each qubit and in the set of inductive inter-qubit couplings available in a given processor architecture. The processor is initialized in a state wherein all single qubit tunneling energies ∆ q are much larger than the energy scales involved in the problem of interest. In this case, all qubits will readily relax into their ground states. Thereafter, the AQO algorithm for finding an optimal solution proceeds by smoothly decreasing ∆ q in time until it is much less than the energy scales involved in the problem of interest. If the resultant evolution of the state of the processor is adiabatic, then the final state of the processor will encode an optimal solution to the problem of interest. The challenge that is discussed in this article is that of reading the final state of an AQO processor. Note that at the end of the AQO algorithm ∆ q is negligible, thus naturally terminating the evolution in a state that is diagonal in the qubits' flux bases. This is in contrast to a general purpose gate model (GM) quantum information processor in which the final state of the processor could be a superposition state due to appreciable ∆ q .
Several methods for reading the state of a flux qubit in its flux basis have been reported upon in the literature to date. One approach is to inductively couple a qubit to a hysteretic dc SQUID and then interrogate the latter device using either a current bias ramp [6, 7] or carefully crafted current pulse [8] to discern the bias at which it switches into the voltage state, which depends upon the flux imparted to the dc SQUID by the qubit. One key advantage to this readout mechanism is physical size: dc SQUIDs can be made relatively small, thus improving the prospects for scaling such a readout architecture to many qubit processors.
Furthermore, as shown in this article, one can design scalable readout biasing architectures based upon dc SQUIDs that make economical use of a limited number of external bias controls. On the other hand, while hysteretic dc SQUIDs were used in early GM quantum computation experiments, they have fallen out of favor due to on-chip heating and back-action on the qubits. The first of these drawbacks could be remedied, in principle, by better thermal design. The second drawback is less of an issue for AQO as the quantum computation itself naturally localizes the state of each qubit by making ∆ q negligible by the end of the computation, thus inhibiting transitions within the qubit. As such, hysteretic dc SQUIDs could still be of use in designing scalable readout architectures for AQO processors. Note that readout architectures based upon scalable dispersive techniques [9, 10] , which may prove to be essential in the development of GM processors, take up significantly more physical space than those based upon hysteretic dc SQUIDs but with some work may also be an attractive option for an AQO.
While the arguments presented above suggest that the relatively simple hysteretic dc SQUID readout of flux qubits may be useful in the development of large scalable AQO processors, our research has revealed one more drawback of such an approach. It is well known that unshunted dc SQUIDs when switched into the voltage state act as an on-chip source of microwave radiation. While this feature has been exploited to resonantly excite flux qubits in novel quantum mechanics experiments [11] , this behavior would be highly undesirable in a future functional quantum information processor. We have observed that this behavior is particularly problematic in dense superconducting circuits as the radiation generated by a dc SQUID can resonantly drive multiple flux qubits within its vicinity, thus altering their final states. Part of this issue may simply be the choice of rf SQUID parameters as the design pressures in an AQO processor tend to favor flux qubits with large geometric inductances so as to facilitate multiple inter-qubit inductive couplings. Consequently, in order to achieve appreciable ∆ q , one must lower the designed critical current of the rf SQUID, thus yielding a flux qubit with a relatively low tunnel barrier over which radiation from dc SQUIDs can drive resonant activation. This is clearly unacceptable as the solution to the optimization problem that has been posed to the processor is then corrupted by the act of reading. The novel approach that we have implemented to remedy this problem is to insert a quantum flux parametron (QFP) [12, 13] between each qubit and its dedicated dc SQUID. The QFP is an rf SQUID that possesses a small inductance, a large capacitance and a very large critical current. Consequently, the QFP possesses a substantially larger tunnel barrier ‡ as compared to the flux qubit, which makes the state of the QFP robust in the presence of radiation emanating from dc SQUIDs in the voltage state. 
‡ For comparison, typical barrier heights in our rf SQUID flux qubits are 460 GHz. We have found that doubling the maximum β of our qubits to 4 (and thus raising the barrier to closer to 1 THz) removes the resonant activation problem. The resulting devices are parametrically similar to typical phase qubit rf SQUIDs, which we have measured and found not to have this problem. Unfortunately, a maximum β near 4 makes the requirements on junction asymmetry too strict to be useful.
In our architecture, the QFP is used to quietly latch the final state of a given flux qubit with minimal back-action. Thereafter, the final state of the QFP is read using a hysteretic dc SQUID. Thus, the QFP and dc SQUID combination provides a viable route to a scalable readout architecture that is suitable for an AQO processor. Figure 1 shows a high level schematic of a small portion of an XY-addressable readout system. Since this article is focused upon the readout architecture, we have suppressed the details of the flux qubits and their control circuitry and refer the reader to Refs. [14, 15] for information concerning those elements. Each qubit in the circuit requires two elements for readout: a dc SQUID and a QFP. Each dc SQUID has two shared control lines: a current bias (cb 0 , cb 1 , ...) and a flux bias (fb 0 , fb 1 , ...). All QFP latches share a single activation line ("QFP latch", flux bias referred to as Φ x latch ) and a [...]. Let the inductance and capacitance of the QFP be denoted by L qfp and C qfp , respectively. Each QFP is coupled to its affiliated qubit (dc SQUID) via a mutual inductance M qfp,qu (M dc,qfp ). The top panel of figure 2 shows the physical layout of a portion of the readout system, as implemented in the chips discussed in this article. Note that the QFP is galvanically attached to the qubit; this is done purely to reduce the contribution of the readout to the length and inductance of the qubit.
The bottom panel of this figure shows a cross-section of our planarized fabrication process that was used to produce the test circuits reported upon herein. All results presented in this article were obtained from test circuits containing either 8 or 128 flux qubits, as indicated in the text.
Readout Circuit Design
At a high level, the readout circuit functions as follows: During the course of an adiabatic quantum computation, all current biases cb x and flux biases fb y are set to zero, thus ensuring that the dc SQUIDs are in the zero voltage state. In order to decouple the QFPs from the flux qubits, the QFP latch line is set to provide Φ x latch ≈ Φ 0 /2 to all QFPs, thus suppressing their persistent currents to negligible levels. Once the qubits have reached their final states at the end of an adiabatic quantum computation, the waveform sequence depicted in figure 3 is applied to the readout circuitry. First, the QFP latch bias is raised to provide Φ x latch ≈ Φ 0 to all QFPs, thus raising the rf SQUID tunnel barriers (see figure 4), which adiabatically changes the state of each QFP from the ground state of a monostable potential to the ground state of a bistable potential. The direction of the resultant ground state persistent current flow about the body of each QFP is determined by the polarity of the flux imparted by the qubit to which that QFP is coupled. At the end of this linear ramp the height of the QFP tunnel barriers ∆U qfp has been raised to a sufficiently high level so as to preclude further dynamics (∆U qfp ≫ ħ/√(L qfp C qfp ) ≫ k B T, where T is the temperature of the chip), and the final states of the flux qubits have been latched into the QFPs. Thereafter, a linear current ramp is applied to the bias cb x whose maximum amplitude is significantly lower than the maximum dc SQUID switching current I sw max amongst the population of dc SQUIDs connected in series with cb x , so that only the dc SQUID with a flux bias applied to its fb y line is triggered (see figure 5). The time at which a voltage arises across the dc SQUID is then measured: If the flux arising from the QFP increases (decreases) the total flux applied to the dc SQUID addressed by cb x and fb y , then switching to the voltage state will happen sooner (later) (see figure 7).
This difference in timing reveals the state of the QFP and therefore the state of the qubit. A more detailed discussion of each of the elements in this readout architecture and their operation is presented below. 
Qubit signal
For the purposes of reading the final state of the flux qubits in an AQO processor, the relevant signal is the magnitude of the qubit persistent current when the qubit tunnel barriers have been raised to their maxima. The flux qubits used in our circuits are of the type described in Ref. [14] and had inductances L q ∼ 300 pH, of which 10 pH was reserved for readout, and critical currents I For the qubit parameters cited above, the maximum tunnel barrier height between flux states would be ∆U q /h ∼ 460 GHz. The design pressures that led to this choice are discussed in detail in [14, 15] and so we only briefly repeat the relevant points herein. While in theory one could design flux qubits with larger I c q (and thus a larger ∆U q to provide immunity to any dc SQUID driving signal), asymmetry in the qubit junctions would eventually limit circuit performance as an AQO processor by producing uncorrectable flux offsets in the qubit as I c q is varied during annealing [14] . Furthermore, typical AQO architectures require a substantial number of inter-qubit couplings [3] which necessitates larger L q than may be considered appropriate in the study of flux qubits for implementing GM hardware. As such, the flux qubits in our circuits tend to possess substantially smaller I c q in order to realize appreciable single qubit tunneling energies ∆ q over as broad of a range of annealing parameter as is practical. Consequently, our flux qubits appear to be more susceptible to resonant activation by dc SQUID switching transients than other flux qubits reported upon in the literature.
QFP latch
The QFP is a 2-junction rf SQUID, which, in comparison to our flux qubits [14], possesses a lower inductance L qfp ≈ 65 pH and a substantially higher critical current I c qfp ≈ 12 µA. Consequently, the QFP has a maximum β = 2πL qfp I c qfp /Φ 0 = 2.37, a maximum persistent current I p qfp ≈ 10.5 µA, and a maximum tunnel barrier between its two persistent current states on the order of ∆U qfp /h ∼ 3.4 THz (compare with 0.46 THz for the qubit). The effective 1-dimensional potential energy of a QFP with identical Josephson junctions can be written as
where φ ≡ 2πΦ qfp /Φ 0 is the mean phase across the two Josephson junctions. Note that in tuning Φ x latch from Φ 0 /2 to Φ 0 that U qfp (φ) morphs from being monostable to being bistable (see figure 4) . Raising Φ x latch slowly adiabatically latches the state of each flux qubit into the QFP to which it is coupled for all qubit/QFP pairs in the integrated circuit. Adiabaticity is guaranteed here due to the large signal from the qubit and the relatively low bandwidth of the QFP latch line which has a rise time of approximately 1 µs. Adiabaticity is violated during the latching phase only if there is no signal applied to the QFP, as the two QFP circulating current states are then degenerate in energy in the large β limit. Note that if the qubit state were to be unchanged after dc SQUID switching, then the QFP could be operated fully reversibly: the QFP could then be reset in the presence of the signal in which it was initially latched, so the whole cycle of barrier raising and lowering would then be adiabatic.
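A minimal numerical sketch of this monostable-to-bistable transition, assuming the commonly used reduced form for a symmetric two-junction rf SQUID, U/U 0 = (φ − φ x )²/2 − β cos(πΦ x latch /Φ 0 ) cos φ. This form may differ from the authors' expression in sign convention or normalization. The function names, the grid search, and the 0.06 rad tilt (roughly a 10 mΦ 0 signal) are illustrative choices, not taken from the article.

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum in Wb


def beta_max(l_qfp, ic_qfp):
    """Maximum beta = 2*pi*L*Ic/Phi0, as defined in the text."""
    return 2 * math.pi * l_qfp * ic_qfp / PHI0


def u_qfp(phi, latch_flux, beta, phi_x=0.0):
    """Assumed reduced potential U/U0; latch_flux is in units of Phi0."""
    beta_eff = beta * math.cos(math.pi * latch_flux)
    return 0.5 * (phi - phi_x) ** 2 - beta_eff * math.cos(phi)


def _grid(n=4001, span=math.pi):
    # Uniform phase grid on [-span, span]; resolution is an arbitrary choice.
    return [-span + 2 * span * i / (n - 1) for i in range(n)]


def count_minima(latch_flux, beta, phi_x=0.0):
    """Count interior local minima of the assumed potential on the grid."""
    u = [u_qfp(p, latch_flux, beta, phi_x) for p in _grid()]
    return sum(1 for i in range(1, len(u) - 1)
               if u[i] < u[i - 1] and u[i] < u[i + 1])


def favoured_sign(phi_x, beta, latch_flux=1.0):
    """Sign of the phase at the global minimum, i.e. the latched state."""
    best = min(_grid(), key=lambda p: u_qfp(p, latch_flux, beta, phi_x))
    return 1 if best > 0 else -1


if __name__ == "__main__":
    beta = beta_max(65e-12, 12e-6)
    print(f"beta_max = {beta:.2f}")
    print("minima at latch = Phi0/2:", count_minima(0.5, beta))
    print("minima at latch = Phi0:  ", count_minima(1.0, beta))
    print("state for phi_x = +0.06: ", favoured_sign(0.06, beta))
```

In this picture the qubit signal enters only through the small tilt φ x , which selects the well that becomes the ground state as the barrier is raised.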
One challenge in designing QFPs (or any 2-junction rf SQUID) is that junction asymmetry leads to a Φ x latch -dependent effective flux offset from zero [14] . This offset limits how small one can make M qfp,qu , as one would like the bimodal qubit signal to straddle any such flux offset for values of Φ x latch about which the QFP becomes bistable. Assuming a typical fabrication spread of 1% in junction critical currents and a QFP maximum β ∼ 2.5, this offset could be up to 4 mΦ 0 . For this reason, we had chosen M qfp,qu such that 2I 
Linear detection
By biasing the QFP such that it is barely monostable, it can be used as a preamplifier acting on the signal from the qubit. Realistically, achieving appreciable net flux gain into the dc SQUID requires careful tuning of the QFP and dc SQUID. Instead, to use the latching detector as a linear preamplifier, we apply flux feedback in software to the QFP body loop to keep the QFP at its balance point where we have a 50% probability of reading a plus or minus signal from the QFP. The required flux feedback is then a linear representation of the signal from the qubit. At this balanced population point, one can show that the QFP has minimal back-action on the qubit and can be used directly to probe the magnitude of the circulating current of the qubit and thus can be used to extract qubit parameters as is described in [14] .
Shift register readout
There is a well developed logic family based on QFPs [13] , and it is straightforward to design a QFP based shift register to reduce the line requirements of this readout circuit. We have successfully fabricated and tested an 8 QFP long shift register driven with a three phase clock and having a dc SQUID readout at the end as a prototype of such a readout scheme.
dc SQUID
The method for using a hysteretic dc SQUID for flux detection is discussed in [16] and more generally in [6, 7]. For the implementation discussed herein, current ramp times were typically 10-100 µs and peak voltages from the dc SQUIDs after switching were typically kept to much less than the gap voltage of Nb to reduce heating on chip. The mutual inductance from the dc SQUID to the QFP was M dc,qfp ∼ 1.4 pH.
Figure 5. Schematic I c vs. applied flux for a typical dc SQUID. Applying a signal via the fb y line allows a user to shift the operating point on the abscissa for a given row of dc SQUIDs. Then, when a current bias is applied to the cb x line, only those SQUIDs whose critical current I c has been suppressed will switch.
Basic operation of the XY-addressable array of dc SQUIDs is presented in figure 3. To read out the QFP in the i th column and j th row, we set the current in the fb j line in figure 1 to provide a flux bias of approximately 0.35 Φ 0 into the dc SQUIDs on that row, and set the current in all other lines fb k (k ≠ j) to 0. This suppresses the critical current of the dc SQUIDs in row j to I c = I c0 |cos(0.35π)| ≈ 0.45 I c0 . Then, applying a current ramp of maximum amplitude 0.5 I c0 on the column line cb i will cause the target dc SQUID to switch at a time that gives information about the flux coupled from the QFP. In other words, the fb y lines are used for row selection by flux biasing all dc SQUIDs in the target row to a bias point where a current ramp on the dc SQUID will trigger readout (see figure 5). The cb x lines are used as column selects and all other current biases are kept at zero. After we have registered an event, the current biases are reset to zero, and the process is repeated for all i and j.
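The row and column selection logic can be illustrated with a short sketch. It assumes an idealised symmetric dc SQUID with negligible loop inductance, so that the critical current follows I c = I c0 |cos(πΦ/Φ 0 )|, and it ignores the QFP signal, noise, and device-to-device variation. The function names and the 4 × 4 array size are hypothetical.

```python
import math


def critical_current(flux_bias, ic0=1.0):
    """Idealised dc SQUID critical current; flux_bias is in units of Phi0."""
    return ic0 * abs(math.cos(math.pi * flux_bias))


def switched(rows, cols, sel_row, sel_col, fb_on=0.35, ramp_peak=0.5):
    """Return the set of (row, col) dc SQUIDs that switch to the voltage state.

    Only the selected row receives the flux bias fb_on, and only the
    selected column receives a current ramp of peak amplitude ramp_peak
    (in units of Ic0). All other bias lines are held at zero.
    """
    out = set()
    for r in range(rows):
        fb = fb_on if r == sel_row else 0.0
        for c in range(cols):
            peak = ramp_peak if c == sel_col else 0.0
            if peak > critical_current(fb):
                out.add((r, c))
    return out


if __name__ == "__main__":
    print(f"Ic at 0.35 Phi0: {critical_current(0.35):.3f} Ic0")
    print("switched devices:", switched(4, 4, sel_row=2, sel_col=1))
```

With these numbers only the device at the intersection of the selected row and column switches, because unselected rows keep their full critical current and unselected columns carry no bias current.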
For devices with less than a few thousand qubits it is feasible to provide enough analog lines to run an XY readout scheme as described. Modifying the XY readout scheme by adding another control direction, for example an XYZ readout scheme where X and Z are summed together into the dc SQUID flux bias, changes the scaling of number of required wires to a cube root in the number of qubits, thereby allowing practical scaling to reading tens of thousands of qubits. If margins on the XYZ scheme are too low, then we can implement the QFP shift register method described earlier which uses a constant number of analog lines. In that case we can use excess analog lines to decrease readout time or increase operating margins.
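To make the scaling argument concrete, the sketch below counts bias lines for a square XY array and for a cubic XYZ array. The XY count follows the 2√N + 2 figure quoted in the abstract. The XYZ count of 3N^(1/3) + 2 is an assumption made here for illustration (the text states only the cube-root scaling), as are the function names.

```python
def ceil_root(n, k):
    """Smallest integer m such that m**k >= n (avoids float rounding errors)."""
    m = max(1, round(n ** (1.0 / k)))
    while m ** k < n:
        m += 1
    while m > 1 and (m - 1) ** k >= n:
        m -= 1
    return m


def xy_lines(n_qubits):
    """Rows + columns for a square array, plus the two shared QFP lines."""
    return 2 * ceil_root(n_qubits, 2) + 2


def xyz_lines(n_qubits):
    """Assumed count for a cubic addressing scheme, plus two QFP lines."""
    return 3 * ceil_root(n_qubits, 3) + 2


if __name__ == "__main__":
    for n in (128, 1024, 27000):
        print(n, xy_lines(n), xyz_lines(n))
```

For 1024 qubits arranged as 32 × 32 this gives 66 lines for the XY scheme, consistent with 32 dc SQUIDs per column.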
Electronics
The electronics to run the readout and qubit control were integrated into a compact rack system with 128 output channels, a fraction of which were devoted to readout. The dedicated readout channels had an amplifier, a tunable comparator chain and retriggerable timers in order to detect multiple SQUID switching events per channel. For the purposes of characterizing test circuits, we limited the readout sequence to a rate of approximately one dc SQUID switching event/100 µs per channel. Without modifying the line bandwidth (3 MHz), this could in principle be sped up to roughly one event/5 µs. Further speed-up could be obtained by reading multiple columns simultaneously, but we have some experimental evidence that rf isolation between columns needs to be improved as dc SQUIDs from different columns can possibly interfere. With further work, one can likely reach the bandwidth limitation of the wiring in the refrigerator. For example, if a particular chip were to have 32 columns of dc SQUIDs and each line had a bandwidth of 3 MHz, one could obtain roughly a 50 Mbit/second readout rate.
Power consumption
The readout process dissipates energy onto the chip whenever a dc SQUID is in its normal state. An upper bound on the total energy dissipated by the readout process can be estimated as follows: After the dc SQUID switches to the voltage state, the large capacitance of the refrigerator wiring and filtering (C = 2 nF) starts to charge at a rate set by dV/dt = I switch /C, typically ∼ 500 µV/µs. For simplicity, we ignore the first several nanoseconds after switching, which are dominated by transients and ringing as the lines are not matched to the high impedance the dc SQUIDs present at switching. Until the dc SQUID reaches a significant fraction of the gap voltage there is little dissipation on chip. For fast operation we typically restrict the charging time to less than two to three microseconds to keep the dc SQUID from charging to the gap voltage and generating heat. For the data presented in this paper, though, we allowed many microseconds to pass (see figure 7). The resulting dissipated energy is ∆t × 2.8 mV × 1 µA ∼ 28 fJ, which corresponds to ∆t ∼ 10 µs. If we performed this same inefficient procedure with a 100 µs readout cycle, we would obtain an average dissipated power during readout of ∼ 280 pW, which would be well within the capabilities of a standard dilution refrigerator to dissipate.
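The arithmetic behind this estimate can be checked in a few lines. The switching current of 1 µA is the value implied by the quoted charging rate and line capacitance, and the 10 µs dwell time in the voltage state is inferred from the quoted 28 fJ rather than stated explicitly. The variable names are illustrative.

```python
C_LINE = 2e-9      # F, wiring and filter capacitance quoted in the text
I_SWITCH = 1e-6    # A, implied by 500 uV/us charging of a 2 nF line
V_GAP = 2.8e-3     # V, gap voltage used in the energy estimate
DT_ON = 10e-6      # s, assumed dwell time that reproduces 28 fJ
T_CYCLE = 100e-6   # s, readout cycle time

charge_rate = I_SWITCH / C_LINE        # V/s
time_to_gap = V_GAP / charge_rate      # s, time to charge up to the gap
energy = DT_ON * V_GAP * I_SWITCH      # J, upper bound per switching event
power = energy / T_CYCLE               # W, averaged over one readout cycle

if __name__ == "__main__":
    print(f"charging rate : {charge_rate * 1e-6 * 1e6:.0f} uV/us")
    print(f"time to gap   : {time_to_gap * 1e6:.1f} us")
    print(f"energy/event  : {energy * 1e15:.0f} fJ")
    print(f"average power : {power * 1e12:.0f} pW")
```

The time needed to charge to the gap voltage, about 5.6 µs under these assumptions, is consistent with limiting the charging time to two to three microseconds for fast operation.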
Challenges
While we routinely fabricate and operate XY readout systems based upon the architecture described herein, there were some formidable challenges that we had to overcome in order to operate them at their full potential. While the typical challenges associated with proper design of cryostat wiring, filtering, and shielding were encountered, we focus here on only those issues directly related to the readout scheme. First, failure to yield all junctions is typically a soft failure where one loses either a row or a column or even a single readout element. Since the readout system discussed herein was typically a part of a large integrated circuit, it was possible to at least study large portions of a circuit even in the presence of a few fabrication errors.
Related to fabrication yield is the issue of critical current variation across a chip. For the dc SQUIDs, small variations of critical current are not a significant issue as there is plenty of range between OFF and ON switching currents in this XY scheme. If one wanted to be even more frugal with bias line count, it would be possible by invoking an even higher dimensional scheme, such as XYZ in which one either sums X and Z into the dc SQUID flux bias or Y and Z into the current bias. However, such a scheme would come at the expense of a reduced ON and OFF switching current ratio in the presence of realistic critical current variations.
Similar to critical current variation, flux offsets of each dc SQUID lead to reduced ON/OFF margins as well. In practice, this proved to be the key challenge to operating large scale XY readout schemes in our laboratory. We now use a combination of passive magnetic shielding and active magnetic field compensation to achieve fields below 1 nT perpendicular to the chip. We still typically end up with flux offsets of a few mΦ 0 in QFPs and dc SQUIDs, which may be due to local magnetic impurities on the on-chip wiring. Cross-talk, both between off-chip and on-chip wiring, similarly reduces operating margins.
For the QFPs, asymmetry in the critical currents of their two Josephson junctions directly creates an apparent flux offset that competes with the qubit signal, as discussed previously. Due to the threshold-like latching, small differences in critical current have very little effect, while large differences render the device unusable as the qubit signal cannot overcome the asymmetry-induced flux bias acting on the QFP. Typically, the junction critical current asymmetry in the QFP must be less than 2% for the device to be operable.
We have successfully dealt with all of the issues listed above, and do not expect further scaling problems up to at least one thousand qubits (32 dc SQUIDs in a column).
A further subtle effect which is relevant to fast operation of the readout circuitry is thermally induced hysteresis. If one tries to perform experiments with a fast repetition rate (for tuned dc SQUID ramps, the repetition period is typically 100 µs), then the chip temperature is affected by dc SQUID heating (see discussion in §4 below). This dc SQUID heating depends quite strongly on when the dc SQUID escapes to the voltage state, and therefore on what signal is applied to the dc SQUID. We have found that, to measure any temperature-dependent phenomenon that simultaneously requires high speed and high accuracy, such as the transition width of a QFP or qubit, we needed to temperature stabilize the mixing chamber of the dilution refrigerator at 35-40 mK (well above the < 10 mK base temperature of the refrigerators) and take care to minimize the time the dc SQUID stays in the voltage state.
Readout errors
There are two errors that contribute to imperfect readout. The first is the thermal noise on the QFP, and the second is the uncertainty in the switching time of the dc SQUID caused by the stochastic nature of the quantum tunneling from the zero voltage state to the voltage state [17] . Each of these error mechanisms will be discussed below.
Examining the first error source, figure 6 shows the measured flux transfer curve of a typical QFP device, as obtained by applying a constant flux through the QFP flux bias line and then latching the QFP and measuring the probability of the QFP ending up in one of its two possible states (labelled 1 and -1) by repeating the measurement 1024 times at each flux bias point. To understand the source of the QFP transition width, it suffices to analyze the dynamics of the two lowest QFP energy levels as the barrier ∆U qfp between flux states is raised (as in figure 4 ) in the presence of a thermal bath. As the barrier is raised, the tunneling energy ∆ qfp between localized flux states gets reduced to zero.
Once ∆U qfp is high enough such that k B T and the square root of the integrated environmental noise spectral density W [18] is larger than the QFP ground to first excited state tunneling energy ∆ qfp , we reach the realm of incoherent macroscopic tunneling, as analyzed in [18, 19] . As long as ∆ 2 qfp /W is fast enough, the system is able to stay in thermal equilibrium via incoherent tunneling. At some point ∆ 
we extract an effective T = 47 mK. For comparison, the refrigerator was stabilized at approximately 40 mK. If we decrease the cooling time after the dc SQUID switching event before the QFP latch in the next measurement frame we can characterize the cooling of the QFP after the dc SQUID heating event:
Cooling time (µs)    Temperature (mK)
300                  75 ± 7
1000                 62 ± 6
2000                 55 ± 5
4000                 47 ± 5
When operated with dc SQUID current biases tuned to minimize heating, or with sufficient cooling time such as the effective 47 mK data shown in figure 6 , we obtain a corresponding flux sensitivity of the QFP of k B T /I p qfp = 142 µΦ 0 . For comparison, the qubit signal coupled into the QFP is of order 10 mΦ 0 . This results in practically zero error probability for the latch.
The QFP flux uncertainty (142 µΦ 0 ) is significantly better than that of its accompanying dc SQUID, which was estimated to be 1.8 mΦ 0 from the width of its switching distribution. This dramatic difference in sensitivity arises from the fact that dc SQUID noise results from the stochastic nature of quantum tunneling from the zero voltage to finite voltage state with an equivalent temperature of several hundred mK, while the QFP annealing step leads to a thermal uncertainty near the refrigerator temperature of 40 mK.
Focusing now on the second error source, figure 7 shows measured switching histograms of a single dc SQUID for the case when its respective qubit is first initialized in one state (red) and then in the other (blue). While the dc SQUID switching distributions are characteristic of quantum tunneling and thus asymmetric, we can still characterize the dc SQUID sensitivity by the standard deviation of the switching histogram. The abscissa can be converted from time to flux by direct measurement of the shift in switching time as a known amount of flux is applied to the dc SQUID. The resulting dc SQUID sensitivity is 1.8 mΦ 0 per read, and the signal from the QFP latch is 11.9 mΦ 0 , resulting in a measured error probability of the dc SQUID in reading the QFP as small as 0.01. This is a reasonably low error probability, but we require even lower error rates in order to implement a large scale AQO processor. It is trivial to decrease this dc SQUID limited error by taking more samples of the dc SQUID switching time, since the QFP maintains its state between these samples. The results of this repeated reading of the dc SQUID on its resolving power are shown in figure 7. After 2 reads, the dc SQUID sensitivity improves to 1.3 mΦ 0 and after 4 reads to 0.9 mΦ 0 . At 4 reads we obtain an error probability of 10⁻⁶. This error probability includes qubit initialization errors, QFP errors, and dc SQUID errors. This behavior, while varying slightly quantitatively, was similar for all 8 readouts on this test chip.
Figure 7. Reducing readout errors with repeated dc SQUID sampling of the QFP state. The dc SQUID is interrogated with a 50 µs long linear current bias ramp. The red and blue curves correspond to different initialized flux states of the qubit (which is then adiabatically transferred to the QFP). The three lines (from thin to thick) correspond to 1, 2, and 4 averaged reads of the dc SQUID. Once 4 reads are performed we see no overlap in the data set, which was 4 million points. From the thick lines we extract an error probability of 10⁻⁶ that is quoted in the text. The small steps in the switching histograms are due to the resolution of the room temperature electronics. The measurements shown in this plot were obtained from one readout on an 8-qubit chip.
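The improvement with repeated reads follows the usual 1/√n reduction in the standard deviation of an average of n independent samples. The sketch below reproduces the quoted 1.3 mΦ 0 and 0.9 mΦ 0 figures from the single-read value of 1.8 mΦ 0 . The Gaussian error estimate is an added simplification, not the authors' analysis: the real switching distributions are asymmetric and the measured error rates also include initialization and QFP errors, so the Gaussian figure should be expected to be optimistic. Function names are illustrative.

```python
import math

SIGMA_1 = 1.8   # mPhi0, single-read dc SQUID sensitivity quoted in the text
SIGNAL = 11.9   # mPhi0, separation of the two QFP states at the dc SQUID


def sigma_after(n_reads, sigma_1=SIGMA_1):
    """Standard deviation of the mean of n independent reads."""
    return sigma_1 / math.sqrt(n_reads)


def gaussian_error(n_reads, signal=SIGNAL, sigma_1=SIGMA_1):
    """Misclassification probability for a midpoint threshold.

    Assumes Gaussian noise, which the text says is not strictly true.
    """
    z = (signal / 2.0) / sigma_after(n_reads, sigma_1)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} read(s): sigma = {sigma_after(n):.2f} mPhi0, "
              f"Gaussian error estimate = {gaussian_error(n):.1e}")
```

Repeated sampling is only possible because the QFP holds its state between dc SQUID switching events.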
To demonstrate the performance of the entire readout system, we present the results of a system level margin measurement from a 128-qubit chip in figure 8. This particular chip had two flux biases to provide Φ x qfp to the 128 QFPs: The two lines each biased 64 QFPs, with the two groups partitioned depending upon the physical orientation of the QFP body and denoted as "horizontal" or "vertical". We then studied the error probability of the readout system as a function of horizontal Φ x qfp,h and vertical Φ x qfp,v QFP bias. In order to render the readout test independent of the details of quantum annealing, we prepared the 128 qubits into a known state by setting all of the inter-qubit couplings to zero [16] , biasing all qubits using their local flux sources [15] hard to one side, and then raising their tunnel barriers. This placed each flux qubit into a known state with certainty. We then read the state of each qubit, as described above using a single dc SQUID switching measurement. This sequence was repeated 128 times for every point on a 2-dimensional grid in (Φ this particular readout failed due to the inability of the qubit signal to straddle that QFP's degeneracy point, it contributed 1/128 to the total failure rate, thus dominating the shape of the inner most contour. The next margin contour then required a much larger horizontal flux bias shift to cause a second QFP to fail repeatedly. The further slow increase in failure rates is due to a several mΦ 0 random distribution of flux offsets in the QFPs. In order to better quantify the performance of the 128 qubit readout system, we biased the QFPs to an operating point near the center of the margin plot, (Φ x qfp,h , Φ x qfp,v ) = (−5, 0) mΦ 0 , and performed 65536 measurements in the manner cited above. We observed 5 errors out of 65536 measurements, which then provided an estimated system level error probability of 5/65536 ≈ 8 × 10 −5 . 
The five errors were attributed to different readouts and were probably due to the stochastic nature of the dc SQUID switching. From these results, we can crudely estimate the per qubit error to be on the order of 5/(65536 × 128) ≈ 6 × 10 −7 for the 128 qubit chip. This could be improved upon by simply using repeated dc SQUID reads, as was done with the 8 qubit chip, and the final limit upon the readout error probability will likely be determined by outlier events not captured in the preceding discussions.
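The two error figures quoted above follow directly from the counts. The short check below also adds a rough counting-statistics uncertainty, assuming the 5 errors are independent Poisson-distributed events. That assumption and the variable names are introduced here and are not part of the reported analysis.

```python
import math

ERRORS = 5        # observed failures
SHOTS = 65536     # full-chip readout repetitions
QUBITS = 128      # qubits read per repetition

system_error = ERRORS / SHOTS                # probability per full-chip read
per_qubit_error = ERRORS / (SHOTS * QUBITS)  # crude per-qubit estimate
relative_uncertainty = 1.0 / math.sqrt(ERRORS)  # Poisson, assumed

if __name__ == "__main__":
    print(f"system-level error : {system_error:.1e}")
    print(f"per-qubit error    : {per_qubit_error:.1e}")
    print(f"relative uncertainty on both: about {relative_uncertainty:.0%}")
```

With only five observed events the statistical uncertainty is large, so both figures are best read as order-of-magnitude estimates.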
Conclusion
We have described and implemented a scalable readout scheme that is suitable for a superconducting adiabatic quantum optimization system and measured the readout error rate and margins of 8- and 128-qubit systems. We conclude that the single-qubit readout error rate can be made less than 10⁻⁶ using the QFP-enabled architecture, which makes readout errors a non-issue for large scale integrated circuits of the types considered herein. We have experimentally demonstrated a system-level error rate of 8 × 10⁻⁵ using a 128-qubit circuit. We do not expect any significant challenges in further scaling this architecture to about one thousand qubits. Finally, we have briefly discussed both an XYZ and a QFP shift register based solution that would facilitate scaling up to read even larger numbers of qubits on a single chip.
