Quantum computing architectures rely on classical electronics for control and readout. Employing classical electronics in a feedback loop with the quantum system makes it possible to stabilize states, correct errors, and realize specific feedforward-based quantum computing and communication schemes such as deterministic quantum teleportation. These feedback and feedforward operations are required to be fast compared to the coherence time of the quantum system to minimize the probability of errors. We present a field programmable gate array (FPGA) based digital signal processing system capable of real-time quadrature demodulation, determination of the qubit state and generation of state-dependent feedback trigger signals. The feedback trigger is generated with a latency of 110 ns with respect to the timing of the analog input signal. We characterize the performance of the system for an active qubit initialization protocol based on dispersive readout of a superconducting qubit and discuss potential applications in feedback and feedforward algorithms.
I. INTRODUCTION
Recent quantum physical research is directed towards gaining experimental control of large-scale, strongly interacting quantum systems such as trapped ions [1] and solid-state devices [2]. The ultimate goal is to realize a quantum computer [3-6] with a large number of quantum bits (qubits) which may outperform classical computers for certain computational tasks [7-11]. However, quantum systems do not act as stand-alone components but must be combined with classical electronics to control inputs such as microwave pulses or external magnetic fields and to record and analyze the output signals [12]. Analyzing the output signals in real time can be advantageous to condition input signals on prior measurement results and therefore realize a feedback loop with the quantum system [13].
Quantum feedback schemes [14] make use of the results of quantum measurements to act back onto the quantum state of the system within its coherence time. Experimental realizations of quantum feedback have shown that it is possible to prepare and stabilize non-classical states of electromagnetic fields in optical [15, 16] and microwave [17] cavities, and to enhance the precision of phase measurements using an adaptive homodyne scheme [18] .
The first demonstrations of feedback protocols with superconducting qubits showed active initialization of qubits into their ground state [19] and the stabilization of Rabi and Ramsey oscillations [20, 21]. Further recent feedback experiments with superconducting qubits demonstrated the deterministic preparation of entangled two-qubit states [22, 23], the reversal of measurement-induced dephasing [24], and the stabilization of arbitrary single-qubit states by continuously observing the spontaneous emission from a qubit [25].
* ysalathe@phys.ethz.ch
Quantum feedforward schemes are closely related to quantum feedback schemes. In quantum feedforward schemes one part of a quantum system is measured while the action takes place on another part of the quantum system. A prominent example for a feedforward scheme is the quantum teleportation protocol [26] , which has been realized with active feedforward in quantum optics setups [27] [28] [29] [30] , in molecules using nuclear magnetic resonance [31] , trapped ions [32, 33] , atomic ensembles [34] and solid-state qubits [35, 36] .
The feedback latency is commonly defined as the time required for a single feedback round, i.e. the time between the beginning of the measurement of the state and the completion of the feedback action onto the state. A general requirement to achieve high success probabilities in quantum feedback schemes is that the feedback latency is much shorter than the timescale on which the quantum state decoheres.
Analog feedback schemes such as those reported in Refs. [20, 25] feature feedback latencies on the order of 100 ns, where the latencies are limited by analog bandwidth and delays in the cables in the cryogenic setups. However, analog signal processing circuits have limited flexibility. The flexibility can be improved by using a digital signal processing (DSP) unit in the feedback loop, which can be implemented on a central processing unit (CPU) or on a field programmable gate array (FPGA) [37] . CPU-based DSP systems offer versatile and convenient programming at the cost of several microseconds latency [17, 19] due to the delays introduced by the digital input and output of the signal, which is too slow to achieve very low error probabilities for feedback operations on superconducting qubits.
In this paper, we describe an FPGA-based feedback-capable signal analyzer which allows for real-time digital demodulation of a dispersive readout signal [38-40] and the generation of a qubit-state-dependent trigger with input-to-output latency of 110 ns. Our signal analyzer is therefore among the fastest feedback-capable digital signal analyzers reported so far [21, 22, 41-43]. The capabilities of our signal analyzer enabled the feedforward action in the deterministic quantum teleportation experiment presented in Ref. [35]. In this paper, we illustrate the use of the feedback signal analyzer in a feedback loop for qubit initialization [19] and experimentally characterize its latency and performance.
The paper is organized as follows: in Sec. II we present an overview of a typical feedback loop in which our instrument is used and analyze the feedback latency. In Sec. III we discuss the implementation of the digital signal processing on the FPGA and analyze the processing latencies. Finally, in Sec. IV we experimentally characterize the performance of the feedback loop. In the appendices, we provide more details about our experimental setup and our implementation of the digital signal processing on the FPGA.
II. OVERVIEW OF THE FEEDBACK LOOP
In this section, we explain the elements of a typical feedback loop shown in Fig. 1(a) . We designed the feedback loop to issue pulses onto a superconducting qubit inside a dilution refrigerator conditioned on a measurement of the qubit state by analog and digital signal processing using cryogenic and room-temperature electronics. We first discuss the elements of the detection scheme and the actuator electronics and then present the latencies of the feedback loop. We provide a detailed description of our experimental setup in App. A.
A. Principle of the detection scheme
We consider the dispersive readout of the state of transmon qubits [44, 45] with typical frequencies ω_q/(2π) ≈ 4-6 GHz for the transition between the ground state |g⟩ and the first excited state |e⟩. We couple a microwave resonator to the qubit [green box in Fig. 1(a)] with a frequency difference between qubit and resonator designed to be in the dispersive regime [38, 39].
In our experimental realization of the feedback loop (see Sec. IV), the qubit transition frequency is ω_q/(2π) = 6.148 GHz and the center resonator frequency amounts to ω_r/(2π) = 7.133 GHz with dispersive coupling rate χ/(2π) ≈ 1.1 MHz between the qubit and the resonator. Depending on whether the qubit is in state |g⟩ or |e⟩, we observe the dispersively shifted resonator frequency at ω_r ± χ, respectively.
The qubit-state-dependent frequency shift leads to a state-dependent resonator response when the resonator is probed with a microwave pulse. In the dispersive readout scheme, high-fidelity quantum nondemolition readout [13] is achieved when probing the resonator with power κ⟨n⟩ħω_r ≈ 10^−16 W such that the steady-state average photon number ⟨n⟩ in the resonator is on the order of 1-10 microwave photons [39, 46-50]. Due to the low power, it is essential to connect the output of the resonator to a Josephson parametric amplifier (JPA) [47, 51-58] to be able to discern the qubit-state-dependent resonator response within a single repetition of the experiment and in a time shorter than the qubit lifetime. Other schemes involve the direct coupling of a qubit to a Josephson bifurcation amplifier [59-62], autoresonant oscillator [63] or parametric oscillator [64] to be able to discern the qubit state with a higher microwave power.
FIG. 1 (caption, beginning not recovered). We consider a scenario in which the response time of the resonator is much shorter than the lifetime of the qubit. Specific times indicated are the onset of the readout pulse (t_0) as well as the beginning (t_1) and end (t_2) of the integration time (τ_i, blue shaded region). We define the total readout time τ_RO as the time difference between t_0 and t_2 (blue arrow between dashed lines) [46]. (c) Sketch of the trajectories in the plane spanned by the I and Q components of the signal for the states |g⟩ (blue) and |e⟩ (red). Specific points in the trajectories are marked corresponding to the times t_0, t_1 and t_2 as defined in (b). (d) Sketch of the typical distribution of the integrated in-phase component (I) when the qubit is in state |g⟩ (blue curve) or |e⟩ (red curve). The dashed line represents the threshold value I_t based on which the state of the qubit is determined.
For simplicity, we consider the case where the resonator is probed with a microwave pulse with frequency ω r and square envelope. The scheme considered here could be extended to include more sophisticated pulse shapes [46, 49, 65, 66] which increase the speed and fidelity of the readout as well as the speed of the reset of the intraresonator field.
We employ the complex representation of the signal I(t) + iQ(t) ≡ A(t) exp[iφ(t)], where A(t) and φ(t) are the time-dependent amplitude and phase of the signal at frequency ω_r. Upon transmission of the readout pulse with frequency close to resonance, the time-dependent in-phase I(t) and quadrature Q(t) components of the signal follow an exponential rise towards steady-state values starting at time t_0 after the onset of the readout pulse as illustrated in Fig. 1(b) [67]. The steady-state values depend on whether the qubit is in state |g⟩ (blue curve) or state |e⟩ (red curve). The trajectories of the readout signal in the two-dimensional plane spanned by I and Q, as sketched in Fig. 1(c), start at the center of the plane, which corresponds to zero amplitude, and move in two different directions depending on the qubit state |g⟩ (blue curve) or |e⟩ (red curve).
The signal is subject to noise added by passive and active components [68] . Therefore we apply a linear filter to the signal with the goal to attenuate noise frequency components while keeping the frequency components that contain the signal [24, 40, 46, 69] . In particular, we apply a moving average filter which is advantageous in terms of the signal processing latency (see Sec. III C). The moving average is equivalent to an unweighted integration of the original signal in a particular integration window starting at a variable time t 1 and ending at time t 2 = t 1 + τ i [see Fig. 1(b) and Fig. 1(c) ], where τ i is a constant integration time. We define the total readout duration as the time difference τ RO ≡ t 2 − t 0 between the onset of the readout pulse and the end of the integration window. In the experiment presented in Sec. IV we used an integration window of τ i = 40 ns and a readout duration of τ RO = (105 ± 2) ns.
In the absence of transitions between qubit states during the integration time, the statistical distribution of the integrated signal, when the experiment is repeated many times, is expected to be represented by two Gaussian-shaped peaks in a histogram of the I component [Fig. 1(d)]. In the presence of qubit state transitions during the readout, the distributions corresponding to the states |g⟩ and |e⟩ are expected to be non-Gaussian with an increased overlap [40, 46]. We discern the states |g⟩ and |e⟩ of the qubit by comparing the I signal to a threshold value I_t [dashed line in Fig. 1(d)]. The fidelity of the readout depends on the signal-to-noise ratio of the readout signal [46, 49]. To maximize the readout fidelity, we optimize the integration window and threshold value I_t.
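As a rough illustration of how the threshold value and the peak separation set the readout error in the idealized Gaussian case described above, the following Python sketch evaluates the two misassignment probabilities. The peak positions, the peak width and the function name are hypothetical values chosen for illustration, not measured quantities, and qubit transitions during the integration time are ignored.

```python
import math


def misassignment_probabilities(mean_g, mean_e, sigma, threshold):
    """Return (P[assign e | g], P[assign g | e]) for two Gaussian peaks.

    Assumes values above the threshold are assigned to |e> and that both
    peaks share the same standard deviation sigma.
    """
    scale = math.sqrt(2.0) * sigma
    p_g_as_e = 0.5 * math.erfc((threshold - mean_g) / scale)
    p_e_as_g = 0.5 * math.erfc((mean_e - threshold) / scale)
    return p_g_as_e, p_e_as_g


# Hypothetical peaks separated by four standard deviations (arbitrary units).
MEAN_G, MEAN_E, SIGMA = -2.0, 2.0, 1.0

# Threshold at the midpoint: both error probabilities are equal.
errors_mid = misassignment_probabilities(MEAN_G, MEAN_E, SIGMA, threshold=0.0)

# Threshold shifted towards the |e> peak: fewer false |e>, more false |g>.
errors_shifted = misassignment_probabilities(MEAN_G, MEAN_E, SIGMA, threshold=0.5)
```

In this toy model the midpoint threshold minimizes the summed error, and shifting the threshold towards the |e⟩ peak biases the assignment towards the ground state.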
B. Implementation of the detection scheme
The readout pulse is issued by the static control hardware [gray box in Fig. 1(a) ]. Simultaneously, the static control hardware sends a trigger [tr in Fig. 1(a) ] to the FPGA to synchronize the digital signal processing with the readout pulse.
We use an analog detection chain [yellow box in Fig. 1(a) ] containing amplifiers with a total gain of approximately 120 dB (see App. A) to detect the signal at the output of the resonator. In addition, the detection chain uses analog down-conversion electronics to convert the readout signal to an intermediate frequency ω IF compatible with the sampling rate f s = 100 MS/s of our DSP unit. We choose an intermediate frequency at a quarter of the sampling frequency, i.e. ω IF /(2π) = f s /4 = 25 MHz, which allows for efficient digital down-conversion (see Sec. III C). Note that in principle it is possible to directly demodulate the signal into its I and Q components in the analog signal processing but this requires the I and Q component of the signal to be digitized using two separate analog-to-digital converter (ADC) channels [41, 70, 71] . The separate digitization of the I and Q components is sensitive to mismatches between the conversion-loss and reference level which lead to a distortion of the digitized complex signal. In contrast, downconversion to an intermediate frequency in the range of 10 MHz to 1 GHz avoids low-frequency noise, DC offsets and requires only one ADC channel at the cost of a reduced bandwidth [41, 70, 71] .
We implement the digital signal processing on a Xilinx Virtex-4 FPGA mounted on a commercial DSP unit by Nallatech, Inc. (BenADDA-V4 TM ) [blue box in Fig. 1(a) ] which includes an ADC with sampling rate f s = 100 MS/s and 14-bit voltage resolution. In a first step, the DSP digitally demodulates the signal [labeled as demod. in Fig. 1(a) ]. The state discrimination module [state det. in Fig. 1(a) ] then compares the filtered I signal at time τ RO to the threshold I t , to determine the qubit state from the demodulated signal. Depending on the determined qubit state, a feedback trigger [fb in Fig. 1(a) ] is sent from the FPGA to the actuator electronics.
C. Actuator
The actuator is realized with an arbitrary waveform generator (AWG). When it receives the feedback trigger, the AWG generates a feedback pulse with a sampling rate of 1 GHz. In our experiment, the actuator pulse (AP) has a duration of τ_AP = 28 ns and uses the derivative removal by adiabatic gate (DRAG) technique [72, 73] to prevent transitions to higher-excited states of the transmon outside of the subspace spanned by the states |g⟩ and |e⟩. We typically generate the actuator pulse with a carrier frequency of 100-300 MHz limited by the bandwidth of the AWG and analog mixer. In the experiment presented in Sec. IV we chose a carrier frequency of 100 MHz for the actuator pulse. We use an analog mixer to up-convert the actuator pulse to the qubit transition frequency, which is typically in the range of 4-6 GHz. Forwarding this pulse to the qubit realizes a conditional quantum gate on the qubit, closing the feedback loop.
D. Latencies
We define the latency τ_FB of the feedback loop [Fig. 1(a)] as the time from the beginning of the readout pulse until the completion of the feedback pulse, i.e.
\[ \tau_{\mathrm{FB}} = \tau_{\mathrm{RO}} + \tau_{\mathrm{EL,tot}} + \tau_{\mathrm{AP}}, \tag{1} \]
where τ_EL,tot is the total electronic delay of the signal in the analog and digital components and cables of the feedback loop, τ_RO the readout duration (see Sec. II A) and τ_AP = 28 ns is the length of the actuator pulse (see Sec. II C). We measured the total electronic delay τ_EL,tot = (219 ± 2) ns in situ by changing the up-conversion frequency of the feedback pulse to the resonance frequency of the readout resonator and adjusting the amplitude of the pulse. The resonant feedback pulse is transmitted through the resonator, which makes it possible to determine the timing of the feedback pulse relative to the readout pulse. By adding up the contributions according to Eq. (1) we infer a feedback latency of τ_FB = (352 ± 3) ns. The electronic delay
\[ \tau_{\mathrm{EL,tot}} = \tau_{\mathrm{proc}} + \tau_{\mathrm{ADC,DIO}} + \tau_{\mathrm{AWG}} + \tau_{\mathrm{G,tot}} \tag{2} \]
can be broken up into accumulated contributions. The signal processing, which we implemented in the FPGA, introduces a processing delay of three clock cycles τ_proc = 30 ns (see Sec. III). The feedback trigger is delayed by τ_proc + τ_ADC,DIO = (110 ± 3) ns with respect to the analog input signal, where τ_ADC,DIO is the delay introduced by the ADC and digital interfaces (see App. B). By subtracting the separately determined quantities τ_proc, τ_ADC,DIO and τ_AWG from the total electronic delay τ_EL,tot we estimate the inferred total group delay τ_G,tot = (69 ± 7) ns in the cables and analog components. We expect the total cable length connecting the analog and digital components to be the dominant contribution to the inferred group delay. The inferred group delay corresponds to an approximate total cable length of 14 m considering an effective dielectric constant ε_eff ≈ 2 for the coaxial cables with PTFE dielectric. This inferred total cable length is consistent with the experimental setup. The cable length in our setup could be reduced further by placing the individual components of the feedback loop closer to each other, which can be achieved, for example, by placing the FPGA and control electronics inside the dilution refrigerator [74-76].
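The latency budget above can be checked with a few lines of arithmetic. The Python sketch below adds up the quoted contributions to the feedback latency and converts the inferred group delay into a cable length. Two assumptions are made that are not stated in the text: the uncertainties of τ_RO and τ_EL,tot are combined in quadrature, and the AWG delay is back-computed from the quoted numbers because its value is not given in this section. All variable names are illustrative.

```python
import math

C_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum

# Values quoted in the text (in ns).
tau_ro_ns, d_tau_ro_ns = 105.0, 2.0   # readout duration
tau_el_ns, d_tau_el_ns = 219.0, 2.0   # total electronic delay
tau_ap_ns = 28.0                      # actuator pulse length

# Feedback latency: readout + electronic delay + actuator pulse.
tau_fb_ns = tau_ro_ns + tau_el_ns + tau_ap_ns
# Assumption: independent uncertainties, combined in quadrature.
d_tau_fb_ns = math.hypot(d_tau_ro_ns, d_tau_el_ns)

# Breakdown of the electronic delay.
tau_trigger_ns = 110.0  # tau_proc + tau_ADC,DIO as quoted
tau_group_ns = 69.0     # inferred group delay in cables and analog parts
# Not quoted in this section: value implied by the numbers above.
tau_awg_implied_ns = tau_el_ns - tau_trigger_ns - tau_group_ns

# Cable length for a signal velocity of c / sqrt(eps_eff).
EPS_EFF = 2.0
cable_length_m = tau_group_ns * 1e-9 * C_LIGHT_M_PER_S / math.sqrt(EPS_EFF)

print(f"tau_FB = ({tau_fb_ns:.0f} +/- {d_tau_fb_ns:.0f}) ns")
print(f"implied tau_AWG = {tau_awg_implied_ns:.0f} ns")
print(f"cable length = {cable_length_m:.1f} m")
```

Under these assumptions the sum reproduces the quoted 352 ns, and the group delay corresponds to a cable length between 14 m and 15 m, in line with the approximate value given in the text.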
III. FPGA-BASED DIGITAL SIGNAL PROCESSING
In this section, we describe our digital signal processing (DSP) circuit which we implemented on the Virtex-4 FPGA. To derive feedback triggers, the DSP circuit (Fig. 2) determines the qubit state by digital demodulation of the readout signal (see Sec. II). We start by discussing the digitization and synchronization of the input signal. Next, we discuss the signal processing features of each block and the corresponding latencies. Details of the FPGA implementation of each signal processing block are discussed in App. C. We analyze the FPGA timing and resource usage for the implementation of the DSP circuit on the Xilinx Virtex-4, Virtex-6 and Virtex-7 FPGA in App. D.
A. Digitization of the input signal
Before entering the DSP circuit, the readout signal is digitized by an external ADC chip which samples the signal with rate f_s = 100 MS/s. Typical readout signals are sine waves with qubit-state-dependent amplitude and phase as shown in Fig. 3(a). We parameterize the time-dependent voltage at the input of the ADC as
\[ V_{\mathrm{ADC}}(t) = \tilde{A}(t) \cos[\omega_{\mathrm{IF}} t + \phi(t)]. \tag{3} \]
We choose an intermediate frequency of ω_IF/(2π) = f_s/4 = 25 MHz for the readout signal after analog down-conversion (see Sec. II B), which is a useful choice for digital demodulation as discussed below. The time-dependent amplitude Ã(t) is proportional to the amplitude A(t) of the field at the output of the resonator scaled by the gain of the analog detection chain and conversion loss of the mixer. The ADC samples the signal V_ADC(t_n) at discrete times t_n = n/f_s = n × 10 ns with index n. The ADC encodes the input voltage range of approximately ±1 V as 14-bit fixed-point binary values. The fixed-point representation leads to a discretization step size of 2^−13 V ≈ 0.12 mV. A trigger pulse (tr) is provided together with the analog signal via a separate digital input of the FPGA to mark the onset of the readout pulse.
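The sampling grid and the discretization step follow directly from the quoted ADC parameters. The Python sketch below reproduces them; the signed two's-complement code range and the clipping behavior of the `quantize` helper are assumptions made for illustration and are not taken from the ADC data sheet.

```python
FS_HZ = 100e6          # sampling rate, 100 MS/s
N_BITS = 14            # ADC resolution
FULL_SCALE_V = 1.0     # input range of approximately +/- 1 V

# Step size: 2 V spread over 2**14 codes, i.e. 2**-13 V.
LSB_V = 2.0 * FULL_SCALE_V / 2**N_BITS

CODE_MIN = -(2 ** (N_BITS - 1))     # assumed two's-complement range
CODE_MAX = 2 ** (N_BITS - 1) - 1


def quantize(voltage):
    """Map a voltage to the nearest integer code, clipping at full scale."""
    code = round(voltage / LSB_V)
    return max(CODE_MIN, min(CODE_MAX, code))


def sample_times_ns(n_samples):
    """Return the sample times t_n = n / f_s in nanoseconds."""
    return [n * 1e9 / FS_HZ for n in range(n_samples)]


first_times_ns = sample_times_ns(3)
```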
B. Pipelined processing
We designed the DSP circuit to process the signal from the ADC in a pipelined manner. The signal from the ADC is initially buffered in a register implemented by synchronous D-flip-flops (ADC z^−1 block in Fig. 2) which forward the value of the signal at each rising edge of the sampling clock to the next processing element in the pipeline. A separate trigger input (tr, orange lines in Fig. 2) marks the beginning of each experimental repetition. In order to synchronize the trigger with the ADC signal, the trigger initially goes through six pipelined registers (z^−6 in Fig. 2), which compensate the difference in delay between the ADC line and trigger line. To synchronize the signal processing with the sampling clock, we insert further pipelined registers into the signal and trigger lines at specific points in the circuit (blue dashed lines in Fig. 2).
C. Digital demodulation
As discussed in Sec. II B, we digitally demodulate the readout signal to obtain the I and Q components of the signal. Digital demodulation is achieved by digital frequency down-conversion which involves digital mixing of the signal with a digital reference oscillator followed by digital low-pass filtering to remove noise and unwanted sideband frequency components [71] .
Digital mixing
In the first part of the digital demodulation circuit (yellow box in Fig. 2), we implement a digital mixing method [70, 71] (digital mixer in Fig. 2) to obtain a sideband at zero frequency. In the digital mixer, the input signal V_ADC as defined in Eq. (3) is multiplied with a complex exponential with down-conversion frequency ω_IF to obtain a complex output signal S_m,
\[ S_m(t_n) = V_{\mathrm{ADC}}(t_n)\, e^{-i\omega_{\mathrm{IF}} t_n} = \frac{\tilde{A}(t_n)}{2} \left[ e^{i\phi(t_n)} + e^{-i[2\omega_{\mathrm{IF}} t_n + \phi(t_n)]} \right]. \tag{4} \]
The action of the multiplication is to generate two sidebands corresponding to the two complex exponentials in Eq. (4): one corresponds to the complex signal I + iQ ≡ Ã(t)e^{iφ(t)}/2 and the other leads to oscillations with frequency 2ω_IF of the output signals of the mixer [Fig. 3(b)]. The complex signal I + iQ is the basis on which we determine the state of the qubit after filtering out the oscillating sideband (see following sections).
In practice, the real (Re[S m ]) and imaginary (Im[S m ]) parts of the output signal of the mixer are computed separately by multiplying the input signal with a discrete cosine to obtain the real part and with a discrete negative sine to obtain the imaginary part. The FPGA implementation of the digital mixer is described in App. C 1. For ω IF /(2π) = f s /4, the digital mixer introduces a latency of less than one clock cycle (10 ns) due to its multiplierless implementation [70, 71] . Since the output signal of the mixer is registered by synchronous D-flip-flops, the effective latency is one clock cycle. For synchronization, the trigger signal (tr) is delayed by one clock cycle [z −1 tr in Fig. 3(b) ].
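To illustrate why no multipliers are needed at ω_IF/(2π) = f_s/4, the Python sketch below uses the fact that the discrete cosine takes only the values 1, 0, −1, 0 and the discrete negative sine only 0, −1, 0, 1. It is a software model for illustration, not the FPGA implementation of App. C 1; the function names and the test signal (amplitude 0.8, phase 0.3 rad) are hypothetical.

```python
import cmath
import math


def fs4_mixer(samples):
    """Mix real samples with exp(-i*pi*n/2) using only sign flips and zeros."""
    re_out, im_out = [], []
    for n, v in enumerate(samples):
        k = n % 4
        if k == 0:        # exp(0) = 1
            re_out.append(v)
            im_out.append(0.0)
        elif k == 1:      # exp(-i*pi/2) = -i
            re_out.append(0.0)
            im_out.append(-v)
        elif k == 2:      # exp(-i*pi) = -1
            re_out.append(-v)
            im_out.append(0.0)
        else:             # exp(-3i*pi/2) = +i
            re_out.append(0.0)
            im_out.append(v)
    return re_out, im_out


def reference_mixer(samples):
    """Same operation written as an explicit complex multiplication."""
    return [v * cmath.exp(-1j * math.pi * n / 2) for n, v in enumerate(samples)]


# Hypothetical test signal at f_s/4 with constant amplitude and phase.
AMPLITUDE, PHASE = 0.8, 0.3
samples = [AMPLITUDE * math.cos(math.pi * n / 2 + PHASE) for n in range(8)]

re_out, im_out = fs4_mixer(samples)
reference = reference_mixer(samples)
```

Both routes give the same output up to floating-point rounding, which is why the mixer reduces to multiplexing and negation in hardware.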
Digital low-pass filter
The second essential part of the digital downconversion circuit is a digital low-pass filter, which extracts the I and Q components from the signals Re[S m ] and Im[S m ] by removing the sideband spectral components oscillating at frequency 2ω IF [71] . We implement the digital low-pass filter as a finite impulse response (FIR) filter [71] which is a discrete convolution of the digital signal with a finite sequence of filter coefficients. By matching the filter coefficients (integration weights) to the expected resonator response, it is possible to optimize the single-shot readout fidelity [24, 40, 46, 48, 49, 69] . While our DSP circuit in principle allows for 40-point FIR filters with arbitrary filter coefficients, a moving average is the simplest type of FIR low-pass filter which is possible to implement without multipliers and therefore has a reduced processing latency and uses less FPGA resources than a more general FIR filter. The FPGA implementation of the moving average module is described in App. C 2.
The moving average (FIR filter in Fig. 2) is applied separately to the real part (Re[S_m]) and imaginary part (Im[S_m]) of the complex output signal of the digital mixer, S_m, leading to
\[ I(t_n) = \frac{1}{l} \sum_{k=0}^{l-1} \mathrm{Re}[S_m(t_{n-k})], \qquad Q(t_n) = \frac{1}{l} \sum_{k=0}^{l-1} \mathrm{Im}[S_m(t_{n-k})], \tag{5} \]
which is a discrete convolution with a square window of length l. In the limit of negligible modulation bandwidth, the moving average filters out a sinusoid perfectly if the window length l is a multiple of the oscillation period. In the case of ω_IF/(2π) = f_s/4, the period of the unwanted terms at 2ω_IF is equal to two discrete samples. Therefore any window length which spans an even number of samples is suitable to filter out the 2ω_IF sideband. The output of the moving average with window length l = 4 is shown in Fig. 3(c). The I and Q signals at the output of the moving average show a smooth ramp towards a steady-state value. In the simulated signals shown in Fig. 3 an appropriate global phase offset has been chosen such that the difference between the traces corresponding to the |g⟩ and |e⟩ state is maximized in the I component of the signal (see Sec. II A).
The moving average module has a latency of one clock cycle. The trigger is delayed accordingly by one additional clock cycle (z −1 z −1 tr = z −2 tr) for synchronization.
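The statement that an even window length removes the 2ω_IF sideband can be checked numerically. The Python sketch below mixes a constant-amplitude test signal at f_s/4 down to zero frequency and compares a four-sample window with a three-sample window. The normalization by the window length, the test amplitude and phase, and the function name are assumptions for illustration; samples before the start of the record are treated as zero.

```python
import cmath
import math


def moving_average(values, length):
    """Running mean over the last `length` samples (zeros before the start)."""
    out, acc = [], 0.0
    for n, v in enumerate(values):
        acc += v
        if n >= length:
            acc -= values[n - length]
        out.append(acc / length)
    return out


# Hypothetical test signal: constant amplitude and phase at f_s/4.
AMPLITUDE, PHASE = 0.8, 0.3
mixed = [
    AMPLITUDE * math.cos(math.pi * n / 2 + PHASE) * cmath.exp(-1j * math.pi * n / 2)
    for n in range(16)
]
re_part = [s.real for s in mixed]
im_part = [s.imag for s in mixed]

# Even window (l = 4): outputs once the window is completely filled.
i_even = moving_average(re_part, 4)[3:]
q_even = moving_average(im_part, 4)[3:]

# Odd window (l = 3): the sideband at 2*omega_IF is not fully removed.
i_odd = moving_average(re_part, 3)[2:]
ripple_odd = max(i_odd) - min(i_odd)
```

For the even window the outputs settle at (Ã/2) cos φ and (Ã/2) sin φ, whereas the odd window leaves a residual ripple at the sideband frequency.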
D. Offset subtraction and scaling
Following the FIR filter block, the I and Q signals enter blocks which perform offset subtraction and scaling of the signal (green boxes in Fig. 2). The main purpose of offset subtraction is to set a threshold value as described in Sec. III E. Moreover, offset subtraction and scaling allow us to make best use of the fixed range and resolution used for recording histograms (see Sec. III F).
The outputs of the offset subtraction and scaling blocks are described by
\[ \tilde{I} = m_I (I - c_I), \qquad \tilde{Q} = m_Q (Q - c_Q), \tag{6} \]
where c_I and c_Q are offsets in the I/Q plane and m_I and m_Q are multiplication factors. We determine the parameters (c_I, c_Q) and (m_I, m_Q) in a calibration measurement. The latencies of the offset subtraction and scaling blocks are less than one clock cycle and no synchronous D-flip-flops are used.
E. State discrimination module
The state discrimination module (red box in Fig. 2) determines the state of the qubit based on the preprocessed input signals Ĩ and Q̃. Due to the offset subtraction, the threshold value for state discrimination can be kept fixed at zero, which simplifies the FPGA implementation of the state discrimination module as discussed in App. C 4.
The readout time τ_RO relative to the onset of the readout pulse (see Sec. II A) is specified with a variable delay of d clock cycles after the detection of the trigger signal, i.e. d × 10 ns = τ_RO. In the example shown in Fig. 3(c), the |g⟩ and |e⟩ states of the qubit are discriminated based on a threshold value (thick horizontal bar) defined for the I signal at a time t = 160 ns, which is d = 14 clock cycles after the detection of the trigger signal z^−2 tr. The simulated I signals corresponding to the |0⟩ [blue curve in Fig. 3(c)] and |1⟩ [red curve in Fig. 3(c)] state are well distinguishable at the time when the threshold is checked, such that the state of the qubit can be determined successfully even in the presence of noise (see Sec. IV). The state discrimination module either issues the feedback trigger [red curve in Fig. 3(d)] or does not issue the feedback trigger [blue curve in Fig. 3(d)] based on the determined qubit state.
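The decision logic described above can be summarized in a short software model. The Python sketch below samples the filtered I signal d clock cycles after the detected trigger, applies an offset and a scale factor, and compares the result to zero. The form m_I (I − c_I) of the preprocessing, the convention that a positive value issues the trigger, and the synthetic ramp traces and calibration values are all assumptions made for illustration; the FPGA implementation is described in App. C 4.

```python
CLOCK_NS = 10  # one clock cycle at 100 MS/s


def delay_in_clock_cycles(delay_ns):
    """Convert a delay in ns into the number of clock cycles d."""
    return round(delay_ns / CLOCK_NS)


def feedback_trigger(i_trace, trigger_index, d, c_i, m_i):
    """Return True if the feedback trigger would be issued.

    i_trace:       filtered in-phase samples, one per clock cycle
    trigger_index: sample index at which the delayed trigger is detected
    d:             delay in clock cycles before the threshold is checked
    c_i, m_i:      offset and scale (assumed form: m_i * (I - c_i))
    """
    i_tilde = m_i * (i_trace[trigger_index + d] - c_i)
    return i_tilde > 0.0  # threshold fixed at zero after offset subtraction


# Synthetic ramps towards state-dependent steady-state values (hypothetical).
trace_g = [-0.02 * min(n, 12) for n in range(20)]
trace_e = [+0.03 * min(n, 12) for n in range(20)]

D_CYCLES = delay_in_clock_cycles(140)  # d = 14 as in the Fig. 3(c) example
trigger_for_g = feedback_trigger(trace_g, trigger_index=2, d=D_CYCLES, c_i=0.01, m_i=4.0)
trigger_for_e = feedback_trigger(trace_e, trigger_index=2, d=D_CYCLES, c_i=0.01, m_i=4.0)
```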
Our DSP circuit provides the possibility to derive a second feedback trigger (fb2 in Fig. 2), so that triggers can be based on both the in-phase (Ĩ) and quadrature (Q̃) signal components. For example, in the quantum teleportation protocol [26] the states of two qubits at the sender's location are measured in order to perform a state-dependent rotation on a qubit at the receiver's location. In our experimental realization of the teleportation protocol as discussed in Ref. [35], we discriminated the states of the two sender's qubits based on two threshold values defined for the I and Q signals.
Based on the outcome of comparing the I and Q signals to the two threshold values, we issued two independent trigger signals to two separate AWGs in order to implement a conditional operation on the receiver's qubit [35] .
F. Histogram module
The histogram module records how often the values of the signals Ĩ and Q̃ obtained from a specific integration window fall into a particular histogram bin when the experiment is repeated many times. The bins are defined by subdividing the signal range from −1 to +1 into typically 128 bins. From the histogram, an estimate of the probability density function of the signal at the specified times is obtained.
We typically repeat the experiment 10^5-10^7 times to obtain standard deviations of less than a part per thousand for the counts in each histogram bin. Storing the histogram of the signal needs less memory than storing the value of the signal in each repetition if the number of repetitions exceeds the number of histogram bins. The histogram module therefore allows for data reduction at the time when the data is recorded.
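The data reduction can be illustrated with a minimal software model of the binning. In the Python sketch below, the signal range from −1 to +1 is divided into 128 equal bins and one counter per bin is incremented for each repetition, so the memory footprint is fixed by the number of bins rather than by the number of repetitions. Clamping out-of-range values to the edge bins and the function names are assumptions for illustration; the FPGA implementation is described in App. C 5.

```python
def bin_index(value, n_bins=128, lo=-1.0, hi=1.0):
    """Return the histogram bin for a value in the range [lo, hi)."""
    idx = int((value - lo) / (hi - lo) * n_bins)
    # Assumption: values outside the range are counted in the edge bins.
    return max(0, min(n_bins - 1, idx))


def record_histogram(values, n_bins=128):
    """Accumulate counts per bin; memory use is independent of len(values)."""
    counts = [0] * n_bins
    for v in values:
        counts[bin_index(v, n_bins)] += 1
    return counts


# A few hypothetical integrated signal values from repeated experiments.
values = [-1.0, -0.5, 0.0, 0.0, 0.999]
counts = record_histogram(values)
```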
We have used the histogram module in previous experiments to characterize the quantum statistics of microwave radiation emitted from circuit QED systems [77] [78] [79] [80] . In the context of feedback experiments, we record histograms to obtain the probabilities of observing a particular qubit state in two consecutive qubit readouts as described in Sec. IV.
We update the histogram at the same time as the state discrimination module determines the qubit state in order to analyze the readout fidelity and feedback performance (see Sec. IV). We synchronize the state discrimination module and the histogram module using a marker signal (fbTime in Fig. 2 ) which is sent from the state discrimination module to the histogram module. We use an external Zero Bus Turnaround (ZBT) Random Access Memory (RAM) (see Fig. 2 ) to store the histogram. When the recording of the histogram is completed, we transfer the histogram to the host computer via the interface. The implementation details of the histogram module are described in App. C 5.
IV. QUBIT STATE INITIALIZATION EXPERIMENT
In this section, the functionality of the presented DSP circuit is demonstrated in the context of a qubit state initialization experiment. In the experiment we use the feedback loop to reset the state of a superconducting qubit [19, [81] [82] [83] (see App. E) deterministically into its ground state, independent of its initial state. We correlate the outcomes of two consecutive qubit measurements in order to separate out the different effects such as the qubit lifetime and readout fidelity which contribute to the overall performance of the feedback protocol.
We choose the repetition period 10 µs of the experiment to be longer than the qubit lifetime T 1 ≈ 1.4 µs, such that the qubit is approximately in thermal equilibrium with its environment at the beginning of each experimental repetition. We observe a finite thermal population P therm ≈ 7% of the excited state |e due to the elevated effective temperature of about 114 mK of the system on which the experiments were performed (see App. F).
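The quoted thermal population and effective temperature can be related through Boltzmann statistics. The Python sketch below treats the qubit as an ideal two-level system at the qubit frequency of 6.148 GHz, which is a simplifying assumption since higher transmon levels are neglected; the function name is illustrative.

```python
import math

PLANCK_J_S = 6.62607015e-34      # Planck constant (exact SI value)
BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def thermal_excited_population(f_q_hz, temperature_k):
    """Excited-state population of a two-level system in thermal equilibrium."""
    x = PLANCK_J_S * f_q_hz / (BOLTZMANN_J_PER_K * temperature_k)
    return 1.0 / (1.0 + math.exp(x))


# Qubit frequency and effective temperature quoted in the text.
p_therm = thermal_excited_population(6.148e9, 0.114)
print(f"P_therm = {100 * p_therm:.1f} %")
```

In this two-level approximation, 114 mK corresponds to an excited-state population of about 7%, consistent with the value given above.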
In order to test the feedback protocol, we prepare an equal superposition of the computational states |g⟩ and |e⟩ of the superconducting qubit. This choice of initial state will ideally lead to equal probabilities to find the states |g⟩ and |e⟩ when the qubit is measured. Preparing an equal superposition as an initial state will therefore test the feedback actuator for both computational states |g⟩ and |e⟩ of the qubit. An additional data set (App. F) shows that the feedback scheme can also be used to reduce the thermal population of the excited state [19, 82, 84], providing an additional benchmark for our feedback loop.
We first consider the ideal case in which the qubit is initialized in the state |g⟩, corresponding to the Bloch vector pointing to the upper pole of the Bloch sphere [stage 1 in Fig. 4(a)]. A microwave pulse at frequency ω_q [green line in Fig. 4(b)] is applied to the qubit to realize a π/2 rotation which brings the qubit into the superposition state |+⟩ ≡ (|g⟩ + |e⟩)/√2, corresponding to a Bloch vector pointing at the equator of the Bloch sphere [stage 2 in Fig. 4(a)].
When the qubit initially is in state |e⟩, for example due to the non-zero temperature of the system, the effect of the π/2 rotation is to prepare the state |−⟩ ≡ (|g⟩ − |e⟩)/√2, which is an equal superposition of |g⟩ and |e⟩ with a different phase. The states |+⟩ and |−⟩ are expected to lead to an identical distribution of outcomes in the state detection.
In the experiment, directly after the preparation of the initial state, at time t_M1 = 0, the state of the qubit is measured with a readout pulse of length 160 ns (see M1 in Fig. 4) applied to the resonator. The dispersive readout projects the state of the qubit into either the ground or excited state, corresponding to the upper and lower pole of the Bloch sphere [stage 3 in Fig. 4(a)]. The DSP (see Sec. III) extracts the in-phase component I_1 during the readout pulse M1. We filter the signal I_1 with a moving average of four consecutive samples, corresponding to an integration window [blue region M1 in Fig. 4(b)] of 40 ns. We extracted the time τ_RO ≈ 105 ns of the end of the integration window [85] relative to the beginning of the readout pulse by fitting a theoretical model to the switch-on dynamics of the readout signal in a time-resolved measurement [46]. The measured initial excited state probability P[E_1]_fb off = 46.06(3)% is the fraction of counts of values I_1 above the threshold value I_t = 16 mV [dashed line in Fig. 4(c)] relative to the total count C_tot = 2 097 152 of measurements.
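The size of the quoted statistical uncertainty can be cross-checked against the total count. The Python sketch below evaluates the binomial standard error of a probability estimated from C_tot independent repetitions. That the uncertainty in parentheses is a binomial standard error is an assumption made here; the function name is illustrative.

```python
import math


def binomial_std_error(p, n):
    """Standard error of a probability estimated from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


C_TOT = 2_097_152      # total count quoted in the text (equal to 2**21)
P_E1_MEASURED = 0.4606  # measured excited state probability

std_err_percent = 100.0 * binomial_std_error(P_E1_MEASURED, C_TOT)
print(f"standard error = {std_err_percent:.3f} %")
```

The result, about 0.03 percentage points, matches the last-digit uncertainty in 46.06(3)%.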
With a master equation [86] we simulate the decay of the qubit state with characteristic time T 1 = 1.4 µs during the time of the π/2 pulse and the readout up to the center of the integration window [see Fig. 4(b) ]. Furthermore we take into account a bias of the measured probabilities towards 50% due to the finite readout error of 3% (see App. G). From the master equation simulation we obtain an expected excited state probability of P[E 1 ] sim = 47.07% in the first measurement M1 which agrees reasonably with the measured probability P[E 1 ] fb off (see above). A source of systematic errors is measurement-induced mixing [87] . An additional reason for the systematic deviation of the measured probability from the simulated probability is that the chosen threshold value I t = 16 mV deviates from the value I t, opt ≈ 13 mV which optimizes readout fidelity (see App. G). This offset leads to a bias of the observed probabilities towards the ground state in addition to a systematic bias due to state transitions during the integration time [46] .
The feedback loop is configured to deterministically prepare the state |g⟩ [stage 4 in Fig. 4(a)]. The feedback pulse, inducing a π rotation of the Bloch vector of the qubit, turns the state |e⟩ into |g⟩ and vice versa. Thus, the feedback π pulse is issued only if the first measurement M1 revealed the qubit to be in state |e⟩. Conditioned on the readout result of M1, the π pulse [red dashed line in Fig. 4(b)] arrives at the qubit with a delay of τ EL,tot (see Sec. II D).
For verification, a second readout pulse (M2 in Fig. 4 ) is applied to the qubit at the time t M2 = 360 ns directly after the arrival of the feedback pulse at the qubit. The difference between t M2 and the beginning of the first readout pulse corresponds to the total feedback latency τ FB (see Sec. II D). We recorded histograms of I 2 , which is the filtered in-phase component of the signal at time t M2 + τ RO . When the feedback actuator is disabled, the histogram of I 2 [orange dots in Fig. 4(c) ] shows reduced counts on the right side of the threshold with an excited state probability of P[E 2 ] fb off = 34.97(3)%. Extending the master equation simulation introduced above to include the full pulse sequence up to the second readout pulse, we obtain P[E 2 ] fb off, sim = 37.89% in reasonably good agreement with the measured value. The state decay between M1 and M2, which leads to the observed reduction in the excited state population, causes errors in the feedback action as discussed below.
When the experiment is repeated with the feedback actuator enabled, the double-peaked histogram obtained from the first readout I 1 [blue dots in Fig. 4(d) ] is approximately identical to the case without feedback, as expected, with the measured excited state probability P[E 1 ] fb on = 46.02(3)% agreeing with P[E 1 ] fb off within the statistical error bars. After the feedback pulse, in the histogram of I 2 [orange dots in Fig. 4(d) ], the measured excited state probability is significantly reduced to P[E 2 ] fb on = 13.23(2)%. This probability compares reasonably well with the simulated value of P[E 2 ] fb on, sim = 10.50% obtained from the master equation simulation introduced above. We attribute the difference between the measured and simulated value of P[E 2 ] fb on to measurement-induced mixing and the deviation of the feedback threshold from the optimal value (see above).
To obtain a figure of merit for the feedback protocol that is independent of characteristics such as state decay and temperature of the quantum system, we study correlations between the outcomes of the two readout pulses M1 and M2. From the two-dimensional histograms [ Fig. 4(e,f) ] with axes I 1 and I 2 , we obtain experimental probabilities to observe a specific range R of two consecutive measurement outcomes (I 1 , I 2 ). The probabilities P[R xy ] correspond to observing the qubit in state x with the first readout pulse and consecutively in state y with the second readout pulse. These probabilities are obtained from the normalized counts in the four quadrants (R GG , R GE , R EG , R EE ) separated by the threshold [dashed lines in Fig. 4(e,f) ].
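The quadrant counting described above can be illustrated with a short sketch. It assumes, as for the single-readout probabilities, that a value above the threshold is assigned to the excited state; the function name and the toy data are hypothetical and are not part of the experimental software.

```python
def quadrant_probabilities(i1_values, i2_values, threshold):
    """Estimate P[R_GG], P[R_GE], P[R_EG], P[R_EE] from paired readout values.

    A value above `threshold` is assigned to E, otherwise to G
    (assumption: same convention as for the single-readout histograms).
    """
    counts = {"GG": 0, "GE": 0, "EG": 0, "EE": 0}
    for i1, i2 in zip(i1_values, i2_values):
        first = "E" if i1 > threshold else "G"
        second = "E" if i2 > threshold else "G"
        counts[first + second] += 1
    total = sum(counts.values())
    # Normalize the counts in the four quadrants to probabilities.
    return {region: n / total for region, n in counts.items()}


# Toy data in mV with the threshold of 16 mV quoted in the text.
probs = quadrant_probabilities([0, 20, 20, 0], [0, 0, 20, 20], threshold=16)
```

In this toy example each of the four regions receives exactly one of the four shots.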
When the feedback is enabled, the measured probability P[R EE ] fb on = 11.57(2)% [Fig. 4(f)] corresponds to the unwanted event of the state |e⟩ being observed […] [Fig. 4(f)] of a transition from state |g⟩ to |e⟩ when the feedback loop is enabled is close to the reference value P[R GE ] fb off = 1.88(1)% [Fig. 4(e)] when the feedback is disabled. This shows that the state is correctly left unchanged when the qubit is already in state |g⟩. A possible reason for the small systematic deviation of P[R GE ] fb on from P[R GE ] fb off, which is on the order of 0.2%, could be drifts in experimental parameters such as the phase of the readout signal.
In summary, the probabilities of the combined events (Tab. I) show that in the feedback protocol the π pulse is applied only when it is intended and that the probability of the unwanted events in region R EE is limited by state decay between the first measurement and the feedback pulse.
V. CONCLUSIONS AND DISCUSSION
We developed a low-latency FPGA-based digital signal processing unit for quantum feedback and feedforward applications such as the qubit initialization scheme presented in this paper and the deterministic quantum teleportation realized in Ref. [35] .
Our experimental results show that the feedback loop performs as expected.
The total feedback latency amounts to τ FB = (352 ± 3) ns, determined by the sum of ADC latency, processing latency, AWG latency, cable delays, readout time and feedback pulse duration. To reduce the probability of state decay between the state detection and the feedback action, the ratio r ≡ τ FB /T 1 of the feedback latency to the qubit lifetime T 1 needs to be reduced. Since the probability of state decay is expected to be proportional to 1 − exp(−r), a T 1 time of about 40 µs would be needed to achieve error probabilities of less than 1% in one iteration of the feedback scheme presented in this work. Conversely, with the longest T 1 times achievable with state-of-the-art superconducting circuits of up to approximately 100 µs [88] [89] [90], feedback latencies of less than 100 ns would be needed to reduce the error probability to less than one part per thousand. In the present work, we demonstrated digital processing latencies on the order of 30 ns, which are among the shortest latencies reported for FPGA-based signal analyzers [21, 22, 43] in the context of superconducting qubits. In addition, advanced readout strategies enable shorter optimal readout times [46, 49]. Shorter latencies for analog-to-digital conversion and cable delays may be achievable by using custom-made circuit boards which work at cryogenic temperatures [74] [75] [76] or by on-chip logical elements [91] [92] [93].
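The scaling argument above can be checked numerically. The sketch below assumes a proportionality constant of one, i.e. an error probability of exactly 1 − exp(−τ FB /T 1 ); the function names are hypothetical.

```python
import math


def decay_error(tau_fb, t1):
    """Error probability 1 - exp(-r) with r = tau_fb / t1 (unit prefactor assumed)."""
    return 1.0 - math.exp(-tau_fb / t1)


def t1_required(tau_fb, max_error):
    """Smallest T1 for which decay_error(tau_fb, T1) stays below max_error."""
    return -tau_fb / math.log(1.0 - max_error)


def latency_required(t1, max_error):
    """Largest feedback latency for which the decay error stays below max_error."""
    return -t1 * math.log(1.0 - max_error)


t1_for_1_percent = t1_required(352e-9, 0.01)          # roughly 35 microseconds
latency_for_1e3 = latency_required(100e-6, 1e-3)      # roughly 100 nanoseconds
```

Under this assumption the required lifetime evaluates to roughly 35 µs, consistent with the quoted estimate of about 40 µs, and the required latency for T 1 = 100 µs evaluates to roughly 100 ns.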
Low latency feedback loops may play a role in realizing future quantum computers, where a key ingredient is quantum error correction [94] [95] [96] in which error syndromes of a quantum error correction code are detected by repetitive measurements. The syndrome measurements are designed to keep track of unwanted bit flip and phase errors. In this context it is essential to have a flexible low latency classical processing unit to process the error syndromes without causing additional delay for the quantum processor. A large set of quantum error correction codes may work with a passive 'Pauli frame' update [97]; however, it remains an open question [98] whether some level of correction and qubit reset using active feedback is preferable. Therefore, having a low latency signal processor with feedback capabilities as presented in this work will be instrumental for scaling up quantum technologies.
The device under test (DUT, green box in Fig. 5) is a superconducting circuit with one superconducting transmon qubit. The DUT is thermalized to the 20 mK stage of a dilution refrigerator (purple box in Fig. 5).
Single-qubit quantum gates are realized by driving transitions between the ground and first excited state of the transmon by applying microwave pulses through a dedicated microwave line (port A in Fig. 5). The microwave line is thermalized by attenuators at three temperature stages T = (4 K, 100 mK, 20 mK). The attenuators reduce the signal and noise coming from the room-temperature electronics and add Johnson-Nyquist noise at their respective temperature T, thereby reducing the effective temperature of the microwave radiation in the cable. The qubit pulses for static control (grey box in Fig. 5) are generated by AWG 1 and up-converted to microwave frequencies using an I/Q mixer driven by a local oscillator (LO) signal from microwave generator MWG 1.
Readout of the qubit is realized by a pulsed measurement of the transmission of microwaves through a coplanar waveguide resonator (CPWR). The readout pulse is applied to the CPWR through the resonator drive line (port B in Fig. 5 ). The readout pulses are also generated by AWG 1. An I/Q mixer with an LO signal from MWG 2 allows for shaping the readout pulses which can be useful to achieve faster ring-up and ring-down of the intra-cavity field [65, 66] . In order to adjust the power range for the resonator drive a variable attenuator is used at the RF output of the mixer.
The transmitted signal is directed through an isolator, circulator, and directional coupler to a Josephson parametric amplifier (JPA) [51] based on a λ/4 resonator shunted with an array of SQUID loops [52, 53, 56]. The isolators and circulators protect the DUT from pump leakage and thermal noise. The pump tone needed to achieve a gain of approximately 20 dB in the JPA is derived via splitters from the same microwave generator MWG 2 as is used for the readout pulses, which reduces drifts of the relative phase between the two signals. Low phase noise is essential if the JPA is operated in a phase-sensitive mode [80, 99]. The pump signal (port E in […] JPA. To avoid saturation of the subsequent amplifiers, we destructively interfere the reflected pump tone with a cancellation tone applied to the directional coupler (port D in Fig. 5). The phase and amplitude of the cancellation tone are adjusted using a variable phase shifter and attenuator.
After amplification by the JPA, the signal is passed via isolators which attenuate reversely propagating radiation towards a high-electron-mobility transistor (HEMT) amplifier to further amplify the signal with a gain of 40 dB before it exits the dilution refrigerator (port C in Fig. 5 ).
In the detection electronics (yellow box in Fig. 5) at room temperature, the signal is amplified further using low-noise microwave amplifiers. In order to reduce noise below the frequencies of interest, the signal is high-pass filtered with a cut-off frequency of about 4 GHz. The carrier frequency of typically 7 GHz is converted down to an intermediate frequency (IF) using an analog I/Q mixer and a separate microwave generator, MWG 3, for the LO signal. The IF signal at the I output of the mixer is further amplified using an IF amplifier, and low-pass filters are used to suppress noise outside the detection bandwidth of the ADC (50 MHz). Attenuators between the amplifiers and the mixer are used to suppress standing waves due to impedance mismatches and to prevent saturation of the mixer, amplifiers and ADC.
After amplification and analog down-conversion, the signal is digitized by the ADC and forwarded to the FPGA on the Nallatech BenADDA-V4™ card. The digital signal processing (DSP) circuit which we implemented on the FPGA generates a feedback trigger conditioned on the digitized and processed signal (see Sec. III).
The feedback trigger is forwarded to AWG 2 which is part of the actuator electronics (red box in Fig. 5). When receiving the feedback trigger, AWG 2 generates a pulse which is up-converted to the qubit frequency, typically at 5-6 GHz, using an I/Q mixer and an LO signal from microwave generator MWG 4. Bias tees make it possible to compensate unwanted DC offsets of the I/Q inputs of the mixer in order to suppress LO leakage. The up-converted microwave pulses are forwarded to the qubit (port A in Fig. 5).
All AWGs, MWGs, as well as ADC and DSP clocks are synchronized to a 10 MHz sine wave from an SRS FS725 rubidium frequency standard.
Appendix B: Latency of analog to digital conversion and digital input
The ADC latency and digital input-output latencies of the FPGA are inferred from the timing relative to the input trigger and feedback trigger. When the variable delay in the state discrimination module (see App. C 4) is set to d = 1 clock cycle, we measure the delay from the trigger input to the feedback trigger with an oscilloscope to be τ tr-fb = 110 ns ± 3 ns. Since the input trigger is synchronized with the digitized signal from the ADC in the DSP circuit, we infer that the ADC delay and digital input-output delay is τ ADC,DIO = τ tr-fb − τ proc = 80 ns ± 3 ns.
The delay τ ADC,DIO has several contributions which we did not determine individually. The pipelined architecture of the AD6645 ADC introduces a delay of four clock cycles (40 ns) and a latency of one additional clock cycle (10 ns) to transfer the digitized signal from the ADC to the FPGA where it is registered in a synchronous D-flip-flop. Further delays are expected to contribute to τ ADC,DIO due to the routing of the digital signal on the BenADDA-V4™ board as well as pad-to-flip-flop and flip-flop-to-pad delays on the FPGA (see App. D 1).
Appendix C: Implementation details of digital signal processing blocks
Here we specify implementation details of the blocks of the DSP circuit presented in Sec. III which are relevant for the processing latency.
Digital mixer
The cosine and sine signals, cos(ω IF t n ) and − sin(ω IF t n ), for digital mixing are typically generated either using a lookup table with precomputed values or by an iterative algorithm and then multiplied with two copies of the signal as shown in Fig. 6(a) . While these methods work for arbitrary frequencies ω IF , a simplified method exists for the special case when ω IF equals a quarter of the sampling rate, i.e. ω IF /(2π) = f s /4 [70, 71] .
In the f s /4 case, the periodic sequences for the cosine and negative sine are simply (1, 0, −1, 0) and (0, −1, 0, 1) respectively [70]. Since multiplication with 0, 1 and −1 is trivial, we replace the multipliers by counter-driven multiplexers (MUX) that periodically switch between four inputs as shown in Fig. 6(b). The 2-bit repeating counter (CNT) iterates through a sequence of four values (0, 1, 2, 3), jumping to the next value in every clock cycle and restarting from 0 after it has reached 3. The output of the counter is forwarded to the selection (sel) input of the multiplexers. The selection input of each multiplexer determines which of its four inputs (in0, in1, in2, in3) is forwarded to its output. The four inputs of the multiplexer for the real part (Re[S m ]) correspond to multiplying the signal with (1, 0, −1, 0) while the inputs of the multiplexer for the imaginary part (Im[S m ]) correspond to multiplication with (0, −1, 0, 1).
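The multiplier-free f s /4 mixing scheme described above can be modeled in software as follows. This is a behavioral sketch only, not the FPGA implementation; the function name is hypothetical and the counter is assumed to start at 0 for the first sample.

```python
def fs4_mix(samples):
    """Mix a real sample stream with exp(-i*2*pi*n/4) using only selection and negation.

    Returns the lists (re, im) of the real and imaginary parts of the mixed signal.
    """
    re, im = [], []
    for n, s in enumerate(samples):
        sel = n & 0b11  # 2-bit repeating counter: 0, 1, 2, 3, 0, ...
        # Multiplexer inputs (in0..in3): the signal times (1, 0, -1, 0).
        re.append((s, 0, -s, 0)[sel])
        # Multiplexer inputs (in0..in3): the signal times (0, -1, 0, 1).
        im.append((0, -s, 0, s)[sel])
    return re, im


re, im = fs4_mix([1, 2, 3, 4, 5])
# re == [1, 0, -3, 0, 5], im == [0, -2, 0, 4, 0]
```

Each output sample is obtained by selecting the input, its negation, or zero, so no multiplier is needed.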
Moving average
In the following, we discuss how to implement the moving average [circuit shown in Fig. 6(c)], which is the simplest type of FIR filter, with a processing latency of less than one clock cycle (10 ns). The moving average is applied in parallel to the real and imaginary parts of the output S m of the mixer, i.e. two copies of the circuit shown in Fig. 6(c) are implemented with outputs I and Q respectively. The first step in the circuit for computing the moving average, as shown in Fig. 6(c), is to fan out the input signal into two branches. One branch b is delayed by a variable delay (z −l ) of l clock cycles while no operation is performed on the other branch a, i.e. b m = a m−l . A subtractor then computes the difference a − b between the values of the two branches which is forwarded to an accumulator [+= in Fig. 6(c)]. In every clock cycle, the accumulator adds the value at its input to the sum stored internally and forwards the updated sum to the output. Therefore the output S accu,n at clock cycle n of the accumulator is the sum of all input samples up to clock cycle n − 1, i.e.
$$S_{\mathrm{accu},n} = \sum_{m=0}^{n-1} (a_m - b_m) = \sum_{m=0}^{n-1} (a_m - a_{m-l}) = \sum_{m=n-l}^{n-1} a_m, \tag{C1}$$
where the last equality holds assuming that all input samples with negative index are equal to zero, i.e. a m = 0 for m < 0. To make sure that this assumption holds true, we initialize the registers of the variable delay and the accumulator to zero. As depicted in Fig. 6(c), an additional adder (+) adds the most recent value of the difference a n − b n at the input of the accumulator to its output and a constant factor of 1/l normalizes the moving average. Thus, the final signal at the output of the moving average (written here for the output I, and likewise for Q) is
$$I_n = \frac{1}{l}\left(S_{\mathrm{accu},n} + a_n - b_n\right) = \frac{1}{l}\sum_{m=n-l+1}^{n} a_m. \tag{C2}$$
As opposed to the sums in Eq. (C1), which stop at index n − 1, the final sum in Eq. (C2) includes the most recent sample with index n, which shows that the additional adder reduces the effective processing latency to less than one clock cycle.
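A behavioral model of the delay, subtractor, accumulator and additional adder may help to follow the argument. The sketch below is a software illustration with hypothetical names, not the hardware description used on the FPGA; one loop iteration stands for one clock cycle.

```python
from collections import deque


def moving_average(samples, l):
    """Moving average over the last l samples, including the most recent one."""
    delay_line = deque([0] * l, maxlen=l)  # z^-l delay, registers initialized to zero
    accumulator = 0                        # registered sum of differences up to cycle n-1
    output = []
    for a in samples:
        b = delay_line[0]                  # b_n = a_{n-l}
        diff = a - b                       # subtractor output a_n - b_n
        # The additional adder combines the registered sum with the current
        # difference, so the output already contains sample n.
        output.append((accumulator + diff) / l)
        accumulator += diff                # accumulator update, visible in the next cycle
        delay_line.append(a)               # shift the delay line by one sample
    return output


averaged = moving_average([4, 8, 12, 16, 20], l=4)
# averaged == [1.0, 3.0, 6.0, 10.0, 14.0]
```

The last output value, 14.0, is the mean of the four most recent samples (8, 12, 16, 20), which includes the sample of the current cycle.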
Preprocessing module
Offset subtraction (−c in Fig. 6(d) ) is implemented with lookup tables (LUTs). The parameter c is configurable via the interface with the host computer (indicated by dashed arrows). The multiplication (×m in Fig. 6(d) ) is implemented without the use of actual multipliers but rather uses bit shift operations, which are effective multiplications with powers of two. Avoiding the allocation of multipliers reduces hardware resource consumption and leads to reduced processing latencies. The multiplication is made configurable using multiplexers to choose between different bit shift operations. The bit shift operation is chosen via the host computer interface.
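The offset subtraction followed by a shift-based scaling can be sketched as below. The function name and the convention that a negative shift denotes a division by a power of two are assumptions made for this illustration.

```python
def preprocess(value, offset, shift):
    """Subtract the offset c and scale by 2**shift using bit shifts only.

    value and offset are integers (fixed-point words); a negative shift
    scales down by a power of two using an arithmetic right shift.
    """
    centered = value - offset
    if shift >= 0:
        return centered << shift
    return centered >> -shift


scaled_up = preprocess(10, 2, 3)      # (10 - 2) * 8 = 64
scaled_down = preprocess(-6, 2, -2)   # (-6 - 2) / 4 = -2
```

Restricting the scaling factor to powers of two is what allows the multiplier to be replaced by a multiplexer that selects among fixed shifts.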
State discrimination module
The state discrimination module determines the qubit state and provides feedback triggers based on the sign bits x and y of the preprocessed signals Ĩ and Q̃ as shown in Fig. 6(e). The sign bits of Ĩ and Q̃ are 0 if the respective signal is positive and 1 if it is negative, as depicted in Fig. 6(f). Due to the prior offset subtraction, determining the sign bits of Ĩ and Q̃ is equivalent to comparing the I and Q signals each to an arbitrary threshold value. Two lookup tables (LUT) define the binary feedback with two independent bits L^(1)_xy and L^(2)_xy which are selected based on the two sign bits x and y as depicted in Fig. 6(g). The entries of the LUT can be set via the host computer interface [dashed arrows in Fig. 6(e)]. In the example shown in Fig. 6(h), the value of the feedback bit is 1 if and only if x = 0, corresponding to a non-negative value of the I component of the signal.
The input trigger signal is used as a reference for the timing of the feedback triggers relative to the onset of the readout pulse. As shown in Fig. 6(e) , the trigger first enters a rising edge detection block. The output of the rising edge detection block is 1 if and only if the input binary value of the trigger was 0 in the previous clock cycle and 1 in the present clock cycle. The output of the rising edge detection is delayed with a variable delay z −d where d is the number of clock cycles (each being 10 ns) corresponding to the readout time τ RO , i.e. d × 10 ns = τ RO . The parameter d can be set via the host computer interface. The output of the variable delay, to which we refer as the fbTime marker, marks the specific time at which the feedback pulse is provided. To assert that the feedback triggers are issued at the correct time, the feedback triggers fb and fb2 are based on the AND operation of the output of the LUT and the fbTime marker.
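The interplay of sign bits, lookup tables, rising edge detection and the delayed fbTime marker can be modeled cycle by cycle. The following sketch is a behavioral illustration with hypothetical names; the lookup tables are represented as 2×2 nested lists indexed by the sign bits x and y.

```python
def sign_bit(value):
    """Sign bit as in the text: 0 for non-negative values, 1 for negative ones."""
    return 1 if value < 0 else 0


def feedback_triggers(i_tilde, q_tilde, trigger, lut1, lut2, d):
    """Return per-clock-cycle lists (fb, fb2) of the two feedback trigger bits."""
    fb, fb2, edges = [], [], []
    previous = 0
    for n, (i_val, q_val, trig) in enumerate(zip(i_tilde, q_tilde, trigger)):
        # Rising edge: trigger was 0 in the previous cycle and is 1 now.
        edges.append(1 if (previous == 0 and trig == 1) else 0)
        previous = trig
        # fbTime marker: edge signal delayed by d clock cycles (z^-d).
        fb_time = edges[n - d] if n >= d else 0
        x, y = sign_bit(i_val), sign_bit(q_val)
        # Feedback triggers: AND of the LUT output and the fbTime marker.
        fb.append(lut1[x][y] & fb_time)
        fb2.append(lut2[x][y] & fb_time)
    return fb, fb2


# LUT of the example in the text: feedback bit is 1 if and only if x = 0.
lut_x0 = [[1, 1], [0, 0]]
lut_off = [[0, 0], [0, 0]]
fb, fb2 = feedback_triggers([5] * 6, [0] * 6, [0, 1, 1, 0, 0, 0], lut_x0, lut_off, d=2)
# fb == [0, 0, 0, 1, 0, 0]
```

The trigger rises in cycle 1, so with d = 2 the fbTime marker is active in cycle 3 only, and the feedback bit is issued there because the I value is non-negative.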
Histogram module
The histogram module is important to assess the feedback performance and to calibrate the experimental setup. Here we explain how our multi-dimensional histogram module is implemented. The histogram module has different operational modes. We first introduce the circuit for recording two-dimensional histograms as shown in Fig. 7(a). The input signals Ĩ and Q̃ are rounded to 7-bit fixed-point numbers, which means that the full range of ±1 V is subdivided into 2^7 = 128 bins. The 7-bit fixed-point representations of Ĩ and Q̃ are concatenated into a 14-bit address of the histogram bin which stores the number of occurrences of the combination of values (Ĩ, Q̃). The "increase count" block manages the communication with the ZBT RAM in order to increase the stored count whenever the enable flag (en) is active. For feedback experiments, the enable flag is derived from the fbTime marker such that the histogram is updated when the feedback decision is made (see Sec. III).
In the correlation mode of our histogram module, a buffer [Fig. 7(b)] stores the value of Ĩ at every reception of the fbTime marker. The buffered signal Ĩ 1 is combined with the most recent signal Ĩ 2 to record the probability to observe a specific combination of Ĩ 1 and Ĩ 2 in two consecutive readouts (see Sec. IV). In addition, a segment counter [seg cnt in Fig. 7(b)] makes it possible to distinguish alternating experimental scenarios, such as when the feedback is enabled or disabled alternately in consecutive runs of the experiment. In the correlation mode, the value of Q̃ is in principle not needed but is an additional useful piece of information. In order to make use of the total amount of 2^25 bits (4 MB) of available space in the ZBT RAM, we reduce the Q̃ dimension to 5 bits and concatenate the values (Ĩ 2 , Q̃, Ĩ 1 , seg) into a 21-bit address with a word size of 16 bits to store the counts as presented in Fig. 7(b).
In the time-resolved mode, a time counter [time cnt in […]
Using the Xilinx ISE® tool suite [100], we extracted information about the timing of the signal processing for our present implementation of the DSP circuit in the Virtex-4 (xc4vsx35-10ff668) and for future implementations on the Virtex-6 (xc6vlx240t-1ff1156) and Virtex-7 (xc7vx485t-2ffg1761c) FPGAs. In App. D 2, we present the corresponding FPGA resource allocations.
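The address formation in the correlation mode can be illustrated as follows. Several details are assumptions made for this sketch and are not stated in the text: the bit order within the address (first listed field most significant), an unsigned bin index obtained by offsetting the ±1 V range, and a 2-bit segment counter, inferred from 21 − 7 − 5 − 7 = 2. All names are hypothetical.

```python
I_BITS, Q_BITS, SEG_BITS = 7, 5, 2   # 7 + 5 + 7 + 2 = 21 address bits
WORD_BITS = 16                       # size of each stored count


def to_bin(value_volts, bits, full_scale=1.0):
    """Map a value in [-full_scale, full_scale) to an unsigned bin index."""
    n_bins = 1 << bits
    index = int((value_volts + full_scale) / (2.0 * full_scale) * n_bins)
    return min(max(index, 0), n_bins - 1)   # clamp out-of-range values


def correlation_address(i2, q, i1, seg):
    """Concatenate (I2, Q, I1, seg) into a 21-bit histogram address."""
    address = to_bin(i2, I_BITS)
    address = (address << Q_BITS) | to_bin(q, Q_BITS)
    address = (address << I_BITS) | to_bin(i1, I_BITS)
    address = (address << SEG_BITS) | (seg & ((1 << SEG_BITS) - 1))
    return address


total_bits = (1 << (2 * I_BITS + Q_BITS + SEG_BITS)) * WORD_BITS
# total_bits == 2**25, i.e. the full ZBT RAM capacity quoted in the text
```

With these field widths, 2^21 addresses of 16 bits each fill exactly the 2^25 bits of memory.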
We define the pad-to-pad delay τ p−p as the delay the digitized signal encounters in the path from the signal input pads of the FPGA to the feedback trigger output pad. For the full implementation on the Virtex-4 FPGA ("V-4 full" in Tab. II), the predicted pad-to-pad delay amounts to
$$\tau_{p-p} \equiv \tau_{p-f} + \tau_{\mathrm{proc}} + \tau_{f-p} = 1.5\,\mathrm{ns} + 30\,\mathrm{ns} + 3.8\,\mathrm{ns} = 35.3\,\mathrm{ns},$$
where τ proc = 30 ns is the processing latency of three pipeline stages (see blue dashed lines in Fig. 2). Moreover, the pad-to-flip-flop delay τ p−f = 1.5 ns is the maximum delay from the ADC input pads to the D-flip-flops of the first pipelined register. Furthermore, the flip-flop-to-pad delay τ f−p = 3.8 ns is the maximum delay from the flip-flops of the last pipelined register to the output pad of the feedback trigger. The pad-to-flip-flop τ p−f and flip-flop-to-pad τ f−p delays are expected to contribute to the digital input and output delay τ ADC,DIO (see Sec. II D).
A clock period analysis shows that the minimum clock period due to the timing of the signals between two pipelined registers amounts to T min = 6.7 ns, which corresponds to a maximum clock frequency of f max = 149 MHz. Increasing the clock frequency in a pipelined architecture is however only beneficial when the sampling rate of the ADC is also increased. Instead, removing pipeline stages in the signal path can lead to a further decrease in processing time as long as the resulting minimum clock period does not exceed the sampling period, i.e. T min ≤ 1/f s .
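The timing figures quoted for the full Virtex-4 implementation follow from simple arithmetic, which the sketch below reproduces. The function names are hypothetical and the numbers are those given in the text.

```python
def pad_to_pad_delay_ns(t_pad_to_ff_ns, n_pipeline_stages, clock_period_ns, t_ff_to_pad_ns):
    """Pad-to-pad delay: input delay + pipeline latency + output delay, in ns."""
    processing_ns = n_pipeline_stages * clock_period_ns
    return t_pad_to_ff_ns + processing_ns + t_ff_to_pad_ns


def max_clock_frequency_mhz(min_clock_period_ns):
    """Maximum clock frequency in MHz for a given minimum clock period in ns."""
    return 1e3 / min_clock_period_ns


delay_full = pad_to_pad_delay_ns(1.5, 3, 10.0, 3.8)   # about 35.3 ns
f_max = max_clock_frequency_mhz(6.7)                  # about 149 MHz
```

Three pipeline stages at a 10 ns clock period account for 30 ns of the total, which is why removing pipeline stages is the most effective way to shorten the pad-to-pad delay.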
As a first step towards a future optimization of the processing and pad-to-pad delay, we separately simulated the implementation of what we consider the core feedback functionality of the DSP circuit which only includes the f s /4 mixer, the moving average, the offset subtraction and scaling modules and the state discrimination module. For the implementation of the core DSP circuit we keep only two pipelined registers, one at the ADC input and one at the feedback outputs fb and fb2. Therefore the processing latency amounts to one clock cycle. In order to optimize the pad-to-pad delay, we optimized first the register-to-register delay, which determines the maximal clock frequency. Afterwards, we optimize the pad-to-register and register-to-pad delays.
Assuming that the sampling rate is equal to the maximal clock frequency, we obtain a pad-to-pad delay of 4 ns + 9. […] Tab. II). These results show that a further reduction of the latency introduced by the DSP from 35.3 ns to 14.3 ns is possible through an optimized implementation of the core functionalities on a recent FPGA. We therefore consider the integration of a recent FPGA into our experimental setup as possible future work.
FPGA resource analysis
Here we report the FPGA resource allocation for the full design implemented on the Virtex-4 FPGA and compare it to the resources needed to implement the core functionality consisting of the f s /4 mixer (App. C 1), the moving average (App. C 2), the preprocessing module (App. C 3) and the state discrimination module (App. C 4). The analysis of the resource allocation is done for the implementation of the core design on the Virtex-4, Virtex-6 and Virtex-7 FPGAs corresponding to the timing analysis performed in App. D 1.
The resource usage is summarized in Tab. III. […] by the flexibility in the signal processing, such as the phase-adjustable mixer and the FIR filter with arbitrary coefficients and the possibility to record histograms. In addition, the full design includes hardware modules for interfacing with PC and ZBT memory. To implement the added flexibility in the signal processing, the full design requires n DSP = 184 dedicated DSP slice resources, which contain multipliers and adders. The core design (V-4 core, V-6 core and V-7 core in Tab. III) implements only a subset of the functionality to maintain the minimal requirements for the DSP operations. Therefore, the number of D-flip-flops n DFF and LUTs n LUT is reduced by almost two orders of magnitude compared to the full implementation. In addition, the implementation of the core functionality does not require dedicated DSP slices of the FPGA (n DSP in Tab. III) since no multipliers are used in the blocks of the core design. The numbers n DFF and n LUT vary depending on whether the core design is implemented for the Virtex-4 (V-4 core), Virtex-6 (V-6 core) or Virtex-7 (V-7 core) FPGA, as displayed in Tab. III. We ascribe the variations of resource usage among the implementations of the core design to differences in the slice and LUT structure between the respective FPGA models. Different slice and LUT structures result in differences of the resource optimization in the mapping process using the Xilinx ISE software.
Appendix E: Experimental parameters
The superconducting transmon qubit [44] has a resonance frequency ω q /(2π) = 6.148 GHz corresponding to the transition between the ground and first excited state and an anharmonicity of α = −401 MHz. The qubit is capacitively coupled to a λ/2 coplanar waveguide resonator with a coupling strength g/(2π) ≈ 65 MHz. We measure a fundamental mode resonance frequency of ω r /(2π) = 7.133 GHz defined as the center of the dispersively shifted resonance frequencies for the qubit states |g⟩ and |e⟩. We measure a resonator linewidth of κ/(2π) = 6.3 MHz. The qubit shows an exponential energy relaxation with time constant T 1 ≈ 1.4 µs. We choose an experiment repetition period of 10 µs, which for the given T 1 is sufficient to obtain a residual out-of-equilibrium excited state population of 0.1%. The measured thermal equilibrium excited state probability is P therm ≈ 7% (see App. F).
The envelope of the microwave pulses for qubit rotations is Gaussian with σ = 7 ns, truncated symmetrically at ±2σ as seen in the pulse scheme Fig. 4(b) and uses the DRAG technique [72, 101] to avoid errors due to the presence of states outside the qubit subspace.
From pulsed spectroscopy we observe a dispersive shift of the resonator frequency ω r^|g⟩ (ω r^|e⟩) for the qubit in the ground state |g⟩ (excited state |e⟩) of […]
We choose the frequency of the resonator drive pulses for dispersive readout at the center between the two shifted resonator frequencies, i.e. ω r ≡ (ω r^|e⟩ + ω r^|g⟩)/2. The amplitude of the readout pulse is chosen such that the expected steady-state mean photon number is n readout ≈ 10, which we calibrated by measuring the ac Stark shift [102] of the qubit frequency when a continuous coherent drive is applied to the resonator.
Appendix F: Reduction of thermal excited state population
A possible application of active feedback initialization is to temporarily reduce the excited state population when the qubit is initially in thermal equilibrium with its environment [19, 82, 84]. In order to test the performance of our feedback loop for reducing the thermal excited state population, we omit the π/2 pulse in the beginning of the protocol presented in Sec. IV, such that the expected input state is a mixed state described by the density matrix
$$\rho_{\mathrm{therm}} = (1 - P_{\mathrm{therm}})\,|g\rangle\langle g| + P_{\mathrm{therm}}\,|e\rangle\langle e|,$$
where P therm is the excited state population when the system is in thermal equilibrium with its environment. As discussed in Sec. IV, the qubit state is measured by two successive readout pulses M1 and M2. When the feedback actuator is disabled, the measured histogram of the in-phase component I 1 of the signal during M1 [blue dots in Fig. 8(a)] is almost identical to the histogram of the in-phase component I 2 of the signal during M2 [orange dots in Fig. 8(a)]. From counting the values on the right side of the threshold [dashed line in Fig. 8(a)], we obtain corresponding thermal excited state probabilities of P[E 1 ] fb off = 8.21(2)% for the first measurement M1 and P[E 2 ] fb off = 8.18(2)% for the second measurement M2. This indicates that, without conditioning on the measurement outcome, the measurement leaves the thermal steady state ρ therm unperturbed. Note that the overlap readout error (see App. G) biases the measured probabilities towards 50%. Taking this bias into account, we infer a thermal excited state population of P therm ≈ 7% from the measured probabilities P[E 1 ] and P[E 2 ]. The inferred thermal excited state population P therm corresponds to a temperature of a bosonic environment of T env ≈ 114 mK. The effective temperature T env is close to the measured base temperature of the dilution refrigerator, which for the presented experiment was 90 mK instead of the typical temperature of 20 mK due to problems with the cryogenic setup.
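The conversion from the thermal excited state population to an effective temperature can be checked with a few lines of code. The sketch assumes that the populations of the two qubit levels follow a Boltzmann ratio, P e /P g = exp(−hf/(k B T)), which is an assumption of this illustration since the text does not state the formula; the names are hypothetical.

```python
import math

PLANCK_H = 6.62607015e-34     # Planck constant in J s (exact SI value)
BOLTZMANN_K = 1.380649e-23    # Boltzmann constant in J/K (exact SI value)


def effective_temperature(p_excited, frequency_hz):
    """Temperature in K for which P_e / P_g equals the Boltzmann factor (assumed model)."""
    ratio = p_excited / (1.0 - p_excited)
    return -PLANCK_H * frequency_hz / (BOLTZMANN_K * math.log(ratio))


# Qubit frequency 6.148 GHz and thermal population of about 7% from the text.
t_env = effective_temperature(0.07, 6.148e9)   # about 0.114 K
```

With these inputs the sketch gives approximately 114 mK, in line with the value of T env quoted above.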
When feedback is enabled, the excited state probability in the second measurement amounts to P[E 2 ] fb on = 5.43(2)%, as obtained from the histogram of I 2 [orange dots in Fig. 8(b) ], is reduced compared to the excited state probability in the first measurement [blue dots in Fig. 8(b) ], showing that a reduction of the thermal excited state population is possible with our feedback loop. The measured probability P[E 2 ] fb on is in reasonably good agreement with the simulated value of P[E 2 ] fb on, sim = 5.24% obtained from a master equation simulation using the same model and parameters as discussed in Sec. IV. We recorded two-dimensional histograms of the values I 1 and I 2 for the case when feedback is disabled and enabled as shown in Fig. 8(c) and Fig. 8(d) respectively. The measured relative counts in the four regions (R GG , R GE , R EG , R EE ) of the two-dimensional histograms show the swapping of the probabilities P[R EG ] and P[R EE ] and the invariance of the probabilities P[R GG ] and P[R GE ] under the feedback action as discussed in Sec. IV. The histogram of the signal in the region R EG for the "feedback off" case [ Fig. 8(c) ] matches well with the histogram in region R EE for the "feedback on" case [ Fig. 8(d) ]. In particular, the corresponding probabilities P[R EG ] fb off = 2.66(1)% and P[R EE ] fb on = 2.79(2)% match reasonably well, which shows that the feedback pulse is applied when the state |e is detected in the first measurement.
The experimentally observed probabilities are in reasonably good agreement with the simulation results P[R EG ] fb off, sim = 1.98% and P[R EE ] fb on, sim = 1.45% (Tab. IV) considering the sources of systematic errors as discussed in Sec. IV. We observe that the histogram in the region R EE in Fig. 8(d) is double-peaked, which is a consequence of the readout error since the tail of the distribution associated with the |g state extends into the region R EG .
Furthermore, the histograms in the region R GE match for both the "feedback off" [Fig. 8(c)] and the "feedback on" case [Fig. 8(d)]. The probabilities P[R GE ] fb off = 2.63(1)% and P[R GE ] fb on = 2.64(1)% agree within the statistical error bars, which shows that the feedback pulse is not applied when the state |g⟩ is detected in the first measurement.
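The region analysis above can be sketched in Python as follows. This is a toy model under stated assumptions and not the master equation simulation of Sec. IV: the qubit is treated as a classical two-state variable, decay and re-thermalization between M1 and M2 are ignored, the feedback pulse is an ideal flip, and the peak positions, noise width, threshold and excited state population are hypothetical. The names region_probabilities and simulate are illustrative.

```python
import random
from collections import Counter

THRESHOLD = 0.0  # hypothetical threshold separating results G and E


def region_probabilities(i1, i2, threshold=THRESHOLD):
    """Relative counts in the four regions R_GG, R_GE, R_EG, R_EE."""
    def label(x):
        return "E" if x > threshold else "G"

    counts = Counter(label(a) + label(b) for a, b in zip(i1, i2))
    n = sum(counts.values())
    return {r: counts[r] / n for r in ("GG", "GE", "EG", "EE")}


def simulate(n_shots, feedback, p_excited=0.08, sigma=0.45, seed=1):
    """Toy two-measurement experiment with an ideal conditional flip."""
    rng = random.Random(seed)
    i1, i2 = [], []
    for _ in range(n_shots):
        excited = rng.random() < p_excited
        x1 = rng.gauss(1.0 if excited else -1.0, sigma)  # measurement M1
        if feedback and x1 > THRESHOLD:
            excited = not excited  # feedback pulse applied only on result E
        x2 = rng.gauss(1.0 if excited else -1.0, sigma)  # measurement M2
        i1.append(x1)
        i2.append(x2)
    return i1, i2


off = region_probabilities(*simulate(50000, feedback=False))
on = region_probabilities(*simulate(50000, feedback=True))
for region in ("GG", "GE", "EG", "EE"):
    print(f"R_{region}: off {100 * off[region]:.2f}%  on {100 * on[region]:.2f}%")
```

In this toy model the relative counts in R GG and R GE are unchanged by the feedback, while the counts in R EG and R EE exchange roles, which is the qualitative pattern described for Fig. 8(c) and Fig. 8(d). The numerical values are not meant to reproduce the measured probabilities.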
The data presented here serve as a further experimental benchmark of our implementation of the feedback scheme and illustrate the use of two-dimensional histograms to gain insight into the processes that lead to the observed excited state probabilities.
Appendix G: Readout fidelity
The readout is calibrated in a separate calibration step where either no pulse or a π pulse is applied to the qubit prior to the measurement. A threshold check, as described in Sec. II A, leads either to the result G corresponding to the qubit state |g⟩ or to the result E corresponding to |e⟩. The single-shot readout fidelity is defined as
F r = 1 − P[E|"no pulse"] − P[G|"π pulse"],
where P[E|"no pulse"] represents the conditional probability of obtaining the result E when no pulse has been applied, whereas P[G|"π pulse"] represents the conditional probability of obtaining the result G when a π pulse has been applied. For a fixed moving average window length of l = 4 digital samples (40 ns), the single-shot fidelity reaches a maximal value of F r = 77% at time τ RO ≈ 105 ns relative to the onset of the readout pulse. We expect the contributions to the readout infidelity to be
1 − F r ≈ 2P therm + P decay + P overlap ,
where P therm ≈ 7% is the initial excited state population in thermal equilibrium (see App. F), P decay ≈ 1 − exp(−τ RO /T 1 ) ≈ 6% is the error due to the decay of the |e⟩ state and P overlap ≈ 3% is the probability of misidentification of the qubit state due to overlap of the probability density functions for the signals corresponding to the |g⟩ and |e⟩ states. We extracted the contributions to the readout infidelity from fits to recorded histograms using methods similar to the ones described in Ref. [46].
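As a consistency check, the infidelity budget above can be evaluated with the rounded values quoted in the text. The following Python sketch is illustrative only; the function names are not from the original work. A plausible reading of the factor of two, stated here as an interpretation, is that the thermal population enters once for each calibration case: it yields the result E when no pulse is applied and the result G after a π pulse.

```python
def readout_fidelity(p_e_given_no_pulse, p_g_given_pi_pulse):
    """Single-shot readout fidelity F_r from the two conditional error probabilities."""
    return 1.0 - p_e_given_no_pulse - p_g_given_pi_pulse


def infidelity_budget(p_therm, p_decay, p_overlap):
    """Expected readout infidelity 1 - F_r = 2 P_therm + P_decay + P_overlap."""
    return 2.0 * p_therm + p_decay + p_overlap


# Rounded contributions as quoted in the text.
budget = infidelity_budget(p_therm=0.07, p_decay=0.06, p_overlap=0.03)
fidelity_estimate = 1.0 - budget
print(f"expected infidelity: {100 * budget:.0f}%")
print(f"expected fidelity:   {100 * fidelity_estimate:.0f}%")
```

With these rounded inputs the contributions sum to 23%, which is consistent with the measured single-shot fidelity of F r = 77%.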
